Performance Monitoring – Identifying Memory Bottlenecks

 

Memory bottlenecks show up as two distinct behaviors on the system: paging and swapping. Paging means the page daemon reclaims individual pages of memory when the system starts to run low on free memory. Swapping is more extreme: entire processes are swapped out.

 

To determine whether you are only paging or also swapping, examine two columns in the vmstat output. The first is the sr (scan rate) column. If the value in this column is greater than zero, the page scanner is scanning memory pages to put them back on the free list for reuse.
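
As a minimal sketch of watching the scan rate, the helper below pulls the sr values out of vmstat output. The function name sr_values is hypothetical and not part of Solaris. It assumes a POSIX awk (on older Solaris releases that usually means nawk or /usr/xpg4/bin/awk), and it finds the column by its header label because the number of disk columns differs between machines.

```shell
# Print the scan rate (sr) values from vmstat output read on stdin.
# sr_values is a hypothetical helper name used only for illustration.
sr_values() {
    awk '
        # Header row: remember which field is labelled "sr".
        { for (i = 1; i <= NF; i++) if ($i == "sr") { col = i; next } }
        # Data row: print the sr value once the column is known.
        col && $col ~ /^[0-9]+$/ { print $col }
    '
}
```

For example, `vmstat 5 12 | sr_values` would print one sr value per sample over a minute. The first sample reported by vmstat is an average since boot, so it is usually ignored.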

 

The page scanner runs when free memory falls below the value of a system parameter known as lotsfree (default: 1/64th of physical memory), or below cachefree if priority_paging is enabled (default: twice lotsfree, that is, 1/32nd of physical memory).
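
As a rough worked example of the 1/64th default mentioned above, the sketch below turns a physical memory size into the free-memory level at which the scanner would start. The function name default_threshold_mb is hypothetical, and the result is only an estimate, since the parameter can be tuned on a real system.

```shell
# Estimate the default scanner threshold (1/64th of physical memory).
# $1 is the physical memory size in megabytes; the result is in megabytes.
# default_threshold_mb is a hypothetical name used only for illustration.
default_threshold_mb() {
    echo $(( $1 / 64 ))
}
```

On a machine with 8192 MB of memory, `default_threshold_mb 8192` gives 128, so the scanner would start once free memory drops below roughly 128 MB. On Solaris, `prtconf | grep Memory` reports the physical memory size in megabytes.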

 

You should not worry about high scan rates if you are using the file system heavily; high scan rates can be normal in many circumstances. If priority_paging is enabled, the page scanner steals file system cache pages in preference to application pages, so file system I/O does not cause unnecessary paging of applications. As a side effect, priority_paging makes the sr rate higher. Solaris 8 introduces the cyclic page cache. With the cyclic page cache, the scanner is not used to reclaim pages during file system I/O, so an sr value greater than 0 is an indication that the system is running low on memory.

To see if you are swapping, refer to the w column. It is the third column of the output, and counts entire processes that are swapped out. You can determine which processes these are by running the command '/usr/bin/ps -e -o pid,rss,args' and looking for an RSS of 0 (the sched, pageout and fsflush processes should always have an RSS of 0).
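
The check described above can be sketched as a small filter over the ps output. The function name swapped_out is hypothetical; it assumes the pid,rss,args column order from the command above and a POSIX awk (nawk or /usr/xpg4/bin/awk on older Solaris releases).

```shell
# List processes whose RSS is 0, reading "ps -e -o pid,rss,args" on stdin.
# sched, pageout and fsflush are skipped because their RSS is always 0.
# swapped_out is a hypothetical helper name used only for illustration.
swapped_out() {
    awk 'NR > 1 && $2 == 0 && $3 !~ /^(sched|pageout|fsflush)$/ { print $1, $3 }'
}
```

It would be used as `/usr/bin/ps -e -o pid,rss,args | swapped_out`, which prints the PID and command name of each candidate.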

If you have anything in the w column, you are either low on memory right now, or you have been in the past. If your system gets low on memory and processes are swapped out, it may take a long time for them to get back into memory. This is especially true if they are daemons which are not run often, because they have to receive an event in order to try to run again. This is not necessarily bad, as long as when they need to run, they will have the memory to do so.

If, over time, you see swapping, you should probably consider adding memory to the system or devising a strategy to lower overall memory usage on the system.
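
To judge swapping over time rather than from a single reading, one option is to count how many vmstat samples show a non-zero w column. This is only a sketch: the function name swap_summary is hypothetical, and it assumes a POSIX awk and the usual vmstat header row that labels the column as w.

```shell
# Summarise vmstat output read on stdin: how many samples were taken,
# and in how many of them processes were swapped out (w > 0).
# swap_summary is a hypothetical helper name used only for illustration.
swap_summary() {
    awk '
        # Header row: locate the w column by name.
        { for (i = 1; i <= NF; i++) if ($i == "w") { col = i; next } }
        # Data row: count the sample, and count it again if w is non-zero.
        col && $col ~ /^[0-9]+$/ { total++; if ($col > 0) swapped++ }
        END { printf "%d of %d samples show swapped-out processes\n", swapped, total }
    '
}
```

For instance, `vmstat 60 60 | swap_summary` would cover about an hour with one sample per minute.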

I made a quick five-minute demo video that walks through a sample memory bottleneck scenario. The video quality is not high, but it is good enough to follow the simple scenario.

http://unixadminschool.com/blog/wp-content/uploads/ou/mem-bottleneck.mp4

Ramdev

I started unixadminschool.com (aka gurkulindia.com) in 2009 as my own personal reference blog, and later realized that my learnings might be helpful to other Unix admins if I managed my knowledge base in a more user-friendly format. The result is today's unixadminschool.com. You can connect with me at https://www.linkedin.com/in/unixadminschool/

17 Responses

  1. Pratap says:

    Excellent Ramdev

  2. Raj says:

    Hi,
    Firstly, I would like to thank you guys for posting such useful information on the gurkul site and for clarifying most doubts by explaining in detail; I also see that most of it comes from real-world experience.
    I am very curious to know a few more things, like what the different commands are to log in to and exit from the different Sun Solaris consoles, their prompts, etc. It would be great if you could provide that info.

    Regards,
    Raj

    • Ramdev Ramdev says:

      Thanks Raj. Welcome to gurkulindia.
      I have had a similar thought about consolidating the console exit and break sequences for various hardware, and will definitely share the info very soon.

      • Ramdev Ramdev says:

        @Raj – now you have the information that you requested: http://gurkulindia.com/main/2012/03/working-with-solaris-server-consoles-using-lom-ilom-alom-elom-rsc/ ..cheers

  3. Ramesh says:

    Excellent Ramdev

  4. prajwala says:

    Excellent Ramdev

  5. prajwala says:

    Excellent Ramdev ………

  6. Thank you Ramdev,

    I would like to share a few commands that are useful for performance monitoring.

    UNIX95=1 ps -eo vsz,pid,ppid,args | sort -rn | head -20
    UNIX95= ps -eo rss,vsz,ruser,pid,args | sort -rn | more
    UNIX95= ps -ef -o pid,ruser,vsz,args|sort -nrk3| awk '{ print $1" " ,$2 " ", $3/1024 "MB" ," " $4}' | head -20

  7. Ramdev Ramdev says:

    @prajwala, @ramesh – thanks for the comment
    @ Santhosh – Thanks for sharing these nice commands

  8. Raj says:

    Thank you so much Ram..

  9. Chetan says:

    Hi,
    Today I received a mail from HPOV saying that CPU utilization had reached 99.98%, so I logged in to the server and checked with the prstat -a command. How do we troubleshoot when CPU or memory utilization is high?

    o/p:

    PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
    8685 p10adm 17G 2326M cpu19 0 0 1:52:04 12% disp+work/1
    9070 p10adm 20G 4384M cpu0 0 0 4:09:46 12% disp+work/1
    29001 p10adm 17G 1777M sleep 0 0 11:35:10 8.3% disp+work/1
    3834 root 168M 151M sleep 59 0 17:50:33 0.1% java/80
    189 daemon 6952K 3456K sleep 59 0 1:35:12 0.1% kcfd/10
    3671 p10adm 17G 3375M sleep 59 0 1:32:19 0.1% disp+work/1
    9777 githesh 3976K 3384K cpu17 59 0 0:00:00 0.1% prstat/1
    1198 root 6864K 3264K cpu3 0 0 23:35:31 0.1% zstatd/1
    4136 p10adm 17G 1057M sleep 59 0 0:00:09 0.1% disp+work/1
    27729 p10adm 17G 3003M sleep 59 0 0:39:28 0.1% disp+work/1
    9790 root 6264K 4600K sleep 53 0 0:00:00 0.1% ssh/1
    9678 githesh 7560K 5192K sleep 59 0 0:00:00 0.1% sshd/1
    18215 p10adm 17G 3799M sleep 59 0 2:49:51 0.1% disp+work/1
    3706 p10adm 17G 1595M sleep 59 0 0:00:04 0.0% disp+work/1
    22893 p10adm 13G 371M sleep 59 0 0:24:16 0.0% disp+work/1
    22978 p10adm 17G 2370M sleep 59 0 0:26:36 0.0% disp+work/1
    22848 p10adm 8656K 3056K sleep 59 0 0:22:04 0.0% saposcol/1
    8874 root 3344K 2328K sleep 55 0 0:00:00 0.0% tkp1000log/1
    210 root 10M 3800K sleep 59 0 0:23:29 0.0% nscd/34
    7724 p10adm 17G 900M sleep 59 0 0:00:03 0.0% disp+work/1
    22972 p10adm 17G 779M sleep 59 0 0:03:51 0.0% disp+work/1
    229 root 2792K 1240K sleep 59 0 0:54:07 0.0% in.mpathd/1
    24405 root 17M 13M sleep 59 0 0:00:17 0.0% ovcd/28
    9737 githesh 1736K 1264K sleep 59 0 0:00:00 0.0% sudosh/1
    9677 root 5624K 3776K sleep 59 0 0:00:00 0.0% sshd/1
    22895 p10adm 334M 146M sleep 59 0 0:04:21 0.0% gwrd/1
    1008 noaccess 179M 91M sleep 59 0 0:52:32 0.0% java/18
    24415 root 19M 13M sleep 59 0 0:00:22 0.0% opcmsga/10
    192 root 9224K 4752K sleep 59 0 1:13:40 0.0% picld/11
    618 root 14M 1712K sleep 59 0 0:01:30 0.0% snmpd/4
    541 root 4584K 1624K sleep 59 0 0:00:11 0.0% sshd/1
    689 root 3976K 1304K sleep 59 0 0:00:00 0.0% rpc.metad/1
    12958 p10adm 17G 1372M sleep 59 0 0:03:31 0.0% disp+work/1
    499 root 1712K 1040K sleep 59 0 0:01:13 0.0% utmpd/1
    602 root 2888K 1224K sleep 59 0 0:00:01 0.0% snmpdx/1
    498 root 2424K 976K sleep 59 0 0:00:00 0.0% smcboot/1
    524 root 3776K 1472K sleep 59 0 0:00:00 0.0% mountd/1
    532 root 4424K 1704K sleep 59 0 0:03:01 0.0% syslogd/11
    476 root 2880K 1304K sleep 59 0 0:00:02 0.0% ttymon/1
    622 root 3856K 1040K sleep 59 0 0:00:00 0.0% dmispd/1
    637 root 4016K 1216K sleep 59 0 0:00:00 0.0% mdmonitord/1
    440 root 2456K 1240K sleep 59 0 0:00:02 0.0% sac/1
    NPROC USERNAME SWAP RSS MEMORY TIME CPU
    76 p10adm 32G 12G 37% 76:31:34 34%
    63 root 411M 424M 1.3% 46:27:19 0.4%
    8 githesh 6952K 15M 0.0% 0:00:00 0.2%
    7 daemon 10M 10M 0.0% 1:35:22 0.1%
    1 noaccess 94M 158M 0.5% 0:52:32 0.0%
    1 smmsp 1808K 8080K 0.0% 0:00:21 0.0%
    1 nobody 1064K 2968K 0.0% 0:00:00 0.0%

    Total: 157 processes, 500 lwps, load averages: 3.07, 3.07, 3.30

    In the above output, how are the "load averages" calculated, and what are lwps?

    What is the difference between prstat and sar output?

    Many thanks
    Chetan

  10. Yogesh Raheja says:

    @Chetan, from the above output it is clear that the top three processes are taking most of your CPU. The next step would be to check what those processes are. The first column gives you the PID. Run ps -ef | grep -i <PID> and check the process details; if they are application or DB related, ask the owning team to tune things from their end.

  11. Chetan says:

    Thanks sir,

    but what about this:

    In the above output, how are the "load averages" calculated, and what are lwps?

    What is the difference between prstat and sar output?

  12. Yogesh Raheja says:

    @Chetan, if you closely observe the output of sar and prstat, you will find that sar gives only limited information about the CPU state, i.e. the time spent in user, sys and wait state along with the idle percentage. Once we know the CPU percentage in use, the next step is to find the PID and the owner of the process causing the load. In short, we need detailed information about CPU usage and about the processes behind it, and prstat is the way to find that out in Solaris. Similarly, we have the sar and tprof combination in AIX and the sar plus top combination in Linux.

  13. Yogesh Raheja says:

    @Chetan, lwps (lightweight processes) are nothing but the process threads. They are best understood at the kernel level. Kindly see the link below, which will help you understand this concept: http://developers.sun.com/solaris/articles/prstat.html

  14. Chetan says:

    Hi,

    Many thanks for the reply.

    Sir, now the application user wants the last 24 hours of CPU utilization, memory utilization and disk utilization output.
    Can you help me?

    One more thing: I want to see which files or directories take the most space within a specific directory, ordered by size (KB, then MB, then GB). If I run du -sh * | sort -nr | more, it shows the output, but not ordered from smallest to largest.

    Many thanks.

  15. anon says:

    Thanks on your marvelous posting! I truly enjoyed
    reading it, you will be a great author. I will be sure to bookmark
    your blog and will eventually come back in the future. I want to
    encourage continue your great posts, have a nice weekend!
