I've got an AIX server that's going tits-up with increasing frequency, in what appears to be a resource starvation. Early in the year, it was going down once a month, but now it's up to twice a week.

Performance logs indicate that the server uses increasingly more swap space until it runs out.

I'm guessing it's a memory leak with one of the few applications on the box, but am unsure how to narrow down the culprit and catch it red handed.

Pointers?

--
| |||
If you have "topas" on your version of AIX, run it (/usr/bin/topas in AIX 5.1). Use 'P' to get to the process screen, and then tab until page space is highlighted.

If you don't have topas, you could run "ps axv" repeatedly for several hours and stick a date/time stamp on each sample. You can also sort it by mem size to see if something is leaking. Here's a script to do that. You're probably interested in SIZE, RSS, and %MEM.

---cut---
#!/usr/bin/ksh
# psestat.sh - ps axv output sorted by memory utilization (field 6, SIZE)

# number of samples to take (720 samples x 10 seconds = 2 hours)
let TIMES=720
# num of seconds to wait before sampling again
let DELAY=10
# initialize the counter
let NUMBER=1
# how many lines to print per sample
let LINES=20
# header line
HEADER=`ps axvwww | head -1`

while [[ $NUMBER -le $TIMES ]]
do
    date
    print $HEADER
    ps axvwww | grep -v 'PID' | sort -r -n -k6,6 | head -$LINES
    #ps axvwww | grep -v 'PID' | sort -r -n -k6,6   # uncomment this ps to get all procs
    sleep $DELAY; let NUMBER=NUMBER+1
done
---cut---

Run it for a couple of hours and redirect the output to a file...

$ nohup ./psestat.sh > psestat.out &

....then you can do an egrep to get info on memory utilization of any process over time:

$ egrep 'RSS|Wed Dec 29|136042' psestat.out
Wed Dec 29 17:08:42 PST 2004
PID TTY STAT TIME PGIN SIZE RSS LIM TSIZ TRS %CPU %MEM COMMAND
136042 - A 43:37 2494 44084 26128 32768 22 28 0.2 1.0 goodproc
Wed Dec 29 17:08:52 PST 2004
PID TTY STAT TIME PGIN SIZE RSS LIM TSIZ TRS %CPU %MEM COMMAND
136042 - A 43:38 2494 44084 26128 32768 22 28 0.2 1.0 goodproc
Wed Dec 29 17:09:02 PST 2004
PID TTY STAT TIME PGIN SIZE RSS LIM TSIZ TRS %CPU %MEM COMMAND
136042 - A 43:38 2494 44084 26128 32768 22 28 0.2 1.0 goodproc
$

goodproc isn't growing here, but if it were, you'd likely see an increase over time in the RSS, SIZE, and %MEM.

You might also take a look at "svmon" on AIX:
http://publibn.boulder.ibm.com/doc_l....htm#HDRI46342

See also:
http://publibn.boulder.ibm.com/doc_l...ungd02.htm#ToC
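Once psestat.out exists, running egrep by hand for every PID gets tedious. Here is a minimal sketch that does the comparison in one pass. It assumes the "ps axv" column layout shown in the sample output above (PID in field 1, SIZE in field 6, command starting at field 13). The function name size_growth is made up for illustration and is not part of psestat.sh.

```shell
#!/bin/sh
# size_growth FILE
# Print one line per PID whose SIZE in its last sample is larger than in its
# first sample, biggest growth first:
#   <growth> <pid> <first SIZE> <last SIZE> <command>
# Assumption: FILE holds "ps axv"-style rows (PID = field 1, SIZE = field 6,
# command = field 13 onward). Date stamps and header lines are skipped because
# their first field is not a number.
size_growth() {
    awk '
        $1 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ {
            pid = $1
            if (!(pid in first)) first[pid] = $6   # remember the earliest SIZE
            last[pid] = $6                         # keep overwriting with the latest SIZE
            name = $13
            for (i = 14; i <= NF; i++) name = name " " $i
            cmd[pid] = name
        }
        END {
            for (pid in first) {
                delta = last[pid] - first[pid]
                if (delta > 0)
                    printf "%d %s %d %d %s\n", delta, pid, first[pid], last[pid], cmd[pid]
            }
        }
    ' "$1" | sort -rn
}

# When run as a script with a file argument, report on that file.
if [ "$#" -gt 0 ]; then
    size_growth "$1"
fi
```

For example, `sh size_growth.sh psestat.out | head` would list the ten processes that grew the most over the run. Two caveats: only processes that made it into the top LINES of each sample are in the file at all, and a PID that was reused by a different process during the run will produce a misleading line.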
| |||
There are a lot of poorly coded Java applications out there. And a good number of them leak memory as fast as they can get it!

JaYmZ <test@testpleasegodnodontspamme.com> wrote in message news:JmGAd.35160$dv1.29536@edtnps89...
> I've got an AIX server that's going tits-up with increasing frequency, in
> what appears to be a resource starvation. Early in the year, it was going
> down once a month, but now it's up to twice a week.
>
> Performance logs indicate that the server uses increasingly more swap space
> until it runs out.
>
> I'm guessing it's a memory leak with one of the few applications on the box,
> but am unsure how to narrow down the culprit and catch it red handed.
>
> Pointers?
>
> --
| |||
"James T. Sprinkle" <oicmrsnakes@hotmail.com> writes:

> There are a lot of poorly coded Java applications out there. And a good
> number of them leak memory as fast as they can get it!
>
> JaYmZ
>

The good news about coding in Java is that you don't have to do memory management.

The bad news is that you don't get to, either :-)

--
#include <disclaimer.std> /* I don't speak for IBM ... */
/* Heck, I don't even speak for myself */
/* Don't believe me ? Ask my wife :-) */
Richard D. Latham lathamr@us.ibm.com
| |||
<test@testpleasegodnodontspamme.com> wrote in message news:JmGAd.35160$dv1.29536@edtnps89...
> I've got an AIX server that's going tits-up with increasing frequency, in
> what appears to be a resource starvation. Early in the year, it was going
> down once a month, but now it's up to twice a week.
>
> Performance logs indicate that the server uses increasingly more swap space
> until it runs out.
>
> I'm guessing it's a memory leak with one of the few applications on the box,
> but am unsure how to narrow down the culprit and catch it red handed.
>
> Pointers?
> --

Profile the platform with DPMonitor, and you'll not only catch the culprit red-handed, but you'll have the evidence to justify additional resources, if they're truly needed. You can also keep monitoring afterward, to confirm that whatever actions you take to resolve the problem do indeed resolve it. Plus, you will know well in advance of upcoming resource starvation incidents by looking at the graphical output displays.

If it's a third-party software product, a detailed set of graphs showing that software's per-process resource consumption can be provided to the software provider, who can then be held accountable for fixing problems in their applications (i.e. memory leaks or whatever else is found).

Performance Agents run on:
- AIX: 4.3, 5.1, includes 64-bit and 32 CPU systems, with 5.2 support coming soon
- Solaris: 2.6, 2.7, 2.8, with 2.9 coming soon
- Windows NT 4.0, 2000, XP, with 2003 Server coming soon

Performance Explorer runs on a separate Windows workstation that expects to communicate with the Agents over TCP/IP and UDP.

10 day Evaluation license provides system wide resource profiling.
Fully licensed version provides system wide & individual per process monitoring.
Extremely low overhead Agents.

DPMonitor is priced per number of CPUs on the Application Server(s) being monitored, and is also available for 3 or 6 month lease, to allow for more in-depth, fully licensed evaluations.

Map out graphically exactly what is happening and when, over time. Individual CPU stats for multi-processor systems, swap activity, RAM activity, buffer cache, disk I/O, network I/O, and more.

For more information, go to www.deltek.us and click on link for examples of DPMonitor usage.

Happy New Year.
| ||||
test@testpleasegodnodontspamme.com writes:

>I've got an AIX server that's going tits-up with increasing frequency, in
>what appears to be a resource starvation. Early in the year, it was going
>down once a month, but now it's up to twice a week.
....
>I'm guessing it's a memory leak with one of the few applications on the box,
>but am unsure how to narrow down the culprit and catch it red handed.

It won't directly point to the culprit, but if you set the vmtune -n parameter to 2, the pager won't kill off root or daemon processes when paging space gets low. This gives you more opportunity to look around for which process is causing trouble. (Unless your applications run as root :-( ).

Also, if you do run out of paging space, it can be tricky sometimes to execute a clean shutdown, because you cannot get enough memory to run ps to see which processes to kill. If you happen to have a root ksh session, you can execute the following command (which runs entirely in the shell) to find processes to kill.

for x in /proc/*;do echo;echo $x;echo $(< $x/psinfo);done

This produces messy output that often has enough info to identify processes to kill. E.g.,

/proc/52440
???????(_????A? ?N/?/?/?/?vivi +9 /usr/tmp/nn.IfN4qa???Zp-VDS<<OTHER????

/proc/52956
?????__*??????<A? ?___/?/?/?/?ksh-kshV??+R<>OTHER????

/proc/5534
????*A?__ ?X?/?/?/?/?srcmstr/usr/sbin/srcmstr,??S<<OTHER????

/proc/5786
p????,,LA???__/?/?/?/?errdemon/usr/lib/errdemon?? ??S<<OTHER????

/proc/6188
,?,,?0TpA?__?!`?/?/?/?/?xntpd/usr/sbin/xntpdN??'S00OTHER????

The number after the /proc/ is the process ID and the text in the next line(s) of mess is the command name and some arguments. Kill is a shell built-in, so you can use kill to terminate processes until you free up enough memory to run a real ps.

My experience is that if I am going to reboot anyway, and nothing else works, killing srcmstr frees up enough space for a mostly-clean shutdown.

My further experience is that once you run out of paging space, you must reboot. AIX on its own does not recover enough space, no matter how many processes you kill. (True on AIX 5.1. Not sure on 5.2 yet.)

--
Dale Talcott, Rosen Center for Advanced Computing, Purdue University
aeh@quest.cc.purdue.edu http://quest.cc.purdue.edu/~aeh/
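Since recovery after exhaustion is that painful, it may be worth catching the box before paging space runs out. Here is a rough sketch of a threshold check, under assumptions nobody in this thread stated: that `lsps -s` is available and prints a header line followed by one line ending in a figure like `83%`. The function names paging_pct and paging_alert and the threshold of 70 are invented for illustration. Both functions read from stdin, so they can be tried against saved output first.

```shell
#!/bin/sh
# paging_pct
# Read "lsps -s"-style text on stdin and print the percent-used figure as a
# bare integer. Assumed input layout (an assumption, check it on your system):
#   Total Paging Space   Percent Used
#         512MB               83%
paging_pct() {
    awk 'NR > 1 && $NF ~ /^[0-9]+%$/ { sub(/%/, "", $NF); print $NF; exit }'
}

# paging_alert THRESHOLD
# Exit status 0 if the percent used read from stdin is at or above THRESHOLD,
# non-zero otherwise (including when no percentage could be parsed).
paging_alert() {
    pct=$(paging_pct)
    [ -n "$pct" ] && [ "$pct" -ge "$1" ]
}
```

A cron entry along the lines of `lsps -s | paging_alert 70 && ps axv | sort -r -n -k6,6 | head -20` would then capture the biggest processes while there is still enough memory left to run ps at all.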