jimwgramling@gmail.com writes:
> I've got an application built on a pseries machine under AIX 5.3.
> Under certain circumstances (which we have yet to identify
> precisely!), the application may hang, segv, throw a memory error, or
> even a "catastrophe in malloc" error.
All typical symptoms of heap corruption...
> Since restarting just the application isn't enough to resolve the
> problem (even rebuilding the app doesn't do it), it seems like we may
> be looking at some kind of memory corruption issue.
It is extremely likely that you are looking at heap corruption.
> Unfortunately,
> I've never been onsite when the problem crops up (fortunately it's not
> *too* frequent!), so I don't know what was going on with the machine
> (paging, filesystem full, etc.).
It wouldn't have helped anyway -- by the time you observe the
problem, the part of the code that actually did the damage is likely
nowhere to be seen (no longer on the stack).
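
To make that concrete, here is a small sketch of how the damage and the
symptom end up far apart. It is only an illustration: the 16-byte layout and
the names set_name(), set_count() and get_count() are made up for this
example and have nothing to do with your application. The "overflow" is
deliberately kept inside one array so that the demo itself is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout of one 16-byte region:
 * bytes 0..7 hold a name, byte 8 holds a counter. */
enum { NAME_LEN = 8, COUNT_OFF = 8, ARENA_LEN = 16 };

static unsigned char arena[ARENA_LEN];

/* The culprit: trusts len instead of checking it against NAME_LEN. */
void set_name(const char *src, size_t len)
{
    assert(len <= ARENA_LEN);   /* keeps the demo itself inside the array */
    memcpy(arena, src, len);
}

void set_count(unsigned char v) { arena[COUNT_OFF] = v; }

/* The victim: called much later, long after set_name() has returned. */
unsigned char get_count(void) { return arena[COUNT_OFF]; }
```

If set_name() is handed more than 8 bytes, it silently overwrites the
counter. Whoever calls get_count() later sees garbage, and at that point
set_name() is no longer anywhere on the stack.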
> For the thread-safe versions of the reentrant C runtime start-up
> routines -- crt0_r.o, mcrt0_r.o, gcrt0_r.o?
On AIX 5.1, crt0_r.o is a symlink to crt0.o.
Most likely the same is true on AIX 5.3.
> Any helpful suggestions will be greatly appreciated!
You need a heap debugger.
Start with this one:
http://www.redbooks.ibm.com/redbooks...tml/wwhelp.htm
and go to section 4.2.4, "The debug malloc allocator".
Your other options are (commercial) Insure++, Purify and ZeroFault.
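
To give a rough idea of what such tools do for you, here is a minimal sketch
of one common trick: put known guard bytes after each block and check them
later. This is an assumption-laden toy, not how the AIX debug allocator or
any of the products above is actually implemented; guarded_malloc(),
guarded_check() and guarded_free() are hypothetical names invented for this
example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 8
#define GUARD_BYTE 0xAB

/* Header kept in front of the user data; the union pads it so the
 * user pointer stays suitably aligned for common types. */
typedef union {
    size_t size;
    long double pad;
} guard_header;

/* Allocate n usable bytes followed by GUARD_SIZE guard bytes. */
void *guarded_malloc(size_t n)
{
    guard_header *h = malloc(sizeof *h + n + GUARD_SIZE);
    if (h == NULL)
        return NULL;
    h->size = n;
    memset((unsigned char *)(h + 1) + n, GUARD_BYTE, GUARD_SIZE);
    return h + 1;
}

/* Return 1 if the guard bytes after the block are intact, 0 otherwise. */
int guarded_check(const void *p)
{
    const guard_header *h;
    const unsigned char *guard;
    size_t i;

    assert(p != NULL);
    h = (const guard_header *)p - 1;
    guard = (const unsigned char *)p + h->size;
    for (i = 0; i < GUARD_SIZE; i++)
        if (guard[i] != GUARD_BYTE)
            return 0;
    return 1;
}

/* Check the guard, then release the block; returns the check result. */
int guarded_free(void *p)
{
    int ok = guarded_check(p);
    free((guard_header *)p - 1);
    return ok;
}
```

A write one byte past the end of a block from guarded_malloc() lands in the
guard area, so guarded_check() returns 0 for that block. A real heap debugger
goes further and tries to stop at the offending write itself rather than
noticing it afterwards.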
Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.