"newbie" <new@new.com> writes:
> Our HP C3600 Unix workstaion always crashes recently, 4 or 5 times everyday.
> which debug tool can be used to analyze core.26.1.gz and vmunix.gz?
Nitpick: when one process crashes and produces a file named "core",
that's a core dump. What you've got is more often called a _crash_
dump: it happens when the entire OS crashes.
You're using HP-UX 10.20, whose official support ended quite a while
ago. Fortunately the procedure has not changed significantly in more
modern versions.
You'll need the "q4" crash dump analysis tool. If you don't have it
installed, it is available in patches, and installing it does not
require a reboot.
Here's an old copy of an HP crash dump analysis document that still
specifies q4 patch numbers for HP-UX 10.20:
http://olearycomputers.com/ll/hpq4.html#full_doc
You'll also need enough disk space to gunzip the crash dump files.
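If you copy the .gz files to a machine that has Python 3 (an
assumption on my part; a stock 10.20 box won't have it), a rough
sketch like the one below can estimate the space needed before you
gunzip. It reads the ISIZE field from the gzip trailer (RFC 1952),
which stores the uncompressed length modulo 2^32, so it is only
reliable for single-member files under 4 GiB. The function names are
my own, purely for illustration.

```python
import shutil
import struct


def gzip_uncompressed_size(path):
    """Return the uncompressed size recorded in the gzip trailer.

    ISIZE is the original length modulo 2**32 (RFC 1952), so the
    result is only trustworthy for single-member files under 4 GiB.
    """
    with open(path, "rb") as f:
        f.seek(-4, 2)  # ISIZE is the last four bytes of the file
        (isize,) = struct.unpack("<I", f.read(4))
    return isize


def enough_room(paths, target_dir):
    """Return (fits, needed_bytes, free_bytes) for the given .gz files."""
    needed = sum(gzip_uncompressed_size(p) for p in paths)
    free = shutil.disk_usage(target_dir).free
    return needed <= free, needed, free
```

For example, enough_room(["core.26.1.gz", "vmunix.gz"], ".") checks
the current directory. The estimate is on the cautious side: gunzip
removes each .gz file after expanding it, so the net extra space used
is somewhat less than the reported figure.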
After a crash and reboot, syslog.log shows only what happened after
the system restarted. The tail end of /var/adm/syslog/OLDsyslog.log
might have been more interesting reading...
> # more syslog.log
> Apr 5 01:17:52 HP001 vmunix: t61622: GPIO TAMS61622 card driver. Copyright
> 1999-2001 TAMS Inc. 970-669-6553
> Apr 5 01:17:52 HP001 vmunix: t61622: INSTALLING the driver... Looking for
> TAMS61622 hardware...
> Apr 5 01:17:52 HP001 vmunix: t60488 driver installed
> Apr 5 01:17:52 HP001 vmunix: t61622[4]: FOUND TAMS61622 hardware.
> Attaching...
> Apr 5 01:17:52 HP001 vmunix: t61622[4]: ATTACHED successfully. Enabling...
> Apr 5 01:17:52 HP001 vmunix: 10/1/4/0 t61622
> Apr 5 01:17:52 HP001 vmunix: T60488: card identified and enabled
> Apr 5 01:17:52 HP001 vmunix: t61622[4]: DMA support: Yes
> Apr 5 01:17:52 HP001 vmunix: t61622[4]: ENABLED successfully. TAMS61622
> ready.
> Apr 5 01:17:52 HP001 vmunix: T60488: card identified and enabled
Hmm... you seem to have some specialized hardware. This is probably
the reason why you're still using HP-UX version 10.20, right?
> Apr 5 01:17:52 HP001 vmunix: WARNING: Insufficient space on dump device to
> save full crashdump.
> Apr 5 01:17:52 HP001 vmunix: Only 1992294400 of 2147484672 bytes will
> be saved.
Ouch. Your dumps may not be complete, and the attempt to analyze them
may fail. To get a complete dump, you would need a bigger primary
swap/dump LV, or a dedicated dump LV at some location that is
accessible by the firmware.
(The firmware stores the crash dump on the designated dump device,
which usually does double duty as the primary swap partition
/dev/vg00/lvol2. When the system is rebooting, the OS sees the dump and
copies it to /var/adm/crash just before activating swap.)
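To put the numbers from that warning in perspective, here is a small
back-of-the-envelope calculation. The two byte counts come straight
from the syslog lines quoted above; the variable names are just mine.

```python
MIB = 1024 * 1024

needed = 2147484672   # bytes a full crash dump would require
saved = 1992294400    # bytes the dump device can actually hold

missing = needed - saved

print(f"dump device holds: {saved / MIB:.0f} MiB")
print(f"full dump needs:   {needed / MIB:.3f} MiB")
print(f"shortfall:         {missing} bytes ({missing / MIB:.1f} MiB)")
print(f"fraction lost:     {100 * missing / needed:.1f}% of the dump")
```

In other words, the dump device is 1900 MiB and the dump is about
148 MiB too big for it, so the primary swap/dump LV would have to
grow by at least that much (plus some headroom) to hold a full dump.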
> Apr 5 01:19:19 HP001 vmunix: /: bad dir ino 19228 at offset 0: mangled
> entry
That looks like your root filesystem may have taken a hit in one of
those crashes. A "full" file system check might be needed.
(Short version: read "man fsck" and "man fsck_vxfs", plan what you
intend to do, then reboot the system to single-user mode. When the
root filesystem is in read-only mode and other FSs are not mounted,
you can safely run fsck on the root filesystem. After that, reboot
again to make sure the kernel is aware of any changes made by the fsck
utility.)
--
Matti.Kurkela@welho.com