My client's system has started throwing the following message on the system console (retyped from a description read over the phone):

WARNING: allocb failed - NSTRPAGES exceeded

I read TA116684 on how to debug these failures. Its items on kernel tuning and drivers don't seem applicable, since this just started happening. The remaining items from that list are:

4. Failing hardware

5. External network hardware misbehaving

6. Extremely high network traffic

Items 4, 5, and 6 all seem to be possible candidates.

The client called today after he had simply rebooted the system (as I had advised him to do on previous occasions); the system then rebooted spontaneously about an hour later.

I connected to the running system via SSH, and while I was monitoring it, it rebooted again.

I found the following in /usr/adm/syslog:

Mar 7 14:55:05 vetreal bootpd[1359]: IP address not found: 192.168.160.143
Mar 7 15:00:53 vetreal TLW param1=-1
Fri Mar 7 15:00:53 CST 2008 reboot initated
Mar 7 15:03:44 vetreal syslogd: restart

The two odd things that jump out above are "TLW param1=-1" and "reboot initiated", both at 15:00:53.

Does anyone recognize these two entries, which seem to be related? I've never seen a system log with the message "YYYY reboot initiated".

Checking /usr/adm/syslog:

# grep "2008 reboot initated" /usr/adm/syslog
Mon Jan 28 13:19:20 CST 2008 reboot initated
Sat Feb 16 13:35:05 CST 2008 reboot initated
Mon Feb 25 17:30:05 CST 2008 reboot initated
Wed Mar 5 16:37:54 CST 2008 reboot initated
Fri Mar 7 12:38:13 CST 2008 reboot initated
Fri Mar 7 14:09:16 CST 2008 reboot initated
Fri Mar 7 14:16:23 CST 2008 reboot initated
Fri Mar 7 15:00:53 CST 2008 reboot initated
# grep "2007 reboot initated" /usr/adm/syslog
Tue May 1 00:04:44 CDT 2007 reboot initated
# grep "2006 reboot initated" /usr/adm/syslog
# grep "2005 reboot initated" /usr/adm/syslog
#

The syslog starts on Jan 31, 2005. I see that this has happened before, but never as often as today.

--
Steve Fabac
S.M. Fabac & Associates
816/765-1670
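The separate per-year greps above can be collapsed into one pipeline that counts the marker lines by year. This is a minimal sketch, assuming the date(1)-style line format shown in the excerpt (the year is the sixth whitespace-separated field) and searching for the log's own misspelling "initated"; the function name reboot_counts is hypothetical.

```sh
# Hypothetical helper: count "reboot initated" marker lines per year.
# Assumes lines shaped like
#   Fri Mar  7 15:00:53 CST 2008 reboot initated
# where awk's default whitespace splitting puts the year in field 6.
reboot_counts() {
    log=${1:-/usr/adm/syslog}
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }

    # Keep only the marker lines, extract the year, then count
    # identical years (sort groups them so uniq -c can count them).
    grep 'reboot initated' "$log" | awk '{ print $6 }' | sort | uniq -c
}
```

Run as `reboot_counts /usr/adm/syslog`; it prints one line per year, each prefixed with its count in the usual uniq -c style, and should agree with the totals from the individual greps.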
Steve M. Fabac, Jr. wrote:
> I found the following in /usr/adm/syslog:
> Mar 7 14:55:05 vetreal bootpd[1359]: IP address not found: 192.168.160.143
> Mar 7 15:00:53 vetreal TLW param1=-1
> Fri Mar 7 15:00:53 CST 2008 reboot initated
> Mar 7 15:03:44 vetreal syslogd: restart
>
> Does anyone recognize these two entries that seem to be related?

Is this on an HP or Compaq system, with the full EFS installed?

I've seen similar things from cpqmon, the EFS health monitor, but nothing matching that particular entry. Do you have some other hardware monitor installed?

Is 'initiated' really misspelled 'initated' that way? It might be worthwhile to run strings on binaries, looking for 'reboot' or that particular misspelling of 'initiated'.

--
----------------------------------------------------
Pat Welch, UBB Computer Services, a WCS Affiliate
SCO Authorized Partner
Microlite BackupEdge Certified Reseller
Unix/Linux/Windows/Hardware Sales/Support
(209) 745-1401 Cell: (209) 251-9120
E-mail: patubb@inreach.com
----------------------------------------------------
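The string hunt suggested above can be run across whole directory trees in one pass. This is a minimal sketch, assuming a POSIX find and grep. It uses grep -l, which prints only the names of matching files, rather than strings piped to grep, so plain shell scripts are found as well; most grep implementations also report a match inside a binary file. The function name find_string and the directories in the usage line are illustrative, not known locations.

```sh
# Hypothetical helper: list regular files under the given directories
# that contain a literal-ish pattern (e.g. the misspelled "initated").
find_string() {
    [ $# -ge 2 ] || { echo "usage: find_string pattern dir..." >&2; return 2; }
    pattern=$1
    shift

    # -type f skips directories and device files; -e protects patterns
    # that start with "-"; errors from unreadable files are discarded.
    find "$@" -type f -exec grep -l -e "$pattern" {} \; 2>/dev/null
}
```

For example: `find_string 'reboot initated' /etc /usr/local /opt`. Searching for the misspelling rather than for "reboot" keeps the result list short, because the misspelled word is unlikely to appear anywhere else.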
On Mar 7, 10:02 pm, "Steve M. Fabac, Jr." <smfa...@att.net> wrote:
> WARNING: allocb failed - NSTRPAGES exceeded
>
> I read TA116684 on how to debug failures and the
> items on kernel tuning and drivers don't seem to be
> applicable as this just started happening.

You might look at http://aplawrence.com/SCOFAQ/FAQ_scotec1haltcatch.html, which includes some suggestions Bela made way back when.
Pat Welch wrote:
> Steve M. Fabac, Jr. wrote:
>> Mar 7 15:00:53 vetreal TLW param1=-1
>> Fri Mar 7 15:00:53 CST 2008 reboot initated
>>
>> Does anyone recognize these two entries that seem to be related?
>
> Is this on an HP or Compaq system, with the full EFS installed?
>
> I've seen similar things from cpqmon, the EFS health monitor.
>
> But nothing matching that particular entry. Do you have some other HW
> monitor installed?
>
> Is 'initiated' really misspelled 'initated' that way? Might be
> worthwhile to run strings on binaries looking for 'reboot' or that
> particular mis-spelling of initiated.

That was a good suggestion. I found the offending script, ccs in /opt/K/1776/FF2/1.0.2W/cntl/packages/FF2 (the vi status line and :!pwd output are at the end):

#
# Purpose:
#    To install and umpgrade the Fault-Freedom II Driver
# Description:
#    arg 1: Name of step to perform.
#    arg 2: Keyword list, e.g. UPGRADE.
#    arg 3: space-separated list of packages.
#*****************************************************************
LOGFILE=/tmp/ff2install.log
INSTALL_DIR=/usr/local/ff2
HALTSYS=`l -Wv /etc/haltsys | awk '{ print $11 }'`
SHUTDOWN=`l -Wv /etc/shutdown | awk '{ print $11 }'`
INITFILE=`l -Wv /etc/inittab | awk '{ print $11 }'`
INITBASE=/etc/conf/cf.d/init.base
....
###############
# /etc/reboot #
###############
grep Ff2 ${HALTSYS} > /dev/null 2>&1
if [ "$?" = "1" ]
then
    ex_cmd cp ${HALTSYS} /etc/haltsys.preff2
    sed -e '/^haltsys/a\
[ -x /usr/local/ff2/bin/Ff2 ] && \
{\
/usr/local/ff2/bin/Ff2 shutdown\
DATE=\`date\`\
echo "${DATE} reboot initated" >> /usr/adm/syslog\
}' < /tmp/haltsys$$ > ${HALTSYS}
    ex_cmd rm /tmp/haltsys$$
    ex_cmd chmod 700 ${HALTSYS}
    ex_cmd chown root ${HALTSYS}
    ex_cmd chgrp sys ${HALTSYS}
fi

"ccs" line 250 of 649 --38%--
:!pwd
/opt/K/1776/FF2/1.0.2W/cntl/packages/FF2

Fault-Freedom II (FF2) from 1776 Software is installed but not running (there is no heartbeat connection). It's a long story. When I was called in, in July 2001, the client was trying to get FF2 working, but whenever they tried to fail over manually to the backup server, the system would lock up. So FF2 was left installed, but with its daemons disabled from starting. I wrote scripts to switch identities between the two machines manually and to mirror the data directories overnight.

Now I have to find out what is still running, and why it has suddenly shut the server down four times in one day.

What we did find was that the Cisco switches were showing a lot of chatter. After hours, the client powered the switches down and back on, and the chatter subsided. Running netstat -m today, after the switches were power-cycled on Friday, shows all zeros in the fail column; before, several buffer types were showing 300+ failures. So it looks like

>> 5. External network hardware misbehaving

is the correct assessment.

--
Steve Fabac
S.M. Fabac & Associates
816/765-1670
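To confirm that the FF2 hook is what writes the "reboot initated" marker, the patched file can be checked directly. This is a minimal sketch, assuming the defaults implied by the installer excerpt: the patched file is whatever /etc/haltsys resolves to (the installer resolved it with l -Wv, so pass the real path if /etc/haltsys is a link), and the pre-install copy is /etc/haltsys.preff2. The function name show_ff2_hook is hypothetical.

```sh
# Hypothetical helper: show the Fault-Freedom II hook lines in a haltsys
# script and, if the installer's backup exists, what was added to it.
# Default paths are taken from the FF2 installer excerpt (assumption).
show_ff2_hook() {
    target=${1:-/etc/haltsys}
    backup=${2:-/etc/haltsys.preff2}
    [ -r "$target" ] || { echo "cannot read $target" >&2; return 2; }

    # grep -n prints matching lines with their line numbers; a non-zero
    # status means neither the Ff2 call nor the logging line is present.
    grep -n -e 'Ff2' -e 'reboot initated' "$target" || {
        echo "no FF2 hook found in $target"
        return 1
    }

    # The installer copies the original file to the backup before
    # patching it, so diff shows exactly what was inserted. diff exits 1
    # when the files differ, which is the expected case here.
    if [ -f "$backup" ]; then
        echo "--- changes relative to $backup:"
        diff "$backup" "$target" || :
    fi
    return 0
}
```

A zero exit status means the hook is present. Because the hook runs /usr/local/ff2/bin/Ff2 shutdown and then logs the marker, each "reboot initated" line suggests the reboot went through that script rather than being a bare hardware reset, though that reading is an inference from the excerpt only.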