Doesn't seem like a network issue to me. I've never seen anything like this, but my impression was that it could not affinitize, i.e. it could not grab a resource that it wanted, in this case a CPU. Rebooting cleared it up, so IMHO something was hanging onto the CPU that prevented IDS from grabbing it.

cheers
j.

Sane ego te vocavi. Forsitan capedictum tuum desit.

-----Original Message-----
From: informix-list-bounces@iiug.org [mailto:informix-list-bounces@iiug.org] On Behalf Of Omar Muñoz
Sent: Sunday, February 17, 2008 9:20 PM
To: david@smooth1.co.uk; informix-list@iiug.org
Subject: Re: Help about server which hangs during restart

Hi.

At first I thought something similar to Norma, so I edited sqlhosts to use IP addresses instead of names to avoid resolving them, but that didn't work, and anyway everything is on the same machine.

You're right about the shared memory dump, David. My dump partition didn't get full, but since shared memory is pretty big (13 GB in total) I don't think I had large file support on it. I'm going to talk about it with the OS administrator. Anyway, the assert failure happened before that.

Sadly, I didn't run any onstat monitoring at that time, but I guess some information might be in the af file. What should I look for?

I'm going to do some work on the patch you sent me, even though, according to the link, this issue supposedly affects Solaris 8 and we have Solaris 9.

Thanks a lot to everyone

Omar Munoz

--- "david@smooth1.co.uk" <david@smooth1.co.uk> wrote:

> On 17 Feb, 17:42, "Sebastian, Norma J." <NormaJean.Sebast...@tellabs.com> wrote:
> > Did you do anything about the network messages?
> > Informix told you there were network problems at 8:27AM, and didn't crash until 10AM.
> > Did you try to fix or figure out the network problem in that 1.5 hours before informix gave up?
> > My guess is you have some important part of informix on a shared/network drive and informix could not get to it.
> >
> > -----Original Message-----
> > From: informix-list-boun...@iiug.org [mailto:informix-list-boun...@iiug.org] On Behalf Of Omar Muñoz
> > Sent: Sunday, February 17, 2008 11:35 AM
> > To: informix-list-boun...@iiug.org; informix-l...@iiug.org
> > Subject: Help about server which hangs during restart
> >
> > Hi.
> >
> > Once, when we tried to bring a database online after we had shut it down without problems as part of a weekly task, the database hung for about an hour. Online.log looked like this:
> >
> > 08:19:13  On-Line Mode
> > 08:19:13  Affinitied VP 1 to phys proc 4
> > 08:27:11  VP Notify mechanism incomplete after 5 minutes. This can be due to slow network file access. Will try 12 more times
> > 08:35:11  VP Notify mechanism incomplete after 5 minutes. This can be due to slow network file access. Will try 11 more times
> > 08:43:09  VP Notify mechanism incomplete after 5 minutes. This can be due to slow network file access. Will try 10 more times
> > 08:51:08  VP Notify mechanism incomplete after 5 minutes. This can be due to slow network file access. Will try 9 more times
> >
> > After that, the database crashed, and online.log showed:
> >
> > 09:54:56  notifyvp(): vp 3, pid 22717 of class 0 didn't rcv
> > 09:54:56  notifyvp(): vp 4, pid 22718 of class 0 didn't rcv
> > 09:54:56  notifyvp(): vp 5, pid 22719 of class 0 didn't rcv
> > 09:54:56  notifyvp(): vp 6, pid 22720 of class 0 didn't rcv
> > 09:54:56  notifyvp(): vp 7, pid 22721 of class 0 didn't rcv
> > 09:54:56  notifyvp(): vp 8, pid 22722 of class 0 didn't rcv
> > 09:54:56  notifyvp(): vp 9, pid 22723 of class 0 didn't rcv
> > 09:54:56  Assert Failed: mt_notifyvp timed out
> > 09:54:56  IBM Informix Dynamic Server Version 9.40.FC7
> > 09:54:56   Who: Session(1, informix@orion, 0, 35930e028)
> >                 Thread(7, main_loop(), 3592cc028, 1)
> >                 File: mt.c Line: 11121
> > 09:54:56  stack trace for pid 22637 written to /respaldo2/informix/tmp/siisa/af.3ef58cf
> > 09:54:56   See Also: /respaldo2/informix/tmp/siisa/af.3ef58cf, shmem.3ef58cf.0
> > 09:56:12  Error writing '/respaldo2/informix/tmp/siisa/shmem.3ef58cf.0' errno = 28
> > 09:56:12  mt.c, line 11121, thread 7, proc id 22637, mt_notifyvp timed out.
> > 09:56:14  The Master Daemon Died
> > 09:56:15  PANIC: Attempting to bring system down
> >
> > We were finally able to bring informix online only by restarting the entire server (a Sun server with Solaris 9). Even so, I'd like to ask why this kind of error happens. I'm talking about an informix 9.40 FC7 database.
> >
> > Thanks in advance
> >
> > Omar Munoz
>
> No Norma, this does not have to be a network issue, you are wrong. All it means is that the VPs timed out communicating. Perhaps one of the threads was hung in an OS call or there is an Informix bug? DO NOT MAKE ASSUMPTIONS.
>
> There is a known vp notify issue in 9.40.UC7
>
> http://www-1.ibm.com/support/docview...&context=SSHPYE&dc=D600&uid=swg21233887&loc=en_US&cs=utf-8&lang=en
>
> Patch to 9.40.UC7W1X1 to fix. PS When will 9.40.xC8 be out?
>
> Omar, there could be some other issue. Who knows, it could be a hardware issue, an OS issue or an Informix issue.
>
> You have DUMPSHMEM enabled, but if you check /usr/include/sys/errno.h on your system you will find errno 28 is no more space. There is either not enough space under "/respaldo2/informix/tmp/siisa/shmem" to dump shared memory, or it is trying to write a file >2Gb on a file system that is not largefile enabled.
>
> What did onstat -g stk all give? Did you strace/truss the server pids?
> Were there any issues in the OS logs?
>
> You need to do more investigation when the problem happens.

_______________________________________________
Informix-list mailing list
Informix-list@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list
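As a rough illustration of the errno 28 point made in this post, here is a minimal Python sketch. It looks up what errno 28 means on the local system and checks the two explanations offered in the thread: not enough free space in the dump directory, or a dump larger than the 2 GB limit of a file system without large file support. The function names are hypothetical, and the free-space check cannot tell whether a file system is actually largefile enabled; that still has to be confirmed with the OS administrator.

```python
import errno
import os
import shutil

GIB = 1024 ** 3
# Largest file size on a file system without large file support (2^31 - 1 bytes).
SMALL_FILE_LIMIT = 2 ** 31 - 1


def describe_errno(code: int) -> str:
    """Return the symbolic name and the OS message for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def dump_fits(dump_dir: str, shm_bytes: int) -> bool:
    """True if dump_dir currently has at least shm_bytes of free space."""
    return shutil.disk_usage(dump_dir).free >= shm_bytes


def exceeds_small_file_limit(shm_bytes: int) -> bool:
    """True if a single dump file of this size needs large file support."""
    return shm_bytes > SMALL_FILE_LIMIT
```

For the case in this thread one would call `describe_errno(28)`, `dump_fits("/respaldo2/informix/tmp/siisa", 13 * GIB)` and `exceeds_small_file_limit(13 * GIB)`. The 13 GB figure is the total shared memory size Omar quoted, so a single dump of that size would need large file support regardless of free space.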
| ||||
On 18 Feb, 02:49, "Jack Parker" <jack.park...@verizon.net> wrote:
> Doesn't seem like a network issue to me. I've never seen anything like
> this, but my impression was that it could not affinitize. I.e. could not
> grab a resource that it wanted - in this case a CPU. Rebooting cleared it
> up, so IMHO, something was hanging onto the CPU that prevented IDS from
> grabbing it
>
> cheers
> j.
>
> [snip]

"impression was that it could not affinitize." ?? If the system calls to pin a CPU VP onto a CPU fail, then I would expect an error in the online.log.

"something was hanging onto the CPU that prevented IDS from grabbing it". What? I've never heard such complete bollocks! Nothing can "hang on" to a CPU.

I expect either an OS bug or an Informix bug (SPARC Solaris normally reports hardware issues pretty reliably to the OS logs). Without more info we cannot tell which.
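One way to start gathering that extra information is to pull the relevant events out of online.log. The following is a minimal Python sketch, not an Informix tool: it assumes the log lines have already been unwrapped (the excerpt in this thread wraps them at 80 columns) and that they follow the message formats quoted in the original post. The function names and the `SAMPLE` list are illustrative only.

```python
import re
from datetime import datetime

# Patterns based on the online.log messages quoted in this thread.
NOTIFY_RETRY = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\s+VP Notify mechanism incomplete after \d+ minutes\."
    r".*Will try (\d+) more times"
)
VP_NO_ACK = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\s+notifyvp\(\): vp (\d+), pid (\d+) of class (\d+) didn't rcv"
)


def summarize(lines):
    """Collect (time, retries_left) pairs and the VPs that never acknowledged."""
    retries, unacked = [], []
    for line in lines:
        m = NOTIFY_RETRY.match(line)
        if m:
            retries.append((m.group(1), int(m.group(2))))
            continue
        m = VP_NO_ACK.match(line)
        if m:
            unacked.append({"time": m.group(1), "vp": int(m.group(2)),
                            "pid": int(m.group(3)), "vp_class": int(m.group(4))})
    return retries, unacked


def gaps_in_seconds(times):
    """Seconds between consecutive HH:MM:SS timestamps on the same day."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


SAMPLE = [
    "08:27:11  VP Notify mechanism incomplete after 5 minutes. This can be due to slow network file access. Will try 12 more times",
    "08:35:11  VP Notify mechanism incomplete after 5 minutes. This can be due to slow network file access. Will try 11 more times",
    "08:43:09  VP Notify mechanism incomplete after 5 minutes. This can be due to slow network file access. Will try 10 more times",
    "08:51:08  VP Notify mechanism incomplete after 5 minutes. This can be due to slow network file access. Will try 9 more times",
    "09:54:56  notifyvp(): vp 3, pid 22717 of class 0 didn't rcv",
    "09:54:56  notifyvp(): vp 4, pid 22718 of class 0 didn't rcv",
]
```

Running `summarize(SAMPLE)` gives the retry countdown and the pids of the VPs that did not respond. Those pids are the ones to point truss at and to look for in the `onstat -g stk all` output while the hang is still in progress.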