Hi group,

OS: AIX 5300-05 Service Pack 1
IDS: 10.00.FC5

IHAC that is running into some nasty problems on the latest version of AIX: an IDS instance reports an OS authentication problem for all users trying to connect to the instance (even the informix user). All existing connections seem to get dropped and it becomes impossible to reconnect.

....
10:41:33 WARNING: mt_aio_wait: errno == EINVAL
10:41:33 listener-thread: err = -951: oserr = 2: errstr = userA@hostA: Incorrect password or user userA@hostnameA is not known on the database server.
System error = 2.
....

The problem occurs sporadically and has been seen to start after the instance has been online for between 10 and 25 days. The only known fix is to bounce the instance, but that isn't acceptable for a live 24x7 system.

A call has been raised with Informix support (PMR 02830,019,866 btw for you ambitious IBM techies) and, after applying all the recommended OS patches, we are still having the problem.

We're in the process of applying the next two service packs available for this version of AIX (both were released last month, so I'd guess there are plenty of bugs present), but as this system is to be taken into production soon, it will definitely be a critical problem if it recurs then.

There is now a second instance on the same machine, but that one didn't show the problem the last time it occurred on the first instance, so perhaps this isn't an AIX issue but a purely IDS one.

Has anyone experienced the same type of problem? Tech support doesn't seem to have an answer to it.

RoB
| |||
Phone IBM back and reopen the PMR.
| |||
There was a similar issue years ago: AIX had a memory leak in one of the system calls, causing the db to grow, and after it hit some limit it would barf and die??? Cannot remember the exact details...

Go and beat up AIX support to get an answer.

Superboer.
| |||
Had the same issue with 9.40.FC5 on AIX 5.3. It was an OS ML that fixed the issue. Here is what we have installed for MLs:

All filesets for 5.3.0.0_AIX_ML were found.
All filesets for 5300-01_AIX_ML were found.
All filesets for 5300-02_AIX_ML were found.
All filesets for 5300-03_AIX_ML were found.
Not all filesets for 5300-04_AIX_ML were found.
All filesets for 5300-05_AIX_ML were found.

Hope this helps.
| ||||
>Hello Rob,
>
>It is a known problem. I don't have the PMR & Bug number; it is
>somewhat alleviated by TL05. In short, there is a memory leak
>when calling a specific OS function (crypt), and it shows up in all
>long-running programs that call it (Informix/Db2/...).
>
>You can test it for yourself (as we did) by generating connections
>to the database (doing nothing and exiting as soon as possible,
>as many connections as you can). After about 1.3 Mil connections
>on pre-TL05 and after about 4.2 Mil connections on post-TL05
>you hit this limit.
>
>We are also interested in a more permanent solution - as it is now,
>we still have to bounce the engine once every 3-4 weeks of activity.
>If you get more information about this, please notify me also
>(or post in the newsgroup).
>
>Regards,
>Bogdan BOTEZ.

Bogdan kindly emailed me to point us in the right direction. We contacted Informix support again and there doesn't seem to be a solution to this problem as of yet.

The cause of the problem seems to be related to the total amount of CPU time spent on the msc-VP (last column of certain level then the problem will start occurring. Once identified (by writing a simple script that opens and closes a lot of connections to a database on the server), this could be used as a metric to decide when to bounce the instance (until the AIX blokes finally get around to fixing this).

Thank you all for your input!

RoB
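For anyone wanting to reproduce the connection-churn test described above, here is a minimal Python sketch of the idea, not a tested Informix tool. The names `churn_connections` and `days_until_limit` are made up for illustration, and `connect` is a placeholder for whatever zero-argument function your own database driver gives you to open a connection (no particular Informix driver is assumed). The 1.3 Mil and 4.2 Mil figures are the ones from Bogdan's email; since the trigger may really be msc-VP CPU time and not the connection count, treat the day estimate as a rough guide only.

```python
from typing import Any, Callable, Optional, Tuple


def churn_connections(connect: Callable[[], Any], max_attempts: int,
                      report_every: int = 100_000) -> Tuple[int, Optional[Exception]]:
    """Open and immediately close connections until one fails.

    `connect` is a hypothetical zero-argument callable returning an object
    with a close() method. Returns (successful connections, the exception
    that stopped the run, or None if max_attempts was reached).
    """
    for done in range(max_attempts):
        try:
            conn = connect()
        except Exception as exc:  # e.g. the -951 error once the limit is hit
            return done, exc
        conn.close()  # do nothing with the connection, just release it
        if report_every and (done + 1) % report_every == 0:
            print(f"{done + 1} connections opened and closed")
    return max_attempts, None


def days_until_limit(limit: int, connections_per_day: float, used: int = 0) -> float:
    """Rough number of days before `limit` connections are reached."""
    return max(limit - used, 0) / connections_per_day


if __name__ == "__main__":
    # Limits reported in the thread: ~1.3 Mil pre-TL05, ~4.2 Mil post-TL05.
    # The daily connection rate below is an assumed example value.
    for label, limit in (("pre-TL05", 1_300_000), ("post-TL05", 4_200_000)):
        print(label, days_until_limit(limit, connections_per_day=200_000), "days")
```

Run `churn_connections` against a test instance only; the count it returns when the first connection fails is the number to compare with your normal daily connection rate when deciding how often to bounce.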