replication between 9.2 and 9.4--help me!! (Informix forums, Database Server Software category)
Hello! I have a question for you. I administer about 80 servers running IDS 9.20 with Enterprise Replication (ER) on UnixWare 7. The manager would like to upgrade the root server to IDS 9.4, but there is a problem with replication between the root server running IDS 9.4 and the other servers running IDS 9.2.

I set the environment variable CDRSITES_92x on the IDS 9.4 server. With only the root server and one leaf server, everything is OK. But as soon as I added another server and changed CDRSITES_92x (for example, export CDRSITES_92x="302,303"), the root server hangs: the oninit process uses 99.9% of the CPU and everything is blocked.

If all the servers are online and connected, replication runs without any problem. The problem appears when the root server restarts after it has been stopped for whatever reason. I found that the CDRStart thread is still running after the server starts:

37 473ace38 46e600d8 2 running 1cpu CDRStart

Is this normal? (On IDS 9.2 this did not happen.)

IBM told us that the latest version, IDS 9.4 UC5, would resolve our problem, but it did not. Is this a known bug?
Output of the top command:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2186 informix  16   0 62824  61m  61m R 99.9 12.2  16:57.02 oninit

onstat -g ath:

tid  tcb       rstcb     prty status               vp-class name
2    473a0018  0         2    sleeping forever     3lio     lio vp 0
3    473ac140  0         2    sleeping forever     4pio     pio vp 0
4    473c1140  0         2    sleeping forever     5aio     aio vp 0
5    473d6140  0         2    sleeping forever     6msc     msc vp 0
6    47403140  0         2    sleeping forever     7aio     aio vp 1
7    474182a0  46e5d018  4    ready                1cpu     main_loop()
8    47418e98  0         2    running              8soc     soctcppoll
9    474408b0  0         2    cond wait arrived    9soc     soctcppoll
10   474578b0  0         3    sleeping forever     1cpu     soctcplst
11   47457cf8  46e5d630  2    sleeping forever     1cpu     flush_sub(0)
12   47457e88  46e5dc48  2    sleeping forever     1cpu     flush_sub(1)
13   47440a40  46e5e260  2    sleeping forever     1cpu     flush_sub(2)
14   47440bd0  46e5e878  2    sleeping forever     1cpu     flush_sub(3)
15   473c18d0  0         2    sleeping forever     10aio    aio vp 2
16   474cd140  0         2    sleeping forever     11aio    aio vp 3
17   474d9140  0         2    sleeping forever     12aio    aio vp 4
18   474ee140  0         2    sleeping forever     13aio    aio vp 5
19   475034b0  46e5ee90  3    sleeping forever     1cpu     aslogflush
20   47503ab8  46e5f4a8  1    ready                1cpu     btscanner 0
35   473acad0  46e5fac0  1    ready                1cpu     sbspclean
36   473c1b68  46e606f0  4    ready                1cpu     onmode_mon
37   473ace38  46e600d8  2    running              1cpu     CDRStart
39   474cd628  46e60d08  2    sleeping forever     1cpu     CDRSchedMgr
40   474cdac0  46e61320  2    ready                1cpu     preDDR
41   474cde08  46e61938  2    ready                1cpu     CDRDTCleaner
42   474d9480  46e61f50  2    cond wait CDRCparse  1cpu     CDRCparse
43   474eed78  46e62568  2    ready                1cpu     CDRGfan
44   473a0518  46e62b80  4    ready                1cpu     CDRGeval0
45   473a0808  46e63198  4    ready                1cpu     CDRGeval1
46   473a0af8  46e637b0  4    ready                1cpu     CDRGeval2
47   477313c0  46e63dc8  2    ready                1cpu     CDRNsT303
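To pick the replication threads out of a long onstat -g ath listing like the one above, a small filter can help. This is a minimal sketch, not an Informix utility: the function name cdr_threads is hypothetical, and it assumes the column layout shown above (tid, tcb, rstcb, prty, status, vp-class, name) and that CDR thread names are a single word.

```shell
#!/bin/sh
# Hypothetical helper (not part of Informix): read "onstat -g ath" output on
# stdin and print "tid status name" for every thread whose name starts with CDR.
# The status can span several words ("sleeping forever", "cond wait CDRCparse"),
# so the name is taken as the last field, the vp-class as the field before it,
# and the status as everything from field 5 up to the vp-class.
cdr_threads() {
    awk '$NF ~ /^CDR/ {
        status = $5
        for (i = 6; i <= NF - 2; i++)
            status = status " " $i
        print $1, status, $NF
    }'
}
```

Usage: onstat -g ath | cdr_threads. On the listing above it would report, among the other CDR threads, the line 37 running CDRStart.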
Do you have a case number? If so, what is it?

What is the OS of the central root server?

What was the bug that tech support said was the cause of the problem? We did run into a problem that might cause such a hang if the CDRSITES_92x environment variable was incorrectly formatted.

cristizaharioiu@hotmail.com (Cristian Zaharioiu) wrote in message news:<929f826d.0410120314.5642fa53@posting.google.com>...
> Is it a known bug?
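Since a badly formatted CDRSITES_92x can reportedly lead to this kind of hang, it may be worth validating the value before starting oninit. This is a minimal sketch, assuming (from the examples in this thread) that the expected form is a plain comma-separated list of numeric server IDs such as 302,303, with no spaces; check_cdrsites is a hypothetical helper, not part of Informix.

```shell
#!/bin/sh
# Hypothetical pre-start check (not part of Informix). Returns 0 if $1 looks
# like a plain comma-separated list of numeric server IDs (e.g. "302,303"),
# 1 otherwise. The expected format is an assumption based on this thread.
check_cdrsites() {
    case $1 in
        '' | ,* | *, | *,,*)
            # empty value, leading or trailing comma, or an empty list item
            return 1 ;;
        *[!0-9,]*)
            # any character other than digits and commas, such as spaces
            # or literal quote characters left in the value
            return 1 ;;
    esac
    return 0
}
```

For example, a startup script could run check_cdrsites "$CDRSITES_92x" || echo "unexpected CDRSITES_92x: $CDRSITES_92x" >&2 before calling oninit.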
mpruet@comcast.net (mpruet) wrote in message news:<40676184.0410121229.52ed3d6c@posting.google.com>...
> Do you have a case number? If so - what is it?
> What is the OS of the central root server?
> What was the bug that tech support said was the cause of the problem?
> We did run into a problem that might cause such a hang if the
> CDRSITES_92x environmental variable was incorrectly formatted.

Our support is provided by IBM Romania, which discussed these problems with IBM Czech Republic, which probably reported this bug to IBM. We were told that the problem will be discussed at CEMA support and that it has the number 408366.

The central root server runs Linux Mandrake 9.2. Today I want to install and configure the root server on Red Hat, although I don't think the problem is the Linux distribution.

I have also observed that the output of onstat -g cdr differs depending on whether I declare CDRSITES_92x with or without double quotes (export CDRSITES_92x=303,304 versus export CDRSITES_92x="303,304").

onstat -g cdr (with CDRSITES_92x=303,304):

IBM Informix Dynamic Server Version 9.40.UC5 -- On-Line -- Up 00:00:40 -- 61704 Kbytes

CDR block 0x475d0688
Mode: 3  ServerId 1  ServerName g_test94  NumConnect 7
InitLock: 0x47403798 <unlocked>  InitCondition 0x47403898
Grouper 0x47658018  Queuer 0x475d0e18  Globcat 0x4746d638
RcvManager (nil)  Network IF 0x47733730  DataSync (nil)
QueueManager 0x475d2688  SchedManager 0x475d0df0  DeleteTable 0x475d0f00
RepScheduler 0x475d0dc8  Pool 0x475d0020  Fault (nil)

Memory usage for pool name CDR: Free 7680 Used 29184

---------------------
GLOBAL-CATALOG CACHE STATISTICS
Replicate Id Sequence # : 0
Grouper Started : NO
Bitunits allocated : 11
Modify count: 0, wait 0
Sync count : 0, wait 0
Sync status: SS_ACTIVE
Abort status: 0

SERVERS
-------------------
Current server : Id 1, Nm g_test94
Last server slot: (0, 3)
# free slots : 0
Broadcast map : <[000d]>
Leaf server map : <[000c]>
Root server map : <[0002]>
Adjacent server map: <[000c]>

Id: 01, Nm: g_test94, Or: 0x0002, off: 0, idle: 0, state Active
  root Id: 00, forward Id: 00, ishub: TRUE, isleaf: FALSE
  subtree map: <[000c]>
  NifInfo: 0x475d0f90
    Last Updated (0) 1969/12/31 19:00:00
    State Local server
    Total sent 0 (0 bytes)
    Total recv'd 0 (0 bytes)
    Retry 0 (0 attempts)
    Connected 0

Id: 303, Nm: g_scot43, Or: 0x0004, off: 0, idle: 0, state Active
  root Id: 01, forward Id: 303, ishub: FALSE, isleaf: TRUE
  NifInfo: 0x4764e050
    Last Updated (1097598409) 2004/10/12 12:26:49
    State Start error   !!!!!
    Total sent 0 (0 bytes)
    Total recv'd 0 (0 bytes)
    Retry 0 (0 attempts)
    Connected 0

Id: 304, Nm: g_scot44, Or: 0x0008, off: 0, idle: 0, state Active
  root Id: 01, forward Id: 304, ishub: FALSE, isleaf: TRUE
  NifInfo: 0x4764e0f0
    Last Updated (1097598409) 2004/10/12 12:26:49
    State Start error   !!!!!
    Total sent 0 (0 bytes)
    Total recv'd 0 (0 bytes)
    Retry 0 (0 attempts)
    Connected 0

-----------------------
Grouper at 0x0x47658018:
Last Idle Time: (1097598424) 2004/10/12 12:27:04
RSAM interface ring buffer size: 528   !!!!!!
RSAM interface ring buffer pending entries: 0
Eval thread interface ring buffer size: 48
Eval thread interface ring buffer pending entries: 0
Log update buffers in use: 0
Max log update buffers used at once: 1
Log update buffer memory in use: 0
Max log update buffer memory used at once: 64

onstat -g cdr (with CDRSITES_92x="303,304"):

IBM Informix Dynamic Server Version 9.40.UC5 -- On-Line -- Up 00:00:57 -- 61704 Kbytes

CDR block 0x475bd688
Mode: 1  ServerId 1  ServerName g_test94  NumConnect 7
InitLock: 0x474036f0 <unlocked>  InitCondition 0x474037f0
Grouper 0x4764f018  Queuer 0x475bde18  Globcat 0x4746ff60
RcvManager (nil)  Network IF 0x4771c730  DataSync (nil)
QueueManager 0x475bf688  SchedManager 0x475bddf0  DeleteTable 0x475bdf00
RepScheduler 0x475bddc8  Pool 0x475bd020  Fault (nil)

Memory usage for pool name CDR: Free 7280 Used 29584

---------------------
GLOBAL-CATALOG CACHE STATISTICS
Replicate Id Sequence # : 0
Grouper Started : NO
Bitunits allocated : 12
Modify count: 0, wait 0
Sync count : 0, wait 0
Sync status: SS_ACTIVE
Abort status: 0

SERVERS
-------------------
Current server : Id 1, Nm g_test94
Last server slot: (0, 3)
# free slots : 0
Broadcast map : <[000d]>
Leaf server map : <[000c]>
Root server map : <[0002]>
Adjacent server map: <[000c]>

Id: 01, Nm: g_test94, Or: 0x0002, off: 0, idle: 0, state Active
  root Id: 00, forward Id: 00, ishub: TRUE, isleaf: FALSE
  subtree map: <[000c]>
  NifInfo: 0x475bdf90
    Last Updated (0) 1969/12/31 19:00:00
    State Local server
    Total sent 0 (0 bytes)
    Total recv'd 0 (0 bytes)
    Retry 0 (0 attempts)
    Connected 0

Id: 303, Nm: g_scot43, Or: 0x0004, off: 0, idle: 0, state Active
  root Id: 01, forward Id: 303, ishub: FALSE, isleaf: TRUE
  NifInfo: 0x47635050
    Last Updated (0) 1969/12/31 19:00:00
    State Disconnected   !!!!!!!!!!
    Total sent 0 (0 bytes)
    Total recv'd 0 (0 bytes)
    Retry 0 (0 attempts)
    Connected 0

Id: 304, Nm: g_scot44, Or: 0x0008, off: 0, idle: 0, state Active
  root Id: 01, forward Id: 304, ishub: FALSE, isleaf: TRUE
  NifInfo: 0x476350f0
    Last Updated (0) 1969/12/31 19:00:00
    State Disconnected   !!!!!!!!!!
    Total sent 0 (0 bytes)
    Total recv'd 0 (0 bytes)
    Retry 0 (0 attempts)
    Connected 0

-------------------
Grouper at 0x0x4764f018:
Change RSAM interface is down   !!!!!!!!!!!!!
RSAM interface ring buffer size: 528
Eval thread interface ring buffer size: 48
Eval thread interface ring buffer pending entries: 0
Updates from Log: 0
Log update links allocated: 512
Blob links allocated: 0
Conflict Resolution Blocks Allocated: 0
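In a POSIX shell (sh, ksh, bash) the double quotes in export CDRSITES_92x="303,304" are removed before the value is exported, so both spellings should hand oninit the identical string 303,304. The snippet below checks this with a child process standing in for oninit; it is plain shell and assumes nothing about Informix. If it reports the same value, the difference between the two onstat -g cdr outputs probably comes from something other than the quoting itself (note also that the two snapshots were taken at different uptimes, 00:00:40 and 00:00:57).

```shell
#!/bin/sh
# Export the variable both ways and let a child process (standing in for
# oninit) print what it actually receives in its environment.
CDRSITES_92x=303,304
export CDRSITES_92x
unquoted=$(sh -c 'printf "%s" "$CDRSITES_92x"')

CDRSITES_92x="303,304"
export CDRSITES_92x
quoted=$(sh -c 'printf "%s" "$CDRSITES_92x"')

# The shell strips the quotes during assignment, so both values match.
if [ "$unquoted" = "$quoted" ]; then
    echo "same value: $quoted"
else
    echo "different: '$unquoted' vs '$quoted'"
fi
```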
The patch that you recently received should have addressed the problem with the CDRSITESxxxx environment variable. I don't think that the CDRSITES issue would have caused the cdrStart thread to go into a loop; I think that hang would have been in one of the NIF threads. I don't know exactly what the cdrStart thread was doing. We did start up a send thread (CDRNsT303), but I don't see any other network threads.

So how do we proceed? I'm totally unavailable right now: I don't have access to my Linux machine and won't for a couple of weeks, so I can't do much personally. That said, there is some extra information that might help.

We need to determine exactly where we are within the engine when the hang occurs. There are two ways to get this information: 1) turn on tracing, and 2) attach with a debugger.

To turn on tracing, you would need to add the following environment variable before running oninit:

export XTRACE="xtrace size 100000:heavy -c XTF_CDR_GCC:heavy -c XTF_CDR_NIF:heavy -c XTF_CDR_RQM"

This causes tracing to start when the engine starts. Once the engine appears to be hung, the trace file can be displayed by running "xtrace fview". (We also need the onstat -g ath output to understand what the threads are.)

Also, we really need to get some stack traces while the server is hung. Since you are running on Linux, you would need to use gdb. It's been a long time since I've used gdb, so I can't remember the exact commands once gdb is active, or exactly how to attach to a running process. If I remember correctly, to attach to a running process you would issue "attach processID", where processID is obtained from onstat -g glo. To interrupt the attached process, you would press <CTRL>-c. Then "bt" displays the stack, "c" continues the process, and "detach" detaches from it. But as I've said, it's been a while since I used gdb, and I'm a bit rusty.
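The two diagnostic steps described above can be sketched as a pair of shell functions. This is a hedged sketch, not a procedure from the post: start_traced and grab_stacks are hypothetical names, and the XTRACE value is copied from the post with a closing quote added. Instead of the interactive attach / <CTRL>-c / bt / detach sequence, it uses gdb's standard non-interactive options (-batch, -p to attach to a PID, -ex to run a command), which print the stack and detach when gdb exits. The PID would be one of the oninit processes listed by onstat -g glo, for example the one that top shows at 99.9% CPU.

```shell
#!/bin/sh
# Hypothetical wrappers around the two steps described in the post.

# 1) Start the engine with CDR tracing enabled. XTRACE must be exported
#    into the environment of the oninit process. Value copied from the post.
start_traced() {
    XTRACE="xtrace size 100000:heavy -c XTF_CDR_GCC:heavy -c XTF_CDR_NIF:heavy -c XTF_CDR_RQM"
    export XTRACE
    oninit
}

# 2) While the engine is hung, take several stack traces of one oninit
#    process a few seconds apart, so a looping code path stands out.
#    Arguments: PID, number of traces (default 3), seconds between (default 5).
grab_stacks() {
    pid=$1
    count=${2:-3}
    interval=${3:-5}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== stack $i of $count, pid $pid, $(date) ==="
        # -batch runs the -ex commands and exits; gdb detaches from an
        # attached process on exit, leaving it running.
        gdb -batch -p "$pid" -ex bt 2>&1
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

For example, grab_stacks 2186 5 10 would take five stack traces of PID 2186, ten seconds apart; the trace buffer itself would still be read with "xtrace fview".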
However, we need to get a feel for what the cdrStart thread is doing when the hang occurs, so we need to get a few stack traces at the point of the hang.