ER problem: CDRACK causes rootserver crash (Informix forums, Database Server Software category)

Hello,

I have a problem with the ER root server: it crashes often, about 1-2 times per day. The system is UnixWare 7 with IDS 9.20. The error is:

16:20:23 Informix Dynamic Server 2000 Version 9.20.UC3
16:20:23 Who: Session(92, informix@sco00, 0, 390625404)
Thread(80, CDRACK_1, 17462bd8, 3)
File: mtex.c Line: 408
16:20:23 Results: Exception Caught. Type: MT_EX_OS, Context: mem
16:20:23 Action: Please notify Informix Technical Support.
16:20:23 stack trace for pid 17198 written to /home2/informix/tmp/af.4380c56
16:20:23 See Also: /home2/informix/tmp/af.4380c56
.......
16:20:42 mtex.c, line 408, thread 80, proc id 17198, No Exception Handler.
16:20:42 Fatal error in ADM VP at mt.c:11029
16:20:42 Unexpected virtual processor termination, pid = 17198, exit = 0x100
16:20:43 PANIC: Attempting to bring system down
16:20:43 semctl: errno = 22

The root server always crashes because of CDRACK_0 or CDRACK_1. The problem appeared after I defined new replicates between 2 leaf servers connected directly to the root server.

The architecture is: sco00 is the root server; sco01, sco40, sco42, sco43, sco44, sco45 and sco46 are leaf servers connected directly to sco00.

I defined new primary-target replicates: data is replicated from sco40-sco46 to sco01. These replicates carry a lot of data, 30-50 rows (transactions) per minute. I think this volume of data causes the crash, because I did not have this problem before I defined the new replicates.

Here is part of the onconfig and of the af file generated:

--onconfig
CDR_LOGBUFFERS 16384
CDR_EVALTHREADS 1,2 # evaluator threads (per-cpu-vp,additional)
CDR_DSLOCKWAIT 300 # DS lockwait timeout (seconds)
#CDR_QUEUEMEM 4096 # Maximum amount of memory for any CDR queue (Kbytes)
CDR_QUEUEMEM 16384
CDR_LOGDELTA 30 # % of log space allowed in queue memory
CDR_NUMCONNECT 100 # Expected connections per server
CDR_NIFRETRY 300 # Connection retry (seconds)
CDR_NIFCOMPRESS 5 # Link level compression (-1 never, 0 none, 9 max)

--onstat -g ath
75 1816b1f0 174614d8 2 sleeping secs: 52 3cpu CDRNsT117
77 1807fcf8 17461a98 2 cond wait CDRBlbslp 3cpu CDRBLOB_0
78 182a2178 17462058 2 cond wait CDRBlbslp 3cpu CDRBLOB_1
79 182af1a0 17462618 2 cond wait CDRAckslp 1cpu CDRACK_0
*80 182bc1a0 17462bd8 2 running 3cpu CDRACK_1
81 182c91f0 17463198 2 cond wait CDRDssleep 1cpu CDRD_0
82 182d7178 17463758 2 cond wait CDRDssleep 1cpu CDRD_1
83 182e4178 17463d18 2 cond wait netnorm 1cpu CDRNr46

onstat -g stk 80 light:

Informix Dynamic Server 2000 Version 9.20.UC3 -- On-Line -- Up 1 days 05:11:08 -- 433028 Kbytes

Stack for thread: 80 CDRACK_1
base: 0x182c0018
len: 36864
pc: 0x0856f0eb
tos: 0x182c7dd8
state: running
vp: 3

0x08863d98 (*nosymtab*)0x8863d98

What can I do to avoid this problem? Can I tune any parameters in the onconfig file?

I am also thinking of defining server sco01 as a nonroot server, with sco40..sco46 as leaf servers connected to sco01. Is that a good idea? My expectation is that this architecture avoids replicating from sco40-46 to sco01 through sco00, so the data would be replicated directly and sco00 would not be involved. Is that correct?

Thank you in advance...
Cristian

"cristizaharioiu" <cristizaharioiu@gmail.com> wrote in message news:1130489255.781893.43220@g44g2000cwa.googlegroups.com...
> Hello,
>
> I have a problem with the ER root server: it crashes often, about 1-2
> times per day. The system is UnixWare 7 with IDS 9.20.
> The error is:
>
> 16:20:23 Informix Dynamic Server 2000 Version 9.20.UC3
> 16:20:23 Who: Session(92, informix@sco00, 0, 390625404)
> Thread(80, CDRACK_1, 17462bd8, 3)
> File: mtex.c Line: 408
> 16:20:23 Results: Exception Caught. Type: MT_EX_OS, Context: mem
> 16:20:23 Action: Please notify Informix Technical Support.
> 16:20:23 stack trace for pid 17198 written to /home2/informix/tmp/af.4380c56
> 16:20:23 See Also: /home2/informix/tmp/af.4380c56
> ......
> 16:20:42 mtex.c, line 408, thread 80, proc id 17198, No Exception Handler.
> 16:20:42 Fatal error in ADM VP at mt.c:11029
> 16:20:42 Unexpected virtual processor termination, pid = 17198, exit = 0x100
>
> 16:20:43 PANIC: Attempting to bring system down
> 16:20:43 semctl: errno = 22
>
> The root server always crashes because of CDRACK_0 or CDRACK_1.
> The problem appeared after I defined new replicates between 2 leaf
> servers connected directly to the root server.
>
> The architecture is: sco00 is the root server; sco01, sco40, sco42,
> sco43, sco44, sco45 and sco46 are leaf servers connected directly to sco00.
>
> I defined new primary-target replicates: data is replicated from
> sco40-sco46 to sco01. These replicates carry a lot of data, 30-50
> rows (transactions) per minute. I think this volume of data causes the
> crash, because I did not have this problem before I defined the new
> replicates.

I don't think this is the case. We have customers replicating thousands of transactions a second.

> Here is part of the onconfig and of the af file generated:
>
> --onconfig
> CDR_LOGBUFFERS 16384
> CDR_EVALTHREADS 1,2 # evaluator threads (per-cpu-vp,additional)
> CDR_DSLOCKWAIT 300 # DS lockwait timeout (seconds)
> #CDR_QUEUEMEM 4096 # Maximum amount of memory for any CDR queue (Kbytes)
> CDR_QUEUEMEM 16384
> CDR_LOGDELTA 30 # % of log space allowed in queue memory
> CDR_NUMCONNECT 100 # Expected connections per server
> CDR_NIFRETRY 300 # Connection retry (seconds)
> CDR_NIFCOMPRESS 5 # Link level compression (-1 never, 0 none, 9 max)
>
> --onstat -g ath
>
> 75 1816b1f0 174614d8 2 sleeping secs: 52 3cpu CDRNsT117
> 77 1807fcf8 17461a98 2 cond wait CDRBlbslp 3cpu CDRBLOB_0
> 78 182a2178 17462058 2 cond wait CDRBlbslp 3cpu CDRBLOB_1
> 79 182af1a0 17462618 2 cond wait CDRAckslp 1cpu CDRACK_0
> *80 182bc1a0 17462bd8 2 running 3cpu CDRACK_1
> 81 182c91f0 17463198 2 cond wait CDRDssleep 1cpu CDRD_0
> 82 182d7178 17463758 2 cond wait CDRDssleep 1cpu CDRD_1
> 83 182e4178 17463d18 2 cond wait netnorm 1cpu CDRNr46
>
> onstat -g stk 80 light:
>
> Informix Dynamic Server 2000 Version 9.20.UC3 -- On-Line -- Up 1 days 05:11:08 -- 433028 Kbytes
>
> Stack for thread: 80 CDRACK_1
> base: 0x182c0018
> len: 36864
> pc: 0x0856f0eb
> tos: 0x182c7dd8
> state: running
> vp: 3
>
> 0x08863d98 (*nosymtab*)0x8863d98

We are going to have to get the stack somehow. It might be worth setting AFDEBUG so that the server hangs instead of just crashing. That would make it possible to attach a debugger to the server while it is in the process of crashing and get a stack.

> What can I do to avoid this problem? Can I tune any parameters in the
> onconfig file?

I would try turning off compression. I don't know if that would help, but it's worth a try.

> I am also thinking of defining server sco01 as a nonroot server, with
> sco40..sco46 as leaf servers connected to sco01. Is that a good idea?
> My expectation is that this architecture avoids replicating from
> sco40-46 to sco01 through sco00, so the data would be replicated
> directly and sco00 would not be involved. Is that correct?

sco00 will forward the transactions to sco01. Sco01 may not participate in the replicated tables, but it will participate in the network flow.

> Thank you in advance...
> Cristian
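
Until a full stack can be captured, it may help to see which threads the assertion failures name and which af files were written. The following is a minimal sketch, assuming the message log uses the "Thread(...)" and "See Also:" line formats shown in the post above. The function names list_af_threads and list_af_files are hypothetical helpers for illustration, not Informix utilities.

```shell
#!/bin/sh
# Hypothetical helpers: summarize assertion failures in an Informix message log.
# Assumption: the log lines look like the excerpts quoted in this thread.

# list_af_threads LOGFILE
# Print a count for each thread name found in a "Thread(id, NAME, ...)" entry.
list_af_threads() {
    sed -n 's/.*Thread([0-9]*, *\([A-Za-z0-9_]*\),.*/\1/p' "$1" | sort | uniq -c
}

# list_af_files LOGFILE
# Print each distinct af file path that follows "See Also:".
list_af_files() {
    sed -n 's/.*See Also: *\([^ ]*\).*/\1/p' "$1" | sort -u
}
```

Called with the path to the server's message log, list_af_threads shows whether the failures are confined to CDRACK_0 and CDRACK_1, and list_af_files lists the af files to collect for support.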

cristizaharioiu wrote:
> Thank you Madison,
>
> As a first step I will try turning off compression. Is it necessary
> to turn it off on sco00 only, or on both sco00 and sco01?

Compression is negotiated. That means you only have to set it on one of the nodes.
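
A minimal sketch of the change being discussed, assuming the value meanings given in the onconfig comment quoted earlier in the thread (-1 never, 0 none, 9 max) and assuming sco00 is the node chosen for the edit. The thread does not say whether the new value takes effect without restarting the server, so treat that as something to verify.

```
# onconfig on sco00 (assumed to be the node chosen for the change)
# Previous setting: CDR_NIFCOMPRESS 5
CDR_NIFCOMPRESS -1   # Link level compression (-1 never, 0 none, 9 max)
```

Because compression is negotiated between the two ends of a link, this one setting should be enough for the links that terminate on the edited node.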

Cristian,

In the long term, I would suggest upgrading to at least version 9.4. I had lots of ER crashes at 9.21, but after I was finally able to get to 9.4, ER has been much more robust. Warning: for my configuration it took a lot of work to upgrade my root server to 9.4, but it was worth it.

Daniel