Unix Technical Forum

replication between 9.2 and 9.4--help me!!

  #1 (permalink)  
Old 04-20-2008, 07:52 AM
Cristian Zaharioiu
 

Hello!

I have a question for you. I administer about 80 servers running
IDS 9.20 with ER on Unixware 7. Management would like to upgrade the
root server to IDS 9.4, but there is a problem with replication
between the root server running IDS 9.4 and the other servers
running IDS 9.2.

I set the environment variable CDRSITES_92x on the IDS 9.4 server.
With only the root server and one leaf server, everything is OK. But
as soon as I added another server and changed CDRSITES_92x (e.g. export
CDRSITES_92x="302,303"), the root server hung: the oninit process uses
99.9% of the CPU and everything is blocked. :(

If all the servers are online and connected, replication runs without
any problem. The problem appears when the root server is restarted
after having been stopped for whatever reason.
I found that the CDRStart thread is still running after the server
starts:

37 473ace38 46e600d8 2 running 1cpu CDRStart

Is that normal? (It did not happen on IDS 9.2.)


IBM told us that the latest version, IDS 9.40.UC5, would resolve our
problem, but it did not.
Is this a known bug?

Output of the top command:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2186 informix 16 0 62824 61m 61m R 99.9 12.2 16:57.02 oninit

onstat -g ath:

tid  tcb       rstcb     prty  status               vp-class  name
2    473a0018  0         2     sleeping forever     3lio      lio vp 0
3    473ac140  0         2     sleeping forever     4pio      pio vp 0
4    473c1140  0         2     sleeping forever     5aio      aio vp 0
5    473d6140  0         2     sleeping forever     6msc      msc vp 0
6    47403140  0         2     sleeping forever     7aio      aio vp 1
7    474182a0  46e5d018  4     ready                1cpu      main_loop()
8    47418e98  0         2     running              8soc      soctcppoll
9    474408b0  0         2     cond wait arrived    9soc      soctcppoll
10   474578b0  0         3     sleeping forever     1cpu      soctcplst
11   47457cf8  46e5d630  2     sleeping forever     1cpu      flush_sub(0)
12   47457e88  46e5dc48  2     sleeping forever     1cpu      flush_sub(1)
13   47440a40  46e5e260  2     sleeping forever     1cpu      flush_sub(2)
14   47440bd0  46e5e878  2     sleeping forever     1cpu      flush_sub(3)
15   473c18d0  0         2     sleeping forever     10aio     aio vp 2
16   474cd140  0         2     sleeping forever     11aio     aio vp 3
17   474d9140  0         2     sleeping forever     12aio     aio vp 4
18   474ee140  0         2     sleeping forever     13aio     aio vp 5
19   475034b0  46e5ee90  3     sleeping forever     1cpu      aslogflush
20   47503ab8  46e5f4a8  1     ready                1cpu      btscanner 0
35   473acad0  46e5fac0  1     ready                1cpu      sbspclean
36   473c1b68  46e606f0  4     ready                1cpu      onmode_mon
37   473ace38  46e600d8  2     running              1cpu      CDRStart
39   474cd628  46e60d08  2     sleeping forever     1cpu      CDRSchedMgr
40   474cdac0  46e61320  2     ready                1cpu      preDDR
41   474cde08  46e61938  2     ready                1cpu      CDRDTCleaner
42   474d9480  46e61f50  2     cond wait CDRCparse  1cpu      CDRCparse
43   474eed78  46e62568  2     ready                1cpu      CDRGfan
44   473a0518  46e62b80  4     ready                1cpu      CDRGeval0
45   473a0808  46e63198  4     ready                1cpu      CDRGeval1
46   473a0af8  46e637b0  4     ready                1cpu      CDRGeval2
47   477313c0  46e63dc8  2     ready                1cpu      CDRNsT303
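
With this many threads it is easy to miss the one that is spinning. A minimal sketch for picking out the threads that onstat reports as "running" on a CPU virtual processor is shown here. It assumes that onstat -g ath prints one row per thread in the order tid, tcb, rstcb, prty, status, vp-class, name; the function name running_cpu_threads is made up for illustration.

```shell
# Print tid and name of every thread whose status is "running" and whose
# vp-class ends in "cpu". Assumed row layout (one thread per line):
#   tid tcb rstcb prty status vp-class name
running_cpu_threads() {
    awk '$5 == "running" && $6 ~ /cpu$/ {
        name = $7
        for (i = 8; i <= NF; i++) name = name " " $i   # names may contain spaces
        print $1, name
    }'
}

# Typical use on the server (not run here):
#   onstat -g ath | running_cpu_threads
```

Rows whose status has more than one word ("sleeping forever", "cond wait ...") shift the columns, but they never match "running" in the fifth field, so they are simply skipped.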
  #2 (permalink)  
Old 04-20-2008, 07:53 AM
mpruet
 

Do you have a case number? If so - what is it?


What is the OS of the central root server?

What was the bug that tech support said was the cause of the problem?
We did run into a problem that might cause such a hang if the
CDRSITES_92x environment variable was incorrectly formatted.
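
A quick way to rule out a formatting problem is to check the value before starting oninit. The sketch below assumes that the expected format is a plain comma-separated list of numeric group IDs with no spaces, as in the examples in this thread; that format is an assumption, not something taken from the product documentation, and check_cdrsites is a made-up helper name.

```shell
# Return 0 if the argument looks like a comma-separated list of numeric IDs
# (for example 302,303). The accepted format is an assumption based on the
# examples in this thread.
check_cdrsites() {
    case "$1" in
        ''|*[!0-9,]*|,*|*,|*,,*) return 1 ;;   # empty, stray characters, or empty items
        *) return 0 ;;
    esac
}

if check_cdrsites "${CDRSITES_92x:-}"; then
    echo "CDRSITES_92x looks well formed: ${CDRSITES_92x}"
else
    echo "CDRSITES_92x is empty or not a plain list of IDs: '${CDRSITES_92x:-}'" >&2
fi
```

A value such as "302, 303" (with a space) or "302,,303" would be rejected by this check.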




  #3 (permalink)  
Old 04-20-2008, 07:53 AM
Cristian Zaharioiu
 

Our support is provided by IBM Romania, which discussed these
problems with IBM Czech Republic, which probably reported the bug to
IBM. We were told that the problem will be discussed by CEMA support
and that it has the number 408366.

The central root server runs Linux Mandrake 9.2. Today I want to
install and configure the root server on Red Hat, although I don't
think the Linux distribution is the problem.

I have also observed that the output of the onstat -g cdr command
differs depending on whether I set CDRSITES_92x with or without
quotes (export CDRSITES_92x=303,304 versus export
CDRSITES_92x="303,304").

onstat -g cdr (with CDRSITES_92x=303,304 ):

IBM Informix Dynamic Server Version 9.40.UC5 -- On-Line -- Up 00:00:40 -- 61704 Kbytes
CDR block 0x475d0688
Mode: 3
ServerId 1
ServerName g_test94
NumConnect 7
InitLock: 0x47403798 <unlocked>
InitCondition 0x47403898
Grouper 0x47658018
Queuer 0x475d0e18
Globcat 0x4746d638
RcvManager (nil)
Network IF 0x47733730
DataSync (nil)
QueueManager 0x475d2688
SchedManager 0x475d0df0
DeleteTable 0x475d0f00
RepScheduler 0x475d0dc8
Pool 0x475d0020
Fault (nil)
Memory usage for pool name CDR:
Free 7680 Used 29184
---------------------

GLOBAL-CATALOG CACHE STATISTICS

Replicate Id Sequence # : 0
Grouper Started : NO
Bitunits allocated : 11
Modify count: 0, wait 0
Sync count : 0, wait 0
Sync status: SS_ACTIVE Abort status: 0

SERVERS
-------------------
Current server : Id 1, Nm g_test94
Last server slot: (0, 3)
# free slots : 0
Broadcast map : <[000d]>
Leaf server map : <[000c]>
Root server map : <[0002]>
Adjacent server map: <[000c]>
Id: 01, Nm: g_test94, Or: 0x0002, off: 0, idle: 0, state Active
root Id: 00, forward Id: 00, ishub: TRUE, isleaf: FALSE
subtree map: <[000c]>
NifInfo: 0x475d0f90
Last Updated (0) 1969/12/31 19:00:00
State Local server
Total sent 0 (0 bytes)
Total recv'd 0 (0 bytes)
Retry 0 (0 attempts)
Connected 0

Id: 303, Nm: g_scot43, Or: 0x0004, off: 0, idle: 0, state Active
root Id: 01, forward Id: 303, ishub: FALSE, isleaf: TRUE
NifInfo: 0x4764e050
Last Updated (1097598409) 2004/10/12 12:26:49
State Start error !!!!!
Total sent 0 (0 bytes)
Total recv'd 0 (0 bytes)
Retry 0 (0 attempts)
Connected 0

Id: 304, Nm: g_scot44, Or: 0x0008, off: 0, idle: 0, state Active
root Id: 01, forward Id: 304, ishub: FALSE, isleaf: TRUE
NifInfo: 0x4764e0f0
Last Updated (1097598409) 2004/10/12 12:26:49
State Start error !!!!!
Total sent 0 (0 bytes)
Total recv'd 0 (0 bytes)
Retry 0 (0 attempts)
Connected 0
-----------------------

Grouper at 0x0x47658018:
Last Idle Time: (1097598424) 2004/10/12 12:27:04
RSAM interface ring buffer size: 528 !!!!!!
RSAM interface ring buffer pending entries: 0
Eval thread interface ring buffer size: 48
Eval thread interface ring buffer pending entries: 0
Log update buffers in use: 0
Max log update buffers used at once: 1
Log update buffer memory in use: 0
Max log update buffer memory used at once: 64


onstat -g cdr (with CDRSITES_92x="303,304" ):

IBM Informix Dynamic Server Version 9.40.UC5 -- On-Line -- Up 00:00:57 -- 61704 Kbytes
CDR block 0x475bd688
Mode: 1
ServerId 1
ServerName g_test94
NumConnect 7
InitLock: 0x474036f0 <unlocked>
InitCondition 0x474037f0
Grouper 0x4764f018
Queuer 0x475bde18
Globcat 0x4746ff60
RcvManager (nil)
Network IF 0x4771c730
DataSync (nil)
QueueManager 0x475bf688
SchedManager 0x475bddf0
DeleteTable 0x475bdf00
RepScheduler 0x475bddc8
Pool 0x475bd020
Fault (nil)
Memory usage for pool name CDR:
Free 7280 Used 29584
---------------------

GLOBAL-CATALOG CACHE STATISTICS

Replicate Id Sequence # : 0
Grouper Started : NO
Bitunits allocated : 12
Modify count: 0, wait 0
Sync count : 0, wait 0
Sync status: SS_ACTIVE Abort status: 0

SERVERS
-------------------
Current server : Id 1, Nm g_test94
Last server slot: (0, 3)
# free slots : 0
Broadcast map : <[000d]>
Leaf server map : <[000c]>
Root server map : <[0002]>
Adjacent server map: <[000c]>
Id: 01, Nm: g_test94, Or: 0x0002, off: 0, idle: 0, state Active
root Id: 00, forward Id: 00, ishub: TRUE, isleaf: FALSE
subtree map: <[000c]>
NifInfo: 0x475bdf90
Last Updated (0) 1969/12/31 19:00:00
State Local server
Total sent 0 (0 bytes)
Total recv'd 0 (0 bytes)
Retry 0 (0 attempts)
Connected 0

Id: 303, Nm: g_scot43, Or: 0x0004, off: 0, idle: 0, state Active
root Id: 01, forward Id: 303, ishub: FALSE, isleaf: TRUE
NifInfo: 0x47635050
Last Updated (0) 1969/12/31 19:00:00
State Disconnected !!!!!!!!!!
Total sent 0 (0 bytes)
Total recv'd 0 (0 bytes)
Retry 0 (0 attempts)
Connected 0

Id: 304, Nm: g_scot44, Or: 0x0008, off: 0, idle: 0, state Active
root Id: 01, forward Id: 304, ishub: FALSE, isleaf: TRUE
NifInfo: 0x476350f0
Last Updated (0) 1969/12/31 19:00:00
State Disconnected !!!!!!!!!!
Total sent 0 (0 bytes)
Total recv'd 0 (0 bytes)
Retry 0 (0 attempts)
Connected 0

-------------------
Grouper at 0x0x4764f018: Change
RSAM interface is down !!!!!!!!!!!!!
RSAM interface ring buffer size: 528
Eval thread interface ring buffer size: 48
Eval thread interface ring buffer pending entries: 0
Updates from Log: 0
Log update links allocated: 512
Blob links allocated: 0
Conflict Resolution Blocks Allocated: 0
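
On the quoted and unquoted forms compared above: if both were typed in a POSIX shell exactly as shown, the shell removes the quotes and the variable ends up with the same value in both cases, so the quoting itself may not be what changes the onstat output. The sketch below only demonstrates that shell behaviour; it does not explain the difference between the two outputs, and the variable names unquoted and quoted are just for illustration.

```shell
# Both assignments give the variable the same value; the shell strips the quotes.
export CDRSITES_92x=303,304
unquoted=$CDRSITES_92x

export CDRSITES_92x="303,304"
quoted=$CDRSITES_92x

if [ "$unquoted" = "$quoted" ]; then
    echo "identical value: $quoted"
fi

# Dump the exact characters, so stray spaces or literal quote characters
# in the value would be visible.
printf '%s' "$CDRSITES_92x" | od -c
```

If the variable is set somewhere other than an interactive shell (a startup script or a file that is not parsed by the shell), literal quote characters could end up in the value, and the od dump would show them.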
  #4 (permalink)  
Old 04-20-2008, 07:53 AM
mpruet
 

The patch that you recently received should have addressed the problem
with the CDRSITESxxxx environment variable.

I don't think that the issue with CDRSITES would have caused the
CDRStart thread to go into a loop. I think that the hang would have
been in one of the NIF threads.

I don't know exactly what the CDRStart thread was doing. We did start
up a send thread (CDRNsT303), but I don't see any other network
threads.

So how to proceed? I'm totally unavailable right now: I don't have
access to my Linux machine and won't have access to it for a couple of
weeks, so I can't do too much personally.

That said, there is a bit of extra information that might help.

We need to determine exactly where we are within the engine when the
hang occurs. There are two ways to get this information: 1) turn on
tracing, and 2) attach with a debugger.

To turn on tracing, you would need to set the following environment
variable before running oninit:

XTRACE="xtrace size 100000:heavy -c XTF_CDR_GCC:heavy -c XTF_CDR_NIF:heavy -c XTF_CDR_RQMn"

This will cause tracing to be started as the engine is started.

Once the engine appears to be hung, the trace file can be displayed by
running "xtrace fview". (We also need onstat -g ath to understand
what the threads are.)


Also, we really need to get some stack traces while the server is hung.
Since you are running on Linux, you would need to use gdb. It's been
a long time since I've used gdb, so I can't remember the exact
commands.
If I remember correctly, to attach to a running process you would
issue "attach processID", where processID is obtained from onstat -g
glo. To interrupt the attached process, you would press <CTRL>-c.
Then use "bt" to display the stack, "c" to continue the process, and
"detach" to detach from it. But as I've said, it's been a while
since I used gdb, and I'm a bit rusty.

However, we need to get a feel for what the CDRStart thread is doing
when the hang occurs, so we need a few stack traces taken at the point
of the hang.
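
The gdb steps above can also be run non-interactively, which makes it easier to take several snapshots in a row. This is only a sketch: it assumes a gdb recent enough to support the -p, -batch and -ex options, and the function name collect_stacks, the GDB override variable and the output file name are made up for illustration.

```shell
# Take several stack snapshots of one process, a few seconds apart.
# Arguments: pid, number of snapshots (default 3), seconds between them (default 5).
# The GDB variable can point at another debugger binary; it defaults to gdb.
collect_stacks() {
    pid=$1
    count=${2:-3}
    interval=${3:-5}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== snapshot $i of pid $pid ==="
        # -batch makes gdb run the given command and exit; the process is
        # released again when gdb exits.
        "${GDB:-gdb}" -p "$pid" -batch -ex "bt"
        [ "$i" -lt "$count" ] && sleep "$interval"
        i=$((i + 1))
    done
}

# Example, using the oninit PID from the top output in the first post
# (not run here):
#   collect_stacks 2186 3 5 > oninit_stacks.txt 2>&1
```

Several snapshots taken a few seconds apart show whether the process stays in the same function, which is what one would expect from a thread stuck in a loop.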



All times are GMT.