RE: HDR unusable with 9.40 (Informix forums, Database Server Software category)
Ajay,

Is it fixed in 9.40uc5?
When is it supposed to be released?

Are there other known limitations in the 9.40 HDR
implementation (like the requirement for the secondary to be online
during an index build) compared to 7.30, 9.2, 9.3?

P.S. I've just repeated the experiment once again (replication on the secondary was
restored from a Level-0 backup) with exactly the same result:

1. All indexes were dropped successfully;
2. The first index was created on Primary;
3. Replication of that index was started on Secondary; at the same time the next
   index creation was started on Primary;
4. In a few seconds, both servers went into an infinite checkpoint.

No CPU activity, no disk activity on both machines

------------------------------------------
Alexey Sonkin

-----Original Message-----
From: ajaykg68@yahoo.com [mailto:ajaykg68@yahoo.com]

Yes. This is a known bug and is fixed in a later release of 9.40.

-Ajay Gupta

"Madison Pruet" <mpruet@comcast.net> wrote in message
news:<uyOSc.292803$Oq2.232844@attbi_s52>...
> Is there a known bug for this problem???
>
> "Ajay Gupta" <ajaykg68@yahoo.com> wrote in message
> news:26ad1e92.0408120908.57d4d1bb@posting.google.com...
> > If you are transferring a big index to the secondary, most of the buffers on
> > the secondary become dirty, which may be the source of the problem.
> >
> > The workaround for the above problem is to increase the buffers on the secondary
> > and force a checkpoint (onmode -c) between two create index on primary.
> >
> > The message 'index will be unusable on secondary' is just a warning.
> > If the transfer of an index from primary to secondary is aborted, you
> > get this message. The index on the primary is fine and usable. A query
> > on the secondary will not be able to use the index.
> >
> > -Ajay Gupta
> >
> > Alexey Sonkin <alexeis@grandvirtual.com> wrote in message
> > news:<cfee79$18l$1@news.xmission.com>...
> > > Hi, everybody,
> > >
> > > I'm deeply frustrated with HDR quality and implementation in 9.40uc4.
> > >
> > > In fact, 9.40 HDR is totally unusable: by creating a set of indexes over a
> > > big table,
> > > one can easily get the primary into 'forever blocked in checkpoint' and
> > > make the secondary completely unusable (it has to be restored from archive).
> > >
> > > I was trying to create a set of 10 indexes over a rather big table
> > > in an ANSI database on the primary in an HDR-replicated pair.
> > > The first index in the set of 10 indexes was created successfully.
> > >
> > > Soon after the second index creation was started, the primary server
> > > became blocked in a checkpoint. For about a minute after that, the
> > > secondary was trying to write something to disk, then became silent.
> > > Both servers were blocked in a checkpoint.
> > >
> > > After several hours, I stopped the secondary.
> > > The primary became operational immediately.
> > >
> > > After the secondary was started again, index synchronization
> > > (of both indexes) was started (with the proper message in 'online.log')
> > > from scratch.
> > > The first index was synchronized successfully, then the second index sync..
> > > and both servers became blocked in a checkpoint again.
> > >
> > > I repeated the experiment of starting/stopping the secondary several times...
> > > And every time both servers were blocked at the synchronization
> > > of the second index.
> > >
> > > That's it. The secondary became totally unusable.
> > >
> > > I made an attempt to create an index while keeping the secondary offline...
> > > One more interesting result: a message appeared in 'online.log',
> > > indicating that 'index will be unusable on secondary'.
> > > That is, 9.40, unlike 7.x and 9.21, doesn't allow creating
> > > indexes on the primary with the secondary offline.
> > >
> > > HDR is really, really bad in 9.40
> > >
> > > I'm going to file a level-1 bug to IBM/PA
> > >
> > > ------------------------------------------
> > > Alexey Sonkin
> > > Senior Database Administrator
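Ajay's suggested workaround (force a checkpoint with onmode -c between index builds on the primary) could be scripted roughly as below. This is only a sketch under assumptions: the database name, the index statements, the `build_indexes` and `run` helpers and the `DRY_RUN` switch are all made up for illustration, and it assumes `dbaccess <database> -` reading SQL from standard input is available on the primary. It forces a checkpoint after every build (including the last one, which is harmless). The buffer increase on the secondary is a configuration change and is not shown.

```shell
#!/bin/sh
# Sketch: build indexes one at a time on the HDR primary and force a
# checkpoint after each build, so the next build does not start while
# the previous index is still being transferred to the secondary.
# DRY_RUN defaults to 1, so commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_indexes <database> <sql-statement>...
# Each statement is expected to be one complete CREATE INDEX statement.
build_indexes() {
    db="$1"
    shift
    for stmt in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            printf '+ echo "%s" | dbaccess %s -\n' "$stmt" "$db"
        else
            printf '%s\n' "$stmt" | dbaccess "$db" - || return 1
        fi
        # Force a checkpoint on the primary before the next build.
        run onmode -c || return 1
    done
}
```

With the hypothetical names `mydb`, `big_table`, `ix1` and `ix2`, a dry run would look like `build_indexes mydb "CREATE INDEX ix1 ON big_table(col1);" "CREATE INDEX ix2 ON big_table(col2);"`; set `DRY_RUN=0` only after checking the printed commands. Whether this actually avoids the hang on 9.40uc4 is not confirmed anywhere in the thread.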

Hi Alexey,

this is another silly problem that I found in 9.40.UC2 and that also reproduces in 9.30.FC5.
I opened a tech support case with UK IBM, and after more than 10 days of waiting
(I sent them a test case), they answered that they reproduced the problem
under 9.30.UC2 but they didn't reproduce the problem under 9.30.UC3, so they
refused to open a new bug and they didn't give me a bug number.

This only happens if there is no activity on the primary server; see also
my workaround below.

I didn't have time to test it under 9.40.UC3 or later.

Hope this helps.

You will see all the info I sent to IBM below:

New secondary crashes after failover procedure

Testing HDR failover and following the steps from the Informix admin manual,
the new secondary crashes a few seconds after it changes its state to
secondary server. If you try to bring it back up using oninit, it will crash
again with the same error.

I found a workaround for this problem (see below), but if you don't want
to use the workaround, the only way to re-create HDR is applying a new
physical restore.

I tested the same procedure under 9.30.FC5 Solaris 8 and 9.40.UC2 Linux
redhat 9 (kernel 2.4.20-8).

The procedure I followed was:

1. Take primary instance (prdhdr) offline. Go first to quiescent mode and then offline.
2. Execute hdrmkpri.sh prhdr on secondary server (secondary).
3. Execute hdrmksec.sh sechdr on primary server (primary).
4. Bring new primary server back up using oninit -v

which is the same as

Instance A (currently Primary)            Instance B (currently Secondary)
------------------------------            --------------------------------
1] onmode -ky                             (server should be up)
                                          2] hdrmkpri.sh <primary_server_name>
3] hdrmksec.sh <secondary_server_name>
   (now a Secondary server)
                                          4] oninit
                                             (now a Primary server)

Secondary Server online message log

09:04:01 Loading Module <BUILTINNULL>
09:04:06 Dynamically allocated new virtual shared memory segment (size 8192KB)
09:04:06 IBM Informix Dynamic Server Version 9.40.UC2 Software Serial Number AAA#B000000
09:04:06 IBM Informix Dynamic Server Initialized -- Shared Memory Initialized.
09:04:06 DR: Reservation of the last logical log for log backup turned on
09:04:06 Data replication type and state information reset. To start DR, use the 'onmode -d' command and wait for the pair to be operational, before shutting down the database server
09:04:06 Physical Recovery Started at Page (1:1784).
09:04:06 Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored.
09:04:06 Dataskip is now OFF for all dbspaces
09:04:06 Restartable Restore has been ENABLED
09:04:06 Recovery Mode
09:04:08 DR: Reservation of the last logical log for log backup turned off
09:04:08 DR: new type = secondary, primary server name = net940
09:04:08 DR: Trying to connect to primary server = net940
09:04:08 DR: Cannot connect to primary server
09:04:08 DR: Turned off on secondary server
[informix@appst archive]$ sh: line 1: /usr/informix/etc/alarmprogram.sh: No such file or directory
sh: line 1: /usr/informix/etc/alarmprogram.sh: No such file or directory
[informix@appst archive]$ onstat -m
shared memory not initialized for INFORMIXSERVER 'shmsec940'
Message Log File: /informix/940/online.log
Thread(19, dr_secapply, 4620d478, 1)
File: rshdr.c Line: 5497
09:05:01 Results: Dynamic Server must abort
09:05:01 Action: Reinitialize shared memory
09:05:01 stack trace for pid 4789 written to /tmp/af.3fb8b2d
09:05:01 See Also: /tmp/af.3fb8b2d, shmem.3fb8b2d.0
09:05:01 Process exited with return code 127: /bin/sh /bin/sh -c /usr/informix/etc/alarmprogram.sh 3 15 "Data Replication failure." "DR: Turned off on secondary server
09:05:05 rshdr.c, line 5497, thread 19, proc id 4789, DR: Log Record Apply Thread Exited Abnormally. Internal Error. A restart of the database server shall be required to correct this problem. ..
09:05:05 invoke_alarm(): /bin/sh -c '/usr/informix/etc/alarmprogram.sh 5 6 "Internal Subsystem failure: 'MT'" "rshdr.c, line 5497, thread 19, proc id 4789, DR: Log Record Apply Thread Exited Abnormally. Internal Error. A restart of the database server shall be required to correct this problem. .." '
09:05:05 invoke_alarm(): mt_exec failed, status 32512, errno 0
09:05:05 The Master Daemon Died
09:05:05 invoke_alarm(): /bin/sh -c '/usr/informix/etc/alarmprogram.sh 5 6 "Internal Subsystem failure: 'MT'" "The Master Daemon Died" '
09:05:05 invoke_alarm(): mt_exec failed, status 32512, errno 0
09:05:05 PANIC: Attempting to bring system down
[informix@appst archive]$

Primary server online message log

Message Log File: /informix/940/online.log
09:05:02 DR: Failure recovery error (2)
09:05:02 Process exited with return code 127: /bin/sh /bin/sh -c /usr/informix/etc/alarmprogram.sh 3 15 "Data Replication failure." "DR: Local and Remote server type a
09:05:04 Physical Recovery Started at Page (1:1784).
09:05:04 Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored.
09:05:04 DR: Turned off on primary server
09:05:04 Logical Recovery Started.
09:05:04 10 recovery worker threads will be started.
09:05:04 Process exited with return code 127: /bin/sh /bin/sh -c /usr/informix/etc/alarmprogram.sh 3 15 "Data Replication failure." "DR: Turned off on primary server"
09:05:04 DR: Cannot connect to secondary server
09:05:04 Process exited with return code 127: /bin/sh /bin/sh -c /usr/informix/etc/alarmprogram.sh 3 15 "Data Replication failure." "DR: Cannot connect to secondary se
09:05:08 Logical Recovery has reached the transaction cleanup phase.
09:05:08 Logical Recovery Complete. 0 Committed, 0 Rolled Back, 0 Open, 0 Bad Locks
09:05:09 Dataskip is now OFF for all dbspaces
09:05:09 Checkpoint Completed: duration was 0 seconds.
09:05:09 Checkpoint loguniq 155, logpos 0x9018, timestamp: 10928966
09:05:09 Maximum server connections 0
09:05:09 On-Line Mode
[informix@st archive]$ onstat -g dri
IBM Informix Dynamic Server Version 9.40.UC2 -- On-Line (Prim) -- Up 00:00:26 -- 49744 Kbytes

Data Replication:
  Type           State        Paired server        Last DR CKPT (id/pg)
  primary        off          netsec940            155 / 7

  DRINTERVAL   30
  DRTIMEOUT    30
  DRLOSTFOUND  /usr/informix/etc/dr.lostfound

By the way, I found a workaround for this. If you follow this procedure,
the secondary server and HDR will stay up and running.

1. Take primary instance (prdhdr) offline. Go first to quiescent mode and then offline.
2. Execute hdrmkpri.sh prhdr on secondary server (secondary).
3. Bring the instance back up on secondary using oninit -v.
4. Execute onmode -l on sechdr to switch to the next logical log file.
5. Execute onmode -c on sechdr to write a new checkpoint record.
6. Take sechdr offline. Go first to quiescent mode and then offline.
7. Execute hdrmksec.sh sechdr on primary server (primary).
8. Check with onstat -l where the last checkpoint record is on primary.
9. Bring new primary server back up using oninit -v.
10. Execute onstat -m and onstat -g dri to make sure that HDR is up and running.
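The final check in the workaround (running onstat -g dri to make sure HDR is up) can be automated with a small filter. The sketch below is only an illustration: `hdr_state` and `hdr_is_on` are hypothetical helper names, the parsing assumes the two-line "Type State ..." layout shown in the onstat -g dri output above, and treating the state value `on` as "healthy" is an assumption, since the post only shows the `off` case.

```shell
#!/bin/sh
# hdr_state: read `onstat -g dri` output on stdin and print
# "<type> <state>", taken from the line that follows the
# "Type  State  Paired server ..." header line.
hdr_state() {
    awk '$1 == "Type" && $2 == "State" { if ((getline) > 0) print $1, $2; exit }'
}

# hdr_is_on: succeed only if the reported state is "on".
# Assumption: "on" is what a working HDR pair reports.
hdr_is_on() {
    [ "$(hdr_state | awk '{ print $2 }')" = "on" ]
}
```

On a live server this would be used as `onstat -g dri | hdr_state`, or `onstat -g dri | hdr_is_on || echo "HDR is not on"` in a monitoring script. Fed the output shown in the post, `hdr_state` prints `primary off`.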