IDS 9.21.UC4
Solaris 2.6

EDS takes our backup using onbar. What we have observed is that onbar hangs quite frequently. By "hangs" I mean that the onbar processes keep running indefinitely. Since the backup is taken daily, all subsequent backups also hang until the earlier one is killed.

I have a script that looks for onbar processes 15 hours after the backup starts and, if it finds any, sends a mail as follows:

=========================================
onbar is still running on flx10
root 15709 15703 0 00:50:14 ? 0:00 /bin/sh /opt/informix/bin/onbar -b -L 0
root 15878 15711 0 00:51:14 ? 0:00 /opt/informix/bin/onbar_d -b -L 0
root 15879 15711 0 00:51:14 ? 0:00 /opt/informix/bin/onbar_d -b -L 0
root 15873 15711 0 00:51:14 ? 0:00 /opt/informix/bin/onbar_d -b -L 0
root 15872 15711 0 00:51:14 ? 0:00 /opt/informix/bin/onbar_d -b -L 0
root 15871 15711 0 00:51:14 ? 0:00 /opt/informix/bin/onbar_d -b -L 0
root 15874 15711 0 00:51:14 ? 0:00 /opt/informix/bin/onbar_d -b -L 0
root 15711 15709 0 00:50:14 ? 0:01 /opt/informix/bin/onbar_d -b -L 0
=========================================

The sysadmin then kills the processes listed above.

Once they kill the client process, we get this message in online.log:

=======================================
15:04:10 Archive on logs02 ABORTED.
15:04:10 Aborted by client.
15:04:10 Archive on logs03 ABORTED.
15:04:10 Aborted by client.
15:04:10 Archive on logs04 ABORTED.
15:04:10 Aborted by client.
15:04:10 Archive on physdbs ABORTED.
15:04:10 Aborted by client.
15:04:11 Archive on fdbs04 ABORTED.
15:04:11 Aborted by client.
15:04:11 Archive on gdbs01 ABORTED.
15:04:11 Aborted by client.
=======================================

In other words, the backup is useless.

Earlier on, it used to happen once or twice a month. Now it is once or twice a week. What could be wrong? Why does onbar hang for 15 hours without doing anything?
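The check described in the post (look for leftover onbar processes and report them) could be sketched roughly as follows. This is not the poster's script, which is not shown in the thread; it is a minimal Python illustration that assumes `ps -ef` output in the eight-column layout quoted above. The names `find_onbar_processes` and `SAMPLE_PS` are made up for the example, and the header and cron lines in the sample are invented filler.

```python
import os

# Sample text in the `ps -ef` layout quoted in the post.
# The header row and the cron row are illustrative additions.
SAMPLE_PS = """\
     UID   PID  PPID  C    STIME TTY      TIME CMD
    root 15709 15703  0 00:50:14 ?        0:00 /bin/sh /opt/informix/bin/onbar -b -L 0
    root 15711 15709  0 00:50:14 ?        0:01 /opt/informix/bin/onbar_d -b -L 0
    root 15878 15711  0 00:51:14 ?        0:00 /opt/informix/bin/onbar_d -b -L 0
    root   312     1  0 00:10:02 ?        0:00 /usr/sbin/cron
"""

ONBAR_NAMES = ("onbar", "onbar_d")


def find_onbar_processes(ps_output):
    """Return (pid, ppid, command) for each onbar/onbar_d row in `ps -ef` text.

    Assumes eight whitespace-separated columns with the command last.
    Rows whose STIME is printed as a date with a space would shift the
    columns and are not handled by this sketch.
    """
    matches = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)
        if len(fields) < 8 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # header row or a line in an unexpected layout
        command = fields[7]
        # The wrapper runs as "/bin/sh .../onbar", so check the first two words.
        words = command.split()[:2]
        if any(os.path.basename(word) in ONBAR_NAMES for word in words):
            matches.append((int(fields[1]), int(fields[2]), command))
    return matches


for pid, ppid, command in find_onbar_processes(SAMPLE_PS):
    print(pid, ppid, command)
```

In a real check the text would come from running `ps -ef` on the database host, and a non-empty result after the expected backup window would trigger the mail.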
| |||
Are you running parallel backups and, if yes, do you have enough tape heads in the storage device? What else is using the storage device at the time this is running?

rkusenet wrote:
> IDS 9.21.UC4
> Solaris 2.6
>
> EDS takes our backup using onbar. What we have observed is that,
> quite frequently onbar hangs.
<SNIP>

--
Paul Watson                #
Oninit Ltd                 # Growing old is mandatory
Tel: +44 1436 672201       # Growing up is optional
Fax: +44 1436 678693       #
Mob: +44 7818 003457       #
www.oninit.com             #
| |||
"Brice Avila" <briceavila@hotmail.com> wrote in message news:c46e3ec5.0308071049.50bec301@posting.google.com...
> What is happening in the BAR_ACT_LOG, and with the storage manager?
> The multiple onbar processes for a parallel backup are normal, one for
> each dbspace. You'll notice that 15709 spawned 15711, and then 15711
> spawned several dbspace onbar backup processes. The online.log you
> provided shows a list of the dbspace backups that are killed when
> the onbar processes are killed: logs02, logs03, logs04, physdbs,
> fdbs04, gdbs01. From the information you provided, things appear OK.
>
> If 15 hours is too long for a backup (there are some customers I know
> that would like a backup as short as 15 hours) then you'll need help
> from your storage manager vendor to see where the bottlenecks are.
>
> Hope this information helps.
>
> Brice Avila
> Minneapolis, Minnesota

Brice,

There is definitely some problem. onbar never completes, and it also screws up future onbar backups unless it is killed. The problem seems to be with multiple onbar processes.

XBSA-1.0.1 6.0.Build.153 15872 Wed Aug 6 00:52:17 2003 _nwbsa_is_retryable_error: received a retryable network error (Severity 0 Number -13): current request exceeds parallelism
XBSA-1.0.1 6.0.Build.153 15873 Wed Aug 6 00:52:17 2003 _nwbsa_is_retryable_error: received a retryable network error (Severity 0 Number -13): current request exceeds parallelism
XBSA-1.0.1 6.0.Build.153 15871 Wed Aug 6 00:52:17 2003 _nwbsa_is_retryable_error: received a retryable network error (Severity 0 Number -13): current request exceeds parallelism
XBSA-1.0.1 6.0.Build.153 15874 Wed Aug 6 00:52:19 2003 nwbsa_is_retryable_error: received a retryable network error (Severity 0 Number -13): current request exceeds parallelism

I have reduced BAR_MAX_BACKUP to 4. Yesterday the backup went fine, though slowly. I will wait a few more weeks to see whether it solves the problem.

Ravi
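To see which backup processes hit the "exceeds parallelism" error, entries like the ones quoted above can be tallied per process ID. The sketch below is only an illustration in Python: it assumes each log entry sits on a single line in exactly the layout pasted in the post (the real log file may wrap entries differently), and `XBSA_LINE`, `SAMPLE_LOG`, and `parallelism_errors` are names invented for the example.

```python
import re
from collections import Counter

# Layout assumed from the pasted entries:
# <XBSA version> <build> <pid> <timestamp> <function>: <message>
XBSA_LINE = re.compile(
    r"^XBSA-\S+\s+\S+\s+(?P<pid>\d+)\s+"
    r"(?P<when>\w{3} \w{3}\s+\d+ \d\d:\d\d:\d\d \d{4})\s+"
    r"(?P<func>[^\s:]+):\s+(?P<msg>.*)$"
)

# The four entries quoted in the post, one per line.
SAMPLE_LOG = """\
XBSA-1.0.1 6.0.Build.153 15872 Wed Aug 6 00:52:17 2003 _nwbsa_is_retryable_error: received a retryable network error (Severity 0 Number -13): current request exceeds parallelism
XBSA-1.0.1 6.0.Build.153 15873 Wed Aug 6 00:52:17 2003 _nwbsa_is_retryable_error: received a retryable network error (Severity 0 Number -13): current request exceeds parallelism
XBSA-1.0.1 6.0.Build.153 15871 Wed Aug 6 00:52:17 2003 _nwbsa_is_retryable_error: received a retryable network error (Severity 0 Number -13): current request exceeds parallelism
XBSA-1.0.1 6.0.Build.153 15874 Wed Aug 6 00:52:19 2003 nwbsa_is_retryable_error: received a retryable network error (Severity 0 Number -13): current request exceeds parallelism
"""


def parallelism_errors(log_text):
    """Count 'exceeds parallelism' entries per process ID."""
    counts = Counter()
    for line in log_text.splitlines():
        match = XBSA_LINE.match(line.strip())
        if match and "exceeds parallelism" in match.group("msg"):
            counts[int(match.group("pid"))] += 1
    return counts


for pid, count in sorted(parallelism_errors(SAMPLE_LOG).items()):
    print(pid, count)
```

The process IDs in these entries (15871 to 15874) are four of the six onbar_d worker processes in the ps listing from the original post, which fits the suspicion that the number of parallel backup streams is involved.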
| |||
Ravi,

Now that you have provided some Legato information, you perhaps have a network bottleneck (?). You'll need to investigate the logs along these lines for further solutions. Lowering BAR_MAX_BACKUP only limits the amount of information passing over the network. While the Informix logs show normal processing, the Legato logs show some errant info.

Hope this information helps.

Brice Avila
Minneapolis, Minnesota

"rkusenet" <rkusenet@sympatico.ca> wrote in message news:<bgu7rk$s3cpk$1@ID-75254.news.uni-berlin.de>...
> "Brice Avila" <briceavila@hotmail.com> wrote in message
> news:c46e3ec5.0308071049.50bec301@posting.google.com...
<SNIP>
| ||||
How about something simple, like maybe the storage manager is waiting for a tape mount? When onbar is hanging, don't just kill it; go into your storage manager and see what it thinks is happening.

"rkusenet" <rkusenet@sympatico.ca> wrote in message news:bgro1s$rbvfo$1@ID-75254.news.uni-berlin.de...
> IDS 9.21.UC4
> Solaris 2.6
>
> EDS takes our backup using onbar. What we have observed is that,
> quite frequently onbar hangs.
<SNIP>