Hi,

I have an odd problem with one of my customer's SCO machines.

Without going into a lot of detail and reasons for backup decisions, I have implemented dd cloning of the master hard disk to a backup disk. That is to say, my customer has 2 hard disks (in a caddy): the primary or master disk (hd00) and the clone or backup disk (hd10). A cron job wakes up twice daily and does a complete clone of the master disk to the backup/clone disk using dd. No great issue here, as the entire disk (20GB) takes about 24 minutes to complete. In Linux, this works great. If the master disk crashes for any reason, the customer simply reboots and boots from the secondary drive while the primary is being repaired, and everything boots up fine and proper with no problems.

However, the same process in SCO 5.0.5 seems to generate a completely corrupted disk. When I compare the master to the clone, I can see that the clone is a complete mess. As I'm doing a dd of the ENTIRE disk (a bit-by-bit binary copy), I don't understand why it works under Linux but not under SCO. BTW, in SCO I don't boot from the secondary drive directly; I have to make it the master drive first. However, as described above, the disk is so badly corrupted that the boot process will not work.

The commands I use are:

    time dd if=/dev/hd00 of=/dev/hd10 bs=1451520 count=12529    (for SCO)
    time dd if=/dev/hda of=/dev/hdb bs=1451520 count=12529      (for Linux)

Some additional info: if I boot from a "live CD" and do the above clone for SCO, it all works fine, which confirms that my drives and controller are healthy. The problem occurs only if I boot from the HDD (primary disk) and then run the dd clone. So dd apparently doesn't like doing this on a running system, but I'm not sure why that should be. Again, under a running Linux system this works perfectly fine.

Any ideas or help would be appreciated. Thanks.

Rgds.
Otto Rodusek.
(otto@applied.com.sg)
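Otto mentions comparing the master to the clone and finding it "a mess". One way to make that comparison concrete is to count how many blocks differ and record where the first differences fall. Below is a minimal Python sketch of such a comparison, assuming both disks are available as readable files (for example, images saved from each disk); the function name compare_images and the 512-byte block size are illustrative choices, not part of any tool mentioned in the thread.

```python
def compare_images(path_a, path_b, block_size=512, max_report=10):
    """Compare two disk images block by block.

    Returns (total_blocks, differing_blocks, first_offsets), where
    first_offsets holds the byte offsets of up to max_report differing
    blocks. If one input is shorter, its missing blocks count as differing.
    Assumes regular files, where read() returns a full block except at EOF.
    """
    total = 0
    differing = 0
    first_offsets = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:  # both inputs exhausted
                break
            if a != b:
                differing += 1
                if len(first_offsets) < max_report:
                    first_offsets.append(total * block_size)
            total += 1
    return total, differing, first_offsets
```

For example, compare_images("master.img", "clone.img") (hypothetical file names) returns the block count, how many blocks differ, and the byte offsets of the first few mismatches.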
----
Otto Rodusek wrote:
> However, the same process in SCO 5.0.5 seems to generate a completely
> corrupted disk.
> [...]
> Some additional info. If I use a "live cd" to boot and do the above
> clone for sco it all works fine - hence I confirm that my drives and
> controller are healthy. The problem only occurs only if I boot the HDD
> (primary disk) and then try the dd clone.
> [...]

Install the newest "wd" driver you can and try this again. There was a data corruption problem much like the one you describe, happening only under certain kinds of simultaneous two-drive loads.

Hopefully someone else will chime in with the name and location of the newest "wd" driver known to work with OSR505. This is not necessarily the very newest "wd".

I am not 100% certain that this problem _can_ be fixed under OSR505. I don't remember whether the fix was in "wd" alone or involved other parts of the kernel. You might need to put OSR507 on the machine. (Note that when OSR505 first shipped, the idea of a 20GB IDE disk was some sort of bizarre fantasy...)

>Bela<
----
Hi Bela,

Thanks for the reply. Indeed, I tried it with both the old wd driver (as originally delivered in 5.0.5) and the newer wd driver found in 5.0.7. The timing is DRASTICALLY different (the old wd driver took almost 4 hours to complete the clone, whilst the new wd driver finished in under 25 minutes for a 20GB clone!!). However, the result seems to be the same: the new clone is badly corrupted. As I mentioned previously, this only happens on a "live" running system. If I boot from, say, a CD and clone drive 1 to drive 2, it works just fine.

Again, any help or ideas to try would be much appreciated.

Rgds.
Otto.
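The dd commands in the original post copy bs × count bytes, and Otto's two timings can be turned into rough average throughput figures. A small Python sketch of that arithmetic follows. It assumes every dd read returns a full block and uses Otto's approximate durations (under 25 minutes with the 5.0.7 driver, almost 4 hours with the original 5.0.5 driver), so the rates are ballpark averages only; the helper names are hypothetical.

```python
SECTOR = 512  # assumed bytes per sector on these IDE disks


def dd_bytes(bs, count):
    """Bytes copied by dd, assuming each of `count` reads returns a full `bs` bytes."""
    return bs * count


def throughput_mb_s(nbytes, seconds):
    """Average rate in decimal megabytes (10**6 bytes) per second."""
    return nbytes / seconds / 1e6


total = dd_bytes(1451520, 12529)
print(1451520 // SECTOR)                            # 2835 sectors per dd block
print(total)                                        # 18186094080 bytes
print(round(total / 1e9, 2))                        # 18.19 (decimal GB)
print(round(throughput_mb_s(total, 25 * 60), 1))    # 12.1 MB/s: new wd driver, ~25 minutes
print(round(throughput_mb_s(total, 4 * 3600), 2))   # 1.26 MB/s: old wd driver, ~4 hours
```

On these figures the newer driver moves data about 9.6 times faster on average (14400 s versus 1500 s for the same byte count).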
----
In article <1136729079.655900.309730@g14g2000cwa.googlegroups.com>,
Otto <otto@applied.com.sg> wrote:
>[...] However, the result seems to be the same - the new
>clone is badly corrupted. As I mentioned previously this only happens
>on a "live" running booted system. If I boot from say a CD and clone
>drive 1 to drive 2 it works just fine. [...]

Just an observation here.

Running dd on a live drive can be problematic, as many things are continually being run and updated on a Unix system.

From my POV, the best way to ensure you are OK is to put the OS on the other drive to start with, and then do updates through the filesystem with a tool such as rsync, NOT with dd.

Bill
--
Bill Vermillion - bv @ wjv . com
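Bill's suggestion is to keep the second drive current through the filesystem with a tool such as rsync rather than copying raw blocks with dd. By default rsync decides which files to transfer with a "quick check" that compares each file's size and last-modification time. The Python sketch below imitates only that quick check so the idea is visible. It is not rsync: it does not implement rsync's delta-transfer algorithm, does not delete files removed from the source, and copies only regular files. The function name quick_sync is hypothetical, and a file that changes while the copy runs can still end up inconsistent.

```python
import os
import shutil


def quick_sync(src_root, dst_root):
    """Mirror regular files from src_root into dst_root, rsync-quick-check style.

    A file is copied only if the destination copy is missing or differs in
    size or (whole-second) modification time. Returns the relative paths
    copied. Files deleted from src_root are NOT removed from dst_root.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel_dir)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            if not os.path.isfile(src):  # skip sockets, FIFOs, device nodes, broken links
                continue
            dst = os.path.join(target_dir, name)
            s = os.stat(src)
            try:
                d = os.stat(dst)
                unchanged = (d.st_size == s.st_size
                             and int(d.st_mtime) == int(s.st_mtime))
            except FileNotFoundError:
                unchanged = False
            if not unchanged:
                shutil.copy2(src, dst)  # copy2 also carries over the mtime
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return copied
```

On a second run against an unchanged source tree, quick_sync copies nothing, which is the saving Bill describes compared with re-copying the whole disk every time.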
----
Hi Bill,

Thanks for the feedback.

I fully agree with what you write. HOWEVER, in my case the server is a simple email server with almost all POP accounts, plus a SAMBA file server (again, most of the data is static), so a "little" bit of loss is acceptable. I have used this "Disaster Recovery" method successfully at several of my customer sites (running Linux), and it really is a 30-second (no kidding!!) reboot to get the system going in the event of a disk failure. BTW, I have also implemented a Microlite BackupEdge backup. If you use Microlite's Disaster Recovery, it takes approximately 2-4 hours to fully recover, hence the 30-second recovery approach, which my customer requires for a mail server.

I really did not want to get into the merits (or lack thereof) of the method used, just a reason or fix as to why the "dd" command works fine under the same conditions with LINUX and not with SCO 5.0.5. I will probably follow Bela's suggestion, try it with SCO 5.0.7, and see whether it is indeed a kernel problem or not.

Again, many thanks for your time and suggestions.

Rgds.
Otto.
----
Otto inscribed:
| [...] I have used this "Disaster Recovery" method successfully on
| several of my customer sites (running Linux) and it is really a 30
| second (no kidding!!) reboot to get the system going in the event of a
| disk failure. [...]

Which begs the question (at least here): why are you going through all of these gymnastics when there are much better and more reliable solutions?

Mirror the drive and install a hot spare. Let technology do the work instead of kludging 1980-style partial solutions.

--
==========================================================================
Tom Parsons tom@tegan.com
==========================================================================
----
In article <20060110064709.64734@tegan.com>,
Tom Parsons <sconews@tegan.com> wrote:
>Which begs the question (at least here), why are you going through all
>of these gymnastics when there are much better and more reliable solutions.
>Mirror the drive and install a hot spare. Let technology do the work
>instead of kludging 1980 style partial solutions.

I also gave Otto some solutions by email, since he had emailed me and I assumed that he had not posted to the 'net.

One thing I pointed out is that dd is going to copy things over and over and over again, as so much never changes, whereas something like rsync would work well.

The 'dd' approach probably comes from the Ghost approach that many people use in the MS world. But MS and Linux solutions don't always apply very well to real Unix systems.

I also suggested RAID. He never indicated his hardware or the size of the system, so who is to know whether the 3-4 hour Microlite recovery is based only on the amount of data he has. If it is large, then the 'dd' is also going to take a lot of system resources, most of which will be wasted.

I also pointed out to Otto that at times Linux approaches can spell disaster when applied to a pure Unix system. I had to physically pull a Sun Netra [from our racks] to repair a system that the client had Linux-ized so it would be more familiar. After a shutdown it would not reboot, as it could not find its home directory or shell. And the system would not let you reinstall the OS, as there was already one there. I wound up pulling the unit, swapping drives, and installing Solaris on the new drive. Then it was a simple 'fsck' on the old drive from the new drive, and then swap them back and set things up the way they were intended.

As you and I well know, there are so many minor [and sometimes major] differences between Unix systems, and particularly Unix-like systems, that you can shoot yourself in the hard drive if you try to use a universal approach.

Bill
--
Bill Vermillion - bv @ wjv . com