| I have been using Slackware for over 10 years now, and I know this has nothing to do with Slack, but I thought I would throw that in there for the hell of it.

Anyway, over the last ten years I have noticed something that's a little disturbing as far as the filesystem/hard drives are concerned.

Over the years I have been downloading, testing and then rming very large files. The drives I have used to do this work on have always inevitably failed. I have been through 5 drives in the last ten years, and now the sixth is starting to fail.

The symptoms are always the same. I will be working with a large file and there will be a lockup. Not a hard lockup, but the more I mess with the computer trying to kill the offending app, the harder the lockup gets, until I finally have to just do a hard reset. As the 'puter reboots, it of course runs e2fsck, finds a few bad inodes, fixes things up, and we're back to normal again. But as time goes by, over several weeks maybe, the problem gets worse and worse as far as working with large files is concerned.

The computer may run fine as long as I just use it normally, such as surfing the net and doing email and that sort of thing. But if I do anything with large files there will be a crash to contend with, and I finally have to replace the drive.

I have tried reformatting to see if a fresh filesystem would help, but to no avail.

I really don't think it is the fault of the drives, but I could be wrong there too; it's been known to happen. And before you ask, I use Western Digital drives. I use Western Digital for the root filesystem and the data drives. I have never had a failure on the root filesystem drive. Just the work drive. BTW, my computers run 24/7; the uptime on this one was 206 days until this last problem began.

Here are some of my thoughts on the situation:

The constant writing and erasing of the drive just wears the media itself out. But why??? Aren't the drives still magnetic media?

Ya know, it's been so long since I studied hardware that I don't even know what they have been doing in the advancement of hard drives in the past few years.

I'm just writing this to get some others' thoughts on this situation. Your thoughts and suggestions are all welcome.

Thanks,
Widgeteye |
| |||
| WIdgeteye wrote:

> Over the years I have been downloading, testing and then rming very large
> files. The drives I have used to do this work on have always inevitably
> failed. I have been through 5 drives in the last ten years, and now the
> sixth is starting to fail.

Are you sure the problem is with the disk hardware? By your description of the problem, I would tend to suspect either a shortage of memory, or a failure in physical memory, not the disk. When you bring the system back up after a hard reset, I would expect e2fsck to find errors on the file system containing the file you were working on, almost "by definition". The more times you do this, the more errors I would expect e2fsck to find. The file was not unlikely to have become corrupted after the first hard reset.

> The computer may run fine as long as I just use the computer normally,
> such as surfing the net and doing email and that sort of thing.

Generally low memory requirement types of things, yes. What sorts of work are you doing on the large files? How large are they? Can you split them up (see the split manual page) and work on them in sections, to determine whether or not the same operations done on smaller files exhibit similar behaviour?

> But if I do anything with large files there will be a crash to contend
> with and I finally have to replace the drive.

Please define "do anything". There are people who use Linux for processing large audio files, for example, and do not have to replace disks every couple of years. In fact, the newest disk on any of my computers is _in_ my audio workstation, and it's about 3 to 5 years old. (Mind you, that system has taken to failing to boot, but that looks like a BIOS problem, not a disk failure or file system problem ... I need to spend some time working on it before I can use it again ...)

> I have tried reformatting to see if a fresh filesystem would help but
> to no avail.

Assuming you have a problem elsewhere, that isn't surprising.

> I really don't think it is the fault of the drives but I could be wrong
> there too, it's been known to happen.

I don't think the drives are at fault either. One drive failing I would believe, maybe even two, but this many drives "failing" this consistently suggests to me that they never failed in the first place and that the problem is elsewhere.

> The constant writing and erasing of the drive just wears the media
> itself out.

No.

I manage a news server (which writes and erases files constantly) that has been running for years without any interruption, let alone failure. The only interruptions this system has seen in the last 5 years or so have been when we needed to move the physical system to another location in the machine room, or more recently when we upgraded it from a commercial Unix system to Slackware Linux on newer hardware.

I also manage mail servers which spend all their time writing and erasing files (approximately 200K messages per day among four mail servers). Disks can (and sometimes do) fail, but they don't "wear out" just because the system is writing and erasing files all the time. If they did, I'd have to recommend changing to a different brand of disks.

> But why??? Aren't the drives still magnetic media?

Yes, unless you're using CompactFlash cards, USB keychain drives, or other drives of that sort.

> Ya know, it's been so long since I studied hardware I don't even
> know what they have been doing in the advancement of harddrives in
> the past few years.

If we're talking about "regular" hard disks, they've been miniaturizing them, increasing the data density, increasing rotational speeds, and reducing manufacturing costs. There have been some advances in the interfaces from hard drives to the rest of the system (serial ATA, for example), but the physical drive itself is basically and conceptually the same as it's been for years (smaller, faster, cheaper, perhaps, but it's still some number of rotating platters, each with a magnetic head floating just beyond the surface ...).

--
----------------------------------------------------------------------
Sylvain Robitaille                              syl@alcor.concordia.ca
Systems analyst                                   Concordia University
Instructional & Information Technology        Montreal, Quebec, Canada
---------------------------------------------------------------------- |
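One way to try Sylvain's "same operation on smaller files" question without touching real data is a write-then-read-back check at increasing sizes. The following is only a rough sketch: `write_and_verify`, the `probe.bin` file name and the sizes are made up for illustration, and a matching checksum does not prove the drive is healthy, since a read-back of a file smaller than RAM may be served from the page cache rather than from the platters.

```python
import hashlib
import os
import tempfile


def write_and_verify(path, total_bytes, chunk_size=1024 * 1024):
    """Write total_bytes of random data to path, read it back, and
    return True if the SHA-256 digests of both passes match."""
    written = hashlib.sha256()
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            block = os.urandom(min(chunk_size, remaining))
            f.write(block)
            written.update(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())  # ask the kernel to push the data to the device

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            read_back.update(block)
    return written.hexdigest() == read_back.hexdigest()


if __name__ == "__main__":
    # Hypothetical sizes: start small, then grow toward the sizes that
    # trigger the lockups. Point workdir at the suspect drive to test it.
    with tempfile.TemporaryDirectory() as workdir:
        probe = os.path.join(workdir, "probe.bin")
        for size_mb in (1, 4, 16):
            ok = write_and_verify(probe, size_mb * 1024 * 1024)
            print(f"{size_mb} MB: {'ok' if ok else 'MISMATCH'}")
```

If small sizes pass and the machine only misbehaves once the file approaches or exceeds the amount of RAM, that would lean toward the memory explanation; a mismatch at any size points at the data path (disk, cable, controller or memory).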
| |||
| WIdgeteye wrote:

> Over the years I have been downloading, testing and then rming very large
> files. The drives I have used to do this work on have always inevitably
> failed. I have been through 5 drives in the last ten years, and now the
> sixth is starting to fail.

Funny you should mention this... just last night I was playing with VMware and it kept crashing on me. I was trying to set up Slackware 10.2 on it, and it seemed that whenever I created partitions on the virtual drive it would lock up the system. It was a hard lock and I couldn't do anything except hit the reset button (I couldn't even ssh into the machine). Anyway, this happened 3 times in a row, then I decided to do it differently. I created the new virtual machine again and had it allocate the disk space immediately, and this caused it to lock up again. I was pretty sure it was a disk failure. I just so happened to have an extra (empty) drive on the computer, so I proceeded to create the virtual machine on this drive. It worked fine on the first try. The failing disk is only about 4 months old, and since it's a SATA drive, I can't seem to get smartd stats on it.

> The symptoms are always the same, I will be working with a large file and
> there will be a lockup, not a hard lockup but the more I mess with the
> computer trying to kill the offending app the harder the lockup gets until
> I finally have to just do a hard reset. As the 'puter reboots, it of
> course, runs e2fsck and finds a few bad inodes and fixes things up and
> we're back to normal again. But as time goes by, over several weeks maybe,
> the problem gets worse and worse as far as working with large files is
> concerned.

I've also seen this happen on my last few drives. I've denied the possibility of it being "linux" breaking my hardware, but possibly I was just fooling myself. Are different filesystems more/less prone to hardware damage? Would reiserfs be better? I've stuck with ext3 for a while since people seem to suggest it for its maturity.

> I really don't think it is the fault of the drives but I could be wrong
> there too, it's been known to happen.
> And before you ask, I use Western Digital drives. I use western digital
> for the root filesystem and the data drives. I have never had a failure
> on the root filesystem drive. Just the work drive.
> BTW, my computers run 24/7 the uptime on this one was 206 days until this
> last problem began.

Mine was only 12 days (booted to Windows to play a game a few weeks ago).

-Miguel |
| |||
| On 2005-09-21, WIdgeteye <None@none.none> wrote:

> I have been using Slackware for over 10 years now and I know this has
> nothing to do with Slack but I thought I would throw that in there for the
> hell of it.
>
> I really don't think it is the fault of the drives but I could be wrong
> there too, it's been known to happen.
> And before you ask, I use Western Digital drives. I use western digital
> for the root filesystem and the data drives. I have never had a failure
> on the root filesystem drive. Just the work drive.
> BTW, my computers run 24/7 the uptime on this one was 206 days until this
> last problem began.

My systems run 24/7 and all use Western Digital. Up until last year I was still using a Western Digital 40meg that ran 24/7 for misc or old compressed data storage... that drive was almost 18 years old.

Like you, the bulk of the files I edit or work with are large (by my standards, 10-15 megs), and I've never had the problems you describe with Western Digital.

I've had, and continue to have, constant problems with laptop hard drives... they always fail after several years... but never a hard drive failure like you describe on any of the desktops, and never with such frequency.

I can only hazard a wild guess and say your problems might not relate to your hard drive... have you considered using a different file system like Reiser, or trying any number of hard drive utilities to test your drives?

I've heard, and I stress no first-hand knowledge here, of hard drives failing due to excessive heat in the box or poor ambient conditions, too much moisture in the air, etc... just a thought.

ken |
| |||
| On Wed, 21 Sep 2005 17:07:49 +0000, No_One wrote:

> Like you the bulk of the files I edit or work with are large, by my standards
> 10-15 megs, and I've never had the problems you describe with Western Digital.

The files I work with are in the hundreds of meg over into the gig sizes. |
| |||
| On Wed, 21 Sep 2005 10:04:33 -0700, Miguel De Anda wrote:

> I've also seen this happen on my last few drives. I've denied the possibility
> of it being "linux" breaking my hardware but possibly I was just fooling
> myself. Are different filesystems more/less prone to hardware damage? Would
> reiserfs be better? I've stuck with ext3 for a while since people seem to
> suggest it for its maturity.

I too have been using ext3 for the last couple of years. The problem followed me through the change from ext2 to ext3, too. I have never tried reiserfs. I don't know, dude, I'm in listening mode right now. |
| |||
| On Wed, 21 Sep 2005 18:14:38 +0200, WIdgeteye <None@none.none> wrote:

> I have been using Slackware for over 10 years now and I know this has
> nothing to do with Slack but I thought I would throw that in there for
> the hell of it.
>
> Anyway over the last ten years I have noticed something that's a little
> disturbing as far as the filesystem/hardrives are concerned.
>
> Over the years I have been downloading, testing and then rming very large
> files. The drives I have used to do this work on have always inevitably
> failed. I have been through 5 drives in the last ten years, and now the
> sixth is starting to fail.

I have experienced the same issues/problems over the years. My 'problem' is that I insist on re-building my machine, so sometimes I end up with a strange mix of hw :-)

But I have always been able to pinpoint disk failures to faulty discs by running a manufacturer's diagnostic utility. This has the added bonus of being able to get a replacement disc if the faulty one is still under warranty.

I have also come to the conclusion that if a disc fails (with ensuing fsck repairs) there is only one thing you can do: salvage as much data as possible, and reformat the disc. This will detect and remap sector defects, and you will probably save yourself the trouble of recovering from a totally trashed filesystem.

I think that the discs are subject to wear 'n tear due to heat, vibration, and so on. So if you only repair the filesystem, the underlying problem, a bad sector, will show up again.

I have been using ext2, ext3 and reiserfs and it makes no difference: once a disc has problems it will only get worse over time. And I have not been working with very large files.

The above is just my experience and gut feelings accumulated over 10+ years, nothing scientific :-)

Best regards
Ib |
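Ib's point that a bad sector keeps coming back after an fsck repair can be checked with a plain front-to-back read that records where the kernel reports I/O errors. What follows is a minimal sketch of that idea, not a replacement for a manufacturer's diagnostic or for `badblocks`; the function name `scan_for_read_errors` and the chunk size are illustrative, and it assumes a Unix system (it relies on `os.pread`).

```python
import os


def scan_for_read_errors(path, chunk_size=1024 * 1024):
    """Read path from start to end in fixed-size chunks and return a
    list of (offset, length) pairs for chunks whose read raised OSError."""
    bad_chunks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # works for regular files and block devices
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                os.pread(fd, length, offset)  # data is discarded; only errors matter
            except OSError:
                bad_chunks.append((offset, length))
            offset += length  # keep going so later bad regions are found too
    finally:
        os.close(fd)
    return bad_chunks
```

Run against a large file on the suspect drive (or, as root, against the device node itself), the same offsets turning up on repeated passes would suggest a physical defect rather than filesystem damage. Reads satisfied from the page cache never reach the disk, so a clean pass over a recently written file proves little.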
| |||
| On 2005-09-21, WIdgeteye <None@none.none> wrote:

> On Wed, 21 Sep 2005 17:07:49 +0000, No_One wrote:
>
>> Like you the bulk of the files I edit or work with are large, by my standards
>> 10-15 megs, and I've never had the problems you describe with Western Digital.
>
> The files I work with are in the hundreds of meg over into the gig sizes.

ahh... well, a gigzilla user if ever I've seen one. Yours is bigger than mine, no question about it.

ken |
| |||
| On Wed, 21 Sep 2005 16:48:52 +0000, Sylvain Robitaille wrote:

> Are you sure the problem is with the disk hardware? By your description
> of the problem, I would tend to suspect either a shortage of memory, or a
> failure in physical memory, not the disk. When you bring the system back
> up after a hard reset, I would expect e2fsck to find errors on the file
> system containing the file you were working on, almost "by definition".

It isn't hardware. I have 1 gig of memory and have been through 4 motherboard and processor upgrades in just the last few years. The problem remains.

> The more times you do this, the more errors I would expect e2fsck to
> find. The file was not unlikely to have become corrupted after the
> first hard-reset.
>
>> The computer may run fine as long as I just use the computer normally,
>> such as surfing the net and doing email and that sort of thing.
>
> Generally low memory requirement types of things, yes. What sorts of
> work are you doing on the large files?

par2 r file.par
rar e file.rar
burn to dvd
erase

Get the picture??

> How large are they?

From 700 meg to 1.4 gig for most.

> Can you split them up (see the split manual page) and work on them in
> sections, to determine whether or not the same operations done on
> smaller files exhibit similar behaviour?

No.

>> But if I do anything with large files there will be a crash to contend
>> with and I finally have to replace the drive.
>
> Please define "do anything".

See above.

>> I have tried reformatting to see if a fresh filesystem would help but
>> to no avail.
>
> Assuming you have a problem elsewhere, that isn't surprising.

There's no problem elsewhere.

>> The constant writing and erasing of the drive just wears the media
>> itself out.
>
> No.
>
> I manage a news server (which writes and erases files constantly) that

Hundreds of megs and gigs at a time?

> I also manage mail servers which spend all their time writing and
> erasing files (approximately 200K messages per day among four mail
> servers). Disks can (and sometimes do) fail, but they don't "wear out"
> just because the system is writing and erasing files all the time. If
> they did, I'd have to recommend changing to a different brand of disks.

200k per day != 700M - 1.5G per hour

Thanks |
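The two workloads above are quoted in different units (messages per day versus bytes per hour), so comparing them takes a line or two of arithmetic. The sketch below only does that conversion. The 700 MB to 1.5 GB per hour and the 200K messages per day come from the thread; the 10 kB average message size and the assumption that the large-file work runs around the clock are guesses made purely for illustration and are not stated by either poster.

```python
MB = 1000 * 1000
GB = 1000 * MB


def daily_bytes_from_hourly(bytes_per_hour, hours_per_day=24):
    """Bytes written per day at a sustained hourly rate."""
    return bytes_per_hour * hours_per_day


def daily_bytes_from_messages(messages_per_day, avg_message_bytes):
    """Bytes written per day for a given message count and average size."""
    return messages_per_day * avg_message_bytes


# Figures from the thread. ASSUMPTION: the work runs 24 hours a day.
large_file_low = daily_bytes_from_hourly(700 * MB)
large_file_high = daily_bytes_from_hourly(1500 * MB)

# 200K messages/day is from the thread. ASSUMPTION: 10 kB per message.
mail_total = daily_bytes_from_messages(200_000, 10_000)

if __name__ == "__main__":
    print(f"large files: {large_file_low / GB:.1f} to {large_file_high / GB:.1f} GB/day")
    print(f"mail (four servers combined): {mail_total / GB:.1f} GB/day")
```

Under those assumptions the large-file work writes several times more data per day than the mail servers combined, though the mail load is far more than "200k" of data; a different average message size or duty cycle shifts the ratio accordingly.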
| ||||
| On 2005-09-21, WIdgeteye <None@none.none> wrote:

> I'm just writing this to get some others' thoughts on this situation.
> Your thoughts and suggestions are all welcome.

Could there be environmental issues that are plaguing your hardware? Maybe it's particularly hot, dusty, or some other factor that's causing you more problems than normal? Just a wild guess, but certainly not outside the realm of possibility.

--keith |