| ||||
| Hello all, I've got something I've never seen before. All signs point to bad hard drives and/or a heat issue, but two of the three drives I've tried are brand new. All drives are from different mfgs, bought at different times from different vendors. One of the machines is a laptop, the other a 4U rack mount case with lots of ventilation. The machines are in separate environments, both fairly hospitable. Here's the deal: I've installed FC3, done lots of updates, etc. and have fully working systems... for about a week, and then the disks appear to crash. FWIW, I'm running 2.6.9-1.724_FC3. The systems stay running, to a certain extent. For instance, you can ssh in, ls and cd around, and top and kill appear to work, but ps doesn't. Doing anything more gets you an "Input/output error" and that's it. If you try to shut down or reboot (using the shutdown or reboot commands), you get "Bus error" and nothing more. When you physically reset the box, it fails to boot, saying it can't find a bootable hard disk. If you leave it off and flip it back on hours later (sometimes a few days later), it'll actually boot and you wind up fsck'ing, but it will usually return to working condition. One of the disks, a 3 year old disk, did actually crash, I think. I replaced it, and the new one makes a "coughing" or scratching sound after running for about a week; not a normal sound, so I think I'll return that disk. The odd sound happens for several minutes at about 20 second intervals, then stops, sometimes for hours, and then returns intermittently. The coughing sound started last night after running about a week. This morning I noticed the "Input/output error" on the machine... I thought it was an overheating issue, so I've been watching smartctl output and checking its temperature information. It's been as high as 75C, usually when it's been idle awhile, but as soon as I disturb it, it drops back to about 45-50C. When I noticed the error this morning it was at 45C, so I'm not so sure about the heat problem.
The only thing that points me to a heat issue is the "turn it off for a few hours/days" bit. A friend's machine, the 4U unit, did the same thing last night. A physical reset and fsck took care of it. Any idea what could cause the "Input/output error"? Why can't I reboot the machine from the command prompt? I can't even run /bin/sync before flipping the power, which contributes to my disk corruption... I've never seen a Linux platform so unstable, and I've been using Linux for 11-12 years! I'm used to installing Linux and forgetting about it, not having to reboot it for months. These installations seem to be destroying themselves! Have I just had an incredible case of coincidental drive failures? What are the odds of three drives from different mfgs failing at once with the same exact symptoms (i.e. 1 week old FC3 install)? Astronomical! Or is there a problem with Fedora and/or the ext3 filesystem? Thanks for any insights, Bret |
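For anyone who wants to track the drive temperature mentioned above over time rather than eyeballing it, here is a minimal Python sketch that pulls the temperature out of `smartctl -A` style attribute output. It assumes the drive reports temperature as attribute 194 with the raw value in the tenth column, which is common but not universal (some drives use a different attribute ID). The sample text and the name `parse_temperature` are made up for illustration and are not output from the poster's drive.

```python
import re

# Illustrative sample only; NOT real output from the drives in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   045   075   000    Old_age   Always       -       45
"""


def parse_temperature(smart_output):
    """Return the raw value of SMART attribute 194 as an int, or None.

    Assumes the ten-column attribute table layout shown in SAMPLE.
    """
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows have at least ten columns; the ID is the first.
        if len(fields) >= 10 and fields[0] == "194":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


if __name__ == "__main__":
    print(parse_temperature(SAMPLE))
```

In practice the text would come from running `smartctl -A` against the actual device and capturing its output; logging the result every few minutes would show whether the I/O errors line up with temperature spikes or not.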
| |||
| bret.schuhmacher@aspect.com wrote:
[snip]
> Have I just had an incredible case of coincidental drive failures? What
> are the odds of three drives from different mfgs failing at once with
> the same exact symptoms (i.e. 1 week old FC3 install)? Astronomical!
> Or is there a problem with Fedora and/or the ext3 filesystem?

... first off, what do the logs reveal?

--
<< http://michaeljtobler.homelinux.com/ >>
If you were attacked by a homosexual, would you beat him off? |
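As a rough way to answer the "what do the logs reveal?" question, the sketch below filters log lines for a couple of disk-related patterns. The patterns ("I/O error" and "EXT3-fs error") are assumptions about what a failing disk or filesystem typically writes to the kernel log, and the sample lines and the name `find_disk_errors` are hypothetical; nothing here is taken from the poster's machines.

```python
import re

# Hypothetical log lines for illustration; not from the systems in this thread.
SAMPLE_LOG = """\
Jan 10 03:12:01 host kernel: end_request: I/O error, dev hda, sector 12345
Jan 10 03:12:02 host sshd[2101]: Accepted password for user from 10.0.0.5
Jan 10 03:12:05 host kernel: EXT3-fs error (device hda2): illustrative message
"""

# Assumed patterns of interest; extend as needed for the system at hand.
DISK_ERROR = re.compile(r"I/O error|EXT3-fs error", re.IGNORECASE)


def find_disk_errors(lines):
    """Return the lines that match any of the assumed disk-error patterns."""
    return [line for line in lines if DISK_ERROR.search(line)]


if __name__ == "__main__":
    for hit in find_disk_errors(SAMPLE_LOG.splitlines()):
        print(hit)
```

Since the affected machines stop being able to write to disk once the errors start, the most useful entries are likely the last ones written before the failure, read after the box has been brought back up.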
| ||||
| Well, anything's possible, but it doesn't sound like hardware. I'd bet you are updating from a questionable source and installing the same bad package. Care to give us more info on how you have "done lots of updates"? Tom F. |