smartctl's "Non-medium error count" (SCSI drives)
One of my SCSI drives seems to be having a bad time of it - failing on some reads, and with disk read errors (and parity errors) appearing in syslog.

However, running smartctl -a on the drive shows up *no* corrected or uncorrected errors for the drive - but it does have a count in the hundreds for "Non-medium error count".

So, what exactly does "non-medium error count" mean? I've got another disk on the same bus and that's not had any trouble, so I don't think it's a cabling/termination issue (and it's been running fine 24x7 for a year or so until the last week, when I've had a few glitches).

Possible controller (HBA) failure? Or does 'non-medium' mean the drive electronics are going bad (disk cache memory or somesuch)?

Needless to say, I'll be taking the drive off-line until I know what the issues are.

cheers

Jules
Jules wrote:
> One of my SCSI drives seems to be having a bad time of it - failing on
> some reads, and with disk read errors (and parity errors) appearing in
> syslog.
>
> However, running smartctl -a on the drive shows up *no* corrected or
> uncorrected errors for the drive - but it does have a count in the
> hundreds for "Non-medium error count".
>
> So, what exactly does "non-medium error count" mean?

from
http://smartmontools.sourceforge.net....html#errorlog

Error Logs ...
non-medium error counter (only a single number displayed). This represents the number of recoverable events other than write, read or verify errors.

> ...I've got another disk
> on the same bus and that's not had any trouble, so I don't think it's a
> cabling/termination issue (and it's been running fine 24x7 for a year or
> so until the last week when I've had a few glitches)
>
> Possible controller (HBA) failure? Or does 'non-medium' mean the drive
> electronics are going bad (disk cache memory or somesuch)?
>
> Needless to say I'll be taking the drive off-line until I know what the
> issues are

Check out the full feature/test set available with smartctl and smartd. I never can remember all that's available specific codes. The examples/explanations are worth a look.

http://smartmontools.sourceforge.net/

hth,
prg
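To make the distinction between the read/write/verify counters and the separate non-medium counter concrete, here is a minimal Python sketch that pulls both out of smartctl -a text. It assumes the usual SCSI "Error counter log" layout (seven numeric columns per row, ending in total uncorrected errors); the SAMPLE text and its numbers are made up for illustration, and parse_scsi_counters is a hypothetical helper, not part of smartmontools.

```python
import re

# Illustrative text in the layout smartctl typically prints for SCSI drives.
# The numbers are invented for this example, not taken from the thread.
SAMPLE = """\
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0        512.204           0
write:         0        0         0         0          0         88.713           0

Non-medium error count:      326
"""


def parse_scsi_counters(text):
    """Return (rows, non_medium) parsed from smartctl -a output.

    rows maps "read"/"write"/"verify" to their total corrected and
    total uncorrected error counts; non_medium is an int, or None if
    the line is absent. Assumes the seven-column layout shown above.
    """
    rows = {}
    for m in re.finditer(r"^(read|write|verify):\s+(.+)$", text, re.MULTILINE):
        fields = m.group(2).split()
        rows[m.group(1)] = {
            "total_corrected": int(fields[3]),    # 4th column
            "total_uncorrected": int(fields[6]),  # last column
        }
    m = re.search(r"^Non-medium error count:\s+(\d+)", text, re.MULTILINE)
    non_medium = int(m.group(1)) if m else None
    return rows, non_medium


rows, non_medium = parse_scsi_counters(SAMPLE)
print(rows)
print("non-medium:", non_medium)
```

With output like the sample, the read and write rows stay at zero while the non-medium count is in the hundreds, which is the pattern described in the question.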
On Wed, 13 Apr 2005 18:04:09 -0700, prg wrote:
>> So, what exactly does "non-medium error count" mean?
>
> from
> http://smartmontools.sourceforge.net....html#errorlog
>
> Error Logs ...
> non-medium error counter (only a single number displayed). This
> represents the number of recoverable events other than write, read or
> verify errors.

Now that's interesting. I tried copying one of the affected files from the drive, which died with an I/O error - suggesting that the block couldn't be read at all. Yet smartctl output is only showing non-medium events and nothing under read/write events...

That suggests cabling to me, but it's funny how the other drive on the bus has been fine (I'm well used to SCSI, so I've seen some pretty odd things happen though!). Maybe there's a problem at the drive connector to that specific drive - I'll have a look.

Oddly enough, an hour after I posted, the problem had gone away again, which is rather spooky. Nothing on the system was scheduled to kick in around the time of the problems, nor were any big inductive loads nearby active (!) or anything like that. I hate random glitches :-(

> Check out the full feature/test set available with smartctl and smartd.
> I never can remember all that's available
> specific codes

Will do - thanks. I'll probably copy all the data off the drive, wipe it and do a destructive test/rebuild of the filesystem just to see if that throws up any bad blocks. I'll double-check cabling first though :-)

cheers

Jules
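Since the glitches come and go, one way to tell whether the non-medium counter is still climbing is to compare two saved copies of the smartctl -a output taken some time apart. The following is only a sketch: it assumes the "Non-medium error count:" line format that smartctl prints for SCSI drives, and non_medium_count and counter_growth are hypothetical helper names.

```python
import re


def non_medium_count(smartctl_text):
    """Extract the non-medium error count from smartctl -a text, or None."""
    m = re.search(r"^Non-medium error count:\s+(\d+)", smartctl_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def counter_growth(before_text, after_text):
    """Return how much the counter rose between two snapshots.

    Returns None if either snapshot lacks the counter line.
    """
    before = non_medium_count(before_text)
    after = non_medium_count(after_text)
    if before is None or after is None:
        return None
    return after - before


# Made-up snapshots for illustration; real ones would be saved smartctl -a output.
earlier = "Non-medium error count:      326\n"
later = "Non-medium error count:      341\n"
growth = counter_growth(earlier, later)
print("counter rose by", growth)
```

A growth of zero across a period with no I/O errors, and a jump during a bad spell, would fit the intermittent behaviour described above.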