Unix Technical Forum

smartctl's "Non-medium error count" (SCSI drives)

#1
01-18-2008, 08:24 AM
Jules


One of my SCSI drives seems to be having a bad time of it - failing on
some reads, and with disk read errors (and parity errors) appearing in
syslog.

However, running smartctl -a on the drive shows up *no* corrected or
uncorrected errors for the drive - but it does have a count in the
hundreds for "Non-medium error count".

So, what exactly does "non-medium error count" mean? I've got another disk
on the same bus and that's not had any trouble, so I don't think it's a
cabling/termination issue (and it's been running fine 24x7 for a year or
so until the last week, when I've had a few glitches).

Possible controller (HBA) failure? Or does 'non-medium' mean the drive
electronics are going bad (disk cache memory or somesuch)?

Needless to say, I'll be taking the drive off-line until I know what the
issues are.

cheers

Jules

#2
01-18-2008, 08:24 AM
prg


Jules wrote:
> One of my SCSI drives seems to be having a bad time of it - failing on
> some reads, and with disk read errors (and parity errors) appearing in
> syslog.
>
> However, running smartctl -a on the drive shows up *no* corrected or
> uncorrected errors for the drive - but it does have a count in the
> hundreds for "Non-medium error count".
>
> So, what exactly does "non-medium error count" mean?


from
http://smartmontools.sourceforge.net....html#errorlog

Error Logs ...
non-medium error counter (only a single number displayed). This
represents the number of recoverable events other than write, read or
verify errors.
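
To make that concrete, here is a rough Python sketch of pulling the counter (and the uncorrected read/write totals next to it) out of saved `smartctl -a` text. The sample text only imitates the layout of smartctl's SCSI error counter log; its numbers are made up and are not from the drive in this thread. The function names are hypothetical, chosen just for this example.

```python
import re

# Illustrative text only: the layout follows smartctl's SCSI error counter
# log, but the numbers are invented and not taken from any real drive.
SAMPLE_OUTPUT = """\
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0        120.500           0
write:         0        0         0         0          0         80.250           0

Non-medium error count:      312
"""


def non_medium_error_count(smartctl_text):
    """Return the non-medium error count as an int, or None if the line is absent."""
    match = re.search(r"^Non-medium error count:\s*(\d+)", smartctl_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def uncorrected_errors(smartctl_text):
    """Map each row label (read/write/verify) to its last column, total uncorrected errors."""
    totals = {}
    for line in smartctl_text.splitlines():
        row = re.match(r"^(read|write|verify):\s+(.*)$", line)
        if row:
            totals[row.group(1)] = int(row.group(2).split()[-1])
    return totals


if __name__ == "__main__":
    print(non_medium_error_count(SAMPLE_OUTPUT))
    print(uncorrected_errors(SAMPLE_OUTPUT))
```

A drive in the state described above would show zeros in the read/write rows and a large number on the last line, which is exactly the combination these two functions separate out.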

> ...I've got another disk
> on the same bus and that's not had any trouble, so I don't think it's a
> cabling/termination issue (and it's been running fine 24x7 for a year or
> so until the last week when I've had a few glitches)
>
> Possible controller (HBA) failure? Or does 'non-medium' mean the drive
> electronics are going bad (disk cache memory or somesuch)?
>
> Needless to say I'll be taking the drive off-line until I know what the
> issues are


Check out the full feature/test set available with smartctl and smartd.
I never can remember all that's available. Worse are all the
vendor-specific codes.

The examples/explanations are worth a look.

http://smartmontools.sourceforge.net/

hth,
prg

#3
01-18-2008, 08:25 AM
Jules

On Wed, 13 Apr 2005 18:04:09 -0700, prg wrote:
>> So, what exactly does "non-medium error count" mean?

>
> from
> http://smartmontools.sourceforge.net....html#errorlog
>
> Error Logs ...
> non-medium error counter (only a single number displayed). This
> represents the number of recoverable events other than write, read or
> verify errors.


Now that's interesting. I tried copying one of the affected files from the
drive, which died with an I/O error - suggesting that the block couldn't
be read at all. Yet smartctl output is only showing non-medium events and
nothing under read/write events...

That suggests cabling to me, but it's funny how the other drive on the bus
has been fine (I'm well used to SCSI so I've seen some pretty odd things
happen though!). Maybe there's a problem at the drive connector to that
specific drive - I'll have a look.

Oddly enough, an hour after I posted, the problem had gone away again,
which is rather spooky. Nothing on the system was scheduled to kick in
around the time of the problems, nor were any big inductive loads nearby
active (!) or anything like that. I hate random glitches :-(
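
For a glitch that comes and goes like this, one way to pin down when it happens is to record the non-medium error count at intervals and look at when it actually increases. The sketch below assumes the readings have already been collected as (timestamp, count) pairs, for example by a cron job saving `smartctl -a` output; the function name and the sample readings are hypothetical.

```python
def counter_deltas(samples):
    """Given (timestamp, count) pairs in time order, return a list of
    (timestamp, increase) for every step where the counter went up."""
    increases = []
    for (_, previous), (timestamp, current) in zip(samples, samples[1:]):
        if current > previous:
            increases.append((timestamp, current - previous))
    return increases


# Hypothetical readings, invented for illustration.
readings = [("10:00", 310), ("10:05", 310), ("10:10", 325), ("10:15", 325)]

if __name__ == "__main__":
    print(counter_deltas(readings))
```

Lining up the timestamps of the increases with the bus errors in syslog would show whether the counter tracks the same events or something unrelated.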

> Check out the full feature/test set available with smartctl and smartd.
> I never can remember all that's available. Worse are all the
> vendor-specific codes.


Will do - thanks. I'll probably copy all the data off the drive, wipe it
and do a destructive test/rebuild of the filesystem just to see if
that throws up any bad blocks. I'll double-check cabling first though
:-)

cheers

Jules

All times are GMT.