Peter -
On Sun, 30 Jan 2005 17:54:11 +0100, Peter C. Tribble wrote
(in article <ctj3fj$f0a$1@helium.hgmp.mrc.ac.uk>):
[...]
> So why did disksuite fail it? It must have had some reason for doing so that
> is likely to be logged somewhere. Disksuite is pretty aggressive at failing
> devices (it doesn't allow many errors before chucking it out completely) but
> I've always seen at least one error that explains it.
I found these error messages in the logs:
Jan 29 22:45:40 gbr2-p40 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,0/pci@1,1/scsi@2/sd@2,0 (sd2):
Jan 29 22:45:40 gbr2-p40 SCSI transport failed: reason 'reset': retrying command
Jan 29 22:45:48 gbr2-p40 md_stripe: [ID 641072 kern.warning] WARNING: md: d33: read error on /dev/dsk/c0t2d0s3
Jan 29 22:45:48 gbr2-p40 md_mirror: [ID 842313 kern.info] NOTICE: md: d33: B_FAILFAST I/O retry
Jan 29 22:46:00 gbr2-p40 md_stripe: [ID 641072 kern.warning] WARNING: md: d33: read error on /dev/dsk/c0t2d0s3
Jan 29 22:46:00 gbr2-p40 md_mirror: [ID 104909 kern.warning] WARNING: md: d33: /dev/dsk/c0t2d0s3 needs maintenance
Jan 29 22:46:00 gbr2-p40 md_stripe: [ID 241980 kern.notice] NOTICE: md: d33: hotspared device /dev/dsk/c0t2d0s3 with /dev/dsk/c0t4d0s7
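If you ever need to dig these entries out of the log again, a minimal shell sketch like the one below can help. The function name summarize_md_events is hypothetical (it is not part of DiskSuite/SVM), and it assumes Solaris-style syslog lines with one entry per line, read from /var/adm/messages by default. It simply counts the md read-error, needs-maintenance and hot-spare lines that mention a given slice.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solaris Volume Manager): count the md
# (DiskSuite/SVM) syslog entries that mention one slice.
# Assumes one syslog entry per line, as written to /var/adm/messages.
summarize_md_events() {
    slice=$1                        # slice name, e.g. c0t2d0s3
    log=${2:-/var/adm/messages}     # syslog file to scan (assumed default)
    # Keep md driver lines (md_stripe, md_mirror, ...) that name the slice,
    # then tally the three event types seen in the excerpt above.
    # awk still runs END on empty input, so no matches prints all zeros.
    grep 'md_' "$log" | grep -F "/dev/dsk/$slice" | awk '
        /read error/        { reads++ }
        /needs maintenance/ { maint++ }
        /hotspared/         { spares++ }
        END { printf "read_errors=%d maintenance=%d hotspared=%d\n", reads, maint, spares }'
}
```

For example, running "summarize_md_events c0t2d0s3" against the excerpt above would report two read errors, one needs-maintenance event and one hot-spare swap. Note that the B_FAILFAST retry line does not name the slice, so it is not counted.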
> I usually (in the cases where the disk is responsive at all) do a
> format/analyze/read/repair and try replacing it just to see if it was a
> single bad block. Sometimes works.
I checked the grown defect list and it's still empty. The disk is a 73 GB
Maxtor Atlas 10K IV bought in May 2004.
defect> prim
Extracting primary defect list...Extraction complete.
Defect List has a total of 684 defects.
defect> g
Extracting grown defects list...Extraction complete.
Defect List has a total of 0 defects.
[...]
> Seems complicated. How about just
>
> metareplace -e d3 c0t2d0s3
I tried it and it worked. Thanks a lot! Much simpler than what I was
going to do.
[...]
> That's a concatenation. Don't think you want that...
Definitely not ;-)
[...]
> If the metareplace succeeds. And if the metareplace fails, the hot spare
> should still be in place.
You're right. When the metareplace started to resync the failed slice, the
hot spare switched back to "available".
So what do you think now? Do you think an LVM software failure is
possible, or is the drive definitely dying? Is it worth breaking the
mirror to reformat the disk and then recreating the mirror?
Thx again,
Georges
--
Georges Tomazi -
gt@diapason.com