Unix Technical Forum

Array full of dead drives.

Posted in the Sun Solaris Hardware forum, part of the Solaris Operating System category.

#1
01-16-2008, 02:44 PM
Darren Dunham

I've managed to scavenge a 711 multipak (12 drive version) with some Sun
label IBM 9GB drives.

For a while I thought the array was damaged (one of the scsi cables had
a bent pin), but now I'm coming to the conclusion that instead all 7
drives are dead in the same way.

I grabbed an old 2G drive out of a sparc 5. Works in the array just
fine.

All 7 of the IBM drives behave identically. iostat -En or other inquiry
attempts show the drive and product codes...

c0t13d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: IBM Product: DDRS39130SUN9.0G Revision: S98E Serial No: 478806
Size: 9.06GB <9055065600 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

If I use the 'analysis' function in format, I have no problems doing any
of the tests: read, write, compare, whatever.

But, I cannot put a label on the disk.

partition> label
Ready to label disk, continue? y

Warning: Unable to get capacity. Cannot check geometry
partition> ^D

The drives don't auto-configure, and if I try to read any of the mode
pages, they all fail.

If the drives didn't have Sun identifiers in the label, I'd assume that
they were some sort of non-standard drive (like the blocks weren't 512
bytes or something). But that's not the case. Or if the drives didn't
spin and pass the analysis, I'd assume something mechanical failed.

However, all 7 simply fail to auto-configure, and fail to take labels.
I even stuck them in a sparc 5 to make sure the array wasn't faulty and
it had the same problems in the other machine.

Anyone ever heard of anything like this? Any particular trick that I
might be able to perform to start talking to these things? I'm just
having a hard time believing these drives all randomly failed in just
this manner.

Thanks!
--
Darren Dunham ddunham@taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >

#2
01-16-2008, 02:44 PM
Greg Menke

Darren Dunham <ddunham@redwood.taos.com> writes:

>
> If the drives didn't have Sun identifiers in the label, I'd assume that
> they were some sort of non-standard drive (like the blocks weren't 512
> bytes or something). But that's not the case. Or if the drives didn't
> spin and pass the analysis, I'd assume something mechanical failed.
>
> However, all 7 simply fail to auto-configure, and fail to take labels.
> I even stuck them in a sparc 5 to make sure the array wasn't faulty and
> it had the same problems in the other machine.
>
> Anyone ever heard of anything like this? Any particular trick that I
> might be able to perform to start talking to these things? I'm just
> having a hard time believing these drives all randomly failed in just
> this manner.


I have an 8 gig IBM drive out of a SGI labelled Clariion rack, same kind
of problem, if not exactly the same - it's been a while. I've heard the
occasional suggestion that SGI used a 1024 byte block because they
wanted to be evil, but I don't know how to confirm or not on this drive.
Bummer, because it was going to snazz up a 4 gig SS-10.

Gregm

#3
01-16-2008, 02:44 PM
John Burns

> I have an 8 gig IBM drive out of a SGI labelled Clariion rack, same kind
> of problem, if not exactly the same - it's been a while. I've heard the
> occasional suggestion that SGI used a 1024 byte block because they
> wanted to be evil, but I don't know how to confirm or not on this drive.


I don't think it's that simple. I'm sure we ran across this a few years
ago with some discs we took from a similar array. They have a weird
block size like 1048 or something, the extra is used for array control
information or some such.

You need to issue a SCSI mode select command to reset the block size. We
found an Adaptec 2940 in a PC would do this if we low level formatted
the drives using the card's utility software.

I think a lot of the things SGI did were evil. But this one wasn't their
own idea!

--
Who needs a life when you've got Unix? :-)
Email: john@unixnerd.demon.co.uk, John G.Burns B.Eng, Bonny Scotland
Web : http://www.unixnerd.demon.co.uk - The Ultimate BMW Homepage!
Need Sun or HP Unix kit? http://www.unixnerd.demon.co.uk/unix.html

#4
01-16-2008, 02:44 PM
Thomas Tornblom

You might be able to do a mode select on the disks and select 512 byte
blocks. You would likely need a program using "uscsi" for that.

Some disks require a low level format after this, and some disks can
be altered on the fly.

Here's such a hack I used to toy with this many years ago. Warning, I
take absolutely no responsibility if you manage to screw up some
important disk ;-)
---
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/scsi/scsi.h>
#include <sys/scsi/impl/uscsi.h>

char buf[100];
union scsi_cdb cdb;
struct uscsi_cmd cmd;

int
main(int argc, char *argv[])
{
    int fd;
    struct mode_header *mh;
    struct block_descriptor *bd;
    int bs;

    if (argc != 3) {
        fprintf(stderr, "Usage: %s <blocksize> <device>\n", argv[0]);
        exit(1);
    }

    if ((bs = strtol(argv[1], 0, 0)) <= 0) {
        fprintf(stderr, "Usage: %s <blocksize> <device>\n", argv[0]);
        exit(1);
    }

    if ((fd = open(argv[2], O_NDELAY)) < 0) {
        perror("open");
        exit(1);
    }

    /* buf holds the mode header followed by one block descriptor */
    mh = (struct mode_header *) buf;
    bd = (struct block_descriptor *) &buf[sizeof(*mh)];
    mh->bdesc_length = sizeof(*bd);

    /* MODE SENSE: fetch header + block descriptor (12 bytes) */
    cdb.scc_cmd = SCMD_MODE_SENSE;
    cdb.g0_count0 = 12;

    cmd.uscsi_flags = USCSI_READ | USCSI_DIAGNOSE;
    cmd.uscsi_cdb = (caddr_t) &cdb;
    cmd.uscsi_bufaddr = buf;
    cmd.uscsi_buflen = 12;
    cmd.uscsi_cdblen = 6;

    if (ioctl(fd, USCSICMD, &cmd) < 0) {
        perror("ioctl");
        exit(1);
    }
    /* the block size is a 24-bit big-endian field */
    printf("size %d\n", (bd->blksize_hi << 16) +
        (bd->blksize_mid << 8) + bd->blksize_lo);

    /* MODE SELECT: clear the header, then fill in the new block size */
    mh->length = 0;
    mh->bdesc_length = sizeof(*bd);
    mh->medium_type = 0;
    mh->device_specific = 0;

    bd->density_code = 0;
    bd->blks_hi = 0;
    bd->blks_mid = 0;
    bd->blks_lo = 0;
    bd->blksize_hi = (bs >> 16) & 0xff;
    bd->blksize_mid = (bs >> 8) & 0xff;
    bd->blksize_lo = bs & 0xff;

    cdb.scc_cmd = SCMD_MODE_SELECT;
    cdb.g0_addr2 = 1; /* save page */
    cdb.g0_count0 = 12;

    cmd.uscsi_flags = USCSI_WRITE | USCSI_DIAGNOSE;
    cmd.uscsi_cdb = (caddr_t) &cdb;
    cmd.uscsi_bufaddr = buf;
    cmd.uscsi_buflen = 12;
    cmd.uscsi_cdblen = 6;

    if (ioctl(fd, USCSICMD, &cmd) < 0) {
        perror("ioctl");
        exit(1);
    }

    exit(0);
}
---

No guarantee that it even works now...

Good luck!

Thomas

#5
01-16-2008, 02:44 PM
Greg Menke

Thomas Tornblom <thomas@Hax.SE.remove-to-reply> writes:


> You might be able to do a mode select on the disks and select 512 byte
> blocks. You would likely need a program using "uscsi" for that.
>
> Some disks require a low level format after this, and some disks can
> be altered on the fly.
>
> Here's such a hack I used to toy with this many years ago. Warning, I
> take absolutely no responsibility if you manage to screw up some
> important disk ;-)


Got it, thanks. I'll give it a whirl once the SS10 is back running.

Gregm

#6
01-16-2008, 02:44 PM
Sunny



Darren Dunham wrote:

> Anyone ever heard of anything like this? Any particular trick that I
> might be able to perform to start talking to these things? I'm just
> having a hard time believing these drives all randomly failed in just
> this manner.


I have a couple of scrounged multipaks I use for auxiliary storage on
PCs, and seem to recall encountering similar problems initially.

I didn't do much in the way of diagnostics, just formatted the drives on
the PC using Adaptec's SCSI BIOS, then they worked. Might be worth trying.

HTH

Sunny

#7
01-16-2008, 02:45 PM
Darren Dunham

In comp.unix.solaris John Burns <john@unixnerd.demon.co.uk> wrote:
>> I have an 8 gig IBM drive out of a SGI labelled Clariion rack, same kind
>> of problem, if not exactly the same - it's been a while. I've heard the
>> occasional suggestion that SGI used a 1024 byte block because they
>> wanted to be evil, but I don't know how to confirm or not on this drive.


> I don't think it's that simple. I'm sure we ran across this a few years
> ago with some discs we took from a similar array. They have a weird
> block size like 1048 or something, the extra is used for array control
> information or some such.


> You need to issue a SCSI mode select command to reset the block size. We
> found an Adaptec 2940 in a PC would do this if we low level formatted
> the drives using the card's utility software.


Yeah, these appear to be sun drives, so I don't think it's what's
happening.

I even dragged out sformat to see if it would help. It looks like there's
some sort of reservation on the disks. I have no idea how or why..

./sformat: I/O error. test unit ready: scsi sendcmd: retryable error
CDB: 00 00 00 00 00 00
status: 0x18 (RESERVATION CONFLICT)
cmd finished after 0.091s timeout 20s
Formatting ... ./sformat: I/O error. format unit: scsi sendcmd:
retryable error
CDB: 04 18 00 00 01 00
status: 0x18 (RESERVATION CONFLICT)
resid: 4
cmd finished after 0.096s timeout 13680s
done.
./sformat: I/O error. read capacity: scsi sendcmd: retryable error
CDB: 25 00 00 00 00 00 00 00 00 00
status: 0x18 (RESERVATION CONFLICT)
resid: 8
cmd finished after 0.099s timeout 20s
Condition not caught: capacity_not_set.
Raisecond: not implemented.
Abort (core dumped)

Hm....

--
Darren Dunham ddunham@taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >

#8
01-16-2008, 02:45 PM
Thomas Tornblom

That sounds like there is a Persistent Group Reservation on the disks.

The disks have probably been used in a cluster environment, and the
reservation needs to be "scrubbed".

Check:
http://forum.sun.com/thread.jspa?thr...357&tstart=240

Thomas

#9
01-16-2008, 02:45 PM
Joerg Schilling

In article <zu7Md.183$Hr.58@newssvr24.news.prodigy.net>,
Darren Dunham <ddunham@redwood.taos.com> wrote:

>I even dragged out sformat to see if it would help. It looks like there's
>some sort of reservation on the disks. I have no idea how or why..
>
>./sformat: I/O error. test unit ready: scsi sendcmd: retryable error
>CDB: 00 00 00 00 00 00
>status: 0x18 (RESERVATION CONFLICT)
>cmd finished after 0.091s timeout 20s
>Formatting ... ./sformat: I/O error. format unit: scsi sendcmd:
>retryable error
>CDB: 04 18 00 00 01 00
>status: 0x18 (RESERVATION CONFLICT)
>resid: 4
>cmd finished after 0.096s timeout 13680s


How many hosts did you connect to this drive simultaneously?

--
EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
js@cs.tu-berlin.de (uni)
schilling@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://www.fokus.fraunhofer.de/usr/schilling ftp://ftp.berlios.de/pub/schily

#10
01-16-2008, 02:46 PM
Darren Dunham

In comp.unix.solaris Thomas Tornblom <thomas@hax.se.remove-to-reply> wrote:
> That sounds like there is a Persistent Group Reservation on the disks.


> The disks have probably been used in a cluster environment, and the
> reservation needs to be "scrubbed".


Hmm. Actually, that may be a possibility.

I don't have the history of this array, but it's very likely an older
Sun Cluster was in use near it, so it could have been using the array.

The cluster is gone though...

> http://forum.sun.com/thread.jspa?thr...357&tstart=240


Hmm. Good for Sun Cluster 3.0 only I suppose?

--
Darren Dunham ddunham@taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >