This is a discussion on Array full of dead drives. within the Sun Solaris Hardware forums, part of the Solaris Operating System category.
I've managed to scavenge a 711 multipak (12 drive version) with some Sun label IBM 9GB drives. For a while I thought the array was damaged (one of the scsi cables had a bent pin), but now I'm coming to the conclusion that instead all 7 drives are dead in the same way.

I grabbed an old 2G drive out of a sparc 5. It works in the array just fine. All 7 of the IBM drives behave identically. iostat -En or other inquiry attempts show the drive and product codes...

c0t13d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: IBM Product: DDRS39130SUN9.0G Revision: S98E Serial No: 478806
Size: 9.06GB <9055065600 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

If I use the 'analysis' function in format, I have no problems doing any of the tests: read, write, compare, whatever. But, I cannot put a label on the disk.

partition> label
Ready to label disk, continue? y

Warning: Unable to get capacity. Cannot check geometry

partition> ^D

The drives don't auto-configure, and if I try to read any of the mode pages, they all fail.

If the drives didn't have Sun identifiers in the label, I'd assume that they were some sort of non-standard drive (like the blocks weren't 512 bytes or something). But that's not the case. Or if the drives didn't spin and pass the analysis, I'd assume something mechanical failed.

However, all 7 simply fail to auto-configure, and fail to take labels. I even stuck them in a sparc 5 to make sure the array wasn't faulty and it had the same problems in the other machine.

Anyone ever heard of anything like this? Any particular trick that I might be able to perform to start talking to these things? I'm just having a hard time believing these drives all randomly failed in just this manner.

Thanks!

--
Darren Dunham                                           ddunham@taos.com
Senior Technical Consultant         TAOS            http://www.taos.com/
Got some Dr Pepper?                           San Francisco, CA bay area
         < This line left intentionally blank to confuse you. >
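For readers wondering what "Unable to get capacity" refers to: a host normally learns a disk's size from the 8-byte reply to the SCSI READ CAPACITY(10) command, which carries the last logical block address and the block length, both big-endian. The following is only a minimal sketch of decoding that reply. The struct and function names are made up for illustration, and nothing here talks to a real device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the two fields of a READ CAPACITY(10) reply. */
struct capacity {
    uint32_t last_lba;   /* address of the last block, not the block count */
    uint32_t block_len;  /* bytes per block */
};

/* Decode a 32-bit big-endian value. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Split the 8-byte reply into last LBA (bytes 0-3) and block length (bytes 4-7). */
static struct capacity parse_read_capacity10(const uint8_t *reply, size_t len)
{
    struct capacity c;

    assert(reply != NULL && len >= 8);
    c.last_lba = be32(reply);
    c.block_len = be32(reply + 4);
    return c;
}

/* Total size in bytes: (last LBA + 1) blocks of block_len bytes each. */
static uint64_t capacity_bytes(struct capacity c)
{
    return ((uint64_t)c.last_lba + 1) * c.block_len;
}
```

If the drives really use 512-byte blocks (an assumption, since that is exactly what the thread is trying to establish), the 9055065600 bytes reported by iostat would correspond to a last LBA of 17685674. When the command itself fails, there is no reply to decode at all, and the host cannot check the geometry.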
| |||
Darren Dunham <ddunham@redwood.taos.com> writes:

> If the drives didn't have Sun identifiers in the label, I'd assume that
> they were some sort of non-standard drive (like the blocks weren't 512
> bytes or something). But that's not the case. Or if the drives didn't
> spin and pass the analysis, I'd assume something mechanical failed.
>
> However, all 7 simply fail to auto-configure, and fail to take labels.
> I even stuck them in a sparc 5 to make sure the array wasn't faulty and
> it had the same problems in the other machine.
>
> Anyone ever heard of anything like this? Any particular trick that I
> might be able to perform to start talking to these things? I'm just
> having a hard time believing these drives all randomly failed in just
> this manner.

I have an 8 gig IBM drive out of an SGI-labelled Clariion rack with the same kind of problem, if not exactly the same; it's been a while. I've heard the occasional suggestion that SGI used a 1024 byte block because they wanted to be evil, but I don't know how to confirm that one way or the other on this drive. Bummer, because it was going to snazz up a 4 gig SS-10.

Gregm
| |||
> I have an 8 gig IBM drive out of a SGI labelled Clariion rack, same kind
> of problem, if not exactly the same- it's been a while. I've heard the
> occasional suggestion that SGI used a 1024 byte block because they
> wanted to be evil, but I don't know how to confirm or not on this drive.

I don't think it's that simple. I'm sure we ran across this a few years ago with some discs we took from a similar array. They have a weird block size like 1048 or something; the extra is used for array control information or some such.

You need to issue a SCSI mode select command to reset the block size. We found an Adaptec 2940 in a PC would do this if we low level formatted the drives using the card's utility software.

I think a lot of the things SGI did were evil. But this one wasn't their own idea!

--
Who needs a life when you've got Unix? :-)
Email: john@unixnerd.demon.co.uk, John G.Burns B.Eng, Bonny Scotland
Web : http://www.unixnerd.demon.co.uk - The Ultimate BMW Homepage!
Need Sun or HP Unix kit? http://www.unixnerd.demon.co.uk/unix.html
| |||
You might be able to do a mode select on the disks and select 512 byte blocks. You would likely need a program using "uscsi" for that.

Some disks require a low level format after this, and some disks can be altered on the fly.

Here's such a hack I used to toy with this many years ago. Warning, I take absolutely no responsibility if you manage to screw up some important disk ;-)

---
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/scsi/scsi.h>
#include <sys/scsi/impl/uscsi.h>

char buf[100];
union scsi_cdb cdb;
struct uscsi_cmd cmd;

int
main(int argc, char *argv[])
{
    int fd;
    struct mode_header *mh;
    struct block_descriptor *bd;
    int bs;

    if (argc != 3) {
        fprintf(stderr, "Usage: %s <blocksize> <device>\n", argv[0]);
        exit(1);
    }
    if ((bs = strtol(argv[1], 0, 0)) == 0) {
        fprintf(stderr, "Usage: %s <blocksize> <device>\n", argv[0]);
        exit(1);
    }
    if ((fd = open(argv[2], O_NDELAY)) < 0) {
        perror("open");
        exit(1);
    }

    /* MODE SENSE: fetch the 4-byte header plus one 8-byte block descriptor */
    mh = (struct mode_header *) buf;
    bd = (struct block_descriptor *) &buf[sizeof(*mh)];
    mh->bdesc_length = sizeof(*bd);
    cdb.scc_cmd = SCMD_MODE_SENSE;
    cdb.g0_count0 = 12;
    cmd.uscsi_flags = USCSI_READ | USCSI_DIAGNOSE;
    cmd.uscsi_cdb = (caddr_t) &cdb;
    cmd.uscsi_bufaddr = buf;
    cmd.uscsi_buflen = 12;
    cmd.uscsi_cdblen = 6;
    if (ioctl(fd, USCSICMD, &cmd) < 0) {
        perror("ioctl");
        exit(1);
    }
    /* report the current block size (all three bytes) */
    printf("size %d\n",
        (bd->blksize_hi << 16) + (bd->blksize_mid << 8) + bd->blksize_lo);

    /* MODE SELECT: send the same 12 bytes back with the new block size */
    mh->length = 0;             /* must be zero for MODE SELECT */
    mh->bdesc_length = sizeof(*bd);
    mh->medium_type = 0;
    mh->device_specific = 0;
    bd->density_code = 0;
    bd->blks_hi = 0;            /* block count 0: apply to the whole disk */
    bd->blks_mid = 0;
    bd->blks_lo = 0;
    bd->blksize_hi = bs >> 16;
    bd->blksize_mid = (bs >> 8) & 0xff;
    bd->blksize_lo = bs & 0xff; /* was written to density_code by mistake */
    cdb.scc_cmd = SCMD_MODE_SELECT;
    cdb.g0_addr2 = 1;           /* save page */
    cdb.g0_count0 = 12;
    cmd.uscsi_flags = USCSI_WRITE | USCSI_DIAGNOSE;
    cmd.uscsi_cdb = (caddr_t) &cdb;
    cmd.uscsi_bufaddr = buf;
    cmd.uscsi_buflen = 12;
    cmd.uscsi_cdblen = 6;
    if (ioctl(fd, USCSICMD, &cmd) < 0) {
        perror("ioctl");
        exit(1);
    }
    exit(0);
}
---

No guarantee that it even works now...

Good luck!

Thomas
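For anyone who wants to see what a block-size MODE SELECT(6) puts on the wire without Solaris headers, here is a minimal, portable sketch of the 12-byte parameter list: a 4-byte mode parameter header followed by one 8-byte block descriptor. The function and macro names are hypothetical and chosen only for this illustration. It builds and reads byte buffers only and sends nothing to a disk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODE_HDR_LEN   4  /* mode parameter header used by 6-byte commands */
#define BLOCK_DESC_LEN 8  /* one block descriptor */
#define PARAM_LIST_LEN (MODE_HDR_LEN + BLOCK_DESC_LEN)

/*
 * Fill out[] with a MODE SELECT(6) parameter list asking for block_size
 * bytes per block. Header byte 3 is the block descriptor length; the
 * block length sits in the last three bytes of the descriptor.
 */
static void build_block_size_params(uint8_t out[PARAM_LIST_LEN],
                                    uint32_t block_size)
{
    assert(block_size > 0 && block_size <= 0xFFFFFF);
    memset(out, 0, PARAM_LIST_LEN);   /* mode data length, medium type,
                                         density code, block count: all 0 */
    out[3]  = BLOCK_DESC_LEN;
    out[9]  = (uint8_t)((block_size >> 16) & 0xFF);
    out[10] = (uint8_t)((block_size >> 8) & 0xFF);
    out[11] = (uint8_t)(block_size & 0xFF);
}

/* Read the block length out of a MODE SENSE(6) reply with one descriptor. */
static uint32_t parse_block_size(const uint8_t *reply, size_t len)
{
    assert(reply != NULL && len >= PARAM_LIST_LEN);
    assert(reply[3] >= BLOCK_DESC_LEN);
    return ((uint32_t)reply[9] << 16) | ((uint32_t)reply[10] << 8) |
            (uint32_t)reply[11];
}
```

Calling build_block_size_params(buf, 512) leaves 0x00 0x02 0x00 in the last three bytes. The low byte of the block length matters as soon as the old or new size is not a multiple of 256, which would be the case for a size like the 1048 mentioned earlier in the thread.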
| |||
Thomas Tornblom <thomas@Hax.SE.remove-to-reply> writes:

> You might be able to do a mode select on the disks and select 512 byte
> blocks. You would likely need a program using "uscsi" for that.
>
> Some disks require a low level format after this, and some disks can
> be altered on the fly.
>
> Here's such a hack I used to toy with this many years ago. Warning, I
> take absolutely no responsibility if you manage to screw up some
> important disk ;-)

Got it, thanks. I'll give it a whirl once the SS10 is back running.

Gregm
| |||
Darren Dunham wrote:

> Anyone ever heard of anything like this? Any particular trick that I
> might be able to perform to start talking to these things? I'm just
> having a hard time believing these drives all randomly failed in just
> this manner.

I have a couple of scrounged multipaks I use for auxiliary storage on PCs, and seem to recall encountering similar problems initially. I didn't do much in the way of diagnostics, just formatted the drives on the PC using Adaptec's SCSI BIOS, then they worked. Might be worth trying.

HTH
Sunny
| |||
In comp.unix.solaris John Burns <john@unixnerd.demon.co.uk> wrote:

>> I have an 8 gig IBM drive out of a SGI labelled Clariion rack, same kind
>> of problem, if not exactly the same- it's been a while. I've heard the
>> occasional suggestion that SGI used a 1024 byte block because they
>> wanted to be evil, but I don't know how to confirm or not on this drive.

> I don't think it's that simple. I'm sure we ran across this a few years
> ago with some discs we took from a similar array. They have a weird
> block size like 1048 or something, the extra is used for array control
> information or some such.

> You need to issue a SCSI mode select command to reset the block size. We
> found an Adaptec 2940 in a PC would do this if we low level formatted
> the drives using the card's utility software.

Yeah, these appear to be Sun drives, so I don't think that's what's happening.

I even dragged out sformat to see if it would help. It looks like there's some sort of reservation on the disks. I have no idea how or why...

./sformat: I/O error. test unit ready: scsi sendcmd: retryable error
CDB: 00 00 00 00 00 00
status: 0x18 (RESERVATION CONFLICT)
cmd finished after 0.091s timeout 20s
Formatting ... ./sformat: I/O error. format unit: scsi sendcmd: retryable error
CDB: 04 18 00 00 01 00
status: 0x18 (RESERVATION CONFLICT)
resid: 4
cmd finished after 0.096s timeout 13680s
done.
./sformat: I/O error. read capacity: scsi sendcmd: retryable error
CDB: 25 00 00 00 00 00 00 00 00 00
status: 0x18 (RESERVATION CONFLICT)
resid: 8
cmd finished after 0.099s timeout 20s
Condition not caught: capacity_not_set.
Raisecond: not implemented.
Abort (core dumped)

Hm....

--
Darren Dunham
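As a reading aid for the sformat log above, here is a small sketch that maps the first CDB byte (the opcode) and the status byte to their standard SCSI names. The tables are a deliberately short, illustrative subset and the function names are hypothetical; only the numeric values come from the SCSI command set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Name for a SCSI status byte (illustrative subset). */
static const char *scsi_status_name(uint8_t status)
{
    switch (status) {
    case 0x00: return "GOOD";
    case 0x02: return "CHECK CONDITION";
    case 0x08: return "BUSY";
    case 0x18: return "RESERVATION CONFLICT";
    default:   return "UNKNOWN STATUS";
    }
}

/* Name for the command carried in a CDB, taken from its first byte. */
static const char *scsi_opcode_name(const uint8_t *cdb, size_t cdb_len)
{
    assert(cdb != NULL && cdb_len > 0);
    switch (cdb[0]) {
    case 0x00: return "TEST UNIT READY";
    case 0x04: return "FORMAT UNIT";
    case 0x15: return "MODE SELECT(6)";
    case 0x16: return "RESERVE(6)";
    case 0x17: return "RELEASE(6)";
    case 0x1A: return "MODE SENSE(6)";
    case 0x25: return "READ CAPACITY(10)";
    default:   return "UNKNOWN COMMAND";
    }
}

/* True if the CDB carries the command with the given name. */
static int cdb_is(const uint8_t *cdb, size_t cdb_len, const char *name)
{
    return strcmp(scsi_opcode_name(cdb, cdb_len), name) == 0;
}
```

Fed the three CDBs from the log, this yields TEST UNIT READY, FORMAT UNIT and READ CAPACITY(10), each paired with status 0x18. The failed READ CAPACITY fits the "Unable to get capacity" warning from format in the first post.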
| |||
That sounds like there is a Persistent Group Reservation on the disks. The disks have probably been used in a cluster environment, and the reservation needs to be "scrubbed".

Check:
http://forum.sun.com/thread.jspa?thr...357&tstart=240

Thomas
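If it is a persistent reservation, one way to look before scrubbing would be the SCSI-3 PERSISTENT RESERVE IN command, which only reads the registered keys or the current reservation. Below is a minimal sketch of building that 10-byte CDB. The function and macro names are hypothetical, the thread does not establish whether these particular drives implement the command, and actually sending it (for example through uscsi) is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PR_IN_OPCODE            0x5E  /* PERSISTENT RESERVE IN */
#define PR_IN_READ_KEYS         0x00  /* service action: list registered keys */
#define PR_IN_READ_RESERVATION  0x01  /* service action: show the reservation */
#define PR_IN_CDB_LEN           10

/*
 * Build a PERSISTENT RESERVE IN CDB. The service action goes in the low
 * five bits of byte 1; the allocation length (size of the reply buffer)
 * goes big-endian in bytes 7 and 8. All other bytes stay zero.
 */
static void build_pr_in_cdb(uint8_t cdb[PR_IN_CDB_LEN],
                            uint8_t service_action, uint16_t alloc_len)
{
    assert(service_action <= 0x1F);
    memset(cdb, 0, PR_IN_CDB_LEN);
    cdb[0] = PR_IN_OPCODE;
    cdb[1] = (uint8_t)(service_action & 0x1F);
    cdb[7] = (uint8_t)(alloc_len >> 8);
    cdb[8] = (uint8_t)(alloc_len & 0xFF);
}
```

A call such as build_pr_in_cdb(cdb, PR_IN_READ_KEYS, 256) produces a read-only query. Clearing a reservation is a separate, destructive PERSISTENT RESERVE OUT operation and is not sketched here.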
| |||
In article <zu7Md.183$Hr.58@newssvr24.news.prodigy.net>, Darren Dunham <ddunham@redwood.taos.com> wrote:

>I even dragged out sformat to see if it would help. It looks like there's
>some sort of reservation on the disks. I have no idea how or why..
>
>./sformat: I/O error. test unit ready: scsi sendcmd: retryable error
>CDB: 00 00 00 00 00 00
>status: 0x18 (RESERVATION CONFLICT)
>cmd finished after 0.091s timeout 20s
>Formatting ... ./sformat: I/O error. format unit: scsi sendcmd:
>retryable error
>CDB: 04 18 00 00 01 00
>status: 0x18 (RESERVATION CONFLICT)
>resid: 4
>cmd finished after 0.096s timeout 13680s

How many hosts did you connect to this drive simultaneously?

--
EMail:joerg@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
      js@cs.tu-berlin.de                (uni)
      schilling@fokus.fraunhofer.de     (work) Blog: http://schily.blogspot.com/
URL:  http://www.fokus.fraunhofer.de/usr/schilling ftp://ftp.berlios.de/pub/schily
| ||||
In comp.unix.solaris Thomas Tornblom <thomas@hax.se.remove-to-reply> wrote:

> That sounds like there is a Persistent Group Reservation on the disks.
> The disks have probably been used in a cluster environment, and the
> reservation needs to be "scrubbed".

Hmm. Actually, that may be a possibility. I don't have the history of this array, but it's very likely an older Sun Cluster was in use near it, so it could have been using the array. The cluster is gone though...

> http://forum.sun.com/thread.jspa?thr...357&tstart=240

Hmm. Good for Sun Cluster 3.0 only I suppose?

--
Darren Dunham