Problem replacing disk in StorEdge T3 (a discussion in the Sun Solaris Hardware forum, part of the Solaris Operating System category)
At work we have a T3 where all disks are configured for RAID5. One of the disks has failed, which means that accessing the data on the T3 is really slow.

When I inserted the replacement disk, it seemed to be taken into use automatically ("proc list" showed some progress), but then it failed with a 0D status (see vol stat, fru stat etc. below).

I noticed that the new disk is not exactly the same model as the others. Could this be the reason? It is a proper replacement disk bought from Sun, with the proper bracket and everything, so it should work, shouldn't it? What can I do to fix this?

--
- Erlend Leganger

    T300 Release 1.17b 2001/05/31 17:47:22
    Copyright (C) 1997-2001 Sun Microsystems, Inc.
    All Rights Reserved.

    bigdaddy:/:<1>vol stat

    v0         u1d1   u1d2   u1d3   u1d4   u1d5   u1d6   u1d7   u1d8   u1d9
     mounted   0D     0      0      0      0      0      0      0      0

    bigdaddy:/:<2>fru list
    ID      TYPE                VENDOR       MODEL         REVISION    SERIAL
    ------  ------------------  -----------  ------------  ----------  --------
    u1ctr   controller card     SLR-MI       375-0084-02-  0210        022813
    u1d1    disk drive          SEAGATE      ST336605FSUN  A338        3FP0H63D
    u1d2    disk drive          SEAGATE      ST336704FSUN  A42D        3CD0VFBL
    u1d3    disk drive          SEAGATE      ST336704FSUN  A42D        3CD0T89W
    u1d4    disk drive          SEAGATE      ST336704FSUN  A42D        3CD0VCZ4
    u1d5    disk drive          SEAGATE      ST336704FSUN  A42D        3CD0VF5L
    u1d6    disk drive          SEAGATE      ST336704FSUN  A42D        3CD0TG33
    u1d7    disk drive          SEAGATE      ST336704FSUN  A42D        3CD0TT8G
    u1d8    disk drive          SEAGATE      ST336704FSUN  A42D        3CD0VD4T
    u1d9    disk drive          SEAGATE      ST336704FSUN  A42D        3CD0TXQF
    u1l1    loop card           SLR-MI       375-0085-01-  5.02 Flash  033179
    u1l2    loop card           SLR-MI       375-0085-01-  5.02 Flash  030038
    u1pcu1  power/cooling unit  TECTROL-CAN  300-1454-01(  0000        028800
    u1pcu2  power/cooling unit  TECTROL-CAN  300-1454-01(  0000        028799
    u1mpn   mid plane           SLR-MI       370-3990-01-  0000        021282

    bigdaddy:/:<3>fru stat
    CTLR    STATUS   STATE       ROLE        PARTNER  TEMP
    ------  -------  ----------  ----------  -------  ----
    u1ctr   ready    enabled     master      -        30.5

    DISK    STATUS   STATE       ROLE        PORT1      PORT2      TEMP  VOLUME
    ------  -------  ----------  ----------  ---------  ---------  ----  ------
    u1d1    ready    disabled    data disk   ready      ready      30    v0
    u1d2    ready    enabled     data disk   ready      ready      33    v0
    u1d3    ready    enabled     data disk   ready      ready      34    v0
    u1d4    ready    enabled     data disk   ready      ready      32    v0
    u1d5    ready    enabled     data disk   ready      ready      33    v0
    u1d6    ready    enabled     data disk   ready      ready      33    v0
    u1d7    ready    enabled     data disk   ready      ready      36    v0
    u1d8    ready    enabled     data disk   ready      ready      32    v0
    u1d9    ready    enabled     data disk   ready      ready      32    v0

    LOOP    STATUS   STATE       MODE        CABLE1     CABLE2     TEMP
    ------  -------  ----------  -------     ---------  ---------  ----
    u1l1    ready    enabled     master      -          -          27.0
    u1l2    ready    enabled     slave       -          -          27.5

    POWER   STATUS   STATE    SOURCE  OUTPUT  BATTERY  TEMP    FAN1    FAN2
    ------  -------  -------  ------  ------  -------  ------  ------  ------
    u1pcu1  ready    enabled  line    normal  fault    normal  normal  normal
    u1pcu2  ready    enabled  line    normal  fault    normal  normal  normal

    bigdaddy:/:<4>exit
    Connection closed by foreign host.
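A minimal Python sketch of reading the `vol stat` output above: it pairs each disk ID in the header row with the status code beneath it and reports every disk whose code is not 0. It assumes the two-row layout shown in this post (volume name, then disk IDs; mount state, then one code per disk), and the function name `failed_disks` is hypothetical. As the rest of the thread shows, a 0 here does not rule out problems on a disk, so treat this as a first check only.

```python
def failed_disks(header: str, status: str) -> dict:
    """Pair each disk ID in the `vol stat` header row with the code
    beneath it and return only the disks whose code is not "0"."""
    # Header row: volume name first, then one disk ID per column.
    disks = header.split()[1:]
    # Status row: mount state first, then one status code per disk.
    codes = status.split()[1:]
    if len(disks) != len(codes):
        raise ValueError("header and status rows have different column counts")
    return {disk: code for disk, code in zip(disks, codes) if code != "0"}


# The two rows from the `vol stat` output in this post.
header = "v0 u1d1 u1d2 u1d3 u1d4 u1d5 u1d6 u1d7 u1d8 u1d9"
status = "mounted 0D 0 0 0 0 0 0 0 0"
print(failed_disks(header, status))  # {'u1d1': '0D'}
```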
Erlend Leganger wrote:
> At work we have a T3 where all disks are configured for RAID5. One of
> the disks has failed, which means that accessing the data on the T3 is
> really slow.
>
> When I entered the replacement disk, it seemed to be taken in use
> automatically (proc list showed some progress), but then it failed with
> a 0D status (see vol stat, fru stat etc below).
>
> I noticed that the disk is not exactly the same as the other, could this
> be the reason? It is a proper replacement disk bought from Sun with the
> proper bracket and everything, so it should work or what? What can I do
> to fix this?

I'd say, complain to Sun.

Searching Google, I found the documentation from Seagate. Among other things, it lists this:

    ST336605: 29,549 cyl / 4 heads / 71,687,371 data blocks
    ST336704: 14,100 cyl / 12 heads / 71,687,369 data blocks

I don't know whether these differences are a problem in this case. Sun should be able to tell... Maybe the issue can be fixed with a firmware update on the new drive (or on all the old ones)?

--
Best regards,
Jeroen Besse
http://rblcheck.besse.nl/
(to contact me: the nospam address actually exists)
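The block counts quoted above can be compared directly. This short Python calculation assumes 512-byte blocks (the block size is not stated in the post) and shows that the ST336605 in u1d1 has two more data blocks than the ST336704 drives, i.e. the replacement is marginally larger, not smaller. That suggests raw capacity alone is unlikely to be the problem, although it says nothing about firmware compatibility.

```python
# Data-block counts from the Seagate documentation quoted above.
BLOCKS = {
    "ST336605": 71_687_371,  # model of the replacement disk in u1d1
    "ST336704": 71_687_369,  # model of the other eight disks
}
BLOCK_SIZE = 512  # bytes per block: an assumption, not stated in the post

diff = BLOCKS["ST336605"] - BLOCKS["ST336704"]
print(f"Replacement has {diff} more blocks ({diff * BLOCK_SIZE} bytes)")
for model, count in BLOCKS.items():
    print(f"{model}: {count * BLOCK_SIZE:,} bytes")
```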
You need to take a look at the syslog file right after the rebuild fails; there should be more information in there. I have had this happen before, where the rebuild failed because of a read error on another disk...
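Following the advice above, a copy of the syslog can be scanned for disk IDs that appear on error lines. This is a sketch only: the thread does not show the T3's syslog wording, so the keywords "error" and "fail" are assumptions, as is having the log saved to a text file on the host; the function name `disks_in_errors` is hypothetical. The disk-ID pattern follows the u1d1 to u1d9 names in the `fru list` output.

```python
import re
from collections import Counter

# Disk IDs as named in `fru list`: u<unit>d<disk>, e.g. u1d4.
DISK_ID = re.compile(r"\bu\d+d\d+\b")


def disks_in_errors(log_text: str, keywords=("error", "fail")) -> Counter:
    """Count how often each disk ID appears on log lines containing any
    of the given keywords (matched case-insensitively).

    The keywords are assumed; adjust them to the wording your T3 logs."""
    counts = Counter()
    for line in log_text.splitlines():
        lower = line.lower()
        if any(keyword in lower for keyword in keywords):
            counts.update(DISK_ID.findall(line))
    return counts
```

For example, `disks_in_errors(open("t3_syslog.txt").read()).most_common()` (the file name is made up) would list the disks mentioned most often on matching lines; a second disk showing up there is the kind of read error on another disk described above.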
1) Your boot firmware is very old.
2) Your disk firmware is way out of date.
3) The batteries in both your PCUs have expired.

The latest boot firmware is 1.18.04 and you're at 1.17b. That's at least 3 years out of date!

The latest disk firmware for the ST336605FSUN is A838.
The latest disk firmware for the ST336704FSUN is AE26.

If you're lucky, you'll be able to recover. The 'proc list' command will show whether the new disk is being reconstructed. Otherwise, hopefully you have a way to back up the data. If so, you can get the batteries replaced, upgrade all the firmware, reinitialize the volume and restore the data.
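A hedged sketch of checking the `fru list` revisions from the first post against the versions quoted in this reply. It only tests for inequality with the listed "latest" revision (it does not try to order revision strings), and the version table reflects this 2005 reply, not current releases; the names `outdated`, `DISKS`, `LATEST_DISK_FW` and `LATEST_BOOT_FW` are hypothetical.

```python
# Latest revisions as quoted in the reply above (October 2005).
LATEST_BOOT_FW = "1.18.04"
LATEST_DISK_FW = {"ST336605FSUN": "A838", "ST336704FSUN": "AE26"}

# (slot, model, revision) from the `fru list` output in the first post.
DISKS = [("u1d1", "ST336605FSUN", "A338")] + [
    (f"u1d{n}", "ST336704FSUN", "A42D") for n in range(2, 10)
]


def outdated(disks, latest):
    """Return (slot, installed, latest) for each disk whose revision
    differs from the latest known revision for its model.
    Models missing from `latest` are skipped."""
    return [
        (slot, rev, latest[model])
        for slot, model, rev in disks
        if model in latest and rev != latest[model]
    ]


boot_installed = "1.17b"
if boot_installed != LATEST_BOOT_FW:
    print(f"boot firmware: {boot_installed} -> {LATEST_BOOT_FW}")
for slot, rev, new in outdated(DISKS, LATEST_DISK_FW):
    print(f"{slot}: {rev} -> {new}")
```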
John S. wrote on 2005-10-06 13:48:
> You need to take a look at the syslog file right after the rebuild
> fails. There should be more information in there. I have had this
> happen before where the rebuild fails because of a read error on
> another disk...

Thanks for the tip. I have now learnt that the disk should be OK, so I will try this again tomorrow and watch the syslog as you suggest. I will be back with the result.

--
- Erlend Leganger
Trinean wrote on 2005-10-09 00:27:
> 1) Your boot firmware is very old.
> 2) Your disk firmware is way out of date.
> 3) Both the batteries in your PCUs are expired.
>
> The latest boot firmware is 1.18.04 and you're at 1.17b.
> That's at least 3 years out-of-date!
>
> The latest disk firmware for the ST336605FSUN is A838
> The latest disk firmware for the ST336704FSUN is AE26
>
> If you're lucky you'll be able to recover. The 'proc list' command will show
> if the new disk is being reconstructed to. Otherwise hopefully you have a
> way to backup the data. If so you can get the batteries replaced, upgrade
> all the firmware and reinitialize the volume and restore the data.
>

I guess this is what happens when you have a device that works OK: you just forget about it... The batteries have been replaced, though; we had ordered them in.

I was able to copy the data from the T3 to other disk areas on the server, so I'm OK with the files (I also have a tape backup made before it failed).

I haven't RTFM'd yet, but are there any tips I should be aware of when upgrading the boot and disk firmware? What should I do first? Where do I get hold of the firmware updates?

--
- Erlend Leganger
> I guess this is what happens when you have a device that works OK, you
> just forget about it... The batteries have been replaced though, we had
> ordered them in.

You have to do more than just replace the batteries, or the T3 won't know anything has changed. Commands need to be run to reset the dates back to zero so that the errors go away. This InfoDoc should explain the procedures:

http://www.sunshack.org/data/sh/2.1/...doc/46176.html

Also, per Sun, the batteries should now last 3 years instead of 2.

In the same patch you would use to upgrade the boot and disk firmware:

http://sunsolve.sun.com/pub-cgi/pdow...15-17&method=h

there is a T3extender program that will run commands to set the battery expiration life to 36 months instead of 24 months.

Good luck!
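The 24- and 36-month figures can be converted to the day counts the T3 prints. The `id read` output later in this thread shows a battery life span of 730 days, 12 hours before the change and 1095 days, 18 hours after it, which matches a 365.25-day year (24 × 365.25 / 12 = 730.5 days; 36 × 365.25 / 12 = 1095.75 days). The firmware's actual formula is not documented here, so the constant below is inferred from that output, not taken from Sun; the function name is hypothetical.

```python
# Inferred from the T3's own `id read` output in this thread, not from Sun docs.
DAYS_PER_YEAR = 365.25


def months_to_days_hours(months):
    """Convert a battery life in months to (whole days, remaining hours)."""
    days = months * DAYS_PER_YEAR / 12
    whole_days = int(days)
    hours = round((days - whole_days) * 24)
    return whole_days, hours


for months in (24, 36):
    days, hours = months_to_days_hours(months)
    print(f"{months} months = {days} days, {hours} hours")
# 24 months = 730 days, 12 hours
# 36 months = 1095 days, 18 hours
```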
John S. wrote on 2005-10-06 13:48:
> You need to take a look at the syslog file right after the rebuild
> fails. There should be more information in there. I have had this
> happen before where the rebuild fails because of a read error on
> another disk...
>

You were 100% correct. The warning light was lit on disk u1d1, so this disk was replaced and a rebuild was attempted. The rebuild failed after a while, with a note of multiple disk errors in the syslog; it seems that u1d4 has a problem as well. I was fooled by vol stat only showing an error on u1d1. I will check the syslog more carefully in the future.

--
- Erlend Leganger
Trinean wrote on 2005-10-10 00:01:
>
> This InfoDoc should explain the procedures:
>
> http://www.sunshack.org/data/sh/2.1/...doc/46176.html
>
> Also the batteries should now last 3 years instead of 2 years per Sun.
>
> In the same patch you would use to upgrade the boot and disk firmware:
>
> http://sunsolve.sun.com/pub-cgi/pdow...15-17&method=h

Excellent, thank you. I need to wait for my second replacement disk, but after reading up on the patch installation method, it doesn't seem too difficult to do.

> there is a T3extender program that will run commands to set the battery
> expiration life to 36 months instead of 24 months.

I had a look at the T3extender program code and decided that using this patch is extreme overkill for a small job (it is a long Perl script, and Perl itself is even included in the patch). I only ran two ".id write blife <pcu> 36" commands, which seem to do the trick (see below). Of course, if you have a room full of racks fully populated with T3s, the script would be handy...
--
- Erlend Leganger

    bigdaddy:/:<48>id read u1pcu1
    Revision             : 0000
    Manufacture Week     : 00442000
    Battery Install Week : 00412005
    Battery Life Used    : 0 days, 2 hours
    Battery Life Span    : 730 days, 12 hours
    Serial Number        : 028800
    Battery Warranty Date: 20051010082149
    Battery Internal Flag: 0x00000000
    Vendor ID            : TECTROL-CAN
    Model ID             : 300-1454-01(50)
    bigdaddy:/:<49>id read u1pcu2
    Revision             : 0000
    Manufacture Week     : 00442000
    Battery Install Week : 00412005
    Battery Life Used    : 0 days, 2 hours
    Battery Life Span    : 730 days, 12 hours
    Serial Number        : 028799
    Battery Warranty Date: 20051010082152
    Battery Internal Flag: 0x00000000
    Vendor ID            : TECTROL-CAN
    Model ID             : 300-1454-01(50)
    bigdaddy:/:<50>.id write blife u1pcu1 36
    bigdaddy:/:<51>.id write blife u1pcu2 36
    bigdaddy:/:<52>id read u1pcu1
    Revision             : 0000
    Manufacture Week     : 00442000
    Battery Install Week : 00412005
    Battery Life Used    : 0 days, 2 hours
    Battery Life Span    : 1095 days, 18 hours
    Serial Number        : 028800
    Battery Warranty Date: 20051010082149
    Battery Internal Flag: 0x00000000
    Vendor ID            : TECTROL-CAN
    Model ID             : 300-1454-01(50)
    bigdaddy:/:<53>id read u1pcu2
    Revision             : 0000
    Manufacture Week     : 00442000
    Battery Install Week : 00412005
    Battery Life Used    : 0 days, 2 hours
    Battery Life Span    : 1095 days, 18 hours
    Serial Number        : 028799
    Battery Warranty Date: 20051010082152
    Battery Internal Flag: 0x00000000
    Vendor ID            : TECTROL-CAN
    Model ID             : 300-1454-01(50)
    bigdaddy:/:<54>
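For the room-full-of-T3s case mentioned above, the commands can be generated rather than typed. This hypothetical Python helper only builds the command text, following the `.id write blife <pcu> 36` form used in the session above and the u<N>pcu<M> names from `fru list`. It does not connect to anything, and units beyond u1 are an assumption (the array in this thread has only u1).

```python
def blife_commands(units: int, months: int = 36) -> list:
    """Build one `.id write blife` line for each PCU (pcu1, pcu2) of
    units u1..u<units>. Nothing is sent to the array; the output is
    meant to be pasted into each T3's own session."""
    return [
        f".id write blife u{unit}pcu{pcu} {months}"
        for unit in range(1, units + 1)
        for pcu in (1, 2)
    ]


# Hypothetical two-unit array; the array in this thread has only u1.
for cmd in blife_commands(2):
    print(cmd)
```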
|
|