01-16-2008, 10:20 AM
Darren Dunham
 
Re: Weird missing disk fsck boot problem

In comp.unix.solaris Tom <the.worlok@gmail.com> wrote:
> I have a server which we use for NFS, SMB, and CVS. It is an E250
> connected to an external A1000 RAID box. The setup has been VERY
> reliable for the last 7 or so years. I have to say that this has been
> an exceptionally stable platform.


> The internal disks are mirrored for failover and the external RAID box
> houses the cvs depot, nfs and smb shares.... I use soft partitions
> for the filesystems (Solaris Volume Manager). I simply grow a soft
> partition and the particular filesystem when I need to.


> It seems to me that volume manager is referencing c2 while the RAID
> array is now, for some reason, c3. Perhaps the battery change did
> this? I didn't. If so it still doesn't explain why it manifested
> AFTER the OS patches and not merely AFTER the controller battery
> change or drive change...... That one has me scratching my head.


I've seen that on A1000s in the past. I've even seen a difference
between the controller reported by 'df' and the controller reported by
'format'. That can be a bit worrying....

> 2 weeks ago I replaced the controller battery in the RAID array, as
> well as an internal failed disk in the RAID array device. It uses one
> disk as a hot spare, and when I replaced the faulty disk with a new
> one it did what it does and all was/is well with that. (or so I
> thought?)


Should have been.

> This weekend I patched the machine with the latest 9_Recommended in
> order to get it up to date and to get the DST patch. The patch bundle
> overwrote my inetd.conf, but I think I fixed that okay and now my CVS
> server and SWAT once again start up on boot.


> The problem is that for every boot, the boot process is paused for
> what the system says is a missing disk device. This is odd b/c it
> wasn't happening prior to the 9_Recommended patch and once the machine
> is up everything looks fine and all of the expected filesystems are
> present and mounted. Nothing that it is searching for is on the
> vfstab.


> The /var/adm/messages file complains about this:


> Mar 11 17:40:53 localhostname fsck: [ID 309067 daemon.error]
> localhostname: /dev/rdsk/c2t10d0s6: No such device or address


> ????


Can you post the contents of /etc/vfstab?
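
What I'd be looking for there is any line that still names the old c2 device, either as the device to mount or as the device to fsck. A rough Python sketch of that check follows. It assumes the standard seven-field vfstab layout; the sample text and the function name `vfstab_entries_for` are made up for illustration and are not taken from your system.

```python
def vfstab_entries_for(vfstab_text, disk):
    """Return (mount device, fsck device, mount point) for rows naming `disk`."""
    hits = []
    for line in vfstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 7:
            continue  # not a complete vfstab row
        mount_dev, fsck_dev, mount_point = fields[0], fields[1], fields[2]
        if disk in mount_dev or disk in fsck_dev:
            hits.append((mount_dev, fsck_dev, mount_point))
    return hits


# Hypothetical sample, NOT the poster's real vfstab.
SAMPLE_VFSTAB = """\
#device             device              mount    FS    fsck  mount    mount
#to mount           to fsck             point    type  pass  at boot  options
/dev/md/dsk/d0      /dev/md/rdsk/d0     /        ufs   1     no       -
/dev/dsk/c2t10d0s6  /dev/rdsk/c2t10d0s6 /export  ufs   2     yes      -
"""

for entry in vfstab_entries_for(SAMPLE_VFSTAB, "c2t10d0"):
    print(entry)
```

On the real machine you would pass in the contents of /etc/vfstab instead of the sample string.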

> ....and in the device.tab this is referenced but again I can't
> remember why it would be there and all of the expected partitions are
> there.


> disk9:/dev/rdsk/c2t10d0s2:/dev/dsk/c2t10d0s2::desc="Disk Drive"
> type="disk" part="true" removable="false" capacity="212381696"
> dpartlist="dpart900,dpart906"


> An observation which may or may not be pertinent, (I think it is) is
> that the RAID array controller calls itself disk, c3t10d0s2, which
> except for one number is very close to the c2t10d0s2 that the system
> is complaining about. Perhaps it changed its name when I replaced
> the controller battery? What discourages me from that notion is that
> if that were so then this boot behavior should have manifested 2 weeks
> ago, not just after this weekend's patch install. ??


I doubt the battery had anything to do with it.
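
For what it's worth, the two names do differ only in the controller number. A small Python sketch that makes the comparison explicit, assuming the usual cNtNdN[sN] naming; the helper names `parse_ctds` and `differing_fields` are my own, not part of any Solaris tool.

```python
import re

# Matches names such as c2t10d0s6 or c0t8d0, optionally preceded by a /dev path.
CTDS = re.compile(r"c(\d+)t(\d+)d(\d+)(?:s(\d+))?$")


def parse_ctds(name):
    """Split a cNtNdN[sN] device name into its numeric parts."""
    m = CTDS.search(name)
    if m is None:
        raise ValueError("not a cNtNdN[sN] name: %r" % name)
    c, t, d, s = m.groups()
    return {
        "controller": int(c),
        "target": int(t),
        "disk": int(d),
        "slice": None if s is None else int(s),
    }


def differing_fields(a, b):
    """List which parts of two device names are different."""
    pa, pb = parse_ctds(a), parse_ctds(b)
    return [key for key in pa if pa[key] != pb[key]]


# The device fsck complains about vs. the same slice on the A1000 as format sees it.
print(differing_fields("/dev/rdsk/c2t10d0s6", "c3t10d0s6"))  # ['controller']
```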


> These are my disks in format:


> AVAILABLE DISK SELECTIONS:
>        0. c0t0d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
>           /pci@1f,4000/scsi@3/sd@0,0
>        1. c0t8d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
>           /pci@1f,4000/scsi@3/sd@8,0
>        2. c0t9d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
>           /pci@1f,4000/scsi@3/sd@9,0
>        3. c0t10d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
>           /pci@1f,4000/scsi@3/sd@a,0
>        4. c0t11d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
>           /pci@1f,4000/scsi@3/sd@b,0
>        5. c0t12d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
>           /pci@1f,4000/scsi@3/sd@c,0
>        6. c3t10d0 <Symbios-StorEDGEA1000-0301 cyl 65533 alt 2 hd 64 sec 169>
>           /pseudo/rdnexus@3/rdriver@a,0


> As you can see, s6 is where I keep my important data:


> Current partition table (original):
> Total disk cylinders available: 65533 + 2 (reserved cylinders)



> Part      Tag    Flag     Cylinders         Size            Blocks
>   0       root    wm       0                0         (0/0/0)             0
>   1 unassigned    wm       0                0         (0/0/0)             0
>   2     backup    wu       0 - 65532      337.98GB    (65533/0/0) 708804928
>   3 unassigned    wm       0                0         (0/0/0)             0
>   4 unassigned    wm       0                0         (0/0/0)             0
>   5 unassigned    wm       0                0         (0/0/0)             0
>   6        usr    wm       0 - 65531      337.98GB    (65532/0/0) 708794112
>   7 unassigned    wm   65532 - 65532        5.28MB    (1/0/0)         10816


> ========


> metastat shows this:


> # metastat |grep c2t10d0


> Device: c2t10d0s6
> c2t10d0s6 10816 No Yes
> Device: c2t10d0s6
> c2t10d0s6 10816 No Yes
> c2t10d0 Yes id1,iver@w600a0b80000a5ed90000000f3d94158d



> grepping c3 shows nothing........


--
Darren Dunham ddunham@taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >