To SAN 4.3 or not, that's the question... (a discussion in the Sun Solaris Administration forums, part of the Solaris Operating System category)
Do I want to install Sun's StorEdge SAN 4.3 software on an Ultra 2/2300 running Solaris 9, connected to a pair of Sun A5000 boxes and a DEC/HP/Compaq HSG80 RAID controller via three separate JNI64-1063 HBA controllers? What will I gain by installing that package? Will things be more stable?

Btw, there definitely seems to be an OS-crashing bug in Solaris 9 when some FC disks in the A5000 generate errors or fail. Typically the machine crashes after a series of warnings from the disks, and this is not machine-related, since we got exactly the same behaviour when we tried to use Sun's SOCAL adapters in a Sun Enterprise 3000 server...

  BAD TRAP: type=31 rp=2a10004b060 addr=300aff3db48 mmu_fsr=0
  sched: trap type = 0x31
  addr=0x300aff3db48
  pid=0, pc=0x116e790, sp=0x2a10004a901, tstate=0x4480001600, context=0x0
  g1-g7: 14ba800, 0, bb, 10, 30000305880, 10, 2a10004bd40

  000002a10004ad90 unix:die+a4 (31, 2a10004b060, 300aff3db48, 0, 16, 0)
    %l0-3: 0000000000000000 00000000aaab4fa4 000002a10004b060 000002a10004af58
    %l4-7: 0000000000000031 0000030004f64ec8 0000000001171dd0 0000000001171d10
  000002a10004ae70 unix:trap+874 (2a10004b060, 0, 10000, 10200, 300, 0)
    %l0-3: 0000000000000001 0000000000000000 0000000001437b28 0000000000000031
    %l4-7: 0000000000000006 0000000000000001 0000000000000000 0000000000000000
  000002a10004afb0 unix:ktl0+48 (3, 2a10004bd40, 1490c00, 0, aabb50ec, 0)
    %l0-3: 0000000000000001 0000000000001400 0000004480001600 000000000102bf54
    %l4-7: 000003000000e930 000003000000e958 0000000000000005 000002a10004b060
  000002a10004b100 SUNW,UltraSPARC-II:bcopy+1564 (aaab4fa4, 30005488ba4, 2, 10, 100c4ec, 0)
    %l0-3: 0000000000000000 0000030001b8b728 0000030001b8b72c 0000030001b8b188
    %l4-7: 0000030001ad3e70 0000030001b8b72c 0000000000000000 0000030001abc000
  000002a10004b300 fcaw:fca_cmd_complete+428 (30005488060, 12, 30005488000, 30002235e18, 3000549d108, 30001abc000)
    %l0-3: 0000030005488b98 0000000000000014 0000000000000012 0000000000000000
    %l4-7: 0000000000000004 0000030005488060 0000000000010000 0000030004f18088
  000002a10004b450 fcaw:fca_highintr+753c (3000027d238, b, 0, 1, c0000000, 30001abc000)
    %l0-3: 0000030005488060 0000000000000028 0000000000017e3c 0000000000000028
    %l4-7: 0000030001abc000 0000000000057f08 0000000000000000 0000000000000000
  000002a10004ba90 sbus:sbus_intr_wrapper+28 (3000006a5e8, 7db, 1400000, 2a10004bd40, fb60, 11f5e28)
    %l0-3: 0000000001219df8 00000300003d4618 0000000000000000 000000000142d7c0
    %l4-7: 000003000005bbf8 0000000000000005 0000000000000000 0000030000210106

--
Peter Eriksson <peter@ifm.liu.se>            Phone:    +46 13 28 2786
Computer Systems Manager/BOFH                Cell/GSM: +46 705 18 2786
Physics Department, Linköping University     Room:     Building F, F203
Peter Eriksson <peter@ifm.liu.se> wrote:
> Btw, there definitely seems to be a OS-crashing bug in Solaris 9
> when some FC disks in the A5000 generate errors/fails - typically
> the machine crashes after a series of warnings from the
> BAD TRAP: type=31 rp=2a10004b060 addr=300aff3db48 mmu_fsr=0
> 000002a10004b300 fcaw:fca_cmd_complete+428 (30005488060, 12, 30005488000, 30002235e18, 3000549d108, 30001abc000)

FCAW is the third-party driver for the JNI cards. This isn't a Solaris bug; it's a JNI bug. If you're running an old version of the JNI driver I'd suggest upgrading: older versions of this driver are very well known for bringing machines down like this...

Scott
Scott Howard <scott@hunterlink.net.au> writes:
>Peter Eriksson <peter@ifm.liu.se> wrote:
>> Btw, there definitely seems to be a OS-crashing bug in Solaris 9
>> when some FC disks in the A5000 generate errors/fails - typically
>> the machine crashes after a series of warnings from the
>> BAD TRAP: type=31 rp=2a10004b060 addr=300aff3db48 mmu_fsr=0
>> 000002a10004b300 fcaw:fca_cmd_complete+428 (30005488060, 12, 30005488000, 30002235e18, 3000549d108, 30001abc000)
>FCAW is the 3rd party driver for the JNI cards. This isn't a Solaris
>bug, it's a JNI bug. If you're running an old version of the JNI driver
>I'd suggest upgrading - older version of this driver are very well known
>for bringing machines down like this...

I *seriously* doubt that it is a JNI driver problem (we run the latest driver version, btw), since we used to have *exactly* the same problem with Sun's SOCAL cards on a different server when a drive goes bad. We've tried two different servers (Ultra 2 and Enterprise 3000), three different SOCAL cards, and different FC cables.

Replacing the bad drives with ones that work is one solution, but that only removes the symptoms; it doesn't solve the bug in Solaris (it just delays it until the next drive goes bad).

Mind you, it doesn't crash right away. It typically takes a number of error messages and a reset or two of the FC bus before Solaris 9 takes a dive. (Things are slightly more stable with the JNI cards, though, so currently the SOCAL boards are resting in an antistatic bag.)

- Peter
Peter Eriksson <peter@ifm.liu.se> wrote:
> I *seriously* doubt that it is a JNI driver (we run the latest

Having seen too many similar crashes where it was the JNI driver, I'd seriously doubt it's not, but... If you have a support contract, raise a call and give them the crash dump. This will allow Sun to tell with certainty where the problem is.

> driver version btw) problem since we used to have *exactly* the
> same problem with Suns SOCAL cards on a different server when a
> drive goes bad.

It's not physically possible to have *exactly* the same problem with a SOCAL card. The panic you've shown is occurring in the FCAW driver. If you don't have this driver installed (which you wouldn't for a SOCAL card), then the machine can't panic in this driver.

Scott.
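Scott's argument rests on reading the `module:function+offset` prefix of each frame in the panic stack. The following is a minimal sketch of that reading in Python, not a Sun tool: it assumes the frame-line format seen in the trace posted in this thread, and the set of "generic" modules to skip (`unix`, `genunix`, plus anything starting with `SUNW,`, i.e. CPU modules such as the one supplying `bcopy` here) is a hypothetical heuristic chosen for illustration. It does not replace having Sun analyse the actual crash dump.

```python
from __future__ import annotations

import re

# Assumed frame format, inferred from the panic trace in this thread:
#   000002a10004b300 fcaw:fca_cmd_complete+428 (30005488060, 12, ...)
# i.e. a 16-digit hex frame address, "module:function+hexoffset", then "(args)".
FRAME_RE = re.compile(
    r"^\s*([0-9a-f]{16})\s+([\w,.-]+):(\w+)\+([0-9a-f]+)\s*\(",
    re.IGNORECASE,
)

# Hypothetical list of modules that appear in nearly every panic (trap
# handling in "unix", generic kernel code in "genunix"). CPU modules such as
# "SUNW,UltraSPARC-II" (which supplies bcopy here) are skipped via their prefix.
GENERIC_MODULES = {"unix", "genunix"}


def parse_frames(trace: str) -> list[tuple[str, str, str]]:
    """Return (module, function, offset) per stack frame, innermost first.

    Register dump lines ("%l0-3: ...") and the "BAD TRAP" header do not
    match FRAME_RE and are ignored.
    """
    frames = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line)
        if m:
            _addr, module, func, offset = m.groups()
            frames.append((module, func, offset))
    return frames


def suspect_module(frames: list[tuple[str, str, str]]) -> str | None:
    """Return the first module, innermost first, that is not generic code."""
    for module, _func, _offset in frames:
        if module in GENERIC_MODULES or module.startswith("SUNW,"):
            continue
        return module
    return None


# Frame lines copied from the trace posted above (one register line kept to
# show that it is skipped).
SAMPLE = """\
BAD TRAP: type=31 rp=2a10004b060 addr=300aff3db48 mmu_fsr=0
000002a10004ad90 unix:die+a4 (31, 2a10004b060, 300aff3db48, 0, 16, 0)
  %l0-3: 0000000000000000 00000000aaab4fa4 000002a10004b060 000002a10004af58
000002a10004ae70 unix:trap+874 (2a10004b060, 0, 10000, 10200, 300, 0)
000002a10004afb0 unix:ktl0+48 (3, 2a10004bd40, 1490c00, 0, aabb50ec, 0)
000002a10004b100 SUNW,UltraSPARC-II:bcopy+1564 (aaab4fa4, 30005488ba4, 2, 10, 100c4ec, 0)
000002a10004b300 fcaw:fca_cmd_complete+428 (30005488060, 12, 30005488000, 30002235e18, 3000549d108, 30001abc000)
000002a10004b450 fcaw:fca_highintr+753c (3000027d238, b, 0, 1, c0000000, 30001abc000)
000002a10004ba90 sbus:sbus_intr_wrapper+28 (3000006a5e8, 7db, 1400000, 2a10004bd40, fb60, 11f5e28)
"""

if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    for module, func, offset in frames:
        print(f"{module:20} {func}+{offset}")
    print("first non-generic module:", suspect_module(frames))
```

Run against the posted trace, this reports `fcaw` as the first non-generic module (the fault surfaces in `bcopy` called from `fcaw:fca_cmd_complete`). Because frames are listed innermost first, this identifies where the kernel was executing when it trapped. It does not by itself show why the driver ended up there, which is the point still in dispute in this thread.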