Unix Technical Forum

SB1K crash kernel panic - memory or CPU ?

  #1 (permalink)  
Old 01-16-2008, 02:43 PM
Andrew Tyson
 

Hi,

My SB1K (dual 750MHz/4GB running S9 08/03, patched late last year) just
crashed for no apparent reason. I have appended the relevant
/var/adm/messages log at the end of the post. I tried using adb, which
came up with the following:

# adb -k unix.0 vmcore.0
physmem 7bcfa
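
For reference, the physmem value that adb prints is a page count in hexadecimal. A rough conversion, assuming the 8 KB base page size used on sun4u (UltraSPARC) systems, is sketched below; the function and constant names are illustrative only and are not part of adb.

```python
SUN4U_PAGE_SIZE = 8192  # assumption: base page size in bytes on sun4u


def physmem_bytes(physmem_hex: str, page_size: int = SUN4U_PAGE_SIZE) -> int:
    """Convert adb's hexadecimal physmem page count to bytes."""
    return int(physmem_hex, 16) * page_size


pages = int("7bcfa", 16)
total = physmem_bytes("7bcfa")
print(f"{pages} pages, {total / 2**30:.2f} GiB")
```

Under that assumption the value works out to a little under 4 GiB, which is plausible for a machine with 4GB installed.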

I also tried running iscda (which a number of NG threads mentioned);
however, the script went into a loop spewing out errors.

I guess I could have tried mdb, but I must confess that I neither have
the time, nor the inclination =)

Anyone have any ideas what may have caused the problem?

Thanks and regards,
Andrew

Jan 26 00:07:48 horus SUNW,UltraSPARC-III: [ID 326222 kern.warning] WARNING: [AFT1] Timeout (TO) Event detected by CPU0 Privileged Data Access at TL=0, errID 0x00000c74.79225e80
Jan 26 00:07:48 horus AFSR 0x00001000<TO>.00000000 AFAR 0x000007f8.00610900
Jan 26 00:07:48 horus Fault_PC 0x134061c
Jan 26 00:07:48 horus unix: [ID 836849 kern.notice]
Jan 26 00:07:48 horus ^Mpanic[cpu0]/thread=2a10007dd40:
Jan 26 00:07:48 horus unix: [ID 855895 kern.notice] [AFT1] errID 0x00000c74.79225e80 TO Error(s)
Jan 26 00:07:48 horus See previous message(s) for details
Jan 26 00:07:48 horus unix: [ID 100000 kern.notice]
Jan 26 00:07:48 horus genunix: [ID 723222 kern.notice] 000002a10007d360 SUNW,UltraSPARC-III:cpu_aflt_log+5bc (2a10007d46b, 1, 2a10007d678, 10, 117be60, 117be88)
Jan 26 00:07:49 horus genunix: [ID 179002 kern.notice] %l0-3: 0000000001491ea0 0000000000000010 0000000000000003 000002a10007d678
Jan 26 00:07:49 horus %l4-7: 0000100000000000 0000000000000000 000002a10007d5a8 000002a10007d41e
Jan 26 00:07:49 horus genunix: [ID 723222 kern.notice] 000002a10007d5b0 SUNW,UltraSPARC-III:cpu_deferred_error+4d0 (0, 1, 4000100003200000, 40001000, 7f8, 1)
Jan 26 00:07:49 horus genunix: [ID 179002 kern.notice] %l0-3: 000002a10007d678 0000000400000000 4000100003200000 0000030001531928
Jan 26 00:07:49 horus %l4-7: 0000000000000001 000002a10007d9f0 0000030001855e90 0000000080000000
Jan 26 00:07:49 horus genunix: [ID 723222 kern.notice] 000002a10007d940 unix:ktl0+48 (1, 1c000, 10220, 10278, 10290, 10270)
Jan 26 00:07:49 horus genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000000000001400 0000004400001607 0000000001172880
Jan 26 00:07:49 horus %l4-7: 00000300018b62b8 0000030001855ea0 0000000000000005 000002a10007d9f0
Jan 26 00:07:49 horus genunix: [ID 723222 kern.notice] 000002a10007da90 pcisch:pci_intr_wrapper+7c (300003d5818, 22a, 1400000, 2a10007dd40, 4540, 13405f0)
Jan 26 00:07:49 horus genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 000003000154e000 00000300016b8000 0000000000000001
Jan 26 00:07:49 horus %l4-7: 000003000154e028 000003000000e6d8 00000000014a8800 00000000014a8800
Jan 26 00:07:50 horus unix: [ID 100000 kern.notice]
Jan 26 00:07:50 horus genunix: [ID 672855 kern.notice] syncing file systems...
Jan 26 00:08:20 horus unix: [ID 836849 kern.notice]
Jan 26 00:08:20 horus ^Mpanic[cpu0]/thread=2a10007dd40:
Jan 26 00:08:20 horus unix: [ID 715357 kern.notice] panic sync timeout
Jan 26 00:08:20 horus unix: [ID 100000 kern.notice]
Jan 26 00:08:20 horus genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c1t1d0s1, offset 859373568, content: kernel
Jan 26 00:08:21 horus genunix: [ID 409368 kern.notice] ^M100% done: 35873 pages dumped, compression ratio 3.67,
Jan 26 00:08:21 horus genunix: [ID 851671 kern.notice] dump succeeded
  #2 (permalink)  
Old 01-16-2008, 02:44 PM
Gavin Maltby
 

Andrew Tyson wrote:
> Hi,
>
> My SB1K ( dual 750MHz/4GB running S9 08/03 patched late last year) just
> crashed for no apparent reason. I have appended the relevant
> /var/adm/messages log at the end of the post. I tried using adb which
> came up with the following;
>


> Jan 26 00:07:48 horus SUNW,UltraSPARC-III: [ID 326222 kern.warning] WARNING: [AFT1] Timeout (TO) Event detected by CPU0 Privileged Data Access at TL=0, errID 0x00000c74.79225e80
> Jan 26 00:07:48 horus AFSR 0x00001000<TO>.00000000 AFAR 0x000007f8.00610900
> Jan 26 00:07:48 horus Fault_PC 0x134061c
> Jan 26 00:07:48 horus unix: [ID 836849 kern.notice]
> Jan 26 00:07:48 horus ^Mpanic[cpu0]/thread=2a10007dd40:
> Jan 26 00:07:48 horus unix: [ID 855895 kern.notice] [AFT1] errID 0x00000c74.79225e80 TO Error(s)


This is a TO (timeout) error from the Safari data bus: an access to
an address that claimed to be mapped (i.e., some device claims to have
implemented that physical address range) did not return data
within the time expected. The fault address 0x000007f8.00610900 looks to me
like a device address rather than real cacheable memory, but I don't know
which device that would map to. My guess is that there's a bad or marginal
PCI card.
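
To make the address reasoning concrete, here is a minimal Python sketch that pulls the AFSR and AFAR values out of the log line and joins each into a single 64-bit number. It assumes the two dot-separated fields are the high and low 32-bit halves of each register and that the text in angle brackets lists the decoded AFSR flag names; `parse_fault`, `join_halves` and `INSTALLED_RAM` are names made up for this illustration, not part of any Solaris tool.

```python
import re

# Matches "AFSR 0xHHHHHHHH<flags>.LLLLLLLL AFAR 0xHHHHHHHH.LLLLLLLL".
# Assumption: each register is printed as high.low 32-bit halves.
FAULT_RE = re.compile(
    r"AFSR\s+0x([0-9a-fA-F]{8})(?:<([^>]*)>)?\.([0-9a-fA-F]{8})\s+"
    r"AFAR\s+0x([0-9a-fA-F]{8})\.([0-9a-fA-F]{8})"
)


def join_halves(high: str, low: str) -> int:
    """Combine two 8-digit hex strings into one 64-bit integer."""
    return (int(high, 16) << 32) | int(low, 16)


def parse_fault(line: str):
    """Return the AFSR value, flag names and AFAR value, or None if absent."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    afsr_hi, flags, afsr_lo, afar_hi, afar_lo = m.groups()
    return {
        "afsr": join_halves(afsr_hi, afsr_lo),
        # Assumption: multiple flags would be separated by commas or spaces.
        "flags": [f for f in re.split(r"[,\s]+", flags or "") if f],
        "afar": join_halves(afar_hi, afar_lo),
    }


line = ("Jan 26 00:07:48 horus AFSR 0x00001000<TO>.00000000 "
        "AFAR 0x000007f8.00610900")
fault = parse_fault(line)

INSTALLED_RAM = 4 * 2**30  # the 4GB mentioned in the original post
print(hex(fault["afar"]), fault["flags"], fault["afar"] >= INSTALLED_RAM)
```

The comparison against installed RAM is only a rough hint: physical memory need not be laid out contiguously from address zero, so an address far above 4GB is suggestive of device space rather than proof of it.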

Incidentally Solaris 10 makes all this hugely more elegant.

Gavin
  #3 (permalink)  
Old 01-16-2008, 02:44 PM
Andrew Tyson
 

Gavin

Thanks for the advice. I should have mentioned that this error occurs
very rarely, when the Gnome 2.0 desktop fires up (i.e. twice in six months), so
maybe it's an issue with one of the UPA-based Elite3D m6 framebuffers?

> Incidentally Solaris 10 makes all this hugely more elegant.
>


Yeah, I've got the 11/04 release, just haven't had time to upgrade the
box. By more elegant I guess you mean it won't cause a reboot?

regards,
AT