Unix Technical Forum

[C180] HPMC... processor cache damaged?



#1
01-16-2008, 08:40 PM
igor@nospam.invalid
[C180] HPMC... processor cache damaged?

Hello.

After four power outages some weeks ago, all in the same day, my HP 9000
C180 started hanging. I guess that I know the answer, but need to ask.
Perhaps there is something I can do to recover this machine (e.g.,
cleaning it, or checking all cables and connectors to see if something
is loose). But I think that I must fear the worst.

Now the HPMC dump:


HP-UX procyon B.11.11 U 9000/780 2000845441

CPU-ID( Model ) = 0xe
----------------- Processor 0 HPMC Information ------------------

Timestamp = Mon Dec 19 20:03:22 GMT 2005 (20:05:12:19:20:03:22)

HPMC Chassis Codes = 0xcbf0 0x20b4 0x5008 0x5408 0x5508 0xcbfb


General Registers 0 - 31
00-03 0000000000000000 0000000000c15000 0000000000000000 0000000000aafa28
04-07 0000000000000001 0000000000000000 0000000000000000 0000000000aafa28
08-11 0000000000ab03d8 0000000000000001 0000015adf016425 0000015adf016425
12-15 0000000000d3d2b8 0000000000ab2c90 0000000000000000 0000000000000000
16-19 0000000000000000 0000000000000000 0000000040046000 0000000000000000
20-23 0000000000000000 00000000000c6135 0000000000000000 0000000000000000
24-27 0000000000000000 0000000000000001 0000000003887000 0000000000c20000
28-31 fffffff0ffffffff 0000000003661260 0000000003661290 0000000000c17000

Control Registers 0 - 31
00-03 000000007cd43907 0000000000000000 0000000000000000 0000000000000000
04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
08-11 0000000000003129 0000000000002afe 00000000000000c0 0000000000000032
12-15 0000000000000000 0000000000000000 000000000002a000 fffffff0ffffffff
16-19 0000015ae41c62cc 0000000000000000 00000000001805cc 000000002b7dcfff
20-23 0000000010340002 00000000abeaec00 000000ff0804ff1f 0000000000000000
24-27 0000000000aafa28 00000000000377f0 0000000000aafa28 8000000100065658
28-31 0000000000000000 0000015ae40be438 0000000000175b00 0000000003661290

Space Registers 0 - 7
00-03 07bec800 09f51000 01fadc00 00000000
04-07 00000000 ffffffff 04e3a800 00000000

IIA Space = 0x0000000000000000
IIA Offset = 0x00000000001805d0
Check Type = 0x80000000
CPU State = 0x9e000004
Cache Check = 0x50000000
TLB Check = 0x00000000
Bus Check = 0x00000000
Assists Check = 0x00000000
Assist State = 0x00000000
Path Info = 0x00000000
System Responder Address = 0x0000000000000000
System Requestor Address = 0x0000000000000000
Check Summary = 0x8000020030006080
Available Memory = 0x0000000000000000
CPU Diagnose Register 2 = 0x0301000000002206
CPU Status Register 0 = 0x3420c20000000000
CPU Status Register 1 = 0x8000020000000000
SADD LOG = 0x4800000000000000
Read Short LOG = 0xc17ff0fffffa0010


Memory Error Log Information:

Timestamp = Mon Dec 19 20:03:23 GMT 2005 (20:05:12:19:20:03:23)

No memory errors logged


I/O Module Error Log Information:

Timestamp = Mon Dec 19 20:03:23 GMT 2005 (20:05:12:19:20:03:23)

Bus HPA        Module Type      Path     Slt Md Sev  Estat Requestor  Responder
--- ---------- ---------------- -------- --- -- ---- ----- ---------- ----------
0   0xfff88000 I/O Adapter      8        2   0  he   0x0d  0x00000000 0x00000000
0   0xfff8a000 I/O Adapter      10       2   2  he   0x0d  0x00000000 0x00000000

Module            Revision
----------------- --------
Backplane Board   2
I/O Board         2
Processor Board   1
PA 8000-8000 CPU  3.1
PDC               6.2
Memory Controller 9
I/O Bus Adapter 0 15
GSC to PCI Bridge 0x6801
Built in FWSCSI   0
FWSCSI Chip       3
Multi I/O Chip    0
EISA Interface    0
I/O Bus Adapter 1 15
GSC 1             40MHz
GSC 2             40MHz


Current Diagnose Registers for Processor HPA: 0xfffa0000
DR2: 0x00000000 03010000
CPU Status 0/1: 0x00000000 08002204 / 0x00000000 4420c200
Note: prefetch is disabled.
PDH Status 1/2: 0x00000000 / 0x00000000
PDH Control: 0x00000000


My guess is that this is a serious HPMC error. In fact, the 0xcbf0 (HPMC
handling initiated), 0x20b4 (cache errors), 0x5008, 0x5408 and 0x5508
(bus transaction errors), and 0xcbfb (branching to the OS HPMC handler)
chassis codes do not promise anything good.

Is there something I can do to recover (or *try* to recover) the machine?
It now only stays up for two, three, at most four, hours.

Cheers,
Igor.
#2
01-16-2008, 08:40 PM
Lars Bausch
Re: [C180] HPMC... processor cache damaged?

Hello Igor,

igor@nospam.invalid wrote:
> After four power outages some weeks ago, all in the same day, my HP 9000
> C180 started hanging. I guess that I know the answer, but need to ask.
> Perhaps there is something I can do to recover this machine (e.g.,
> cleaning it, or checking all cables and connectors to see if something
> is loose). But I think that I must fear the worst.
>
> Now the HPMC dump:

Snipped.

> My guess is that this is a serious HPMC error. In fact, the 0xcbf0 (HPMC
> handling initiated), 0x20b4 (cache errors), 0x5008, 0x5408 and 0x5508
> (bus transaction errors), and 0xcbfb (branching to the OS HPMC handler)
> chassis codes do not promise anything good.
>
> Is there something I can do to recover (or *try* to recover) the machine?
> It now only stays up for two, three, at most four, hours.


The output from pimdecode:

Summary:
Below is a list of causes for the failure ordered from most likely to
least likely. Replace assemblies in the order listed.
CPU board

Details:
CBF0 HPMC_INITIATED HPMC handling initiated
20B4 CPU 0 DCACHE_ODD_DATA_PARITY
5008 CPU 0 Runway broad error. This IOA received a Broad_Error from
another module. Look elsewhere for the cause of the Broad_Error.
5408 U2 chip IOA 0 Runway broad error. This IOA received a Broad_Error
from another module. Look elsewhere for the cause of the Broad_Error.
5508 U2 chip IOA 1 Runway broad error. This IOA received a Broad_Error
from another module. Look elsewhere for the cause of the Broad_Error.
CBFB Branching to OS HPMC handler. Was in OS when failure occured.

Check Summary = 0x8000020030006080
Bit 0: HPMC is detected
Bit 22: D-Cache HPMC
Runway Error Type (Check Summary[32:35]) = 0x3
Slave Detected Broadcast Error
Bit 49: Parity LPMCs enabled.
Bit 50: Parity-induced LPMC pending.
Bit 56: Sticky bit. Set when a parity error is found in the even (bit 0)
data word of the odd cache port. Only cleared with a move-to-diagnose
instruction.
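
For anyone who wants to check the Check Summary bits by hand: the bit positions listed above only line up with 0x8000020030006080 if bit 0 is taken to be the most significant bit of the 64-bit word. The following is a minimal Python sketch of that reading, not part of pimdecode; the helper names set_bits_msb0 and field_msb0 are made up for illustration.

```python
def set_bits_msb0(value: int, width: int = 64) -> list[int]:
    """Return positions of set bits, counting bit 0 as the most significant bit."""
    return [i for i in range(width) if (value >> (width - 1 - i)) & 1]


def field_msb0(value: int, first: int, last: int, width: int = 64) -> int:
    """Extract bits first..last (inclusive, MSB-0 numbering) as an integer."""
    length = last - first + 1
    shift = width - 1 - last  # distance of the field's last bit from the LSB
    return (value >> shift) & ((1 << length) - 1)


# Value taken from the "Check Summary" line of the PIM dump in this thread.
check_summary = 0x8000020030006080

print(set_bits_msb0(check_summary))             # [0, 22, 34, 35, 49, 50, 56]
print(hex(field_msb0(check_summary, 32, 35)))   # 0x3
```

Bits 0, 22, 49, 50 and 56 match the lines reported by pimdecode, and bits 34 and 35 together form the Runway Error Type field [32:35] = 0x3.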


Regards

Lars
#3
01-16-2008, 08:40 PM
igor@nospam.invalid
Re: [C180] HPMC... processor cache damaged?

Lars Bausch <lars.bausch@dotsch.de> wrote:
>
> The output from pimdecode:
>
> Summary:
> Below is a list of causes for the failure ordered from most likely to
> least likely. Replace assemblies in the order listed.
> CPU board

[...]

Hi Lars,

Thanks a lot for the detailed description of the HPMC failure.
I was not aware of the PIM Analysis Tool. Nice tool; sadly, it is
not available for HP-UX or, even better, as source code that
compiles on any Unix system.

I will look carefully at the output of pimdecode when trying
to recover my workstation, but my first thought is that it points
to a failure in the processor data cache as the most probable
source of this problem. That is really bad.

Thanks a lot for the feedback on this matter.

Best regards,
Igor.

All times are GMT.