[C180] HPMC... processor cache damaged? (HP-UX Operating System forums, Unix Operating Systems category)
Hello.

After four power outages some weeks ago, all on the same day, my HP 9000 C180 started hanging. I guess that I know the answer, but I need to ask. Perhaps there is something I can do to recover this machine (e.g., cleaning it, or checking all cables and connectors to see if something is loose). But I think that I must fear the worst.

Now the HPMC dump:

HP-UX procyon B.11.11 U 9000/780 2000845441
CPU-ID( Model ) = 0xe

----------------- Processor 0 HPMC Information ------------------

Timestamp = Mon Dec 19 20:03:22 GMT 2005 (20:05:12:19:20:03:22)

HPMC Chassis Codes = 0xcbf0 0x20b4 0x5008 0x5408 0x5508 0xcbfb

General Registers 0 - 31
00-03  0000000000000000  0000000000c15000  0000000000000000  0000000000aafa28
04-07  0000000000000001  0000000000000000  0000000000000000  0000000000aafa28
08-11  0000000000ab03d8  0000000000000001  0000015adf016425  0000015adf016425
12-15  0000000000d3d2b8  0000000000ab2c90  0000000000000000  0000000000000000
16-19  0000000000000000  0000000000000000  0000000040046000  0000000000000000
20-23  0000000000000000  00000000000c6135  0000000000000000  0000000000000000
24-27  0000000000000000  0000000000000001  0000000003887000  0000000000c20000
28-31  fffffff0ffffffff  0000000003661260  0000000003661290  0000000000c17000

Control Registers 0 - 31
00-03  000000007cd43907  0000000000000000  0000000000000000  0000000000000000
04-07  0000000000000000  0000000000000000  0000000000000000  0000000000000000
08-11  0000000000003129  0000000000002afe  00000000000000c0  0000000000000032
12-15  0000000000000000  0000000000000000  000000000002a000  fffffff0ffffffff
16-19  0000015ae41c62cc  0000000000000000  00000000001805cc  000000002b7dcfff
20-23  0000000010340002  00000000abeaec00  000000ff0804ff1f  0000000000000000
24-27  0000000000aafa28  00000000000377f0  0000000000aafa28  8000000100065658
28-31  0000000000000000  0000015ae40be438  0000000000175b00  0000000003661290

Space Registers 0 - 7
00-03  07bec800  09f51000  01fadc00  00000000
04-07  00000000  ffffffff  04e3a800  00000000

IIA Space                 = 0x0000000000000000
IIA Offset                = 0x00000000001805d0
Check Type                = 0x80000000
CPU State                 = 0x9e000004
Cache Check               = 0x50000000
TLB Check                 = 0x00000000
Bus Check                 = 0x00000000
Assists Check             = 0x00000000
Assist State              = 0x00000000
Path Info                 = 0x00000000
System Responder Address  = 0x0000000000000000
System Requestor Address  = 0x0000000000000000

Check Summary             = 0x8000020030006080
Available Memory          = 0x0000000000000000
CPU Diagnose Register 2   = 0x0301000000002206
CPU Status Register 0     = 0x3420c20000000000
CPU Status Register 1     = 0x8000020000000000
SADD LOG                  = 0x4800000000000000
Read Short LOG            = 0xc17ff0fffffa0010

Memory Error Log Information:

Timestamp = Mon Dec 19 20:03:23 GMT 2005 (20:05:12:19:20:03:23)

No memory errors logged

I/O Module Error Log Information:

Timestamp = Mon Dec 19 20:03:23 GMT 2005 (20:05:12:19:20:03:23)

Bus  HPA         Module Type       Path      Slt  Md  Sev   Estat  Requestor   Responder
---  ----------  ----------------  --------  --   --  ----  -----  ----------  ----------
0    0xfff88000  I/O Adapter       8         2    0   he    0x0d   0x00000000  0x00000000
0    0xfff8a000  I/O Adapter       10        2    2   he    0x0d   0x00000000  0x00000000

Module                 Revision
------                 --------
Backplane Board        2
I/O Board              2
Processor Board        1
PA 8000-8000 CPU       3.1
PDC                    6.2
Memory Controller      9
I/O Bus Adapter 0      15
GSC to PCI Bridge      0x6801
Built in FWSCSI        0
FWSCSI Chip            3
Multi I/O Chip         0
EISA Interface         0
I/O Bus Adapter 1      15
GSC 1                  40MHz
GSC 2                  40MHz

Current Diagnose Registers for Processor HPA: 0xfffa0000
DR2:             0x00000000 03010000
CPU Status 0/1:  0x00000000 08002204 / 0x00000000 4420c200
Note: prefetch is disabled.
PDH Status 1/2:  0x00000000 / 0x00000000
PDH Control:     0x00000000

My guess is that this is a very serious HPMC error. In fact, the chassis codes 0xcbf0 (HPMC handling initiated), 0x20b4 (cache errors), 0x5008, 0x5408 and 0x5508 (bus transaction errors), and 0xcbfb (branching to the OS HPMC handler) do not bode well.

Is there something I can do to recover (or *try* to recover) the machine? It now only stays up for two, three, or at most four hours.

Cheers,
Igor.
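For anyone reading along with a similar dump, here is a minimal Python sketch of pulling the chassis codes out of the "HPMC Chassis Codes" line and pairing each with the reading given in the post above. The `READINGS` table is only the poster's own interpretation, not an official HP code table, and the names `parse_chassis_codes` and `annotate` are hypothetical helpers made up for this illustration.

```python
# Line copied from the HPMC dump in the post above.
DUMP_LINE = "HPMC Chassis Codes = 0xcbf0 0x20b4 0x5008 0x5408 0x5508 0xcbfb"

# Assumption: these readings are the poster's interpretation, not an
# authoritative table of PA-RISC chassis codes.
READINGS = {
    0xCBF0: "HPMC handling initiated",
    0x20B4: "cache error",
    0x5008: "bus transaction error",
    0x5408: "bus transaction error",
    0x5508: "bus transaction error",
    0xCBFB: "branching to the OS HPMC handler",
}


def parse_chassis_codes(line):
    """Return the hex codes to the right of '=' as a list of integers."""
    _, _, rhs = line.partition("=")
    return [int(token, 16) for token in rhs.split()]


def annotate(codes):
    """Pair each code with its reading, or 'unknown' if it is not in the table."""
    return [(f"0x{code:04x}", READINGS.get(code, "unknown")) for code in codes]


if __name__ == "__main__":
    for code, meaning in annotate(parse_chassis_codes(DUMP_LINE)):
        print(code, meaning)
```

Codes that are missing from the table come back as "unknown", so the sketch does not guess at meanings it was not given.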
Hello Igor,

igor@nospam.invalid wrote:
> After four power outages some weeks ago, all in the same day, my HP 9000
> C180 started hanging. I guess that I know the answer, but need to ask.
> Perhaps there is something I can do to recover this machine (e.g.,
> cleaning it, or checking all cables and connectors to see if something
> is loose). But I think that I must fear the worst.
>
> Now the HPMC dump:

Snipped.

> My guess is that it is a too bad HPMC error. In fact, the 0xcbf0 (HPMC
> handling initiated), 0x20b4 (cache errors), 0x5008, 0x5408 and 0x5508
> (bus transaction errors), and 0xcbfb (branching to the OS HPMC handler)
> chassis codes do not announce anything good.
>
> Is there something I can do to recover (or *try* to recover) the machine?
> It now only stays up for two, three, at most four, hours.

The output from pimdecode:

Summary:
Below is a list of causes for the failure ordered from most likely to least likely. Replace assemblies in the order listed.
    CPU board

Details:
CBF0  HPMC_INITIATED HPMC handling initiated
20B4  CPU 0 DCACHE_ODD_DATA_PARITY
5008  CPU 0 Runway broad error. This IOA received a Broad_Error from another module. Look elsewhere for the cause of the Broad_Error.
5408  U2 chip IOA 0 Runway broad error. This IOA received a Broad_Error from another module. Look elsewhere for the cause of the Broad_Error.
5508  U2 chip IOA 1 Runway broad error. This IOA received a Broad_Error from another module. Look elsewhere for the cause of the Broad_Error.
CBFB  Branching to OS HPMC handler. Was in OS when failure occured.

Check Summary = 0x8000020030006080
    Bit 0: HPMC is detected
    Bit 22: D-Cache HPMC
    Runway Error Type (Check Summary[32:35]) = 0x3 Slave Detected Broadcast Error
    Bit 49: Parity LPMCs enabled.
    Bit 50: Parity-induced LPMC pending.
    Bit 56: Sticky bit. Set when a parity error is found in the even (bit 0) data word of the odd cache port. Only cleared with a move-to-diagnose instruction.

Regards
Lars
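The bit numbers in the Check Summary decode above line up with the raw value only if bit 0 is taken as the most significant bit of the 64-bit word, which is the usual PA-RISC convention. The Python sketch below checks this by hand. The labels are copied from the pimdecode output quoted above, and the helper names (`bit`, `field`, `set_bits`) are made up for this example; it is an illustration, not a replacement for pimdecode.

```python
# Check Summary value from the HPMC dump.
CHECK_SUMMARY = 0x8000020030006080
WIDTH = 64  # the register is treated as a 64-bit word


def bit(word, n, width=WIDTH):
    """Return bit n of word, where bit 0 is the MOST significant bit."""
    return (word >> (width - 1 - n)) & 1


def field(word, first, last, width=WIDTH):
    """Return bits first..last (inclusive, MSB-0 numbering) as an integer."""
    length = last - first + 1
    return (word >> (width - 1 - last)) & ((1 << length) - 1)


def set_bits(word, width=WIDTH):
    """List the MSB-0 positions of all bits that are set in word."""
    return [n for n in range(width) if bit(word, n, width)]


# Labels taken from the pimdecode output above; other bits are left unnamed.
LABELS = {
    0: "HPMC is detected",
    22: "D-Cache HPMC",
    49: "Parity LPMCs enabled",
    50: "Parity-induced LPMC pending",
    56: "Sticky bit (parity error on the odd cache port)",
}

if __name__ == "__main__":
    for n in set_bits(CHECK_SUMMARY):
        # Bits 32..35 belong to the Runway Error Type field, reported below.
        if 32 <= n <= 35:
            continue
        print(f"Bit {n}: {LABELS.get(n, '(not named in the pimdecode output)')}")
    print(f"Runway Error Type (bits 32:35) = {field(CHECK_SUMMARY, 32, 35):#x}")
```

With this numbering the set bits outside the error-type field are 0, 22, 49, 50 and 56, and the field at bits 32:35 comes out as 0x3, which matches what pimdecode reported.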
Lars Bausch <lars.bausch@dotsch.de> wrote:
>
> The Output from pimdecode :
>
> Summary:
> Below is a list of causes for the failure ordered from most likely to
> least likely. Replace assemblies in the order listed.
> CPU board
[...]

Hi Lars,

Thanks a lot for the detailed description of the HPMC failure. I was not aware of the PIM Analysis Tool. It is a nice tool; sadly, it is not available for HP-UX or, even better, as source code that can be compiled on any Unix system.

I will look carefully at the output of pimdecode when trying to recover my workstation, but my first thought is that it points to a failure in the processor data cache as the most probable source of this problem. That is really bad.

Thanks a lot for the feedback on this matter.

Best regards,
Igor.