Let's try to get back to this.
After one false start where the MSC shut down the PSU with a fault, it came up the second time with the console spamming:
Code:
1A 000: =====> bridge_sanity diag took an exception. <=====
1A 000: EPC : 0xc00000001fc11530
1A 000: BadVA : 0x0000000000000000
1A 000: Cause : 0x000000008000c01c
Another power cycle stopped that. The configuration is still sitting at one flashed and working node, a healthy IRIX install, and a bunch of other nodes with disabled CPUs, disabled RAM, and PROM versions that I cannot seem to get IRIX to flash.
Edited: Woah.
Code:
IP27 PROM SGI Version 6.156 built 11:27:56 AM Nov 18, 2003
Testing/Initializing memory ............... DONE
Copying PROM code to memory ............... DONE
Discovering local IO ......................
Found an io Board with nic 030-1275-002.
Found an io Board with nic 030-1275-004.
DONE
Discovering NUMAlink connectivity ......... DONE
Found 5 objects (3 hubs, 2 routers) in 55798 usec
Waiting for peers to complete discovery.... DONE
Recognized 390 MHz midplane
Global master is /hw/module/1/slot/n1
*** Barrier sync warning: local=4418, NIC 0x329e00=4400 (promrev mismatch?)
*** Barrier sync warning: local=4446, NIC 0x329e00=4428 (promrev mismatch?)
*** Barrier sync warning: local=4908, NIC 0x329e00=4890 (promrev mismatch?)
*** Barrier sync warning: local=4981, NIC 0x329e00=4963 (promrev mismatch?)
Testing/Initializing all memory ........... DONE
waiting for node with nic 329e00 at module 1 slot 2 at global barrier.....
*** Barrier sync warning: local=5108, NIC 0x329e00=5090 (promrev mismatch?)
waiting for node with nic 3472e2 at module 1 slot 3 at global barrier...........
Checking partitioning information ......... DONE
*** Barrier sync warning: local=5355, NIC 0x329e00=5337 (promrev mismatch?)
Loading BASEIO prom ....................... DONE
BASEIO PROM Monitor SGI Version 6.156 built 11:26:28 AM Nov 18, 2003 (BE64)
6 CPUs on 3 nodes found.
Installing PROM Device drivers ............
Base I/O Ethernet set to /dev/ethernet/ef0
Walking SCSI Adapter 0 (/hw/module/1/slot/io1), (pci id 0)
1+ 2- 3- 4- 5- 6+ 7- 8- 9- 10- 11- 12- 13- 14- 15- = 2 device(s)
Walking SCSI Adapter 1 (/hw/module/1/slot/io1), (pci id 1)
1- 2- 3- 4- 5- 6- 7- 8- 9- 10- 11- 12- 13- 14- 15- = 0 device(s)
Initializing PROM Device drivers .......... DONE
Checking hardware inventory ............... Found new or re
-enabled component MEM BANK 0 1 2 3
***Warning: Found a new MSCSI board in module 1, slot io7, serial EAX394
Please use the 'update' command from the PROM Monitor to update the inventory
***Warning: Found a new IP27 board in module 1, slot n3, serial HXC897
Please use the 'update' command from the PROM Monitor to update the inventory
DONE
**** System Configuration and Diagnostics Summary ****
CONFIG:
No. of NODEs enabled = 3
No. of NODEs disabled = 0
No. of CPUs enabled = 6
No. of CPUs disabled = 0
Mem enabled = 1792 MB
Mem disabled = 0 MB
No. of RTRs enabled = 2
No. of RTRs disabled = 0
DIAG RESULTS:
ALL DIAGS PASSED.
**** End System Configuration and Diagnostics Summary ****
System Maintenance Menu
1) Start System
2) Install System Software
3) Run Diagnostics
4) Recover System
5) Enter Command Monitor
Option? 5
Command Monitor. Type "exit" to return to the menu.
>> hinv -v
IP27 Node Board, Module 1, Slot n1
ASIC HUB Rev 5, 100 MHz, (nasid 0)
Processor A: 250 MHz R10000 Rev 3.4
Secondary Cache 4MB 250MHz Tap 0x9 , (cpu 0)
R10010FPC Rev 3.4
Processor B: 250 MHz R10000 Rev 3.4
Secondary Cache 4MB 250MHz Tap 0x9 , (cpu 1)
R10010FPC Rev 3.4
Memory on board, 256 MBytes (Standard)
Bank 0, 128 MBytes (Standard) <-- (Software Bank 0)
Bank 1, 128 MBytes (Standard)
IP27 Node Board, Module 1, Slot n2
ASIC HUB Rev 5, 100 MHz, (nasid 1)
Processor A: 250 MHz R10000 Rev 3.4
Secondary Cache 4MB 250MHz Tap 0x19 , (cpu 2)
R10010FPC Rev 3.4
Processor B: 250 MHz R10000 Rev 3.4
Secondary Cache 4MB 250MHz Tap 0x19 , (cpu 3)
R10010FPC Rev 3.4
Memory on board, 512 MBytes (Standard)
Bank 0, 128 MBytes (Standard) <-- (Software Bank 0)
Bank 1, 128 MBytes (Standard)
Bank 2, 128 MBytes (Standard)
Bank 3, 128 MBytes (Standard)
IP27 Node Board, Module 1, Slot n3
ASIC HUB Rev 5, 100 MHz, (nasid 2)
Processor A: 250 MHz R10000 Rev 3.4
Secondary Cache 4MB 250MHz Tap 0x9 , (cpu 4)
R10010FPC Rev 3.4
Processor B: 250 MHz R10000 Rev 3.4
Secondary Cache 4MB 250MHz Tap 0x9 , (cpu 5)
R10010FPC Rev 3.4
Memory on board, 1024 MBytes (Standard)
Bank 0, 256 MBytes (Standard) <-- (Software Bank 0)
Bank 1, 256 MBytes (Standard)
Bank 2, 256 MBytes (Standard)
Bank 3, 256 MBytes (Standard)
BASEIO IO Board, Module 1, Slot io1
ASIC BRIDGE Rev 4, (widget 8)
adapter PCI-SCSI Rev 5
(pci id 0)
peripheral SCSI DISK, ID 1, IBM DXHS18Y
peripheral SCSI CDROM, ID 6, TOSHIBA CD-ROM XM-5401TA
adapter PCI-SCSI Rev 5
(pci id 1)
adapter IOC3 Rev 1
(pci id 2)
controller multi function SuperIO
controller Ethernet Rev 1
adapter IOC3 Rev 1
(pci id 6)
controller multi function SuperIO
controller Keyboard/Mouse
controller Parallel Port
adapter RAD
(pci id 7)
PCI_XIO IO Board, Module 1, Slot io2
ASIC BRIDGE Rev 3, (widget 12)
adapter ID (Vendor 14e4 Device 1645 class 2 subclass 2)
(pci id 0)
adapter ID (Vendor 10df Device fc00 class c subclass c)
(pci id 1)
XTALK_PCI IO Board, Module 1, Slot io6
ASIC BRIDGE Rev 4, (widget 13)
adapter ID (Vendor 10a9 Device 9 class 2 subclass 2)
(pci id 1)
XTALK_PCI IO Board, Module 1, Slot io5
ASIC BRIDGE Rev 4, (widget 14)
adapter ID (Vendor 1077 Device 2200 class 1 subclass 1)
(pci id 1)
MSCSI IO Board, Module 1, Slot io7
ASIC BRIDGE Rev 3, (widget 14)
adapter PCI-SCSI Rev 5
(pci id 0)
adapter PCI-SCSI Rev 5
(pci id 1)
adapter PCI-SCSI Rev 5
(pci id 2)
adapter PCI-SCSI Rev 5
(pci id 3)
ASIC ROUTER , Module 1, Slot r1 (nasid 1)
ASIC ROUTER , Module 1, Slot r2 (nasid 2)
MIDPLANE, Module 1 Frequency 390 MHz
ASIC XBOW Rev 4, on midplane of Module 1
MIDPLANE, Module 1 Frequency 390 MHz
ASIC XBOW Rev 4, on midplane of Module 1
MARDIGRAS Graphics Board, Module 1, Slot io4
>>
Suddenly two nodes started working fine. CPUs and RAM are behaving as well. Lemme see if it boots.
Edited: Yeah. Aside from PROM rev mismatches, the system is suddenly running on three node boards with not a peep about CPU or RAM failures.
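Side note on those barrier sync warnings: a quick sketch that pulls the two counters out of each warning and subtracts them. The log lines are copied from the PROM output above; `barrier_deltas` is just a name made up for this example, and I am not claiming to know what the counters mean internally. The only point is that the gap is the same every time and always involves NIC 0x329e00, which the boot log places at module 1 slot 2.

```python
import re

# Barrier sync warnings copied verbatim from the PROM boot output.
LOG = """\
*** Barrier sync warning: local=4418, NIC 0x329e00=4400 (promrev mismatch?)
*** Barrier sync warning: local=4446, NIC 0x329e00=4428 (promrev mismatch?)
*** Barrier sync warning: local=4908, NIC 0x329e00=4890 (promrev mismatch?)
*** Barrier sync warning: local=4981, NIC 0x329e00=4963 (promrev mismatch?)
*** Barrier sync warning: local=5108, NIC 0x329e00=5090 (promrev mismatch?)
*** Barrier sync warning: local=5355, NIC 0x329e00=5337 (promrev mismatch?)
"""

PATTERN = re.compile(r"local=(\d+), NIC (0x[0-9a-f]+)=(\d+)")


def barrier_deltas(log):
    """Return a list of (nic, local_count - remote_count) per warning."""
    return [
        (m.group(2), int(m.group(1)) - int(m.group(3)))
        for m in PATTERN.finditer(log)
    ]


deltas = barrier_deltas(LOG)
for nic, delta in deltas:
    print(f"NIC {nic}: local is ahead by {delta}")
```

A constant offset against a single node would fit the "promrev mismatch?" hint in the message better than a node that is randomly falling behind, but that is a guess from the numbers alone.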
Code:
>> auto
Starting up the system...
Loading dksc(0,1,8)/sash: 896+111764+16853+3848 entry: 0xa8000000012a6ee4
4460239+1061168+992208 entry: 0xa80000000001a750
WARNING: downrev router board in module 1 slot r1
WARNING: downrev router board in module 1 slot r1
WARNING: downrev router board in module 1 slot r2
IRIX Release 6.5 IP27 Version 01090133 System V - 64 Bit
Copyright 1987-2006 Silicon Graphics, Inc.
All Rights Reserved.
Setting rbaud to 19200
/hw/module/1/slot/io5/xtalk_pci/pci/1/scsi_ctlr/0: Firmware version: 2.2.6: TP.
/hw/module/1/slot/io5/xtalk_pci/pci/1/scsi_ctlr/0: Response in of 32768 is inval
id, rereading.
/hw/module/1/slot/io5/xtalk_pci/pci/1/scsi_ctlr/0: Response in of 32768 is inval
id, ignoring.
/hw/module/1/slot/io5/xtalk_pci/pci/1/scsi_ctlr/0: Response in of 32768 is inval
id, rereading.
/hw/module/1/slot/io5/xtalk_pci/pci/1/scsi_ctlr/0: Response in of 32768 is inval
id, ignoring.
/hw/module/1/slot/io5/xtalk_pci/pci/1/scsi_ctlr/0: Response in of 32768 is inval
id, rereading.
/hw/module/1/slot/io5/xtalk_pci/pci/1/scsi_ctlr/0: Response in of 32768 is inval
id, ignoring.
/hw/module/1/slot/io5/xtalk_pci/pci/1/scsi_ctlr/0: no cable detected.
NOTICE: Starting failsoftd
The system is coming up.
network: WARNING: IRIS's Internet address is the default.
Using standalone network mode.
WARNING: ef0: link fail - check ethernet cable
Warning: Internet Gateway web server running as root.
Use "chkconfig webface_apache off" to disable.
Starting new eventmond...
No System change detected.
IRIS console login: root
IRIX Release 6.5 IP27 IRIS
Copyright 1987-2006 Silicon Graphics, Inc. All Rights Reserved.
Last login: Thu May 30 22:00:28 PDT 2013 on ttyd1
TERM = (vt100)
IRIS 1# hinv -v
6 250 MHZ IP27 Processors
CPU: MIPS R10000 Processor Chip Revision: 3.4
FPU: MIPS R10010 Floating Point Chip Revision: 3.4
CPU 0 at Module 1/Slot 1/Slice A: 250 Mhz MIPS R10000 Processor Chip (enabled)
Processor revision: 3.4. Scache: Size 4 MB Speed 250 Mhz Tap 0x9
CPU 1 at Module 1/Slot 1/Slice B: 250 Mhz MIPS R10000 Processor Chip (enabled)
Processor revision: 3.4. Scache: Size 4 MB Speed 250 Mhz Tap 0x9
CPU 2 at Module 1/Slot 2/Slice A: 250 Mhz MIPS R10000 Processor Chip (enabled)
Processor revision: 3.4. Scache: Size 4 MB Speed 250 Mhz Tap 0x19
CPU 3 at Module 1/Slot 2/Slice B: 250 Mhz MIPS R10000 Processor Chip (enabled)
Processor revision: 3.4. Scache: Size 4 MB Speed 250 Mhz Tap 0x19
CPU 4 at Module 1/Slot 3/Slice A: 250 Mhz MIPS R10000 Processor Chip (enabled)
Processor revision: 3.4. Scache: Size 4 MB Speed 250 Mhz Tap 0x9
CPU 5 at Module 1/Slot 3/Slice B: 250 Mhz MIPS R10000 Processor Chip (enabled)
Processor revision: 3.4. Scache: Size 4 MB Speed 250 Mhz Tap 0x9
Main memory size: 1792 Mbytes
Instruction cache size: 32 Kbytes
Data cache size: 32 Kbytes
Secondary unified instruction/data cache size: 4 Mbytes
Memory at Module 1/Slot 1: 256 MB (enabled)
Bank 0 contains 128 MB (Standard) DIMMS (enabled)
Bank 1 contains 128 MB (Standard) DIMMS (enabled)
Memory at Module 1/Slot 2: 512 MB (enabled)
Bank 0 contains 128 MB (Standard) DIMMS (enabled)
Bank 1 contains 128 MB (Standard) DIMMS (enabled)
Bank 2 contains 128 MB (Standard) DIMMS (enabled)
Bank 3 contains 128 MB (Standard) DIMMS (enabled)
Memory at Module 1/Slot 3: 1024 MB (enabled)
Bank 0 contains 256 MB (Standard) DIMMS (enabled)
Bank 1 contains 256 MB (Standard) DIMMS (enabled)
Bank 2 contains 256 MB (Standard) DIMMS (enabled)
Bank 3 contains 256 MB (Standard) DIMMS (enabled)
ROUTER in Module 1/Slot 2: Revision 2: Active Ports [4,5,6] (enabled)
ROUTER in Module 1/Slot 3: Revision 2: Active Ports [5,6] (enabled)
Integral SCSI controller 3: Version QL1040B (rev. 2), differential
Integral SCSI controller 4: Version QL1040B (rev. 2), differential
Integral SCSI controller 5: Version QL1040B (rev. 2), differential
Integral SCSI controller 6: Version QL1040B (rev. 2), differential
Integral SCSI controller 0: Version QL1040B (rev. 2), single ended
Disk drive: unit 1 on SCSI controller 0 (unit 1)
CDROM: unit 6 on SCSI controller 0
Integral SCSI controller 1: Version QL1040B (rev. 2), single ended
Integral SCSI controller 2: Version Fibre Channel QL2200A, 33 MHz PCI
IOC3/IOC4 serial port: tty1
IOC3/IOC4 serial port: tty2
IOC3/IOC4 serial port: tty3
IOC3/IOC4 serial port: tty4
IOC3 parallel port: plp1
Graphics board: EMXI
Gigabit Ethernet: eg0, module 1, XIO slot io6, firmware version 0.0.0
Integral Fast Ethernet: ef0, version 1, module 1, slot io1, pci 2
Iris Audio Processor: version RAD revision 7.0, number 1
Origin MSCSI board, module 1 slot 7: Revision 3
Origin BASEIO board, module 1 slot 1: Revision 4
Origin PCI XIO board, module 1 slot 2: Revision 3
PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 0
PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 1
PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 2
PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 3
PCI Adapter ID (vendor 0x14e4, device 0x1645) PCI slot 0
PCI Adapter ID (vendor 0x10df, device 0xfc00) PCI slot 1
PCI Adapter ID (vendor 0x10a9, device 0x0003) PCI slot 6
PCI Adapter ID (vendor 0x10a9, device 0x0003) PCI slot 2
PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 0
PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 1
PCI Adapter ID (vendor 0x10a9, device 0x0005) PCI slot 7
PCI Adapter ID (vendor 0x10a9, device 0x0009) PCI slot 1
PCI Adapter ID (vendor 0x1077, device 0x2200) PCI slot 1
IOC3/IOC4 external interrupts: 1
HUB in Module 1/Slot 1: Revision 5 Speed 100.00 Mhz (enabled)
HUB in Module 1/Slot 2: Revision 5 Speed 100.00 Mhz (enabled)
HUB in Module 1/Slot 3: Revision 5 Speed 100.00 Mhz (enabled)
IP27prom in Module 1/Slot n1: Revision 6.156
IP27prom in Module 1/Slot n2: Revision 6.69
IP27prom in Module 1/Slot n3: Revision 6.156
IO6prom on Global Master Baseio in Module 1/Slot io1: Revision 6.156
IRIS 2#
Edited: WTF? Yes, we still have the rev mismatch, but I'm now getting healthy boots with all CPUs enabled and no disabled RAM, and I didn't do anything but add nodes.
Edited: Oh...
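For spotting which node board is the odd one out, here is a minimal sketch that reads the `IP27prom` lines from the `hinv -v` output above. The function names are invented for this example, and it assumes the revision is a major.minor pair that should be compared as two integers. That assumption matters: compared as decimals, 6.69 would look newer than 6.156.

```python
import re

# IP27prom / IO6prom lines copied from the IRIX `hinv -v` output.
HINV = """\
IP27prom in Module 1/Slot n1: Revision 6.156
IP27prom in Module 1/Slot n2: Revision 6.69
IP27prom in Module 1/Slot n3: Revision 6.156
IO6prom on Global Master Baseio in Module 1/Slot io1: Revision 6.156
"""

PROM_LINE = re.compile(
    r"IP27prom in Module \d+/Slot (n\d+): Revision (\d+)\.(\d+)"
)


def prom_revisions(hinv_text):
    """Map node slot -> (major, minor) PROM revision as integers."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)))
        for m in PROM_LINE.finditer(hinv_text)
    }


def downrev_slots(revs):
    """Return the slots whose PROM is older than the newest one seen."""
    newest = max(revs.values())
    return sorted(slot for slot, rev in revs.items() if rev < newest)


revs = prom_revisions(HINV)
print("revisions:", revs)
print("older than the rest:", downrev_slots(revs))
```

On the listing above this singles out n2, which is the same slot the barrier sync warnings point at.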
Code:
WARNING: /hw/module/1/slot/n2/node/cpubus/0/a: Number of consecutive exceptions
(physical HARDWARE ERROR STATE:
+ Errors on node Nasid 0x2 (2)
+ IP27 in /hw/module/1/slot/n3 [serial number HXC897]
+ HUB signalled following errors.
+ ICRB4_A entry Register: 0xa0000190052040
+ 39<->02: SN0Net address 0x64014810
+ 56<->56: CRB has an error
+ ICRB4_B entry Register: 0xa5104e000000
+ ICRB6_A entry Register: 0xa0000190052040
+ 39<->02: SN0Net address 0x64014810
+ 56<->56: CRB has an error
+ ICRB6_B entry Register: 0xa5104e000000
+ ICRB9_A entry Register: 0xa0000190052040
+ 39<->02: SN0Net address 0x64014810
+ 56<->56: CRB has an error
+ ICRB9_B entry Register: 0xa5104e000000
+ ICRBb_A entry Register: 0xa0000190052040
+ 39<->02: SN0Net address 0x64014810
+ 56<->56: CRB has an error
+ ICRBb_B entry Register: 0xa5104e000000
+ ICRBc_A entry Register: 0xa0000190052040
+ 39<->02: SN0Net address 0x64014810
+ 56<->56: CRB has an error
+ ICRBc_B entry Register: 0xa5104e000000
End Hardware Error State
++FRU ANALYSIS BEGIN
++
++
++ FRU Analysis Summary
++
++ Software : 70%
++
++FRU ANALYSIS END
PANIC: /hw/module/1/slot/n2/node/cpubus/0/a: Kernel Data Bus error in Cached spa
ce at physical address 0x3200a4010 /hw/module/1/slot/n4/node/memory/dimm_bank/1
(EPC 0xc000000000020440)
NOTICE - cpu 4 didn't dump TLB, may be hung
Dumping to /hw/module/1/slot/io1/baseio/pci/0/scsi_ctlr/0/target/1/lun/0/disk/pa
rtition/1/block at block 0, space: 0x2000 pages
Waiting 5 seconds for I/O processor.
CPU 3 is the I/O processor.
Dumping low memory...
Dumping static kernel pages......
Dumping pfdat pages....
Dumping backtrace pages...
Dumping dynamic kernel pages.........
Dumping buffer pages...
Dumping remaining in-use pages......WARNING: /hw/module/1/slot/n2/node/cpubus/0/
a: Number of consecutive exceptions (physical address 0x3200a4000) exceeded limi
t
WARNING: /hw/module/1/slot/n2/node/cpubus/0/a: Unrecoverable bus error exception
, node 0x3 paddr 0x3200a4000
WARNING: /hw/module/1/slot/n2/node/cpubus/0/a: Mem info: standard dir entry hi 0
x0 entry lo 0x33a3
WARNING: /hw/module/1/slot/n2/node/cpubus/0/a: Number of consecutive exceptions
(physical address 0x3202a4000) exceeded limit
WARNING: /hw/module/1/slot/n2/node/cpubus/0/a: Unrecoverable bus error exception
, node 0x3 paddr 0x3202a4000
WARNING: /hw/module/1/slot/n2/node/cpubus/0/a: Mem info: standard dir entry hi 0
x0 entry lo 0x33a3
Dumping free pages............WARNING: /hw/module/1/slot/n2/node/cpubus/0/a: Num
ber of consecutive exceptions (physical address 0x320000000) exceeded limit
WARNING: /hw/module/1/slot/n2/node/cpubus/0/a: Unrecoverable bus error exception
, node 0x3 paddr 0x320000000
WARNING: /hw/module/1/slot/n2/node/cpubus/0/a: Mem info: standard dir entry hi 0
x0 entry lo 0x312c
WARNING: /hw/module/1/slot/n2/node/cpubus/0/a: Number of consecutive exceptions
(physical address 0x320004000) exceeded limit
WARNING: /hw/module/1/slot/n2/node/cpubus/0/a: Unrecoverable bus error exception
, node 0x3 paddr 0x320004000
(this continues on and on...)
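The physical addresses in that panic can be split by hand into a node and a bank. This is a rough sketch, assuming the usual SN0 layout where each node owns 4 GB of physical address space (node ID in the bits above bit 31) and each DIMM bank slot spans 512 MB. Both shift values are assumptions on my part rather than something the log states, but the result agrees with what the kernel printed ("node 0x3" and `dimm_bank/1` on slot n4).

```python
NODE_SHIFT = 32  # assumption: 4 GB of physical address space per node
BANK_SHIFT = 29  # assumption: 512 MB of address space per DIMM bank slot


def decode_paddr(paddr):
    """Split a physical address into (nasid, bank, offset_within_node)."""
    nasid = paddr >> NODE_SHIFT
    offset = paddr & ((1 << NODE_SHIFT) - 1)
    bank = offset >> BANK_SHIFT
    return nasid, bank, offset


# Addresses copied from the PANIC and WARNING lines in the log.
PADDRS = [0x3200A4010, 0x3200A4000, 0x3202A4000, 0x320000000, 0x320004000]

for paddr in PADDRS:
    nasid, bank, offset = decode_paddr(paddr)
    print(f"{paddr:#x}: nasid {nasid}, bank {bank}, offset {offset:#x}")
```

Every address lands on nasid 3, bank 1. Going by the PROM `hinv -v` listing (nasid 0 on n1, 1 on n2, 2 on n3), nasid 3 would be the board in slot n4, so all of the bus errors point at one bank on the newly added node.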
A reboot found that the RAM in the last node had failed again. The system came up fine after it was disabled. I'll work on that later, but for now the whole system is up and running.
Code:
IRIS 2# hinv -v
8 250 MHZ IP27 Processors
CPU: MIPS R10000 Processor Chip Revision: 3.4
FPU: MIPS R10010 Floating Point Chip Revision: 3.4
CPU 0 at Module 1/Slot 1/Slice A: 250 Mhz MIPS R10000 Processor Chip (enabled)
Processor revision: 3.4. Scache: Size 4 MB Speed 250 Mhz Tap 0x9
CPU 1 at Module 1/Slot 1/Slice B: 250 Mhz MIPS R10000 Processor Chip (enabled)
Processor revision: 3.4. Scache: Size 4 MB Speed 250 Mhz Tap 0x9
CPU 2 at Module 1/Slot 2/Slice A: 250 Mhz MIPS R10000 Processor Chip (enabled)
Processor revision: 3.4. Scache: Size 4 MB Speed 250 Mhz Tap 0x19
CPU 3 at Module 1/Slot 2/Slice B: 250 Mhz MIPS R10000 Processor Chip (enabled)
Processor revision: 3.4. Scache: Size 4 MB Speed 250 Mhz Tap 0x19
CPU 4 at Module 1/Slot 3/Slice A: 250 Mhz MIPS R10000 Processor Chip (enabled)
Processor revision: 3.4. Scache: Size 4 MB Speed 250 Mhz Tap 0x9
CPU 5 at Module 1/Slot 3/Slice B: 250 Mhz MIPS R10000 Processor Chip (enabled)
Processor revision: 3.4. Scache: Size 4 MB Speed 250 Mhz Tap 0x9
CPU 6 at Module 1/Slot 4/Slice A: 250 Mhz MIPS R10000 Processor Chip (enabled)
Processor revision: 3.4. Scache: Size 4 MB Speed 250 Mhz Tap 0x0
CPU 7 at Module 1/Slot 4/Slice B: 250 Mhz MIPS R10000 Processor Chip (enabled)
Processor revision: 3.4. Scache: Size 4 MB Speed 250 Mhz Tap 0x0
Main memory size: 1920 Mbytes
Instruction cache size: 32 Kbytes
Data cache size: 32 Kbytes
Secondary unified instruction/data cache size: 4 Mbytes
Memory at Module 1/Slot 1: 256 MB (enabled)
Bank 0 contains 128 MB (Standard) DIMMS (enabled)
Bank 1 contains 128 MB (Standard) DIMMS (enabled)
Memory at Module 1/Slot 2: 512 MB (enabled)
Bank 0 contains 128 MB (Standard) DIMMS (enabled)
Bank 1 contains 128 MB (Standard) DIMMS (enabled)
Bank 2 contains 128 MB (Standard) DIMMS (enabled)
Bank 3 contains 128 MB (Standard) DIMMS (enabled)
Memory at Module 1/Slot 3: 1024 MB (enabled)
Bank 0 contains 256 MB (Standard) DIMMS (enabled)
Bank 1 contains 256 MB (Standard) DIMMS (enabled)
Bank 2 contains 256 MB (Standard) DIMMS (enabled)
Bank 3 contains 256 MB (Standard) DIMMS (enabled)
Memory at Module 1/Slot 4: 128 MB (enabled)
Bank 0 contains 128 MB (Standard) DIMMS (enabled)
Bank 1 contains 128 MB (Standard) DIMMS (disabled)
ROUTER in Module 1/Slot 2: Revision 2: Active Ports [4,5,6] (enabled)
ROUTER in Module 1/Slot 4: Revision 2: Active Ports [4,5,6] (enabled)
Integral SCSI controller 3: Version QL1040B (rev. 2), differential
Integral SCSI controller 4: Version QL1040B (rev. 2), differential
Integral SCSI controller 5: Version QL1040B (rev. 2), differential
Integral SCSI controller 6: Version QL1040B (rev. 2), differential
Integral SCSI controller 1: Version QL1040B (rev. 2), single ended
Integral SCSI controller 0: Version QL1040B (rev. 2), single ended
Disk drive: unit 1 on SCSI controller 0 (unit 1)
CDROM: unit 6 on SCSI controller 0
Integral SCSI controller 2: Version Fibre Channel QL2200A, 33 MHz PCI
IOC3/IOC4 serial port: tty1
IOC3/IOC4 serial port: tty2
IOC3/IOC4 serial port: tty3
IOC3/IOC4 serial port: tty4
IOC3 parallel port: plp1
Graphics board: EMXI
Gigabit Ethernet: eg0, module 1, XIO slot io6, firmware version 0.0.0
Integral Fast Ethernet: ef0, version 1, module 1, slot io1, pci 2
Iris Audio Processor: version RAD revision 7.0, number 1
Origin PCI XIO board, module 1 slot 2: Revision 3
Origin MSCSI board, module 1 slot 7: Revision 3
Origin BASEIO board, module 1 slot 1: Revision 4
PCI Adapter ID (vendor 0x14e4, device 0x1645) PCI slot 0
PCI Adapter ID (vendor 0x10df, device 0xfc00) PCI slot 1
PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 0
PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 1
PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 2
PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 3
PCI Adapter ID (vendor 0x10a9, device 0x0009) PCI slot 1
PCI Adapter ID (vendor 0x10a9, device 0x0003) PCI slot 6
PCI Adapter ID (vendor 0x10a9, device 0x0003) PCI slot 2
PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 0
PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 1
PCI Adapter ID (vendor 0x10a9, device 0x0005) PCI slot 7
PCI Adapter ID (vendor 0x1077, device 0x2200) PCI slot 1
IOC3/IOC4 external interrupts: 1
HUB in Module 1/Slot 1: Revision 5 Speed 100.00 Mhz (enabled)
HUB in Module 1/Slot 2: Revision 5 Speed 100.00 Mhz (enabled)
HUB in Module 1/Slot 3: Revision 5 Speed 100.00 Mhz (enabled)
HUB in Module 1/Slot 4: Revision 5 Speed 100.00 Mhz (enabled)
IP27prom in Module 1/Slot n1: Revision 6.156
IP27prom in Module 1/Slot n2: Revision 6.69
IP27prom in Module 1/Slot n3: Revision 6.156
IP27prom in Module 1/Slot n4: Revision 6.57
IO6prom on Global Master Baseio in Module 1/Slot io1: Revision 6.156
Edited: I hate computers.
Removed the node, cleaned the RAM and the slots, and put it back in....
Code:
IP27 PROM SGI Version 6.156 built 11:27:56 AM Nov 18, 2003
Testing/Initializing memory ............... DONE
Copying PROM code to memory ............... DONE
Discovering local IO ......................
Found an io Board with nic 030-1275-002.
Found an io Board with nic 030-1275-004.
DONE
Discovering NUMAlink connectivity ......... DONE
Found 6 objects (4 hubs, 2 routers) in 66460 usec
Waiting for peers to complete discovery.... DONE
Recognized 390 MHz midplane
Global master is /hw/module/1/slot/n1
*** Barrier sync warning: local=4418, NIC 0x329e00=4400 (promrev mismatch?)
*** Barrier sync warning: local=4446, NIC 0x329e00=4428 (promrev mismatch?)
*** Barrier sync warning: local=4908, NIC 0x329e00=4890 (promrev mismatch?)
*** Barrier sync warning: local=4981, NIC 0x329e00=4963 (promrev mismatch?)
Testing/Initializing all memory ........... DONE
Initializing headless node at nasid 3
.----- MEMORY FAILURE (miscompare, node slot 1) -----
No hub MD error registered
Address 0xa800000300001000
Actual 0x9e0de114090dde55
Expected 0x9e0de51c090dde55
Difference 0x0000040800000000
----- MEMORY FAILURE (miscompare, node slot 1) -----
No hub MD error registered
Address 0xa800000300001020
Actual 0xa20d994439360ab9
Expected 0xa20d9d4c39360ab9
Difference 0x0000040800000000
----- MEMORY FAILURE (miscompare, node slot 1) -----
No hub MD error registered
Address 0xa800000300001040
Actual 0xa60d5174695e371d
Expected 0xa60d557c695e371d
Difference 0x0000040800000000
----- MEMORY FAILURE (miscompare, node slot 1) -----
No hub MD error registered
Address 0xa800000300001060
Actual 0xaa0d09a499866381
Expected 0xaa0d0dac99866381
Difference 0x0000040800000000
Exceeded maximum error count (stopping)
.......*** Swapping bank 0 with bank 1 on headless node nasid 3
Discovering local IO ...................... DONE
*** Barrier sync warning: local=5108, NIC 0x329e00=5090 (promrev mismatch?)
waiting for node with nic 3472e2 at module 1 slot 3 at global barrier.......
Checking partitioning information ......... DONE
*** Barrier sync warning: local=5355, NIC 0x329e00=5337 (promrev mismatch?)
Loading BASEIO prom ....................... DONE
BASEIO PROM Monitor SGI Version 6.156 built 11:26:28 AM Nov 18, 2003 (BE64)
6 CPUs on 4 nodes found.
Installing PROM Device drivers ............
Base I/O Ethernet set to /dev/ethernet/ef0
Walking SCSI Adapter 0 (/hw/module/1/slot/io1), (pci id 0)
1+ 2- 3- 4- 5- 6+ 7- 8- 9- 10- 11- 12- 13- 14- 15- = 2 device(s)
Walking SCSI Adapter 1 (/hw/module/1/slot/io1), (pci id 1)
1- 2- 3- 4- 5- 6- 7- 8- 9- 10- 11- 12- 13- 14- 15- = 0 device(s)
Initializing PROM Device drivers .......... DONE
Checking hardware inventory ............... Found new or re
-enabled component MEM BANK 0 1 2 3
***Warning: Found a new IP27 board in module 1, slot n3, serial HXC897
Please use the 'update' command from the PROM Monitor to update the inventory
***Warning: Found a new IP27 board in module 1, slot n4, serial JHL636
Please use the 'update' command from the PROM Monitor to update the inventory
***Warning: Found a new MSCSI board in module 1, slot io7, serial EAX394
Please use the 'update' command from the PROM Monitor to update the inventory
DONE
**** System Configuration and Diagnostics Summary ****
CONFIG:
No. of NODEs enabled = 4
No. of NODEs disabled = 0
No. of CPUs enabled = 6
No. of CPUs disabled = 2
Mem enabled = 1920 MB
Mem disabled = 128 MB
No. of RTRs enabled = 2
No. of RTRs disabled = 0
DIAG RESULTS:
/hw/module/1/slot/n4/node/cpu/0: CPU A disabled
Reason: PROM copied to memory (bank 0) is bad.
/hw/module/1/slot/n4/node/cpu/1: CPU B disabled
Reason: PROM copied to memory (bank 0) is bad.
/hw/module/1/slot/n4/node/mem: MEMBANK(S) 0 disabled
Reason:
Bank 0: Some DIMMs failed mem test.
**** End System Configuration and Diagnostics Summary ****
Lovely. The node went back to being completely unusable again. Should have left it alone when it was at least leaving the CPUs enabled.
Last edit for now, I swear: I got the CPUs back by rotating the RAM around (which still fails) and cleaning the compression connector, but now node 1 is silently failing at boot and locking the system up... I'm done for now.
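The "Difference" field in those MEMORY FAILURE blocks is just actual XOR expected, so it is easy to check which data bits went wrong. A small sketch using the four miscompares copied from the log above (the helper name `flipped_bits` is made up for this example):

```python
# (address, actual, expected) triples copied from the PROM memory test output.
MISCOMPARES = [
    (0xA800000300001000, 0x9E0DE114090DDE55, 0x9E0DE51C090DDE55),
    (0xA800000300001020, 0xA20D994439360AB9, 0xA20D9D4C39360AB9),
    (0xA800000300001040, 0xA60D5174695E371D, 0xA60D557C695E371D),
    (0xA800000300001060, 0xAA0D09A499866381, 0xAA0D0DAC99866381),
]


def flipped_bits(actual, expected):
    """Return (xor_difference, list of bit positions that differ)."""
    diff = actual ^ expected
    return diff, [bit for bit in range(64) if (diff >> bit) & 1]


for addr, actual, expected in MISCOMPARES:
    diff, bits = flipped_bits(actual, expected)
    # A bit counts as "dropped" if it was written as 1 but read back as 0.
    dropped = [b for b in bits if (expected >> b) & 1]
    print(f"{addr:#018x}: diff {diff:#018x}, bits {bits}, dropped {dropped}")
```

All four failures differ in exactly the same two bits (35 and 42), and in each case the bit was expected to be 1 and read back as 0. That pattern would be consistent with a couple of data lines not making contact, or one bad chip, rather than random corruption, though the log alone cannot tell those apart.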