Let's say you find, buy, or inherit two or more Altix 350 modules. You connect everything up, but it just won't seem to make it to the EFI shell - in fact it dumps you into the Power On Diagnostics (POD) instead. It might look something like this:
Code:
SGI SN1 L1 Controller
Firmware Image B: Rev. 1.26.9, Built 02/12/2004 13:59:52
001c01-L1>* pwr up
001c01-L1>
entering console mode 001c01 CPU0, <CTRL_T> to escape to L1
INFO: console subchannel changed: 001c08 CPU2
001c08#0c: SGI SAL Version 3.25 rel040225 IP41 built 12:01:43 PM Feb 25, 2004
INFO: console subchannel changed: 001c08 CPU0
001c08#0a: SGI SAL Version 3.25 rel040225 IP41 built 12:01:43 PM Feb 25, 2004
001c01#0c: SGI SAL Version 4.43 rel051202 IP41 built 02:14:24 PM Dec 2, 2005
001c01#0a: SGI SAL Version 4.43 rel051202 IP41 built 02:14:24 PM Dec 2, 2005
Probing memory DIMMs ...............Found I/O brick attached to module/001c01/sl
ab/0/node
Probing memory DIMMs ........................ DONE
Initializing memory controller ............ DONE
Testing memory .............................. DONE
Initializing memory controller ............ DONE
Testing memory ............................ DONE
Initializing memory .................... DONE
.Switching to RAM and testing CPU .......... DONE
Discovering NUMAlink connectivity ......... DONE
Found 2 objects (2 chipsets, 0 routers, 0 iochipsets) in 3909 usec
Waiting for peers to complete discovery.... .. DONE
Initializing memory ....................... DONE
Switching to RAM and testing CPU .......... DONE
Discovering NUMAlink connectivity ......... DONE
Found 2 objects (2 chipsets, 0 routers, 0 iochipsets) in 15722 usec
Waiting for peers to complete discovery.... DONE
DONE
tree barrier at module/001c01/slab/0/node timed out
POD entered via MCA, using Cac mode
INFO: console subchannel changed: 001c08 CPU2
POD entered via MCA, using Cac mode
INFO: console subchannel changed: 001c08 CPU0
0 002: POD SysCt Cac> INFO: console subchannel changed: 001c08 CPU2
2 002: POD SysCt Cac> *** module/001c08/slab/0/node has taken an exception! Cont
inuing...
Discovering local I/O on nasid 0 ......... DONE
Checking partitioning information ......... DONE
Erecting partition fences ................. DONE
POD entered via MCA, using Cac mode
0 000: POD SysCt Cac> POD entered via MCA, using Cac mode
2 000: POD SysCt Cac>
As frequently happens, you may have received these modules in pieces, and everything is suspect. Did you get the right RAM? Are the NUMAlink cables good? Was there some other step you were supposed to take? You may have some or all of those issues, but here we're going to tackle just one: the two modules have different versions of the PROM.
If you look at the snippet above you'll see "001c01#0c: SGI SAL Version 4.43", which indicates that module 001c01 (probably your base module) has PROM version 4.43, whereas the line "001c08#0a: SGI SAL Version 3.25" indicates that module 001c08 has PROM version 3.25. Not only is PROM 3.25 too old to work with version 4.43, it's too old to work with version 2.6 of the Linux kernel...
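If you've captured the console output to a file, a mismatch like this can also be spotted mechanically. Here is a minimal Python sketch, assuming the boot log uses the "SGI SAL Version" line format shown above. The function names are made up for illustration and are not part of any SGI tool.

```python
import re

# Matches boot lines such as
# "001c08#0a: SGI SAL Version 3.25 rel040225 IP41 built ..."
SAL_LINE = re.compile(
    r"^[ \t]*(\w+)#\w+: SGI SAL Version (\d+\.\d+)", re.MULTILINE
)


def prom_versions(boot_log):
    """Return {module: set of PROM versions} found in a captured boot log."""
    versions = {}
    for module, version in SAL_LINE.findall(boot_log):
        versions.setdefault(module, set()).add(version)
    return versions


def version_key(version):
    """Turn '4.43' into (4, 43) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def outdated_modules(boot_log):
    """List modules that report anything other than the newest PROM seen."""
    versions = prom_versions(boot_log)
    seen = set()
    for module_versions in versions.values():
        seen |= module_versions
    if not seen:
        return []
    newest = max(seen, key=version_key)
    return sorted(m for m, vs in versions.items() if vs != {newest})


SAMPLE_LOG = """
INFO: console subchannel changed: 001c08 CPU2
001c08#0c: SGI SAL Version 3.25 rel040225 IP41 built 12:01:43 PM Feb 25, 2004
001c08#0a: SGI SAL Version 3.25 rel040225 IP41 built 12:01:43 PM Feb 25, 2004
001c01#0c: SGI SAL Version 4.43 rel051202 IP41 built 02:14:24 PM Dec 2, 2005
001c01#0a: SGI SAL Version 4.43 rel051202 IP41 built 02:14:24 PM Dec 2, 2005
"""

print(outdated_modules(SAMPLE_LOG))  # prints ['001c08']
```

This only tells you which module is behind; it says nothing about whether two particular PROM versions can actually work together.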
Fortunately, there's a way to fix this even if you can't get to the EFI shell. SGI's documentation tells you to use the EFI shell's "flash" command to reprogram a module's PROM from a binary file, which doesn't help when the system never gets that far. Many thanks to forum member rosmaniac for sharing the method shown below in this thread.
Though nobody has reported seeing documentation on the POD, there is a "help" command, and the help command tells you about another command: the "flash" command. It's not quite the same as the EFI Shell command of the same name - instead of flashing a PROM image stored in a file, it burns the PROM image from the master node into the node you select.
How do you tell the "flash" command which node to act on? You need something called the node's NASID, and there's another command the POD provides - "pcfg" - that will tell you what those are. So first, let's see how the "pcfg" command does that.
Code:
2 000: POD SysCt Cac> version
SGI SAL Version 4.43 rel051202 IP41 built 02:14:24 PM Dec 2, 2005
2 000: POD SysCt Cac> pcfg
NUMAlink Topology: (node 0):
Entry 0: SHub 001c01#0 Chiprev=3 Route=0x0
Module=001c01 Slab=0 Partition=0 Space=RESET
Nasid=0 Flags=0x100000 Syssize=0 Prom=4.43
Port 1 connection: Entry 1 SHub 001c08#0 port 2
Port 1 status: UP NF
Port 2 connection: Entry 1 SHub 001c08#0 port 1
Port 2 status: UP NF
Entry 1: SHub 001c08#0 Chiprev=3 Route=0x1
Module=001c08 Slab=0 Partition=0 Space=RESET
Nasid=2 Flags=0x1110000 Syssize=0 Prom=3.25
Port 1 connection: Entry 0 SHub 001c01#0 port 2
Port 1 status: UP NF
Port 2 connection: Entry 0 SHub 001c01#0 port 1
Port 2 status: UP NF
2 000: POD SysCt Cac>
The first thing I do is run the "version" command, just to make sure that the POD I'm talking to is running from the correct (newer) PROM version.
You can see that when I run the "pcfg" command the output contains two Entry blocks, one for each module. Find the entry for the module you want ("Module=001c08"), confirm that this module has the version of the PROM you want to replace ("Prom=3.25"), and then note the value of the NASID for this entry ("Nasid=2").
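If you're dealing with more than a couple of modules, picking the NASIDs out by eye gets tedious. The sketch below pulls them out of a saved copy of the "pcfg" output. It assumes the output looks like the listing above, with an "Entry N:" header followed by "Key=Value" fields, and the function names are hypothetical helpers, not POD commands.

```python
import re

ENTRY_HEADER = re.compile(r"\s*Entry \d+:")  # "Entry 0: SHub 001c01#0 ..."
FIELD = re.compile(r"(\w+)=(\S+)")           # "Nasid=2", "Prom=3.25", ...


def parse_pcfg(text):
    """Return one dict of Key=Value fields per Entry block."""
    entries = []
    current = None
    for line in text.splitlines():
        if ENTRY_HEADER.match(line):
            current = {}
            entries.append(current)
        if current is not None:
            current.update(FIELD.findall(line))
    return entries


def nasids_to_flash(text, wanted_prom):
    """Return (module, nasid) pairs whose PROM differs from wanted_prom."""
    return [
        (entry["Module"], int(entry["Nasid"]))
        for entry in parse_pcfg(text)
        if "Module" in entry and "Nasid" in entry
        and entry.get("Prom") != wanted_prom
    ]


SAMPLE_PCFG = """
NUMAlink Topology: (node 0):
Entry 0: SHub 001c01#0 Chiprev=3 Route=0x0
Module=001c01 Slab=0 Partition=0 Space=RESET
Nasid=0 Flags=0x100000 Syssize=0 Prom=4.43
Port 1 connection: Entry 1 SHub 001c08#0 port 2
Port 1 status: UP NF
Entry 1: SHub 001c08#0 Chiprev=3 Route=0x1
Module=001c08 Slab=0 Partition=0 Space=RESET
Nasid=2 Flags=0x1110000 Syssize=0 Prom=3.25
Port 1 connection: Entry 0 SHub 001c01#0 port 2
Port 1 status: UP NF
"""

print(nasids_to_flash(SAMPLE_PCFG, "4.43"))  # prints [('001c08', 2)]
```

Treat the result as a checklist only, and still compare it against the "pcfg" output on the console before you type anything destructive.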
Now you're ready to flash the newer PROM to the out-of-date module, using the NASID. Make sure that the command you've entered is the one you want: the "flash" command will not ask you to confirm!
Code:
2 000: POD SysCt Cac> flash 2
Flashing node 2
...erasing sectors
................................................................................
................Done.
...copying prom
source address : 0x80000087ffa00000
destination address: 0x8000008fffa00000
size (bytes) : 0x0000000000600000
...programming
................................................................................
................Flash of node 2 complete.
Waiting for all flash operations to complete...DONE.
2 000: POD SysCt Cac>
That's it - the newer PROM image from module 001c01 has been written to module 001c08. But it won't take effect until you reset everything:
Code:
2 000: POD SysCt Cac> reset
001c08#0c: SGI SAL Version 4.43 rel051202 IP41 built 02:14:24 PM Dec 2, 2005
001c01#0c: SGI SAL Version 4.43 rel051202 IP41 built 02:14:24 PM Dec 2, 2005
INFO: console subchannel changed: 001c08 CPU0
001c08#0a: SGI SAL Version 4.43 rel051202 IP41 built 02:14:24 PM Dec 2, 2005
001c01#0a: SGI SAL Version 4.43 rel051202 IP41 built 02:14:24 PM Dec 2, 2005
Probing memory DIMMs ..............Found I/O brick attached to module/001c01/slab/0/node
Note that both modules are now reporting PROM version 4.43!
Code:
Probing memory DIMMs ............................. DONE
Initializing memory controller ............ DONE
Initializing memory controller ............ DONE
DONE
Testing memory .........................Testing memory .........................
... DONE
.Initializing memory .................... DONE
Switching to RAM and testing CPU .......... DONE
..Discovering NUMAlink connectivity ......... DONE
Found 2 objects (2 chipsets, 0 routers, 0 iochipsets) in 3912 usec
Waiting for peers to complete discovery.... DONE
Initializing memory ..................... DONE
Switching to RAM and testing CPU .......... DONE
Discovering NUMAlink connectivity ......... DONE
Found 2 objects (2 chipsets, 0 routers, 0 iochipsets) in 3910 usec
Waiting for peers to complete discovery.... DONE
DONE
Discovering local I/O on nasid 0 ......... DONE
Checking partitioning information ......... Checking partitioning
information ......... DONE
DONE
Erecting partition fences ................. Syncing EFI var. store (
module/001c01/slab/0/node->module/001c08/slab/0/node) ...DONE
4 CPUs on 2 nodes found.
...........-
..... DONE
Decompressing SAL runtime ................. DONE
Loading SAL runtime ....................... DONE
Decompressing EFI ......................... DONE
Loading EFI ............................... DONE
Altix IO Topology Information
*****************************
Serial Number:R200****
PCI SEGMENT PCIBUS NUMBER BRICK RACK:SLOT BUS CONNECTION TOPOLOGY
----------- ------------- --------------------- -------------------
0x0000 0x01 OPbrick 001:01 01 001c01:slot0:slab0:widget15:bus0
0x0000 0x02 OPbrick 001:01 02 001c01:slot0:slab0:widget15:bus1
EFI version 1.10 [14.62] Build flags: EFI64 Running on Intel(R) Itanium Processor EFI_DEBUG
EFI IA-64 SDV/FDK (No BIOS ) [Dec 2 2005 14:10:20] - INTEL
Copyright (c) 2000-2005 Broadcom Corporation
Broadcom NetXtreme Gigabit Ethernet EFI driver v8.1.1
Seg: 0 Bus: 1 Dev: 1 Func: 0 - SGI IOC4 ATA detected: Firmware Rev 79
Seg: 0 Bus: 1 Dev: 3 Func: 0 - Qlogic 12160 SCSI Controller detected: Firmware Rev 6
(Pun 1,Lun 0): FUJITSU MAP3735NC 5605
Broadcom NetXtreme Gigabit Ethernet (BCM5701) is detected (PCI)
EFI Boot Manager ver 1.10 [14.62]
Partition 0: Enabled Disabled
CBricks 2 Nodes 2 0
RBricks 0 CPUs 4 0
IOBricks 2 Mem(GB) 6 0
Loading device drivers
Please select a boot option
SUSE Linux Enterprise Server 11 SP1
EFI Shell [Built-in]
Boot option maintenance menu
Use ^ and v to change option(s). Use Enter to select an option
EFI Shell [Built-in]
Loading.: EFI Shell [Built-in]
EFI Shell version 1.10 [14.62]
Device mapping table
fs0 : Acpi(PNP0A03,1)/Pci(3|0)/Scsi(Pun1,Lun0)/HD(Part1,Sig3B6D7BAE-C470-476E
-BC5D-F3F974DB367E)
blk0 : Acpi(PNP0A03,1)/Pci(3|0)/Scsi(Pun1,Lun0)
blk1 : Acpi(PNP0A03,1)/Pci(3|0)/Scsi(Pun1,Lun0)/HD(Part1,Sig3B6D7BAE-C470-476E
-BC5D-F3F974DB367E)
blk2 : Acpi(PNP0A03,1)/Pci(3|0)/Scsi(Pun1,Lun0)/HD(Part2,Sig5FEB84EE-6CD6-4687
-8004-D119966337C7)
blk3 : Acpi(PNP0A03,1)/Pci(3|0)/Scsi(Pun1,Lun0)/HD(Part3,SigA5D5C908-46E3-44B9
-984F-B7C3C24FF740)
Shell>
And now not only are the modules running the same version of the PROM, the system can start the EFI Shell and you can take the next steps in troubleshooting this system.