HOWTO: Restore Fuel L1 firmware after a broken update

I was getting a "firmware too old" message from numastatd at startup, so I thought it would be a good idea to update the L1 firmware to 1.44.0, which came with IRIX 6.5.30.

It turns out that this combination of -003 motherboard and 1.44.0 L1 firmware renders the system unbootable. Won't respond to power button, verbal abuse, nothing.

The procedure to reverse this is quite simple, though :)

Remove the machine's side panel. Somewhere near the SCSI connector on the motherboard you will find an RS-232 port; attach a null-modem cable to it.
On the other machine, start a terminal emulator ('cu' works well) and set it to 38400 8N1 (e.g. 'cu -l /dev/your_serial_port -s 38400').
As soon as you plug in the power cable on the Fuel, you should be greeted by a prompt:

Code:

ALERT: Error reading the display I/O expander, no acknowledge


SGI SN1 L1 Controller
Firmware Image A: Rev. 1.44.0, Built 07/17/2006 18:19:54


001?01-L1>
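
If you would rather script the serial side than use 'cu', here is a minimal Python sketch of what "38400 8N1" means in termios terms. It assumes a POSIX system. The function names are made up for illustration, and turning off software flow control and echo is my own assumption, not something the L1 documentation was checked for.

```python
import os
import termios


def make_8n1(attrs, baud=termios.B38400):
    """Return a copy of a tcgetattr() list configured for raw 8N1 at `baud`."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = attrs
    # 8 data bits
    cflag &= ~termios.CSIZE
    cflag |= termios.CS8
    # N: no parity, 1: one stop bit
    cflag &= ~(termios.PARENB | termios.CSTOPB)
    # ignore modem control lines and enable the receiver
    cflag |= termios.CLOCAL | termios.CREAD
    # raw mode (assumption): no line editing, echo, signals or XON/XOFF
    lflag &= ~(termios.ICANON | termios.ECHO | termios.ISIG)
    iflag &= ~(termios.IXON | termios.IXOFF | termios.ICRNL)
    oflag &= ~termios.OPOST
    return [iflag, oflag, cflag, lflag, baud, baud, list(cc)]


def open_console(device):
    """Open a serial device (path is whatever your OS calls the port) at 38400 8N1."""
    fd = os.open(device, os.O_RDWR | os.O_NOCTTY)
    termios.tcsetattr(fd, termios.TCSANOW, make_8n1(termios.tcgetattr(fd)))
    return fd
```

After open_console('/dev/your_serial_port') you can os.read() and os.write() on the returned descriptor; L1 commands would presumably need a trailing carriage return, but that is untested here.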


The following entries in the log appeared at the time of the update:

Code:

09/29/09 21:21:28 L1 booting 1.44.0
09/29/09 21:21:28 vram checksum error - initializing core data.
09/29/09 21:21:28 ALERT: Error reading the display I/O expander, no acknowledge
09/29/09 21:21:28 ** fixing invalid SSN value


If you try to issue a power up command, you will receive this lovely message:

Code:

001?01-L1>pwr up
ERROR: no power supplies available.


Resetting the NVRAM didn't help, so I decided to boot the other L1 image.

Code:

001?01-L1>flash status
Flash image A currently booted

Image      Status        Revision    Built
-----   -------------   ----------   -----
A     default         1.44.0       07/17/2006 18:19:54
B     valid           1.10.12      02/01/2002 14:40:22
001?01-L1>flash default b

(if your L1 booted from image B, enter "flash default a" instead)

Code:

001?01-L1>reboot_l1


After this, everything works normally:

Code:

SGI SN1 L1 Controller
Firmware Image B: Rev. 1.10.12, Built 02/01/2002 14:40:22


001a01-L1>flash status
Flash image B currently booted

Image      Status        Revision    Built
-----   -------------   ----------   -----
A     valid           1.44.0       07/17/2006 18:19:54
B     user default    1.10.12      02/01/2002 14:40:22
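
The "pick the image that is not currently booted" rule can be sketched in Python. This is only an illustration that parses 'flash status' output in the layout shown in this thread; the function names are hypothetical and other L1 revisions may format the table differently.

```python
import re


def parse_flash_status(text):
    """Return (booted_image, {image: {"status", "revision", "built"}})."""
    booted = None
    images = {}
    for line in text.splitlines():
        m = re.match(r"\s*Flash image ([AB]) currently booted", line)
        if m:
            booted = m.group(1)
            continue
        # Table columns are separated by runs of two or more spaces,
        # which lets a status such as "user default" keep its single space.
        m = re.match(
            r"\s*([AB])\s{2,}(.+?)\s{2,}(\d+(?:\.\d+)+)\s{2,}(.+?)\s*$", line
        )
        if m:
            name, status, revision, built = m.groups()
            images[name] = {"status": status, "revision": revision, "built": built}
    return booted, images


def recovery_command(text):
    """Build the 'flash default' command that selects the non-booted image."""
    booted, images = parse_flash_status(text)
    if booted is None:
        raise ValueError("no 'currently booted' line found")
    other = "B" if booted == "A" else "A"
    if other not in images:
        raise ValueError(f"image {other} is not listed in the status table")
    return f"flash default {other.lower()}"
```

Fed the first status listing above (image A booted), recovery_command() gives "flash default b", which is the command used in this post. You still need 'reboot_l1' afterwards.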


If anyone has the newest 1.48.0 L1 image, I could give it a try to see whether they've fixed this or not :)
ShadeOfBlue wrote: If you try to issue a power up command, you will receive this lovely message:

Code:

001?01-L1>pwr up
ERROR: no power supplies available.


Damn, that happened to me too, but in my case the system was a Fuel prototype!
I used the same procedure to recover the system.

ShadeOfBlue wrote: If anyone has the newest 1.48.0 L1 image, I could give it a try to see whether they've fixed this or not :)

Forget it. That's what I used:

Code:

Flash image B currently booted

Image      Status        Revision    Built
-----   -------------   ----------   -----
A     valid           1.48.1       01/22/2007 11:33:34
B     user default    1.9.15       12/04/2001 16:21:34
To accentuate the special identity of the IRIS 4D/70, Silicon Graphics' designers selected a new color palette. The machine's coating blends dark grey, raspberry and beige colors into a pleasing harmony. ( IRIS 4D/70 Superworkstation Technical Report )
jan-jaap wrote: Forget it. That's what I used:

Code:

Flash image B currently booted

Image      Status        Revision    Built
-----   -------------   ----------   -----
A     valid           1.48.1       01/22/2007 11:33:34
B     user default    1.9.15       12/04/2001 16:21:34

Oh well, I'll stick with 1.10.12 then and do a 'chkconfig numastatd off' :)
Question (not having an IP35-derived system yet):

Why haven't you re-flashed the first image back so you have a failsafe again?
ShadeOfBlue wrote: As soon as you plug in the power cable on the Fuel, you should be greeted by a prompt:

ALERT: Error reading the display I/O expander, no acknowledge

[...]

If you try to issue a power up command, you will receive this lovely message:

Code:

001?01-L1>pwr up
ERROR: no power supplies available.


This is interesting for an odd reason - I have a grafix card that is flaky. If the machine does boot then it runs fine. But when it doesn't, I get the exact symptoms you describe. It appears that whatever on the grafix card does the 'acknowledge' during the POST is headed south. Dr. Dave, where are you?
Good question. Probably U11, though I don't have a Fuel video card handy. It's the Philips part, and should be basically an I/O chip with an I2C interface - I tried looking at the pics posted a while back but can't read the part number. Philips calls these parts "I/O Expanders" so no bowdlerization there.

So.... it's looking like in the case where there are issues, the I2C interface to the graphics card is not working, or at least the I2C peripherals on the card are not responding. I'd bet the later revisions of the L1 firmware have issues initializing the I2C controller on the motherboard, and thus (depending on the severity of the problem) may cause *none* of the I2C peripherals to be detected. This of course leads to the inevitable problem...

Hamei, have a look at U11 closely and see if there is anything like a cold solder joint. There is a cluster of I2C chips around there, including the Dallas environment monitor and an Atmel flash chip, as well as the Philips part and whatever else. Check them all. The address is latched internally at reset if I remember correctly, so basically if it's flaky and you reset it enough times it will eventually 'latch' the correct value and everything is then hunky-dory until the next reset/powerup.

I2C is a bidirectional serial protocol: all the peripheral chips are wired in parallel, and a unique 'address' is usually hard-strapped to each device by tying pins high and low. For the flaky board, it could be that one of the chips is getting an incorrect address strapped to it (which would interfere with the address decoding mechanism), or again the mainboard I2C controller (or whatever I2C controller the L1 has access to, probably on-chip) is not being set up, initialized, or run properly by the later L1 firmware.
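
To make the address-strapping point concrete, here is a small Python sketch. It assumes a device whose 7-bit address has four fixed upper bits and three strap pins (A2, A1, A0), as on the Philips PCF8574 family, whose fixed bits give a base of 0x20. That part is used purely as a familiar example; the actual chip on the Fuel graphics card is not identified in this thread, and the device names below are hypothetical.

```python
from collections import defaultdict


def strapped_address(base, a2, a1, a0):
    """7-bit I2C address: fixed upper bits from `base`, low 3 bits from strap pins."""
    return (base & 0x78) | (a2 << 2) | (a1 << 1) | a0


def find_collisions(devices):
    """Return {address: [names]} for every address claimed by more than one device."""
    by_address = defaultdict(list)
    for name, address in devices.items():
        by_address[address].append(name)
    return {addr: names for addr, names in by_address.items() if len(names) > 1}
```

With the expander strapped (0, 0, 1) it sits at 0x21 and a neighbour strapped (0, 0, 0) sits at 0x20, so there is no conflict. If a bad joint lets A0 read low at reset, the expander also lands on 0x20, find_collisions() reports both names at that address, and nothing answers at the address the firmware actually polls.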
SAQ wrote: Why haven't you re-flashed the first image back so you have a failsafe again?

That would be the rational thing to do. But the damn thing gave me a heart attack when it pulled that stunt on me (it wasn't my system) so I decided not to mess with it anymore :oops:
SAQ wrote: Why haven't you re-flashed the first image back so you have a failsafe again?

The system currently boots from the good image, so it can't be overwritten. If for some reason the system decides to boot off the second image, I can always attach the cable again and re-do the procedure.
It is unlikely that the machine would suddenly start ignoring the "user default" flash setting, so I'm just going to leave it as it is :)

SGI did a really crappy job at testing these IP35 updates... With such a small set of possible hardware combinations I'd expect them to test everything, at least to see if it powers up.
I know this is an old post, but to save anyone else from making the same mistake by being too keen to update firmware, I thought I should post this. SGI did test this sort of situation before releasing updates, and I think somewhere there is a release note that says not to jump straight from IRIX 6.5.11 or similar to 6.5.30, for a number of reasons.

If you have an older-series Fuel that gives you these "firmware too old" messages, simply do a flashsc update to a mid-life version such as 1.22.0, found in IRIX 6.5.21, followed by another update to 1.44.0, found in 6.5.30.

This avoids the problems mentioned above and would stop people incorrectly bad-mouthing SGI's tech guys.

I have plenty of Fuel systems with 030-1707-003 motherboards that originally ran very old firmware and are now successfully running 1.44.0. It just needs some patience and 5 minutes of your time to check the procedure before running it.
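
The two-step path can be written down as a small Python sketch. It assumes that any L1 revision older than 1.22.0 should go through 1.22.0 first; that cutoff is my reading of the advice above, not a documented SGI rule, and the function names are made up for illustration. Revisions are compared as tuples of integers, because a plain string comparison would wrongly rank 1.9.15 above 1.10.12.

```python
def parse_rev(rev):
    """Turn an L1 revision string such as '1.10.12' into a comparable tuple."""
    return tuple(int(part) for part in rev.split("."))


def update_path(current, stepping_stone="1.22.0", target="1.44.0"):
    """List the revisions to flash, in order, to get from `current` to `target`."""
    cur = parse_rev(current)
    steps = []
    if cur < parse_rev(stepping_stone):
        # Assumed cutoff: anything older than the mid-life release goes via it.
        steps.append(stepping_stone)
    if cur < parse_rev(target):
        steps.append(target)
    return steps
```

For the 1.10.12 image from the first post this yields 1.22.0 followed by 1.44.0, while a system already on 1.22.0 only needs the final step.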

Hope this helps anyhow.
Torfinn
tjsgifan wrote: This avoids the problems mentioned above and would stop people incorrectly bad-mouthing SGI's tech guys.

A firmware update should never render a system unbootable.

It wouldn't have been that hard to add a check to the flashsc program to prevent such a situation from occurring, or to actually handle non-incremental updates properly. For a system that used to cost many thousands of dollars, I'd expect them to get at least this right.

But this is hardly the only problem with the 1.44.0 update; the update for the O3000's L2 controller is actually missing a file in the firmware image, breaking a large part of the L2 controller's functionality (including the ability to reflash it to an earlier version) -- how does this even get past testing?

I usually have a lot of respect for SGI engineers, but whoever was responsible for quality assurance on the 1.44.0 update did a bad job.