Altix 3700 lost the EFI ... solved

Hello,

It didn't take me long to go from happy to having problems :?
The question is how much I have messed up the Altix. Curiosity killed the Altix?

In an attempt to get the last brick and its CPUs working, I opened it up and reseated all the memory.
Damn design. They put two mainboards sort of back to back, with 4 CPUs on each side. So I had
to open the top to deal with the memory, flip it over and do the same on the underside. It weighs a lot.

The memory was the only thing I dared to touch; everything else is screwed tight to the
mainboard. Anyway, that little operation didn't do any good: the top brick still showed 'Powered on'
while the others were at 'Testing memory'.
There is a row of red LEDs you can see from the back of the brick. I guess the blinking means the CPUs
are working, as on the O2000. On the top brick they are all solid red from startup, while the
other bricks' LEDs are blinking as if they are working.

So the next step, which usually gives some result on an O2000, was to 'pod', 'initalllogs', etc.
I didn't pay much attention to the warning text (as usual ;) ) and now the EFI doesn't come
up when I reboot. It stops right here:

POD entered via MCA, using Cac mode

0 002: POD SysCt Cac>

Does anyone know how to restore the EFI? Please tell me, or point me to some docs.

In Voralyan's Altix 350 PROM update thread (viewtopic.php?t=16725651#p7342874) I found this:

Ok, flash was successfull. The next step is important: Go to POD mode and clear all logs...
Beware: All PROM variables and EFI settings are lost after this...

But there is no description of how to fix it.

The only good thing to come out of this is that I now know, from the session log, what is wrong:

Module 001c33#2  Bank 1 failed memory tests..      DONE

If it is anything like the O2000, the node needs working memory in bank 1 to start booting; the other
banks are less important, since failing ones are just disabled.
If I swap out the bank 1 memory, the brick will most likely work and I'll only lose some memory.

And then I want the EFI back ... :(
... ehm ... it seems like it was no problem after all.

On Friday I turned off the system with a feeling that I had messed something up really badly,
and posted in this forum for help. I spent Saturday googling for an answer, to no avail.

Today I turned it on again and the EFI was back. It seems all it needed was a real
power cycle. On Friday I had only rebooted it, without properly switching it off and on :?

I also swapped the memory around in the non-working brick and managed to enable its CPUs.

Partition  0:                           Enabled Disabled
CBricks          8         Nodes         32        0
RBricks          0         CPUs          64        0
IOBricks         1         Mem(GB)      504        8

I should think more like Douglas Adams: 'Don't panic' :-)

However, it does not boot my SLES9 SP2 correctly. Maybe it's that XFS bug that makes it
hang during boot, which should have been fixed with SP2.

Loading kernel/fs/dmapi/dmapi.ko
Loading kernel/fs/exportfs/exportfs.ko
Loading kernel/fs/xfs/xfs.ko
SGI-XFS CVS-2004-10-17_05:00_UTC with ACLs, security attributes, realtime, large block/inode numbers, dmapi support, no debug enabled
Waiting for device /dev/sda4 to appear:  ok
rootfs:  major=8 minor=4 devn=2052
rootfs: /sys/block/sda/sda4 major=8 minor=4 devn=2052
attempt to access beyond end of device
sda4: rw=0, want=4, limit=2
EXT2-fs: unable to read superblock
attempt to access beyond end of device
sda4: rw=0, want=4,

And here it stops.

Glad to hear the machine is responsive again, and that you've managed to bring all the CPUs back online. Huzzah!

Not sure about the problem you're having booting SLES, though. This is a problem booting the existing installation, judging from the "sda4" in there.

Edit: Oh, right, you posted about installing SLES9 the first week of August. So pretty recently. You had rebooted a few times after installing, right? And, judging from your comments, you didn't apply SP2? Can you still boot from CD/DVD? Is there a rescue mode/disc, and if so, does it suggest anything useful?
Then? :IRIS3130: ... Now? :O3x02L: :A3504L: - :A3502L: :1600SW: +MLA :Fuel: :Octane2: :Octane: :Indigo2IMP: ... Other: DEC :BA213: :BA123: Sun , DG AViiON , NeXT :Cube:
Thanks. Yes, it feels better now.

Yes. I had SLES9 up and running. The first time I booted after installing on an empty disk, everything worked.
There was even a SLES option in the EFI menu. After a power cycle it wasn't in the EFI menu anymore, but
I could start it manually. Maybe I forgot to do something to make it a permanent boot option.
In an attempt to get it back into the EFI menu, I reinstalled the whole thing, starting with the SP2 CDs.
If I don't boot from the SP2 CD, it hangs during installation. I have tried that too :-)
The 'only' difference is that the disk was already partitioned and I installed on the existing partitions.

After that I experienced the first hang during boot. Maybe SP2 didn't get installed even though I used
the SP2 CDs. Then I had my little adventure with the EFI and the CPUs and another SLES reinstall, same as last time.
It resulted in the same hang during boot. Maybe a clean disk with no old partitions would have done the trick.

I'm giving up on SLES9 now anyway and moving on to SLES11 SP2. It has fewer discs to shuffle when installing :-)
This time I'm also going to erase old partitions.
bjornl wrote: I'm giving up on SLES9 now anyway and moving on to SLES11SP2. It has less discs to shuffle when installing :-)
This time I'm also going to erase old partitions.
Don't forget that you need a special EFI partition. I forget the details, but they should be covered in the installation guide.

I checked my Altix 350 on Friday and it's running SLES11 SP1, not SP2. Let me step back and explain the chain of logic that led me to ignore SP2.

I wanted the appropriate ProPack for SLES11. Why? Here's what SGI says you get:
sgi wrote: SGI ProPack 7 is the next generation of SGI's suite of performance optimization libraries and tools that accelerate applications and provide additional capabilities for SGI high performance computer systems.

The SGI ProPack 7 releases focus on the value-add features SGI develops such as the SGI Message Passing Toolkit, linkless FFIO, XVM volume manager, numatools, cpusets, Unified Parallel C, global reference unit (GRU) and superpages support on SGI Altix UV and SGI REACT Real-Time for Linux.

Sounds pretty cool, right? But if you check the release notes, PP7 only works with SLES11, and PP7 SP1 only works with SLES11 SP1. There is no PP7 for SLES11 SP2.

Furthermore, PP7 SP1 requires SFS2 SP1 or later. Okay, if we check this page, we can see which versions of SFS2 work with different versions of SLES11 and RHEL6. Oh, no IA64 support with RHEL6, and no IA64 versions of SFS2 after SLES11 SP1. Okay then, that leaves us with the following if you want an IA64 Altix with the latest possible Linux and ProPack:

SLES 11 SP1
SGI Foundation Software (SFS) 2 SP1
SGI ProPack 7 SP1
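
For anyone who wants to double-check the reasoning, the chain of constraints above can be sketched as a small lookup. This is only an illustration under stated assumptions: the tables hold nothing but the pairings described in this post, the release names are plain string labels, and `newest_stack` is a hypothetical helper, not anything SGI ships.

```python
from typing import Optional, Tuple

# IA64 SLES11 releases, oldest to newest (labels only).
SLES_RELEASES = ["SLES11", "SLES11 SP1", "SLES11 SP2"]

# ProPack release -> the single SLES release it supports, as described above.
# Nothing maps to SLES11 SP2 because there is no PP7 for it.
PROPACK_FOR_SLES = {
    "PP7": "SLES11",
    "PP7 SP1": "SLES11 SP1",
}

# ProPack release -> SFS2 release it requires. Only the PP7 SP1 requirement
# is stated; "or later" is moot on IA64 because nothing newer exists.
SFS_REQUIRED = {
    "PP7 SP1": "SFS2 SP1",
}

# Assumption: only the SFS2 SP1 / SLES11 SP1 IA64 pairing is recorded here,
# since there are no IA64 SFS2 builds after SLES11 SP1.
SFS_IA64 = {
    "SFS2 SP1": "SLES11 SP1",
}


def newest_stack() -> Optional[Tuple[str, str, str]]:
    """Return (sles, sfs, propack) for the newest SLES with a usable ProPack."""
    for sles in reversed(SLES_RELEASES):
        for propack, target in PROPACK_FOR_SLES.items():
            if target != sles:
                continue
            sfs = SFS_REQUIRED.get(propack)
            # Skip ProPacks whose SFS requirement is unknown here
            # or has no IA64 build for this SLES release.
            if sfs is None or SFS_IA64.get(sfs) != sles:
                continue
            return sles, sfs, propack
    return None


print(newest_stack())  # prints ('SLES11 SP1', 'SFS2 SP1', 'PP7 SP1')
```

It lands on the same combination as the list above. If newer IA64 builds ever turned up, you would only need to add rows to the tables.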

If you have SLES11 SP2, go for it - I doubt you'll really miss out on much. Maybe one of these days I'll report back on some amazing thing that ProPack 7 does for me, and in that case you can consider downgrading. I really just wanted to document how I decided to stop at SLES11 SP1.
Hi smj,

I guess you've had enough time to test by now... are there any good reasons for Altix owners to go for SLES 11 SP1 + SGI Foundation Software 2 SP1 + ProPack 7?
:Indy: :Indigo2: :O2: :O2+: :320: :Octane: :Octane2: :Fuel: :O3x02L: :A3502L: