SGI: Hardware

Dead Fuel ?

Hi folks,

I've acquired another Fuel; this one has a 500MHz PIMM, 2GB RAM and, of course, just a V10.

A couple of things I noticed when I first powered it up:

The L1 didn't echo any commands; it would just respond "ok" or print the expected output, i.e. there was no usual 001a01-L1> prompt.

Nothing shows up on serial port 1; everything goes to the L1 port, so after booting I had to press CTRL-D to get into the PROM, which is fine by me.

I noticed that the PROM doesn't really hold settings (at least not all of them): netaddr would survive a soft reset, but srvaddr, tapedevice, fxaddr and notape (all the standard ones for installation) wouldn't, even though I set all of them with -p.
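To pin down exactly which variables fail to persist, one option is to capture the PROM's printenv output from the serial log before and after a reset and compare the two. The snippet below is only a rough sketch: it assumes the captures are plain name=value lines, and the sample values and the helper names (parse_printenv, lost_settings) are made up for illustration.

```python
def parse_printenv(text):
    """Parse captured name=value lines into a dict (assumed capture format)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip prompts, blank lines and anything else
        name, _, value = line.partition("=")
        env[name.strip()] = value.strip()
    return env


def lost_settings(before, after):
    """Map each variable that changed or vanished to (old value, new value)."""
    changed = {}
    for name, value in before.items():
        if after.get(name) != value:
            changed[name] = (value, after.get(name))
    return changed


# Illustrative captures only, not real output from this machine.
BEFORE = """\
netaddr=192.168.4.45
srvaddr=192.168.4.100
notape=1
console=d
"""

AFTER = """\
netaddr=192.168.4.45
console=g
"""

lost = lost_settings(parse_printenv(BEFORE), parse_printenv(AFTER))
for name in sorted(lost):
    old, new = lost[name]
    print(f"{name}: {old!r} -> {new!r}")
```

A value of None in the second position means the variable was missing entirely after the reset, as opposed to having reverted to a different value.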

I was able to start FX remotely, partition my drive etc., but when I then started SASH and typed "install", it threw a TLB error at me:

Code:
A 000: *** TLB Refill Exception on node 0
A 000: *** EPC: 0xc00000001fc44de4 (0xc00000001fc44de4)
A 000: *** Press ENTER to continue.


(This may not be exactly the address I got; my scrollback had already scrolled past it, so I borrowed the output from someone else.)

Well, I thought, maybe that's because my console was set to g, so I changed it to d (it wouldn't survive a reboot) and, just in case, plugged in a keyboard and mouse; who knows, maybe it's as picky as an O2.

That didn't help.

So then I went into the L1 again, reset the NVRAM, checked flash status etc. I finally got the good ol' L1 prompt back, but now it won't boot anymore; all I get is this:

Code:
escaping to L1 system controller
001a01-L1>power up

returning to console mode  001a01 console, <CTRL_T> to escape to L1
Starting PROM Boot process


IP35 PROM SGI Version 6.180  built 01:50:56 PM Nov 18, 2003
Running in DDR mode
Testing/Initializing memory ...............             DONE
Copying PROM code to memory ...............             DONE
Discovering local IO ......................             DONE
Discovering NUMAlink connectivity .........
Local hub NUMAlink is down.
*** Local network link down
DONE
Found 1 objects (1 hubs, 0 routers) in 5895 usec
Waiting for peers to complete discovery....             DONE
No other nodes present; becoming global master
Global master is /hw/rack/001/bay/01
Intializing any CPUless nodes..............             DONE
Checking partitioning information .........             DONE
No other nodes present; becoming partition master
Loading BASEIO prom .......................             DONE

BASEIO PROM Monitor SGI Version 6.180  built 01:47:37 PM Nov 18, 2003 (BE64)
1 CPUs on 1 nodes found.

NVRAM checksum is incorrect: reinitializing.
Automatic update of PROM environment disabled

PS/2 Keyboard & Mouse diagnostics
Found mouse on port 0
Found keyboard on port 1
PS/2 Keyboard & Mouse diagnostics passed

Graphics diagnostics

Odyssey board #0 found on nasid 0
Running Odyssey xtalk sanity diag...
Board version 1 - Buzz revision 2B
On board sdram size: 32 Mb
Cas latency: CAS 3
2 banks by sdram module
Running Odyssey Buzz registers diag...
Device passed diagnostics

Installing PROM Device drivers ............
Base I/O Ethernet set to /dev/ethernet/ef0
Installing Graphics Console...
graphics install: searching for pipe 0

Walking SCSI Adapter 0, (pci id 1)
1- 2- 3- 4- 5- 6- 7- 8- 9- 10- 11- 12- 13- 14- 15- = 0 device(s)


Walking SCSI Adapter 1, (pci id 1)
1- 2- 3- 4- 5- 6- 7- 8- 9- 10- 11- 12- 13- 14- 15- = 0 device(s)

Initializing PROM Device drivers ..........             DONE



I believe it should start going into hardware discovery - but it freezes at this point...

Some info from L1:

Code:
001a01-L1>flash status
Flash image B currently booted

Image      Status        Revision    Built
-----   -------------   ----------   -----
A     valid           1.22.6       08/07/2003 12:56:31
B     user default    1.22.6       08/07/2003 12:56:31
001a01-L1>


001a01-L1>serial all

Data                            Location      Value
------------------------------  ------------  --------
Local System Serial Number      EEPROM        08:00:69:10:26:C9
Local Brick Serial Number       EEPROM        MZL370
Reference Brick Serial Number   NVRAM         MZL370


EEPROM      Product Name    Serial      Part Number           Rev  T/W
----------  --------------  ----------  --------------------  ---  ------
NODE        IP34            MZL370      030_1707_003          M    00
MAC         MAC ADDRESS     NA          NA                    NA   NA
PIMM        IP34PIMM        MEF168      030_1708_002          J    00
XIO         ASTODYB         MFB185      030_1725_001          F    00

EEPROM     JEDEC-SPD Info           Part Number        Rev Speed  SGI
---------- ------------------------ ------------------ ---- ------ --------
DIMM 0     CE0000000000000027F30900 M3 47L6510BT0-CA0   0B   10.0  N/A
DIMM 2     CE0000000000000028C4B801 M3 47L6510BT0-CA0   0B   10.0  N/A
DIMM 1     CE0000000000000027FA0900 M3 47L6510BT0-CA0   0B   10.0  N/A
DIMM 3     CE0000000000000028C2B801 M3 47L6510BT0-CA0   0B   10.0  N/A

001a01-L1>



001a01-L1>env
Environmental monitoring is enabled and running.

Description     State    Warning Limits     Fault Limits      Current
--------------  -------  -----------------  ----------------- -------
12V             Enabled  10%  10.80/ 13.20  20%   9.60/ 14.40   12.06
12V IO          Enabled  10%  10.80/ 13.20  20%   9.60/ 14.40   12.12
5V              Enabled  10%   4.50/  5.50  20%   4.00/  6.00    5.07
3.3V            Enabled  10%   2.97/  3.63  20%   2.64/  3.96    3.32
2.5V            Enabled  10%   2.25/  2.75  20%   2.00/  3.00    2.46
1.5V            Enabled  10%   1.35/  1.65  20%   1.20/  1.80    1.47
5V aux          Enabled  10%   4.50/  5.50  20%   4.00/  6.00    5.10
3.3V aux        Enabled  10%   2.97/  3.63  20%   2.64/  3.96    3.27
PIMM0 12V bias  Enabled  10%  10.80/ 13.20  20%   9.60/ 14.40   12.12
Fuel SRAM       Enabled  10%   2.25/  2.75  20%   2.00/  3.00    2.54
Fuel CPU        Enabled  10%   1.44/  1.76  20%   1.28/  1.92    1.61
PIMM0 1.5V      Enabled  10%   1.35/  1.65  20%   1.20/  1.80    1.49
PIMM0 3.3V aux  Enabled  10%   2.97/  3.63  20%   2.64/  3.96    3.27
PIMM0 5V aux    Enabled  10%   4.50/  5.50  20%   4.00/  6.00    5.10
XIO 12V bias    Enabled  10%  10.80/ 13.20  20%   9.60/ 14.40   12.00
XIO 5V          Enabled  10%   4.50/  5.50  20%   4.00/  6.00    5.07
XIO 2.5V        Enabled  10%   2.25/  2.75  20%   2.00/  3.00    2.47
XIO 3.3V aux    Enabled  10%   2.97/  3.63  20%   2.64/  3.96    3.30

Description       State    Warning RPM  Current RPM
--------------    -------  -----------  -----------
FAN 0  EXHAUST    Enabled          920         1357
FAN 1       HD    Enabled         1560         2671
FAN 2      PCI    Enabled         1120         1790
FAN 3    XIO 1    Enabled         1600         2200
FAN 4    XIO 2    Enabled         1600         2157
FAN 5       PS    Enabled         1600         2531

                            Advisory   Critical   Fault      Current
Description       State     Temp       Temp       Temp       Temp
--------------    -------   --------   --------   --------   --------
NODE 0            Enabled   60C/140F   65C/149F   70C/158F   46C/114F
NODE 1            Enabled   60C/140F   65C/149F   70C/158F   44C/111F
NODE 2            Enabled   60C/140F   65C/149F   70C/158F   29C/ 84F
PIMM              Enabled   60C/140F   65C/149F   70C/158F   60C/140F
ODYSSEY           Enabled   60C/140F   65C/149F   70C/158F   35C/ 95F
BEDROCK           Enabled   70C/158F   75C/167F   80C/176F   54C/129F

001a01-L1>
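
One thing that stands out in the temperature readings: the PIMM reports 60C, which is exactly its advisory limit. Whether that has anything to do with the hang is unknown. As a rough sketch, assuming the row layout shown in the L1 env output above, the margin of each sensor below its advisory limit can be pulled out like this (the regex and the function name are illustrative, not an SGI tool):

```python
import re

# One temperature row: name, state, then advisory/critical/fault/current as C/F pairs.
ROW = re.compile(
    r"^(?P<name>\S.*?)\s+Enabled\s+"
    r"(?P<adv>\d+)C/\s*\d+F\s+"
    r"(?P<crit>\d+)C/\s*\d+F\s+"
    r"(?P<fault>\d+)C/\s*\d+F\s+"
    r"(?P<cur>\d+)C/\s*\d+F\s*$"
)


def temperature_margins(env_text):
    """Return {sensor: advisory limit minus current reading, in degrees C}."""
    margins = {}
    for line in env_text.splitlines():
        m = ROW.match(line.strip())
        if m:
            margins[m["name"]] = int(m["adv"]) - int(m["cur"])
    return margins


# Rows copied from the env output posted above.
SAMPLE = """\
NODE 0            Enabled   60C/140F   65C/149F   70C/158F   46C/114F
NODE 1            Enabled   60C/140F   65C/149F   70C/158F   44C/111F
NODE 2            Enabled   60C/140F   65C/149F   70C/158F   29C/ 84F
PIMM              Enabled   60C/140F   65C/149F   70C/158F   60C/140F
ODYSSEY           Enabled   60C/140F   65C/149F   70C/158F   35C/ 95F
BEDROCK           Enabled   70C/158F   75C/167F   80C/176F   54C/129F
"""

margins = temperature_margins(SAMPLE)
at_limit = [name for name, margin in margins.items() if margin <= 0]
print(at_limit)
```

With the rows above, only the PIMM ends up with a margin of zero; every other sensor is at least 14C below its advisory limit.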





I did notice it says the NVRAM checksum is incorrect; I wonder how much that is related...

Any ideas ?

I do have another Fuel, and I might be able to try swapping the timekeeper chips (Dallas and ST), but I wonder if anything else could be done to find out what's going on...

Cheers

_________________
[click for links to hinv] JP: [ :O200: :Fuel: :Octane2: :Octane: :O2: :Indy: :Indy: ] PL: [ :Fuel: :O2: :O2+: :Indy: ]
For Sale: 2*O200 M/B, 2*O200 PSU, 6*256MB O200 RAM, 2*O200 SCSI Backplane, 2*O200 MSC, DMediaPro DM-2 ( 030-1653-002 Rev. H , XT-DIGVID) with Octane XIO pull (Origin pull optionally available)
kubatyszko wrote:
Code:
NVRAM checksum is incorrect: reinitializing.
Automatic update of PROM environment disabled

PS/2 Keyboard & Mouse diagnostics
Found mouse on port 0
Found keyboard on port 1
PS/2 Keyboard & Mouse diagnostics passed

With a corrupt NVRAM, the 'console' variable could be garbage. I would remove the keyboard & mouse to force it to serial. Then get into the PROM console and do a 'resetenv', 'update' and 'reset'.

_________________
Now this is a deep dark secret, so everybody keep it quiet :)
It turns out that when reset, the WD33C93 defaults to a SCSI ID of 0, and it was simpler to leave it that way... -- Dave Olson, in comp.sys.sgi

Currently in commercial service: Image :Onyx2: (2x) :O3x02L:
In the museum : almost every MIPS/IRIX system.
Wanted : GM1 board for Professional Series GT graphics (030-0076-003, 030-0076-004)
Hmm, interesting: after unplugging the keyboard and mouse it goes through (but it worked with them plugged in until I reset the L1)...

So I'm getting the same TLB problem:

Code:
>> setenv netaddr 192.168.4.45
>> setenv srvaddr 192.168.4.100
>> setenv notape 1
>> setenv tapedevice bootp()192.168.4.100:/IRIX/IRIX6.5.30/cd1/dist/sa
>> boot -f $tapedevice(sash64)
Obtaining /IRIX/IRIX6.5.30/cd1/dist/sa(sash64) from server 192.168.4.100
896+111764+16853+3848 entry: 0xa8000000012a6ee4
Standalone Shell SGI Version 6.5 ARCS   Jul 20, 2006 (64 Bit)
sash: install
A 000 001c01:
A 000 001c01: *** TLB Refill Exception on node 0
A 000 001c01: *** EPC: 0xa80000000129c874 (0xa80000000129c874)
A 000 001c01: *** Press ENTER to continue.
A 000 001c01: POD SysCt Unc> why
A 000 001c01:  EPC    : 0xa80000000129c874 (0xa80000000129c874)
A 000 001c01:  ERREPC : 0x28142aebadf406f6 (0x28142aebadf406f6)
A 000 001c01:  CACERR : 0x000000009e044ac6
A 000 001c01:  Status : 0x0000000024407c80
A 000 001c01:  BadVA  : 0x0000000000000038 (+0x38)
A 000 001c01:  RA     : 0xa80000000129a49c (0xa80000000129a49c)
A 000 001c01:  SP     : 0xa8000000012fe870
A 000 001c01:  A0     : 0xa8000000012a9988
A 000 001c01:  Cause  : 0x0000000000008008 (INT:8------- <Load TLB Miss>)
A 000 001c01:  Reason : 244 (Unexpected TLB Refill Exception.)
A 000 001c01:  POD mode was called from: 0xc00000001fc01de8 (0xc00000001fc01de8)
A 000 001c01: POD SysCt Unc>


I wonder if this could be bad RAM (but it does pass the initial test).
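
As a rough aid to reading the dump above, here is a small sketch that decodes the Cause value and classifies the addresses by their top two bits, following the standard 64-bit MIPS layout. The function names are illustrative. A BadVA of 0x38 falls in the mapped user segment, which would explain why the access went through the TLB at all; such a small address often means a load through a null pointer plus a field offset, but that is only a guess and says nothing yet about whether the hardware or the NVRAM is at fault.

```python
# Standard MIPS exception codes (Cause bits 6..2); only the first few are listed.
EXC_CODES = {
    0: "Int", 1: "Mod", 2: "TLBL", 3: "TLBS",
    4: "AdEL", 5: "AdES", 6: "IBE", 7: "DBE",
}

# Top two address bits select the 64-bit segment. The 32-bit compatibility
# segments at the very top of the range are lumped in with XKSEG here.
SEGMENTS = {0: "XUSEG", 1: "XSSEG", 2: "XKPHYS", 3: "XKSEG/compat"}


def decode_cause(cause):
    """Return (exception mnemonic, list of pending interrupt line numbers)."""
    code = (cause >> 2) & 0x1F
    pending = [i for i in range(8) if (cause >> (8 + i)) & 1]
    return EXC_CODES.get(code, f"code {code}"), pending


def segment(va):
    """Classify a 64-bit virtual address by its top two bits."""
    return SEGMENTS[(va >> 62) & 0x3]


# Values copied from the POD "why" output above.
print(decode_cause(0x0000000000008008))
print("BadVA:", segment(0x0000000000000038))
print("EPC:  ", segment(0xA80000000129C874))
print("POD:  ", segment(0xC00000001FC01DE8))
```

TLBL is the code for a TLB miss on a load or instruction fetch, which matches the "Load TLB Miss" text the PROM printed. The EPC sits in XKPHYS, the unmapped window onto physical memory, so the faulting instruction itself did not need the TLB; only the data access at BadVA did.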

EDIT: I tried using only the first bank of RAM, and then only the second bank: no luck. I haven't tried other RAM modules yet, though.

EDIT 2: I tried other (working) RAM and another PIMM (also working, 600MHz): no luck. It must be the motherboard. Now the question is whether something is really broken, or whether it's related to the bad NVRAM...

EDIT 3: Swapping the Dallas chip didn't help (still seeing that NVRAM checksum error). I'll try swapping the ST tomorrow, if that even makes sense; I can't recall which chip is the NVRAM, or whether it's even replaceable.


Thanks

_________________
Did you resetenv and verify all PROM environment variables are sane? Can't recall about IP35 and ilk, but IP12/20 doesn't reset all environment vars with resetenv, so a manual check is necessary.

_________________
Damn the torpedoes, full speed ahead!

Systems available for remote access on request.

:Indigo: :Octane: :Indigo2: :Indigo2IMP: :Indy: :PI: :O200: :ChallengeL: :O2000R: (single-CM)
I did resetenv a couple of times and couldn't see anything suspicious in any of the variables. Better yet, the ones I set disappear (I use -p, of course), including console=d resetting to g at every reboot or so.
I'm far away to the west of Tokyo without access to the machine, so I won't be pursuing this for a couple more weeks.
