The collected works of ClarusWorks

I'm looking for at least one, preferably two disk caddies for an Octane. I'm located in Pennsylvania, USA. I could offer money, or I also have an Indigo keyboard (somewhat yellowed and dusty) that's missing a cable that I'd be willing to trade.
I have a Fuel that won't power up. Pressing the power button does nothing, the light bar never flashes, etc. I assumed it was a bad power supply at first, but now I am not so sure. It has the 430 watt NMB power supply, and when plugged in, the 5V standby rail is on and reasonably close to +5V. For further testing I removed it from the Fuel, plugged in a couple of old hard drives I keep around for load-testing PSUs, and connected the PS_ON pin of the WTX 24 pin connector to ground. All of the hard drives spun up and stayed running as long as PS_ON was grounded, which makes me think that at least the +5V and +12V rails are OK.

I've tried connecting a serial cable to the internal L1 port at 38400 8N1 and see no signs of life after hitting CTRL+T several times with or without a null modem adapter. Any ideas on what to check next?
I tried connecting to the L1 via the internal serial port with a terminal emulator set to 38400 8,N,1 and didn't get any output. Should the L1 be up and running as long as +5V standby is up?
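For anyone who wants to script the same check instead of poking at a terminal emulator, here is a minimal sketch. It assumes the third-party pyserial package is installed, and the port name in the usage note is hypothetical. It only does what is described above: open the port at 38400 8N1, send CTRL+T a few times, and report whatever comes back.

```python
def ctrl(ch):
    """Return the control byte for a letter; CTRL+T is 0x14."""
    return bytes([ord(ch.upper()) & 0x1F])


def probe_l1(port, attempts=5, timeout=1.0):
    """Send CTRL+T up to `attempts` times and return the first reply, or b''."""
    # pyserial is imported lazily so ctrl() stays usable without it.
    import serial

    with serial.Serial(
        port,
        baudrate=38400,
        bytesize=serial.EIGHTBITS,
        parity=serial.PARITY_NONE,
        stopbits=serial.STOPBITS_ONE,
        timeout=timeout,
    ) as link:
        for _ in range(attempts):
            link.write(ctrl("T"))
            reply = link.read(256)  # returns early on timeout
            if reply:
                return reply
    return b""
```

Usage would be something like `probe_l1("/dev/ttyUSB0")`. An empty result only says nothing answered; it can't tell a dead L1 from a wiring or null modem problem.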

I don't have any spare Fuel graphics cards lying around, but I removed the card that was installed, and the machine still will not power on.
hamei wrote:
ClarusWorks wrote: I removed the card that was installed, and the machine still will not power on.

A fuel will definitely not power up without a working graphics card installed. btdt.

Thanks for that bit of info. Even if the main CPU won't power on, will the L1 come up without a graphics card? It'd be useful to get some signs of life and maybe some debug info out of this motherboard, otherwise I'm trying to troubleshoot a large multilayer board I have no schematics for with a multimeter and luck.

As for the PSU issues, I've read into those and my understanding is that the only differences between a Fuel PSU and a standard ATX PSU of appropriate current ratings are the pinout and 2 pins used for fan monitoring. The pinouts I've seen show only one thing that could be a power-on signal: PS_ON, pin 23. This is the pin I connected to GND when testing the PSU with the junk hard drives. The threads I've read about Fuel PSUs seem to imply that if you just rewire a regular ATX PSU correctly, it will work with a Fuel if you disable environmental monitoring. I do have another PSU on the way; once I get it I will try rewiring it and see if that does anything.

I also think I remember at least one person who managed to fake out the fan speed signal with a 555 timer. If I can get this motherboard working I'd be happy to look at those fan signals with an oscilloscope and try to solve this mystery (lots of modern ATX power supplies do have a fan speed signal they feed back to the motherboard, it's possible that signal would work, or might work with some extra electronics added - usually the tachometer pulse off of a fan needs to be pulled up to some reference voltage and cleaned up a bit before something that wants to see square waves will like it).
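To put rough numbers on the 555 idea, here is a small sketch. It assumes the usual PC fan convention of 2 tach pulses per revolution and the textbook 555 astable approximation f = 1.44 / ((R1 + 2*R2) * C). What frequency the Fuel actually expects on its fan pins is unknown, so the 3000 RPM target and the component values below are purely illustrative.

```python
def tach_hz(rpm, pulses_per_rev=2):
    """Tach frequency in Hz for a fan spinning at `rpm`."""
    return rpm * pulses_per_rev / 60.0


def astable_555_hz(r1_ohms, r2_ohms, c_farads):
    """Approximate output frequency of a 555 in astable mode."""
    return 1.44 / ((r1_ohms + 2 * r2_ohms) * c_farads)


def r2_for(target_hz, r1_ohms, c_farads):
    """Solve the astable formula for R2, given a target frequency, R1 and C."""
    return (1.44 / (target_hz * c_farads) - r1_ohms) / 2


# Illustrative values only: a 3000 RPM fan, R1 = 1 kOhm, C = 1 uF.
target = tach_hz(3000)
r2 = r2_for(target, 1000.0, 1e-6)
print(f"target {target:.0f} Hz, R2 about {r2:.0f} ohms")
```

This only gets the frequency in the right range. Whether the board also cares about duty cycle or signal levels is exactly what a scope on a working machine would have to answer.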
The replacement PSU I was intending to use, assuming this is actually a PSU failure, is a Zippy/Emacs PSM-6600P ATX server PSU (you've probably never heard of the name, but they build reliable stuff with good caps, etc. as opposed to the $30 '1000 Watt' PSUs that have tons of ripple and fuses that look suspiciously like jumper wire). It has the following ratings:
5V @ 30A
12V @ 26A
12V @ 20A
-12V @ 0.8A
3.3V @ 30A
5V Standby @ 2A
Combined 5V and 3.3V max is 50A
Combined 12V max is 40A

This is higher than the 430 watt NMB on everything except the 3.3V rail, which is 45A on the NMB 430W unit, but only 27A on the 460 watt Sparkle PSU or 24A on the 460 watt NMB that later Fuels used. Interestingly, the later 460 watt NMB actually has lower max ratings on both the +5V and +3.3V rails, and only marginally more +12V (18 and 18 vs 16 and 18). I'm guessing the combined max of +5V and +3.3V is higher; maybe the early NMB power supplies liked to cook themselves due to overcurrent on those rails.
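The comparison is easy to redo as arithmetic. This is only a sketch using the figures quoted in this post; the dictionary names are made up for illustration, and I have not checked the stock supply ratings against the labels.

```python
# Zippy/Emacs PSM-6600P ratings as listed above: rail -> (volts, amps).
ZIPPY = {
    "+5V": (5.0, 30.0),
    "+12V (1)": (12.0, 26.0),
    "+12V (2)": (12.0, 20.0),
    "-12V": (12.0, 0.8),
    "+3.3V": (3.3, 30.0),
    "+5V standby": (5.0, 2.0),
}
COMBINED_12V_MAX_AMPS = 40.0

# 3.3V current ratings quoted above for the stock Fuel supplies.
STOCK_3V3_AMPS = {"NMB 430W": 45.0, "Sparkle 460W": 27.0, "NMB 460W": 24.0}


def rail_watts(ratings):
    """Per-rail maximum power in watts (volts times amps)."""
    return {name: volts * amps for name, (volts, amps) in ratings.items()}


def covers_3v3(replacement_amps, stock):
    """For each stock PSU, does the replacement match or beat its 3.3V rating?"""
    return {name: replacement_amps >= amps for name, amps in stock.items()}


for name, watts in rail_watts(ZIPPY).items():
    print(f"{name:12s} {watts:6.1f} W")
print("combined 12V limit:", 12.0 * COMBINED_12V_MAX_AMPS, "W")
print(covers_3v3(ZIPPY["+3.3V"][1], STOCK_3V3_AMPS))
```

The per-rail numbers overstate what the supply can deliver at once, since the combined limits cap the total.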
I'm suspecting there's something screwy going on here: Today I was doing some probing around the L1, was originally seeing the 3.3V supply to the L1 PROM low (1.6V), but then I went to probe a second time, and now I'm getting 3.3V and the L1 is alive. I can now confirm that a Fuel L1 will in fact come up with no CPU or graphics card attached :-)

ALERT: Unknown PSC: 15
INFO: Cannot disable power supply: 17

SGI SN1 L1 Controller
Firmware Image B: Rev. 1.12.6, Built 04/22/2002 08:13:40

ERROR: command not found.
Commands are:
* autopower|apwr junkbus|jb|bedrock brick
partdb cpu nia|ni|ctc nib
iia|ii|cti iib config|cfg debug
display|dsp env fan help|hlp
history|hist l1dbg link log
ioport|ioprt istat l1 leds
margin|mgn network pimm port|prt
power|pwr reset|rst nmi softreset|softrst
select|sel serial eeprom uart
usb verbose router|rtr date
nvram security nextgen flash
reboot_l1 version|ver pbay test|tst
scan pci
enter 'hlp <cmd>' for more help on a single command.
001a01-L1>serial all

Data                           Location     Value
------------------------------ ------------ --------
Local System Serial Number     EEPROM       08:00:69:10:48:94
Local Brick Serial Number      EEPROM       MLA346
Reference Brick Serial Number  NVRAM        MLA346

EEPROM     Product Name   Serial     Part Number          Rev T/W
---------- -------------- ---------- -------------------- --- ------
NODE       IP34           MLA346     030_1707_003         H   00
PIMM       no hardware detected
XIO        no hardware detected

EEPROM     JEDEC Info               Part Number        Rev Speed (ns)
---------- ------------------------ ------------------ --- ----------
DIMM 0     CE0000000000000028260D00 M3 47L6510BT1-CA0  0B  10.0
DIMM 2     no hardware detected
DIMM 1     CE0000000000000028130A00 M3 47L6510BT1-CA0  0B  10.0
DIMM 3     no hardware detected

CPU A: 0xff: Console poll found data for reading
CPU B: 0x00: PLED_RESET: Slave loop (0x00/0x45=okay, solid 0x00=possibly hung)
CPU C: 0x00: PLED_RESET: Slave loop (0x00/0x45=okay, solid 0x00=possibly hung)
Supply          State Voltage    Margin  Value
--------------  ----- ---------  ------- -----
12V             off   5.563V     N/A
12V IO          NC    0.000V     N/A
5V              NC    0.676V     N/A
3.3V            NC    1.531V     default 0
2.5V            off   1.157V     default 0
1.5V            NC    1.255V     default 0
5V aux          NC    2.314V     N/A
3.3V aux        NC    0.000V     N/A
PIMM0 12V bias  <not present>
unknown         NC    ERROR (-204) default ERROR (-207)
Asterix CPU     <not present>
PIMM0 1.5V      <not present>
PIMM0 3.3V aux  <not present>
PIMM0 5V aux    <not present>
XIO 12V bias    <not present>
XIO 5V          <not present>
XIO 2.5V        <not present>
XIO 3.3V aux    <not present>
nyef wrote:
hamei wrote: The power supplies don't generally cook themselves. The stupid little interface chip dies and then you can't tell it to turn on.

This is a bit of a crazy question, but... Do we know, or can we find out enough about the "stupid little interface chip" to be able to produce a replacement or some other workaround in order to revive the power supply instead of hacking in an ATX supply as a replacement?

I don't think there is an "interface chip". The ATX supply mods, with the exception of the mysterious fan signals, appear to just be swapping wires around on the 24 pin and 8 pin connectors. The chip that some people seem to have had fail in their NMB Fuel PSUs is a PIC microcontroller that the mystery fan signals, as well as some other stuff, are connected to. Actually powering on the PSU is just a matter of pulling the power on pin low, no chips involved. It's possible that the fan signal is somehow special; it's also possible that all the PIC is doing is filtering a fan tachometer signal from the mechanical fan itself, since those signals tend to be extremely dirty by default.
The power on pin is a normal ATX power supply thing. Your standard PC PSU has it as well, just in a different location. If that wasn't the case, Kubatysko's ATX mod would not work at all. Faking out the tach signal is only there to make environmental monitoring happy. My idea would be, instead of faking it, to use a PSU with a 3 wire fan (third wire is the tach) and run that signal into the tach pin on the 24 pin connector.
So, here's a quick update on where I am after looking at the machine again tonight:

Sometimes the L1 comes up, sometimes it does not. Unplugging/replugging the AC cord from the power supply 2 or 3 times will eventually get it to come up. Possible strike against the power supply there.

When the L1 comes up, it can read the serial numbers on the motherboard and PIMM but not the V10 card; XIO reports "No hardware detected". If I try to power the machine up with environmental monitoring on, I get some possibly bogus temperature and voltage warnings:

SGI SN1 L1 Controller
Firmware Image A: Rev. 1.12.6, Built 04/22/2002 08:13:40

001a01-L1>pwr up
001a01 ATTN: 12V low fault limit reached  5.563V.

001a01 ATTN: brick auto power down in 30 seconds

001a01 ATTN: brick auto power down in 25 seconds
env off
001a01 ATTN: power down aborted, environmental monitor reset

001a01 ATTN: NODE 0 fault temperature reached @ 89 C/ 192 F.

If I disable environmental monitoring, the machine will power up, and one of two things happens: If the V10 card is installed, the light bar stays on solid red. If the V10 card is not installed, the light bar turns red for a few seconds, and then just keeps flashing white forever.

I tried running some of the tests in the L1, and think there might be some bad DS1780s on this board (which could explain the crazy temperature readings and low 12V warning). It also looks like I may want to invest in a Dallas chip, or apply a Dremel tool and CR2032 holder to this one.

02/07/06 06:28:15 checksum Error - common header initialized
02/07/06 06:28:15 nvram checksum error - initializing core data.
02/07/06 06:28:15 nvram checksum error - initializing extended data.
02/07/06 06:28:15 nvram checksum error - log pointers invalid, log pointers rest
02/07/06 06:28:15 L1 booting...
02/07/06 06:28:15 NVRAM doesn't match EEPROM
02/07/06 06:28:15 USB0: waiting on open
02/07/06 06:28:15 checking power status
02/07/06 06:28:15 power up (COMMAND)
02/07/06 06:28:15 12V low fault limit reached  5.563V.
02/07/06 06:28:15 power down (COMMAND)
02/07/06 06:28:15 NODE 0 fault temperature reached @ 89 C/ 192 F.
02/07/06 06:28:15 checking power status
02/07/06 06:28:15 power up (COMMAND)
02/07/06 06:28:15 power down (PANEL)
001a01-L1>ERROR: Reset error: scan error - unable to reset scan hardware

Supply          State Voltage    Margin  Value
--------------  ----- ---------  ------- -----
12V             on    5.563V     N/A
12V IO          NC    12.188V    N/A
5V              NC    5.096V     N/A
3.3V            NC    1.531V     default 0
2.5V            on    1.157V     default 0
1.5V            NC    1.255V     default 0
5V aux          NC    2.314V     N/A
3.3V aux        NC    0.000V     N/A
PIMM0 12V bias  NC    5.813V     N/A
unknown         NC    1.209V     default 0
Asterix CPU     on    1.311V     default 93
PIMM0 1.5V      NC    1.311V     default 0
PIMM0 3.3V aux  NC    1.600V     N/A
PIMM0 5V aux    NC    2.418V     N/A
XIO 12V bias    <not present>
XIO 5V          <not present>
XIO 2.5V        <not present>
XIO 3.3V aux    <not present>
001a01-L1>tst ioexp get all
0:      BUS SELECT I/O Expander: 0x78
1:  MARGIN CONTROL I/O Expander: 0x00
2:        LED0 LOW I/O Expander: 0x05
3:  ASIC CONTROL 1 I/O Expander: 0x7b
4:  ASIC CONTROL 2 I/O Expander: 0x33
5:    POWER ENABLE I/O Expander: 0xfa
6:      XB CONTROL I/O Expander: 0xb7
7:    PIMM CONTROL I/O Expander: 0x30
8:     ODYSSEY GFX I/O Expander: not present
9: DISPLAY CONTROL I/O Expander: 0x17
001a01-L1>tst i2c
NODE 0 DS1780 Company ID miscompare, bus: 2 addr: 2c act: 59 ea
PIMM DS1780 Company ID miscompare, bus: 4 addr: 2e act: 5d exp: da

error -1 on tst.
001a01-L1>tst i2c
NODE 0 DS1780 Company ID miscompare, bus: 2 addr: 2c act: 59 ea
PIMM DS1780 Company ID miscompare, bus: 4 addr: 2e act: 5d exp: da

error -1 on tst.
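
To see at a glance which readings in the powered-up supply table look off, here is a rough sketch that parses the rows and compares each reading against the nominal voltage in the rail's name. The 10% tolerance is my own assumption, not an SGI limit, and only the main rails are included since the other rows don't all carry an obvious nominal value.

```python
import re

# Main-rail rows copied from the powered-up supply table above.
TABLE = """\
12V             on    5.563V
12V IO          NC    12.188V
5V              NC    5.096V
3.3V            NC    1.531V
2.5V            on    1.157V
1.5V            NC    1.255V
5V aux          NC    2.314V
3.3V aux        NC    0.000V
"""

ROW = re.compile(r"^(?P<name>.+?)\s+(?P<state>on|off|NC)\s+(?P<volts>\d+\.\d+)V")
NOMINAL = re.compile(r"(\d+(?:\.\d+)?)V")


def check_rails(table, tolerance=0.10):
    """Return (name, nominal, measured) for rails outside +/- tolerance."""
    bad = []
    for line in table.splitlines():
        row = ROW.match(line)
        if not row:
            continue
        nominal_match = NOMINAL.search(row["name"])
        if not nominal_match:
            continue  # no nominal voltage in the rail name
        nominal = float(nominal_match.group(1))
        measured = float(row["volts"])
        if abs(measured - nominal) > tolerance * nominal:
            bad.append((row["name"], nominal, measured))
    return bad


for name, nominal, measured in check_rails(TABLE):
    print(f"{name}: expected about {nominal}V, read {measured}V")
```

With these rows, only 12V IO and 5V come out within tolerance. Since the L1 is clearly running while "5V aux" reads 2.314V, that may point at the monitoring readings rather than the rails themselves, which would fit the DS1780 miscompare errors.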

At this point, I think what I'm planning to do is go ahead and convert an ATX PSU since I already have one on the way. If the board still acts crazy after that, I'll procure a motherboard from somewhere and hope my PIMM and memory are good. I have no idea if the V10 card is broken, or if there's an I2C bus issue preventing the L1 from seeing it.