SGI: Hardware

dead O200

hi!

i bought an O200 a long time ago, but it is dead. it does nothing, it only makes a lot of noise :D i had some time and soldered an MSC cable. with this i am able to access the MSC and it works, but the O200 outputs nothing on the serial port... what could the problem be? i wanted to buy a carrier logic board, but i could not find any in Europe for weeks, then i stopped searching for it. my boards are: 030-1025-002 (1M cache support), 013-1896-001 (dual R10K/180/1M) and 030-1235-001 (PCI board). i suspect the carrier logic is dead. now i found one for sale: 030-1389-001 (it is a 4M cache support board). would it work with dual 1M CPUs and the PCI board? what else should i do/look for? maybe it's working... or am i just too n00b? :P

thanks!
Richard
It might be best to find a processor/main board combo. I killed one of my O200s trying to configure it to use a different CPU. Once you have the wrong values in and reboot it's pretty much game over. I have the same symptoms as you.

If you buy the board w/o a CPU you'll need the same CPU the board was originally "flashed" for.
thank you!

unfortunately i could not find a combo (motherboard + cpu) for sale, so i have to keep looking. is there any documentation which describes which cpu goes with which motherboard for the O200? i don't know O200s: does the motherboard clock the cpu or do the cpus clock themselves (and where is this info held, NVRAM?) i bought a complete system that used to work at some point, but no one knows what happened to it. it could be that the ram modules are faulty, is there a way to test them in other computers? i have an octane1/r12k too (it works well :D ). my friend has an r5k/200 o2. i have access to some SUNs too...

regards,
Richard
Not sure exactly how it works, but the CPU clock and some other values are stored in flash memory on the logic carrier. If the values don't match the CPU it usually doesn't boot... just like you're seeing.

The O200 memory is specific to the Origin 200, Origin 2000 and Onyx 2. The memory must be installed in pairs. On the O200 the layout is odd: you start with the two outermost sockets and work inwards.
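The fill rule described above (pairs only, outermost sockets first, working inwards) can be sketched in a few lines of Python. This is only an illustration: the count of 8 sockets and the 0..7 left-to-right numbering are assumptions made for the example, not something taken from SGI documentation, and `fill_order` and `sockets_for` are hypothetical helper names.

```python
def fill_order(num_sockets=8):
    """Return socket pairs in installation order, outermost pair first.

    Assumption: sockets are numbered 0..num_sockets-1 from one end of the
    row to the other; the real board labelling may differ.
    """
    if num_sockets % 2:
        raise ValueError("sockets are expected to come in pairs")
    left, right = 0, num_sockets - 1
    pairs = []
    while left < right:
        pairs.append((left, right))
        left += 1
        right -= 1
    return pairs


def sockets_for(num_dimms, num_sockets=8):
    """Return the sockets that should be populated for num_dimms modules."""
    if num_dimms % 2:
        raise ValueError("memory must be installed in pairs")
    pairs = fill_order(num_sockets)
    if num_dimms // 2 > len(pairs):
        raise ValueError("more modules than sockets")
    return sorted(s for pair in pairs[: num_dimms // 2] for s in pair)
```

Under these assumptions, six modules would leave the two innermost sockets empty.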
i've got 6 memory modules in my O200 (the 2 innermost slots are empty), and the 4 inner ones are yellow. i've noticed that they don't sit very well in their sockets, you can move them 1-2 millimeters sideways when inserted and there is only one clip per socket. feels kinda like a lame pc... :( does faulty memory affect serial terminal output? i also attached the system disk to my octane, it looked ok (no hw damage and the fs was clean). i will play with the memory modules. the MSC port works so nicely :D although i have no special plans for the O200, it would be nice if it ran, and maybe i could find some task to use it for...

i see, you are parting out a purple i2... i have 2 of them, both are defective :( both R10k, max and high impact. the upper board of the impact is faulty. one does nothing, the other messes up the screen with random pixels (later with matrix-like patterns, here is the screenshot http://www.kozarmisleny.hu/_lionsgi/sgi ... enshot.rgb ), but the machine is stable... it could be a cold solder joint somewhere near the framebuffer RAM or DAC, or faulty RAM or DAC. i've disassembled the boards and resoldered some pins, but nothing changed. it used to work fine for 2 years, then it started messing up when cold and after 10 mins it went away. it did that for half a year and got worse from week to week... i like my purple maximpact the most of my machines, it was my first SGI.

one PSU is also bad, it has capacitor problems (mostly it doesn't start, but if it starts it works fine)
I have a 180MHz R10k single processor O200 I want to get rid of, if you're interested, I can part it.
is the flash memory in the O200 the DALLAS chip? it is socketed, so it could be moved into the "new" carrier logic to ensure it will work with the cpu that was in the faulty one...
i had some spare time this weekend :D and i decided to disassemble my dead o200... i completely disassembled the dual R10k/180/1M board. i cleaned the pins and so on... there is a thin something between the cpu and the heatsink. on one cpu this was completely loose, so i guess that cpu did not have (or had bad) contact with the heatsink. now i screwed the cpu board back together with only the other cpu, and got this:

1A 000: Starting PROM Boot process
1A 000:
1A 000:
1A 000: IP27 PROM SGI Version 6.17 built 04:52:09 PM Sep 21, 1998
1A 000: using BaseIO nic
1A 000: Testing/Initializing memory ............... DONE
1A 000: Copying PROM code to memory ............... DONE
1A 000: Discovering local IO ...................... DONE
1A 000: update_klcfg_cpuinfo: Couldn't find my structure.
1A 000: Discovering CrayLink connectivity .........
1A 000: Local hub CrayLink is down.
1A 000: *** Local network link down
1A 000: DONE
1A 000: Found 1 objects (1 hubs, 0 routers) in 38663 usec
1A 000: Waiting for peers to complete discovery.... DONE
1A 000: No other nodes present; becoming global master
1A 000: Global master is /hw/module/1/slot/MotherBoard
1A 000: Testing/Initializing all memory ........... DONE
1A 000: *** Nasid 0: CPU B was previously Present & Enabled but is now Absent
1A 000: *** Nasid 0: Memory bank 1 was previously Present & Enabled but is now Absent
1A 000: *** Nasid 0: Memory bank 2 was previously Present & Enabled but is now Absent
1A 000: *** Nasid 0: Memory bank 1 was previously had 64 MB but now has 0 MB
1A 000: *** Nasid 0: Memory bank 2 was previously had 64 MB but now has 0 MB
1A 000: Checking partitioning information ......... DONE
1A 000: No other nodes present; becoming partition master
1A 000: *** No console found. Searching for console...
1A 000: *** Found console on /hw/module/1/slot/io1.
1A 000: *** You can change the console by setting the ConsolePath variable
1A 000: *** Setting ConsolePath variable and resetting.
1A 000: Starting PROM Boot process
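The "was previously ... but is now ..." lines in a log like this are easy to miss between the progress messages. Here is a minimal Python sketch that pulls them out, assuming each console line has the shape `<cpu> <code>: <message>` as in the paste above and that wrapped lines have been joined back together first. The regular expressions and the `state_changes` helper are made up for this example and are not part of any SGI tool.

```python
import re

# "<cpu> <code>: <message>", e.g. "1A 000: Starting PROM Boot process"
LINE = re.compile(r"^(?P<src>\w+) (?P<code>\d+): ?(?P<msg>.*)$")

# Covers both "was previously X but is now Y" and
# "was previously had X but now has Y" as seen in the log.
CHANGE = re.compile(
    r"\*\*\* Nasid (?P<nasid>\d+): (?P<what>.+?) was previously "
    r"(?:had )?(?P<before>.+?) but (?:is now|now has) (?P<after>.+)$"
)


def state_changes(log_text):
    """Return (component, before, after) for every reported state change."""
    changes = []
    for raw in log_text.splitlines():
        line = LINE.match(raw.strip())
        if not line:
            continue
        m = CHANGE.search(line.group("msg"))
        if m:
            changes.append((m["what"], m["before"], m["after"]))
    return changes


SAMPLE = """\
1A 000: Testing/Initializing all memory ........... DONE
1A 000: *** Nasid 0: CPU B was previously Present & Enabled but is now Absent
1A 000: *** Nasid 0: Memory bank 1 was previously had 64 MB but now has 0 MB
1A 000: Checking partitioning information ......... DONE
"""

for what, before, after in state_changes(SAMPLE):
    print(f"{what}: {before} -> {after}")
```

This only summarizes what the PROM itself reports; it says nothing about why a component changed state.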

then i disassembled the whole thing again, and i noticed that there is a pin missing from the second plastic socket that sits between the cpu and the pcb. should this pin be missing? after turning it on i got this:

1B 000: Testing/Initializing all memory ........... DONE
1B 000: *** Nasid 0: CPU A was previously Present & Enabled but is now Present & Disabled
1B 000: *** Nasid 0: CPU B was previously Absent but is now Present & Enabled
1B 000: Checking partitioning information ......... DONE

how many working CPUs do i have then? one? :D

after restarting i then got this:

1B 000: Starting PROM Boot process
1B 000: serial_pio failed with 2 errors
1B 000: serial_pio failed:
1B 000: RSLT serial_pio FAIL diag_rc = 63
1B 000:
1B 000: diag_serial_pio: /hw/module/1/slot/io7: FAILED
1B 000:
1B 000:
1B 000: IP27 PROM SGI Version 6.17 built 04:52:09 PM Sep 21, 1998
1B 000: using BaseIO nic
1B 000: Testing/Initializing memory ............... DONE
1B 000: Copying PROM code to memory ............... DONE
1B 000: Discovering local IO ...................... DONE
1B 000: Discovering CrayLink connectivity .........
1B 000: Local hub CrayLink is down.
1B 000: *** Local network link down
1B 000: DONE
1B 000: Found 1 objects (1 hubs, 0 routers) in 38388 usec
1B 000: Waiting for peers to complete discovery.... DONE
1B 000: No other nodes present; becoming global master
1B 000: Global master is /hw/module/1/slot/MotherBoard
1B 000: Testing/Initializing all memory ........... DONE
1B 000: Checking partitioning information ......... DONE
1B 000: No other nodes present; becoming partition master
1B 000: *** No console found. Searching for console...
1B 000: *** Found console on /hw/module/1/slot/io1.
1B 000: *** You can change the console by setting the ConsolePath variable
1B 000: *** Setting ConsolePath variable and resetting.
1B 000: Starting PROM Boot process
1B 000: serial_pio failed with 2 errors
1B 000: serial_pio failed:
1B 000: RSLT serial_pio FAIL diag_rc = 63
1B 000:
1B 000: diag_serial_pio: /hw/module/1/slot/io7: FAILED

what does it mean, simply put? there is no serial terminal and no scsi hdd attached to it...

thanks!
I see in the output that you pulled the memory out of banks 1 and 2. That can't be helping much. With that said, your O200 appears to have enough issues that it might be difficult to get working again.

The Origin 200 appears to have been a bastard step-brother of the Origin 2000 family, and it's probably one of the more difficult pieces of kit to get working once it starts misbehaving.

I used to know how to work with this stuff five years ago... You might end up having to wait for a replacement Origin 200 logic carrier with CPU module already installed to become available...

However, with enough spare time and enough tinkering, you just might get something working. Just remember to clear the PROM and enable all the devices (I forget the litany, but if you search this forum for "enableall", you should find the list of commands) if you can actually get into the maintenance PROM.

Chris
:O2000R: (<-EMXI/IO6G) :O200: :O200: :O200: (<- quad R12k O200 w/GIGAchannel and ESI+Tex) plus a bunch of assorted standalone workstations...
so, one of my CPUs is dead, that's for sure. i removed it, now it runs with only one CPU. i get all this output on the MSC port. when i plug a serial cable into the serial port, i get only garbage. the cable works though when i use it on my octane. on the octane i can go into the PROM and do things, but on this o200 only garbage comes out (like viewing an executable file in an ASCII editor). how do i access the PROM on the o200? (i am afraid i have to look for a working carrier + cpu...)

here is the MSC output i get each time now:

MSC> pwr u
ok
1A 000: Starting PROM Boot process
1A 000: serial_pio failed with 2 errors
1A 000: serial_pio failed:
1A 000: RSLT serial_pio FAIL diag_rc = 63
1A 000:
1A 000: diag_serial_pio: /hw/module/1/slot/io7: FAILED
1A 000:
1A 000:
1A 000: IP27 PROM SGI Version 6.17 built 04:52:09 PM Sep 21, 1998
1A 000: using BaseIO nic
1A 000: Testing/Initializing memory ............... DONE
1A 000: Copying PROM code to memory ............... DONE
1A 000: Discovering local IO ...................... DONE
1A 000: update_klcfg_cpuinfo: Couldn't find my structure.
1A 000: Discovering CrayLink connectivity .........
1A 000: Local hub CrayLink is down.
1A 000: *** Local network link down
1A 000: DONE
1A 000: Found 1 objects (1 hubs, 0 routers) in 38496 usec
1A 000: Waiting for peers to complete discovery.... DONE
1A 000: No other nodes present; becoming global master
1A 000: Global master is /hw/module/1/slot/MotherBoard
1A 000: Testing/Initializing all memory ........... DONE
1A 000: Checking partitioning information ......... DONE
1A 000: No other nodes present; becoming partition master
1A 000: *** No console found. Searching for console...
1A 000: *** Found console on /hw/module/1/slot/io1.
1A 000: *** You can change the console by setting the ConsolePath variable
1A 000: *** Setting ConsolePath variable and resetting.
1A 000: Starting PROM Boot process
1A 000: serial_pio failed with 2 errors
1A 000: serial_pio failed:
1A 000: RSLT serial_pio FAIL diag_rc = 63
1A 000:
1A 000: diag_serial_pio: /hw/module/1/slot/io7: FAILED
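Since the same failure repeats on every reset, it can help to boil the log down to the distinct failing diagnostics. A small Python sketch, assuming the `RSLT ... FAIL diag_rc = N` and `<diag>: <hw path>: FAILED` line shapes seen in the paste above; `summarize_failures` is a hypothetical helper, and the meaning of `diag_rc = 63` is not something this sketch tries to interpret.

```python
import re

# e.g. "diag_serial_pio: /hw/module/1/slot/io7: FAILED"
FAILED = re.compile(r"(?P<diag>\w+): (?P<path>/hw/\S+): FAILED")
# e.g. "RSLT serial_pio FAIL diag_rc = 63"
RESULT = re.compile(r"RSLT (?P<test>\w+) FAIL diag_rc = (?P<rc>\d+)")


def summarize_failures(log_text):
    """Return the distinct (diag, hw path) pairs and (test, return code) pairs."""
    paths = sorted({(m["diag"], m["path"]) for m in FAILED.finditer(log_text)})
    codes = sorted({(m["test"], int(m["rc"])) for m in RESULT.finditer(log_text)})
    return paths, codes


SAMPLE = """\
1A 000: Starting PROM Boot process
1A 000: serial_pio failed with 2 errors
1A 000: RSLT serial_pio FAIL diag_rc = 63
1A 000: diag_serial_pio: /hw/module/1/slot/io7: FAILED
1A 000: Starting PROM Boot process
1A 000: RSLT serial_pio FAIL diag_rc = 63
1A 000: diag_serial_pio: /hw/module/1/slot/io7: FAILED
"""

paths, codes = summarize_failures(SAMPLE)
print(paths)
print(codes)
```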

maybe a bad serial port? there's a module near the serial ports, stuck onto the carrier logic. there is nothing on this little PCB. if i remove this, i get no BaseIO error...

thanks!
At least you're seeing some level of consistency.

It does appear that the serial port is failing, but that's a little odd, because serial ports don't really fail anymore. I do see your PROM is rather old, so maybe there's a way to update the PROM through the MSC. I don't recall. Either way, you probably want to completely reset the MSC's variables to their defaults, to see if that helps.

Try searching this forum for discussions from, say, last summer, when a few of us were simultaneously having problems with Origin 2000 systems. There was a bit of discussion over how to work with the MSC.

Chris
i could not get it going, so i bought another o200. it is working, but has only one cpu. i played around with it: i tried putting the degraded 2 cpu module (with the dead cpu removed; it probably burnt out due to bad heatsink contact) in the "new" carrier logic, and it wanted to boot. :D then i moved the cpu from the "new" module to the old degraded one as a second cpu. it did not do anything. then i removed the new cpu from the old module (so it became degraded again); it still did not do anything :S. then i moved the new cpu back onto the new module, and it runs well. what could have happened to the old module? why did it not work at least in degraded mode again? i asked some people on the chat about this, they said i have to upgrade the PROM. which chip on the carrier logic is the PROM? is it the little PLCC one near the pci board socket? what if i swap these chips (the old board originally had a dual cpu, the new one a single cpu)?

thanks!
hello!

now the origin complains, that some (1 :P ) of the processor(s) have old firmware, that i should upgrade... where do i get newer firmware for IP27 (Origin200/180/1M)?

thanks!
It's part of every IRIX installation and it's located under "/usr/cpu/firmware". Just use "flash" and it will use the default location.

Regards
Joerg
hello! i installed IRIX 6.5.22f yesterday. it did some upgrade, but it still complains about the old CPU flash. i ran the flash command by hand, it did nothing (perhaps the installer already flashed the newest version that i have, so there is nothing more to do unless i get even newer PROMs). as far as i can remember i have v6.150 now, it was 6.31 before... any ideas?

thanks!
LionSGI wrote: i have v6.150 now, it was 6.31 before... any ideas?

Looks good to me. Do a 'hinv -mv' and near the end you should have a line that looks like:

IP27prom in Module 1/Slot n1: Revision 6.156

PROM v6.156 is the latest, and I think it was already in IRIX 6.5.22 (mine runs 6.5.30)
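One thing that is easy to trip over when comparing these PROM revisions: 6.150 is newer than 6.31, so the revision has to be compared as two integers, not as a decimal number. A minimal Python sketch, assuming the `IP27prom in Module ...: Revision X.Y` line format shown above. The `LATEST` value is simply the 6.156 mentioned in this post, and `prom_revisions` and `needs_update` are hypothetical helper names.

```python
import re

# e.g. "IP27prom in Module 1/Slot n1: Revision 6.156"
PROM_LINE = re.compile(r"IP27prom in Module (.+?): Revision (\d+)\.(\d+)")

# Assumption: 6.156 is the newest revision, as stated in the post above.
LATEST = (6, 156)


def prom_revisions(hinv_text):
    """Map each node location to its PROM revision as a (major, minor) tuple."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)))
        for m in PROM_LINE.finditer(hinv_text)
    }


def needs_update(revision, latest=LATEST):
    """True if the revision is older than the one assumed to be the latest."""
    return revision < latest


SAMPLE = "IP27prom in Module 1/Slot n1: Revision 6.150\n"

for location, rev in prom_revisions(SAMPLE).items():
    print(location, rev, "older than latest" if needs_update(rev) else "up to date")
```

Comparing tuples keeps (6, 31) below (6, 150), whereas comparing 6.31 and 6.150 as floats would put them in the wrong order.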
hello!

this message comes on the console:

Starting up the system...

WARNING: Some CPUs have old firmware.  Please update node board flash proms.

and this is hinv -mv:

origin 4# hinv -mv
Location: /hw/module/1/slot/MotherBoard/node
PIMM_1XT5_1MB Board: barcode DWK946     part 013-1895-001 rev  C
Location: /hw/module/1/slot/MotherBoard/node/xtalk/8
IP29 Board: barcode DBF494     part 030-1025-002 rev  L
Location: /hw/module/1/slot/MotherBoard/node/xtalk/8/pci/2
1 180 MHZ IP27 Processor
CPU: MIPS R10000 Processor Chip Revision: 2.6
FPU: MIPS R10010 Floating Point Chip Revision: 2.6
CPU 0 at Module 1/Slot 1/Slice A: 180 Mhz MIPS R10000 Processor Chip (enabled)
Processor revision: 2.6. Scache: Size 1 MB Speed 120 Mhz  Tap 0x9
Main memory size: 384 Mbytes
Instruction cache size: 32 Kbytes
Data cache size: 32 Kbytes
Secondary unified instruction/data cache size: 1 Mbyte
Memory at Module 1/Slot 1: 384 MB (enabled)
Bank 0 contains 128 MB (Standard) DIMMS (enabled)
Bank 1 contains 64 MB (Standard) DIMMS (enabled)
Bank 2 contains 64 MB (Standard) DIMMS (enabled)
Bank 3 contains 128 MB (Standard) DIMMS (enabled)
Integral SCSI controller 0: Version QL1040B (rev. 2), single ended
Disk drive: unit 1 on SCSI controller 0 (unit 1)
Integral SCSI controller 1: Version QL1040B (rev. 2), single ended
CDROM: unit 2 on SCSI controller 1
IOC3/IOC4 serial port: tty1
IOC3/IOC4 serial port: tty2
IOC3 parallel port: plp1
Integral Fast Ethernet: ef0, version 1, module 1, slot MotherBoard, pci 2
Origin 200 base I/O, module 1 slot 1
PCI Adapter ID (vendor 0x10a9, device 0x0003) PCI slot 2
PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 0
PCI Adapter ID (vendor 0x1077, device 0x1020) PCI slot 1
IOC3/IOC4 external interrupts: 1
HUB in Module 1/Slot 1: Revision 3 Speed 90.00 Mhz (enabled)
IP27prom in Module 1/Slot n1: Revision 6.150
origin 5#

origin 14# uname -aR
IRIX64 origin 6.5 6.5.22m 10070055 IP27
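Given the earlier trouble with banks going absent, one quick sanity check on a listing like the hinv output above is whether the per-bank sizes add up to the reported main memory size. A small Python sketch, assuming the `Bank N contains X MB` and `Main memory size: X Mbytes` line formats from that listing; `bank_sizes` and `memory_consistent` are hypothetical helper names.

```python
import re

BANK = re.compile(r"Bank (\d+) contains (\d+) MB")
TOTAL = re.compile(r"Main memory size: (\d+) Mbytes")


def bank_sizes(hinv_text):
    """Map bank number to its size in MB."""
    return {int(bank): int(mb) for bank, mb in BANK.findall(hinv_text)}


def memory_consistent(hinv_text):
    """True if the bank sizes sum to the reported main memory size."""
    total = TOTAL.search(hinv_text)
    if total is None:
        raise ValueError("no 'Main memory size' line found")
    return sum(bank_sizes(hinv_text).values()) == int(total.group(1))


SAMPLE = """\
Main memory size: 384 Mbytes
Bank 0 contains 128 MB (Standard) DIMMS (enabled)
Bank 1 contains 64 MB (Standard) DIMMS (enabled)
Bank 2 contains 64 MB (Standard) DIMMS (enabled)
Bank 3 contains 128 MB (Standard) DIMMS (enabled)
"""

print(bank_sizes(SAMPLE), memory_consistent(SAMPLE))
```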

so, how does IRIX decide that the CPU flash is old, when the newest version it knows about is already flashed?

thanks!
i successfully managed to kill my "new" origin200. i only put the dual cpu module in it (with one cpu), which it used to work with, and now it does nothing anymore. so i put back the single cpu module (all CPUs are 180/1M) that i installed IRIX with 2 days ago; it does nothing, though the MSC works well :P i am tired of the origin200 by now. i'm sure it saved something in the PROM, or in the DALLAS or wherever... is there a way to reset these settings? can someone give me a full MSC command list?

thanks!
LionSGI wrote: this message comes on the console:

Starting up the system...

WARNING: Some CPUs have old firmware.  Please update node board flash proms.

It's a bug in 6.5.22m and affects bigger Origins too.
I have 2 O200s whose parts I mixed to make at least one working machine, but no success. Is it actually possible to kill the machine by interchanging parts?