Origin-2000 8x300MHz, 16GB RAM, ESI-TRAM - Help Needed

Hi all,

I managed to upgrade an Origin 2000 and also threw in an ESI+TRAM graphics board from an Octane. The output of hinv -vm and /usr/gfx/gfxinfo is posted below, and a few problems are evident.

Using a serial null-modem cable, I installed IRIX 6.5.18. After installation, however, I'm receiving errors about memory banks not being enabled (you'll notice that for one of the boards only half of the RAM appears, although physically all banks are full). In the PROM monitor I ran enableall, update and reset. Additionally, the monitor screen seems to flicker, and from the command line I can't seem to start the X server. Have I gone through the right sequence of commands? Looking at the hinv output, am I missing any drivers, or are all drivers up to date? Is my hardware, such as the IO6G, correctly defined?

Firstly the errors:

System Maintenance Menu

1) Start System
2) Install System Software
3) Run Diagnostics
4) Recover System
5) Enter Command Monitor

Option? 5
Command Monitor.  Type "exit" to return to the menu.
>> flash
Usage: flash [-c|n] [-i] [-e] [-m MODID -s SLOTID] [-N NASID] [-f] [-F] [-y] [-v] [-e] [-l] [-C] FILE
Flash all appropriate PROMs with FILE
>> enableall
Nasid 0: MEM disabled ...
Bank 5: Reason: Some DIMMs failed mem test.
Enable? (y|n) [y]:y
Nasid 2: MEM disabled ...
Bank 3: Reason: Some DIMMs failed mem test.
Enable? (y|n) [y]:y
Nasid 3: MEM disabled ...
Bank 3: Reason: Some DIMMs failed mem test.
Enable? (y|n) [y]:y
Bank 5: Reason: Some DIMMs failed mem test.
Enable? (y|n) [y]:y
Bank 7: Reason: Some DIMMs failed mem test.
Enable? (y|n) [y]:y
>>
>>
>> update
Writing 7 records....... DONE
Updated new configuration. Wrote 7 records.
>> reset
Resetting the system...


IP27 PROM SGI Version 6.156  built 11:27:56 AM Nov 18, 2003
*** Mixed standard and premium memory:
*** Treating all as standard.
Testing/Initializing memory ...............             DONE
Copying PROM code to memory ...............             DONE
Discovering local IO ......................             DONE
Discovering NUMAlink connectivity .........             DONE
Found 6 objects (4 hubs, 2 routers) in 66471 usec
Waiting for peers to complete discovery....             DONE
Recognized 390 MHz midplane
Global master is /hw/module/1/slot/n1
Testing/Initializing all memory .........----- MEMORY FAILURE (miscompare, node slot 1) -----
Uncorrectable directory ECC error
HSPEC address: 0xe8000200
Bad syndrome : 0x0a (multi)
Physical loc : MMYL5
Address      0x90000000e8000408
Mask         0x000000000000ffff
Actual       0x000000000000a8aa
Expected     0x000000000000aaaa
Difference   0x0000000000000200
Single Bit   MMYH5 line 14
----- MEMORY FAILURE (miscompare, node slot 1) -----
No hub MD error registered
Address      0x90000000e8000418
Mask         0x000000000000ffff
Actual       0x000000000000a8aa
Expected     0x000000000000aaaa
Difference   0x0000000000000200
Single Bit   MMYH5 line 14
----- MEMORY FAILURE (miscompare, node slot 1) -----
No hub MD error registered
Address      0x90000000e8000428
Mask         0x000000000000ffff
Actual       0x000000000000a8aa
Expected     0x000000000000aaaa
Difference   0x0000000000000200
Single Bit   MMYH5 line 14
----- MEMORY FAILURE (miscompare, node slot 1) -----
No hub MD error registered
Address      0x90000000e8000438
Mask         0x000000000000ffff
Actual       0x000000000000a8aa
Expected     0x000000000000aaaa
Difference   0x0000000000000200
Single Bit   MMYH5 line 14
Exceeded maximum error count (stopping)
DONE
Checking partitioning information .........             DONE
Loading BASEIO prom .......................             DONE

BASEIO PROM Monitor SGI Version 6.129  built 09:08:08 AM Nov 16, 2002 (BE64)
8 CPUs on 4 nodes found.
Installing PROM Device drivers ............
Base I/O Ethernet set to /dev/ethernet/ef0

Walking SCSI Adapter 0 (/hw/module/1/slot/io1), (pci id 0)
1+ 2- 3- 4- 5- 6+ 7- 8- 9- 10- 11- 12- 13- 14- 15- = 2 device(s)


Walking SCSI Adapter 1 (/hw/module/1/slot/io1), (pci id 1)
1- 2- 3- 4+ 5- 6- 7- 8- 9- 10- 11- 12- 13- 14- 15- = 1 device(s)

Initializing PROM Device drivers ..........             DONE
Checking hardware inventory ...............                      Found new or re-enabled component MEM BANK 5
Found new or re-enabled component MEM BANK 3 5
DONE

**** System Configuration and Diagnostics Summary ****
CONFIG:
No. of NODEs enabled    = 4
No. of NODEs disabled   = 0
No. of CPUs enabled     = 8
No. of CPUs disabled    = 0
Mem enabled             = 13312 MB
Mem disabled            = 576 MB
No. of RTRs enabled     = 2
No. of RTRs disabled    = 0

DIAG RESULTS:
/hw/module/1/slot/n3/node/mem: MEMBANK(S) 3  disabled
Reason:
Bank 3: Some DIMMs failed mem test.
/hw/module/1/slot/n4/node/mem: MEMBANK(S) 7  disabled
Reason:
Bank 7: Some DIMMs failed mem test.
**** End System Configuration and Diagnostics Summary ****


Starting up the system...

To perform system maintenance instead, press <Esc>
kl_parse_path: dev_info = NULL
IRIX Release 6.5 IP27 Version 10151453 System V - 64 Bit
Copyright 1987-2002 Silicon Graphics, Inc.
All Rights Reserved.

Setting rbaud to 19200
/hw/module/1/slot/io5/fibre_channel/pci/0/scsi_ctlr/0 (0):
Loop init timeout: LIP READY not received -- giving up [2]
/hw/module/1/slot/io5/fibre_channel/pci/1/scsi_ctlr/0 (1):
Loop init timeout: LIP TCB not completed -- giving up [15]
The system is coming up.

network: WARNING: Failed to configure ef1 as gate-IRIS.
network: WARNING: Failed to configure ef1 as gate-IRIS.
Warning:  Internet Gateway web server running as root.
Use "chkconfig webface_apache off" to disable.
Network time: xntpd.
Starting SGI Freeware Apache httpd
/usr/freeware/apache/sbin/apachectl start: httpd started
inst:
inst: Software installation has installed new configuration files and/or saved
inst: the previous version in some cases.  You may need to update or merge
inst: old configuration files with the newer versions.  See the "Updating
inst: Configuration Files" section in the versions(1M) manual page for details.
inst: The shell command "versions changed" will list the affected files.
inst:

IRIS console login: root
IRIX Release 6.5 IP27 IRIS
Copyright 1987-2002 Silicon Graphics, Inc. All Rights Reserved.
Last login: Thu Feb 28 21:16:01 PST 2008 on ttyd1
TERM = (vt100)
IRIS 1# startx
/usr/gfx/gfxinit: Operation not permitted
gfxinit: graphics initialize failed
Xsgi0[1385]:
Xsgi0[1385]: Fatal server error:
Xsgi0[1385]: Cannot establish any listening sockets - Make sure an X server isn't already running
xinit:  Error 0 (errno 0):  Server error.
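
The miscompare reports in the log above can be cross-checked by hand: within the reported mask, the Difference field is the bitwise XOR of the Actual and Expected values. The minimal Python sketch below (the helper name `flipped_bits` is hypothetical, not an SGI tool) just redoes that arithmetic on the values copied from the log. It shows that each report differs from the expected 0xaaaa pattern in exactly one bit, bit 9 (0x0200), which agrees with the PROM's own "Single Bit" label. How that bit maps to the DIMM location and "line 14" is board-specific and is not derived here.

```python
# Values copied from the MEMORY FAILURE reports in the IP27 PROM log above.
ACTUAL = 0x000000000000A8AA
EXPECTED = 0x000000000000AAAA
MASK = 0x000000000000FFFF


def flipped_bits(actual: int, expected: int, mask: int) -> list[int]:
    """Return the bit positions (0 = least significant) that differ under mask."""
    difference = (actual ^ expected) & mask
    return [bit for bit in range(64) if (difference >> bit) & 1]


if __name__ == "__main__":
    diff = (ACTUAL ^ EXPECTED) & MASK
    print(f"Difference: 0x{diff:016x}")
    print("Flipped bits:", flipped_bits(ACTUAL, EXPECTED, MASK))
```

A single flipped bit repeating at consecutive addresses on the same node is the kind of pattern worth keeping in mind when deciding which DIMMs to reseat or swap first.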



Secondly, the hinv -vm output:

IRIS 38# hinv -vm
Location: /hw/module/1/slot/n1/node
MODULEID Board: barcode K0014982   part              rev
IP31 Board: barcode JRM017     part 030-1255-004 rev  A
IP31PIMM8MB Board: barcode JRX213     part 030-1401-003 rev  A
8P12_MPLN Board: barcode GFK590     part 013-1547-003 rev  E
Location: /hw/module/1/slot/n2/node
IP31 Board: barcode JRM970     part 030-1255-004 rev  A
IP31PIMM8MB Board: barcode JRS944     part 030-1401-003 rev  A
Location: /hw/module/1/slot/n3/node
IP31PIMM8MB Board: barcode JRS747     part 030-1401-003 rev  A
IP31 Board: barcode JRM631     part 030-1255-004 rev  A
Location: /hw/module/1/slot/n4/node
IP31PIMM8MB Board: barcode JDS490     part 030-1401-003 rev  A
IP31 Board: barcode JEB966     part 030-1255-004 rev  A
Location: /hw/module/1/slot/r1/router
ROUTER_IR1 Board: barcode KCJ162     part 030-0841-003 rev  B
Location: /hw/module/1/slot/r2/router
ROUTER_IR1 Board: barcode KCJ137     part 030-0841-003 rev  B
Location: /hw/module/1/slot/io4/menet
MENET Board: barcode HKM786     part 030-0873-003 rev  J
Location: /hw/module/1/slot/io1/baseio
MIO Board: barcode JKV745     part 030-0880-003 rev  G
BASEIO Board: barcode HSE188     part 030-0734-002 rev  N
Location: /hw/module/1/slot/io5/fibre_channel
FIBRE_CHANNEL Board: barcode JJD789     part 030-0927-003 rev  E
Location: /hw/module/1/slot/io3/mgras
MOT10 Board: barcode GXA772     part 030-1241-002 rev  F
8 300 MHZ IP27 Processors
CPU: MIPS R12000 Processor Chip Revision: 2.5
FPU: MIPS R12010 Floating Point Chip Revision: 2.5
CPU 0 at Module 1/Slot 1/Slice A: 300 Mhz MIPS R12000 Processor Chip (enabled)
Processor revision: 2.5. Scache: Size 8 MB Speed 200 Mhz  Tap 0xa
CPU 1 at Module 1/Slot 1/Slice B: 300 Mhz MIPS R12000 Processor Chip (enabled)
Processor revision: 2.5. Scache: Size 8 MB Speed 200 Mhz  Tap 0xa
CPU 2 at Module 1/Slot 2/Slice A: 300 Mhz MIPS R12000 Processor Chip (enabled)
Processor revision: 2.5. Scache: Size 8 MB Speed 200 Mhz  Tap 0xa
CPU 3 at Module 1/Slot 2/Slice B: 300 Mhz MIPS R12000 Processor Chip (enabled)
Processor revision: 2.5. Scache: Size 8 MB Speed 200 Mhz  Tap 0xa
CPU 4 at Module 1/Slot 3/Slice A: 300 Mhz MIPS R12000 Processor Chip (enabled)
Processor revision: 2.5. Scache: Size 8 MB Speed 200 Mhz  Tap 0xa
CPU 5 at Module 1/Slot 3/Slice B: 300 Mhz MIPS R12000 Processor Chip (enabled)
Processor revision: 2.5. Scache: Size 8 MB Speed 200 Mhz  Tap 0xa
CPU 6 at Module 1/Slot 4/Slice A: 300 Mhz MIPS R12000 Processor Chip (enabled)
Processor revision: 2.5. Scache: Size 8 MB Speed 200 Mhz  Tap 0xa
CPU 7 at Module 1/Slot 4/Slice B: 300 Mhz MIPS R12000 Processor Chip (enabled)
Processor revision: 2.5. Scache: Size 8 MB Speed 200 Mhz  Tap 0xa
Main memory size: 13312 Mbytes
Instruction cache size: 32 Kbytes
Data cache size: 32 Kbytes
Secondary unified instruction/data cache size: 8 Mbytes
Memory at Module 1/Slot 1: 4096 MB (enabled)
Bank 0 contains 512 MB (Premium) DIMMS (enabled)
Bank 1 contains 512 MB (Premium) DIMMS (enabled)
Bank 2 contains 512 MB (Standard) DIMMS (enabled)
Bank 3 contains 512 MB (Standard) DIMMS (enabled)
Bank 4 contains 512 MB (Standard) DIMMS (enabled)
Bank 5 contains 512 MB (Standard) DIMMS (disabled)
Bank 6 contains 512 MB (Standard) DIMMS (enabled)
Bank 7 contains 512 MB (Standard) DIMMS (enabled)
Memory at Module 1/Slot 2: 2048 MB (enabled)
Bank 0 contains 512 MB (Premium) DIMMS (enabled)
Bank 1 contains 512 MB (Premium) DIMMS (enabled)
Bank 2 contains 512 MB (Standard) DIMMS (enabled)
Bank 3 contains 512 MB (Standard) DIMMS (enabled)
Memory at Module 1/Slot 3: 3584 MB (enabled)
Bank 0 contains 512 MB (Premium) DIMMS (enabled)
Bank 1 contains 512 MB (Premium) DIMMS (enabled)
Bank 2 contains 512 MB (Standard) DIMMS (enabled)
Bank 3 contains 512 MB (Standard) DIMMS (disabled)
Bank 4 contains 512 MB (Standard) DIMMS (enabled)
Bank 5 contains 512 MB (Standard) DIMMS (enabled)
Bank 6 contains 512 MB (Standard) DIMMS (enabled)
Bank 7 contains 512 MB (Standard) DIMMS (enabled)
Memory at Module 1/Slot 4: 3584 MB (enabled)
Bank 0 contains 512 MB (Premium) DIMMS (enabled)
Bank 1 contains 512 MB (Premium) DIMMS (enabled)
Bank 2 contains 512 MB (Standard) DIMMS (enabled)
Bank 3 contains 512 MB (Standard) DIMMS (disabled)
Bank 4 contains 512 MB (Standard) DIMMS (enabled)
Bank 5 contains 512 MB (Standard) DIMMS (disabled)
Bank 6 contains 512 MB (Standard) DIMMS (enabled)
ROUTER in Module 1/Slot 2: Revision 2: Active Ports [4,5,6] (enabled)
ROUTER in Module 1/Slot 4: Revision 2: Active Ports [4,5,6] (enabled)
Integral SCSI controller 0: Version QL1040B (rev. 2), single ended
Disk drive: unit 1 on SCSI controller 0 (unit 1)
CDROM: unit 6 on SCSI controller 0
Integral SCSI controller 1: Version QL1040B (rev. 2), single ended
Integral SCSI controller 3: Version Fibre Channel AIC-1160, revision 2
Integral SCSI controller 2: Version Fibre Channel AIC-1160, revision 2
IOC3 serial port: tty3
IOC3 serial port: tty4
IOC3 serial port: tty5
IOC3 serial port: tty6
IOC3 serial port: tty7
IOC3 serial port: tty8
IOC3 serial port: tty1
IOC3 serial port: tty2
IOC3 serial port: tty9
IOC3 serial port: tty10
IOC3 parallel port: plp1
Graphics board: ESI with texture option
Fast Ethernet: ef1, version 1, module 1, slot io4, pci 0
Fast Ethernet: ef2, version 1, module 1, slot io4, pci 1
Fast Ethernet: ef3, version 1, module 1, slot io4, pci 2
Fast Ethernet: ef4, version 1, module 1, slot io4, pci 3
Integral Fast Ethernet: ef0, version 1, module 1, slot io1, pci 2
Iris Audio Processor: version RAD revision 7.0, number 1
Origin MENET board, module 1 slot 4: Revision 4
PCI Adapter ID (vendor 4265, device 3) pci slot 0
PCI Adapter ID (vendor 4265, device 3) pci slot 1
PCI Adapter ID (vendor 4265, device 3) pci slot 2
PCI Adapter ID (vendor 4265, device 3) pci slot 3
Origin BASEIO board, module 1 slot 1: Revision 4
PCI Adapter ID (vendor 4265, device 3) pci slot 6
PCI Adapter ID (vendor 4265, device 3) pci slot 2
PCI Adapter ID (vendor 4215, device 4128) pci slot 0
PCI Adapter ID (vendor 4215, device 4128) pci slot 1
PCI Adapter ID (vendor 4265, device 5) pci slot 7
Origin FIBRE CHANNEL board, module 1 slot 5: Revision 4
PCI Adapter ID (vendor 36868, device 4448) pci slot 0
PCI Adapter ID (vendor 36868, device 4448) pci slot 1
IOC3 external interrupts: 1
HUB in Module 1/Slot 1: Revision 5 Speed 100.00 Mhz (enabled)
HUB in Module 1/Slot 2: Revision 5 Speed 100.00 Mhz (enabled)
HUB in Module 1/Slot 3: Revision 5 Speed 100.00 Mhz (enabled)
HUB in Module 1/Slot 4: Revision 5 Speed 100.00 Mhz (enabled)
IP27prom in Module 1/Slot n1: Revision 6.156
IP27prom in Module 1/Slot n2: Revision 6.156
IP27prom in Module 1/Slot n3: Revision 6.156
IP27prom in Module 1/Slot n4: Revision 6.156
IO6prom on Global Master Baseio in Module 1/Slot io1: Revision 6.129
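
To see at a glance which banks the PROM has disabled on each nodeboard, the memory section of hinv -vm can be summarized with a short script. This is a minimal sketch that assumes the exact line formats shown above ("Memory at Module M/Slot S: N MB (...)" and "Bank B contains N MB (...) DIMMS (...)"); `summarize_memory` is a hypothetical helper that only reads text and does not query the hardware.

```python
import re

# These patterns assume the "hinv -vm" line formats shown in this thread.
SLOT_RE = re.compile(r"Memory at Module (\d+)/Slot (\d+): (\d+) MB")
BANK_RE = re.compile(r"Bank (\d+) contains (\d+) MB \((\w+)\) DIMMS \((\w+)\)")


def summarize_memory(hinv_text: str) -> dict:
    """Map (module, slot) to the reported size, the banks listed and the disabled banks."""
    summary = {}
    current = None
    for raw_line in hinv_text.splitlines():
        line = raw_line.strip()
        slot_match = SLOT_RE.match(line)
        if slot_match:
            module, slot, size_mb = (int(g) for g in slot_match.groups())
            current = (module, slot)
            summary[current] = {"reported_mb": size_mb, "banks": [], "disabled": []}
            continue
        bank_match = BANK_RE.match(line)
        if bank_match and current is not None:
            bank = int(bank_match.group(1))
            summary[current]["banks"].append(bank)
            if bank_match.group(4) == "disabled":
                summary[current]["disabled"].append(bank)
    return summary


if __name__ == "__main__":
    # Excerpt copied from the hinv -vm output above.
    sample = """\
Memory at Module 1/Slot 2: 2048 MB (enabled)
Bank 0 contains 512 MB (Premium) DIMMS (enabled)
Bank 1 contains 512 MB (Premium) DIMMS (enabled)
Bank 2 contains 512 MB (Standard) DIMMS (enabled)
Bank 3 contains 512 MB (Standard) DIMMS (enabled)
Memory at Module 1/Slot 4: 3584 MB (enabled)
Bank 0 contains 512 MB (Premium) DIMMS (enabled)
Bank 1 contains 512 MB (Premium) DIMMS (enabled)
Bank 2 contains 512 MB (Standard) DIMMS (enabled)
Bank 3 contains 512 MB (Standard) DIMMS (disabled)
Bank 4 contains 512 MB (Standard) DIMMS (enabled)
Bank 5 contains 512 MB (Standard) DIMMS (disabled)
Bank 6 contains 512 MB (Standard) DIMMS (enabled)
"""
    for (module, slot), info in summarize_memory(sample).items():
        print(f"Module {module}/Slot {slot}: {info['reported_mb']} MB reported, "
              f"{len(info['banks'])} banks listed, disabled banks: {info['disabled']}")
```

Run against the full listing, this makes the half-populated board from the original question easy to spot: Module 1/Slot 2 lists only four banks (0 through 3), while the other nodeboards list seven or eight, and the disabled banks are collected per slot.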



Thirdly, the /usr/gfx/gfxinfo output:

IRIS 40# /usr/gfx/gfxinfo
Graphics board 0 is "IMPACTSR" graphics.
Managed (":0.0") 1280x1024
Product ID 0x2, 1 GE, 1 RE, 4 TRAMs
MGRAS revision 4, RA revision 0
HQ rev B, GE12 rev A, RE4 rev C, PP1 rev E,
VC3 rev A, CMAP rev E
unknown, assuming 19" monitor (id 0xf)

(Could not contact X server; thus, no XSGIvc information available)



Your advice would be appreciated.

Thanks to all.
Have you already tried reseating the RAM, and/or moving it to another location to see if the problem follows the RAM?
aristides wrote: went through ENABLEALL, UPDATE, RESET....

Command Monitor.  Type "exit" to return to the menu.
>> enableall
>> update
>> reset
Resetting the system...

You might try going into POD mode to clear and flush the diagnostic logs:

Enter POD mode from the PROM command line by entering "pod", then:
"go cac"
"clearalllogs"
"initalllogs"
"flush"
"reset" <the system will reset>

When it restarts, go back into the PROM monitor and try "enableall", "update", and "reset" again.

If that doesn't work, I have one other possibility you can try <though I'll warn you ahead of time that it's
a long shot *and* I haven't tried it with disabled RAM>. I was able to re-enable processors
on a 250MHz nodeboard that appeared in the PROM as "permanently disabled" by skipping the power-on diagnostics
check. Various incantations of the normal POD mode log reset method had already failed several times
when I happened to retry the process after a warm reboot from IRIX, and I was surprised to see the processors
re-enabled. I ran the board through several cold reboots without problems and returned it to the owner
<he'd asked for my help in reviving it> over a year ago, and last I heard it was still working.

You can force the system to skip diagnostics by setting specific hardware or software switches on the MSC,
but the simplest way is to warm reboot the system <with IRIX running>. So:

If the system is able, boot into IRIX.
Warm reboot the system; it should skip power-on diagnostics when it restarts.
Stop in the PROM monitor, enter POD mode and repeat the process to clear and re-initialize the logs.
If you're *real lucky*, when the system restarts from the POD reset it won't disable the memory.
Since we <tried to> trick the system, closely monitor the running system for memory errors.

If not, then more than likely that means you need to replace or remove some RAM.

BTW - I moved your post to the "Hardware" forum, where the "help" portion of your post is more likely to get the attention it needs. You can repost the hinv in the Hinv forum if you'd like.

aristides wrote: Is my hardware such as the IO6G correctly defined?

The IO6G seems to appear OK in your hinv:

Location: /hw/module/1/slot/io1/baseio
MIO Board: barcode JKV745     part 030-0880-003 rev  G
BASEIO Board: barcode HSE188     part 030-0734-002 rev  N

Though you do have a PROM revision mismatch between the IO6G and the nodeboards:

IP27prom in Module 1/Slot n1: Revision 6.156
IP27prom in Module 1/Slot n2: Revision 6.156
IP27prom in Module 1/Slot n3: Revision 6.156
IP27prom in Module 1/Slot n4: Revision 6.156
IO6prom on Global Master Baseio in Module 1/Slot io1: Revision 6.129

See "man flash" after IRIX loads to correct the issue.
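
If you want to re-check the PROM revisions mechanically (for example after flashing), the IP27prom/IO6prom lines in hinv -vm can be grouped by type. This is a hypothetical helper, assuming exactly the line format shown above; it only reads hinv text and reports which revisions are present. It does not decide which image to flash; "man flash" remains the reference for that.

```python
import re

# Assumes lines such as "IP27prom in Module 1/Slot n1: Revision 6.156".
PROM_RE = re.compile(r"\b(IP27prom|IO6prom)\b.*?Slot (\w+): Revision ([\d.]+)")


def prom_revisions(hinv_text: str) -> dict[str, set[str]]:
    """Group the PROM revisions found in hinv output by PROM type."""
    found: dict[str, set[str]] = {}
    for line in hinv_text.splitlines():
        match = PROM_RE.search(line)
        if match:
            prom, _slot, revision = match.groups()
            found.setdefault(prom, set()).add(revision)
    return found


if __name__ == "__main__":
    # Lines copied from the hinv -vm output in this thread.
    sample = """\
IP27prom in Module 1/Slot n1: Revision 6.156
IP27prom in Module 1/Slot n2: Revision 6.156
IP27prom in Module 1/Slot n3: Revision 6.156
IP27prom in Module 1/Slot n4: Revision 6.156
IO6prom on Global Master Baseio in Module 1/Slot io1: Revision 6.129
"""
    revisions = prom_revisions(sample)
    for prom, revs in sorted(revisions.items()):
        print(prom, sorted(revs))
    distinct = set().union(*revisions.values())
    if len(distinct) > 1:
        print("Different PROM revisions present:", sorted(distinct))
```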

You've got an IO6G and graphics installed. Have you tried cold booting with a keyboard, mouse and monitor attached?
***********************************************************************
Welcome to ARMLand - 0/0x0d00
running...(sherwood-root 0607201829)
* InfiniteReality/Reality Software, IRIX 6.5 Release *
***********************************************************************
I see the word "directory" in one of the error messages. You might want to consider yanking the directory RAM to see if that helps.

I'm a fan of the "Standard"->"Premium" upgrade myself, and probably a quarter of the memory banks in my 16p are "Premium", even though it doesn't make any difference. In this case, though, it looks like the directory memory is causing problems.

Chris
:O2000R: (<-EMXI/IO6G) :O200: :O200: :O200: (<- quad R12k O200 w/GIGAchannel and ESI+Tex) plus a bunch of assorted standalone workstations...
recondas wrote:
Enter POD mode from the PROM command line by entering "pod", then:
"go cac"
"clearalllogs"
"initalllogs"
"flush"
"reset" <the system will reset>

When it restarts, go back into the PROM monitor and try "enableall", "update", and "reset" again.

If that doesn't work, I have one other possibility you can try <though I'll warn you ahead of time that it's
a long shot *and* I haven't tried it with disabled RAM>......

......So:

If the system is able, boot into IRIX.
Warm reboot the system; it should skip power-on diagnostics when it restarts.
Stop in the PROM monitor, enter POD mode and repeat the process to clear and re-initialize the logs.
If you're *real lucky*, when the system restarts from the POD reset it won't disable the memory.
Since we <tried to> trick the system, closely monitor the running system for memory errors.


Sorry to say, Recondas, I've been going back and forth with this option. Sometimes slightly more RAM is available, sometimes less. What remains constant is that the system doesn't seem to see 2GB in Module 1/Slot 2.

aristides wrote: Memory at Module 1/Slot 2: 2048 MB (enabled)
Bank 0 contains 512 MB (Premium) DIMMS (enabled)
Bank 1 contains 512 MB (Premium) DIMMS (enabled)
Bank 2 contains 512 MB (Standard) DIMMS (enabled)
Bank 3 contains 512 MB (Standard) DIMMS (enabled)
Memory at Module 1/Slot 3: 3584 MB (enabled)
Bank 0 contains 512 MB (Premium) DIMMS (enabled)



The Keeper wrote: I see the word "directory" in one of the error messages. You might want to consider yanking the directory RAM to see if that helps.


Thanks for the note, Chris. Unfortunately, pulling the separate directory RAM cards from all 4 nodes did not help; I get the same 'directory' error message. Without the physical directory cards, I believe the system is now drawing on the 'on-board' main RAM.
Sometimes you see that a DIMM slot has got filled with crap and dust if it has been in a system with empty banks and then been upgraded.
When I've had this issue in the past, I strip the board down, clean all the DIMM sockets with a good contact cleaner and wipe the DIMM contacts clean as well.

You may also have duff RAM, it happens...

Did you load the graphics drivers for Irix after you added the ESI?
:ChallengeL: :O2000: :Onyx2: :Onyx: :O2000R: :O2000R: :O2000E: :O2000E: :Onyx2R: :O3000: :0300: :0300: :0300: :Indy: :Indigo2: :Indigo2: :Indigo2IMP: :Octane: :Octane: :Octane2: :Octane2: :Fuel: :Fuel:
maxsleg wrote: Sometimes you see that a DIMM slot has got filled with crap and dust if it has been in a system with empty banks and then been upgraded.
When I've had this issue in the past, I strip the board down, clean all the DIMM sockets with a good contact cleaner and wipe the DIMM contacts clean as well.


Thanks for that advice. I'll go through and clear the dust with a mini compressed-air hose. Any suggestions on which contact cleaner to use?

maxsleg wrote: Did you load the graphics drivers for Irix after you added the ESI?


No, not manually; I'm a bit green on this. Doesn't the installation procedure take care of this, or does it need to be done manually?
DON'T use compressed air, all you will do is blow crap deeper into things!

Take the board to an ESD workbench and gently brush out the dust.


I would suggest the following as you are a newbie.
Get a fresh HDD & irix install set and do a complete fresh install - then you can config things as you like it and learn a bit on the way.
maxsleg wrote: DON'T use compressed air, all you will do is blow crap deeper into things!

Take the board to an ESD workbench and gently brush out the dust.


OK, will do. This may turn into a long-term project. Refurbishment is low priority at the moment, but I'll take it step by step.

maxsleg wrote: I would suggest the following as you are a newbie.
Get a fresh HDD & irix install set and do a complete fresh install - then you can config things as you like it and learn a bit on the way.


Here is what I started with: a new 126GB disk and a fresh install. Initially I connected a null serial cable from a PC to the Origin and started installing. During the initial install I also had a second external SCSI CD drive and the video cable connected. Aside from the problems with RAM recognition (advice addressed above), I initially noticed the monitor working (although flickering) while both the serial cable and the video cable were connected. To alleviate the problem, I re-ran the installation using the serial cable only. The fully installed system was then rebooted, but I could not get a signal to the monitor. (This was also the case with the previous install: after rebooting I could not lock in a signal to the monitor.) I know the monitor works, as it is a Sony GDM17SE that I have tested on an Octane. I'll go back and clean the RAM contacts, although I will probably still have the same video issue. I did notice some errors in the terminal towards the end of the installation process, but I didn't keep a record of what they were (my mistake).

So any further advice from the Nekochan community would be very welcome ;)
aristides wrote: So any further advice from the Nekochan community would be very welcome ;)

<if you haven't already done so, visit SGI's Technical Publications Library and check out copies of the Origin 2000 stuff>

Try stripping the O2k until you get a minimum configuration that boots without error.

Remove:
Directory RAM
All standard RAM except the pair in Bank 0 on each node <your hinv indicates all Bank 0s were working>
The Graphics module
The IO6G <if you still have the IO6 to replace it with>
The MENET and FC boards
The HD that contains the failed IRIX install
The external CD
If necessary, all but one nodeboard
<from this point make and test each change/reconfiguration *one* step at a time - it'll take more time, but it will also enable you to make more sense of any errors>

Connect a serial terminal <enable a *large* scroll back buffer on the terminal program and save each session>.

1) Boot to the PROM monitor and issue "resetenv"
2) Enter POD mode from the PROM command line by entering "pod", then:
    "go cac"
    "clearalllogs"
    "initalllogs"
    "flush"
    "reset" <the system will reset>
3) When it restarts, stop in the PROM and:
    run "enableall", followed by "update" at the PROM command line
<NOTE: repeat this 3 step process after *every* hardware error>

Reboot - are there any error messages?

If so - what are they? <stop and report back>

If not, install the IO6G and graphics board <but *nothing* else yet and do not connect kb, m, or monitor>
Boot to the PROM monitor, and "update" the PROM hardware inventory
Boot again - if errors appear report back

If no errors appear during the boot to PROM
Power down, re-install the boot drive,
restart the system,
clear/prep the drive and install IRIX <what revision is your install set, btw?>

If there are install errors <stop and report back>

If not, connect a kb, mouse and monitor, <leave the serial terminal connected for now> and attempt to boot IRIX

If booting IRIX is unsuccessful what errors appeared?

If the IRIX boot was successful, test each RAM set in Bank 0 of a nodeboard <*no* Directory RAM yet>.
If any set gives errors, record the error message, init the POD log, update the PROM inventory, and test the remaining sets.

Once you have eliminated any problem RAM
Try the RAM that passed in the other memory banks
If there are any errors during this process, try another known good set in the problem bank
If the problem persists <and cleaning the slot(s) didn't help>, skip the bank or replace the nodeboard

Once the RAM is tested and running w/o error, reinstall the MENET and FC boards
You can also reinstall the Directory RAM, but in an 8 processor system it does little beyond using electricity and producing heat.

BTW - when you remove a nodeboard, the compression connectors <labeled "Connector Actuation 7/64 Hex"> should be released first, then the Phillips-head machine screws at the top and bottom of each board.

When you install a nodeboard, reverse the process: tighten the machine screws first, then the compression bolts <I alternate turning each bolt in a pair a few turns at a time so the connector is seated evenly, but *do not* overtighten>. Following this procedure keeps the compression connector from having to support the weight of the nodeboard during removal/installation.
recondas wrote: Try stripping the O2k until you get a minimum configuration that boots without error.

Remove:
Directory RAM
All standard RAM except the pair in Bank 0 on each node <your hinv indicates all Bank 0s were working>
The Graphics module
The IO6G <if you still have the IO6 to replace it with>
The MENET and FC boards
The HD that contains the failed IRIX install
The external CD
If necessary, all but one nodeboard
<from this point make and test each change/reconfiguration *one* step at a time - it'll take more time, but it will also enable you to make more sense of any errors>


Recondas... your stepwise advice is the essence of support :D

This will take me some time to implement, as I tend to work on this when I have some 'free time' under my belt. I'll report back step by step while following your advice.

Thank you again and I can hopefully commence this process very soon.
recondas wrote: Connect a serial terminal <enable a *large* scroll back buffer on the terminal program and save each session>.

Boot to the PROM monitor and issue "resetenv"
Go into POD mode and reinitialize the POD logs
Reset the system, stop in the PROM and enableall/update <repeat this 3 step process after *every* hardware error>

Reboot - are there any error messages?

If so - what are they? <stop and report back>


This seemed to work well. Here comes the output:

Initializing PROM Device drivers ..........             DONE
Checking hardware inventory ...............              /hw/module/1/slot/n1:MEM BANK 2 missing or disabled
/hw/module/1/slot/n1:MEM BANK 3 missing or disabled

***Warning: Board in module 1, slot io6 is missing or disabled
It previously contained a ROUTER board, barcode JJD789 laser 38e83e

***Warning: Board in module 1, slot io5 is missing or disabled
It previously contained a MSCSI board, barcode GTF800 laser 2570e4

***Warning: Board in module 1, slot io3 is missing or disabled
It previously contained a New Type board, barcode MGA947 laser 5a1924
***Warning: Found a new IP27 board in module 1, slot n2, serial JRM970
Please use the 'update' command from the PROM Monitor to update the inventory
***Warning: Found a new IP27 board in module 1, slot n3, serial JRS747
Please use the 'update' command from the PROM Monitor to update the inventory
***Warning: Found a new IP27 board in module 1, slot n4, serial JDS490
Please use the 'update' command from the PROM Monitor to update the inventory
DONE


System Maintenance Menu

1) Start System
2) Install System Software
3) Run Diagnostics
4) Recover System
5) Enter Command Monitor

Option?
>>
>>
>>
>>
>>
>> ls
dksc(0,1,8)/: no such device
dksc(0,1,0)/: no such device
>> resetenv
>> pod

Switching into Power-On Diagnostics mode...


1A 000: *** Software entry into POD mode from IO6 POD mode on node 0
1A 000: POD IOC3 Dex> go cac
Testing/Initializing memory

1A 000: *** Requested CAC mode on node 0
1A 000: POD IOC3 Cac> clearalllogs
*** This must be run only after NUMAlink discovery is complete.
*** This will clear all previous log variables such as:
*** moduleids, nodeids, etc. for all nodes.
Clear all logs? [n] y
All PROM logs cleared!
1A 000: POD IOC3 Cac> initalllogs
*** This must be run only after NUMAlink discovery is complete.
*** This will clear all previous log variables such as:
*** moduleids, nodeids, etc. for all nodes.
Clear all logs environment variables, and aliases ? [n] y
All PROM logs cleared!
1A 000: POD IOC3 Cac> flush
1A 000: POD IOC3 Cac> reset
Resetting the system...


IP27 PROM SGI Version 6.156  built 11:27:56 AM Nov 18, 2003
Testing/Initializing memory ...............             DONE
Copying PROM code to memory ...............             DONE
Discovering local IO ......................             DONE
Discovering NUMAlink connectivity .........             DONE
Found 6 objects (4 hubs, 2 routers) in 66468 usec
Waiting for peers to complete discovery....             DONE
Recognized 390 MHz midplane
Global master is /hw/module/1/slot/n1
Testing/Initializing all memory ...........             DONE
Checking partitioning information .........             DONE
Loading BASEIO prom .......................             DONE

BASEIO PROM Monitor SGI Version 6.94  built 03:59:15 PM Dec  5, 2001 (BE64)
8 CPUs on 4 nodes found.
Installing PROM Device drivers ............

Walking SCSI Adapter 0 (/hw/module/1/slot/io1), (pci id 0)
1- 2- 3- 4- 5- 6+ 7- 8- 9- 10- 11- 12- 13- 14- 15- = 1 device(s)


Walking SCSI Adapter 1 (/hw/module/1/slot/io1), (pci id 1)
1- 2- 3- 4- 5- 6- 7- 8- 9- 10- 11- 12- 13- 14- 15- = 0 device(s)

Initializing PROM Device drivers ..........             DONE
Checking hardware inventory ...............              /hw/module/1/slot/n1:MEM BANK 2 missing or disabled
/hw/module/1/slot/n1:MEM BANK 3 missing or disabled

***Warning: Board in module 1, slot io6 is missing or disabled
It previously contained a ROUTER board, barcode JJD789 laser 38e83e

***Warning: Board in module 1, slot io5 is missing or disabled
It previously contained a MSCSI board, barcode GTF800 laser 2570e4

***Warning: Board in module 1, slot io3 is missing or disabled
It previously contained a New Type board, barcode MGA947 laser 5a1924
***Warning: Found a new IP27 board in module 1, slot n2, serial JRM970
Please use the 'update' command from the PROM Monitor to update the inventory
***Warning: Found a new IP27 board in module 1, slot n3, serial JRS747
Please use the 'update' command from the PROM Monitor to update the inventory
***Warning: Found a new IP27 board in module 1, slot n4, serial JDS490
Please use the 'update' command from the PROM Monitor to update the inventory
DONE

**** System Configuration and Diagnostics Summary ****
CONFIG:
No. of NODEs enabled    = 4
No. of NODEs disabled   = 0
No. of CPUs enabled     = 8
No. of CPUs disabled    = 0
Mem enabled             = 4096 MB
Mem disabled            = 0 MB
No. of RTRs enabled     = 2
No. of RTRs disabled    = 0

DIAG RESULTS:
ALL DIAGS PASSED.
**** End System Configuration and Diagnostics Summary ****
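A boot log like the one above mixes several kinds of inventory warning: memory banks missing or disabled, boards that used to be present, and new boards that the PROM wants recorded with 'update'. The following is a minimal Python sketch (not an SGI tool) that sorts those warnings into groups. It assumes the exact message wording shown in this log. The function name summarize and the sample text are hypothetical illustrations.

```python
import re

# Excerpt of the PROM boot log above (hypothetical sample; line wrap repaired).
BOOT_LOG = """\
/hw/module/1/slot/n1:MEM BANK 2 missing or disabled
/hw/module/1/slot/n1:MEM BANK 3 missing or disabled
***Warning: Board in module 1, slot io6 is missing or disabled
***Warning: Board in module 1, slot io5 is missing or disabled
***Warning: Board in module 1, slot io3 is missing or disabled
***Warning: Found a new IP27 board in module 1, slot n2, serial JRM970
***Warning: Found a new IP27 board in module 1, slot n3, serial JRS747
***Warning: Found a new IP27 board in module 1, slot n4, serial JDS490
"""

# Patterns assume the message wording seen in this particular log.
MEM_BANK = re.compile(r"^(/hw/[^:\s]+):MEM BANK (\d+) missing or disabled")
MISSING_BOARD = re.compile(r"Board in module (\d+), slot (\w+) is missing or disabled")
NEW_BOARD = re.compile(r"Found a new (\w+) board in module (\d+), slot (\w+), serial (\w+)")


def summarize(log: str) -> dict:
    """Group PROM hardware-inventory warnings by kind."""
    result = {"mem_banks": [], "missing_boards": [], "new_boards": []}
    for line in log.splitlines():
        if m := MEM_BANK.search(line):
            result["mem_banks"].append((m.group(1), int(m.group(2))))
        elif m := MISSING_BOARD.search(line):
            result["missing_boards"].append((int(m.group(1)), m.group(2)))
        elif m := NEW_BOARD.search(line):
            result["new_boards"].append(
                (m.group(1), int(m.group(2)), m.group(3), m.group(4))
            )
    return result


if __name__ == "__main__":
    s = summarize(BOOT_LOG)
    for path, bank in s["mem_banks"]:
        print(f"{path}: memory bank {bank} missing or disabled")
    for module, slot in s["missing_boards"]:
        print(f"module {module} slot {slot}: board missing or disabled")
    if s["new_boards"]:
        # The PROM log itself asks for 'update' when new boards are found.
        print(f"{len(s['new_boards'])} new board(s): run 'update' in the PROM Monitor")
```

This only reads the text. Whether a "missing" board is a real fault or just a leftover inventory record still has to be judged by hand.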



recondas wrote: If not, install the IO6G and graphics board <but *nothing* else yet and do not connect kb, m, or monitor>
Boot to the PROM monitor and "update" the PROM hardware inventory
Boot again; if errors appear, report back


Done. Reinstalled the IO6G. I did, though, shut down the system before removing the original IO6. I hope this was correct. No errors after startup.


recondas wrote: If no errors appear during the boot to PROM
Power down, reinstall the boot drive,
restart the system,
clear/prep the drive and install IRIX <what revision is your install set, btw?>

If there are install errors <stop and report back>


Yes, it was this site that I used as a reference previously. Brilliant notes! IRIX 6.5.18 is installed. No install errors, only the standard conflicts at the start regarding Java plugins. Here is the hinv -vm info so far:

Code:

IRIS console login: root
IRIX Release 6.5 IP27 IRIS
Copyright 1987-2002 Silicon Graphics, Inc. All Rights Reserved.
TERM = (vt100)
IRIS 1#
IRIS 1# hinv -vm
Location: /hw/module/1/slot/n1/node
MODULEID Board: barcode K0014982   part              rev
IP31 Board: barcode JRM017     part 030-1255-004 rev  A
IP31PIMM8MB Board: barcode JRX213     part 030-1401-003 rev  A
8P12_MPLN Board: barcode GFK590     part 013-1547-003 rev  E
Location: /hw/module/1/slot/n2/node
IP31 Board: barcode JRM970     part 030-1255-004 rev  A
IP31PIMM8MB Board: barcode JRS944     part 030-1401-003 rev  A
Location: /hw/module/1/slot/n3/node
IP31PIMM8MB Board: barcode JRS747     part 030-1401-003 rev  A
IP31 Board: barcode JRM631     part 030-1255-004 rev  A
Location: /hw/module/1/slot/n4/node
IP31PIMM8MB Board: barcode JDS490     part 030-1401-003 rev  A
IP31 Board: barcode JEB966     part 030-1255-004 rev  A
Location: /hw/module/1/slot/r1/router
ROUTER_IR1 Board: barcode KCJ162     part 030-0841-003 rev  B
Location: /hw/module/1/slot/r2/router
ROUTER_IR1 Board: barcode KCJ137     part 030-0841-003 rev  B
Location: /hw/module/1/slot/io1/baseio
MIO Board: barcode JKV745     part 030-0880-003 rev  G
BASEIO Board: barcode HSE188     part 030-0734-002 rev  N
Location: /hw/module/1/slot/io3/mgras
MOT10 Board: barcode GXA772     part 030-1241-002 rev  F
8 300 MHZ IP27 Processors
CPU: MIPS R12000 Processor Chip Revision: 2.5
FPU: MIPS R12010 Floating Point Chip Revision: 2.5
CPU 0 at Module 1/Slot 1/Slice A: 300 Mhz MIPS R12000 Processor Chip (enabled)
Processor revision: 2.5. Scache: Size 8 MB Speed 200 Mhz  Tap 0xa
CPU 1 at Module 1/Slot 1/Slice B: 300 Mhz MIPS R12000 Processor Chip (enabled)
Processor revision: 2.5. Scache: Size 8 MB Speed 200 Mhz  Tap 0xa
CPU 2 at Module 1/Slot 2/Slice A: 300 Mhz MIPS R12000 Processor Chip (enabled)
Processor revision: 2.5. Scache: Size 8 MB Speed 200 Mhz  Tap 0xa
CPU 3 at Module 1/Slot 2/Slice B: 300 Mhz MIPS R12000 Processor Chip (enabled)
Processor revision: 2.5. Scache: Size 8 MB Speed 200 Mhz  Tap 0xa
CPU 4 at Module 1/Slot 3/Slice A: 300 Mhz MIPS R12000 Processor Chip (enabled)
Processor revision: 2.5. Scache: Size 8 MB Speed 200 Mhz  Tap 0xa
CPU 5 at Module 1/Slot 3/Slice B: 300 Mhz MIPS R12000 Processor Chip (enabled)
Processor revision: 2.5. Scache: Size 8 MB Speed 200 Mhz  Tap 0xa
CPU 6 at Module 1/Slot 4/Slice A: 300 Mhz MIPS R12000 Processor Chip (enabled)
Processor revision: 2.5. Scache: Size 8 MB Speed 200 Mhz  Tap 0xa
CPU 7 at Module 1/Slot 4/Slice B: 300 Mhz MIPS R12000 Processor Chip (enabled)
Processor revision: 2.5. Scache: Size 8 MB Speed 200 Mhz  Tap 0xa
Main memory size: 4096 Mbytes
Instruction cache size: 32 Kbytes
Data cache size: 32 Kbytes
Secondary unified instruction/data cache size: 8 Mbytes
Memory at Module 1/Slot 1: 1024 MB (enabled)
Bank 0 contains 512 MB (Standard) DIMMS (enabled)
Bank 1 contains 512 MB (Standard) DIMMS (enabled)
Memory at Module 1/Slot 2: 1024 MB (enabled)
Bank 0 contains 512 MB (Standard) DIMMS (enabled)
Bank 1 contains 512 MB (Standard) DIMMS (enabled)
Memory at Module 1/Slot 3: 1024 MB (enabled)
Bank 0 contains 512 MB (Standard) DIMMS (enabled)
Bank 1 contains 512 MB (Standard) DIMMS (enabled)
Memory at Module 1/Slot 4: 1024 MB (enabled)
Bank 0 contains 512 MB (Standard) DIMMS (enabled)
Bank 1 contains 512 MB (Standard) DIMMS (enabled)
ROUTER in Module 1/Slot 2: Revision 2: Active Ports [4,5,6] (enabled)
ROUTER in Module 1/Slot 4: Revision 2: Active Ports [4,5,6] (enabled)
Integral SCSI controller 0: Version QL1040B (rev. 2), single ended
Disk drive: unit 1 on SCSI controller 0 (unit 1)
CDROM: unit 6 on SCSI controller 0
Integral SCSI controller 1: Version QL1040B (rev. 2), single ended
IOC3 serial port: tty1
IOC3 serial port: tty2
IOC3 serial port: tty3
IOC3 serial port: tty4
IOC3 parallel port: plp1
Graphics board: ESI with texture option
Integral Fast Ethernet: ef0, version 1, module 1, slot io1, pci 2
Iris Audio Processor: version RAD revision 7.0, number 1
Origin BASEIO board, module 1 slot 1: Revision 4
PCI Adapter ID (vendor 4265, device 3) pci slot 6
PCI Adapter ID (vendor 4265, device 3) pci slot 2
PCI Adapter ID (vendor 4215, device 4128) pci slot 0
PCI Adapter ID (vendor 4215, device 4128) pci slot 1
PCI Adapter ID (vendor 4265, device 5) pci slot 7
IOC3 external interrupts: 1
HUB in Module 1/Slot 1: Revision 5 Speed 100.00 Mhz (enabled)
HUB in Module 1/Slot 2: Revision 5 Speed 100.00 Mhz (enabled)
HUB in Module 1/Slot 3: Revision 5 Speed 100.00 Mhz (enabled)
HUB in Module 1/Slot 4: Revision 5 Speed 100.00 Mhz (enabled)
IP27prom in Module 1/Slot n1: Revision 6.156
IP27prom in Module 1/Slot n2: Revision 6.156
IP27prom in Module 1/Slot n3: Revision 6.156
IP27prom in Module 1/Slot n4: Revision 6.156
IO6prom on Global Master Baseio in Module 1/Slot io1: Revision 6.129
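The thread title mentions 16 GB, while the hinv output above reports 4096 MB with only banks 0 and 1 listed on each node. As a small aid for the RAM testing mentioned below, here is a minimal Python sketch (not an SGI tool) that totals the per-bank memory lines of saved hinv -vm output. It assumes the line format shown above, and the function name bank_report is hypothetical. It only reports what IRIX sees; it does not test any DIMMs.

```python
import re
from collections import defaultdict

# Memory lines copied from the hinv -vm output above.
HINV = """\
Main memory size: 4096 Mbytes
Memory at Module 1/Slot 1: 1024 MB (enabled)
Bank 0 contains 512 MB (Standard) DIMMS (enabled)
Bank 1 contains 512 MB (Standard) DIMMS (enabled)
Memory at Module 1/Slot 2: 1024 MB (enabled)
Bank 0 contains 512 MB (Standard) DIMMS (enabled)
Bank 1 contains 512 MB (Standard) DIMMS (enabled)
Memory at Module 1/Slot 3: 1024 MB (enabled)
Bank 0 contains 512 MB (Standard) DIMMS (enabled)
Bank 1 contains 512 MB (Standard) DIMMS (enabled)
Memory at Module 1/Slot 4: 1024 MB (enabled)
Bank 0 contains 512 MB (Standard) DIMMS (enabled)
Bank 1 contains 512 MB (Standard) DIMMS (enabled)
"""

MAIN = re.compile(r"Main memory size: (\d+) Mbytes")
SLOT = re.compile(r"Memory at Module (\d+)/Slot (\d+): (\d+) MB")
BANK = re.compile(r"Bank (\d+) contains (\d+) MB")


def bank_report(text: str):
    """Return (reported total MB, {(module, slot): {bank: MB}})."""
    reported = None
    banks = defaultdict(dict)
    current = None  # the (module, slot) whose bank lines we are reading
    for line in text.splitlines():
        line = line.strip()
        if m := MAIN.match(line):
            reported = int(m.group(1))
        elif m := SLOT.match(line):
            current = (int(m.group(1)), int(m.group(2)))
        elif (m := BANK.match(line)) and current is not None:
            banks[current][int(m.group(1))] = int(m.group(2))
    return reported, dict(banks)


if __name__ == "__main__":
    reported, banks = bank_report(HINV)
    counted = sum(sum(b.values()) for b in banks.values())
    print(f"hinv reports {reported} MB; listed banks add up to {counted} MB")
    for (module, slot), b in sorted(banks.items()):
        print(f"Module {module}/Slot {slot}: banks {sorted(b)} -> {sum(b.values())} MB")
```

Running it against a fresh hinv -vm after each enableall/update cycle would show whether any additional banks (the PROM messages earlier in the thread mention banks up to 7) have come back online.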


recondas wrote: If not, connect a kb, mouse and monitor, <leave the serial terminal connected for now> and attempt to boot IRIX

If booting IRIX is unsuccessful what errors appeared?


Now it seems that IRIX did install properly... but here's the rub: I have to rely on the serial connection to start the X server [./gfxstop, then ./gfxstart]. Then I see the monitor light up and the startup screen appear; without the serial console, I can't get the monitor to work. When the monitor is showing, it has a terrible flicker. Can't seem to shake it off (excuse the pun). While it was flickering, I managed to log in (via the GUI) as root and go to system/display attributes to change the size/Hz, to no avail.

So, nearly there but not yet. Once I get this out of the way, I still have to test the RAM and reinstall the MENET and FC boards.

So over to recondas and the Team. :)
The console is only "good" for investigating boot messages and for the initial setup. After that, keep the network up and running so that you can use ssh/telnet. You can then export your DISPLAY to your client and use the GUI or command-line tools if needed. Maybe this helps to get your GFX working.

regards
Joerg
aristides wrote: This seemed to work well.

Good - I'm glad you found it useful.
aristides wrote:

Code:


**** System Configuration and Diagnostics Summary ****
CONFIG:
No. of NODEs enabled    = 4
No. of NODEs disabled   = 0
No. of CPUs enabled     = 8
No. of CPUs disabled    = 0
Mem enabled             = 4096 MB
Mem disabled            = 0 MB
No. of RTRs enabled     = 2
No. of RTRs disabled    = 0

DIAG RESULTS:
ALL DIAGS PASSED.
**** End System Configuration and Diagnostics Summary ****

That looks good


aristides wrote: Reinstalled the IO6G. I did, though, shut down the system before removing the original IO6. I hope this was correct.

Yes, you did the right thing. I should have offered a little more clarification. **No hardware** gets installed while the system is powered on <well, at least not from our current perspective>.

Looks like IRIX knows the graphics option board and the IO6G are installed <though I haven't referenced them in order, they do appear in your hinv>:

Code:


Location: /hw/module/1/slot/io3/mgras
MOT10 Board: barcode GXA772     part 030-1241-002 rev  F
Graphics board: ESI with texture option

<and>

Code:

Location: /hw/module/1/slot/io1/baseio
MIO Board: barcode JKV745 part 030-0880-003 rev G
BASEIO Board: barcode HSE188 part 030-0734-002 rev N


Code:

IP27prom in Module 1/Slot n1: Revision 6.156
IP27prom in Module 1/Slot n2: Revision 6.156
IP27prom in Module 1/Slot n3: Revision 6.156
IP27prom in Module 1/Slot n4: Revision 6.156
IO6prom on Global Master Baseio in Module 1/Slot io1: Revision 6.129

BTW, your node boards and IO6G still have different PROM revisions <6.156 and 6.129, respectively>. When you get a chance, read through "man flash".
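To keep track of PROM revisions across repeated hinv runs (for example before and after flashing), a minimal Python sketch like the following could pull the IP27prom and IO6prom lines out of saved hinv output and list the distinct revisions. It assumes the line format shown above; the function names prom_revisions and distinct_revisions are hypothetical. It does not decide which image to flash; "man flash" remains the reference for that.

```python
import re

# PROM lines copied from the hinv -vm output above.
HINV_PROMS = """\
IP27prom in Module 1/Slot n1: Revision 6.156
IP27prom in Module 1/Slot n2: Revision 6.156
IP27prom in Module 1/Slot n3: Revision 6.156
IP27prom in Module 1/Slot n4: Revision 6.156
IO6prom on Global Master Baseio in Module 1/Slot io1: Revision 6.129
"""

# Matches "IP27prom ... Slot n1: Revision 6.156" and the IO6prom equivalent.
PROM_LINE = re.compile(r"^(IP27prom|IO6prom)\b.*?Slot (\w+): Revision ([\d.]+)")


def prom_revisions(text: str) -> dict:
    """Map (prom kind, slot) to its revision string."""
    revs = {}
    for line in text.splitlines():
        if m := PROM_LINE.match(line.strip()):
            revs[(m.group(1), m.group(2))] = m.group(3)
    return revs


def version_key(rev: str) -> tuple:
    """Compare '6.156' and '6.129' numerically rather than as strings."""
    return tuple(int(part) for part in rev.split("."))


def distinct_revisions(revs: dict) -> list:
    """Sorted list of the different revision strings found."""
    return sorted(set(revs.values()), key=version_key)


if __name__ == "__main__":
    revs = prom_revisions(HINV_PROMS)
    for (kind, slot), rev in sorted(revs.items()):
        print(f"{kind:8} slot {slot:4} revision {rev}")
    found = distinct_revisions(revs)
    if len(found) > 1:
        print("PROM revisions differ:", ", ".join(found))
```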

aristides wrote: Now it seems that IRIX did install properly... but here's the rub: I have to rely on the serial connection to start the X server [./gfxstop, then ./gfxstart]. Then I see the monitor light up and the startup screen appear; without the serial console, I can't get the monitor to work. When the monitor is showing, it has a terrible flicker. Can't seem to shake it off (excuse the pun). While it was flickering, I managed to log in (via the GUI) as root and go to system/display attributes to change the size/Hz, to no avail.

If the graphics board and IO6G were installed when you loaded IRIX, all of the necessary software subsystems should have been included in your IRIX installation. You'll need to stop in the PROM and reconfigure it to use a graphics <rather than serial> console <I suspect that's also the reason you're having issues with the graphics>.

"man prom" has all the info you need, but there is a multi-page nekochan topic on adding <and configuring> graphics and a keyboard/mouse on an O2k that you might want to look at. BTW, once you get it set up and start booting to the "graphics console" <KB&M and monitor>, you won't have any graphics output until IRIX loads <be patient, the O2k power-on diagnostics can take a while>. On a similar note, once graphics are up and running, if you need to access the PROM menus, you'll need to shut down, disconnect the KB&M, reconnect the serial terminal and reboot; the PROM should sense the missing KB&M and redirect the console to the serial terminal.

"man prom" can be found here <take a look at the 'console' setting and compare it to the console in your current PROM environment>:
http://techpubs.sgi.com/library/tpl/cgi ... e=1%20prom
and the nekochan topic "Origin 2000 Deskside + SI-Tram Graphics. how its going" is here:
viewtopic.php?f=3&t=5749&
Some of the configuration steps used to get the O2k and IRIX to use the CADDuo KB&M ports *won't* apply to you <the IO6G should be transparent to the system>. IO6Gs became a little more common towards the end of the topic, so even though it is seven pages long, I'd recommend reading the entire topic. You might find this particular post useful:
viewtopic.php?p=60658#p60658


aristides wrote: So, nearly there but not yet. Once I get this out of the way, I still have to test the RAM and reinstall the MENET and FC boards.

Let us know how it goes.