SGI: Hardware

What did I do?!

I brought home a MIPS-based Cobalt Qube 2 I picked up at the local university surplus store today, and the first thing I did was try to connect it to my (already running) Indigo2 through its second serial port to monitor the Qube's console during boot. When I first tried to connect to the serial port on the Qube, there was a little spark. Recalling a similar recent event at work with an ungrounded embedded system's serial port, I figured that the Qube just wasn't grounded through its power supply and had floated to some unknown voltage and would be fine once the serial was connected.

Bad guess. :( To avoid the spark, I first unplugged the Qube's 12V input, plugged in the serial cable, then tried to plug in the Qube again. Right away, I heard a loud CRACK, the Indigo2 buzzed through its speakers and shut off abruptly, the Qube's 12V plug was charred, and neither the Qube nor the I2 would turn on anymore. Oh, crap.

I unplugged both and had dinner with the family. When I came back a half hour later, I plugged back in the I2 and it fired right up. I'm running full diagnostics now, and it seems fine, other than the second serial UART failing loopback. So that's damaged. :evil:

The Qube is dead, so I'm about to open it up and start checking it out. I'm an embedded systems EE, so I at least have a chance at repairing it, but this still sucks.

What the heck did I actually do? RS-232 is supposed to be shorts-tolerant (it's in the spec), so even if the pins were wrong it shouldn't bring down the two systems. I could see the Qube possibly not being fully compliant, but not the I2. Does the Qube have some funky serial port pinout that's totally nonstandard? I've used both of the I2's serial ports before without problems.

Do Indigo2 power supplies have resettable thermal and/or overcurrent protection? It seems like mine does. I couldn't find it in any documentation but I thought someone here might know for sure.

Also, does anyone know if the serial hardware on an IP28 is repairable, or am I down to only one serial port from now on with this CPU board? Here's where I hope to find a discrete serial UART chip, instead of it being embedded in an SGI ASIC.

_________________
:Indigo2IMP: :1600SW: R10K Indigo2 MaxIMPACT, 4 TRAMS, 768MB RAM, 2x9GB HD, CD-ROM, Phobos G160
Black Cardinal
By 12v plug you mean the entire DC power to the Qube?

Short-tolerant is different to being 240v tolerant....
I don't have an Indigo2 anymore but IIRC there are ports next to the serial ports, same size though, that are not RS232. Wouldn't want to get those mixed up... I would bet that the serial is glued onto an ASIC somewhere, but there might be a replaceable fuse somewhere that could need replacing. If nothing else, a new mainboard is not the end of the world (though replacing one on an I2 is quite a chore).

Not familiar at all with the Qube, but maybe you made a similar mistake and plugged into a port that is not RS232. You also have no idea what is going on in the Qube... maybe you just put 120V AC through your Indigo2? Always better to try on a PC with questionable equipment first, just for oopsies like this.

Probably something in the Qube was toast before this ordeal started, maybe even booby-trapped?

_________________
:Onyx: (Aldebaran) :Octane: (Chaos) :O2: (Machop)
:hp xw9300: (Aggrocrag) :hp dv8000: (Attack)
Another possible scenario is that something in the Qube was shorted to the Qube's case and the Qube's case was not properly grounded. When you plugged the serial cable from the I2 to the Qube, the I2's case (thru the serial port border/shield) provided a ground path for the current, but whatever was shorted to the Qube's case fried.

I've seen this in a few PCs from time to time - someone will plug in a peripheral and the peripheral and/or PC will release magic smoke. People instinctively blame the peripheral or bus controller but it's often times the case grounding combined with a short.

I've also seen somewhat similar behavior out of ground differentials (this is why you *always* tie two Origin racks together with something metal before trying to power them up) - if both systems are grounded separately there can be a potential difference of up to a few V between the grounds, and the differential can manifest in a lot of ways (current flowing through cable shields when attaching the systems together, current ending up on random data pins and frying things, etc.). Based on how quickly and dramatically you describe the issue as happening I doubt this is it, though, especially if you had them plugged into the same circuits (or even different circuits in a house). It's rare to see too much ground potential difference in common household wiring as the grounds aren't usually isolated in any way.
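To put rough numbers on the ground-differential case, here is a minimal sketch using nothing more than Ohm's law. The 2 V difference and 0.1 ohm loop resistance are made-up illustrative values, not measurements from anyone's setup, and `shield_current` is just a name picked for this example.

```python
def shield_current(ground_diff_volts, loop_resistance_ohms):
    """Ohm's law: current pushed through a cable shield that joins two grounds."""
    if loop_resistance_ohms <= 0:
        raise ValueError("loop resistance must be positive")
    return ground_diff_volts / loop_resistance_ohms


# Hypothetical values: a couple of volts between grounds, and a low-resistance
# path through the shield and both chassis.
amps = shield_current(2.0, 0.1)
print(f"{amps:.1f} A through the shield")
```

Even a small potential difference can drive a surprisingly large current when the only thing limiting it is the resistance of a shield and two connectors.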

_________________
:0300: <> :0300: :Indy: :1600SW: :1600SW:
Quote:
By 12v plug you mean the entire DC power to the Qube?

Yes, the 12V power supply input on the Qube. It has a custom 3-pin DIN-style connector, and came with the original supply.

By shorts-tolerant, I meant that the RS-232 spec requires the ports themselves to be shorts-tolerant, so that if you accidentally connect two pins together that are not supposed to be, it won't work but it won't be damaged, either. I know this is true for the signal lines (TX, RX, CTS, etc.), but I'm not sure about the power lines.

But even so, I find it hard to believe that Cobalt would have a special pinout for the DB-9 serial connector. And it is labeled "Serial," and from what I've read the only console it has is through its serial port.

Quote:
Always better to try on a PC with questionable equipment first just for oopsies like this.

The only PC around here is my wife's laptop, and I don't think she'd appreciate me using it to test my questionable hardware purchases! :lol: Seriously though, you have a good point.

I did test the Qube before I bought it, and it booted. It has a status LCD and I could see it doing a disk check. Here's a pic of the back of the Qube:

_________________
:Indigo2IMP: :1600SW: R10K Indigo2 MaxIMPACT, 4 TRAMS, 768MB RAM, 2x9GB HD, CD-ROM, Phobos G160
Black Cardinal
bri3d wrote:
Another possible scenario is that something in the Qube was shorted to the Qube's case and the Qube's case was not properly grounded. When you plugged the serial cable from the I2 to the Qube, the I2's case (thru the serial port border/shield) provided a ground path for the current, but whatever was shorted to the Qube's case fried.

Seems like the most likely scenario ...

Quote:
if both systems are grounded separately there can be a potential difference of up to a few V between the grounds, and the differential can manifest in a lot of ways (current flowing through cable shields when attaching the systems together ....

This is why experienced people cut the shield on one end of any RS-232 cable. The shield is supposed to be a shield, not a conductor. If you accidentally have different potentials on the pieces being connected, the shield carries current. Bad idear.
I did some troubleshooting on the Qube and discovered that a fuse had blown in the 12V brick power supply. It's a small 1.5A fuse soldered onto the PCB. I didn't have the correct (physical) size fuse, so I soldered in an inline AGC fuse holder for now. The Qube still wouldn't turn on, so I tried using an HP bench supply instead, but that didn't make a difference. Then I disassembled the Qube completely and put it back together one subsystem at a time, testing it under power, until I found the culprit: the hard drive. It's drawing considerably over 3A by itself and causing the power supply voltage to drop to about 4V. I don't know if the drive was already on its way out or if it died as a result of this episode. However, without the drive the Qube seems to power up OK with the brick supply again.
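As a back-of-the-envelope check on those numbers, here is a small sketch. It only uses the figures quoted above (1.5A fuse, a drive pulling at least 3A, a 12V rail sagging to about 4V); since the drive was drawing "considerably over 3A", the 3A figure is a lower bound, and the function names are just illustrative.

```python
# Figures quoted in the post. DRIVE_CURRENT_A is a floor, not an exact reading.
FUSE_RATING_A = 1.5
DRIVE_CURRENT_A = 3.0
NOMINAL_RAIL_V = 12.0
SAGGED_RAIL_V = 4.0


def overload_ratio(load_amps, fuse_amps):
    """How many times the fuse rating the load is asking for."""
    return load_amps / fuse_amps


def rail_sag_fraction(nominal_volts, measured_volts):
    """Fraction of the nominal rail voltage lost under load."""
    return (nominal_volts - measured_volts) / nominal_volts


ratio = overload_ratio(DRIVE_CURRENT_A, FUSE_RATING_A)
sag = rail_sag_fraction(NOMINAL_RAIL_V, SAGGED_RAIL_V)
print(f"drive alone asks for at least {ratio:.1f}x the fuse rating")
print(f"rail loses about {sag:.0%} of its nominal voltage")
```

So the drive by itself wants at least twice what the brick's fuse is rated for, which fits with the fuse blowing and the rail collapsing.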

During the teardown I didn't find anything suspicious that might have caused a short. I tested the power supply to get the pinout, and verified that the Qube is isolated from the AC through its power supply, like a laptop. Its chassis is connected to the DC common, and SIGNAL GROUND on pin 5 of the serial port is tied directly to it as well. Even if the hard drive was shorting 12V to DC common, since the whole thing is floating relative to the AC it shouldn't have mattered when attached to the I2.

The I2 is not so good. It finished the diagnostic, and although it only failed the loopback test for ttyd2, when I pressed ENTER to exit the diagnostic the screen went black and it hung. I shut it off, and now when I power it up it just reports " Warning: persistent break condition on serial port 0. " and then hangs. It doesn't even make it to the PROM maintenance menu. The next step is to crack open the case and take a look at what components are attached to the serial ports, I guess. :(
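For anyone unfamiliar with the term: on an asynchronous serial line, a "break" is generally the receive line sitting at the space level for longer than one whole character frame, so a persistent break would mean it never returns to idle (mark). The sketch below illustrates that general definition only. It is not the PROM's actual check, and `frame_bits` and `is_break` are hypothetical helpers written for this example.

```python
def frame_bits(data_bits=8, parity=False, stop_bits=1):
    """Bits in one async character frame: start + data + optional parity + stop."""
    return 1 + data_bits + (1 if parity else 0) + stop_bits


def is_break(samples, bits_per_frame, samples_per_bit=1):
    """True if the line stays at space (0) for longer than one full frame.

    `samples` is a sequence of logic levels, 1 = mark (idle), 0 = space.
    """
    threshold = bits_per_frame * samples_per_bit
    run = 0
    for level in samples:
        run = run + 1 if level == 0 else 0
        if run > threshold:
            return True
    return False


bits = frame_bits()                     # 8N1 framing
stuck_line = [0] * 50                   # line never returns to mark
null_char = [0] * 9 + [1]               # start bit + eight zero data bits + stop bit
print(is_break(stuck_line, bits))
print(is_break(null_char, bits))
```

The all-zero character is the longest legitimate run of space, which is why the threshold has to be a full frame rather than just the data bits.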

_________________
:Indigo2IMP: :1600SW: R10K Indigo2 MaxIMPACT, 4 TRAMS, 768MB RAM, 2x9GB HD, CD-ROM, Phobos G160
Black Cardinal
Yikes! poor I2. I'm somewhat amazed that a bad hard drive could have caused this. I'll be following the thread with interest.

_________________
:Onyx2R: :IRIS3130: :O2000: :PI: :PI: :Indigo: :Octane: :O2: :O2: :Indigo2: :Indy: :Indy: :Indy: :1600SW: :pdp8e:
:BA213: <- MicroVAX 3500 :BA213: <- DECsystem 5500 :BA215: <- MicroVAX 3300
I took the Qube's hard drive (20GB parallel ATA) in to work today and tested it in a PC. It doesn't spin up at all, so it's either got a seized bearing and is drawing locked-rotor current or it has a short somewhere on its electronics. I'll try swapping in another disk this evening.

I'm also thinking about a clean way to ground the Qube's chassis instead of letting it float.

I'm anxious to get home to start troubleshooting the I2. I'm one of those people who can't stand being idle on a problem when there is clearly something I should be doing about it. I guess it's the engineering mentality. :D

_________________
:Indigo2IMP: :1600SW: R10K Indigo2 MaxIMPACT, 4 TRAMS, 768MB RAM, 2x9GB HD, CD-ROM, Phobos G160
Black Cardinal
Woohoo, I may be able to repair the Indigo2 after all! :P I was very happy to find a MAX249 serial interface chip on board:
Attachment:
Indigo2 MAX249.jpg

This is a multi-channel serial port driver/receiver chip, one in a very popular line (the most famous version being the MAX232). It doesn't contain the UART itself, but converts from logic-level signals on the CPU/UART side to the +/-12V used in serial connections. The MAX249 version is a more sophisticated variant with enough channels to handle all of the data and control signals for at least two serial ports, and extra pins for readback of the values.
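To illustrate what that level conversion means, here is a minimal sketch of the RS-232 convention for data lines: a logic 1 (mark) is sent as a negative voltage and a logic 0 (space) as a positive one, with anything between -3V and +3V undefined at the receiver. The driver model is idealized; the 12V default swing just follows the figure mentioned above, and the actual swing of a real part depends on the chip and its supply.

```python
def driver_output(logic_bit, swing_volts=12.0):
    """Idealized inverting line driver: logic 1 -> -swing, logic 0 -> +swing."""
    return -swing_volts if logic_bit else swing_volts


def rs232_state(volts):
    """Classify a line voltage the way an RS-232 receiver is specified to."""
    if volts <= -3.0:
        return "mark"       # logic 1, also the idle state
    if volts >= 3.0:
        return "space"      # logic 0
    return "undefined"      # transition region between -3V and +3V


for bit in (1, 0):
    v = driver_output(bit)
    print(f"logic {bit} -> {v:+.0f} V -> {rs232_state(v)}")
```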

Here's the datasheet: http://datasheets.maxim-ic.com/en/ds/MAX220-MAX249.pdf

With a DMM I probed the chip and the DIN serial ports and found the following connections (revised after I found a few errors in the MAX249 pin names):
Code:
SERIAL PORT 1

DIN-8           MAX249       COMMENT
-----------     ---------    --------------------
pin 1 (DTR)     44 TB1out
pin 2 (CTS)     40 RB4in
pin 3 (TXD)     43 TB2out
pin 4 (GND)     19 GND
pin 5 (RXD)     38 RB2in
pin 6 (RTS)     42 TB3out
pin 7 (DCD)     41 RB5in
pin 8 (GND)     N.C.


SERIAL PORT 2

DIN-8           MAX249       COMMENT
-----------     ---------    --------------------
pin 1 (DTR)     1 TA1out     GND short
pin 2 (CTS)     N.C.
pin 3 (TXD)     2 TA2out     low impedance to GND
pin 4 (GND)     N.C.
pin 5 (RXD)     7 RA2in      low impedance to GND
pin 6 (RTS)     3 TA3out     GND short
pin 7 (DCD)     4 RA5in      low impedance to GND
pin 8 (GND)     11 RA2out    low impedance to GND

Found additional ground connections on MAX249 pins 1,3,5,6,8,9,10,13,14,15,16,18,20,22,40,41
All ground connections are also attached to chassis.

The second serial port has also very clearly been damaged. I'm not sure why I couldn't find a few of the pins on the MAX249. It's extremely unlikely they go anywhere else. Does the Indigo2 have ports that can switch between RS232 and RS422 like the Octane? If so, then there's probably another buffer chip to switch some of the connections around.
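For anyone who wants to keep track of the probing results, here is a small sketch that encodes the serial port 2 table from above as data and lists which signal pins look damaged and which could not be traced. The data is copied from the table; the structure and the helper names (`suspect_signals`, `untraced_pins`) are just one hypothetical way to organize it.

```python
# (DIN-8 pin, signal, MAX249 pin or None if not found, DMM comment)
PORT2 = [
    (1, "DTR", "1 TA1out", "GND short"),
    (2, "CTS", None, ""),
    (3, "TXD", "2 TA2out", "low impedance to GND"),
    (4, "GND", None, ""),
    (5, "RXD", "7 RA2in", "low impedance to GND"),
    (6, "RTS", "3 TA3out", "GND short"),
    (7, "DCD", "4 RA5in", "low impedance to GND"),
    (8, "GND", "11 RA2out", "low impedance to GND"),
]


def suspect_signals(rows):
    """Signal (non-ground) pins that measured shorted or low impedance to GND."""
    return [signal for _, signal, _, comment in rows
            if signal != "GND" and comment]


def untraced_pins(rows):
    """DIN-8 pins with no connection found on the MAX249."""
    return [pin for pin, _, max249_pin, _ in rows if max249_pin is None]


print("suspect signals:", suspect_signals(PORT2))
print("untraced DIN-8 pins:", untraced_pins(PORT2))
```

Ground pins are left out of the suspect list, since a low reading to GND is what you would expect there.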

It's not a cheap chip at $20 from Mouser, but it's in stock and it's a PLCC so it's within my skills to remove and solder a new one in place. It's possible there was further damage beyond this one chip, but the fact that the only diagnostic that failed was the serial loopback test makes me hopeful that it was just unable to read back the values from the damaged MAX249.

If this doesn't solve the problem, as sybrfreq said I can always replace the IP28 board later.

_________________
:Indigo2IMP: :1600SW: R10K Indigo2 MaxIMPACT, 4 TRAMS, 768MB RAM, 2x9GB HD, CD-ROM, Phobos G160
Black Cardinal
Nice sleuthing. Good luck!
Black Cardinal wrote:
Does the Indigo2 have ports that can switch between RS232 and RS422 like the Octane? If so, then there's probably another buffer chip to switch some of the connections around.


Yep, I2/Indy both have the RS-232/RS-422 device-selectable interfaces

_________________
Damn the torpedoes, full speed ahead!

:Indigo: :Octane: :Indigo2: :Indigo2IMP: :Indy: :PI: :O200: :ChallengeL:
SAQ wrote:
Yep, I2/Indy both have the RS-232/RS-422 device-selectable interfaces

That's what I was afraid of, thanks for the confirmation. man serial wasn't entirely clear on this.

I poked around some more with an LED flashlight and found two chips with visible damage:
Attachment:
Indigo2_RS422_damage.jpg

The chip at the upper right is a SN75175, which is an RS-422/RS-485 driver/receiver. The one at the bottom right seems to be a DS4691M (the writing is partially obscured by bubbled plastic), and it's one of those obsolete chips that turn up a million search results in Chinese, none of which have a datasheet. From its part number I suspect it's some sort of tri-state buffer chip used to switch between the MAX249 and the SN75175 for a few of the data lines. With a DMM I verified that both of these chips have several pins shorted to ground, and in the case of the DS4691M it even has one pin blown off! (The gap is not quite visible in the picture.)

The MAX249 may actually be fine, although in my experience PLCCs can be blown without visible damage, maybe because the package is so thick that the evidence stays buried under millimeters of plastic. The first thing I'll try is removing the two obviously damaged chips, and re-check the MAX249 for shorts to ground. If it's bad, too, then I'll replace it. I'm hoping that I'll be able to run the system with these two parts removed for now, and maybe replace them, or the entire IP28 board, later. If I'm really lucky I'll be able to use the second serial port as RS-232 even though the RS-422 hardware is gone.

_________________
:Indigo2IMP: :1600SW: R10K Indigo2 MaxIMPACT, 4 TRAMS, 768MB RAM, 2x9GB HD, CD-ROM, Phobos G160
Black Cardinal
Black Cardinal wrote:
I'm hoping that I'll be able to run the system with these two parts removed for now, and maybe replace them, or the entire IP28 board, later. If I'm really lucky I'll be able to use the second serial port as RS-232 even though the RS-422 hardware is gone.



Not sure if it looks at the buffer chips, but my guess is that you'll be able to fool the system into running in RS-232-only mode easily enough. You might need to wire jumpers in place of the tristate, but it should work.

Instead of looking for a full IP28 you can just look around for mainboards from systems with similar serial port hardware. IP22/IP24 are obvious candidates, and IP20 probably has it as well. Chances are someone here will have a parts board that has something wrong with it that's probably not the serial drivers.

_________________
Damn the torpedoes, full speed ahead!

:Indigo: :Octane: :Indigo2: :Indigo2IMP: :Indy: :PI: :O200: :ChallengeL:
I've removed the two visibly damaged chips and when I still measured low impedance between 5V and GND, I went ahead and removed the MAX249 as well. I also found a blown inductor on the I/O backplane board, which explains the missing connection for serial port 2's GND on pin 5:
Attachment:
DSC_6606.jpg

However, the system still won't boot beyond the same error message as before: " Warning: persistent break condition on serial port 0. ". Any further troubleshooting will be very difficult, and it's not really worth it.

I guess I am now in the market for an IP28 board. I'll be posting in the Hardware Wanted section next. :( On the bright side, I have now been through a complete teardown and rebuild of an Indigo2. The hardest part is keeping your curious kids away from the myriad screws.

_________________
:Indigo2IMP: :1600SW: R10K Indigo2 MaxIMPACT, 4 TRAMS, 768MB RAM, 2x9GB HD, CD-ROM, Phobos G160
Black Cardinal
Might be upstream, then. Does IP28 use the IOC2 (the integrated kbd/mouse/parallel/serial I/O chip used on Indy) or separate UARTs? The SGI docs indicated that they were considering moving to IOC later on in the Indigo2 lifecycle, but I can't recall if they went through with it.

_________________
Damn the torpedoes, full speed ahead!

:Indigo: :Octane: :Indigo2: :Indigo2IMP: :Indy: :PI: :O200: :ChallengeL:
It looks like they didn't. According to a document on majix.org ( http://download.majix.org/sgi/ioc.pdf ), it should be a 176-pin part from VLSI. Fortunately, when I had the IP28 out I took some pictures of the board for later reference. There are no 176-pin parts on the board.

That document also includes a convenient list of the components used in the Fullhouse (Indigo2) design on page 22. I found the serial UART chip near the audio board, a Zilog Z8523010VSX. I'll do a little more probing to see if it might be damaged, too. This is an obsolete part, but as common as it was I can probably get hold of one somewhere.

_________________
:Indigo2IMP: :1600SW: R10K Indigo2 MaxIMPACT, 4 TRAMS, 768MB RAM, 2x9GB HD, CD-ROM, Phobos G160
Black Cardinal
Black Cardinal wrote:
Attachment:
Indigo2_RS422_damage.jpg

The chip at the upper right is a SN75175, which is an RS-422/RS-485 driver/receiver. The one at the bottom right seems to be a DS4691M (the writing is partially obscured by bubbled plastic), and it's one of those obsolete chips that turn up a million search results in Chinese, none of which have a datasheet. From its part number I suspect it's some sort of tri-state buffer chip used to switch between the MAX249 and the SN75175 for a few of the data lines. With a DMM I verified that both of these chips have several pins shorted to ground, and in the case of the DS4691M it even has one pin blown off! (The gap is not quite visible in the picture.)


It's a DS3691 RS-422/423 line driver with tri-state outputs (on IP22 anyway). Datasheets are available; here's one: http://www.datasheetcatalog.com/datashe ... 3691.shtml

The riser is all passives except for the system serial number/MAC chip (uncanned NIC - does this make it perishable?)

_________________
Damn the torpedoes, full speed ahead!

:Indigo: :Octane: :Indigo2: :Indigo2IMP: :Indy: :PI: :O200: :ChallengeL: