A couple of days ago, my O350 switched off overnight. It turned out that one of the PSUs had failed.
I took the PSU out of the system (hot-swap PSUs rule), opened it up, and with the help of a colleague determined that a PFC diode had failed and taken a MOSFET switcher and the 8A primary fuse with it.
We ordered replacement parts:
- The primary PFC diode: INFINEON - IDH12S60C - DIODE, SCHOTTKY, 600V, 12A, TO220-2.
- The MOSFET switcher: INFINEON - SPW47N60C3 - MOSFET, N, COOLMOS, TO-247.
- And an 8A fast fuse.
Parts arrived, surgery was performed, and I'm happy to report that the patient lives again.
However, the story doesn't end there. I had of course looked for a replacement unit as well. The PSU is a Delta DPS-500EB E. It didn't take long to figure out that this PSU has been used by a wide range of manufacturers, including Intel, SUN, Acer, Fujitsu-Siemens and several others. The Intel part number is A76009-00x, and this is where it gets interesting. There's a Technical Advisory, TA-0674-5 (dated February 20, 2004), out for this PSU:
TA-0674-5 wrote:
Description
Intel® Server Chassis SR2300 500-watt redundant power (RP) supply modules with Intel part number A76009-006 and prior revisions have the potential to fail during sustained, powered-on operation due to a failure of the primary PFC diode (D802) in the power supply module. If a diode failure occurs, systems operating in a non-redundant power supply configuration (only one power supply module installed in the power supply cage) will experience an immediate system power down. [...]
Root Cause
Inherent imperfections in the silicon carbide base material (substrate) used to fabricate the diode cause abnormal electric fields within the diode package during normal operating conditions. These fields result in high temperatures in the imperfection areas which cause degradation and eventual failure of the diode. The structural design of the current supplier’s diode does not have designed-in protection from these abnormal electric fields.
Corrective Action / Resolution
Intel has identified an alternate supplier source for the primary PFC diode in the power supply module. The alternate diode design is substantially less susceptible to substrate imperfections, because it has designed-in protection against substrate imperfections, and is therefore more robust than the current diode design. Intel has determined that power supply modules built with the alternate diode meet Intel’s DPM rate requirement for server system power supplies. The alternate diode is an equivalent drop-in replacement. An Engineering Change Order (ECO) has been completed to incorporate the alternate diode. This change is described in Product Change Notification (PCN) number 103919-00.
Power supply modules built with the alternate diode will be marked with Intel part number A76009-007 (or later revisions). Power supply modules with Intel part number A76009-006 and prior revisions may be reworked with the alternate diode by Intel’s factories to part number A76009-007. Reworked power supplies will be marked with a green sticker and relabeled with part number A76009-007 (or later revisions). Power supply modules with the alternate diode will begin shipment from the power supply supplier on February 19, 2004. All affected product codes built after February 19, 2004 will contain power supply modules built with the alternate diode.
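The part-number rule in the advisory boils down to a single revision comparison. Here is a small sketch of it, assuming the Intel label reads exactly `A76009-NNN`. The function name `ta_0674_5_status` is made up for illustration. Units sold under another vendor's label (SGI, Fujitsu-Siemens, ...) don't carry this part number, so for those the only real check is to open the unit and look at D802.

```python
import re

# Per TA-0674-5: A76009-006 and earlier use the original PFC diode (D802);
# A76009-007 and later (including reworked, green-stickered units) use the alternate diode.
LAST_AFFECTED_REVISION = 6

def ta_0674_5_status(part_number: str) -> str:
    """Classify an Intel DPS-500EB part number against TA-0674-5 (hypothetical helper)."""
    match = re.fullmatch(r"A76009-(\d{3})", part_number.strip().upper())
    if match is None:
        # Not an Intel A76009 label: the advisory can't tell us anything,
        # so D802 has to be inspected by hand.
        return "unknown"
    revision = int(match.group(1))
    if revision <= LAST_AFFECTED_REVISION:
        return "affected"
    return "alternate diode"

for label in ("A76009-006", "A76009-007", "DPS-500EB E"):
    print(label, "->", ta_0674_5_status(label))
```

Matching on the label alone works because the advisory says reworked modules are relabeled to -007 or later. An SGI-badged unit in an O350 will still come back as "unknown".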
SUN issued a bulletin regarding the Sun Fire V65x servers, which use this PSU, but I'm not aware of any action by SGI (oh, how things have changed...). I can only assume that Origin 350 systems produced before Q2 2004 (and maybe later) have the same issue. Mine certainly did: the PFC diode that blew was D802, the one described in Intel TA-0674-5. And it had apparently been running hot for a while, because the heat had all but erased the part number printed on it.
I also won an equivalent (?) Fujitsu-Siemens DPS-500EB PSU for a whopping 1€ (+shipping) on eBay. I'm curious to find out (1) whether this PSU will work at all or whether the O350 will reject it (the system can read part and serial numbers, so who knows), and (2) whether this PSU contains the same Infineon part or the alternate diode that TA-0674-5 mentions.
Most of the hobbyist crew probably don't run their O350s 24/7, but even if you don't, *I* wouldn't be happy knowing there's a component in there that degrades and blows like that. Unfortunately, we ordered the parts before I dug up this Technical Advisory, so I have probably installed the same degradation-prone part, unless Infineon has improved it since.
I will almost certainly replace D802 in the other module of my O350 as well. Until I have this sorted out I will minimize the power-on hours of the O350. If you own an O350, you may want to do the same.
Now this is a deep dark secret, so everybody keep it quiet
It turns out that when reset, the WD33C93 defaults to a SCSI ID of 0, and it was simpler to leave it that way... -- Dave Olson, in comp.sys.sgi
Currently in commercial service: (2x)
In the museum : almost every MIPS/IRIX system.
Wanted : GM1 board for Professional Series GT graphics (030-0076-003, 030-0076-004)