SGI: Discussion

Nightmare on DIMM street

Quote:
DIMM error rates are hundreds to thousands of times higher than thought — a mean of 3,751 correctable errors per DIMM per year


A ZDnet article about DRAMs and their reliability.

http://blogs.zdnet.com/storage/?p=638

_________________
:Octane2: 2xR12000 400MHz, 4GB RAM, V12 GFX
SGI - the legend will never die!!
this just says that the design of commodity DIMMs results in an error rate larger than the sum of the error rates of the constituent DRAMs. the DRAMs are the parts that have a 'known' single-event upset error rate. the assumption when calculating reliability is that the DIMM itself will only contribute mechanical faults.

the most likely suspect is power plane noise, followed by signal crosstalk. each is an inexact science to calculate, measure, and address.

my memory designs never had an error rate anywhere near this.
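The reliability assumption described above (DIMM error rate = sum of the DRAM error rates, with the module adding only mechanical faults) can be sketched roughly like this. The chip count and per-chip rate are made-up numbers for illustration only; the observed figure is the mean quoted from the article, and the function names are hypothetical.

```python
def expected_dimm_error_rate(chip_rates):
    """Errors/year predicted if the DIMM adds nothing beyond its DRAM chips."""
    return sum(chip_rates)


def module_level_excess(observed_rate, chip_rates):
    """Errors/year that the per-chip rates do not explain.

    Under the assumption above, anything left over would have to come from
    the module design itself (e.g. power plane noise or crosstalk).
    """
    return observed_rate - expected_dimm_error_rate(chip_rates)


# ASSUMPTION: 18 DRAM chips at 5 correctable errors/year each (illustrative only).
chip_rates = [5.0] * 18

# Mean correctable errors per DIMM per year, as quoted from the article.
observed = 3751.0

expected = expected_dimm_error_rate(chip_rates)
excess = module_level_excess(observed, chip_rates)
print(f"expected from chips: {expected:.0f}/year, unexplained: {excess:.0f}/year")
```

With real per-chip numbers plugged in, a large unexplained remainder is what would point at the module design rather than the DRAMs.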

_________________
I love my iPad!!!
Quote:
The latest, most dense generations of DRAM perform as well, error wise, as previous generations.
does this mean that errors per # of bits are staying the same, or that errors per system are staying the same? Memory is growing; my current PC has 16GB of memory and it's 4... almost 5 years old.

There are new PCs that have room for 192 GB of memory (although IIRC that requires 16GB DIMMs, which are not yet common). That's bigger than my system disk! In several years there will be PC systems with 1 TB of memory... perhaps there will be a need to increase error correction overhead in the future?
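To make the "per bit vs. per system" distinction concrete, here is a small sketch. It assumes the per-bit error rate stays constant across memory sizes, and the rate itself is a hypothetical number picked for illustration, not a figure from the article.

```python
# ASSUMPTION: hypothetical rate, for illustration only (not from the article).
ERRORS_PER_GBIT_PER_YEAR = 10.0


def system_errors_per_year(capacity_gb, errors_per_gbit_per_year=ERRORS_PER_GBIT_PER_YEAR):
    """Expected correctable errors/year if the per-bit rate is held constant."""
    gbits = capacity_gb * 8  # 8 bits per byte
    return gbits * errors_per_gbit_per_year


# The memory sizes mentioned in this post: 16 GB, 192 GB, and 1 TB.
for capacity_gb in (16, 192, 1024):
    rate = system_errors_per_year(capacity_gb)
    print(f"{capacity_gb:>5} GB -> {rate:,.0f} errors/year")
```

If errors per bit stay the same, errors per system grow in direct proportion to installed memory, so a 192 GB machine would see 12 times the errors of a 16 GB one. Errors per system only stay flat if the per-bit rate falls as fast as capacity grows.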

_________________
:Onyx: (Aldebaran) :Octane: (Chaos) :O2: (Machop)
:hp xw9300: (Aggrocrag) :hp dv8000: (Attack)
sybrfreq wrote:
Quote:
The latest, most dense generations of DRAM perform as well, error wise, as previous generations.
does this mean that errors per # of bits are staying the same, or that errors per system are staying the same?


the most useful metric is the number of errors per bit, or Single Bit Error Rate (SBER), for correctable errors (not all DRAM errors are correctable, unfortunately). my observation has been that the SBER has gone down as bits per package have gone up. each new generation of DRAM size sets off a 'fear of impending doom' that the system error rates will go up. but this doesn't happen.

however, this is an observation for a system that is very well designed, not a commodity part. that is one difference with custom hardware that you don't get in off-the-shelf junk, which is what Google bases their observation on.

_________________
I love my iPad!!!
skywriter wrote:
my observation has been that the SBER has gone down as bits per package have gone up. each new generation of DRAM size sets off a 'fear of impending doom' that the system error rates will go up. but this doesn't happen.

however, this is an observation for a system that is very well designed, not a commodity part. that is one difference with custom hardware that you don't get in off-the-shelf junk, which is what Google bases their observation on.


Thanks, that answered my question exactly!

_________________
:Onyx: (Aldebaran) :Octane: (Chaos) :O2: (Machop)
:hp xw9300: (Aggrocrag) :hp dv8000: (Attack)