public inbox for linux-ia64@vger.kernel.org
From: Russ Anderson <rja@efs.americas.sgi.com>
To: linux-ia64@vger.kernel.org
Subject: Re: new utility for decoding salinfo records
Date: Tue, 11 Jan 2005 21:58:56 +0000
Message-ID: <200501112158.j0BLwvAN087091@efs.americas.sgi.com>
In-Reply-To: <1105458388.22104.7.camel@quince.llnl.gov>

Ben Woodard wrote:
> 
> So does anyone with "normal world" experience have any suggestions on
> how I should take into account the various perspectives? 
> 
> Do other people consider the isolated SBE a problem? 
> 
> Do other people consider 1SBE/hr on a DIMM a real problem that needs to
> be fixed?

Why would anyone consider a recovered error a problem?  ECC corrected
the data so life is good.

The real question is whether the corrected error is an indication that
something bad - a crash due to an uncorrected error - is going to happen.
That is the bad thing we want to avoid.

The answer to the question of whether single bits turn into double bits
is - it depends.  There are a number of underlying causes for SBEs and
different ways in which the SBE could degrade into a MBE.  The DRAM
technology plays a big part.  From experience, some DIMMs have SBEs that
never turn into MBEs.  Other DIMMs get MBEs without preceding SBEs.

You really have to analyze the specific DIMMs, and look at the failure
characteristics of the technology, to get any specific data on which
to base a logical conclusion.  And even then slight changes in the
manufacturing process can skew those numbers.

What Linux really needs is better SBE logging infrastructure, to
keep track of specific DIMMs and the SBEs within the DIMMs, to
collect real data from which to draw meaningful conclusions.
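
As a rough illustration of the kind of bookkeeping meant here, the
sketch below counts corrected errors per DIMM and per address.  It is
not an existing kernel or salinfo interface: the class name, the DIMM
identifiers, the one-hour window and the assumption that events arrive
in time order are all hypothetical.

```python
from collections import defaultdict, deque


class DimmSbeLog:
    """Hypothetical per-DIMM log of corrected single-bit errors (SBEs)."""

    def __init__(self, window_seconds=3600.0):
        # Length of the sliding window used for the per-DIMM rate (assumed: 1 hour).
        self.window_seconds = window_seconds
        # dimm_id -> timestamps of SBEs still inside the window
        self.recent = defaultdict(deque)
        # (dimm_id, address) -> total SBEs ever seen at that address
        self.by_address = defaultdict(int)

    def record(self, dimm_id, address, timestamp):
        """Log one corrected error; timestamps are assumed non-decreasing."""
        events = self.recent[dimm_id]
        events.append(timestamp)
        # Discard events that have aged out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        self.by_address[(dimm_id, address)] += 1

    def rate_per_window(self, dimm_id):
        """Number of SBEs on this DIMM within the window ending at its last event."""
        return len(self.recent.get(dimm_id, ()))

    def repeat_addresses(self, dimm_id):
        """Addresses on this DIMM that have reported more than one SBE."""
        return sorted(addr for (dimm, addr), count in self.by_address.items()
                      if dimm == dimm_id and count > 1)
```

Keeping both a per-DIMM rate and a per-address count is one way to
tell an isolated SBE apart from a cell that keeps failing, which is
the data needed before deciding whether a DIMM is degrading.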

The one solid answer I can give you is that the overall failure 
rate that causes system crashes remains constant over time.
That's because if a specific memory technology makes the memory
subsystem more reliable, people will just buy more memory until
they reach the same noticeable error rate.  ECC memory did not
eliminate memory errors; it allowed much larger memories with
the same overall memory failure rate.
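
The arithmetic behind that can be sketched as follows.  The numbers
are made up purely for illustration (they are not measured failure
rates), and the simple "rate per GB times capacity" model is itself
an assumption.

```python
def system_failure_rate(failures_per_gb_year, capacity_gb):
    """Expected crash-causing memory failures per year for one system.

    Assumes failures scale linearly with installed capacity.
    """
    return failures_per_gb_year * capacity_gb


# Hypothetical older technology: less reliable per GB, small memory.
old = system_failure_rate(failures_per_gb_year=0.25, capacity_gb=4)

# Hypothetical newer technology: 4x more reliable per GB,
# but the buyer installs 4x as much memory.
new = system_failure_rate(failures_per_gb_year=0.0625, capacity_gb=16)

print(old, new)
```

Under these assumed figures both systems see the same overall rate:
the per-GB improvement is spent entirely on capacity.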


-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com
