public inbox for linux-kernel@vger.kernel.org
From: Valdis.Kletnieks@vt.edu
To: Folkert van Heusden <folkert@vanheusden.com>
Cc: roland <devzero@web.de>, linux-kernel@vger.kernel.org
Subject: Re: Software based ECC ?
Date: Sun, 12 Aug 2007 23:09:22 -0400
Message-ID: <1599.1186974562@turing-police.cc.vt.edu>
In-Reply-To: Your message of "Sun, 12 Aug 2007 18:51:31 +0200." <20070812165131.GG7973@vanheusden.com>

On Sun, 12 Aug 2007 18:51:31 +0200, Folkert van Heusden said:

> a question and an idea: Q: is ecc guaranteed to detect all bitflips?

It depends on the exact ECC function the hardware implements.  Usually it
provides guarantees such as:

"Correct all 1-bit errors.  Detect, but not correct, all 2-bit errors and
most errors of 3 or more bits".

(Of course, "correct all 1 or 2 bit and detect all 3 bit" can be done, it
just takes more bits of ECC.)
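
To make the "correct 1, detect 2" behaviour concrete, here is a minimal
sketch of a toy SECDED code: Hamming(7,4) plus one overall parity bit.  This
is an illustration only, not what any particular memory controller does; the
names encode() and decode() and the 4-bit word size are assumptions made for
the example (real DIMMs typically protect 64 data bits with 8 check bits).

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED codeword (list of 0/1).

    Index 0 is the overall parity bit; indexes 1, 2 and 4 are the Hamming
    parity bits; indexes 3, 5, 6 and 7 carry the data.
    """
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    # Parity bit p covers every position whose index has bit p set.
    for p in (1, 2, 4):
        for i in range(1, 8):
            if i & p and i != p:
                code[p] ^= code[i]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # XOR of the indexes of all set bits is 0 for a valid codeword and
    # equals the position of the flipped bit after a single flip.
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i
    parity_broken = sum(code) % 2 == 1
    if syndrome == 0 and not parity_broken:
        status = "ok"
    elif parity_broken:
        # Odd number of flips: assume exactly one and flip it back.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome: detect only.
        return "uncorrectable", None
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble
```

Note that in a code this small, a 3-bit error is indistinguishable from a
1-bit error and gets "corrected" to the wrong value.  How larger real-world
codes fare on 3-bit and wider errors depends on the specific code.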

> Idea: what about a multicore system (3 or more) that runs the same
> processes on 2 cores and a third core verifying that they both do the
> same? As I think it is not only ram that can become faulty.

This is actually done for high-reliability systems (Google for "tell me twice"
and "tell me three times").  The problem is that it takes a lot of extra
hardware.  The G5 and later IBM Z-series mainframe chipsets (not to be confused with
the PowerPC G5) implemented dual computation units and a comparator that
signals a 'Machine Check' condition if the two CPUs don't end up in the
same exact state.  As an added bonus, at the end of each instruction where
both *do* compare good, it latches the *entire* state of the CPU out.  When
the two don't match, it then does the following:

1) Retry the instruction on the same CPU - if it compares correctly, keep
going and flag a "soft" error.

2) If it still fails, read out the last "known good" status latch, and load
it into a spare CPU, and fire it up, and flag the failing one as bad.
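
The compare / retry / fail-over sequence above can be sketched in software
roughly as follows.  This is a toy simulation under stated assumptions, not
IBM's design: the Unit and Cpu classes, the run() function, and the idea of
modelling a fault as "flip the low bit of the next N results" are all made up
for the illustration, and CPU state is reduced to a single integer.

```python
class Unit:
    """Hypothetical computation unit that corrupts its next `faults` results."""

    def __init__(self, faults=0):
        self.faults = faults

    def execute(self, op, state):
        result = op(state)
        if self.faults > 0:
            self.faults -= 1
            return result ^ 1  # simulated bit flip
        return result


class Cpu:
    """Two units in lockstep plus a comparator."""

    def __init__(self, name, faults=0):
        self.name = name
        self.a = Unit(faults)
        self.b = Unit()

    def step(self, op, state):
        """Return the new state, or None if the two units disagree."""
        ra = self.a.execute(op, state)
        rb = self.b.execute(op, state)
        return ra if ra == rb else None


def run(program, cpus, state=0):
    """Run `program` on cpus[0], using the remaining CPUs as spares."""
    active, spares = cpus[0], list(cpus[1:])
    known_good = state  # the last latched "known good" state
    events = []
    for op in program:
        result = active.step(op, known_good)
        if result is None:
            # 1) Retry the instruction on the same CPU.
            result = active.step(op, known_good)
            if result is not None:
                events.append(("soft error", active.name))
        while result is None:
            # 2) Still failing: mark it bad, restart on a spare from the latch.
            events.append(("marked bad", active.name))
            if not spares:
                raise RuntimeError("no spare CPU left")
            active = spares.pop(0)
            result = active.step(op, known_good)
        known_good = result  # latch state after a good compare
    return known_good, events
```

For example, with program = [lambda s: s + 1, lambda s: s * 2, lambda s: s + 3],
a CPU created as Cpu("cpu0", faults=1) recovers on the retry and logs a soft
error, while Cpu("cpu0", faults=2) gets marked bad and the spare takes over;
the final state is the same in both cases.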

http://www.research.ibm.com/journal/rd/435/spainhower.pdf
http://www.research.ibm.com/journal/rd/435/mueller.pdf

These guys have forgotten more about designing highly reliable systems than
most of us will ever know. ;)

Needless to say, not everybody is willing to pay for the hardware
overhead of this approach.

