From: Ross Biro <ross.biro@gmail.com>
To: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Russ Anderson <rja@sgi.com>, linux-kernel@vger.kernel.org
Subject: Re: [RCF] Linux memory error handling
Date: Wed, 15 Jun 2005 18:03:41 -0700
Message-ID: <8783be6605061518034b220fce@mail.gmail.com>
In-Reply-To: <Pine.LNX.4.61L.0506151545410.13835@blysk.ds.pg.gda.pl>

On 6/15/05, Maciej W. Rozycki <macro@linux-mips.org> wrote:
> On Wed, 15 Jun 2005, Russ Anderson wrote:
> 
> >
> >           Polling Threshold:  A solid single bit error can cause a burst
> >               of correctable errors that can cause a significant logging
> >               overhead.  SBE thresholding counts the number of SBEs for
> >               a given page and if too many SBEs are detected in a given
> >               period of time, the interrupt is disabled and instead
> >               linux periodically polls for corrected errors.
> 
>  This is highly undesirable if the same interrupt is used for MBEs.  A
> page that causes an excessive number of SBEs should rather be removed from
> the available pool instead.  Logging should probably take recent events
> into account anyway and take care of not overloading the system, e.g. by
> keeping only statistical data instead of detailed information about each
> event under load.
> 

First, the names SBE and MBE are historical; these are now called
correctable and uncorrectable errors.  Modern chip sets can often
correct a word even when many of its bits are wrong.  So please don't
assume you can infer anything about the probability of an MBE from
the fact that you are seeing SBEs.  Any such inference would need to
be chip set specific.

Some common chip sets have bugs that can cause an excessive number of
reported SBEs.  On those chip sets, even without any error reporting,
there is a noticeable performance hit when the SBE counters go wild.
If every SBE generated an interrupt, the system would grind to a
halt.  So there need to be easy ways to disable the interrupts
associated with SBEs.
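
A minimal sketch of one way such a switch could work, assuming a
simple count-per-time-window policy.  Every name here (ce_throttle,
ce_throttle_record, the CE_MODE_* values) is hypothetical and not an
existing kernel interface, and the threshold and window would have to
be tuned per chip set.

```c
#include <stdint.h>

/* Hypothetical reporting modes for correctable errors (SBEs). */
enum ce_mode { CE_MODE_INTERRUPT, CE_MODE_POLL, CE_MODE_IGNORE };

/* Hypothetical throttle state; one instance per error source. */
struct ce_throttle {
	enum ce_mode mode;        /* current reporting mode */
	uint32_t threshold;       /* events tolerated per window */
	uint64_t window_ns;       /* window length in nanoseconds */
	uint64_t window_start_ns; /* start of the current window */
	uint32_t count;           /* events seen in the current window */
};

static void ce_throttle_init(struct ce_throttle *t, uint32_t threshold,
			     uint64_t window_ns)
{
	t->mode = CE_MODE_INTERRUPT;
	t->threshold = threshold;
	t->window_ns = window_ns;
	t->window_start_ns = 0;
	t->count = 0;
}

/*
 * Record one correctable error seen at time now_ns and return the
 * mode that should be in effect afterwards.  Once the source has left
 * interrupt mode, this function never switches it back on its own.
 */
static enum ce_mode ce_throttle_record(struct ce_throttle *t,
				       uint64_t now_ns)
{
	if (t->mode != CE_MODE_INTERRUPT)
		return t->mode;

	/* Start a fresh window once the old one has expired. */
	if (now_ns - t->window_start_ns >= t->window_ns) {
		t->window_start_ns = now_ns;
		t->count = 0;
	}

	/* Too many events in this window: fall back to polling. */
	if (++t->count > t->threshold)
		t->mode = CE_MODE_POLL;

	return t->mode;
}
```

The idea is that the corrected-error interrupt handler would call
ce_throttle_record() and, on seeing CE_MODE_POLL, mask the interrupt
and start a periodic poll.  CE_MODE_IGNORE is there for people who
want to set the mode by hand and drop SBEs entirely.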

Also, some memory/chip set combinations generate a significant number
of SBEs without any significant danger of an MBE, so many people will
want to ignore SBEs entirely, or only poll once in a while.

Finally, many chip sets have memory scrubbing technology that can
simultaneously generate SBEs in memory not being accessed by the
kernel and fix those errors. So don't just assume that because the
kernel isn't allowing access to a page, you won't see SBEs or MBEs
from that page.

Otherwise, anything done in this direction seems like a good idea to me.

    Ross
