From: Russ Anderson <rja@sgi.com>
To: Matthew Wilcox <matthew@wil.cx>
Cc: Andi Kleen <andi@firstfloor.org>,
	mingo@elte.hu, tglx@linutronix.de,
	Tony Luck <tony.luck@intel.com>,
	linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org
Subject: Re: [PATCH 0/2] Migrate data off physical pages with corrected memory errors (Version 7)
Date: Sun, 20 Jul 2008 17:50:04 +0000
Message-ID: <20080720175004.GB9409@sgi.com>
In-Reply-To: <20080719121328.GA20138@parisc-linux.org>

On Sat, Jul 19, 2008 at 06:13:28AM -0600, Matthew Wilcox wrote:
> On Sat, Jul 19, 2008 at 12:37:11PM +0200, Andi Kleen wrote:
> > Russ Anderson <rja@sgi.com> writes:
> > 
> > > [PATCH 0/2] Migrate data off physical pages with corrected memory errors (Version 7)
> > 
> > FWIW I discussed this with some hardware people and the general
> > opinion was that it was way too aggressive to disable a page on the
> > first corrected error like this patchkit currently does.  
> 
> I think it's reasonable to take a page out of service on the first error.
> Then a user program needs to be notified of which bit is suspected.
> It can then subject that page to an intense set of tests (I'd start
> by stealing the ones from memtest86+) and if no more errors are found,
> it could return the page to service.

In general I agree with that approach.  One concern is that, while
testing the memory, the diagnostic may hit an uncorrectable error.
That is not a problem on Itanium, which is designed to handle
uncorrected/poisoned data going into and out of the processor core, but
it can be a system-fatal error (requiring a reboot) on other processor
types.  Just something to be aware of.

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com
