From: Russ Anderson <rja@sgi.com>
To: Alex Williamson <alex.williamson@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	mingo@elte.hu, tglx@linutronix.de,
	Tony Luck <tony.luck@intel.com>,
	linux-kernel@vger.kernel.org, linux-ia64@vger.kernel.org
Subject: Re: [PATCH 0/2] Migrate data off physical pages with corrected memory errors (Version 7)
Date: Mon, 21 Jul 2008 19:45:43 +0000
Message-ID: <20080721194543.GA214920@sgi.com>
In-Reply-To: <1216667499.8806.79.camel@lappy>

On Mon, Jul 21, 2008 at 01:11:39PM -0600, Alex Williamson wrote:
> On Sun, 2008-07-20 at 12:39 -0500, Russ Anderson wrote:
> > On Sat, Jul 19, 2008 at 12:37:11PM +0200, Andi Kleen wrote:
> > > If you really wanted to do this you probably should hook it up
> > > to mcelog's (or the IA64 equivalent) DIMM database
> > 
> > Is there an IA64 equivalent?  I've looked at the x86_64 mcelog,
> > but have not found a IA64 version.
> 
> There's a bit in the SAL error record that can tell you when the
> platform thinks the page should be deallocated.  In the section header
> (B2.2), ERROR_RECOVERY_INFO, bit 3 "Error threshold exceeded".  If you
> use this bit, then it's a platform decision.  If you want pages to be
> deallocated on the first hit, then have your SAL always set that bit.  I
> believe HP systems do implement this bit in SAL using some kind of
> heuristics.

Good point.  Linux does not have that field defined.

I'll submit a real patch to Tony shortly.
-------------------------------------------------
---
 include/asm-ia64/sal.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linus/include/asm-ia64/sal.h
===================================================================
--- linus.orig/include/asm-ia64/sal.h	2008-07-18 11:32:02.000000000 -0500
+++ linus/include/asm-ia64/sal.h	2008-07-21 14:40:47.142922279 -0500
@@ -341,7 +341,8 @@ typedef struct sal_log_record_header {
 typedef struct sal_log_sec_header {
     efi_guid_t guid;			/* Unique Section ID */
     sal_log_revision_t revision;	/* Major and Minor revision of Section */
-    u16 reserved;
+    u8 error_recovery_info;		/* Platform error recovery status */
+    u8 reserved;
     u32 len;				/* Section length */
 } sal_log_section_hdr_t;
 

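For illustration only, here is a user-space sketch of how the new field
might be tested for the "Error threshold exceeded" bit Alex describes.
It is not part of the patch.  The efi_guid_t and sal_log_revision_t
definitions are simplified stand-ins for the kernel types, and the names
SAL_ERI_THRESHOLD_EXCEEDED and sal_threshold_exceeded() are hypothetical.
The static assertions check that splitting the old u16 leaves the
offset of len and the overall header size unchanged.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Simplified stand-ins for the kernel types (assumptions for this sketch). */
typedef struct { uint8_t b[16]; } efi_guid_t;
typedef struct { uint8_t minor; uint8_t major; } sal_log_revision_t;

/* Section header layout as it looks after the patch. */
typedef struct sal_log_sec_header {
    efi_guid_t guid;               /* Unique Section ID */
    sal_log_revision_t revision;   /* Major and Minor revision of Section */
    uint8_t error_recovery_info;   /* Platform error recovery status */
    uint8_t reserved;
    uint32_t len;                  /* Section length */
} sal_log_section_hdr_t;

/* Hypothetical name: bit 3 of ERROR_RECOVERY_INFO, "Error threshold exceeded". */
#define SAL_ERI_THRESHOLD_EXCEEDED (1u << 3)

/* The new u8 takes the first byte of the old u16; nothing after it moves. */
static_assert(offsetof(sal_log_section_hdr_t, error_recovery_info) == 18,
              "error_recovery_info should directly follow revision");
static_assert(offsetof(sal_log_section_hdr_t, len) == 20,
              "len offset must not change");
static_assert(sizeof(sal_log_section_hdr_t) == 24,
              "header size must not change");

/* Hypothetical helper: nonzero when the platform flags the threshold bit. */
static inline int sal_threshold_exceeded(const sal_log_section_hdr_t *hdr)
{
    return (hdr->error_recovery_info & SAL_ERI_THRESHOLD_EXCEEDED) != 0;
}
```

A caller handling a corrected memory error could check
sal_threshold_exceeded() on the section header before deciding to migrate
data off the page, which would leave the policy to the platform's SAL.
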
-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com
