From: "K.Prasad" <prasad@linux.vnet.ibm.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	Ananth N Mavinakayanahalli <ananth@in.ibm.com>,
	kexec@lists.infradead.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"Luck, Tony" <tony.luck@intel.com>,
	Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [RFC] Kdump and memory error handling
Date: Mon, 9 May 2011 22:59:35 +0530
Message-ID: <20110509172935.GD1963@in.ibm.com>
In-Reply-To: <20110504203914.GC1737@one.firstfloor.org>

On Wed, May 04, 2011 at 10:39:14PM +0200, Andi Kleen wrote:
> > Any thoughts/suggestions?
> 
> My old attempts to solve this are
> 
> Don't dump on MCE:
> 
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/xpanic
> 

The problem we see in avoiding the panic->crash_kexec->[coredump capture]
path is that the user may have no means of knowing the reason for the
crash, unless a serial console is connected to capture and store the
panic string.

Alternatively, a 'slim' kdump (as described here:
https://lkml.org/lkml/2011/5/4/396) would not contain meaningless data from
the old memory, but would still inform the user about the cause of the
crash. I intend to post some patches with a quick implementation of it soon.

> Handle dumps of corrupted memory regions:
> 
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/crashdump
> 

> IMHO these patches are still the right solutions for this.
> 

As Vatsa pointed out, the processor's behaviour upon reading (or performing
any I/O operation on) the faulty memory location isn't clearly defined (as
far as I could tell from reading System Programming Guide Part 1, Volume 3A,
Chapter 15). In such a scenario, disabling MCE for the kdump kernel (which
can potentially read the faulty memory) makes things hazy.

Thanks,
K.Prasad

