From: "K.Prasad" <prasad@linux.vnet.ibm.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: "Luck, Tony" <tony.luck@intel.com>,
kexec@lists.infradead.org,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
anderson@redhat.com, "Eric W. Biederman" <ebiederm@xmission.com>,
Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [RFC Patch 5/6] slimdump: Capture slimdump for fatal MCE generated crashes
Date: Fri, 27 May 2011 21:23:46 +0530
Message-ID: <20110527155346.GA2384@in.ibm.com>
In-Reply-To: <20110526173257.GC4065@one.firstfloor.org>
On Thu, May 26, 2011 at 07:32:57PM +0200, Andi Kleen wrote:
> On Thu, May 26, 2011 at 10:53:05PM +0530, K.Prasad wrote:
> >
> > slimdump: Capture slimdump for fatal MCE generated crashes
> >
> > System crashes resulting from fatal hardware errors (such as MCE) don't need
> > all the contents from crashing-kernel's memory. Generate a new 'slimdump' that
> > retains only essential information while discarding the old memory.
>
> While this is a good idea, note there may be still poisoned lines
> in memory that haven't resulted in a machine check yet, but could
> still be fatal when read after a full crash dump for some other
> reason.
>
True, this patch does not handle the discovery of old poisoned lines/new
memory errors that may occur when inside the kdump kernel.
> So you still need
>
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=commit;h=fe61906edce9e70d02481a77a617ba1397573dce
> and
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=commit;h=cb58f049ae6709ddbab71be199390dc6852018cd
>
> in addition.
>
> -Andi
So, there are (at least) two ways to handle fatal MCEs in the kdump
kernel:
- Disable MCE exceptions, as done by the patches cited above. However,
the result of a read operation on corrupted memory is unknown and the
system behaviour is undefined. We're unsure if this is a safe thing to
do.
- Disable kdump capture when panic is invoked from inside the kdump
kernel, and simply reboot the system. Since a memory error inside the
kdump kernel (which runs for a very short duration) is rare, I
think this solution is preferable.
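To make the second option concrete, here is a minimal userspace sketch of
the decision it implies. It is not code from the patch series: the names
PANIC_FLAG_MCE, enum panic_action and choose_panic_action() are
hypothetical, and the in_kdump_kernel argument stands in for whatever
check the kernel would actually use.

```c
#include <stdbool.h>

/* Hypothetical flag, loosely modelled on the PANIC_MCE flag from patch 4/6. */
#define PANIC_FLAG_MCE 0x1u

/* Hypothetical outcomes of a panic, for illustration only. */
enum panic_action {
	ACTION_FULL_DUMP,	/* boot the kdump kernel, capture a full dump */
	ACTION_SLIM_DUMP,	/* boot the kdump kernel, capture only a slimdump */
	ACTION_REBOOT,		/* skip dump capture and reboot */
};

/*
 * Decide what to do on panic. A panic raised while already inside the
 * kdump kernel never triggers another capture; the system just reboots.
 */
enum panic_action choose_panic_action(bool in_kdump_kernel,
				      unsigned int panic_flags)
{
	if (in_kdump_kernel)
		return ACTION_REBOOT;
	if (panic_flags & PANIC_FLAG_MCE)
		return ACTION_SLIM_DUMP;
	return ACTION_FULL_DUMP;
}
```

In this sketch, choose_panic_action(true, PANIC_FLAG_MCE) yields
ACTION_REBOOT, while the same flag in the first kernel yields
ACTION_SLIM_DUMP.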
Let me know your thoughts on this.
Thanks,
K.Prasad