[RFC] Kdump and memory error handling

linux-kernel.vger.kernel.org archive mirror

* [RFC] Kdump and memory error handling
@ 2011-05-04 19:35 K.Prasad
  2011-05-04 20:02 ` Luck, Tony
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: K.Prasad @ 2011-05-04 19:35 UTC
  To: Linux Kernel Mailing List
  Cc: Andi Kleen, Luck, Tony, Vivek Goyal, kexec, Srivatsa Vaddagiri,
	Ananth N Mavinakayanahalli

Hi All,
        We've been trying to study and improve the kdump behaviour when
a kernel panic follows a machine check exception (MCE) caused by an
unrecoverable memory error.

In this context we foresee a few issues in capturing a kdump and would
welcome comments on how to handle them.

Probable issues when capturing a coredump through kdump after a memory error
----------------------------------------------------------------------------
- First, a coredump of the crashing kernel's memory isn't really
  helpful in debugging a crash caused by faulty memory, and collecting
  it has some of the problems illustrated below. It should therefore
  suffice to tell the user the reason for the crash rather than provide
  a complete dump of the memory.

  For this, a 'slim' yet crash-tool-readable coredump may be suitable,
  containing:
  - a message about the cause (such as a crash due to an unrecoverable
    memory error) in the coredump's ELF note section, and
  - no data from the memory of the 'crashing' kernel (the corresponding
    ELF program segments can be reduced to zero length).

- Alternatively, if the kdump kernel decides to capture the coredump,
  its attempts to read the faulty memory location may cause further
  faults in the context of the kdump kernel, with fatal consequences.
  This may be avoided either by:

  a) Passing the address of the corrupt memory location to the kdump
  kernel and skipping that location while creating the vmcore. This
  needs an instance of 'struct mce' from the 'crashing' kernel to be
  placed inside the ELF (note?) section. 'struct mce' already contains
  the faulty memory address; whether it is a physical address should be
  confirmed using the address-mode bits IA32_MCi_MISC[8:6], stored in
  the 'misc' field of 'struct mce'.

  b) Using modified copy applications (such as a modified 'cp' command)
  that map /dev/oldmem into user space and then create the vmcore. In
  this method, the user-space process performing the copy receives a
  SIGBUS when it consumes the faulty memory (via INT 18 ->
  do_machine_check()), so it must be modified to survive the signal and
  skip ahead to the next memory location to continue copying. The data
  for the faulty location can be represented as zeroes, and the vmcore
  enhanced to indicate that the crash resulted from a fatal MCE.

Any thoughts/suggestions?

Thanks,
K.Prasad

* RE: [RFC] Kdump and memory error handling
  2011-05-04 19:35 [RFC] Kdump and memory error handling K.Prasad
@ 2011-05-04 20:02 ` Luck, Tony
  2011-05-04 20:39 ` Andi Kleen
  2011-05-12 22:22 ` Eric W. Biederman
  2 siblings, 0 replies; 9+ messages in thread
From: Luck, Tony @ 2011-05-04 20:02 UTC
  To: prasad@linux.vnet.ibm.com, Linux Kernel Mailing List
  Cc: Andi Kleen, Vivek Goyal, kexec@lists.infradead.org,
	Srivatsa Vaddagiri, Ananth N Mavinakayanahalli

Your first suggestion of a "slim" dump makes the most sense. A crash
dump is a research resource for finding out why the system crashed,
but in the case of a machine check we already have the reasons for
the crash captured by the machine check handler.

Perhaps you could include __log_buf[] in the slim crash dump, assuming
that the machine check did not result from an uncorrectable error in
this memory range?
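The caveat above requires checking whether the reported error could fall inside __log_buf[]. A minimal sketch of that check, using the IA32_MCi_MISC decoding mentioned in approach (a): address mode 2 in bits 8:6 means MCi_ADDR holds a physical address, and bits 5:0 give the lowest valid address bit, so the error covers a whole naturally aligned 2^LSB-byte granule. The macro and function names are hypothetical, ADDRV and MISCV in MCi_STATUS are assumed to be set, and the physical extent of __log_buf[] is simply taken as an input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCi_MISC fields (Intel SDM Vol. 3A, ch. 15); macro names are ours. */
#define MISC_ADDR_LSB(misc)  ((unsigned)((misc) & 0x3f))       /* bits 5:0 */
#define MISC_ADDR_MODE(misc) ((unsigned)(((misc) >> 6) & 0x7)) /* bits 8:6 */
#define MISC_ADDR_MODE_PHYS  2u

/*
 * Could the error reported by (addr, misc), e.g. the 'addr' and 'misc'
 * fields of struct mce, lie inside [start, start + len)? Answers "yes"
 * conservatively when addr is not a physical address.
 */
bool mce_may_hit_range(uint64_t addr, uint64_t misc, uint64_t start, uint64_t len)
{
    if (len == 0)
        return false;
    if (MISC_ADDR_MODE(misc) != MISC_ADDR_MODE_PHYS)
        return true;                   /* cannot compare: assume the worst */

    /* Bits below the LSB are not valid, so the whole aligned granule
     * around addr must be treated as poisoned. */
    uint64_t granule = (uint64_t)1 << MISC_ADDR_LSB(misc);
    uint64_t bad_first = addr & ~(granule - 1);
    uint64_t bad_last = bad_first + (granule - 1);
    uint64_t last = start + (len - 1);

    return bad_first <= last && start <= bad_last;
}
```

If this returns true for the log buffer's range, the slim dump would have to omit __log_buf[] as well and rely on the cause note alone.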

-Tony

* Re: [RFC] Kdump and memory error handling
  2011-05-04 19:35 [RFC] Kdump and memory error handling K.Prasad
  2011-05-04 20:02 ` Luck, Tony
@ 2011-05-04 20:39 ` Andi Kleen
  2011-05-05  3:02   ` Vivek Goyal
                     ` (2 more replies)
  2011-05-12 22:22 ` Eric W. Biederman
  2 siblings, 3 replies; 9+ messages in thread
From: Andi Kleen @ 2011-05-04 20:39 UTC
  To: K.Prasad
  Cc: Linux Kernel Mailing List, Andi Kleen, Luck, Tony, Vivek Goyal,
	kexec, Srivatsa Vaddagiri, Ananth N Mavinakayanahalli

> Any thoughts/suggestions?

My old attempts to solve this are:

Don't dump on MCE:

http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/xpanic

Handle dumps of corrupted memory regions:

http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/crashdump

IMHO these patches are still the right solutions for this.

-Andi

* Re: [RFC] Kdump and memory error handling
  2011-05-04 20:39 ` Andi Kleen
@ 2011-05-05  3:02   ` Vivek Goyal
  2011-05-05  9:25   ` Srivatsa Vaddagiri
  2011-05-09 17:29   ` K.Prasad
  2 siblings, 0 replies; 9+ messages in thread
From: Vivek Goyal @ 2011-05-05  3:02 UTC
  To: Andi Kleen
  Cc: K.Prasad, Linux Kernel Mailing List, Luck, Tony, kexec,
	Srivatsa Vaddagiri, Ananth N Mavinakayanahalli

On Wed, May 04, 2011 at 10:39:14PM +0200, Andi Kleen wrote:
> > Any thoughts/suggestions?
> 
> My old attempts to solve this are
> 
> Don't dump on MCE:
> 
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/xpanic
> 
> Handle dumps of corrupted memory regions:
> 
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/crashdump
> 

This idea of disabling MCE temporarily sounds interesting.

A slim dump giving access to the log buffers makes the most sense to
me. Why not leave it to user space to filter out everything but the
log buffers? If a crash happens due to an MCE, we could append an ELF
note section to the vmcore, and maybe the user-space filtering utility
(makedumpfile) could then extract and save only the log portion of the
dump when it sees an MCE-triggered crash.

Of course, this needs to be coupled with Andi's patch for temporarily
disabling MCE, so that makedumpfile does not induce another crash.
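The note-based filtering suggested above could be sketched as follows, assuming the 4-byte-aligned Elf64_Nhdr note layout used in the vmcore's PT_NOTE segment. The producer appends a note carrying the cause; a filtering tool walks the notes and, if it finds the MCE note, saves only the log. The note name "MCECRASH" and its type value are hypothetical, not an existing kernel or makedumpfile convention.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical note name/type marking an MCE-triggered crash. */
#define MCE_NOTE_NAME "MCECRASH"
#define MCE_NOTE_TYPE 0x4d4345u

/* Note name and descriptor are each padded to a 4-byte boundary. */
#define NOTE_ALIGN(n) (((size_t)(n) + 3u) & ~(size_t)3u)

/* Append one note at buf; returns bytes written, or 0 if it does not fit. */
size_t note_append(unsigned char *buf, size_t cap, const char *name,
                   uint32_t type, const void *desc, size_t desc_len)
{
    size_t name_len = strlen(name) + 1;         /* n_namesz counts the NUL */
    size_t total = sizeof(Elf64_Nhdr) + NOTE_ALIGN(name_len) + NOTE_ALIGN(desc_len);
    Elf64_Nhdr hdr = { .n_namesz = (Elf64_Word)name_len,
                       .n_descsz = (Elf64_Word)desc_len,
                       .n_type = type };

    if (total > cap)
        return 0;
    memset(buf, 0, total);                      /* zero the padding bytes */
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), name, name_len);
    memcpy(buf + sizeof(hdr) + NOTE_ALIGN(name_len), desc, desc_len);
    return total;
}

/* Walk a note buffer (e.g. a PT_NOTE segment) and return the descriptor
 * of the first note matching name and type, or NULL if there is none. */
const void *note_find(const unsigned char *buf, size_t len, const char *name,
                      uint32_t type, size_t *desc_len)
{
    size_t want = strlen(name) + 1;
    size_t off = 0;

    while (off + sizeof(Elf64_Nhdr) <= len) {
        Elf64_Nhdr hdr;
        memcpy(&hdr, buf + off, sizeof(hdr));   /* buf may be unaligned */
        size_t name_off = off + sizeof(hdr);
        size_t desc_off = name_off + NOTE_ALIGN(hdr.n_namesz);
        size_t next = desc_off + NOTE_ALIGN(hdr.n_descsz);

        if (next > len)
            break;                              /* truncated note: stop */
        if (hdr.n_type == type && hdr.n_namesz == want &&
            memcmp(buf + name_off, name, want) == 0) {
            *desc_len = hdr.n_descsz;
            return buf + desc_off;
        }
        off = next;
    }
    return NULL;
}
```

Keeping the cause in a note means the check needs no access to the crashed kernel's RAM segments, which is the point of avoiding another fault.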

On a side note, could we just save the log buffer in an NVRAM area and
access it later using pstore (by Tony Luck)? If we can detect that the
system has that NVRAM capability, we could then skip kdump, or
something like that.

Thanks
Vivek

* Re: [RFC] Kdump and memory error handling
  2011-05-04 20:39 ` Andi Kleen
  2011-05-05  3:02   ` Vivek Goyal
@ 2011-05-05  9:25   ` Srivatsa Vaddagiri
  2011-05-09 17:29   ` K.Prasad
  2 siblings, 0 replies; 9+ messages in thread
From: Srivatsa Vaddagiri @ 2011-05-05  9:25 UTC
  To: Andi Kleen
  Cc: K.Prasad, Linux Kernel Mailing List, Luck, Tony, Vivek Goyal,
	kexec, Ananth N Mavinakayanahalli

On Wed, May 04, 2011 at 10:39:14PM +0200, Andi Kleen wrote:
> Handle dumps of corrupted memory regions:
> 
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/crashdump

What happens when MCE is disabled and the capture kernel reads corrupted
memory? Does the dump then contain erroneous data?

- vatsa

* Re: [RFC] Kdump and memory error handling
  2011-05-04 20:39 ` Andi Kleen
  2011-05-05  3:02   ` Vivek Goyal
  2011-05-05  9:25   ` Srivatsa Vaddagiri
@ 2011-05-09 17:29   ` K.Prasad
  2011-05-09 17:40     ` Vivek Goyal
  2 siblings, 1 reply; 9+ messages in thread
From: K.Prasad @ 2011-05-09 17:29 UTC
  To: Andi Kleen
  Cc: Linux Kernel Mailing List, Luck, Tony, Vivek Goyal, kexec,
	Srivatsa Vaddagiri, Ananth N Mavinakayanahalli

On Wed, May 04, 2011 at 10:39:14PM +0200, Andi Kleen wrote:
> > Any thoughts/suggestions?
> 
> My old attempts to solve this are
> 
> Don't dump on MCE:
> 
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/xpanic
> 

The problem we see with avoiding panic->crash_kexec->[coredump capture]
is that the user may have no means of knowing the reason for the crash
unless a serial console is connected to capture and store the panic
string.

Alternatively, a 'slim' kdump (as described here:
https://lkml.org/lkml/2011/5/4/396) would not contain meaningless data from
the old memory, but would inform the user of the cause of the crash. I
intend to post some patches with a quick implementation of it soon.

> Handle dumps of corrupted memory regions:
> 
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/crashdump
> 

> IMHO these patches are still the right solutions for this.
> 

As Vatsa raised, the processor's behaviour upon reading (or performing
any I/O operation on) the faulty memory location isn't clearly defined,
at least as far as I can tell from the Intel System Programming Guide
Part 1, Volume 3A, Chapter 15. In such a scenario, disabling MCE for the
kdump kernel (which can potentially read the faulty memory) makes
things hazy.

Thanks,
K.Prasad

* Re: [RFC] Kdump and memory error handling
  2011-05-09 17:29   ` K.Prasad
@ 2011-05-09 17:40     ` Vivek Goyal
  0 siblings, 0 replies; 9+ messages in thread
From: Vivek Goyal @ 2011-05-09 17:40 UTC
  To: K.Prasad
  Cc: Andi Kleen, Linux Kernel Mailing List, Luck, Tony, kexec,
	Srivatsa Vaddagiri, Ananth N Mavinakayanahalli

On Mon, May 09, 2011 at 10:59:35PM +0530, K.Prasad wrote:
> On Wed, May 04, 2011 at 10:39:14PM +0200, Andi Kleen wrote:
> > > Any thoughts/suggestions?
> > 
> > My old attempts to solve this are
> > 
> > Don't dump on MCE:
> > 
> > http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/xpanic
> > 
> 
> The problem we see in avoiding a panic->crash_kexec->[coredump capture] is
> that the user may not have a means to know the reason for crash, unless
> the serial console is connected to capture and store the panic string.
> 
> Alternatively a 'slim' kdump (as described here:
> https://lkml.org/lkml/2011/5/4/396) would not contain meaningless data from
> the old memory, but inform the user about the cause of the crash. I'm
> intending to post some patches with a quick implementation of it soon.
> 
> > Handle dumps of corrupted memory regions:
> > 
> > http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=shortlog;h=refs/heads/mce/crashdump
> > 
> 
> > IMHO these patches are still the right solutions for this.
> > 
> 
> Like Vatsa had raised, the processor's behaviour upon reading (or any I/O
> operation) the faulty memory location isn't clearly defined (to the
> extent I read through System Programming Guide Part 1, Volume 3A,
> Chapter 15). In such a scenario, disabling MCE for the kdump kernel (which can
> potentially read the faulty memory) is making things hazy.

How would a slim dump make that any better? And why is leaving it to
user space to filter out the relevant pieces not a good idea?

I agree that it can fail if the memory we depend on for extracting the
right information is corrupted, but then a slim dump should have
similar issues too (unless we do something smart, like safely
determining the reason and putting all the information regarding the
dump there from inside the kernel after the fault).

Thanks
Vivek

* Re: [RFC] Kdump and memory error handling
  2011-05-04 19:35 [RFC] Kdump and memory error handling K.Prasad
  2011-05-04 20:02 ` Luck, Tony
  2011-05-04 20:39 ` Andi Kleen
@ 2011-05-12 22:22 ` Eric W. Biederman
  2011-05-17 17:24   ` K.Prasad
  2 siblings, 1 reply; 9+ messages in thread
From: Eric W. Biederman @ 2011-05-12 22:22 UTC
  To: prasad
  Cc: Linux Kernel Mailing List, Srivatsa Vaddagiri,
	Ananth N Mavinakayanahalli, Luck, Tony, kexec, Andi Kleen,
	Vivek Goyal

"K.Prasad" <prasad@linux.vnet.ibm.com> writes:

> Hi All,
>         We've been trying to study and improve the kdump behaviour when
> a panic is triggered due to an unrecoverable memory error causing a
> machine check exception (MCE) followed by a kernel panic.
>
> [...]
>
> Any thoughts/suggestions?

In practice this all works for me.

I have received several crash dumps where there was an mce error.

I admit I have my userspace configured to just grab the dmesg from the
kernel log and not do a full crash dump. So in that sense I am already
doing a slim crash dump.

But in practice, with real hardware errors, it works today without
kernel changes.

Eric

* Re: [RFC] Kdump and memory error handling
  2011-05-12 22:22 ` Eric W. Biederman
@ 2011-05-17 17:24   ` K.Prasad
  0 siblings, 0 replies; 9+ messages in thread
From: K.Prasad @ 2011-05-17 17:24 UTC
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, Srivatsa Vaddagiri,
	Ananth N Mavinakayanahalli, Luck, Tony, kexec, Andi Kleen,
	Vivek Goyal

On Thu, May 12, 2011 at 03:22:44PM -0700, Eric W. Biederman wrote:
> "K.Prasad" <prasad@linux.vnet.ibm.com> writes:
> 
> > Hi All,
> >         We've been trying to study and improve the kdump behaviour when
> > a panic is triggered due to an unrecoverable memory error causing a
> > machine check exception (MCE) followed by a kernel panic.
> >
> > [...]
> >
> > Any thoughts/suggestions?
> 
> In practice this all works for me.
> 
> I have received several crash dumps where there was an mce error.
> 
> I admit I have my userspace configured to just grab the dmesg from the
> kernel log and not do a full crash dump.  So in that sense I am already
> a slim crash dump.
> 
> But in practice with real hardware errors it is working today without
> kernel changes.
>

The problem with the existing kernel code is that it allows the old
kernel's memory regions to be read (through the read_vmcore() function),
although intelligent user-space tools (like the one you mentioned) may
avoid doing so.

Given that the system can experience recursive MCE faults while reading
the corrupt memory region, a 'slim' vmcore presented by the kernel to
user space would be a safe option. Such a dump could also carry more
relevant information, such as the address of the corrupt memory and the
type of memory error.
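A kernel-side 'slim' view could refuse to touch a known-bad range instead of relying on user space to avoid it. The sketch below is a user-space model of that policy, not the actual read_vmcore(): it serves a read of old memory but substitutes zeroes for every byte in the bad range [bad_start, bad_end). The function name read_oldmem_filtered() and the flat oldmem buffer indexed by physical address are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of a filtered old-memory read (not the kernel's
 * read_vmcore()): copy [pos, pos + len) of oldmem into dst, but return
 * zeroes for every byte in the known-bad range [bad_start, bad_end)
 * instead of reading it.
 */
void read_oldmem_filtered(unsigned char *dst, const unsigned char *oldmem,
                          uint64_t pos, size_t len,
                          uint64_t bad_start, uint64_t bad_end)
{
    uint64_t end = pos + len;
    /* Clip the bad range to the request; lo >= hi means no overlap. */
    uint64_t lo = bad_start > pos ? bad_start : pos;
    uint64_t hi = bad_end < end ? bad_end : end;

    if (lo >= hi) {
        memcpy(dst, oldmem + pos, len);                  /* nothing to hide */
        return;
    }
    memcpy(dst, oldmem + pos, (size_t)(lo - pos));       /* before the bad range */
    memset(dst + (lo - pos), 0, (size_t)(hi - lo));      /* never read bad bytes */
    memcpy(dst + (hi - pos), oldmem + hi, (size_t)(end - hi)); /* after it */
}
```

The bad range itself could come from the decoded 'struct mce' address and granule, so user space never gets a chance to consume the poisoned bytes.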

Thanks,
K.Prasad


Thread overview: 9+ messages
2011-05-04 19:35 [RFC] Kdump and memory error handling K.Prasad
2011-05-04 20:02 ` Luck, Tony
2011-05-04 20:39 ` Andi Kleen
2011-05-05  3:02   ` Vivek Goyal
2011-05-05  9:25   ` Srivatsa Vaddagiri
2011-05-09 17:29   ` K.Prasad
2011-05-09 17:40     ` Vivek Goyal
2011-05-12 22:22 ` Eric W. Biederman
2011-05-17 17:24   ` K.Prasad
