Re: [RFC][PATCH] Add a sysctl option controlling kexec when MCE occurred

* Re: [RFC][PATCH] Add a sysctl option controlling kexec when MCE occurred
       [not found] <5C4C569E8A4B9B42A84A977CF070A35B2C132F68FC@USINDEVS01.corp.hds.com>
@ 2010-12-23  0:29 ` Greg KH
  2010-12-23  7:43 ` Andi Kleen
  2010-12-27  1:56 ` Hidetoshi Seto
  2 siblings, 0 replies; 9+ messages in thread
From: Greg KH @ 2010-12-23  0:29 UTC (permalink / raw)
  To: Seiji Aguchi
  Cc: hawk@comx.dk, kexec@lists.infradead.org, drosenberg@vsecurity.com,
	dle-develop@lists.sourceforge.net, linux-mm@kvack.org,
	rdunlap@xenotime.net, andi@firstfloor.org, hpa@zytor.com,
	akpm@linuxfoundation.org, ext-andriy.shevchenko@nokia.com,
	eric.dumazet@gmail.com, x86@kernel.org, opurdila@ixiacom.com,
	mingo@redhat.com, ying.huang@intel.com, kees.cook@canonical.com,
	paulmck@linux.vnet.ibm.com, dzickus@redhat.com,
	len.brown@intel.com, seto.hidetoshi@jp.fujitsu.com,
	hadi@cyberus.ca, tglx@linutronix.de, hidave.darkstar@gmail.com,
	eugeneteo@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Satoru Moriya,
	ebiederm@xmission.com, tj@kernel.org, davem@davemloft.net

On Wed, Dec 22, 2010 at 06:35:40PM -0500, Seiji Aguchi wrote:
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -81,6 +81,9 @@
>  #include <linux/nmi.h>
>  #endif
>  
> +#ifdef CONFIG_X86_MCE
> +#include <asm/mce.h>
> +#endif

Please don't put ifdefs in .c files, you do that a lot for this option.
Just make it work on all platforms and then you will not need the
#ifdef.

thanks,

greg k-h
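
One common way to follow this advice is to keep the conditional in a header and give the disabled case an inline stub, so the .c file calls the helper unconditionally. A minimal sketch, assuming hypothetical names (CONFIG_DEMO_MCE, demo_mce_occurred, demo_panic_note) that are not part of the patch under review:

```c
#include <stdbool.h>

/*
 * Would live in a header.  CONFIG_DEMO_MCE and demo_mce_occurred()
 * are hypothetical names used only for this illustration.
 */
#ifdef CONFIG_DEMO_MCE
bool demo_mce_occurred(void);	/* real implementation lives in arch code */
#else
/* Stub for builds without the feature: no machine check is ever seen. */
static inline bool demo_mce_occurred(void) { return false; }
#endif

/* Would live in a .c file: the call site needs no #ifdef. */
static const char *demo_panic_note(void)
{
	return demo_mce_occurred() ? "panic after machine check" : "panic";
}
```

The include/linux/kexec.h hunk later in this thread uses the same stub pattern for set_kdump_might_fail().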

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

* Re: [RFC][PATCH] Add a sysctl option controlling kexec when MCE occurred
       [not found] <5C4C569E8A4B9B42A84A977CF070A35B2C132F68FC@USINDEVS01.corp.hds.com>
  2010-12-23  0:29 ` [RFC][PATCH] Add a sysctl option controlling kexec when MCE occurred Greg KH
@ 2010-12-23  7:43 ` Andi Kleen
  2010-12-23  9:18   ` Borislav Petkov
  2010-12-27  1:56 ` Hidetoshi Seto
  2 siblings, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2010-12-23  7:43 UTC (permalink / raw)
  To: Seiji Aguchi
  Cc: hawk@comx.dk, kexec@lists.infradead.org, drosenberg@vsecurity.com,
	dle-develop@lists.sourceforge.net, linux-mm@kvack.org,
	rdunlap@xenotime.net, andi@firstfloor.org, hpa@zytor.com,
	Seiji Aguchi, akpm@linuxfoundation.org,
	ext-andriy.shevchenko@nokia.com, eric.dumazet@gmail.com,
	x86@kernel.org, opurdila@ixiacom.com, mingo@redhat.com,
	ying.huang@intel.com, kees.cook@canonical.com,
	paulmck@linux.vnet.ibm.com, dzickus@redhat.com,
	len.brown@intel.com, seto.hidetoshi@jp.fujitsu.com,
	hadi@cyberus.ca, tglx@linutronix.de, hidave.darkstar@gmail.com,
	eugeneteo@kernel.org, gregkh@suse.de, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Satoru Moriya,
	ebiederm@xmission.com, tj@kernel.org, davem@davemloft.net



>   - Accessing to memory and dumping it to disks.

A better solution for this is

http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=commitdiff;h=fe61906edce9e70d02481a77a617ba1397573dce
and
http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=commit;h=cb58f049ae6709ddbab71be199390dc6852018cd

I'm not a big friend of sysctls for things like this -- either the behaviour
makes sense and should be default or not.

-Andi



* Re: [RFC][PATCH] Add a sysctl option controlling kexec when MCE occurred
  2010-12-23  7:43 ` Andi Kleen
@ 2010-12-23  9:18   ` Borislav Petkov
  2010-12-23 17:31     ` Seiji Aguchi
  0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2010-12-23  9:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: hawk@comx.dk, kexec@lists.infradead.org, drosenberg@vsecurity.com,
	dle-develop@lists.sourceforge.net, linux-mm@kvack.org,
	rdunlap@xenotime.net, hpa@zytor.com, Seiji Aguchi,
	akpm@linuxfoundation.org, ext-andriy.shevchenko@nokia.com,
	eric.dumazet@gmail.com, x86@kernel.org, opurdila@ixiacom.com,
	mingo@redhat.com, ying.huang@intel.com, kees.cook@canonical.com,
	paulmck@linux.vnet.ibm.com, dzickus@redhat.com,
	len.brown@intel.com, seto.hidetoshi@jp.fujitsu.com,
	hadi@cyberus.ca, tglx@linutronix.de, hidave.darkstar@gmail.com,
	eugeneteo@kernel.org, gregkh@suse.de, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Satoru Moriya,
	ebiederm@xmission.com, tj@kernel.org, davem@davemloft.net

On Thu, Dec 23, 2010 at 08:43:39AM +0100, Andi Kleen wrote:
> 
> 
> >   - Accessing to memory and dumping it to disks.
> 
> A better solution for this is
> 
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=commitdiff;h=fe61906edce9e70d02481a77a617ba1397573dce
> and
> http://git.kernel.org/?p=linux/kernel/git/ak/linux-mce-2.6.git;a=commit;h=cb58f049ae6709ddbab71be199390dc6852018cd
> 
> I'm not a big friend of sysctls for things like this -- either the behaviour
> makes sense and should be default or not.

This doesn't add up. AFAICT, you're disabling MCE reporting for crash
dumps and the original patch's intention was to control whether kexec
should run after a machine check. And I agree with Greg that this
shouldn't be configurable but instead on by default - if you get a
critical error and you cannot guarantee a stable system anymore, kexec
shouldn't start at all. That simple.

Thanks.

-- 
Regards/Gruss,
    Boris.

* RE: [RFC][PATCH] Add a sysctl option controlling kexec when MCE occurred
  2010-12-23  9:18   ` Borislav Petkov
@ 2010-12-23 17:31     ` Seiji Aguchi
  2010-12-23 19:56       ` Eric W. Biederman
  0 siblings, 1 reply; 9+ messages in thread
From: Seiji Aguchi @ 2010-12-23 17:31 UTC (permalink / raw)
  To: Borislav Petkov, Andi Kleen
  Cc: hawk@comx.dk, kexec@lists.infradead.org, drosenberg@vsecurity.com,
	dle-develop@lists.sourceforge.net, linux-mm@kvack.org,
	rdunlap@xenotime.net, hpa@zytor.com, akpm@linuxfoundation.org,
	ext-andriy.shevchenko@nokia.com, eric.dumazet@gmail.com,
	x86@kernel.org, opurdila@ixiacom.com, mingo@redhat.com,
	ying.huang@intel.com, kees.cook@canonical.com,
	paulmck@linux.vnet.ibm.com, dzickus@redhat.com,
	len.brown@intel.com, seto.hidetoshi@jp.fujitsu.com,
	hadi@cyberus.ca, tglx@linutronix.de, hidave.darkstar@gmail.com,
	eugeneteo@kernel.org, gregkh@suse.de, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Satoru Moriya,
	ebiederm@xmission.com, tj@kernel.org, davem@davemloft.net

Hi,

I agree with Borislav that kexec shouldn't start at all because we can't guarantee 
a stable system anymore when MCE is reported.

On the other hand, I understand there are people like Andi who want to start kexec 
even if MCE occurred.

That is why I propose adding a new option controlling kexec behaviour when MCE occurred.

I don't stick to "sysctl".
I suggest adding a new boot parameter instead of a sysctl, because users can't change
the setting once the boot parameter is set.

I will resend the patch if it is acceptable.

Regards,

Seiji

* Re: [RFC][PATCH] Add a sysctl option controlling kexec when MCE occurred
  2010-12-23 17:31     ` Seiji Aguchi
@ 2010-12-23 19:56       ` Eric W. Biederman
       [not found]         ` <5C4C569E8A4B9B42A84A977CF070A35B2C132F6CFA@USINDEVS01.corp.hds.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Eric W. Biederman @ 2010-12-23 19:56 UTC (permalink / raw)
  To: Seiji Aguchi
  Cc: hawk@comx.dk, kexec@lists.infradead.org, drosenberg@vsecurity.com,
	dle-develop@lists.sourceforge.net, linux-mm@kvack.org,
	rdunlap@xenotime.net, Andi Kleen, hpa@zytor.com,
	akpm@linuxfoundation.org, ext-andriy.shevchenko@nokia.com,
	eric.dumazet@gmail.com, x86@kernel.org, opurdila@ixiacom.com,
	mingo@redhat.com, ying.huang@intel.com, kees.cook@canonical.com,
	paulmck@linux.vnet.ibm.com, dzickus@redhat.com,
	len.brown@intel.com, seto.hidetoshi@jp.fujitsu.com,
	hadi@cyberus.ca, Borislav Petkov, tglx@linutronix.de,
	hidave.darkstar@gmail.com, eugeneteo@kernel.org, gregkh@suse.de,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Satoru Moriya, tj@kernel.org, davem@davemloft.net

Seiji Aguchi <seiji.aguchi@hds.com> writes:

> Hi,
>
> I agree with Borislav that kexec shouldn't start at all because we can't guarantee 
> a stable system anymore when MCE is reported.

In the case of kexec on panic we can never guarantee a stable system.
But the odds are much better of executing non-corrupt code  and of
telling people you had a hardware error if you go through the kexec
on panic process.

If I read Andi's patch correctly he was suggesting to not allow any more
mces to be reported on that path.

> On the other hand, I understand there are people like Andi who want to start kexec 
> even if MCE occurred.
>
> That is why I propose adding a new option controlling kexec behaviour
> when MCE occurred.

What do you gain by not doing the kexec on panic, when you have the
system configured to take one?  We already have the big policy knobs
to enable or disable this kind of behavior.

> I don't stick to "sysctl".

I think adding a sysctl in this path or any unnecessary code will make
things less reliable.

The last time this happened to me (about a week ago), the kexec on panic
from an ECC-reported memory error worked just fine.  In other words, in
the real world it seems to work.

So what is the problem you are trying to avoid, and why can't we do
something in the kernel's initialization path to avoid initializing
when there is a problem?

Eric

[parent not found: <5C4C569E8A4B9B42A84A977CF070A35B2C132F6CFA@USINDEVS01.corp.hds.com>]

* Re: [RFC][PATCH] Add a sysctl option controlling kexec when MCE occurred
       [not found]         ` <5C4C569E8A4B9B42A84A977CF070A35B2C132F6CFA@USINDEVS01.corp.hds.com>
@ 2010-12-25 17:19           ` Eric W. Biederman
  2010-12-25 18:33             ` H. Peter Anvin
  0 siblings, 1 reply; 9+ messages in thread
From: Eric W. Biederman @ 2010-12-25 17:19 UTC (permalink / raw)
  To: Seiji Aguchi
  Cc: hawk@comx.dk, kexec@lists.infradead.org, drosenberg@vsecurity.com,
	dle-develop@lists.sourceforge.net, linux-mm@kvack.org,
	rdunlap@xenotime.net, Andi Kleen, hpa@zytor.com,
	akpm@linuxfoundation.org, ext-andriy.shevchenko@nokia.com,
	eric.dumazet@gmail.com, x86@kernel.org, opurdila@ixiacom.com,
	mingo@redhat.com, ying.huang@intel.com, kees.cook@canonical.com,
	paulmck@linux.vnet.ibm.com, dzickus@redhat.com,
	len.brown@intel.com, seto.hidetoshi@jp.fujitsu.com,
	hadi@cyberus.ca, Borislav Petkov, tglx@linutronix.de,
	hidave.darkstar@gmail.com, eugeneteo@kernel.org, gregkh@suse.de,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Satoru Moriya, tj@kernel.org, davem@davemloft.net

Seiji Aguchi <seiji.aguchi@hds.com> writes:

> Hi,
>
> Thank you for giving your comments.
>
>>So what is the problem you are trying to avoid, and why can't we do
>>something in the kernels initialization path to avoid initializing
>>when there is a problem?
>
> Kdump gets a dump disk identifier based on information from memory.
>
> So, kdump may receive wrong identifier when it starts after MCE 
> occurred, because MCE is reported by memory, cache, and TLB errors
>
> In the worst case, kdump will overwrite user data if it recognizes a 
> disk saving user data as a dump disk.

Absurdly unlikely there is a sha256 checksum verified over the
kdump kernel before it starts booting.  If you have very broken
memory it is possible, but absurdly unlikely that the machine will
even boot if you are having enough uncorrectable memory errors
an hour to get past the sha256 checksum and then be corrupt.

> Kdump shouldn't write any data to disk when information from
> hardware is unreliable, because preserving user data is always the
> first priority.

Which is what is already implemented.

It looks to me like you are jumping at shadows, and adding
complexity to the kernel with no gain, and significant cost.


Eric

* Re: [RFC][PATCH] Add a sysctl option controlling kexec when MCE occurred
  2010-12-25 17:19           ` Eric W. Biederman
@ 2010-12-25 18:33             ` H. Peter Anvin
  2010-12-25 21:40               ` Eric W. Biederman
  0 siblings, 1 reply; 9+ messages in thread
From: H. Peter Anvin @ 2010-12-25 18:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: hawk@comx.dk, kexec@lists.infradead.org, drosenberg@vsecurity.com,
	dle-develop@lists.sourceforge.net, linux-mm@kvack.org,
	rdunlap@xenotime.net, Andi Kleen, Seiji Aguchi,
	akpm@linuxfoundation.org, ext-andriy.shevchenko@nokia.com,
	eric.dumazet@gmail.com, x86@kernel.org, opurdila@ixiacom.com,
	mingo@redhat.com, ying.huang@intel.com, kees.cook@canonical.com,
	paulmck@linux.vnet.ibm.com, dzickus@redhat.com,
	len.brown@intel.com, seto.hidetoshi@jp.fujitsu.com,
	hadi@cyberus.ca, Borislav Petkov, tglx@linutronix.de,
	hidave.darkstar@gmail.com, eugeneteo@kernel.org, gregkh@suse.de,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Satoru Moriya, tj@kernel.org, davem@davemloft.net

On 12/25/2010 09:19 AM, Eric W. Biederman wrote:
>>
>> So, kdump may receive wrong identifier when it starts after MCE 
>> occurred, because MCE is reported by memory, cache, and TLB errors
>>
>> In the worst case, kdump will overwrite user data if it recognizes a 
>> disk saving user data as a dump disk.
> 
> Absurdly unlikely there is a sha256 checksum verified over the
> kdump kernel before it starts booting.  If you have very broken
> memory it is possible, but absurdly unlikely that the machine will
> even boot if you are having enough uncorrectable memory errors
> an hour to get past the sha256 checksum and then be corrupt.
> 

That wouldn't be the likely scenario (passing a sha256 checksum with the
wrong data due to a random event will never happen for all the computers
on Earth before the Sun destroys the planet).  However, in a
failing-memory scenario, the much more likely scenario is that kdump
starts up, verifies the signature, and *then* has corruption causing it
to write to the wrong disk or whatnot.  This is inherent in any scheme
that allows writing to hard media after a failure (as opposed to, say,
dumping to the network.)

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.
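
The gap described above, between verifying an image and later using data that has since changed, can be sketched in a few lines. This is only an illustration under stated assumptions: demo_digest() is a small FNV-1a hash standing in for the sha256 check, and both function names are hypothetical, not kexec code.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, used here only as a short stand-in for the real sha256 digest. */
static uint32_t demo_digest(const unsigned char *buf, size_t len)
{
	uint32_t h = 2166136261u;	/* FNV-1a 32-bit offset basis */
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;		/* FNV-1a 32-bit prime */
	}
	return h;
}

/* Point-in-time check: it says nothing about what happens to buf later. */
static int demo_verify(const unsigned char *buf, size_t len, uint32_t expected)
{
	return demo_digest(buf, len) == expected;
}
```

A buffer that passes demo_verify() once and then suffers a single bit flip no longer matches the recorded digest, but nothing notices unless the check is repeated before the data is used.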


* Re: [RFC][PATCH] Add a sysctl option controlling kexec when MCE occurred
  2010-12-25 18:33             ` H. Peter Anvin
@ 2010-12-25 21:40               ` Eric W. Biederman
  0 siblings, 0 replies; 9+ messages in thread
From: Eric W. Biederman @ 2010-12-25 21:40 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: hawk@comx.dk, kexec@lists.infradead.org, drosenberg@vsecurity.com,
	dle-develop@lists.sourceforge.net, linux-mm@kvack.org,
	rdunlap@xenotime.net, Andi Kleen, Seiji Aguchi,
	akpm@linuxfoundation.org, ext-andriy.shevchenko@nokia.com,
	eric.dumazet@gmail.com, x86@kernel.org, opurdila@ixiacom.com,
	mingo@redhat.com, ying.huang@intel.com, kees.cook@canonical.com,
	paulmck@linux.vnet.ibm.com, dzickus@redhat.com,
	len.brown@intel.com, seto.hidetoshi@jp.fujitsu.com,
	hadi@cyberus.ca, Borislav Petkov, tglx@linutronix.de,
	hidave.darkstar@gmail.com, eugeneteo@kernel.org, gregkh@suse.de,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Satoru Moriya, tj@kernel.org, davem@davemloft.net

"H. Peter Anvin" <hpa@zytor.com> writes:

> On 12/25/2010 09:19 AM, Eric W. Biederman wrote:
>>>
>>> So, kdump may receive wrong identifier when it starts after MCE 
>>> occurred, because MCE is reported by memory, cache, and TLB errors
>>>
>>> In the worst case, kdump will overwrite user data if it recognizes a 
>>> disk saving user data as a dump disk.
>> 
>> Absurdly unlikely there is a sha256 checksum verified over the
>> kdump kernel before it starts booting.  If you have very broken
>> memory it is possible, but absurdly unlikely that the machine will
>> even boot if you are having enough uncorrectable memory errors
>> an hour to get past the sha256 checksum and then be corrupt.
>> 
>
> That wouldn't be the likely scenario (passing a sha256 checksum with the
> wrong data due to a random event will never happen for all the computers
> on Earth before the Sun destroys the planet).  However, in a
> failing-memory scenario, the much more likely scenario is that kdump
> starts up, verifies the signature, and *then* has corruption causing it
> to write to the wrong disk or whatnot.  This is inherent in any scheme
> that allows writing to hard media after a failure (as opposed to, say,
> dumping to the network.)

Then the kdump kernel should also panic if we detect an uncorrected ECC
error.  So this doesn't appear to open any new holes for disk corruption.

kexec on panic can also be used for taking crash dumps over the
network.  What happens with the data is totally defined by userspace
code in an initrd.

Which is why extra policy knobs should be where they can be used.

Eric

* Re: [RFC][PATCH] Add a sysctl option controlling kexec when MCE occurred
       [not found] <5C4C569E8A4B9B42A84A977CF070A35B2C132F68FC@USINDEVS01.corp.hds.com>
  2010-12-23  0:29 ` [RFC][PATCH] Add a sysctl option controlling kexec when MCE occurred Greg KH
  2010-12-23  7:43 ` Andi Kleen
@ 2010-12-27  1:56 ` Hidetoshi Seto
  2 siblings, 0 replies; 9+ messages in thread
From: Hidetoshi Seto @ 2010-12-27  1:56 UTC (permalink / raw)
  To: Seiji Aguchi
  Cc: hawk@comx.dk, kexec@lists.infradead.org, drosenberg@vsecurity.com,
	linux-mm@kvack.org, rdunlap@xenotime.net, andi@firstfloor.org,
	hpa@zytor.com, akpm@linuxfoundation.org,
	ext-andriy.shevchenko@nokia.com, eric.dumazet@gmail.com,
	x86@kernel.org, opurdila@ixiacom.com, mingo@redhat.com,
	ying.huang@intel.com, kees.cook@canonical.com,
	paulmck@linux.vnet.ibm.com, dzickus@redhat.com,
	len.brown@intel.com, dle-develop@lists.sourceforge.net,
	hadi@cyberus.ca, tglx@linutronix.de, hidave.darkstar@gmail.com,
	eugeneteo@kernel.org, gregkh@suse.de, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, Satoru Moriya,
	ebiederm@xmission.com, tj@kernel.org, davem@davemloft.net

(2010/12/23 8:35), Seiji Aguchi wrote:
> Hi,
> 
> [Purpose]
> Kexec may trigger additional hardware errors and multiply the damage 
> if it works after MCE occurred because there are some hardware-related 
> operations in kexec as follows.
>   - Sending NMI to cpus
>   - Initializing hardware during boot process of second kernel.
>   - Accessing to memory and dumping it to disks.
> 
> So, I propose adding a new option controlling kexec behaviour when MCE 
> occurred.
> This patch prevents unnecessary hardware errors and avoids expanding
> the damage.
> 
> [Patch Description]
> I added a sysctl option, kernel.kexec_on_mce, controlling kexec behaviour
> when MCE occurred.
> 
>  - Permission
>    - 0644
>  - Value(default is "1")
>    - non-zero: Kexec is enabled regardless of MCE.
>    - 0: Kexec is disabled when MCE occurred.
> 
> Matrix of kernel.kexec_on_mce value, MCE and kexec behaviour
> 
> --------------------------------------------------
> kernel.kexec_on_mce| MCE          | kexec behaviour
> --------------------------------------------------
> non-zero           | occurred     | enabled
>                    |------------------------------
>                    | not occurred | enabled
> --------------------------------------------------
> 0                  | occurred     | disabled
>                    |------------------------------
>                    | not occurred | enabled
> --------------------------------------------------
> 
> Any comments and suggestions are welcome.
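
The quoted matrix reduces to a single predicate: kexec is skipped only when the knob is 0 and an MCE has occurred. A minimal sketch, assuming a hypothetical helper name (kexec_allowed) that does not appear in the proposed patch:

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the quoted matrix.
 * kexec_on_mce: value of the proposed sysctl (default 1).
 * mce_occurred: whether a machine check has been seen.
 */
static bool kexec_allowed(int kexec_on_mce, bool mce_occurred)
{
	return kexec_on_mce != 0 || !mce_occurred;
}
```

Written this way, three of the four rows are unaffected by the knob, which is the narrow scope the rest of the thread argues about.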

This reminds me of a quite similar patch that I made a long time ago
but never posted.

The following is what I found still sitting in a branch of my private git tree.
I guess it cannot be applied without a rebase, but I think the description
of my patch could give you a different point of view.
Feel free to use this debris to improve yours.


Thanks,
H.Seto

<*__NOTE_THIS_PATCH_IS_NOT_READY_TO_APPLY__*>
=====
From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Date: Fri, 10 Jul 2009 15:55:42 +0900
Subject: [PATCH] kdump, sysctl: kdump_on_safe

This patch adds a sysctl, kdump_on_safe, to restrict kdump to run only
in safe situations.

Quote from document in this patch:
 > kdump_on_safe:
 >
 > When the system experiences panic, kdump will be triggered if
 > crash kernel is configured.  However the kdump might fail if
 > the panic was caused by fatal error, such as hardware error
 > reported by machine check exception.  It should be rare case,
 > but in the worst case, it will result in data corruption and/or
 > fatal damage on the hardware.
 >
 > If this flag is 1, it prevents kdump from running on such
 > unstable system situation.  Default is 0.

This is a possible option if your hardware can provide a good error
report (in the SEL etc.) and/or the kernel can provide enough other data
for error investigation (console log, mcelog on x86, etc.), and you'd
like to reduce downtime by skipping kdump in such situations.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 Documentation/sysctl/kernel.txt  |   15 +++++++++++++++
 arch/x86/kernel/cpu/mcheck/mce.c |    3 +++
 include/linux/kexec.h            |    3 +++
 kernel/kexec.c                   |    8 ++++++++
 kernel/sysctl.c                  |   13 +++++++++++++
 5 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 3894eaa..9d66ab9 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -33,6 +33,7 @@ show up in /proc/sys/kernel:
 - hotplug
 - java-appletviewer           [ binfmt_java, obsolete ]
 - java-interpreter            [ binfmt_java, obsolete ]
+- kdump_on_safe               [ kexec ]
 - kstack_depth_to_print       [ X86 only ]
 - l2cr                        [ PPC only ]
 - modprobe                    ==> Documentation/debugging-modules.txt
@@ -247,6 +248,20 @@ This flag controls the L2 cache of G3 processor boards. If
 
 ==============================================================
 
+kdump_on_safe:
+
+When the system experiences panic, kdump will be triggered if
+crash kernel is configured.  However the kdump might fail if
+the panic was caused by fatal error, such as hardware error
+reported by machine check exception.  It should be rare case,
+but in the worst case, it will result in data corruption and/or
+fatal damage on the hardware.
+
+If this flag is 1, it prevents kdump from running on such
+unstable system situation.  Default is 0.
+
+==============================================================
+
 kstack_depth_to_print: (X86 only)
 
 Controls the number of words to print when dumping the raw
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 3e2ab18..c93bb38 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -23,6 +23,7 @@
 #include <linux/sysdev.h>
 #include <linux/delay.h>
 #include <linux/ctype.h>
+#include <linux/kexec.h>
 #include <linux/sched.h>
 #include <linux/sysfs.h>
 #include <linux/types.h>
@@ -291,6 +292,8 @@ static void mce_panic(char *msg, struct mce *final, char *exp)
 	int cpu;
 
 	if (!fake_panic) {
+		set_kdump_might_fail();
+
 		/*
 		 * Make sure only one CPU runs in machine check panic
 		 */
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 03e8e8d..41e9ab0 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -209,10 +209,13 @@ int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,
 int crash_shrink_memory(unsigned long new_size);
 size_t crash_get_memory_size(void);
 
+extern int kdump_might_fail;
+static inline void set_kdump_might_fail(void) { kdump_might_fail = 1; }
 #else /* !CONFIG_KEXEC */
 struct pt_regs;
 struct task_struct;
 static inline void crash_kexec(struct pt_regs *regs) { }
 static inline int kexec_should_crash(struct task_struct *p) { return 0; }
+static inline void set_kdump_might_fail(void) { }
 #endif /* CONFIG_KEXEC */
 #endif /* LINUX_KEXEC_H */
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 87ebe8a..182c2f3 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -40,6 +40,9 @@
 #include <asm/system.h>
 #include <asm/sections.h>
 
+int kdump_on_safe;
+int kdump_might_fail;
+
 /* Per cpu memory for storing cpu states in case of system crash. */
 note_buf_t __percpu *crash_notes;
 
@@ -1064,6 +1067,11 @@ asmlinkage long compat_sys_kexec_load(unsigned long entry,
 
 void crash_kexec(struct pt_regs *regs)
 {
+	if (kdump_on_safe && kdump_might_fail) {
+		printk(KERN_EMERG "kexec cancelled due to unstable system.\n");
+		return;
+	}
+
 	/* Take the kexec_mutex here to prevent sys_kexec_load
 	 * running on one cpu from replacing the crash kernel
 	 * we are using after a panic on a different cpu.
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8686b0f..8564e5c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -156,6 +156,10 @@ extern int unaligned_dump_stack;
 
 extern struct ratelimit_state printk_ratelimit_state;
 
+#ifdef CONFIG_KEXEC
+extern int kdump_on_safe;
+#endif
+
 #ifdef CONFIG_PROC_SYSCTL
 static int proc_do_cad_pid(struct ctl_table *table, int write,
 		  void __user *buffer, size_t *lenp, loff_t *ppos);
@@ -926,6 +930,15 @@ static struct ctl_table kern_table[] = {
 		.proc_handler	= proc_dointvec,
 	},
 #endif
+#ifdef CONFIG_KEXEC
+	{
+		.procname	= "kdump_on_safe",
+		.data		= &kdump_on_safe,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+#endif
 /*
  * NOTE: do not add new entries to this table unless you have read
  * Documentation/sysctl/ctl_unnumbered.txt
-- 
1.7.3.2
</*__NOTE_THIS_PATCH_IS_NOT_READY_TO_APPLY__*>



