Re: [RFC][PATCH v2] Controlling kexec behaviour when hardware error happened.

From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: "hpa@zytor.com" <hpa@zytor.com>,
	"andi@firstfloor.org" <andi@firstfloor.org>,
	"ebiederm@xmission.com" <ebiederm@xmission.com>,
	"bp@alien8.de" <bp@alien8.de>, "gregkh@suse.de" <gregkh@suse.de>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"dle-develop@lists.sourceforge.net" 
	<dle-develop@lists.sourceforge.net>,
	"amwang@redhat.com" <amwang@redhat.com>,
	Satoru Moriya <satoru.moriya@hds.com>
Subject: Re: [RFC][PATCH v2] Controlling kexec behaviour when hardware error happened.
Date: Thu, 10 Feb 2011 17:36:58 +0900
Message-ID: <4D53A3AA.5050908@jp.fujitsu.com>
In-Reply-To: <5C4C569E8A4B9B42A84A977CF070A35B2C1494DBE0@USINDEVS01.corp.hds.com>

(2011/02/10 1:35), Seiji Aguchi wrote:
> Hi,
> 
> I submitted a quite similar patch last December.
> 
> http://www.spinics.net/lists/linux-mm/msg13157.html
> 
> I retry it with different description of the purpose.
> 
> [Changelog]
> from v1:
>     - Change name of sysctl parameter ,kexec_on_mce, to kexec_on_hwerr. 
>     - Move variable declaration from <asm/mce.h> to <kernel/panic.h>.
>     - Remove CONFIG_X86_MCE in *.c files.
>     - Modify [Purpose]/[Patch Description].
> 
> [Purpose]
> There are some logging features of firmware/hardware, SEL,BMC, etc, in enterprise servers.
> We investigate the firmware/hardware logs first when MCE occurred and replace the broken hardware.
> So, memory dump is not necessary for detecting root cause of machine check.
> Also, we can reduce down-time by skipping kdump.
> 
> Of course, there are a lot of servers which don't have logging features of firmware/hardware.
> So, I proposed a option controlling kexec behaviour when hardware error occurred. 
> 
> [Patch Description]
> This patch adds a sysctl option ,kernel.kexec_on_hwerr, controlling kexec behaviour when hardware error occurred.
> 
>  - Permission
>    - 0644
>  - Value(default is "1")
>    - non-zero: Kexec is enabled regardless of hardware error.
>    - 0: Kexec is disabled when MCE occurred.
>    
> 
> Matrix of kernel.kexec_on_hwerr value ,hardware error and kexec
> 
> --------------------------------------------------
> kernel.kexec_on_hwerr| hardware error | kexec
> --------------------------------------------------
> non-zero             | occurred       | enabled
>                      -----------------------------
>                      | not occurred   | enabled
> --------------------------------------------------
> 0                    | occurred       | disabled
>                      |----------------------------
>                      | not occurred   | enabled
> --------------------------------------------------
> 
> 
> Any comments and suggestions are welcome.
> 
>  Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
> 
> ---
>  Documentation/sysctl/kernel.txt  |   11 +++++++++++
>  arch/x86/kernel/cpu/mcheck/mce.c |    2 ++
>  include/linux/kernel.h           |    2 ++
>  include/linux/sysctl.h           |    1 +
>  kernel/panic.c                   |   15 ++++++++++++++-
>  kernel/sysctl.c                  |    8 ++++++++
>  kernel/sysctl_binary.c           |    1 +
>  mm/memory-failure.c              |    2 ++
>  8 files changed, 41 insertions(+), 1 deletions(-)
> 
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index 11d5ced..3159111 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -34,6 +34,7 @@ show up in /proc/sys/kernel:
>  - hotplug
>  - java-appletviewer           [ binfmt_java, obsolete ]
>  - java-interpreter            [ binfmt_java, obsolete ]
> +- kexec_on_hwerr              [ x86 only ]
>  - kptr_restrict
>  - kstack_depth_to_print       [ X86 only ]
>  - l2cr                        [ PPC only ]
> @@ -261,6 +262,16 @@ This flag controls the L2 cache of G3 processor boards. If
>  0, the cache is disabled. Enabled if nonzero.
>  
>  ==============================================================
> +kexec_on_hwerr: (X86 only)
> +
> +Controls the behaviour of kexec when panic occurred due to hardware 
> +error.
> +Default value is 1.
> +
> +0: Kexec is disabled.
> +non-zero: Kexec is enabled.
> +
> +==============================================================
>  
>  kptr_restrict:
>  
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index d916183..e76b47b 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -944,6 +944,8 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>  
>  	percpu_inc(mce_exception_count);
>  
> +	hwerr_flag = 1;
> +
>  	if (notify_die(DIE_NMI, "machine check", regs, error_code,
>  			   18, SIGKILL) == NOTIFY_STOP)
>  		goto out;

x86 now supports some recoverable machine checks, so setting the
flag here will prevent kexec from running later on systems that
have hit such a recoverable machine check and recovered from it.

I think mce_panic() is the proper place to set "hwerr_flag".

> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 2fe6e84..c2fba7c 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -242,6 +242,8 @@ extern void add_taint(unsigned flag);
>  extern int test_taint(unsigned flag);
>  extern unsigned long get_taint(void);
>  extern int root_mountflags;
> +extern int kexec_on_hwerr;
> +extern int hwerr_flag;
>  
>  extern bool early_boot_irqs_disabled;
>  
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index 7bb5cb6..8ae5bfe 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -153,6 +153,7 @@ enum
>  	KERN_MAX_LOCK_DEPTH=74, /* int: rtmutex's maximum lock depth */
>  	KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */
>  	KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */
> +	KERN_KEXEC_ON_HWERR=77, /* int: bevaviour of kexec for hardware error */
>  };
>  
>  
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 991bb87..84c1d2e 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -28,6 +28,8 @@
>  #define PANIC_BLINK_SPD 18
>  
>  int panic_on_oops;
> +int kexec_on_hwerr = 1;
> +int hwerr_flag;
>  static unsigned long tainted_mask;
>  static int pause_on_oops;
>  static int pause_on_oops_flag;
> @@ -45,6 +47,16 @@ static long no_blink(int state)
>  	return 0;
>  }
>  
> +static int kexec_should_skip(void)
> +{
> +	if (!kexec_on_hwerr && hwerr_flag) {
> +		printk(KERN_WARNING "Kexec is skipped because hardware error "
> +		       "occurred.\n");
> +		return 1;
> +	}
> +	return 0;
> +}
> +
>  /* Returns how long it waited in ms */
>  long (*panic_blink)(int state);
>  EXPORT_SYMBOL(panic_blink);
> @@ -86,7 +98,8 @@ NORET_TYPE void panic(const char * fmt, ...)
>  	 * everything else.
>  	 * Do we want to call this before we try to display a message?
>  	 */
> -	crash_kexec(NULL);
> +	if (!kexec_should_skip())
> +		crash_kexec(NULL);
>  
>  	kmsg_dump(KMSG_DUMP_PANIC);
>  
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 0f1bd83..f78edd8 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -811,6 +811,14 @@ static struct ctl_table kern_table[] = {
>  		.mode		= 0644,
>  		.proc_handler	= proc_dointvec,
>  	},
> +	{
> +		.procname	= "kexec_on_hwerr",
> +		.data		= &kexec_on_hwerr,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,
> +	},
> +
>  #endif
>  #if defined(CONFIG_MMU)
>  	{
> diff --git a/kernel/sysctl_binary.c b/kernel/sysctl_binary.c
> index b875bed..8d572ca 100644
> --- a/kernel/sysctl_binary.c
> +++ b/kernel/sysctl_binary.c
> @@ -137,6 +137,7 @@ static const struct bin_table bin_kern_table[] = {
>  	{ CTL_INT,	KERN_COMPAT_LOG,		"compat-log" },
>  	{ CTL_INT,	KERN_MAX_LOCK_DEPTH,		"max_lock_depth" },
>  	{ CTL_INT,	KERN_PANIC_ON_NMI,		"panic_on_unrecovered_nmi" },
> +	{ CTL_INT,	KERN_KEXEC_ON_HWERR,		"kexec_on_hwerr" },
>  	{}
>  };
>  
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 0207c2f..0178f47 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -994,6 +994,8 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
>  	int res;
>  	unsigned int nr_pages;
>  
> +	hwerr_flag = 1;
> +
>  	if (!sysctl_memory_failure_recovery)
>  		panic("Memory failure from trap %d on page %lx", trapno, pfn);
>  

For a similar reason, setting the flag here is not good for
systems that keep running after isolating a poisoned memory page.

Why not:
 if (!sysctl_memory_failure_recovery) {
 	hwerr_flag = 1;
 	panic("Memory failure from trap %d on page %lx", trapno, pfn);
 }
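
Both comments come down to the same rule: set hwerr_flag only on the
path that is about to panic, not on every hardware error event. The
following is a rough userspace model of that rule, not kernel code.
kexec_on_hwerr, hwerr_flag and kexec_should_skip() follow the patch;
the model_* functions and their parameters are hypothetical stand-ins
for do_machine_check()/mce_panic() and __memory_failure().

```c
#include <stdbool.h>

/* Userspace model only, not kernel code. The two globals use the
 * names from the patch; everything prefixed model_ is hypothetical. */
static int kexec_on_hwerr = 1;	/* sysctl value, default 1 */
static int hwerr_flag;		/* set when a fatal hardware error is seen */

/* Same condition as kexec_should_skip() in the patch, without printk(). */
static bool kexec_should_skip(void)
{
	return !kexec_on_hwerr && hwerr_flag;
}

/* Models do_machine_check(): 'fatal' stands for the handler deciding
 * that it has to call mce_panic(). */
static void model_machine_check(bool fatal)
{
	if (fatal)
		hwerr_flag = 1;	/* only the panic path marks the error */
}

/* Models __memory_failure(): 'recovery' stands for
 * sysctl_memory_failure_recovery. */
static void model_memory_failure(int recovery)
{
	if (!recovery)
		hwerr_flag = 1;	/* about to panic */
}
```

With kexec_on_hwerr = 0, a recovered machine check or an isolated
poisoned page leaves hwerr_flag at 0 in this model, so a later
unrelated panic would still run kexec. Only the fatal paths make
kexec_should_skip() return true.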

Thanks,
H.Seto

