* [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca
@ 2008-04-08 4:31 Hidetoshi Seto
2008-04-08 17:36 ` Jay Lan
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: Hidetoshi Seto @ 2008-04-08 4:31 UTC (permalink / raw)
To: linux-ia64
While it is convenient to be able to invoke kdump by asserting INIT
(via a button on the chassis, etc.), there are situations where
invoking kdump on a fatal MCA is unwelcome and rebooting quickly
without a dump is preferred.
This patch adds a new flag, 'kdump_on_fatal_mca', that is independent
of the existing 'kdump_on_init' flag. With it, kdump can be turned
on or off per event: INIT and/or fatal MCA.
Thanks,
H.Seto
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
arch/ia64/kernel/crash.c | 31 +++++++++++++++++++++++--------
arch/ia64/kernel/mca.c | 6 +-----
2 files changed, 24 insertions(+), 13 deletions(-)
Index: GIT-torvalds/arch/ia64/kernel/crash.c
=================================
--- GIT-torvalds.orig/arch/ia64/kernel/crash.c
+++ GIT-torvalds/arch/ia64/kernel/crash.c
@@ -24,6 +24,7 @@ int kdump_status[NR_CPUS];
static atomic_t kdump_cpu_frozen;
atomic_t kdump_in_progress;
static int kdump_on_init = 1;
+static int kdump_on_fatal_mca = 0;
static inline Elf64_Word
*append_elf_note(Elf64_Word *buf, char *name, unsigned type, void *data,
@@ -148,7 +149,7 @@ kdump_init_notifier(struct notifier_bloc
struct ia64_mca_notify_die *nd;
struct die_args *args = data;
- if (!kdump_on_init)
+ if (!kdump_on_init && !kdump_on_fatal_mca)
return NOTIFY_DONE;
if (!ia64_kimage) {
@@ -174,11 +175,14 @@ kdump_init_notifier(struct notifier_bloc
switch (val) {
case DIE_INIT_MONARCH_PROCESS:
- atomic_set(&kdump_in_progress, 1);
- *(nd->monarch_cpu) = -1;
+ if (kdump_on_init) {
+ atomic_set(&kdump_in_progress, 1);
+ *(nd->monarch_cpu) = -1;
+ }
break;
case DIE_INIT_MONARCH_LEAVE:
- machine_kdump_on_init();
+ if (kdump_on_init)
+ machine_kdump_on_init();
break;
case DIE_INIT_SLAVE_LEAVE:
if (atomic_read(&kdump_in_progress))
@@ -189,16 +193,19 @@ kdump_init_notifier(struct notifier_bloc
unw_init_running(kdump_cpu_freeze, NULL);
break;
case DIE_MCA_MONARCH_LEAVE:
- /* die_register->signr indicate if MCA is recoverable */
- if (!args->signr)
+ /* die_register->signr indicate if MCA is recoverable */
+ if (kdump_on_fatal_mca && !args->signr) {
+ atomic_set(&kdump_in_progress, 1);
+ *(nd->monarch_cpu) = -1;
machine_kdump_on_init();
+ }
break;
}
return NOTIFY_DONE;
}
#ifdef CONFIG_SYSCTL
-static ctl_table kdump_on_init_table[] = {
+static ctl_table kdump_ctl_table[] = {
{
.ctl_name = CTL_UNNUMBERED,
.procname = "kdump_on_init",
@@ -207,6 +214,14 @@ static ctl_table kdump_on_init_table[]
.mode = 0644,
.proc_handler = &proc_dointvec,
},
+ {
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "kdump_on_fatal_mca",
+ .data = &kdump_on_fatal_mca,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
{ .ctl_name = 0 }
};
@@ -215,7 +230,7 @@ static ctl_table sys_table[] = {
.ctl_name = CTL_KERN,
.procname = "kernel",
.mode = 0555,
- .child = kdump_on_init_table,
+ .child = kdump_ctl_table,
},
{ .ctl_name = 0 }
};
Index: GIT-torvalds/arch/ia64/kernel/mca.c
=================================
--- GIT-torvalds.orig/arch/ia64/kernel/mca.c
+++ GIT-torvalds/arch/ia64/kernel/mca.c
@@ -1266,16 +1266,12 @@ ia64_mca_handler(struct pt_regs *regs, s
} else {
/* Dump buffered message to console */
ia64_mlogbuf_finish(1);
-#ifdef CONFIG_KEXEC
- atomic_set(&kdump_in_progress, 1);
- monarch_cpu = -1;
-#endif
}
+
if (notify_die(DIE_MCA_MONARCH_LEAVE, "MCA", regs, (long)&nd, 0, recover)
== NOTIFY_STOP)
ia64_mca_spin(__func__);
-
if (atomic_dec_return(&mca_count) > 0) {
int i;
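A minimal user-space sketch of the decision logic in the crash.c hunks above may help when reading the patch. It is not kernel code: `die_event`, `kdump_flags`, and `would_start_kdump` are hypothetical names introduced here for illustration, and the `ia64_kimage` check is left out. It only models which flag gates the call to machine_kdump_on_init() for each event.

```c
#include <stdbool.h>

/* Local stand-ins for the notifier events touched by the patch.
 * These are NOT the kernel's enum values. */
enum die_event {
    EV_INIT_MONARCH_PROCESS,
    EV_INIT_MONARCH_LEAVE,
    EV_MCA_MONARCH_LEAVE,
};

/* Hypothetical container for the two sysctl-backed flags. */
struct kdump_flags {
    bool on_init;      /* models kdump_on_init */
    bool on_fatal_mca; /* models kdump_on_fatal_mca */
};

/* Returns true when the patched notifier would call
 * machine_kdump_on_init() for this event.
 * `recoverable` models args->signr (nonzero means recoverable MCA). */
static bool would_start_kdump(struct kdump_flags f, enum die_event ev,
                              bool recoverable)
{
    /* Early exit, as in the patch: nothing to do if both flags are off. */
    if (!f.on_init && !f.on_fatal_mca)
        return false;

    switch (ev) {
    case EV_INIT_MONARCH_LEAVE:
        /* INIT path is gated only by kdump_on_init. */
        return f.on_init;
    case EV_MCA_MONARCH_LEAVE:
        /* MCA path is gated by kdump_on_fatal_mca and only fires
         * when the MCA is not recoverable. */
        return f.on_fatal_mca && !recoverable;
    default:
        /* EV_INIT_MONARCH_PROCESS only sets state; no dump starts here. */
        return false;
    }
}
```

Under this model, a fatal MCA with only kdump_on_init set does not start a dump, which is the behavior change relative to the code the patch removes from mca.c.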
* Re: [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca
2008-04-08 4:31 [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca Hidetoshi Seto
@ 2008-04-08 17:36 ` Jay Lan
2008-04-08 17:51 ` Luck, Tony
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Jay Lan @ 2008-04-08 17:36 UTC (permalink / raw)
To: linux-ia64
Hidetoshi Seto wrote:
> While it is convenient to be able to invoke kdump by asserting INIT
> (via a button on the chassis, etc.), there are situations where
> invoking kdump on a fatal MCA is unwelcome and rebooting quickly
> without a dump is preferred.
>
> This patch adds a new flag, 'kdump_on_fatal_mca', that is independent
> of the existing 'kdump_on_init' flag. With it, kdump can be turned
> on or off per event: INIT and/or fatal MCA.
>
> Thanks,
> H.Seto
>
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
>
> ---
> arch/ia64/kernel/crash.c | 31 +++++++++++++++++++++++--------
> arch/ia64/kernel/mca.c | 6 +-----
> 2 files changed, 24 insertions(+), 13 deletions(-)
>
> Index: GIT-torvalds/arch/ia64/kernel/crash.c
> =================================
> --- GIT-torvalds.orig/arch/ia64/kernel/crash.c
> +++ GIT-torvalds/arch/ia64/kernel/crash.c
> @@ -24,6 +24,7 @@ int kdump_status[NR_CPUS];
> static atomic_t kdump_cpu_frozen;
> atomic_t kdump_in_progress;
> static int kdump_on_init = 1;
> +static int kdump_on_fatal_mca = 0;
I think kdump_on_fatal_mca should be set to 1 by default. Fatal
mca is exactly the case where we need a dump to analyze the problem.
Thanks,
- jay
>
> static inline Elf64_Word
> *append_elf_note(Elf64_Word *buf, char *name, unsigned type, void *data,
> @@ -148,7 +149,7 @@ kdump_init_notifier(struct notifier_bloc
> struct ia64_mca_notify_die *nd;
> struct die_args *args = data;
>
> - if (!kdump_on_init)
> + if (!kdump_on_init && !kdump_on_fatal_mca)
> return NOTIFY_DONE;
>
> if (!ia64_kimage) {
> @@ -174,11 +175,14 @@ kdump_init_notifier(struct notifier_bloc
>
> switch (val) {
> case DIE_INIT_MONARCH_PROCESS:
> - atomic_set(&kdump_in_progress, 1);
> - *(nd->monarch_cpu) = -1;
> + if (kdump_on_init) {
> + atomic_set(&kdump_in_progress, 1);
> + *(nd->monarch_cpu) = -1;
> + }
> break;
> case DIE_INIT_MONARCH_LEAVE:
> - machine_kdump_on_init();
> + if (kdump_on_init)
> + machine_kdump_on_init();
> break;
> case DIE_INIT_SLAVE_LEAVE:
> if (atomic_read(&kdump_in_progress))
> @@ -189,16 +193,19 @@ kdump_init_notifier(struct notifier_bloc
> unw_init_running(kdump_cpu_freeze, NULL);
> break;
> case DIE_MCA_MONARCH_LEAVE:
> - /* die_register->signr indicate if MCA is recoverable */
> - if (!args->signr)
> + /* die_register->signr indicate if MCA is recoverable */
> + if (kdump_on_fatal_mca && !args->signr) {
> + atomic_set(&kdump_in_progress, 1);
> + *(nd->monarch_cpu) = -1;
> machine_kdump_on_init();
> + }
> break;
> }
> return NOTIFY_DONE;
> }
>
> #ifdef CONFIG_SYSCTL
> -static ctl_table kdump_on_init_table[] = {
> +static ctl_table kdump_ctl_table[] = {
> {
> .ctl_name = CTL_UNNUMBERED,
> .procname = "kdump_on_init",
> @@ -207,6 +214,14 @@ static ctl_table kdump_on_init_table[]
> .mode = 0644,
> .proc_handler = &proc_dointvec,
> },
> + {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "kdump_on_fatal_mca",
> + .data = &kdump_on_fatal_mca,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec,
> + },
> { .ctl_name = 0 }
> };
>
> @@ -215,7 +230,7 @@ static ctl_table sys_table[] = {
> .ctl_name = CTL_KERN,
> .procname = "kernel",
> .mode = 0555,
> - .child = kdump_on_init_table,
> + .child = kdump_ctl_table,
> },
> { .ctl_name = 0 }
> };
> Index: GIT-torvalds/arch/ia64/kernel/mca.c
> =================================
> --- GIT-torvalds.orig/arch/ia64/kernel/mca.c
> +++ GIT-torvalds/arch/ia64/kernel/mca.c
> @@ -1266,16 +1266,12 @@ ia64_mca_handler(struct pt_regs *regs, s
> } else {
> /* Dump buffered message to console */
> ia64_mlogbuf_finish(1);
> -#ifdef CONFIG_KEXEC
> - atomic_set(&kdump_in_progress, 1);
> - monarch_cpu = -1;
> -#endif
> }
> +
> if (notify_die(DIE_MCA_MONARCH_LEAVE, "MCA", regs, (long)&nd, 0, recover)
> == NOTIFY_STOP)
> ia64_mca_spin(__func__);
>
> -
> if (atomic_dec_return(&mca_count) > 0) {
> int i;
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* RE: [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca
2008-04-08 4:31 [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca Hidetoshi Seto
2008-04-08 17:36 ` Jay Lan
@ 2008-04-08 17:51 ` Luck, Tony
2008-04-08 18:19 ` Jay Lan
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Luck, Tony @ 2008-04-08 17:51 UTC (permalink / raw)
To: linux-ia64
> I think kdump_on_fatal_mca should be set to 1 by default. Fatal
> mca is exactly the case where we need a dump to analyze the problem.
I'm not so sure. If the fatal MCA was caused by the s/w doing something
wrong (e.g. accessing nonexistent memory), then a dump is useful to find out
what went wrong.
But if the MCA was caused by some h/w error (e.g. 2xECC bit error in kernel
memory), then a dump won't help.
Perhaps the dump would help distinguish the s/w case from the h/w case?
-Tony
* Re: [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca
2008-04-08 4:31 [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca Hidetoshi Seto
2008-04-08 17:36 ` Jay Lan
2008-04-08 17:51 ` Luck, Tony
@ 2008-04-08 18:19 ` Jay Lan
2008-04-08 18:21 ` Russ Anderson
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Jay Lan @ 2008-04-08 18:19 UTC (permalink / raw)
To: linux-ia64
Luck, Tony wrote:
>> I think kdump_on_fatal_mca should be set to 1 by default. Fatal
>> mca is exactly the case where we need a dump to analyze the problem.
>
> I'm not so sure. If the fatal MCA was caused by the s/w doing something
> wrong (e.g. accessing nonexistent memory), then a dump is useful to find out
> what went wrong.
>
> But if the MCA was caused by some h/w error (e.g. 2xECC bit error in kernel
> memory), then a dump won't help.
>
> Perhaps the dump would help distinguish the s/w case from the h/w case?
With a dump, certainly it would be very clear, but that is not the
point.
When one sets aside a big chunk of memory for crashkernel, I think he
would like to get the dump when the kernel crashes. Trying to recreate a
kernel crash on a large production system is expensive, and the crash is
sometimes not readily reproducible.
Taking a dump on fatal MCA is the (default) behavior today, isn't it?
- jay
>
> -Tony
* Re: [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca
2008-04-08 4:31 [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca Hidetoshi Seto
` (2 preceding siblings ...)
2008-04-08 18:19 ` Jay Lan
@ 2008-04-08 18:21 ` Russ Anderson
2008-04-08 18:34 ` Jay Lan
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Russ Anderson @ 2008-04-08 18:21 UTC (permalink / raw)
To: linux-ia64
On Tue, Apr 08, 2008 at 10:51:42AM -0700, Luck, Tony wrote:
> > I think kdump_on_fatal_mca should be set to 1 by default. Fatal
> > mca is exactly the case where we need a dump to analyze the problem.
>
> I'm not so sure. If the fatal MCA was caused by the s/w doing something
> wrong (e.g. accessing nonexistent memory), then a dump is useful to find out
> what went wrong.
>
> But if the MCA was caused by some h/w error (e.g. 2xECC bit error in kernel
> memory), then a dump won't help.
>
> Perhaps the dump would help distinguish the s/w case from the h/w case?
Yes. We generally try to take a dump after a crash to collect all
the available data. The analysis of the data (to determine h/w or
s/w) occurs after the reboot.
As an alternative, could kdump_on_fatal_mca be turned on by
default in Altix (in the Altix specific boot code)? Then
we could set our default without impacting other vendors.
Thanks,
--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc rja@sgi.com
* Re: [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca
2008-04-08 4:31 [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca Hidetoshi Seto
` (3 preceding siblings ...)
2008-04-08 18:21 ` Russ Anderson
@ 2008-04-08 18:34 ` Jay Lan
2008-04-08 18:54 ` Luck, Tony
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Jay Lan @ 2008-04-08 18:34 UTC (permalink / raw)
To: linux-ia64
Russ Anderson wrote:
> On Tue, Apr 08, 2008 at 10:51:42AM -0700, Luck, Tony wrote:
>>> I think kdump_on_fatal_mca should be set to 1 by default. Fatal
>>> mca is exactly the case where we need a dump to analyze the problem.
>> I'm not so sure. If the fatal MCA was caused by the s/w doing something
>> wrong (e.g. accessing nonexistent memory), then a dump is useful to find out
>> what went wrong.
>>
>> But if the MCA was caused by some h/w error (e.g. 2xECC bit error in kernel
>> memory), then a dump won't help.
>>
>> Perhaps the dump would help distinguish the s/w case from the h/w case?
>
> Yes. We generally try to take a dump after a crash to collect all
> the available data. The analysis of the data (to determine h/w or
> s/w) occurs after the reboot.
>
> As an alternative, could kdump_on_fatal_mca be turned on by
> default in Altix (in the Altix specific boot code)? Then
> we could set our default without impacting other vendors.
Yes, Propack can set the configuration on Altix. I saw that the proposed
patch changes the current behavior without mentioning it
in the summary, so I brought it up. ;)
- jay
>
> Thanks,
* RE: [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca
2008-04-08 4:31 [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca Hidetoshi Seto
` (4 preceding siblings ...)
2008-04-08 18:34 ` Jay Lan
@ 2008-04-08 18:54 ` Luck, Tony
2008-04-09 0:36 ` Hidetoshi Seto
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Luck, Tony @ 2008-04-08 18:54 UTC (permalink / raw)
To: linux-ia64
> As an alternative, could kdump_on_fatal_mca be turned on by
> default in Altix (in the Altix specific boot code)? Then
> we could set our default without impacting other vendors.
This is a sysctl variable ... are there any cases where we
set these to different default values for different platforms?
It would seem better for the kernel to set the default value
to the most useful one, and for system owners that don't want
the default behaviour to just add an entry to /etc/sysctl.conf
to set their preference.
I'm not sure how we decide what is the most useful default.
So far SGI are voting for enabling dumping for MCA, and
Fujitsu have implicitly voted to disable (as that is what
Seto-san's patch sets). Any other opinions?
-Tony
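As a rough illustration of the per-system override described above: since the patch registers "kdump_on_fatal_mca" under the "kernel" sysctl directory, the /etc/sysctl.conf entry would presumably be `kernel.kdump_on_fatal_mca = 0` (or `= 1`), and the same value should be reachable at /proc/sys/kernel/kdump_on_fatal_mca. The sketch below shows the runtime equivalent in C. The helper names `write_flag` and `read_flag` are hypothetical, and the path is an assumption derived from the sysctl table in the patch.

```c
#include <stdio.h>

/* Path implied by the patch's sysctl table ("kernel" -> "kdump_on_fatal_mca").
 * Assumed, not confirmed in the thread. */
#define KDUMP_ON_FATAL_MCA_PATH "/proc/sys/kernel/kdump_on_fatal_mca"

/* Write 0 or 1 to a sysctl-style file.
 * Returns 0 on success, -1 on failure (e.g. file missing or no permission). */
static int write_flag(const char *path, int enabled)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Any nonzero input is normalized to 1. */
    int ok = fprintf(fp, "%d\n", enabled ? 1 : 0) > 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the integer flag back into *value.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
static int read_flag(const char *path, int *value)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int ok = fscanf(fp, "%d", value) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

On a kernel carrying this patch, calling `write_flag(KDUMP_ON_FATAL_MCA_PATH, 0)` as root would be expected to disable dumping on fatal MCA until the next reboot; the sysctl.conf entry makes the choice persistent.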
* Re: [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca
2008-04-08 4:31 [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca Hidetoshi Seto
` (5 preceding siblings ...)
2008-04-08 18:54 ` Luck, Tony
@ 2008-04-09 0:36 ` Hidetoshi Seto
2008-04-09 21:59 ` Luck, Tony
2008-04-09 23:01 ` Bjorn Helgaas
8 siblings, 0 replies; 10+ messages in thread
From: Hidetoshi Seto @ 2008-04-09 0:36 UTC (permalink / raw)
To: linux-ia64
>> I saw that the proposed
>> patch changes the current behavior without mentioning it
>> in the summary, so I brought it up. ;)
Sorry about that. It's my fault.
But as a result, luckily, we are having a good
discussion about the "required action after a fatal MCA".
> So far SGI are voting for enabling dumping for MCA, and
> Fujitsu have implicitly voted to disable (as that is what
> Seto-san's patch sets). Any other opinions?
Ah, my vote may not count.
AFAIK, Fujitsu has both situations: ones where kdump_on_fatal_mca
should be enabled and ones where it should be disabled.
Therefore I don't mind which is the default.
Either way, we need to add a line to /etc/sysctl.conf for one of the two cases.
So now "enable vs. disable" is not 1:1 but 1:0, or 1.5:0.5.
Tony, (if there are no other votes before your poll closes)
would you apply my patch with the default changed to 1?
Or do I need to resend the whole patch with just that one byte changed?
Thanks,
H.Seto
* RE: [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca
2008-04-08 4:31 [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca Hidetoshi Seto
` (6 preceding siblings ...)
2008-04-09 0:36 ` Hidetoshi Seto
@ 2008-04-09 21:59 ` Luck, Tony
2008-04-09 23:01 ` Bjorn Helgaas
8 siblings, 0 replies; 10+ messages in thread
From: Luck, Tony @ 2008-04-09 21:59 UTC (permalink / raw)
To: linux-ia64
> Tony, (if there are no other votes before your poll closes)
> would you apply my patch with the default changed to 1?
> Or do I need to resend the whole patch with just that one byte changed?
Change is not even one byte ... just one bit ('0' = 0x30,
'1' = 0x31 :-)
I made the change and committed it to my "test" branch.
-Tony
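The one-bit remark can be checked with a short sketch, assuming an ASCII execution character set so that '0' is 0x30 and '1' is 0x31. The helper `differing_bits` is a hypothetical name introduced here; it XORs two bytes and counts the set bits.

```c
/* Count how many bit positions differ between two bytes. */
static int differing_bits(unsigned char a, unsigned char b)
{
    unsigned char x = a ^ b; /* set bits mark the differing positions */
    int n = 0;

    while (x) {
        n += x & 1;
        x >>= 1;
    }
    return n;
}
```

Under the ASCII assumption, `differing_bits('0', '1')` evaluates to 1, since 0x30 and 0x31 differ only in the lowest bit.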
* Re: [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca
2008-04-08 4:31 [PATCH 1/2] ia64 kdump: add kdump_on_fatal_mca Hidetoshi Seto
` (7 preceding siblings ...)
2008-04-09 21:59 ` Luck, Tony
@ 2008-04-09 23:01 ` Bjorn Helgaas
8 siblings, 0 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2008-04-09 23:01 UTC (permalink / raw)
To: linux-ia64
On Tuesday 08 April 2008 12:54:13 pm Luck, Tony wrote:
> > As an alternative, could kdump_on_fatal_mca be turned on by
> > default in Altix (in the Altix specific boot code)? Then
> > we could set our default without impacting other vendors.
>
> This is a sysctl variable ... are there any cases where we
> set these to different default values for different platforms?
>
> It would seem better for the kernel to set the default value
> to the most useful one, and for system owners that don't want
> the default behaviour to just add an entry to /etc/sysctl.conf
> to set their preference.
>
> I'm not sure how we decide what is the most useful default.
> So far SGI are voting for enabling dumping for MCA, and
> Fujitsu have implicitly voted to disable (as that is what
> Seto-san's patch sets). Any other opinions?
I'd like dumping on MCA to be enabled by default.