From: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: "Don Zickus" <dzickus@redhat.com>, "Baoquan He" <bhe@redhat.com>,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	"\"Hatayama, Daisuke/畑山 大輔\"" <d.hatayama@jp.fujitsu.com>,
	mingo@redhat.com, ebiederm@xmission.com,
	hidehiro.kawai.ez@hitachi.com, akpm@linux-foundation.org,
	bp@suse.de, "Ingo Molnar" <mingo@kernel.org>
Subject: Re: [PATCH v2] kernel/panic/kexec: fix "crash_kexec_post_notifiers" option issue in oops path
Date: Tue, 24 Mar 2015 12:58:02 +0900
Message-ID: <5510E0CA.5000507@hitachi.com>
In-Reply-To: <20150323143158.GB3172@redhat.com>

(2015/03/23 23:31), Vivek Goyal wrote:
[...]
>>>> Secondly, and more importantly, the whole premise of commit 
>>>> f06e5153f4ae is broken IMHO:
>>>>
>>>>  "This can help rare situations where kdump fails because of unstable
>>>>   crashed kernel or hardware failure (memory corruption on critical
>>>>   data/code)"
>>>>
>>>> wtf?
>>>>
>>>> If the kernel crashed due to a kernel crash, then the kernel booting 
>>>> up in whatever hardware state should be able to do a clean bootup. The 
>>>> fix for those 'rare situations' should be to fix the real bug (for 
>>>> example by making hardware driver init (or deinit) sequences more 
>>>> robust), not to paper it over by ordering around crash-time sequences 
>>>> ...
>>>>
>>>> If it crashed due to some hardware failure, there's literally an 
>>>> infinite amount of failure modes that may or may not be impacted by 
>>>> kexec crash-time handling ordering. We don't want to put a zillion 
>>>> such flags into the kernel proper just to allow the perturbation of 
>>>> the kernel.
>>>
>>> I think one of the motivations behind this patch was call to kmsg_dump().
>>> Some vendors have been wanting to have the capability to save kernel logs
>>> to some NVRAM before transition to second kernel happens. Their argument
>>> is that kdump does not succeed all the time and if kdump does not succeed
>>> then at least they have something to work with (kernel logs retrieved
>>> from pstore interface).
>>
>> Doesn't pstore attach itself to printk itself? AFAICS it does:
>>
>>  fs/pstore/platform.c:   register_console(&pstore_console);
>>
>> so the printk log leading up to and including the crash should be 
>> available, regardless of this patch. What am I missing?
> 
> That's a good point. I was not aware of it. I am Ccing Don Zickus as
> he has spent some time on this in the past.
> 
> Masami, would you have thoughts on this? IIRC, one reason why kmsg_dump()
> was written was so that one could dump kernel messages to NVRAM. If one
> could simply register pstore as a console, then how will kmsg_dump()
> continue to be useful?

Yes, kmsg_dump() and pstore do help a lot in saving the last messages
(even though kmsg_dump() is called only when crash_kexec_post_notifiers
is set...)

However, some machines don't support pstore, only IPMI. pstore (kmsg)
stores messages in local NVRAM, while IPMI stores messages in the NVRAM
of the BMC (Baseboard Management Controller), in the SEL (System Event
Log).
Some enterprise servers have only a BMC and no NVRAM. For such servers,
we still need to call the panic notifiers to store messages via IPMI.
Using IPMI also has a secondary benefit: we can detect a machine failure
from a remote machine via IPMI over LAN by monitoring the SEL :)

You might want to integrate IPMI and pstore. But since the IPMI SEL is
very limited and very slow, the two are very different.
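
For illustration only, the panic-notifier path for IPMI described above
can be modeled in userspace roughly as follows. This is not the kernel
notifier API or the IPMI driver: every name here (notifier_register,
ipmi_sel_notifier, cluster_notifier, SEL_RECORD_LEN) is hypothetical,
and the 16-byte record is an assumed size chosen only to show how little
of a message a small SEL entry can hold.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Userspace model of a panic notifier chain. NOT the kernel API. */
#define MAX_NOTIFIERS 8

typedef int (*notifier_fn)(const char *panic_msg);

struct notifier {
	notifier_fn fn;
	int priority;	/* higher priority runs first */
};

static struct notifier chain[MAX_NOTIFIERS];
static int chain_len;

/* One letter per notifier call, in call order (for inspection). */
static char call_order[MAX_NOTIFIERS + 1];

static void note(char c)
{
	size_t n = strlen(call_order);

	if (n < MAX_NOTIFIERS)
		call_order[n] = c;
}

/* Insert a callback, keeping the chain sorted by descending priority. */
static int notifier_register(notifier_fn fn, int priority)
{
	int i = chain_len;

	assert(fn != NULL);
	if (chain_len >= MAX_NOTIFIERS)
		return -1;
	while (i > 0 && chain[i - 1].priority < priority) {
		chain[i] = chain[i - 1];
		i--;
	}
	chain[i].fn = fn;
	chain[i].priority = priority;
	chain_len++;
	return 0;
}

/* Call every registered notifier; return how many were called. */
static int notifier_call_chain(const char *panic_msg)
{
	int i;

	for (i = 0; i < chain_len; i++)
		chain[i].fn(panic_msg);
	return chain_len;
}

/* Assumed, deliberately tiny stand-in for one BMC SEL record. */
#define SEL_RECORD_LEN 16
static char sel_record[SEL_RECORD_LEN];

/* Hypothetical notifier: keep only what fits into the SEL record. */
static int ipmi_sel_notifier(const char *panic_msg)
{
	note('S');
	snprintf(sel_record, sizeof(sel_record), "%s", panic_msg);
	return 0;
}

/* Hypothetical notifier: tell a cluster master that this node crashed. */
static int cluster_notifier(const char *panic_msg)
{
	(void)panic_msg;
	note('C');
	return 0;
}
```

Registering ipmi_sel_notifier with priority 0 and cluster_notifier with
priority 10, then calling notifier_call_chain() with a long message,
runs the cluster notifier first and leaves only the first 15 characters
of the message in sel_record.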

>>> Not that I agree fully with this as problem might happen while we 
>>> try to run panic_notifiers or kmsg_dump hooks and never transition 
>>> into kdump kernel.
>>
>> btw., this is the big problem with 'notifiers' in general: they are 
>> opaque with barely any semantics defined, and a source of constant 
>> confusion.
> 
> Agreed. That's the reason Eric never liked the idea of letting panic
> notifiers run before crash_kexec().

I see. That is why I added a note to the documentation:

                        Note that this also increases risks of kdump failure,
                        because some panic notifiers can make the crashed
                        kernel more unstable.

Personally, I don't recommend using this in ordinary situations. This
feature should be enabled only on machines that are very well configured
and tested.
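
As a rough sketch of the ordering being discussed, here is a userspace
model of the panic path. It is not kernel code: model_panic(),
model_crash_kexec() and the recorded step names are made up for this
example, and a successful jump into the kdump kernel is modeled by a
boolean return value, on the assumption that crash_kexec() simply
returns when no kdump kernel is loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_STEPS 8

/* Steps taken by the modeled panic path, in order. */
static const char *steps[MAX_STEPS];
static int nsteps;

static void record(const char *step)
{
	assert(nsteps < MAX_STEPS);
	steps[nsteps++] = step;
}

/* Helper for inspecting the recorded order. */
static bool step_is(int i, const char *name)
{
	return i < nsteps && strcmp(steps[i], name) == 0;
}

/*
 * Model of crash_kexec(): returns true if we "jumped" into the kdump
 * kernel (nothing after it runs), false if no kdump kernel is loaded.
 */
static bool model_crash_kexec(bool kdump_loaded)
{
	record("crash_kexec");
	return kdump_loaded;
}

/* Model of the panic path; returns the number of steps executed. */
static int model_panic(bool crash_kexec_post_notifiers, bool kdump_loaded)
{
	nsteps = 0;

	/* Default: go to kdump first, before anything else can go wrong. */
	if (!crash_kexec_post_notifiers && model_crash_kexec(kdump_loaded))
		return nsteps;

	/* These run in the crashed kernel and may make it more unstable. */
	record("panic_notifiers");
	record("kmsg_dump");

	/* With the option set, kdump starts only after the notifiers. */
	if (crash_kexec_post_notifiers && model_crash_kexec(kdump_loaded))
		return nsteps;

	record("halt");
	return nsteps;
}
```

With a kdump kernel loaded, model_panic(false, true) records only
crash_kexec, while model_panic(true, true) records panic_notifiers and
kmsg_dump before crash_kexec. The second ordering is the one that
carries the extra risk noted in the documentation text above.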

>>> And it has been literally years since some developers have been 
>>> pushing for allowing to run panic notifiers before crash_kexec(). 
>>> Eric Biederman has been pushing back saying it reduces the 
>>> reliability of kdump operation so this is not acceptable.
>>
>> So what do those notifiers do?
> 
> IIRC, two main reasons had come in the past.
> 
> - In a cluster of nodes, people wanted to send some sort of notifications
>   to main server that a node has crashed and don't fence it off as it
>   might be saving dump.
> 
> - And saving kernel logs to non volatile store.
> 
> There might be more and I might not be aware about these. Hatayama and
> Masami, can you shed more light on this.

Yes, as I described above, we'd like to use IPMI to write the log to the
SEL, which also allows us to monitor the machine remotely.

> 
> BTW, we faced the first problem in our clusters too, and it has now been
> fixed. Basically, we send a notification from user space in the second
> kernel to the master server that this node is still saving a dump, so
> don't fence it off.

Yeah, that's the usual way, I think. But in some "mission-critical" use
cases, we can't rely only on kdump stability.

Thank you,



-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com


