From: ebiederm@xmission.com (Eric W. Biederman)
To: Vivek Goyal <vgoyal@redhat.com>
Cc: linux-mips@linux-mips.org, Baoquan He <bhe@redhat.com>,
	linux-sh@vger.kernel.org, linux-s390@vger.kernel.org,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-metag@vger.kernel.org,
	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>,
	HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>,
	Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>,
	dwalker@fifo99.com, Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org, Ingo Molnar <mingo@kernel.org>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available
Date: Tue, 14 Jul 2015 13:01:12 -0500
Message-ID: <87si8qmxef.fsf@x220.int.ebiederm.org>
In-Reply-To: <20150714175527.GI10792@redhat.com> (Vivek Goyal's message of "Tue, 14 Jul 2015 13:55:27 -0400")

Vivek Goyal <vgoyal@redhat.com> writes:

> On Tue, Jul 14, 2015 at 05:29:53PM +0000, dwalker@fifo99.com wrote:
>
> [..]
>> > >> > If a machine is failing, there is a high chance it can't deliver you the
>> > >> > notification. Detecting that failure using some kind of polling mechanism
>> > >> > might be more reliable. And it will make even the kdump mechanism more
>> > >> > reliable so that it does not have to run panic notifiers after the crash.
>> > >> 
>> > >> I think what you're suggesting is that my company should change how its hardware works
>> > >> and that's not really an option for me. This isn't a simple thing like checking over the
>> > >> network if the machine is down or not, this is a way more complex hardware design.
>> > >
>> > > That means you are ready to live with an unreliable design. There might be
>> > > cases where the notifier does not get run properly and you will not do the switch
>> > > despite the fact that the OS has failed. I was just trying to nudge you in
>> > > a direction which could be a more reliable mechanism.
>> > 
>> > Sigh. I see some deep confusion going on here.
>> > 
>> > The panic notifiers are just that: panic notifiers.  They have not been
>> > nor should they be tied to kexec.   If those notifiers force a switchover
>> > between machines I fail to see why you would care if it was
>> > kexec or another panic situation that is forcing that switchover.
>> 
>> Hidehiro isn't fixing the failover situation on my side, he's fixing register
>> information collection when crash_kexec_post_notifiers is used.
>
> Sure. Given that we have created this new parameter, let us fix it so that
> we can capture the other cpu register state in crash dump.
>
> I am a little disappointed that it was not tested well when this parameter was
> introduced. We should have at least tested it to the extent to see if there
> is proper cpu state present for all cpus in the crash dump.
>
> At that point of time it looked like a simple modification
> to allow panic notifiers before crash_kexec().

Either that or we say no one cares enough, and it is known broken, so let's
just revert the fool thing.

I honestly can't see how to support panic notifiers before kexec.
There is no way to tell what is being done and all of the pieces
including smp_send_stop are known to be buggy.

It isn't like this latest set of patches was reviewed/tested much
better, as the first patch was wrong.

Eric

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

Thread overview:
2015-07-10 11:33 [PATCH 0/3] kexec: crash_kexec_post_notifiers boot option related fixes Hidehiro Kawai
2015-07-10 11:33 ` [PATCH 3/3] kexec: Change the timing of callbacks related to "crash_kexec_post_notifiers" boot option Hidehiro Kawai
2015-07-14 14:42   ` Vivek Goyal
2015-07-15  3:09     ` Masami Hiramatsu
2015-07-10 11:33 ` [PATCH 2/3] kexec: Pass panic message to crash_kexec() Hidehiro Kawai
2015-07-10 11:33 ` [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available Hidehiro Kawai
2015-07-10 13:41   ` Eric W. Biederman
2015-07-13 20:26     ` dwalker
2015-07-14  1:19       ` Eric W. Biederman
2015-07-14 13:59         ` dwalker
2015-07-14 14:20           ` Vivek Goyal
2015-07-14 15:02           ` Vivek Goyal
2015-07-14 15:34             ` dwalker
2015-07-14 15:40               ` Vivek Goyal
2015-07-14 15:48                 ` dwalker
2015-07-14 16:16                   ` Vivek Goyal
2015-07-14 17:06                     ` Eric W. Biederman
2015-07-14 17:29                       ` dwalker
2015-07-14 17:55                         ` Vivek Goyal
2015-07-14 18:01                           ` Eric W. Biederman [this message]
2015-07-14 18:23                             ` Vivek Goyal
2015-07-15  5:16                               ` Masami Hiramatsu
2015-07-15 10:49                 ` Hidehiro Kawai
2015-07-14  1:56       ` Hidehiro Kawai
