From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 14 Jul 2015 14:23:36 -0400
From: Vivek Goyal
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available
Message-ID: <20150714182336.GB3912@redhat.com>
References: <20150714135919.GA18333@fifo99.com>
 <20150714150208.GD10792@redhat.com>
 <20150714153430.GA18766@fifo99.com>
 <20150714154040.GA3912@redhat.com>
 <20150714154833.GA18883@fifo99.com>
 <20150714161612.GH10792@redhat.com>
 <87a8uyoeig.fsf@x220.int.ebiederm.org>
 <20150714172953.GA19135@fifo99.com>
 <20150714175527.GI10792@redhat.com>
 <87si8qmxef.fsf@x220.int.ebiederm.org>
In-Reply-To: <87si8qmxef.fsf@x220.int.ebiederm.org>
To: "Eric W. Biederman"
Cc: linux-mips@linux-mips.org, Baoquan He, linux-sh@vger.kernel.org,
 linux-s390@vger.kernel.org, kexec@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-metag@vger.kernel.org,
 Masami Hiramatsu, HATAYAMA Daisuke, Hidehiro Kawai, dwalker@fifo99.com,
 Andrew Morton, linuxppc-dev@lists.ozlabs.org, Ingo Molnar,
 linux-arm-kernel@lists.infradead.org

On Tue, Jul 14, 2015 at 01:01:12PM -0500, Eric W. Biederman wrote:
> Vivek Goyal writes:
>
> > On Tue, Jul 14, 2015 at 05:29:53PM +0000, dwalker@fifo99.com wrote:
> >
> > [..]
> >> > >> > If a machine is failing, there is a high chance it can't deliver you the
> >> > >> > notification. Detecting that failure using some kind of polling mechanism
> >> > >> > might be more reliable. And it will make even the kdump mechanism more
> >> > >> > reliable so that it does not have to run panic notifiers after the crash.
> >> > >>
> >> > >> I think what you're suggesting is that my company should change how its hardware works
> >> > >> and that's not really an option for me. This isn't a simple thing like checking over the
> >> > >> network if the machine is down or not, this is a way more complex hardware design.
> >> > >
> >> > > That means you are ready to live with an unreliable design. There might be
> >> > > cases where the notifier does not get run properly and you will not do the switch
> >> > > despite the fact that the OS has failed. I was just trying to nudge you in
> >> > > a direction which could be a more reliable mechanism.
> >> >
> >> > Sigh, I see some deep confusion going on here.
> >> >
> >> > The panic notifiers are just that: panic notifiers. They have not been,
> >> > nor should they be, tied to kexec. If those notifiers force a switch
> >> > over between machines I fail to see why you would care if it was
> >> > kexec or another panic situation that is forcing that switchover.
> >>
> >> Hidehiro isn't fixing the failover situation on my side, he's fixing register
> >> information collection when crash_kexec_post_notifiers is used.
> >
> > Sure. Given that we have created this new parameter, let us fix it so that
> > we can capture the other cpu register state in the crash dump.
> >
> > I am a little disappointed that it was not tested well when this parameter was
> > introduced. We should have at least tested it to the extent of seeing whether
> > proper cpu state is present for all cpus in the crash dump.
> >
> > At that point in time it looked like a simple modification
> > to allow panic notifiers before crash_kexec().
>
> Either that or we say no one cares enough, and it is known broken, so let's
> just revert the fool thing.

Masami, you introduced this option. Are you fine with the revert? Is it
really being used and tested?

> I honestly can't see how to support panic notifiers before kexec.
> There is no way to tell what is being done and all of the pieces,
> including smp_send_stop, are known to be buggy.

Shouldn't we be able to replace smp_send_stop() with what crash_kexec() does
to stop the machine? If yes, then it should be fine, I guess.

The parameter description clearly says to specify it at your own
risk. So we are not issuing a big support statement for a successful
kdump after panic notifiers. If it is something fixable, let us fix it;
otherwise the user needs to deal with it.

Thanks
Vivek