From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 14 Jul 2015 13:55:27 -0400
From: Vivek Goyal
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available
Message-ID: <20150714175527.GI10792@redhat.com>
References: <20150713202611.GA16525@fifo99.com>
 <87h9p7r0we.fsf@x220.int.ebiederm.org>
 <20150714135919.GA18333@fifo99.com>
 <20150714150208.GD10792@redhat.com>
 <20150714153430.GA18766@fifo99.com>
 <20150714154040.GA3912@redhat.com>
 <20150714154833.GA18883@fifo99.com>
 <20150714161612.GH10792@redhat.com>
 <87a8uyoeig.fsf@x220.int.ebiederm.org>
 <20150714172953.GA19135@fifo99.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20150714172953.GA19135@fifo99.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "kexec"
Errors-To: kexec-bounces+dwmw2=infradead.org@lists.infradead.org
To: dwalker@fifo99.com
Cc: linux-mips@linux-mips.org, Baoquan He, linux-sh@vger.kernel.org,
 linux-s390@vger.kernel.org, kexec@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-metag@vger.kernel.org,
 Masami Hiramatsu, HATAYAMA Daisuke, "Eric W. Biederman",
 Hidehiro Kawai, Andrew Morton, linuxppc-dev@lists.ozlabs.org,
 Ingo Molnar, linux-arm-kernel@lists.infradead.org

On Tue, Jul 14, 2015 at 05:29:53PM +0000, dwalker@fifo99.com wrote:

[..]
> > >> > If a machine is failing, there is a high chance it can't deliver you the
> > >> > notification. Detecting that failure using some kind of polling mechanism
> > >> > might be more reliable. And it will make even the kdump mechanism more
> > >> > reliable so that it does not have to run panic notifiers after the crash.
> > >>
> > >> I think what you're suggesting is that my company should change how its hardware works
> > >> and that's not really an option for me. This isn't a simple thing like checking over the
> > >> network if the machine is down or not, this is a way more complex hardware design.
> > >
> > > That means you are ready to live with an unreliable design. There might be
> > > cases where the notifier does not get run properly and you will not do the switch
> > > despite the fact that the OS has failed. I was just trying to nudge you in
> > > a direction which could be a more reliable mechanism.
> >
> > Sigh, I see some deep confusion going on here.
> >
> > The panic notifiers are just that: panic notifiers. They have not been
> > nor should they be tied to kexec. If those notifiers force a switch
> > over between machines I fail to see why you would care if it was
> > kexec or another panic situation that is forcing that switchover.
>
> Hidehiro isn't fixing the failover situation on my side, he's fixing register
> information collection when crash_kexec_post_notifiers is used.

Sure. Given that we have created this new parameter, let us fix it so that
we can capture the other CPUs' register state in the crash dump.

I am a little disappointed that it was not tested well when this parameter was
introduced. We should have at least tested it to the extent of checking whether
proper CPU state is present for all CPUs in the crash dump.

At that point in time it looked like a simple modification to allow panic
notifiers before crash_kexec().

Thanks
Vivek

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
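
The register-capture problem mentioned in the reply can be illustrated with a toy user-space model. This is only a sketch under stated assumptions, not kernel code: the Python functions below borrow the names panic(), smp_send_stop() and crash_kexec() purely as stand-ins, a crash kernel is assumed to be loaded, and the premise that a plain stop of the other CPUs records no register state is an assumption about the mechanism that the thread itself does not spell out. The Cpu class and cpus_missing_regs() helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    cpu_id: int
    running: bool = True
    regs_saved: bool = False


def smp_send_stop(cpus, panic_cpu):
    # Model assumption: a plain stop halts the other CPUs
    # without recording their registers anywhere.
    for cpu in cpus:
        if cpu.cpu_id != panic_cpu:
            cpu.running = False


def crash_kexec(cpus, panic_cpu):
    # Model assumption: the crash path records registers for the
    # panicking CPU and for every CPU still running when it starts.
    for cpu in cpus:
        if cpu.cpu_id == panic_cpu or cpu.running:
            cpu.regs_saved = True
        cpu.running = cpu.cpu_id == panic_cpu


def panic(cpus, panic_cpu, post_notifiers, notifiers=()):
    if not post_notifiers:
        # Default ordering: dump first. With a crash kernel loaded,
        # the panic notifiers are never reached.
        crash_kexec(cpus, panic_cpu)
        return
    # crash_kexec_post_notifiers ordering: stop CPUs, run notifiers, then dump.
    smp_send_stop(cpus, panic_cpu)
    for notify in notifiers:
        notify()
    crash_kexec(cpus, panic_cpu)


def cpus_missing_regs(nr_cpus, post_notifiers):
    # Hypothetical helper: panic on CPU 0 and list CPUs with no saved registers.
    cpus = [Cpu(i) for i in range(nr_cpus)]
    panic(cpus, panic_cpu=0, post_notifiers=post_notifiers)
    return [cpu.cpu_id for cpu in cpus if not cpu.regs_saved]
```

In this model, cpus_missing_regs(4, False) reports no missing CPUs, while cpus_missing_regs(4, True) reports CPUs 1 to 3, which is the kind of gap a test of "proper CPU state present for all CPUs in the crash dump" would have caught.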