From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.
Date: Mon, 18 Nov 2013 10:33:16 +0000
Message-ID: <5289ECEC.4030207@citrix.com>
References: <1384547567-17059-1-git-send-email-andrew.cooper3@citrix.com>
 <1384547567-17059-3-git-send-email-andrew.cooper3@citrix.com>
 <5289EB6F0200007800103EBB@nat28.tlf.novell.com>
In-Reply-To: <5289EB6F0200007800103EBB@nat28.tlf.novell.com>
To: Jan Beulich
Cc: xen-devel, Keir Fraser, David Vrabel, Tim Deegan
List-Id: xen-devel@lists.xenproject.org

On 18/11/13 09:26, Jan Beulich wrote:
>>>> On 15.11.13 at 21:32, Andrew Cooper wrote:
>> --- a/xen/arch/x86/crash.c
>> +++ b/xen/arch/x86/crash.c
>> @@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void)
>>      unsigned long msecs;
>>      int i, cpu = smp_processor_id();
>>
>> +    disable_lapic_nmi_watchdog();
>>      local_irq_disable();
>>
>>      crashing_cpu = cpu;
> _If_ you do this here, I wonder why it's being done before
> disabling interrupts.
>
> But then again I wonder whether it wouldn't be better to do
> this even earlier (i.e. by passing a flag to watchdog_disable()),
> as the NMI watchdog becomes useless with that call being done
> from kexec_common_shutdown().

Disabling interrupts here is more defensive coding than anything else.
It is not expected to be able to get here with interrupts enabled, but
in a crash

Putting this in watchdog_disable() would result in a race condition.
disable_lapic_nmi_watchdog() mutates global state, meaning that it can
only possibly run correctly on a single cpu.

In an ideal world with plenty of time, the lapic watchdog code could be
improved.  However, restricting its use until after one_cpu_only() is
the easiest fix.

~Andrew

>
>> --- a/xen/arch/x86/nmi.c
>> +++ b/xen/arch/x86/nmi.c
>> @@ -165,7 +165,7 @@ static void nmi_timer_fn(void *unused)
>>      set_timer(&this_cpu(nmi_timer), NOW() + MILLISECS(1000));
>>  }
>>
>> -static void disable_lapic_nmi_watchdog(void)
>> +void disable_lapic_nmi_watchdog(void)
> The suggested alternative would also make it unnecessary to
> make this function non-static...
>
> Jan
>
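
For readers following the race-condition argument above, here is a
minimal, standalone sketch of the idea of only touching global state
after a "one cpu only" gate has been passed. It is not Xen code: the
names crash_gate, enter_crash_path_once(), disable_watchdog_sketch()
and crash_path() are hypothetical stand-ins, and the gate is modelled
with a C11 atomic_flag rather than whatever one_cpu_only() does
internally.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; none of this is Xen's code. */
static atomic_flag crash_gate = ATOMIC_FLAG_INIT;
static int watchdog_active = 1;  /* global (not per-cpu) state */
static int disable_calls;        /* how many times the state was mutated */

/* Returns true for exactly one caller: the first one through the gate. */
static bool enter_crash_path_once(void)
{
    return !atomic_flag_test_and_set(&crash_gate);
}

/*
 * Unsynchronised read-modify-write of global state.  This is only
 * safe when at most one caller can ever reach it.
 */
static void disable_watchdog_sketch(void)
{
    if (watchdog_active) {
        watchdog_active = 0;
        disable_calls++;
    }
}

/* Models one cpu arriving on the crash path. */
static bool crash_path(int cpu)
{
    if (!enter_crash_path_once())
        return false;  /* lost the race; never touches the global state */

    /* Past the gate: this is the only caller that can get here. */
    disable_watchdog_sketch();
    assert(disable_calls == 1);
    printf("cpu %d disabled the watchdog\n", cpu);
    return true;
}
```

Calling crash_path(0) and then crash_path(1) lets only the first call
mutate watchdog_active. Calling the disable function before the gate
would let both callers into the unsynchronised section, which is the
kind of race described above.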