All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap
@ 2018-02-19 14:23 Igor Druzhinin
  2018-02-19 14:48 ` Andrew Cooper
  2018-02-19 15:18 ` Jan Beulich
  0 siblings, 2 replies; 6+ messages in thread
From: Igor Druzhinin @ 2018-02-19 14:23 UTC (permalink / raw)
  To: andrew.cooper3, jbeulich; +Cc: Igor Druzhinin, xen-devel

We're noticing a reproducible system boot hang on certain
post-Skylake platforms where the BIOS is configured in
legacy boot mode with x2APIC disabled. The system stalls
immediately after writing the first SMP initialization
sequence into APIC ICR.

The cause of the problem is watchdog NMI handler execution -
somewhere near the end of NMI handling (after it's already
rescheduled the next NMI) it tries to access IO port 0x61
to get the actual NMI reason on CPU0. Unfortunately, this
port is emulated by BIOS using SMIs and this emulation for
some reason takes more time than we expect during INIT-SIPI-SIPI
sequence. As the result, the system is constantly moving between
NMI and SMI handler and not making any progress.

To avoid this, initialize the watchdog after SMP bootstrap on
CPU0 and, additionally, protect the NMI handler by moving
IO port access before NMI re-scheduling. The latter should help
in case of post boot CPU onlining. Although we're running
watchdog at much lower frequency it's neveretheless possible
we may trigger the issue anyway.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
v3: corrected comments and coommit meesage.
---
 xen/arch/x86/apic.c    |  2 +-
 xen/arch/x86/smpboot.c |  3 +++
 xen/arch/x86/traps.c   | 12 ++++++++++--
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
index 5039173..ffa5a69 100644
--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -684,7 +684,7 @@ void setup_local_APIC(void)
         printk("Leaving ESR disabled.\n");
     }
 
-    if (nmi_watchdog == NMI_LOCAL_APIC)
+    if (nmi_watchdog == NMI_LOCAL_APIC && smp_processor_id())
         setup_apic_nmi_watchdog();
     apic_pm_activate();
 }
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 2ebef03..1844116 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -1248,7 +1248,10 @@ int __cpu_up(unsigned int cpu)
 void __init smp_cpus_done(void)
 {
     if ( nmi_watchdog == NMI_LOCAL_APIC )
+    {
+        setup_apic_nmi_watchdog();
         check_nmi_watchdog();
+    }
 
     setup_ioapic_dest();
 
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 2e022b0..e6c7487 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1706,7 +1706,7 @@ static nmi_callback_t *nmi_callback = dummy_nmi_callback;
 void do_nmi(const struct cpu_user_regs *regs)
 {
     unsigned int cpu = smp_processor_id();
-    unsigned char reason;
+    unsigned char reason = 0;
     bool handle_unknown = false;
 
     ++nmi_count(cpu);
@@ -1714,6 +1714,15 @@ void do_nmi(const struct cpu_user_regs *regs)
     if ( nmi_callback(regs, cpu) )
         return;
 
+    /*
+     * There is a chance that this IO port access will produce SMI which,
+     * in turn, may take enough time for the next NMI tick to happen.
+     * To avoid having nested NMIs as the result let's do it before
+     * watchdog re-scheduling.
+     */
+    if ( cpu == 0 )
+        reason = inb(0x61);
+
     if ( (nmi_watchdog == NMI_NONE) ||
          (!nmi_watchdog_tick(regs) && watchdog_force) )
         handle_unknown = true;
@@ -1721,7 +1730,6 @@ void do_nmi(const struct cpu_user_regs *regs)
     /* Only the BSP gets external NMIs from the system. */
     if ( cpu == 0 )
     {
-        reason = inb(0x61);
         if ( reason & 0x80 )
             pci_serr_error(regs);
         if ( reason & 0x40 )
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap
  2018-02-19 14:23 [PATCH v3] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap Igor Druzhinin
@ 2018-02-19 14:48 ` Andrew Cooper
  2018-02-19 16:12   ` Alexey G
  2018-02-19 15:18 ` Jan Beulich
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2018-02-19 14:48 UTC (permalink / raw)
  To: Igor Druzhinin, jbeulich; +Cc: xen-devel

On 19/02/18 14:23, Igor Druzhinin wrote:
> We're noticing a reproducible system boot hang on certain
> post-Skylake platforms where the BIOS is configured in

These are Skylake, not post-Skylake.

> legacy boot mode with x2APIC disabled. The system stalls
> immediately after writing the first SMP initialization
> sequence into APIC ICR.
>
> The cause of the problem is watchdog NMI handler execution -
> somewhere near the end of NMI handling (after it's already
> rescheduled the next NMI) it tries to access IO port 0x61
> to get the actual NMI reason on CPU0. Unfortunately, this
> port is emulated by BIOS using SMIs and this emulation for
> some reason takes more time than we expect during INIT-SIPI-SIPI
> sequence. As the result, the system is constantly moving between
> NMI and SMI handler and not making any progress.
>
> To avoid this, initialize the watchdog after SMP bootstrap on
> CPU0 and, additionally, protect the NMI handler by moving
> IO port access before NMI re-scheduling. The latter should help
> in case of post boot CPU onlining. Although we're running
> watchdog at much lower frequency it's neveretheless possible
> we may trigger the issue anyway.
>
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> ---
> v3: corrected comments and coommit meesage.
> ---
>  xen/arch/x86/apic.c    |  2 +-
>  xen/arch/x86/smpboot.c |  3 +++
>  xen/arch/x86/traps.c   | 12 ++++++++++--
>  3 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
> index 5039173..ffa5a69 100644
> --- a/xen/arch/x86/apic.c
> +++ b/xen/arch/x86/apic.c
> @@ -684,7 +684,7 @@ void setup_local_APIC(void)
>          printk("Leaving ESR disabled.\n");
>      }
>  
> -    if (nmi_watchdog == NMI_LOCAL_APIC)
> +    if (nmi_watchdog == NMI_LOCAL_APIC && smp_processor_id())
>          setup_apic_nmi_watchdog();
>      apic_pm_activate();
>  }
> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> index 2ebef03..1844116 100644
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -1248,7 +1248,10 @@ int __cpu_up(unsigned int cpu)
>  void __init smp_cpus_done(void)
>  {
>      if ( nmi_watchdog == NMI_LOCAL_APIC )
> +    {
> +        setup_apic_nmi_watchdog();
>          check_nmi_watchdog();
> +    }
>  
>      setup_ioapic_dest();
>  
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index 2e022b0..e6c7487 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -1706,7 +1706,7 @@ static nmi_callback_t *nmi_callback = dummy_nmi_callback;
>  void do_nmi(const struct cpu_user_regs *regs)
>  {
>      unsigned int cpu = smp_processor_id();
> -    unsigned char reason;
> +    unsigned char reason = 0;
>      bool handle_unknown = false;
>  
>      ++nmi_count(cpu);
> @@ -1714,6 +1714,15 @@ void do_nmi(const struct cpu_user_regs *regs)
>      if ( nmi_callback(regs, cpu) )
>          return;
>  
> +    /*
> +     * There is a chance that this IO port access will produce SMI which,
> +     * in turn, may take enough time for the next NMI tick to happen.
> +     * To avoid having nested NMIs as the result let's do it before
> +     * watchdog re-scheduling.

This isn't strictly accurate.  How about:

/* Reads of 0x61 may trap to SMM, and on production SKX servers, have
been observed to take up to 200ms to complete.  By reading this port
before we re-arm the NMI watchdog, we reduce the chance of having an NMI
watchdog expire while in the SMI handler. */

In particular, if we are servicing a non-watchdog NMI, the watchdog will
still be counting down while the SMI executes.

~Andrew

> +     */
> +    if ( cpu == 0 )
> +        reason = inb(0x61);
> +
>      if ( (nmi_watchdog == NMI_NONE) ||
>           (!nmi_watchdog_tick(regs) && watchdog_force) )
>          handle_unknown = true;
> @@ -1721,7 +1730,6 @@ void do_nmi(const struct cpu_user_regs *regs)
>      /* Only the BSP gets external NMIs from the system. */
>      if ( cpu == 0 )
>      {
> -        reason = inb(0x61);
>          if ( reason & 0x80 )
>              pci_serr_error(regs);
>          if ( reason & 0x40 )


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap
  2018-02-19 14:23 [PATCH v3] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap Igor Druzhinin
  2018-02-19 14:48 ` Andrew Cooper
@ 2018-02-19 15:18 ` Jan Beulich
  2018-02-19 15:20   ` Igor Druzhinin
  1 sibling, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2018-02-19 15:18 UTC (permalink / raw)
  To: Igor Druzhinin; +Cc: andrew.cooper3, xen-devel

>>> On 19.02.18 at 15:23, <igor.druzhinin@citrix.com> wrote:
> We're noticing a reproducible system boot hang on certain
> post-Skylake platforms where the BIOS is configured in
> legacy boot mode with x2APIC disabled. The system stalls
> immediately after writing the first SMP initialization
> sequence into APIC ICR.
> 
> The cause of the problem is watchdog NMI handler execution -
> somewhere near the end of NMI handling (after it's already
> rescheduled the next NMI) it tries to access IO port 0x61
> to get the actual NMI reason on CPU0. Unfortunately, this
> port is emulated by BIOS using SMIs and this emulation for
> some reason takes more time than we expect during INIT-SIPI-SIPI
> sequence. As the result, the system is constantly moving between
> NMI and SMI handler and not making any progress.
> 
> To avoid this, initialize the watchdog after SMP bootstrap on
> CPU0 and, additionally, protect the NMI handler by moving
> IO port access before NMI re-scheduling. The latter should help
> in case of post boot CPU onlining. Although we're running
> watchdog at much lower frequency it's neveretheless possible
> we may trigger the issue anyway.

I'm afraid I can't connect "the latter" to anything earlier in the
description.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap
  2018-02-19 15:18 ` Jan Beulich
@ 2018-02-19 15:20   ` Igor Druzhinin
  2018-02-19 15:29     ` Jan Beulich
  0 siblings, 1 reply; 6+ messages in thread
From: Igor Druzhinin @ 2018-02-19 15:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: andrew.cooper3, xen-devel

On 19/02/18 15:18, Jan Beulich wrote:
>>>> On 19.02.18 at 15:23, <igor.druzhinin@citrix.com> wrote:
>> We're noticing a reproducible system boot hang on certain
>> post-Skylake platforms where the BIOS is configured in
>> legacy boot mode with x2APIC disabled. The system stalls
>> immediately after writing the first SMP initialization
>> sequence into APIC ICR.
>>
>> The cause of the problem is watchdog NMI handler execution -
>> somewhere near the end of NMI handling (after it's already
>> rescheduled the next NMI) it tries to access IO port 0x61
>> to get the actual NMI reason on CPU0. Unfortunately, this
>> port is emulated by BIOS using SMIs and this emulation for
>> some reason takes more time than we expect during INIT-SIPI-SIPI
>> sequence. As the result, the system is constantly moving between
>> NMI and SMI handler and not making any progress.
>>
>> To avoid this, initialize the watchdog after SMP bootstrap on
>> CPU0 and, additionally, protect the NMI handler by moving
>> IO port access before NMI re-scheduling. The latter should help
>> in case of post boot CPU onlining. Although we're running
>> watchdog at much lower frequency it's neveretheless possible
>> we may trigger the issue anyway.
> 
> I'm afraid I can't connect "the latter" to anything earlier in the
> description.

It's the previous sentence - there are 2 things that we do here - the
latter is "protect the NMI handler by moving IO port access before NMI
re-scheduling"

Igor


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap
  2018-02-19 15:20   ` Igor Druzhinin
@ 2018-02-19 15:29     ` Jan Beulich
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Beulich @ 2018-02-19 15:29 UTC (permalink / raw)
  To: Igor Druzhinin; +Cc: andrew.cooper3, xen-devel

>>> On 19.02.18 at 16:20, <igor.druzhinin@citrix.com> wrote:
> On 19/02/18 15:18, Jan Beulich wrote:
>>>>> On 19.02.18 at 15:23, <igor.druzhinin@citrix.com> wrote:
>>> We're noticing a reproducible system boot hang on certain
>>> post-Skylake platforms where the BIOS is configured in
>>> legacy boot mode with x2APIC disabled. The system stalls
>>> immediately after writing the first SMP initialization
>>> sequence into APIC ICR.
>>>
>>> The cause of the problem is watchdog NMI handler execution -
>>> somewhere near the end of NMI handling (after it's already
>>> rescheduled the next NMI) it tries to access IO port 0x61
>>> to get the actual NMI reason on CPU0. Unfortunately, this
>>> port is emulated by BIOS using SMIs and this emulation for
>>> some reason takes more time than we expect during INIT-SIPI-SIPI
>>> sequence. As the result, the system is constantly moving between
>>> NMI and SMI handler and not making any progress.
>>>
>>> To avoid this, initialize the watchdog after SMP bootstrap on
>>> CPU0 and, additionally, protect the NMI handler by moving
>>> IO port access before NMI re-scheduling. The latter should help
>>> in case of post boot CPU onlining. Although we're running
>>> watchdog at much lower frequency it's neveretheless possible
>>> we may trigger the issue anyway.
>> 
>> I'm afraid I can't connect "the latter" to anything earlier in the
>> description.
> 
> It's the previous sentence - there are 2 things that we do here - the
> latter is "protect the NMI handler by moving IO port access before NMI
> re-scheduling"

Oh, I thought you mean to refer to the lower frequency. How
about "The latter should also help ..."?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap
  2018-02-19 14:48 ` Andrew Cooper
@ 2018-02-19 16:12   ` Alexey G
  0 siblings, 0 replies; 6+ messages in thread
From: Alexey G @ 2018-02-19 16:12 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Igor Druzhinin, jbeulich, xen-devel

On Mon, 19 Feb 2018 14:48:37 +0000
Andrew Cooper <andrew.cooper3@citrix.com> wrote:

>On 19/02/18 14:23, Igor Druzhinin wrote:
>> We're noticing a reproducible system boot hang on certain
>> post-Skylake platforms where the BIOS is configured in  
>
>These are Skylake, not post-Skylake.

Well, strictly speaking, any platforms which made obsolete REF_TOGGLE
bit in NMI_SC (I/O port 61h) and provide sw emulation for it via the
SMI I/O trap instead.

>> legacy boot mode with x2APIC disabled. The system stalls
>> immediately after writing the first SMP initialization
>> sequence into APIC ICR.
>>
>> The cause of the problem is watchdog NMI handler execution -
>> somewhere near the end of NMI handling (after it's already
>> rescheduled the next NMI) it tries to access IO port 0x61
>> to get the actual NMI reason on CPU0. Unfortunately, this
>> port is emulated by BIOS using SMIs and this emulation for
>> some reason takes more time than we expect during INIT-SIPI-SIPI
>> sequence. As the result, the system is constantly moving between
>> NMI and SMI handler and not making any progress.
>>
>> To avoid this, initialize the watchdog after SMP bootstrap on
>> CPU0 and, additionally, protect the NMI handler by moving
>> IO port access before NMI re-scheduling. The latter should help
>> in case of post boot CPU onlining. Although we're running
>> watchdog at much lower frequency it's neveretheless possible
>> we may trigger the issue anyway.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>> ---
>> v3: corrected comments and coommit meesage.
>> ---
>>  xen/arch/x86/apic.c    |  2 +-
>>  xen/arch/x86/smpboot.c |  3 +++
>>  xen/arch/x86/traps.c   | 12 ++++++++++--
>>  3 files changed, 14 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
>> index 5039173..ffa5a69 100644
>> --- a/xen/arch/x86/apic.c
>> +++ b/xen/arch/x86/apic.c
>> @@ -684,7 +684,7 @@ void setup_local_APIC(void)
>>          printk("Leaving ESR disabled.\n");
>>      }
>>  
>> -    if (nmi_watchdog == NMI_LOCAL_APIC)
>> +    if (nmi_watchdog == NMI_LOCAL_APIC && smp_processor_id())
>>          setup_apic_nmi_watchdog();
>>      apic_pm_activate();
>>  }
>> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
>> index 2ebef03..1844116 100644
>> --- a/xen/arch/x86/smpboot.c
>> +++ b/xen/arch/x86/smpboot.c
>> @@ -1248,7 +1248,10 @@ int __cpu_up(unsigned int cpu)
>>  void __init smp_cpus_done(void)
>>  {
>>      if ( nmi_watchdog == NMI_LOCAL_APIC )
>> +    {
>> +        setup_apic_nmi_watchdog();
>>          check_nmi_watchdog();
>> +    }
>>  
>>      setup_ioapic_dest();
>>  
>> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
>> index 2e022b0..e6c7487 100644
>> --- a/xen/arch/x86/traps.c
>> +++ b/xen/arch/x86/traps.c
>> @@ -1706,7 +1706,7 @@ static nmi_callback_t *nmi_callback =
>> dummy_nmi_callback; void do_nmi(const struct cpu_user_regs *regs)
>>  {
>>      unsigned int cpu = smp_processor_id();
>> -    unsigned char reason;
>> +    unsigned char reason = 0;
>>      bool handle_unknown = false;
>>  
>>      ++nmi_count(cpu);
>> @@ -1714,6 +1714,15 @@ void do_nmi(const struct cpu_user_regs *regs)
>>      if ( nmi_callback(regs, cpu) )
>>          return;
>>  
>> +    /*
>> +     * There is a chance that this IO port access will produce SMI
>> which,
>> +     * in turn, may take enough time for the next NMI tick to
>> happen.
>> +     * To avoid having nested NMIs as the result let's do it before
>> +     * watchdog re-scheduling.  
>
>This isn't strictly accurate.  How about:
>
>/* Reads of 0x61 may trap to SMM, and on production SKX servers, have
>been observed to take up to 200ms to complete.  By reading this port
>before we re-arm the NMI watchdog, we reduce the chance of having an
>NMI watchdog expire while in the SMI handler. */
>
>In particular, if we are servicing a non-watchdog NMI, the watchdog
>will still be counting down while the SMI executes.
>
>~Andrew
>
>> +     */
>> +    if ( cpu == 0 )
>> +        reason = inb(0x61);
>> +
>>      if ( (nmi_watchdog == NMI_NONE) ||
>>           (!nmi_watchdog_tick(regs) && watchdog_force) )
>>          handle_unknown = true;
>> @@ -1721,7 +1730,6 @@ void do_nmi(const struct cpu_user_regs *regs)
>>      /* Only the BSP gets external NMIs from the system. */
>>      if ( cpu == 0 )
>>      {
>> -        reason = inb(0x61);
>>          if ( reason & 0x80 )
>>              pci_serr_error(regs);
>>          if ( reason & 0x40 )  
>
>
>_______________________________________________
>Xen-devel mailing list
>Xen-devel@lists.xenproject.org
>https://lists.xenproject.org/mailman/listinfo/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-02-19 16:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-02-19 14:23 [PATCH v3] x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap Igor Druzhinin
2018-02-19 14:48 ` Andrew Cooper
2018-02-19 16:12   ` Alexey G
2018-02-19 15:18 ` Jan Beulich
2018-02-19 15:20   ` Igor Druzhinin
2018-02-19 15:29     ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.