From: Alexey G <x1917x@gmail.com>
To: Igor Druzhinin <igor.druzhinin@citrix.com>
Cc: andrew.cooper3@citrix.com, jbeulich@suse.com, xen-devel@lists.xen.org
Subject: Re: [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs
Date: Tue, 6 Feb 2018 13:10:26 +1000
Message-ID: <20180206131026.00007e4b@gmail.com>
In-Reply-To: <1517865522-3248-1-git-send-email-igor.druzhinin@citrix.com>

On Mon, 5 Feb 2018 21:18:42 +0000
Igor Druzhinin <igor.druzhinin@citrix.com> wrote:

>We're noticing a reproducible system boot hang on certain
>post-Skylake platforms where the BIOS is configured in
>legacy boot mode with x2APIC disabled. The system stalls
>immediately after writing the first SMP initialization
>sequence into APIC ICR.
>
>The cause of the problem is watchdog NMI handler execution -
>somewhere near the end of NMI handling (after it's already
>rescheduled the next NMI) it tries to access IO port 0x61
>to get the actual NMI reason on CPU0. Unfortunately, this
>port is emulated by BIOS using SMIs and this emulation
>apparently might take more than we expect under certain
>conditions. As the result, the system is constantly moving
>between NMI and SMI handler and not making any progress.
>
>Just lower the initial frequency for now as we lower it later
>even more anyway.

I/O port 61h is normally not emulated by the SMI-based legacy keyboard
handling code in the BIOS; only ports like 60h and 64h are.
Unlike USB legacy emulation, intercepting port 61h requires a different
approach -- a generic SMI I/O trap, which BIOSes do not (or at least did
not) commonly use... although it is possible, as the EFI interface and
code for this are available. The assumption that port 61h is trapped by
the SMI handler must be explicitly confirmed by checking the I/O Trap
control regs in the RCBA region.

If the I/O trap regs don't show an active I/O trap on I/O port 61h, the
root cause might be different (it might even be related to stuff like
NMI2SMI logic).
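
A minimal sketch of that check, for illustration only. The bit layout below
is an assumption taken from my recollection of older ICH/PCH datasheets (four
64-bit IOTRn registers, commonly documented at RCBA + 1E80h); it must be
verified against the datasheet for the chipset in question, and on newer
PCHs the registers may live elsewhere. The function names are hypothetical,
and reading the registers from the chipset is left out; the code only decodes
values that have already been read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED IOTRn bit layout; verify against the chipset datasheet. */
#define IOTR_TRSE  (1ULL << 0)   /* trap and SMI# enable */
#define IOTR_RWIO  (1ULL << 48)  /* 1 = trap reads, 0 = trap writes */
#define IOTR_RWM   (1ULL << 49)  /* 1 = trap both reads and writes */

/* Would a single-byte read of 'port' match this trap register value? */
bool iotr_traps_byte_read(uint64_t iotr, uint16_t port)
{
    if (!(iotr & IOTR_TRSE))
        return false;

    /* Bits 15:2 hold the dword-aligned base address. */
    uint16_t base = (uint16_t)(iotr & 0xfffcULL);
    /* Bits 23:18 are a don't-care mask for address bits 7:2. */
    uint16_t mask = (uint16_t)((iotr >> 16) & 0x00fcULL);
    if (((port ^ base) & 0xfffc & ~mask) != 0)
        return false;

    /* Bits 35:32 are byte enables, bits 39:36 their don't-care mask. */
    unsigned int be   = 1u << (port & 3);
    unsigned int tbe  = (unsigned int)((iotr >> 32) & 0xf);
    unsigned int tbem = (unsigned int)((iotr >> 36) & 0xf);
    if (((be ^ tbe) & ~tbem & 0xf) != 0)
        return false;

    /* Reads match if reads are selected or the direction is masked. */
    return (iotr & (IOTR_RWM | IOTR_RWIO)) != 0;
}

/* Index of the first register that traps a byte read of 'port', or -1. */
int find_iotr_for_port(const uint64_t *regs, size_t count, uint16_t port)
{
    assert(regs != NULL);
    for (size_t i = 0; i < count; i++)
        if (iotr_traps_byte_read(regs[i], port))
            return (int)i;
    return -1;
}
```

Calling find_iotr_for_port() on the four register values with port 0x61
returning -1 would suggest (under the assumptions above) that no I/O trap
covers the port and the SMIs come from somewhere else.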

If the problem is actually due to the NMI handler being preempted by
another NMI which occurred after (a long) execution of the triggered SMI
handler, it might be better to do all sensitive stuff in the NMI handler
before NMIs are re-enabled by IRET.
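
To put rough numbers on that scenario, here is a back-of-envelope model, not
a measurement. It assumes every watchdog NMI has a fixed total cost (handler
plus the SMI triggered by the port read) and reports how much time is left
for normal execution. The function name, the 12 ms cost and the nmi_hz
values of 100 and 10 are hypothetical, picked only to illustrate the HZ vs
HZ / 10 change under the assumption that HZ is 100.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Share of wall-clock time, in permille, left for normal execution when
 * each watchdog NMI costs cost_us microseconds in total.
 * Returns 0 when the cost reaches the NMI period: the next NMI is already
 * due by the time the previous one finishes, so nothing else runs.
 */
unsigned int progress_permille(unsigned int nmi_hz, unsigned int cost_us)
{
    assert(nmi_hz > 0);

    unsigned int period_us = 1000000u / nmi_hz;

    if (cost_us >= period_us)
        return 0;

    return (unsigned int)(((uint64_t)(period_us - cost_us) * 1000u) / period_us);
}
```

With these illustrative inputs, progress_permille(100, 12000) gives 0 (a
10 ms period is shorter than the 12 ms cost), while
progress_permille(10, 12000) gives 880. Lowering the frequency in this
model only widens the gap between NMIs; it does not shorten the SMI.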

>Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>---
> xen/arch/x86/nmi.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
>index d7fce28..1eb2a32 100644
>--- a/xen/arch/x86/nmi.c
>+++ b/xen/arch/x86/nmi.c
>@@ -34,7 +34,8 @@
> #include <asm/apic.h>
> 
> unsigned int nmi_watchdog = NMI_NONE;
>-static unsigned int nmi_hz = HZ;
>+/* initial watchdog frequency - shouldn't be too high to avoid boot hangs */
>+static unsigned int nmi_hz = HZ / 10;
> static unsigned int nmi_perfctr_msr;	/* the MSR to reset in NMI handler */
> static unsigned int nmi_p4_cccr_val;
> static DEFINE_PER_CPU(struct timer, nmi_timer);


