xen-devel.lists.xenproject.org archive mirror
From: Igor Druzhinin <igor.druzhinin@citrix.com>
To: Alexey G <x1917x@gmail.com>
Cc: andrew.cooper3@citrix.com, Jan Beulich <JBeulich@suse.com>,
	xen-devel@lists.xen.org
Subject: Re: [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs
Date: Thu, 8 Feb 2018 10:47:45 +0000
Message-ID: <c1bee848-8ecb-9748-dcad-bb529310fc80@citrix.com>
In-Reply-To: <20180208163732.000050d9@gmail.com>

On 08/02/18 06:37, Alexey G wrote:
> On Wed, 7 Feb 2018 13:01:08 +0000
> Igor Druzhinin <igor.druzhinin@citrix.com> wrote:
>> So far the issue has been confirmed on:
>> Dell PowerEdge R740, Huawei systems based on Xeon Gold 6152 (the one
>> that it was tested on), Intel S2600XX, etc.
>>
>> Also see:
>> https://bugs.xenserver.org/browse/XSO-774
>>
>> Well, no-watchdog is what we currently recommend in that case, but we
>> hoped there would be a general solution on the Xen side. You have a
>> point that they should fix this on their side, because it is indeed
>> their fault. But the user experience is also important for us, I think.
> 
> Igor,
> 
> It would be nice to measure the actual SMI handling time on affected
> systems (e.g. via rdtsc before/after inb(0x61), perhaps averaged over
> multiple reads) to check whether it really is 10+ ms.
> 
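A minimal sketch of the measurement suggested above, assuming bare-metal Linux on x86 with root (CAP_SYS_RAWIO) so that ioperm() can grant access to port 61h. The names summarize_port61_samples() and sample_port61() are hypothetical. Results are in TSC cycles; converting them to milliseconds needs the TSC frequency, which is not measured here. rdtsc is not serializing, which is irrelevant at the millisecond scale being discussed.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-read cost of inb(0x61), in TSC cycles. */
struct smi_cost_stats {
    uint64_t min;
    uint64_t max;
    uint64_t avg;
};

/*
 * Reduce paired TSC samples (taken just before and just after each
 * inb(0x61)) to min/avg/max.  Returns 0 on success, -1 if n == 0.
 */
static int summarize_port61_samples(const uint64_t *before,
                                    const uint64_t *after, size_t n,
                                    struct smi_cost_stats *out)
{
    uint64_t sum = 0;

    if (n == 0)
        return -1;

    out->min = UINT64_MAX;
    out->max = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t d = after[i] - before[i];

        if (d < out->min)
            out->min = d;
        if (d > out->max)
            out->max = d;
        sum += d;
    }
    out->avg = sum / n;
    return 0;
}

#if defined(__linux__) && (defined(__x86_64__) || defined(__i386__))
#include <sys/io.h>     /* ioperm(), inb(): x86 Linux only */
#include <x86intrin.h>  /* __rdtsc() */

/* Take n timed reads of port 61h; returns -1 if port access is denied. */
static int sample_port61(uint64_t *before, uint64_t *after, size_t n)
{
    if (ioperm(0x61, 1, 1) != 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        before[i] = __rdtsc();
        (void)inb(0x61);        /* may trap into SMM on affected firmware */
        after[i] = __rdtsc();
    }
    ioperm(0x61, 1, 0);
    return 0;
}
#endif
```

Reporting max alongside avg matters here: an average over many reads can hide rare long spikes, which is exactly the pattern described in the reply below.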

I've done this measurement before. What we are seeing is that the time
spent in SMI spikes (sometimes up to 200ms) at the moment we go through
the INIT-SIPI-SIPI sequence. It looks like this is enough to push the
system into a livelock spiral. So I agree with Jan to some extent that
the proposed workaround might not work on some systems.
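A back-of-the-envelope model of why a long SMI can become a livelock, under simplifying assumptions of our own: each watchdog NMI on the affected CPU performs one port 61h read that traps into SMM, the SMI cost is constant, and there are no other SMI sources. At a watchdog rate of f NMIs per second and an SMI cost of t milliseconds, roughly t * f / 1000 of every second is spent in that path; once this reaches 1, the next NMI is already pending when the previous one finishes. The function names are hypothetical and the rates below are illustrative only.

```c
#include <stdbool.h>

/* Fraction of wall-clock time spent in the trapped port 61h read. */
static double smi_duty_cycle(double smi_ms, double nmi_hz)
{
    return smi_ms * nmi_hz / 1000.0;
}

/* No forward progress is possible once the duty cycle reaches 1. */
static bool nmi_livelock_likely(double smi_ms, double nmi_hz)
{
    return smi_duty_cycle(smi_ms, nmi_hz) >= 1.0;
}
```

With a 200ms spike, a 10 Hz watchdog gives a duty cycle of 2.0 (livelock), while 1 Hz gives 0.2. It also shows why lowering the frequency only helps if the spikes stay below the new period, which is the concern about the workaround not covering every system.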

> There might be a chance that the perf counter frequency is calculated
> wrongly on some systems, resulting in a very high rate of NMI watchdog
> ticks rather than a long SMI handler execution time. >10ms just looks...
> too extreme.
> 

We ruled that out.
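For reference, the hypothesis above reduces to simple arithmetic. Assume (hypothetically; this is not taken from Xen's watchdog code) that the watchdog programs a cycle-counting perf counter to overflow every assumed_khz * 1000 / nmi_hz events. If the counter actually advances at a different rate than the frequency used to compute that reload value, the effective NMI rate scales by the ratio of the two. Both function names are hypothetical.

```c
#include <stdint.h>

/* Counter events between overflows for a target NMI rate; 0 if nmi_hz is 0. */
static uint64_t watchdog_reload(uint64_t assumed_khz, unsigned int nmi_hz)
{
    if (nmi_hz == 0)
        return 0;
    return assumed_khz * 1000u / nmi_hz;
}

/* Effective NMIs per second when the counter really ticks at actual_khz. */
static double effective_nmi_hz(uint64_t reload, uint64_t actual_khz)
{
    if (reload == 0)
        return 0.0;
    return (double)actual_khz * 1000.0 / (double)reload;
}
```

For example, a reload computed from 2100000 kHz at 1 Hz gives one NMI per second on a 2.1 GHz counter, but a reload computed from a frequency 100 times too low yields 100 NMIs per second on the same hardware.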

> Huawei Server 2488 V5 BIOS: a similar SMI I/O trap handler for port 61h
> was found. Some differences from the Gigabyte H270 system, though:
> 
> - no "allocated" I/O traps anymore, but one additional SMI I/O trap
>   was encountered: port 900h, dword size. Possibly related to PCIe PM
>   facilities.
> 
> - the port 61h SMI handler now has multiple calls to debug/assert stub
>   functions. Some of the impacted systems might be running a debug BIOS
>   build, in which those stubs expand to real debugging code that slows
>   down SMI handling.
> 
> A few additional observations:
> 
> - the port 61h I/O trap SMI handler checks that the accessed I/O
>   address/size equals 61h/1 byte. Reading port 61h via
>   inw(0x60)/inl(0x60)/etc. might therefore behave differently.
> 
> - it looks like there is an alternative way to read the NMI status
>   without triggering an SMI, via ports 63h/65h/67h, but this depends on
>   an undocumented bit in the Generic Control and Status register.
> 
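A minimal model of the port 61h observations above. port61_trap_handles() mirrors the check as described (exact address 61h, 1-byte access); it is a model of the reported firmware behaviour, not BIOS code, and all names are hypothetical. decode_nmi_status() uses the standard port 61h status bits (bit 7: SERR# NMI source, bit 6: IOCHK# NMI source). The 63h/65h/67h alias path is not modelled, since it depends on an undocumented bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define NMI_SC_PORT      0x61u
#define NMI_SC_SERR_STS  0x80u  /* bit 7: SERR# NMI source status */
#define NMI_SC_IOCHK_STS 0x40u  /* bit 6: IOCHK# NMI source status */

/* Model of the handler's check: only a 1-byte access at exactly 61h. */
static bool port61_trap_handles(uint16_t port, unsigned int size)
{
    return port == NMI_SC_PORT && size == 1;
}

/* Does an access of `size` bytes starting at `port` include port 61h? */
static bool access_covers_port61(uint16_t port, unsigned int size)
{
    return port <= NMI_SC_PORT && NMI_SC_PORT < (unsigned int)port + size;
}

struct nmi_reason {
    bool serr;
    bool iochk;
};

/* Decode the NMI source bits from a port 61h value. */
static struct nmi_reason decode_nmi_status(uint8_t val)
{
    struct nmi_reason r = {
        .serr  = (val & NMI_SC_SERR_STS) != 0,
        .iochk = (val & NMI_SC_IOCHK_STS) != 0,
    };
    return r;
}
```

In this model an inw(0x60) covers port 61h yet fails the handler's 61h/1-byte check. Whether the chipset raises the SMI for such an access in the first place depends on how the I/O trap range is programmed, which the model does not capture.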

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
