Re: [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs

From: Alexey G <x1917x@gmail.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Igor Druzhinin <igor.druzhinin@citrix.com>,
	Jan Beulich <JBeulich@suse.com>,
	xen-devel@lists.xen.org
Subject: Re: [PATCH] x86/nmi: lower initial watchdog frequency to avoid boot hangs
Date: Fri, 9 Feb 2018 01:28:50 +1000
Message-ID: <20180209012850.0000705d@gmail.com>
In-Reply-To: <63e98a88-1511-e4a4-492e-f035d4fda423@citrix.com>

On Thu, 8 Feb 2018 15:00:33 +0000
Andrew Cooper <andrew.cooper3@citrix.com> wrote:

>On 08/02/18 14:37, Alexey G wrote:
>> On Thu, 8 Feb 2018 12:40:41 +0000
>> Andrew Cooper <andrew.cooper3@citrix.com> wrote:  
>>> - Perf/Oprofile.  This is currently mutually exclusive with Xen
>>> using the watchdog, but needn't be and hopefully won't be in the
>>> future. 
>>>> Most of the time we deal with watchdog NMIs, while all others
>>>> should be somewhat rare. The thing is, we actually need to read
>>>> I/O port 61h on system NMIs only. 
>>>>
>>>> If the main problem lies in a flow of SMIs due to reading port 61h
>>>> on every NMI watchdog tick -- why not avoid reading it?
>>>>
>>>> There are at least 2 ways to check if the NMI was due to a watchdog
>>>> tick:
>>>> - LAPIC (SDM states that "When a performance monitoring counters
>>>> interrupt is generated, the mask bit for its associated LVT entry
>>>> is set")
>>>> - perf MSR overflow bit
>>>>
>>>> So, if we detect it was a NMI due to a watchdog using these
>>>> methods (early in the NMI handler), we can avoid touching the port
>>>> 61h and thus triggering SMI I/O trap on it.    
>>> The problem is having multiple NMIs arriving.  Like all other edge
>>> triggered interrupts, extra arrivals get dropped.  By skipping the
>>> 0x61 read if we believe it was a watchdog NMI, we've opened a race
>>> condition where we will completely miss the system NMI.  
>> There shouldn't be any problem I think. NMIs don't need to be cleared
>> with EOI and it's a common practice to handle NMIs one-by-one (as a
>> NMI handler is not reentrant in a typical scenario).
>>
>> Execution of SMI doesn't cause a pending (blocked) NMI to get
>> dropped, similar mechanisms might be employed for a single NMI which
>> arrived in blocked-by-NMI state. Otherwise the whole thing will
>> break -- merely handling arbitrary NMI will be enough to miss any
>> other NMIs. This is a too obvious flaw. So normally it should be
>> just a matter which NMI of two will be serviced first.
>> This assumption can be verified empirically by requesting the chipset
>> to send an external NMI while serving a watchdog NMI and checking if
>> it arrives later on.

>NMI handling works just like other interrupts, except that its
>equivalent of the ISR/IRR state is hidden.
>
>One new NMI will become pending while an NMI is in progress (because
>there is an IRR bit to be set), but any further will be dropped.
>
>You can demonstrate this easily by having CPUs or the chipset send
>NMIs.

One in service with one pending is enough for our scenario. It covers
the situation where either a watchdog NMI or an external NMI is being
serviced and another (single) NMI arrives. As long as the two are
serviced one after another, no NMI should be missed because of a
misread NMI status: all the related status bits are sticky (including
the Mask bit in the LAPIC LVT entry) and won't go away whichever order
the two NMIs are handled in, provided both handlers actually run.
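
To make the argument concrete, here is a toy model of the "one in
service, one pending" behaviour with sticky status bits. It is only a
sketch under stated assumptions, not Xen code: NmiModel, raise_nmi(),
handle() and run() are hypothetical names, wd_status stands in for
whichever watchdog indication is used (LVT mask bit or perf overflow
bit), and the handler is assumed to service one source per invocation
and to skip port 0x61 whenever the watchdog indication is set.

```python
from dataclasses import dataclass, field

WATCHDOG = "watchdog"
SYSTEM = "system"


@dataclass
class NmiModel:
    """Toy model only; every name here is hypothetical, not Xen code."""
    pending: bool = False      # the single hidden "IRR-like" latch
    wd_status: bool = False    # sticky watchdog indication (e.g. LVT mask bit)
    sys_status: bool = False   # sticky system NMI indication (port 0x61)
    port61_reads: int = 0      # how often the handler touched port 0x61
    dropped_edges: int = 0     # edges that found the latch already set
    handled: list = field(default_factory=list)

    def raise_nmi(self, src):
        # The source's status bit is set whatever happens to the edge.
        if src == WATCHDOG:
            self.wd_status = True
        else:
            self.sys_status = True
        if self.pending:
            self.dropped_edges += 1
        else:
            self.pending = True

    def handle(self):
        # Check the watchdog indication first; skip port 0x61 if it is set.
        if self.wd_status:
            self.wd_status = False
            self.handled.append(WATCHDOG)
            return
        self.port61_reads += 1
        if self.sys_status:
            self.sys_status = False
            self.handled.append(SYSTEM)

    def run(self, first, during=()):
        self.raise_nmi(first)
        arrivals = list(during)
        while self.pending:
            self.pending = False      # delivery clears the latch
            for src in arrivals:      # these arrive while blocked-by-NMI
                self.raise_nmi(src)
            arrivals = []
            self.handle()


for first, second in ((WATCHDOG, SYSTEM), (SYSTEM, WATCHDOG)):
    m = NmiModel()
    m.run(first, during=[second])
    print(first, "first:", m.handled, "port 0x61 reads:", m.port61_reads)
```

In this model both sources end up handled in either arrival order, and
port 0x61 is read only on the invocation that does not see the watchdog
indication. The model covers just the two-NMI case discussed here; it
makes no claim about what happens once a further edge is dropped.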
