Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI's instead of self_nmi() in VMEXIT handler

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Tim Deegan <tim@xen.org>
Cc: Ian Campbell <Ian.Campbell@citrix.com>,
	"eddie.dong@intel.com" <eddie.dong@intel.com>,
	xen-devel <xen-devel@lists.xen.org>,
	Jan Beulich <JBeulich@suse.com>,
	"jun.nakajima@intel.com" <jun.nakajima@intel.com>,
	Malcolm Crossley <malcolm.crossley@citrix.com>
Subject: Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI's instead of self_nmi() in VMEXIT handler
Date: Thu, 15 Nov 2012 16:52:20 +0000	[thread overview]
Message-ID: <50A51DC4.7040205@citrix.com> (raw)
In-Reply-To: <20121115164156.GE75988@ocelot.phlegethon.org>


On 15/11/12 16:41, Tim Deegan wrote:
> Hi, 
>
> At 10:06 +0000 on 14 Nov (1352887560), Jan Beulich wrote:
>>> +            asm volatile("int $2"); /* Real NMI, vector 2: normal processing */
>> And I still don't like this use of "int $2" here: An aspect we didn't
>> consider so far is that a nested MCE would break things again
> OK, I think I understand the problem[s], but I'm going to spell it out
> slowly so you can correct me. :)
>
> [ tl;dr I agree that do_nmi() is better, and we should do that in this
>   patch, but maybe we need to solve the general problem too. ]
>
> On a PV guest, we have to use dedicated stacks for NMI and MCE in case
> either of those things happens just before SYSRET when we're on the user
> stack (no other interrupt or exception can happen at that point).
>
> On an AMD CPU we _don't_ have dedicated stacks for NMI or MCE when we're
> running a HVM guest, so the stack issue doesn't apply (but nested NMIs
> are still bad).
>
> On an Intel CPU, we _do_ use dedicated stacks for NMI and MCE in HVM
> guests.  We don't really have to but it saves time in the context switch
> not to update the IDT.  Using do_nmi() here means that the first NMI is
> handled on the normal stack instead.  It's also consistent with the way
> we call do_machine_check() for the MCE case.  But it needs an explicit
> IRET after the call to do_nmi() to make sure that NMIs get re-enabled.
>
> These dedicated stacks make the general problem of re-entrant MCE/NMI
> worse.  In the general case those handlers don't expect to be called in
> a reentrant way, but blatting the stack turns a possible problem into a
> definite one.

I have made a fairly simple patch which deliberately invokes a
re-entrant NMI.  The result is that a PCPU spins around the NMI handler
until the watchdog takes the host down.  It is also possible to get a
reentrant NMI if there is a pagefault (or handful of other possible
faults) when trying to execute the iret of the NMI itself; NMIs can get
re-enabled from the iret of the pagefault, and we take a new NMI before
attempting to retry the iret from the original NMI.

>
> ---
>
> All of this would be moot except for the risk that we might take an MCE
> while in the NMI handler.  The IRET from the MCE handler re-enables NMIs
> while we're still in the NMI handler, and a second NMI arriving could
> break the NMI handler.  In the PV case, it will also clobber the NMI
> handler's stack.  In the VMX case we would need to see something like
> (NMI (MCE) (NMI (MCE) (NMI))) for that to happen, but it could.

There is the MCIP bit in an MCE status MSR which acts as a latch for
MCEs.  If a new MCE is generated while this bit is set, then a triple
fault occurs.  An MCE handler is required to set this bit to 0 to
indicate that it has dealt with the MCE.  However, there is a race
condition window between setting this bit to 0 and leaving the MCE stack
during which another MCE can arrive and corrupt the stack.

>
> The inverse case, taking an NMI while in the MCE handler, is not very
> interesting.  There's no masking of MCEs so that handler already has to
> deal with nested entry, and the IRET from the NMI handler has no effect.
>
> We could potentially solve the problem by having the MCE handler check
> whether it's returning to the NMI stack, and do a normal return in that
> case.  It's a bit of extra code but only in the MCE handler, which is
> not performance-critical. 
>
> If we do that, then the choice of 'int $2' vs 'do_nmi(); fake_iret()'
> is mostly one of taste.  do_nmi() saves an IDT indirection but
> unbalances the call/return stack.  I slightly prefer 'int $2' just
> because it makes the PV and non-PV cases more similar.
>
> But first, we should take the current fix, with do_nmi() and iret() 
> instead of 'int $2'.  The nested-MCE issue can be handled separately.
>
> Does that make sense?

I have been looking at appling a similar fix to Linuses fix
(http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3f3c8b8c4b2a34776c3470142a7c8baafcda6eb0)
to Xen, for both the NMI and MCE stacks.

Work is currently in the preliminary stages at the moment.

~Andrew

>
> Tim.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

next prev parent reply	other threads:[~2012-11-15 16:52 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-13 20:08 [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI's instead of self_nmi() in VMEXIT handler Malcolm Crossley
2012-11-14 10:06 ` Jan Beulich
2012-11-15 16:41   ` Tim Deegan
2012-11-15 16:52     ` Andrew Cooper [this message]
2012-11-15 17:25       ` Tim Deegan
2012-11-16  8:17         ` Jan Beulich
2012-11-16  9:59           ` Mats Petersson
2012-11-16 10:18             ` Keir Fraser
2012-11-15 17:03     ` Mats Petersson
2012-11-15 17:15       ` Tim Deegan
2012-11-15 17:33         ` Mats Petersson
2012-11-15 17:44           ` Tim Deegan
2012-11-15 18:23             ` Mats Petersson
2012-11-16  8:07     ` Jan Beulich
2012-11-16 10:56       ` Tim Deegan
2012-11-16 11:23         ` Jan Beulich
2012-11-16 11:52           ` Andrew Cooper
2012-11-16 13:53             ` Tim Deegan
2012-11-16 14:11               ` Andrew Cooper
2012-11-22  8:58 ` Jan Beulich
2012-11-22 10:52   ` Andrew Cooper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50A51DC4.7040205@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=eddie.dong@intel.com \
    --cc=jun.nakajima@intel.com \
    --cc=malcolm.crossley@citrix.com \
    --cc=tim@xen.org \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).