From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Tim Deegan <tim@xen.org>
Cc: Ian Campbell <Ian.Campbell@citrix.com>,
"eddie.dong@intel.com" <eddie.dong@intel.com>,
xen-devel <xen-devel@lists.xen.org>,
Jan Beulich <JBeulich@suse.com>,
"jun.nakajima@intel.com" <jun.nakajima@intel.com>,
Malcolm Crossley <malcolm.crossley@citrix.com>
Subject: Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI's instead of self_nmi() in VMEXIT handler
Date: Thu, 15 Nov 2012 16:52:20 +0000
Message-ID: <50A51DC4.7040205@citrix.com>
In-Reply-To: <20121115164156.GE75988@ocelot.phlegethon.org>
On 15/11/12 16:41, Tim Deegan wrote:
> Hi,
>
> At 10:06 +0000 on 14 Nov (1352887560), Jan Beulich wrote:
>>> + asm volatile("int $2"); /* Real NMI, vector 2: normal processing */
>> And I still don't like this use of "int $2" here: An aspect we didn't
>> consider so far is that a nested MCE would break things again
> OK, I think I understand the problem[s], but I'm going to spell it out
> slowly so you can correct me. :)
>
> [ tl;dr I agree that do_nmi() is better, and we should do that in this
> patch, but maybe we need to solve the general problem too. ]
>
> On a PV guest, we have to use dedicated stacks for NMI and MCE in case
> either of those things happens just before SYSRET when we're on the user
> stack (no other interrupt or exception can happen at that point).
>
> On an AMD CPU we _don't_ have dedicated stacks for NMI or MCE when we're
> running a HVM guest, so the stack issue doesn't apply (but nested NMIs
> are still bad).
>
> On an Intel CPU, we _do_ use dedicated stacks for NMI and MCE in HVM
> guests. We don't really have to but it saves time in the context switch
> not to update the IDT. Using do_nmi() here means that the first NMI is
> handled on the normal stack instead. It's also consistent with the way
> we call do_machine_check() for the MCE case. But it needs an explicit
> IRET after the call to do_nmi() to make sure that NMIs get re-enabled.
>
> These dedicated stacks make the general problem of re-entrant MCE/NMI
> worse. In the general case those handlers don't expect to be called in
> a reentrant way, but blatting the stack turns a possible problem into a
> definite one.
I have made a fairly simple patch which deliberately invokes a
re-entrant NMI. The result is that a PCPU spins in the NMI handler
until the watchdog takes the host down. It is also possible to get a
re-entrant NMI if a pagefault (or one of a handful of other possible
faults) occurs while executing the iret of the NMI itself: NMIs get
re-enabled by the iret of the pagefault handler, and we can take a new
NMI before retrying the iret from the original NMI.
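To make the faulting-iret sequence concrete, here is a toy model of the
NMI latch in C. It is only a sketch of the behaviour described above,
not Xen code: struct cpu_model and every function name in it are made
up for illustration, and the "CPU" is reduced to a blocked flag, a
pending flag and a nesting counter.

```c
#include <stdbool.h>

/* Toy model of the CPU's NMI blocking. Hypothetical names, not Xen code. */
struct cpu_model {
    bool nmi_blocked;  /* set when an NMI is delivered, cleared by any IRET */
    bool nmi_pending;  /* an NMI arrived while NMIs were blocked */
    int  nmi_depth;    /* number of live NMI handler frames */
    int  max_depth;    /* deepest nesting seen so far */
};

static void deliver_nmi(struct cpu_model *c)
{
    c->nmi_blocked = true;
    c->nmi_pending = false;
    if (++c->nmi_depth > c->max_depth)
        c->max_depth = c->nmi_depth;
}

static void raise_nmi(struct cpu_model *c)
{
    if (c->nmi_blocked)
        c->nmi_pending = true;   /* latched until NMIs are unblocked */
    else
        deliver_nmi(c);
}

/* Any successful IRET unblocks NMIs, whichever handler executes it. */
static void iret(struct cpu_model *c, bool leaving_nmi_handler)
{
    if (leaving_nmi_handler)
        c->nmi_depth--;
    c->nmi_blocked = false;
    if (c->nmi_pending)
        deliver_nmi(c);
}

/* Normal case: the NMI handler's own IRET completes. */
int depth_after_clean_iret(void)
{
    struct cpu_model c = { false, false, 0, 0 };
    raise_nmi(&c);      /* first NMI, depth 1 */
    raise_nmi(&c);      /* second NMI is latched */
    iret(&c, true);     /* handler leaves, then the latched NMI runs */
    return c.max_depth;
}

/* Bad case: the NMI handler's IRET faults, the fault handler IRETs first. */
int depth_after_faulting_iret(void)
{
    struct cpu_model c = { false, false, 0, 0 };
    raise_nmi(&c);      /* first NMI, depth 1 */
    raise_nmi(&c);      /* second NMI is latched */
    iret(&c, false);    /* pagefault handler's IRET unblocks NMIs early */
    return c.max_depth;
}
```

In this model the clean path never nests deeper than one frame, while
the faulting path delivers the latched NMI on top of the first one.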
>
> ---
>
> All of this would be moot except for the risk that we might take an MCE
> while in the NMI handler. The IRET from the MCE handler re-enables NMIs
> while we're still in the NMI handler, and a second NMI arriving could
> break the NMI handler. In the PV case, it will also clobber the NMI
> handler's stack. In the VMX case we would need to see something like
> (NMI (MCE) (NMI (MCE) (NMI))) for that to happen, but it could.
There is an MCIP bit in the MCG_STATUS MSR which acts as a latch for
MCEs. If a new MCE is generated while this bit is set, the processor
shuts down, as it would on a triple fault. An MCE handler is required
to clear this bit to indicate that it has dealt with the MCE. However,
there is a race window between clearing this bit and leaving the MCE
stack, during which another MCE can arrive and corrupt the stack.
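The three windows can be sketched with a small C model. This is an
illustration under assumptions, not handler code: MCIP is taken to be
bit 2 of the global machine-check status register, and struct
mce_model, the phase numbering and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_MCIP (1ULL << 2)  /* machine check in progress */

enum mce_outcome { MCE_HANDLED, MCE_SHUTDOWN, MCE_STACK_CLOBBERED };

/* Toy model: the MCIP latch plus "is the handler still on the MCE stack". */
struct mce_model {
    uint64_t mcg_status;
    bool     on_mce_stack;
};

static enum mce_outcome raise_mce(struct mce_model *m)
{
    if (m->mcg_status & MCG_STATUS_MCIP)
        return MCE_SHUTDOWN;          /* latch still set: CPU shuts down */
    m->mcg_status |= MCG_STATUS_MCIP;
    if (m->on_mce_stack)
        return MCE_STACK_CLOBBERED;   /* new frame lands on the live stack */
    m->on_mce_stack = true;
    return MCE_HANDLED;
}

/*
 * Outcome of a second MCE arriving at a given point in the first handler:
 *   phase 0: before the handler clears MCIP
 *   phase 1: after clearing MCIP, before leaving the MCE stack
 *   phase 2: after the handler has returned
 */
enum mce_outcome second_mce_at(int phase)
{
    struct mce_model m = { 0, false };

    raise_mce(&m);                            /* first MCE */
    if (phase >= 1)
        m.mcg_status &= ~MCG_STATUS_MCIP;     /* handler clears the latch */
    if (phase >= 2)
        m.on_mce_stack = false;               /* handler has returned */
    return raise_mce(&m);                     /* second MCE */
}
```

Only phase 1 is the race: the latch no longer protects the handler, but
the handler has not yet left its stack.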
>
> The inverse case, taking an NMI while in the MCE handler, is not very
> interesting. There's no masking of MCEs so that handler already has to
> deal with nested entry, and the IRET from the NMI handler has no effect.
>
> We could potentially solve the problem by having the MCE handler check
> whether it's returning to the NMI stack, and do a normal return in that
> case. It's a bit of extra code but only in the MCE handler, which is
> not performance-critical.
>
> If we do that, then the choice of 'int $2' vs 'do_nmi(); fake_iret()'
> is mostly one of taste. do_nmi() saves an IDT indirection but
> unbalances the call/return stack. I slightly prefer 'int $2' just
> because it makes the PV and non-PV cases more similar.
>
> But first, we should take the current fix, with do_nmi() and iret()
> instead of 'int $2'. The nested-MCE issue can be handled separately.
>
> Does that make sense?
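For reference, the "check whether it's returning to the NMI stack" idea
quoted above could look roughly like the C sketch below. It is an
assumption about one way to do it, not a proposed patch: struct
stack_range, sp_on_stack() and choose_mce_exit() are hypothetical
names, and the stack is described simply by its lowest address and its
size.

```c
#include <stdbool.h>
#include <stdint.h>

/* A stack described by its lowest address and size (hypothetical type). */
struct stack_range {
    uintptr_t base;
    uintptr_t size;
};

enum mce_exit {
    EXIT_IRET,     /* ordinary return: IRET, which re-enables NMIs */
    EXIT_NO_IRET   /* return without IRET, leaving NMIs blocked */
};

/* True if sp lies in [base, base + size). */
static bool sp_on_stack(uintptr_t sp, const struct stack_range *r)
{
    /* Unsigned subtraction wraps for sp < base, so one compare is enough. */
    return sp - r->base < r->size;
}

/*
 * Decide how the MCE handler should return, given the stack pointer saved
 * in the exception frame. If the interrupted code was on the NMI stack,
 * an NMI handler is still running and an IRET would unblock NMIs under it.
 */
enum mce_exit choose_mce_exit(uintptr_t interrupted_sp,
                              const struct stack_range *nmi_stack)
{
    return sp_on_stack(interrupted_sp, nmi_stack) ? EXIT_NO_IRET : EXIT_IRET;
}
```

The top-of-stack address itself is treated as outside the range, on the
assumption that a running NMI handler has already had its exception
frame pushed below it.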
I have been looking at applying a fix similar to Linus's
(http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3f3c8b8c4b2a34776c3470142a7c8baafcda6eb0)
to Xen, for both the NMI and MCE stacks.
The work is currently at a preliminary stage.
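As a rough picture of the general approach, here is a heavily
simplified C model: a nested entry does no work itself, it only marks
the outer handler to run again. This is my own sketch of the idea and
not the code from the commit linked above nor anything from Xen; struct
nmi_state, its fields and the function names are hypothetical, and the
"inject" counter only exists to simulate NMIs arriving mid-handler.

```c
#include <stdbool.h>

/* Toy state for a handler that tolerates nested entry (hypothetical). */
struct nmi_state {
    bool executing;  /* an NMI handler is already running */
    bool repeat;     /* a nested NMI arrived; run the handler again */
    int  handled;    /* number of times the handler body has run */
    int  inject;     /* simulated NMIs that arrive during the body */
};

static void nmi_entry(struct nmi_state *s)
{
    if (s->executing) {
        /* Nested entry: do no work here, just ask the outer one to repeat. */
        s->repeat = true;
        return;
    }

    s->executing = true;
    do {
        s->repeat = false;
        s->handled++;               /* the real handler body would go here */

        if (s->inject > 0) {        /* simulate an NMI landing mid-handler */
            s->inject--;
            nmi_entry(s);
        }
    } while (s->repeat);
    s->executing = false;
}

/* Run one NMI with 'nested' further NMIs arriving while it is handled. */
int handler_runs(int nested)
{
    struct nmi_state s = { false, false, 0, nested };
    nmi_entry(&s);
    return s.handled;
}
```

The point of the flag is that the handler body never runs re-entrantly,
so it never sees its own state or stack frame clobbered, and no NMI is
lost.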
~Andrew
>
> Tim.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com