Reentrant NMIs, MCEs and interrupt stack tables.

* Reentrant NMIs, MCEs and interrupt stack tables.
@ 2012-11-21 21:06 Andrew Cooper
  2012-11-21 21:17 ` Tim Deegan
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Cooper @ 2012-11-21 21:06 UTC
  To: xen-devel@lists.xen.org, Keir Fraser, Jan Beulich, Tim Deegan

Hello,

While working on a fix for the rare-but-possible problem of reentrant
NMIs and MCEs, I have discovered that it is sadly possible to generate
fake NMIs and MCEs which will run the relevant handlers on the relevant
stacks, without invoking any of the other CPU logic for these special
interrupts.

A fake NMI can be generated by a processor in PIC mode as opposed to
Virtual wire mode, with a delivery of vector 2.  This setup is certainly
possible on a 64bit CPU, but I doubt there are many 64bit CPUs running
with only PIC.

A fake MCE is easy to generate.  A mal-programmed IO-APIC, IOMMU or
MSI/MSI-X entry which delivers vector 18 (0x12) is sufficient.  The LAPIC
will reject vectors 0 thru 0xf, but will deliver vectors 0x10 thru 0x1f,
despite them being architecturally reserved for exceptions.
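
As an illustration of the vector ranges described above, here is a minimal
C sketch.  The enum, the function names and the idea of a sanity check
before programming a vector are all hypothetical and not taken from Xen;
only the ranges come from this message, plus the architectural fact that
#MC is exception vector 18 (0x12).

```c
#include <stdint.h>

/* How the LAPIC treats a fixed-delivery vector, per the ranges above. */
enum vector_class {
    VEC_LAPIC_REJECTED,       /* 0x00-0x0f: rejected by the LAPIC */
    VEC_RESERVED_DELIVERABLE, /* 0x10-0x1f: reserved for exceptions, yet delivered */
    VEC_NORMAL                /* 0x20-0xff: ordinary interrupt vectors */
};

static enum vector_class classify_vector(uint8_t vector)
{
    if (vector < 0x10)
        return VEC_LAPIC_REJECTED;
    if (vector < 0x20)
        return VEC_RESERVED_DELIVERABLE;
    return VEC_NORMAL;
}

/*
 * Hypothetical check for code that writes an IO-APIC, IOMMU or MSI entry:
 * only vectors outside the exception range are acceptable.
 */
static int vector_is_safe_to_program(uint8_t vector)
{
    return classify_vector(vector) == VEC_NORMAL;
}
```

With such a check, an entry carrying vector 18 (0x12, #MC) would be refused
before it could ever reach the machine check handler as a fake MCE.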

The possibility of these fake interrupts (however unlikely) means that
there is necessarily a race condition between receiving a fake interrupt
and a genuine interrupt during which the handler cannot fix up the stack
sufficiently to be able to safely get back out.  If this race condition
were to occur, the real interrupt would corrupt the exception frame of
the fake interrupt, meaning that we could not possibly resume the original
context.  This situation can be detected, but cannot be corrected, and
the only course of action is to crash gracefully.

The above problem made me wonder why we use separate stacks for NMIs and
MCEs.  I completely accept that the double fault handler should be on a
separate stack, but as we guarantee never to return from it, these
problems disappear.

Is there any particular reason to have separate stacks for NMIs and
MCEs, other than perhaps that it is good/common practice?  I can't think
of any other reasons offhand. (I am not necessarily advocating that we
combine NMIs and MCEs back into the regular Xen stack because, while it
would remove the above race condition, it would make other aspects of
the problem harder to solve.)

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


* Re: Reentrant NMIs, MCEs and interrupt stack tables.
  2012-11-21 21:06 Reentrant NMIs, MCEs and interrupt stack tables Andrew Cooper
@ 2012-11-21 21:17 ` Tim Deegan
  2012-11-21 21:40   ` Andrew Cooper
  2012-11-22  8:55   ` Jan Beulich
  0 siblings, 2 replies; 4+ messages in thread
From: Tim Deegan @ 2012-11-21 21:17 UTC
  To: Andrew Cooper; +Cc: Keir Fraser, Jan Beulich, xen-devel@lists.xen.org

At 21:06 +0000 on 21 Nov (1353532004), Andrew Cooper wrote:
> Hello,
> 
> While working on a fix for the rare-but-possible problem of reentrant
> NMIs and MCEs, I have discovered that it is sadly possible to generate
> fake NMIs and MCEs which will run the relevant handlers on the relevant
> stacks, without invoking any of the other CPU logic for these special
> interrupts.
> 
> A fake NMI can be generated by a processor in PIC mode as opposed to
> Virtual wire mode, with a delivery of vector 2.  This setup is certainly
> possible on a 64bit CPU, but I doubt there are many 64bit CPUs running
> with only PIC.
> 
> A fake MCE is easy to generate.  A mal-programmed IO-APIC, IOMMU or
> MSI/MSI-X entry which delivers vector 18 (0x12) is sufficient.  The LAPIC
> will reject vectors 0 thru 0xf, but will deliver vectors 0x10 thru 0x1f,
> despite them being architecturally reserved for exceptions.

You're not suggesting these could be caused by guest activity?

> The possibility of these fake interrupts (however unlikely) means that
> there is necessarily a race condition between receiving a fake interrupt
> and a genuine interrupt during which the handler cannot fixup the stack
> sufficiently to be able to safely get back out.  If this race condition
> were to occur, the real interrupt will corrupt the exception frame of
> the fake interrupt, meaning that we cannot possibly resume the original
> context.  This situation can be detected, but cannot be corrected, and
> the only course of action is to crash gracefully.

If one of these could only be caused by a bug in Xen, then I don't think
we need to handle it at all.  If it's trivial to detect it and crash
cleanly, that would be nice.

> The above problem made me wonder why we use separate stacks for NMIs and
> MCEs.  I completely accept that the double fault handler should be on a
> separate stack, but as we guarantee never to return from it, these
> problems disappear.
> 
> Is there any particular reason to have separate stacks for NMIs and
> MCEs, other than perhaps that it is good/common practice? 

It's to avoid a race where we take an NMI or MCE after switching to the
user/guest stack but before SYSRET.
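
To make the mechanism concrete: on x86-64, a non-zero IST index in an IDT
gate makes the CPU load RSP from the TSS unconditionally on delivery, so
the handler gets a known-good stack even if RSP holds a guest value while
still in ring 0.  Below is a minimal sketch of the gate encoding.  The IST
index names and the make_gate() helper are hypothetical, chosen for
illustration, and are not Xen's actual code.

```c
#include <stdint.h>

/* Hypothetical IST slot assignment; 0 means "do not switch stacks via IST". */
enum { IST_NONE = 0, IST_DF = 1, IST_NMI = 2, IST_MCE = 3 };

/* 64-bit IDT gate descriptor layout (16 bytes). */
struct idt_gate {
    uint16_t offset_low;   /* handler address bits 15:0 */
    uint16_t selector;     /* code segment selector */
    uint8_t  ist;          /* bits 2:0 = IST index, bits 7:3 reserved */
    uint8_t  type_attr;    /* 0x8e = present, DPL 0, 64-bit interrupt gate */
    uint16_t offset_mid;   /* handler address bits 31:16 */
    uint32_t offset_high;  /* handler address bits 63:32 */
    uint32_t reserved;
};

_Static_assert(sizeof(struct idt_gate) == 16, "IDT gate must be 16 bytes");

static struct idt_gate make_gate(uint64_t handler, uint16_t cs, unsigned int ist)
{
    struct idt_gate g = {0};

    g.offset_low  = (uint16_t)(handler & 0xffff);
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xffff);
    g.offset_high = (uint32_t)(handler >> 32);
    g.selector    = cs;
    g.ist         = (uint8_t)(ist & 0x7);  /* only three bits are defined */
    g.type_attr   = 0x8e;
    return g;
}
```

A gate built with IST_NMI or IST_MCE always starts its handler on the
dedicated stack, which closes the window before SYSRET.  The cost is the
one discussed in this thread: a second delivery through the same gate
starts from the same stack top and overwrites the first exception frame.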

Tim.


* Re: Reentrant NMIs, MCEs and interrupt stack tables.
  2012-11-21 21:17 ` Tim Deegan
@ 2012-11-21 21:40   ` Andrew Cooper
  2012-11-22  8:55   ` Jan Beulich
  1 sibling, 0 replies; 4+ messages in thread
From: Andrew Cooper @ 2012-11-21 21:40 UTC
  To: Tim Deegan; +Cc: Keir (Xen.org), Jan Beulich, xen-devel@lists.xen.org

On 21/11/12 21:17, Tim Deegan wrote:
> At 21:06 +0000 on 21 Nov (1353532004), Andrew Cooper wrote:
>> Hello,
>>
>> While working on a fix for the rare-but-possible problem of reentrant
>> NMIs and MCEs, I have discovered that it is sadly possible to generate
>> fake NMIs and MCEs which will run the relevant handlers on the relevant
>> stacks, without invoking any of the other CPU logic for these special
>> interrupts.
>>
>> A fake NMI can be generated by a processor in PIC mode as opposed to
>> Virtual wire mode, with a delivery of vector 2.  This setup is certainly
>> possible on a 64bit CPU, but I doubt there are many 64bit CPUs running
>> with only PIC.
>>
>> A fake MCE is easy to generate.  A mal-programmed IO-APIC, IOMMU or
>> MSI/MSI-X entry which delivers vector 18 (0x12) is sufficient.  The LAPIC
>> will reject vectors 0 thru 0xf, but will deliver vectors 0x10 thru 0x1f,
>> despite them being architecturally reserved for exceptions.
> You're not suggesting these could be caused by guest activity?

No.  This would be buggy hardware or buggy Xen.  Perhaps I should have
said "A fake MCE is easy to generate (if you are hacking Xen to try and
deliberately make it happen)", although 'easy' is just speculation based
on the description of the LAPIC's behaviour in the Intel SDM Volume 3.

>
>> The possibility of these fake interrupts (however unlikely) means that
>> there is necessarily a race condition between receiving a fake interrupt
>> and a genuine interrupt during which the handler cannot fixup the stack
>> sufficiently to be able to safely get back out.  If this race condition
>> were to occur, the real interrupt will corrupt the exception frame of
>> the fake interrupt, meaning that we cannot possibly resume the original
>> context.  This situation can be detected, but cannot be corrected, and
>> the only course of action is to crash gracefully.
> If one of these could only be caused by a bug in Xen, then I don't think
> we need to handle it at all.  If it's trivial to detect it and crash
> cleanly, that would be nice.

With all the other gubbins in to work around the stack problem, it
becomes two extra conditionals, so for all intents and purposes trivial.

>
>> The above problem made me wonder why we use separate stacks for NMIs and
>> MCEs.  I completely accept that the double fault handler should be on a
>> separate stack, but as we guarantee never to return from it, these
>> problems disappear.
>>
>> Is there any particular reason to have separate stacks for NMIs and
>> MCEs, other than perhaps that it is good/common practice? 
> It's to avoid a race where we take an NMI or MCE after switching to the
> user/guest stack but before SYSRET.
>
> Tim.

Ah yes - I forgot to consider that case.

Thanks,

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


* Re: Reentrant NMIs, MCEs and interrupt stack tables.
  2012-11-21 21:17 ` Tim Deegan
  2012-11-21 21:40   ` Andrew Cooper
@ 2012-11-22  8:55   ` Jan Beulich
  1 sibling, 0 replies; 4+ messages in thread
From: Jan Beulich @ 2012-11-22  8:55 UTC
  To: Andrew Cooper, Tim Deegan; +Cc: Keir Fraser, xen-devel@lists.xen.org

>>> On 21.11.12 at 22:17, Tim Deegan <tim@xen.org> wrote:
> At 21:06 +0000 on 21 Nov (1353532004), Andrew Cooper wrote:
>> Hello,
>> 
>> While working on a fix for the rare-but-possible problem of reentrant
>> NMIs and MCEs, I have discovered that it is sadly possible to generate
>> fake NMIs and MCEs which will run the relevant handlers on the relevant
>> stacks, without invoking any of the other CPU logic for these special
>> interrupts.
>> 
>> A fake NMI can be generated by a processor in PIC mode as opposed to
>> Virtual wire mode, with a delivery of vector 2.  This setup is certainly
>> possible on a 64bit CPU, but I doubt there are many 64bit CPUs running
>> with only PIC.
>> 
>> A fake MCE is easy to generate.  A mal-programmed IO-APIC, IOMMU or
>> MSI/MSI-X entry which delivers vector 18 (0x12) is sufficient.  The LAPIC
>> will reject vectors 0 thru 0xf, but will deliver vectors 0x10 thru 0x1f,
>> despite them being architecturally reserved for exceptions.
> 
> You're not suggesting these could be caused by guest activity?
> 
>> The possibility of these fake interrupts (however unlikely) means that
>> there is necessarily a race condition between receiving a fake interrupt
>> and a genuine interrupt during which the handler cannot fixup the stack
>> sufficiently to be able to safely get back out.  If this race condition
>> were to occur, the real interrupt will corrupt the exception frame of
>> the fake interrupt, meaning that we cannot possibly resume the original
>> context.  This situation can be detected, but cannot be corrected, and
>> the only course of action is to crash gracefully.
> 
> If one of these could only be caused by a bug in Xen, then I don't think
> we need to handle it at all.

Fully agree - the nesting we need to deal with cleanly is only
what can result from proper operation. Buggy operation should
not require any extra effort, as long as it's only hypothetical
(i.e. if we knew a certain chipset/CPU could cause such, the need
for a workaround would surely arise; bugs in Xen we should treat
as such rather than trying to work around their effects).

> If it's trivial to detect it and crash cleanly, that would be nice.

That shouldn't be too difficult, as such interrupts would set ISR bits
in either the PIC or the LAPIC.
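
One possible shape for such a check, sketched under stated assumptions: an
interrupt delivered as a fixed vector through the LAPIC sets the matching
in-service bit, whereas a genuine exception or NMI does not.  The xAPIC
exposes the ISR as eight 32-bit registers at offsets 0x100 to 0x170 with a
16-byte stride.  The helper names are hypothetical, and lapic_regs stands
for a pointer to the mapped register page; this is not Xen's
implementation.

```c
#include <stdint.h>

#define APIC_ISR_BASE   0x100u  /* first of eight ISR registers */
#define APIC_REG_STRIDE 0x10u   /* registers are 16 bytes apart */
#define VECTOR_MCE      18u     /* #MC exception vector (0x12) */

/* Return non-zero if the in-service bit for 'vector' is set. */
static int lapic_isr_bit_set(const volatile uint32_t *lapic_regs, uint8_t vector)
{
    uint32_t offset = APIC_ISR_BASE + (vector / 32u) * APIC_REG_STRIDE;
    uint32_t reg = lapic_regs[offset / sizeof(uint32_t)];

    return (int)((reg >> (vector % 32u)) & 1u);
}

/*
 * Hypothetical use in the machine check handler: an in-service bit for
 * vector 18 means the handler was entered via an ordinary interrupt.
 */
static int mce_is_fake(const volatile uint32_t *lapic_regs)
{
    return lapic_isr_bit_set(lapic_regs, VECTOR_MCE);
}
```

A handler that sees mce_is_fake() return non-zero could then take the
clean crash path rather than attempt to return.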

Jan
