* EFI mixed mode + perf = rampant triple faults
@ 2014-12-17 16:51 Andy Lutomirski
2014-12-17 16:54 ` Andy Lutomirski
0 siblings, 1 reply; 12+ messages in thread
From: Andy Lutomirski @ 2014-12-17 16:51 UTC (permalink / raw)
To: linux-efi, LKML
I figured I should send this email before I forget about this issue:
If you run perf record across any EFI mixed mode call or otherwise
receive an NMI or MCE, the machine triple-faults. The cause is
straightforward: there is no valid IDT when we have long mode disabled
for the duration of the EFI call.
As far as I know, the only way to have continuously functional interrupt
handling across a long mode transition is to install an interrupt vector
table and hope that CPUs actually do something intelligent when
receiving an interrupt with LME=1, LMA=1, and PG=0. Yuck.
Could we get away with issuing 32-bit EFI calls in compat mode, i.e.
with a 32-bit CPL0 CS but while still in long mode? I think that
delivery of an IST interrupt (which includes both NMI and MCE) will
correctly switch to a fully valid 64-bit state and would correctly
switch back when we execute IRET at the end. (Am I missing some reason
that switching bitness without a privilege level change doesn't work
well? I haven't thought of anything, other than the lack of SS/SP controls
on intra-ring interrupts, but that shouldn't be an issue here.)
As an added benefit, this would considerably simplify the code.
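For reference, the difference between "long mode disabled" and "compat mode" can be sketched in C. This is only an illustration, assuming the architectural bit positions of EFER.LME (bit 8) and EFER.LMA (bit 10) and the L bit of the descriptor loaded in CS. classify_mode() and the enum names are hypothetical, not kernel symbols.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_LME (1ULL << 8)   /* long mode enable (set by software) */
#define EFER_LMA (1ULL << 10)  /* long mode active (status bit) */

enum cpu_mode { MODE_LEGACY, MODE_COMPAT, MODE_64BIT };

/*
 * cs_l is the L bit of the code segment descriptor currently loaded
 * in CS. It only matters while long mode is active.
 */
static enum cpu_mode classify_mode(uint64_t efer, bool cs_l)
{
	if (!(efer & EFER_LMA))
		return MODE_LEGACY;	/* what mixed mode calls do today */
	return cs_l ? MODE_64BIT : MODE_COMPAT;
}
```

With LMA still set and a 32-bit CS loaded, this returns MODE_COMPAT, which is the state proposed above for making the firmware call.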
--Andy
* Re: EFI mixed mode + perf = rampant triple faults
2014-12-17 16:51 EFI mixed mode + perf = rampant triple faults Andy Lutomirski
@ 2014-12-17 16:54 ` Andy Lutomirski
2014-12-31 18:37 ` Matt Fleming
0 siblings, 1 reply; 12+ messages in thread
From: Andy Lutomirski @ 2014-12-17 16:54 UTC (permalink / raw)
To: LKML, linux-efi@vger.kernel.org; +Cc: Borislav Petkov
[trying again with .org spelled correctly. also cc: bpetkov]
On Wed, Dec 17, 2014 at 8:51 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> I figured I should send this email before I forget about this issue:
>
> If you run perf record across any EFI mixed mode call or otherwise
> receive an NMI or MCE, the machine triple-faults. The cause is
> straightforward: there is no valid IDT when we have long mode disabled
> for the duration of the EFI call.
>
> As far as I know, the only way to have continuously functional interrupt
> handling across a long mode transition is to install an interrupt vector
> table and hope that CPUs actually do something intelligent when
> receiving an interrupt with LME=1, LMA=1, and PG=0. Yuck.
>
> Could we get away with issuing 32-bit EFI calls in compat mode, i.e.
> with a 32-bit CPL0 CS but while still in long mode? I think that
> delivery of an IST interrupt (which includes both NMI and MCE) will
> correctly switch to a fully valid 64-bit state and would correctly
> switch back when we execute IRET at the end. (Am I missing some reason
> that switching bitness without a privilege level change doesn't work
> well? I haven't thought of anything, other than the lack of SS/SP controls
> on intra-ring interrupts, but that shouldn't be an issue here.)
>
> As an added benefit, this would considerably simplify the code.
>
> --Andy
* Re: EFI mixed mode + perf = rampant triple faults
2014-12-17 16:54 ` Andy Lutomirski
@ 2014-12-31 18:37 ` Matt Fleming
2015-01-14 16:51 ` Matt Fleming
0 siblings, 1 reply; 12+ messages in thread
From: Matt Fleming @ 2014-12-31 18:37 UTC (permalink / raw)
To: Andy Lutomirski
Cc: LKML, linux-efi@vger.kernel.org, Borislav Petkov, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, Peter Zijlstra
On Wed, 17 Dec, at 08:54:56AM, Andy Lutomirski wrote:
> [trying again with .org spelled correctly. also cc: bpetkov]
>
> On Wed, Dec 17, 2014 at 8:51 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> > I figured I should send this email before I forget about this issue:
> >
> > If you run perf record across any EFI mixed mode call or otherwise
> > receive an NMI or MCE, the machine triple-faults. The cause is
> > straightforward: there is no valid IDT when we have long mode disabled
> > for the duration of the EFI call.
Right, the lack of IDT is intentional since we disable interrupts while
making the EFI call and so far I have side-stepped (ignored) the NMI/MCE
issue.
Perf is an interesting use case. I've admittedly never used it with EFI
mixed mode, but yes, we should definitely get that working (if NMI/MCE
handling wasn't justification enough).
> > As far as I know, the only way to have continuously functional interrupt
> > handling across a long mode transition is to install an interrupt vector
> > table and hope that CPUs actually do something intelligent when
> > receiving an interrupt with LME=1, LMA=1, and PG=0. Yuck.
> >
> > Could we get away with issuing 32-bit EFI calls in compat mode, i.e.
> > with a 32-bit CPL0 CS but while still in long mode? I think that
> > delivery of an IST interrupt (which includes both NMI and MCE) will
> > correctly switch to a fully valid 64-bit state and would correctly
> > switch back when we execute IRET at the end. (Am I missing some reason
> > that switching bitness without a privilege level change doesn't work
> > well? I haven't thought of anything, other than the lack of SS/SP controls
> > on intra-ring interrupts, but that shouldn't be an issue here.)
> >
> > As an added benefit, this would considerably simplify the code.
I can't immediately think of a reason that this wouldn't work, but I've
Cc'd more x86 folks for additional insight.
I will schedule some time to look into this issue in the new year.
Thanks Andy.
--
Matt Fleming, Intel Open Source Technology Center
* Re: EFI mixed mode + perf = rampant triple faults
2014-12-31 18:37 ` Matt Fleming
@ 2015-01-14 16:51 ` Matt Fleming
2015-01-14 18:27 ` Andy Lutomirski
0 siblings, 1 reply; 12+ messages in thread
From: Matt Fleming @ 2015-01-14 16:51 UTC (permalink / raw)
To: Andy Lutomirski
Cc: LKML, linux-efi@vger.kernel.org, Borislav Petkov, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, Peter Zijlstra
On Wed, 31 Dec, at 06:37:39PM, Matt Fleming wrote:
> On Wed, 17 Dec, at 08:54:56AM, Andy Lutomirski wrote:
> > > As far as I know, the only way to have continuously functional interrupt
> > > handling across a long mode transition is to install an interrupt vector
> > > table and hope that CPUs actually do something intelligent when
> > > receiving an interrupt with LME=1, LMA=1, and PG=0. Yuck.
> > >
> > > Could we get away with issuing 32-bit EFI calls in compat mode, i.e.
> > > with a 32-bit CPL0 CS but while still in long mode? I think that
> > > delivery of an IST interrupt (which includes both NMI and MCE) will
> > > correctly switch to a fully valid 64-bit state and would correctly
> > > switch back when we execute IRET at the end. (Am I missing some reason
> > > that switching bitness without a privilege level change doesn't work
> > > well? I haven't thought of anything, other than the lack of SS/SP controls
> > > on intra-ring interrupts, but that shouldn't be an issue here.)
> > >
> > > As an added benefit, this would considerably simplify the code.
>
> I can't immediately think of a reason that this wouldn't work, but I've
> Cc'd more x86 folks for additional insight.
>
> I will schedule some time to look into this issue in the new year.
> Thanks Andy.
I finally got some time to look into this, and running with
__KERNEL32_CS seems to work fine at runtime both with Qemu + 32-bit OVMF
and on my ASUS T100. Manually triggering an MCE exception immediately
before invoking the firmware service recovers gracefully.
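What distinguishes __KERNEL32_CS from __KERNEL_CS is a pair of descriptor bits. The following C sketch decodes them. The two raw descriptor values are an assumption (flat ring-0 code segments as I believe Linux defines them on x86_64) and should be checked against the kernel source; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_EXEC (1ULL << 43)  /* type bit 3: executable (code) segment */
#define DESC_S    (1ULL << 44)  /* 1 = code/data, 0 = system descriptor */
#define DESC_L    (1ULL << 53)  /* 64-bit code segment */
#define DESC_DB   (1ULL << 54)  /* default operand size, 1 = 32-bit */

/* Assumed raw descriptors, not taken from this thread. */
#define DESC_KERNEL32_CS 0x00cf9b000000ffffULL
#define DESC_KERNEL_CS   0x00af9b000000ffffULL

static bool desc_is_code(uint64_t d)
{
	return (d & DESC_S) && (d & DESC_EXEC);
}

/* L=1, D=0: executes as 64-bit code while long mode is active. */
static bool desc_is_64bit_code(uint64_t d)
{
	return desc_is_code(d) && (d & DESC_L) && !(d & DESC_DB);
}

/* L=0, D=1: 32-bit code, i.e. compat mode while long mode is active. */
static bool desc_is_32bit_code(uint64_t d)
{
	return desc_is_code(d) && !(d & DESC_L) && (d & DESC_DB);
}
```

Loading a selector whose descriptor satisfies desc_is_32bit_code() into CS, without touching EFER or CR0, is all that entering compat mode requires.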
Where this won't work so well is at boot time before we jump to the
kernel proper. There, we still need to restore the firmware's GDT so
that interrupts are serviced correctly before ExitBootServices() (in
particular, ia32 Tianocore assumes __KERNEL_CS is a 32-bit CS).
Which means the code to handle mixed mode calls at boot time and runtime
has now diverged. Fixing that is probably just a SMOP to maximise code
reuse though.
I'll post a patch after some more testing.
--
Matt Fleming, Intel Open Source Technology Center
* Re: EFI mixed mode + perf = rampant triple faults
2015-01-14 16:51 ` Matt Fleming
@ 2015-01-14 18:27 ` Andy Lutomirski
2015-01-14 18:35 ` Borislav Petkov
2015-01-15 19:41 ` Matt Fleming
0 siblings, 2 replies; 12+ messages in thread
From: Andy Lutomirski @ 2015-01-14 18:27 UTC (permalink / raw)
To: Matt Fleming
Cc: LKML, linux-efi@vger.kernel.org, Borislav Petkov, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, Peter Zijlstra
On Wed, Jan 14, 2015 at 8:51 AM, Matt Fleming <matt@console-pimps.org> wrote:
> On Wed, 31 Dec, at 06:37:39PM, Matt Fleming wrote:
>> On Wed, 17 Dec, at 08:54:56AM, Andy Lutomirski wrote:
>> > > As far as I know, the only way to have continuously functional interrupt
>> > > handling across a long mode transition is to install an interrupt vector
>> > > table and hope that CPUs actually do something intelligent when
>> > > receiving an interrupt with LME=1, LMA=1, and PG=0. Yuck.
>> > >
>> > > Could we get away with issuing 32-bit EFI calls in compat mode, i.e.
>> > > with a 32-bit CPL0 CS but while still in long mode? I think that
>> > > delivery of an IST interrupt (which includes both NMI and MCE) will
>> > > correctly switch to a fully valid 64-bit state and would correctly
>> > > switch back when we execute IRET at the end. (Am I missing some reason
>> > > that switching bitness without a privilege level change doesn't work
>> > > well? I haven't thought of anything, other than the lack of SS/SP controls
>> > > on intra-ring interrupts, but that shouldn't be an issue here.)
>> > >
>> > > As an added benefit, this would considerably simplify the code.
>>
>> I can't immediately think of a reason that this wouldn't work, but I've
>> Cc'd more x86 folks for additional insight.
>>
>> I will schedule some time to look into this issue in the new year.
>> Thanks Andy.
>
> I finally got some time to look into this, and running with
> __KERNEL32_CS seems to work fine at runtime both with Qemu + 32-bit OVMF
> and on my ASUS T100. Manually triggering an MCE exception immediately
> before invoking the firmware service recovers gracefully.
How are you manually triggering an MCE? I've been playing with some
MCE stuff recently, but the only reasonably reliable way I know of to
trigger an MCE is using WHEA, and I don't have a box with WHEA, and I
assume your ASUS T100 doesn't either.
>
> Where this won't work so well is at boot time before we jump to the
> kernel proper. There, we still need to restore the firmware's GDT so
> that interrupts are serviced correctly before ExitBootServices() (in
> particular, ia32 Tianocore assumes __KERNEL_CS is a 32-bit CS).
Tianocore makes assumptions about the kernel's GDT layout? Yuck.
--Andy
* Re: EFI mixed mode + perf = rampant triple faults
2015-01-14 18:27 ` Andy Lutomirski
@ 2015-01-14 18:35 ` Borislav Petkov
2015-01-14 18:38 ` Andy Lutomirski
2015-01-15 19:41 ` Matt Fleming
1 sibling, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2015-01-14 18:35 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Matt Fleming, LKML, linux-efi@vger.kernel.org, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, Peter Zijlstra
On Wed, Jan 14, 2015 at 10:27:47AM -0800, Andy Lutomirski wrote:
> How are you manually triggering an MCE? I've been playing with some
> MCE stuff recently, but the only reasonably reliable way I know of to
> trigger an MCE is using WHEA, and I don't have a box with WHEA, and I
> assume your ASUS T100 doesn't either.
asm volatile("int $18");
> Tianocore makes assumptions about the kernel's GDT layout? Yuck.
Yuck indeed.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
* Re: EFI mixed mode + perf = rampant triple faults
2015-01-14 18:35 ` Borislav Petkov
@ 2015-01-14 18:38 ` Andy Lutomirski
2015-01-14 18:47 ` Borislav Petkov
0 siblings, 1 reply; 12+ messages in thread
From: Andy Lutomirski @ 2015-01-14 18:38 UTC (permalink / raw)
To: Borislav Petkov
Cc: Matt Fleming, LKML, linux-efi@vger.kernel.org, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, Peter Zijlstra
On Wed, Jan 14, 2015 at 10:35 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Wed, Jan 14, 2015 at 10:27:47AM -0800, Andy Lutomirski wrote:
>> How are you manually triggering an MCE? I've been playing with some
>> MCE stuff recently, but the only reasonably reliable way I know of to
>> trigger an MCE is using WHEA, and I don't have a box with WHEA, and I
>> assume your ASUS T100 doesn't either.
>
> asm volatile("int $18");
That's not a real MCE, though -- it happens synchronously instead of
at MCE priority with all the associated messiness. Or is the idea to
just stick that in after switching to the 32-bit mode being tested?
--Andy
>
>> Tianocore makes assumptions about the kernel's GDT layout? Yuck.
>
> Yuck indeed.
>
> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
--
Andy Lutomirski
AMA Capital Management, LLC
* Re: EFI mixed mode + perf = rampant triple faults
2015-01-14 18:38 ` Andy Lutomirski
@ 2015-01-14 18:47 ` Borislav Petkov
2015-01-14 18:49 ` Andy Lutomirski
0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2015-01-14 18:47 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Matt Fleming, LKML, linux-efi@vger.kernel.org, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, Peter Zijlstra
On Wed, Jan 14, 2015 at 10:38:25AM -0800, Andy Lutomirski wrote:
> That's not a real MCE, though -- it happens synchronously instead of
MCE can be synchronous in a sense too, as a result of executing an insn,
for example, i.e., EIPV bit set.
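EIPV here refers to a flag in the MCG_STATUS register. A small C sketch of decoding it follows, assuming the architectural bit layout (RIPV bit 0, EIPV bit 1, MCIP bit 2); the helper names are hypothetical and this is not the kernel's MCE code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (1ULL << 0)  /* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1)  /* error IP valid */
#define MCG_STATUS_MCIP (1ULL << 2)  /* machine check in progress */

/* True when the saved IP is directly associated with the error. */
static bool mce_ip_tied_to_error(uint64_t mcg_status)
{
	return (mcg_status & MCG_STATUS_EIPV) != 0;
}

/* True when execution can be restarted at the saved IP. */
static bool mce_can_restart(uint64_t mcg_status)
{
	return (mcg_status & MCG_STATUS_RIPV) != 0;
}
```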
> at MCE priority with all the associated messiness. Or is the idea to
> just stick that in after switching to the 32-bit mode being tested?
Yes, the idea was to simply trigger a real exception after switching to
32-bit while still with 64-bit IDT handlers.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
* Re: EFI mixed mode + perf = rampant triple faults
2015-01-14 18:47 ` Borislav Petkov
@ 2015-01-14 18:49 ` Andy Lutomirski
0 siblings, 0 replies; 12+ messages in thread
From: Andy Lutomirski @ 2015-01-14 18:49 UTC (permalink / raw)
To: Borislav Petkov
Cc: Matt Fleming, LKML, linux-efi@vger.kernel.org, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, Peter Zijlstra
On Wed, Jan 14, 2015 at 10:47 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Wed, Jan 14, 2015 at 10:38:25AM -0800, Andy Lutomirski wrote:
>> That's not a real MCE, though -- it happens synchronously instead of
>
> MCE can be synchronous in a sense too, as a result of executing an insn,
> for example, i.e., EIPV bit set.
>
>> at MCE priority with all the associated messiness. Or is the idea to
>> just stick that in after switching to the 32-bit mode being tested?
>
> Yes, the idea was to simply trigger a real exception after switching to
> 32-bit while still with 64-bit IDT handlers.
>
Seems like a reasonable test to me.
--Andy
* Re: EFI mixed mode + perf = rampant triple faults
2015-01-14 18:27 ` Andy Lutomirski
2015-01-14 18:35 ` Borislav Petkov
@ 2015-01-15 19:41 ` Matt Fleming
2015-01-15 19:59 ` H. Peter Anvin
1 sibling, 1 reply; 12+ messages in thread
From: Matt Fleming @ 2015-01-15 19:41 UTC (permalink / raw)
To: Andy Lutomirski
Cc: LKML, linux-efi@vger.kernel.org, Borislav Petkov, H. Peter Anvin,
Thomas Gleixner, Ingo Molnar, Peter Zijlstra
On Wed, 14 Jan, at 10:27:47AM, Andy Lutomirski wrote:
>
> How are you manually triggering an MCE? I've been playing with some
> MCE stuff recently, but the only reasonably reliable way I know of to
> trigger an MCE is using WHEA, and I don't have a box with WHEA, and I
> assume your ASUS T100 doesn't either.
As Borislav mentions, I used 'int $18', solely to trigger the 64-bit
exception handler code paths in the middle of the EFI mixed mode code.
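The reason the 64-bit handler can run from compat mode is that a long-mode IDT gate carries its own code selector and IST index. A rough C sketch of the 16-byte gate layout follows, assuming the architectural format. The struct and function names are hypothetical, and the IST index used in any call is an arbitrary example rather than the kernel's actual assignment.

```c
#include <stdint.h>

/* 16-byte long-mode IDT gate; field names are illustrative. */
struct idt_gate64 {
	uint16_t offset_low;
	uint16_t selector;	/* CS loaded on delivery */
	uint8_t  ist;		/* bits 0-2: IST index, 0 = no IST switch */
	uint8_t  type_attr;	/* P, DPL, gate type */
	uint16_t offset_mid;
	uint32_t offset_high;
	uint32_t reserved;
};

static struct idt_gate64 make_interrupt_gate(uint64_t handler, uint16_t cs,
					     unsigned int ist)
{
	struct idt_gate64 g = {
		.offset_low  = (uint16_t)(handler & 0xffff),
		.selector    = cs,
		.ist         = (uint8_t)(ist & 0x7),
		.type_attr   = 0x8e,	/* present, DPL 0, interrupt gate */
		.offset_mid  = (uint16_t)((handler >> 16) & 0xffff),
		.offset_high = (uint32_t)(handler >> 32),
		.reserved    = 0,
	};
	return g;
}

/* Reassemble the 64-bit handler address stored in the gate. */
static uint64_t gate_handler(const struct idt_gate64 *g)
{
	return (uint64_t)g->offset_low |
	       ((uint64_t)g->offset_mid << 16) |
	       ((uint64_t)g->offset_high << 32);
}
```

Because the gate names a 64-bit CS and, for NMI and MCE, a nonzero IST index, delivery does not depend on the bitness of the interrupted code as long as long mode stays active.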
> > Where this won't work so well is at boot time before we jump to the
> > kernel proper. There, we still need to restore the firmware's GDT so
> > that interrupts are serviced correctly before ExitBootServices() (in
> > particular, ia32 Tianocore assumes __KERNEL_CS is a 32-bit CS).
>
> Tianocore makes assumptions about the kernel's GDT layout? Yuck.
No, but 32-bit Tianocore does rely on the second GDT entry being a
32-bit CS.
It has no knowledge of Linux's GDT layout.
--
Matt Fleming, Intel Open Source Technology Center
* Re: EFI mixed mode + perf = rampant triple faults
2015-01-15 19:41 ` Matt Fleming
@ 2015-01-15 19:59 ` H. Peter Anvin
2015-01-15 22:21 ` Matt Fleming
0 siblings, 1 reply; 12+ messages in thread
From: H. Peter Anvin @ 2015-01-15 19:59 UTC (permalink / raw)
To: Matt Fleming, Andy Lutomirski
Cc: LKML, linux-efi@vger.kernel.org, Borislav Petkov, Thomas Gleixner,
Ingo Molnar, Peter Zijlstra
On 01/15/2015 11:41 AM, Matt Fleming wrote:
>>
>> Tianocore makes assumptions about the kernel's GDT layout? Yuck.
>
> No, but 32-bit Tianocore does rely on the second GDT entry being a
> 32-bit CS.
>
> It has no knowledge of Linux's GDT layout.
>
If it assumes that descriptor 16 is a 32-bit CS (and what about data?
24?) that *is* making assumptions on the kernel.
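To spell out the numbers, reading 16 and 24 as selector values (byte offsets into the GDT, which is my assumption about what is meant), a selector decodes as in the C sketch below. The struct and function names are hypothetical. Under what I believe is the x86_64 Linux layout, selectors 8, 16 and 24 are __KERNEL32_CS, __KERNEL_CS and __KERNEL_DS.

```c
#include <stdint.h>

struct selector_fields {
	unsigned int index;	/* descriptor index within the table */
	unsigned int ti;	/* table indicator: 0 = GDT, 1 = LDT */
	unsigned int rpl;	/* requested privilege level */
};

static struct selector_fields decode_selector(uint16_t sel)
{
	struct selector_fields f = {
		.index = sel >> 3,
		.ti    = (sel >> 2) & 1,
		.rpl   = sel & 3,
	};
	return f;
}
```

So selector 16 names GDT index 2 and selector 24 names GDT index 3, each at RPL 0.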
-hpa
* Re: EFI mixed mode + perf = rampant triple faults
2015-01-15 19:59 ` H. Peter Anvin
@ 2015-01-15 22:21 ` Matt Fleming
0 siblings, 0 replies; 12+ messages in thread
From: Matt Fleming @ 2015-01-15 22:21 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Andy Lutomirski, LKML, linux-efi@vger.kernel.org, Borislav Petkov,
Thomas Gleixner, Ingo Molnar, Peter Zijlstra
On Thu, 15 Jan, at 11:59:42AM, H. Peter Anvin wrote:
> On 01/15/2015 11:41 AM, Matt Fleming wrote:
> >>
> >>Tianocore makes assumptions about the kernel's GDT layout? Yuck.
> >
> >No, but 32-bit Tianocore does rely on the second GDT entry being a
> >32-bit CS.
> >
> >It has no knowledge of Linux's GDT layout.
> >
>
> If it assumes that descriptor 16 is a 32-bit CS (and what about
> data? 24?) that *is* making assumptions on the kernel.
Bear in mind that this is before ExitBootServices() is invoked, i.e. while
the firmware still thinks it (not the OS) "owns" the platform.
None of this comes into play at Runtime.
--
Matt Fleming, Intel Open Source Technology Center