public inbox for linux-kernel@vger.kernel.org
* Is espfix64's double-fault thing OK on Xen?
@ 2014-07-09 23:17 Andy Lutomirski
  2014-07-14 16:58 ` Konrad Rzeszutek Wilk
  2014-07-14 17:04 ` H. Peter Anvin
  0 siblings, 2 replies; 11+ messages in thread
From: Andy Lutomirski @ 2014-07-09 23:17 UTC (permalink / raw)
  To: H. Peter Anvin, Konrad Rzeszutek Wilk,
	linux-kernel@vger.kernel.org

This part in __do_double_fault looks fishy:

    cmpl $__KERNEL_CS,CS(%rdi)
    jne do_double_fault

Shouldn't that be:

    test $3,CS(%rdi)
    jnz do_double_fault
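
For readers following along: the two checks accept different things. The cmpl takes the espfix path only for an exact __KERNEL_CS match, while test $3 looks only at the low two bits of the saved selector (the RPL). A quick sketch of the difference (0x10 is the usual native value of __KERNEL_CS, used here purely for illustration):

```python
KERNEL_CS = 0x10  # usual native __KERNEL_CS (GDT index 2, TI=0, RPL=0), illustrative

def decompose(sel):
    """Split an x86 segment selector into (index, table indicator, RPL)."""
    return sel >> 3, (sel >> 2) & 1, sel & 3

def check_cmpl(saved_cs):
    # the current check: espfix only for exactly __KERNEL_CS
    return saved_cs == KERNEL_CS

def check_rpl(saved_cs):
    # the proposed check: espfix for any RPL-0 (kernel-privilege) selector
    return (saved_cs & 3) == 0

print(decompose(KERNEL_CS))               # (2, 0, 0)
print(check_cmpl(0x10), check_rpl(0x10))  # True True
print(check_cmpl(0x33), check_rpl(0x33))  # False False (user CS, both skip espfix)
```

On bare metal the kernel only ever runs on __KERNEL_CS, so the two agree; they can only diverge if some other RPL-0 selector turns up, which is what the Xen discussion in this thread turns on.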

--Andy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Is espfix64's double-fault thing OK on Xen?
  2014-07-09 23:17 Is espfix64's double-fault thing OK on Xen? Andy Lutomirski
@ 2014-07-14 16:58 ` Konrad Rzeszutek Wilk
  2014-07-14 17:04 ` H. Peter Anvin
  1 sibling, 0 replies; 11+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-07-14 16:58 UTC (permalink / raw)
  To: Andy Lutomirski, david.vrabel
  Cc: H. Peter Anvin, linux-kernel@vger.kernel.org

On Wed, Jul 09, 2014 at 04:17:57PM -0700, Andy Lutomirski wrote:
> This part in __do_double_fault looks fishy:
> 
>     cmpl $__KERNEL_CS,CS(%rdi)
>     jne do_double_fault
> 
> Shouldn't that be:
> 
>     test $3,CS(%rdi)
>     jnz do_double_fault
> 

Let me rope in David, who was playing with that recently.

> --Andy


* Re: Is espfix64's double-fault thing OK on Xen?
  2014-07-09 23:17 Is espfix64's double-fault thing OK on Xen? Andy Lutomirski
  2014-07-14 16:58 ` Konrad Rzeszutek Wilk
@ 2014-07-14 17:04 ` H. Peter Anvin
  2014-07-14 17:11   ` Andy Lutomirski
  1 sibling, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2014-07-14 17:04 UTC (permalink / raw)
  To: Andy Lutomirski, Konrad Rzeszutek Wilk,
	linux-kernel@vger.kernel.org

On 07/09/2014 04:17 PM, Andy Lutomirski wrote:
> This part in __do_double_fault looks fishy:
> 
>     cmpl $__KERNEL_CS,CS(%rdi)
>     jne do_double_fault
> 
> Shouldn't that be:
> 
>     test $3,CS(%rdi)
>     jnz do_double_fault
> 

No, it should be fine.  The *only* case where we need to do the espfix
magic is when we are on __KERNEL_CS.

	-hpa




* Re: Is espfix64's double-fault thing OK on Xen?
  2014-07-14 17:04 ` H. Peter Anvin
@ 2014-07-14 17:11   ` Andy Lutomirski
  2014-07-14 17:15     ` Andy Lutomirski
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2014-07-14 17:11 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

On Mon, Jul 14, 2014 at 10:04 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 07/09/2014 04:17 PM, Andy Lutomirski wrote:
>> This part in __do_double_fault looks fishy:
>>
>>     cmpl $__KERNEL_CS,CS(%rdi)
>>     jne do_double_fault
>>
>> Shouldn't that be:
>>
>>     test $3,CS(%rdi)
>>     jnz do_double_fault
>>
>
> No, it should be fine.  The *only* case where we need to do the espfix
> magic is when we are on __KERNEL_CS.
>

IIRC Xen has a somewhat different GDT, and at least the userspace CS
in IA32_STAR disagrees with normal Linux.  If the kernel CS is also
strange, then there will be an extra possible CS value here.

--Andy


* Re: Is espfix64's double-fault thing OK on Xen?
  2014-07-14 17:11   ` Andy Lutomirski
@ 2014-07-14 17:15     ` Andy Lutomirski
  2014-07-14 21:31       ` Andy Lutomirski
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2014-07-14 17:15 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

On Mon, Jul 14, 2014 at 10:11 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Jul 14, 2014 at 10:04 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 07/09/2014 04:17 PM, Andy Lutomirski wrote:
>>> This part in __do_double_fault looks fishy:
>>>
>>>     cmpl $__KERNEL_CS,CS(%rdi)
>>>     jne do_double_fault
>>>
>>> Shouldn't that be:
>>>
>>>     test $3,CS(%rdi)
>>>     jnz do_double_fault
>>>
>>
>> No, it should be fine.  The *only* case where we need to do the espfix
>> magic is when we are on __KERNEL_CS.
>>
>
> IIRC Xen has a somewhat different GDT, and at least the userspace CS
> in IA32_STAR disagrees with normal Linux.  If the kernel CS is also
> strange, then there will be an extra possible CS value here.

There's FLAT_KERNEL_CS64, which is not equal to __KERNEL_CS.  If the
espfix mechanism gets invoked with that CS, I expect it to misbehave.

That being said, FLAT_KERNEL_CS64 is CPL3, so my code might not be any better.
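
Concretely, plugging in the selector values that appear later in this thread (0xe030 as Xen's fixed-up kernel CS and 0xe033 as the RPL-3 flat selector the guest kernel actually runs on; both taken from the oops output below and used here purely for illustration), the two candidate checks disagree only on the fixed-up value:

```python
KERNEL_CS = 0x10       # usual native __KERNEL_CS, illustrative
XEN_CS_FIXED = 0xe030  # fixed-up Xen "kernel" CS as reported in the oops, RPL 0
XEN_CS_FLAT = 0xe033   # RPL-3 flat selector the PV guest kernel runs on

def check_cmpl(saved_cs):
    return saved_cs == KERNEL_CS  # the cmpl $__KERNEL_CS check

def check_rpl(saved_cs):
    return (saved_cs & 3) == 0    # the test $3 check

# The fixed-up selector is where the two checks disagree: it is not
# __KERNEL_CS, but its RPL is 0.
print(check_cmpl(XEN_CS_FIXED), check_rpl(XEN_CS_FIXED))  # False True
# The RPL-3 flat selector is missed by both checks, which is the
# "my code might not be any better" point above.
print(check_cmpl(XEN_CS_FLAT), check_rpl(XEN_CS_FLAT))    # False False
```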

--Andy


* Re: Is espfix64's double-fault thing OK on Xen?
  2014-07-14 17:15     ` Andy Lutomirski
@ 2014-07-14 21:31       ` Andy Lutomirski
  2014-07-14 21:35         ` Andy Lutomirski
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2014-07-14 21:31 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

I'm now rather confused.

On Xen 64-bit, AFAICS, syscall handlers run with CS = 0xe033.  I think
that Xen is somehow fixing up traps that came from "kernel" mode to
show CS = 0xe030, an impossible selector value (unless that segment
is conforming), in order to keep user_mode_vm happy.

I'm running this test:

https://gitorious.org/linux-test-utils/linux-clock-tests/source/1e13516a41416a7282f43c83097c9dfe4619344b:sigreturn.c

It requires a kernel with my SS sigcontext change; otherwise it
doesn't do anything.

Without Xen, it works reliably.  On Xen, it seems to OOPS some
fraction of the time.  It gets a null pointer dereference here:

    movq %rax,(0*8)(%rdi)    /* RAX */

It looks like:

[    0.565752] BUG: unable to handle kernel NULL pointer dereference
at           (null)
[    0.566706] IP: [<ffffffff81775493>] irq_return_ldt+0x11/0x5c
[    0.566706] PGD 4eb40067 PUD 4eb38067 PMD 0
[    0.566706] Oops: 0002 [#1] SMP
[    0.566706] Modules linked in:
[    0.566706] CPU: 1 PID: 81 Comm: sigreturn Not tainted 3.16.0-rc4+ #47
[    0.566706] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.566706] task: ffff88004e8aa180 ti: ffff88004eb68000 task.ti:
ffff88004eb68000
[    0.566706] RIP: e030:[<ffffffff81775493>]  [<ffffffff81775493>]
irq_return_ldt+0x11/0x5c
[    0.566706] RSP: e02b:ffff88004eb6bfc8  EFLAGS: 00010002
[    0.566706] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
[    0.566706] RDX: 000000000000000a RSI: 0000000000000051 RDI: 0000000000000000
[    0.566706] RBP: 00000000006d3018 R08: 0000000000000000 R09: 0000000000000000
[    0.566706] R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000000000
[    0.566706] R13: 0000000000000001 R14: 000000000040eec0 R15: 0000000000000000
[    0.566706] FS:  0000000000000000(0063) GS:ffff880056300000(0000)
knlGS:0000000000000000
[    0.566706] CS:  e033 DS: 000f ES: 000f CR0: 0000000080050033
[    0.566706] CR2: 0000000000000000 CR3: 000000004eb3c000 CR4: 0000000000042660
[    0.566706] Stack:
[    0.566706]  0000000000000051 0000000000000000 0000000000000000
0000000000000007
[    0.566706]  0000000000000202 8badf00d5aad0000 000000000000000f
[    0.566706] Call Trace:
[    0.566706] Code: 44 24 20 04 75 14 e9 9d 5a 89 ff 90 66 66 66 2e
0f 1f 84 00 00 00 00 00 48 cf 50 57 66 66 90 66 66 90 65 48 8b 3c 25
00 b0 00 00 <48> 89 07 48 8b 44 24 10 48 89 47 08 48 8b 44 24 18 48 89
47 10
[    0.566706] RIP  [<ffffffff81775493>] irq_return_ldt+0x11/0x5c
[    0.566706]  RSP <ffff88004eb6bfc8>
[    0.566706] CR2: 0000000000000000
[    0.566706] ---[ end trace a62b7f28ce379a48 ]---

When it doesn't OOPS, it segfaults.  I don't know why.  I suspect that
Xen has a bug in modify_ldt, sigreturn, or iret when returning to a CS
that lives in the LDT.


--Andy


* Re: Is espfix64's double-fault thing OK on Xen?
  2014-07-14 21:31       ` Andy Lutomirski
@ 2014-07-14 21:35         ` Andy Lutomirski
  2014-07-14 22:23           ` H. Peter Anvin
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2014-07-14 21:35 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

On Mon, Jul 14, 2014 at 2:31 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> I'm now rather confused.
>
> On Xen 64-bit, AFAICS, syscall handlers run with CS = 0xe033.  I think
> that Xen is somehow fixing up traps that came from "kernel" mode to
> show CS = 0xe030, an impossible selector value (unless that segment
> is conforming), in order to keep user_mode_vm happy.
>
> I'm running this test:
>
> https://gitorious.org/linux-test-utils/linux-clock-tests/source/1e13516a41416a7282f43c83097c9dfe4619344b:sigreturn.c
>
> It requires a kernel with my SS sigcontext change; otherwise it
> doesn't do anything.
>
> Without Xen, it works reliably.  On Xen, it seems to OOPS some
> fraction of the time.  It gets a null pointer dereference here:
>
>     movq %rax,(0*8)(%rdi)    /* RAX */
>
> It looks like:
>
> [    0.565752] BUG: unable to handle kernel NULL pointer dereference
> at           (null)
> [    0.566706] IP: [<ffffffff81775493>] irq_return_ldt+0x11/0x5c
> [    0.566706] PGD 4eb40067 PUD 4eb38067 PMD 0
> [    0.566706] Oops: 0002 [#1] SMP
> [    0.566706] Modules linked in:
> [    0.566706] CPU: 1 PID: 81 Comm: sigreturn Not tainted 3.16.0-rc4+ #47
> [    0.566706] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [    0.566706] task: ffff88004e8aa180 ti: ffff88004eb68000 task.ti:
> ffff88004eb68000
> [    0.566706] RIP: e030:[<ffffffff81775493>]  [<ffffffff81775493>]
> irq_return_ldt+0x11/0x5c
> [    0.566706] RSP: e02b:ffff88004eb6bfc8  EFLAGS: 00010002
> [    0.566706] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
> [    0.566706] RDX: 000000000000000a RSI: 0000000000000051 RDI: 0000000000000000
> [    0.566706] RBP: 00000000006d3018 R08: 0000000000000000 R09: 0000000000000000
> [    0.566706] R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000000000
> [    0.566706] R13: 0000000000000001 R14: 000000000040eec0 R15: 0000000000000000
> [    0.566706] FS:  0000000000000000(0063) GS:ffff880056300000(0000)
> knlGS:0000000000000000
> [    0.566706] CS:  e033 DS: 000f ES: 000f CR0: 0000000080050033
> [    0.566706] CR2: 0000000000000000 CR3: 000000004eb3c000 CR4: 0000000000042660
> [    0.566706] Stack:
> [    0.566706]  0000000000000051 0000000000000000 0000000000000000
> 0000000000000007
> [    0.566706]  0000000000000202 8badf00d5aad0000 000000000000000f
> [    0.566706] Call Trace:
> [    0.566706] Code: 44 24 20 04 75 14 e9 9d 5a 89 ff 90 66 66 66 2e
> 0f 1f 84 00 00 00 00 00 48 cf 50 57 66 66 90 66 66 90 65 48 8b 3c 25
> 00 b0 00 00 <48> 89 07 48 8b 44 24 10 48 89 47 08 48 8b 44 24 18 48 89
> 47 10
> [    0.566706] RIP  [<ffffffff81775493>] irq_return_ldt+0x11/0x5c
> [    0.566706]  RSP <ffff88004eb6bfc8>
> [    0.566706] CR2: 0000000000000000
> [    0.566706] ---[ end trace a62b7f28ce379a48 ]---
>
> When it doesn't OOPS, it segfaults.  I don't know why.  I suspect that
> Xen has a bug in modify_ldt, sigreturn, or iret when returning
> to a CS that lives in the LDT.

Presumably the problem is here:

ENTRY(xen_iret)
    pushq $0
1:    jmp hypercall_iret
ENDPATCH(xen_iret)

This seems rather unlikely to work on the espfix stack.

Maybe espfix64 should be disabled when running on Xen and Xen should
implement its own espfix64 in the hypervisor.
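
For context on why routing the return through a hypercall matters: espfix64 exists because an iret to a 16-bit SS restores only the low 16 bits of SP, while bits 16-31 keep whatever was in ESP, leaking address bits of whichever stack the iret ran on. The espfix path therefore runs the final iret off a per-CPU alias page; if the IRET hypercall instead performs the iret from Xen's own stack (as speculated later in the thread), that relocation is bypassed. A toy model of the leak (constants invented for illustration, not the kernel's actual layout):

```python
ESPFIX_ALIAS = 0x7fff0fc0  # invented per-CPU alias address for the iret frame

def leaked_esp(rsp_during_iret, user_sp):
    """SP as user space sees it after an iret to a 16-bit SS: low 16 bits
    come from the saved user SP, bits 16-31 leak from wherever RSP pointed
    during the iret, and bits 32-63 end up clear."""
    return (rsp_during_iret & 0xffff0000) | (user_sp & 0xffff)

# iret straight off a kernel stack leaks kernel-stack address bits:
print(hex(leaked_esp(0xffff88004eb6bfc8, 0x8badf00d5aadc0de)))  # 0x4eb6c0de
# iret off a fixed alias page leaks only constant, harmless bits:
print(hex(leaked_esp(ESPFIX_ALIAS, 0x8badf00d5aadc0de)))        # 0x7fffc0de
```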

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: Is espfix64's double-fault thing OK on Xen?
  2014-07-14 21:35         ` Andy Lutomirski
@ 2014-07-14 22:23           ` H. Peter Anvin
  2014-07-15  2:46             ` Andy Lutomirski
  0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2014-07-14 22:23 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

On 07/14/2014 02:35 PM, Andy Lutomirski wrote:
> Presumably the problem is here:
> 
> ENTRY(xen_iret)
>     pushq $0
> 1:    jmp hypercall_iret
> ENDPATCH(xen_iret)
> 
> This seems rather unlikely to work on the espfix stack.
> 
> Maybe espfix64 should be disabled when running on Xen and Xen should
> implement its own espfix64 in the hypervisor.

Perhaps the first question is: is espfix even necessary on Xen?  How
does the Xen PV IRET handle returning to a 16-bit stack segment?

	-hpa




* Re: Is espfix64's double-fault thing OK on Xen?
  2014-07-14 22:23           ` H. Peter Anvin
@ 2014-07-15  2:46             ` Andy Lutomirski
  2014-07-15  3:20               ` Andy Lutomirski
  2014-07-15  4:14               ` H. Peter Anvin
  0 siblings, 2 replies; 11+ messages in thread
From: Andy Lutomirski @ 2014-07-15  2:46 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

On Mon, Jul 14, 2014 at 3:23 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 07/14/2014 02:35 PM, Andy Lutomirski wrote:
>> Presumably the problem is here:
>>
>> ENTRY(xen_iret)
>>     pushq $0
>> 1:    jmp hypercall_iret
>> ENDPATCH(xen_iret)
>>
>> This seems rather unlikely to work on the espfix stack.
>>
>> Maybe espfix64 should be disabled when running on Xen and Xen should
>> implement its own espfix64 in the hypervisor.
>
> Perhaps the first question is: is espfix even necessary on Xen?  How
> does the Xen PV IRET handle returning to a 16-bit stack segment?
>

Test case here:

https://gitorious.org/linux-test-utils/linux-clock-tests/source/dbfe196a0f6efedc119deb1cdbb0139dbdf609ee:

It's sigreturn_32 and sigreturn_64.  Summary:

(sigreturn_64 always fails unless my SS patch is applied.  Results
below for sigreturn_64 assume the patch is applied.  This is on KVM
(-cpu host) on Sandy Bridge.)

On Xen with espfix, both OOPS intermittently.

On espfix-less kernels (Xen and non-Xen), 16-bit CS w/ 16-bit SS
always fails.  Native (32-bit or 64-bit, according to the binary) CS
with 16-bit SS fails for sigreturn_32, but passes for sigreturn_64.  I
find this somewhat odd.  Native SS always passes.

So I think that Xen makes no difference here, aside from the bug.

That being said, I don't know whether Linux can do espfix64 at all
when Xen is running -- for all I know, the IRET hypercall switches
stacks to a Xen stack.

--Andy


* Re: Is espfix64's double-fault thing OK on Xen?
  2014-07-15  2:46             ` Andy Lutomirski
@ 2014-07-15  3:20               ` Andy Lutomirski
  2014-07-15  4:14               ` H. Peter Anvin
  1 sibling, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2014-07-15  3:20 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

On Mon, Jul 14, 2014 at 7:46 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Jul 14, 2014 at 3:23 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 07/14/2014 02:35 PM, Andy Lutomirski wrote:
>>> Presumably the problem is here:
>>>
>>> ENTRY(xen_iret)
>>>     pushq $0
>>> 1:    jmp hypercall_iret
>>> ENDPATCH(xen_iret)
>>>
>>> This seems rather unlikely to work on the espfix stack.
>>>
>>> Maybe espfix64 should be disabled when running on Xen and Xen should
>>> implement its own espfix64 in the hypervisor.
>>
>> Perhaps the first question is: is espfix even necessary on Xen?  How
>> does the Xen PV IRET handle returning to a 16-bit stack segment?
>>
>
> Test case here:
>
> https://gitorious.org/linux-test-utils/linux-clock-tests/source/dbfe196a0f6efedc119deb1cdbb0139dbdf609ee:
>
> It's sigreturn_32 and sigreturn_64.  Summary:
>
> (sigreturn_64 always fails unless my SS patch is applied.  Results
> below for sigreturn_64 assume the patch is applied.  This is on KVM
> (-cpu host) on Sandy Bridge.)
>
> On Xen with espfix, both OOPS intermittently.
>
> On espfix-less kernels (Xen and non-Xen), 16-bit CS w/ 16-bit SS
> always fails.  Native (32-bit or 64-bit, according to the binary) CS
> with 16-bit SS fails for sigreturn_32, but passes for sigreturn_64.  I
> find this somewhat odd.  Native SS always passes.
>
> So I think that Xen makes no difference here, aside from the bug.
>
> That being said, I don't know whether Linux can do espfix64 at all
> when Xen is running -- for all I know, the IRET hypercall switches
> stacks to a Xen stack.

Microcode is weird.  Without espfix:

[RUN]    64-bit CS (33), 32-bit SS (2b)
    SP: 8badf00d5aadc0de -> 8badf00d5aadc0de
[OK]    all registers okay
[RUN]    32-bit CS (23), 32-bit SS (2b)
    SP: 8badf00d5aadc0de -> 5aadc0de
[OK]    all registers okay
[RUN]    16-bit CS (7), 32-bit SS (2b)
    SP: 8badf00d5aadc0de -> 5aadc0de
[OK]    all registers okay
[RUN]    64-bit CS (33), 16-bit SS (f)
    SP: 8badf00d5aadc0de -> 8badf00d5aadc0de
[OK]    all registers okay
[RUN]    32-bit CS (23), 16-bit SS (f)
    SP: 8badf00d5aadc0de -> 5ae3c0de
[FAIL]    Reg 15 mismatch: requested 0x8badf00d5aadc0de; got 0x5ae3c0de
[RUN]    16-bit CS (7), 16-bit SS (f)
    SP: 8badf00d5aadc0de -> 5ae3c0de
[FAIL]    Reg 15 mismatch: requested 0x8badf00d5aadc0de; got 0x5ae3c0de
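
The failing rows are consistent with the 16-bit-SS iret semantics: only the low 16 bits of the requested SP survive, bits 16-31 come from somewhere else (presumably the stack the iret ran on), and the upper 32 bits are cleared. Picking the reported values apart:

```python
requested = 0x8badf00d5aadc0de
observed = 0x5ae3c0de

# low 16 bits (all a 16-bit SS iret actually restores) survive:
assert requested & 0xffff == 0xc0de and observed & 0xffff == 0xc0de
# bits 16-31 differ from the request (0x5aad asked, 0x5ae3 delivered),
# i.e. they leaked in from elsewhere:
assert (requested >> 16) & 0xffff == 0x5aad
assert (observed >> 16) & 0xffff == 0x5ae3
# bits 32-63 were cleared outright:
assert observed >> 32 == 0
print("reported values match the 16-bit SS model")
```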

--Andy


* Re: Is espfix64's double-fault thing OK on Xen?
  2014-07-15  2:46             ` Andy Lutomirski
  2014-07-15  3:20               ` Andy Lutomirski
@ 2014-07-15  4:14               ` H. Peter Anvin
  1 sibling, 0 replies; 11+ messages in thread
From: H. Peter Anvin @ 2014-07-15  4:14 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

On 07/14/2014 07:46 PM, Andy Lutomirski wrote:
> 
> On espfix-less kernels (Xen and non-Xen), 16-bit CS w/ 16-bit SS
> always fails.  Native (32-bit or 64-bit, according to the binary) CS
> with 16-bit SS fails for sigreturn_32, but passes for sigreturn_64.  I
> find this somewhat odd.  Native ss always passes.
> 

espfix32 is disabled on Xen.

	-hpa


