* Is espfix64's double-fault thing OK on Xen?
From: Andy Lutomirski @ 2014-07-09 23:17 UTC (permalink / raw)
To: H. Peter Anvin, Konrad Rzeszutek Wilk,
linux-kernel@vger.kernel.org
This part in __do_double_fault looks fishy:
cmpl $__KERNEL_CS,CS(%rdi)
jne do_double_fault
Shouldn't that be:
test $3,CS(%rdi)
jnz do_double_fault
--Andy
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Is espfix64's double-fault thing OK on Xen?
From: Konrad Rzeszutek Wilk @ 2014-07-14 16:58 UTC (permalink / raw)
To: Andy Lutomirski, david.vrabel
Cc: H. Peter Anvin, linux-kernel@vger.kernel.org
On Wed, Jul 09, 2014 at 04:17:57PM -0700, Andy Lutomirski wrote:
> This part in __do_double_fault looks fishy:
>
> cmpl $__KERNEL_CS,CS(%rdi)
> jne do_double_fault
>
> Shouldn't that be:
>
> test $3,CS(%rdi)
> jnz do_double_fault
>
Let me rope in David, who was playing with that recently.
> --Andy
* Re: Is espfix64's double-fault thing OK on Xen?
From: H. Peter Anvin @ 2014-07-14 17:04 UTC (permalink / raw)
To: Andy Lutomirski, Konrad Rzeszutek Wilk,
linux-kernel@vger.kernel.org
On 07/09/2014 04:17 PM, Andy Lutomirski wrote:
> This part in __do_double_fault looks fishy:
>
> cmpl $__KERNEL_CS,CS(%rdi)
> jne do_double_fault
>
> Shouldn't that be:
>
> test $3,CS(%rdi)
> jnz do_double_fault
>
No, it should be fine. The *only* case where we need to do the espfix
magic is when we are on __KERNEL_CS.
-hpa
* Re: Is espfix64's double-fault thing OK on Xen?
From: Andy Lutomirski @ 2014-07-14 17:11 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org
On Mon, Jul 14, 2014 at 10:04 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 07/09/2014 04:17 PM, Andy Lutomirski wrote:
>> This part in __do_double_fault looks fishy:
>>
>> cmpl $__KERNEL_CS,CS(%rdi)
>> jne do_double_fault
>>
>> Shouldn't that be:
>>
>> test $3,CS(%rdi)
>> jnz do_double_fault
>>
>
> No, it should be fine. The *only* case where we need to do the espfix
> magic is when we are on __KERNEL_CS.
>
IIRC Xen has a somewhat different GDT, and at least the userspace CS
in IA32_STAR disagrees with normal Linux. If the kernel CS is also
strange, then there will be an extra possible CS value here.
--Andy
* Re: Is espfix64's double-fault thing OK on Xen?
From: Andy Lutomirski @ 2014-07-14 17:15 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org
On Mon, Jul 14, 2014 at 10:11 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Jul 14, 2014 at 10:04 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 07/09/2014 04:17 PM, Andy Lutomirski wrote:
>>> This part in __do_double_fault looks fishy:
>>>
>>> cmpl $__KERNEL_CS,CS(%rdi)
>>> jne do_double_fault
>>>
>>> Shouldn't that be:
>>>
>>> test $3,CS(%rdi)
>>> jnz do_double_fault
>>>
>>
>> No, it should be fine. The *only* case where we need to do the espfix
>> magic is when we are on __KERNEL_CS.
>>
>
> IIRC Xen has a somewhat different GDT, and at least the userspace CS
> in IA32_STAR disagrees with normal Linux. If the kernel CS is also
> strange, then there will be an extra possible CS value here.
There's FLAT_KERNEL_CS64, which is not equal to __KERNEL_CS. If the
espfix mechanism gets invoked with that CS, then I expect that
something unexpected will happen.
That being said, FLAT_KERNEL_CS64 is CPL3, so my code might not be any better.
--Andy
* Re: Is espfix64's double-fault thing OK on Xen?
From: Andy Lutomirski @ 2014-07-14 21:31 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org
I'm now rather confused.
On Xen 64-bit, AFAICS, syscall handlers run with CS = 0xe033. I think
that Xen is somehow fixing up traps that came from "kernel" mode to
show CS = 0xe030, an impossible selector value (unless that segment is
conforming), to keep user_mode_vm happy.
I'm running this test:
https://gitorious.org/linux-test-utils/linux-clock-tests/source/1e13516a41416a7282f43c83097c9dfe4619344b:sigreturn.c
It requires a kernel with my SS sigcontext change; otherwise it
doesn't do anything.
Without Xen, it works reliably. On Xen, it seems to OOPS some
fraction of the time. It gets a null pointer dereference here:
movq %rax,(0*8)(%rdi) /* RAX */
It looks like:
[ 0.565752] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 0.566706] IP: [<ffffffff81775493>] irq_return_ldt+0x11/0x5c
[ 0.566706] PGD 4eb40067 PUD 4eb38067 PMD 0
[ 0.566706] Oops: 0002 [#1] SMP
[ 0.566706] Modules linked in:
[ 0.566706] CPU: 1 PID: 81 Comm: sigreturn Not tainted 3.16.0-rc4+ #47
[ 0.566706] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 0.566706] task: ffff88004e8aa180 ti: ffff88004eb68000 task.ti: ffff88004eb68000
[ 0.566706] RIP: e030:[<ffffffff81775493>] [<ffffffff81775493>] irq_return_ldt+0x11/0x5c
[ 0.566706] RSP: e02b:ffff88004eb6bfc8 EFLAGS: 00010002
[ 0.566706] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
[ 0.566706] RDX: 000000000000000a RSI: 0000000000000051 RDI: 0000000000000000
[ 0.566706] RBP: 00000000006d3018 R08: 0000000000000000 R09: 0000000000000000
[ 0.566706] R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000000000
[ 0.566706] R13: 0000000000000001 R14: 000000000040eec0 R15: 0000000000000000
[ 0.566706] FS: 0000000000000000(0063) GS:ffff880056300000(0000) knlGS:0000000000000000
[ 0.566706] CS: e033 DS: 000f ES: 000f CR0: 0000000080050033
[ 0.566706] CR2: 0000000000000000 CR3: 000000004eb3c000 CR4: 0000000000042660
[ 0.566706] Stack:
[ 0.566706]  0000000000000051 0000000000000000 0000000000000000 0000000000000007
[ 0.566706]  0000000000000202 8badf00d5aad0000 000000000000000f
[ 0.566706] Call Trace:
[ 0.566706] Code: 44 24 20 04 75 14 e9 9d 5a 89 ff 90 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 cf 50 57 66 66 90 66 66 90 65 48 8b 3c 25 00 b0 00 00 <48> 89 07 48 8b 44 24 10 48 89 47 08 48 8b 44 24 18 48 89 47 10
[ 0.566706] RIP [<ffffffff81775493>] irq_return_ldt+0x11/0x5c
[ 0.566706] RSP <ffff88004eb6bfc8>
[ 0.566706] CR2: 0000000000000000
[ 0.566706] ---[ end trace a62b7f28ce379a48 ]---
When it doesn't OOPS, it segfaults. I don't know why. I suspect that
Xen has a bug in modify_ldt, sigreturn, or iret when returning
to a CS that lives in the LDT.
--Andy
* Re: Is espfix64's double-fault thing OK on Xen?
From: Andy Lutomirski @ 2014-07-14 21:35 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org
On Mon, Jul 14, 2014 at 2:31 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> I'm now rather confused.
>
> On Xen 64-bit, AFAICS, syscall handlers run with CS = 0xe033. I think
> that Xen is somehow fixing up traps that came from "kernel" mode to
> show CS = 0xe030, which is an impossible selector value (unless that
> segment is conforming) to keep user_mode_vm happy.
>
> I'm running this test:
>
> https://gitorious.org/linux-test-utils/linux-clock-tests/source/1e13516a41416a7282f43c83097c9dfe4619344b:sigreturn.c
>
> It requires a kernel with my SS sigcontext change; otherwise it
> doesn't do anything.
>
> Without Xen, it works reliably. On Xen, it seems to OOPS some
> fraction of the time. It gets a null pointer dereference here:
>
> movq %rax,(0*8)(%rdi) /* RAX */
>
> [...]
> When it doesn't OOPS, it segfaults. I don't know why. I suspect that
> Xen has a bug in modify_ldt, sigreturn, or iret when returning
> to a CS that lives in the LDT.
Presumably the problem is here:
ENTRY(xen_iret)
pushq $0
1: jmp hypercall_iret
ENDPATCH(xen_iret)
This seems rather unlikely to work on the espfix stack.
Maybe espfix64 should be disabled when running on Xen and Xen should
implement its own espfix64 in the hypervisor.
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
* Re: Is espfix64's double-fault thing OK on Xen?
From: H. Peter Anvin @ 2014-07-14 22:23 UTC (permalink / raw)
To: Andy Lutomirski; +Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org
On 07/14/2014 02:35 PM, Andy Lutomirski wrote:
> Presumably the problem is here:
>
> ENTRY(xen_iret)
> pushq $0
> 1: jmp hypercall_iret
> ENDPATCH(xen_iret)
>
> This seems rather unlikely to work on the espfix stack.
>
> Maybe espfix64 should be disabled when running on Xen and Xen should
> implement its own espfix64 in the hypervisor.
Perhaps the first question is: is espfix even necessary on Xen? How
does the Xen PV IRET handle returning to a 16-bit stack segment?
-hpa
* Re: Is espfix64's double-fault thing OK on Xen?
From: Andy Lutomirski @ 2014-07-15 2:46 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org
On Mon, Jul 14, 2014 at 3:23 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 07/14/2014 02:35 PM, Andy Lutomirski wrote:
>> Presumably the problem is here:
>>
>> ENTRY(xen_iret)
>> pushq $0
>> 1: jmp hypercall_iret
>> ENDPATCH(xen_iret)
>>
>> This seems rather unlikely to work on the espfix stack.
>>
>> Maybe espfix64 should be disabled when running on Xen and Xen should
>> implement its own espfix64 in the hypervisor.
>
> Perhaps the first question is: is espfix even necessary on Xen? How
> does the Xen PV IRET handle returning to a 16-bit stack segment?
>
Test case here:
https://gitorious.org/linux-test-utils/linux-clock-tests/source/dbfe196a0f6efedc119deb1cdbb0139dbdf609ee:
It's sigreturn_32 and sigreturn_64. Summary:
(sigreturn_64 always fails unless my SS patch is applied. Results
below for sigreturn_64 assume the patch is applied. This is on KVM
(-cpu host) on Sandy Bridge.)
On Xen with espfix, both OOPS intermittently.
On espfix-less kernels (Xen and non-Xen), 16-bit CS w/ 16-bit SS
always fails. Native (32-bit or 64-bit, according to the binary) CS
with 16-bit SS fails for sigreturn_32, but passes for sigreturn_64. I
find this somewhat odd. Native SS always passes.
So I think that Xen makes no difference here, aside from the bug.
That being said, I don't know whether Linux can do espfix64 at all
when Xen is running -- for all I know, the IRET hypercall switches
stacks to a Xen stack.
--Andy
* Re: Is espfix64's double-fault thing OK on Xen?
2014-07-15 2:46 ` Andy Lutomirski
@ 2014-07-15 3:20 ` Andy Lutomirski
2014-07-15 4:14 ` H. Peter Anvin
1 sibling, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2014-07-15 3:20 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org
On Mon, Jul 14, 2014 at 7:46 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Jul 14, 2014 at 3:23 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> [...]
>
> Test case here:
>
> https://gitorious.org/linux-test-utils/linux-clock-tests/source/dbfe196a0f6efedc119deb1cdbb0139dbdf609ee:
>
> It's sigreturn_32 and sigreturn_64. Summary:
>
> (sigreturn_64 always fails unless my SS patch is applied. results
> below for sigreturn_64 assume the patch is applied. This is on KVM
> (-cpu host) on Sandy Bridge.)
>
> On Xen with espfix, both OOPS intermittently.
>
> On espfix-less kernels (Xen and non-Xen), 16-bit CS w/ 16-bit SS
> always fails. Native (32-bit or 64-bit, according to the binary) CS
> with 16-bit SS fails for sigreturn_32, but passes for sigreturn_64. I
> find this somewhat odd. Native ss always passes.
>
> So I think that Xen makes no difference here, aside from the bug.
>
> That being said, I don't know whether Linux can do espfix64 at all
> when Xen is running -- for all I know, the IRET hypercall switches
> stacks to a Xen stack.
Microcode is weird. Without espfix:
[RUN] 64-bit CS (33), 32-bit SS (2b)
SP: 8badf00d5aadc0de -> 8badf00d5aadc0de
[OK] all registers okay
[RUN] 32-bit CS (23), 32-bit SS (2b)
SP: 8badf00d5aadc0de -> 5aadc0de
[OK] all registers okay
[RUN] 16-bit CS (7), 32-bit SS (2b)
SP: 8badf00d5aadc0de -> 5aadc0de
[OK] all registers okay
[RUN] 64-bit CS (33), 16-bit SS (f)
SP: 8badf00d5aadc0de -> 8badf00d5aadc0de
[OK] all registers okay
[RUN] 32-bit CS (23), 16-bit SS (f)
SP: 8badf00d5aadc0de -> 5ae3c0de
[FAIL] Reg 15 mismatch: requested 0x8badf00d5aadc0de; got 0x5ae3c0de
[RUN] 16-bit CS (7), 16-bit SS (f)
SP: 8badf00d5aadc0de -> 5ae3c0de
[FAIL] Reg 15 mismatch: requested 0x8badf00d5aadc0de; got 0x5ae3c0de
--Andy
* Re: Is espfix64's double-fault thing OK on Xen?
From: H. Peter Anvin @ 2014-07-15 4:14 UTC (permalink / raw)
To: Andy Lutomirski; +Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org
On 07/14/2014 07:46 PM, Andy Lutomirski wrote:
>
> On espfix-less kernels (Xen and non-Xen), 16-bit CS w/ 16-bit SS
> always fails. Native (32-bit or 64-bit, according to the binary) CS
> with 16-bit SS fails for sigreturn_32, but passes for sigreturn_64. I
> find this somewhat odd. Native ss always passes.
>
espfix32 is disabled on Xen.
-hpa