* Is espfix64's double-fault thing OK on Xen?
From: Andy Lutomirski @ 2014-07-09 23:17 UTC
To: H. Peter Anvin, Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

This part in __do_double_fault looks fishy:

        cmpl $__KERNEL_CS,CS(%rdi)
        jne do_double_fault

Shouldn't that be:

        test $3,CS(%rdi)
        jnz do_double_fault

--Andy
* Re: Is espfix64's double-fault thing OK on Xen?
From: Konrad Rzeszutek Wilk @ 2014-07-14 16:58 UTC
To: Andy Lutomirski, david.vrabel
Cc: H. Peter Anvin, linux-kernel@vger.kernel.org

On Wed, Jul 09, 2014 at 04:17:57PM -0700, Andy Lutomirski wrote:
> This part in __do_double_fault looks fishy:
>
>         cmpl $__KERNEL_CS,CS(%rdi)
>         jne do_double_fault
>
> Shouldn't that be:
>
>         test $3,CS(%rdi)
>         jnz do_double_fault
>

Let me rope in David, who was playing with that recently.

> --Andy
* Re: Is espfix64's double-fault thing OK on Xen?
From: H. Peter Anvin @ 2014-07-14 17:04 UTC
To: Andy Lutomirski, Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

On 07/09/2014 04:17 PM, Andy Lutomirski wrote:
> This part in __do_double_fault looks fishy:
>
>         cmpl $__KERNEL_CS,CS(%rdi)
>         jne do_double_fault
>
> Shouldn't that be:
>
>         test $3,CS(%rdi)
>         jnz do_double_fault
>

No, it should be fine.  The *only* case where we need to do the espfix
magic is when we are on __KERNEL_CS.

	-hpa
* Re: Is espfix64's double-fault thing OK on Xen?
From: Andy Lutomirski @ 2014-07-14 17:11 UTC
To: H. Peter Anvin
Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

On Mon, Jul 14, 2014 at 10:04 AM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 07/09/2014 04:17 PM, Andy Lutomirski wrote:
>> This part in __do_double_fault looks fishy:
>>
>>         cmpl $__KERNEL_CS,CS(%rdi)
>>         jne do_double_fault
>>
>> Shouldn't that be:
>>
>>         test $3,CS(%rdi)
>>         jnz do_double_fault
>>
>
> No, it should be fine.  The *only* case where we need to do the espfix
> magic is when we are on __KERNEL_CS.
>

IIRC Xen has a somewhat different GDT, and at least the userspace CS
in IA32_STAR disagrees with normal Linux.  If the kernel CS is also
strange, then there will be an extra possible CS value here.

--Andy
* Re: Is espfix64's double-fault thing OK on Xen?
From: Andy Lutomirski @ 2014-07-14 17:15 UTC
To: H. Peter Anvin
Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

On Mon, Jul 14, 2014 at 10:11 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Jul 14, 2014 at 10:04 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 07/09/2014 04:17 PM, Andy Lutomirski wrote:
>>> This part in __do_double_fault looks fishy:
>>>
>>>         cmpl $__KERNEL_CS,CS(%rdi)
>>>         jne do_double_fault
>>>
>>> Shouldn't that be:
>>>
>>>         test $3,CS(%rdi)
>>>         jnz do_double_fault
>>>
>>
>> No, it should be fine.  The *only* case where we need to do the espfix
>> magic is when we are on __KERNEL_CS.
>>
>
> IIRC Xen has a somewhat different GDT, and at least the userspace CS
> in IA32_STAR disagrees with normal Linux.  If the kernel CS is also
> strange, then there will be an extra possible CS value here.

There's FLAT_KERNEL_CS64, which is not equal to __KERNEL_CS.  If the
espfix mechanism gets invoked with that CS, then I expect that
something unexpected will happen.

That being said, FLAT_KERNEL_CS64 is CPL3, so my code might not be any
better.

--Andy
* Re: Is espfix64's double-fault thing OK on Xen?
From: Andy Lutomirski @ 2014-07-14 21:31 UTC
To: H. Peter Anvin
Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

I'm now rather confused.

On Xen 64-bit, AFAICS, syscall handlers run with CS = 0xe033.  I think
that Xen is somehow fixing up traps that came from "kernel" mode to
show CS = 0xe030, which is an impossible selector value (unless that
segment is conforming) to keep user_mode_vm happy.

I'm running this test:

https://gitorious.org/linux-test-utils/linux-clock-tests/source/1e13516a41416a7282f43c83097c9dfe4619344b:sigreturn.c

It requires a kernel with my SS sigcontext change; otherwise it
doesn't do anything.

Without Xen, it works reliably.  On Xen, it seems to OOPS some
fraction of the time.  It gets a null pointer dereference here:

        movq %rax,(0*8)(%rdi)   /* RAX */

It looks like:

[    0.565752] BUG: unable to handle kernel NULL pointer dereference at (null)
[    0.566706] IP: [<ffffffff81775493>] irq_return_ldt+0x11/0x5c
[    0.566706] PGD 4eb40067 PUD 4eb38067 PMD 0
[    0.566706] Oops: 0002 [#1] SMP
[    0.566706] Modules linked in:
[    0.566706] CPU: 1 PID: 81 Comm: sigreturn Not tainted 3.16.0-rc4+ #47
[    0.566706] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    0.566706] task: ffff88004e8aa180 ti: ffff88004eb68000 task.ti: ffff88004eb68000
[    0.566706] RIP: e030:[<ffffffff81775493>]  [<ffffffff81775493>] irq_return_ldt+0x11/0x5c
[    0.566706] RSP: e02b:ffff88004eb6bfc8  EFLAGS: 00010002
[    0.566706] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
[    0.566706] RDX: 000000000000000a RSI: 0000000000000051 RDI: 0000000000000000
[    0.566706] RBP: 00000000006d3018 R08: 0000000000000000 R09: 0000000000000000
[    0.566706] R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000000000
[    0.566706] R13: 0000000000000001 R14: 000000000040eec0 R15: 0000000000000000
[    0.566706] FS:  0000000000000000(0063) GS:ffff880056300000(0000) knlGS:0000000000000000
[    0.566706] CS:  e033 DS: 000f ES: 000f CR0: 0000000080050033
[    0.566706] CR2: 0000000000000000 CR3: 000000004eb3c000 CR4: 0000000000042660
[    0.566706] Stack:
[    0.566706]  0000000000000051 0000000000000000 0000000000000000 0000000000000007
[    0.566706]  0000000000000202 8badf00d5aad0000 000000000000000f
[    0.566706] Call Trace:
[    0.566706] Code: 44 24 20 04 75 14 e9 9d 5a 89 ff 90 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 cf 50 57 66 66 90 66 66 90 65 48 8b 3c 25 00 b0 00 00 <48> 89 07 48 8b 44 24 10 48 89 47 08 48 8b 44 24 18 48 89 47 10
[    0.566706] RIP  [<ffffffff81775493>] irq_return_ldt+0x11/0x5c
[    0.566706]  RSP <ffff88004eb6bfc8>
[    0.566706] CR2: 0000000000000000
[    0.566706] ---[ end trace a62b7f28ce379a48 ]---

When it doesn't OOPS, it segfaults.  I don't know why.  I suspect that
Xen either has a bug in modify_ldt, sigreturn, or iret when returning
to a CS that lives in the LDT.

--Andy
* Re: Is espfix64's double-fault thing OK on Xen?
From: Andy Lutomirski @ 2014-07-14 21:35 UTC
To: H. Peter Anvin
Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

On Mon, Jul 14, 2014 at 2:31 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> I'm now rather confused.
>
> On Xen 64-bit, AFAICS, syscall handlers run with CS = 0xe033.  I think
> that Xen is somehow fixing up traps that came from "kernel" mode to
> show CS = 0xe030, which is an impossible selector value (unless that
> segment is conforming) to keep user_mode_vm happy.
>
> [...]
>
> When it doesn't OOPS, it segfaults.  I don't know why.  I suspect that
> Xen either has a bug in modify_ldt, sigreturn, or iret when returning
> to a CS that lives in the LDT.

Presumably the problem is here:

ENTRY(xen_iret)
        pushq $0
1:      jmp hypercall_iret
ENDPATCH(xen_iret)

This seems rather unlikely to work on the espfix stack.

Maybe espfix64 should be disabled when running on Xen and Xen should
implement its own espfix64 in the hypervisor.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC
* Re: Is espfix64's double-fault thing OK on Xen?
From: H. Peter Anvin @ 2014-07-14 22:23 UTC
To: Andy Lutomirski
Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

On 07/14/2014 02:35 PM, Andy Lutomirski wrote:
> Presumably the problem is here:
>
> ENTRY(xen_iret)
>         pushq $0
> 1:      jmp hypercall_iret
> ENDPATCH(xen_iret)
>
> This seems rather unlikely to work on the espfix stack.
>
> Maybe espfix64 should be disabled when running on Xen and Xen should
> implement its own espfix64 in the hypervisor.

Perhaps the first question is: is espfix even necessary on Xen?  How
does the Xen PV IRET handle returning to a 16-bit stack segment?

	-hpa
* Re: Is espfix64's double-fault thing OK on Xen?
From: Andy Lutomirski @ 2014-07-15 2:46 UTC
To: H. Peter Anvin
Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

On Mon, Jul 14, 2014 at 3:23 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 07/14/2014 02:35 PM, Andy Lutomirski wrote:
>> Presumably the problem is here:
>>
>> ENTRY(xen_iret)
>>         pushq $0
>> 1:      jmp hypercall_iret
>> ENDPATCH(xen_iret)
>>
>> This seems rather unlikely to work on the espfix stack.
>>
>> Maybe espfix64 should be disabled when running on Xen and Xen should
>> implement its own espfix64 in the hypervisor.
>
> Perhaps the first question is: is espfix even necessary on Xen?  How
> does the Xen PV IRET handle returning to a 16-bit stack segment?
>

Test case here:

https://gitorious.org/linux-test-utils/linux-clock-tests/source/dbfe196a0f6efedc119deb1cdbb0139dbdf609ee:

It's sigreturn_32 and sigreturn_64.  Summary:

(sigreturn_64 always fails unless my SS patch is applied.  Results
below for sigreturn_64 assume the patch is applied.  This is on KVM
(-cpu host) on Sandy Bridge.)

On Xen with espfix, both OOPS intermittently.

On espfix-less kernels (Xen and non-Xen), 16-bit CS w/ 16-bit SS
always fails.  Native (32-bit or 64-bit, according to the binary) CS
with 16-bit SS fails for sigreturn_32, but passes for sigreturn_64.  I
find this somewhat odd.  Native SS always passes.

So I think that Xen makes no difference here, aside from the bug.

That being said, I don't know whether Linux can do espfix64 at all
when Xen is running -- for all I know, the IRET hypercall switches
stacks to a Xen stack.

--Andy
* Re: Is espfix64's double-fault thing OK on Xen?
From: Andy Lutomirski @ 2014-07-15 3:20 UTC
To: H. Peter Anvin
Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

On Mon, Jul 14, 2014 at 7:46 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> [...]
>
> On espfix-less kernels (Xen and non-Xen), 16-bit CS w/ 16-bit SS
> always fails.  Native (32-bit or 64-bit, according to the binary) CS
> with 16-bit SS fails for sigreturn_32, but passes for sigreturn_64.  I
> find this somewhat odd.  Native ss always passes.
>
> So I think that Xen makes no difference here, aside from the bug.
>
> That being said, I don't know whether Linux can do espfix64 at all
> when Xen is running -- for all I know, the IRET hypercall switches
> stacks to a Xen stack.

Microcode is weird.  Without espfix:

[RUN]   64-bit CS (33), 32-bit SS (2b)
        SP: 8badf00d5aadc0de -> 8badf00d5aadc0de
[OK]    all registers okay
[RUN]   32-bit CS (23), 32-bit SS (2b)
        SP: 8badf00d5aadc0de -> 5aadc0de
[OK]    all registers okay
[RUN]   16-bit CS (7), 32-bit SS (2b)
        SP: 8badf00d5aadc0de -> 5aadc0de
[OK]    all registers okay
[RUN]   64-bit CS (33), 16-bit SS (f)
        SP: 8badf00d5aadc0de -> 8badf00d5aadc0de
[OK]    all registers okay
[RUN]   32-bit CS (23), 16-bit SS (f)
        SP: 8badf00d5aadc0de -> 5ae3c0de
[FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got 0x5ae3c0de
[RUN]   16-bit CS (7), 16-bit SS (f)
        SP: 8badf00d5aadc0de -> 5ae3c0de
[FAIL]  Reg 15 mismatch: requested 0x8badf00d5aadc0de; got 0x5ae3c0de

--Andy
* Re: Is espfix64's double-fault thing OK on Xen?
From: H. Peter Anvin @ 2014-07-15 4:14 UTC
To: Andy Lutomirski
Cc: Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org

On 07/14/2014 07:46 PM, Andy Lutomirski wrote:
>
> On espfix-less kernels (Xen and non-Xen), 16-bit CS w/ 16-bit SS
> always fails.  Native (32-bit or 64-bit, according to the binary) CS
> with 16-bit SS fails for sigreturn_32, but passes for sigreturn_64.  I
> find this somewhat odd.  Native ss always passes.
>

espfix32 is disabled on Xen.

	-hpa