public inbox for linux-kernel@vger.kernel.org
* [RFC] x86, ia32entry: Use sysretl to return from sysenter
@ 2015-03-27 21:54 Andy Lutomirski
  2015-03-28  8:35 ` Ingo Molnar
  2015-03-29 19:07 ` Denys Vlasenko
From: Andy Lutomirski @ 2015-03-27 21:54 UTC
  To: Linus Torvalds
  Cc: Denys Vlasenko, Borislav Petkov, linux-kernel, X86 ML,
	Ingo Molnar, hpa, Andy Lutomirski, stable

Sysexit is scary on 64-bit kernels: it must be invoked with the user
GS base loaded (usergs) and IRQs on.  That means we rely on the sti
interrupt shadow to keep interrupts masked until sysexit itself has
executed.  This is okay by itself, but the semantics with respect to
NMIs are unclear.

Avoid the whole issue by using sysretl instead.  For background,
Intel CPUs don't allow syscall from compat mode, but they do allow
sysret back to compat mode.  Go figure.

Oddly, this seems to be about 30 cycles faster.  Avoiding popfq and
sti accounts for under half of that, I think, so my best guess is
that Intel just optimizes sysret much better than sysexit.

Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---

This needs careful review even though it's short.  If everyone likes
it, I'll resubmit with a second patch to tear out the associated
paravirt gunk.

I wouldn't be at all surprised if this breaks Xen for some reason
I haven't thought of.

 arch/x86/ia32/ia32entry.S | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 5d2641ce9957..356be82fef0c 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -180,28 +180,34 @@ sysenter_dispatch:
 	testl	$_TIF_ALLWORK_MASK, ASM_THREAD_INFO(TI_flags, %rsp, SIZEOF_PTREGS)
 	jnz	sysexit_audit
 sysexit_from_sys_call:
+	/*
+	 * NB: sysexit is not obviously safe for 64-bit kernels -- an
+	 * NMI between sti and sysexit has poorly specified behavior,
+	 * and an NMI followed by an IRQ with usergs is fatal.  So
+	 * we just pretend we're using sysexit but we really use
+	 * sysretl instead.
+	 *
+	 * This code path is still called sysexit because it pairs
+	 * with sysenter and it uses the sysenter calling convention.
+	 */
 	andl    $~TS_COMPAT,ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
-	/* clear IF, that popfq doesn't enable interrupts early */
-	andl	$~0x200,EFLAGS(%rsp)
-	movl	RIP(%rsp),%edx		/* User %eip */
-	CFI_REGISTER rip,rdx
+	movl	RIP(%rsp),%ecx		/* User %eip */
+	CFI_REGISTER rip,rcx
 	RESTORE_RSI_RDI
-	/* pop everything except ss,rsp,rflags slots */
-	REMOVE_PT_GPREGS_FROM_STACK 3*8
+	xorl	%edx,%edx
 	xorq	%r8,%r8
 	xorq	%r9,%r9
 	xorq	%r10,%r10
-	xorq	%r11,%r11
-	popfq_cfi
+	movl	EFLAGS(%rsp),%r11d	/* User eflags */
 	/*CFI_RESTORE rflags*/
-	popq_cfi %rcx				/* User %esp */
-	CFI_REGISTER rsp,rcx
 	TRACE_IRQS_ON
 	/*
-	 * 32bit SYSEXIT restores eip from edx, esp from ecx.
-	 * cs and ss are loaded from MSRs.
+	 * Sysretl works even on Intel CPUs.  Use it in preference to sysexit,
+	 * since it avoids a dicey window with interrupts enabled.
+	 * CS and SS are loaded from MSRs.
 	 */
-	ENABLE_INTERRUPTS_SYSEXIT32
+	movl	RSP(%rsp),%esp
+	USERGS_SYSRET32
 
 	CFI_RESTORE_STATE
 
-- 
2.3.0




Thread overview: 7+ messages
2015-03-27 21:54 [RFC] x86, ia32entry: Use sysretl to return from sysenter Andy Lutomirski
2015-03-28  8:35 ` Ingo Molnar
2015-03-28 15:17   ` Andy Lutomirski
2015-03-29 12:58     ` Borislav Petkov
2015-03-29 19:07 ` Denys Vlasenko
2015-03-29 21:16   ` Andy Lutomirski
2015-03-30  9:04     ` Borislav Petkov
