public inbox for linux-kernel@vger.kernel.org
* [PATCH] x86/asm/entry/32: Restore %ss before SYSRETL if necessary
@ 2015-04-23 12:34 Denys Vlasenko
From: Denys Vlasenko @ 2015-04-23 12:34 UTC
  To: Ingo Molnar
  Cc: Denys Vlasenko, Brian Gerst, Linus Torvalds, Steven Rostedt,
	Borislav Petkov, H. Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
	x86, linux-kernel

AMD documentation says that SYSRET32 loads the %ss selector with a value
taken from an MSR, but does not modify the *cached descriptor* of %ss.
(Intel CPUs reset the descriptor to a fixed, valid state.)

This has been observed to crash Wine. The conjectured sequence of events
is as follows:

1. A Wine process enters the kernel via the SYSCALL instruction.
2. Context switch to any other task.
3. An interrupt or exception happens, and the CPU loads %ss with 0
   (according to both Intel and AMD docs). The %ss cached descriptor
   is set to the "invalid" state.
4. Context switch back to Wine.
5. SYSRET returns to 32-bit userspace. The %ss selector has the correct
   value, but its cached descriptor is still invalid.
6. The very first userspace POP instruction after this raises
   exception 12 (#SS).
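To make the conjectured sequence concrete, here is a toy C model of %ss as a visible selector plus a "cached descriptor is usable" flag. It is an illustration only, not kernel code: the MODEL_* selector values (0x18, 0x2b) are assumed to match x86-64 __KERNEL_DS and __USER_DS of that era, and every function name is hypothetical. The model treats steps 2 and 4 as not touching %ss, and lets only the Intel variant of SYSRET rewrite the cached descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical selector values, assumed to match x86-64 __KERNEL_DS
 * and __USER_DS; used only for illustration. */
#define MODEL_KERNEL_DS 0x18u
#define MODEL_USER_DS   0x2bu

enum vendor { VENDOR_INTEL, VENDOR_AMD };

/* Toy model of %ss: visible selector plus validity of the hidden
 * (cached) descriptor. */
struct seg_model {
    uint16_t sel;
    bool cache_valid;
};

/* Step 3: interrupt/exception loads a null selector into %ss and
 * marks the cached descriptor unusable. */
static void interrupt_loads_null_ss(struct seg_model *ss)
{
    ss->sel = 0;
    ss->cache_valid = false;
}

/* Step 5: SYSRET to 32-bit mode. Both vendors load the selector;
 * in this model only Intel also resets the cached descriptor. */
static void sysret32(struct seg_model *ss, enum vendor v)
{
    ss->sel = MODEL_USER_DS;
    if (v == VENDOR_INTEL)
        ss->cache_valid = true;
    /* AMD: cached descriptor left untouched */
}

/* Step 6: a 32-bit stack op such as POP. Returns the fault vector
 * (12, #SS) if the cached descriptor is unusable, else 0. */
static int user32_pop(const struct seg_model *ss)
{
    return ss->cache_valid ? 0 : 12;
}

/* Replays steps 1-6 for one vendor; returns what the POP sees. */
static int replay_wine_crash(enum vendor v)
{
    struct seg_model ss = { MODEL_KERNEL_DS, true }; /* step 1: after SYSCALL */
    interrupt_loads_null_ss(&ss);                     /* step 3 */
    sysret32(&ss, v);                                 /* step 5 */
    assert(ss.sel == MODEL_USER_DS); /* selector looks correct either way */
    return user32_pop(&ss);                           /* step 6 */
}
```

In this model, replay_wine_crash(VENDOR_AMD) returns 12 while replay_wine_crash(VENDOR_INTEL) returns 0: the selector is identical in both cases, and only the hidden descriptor state differs.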

Fix this by checking the %ss selector value. If it is not __KERNEL_DS
(and it can really only be __KERNEL_DS or zero), load it with
__KERNEL_DS.
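The check can be mirrored in the same kind of toy C model: read the selector (cheap), and write %ss (expensive) only when it is not __KERNEL_DS, since writing a selector to %ss reloads the cached descriptor. This is a hypothetical sketch, not the kernel implementation; the MODEL_* values are assumed to match x86-64 __KERNEL_DS and __USER_DS, and fixup_ss() and pop_after_return() are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_KERNEL_DS 0x18u /* assumed value of __KERNEL_DS */
#define MODEL_USER_DS   0x2bu /* assumed value of __USER_DS */

enum vendor { VENDOR_INTEL, VENDOR_AMD };

/* Toy %ss: selector plus validity of the cached descriptor. */
struct seg_model { uint16_t sel; bool cache_valid; };

/* Mirrors "movl %ss,%ecx; cmpl $__KERNEL_DS,%ecx; jne reload_ss".
 * Returns true if the slow path (a write to %ss) was taken. */
static bool fixup_ss(struct seg_model *ss)
{
    /* Per the commit message, %ss is only __KERNEL_DS or 0 here. */
    assert(ss->sel == MODEL_KERNEL_DS || ss->sel == 0);
    if (ss->sel == MODEL_KERNEL_DS)
        return false;           /* fast path: read only */
    ss->sel = MODEL_KERNEL_DS;  /* slow path: movl %ecx,%ss */
    ss->cache_valid = true;     /* descriptor reloaded from the GDT */
    return true;
}

/* SYSRET to 32-bit mode; in this model only Intel resets the cache. */
static void sysret32(struct seg_model *ss, enum vendor v)
{
    ss->sel = MODEL_USER_DS;
    if (v == VENDOR_INTEL)
        ss->cache_valid = true;
}

/* True if a 32-bit POP after SYSRET would succeed, given the kernel
 * %ss selector at sysretl_from_sys_call (0 means invalid cache). */
static bool pop_after_return(uint16_t kernel_ss_sel, enum vendor v,
                             bool apply_fix)
{
    struct seg_model ss = { kernel_ss_sel, kernel_ss_sel != 0 };
    if (apply_fix)
        (void)fixup_ss(&ss);
    sysret32(&ss, v);
    return ss.cache_valid;
}
```

In the common case the selector is already __KERNEL_DS, so the fix costs only the read and compare; the write is paid only after something has loaded a null %ss.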

We also use SYSRET32 for SYSENTER-based syscalls, but that code path is
only used on Intel CPUs, which don't have this quirk.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Reported-by: Brian Gerst <brgerst@gmail.com>
CC: Brian Gerst <brgerst@gmail.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/ia32/ia32entry.S | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 0c302d0..9537dcb 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -408,6 +408,18 @@ cstar_dispatch:
 sysretl_from_sys_call:
 	andl $~TS_COMPAT, ASM_THREAD_INFO(TI_status, %rsp, SIZEOF_PTREGS)
 	RESTORE_RSI_RDI_RDX
+	/*
+	 * On AMD, SYSRET32 loads the %ss selector but does not modify its
+	 * cached descriptor. In the kernel, %ss can be loaded with 0, which
+	 * sets the cached descriptor to "invalid". This is harmless in
+	 * 64-bit mode, but after returning to 32-bit mode it makes stack ops fail.
+	 * Reload %ss only if it is wrong: reading %ss takes ~2 cycles,
+	 * writing it takes ~40 cycles.
+	 */
+	movl	%ss, %ecx
+	cmpl	$__KERNEL_DS, %ecx
+	jne	reload_ss
+ss_is_good:
 	movl RIP(%rsp),%ecx
 	CFI_REGISTER rip,rcx
 	movl EFLAGS(%rsp),%r11d
@@ -426,6 +438,10 @@ sysretl_from_sys_call:
 	 * does not exist, it merely sets eflags.IF=1).
 	 */
 	USERGS_SYSRET32
+reload_ss:
+	movl	$__KERNEL_DS, %ecx
+	movl	%ecx, %ss
+	jmp	ss_is_good
 
 #ifdef CONFIG_AUDITSYSCALL
 cstar_auditsys:
-- 
1.8.1.4




Thread overview: 20+ messages
2015-04-23 12:34 [PATCH] x86/asm/entry/32: Restore %ss before SYSRETL if necessary Denys Vlasenko
2015-04-23 15:22 ` Linus Torvalds
2015-04-23 15:50   ` Andy Lutomirski
2015-04-23 16:05     ` Linus Torvalds
2015-04-23 16:11       ` Borislav Petkov
2015-04-23 16:06   ` Brian Gerst
2015-04-23 16:13     ` Linus Torvalds
2015-04-23 16:27       ` Andy Lutomirski
2015-04-23 20:01         ` Denys Vlasenko
2015-04-23 21:10           ` Borislav Petkov
2015-04-23 21:37             ` H. Peter Anvin
2015-04-23 21:46               ` Borislav Petkov
2015-04-23 22:29               ` Andy Lutomirski
2015-04-23 22:31                 ` H. Peter Anvin
2015-04-23 22:38                   ` Andy Lutomirski
2015-04-23 22:52                     ` H. Peter Anvin
2015-04-23 22:55                       ` Andy Lutomirski
2015-04-23 23:04                         ` Linus Torvalds
2015-04-23 23:22                           ` Denys Vlasenko
2015-04-24  0:59                         ` H. Peter Anvin
