[patch added to the 3.12 stable tree] x86/nmi: Enable nested do

stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [patch added to the 3.12 stable tree] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels
@ 2015-08-19  6:35 Jiri Slaby
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/nmi/64: Remove asm code that saves CR2 Jiri Slaby
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Jiri Slaby @ 2015-08-19  6:35 UTC (permalink / raw)
  To: stable
  Cc: Andy Lutomirski, Borislav Petkov, Linus Torvalds, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Jiri Slaby

From: Andy Lutomirski <luto@kernel.org>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

commit 9d05041679904b12c12421cbcf9cb5f4860a8d7b upstream.

32-bit kernels handle nested NMIs in C.  Enable the exact same
handling on 64-bit kernels as well.  This isn't currently
necessary, but it will become necessary once the asm code starts
allowing limited nesting.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 arch/x86/kernel/nmi.c | 123 +++++++++++++++++++++-----------------------------
 1 file changed, 52 insertions(+), 71 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 6fcb49ce50a1..b82e0fdc7edb 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -392,15 +392,15 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
 }
 
 /*
- * NMIs can hit breakpoints which will cause it to lose its
- * NMI context with the CPU when the breakpoint does an iret.
- */
-#ifdef CONFIG_X86_32
-/*
- * For i386, NMIs use the same stack as the kernel, and we can
- * add a workaround to the iret problem in C (preventing nested
- * NMIs if an NMI takes a trap). Simply have 3 states the NMI
- * can be in:
+ * NMIs can hit breakpoints which will cause it to lose its NMI context
+ * with the CPU when the breakpoint or page fault does an IRET.
+ *
+ * As a result, NMIs can nest if NMIs get unmasked due an IRET during
+ * NMI processing.  On x86_64, the asm glue protects us from nested NMIs
+ * if the outer NMI came from kernel mode, but we can still nest if the
+ * outer NMI came from user mode.
+ *
+ * To handle these nested NMIs, we have three states:
  *
  *  1) not running
  *  2) executing
@@ -414,15 +414,14 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
  * (Note, the latch is binary, thus multiple NMIs triggering,
  *  when one is running, are ignored. Only one NMI is restarted.)
  *
- * If an NMI hits a breakpoint that executes an iret, another
- * NMI can preempt it. We do not want to allow this new NMI
- * to run, but we want to execute it when the first one finishes.
- * We set the state to "latched", and the exit of the first NMI will
- * perform a dec_return, if the result is zero (NOT_RUNNING), then
- * it will simply exit the NMI handler. If not, the dec_return
- * would have set the state to NMI_EXECUTING (what we want it to
- * be when we are running). In this case, we simply jump back
- * to rerun the NMI handler again, and restart the 'latched' NMI.
+ * If an NMI executes an iret, another NMI can preempt it. We do not
+ * want to allow this new NMI to run, but we want to execute it when the
+ * first one finishes.  We set the state to "latched", and the exit of
+ * the first NMI will perform a dec_return, if the result is zero
+ * (NOT_RUNNING), then it will simply exit the NMI handler. If not, the
+ * dec_return would have set the state to NMI_EXECUTING (what we want it
+ * to be when we are running). In this case, we simply jump back to
+ * rerun the NMI handler again, and restart the 'latched' NMI.
  *
  * No trap (breakpoint or page fault) should be hit before nmi_restart,
  * thus there is no race between the first check of state for NOT_RUNNING
@@ -445,49 +444,36 @@ enum nmi_states {
 static DEFINE_PER_CPU(enum nmi_states, nmi_state);
 static DEFINE_PER_CPU(unsigned long, nmi_cr2);
 
-#define nmi_nesting_preprocess(regs)					\
-	do {								\
-		if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) {	\
-			this_cpu_write(nmi_state, NMI_LATCHED);		\
-			return;						\
-		}							\
-		this_cpu_write(nmi_state, NMI_EXECUTING);		\
-		this_cpu_write(nmi_cr2, read_cr2());			\
-	} while (0);							\
-	nmi_restart:
-
-#define nmi_nesting_postprocess()					\
-	do {								\
-		if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))	\
-			write_cr2(this_cpu_read(nmi_cr2));		\
-		if (this_cpu_dec_return(nmi_state))			\
-			goto nmi_restart;				\
-	} while (0)
-#else /* x86_64 */
+#ifdef CONFIG_X86_64
 /*
- * In x86_64 things are a bit more difficult. This has the same problem
- * where an NMI hitting a breakpoint that calls iret will remove the
- * NMI context, allowing a nested NMI to enter. What makes this more
- * difficult is that both NMIs and breakpoints have their own stack.
- * When a new NMI or breakpoint is executed, the stack is set to a fixed
- * point. If an NMI is nested, it will have its stack set at that same
- * fixed address that the first NMI had, and will start corrupting the
- * stack. This is handled in entry_64.S, but the same problem exists with
- * the breakpoint stack.
+ * In x86_64, we need to handle breakpoint -> NMI -> breakpoint.  Without
+ * some care, the inner breakpoint will clobber the outer breakpoint's
+ * stack.
  *
- * If a breakpoint is being processed, and the debug stack is being used,
- * if an NMI comes in and also hits a breakpoint, the stack pointer
- * will be set to the same fixed address as the breakpoint that was
- * interrupted, causing that stack to be corrupted. To handle this case,
- * check if the stack that was interrupted is the debug stack, and if
- * so, change the IDT so that new breakpoints will use the current stack
- * and not switch to the fixed address. On return of the NMI, switch back
- * to the original IDT.
+ * If a breakpoint is being processed, and the debug stack is being
+ * used, if an NMI comes in and also hits a breakpoint, the stack
+ * pointer will be set to the same fixed address as the breakpoint that
+ * was interrupted, causing that stack to be corrupted. To handle this
+ * case, check if the stack that was interrupted is the debug stack, and
+ * if so, change the IDT so that new breakpoints will use the current
+ * stack and not switch to the fixed address. On return of the NMI,
+ * switch back to the original IDT.
  */
 static DEFINE_PER_CPU(int, update_debug_stack);
+#endif
 
-static inline void nmi_nesting_preprocess(struct pt_regs *regs)
+dotraplinkage notrace __kprobes void
+do_nmi(struct pt_regs *regs, long error_code)
 {
+	if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) {
+		this_cpu_write(nmi_state, NMI_LATCHED);
+		return;
+	}
+	this_cpu_write(nmi_state, NMI_EXECUTING);
+	this_cpu_write(nmi_cr2, read_cr2());
+nmi_restart:
+
+#ifdef CONFIG_X86_64
 	/*
 	 * If we interrupted a breakpoint, it is possible that
 	 * the nmi handler will have breakpoints too. We need to
@@ -498,22 +484,8 @@ static inline void nmi_nesting_preprocess(struct pt_regs *regs)
 		debug_stack_set_zero();
 		this_cpu_write(update_debug_stack, 1);
 	}
-}
-
-static inline void nmi_nesting_postprocess(void)
-{
-	if (unlikely(this_cpu_read(update_debug_stack))) {
-		debug_stack_reset();
-		this_cpu_write(update_debug_stack, 0);
-	}
-}
 #endif
 
-dotraplinkage notrace __kprobes void
-do_nmi(struct pt_regs *regs, long error_code)
-{
-	nmi_nesting_preprocess(regs);
-
 	nmi_enter();
 
 	inc_irq_stat(__nmi_count);
@@ -523,8 +495,17 @@ do_nmi(struct pt_regs *regs, long error_code)
 
 	nmi_exit();
 
-	/* On i386, may loop back to preprocess */
-	nmi_nesting_postprocess();
+#ifdef CONFIG_X86_64
+	if (unlikely(this_cpu_read(update_debug_stack))) {
+		debug_stack_reset();
+		this_cpu_write(update_debug_stack, 0);
+	}
+#endif
+
+	if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
+		write_cr2(this_cpu_read(nmi_cr2));
+	if (this_cpu_dec_return(nmi_state))
+		goto nmi_restart;
 }
 
 void stop_nmi(void)
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [patch added to the 3.12 stable tree] x86/nmi/64: Remove asm code that saves CR2
  2015-08-19  6:35 [patch added to the 3.12 stable tree] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels Jiri Slaby
@ 2015-08-19  6:35 ` Jiri Slaby
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/nmi/64: Switch stacks on userspace NMI entry Jiri Slaby
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Jiri Slaby @ 2015-08-19  6:35 UTC (permalink / raw)
  To: stable
  Cc: Andy Lutomirski, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Jiri Slaby

From: Andy Lutomirski <luto@kernel.org>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

commit 0e181bb58143cb4a2e8f01c281b0816cd0e4798e upstream.

Now that do_nmi saves CR2, we don't need to save it in asm.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 arch/x86/kernel/entry_64.S | 18 ------------------
 1 file changed, 18 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 7b22af265d12..fd27e5b86d7b 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1876,29 +1876,11 @@ end_repeat_nmi:
 	call save_paranoid
 	DEFAULT_FRAME 0
 
-	/*
-	 * Save off the CR2 register. If we take a page fault in the NMI then
-	 * it could corrupt the CR2 value. If the NMI preempts a page fault
-	 * handler before it was able to read the CR2 register, and then the
-	 * NMI itself takes a page fault, the page fault that was preempted
-	 * will read the information from the NMI page fault and not the
-	 * origin fault. Save it off and restore it if it changes.
-	 * Use the r12 callee-saved register.
-	 */
-	movq %cr2, %r12
-
 	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
 	movq %rsp,%rdi
 	movq $-1,%rsi
 	call do_nmi
 
-	/* Did the NMI take a page fault? Restore cr2 if it did */
-	movq %cr2, %rcx
-	cmpq %rcx, %r12
-	je 1f
-	movq %r12, %cr2
-1:
-	
 	testl %ebx,%ebx				/* swapgs needed? */
 	jnz nmi_restore
 nmi_swapgs:
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [patch added to the 3.12 stable tree] x86/nmi/64: Switch stacks on userspace NMI entry
  2015-08-19  6:35 [patch added to the 3.12 stable tree] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels Jiri Slaby
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/nmi/64: Remove asm code that saves CR2 Jiri Slaby
@ 2015-08-19  6:35 ` Jiri Slaby
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] arch: Introduce smp_load_acquire(), smp_store_release() Jiri Slaby
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Jiri Slaby @ 2015-08-19  6:35 UTC (permalink / raw)
  To: stable
  Cc: Andy Lutomirski, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Jiri Slaby

From: Andy Lutomirski <luto@kernel.org>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

commit 9b6e6a8334d56354853f9c255d1395c2ba570e0a upstream.

Returning to userspace is tricky: IRET can fail, and ESPFIX can
rearrange the stack prior to IRET.

The NMI nesting fixup relies on a precise stack layout and
atomic IRET.  Rather than trying to teach the NMI nesting fixup
to handle ESPFIX and failed IRET, punt: run NMIs that came from
user mode on the normal kernel stack.

This will make some nested NMIs visible to C code, but the C
code is okay with that.

As a side effect, this should speed up perf: it eliminates an
RDMSR when NMIs come from user mode.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 arch/x86/kernel/entry_64.S | 65 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 61 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index fd27e5b86d7b..691337073e1f 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1706,19 +1706,76 @@ ENTRY(nmi)
 	 * a nested NMI that updated the copy interrupt stack frame, a
 	 * jump will be made to the repeat_nmi code that will handle the second
 	 * NMI.
+	 *
+	 * However, espfix prevents us from directly returning to userspace
+	 * with a single IRET instruction.  Similarly, IRET to user mode
+	 * can fault.  We therefore handle NMIs from user space like
+	 * other IST entries.
 	 */
 
 	/* Use %rdx as out temp variable throughout */
 	pushq_cfi %rdx
 	CFI_REL_OFFSET rdx, 0
 
+	testb	$3, CS-RIP+8(%rsp)
+	jz	.Lnmi_from_kernel
+
+	/*
+	 * NMI from user mode.  We need to run on the thread stack, but we
+	 * can't go through the normal entry paths: NMIs are masked, and
+	 * we don't want to enable interrupts, because then we'll end
+	 * up in an awkward situation in which IRQs are on but NMIs
+	 * are off.
+	 */
+
+	SWAPGS
+	cld
+	movq	%rsp, %rdx
+	movq	PER_CPU_VAR(kernel_stack), %rsp
+	addq	$KERNEL_STACK_OFFSET, %rsp
+	pushq	5*8(%rdx)	/* pt_regs->ss */
+	pushq	4*8(%rdx)	/* pt_regs->rsp */
+	pushq	3*8(%rdx)	/* pt_regs->flags */
+	pushq	2*8(%rdx)	/* pt_regs->cs */
+	pushq	1*8(%rdx)	/* pt_regs->rip */
+	pushq   $-1		/* pt_regs->orig_ax */
+	pushq   %rdi		/* pt_regs->di */
+	pushq   %rsi		/* pt_regs->si */
+	pushq   (%rdx)		/* pt_regs->dx */
+	pushq   %rcx		/* pt_regs->cx */
+	pushq   %rax		/* pt_regs->ax */
+	pushq   %r8		/* pt_regs->r8 */
+	pushq   %r9		/* pt_regs->r9 */
+	pushq   %r10		/* pt_regs->r10 */
+	pushq   %r11		/* pt_regs->r11 */
+	pushq	%rbx		/* pt_regs->rbx */
+	pushq	%rbp		/* pt_regs->rbp */
+	pushq	%r12		/* pt_regs->r12 */
+	pushq	%r13		/* pt_regs->r13 */
+	pushq	%r14		/* pt_regs->r14 */
+	pushq	%r15		/* pt_regs->r15 */
+
 	/*
-	 * If %cs was not the kernel segment, then the NMI triggered in user
-	 * space, which means it is definitely not nested.
+	 * At this point we no longer need to worry about stack damage
+	 * due to nesting -- we're on the normal thread stack and we're
+	 * done with the NMI stack.
 	 */
-	cmpl $__KERNEL_CS, 16(%rsp)
-	jne first_nmi
 
+	movq	%rsp, %rdi
+	movq	$-1, %rsi
+	call	do_nmi
+
+	/*
+	 * Return back to user mode.  We must *not* do the normal exit
+	 * work, because we don't want to enable interrupts.  Fortunately,
+	 * do_nmi doesn't modify pt_regs.
+	 */
+	SWAPGS
+
+	addq	$6*8, %rsp	/* skip bx, bp, and r12-r15 */
+	jmp	restore_args
+
+.Lnmi_from_kernel:
 	/*
 	 * Check the special variable on the stack to see if NMIs are
 	 * executing.
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [patch added to the 3.12 stable tree] arch: Introduce smp_load_acquire(), smp_store_release()
  2015-08-19  6:35 [patch added to the 3.12 stable tree] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels Jiri Slaby
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/nmi/64: Remove asm code that saves CR2 Jiri Slaby
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/nmi/64: Switch stacks on userspace NMI entry Jiri Slaby
@ 2015-08-19  6:35 ` Jiri Slaby
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] rcu: Provide counterpart to rcu_dereference() for non-RCU situations Jiri Slaby
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Jiri Slaby @ 2015-08-19  6:35 UTC (permalink / raw)
  To: stable
  Cc: Peter Zijlstra, Benjamin Herrenschmidt, Frederic Weisbecker,
	Mathieu Desnoyers, Michael Ellerman, Michael Neuling,
	Russell King, Geert Uytterhoeven, Heiko Carstens, Linus Torvalds,
	Martin Schwidefsky, Victor Kaplansky, Tony Luck, Oleg Nesterov,
	Ingo Molnar, Jiri Slaby

From: Peter Zijlstra <peterz@infradead.org>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

commit 47933ad41a86a4a9b50bed7c9b9bd2ba242aac63 upstream.

A number of situations currently require the heavyweight smp_mb(),
even though there is no need to order prior stores against later
loads.  Many architectures have much cheaper ways to handle these
situations, but the Linux kernel currently has no portable way
to make use of them.

This commit therefore supplies smp_load_acquire() and
smp_store_release() to remedy this situation.  The new
smp_load_acquire() primitive orders the specified load against
any subsequent reads or writes, while the new smp_store_release()
primitive orders the specifed store against any prior reads or
writes.  These primitives allow array-based circular FIFOs to be
implemented without an smp_mb(), and also allow a theoretical
hole in rcu_assign_pointer() to be closed at no additional
expense on most architectures.

In addition, the RCU experience transitioning from explicit
smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
and rcu_assign_pointer(), respectively resulted in substantial
improvements in readability.  It therefore seems likely that
replacing other explicit barriers with smp_load_acquire() and
smp_store_release() will provide similar benefits.  It appears
that roughly half of the explicit barriers in core kernel code
might be so replaced.

[Changelog by PaulMck]

Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 arch/arm/include/asm/barrier.h      | 15 +++++++++++
 arch/arm64/include/asm/barrier.h    | 50 +++++++++++++++++++++++++++++++++++++
 arch/ia64/include/asm/barrier.h     | 23 +++++++++++++++++
 arch/metag/include/asm/barrier.h    | 15 +++++++++++
 arch/mips/include/asm/barrier.h     | 15 +++++++++++
 arch/powerpc/include/asm/barrier.h  | 21 +++++++++++++++-
 arch/s390/include/asm/barrier.h     | 15 +++++++++++
 arch/sparc/include/asm/barrier_64.h | 15 +++++++++++
 arch/x86/include/asm/barrier.h      | 43 ++++++++++++++++++++++++++++++-
 include/asm-generic/barrier.h       | 15 +++++++++++
 include/linux/compiler.h            |  9 +++++++
 11 files changed, 234 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index 60f15e274e6d..2f59f7443396 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -59,6 +59,21 @@
 #define smp_wmb()	dmb(ishst)
 #endif
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #define read_barrier_depends()		do { } while(0)
 #define smp_read_barrier_depends()	do { } while(0)
 
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index d4a63338a53c..78e20ba8806b 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -35,10 +35,60 @@
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #else
+
 #define smp_mb()	asm volatile("dmb ish" : : : "memory")
 #define smp_rmb()	asm volatile("dmb ishld" : : : "memory")
 #define smp_wmb()	asm volatile("dmb ishst" : : : "memory")
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	switch (sizeof(*p)) {						\
+	case 4:								\
+		asm volatile ("stlr %w1, %0"				\
+				: "=Q" (*p) : "r" (v) : "memory");	\
+		break;							\
+	case 8:								\
+		asm volatile ("stlr %1, %0"				\
+				: "=Q" (*p) : "r" (v) : "memory");	\
+		break;							\
+	}								\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1;						\
+	compiletime_assert_atomic_type(*p);				\
+	switch (sizeof(*p)) {						\
+	case 4:								\
+		asm volatile ("ldar %w0, %1"				\
+			: "=r" (___p1) : "Q" (*p) : "memory");		\
+		break;							\
+	case 8:								\
+		asm volatile ("ldar %0, %1"				\
+			: "=r" (___p1) : "Q" (*p) : "memory");		\
+		break;							\
+	}								\
+	___p1;								\
+})
+
 #endif
 
 #define read_barrier_depends()		do { } while(0)
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index 60576e06b6fb..d0a69aa35e27 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -45,14 +45,37 @@
 # define smp_rmb()	rmb()
 # define smp_wmb()	wmb()
 # define smp_read_barrier_depends()	read_barrier_depends()
+
 #else
+
 # define smp_mb()	barrier()
 # define smp_rmb()	barrier()
 # define smp_wmb()	barrier()
 # define smp_read_barrier_depends()	do { } while(0)
+
 #endif
 
 /*
+ * IA64 GCC turns volatile stores into st.rel and volatile loads into ld.acq no
+ * need for asm trickery!
+ */
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
+/*
  * XXX check on this ---I suspect what Linus really wants here is
  * acquire vs release semantics but we can't discuss this stuff with
  * Linus just yet.  Grrr...
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index e355a4c10968..2d6f0de77325 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -85,4 +85,19 @@ static inline void fence(void)
 #define smp_read_barrier_depends()     do { } while (0)
 #define set_mb(var, value) do { var = value; smp_mb(); } while (0)
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #endif /* _ASM_METAG_BARRIER_H */
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 314ab5532019..52c5b61d7aba 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -180,4 +180,19 @@
 #define nudge_writes() mb()
 #endif
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #endif /* __ASM_BARRIER_H */
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index ae782254e731..f89da808ce31 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -45,11 +45,15 @@
 #    define SMPWMB      eieio
 #endif
 
+#define __lwsync()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+
 #define smp_mb()	mb()
-#define smp_rmb()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+#define smp_rmb()	__lwsync()
 #define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
 #define smp_read_barrier_depends()	read_barrier_depends()
 #else
+#define __lwsync()	barrier()
+
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
@@ -65,4 +69,19 @@
 #define data_barrier(x)	\
 	asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory");
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	__lwsync();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	__lwsync();							\
+	___p1;								\
+})
+
 #endif /* _ASM_POWERPC_BARRIER_H */
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 16760eeb79b0..578680f6207a 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -32,4 +32,19 @@
 
 #define set_mb(var, value)		do { var = value; mb(); } while (0)
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
 #endif /* __ASM_BARRIER_H */
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 95d45986f908..b5aad964558e 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -53,4 +53,19 @@ do {	__asm__ __volatile__("ba,pt	%%xcc, 1f\n\t" \
 
 #define smp_read_barrier_depends()	do { } while(0)
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
 #endif /* !(__SPARC64_BARRIER_H) */
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index c6cd358a1eec..04a48903b2eb 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -92,12 +92,53 @@
 #endif
 #define smp_read_barrier_depends()	read_barrier_depends()
 #define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
-#else
+#else /* !SMP */
 #define smp_mb()	barrier()
 #define smp_rmb()	barrier()
 #define smp_wmb()	barrier()
 #define smp_read_barrier_depends()	do { } while (0)
 #define set_mb(var, value) do { var = value; barrier(); } while (0)
+#endif /* SMP */
+
+#if defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE)
+
+/*
+ * For either of these options x86 doesn't have a strong TSO memory
+ * model and we should fall back to full barriers.
+ */
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
+#else /* regular x86 TSO memory ordering */
+
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	barrier();							\
+	___p1;								\
+})
+
 #endif
 
 /*
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 639d7a4d033b..01613b382b0e 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -46,5 +46,20 @@
 #define read_barrier_depends()		do {} while (0)
 #define smp_read_barrier_depends()	do {} while (0)
 
+#define smp_store_release(p, v)						\
+do {									\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	ACCESS_ONCE(*p) = (v);						\
+} while (0)
+
+#define smp_load_acquire(p)						\
+({									\
+	typeof(*p) ___p1 = ACCESS_ONCE(*p);				\
+	compiletime_assert_atomic_type(*p);				\
+	smp_mb();							\
+	___p1;								\
+})
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index a2329c5e6206..2472740d7ab2 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -302,6 +302,11 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
 # define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
 #endif
 
+/* Is this type a native word size -- useful for atomic operations */
+#ifndef __native_word
+# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
+#endif
+
 /* Compile time object size, -1 for unknown */
 #ifndef __compiletime_object_size
 # define __compiletime_object_size(obj) -1
@@ -341,6 +346,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
 #define compiletime_assert(condition, msg) \
 	_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
 
+#define compiletime_assert_atomic_type(t)				\
+	compiletime_assert(__native_word(t),				\
+		"Need native word sized stores/loads for atomicity.")
+
 /*
  * Prevent the compiler from merging or refetching accesses.  The compiler
  * is also forbidden from reordering successive instances of ACCESS_ONCE(),
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [patch added to the 3.12 stable tree] rcu: Provide counterpart to rcu_dereference() for non-RCU situations
  2015-08-19  6:35 [patch added to the 3.12 stable tree] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels Jiri Slaby
                   ` (2 preceding siblings ...)
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] arch: Introduce smp_load_acquire(), smp_store_release() Jiri Slaby
@ 2015-08-19  6:35 ` Jiri Slaby
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] rcu: Move lockless_dereference() out of rcupdate.h Jiri Slaby
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Jiri Slaby @ 2015-08-19  6:35 UTC (permalink / raw)
  To: stable; +Cc: Paul E. McKenney, Al Viro, Jiri Slaby

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

commit 54ef6df3f3f1353d99c80c437259d317b2cd1cbd upstream.

Although rcu_dereference() and friends can be used in situations where
object lifetimes are being managed by something other than RCU, the
resulting sparse and lockdep-RCU noise can be annoying.  This commit
therefore supplies a lockless_dereference(), which provides the
protection for dereferences without the RCU-related debugging noise.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 include/linux/rcupdate.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index f1f1bc39346b..4912c953cab7 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -554,6 +554,20 @@ static inline void rcu_preempt_sleep_check(void)
 		(p) = (typeof(*v) __force space *)(v); \
 	} while (0)
 
+/**
+ * lockless_dereference() - safely load a pointer for later dereference
+ * @p: The pointer to load
+ *
+ * Similar to rcu_dereference(), but for situations where the pointed-to
+ * object's lifetime is managed by something other than RCU.  That
+ * "something other" might be reference counting or simple immortality.
+ */
+#define lockless_dereference(p) \
+({ \
+	typeof(p) _________p1 = ACCESS_ONCE(p); \
+	smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
+	(_________p1); \
+})
 
 /**
  * rcu_access_pointer() - fetch RCU pointer with no dereferencing
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [patch added to the 3.12 stable tree] rcu: Move lockless_dereference() out of rcupdate.h
  2015-08-19  6:35 [patch added to the 3.12 stable tree] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels Jiri Slaby
                   ` (3 preceding siblings ...)
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] rcu: Provide counterpart to rcu_dereference() for non-RCU situations Jiri Slaby
@ 2015-08-19  6:35 ` Jiri Slaby
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/ldt: Make modify_ldt synchronous Jiri Slaby
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Jiri Slaby @ 2015-08-19  6:35 UTC (permalink / raw)
  To: stable; +Cc: Peter Zijlstra, Paul McKenney, Rusty Russell, Jiri Slaby

From: Peter Zijlstra <peterz@infradead.org>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

commit 0a04b0166929405cd833c1cc40f99e862b965ddc upstream.

I want to use lockless_dereference() from seqlock.h, which would mean
including rcupdate.h from it, however rcupdate.h already includes
seqlock.h.

Avoid this by moving lockless_dereference() into compiler.h. This is
somewhat tricky since it uses smp_read_barrier_depends() which isn't
available there, but its a CPP macro so we can get away with it.

The alternative would be moving it into asm/barrier.h, but that would
be updating each arch (I can do if people feel that is more
appropriate).

Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 include/linux/compiler.h | 15 +++++++++++++++
 include/linux/rcupdate.h | 15 ---------------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 2472740d7ab2..4a3caa61a002 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -364,6 +364,21 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
  */
 #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
 
+/**
+ * lockless_dereference() - safely load a pointer for later dereference
+ * @p: The pointer to load
+ *
+ * Similar to rcu_dereference(), but for situations where the pointed-to
+ * object's lifetime is managed by something other than RCU.  That
+ * "something other" might be reference counting or simple immortality.
+ */
+#define lockless_dereference(p) \
+({ \
+	typeof(p) _________p1 = ACCESS_ONCE(p); \
+	smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
+	(_________p1); \
+})
+
 /* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
 #ifdef CONFIG_KPROBES
 # define __kprobes	__attribute__((__section__(".kprobes.text")))
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 4912c953cab7..965725f957d9 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -555,21 +555,6 @@ static inline void rcu_preempt_sleep_check(void)
 	} while (0)
 
 /**
- * lockless_dereference() - safely load a pointer for later dereference
- * @p: The pointer to load
- *
- * Similar to rcu_dereference(), but for situations where the pointed-to
- * object's lifetime is managed by something other than RCU.  That
- * "something other" might be reference counting or simple immortality.
- */
-#define lockless_dereference(p) \
-({ \
-	typeof(p) _________p1 = ACCESS_ONCE(p); \
-	smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
-	(_________p1); \
-})
-
-/**
  * rcu_access_pointer() - fetch RCU pointer with no dereferencing
  * @p: The pointer to read
  *
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [patch added to the 3.12 stable tree] x86/ldt: Make modify_ldt synchronous
  2015-08-19  6:35 [patch added to the 3.12 stable tree] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels Jiri Slaby
                   ` (4 preceding siblings ...)
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] rcu: Move lockless_dereference() out of rcupdate.h Jiri Slaby
@ 2015-08-19  6:35 ` Jiri Slaby
  2015-08-19 15:49   ` Andy Lutomirski
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/ldt: Correct LDT access in single stepping logic Jiri Slaby
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/ldt: Correct FPU emulation access to LDT Jiri Slaby
  7 siblings, 1 reply; 13+ messages in thread
From: Jiri Slaby @ 2015-08-19  6:35 UTC (permalink / raw)
  To: stable
  Cc: Andy Lutomirski, Andrew Cooper, Andy Lutomirski, Boris Ostrovsky,
	Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin,
	Jan Beulich, Konrad Rzeszutek Wilk, Linus Torvalds,
	Peter Zijlstra, Sasha Levin, Steven Rostedt, Thomas Gleixner,
	Ingo Molnar, Jiri Slaby

From: Andy Lutomirski <luto@kernel.org>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

commit 37868fe113ff2ba814b3b4eb12df214df555f8dc upstream.

modify_ldt() has questionable locking and does not synchronize
threads.  Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

This fixes some fallout from the CVE-2015-5157 fixes.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 arch/x86/include/asm/desc.h        |  15 ---
 arch/x86/include/asm/mmu.h         |   3 +-
 arch/x86/include/asm/mmu_context.h |  48 ++++++-
 arch/x86/kernel/cpu/common.c       |   4 +-
 arch/x86/kernel/cpu/perf_event.c   |  13 +-
 arch/x86/kernel/ldt.c              | 262 ++++++++++++++++++++-----------------
 arch/x86/kernel/process_64.c       |   4 +-
 arch/x86/kernel/step.c             |   6 +-
 arch/x86/power/cpu.c               |   3 +-
 9 files changed, 208 insertions(+), 150 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index f6aaf7d16571..cb69cd0ba8c7 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -280,21 +280,6 @@ static inline void clear_LDT(void)
 	set_ldt(NULL, 0);
 }
 
-/*
- * load one particular LDT into the current CPU
- */
-static inline void load_LDT_nolock(mm_context_t *pc)
-{
-	set_ldt(pc->ldt, pc->size);
-}
-
-static inline void load_LDT(mm_context_t *pc)
-{
-	preempt_disable();
-	load_LDT_nolock(pc);
-	preempt_enable();
-}
-
 static inline unsigned long get_desc_base(const struct desc_struct *desc)
 {
 	return (unsigned)(desc->base0 | ((desc->base1) << 16) | ((desc->base2) << 24));
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 5f55e6962769..926f67263287 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,8 +9,7 @@
  * we put the segment information here.
  */
 typedef struct {
-	void *ldt;
-	int size;
+	struct ldt_struct *ldt;
 
 #ifdef CONFIG_X86_64
 	/* True if mm supports a task running in 32 bit compatibility mode. */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index be12c534fd59..86fef96f4eca 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -16,6 +16,50 @@ static inline void paravirt_activate_mm(struct mm_struct *prev,
 #endif	/* !CONFIG_PARAVIRT */
 
 /*
+ * ldt_structs can be allocated, used, and freed, but they are never
+ * modified while live.
+ */
+struct ldt_struct {
+	/*
+	 * Xen requires page-aligned LDTs with special permissions.  This is
+	 * needed to prevent us from installing evil descriptors such as
+	 * call gates.  On native, we could merge the ldt_struct and LDT
+	 * allocations, but it's not worth trying to optimize.
+	 */
+	struct desc_struct *entries;
+	int size;
+};
+
+static inline void load_mm_ldt(struct mm_struct *mm)
+{
+	struct ldt_struct *ldt;
+
+	/* lockless_dereference synchronizes with smp_store_release */
+	ldt = lockless_dereference(mm->context.ldt);
+
+	/*
+	 * Any change to mm->context.ldt is followed by an IPI to all
+	 * CPUs with the mm active.  The LDT will not be freed until
+	 * after the IPI is handled by all such CPUs.  This means that,
+	 * if the ldt_struct changes before we return, the values we see
+	 * will be safe, and the new values will be loaded before we run
+	 * any user code.
+	 *
+	 * NB: don't try to convert this to use RCU without extreme care.
+	 * We would still need IRQs off, because we don't want to change
+	 * the local LDT after an IPI loaded a newer value than the one
+	 * that we can see.
+	 */
+
+	if (unlikely(ldt))
+		set_ldt(ldt->entries, ldt->size);
+	else
+		clear_LDT();
+
+	DEBUG_LOCKS_WARN_ON(preemptible());
+}
+
+/*
  * Used for LDT copy/destruction.
  */
 int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
@@ -50,7 +94,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 
 		/* Load the LDT, if the LDT is different: */
 		if (unlikely(prev->context.ldt != next->context.ldt))
-			load_LDT_nolock(&next->context);
+			load_mm_ldt(next);
 	}
 #ifdef CONFIG_SMP
 	  else {
@@ -71,7 +115,7 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 			 * to make sure to use no freed page tables.
 			 */
 			load_cr3(next->pgd);
-			load_LDT_nolock(&next->context);
+			load_mm_ldt(next);
 		}
 	}
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 00cc6f79615d..6db4828574ef 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1309,7 +1309,7 @@ void cpu_init(void)
 	load_sp0(t, &current->thread);
 	set_tss_desc(cpu, t);
 	load_TR_desc();
-	load_LDT(&init_mm.context);
+	load_mm_ldt(&init_mm);
 
 	clear_all_debug_regs();
 	dbg_restore_debug_regs();
@@ -1356,7 +1356,7 @@ void cpu_init(void)
 	load_sp0(t, thread);
 	set_tss_desc(cpu, t);
 	load_TR_desc();
-	load_LDT(&init_mm.context);
+	load_mm_ldt(&init_mm);
 
 	t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap);
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index c7106f116fb0..0271272d55d0 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -31,6 +31,7 @@
 #include <asm/nmi.h>
 #include <asm/smp.h>
 #include <asm/alternative.h>
+#include <asm/mmu_context.h>
 #include <asm/timer.h>
 #include <asm/desc.h>
 #include <asm/ldt.h>
@@ -1953,21 +1954,25 @@ static unsigned long get_segment_base(unsigned int segment)
 	int idx = segment >> 3;
 
 	if ((segment & SEGMENT_TI_MASK) == SEGMENT_LDT) {
+		struct ldt_struct *ldt;
+
 		if (idx > LDT_ENTRIES)
 			return 0;
 
-		if (idx > current->active_mm->context.size)
+		/* IRQs are off, so this synchronizes with smp_store_release */
+		ldt = lockless_dereference(current->active_mm->context.ldt);
+		if (!ldt || idx > ldt->size)
 			return 0;
 
-		desc = current->active_mm->context.ldt;
+		desc = &ldt->entries[idx];
 	} else {
 		if (idx > GDT_ENTRIES)
 			return 0;
 
-		desc = __this_cpu_ptr(&gdt_page.gdt[0]);
+		desc = __this_cpu_ptr(&gdt_page.gdt[0]) + idx;
 	}
 
-	return get_desc_base(desc + idx);
+	return get_desc_base(desc);
 }
 
 #ifdef CONFIG_COMPAT
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index c37886d759cc..2bcc0525f1c1 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -12,6 +12,7 @@
 #include <linux/string.h>
 #include <linux/mm.h>
 #include <linux/smp.h>
+#include <linux/slab.h>
 #include <linux/vmalloc.h>
 #include <linux/uaccess.h>
 
@@ -20,82 +21,82 @@
 #include <asm/mmu_context.h>
 #include <asm/syscalls.h>
 
-#ifdef CONFIG_SMP
+/* context.lock is held for us, so we don't need any locking. */
 static void flush_ldt(void *current_mm)
 {
-	if (current->active_mm == current_mm)
-		load_LDT(&current->active_mm->context);
+	mm_context_t *pc;
+
+	if (current->active_mm != current_mm)
+		return;
+
+	pc = &current->active_mm->context;
+	set_ldt(pc->ldt->entries, pc->ldt->size);
 }
-#endif
 
-static int alloc_ldt(mm_context_t *pc, int mincount, int reload)
+/* The caller must call finalize_ldt_struct on the result. LDT starts zeroed. */
+static struct ldt_struct *alloc_ldt_struct(int size)
 {
-	void *oldldt, *newldt;
-	int oldsize;
-
-	if (mincount <= pc->size)
-		return 0;
-	oldsize = pc->size;
-	mincount = (mincount + (PAGE_SIZE / LDT_ENTRY_SIZE - 1)) &
-			(~(PAGE_SIZE / LDT_ENTRY_SIZE - 1));
-	if (mincount * LDT_ENTRY_SIZE > PAGE_SIZE)
-		newldt = vmalloc(mincount * LDT_ENTRY_SIZE);
+	struct ldt_struct *new_ldt;
+	int alloc_size;
+
+	if (size > LDT_ENTRIES)
+		return NULL;
+
+	new_ldt = kmalloc(sizeof(struct ldt_struct), GFP_KERNEL);
+	if (!new_ldt)
+		return NULL;
+
+	BUILD_BUG_ON(LDT_ENTRY_SIZE != sizeof(struct desc_struct));
+	alloc_size = size * LDT_ENTRY_SIZE;
+
+	/*
+	 * Xen is very picky: it requires a page-aligned LDT that has no
+	 * trailing nonzero bytes in any page that contains LDT descriptors.
+	 * Keep it simple: zero the whole allocation and never allocate less
+	 * than PAGE_SIZE.
+	 */
+	if (alloc_size > PAGE_SIZE)
+		new_ldt->entries = vzalloc(alloc_size);
 	else
-		newldt = (void *)__get_free_page(GFP_KERNEL);
-
-	if (!newldt)
-		return -ENOMEM;
+		new_ldt->entries = kzalloc(PAGE_SIZE, GFP_KERNEL);
 
-	if (oldsize)
-		memcpy(newldt, pc->ldt, oldsize * LDT_ENTRY_SIZE);
-	oldldt = pc->ldt;
-	memset(newldt + oldsize * LDT_ENTRY_SIZE, 0,
-	       (mincount - oldsize) * LDT_ENTRY_SIZE);
+	if (!new_ldt->entries) {
+		kfree(new_ldt);
+		return NULL;
+	}
 
-	paravirt_alloc_ldt(newldt, mincount);
+	new_ldt->size = size;
+	return new_ldt;
+}
 
-#ifdef CONFIG_X86_64
-	/* CHECKME: Do we really need this ? */
-	wmb();
-#endif
-	pc->ldt = newldt;
-	wmb();
-	pc->size = mincount;
-	wmb();
-
-	if (reload) {
-#ifdef CONFIG_SMP
-		preempt_disable();
-		load_LDT(pc);
-		if (!cpumask_equal(mm_cpumask(current->mm),
-				   cpumask_of(smp_processor_id())))
-			smp_call_function(flush_ldt, current->mm, 1);
-		preempt_enable();
-#else
-		load_LDT(pc);
-#endif
-	}
-	if (oldsize) {
-		paravirt_free_ldt(oldldt, oldsize);
-		if (oldsize * LDT_ENTRY_SIZE > PAGE_SIZE)
-			vfree(oldldt);
-		else
-			put_page(virt_to_page(oldldt));
-	}
-	return 0;
+/* After calling this, the LDT is immutable. */
+static void finalize_ldt_struct(struct ldt_struct *ldt)
+{
+	paravirt_alloc_ldt(ldt->entries, ldt->size);
 }
 
-static inline int copy_ldt(mm_context_t *new, mm_context_t *old)
+/* context.lock is held */
+static void install_ldt(struct mm_struct *current_mm,
+			struct ldt_struct *ldt)
 {
-	int err = alloc_ldt(new, old->size, 0);
-	int i;
+	/* Synchronizes with lockless_dereference in load_mm_ldt. */
+	smp_store_release(&current_mm->context.ldt, ldt);
+
+	/* Activate the LDT for all CPUs using current_mm. */
+	on_each_cpu_mask(mm_cpumask(current_mm), flush_ldt, current_mm, true);
+}
 
-	if (err < 0)
-		return err;
+static void free_ldt_struct(struct ldt_struct *ldt)
+{
+	if (likely(!ldt))
+		return;
 
-	for (i = 0; i < old->size; i++)
-		write_ldt_entry(new->ldt, i, old->ldt + i * LDT_ENTRY_SIZE);
-	return 0;
+	paravirt_free_ldt(ldt->entries, ldt->size);
+	if (ldt->size * LDT_ENTRY_SIZE > PAGE_SIZE)
+		vfree(ldt->entries);
+	else
+		kfree(ldt->entries);
+	kfree(ldt);
 }
 
 /*
@@ -104,17 +105,37 @@ static inline int copy_ldt(mm_context_t *new, mm_context_t *old)
  */
 int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 {
+	struct ldt_struct *new_ldt;
 	struct mm_struct *old_mm;
 	int retval = 0;
 
 	mutex_init(&mm->context.lock);
-	mm->context.size = 0;
 	old_mm = current->mm;
-	if (old_mm && old_mm->context.size > 0) {
-		mutex_lock(&old_mm->context.lock);
-		retval = copy_ldt(&mm->context, &old_mm->context);
-		mutex_unlock(&old_mm->context.lock);
+	if (!old_mm) {
+		mm->context.ldt = NULL;
+		return 0;
 	}
+
+	mutex_lock(&old_mm->context.lock);
+	if (!old_mm->context.ldt) {
+		mm->context.ldt = NULL;
+		goto out_unlock;
+	}
+
+	new_ldt = alloc_ldt_struct(old_mm->context.ldt->size);
+	if (!new_ldt) {
+		retval = -ENOMEM;
+		goto out_unlock;
+	}
+
+	memcpy(new_ldt->entries, old_mm->context.ldt->entries,
+	       new_ldt->size * LDT_ENTRY_SIZE);
+	finalize_ldt_struct(new_ldt);
+
+	mm->context.ldt = new_ldt;
+
+out_unlock:
+	mutex_unlock(&old_mm->context.lock);
 	return retval;
 }
 
@@ -125,53 +146,47 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
  */
 void destroy_context(struct mm_struct *mm)
 {
-	if (mm->context.size) {
-#ifdef CONFIG_X86_32
-		/* CHECKME: Can this ever happen ? */
-		if (mm == current->active_mm)
-			clear_LDT();
-#endif
-		paravirt_free_ldt(mm->context.ldt, mm->context.size);
-		if (mm->context.size * LDT_ENTRY_SIZE > PAGE_SIZE)
-			vfree(mm->context.ldt);
-		else
-			put_page(virt_to_page(mm->context.ldt));
-		mm->context.size = 0;
-	}
+	free_ldt_struct(mm->context.ldt);
+	mm->context.ldt = NULL;
 }
 
 static int read_ldt(void __user *ptr, unsigned long bytecount)
 {
-	int err;
+	int retval;
 	unsigned long size;
 	struct mm_struct *mm = current->mm;
 
-	if (!mm->context.size)
-		return 0;
+	mutex_lock(&mm->context.lock);
+
+	if (!mm->context.ldt) {
+		retval = 0;
+		goto out_unlock;
+	}
+
 	if (bytecount > LDT_ENTRY_SIZE * LDT_ENTRIES)
 		bytecount = LDT_ENTRY_SIZE * LDT_ENTRIES;
 
-	mutex_lock(&mm->context.lock);
-	size = mm->context.size * LDT_ENTRY_SIZE;
+	size = mm->context.ldt->size * LDT_ENTRY_SIZE;
 	if (size > bytecount)
 		size = bytecount;
 
-	err = 0;
-	if (copy_to_user(ptr, mm->context.ldt, size))
-		err = -EFAULT;
-	mutex_unlock(&mm->context.lock);
-	if (err < 0)
-		goto error_return;
+	if (copy_to_user(ptr, mm->context.ldt->entries, size)) {
+		retval = -EFAULT;
+		goto out_unlock;
+	}
+
 	if (size != bytecount) {
-		/* zero-fill the rest */
-		if (clear_user(ptr + size, bytecount - size) != 0) {
-			err = -EFAULT;
-			goto error_return;
+		/* Zero-fill the rest and pretend we read bytecount bytes. */
+		if (clear_user(ptr + size, bytecount - size)) {
+			retval = -EFAULT;
+			goto out_unlock;
 		}
 	}
-	return bytecount;
-error_return:
-	return err;
+	retval = bytecount;
+
+out_unlock:
+	mutex_unlock(&mm->context.lock);
+	return retval;
 }
 
 static int read_default_ldt(void __user *ptr, unsigned long bytecount)
@@ -195,6 +210,8 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 	struct desc_struct ldt;
 	int error;
 	struct user_desc ldt_info;
+	int oldsize, newsize;
+	struct ldt_struct *new_ldt, *old_ldt;
 
 	error = -EINVAL;
 	if (bytecount != sizeof(ldt_info))
@@ -213,34 +230,39 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode)
 			goto out;
 	}
 
-	mutex_lock(&mm->context.lock);
-	if (ldt_info.entry_number >= mm->context.size) {
-		error = alloc_ldt(&current->mm->context,
-				  ldt_info.entry_number + 1, 1);
-		if (error < 0)
-			goto out_unlock;
-	}
-
-	/* Allow LDTs to be cleared by the user. */
-	if (ldt_info.base_addr == 0 && ldt_info.limit == 0) {
-		if (oldmode || LDT_empty(&ldt_info)) {
-			memset(&ldt, 0, sizeof(ldt));
-			goto install;
+	if ((oldmode && !ldt_info.base_addr && !ldt_info.limit) ||
+	    LDT_empty(&ldt_info)) {
+		/* The user wants to clear the entry. */
+		memset(&ldt, 0, sizeof(ldt));
+	} else {
+		if (!IS_ENABLED(CONFIG_X86_16BIT) && !ldt_info.seg_32bit) {
+			error = -EINVAL;
+			goto out;
 		}
+
+		fill_ldt(&ldt, &ldt_info);
+		if (oldmode)
+			ldt.avl = 0;
 	}
 
-	if (!IS_ENABLED(CONFIG_X86_16BIT) && !ldt_info.seg_32bit) {
-		error = -EINVAL;
+	mutex_lock(&mm->context.lock);
+
+	old_ldt = mm->context.ldt;
+	oldsize = old_ldt ? old_ldt->size : 0;
+	newsize = max((int)(ldt_info.entry_number + 1), oldsize);
+
+	error = -ENOMEM;
+	new_ldt = alloc_ldt_struct(newsize);
+	if (!new_ldt)
 		goto out_unlock;
-	}
 
-	fill_ldt(&ldt, &ldt_info);
-	if (oldmode)
-		ldt.avl = 0;
+	if (old_ldt)
+		memcpy(new_ldt->entries, old_ldt->entries, oldsize * LDT_ENTRY_SIZE);
+	new_ldt->entries[ldt_info.entry_number] = ldt;
+	finalize_ldt_struct(new_ldt);
 
-	/* Install the new entry ...  */
-install:
-	write_ldt_entry(mm->context.ldt, ldt_info.entry_number, &ldt);
+	install_ldt(mm, new_ldt);
+	free_ldt_struct(old_ldt);
 	error = 0;
 
 out_unlock:
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 8e9fe8dfd37b..f99825ea4f96 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -122,11 +122,11 @@ void __show_regs(struct pt_regs *regs, int all)
 void release_thread(struct task_struct *dead_task)
 {
 	if (dead_task->mm) {
-		if (dead_task->mm->context.size) {
+		if (dead_task->mm->context.ldt) {
 			pr_warn("WARNING: dead process %s still has LDT? <%p/%d>\n",
 				dead_task->comm,
 				dead_task->mm->context.ldt,
-				dead_task->mm->context.size);
+				dead_task->mm->context.ldt->size);
 			BUG();
 		}
 	}
diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 9b4d51d0c0d0..6273324186ac 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -5,6 +5,7 @@
 #include <linux/mm.h>
 #include <linux/ptrace.h>
 #include <asm/desc.h>
+#include <asm/mmu_context.h>
 
 unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *regs)
 {
@@ -30,10 +31,11 @@ unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *re
 		seg &= ~7UL;
 
 		mutex_lock(&child->mm->context.lock);
-		if (unlikely((seg >> 3) >= child->mm->context.size))
+		if (unlikely(!child->mm->context.ldt ||
+			     (seg >> 3) >= child->mm->context.ldt->size))
 			addr = -1L; /* bogus selector, access would fault */
 		else {
-			desc = child->mm->context.ldt + seg;
+			desc = &child->mm->context.ldt->entries[seg];
 			base = get_desc_base(desc);
 
 			/* 16-bit code segment? */
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 424f4c97a44d..6c8e2f5ce056 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -23,6 +23,7 @@
 #include <asm/debugreg.h>
 #include <asm/fpu-internal.h> /* pcntxt_mask */
 #include <asm/cpu.h>
+#include <asm/mmu_context.h>
 
 #ifdef CONFIG_X86_32
 __visible unsigned long saved_context_ebx;
@@ -157,7 +158,7 @@ static void fix_processor_context(void)
 	syscall_init();				/* This sets MSR_*STAR and related */
 #endif
 	load_TR_desc();				/* This does ltr */
-	load_LDT(&current->active_mm->context);	/* This does lldt */
+	load_mm_ldt(current->active_mm);	/* This does lldt */
 }
 
 /**
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [patch added to the 3.12 stable tree] x86/ldt: Correct LDT access in single stepping logic
  2015-08-19  6:35 [patch added to the 3.12 stable tree] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels Jiri Slaby
                   ` (5 preceding siblings ...)
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/ldt: Make modify_ldt synchronous Jiri Slaby
@ 2015-08-19  6:35 ` Jiri Slaby
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/ldt: Correct FPU emulation access to LDT Jiri Slaby
  7 siblings, 0 replies; 13+ messages in thread
From: Jiri Slaby @ 2015-08-19  6:35 UTC (permalink / raw)
  To: stable
  Cc: Juergen Gross, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	bp, Ingo Molnar, Jiri Slaby

From: Juergen Gross <jgross@suse.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

commit 136d9d83c07c5e30ac49fc83b27e8c4842f108fc upstream.

Commit 37868fe113ff ("x86/ldt: Make modify_ldt synchronous")
introduced a new struct ldt_struct anchored at mm->context.ldt.

convert_ip_to_linear() was changed to reflect this, but indexing
into the ldt has to be changed as the pointer is no longer void *.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/1438848278-12906-1-git-send-email-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 arch/x86/kernel/step.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/step.c b/arch/x86/kernel/step.c
index 6273324186ac..0ccb53a9fcd9 100644
--- a/arch/x86/kernel/step.c
+++ b/arch/x86/kernel/step.c
@@ -28,11 +28,11 @@ unsigned long convert_ip_to_linear(struct task_struct *child, struct pt_regs *re
 		struct desc_struct *desc;
 		unsigned long base;
 
-		seg &= ~7UL;
+		seg >>= 3;
 
 		mutex_lock(&child->mm->context.lock);
 		if (unlikely(!child->mm->context.ldt ||
-			     (seg >> 3) >= child->mm->context.ldt->size))
+			     seg >= child->mm->context.ldt->size))
 			addr = -1L; /* bogus selector, access would fault */
 		else {
 			desc = &child->mm->context.ldt->entries[seg];
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [patch added to the 3.12 stable tree] x86/ldt: Correct FPU emulation access to LDT
  2015-08-19  6:35 [patch added to the 3.12 stable tree] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels Jiri Slaby
                   ` (6 preceding siblings ...)
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/ldt: Correct LDT access in single stepping logic Jiri Slaby
@ 2015-08-19  6:35 ` Jiri Slaby
  7 siblings, 0 replies; 13+ messages in thread
From: Jiri Slaby @ 2015-08-19  6:35 UTC (permalink / raw)
  To: stable
  Cc: Juergen Gross, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	billm, Ingo Molnar, Jiri Slaby

From: Juergen Gross <jgross@suse.com>

This patch has been added to the 3.12 stable tree. If you have any
objections, please let us know.

===============

commit 4809146b86c3d41ce588fdb767d021e2a80600dd upstream.

Commit 37868fe113ff ("x86/ldt: Make modify_ldt synchronous")
introduced a new struct ldt_struct anchored at mm->context.ldt.

Adapt the x86 fpu emulation code to use that new structure.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: billm@melbpc.org.au
Link: http://lkml.kernel.org/r/1438883674-1240-1-git-send-email-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 arch/x86/math-emu/fpu_entry.c   |  3 +--
 arch/x86/math-emu/fpu_system.h  | 21 ++++++++++++++++++---
 arch/x86/math-emu/get_address.c |  3 +--
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 9b868124128d..274a52b1183e 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -29,7 +29,6 @@
 
 #include <asm/uaccess.h>
 #include <asm/traps.h>
-#include <asm/desc.h>
 #include <asm/user.h>
 #include <asm/i387.h>
 
@@ -185,7 +184,7 @@ void math_emulate(struct math_emu_info *info)
 			math_abort(FPU_info, SIGILL);
 		}
 
-		code_descriptor = LDT_DESCRIPTOR(FPU_CS);
+		code_descriptor = FPU_get_ldt_descriptor(FPU_CS);
 		if (SEG_D_SIZE(code_descriptor)) {
 			/* The above test may be wrong, the book is not clear */
 			/* Segmented 32 bit protected mode */
diff --git a/arch/x86/math-emu/fpu_system.h b/arch/x86/math-emu/fpu_system.h
index 2c614410a5f3..d342fce49447 100644
--- a/arch/x86/math-emu/fpu_system.h
+++ b/arch/x86/math-emu/fpu_system.h
@@ -16,9 +16,24 @@
 #include <linux/kernel.h>
 #include <linux/mm.h>
 
-/* s is always from a cpu register, and the cpu does bounds checking
- * during register load --> no further bounds checks needed */
-#define LDT_DESCRIPTOR(s)	(((struct desc_struct *)current->mm->context.ldt)[(s) >> 3])
+#include <asm/desc.h>
+#include <asm/mmu_context.h>
+
+static inline struct desc_struct FPU_get_ldt_descriptor(unsigned seg)
+{
+	static struct desc_struct zero_desc;
+	struct desc_struct ret = zero_desc;
+
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+	seg >>= 3;
+	mutex_lock(&current->mm->context.lock);
+	if (current->mm->context.ldt && seg < current->mm->context.ldt->size)
+		ret = current->mm->context.ldt->entries[seg];
+	mutex_unlock(&current->mm->context.lock);
+#endif
+	return ret;
+}
+
 #define SEG_D_SIZE(x)		((x).b & (3 << 21))
 #define SEG_G_BIT(x)		((x).b & (1 << 23))
 #define SEG_GRANULARITY(x)	(((x).b & (1 << 23)) ? 4096 : 1)
diff --git a/arch/x86/math-emu/get_address.c b/arch/x86/math-emu/get_address.c
index 6ef5e99380f9..d13cab2aec45 100644
--- a/arch/x86/math-emu/get_address.c
+++ b/arch/x86/math-emu/get_address.c
@@ -20,7 +20,6 @@
 #include <linux/stddef.h>
 
 #include <asm/uaccess.h>
-#include <asm/desc.h>
 
 #include "fpu_system.h"
 #include "exception.h"
@@ -158,7 +157,7 @@ static long pm_address(u_char FPU_modrm, u_char segment,
 		addr->selector = PM_REG_(segment);
 	}
 
-	descriptor = LDT_DESCRIPTOR(PM_REG_(segment));
+	descriptor = FPU_get_ldt_descriptor(segment);
 	base_address = SEG_BASE_ADDR(descriptor);
 	address = base_address + offset;
 	limit = base_address
-- 
2.5.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [patch added to the 3.12 stable tree] x86/ldt: Make modify_ldt synchronous
  2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/ldt: Make modify_ldt synchronous Jiri Slaby
@ 2015-08-19 15:49   ` Andy Lutomirski
  2015-08-19 16:04     ` Jiri Slaby
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2015-08-19 15:49 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: stable, Andy Lutomirski, Andrew Cooper, Boris Ostrovsky,
	Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin,
	Jan Beulich, Konrad Rzeszutek Wilk, Linus Torvalds,
	Peter Zijlstra, Sasha Levin, Steven Rostedt, Thomas Gleixner,
	Ingo Molnar

On Tue, Aug 18, 2015 at 11:35 PM, Jiri Slaby <jslaby@suse.cz> wrote:
> From: Andy Lutomirski <luto@kernel.org>
>
> This patch has been added to the 3.12 stable tree. If you have any
> objections, please let us know.

Do you have the followups:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4809146b86c3
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=136d9d83c07c5e30ac49fc83b27e8c4842f108fc
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=12e244f4b550498bbaf654a52f93633f7dde2dc7

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch added to the 3.12 stable tree] x86/ldt: Make modify_ldt synchronous
  2015-08-19 15:49   ` Andy Lutomirski
@ 2015-08-19 16:04     ` Jiri Slaby
  2015-08-19 17:48       ` Andy Lutomirski
  0 siblings, 1 reply; 13+ messages in thread
From: Jiri Slaby @ 2015-08-19 16:04 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: stable, Andy Lutomirski, Andrew Cooper, Boris Ostrovsky,
	Borislav Petkov, Brian Gerst, Denys Vlasenko, H. Peter Anvin,
	Jan Beulich, Konrad Rzeszutek Wilk, Linus Torvalds,
	Peter Zijlstra, Sasha Levin, Steven Rostedt, Thomas Gleixner,
	Ingo Molnar

On 08/19/2015, 05:49 PM, Andy Lutomirski wrote:
> On Tue, Aug 18, 2015 at 11:35 PM, Jiri Slaby <jslaby@suse.cz> wrote:
>> From: Andy Lutomirski <luto@kernel.org>
>>
>> This patch has been added to the 3.12 stable tree. If you have any
>> objections, please let us know.
> 
> Do you have the followups:
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4809146b86c3
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=136d9d83c07c5e30ac49fc83b27e8c4842f108fc

Yes.

> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=12e244f4b550498bbaf654a52f93633f7dde2dc7

And no -- applied.

Thanks for the heads-up!

I think I will postpone all these a bit as XEN is not fixed in upstream yet.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch added to the 3.12 stable tree] x86/ldt: Make modify_ldt synchronous
  2015-08-19 16:04     ` Jiri Slaby
@ 2015-08-19 17:48       ` Andy Lutomirski
  2015-08-20  6:38         ` Jiri Slaby
  0 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2015-08-19 17:48 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Thomas Gleixner, Denys Vlasenko, Borislav Petkov, stable,
	Andrew Cooper, Ingo Molnar, Steven Rostedt, Konrad Rzeszutek Wilk,
	Brian Gerst, Linus Torvalds, Boris Ostrovsky, H. Peter Anvin,
	Sasha Levin, Peter Zijlstra, Jan Beulich

On Aug 19, 2015 9:04 AM, "Jiri Slaby" <jslaby@suse.cz> wrote:
>
> On 08/19/2015, 05:49 PM, Andy Lutomirski wrote:
> > On Tue, Aug 18, 2015 at 11:35 PM, Jiri Slaby <jslaby@suse.cz> wrote:
> >> From: Andy Lutomirski <luto@kernel.org>
> >>
> >> This patch has been added to the 3.12 stable tree. If you have any
> >> objections, please let us know.
> >
> > Do you have the followups:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4809146b86c3
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=136d9d83c07c5e30ac49fc83b27e8c4842f108fc
>
> Yes.
>
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=12e244f4b550498bbaf654a52f93633f7dde2dc7
>
> And no -- applied.
>
> Thanks for the heads-up!
>
> I think I will postpone all these a bit as XEN is not fixed in upstream yet.
>

I thought Xen was fixed.  What's the issue?  The only problem I know
of is fixed by:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=aa1acff356bbedfd03b544051f5b37

--Andy

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch added to the 3.12 stable tree] x86/ldt: Make modify_ldt synchronous
  2015-08-19 17:48       ` Andy Lutomirski
@ 2015-08-20  6:38         ` Jiri Slaby
  0 siblings, 0 replies; 13+ messages in thread
From: Jiri Slaby @ 2015-08-20  6:38 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Denys Vlasenko, Borislav Petkov, stable,
	Andrew Cooper, Ingo Molnar, Steven Rostedt, Konrad Rzeszutek Wilk,
	Brian Gerst, Linus Torvalds, Boris Ostrovsky, H. Peter Anvin,
	Sasha Levin, Peter Zijlstra, Jan Beulich

On 08/19/2015, 07:48 PM, Andy Lutomirski wrote:
> I thought Xen was fixed.  What's the issue?  The only problem I know
> of is fixed by:
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=aa1acff356bbedfd03b544051f5b37

Oh yes! I failed to find it because:
Probe target addresses in set_aliased_prot before the hypercall
Probe target addresses in set_aliased_prot() before the hypercall

The former is the mail you sent, the latter is the commit.

thank you,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-08-20  6:38 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-19  6:35 [patch added to the 3.12 stable tree] x86/nmi: Enable nested do_nmi() handling for 64-bit kernels Jiri Slaby
2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/nmi/64: Remove asm code that saves CR2 Jiri Slaby
2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/nmi/64: Switch stacks on userspace NMI entry Jiri Slaby
2015-08-19  6:35 ` [patch added to the 3.12 stable tree] arch: Introduce smp_load_acquire(), smp_store_release() Jiri Slaby
2015-08-19  6:35 ` [patch added to the 3.12 stable tree] rcu: Provide counterpart to rcu_dereference() for non-RCU situations Jiri Slaby
2015-08-19  6:35 ` [patch added to the 3.12 stable tree] rcu: Move lockless_dereference() out of rcupdate.h Jiri Slaby
2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/ldt: Make modify_ldt synchronous Jiri Slaby
2015-08-19 15:49   ` Andy Lutomirski
2015-08-19 16:04     ` Jiri Slaby
2015-08-19 17:48       ` Andy Lutomirski
2015-08-20  6:38         ` Jiri Slaby
2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/ldt: Correct LDT access in single stepping logic Jiri Slaby
2015-08-19  6:35 ` [patch added to the 3.12 stable tree] x86/ldt: Correct FPU emulation access to LDT Jiri Slaby

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).