public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Andi Kleen <andi@firstfloor.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Steven Rostedt <srostedt@redhat.com>,
	Jeremy Fitzhardinge <jeremy@goop.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Miller <davem@davemloft.net>,
	Roland McGrath <roland@redhat.com>,
	Ulrich Drepper <drepper@redhat.com>,
	Rusty Russell <rusty@rustcorp.com.au>,
	Gregory Haskins <ghaskins@novell.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	"Luis Claudio R. Goncalves" <lclaudio@uudg.org>,
	Clark Williams <williams@redhat.com>,
	Christoph Lameter <cl@linux-foundation.org>
Subject: [RFC PATCH] x86 alternatives : fix LOCK_PREFIX race with preemptible kernel and CPU hotplug
Date: Wed, 13 Aug 2008 19:41:56 -0400	[thread overview]
Message-ID: <20080813234156.GA25775@Krystal> (raw)
In-Reply-To: <20080813200119.GA18966@Krystal>

If a kernel thread is preempted in single-cpu mode right after the NOP (nop
about to be turned into a lock prefix), then we CPU hotplug a CPU, and then the
thread is scheduled back again, a SMP-unsafe atomic operation will be used on
shared SMP variables, leading to corruption. No corruption would happen in the
reverse case : going from SMP to UP is ok because we split a bit instruction
into tiny pieces, which does not present this condition.

Changing the 0x90 (single-byte nop) currently used into a 0x3E DS segment
override prefix should fix this issue. Since the default of the atomic
instructions is to use the DS segment anyway, it should not affect the
behavior.

** BIG FAT WARNING (tm) ** : this patch assumes that the 0x3E prefix will
leave atomic operations as-is (thus assuming they normally touch data in the DS
segment). Is there an obnoxious corner-case I would have missed ?

This source :
AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System
Instructions
States
"Instructions that Reference a Non-Stack Segment—If an instruction encoding
references any base register other than rBP or rSP, or if an instruction
contains an immediate offset, the default segment is the data segment (DS).
These instructions can use the segment-override prefix to select one of the
non-default segments, as shown in Table 1-5."

Therefore, forcing the DS segment on the atomic operations, which already use
the DS segment, should not change.

This source :
http://wiki.osdev.org/X86_Instruction_Encoding
States
"In 64-bit the CS, SS, DS and ES segment overrides are ignored."
(since it's a wiki, it would have to be confirmed, but at least the worse that
happens is that the DS override is ignored)

This patch applies to 2.6.27-rc2, but would also have to be applied to earlier
kernels (2.6.26, 2.6.25, ...).

Performance impact of the fix : tests done on "xaddq" and "xaddl" shows it
actually improves performances on Intel Xeon, AMD64, Pentium M. It does not
change the performance on Pentium 3 and Pentium 4.

Xeon E5405 2.0GHz :
NR_TESTS                                    10000000
test empty cycles :                        162207948
test test 1-byte nop xadd cycles :         170755422
test test DS override prefix xadd cycles : 170000118 *
test test LOCK xadd cycles :               472012134

AMD64 2.0GHz :
NR_TESTS                                    10000000
test empty cycles :                        146674549
test test 1-byte nop xadd cycles :         150273860
test test DS override prefix xadd cycles : 149982382 *
test test LOCK xadd cycles :               270000690

Pentium 4 3.0GHz
NR_TESTS                                    10000000
test empty cycles :                        290001195
test test 1-byte nop xadd cycles :         310000560
test test DS override prefix xadd cycles : 310000575 *
test test LOCK xadd cycles :              1050103740

Pentium M 2.0GHz
NR_TESTS 10000000
test empty cycles :                        180000523
test test 1-byte nop xadd cycles :         320000345
test test DS override prefix xadd cycles : 310000374 *
test test LOCK xadd cycles :               480000357

Pentium 3 550MHz
NR_TESTS                                    10000000
test empty cycles :                        510000231
test test 1-byte nop xadd cycles :         620000128
test test DS override prefix xadd cycles : 620000110 *
test test LOCK xadd cycles :               800000088

Speed test modules can be found at 
http://ltt.polymtl.ca/svn/trunk/tests/kernel/test-prefix-speed-32.c
http://ltt.polymtl.ca/svn/trunk/tests/kernel/test-prefix-speed.c

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andi Kleen <andi@firstfloor.org>
CC: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
CC: Steven Rostedt <srostedt@redhat.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: David Miller <davem@davemloft.net>
CC: Roland McGrath <roland@redhat.com>
CC: Ulrich Drepper <drepper@redhat.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Gregory Haskins <ghaskins@novell.com>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
CC: Clark Williams <williams@redhat.com>
CC: Christoph Lameter <cl@linux-foundation.org>
---
 arch/x86/kernel/alternative.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: linux-2.6-lttng/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/alternative.c	2008-08-13 18:42:47.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kernel/alternative.c	2008-08-13 18:54:36.000000000 -0400
@@ -241,25 +241,25 @@ static void alternatives_smp_lock(u8 **s
 			continue;
 		if (*ptr > text_end)
 			continue;
-		text_poke(*ptr, ((unsigned char []){0xf0}), 1); /* add lock prefix */
+		/* turn DS segment override prefix into lock prefix */
+		text_poke(*ptr, ((unsigned char []){0xf0}), 1);
 	};
 }
 
 static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end)
 {
 	u8 **ptr;
-	char insn[1];
 
 	if (noreplace_smp)
 		return;
 
-	add_nops(insn, 1);
 	for (ptr = start; ptr < end; ptr++) {
 		if (*ptr < text)
 			continue;
 		if (*ptr > text_end)
 			continue;
-		text_poke(*ptr, insn, 1);
+		/* turn lock prefix into DS segment override prefix */
+		text_poke(*ptr, ((unsigned char []){0x3E}), 1);
 	};
 }
 
-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2008-08-13 23:47 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-08-07 18:20 [PATCH 0/5] ftrace: to kill a daemon Steven Rostedt
2008-08-07 18:20 ` [PATCH 1/5] ftrace: create __mcount_loc section Steven Rostedt
2008-08-07 18:20 ` [PATCH 2/5] ftrace: mcount call site on boot nops core Steven Rostedt
2008-08-07 18:20 ` [PATCH 3/5] ftrace: enable mcount recording for modules Steven Rostedt
2008-08-08  6:43   ` Rusty Russell
2008-08-08 12:51     ` Steven Rostedt
2008-08-07 18:20 ` [PATCH 4/5] ftrace: rebuild everything on change to FTRACE_MCOUNT_RECORD Steven Rostedt
2008-08-07 18:20 ` [PATCH 5/5] ftrace: enable using mcount recording on x86 Steven Rostedt
2008-08-07 18:47 ` [PATCH 0/5] ftrace: to kill a daemon Mathieu Desnoyers
2008-08-07 20:42   ` Steven Rostedt
2008-08-08 17:22     ` Mathieu Desnoyers
2008-08-08 17:36       ` Steven Rostedt
2008-08-08 17:46         ` Mathieu Desnoyers
2008-08-08 18:13           ` Steven Rostedt
2008-08-08 18:15             ` Peter Zijlstra
2008-08-08 18:21             ` Mathieu Desnoyers
2008-08-08 18:41               ` Steven Rostedt
2008-08-08 19:04                 ` Linus Torvalds
2008-08-08 19:05                 ` Mathieu Desnoyers
2008-08-08 23:38                   ` Steven Rostedt
2008-08-09  0:23                     ` Andi Kleen
2008-08-09  0:36                       ` Steven Rostedt
2008-08-09  0:47                         ` Jeremy Fitzhardinge
2008-08-09  0:51                           ` Linus Torvalds
2008-08-09  1:25                             ` Steven Rostedt
2008-08-13  6:31                               ` Mathieu Desnoyers
2008-08-13 15:38                                 ` Mathieu Desnoyers
2008-08-13 17:52                               ` Efficient x86 and x86_64 NOP microbenchmarks Mathieu Desnoyers
2008-08-13 18:27                                 ` Linus Torvalds
2008-08-13 18:41                                   ` Andi Kleen
2008-08-13 18:45                                     ` Avi Kivity
2008-08-13 18:51                                       ` Andi Kleen
2008-08-13 18:56                                         ` Avi Kivity
2008-08-13 19:30                                     ` Mathieu Desnoyers
2008-08-13 19:37                                       ` Andi Kleen
2008-08-13 20:01                                         ` Mathieu Desnoyers
2008-08-13 23:41                                           ` Mathieu Desnoyers [this message]
2008-08-14  0:01                                             ` [RFC PATCH] x86 alternatives : fix LOCK_PREFIX race with preemptible kernel and CPU hotplug H. Peter Anvin
2008-08-14  1:13                                               ` Mathieu Desnoyers
2008-08-14  1:22                                               ` Jeremy Fitzhardinge
2008-08-14  1:26                                                 ` Roland McGrath
2008-08-14  1:49                                                 ` Mathieu Desnoyers
2008-08-14  3:35                                                   ` Jeremy Fitzhardinge
2008-08-14 15:18                                                     ` Mathieu Desnoyers
2008-08-14 16:10                                                       ` Linus Torvalds
2008-08-14 16:13                                                       ` H. Peter Anvin
2008-08-14 16:58                                                         ` Mathieu Desnoyers
2008-08-14 17:05                                                           ` Jeremy Fitzhardinge
2008-08-14 17:30                                                             ` Mathieu Desnoyers
2008-08-14 17:43                                                               ` Jeremy Fitzhardinge
2008-08-14 18:37                                                                 ` H. Peter Anvin
2008-08-14 18:53                                                                   ` Mathieu Desnoyers
2008-08-14 19:29                                                                     ` Jeremy Fitzhardinge
2008-08-14 20:31                                                                       ` Mathieu Desnoyers
2008-08-14 20:39                                                                         ` H. Peter Anvin
2008-08-14 21:46                                                                         ` Jeremy Fitzhardinge
2008-08-14 22:26                                                                           ` H. Peter Anvin
2008-08-14 17:17                                                           ` H. Peter Anvin
2008-08-14 18:09                                                             ` Mathieu Desnoyers
2008-08-14 19:49                                                             ` Mathieu Desnoyers
2008-08-14 17:04                                                       ` Jeremy Fitzhardinge
2008-08-14 17:18                                                         ` H. Peter Anvin
2008-08-14 17:28                                                           ` Jeremy Fitzhardinge
2008-08-14 17:31                                                             ` H. Peter Anvin
2008-08-14 17:46                                                           ` Mathieu Desnoyers
2008-08-14 17:49                                                             ` Jeremy Fitzhardinge
2008-08-14 17:55                                                               ` Mathieu Desnoyers
2008-08-14 18:59                                                                 ` Gregory Haskins
2008-08-15 21:34                                         ` Efficient x86 and x86_64 NOP microbenchmarks Steven Rostedt
2008-08-15 21:51                                           ` Andi Kleen
2008-08-13 19:16                                   ` Mathieu Desnoyers
2008-08-09  0:51                           ` [PATCH 0/5] ftrace: to kill a daemon Steven Rostedt
2008-08-09  0:53                         ` Roland McGrath
2008-08-09  1:13                           ` Andi Kleen
2008-08-09  1:19                         ` Andi Kleen
2008-08-09  1:30                           ` Steven Rostedt
2008-08-09  1:55                             ` Andi Kleen
2008-08-09  2:03                               ` Steven Rostedt
2008-08-09  2:23                                 ` Andi Kleen
2008-08-09  4:12                           ` Steven Rostedt
2008-08-09  0:30                     ` Steven Rostedt
2008-08-11 18:21                       ` Mathieu Desnoyers
2008-08-11 19:28                         ` Steven Rostedt
2008-08-08 19:08                 ` Jeremy Fitzhardinge
2008-08-11  2:41                 ` Rusty Russell
2008-08-11 12:33                   ` Steven Rostedt
2008-08-07 21:11 ` Jeremy Fitzhardinge
2008-08-07 21:29   ` Steven Rostedt
2008-08-07 22:26     ` Roland McGrath
2008-08-08  1:21       ` Steven Rostedt
2008-08-08  1:24         ` Steven Rostedt
2008-08-08  1:56         ` Steven Rostedt
2008-08-08  7:22         ` Peter Zijlstra
2008-08-08 11:31           ` Steven Rostedt
2008-08-08  4:54       ` Sam Ravnborg
2008-08-09  9:48 ` Abhishek Sagar
2008-08-09 13:01   ` Steven Rostedt
2008-08-09 15:01     ` Abhishek Sagar
2008-08-09 15:37       ` Steven Rostedt
2008-08-09 17:14         ` Abhishek Sagar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080813234156.GA25775@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=acme@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=cl@linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=drepper@redhat.com \
    --cc=ghaskins@novell.com \
    --cc=jeremy@goop.org \
    --cc=lclaudio@uudg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=roland@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=rusty@rustcorp.com.au \
    --cc=srostedt@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=williams@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox