public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/5] x86-32: improve atomic64_t functions (v2)
@ 2010-02-19 17:26 Luca Barbieri
  2010-02-19 17:26 ` [PATCH 1/5] x86: add support for relative CALL and JMP in alternatives (v2) Luca Barbieri
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Luca Barbieri @ 2010-02-19 17:26 UTC (permalink / raw)
  To: mingo; +Cc: hpa, a.p.zijlstra, akpm, linux-kernel, Luca Barbieri

Changes in v2:
- 386/486 is supported with a custom assembly implementation; the generic
  implementation is no longer used or modified
- dropped SSE code
- changed CALL alternative code to use a custom alternative type:
  insn parser no longer used
- several implementation improvements
- several formatting/style improvements
- merged 386 support into main patch

This patchset improves the atomic64_t functions on x86-32.
It also includes a testsuite that has been used to test this functionality
and can test any atomic64_t implementation.

It offers the following improvements:
1. Better code due to hand-written assembly (e.g. use of the ZF flag)
2. All atomic64 functions implemented
3. Support for 386/486 due to the ability to alternatively use either
   the cmpxchg8b assembly implementation or the 386 cli/popf assembly one

The first patches add functionality to the alternatives system to support
the new atomic64_t code.
A patch that improves cmpxchg64() using that functionality is also included.

To test this code, enable CONFIG_ATOMIC64_SELFTEST, compile for 386, and
boot both normally and with "clearcpuid=8" (which clears the CX8 feature bit).

You should receive a message stating that the atomic64 test passed,
along with the selected configuration.

386/486 SMP is not supported, following existing practice, but the code
is structured so that such support can be added very easily.

Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>


* [PATCH 1/5] x86: add support for relative CALL and JMP in alternatives (v2)
  2010-02-19 17:26 [PATCH 0/5] x86-32: improve atomic64_t functions (v2) Luca Barbieri
@ 2010-02-19 17:26 ` Luca Barbieri
  2010-02-19 17:26 ` [PATCH 2/5] x86: add support for lock prefix " Luca Barbieri
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Luca Barbieri @ 2010-02-19 17:26 UTC (permalink / raw)
  To: mingo; +Cc: hpa, a.p.zijlstra, akpm, linux-kernel, Luca Barbieri

Changes in v2:
- Don't use instruction parser: use the method described below instead

Currently CALL and JMP cannot be used in alternatives because the
relative offset would be wrong.

This patch adds a new type of alternative, denoted by replacementlen = 0xff.
This alternative causes the kernel to generate a CALL rel32 to the
address provided in the alternative sequence address field.

Such an alternative can be generated with the ALTERNATIVE_CALL macro.

This approach has the advantage of not requiring the instruction parser,
not requiring ad-hoc compile time relocation logic, and minimizing the
size of the alternative data.

Alternatives more complex than a single CALL could still be supported
with multiple successive alternative patches, but this is currently not
required.
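
The displacement computation described above can be sketched in user-space
C. This is only an illustration under stated assumptions, not the kernel
code: encode_call_rel32() and its parameters are hypothetical names,
addresses are modelled as plain integers, and single-byte 0x90 NOPs stand
in for the padding that add_nops() would emit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): build a 5-byte CALL rel32
 * located at address `site` that targets `target`, then pad the rest of
 * the `slot_len`-byte slot with single-byte NOPs.
 * Returns 0 on success, -1 if the slot is shorter than 5 bytes or the
 * displacement does not fit in a signed 32-bit value.
 */
int encode_call_rel32(uint8_t *buf, size_t slot_len,
		      uint64_t site, uint64_t target)
{
	/* The displacement is relative to the end of the 5-byte CALL. */
	int64_t disp = (int64_t)(target - (site + 5));
	uint32_t rel;

	assert(buf != NULL);
	if (slot_len < 5)
		return -1;
	if (disp < INT32_MIN || disp > INT32_MAX)
		return -1;

	rel = (uint32_t)(int32_t)disp;
	buf[0] = 0xe8;				/* CALL rel32 opcode */
	buf[1] = (uint8_t)(rel & 0xff);		/* rel32, little-endian */
	buf[2] = (uint8_t)((rel >> 8) & 0xff);
	buf[3] = (uint8_t)((rel >> 16) & 0xff);
	buf[4] = (uint8_t)((rel >> 24) & 0xff);
	memset(buf + 5, 0x90, slot_len - 5);	/* pad the remainder with NOPs */
	return 0;
}
```

The range check matters only where addresses are wider than 32 bits, which
is why the patch guards its equivalent check with CONFIG_X86_64.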

Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
---
 arch/x86/include/asm/alternative.h |   14 ++++++++++++++
 arch/x86/kernel/alternative.c      |   24 ++++++++++++++++++++----
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 69b74a7..77f78e2 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -90,6 +90,20 @@ static inline void alternatives_smp_switch(int smp) {}
       "663:\n\t" newinstr "\n664:\n"		/* replacement     */	\
       ".previous"
 
+#define ALTERNATIVE_CALL(oldinstr, func, feature)		       \
+									\
+      "661:\n\t" oldinstr "\n662:\n"				    \
+      ".section .altinstructions,\"a\"\n"			       \
+      _ASM_ALIGN "\n"						   \
+      _ASM_PTR "661b\n"			 /* label	   */   \
+      _ASM_PTR func "\n"			/* new instruction */   \
+      "  .byte " __stringify(feature) "\n"      /* feature bit     */   \
+      "  .byte 662b-661b\n"		     /* sourcelen       */   \
+      "  .byte 0xff\n"			  /* replacementlen  */   \
+      "  .byte 0\n"			     /* pad */	       \
+      ".previous\n"						     \
+
+
 /*
  * Alternative instructions for different CPU types or capabilities.
  *
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index de7353c..77eba91 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -210,7 +210,7 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
 	DPRINTK("%s: alt table %p -> %p\n", __func__, start, end);
 	for (a = start; a < end; a++) {
 		u8 *instr = a->instr;
-		BUG_ON(a->replacementlen > a->instrlen);
+		size_t len;
 		BUG_ON(a->instrlen > sizeof(insnbuf));
 		if (!boot_cpu_has(a->cpuid))
 			continue;
@@ -222,9 +222,25 @@ void __init_or_module apply_alternatives(struct alt_instr *start,
 				__func__, a->instr, instr);
 		}
 #endif
-		memcpy(insnbuf, a->replacement, a->replacementlen);
-		add_nops(insnbuf + a->replacementlen,
-			 a->instrlen - a->replacementlen);
+		if (a->replacementlen == 0xff) {
+			/* emit a CALL rel32 */
+			long v = a->replacement - (instr + 5);
+			int v32 = (int)v;
+			BUG_ON(5 > a->instrlen);
+#ifdef CONFIG_X86_64
+			if (WARN_ON((long)v32 != v))
+				continue;
+#endif
+			len = 5;
+			insnbuf[0] = 0xe8;
+			memcpy(insnbuf + 1, &v32, 4);
+		} else {
+			BUG_ON(a->replacementlen > a->instrlen);
+			len = a->replacementlen;
+			memcpy(insnbuf, a->replacement, len);
+		}
+		add_nops(insnbuf + len,
+			a->instrlen - len);
 		text_poke_early(instr, insnbuf, a->instrlen);
 	}
 }
-- 
1.6.6.1.476.g01ddb



* [PATCH 2/5] x86: add support for lock prefix in alternatives (v2)
  2010-02-19 17:26 [PATCH 0/5] x86-32: improve atomic64_t functions (v2) Luca Barbieri
  2010-02-19 17:26 ` [PATCH 1/5] x86: add support for relative CALL and JMP in alternatives (v2) Luca Barbieri
@ 2010-02-19 17:26 ` Luca Barbieri
  2010-02-19 17:26 ` [PATCH 3/5] x86-32: allow UP/SMP lock replacement in cmpxchg64 (v2) Luca Barbieri
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Luca Barbieri @ 2010-02-19 17:26 UTC (permalink / raw)
  To: mingo; +Cc: hpa, a.p.zijlstra, akpm, linux-kernel, Luca Barbieri

Changes in v2:
- Naming change
- Change label to not conflict with alternatives

The current lock prefix UP/SMP alternative code doesn't allow
LOCK_PREFIX to be used in alternatives code.

This patch solves the problem by adding a new LOCK_PREFIX_HERE macro
that only records the lock prefix location but does not emit
the prefix.

The user of this macro can then start any alternative sequence with
"lock" and have it UP/SMP patched.

To make this work, the UP/SMP alternative code is changed to do the
lock/DS prefix switching only if the byte actually contains a lock or
DS prefix.

Thus, if an alternative without the "lock" prefix is selected, the UP/SMP
patching will now do nothing instead of clobbering the code.
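
The guarded prefix switching can be modelled in user-space C as follows.
This is a sketch under stated assumptions: smp_unlock_sites() and
smp_lock_sites() are hypothetical names, the "text" is an ordinary byte
array, and plain stores stand in for text_poke().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PREFIX_LOCK 0xf0	/* x86 LOCK prefix */
#define PREFIX_DS   0x3e	/* x86 DS segment override prefix */

/*
 * Hypothetical model of the SMP -> UP switch: turn LOCK into DS, but only
 * at recorded sites that currently hold a LOCK prefix.
 */
void smp_unlock_sites(uint8_t **sites, size_t n)
{
	size_t i;

	assert(sites != NULL);
	for (i = 0; i < n; i++)
		if (*sites[i] == PREFIX_LOCK)
			*sites[i] = PREFIX_DS;
}

/*
 * Hypothetical model of the UP -> SMP switch: turn DS back into LOCK, but
 * only at recorded sites that currently hold a DS prefix.
 */
void smp_lock_sites(uint8_t **sites, size_t n)
{
	size_t i;

	assert(sites != NULL);
	for (i = 0; i < n; i++)
		if (*sites[i] == PREFIX_DS)
			*sites[i] = PREFIX_LOCK;
}
```

A recorded site whose first byte is neither prefix, for example the 0xe8
opcode of a CALL, is left untouched in both directions.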

Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
---
 arch/x86/include/asm/alternative.h |    8 +++++---
 arch/x86/kernel/alternative.c      |    6 ++++--
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 77f78e2..2ed9784 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -28,12 +28,14 @@
  */
 
 #ifdef CONFIG_SMP
-#define LOCK_PREFIX \
+#define LOCK_PREFIX_HERE \
 		".section .smp_locks,\"a\"\n"	\
 		_ASM_ALIGN "\n"			\
-		_ASM_PTR "661f\n" /* address */	\
+		_ASM_PTR "671f\n" /* address */	\
 		".previous\n"			\
-		"661:\n\tlock; "
+		"671:"
+
+#define LOCK_PREFIX LOCK_PREFIX_HERE "\n\tlock; "
 
 #else /* ! CONFIG_SMP */
 #define LOCK_PREFIX ""
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 77eba91..57a672c 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -258,7 +258,8 @@ static void alternatives_smp_lock(u8 **start, u8 **end, u8 *text, u8 *text_end)
 		if (*ptr > text_end)
 			continue;
 		/* turn DS segment override prefix into lock prefix */
-		text_poke(*ptr, ((unsigned char []){0xf0}), 1);
+		if (**ptr == 0x3e)
+			text_poke(*ptr, ((unsigned char []){0xf0}), 1);
 	};
 	mutex_unlock(&text_mutex);
 }
@@ -277,7 +278,8 @@ static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end
 		if (*ptr > text_end)
 			continue;
 		/* turn lock prefix into DS segment override prefix */
-		text_poke(*ptr, ((unsigned char []){0x3E}), 1);
+		if (**ptr == 0xf0)
+			text_poke(*ptr, ((unsigned char []){0x3E}), 1);
 	};
 	mutex_unlock(&text_mutex);
 }
-- 
1.6.6.1.476.g01ddb



* [PATCH 3/5] x86-32: allow UP/SMP lock replacement in cmpxchg64 (v2)
  2010-02-19 17:26 [PATCH 0/5] x86-32: improve atomic64_t functions (v2) Luca Barbieri
  2010-02-19 17:26 ` [PATCH 1/5] x86: add support for relative CALL and JMP in alternatives (v2) Luca Barbieri
  2010-02-19 17:26 ` [PATCH 2/5] x86: add support for lock prefix " Luca Barbieri
@ 2010-02-19 17:26 ` Luca Barbieri
  2010-02-19 17:26 ` [PATCH 4/5] lib: add self-test for atomic64_t Luca Barbieri
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Luca Barbieri @ 2010-02-19 17:26 UTC (permalink / raw)
  To: mingo; +Cc: hpa, a.p.zijlstra, akpm, linux-kernel, Luca Barbieri

Changes in v2:
- Naming change

Use the functionality just introduced in the previous patch.

Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
---
 arch/x86/include/asm/cmpxchg_32.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index ffb9bb6..8859e12 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -271,7 +271,8 @@ extern unsigned long long cmpxchg_486_u64(volatile void *, u64, u64);
 	__typeof__(*(ptr)) __ret;				\
 	__typeof__(*(ptr)) __old = (o);				\
 	__typeof__(*(ptr)) __new = (n);				\
-	alternative_io("call cmpxchg8b_emu",			\
+	alternative_io(LOCK_PREFIX_HERE				\
+			"call cmpxchg8b_emu",			\
 			"lock; cmpxchg8b (%%esi)" ,		\
 		       X86_FEATURE_CX8,				\
 		       "=A" (__ret),				\
-- 
1.6.6.1.476.g01ddb



* [PATCH 4/5] lib: add self-test for atomic64_t
  2010-02-19 17:26 [PATCH 0/5] x86-32: improve atomic64_t functions (v2) Luca Barbieri
                   ` (2 preceding siblings ...)
  2010-02-19 17:26 ` [PATCH 3/5] x86-32: allow UP/SMP lock replacement in cmpxchg64 (v2) Luca Barbieri
@ 2010-02-19 17:26 ` Luca Barbieri
  2010-02-19 17:26 ` [PATCH 5/5] x86-32: rewrite 32-bit atomic64 functions in assembly (v2) Luca Barbieri
  2010-02-23 22:47 ` [PATCH 0/5] x86-32: improve atomic64_t functions (v2) H. Peter Anvin
  5 siblings, 0 replies; 15+ messages in thread
From: Luca Barbieri @ 2010-02-19 17:26 UTC (permalink / raw)
  To: mingo; +Cc: hpa, a.p.zijlstra, akpm, linux-kernel, Luca Barbieri


This patch adds a boot-time self-test for atomic64_t.

This has been used to test the later changes in this patchset.

Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
---
 lib/Kconfig.debug   |    7 ++
 lib/Makefile        |    2 +
 lib/atomic64_test.c |  158 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 167 insertions(+), 0 deletions(-)
 create mode 100644 lib/atomic64_test.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 25c3ed5..3676c51 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1054,6 +1054,13 @@ config DMA_API_DEBUG
 	  This option causes a performance degredation.  Use only if you want
 	  to debug device drivers. If unsure, say N.
 
+config ATOMIC64_SELFTEST
+	bool "Perform an atomic64_t self-test at boot"
+	help
+	  Enable this option to test the atomic64_t functions at boot.
+
+	  If unsure, say N.
+
 source "samples/Kconfig"
 
 source "lib/Kconfig.kgdb"
diff --git a/lib/Makefile b/lib/Makefile
index 3b0b4a6..ab333e8 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -100,6 +100,8 @@ obj-$(CONFIG_GENERIC_CSUM) += checksum.o
 
 obj-$(CONFIG_GENERIC_ATOMIC64) += atomic64.o
 
+obj-$(CONFIG_ATOMIC64_SELFTEST) += atomic64_test.o
+
 hostprogs-y	:= gen_crc32table
 clean-files	:= crc32table.h
 
diff --git a/lib/atomic64_test.c b/lib/atomic64_test.c
new file mode 100644
index 0000000..4ff649e
--- /dev/null
+++ b/lib/atomic64_test.c
@@ -0,0 +1,158 @@
+/*
+ * Testsuite for atomic64_t functions
+ *
+ * Copyright © 2010  Luca Barbieri
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#include <linux/init.h>
+#include <asm/atomic.h>
+
+#define INIT(c) do { atomic64_set(&v, c); r = c; } while (0)
+static __init int test_atomic64(void)
+{
+	long long v0 = 0xaaa31337c001d00dLL;
+	long long v1 = 0xdeadbeefdeafcafeLL;
+	long long v2 = 0xfaceabadf00df001LL;
+	long long onestwos = 0x1111111122222222LL;
+	long long one = 1LL;
+
+	atomic64_t v = ATOMIC64_INIT(v0);
+	long long r = v0;
+	BUG_ON(v.counter != r);
+
+	atomic64_set(&v, v1);
+	r = v1;
+	BUG_ON(v.counter != r);
+	BUG_ON(atomic64_read(&v) != r);
+
+	INIT(v0);
+	atomic64_add(onestwos, &v);
+	r += onestwos;
+	BUG_ON(v.counter != r);
+
+	INIT(v0);
+	atomic64_add(-one, &v);
+	r += -one;
+	BUG_ON(v.counter != r);
+
+	INIT(v0);
+	r += onestwos;
+	BUG_ON(atomic64_add_return(onestwos, &v) != r);
+	BUG_ON(v.counter != r);
+
+	INIT(v0);
+	r += -one;
+	BUG_ON(atomic64_add_return(-one, &v) != r);
+	BUG_ON(v.counter != r);
+
+	INIT(v0);
+	atomic64_sub(onestwos, &v);
+	r -= onestwos;
+	BUG_ON(v.counter != r);
+
+	INIT(v0);
+	atomic64_sub(-one, &v);
+	r -= -one;
+	BUG_ON(v.counter != r);
+
+	INIT(v0);
+	r -= onestwos;
+	BUG_ON(atomic64_sub_return(onestwos, &v) != r);
+	BUG_ON(v.counter != r);
+
+	INIT(v0);
+	r -= -one;
+	BUG_ON(atomic64_sub_return(-one, &v) != r);
+	BUG_ON(v.counter != r);
+
+	INIT(v0);
+	atomic64_inc(&v);
+	r += one;
+	BUG_ON(v.counter != r);
+
+	INIT(v0);
+	r += one;
+	BUG_ON(atomic64_inc_return(&v) != r);
+	BUG_ON(v.counter != r);
+
+	INIT(v0);
+	atomic64_dec(&v);
+	r -= one;
+	BUG_ON(v.counter != r);
+
+	INIT(v0);
+	r -= one;
+	BUG_ON(atomic64_dec_return(&v) != r);
+	BUG_ON(v.counter != r);
+
+	INIT(v0);
+	BUG_ON(atomic64_xchg(&v, v1) != v0);
+	r = v1;
+	BUG_ON(v.counter != r);
+
+	INIT(v0);
+	BUG_ON(atomic64_cmpxchg(&v, v0, v1) != v0);
+	r = v1;
+	BUG_ON(v.counter != r);
+
+	INIT(v0);
+	BUG_ON(atomic64_cmpxchg(&v, v2, v1) != v0);
+	BUG_ON(v.counter != r);
+
+	INIT(v0);
+	BUG_ON(!atomic64_add_unless(&v, one, v0));
+	BUG_ON(v.counter != r);
+
+	INIT(v0);
+	BUG_ON(atomic64_add_unless(&v, one, v1));
+	r += one;
+	BUG_ON(v.counter != r);
+
+	INIT(onestwos);
+	BUG_ON(atomic64_dec_if_positive(&v) != (onestwos - 1));
+	r -= one;
+	BUG_ON(v.counter != r);
+
+	INIT(0);
+	BUG_ON(atomic64_dec_if_positive(&v) != -one);
+	BUG_ON(v.counter != r);
+
+	INIT(-one);
+	BUG_ON(atomic64_dec_if_positive(&v) != (-one - one));
+	BUG_ON(v.counter != r);
+
+	INIT(onestwos);
+	BUG_ON(atomic64_inc_not_zero(&v));
+	r += one;
+	BUG_ON(v.counter != r);
+
+	INIT(0);
+	BUG_ON(!atomic64_inc_not_zero(&v));
+	BUG_ON(v.counter != r);
+
+	INIT(-one);
+	BUG_ON(atomic64_inc_not_zero(&v));
+	r += one;
+	BUG_ON(v.counter != r);
+
+#ifdef CONFIG_X86
+	printk(KERN_INFO "atomic64 test passed for %s+ platform %s CX8 and %s SSE\n",
+#ifdef CONFIG_X86_CMPXCHG64
+			"586",
+#else
+			"386",
+#endif
+			boot_cpu_has(X86_FEATURE_CX8) ? "with" : "without",
+			boot_cpu_has(X86_FEATURE_XMM) ? "with" : "without");
+#else
+	printk(KERN_INFO "atomic64 test passed\n");
+#endif
+
+	return 0;
+}
+
+core_initcall(test_atomic64);
-- 
1.6.6.1.476.g01ddb



* [PATCH 5/5] x86-32: rewrite 32-bit atomic64 functions in assembly (v2)
  2010-02-19 17:26 [PATCH 0/5] x86-32: improve atomic64_t functions (v2) Luca Barbieri
                   ` (3 preceding siblings ...)
  2010-02-19 17:26 ` [PATCH 4/5] lib: add self-test for atomic64_t Luca Barbieri
@ 2010-02-19 17:26 ` Luca Barbieri
  2010-02-23 22:47 ` [PATCH 0/5] x86-32: improve atomic64_t functions (v2) H. Peter Anvin
  5 siblings, 0 replies; 15+ messages in thread
From: Luca Barbieri @ 2010-02-19 17:26 UTC (permalink / raw)
  To: mingo; +Cc: hpa, a.p.zijlstra, akpm, linux-kernel, Luca Barbieri


Changes in v2:
- Merged 386 and cx8 support in the same patch
- 386 support now done in assembly, C code no longer used at all
- cmpxchg64 is used for atomic64_cmpxchg
- stop using macros, use one-line inline functions instead
- miscellaneous changes and improvements

This patch replaces atomic64_32.c with two assembly implementations,
one for 386/486 machines using pushf/cli/popf and one for 586+ machines
using cmpxchg8b.

The cmpxchg8b implementation provides the following advantages over the
current one:

1. Implements atomic64_add_unless, atomic64_dec_if_positive and
   atomic64_inc_not_zero

2. Uses the ZF flag changed by cmpxchg8b instead of doing a comparison

3. Uses custom register calling conventions that reduce or eliminate
   register moves to suit cmpxchg8b

4. Reads the initial value instead of using cmpxchg8b to do that.
   Currently we use lock xaddl and movl, which seems the fastest.

5. Does not use the lock prefix for atomic64_set:
   64-bit writes are already atomic, so we don't need that.
   We still need it for atomic64_read to avoid restoring a value
   changed in the meantime.

6. Allocates registers as well or better than gcc
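
The conditional operations in item 1 follow a read-then-compare-exchange
loop. A portable user-space analogue can be sketched with C11 atomics,
where the boolean result of atomic_compare_exchange_weak() plays the role
that ZF plays after cmpxchg8b. This is an illustration only: the function
names are hypothetical, the code is not the assembly from this patch, and
the return convention follows the kernel-doc comment for
atomic64_add_unless ("non-zero if @v was not @u").

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wrapping 64-bit add, done in unsigned arithmetic to avoid signed overflow. */
static inline int64_t wrap_add(int64_t x, int64_t y)
{
	return (int64_t)((uint64_t)x + (uint64_t)y);
}

/*
 * Hypothetical sketch: add `a` to *v unless *v equals `u`.
 * Returns true if the addition was performed.
 */
bool add_unless_sketch(_Atomic int64_t *v, int64_t a, int64_t u)
{
	int64_t old;

	assert(v != NULL);
	old = atomic_load(v);	/* read the initial value once */
	while (old != u) {
		/* On failure, `old` is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, wrap_add(old, a)))
			return true;
	}
	return false;
}

/*
 * Hypothetical sketch: decrement *v only if the result stays >= 0.
 * Returns the old value minus one, whether or not it was stored.
 */
int64_t dec_if_positive_sketch(_Atomic int64_t *v)
{
	int64_t old;

	assert(v != NULL);
	old = atomic_load(v);
	while (old > 0) {
		/* On success, `old` still holds the value that was replaced. */
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			break;
	}
	return wrap_add(old, -1);
}
```

Neither loop re-reads the variable explicitly after a failed exchange,
because the failed compare-exchange already returns the current value.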

The 386 implementation provides support for 386 and 486 machines.
386/486 SMP is not supported (we dropped it), but such support can be
added easily if desired.

A pure assembly implementation is required due to the custom calling
conventions and the desire to use %ebp in atomic64_add_return (we need
7 registers...), as well as the ability to use pushf/popf in the 386
code without an intermediate pop/push.

The parameter names are changed to match the convention in atomic_64.h

Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
---
 arch/x86/include/asm/atomic_32.h |  278 +++++++++++++++++++++++++++++---------
 arch/x86/lib/Makefile            |    3 +-
 arch/x86/lib/atomic64_32.c       |  273 +++++++------------------------------
 arch/x86/lib/atomic64_386_32.S   |  175 ++++++++++++++++++++++++
 arch/x86/lib/atomic64_cx8_32.S   |  225 ++++++++++++++++++++++++++++++
 5 files changed, 664 insertions(+), 290 deletions(-)
 create mode 100644 arch/x86/lib/atomic64_386_32.S
 create mode 100644 arch/x86/lib/atomic64_cx8_32.S

diff --git a/arch/x86/include/asm/atomic_32.h b/arch/x86/include/asm/atomic_32.h
index dc5a667..5f22cc7 100644
--- a/arch/x86/include/asm/atomic_32.h
+++ b/arch/x86/include/asm/atomic_32.h
@@ -268,109 +268,193 @@ typedef struct {
 
 #define ATOMIC64_INIT(val)	{ (val) }
 
-extern u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old_val, u64 new_val);
+#ifdef CONFIG_X86_CMPXCHG64
+#define ATOMIC64_ALTERNATIVE_(f, g) "call atomic64_" #g "_cx8"
+#else
+#define ATOMIC64_ALTERNATIVE_(f, g) ALTERNATIVE_CALL("call atomic64_" #f "_386", "atomic64_" #g "_cx8", X86_FEATURE_CX8)
+#endif
+
+#define ATOMIC64_ALTERNATIVE(f) ATOMIC64_ALTERNATIVE_(f, f)
+
+/**
+ * atomic64_cmpxchg - cmpxchg atomic64 variable
+ * @v: pointer to type atomic64_t
+ * @o: expected value
+ * @n: new value
+ *
+ * Atomically sets @v to @n if it was equal to @o and returns
+ * the old value.
+ */
+
+static inline long long atomic64_cmpxchg(atomic64_t *v, long long o, long long n)
+{
+	return cmpxchg64(&v->counter, o, n);
+}
 
 /**
  * atomic64_xchg - xchg atomic64 variable
- * @ptr:      pointer to type atomic64_t
- * @new_val:  value to assign
+ * @v: pointer to type atomic64_t
+ * @n: value to assign
  *
- * Atomically xchgs the value of @ptr to @new_val and returns
+ * Atomically xchgs the value of @v to @n and returns
  * the old value.
  */
-extern u64 atomic64_xchg(atomic64_t *ptr, u64 new_val);
+static inline long long atomic64_xchg(atomic64_t *v, long long n)
+{
+	long long o;
+	unsigned high = (unsigned)(n >> 32);
+	unsigned low = (unsigned)n;
+	asm volatile(ATOMIC64_ALTERNATIVE(xchg)
+		     : "=A" (o), "+b" (low), "+c" (high)
+		     : "S" (v)
+		     : "memory"
+		     );
+	return o;
+}
 
 /**
  * atomic64_set - set atomic64 variable
- * @ptr:      pointer to type atomic64_t
- * @new_val:  value to assign
+ * @v: pointer to type atomic64_t
+ * @i: value to assign
  *
- * Atomically sets the value of @ptr to @new_val.
+ * Atomically sets the value of @v to @i.
  */
-extern void atomic64_set(atomic64_t *ptr, u64 new_val);
+static inline void atomic64_set(atomic64_t *v, long long i)
+{
+	unsigned high = (unsigned)(i >> 32);
+	unsigned low = (unsigned)i;
+	asm volatile(ATOMIC64_ALTERNATIVE(set)
+		     : "+b" (low), "+c" (high)
+		     : "S" (v)
+		     : "eax", "edx", "memory"
+		     );
+}
 
 /**
  * atomic64_read - read atomic64 variable
- * @ptr:      pointer to type atomic64_t
+ * @v: pointer to type atomic64_t
  *
- * Atomically reads the value of @ptr and returns it.
+ * Atomically reads the value of @v and returns it.
  */
-static inline u64 atomic64_read(atomic64_t *ptr)
+static inline long long atomic64_read(atomic64_t *v)
 {
-	u64 res;
-
-	/*
-	 * Note, we inline this atomic64_t primitive because
-	 * it only clobbers EAX/EDX and leaves the others
-	 * untouched. We also (somewhat subtly) rely on the
-	 * fact that cmpxchg8b returns the current 64-bit value
-	 * of the memory location we are touching:
-	 */
-	asm volatile(
-		"mov %%ebx, %%eax\n\t"
-		"mov %%ecx, %%edx\n\t"
-		LOCK_PREFIX "cmpxchg8b %1\n"
-			: "=&A" (res)
-			: "m" (*ptr)
-		);
-
-	return res;
-}
-
-extern u64 atomic64_read(atomic64_t *ptr);
+	long long r;
+	asm volatile(ATOMIC64_ALTERNATIVE(read)
+		     : "=A" (r), "+c" (v)
+		     : : "memory"
+		     );
+	return r;
+}
 
 /**
  * atomic64_add_return - add and return
- * @delta: integer value to add
- * @ptr:   pointer to type atomic64_t
+ * @i: integer value to add
+ * @v: pointer to type atomic64_t
  *
- * Atomically adds @delta to @ptr and returns @delta + *@ptr
+ * Atomically adds @i to @v and returns @i + *@v
  */
-extern u64 atomic64_add_return(u64 delta, atomic64_t *ptr);
+static inline long long atomic64_add_return(long long i, atomic64_t *v)
+{
+	asm volatile(ATOMIC64_ALTERNATIVE(add_return)
+		     : "+A" (i), "+c" (v)
+		     : : "memory"
+		     );
+	return i;
+}
 
 /*
  * Other variants with different arithmetic operators:
  */
-extern u64 atomic64_sub_return(u64 delta, atomic64_t *ptr);
-extern u64 atomic64_inc_return(atomic64_t *ptr);
-extern u64 atomic64_dec_return(atomic64_t *ptr);
+static inline long long atomic64_sub_return(long long i, atomic64_t *v)
+{
+	asm volatile(ATOMIC64_ALTERNATIVE(sub_return)
+		     : "+A" (i), "+c" (v)
+		     : : "memory"
+		     );
+	return i;
+}
+
+static inline long long atomic64_inc_return(atomic64_t *v)
+{
+	long long a;
+	asm volatile(ATOMIC64_ALTERNATIVE(inc_return)
+		     : "=A" (a)
+		     : "S" (v)
+		     : "memory", "ecx"
+		     );
+	return a;
+}
+
+static inline long long atomic64_dec_return(atomic64_t *v)
+{
+	long long a;
+	asm volatile(ATOMIC64_ALTERNATIVE(dec_return)
+		     : "=A" (a)
+		     : "S" (v)
+		     : "memory", "ecx"
+		     );
+	return a;
+}
 
 /**
  * atomic64_add - add integer to atomic64 variable
- * @delta: integer value to add
- * @ptr:   pointer to type atomic64_t
+ * @i: integer value to add
+ * @v: pointer to type atomic64_t
  *
- * Atomically adds @delta to @ptr.
+ * Atomically adds @i to @v.
  */
-extern void atomic64_add(u64 delta, atomic64_t *ptr);
+static inline long long atomic64_add(long long i, atomic64_t *v)
+{
+	asm volatile(ATOMIC64_ALTERNATIVE_(add, add_return)
+		     : "+A" (i), "+c" (v)
+		     : : "memory"
+		     );
+	return i;
+}
 
 /**
  * atomic64_sub - subtract the atomic64 variable
- * @delta: integer value to subtract
- * @ptr:   pointer to type atomic64_t
+ * @i: integer value to subtract
+ * @v: pointer to type atomic64_t
  *
- * Atomically subtracts @delta from @ptr.
+ * Atomically subtracts @i from @v.
  */
-extern void atomic64_sub(u64 delta, atomic64_t *ptr);
+static inline long long atomic64_sub(long long i, atomic64_t *v)
+{
+	asm volatile(ATOMIC64_ALTERNATIVE_(sub, sub_return)
+		     : "+A" (i), "+c" (v)
+		     : : "memory"
+		     );
+	return i;
+}
 
 /**
  * atomic64_sub_and_test - subtract value from variable and test result
- * @delta: integer value to subtract
- * @ptr:   pointer to type atomic64_t
- *
- * Atomically subtracts @delta from @ptr and returns
+ * @i: integer value to subtract
+ * @v: pointer to type atomic64_t
+  *
+ * Atomically subtracts @i from @v and returns
  * true if the result is zero, or false for all
  * other cases.
  */
-extern int atomic64_sub_and_test(u64 delta, atomic64_t *ptr);
+static inline int atomic64_sub_and_test(long long i, atomic64_t *v)
+{
+	return atomic64_sub_return(i, v) == 0;
+}
 
 /**
  * atomic64_inc - increment atomic64 variable
- * @ptr: pointer to type atomic64_t
+ * @v: pointer to type atomic64_t
  *
- * Atomically increments @ptr by 1.
+ * Atomically increments @v by 1.
  */
-extern void atomic64_inc(atomic64_t *ptr);
+static inline void atomic64_inc(atomic64_t *v)
+{
+	asm volatile(ATOMIC64_ALTERNATIVE_(inc, inc_return)
+		     : : "S" (v)
+		     : "memory", "eax", "ecx", "edx"
+		     );
+}
 
 /**
  * atomic64_dec - decrement atomic64 variable
@@ -378,38 +462,98 @@ extern void atomic64_inc(atomic64_t *ptr);
  *
  * Atomically decrements @ptr by 1.
  */
-extern void atomic64_dec(atomic64_t *ptr);
+static inline void atomic64_dec(atomic64_t *v)
+{
+	asm volatile(ATOMIC64_ALTERNATIVE_(dec, dec_return)
+		     : : "S" (v)
+		     : "memory", "eax", "ecx", "edx"
+		     );
+}
 
 /**
  * atomic64_dec_and_test - decrement and test
- * @ptr: pointer to type atomic64_t
+ * @v: pointer to type atomic64_t
  *
- * Atomically decrements @ptr by 1 and
+ * Atomically decrements @v by 1 and
  * returns true if the result is 0, or false for all other
  * cases.
  */
-extern int atomic64_dec_and_test(atomic64_t *ptr);
+static inline int atomic64_dec_and_test(atomic64_t *v)
+{
+	return atomic64_dec_return(v) == 0;
+}
 
 /**
  * atomic64_inc_and_test - increment and test
- * @ptr: pointer to type atomic64_t
+ * @v: pointer to type atomic64_t
  *
- * Atomically increments @ptr by 1
+ * Atomically increments @v by 1
  * and returns true if the result is zero, or false for all
  * other cases.
  */
-extern int atomic64_inc_and_test(atomic64_t *ptr);
+static inline int atomic64_inc_and_test(atomic64_t *v)
+{
+	return atomic64_inc_return(v) == 0;
+}
 
 /**
  * atomic64_add_negative - add and test if negative
- * @delta: integer value to add
- * @ptr:   pointer to type atomic64_t
+ * @i: integer value to add
+ * @v: pointer to type atomic64_t
  *
- * Atomically adds @delta to @ptr and returns true
+ * Atomically adds @i to @v and returns true
  * if the result is negative, or false when
  * result is greater than or equal to zero.
  */
-extern int atomic64_add_negative(u64 delta, atomic64_t *ptr);
+static inline int atomic64_add_negative(long long i, atomic64_t *v)
+{
+	return atomic64_add_return(i, v) < 0;
+}
+
+/**
+ * atomic64_add_unless - add unless the number is a given value
+ * @v: pointer of type atomic64_t
+ * @a: the amount to add to v...
+ * @u: ...unless v is equal to u.
+ *
+ * Atomically adds @a to @v, so long as it was not @u.
+ * Returns non-zero if @v was not @u, and zero otherwise.
+ */
+static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)
+{
+	unsigned low = (unsigned)u;
+	unsigned high = (unsigned)(u >> 32);
+	asm volatile(ATOMIC64_ALTERNATIVE(add_unless) "\n\t"
+		     : "+A" (a), "+c" (v), "+S" (low), "+D" (high)
+		     : : "memory");
+	return (int)a;
+}
+
+
+static inline int atomic64_inc_not_zero(atomic64_t *v)
+{
+	int r;
+	asm volatile(ATOMIC64_ALTERNATIVE(inc_not_zero)
+		     : "=a" (r)
+		     : "S" (v)
+		     : "ecx", "edx", "memory"
+		     );
+	return r;
+}
+
+static inline long long atomic64_dec_if_positive(atomic64_t *v)
+{
+	long long r;
+	asm volatile(ATOMIC64_ALTERNATIVE(dec_if_positive)
+		     : "=A" (r)
+		     : "S" (v)
+		     : "ecx", "memory"
+		     );
+	return r;
+}
+
+#undef ATOMIC64_ALTERNATIVE
+#undef ATOMIC64_ALTERNATIVE_
 
 #include <asm-generic/atomic-long.h>
 #endif /* _ASM_X86_ATOMIC_32_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index cffd754..05d686b 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -26,11 +26,12 @@ obj-y += msr.o msr-reg.o msr-reg-export.o
 
 ifeq ($(CONFIG_X86_32),y)
         obj-y += atomic64_32.o
+        lib-y += atomic64_cx8_32.o
         lib-y += checksum_32.o
         lib-y += strstr_32.o
         lib-y += semaphore_32.o string_32.o
 ifneq ($(CONFIG_X86_CMPXCHG64),y)
-        lib-y += cmpxchg8b_emu.o
+        lib-y += cmpxchg8b_emu.o atomic64_386_32.o
 endif
         lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
 else
diff --git a/arch/x86/lib/atomic64_32.c b/arch/x86/lib/atomic64_32.c
index 824fa0b..540179e 100644
--- a/arch/x86/lib/atomic64_32.c
+++ b/arch/x86/lib/atomic64_32.c
@@ -6,225 +6,54 @@
 #include <asm/cmpxchg.h>
 #include <asm/atomic.h>
 
-static noinline u64 cmpxchg8b(u64 *ptr, u64 old, u64 new)
-{
-	u32 low = new;
-	u32 high = new >> 32;
-
-	asm volatile(
-		LOCK_PREFIX "cmpxchg8b %1\n"
-		     : "+A" (old), "+m" (*ptr)
-		     :  "b" (low),  "c" (high)
-		     );
-	return old;
-}
-
-u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old_val, u64 new_val)
-{
-	return cmpxchg8b(&ptr->counter, old_val, new_val);
-}
-EXPORT_SYMBOL(atomic64_cmpxchg);
-
-/**
- * atomic64_xchg - xchg atomic64 variable
- * @ptr:      pointer to type atomic64_t
- * @new_val:  value to assign
- *
- * Atomically xchgs the value of @ptr to @new_val and returns
- * the old value.
- */
-u64 atomic64_xchg(atomic64_t *ptr, u64 new_val)
-{
-	/*
-	 * Try first with a (possibly incorrect) assumption about
-	 * what we have there. We'll do two loops most likely,
-	 * but we'll get an ownership MESI transaction straight away
-	 * instead of a read transaction followed by a
-	 * flush-for-ownership transaction:
-	 */
-	u64 old_val, real_val = 0;
-
-	do {
-		old_val = real_val;
-
-		real_val = atomic64_cmpxchg(ptr, old_val, new_val);
-
-	} while (real_val != old_val);
-
-	return old_val;
-}
-EXPORT_SYMBOL(atomic64_xchg);
-
-/**
- * atomic64_set - set atomic64 variable
- * @ptr:      pointer to type atomic64_t
- * @new_val:  value to assign
- *
- * Atomically sets the value of @ptr to @new_val.
- */
-void atomic64_set(atomic64_t *ptr, u64 new_val)
-{
-	atomic64_xchg(ptr, new_val);
-}
-EXPORT_SYMBOL(atomic64_set);
-
-/**
-EXPORT_SYMBOL(atomic64_read);
- * atomic64_add_return - add and return
- * @delta: integer value to add
- * @ptr:   pointer to type atomic64_t
- *
- * Atomically adds @delta to @ptr and returns @delta + *@ptr
- */
-noinline u64 atomic64_add_return(u64 delta, atomic64_t *ptr)
-{
-	/*
-	 * Try first with a (possibly incorrect) assumption about
-	 * what we have there. We'll do two loops most likely,
-	 * but we'll get an ownership MESI transaction straight away
-	 * instead of a read transaction followed by a
-	 * flush-for-ownership transaction:
-	 */
-	u64 old_val, new_val, real_val = 0;
-
-	do {
-		old_val = real_val;
-		new_val = old_val + delta;
-
-		real_val = atomic64_cmpxchg(ptr, old_val, new_val);
-
-	} while (real_val != old_val);
-
-	return new_val;
-}
-EXPORT_SYMBOL(atomic64_add_return);
-
-u64 atomic64_sub_return(u64 delta, atomic64_t *ptr)
-{
-	return atomic64_add_return(-delta, ptr);
-}
-EXPORT_SYMBOL(atomic64_sub_return);
-
-u64 atomic64_inc_return(atomic64_t *ptr)
-{
-	return atomic64_add_return(1, ptr);
-}
-EXPORT_SYMBOL(atomic64_inc_return);
-
-u64 atomic64_dec_return(atomic64_t *ptr)
-{
-	return atomic64_sub_return(1, ptr);
-}
-EXPORT_SYMBOL(atomic64_dec_return);
-
-/**
- * atomic64_add - add integer to atomic64 variable
- * @delta: integer value to add
- * @ptr:   pointer to type atomic64_t
- *
- * Atomically adds @delta to @ptr.
- */
-void atomic64_add(u64 delta, atomic64_t *ptr)
-{
-	atomic64_add_return(delta, ptr);
-}
-EXPORT_SYMBOL(atomic64_add);
-
-/**
- * atomic64_sub - subtract the atomic64 variable
- * @delta: integer value to subtract
- * @ptr:   pointer to type atomic64_t
- *
- * Atomically subtracts @delta from @ptr.
- */
-void atomic64_sub(u64 delta, atomic64_t *ptr)
-{
-	atomic64_add(-delta, ptr);
-}
-EXPORT_SYMBOL(atomic64_sub);
-
-/**
- * atomic64_sub_and_test - subtract value from variable and test result
- * @delta: integer value to subtract
- * @ptr:   pointer to type atomic64_t
- *
- * Atomically subtracts @delta from @ptr and returns
- * true if the result is zero, or false for all
- * other cases.
- */
-int atomic64_sub_and_test(u64 delta, atomic64_t *ptr)
-{
-	u64 new_val = atomic64_sub_return(delta, ptr);
-
-	return new_val == 0;
-}
-EXPORT_SYMBOL(atomic64_sub_and_test);
-
-/**
- * atomic64_inc - increment atomic64 variable
- * @ptr: pointer to type atomic64_t
- *
- * Atomically increments @ptr by 1.
- */
-void atomic64_inc(atomic64_t *ptr)
-{
-	atomic64_add(1, ptr);
-}
-EXPORT_SYMBOL(atomic64_inc);
-
-/**
- * atomic64_dec - decrement atomic64 variable
- * @ptr: pointer to type atomic64_t
- *
- * Atomically decrements @ptr by 1.
- */
-void atomic64_dec(atomic64_t *ptr)
-{
-	atomic64_sub(1, ptr);
-}
-EXPORT_SYMBOL(atomic64_dec);
-
-/**
- * atomic64_dec_and_test - decrement and test
- * @ptr: pointer to type atomic64_t
- *
- * Atomically decrements @ptr by 1 and
- * returns true if the result is 0, or false for all other
- * cases.
- */
-int atomic64_dec_and_test(atomic64_t *ptr)
-{
-	return atomic64_sub_and_test(1, ptr);
-}
-EXPORT_SYMBOL(atomic64_dec_and_test);
-
-/**
- * atomic64_inc_and_test - increment and test
- * @ptr: pointer to type atomic64_t
- *
- * Atomically increments @ptr by 1
- * and returns true if the result is zero, or false for all
- * other cases.
- */
-int atomic64_inc_and_test(atomic64_t *ptr)
-{
-	return atomic64_sub_and_test(-1, ptr);
-}
-EXPORT_SYMBOL(atomic64_inc_and_test);
-
-/**
- * atomic64_add_negative - add and test if negative
- * @delta: integer value to add
- * @ptr:   pointer to type atomic64_t
- *
- * Atomically adds @delta to @ptr and returns true
- * if the result is negative, or false when
- * result is greater than or equal to zero.
- */
-int atomic64_add_negative(u64 delta, atomic64_t *ptr)
-{
-	s64 new_val = atomic64_add_return(delta, ptr);
-
-	return new_val < 0;
-}
-EXPORT_SYMBOL(atomic64_add_negative);
+long long atomic64_read_cx8(long long, const atomic64_t *v);
+EXPORT_SYMBOL(atomic64_read_cx8);
+long long atomic64_set_cx8(long long, const atomic64_t *v);
+EXPORT_SYMBOL(atomic64_set_cx8);
+long long atomic64_xchg_cx8(long long, unsigned high);
+EXPORT_SYMBOL(atomic64_xchg_cx8);
+long long atomic64_add_return_cx8(long long a, atomic64_t *v);
+EXPORT_SYMBOL(atomic64_add_return_cx8);
+long long atomic64_sub_return_cx8(long long a, atomic64_t *v);
+EXPORT_SYMBOL(atomic64_sub_return_cx8);
+long long atomic64_inc_return_cx8(long long a, atomic64_t *v);
+EXPORT_SYMBOL(atomic64_inc_return_cx8);
+long long atomic64_dec_return_cx8(long long a, atomic64_t *v);
+EXPORT_SYMBOL(atomic64_dec_return_cx8);
+long long atomic64_dec_if_positive_cx8(atomic64_t *v);
+EXPORT_SYMBOL(atomic64_dec_if_positive_cx8);
+int atomic64_inc_not_zero_cx8(atomic64_t *v);
+EXPORT_SYMBOL(atomic64_inc_not_zero_cx8);
+int atomic64_add_unless_cx8(atomic64_t *v, long long a, long long u);
+EXPORT_SYMBOL(atomic64_add_unless_cx8);
+
+#ifndef CONFIG_X86_CMPXCHG64
+long long atomic64_read_386(long long, const atomic64_t *v);
+EXPORT_SYMBOL(atomic64_read_386);
+long long atomic64_set_386(long long, const atomic64_t *v);
+EXPORT_SYMBOL(atomic64_set_386);
+long long atomic64_xchg_386(long long, unsigned high);
+EXPORT_SYMBOL(atomic64_xchg_386);
+long long atomic64_add_return_386(long long a, atomic64_t *v);
+EXPORT_SYMBOL(atomic64_add_return_386);
+long long atomic64_sub_return_386(long long a, atomic64_t *v);
+EXPORT_SYMBOL(atomic64_sub_return_386);
+long long atomic64_inc_return_386(long long a, atomic64_t *v);
+EXPORT_SYMBOL(atomic64_inc_return_386);
+long long atomic64_dec_return_386(long long a, atomic64_t *v);
+EXPORT_SYMBOL(atomic64_dec_return_386);
+long long atomic64_add_386(long long a, atomic64_t *v);
+EXPORT_SYMBOL(atomic64_add_386);
+long long atomic64_sub_386(long long a, atomic64_t *v);
+EXPORT_SYMBOL(atomic64_sub_386);
+long long atomic64_inc_386(long long a, atomic64_t *v);
+EXPORT_SYMBOL(atomic64_inc_386);
+long long atomic64_dec_386(long long a, atomic64_t *v);
+EXPORT_SYMBOL(atomic64_dec_386);
+long long atomic64_dec_if_positive_386(atomic64_t *v);
+EXPORT_SYMBOL(atomic64_dec_if_positive_386);
+int atomic64_inc_not_zero_386(atomic64_t *v);
+EXPORT_SYMBOL(atomic64_inc_not_zero_386);
+int atomic64_add_unless_386(atomic64_t *v, long long a, long long u);
+EXPORT_SYMBOL(atomic64_add_unless_386);
+#endif
diff --git a/arch/x86/lib/atomic64_386_32.S b/arch/x86/lib/atomic64_386_32.S
new file mode 100644
index 0000000..5db07fe
--- /dev/null
+++ b/arch/x86/lib/atomic64_386_32.S
@@ -0,0 +1,175 @@
+/*
+ * atomic64_t for 386/486
+ *
+ * Copyright © 2010  Luca Barbieri
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/linkage.h>
+#include <asm/alternative-asm.h>
+#include <asm/dwarf2.h>
+
+/* if you want SMP support, implement these with real spinlocks */
+.macro LOCK reg
+	pushfl
+	CFI_ADJUST_CFA_OFFSET 4
+	cli
+.endm
+
+.macro UNLOCK reg
+	popfl
+	CFI_ADJUST_CFA_OFFSET -4
+.endm
+
+.macro BEGIN func reg
+$v = \reg
+
+ENTRY(atomic64_\func\()_386)
+	CFI_STARTPROC
+	LOCK $v
+
+.macro RETURN
+	UNLOCK $v
+	ret
+.endm
+
+.macro END_
+	CFI_ENDPROC
+ENDPROC(atomic64_\func\()_386)
+.purgem RETURN
+.purgem END_
+.purgem END
+.endm
+
+.macro END
+RETURN
+END_
+.endm
+.endm
+
+BEGIN read %ecx
+	movl  ($v), %eax
+	movl 4($v), %edx
+END
+
+BEGIN set %esi
+	movl %ebx,  ($v)
+	movl %ecx, 4($v)
+END
+
+BEGIN xchg %esi
+	movl  ($v), %eax
+	movl 4($v), %edx
+	movl %ebx,  ($v)
+	movl %ecx, 4($v)
+END
+
+BEGIN add %ecx
+	addl %eax,  ($v)
+	adcl %edx, 4($v)
+END
+
+BEGIN add_return %ecx
+	addl  ($v), %eax
+	adcl 4($v), %edx
+	movl %eax,  ($v)
+	movl %edx, 4($v)
+END
+
+BEGIN sub %ecx
+	subl %eax,  ($v)
+	sbbl %edx, 4($v)
+END
+
+BEGIN sub_return %ecx
+	negl %edx
+	negl %eax
+	sbbl $0, %edx
+	addl  ($v), %eax
+	adcl 4($v), %edx
+	movl %eax,  ($v)
+	movl %edx, 4($v)
+END
+
+BEGIN inc %esi
+	addl $1,  ($v)
+	adcl $0, 4($v)
+END
+
+BEGIN inc_return %esi
+	movl  ($v), %eax
+	movl 4($v), %edx
+	addl $1, %eax
+	adcl $0, %edx
+	movl %eax,  ($v)
+	movl %edx, 4($v)
+END
+
+BEGIN dec %esi
+	subl $1,  ($v)
+	sbbl $0, 4($v)
+END
+
+BEGIN dec_return %esi
+	movl  ($v), %eax
+	movl 4($v), %edx
+	subl $1, %eax
+	sbbl $0, %edx
+	movl %eax,  ($v)
+	movl %edx, 4($v)
+END
+
+BEGIN add_unless %ecx
+	addl %eax, %esi
+	adcl %edx, %edi
+	addl  ($v), %eax
+	adcl 4($v), %edx
+	cmpl %eax, %esi
+	je 3f
+1:
+	movl %eax,  ($v)
+	movl %edx, 4($v)
+	xorl %eax, %eax
+2:
+RETURN
+3:
+	cmpl %edx, %edi
+	jne 1b
+	movl $1, %eax
+	jmp 2b
+END_
+
+BEGIN inc_not_zero %esi
+	movl  ($v), %eax
+	movl 4($v), %edx
+	testl %eax, %eax
+	je 3f
+1:
+	addl $1, %eax
+	adcl $0, %edx
+	movl %eax,  ($v)
+	movl %edx, 4($v)
+	xorl %eax, %eax
+2:
+RETURN
+3:
+	testl %edx, %edx
+	jne 1b
+	movl $1, %eax
+	jmp 2b
+END_
+
+BEGIN dec_if_positive %esi
+	movl  ($v), %eax
+	movl 4($v), %edx
+	subl $1, %eax
+	sbbl $0, %edx
+	js 1f
+	movl %eax,  ($v)
+	movl %edx, 4($v)
+1:
+END
diff --git a/arch/x86/lib/atomic64_cx8_32.S b/arch/x86/lib/atomic64_cx8_32.S
new file mode 100644
index 0000000..e49c4eb
--- /dev/null
+++ b/arch/x86/lib/atomic64_cx8_32.S
@@ -0,0 +1,225 @@
+/*
+ * atomic64_t for 586+
+ *
+ * Copyright © 2010  Luca Barbieri
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/linkage.h>
+#include <asm/alternative-asm.h>
+#include <asm/dwarf2.h>
+
+.macro SAVE reg
+	pushl %\reg
+	CFI_ADJUST_CFA_OFFSET 4
+	CFI_REL_OFFSET \reg, 0
+.endm
+
+.macro RESTORE reg
+	popl %\reg
+	CFI_ADJUST_CFA_OFFSET -4
+	CFI_RESTORE \reg
+.endm
+
+.macro read64 reg
+	movl %ebx, %eax
+	movl %ecx, %edx
+/* we need LOCK_PREFIX since otherwise cmpxchg8b always does the write */
+	LOCK_PREFIX
+	cmpxchg8b (\reg)
+.endm
+
+ENTRY(atomic64_read_cx8)
+	CFI_STARTPROC
+
+	read64 %ecx
+	ret
+	CFI_ENDPROC
+ENDPROC(atomic64_read_cx8)
+
+ENTRY(atomic64_set_cx8)
+	CFI_STARTPROC
+
+1:
+/* we don't need LOCK_PREFIX since aligned 64-bit writes
+ * are atomic on 586 and newer */
+	cmpxchg8b (%esi)
+	jne 1b
+
+	ret
+	CFI_ENDPROC
+ENDPROC(atomic64_set_cx8)
+
+ENTRY(atomic64_xchg_cx8)
+	CFI_STARTPROC
+
+	movl %ebx, %eax
+	movl %ecx, %edx
+1:
+	LOCK_PREFIX
+	cmpxchg8b (%esi)
+	jne 1b
+
+	ret
+	CFI_ENDPROC
+ENDPROC(atomic64_xchg_cx8)
+
+.macro addsub_return func ins insc
+ENTRY(atomic64_\func\()_return_cx8)
+	CFI_STARTPROC
+	SAVE ebp
+	SAVE ebx
+	SAVE esi
+	SAVE edi
+
+	movl %eax, %esi
+	movl %edx, %edi
+	movl %ecx, %ebp
+
+	read64 %ebp
+1:
+	movl %eax, %ebx
+	movl %edx, %ecx
+	\ins\()l %esi, %ebx
+	\insc\()l %edi, %ecx
+	LOCK_PREFIX
+	cmpxchg8b (%ebp)
+	jne 1b
+
+10:
+	movl %ebx, %eax
+	movl %ecx, %edx
+	RESTORE edi
+	RESTORE esi
+	RESTORE ebx
+	RESTORE ebp
+	ret
+	CFI_ENDPROC
+ENDPROC(atomic64_\func\()_return_cx8)
+.endm
+
+addsub_return add add adc
+addsub_return sub sub sbb
+
+.macro incdec_return func ins insc
+ENTRY(atomic64_\func\()_return_cx8)
+	CFI_STARTPROC
+	SAVE ebx
+
+	read64 %esi
+1:
+	movl %eax, %ebx
+	movl %edx, %ecx
+	\ins\()l $1, %ebx
+	\insc\()l $0, %ecx
+	LOCK_PREFIX
+	cmpxchg8b (%esi)
+	jne 1b
+
+10:
+	movl %ebx, %eax
+	movl %ecx, %edx
+	RESTORE ebx
+	ret
+	CFI_ENDPROC
+ENDPROC(atomic64_\func\()_return_cx8)
+.endm
+
+incdec_return inc add adc
+incdec_return dec sub sbb
+
+ENTRY(atomic64_dec_if_positive_cx8)
+	CFI_STARTPROC
+	SAVE ebx
+
+	read64 %esi
+1:
+	movl %eax, %ebx
+	movl %edx, %ecx
+	subl $1, %ebx
+	sbb $0, %ecx
+	js 2f
+	LOCK_PREFIX
+	cmpxchg8b (%esi)
+	jne 1b
+
+2:
+	movl %ebx, %eax
+	movl %ecx, %edx
+	RESTORE ebx
+	ret
+	CFI_ENDPROC
+ENDPROC(atomic64_dec_if_positive_cx8)
+
+ENTRY(atomic64_add_unless_cx8)
+	CFI_STARTPROC
+	SAVE ebp
+	SAVE ebx
+/* these just push these two parameters on the stack */
+	SAVE edi
+	SAVE esi
+
+	movl %ecx, %ebp
+	movl %eax, %esi
+	movl %edx, %edi
+
+	read64 %ebp
+1:
+	cmpl %eax, 0(%esp)
+	je 4f
+2:
+	movl %eax, %ebx
+	movl %edx, %ecx
+	addl %esi, %ebx
+	adcl %edi, %ecx
+	LOCK_PREFIX
+	cmpxchg8b (%ebp)
+	jne 1b
+
+	xorl %eax, %eax
+3:
+	addl $8, %esp
+	CFI_ADJUST_CFA_OFFSET -8
+	RESTORE ebx
+	RESTORE ebp
+	ret
+4:
+	cmpl %edx, 4(%esp)
+	jne 2b
+	movl $1, %eax
+	jmp 3b
+	CFI_ENDPROC
+ENDPROC(atomic64_add_unless_cx8)
+
+ENTRY(atomic64_inc_not_zero_cx8)
+	CFI_STARTPROC
+	SAVE ebx
+
+	read64 %esi
+1:
+	testl %eax, %eax
+	je 4f
+2:
+	movl %eax, %ebx
+	movl %edx, %ecx
+	addl $1, %ebx
+	adcl $0, %ecx
+	LOCK_PREFIX
+	cmpxchg8b (%esi)
+	jne 1b
+
+	xorl %eax, %eax
+3:
+	RESTORE ebx
+	ret
+4:
+	testl %edx, %edx
+	jne 2b
+	movl $1, %eax
+	jmp 3b
+	CFI_ENDPROC
+ENDPROC(atomic64_inc_not_zero_cx8)
-- 
1.6.6.1.476.g01ddb
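
The read-then-cmpxchg8b retry loop used by addsub_return and incdec_return above can be sketched in portable C. This is only an illustration under stated assumptions: it uses C11 <stdatomic.h> in userspace rather than the kernel's atomic64_t, and the names sketch_atomic64_t and sketch_add_return are hypothetical, not part of the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for atomic64_t; not the kernel type. */
typedef struct { _Atomic int64_t counter; } sketch_atomic64_t;

/*
 * Same shape as the assembly loop: read the current value, compute
 * the sum in separate storage, then try to publish it with a
 * compare-and-swap. On failure, old is refreshed with the value
 * found in memory (as cmpxchg8b does with edx:eax) and we retry.
 */
static int64_t sketch_add_return(int64_t a, sketch_atomic64_t *v)
{
	int64_t old = atomic_load(&v->counter);
	int64_t sum;

	do {
		/* Unsigned arithmetic so the sum wraps like addl/adcl. */
		sum = (int64_t)((uint64_t)old + (uint64_t)a);
	} while (!atomic_compare_exchange_weak(&v->counter, &old, sum));

	return sum;
}
```

As in the assembly, the value returned is the one computed locally, so no second read of memory is needed after the exchange succeeds.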



* Re: [PATCH 0/5] x86-32: improve atomic64_t functions (v2)
  2010-02-19 17:26 [PATCH 0/5] x86-32: improve atomic64_t functions (v2) Luca Barbieri
                   ` (4 preceding siblings ...)
  2010-02-19 17:26 ` [PATCH 5/5] x86-32: rewrite 32-bit atomic64 functions in assembly (v2) Luca Barbieri
@ 2010-02-23 22:47 ` H. Peter Anvin
  2010-02-24  9:56   ` Luca Barbieri
  5 siblings, 1 reply; 15+ messages in thread
From: H. Peter Anvin @ 2010-02-23 22:47 UTC (permalink / raw)
  To: Luca Barbieri; +Cc: mingo, a.p.zijlstra, akpm, linux-kernel

Hi Luca,

I wonder if I could ask you to recreate your patchset on top of the
x86/asm branch in the -tip tree.  There are some nontrivial changes to
the alternatives mechanism, plus a restructuring of the atomic headers
which both conflict with this patchset.

The -tip tree is available from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [PATCH 0/5] x86-32: improve atomic64_t functions (v2)
  2010-02-23 22:47 ` [PATCH 0/5] x86-32: improve atomic64_t functions (v2) H. Peter Anvin
@ 2010-02-24  9:56   ` Luca Barbieri
  2010-02-26 10:14     ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Luca Barbieri @ 2010-02-24  9:56 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: mingo, a.p.zijlstra, akpm, linux-kernel

> I wonder if I could ask you to recreate your patchset on top of the
> x86/asm branch in the -tip tree.

Done, resent.


* Re: [PATCH 0/5] x86-32: improve atomic64_t functions (v2)
  2010-02-24  9:56   ` Luca Barbieri
@ 2010-02-26 10:14     ` Ingo Molnar
  2010-02-26 11:08       ` Luca Barbieri
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2010-02-26 10:14 UTC (permalink / raw)
  To: Luca Barbieri; +Cc: H. Peter Anvin, a.p.zijlstra, akpm, linux-kernel


* Luca Barbieri <luca@luca-barbieri.com> wrote:

> > I wonder if I could ask you to recreate your patchset on top of the
> > x86/asm branch in the -tip tree.
> 
> Done, resent.

FYI, it triggered build failures in -tip testing:

lib/atomic64_test.c: In function 'test_atomic64':
lib/atomic64_test.c:116: error: implicit declaration of function 'atomic64_dec_if_positive'

	Ingo


* Re: [PATCH 0/5] x86-32: improve atomic64_t functions (v2)
  2010-02-26 10:14     ` Ingo Molnar
@ 2010-02-26 11:08       ` Luca Barbieri
  2010-02-26 11:23         ` Luca Barbieri
  0 siblings, 1 reply; 15+ messages in thread
From: Luca Barbieri @ 2010-02-26 11:08 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: H. Peter Anvin, a.p.zijlstra, akpm, linux-kernel

> FYI, it triggered build failures in -tip testing:
>
> lib/atomic64_test.c: In function 'test_atomic64':
> lib/atomic64_test.c:116: error: implicit declaration of function 'atomic64_dec_if_positive'

This was on x86-64, right?

That function is implemented in the generic atomic64 implementation
and my x86-32 version, but not in the x86-64 implementation.

There is a similar problem with the 32-bit atomic_dec_if_positive, which
is implemented by ppc, mips, microblaze and avr32 but not by x86-32
and asm-generic.
Currently the 64-bit version seems unused, while the 32-bit one seems
to be only used by ppc-only drivers (IBM pSeries virtual SCSI and
PlayStation3 drivers).

I'll send a couple of patches to fix this shortly.
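
For reference, the behaviour of the cx8 version of dec_if_positive in this patch can be sketched in portable C. This is an illustration only, assuming C11 <stdatomic.h> in userspace; sketch_atomic64_t and sketch_dec_if_positive are hypothetical names, not kernel API.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for atomic64_t; not the kernel type. */
typedef struct { _Atomic int64_t counter; } sketch_atomic64_t;

/*
 * Decrement *v only if the result would not be negative. The
 * decremented value is returned either way, so a negative return
 * means the counter was left untouched.
 */
static int64_t sketch_dec_if_positive(sketch_atomic64_t *v)
{
	int64_t old = atomic_load(&v->counter);
	int64_t dec;

	do {
		/* Unsigned arithmetic so the result wraps like subl/sbbl. */
		dec = (int64_t)((uint64_t)old - 1);
		if (dec < 0)
			break;	/* would go negative: do not store */
	} while (!atomic_compare_exchange_weak(&v->counter, &old, dec));

	return dec;
}
```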


* Re: [PATCH 0/5] x86-32: improve atomic64_t functions (v2)
  2010-02-26 11:08       ` Luca Barbieri
@ 2010-02-26 11:23         ` Luca Barbieri
  2010-03-01  7:35           ` H. Peter Anvin
  0 siblings, 1 reply; 15+ messages in thread
From: Luca Barbieri @ 2010-02-26 11:23 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: H. Peter Anvin, a.p.zijlstra, akpm, linux-kernel

Sent patches, both to conditionally perform the test and implement the
functions for x86 and x86-64.


* Re: [PATCH 0/5] x86-32: improve atomic64_t functions (v2)
  2010-02-26 11:23         ` Luca Barbieri
@ 2010-03-01  7:35           ` H. Peter Anvin
  2010-03-01  8:49             ` Paul Mackerras
  2010-03-01 17:16             ` Luca Barbieri
  0 siblings, 2 replies; 15+ messages in thread
From: H. Peter Anvin @ 2010-03-01  7:35 UTC (permalink / raw)
  To: Luca Barbieri, Paul Mackerras
  Cc: Ingo Molnar, a.p.zijlstra, akpm, linux-kernel

On 02/26/2010 03:23 AM, Luca Barbieri wrote:
> Sent patches, both to conditionally perform the test and implement the
> functions for x86 and x86-64.

Yes, and with the test turned on, the kernel crashes immediately on boot
on x86-64.

Some minor investigation reveals the following:

lib/atomic64.c has the wrong return value for atomic64_add_unless().
With "wrong" I mean it is the opposite sense compared to
atomic_add_unless(), not just on x86 but on all architectures.

Accordingly, I have to conclude that lib/atomic64.c is buggy, and that
since your test matches that bug, I will have to conclude that your
x86-32 implementation is also buggy.  Thus, please send patches to fix
your test and your 32-bit implementations (and preferably
lib/atomic64.c too, but I can do that just fine.)
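
To make the convention concrete, here is a minimal sketch of the sense atomic_add_unless() uses, where a non-zero return means the add was performed. It assumes C11 <stdatomic.h> in userspace; sketch_atomic64_t and sketch_add_unless are hypothetical names and this is not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for atomic64_t; not the kernel type. */
typedef struct { _Atomic int64_t counter; } sketch_atomic64_t;

/*
 * Add a to *v unless *v == u.
 * Returns non-zero if the add was performed, zero if *v was u.
 */
static int sketch_add_unless(sketch_atomic64_t *v, int64_t a, int64_t u)
{
	int64_t old = atomic_load(&v->counter);

	while (old != u) {
		/* Unsigned arithmetic so the sum wraps instead of overflowing. */
		int64_t sum = (int64_t)((uint64_t)old + (uint64_t)a);

		/* On failure, old is reloaded and the test against u repeats. */
		if (atomic_compare_exchange_weak(&v->counter, &old, sum))
			return 1;	/* add was performed */
	}
	return 0;	/* value was u: nothing added */
}
```

An implementation with the opposite sense would return 1 in the branch where nothing is added, which is the behaviour described as wrong here.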

Cc: Paul Mackerras who did the generic atomic64_t implementation for
verification that this is indeed a bug.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [PATCH 0/5] x86-32: improve atomic64_t functions (v2)
  2010-03-01  7:35           ` H. Peter Anvin
@ 2010-03-01  8:49             ` Paul Mackerras
  2010-03-01 17:16             ` Luca Barbieri
  1 sibling, 0 replies; 15+ messages in thread
From: Paul Mackerras @ 2010-03-01  8:49 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Luca Barbieri, Ingo Molnar, a.p.zijlstra, akpm, linux-kernel

On Sun, Feb 28, 2010 at 11:35:31PM -0800, H. Peter Anvin wrote:

> On 02/26/2010 03:23 AM, Luca Barbieri wrote:
> > Sent patches, both to conditionally perform the test and implement the
> > functions for x86 and x86-64.
> 
> Yes, and with the test turned on, the kernel crashes immediately on boot
> on x86-64.
> 
> Some minor investigation reveals the following:
> 
> lib/atomic64.c has the wrong return value for atomic64_add_unless().
> With "wrong" I mean it is the opposite sense compared to
> atomic_add_unless(), not just on x86 but on all architectures.
> 
> Accordingly, I have to conclude that lib/atomic64.c is buggy, and that
> since your test matches that bug, I will have to conclude that your
> x86-32 implementation is also buggy.  Thus, please send patches to fix
> your test and your 32-bit implementations (and preferably
> lib/atomic64.c too, but I can do that just fine.)
> 
> Cc: Paul Mackerras who did the generic atomic64_t implementation for
> verification that this is indeed a bug.

Yes, it sure looks like it.  *blush*

Paul.


* Re: [PATCH 0/5] x86-32: improve atomic64_t functions (v2)
  2010-03-01  7:35           ` H. Peter Anvin
  2010-03-01  8:49             ` Paul Mackerras
@ 2010-03-01 17:16             ` Luca Barbieri
  2010-03-01 17:31               ` Luca Barbieri
  1 sibling, 1 reply; 15+ messages in thread
From: Luca Barbieri @ 2010-03-01 17:16 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Paul Mackerras, Ingo Molnar, a.p.zijlstra, akpm, linux-kernel

> Yes, and with the test turned on, the kernel crashes immediately on boot
> on x86-64.
>
> Some minor investigation reveals the following:
>
> lib/atomic64.c has the wrong return value for atomic64_add_unless().
> With "wrong" I mean it is the opposite sense compared to
> atomic_add_unless(), not just on x86 but on all architectures.
>
> Accordingly, I have to conclude that lib/atomic64.c is buggy, and that
> since your test matches that bug, I will have to conclude that your
> x86-32 implementation is also buggy.  Thus, please send patches to fix
> your test and your 32-bit implementations (and preferably
> lib/atomic64.c too, but I can do that just fine.)

You are right: sent a patchset to fix it.


* Re: [PATCH 0/5] x86-32: improve atomic64_t functions (v2)
  2010-03-01 17:16             ` Luca Barbieri
@ 2010-03-01 17:31               ` Luca Barbieri
  0 siblings, 0 replies; 15+ messages in thread
From: Luca Barbieri @ 2010-03-01 17:31 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Paul Mackerras, Ingo Molnar, a.p.zijlstra, akpm, linux-kernel

Upon further inspection, atomic64_inc_not_zero was broken too.

The generic implementation implements it in terms of
atomic64_add_unless and thus does not need a specific fix for it.

Sent another patchset to fix that.
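
The relationship between the two functions can be sketched as follows. This is an illustration assuming C11 <stdatomic.h> in userspace and the corrected return convention (non-zero when the add happened); sketch_atomic64_t, sketch_add_unless and sketch_inc_not_zero are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for atomic64_t; not the kernel type. */
typedef struct { _Atomic int64_t counter; } sketch_atomic64_t;

/* Adds a to *v unless *v == u; returns non-zero if the add happened. */
static int sketch_add_unless(sketch_atomic64_t *v, int64_t a, int64_t u)
{
	int64_t old = atomic_load(&v->counter);

	while (old != u) {
		int64_t sum = (int64_t)((uint64_t)old + (uint64_t)a);

		/* On failure, old is reloaded and the test is repeated. */
		if (atomic_compare_exchange_weak(&v->counter, &old, sum))
			return 1;
	}
	return 0;
}

/* inc_not_zero is add_unless with a = 1 and u = 0. */
static int sketch_inc_not_zero(sketch_atomic64_t *v)
{
	return sketch_add_unless(v, 1, 0);
}
```

Because the wrapper only forwards its arguments, fixing the return sense of add_unless fixes inc_not_zero as well, while a hand-written assembly inc_not_zero needs its own fix.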


Thread overview: 15+ messages
2010-02-19 17:26 [PATCH 0/5] x86-32: improve atomic64_t functions (v2) Luca Barbieri
2010-02-19 17:26 ` [PATCH 1/5] x86: add support for relative CALL and JMP in alternatives (v2) Luca Barbieri
2010-02-19 17:26 ` [PATCH 2/5] x86: add support for lock prefix " Luca Barbieri
2010-02-19 17:26 ` [PATCH 3/5] x86-32: allow UP/SMP lock replacement in cmpxchg64 (v2) Luca Barbieri
2010-02-19 17:26 ` [PATCH 4/5] lib: add self-test for atomic64_t Luca Barbieri
2010-02-19 17:26 ` [PATCH 5/5] x86-32: rewrite 32-bit atomic64 functions in assembly (v2) Luca Barbieri
2010-02-23 22:47 ` [PATCH 0/5] x86-32: improve atomic64_t functions (v2) H. Peter Anvin
2010-02-24  9:56   ` Luca Barbieri
2010-02-26 10:14     ` Ingo Molnar
2010-02-26 11:08       ` Luca Barbieri
2010-02-26 11:23         ` Luca Barbieri
2010-03-01  7:35           ` H. Peter Anvin
2010-03-01  8:49             ` Paul Mackerras
2010-03-01 17:16             ` Luca Barbieri
2010-03-01 17:31               ` Luca Barbieri
