[PATCH 0/4] Fix atomic operations so that atomic64

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/4] Fix atomic operations so that atomic64_test passes on ARM [V2]
@ 2010-06-30 14:04 Will Deacon
  2010-06-30 14:04 ` [PATCH 1/4] ARM: atomic ops: fix register constraints for atomic64_add_unless Will Deacon
  0 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2010-06-30 14:04 UTC (permalink / raw)
  To: linux-arm-kernel

This is version 2 of the patches originally posted here:

http://lists.infradead.org/pipermail/linux-arm-kernel/2010-June/018742.html

There are quite a few changes, mainly because I now understand the compiler
bug I was hitting previously.

A recently introduced Kernel feature is the atomic64_test on boot.
ARM fails these tests because the compiler is not correctly informed about
updates occuring to memory and is at liberty to constant-fold atomic64_t
variables.

This patch series fixes some problems with the current atomic64 operations,
adds the correct memory constraints to all of the atomic operations and finally
enables all of the atomic64 tests for ARM. I had to speak with the GCC guys at
ARM to get the memory constraints right in patch 3/4. The problem with the "m"
constraint likely affects other parts of the Kernel, for example, kprobes-decode
seems to have multiple uses of an "m" operand within a single block of inline
asm.

Cc: Nigel Stephens <nigel.stephens@arm.com>
Cc: Nicolas Pitre <nico@fluxnic.net>

Will Deacon (4):
  ARM: atomic ops: fix register constraints for atomic64_add_unless
  ARM: atomic ops: reduce critical region in atomic64_cmpxchg
  ARM: atomic ops: add memory constraints to inline asm
  ARM: atomic64_test: add ARM as supported architecture

 arch/arm/include/asm/atomic.h |  132 ++++++++++++++++++++--------------------
 lib/atomic64_test.c           |    2 +-
 2 files changed, 67 insertions(+), 67 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/4] ARM: atomic ops: fix register constraints for atomic64_add_unless
  2010-06-30 14:04 [PATCH 0/4] Fix atomic operations so that atomic64_test passes on ARM [V2] Will Deacon
@ 2010-06-30 14:04 ` Will Deacon
  2010-06-30 14:04   ` [PATCH 2/4] ARM: atomic ops: reduce critical region in atomic64_cmpxchg Will Deacon
  2010-07-08  4:37   ` [PATCH 1/4] ARM: atomic ops: fix register constraints for atomic64_add_unless Nicolas Pitre
  0 siblings, 2 replies; 15+ messages in thread
From: Will Deacon @ 2010-06-30 14:04 UTC (permalink / raw)
  To: linux-arm-kernel

The atomic64_add_unless function compares an atomic variable with
a given value and, if they are not equal, adds another given value
to the atomic variable. The function returns zero if the addition
did not occur and non-zero otherwise.

On ARM, the return value is initialised to 1 in C code. Inline assembly
code then performs the atomic64_add_unless operation, setting the
return value to 0 iff the addition does not occur. This means that
when the addition *does* occur, the value of ret must be preserved
across the inline assembly and therefore requires a "+r" constraint
rather than the current one of "=&r".

Thanks to Nicolas Pitre for helping to spot this.

Cc: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/atomic.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
index a0162fa..e9e56c0 100644
--- a/arch/arm/include/asm/atomic.h
+++ b/arch/arm/include/asm/atomic.h
@@ -440,7 +440,7 @@ static inline int atomic64_add_unless(atomic64_t *v, u64 a, u64 u)
 "	teq	%2, #0\n"
 "	bne	1b\n"
 "2:"
-	: "=&r" (val), "=&r" (ret), "=&r" (tmp)
+	: "=&r" (val), "+r" (ret), "=&r" (tmp)
 	: "r" (&v->counter), "r" (u), "r" (a)
 	: "cc");

-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] ARM: atomic ops: reduce critical region in atomic64_cmpxchg
  2010-06-30 14:04 ` [PATCH 1/4] ARM: atomic ops: fix register constraints for atomic64_add_unless Will Deacon
@ 2010-06-30 14:04   ` Will Deacon
  2010-06-30 14:04     ` [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm Will Deacon
  2010-07-08  4:49     ` [PATCH 2/4] ARM: atomic ops: reduce critical region in atomic64_cmpxchg Nicolas Pitre
  2010-07-08  4:37   ` [PATCH 1/4] ARM: atomic ops: fix register constraints for atomic64_add_unless Nicolas Pitre
  1 sibling, 2 replies; 15+ messages in thread
From: Will Deacon @ 2010-06-30 14:04 UTC (permalink / raw)
  To: linux-arm-kernel

In order to reduce the likelihood of failure in a load/store
exclusive block, the number of intervening instructions should
be kept to a minimum.

This patch hoists a mov operation out of the critical region
in atomic64_cmpxchg.

Cc: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/atomic.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
index e9e56c0..4f0f282 100644
--- a/arch/arm/include/asm/atomic.h
+++ b/arch/arm/include/asm/atomic.h
@@ -358,8 +358,8 @@ static inline u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old, u64 new)
 
 	do {
 		__asm__ __volatile__("@ atomic64_cmpxchg\n"
-		"ldrexd		%1, %H1, [%2]\n"
 		"mov		%0, #0\n"
+		"ldrexd		%1, %H1, [%2]\n"
 		"teq		%1, %3\n"
 		"teqeq		%H1, %H3\n"
 		"strexdeq	%0, %4, %H4, [%2]"
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm
  2010-06-30 14:04   ` [PATCH 2/4] ARM: atomic ops: reduce critical region in atomic64_cmpxchg Will Deacon
@ 2010-06-30 14:04     ` Will Deacon
  2010-06-30 14:04       ` [PATCH 4/4] ARM: atomic64_test: add ARM as supported architecture Will Deacon
                         ` (3 more replies)
  2010-07-08  4:49     ` [PATCH 2/4] ARM: atomic ops: reduce critical region in atomic64_cmpxchg Nicolas Pitre
  1 sibling, 4 replies; 15+ messages in thread
From: Will Deacon @ 2010-06-30 14:04 UTC (permalink / raw)
  To: linux-arm-kernel

Currently, the 32-bit and 64-bit atomic operations on ARM do not
include memory constraints in the inline assembly blocks. In the
case of barrier-less operations [for example, atomic_add], this
means that the compiler may constant fold values which have actually
been modified by a call to an atomic operation.

This issue can be observed in the atomic64_test routine in
<kernel root>/lib/atomic64_test.c:

00000000 <test_atomic64>:
   0:	e1a0c00d 	mov	ip, sp
   4:	e92dd830 	push	{r4, r5, fp, ip, lr, pc}
   8:	e24cb004 	sub	fp, ip, #4
   c:	e24dd008 	sub	sp, sp, #8
  10:	e24b3014 	sub	r3, fp, #20
  14:	e30d000d 	movw	r0, #53261	; 0xd00d
  18:	e3011337 	movw	r1, #4919	; 0x1337
  1c:	e34c0001 	movt	r0, #49153	; 0xc001
  20:	e34a1aa3 	movt	r1, #43683	; 0xaaa3
  24:	e16300f8 	strd	r0, [r3, #-8]!
  28:	e30c0afe 	movw	r0, #51966	; 0xcafe
  2c:	e30b1eef 	movw	r1, #48879	; 0xbeef
  30:	e34d0eaf 	movt	r0, #57007	; 0xdeaf
  34:	e34d1ead 	movt	r1, #57005	; 0xdead
  38:	e1b34f9f 	ldrexd	r4, [r3]
  3c:	e1a34f90 	strexd	r4, r0, [r3]
  40:	e3340000 	teq	r4, #0
  44:	1afffffb 	bne	38 <test_atomic64+0x38>
  48:	e59f0004 	ldr	r0, [pc, #4]	; 54 <test_atomic64+0x54>
  4c:	e3a0101e 	mov	r1, #30
  50:	ebfffffe 	bl	0 <__bug>
  54:	00000000 	.word	0x00000000

The atomic64_set (0x38-0x44) writes to the atomic64_t, but the
compiler doesn't see this, assumes the test condition is always
false and generates an unconditional branch to __bug. The rest of the
test is optimised away.

This patch adds suitable memory constraints to the atomic operations on ARM
to ensure that the compiler is informed of the correct data hazards. We have
to use the "Qo" constraints to avoid hitting the GCC anomaly described at
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44492 , where the compiler
makes assumptions about the writeback in the addressing mode used by the
inline assembly. These constraints forbid the use of auto{inc,dec} addressing
modes, so it doesn't matter if we don't use the operand exactly once.

Cc: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/atomic.h |  132 ++++++++++++++++++++--------------------
 1 files changed, 66 insertions(+), 66 deletions(-)

diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
index 4f0f282..99ab659 100644
--- a/arch/arm/include/asm/atomic.h
+++ b/arch/arm/include/asm/atomic.h
@@ -40,12 +40,12 @@ static inline void atomic_add(int i, atomic_t *v)
 	int result;
 
 	__asm__ __volatile__("@ atomic_add\n"
-"1:	ldrex	%0, [%2]\n"
-"	add	%0, %0, %3\n"
-"	strex	%1, %0, [%2]\n"
+"1:	ldrex	%0, [%3]\n"
+"	add	%0, %0, %4\n"
+"	strex	%1, %0, [%3]\n"
 "	teq	%1, #0\n"
 "	bne	1b"
-	: "=&r" (result), "=&r" (tmp)
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
 	: "r" (&v->counter), "Ir" (i)
 	: "cc");
 }
@@ -58,12 +58,12 @@ static inline int atomic_add_return(int i, atomic_t *v)
 	smp_mb();
 
 	__asm__ __volatile__("@ atomic_add_return\n"
-"1:	ldrex	%0, [%2]\n"
-"	add	%0, %0, %3\n"
-"	strex	%1, %0, [%2]\n"
+"1:	ldrex	%0, [%3]\n"
+"	add	%0, %0, %4\n"
+"	strex	%1, %0, [%3]\n"
 "	teq	%1, #0\n"
 "	bne	1b"
-	: "=&r" (result), "=&r" (tmp)
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
 	: "r" (&v->counter), "Ir" (i)
 	: "cc");
 
@@ -78,12 +78,12 @@ static inline void atomic_sub(int i, atomic_t *v)
 	int result;
 
 	__asm__ __volatile__("@ atomic_sub\n"
-"1:	ldrex	%0, [%2]\n"
-"	sub	%0, %0, %3\n"
-"	strex	%1, %0, [%2]\n"
+"1:	ldrex	%0, [%3]\n"
+"	sub	%0, %0, %4\n"
+"	strex	%1, %0, [%3]\n"
 "	teq	%1, #0\n"
 "	bne	1b"
-	: "=&r" (result), "=&r" (tmp)
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
 	: "r" (&v->counter), "Ir" (i)
 	: "cc");
 }
@@ -96,12 +96,12 @@ static inline int atomic_sub_return(int i, atomic_t *v)
 	smp_mb();
 
 	__asm__ __volatile__("@ atomic_sub_return\n"
-"1:	ldrex	%0, [%2]\n"
-"	sub	%0, %0, %3\n"
-"	strex	%1, %0, [%2]\n"
+"1:	ldrex	%0, [%3]\n"
+"	sub	%0, %0, %4\n"
+"	strex	%1, %0, [%3]\n"
 "	teq	%1, #0\n"
 "	bne	1b"
-	: "=&r" (result), "=&r" (tmp)
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
 	: "r" (&v->counter), "Ir" (i)
 	: "cc");
 
@@ -118,11 +118,11 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
 
 	do {
 		__asm__ __volatile__("@ atomic_cmpxchg\n"
-		"ldrex	%1, [%2]\n"
+		"ldrex	%1, [%3]\n"
 		"mov	%0, #0\n"
-		"teq	%1, %3\n"
-		"strexeq %0, %4, [%2]\n"
-		    : "=&r" (res), "=&r" (oldval)
+		"teq	%1, %4\n"
+		"strexeq %0, %5, [%3]\n"
+		    : "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
 		    : "r" (&ptr->counter), "Ir" (old), "r" (new)
 		    : "cc");
 	} while (res);
@@ -137,12 +137,12 @@ static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
 	unsigned long tmp, tmp2;
 
 	__asm__ __volatile__("@ atomic_clear_mask\n"
-"1:	ldrex	%0, [%2]\n"
-"	bic	%0, %0, %3\n"
-"	strex	%1, %0, [%2]\n"
+"1:	ldrex	%0, [%3]\n"
+"	bic	%0, %0, %4\n"
+"	strex	%1, %0, [%3]\n"
 "	teq	%1, #0\n"
 "	bne	1b"
-	: "=&r" (tmp), "=&r" (tmp2)
+	: "=&r" (tmp), "=&r" (tmp2), "+Qo" (*addr)
 	: "r" (addr), "Ir" (mask)
 	: "cc");
 }
@@ -249,7 +249,7 @@ static inline u64 atomic64_read(atomic64_t *v)
 	__asm__ __volatile__("@ atomic64_read\n"
 "	ldrexd	%0, %H0, [%1]"
 	: "=&r" (result)
-	: "r" (&v->counter)
+	: "r" (&v->counter), "Qo" (v->counter)
 	);
 
 	return result;
@@ -260,11 +260,11 @@ static inline void atomic64_set(atomic64_t *v, u64 i)
 	u64 tmp;
 
 	__asm__ __volatile__("@ atomic64_set\n"
-"1:	ldrexd	%0, %H0, [%1]\n"
-"	strexd	%0, %2, %H2, [%1]\n"
+"1:	ldrexd	%0, %H0, [%2]\n"
+"	strexd	%0, %3, %H3, [%2]\n"
 "	teq	%0, #0\n"
 "	bne	1b"
-	: "=&r" (tmp)
+	: "=&r" (tmp), "=Qo" (v->counter)
 	: "r" (&v->counter), "r" (i)
 	: "cc");
 }
@@ -275,13 +275,13 @@ static inline void atomic64_add(u64 i, atomic64_t *v)
 	unsigned long tmp;
 
 	__asm__ __volatile__("@ atomic64_add\n"
-"1:	ldrexd	%0, %H0, [%2]\n"
-"	adds	%0, %0, %3\n"
-"	adc	%H0, %H0, %H3\n"
-"	strexd	%1, %0, %H0, [%2]\n"
+"1:	ldrexd	%0, %H0, [%3]\n"
+"	adds	%0, %0, %4\n"
+"	adc	%H0, %H0, %H4\n"
+"	strexd	%1, %0, %H0, [%3]\n"
 "	teq	%1, #0\n"
 "	bne	1b"
-	: "=&r" (result), "=&r" (tmp)
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
 	: "r" (&v->counter), "r" (i)
 	: "cc");
 }
@@ -294,13 +294,13 @@ static inline u64 atomic64_add_return(u64 i, atomic64_t *v)
 	smp_mb();
 
 	__asm__ __volatile__("@ atomic64_add_return\n"
-"1:	ldrexd	%0, %H0, [%2]\n"
-"	adds	%0, %0, %3\n"
-"	adc	%H0, %H0, %H3\n"
-"	strexd	%1, %0, %H0, [%2]\n"
+"1:	ldrexd	%0, %H0, [%3]\n"
+"	adds	%0, %0, %4\n"
+"	adc	%H0, %H0, %H4\n"
+"	strexd	%1, %0, %H0, [%3]\n"
 "	teq	%1, #0\n"
 "	bne	1b"
-	: "=&r" (result), "=&r" (tmp)
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
 	: "r" (&v->counter), "r" (i)
 	: "cc");
 
@@ -315,13 +315,13 @@ static inline void atomic64_sub(u64 i, atomic64_t *v)
 	unsigned long tmp;
 
 	__asm__ __volatile__("@ atomic64_sub\n"
-"1:	ldrexd	%0, %H0, [%2]\n"
-"	subs	%0, %0, %3\n"
-"	sbc	%H0, %H0, %H3\n"
-"	strexd	%1, %0, %H0, [%2]\n"
+"1:	ldrexd	%0, %H0, [%3]\n"
+"	subs	%0, %0, %4\n"
+"	sbc	%H0, %H0, %H4\n"
+"	strexd	%1, %0, %H0, [%3]\n"
 "	teq	%1, #0\n"
 "	bne	1b"
-	: "=&r" (result), "=&r" (tmp)
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
 	: "r" (&v->counter), "r" (i)
 	: "cc");
 }
@@ -334,13 +334,13 @@ static inline u64 atomic64_sub_return(u64 i, atomic64_t *v)
 	smp_mb();
 
 	__asm__ __volatile__("@ atomic64_sub_return\n"
-"1:	ldrexd	%0, %H0, [%2]\n"
-"	subs	%0, %0, %3\n"
-"	sbc	%H0, %H0, %H3\n"
-"	strexd	%1, %0, %H0, [%2]\n"
+"1:	ldrexd	%0, %H0, [%3]\n"
+"	subs	%0, %0, %4\n"
+"	sbc	%H0, %H0, %H4\n"
+"	strexd	%1, %0, %H0, [%3]\n"
 "	teq	%1, #0\n"
 "	bne	1b"
-	: "=&r" (result), "=&r" (tmp)
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
 	: "r" (&v->counter), "r" (i)
 	: "cc");
 
@@ -359,11 +359,11 @@ static inline u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old, u64 new)
 	do {
 		__asm__ __volatile__("@ atomic64_cmpxchg\n"
 		"mov		%0, #0\n"
-		"ldrexd		%1, %H1, [%2]\n"
-		"teq		%1, %3\n"
-		"teqeq		%H1, %H3\n"
-		"strexdeq	%0, %4, %H4, [%2]"
-		: "=&r" (res), "=&r" (oldval)
+		"ldrexd		%1, %H1, [%3]\n"
+		"teq		%1, %4\n"
+		"teqeq		%H1, %H4\n"
+		"strexdeq	%0, %5, %H5, [%3]"
+		: "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
 		: "r" (&ptr->counter), "r" (old), "r" (new)
 		: "cc");
 	} while (res);
@@ -381,11 +381,11 @@ static inline u64 atomic64_xchg(atomic64_t *ptr, u64 new)
 	smp_mb();
 
 	__asm__ __volatile__("@ atomic64_xchg\n"
-"1:	ldrexd	%0, %H0, [%2]\n"
-"	strexd	%1, %3, %H3, [%2]\n"
+"1:	ldrexd	%0, %H0, [%3]\n"
+"	strexd	%1, %4, %H4, [%3]\n"
 "	teq	%1, #0\n"
 "	bne	1b"
-	: "=&r" (result), "=&r" (tmp)
+	: "=&r" (result), "=&r" (tmp), "+Qo" (ptr->counter)
 	: "r" (&ptr->counter), "r" (new)
 	: "cc");
 
@@ -402,16 +402,16 @@ static inline u64 atomic64_dec_if_positive(atomic64_t *v)
 	smp_mb();
 
 	__asm__ __volatile__("@ atomic64_dec_if_positive\n"
-"1:	ldrexd	%0, %H0, [%2]\n"
+"1:	ldrexd	%0, %H0, [%3]\n"
 "	subs	%0, %0, #1\n"
 "	sbc	%H0, %H0, #0\n"
 "	teq	%H0, #0\n"
 "	bmi	2f\n"
-"	strexd	%1, %0, %H0, [%2]\n"
+"	strexd	%1, %0, %H0, [%3]\n"
 "	teq	%1, #0\n"
 "	bne	1b\n"
 "2:"
-	: "=&r" (result), "=&r" (tmp)
+	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
 	: "r" (&v->counter)
 	: "cc");
 
@@ -429,18 +429,18 @@ static inline int atomic64_add_unless(atomic64_t *v, u64 a, u64 u)
 	smp_mb();
 
 	__asm__ __volatile__("@ atomic64_add_unless\n"
-"1:	ldrexd	%0, %H0, [%3]\n"
-"	teq	%0, %4\n"
-"	teqeq	%H0, %H4\n"
+"1:	ldrexd	%0, %H0, [%4]\n"
+"	teq	%0, %5\n"
+"	teqeq	%H0, %H5\n"
 "	moveq	%1, #0\n"
 "	beq	2f\n"
-"	adds	%0, %0, %5\n"
-"	adc	%H0, %H0, %H5\n"
-"	strexd	%2, %0, %H0, [%3]\n"
+"	adds	%0, %0, %6\n"
+"	adc	%H0, %H0, %H6\n"
+"	strexd	%2, %0, %H0, [%4]\n"
 "	teq	%2, #0\n"
 "	bne	1b\n"
 "2:"
-	: "=&r" (val), "+r" (ret), "=&r" (tmp)
+	: "=&r" (val), "+r" (ret), "=&r" (tmp), "+Qo" (v->counter)
 	: "r" (&v->counter), "r" (u), "r" (a)
 	: "cc");
 
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/4] ARM: atomic64_test: add ARM as supported architecture
  2010-06-30 14:04     ` [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm Will Deacon
@ 2010-06-30 14:04       ` Will Deacon
  2010-07-08  5:50         ` Nicolas Pitre
  2010-07-07 16:42       ` [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm Will Deacon
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2010-06-30 14:04 UTC (permalink / raw)
  To: linux-arm-kernel

ARM has support for the atomic64_dec_if_positive operation
so ensure that it is tested by the atomic64_test routine.

Cc: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 lib/atomic64_test.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/atomic64_test.c b/lib/atomic64_test.c
index 250ed11..44524cc 100644
--- a/lib/atomic64_test.c
+++ b/lib/atomic64_test.c
@@ -114,7 +114,7 @@ static __init int test_atomic64(void)
 	BUG_ON(v.counter != r);
 
 #if defined(CONFIG_X86) || defined(CONFIG_MIPS) || defined(CONFIG_PPC) || \
-    defined(CONFIG_S390) || defined(_ASM_GENERIC_ATOMIC64_H)
+    defined(CONFIG_S390) || defined(_ASM_GENERIC_ATOMIC64_H) || defined(CONFIG_ARM)
 	INIT(onestwos);
 	BUG_ON(atomic64_dec_if_positive(&v) != (onestwos - 1));
 	r -= one;
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/4] ARM: atomic64_test: add ARM as supported architecture
  2010-06-30 14:04       ` [PATCH 4/4] ARM: atomic64_test: add ARM as supported architecture Will Deacon
@ 2010-07-08  5:50         ` Nicolas Pitre
  0 siblings, 0 replies; 15+ messages in thread
From: Nicolas Pitre @ 2010-07-08  5:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 30 Jun 2010, Will Deacon wrote:

> ARM has support for the atomic64_dec_if_positive operation
> so ensure that it is tested by the atomic64_test routine.
> 
> Cc: Nicolas Pitre <nico@fluxnic.net>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>

> ---
>  lib/atomic64_test.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/lib/atomic64_test.c b/lib/atomic64_test.c
> index 250ed11..44524cc 100644
> --- a/lib/atomic64_test.c
> +++ b/lib/atomic64_test.c
> @@ -114,7 +114,7 @@ static __init int test_atomic64(void)
>  	BUG_ON(v.counter != r);
>  
>  #if defined(CONFIG_X86) || defined(CONFIG_MIPS) || defined(CONFIG_PPC) || \
> -    defined(CONFIG_S390) || defined(_ASM_GENERIC_ATOMIC64_H)
> +    defined(CONFIG_S390) || defined(_ASM_GENERIC_ATOMIC64_H) || defined(CONFIG_ARM)
>  	INIT(onestwos);
>  	BUG_ON(atomic64_dec_if_positive(&v) != (onestwos - 1));
>  	r -= one;
> -- 
> 1.6.3.3
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm
  2010-06-30 14:04     ` [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm Will Deacon
  2010-06-30 14:04       ` [PATCH 4/4] ARM: atomic64_test: add ARM as supported architecture Will Deacon
@ 2010-07-07 16:42       ` Will Deacon
  2010-07-08  5:49       ` Nicolas Pitre
       [not found]       ` <004101cb1df3$5825ecd0$0871c670$%deacon@arm.com>
  3 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2010-07-07 16:42 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

> Subject: [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm
> 
> Currently, the 32-bit and 64-bit atomic operations on ARM do not
> include memory constraints in the inline assembly blocks. In the
> case of barrier-less operations [for example, atomic_add], this
> means that the compiler may constant fold values which have actually
> been modified by a call to an atomic operation.
> 
> This issue can be observed in the atomic64_test routine in
> <kernel root>/lib/atomic64_test.c:
> 
> 00000000 <test_atomic64>:
>    0:	e1a0c00d 	mov	ip, sp
>    4:	e92dd830 	push	{r4, r5, fp, ip, lr, pc}
>    8:	e24cb004 	sub	fp, ip, #4
>    c:	e24dd008 	sub	sp, sp, #8
>   10:	e24b3014 	sub	r3, fp, #20
>   14:	e30d000d 	movw	r0, #53261	; 0xd00d
>   18:	e3011337 	movw	r1, #4919	; 0x1337
>   1c:	e34c0001 	movt	r0, #49153	; 0xc001
>   20:	e34a1aa3 	movt	r1, #43683	; 0xaaa3
>   24:	e16300f8 	strd	r0, [r3, #-8]!
>   28:	e30c0afe 	movw	r0, #51966	; 0xcafe
>   2c:	e30b1eef 	movw	r1, #48879	; 0xbeef
>   30:	e34d0eaf 	movt	r0, #57007	; 0xdeaf
>   34:	e34d1ead 	movt	r1, #57005	; 0xdead
>   38:	e1b34f9f 	ldrexd	r4, [r3]
>   3c:	e1a34f90 	strexd	r4, r0, [r3]
>   40:	e3340000 	teq	r4, #0
>   44:	1afffffb 	bne	38 <test_atomic64+0x38>
>   48:	e59f0004 	ldr	r0, [pc, #4]	; 54 <test_atomic64+0x54>
>   4c:	e3a0101e 	mov	r1, #30
>   50:	ebfffffe 	bl	0 <__bug>
>   54:	00000000 	.word	0x00000000
> 
> The atomic64_set (0x38-0x44) writes to the atomic64_t, but the
> compiler doesn't see this, assumes the test condition is always
> false and generates an unconditional branch to __bug. The rest of the
> test is optimised away.
> 
> This patch adds suitable memory constraints to the atomic operations on ARM
> to ensure that the compiler is informed of the correct data hazards. We have
> to use the "Qo" constraints to avoid hitting the GCC anomaly described at
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44492 , where the compiler
> makes assumptions about the writeback in the addressing mode used by the
> inline assembly. These constraints forbid the use of auto{inc,dec} addressing
> modes, so it doesn't matter if we don't use the operand exactly once.
> 
> Cc: Nicolas Pitre <nico@fluxnic.net>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Does anybody have any feedback on this patch? The other three in the
series are straightforward, but I'd like another set of eyes to go over this
code [I know it makes for tough reading!] before submitted to the patch
system.

Any comments appreciated!

Cheers,

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm
  2010-06-30 14:04     ` [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm Will Deacon
  2010-06-30 14:04       ` [PATCH 4/4] ARM: atomic64_test: add ARM as supported architecture Will Deacon
  2010-07-07 16:42       ` [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm Will Deacon
@ 2010-07-08  5:49       ` Nicolas Pitre
  2010-07-08  9:36         ` Will Deacon
       [not found]         ` <004b01cb1e81$0b745960$225d0c20$%deacon@arm.com>
       [not found]       ` <004101cb1df3$5825ecd0$0871c670$%deacon@arm.com>
  3 siblings, 2 replies; 15+ messages in thread
From: Nicolas Pitre @ 2010-07-08  5:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 30 Jun 2010, Will Deacon wrote:

> Currently, the 32-bit and 64-bit atomic operations on ARM do not
> include memory constraints in the inline assembly blocks. In the
> case of barrier-less operations [for example, atomic_add], this
> means that the compiler may constant fold values which have actually
> been modified by a call to an atomic operation.
> 
> This issue can be observed in the atomic64_test routine in
> <kernel root>/lib/atomic64_test.c:
> 
> 00000000 <test_atomic64>:
>    0:	e1a0c00d 	mov	ip, sp
>    4:	e92dd830 	push	{r4, r5, fp, ip, lr, pc}
>    8:	e24cb004 	sub	fp, ip, #4
>    c:	e24dd008 	sub	sp, sp, #8
>   10:	e24b3014 	sub	r3, fp, #20
>   14:	e30d000d 	movw	r0, #53261	; 0xd00d
>   18:	e3011337 	movw	r1, #4919	; 0x1337
>   1c:	e34c0001 	movt	r0, #49153	; 0xc001
>   20:	e34a1aa3 	movt	r1, #43683	; 0xaaa3
>   24:	e16300f8 	strd	r0, [r3, #-8]!
>   28:	e30c0afe 	movw	r0, #51966	; 0xcafe
>   2c:	e30b1eef 	movw	r1, #48879	; 0xbeef
>   30:	e34d0eaf 	movt	r0, #57007	; 0xdeaf
>   34:	e34d1ead 	movt	r1, #57005	; 0xdead
>   38:	e1b34f9f 	ldrexd	r4, [r3]
>   3c:	e1a34f90 	strexd	r4, r0, [r3]
>   40:	e3340000 	teq	r4, #0
>   44:	1afffffb 	bne	38 <test_atomic64+0x38>
>   48:	e59f0004 	ldr	r0, [pc, #4]	; 54 <test_atomic64+0x54>
>   4c:	e3a0101e 	mov	r1, #30
>   50:	ebfffffe 	bl	0 <__bug>
>   54:	00000000 	.word	0x00000000
> 
> The atomic64_set (0x38-0x44) writes to the atomic64_t, but the
> compiler doesn't see this, assumes the test condition is always
> false and generates an unconditional branch to __bug. The rest of the
> test is optimised away.
> 
> This patch adds suitable memory constraints to the atomic operations on ARM
> to ensure that the compiler is informed of the correct data hazards. We have
> to use the "Qo" constraints to avoid hitting the GCC anomaly described at
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44492 , where the compiler
> makes assumptions about the writeback in the addressing mode used by the
> inline assembly. These constraints forbid the use of auto{inc,dec} addressing
> modes, so it doesn't matter if we don't use the operand exactly once.

Why do you use the "o" constraint?  That's for an offsetable 
memory reference.  Since we already use the actual address value with a 
register constraint, it would be more logical to simply use the "Q" 
constraint alone without any "o".  The gcc manual says:

|    `Q'
|          A memory reference where the exact address is in a single
|          register

Both "Qo" and "Q" provide the same wanted end result in this 
case.  But a quick test shows that "Q" produces exactly what we need,
more so than "Qo", because of the other operand which is the actual 
address (the same register is used in both cases).

In any case:

Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org>

Also, I'm assuming that such a constraint should be added to the 32 bit 
versions too for the same correctness reason?


> Cc: Nicolas Pitre <nico@fluxnic.net>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  arch/arm/include/asm/atomic.h |  132 ++++++++++++++++++++--------------------
>  1 files changed, 66 insertions(+), 66 deletions(-)
> 
> diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
> index 4f0f282..99ab659 100644
> --- a/arch/arm/include/asm/atomic.h
> +++ b/arch/arm/include/asm/atomic.h
> @@ -40,12 +40,12 @@ static inline void atomic_add(int i, atomic_t *v)
>  	int result;
>  
>  	__asm__ __volatile__("@ atomic_add\n"
> -"1:	ldrex	%0, [%2]\n"
> -"	add	%0, %0, %3\n"
> -"	strex	%1, %0, [%2]\n"
> +"1:	ldrex	%0, [%3]\n"
> +"	add	%0, %0, %4\n"
> +"	strex	%1, %0, [%3]\n"
>  "	teq	%1, #0\n"
>  "	bne	1b"
> -	: "=&r" (result), "=&r" (tmp)
> +	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
>  	: "r" (&v->counter), "Ir" (i)
>  	: "cc");
>  }
> @@ -58,12 +58,12 @@ static inline int atomic_add_return(int i, atomic_t *v)
>  	smp_mb();
>  
>  	__asm__ __volatile__("@ atomic_add_return\n"
> -"1:	ldrex	%0, [%2]\n"
> -"	add	%0, %0, %3\n"
> -"	strex	%1, %0, [%2]\n"
> +"1:	ldrex	%0, [%3]\n"
> +"	add	%0, %0, %4\n"
> +"	strex	%1, %0, [%3]\n"
>  "	teq	%1, #0\n"
>  "	bne	1b"
> -	: "=&r" (result), "=&r" (tmp)
> +	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
>  	: "r" (&v->counter), "Ir" (i)
>  	: "cc");
>  
> @@ -78,12 +78,12 @@ static inline void atomic_sub(int i, atomic_t *v)
>  	int result;
>  
>  	__asm__ __volatile__("@ atomic_sub\n"
> -"1:	ldrex	%0, [%2]\n"
> -"	sub	%0, %0, %3\n"
> -"	strex	%1, %0, [%2]\n"
> +"1:	ldrex	%0, [%3]\n"
> +"	sub	%0, %0, %4\n"
> +"	strex	%1, %0, [%3]\n"
>  "	teq	%1, #0\n"
>  "	bne	1b"
> -	: "=&r" (result), "=&r" (tmp)
> +	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
>  	: "r" (&v->counter), "Ir" (i)
>  	: "cc");
>  }
> @@ -96,12 +96,12 @@ static inline int atomic_sub_return(int i, atomic_t *v)
>  	smp_mb();
>  
>  	__asm__ __volatile__("@ atomic_sub_return\n"
> -"1:	ldrex	%0, [%2]\n"
> -"	sub	%0, %0, %3\n"
> -"	strex	%1, %0, [%2]\n"
> +"1:	ldrex	%0, [%3]\n"
> +"	sub	%0, %0, %4\n"
> +"	strex	%1, %0, [%3]\n"
>  "	teq	%1, #0\n"
>  "	bne	1b"
> -	: "=&r" (result), "=&r" (tmp)
> +	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
>  	: "r" (&v->counter), "Ir" (i)
>  	: "cc");
>  
> @@ -118,11 +118,11 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
>  
>  	do {
>  		__asm__ __volatile__("@ atomic_cmpxchg\n"
> -		"ldrex	%1, [%2]\n"
> +		"ldrex	%1, [%3]\n"
>  		"mov	%0, #0\n"
> -		"teq	%1, %3\n"
> -		"strexeq %0, %4, [%2]\n"
> -		    : "=&r" (res), "=&r" (oldval)
> +		"teq	%1, %4\n"
> +		"strexeq %0, %5, [%3]\n"
> +		    : "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
>  		    : "r" (&ptr->counter), "Ir" (old), "r" (new)
>  		    : "cc");
>  	} while (res);
> @@ -137,12 +137,12 @@ static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
>  	unsigned long tmp, tmp2;
>  
>  	__asm__ __volatile__("@ atomic_clear_mask\n"
> -"1:	ldrex	%0, [%2]\n"
> -"	bic	%0, %0, %3\n"
> -"	strex	%1, %0, [%2]\n"
> +"1:	ldrex	%0, [%3]\n"
> +"	bic	%0, %0, %4\n"
> +"	strex	%1, %0, [%3]\n"
>  "	teq	%1, #0\n"
>  "	bne	1b"
> -	: "=&r" (tmp), "=&r" (tmp2)
> +	: "=&r" (tmp), "=&r" (tmp2), "+Qo" (*addr)
>  	: "r" (addr), "Ir" (mask)
>  	: "cc");
>  }
> @@ -249,7 +249,7 @@ static inline u64 atomic64_read(atomic64_t *v)
>  	__asm__ __volatile__("@ atomic64_read\n"
>  "	ldrexd	%0, %H0, [%1]"
>  	: "=&r" (result)
> -	: "r" (&v->counter)
> +	: "r" (&v->counter), "Qo" (v->counter)
>  	);
>  
>  	return result;
> @@ -260,11 +260,11 @@ static inline void atomic64_set(atomic64_t *v, u64 i)
>  	u64 tmp;
>  
>  	__asm__ __volatile__("@ atomic64_set\n"
> -"1:	ldrexd	%0, %H0, [%1]\n"
> -"	strexd	%0, %2, %H2, [%1]\n"
> +"1:	ldrexd	%0, %H0, [%2]\n"
> +"	strexd	%0, %3, %H3, [%2]\n"
>  "	teq	%0, #0\n"
>  "	bne	1b"
> -	: "=&r" (tmp)
> +	: "=&r" (tmp), "=Qo" (v->counter)
>  	: "r" (&v->counter), "r" (i)
>  	: "cc");
>  }
> @@ -275,13 +275,13 @@ static inline void atomic64_add(u64 i, atomic64_t *v)
>  	unsigned long tmp;
>  
>  	__asm__ __volatile__("@ atomic64_add\n"
> -"1:	ldrexd	%0, %H0, [%2]\n"
> -"	adds	%0, %0, %3\n"
> -"	adc	%H0, %H0, %H3\n"
> -"	strexd	%1, %0, %H0, [%2]\n"
> +"1:	ldrexd	%0, %H0, [%3]\n"
> +"	adds	%0, %0, %4\n"
> +"	adc	%H0, %H0, %H4\n"
> +"	strexd	%1, %0, %H0, [%3]\n"
>  "	teq	%1, #0\n"
>  "	bne	1b"
> -	: "=&r" (result), "=&r" (tmp)
> +	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
>  	: "r" (&v->counter), "r" (i)
>  	: "cc");
>  }
> @@ -294,13 +294,13 @@ static inline u64 atomic64_add_return(u64 i, atomic64_t *v)
>  	smp_mb();
>  
>  	__asm__ __volatile__("@ atomic64_add_return\n"
> -"1:	ldrexd	%0, %H0, [%2]\n"
> -"	adds	%0, %0, %3\n"
> -"	adc	%H0, %H0, %H3\n"
> -"	strexd	%1, %0, %H0, [%2]\n"
> +"1:	ldrexd	%0, %H0, [%3]\n"
> +"	adds	%0, %0, %4\n"
> +"	adc	%H0, %H0, %H4\n"
> +"	strexd	%1, %0, %H0, [%3]\n"
>  "	teq	%1, #0\n"
>  "	bne	1b"
> -	: "=&r" (result), "=&r" (tmp)
> +	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
>  	: "r" (&v->counter), "r" (i)
>  	: "cc");
>  
> @@ -315,13 +315,13 @@ static inline void atomic64_sub(u64 i, atomic64_t *v)
>  	unsigned long tmp;
>  
>  	__asm__ __volatile__("@ atomic64_sub\n"
> -"1:	ldrexd	%0, %H0, [%2]\n"
> -"	subs	%0, %0, %3\n"
> -"	sbc	%H0, %H0, %H3\n"
> -"	strexd	%1, %0, %H0, [%2]\n"
> +"1:	ldrexd	%0, %H0, [%3]\n"
> +"	subs	%0, %0, %4\n"
> +"	sbc	%H0, %H0, %H4\n"
> +"	strexd	%1, %0, %H0, [%3]\n"
>  "	teq	%1, #0\n"
>  "	bne	1b"
> -	: "=&r" (result), "=&r" (tmp)
> +	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
>  	: "r" (&v->counter), "r" (i)
>  	: "cc");
>  }
> @@ -334,13 +334,13 @@ static inline u64 atomic64_sub_return(u64 i, atomic64_t *v)
>  	smp_mb();
>  
>  	__asm__ __volatile__("@ atomic64_sub_return\n"
> -"1:	ldrexd	%0, %H0, [%2]\n"
> -"	subs	%0, %0, %3\n"
> -"	sbc	%H0, %H0, %H3\n"
> -"	strexd	%1, %0, %H0, [%2]\n"
> +"1:	ldrexd	%0, %H0, [%3]\n"
> +"	subs	%0, %0, %4\n"
> +"	sbc	%H0, %H0, %H4\n"
> +"	strexd	%1, %0, %H0, [%3]\n"
>  "	teq	%1, #0\n"
>  "	bne	1b"
> -	: "=&r" (result), "=&r" (tmp)
> +	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
>  	: "r" (&v->counter), "r" (i)
>  	: "cc");
>  
> @@ -359,11 +359,11 @@ static inline u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old, u64 new)
>  	do {
>  		__asm__ __volatile__("@ atomic64_cmpxchg\n"
>  		"mov		%0, #0\n"
> -		"ldrexd		%1, %H1, [%2]\n"
> -		"teq		%1, %3\n"
> -		"teqeq		%H1, %H3\n"
> -		"strexdeq	%0, %4, %H4, [%2]"
> -		: "=&r" (res), "=&r" (oldval)
> +		"ldrexd		%1, %H1, [%3]\n"
> +		"teq		%1, %4\n"
> +		"teqeq		%H1, %H4\n"
> +		"strexdeq	%0, %5, %H5, [%3]"
> +		: "=&r" (res), "=&r" (oldval), "+Qo" (ptr->counter)
>  		: "r" (&ptr->counter), "r" (old), "r" (new)
>  		: "cc");
>  	} while (res);
> @@ -381,11 +381,11 @@ static inline u64 atomic64_xchg(atomic64_t *ptr, u64 new)
>  	smp_mb();
>  
>  	__asm__ __volatile__("@ atomic64_xchg\n"
> -"1:	ldrexd	%0, %H0, [%2]\n"
> -"	strexd	%1, %3, %H3, [%2]\n"
> +"1:	ldrexd	%0, %H0, [%3]\n"
> +"	strexd	%1, %4, %H4, [%3]\n"
>  "	teq	%1, #0\n"
>  "	bne	1b"
> -	: "=&r" (result), "=&r" (tmp)
> +	: "=&r" (result), "=&r" (tmp), "+Qo" (ptr->counter)
>  	: "r" (&ptr->counter), "r" (new)
>  	: "cc");
>  
> @@ -402,16 +402,16 @@ static inline u64 atomic64_dec_if_positive(atomic64_t *v)
>  	smp_mb();
>  
>  	__asm__ __volatile__("@ atomic64_dec_if_positive\n"
> -"1:	ldrexd	%0, %H0, [%2]\n"
> +"1:	ldrexd	%0, %H0, [%3]\n"
>  "	subs	%0, %0, #1\n"
>  "	sbc	%H0, %H0, #0\n"
>  "	teq	%H0, #0\n"
>  "	bmi	2f\n"
> -"	strexd	%1, %0, %H0, [%2]\n"
> +"	strexd	%1, %0, %H0, [%3]\n"
>  "	teq	%1, #0\n"
>  "	bne	1b\n"
>  "2:"
> -	: "=&r" (result), "=&r" (tmp)
> +	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
>  	: "r" (&v->counter)
>  	: "cc");
>  
> @@ -429,18 +429,18 @@ static inline int atomic64_add_unless(atomic64_t *v, u64 a, u64 u)
>  	smp_mb();
>  
>  	__asm__ __volatile__("@ atomic64_add_unless\n"
> -"1:	ldrexd	%0, %H0, [%3]\n"
> -"	teq	%0, %4\n"
> -"	teqeq	%H0, %H4\n"
> +"1:	ldrexd	%0, %H0, [%4]\n"
> +"	teq	%0, %5\n"
> +"	teqeq	%H0, %H5\n"
>  "	moveq	%1, #0\n"
>  "	beq	2f\n"
> -"	adds	%0, %0, %5\n"
> -"	adc	%H0, %H0, %H5\n"
> -"	strexd	%2, %0, %H0, [%3]\n"
> +"	adds	%0, %0, %6\n"
> +"	adc	%H0, %H0, %H6\n"
> +"	strexd	%2, %0, %H0, [%4]\n"
>  "	teq	%2, #0\n"
>  "	bne	1b\n"
>  "2:"
> -	: "=&r" (val), "+r" (ret), "=&r" (tmp)
> +	: "=&r" (val), "+r" (ret), "=&r" (tmp), "+Qo" (v->counter)
>  	: "r" (&v->counter), "r" (u), "r" (a)
>  	: "cc");
>  
> -- 
> 1.6.3.3
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm
  2010-07-08  5:49       ` Nicolas Pitre
@ 2010-07-08  9:36         ` Will Deacon
       [not found]         ` <004b01cb1e81$0b745960$225d0c20$%deacon@arm.com>
  1 sibling, 0 replies; 15+ messages in thread
From: Will Deacon @ 2010-07-08  9:36 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Nicolas,

> > Currently, the 32-bit and 64-bit atomic operations on ARM do not
> > include memory constraints in the inline assembly blocks. In the
> > case of barrier-less operations [for example, atomic_add], this
> > means that the compiler may constant fold values which have actually
> > been modified by a call to an atomic operation.

Thanks a lot for looking at this.

> Why do you use the "o" constraint?  That's for an offsetable
> memory reference.  Since we already use the actual address value with a
> register constraint, it would be more logical to simply use the "Q"
> constraint alone without any "o".  The gcc manual says:
> 
> |    `Q'
> |          A memory reference where the exact address is in a single
> |          register
> 
> Both "Qo" and "Q" provide the same wanted end result in this
> case.  But a quick test shows that "Q" produces exactly what we need,
> more so than "Qo", because of the other operand which is the actual
> address (the same register is used in both cases).

Whilst using "Q" on its own does generate correct code, using "Qo" is
a slight optimisation. The issue with "Q" is that GCC computes the address
again, even though it has already done so for the "r" constraint. Ideally,
we'd ditch the "r" constraint and just use "Q", but unfortunately this results
in code like ldrex r0, [r1, #0] which GAS refuses to accept. If we use "o",
then GCC doesn't compute the address twice, but can fail if it ends up with
a non-offsettable address (complaining that the constraints are impossible
to satisfy). Using "Qo" results in "o" being used if possible but, where
it doesn't match, "Q" is used at the expense of an extra register:

With "Q":

 740:   f57ff05f        dmb     sy
 744:   e30f4001        movw    r4, #61441      ; 0xf001
 748:   e30a5bad        movw    r5, #43949      ; 0xabad
 74c:   e34f400d        movt    r4, #61453      ; 0xf00d
 750:   e34f5ace        movt    r5, #64206      ; 0xface
 754:   e24be034        sub     lr, fp, #52     ; 0x34    <--- Redundant address computation
 758:   e3a02000        mov     r2, #0
 75c:   e1b36f9f        ldrexd  r6, [r3]
 760:   e1360004        teq     r6, r4
 764:   01370005        teqeq   r7, r5
 768:   01a32f90        strexdeq        r2, r0, [r3]
 76c:   e3520000        cmp     r2, #0
 770:   1afffff7        bne     754 <test_atomic64+0x754>
 774:   f57ff05f        dmb     sy

With "Qo":

 6f4:   f57ff05f        dmb     sy
 6f8:   e30f4001        movw    r4, #61441      ; 0xf001
 6fc:   e30a5bad        movw    r5, #43949      ; 0xabad
 700:   e34f400d        movt    r4, #61453      ; 0xf00d
 704:   e34f5ace        movt    r5, #64206      ; 0xface
 708:   e3a02000        mov     r2, #0
 70c:   e1b36f9f        ldrexd  r6, [r3]
 710:   e1360004        teq     r6, r4
 714:   01370005        teqeq   r7, r5
 718:   01a32f90        strexdeq        r2, r0, [r3]
 71c:   e3520000        cmp     r2, #0
 720:   1afffff8        bne     708 <test_atomic64+0x708>
 724:   f57ff05f        dmb     sy
 
> In any case:
> 
> Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org>

Thanks, I'll submit this to the patch system today.

> Also, I'm assuming that such a constraint should be added to the 32 bit
> versions too for the same correctness reason?

That's right. The patch updates the 32-bit variants as well as the
64-bit ones.

Cheers,

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

[parent not found: <004b01cb1e81$0b745960$225d0c20$%deacon@arm.com>]

* [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm
       [not found]         ` <004b01cb1e81$0b745960$225d0c20$%deacon@arm.com>
@ 2010-07-08 12:42           ` Nicolas Pitre
  0 siblings, 0 replies; 15+ messages in thread
From: Nicolas Pitre @ 2010-07-08 12:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 8 Jul 2010, Will Deacon wrote:

> Hi Nicolas,
> 
> > > Currently, the 32-bit and 64-bit atomic operations on ARM do not
> > > include memory constraints in the inline assembly blocks. In the
> > > case of barrier-less operations [for example, atomic_add], this
> > > means that the compiler may constant fold values which have actually
> > > been modified by a call to an atomic operation.
> 
> Thanks a lot for looking at this.
> 
> > Why do you use the "o" constraint?  That's for an offsetable
> > memory reference.  Since we already use the actual address value with a
> > register constraint, it would be more logical to simply use the "Q"
> > constraint alone without any "o".  The gcc manual says:
> > 
> > |    `Q'
> > |          A memory reference where the exact address is in a single
> > |          register
> > 
> > Both "Qo" and "Q" provide the same wanted end result in this
> > case.  But a quick test shows that "Q" produces exactly what we need,
> > more so than "Qo", because of the other operand which is the actual
> > address (the same register is used in both cases).
> 
> Whilst using "Q" on its own does generate correct code, using "Qo" is
> a slight optimisation. The issue with "Q" is that GCC computes the address
> again, even though it has already done so for the "r" constraint. Ideally,
> we'd ditch the "r" constraint and just use "Q", but unfortunately this results
> in code like ldrex r0, [r1, #0] which GAS refuses to accept. If we use "o",
> then GCC doesn't compute the address twice, but can fail if it ends up with
> a non-offsettable address (complaining that the constraints are impossible
> to satisfy). Using "Qo" results in "o" being used if possible but, where
> it doesn't match, "Q" is used at the expense of an extra register:
> 
> With "Q":
> 
>  740:   f57ff05f        dmb     sy
>  744:   e30f4001        movw    r4, #61441      ; 0xf001
>  748:   e30a5bad        movw    r5, #43949      ; 0xabad
>  74c:   e34f400d        movt    r4, #61453      ; 0xf00d
>  750:   e34f5ace        movt    r5, #64206      ; 0xface
>  754:   e24be034        sub     lr, fp, #52     ; 0x34    <--- Redundant address computation
>  758:   e3a02000        mov     r2, #0
>  75c:   e1b36f9f        ldrexd  r6, [r3]
>  760:   e1360004        teq     r6, r4
>  764:   01370005        teqeq   r7, r5
>  768:   01a32f90        strexdeq        r2, r0, [r3]
>  76c:   e3520000        cmp     r2, #0
>  770:   1afffff7        bne     754 <test_atomic64+0x754>
>  774:   f57ff05f        dmb     sy

That's weird.  The simple test I did was:

int foo(int *x)
{
        int r;
        asm("%1 %2" : "=&r" (r), "+Q" (x[2]) : "r" (&x[2]));
        return r;
}

which resulted in:

        add     r0, r0, #8
#APP
@ 4 "t.c" 1
        [r0, #0] r0
@ 0 "" 2
        mov     r0, r1
        bx      lr

So the address is clearly computed only once.

> > In any case:
> > 
> > Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org>
> 
> Thanks, I'll submit this to the patch system today.

Nicolas

^ permalink raw reply	[flat|nested] 15+ messages in thread

[parent not found: <004101cb1df3$5825ecd0$0871c670$%deacon@arm.com>]

* [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm
       [not found]       ` <004101cb1df3$5825ecd0$0871c670$%deacon@arm.com>
@ 2010-07-08  5:58         ` Nicolas Pitre
  2010-07-08 10:03           ` Will Deacon
  0 siblings, 1 reply; 15+ messages in thread
From: Nicolas Pitre @ 2010-07-08  5:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 7 Jul 2010, Will Deacon wrote:

> Hello,
> 
> > Subject: [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm
> > 
> > This patch adds suitable memory constraints to the atomic operations on ARM
> 
> Does anybody have any feedback on this patch? The other three in the
> series are straightforward, but I'd like another set of eyes to go over this
> code [I know it makes for tough reading!] before submitted to the patch
> system.
> 
> Any comments appreciated!

In addition to the comments I've just provided, I think you should also 
add a

Cc: stable at kernel.org

to the list of tags for those patches.  They are definitely fixing real 
correctness issues.


Nicolas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm
  2010-07-08  5:58         ` Nicolas Pitre
@ 2010-07-08 10:03           ` Will Deacon
  0 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2010-07-08 10:03 UTC (permalink / raw)
  To: linux-arm-kernel

Nicolas,

> > > Subject: [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm
> > >
> > > This patch adds suitable memory constraints to the atomic operations on ARM

> In addition to the comments I've just provided, I think you should also
> add a
> 
> Cc: stable at kernel.org
> 
> to the list of tags for those patches.  They are definitely fixing real
> correctness issues.

I've submitted these patches as 621{1-3}/1. I Cc'd stable for the first
two patches because the first one is required in order to apply the second.

Cheers,

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/4] ARM: atomic ops: reduce critical region in atomic64_cmpxchg
  2010-06-30 14:04   ` [PATCH 2/4] ARM: atomic ops: reduce critical region in atomic64_cmpxchg Will Deacon
  2010-06-30 14:04     ` [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm Will Deacon
@ 2010-07-08  4:49     ` Nicolas Pitre
  2010-07-08  9:43       ` Will Deacon
  1 sibling, 1 reply; 15+ messages in thread
From: Nicolas Pitre @ 2010-07-08  4:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 30 Jun 2010, Will Deacon wrote:

> In order to reduce the likelihood of failure in a load/store
> exclusive block, the number of intervening instructions should
> be kept to a minimum.
> 
> This patch hoists a mov operation out of the critical region
> in atomic64_cmpxchg.
> 
> Cc: Nicolas Pitre <nico@fluxnic.net>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  arch/arm/include/asm/atomic.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
> index e9e56c0..4f0f282 100644
> --- a/arch/arm/include/asm/atomic.h
> +++ b/arch/arm/include/asm/atomic.h
> @@ -358,8 +358,8 @@ static inline u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old, u64 new)
>  
>  	do {
>  		__asm__ __volatile__("@ atomic64_cmpxchg\n"
> -		"ldrexd		%1, %H1, [%2]\n"
>  		"mov		%0, #0\n"
> +		"ldrexd		%1, %H1, [%2]\n"
>  		"teq		%1, %3\n"
>  		"teqeq		%H1, %H3\n"
>  		"strexdeq	%0, %4, %H4, [%2]"

I'm not sure you gain anything here.  The ldrexd probably requires at 
least one result delay cycle which is filled by the  mov instruction.  
By moving the mov insn before the ldrexd you are probably making the 
whole sequence one cycle longer.


Nicolas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/4] ARM: atomic ops: reduce critical region in atomic64_cmpxchg
  2010-07-08  4:49     ` [PATCH 2/4] ARM: atomic ops: reduce critical region in atomic64_cmpxchg Nicolas Pitre
@ 2010-07-08  9:43       ` Will Deacon
  0 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2010-07-08  9:43 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

> > diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
> > index e9e56c0..4f0f282 100644
> > --- a/arch/arm/include/asm/atomic.h
> > +++ b/arch/arm/include/asm/atomic.h
> > @@ -358,8 +358,8 @@ static inline u64 atomic64_cmpxchg(atomic64_t *ptr, u64 old, u64 new)
> >
> >  	do {
> >  		__asm__ __volatile__("@ atomic64_cmpxchg\n"
> > -		"ldrexd		%1, %H1, [%2]\n"
> >  		"mov		%0, #0\n"
> > +		"ldrexd		%1, %H1, [%2]\n"
> >  		"teq		%1, %3\n"
> >  		"teqeq		%H1, %H3\n"
> >  		"strexdeq	%0, %4, %H4, [%2]"
> 
> I'm not sure you gain anything here.  The ldrexd probably requires at
> least one result delay cycle which is filled by the  mov instruction.
> By moving the mov insn before the ldrexd you are probably making the
> whole sequence one cycle longer.

You're right. In fact, thinking about it, this patch is largely
superficial because if the core can do exclusive load/stores then
the mov will be issued down a separate pipeline anyway.

I'll drop this one from the patch series and submit the other three.

Thanks,

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/4] ARM: atomic ops: fix register constraints for atomic64_add_unless
  2010-06-30 14:04 ` [PATCH 1/4] ARM: atomic ops: fix register constraints for atomic64_add_unless Will Deacon
  2010-06-30 14:04   ` [PATCH 2/4] ARM: atomic ops: reduce critical region in atomic64_cmpxchg Will Deacon
@ 2010-07-08  4:37   ` Nicolas Pitre
  1 sibling, 0 replies; 15+ messages in thread
From: Nicolas Pitre @ 2010-07-08  4:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 30 Jun 2010, Will Deacon wrote:

> The atomic64_add_unless function compares an atomic variable with
> a given value and, if they are not equal, adds another given value
> to the atomic variable. The function returns zero if the addition
> did not occur and non-zero otherwise.
> 
> On ARM, the return value is initialised to 1 in C code. Inline assembly
> code then performs the atomic64_add_unless operation, setting the
> return value to 0 iff the addition does not occur. This means that
> when the addition *does* occur, the value of ret must be preserved
> across the inline assembly and therefore requires a "+r" constraint
> rather than the current one of "=&r".
> 
> Thanks to Nicolas Pitre for helping to spot this.
> 
> Cc: Nicolas Pitre <nico@fluxnic.net>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org>

> ---
>  arch/arm/include/asm/atomic.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
> index a0162fa..e9e56c0 100644
> --- a/arch/arm/include/asm/atomic.h
> +++ b/arch/arm/include/asm/atomic.h
> @@ -440,7 +440,7 @@ static inline int atomic64_add_unless(atomic64_t *v, u64 a, u64 u)
>  "	teq	%2, #0\n"
>  "	bne	1b\n"
>  "2:"
> -	: "=&r" (val), "=&r" (ret), "=&r" (tmp)
> +	: "=&r" (val), "+r" (ret), "=&r" (tmp)
>  	: "r" (&v->counter), "r" (u), "r" (a)
>  	: "cc");
>  
> -- 
> 1.6.3.3
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2010-07-08 12:42 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-30 14:04 [PATCH 0/4] Fix atomic operations so that atomic64_test passes on ARM [V2] Will Deacon
2010-06-30 14:04 ` [PATCH 1/4] ARM: atomic ops: fix register constraints for atomic64_add_unless Will Deacon
2010-06-30 14:04   ` [PATCH 2/4] ARM: atomic ops: reduce critical region in atomic64_cmpxchg Will Deacon
2010-06-30 14:04     ` [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm Will Deacon
2010-06-30 14:04       ` [PATCH 4/4] ARM: atomic64_test: add ARM as supported architecture Will Deacon
2010-07-08  5:50         ` Nicolas Pitre
2010-07-07 16:42       ` [PATCH 3/4] ARM: atomic ops: add memory constraints to inline asm Will Deacon
2010-07-08  5:49       ` Nicolas Pitre
2010-07-08  9:36         ` Will Deacon
     [not found]         ` <004b01cb1e81$0b745960$225d0c20$%deacon@arm.com>
2010-07-08 12:42           ` Nicolas Pitre
     [not found]       ` <004101cb1df3$5825ecd0$0871c670$%deacon@arm.com>
2010-07-08  5:58         ` Nicolas Pitre
2010-07-08 10:03           ` Will Deacon
2010-07-08  4:49     ` [PATCH 2/4] ARM: atomic ops: reduce critical region in atomic64_cmpxchg Nicolas Pitre
2010-07-08  9:43       ` Will Deacon
2010-07-08  4:37   ` [PATCH 1/4] ARM: atomic ops: fix register constraints for atomic64_add_unless Nicolas Pitre

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).