* [PATCH v2 0/6] Add support for pldw instruction on v7 MP cores
@ 2013-07-25 15:55 Will Deacon
  2013-07-25 15:55 ` [PATCH v2 1/6] ARM: prefetch: remove redundant "cc" clobber Will Deacon
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Will Deacon @ 2013-07-25 15:55 UTC (permalink / raw)
  To: linux-arm-kernel

Hello all,

This is version two of the patches I posted the other day:

v1: http://lists.infradead.org/pipermail/linux-arm-kernel/2013-July/185663.html

Changes since v1 include:

	- Rewrite prefetch and prefetchw as macros, to avoid explicit
	  casting in the caller (also requires changing the type of
	  arch_rwlock_t to remove the volatile member); see the sketch
	  below.

	- Some cosmetic cleanups.
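
For reference, here is the casting problem the first change avoids, as a
minimal sketch (not code from the series; the generic prefetchw() fallback
is paraphrased, and the type and function names below are invented for
illustration):

/*
 * The generic fallback is roughly "#define prefetchw(x) __builtin_prefetch(x, 1)",
 * and __builtin_prefetch() takes a const void *. A static inline taking
 * const void * has the same problem: passing a pointer to a volatile
 * member warns, so every caller ends up casting. Macros plus a
 * non-volatile lock word avoid both.
 */
typedef struct { volatile unsigned int lock; } old_rwlock_sketch_t;	/* before */
typedef struct { unsigned int lock; } new_rwlock_sketch_t;		/* after  */

static void prefetch_cast_demo(old_rwlock_sketch_t *o, new_rwlock_sketch_t *n)
{
	__builtin_prefetch(&o->lock, 1);		/* warns: discards 'volatile' */
	__builtin_prefetch((const void *)&o->lock, 1);	/* the explicit cast this avoids */
	__builtin_prefetch(&n->lock, 1);		/* clean */
}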

Thanks to Nicolas for the feedback received so far.

Will


Will Deacon (6):
  ARM: prefetch: remove redundant "cc" clobber
  ARM: smp_on_up: move inline asm ALT_SMP patching macro out of
    spinlock.h
  ARM: prefetch: add support for prefetchw using pldw on SMP ARMv7+ CPUs
  ARM: locks: prefetch the destination word for write prior to strex
  ARM: atomics: prefetch the destination word for write prior to strex
  ARM: bitops: prefetch the destination word for write prior to strex

 arch/arm/include/asm/atomic.h         |  7 ++++++
 arch/arm/include/asm/processor.h      | 43 ++++++++++++++++++++++++-----------
 arch/arm/include/asm/spinlock.h       | 28 +++++++++++------------
 arch/arm/include/asm/spinlock_types.h |  2 +-
 arch/arm/include/asm/unified.h        |  4 ++++
 arch/arm/lib/bitops.h                 |  5 ++++
 6 files changed, 61 insertions(+), 28 deletions(-)

-- 
1.8.2.2


* [PATCH v2 1/6] ARM: prefetch: remove redundant "cc" clobber
  2013-07-25 15:55 [PATCH v2 0/6] Add support for pldw instruction on v7 MP cores Will Deacon
@ 2013-07-25 15:55 ` Will Deacon
  2013-07-25 15:55 ` [PATCH v2 2/6] ARM: smp_on_up: move inline asm ALT_SMP patching macro out of spinlock.h Will Deacon
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Will Deacon @ 2013-07-25 15:55 UTC (permalink / raw)
  To: linux-arm-kernel

The pld instruction does not affect the condition flags, so don't bother
clobbering them.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/processor.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 06e7d50..91cfe08 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -101,9 +101,7 @@ static inline void prefetch(const void *ptr)
 {
 	__asm__ __volatile__(
 		"pld\t%a0"
-		:
-		: "p" (ptr)
-		: "cc");
+		:: "p" (ptr));
 }
 
 #define ARCH_HAS_PREFETCHW
-- 
1.8.2.2

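A quick illustration of what the clobber costs (a sketch, not part of the
patch; the function names are invented): a "cc" clobber declares the
condition flags dead across the asm, so the compiler cannot keep flag
state live over the prefetch, even though pld never touches them.

static inline void prefetch_with_clobber(const void *ptr)
{
	/* old form: flags conservatively assumed destroyed */
	__asm__ __volatile__("pld\t%a0" :: "p" (ptr) : "cc");
}

static inline void prefetch_without_clobber(const void *ptr)
{
	/* new form: the compiler knows the flags survive the pld */
	__asm__ __volatile__("pld\t%a0" :: "p" (ptr));
}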

* [PATCH v2 2/6] ARM: smp_on_up: move inline asm ALT_SMP patching macro out of spinlock.h
  2013-07-25 15:55 [PATCH v2 0/6] Add support for pldw instruction on v7 MP cores Will Deacon
  2013-07-25 15:55 ` [PATCH v2 1/6] ARM: prefetch: remove redundant "cc" clobber Will Deacon
@ 2013-07-25 15:55 ` Will Deacon
  2013-07-25 15:55 ` [PATCH v2 3/6] ARM: prefetch: add support for prefetchw using pldw on SMP ARMv7+ CPUs Will Deacon
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Will Deacon @ 2013-07-25 15:55 UTC (permalink / raw)
  To: linux-arm-kernel

Patching UP/SMP alternatives inside inline assembly blocks is useful
outside of the spinlock implementation, where it is used for sev and wfe.

This patch lifts the macro into processor.h and gives it a scarier name
to (a) avoid conflicts in the global namespace and (b) deter its usage
unless you "know what you're doing". The W macro for generating wide
instructions when targeting Thumb-2 is also made available under the
name WASM, to reduce the potential for conflicts with other headers.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/processor.h | 12 ++++++++++++
 arch/arm/include/asm/spinlock.h  | 15 ++++-----------
 arch/arm/include/asm/unified.h   |  4 ++++
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 91cfe08..cbdb130 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -22,6 +22,7 @@
 #include <asm/hw_breakpoint.h>
 #include <asm/ptrace.h>
 #include <asm/types.h>
+#include <asm/unified.h>
 
 #ifdef __KERNEL__
 #define STACK_TOP	((current->personality & ADDR_LIMIT_32BIT) ? \
@@ -91,6 +92,17 @@ unsigned long get_wchan(struct task_struct *p);
 #define KSTK_EIP(tsk)	task_pt_regs(tsk)->ARM_pc
 #define KSTK_ESP(tsk)	task_pt_regs(tsk)->ARM_sp
 
+#ifdef CONFIG_SMP
+#define __ALT_SMP_ASM(smp, up)						\
+	"9998:	" smp "\n"						\
+	"	.pushsection \".alt.smp.init\", \"a\"\n"		\
+	"	.long	9998b\n"					\
+	"	" up "\n"						\
+	"	.popsection\n"
+#else
+#define __ALT_SMP_ASM(smp, up)	up
+#endif
+
 /*
  * Prefetching support - only ARMv5.
  */
diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index f8b8965..0de7bec 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -11,15 +11,7 @@
  * sev and wfe are ARMv6K extensions.  Uniprocessor ARMv6 may not have the K
  * extensions, so when running on UP, we have to patch these instructions away.
  */
-#define ALT_SMP(smp, up)					\
-	"9998:	" smp "\n"					\
-	"	.pushsection \".alt.smp.init\", \"a\"\n"	\
-	"	.long	9998b\n"				\
-	"	" up "\n"					\
-	"	.popsection\n"
-
 #ifdef CONFIG_THUMB2_KERNEL
-#define SEV		ALT_SMP("sev.w", "nop.w")
 /*
  * For Thumb-2, special care is needed to ensure that the conditional WFE
  * instruction really does assemble to exactly 4 bytes (as required by
@@ -31,17 +23,18 @@
  * the assembler won't change IT instructions which are explicitly present
  * in the input.
  */
-#define WFE(cond)	ALT_SMP(		\
+#define WFE(cond)	__ALT_SMP_ASM(		\
 	"it " cond "\n\t"			\
 	"wfe" cond ".n",			\
 						\
 	"nop.w"					\
 )
 #else
-#define SEV		ALT_SMP("sev", "nop")
-#define WFE(cond)	ALT_SMP("wfe" cond, "nop")
+#define WFE(cond)	__ALT_SMP_ASM("wfe" cond, "nop")
 #endif
 
+#define SEV		__ALT_SMP_ASM(WASM(sev), WASM(nop))
+
 static inline void dsb_sev(void)
 {
 #if __LINUX_ARM_ARCH__ >= 7
diff --git a/arch/arm/include/asm/unified.h b/arch/arm/include/asm/unified.h
index f5989f4..b88beab 100644
--- a/arch/arm/include/asm/unified.h
+++ b/arch/arm/include/asm/unified.h
@@ -38,6 +38,8 @@
 #ifdef __ASSEMBLY__
 #define W(instr)	instr.w
 #define BSYM(sym)	sym + 1
+#else
+#define WASM(instr)	#instr ".w"
 #endif
 
 #else	/* !CONFIG_THUMB2_KERNEL */
@@ -50,6 +52,8 @@
 #ifdef __ASSEMBLY__
 #define W(instr)	instr
 #define BSYM(sym)	sym
+#else
+#define WASM(instr)	#instr
 #endif
 
 #endif	/* CONFIG_THUMB2_KERNEL */
-- 
1.8.2.2

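To see how the relocated macro is consumed, here is what SEV (now
__ALT_SMP_ASM(WASM(sev), WASM(nop))) expands to on a CONFIG_SMP plus
CONFIG_THUMB2_KERNEL build, written out by hand as a sketch (the helper
name is invented):

static inline void sev_sketch(void)
{
	__asm__ __volatile__(
	/* SMP instruction emitted inline, plus a fixup record so the
	 * boot-time smp_on_up pass can replace it with the UP variant */
	"9998:	sev.w\n"
	"	.pushsection \".alt.smp.init\", \"a\"\n"
	"	.long	9998b\n"
	"	nop.w\n"
	"	.popsection\n"
	: : : "memory");
}

On a non-Thumb-2 build the same macro yields plain "sev"/"nop", and with
CONFIG_SMP=n only the UP instruction is emitted.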

* [PATCH v2 3/6] ARM: prefetch: add support for prefetchw using pldw on SMP ARMv7+ CPUs
  2013-07-25 15:55 [PATCH v2 0/6] Add support for pldw instruction on v7 MP cores Will Deacon
  2013-07-25 15:55 ` [PATCH v2 1/6] ARM: prefetch: remove redundant "cc" clobber Will Deacon
  2013-07-25 15:55 ` [PATCH v2 2/6] ARM: smp_on_up: move inline asm ALT_SMP patching macro out of spinlock.h Will Deacon
@ 2013-07-25 15:55 ` Will Deacon
  2013-07-25 15:55 ` [PATCH v2 4/6] ARM: locks: prefetch the destination word for write prior to strex Will Deacon
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Will Deacon @ 2013-07-25 15:55 UTC (permalink / raw)
  To: linux-arm-kernel

SMP ARMv7 CPUs implement the pldw instruction, which allows them to
prefetch data cachelines in an exclusive state.

This patch defines the prefetchw macro using pldw for CPUs that support
it.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/processor.h | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index cbdb130..dde7ecc 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -109,19 +109,26 @@ unsigned long get_wchan(struct task_struct *p);
 #if __LINUX_ARM_ARCH__ >= 5
 
 #define ARCH_HAS_PREFETCH
-static inline void prefetch(const void *ptr)
-{
-	__asm__ __volatile__(
-		"pld\t%a0"
-		:: "p" (ptr));
-}
+#define prefetch(p)							\
+({									\
+	__asm__ __volatile__(						\
+		"pld\t%a0"						\
+		:: "p" (p));						\
+})
 
+#if __LINUX_ARM_ARCH__ >= 7 && defined(CONFIG_SMP)
 #define ARCH_HAS_PREFETCHW
-#define prefetchw(ptr)	prefetch(ptr)
-
-#define ARCH_HAS_SPINLOCK_PREFETCH
-#define spin_lock_prefetch(x) do { } while (0)
-
+#define prefetchw(p)							\
+({									\
+	__asm__ __volatile__(						\
+		".arch_extension	mp\n"				\
+		__ALT_SMP_ASM(						\
+			WASM(pldw)		"\t%a0",		\
+			WASM(pld)		"\t%a0"			\
+		)							\
+		:: "p" (p));						\
+})
+#endif
 #endif
 
 #define HAVE_ARCH_PICK_MMAP_LAYOUT
-- 
1.8.2.2

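For reference, a sketch of what the new prefetchw() emits on an ARMv7 SMP
build (an approximate rendering of the __ALT_SMP_ASM expansion; .w
suffixes are added on Thumb-2, and pre-v7 or !SMP configurations keep
using the generic __builtin_prefetch()-based fallback). The helper below
is hypothetical:

#include <linux/prefetch.h>
#include <linux/types.h>

/*
 *	.arch_extension	mp
 * 9998:	pldw	[p]		@ prefetch with intent to write
 *	.pushsection ".alt.smp.init", "a"
 *	.long	9998b
 *	pld	[p]			@ substituted when booted on UP
 *	.popsection
 */
static inline void claim_line_for_write(u32 *word)
{
	prefetchw(word);	/* pull the cacheline in exclusive state early */
}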

* [PATCH v2 4/6] ARM: locks: prefetch the destination word for write prior to strex
  2013-07-25 15:55 [PATCH v2 0/6] Add support for pldw instruction on v7 MP cores Will Deacon
                   ` (2 preceding siblings ...)
  2013-07-25 15:55 ` [PATCH v2 3/6] ARM: prefetch: add support for prefetchw using pldw on SMP ARMv7+ CPUs Will Deacon
@ 2013-07-25 15:55 ` Will Deacon
  2013-07-25 19:22   ` Nicolas Pitre
  2013-07-25 15:55 ` [PATCH v2 5/6] ARM: atomics: " Will Deacon
  2013-07-25 15:55 ` [PATCH v2 6/6] ARM: bitops: " Will Deacon
  5 siblings, 1 reply; 8+ messages in thread
From: Will Deacon @ 2013-07-25 15:55 UTC (permalink / raw)
  To: linux-arm-kernel

The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.

This patch prefixes our {spin,read,write}_[try]lock implementations with
pldw instructions (on CPUs which support them) to try and grab the line
in exclusive state from the start. arch_rwlock_t is changed to avoid
using a volatile member, since this generates compiler warnings when
falling back on the __builtin_prefetch intrinsic which expects a const
void * argument.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/spinlock.h       | 13 ++++++++++---
 arch/arm/include/asm/spinlock_types.h |  2 +-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index 0de7bec..3c8c532 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -5,7 +5,7 @@
 #error SMP not supported on pre-ARMv6 CPUs
 #endif
 
-#include <asm/processor.h>
+#include <linux/prefetch.h>
 
 /*
  * sev and wfe are ARMv6K extensions.  Uniprocessor ARMv6 may not have the K
@@ -70,6 +70,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
 	u32 newval;
 	arch_spinlock_t lockval;
 
+	prefetchw(&lock->slock);
 	__asm__ __volatile__(
 "1:	ldrex	%0, [%3]\n"
 "	add	%1, %0, %4\n"
@@ -93,6 +94,7 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
 	unsigned long contended, res;
 	u32 slock;
 
+	prefetchw(&lock->slock);
 	do {
 		__asm__ __volatile__(
 		"	ldrex	%0, [%3]\n"
@@ -145,6 +147,7 @@ static inline void arch_write_lock(arch_rwlock_t *rw)
 {
 	unsigned long tmp;
 
+	prefetchw(&rw->lock);
 	__asm__ __volatile__(
 "1:	ldrex	%0, [%1]\n"
 "	teq	%0, #0\n"
@@ -163,6 +166,7 @@ static inline int arch_write_trylock(arch_rwlock_t *rw)
 {
 	unsigned long tmp;
 
+	prefetchw(&rw->lock);
 	__asm__ __volatile__(
 "	ldrex	%0, [%1]\n"
 "	teq	%0, #0\n"
@@ -193,7 +197,7 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
 }
 
 /* write_can_lock - would write_trylock() succeed? */
-#define arch_write_can_lock(x)		((x)->lock == 0)
+#define arch_write_can_lock(x)		(ACCESS_ONCE((x)->lock) == 0)
 
 /*
  * Read locks are a bit more hairy:
@@ -211,6 +215,7 @@ static inline void arch_read_lock(arch_rwlock_t *rw)
 {
 	unsigned long tmp, tmp2;
 
+	prefetchw(&rw->lock);
 	__asm__ __volatile__(
 "1:	ldrex	%0, [%2]\n"
 "	adds	%0, %0, #1\n"
@@ -231,6 +236,7 @@ static inline void arch_read_unlock(arch_rwlock_t *rw)
 
 	smp_mb();
 
+	prefetchw(&rw->lock);
 	__asm__ __volatile__(
 "1:	ldrex	%0, [%2]\n"
 "	sub	%0, %0, #1\n"
@@ -249,6 +255,7 @@ static inline int arch_read_trylock(arch_rwlock_t *rw)
 {
 	unsigned long tmp, tmp2 = 1;
 
+	prefetchw(&rw->lock);
 	__asm__ __volatile__(
 "	ldrex	%0, [%2]\n"
 "	adds	%0, %0, #1\n"
@@ -262,7 +269,7 @@ static inline int arch_read_trylock(arch_rwlock_t *rw)
 }
 
 /* read_can_lock - would read_trylock() succeed? */
-#define arch_read_can_lock(x)		((x)->lock < 0x80000000)
+#define arch_read_can_lock(x)		(ACCESS_ONCE((x)->lock) < 0x80000000)
 
 #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
 #define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
diff --git a/arch/arm/include/asm/spinlock_types.h b/arch/arm/include/asm/spinlock_types.h
index b262d2f..47663fc 100644
--- a/arch/arm/include/asm/spinlock_types.h
+++ b/arch/arm/include/asm/spinlock_types.h
@@ -25,7 +25,7 @@ typedef struct {
 #define __ARCH_SPIN_LOCK_UNLOCKED	{ { 0 } }
 
 typedef struct {
-	volatile unsigned int lock;
+	u32 lock;
 } arch_rwlock_t;
 
 #define __ARCH_RW_LOCK_UNLOCKED		{ 0 }
-- 
1.8.2.2

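As an aside on the arch_{read,write}_can_lock() hunks: once the volatile
qualifier is gone from the lock word, those lockless peeks rely on
ACCESS_ONCE() to force a single, fresh load. A self-contained sketch
(the stand-in type and _sketch names are invented, and ACCESS_ONCE() is
paraphrased from linux/compiler.h of this era):

#define ACCESS_ONCE_SKETCH(x)	(*(volatile typeof(x) *)&(x))

struct rwlock_sketch { unsigned int lock; };	/* stand-in for arch_rwlock_t */

static inline int write_can_lock_sketch(struct rwlock_sketch *rw)
{
	/* one real load of ->lock; the compiler may not cache or re-read it */
	return ACCESS_ONCE_SKETCH(rw->lock) == 0;
}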

* [PATCH v2 5/6] ARM: atomics: prefetch the destination word for write prior to strex
  2013-07-25 15:55 [PATCH v2 0/6] Add support for pldw instruction on v7 MP cores Will Deacon
                   ` (3 preceding siblings ...)
  2013-07-25 15:55 ` [PATCH v2 4/6] ARM: locks: prefetch the destination word for write prior to strex Will Deacon
@ 2013-07-25 15:55 ` Will Deacon
  2013-07-25 15:55 ` [PATCH v2 6/6] ARM: bitops: " Will Deacon
  5 siblings, 0 replies; 8+ messages in thread
From: Will Deacon @ 2013-07-25 15:55 UTC (permalink / raw)
  To: linux-arm-kernel

The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.

This patch prefixes our atomic access implementations with pldw
instructions (on CPUs which support them) to try and grab the line in
exclusive state from the start. Only the barrier-less functions are
updated, since memory barriers can limit the usefulness of prefetching
data.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/atomic.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
index da1c77d..55ffc3b 100644
--- a/arch/arm/include/asm/atomic.h
+++ b/arch/arm/include/asm/atomic.h
@@ -12,6 +12,7 @@
 #define __ASM_ARM_ATOMIC_H
 
 #include <linux/compiler.h>
+#include <linux/prefetch.h>
 #include <linux/types.h>
 #include <linux/irqflags.h>
 #include <asm/barrier.h>
@@ -41,6 +42,7 @@ static inline void atomic_add(int i, atomic_t *v)
 	unsigned long tmp;
 	int result;
 
+	prefetchw(&v->counter);
 	__asm__ __volatile__("@ atomic_add\n"
 "1:	ldrex	%0, [%3]\n"
 "	add	%0, %0, %4\n"
@@ -79,6 +81,7 @@ static inline void atomic_sub(int i, atomic_t *v)
 	unsigned long tmp;
 	int result;
 
+	prefetchw(&v->counter);
 	__asm__ __volatile__("@ atomic_sub\n"
 "1:	ldrex	%0, [%3]\n"
 "	sub	%0, %0, %4\n"
@@ -138,6 +141,7 @@ static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
 {
 	unsigned long tmp, tmp2;
 
+	prefetchw(addr);
 	__asm__ __volatile__("@ atomic_clear_mask\n"
 "1:	ldrex	%0, [%3]\n"
 "	bic	%0, %0, %4\n"
@@ -283,6 +287,7 @@ static inline void atomic64_set(atomic64_t *v, u64 i)
 {
 	u64 tmp;
 
+	prefetchw(&v->counter);
 	__asm__ __volatile__("@ atomic64_set\n"
 "1:	ldrexd	%0, %H0, [%2]\n"
 "	strexd	%0, %3, %H3, [%2]\n"
@@ -299,6 +304,7 @@ static inline void atomic64_add(u64 i, atomic64_t *v)
 	u64 result;
 	unsigned long tmp;
 
+	prefetchw(&v->counter);
 	__asm__ __volatile__("@ atomic64_add\n"
 "1:	ldrexd	%0, %H0, [%3]\n"
 "	adds	%0, %0, %4\n"
@@ -339,6 +345,7 @@ static inline void atomic64_sub(u64 i, atomic64_t *v)
 	u64 result;
 	unsigned long tmp;
 
+	prefetchw(&v->counter);
 	__asm__ __volatile__("@ atomic64_sub\n"
 "1:	ldrexd	%0, %H0, [%3]\n"
 "	subs	%0, %0, %4\n"
-- 
1.8.2.2

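For contrast, a sketch of a barrier-ful op that is deliberately left
alone (this approximates the existing atomic_add_return() in
arch/arm/include/asm/atomic.h; the stand-in type, the _sketch names and
the hand-expanded ARMv7 barrier are invented for the example):

typedef struct { int counter; } atomic_sketch_t;	/* stand-in for atomic_t */
#define smp_mb_sketch()	__asm__ __volatile__("dmb" ::: "memory")	/* ARMv7 smp_mb() is a dmb */

static inline int atomic_add_return_sketch(int i, atomic_sketch_t *v)
{
	unsigned long tmp;
	int result;

	smp_mb_sketch();	/* the full barrier already dominates the cost,
				 * which limits what a pldw up front could gain */
	__asm__ __volatile__("@ atomic_add_return (sketch)\n"
"1:	ldrex	%0, [%3]\n"
"	add	%0, %0, %4\n"
"	strex	%1, %0, [%3]\n"
"	teq	%1, #0\n"
"	bne	1b"
	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
	: "r" (&v->counter), "Ir" (i)
	: "cc");
	smp_mb_sketch();

	return result;
}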

* [PATCH v2 6/6] ARM: bitops: prefetch the destination word for write prior to strex
  2013-07-25 15:55 [PATCH v2 0/6] Add support for pldw instruction on v7 MP cores Will Deacon
                   ` (4 preceding siblings ...)
  2013-07-25 15:55 ` [PATCH v2 5/6] ARM: atomics: " Will Deacon
@ 2013-07-25 15:55 ` Will Deacon
  5 siblings, 0 replies; 8+ messages in thread
From: Will Deacon @ 2013-07-25 15:55 UTC (permalink / raw)
  To: linux-arm-kernel

The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.

This patch prefixes our atomic bitops implementation with prefetchw,
to try and grab the line in exclusive state from the start. The testop
macro is left alone, since the barrier semantics limit the usefulness
of prefetching data.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/lib/bitops.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm/lib/bitops.h b/arch/arm/lib/bitops.h
index d6408d1..e0c68d5 100644
--- a/arch/arm/lib/bitops.h
+++ b/arch/arm/lib/bitops.h
@@ -10,6 +10,11 @@ UNWIND(	.fnstart	)
 	and	r3, r0, #31		@ Get bit offset
 	mov	r0, r0, lsr #5
 	add	r1, r1, r0, lsl #2	@ Get word offset
+#if __LINUX_ARM_ARCH__ >= 7
+	.arch_extension	mp
+	ALT_SMP(W(pldw)	[r1])
+	ALT_UP(W(nop))
+#endif
 	mov	r3, r2, lsl r3
 1:	ldrex	r2, [r1]
 	\instr	r2, r2, r3
-- 
1.8.2.2

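A C-level rendering of what the patched bitop template now does for the
orr-based ops such as _set_bit() (a sketch only; the authoritative change
is the assembly hunk above, the function name is invented, and the UP
alternative differs slightly since the asm uses ALT_UP(nop) while the C
prefetchw() falls back to pld):

#include <linux/prefetch.h>

static inline void set_bit_sketch(unsigned int nr, unsigned long *p)
{
	unsigned long tmp, res;
	unsigned long mask = 1UL << (nr & 31);	/* bit offset, as in bitops.h */

	p += nr >> 5;				/* word offset */
	prefetchw(p);				/* new: grab the line for write first */
	__asm__ __volatile__(
"1:	ldrex	%0, [%3]\n"
"	orr	%0, %0, %4\n"
"	strex	%1, %0, [%3]\n"
"	teq	%1, #0\n"
"	bne	1b"
	: "=&r" (tmp), "=&r" (res), "+Qo" (*p)
	: "r" (p), "r" (mask)
	: "cc");
}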

* [PATCH v2 4/6] ARM: locks: prefetch the destination word for write prior to strex
  2013-07-25 15:55 ` [PATCH v2 4/6] ARM: locks: prefetch the destination word for write prior to strex Will Deacon
@ 2013-07-25 19:22   ` Nicolas Pitre
  0 siblings, 0 replies; 8+ messages in thread
From: Nicolas Pitre @ 2013-07-25 19:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 25 Jul 2013, Will Deacon wrote:

> The cost of changing a cacheline from shared to exclusive state can be
> significant, especially when this is triggered by an exclusive store,
> since it may result in having to retry the transaction.
> 
> This patch prefixes our {spin,read,write}_[try]lock implementations with
> pldw instructions (on CPUs which support them) to try and grab the line
> in exclusive state from the start. arch_rwlock_t is changed to avoid
> using a volatile member, since this generates compiler warnings when
> falling back on the __builtin_prefetch intrinsic which expects a const
> void * argument.
> 
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Acked-by: Nicolas Pitre <nico@linaro.org>

> ---
>  arch/arm/include/asm/spinlock.h       | 13 ++++++++++---
>  arch/arm/include/asm/spinlock_types.h |  2 +-
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
> index 0de7bec..3c8c532 100644
> --- a/arch/arm/include/asm/spinlock.h
> +++ b/arch/arm/include/asm/spinlock.h
> @@ -5,7 +5,7 @@
>  #error SMP not supported on pre-ARMv6 CPUs
>  #endif
>  
> -#include <asm/processor.h>
> +#include <linux/prefetch.h>
>  
>  /*
>   * sev and wfe are ARMv6K extensions.  Uniprocessor ARMv6 may not have the K
> @@ -70,6 +70,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
>  	u32 newval;
>  	arch_spinlock_t lockval;
>  
> +	prefetchw(&lock->slock);
>  	__asm__ __volatile__(
>  "1:	ldrex	%0, [%3]\n"
>  "	add	%1, %0, %4\n"
> @@ -93,6 +94,7 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
>  	unsigned long contended, res;
>  	u32 slock;
>  
> +	prefetchw(&lock->slock);
>  	do {
>  		__asm__ __volatile__(
>  		"	ldrex	%0, [%3]\n"
> @@ -145,6 +147,7 @@ static inline void arch_write_lock(arch_rwlock_t *rw)
>  {
>  	unsigned long tmp;
>  
> +	prefetchw(&rw->lock);
>  	__asm__ __volatile__(
>  "1:	ldrex	%0, [%1]\n"
>  "	teq	%0, #0\n"
> @@ -163,6 +166,7 @@ static inline int arch_write_trylock(arch_rwlock_t *rw)
>  {
>  	unsigned long tmp;
>  
> +	prefetchw(&rw->lock);
>  	__asm__ __volatile__(
>  "	ldrex	%0, [%1]\n"
>  "	teq	%0, #0\n"
> @@ -193,7 +197,7 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
>  }
>  
>  /* write_can_lock - would write_trylock() succeed? */
> -#define arch_write_can_lock(x)		((x)->lock == 0)
> +#define arch_write_can_lock(x)		(ACCESS_ONCE((x)->lock) == 0)
>  
>  /*
>   * Read locks are a bit more hairy:
> @@ -211,6 +215,7 @@ static inline void arch_read_lock(arch_rwlock_t *rw)
>  {
>  	unsigned long tmp, tmp2;
>  
> +	prefetchw(&rw->lock);
>  	__asm__ __volatile__(
>  "1:	ldrex	%0, [%2]\n"
>  "	adds	%0, %0, #1\n"
> @@ -231,6 +236,7 @@ static inline void arch_read_unlock(arch_rwlock_t *rw)
>  
>  	smp_mb();
>  
> +	prefetchw(&rw->lock);
>  	__asm__ __volatile__(
>  "1:	ldrex	%0, [%2]\n"
>  "	sub	%0, %0, #1\n"
> @@ -249,6 +255,7 @@ static inline int arch_read_trylock(arch_rwlock_t *rw)
>  {
>  	unsigned long tmp, tmp2 = 1;
>  
> +	prefetchw(&rw->lock);
>  	__asm__ __volatile__(
>  "	ldrex	%0, [%2]\n"
>  "	adds	%0, %0, #1\n"
> @@ -262,7 +269,7 @@ static inline int arch_read_trylock(arch_rwlock_t *rw)
>  }
>  
>  /* read_can_lock - would read_trylock() succeed? */
> -#define arch_read_can_lock(x)		((x)->lock < 0x80000000)
> +#define arch_read_can_lock(x)		(ACCESS_ONCE((x)->lock) < 0x80000000)
>  
>  #define arch_read_lock_flags(lock, flags) arch_read_lock(lock)
>  #define arch_write_lock_flags(lock, flags) arch_write_lock(lock)
> diff --git a/arch/arm/include/asm/spinlock_types.h b/arch/arm/include/asm/spinlock_types.h
> index b262d2f..47663fc 100644
> --- a/arch/arm/include/asm/spinlock_types.h
> +++ b/arch/arm/include/asm/spinlock_types.h
> @@ -25,7 +25,7 @@ typedef struct {
>  #define __ARCH_SPIN_LOCK_UNLOCKED	{ { 0 } }
>  
>  typedef struct {
> -	volatile unsigned int lock;
> +	u32 lock;
>  } arch_rwlock_t;
>  
>  #define __ARCH_RW_LOCK_UNLOCKED		{ 0 }
> -- 
> 1.8.2.2
> 

