* [PATCH v2 01/20] arm64: rwlocks: don't fail trylock purely due to contention
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
@ 2015-07-24 10:41 ` Will Deacon
2015-07-24 11:14 ` Catalin Marinas
2015-07-24 10:41 ` [PATCH v2 02/20] documentation: Clarify failed cmpxchg memory ordering semantics Will Deacon
` (18 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:41 UTC (permalink / raw)
To: linux-arm-kernel
STXR can fail for a number of reasons, so don't fail an rwlock trylock
operation simply because the STXR reported failure.
I'm not aware of any issues with the current code, but this makes it
consistent with spin_trylock and also other architectures (e.g. arch/arm).
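For illustration, here's a minimal C11 sketch (not the kernel code) of the
semantics we want: the weak compare-exchange plays the role of the LDAXR/STXR
pair, since it too is allowed to fail spuriously, and the trylock only gives
up when the lock is observed to be genuinely held.

  #include <stdatomic.h>
  #include <stdbool.h>

  /* Illustrative write-trylock: retry spurious store failures, only
   * report failure when the lock word is seen to be non-zero. */
  static bool write_trylock_sketch(atomic_uint *lock)
  {
          unsigned int expected = 0;

          while (!atomic_compare_exchange_weak_explicit(lock, &expected,
                                                        0x80000000u,
                                                        memory_order_acquire,
                                                        memory_order_relaxed)) {
                  if (expected != 0)
                          return false;   /* genuinely contended */
                  /* spurious failure: expected is still zero, so retry */
          }

          return true;
  }

The hunks below apply the same retry to arch_write_trylock() and
arch_read_trylock().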
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/spinlock.h | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index cee128732435..0f08ba5cfb33 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -140,10 +140,11 @@ static inline int arch_write_trylock(arch_rwlock_t *rw)
unsigned int tmp;
asm volatile(
- " ldaxr %w0, %1\n"
- " cbnz %w0, 1f\n"
+ "1: ldaxr %w0, %1\n"
+ " cbnz %w0, 2f\n"
" stxr %w0, %w2, %1\n"
- "1:\n"
+ " cbnz %w0, 1b\n"
+ "2:\n"
: "=&r" (tmp), "+Q" (rw->lock)
: "r" (0x80000000)
: "memory");
@@ -209,11 +210,12 @@ static inline int arch_read_trylock(arch_rwlock_t *rw)
unsigned int tmp, tmp2 = 1;
asm volatile(
- " ldaxr %w0, %2\n"
+ "1: ldaxr %w0, %2\n"
" add %w0, %w0, #1\n"
- " tbnz %w0, #31, 1f\n"
+ " tbnz %w0, #31, 2f\n"
" stxr %w1, %w0, %2\n"
- "1:\n"
+ " cbnz %w1, 1b\n"
+ "2:\n"
: "=&r" (tmp), "+r" (tmp2), "+Q" (rw->lock)
:
: "memory");
--
2.1.4
* [PATCH v2 02/20] documentation: Clarify failed cmpxchg memory ordering semantics
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
2015-07-24 10:41 ` [PATCH v2 01/20] arm64: rwlocks: don't fail trylock purely due to contention Will Deacon
@ 2015-07-24 10:41 ` Will Deacon
2015-07-24 11:15 ` Catalin Marinas
2015-07-27 11:58 ` Will Deacon
2015-07-24 10:41 ` [PATCH v2 03/20] arm64: cpufeature.h: add missing #include of kernel.h Will Deacon
` (17 subsequent siblings)
19 siblings, 2 replies; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:41 UTC (permalink / raw)
To: linux-arm-kernel
A failed cmpxchg does not provide any memory ordering guarantees, a
property that is used to optimise the cmpxchg implementations on Alpha,
PowerPC and arm64.
This patch updates atomic_ops.txt and memory-barriers.txt to reflect
this.
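As a rough C11 analogue (illustrative only; try_claim() and its arguments are
invented for this example), the failure ordering of a compare-exchange may be
weaker than the success ordering, which is exactly what the architectures
above exploit:

  #include <stdatomic.h>

  /* A successful claim needs acquire/release ordering; a failed
   * comparison provides nothing beyond a relaxed load. */
  static int try_claim(atomic_int *state, int free_val, int claimed_val)
  {
          int expected = free_val;

          return atomic_compare_exchange_strong_explicit(state, &expected,
                                                         claimed_val,
                                                         memory_order_acq_rel,  /* success */
                                                         memory_order_relaxed); /* failure */
  }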
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
Documentation/atomic_ops.txt | 4 +++-
Documentation/memory-barriers.txt | 6 +++---
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt
index dab6da3382d9..b19fc34efdb1 100644
--- a/Documentation/atomic_ops.txt
+++ b/Documentation/atomic_ops.txt
@@ -266,7 +266,9 @@ with the given old and new values. Like all atomic_xxx operations,
atomic_cmpxchg will only satisfy its atomicity semantics as long as all
other accesses of *v are performed through atomic_xxx operations.
-atomic_cmpxchg must provide explicit memory barriers around the operation.
+atomic_cmpxchg must provide explicit memory barriers around the operation,
+although if the comparison fails then no memory ordering guarantees are
+required.
The semantics for atomic_cmpxchg are the same as those defined for 'cas'
below.
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 13feb697271f..18fc860df1be 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2383,9 +2383,7 @@ about the state (old or new) implies an SMP-conditional general memory barrier
explicit lock operations, described later). These include:
xchg();
- cmpxchg();
atomic_xchg(); atomic_long_xchg();
- atomic_cmpxchg(); atomic_long_cmpxchg();
atomic_inc_return(); atomic_long_inc_return();
atomic_dec_return(); atomic_long_dec_return();
atomic_add_return(); atomic_long_add_return();
@@ -2398,7 +2396,9 @@ explicit lock operations, described later). These include:
test_and_clear_bit();
test_and_change_bit();
- /* when succeeds (returns 1) */
+ /* when succeeds */
+ cmpxchg();
+ atomic_cmpxchg(); atomic_long_cmpxchg();
atomic_add_unless(); atomic_long_add_unless();
These are used for such things as implementing ACQUIRE-class and RELEASE-class
--
2.1.4
* [PATCH v2 02/20] documentation: Clarify failed cmpxchg memory ordering semantics
2015-07-24 10:41 ` [PATCH v2 02/20] documentation: Clarify failed cmpxchg memory ordering semantics Will Deacon
@ 2015-07-24 11:15 ` Catalin Marinas
2015-07-27 11:58 ` Will Deacon
1 sibling, 0 replies; 44+ messages in thread
From: Catalin Marinas @ 2015-07-24 11:15 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jul 24, 2015 at 11:41:53AM +0100, Will Deacon wrote:
> A failed cmpxchg does not provide any memory ordering guarantees, a
> property that is used to optimise the cmpxchg implementations on Alpha,
> PowerPC and arm64.
>
> This patch updates atomic_ops.txt and memory-barriers.txt to reflect
> this.
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
* [PATCH v2 02/20] documentation: Clarify failed cmpxchg memory ordering semantics
2015-07-24 10:41 ` [PATCH v2 02/20] documentation: Clarify failed cmpxchg memory ordering semantics Will Deacon
2015-07-24 11:15 ` Catalin Marinas
@ 2015-07-27 11:58 ` Will Deacon
2015-07-27 12:02 ` Peter Zijlstra
1 sibling, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-27 11:58 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jul 24, 2015 at 11:41:53AM +0100, Will Deacon wrote:
> A failed cmpxchg does not provide any memory ordering guarantees, a
> property that is used to optimise the cmpxchg implementations on Alpha,
> PowerPC and arm64.
>
> This patch updates atomic_ops.txt and memory-barriers.txt to reflect
> this.
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
> Documentation/atomic_ops.txt | 4 +++-
> Documentation/memory-barriers.txt | 6 +++---
> 2 files changed, 6 insertions(+), 4 deletions(-)
Peter: are you ok with me taking this via the arm64 tree (along with the
rest of the series), or would you prefer this patch routed through -tip?
Will
> diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt
> index dab6da3382d9..b19fc34efdb1 100644
> --- a/Documentation/atomic_ops.txt
> +++ b/Documentation/atomic_ops.txt
> @@ -266,7 +266,9 @@ with the given old and new values. Like all atomic_xxx operations,
> atomic_cmpxchg will only satisfy its atomicity semantics as long as all
> other accesses of *v are performed through atomic_xxx operations.
>
> -atomic_cmpxchg must provide explicit memory barriers around the operation.
> +atomic_cmpxchg must provide explicit memory barriers around the operation,
> +although if the comparison fails then no memory ordering guarantees are
> +required.
>
> The semantics for atomic_cmpxchg are the same as those defined for 'cas'
> below.
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 13feb697271f..18fc860df1be 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -2383,9 +2383,7 @@ about the state (old or new) implies an SMP-conditional general memory barrier
> explicit lock operations, described later). These include:
>
> xchg();
> - cmpxchg();
> atomic_xchg(); atomic_long_xchg();
> - atomic_cmpxchg(); atomic_long_cmpxchg();
> atomic_inc_return(); atomic_long_inc_return();
> atomic_dec_return(); atomic_long_dec_return();
> atomic_add_return(); atomic_long_add_return();
> @@ -2398,7 +2396,9 @@ explicit lock operations, described later). These include:
> test_and_clear_bit();
> test_and_change_bit();
>
> - /* when succeeds (returns 1) */
> + /* when succeeds */
> + cmpxchg();
> + atomic_cmpxchg(); atomic_long_cmpxchg();
> atomic_add_unless(); atomic_long_add_unless();
>
> These are used for such things as implementing ACQUIRE-class and RELEASE-class
> --
> 2.1.4
>
* [PATCH v2 02/20] documentation: Clarify failed cmpxchg memory ordering semantics
2015-07-27 11:58 ` Will Deacon
@ 2015-07-27 12:02 ` Peter Zijlstra
2015-07-27 13:00 ` Will Deacon
0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2015-07-27 12:02 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Jul 27, 2015 at 12:58:22PM +0100, Will Deacon wrote:
> On Fri, Jul 24, 2015 at 11:41:53AM +0100, Will Deacon wrote:
> > A failed cmpxchg does not provide any memory ordering guarantees, a
> > property that is used to optimise the cmpxchg implementations on Alpha,
> > PowerPC and arm64.
> >
> > This patch updates atomic_ops.txt and memory-barriers.txt to reflect
> > this.
> >
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > ---
> > Documentation/atomic_ops.txt | 4 +++-
> > Documentation/memory-barriers.txt | 6 +++---
> > 2 files changed, 6 insertions(+), 4 deletions(-)
>
> Peter: are you ok with me taking this via the arm64 tree (along with the
> rest of the series), or would you prefer this patch routed through -tip?
So I have this one queued, and typically these changes go through tip
because the RCU tree also ends up there, and Paul is the typical source
of patches there.
So to minimize collisions on memory-barriers.txt I'd like to keep it if
it's not too much hassle.
* [PATCH v2 02/20] documentation: Clarify failed cmpxchg memory ordering semantics
2015-07-27 12:02 ` Peter Zijlstra
@ 2015-07-27 13:00 ` Will Deacon
0 siblings, 0 replies; 44+ messages in thread
From: Will Deacon @ 2015-07-27 13:00 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Jul 27, 2015 at 01:02:01PM +0100, Peter Zijlstra wrote:
> On Mon, Jul 27, 2015 at 12:58:22PM +0100, Will Deacon wrote:
> > On Fri, Jul 24, 2015 at 11:41:53AM +0100, Will Deacon wrote:
> > > A failed cmpxchg does not provide any memory ordering guarantees, a
> > > property that is used to optimise the cmpxchg implementations on Alpha,
> > > PowerPC and arm64.
> > >
> > > This patch updates atomic_ops.txt and memory-barriers.txt to reflect
> > > this.
> > >
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > > ---
> > > Documentation/atomic_ops.txt | 4 +++-
> > > Documentation/memory-barriers.txt | 6 +++---
> > > 2 files changed, 6 insertions(+), 4 deletions(-)
> >
> > Peter: are you ok with me taking this via the arm64 tree (along with the
> > rest of the series), or would you prefer this patch routed through -tip?
>
> So I have this one queued, and typically these changes go through tip
> because the RCU tree also ends up there, and Paul is the typical source
> of patches there.
>
> So to minimize collisions on memory-barriers.txt I'd like to keep it if
> it's not too much hassle.
No problem at all, just didn't want it to get dropped. Minimising conflicts
in memory-barriers.txt is definitely a good idea ;)
Thanks!
Will
* [PATCH v2 03/20] arm64: cpufeature.h: add missing #include of kernel.h
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
2015-07-24 10:41 ` [PATCH v2 01/20] arm64: rwlocks: don't fail trylock purely due to contention Will Deacon
2015-07-24 10:41 ` [PATCH v2 02/20] documentation: Clarify failed cmpxchg memory ordering semantics Will Deacon
@ 2015-07-24 10:41 ` Will Deacon
2015-07-24 11:15 ` Catalin Marinas
2015-07-24 10:41 ` [PATCH v2 04/20] arm64: atomics: move ll/sc atomics into separate header file Will Deacon
` (16 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:41 UTC (permalink / raw)
To: linux-arm-kernel
cpufeature.h makes use of DECLARE_BITMAP, which in turn relies on the
BITS_TO_LONGS and DIV_ROUND_UP macros.
This patch includes kernel.h in cpufeature.h to prevent all users having
to do the same thing.
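For context, the dependency chain looks roughly like this (simplified from
the generic headers, not a verbatim copy):

  /* DECLARE_BITMAP() needs BITS_TO_LONGS(), which needs DIV_ROUND_UP()
   * from <linux/kernel.h> -- hence the new #include. */
  #define DIV_ROUND_UP(n, d)            (((n) + (d) - 1) / (d))
  #define BITS_TO_LONGS(nr)             DIV_ROUND_UP(nr, 8 * sizeof(long))
  #define DECLARE_BITMAP(name, bits)    unsigned long name[BITS_TO_LONGS(bits)]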
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/cpufeature.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index c1044218a63a..eb09f1ee8036 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -30,6 +30,8 @@
#ifndef __ASSEMBLY__
+#include <linux/kernel.h>
+
struct arm64_cpu_capabilities {
const char *desc;
u16 capability;
--
2.1.4
* [PATCH v2 04/20] arm64: atomics: move ll/sc atomics into separate header file
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (2 preceding siblings ...)
2015-07-24 10:41 ` [PATCH v2 03/20] arm64: cpufeature.h: add missing #include of kernel.h Will Deacon
@ 2015-07-24 10:41 ` Will Deacon
2015-07-24 11:19 ` Catalin Marinas
2015-07-24 10:41 ` [PATCH v2 05/20] arm64: elf: advertise 8.1 atomic instructions as new hwcap Will Deacon
` (15 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:41 UTC (permalink / raw)
To: linux-arm-kernel
In preparation for the Large System Extension (LSE) atomic instructions
introduced by ARM v8.1, move the current exclusive load/store (LL/SC)
atomics into their own header file.
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/atomic.h | 162 +--------------------------
arch/arm64/include/asm/atomic_ll_sc.h | 205 ++++++++++++++++++++++++++++++++++
2 files changed, 207 insertions(+), 160 deletions(-)
create mode 100644 arch/arm64/include/asm/atomic_ll_sc.h
diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
index 7047051ded40..9467450a5c03 100644
--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -30,6 +30,8 @@
#ifdef __KERNEL__
+#include <asm/atomic_ll_sc.h>
+
/*
* On ARM, ordinary assignment (str instruction) doesn't clear the local
* strex/ldrex monitor on some implementations. The reason we can use it for
@@ -38,79 +40,6 @@
#define atomic_read(v) ACCESS_ONCE((v)->counter)
#define atomic_set(v,i) (((v)->counter) = (i))
-/*
- * AArch64 UP and SMP safe atomic ops. We use load exclusive and
- * store exclusive to ensure that these are atomic. We may loop
- * to ensure that the update happens.
- */
-
-#define ATOMIC_OP(op, asm_op) \
-static inline void atomic_##op(int i, atomic_t *v) \
-{ \
- unsigned long tmp; \
- int result; \
- \
- asm volatile("// atomic_" #op "\n" \
-"1: ldxr %w0, %2\n" \
-" " #asm_op " %w0, %w0, %w3\n" \
-" stxr %w1, %w0, %2\n" \
-" cbnz %w1, 1b" \
- : "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
- : "Ir" (i)); \
-} \
-
-#define ATOMIC_OP_RETURN(op, asm_op) \
-static inline int atomic_##op##_return(int i, atomic_t *v) \
-{ \
- unsigned long tmp; \
- int result; \
- \
- asm volatile("// atomic_" #op "_return\n" \
-"1: ldxr %w0, %2\n" \
-" " #asm_op " %w0, %w0, %w3\n" \
-" stlxr %w1, %w0, %2\n" \
-" cbnz %w1, 1b" \
- : "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
- : "Ir" (i) \
- : "memory"); \
- \
- smp_mb(); \
- return result; \
-}
-
-#define ATOMIC_OPS(op, asm_op) \
- ATOMIC_OP(op, asm_op) \
- ATOMIC_OP_RETURN(op, asm_op)
-
-ATOMIC_OPS(add, add)
-ATOMIC_OPS(sub, sub)
-
-#undef ATOMIC_OPS
-#undef ATOMIC_OP_RETURN
-#undef ATOMIC_OP
-
-static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
-{
- unsigned long tmp;
- int oldval;
-
- smp_mb();
-
- asm volatile("// atomic_cmpxchg\n"
-"1: ldxr %w1, %2\n"
-" cmp %w1, %w3\n"
-" b.ne 2f\n"
-" stxr %w0, %w4, %2\n"
-" cbnz %w0, 1b\n"
-"2:"
- : "=&r" (tmp), "=&r" (oldval), "+Q" (ptr->counter)
- : "Ir" (old), "r" (new)
- : "cc");
-
- smp_mb();
- return oldval;
-}
-
#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
static inline int __atomic_add_unless(atomic_t *v, int a, int u)
@@ -142,95 +71,8 @@ static inline int __atomic_add_unless(atomic_t *v, int a, int u)
#define atomic64_read(v) ACCESS_ONCE((v)->counter)
#define atomic64_set(v,i) (((v)->counter) = (i))
-#define ATOMIC64_OP(op, asm_op) \
-static inline void atomic64_##op(long i, atomic64_t *v) \
-{ \
- long result; \
- unsigned long tmp; \
- \
- asm volatile("// atomic64_" #op "\n" \
-"1: ldxr %0, %2\n" \
-" " #asm_op " %0, %0, %3\n" \
-" stxr %w1, %0, %2\n" \
-" cbnz %w1, 1b" \
- : "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
- : "Ir" (i)); \
-} \
-
-#define ATOMIC64_OP_RETURN(op, asm_op) \
-static inline long atomic64_##op##_return(long i, atomic64_t *v) \
-{ \
- long result; \
- unsigned long tmp; \
- \
- asm volatile("// atomic64_" #op "_return\n" \
-"1: ldxr %0, %2\n" \
-" " #asm_op " %0, %0, %3\n" \
-" stlxr %w1, %0, %2\n" \
-" cbnz %w1, 1b" \
- : "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
- : "Ir" (i) \
- : "memory"); \
- \
- smp_mb(); \
- return result; \
-}
-
-#define ATOMIC64_OPS(op, asm_op) \
- ATOMIC64_OP(op, asm_op) \
- ATOMIC64_OP_RETURN(op, asm_op)
-
-ATOMIC64_OPS(add, add)
-ATOMIC64_OPS(sub, sub)
-
-#undef ATOMIC64_OPS
-#undef ATOMIC64_OP_RETURN
-#undef ATOMIC64_OP
-
-static inline long atomic64_cmpxchg(atomic64_t *ptr, long old, long new)
-{
- long oldval;
- unsigned long res;
-
- smp_mb();
-
- asm volatile("// atomic64_cmpxchg\n"
-"1: ldxr %1, %2\n"
-" cmp %1, %3\n"
-" b.ne 2f\n"
-" stxr %w0, %4, %2\n"
-" cbnz %w0, 1b\n"
-"2:"
- : "=&r" (res), "=&r" (oldval), "+Q" (ptr->counter)
- : "Ir" (old), "r" (new)
- : "cc");
-
- smp_mb();
- return oldval;
-}
-
#define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
-static inline long atomic64_dec_if_positive(atomic64_t *v)
-{
- long result;
- unsigned long tmp;
-
- asm volatile("// atomic64_dec_if_positive\n"
-"1: ldxr %0, %2\n"
-" subs %0, %0, #1\n"
-" b.mi 2f\n"
-" stlxr %w1, %0, %2\n"
-" cbnz %w1, 1b\n"
-" dmb ish\n"
-"2:"
- : "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
- :
- : "cc", "memory");
-
- return result;
-}
-
static inline int atomic64_add_unless(atomic64_t *v, long a, long u)
{
long c, old;
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
new file mode 100644
index 000000000000..aef70f2d4cb8
--- /dev/null
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -0,0 +1,205 @@
+/*
+ * Based on arch/arm/include/asm/atomic.h
+ *
+ * Copyright (C) 1996 Russell King.
+ * Copyright (C) 2002 Deep Blue Solutions Ltd.
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __ASM_ATOMIC_LL_SC_H
+#define __ASM_ATOMIC_LL_SC_H
+
+/*
+ * AArch64 UP and SMP safe atomic ops. We use load exclusive and
+ * store exclusive to ensure that these are atomic. We may loop
+ * to ensure that the update happens.
+ *
+ * NOTE: these functions do *not* follow the PCS and must explicitly
+ * save any clobbered registers other than x0 (regardless of return
+ * value). This is achieved through -fcall-saved-* compiler flags for
+ * this file, which unfortunately don't work on a per-function basis
+ * (the optimize attribute silently ignores these options).
+ */
+
+#ifndef __LL_SC_INLINE
+#define __LL_SC_INLINE static inline
+#endif
+
+#ifndef __LL_SC_PREFIX
+#define __LL_SC_PREFIX(x) x
+#endif
+
+#define ATOMIC_OP(op, asm_op) \
+__LL_SC_INLINE void \
+__LL_SC_PREFIX(atomic_##op(int i, atomic_t *v)) \
+{ \
+ unsigned long tmp; \
+ int result; \
+ \
+ asm volatile("// atomic_" #op "\n" \
+"1: ldxr %w0, %2\n" \
+" " #asm_op " %w0, %w0, %w3\n" \
+" stxr %w1, %w0, %2\n" \
+" cbnz %w1, 1b" \
+ : "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
+ : "Ir" (i)); \
+} \
+
+#define ATOMIC_OP_RETURN(op, asm_op) \
+__LL_SC_INLINE int \
+__LL_SC_PREFIX(atomic_##op##_return(int i, atomic_t *v)) \
+{ \
+ unsigned long tmp; \
+ int result; \
+ \
+ asm volatile("// atomic_" #op "_return\n" \
+"1: ldxr %w0, %2\n" \
+" " #asm_op " %w0, %w0, %w3\n" \
+" stlxr %w1, %w0, %2\n" \
+" cbnz %w1, 1b" \
+ : "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
+ : "Ir" (i) \
+ : "memory"); \
+ \
+ smp_mb(); \
+ return result; \
+}
+
+#define ATOMIC_OPS(op, asm_op) \
+ ATOMIC_OP(op, asm_op) \
+ ATOMIC_OP_RETURN(op, asm_op)
+
+ATOMIC_OPS(add, add)
+ATOMIC_OPS(sub, sub)
+
+#undef ATOMIC_OPS
+#undef ATOMIC_OP_RETURN
+#undef ATOMIC_OP
+
+__LL_SC_INLINE int
+__LL_SC_PREFIX(atomic_cmpxchg(atomic_t *ptr, int old, int new))
+{
+ unsigned long tmp;
+ int oldval;
+
+ smp_mb();
+
+ asm volatile("// atomic_cmpxchg\n"
+"1: ldxr %w1, %2\n"
+" cmp %w1, %w3\n"
+" b.ne 2f\n"
+" stxr %w0, %w4, %2\n"
+" cbnz %w0, 1b\n"
+"2:"
+ : "=&r" (tmp), "=&r" (oldval), "+Q" (ptr->counter)
+ : "Ir" (old), "r" (new)
+ : "cc");
+
+ smp_mb();
+ return oldval;
+}
+
+#define ATOMIC64_OP(op, asm_op) \
+__LL_SC_INLINE void \
+__LL_SC_PREFIX(atomic64_##op(long i, atomic64_t *v)) \
+{ \
+ long result; \
+ unsigned long tmp; \
+ \
+ asm volatile("// atomic64_" #op "\n" \
+"1: ldxr %0, %2\n" \
+" " #asm_op " %0, %0, %3\n" \
+" stxr %w1, %0, %2\n" \
+" cbnz %w1, 1b" \
+ : "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
+ : "Ir" (i)); \
+} \
+
+#define ATOMIC64_OP_RETURN(op, asm_op) \
+__LL_SC_INLINE long \
+__LL_SC_PREFIX(atomic64_##op##_return(long i, atomic64_t *v)) \
+{ \
+ long result; \
+ unsigned long tmp; \
+ \
+ asm volatile("// atomic64_" #op "_return\n" \
+"1: ldxr %0, %2\n" \
+" " #asm_op " %0, %0, %3\n" \
+" stlxr %w1, %0, %2\n" \
+" cbnz %w1, 1b" \
+ : "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
+ : "Ir" (i) \
+ : "memory"); \
+ \
+ smp_mb(); \
+ return result; \
+}
+
+#define ATOMIC64_OPS(op, asm_op) \
+ ATOMIC64_OP(op, asm_op) \
+ ATOMIC64_OP_RETURN(op, asm_op)
+
+ATOMIC64_OPS(add, add)
+ATOMIC64_OPS(sub, sub)
+
+#undef ATOMIC64_OPS
+#undef ATOMIC64_OP_RETURN
+#undef ATOMIC64_OP
+
+__LL_SC_INLINE long
+__LL_SC_PREFIX(atomic64_cmpxchg(atomic64_t *ptr, long old, long new))
+{
+ long oldval;
+ unsigned long res;
+
+ smp_mb();
+
+ asm volatile("// atomic64_cmpxchg\n"
+"1: ldxr %1, %2\n"
+" cmp %1, %3\n"
+" b.ne 2f\n"
+" stxr %w0, %4, %2\n"
+" cbnz %w0, 1b\n"
+"2:"
+ : "=&r" (res), "=&r" (oldval), "+Q" (ptr->counter)
+ : "Ir" (old), "r" (new)
+ : "cc");
+
+ smp_mb();
+ return oldval;
+}
+
+__LL_SC_INLINE long
+__LL_SC_PREFIX(atomic64_dec_if_positive(atomic64_t *v))
+{
+ long result;
+ unsigned long tmp;
+
+ asm volatile("// atomic64_dec_if_positive\n"
+"1: ldxr %0, %2\n"
+" subs %0, %0, #1\n"
+" b.mi 2f\n"
+" stlxr %w1, %0, %2\n"
+" cbnz %w1, 1b\n"
+" dmb ish\n"
+"2:"
+ : "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
+ :
+ : "cc", "memory");
+
+ return result;
+}
+
+#endif /* __ASM_ATOMIC_LL_SC_H */
--
2.1.4
* [PATCH v2 05/20] arm64: elf: advertise 8.1 atomic instructions as new hwcap
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (3 preceding siblings ...)
2015-07-24 10:41 ` [PATCH v2 04/20] arm64: atomics: move ll/sc atomics into separate header file Will Deacon
@ 2015-07-24 10:41 ` Will Deacon
2015-07-24 11:24 ` Catalin Marinas
2015-07-24 10:41 ` [PATCH v2 06/20] arm64: alternatives: add cpu feature for lse atomics Will Deacon
` (14 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:41 UTC (permalink / raw)
To: linux-arm-kernel
The ARM v8.1 architecture introduces new atomic instructions to the A64
instruction set for things like cmpxchg, so advertise their availability
to userspace using a hwcap.
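As a minimal userspace sketch (illustrative; only getauxval()/AT_HWCAP from
glibc and the HWCAP_ATOMICS value below come from real interfaces), a program
can test for the new capability like this:

  #include <stdio.h>
  #include <sys/auxv.h>

  #ifndef HWCAP_ATOMICS
  #define HWCAP_ATOMICS (1 << 8)        /* value added by this patch */
  #endif

  int main(void)
  {
          if (getauxval(AT_HWCAP) & HWCAP_ATOMICS)
                  printf("ARMv8.1 LSE atomics available\n");
          else
                  printf("no LSE atomics; fall back to LL/SC\n");
          return 0;
  }

The same information is exposed as the "atomics" string in /proc/cpuinfo via
hwcap_str[] below.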
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/kernel/setup.c | 14 ++++++++++++++
2 files changed, 15 insertions(+)
diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h
index 73cf0f54d57c..361c8a8ef55f 100644
--- a/arch/arm64/include/uapi/asm/hwcap.h
+++ b/arch/arm64/include/uapi/asm/hwcap.h
@@ -27,5 +27,6 @@
#define HWCAP_SHA1 (1 << 5)
#define HWCAP_SHA2 (1 << 6)
#define HWCAP_CRC32 (1 << 7)
+#define HWCAP_ATOMICS (1 << 8)
#endif /* _UAPI__ASM_HWCAP_H */
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index f3067d4d4e35..c7fd2c946374 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -280,6 +280,19 @@ static void __init setup_processor(void)
if (block && !(block & 0x8))
elf_hwcap |= HWCAP_CRC32;
+ block = (features >> 20) & 0xf;
+ if (!(block & 0x8)) {
+ switch (block) {
+ default:
+ case 2:
+ elf_hwcap |= HWCAP_ATOMICS;
+ case 1:
+ /* RESERVED */
+ case 0:
+ break;
+ }
+ }
+
#ifdef CONFIG_COMPAT
/*
* ID_ISAR5_EL1 carries similar information as above, but pertaining to
@@ -456,6 +469,7 @@ static const char *hwcap_str[] = {
"sha1",
"sha2",
"crc32",
+ "atomics",
NULL
};
--
2.1.4
* [PATCH v2 05/20] arm64: elf: advertise 8.1 atomic instructions as new hwcap
2015-07-24 10:41 ` [PATCH v2 05/20] arm64: elf: advertise 8.1 atomic instructions as new hwcap Will Deacon
@ 2015-07-24 11:24 ` Catalin Marinas
0 siblings, 0 replies; 44+ messages in thread
From: Catalin Marinas @ 2015-07-24 11:24 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jul 24, 2015 at 11:41:56AM +0100, Will Deacon wrote:
> The ARM v8.1 architecture introduces new atomic instructions to the A64
> instruction set for things like cmpxchg, so advertise their availability
> to userspace using a hwcap.
>
> Reviewed-by: Steve Capper <steve.capper@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h
> index 73cf0f54d57c..361c8a8ef55f 100644
> --- a/arch/arm64/include/uapi/asm/hwcap.h
> +++ b/arch/arm64/include/uapi/asm/hwcap.h
> @@ -27,5 +27,6 @@
> #define HWCAP_SHA1 (1 << 5)
> #define HWCAP_SHA2 (1 << 6)
> #define HWCAP_CRC32 (1 << 7)
> +#define HWCAP_ATOMICS (1 << 8)
>
> #endif /* _UAPI__ASM_HWCAP_H */
> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
> index f3067d4d4e35..c7fd2c946374 100644
> --- a/arch/arm64/kernel/setup.c
> +++ b/arch/arm64/kernel/setup.c
> @@ -280,6 +280,19 @@ static void __init setup_processor(void)
> if (block && !(block & 0x8))
> elf_hwcap |= HWCAP_CRC32;
>
> + block = (features >> 20) & 0xf;
> + if (!(block & 0x8)) {
> + switch (block) {
> + default:
> + case 2:
> + elf_hwcap |= HWCAP_ATOMICS;
> + case 1:
> + /* RESERVED */
> + case 0:
> + break;
> + }
> + }
After we merge Robin's patch for cpuid_feature_extract_field(), we can
change this function (subsequent patch though).
--
Catalin
* [PATCH v2 06/20] arm64: alternatives: add cpu feature for lse atomics
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (4 preceding siblings ...)
2015-07-24 10:41 ` [PATCH v2 05/20] arm64: elf: advertise 8.1 atomic instructions as new hwcap Will Deacon
@ 2015-07-24 10:41 ` Will Deacon
2015-07-24 11:26 ` Catalin Marinas
2015-07-24 10:41 ` [PATCH v2 07/20] arm64: introduce CONFIG_ARM64_LSE_ATOMICS as fallback to ll/sc atomics Will Deacon
` (13 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:41 UTC (permalink / raw)
To: linux-arm-kernel
Add a CPU feature for the LSE atomic instructions, so that they can be
patched in at runtime when we detect that they are supported.
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/cpufeature.h | 3 ++-
arch/arm64/kernel/setup.c | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index eb09f1ee8036..d58db9d3b4fa 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -25,8 +25,9 @@
#define ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE 1
#define ARM64_WORKAROUND_845719 2
#define ARM64_HAS_SYSREG_GIC_CPUIF 3
+#define ARM64_CPU_FEAT_LSE_ATOMICS 4
-#define ARM64_NCAPS 4
+#define ARM64_NCAPS 5
#ifndef __ASSEMBLY__
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index c7fd2c946374..5b170df96aaf 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -286,6 +286,7 @@ static void __init setup_processor(void)
default:
case 2:
elf_hwcap |= HWCAP_ATOMICS;
+ cpus_set_cap(ARM64_CPU_FEAT_LSE_ATOMICS);
case 1:
/* RESERVED */
case 0:
--
2.1.4
* [PATCH v2 07/20] arm64: introduce CONFIG_ARM64_LSE_ATOMICS as fallback to ll/sc atomics
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (5 preceding siblings ...)
2015-07-24 10:41 ` [PATCH v2 06/20] arm64: alternatives: add cpu feature for lse atomics Will Deacon
@ 2015-07-24 10:41 ` Will Deacon
2015-07-24 11:38 ` Catalin Marinas
2015-07-24 10:41 ` [PATCH v2 08/20] arm64: atomics: patch in lse instructions when supported by the CPU Will Deacon
` (12 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:41 UTC (permalink / raw)
To: linux-arm-kernel
In order to patch in the new atomic instructions at runtime, we need to
generate wrappers around the out-of-line exclusive load/store atomics.
This patch adds a new Kconfig option, CONFIG_ARM64_LSE_ATOMICS, which
causes our atomic functions to branch to the out-of-line ll/sc
implementations. To avoid the register spill overhead of the PCS, the
out-of-line functions are compiled with specific compiler flags to
force out-of-line save/restore of any registers that are usually
caller-saved.
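For reference, ATOMIC_OP(add, add) below expands to roughly the following
(hand-expanded for illustration; the real definition is in the atomic_lse.h
hunk):

  static inline void atomic_add(int i, atomic_t *v)
  {
          register int w0 asm ("w0") = i;
          register atomic_t *x1 asm ("x1") = v;

          /* a single branch-and-link to the out-of-line LL/SC routine;
           * the -fcall-saved-* flags mean only w0 and x30 are clobbered */
          asm volatile("bl      __ll_sc_atomic_add\n"
                       : "+r" (w0), "+Q" (v->counter)
                       : "r" (x1)
                       : "x30");
  }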
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/Kconfig | 12 +++
arch/arm64/include/asm/atomic.h | 9 ++
arch/arm64/include/asm/atomic_ll_sc.h | 19 +++-
arch/arm64/include/asm/atomic_lse.h | 160 ++++++++++++++++++++++++++++++++++
arch/arm64/lib/Makefile | 13 +++
arch/arm64/lib/atomic_ll_sc.c | 3 +
6 files changed, 214 insertions(+), 2 deletions(-)
create mode 100644 arch/arm64/include/asm/atomic_lse.h
create mode 100644 arch/arm64/lib/atomic_ll_sc.c
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 318175f62c24..d11b1af62438 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -664,6 +664,18 @@ config SETEND_EMULATION
If unsure, say Y
endif
+config ARM64_LSE_ATOMICS
+ bool "ARMv8.1 atomic instructions"
+ help
+ As part of the Large System Extensions, ARMv8.1 introduces new
+ atomic instructions that are designed specifically to scale in
+ very large systems.
+
+ Say Y here to make use of these instructions for the in-kernel
+ atomic routines. This incurs a small overhead on CPUs that do
+ not support these instructions and requires the kernel to be
+ built with binutils >= 2.25.
+
endmenu
menu "Boot options"
diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
index 9467450a5c03..955cc14f3ce4 100644
--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -21,6 +21,7 @@
#define __ASM_ATOMIC_H
#include <linux/compiler.h>
+#include <linux/stringify.h>
#include <linux/types.h>
#include <asm/barrier.h>
@@ -30,7 +31,15 @@
#ifdef __KERNEL__
+#define __ARM64_IN_ATOMIC_IMPL
+
+#ifdef CONFIG_ARM64_LSE_ATOMICS
+#include <asm/atomic_lse.h>
+#else
#include <asm/atomic_ll_sc.h>
+#endif
+
+#undef __ARM64_IN_ATOMIC_IMPL
/*
* On ARM, ordinary assignment (str instruction) doesn't clear the local
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index aef70f2d4cb8..024b892dbc6a 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -21,6 +21,10 @@
#ifndef __ASM_ATOMIC_LL_SC_H
#define __ASM_ATOMIC_LL_SC_H
+#ifndef __ARM64_IN_ATOMIC_IMPL
+#error "please don't include this file directly"
+#endif
+
/*
* AArch64 UP and SMP safe atomic ops. We use load exclusive and
* store exclusive to ensure that these are atomic. We may loop
@@ -41,6 +45,10 @@
#define __LL_SC_PREFIX(x) x
#endif
+#ifndef __LL_SC_EXPORT
+#define __LL_SC_EXPORT(x)
+#endif
+
#define ATOMIC_OP(op, asm_op) \
__LL_SC_INLINE void \
__LL_SC_PREFIX(atomic_##op(int i, atomic_t *v)) \
@@ -56,6 +64,7 @@ __LL_SC_PREFIX(atomic_##op(int i, atomic_t *v)) \
: "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
: "Ir" (i)); \
} \
+__LL_SC_EXPORT(atomic_##op);
#define ATOMIC_OP_RETURN(op, asm_op) \
__LL_SC_INLINE int \
@@ -75,7 +84,8 @@ __LL_SC_PREFIX(atomic_##op##_return(int i, atomic_t *v)) \
\
smp_mb(); \
return result; \
-}
+} \
+__LL_SC_EXPORT(atomic_##op##_return);
#define ATOMIC_OPS(op, asm_op) \
ATOMIC_OP(op, asm_op) \
@@ -110,6 +120,7 @@ __LL_SC_PREFIX(atomic_cmpxchg(atomic_t *ptr, int old, int new))
smp_mb();
return oldval;
}
+__LL_SC_EXPORT(atomic_cmpxchg);
#define ATOMIC64_OP(op, asm_op) \
__LL_SC_INLINE void \
@@ -126,6 +137,7 @@ __LL_SC_PREFIX(atomic64_##op(long i, atomic64_t *v)) \
: "=&r" (result), "=&r" (tmp), "+Q" (v->counter) \
: "Ir" (i)); \
} \
+__LL_SC_EXPORT(atomic64_##op);
#define ATOMIC64_OP_RETURN(op, asm_op) \
__LL_SC_INLINE long \
@@ -145,7 +157,8 @@ __LL_SC_PREFIX(atomic64_##op##_return(long i, atomic64_t *v)) \
\
smp_mb(); \
return result; \
-}
+} \
+__LL_SC_EXPORT(atomic64_##op##_return);
#define ATOMIC64_OPS(op, asm_op) \
ATOMIC64_OP(op, asm_op) \
@@ -180,6 +193,7 @@ __LL_SC_PREFIX(atomic64_cmpxchg(atomic64_t *ptr, long old, long new))
smp_mb();
return oldval;
}
+__LL_SC_EXPORT(atomic64_cmpxchg);
__LL_SC_INLINE long
__LL_SC_PREFIX(atomic64_dec_if_positive(atomic64_t *v))
@@ -201,5 +215,6 @@ __LL_SC_PREFIX(atomic64_dec_if_positive(atomic64_t *v))
return result;
}
+__LL_SC_EXPORT(atomic64_dec_if_positive);
#endif /* __ASM_ATOMIC_LL_SC_H */
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
new file mode 100644
index 000000000000..088ede94771a
--- /dev/null
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -0,0 +1,160 @@
+/*
+ * Based on arch/arm/include/asm/atomic.h
+ *
+ * Copyright (C) 1996 Russell King.
+ * Copyright (C) 2002 Deep Blue Solutions Ltd.
+ * Copyright (C) 2012 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __ASM_ATOMIC_LSE_H
+#define __ASM_ATOMIC_LSE_H
+
+#ifndef __ARM64_IN_ATOMIC_IMPL
+#error "please don't include this file directly"
+#endif
+
+/* Move the ll/sc atomics out-of-line */
+#define __LL_SC_INLINE
+#define __LL_SC_PREFIX(x) __ll_sc_##x
+#define __LL_SC_EXPORT(x) EXPORT_SYMBOL(__LL_SC_PREFIX(x))
+
+/* Macros for constructing calls to out-of-line ll/sc atomics */
+#define __LL_SC_CALL(op) \
+ "bl\t" __stringify(__LL_SC_PREFIX(atomic_##op)) "\n"
+#define __LL_SC_CALL64(op) \
+ "bl\t" __stringify(__LL_SC_PREFIX(atomic64_##op)) "\n"
+
+#define ATOMIC_OP(op, asm_op) \
+static inline void atomic_##op(int i, atomic_t *v) \
+{ \
+ register int w0 asm ("w0") = i; \
+ register atomic_t *x1 asm ("x1") = v; \
+ \
+ asm volatile( \
+ __LL_SC_CALL(op) \
+ : "+r" (w0), "+Q" (v->counter) \
+ : "r" (x1) \
+ : "x30"); \
+} \
+
+#define ATOMIC_OP_RETURN(op, asm_op) \
+static inline int atomic_##op##_return(int i, atomic_t *v) \
+{ \
+ register int w0 asm ("w0") = i; \
+ register atomic_t *x1 asm ("x1") = v; \
+ \
+ asm volatile( \
+ __LL_SC_CALL(op##_return) \
+ : "+r" (w0) \
+ : "r" (x1) \
+ : "x30", "memory"); \
+ \
+ return w0; \
+}
+
+#define ATOMIC_OPS(op, asm_op) \
+ ATOMIC_OP(op, asm_op) \
+ ATOMIC_OP_RETURN(op, asm_op)
+
+ATOMIC_OPS(add, add)
+ATOMIC_OPS(sub, sub)
+
+#undef ATOMIC_OPS
+#undef ATOMIC_OP_RETURN
+#undef ATOMIC_OP
+
+static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
+{
+ register unsigned long x0 asm ("x0") = (unsigned long)ptr;
+ register int w1 asm ("w1") = old;
+ register int w2 asm ("w2") = new;
+
+ asm volatile(
+ __LL_SC_CALL(cmpxchg)
+ : "+r" (x0)
+ : "r" (w1), "r" (w2)
+ : "x30", "cc", "memory");
+
+ return x0;
+}
+
+#define ATOMIC64_OP(op, asm_op) \
+static inline void atomic64_##op(long i, atomic64_t *v) \
+{ \
+ register long x0 asm ("x0") = i; \
+ register atomic64_t *x1 asm ("x1") = v; \
+ \
+ asm volatile( \
+ __LL_SC_CALL64(op) \
+ : "+r" (x0), "+Q" (v->counter) \
+ : "r" (x1) \
+ : "x30"); \
+} \
+
+#define ATOMIC64_OP_RETURN(op, asm_op) \
+static inline long atomic64_##op##_return(long i, atomic64_t *v) \
+{ \
+ register long x0 asm ("x0") = i; \
+ register atomic64_t *x1 asm ("x1") = v; \
+ \
+ asm volatile( \
+ __LL_SC_CALL64(op##_return) \
+ : "+r" (x0) \
+ : "r" (x1) \
+ : "x30", "memory"); \
+ \
+ return x0; \
+}
+
+#define ATOMIC64_OPS(op, asm_op) \
+ ATOMIC64_OP(op, asm_op) \
+ ATOMIC64_OP_RETURN(op, asm_op)
+
+ATOMIC64_OPS(add, add)
+ATOMIC64_OPS(sub, sub)
+
+#undef ATOMIC64_OPS
+#undef ATOMIC64_OP_RETURN
+#undef ATOMIC64_OP
+
+static inline long atomic64_cmpxchg(atomic64_t *ptr, long old, long new)
+{
+ register unsigned long x0 asm ("x0") = (unsigned long)ptr;
+ register long x1 asm ("x1") = old;
+ register long x2 asm ("x2") = new;
+
+ asm volatile(
+ __LL_SC_CALL64(cmpxchg)
+ : "+r" (x0)
+ : "r" (x1), "r" (x2)
+ : "x30", "cc", "memory");
+
+ return x0;
+}
+
+static inline long atomic64_dec_if_positive(atomic64_t *v)
+{
+ register unsigned long x0 asm ("x0") = (unsigned long)v;
+
+ asm volatile(
+ __LL_SC_CALL64(dec_if_positive)
+ : "+r" (x0)
+ :
+ : "x30", "cc", "memory");
+
+ return x0;
+}
+
+#endif /* __ASM_ATOMIC_LSE_H */
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index d98d3e39879e..1a811ecf71da 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -3,3 +3,16 @@ lib-y := bitops.o clear_user.o delay.o copy_from_user.o \
clear_page.o memchr.o memcpy.o memmove.o memset.o \
memcmp.o strcmp.o strncmp.o strlen.o strnlen.o \
strchr.o strrchr.o
+
+# Tell the compiler to treat all general purpose registers as
+# callee-saved, which allows for efficient runtime patching of the bl
+# instruction in the caller with an atomic instruction when supported by
+# the CPU. Result and argument registers are handled correctly, based on
+# the function prototype.
+lib-$(CONFIG_ARM64_LSE_ATOMICS) += atomic_ll_sc.o
+CFLAGS_atomic_ll_sc.o := -fcall-used-x0 -ffixed-x1 -ffixed-x2 \
+ -ffixed-x3 -ffixed-x4 -ffixed-x5 -ffixed-x6 \
+ -ffixed-x7 -fcall-saved-x8 -fcall-saved-x9 \
+ -fcall-saved-x10 -fcall-saved-x11 -fcall-saved-x12 \
+ -fcall-saved-x13 -fcall-saved-x14 -fcall-saved-x15 \
+ -fcall-saved-x16 -fcall-saved-x17 -fcall-saved-x18
diff --git a/arch/arm64/lib/atomic_ll_sc.c b/arch/arm64/lib/atomic_ll_sc.c
new file mode 100644
index 000000000000..b0c538b0da28
--- /dev/null
+++ b/arch/arm64/lib/atomic_ll_sc.c
@@ -0,0 +1,3 @@
+#include <asm/atomic.h>
+#define __ARM64_IN_ATOMIC_IMPL
+#include <asm/atomic_ll_sc.h>
--
2.1.4
* [PATCH v2 07/20] arm64: introduce CONFIG_ARM64_LSE_ATOMICS as fallback to ll/sc atomics
2015-07-24 10:41 ` [PATCH v2 07/20] arm64: introduce CONFIG_ARM64_LSE_ATOMICS as fallback to ll/sc atomics Will Deacon
@ 2015-07-24 11:38 ` Catalin Marinas
0 siblings, 0 replies; 44+ messages in thread
From: Catalin Marinas @ 2015-07-24 11:38 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jul 24, 2015 at 11:41:58AM +0100, Will Deacon wrote:
> In order to patch in the new atomic instructions at runtime, we need to
> generate wrappers around the out-of-line exclusive load/store atomics.
>
> This patch adds a new Kconfig option, CONFIG_ARM64_LSE_ATOMICS. which
> causes our atomic functions to branch to the out-of-line ll/sc
> implementations. To avoid the register spill overhead of the PCS, the
> out-of-line functions are compiled with specific compiler flags to
> force out-of-line save/restore of any registers that are usually
> caller-saved.
>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
Slightly easier to read now ;).
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
* [PATCH v2 08/20] arm64: atomics: patch in lse instructions when supported by the CPU
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (6 preceding siblings ...)
2015-07-24 10:41 ` [PATCH v2 07/20] arm64: introduce CONFIG_ARM64_LSE_ATOMICS as fallback to ll/sc atomics Will Deacon
@ 2015-07-24 10:41 ` Will Deacon
2015-07-24 14:43 ` Catalin Marinas
2015-07-24 10:42 ` [PATCH v2 09/20] arm64: locks: " Will Deacon
` (11 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:41 UTC (permalink / raw)
To: linux-arm-kernel
On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.
This patch introduces runtime patching of atomic_t and atomic64_t
routines so that the call-site for the out-of-line ll/sc sequences is
patched with an LSE atomic instruction when we detect that
the CPU supports it.
If binutils is not recent enough to assemble the LSE instructions, then
the ll/sc sequences are inlined as though CONFIG_ARM64_LSE_ATOMICS=n.
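Schematically, the alternatives framework leaves the LL/SC call in place by
default and rewrites the same slots at boot on ARMv8.1 CPUs, with nops used
as padding so both encodings occupy the same space. For atomic_add() and
atomic_add_return() the result looks roughly like this (illustrative summary
of the hunks below):

  /*
   *  default (out-of-line LL/SC)          patched at boot (LSE)
   *  ---------------------------          ---------------------
   *  bl   __ll_sc_atomic_add              stadd   w0, [x1]
   *
   *  nop                                  ldaddal w0, w30, [x1]
   *  bl   __ll_sc_atomic_add_return       add     w0, w0, w30
   */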
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/Makefile | 13 +-
arch/arm64/include/asm/atomic.h | 4 +-
arch/arm64/include/asm/atomic_ll_sc.h | 12 --
arch/arm64/include/asm/atomic_lse.h | 270 ++++++++++++++++++++++------------
arch/arm64/include/asm/lse.h | 34 +++++
arch/arm64/kernel/setup.c | 3 +
6 files changed, 229 insertions(+), 107 deletions(-)
create mode 100644 arch/arm64/include/asm/lse.h
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 4d2a925998f9..fa23c0dc3e77 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -17,7 +17,18 @@ GZFLAGS :=-9
KBUILD_DEFCONFIG := defconfig
-KBUILD_CFLAGS += -mgeneral-regs-only
+# Check for binutils support for specific extensions
+lseinstr := $(call as-instr,.arch_extension lse,-DCONFIG_AS_LSE=1)
+
+ifeq ($(CONFIG_ARM64_LSE_ATOMICS), y)
+ ifeq ($(lseinstr),)
+$(warning LSE atomics not supported by binutils)
+ endif
+endif
+
+KBUILD_CFLAGS += -mgeneral-regs-only $(lseinstr)
+KBUILD_AFLAGS += $(lseinstr)
+
ifeq ($(CONFIG_CPU_BIG_ENDIAN), y)
KBUILD_CPPFLAGS += -mbig-endian
AS += -EB
diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
index 955cc14f3ce4..cb53efa23f62 100644
--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -21,11 +21,11 @@
#define __ASM_ATOMIC_H
#include <linux/compiler.h>
-#include <linux/stringify.h>
#include <linux/types.h>
#include <asm/barrier.h>
#include <asm/cmpxchg.h>
+#include <asm/lse.h>
#define ATOMIC_INIT(i) { (i) }
@@ -33,7 +33,7 @@
#define __ARM64_IN_ATOMIC_IMPL
-#ifdef CONFIG_ARM64_LSE_ATOMICS
+#if defined(CONFIG_ARM64_LSE_ATOMICS) && defined(CONFIG_AS_LSE)
#include <asm/atomic_lse.h>
#else
#include <asm/atomic_ll_sc.h>
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index 024b892dbc6a..9cf298914ac3 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -37,18 +37,6 @@
* (the optimize attribute silently ignores these options).
*/
-#ifndef __LL_SC_INLINE
-#define __LL_SC_INLINE static inline
-#endif
-
-#ifndef __LL_SC_PREFIX
-#define __LL_SC_PREFIX(x) x
-#endif
-
-#ifndef __LL_SC_EXPORT
-#define __LL_SC_EXPORT(x)
-#endif
-
#define ATOMIC_OP(op, asm_op) \
__LL_SC_INLINE void \
__LL_SC_PREFIX(atomic_##op(int i, atomic_t *v)) \
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index 088ede94771a..e6e544bce331 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -25,55 +25,76 @@
#error "please don't include this file directly"
#endif
-/* Move the ll/sc atomics out-of-line */
-#define __LL_SC_INLINE
-#define __LL_SC_PREFIX(x) __ll_sc_##x
-#define __LL_SC_EXPORT(x) EXPORT_SYMBOL(__LL_SC_PREFIX(x))
-
-/* Macros for constructing calls to out-of-line ll/sc atomics */
-#define __LL_SC_CALL(op) \
- "bl\t" __stringify(__LL_SC_PREFIX(atomic_##op)) "\n"
-#define __LL_SC_CALL64(op) \
- "bl\t" __stringify(__LL_SC_PREFIX(atomic64_##op)) "\n"
-
-#define ATOMIC_OP(op, asm_op) \
-static inline void atomic_##op(int i, atomic_t *v) \
-{ \
- register int w0 asm ("w0") = i; \
- register atomic_t *x1 asm ("x1") = v; \
- \
- asm volatile( \
- __LL_SC_CALL(op) \
- : "+r" (w0), "+Q" (v->counter) \
- : "r" (x1) \
- : "x30"); \
-} \
-
-#define ATOMIC_OP_RETURN(op, asm_op) \
-static inline int atomic_##op##_return(int i, atomic_t *v) \
-{ \
- register int w0 asm ("w0") = i; \
- register atomic_t *x1 asm ("x1") = v; \
- \
- asm volatile( \
- __LL_SC_CALL(op##_return) \
- : "+r" (w0) \
- : "r" (x1) \
- : "x30", "memory"); \
- \
- return w0; \
+#define __LL_SC_ATOMIC(op) __LL_SC_CALL(atomic_##op)
+
+static inline void atomic_add(int i, atomic_t *v)
+{
+ register int w0 asm ("w0") = i;
+ register atomic_t *x1 asm ("x1") = v;
+
+ asm volatile(ARM64_LSE_ATOMIC_INSN(__LL_SC_ATOMIC(add),
+ " stadd %w[i], %[v]\n")
+ : [i] "+r" (w0), [v] "+Q" (v->counter)
+ : "r" (x1)
+ : "x30");
}
-#define ATOMIC_OPS(op, asm_op) \
- ATOMIC_OP(op, asm_op) \
- ATOMIC_OP_RETURN(op, asm_op)
+static inline int atomic_add_return(int i, atomic_t *v)
+{
+ register int w0 asm ("w0") = i;
+ register atomic_t *x1 asm ("x1") = v;
+
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
+ " nop\n"
+ __LL_SC_ATOMIC(add_return),
+ /* LSE atomics */
+ " ldaddal %w[i], w30, %[v]\n"
+ " add %w[i], %w[i], w30")
+ : [i] "+r" (w0), [v] "+Q" (v->counter)
+ : "r" (x1)
+ : "x30", "memory");
+
+ return w0;
+}
-ATOMIC_OPS(add, add)
-ATOMIC_OPS(sub, sub)
+static inline void atomic_sub(int i, atomic_t *v)
+{
+ register int w0 asm ("w0") = i;
+ register atomic_t *x1 asm ("x1") = v;
+
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
+ " nop\n"
+ __LL_SC_ATOMIC(sub),
+ /* LSE atomics */
+ " neg %w[i], %w[i]\n"
+ " stadd %w[i], %[v]")
+ : [i] "+r" (w0), [v] "+Q" (v->counter)
+ : "r" (x1)
+ : "x30");
+}
-#undef ATOMIC_OPS
-#undef ATOMIC_OP_RETURN
-#undef ATOMIC_OP
+static inline int atomic_sub_return(int i, atomic_t *v)
+{
+ register int w0 asm ("w0") = i;
+ register atomic_t *x1 asm ("x1") = v;
+
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
+ " nop\n"
+ __LL_SC_ATOMIC(sub_return)
+ " nop",
+ /* LSE atomics */
+ " neg %w[i], %w[i]\n"
+ " ldaddal %w[i], w30, %[v]\n"
+ " add %w[i], %w[i], w30")
+ : [i] "+r" (w0), [v] "+Q" (v->counter)
+ : "r" (x1)
+ : "x30", "memory");
+
+ return w0;
+}
static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
{
@@ -81,64 +102,111 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
register int w1 asm ("w1") = old;
register int w2 asm ("w2") = new;
- asm volatile(
- __LL_SC_CALL(cmpxchg)
- : "+r" (x0)
- : "r" (w1), "r" (w2)
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
+ " nop\n"
+ __LL_SC_ATOMIC(cmpxchg)
+ " nop",
+ /* LSE atomics */
+ " mov w30, %w[old]\n"
+ " casal w30, %w[new], %[v]\n"
+ " mov %w[ret], w30")
+ : [ret] "+r" (x0), [v] "+Q" (ptr->counter)
+ : [old] "r" (w1), [new] "r" (w2)
: "x30", "cc", "memory");
return x0;
}
-#define ATOMIC64_OP(op, asm_op) \
-static inline void atomic64_##op(long i, atomic64_t *v) \
-{ \
- register long x0 asm ("x0") = i; \
- register atomic64_t *x1 asm ("x1") = v; \
- \
- asm volatile( \
- __LL_SC_CALL64(op) \
- : "+r" (x0), "+Q" (v->counter) \
- : "r" (x1) \
- : "x30"); \
-} \
-
-#define ATOMIC64_OP_RETURN(op, asm_op) \
-static inline long atomic64_##op##_return(long i, atomic64_t *v) \
-{ \
- register long x0 asm ("x0") = i; \
- register atomic64_t *x1 asm ("x1") = v; \
- \
- asm volatile( \
- __LL_SC_CALL64(op##_return) \
- : "+r" (x0) \
- : "r" (x1) \
- : "x30", "memory"); \
- \
- return x0; \
+#undef __LL_SC_ATOMIC
+
+#define __LL_SC_ATOMIC64(op) __LL_SC_CALL(atomic64_##op)
+
+static inline void atomic64_add(long i, atomic64_t *v)
+{
+ register long x0 asm ("x0") = i;
+ register atomic64_t *x1 asm ("x1") = v;
+
+ asm volatile(ARM64_LSE_ATOMIC_INSN(__LL_SC_ATOMIC64(add),
+ " stadd %[i], %[v]\n")
+ : [i] "+r" (x0), [v] "+Q" (v->counter)
+ : "r" (x1)
+ : "x30");
}
-#define ATOMIC64_OPS(op, asm_op) \
- ATOMIC64_OP(op, asm_op) \
- ATOMIC64_OP_RETURN(op, asm_op)
+static inline long atomic64_add_return(long i, atomic64_t *v)
+{
+ register long x0 asm ("x0") = i;
+ register atomic64_t *x1 asm ("x1") = v;
+
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
+ " nop\n"
+ __LL_SC_ATOMIC64(add_return),
+ /* LSE atomics */
+ " ldaddal %[i], x30, %[v]\n"
+ " add %[i], %[i], x30")
+ : [i] "+r" (x0), [v] "+Q" (v->counter)
+ : "r" (x1)
+ : "x30", "memory");
-ATOMIC64_OPS(add, add)
-ATOMIC64_OPS(sub, sub)
+ return x0;
+}
-#undef ATOMIC64_OPS
-#undef ATOMIC64_OP_RETURN
-#undef ATOMIC64_OP
+static inline void atomic64_sub(long i, atomic64_t *v)
+{
+ register long x0 asm ("x0") = i;
+ register atomic64_t *x1 asm ("x1") = v;
+
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
+ " nop\n"
+ __LL_SC_ATOMIC64(sub),
+ /* LSE atomics */
+ " neg %[i], %[i]\n"
+ " stadd %[i], %[v]\n")
+ : [i] "+r" (x0), [v] "+Q" (v->counter)
+ : "r" (x1)
+ : "x30");
+}
+static inline long atomic64_sub_return(long i, atomic64_t *v)
+{
+ register long x0 asm ("x0") = i;
+ register atomic64_t *x1 asm ("x1") = v;
+
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
+ " nop\n"
+ __LL_SC_ATOMIC64(sub_return)
+ " nop",
+ /* LSE atomics */
+ " neg %[i], %[i]\n"
+ " ldaddal %[i], x30, %[v]\n"
+ " add %[i], %[i], x30")
+ : [i] "+r" (x0), [v] "+Q" (v->counter)
+ : "r" (x1)
+ : "x30", "memory");
+
+ return x0;
+}
static inline long atomic64_cmpxchg(atomic64_t *ptr, long old, long new)
{
register unsigned long x0 asm ("x0") = (unsigned long)ptr;
register long x1 asm ("x1") = old;
register long x2 asm ("x2") = new;
- asm volatile(
- __LL_SC_CALL64(cmpxchg)
- : "+r" (x0)
- : "r" (x1), "r" (x2)
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
+ " nop\n"
+ __LL_SC_ATOMIC64(cmpxchg)
+ " nop",
+ /* LSE atomics */
+ " mov x30, %[old]\n"
+ " casal x30, %[new], %[v]\n"
+ " mov %[ret], x30")
+ : [ret] "+r" (x0), [v] "+Q" (ptr->counter)
+ : [old] "r" (x1), [new] "r" (x2)
: "x30", "cc", "memory");
return x0;
@@ -146,15 +214,33 @@ static inline long atomic64_cmpxchg(atomic64_t *ptr, long old, long new)
static inline long atomic64_dec_if_positive(atomic64_t *v)
{
- register unsigned long x0 asm ("x0") = (unsigned long)v;
-
- asm volatile(
- __LL_SC_CALL64(dec_if_positive)
- : "+r" (x0)
+ register long x0 asm ("x0") = (long)v;
+
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
+ " nop\n"
+ __LL_SC_ATOMIC64(dec_if_positive)
+ " nop\n"
+ " nop\n"
+ " nop\n"
+ " nop\n"
+ " nop",
+ /* LSE atomics */
+ "1: ldr x30, %[v]\n"
+ " subs %[ret], x30, #1\n"
+ " b.mi 2f\n"
+ " casal x30, %[ret], %[v]\n"
+ " sub x30, x30, #1\n"
+ " sub x30, x30, %[ret]\n"
+ " cbnz x30, 1b\n"
+ "2:")
+ : [ret] "+&r" (x0), [v] "+Q" (v->counter)
:
: "x30", "cc", "memory");
return x0;
}
+#undef __LL_SC_ATOMIC64
+
#endif /* __ASM_ATOMIC_LSE_H */
diff --git a/arch/arm64/include/asm/lse.h b/arch/arm64/include/asm/lse.h
new file mode 100644
index 000000000000..d516624a461e
--- /dev/null
+++ b/arch/arm64/include/asm/lse.h
@@ -0,0 +1,34 @@
+#ifndef __ASM_LSE_H
+#define __ASM_LSE_H
+
+#if defined(CONFIG_AS_LSE) && defined(CONFIG_ARM64_LSE_ATOMICS)
+
+#include <linux/stringify.h>
+
+#include <asm/alternative.h>
+#include <asm/cpufeature.h>
+
+__asm__(".arch_extension lse");
+
+/* Move the ll/sc atomics out-of-line */
+#define __LL_SC_INLINE
+#define __LL_SC_PREFIX(x) __ll_sc_##x
+#define __LL_SC_EXPORT(x) EXPORT_SYMBOL(__LL_SC_PREFIX(x))
+
+/* Macro for constructing calls to out-of-line ll/sc atomics */
+#define __LL_SC_CALL(op) "bl\t" __stringify(__LL_SC_PREFIX(op)) "\n"
+
+/* In-line patching at runtime */
+#define ARM64_LSE_ATOMIC_INSN(llsc, lse) \
+ ALTERNATIVE(llsc, lse, ARM64_CPU_FEAT_LSE_ATOMICS)
+
+#else
+
+#define __LL_SC_INLINE static inline
+#define __LL_SC_PREFIX(x) x
+#define __LL_SC_EXPORT(x)
+
+#define ARM64_LSE_ATOMIC_INSN(llsc, lse) llsc
+
+#endif /* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */
+#endif /* __ASM_LSE_H */
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 5b170df96aaf..930a353b868c 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -287,6 +287,9 @@ static void __init setup_processor(void)
case 2:
elf_hwcap |= HWCAP_ATOMICS;
cpus_set_cap(ARM64_CPU_FEAT_LSE_ATOMICS);
+ if (IS_ENABLED(CONFIG_AS_LSE) &&
+ IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS))
+ pr_info("LSE atomics supported\n");
case 1:
/* RESERVED */
case 0:
--
2.1.4
* [PATCH v2 08/20] arm64: atomics: patch in lse instructions when supported by the CPU
2015-07-24 10:41 ` [PATCH v2 08/20] arm64: atomics: patch in lse instructions when supported by the CPU Will Deacon
@ 2015-07-24 14:43 ` Catalin Marinas
0 siblings, 0 replies; 44+ messages in thread
From: Catalin Marinas @ 2015-07-24 14:43 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jul 24, 2015 at 11:41:59AM +0100, Will Deacon wrote:
> On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
> it makes sense to use them in preference to ll/sc sequences.
>
> This patch introduces runtime patching of atomic_t and atomic64_t
> routines so that the call-site for the out-of-line ll/sc sequences is
> patched with an LSE atomic instruction when we detect that
> the CPU supports it.
>
> If binutils is not recent enough to assemble the LSE instructions, then
> the ll/sc sequences are inlined as though CONFIG_ARM64_LSE_ATOMICS=n.
>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
* [PATCH v2 09/20] arm64: locks: patch in lse instructions when supported by the CPU
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (7 preceding siblings ...)
2015-07-24 10:41 ` [PATCH v2 08/20] arm64: atomics: patch in lse instructions when supported by the CPU Will Deacon
@ 2015-07-24 10:42 ` Will Deacon
2015-07-24 15:08 ` Catalin Marinas
2015-07-24 10:42 ` [PATCH v2 10/20] arm64: bitops: " Will Deacon
` (10 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:42 UTC (permalink / raw)
To: linux-arm-kernel
On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.
This patch introduces runtime patching of our locking functions so that
LSE atomic instructions are used for spinlocks and rwlocks.
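For the spin_lock() fast path, the LSE sequence below boils down to taking a
ticket with a single atomic add (sketch only; the lock word packs an owner
half and a next-ticket half, 16 bits (TICKET_SHIFT) apart):

  /*
   *  mov     w2, #(1 << TICKET_SHIFT)  // increment for the 'next' half
   *  ldadda  w2, w0, [lock]            // atomically take a ticket; w0 = old value
   *  eor     w1, w0, w0, ror #16       // zero iff the lock was free (owner == next)
   *  cbz     w1, 3f                    // uncontended: we now own the lock
   *  ...                               // otherwise spin until owner reaches our ticket
   */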
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/spinlock.h | 137 ++++++++++++++++++++++++++++++--------
1 file changed, 108 insertions(+), 29 deletions(-)
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index 0f08ba5cfb33..87ae7efa1211 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -16,6 +16,7 @@
#ifndef __ASM_SPINLOCK_H
#define __ASM_SPINLOCK_H
+#include <asm/lse.h>
#include <asm/spinlock_types.h>
#include <asm/processor.h>
@@ -38,11 +39,21 @@ static inline void arch_spin_lock(arch_spinlock_t *lock)
asm volatile(
/* Atomically increment the next ticket. */
+ ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
" prfm pstl1strm, %3\n"
"1: ldaxr %w0, %3\n"
" add %w1, %w0, %w5\n"
" stxr %w2, %w1, %3\n"
-" cbnz %w2, 1b\n"
+" cbnz %w2, 1b\n",
+ /* LSE atomics */
+" mov %w2, %w5\n"
+" ldadda %w2, %w0, %3\n"
+" nop\n"
+" nop\n"
+" nop\n"
+ )
+
/* Did we get the lock? */
" eor %w1, %w0, %w0, ror #16\n"
" cbz %w1, 3f\n"
@@ -67,15 +78,25 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
unsigned int tmp;
arch_spinlock_t lockval;
- asm volatile(
-" prfm pstl1strm, %2\n"
-"1: ldaxr %w0, %2\n"
-" eor %w1, %w0, %w0, ror #16\n"
-" cbnz %w1, 2f\n"
-" add %w0, %w0, %3\n"
-" stxr %w1, %w0, %2\n"
-" cbnz %w1, 1b\n"
-"2:"
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
+ " prfm pstl1strm, %2\n"
+ "1: ldaxr %w0, %2\n"
+ " eor %w1, %w0, %w0, ror #16\n"
+ " cbnz %w1, 2f\n"
+ " add %w0, %w0, %3\n"
+ " stxr %w1, %w0, %2\n"
+ " cbnz %w1, 1b\n"
+ "2:",
+ /* LSE atomics */
+ " ldr %w0, %2\n"
+ " eor %w1, %w0, %w0, ror #16\n"
+ " cbnz %w1, 1f\n"
+ " add %w1, %w0, %3\n"
+ " casa %w0, %w1, %2\n"
+ " and %w1, %w1, #0xffff\n"
+ " eor %w1, %w1, %w0, lsr #16\n"
+ "1:")
: "=&r" (lockval), "=&r" (tmp), "+Q" (*lock)
: "I" (1 << TICKET_SHIFT)
: "memory");
@@ -85,10 +106,19 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
static inline void arch_spin_unlock(arch_spinlock_t *lock)
{
- asm volatile(
-" stlrh %w1, %0\n"
- : "=Q" (lock->owner)
- : "r" (lock->owner + 1)
+ unsigned long tmp;
+
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
+ " ldr %w1, %0\n"
+ " add %w1, %w1, #1\n"
+ " stlrh %w1, %0",
+ /* LSE atomics */
+ " mov %w1, #1\n"
+ " nop\n"
+ " staddlh %w1, %0")
+ : "=Q" (lock->owner), "=&r" (tmp)
+ :
: "memory");
}
@@ -123,13 +153,24 @@ static inline void arch_write_lock(arch_rwlock_t *rw)
{
unsigned int tmp;
- asm volatile(
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
" sevl\n"
"1: wfe\n"
"2: ldaxr %w0, %1\n"
" cbnz %w0, 1b\n"
" stxr %w0, %w2, %1\n"
" cbnz %w0, 2b\n"
+ " nop",
+ /* LSE atomics */
+ "1: mov %w0, wzr\n"
+ "2: casa %w0, %w2, %1\n"
+ " cbz %w0, 3f\n"
+ " ldxr %w0, %1\n"
+ " cbz %w0, 2b\n"
+ " wfe\n"
+ " b 1b\n"
+ "3:")
: "=&r" (tmp), "+Q" (rw->lock)
: "r" (0x80000000)
: "memory");
@@ -139,12 +180,18 @@ static inline int arch_write_trylock(arch_rwlock_t *rw)
{
unsigned int tmp;
- asm volatile(
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
"1: ldaxr %w0, %1\n"
" cbnz %w0, 2f\n"
" stxr %w0, %w2, %1\n"
" cbnz %w0, 1b\n"
- "2:\n"
+ "2:",
+ /* LSE atomics */
+ " mov %w0, wzr\n"
+ " casa %w0, %w2, %1\n"
+ " nop\n"
+ " nop")
: "=&r" (tmp), "+Q" (rw->lock)
: "r" (0x80000000)
: "memory");
@@ -154,9 +201,10 @@ static inline int arch_write_trylock(arch_rwlock_t *rw)
static inline void arch_write_unlock(arch_rwlock_t *rw)
{
- asm volatile(
- " stlr %w1, %0\n"
- : "=Q" (rw->lock) : "r" (0) : "memory");
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ " stlr wzr, %0",
+ " swpl wzr, wzr, %0")
+ : "=Q" (rw->lock) :: "memory");
}
/* write_can_lock - would write_trylock() succeed? */
@@ -173,6 +221,10 @@ static inline void arch_write_unlock(arch_rwlock_t *rw)
*
* The memory barriers are implicit with the load-acquire and store-release
* instructions.
+ *
+ * Note that in UNDEFINED cases, such as unlocking a lock twice, the LL/SC
+ * and LSE implementations may exhibit different behaviour (although this
+ * will have no effect on lockdep).
*/
static inline void arch_read_lock(arch_rwlock_t *rw)
{
@@ -180,26 +232,43 @@ static inline void arch_read_lock(arch_rwlock_t *rw)
asm volatile(
" sevl\n"
+ ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
"1: wfe\n"
"2: ldaxr %w0, %2\n"
" add %w0, %w0, #1\n"
" tbnz %w0, #31, 1b\n"
" stxr %w1, %w0, %2\n"
- " cbnz %w1, 2b\n"
+ " nop\n"
+ " cbnz %w1, 2b",
+ /* LSE atomics */
+ "1: wfe\n"
+ "2: ldxr %w0, %2\n"
+ " adds %w1, %w0, #1\n"
+ " tbnz %w1, #31, 1b\n"
+ " casa %w0, %w1, %2\n"
+ " sbc %w0, %w1, %w0\n"
+ " cbnz %w0, 2b")
: "=&r" (tmp), "=&r" (tmp2), "+Q" (rw->lock)
:
- : "memory");
+ : "cc", "memory");
}
static inline void arch_read_unlock(arch_rwlock_t *rw)
{
unsigned int tmp, tmp2;
- asm volatile(
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
"1: ldxr %w0, %2\n"
" sub %w0, %w0, #1\n"
" stlxr %w1, %w0, %2\n"
- " cbnz %w1, 1b\n"
+ " cbnz %w1, 1b",
+ /* LSE atomics */
+ " movn %w0, #0\n"
+ " nop\n"
+ " nop\n"
+ " staddl %w0, %2")
: "=&r" (tmp), "=&r" (tmp2), "+Q" (rw->lock)
:
: "memory");
@@ -207,18 +276,28 @@ static inline void arch_read_unlock(arch_rwlock_t *rw)
static inline int arch_read_trylock(arch_rwlock_t *rw)
{
- unsigned int tmp, tmp2 = 1;
+ unsigned int tmp, tmp2;
- asm volatile(
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
+ " mov %w1, #1\n"
"1: ldaxr %w0, %2\n"
" add %w0, %w0, #1\n"
" tbnz %w0, #31, 2f\n"
" stxr %w1, %w0, %2\n"
" cbnz %w1, 1b\n"
- "2:\n"
- : "=&r" (tmp), "+r" (tmp2), "+Q" (rw->lock)
+ "2:",
+ /* LSE atomics */
+ " ldr %w0, %2\n"
+ " adds %w1, %w0, #1\n"
+ " tbnz %w1, #31, 1f\n"
+ " casa %w0, %w1, %2\n"
+ " sbc %w1, %w1, %w0\n"
+ " nop\n"
+ "1:")
+ : "=&r" (tmp), "=&r" (tmp2), "+Q" (rw->lock)
:
- : "memory");
+ : "cc", "memory");
return !tmp2;
}
--
2.1.4
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH v2 09/20] arm64: locks: patch in lse instructions when supported by the CPU
2015-07-24 10:42 ` [PATCH v2 09/20] arm64: locks: " Will Deacon
@ 2015-07-24 15:08 ` Catalin Marinas
0 siblings, 0 replies; 44+ messages in thread
From: Catalin Marinas @ 2015-07-24 15:08 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jul 24, 2015 at 11:42:00AM +0100, Will Deacon wrote:
> On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
> it makes sense to use them in preference to ll/sc sequences.
>
> This patch introduces runtime patching of our locking functions so that
> LSE atomic instructions are used for spinlocks and rwlocks.
>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 10/20] arm64: bitops: patch in lse instructions when supported by the CPU
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (8 preceding siblings ...)
2015-07-24 10:42 ` [PATCH v2 09/20] arm64: locks: " Will Deacon
@ 2015-07-24 10:42 ` Will Deacon
2015-07-24 15:19 ` Catalin Marinas
2015-07-24 10:42 ` [PATCH v2 11/20] arm64: xchg: " Will Deacon
` (9 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:42 UTC (permalink / raw)
To: linux-arm-kernel
On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.
This patch introduces runtime patching of our bitops functions so that
LSE atomic instructions are used instead.
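As a rough guide to what the patched assembly does, the atomic bitops map
onto single LSE instructions in the way the made-up C helpers below sketch
with compiler builtins; bit numbering follows the word-offset/bit-offset
split used by bitops.S:

#include <stdint.h>

static inline void sketch_set_bit(unsigned int nr, uint64_t *addr)
{
	uint64_t mask = 1ULL << (nr % 64);

	/* set_bit has no ordering requirement; the LSE path is one STSET. */
	__atomic_fetch_or(&addr[nr / 64], mask, __ATOMIC_RELAXED);
}

static inline int sketch_test_and_set_bit(unsigned int nr, uint64_t *addr)
{
	uint64_t mask = 1ULL << (nr % 64);

	/* test_and_set_bit is fully ordered; the LSE path is one LDSETAL,
	 * which also hands back the old word so the bit can be tested. */
	uint64_t old = __atomic_fetch_or(&addr[nr / 64], mask,
					 __ATOMIC_SEQ_CST);
	return (old & mask) != 0;
}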
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/lse.h | 23 +++++++++++++++++++++--
arch/arm64/lib/bitops.S | 43 ++++++++++++++++++++++++-------------------
2 files changed, 45 insertions(+), 21 deletions(-)
diff --git a/arch/arm64/include/asm/lse.h b/arch/arm64/include/asm/lse.h
index d516624a461e..fb3ac56a2cc0 100644
--- a/arch/arm64/include/asm/lse.h
+++ b/arch/arm64/include/asm/lse.h
@@ -4,10 +4,19 @@
#if defined(CONFIG_AS_LSE) && defined(CONFIG_ARM64_LSE_ATOMICS)
#include <linux/stringify.h>
-
#include <asm/alternative.h>
#include <asm/cpufeature.h>
+#ifdef __ASSEMBLER__
+
+.arch_extension lse
+
+.macro alt_lse, llsc, lse
+ alternative_insn "\llsc", "\lse", ARM64_CPU_FEAT_LSE_ATOMICS
+.endm
+
+#else /* __ASSEMBLER__ */
+
__asm__(".arch_extension lse");
/* Move the ll/sc atomics out-of-line */
@@ -22,7 +31,16 @@ __asm__(".arch_extension lse");
#define ARM64_LSE_ATOMIC_INSN(llsc, lse) \
ALTERNATIVE(llsc, lse, ARM64_CPU_FEAT_LSE_ATOMICS)
-#else
+#endif /* __ASSEMBLER__ */
+#else /* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */
+
+#ifdef __ASSEMBLER__
+
+.macro alt_lse, llsc, lse
+ \llsc
+.endm
+
+#else /* __ASSEMBLER__ */
#define __LL_SC_INLINE static inline
#define __LL_SC_PREFIX(x) x
@@ -30,5 +48,6 @@ __asm__(".arch_extension lse");
#define ARM64_LSE_ATOMIC_INSN(llsc, lse) llsc
+#endif /* __ASSEMBLER__ */
#endif /* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */
#endif /* __ASM_LSE_H */
diff --git a/arch/arm64/lib/bitops.S b/arch/arm64/lib/bitops.S
index 7dac371cc9a2..bc18457c2bba 100644
--- a/arch/arm64/lib/bitops.S
+++ b/arch/arm64/lib/bitops.S
@@ -18,52 +18,57 @@
#include <linux/linkage.h>
#include <asm/assembler.h>
+#include <asm/lse.h>
/*
* x0: bits 5:0 bit offset
* bits 31:6 word offset
* x1: address
*/
- .macro bitop, name, instr
+ .macro bitop, name, llsc, lse
ENTRY( \name )
and w3, w0, #63 // Get bit offset
eor w0, w0, w3 // Clear low bits
mov x2, #1
add x1, x1, x0, lsr #3 // Get word offset
lsl x3, x2, x3 // Create mask
-1: ldxr x2, [x1]
- \instr x2, x2, x3
- stxr w0, x2, [x1]
- cbnz w0, 1b
+
+alt_lse "1: ldxr x2, [x1]", "\lse x3, [x1]"
+alt_lse " \llsc x2, x2, x3", "nop"
+alt_lse " stxr w0, x2, [x1]", "nop"
+alt_lse " cbnz w0, 1b", "nop"
+
ret
ENDPROC(\name )
.endm
- .macro testop, name, instr
+ .macro testop, name, llsc, lse
ENTRY( \name )
and w3, w0, #63 // Get bit offset
eor w0, w0, w3 // Clear low bits
mov x2, #1
add x1, x1, x0, lsr #3 // Get word offset
lsl x4, x2, x3 // Create mask
-1: ldxr x2, [x1]
- lsr x0, x2, x3 // Save old value of bit
- \instr x2, x2, x4 // toggle bit
- stlxr w5, x2, [x1]
- cbnz w5, 1b
- dmb ish
+
+alt_lse "1: ldxr x2, [x1]", "\lse x4, x2, [x1]"
+ lsr x0, x2, x3
+alt_lse " \llsc x2, x2, x4", "nop"
+alt_lse " stlxr w5, x2, [x1]", "nop"
+alt_lse " cbnz w5, 1b", "nop"
+alt_lse " dmb ish", "nop"
+
and x0, x0, #1
-3: ret
+ ret
ENDPROC(\name )
.endm
/*
* Atomic bit operations.
*/
- bitop change_bit, eor
- bitop clear_bit, bic
- bitop set_bit, orr
+ bitop change_bit, eor, steor
+ bitop clear_bit, bic, stclr
+ bitop set_bit, orr, stset
- testop test_and_change_bit, eor
- testop test_and_clear_bit, bic
- testop test_and_set_bit, orr
+ testop test_and_change_bit, eor, ldeoral
+ testop test_and_clear_bit, bic, ldclral
+ testop test_and_set_bit, orr, ldsetal
--
2.1.4
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH v2 10/20] arm64: bitops: patch in lse instructions when supported by the CPU
2015-07-24 10:42 ` [PATCH v2 10/20] arm64: bitops: " Will Deacon
@ 2015-07-24 15:19 ` Catalin Marinas
0 siblings, 0 replies; 44+ messages in thread
From: Catalin Marinas @ 2015-07-24 15:19 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jul 24, 2015 at 11:42:01AM +0100, Will Deacon wrote:
> On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
> it makes sense to use them in preference to ll/sc sequences.
>
> This patch introduces runtime patching of our bitops functions so that
> LSE atomic instructions are used instead.
>
> Reviewed-by: Steve Capper <steve.capper@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 11/20] arm64: xchg: patch in lse instructions when supported by the CPU
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (9 preceding siblings ...)
2015-07-24 10:42 ` [PATCH v2 10/20] arm64: bitops: " Will Deacon
@ 2015-07-24 10:42 ` Will Deacon
2015-07-24 15:19 ` Catalin Marinas
2015-07-24 10:42 ` [PATCH v2 12/20] arm64: cmpxchg: " Will Deacon
` (8 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:42 UTC (permalink / raw)
To: linux-arm-kernel
On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.
This patch introduces runtime patching of our xchg primitives so that
the LSE swp instruction (yes, you read right!) is used instead.
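In generic terms the swpal behaviour is just a fully ordered exchange; a
minimal sketch of the contract __xchg() offers callers, written with a
compiler builtin rather than the patched assembly (sketch_xchg is a made-up
name):

static inline unsigned long sketch_xchg(unsigned long *ptr, unsigned long val)
{
	/* One fully ordered atomic exchange, i.e. what a single SWPAL does. */
	return __atomic_exchange_n(ptr, val, __ATOMIC_SEQ_CST);
}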
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/cmpxchg.h | 38 +++++++++++++++++++++++++++++++++-----
1 file changed, 33 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index d8c25b7b18fb..d0cce8068902 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -22,6 +22,7 @@
#include <linux/mmdebug.h>
#include <asm/barrier.h>
+#include <asm/lse.h>
static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
{
@@ -29,37 +30,65 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
switch (size) {
case 1:
- asm volatile("// __xchg1\n"
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
"1: ldxrb %w0, %2\n"
" stlxrb %w1, %w3, %2\n"
" cbnz %w1, 1b\n"
+ " dmb ish",
+ /* LSE atomics */
+ " nop\n"
+ " swpalb %w3, %w0, %2\n"
+ " nop\n"
+ " nop")
: "=&r" (ret), "=&r" (tmp), "+Q" (*(u8 *)ptr)
: "r" (x)
: "memory");
break;
case 2:
- asm volatile("// __xchg2\n"
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
"1: ldxrh %w0, %2\n"
" stlxrh %w1, %w3, %2\n"
" cbnz %w1, 1b\n"
+ " dmb ish",
+ /* LSE atomics */
+ " nop\n"
+ " swpalh %w3, %w0, %2\n"
+ " nop\n"
+ " nop")
: "=&r" (ret), "=&r" (tmp), "+Q" (*(u16 *)ptr)
: "r" (x)
: "memory");
break;
case 4:
- asm volatile("// __xchg4\n"
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
"1: ldxr %w0, %2\n"
" stlxr %w1, %w3, %2\n"
" cbnz %w1, 1b\n"
+ " dmb ish",
+ /* LSE atomics */
+ " nop\n"
+ " swpal %w3, %w0, %2\n"
+ " nop\n"
+ " nop")
: "=&r" (ret), "=&r" (tmp), "+Q" (*(u32 *)ptr)
: "r" (x)
: "memory");
break;
case 8:
- asm volatile("// __xchg8\n"
+ asm volatile(ARM64_LSE_ATOMIC_INSN(
+ /* LL/SC */
"1: ldxr %0, %2\n"
" stlxr %w1, %3, %2\n"
" cbnz %w1, 1b\n"
+ " dmb ish",
+ /* LSE atomics */
+ " nop\n"
+ " swpal %3, %0, %2\n"
+ " nop\n"
+ " nop")
: "=&r" (ret), "=&r" (tmp), "+Q" (*(u64 *)ptr)
: "r" (x)
: "memory");
@@ -68,7 +97,6 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
BUILD_BUG();
}
- smp_mb();
return ret;
}
--
2.1.4
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH v2 11/20] arm64: xchg: patch in lse instructions when supported by the CPU
2015-07-24 10:42 ` [PATCH v2 11/20] arm64: xchg: " Will Deacon
@ 2015-07-24 15:19 ` Catalin Marinas
0 siblings, 0 replies; 44+ messages in thread
From: Catalin Marinas @ 2015-07-24 15:19 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jul 24, 2015 at 11:42:02AM +0100, Will Deacon wrote:
> On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
> it makes sense to use them in preference to ll/sc sequences.
>
> This patch introduces runtime patching of our xchg primitives so that
> the LSE swp instruction (yes, you read right!) is used instead.
>
> Reviewed-by: Steve Capper <steve.capper@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 12/20] arm64: cmpxchg: patch in lse instructions when supported by the CPU
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (10 preceding siblings ...)
2015-07-24 10:42 ` [PATCH v2 11/20] arm64: xchg: " Will Deacon
@ 2015-07-24 10:42 ` Will Deacon
2015-07-24 15:21 ` Catalin Marinas
2015-07-24 10:42 ` [PATCH v2 13/20] arm64: cmpxchg_dbl: " Will Deacon
` (7 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:42 UTC (permalink / raw)
To: linux-arm-kernel
On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.
This patch introduces runtime patching of our cmpxchg primitives so that
the LSE cas instruction is used instead.
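For reference, the contract of the __cmpxchg_case_* helpers (compare,
conditionally install, always return the value that was seen) can be
sketched with a compiler builtin; sketch_cmpxchg64 below is a made-up
illustration, not the kernel routine:

static inline unsigned long sketch_cmpxchg64(unsigned long *ptr,
					     unsigned long old,
					     unsigned long new)
{
	unsigned long seen = old;

	/* On success *ptr becomes new; either way "seen" ends up holding
	 * the value that was observed in *ptr, which is what we return. */
	__atomic_compare_exchange_n(ptr, &seen, new, 0,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return seen;
}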
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/atomic.h | 3 +-
arch/arm64/include/asm/atomic_ll_sc.h | 38 ++++++++++++++++
arch/arm64/include/asm/atomic_lse.h | 39 ++++++++++++++++
arch/arm64/include/asm/cmpxchg.h | 84 ++++++++---------------------------
4 files changed, 98 insertions(+), 66 deletions(-)
diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
index cb53efa23f62..ee32776d926c 100644
--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -24,7 +24,6 @@
#include <linux/types.h>
#include <asm/barrier.h>
-#include <asm/cmpxchg.h>
#include <asm/lse.h>
#define ATOMIC_INIT(i) { (i) }
@@ -41,6 +40,8 @@
#undef __ARM64_IN_ATOMIC_IMPL
+#include <asm/cmpxchg.h>
+
/*
* On ARM, ordinary assignment (str instruction) doesn't clear the local
* strex/ldrex monitor on some implementations. The reason we can use it for
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index 9cf298914ac3..b4298f9a898f 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -205,4 +205,42 @@ __LL_SC_PREFIX(atomic64_dec_if_positive(atomic64_t *v))
}
__LL_SC_EXPORT(atomic64_dec_if_positive);
+#define __CMPXCHG_CASE(w, sz, name, mb, cl) \
+__LL_SC_INLINE unsigned long \
+__LL_SC_PREFIX(__cmpxchg_case_##name(volatile void *ptr, \
+ unsigned long old, \
+ unsigned long new)) \
+{ \
+ unsigned long tmp, oldval; \
+ \
+ asm volatile( \
+ " " #mb "\n" \
+ "1: ldxr" #sz "\t%" #w "[oldval], %[v]\n" \
+ " eor %" #w "[tmp], %" #w "[oldval], %" #w "[old]\n" \
+ " cbnz %" #w "[tmp], 2f\n" \
+ " stxr" #sz "\t%w[tmp], %" #w "[new], %[v]\n" \
+ " cbnz %w[tmp], 1b\n" \
+ " " #mb "\n" \
+ " mov %" #w "[oldval], %" #w "[old]\n" \
+ "2:" \
+ : [tmp] "=&r" (tmp), [oldval] "=&r" (oldval), \
+ [v] "+Q" (*(unsigned long *)ptr) \
+ : [old] "Lr" (old), [new] "r" (new) \
+ : cl); \
+ \
+ return oldval; \
+} \
+__LL_SC_EXPORT(__cmpxchg_case_##name);
+
+__CMPXCHG_CASE(w, b, 1, , )
+__CMPXCHG_CASE(w, h, 2, , )
+__CMPXCHG_CASE(w, , 4, , )
+__CMPXCHG_CASE( , , 8, , )
+__CMPXCHG_CASE(w, b, mb_1, dmb ish, "memory")
+__CMPXCHG_CASE(w, h, mb_2, dmb ish, "memory")
+__CMPXCHG_CASE(w, , mb_4, dmb ish, "memory")
+__CMPXCHG_CASE( , , mb_8, dmb ish, "memory")
+
+#undef __CMPXCHG_CASE
+
#endif /* __ASM_ATOMIC_LL_SC_H */
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index e6e544bce331..5136c3ab48e9 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -243,4 +243,43 @@ static inline long atomic64_dec_if_positive(atomic64_t *v)
#undef __LL_SC_ATOMIC64
+#define __LL_SC_CMPXCHG(op) __LL_SC_CALL(__cmpxchg_case_##op)
+
+#define __CMPXCHG_CASE(w, sz, name, mb, cl...) \
+static inline unsigned long __cmpxchg_case_##name(volatile void *ptr, \
+ unsigned long old, \
+ unsigned long new) \
+{ \
+ register unsigned long x0 asm ("x0") = (unsigned long)ptr; \
+ register unsigned long x1 asm ("x1") = old; \
+ register unsigned long x2 asm ("x2") = new; \
+ \
+ asm volatile(ARM64_LSE_ATOMIC_INSN( \
+ /* LL/SC */ \
+ "nop\n" \
+ __LL_SC_CMPXCHG(name) \
+ "nop", \
+ /* LSE atomics */ \
+ " mov " #w "30, %" #w "[old]\n" \
+ " cas" #mb #sz "\t" #w "30, %" #w "[new], %[v]\n" \
+ " mov %" #w "[ret], " #w "30") \
+ : [ret] "+r" (x0), [v] "+Q" (*(unsigned long *)ptr) \
+ : [old] "r" (x1), [new] "r" (x2) \
+ : "x30" , ##cl); \
+ \
+ return x0; \
+}
+
+__CMPXCHG_CASE(w, b, 1, )
+__CMPXCHG_CASE(w, h, 2, )
+__CMPXCHG_CASE(w, , 4, )
+__CMPXCHG_CASE(x, , 8, )
+__CMPXCHG_CASE(w, b, mb_1, al, "memory")
+__CMPXCHG_CASE(w, h, mb_2, al, "memory")
+__CMPXCHG_CASE(w, , mb_4, al, "memory")
+__CMPXCHG_CASE(x, , mb_8, al, "memory")
+
+#undef __LL_SC_CMPXCHG
+#undef __CMPXCHG_CASE
+
#endif /* __ASM_ATOMIC_LSE_H */
diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index d0cce8068902..60a558127cef 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -21,6 +21,7 @@
#include <linux/bug.h>
#include <linux/mmdebug.h>
+#include <asm/atomic.h>
#include <asm/barrier.h>
#include <asm/lse.h>
@@ -111,74 +112,20 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
unsigned long new, int size)
{
- unsigned long oldval = 0, res;
-
switch (size) {
case 1:
- do {
- asm volatile("// __cmpxchg1\n"
- " ldxrb %w1, %2\n"
- " mov %w0, #0\n"
- " cmp %w1, %w3\n"
- " b.ne 1f\n"
- " stxrb %w0, %w4, %2\n"
- "1:\n"
- : "=&r" (res), "=&r" (oldval), "+Q" (*(u8 *)ptr)
- : "Ir" (old), "r" (new)
- : "cc");
- } while (res);
- break;
-
+ return __cmpxchg_case_1(ptr, old, new);
case 2:
- do {
- asm volatile("// __cmpxchg2\n"
- " ldxrh %w1, %2\n"
- " mov %w0, #0\n"
- " cmp %w1, %w3\n"
- " b.ne 1f\n"
- " stxrh %w0, %w4, %2\n"
- "1:\n"
- : "=&r" (res), "=&r" (oldval), "+Q" (*(u16 *)ptr)
- : "Ir" (old), "r" (new)
- : "cc");
- } while (res);
- break;
-
+ return __cmpxchg_case_2(ptr, old, new);
case 4:
- do {
- asm volatile("// __cmpxchg4\n"
- " ldxr %w1, %2\n"
- " mov %w0, #0\n"
- " cmp %w1, %w3\n"
- " b.ne 1f\n"
- " stxr %w0, %w4, %2\n"
- "1:\n"
- : "=&r" (res), "=&r" (oldval), "+Q" (*(u32 *)ptr)
- : "Ir" (old), "r" (new)
- : "cc");
- } while (res);
- break;
-
+ return __cmpxchg_case_4(ptr, old, new);
case 8:
- do {
- asm volatile("// __cmpxchg8\n"
- " ldxr %1, %2\n"
- " mov %w0, #0\n"
- " cmp %1, %3\n"
- " b.ne 1f\n"
- " stxr %w0, %4, %2\n"
- "1:\n"
- : "=&r" (res), "=&r" (oldval), "+Q" (*(u64 *)ptr)
- : "Ir" (old), "r" (new)
- : "cc");
- } while (res);
- break;
-
+ return __cmpxchg_case_8(ptr, old, new);
default:
BUILD_BUG();
}
- return oldval;
+ unreachable();
}
#define system_has_cmpxchg_double() 1
@@ -229,13 +176,20 @@ static inline int __cmpxchg_double_mb(volatile void *ptr1, volatile void *ptr2,
static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
unsigned long new, int size)
{
- unsigned long ret;
-
- smp_mb();
- ret = __cmpxchg(ptr, old, new, size);
- smp_mb();
+ switch (size) {
+ case 1:
+ return __cmpxchg_case_mb_1(ptr, old, new);
+ case 2:
+ return __cmpxchg_case_mb_2(ptr, old, new);
+ case 4:
+ return __cmpxchg_case_mb_4(ptr, old, new);
+ case 8:
+ return __cmpxchg_case_mb_8(ptr, old, new);
+ default:
+ BUILD_BUG();
+ }
- return ret;
+ unreachable();
}
#define cmpxchg(ptr, o, n) \
--
2.1.4
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH v2 12/20] arm64: cmpxchg: patch in lse instructions when supported by the CPU
2015-07-24 10:42 ` [PATCH v2 12/20] arm64: cmpxchg: " Will Deacon
@ 2015-07-24 15:21 ` Catalin Marinas
0 siblings, 0 replies; 44+ messages in thread
From: Catalin Marinas @ 2015-07-24 15:21 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jul 24, 2015 at 11:42:03AM +0100, Will Deacon wrote:
> On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
> it makes sense to use them in preference to ll/sc sequences.
>
> This patch introduces runtime patching of our cmpxchg primitives so that
> the LSE cas instruction is used instead.
>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 13/20] arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (11 preceding siblings ...)
2015-07-24 10:42 ` [PATCH v2 12/20] arm64: cmpxchg: " Will Deacon
@ 2015-07-24 10:42 ` Will Deacon
2015-07-24 15:29 ` Catalin Marinas
2015-07-24 10:42 ` [PATCH v2 14/20] arm64: cmpxchg: avoid "cc" clobber in ll/sc routines Will Deacon
` (6 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:42 UTC (permalink / raw)
To: linux-arm-kernel
On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
it makes sense to use them in preference to ll/sc sequences.
This patch introduces runtime patching of our cmpxchg_double primitives
so that the LSE casp instruction is used instead.
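The operation being patched here treats two adjacent 64-bit words as one
unit; a rough user-space sketch using a 128-bit builtin compare-exchange
(which the compiler may lower to CASP or to a libatomic call, depending on
flags) is below. The helper name and the little-endian, 16-byte-aligned
layout are assumptions for the illustration:

#include <stdbool.h>
#include <stdint.h>

/* p must point at two adjacent, 16-byte-aligned 64-bit words. */
static inline int sketch_cmpxchg_double(uint64_t *p,
					uint64_t old1, uint64_t old2,
					uint64_t new1, uint64_t new2)
{
	unsigned __int128 expected = ((unsigned __int128)old2 << 64) | old1;
	unsigned __int128 desired  = ((unsigned __int128)new2 << 64) | new1;

	/* Returns 1 only if both words matched and both were replaced. */
	return __atomic_compare_exchange_n((unsigned __int128 *)p, &expected,
					   desired, false,
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}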
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/atomic_ll_sc.h | 34 ++++++++++++++++++
arch/arm64/include/asm/atomic_lse.h | 43 ++++++++++++++++++++++
arch/arm64/include/asm/cmpxchg.h | 68 +++++++++--------------------------
3 files changed, 94 insertions(+), 51 deletions(-)
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index b4298f9a898f..77d3aabf52ad 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -243,4 +243,38 @@ __CMPXCHG_CASE( , , mb_8, dmb ish, "memory")
#undef __CMPXCHG_CASE
+#define __CMPXCHG_DBL(name, mb, cl) \
+__LL_SC_INLINE int \
+__LL_SC_PREFIX(__cmpxchg_double##name(unsigned long old1, \
+ unsigned long old2, \
+ unsigned long new1, \
+ unsigned long new2, \
+ volatile void *ptr)) \
+{ \
+ unsigned long tmp, ret; \
+ \
+ asm volatile("// __cmpxchg_double" #name "\n" \
+ " " #mb "\n" \
+ "1: ldxp %0, %1, %2\n" \
+ " eor %0, %0, %3\n" \
+ " eor %1, %1, %4\n" \
+ " orr %1, %0, %1\n" \
+ " cbnz %1, 2f\n" \
+ " stxp %w0, %5, %6, %2\n" \
+ " cbnz %w0, 1b\n" \
+ " " #mb "\n" \
+ "2:" \
+ : "=&r" (tmp), "=&r" (ret), "+Q" (*(unsigned long *)ptr) \
+ : "r" (old1), "r" (old2), "r" (new1), "r" (new2) \
+ : cl); \
+ \
+ return ret; \
+} \
+__LL_SC_EXPORT(__cmpxchg_double##name);
+
+__CMPXCHG_DBL( , , )
+__CMPXCHG_DBL(_mb, dmb ish, "memory")
+
+#undef __CMPXCHG_DBL
+
#endif /* __ASM_ATOMIC_LL_SC_H */
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index 5136c3ab48e9..db41a63691cd 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -282,4 +282,47 @@ __CMPXCHG_CASE(x, , mb_8, al, "memory")
#undef __LL_SC_CMPXCHG
#undef __CMPXCHG_CASE
+#define __LL_SC_CMPXCHG_DBL(op) __LL_SC_CALL(__cmpxchg_double##op)
+
+#define __CMPXCHG_DBL(name, mb, cl...) \
+static inline int __cmpxchg_double##name(unsigned long old1, \
+ unsigned long old2, \
+ unsigned long new1, \
+ unsigned long new2, \
+ volatile void *ptr) \
+{ \
+ unsigned long oldval1 = old1; \
+ unsigned long oldval2 = old2; \
+ register unsigned long x0 asm ("x0") = old1; \
+ register unsigned long x1 asm ("x1") = old2; \
+ register unsigned long x2 asm ("x2") = new1; \
+ register unsigned long x3 asm ("x3") = new2; \
+ register unsigned long x4 asm ("x4") = (unsigned long)ptr; \
+ \
+ asm volatile(ARM64_LSE_ATOMIC_INSN( \
+ /* LL/SC */ \
+ " nop\n" \
+ " nop\n" \
+ " nop\n" \
+ __LL_SC_CMPXCHG_DBL(name), \
+ /* LSE atomics */ \
+ " casp" #mb "\t%[old1], %[old2], %[new1], %[new2], %[v]\n"\
+ " eor %[old1], %[old1], %[oldval1]\n" \
+ " eor %[old2], %[old2], %[oldval2]\n" \
+ " orr %[old1], %[old1], %[old2]") \
+ : [old1] "+r" (x0), [old2] "+r" (x1), \
+ [v] "+Q" (*(unsigned long *)ptr) \
+ : [new1] "r" (x2), [new2] "r" (x3), [ptr] "r" (x4), \
+ [oldval1] "r" (oldval1), [oldval2] "r" (oldval2) \
+ : "x30" , ##cl); \
+ \
+ return x0; \
+}
+
+__CMPXCHG_DBL( , )
+__CMPXCHG_DBL(_mb, al, "memory")
+
+#undef __LL_SC_CMPXCHG_DBL
+#undef __CMPXCHG_DBL
+
#endif /* __ASM_ATOMIC_LSE_H */
diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index 60a558127cef..f70212629d02 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -128,51 +128,6 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
unreachable();
}
-#define system_has_cmpxchg_double() 1
-
-static inline int __cmpxchg_double(volatile void *ptr1, volatile void *ptr2,
- unsigned long old1, unsigned long old2,
- unsigned long new1, unsigned long new2, int size)
-{
- unsigned long loop, lost;
-
- switch (size) {
- case 8:
- VM_BUG_ON((unsigned long *)ptr2 - (unsigned long *)ptr1 != 1);
- do {
- asm volatile("// __cmpxchg_double8\n"
- " ldxp %0, %1, %2\n"
- " eor %0, %0, %3\n"
- " eor %1, %1, %4\n"
- " orr %1, %0, %1\n"
- " mov %w0, #0\n"
- " cbnz %1, 1f\n"
- " stxp %w0, %5, %6, %2\n"
- "1:\n"
- : "=&r"(loop), "=&r"(lost), "+Q" (*(u64 *)ptr1)
- : "r" (old1), "r"(old2), "r"(new1), "r"(new2));
- } while (loop);
- break;
- default:
- BUILD_BUG();
- }
-
- return !lost;
-}
-
-static inline int __cmpxchg_double_mb(volatile void *ptr1, volatile void *ptr2,
- unsigned long old1, unsigned long old2,
- unsigned long new1, unsigned long new2, int size)
-{
- int ret;
-
- smp_mb();
- ret = __cmpxchg_double(ptr1, ptr2, old1, old2, new1, new2, size);
- smp_mb();
-
- return ret;
-}
-
static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
unsigned long new, int size)
{
@@ -210,21 +165,32 @@ static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
__ret; \
})
+#define system_has_cmpxchg_double() 1
+
+#define __cmpxchg_double_check(ptr1, ptr2) \
+({ \
+ if (sizeof(*(ptr1)) != 8) \
+ BUILD_BUG(); \
+ VM_BUG_ON((unsigned long *)(ptr2) - (unsigned long *)(ptr1) != 1); \
+})
+
#define cmpxchg_double(ptr1, ptr2, o1, o2, n1, n2) \
({\
int __ret;\
- __ret = __cmpxchg_double_mb((ptr1), (ptr2), (unsigned long)(o1), \
- (unsigned long)(o2), (unsigned long)(n1), \
- (unsigned long)(n2), sizeof(*(ptr1)));\
+ __cmpxchg_double_check(ptr1, ptr2); \
+ __ret = !__cmpxchg_double_mb((unsigned long)(o1), (unsigned long)(o2), \
+ (unsigned long)(n1), (unsigned long)(n2), \
+ ptr1); \
__ret; \
})
#define cmpxchg_double_local(ptr1, ptr2, o1, o2, n1, n2) \
({\
int __ret;\
- __ret = __cmpxchg_double((ptr1), (ptr2), (unsigned long)(o1), \
- (unsigned long)(o2), (unsigned long)(n1), \
- (unsigned long)(n2), sizeof(*(ptr1)));\
+ __cmpxchg_double_check(ptr1, ptr2); \
+ __ret = !__cmpxchg_double((unsigned long)(o1), (unsigned long)(o2), \
+ (unsigned long)(n1), (unsigned long)(n2), \
+ ptr1); \
__ret; \
})
--
2.1.4
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH v2 13/20] arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU
2015-07-24 10:42 ` [PATCH v2 13/20] arm64: cmpxchg_dbl: " Will Deacon
@ 2015-07-24 15:29 ` Catalin Marinas
0 siblings, 0 replies; 44+ messages in thread
From: Catalin Marinas @ 2015-07-24 15:29 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Jul 24, 2015 at 11:42:04AM +0100, Will Deacon wrote:
> On CPUs which support the LSE atomic instructions introduced in ARMv8.1,
> it makes sense to use them in preference to ll/sc sequences.
>
> This patch introduces runtime patching of our cmpxchg_double primitives
> so that the LSE casp instruction is used instead.
>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 14/20] arm64: cmpxchg: avoid "cc" clobber in ll/sc routines
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (12 preceding siblings ...)
2015-07-24 10:42 ` [PATCH v2 13/20] arm64: cmpxchg_dbl: " Will Deacon
@ 2015-07-24 10:42 ` Will Deacon
2015-07-24 15:30 ` Catalin Marinas
2015-07-24 10:42 ` [PATCH v2 15/20] arm64: cmpxchg: avoid memory barrier on comparison failure Will Deacon
` (5 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:42 UTC (permalink / raw)
To: linux-arm-kernel
We can perform the cmpxchg comparison using eor and cbnz, which avoids
the "cc" clobber for the ll/sc case and consequently for the LSE case,
where we may have to fall back on the ll/sc code at runtime.
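The difference is easiest to see with two tiny made-up inline asm snippets:
the cmp form writes the condition flags and therefore has to list "cc" as
clobbered, whereas the eor form answers the same "are these different?"
question without touching the flags:

static inline int differs_cmp(unsigned long a, unsigned long b)
{
	unsigned long res;

	asm("cmp	%1, %2\n"
	    "cset	%0, ne"		/* flags written, so "cc" is clobbered */
	    : "=r" (res)
	    : "r" (a), "r" (b)
	    : "cc");
	return res;
}

static inline int differs_eor(unsigned long a, unsigned long b)
{
	unsigned long res;

	asm("eor	%0, %1, %2"	/* non-zero iff a != b, flags untouched */
	    : "=r" (res)
	    : "r" (a), "r" (b));
	return res != 0;
}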
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/atomic_ll_sc.h | 14 ++++++--------
arch/arm64/include/asm/atomic_lse.h | 4 ++--
2 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index 77d3aabf52ad..d21091bae901 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -96,14 +96,13 @@ __LL_SC_PREFIX(atomic_cmpxchg(atomic_t *ptr, int old, int new))
asm volatile("// atomic_cmpxchg\n"
"1: ldxr %w1, %2\n"
-" cmp %w1, %w3\n"
-" b.ne 2f\n"
+" eor %w0, %w1, %w3\n"
+" cbnz %w0, 2f\n"
" stxr %w0, %w4, %2\n"
" cbnz %w0, 1b\n"
"2:"
: "=&r" (tmp), "=&r" (oldval), "+Q" (ptr->counter)
- : "Ir" (old), "r" (new)
- : "cc");
+ : "Lr" (old), "r" (new));
smp_mb();
return oldval;
@@ -169,14 +168,13 @@ __LL_SC_PREFIX(atomic64_cmpxchg(atomic64_t *ptr, long old, long new))
asm volatile("// atomic64_cmpxchg\n"
"1: ldxr %1, %2\n"
-" cmp %1, %3\n"
-" b.ne 2f\n"
+" eor %0, %1, %3\n"
+" cbnz %w0, 2f\n"
" stxr %w0, %4, %2\n"
" cbnz %w0, 1b\n"
"2:"
: "=&r" (res), "=&r" (oldval), "+Q" (ptr->counter)
- : "Ir" (old), "r" (new)
- : "cc");
+ : "Lr" (old), "r" (new));
smp_mb();
return oldval;
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index db41a63691cd..a03242e143e5 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -113,7 +113,7 @@ static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
" mov %w[ret], w30")
: [ret] "+r" (x0), [v] "+Q" (ptr->counter)
: [old] "r" (w1), [new] "r" (w2)
- : "x30", "cc", "memory");
+ : "x30", "memory");
return x0;
}
@@ -207,7 +207,7 @@ static inline long atomic64_cmpxchg(atomic64_t *ptr, long old, long new)
" mov %[ret], x30")
: [ret] "+r" (x0), [v] "+Q" (ptr->counter)
: [old] "r" (x1), [new] "r" (x2)
- : "x30", "cc", "memory");
+ : "x30", "memory");
return x0;
}
--
2.1.4
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH v2 15/20] arm64: cmpxchg: avoid memory barrier on comparison failure
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (13 preceding siblings ...)
2015-07-24 10:42 ` [PATCH v2 14/20] arm64: cmpxchg: avoid "cc" clobber in ll/sc routines Will Deacon
@ 2015-07-24 10:42 ` Will Deacon
2015-07-24 15:32 ` Catalin Marinas
2015-07-24 10:42 ` [PATCH v2 16/20] arm64: atomics: tidy up common atomic{,64}_* macros Will Deacon
` (4 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:42 UTC (permalink / raw)
To: linux-arm-kernel
cmpxchg doesn't require memory barrier semantics when the value
comparison fails, so make the barrier conditional on success.
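The same rule exists at the C11/builtin level, where a compare-exchange
takes separate success and failure memory orders; a made-up sketch of the
mb variant's contract:

static inline unsigned long sketch_cmpxchg_mb(unsigned long *ptr,
					      unsigned long old,
					      unsigned long new)
{
	unsigned long seen = old;

	__atomic_compare_exchange_n(ptr, &seen, new, 0,
				    __ATOMIC_SEQ_CST,	/* full barriers on success */
				    __ATOMIC_RELAXED);	/* no barrier if it fails */
	return seen;
}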
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/atomic_ll_sc.h | 48 ++++++++++++++++-------------------
1 file changed, 22 insertions(+), 26 deletions(-)
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index d21091bae901..fb26f2b1f300 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -92,19 +92,18 @@ __LL_SC_PREFIX(atomic_cmpxchg(atomic_t *ptr, int old, int new))
unsigned long tmp;
int oldval;
- smp_mb();
-
asm volatile("// atomic_cmpxchg\n"
"1: ldxr %w1, %2\n"
" eor %w0, %w1, %w3\n"
" cbnz %w0, 2f\n"
-" stxr %w0, %w4, %2\n"
+" stlxr %w0, %w4, %2\n"
" cbnz %w0, 1b\n"
+" dmb ish\n"
"2:"
: "=&r" (tmp), "=&r" (oldval), "+Q" (ptr->counter)
- : "Lr" (old), "r" (new));
+ : "Lr" (old), "r" (new)
+ : "memory");
- smp_mb();
return oldval;
}
__LL_SC_EXPORT(atomic_cmpxchg);
@@ -164,19 +163,18 @@ __LL_SC_PREFIX(atomic64_cmpxchg(atomic64_t *ptr, long old, long new))
long oldval;
unsigned long res;
- smp_mb();
-
asm volatile("// atomic64_cmpxchg\n"
"1: ldxr %1, %2\n"
" eor %0, %1, %3\n"
" cbnz %w0, 2f\n"
-" stxr %w0, %4, %2\n"
+" stlxr %w0, %4, %2\n"
" cbnz %w0, 1b\n"
+" dmb ish\n"
"2:"
: "=&r" (res), "=&r" (oldval), "+Q" (ptr->counter)
- : "Lr" (old), "r" (new));
+ : "Lr" (old), "r" (new)
+ : "memory");
- smp_mb();
return oldval;
}
__LL_SC_EXPORT(atomic64_cmpxchg);
@@ -203,7 +201,7 @@ __LL_SC_PREFIX(atomic64_dec_if_positive(atomic64_t *v))
}
__LL_SC_EXPORT(atomic64_dec_if_positive);
-#define __CMPXCHG_CASE(w, sz, name, mb, cl) \
+#define __CMPXCHG_CASE(w, sz, name, mb, rel, cl) \
__LL_SC_INLINE unsigned long \
__LL_SC_PREFIX(__cmpxchg_case_##name(volatile void *ptr, \
unsigned long old, \
@@ -212,11 +210,10 @@ __LL_SC_PREFIX(__cmpxchg_case_##name(volatile void *ptr, \
unsigned long tmp, oldval; \
\
asm volatile( \
- " " #mb "\n" \
"1: ldxr" #sz "\t%" #w "[oldval], %[v]\n" \
" eor %" #w "[tmp], %" #w "[oldval], %" #w "[old]\n" \
" cbnz %" #w "[tmp], 2f\n" \
- " stxr" #sz "\t%w[tmp], %" #w "[new], %[v]\n" \
+ " st" #rel "xr" #sz "\t%w[tmp], %" #w "[new], %[v]\n" \
" cbnz %w[tmp], 1b\n" \
" " #mb "\n" \
" mov %" #w "[oldval], %" #w "[old]\n" \
@@ -230,18 +227,18 @@ __LL_SC_PREFIX(__cmpxchg_case_##name(volatile void *ptr, \
} \
__LL_SC_EXPORT(__cmpxchg_case_##name);
-__CMPXCHG_CASE(w, b, 1, , )
-__CMPXCHG_CASE(w, h, 2, , )
-__CMPXCHG_CASE(w, , 4, , )
-__CMPXCHG_CASE( , , 8, , )
-__CMPXCHG_CASE(w, b, mb_1, dmb ish, "memory")
-__CMPXCHG_CASE(w, h, mb_2, dmb ish, "memory")
-__CMPXCHG_CASE(w, , mb_4, dmb ish, "memory")
-__CMPXCHG_CASE( , , mb_8, dmb ish, "memory")
+__CMPXCHG_CASE(w, b, 1, , , )
+__CMPXCHG_CASE(w, h, 2, , , )
+__CMPXCHG_CASE(w, , 4, , , )
+__CMPXCHG_CASE( , , 8, , , )
+__CMPXCHG_CASE(w, b, mb_1, dmb ish, l, "memory")
+__CMPXCHG_CASE(w, h, mb_2, dmb ish, l, "memory")
+__CMPXCHG_CASE(w, , mb_4, dmb ish, l, "memory")
+__CMPXCHG_CASE( , , mb_8, dmb ish, l, "memory")
#undef __CMPXCHG_CASE
-#define __CMPXCHG_DBL(name, mb, cl) \
+#define __CMPXCHG_DBL(name, mb, rel, cl) \
__LL_SC_INLINE int \
__LL_SC_PREFIX(__cmpxchg_double##name(unsigned long old1, \
unsigned long old2, \
@@ -252,13 +249,12 @@ __LL_SC_PREFIX(__cmpxchg_double##name(unsigned long old1, \
unsigned long tmp, ret; \
\
asm volatile("// __cmpxchg_double" #name "\n" \
- " " #mb "\n" \
"1: ldxp %0, %1, %2\n" \
" eor %0, %0, %3\n" \
" eor %1, %1, %4\n" \
" orr %1, %0, %1\n" \
" cbnz %1, 2f\n" \
- " stxp %w0, %5, %6, %2\n" \
+ " st" #rel "xp %w0, %5, %6, %2\n" \
" cbnz %w0, 1b\n" \
" " #mb "\n" \
"2:" \
@@ -270,8 +266,8 @@ __LL_SC_PREFIX(__cmpxchg_double##name(unsigned long old1, \
} \
__LL_SC_EXPORT(__cmpxchg_double##name);
-__CMPXCHG_DBL( , , )
-__CMPXCHG_DBL(_mb, dmb ish, "memory")
+__CMPXCHG_DBL( , , , )
+__CMPXCHG_DBL(_mb, dmb ish, l, "memory")
#undef __CMPXCHG_DBL
--
2.1.4
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH v2 16/20] arm64: atomics: tidy up common atomic{,64}_* macros
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (14 preceding siblings ...)
2015-07-24 10:42 ` [PATCH v2 15/20] arm64: cmpxchg: avoid memory barrier on comparison failure Will Deacon
@ 2015-07-24 10:42 ` Will Deacon
2015-07-24 15:40 ` Catalin Marinas
2015-07-24 10:42 ` [PATCH v2 17/20] arm64: atomics: prefetch the destination word for write prior to stxr Will Deacon
` (3 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:42 UTC (permalink / raw)
To: linux-arm-kernel
The common (i.e. identical for ll/sc and lse) atomic macros in atomic.h
are needlessly different for atomic_t and atomic64_t.
This patch tidies up the definitions to make them consistent across the
two atomic types and factors out common code such as the add_unless
implementation based on cmpxchg.
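Written out as a plain function (the patch keeps it as the
___atomic_add_unless macro so that it can back both atomic_t and
atomic64_t), the factored-out pattern is a standard cmpxchg retry loop;
sketch_add_unless below is an illustrative name only:

static inline int sketch_add_unless(int *counter, int a, int u)
{
	int c = __atomic_load_n(counter, __ATOMIC_RELAXED);

	/* Retry the cmpxchg until it succeeds or the counter reaches u;
	 * on failure the builtin refreshes c with the current value. */
	while (c != u &&
	       !__atomic_compare_exchange_n(counter, &c, c + a, 0,
					    __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
		;

	return c;	/* the value observed before any update */
}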
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/atomic.h | 92 +++++++++++++++++------------------------
1 file changed, 38 insertions(+), 54 deletions(-)
diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
index ee32776d926c..51816ab2312d 100644
--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -26,8 +26,6 @@
#include <asm/barrier.h>
#include <asm/lse.h>
-#define ATOMIC_INIT(i) { (i) }
-
#ifdef __KERNEL__
#define __ARM64_IN_ATOMIC_IMPL
@@ -42,67 +40,53 @@
#include <asm/cmpxchg.h>
-/*
- * On ARM, ordinary assignment (str instruction) doesn't clear the local
- * strex/ldrex monitor on some implementations. The reason we can use it for
- * atomic_set() is the clrex or dummy strex done on every exception return.
- */
-#define atomic_read(v) ACCESS_ONCE((v)->counter)
-#define atomic_set(v,i) (((v)->counter) = (i))
-
-#define atomic_xchg(v, new) (xchg(&((v)->counter), new))
-
-static inline int __atomic_add_unless(atomic_t *v, int a, int u)
-{
- int c, old;
-
- c = atomic_read(v);
- while (c != u && (old = atomic_cmpxchg((v), c, c + a)) != c)
- c = old;
- return c;
-}
+#define ___atomic_add_unless(v, a, u, sfx) \
+({ \
+ typeof((v)->counter) c, old; \
+ \
+ c = atomic##sfx##_read(v); \
+ while (c != (u) && \
+ (old = atomic##sfx##_cmpxchg((v), c, c + (a))) != c) \
+ c = old; \
+ c; \
+ })
-#define atomic_inc(v) atomic_add(1, v)
-#define atomic_dec(v) atomic_sub(1, v)
+#define ATOMIC_INIT(i) { (i) }
-#define atomic_inc_and_test(v) (atomic_add_return(1, v) == 0)
-#define atomic_dec_and_test(v) (atomic_sub_return(1, v) == 0)
-#define atomic_inc_return(v) (atomic_add_return(1, v))
-#define atomic_dec_return(v) (atomic_sub_return(1, v))
-#define atomic_sub_and_test(i, v) (atomic_sub_return(i, v) == 0)
+#define atomic_read(v) READ_ONCE((v)->counter)
+#define atomic_set(v, i) (((v)->counter) = (i))
+#define atomic_xchg(v, new) xchg(&((v)->counter), (new))
-#define atomic_add_negative(i,v) (atomic_add_return(i, v) < 0)
+#define atomic_inc(v) atomic_add(1, (v))
+#define atomic_dec(v) atomic_sub(1, (v))
+#define atomic_inc_return(v) atomic_add_return(1, (v))
+#define atomic_dec_return(v) atomic_sub_return(1, (v))
+#define atomic_inc_and_test(v) (atomic_inc_return(v) == 0)
+#define atomic_dec_and_test(v) (atomic_dec_return(v) == 0)
+#define atomic_sub_and_test(i, v) (atomic_sub_return((i), (v)) == 0)
+#define atomic_add_negative(i, v) (atomic_add_return((i), (v)) < 0)
+#define __atomic_add_unless(v, a, u) ___atomic_add_unless(v, a, u,)
/*
* 64-bit atomic operations.
*/
-#define ATOMIC64_INIT(i) { (i) }
-
-#define atomic64_read(v) ACCESS_ONCE((v)->counter)
-#define atomic64_set(v,i) (((v)->counter) = (i))
-
-#define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
-
-static inline int atomic64_add_unless(atomic64_t *v, long a, long u)
-{
- long c, old;
-
- c = atomic64_read(v);
- while (c != u && (old = atomic64_cmpxchg((v), c, c + a)) != c)
- c = old;
+#define ATOMIC64_INIT ATOMIC_INIT
+#define atomic64_read atomic_read
+#define atomic64_set atomic_set
+#define atomic64_xchg atomic_xchg
+
+#define atomic64_inc(v) atomic64_add(1, (v))
+#define atomic64_dec(v) atomic64_sub(1, (v))
+#define atomic64_inc_return(v) atomic64_add_return(1, (v))
+#define atomic64_dec_return(v) atomic64_sub_return(1, (v))
+#define atomic64_inc_and_test(v) (atomic64_inc_return(v) == 0)
+#define atomic64_dec_and_test(v) (atomic64_dec_return(v) == 0)
+#define atomic64_sub_and_test(i, v) (atomic64_sub_return((i), (v)) == 0)
+#define atomic64_add_negative(i, v) (atomic64_add_return((i), (v)) < 0)
+#define atomic64_add_unless(v, a, u) (___atomic_add_unless(v, a, u, 64) != u)
- return c != u;
-}
+#define atomic64_inc_not_zero(v) atomic64_add_unless((v), 1, 0)
-#define atomic64_add_negative(a, v) (atomic64_add_return((a), (v)) < 0)
-#define atomic64_inc(v) atomic64_add(1LL, (v))
-#define atomic64_inc_return(v) atomic64_add_return(1LL, (v))
-#define atomic64_inc_and_test(v) (atomic64_inc_return(v) == 0)
-#define atomic64_sub_and_test(a, v) (atomic64_sub_return((a), (v)) == 0)
-#define atomic64_dec(v) atomic64_sub(1LL, (v))
-#define atomic64_dec_return(v) atomic64_sub_return(1LL, (v))
-#define atomic64_dec_and_test(v) (atomic64_dec_return((v)) == 0)
-#define atomic64_inc_not_zero(v) atomic64_add_unless((v), 1LL, 0LL)
#endif
#endif
--
2.1.4
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH v2 17/20] arm64: atomics: prefetch the destination word for write prior to stxr
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (15 preceding siblings ...)
2015-07-24 10:42 ` [PATCH v2 16/20] arm64: atomics: tidy up common atomic{,64}_* macros Will Deacon
@ 2015-07-24 10:42 ` Will Deacon
2015-07-24 15:42 ` Catalin Marinas
2015-07-24 10:42 ` [PATCH v2 18/20] arm64: atomics: implement atomic{, 64}_cmpxchg using cmpxchg Will Deacon
` (2 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:42 UTC (permalink / raw)
To: linux-arm-kernel
The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.
This patch makes use of prfm to prefetch cachelines for write prior to
ldxr/stxr loops when using the ll/sc atomic routines.
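A user-space analogue of the idea (the kernel patch emits prfm pstl1strm
directly in the asm): hint that the line is about to be written before
entering the retry loop. The helper below is a sketch with made-up names:

static inline void sketch_prefetched_inc(unsigned long *ptr)
{
	unsigned long old, new;

	__builtin_prefetch(ptr, 1);	/* second argument 1 == prefetch for write */

	old = __atomic_load_n(ptr, __ATOMIC_RELAXED);
	do {
		new = old + 1;
	} while (!__atomic_compare_exchange_n(ptr, &old, new, 0,
					      __ATOMIC_RELAXED,
					      __ATOMIC_RELAXED));
}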
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/atomic_ll_sc.h | 9 +++++++++
arch/arm64/include/asm/cmpxchg.h | 8 ++++++++
arch/arm64/include/asm/futex.h | 2 ++
arch/arm64/lib/bitops.S | 2 ++
4 files changed, 21 insertions(+)
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index fb26f2b1f300..652877fefae6 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -45,6 +45,7 @@ __LL_SC_PREFIX(atomic_##op(int i, atomic_t *v)) \
int result; \
\
asm volatile("// atomic_" #op "\n" \
+" prfm pstl1strm, %2\n" \
"1: ldxr %w0, %2\n" \
" " #asm_op " %w0, %w0, %w3\n" \
" stxr %w1, %w0, %2\n" \
@@ -62,6 +63,7 @@ __LL_SC_PREFIX(atomic_##op##_return(int i, atomic_t *v)) \
int result; \
\
asm volatile("// atomic_" #op "_return\n" \
+" prfm pstl1strm, %2\n" \
"1: ldxr %w0, %2\n" \
" " #asm_op " %w0, %w0, %w3\n" \
" stlxr %w1, %w0, %2\n" \
@@ -93,6 +95,7 @@ __LL_SC_PREFIX(atomic_cmpxchg(atomic_t *ptr, int old, int new))
int oldval;
asm volatile("// atomic_cmpxchg\n"
+" prfm pstl1strm, %2\n"
"1: ldxr %w1, %2\n"
" eor %w0, %w1, %w3\n"
" cbnz %w0, 2f\n"
@@ -116,6 +119,7 @@ __LL_SC_PREFIX(atomic64_##op(long i, atomic64_t *v)) \
unsigned long tmp; \
\
asm volatile("// atomic64_" #op "\n" \
+" prfm pstl1strm, %2\n" \
"1: ldxr %0, %2\n" \
" " #asm_op " %0, %0, %3\n" \
" stxr %w1, %0, %2\n" \
@@ -133,6 +137,7 @@ __LL_SC_PREFIX(atomic64_##op##_return(long i, atomic64_t *v)) \
unsigned long tmp; \
\
asm volatile("// atomic64_" #op "_return\n" \
+" prfm pstl1strm, %2\n" \
"1: ldxr %0, %2\n" \
" " #asm_op " %0, %0, %3\n" \
" stlxr %w1, %0, %2\n" \
@@ -164,6 +169,7 @@ __LL_SC_PREFIX(atomic64_cmpxchg(atomic64_t *ptr, long old, long new))
unsigned long res;
asm volatile("// atomic64_cmpxchg\n"
+" prfm pstl1strm, %2\n"
"1: ldxr %1, %2\n"
" eor %0, %1, %3\n"
" cbnz %w0, 2f\n"
@@ -186,6 +192,7 @@ __LL_SC_PREFIX(atomic64_dec_if_positive(atomic64_t *v))
unsigned long tmp;
asm volatile("// atomic64_dec_if_positive\n"
+" prfm pstl1strm, %2\n"
"1: ldxr %0, %2\n"
" subs %0, %0, #1\n"
" b.mi 2f\n"
@@ -210,6 +217,7 @@ __LL_SC_PREFIX(__cmpxchg_case_##name(volatile void *ptr, \
unsigned long tmp, oldval; \
\
asm volatile( \
+ " prfm pstl1strm, %2\n" \
"1: ldxr" #sz "\t%" #w "[oldval], %[v]\n" \
" eor %" #w "[tmp], %" #w "[oldval], %" #w "[old]\n" \
" cbnz %" #w "[tmp], 2f\n" \
@@ -249,6 +257,7 @@ __LL_SC_PREFIX(__cmpxchg_double##name(unsigned long old1, \
unsigned long tmp, ret; \
\
asm volatile("// __cmpxchg_double" #name "\n" \
+ " prfm pstl1strm, %2\n" \
"1: ldxp %0, %1, %2\n" \
" eor %0, %0, %3\n" \
" eor %1, %1, %4\n" \
diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index f70212629d02..7bfda0944c9b 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -33,12 +33,14 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
case 1:
asm volatile(ARM64_LSE_ATOMIC_INSN(
/* LL/SC */
+ " prfm pstl1strm, %2\n"
"1: ldxrb %w0, %2\n"
" stlxrb %w1, %w3, %2\n"
" cbnz %w1, 1b\n"
" dmb ish",
/* LSE atomics */
" nop\n"
+ " nop\n"
" swpalb %w3, %w0, %2\n"
" nop\n"
" nop")
@@ -49,12 +51,14 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
case 2:
asm volatile(ARM64_LSE_ATOMIC_INSN(
/* LL/SC */
+ " prfm pstl1strm, %2\n"
"1: ldxrh %w0, %2\n"
" stlxrh %w1, %w3, %2\n"
" cbnz %w1, 1b\n"
" dmb ish",
/* LSE atomics */
" nop\n"
+ " nop\n"
" swpalh %w3, %w0, %2\n"
" nop\n"
" nop")
@@ -65,12 +69,14 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
case 4:
asm volatile(ARM64_LSE_ATOMIC_INSN(
/* LL/SC */
+ " prfm pstl1strm, %2\n"
"1: ldxr %w0, %2\n"
" stlxr %w1, %w3, %2\n"
" cbnz %w1, 1b\n"
" dmb ish",
/* LSE atomics */
" nop\n"
+ " nop\n"
" swpal %w3, %w0, %2\n"
" nop\n"
" nop")
@@ -81,12 +87,14 @@ static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size
case 8:
asm volatile(ARM64_LSE_ATOMIC_INSN(
/* LL/SC */
+ " prfm pstl1strm, %2\n"
"1: ldxr %0, %2\n"
" stlxr %w1, %3, %2\n"
" cbnz %w1, 1b\n"
" dmb ish",
/* LSE atomics */
" nop\n"
+ " nop\n"
" swpal %3, %0, %2\n"
" nop\n"
" nop")
diff --git a/arch/arm64/include/asm/futex.h b/arch/arm64/include/asm/futex.h
index 74069b3bd919..a681608faf9a 100644
--- a/arch/arm64/include/asm/futex.h
+++ b/arch/arm64/include/asm/futex.h
@@ -24,6 +24,7 @@
#define __futex_atomic_op(insn, ret, oldval, uaddr, tmp, oparg) \
asm volatile( \
+" prfm pstl1strm, %2\n" \
"1: ldxr %w1, %2\n" \
insn "\n" \
"2: stlxr %w3, %w0, %2\n" \
@@ -112,6 +113,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
return -EFAULT;
asm volatile("// futex_atomic_cmpxchg_inatomic\n"
+" prfm pstl1strm, %2\n"
"1: ldxr %w1, %2\n"
" sub %w3, %w1, %w4\n"
" cbnz %w3, 3f\n"
diff --git a/arch/arm64/lib/bitops.S b/arch/arm64/lib/bitops.S
index bc18457c2bba..43ac736baa5b 100644
--- a/arch/arm64/lib/bitops.S
+++ b/arch/arm64/lib/bitops.S
@@ -31,6 +31,7 @@ ENTRY( \name )
eor w0, w0, w3 // Clear low bits
mov x2, #1
add x1, x1, x0, lsr #3 // Get word offset
+alt_lse " prfm pstl1strm, [x1]", "nop"
lsl x3, x2, x3 // Create mask
alt_lse "1: ldxr x2, [x1]", "\lse x3, [x1]"
@@ -48,6 +49,7 @@ ENTRY( \name )
eor w0, w0, w3 // Clear low bits
mov x2, #1
add x1, x1, x0, lsr #3 // Get word offset
+alt_lse " prfm pstl1strm, [x1]", "nop"
lsl x4, x2, x3 // Create mask
alt_lse "1: ldxr x2, [x1]", "\lse x4, x2, [x1]"
--
2.1.4
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH v2 18/20] arm64: atomics: implement atomic{, 64}_cmpxchg using cmpxchg
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (16 preceding siblings ...)
2015-07-24 10:42 ` [PATCH v2 17/20] arm64: atomics: prefetch the destination word for write prior to stxr Will Deacon
@ 2015-07-24 10:42 ` Will Deacon
2015-07-24 15:44 ` Catalin Marinas
2015-07-24 10:42 ` [PATCH v2 19/20] arm64: atomic64_dec_if_positive: fix incorrect branch condition Will Deacon
2015-07-24 10:42 ` [PATCH v2 20/20] arm64: kconfig: select HAVE_CMPXCHG_LOCAL Will Deacon
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:42 UTC (permalink / raw)
To: linux-arm-kernel
We don't need duplicate cmpxchg implementations, so use cmpxchg to
implement atomic{,64}_cmpxchg, like we do for xchg already.
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/atomic.h | 2 ++
arch/arm64/include/asm/atomic_ll_sc.h | 46 -----------------------------------
arch/arm64/include/asm/atomic_lse.h | 43 --------------------------------
3 files changed, 2 insertions(+), 89 deletions(-)
diff --git a/arch/arm64/include/asm/atomic.h b/arch/arm64/include/asm/atomic.h
index 51816ab2312d..b4eff63be0ff 100644
--- a/arch/arm64/include/asm/atomic.h
+++ b/arch/arm64/include/asm/atomic.h
@@ -56,6 +56,7 @@
#define atomic_read(v) READ_ONCE((v)->counter)
#define atomic_set(v, i) (((v)->counter) = (i))
#define atomic_xchg(v, new) xchg(&((v)->counter), (new))
+#define atomic_cmpxchg(v, old, new) cmpxchg(&((v)->counter), (old), (new))
#define atomic_inc(v) atomic_add(1, (v))
#define atomic_dec(v) atomic_sub(1, (v))
@@ -74,6 +75,7 @@
#define atomic64_read atomic_read
#define atomic64_set atomic_set
#define atomic64_xchg atomic_xchg
+#define atomic64_cmpxchg atomic_cmpxchg
#define atomic64_inc(v) atomic64_add(1, (v))
#define atomic64_dec(v) atomic64_sub(1, (v))
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index 652877fefae6..cbaedf9afb2f 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -88,29 +88,6 @@ ATOMIC_OPS(sub, sub)
#undef ATOMIC_OP_RETURN
#undef ATOMIC_OP
-__LL_SC_INLINE int
-__LL_SC_PREFIX(atomic_cmpxchg(atomic_t *ptr, int old, int new))
-{
- unsigned long tmp;
- int oldval;
-
- asm volatile("// atomic_cmpxchg\n"
-" prfm pstl1strm, %2\n"
-"1: ldxr %w1, %2\n"
-" eor %w0, %w1, %w3\n"
-" cbnz %w0, 2f\n"
-" stlxr %w0, %w4, %2\n"
-" cbnz %w0, 1b\n"
-" dmb ish\n"
-"2:"
- : "=&r" (tmp), "=&r" (oldval), "+Q" (ptr->counter)
- : "Lr" (old), "r" (new)
- : "memory");
-
- return oldval;
-}
-__LL_SC_EXPORT(atomic_cmpxchg);
-
#define ATOMIC64_OP(op, asm_op) \
__LL_SC_INLINE void \
__LL_SC_PREFIX(atomic64_##op(long i, atomic64_t *v)) \
@@ -163,29 +140,6 @@ ATOMIC64_OPS(sub, sub)
#undef ATOMIC64_OP
__LL_SC_INLINE long
-__LL_SC_PREFIX(atomic64_cmpxchg(atomic64_t *ptr, long old, long new))
-{
- long oldval;
- unsigned long res;
-
- asm volatile("// atomic64_cmpxchg\n"
-" prfm pstl1strm, %2\n"
-"1: ldxr %1, %2\n"
-" eor %0, %1, %3\n"
-" cbnz %w0, 2f\n"
-" stlxr %w0, %4, %2\n"
-" cbnz %w0, 1b\n"
-" dmb ish\n"
-"2:"
- : "=&r" (res), "=&r" (oldval), "+Q" (ptr->counter)
- : "Lr" (old), "r" (new)
- : "memory");
-
- return oldval;
-}
-__LL_SC_EXPORT(atomic64_cmpxchg);
-
-__LL_SC_INLINE long
__LL_SC_PREFIX(atomic64_dec_if_positive(atomic64_t *v))
{
long result;
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index a03242e143e5..b960a547723c 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -96,28 +96,6 @@ static inline int atomic_sub_return(int i, atomic_t *v)
return w0;
}
-static inline int atomic_cmpxchg(atomic_t *ptr, int old, int new)
-{
- register unsigned long x0 asm ("x0") = (unsigned long)ptr;
- register int w1 asm ("w1") = old;
- register int w2 asm ("w2") = new;
-
- asm volatile(ARM64_LSE_ATOMIC_INSN(
- /* LL/SC */
- " nop\n"
- __LL_SC_ATOMIC(cmpxchg)
- " nop",
- /* LSE atomics */
- " mov w30, %w[old]\n"
- " casal w30, %w[new], %[v]\n"
- " mov %w[ret], w30")
- : [ret] "+r" (x0), [v] "+Q" (ptr->counter)
- : [old] "r" (w1), [new] "r" (w2)
- : "x30", "memory");
-
- return x0;
-}
-
#undef __LL_SC_ATOMIC
#define __LL_SC_ATOMIC64(op) __LL_SC_CALL(atomic64_##op)
@@ -190,27 +168,6 @@ static inline long atomic64_sub_return(long i, atomic64_t *v)
return x0;
}
-static inline long atomic64_cmpxchg(atomic64_t *ptr, long old, long new)
-{
- register unsigned long x0 asm ("x0") = (unsigned long)ptr;
- register long x1 asm ("x1") = old;
- register long x2 asm ("x2") = new;
-
- asm volatile(ARM64_LSE_ATOMIC_INSN(
- /* LL/SC */
- " nop\n"
- __LL_SC_ATOMIC64(cmpxchg)
- " nop",
- /* LSE atomics */
- " mov x30, %[old]\n"
- " casal x30, %[new], %[v]\n"
- " mov %[ret], x30")
- : [ret] "+r" (x0), [v] "+Q" (ptr->counter)
- : [old] "r" (x1), [new] "r" (x2)
- : "x30", "memory");
-
- return x0;
-}
static inline long atomic64_dec_if_positive(atomic64_t *v)
{
--
2.1.4
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH v2 19/20] arm64: atomic64_dec_if_positive: fix incorrect branch condition
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (17 preceding siblings ...)
2015-07-24 10:42 ` [PATCH v2 18/20] arm64: atomics: implement atomic{, 64}_cmpxchg using cmpxchg Will Deacon
@ 2015-07-24 10:42 ` Will Deacon
2015-07-24 15:45 ` Catalin Marinas
2015-07-24 10:42 ` [PATCH v2 20/20] arm64: kconfig: select HAVE_CMPXCHG_LOCAL Will Deacon
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:42 UTC (permalink / raw)
To: linux-arm-kernel
If we attempt atomic64_dec_if_positive on LONG_MIN (the most negative
64-bit value), the subtraction underflows and we incorrectly decide that
the original parameter was positive.
This patch fixes the broken condition code so that we handle this
corner case correctly.
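The corner case can be checked without reasoning about the flags directly:
decrementing the most negative value overflows, which is exactly what the
"lt" condition (N != V) catches and the "mi" condition (N alone) misses.
A made-up sketch using a compiler builtin:

#include <stdbool.h>
#include <stdint.h>

static inline bool conditions_disagree(int64_t old)
{
	int64_t res;
	bool overflow = __builtin_sub_overflow(old, 1, &res);

	bool mi_skips = res < 0;		/* old code: sign bit of the result only */
	bool lt_skips = (res < 0) != overflow;	/* new code: signed less-than, overflow aware */

	/* Only true for old == INT64_MIN, where the subtraction wraps. */
	return mi_skips != lt_skips;
}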
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/include/asm/atomic_ll_sc.h | 2 +-
arch/arm64/include/asm/atomic_lse.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/atomic_ll_sc.h b/arch/arm64/include/asm/atomic_ll_sc.h
index cbaedf9afb2f..1a9ae9197a9f 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -149,7 +149,7 @@ __LL_SC_PREFIX(atomic64_dec_if_positive(atomic64_t *v))
" prfm pstl1strm, %2\n"
"1: ldxr %0, %2\n"
" subs %0, %0, #1\n"
-" b.mi 2f\n"
+" b.lt 2f\n"
" stlxr %w1, %0, %2\n"
" cbnz %w1, 1b\n"
" dmb ish\n"
diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index b960a547723c..d4b6cb940ee1 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -185,7 +185,7 @@ static inline long atomic64_dec_if_positive(atomic64_t *v)
/* LSE atomics */
"1: ldr x30, %[v]\n"
" subs %[ret], x30, #1\n"
- " b.mi 2f\n"
+ " b.lt 2f\n"
" casal x30, %[ret], %[v]\n"
" sub x30, x30, #1\n"
" sub x30, x30, %[ret]\n"
--
2.1.4
^ permalink raw reply related [flat|nested] 44+ messages in thread* [PATCH v2 20/20] arm64: kconfig: select HAVE_CMPXCHG_LOCAL
2015-07-24 10:41 [PATCH v2 00/20] arm64: support for 8.1 LSE atomic instructions Will Deacon
` (18 preceding siblings ...)
2015-07-24 10:42 ` [PATCH v2 19/20] arm64: atomic64_dec_if_positive: fix incorrect branch condition Will Deacon
@ 2015-07-24 10:42 ` Will Deacon
2015-07-24 15:45 ` Catalin Marinas
19 siblings, 1 reply; 44+ messages in thread
From: Will Deacon @ 2015-07-24 10:42 UTC (permalink / raw)
To: linux-arm-kernel
We implement an optimised cmpxchg_local macro, so let the kernel know.
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index d11b1af62438..19ea772a5d7b 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -53,6 +53,7 @@ config ARM64
select HAVE_C_RECORDMCOUNT
select HAVE_CC_STACKPROTECTOR
select HAVE_CMPXCHG_DOUBLE
+ select HAVE_CMPXCHG_LOCAL
select HAVE_DEBUG_BUGVERBOSE
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_API_DEBUG
--
2.1.4
^ permalink raw reply related [flat|nested] 44+ messages in thread