linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 0/2] Implement ticket spinlocks for ARM
@ 2012-06-22 15:15 Will Deacon
  2012-06-22 15:15 ` [PATCH 1/2] ARM: spinlock: use ticket algorithm for ARMv6+ locking implementation Will Deacon
  2012-06-22 15:15 ` [PATCH 2/2] ARM: rwlocks: remove unused branch labels from trylock routines Will Deacon
  0 siblings, 2 replies; 6+ messages in thread
From: Will Deacon @ 2012-06-22 15:15 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

With multi-cluster ARM platforms on the horizon, our current spinlock
implementation can result in lock-bias when the memory latency is not
uniform between clusters. This may be due to microarchitectural
constraints or (mis-)configuration of the cache-coherent interconnect.

This series ensures lock fairness by implementing the ticket algorithm
described in detail here:

        http://lwn.net/Articles/267968/
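
For anyone who doesn't fancy the article, the algorithm boils down to
something like the following sketch (illustrative only, using GCC
atomic builtins; this is not the code in these patches):

#include <stdint.h>

typedef struct {
	uint16_t next;	/* next ticket to hand out */
	uint16_t owner;	/* ticket currently being served */
} ticket_lock_t;

static void ticket_lock(ticket_lock_t *lock)
{
	/* Atomically take a ticket, then wait until it is served. */
	uint16_t me = __atomic_fetch_add(&lock->next, 1, __ATOMIC_RELAXED);

	while (__atomic_load_n(&lock->owner, __ATOMIC_ACQUIRE) != me)
		; /* cpu_relax()/wfe() would go here */
}

static void ticket_unlock(ticket_lock_t *lock)
{
	/* Serve the next waiter, in FIFO order. */
	__atomic_fetch_add(&lock->owner, 1, __ATOMIC_RELEASE);
}

Since tickets are handed out and served strictly in order, waiters
acquire the lock FIFO, regardless of their relative memory latencies.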

Whilst fairness does not always go hand in hand with throughput, I've
not observed any drop in performance on single-cluster systems with
this change (I ran hackbench, which stresses &(&u->lock)->rlock on the
msg{snd,rcv} paths).

All comments welcome,

Will


Will Deacon (2):
  ARM: spinlock: use ticket algorithm for ARMv6+ locking implementation
  ARM: rwlocks: remove unused branch labels from trylock routines

 arch/arm/include/asm/spinlock.h       |   77 ++++++++++++++++++++++-----------
 arch/arm/include/asm/spinlock_types.h |   17 ++++++-
 2 files changed, 66 insertions(+), 28 deletions(-)

-- 
1.7.4.1


* [PATCH 1/2] ARM: spinlock: use ticket algorithm for ARMv6+ locking implementation
  2012-06-22 15:15 [PATCH 0/2] Implement ticket spinlocks for ARM Will Deacon
@ 2012-06-22 15:15 ` Will Deacon
  2012-06-22 20:08   ` Nicolas Pitre
  2012-06-22 15:15 ` [PATCH 2/2] ARM: rwlocks: remove unused branch labels from trylock routines Will Deacon
  1 sibling, 1 reply; 6+ messages in thread
From: Will Deacon @ 2012-06-22 15:15 UTC (permalink / raw)
  To: linux-arm-kernel

Ticket spinlocks ensure locking fairness by reducing the thundering herd
effect when acquiring a lock. This is especially important on systems
where memory-access times are not necessarily uniform when accessing
the lock structure (for example, on a multi-cluster platform where the
lock is allocated into L1 when a CPU releases it).

This patch implements the ticket spinlock algorithm for ARM, replacing
the simpler implementation for ARMv6+ processors.
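
On little-endian, the lock word packs the two tickets into a single
32-bit value as sketched below (illustrative helpers only; the real
definitions live in spinlock_types.h):

#include <stdint.h>

/*
 * bits  0..15: owner -- ticket currently being served
 * bits 16..31: next  -- next ticket to be handed out
 */
static inline uint16_t ticket_owner(uint32_t slock) { return slock & 0xffffu; }
static inline uint16_t ticket_next(uint32_t slock)  { return slock >> 16; }

/*
 * Locking adds (1 << TICKET_SHIFT) to bump next; unlocking bumps
 * owner. The unlock path uses uadd16, which adds the two halfwords
 * independently, so owner wrapping from 0xffff to 0 cannot carry
 * into next the way a plain 32-bit "slock + 1" would:
 */
static inline uint32_t ticket_unlock_word(uint32_t slock)
{
	return (slock & 0xffff0000u) | ((slock + 1) & 0x0000ffffu);
}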

Cc: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/spinlock.h       |   73 ++++++++++++++++++++++-----------
 arch/arm/include/asm/spinlock_types.h |   17 +++++++-
 2 files changed, 64 insertions(+), 26 deletions(-)

diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index 65fa3c8..dcca638 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -59,18 +59,13 @@ static inline void dsb_sev(void)
 }
 
 /*
- * ARMv6 Spin-locking.
+ * ARMv6 ticket-based spin-locking.
  *
- * We exclusively read the old value.  If it is zero, we may have
- * won the lock, so we try exclusively storing it.  A memory barrier
- * is required after we get a lock, and before we release it, because
- * V6 CPUs are assumed to have weakly ordered memory.
- *
- * Unlocked value: 0
- * Locked value: 1
+ * A memory barrier is required after we get a lock, and before we
+ * release it, because V6 CPUs are assumed to have weakly ordered
+ * memory.
  */
 
-#define arch_spin_is_locked(x)		((x)->lock != 0)
 #define arch_spin_unlock_wait(lock) \
 	do { while (arch_spin_is_locked(lock)) cpu_relax(); } while (0)
 
@@ -79,31 +74,40 @@ static inline void dsb_sev(void)
 static inline void arch_spin_lock(arch_spinlock_t *lock)
 {
 	unsigned long tmp;
+	u32 newval;
+	arch_spinlock_t lockval;
 
 	__asm__ __volatile__(
-"1:	ldrex	%0, [%1]\n"
-"	teq	%0, #0\n"
-	WFE("ne")
-"	strexeq	%0, %2, [%1]\n"
-"	teqeq	%0, #0\n"
+"1:	ldrex	%0, [%3]\n"
+"	add	%1, %0, %4\n"
+"	strex	%2, %1, [%3]\n"
+"	teq	%2, #0\n"
 "	bne	1b"
-	: "=&r" (tmp)
-	: "r" (&lock->lock), "r" (1)
+	: "=&r" (lockval), "=&r" (newval), "=&r" (tmp)
+	: "r" (&lock->slock), "I" (1 << TICKET_SHIFT)
 	: "cc");
 
+	while (lockval.tickets.next != lockval.tickets.owner) {
+		wfe();
+		lockval.tickets.owner = ACCESS_ONCE(lock->tickets.owner);
+	}
+
 	smp_mb();
 }
 
 static inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
 	unsigned long tmp;
+	u32 slock;
 
 	__asm__ __volatile__(
-"	ldrex	%0, [%1]\n"
-"	teq	%0, #0\n"
-"	strexeq	%0, %2, [%1]"
-	: "=&r" (tmp)
-	: "r" (&lock->lock), "r" (1)
+"	ldrex	%0, [%2]\n"
+"	cmp	%0, %0, ror #16\n"
+"	movne	%1, #1\n"
+"	addeq	%0, %0, %3\n"
+"	strexeq	%1, %0, [%2]"
+	: "=&r" (slock), "=&r" (tmp)
+	: "r" (&lock->slock), "I" (1 << TICKET_SHIFT)
 	: "cc");
 
 	if (tmp == 0) {
@@ -116,17 +120,38 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock)
 
 static inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
+	unsigned long tmp;
+	u32 slock;
+
 	smp_mb();
 
 	__asm__ __volatile__(
-"	str	%1, [%0]\n"
-	:
-	: "r" (&lock->lock), "r" (0)
+"	mov	%1, #1\n"
+"1:	ldrex	%0, [%2]\n"
+"	uadd16	%0, %0, %1\n"
+"	strex	%1, %0, [%2]\n"
+"	teq	%1, #0\n"
+"	bne	1b"
+	: "=&r" (slock), "=&r" (tmp)
+	: "r" (&lock->slock)
 	: "cc");
 
 	dsb_sev();
 }
 
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
+{
+	struct __raw_tickets tickets = ACCESS_ONCE(lock->tickets);
+	return tickets.owner != tickets.next;
+}
+
+static inline int arch_spin_is_contended(arch_spinlock_t *lock)
+{
+	struct __raw_tickets tickets = ACCESS_ONCE(lock->tickets);
+	return (tickets.next - tickets.owner) > 1;
+}
+#define arch_spin_is_contended	arch_spin_is_contended
+
 /*
  * RWLOCKS
  *
diff --git a/arch/arm/include/asm/spinlock_types.h b/arch/arm/include/asm/spinlock_types.h
index d14d197..b262d2f 100644
--- a/arch/arm/include/asm/spinlock_types.h
+++ b/arch/arm/include/asm/spinlock_types.h
@@ -5,11 +5,24 @@
 # error "please don't include this file directly"
 #endif
 
+#define TICKET_SHIFT	16
+
 typedef struct {
-	volatile unsigned int lock;
+	union {
+		u32 slock;
+		struct __raw_tickets {
+#ifdef __ARMEB__
+			u16 next;
+			u16 owner;
+#else
+			u16 owner;
+			u16 next;
+#endif
+		} tickets;
+	};
 } arch_spinlock_t;
 
-#define __ARCH_SPIN_LOCK_UNLOCKED	{ 0 }
+#define __ARCH_SPIN_LOCK_UNLOCKED	{ { 0 } }
 
 typedef struct {
 	volatile unsigned int lock;
-- 
1.7.4.1


* [PATCH 2/2] ARM: rwlocks: remove unused branch labels from trylock routines
  2012-06-22 15:15 [PATCH 0/2] Implement ticket spinlocks for ARM Will Deacon
  2012-06-22 15:15 ` [PATCH 1/2] ARM: spinlock: use ticket algorithm for ARMv6+ locking implementation Will Deacon
@ 2012-06-22 15:15 ` Will Deacon
  2012-06-22 20:22   ` Nicolas Pitre
  1 sibling, 1 reply; 6+ messages in thread
From: Will Deacon @ 2012-06-22 15:15 UTC (permalink / raw)
  To: linux-arm-kernel

The ARM arch_{read,write}_trylock implementations include unused
backwards branch labels, since we don't retry the locking operation
if the exclusive store fails.

This patch removes the labels.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/spinlock.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
index dcca638..aa7945b 100644
--- a/arch/arm/include/asm/spinlock.h
+++ b/arch/arm/include/asm/spinlock.h
@@ -183,7 +183,7 @@ static inline int arch_write_trylock(arch_rwlock_t *rw)
 	unsigned long tmp;
 
 	__asm__ __volatile__(
-"1:	ldrex	%0, [%1]\n"
+"	ldrex	%0, [%1]\n"
 "	teq	%0, #0\n"
 "	strexeq	%0, %2, [%1]"
 	: "=&r" (tmp)
@@ -269,7 +269,7 @@ static inline int arch_read_trylock(arch_rwlock_t *rw)
 	unsigned long tmp, tmp2 = 1;
 
 	__asm__ __volatile__(
-"1:	ldrex	%0, [%2]\n"
+"	ldrex	%0, [%2]\n"
 "	adds	%0, %0, #1\n"
 "	strexpl	%1, %0, [%2]\n"
 	: "=&r" (tmp), "+r" (tmp2)
-- 
1.7.4.1


* [PATCH 1/2] ARM: spinlock: use ticket algorithm for ARMv6+ locking implementation
  2012-06-22 15:15 ` [PATCH 1/2] ARM: spinlock: use ticket algorithm for ARMv6+ locking implementation Will Deacon
@ 2012-06-22 20:08   ` Nicolas Pitre
  2012-06-25  9:36     ` Will Deacon
  0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Pitre @ 2012-06-22 20:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 22 Jun 2012, Will Deacon wrote:

> Ticket spinlocks ensure locking fairness by reducing the thundering herd
> effect when acquiring a lock.

The conventional thundering herd effect is still there, as all waiting
CPUs are still woken up from WFE upon a spin_unlock with a SEV.  Maybe
the fact that the actual spinning is no longer banging on the exclusive
monitor does count as a reduction of the thundering herd effect at that
level, though; if that is what you meant, then it could be made
explicit here.

> This is especially important on systems
> where memory-access times are not necessarily uniform when accessing
> the lock structure (for example, on a multi-cluster platform where the
> lock is allocated into L1 when a CPU releases it).

In which case it is more about fairness.  This algorithm brings fairness 
due to its FIFO nature, despite possible memory access speed 
differences.  And that is even more important than the thundering herd 
effect.  Solving both at once is of course all good.

> This patch implements the ticket spinlock algorithm for ARM, replacing
> the simpler implementation for ARMv6+ processors.
> 
> Cc: Nicolas Pitre <nico@fluxnic.net>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

A minor remark below.  Otherwise...

Reviewed-by: Nicolas Pitre <nico@linaro.org>

> ---
>  arch/arm/include/asm/spinlock.h       |   73 ++++++++++++++++++++++-----------
>  arch/arm/include/asm/spinlock_types.h |   17 +++++++-
>  2 files changed, 64 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
> index 65fa3c8..dcca638 100644
> --- a/arch/arm/include/asm/spinlock.h
> +++ b/arch/arm/include/asm/spinlock.h
> @@ -79,31 +74,40 @@ static inline void dsb_sev(void)
>  static inline void arch_spin_lock(arch_spinlock_t *lock)
>  {
>  	unsigned long tmp;
> +	u32 newval;
> +	arch_spinlock_t lockval;
>  
>  	__asm__ __volatile__(
> -"1:	ldrex	%0, [%1]\n"
> -"	teq	%0, #0\n"
> -	WFE("ne")
> -"	strexeq	%0, %2, [%1]\n"
> -"	teqeq	%0, #0\n"
> +"1:	ldrex	%0, [%3]\n"
> +"	add	%1, %0, %4\n"
> +"	strex	%2, %1, [%3]\n"
> +"	teq	%2, #0\n"
>  "	bne	1b"
> -	: "=&r" (tmp)
> -	: "r" (&lock->lock), "r" (1)
> +	: "=&r" (lockval), "=&r" (newval), "=&r" (tmp)
> +	: "r" (&lock->slock), "I" (1 << TICKET_SHIFT)
>  	: "cc");
>  
> +	while (lockval.tickets.next != lockval.tickets.owner) {
> +		wfe();
> +		lockval.tickets.owner = ACCESS_ONCE(lock->tickets.owner);
> +	}
> +
>  	smp_mb();
>  }
>  
>  static inline int arch_spin_trylock(arch_spinlock_t *lock)
>  {
>  	unsigned long tmp;
> +	u32 slock;
>  
>  	__asm__ __volatile__(
> -"	ldrex	%0, [%1]\n"
> -"	teq	%0, #0\n"
> -"	strexeq	%0, %2, [%1]"
> -	: "=&r" (tmp)
> -	: "r" (&lock->lock), "r" (1)
> +"	ldrex	%0, [%2]\n"
> +"	cmp	%0, %0, ror #16\n"
> +"	movne	%1, #1\n"

You could replace the above 2 insns with:

	subs	%1, %0, %0, ror #16

> +"	addeq	%0, %0, %3\n"
> +"	strexeq	%1, %0, [%2]"
> +	: "=&r" (slock), "=&r" (tmp)
> +	: "r" (&lock->slock), "I" (1 << TICKET_SHIFT)
>  	: "cc");
>  
>  	if (tmp == 0) {


Nicolas


* [PATCH 2/2] ARM: rwlocks: remove unused branch labels from trylock routines
  2012-06-22 15:15 ` [PATCH 2/2] ARM: rwlocks: remove unused branch labels from trylock routines Will Deacon
@ 2012-06-22 20:22   ` Nicolas Pitre
  0 siblings, 0 replies; 6+ messages in thread
From: Nicolas Pitre @ 2012-06-22 20:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 22 Jun 2012, Will Deacon wrote:

> The ARM arch_{read,write}_trylock implementations include unused
> backwards branch labels, since we don't retry the locking operation
> if the exclusive store fails.
> 
> This patch removes the labels.
> 
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Acked-by: Nicolas Pitre <nico@linaro.org>

> ---
>  arch/arm/include/asm/spinlock.h |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h
> index dcca638..aa7945b 100644
> --- a/arch/arm/include/asm/spinlock.h
> +++ b/arch/arm/include/asm/spinlock.h
> @@ -183,7 +183,7 @@ static inline int arch_write_trylock(arch_rwlock_t *rw)
>  	unsigned long tmp;
>  
>  	__asm__ __volatile__(
> -"1:	ldrex	%0, [%1]\n"
> +"	ldrex	%0, [%1]\n"
>  "	teq	%0, #0\n"
>  "	strexeq	%0, %2, [%1]"
>  	: "=&r" (tmp)
> @@ -269,7 +269,7 @@ static inline int arch_read_trylock(arch_rwlock_t *rw)
>  	unsigned long tmp, tmp2 = 1;
>  
>  	__asm__ __volatile__(
> -"1:	ldrex	%0, [%2]\n"
> +"	ldrex	%0, [%2]\n"
>  "	adds	%0, %0, #1\n"
>  "	strexpl	%1, %0, [%2]\n"
>  	: "=&r" (tmp), "+r" (tmp2)
> -- 
> 1.7.4.1
> 


* [PATCH 1/2] ARM: spinlock: use ticket algorithm for ARMv6+ locking implementation
  2012-06-22 20:08   ` Nicolas Pitre
@ 2012-06-25  9:36     ` Will Deacon
  0 siblings, 0 replies; 6+ messages in thread
From: Will Deacon @ 2012-06-25  9:36 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Nicolas,

Thanks for looking at this.

On Fri, Jun 22, 2012 at 09:08:47PM +0100, Nicolas Pitre wrote:
> On Fri, 22 Jun 2012, Will Deacon wrote:
> 
> > Ticket spinlocks ensure locking fairness by reducing the thundering herd
> > effect when acquiring a lock.
> 
> The conventional thundering herd effect is still there, as all waiting
> CPUs are still woken up from WFE upon a spin_unlock with a SEV.  Maybe
> the fact that the actual spinning is no longer banging on the exclusive
> monitor does count as a reduction of the thundering herd effect at that
> level, though; if that is what you meant, then it could be made
> explicit here.

Sure, I can clarify that. The thundering herd effect is reduced because (a)
the cacheline can remain shared and (b) the memory access time doesn't
determine the winner.
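
To illustrate (pseudo-C, with a made-up try_store_exclusive() helper
standing in for the old ldrex/strex sequence):

	/* Old algorithm: every waiter retries the exclusive store,
	 * pulling the line in exclusively each time, so the CPU with
	 * the fastest access to the line tends to win. */
	while (!try_store_exclusive(&lock->lock, 1))
		;

	/* Ticket algorithm: each waiter spins on a plain load of
	 * owner, so the line can stay shared until the unlocker
	 * writes it. */
	while (ACCESS_ONCE(lock->tickets.owner) != my_ticket)
		wfe();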

> > This is especially important on systems
> > where memory-access times are not necessarily uniform when accessing
> > the lock structure (for example, on a multi-cluster platform where the
> > lock is allocated into L1 when a CPU releases it).
> 
> In which case it is more about fairness.  This algorithm brings fairness 
> due to its FIFO nature, despite possible memory access speed 
> differences.  And that is even more important than the thundering herd 
> effect.  Solving both at once is of course all good.

Precisely.

> > This patch implements the ticket spinlock algorithm for ARM, replacing
> > the simpler implementation for ARMv6+ processors.
> > 
> > Cc: Nicolas Pitre <nico@fluxnic.net>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> 
> A minor remark below.  Otherwise...
> 
> Reviewed-by: Nicolas Pitre <nico@linaro.org>

Thanks Nicolas.

> >  static inline int arch_spin_trylock(arch_spinlock_t *lock)
> >  {
> >  	unsigned long tmp;
> > +	u32 slock;
> >  
> >  	__asm__ __volatile__(
> > -"	ldrex	%0, [%1]\n"
> > -"	teq	%0, #0\n"
> > -"	strexeq	%0, %2, [%1]"
> > -	: "=&r" (tmp)
> > -	: "r" (&lock->lock), "r" (1)
> > +"	ldrex	%0, [%2]\n"
> > +"	cmp	%0, %0, ror #16\n"
> > +"	movne	%1, #1\n"
> 
> You could replace the above 2 insns with:
> 
> 	subs	%1, %0, %0, ror #16

Wahey, that is extremely concise!
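
I'll fold that in for v2; the trylock then looks something like this
(untested):

static inline int arch_spin_trylock(arch_spinlock_t *lock)
{
	unsigned long tmp;
	u32 slock;

	__asm__ __volatile__(
"	ldrex	%0, [%2]\n"
"	subs	%1, %0, %0, ror #16\n"	/* tmp = next - owner; sets flags */
"	addeq	%0, %0, %3\n"		/* uncontended: take a ticket... */
"	strexeq	%1, %0, [%2]"		/* ...and try to publish it */
	: "=&r" (slock), "=&r" (tmp)
	: "r" (&lock->slock), "I" (1 << TICKET_SHIFT)
	: "cc");

	if (tmp == 0) {
		smp_mb();
		return 1;
	}

	return 0;
}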

Cheers,

Will

