* [PATCH v3 1/5] kernel/locking: Use struct qrwlock instead of struct __qrwlock
From: Will Deacon @ 2017-10-12 12:20 UTC (permalink / raw)
To: linux-arm-kernel
There's no good reason to keep the internal structure of struct qrwlock
hidden from qrwlock.h, particularly as it's actually needed for unlock
and ends up being abstracted independently behind the __qrwlock_write_byte
function.
Stop pretending we can hide this stuff and move the __qrwlock definition
into struct qrwlock itself, removing the __qrwlock_write_byte() nastiness
and using the same struct definition everywhere instead.
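For reference, a rough sketch of the layout the old helper relied on
(illustrative, not part of the patch itself):

/*
 * The low 8 bits of cnts form the writer byte. Its address within the
 * 32-bit lockword depends on endianness:
 *
 *   little-endian: offset 0  ->  (u8 *)lock + 0
 *   big-endian:    offset 3  ->  (u8 *)lock + 3
 *
 * which is exactly what 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN) computed.
 * With the union folded into struct qrwlock, the unlock path can store
 * directly to lock->wmode and the compiler picks the right byte.
 */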
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
include/asm-generic/qrwlock.h | 12 +-----------
include/asm-generic/qrwlock_types.h | 15 +++++++++++++--
kernel/locking/qrwlock.c | 26 ++------------------------
3 files changed, 16 insertions(+), 37 deletions(-)
diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 50925327b0a8..02c0a768e6b0 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -129,22 +129,12 @@ static inline void queued_read_unlock(struct qrwlock *lock)
}
/**
- * __qrwlock_write_byte - retrieve the write byte address of a queue rwlock
- * @lock : Pointer to queue rwlock structure
- * Return: the write byte address of a queue rwlock
- */
-static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
-{
- return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
-}
-
-/**
* queued_write_unlock - release write lock of a queue rwlock
* @lock : Pointer to queue rwlock structure
*/
static inline void queued_write_unlock(struct qrwlock *lock)
{
- smp_store_release(__qrwlock_write_byte(lock), 0);
+ smp_store_release(&lock->wmode, 0);
}
/*
diff --git a/include/asm-generic/qrwlock_types.h b/include/asm-generic/qrwlock_types.h
index 0abc6b6062fb..507f2dc51bba 100644
--- a/include/asm-generic/qrwlock_types.h
+++ b/include/asm-generic/qrwlock_types.h
@@ -9,12 +9,23 @@
*/
typedef struct qrwlock {
- atomic_t cnts;
+ union {
+ atomic_t cnts;
+ struct {
+#ifdef __LITTLE_ENDIAN
+ u8 wmode; /* Writer mode */
+ u8 rcnts[3]; /* Reader counts */
+#else
+ u8 rcnts[3]; /* Reader counts */
+ u8 wmode; /* Writer mode */
+#endif
+ };
+ };
arch_spinlock_t wait_lock;
} arch_rwlock_t;
#define __ARCH_RW_LOCK_UNLOCKED { \
- .cnts = ATOMIC_INIT(0), \
+ { .cnts = ATOMIC_INIT(0), }, \
.wait_lock = __ARCH_SPIN_LOCK_UNLOCKED, \
}
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 2655f26ec882..1af791e37348 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -23,26 +23,6 @@
#include <linux/spinlock.h>
#include <asm/qrwlock.h>
-/*
- * This internal data structure is used for optimizing access to some of
- * the subfields within the atomic_t cnts.
- */
-struct __qrwlock {
- union {
- atomic_t cnts;
- struct {
-#ifdef __LITTLE_ENDIAN
- u8 wmode; /* Writer mode */
- u8 rcnts[3]; /* Reader counts */
-#else
- u8 rcnts[3]; /* Reader counts */
- u8 wmode; /* Writer mode */
-#endif
- };
- };
- arch_spinlock_t lock;
-};
-
/**
* rspin_until_writer_unlock - inc reader count & spin until writer is gone
* @lock : Pointer to queue rwlock structure
@@ -125,10 +105,8 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
* or wait for a previous writer to go away.
*/
for (;;) {
- struct __qrwlock *l = (struct __qrwlock *)lock;
-
- if (!READ_ONCE(l->wmode) &&
- (cmpxchg_relaxed(&l->wmode, 0, _QW_WAITING) == 0))
+ if (!READ_ONCE(lock->wmode) &&
+ (cmpxchg_relaxed(&lock->wmode, 0, _QW_WAITING) == 0))
break;
cpu_relax();
--
2.1.4
* [PATCH v3 2/5] locking/atomic: Add atomic_cond_read_acquire
From: Will Deacon @ 2017-10-12 12:20 UTC (permalink / raw)
To: linux-arm-kernel
smp_cond_load_acquire() provides a way to spin on a variable with acquire
semantics until some conditional expression involving the variable is
satisfied. Architectures such as arm64 can potentially enter a low-power
state, waking up only when the value of the variable changes, which
reduces the system impact of tight polling loops.
This patch makes the same interface available to users of atomic_t,
atomic64_t and atomic_long_t, rather than requiring messy accesses to the
structure internals.
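As a rough usage sketch (the names here are illustrative, not taken from
the patch), a caller spinning on an atomic_t can now write:

/*
 * Wait, with acquire semantics, until 'ready' becomes non-zero.
 * VAL names the value observed on each iteration, as defined by
 * smp_cond_load_acquire().
 */
static atomic_t ready = ATOMIC_INIT(0);

static void wait_for_ready(void)
{
	atomic_cond_read_acquire(&ready, VAL != 0);
}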
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
include/asm-generic/atomic-long.h | 3 +++
include/linux/atomic.h | 4 ++++
2 files changed, 7 insertions(+)
diff --git a/include/asm-generic/atomic-long.h b/include/asm-generic/atomic-long.h
index 288cc9e96395..f2d97b782031 100644
--- a/include/asm-generic/atomic-long.h
+++ b/include/asm-generic/atomic-long.h
@@ -243,4 +243,7 @@ static inline long atomic_long_add_unless(atomic_long_t *l, long a, long u)
#define atomic_long_inc_not_zero(l) \
ATOMIC_LONG_PFX(_inc_not_zero)((ATOMIC_LONG_PFX(_t) *)(l))
+#define atomic_long_cond_read_acquire(v, c) \
+ ATOMIC_LONG_PFX(_cond_read_acquire)((ATOMIC_LONG_PFX(_t) *)(v), (c))
+
#endif /* _ASM_GENERIC_ATOMIC_LONG_H */
diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 40d6bfec0e0d..0aeb2b3f4578 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -653,6 +653,8 @@ static inline int atomic_dec_if_positive(atomic_t *v)
}
#endif
+#define atomic_cond_read_acquire(v, c) smp_cond_load_acquire(&(v)->counter, (c))
+
#ifdef CONFIG_GENERIC_ATOMIC64
#include <asm-generic/atomic64.h>
#endif
@@ -1072,6 +1074,8 @@ static inline long long atomic64_fetch_andnot_release(long long i, atomic64_t *v
}
#endif
+#define atomic64_cond_read_acquire(v, c) smp_cond_load_acquire(&(v)->counter, (c))
+
#include <asm-generic/atomic-long.h>
#endif /* _LINUX_ATOMIC_H */
--
2.1.4
* [PATCH v3 3/5] kernel/locking: Use atomic_cond_read_acquire when spinning in qrwlock
From: Will Deacon @ 2017-10-12 12:20 UTC (permalink / raw)
To: linux-arm-kernel
The qrwlock slowpaths involve spinning when either a prospective reader
is waiting for a concurrent writer to drain, or a prospective writer is
waiting for concurrent readers to drain. In both of these situations,
atomic_cond_read_acquire can be used to avoid busy-waiting and make use
of any backoff functionality provided by the architecture.
This patch replaces the open-coded loops and the rspin_until_writer_unlock()
implementation with atomic_cond_read_acquire(). The write-mode transition
from zero to _QW_WAITING is left alone, since (a) it doesn't need acquire
semantics and (b) it should be fast.
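On an architecture that only has the generic smp_cond_load_acquire(), the
new reader-side wait behaves much like the loop it replaces; roughly (a
sketch, not code from the patch):

	u32 cnts;

	for (;;) {
		cnts = atomic_read(&lock->cnts);
		if ((cnts & _QW_WMASK) != _QW_LOCKED)
			break;
		cpu_relax();
	}
	smp_acquire__after_ctrl_dep();	/* final load acts as an ACQUIRE */

arm64, however, can implement smp_cond_load_acquire() with LDXR/WFE, so
the waiting CPU sleeps until the lockword is actually written rather than
busy-polling.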
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Tested-by: Waiman Long <longman@redhat.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Adam Wallis <awallis@codeaurora.org>
Tested-by: Jan Glauber <jglauber@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
include/asm-generic/qrwlock.h | 4 ++--
kernel/locking/qrwlock.c | 50 +++++++++++--------------------------------
2 files changed, 14 insertions(+), 40 deletions(-)
diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 02c0a768e6b0..c716b02e8fd7 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -49,7 +49,7 @@
/*
* External function declarations
*/
-extern void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts);
+extern void queued_read_lock_slowpath(struct qrwlock *lock);
extern void queued_write_lock_slowpath(struct qrwlock *lock);
/**
@@ -100,7 +100,7 @@ static inline void queued_read_lock(struct qrwlock *lock)
return;
/* The slowpath will decrement the reader count, if necessary. */
- queued_read_lock_slowpath(lock, cnts);
+ queued_read_lock_slowpath(lock);
}
/**
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 1af791e37348..5825e0fc1a8e 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -24,28 +24,10 @@
#include <asm/qrwlock.h>
/**
- * rspin_until_writer_unlock - inc reader count & spin until writer is gone
- * @lock : Pointer to queue rwlock structure
- * @writer: Current queue rwlock writer status byte
- *
- * In interrupt context or at the head of the queue, the reader will just
- * increment the reader count & wait until the writer releases the lock.
- */
-static __always_inline void
-rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
-{
- while ((cnts & _QW_WMASK) == _QW_LOCKED) {
- cpu_relax();
- cnts = atomic_read_acquire(&lock->cnts);
- }
-}
-
-/**
* queued_read_lock_slowpath - acquire read lock of a queue rwlock
* @lock: Pointer to queue rwlock structure
- * @cnts: Current qrwlock lock value
*/
-void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
+void queued_read_lock_slowpath(struct qrwlock *lock)
{
/*
* Readers come here when they cannot get the lock without waiting
@@ -53,13 +35,12 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
if (unlikely(in_interrupt())) {
/*
* Readers in interrupt context will get the lock immediately
- * if the writer is just waiting (not holding the lock yet).
- * The rspin_until_writer_unlock() function returns immediately
- * in this case. Otherwise, they will spin (with ACQUIRE
- * semantics) until the lock is available without waiting in
- * the queue.
+ * if the writer is just waiting (not holding the lock yet),
+ * so spin with ACQUIRE semantics until the lock is available
+ * without waiting in the queue.
*/
- rspin_until_writer_unlock(lock, cnts);
+ atomic_cond_read_acquire(&lock->cnts, (VAL & _QW_WMASK)
+ != _QW_LOCKED);
return;
}
atomic_sub(_QR_BIAS, &lock->cnts);
@@ -68,14 +49,14 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
* Put the reader into the wait queue
*/
arch_spin_lock(&lock->wait_lock);
+ atomic_add(_QR_BIAS, &lock->cnts);
/*
* The ACQUIRE semantics of the following spinning code ensure
* that accesses can't leak upwards out of our subsequent critical
* section in the case that the lock is currently held for write.
*/
- cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
- rspin_until_writer_unlock(lock, cnts);
+ atomic_cond_read_acquire(&lock->cnts, (VAL & _QW_WMASK) != _QW_LOCKED);
/*
* Signal the next one in queue to become queue head
@@ -90,8 +71,6 @@ EXPORT_SYMBOL(queued_read_lock_slowpath);
*/
void queued_write_lock_slowpath(struct qrwlock *lock)
{
- u32 cnts;
-
/* Put the writer into the wait queue */
arch_spin_lock(&lock->wait_lock);
@@ -113,15 +92,10 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
}
/* When no more readers, set the locked flag */
- for (;;) {
- cnts = atomic_read(&lock->cnts);
- if ((cnts == _QW_WAITING) &&
- (atomic_cmpxchg_acquire(&lock->cnts, _QW_WAITING,
- _QW_LOCKED) == _QW_WAITING))
- break;
-
- cpu_relax();
- }
+ do {
+ atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
+ } while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING,
+ _QW_LOCKED) != _QW_WAITING);
unlock:
arch_spin_unlock(&lock->wait_lock);
}
--
2.1.4
* [PATCH v3 4/5] arm64: locking: Move rwlock implementation over to qrwlocks
From: Will Deacon @ 2017-10-12 12:20 UTC (permalink / raw)
To: linux-arm-kernel
Now that the qrwlock can make use of WFE, remove our homebrew rwlock
code in favour of the generic queued implementation.
Tested-by: Waiman Long <longman@redhat.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Adam Wallis <awallis@codeaurora.org>
Tested-by: Jan Glauber <jglauber@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm64/Kconfig | 17 ++++
arch/arm64/include/asm/Kbuild | 1 +
arch/arm64/include/asm/spinlock.h | 164 +-------------------------------
arch/arm64/include/asm/spinlock_types.h | 6 +-
4 files changed, 20 insertions(+), 168 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6a56d4..df02ad932020 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -22,7 +22,24 @@ config ARM64
select ARCH_HAS_STRICT_MODULE_RWX
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAVE_NMI_SAFE_CMPXCHG if ACPI_APEI_SEA
+ select ARCH_INLINE_READ_LOCK if !PREEMPT
+ select ARCH_INLINE_READ_LOCK_BH if !PREEMPT
+ select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPT
+ select ARCH_INLINE_READ_LOCK_IRQSAVE if !PREEMPT
+ select ARCH_INLINE_READ_UNLOCK if !PREEMPT
+ select ARCH_INLINE_READ_UNLOCK_BH if !PREEMPT
+ select ARCH_INLINE_READ_UNLOCK_IRQ if !PREEMPT
+ select ARCH_INLINE_READ_UNLOCK_IRQRESTORE if !PREEMPT
+ select ARCH_INLINE_WRITE_LOCK if !PREEMPT
+ select ARCH_INLINE_WRITE_LOCK_BH if !PREEMPT
+ select ARCH_INLINE_WRITE_LOCK_IRQ if !PREEMPT
+ select ARCH_INLINE_WRITE_LOCK_IRQSAVE if !PREEMPT
+ select ARCH_INLINE_WRITE_UNLOCK if !PREEMPT
+ select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPT
+ select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPT
+ select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE if !PREEMPT
select ARCH_USE_CMPXCHG_LOCKREF
+ select ARCH_USE_QUEUED_RWLOCKS
select ARCH_SUPPORTS_MEMORY_FAILURE
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_NUMA_BALANCING
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 2326e39d5892..e63d0a8312de 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += mcs_spinlock.h
generic-y += mm-arch-hooks.h
generic-y += msi.h
generic-y += preempt.h
+generic-y += qrwlock.h
generic-y += rwsem.h
generic-y += segment.h
generic-y += serial.h
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index aa51a38e46e4..fdb827c7832f 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -137,169 +137,7 @@ static inline int arch_spin_is_contended(arch_spinlock_t *lock)
}
#define arch_spin_is_contended arch_spin_is_contended
-/*
- * Write lock implementation.
- *
- * Write locks set bit 31. Unlocking, is done by writing 0 since the lock is
- * exclusively held.
- *
- * The memory barriers are implicit with the load-acquire and store-release
- * instructions.
- */
-
-static inline void arch_write_lock(arch_rwlock_t *rw)
-{
- unsigned int tmp;
-
- asm volatile(ARM64_LSE_ATOMIC_INSN(
- /* LL/SC */
- " sevl\n"
- "1: wfe\n"
- "2: ldaxr %w0, %1\n"
- " cbnz %w0, 1b\n"
- " stxr %w0, %w2, %1\n"
- " cbnz %w0, 2b\n"
- __nops(1),
- /* LSE atomics */
- "1: mov %w0, wzr\n"
- "2: casa %w0, %w2, %1\n"
- " cbz %w0, 3f\n"
- " ldxr %w0, %1\n"
- " cbz %w0, 2b\n"
- " wfe\n"
- " b 1b\n"
- "3:")
- : "=&r" (tmp), "+Q" (rw->lock)
- : "r" (0x80000000)
- : "memory");
-}
-
-static inline int arch_write_trylock(arch_rwlock_t *rw)
-{
- unsigned int tmp;
-
- asm volatile(ARM64_LSE_ATOMIC_INSN(
- /* LL/SC */
- "1: ldaxr %w0, %1\n"
- " cbnz %w0, 2f\n"
- " stxr %w0, %w2, %1\n"
- " cbnz %w0, 1b\n"
- "2:",
- /* LSE atomics */
- " mov %w0, wzr\n"
- " casa %w0, %w2, %1\n"
- __nops(2))
- : "=&r" (tmp), "+Q" (rw->lock)
- : "r" (0x80000000)
- : "memory");
-
- return !tmp;
-}
-
-static inline void arch_write_unlock(arch_rwlock_t *rw)
-{
- asm volatile(ARM64_LSE_ATOMIC_INSN(
- " stlr wzr, %0",
- " swpl wzr, wzr, %0")
- : "=Q" (rw->lock) :: "memory");
-}
-
-/* write_can_lock - would write_trylock() succeed? */
-#define arch_write_can_lock(x) ((x)->lock == 0)
-
-/*
- * Read lock implementation.
- *
- * It exclusively loads the lock value, increments it and stores the new value
- * back if positive and the CPU still exclusively owns the location. If the
- * value is negative, the lock is already held.
- *
- * During unlocking there may be multiple active read locks but no write lock.
- *
- * The memory barriers are implicit with the load-acquire and store-release
- * instructions.
- *
- * Note that in UNDEFINED cases, such as unlocking a lock twice, the LL/SC
- * and LSE implementations may exhibit different behaviour (although this
- * will have no effect on lockdep).
- */
-static inline void arch_read_lock(arch_rwlock_t *rw)
-{
- unsigned int tmp, tmp2;
-
- asm volatile(
- " sevl\n"
- ARM64_LSE_ATOMIC_INSN(
- /* LL/SC */
- "1: wfe\n"
- "2: ldaxr %w0, %2\n"
- " add %w0, %w0, #1\n"
- " tbnz %w0, #31, 1b\n"
- " stxr %w1, %w0, %2\n"
- " cbnz %w1, 2b\n"
- __nops(1),
- /* LSE atomics */
- "1: wfe\n"
- "2: ldxr %w0, %2\n"
- " adds %w1, %w0, #1\n"
- " tbnz %w1, #31, 1b\n"
- " casa %w0, %w1, %2\n"
- " sbc %w0, %w1, %w0\n"
- " cbnz %w0, 2b")
- : "=&r" (tmp), "=&r" (tmp2), "+Q" (rw->lock)
- :
- : "cc", "memory");
-}
-
-static inline void arch_read_unlock(arch_rwlock_t *rw)
-{
- unsigned int tmp, tmp2;
-
- asm volatile(ARM64_LSE_ATOMIC_INSN(
- /* LL/SC */
- "1: ldxr %w0, %2\n"
- " sub %w0, %w0, #1\n"
- " stlxr %w1, %w0, %2\n"
- " cbnz %w1, 1b",
- /* LSE atomics */
- " movn %w0, #0\n"
- " staddl %w0, %2\n"
- __nops(2))
- : "=&r" (tmp), "=&r" (tmp2), "+Q" (rw->lock)
- :
- : "memory");
-}
-
-static inline int arch_read_trylock(arch_rwlock_t *rw)
-{
- unsigned int tmp, tmp2;
-
- asm volatile(ARM64_LSE_ATOMIC_INSN(
- /* LL/SC */
- " mov %w1, #1\n"
- "1: ldaxr %w0, %2\n"
- " add %w0, %w0, #1\n"
- " tbnz %w0, #31, 2f\n"
- " stxr %w1, %w0, %2\n"
- " cbnz %w1, 1b\n"
- "2:",
- /* LSE atomics */
- " ldr %w0, %2\n"
- " adds %w1, %w0, #1\n"
- " tbnz %w1, #31, 1f\n"
- " casa %w0, %w1, %2\n"
- " sbc %w1, %w1, %w0\n"
- __nops(1)
- "1:")
- : "=&r" (tmp), "=&r" (tmp2), "+Q" (rw->lock)
- :
- : "cc", "memory");
-
- return !tmp2;
-}
-
-/* read_can_lock - would read_trylock() succeed? */
-#define arch_read_can_lock(x) ((x)->lock < 0x80000000)
+#include <asm/qrwlock.h>
/* See include/linux/spinlock.h */
#define smp_mb__after_spinlock() smp_mb()
diff --git a/arch/arm64/include/asm/spinlock_types.h b/arch/arm64/include/asm/spinlock_types.h
index 55be59a35e3f..6b856012c51b 100644
--- a/arch/arm64/include/asm/spinlock_types.h
+++ b/arch/arm64/include/asm/spinlock_types.h
@@ -36,10 +36,6 @@ typedef struct {
#define __ARCH_SPIN_LOCK_UNLOCKED { 0 , 0 }
-typedef struct {
- volatile unsigned int lock;
-} arch_rwlock_t;
-
-#define __ARCH_RW_LOCK_UNLOCKED { 0 }
+#include <asm-generic/qrwlock_types.h>
#endif
--
2.1.4
* [PATCH v3 5/5] kernel/locking: Prevent slowpath writers getting held up by fastpath
From: Will Deacon @ 2017-10-12 12:20 UTC (permalink / raw)
To: linux-arm-kernel
When a prospective writer takes the qrwlock locking slowpath due to the
lock being held, it attempts to cmpxchg the wmode field from 0 to
_QW_WAITING so that concurrent lockers also take the slowpath and queue
on the spinlock accordingly, allowing the lockers to drain.
Unfortunately, this isn't fair, because a fastpath writer that comes in
after the lock is made available but before the _QW_WAITING flag is set
can effectively jump the queue. If there is a steady stream of prospective
writers, then the waiter will be held off indefinitely.
This patch restores fairness by separating _QW_WAITING and _QW_LOCKED
into two distinct fields: _QW_LOCKED continues to occupy the bottom byte
of the lockword so that it can be cleared unconditionally when unlocking,
but _QW_WAITING now occupies what used to be the bottom bit of the reader
count. This then forces the slow-path for concurrent lockers.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Tested-by: Waiman Long <longman@redhat.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Adam Wallis <awallis@codeaurora.org>
Tested-by: Jan Glauber <jglauber@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
include/asm-generic/qrwlock.h | 23 +++++------------------
include/asm-generic/qrwlock_types.h | 8 ++++----
kernel/locking/qrwlock.c | 20 +++++---------------
3 files changed, 14 insertions(+), 37 deletions(-)
diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index c716b02e8fd7..0f7062bd55e5 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -26,24 +26,11 @@
/*
* Writer states & reader shift and bias.
- *
- * | +0 | +1 | +2 | +3 |
- * ----+----+----+----+----+
- * LE | 78 | 56 | 34 | 12 | 0x12345678
- * ----+----+----+----+----+
- * | wr | rd |
- * +----+----+----+----+
- *
- * ----+----+----+----+----+
- * BE | 12 | 34 | 56 | 78 | 0x12345678
- * ----+----+----+----+----+
- * | rd | wr |
- * +----+----+----+----+
*/
-#define _QW_WAITING 1 /* A writer is waiting */
-#define _QW_LOCKED 0xff /* A writer holds the lock */
-#define _QW_WMASK 0xff /* Writer mask */
-#define _QR_SHIFT 8 /* Reader count shift */
+#define _QW_WAITING 0x100 /* A writer is waiting */
+#define _QW_LOCKED 0x0ff /* A writer holds the lock */
+#define _QW_WMASK 0x1ff /* Writer mask */
+#define _QR_SHIFT 9 /* Reader count shift */
#define _QR_BIAS (1U << _QR_SHIFT)
/*
@@ -134,7 +121,7 @@ static inline void queued_read_unlock(struct qrwlock *lock)
*/
static inline void queued_write_unlock(struct qrwlock *lock)
{
- smp_store_release(&lock->wmode, 0);
+ smp_store_release(&lock->wlocked, 0);
}
/*
diff --git a/include/asm-generic/qrwlock_types.h b/include/asm-generic/qrwlock_types.h
index 507f2dc51bba..8af752acbdc0 100644
--- a/include/asm-generic/qrwlock_types.h
+++ b/include/asm-generic/qrwlock_types.h
@@ -13,11 +13,11 @@ typedef struct qrwlock {
atomic_t cnts;
struct {
#ifdef __LITTLE_ENDIAN
- u8 wmode; /* Writer mode */
- u8 rcnts[3]; /* Reader counts */
+ u8 wlocked; /* Locked for write? */
+ u8 __lstate[3];
#else
- u8 rcnts[3]; /* Reader counts */
- u8 wmode; /* Writer mode */
+ u8 __lstate[3];
+ u8 wlocked; /* Locked for write? */
#endif
};
};
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 5825e0fc1a8e..c7471c3fb798 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -39,8 +39,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock)
* so spin with ACQUIRE semantics until the lock is available
* without waiting in the queue.
*/
- atomic_cond_read_acquire(&lock->cnts, (VAL & _QW_WMASK)
- != _QW_LOCKED);
+ atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
return;
}
atomic_sub(_QR_BIAS, &lock->cnts);
@@ -56,7 +55,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock)
* that accesses can't leak upwards out of our subsequent critical
* section in the case that the lock is currently held for write.
*/
- atomic_cond_read_acquire(&lock->cnts, (VAL & _QW_WMASK) != _QW_LOCKED);
+ atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
/*
* Signal the next one in queue to become queue head
@@ -79,19 +78,10 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
(atomic_cmpxchg_acquire(&lock->cnts, 0, _QW_LOCKED) == 0))
goto unlock;
- /*
- * Set the waiting flag to notify readers that a writer is pending,
- * or wait for a previous writer to go away.
- */
- for (;;) {
- if (!READ_ONCE(lock->wmode) &&
- (cmpxchg_relaxed(&lock->wmode, 0, _QW_WAITING) == 0))
- break;
-
- cpu_relax();
- }
+ /* Set the waiting flag to notify readers that a writer is pending */
+ atomic_add(_QW_WAITING, &lock->cnts);
- /* When no more readers, set the locked flag */
+ /* When no more readers or writers, set the locked flag */
do {
atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
} while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING,
--
2.1.4
* [PATCH v3 0/5] Switch arm64 over to qrwlock
From: Adam Wallis @ 2017-10-12 13:16 UTC (permalink / raw)
To: linux-arm-kernel
On 10/12/2017 8:20 AM, Will Deacon wrote:
> Hello,
>
> This is version three of the patches previously posted here:
>
> v1: http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/534666.html
> v2: http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/534981.html
>
> Changes since v2 include:
>
> * Drop unused cnt argument
> * Fix typo in UNLOCK Kconfig inlining options
> * Added Tested-by tags
>
> Thanks to all of those who provided review and testing feedback!
>
> Will
>
> --->8
>
> Will Deacon (5):
> kernel/locking: Use struct qrwlock instead of struct __qrwlock
> locking/atomic: Add atomic_cond_read_acquire
> kernel/locking: Use atomic_cond_read_acquire when spinning in qrwlock
> arm64: locking: Move rwlock implementation over to qrwlocks
> kernel/locking: Prevent slowpath writers getting held up by fastpath
>
> arch/arm64/Kconfig | 17 ++++
> arch/arm64/include/asm/Kbuild | 1 +
> arch/arm64/include/asm/spinlock.h | 164 +-------------------------------
> arch/arm64/include/asm/spinlock_types.h | 6 +-
> include/asm-generic/atomic-long.h | 3 +
> include/asm-generic/qrwlock.h | 37 ++-----
> include/asm-generic/qrwlock_types.h | 15 ++-
> include/linux/atomic.h | 4 +
> kernel/locking/qrwlock.c | 86 +++--------------
> 9 files changed, 61 insertions(+), 272 deletions(-)
>
Will, I see the same performance improvements that I was getting with v2 of
the series. I will continue to run these patches on multiple systems over
the next few weeks and months for stability and functionality testing as well.
Tested-by: Adam Wallis <awallis@codeaurora.org>
--
Adam Wallis
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
* [PATCH v3 0/5] Switch arm64 over to qrwlock
From: Will Deacon @ 2017-10-19 16:53 UTC (permalink / raw)
To: linux-arm-kernel
Hi all,
On Thu, Oct 12, 2017 at 01:20:46PM +0100, Will Deacon wrote:
> This is version three of the patches previously posted here:
>
> v1: http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/534666.html
> v2: http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/534981.html
>
> Changes since v2 include:
>
> * Drop unused cnt argument
> * Fix typo in UNLOCK Kconfig inlining options
> * Added Tested-by tags
>
> Thanks to all of those who provided review and testing feedback!
I've not had any more feedback on this and the testing results are very
encouraging so I'd like to merge it for 4.15. However, it might make more
sense for the whole thing to go via -tip instead.
Ingo, Peter: what do you prefer?
Will
> Will Deacon (5):
> kernel/locking: Use struct qrwlock instead of struct __qrwlock
> locking/atomic: Add atomic_cond_read_acquire
> kernel/locking: Use atomic_cond_read_acquire when spinning in qrwlock
> arm64: locking: Move rwlock implementation over to qrwlocks
> kernel/locking: Prevent slowpath writers getting held up by fastpath
>
> arch/arm64/Kconfig | 17 ++++
> arch/arm64/include/asm/Kbuild | 1 +
> arch/arm64/include/asm/spinlock.h | 164 +-------------------------------
> arch/arm64/include/asm/spinlock_types.h | 6 +-
> include/asm-generic/atomic-long.h | 3 +
> include/asm-generic/qrwlock.h | 37 ++-----
> include/asm-generic/qrwlock_types.h | 15 ++-
> include/linux/atomic.h | 4 +
> kernel/locking/qrwlock.c | 86 +++--------------
> 9 files changed, 61 insertions(+), 272 deletions(-)
>
> --
> 2.1.4
>
* [PATCH v3 0/5] Switch arm64 over to qrwlock
From: Will Deacon @ 2017-10-24 14:48 UTC (permalink / raw)
To: linux-arm-kernel
Ingo, Peter,
Would you be willing to merge this via -tip, please? I'm happy to
rebase/repost as required, but the code is in good shape now.
Will
On Thu, Oct 19, 2017 at 05:53:32PM +0100, Will Deacon wrote:
> On Thu, Oct 12, 2017 at 01:20:46PM +0100, Will Deacon wrote:
> > This is version three of the patches previously posted here:
> >
> > v1: http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/534666.html
> > v2: http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/534981.html
> >
> > Changes since v2 include:
> >
> > * Drop unused cnt argument
> > * Fix typo in UNLOCK Kconfig inlining options
> > * Added Tested-by tags
> >
> > Thanks to all of those who provided review and testing feedback!
>
> I've not had any more feedback on this and the testing results are very
> encouraging so I'd like to merge it for 4.15. However, it might make more
> sense for the whole thing to go via -tip instead.
>
> Ingo, Peter: what do you prefer?
>
> Will
>
> > Will Deacon (5):
> > kernel/locking: Use struct qrwlock instead of struct __qrwlock
> > locking/atomic: Add atomic_cond_read_acquire
> > kernel/locking: Use atomic_cond_read_acquire when spinning in qrwlock
> > arm64: locking: Move rwlock implementation over to qrwlocks
> > kernel/locking: Prevent slowpath writers getting held up by fastpath
> >
> > arch/arm64/Kconfig | 17 ++++
> > arch/arm64/include/asm/Kbuild | 1 +
> > arch/arm64/include/asm/spinlock.h | 164 +-------------------------------
> > arch/arm64/include/asm/spinlock_types.h | 6 +-
> > include/asm-generic/atomic-long.h | 3 +
> > include/asm-generic/qrwlock.h | 37 ++-----
> > include/asm-generic/qrwlock_types.h | 15 ++-
> > include/linux/atomic.h | 4 +
> > kernel/locking/qrwlock.c | 86 +++--------------
> > 9 files changed, 61 insertions(+), 272 deletions(-)
> >
> > --
> > 2.1.4
> >
* [PATCH v3 0/5] Switch arm64 over to qrwlock
From: Ingo Molnar @ 2017-10-24 15:40 UTC (permalink / raw)
To: linux-arm-kernel
* Will Deacon <will.deacon@arm.com> wrote:
> Ingo, Peter,
>
> Would you be willing to merge this via -tip, please? I'm happy to
> rebase/repost as required, but the code is in good shape now.
Sure, will do it tomorrow!
Thanks,
Ingo
* [PATCH v3 0/5] Switch arm64 over to qrwlock
From: Peter Zijlstra @ 2017-10-25 8:36 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Oct 24, 2017 at 05:40:45PM +0200, Ingo Molnar wrote:
>
> * Will Deacon <will.deacon@arm.com> wrote:
>
> > Ingo, Peter,
> >
> > Would you be willing to merge this via -tip, please? I'm happy to
> > rebase/repost as required, but the code is in good shape now.
>
> Sure, will do it tomorrow!
Yeah, ACK on this.
* [PATCH v3 0/5] Switch arm64 over to qrwlock
From: chengjian (D) @ 2017-10-28 6:10 UTC (permalink / raw)
To: linux-arm-kernel
Hi, all.
On 2017/10/20 0:53, Will Deacon wrote:
> I've not had any more feedback on this and the testing results are very
> encouraging so I'd like to merge it for 4.15. However, it might make more
> sense for the whole thing to go via -tip instead.
>
> Ingo, Peter: what do you prefer?
>
> Will
>
>> Will Deacon (5):
>> kernel/locking: Use struct qrwlock instead of struct __qrwlock
>> locking/atomic: Add atomic_cond_read_acquire
>> kernel/locking: Use atomic_cond_read_acquire when spinning in qrwlock
>> arm64: locking: Move rwlock implementation over to qrwlocks
>> kernel/locking: Prevent slowpath writers getting held up by fastpath
>>
>> arch/arm64/Kconfig | 17 ++++
>> arch/arm64/include/asm/Kbuild | 1 +
>> arch/arm64/include/asm/spinlock.h | 164 +-------------------------------
>> arch/arm64/include/asm/spinlock_types.h | 6 +-
>> include/asm-generic/atomic-long.h | 3 +
>> include/asm-generic/qrwlock.h | 37 ++-----
>> include/asm-generic/qrwlock_types.h | 15 ++-
>> include/linux/atomic.h | 4 +
>> kernel/locking/qrwlock.c | 86 +++--------------
>> 9 files changed, 61 insertions(+), 272 deletions(-)
>>
>> --
>> 2.1.4
I am very interested in this change.
Could you share the test results with me, or the test methods
(especially the test suite)?
I would like to do further performance testing.
Thanks.
Cheng Jian