* [PATCH v5 0/9] locking/rwsem: Enable reader optimistic spinning
@ 2017-06-01 17:38 Waiman Long
2017-06-01 17:38 ` [PATCH v5 1/9] locking/rwsem: relocate rwsem_down_read_failed() Waiman Long
` (9 more replies)
0 siblings, 10 replies; 11+ messages in thread
From: Waiman Long @ 2017-06-01 17:38 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long
v4->v5:
- Drop the OSQ patch, the need to increase the size of the rwsem
structure and the autotuning mechanism.
- Add an intermediate patch to enable readers spinning on writer.
- Other miscellaneous changes and optimizations.
v3->v4:
- Rebased to the latest tip tree due to changes to rwsem-xadd.c.
- Update the OSQ patch to fix race condition.
v2->v3:
- Used smp_acquire__after_ctrl_dep() to provide acquire barrier.
- Added the following new patches:
1) make rwsem_spin_on_owner() return a tristate value.
2) reactivate reader spinning when there is a large number of
favorable writer-on-writer spinnings.
3) move all the rwsem macros in arch-specific rwsem.h files
into a common asm-generic/rwsem_types.h file.
4) add a boot parameter to specify the reader spinning threshold.
- Updated some of the patches as suggested by PeterZ and adjusted
some of the reader spinning parameters.
v1->v2:
- Fixed a 0day build error.
- Added a new patch 1 to make osq_lock() a proper acquire memory
barrier.
- Replaced the explicit enabling of reader spinning with an autotuning
mechanism that disables reader spinning for those rwsems that may
not benefit from it.
- Removed the last xfs patch as it is no longer necessary.
v4: https://lkml.org/lkml/2016/8/18/1039
This patchset enables more aggressive optimistic spinning on both
readers and writers waiting on a writer- or reader-owned lock. Spinning
on a writer is done by looking at the on_cpu flag of the lock owner.
Spinning on readers, on the other hand, is count-based as there is no
easy way to figure out if all the readers are running. The spinner
will stop spinning once the count goes to 0. It will then set a bit
in the owner field to indicate that reader spinning is disabled for
the current reader-owned locking session so that subsequent writers
won't continue spinning.
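As a rough illustration of the count-based part of this scheme, here is
a minimal user-space sketch (it is not the kernel code in this series;
fake_rwsem, SPIN_THRESHOLD and try_write_lock are made-up names used
only for illustration): a writer keeps spinning while the reader count
keeps changing, and once the count stalls for a full threshold of
iterations it sets a "spinning disabled" flag so later spinners give up
quickly.

#include <stdatomic.h>
#include <stdbool.h>

#define SPIN_THRESHOLD	1024		/* spin budget, reset on count change */
#define SPIN_DISABLED	1UL		/* "don't spin on readers" flag */

struct fake_rwsem {
	atomic_long	count;		/* # of active readers, -1 = writer */
	atomic_ulong	owner_flags;	/* bit 0: reader spinning disabled */
};

static bool try_write_lock(struct fake_rwsem *sem)
{
	long zero = 0;

	/* Grab the write lock only when no reader is active. */
	return atomic_compare_exchange_strong(&sem->count, &zero, -1);
}

/* Spin on a reader-owned lock; return true if the write lock was taken. */
static bool writer_spin_on_readers(struct fake_rwsem *sem)
{
	long old = atomic_load(&sem->count);
	int spins = SPIN_THRESHOLD;

	while (spins > 0) {
		long cur;

		if (try_write_lock(sem))
			return true;

		cur = atomic_load(&sem->count);
		if (cur != old) {
			old = cur;		/* readers coming and going */
			spins = SPIN_THRESHOLD;	/* reset the spin budget */
		} else {
			spins--;		/* count stalled, burn a credit */
		}
	}

	/* Count stopped changing; stop later spinners for this session. */
	atomic_fetch_or(&sem->owner_flags, SPIN_DISABLED);
	return false;
}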
Patch 1 moves down the rwsem_down_read_failed() function for later
patches.
Patch 2 reduces the length of the blocking window after a read locking
attempt where writer lock stealing is disabled because of the active
read lock. It can improve rwsem performance for a contended lock.
Patch 3 moves the macro definitions in various arch-specific rwsem.h
header files into a common asm-generic/rwsem_types.h file.
Patch 4 changes RWSEM_WAITING_BIAS to simplify the reader trylock code
that is needed for reader optimistic spinning.
Patch 5 enables readers to spin on a writer-owned lock.
Patch 6 uses a new bit in the owner field to indicate that reader
spinning should be disabled for the current reader-owned locking
session. It will be cleared when a writer owns the lock again.
Patch 7 modifies rwsem_spin_on_owner() to return a tri-state value
that can be used in a later patch.
Patch 8 enables writers to optimistically spin on a reader-owned lock
using a fixed iteration count.
Patch 9 enables reader lock stealing as long as the lock is
reader-owned and reader optimistic spinning isn't disabled.
In terms of rwsem performance, a rwsem microbenchmark and a fio randrw
test with an xfs filesystem on a ramdisk were used to verify the
performance changes due to these patches. Both tests were run on a
2-socket, 36-core E5-2699 v3 system with turbo-boosting off. The rwsem
microbenchmark (1:1 reader/writer ratio) has a short critical section
while the fio randrw test has a long critical section (4k read/write).
The following table shows the performance of both tests
with different numbers of patches applied:
 # of Patches   Locking rate   FIO Bandwidth   FIO Bandwidth
   Applied       36 threads     36 threads      16 threads
 ------------   ------------   -------------   -------------
       0         510.1 Mop/s      785 MB/s        835 MB/s
       2         520.1 Mop/s      789 MB/s        835 MB/s
       5        1760.2 Mop/s      281 MB/s        818 MB/s
       8        5439.0 Mop/s     1361 MB/s       1367 MB/s
       9        5440.8 Mop/s     1324 MB/s       1356 MB/s
With the readers-spinning-on-writer patch (patch 5), performance
improved with the short critical section workload, but degraded with
the long critical section workload. This is caused by the fact that
the existing code tends to collect all the readers in the wait queue and
wake all of them up together, making them all proceed in parallel. On
the other hand, patch 5 tends to break up the readers into
smaller batches sandwiched among the writers. So we see a big drop
with 36 threads, but a much smaller drop with 16 threads. Fortunately,
the performance drop was gone once the full patchset was applied.
A different fio test with 18 reader threads and 18 writer threads
was also run to see whether the rwsem code prefers readers or writers.
 # of Patches   Read Bandwidth   Write Bandwidth
 ------------   --------------   ---------------
       0             86 MB/s         883 MB/s
       2             86 MB/s         919 MB/s
       5            158 MB/s         393 MB/s
       8           2830 MB/s        1404 MB/s (?)
       9           2903 MB/s        1367 MB/s (?)
It can be seen that the existing rwsem code prefers writers. With this
patchset, it becomes reader-preferring. Please note that for the
last 2 entries the reader threads exited before the writer threads,
so the write bandwidths were actually inflated.
Waiman Long (9):
locking/rwsem: relocate rwsem_down_read_failed()
locking/rwsem: Stop active read lock ASAP
locking/rwsem: Move common rwsem macros to asm-generic/rwsem_types.h
locking/rwsem: Change RWSEM_WAITING_BIAS for better disambiguation
locking/rwsem: Enable readers spinning on writer
locking/rwsem: Use bit in owner to stop spinning
locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value
locking/rwsem: Enable count-based spinning on reader
locking/rwsem: Enable reader lock stealing
arch/alpha/include/asm/rwsem.h | 11 +-
arch/ia64/include/asm/rwsem.h | 9 +-
arch/s390/include/asm/rwsem.h | 9 +-
arch/x86/include/asm/rwsem.h | 22 +--
include/asm-generic/rwsem.h | 19 +--
include/asm-generic/rwsem_types.h | 28 ++++
kernel/locking/rwsem-xadd.c | 282 ++++++++++++++++++++++++++++----------
kernel/locking/rwsem.h | 66 +++++++--
8 files changed, 307 insertions(+), 139 deletions(-)
create mode 100644 include/asm-generic/rwsem_types.h
--
1.8.3.1
* [PATCH v5 1/9] locking/rwsem: relocate rwsem_down_read_failed()
2017-06-01 17:38 [PATCH v5 0/9] locking/rwsem: Enable reader optimistic spinning Waiman Long
@ 2017-06-01 17:38 ` Waiman Long
2017-06-01 17:39 ` [PATCH v5 2/9] locking/rwsem: Stop active read lock ASAP Waiman Long
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Waiman Long @ 2017-06-01 17:38 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long
The rwsem_down_read_failed() function was relocated from above the
optimistic spinning section to below that section. This enables
it to use functions in that section in future patches. There is no
code change.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/locking/rwsem-xadd.c | 96 ++++++++++++++++++++++-----------------------
1 file changed, 48 insertions(+), 48 deletions(-)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 34e727f..13bdbc3 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -219,54 +219,6 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
}
/*
- * Wait for the read lock to be granted
- */
-__visible
-struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
-{
- long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
- struct rwsem_waiter waiter;
- DEFINE_WAKE_Q(wake_q);
-
- waiter.task = current;
- waiter.type = RWSEM_WAITING_FOR_READ;
-
- raw_spin_lock_irq(&sem->wait_lock);
- if (list_empty(&sem->wait_list))
- adjustment += RWSEM_WAITING_BIAS;
- list_add_tail(&waiter.list, &sem->wait_list);
-
- /* we're now waiting on the lock, but no longer actively locking */
- count = atomic_long_add_return(adjustment, &sem->count);
-
- /*
- * If there are no active locks, wake the front queued process(es).
- *
- * If there are no writers and we are first in the queue,
- * wake our own waiter to join the existing active readers !
- */
- if (count == RWSEM_WAITING_BIAS ||
- (count > RWSEM_WAITING_BIAS &&
- adjustment != -RWSEM_ACTIVE_READ_BIAS))
- __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
-
- raw_spin_unlock_irq(&sem->wait_lock);
- wake_up_q(&wake_q);
-
- /* wait to be given the lock */
- while (true) {
- set_current_state(TASK_UNINTERRUPTIBLE);
- if (!waiter.task)
- break;
- schedule();
- }
-
- __set_current_state(TASK_RUNNING);
- return sem;
-}
-EXPORT_SYMBOL(rwsem_down_read_failed);
-
-/*
* This function must be called with the sem->wait_lock held to prevent
* race conditions between checking the rwsem wait list and setting the
* sem->count accordingly.
@@ -461,6 +413,54 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
#endif
/*
+ * Wait for the read lock to be granted
+ */
+__visible
+struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
+{
+ long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
+ struct rwsem_waiter waiter;
+ DEFINE_WAKE_Q(wake_q);
+
+ waiter.task = current;
+ waiter.type = RWSEM_WAITING_FOR_READ;
+
+ raw_spin_lock_irq(&sem->wait_lock);
+ if (list_empty(&sem->wait_list))
+ adjustment += RWSEM_WAITING_BIAS;
+ list_add_tail(&waiter.list, &sem->wait_list);
+
+ /* we're now waiting on the lock, but no longer actively locking */
+ count = atomic_long_add_return(adjustment, &sem->count);
+
+ /*
+ * If there are no active locks, wake the front queued process(es).
+ *
+ * If there are no writers and we are first in the queue,
+ * wake our own waiter to join the existing active readers !
+ */
+ if (count == RWSEM_WAITING_BIAS ||
+ (count > RWSEM_WAITING_BIAS &&
+ adjustment != -RWSEM_ACTIVE_READ_BIAS))
+ __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
+
+ raw_spin_unlock_irq(&sem->wait_lock);
+ wake_up_q(&wake_q);
+
+ /* wait to be given the lock */
+ while (true) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ if (!waiter.task)
+ break;
+ schedule();
+ }
+
+ __set_current_state(TASK_RUNNING);
+ return sem;
+}
+EXPORT_SYMBOL(rwsem_down_read_failed);
+
+/*
* Wait until we successfully acquire the write lock
*/
static inline struct rw_semaphore *
--
1.8.3.1
* [PATCH v5 2/9] locking/rwsem: Stop active read lock ASAP
2017-06-01 17:38 [PATCH v5 0/9] locking/rwsem: Enable reader optimistic spinning Waiman Long
2017-06-01 17:38 ` [PATCH v5 1/9] locking/rwsem: relocate rwsem_down_read_failed() Waiman Long
@ 2017-06-01 17:39 ` Waiman Long
2017-06-01 17:39 ` [PATCH v5 3/9] locking/rwsem: Move common rwsem macros to asm-generic/rwsem_types.h Waiman Long
` (7 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Waiman Long @ 2017-06-01 17:39 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long
Currently, when down_read() fails, the active read locking isn't undone
until the rwsem_down_read_failed() function grabs the wait_lock. If the
wait_lock is contended, it may take a while to get the lock. During
that period, writer lock stealing will be disabled because of the
active read lock.
This patch will release the active read lock ASAP when either
optimistic spinners are present or the trylock fails, so that writer
lock stealing can happen sooner.
On a 2-socket 36-core x86-64 E5-2699 v3 system, a rwsem microbenchmark
was run with 36 locking threads (one/core) doing 250k reader and writer
lock/unlock operations each. The resulting locking rates (avg of 3
runs) on a 4.12 based kernel were 510.1 Mop/s and 520.1 Mop/s without
and with the patch respectively. That was an increase of about 2%.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/locking/rwsem-xadd.c | 27 ++++++++++++++++++++++-----
1 file changed, 22 insertions(+), 5 deletions(-)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 13bdbc3..e6c2bd5 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -418,6 +418,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
__visible
struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
{
+ bool first_in_queue = false;
long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
struct rwsem_waiter waiter;
DEFINE_WAKE_Q(wake_q);
@@ -425,13 +426,30 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
waiter.task = current;
waiter.type = RWSEM_WAITING_FOR_READ;
+ /*
+ * Undo read bias from down_read operation to stop active locking if:
+ * 1) Optimistic spinners are present; or
+ * 2) the wait_lock isn't free.
+ * Doing that after taking the wait_lock may otherwise block writer
+ * lock stealing for too long impacting performance.
+ */
+ if (rwsem_has_spinner(sem) || raw_spin_is_locked(&sem->wait_lock)) {
+ atomic_long_add(-RWSEM_ACTIVE_READ_BIAS, &sem->count);
+ adjustment = 0;
+ }
+
raw_spin_lock_irq(&sem->wait_lock);
- if (list_empty(&sem->wait_list))
+ if (list_empty(&sem->wait_list)) {
adjustment += RWSEM_WAITING_BIAS;
+ first_in_queue = true;
+ }
list_add_tail(&waiter.list, &sem->wait_list);
- /* we're now waiting on the lock, but no longer actively locking */
- count = atomic_long_add_return(adjustment, &sem->count);
+ /* we're now waiting on the lock */
+ if (adjustment)
+ count = atomic_long_add_return(adjustment, &sem->count);
+ else
+ count = atomic_long_read(&sem->count);
/*
* If there are no active locks, wake the front queued process(es).
@@ -440,8 +458,7 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
* wake our own waiter to join the existing active readers !
*/
if (count == RWSEM_WAITING_BIAS ||
- (count > RWSEM_WAITING_BIAS &&
- adjustment != -RWSEM_ACTIVE_READ_BIAS))
+ (count > RWSEM_WAITING_BIAS && first_in_queue))
__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
raw_spin_unlock_irq(&sem->wait_lock);
--
1.8.3.1
* [PATCH v5 3/9] locking/rwsem: Move common rwsem macros to asm-generic/rwsem_types.h
2017-06-01 17:38 [PATCH v5 0/9] locking/rwsem: Enable reader optimistic spinning Waiman Long
2017-06-01 17:38 ` [PATCH v5 1/9] locking/rwsem: relocate rwsem_down_read_failed() Waiman Long
2017-06-01 17:39 ` [PATCH v5 2/9] locking/rwsem: Stop active read lock ASAP Waiman Long
@ 2017-06-01 17:39 ` Waiman Long
2017-06-01 17:39 ` [PATCH v5 4/9] locking/rwsem: Change RWSEM_WAITING_BIAS for better disambiguation Waiman Long
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Waiman Long @ 2017-06-01 17:39 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long
Almost all the macro definitions in the various architecture specific
rwsem.h header files are essentially the same. This patch moves all
of them into a common header asm-generic/rwsem_types.h to eliminate
the duplication.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
---
arch/alpha/include/asm/rwsem.h | 8 +-------
arch/ia64/include/asm/rwsem.h | 7 ++-----
arch/s390/include/asm/rwsem.h | 7 +------
arch/x86/include/asm/rwsem.h | 19 +------------------
include/asm-generic/rwsem.h | 16 +---------------
include/asm-generic/rwsem_types.h | 26 ++++++++++++++++++++++++++
6 files changed, 32 insertions(+), 51 deletions(-)
create mode 100644 include/asm-generic/rwsem_types.h
diff --git a/arch/alpha/include/asm/rwsem.h b/arch/alpha/include/asm/rwsem.h
index 77873d0..f99e39a 100644
--- a/arch/alpha/include/asm/rwsem.h
+++ b/arch/alpha/include/asm/rwsem.h
@@ -13,13 +13,7 @@
#ifdef __KERNEL__
#include <linux/compiler.h>
-
-#define RWSEM_UNLOCKED_VALUE 0x0000000000000000L
-#define RWSEM_ACTIVE_BIAS 0x0000000000000001L
-#define RWSEM_ACTIVE_MASK 0x00000000ffffffffL
-#define RWSEM_WAITING_BIAS (-0x0000000100000000L)
-#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
+#include <asm-generic/rwsem_types.h>
static inline void __down_read(struct rw_semaphore *sem)
{
diff --git a/arch/ia64/include/asm/rwsem.h b/arch/ia64/include/asm/rwsem.h
index 8fa98dd..21a9066 100644
--- a/arch/ia64/include/asm/rwsem.h
+++ b/arch/ia64/include/asm/rwsem.h
@@ -26,13 +26,10 @@
#endif
#include <asm/intrinsics.h>
+#include <asm-generic/rwsem_types.h>
+#undef RWSEM_UNLOCKED_VALUE
#define RWSEM_UNLOCKED_VALUE __IA64_UL_CONST(0x0000000000000000)
-#define RWSEM_ACTIVE_BIAS (1L)
-#define RWSEM_ACTIVE_MASK (0xffffffffL)
-#define RWSEM_WAITING_BIAS (-0x100000000L)
-#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
/*
* lock for reading
diff --git a/arch/s390/include/asm/rwsem.h b/arch/s390/include/asm/rwsem.h
index 597e7e9..13dedc81 100644
--- a/arch/s390/include/asm/rwsem.h
+++ b/arch/s390/include/asm/rwsem.h
@@ -39,12 +39,7 @@
#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
#endif
-#define RWSEM_UNLOCKED_VALUE 0x0000000000000000L
-#define RWSEM_ACTIVE_BIAS 0x0000000000000001L
-#define RWSEM_ACTIVE_MASK 0x00000000ffffffffL
-#define RWSEM_WAITING_BIAS (-0x0000000100000000L)
-#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
+#include <asm-generic/rwsem_types.h>
/*
* lock for reading
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index a34e0d4..b55affb 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -38,24 +38,7 @@
#ifdef __KERNEL__
#include <asm/asm.h>
-
-/*
- * The bias values and the counter type limits the number of
- * potential readers/writers to 32767 for 32 bits and 2147483647
- * for 64 bits.
- */
-
-#ifdef CONFIG_X86_64
-# define RWSEM_ACTIVE_MASK 0xffffffffL
-#else
-# define RWSEM_ACTIVE_MASK 0x0000ffffL
-#endif
-
-#define RWSEM_UNLOCKED_VALUE 0x00000000L
-#define RWSEM_ACTIVE_BIAS 0x00000001L
-#define RWSEM_WAITING_BIAS (-RWSEM_ACTIVE_MASK-1)
-#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
+#include <asm-generic/rwsem_types.h>
/*
* lock for reading
diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index 6c6a214..9d31d68 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -12,21 +12,7 @@
* Adapted largely from include/asm-i386/rwsem.h
* by Paul Mackerras <paulus@samba.org>.
*/
-
-/*
- * the semaphore definition
- */
-#ifdef CONFIG_64BIT
-# define RWSEM_ACTIVE_MASK 0xffffffffL
-#else
-# define RWSEM_ACTIVE_MASK 0x0000ffffL
-#endif
-
-#define RWSEM_UNLOCKED_VALUE 0x00000000L
-#define RWSEM_ACTIVE_BIAS 0x00000001L
-#define RWSEM_WAITING_BIAS (-RWSEM_ACTIVE_MASK-1)
-#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
+#include <asm-generic/rwsem_types.h>
/*
* lock for reading
diff --git a/include/asm-generic/rwsem_types.h b/include/asm-generic/rwsem_types.h
new file mode 100644
index 0000000..093ef6a
--- /dev/null
+++ b/include/asm-generic/rwsem_types.h
@@ -0,0 +1,26 @@
+#ifndef _ASM_GENERIC_RWSEM_TYPES_H
+#define _ASM_GENERIC_RWSEM_TYPES_H
+
+#ifdef __KERNEL__
+
+/*
+ * the semaphore definition
+ *
+ * The bias values and the counter type limits the number of
+ * potential readers/writers to 32767 for 32 bits and 2147483647
+ * for 64 bits.
+ */
+#ifdef CONFIG_64BIT
+# define RWSEM_ACTIVE_MASK 0xffffffffL
+#else
+# define RWSEM_ACTIVE_MASK 0x0000ffffL
+#endif
+
+#define RWSEM_UNLOCKED_VALUE 0x00000000L
+#define RWSEM_ACTIVE_BIAS 0x00000001L
+#define RWSEM_WAITING_BIAS (-RWSEM_ACTIVE_MASK-1)
+#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
+#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
+
+#endif /* __KERNEL__ */
+#endif /* _ASM_GENERIC_RWSEM_TYPES_H */
--
1.8.3.1
* [PATCH v5 4/9] locking/rwsem: Change RWSEM_WAITING_BIAS for better disambiguation
2017-06-01 17:38 [PATCH v5 0/9] locking/rwsem: Enable reader optimistic spinning Waiman Long
` (2 preceding siblings ...)
2017-06-01 17:39 ` [PATCH v5 3/9] locking/rwsem: Move common rwsem macros to asm-generic/rwsem_types.h Waiman Long
@ 2017-06-01 17:39 ` Waiman Long
2017-06-01 17:39 ` [PATCH v5 5/9] locking/rwsem: Enable readers spinning on writer Waiman Long
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Waiman Long @ 2017-06-01 17:39 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long
When the count value (signed long) is in between 0 and
RWSEM_WAITING_BIAS, there are 2 possibilities. Either a writer is
present and there is no waiter or there are waiters and readers. There
is no easy way to know which is true unless the wait_lock is taken.
This patch changes the RWSEM_WAITING_BIAS from 0xffff0000 (32-bit) or
0xffffffff00000000 (64-bit) to 0xc0000000 (32-bit) or 0xc000000000000000
(64-bit). By doing so, we will be able to determine if writers
are present by looking at the count value alone without taking the
wait_lock.
This patch has the effect of halving the maximum number of writers
that can attempt to take the write lock simultaneously. However,
even the reduced maximum of about 16k (32-bit) or 1G (64-bit) should
be more than enough for the foreseeable future.
With that change, the following identity no longer holds:
RWSEM_ACTIVE_WRITE_BIAS = RWSEM_WAITING_BIAS + RWSEM_ACTIVE_READ_BIAS
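For illustration, writer presence can then be inferred from the count
alone with a helper along these lines (a helper of this form is added
later in the series; it is shown here only to make the disambiguation
concrete):

static inline bool count_has_writer(long count)
{
	/*
	 * A writer is present when writers and waiters are mixed in
	 * (count < RWSEM_WAITING_BIAS), or when a writer holds or is
	 * attempting the lock with no waiters queued
	 * (WAITING_BIAS - ACTIVE_WRITE_BIAS < count < 0).
	 */
	return (count < RWSEM_WAITING_BIAS) || ((count < 0) &&
	       (count > RWSEM_WAITING_BIAS - RWSEM_ACTIVE_WRITE_BIAS));
}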
Signed-off-by: Waiman Long <longman@redhat.com>
---
arch/alpha/include/asm/rwsem.h | 3 ++-
arch/ia64/include/asm/rwsem.h | 2 +-
arch/s390/include/asm/rwsem.h | 2 +-
arch/x86/include/asm/rwsem.h | 3 ++-
include/asm-generic/rwsem.h | 3 ++-
include/asm-generic/rwsem_types.h | 10 ++++++----
kernel/locking/rwsem-xadd.c | 32 ++++++++++++++++++++++++--------
7 files changed, 38 insertions(+), 17 deletions(-)
diff --git a/arch/alpha/include/asm/rwsem.h b/arch/alpha/include/asm/rwsem.h
index f99e39a..dc236a5 100644
--- a/arch/alpha/include/asm/rwsem.h
+++ b/arch/alpha/include/asm/rwsem.h
@@ -179,7 +179,8 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
"2: br 1b\n"
".previous"
:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
- :"Ir" (-RWSEM_WAITING_BIAS), "m" (sem->count) : "memory");
+ :"Ir" (-RWSEM_ACTIVE_WRITE_BIAS + RWSEM_ACTIVE_READ_BIAS),
+ "m" (sem->count) : "memory");
#endif
if (unlikely(oldcount < 0))
rwsem_downgrade_wake(sem);
diff --git a/arch/ia64/include/asm/rwsem.h b/arch/ia64/include/asm/rwsem.h
index 21a9066..ecea341 100644
--- a/arch/ia64/include/asm/rwsem.h
+++ b/arch/ia64/include/asm/rwsem.h
@@ -141,7 +141,7 @@
do {
old = atomic_long_read(&sem->count);
- new = old - RWSEM_WAITING_BIAS;
+ new = old - RWSEM_ACTIVE_WRITE_BIAS + RWSEM_ACTIVE_READ_BIAS;
} while (atomic_long_cmpxchg_release(&sem->count, old, new) != old);
if (old < 0)
diff --git a/arch/s390/include/asm/rwsem.h b/arch/s390/include/asm/rwsem.h
index 13dedc81..e675a64 100644
--- a/arch/s390/include/asm/rwsem.h
+++ b/arch/s390/include/asm/rwsem.h
@@ -188,7 +188,7 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
{
signed long old, new, tmp;
- tmp = -RWSEM_WAITING_BIAS;
+ tmp = -RWSEM_ACTIVE_WRITE_BIAS + RWSEM_ACTIVE_READ_BIAS;
asm volatile(
" lg %0,%2\n"
"0: lgr %1,%0\n"
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index b55affb..479c6d5 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -195,7 +195,8 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
"1:\n\t"
"# ending __downgrade_write\n"
: "+m" (sem->count)
- : "a" (sem), "er" (-RWSEM_WAITING_BIAS)
+ : "a" (sem), "er" (-RWSEM_ACTIVE_WRITE_BIAS +
+ RWSEM_ACTIVE_READ_BIAS)
: "memory", "cc");
}
diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index 9d31d68..2d84cf4 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -106,7 +106,8 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
* read-locked region is ok to be re-ordered into the
* write side. As such, rely on RELEASE semantics.
*/
- tmp = atomic_long_add_return_release(-RWSEM_WAITING_BIAS, &sem->count);
+ tmp = atomic_long_add_return_release(-RWSEM_ACTIVE_WRITE_BIAS +
+ RWSEM_ACTIVE_READ_BIAS, &sem->count);
if (tmp < 0)
rwsem_downgrade_wake(sem);
}
diff --git a/include/asm-generic/rwsem_types.h b/include/asm-generic/rwsem_types.h
index 093ef6a..6d55d25 100644
--- a/include/asm-generic/rwsem_types.h
+++ b/include/asm-generic/rwsem_types.h
@@ -7,20 +7,22 @@
* the semaphore definition
*
* The bias values and the counter type limits the number of
- * potential readers/writers to 32767 for 32 bits and 2147483647
- * for 64 bits.
+ * potential writers to 16383 for 32 bits and 1073741823 for 64 bits.
+ * The combined readers and writers can go up to 65534 for 32-bits and
+ * 4294967294 for 64-bits.
*/
#ifdef CONFIG_64BIT
# define RWSEM_ACTIVE_MASK 0xffffffffL
+# define RWSEM_WAITING_BIAS (-(1L << 62))
#else
# define RWSEM_ACTIVE_MASK 0x0000ffffL
+# define RWSEM_WAITING_BIAS (-(1L << 30))
#endif
#define RWSEM_UNLOCKED_VALUE 0x00000000L
#define RWSEM_ACTIVE_BIAS 0x00000001L
-#define RWSEM_WAITING_BIAS (-RWSEM_ACTIVE_MASK-1)
#define RWSEM_ACTIVE_READ_BIAS RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
+#define RWSEM_ACTIVE_WRITE_BIAS (-RWSEM_ACTIVE_MASK)
#endif /* __KERNEL__ */
#endif /* _ASM_GENERIC_RWSEM_TYPES_H */
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index e6c2bd5..4fb6cce 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -31,28 +31,30 @@
* 0x00000000 rwsem is unlocked, and no one is waiting for the lock or
* attempting to read lock or write lock.
*
- * 0xffff000X (1) X readers active or attempting lock, with waiters for lock
+ * 0xc000000X (1) X readers active or attempting lock, with waiters for lock
* X = #active readers + # readers attempting lock
* (X*ACTIVE_BIAS + WAITING_BIAS)
- * (2) 1 writer attempting lock, no waiters for lock
+ *
+ * 0xffff000X (1) 1 writer attempting lock, no waiters for lock
* X-1 = #active readers + #readers attempting lock
* ((X-1)*ACTIVE_BIAS + ACTIVE_WRITE_BIAS)
- * (3) 1 writer active, no waiters for lock
+ * (2) 1 writer active, no waiters for lock
* X-1 = #active readers + #readers attempting lock
* ((X-1)*ACTIVE_BIAS + ACTIVE_WRITE_BIAS)
*
- * 0xffff0001 (1) 1 reader active or attempting lock, waiters for lock
+ * 0xc0000001 (1) 1 reader active or attempting lock, waiters for lock
* (WAITING_BIAS + ACTIVE_BIAS)
- * (2) 1 writer active or attempting lock, no waiters for lock
+ *
+ * 0xffff0001 (1) 1 writer active or attempting lock, no waiters for lock
* (ACTIVE_WRITE_BIAS)
*
- * 0xffff0000 (1) There are writers or readers queued but none active
+ * 0xc0000000 (1) There are writers or readers queued but none active
* or in the process of attempting lock.
* (WAITING_BIAS)
* Note: writer can attempt to steal lock for this count by adding
* ACTIVE_WRITE_BIAS in cmpxchg and checking the old count
*
- * 0xfffe0001 (1) 1 writer active, or attempting lock. Waiters on queue.
+ * 0xbfff0001 (1) 1 writer active, or attempting lock. Waiters on queue.
* (ACTIVE_WRITE_BIAS + WAITING_BIAS)
*
* Note: Readers attempt to lock by adding ACTIVE_BIAS in down_read and checking
@@ -64,9 +66,23 @@
* checking the count becomes ACTIVE_WRITE_BIAS for successful lock
* acquisition (i.e. nobody else has lock or attempts lock). If
* unsuccessful, in rwsem_down_write_failed, we'll check to see if there
- * are only waiters but none active (5th case above), and attempt to
+ * are only waiters but none active (7th case above), and attempt to
* steal the lock.
*
+ * We can infer the reader/writer/waiter state of the lock by looking
+ * at the signed count value :
+ * (1) count > 0
+ * Only readers are present.
+ * (2) WAITING_BIAS - ACTIVE_WRITE_BIAS < count < 0
+ * Have writers, maybe readers, but no waiter
+ * (3) WAITING_BIAS < count <= WAITING_BIAS - ACTIVE_WRITE_BIAS
+ * Have readers and waiters, but no writer
+ * (4) count < WAITING_BIAS
+ * Have writers and waiters, maybe readers
+ *
+ * IOW, writers are present when
+ * (1) count < WAITING_BIAS, or
+ * (2) WAITING_BIAS - ACTIVE_WRITE_BIAS < count < 0
*/
/*
--
1.8.3.1
* [PATCH v5 5/9] locking/rwsem: Enable readers spinning on writer
2017-06-01 17:38 [PATCH v5 0/9] locking/rwsem: Enable reader optimistic spinning Waiman Long
` (3 preceding siblings ...)
2017-06-01 17:39 ` [PATCH v5 4/9] locking/rwsem: Change RWSEM_WAITING_BIAS for better disambiguation Waiman Long
@ 2017-06-01 17:39 ` Waiman Long
2017-06-01 17:39 ` [PATCH v5 6/9] locking/rwsem: Use bit in owner to stop spinning Waiman Long
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Waiman Long @ 2017-06-01 17:39 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long
This patch enables readers to optimistically spin on a rwsem when it
is owned by a writer instead of going to sleep directly. The key to
making this possible is the change made to RWSEM_WAITING_BIAS that
enables us to check the status of the rwsem for read lock stealing
without taking the wait_lock.
The rwsem_can_spin_on_owner() function is extracted out
of rwsem_optimistic_spin() and is called directly by
rwsem_down_read_failed() and rwsem_down_write_failed().
On a 2-socket 36-core 72-thread x86-64 E5-2699 v3 system, a rwsem
microbenchmark was run with 36 locking threads (one/core) doing 250k
reader and writer lock/unlock operations each. The resulting locking
rates (avg of 3 runs) on a 4.12 based kernel were 520.1 Mop/s and
1760.2 Mop/s without and with the patch respectively. That was an
increase of about 238%.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/locking/rwsem-xadd.c | 67 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 54 insertions(+), 13 deletions(-)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 4fb6cce..f82ce29 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -85,6 +85,12 @@
* (2) WAITING_BIAS - ACTIVE_WRITE_BIAS < count < 0
*/
+static inline bool count_has_writer(long count)
+{
+ return (count < RWSEM_WAITING_BIAS) || ((count < 0) &&
+ (count > RWSEM_WAITING_BIAS - RWSEM_ACTIVE_WRITE_BIAS));
+}
+
/*
* Initialize an rwsem:
*/
@@ -287,6 +293,25 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
}
}
+/*
+ * Try to acquire read lock before the reader is put on wait queue
+ */
+static inline bool rwsem_try_read_lock_unqueued(struct rw_semaphore *sem)
+{
+ long count = atomic_long_read(&sem->count);
+
+ if (count_has_writer(count))
+ return false;
+ count = atomic_long_add_return_acquire(RWSEM_ACTIVE_READ_BIAS,
+ &sem->count);
+ if (!count_has_writer(count))
+ return true;
+
+ /* Back out the change */
+ atomic_long_add(-RWSEM_ACTIVE_READ_BIAS, &sem->count);
+ return false;
+}
+
static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
{
struct task_struct *owner;
@@ -356,16 +381,14 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
return !rwsem_owner_is_reader(READ_ONCE(sem->owner));
}
-static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
+static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
+ enum rwsem_waiter_type type)
{
bool taken = false;
preempt_disable();
/* sem->wait_lock should not be held when doing optimistic spinning */
- if (!rwsem_can_spin_on_owner(sem))
- goto done;
-
if (!osq_lock(&sem->osq))
goto done;
@@ -380,10 +403,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
/*
* Try to acquire the lock
*/
- if (rwsem_try_write_lock_unqueued(sem)) {
- taken = true;
+ taken = (type == RWSEM_WAITING_FOR_WRITE)
+ ? rwsem_try_write_lock_unqueued(sem)
+ : rwsem_try_read_lock_unqueued(sem);
+ if (taken)
break;
- }
/*
* When there's no owner, we might have preempted between the
@@ -417,7 +441,13 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
}
#else
-static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
+static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
+{
+ return false;
+}
+
+static inline bool rwsem_optimistic_spin(struct rw_semaphore *sem,
+ enum rwsem_waiter_type type)
{
return false;
}
@@ -434,7 +464,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
__visible
struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
{
- bool first_in_queue = false;
+ bool first_in_queue = false, can_spin;
long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
struct rwsem_waiter waiter;
DEFINE_WAKE_Q(wake_q);
@@ -444,14 +474,24 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
/*
* Undo read bias from down_read operation to stop active locking if:
- * 1) Optimistic spinners are present; or
- * 2) the wait_lock isn't free.
+ * 1) Optimistic spinners are present;
+ * 2) the wait_lock isn't free; or
+ * 3) optimistic spinning is allowed.
* Doing that after taking the wait_lock may otherwise block writer
* lock stealing for too long impacting performance.
*/
- if (rwsem_has_spinner(sem) || raw_spin_is_locked(&sem->wait_lock)) {
+ can_spin = rwsem_can_spin_on_owner(sem);
+ if (can_spin || rwsem_has_spinner(sem) ||
+ raw_spin_is_locked(&sem->wait_lock)) {
atomic_long_add(-RWSEM_ACTIVE_READ_BIAS, &sem->count);
adjustment = 0;
+
+ /*
+ * Do optimistic spinning and steal lock if possible.
+ */
+ if (can_spin &&
+ rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_READ))
+ return sem;
}
raw_spin_lock_irq(&sem->wait_lock);
@@ -509,7 +549,8 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
count = atomic_long_sub_return(RWSEM_ACTIVE_WRITE_BIAS, &sem->count);
/* do optimistic spinning and steal lock if possible */
- if (rwsem_optimistic_spin(sem))
+ if (rwsem_can_spin_on_owner(sem) &&
+ rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_WRITE))
return sem;
/*
--
1.8.3.1
* [PATCH v5 6/9] locking/rwsem: Use bit in owner to stop spinning
2017-06-01 17:38 [PATCH v5 0/9] locking/rwsem: Enable reader optimistic spinning Waiman Long
` (4 preceding siblings ...)
2017-06-01 17:39 ` [PATCH v5 5/9] locking/rwsem: Enable readers spinning on writer Waiman Long
@ 2017-06-01 17:39 ` Waiman Long
2017-06-01 17:39 ` [PATCH v5 7/9] locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value Waiman Long
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Waiman Long @ 2017-06-01 17:39 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long
To prepare for reader optimistic spinning, bit 1 of the owner field
in the rw_semaphore structure is now used for stopping optimistic
spinning on a reader-owned rwsem to reduce the possibility of writer
livelocking due to a constant incoming stream of readers. That bit
can be set by either sleeping or spinning writers.
This patch provides the helper functions to facilitate the use of
that bit.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/locking/rwsem.h | 66 ++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 56 insertions(+), 10 deletions(-)
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index a699f40..3f01888 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -1,19 +1,27 @@
/*
- * The owner field of the rw_semaphore structure will be set to
- * RWSEM_READ_OWNED when a reader grabs the lock. A writer will clear
- * the owner field when it unlocks. A reader, on the other hand, will
- * not touch the owner field when it unlocks.
+ * The lower 2 bits of the owner field in the rw_semaphore structure are
+ * used for the following special purposes on a reader-owned lock:
+ * 1) Bit 0 - Mark the semaphore as being owned by readers.
+ * 2) Bit 1 - The optimistic spinning disable bit set by a writer to disable
+ * spinning on a reader-owned lock after failing to acquire the
+ * lock for a certain period of time. It will be reset only when a
+ * new writer acquires the lock.
+ *
+ * A writer will clear the owner field when it unlocks. A reader, on the other
+ * hand, will not touch the owner field when it unlocks.
*
* In essence, the owner field now has the following 3 states:
* 1) 0
* - lock is free or the owner hasn't set the field yet
- * 2) RWSEM_READER_OWNED
+ * 2) RWSEM_READER_OWNED [| RWSEM_SPIN_DISABLE_BIT]
* - lock is currently or previously owned by readers (lock is free
* or not set by owner yet)
* 3) Other non-zero value
* - a writer owns the lock
*/
-#define RWSEM_READER_OWNED ((struct task_struct *)1UL)
+#define RWSEM_READER_OWNED_BIT 1UL
+#define RWSEM_SPIN_DISABLE_BIT 2UL
+#define RWSEM_READER_OWNED ((struct task_struct *)RWSEM_READER_OWNED_BIT)
#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
/*
@@ -33,6 +41,11 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
WRITE_ONCE(sem->owner, NULL);
}
+static inline bool rwsem_owner_is_reader(struct task_struct *owner)
+{
+ return (unsigned long)owner & RWSEM_READER_OWNED_BIT;
+}
+
static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
{
/*
@@ -40,19 +53,48 @@ static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
* do a write to the rwsem cacheline when it is really necessary
* to minimize cacheline contention.
*/
- if (sem->owner != RWSEM_READER_OWNED)
+ if (!rwsem_owner_is_reader(READ_ONCE(sem->owner)))
WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
}
static inline bool rwsem_owner_is_writer(struct task_struct *owner)
{
- return owner && owner != RWSEM_READER_OWNED;
+ return ((unsigned long)owner & ~RWSEM_SPIN_DISABLE_BIT) &&
+ !rwsem_owner_is_reader(owner);
}
-static inline bool rwsem_owner_is_reader(struct task_struct *owner)
+static inline bool rwsem_owner_is_spin_disabled(struct task_struct *owner)
+{
+ return (unsigned long)owner & RWSEM_SPIN_DISABLE_BIT;
+}
+
+/*
+ * Try to set an optimistic spinning disable bit while it is reader-owned.
+ */
+static inline void rwsem_set_spin_disable(struct rw_semaphore *sem)
+{
+ struct task_struct *new;
+
+ if (READ_ONCE(sem->owner) != RWSEM_READER_OWNED)
+ return;
+ new = (struct task_struct *)(RWSEM_READER_OWNED_BIT|
+ RWSEM_SPIN_DISABLE_BIT);
+
+ /*
+ * Failure in cmpxchg() will be ignored, and the caller is expected
+ * to retry later.
+ */
+ (void)cmpxchg(&sem->owner, RWSEM_READER_OWNED, new);
+}
+
+/*
+ * Is reader-owned rwsem optimistic spinning disabled?
+ */
+static inline bool rwsem_is_spin_disabled(struct rw_semaphore *sem)
{
- return owner == RWSEM_READER_OWNED;
+ return rwsem_owner_is_spin_disabled(READ_ONCE(sem->owner));
}
+
#else
static inline void rwsem_set_owner(struct rw_semaphore *sem)
{
@@ -65,4 +107,8 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
{
}
+
+static inline void rwsem_set_spin_disable(struct rw_semaphore *sem)
+{
+}
#endif
--
1.8.3.1
* [PATCH v5 7/9] locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value
2017-06-01 17:38 [PATCH v5 0/9] locking/rwsem: Enable reader optimistic spinning Waiman Long
` (5 preceding siblings ...)
2017-06-01 17:39 ` [PATCH v5 6/9] locking/rwsem: Use bit in owner to stop spinning Waiman Long
@ 2017-06-01 17:39 ` Waiman Long
2017-06-01 17:39 ` [PATCH v5 8/9] locking/rwsem: Enable count-based spinning on reader Waiman Long
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Waiman Long @ 2017-06-01 17:39 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long
This patch modifies rwsem_spin_on_owner() to return a tri-state value
to better reflect the state of the lock holder, which enables us to make a
better decision of what to do next.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/locking/rwsem-xadd.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index f82ce29..7d030a2 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -341,9 +341,13 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
}
/*
- * Return true only if we can still spin on the owner field of the rwsem.
+ * Return the following three values depending on the lock owner state.
+ * 1 when owner has changed and no reader is detected yet.
+ * 0 when owner has changed and/or owner is a reader.
+ * -1 when optimistic spinning has to stop because either the owner stops
+ * running or its timeslice has been used up.
*/
-static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
+static noinline int rwsem_spin_on_owner(struct rw_semaphore *sem)
{
struct task_struct *owner = READ_ONCE(sem->owner);
@@ -367,7 +371,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
if (!owner->on_cpu || need_resched() ||
vcpu_is_preempted(task_cpu(owner))) {
rcu_read_unlock();
- return false;
+ return -1;
}
cpu_relax();
@@ -378,7 +382,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
* If there is a new owner or the owner is not set, we continue
* spinning.
*/
- return !rwsem_owner_is_reader(READ_ONCE(sem->owner));
+ return rwsem_owner_is_reader(READ_ONCE(sem->owner)) ? 0 : 1;
}
static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
@@ -399,7 +403,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
* 2) readers own the lock as we can't determine if they are
* actively running or not.
*/
- while (rwsem_spin_on_owner(sem)) {
+ while (rwsem_spin_on_owner(sem) > 0) {
/*
* Try to acquire the lock
*/
--
1.8.3.1
* [PATCH v5 8/9] locking/rwsem: Enable count-based spinning on reader
2017-06-01 17:38 [PATCH v5 0/9] locking/rwsem: Enable reader optimistic spinning Waiman Long
` (6 preceding siblings ...)
2017-06-01 17:39 ` [PATCH v5 7/9] locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value Waiman Long
@ 2017-06-01 17:39 ` Waiman Long
2017-06-01 17:39 ` [PATCH v5 9/9] locking/rwsem: Enable reader lock stealing Waiman Long
2017-06-08 18:49 ` [PATCH v5 0/9] locking/rwsem: Enable reader optimistic spinning Waiman Long
9 siblings, 0 replies; 11+ messages in thread
From: Waiman Long @ 2017-06-01 17:39 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long
When the rwsem is owned by readers, writers stop optimistic spinning
simply because there is no easy way to figure out if all the readers
are actively running or not. However, there are scenarios where
the readers are unlikely to sleep and optimistic spinning can help
performance.
This patch provides a simple mechanism for spinning on a reader-owned
rwsem. It is a loop-count-threshold based spinning scheme where the count
will get reset whenever the rwsem count value changes, indicating
that the rwsem is still active. There is another maximum count value
that limits the maximum number of spinnings that can happen.
When the loop or max counts reach 0, a bit will be set in the owner
field to indicate that no more optimistic spinning should be done on
this rwsem until it becomes writer owned.
The spinning threshold and maximum values can be overridden by an
architecture-specific rwsem.h header file, if necessary. The current
default threshold value is 1024 iterations.
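For example, an architecture wanting different limits could define the
override macros in its asm/rwsem.h; the values below are hypothetical
and only illustrate the mechanism:

/* arch/<arch>/include/asm/rwsem.h (hypothetical override) */
#define ARCH_RWSEM_RSPIN_THRESHOLD	(1 << 12)	/* per-stall spin budget */
#define ARCH_RWSEM_RSPIN_MAX		(1 << 15)	/* total spins per attempt */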
On a 2-socket 36-core 72-thread x86-64 E5-2699 v3 system, a rwsem
microbenchmark was run with 36 locking threads (one/core) doing 250k
reader and writer lock/unlock operations each. The resulting locking
rates (avg of 3 runs) on a 4.12 based kernel were 1760.2 Mop/s and
5439.0 Mop/s without and with the patch respectively. That was an
increase of about 209%.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/locking/rwsem-xadd.c | 72 ++++++++++++++++++++++++++++++++++++++-------
1 file changed, 61 insertions(+), 11 deletions(-)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 7d030a2..a571bec 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -133,6 +133,22 @@ enum rwsem_wake_type {
};
/*
+ * Reader-owned rwsem spinning threshold and maximum value
+ *
+ * This threshold and maximum values can be overridden by architecture
+ * specific values. The loop count will be reset whenever the rwsem count
+ * value changes. The max value constrains the total number of reader-owned
+ * lock spinnings that can happen.
+ */
+#ifdef ARCH_RWSEM_RSPIN_THRESHOLD
+# define RWSEM_RSPIN_THRESHOLD ARCH_RWSEM_RSPIN_THRESHOLD
+# define RWSEM_RSPIN_MAX ARCH_RWSEM_RSPIN_MAX
+#else
+# define RWSEM_RSPIN_THRESHOLD (1 << 10)
+# define RWSEM_RSPIN_MAX (1 << 14)
+#endif
+
+/*
* handle the lock release when processes blocked on it that can now run
* - if we come here from up_xxxx(), then:
* - the 'active part' of count (&0x0000ffff) reached 0 (but may have changed)
@@ -324,9 +340,9 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
owner = READ_ONCE(sem->owner);
if (!rwsem_owner_is_writer(owner)) {
/*
- * Don't spin if the rwsem is readers owned.
+ * Don't spin if the rspin disable bit is set.
*/
- ret = !rwsem_owner_is_reader(owner);
+ ret = !rwsem_owner_is_spin_disabled(owner);
goto done;
}
@@ -389,6 +405,10 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
enum rwsem_waiter_type type)
{
bool taken = false;
+ int owner_state; /* Lock owner state */
+ int rspin_cnt = RWSEM_RSPIN_THRESHOLD;
+ int rspin_max = RWSEM_RSPIN_MAX;
+ long old_count = 0;
preempt_disable();
@@ -396,14 +416,16 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
if (!osq_lock(&sem->osq))
goto done;
+ if (rwsem_is_spin_disabled(sem))
+ rspin_cnt = 0;
+
/*
* Optimistically spin on the owner field and attempt to acquire the
* lock whenever the owner changes. Spinning will be stopped when:
* 1) the owning writer isn't running; or
- * 2) readers own the lock as we can't determine if they are
- * actively running or not.
+ * 2) readers own the lock and reader spinning count has reached 0.
*/
- while (rwsem_spin_on_owner(sem) > 0) {
+ while ((owner_state = rwsem_spin_on_owner(sem)) >= 0) {
/*
* Try to acquire the lock
*/
@@ -414,6 +436,33 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
break;
/*
+ * We only decrement the rspin_cnt when the lock is owned
+ * by readers (owner_state == 0), in which case
+ * rwsem_spin_on_owner() will essentially be a no-op
+ * and we will be spinning in this main loop. The spinning
+ * count will be reset whenever the rwsem count value
+ * changes.
+ */
+ if (!owner_state) {
+ long count;
+
+ if (!rspin_cnt || !rspin_max) {
+ if (!rwsem_is_spin_disabled(sem))
+ rwsem_set_spin_disable(sem);
+ break;
+ }
+
+ count = atomic_long_read(&sem->count);
+ if (count != old_count) {
+ old_count = count;
+ rspin_cnt = RWSEM_RSPIN_THRESHOLD;
+ } else {
+ rspin_cnt--;
+ }
+ rspin_max--;
+ }
+
+ /*
* When there's no owner, we might have preempted between the
* owner acquiring the lock and setting the owner field. If
* we're an RT task that will live-lock because we won't let
@@ -468,7 +517,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
__visible
struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
{
- bool first_in_queue = false, can_spin;
+ bool first_in_queue = false;
long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
struct rwsem_waiter waiter;
DEFINE_WAKE_Q(wake_q);
@@ -476,6 +525,9 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
waiter.task = current;
waiter.type = RWSEM_WAITING_FOR_READ;
+ if (!rwsem_can_spin_on_owner(sem))
+ goto enqueue;
+
/*
* Undo read bias from down_read operation to stop active locking if:
* 1) Optimistic spinners are present;
@@ -484,20 +536,18 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
* Doing that after taking the wait_lock may otherwise block writer
* lock stealing for too long impacting performance.
*/
- can_spin = rwsem_can_spin_on_owner(sem);
- if (can_spin || rwsem_has_spinner(sem) ||
- raw_spin_is_locked(&sem->wait_lock)) {
+ if (rwsem_has_spinner(sem) || raw_spin_is_locked(&sem->wait_lock)) {
atomic_long_add(-RWSEM_ACTIVE_READ_BIAS, &sem->count);
adjustment = 0;
/*
* Do optimistic spinning and steal lock if possible.
*/
- if (can_spin &&
- rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_READ))
+ if (rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_READ))
return sem;
}
+enqueue:
raw_spin_lock_irq(&sem->wait_lock);
if (list_empty(&sem->wait_list)) {
adjustment += RWSEM_WAITING_BIAS;
--
1.8.3.1
* [PATCH v5 9/9] locking/rwsem: Enable reader lock stealing
2017-06-01 17:38 [PATCH v5 0/9] locking/rwsem: Enable reader optimistic spinning Waiman Long
` (7 preceding siblings ...)
2017-06-01 17:39 ` [PATCH v5 8/9] locking/rwsem: Enable count-based spinning on reader Waiman Long
@ 2017-06-01 17:39 ` Waiman Long
2017-06-08 18:49 ` [PATCH v5 0/9] locking/rwsem: Enable reader optimistic spinning Waiman Long
9 siblings, 0 replies; 11+ messages in thread
From: Waiman Long @ 2017-06-01 17:39 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long
The rwsem has supported writer lock stealing for a long time. Reader
lock stealing isn't allowed as it may lead to writer lock starvation.
As a result, writers are preferred over readers. However, preferring
readers generally leads to better overall performance.
This patch now enables reader lock stealing on a rwsem as long as
the lock is reader-owned and optimistic spinning hasn't been disabled
because of a long writer wait. This will improve overall performance
without running the risk of writer lock starvation.
Signed-off-by: Waiman Long <longman@redhat.com>
---
kernel/locking/rwsem-xadd.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index a571bec..f5caba8 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -529,6 +529,14 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
goto enqueue;
/*
+ * Steal the lock if no writer was present and the optimistic
+ * spinning disable bit isn't set.
+ */
+ count = atomic_long_read(&sem->count);
+ if (!count_has_writer(count))
+ return sem;
+
+ /*
* Undo read bias from down_read operation to stop active locking if:
* 1) Optimistic spinners are present;
* 2) the wait_lock isn't free; or
--
1.8.3.1
* Re: [PATCH v5 0/9] locking/rwsem: Enable reader optimistic spinning
2017-06-01 17:38 [PATCH v5 0/9] locking/rwsem: Enable reader optimistic spinning Waiman Long
` (8 preceding siblings ...)
2017-06-01 17:39 ` [PATCH v5 9/9] locking/rwsem: Enable reader lock stealing Waiman Long
@ 2017-06-08 18:49 ` Waiman Long
9 siblings, 0 replies; 11+ messages in thread
From: Waiman Long @ 2017-06-08 18:49 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
linux-arch, Davidlohr Bueso, Dave Chinner
Hi,
Got the following tidbit about this patch's performance impact.
Cheers,
Longman
----------------------------------------------------
Greeting,
FYI, we noticed a 125.4% improvement of will-it-scale.per_thread_ops due to commit:
commit: a150752454e4aea37a44d7eb5baf5a538bcad6fc ("locking/rwsem: Enable readers spinning on writer")
url: https://github.com/0day-ci/linux/commits/Waiman-Long/locking-rwsem-Enable-reader-optimistic-spinning/20170602-071830
in testcase: will-it-scale
on test machine: 8 threads Ivy Bridge with 16G memory
with following parameters:
nr_task: 100%
mode: thread
test: malloc1
cpufreq_governor: performance
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/01org/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
testcase/path_params/tbox_group/run: will-it-scale/100%-thread-malloc1-performance/lkp-ivb-d01
f25a7e717bfb87ab a150752454e4aea37a44d7eb5b
---------------- --------------------------
%stddev change %stddev
\ | \
6092 ± 12% 125% 13734 will-it-scale.per_thread_ops
14641877 ± 12% 126% 33029197 will-it-scale.time.minor_page_faults
15.03 ± 13% 57% 23.66 ± 12% will-it-scale.time.user_time
40731914 ± 12% 46% 59414926 ± 5% will-it-scale.time.voluntary_context_switches
11954 ± 18% 28% 15275 ± 11% will-it-scale.time.maximum_resident_set_size
142 22% 174 will-it-scale.time.percent_of_cpu_this_job_got
414 21% 502 will-it-scale.time.system_time
539104 -78% 117329 ± 13% will-it-scale.time.involuntary_context_switches
31904937 ± 13% 55% 49519854 ± 5% interrupts.CAL:Function_call_interrupts
129303 ± 10% 48% 191426 ± 4% vmstat.system.in
297417 ± 11% 42% 421902 ± 4% vmstat.system.cs
25.73 26.28 turbostat.CorWatt
31.60 32.21 turbostat.PkgWatt
22.67 19% 27.03 turbostat.%Busy
837 20% 1006 turbostat.Avg_MHz
1271 ± 36% 6e+04 56891 ± 74% latency_stats.max.call_rwsem_down_read_failed.__do_page_fault.do_page_fault.page_fault
2249 ± 19% 5e+04 52972 ± 86% latency_stats.max.call_rwsem_down_write_failed_killable.vm_mmap_pgoff.SyS_mmap_pgoff.SyS_mmap.entry_SYSCALL_64_fastpath
2264 ± 19% 5e+04 52187 ± 88% latency_stats.max.call_rwsem_down_write_failed_killable.vm_munmap.SyS_munmap.entry_SYSCALL_64_fastpath
9934 ± 25% 5e+04 57497 ± 75% latency_stats.max.max
14956191 ± 12% 123% 33343207 perf-stat.page-faults
14956191 ± 12% 123% 33343206 perf-stat.minor-faults
2.266e+11 ± 4% 46% 3.318e+11 perf-stat.branch-instructions
3.231e+11 ± 3% 39% 4.485e+11 perf-stat.dTLB-loads
1.155e+12 ± 3% 38% 1.593e+12 perf-stat.instructions
0.02 ± 11% 103% 0.05 ± 6% perf-stat.dTLB-store-miss-rate%
86305241 ± 8% 74% 1.502e+08 ± 6% perf-stat.dTLB-store-misses
0.56 14% 0.64 perf-stat.ipc
2.057e+12 21% 2.481e+12 perf-stat.cpu-cycles
3.674e+11 ± 3% -15% 3.136e+11 perf-stat.dTLB-stores
0.76 ± 3% -32% 0.51 ± 4% perf-stat.branch-miss-rate%
1869 ± 5% 30% 2432 ± 8% perf-stat.instructions-per-iTLB-miss
6.014e+10 ± 8% -48% 3.146e+10 ± 5% perf-stat.cache-references
0.29 ± 6% -17% 0.24 ± 12% perf-stat.dTLB-load-miss-rate%
90408163 ± 11% 42% 1.283e+08 ± 4% perf-stat.context-switches
182383 ± 13% -55% 82982 ± 49% perf-stat.cpu-migrations
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Xiaolong