* [PATCH 1/7] locking/rwsem: don't resched at the end of optimistic spinning
2014-08-04 2:36 [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin Waiman Long
@ 2014-08-04 2:36 ` Waiman Long
2014-08-04 7:55 ` Peter Zijlstra
2014-08-04 2:36 ` [PATCH 3/7] locking/rwsem: check for active writer/spinner before wakeup Waiman Long
` (5 subsequent siblings)
6 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2014-08-04 2:36 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra
Cc: linux-kernel, linux-api, linux-doc, Davidlohr Bueso, Jason Low,
Scott J Norton, Waiman Long
For a fully preemptive kernel, a call to preempt_enable() could
potentially trigger a task rescheduling event. In the case of rwsem
optimistic spinning, the task has either gotten the lock or is going
to sleep soon, so there is no point in rescheduling here.
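For context, here is a condensed sketch of the spin path this one-line
change affects. The spinning body is elided, and the preempt_disable()
pairing at the top is assumed from the mainline function, which is not
fully shown in the hunk below:
static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
{
	bool taken = false;
	preempt_disable();
	/* ... spin while the lock owner is running, trying to take the lock ... */
	/*
	 * At this point the task has either taken the lock (taken == true)
	 * or is about to block in the slowpath, so the reschedule check
	 * done by a plain preempt_enable() is unnecessary.
	 */
	preempt_enable_no_resched();
	return taken;
}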
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
kernel/locking/rwsem-xadd.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index a2391ac..d058946 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -385,7 +385,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
}
osq_unlock(&sem->osq);
done:
- preempt_enable();
+ /*
+ * Don't need to do rescheduling here as we either got the lock or
+ * is going to sleep soon.
+ */
+ preempt_enable_no_resched();
return taken;
}
--
1.7.1
* Re: [PATCH 1/7] locking/rwsem: don't resched at the end of optimistic spinning
2014-08-04 2:36 ` [PATCH 1/7] locking/rwsem: don't resched at the end of optimistic spinning Waiman Long
@ 2014-08-04 7:55 ` Peter Zijlstra
[not found] ` <20140804075528.GI9918-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2014-08-04 7:55 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, linux-kernel, linux-api, linux-doc, Davidlohr Bueso,
Jason Low, Scott J Norton
On Sun, Aug 03, 2014 at 10:36:16PM -0400, Waiman Long wrote:
> For a fully preemptive kernel, a call to preempt_enable() could
> potentially trigger a task rescheduling event. In the case of rwsem
> optimistic spinning, the task has either gotten the lock or is going
> to sleep soon, so there is no point in rescheduling here.
Uh what? Why shouldn't we preempt if we've gotten the lock? What if a
FIFO task just woke up?
* [PATCH 3/7] locking/rwsem: check for active writer/spinner before wakeup
2014-08-04 2:36 [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin Waiman Long
2014-08-04 2:36 ` [PATCH 1/7] locking/rwsem: don't resched at the end of optimistic spinning Waiman Long
@ 2014-08-04 2:36 ` Waiman Long
[not found] ` <1407119782-41119-4-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
2014-08-04 2:36 ` [PATCH 4/7] locking/rwsem: threshold limited spinning for active readers Waiman Long
` (4 subsequent siblings)
6 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2014-08-04 2:36 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra
Cc: linux-kernel, linux-api, linux-doc, Davidlohr Bueso, Jason Low,
Scott J Norton, Waiman Long
On a highly contended rwsem, spinlock contention due to the slow
rwsem_wake() call can be a significant portion of the total CPU cycles
used. With writer lock stealing and writer optimistic spinning, there
is also a pretty good chance that the lock may have been stolen
before the waker wakes up the waiters. The woken tasks, if any,
will have to go back to sleep again.
This patch adds checking code at the beginning of the rwsem_wake()
and __rwsem_do_wake() functions to look for a spinner and an active
writer respectively. The presence of an active writer will abort the
wakeup operation. The presence of a spinner will still allow the
wakeup operation to proceed as long as the trylock succeeds. This
strikes a good balance between avoiding excessive spinlock contention,
especially when there are a lot of active readers, and avoiding too
many failed fastpath operations caused by tasks already waiting in
the queue.
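Condensed from the diff below, the two added checks look roughly like
this (the CONFIG_RWSEM_SPIN_ON_OWNER stubs and the rest of the wakeup
path are omitted):
/* In __rwsem_do_wake(): abort if a writer has already stolen the lock */
if (rwsem_has_active_writer(sem))	/* sem->owner set and != RWSEM_READ_OWNED */
	goto out;
/* In rwsem_wake(): with a spinner present, only proceed if the trylock succeeds */
if (rwsem_has_spinner(sem)) {
	if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
		return sem;	/* let the spinner take or retry the lock */
} else {
	raw_spin_lock_irqsave(&sem->wait_lock, flags);
}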
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
include/linux/osq_lock.h | 5 ++++
kernel/locking/rwsem-xadd.c | 57 ++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 61 insertions(+), 1 deletions(-)
diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
index 90230d5..79db546 100644
--- a/include/linux/osq_lock.h
+++ b/include/linux/osq_lock.h
@@ -24,4 +24,9 @@ static inline void osq_lock_init(struct optimistic_spin_queue *lock)
atomic_set(&lock->tail, OSQ_UNLOCKED_VAL);
}
+static inline bool osq_has_spinner(struct optimistic_spin_queue *lock)
+{
+ return atomic_read(&lock->tail) != OSQ_UNLOCKED_VAL;
+}
+
#endif
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index dce22b8..9f71a67 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -107,6 +107,37 @@ enum rwsem_wake_type {
RWSEM_WAKE_READ_OWNED /* Waker thread holds the read lock */
};
+#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
+/*
+ * return true if there is an active writer by checking the owner field which
+ * should be set (not RWSEM_READ_OWNED) if there is one.
+ */
+static inline bool rwsem_has_active_writer(struct rw_semaphore *sem)
+{
+ struct task_struct *owner = ACCESS_ONCE(sem->owner);
+
+ return owner && (owner != RWSEM_READ_OWNED);
+}
+
+/*
+ * Return true if the rwsem has active spinner
+ */
+static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
+{
+ return osq_has_spinner(&sem->osq);
+}
+#else /* CONFIG_RWSEM_SPIN_ON_OWNER */
+static inline bool rwsem_has_active_writer(struct rw_semaphore *sem)
+{
+ return false; /* Assume it has no active writer */
+}
+
+static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
+{
+ return false;
+}
+#endif /* CONFIG_RWSEM_SPIN_ON_OWNER */
+
/*
* handle the lock release when processes blocked on it that can now run
* - if we come here from up_xxxx(), then:
@@ -125,6 +156,15 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
struct list_head *next;
long oldcount, woken, loop, adjustment;
+ /*
+ * Abort the wakeup operation if there is an active writer as the
+ * lock was stolen. up_write() should have cleared the owner field
+ * before calling this function. If that field is now set, there must
+ * be an active writer present.
+ */
+ if (rwsem_has_active_writer(sem))
+ goto out;
+
waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
if (wake_type == RWSEM_WAKE_ANY)
@@ -479,7 +519,22 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
{
unsigned long flags;
- raw_spin_lock_irqsave(&sem->wait_lock, flags);
+ /*
+ * If a spinner is present, it is not necessary to do the wakeup.
+ * Try to do wakeup when the trylock succeeds to avoid potential
+ * spinlock contention which may introduce too much delay in the
+ * unlock operation.
+ *
+ * In case the spinning writer is just going to break out of the loop,
+ * it will still do a trylock in rwsem_down_write_failed() before
+ * sleeping.
+ */
+ if (rwsem_has_spinner(sem)) {
+ if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
+ return sem;
+ } else {
+ raw_spin_lock_irqsave(&sem->wait_lock, flags);
+ }
/* do nothing if list empty */
if (!list_empty(&sem->wait_list))
--
1.7.1
* [PATCH 4/7] locking/rwsem: threshold limited spinning for active readers
2014-08-04 2:36 [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin Waiman Long
2014-08-04 2:36 ` [PATCH 1/7] locking/rwsem: don't resched at the end of optimistic spinning Waiman Long
2014-08-04 2:36 ` [PATCH 3/7] locking/rwsem: check for active writer/spinner before wakeup Waiman Long
@ 2014-08-04 2:36 ` Waiman Long
[not found] ` <1407119782-41119-5-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
2014-08-04 2:36 ` [PATCH 5/7] locking/rwsem: move down rwsem_down_read_failed function Waiman Long
` (3 subsequent siblings)
6 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2014-08-04 2:36 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra
Cc: linux-kernel, linux-api, linux-doc, Davidlohr Bueso, Jason Low,
Scott J Norton, Waiman Long
Even though only writers can perform optimistic spinning, there
is still a chance that readers may take the lock before a spinning
writer can get it. In that case, the owner field will be NULL and the
spinning writer can keep spinning until its time quantum expires if
some of the lock-owning readers are not running.
This patch tries to handle this special case by:
1) setting the owner field to a special value RWSEM_READ_OWNED
   to indicate that the current or last owner is a reader.
2) setting a threshold on how many times (currently 100) spinning will
   be done with active readers before giving up, as there is no easy
   way to determine if all of them are currently running.
By doing so, it tries to strike a balance between giving up too early
and losing a potential performance gain, and wasting too many precious
CPU cycles when some lock-owning readers are not running.
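Condensed from the diff below, the spin loop inside
rwsem_optimistic_spin() then looks roughly like this (preemption and
OSQ handling are elided):
	struct task_struct *owner;
	bool taken = false;
	int read_spincnt = 0;
	while (true) {
		owner = ACCESS_ONCE(sem->owner);
		if (owner == RWSEM_READ_OWNED) {
			/* reader-owned: spin at most RWSEM_READ_SPIN_THRESHOLD times */
			if (++read_spincnt > RWSEM_READ_SPIN_THRESHOLD)
				break;
		} else if (owner && !rwsem_spin_on_owner(sem, owner)) {
			break;		/* the owning writer is no longer running */
		}
		if (rwsem_try_write_lock_unqueued(sem)) {
			taken = true;	/* lock taken before queueing */
			break;
		}
		/* give up when there is no identifiable owner and we should yield */
		if ((!owner || owner == RWSEM_READ_OWNED) &&
		    (need_resched() || rt_task(current)))
			break;
		/* ... cpu_relax() and retry ... */
	}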
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
include/linux/rwsem.h | 7 +++++++
kernel/locking/rwsem-xadd.c | 23 ++++++++++++++++++++---
kernel/locking/rwsem.c | 17 ++++++++++++++++-
3 files changed, 43 insertions(+), 4 deletions(-)
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 035d3c5..48e1e31 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -66,6 +66,13 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
#endif
#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
+/*
+ * The owner field is set to RWSEM_READ_OWNED if the last owner(s) are
+ * readers. It is not reset until a writer takes over and set it to its
+ * task structure pointer or NULL when it frees the lock. So a value
+ * of RWSEM_READ_OWNED doesn't mean it currently has active readers.
+ */
+#define RWSEM_READ_OWNED ((struct task_struct *)-1)
#define __RWSEM_OPT_INIT(lockname) , .osq = OSQ_LOCK_UNLOCKED, .owner = NULL
#else
#define __RWSEM_OPT_INIT(lockname)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 9f71a67..576d4cd 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -304,6 +304,11 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
/*
+ * Threshold for optimistic spinning on readers
+ */
+#define RWSEM_READ_SPIN_THRESHOLD 100
+
+/*
* Try to acquire write lock before the writer has been put on wait queue.
*/
static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
@@ -332,7 +337,7 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
rcu_read_lock();
owner = ACCESS_ONCE(sem->owner);
- if (owner)
+ if (owner && (owner != RWSEM_READ_OWNED))
on_cpu = owner->on_cpu;
rcu_read_unlock();
@@ -381,10 +386,17 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
return sem->owner == NULL;
}
+/*
+ * With active writer, spinning is done by checking if that writer is on
+ * CPU. With active readers, there is no easy way to determine if all of
+ * them are active. So it fall back to spin a certain number of them
+ * RWSEM_READ_SPIN_THRESHOLD before giving up.
+ */
static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
{
struct task_struct *owner;
bool taken = false;
+ int read_spincnt = 0;
preempt_disable();
@@ -397,8 +409,12 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
while (true) {
owner = ACCESS_ONCE(sem->owner);
- if (owner && !rwsem_spin_on_owner(sem, owner))
+ if (owner == RWSEM_READ_OWNED) {
+ if (++read_spincnt > RWSEM_READ_SPIN_THRESHOLD)
+ break;
+ } else if (owner && !rwsem_spin_on_owner(sem, owner)) {
break;
+ }
/* wait_lock will be acquired if write_lock is obtained */
if (rwsem_try_write_lock_unqueued(sem)) {
@@ -412,7 +428,8 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
* we're an RT task that will live-lock because we won't let
* the owner complete.
*/
- if (!owner && (need_resched() || rt_task(current)))
+ if ((!owner || (owner == RWSEM_READ_OWNED)) &&
+ (need_resched() || rt_task(current)))
break;
/*
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index e2d3bc7..9770439 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -18,6 +18,11 @@ static inline void rwsem_set_owner(struct rw_semaphore *sem)
sem->owner = current;
}
+static inline void rwsem_set_owner_read(struct rw_semaphore *sem)
+{
+ sem->owner = RWSEM_READ_OWNED;
+}
+
static inline void rwsem_clear_owner(struct rw_semaphore *sem)
{
sem->owner = NULL;
@@ -28,6 +33,10 @@ static inline void rwsem_set_owner(struct rw_semaphore *sem)
{
}
+static inline void rwsem_set_owner_read(struct rw_semaphore *sem)
+{
+}
+
static inline void rwsem_clear_owner(struct rw_semaphore *sem)
{
}
@@ -42,6 +51,7 @@ void __sched down_read(struct rw_semaphore *sem)
rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
+ rwsem_set_owner_read(sem);
}
EXPORT_SYMBOL(down_read);
@@ -53,8 +63,10 @@ int down_read_trylock(struct rw_semaphore *sem)
{
int ret = __down_read_trylock(sem);
- if (ret == 1)
+ if (ret == 1) {
rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_);
+ rwsem_set_owner_read(sem);
+ }
return ret;
}
@@ -127,6 +139,7 @@ void downgrade_write(struct rw_semaphore *sem)
*/
rwsem_clear_owner(sem);
__downgrade_write(sem);
+ rwsem_set_owner_read(sem);
}
EXPORT_SYMBOL(downgrade_write);
@@ -139,6 +152,7 @@ void down_read_nested(struct rw_semaphore *sem, int subclass)
rwsem_acquire_read(&sem->dep_map, subclass, 0, _RET_IP_);
LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
+ rwsem_set_owner_read(sem);
}
EXPORT_SYMBOL(down_read_nested);
@@ -159,6 +173,7 @@ void down_read_non_owner(struct rw_semaphore *sem)
might_sleep();
__down_read(sem);
+ rwsem_set_owner_read(sem);
}
EXPORT_SYMBOL(down_read_non_owner);
--
1.7.1
* [PATCH 5/7] locking/rwsem: move down rwsem_down_read_failed function
2014-08-04 2:36 [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin Waiman Long
` (2 preceding siblings ...)
2014-08-04 2:36 ` [PATCH 4/7] locking/rwsem: threshold limited spinning for active readers Waiman Long
@ 2014-08-04 2:36 ` Waiman Long
2014-08-04 2:36 ` [PATCH 6/7] locking/rwsem: enables optimistic spinning for readers Waiman Long
` (2 subsequent siblings)
6 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2014-08-04 2:36 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra
Cc: linux-kernel, linux-api, linux-doc, Davidlohr Bueso, Jason Low,
Scott J Norton, Waiman Long
Move the rwsem_down_read_failed() function down below the optimistic
spinning section in preparation for enabling optimistic spinning for
readers. There is no functional change.
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
kernel/locking/rwsem-xadd.c | 96 +++++++++++++++++++++---------------------
1 files changed, 48 insertions(+), 48 deletions(-)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 576d4cd..86618ea 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -239,54 +239,6 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
return sem;
}
-/*
- * Wait for the read lock to be granted
- */
-__visible
-struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
-{
- long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
- struct rwsem_waiter waiter;
- struct task_struct *tsk = current;
-
- /* set up my own style of waitqueue */
- waiter.task = tsk;
- waiter.type = RWSEM_WAITING_FOR_READ;
- get_task_struct(tsk);
-
- raw_spin_lock_irq(&sem->wait_lock);
- if (list_empty(&sem->wait_list))
- adjustment += RWSEM_WAITING_BIAS;
- list_add_tail(&waiter.list, &sem->wait_list);
-
- /* we're now waiting on the lock, but no longer actively locking */
- count = rwsem_atomic_update(adjustment, sem);
-
- /* If there are no active locks, wake the front queued process(es).
- *
- * If there are no writers and we are first in the queue,
- * wake our own waiter to join the existing active readers !
- */
- if (count == RWSEM_WAITING_BIAS ||
- (count > RWSEM_WAITING_BIAS &&
- adjustment != -RWSEM_ACTIVE_READ_BIAS))
- sem = __rwsem_do_wake(sem, RWSEM_WAKE_ANY);
-
- raw_spin_unlock_irq(&sem->wait_lock);
-
- /* wait to be given the lock */
- while (true) {
- set_task_state(tsk, TASK_UNINTERRUPTIBLE);
- if (!waiter.task)
- break;
- schedule();
- }
-
- tsk->state = TASK_RUNNING;
-
- return sem;
-}
-
static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
{
if (!(count & RWSEM_ACTIVE_MASK)) {
@@ -458,6 +410,54 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
#endif
/*
+ * Wait for the read lock to be granted
+ */
+__visible
+struct rw_semaphore __sched * rwsem_down_read_failed(struct rw_semaphore *sem)
+{
+ long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
+ struct rwsem_waiter waiter;
+ struct task_struct *tsk = current;
+
+ /* set up my own style of waitqueue */
+ waiter.task = tsk;
+ waiter.type = RWSEM_WAITING_FOR_READ;
+ get_task_struct(tsk);
+
+ raw_spin_lock_irq(&sem->wait_lock);
+ if (list_empty(&sem->wait_list))
+ adjustment += RWSEM_WAITING_BIAS;
+ list_add_tail(&waiter.list, &sem->wait_list);
+
+ /* we're now waiting on the lock, but no longer actively locking */
+ count = rwsem_atomic_update(adjustment, sem);
+
+ /* If there are no active locks, wake the front queued process(es).
+ *
+ * If there are no writers and we are first in the queue,
+ * wake our own waiter to join the existing active readers !
+ */
+ if (count == RWSEM_WAITING_BIAS ||
+ (count > RWSEM_WAITING_BIAS &&
+ adjustment != -RWSEM_ACTIVE_READ_BIAS))
+ sem = __rwsem_do_wake(sem, RWSEM_WAKE_ANY);
+
+ raw_spin_unlock_irq(&sem->wait_lock);
+
+ /* wait to be given the lock */
+ while (true) {
+ set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+ if (!waiter.task)
+ break;
+ schedule();
+ }
+
+ tsk->state = TASK_RUNNING;
+
+ return sem;
+}
+
+/*
* Wait until we successfully acquire the write lock
*/
__visible
--
1.7.1
* [PATCH 6/7] locking/rwsem: enables optimistic spinning for readers
2014-08-04 2:36 [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin Waiman Long
` (3 preceding siblings ...)
2014-08-04 2:36 ` [PATCH 5/7] locking/rwsem: move down rwsem_down_read_failed function Waiman Long
@ 2014-08-04 2:36 ` Waiman Long
2014-08-04 2:36 ` [PATCH 7/7] locking/rwsem: allow waiting writers to go back to optimistic spinning Waiman Long
[not found] ` <1407119782-41119-1-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
6 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2014-08-04 2:36 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra
Cc: linux-kernel, linux-api, linux-doc, Davidlohr Bueso, Jason Low,
Scott J Norton, Waiman Long
This patch enables readers to go into the optimistic spinning loop
so that both readers and writers can spin together. This could
speed up workloads that use both readers and writers.
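At its core, the new rwsem_try_read_lock_unqueued() added in the diff
below does roughly the following; this sketch elides the wakeup of
queued readers after a successful acquisition and the recheck of an
ambiguous count under wait_lock:
static inline bool rwsem_try_read_lock_unqueued(struct rw_semaphore *sem)
{
	long old, count = ACCESS_ONCE(sem->count);
	bool taken = false;
	while (!taken) {
		if (count < RWSEM_WAITING_BIAS)
			break;		/* an active writer plus waiters: give up */
		old = count;
		if (count >= 0 || count == RWSEM_WAITING_BIAS) {
			/* lock is reader-owned or free: try to add our read bias */
			count = cmpxchg(&sem->count, old, old + RWSEM_ACTIVE_READ_BIAS);
			if (count == old) {
				ACCESS_ONCE(sem->owner) = RWSEM_READ_OWNED;
				taken = true;
			}
		} else if (ACCESS_ONCE(sem->owner) == RWSEM_READ_OWNED) {
			/*
			 * RWSEM_WAITING_BIAS < count < 0 is ambiguous, so the
			 * full version rechecks the count under sem->wait_lock
			 * before retrying or giving up (elided here).
			 */
			break;
		} else {
			break;		/* writer-owned: give up */
		}
	}
	return taken;
}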
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
kernel/locking/rwsem-xadd.c | 124 +++++++++++++++++++++++++++++++++++++------
1 files changed, 108 insertions(+), 16 deletions(-)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 86618ea..aafc9f0 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -279,6 +279,81 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
}
}
+/*
+ * Try to acquire read lock
+ *
+ * There is ambiguity when RWSEM_WAITING_BIAS < count < 0 as a writer may
+ * be active instead of having waiters. So we need to recheck the count
+ * under wait_lock to be sure.
+ */
+static inline bool rwsem_try_read_lock_unqueued(struct rw_semaphore *sem)
+{
+ long old, count = ACCESS_ONCE(sem->count);
+ bool taken = false; /* True if lock taken */
+
+ while (!taken) {
+ if (count < RWSEM_WAITING_BIAS)
+ break; /* Have writer and waiter */
+
+ old = count;
+ if (count >= 0 || count == RWSEM_WAITING_BIAS) {
+ count = cmpxchg(&sem->count, old,
+ old + RWSEM_ACTIVE_READ_BIAS);
+ if (count == old) {
+ /* Got the read lock */
+ ACCESS_ONCE(sem->owner) = RWSEM_READ_OWNED;
+ taken = true;
+ /*
+ * Try to wake up readers if lock is free
+ */
+ if ((count == RWSEM_WAITING_BIAS) &&
+ raw_spin_trylock_irq(&sem->wait_lock)) {
+ if (!list_empty(&sem->wait_list))
+ goto wake_readers;
+ raw_spin_unlock_irq(&sem->wait_lock);
+ }
+ }
+ } else if (ACCESS_ONCE(sem->owner) == RWSEM_READ_OWNED) {
+ long threshold;
+
+ /*
+ * RWSEM_WAITING_BIAS < count < 0
+ */
+ raw_spin_lock_irq(&sem->wait_lock);
+ threshold = list_empty(&sem->wait_list)
+ ? 0 : RWSEM_WAITING_BIAS;
+ count = ACCESS_ONCE(sem->count);
+ if (count < threshold) {
+ raw_spin_unlock_irq(&sem->wait_lock);
+ break;
+ }
+ old = count;
+ count = cmpxchg(&sem->count, old,
+ old + RWSEM_ACTIVE_READ_BIAS);
+ if (count == old) {
+ taken = true;
+ ACCESS_ONCE(sem->owner) = RWSEM_READ_OWNED;
+ /*
+ * Wake up pending readers, if any,
+ * while holding the lock.
+ */
+ if (threshold)
+ goto wake_readers;
+ }
+ raw_spin_unlock_irq(&sem->wait_lock);
+ } else {
+ break;
+ }
+ }
+ return taken;
+
+wake_readers:
+ __rwsem_do_wake(sem, RWSEM_WAKE_READ_OWNED);
+ raw_spin_unlock_irq(&sem->wait_lock);
+ return true;
+
+}
+
static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
{
struct task_struct *owner;
@@ -344,7 +419,8 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
* them are active. So it fall back to spin a certain number of them
* RWSEM_READ_SPIN_THRESHOLD before giving up.
*/
-static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
+static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
+ enum rwsem_waiter_type type)
{
struct task_struct *owner;
bool taken = false;
@@ -368,11 +444,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
break;
}
- /* wait_lock will be acquired if write_lock is obtained */
- if (rwsem_try_write_lock_unqueued(sem)) {
- taken = true;
+ taken = (type == RWSEM_WAITING_FOR_WRITE)
+ ? rwsem_try_write_lock_unqueued(sem)
+ : rwsem_try_read_lock_unqueued(sem);
+ if (taken)
break;
- }
/*
* When there's no owner, we might have preempted between the
@@ -403,7 +479,8 @@ done:
}
#else
-static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
+static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
+ enum rwsem_waiter_type type)
{
return false;
}
@@ -415,11 +492,21 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
__visible
struct rw_semaphore __sched * rwsem_down_read_failed(struct rw_semaphore *sem)
{
- long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
+ long count, adjustment = 0;
struct rwsem_waiter waiter;
struct task_struct *tsk = current;
- /* set up my own style of waitqueue */
+ /* undo read bias from down_read operation, stop active locking */
+ count = rwsem_atomic_update(-RWSEM_ACTIVE_READ_BIAS, sem);
+
+ /* do optimistic spinning and steal lock if possible */
+ if (rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_READ))
+ return sem;
+
+ /*
+ * Optimistic spinning failed, proceed to the slowpath
+ * and block until we can acquire the sem.
+ */
waiter.task = tsk;
waiter.type = RWSEM_WAITING_FOR_READ;
get_task_struct(tsk);
@@ -429,8 +516,11 @@ struct rw_semaphore __sched * rwsem_down_read_failed(struct rw_semaphore *sem)
adjustment += RWSEM_WAITING_BIAS;
list_add_tail(&waiter.list, &sem->wait_list);
- /* we're now waiting on the lock, but no longer actively locking */
- count = rwsem_atomic_update(adjustment, sem);
+ /* we're now waiting on the lock */
+ if (adjustment)
+ count = rwsem_atomic_update(adjustment, sem);
+ else
+ count = ACCESS_ONCE(sem->count);
/* If there are no active locks, wake the front queued process(es).
*
@@ -438,8 +528,7 @@ struct rw_semaphore __sched * rwsem_down_read_failed(struct rw_semaphore *sem)
* wake our own waiter to join the existing active readers !
*/
if (count == RWSEM_WAITING_BIAS ||
- (count > RWSEM_WAITING_BIAS &&
- adjustment != -RWSEM_ACTIVE_READ_BIAS))
+ (count > RWSEM_WAITING_BIAS && adjustment))
sem = __rwsem_do_wake(sem, RWSEM_WAKE_ANY);
raw_spin_unlock_irq(&sem->wait_lock);
@@ -471,7 +560,7 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
/* do optimistic spinning and steal lock if possible */
- if (rwsem_optimistic_spin(sem))
+ if (rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_WRITE))
return sem;
/*
@@ -542,9 +631,12 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
* spinlock contention which may introduce too much delay in the
* unlock operation.
*
- * In case the spinning writer is just going to break out of the loop,
- * it will still do a trylock in rwsem_down_write_failed() before
- * sleeping.
+ * In case the spinner is just going to break out of the loop, it
+ * will still do a trylock in rwsem_down_write_failed() before
+ * sleeping, or call __rwsem_do_wake() in rwsem_down_read_failed()
+ * if it detects a free lock. In either cases, we won't have the
+ * situation that the lock is free and no task is woken up from the
+ * waiting queue.
*/
if (rwsem_has_spinner(sem)) {
if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
--
1.7.1
* [PATCH 7/7] locking/rwsem: allow waiting writers to go back to optimistic spinning
2014-08-04 2:36 [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin Waiman Long
` (4 preceding siblings ...)
2014-08-04 2:36 ` [PATCH 6/7] locking/rwsem: enables optimistic spinning for readers Waiman Long
@ 2014-08-04 2:36 ` Waiman Long
[not found] ` <1407119782-41119-1-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
6 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2014-08-04 2:36 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra
Cc: linux-kernel, linux-api, linux-doc, Davidlohr Bueso, Jason Low,
Scott J Norton, Waiman Long
More aggressive use of optimistic spinning and enabling readers to
participate in spinning may make it harder for tasks waiting in the
queue to get access to the semaphore, especially for writers who need
exclusive access.
This patch enables a woken writer to go back to the optimistic
spinning loop as long as the owner is running. This gives waiting
writers better access to the semaphore as well as reducing the size
of the wait queue.
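Condensed from the diff below, the sleep loop in
rwsem_down_write_failed() becomes roughly the following fragment
(variable declarations and the queueing code around it are as in the
diff):
	respin = false;
	while (!respin) {
		if (rwsem_try_write_lock(count, sem))
			break;			/* got the lock while holding wait_lock */
		raw_spin_unlock_irq(&sem->wait_lock);
		/* block until there are no active lockers or spinning looks worthwhile */
		while (true) {
			schedule();
			set_current_state(TASK_UNINTERRUPTIBLE);
			count = ACCESS_ONCE(sem->count);
			if (!(count & RWSEM_ACTIVE_MASK))
				break;
			if (rwsem_can_spin_on_owner(sem)) {
				respin = true;	/* owner is running: leave the queue and respin */
				break;
			}
		}
		raw_spin_lock_irq(&sem->wait_lock);
	}
	/* on respin: dequeue, drop RWSEM_WAITING_BIAS if the list is now empty, goto optspin */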
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
kernel/locking/rwsem-xadd.c | 38 +++++++++++++++++++++++++++++++-------
1 files changed, 31 insertions(+), 7 deletions(-)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index aafc9f0..94b0124 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -484,6 +484,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
{
return false;
}
+
+static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
+{
+ return false;
+}
#endif
/*
@@ -553,12 +558,14 @@ __visible
struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
{
long count;
- bool waiting = true; /* any queued threads before us */
+ bool waiting; /* any queued threads before us */
+ bool respin;
struct rwsem_waiter waiter;
/* undo write bias from down_write operation, stop active locking */
count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
+optspin:
/* do optimistic spinning and steal lock if possible */
if (rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_WRITE))
return sem;
@@ -573,8 +580,7 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
raw_spin_lock_irq(&sem->wait_lock);
/* account for this before adding a new element to the list */
- if (list_empty(&sem->wait_list))
- waiting = false;
+ waiting = !list_empty(&sem->wait_list);
list_add_tail(&waiter.list, &sem->wait_list);
@@ -595,23 +601,41 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
/* wait until we successfully acquire the lock */
set_current_state(TASK_UNINTERRUPTIBLE);
- while (true) {
+ respin = false;
+ while (!respin) {
if (rwsem_try_write_lock(count, sem))
break;
raw_spin_unlock_irq(&sem->wait_lock);
- /* Block until there are no active lockers. */
- do {
+ /*
+ * Block until there are no active lockers or optimistic
+ * spinning is possible.
+ */
+ while (true) {
schedule();
set_current_state(TASK_UNINTERRUPTIBLE);
- } while ((count = sem->count) & RWSEM_ACTIVE_MASK);
+ count = ACCESS_ONCE(sem->count);
+ if (!(count & RWSEM_ACTIVE_MASK))
+ break;
+ /*
+ * Go back to optimistic spinning if possible
+ */
+ if (rwsem_can_spin_on_owner(sem)) {
+ respin = true;
+ break;
+ }
+ }
raw_spin_lock_irq(&sem->wait_lock);
}
__set_current_state(TASK_RUNNING);
list_del(&waiter.list);
+ if (respin && list_empty(&sem->wait_list))
+ rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
raw_spin_unlock_irq(&sem->wait_lock);
+ if (respin)
+ goto optspin;
return sem;
}
--
1.7.1
[parent not found: <1407119782-41119-1-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>]
* [PATCH 2/7] locking/rwsem: more aggressive use of optimistic spinning
[not found] ` <1407119782-41119-1-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
@ 2014-08-04 2:36 ` Waiman Long
2014-08-04 4:09 ` Davidlohr Bueso
[not found] ` <1407119782-41119-3-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
2014-08-04 4:25 ` [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin Davidlohr Bueso
1 sibling, 2 replies; 23+ messages in thread
From: Waiman Long @ 2014-08-04 2:36 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA,
linux-doc-u79uwXL29TY76Z2rM5mHXA, Davidlohr Bueso, Jason Low,
Scott J Norton, Waiman Long
The rwsem_can_spin_on_owner() function currently allows optimistic
spinning only if the owner field is set and the owner is running. That
is too conservative, as it will cause some tasks to miss the opportunity
to spin when the owner hasn't been able to set the owner field in time
or the lock has just become available.
This patch enables more aggressive use of optimistic spinning by
assuming that the lock is spinnable unless proved otherwise.
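For reference, the function being changed has roughly this shape; this
is a simplified sketch assembled from the hunk context shown in this
thread (only the initial value of on_cpu changes here), and the parts
not shown in these diffs are an assumption and may differ slightly from
the actual kernel source:
static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
{
	struct task_struct *owner;
	bool on_cpu = true;	/* assume spinnable unless proved not to be */
	if (need_resched())
		return false;
	rcu_read_lock();
	owner = ACCESS_ONCE(sem->owner);
	if (owner)
		on_cpu = owner->on_cpu;
	rcu_read_unlock();
	/* if the owner field is not set, the lock may just have been released */
	return on_cpu;
}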
Signed-off-by: Waiman Long <Waiman.Long-VXdhtT5mjnY@public.gmane.org>
---
kernel/locking/rwsem-xadd.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index d058946..dce22b8 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -285,7 +285,7 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
{
struct task_struct *owner;
- bool on_cpu = false;
+ bool on_cpu = true; /* Assume spinnable unless proved not to be */
if (need_resched())
return false;
--
1.7.1
* Re: [PATCH 2/7] locking/rwsem: more aggressive use of optimistic spinning
2014-08-04 2:36 ` [PATCH 2/7] locking/rwsem: more aggressive use of " Waiman Long
@ 2014-08-04 4:09 ` Davidlohr Bueso
[not found] ` <1407119782-41119-3-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
1 sibling, 0 replies; 23+ messages in thread
From: Davidlohr Bueso @ 2014-08-04 4:09 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, linux-api, linux-doc,
Jason Low, Scott J Norton
On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
> The rwsem_can_spin_on_owner() function currently allows optimistic
> spinning only if the owner field is set and the owner is running. That
> is too conservative, as it will cause some tasks to miss the opportunity
> to spin when the owner hasn't been able to set the owner field in time
> or the lock has just become available.
>
> This patch enables more aggressive use of optimistic spinning by
> assuming that the lock is spinnable unless proved otherwise.
>
> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> ---
> kernel/locking/rwsem-xadd.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index d058946..dce22b8 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -285,7 +285,7 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
> static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
> {
> struct task_struct *owner;
> - bool on_cpu = false;
> + bool on_cpu = true; /* Assume spinnable unless proved not to be */
Nope, unfortunately we need this, as it fixes some pretty bad regressions
when dealing with multiple readers -- since readers do not deal with lock
ownership, another thread can spin for too long in !owner. See commit
37e95624.
[parent not found: <1407119782-41119-3-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>]
* Re: [PATCH 2/7] locking/rwsem: more aggressive use of optimistic spinning
[not found] ` <1407119782-41119-3-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
@ 2014-08-04 4:10 ` Jason Low
0 siblings, 0 replies; 23+ messages in thread
From: Jason Low @ 2014-08-04 4:10 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA,
linux-doc-u79uwXL29TY76Z2rM5mHXA, Davidlohr Bueso, Scott J Norton,
Dave Chinner
On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
> The rwsem_can_spin_on_owner() function currently allows optimistic
> spinning only if the owner field is set and the owner is running. That
> is too conservative, as it will cause some tasks to miss the opportunity
> to spin when the owner hasn't been able to set the owner field in time
> or the lock has just become available.
>
> This patch enables more aggressive use of optimistic spinning by
> assuming that the lock is spinnable unless proved otherwise.
>
> Signed-off-by: Waiman Long <Waiman.Long-VXdhtT5mjnY@public.gmane.org>
> ---
> kernel/locking/rwsem-xadd.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index d058946..dce22b8 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -285,7 +285,7 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
> static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
> {
> struct task_struct *owner;
> - bool on_cpu = false;
> + bool on_cpu = true; /* Assume spinnable unless proved not to be */
Hi,
So "on_cpu = true" was recently converted to "on_cpu = false" in order
to address issues such as a 5x performance regression in the xfs_repair
workload that was caused by the original rwsem optimistic spinning code.
However, patch 4 in this patchset does address some of the problems with
spinning when there are readers. CC'ing Dave Chinner, who did the
testing with the xfs_repair workload.
* Re: [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin
[not found] ` <1407119782-41119-1-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
2014-08-04 2:36 ` [PATCH 2/7] locking/rwsem: more aggressive use of " Waiman Long
@ 2014-08-04 4:25 ` Davidlohr Bueso
[not found] ` <1407126313.3216.10.camel-5JQ4ckphU/8SZAcGdq5asR6epYMZPwEe5NbjCUgZEJk@public.gmane.org>
1 sibling, 1 reply; 23+ messages in thread
From: Davidlohr Bueso @ 2014-08-04 4:25 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA,
linux-doc-u79uwXL29TY76Z2rM5mHXA, Jason Low, Scott J Norton
On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
> This patch set improves upon the rwsem optimistic spinning patch set
> from Davidlohr to enable better performing rwsem and more aggressive
> use of optimistic spinning.
>
> By using a microbenchmark running 1 million lock-unlock operations per
> thread on a 4-socket 40-core Westmere-EX x86-64 test machine running
> 3.16-rc7 based kernels, the following table shows the execution times
> with 2/10 threads running on different CPUs on the same socket where
> load is the number of pause instructions in the critical section:
>
> lock/r:w ratio # of threads Load:Execution Time (ms)
> -------------- ------------ ------------------------
> mutex 2 1:530.7, 5:406.0, 10:472.7
> mutex 10 1:1848 , 5:2046 , 10:4394
>
> Before patch:
> rwsem/0:1 2 1:339.4, 5:368.9, 10:394.0
> rwsem/1:1 2 1:2915 , 5:2621 , 10:2764
> rwsem/10:1 2 1:891.2, 5:779.2, 10:827.2
> rwsem/0:1 10 1:5618 , 5:5722 , 10:5683
> rwsem/1:1 10 1:14562, 5:14561, 10:14770
> rwsem/10:1 10 1:5914 , 5:5971 , 10:5912
>
> After patch:
> rwsem/0:1 2 1:161.1, 5:244.4, 10:271.4
> rwsem/1:1 2 1:188.8, 5:212.4, 10:312.9
> rwsem/10:1 2 1:168.8, 5:179.5, 10:209.8
> rwsem/0:1 10 1:1306 , 5:1733 , 10:1998
> rwsem/1:1 10 1:1512 , 5:1602 , 10:2093
> rwsem/10:1 10 1:1267 , 5:1458 , 10:2233
>
> % Change:
> rwsem/0:1 2 1:-52.5%, 5:-33.7%, 10:-31.1%
> rwsem/1:1 2 1:-93.5%, 5:-91.9%, 10:-88.7%
> rwsem/10:1 2 1:-81.1%, 5:-77.0%, 10:-74.6%
> rwsem/0:1 10 1:-76.8%, 5:-69.7%, 10:-64.8%
> rwsem/1:1 10 1:-89.6%, 5:-89.0%, 10:-85.8%
> rwsem/10:1 10 1:-78.6%, 5:-75.6%, 10:-62.2%
So at a very low level you see nicer results, which aren't really
translating into a significant impact at a higher level (aim7).
> It can be seen that there is dramatic reduction in the execution
> times. The new rwsem is now even faster than mutex whether it is all
> writers or a mixture of writers and readers.
>
> Running the AIM7 benchmarks on the same 40-core system (HT off),
> the performance improvements on some of the workloads were as follows:
>
> Workload Before Patch After Patch % Change
> -------- ------------ ----------- --------
> custom (200-1000) 446135 477404 +7.0%
> custom (1100-2000) 449665 484734 +7.8%
> high_systime 152437 154217 +1.2%
> (200-1000)
> high_systime 269695 278942 +3.4%
> (1100-2000)
I worry about complicating rwsems even _more_ than they are, especially
for such a marginal gain. You might want to try other workloads -- e.g.
postgresql (pgbench); I normally get pretty useful data when dealing
with rwsems.