linux-api.vger.kernel.org archive mirror
* [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin
@ 2014-08-04  2:36 Waiman Long
  2014-08-04  2:36 ` [PATCH 1/7] locking/rwsem: don't resched at the end of optimistic spinning Waiman Long
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Waiman Long @ 2014-08-04  2:36 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, linux-doc, Davidlohr Bueso, Jason Low,
	Scott J Norton, Waiman Long

This patch set improves upon the rwsem optimistic spinning patch set
from Davidlohr to enable better performing rwsem and more aggressive
use of optimistic spinning.

By using a microbenchmark running 1 million lock-unlock operations per
thread on a 4-socket 40-core Westmere-EX x86-64 test machine running
3.16-rc7 based kernels, the following table shows the execution times
with 2/10 threads running on different CPUs on the same socket where
load is the number of pause instructions in the critical section:

  lock/r:w ratio # of threads	Load:Execution Time (ms)
  -------------- ------------	------------------------
  mutex		      2		1:530.7, 5:406.0, 10:472.7
  mutex		     10		1:1848 , 5:2046 , 10:4394

Before patch:
  rwsem/0:1	      2		1:339.4, 5:368.9, 10:394.0
  rwsem/1:1	      2		1:2915 , 5:2621 , 10:2764
  rwsem/10:1	      2		1:891.2, 5:779.2, 10:827.2
  rwsem/0:1	     10		1:5618 , 5:5722 , 10:5683
  rwsem/1:1	     10		1:14562, 5:14561, 10:14770
  rwsem/10:1	     10		1:5914 , 5:5971 , 10:5912

After patch:
  rwsem/0:1	     2		1:161.1, 5:244.4, 10:271.4
  rwsem/1:1	     2		1:188.8, 5:212.4, 10:312.9
  rwsem/10:1	     2		1:168.8, 5:179.5, 10:209.8
  rwsem/0:1	    10		1:1306 , 5:1733 , 10:1998
  rwsem/1:1	    10		1:1512 , 5:1602 , 10:2093
  rwsem/10:1	    10		1:1267 , 5:1458 , 10:2233

% Change:
  rwsem/0:1	     2		1:-52.5%, 5:-33.7%, 10:-31.1%
  rwsem/1:1	     2		1:-93.5%, 5:-91.9%, 10:-88.7%
  rwsem/10:1	     2		1:-81.1%, 5:-77.0%, 10:-74.6%
  rwsem/0:1	    10		1:-76.8%, 5:-69.7%, 10:-64.8%
  rwsem/1:1	    10		1:-89.6%, 5:-89.0%, 10:-85.8%
  rwsem/10:1	    10		1:-78.6%, 5:-75.6%, 10:-62.2%

It can be seen that there is dramatic reduction in the execution
times. The new rwsem is now even faster than mutex whether it is all
writers or a mixture of writers and readers.
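
For illustration, each microbenchmark thread can be thought of as roughly
the sketch below. This is not the actual test harness; the function name,
the DECLARE_RWSEM name and the writer-only loop are assumptions made for
the example, with "load" being the number of cpu_relax() pauses executed
inside the critical section. A reader thread would use down_read()/
up_read() according to the r:w ratio.

#include <linux/rwsem.h>
#include <linux/sched.h>

static DECLARE_RWSEM(test_sem);		/* hypothetical lock under test */

/* One writer thread: 1 million lock/unlock pairs, <load> pauses each */
static int bench_writer(void *arg)
{
	long i, j, load = (long)arg;

	for (i = 0; i < 1000000; i++) {
		down_write(&test_sem);
		for (j = 0; j < load; j++)
			cpu_relax();	/* pause instruction on x86 */
		up_write(&test_sem);
	}
	return 0;
}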

Running the AIM7 benchmarks on the same 40-core system (HT off),
the performance improvements on some of the workloads were as follows:

      Workload	     Before Patch	After Patch	% Change
      --------	     ------------	-----------	--------
  custom (200-1000)	446135		  477404	 +7.0%
  custom (1100-2000)	449665		  484734	 +7.8%
  high_systime		152437		  154217	 +1.2%
   (200-1000)
  high_systime		269695		  278942	 +3.4%
   (1100-2000)

Waiman Long (7):
  locking/rwsem: don't resched at the end of optimistic spinning
  locking/rwsem: more aggressive use of optimistic spinning
  locking/rwsem: check for active writer/spinner before wakeup
  locking/rwsem: threshold limited spinning for active readers
  locking/rwsem: move down rwsem_down_read_failed function
  locking/rwsem: enables optimistic spinning for readers
  locking/rwsem: allow waiting writers to go back to optimistic spinning

 include/linux/osq_lock.h    |    5 +
 include/linux/rwsem.h       |    7 +
 kernel/locking/rwsem-xadd.c |  328 ++++++++++++++++++++++++++++++++++---------
 kernel/locking/rwsem.c      |   17 ++-
 4 files changed, 288 insertions(+), 69 deletions(-)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/7] locking/rwsem: don't resched at the end of optimistic spinning
  2014-08-04  2:36 [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin Waiman Long
@ 2014-08-04  2:36 ` Waiman Long
  2014-08-04  7:55   ` Peter Zijlstra
  2014-08-04  2:36 ` [PATCH 3/7] locking/rwsem: check for active writer/spinner before wakeup Waiman Long
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2014-08-04  2:36 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, linux-doc, Davidlohr Bueso, Jason Low,
	Scott J Norton, Waiman Long

For a fully preemptive kernel, a call to preempt_enable() could
potentially trigger a task rescheduling event. In the case of rwsem
optimistic spinning, the task has either gotten the lock or is going
to sleep soon. So there is no point to do rescheduling here.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 kernel/locking/rwsem-xadd.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index a2391ac..d058946 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -385,7 +385,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 	}
 	osq_unlock(&sem->osq);
 done:
-	preempt_enable();
+	/*
+	 * We don't need to reschedule here as we either got the lock or
+	 * are going to sleep soon.
+	 */
+	preempt_enable_no_resched();
 	return taken;
 }
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/7] locking/rwsem: more aggressive use of optimistic spinning
       [not found] ` <1407119782-41119-1-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
@ 2014-08-04  2:36   ` Waiman Long
  2014-08-04  4:09     ` Davidlohr Bueso
       [not found]     ` <1407119782-41119-3-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
  2014-08-04  4:25   ` [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin Davidlohr Bueso
  1 sibling, 2 replies; 23+ messages in thread
From: Waiman Long @ 2014-08-04  2:36 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA, Davidlohr Bueso, Jason Low,
	Scott J Norton, Waiman Long

The rwsem_can_spin_on_owner() function currently allows optimistic
spinning only if the owner field is defined and is running. That is
too conservative as it will cause some tasks to miss the opportunity
of doing spinning in case the owner hasn't been able to set the owner
field in time or the lock has just become available.

This patch enables more aggressive use of optimistic spinning by
assuming that the lock is spinnable unless proved otherwise.

Signed-off-by: Waiman Long <Waiman.Long-VXdhtT5mjnY@public.gmane.org>
---
 kernel/locking/rwsem-xadd.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index d058946..dce22b8 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -285,7 +285,7 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 {
 	struct task_struct *owner;
-	bool on_cpu = false;
+	bool on_cpu = true;	/* Assume spinnable unless proved not to be */
 
 	if (need_resched())
 		return false;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/7] locking/rwsem: check for active writer/spinner before wakeup
  2014-08-04  2:36 [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin Waiman Long
  2014-08-04  2:36 ` [PATCH 1/7] locking/rwsem: don't resched at the end of optimistic spinning Waiman Long
@ 2014-08-04  2:36 ` Waiman Long
       [not found]   ` <1407119782-41119-4-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
  2014-08-04  2:36 ` [PATCH 4/7] locking/rwsem: threshold limited spinning for active readers Waiman Long
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2014-08-04  2:36 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, linux-doc, Davidlohr Bueso, Jason Low,
	Scott J Norton, Waiman Long

On a highly contended rwsem, spinlock contention due to the slow
rwsem_wake() call can be a significant portion of the total CPU cycles
used. With writer lock stealing and writer optimistic spinning, there
is also a pretty good chance that the lock may have been stolen
before the waker wakes up the waiters. The woken tasks, if any,
will have to go back to sleep again.

This patch adds checking code at the beginning of the rwsem_wake()
and __rwsem_do_wake() function to look for spinner and active
writer respectively.  The presence of an active writer will abort the
wakeup operation.  The presence of a spinner will still allow wakeup
operation to proceed as long as the trylock operation succeeds. This
strikes a good balance between excessive spinlock contention especially
when there are a lot of active readers and a lot of failed fastpath
operations because there are tasks waiting in the queue.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 include/linux/osq_lock.h    |    5 ++++
 kernel/locking/rwsem-xadd.c |   57 ++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 61 insertions(+), 1 deletions(-)

diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
index 90230d5..79db546 100644
--- a/include/linux/osq_lock.h
+++ b/include/linux/osq_lock.h
@@ -24,4 +24,9 @@ static inline void osq_lock_init(struct optimistic_spin_queue *lock)
 	atomic_set(&lock->tail, OSQ_UNLOCKED_VAL);
 }
 
+static inline bool osq_has_spinner(struct optimistic_spin_queue *lock)
+{
+	return atomic_read(&lock->tail) != OSQ_UNLOCKED_VAL;
+}
+
 #endif
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index dce22b8..9f71a67 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -107,6 +107,37 @@ enum rwsem_wake_type {
 	RWSEM_WAKE_READ_OWNED	/* Waker thread holds the read lock */
 };
 
+#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
+/*
+ * return true if there is an active writer by checking the owner field which
+ * should be set (not RWSEM_READ_OWNED) if there is one.
+ */
+static inline bool rwsem_has_active_writer(struct rw_semaphore *sem)
+{
+	struct task_struct *owner = ACCESS_ONCE(sem->owner);
+
+	return owner && (owner != RWSEM_READ_OWNED);
+}
+
+/*
+ * Return true if the rwsem has active spinner
+ */
+static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
+{
+	return osq_has_spinner(&sem->osq);
+}
+#else /* CONFIG_RWSEM_SPIN_ON_OWNER */
+static inline bool rwsem_has_active_writer(struct rw_semaphore *sem)
+{
+	return false;	/* Assume it has no active writer */
+}
+
+static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
+{
+	return false;
+}
+#endif /* CONFIG_RWSEM_SPIN_ON_OWNER */
+
 /*
  * handle the lock release when processes blocked on it that can now run
  * - if we come here from up_xxxx(), then:
@@ -125,6 +156,15 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 	struct list_head *next;
 	long oldcount, woken, loop, adjustment;
 
+	/*
+	 * Abort the wakeup operation if there is an active writer as the
+	 * lock was stolen. up_write() should have cleared the owner field
+	 * before calling this function. If that field is now set, there must
+	 * be an active writer present.
+	 */
+	if (rwsem_has_active_writer(sem))
+		goto out;
+
 	waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
 	if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
 		if (wake_type == RWSEM_WAKE_ANY)
@@ -479,7 +519,22 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 {
 	unsigned long flags;
 
-	raw_spin_lock_irqsave(&sem->wait_lock, flags);
+	/*
+	 * If a spinner is present, it is not necessary to do the wakeup.
+	 * Try to do wakeup when the trylock succeeds to avoid potential
+	 * spinlock contention which may introduce too much delay in the
+	 * unlock operation.
+	 *
+	 * In case the spinning writer is just going to break out of the loop,
+	 * it will still do a trylock in rwsem_down_write_failed() before
+	 * sleeping.
+	 */
+	if (rwsem_has_spinner(sem)) {
+		if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
+			return sem;
+	} else {
+		raw_spin_lock_irqsave(&sem->wait_lock, flags);
+	}
 
 	/* do nothing if list empty */
 	if (!list_empty(&sem->wait_list))
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/7] locking/rwsem: threshold limited spinning for active readers
  2014-08-04  2:36 [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin Waiman Long
  2014-08-04  2:36 ` [PATCH 1/7] locking/rwsem: don't resched at the end of optimistic spinning Waiman Long
  2014-08-04  2:36 ` [PATCH 3/7] locking/rwsem: check for active writer/spinner before wakeup Waiman Long
@ 2014-08-04  2:36 ` Waiman Long
       [not found]   ` <1407119782-41119-5-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
  2014-08-04  2:36 ` [PATCH 5/7] locking/rwsem: move down rwsem_down_read_failed function Waiman Long
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2014-08-04  2:36 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, linux-doc, Davidlohr Bueso, Jason Low,
	Scott J Norton, Waiman Long

Even thought only the writers can perform optimistic spinning, there
is still a chance that readers may take the lock before a spinning
writer can get it. In that case, the owner field will be NULL and the
spinning writer can spin indefinitely until its time quantum expires
when some lock owning readers are not running.

This patch tries to handle this special case by:
 1) setting the owner field to a special value RWSEM_READ_OWNED
    to indicate that the current or last owner is a reader.
 2) seting a threshold on how many times (currently 100) spinning will
    be done with active readers before giving up as there is no easy
    way to determine if all of them are currently running.

By doing so, it tries to strike a balance between giving up too early
and losing potential performance gain and wasting too many precious
CPU cycles when some lock owning readers are not running.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 include/linux/rwsem.h       |    7 +++++++
 kernel/locking/rwsem-xadd.c |   23 ++++++++++++++++++++---
 kernel/locking/rwsem.c      |   17 ++++++++++++++++-
 3 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 035d3c5..48e1e31 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -66,6 +66,13 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
 #endif
 
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
+/*
+ * The owner field is set to RWSEM_READ_OWNED if the last owner(s) are
+ * readers. It is not reset until a writer takes over and set it to its
+ * task structure pointer or NULL when it frees the lock. So a value
+ * of RWSEM_READ_OWNED doesn't mean it currently has active readers.
+ */
+#define RWSEM_READ_OWNED	((struct task_struct *)-1)
 #define __RWSEM_OPT_INIT(lockname) , .osq = OSQ_LOCK_UNLOCKED, .owner = NULL
 #else
 #define __RWSEM_OPT_INIT(lockname)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 9f71a67..576d4cd 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -304,6 +304,11 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
 
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 /*
+ * Threshold for optimistic spinning on readers
+ */
+#define RWSEM_READ_SPIN_THRESHOLD	100
+
+/*
  * Try to acquire write lock before the writer has been put on wait queue.
  */
 static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
@@ -332,7 +337,7 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 
 	rcu_read_lock();
 	owner = ACCESS_ONCE(sem->owner);
-	if (owner)
+	if (owner && (owner != RWSEM_READ_OWNED))
 		on_cpu = owner->on_cpu;
 	rcu_read_unlock();
 
@@ -381,10 +386,17 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 	return sem->owner == NULL;
 }
 
+/*
+ * With an active writer, spinning is done by checking if that writer is
+ * on CPU. With active readers, there is no easy way to determine if all
+ * of them are running, so we fall back to spinning a limited number of
+ * times (RWSEM_READ_SPIN_THRESHOLD) before giving up.
+ */
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 {
 	struct task_struct *owner;
 	bool taken = false;
+	int  read_spincnt = 0;
 
 	preempt_disable();
 
@@ -397,8 +409,12 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 
 	while (true) {
 		owner = ACCESS_ONCE(sem->owner);
-		if (owner && !rwsem_spin_on_owner(sem, owner))
+		if (owner == RWSEM_READ_OWNED) {
+			if (++read_spincnt > RWSEM_READ_SPIN_THRESHOLD)
+				break;
+		} else if (owner && !rwsem_spin_on_owner(sem, owner)) {
 			break;
+		}
 
 		/* wait_lock will be acquired if write_lock is obtained */
 		if (rwsem_try_write_lock_unqueued(sem)) {
@@ -412,7 +428,8 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		 * we're an RT task that will live-lock because we won't let
 		 * the owner complete.
 		 */
-		if (!owner && (need_resched() || rt_task(current)))
+		if ((!owner || (owner == RWSEM_READ_OWNED)) &&
+		    (need_resched() || rt_task(current)))
 			break;
 
 		/*
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index e2d3bc7..9770439 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -18,6 +18,11 @@ static inline void rwsem_set_owner(struct rw_semaphore *sem)
 	sem->owner = current;
 }
 
+static inline void rwsem_set_owner_read(struct rw_semaphore *sem)
+{
+	sem->owner = RWSEM_READ_OWNED;
+}
+
 static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 {
 	sem->owner = NULL;
@@ -28,6 +33,10 @@ static inline void rwsem_set_owner(struct rw_semaphore *sem)
 {
 }
 
+static inline void rwsem_set_owner_read(struct rw_semaphore *sem)
+{
+}
+
 static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 {
 }
@@ -42,6 +51,7 @@ void __sched down_read(struct rw_semaphore *sem)
 	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
+	rwsem_set_owner_read(sem);
 }
 
 EXPORT_SYMBOL(down_read);
@@ -53,8 +63,10 @@ int down_read_trylock(struct rw_semaphore *sem)
 {
 	int ret = __down_read_trylock(sem);
 
-	if (ret == 1)
+	if (ret == 1) {
 		rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_);
+		rwsem_set_owner_read(sem);
+	}
 	return ret;
 }
 
@@ -127,6 +139,7 @@ void downgrade_write(struct rw_semaphore *sem)
 	 */
 	rwsem_clear_owner(sem);
 	__downgrade_write(sem);
+	rwsem_set_owner_read(sem);
 }
 
 EXPORT_SYMBOL(downgrade_write);
@@ -139,6 +152,7 @@ void down_read_nested(struct rw_semaphore *sem, int subclass)
 	rwsem_acquire_read(&sem->dep_map, subclass, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
+	rwsem_set_owner_read(sem);
 }
 
 EXPORT_SYMBOL(down_read_nested);
@@ -159,6 +173,7 @@ void down_read_non_owner(struct rw_semaphore *sem)
 	might_sleep();
 
 	__down_read(sem);
+	rwsem_set_owner_read(sem);
 }
 
 EXPORT_SYMBOL(down_read_non_owner);
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/7] locking/rwsem: move down rwsem_down_read_failed function
  2014-08-04  2:36 [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin Waiman Long
                   ` (2 preceding siblings ...)
  2014-08-04  2:36 ` [PATCH 4/7] locking/rwsem: threshold limited spinning for active readers Waiman Long
@ 2014-08-04  2:36 ` Waiman Long
  2014-08-04  2:36 ` [PATCH 6/7] locking/rwsem: enables optimistic spinning for readers Waiman Long
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2014-08-04  2:36 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, linux-doc, Davidlohr Bueso, Jason Low,
	Scott J Norton, Waiman Long

Move the rwsem_down_read_failed() function down below the optimistic
spinning section in preparation for enabling optimistic spinning for
readers. There is no functional change.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 kernel/locking/rwsem-xadd.c |   96 +++++++++++++++++++++---------------------
 1 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 576d4cd..86618ea 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -239,54 +239,6 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 	return sem;
 }
 
-/*
- * Wait for the read lock to be granted
- */
-__visible
-struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
-{
-	long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
-	struct rwsem_waiter waiter;
-	struct task_struct *tsk = current;
-
-	/* set up my own style of waitqueue */
-	waiter.task = tsk;
-	waiter.type = RWSEM_WAITING_FOR_READ;
-	get_task_struct(tsk);
-
-	raw_spin_lock_irq(&sem->wait_lock);
-	if (list_empty(&sem->wait_list))
-		adjustment += RWSEM_WAITING_BIAS;
-	list_add_tail(&waiter.list, &sem->wait_list);
-
-	/* we're now waiting on the lock, but no longer actively locking */
-	count = rwsem_atomic_update(adjustment, sem);
-
-	/* If there are no active locks, wake the front queued process(es).
-	 *
-	 * If there are no writers and we are first in the queue,
-	 * wake our own waiter to join the existing active readers !
-	 */
-	if (count == RWSEM_WAITING_BIAS ||
-	    (count > RWSEM_WAITING_BIAS &&
-	     adjustment != -RWSEM_ACTIVE_READ_BIAS))
-		sem = __rwsem_do_wake(sem, RWSEM_WAKE_ANY);
-
-	raw_spin_unlock_irq(&sem->wait_lock);
-
-	/* wait to be given the lock */
-	while (true) {
-		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
-		if (!waiter.task)
-			break;
-		schedule();
-	}
-
-	tsk->state = TASK_RUNNING;
-
-	return sem;
-}
-
 static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
 {
 	if (!(count & RWSEM_ACTIVE_MASK)) {
@@ -458,6 +410,54 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 #endif
 
 /*
+ * Wait for the read lock to be granted
+ */
+__visible
+struct rw_semaphore __sched * rwsem_down_read_failed(struct rw_semaphore *sem)
+{
+	long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
+	struct rwsem_waiter waiter;
+	struct task_struct *tsk = current;
+
+	/* set up my own style of waitqueue */
+	waiter.task = tsk;
+	waiter.type = RWSEM_WAITING_FOR_READ;
+	get_task_struct(tsk);
+
+	raw_spin_lock_irq(&sem->wait_lock);
+	if (list_empty(&sem->wait_list))
+		adjustment += RWSEM_WAITING_BIAS;
+	list_add_tail(&waiter.list, &sem->wait_list);
+
+	/* we're now waiting on the lock, but no longer actively locking */
+	count = rwsem_atomic_update(adjustment, sem);
+
+	/* If there are no active locks, wake the front queued process(es).
+	 *
+	 * If there are no writers and we are first in the queue,
+	 * wake our own waiter to join the existing active readers !
+	 */
+	if (count == RWSEM_WAITING_BIAS ||
+	    (count > RWSEM_WAITING_BIAS &&
+	     adjustment != -RWSEM_ACTIVE_READ_BIAS))
+		sem = __rwsem_do_wake(sem, RWSEM_WAKE_ANY);
+
+	raw_spin_unlock_irq(&sem->wait_lock);
+
+	/* wait to be given the lock */
+	while (true) {
+		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		if (!waiter.task)
+			break;
+		schedule();
+	}
+
+	tsk->state = TASK_RUNNING;
+
+	return sem;
+}
+
+/*
  * Wait until we successfully acquire the write lock
  */
 __visible
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/7] locking/rwsem: enables optimistic spinning for readers
  2014-08-04  2:36 [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin Waiman Long
                   ` (3 preceding siblings ...)
  2014-08-04  2:36 ` [PATCH 5/7] locking/rwsem: move down rwsem_down_read_failed function Waiman Long
@ 2014-08-04  2:36 ` Waiman Long
  2014-08-04  2:36 ` [PATCH 7/7] locking/rwsem: allow waiting writers to go back to optimistic spinning Waiman Long
       [not found] ` <1407119782-41119-1-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
  6 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2014-08-04  2:36 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, linux-doc, Davidlohr Bueso, Jason Low,
	Scott J Norton, Waiman Long

This patch enables readers to go into the optimistic spinning loop
so that both the readers and writers can spin together. This could
speed up workloads that use both readers and writers.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 kernel/locking/rwsem-xadd.c |  124 +++++++++++++++++++++++++++++++++++++------
 1 files changed, 108 insertions(+), 16 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 86618ea..aafc9f0 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -279,6 +279,81 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 	}
 }
 
+/*
+ * Try to acquire read lock
+ *
+ * There is ambiguity when RWSEM_WAITING_BIAS < count < 0 as a writer may
+ * be active instead of having waiters. So we need to recheck the count
+ * under wait_lock to be sure.
+ */
+static inline bool rwsem_try_read_lock_unqueued(struct rw_semaphore *sem)
+{
+	long old, count = ACCESS_ONCE(sem->count);
+	bool taken = false;	/* True if lock taken */
+
+	while (!taken) {
+		if (count < RWSEM_WAITING_BIAS)
+			break;	/* Have writer and waiter */
+
+		old = count;
+		if (count >= 0 || count == RWSEM_WAITING_BIAS) {
+			count = cmpxchg(&sem->count, old,
+					old + RWSEM_ACTIVE_READ_BIAS);
+			if (count == old) {
+				/* Got the read lock */
+				ACCESS_ONCE(sem->owner) = RWSEM_READ_OWNED;
+				taken = true;
+				/*
+				 * Try to wake up readers if lock is free
+				 */
+				if ((count == RWSEM_WAITING_BIAS) &&
+				    raw_spin_trylock_irq(&sem->wait_lock)) {
+					if (!list_empty(&sem->wait_list))
+						goto wake_readers;
+					raw_spin_unlock_irq(&sem->wait_lock);
+				}
+			}
+		} else if (ACCESS_ONCE(sem->owner) == RWSEM_READ_OWNED) {
+			long threshold;
+
+			/*
+			 * RWSEM_WAITING_BIAS < count < 0
+			 */
+			raw_spin_lock_irq(&sem->wait_lock);
+			threshold = list_empty(&sem->wait_list)
+				  ? 0 : RWSEM_WAITING_BIAS;
+			count = ACCESS_ONCE(sem->count);
+			if (count < threshold) {
+				raw_spin_unlock_irq(&sem->wait_lock);
+				break;
+			}
+			old   = count;
+			count = cmpxchg(&sem->count, old,
+					old + RWSEM_ACTIVE_READ_BIAS);
+			if (count == old) {
+				taken = true;
+				ACCESS_ONCE(sem->owner) = RWSEM_READ_OWNED;
+				/*
+				 * Wake up pending readers, if any,
+				 * while holding the lock.
+				 */
+				if (threshold)
+					goto wake_readers;
+			}
+			raw_spin_unlock_irq(&sem->wait_lock);
+		} else {
+			break;
+		}
+	}
+	return taken;
+
+wake_readers:
+	__rwsem_do_wake(sem, RWSEM_WAKE_READ_OWNED);
+	raw_spin_unlock_irq(&sem->wait_lock);
+	return true;
+
+}
+
 static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 {
 	struct task_struct *owner;
@@ -344,7 +419,8 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
  * of them are running, so we fall back to spinning a limited number of
  * times (RWSEM_READ_SPIN_THRESHOLD) before giving up.
  */
-static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
+static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
+				  enum rwsem_waiter_type type)
 {
 	struct task_struct *owner;
 	bool taken = false;
@@ -368,11 +444,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 			break;
 		}
 
-		/* wait_lock will be acquired if write_lock is obtained */
-		if (rwsem_try_write_lock_unqueued(sem)) {
-			taken = true;
+		taken = (type == RWSEM_WAITING_FOR_WRITE)
+		      ? rwsem_try_write_lock_unqueued(sem)
+		      : rwsem_try_read_lock_unqueued(sem);
+		if (taken)
 			break;
-		}
 
 		/*
 		 * When there's no owner, we might have preempted between the
@@ -403,7 +479,8 @@ done:
 }
 
 #else
-static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
+static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
+				  enum rwsem_waiter_type type)
 {
 	return false;
 }
@@ -415,11 +492,21 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 __visible
 struct rw_semaphore __sched * rwsem_down_read_failed(struct rw_semaphore *sem)
 {
-	long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
+	long count, adjustment = 0;
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk = current;
 
-	/* set up my own style of waitqueue */
+	/* undo read bias from down_read operation, stop active locking */
+	count = rwsem_atomic_update(-RWSEM_ACTIVE_READ_BIAS, sem);
+
+	/* do optimistic spinning and steal lock if possible */
+	if (rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_READ))
+		return sem;
+
+	/*
+	 * Optimistic spinning failed, proceed to the slowpath
+	 * and block until we can acquire the sem.
+	 */
 	waiter.task = tsk;
 	waiter.type = RWSEM_WAITING_FOR_READ;
 	get_task_struct(tsk);
@@ -429,8 +516,11 @@ struct rw_semaphore __sched * rwsem_down_read_failed(struct rw_semaphore *sem)
 		adjustment += RWSEM_WAITING_BIAS;
 	list_add_tail(&waiter.list, &sem->wait_list);
 
-	/* we're now waiting on the lock, but no longer actively locking */
-	count = rwsem_atomic_update(adjustment, sem);
+	/* we're now waiting on the lock */
+	if (adjustment)
+		count = rwsem_atomic_update(adjustment, sem);
+	else
+		count = ACCESS_ONCE(sem->count);
 
 	/* If there are no active locks, wake the front queued process(es).
 	 *
@@ -438,8 +528,7 @@ struct rw_semaphore __sched * rwsem_down_read_failed(struct rw_semaphore *sem)
 	 * wake our own waiter to join the existing active readers !
 	 */
 	if (count == RWSEM_WAITING_BIAS ||
-	    (count > RWSEM_WAITING_BIAS &&
-	     adjustment != -RWSEM_ACTIVE_READ_BIAS))
+	   (count >  RWSEM_WAITING_BIAS && adjustment))
 		sem = __rwsem_do_wake(sem, RWSEM_WAKE_ANY);
 
 	raw_spin_unlock_irq(&sem->wait_lock);
@@ -471,7 +560,7 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
 
 	/* do optimistic spinning and steal lock if possible */
-	if (rwsem_optimistic_spin(sem))
+	if (rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_WRITE))
 		return sem;
 
 	/*
@@ -542,9 +631,12 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 	 * spinlock contention which may introduce too much delay in the
 	 * unlock operation.
 	 *
-	 * In case the spinning writer is just going to break out of the loop,
-	 * it will still do a trylock in rwsem_down_write_failed() before
-	 * sleeping.
+	 * In case the spinner is just going to break out of the loop, it
+	 * will still do a trylock in rwsem_down_write_failed() before
+	 * sleeping, or call __rwsem_do_wake() in rwsem_down_read_failed()
+	 * if it detects a free lock. In either case, we won't have a
+	 * situation where the lock is free but no task is woken up from
+	 * the waiting queue.
 	 */
 	if (rwsem_has_spinner(sem)) {
 		if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 7/7] locking/rwsem: allow waiting writers to go back to optimistic spinning
  2014-08-04  2:36 [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin Waiman Long
                   ` (4 preceding siblings ...)
  2014-08-04  2:36 ` [PATCH 6/7] locking/rwsem: enables optimistic spinning for readers Waiman Long
@ 2014-08-04  2:36 ` Waiman Long
       [not found] ` <1407119782-41119-1-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
  6 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2014-08-04  2:36 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, linux-doc, Davidlohr Bueso, Jason Low,
	Scott J Norton, Waiman Long

More aggressive use of optimistic spinning and enabling readers to
participate in spinning may make it harder for tasks waiting in the
queue to get access to the semaphore, especially for writers who need
exclusive access.

This patch enables a woken waiting writer to go back to the optimistic
spinning loop as long as the lock owner is running. This gives waiting
writers better access to the semaphore and also reduces the size of
the wait queue.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 kernel/locking/rwsem-xadd.c |   38 +++++++++++++++++++++++++++++++-------
 1 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index aafc9f0..94b0124 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -484,6 +484,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
 {
 	return false;
 }
+
+static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
+{
+	return false;
+}
 #endif
 
 /*
@@ -553,12 +558,14 @@ __visible
 struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 {
 	long count;
-	bool waiting = true; /* any queued threads before us */
+	bool waiting; /* any queued threads before us */
+	bool respin;
 	struct rwsem_waiter waiter;
 
 	/* undo write bias from down_write operation, stop active locking */
 	count = rwsem_atomic_update(-RWSEM_ACTIVE_WRITE_BIAS, sem);
 
+optspin:
 	/* do optimistic spinning and steal lock if possible */
 	if (rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_WRITE))
 		return sem;
@@ -573,8 +580,7 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 	raw_spin_lock_irq(&sem->wait_lock);
 
 	/* account for this before adding a new element to the list */
-	if (list_empty(&sem->wait_list))
-		waiting = false;
+	waiting = !list_empty(&sem->wait_list);
 
 	list_add_tail(&waiter.list, &sem->wait_list);
 
@@ -595,23 +601,41 @@ struct rw_semaphore __sched *rwsem_down_write_failed(struct rw_semaphore *sem)
 
 	/* wait until we successfully acquire the lock */
 	set_current_state(TASK_UNINTERRUPTIBLE);
-	while (true) {
+	respin = false;
+	while (!respin) {
 		if (rwsem_try_write_lock(count, sem))
 			break;
 		raw_spin_unlock_irq(&sem->wait_lock);
 
-		/* Block until there are no active lockers. */
-		do {
+		/*
+		 * Block until there are no active lockers or optimistic
+		 * spinning is possible.
+		 */
+		while (true) {
 			schedule();
 			set_current_state(TASK_UNINTERRUPTIBLE);
-		} while ((count = sem->count) & RWSEM_ACTIVE_MASK);
+			count = ACCESS_ONCE(sem->count);
+			if (!(count & RWSEM_ACTIVE_MASK))
+				break;
+			/*
+			 * Go back to optimistic spinning if possible
+			 */
+			if (rwsem_can_spin_on_owner(sem)) {
+				respin = true;
+				break;
+			}
+		}
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
 	__set_current_state(TASK_RUNNING);
 
 	list_del(&waiter.list);
+	if (respin && list_empty(&sem->wait_list))
+		rwsem_atomic_update(-RWSEM_WAITING_BIAS, sem);
 	raw_spin_unlock_irq(&sem->wait_lock);
+	if (respin)
+		goto optspin;
 
 	return sem;
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/7] locking/rwsem: more aggressive use of optimistic spinning
  2014-08-04  2:36   ` [PATCH 2/7] locking/rwsem: more aggressive use of " Waiman Long
@ 2014-08-04  4:09     ` Davidlohr Bueso
       [not found]     ` <1407119782-41119-3-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
  1 sibling, 0 replies; 23+ messages in thread
From: Davidlohr Bueso @ 2014-08-04  4:09 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, linux-api, linux-doc,
	Jason Low, Scott J Norton

On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
> The rwsem_can_spin_on_owner() function currently allows optimistic
> spinning only if the owner field is defined and is running. That is
> too conservative as it will cause some tasks to miss the opportunity
> of doing spinning in case the owner hasn't been able to set the owner
> field in time or the lock has just become available.
> 
> This patch enables more aggressive use of optimistic spinning by
> assuming that the lock is spinnable unless proved otherwise.
> 
> Signed-off-by: Waiman Long <Waiman.Long@hp.com>
> ---
>  kernel/locking/rwsem-xadd.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index d058946..dce22b8 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -285,7 +285,7 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
>  static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
>  {
>  	struct task_struct *owner;
> -	bool on_cpu = false;
> +	bool on_cpu = true;	/* Assume spinnable unless proved not to be */

Nope, unfortunately we need it, as it fixes some pretty bad regressions
when dealing with multiple readers -- readers do not deal with lock
ownership, so another thread can spin for too long in !owner. See commit
37e95624.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/7] locking/rwsem: more aggressive use of optimistic spinning
       [not found]     ` <1407119782-41119-3-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
@ 2014-08-04  4:10       ` Jason Low
  0 siblings, 0 replies; 23+ messages in thread
From: Jason Low @ 2014-08-04  4:10 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA, Davidlohr Bueso, Scott J Norton,
	Dave Chinner

On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
> The rwsem_can_spin_on_owner() function currently allows optimistic
> spinning only if the owner field is defined and is running. That is
> too conservative as it will cause some tasks to miss the opportunity
> of doing spinning in case the owner hasn't been able to set the owner
> field in time or the lock has just become available.
> 
> This patch enables more aggressive use of optimistic spinning by
> assuming that the lock is spinnable unless proved otherwise.
> 
> Signed-off-by: Waiman Long <Waiman.Long-VXdhtT5mjnY@public.gmane.org>
> ---
>  kernel/locking/rwsem-xadd.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index d058946..dce22b8 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -285,7 +285,7 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
>  static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
>  {
>  	struct task_struct *owner;
> -	bool on_cpu = false;
> +	bool on_cpu = true;	/* Assume spinnable unless proved not to be */

Hi,

So "on_cpu = true" was recently converted to "on_cpu = false" in order
to address issues such as a 5x performance regression in the xfs_repair
workload that was caused by the original rwsem optimistic spinning code.

However, patch 4 in this patchset does address some of the problems with
spinning when there are readers. CC'ing Dave Chinner, who did the
testing with the xfs_repair workload.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin
       [not found] ` <1407119782-41119-1-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
  2014-08-04  2:36   ` [PATCH 2/7] locking/rwsem: more aggressive use of " Waiman Long
@ 2014-08-04  4:25   ` Davidlohr Bueso
       [not found]     ` <1407126313.3216.10.camel-5JQ4ckphU/8SZAcGdq5asR6epYMZPwEe5NbjCUgZEJk@public.gmane.org>
  1 sibling, 1 reply; 23+ messages in thread
From: Davidlohr Bueso @ 2014-08-04  4:25 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA, Jason Low, Scott J Norton

On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
> This patch set improves upon the rwsem optimistic spinning patch set
> from Davidlohr to enable better performing rwsem and more aggressive
> use of optimistic spinning.
> 
> By using a microbenchmark running 1 million lock-unlock operations per
> thread on a 4-socket 40-core Westmere-EX x86-64 test machine running
> 3.16-rc7 based kernels, the following table shows the execution times
> with 2/10 threads running on different CPUs on the same socket where
> load is the number of pause instructions in the critical section:
> 
>   lock/r:w ratio # of threads	Load:Execution Time (ms)
>   -------------- ------------	------------------------
>   mutex		      2		1:530.7, 5:406.0, 10:472.7
>   mutex		     10		1:1848 , 5:2046 , 10:4394
> 
> Before patch:
>   rwsem/0:1	      2		1:339.4, 5:368.9, 10:394.0
>   rwsem/1:1	      2		1:2915 , 5:2621 , 10:2764
>   rwsem/10:1	      2		1:891.2, 5:779.2, 10:827.2
>   rwsem/0:1	     10		1:5618 , 5:5722 , 10:5683
>   rwsem/1:1	     10		1:14562, 5:14561, 10:14770
>   rwsem/10:1	     10		1:5914 , 5:5971 , 10:5912
> 
> After patch:
>   rwsem/0:1	     2		1:161.1, 5:244.4, 10:271.4
>   rwsem/1:1	     2		1:188.8, 5:212.4, 10:312.9
>   rwsem/10:1	     2		1:168.8, 5:179.5, 10:209.8
>   rwsem/0:1	    10		1:1306 , 5:1733 , 10:1998
>   rwsem/1:1	    10		1:1512 , 5:1602 , 10:2093
>   rwsem/10:1	    10		1:1267 , 5:1458 , 10:2233
> 
> % Change:
>   rwsem/0:1	     2		1:-52.5%, 5:-33.7%, 10:-31.1%
>   rwsem/1:1	     2		1:-93.5%, 5:-91.9%, 10:-88.7%
>   rwsem/10:1	     2		1:-81.1%, 5:-77.0%, 10:-74.6%
>   rwsem/0:1	    10		1:-76.8%, 5:-69.7%, 10:-64.8%
>   rwsem/1:1	    10		1:-89.6%, 5:-89.0%, 10:-85.8%
>   rwsem/10:1	    10		1:-78.6%, 5:-75.6%, 10:-62.2%

So at a very low level you see nicer results, which aren't really
translating to much of a significant impact at a higher level (aim7).

> It can be seen that there is dramatic reduction in the execution
> times. The new rwsem is now even faster than mutex whether it is all
> writers or a mixture of writers and readers.
> 
> Running the AIM7 benchmarks on the same 40-core system (HT off),
> the performance improvements on some of the workloads were as follows:
> 
>       Workload	     Before Patch	After Patch	% Change
>       --------	     ------------	-----------	--------
>   custom (200-1000)	446135		  477404	 +7.0%
>   custom (1100-2000)	449665		  484734	 +7.8%
>   high_systime		152437		  154217	 +1.2%
>    (200-1000)
>   high_systime		269695		  278942	 +3.4%
>    (1100-2000)

I worry about complicating rwsems even _more_ than they are, specially
for such a marginal gain. You might want to try other workloads -- ie:
postgresql (pgbench), I normally get pretty useful data when dealing
with rwsems.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] locking/rwsem: don't resched at the end of optimistic spinning
  2014-08-04  2:36 ` [PATCH 1/7] locking/rwsem: don't resched at the end of optimistic spinning Waiman Long
@ 2014-08-04  7:55   ` Peter Zijlstra
       [not found]     ` <20140804075528.GI9918-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2014-08-04  7:55 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, linux-kernel, linux-api, linux-doc, Davidlohr Bueso,
	Jason Low, Scott J Norton

On Sun, Aug 03, 2014 at 10:36:16PM -0400, Waiman Long wrote:
> For a fully preemptive kernel, a call to preempt_enable() could
> potentially trigger a task rescheduling event. In the case of rwsem
> optimistic spinning, the task has either gotten the lock or is going
> to sleep soon. So there is no point to do rescheduling here.

Uh what? Why shouldn't we preempt if we've gotten the lock? What if a
FIFO task just woke up? 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin
       [not found]     ` <1407126313.3216.10.camel-5JQ4ckphU/8SZAcGdq5asR6epYMZPwEe5NbjCUgZEJk@public.gmane.org>
@ 2014-08-04 18:07       ` Waiman Long
  0 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2014-08-04 18:07 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA, Jason Low, Scott J Norton

On 08/04/2014 12:25 AM, Davidlohr Bueso wrote:
> On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
>> This patch set improves upon the rwsem optimistic spinning patch set
>> from Davidlohr to enable better performing rwsem and more aggressive
>> use of optimistic spinning.
>>
>> By using a microbenchmark running 1 million lock-unlock operations per
>> thread on a 4-socket 40-core Westmere-EX x86-64 test machine running
>> 3.16-rc7 based kernels, the following table shows the execution times
>> with 2/10 threads running on different CPUs on the same socket where
>> load is the number of pause instructions in the critical section:
>>
>>    lock/r:w ratio # of threads	Load:Execution Time (ms)
>>    -------------- ------------	------------------------
>>    mutex		      2		1:530.7, 5:406.0, 10:472.7
>>    mutex		     10		1:1848 , 5:2046 , 10:4394
>>
>> Before patch:
>>    rwsem/0:1	      2		1:339.4, 5:368.9, 10:394.0
>>    rwsem/1:1	      2		1:2915 , 5:2621 , 10:2764
>>    rwsem/10:1	      2		1:891.2, 5:779.2, 10:827.2
>>    rwsem/0:1	     10		1:5618 , 5:5722 , 10:5683
>>    rwsem/1:1	     10		1:14562, 5:14561, 10:14770
>>    rwsem/10:1	     10		1:5914 , 5:5971 , 10:5912
>>
>> After patch:
>>    rwsem/0:1	     2		1:161.1, 5:244.4, 10:271.4
>>    rwsem/1:1	     2		1:188.8, 5:212.4, 10:312.9
>>    rwsem/10:1	     2		1:168.8, 5:179.5, 10:209.8
>>    rwsem/0:1	    10		1:1306 , 5:1733 , 10:1998
>>    rwsem/1:1	    10		1:1512 , 5:1602 , 10:2093
>>    rwsem/10:1	    10		1:1267 , 5:1458 , 10:2233
>>
>> % Change:
>>    rwsem/0:1	     2		1:-52.5%, 5:-33.7%, 10:-31.1%
>>    rwsem/1:1	     2		1:-93.5%, 5:-91.9%, 10:-88.7%
>>    rwsem/10:1	     2		1:-81.1%, 5:-77.0%, 10:-74.6%
>>    rwsem/0:1	    10		1:-76.8%, 5:-69.7%, 10:-64.8%
>>    rwsem/1:1	    10		1:-89.6%, 5:-89.0%, 10:-85.8%
>>    rwsem/10:1	    10		1:-78.6%, 5:-75.6%, 10:-62.2%
> So at a very low level you see nicer results, which aren't really
> translating to much of a significant impact at a higher level (aim7).

I was using a 4-socket system for testing. I believe the performance
gain will be higher on larger machines. I will run some tests on those
larger machines as well.
>> It can be seen that there is dramatic reduction in the execution
>> times. The new rwsem is now even faster than mutex whether it is all
>> writers or a mixture of writers and readers.
>>
>> Running the AIM7 benchmarks on the same 40-core system (HT off),
>> the performance improvements on some of the workloads were as follows:
>>
>>        Workload	     Before Patch	After Patch	% Change
>>        --------	     ------------	-----------	--------
>>    custom (200-1000)	446135		  477404	 +7.0%
>>    custom (1100-2000)	449665		  484734	 +7.8%
>>    high_systime		152437		  154217	 +1.2%
>>     (200-1000)
>>    high_systime		269695		  278942	 +3.4%
>>     (1100-2000)
> I worry about complicating rwsems even _more_ than they are, specially
> for such a marginal gain. You might want to try other workloads -- ie:
> postgresql (pgbench), I normally get pretty useful data when dealing
> with rwsems.
>

Thanks for the info. I will try running pgbench as well.

-Longman

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] locking/rwsem: don't resched at the end of optimistic spinning
       [not found]     ` <20140804075528.GI9918-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
@ 2014-08-04 18:36       ` Waiman Long
  2014-08-04 20:48         ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2014-08-04 18:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA, Davidlohr Bueso, Jason Low,
	Scott J Norton

On 08/04/2014 03:55 AM, Peter Zijlstra wrote:
> On Sun, Aug 03, 2014 at 10:36:16PM -0400, Waiman Long wrote:
>> For a fully preemptive kernel, a call to preempt_enable() could
>> potentially trigger a task rescheduling event. In the case of rwsem
>> optimistic spinning, the task has either gotten the lock or is going
>> to sleep soon. So there is no point to do rescheduling here.
> Uh what? Why shouldn't we preempt if we've gotten the lock? What if a
> FIFO task just woke up?

I didn't mean that we shouldn't preempt if there is a higher priority 
task. I am sure that there will be other preemption points along the way 
that a higher priority task can take over the CPU. I just want to say 
that doing it here may not be the best place especially if the task is 
going to sleep soon.

If you think this patch does not make sense, I can remove it as other 
patches in the set has no dependency on this one.

-Longman

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] locking/rwsem: don't resched at the end of optimistic spinning
  2014-08-04 18:36       ` Waiman Long
@ 2014-08-04 20:48         ` Peter Zijlstra
  2014-08-04 21:12           ` Jason Low
  2014-08-05 17:54           ` Waiman Long
  0 siblings, 2 replies; 23+ messages in thread
From: Peter Zijlstra @ 2014-08-04 20:48 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, linux-kernel, linux-api, linux-doc, Davidlohr Bueso,
	Jason Low, Scott J Norton

On Mon, Aug 04, 2014 at 02:36:35PM -0400, Waiman Long wrote:
> On 08/04/2014 03:55 AM, Peter Zijlstra wrote:
> >On Sun, Aug 03, 2014 at 10:36:16PM -0400, Waiman Long wrote:
> >>For a fully preemptive kernel, a call to preempt_enable() could
> >>potentially trigger a task rescheduling event. In the case of rwsem
> >>optimistic spinning, the task has either gotten the lock or is going
> >>to sleep soon. So there is no point to do rescheduling here.
> >Uh what? Why shouldn't we preempt if we've gotten the lock? What if a
> >FIFO task just woke up?
> 
> I didn't mean that we shouldn't preempt if there is a higher priority task.
> I am sure that there will be other preemption points along the way that a
> higher priority task can take over the CPU. I just want to say that doing it
> here may not be the best place especially if the task is going to sleep
> soon.
> 
> If you think this patch does not make sense, I can remove it as other
> patches in the set has no dependency on this one.

Yeah, its actively harmful, you delay preemption by an unspecified
amount of time in case of the spin-acquire. We've had such bugs in -rt
and they're not fun.

Basically the only time you should use no_resched is if the very next
statement is schedule().
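
A minimal sketch of that rule (illustrative only; the helper name is made
up, not code from the patch):

#include <linux/preempt.h>
#include <linux/sched.h>

/* Hypothetical helper: leave a preempt-disabled section and then sleep. */
static void leave_atomic_then_sleep(void)
{
	preempt_disable();
	/* ... non-preemptible work ... */

	/*
	 * The no_resched form is acceptable here only because the missed
	 * preemption check is immediately followed by schedule(), which
	 * itself provides the preemption point.
	 */
	preempt_enable_no_resched();
	schedule();
}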

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] locking/rwsem: don't resched at the end of optimistic spinning
  2014-08-04 20:48         ` Peter Zijlstra
@ 2014-08-04 21:12           ` Jason Low
  2014-08-05 17:54           ` Waiman Long
  1 sibling, 0 replies; 23+ messages in thread
From: Jason Low @ 2014-08-04 21:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Ingo Molnar, linux-kernel, linux-api, linux-doc,
	Davidlohr Bueso, Scott J Norton

On Mon, 2014-08-04 at 22:48 +0200, Peter Zijlstra wrote:
> On Mon, Aug 04, 2014 at 02:36:35PM -0400, Waiman Long wrote:
> > On 08/04/2014 03:55 AM, Peter Zijlstra wrote:
> > >On Sun, Aug 03, 2014 at 10:36:16PM -0400, Waiman Long wrote:
> > >>For a fully preemptive kernel, a call to preempt_enable() could
> > >>potentially trigger a task rescheduling event. In the case of rwsem
> > >>optimistic spinning, the task has either gotten the lock or is going
> > >>to sleep soon. So there is no point to do rescheduling here.
> > >Uh what? Why shouldn't we preempt if we've gotten the lock? What if a
> > >FIFO task just woke up?
> > 
> > I didn't mean that we shouldn't preempt if there is a higher priority task.
> > I am sure that there will be other preemption points along the way that a
> > higher priority task can take over the CPU. I just want to say that doing it
> > here may not be the best place especially if the task is going to sleep
> > soon.
> > 
> > If you think this patch does not make sense, I can remove it as other
> > patches in the set has no dependency on this one.
> 
> Yeah, its actively harmful, you delay preemption by an unspecified
> amount of time in case of the spin-acquire. We've had such bugs in -rt
> and they're not fun.
> 
> Basically the only time you should use no_resched is if the very next
> statement is schedule().

Right, we actually added an extra resched check at essentially the same
point in the mutex spinning code  :)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/7] locking/rwsem: check for active writer/spinner before wakeup
       [not found]   ` <1407119782-41119-4-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
@ 2014-08-04 21:20     ` Jason Low
  2014-08-05 17:56       ` Waiman Long
  0 siblings, 1 reply; 23+ messages in thread
From: Jason Low @ 2014-08-04 21:20 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA, Davidlohr Bueso, Scott J Norton

On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
> On a highly contended rwsem, spinlock contention due to the slow
> rwsem_wake() call can be a significant portion of the total CPU cycles
> used. With writer lock stealing and writer optimistic spinning, there
> is also a pretty good chance that the lock may have been stolen
> before the waker wakes up the waiters. The woken tasks, if any,
> will have to go back to sleep again.
> 
> This patch adds checking code at the beginning of the rwsem_wake()
> and __rwsem_do_wake() function to look for spinner and active
> writer respectively.  The presence of an active writer will abort the
> wakeup operation.  The presence of a spinner will still allow wakeup
> operation to proceed as long as the trylock operation succeeds. This
> strikes a good balance between excessive spinlock contention especially
> when there are a lot of active readers and a lot of failed fastpath
> operations because there are tasks waiting in the queue.
> 
> Signed-off-by: Waiman Long <Waiman.Long-VXdhtT5mjnY@public.gmane.org>
> ---
>  include/linux/osq_lock.h    |    5 ++++
>  kernel/locking/rwsem-xadd.c |   57 ++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 61 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
> index 90230d5..79db546 100644
> --- a/include/linux/osq_lock.h
> +++ b/include/linux/osq_lock.h
> @@ -24,4 +24,9 @@ static inline void osq_lock_init(struct optimistic_spin_queue *lock)
>  	atomic_set(&lock->tail, OSQ_UNLOCKED_VAL);
>  }
>  
> +static inline bool osq_has_spinner(struct optimistic_spin_queue *lock)
> +{
> +	return atomic_read(&lock->tail) != OSQ_UNLOCKED_VAL;
> +}

Like with other locks, should we make this "osq_is_locked"? We can still
add the rwsem has_spinner() abstraction, which makes use of
osq_is_locked(), if we want.
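
A quick sketch of what that naming could look like (the bodies just mirror
the code in the patch; only the osq_is_locked name is the suggestion here):

static inline bool osq_is_locked(struct optimistic_spin_queue *lock)
{
	return atomic_read(&lock->tail) != OSQ_UNLOCKED_VAL;
}

/* rwsem-side wrapper kept, now built on top of osq_is_locked() */
static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
{
	return osq_is_locked(&sem->osq);
}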

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/7] locking/rwsem: threshold limited spinning for active readers
       [not found]   ` <1407119782-41119-5-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
@ 2014-08-05  4:54     ` Davidlohr Bueso
  2014-08-05  5:30       ` Davidlohr Bueso
  2014-08-05 18:14       ` Waiman Long
  0 siblings, 2 replies; 23+ messages in thread
From: Davidlohr Bueso @ 2014-08-05  4:54 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA, Jason Low, Scott J Norton

On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
> Even thought only the writers can perform optimistic spinning, there
> is still a chance that readers may take the lock before a spinning
> writer can get it. In that case, the owner field will be NULL and the
> spinning writer can spin indefinitely until its time quantum expires
> when some lock owning readers are not running.

Right, now I understand where you were coming from in patch 3/7 ;)

> This patch tries to handle this special case by:
>  1) setting the owner field to a special value RWSEM_READ_OWNED
>     to indicate that the current or last owner is a reader.
>  2) seting a threshold on how many times (currently 100) spinning will
     ^^setting
>     be done with active readers before giving up as there is no easy
>     way to determine if all of them are currently running.
> 
> By doing so, it tries to strike a balance between giving up too early
> and losing potential performance gain and wasting too many precious
> CPU cycles when some lock owning readers are not running.

That's exactly why these kinds of magic things aren't a good idea, much
less in locking. And other alternatives are much more involved, creating
more overhead, which can make the whole thing pretty much useless.

Nor does the number of times trying to spin strike me as the correct
metric for determining such things. Instead, something cycle- or
time-based.
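
A rough sketch of what a time-based cap might look like, assuming
local_clock() as the time source and an arbitrary 10us budget (both are
illustrative, not a proposal from this series):

static inline bool rwsem_read_spin_expired(u64 spin_start)
{
	/* example budget for spinning while readers own the lock */
	return local_clock() - spin_start > 10 * NSEC_PER_USEC;
}

	/* in rwsem_optimistic_spin(), hypothetically: */
	u64 spin_start = local_clock();
	...
	if (owner == RWSEM_READ_OWNED && rwsem_read_spin_expired(spin_start))
		break;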

[...]
>  #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
> +/*
> + * The owner field is set to RWSEM_READ_OWNED if the last owner(s) are
> + * readers. It is not reset until a writer takes over and set it to its
> + * task structure pointer or NULL when it frees the lock. So a value
> + * of RWSEM_READ_OWNED doesn't mean it currently has active readers.
> + */
> +#define RWSEM_READ_OWNED	((struct task_struct *)-1)

Looks rather weird...

>  #define __RWSEM_OPT_INIT(lockname) , .osq = OSQ_LOCK_UNLOCKED, .owner = NULL
>  #else
>  #define __RWSEM_OPT_INIT(lockname)
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 9f71a67..576d4cd 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -304,6 +304,11 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
>  
>  #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
>  /*
> + * Threshold for optimistic spinning on readers
> + */
> +#define RWSEM_READ_SPIN_THRESHOLD	100

I dislike this for the same reasons they weren't welcomed in spinlocks.
We don't know how it can impact workloads that have not been tested.

[...]
>  static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>  {
>  	struct task_struct *owner;
>  	bool taken = false;
> +	int  read_spincnt = 0;
>  
>  	preempt_disable();
>  
> @@ -397,8 +409,12 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>  
>  	while (true) {
>  		owner = ACCESS_ONCE(sem->owner);
> -		if (owner && !rwsem_spin_on_owner(sem, owner))
> +		if (owner == RWSEM_READ_OWNED) {
> +			if (++read_spincnt > RWSEM_READ_SPIN_THRESHOLD)
> +				break;

This is still a pretty fast-path and is going to affect writers, so we
really want to keep it un-clobbered.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/7] locking/rwsem: threshold limited spinning for active readers
  2014-08-05  4:54     ` Davidlohr Bueso
@ 2014-08-05  5:30       ` Davidlohr Bueso
       [not found]         ` <1407216632.2566.22.camel-5JQ4ckphU/8SZAcGdq5asR6epYMZPwEe5NbjCUgZEJk@public.gmane.org>
  2014-08-05 18:14       ` Waiman Long
  1 sibling, 1 reply; 23+ messages in thread
From: Davidlohr Bueso @ 2014-08-05  5:30 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, linux-api, linux-doc,
	Jason Low, Scott J Norton

On Mon, 2014-08-04 at 21:54 -0700, Davidlohr Bueso wrote:
> >  #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
> > +/*
> > + * The owner field is set to RWSEM_READ_OWNED if the last owner(s) are
> > + * readers. It is not reset until a writer takes over and set it to its
> > + * task structure pointer or NULL when it frees the lock. So a value
> > + * of RWSEM_READ_OWNED doesn't mean it currently has active readers.
> > + */
> > +#define RWSEM_READ_OWNED	((struct task_struct *)-1)
> 
> Looks rather weird...

Instead of populating owner when taking the reader lock, why not just
leave it NULL? Then we can differentiate between the owner being NULL
because the lock is held by reader(s) and it being NULL simply because
the lock is not taken. So something like this:

static inline bool rwsem_owner_is_reader(struct rw_semaphore *sem)
{
	return !sem->owner && rwsem_is_locked(sem);
}


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/7] locking/rwsem: threshold limited spinning for active readers
       [not found]         ` <1407216632.2566.22.camel-5JQ4ckphU/8SZAcGdq5asR6epYMZPwEe5NbjCUgZEJk@public.gmane.org>
@ 2014-08-05  5:41           ` Davidlohr Bueso
  0 siblings, 0 replies; 23+ messages in thread
From: Davidlohr Bueso @ 2014-08-05  5:41 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA, Jason Low, Scott J Norton

On Mon, 2014-08-04 at 22:30 -0700, Davidlohr Bueso wrote:
> On Mon, 2014-08-04 at 21:54 -0700, Davidlohr Bueso wrote:
> > >  #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
> > > +/*
> > > + * The owner field is set to RWSEM_READ_OWNED if the last owner(s) are
> > > + * readers. It is not reset until a writer takes over and set it to its
> > > + * task structure pointer or NULL when it frees the lock. So a value
> > > + * of RWSEM_READ_OWNED doesn't mean it currently has active readers.
> > > + */
> > > +#define RWSEM_READ_OWNED	((struct task_struct *)-1)
> > 
> > Looks rather weird...
> 
> Instead of populating owner when taking the reader lock, why not just
> leave it NULL? Then we can differentiate between the owner being NULL
> because the lock is held by reader(s) and it being NULL simply because
> the lock is not taken. So something like this:
> 
> static inline bool rwsem_owner_is_reader(struct rw_semaphore *sem)
> {
> 	return !sem->owner && rwsem_is_locked(sem);
> }

Although we could race between both checks, especially when going into
the slowpaths; too bad. We would probably also need a read barrier before
calling is_locked. Anyway, you see, things start getting funky.
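
Roughly, the barrier would sit between the two loads; a sketch only, and
the race with a concurrent release remains:

static inline bool rwsem_owner_is_reader(struct rw_semaphore *sem)
{
	if (ACCESS_ONCE(sem->owner))
		return false;
	/* order the owner load before the count load in rwsem_is_locked() */
	smp_rmb();
	return rwsem_is_locked(sem);
}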

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] locking/rwsem: don't resched at the end of optimistic spinning
  2014-08-04 20:48         ` Peter Zijlstra
  2014-08-04 21:12           ` Jason Low
@ 2014-08-05 17:54           ` Waiman Long
  1 sibling, 0 replies; 23+ messages in thread
From: Waiman Long @ 2014-08-05 17:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA, Davidlohr Bueso, Jason Low,
	Scott J Norton

On 08/04/2014 04:48 PM, Peter Zijlstra wrote:
> On Mon, Aug 04, 2014 at 02:36:35PM -0400, Waiman Long wrote:
>> On 08/04/2014 03:55 AM, Peter Zijlstra wrote:
>>> On Sun, Aug 03, 2014 at 10:36:16PM -0400, Waiman Long wrote:
>>>> For a fully preemptive kernel, a call to preempt_enable() could
>>>> potentially trigger a task rescheduling event. In the case of rwsem
>>>> optimistic spinning, the task has either gotten the lock or is going
>>>> to sleep soon. So there is no point in rescheduling here.
>>> Uh what? Why shouldn't we preempt if we've gotten the lock? What if a
>>> FIFO task just woke up?
>> I didn't mean that we shouldn't preempt if there is a higher priority task.
>> I am sure that there will be other preemption points along the way at which
>> a higher priority task can take over the CPU. I just meant that this may not
>> be the best place to do it, especially if the task is going to sleep soon.
>>
>> If you think this patch does not make sense, I can remove it as other
>> patches in the set have no dependency on this one.
> Yeah, it's actively harmful: you delay preemption by an unspecified
> amount of time in case of the spin-acquire. We've had such bugs in -rt
> and they're not fun.
>
> Basically the only time you should use no_resched is if the very next
> statement is schedule().

Thanks for the clarification. I will remove patch 1 from the patch set.

-Longman
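
The distinction Peter draws is, in sketch form (not code from this
series):

	/*
	 * Fine: the no-resched form is immediately followed by schedule(),
	 * so a pending reschedule is acted upon right away.
	 */
	sched_preempt_enable_no_resched();
	schedule();

	/*
	 * Harmful: preemption is re-enabled without checking need_resched,
	 * so a pending reschedule is delayed until the next preemption point.
	 */
	preempt_enable_no_resched();
	return taken;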

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/7] locking/rwsem: check for active writer/spinner before wakeup
  2014-08-04 21:20     ` Jason Low
@ 2014-08-05 17:56       ` Waiman Long
  0 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2014-08-05 17:56 UTC (permalink / raw)
  To: Jason Low
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	linux-doc-u79uwXL29TY76Z2rM5mHXA, Davidlohr Bueso, Scott J Norton

On 08/04/2014 05:20 PM, Jason Low wrote:
> On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
>> On a highly contended rwsem, spinlock contention due to the slow
>> rwsem_wake() call can be a significant portion of the total CPU cycles
>> used. With writer lock stealing and writer optimistic spinning, there
>> is also a pretty good chance that the lock may have been stolen
>> before the waker wakes up the waiters. The woken tasks, if any,
>> will have to go back to sleep again.
>>
>> This patch adds checking code at the beginning of the rwsem_wake()
>> and __rwsem_do_wake() functions to look for a spinner and an active
>> writer respectively.  The presence of an active writer will abort the
>> wakeup operation.  The presence of a spinner will still allow the wakeup
>> operation to proceed as long as the trylock operation succeeds. This
>> strikes a good balance between excessive spinlock contention especially
>> when there are a lot of active readers and a lot of failed fastpath
>> operations because there are tasks waiting in the queue.
>>
>> Signed-off-by: Waiman Long <Waiman.Long-VXdhtT5mjnY@public.gmane.org>
>> ---
>>   include/linux/osq_lock.h    |    5 ++++
>>   kernel/locking/rwsem-xadd.c |   57 ++++++++++++++++++++++++++++++++++++++++++-
>>   2 files changed, 61 insertions(+), 1 deletions(-)
>>
>> diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
>> index 90230d5..79db546 100644
>> --- a/include/linux/osq_lock.h
>> +++ b/include/linux/osq_lock.h
>> @@ -24,4 +24,9 @@ static inline void osq_lock_init(struct optimistic_spin_queue *lock)
>>   	atomic_set(&lock->tail, OSQ_UNLOCKED_VAL);
>>   }
>>
>> +static inline bool osq_has_spinner(struct optimistic_spin_queue *lock)
>> +{
>> +	return atomic_read(&lock->tail) != OSQ_UNLOCKED_VAL;
>> +}
> Like with other locks, should we make this "osq_is_locked"? We can still
> add the rwsem has_spinner() abstraction, which makes use of
> osq_is_locked(), if we want.
>
>

Yes, that is a good idea. I will make the change in the next version.

-Longman

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/7] locking/rwsem: threshold limited spinning for active readers
  2014-08-05  4:54     ` Davidlohr Bueso
  2014-08-05  5:30       ` Davidlohr Bueso
@ 2014-08-05 18:14       ` Waiman Long
  1 sibling, 0 replies; 23+ messages in thread
From: Waiman Long @ 2014-08-05 18:14 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, linux-api, linux-doc,
	Jason Low, Scott J Norton

On 08/05/2014 12:54 AM, Davidlohr Bueso wrote:
> On Sun, 2014-08-03 at 22:36 -0400, Waiman Long wrote:
>> Even though only the writers can perform optimistic spinning, there
>> is still a chance that readers may take the lock before a spinning
>> writer can get it. In that case, the owner field will be NULL and the
>> spinning writer can keep spinning until its time quantum expires
>> when some lock-owning readers are not running.
> Right, now I understand where you were coming from in patch 3/7 ;)
>
>> This patch tries to handle this special case by:
>>   1) setting the owner field to a special value RWSEM_READ_OWNED
>>      to indicate that the current or last owner is a reader.
>>   2) seting a threshold on how many times (currently 100) spinning will
>       ^^setting
>>      be done with active readers before giving up as there is no easy
>>      way to determine if all of them are currently running.
>>
>> By doing so, it tries to strike a balance between giving up too early
>> and losing potential performance gain and wasting too many precious
>> CPU cycles when some lock owning readers are not running.
> That's exactly why these kinds of magic things aren't a good idea, much
> less in locking. And other alternatives are much more involved, creating
> more overhead, which can make the whole thing pretty much useless.
>
> Nor does the number of times trying to spin strike me as the correct
> metric for determining such things. Instead, something cycle- or
> time-based.

I can make it time-based, but we would still need some kind of magic
number of nanoseconds of spinning before giving up, and it would make the
code more complicated. Now I am thinking about reducing the threshold to
a small number, say 16, and also checking whether the sem count is
changing to decide when to give up. Hopefully that will reduce the amount
of useless spinning when the readers are actually running.
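
One possible reading of that combined heuristic, with both the 16 and the
count check purely illustrative:

	long last_count = ACCESS_ONCE(sem->count);
	int  read_spincnt = 0;
	...
	if (owner == RWSEM_READ_OWNED) {
		long count = ACCESS_ONCE(sem->count);

		if (count != last_count) {
			/* readers are making progress, keep spinning */
			last_count = count;
			read_spincnt = 0;
		} else if (++read_spincnt > 16) {
			break;	/* no visible progress, give up */
		}
	}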

> [...]
>>   #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
>> +/*
>> + * The owner field is set to RWSEM_READ_OWNED if the last owner(s) are
>> + * readers. It is not reset until a writer takes over and set it to its
>> + * task structure pointer or NULL when it frees the lock. So a value
>> + * of RWSEM_READ_OWNED doesn't mean it currently has active readers.
>> + */
>> +#define RWSEM_READ_OWNED	((struct task_struct *)-1)
> Looks rather weird...

Overloading pointers with some kind of special value is a technique that 
is also used elsewhere in the kernel.

>
>>   #define __RWSEM_OPT_INIT(lockname) , .osq = OSQ_LOCK_UNLOCKED, .owner = NULL
>>   #else
>>   #define __RWSEM_OPT_INIT(lockname)
>> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
>> index 9f71a67..576d4cd 100644
>> --- a/kernel/locking/rwsem-xadd.c
>> +++ b/kernel/locking/rwsem-xadd.c
>> @@ -304,6 +304,11 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
>>
>>   #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
>>   /*
>> + * Threshold for optimistic spinning on readers
>> + */
>> +#define RWSEM_READ_SPIN_THRESHOLD	100
> I dislike this for the same reasons they weren't welcomed in spinlocks.
> We don't know how it can impact workloads that have not been tested.

Well, this kind of fixed threshold spinning is actually used in the 
para-virtualized spinlock. Please see the SPIN_THRESHOLD macro in 
arch/x86/include/asm/spinlock.h for more details.
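
In outline, that pattern is a bounded spin that falls back to blocking
once the budget is exhausted (simplified, not the actual x86 code;
lock_is_free() is a stand-in name):

	unsigned int count = SPIN_THRESHOLD;

	do {
		if (lock_is_free(lock))
			goto acquired;
		cpu_relax();
	} while (--count);
	/* budget exhausted: block (or, for pv locks, yield to the host) */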

> [...]
>>   static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>>   {
>>   	struct task_struct *owner;
>>   	bool taken = false;
>> +	int  read_spincnt = 0;
>>
>>   	preempt_disable();
>>
>> @@ -397,8 +409,12 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
>>
>>   	while (true) {
>>   		owner = ACCESS_ONCE(sem->owner);
>> -		if (owner && !rwsem_spin_on_owner(sem, owner))
>> +		if (owner == RWSEM_READ_OWNED) {
>> +			if (++read_spincnt > RWSEM_READ_SPIN_THRESHOLD)
>> +				break;
> This is still a pretty fast-path and is going to affect writers, so we
> really want to keep it un-clobbered.
>
> Thanks,
> Davidlohr
>

When the lock is writer-owned, the only overhead is an additional check 
for (owner == RWSEM_READ_OWNED) which should be negligible compared to 
reading the contended semaphore cacheline. I don't think that it will 
have any performance impact in this case.

-Longman

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-08-05 18:14 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-04  2:36 [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin Waiman Long
2014-08-04  2:36 ` [PATCH 1/7] locking/rwsem: don't resched at the end of optimistic spinning Waiman Long
2014-08-04  7:55   ` Peter Zijlstra
     [not found]     ` <20140804075528.GI9918-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
2014-08-04 18:36       ` Waiman Long
2014-08-04 20:48         ` Peter Zijlstra
2014-08-04 21:12           ` Jason Low
2014-08-05 17:54           ` Waiman Long
2014-08-04  2:36 ` [PATCH 3/7] locking/rwsem: check for active writer/spinner before wakeup Waiman Long
     [not found]   ` <1407119782-41119-4-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
2014-08-04 21:20     ` Jason Low
2014-08-05 17:56       ` Waiman Long
2014-08-04  2:36 ` [PATCH 4/7] locking/rwsem: threshold limited spinning for active readers Waiman Long
     [not found]   ` <1407119782-41119-5-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
2014-08-05  4:54     ` Davidlohr Bueso
2014-08-05  5:30       ` Davidlohr Bueso
     [not found]         ` <1407216632.2566.22.camel-5JQ4ckphU/8SZAcGdq5asR6epYMZPwEe5NbjCUgZEJk@public.gmane.org>
2014-08-05  5:41           ` Davidlohr Bueso
2014-08-05 18:14       ` Waiman Long
2014-08-04  2:36 ` [PATCH 5/7] locking/rwsem: move down rwsem_down_read_failed function Waiman Long
2014-08-04  2:36 ` [PATCH 6/7] locking/rwsem: enables optimistic spinning for readers Waiman Long
2014-08-04  2:36 ` [PATCH 7/7] locking/rwsem: allow waiting writers to go back to optimistic spinning Waiman Long
     [not found] ` <1407119782-41119-1-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
2014-08-04  2:36   ` [PATCH 2/7] locking/rwsem: more aggressive use of " Waiman Long
2014-08-04  4:09     ` Davidlohr Bueso
     [not found]     ` <1407119782-41119-3-git-send-email-Waiman.Long-VXdhtT5mjnY@public.gmane.org>
2014-08-04  4:10       ` Jason Low
2014-08-04  4:25   ` [PATCH 0/7] locking/rwsem: enable reader opt-spinning & writer respin Davidlohr Bueso
     [not found]     ` <1407126313.3216.10.camel-5JQ4ckphU/8SZAcGdq5asR6epYMZPwEe5NbjCUgZEJk@public.gmane.org>
2014-08-04 18:07       ` Waiman Long
