* [PATCH -tip v2 0/6] locking: Various updates
From: Davidlohr Bueso @ 2015-01-06 19:45 UTC
To: Peter Zijlstra, Ingo Molnar
Cc: Paul E. McKenney, Davidlohr Bueso, linux-kernel
Hello,
A little bit of everything really, with the last two patches being
the most interesting ones.
Patches 1-3 clean up a bit of the ww mutex code.
Patch 4 isolates osq code.
Patch 5 uses the brand new READ/ASSIGN_ONCE primitives.
Patch 6 is a performance patch and gets rid of barrier calls when
polling for the (osq) lock.
More details obviously in the individual patches. Applies on today's -tip.
Please consider for 3.20, thanks!
Changes since v1 (https://lkml.org/lkml/2014/12/28/22):
o drop patch 1 (picked up by paulmck)
o drop patch 2 (no can do, thanks a lot gcc)
o improve changelogs.
Davidlohr Bueso (6):
locking/mutex: Checking the stamp is ww only
locking/mutex: Move mcs related comments to proper location
locking/mutex: Introduce ww_mutex_set_context_slowpath
locking/mcs: Better differentiate between mcs variants
locking: Use [READ,ASSIGN]_ONCE() for non-scalar types
locking/osq: No need for load/acquire when acquire-polling
include/linux/osq_lock.h | 12 ++-
kernel/Kconfig.locks | 4 +
kernel/locking/Makefile | 3 +-
kernel/locking/mcs_spinlock.c | 208 ------------------------------------------
kernel/locking/mcs_spinlock.h | 22 +----
kernel/locking/mutex.c | 66 +++++++-------
kernel/locking/osq_lock.c | 203 +++++++++++++++++++++++++++++++++++++++++
kernel/locking/rwsem-xadd.c | 4 +-
8 files changed, 258 insertions(+), 264 deletions(-)
delete mode 100644 kernel/locking/mcs_spinlock.c
create mode 100644 kernel/locking/osq_lock.c
--
2.1.2
* [PATCH 1/6] locking/mutex: Checking the stamp is ww only
From: Davidlohr Bueso @ 2015-01-06 19:45 UTC
To: Peter Zijlstra, Ingo Molnar
Cc: Paul E. McKenney, Davidlohr Bueso, linux-kernel, Davidlohr Bueso
Mark it so by renaming __mutex_lock_check_stamp().
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
kernel/locking/mutex.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 4541951..b042ea5 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -469,7 +469,7 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
EXPORT_SYMBOL(ww_mutex_unlock);
static inline int __sched
-__mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
+__ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
{
struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
struct ww_acquire_ctx *hold_ctx = ACCESS_ONCE(ww->ctx);
@@ -557,7 +557,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
}
if (use_ww_ctx && ww_ctx->acquired > 0) {
- ret = __mutex_lock_check_stamp(lock, ww_ctx);
+ ret = __ww_mutex_lock_check_stamp(lock, ww_ctx);
if (ret)
goto err;
}
--
2.1.2
* [PATCH 2/6] locking/mutex: Move mcs related comments to proper location
From: Davidlohr Bueso @ 2015-01-06 19:45 UTC
To: Peter Zijlstra, Ingo Molnar
Cc: Paul E. McKenney, Davidlohr Bueso, linux-kernel, Davidlohr Bueso
It serves much better if the comments are right before the osq_lock() call.
Also delete a useless comment.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
kernel/locking/mutex.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index b042ea5..6db3d0d 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -193,17 +193,6 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
-/*
- * In order to avoid a stampede of mutex spinners from acquiring the mutex
- * more or less simultaneously, the spinners need to acquire a MCS lock
- * first before spinning on the owner field.
- *
- */
-
-/*
- * Mutex spinning code migrated from kernel/sched/core.c
- */
-
static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
{
if (lock->owner != owner)
@@ -307,6 +296,11 @@ static bool mutex_optimistic_spin(struct mutex *lock,
if (!mutex_can_spin_on_owner(lock))
goto done;
+ /*
+ * In order to avoid a stampede of mutex spinners trying to
+ * acquire the mutex all at once, the spinners need to take a
+ * MCS (queued) lock first before spinning on the owner field.
+ */
if (!osq_lock(&lock->osq))
goto done;
--
2.1.2
* [PATCH 3/6] locking/mutex: Introduce ww_mutex_set_context_slowpath
From: Davidlohr Bueso @ 2015-01-06 19:45 UTC
To: Peter Zijlstra, Ingo Molnar
Cc: Paul E. McKenney, Davidlohr Bueso, linux-kernel, Davidlohr Bueso
... which is equivalent to the fastpath counterpart.
This mainly allows getting some ww specific code out
of generic mutex paths.
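For context, a ww_mutex embeds a regular mutex, which is what makes
the container_of() in both helpers work. Roughly (per mainline at the
time, debug-only fields omitted):

	struct ww_mutex {
		struct mutex base;		/* the generic mutex */
		struct ww_acquire_ctx *ctx;	/* set once acquired */
	};

With this patch the ww-specific tail of the slowpath reduces to
recovering the ww_mutex from its base and calling the new helper.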
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
kernel/locking/mutex.c | 44 ++++++++++++++++++++++++++------------------
1 file changed, 26 insertions(+), 18 deletions(-)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 6db3d0d..2ac48e0 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -147,7 +147,7 @@ static __always_inline void ww_mutex_lock_acquired(struct ww_mutex *ww,
}
/*
- * after acquiring lock with fastpath or when we lost out in contested
+ * After acquiring lock with fastpath or when we lost out in contested
* slowpath, set ctx and wake up any waiters so they can recheck.
*
* This function is never called when CONFIG_DEBUG_LOCK_ALLOC is set,
@@ -191,6 +191,30 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
spin_unlock_mutex(&lock->base.wait_lock, flags);
}
+/*
+ * After acquiring lock in the slowpath set ctx and wake up any
+ * waiters so they can recheck.
+ *
+ * Callers must hold the mutex wait_lock.
+ */
+static __always_inline void
+ww_mutex_set_context_slowpath(struct ww_mutex *lock,
+ struct ww_acquire_ctx *ctx)
+{
+ struct mutex_waiter *cur;
+
+ ww_mutex_lock_acquired(lock, ctx);
+ lock->ctx = ctx;
+
+ /*
+ * Give any possible sleeping processes the chance to wake up,
+ * so they can recheck if they have to back off.
+ */
+ list_for_each_entry(cur, &lock->base.wait_list, list) {
+ debug_mutex_wake_waiter(&lock->base, cur);
+ wake_up_process(cur->task);
+ }
+}
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
@@ -576,23 +600,7 @@ skip_wait:
if (use_ww_ctx) {
struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
- struct mutex_waiter *cur;
-
- /*
- * This branch gets optimized out for the common case,
- * and is only important for ww_mutex_lock.
- */
- ww_mutex_lock_acquired(ww, ww_ctx);
- ww->ctx = ww_ctx;
-
- /*
- * Give any possible sleeping processes the chance to wake up,
- * so they can recheck if they have to back off.
- */
- list_for_each_entry(cur, &lock->wait_list, list) {
- debug_mutex_wake_waiter(lock, cur);
- wake_up_process(cur->task);
- }
+ ww_mutex_set_context_slowpath(ww, ww_ctx);
}
spin_unlock_mutex(&lock->wait_lock, flags);
--
2.1.2
* [PATCH 4/6] locking/mcs: Better differentiate between mcs variants
From: Davidlohr Bueso @ 2015-01-06 19:45 UTC
To: Peter Zijlstra, Ingo Molnar
Cc: Paul E. McKenney, Davidlohr Bueso, linux-kernel, Davidlohr Bueso
We have two flavors of the MCS spinlock: standard and cancelable (osq).
While each one is independent of the other, we currently mix and match
them. This patch:
o Moves the osq code out of mcs_spinlock.h (which only deals with the traditional
version) into include/linux/osq_lock.h. No unnecessary code is added to the
more global header file; any locks that make use of osq must include
it anyway.
o Renames mcs_spinlock.c to osq_lock.c. This file only contains osq code.
o Introduces a CONFIG_LOCK_SPIN_ON_OWNER in order to only build osq_lock
if there is support for it.
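For illustration, a minimal sketch of how a sleeping lock built with
CONFIG_LOCK_SPIN_ON_OWNER would consume the now self-contained osq API
(the "foo" lock is hypothetical, not part of this series):

	#include <linux/osq_lock.h>

	struct foo_lock {
		atomic_t count;
		struct optimistic_spin_queue osq; /* cancelable MCS queue */
	};

	static void foo_lock_init(struct foo_lock *lock)
	{
		atomic_set(&lock->count, 0);
		osq_lock_init(&lock->osq);
	}

	static bool foo_optimistic_spin(struct foo_lock *lock)
	{
		/* Serialize spinners; false means we had to bail. */
		if (!osq_lock(&lock->osq))
			return false;

		/* ... spin trying to take lock->count ... */

		osq_unlock(&lock->osq);
		return true;
	}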
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
include/linux/osq_lock.h | 12 ++-
kernel/Kconfig.locks | 4 +
kernel/locking/Makefile | 3 +-
kernel/locking/mcs_spinlock.c | 208 ------------------------------------------
kernel/locking/mcs_spinlock.h | 16 ----
kernel/locking/osq_lock.c | 203 +++++++++++++++++++++++++++++++++++++++++
6 files changed, 219 insertions(+), 227 deletions(-)
delete mode 100644 kernel/locking/mcs_spinlock.c
create mode 100644 kernel/locking/osq_lock.c
diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
index 90230d5..3a6490e 100644
--- a/include/linux/osq_lock.h
+++ b/include/linux/osq_lock.h
@@ -5,8 +5,11 @@
* An MCS like lock especially tailored for optimistic spinning for sleeping
* lock implementations (mutex, rwsem, etc).
*/
-
-#define OSQ_UNLOCKED_VAL (0)
+struct optimistic_spin_node {
+ struct optimistic_spin_node *next, *prev;
+ int locked; /* 1 if lock acquired */
+ int cpu; /* encoded CPU # + 1 value */
+};
struct optimistic_spin_queue {
/*
@@ -16,6 +19,8 @@ struct optimistic_spin_queue {
atomic_t tail;
};
+#define OSQ_UNLOCKED_VAL (0)
+
/* Init macro and function. */
#define OSQ_LOCK_UNLOCKED { ATOMIC_INIT(OSQ_UNLOCKED_VAL) }
@@ -24,4 +29,7 @@ static inline void osq_lock_init(struct optimistic_spin_queue *lock)
atomic_set(&lock->tail, OSQ_UNLOCKED_VAL);
}
+extern bool osq_lock(struct optimistic_spin_queue *lock);
+extern void osq_unlock(struct optimistic_spin_queue *lock);
+
#endif
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index 76768ee..08561f1 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -231,6 +231,10 @@ config RWSEM_SPIN_ON_OWNER
def_bool y
depends on SMP && RWSEM_XCHGADD_ALGORITHM && ARCH_SUPPORTS_ATOMIC_RMW
+config LOCK_SPIN_ON_OWNER
+ def_bool y
+ depends on MUTEX_SPIN_ON_OWNER || RWSEM_SPIN_ON_OWNER
+
config ARCH_USE_QUEUE_RWLOCK
bool
diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index 8541bfd..4ca8eb1 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -1,5 +1,5 @@
-obj-y += mutex.o semaphore.o rwsem.o mcs_spinlock.o
+obj-y += mutex.o semaphore.o rwsem.o
ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_lockdep.o = -pg
@@ -14,6 +14,7 @@ ifeq ($(CONFIG_PROC_FS),y)
obj-$(CONFIG_LOCKDEP) += lockdep_proc.o
endif
obj-$(CONFIG_SMP) += spinlock.o
+obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
obj-$(CONFIG_SMP) += lglock.o
obj-$(CONFIG_PROVE_LOCKING) += spinlock.o
obj-$(CONFIG_RT_MUTEXES) += rtmutex.o
diff --git a/kernel/locking/mcs_spinlock.c b/kernel/locking/mcs_spinlock.c
deleted file mode 100644
index 9887a90..0000000
--- a/kernel/locking/mcs_spinlock.c
+++ /dev/null
@@ -1,208 +0,0 @@
-#include <linux/percpu.h>
-#include <linux/sched.h>
-#include "mcs_spinlock.h"
-
-#ifdef CONFIG_SMP
-
-/*
- * An MCS like lock especially tailored for optimistic spinning for sleeping
- * lock implementations (mutex, rwsem, etc).
- *
- * Using a single mcs node per CPU is safe because sleeping locks should not be
- * called from interrupt context and we have preemption disabled while
- * spinning.
- */
-static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_node);
-
-/*
- * We use the value 0 to represent "no CPU", thus the encoded value
- * will be the CPU number incremented by 1.
- */
-static inline int encode_cpu(int cpu_nr)
-{
- return cpu_nr + 1;
-}
-
-static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
-{
- int cpu_nr = encoded_cpu_val - 1;
-
- return per_cpu_ptr(&osq_node, cpu_nr);
-}
-
-/*
- * Get a stable @node->next pointer, either for unlock() or unqueue() purposes.
- * Can return NULL in case we were the last queued and we updated @lock instead.
- */
-static inline struct optimistic_spin_node *
-osq_wait_next(struct optimistic_spin_queue *lock,
- struct optimistic_spin_node *node,
- struct optimistic_spin_node *prev)
-{
- struct optimistic_spin_node *next = NULL;
- int curr = encode_cpu(smp_processor_id());
- int old;
-
- /*
- * If there is a prev node in queue, then the 'old' value will be
- * the prev node's CPU #, else it's set to OSQ_UNLOCKED_VAL since if
- * we're currently last in queue, then the queue will then become empty.
- */
- old = prev ? prev->cpu : OSQ_UNLOCKED_VAL;
-
- for (;;) {
- if (atomic_read(&lock->tail) == curr &&
- atomic_cmpxchg(&lock->tail, curr, old) == curr) {
- /*
- * We were the last queued, we moved @lock back. @prev
- * will now observe @lock and will complete its
- * unlock()/unqueue().
- */
- break;
- }
-
- /*
- * We must xchg() the @node->next value, because if we were to
- * leave it in, a concurrent unlock()/unqueue() from
- * @node->next might complete Step-A and think its @prev is
- * still valid.
- *
- * If the concurrent unlock()/unqueue() wins the race, we'll
- * wait for either @lock to point to us, through its Step-B, or
- * wait for a new @node->next from its Step-C.
- */
- if (node->next) {
- next = xchg(&node->next, NULL);
- if (next)
- break;
- }
-
- cpu_relax_lowlatency();
- }
-
- return next;
-}
-
-bool osq_lock(struct optimistic_spin_queue *lock)
-{
- struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
- struct optimistic_spin_node *prev, *next;
- int curr = encode_cpu(smp_processor_id());
- int old;
-
- node->locked = 0;
- node->next = NULL;
- node->cpu = curr;
-
- old = atomic_xchg(&lock->tail, curr);
- if (old == OSQ_UNLOCKED_VAL)
- return true;
-
- prev = decode_cpu(old);
- node->prev = prev;
- ACCESS_ONCE(prev->next) = node;
-
- /*
- * Normally @prev is untouchable after the above store; because at that
- * moment unlock can proceed and wipe the node element from stack.
- *
- * However, since our nodes are static per-cpu storage, we're
- * guaranteed their existence -- this allows us to apply
- * cmpxchg in an attempt to undo our queueing.
- */
-
- while (!smp_load_acquire(&node->locked)) {
- /*
- * If we need to reschedule bail... so we can block.
- */
- if (need_resched())
- goto unqueue;
-
- cpu_relax_lowlatency();
- }
- return true;
-
-unqueue:
- /*
- * Step - A -- stabilize @prev
- *
- * Undo our @prev->next assignment; this will make @prev's
- * unlock()/unqueue() wait for a next pointer since @lock points to us
- * (or later).
- */
-
- for (;;) {
- if (prev->next == node &&
- cmpxchg(&prev->next, node, NULL) == node)
- break;
-
- /*
- * We can only fail the cmpxchg() racing against an unlock(),
- * in which case we should observe @node->locked becomming
- * true.
- */
- if (smp_load_acquire(&node->locked))
- return true;
-
- cpu_relax_lowlatency();
-
- /*
- * Or we race against a concurrent unqueue()'s step-B, in which
- * case its step-C will write us a new @node->prev pointer.
- */
- prev = ACCESS_ONCE(node->prev);
- }
-
- /*
- * Step - B -- stabilize @next
- *
- * Similar to unlock(), wait for @node->next or move @lock from @node
- * back to @prev.
- */
-
- next = osq_wait_next(lock, node, prev);
- if (!next)
- return false;
-
- /*
- * Step - C -- unlink
- *
- * @prev is stable because its still waiting for a new @prev->next
- * pointer, @next is stable because our @node->next pointer is NULL and
- * it will wait in Step-A.
- */
-
- ACCESS_ONCE(next->prev) = prev;
- ACCESS_ONCE(prev->next) = next;
-
- return false;
-}
-
-void osq_unlock(struct optimistic_spin_queue *lock)
-{
- struct optimistic_spin_node *node, *next;
- int curr = encode_cpu(smp_processor_id());
-
- /*
- * Fast path for the uncontended case.
- */
- if (likely(atomic_cmpxchg(&lock->tail, curr, OSQ_UNLOCKED_VAL) == curr))
- return;
-
- /*
- * Second most likely case.
- */
- node = this_cpu_ptr(&osq_node);
- next = xchg(&node->next, NULL);
- if (next) {
- ACCESS_ONCE(next->locked) = 1;
- return;
- }
-
- next = osq_wait_next(lock, node, NULL);
- if (next)
- ACCESS_ONCE(next->locked) = 1;
-}
-
-#endif
-
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 4d60986..d1fe2ba 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -108,20 +108,4 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
arch_mcs_spin_unlock_contended(&next->locked);
}
-/*
- * Cancellable version of the MCS lock above.
- *
- * Intended for adaptive spinning of sleeping locks:
- * mutex_lock()/rwsem_down_{read,write}() etc.
- */
-
-struct optimistic_spin_node {
- struct optimistic_spin_node *next, *prev;
- int locked; /* 1 if lock acquired */
- int cpu; /* encoded CPU # value */
-};
-
-extern bool osq_lock(struct optimistic_spin_queue *lock);
-extern void osq_unlock(struct optimistic_spin_queue *lock);
-
#endif /* __LINUX_MCS_SPINLOCK_H */
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
new file mode 100644
index 0000000..ec83d4d
--- /dev/null
+++ b/kernel/locking/osq_lock.c
@@ -0,0 +1,203 @@
+#include <linux/percpu.h>
+#include <linux/sched.h>
+#include <linux/osq_lock.h>
+
+/*
+ * An MCS like lock especially tailored for optimistic spinning for sleeping
+ * lock implementations (mutex, rwsem, etc).
+ *
+ * Using a single mcs node per CPU is safe because sleeping locks should not be
+ * called from interrupt context and we have preemption disabled while
+ * spinning.
+ */
+static DEFINE_PER_CPU_SHARED_ALIGNED(struct optimistic_spin_node, osq_node);
+
+/*
+ * We use the value 0 to represent "no CPU", thus the encoded value
+ * will be the CPU number incremented by 1.
+ */
+static inline int encode_cpu(int cpu_nr)
+{
+ return cpu_nr + 1;
+}
+
+static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
+{
+ int cpu_nr = encoded_cpu_val - 1;
+
+ return per_cpu_ptr(&osq_node, cpu_nr);
+}
+
+/*
+ * Get a stable @node->next pointer, either for unlock() or unqueue() purposes.
+ * Can return NULL in case we were the last queued and we updated @lock instead.
+ */
+static inline struct optimistic_spin_node *
+osq_wait_next(struct optimistic_spin_queue *lock,
+ struct optimistic_spin_node *node,
+ struct optimistic_spin_node *prev)
+{
+ struct optimistic_spin_node *next = NULL;
+ int curr = encode_cpu(smp_processor_id());
+ int old;
+
+ /*
+ * If there is a prev node in queue, then the 'old' value will be
+ * the prev node's CPU #, else it's set to OSQ_UNLOCKED_VAL since if
+ * we're currently last in queue, then the queue will then become empty.
+ */
+ old = prev ? prev->cpu : OSQ_UNLOCKED_VAL;
+
+ for (;;) {
+ if (atomic_read(&lock->tail) == curr &&
+ atomic_cmpxchg(&lock->tail, curr, old) == curr) {
+ /*
+ * We were the last queued, we moved @lock back. @prev
+ * will now observe @lock and will complete its
+ * unlock()/unqueue().
+ */
+ break;
+ }
+
+ /*
+ * We must xchg() the @node->next value, because if we were to
+ * leave it in, a concurrent unlock()/unqueue() from
+ * @node->next might complete Step-A and think its @prev is
+ * still valid.
+ *
+ * If the concurrent unlock()/unqueue() wins the race, we'll
+ * wait for either @lock to point to us, through its Step-B, or
+ * wait for a new @node->next from its Step-C.
+ */
+ if (node->next) {
+ next = xchg(&node->next, NULL);
+ if (next)
+ break;
+ }
+
+ cpu_relax_lowlatency();
+ }
+
+ return next;
+}
+
+bool osq_lock(struct optimistic_spin_queue *lock)
+{
+ struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
+ struct optimistic_spin_node *prev, *next;
+ int curr = encode_cpu(smp_processor_id());
+ int old;
+
+ node->locked = 0;
+ node->next = NULL;
+ node->cpu = curr;
+
+ old = atomic_xchg(&lock->tail, curr);
+ if (old == OSQ_UNLOCKED_VAL)
+ return true;
+
+ prev = decode_cpu(old);
+ node->prev = prev;
+ ACCESS_ONCE(prev->next) = node;
+
+ /*
+ * Normally @prev is untouchable after the above store; because at that
+ * moment unlock can proceed and wipe the node element from stack.
+ *
+ * However, since our nodes are static per-cpu storage, we're
+ * guaranteed their existence -- this allows us to apply
+ * cmpxchg in an attempt to undo our queueing.
+ */
+
+ while (!smp_load_acquire(&node->locked)) {
+ /*
+ * If we need to reschedule bail... so we can block.
+ */
+ if (need_resched())
+ goto unqueue;
+
+ cpu_relax_lowlatency();
+ }
+ return true;
+
+unqueue:
+ /*
+ * Step - A -- stabilize @prev
+ *
+ * Undo our @prev->next assignment; this will make @prev's
+ * unlock()/unqueue() wait for a next pointer since @lock points to us
+ * (or later).
+ */
+
+ for (;;) {
+ if (prev->next == node &&
+ cmpxchg(&prev->next, node, NULL) == node)
+ break;
+
+ /*
+ * We can only fail the cmpxchg() racing against an unlock(),
+ * in which case we should observe @node->locked becomming
+ * true.
+ */
+ if (smp_load_acquire(&node->locked))
+ return true;
+
+ cpu_relax_lowlatency();
+
+ /*
+ * Or we race against a concurrent unqueue()'s step-B, in which
+ * case its step-C will write us a new @node->prev pointer.
+ */
+ prev = ACCESS_ONCE(node->prev);
+ }
+
+ /*
+ * Step - B -- stabilize @next
+ *
+ * Similar to unlock(), wait for @node->next or move @lock from @node
+ * back to @prev.
+ */
+
+ next = osq_wait_next(lock, node, prev);
+ if (!next)
+ return false;
+
+ /*
+ * Step - C -- unlink
+ *
+ * @prev is stable because its still waiting for a new @prev->next
+ * pointer, @next is stable because our @node->next pointer is NULL and
+ * it will wait in Step-A.
+ */
+
+ ACCESS_ONCE(next->prev) = prev;
+ ACCESS_ONCE(prev->next) = next;
+
+ return false;
+}
+
+void osq_unlock(struct optimistic_spin_queue *lock)
+{
+ struct optimistic_spin_node *node, *next;
+ int curr = encode_cpu(smp_processor_id());
+
+ /*
+ * Fast path for the uncontended case.
+ */
+ if (likely(atomic_cmpxchg(&lock->tail, curr, OSQ_UNLOCKED_VAL) == curr))
+ return;
+
+ /*
+ * Second most likely case.
+ */
+ node = this_cpu_ptr(&osq_node);
+ next = xchg(&node->next, NULL);
+ if (next) {
+ ACCESS_ONCE(next->locked) = 1;
+ return;
+ }
+
+ next = osq_wait_next(lock, node, NULL);
+ if (next)
+ ACCESS_ONCE(next->locked) = 1;
+}
--
2.1.2
* [PATCH 5/6] locking: Use [READ,ASSIGN]_ONCE() for non-scalar types
From: Davidlohr Bueso @ 2015-01-06 19:45 UTC
To: Peter Zijlstra, Ingo Molnar
Cc: Paul E. McKenney, Davidlohr Bueso, linux-kernel,
Christian Borntraeger, Davidlohr Bueso
ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145).
Change the generic locking code to replace ACCESS_ONCE with the
new calls. I guess everyone will eventually have to update, so
let's see what happens; we also become the first users of ASSIGN_ONCE.
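For reference, the conversion pattern used throughout the series; note
that ASSIGN_ONCE() takes the value first and the target lvalue second,
the reverse of a plain assignment (a sketch reusing the osq node fields):

	struct optimistic_spin_node *next;

	/* old style: load and store through a volatile-cast lvalue */
	next = ACCESS_ONCE(node->next);
	ACCESS_ONCE(prev->next) = node;

	/* new style: explicit one-shot load and store */
	next = READ_ONCE(node->next);
	ASSIGN_ONCE(node, prev->next);

Note the accesses converted here are pointers, i.e. scalars; the switch
is preemptive, per the "everyone will eventually have to update" above.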
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
---
kernel/locking/mcs_spinlock.h | 6 +++---
kernel/locking/mutex.c | 8 ++++----
kernel/locking/osq_lock.c | 8 ++++----
kernel/locking/rwsem-xadd.c | 4 ++--
4 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index d1fe2ba..903009a 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -78,7 +78,7 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
*/
return;
}
- ACCESS_ONCE(prev->next) = node;
+ ASSIGN_ONCE(node, prev->next);
/* Wait until the lock holder passes the lock down. */
arch_mcs_spin_lock_contended(&node->locked);
@@ -91,7 +91,7 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
static inline
void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
{
- struct mcs_spinlock *next = ACCESS_ONCE(node->next);
+ struct mcs_spinlock *next = READ_ONCE(node->next);
if (likely(!next)) {
/*
@@ -100,7 +100,7 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
if (likely(cmpxchg(lock, node, NULL) == node))
return;
/* Wait until the next pointer is set */
- while (!(next = ACCESS_ONCE(node->next)))
+ while (!(next = READ_ONCE(node->next)))
cpu_relax_lowlatency();
}
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 2ac48e0..0082705 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -269,7 +269,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
return 0;
rcu_read_lock();
- owner = ACCESS_ONCE(lock->owner);
+ owner = READ_ONCE(lock->owner);
if (owner)
retval = owner->on_cpu;
rcu_read_unlock();
@@ -343,7 +343,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
* As such, when deadlock detection needs to be
* performed the optimistic spinning cannot be done.
*/
- if (ACCESS_ONCE(ww->ctx))
+ if (READ_ONCE(ww->ctx))
break;
}
@@ -351,7 +351,7 @@ static bool mutex_optimistic_spin(struct mutex *lock,
* If there's an owner, wait for it to either
* release the lock or go to sleep.
*/
- owner = ACCESS_ONCE(lock->owner);
+ owner = READ_ONCE(lock->owner);
if (owner && !mutex_spin_on_owner(lock, owner))
break;
@@ -490,7 +490,7 @@ static inline int __sched
__ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
{
struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
- struct ww_acquire_ctx *hold_ctx = ACCESS_ONCE(ww->ctx);
+ struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
if (!hold_ctx)
return 0;
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index ec83d4d..9c6e251 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -98,7 +98,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
prev = decode_cpu(old);
node->prev = prev;
- ACCESS_ONCE(prev->next) = node;
+ ASSIGN_ONCE(node, prev->next);
/*
* Normally @prev is untouchable after the above store; because at that
@@ -148,7 +148,7 @@ unqueue:
* Or we race against a concurrent unqueue()'s step-B, in which
* case its step-C will write us a new @node->prev pointer.
*/
- prev = ACCESS_ONCE(node->prev);
+ prev = READ_ONCE(node->prev);
}
/*
@@ -170,8 +170,8 @@ unqueue:
* it will wait in Step-A.
*/
- ACCESS_ONCE(next->prev) = prev;
- ACCESS_ONCE(prev->next) = next;
+ ASSIGN_ONCE(prev, next->prev);
+ ASSIGN_ONCE(next, prev->next);
return false;
}
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 7628c3f..2e651f6 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -294,7 +294,7 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
return false;
rcu_read_lock();
- owner = ACCESS_ONCE(sem->owner);
+ owner = READ_ONCE(sem->owner);
if (owner)
on_cpu = owner->on_cpu;
rcu_read_unlock();
@@ -359,7 +359,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
goto done;
while (true) {
- owner = ACCESS_ONCE(sem->owner);
+ owner = READ_ONCE(sem->owner);
if (owner && !rwsem_spin_on_owner(sem, owner))
break;
--
2.1.2
* [PATCH 6/6] locking/osq: No need for load/acquire when acquire-polling
From: Davidlohr Bueso @ 2015-01-06 19:45 UTC
To: Peter Zijlstra, Ingo Molnar
Cc: Paul E. McKenney, Davidlohr Bueso, linux-kernel, Davidlohr Bueso
Both mutexes and rwsems took a performance hit when we switched
over from the original mcs code to the cancelable variant (osq).
The reason is the use of smp_load_acquire() when polling for
node->locked. This is not needed, as reordering is not an issue;
as such, relax the barrier semantics. Paul describes the scenario
nicely: https://lkml.org/lkml/2013/11/19/405
o If we start polling before the insertion is complete, all that
happens is that the first few polls have no chance of seeing a lock
grant.
o Ordering the polling against the initialization -- the above
xchg() is already doing that for us.
The smp_load_acquire() when unqueuing makes sense. In addition,
we don't need to worry about leaking the critical region, as
osq is only used internally.
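Concretely, in the resulting osq_lock() the hot polling loop drops the
acquire, while the unqueue race in Step-A keeps its smp_load_acquire(),
which per the above still makes sense (the polling hunk is in the diff
below; Step-A is visible in the osq_lock.c added by patch 4):

	/* polling: a plain once-read is enough */
	while (!ACCESS_ONCE(node->locked)) {
		if (need_resched())
			goto unqueue;
		cpu_relax_lowlatency();
	}

	/* unqueue, Step-A: keep the acquire on a racing lock grant */
	if (smp_load_acquire(&node->locked))
		return true;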
This impacts both regular and large levels of concurrency,
i.e. on a 40-core system with a disk-intensive workload (baseline vs. patched):
disk-1 804.83 ( 0.00%) 828.16 ( 2.90%)
disk-61 8063.45 ( 0.00%) 18181.82 (125.48%)
disk-121 7187.41 ( 0.00%) 20119.17 (179.92%)
disk-181 6933.32 ( 0.00%) 20509.91 (195.82%)
disk-241 6850.81 ( 0.00%) 20397.80 (197.74%)
disk-301 6815.22 ( 0.00%) 20287.58 (197.68%)
disk-361 7080.40 ( 0.00%) 20205.22 (185.37%)
disk-421 7076.13 ( 0.00%) 19957.33 (182.04%)
disk-481 7083.25 ( 0.00%) 19784.06 (179.31%)
disk-541 7038.39 ( 0.00%) 19610.92 (178.63%)
disk-601 7072.04 ( 0.00%) 19464.53 (175.23%)
disk-661 7010.97 ( 0.00%) 19348.23 (175.97%)
disk-721 7069.44 ( 0.00%) 19255.33 (172.37%)
disk-781 7007.58 ( 0.00%) 19103.14 (172.61%)
disk-841 6981.18 ( 0.00%) 18964.22 (171.65%)
disk-901 6968.47 ( 0.00%) 18826.72 (170.17%)
disk-961 6964.61 ( 0.00%) 18708.02 (168.62%)
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
kernel/locking/osq_lock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 9c6e251..aa56e3c 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -109,7 +109,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
* cmpxchg in an attempt to undo our queueing.
*/
- while (!smp_load_acquire(&node->locked)) {
+ while (!ACCESS_ONCE(node->locked)) {
/*
* If we need to reschedule bail... so we can block.
*/
--
2.1.2
* Re: [PATCH -tip v2 0/6] locking: Various updates
From: Peter Zijlstra @ 2015-01-07 11:52 UTC
To: Davidlohr Bueso; +Cc: Ingo Molnar, Paul E. McKenney, linux-kernel
Thanks!
* [tip:locking/core] locking/mutex: Checking the stamp is WW only
From: tip-bot for Davidlohr Bueso @ 2015-01-14 19:18 UTC
To: linux-tip-commits
Cc: torvalds, hpa, mingo, peterz, paulmck, tglx, dave, dbueso,
linux-kernel
Commit-ID: 63dc47e956b464e0ed3282f6e70974eebf850180
Gitweb: http://git.kernel.org/tip/63dc47e956b464e0ed3282f6e70974eebf850180
Author: Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Tue, 6 Jan 2015 11:45:04 -0800
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 14 Jan 2015 15:07:21 +0100
locking/mutex: Checking the stamp is WW only
Mark it so by renaming __mutex_lock_check_stamp().
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1420573509-24774-2-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/locking/mutex.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 4541951..b042ea5 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -469,7 +469,7 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
EXPORT_SYMBOL(ww_mutex_unlock);
static inline int __sched
-__mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
+__ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
{
struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
struct ww_acquire_ctx *hold_ctx = ACCESS_ONCE(ww->ctx);
@@ -557,7 +557,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
}
if (use_ww_ctx && ww_ctx->acquired > 0) {
- ret = __mutex_lock_check_stamp(lock, ww_ctx);
+ ret = __ww_mutex_lock_check_stamp(lock, ww_ctx);
if (ret)
goto err;
}
* [tip:locking/core] locking/mutex: Move MCS related comments to proper location
From: tip-bot for Davidlohr Bueso @ 2015-01-14 19:18 UTC
To: linux-tip-commits
Cc: linux-kernel, hpa, peterz, torvalds, paulmck, tglx, dbueso, mingo,
dave
Commit-ID: e42f678a0237f84f0004fbaf0fad0b844751eadd
Gitweb: http://git.kernel.org/tip/e42f678a0237f84f0004fbaf0fad0b844751eadd
Author: Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Tue, 6 Jan 2015 11:45:05 -0800
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 14 Jan 2015 15:07:22 +0100
locking/mutex: Move MCS related comments to proper location
It serves much better if the comments are right before the osq_lock() call.
Also delete a useless comment.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1420573509-24774-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/locking/mutex.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index b042ea5..6db3d0d 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -193,17 +193,6 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
-/*
- * In order to avoid a stampede of mutex spinners from acquiring the mutex
- * more or less simultaneously, the spinners need to acquire a MCS lock
- * first before spinning on the owner field.
- *
- */
-
-/*
- * Mutex spinning code migrated from kernel/sched/core.c
- */
-
static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
{
if (lock->owner != owner)
@@ -307,6 +296,11 @@ static bool mutex_optimistic_spin(struct mutex *lock,
if (!mutex_can_spin_on_owner(lock))
goto done;
+ /*
+ * In order to avoid a stampede of mutex spinners trying to
+ * acquire the mutex all at once, the spinners need to take a
+ * MCS (queued) lock first before spinning on the owner field.
+ */
if (!osq_lock(&lock->osq))
goto done;
* [tip:locking/core] locking/mutex: Introduce ww_mutex_set_context_slowpath()
From: tip-bot for Davidlohr Bueso @ 2015-01-14 19:19 UTC
To: linux-tip-commits
Cc: hpa, torvalds, tglx, dave, linux-kernel, mingo, dbueso, paulmck,
peterz
Commit-ID: 4bd19084faa61a8c68586e74f03f5776179f65c2
Gitweb: http://git.kernel.org/tip/4bd19084faa61a8c68586e74f03f5776179f65c2
Author: Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Tue, 6 Jan 2015 11:45:06 -0800
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 14 Jan 2015 15:07:30 +0100
locking/mutex: Introduce ww_mutex_set_context_slowpath()
... which is equivalent to the fastpath counterpart.
This mainly allows getting some WW specific code out
of generic mutex paths.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1420573509-24774-4-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/locking/mutex.c | 44 ++++++++++++++++++++++++++------------------
1 file changed, 26 insertions(+), 18 deletions(-)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 6db3d0d..c67a60b 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -147,7 +147,7 @@ static __always_inline void ww_mutex_lock_acquired(struct ww_mutex *ww,
}
/*
- * after acquiring lock with fastpath or when we lost out in contested
+ * After acquiring lock with fastpath or when we lost out in contested
* slowpath, set ctx and wake up any waiters so they can recheck.
*
* This function is never called when CONFIG_DEBUG_LOCK_ALLOC is set,
@@ -191,6 +191,30 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
spin_unlock_mutex(&lock->base.wait_lock, flags);
}
+/*
+ * After acquiring lock in the slowpath set ctx and wake up any
+ * waiters so they can recheck.
+ *
+ * Callers must hold the mutex wait_lock.
+ */
+static __always_inline void
+ww_mutex_set_context_slowpath(struct ww_mutex *lock,
+ struct ww_acquire_ctx *ctx)
+{
+ struct mutex_waiter *cur;
+
+ ww_mutex_lock_acquired(lock, ctx);
+ lock->ctx = ctx;
+
+ /*
+ * Give any possible sleeping processes the chance to wake up,
+ * so they can recheck if they have to back off.
+ */
+ list_for_each_entry(cur, &lock->base.wait_list, list) {
+ debug_mutex_wake_waiter(&lock->base, cur);
+ wake_up_process(cur->task);
+ }
+}
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
@@ -576,23 +600,7 @@ skip_wait:
if (use_ww_ctx) {
struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
- struct mutex_waiter *cur;
-
- /*
- * This branch gets optimized out for the common case,
- * and is only important for ww_mutex_lock.
- */
- ww_mutex_lock_acquired(ww, ww_ctx);
- ww->ctx = ww_ctx;
-
- /*
- * Give any possible sleeping processes the chance to wake up,
- * so they can recheck if they have to back off.
- */
- list_for_each_entry(cur, &lock->wait_list, list) {
- debug_mutex_wake_waiter(lock, cur);
- wake_up_process(cur->task);
- }
+ ww_mutex_set_context_slowpath(ww, ww_ctx);
}
spin_unlock_mutex(&lock->wait_lock, flags);
* [tip:locking/core] locking/mcs: Better differentiate between MCS variants
From: tip-bot for Davidlohr Bueso @ 2015-01-14 19:19 UTC
To: linux-tip-commits
Cc: dbueso, paulmck, Waiman.Long, peterz, linux-kernel, torvalds,
tglx, mingo, mpatocka, jason.low2, hpa, dave
Commit-ID: d84b6728c54dcf73bcef3e3f7cf6767e2d224e39
Gitweb: http://git.kernel.org/tip/d84b6728c54dcf73bcef3e3f7cf6767e2d224e39
Author: Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Tue, 6 Jan 2015 11:45:07 -0800
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 14 Jan 2015 15:07:32 +0100
locking/mcs: Better differentiate between MCS variants
We have two flavors of the MCS spinlock: standard and cancelable (OSQ).
While each one is independent of the other, we currently mix and match
them. This patch:
- Moves the OSQ code out of mcs_spinlock.h (which only deals with the traditional
version) into include/linux/osq_lock.h. No unnecessary code is added to the
more global header file; any locks that make use of OSQ must include
it anyway.
- Renames mcs_spinlock.c to osq_lock.c. This file only contains osq code.
- Introduces a CONFIG_LOCK_SPIN_ON_OWNER in order to only build osq_lock
if there is support for it.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Link: http://lkml.kernel.org/r/1420573509-24774-5-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
include/linux/osq_lock.h | 12 ++++++++++--
kernel/Kconfig.locks | 4 ++++
kernel/locking/Makefile | 3 ++-
kernel/locking/mcs_spinlock.h | 16 ----------------
kernel/locking/{mcs_spinlock.c => osq_lock.c} | 7 +------
5 files changed, 17 insertions(+), 25 deletions(-)
diff --git a/include/linux/osq_lock.h b/include/linux/osq_lock.h
index 90230d5..3a6490e 100644
--- a/include/linux/osq_lock.h
+++ b/include/linux/osq_lock.h
@@ -5,8 +5,11 @@
* An MCS like lock especially tailored for optimistic spinning for sleeping
* lock implementations (mutex, rwsem, etc).
*/
-
-#define OSQ_UNLOCKED_VAL (0)
+struct optimistic_spin_node {
+ struct optimistic_spin_node *next, *prev;
+ int locked; /* 1 if lock acquired */
+ int cpu; /* encoded CPU # + 1 value */
+};
struct optimistic_spin_queue {
/*
@@ -16,6 +19,8 @@ struct optimistic_spin_queue {
atomic_t tail;
};
+#define OSQ_UNLOCKED_VAL (0)
+
/* Init macro and function. */
#define OSQ_LOCK_UNLOCKED { ATOMIC_INIT(OSQ_UNLOCKED_VAL) }
@@ -24,4 +29,7 @@ static inline void osq_lock_init(struct optimistic_spin_queue *lock)
atomic_set(&lock->tail, OSQ_UNLOCKED_VAL);
}
+extern bool osq_lock(struct optimistic_spin_queue *lock);
+extern void osq_unlock(struct optimistic_spin_queue *lock);
+
#endif
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index 76768ee..08561f1 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -231,6 +231,10 @@ config RWSEM_SPIN_ON_OWNER
def_bool y
depends on SMP && RWSEM_XCHGADD_ALGORITHM && ARCH_SUPPORTS_ATOMIC_RMW
+config LOCK_SPIN_ON_OWNER
+ def_bool y
+ depends on MUTEX_SPIN_ON_OWNER || RWSEM_SPIN_ON_OWNER
+
config ARCH_USE_QUEUE_RWLOCK
bool
diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index 8541bfd..4ca8eb1 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -1,5 +1,5 @@
-obj-y += mutex.o semaphore.o rwsem.o mcs_spinlock.o
+obj-y += mutex.o semaphore.o rwsem.o
ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_lockdep.o = -pg
@@ -14,6 +14,7 @@ ifeq ($(CONFIG_PROC_FS),y)
obj-$(CONFIG_LOCKDEP) += lockdep_proc.o
endif
obj-$(CONFIG_SMP) += spinlock.o
+obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
obj-$(CONFIG_SMP) += lglock.o
obj-$(CONFIG_PROVE_LOCKING) += spinlock.o
obj-$(CONFIG_RT_MUTEXES) += rtmutex.o
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 4d60986..d1fe2ba 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -108,20 +108,4 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
arch_mcs_spin_unlock_contended(&next->locked);
}
-/*
- * Cancellable version of the MCS lock above.
- *
- * Intended for adaptive spinning of sleeping locks:
- * mutex_lock()/rwsem_down_{read,write}() etc.
- */
-
-struct optimistic_spin_node {
- struct optimistic_spin_node *next, *prev;
- int locked; /* 1 if lock acquired */
- int cpu; /* encoded CPU # value */
-};
-
-extern bool osq_lock(struct optimistic_spin_queue *lock);
-extern void osq_unlock(struct optimistic_spin_queue *lock);
-
#endif /* __LINUX_MCS_SPINLOCK_H */
diff --git a/kernel/locking/mcs_spinlock.c b/kernel/locking/osq_lock.c
similarity index 98%
rename from kernel/locking/mcs_spinlock.c
rename to kernel/locking/osq_lock.c
index 9887a90..ec83d4d 100644
--- a/kernel/locking/mcs_spinlock.c
+++ b/kernel/locking/osq_lock.c
@@ -1,8 +1,6 @@
#include <linux/percpu.h>
#include <linux/sched.h>
-#include "mcs_spinlock.h"
-
-#ifdef CONFIG_SMP
+#include <linux/osq_lock.h>
/*
* An MCS like lock especially tailored for optimistic spinning for sleeping
@@ -203,6 +201,3 @@ void osq_unlock(struct optimistic_spin_queue *lock)
if (next)
ACCESS_ONCE(next->locked) = 1;
}
-
-#endif
-
* [tip:locking/core] locking/osq: No need for load/acquire when acquire-polling
From: tip-bot for Davidlohr Bueso @ 2015-01-14 19:20 UTC
To: linux-tip-commits
Cc: hpa, tglx, mingo, dbueso, peterz, paulmck, linux-kernel, torvalds,
dave
Commit-ID: 036cc30c6b6af1cd42de6c34c4461f17da01cbf7
Gitweb: http://git.kernel.org/tip/036cc30c6b6af1cd42de6c34c4461f17da01cbf7
Author: Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Tue, 6 Jan 2015 11:45:09 -0800
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 14 Jan 2015 15:16:20 +0100
locking/osq: No need for load/acquire when acquire-polling
Both mutexes and rwsems took a performance hit when we switched
over from the original mcs code to the cancelable variant (osq).
The reason is the use of smp_load_acquire() when polling for
node->locked. This is not needed, as reordering is not an issue;
as such, relax the barrier semantics. Paul describes the scenario
nicely: https://lkml.org/lkml/2013/11/19/405
- If we start polling before the insertion is complete, all that
happens is that the first few polls have no chance of seeing a lock
grant.
- Ordering the polling against the initialization -- the above
xchg() is already doing that for us.
The smp_load_acquire() when unqueuing makes sense. In addition,
we don't need to worry about leaking the critical region, as
osq is only used internally.
This impacts both regular and large levels of concurrency,
i.e. on a 40-core system with a disk-intensive workload (baseline vs. patched):
disk-1 804.83 ( 0.00%) 828.16 ( 2.90%)
disk-61 8063.45 ( 0.00%) 18181.82 (125.48%)
disk-121 7187.41 ( 0.00%) 20119.17 (179.92%)
disk-181 6933.32 ( 0.00%) 20509.91 (195.82%)
disk-241 6850.81 ( 0.00%) 20397.80 (197.74%)
disk-301 6815.22 ( 0.00%) 20287.58 (197.68%)
disk-361 7080.40 ( 0.00%) 20205.22 (185.37%)
disk-421 7076.13 ( 0.00%) 19957.33 (182.04%)
disk-481 7083.25 ( 0.00%) 19784.06 (179.31%)
disk-541 7038.39 ( 0.00%) 19610.92 (178.63%)
disk-601 7072.04 ( 0.00%) 19464.53 (175.23%)
disk-661 7010.97 ( 0.00%) 19348.23 (175.97%)
disk-721 7069.44 ( 0.00%) 19255.33 (172.37%)
disk-781 7007.58 ( 0.00%) 19103.14 (172.61%)
disk-841 6981.18 ( 0.00%) 18964.22 (171.65%)
disk-901 6968.47 ( 0.00%) 18826.72 (170.17%)
disk-961 6964.61 ( 0.00%) 18708.02 (168.62%)
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1420573509-24774-7-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/locking/osq_lock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index ec83d4d..c112d00 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -109,7 +109,7 @@ bool osq_lock(struct optimistic_spin_queue *lock)
* cmpxchg in an attempt to undo our queueing.
*/
- while (!smp_load_acquire(&node->locked)) {
+ while (!ACCESS_ONCE(node->locked)) {
/*
* If we need to reschedule bail... so we can block.
*/