[PATCH -tip] locking/rtmutex: Reduce top-waiter blocking on a lock

From: Davidlohr Bueso <dave@stgolabs.net>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <waiman.long@hpe.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	Ingo Molnar <mingo@kernel.org>, Jonathan Corbet <corbet@lwn.net>,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	Jason Low <jason.low2@hpe.com>,
	Scott J Norton <scott.norton@hpe.com>,
	Douglas Hatch <doug.hatch@hpe.com>
Subject: [PATCH -tip] locking/rtmutex: Reduce top-waiter blocking on a lock
Date: Fri, 23 Sep 2016 18:28:03 -0700
Message-ID: <20160924012803.GC30291@linux-80c1.suse>
In-Reply-To: <alpine.DEB.2.20.1609222339180.5640@nanos>

By applying well-known spin-on-lock-owner techniques, we can avoid the
blocking overhead when a task is trying to take the rtmutex. The idea is
that as long as the owner is running, there is a fair chance it'll release
the lock soon, and thus a task trying to acquire the rtmutex is better off
spinning than blocking immediately after the fastpath fails. This is similar
to what we use for other locks, borrowed from -rt. The main difference (due
to the obvious real-time constraints) is that top-waiter spinning must
account for any new higher priority waiter, and therefore cannot steal the
lock and avoid the pi-dance. As such, there will be at most one spinning
waiter on a contended lock.
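
As a rough illustration of the "only the top waiter spins" rule, here is a
small userspace sketch. Everything in it (model_waiter, model_waitq,
model_enqueue(), model_should_spin()) is hypothetical and only loosely
mirrors the top_waiter != waiter check that the patch adds to
__rt_mutex_slowlock(); it is not the rtmutex code. Lower prio values mean
higher priority, as in the kernel, so a newly queued higher-priority waiter
takes over the top slot and the previous top waiter loses the right to spin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waiter: lower prio value = higher priority (kernel style). */
struct model_waiter {
	int prio;
};

/* Hypothetical wait queue kept sorted by priority; w[0] is the top waiter. */
struct model_waitq {
	struct model_waiter *w[8];
	size_t n;
};

/*
 * Insert in priority order. A newly arriving higher-priority waiter
 * displaces the current top waiter; equal priorities stay FIFO.
 */
static void model_enqueue(struct model_waitq *q, struct model_waiter *w)
{
	size_t i;

	assert(q->n < sizeof(q->w) / sizeof(q->w[0]));
	for (i = q->n; i > 0 && q->w[i - 1]->prio > w->prio; i--)
		q->w[i] = q->w[i - 1];
	q->w[i] = w;
	q->n++;
}

static struct model_waiter *model_top_waiter(const struct model_waitq *q)
{
	return q->n ? q->w[0] : NULL;
}

/* Only the top waiter may spin; every other waiter blocks right away. */
static bool model_should_spin(const struct model_waitq *q,
			      const struct model_waiter *self)
{
	return model_top_waiter(q) == self;
}
```

Because the decision is taken per slowpath iteration, a waiter that was
displaced simply goes to sleep on its next pass instead of spinning.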

Conditions to stop spinning and block are simple:

(1) Upon need_resched()
(2) Current lock owner blocks
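
A minimal userspace sketch of the spin loop and its two stop conditions,
assuming C11 atomics as a stand-in for the kernel primitives. model_task,
model_lock and model_resched_pending are hypothetical: on_cpu plays the role
of owner->on_cpu and model_resched_pending that of need_resched(). Unlike
rt_mutex_spin_on_owner() in the patch, there is no RCU protection here, so
the sketch simply assumes the owner object stays valid while it is watched.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for task_struct and rt_mutex. */
struct model_task {
	atomic_bool on_cpu;	/* is the task currently running? */
};

struct model_lock {
	_Atomic(struct model_task *) owner;
};

/* Hypothetical stand-in for need_resched(); static, so starts out false. */
static atomic_bool model_resched_pending;

/*
 * Returns true if the caller should retry the acquisition immediately
 * (the lock was released or changed hands), false if it should block.
 */
static bool model_spin_on_owner(struct model_lock *lock,
				struct model_task *owner)
{
	assert(lock != NULL);

	/* The last owner just released the lock: retry right away. */
	if (owner == NULL)
		return true;

	while (atomic_load_explicit(&lock->owner, memory_order_relaxed) == owner) {
		/* Stop condition (2): the owner is no longer running. */
		if (!atomic_load_explicit(&owner->on_cpu, memory_order_relaxed))
			return false;
		/* Stop condition (1): we must give up the CPU. */
		if (atomic_load_explicit(&model_resched_pending,
					 memory_order_relaxed))
			return false;
		/* The kernel version calls cpu_relax_lowlatency() here. */
	}
	return true;
}
```

A true return means "retry the acquisition now"; false means "go to sleep",
matching how the patch only falls through to schedule() when spinning fails.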

The unlock side remains unchanged, as wake_up_process() can safely deal with
calls where the task is not actually blocked (TASK_NORMAL). The biggest
concern would perhaps be if we relied on any implicit barriers (in that the
wake_up_process() call would no longer imply them, since nothing was
awoken), but this is not the case. As such, the only cost is some
unnecessary overhead in dealing with the wake_q, but this ensures we do not
miss any wakeups between the spinning step and the unlock side.

Measuring the number of priority inversions in the pi_stress program over a
30-second window shows some improvement in throughput. On a 32-core box, with
increasing thread-group count:

pistress
			    4.4.3                 4.4.3
			  vanilla           rtmutex-topspinner
Hmean    1   2321586.73 (  0.00%)  2339847.23 (  0.79%)
Hmean    4   8209026.49 (  0.00%)  8597809.55 (  4.74%)
Hmean    7  12655322.45 (  0.00%) 13194896.45 (  4.26%)
Hmean    12  4210477.03 (  0.00%)  4348643.08 (  3.28%)
Hmean    21  2996823.05 (  0.00%)  3104513.47 (  3.59%)
Hmean    30  2463107.53 (  0.00%)  2584275.71 (  4.91%)
Hmean    48  2656668.46 (  0.00%)  2719324.53 (  2.36%)
Hmean    64  2397253.65 (  0.00%)  2471628.92 (  3.10%)
Stddev   1    653473.88 (  0.00%)   527076.59 (-19.34%)
Stddev   4    664995.50 (  0.00%)   359487.15 (-45.94%)
Stddev   7    248476.88 (  0.00%)   278307.31 ( 12.01%)
Stddev   12    74537.42 (  0.00%)    54305.86 (-27.14%)
Stddev   21    72143.80 (  0.00%)    40371.42 (-44.04%)
Stddev   30    31981.43 (  0.00%)    42306.07 ( 32.28%)
Stddev   48    21317.95 (  0.00%)    42608.50 ( 99.87%)
Stddev   64    23433.99 (  0.00%)    21502.56 ( -8.24%)
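
For readers unfamiliar with this output format (it appears to be mmtests
comparison output), Hmean is presumably the harmonic mean of the per-run
results, and the bracketed figure is the relative change against the vanilla
kernel. The helper names below are hypothetical; the sketch only reproduces
two entries of the table: the 0.79% Hmean gain with one thread group and the
-45.94% Stddev change with four.

```c
#include <assert.h>
#include <stddef.h>

/* Harmonic mean of n positive samples: n / sum(1 / x[i]). */
static double hmean(const double *x, size_t n)
{
	double inv_sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++) {
		assert(x[i] > 0.0);
		inv_sum += 1.0 / x[i];
	}
	return (double)n / inv_sum;
}

/* Relative change of 'patched' against 'base', in percent. */
static double pct_change(double base, double patched)
{
	assert(base != 0.0);
	return (patched - base) / base * 100.0;
}
```

For example, pct_change(2321586.73, 2339847.23) is about 0.787, which the
table rounds to 0.79%.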

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---

Hi, so I've rebased the patch against -tip, and it has survived about a full
day of pistress pounding. That said, I don't have any interesting boxes
available for performance tests, so I'm keeping the results I obtained
originally, which obviously should not matter. The other difference from the
previous posting is that I've sprinkled READ/WRITE_ONCE() around lock->owner,
as we now access it without holding the wait_lock.
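
Since lock->owner is now read outside the wait_lock, it helps to recall what
the word holds: a task pointer with flag bits folded into its low bits. The
sketch below uses C11 relaxed atomics as a rough userspace analogue of
READ_ONCE()/WRITE_ONCE() (the kernel macros are volatile accesses, not C11
atomics), and assumes that RT_MUTEX_HAS_WAITERS is bit 0 and that
RT_MUTEX_OWNER_MASKALL covers only that bit. All model_* names are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: bit 0 of the owner word is the "has waiters" flag. */
#define MODEL_HAS_WAITERS	1UL
#define MODEL_OWNER_MASKALL	1UL

/* Hypothetical task; its alignment keeps bit 0 of its address clear. */
struct model_task {
	int prio;
};

struct model_lock {
	_Atomic uintptr_t owner;	/* task pointer | flag bits */
};

/* WRITE_ONCE()-like: one untorn store, no ordering guarantees. */
static void model_set_owner(struct model_lock *lock, struct model_task *t,
			    int has_waiters)
{
	uintptr_t val = (uintptr_t)t;

	assert((val & MODEL_OWNER_MASKALL) == 0);	/* must be aligned */
	if (has_waiters)
		val |= MODEL_HAS_WAITERS;
	atomic_store_explicit(&lock->owner, val, memory_order_relaxed);
}

/* READ_ONCE()-like load, then strip the flag bits (cf. rt_mutex_owner()). */
static struct model_task *model_owner(struct model_lock *lock)
{
	uintptr_t val = atomic_load_explicit(&lock->owner, memory_order_relaxed);

	return (struct model_task *)(val & ~(uintptr_t)MODEL_OWNER_MASKALL);
}

static int model_has_waiters(struct model_lock *lock)
{
	uintptr_t val = atomic_load_explicit(&lock->owner, memory_order_relaxed);

	return (val & MODEL_HAS_WAITERS) != 0;
}
```

Storing a NULL owner with the waiters bit set, as mark_wakeup_next_waiter()
does, makes model_owner() return NULL while model_has_waiters() stays true;
that is what makes a spinner's owner comparison fail and sends it back to
retry the lock.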

 kernel/Kconfig.locks            |  4 ++
 kernel/locking/rtmutex.c        | 83 ++++++++++++++++++++++++++++++++++-------
 kernel/locking/rtmutex_common.h |  2 +-
 3 files changed, 75 insertions(+), 14 deletions(-)

diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index ebdb0043203a..e20790cdc446 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -227,6 +227,10 @@ config MUTEX_SPIN_ON_OWNER
 	def_bool y
 	depends on SMP && !DEBUG_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW
 
+config RT_MUTEX_SPIN_ON_OWNER
+       def_bool y
+       depends on SMP && RT_MUTEXES && !DEBUG_RT_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW
+
 config RWSEM_SPIN_ON_OWNER
        def_bool y
        depends on SMP && RWSEM_XCHGADD_ALGORITHM && ARCH_SUPPORTS_ATOMIC_RMW
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 1ec0f48962b3..282a773d1563 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -54,13 +54,13 @@ rt_mutex_set_owner(struct rt_mutex *lock, struct task_struct *owner)
 	if (rt_mutex_has_waiters(lock))
 		val |= RT_MUTEX_HAS_WAITERS;
 
-	lock->owner = (struct task_struct *)val;
+	WRITE_ONCE(lock->owner, (struct task_struct *)val);
 }
 
 static inline void clear_rt_mutex_waiters(struct rt_mutex *lock)
 {
-	lock->owner = (struct task_struct *)
-			((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS);
+	WRITE_ONCE(lock->owner, (struct task_struct *)
+		   ((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS));
 }
 
 static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
@@ -989,14 +989,17 @@ static void mark_wakeup_next_waiter(struct wake_q_head *wake_q,
 	rt_mutex_dequeue_pi(current, waiter);
 
 	/*
-	 * As we are waking up the top waiter, and the waiter stays
-	 * queued on the lock until it gets the lock, this lock
-	 * obviously has waiters. Just set the bit here and this has
-	 * the added benefit of forcing all new tasks into the
-	 * slow path making sure no task of lower priority than
-	 * the top waiter can steal this lock.
+	 * As we are potentially waking up the top waiter, and the waiter
+	 * stays queued on the lock until it gets the lock, this lock
+	 * obviously has waiters. Just set the bit here and this has the
+	 * added benefit of forcing all new tasks into the slow path
+	 * making sure no task of lower priority than the top waiter can
+	 * steal this lock.
+	 *
+	 * If the top waiter, otoh, is spinning on ->owner, this will also
+	 * serve to exit out of the loop and try to acquire the lock.
 	 */
-	lock->owner = (void *) RT_MUTEX_HAS_WAITERS;
+	WRITE_ONCE(lock->owner, (void *) RT_MUTEX_HAS_WAITERS);
 
 	raw_spin_unlock(&current->pi_lock);
 
@@ -1089,6 +1092,48 @@ void rt_mutex_adjust_pi(struct task_struct *task)
 				   next_lock, NULL, task);
 }
 
+#ifdef CONFIG_RT_MUTEX_SPIN_ON_OWNER
+static bool rt_mutex_spin_on_owner(struct rt_mutex *lock,
+				   struct task_struct *owner)
+{
+	bool ret = true;
+
+	/*
+	 * The last owner could have just released the lock,
+	 * immediately try taking it again.
+	 */
+	if (!owner)
+		goto done;
+
+	rcu_read_lock();
+	while (rt_mutex_owner(lock) == owner) {
+		/*
+		 * Ensure we emit the owner->on_cpu, dereference _after_
+		 * checking lock->owner still matches owner. If that fails,
+		 * owner might point to freed memory. If it still matches,
+		 * the rcu_read_lock() ensures the memory stays valid.
+		 */
+		barrier();
+		if (!owner->on_cpu || need_resched()) {
+			ret = false;
+			break;
+		}
+
+		cpu_relax_lowlatency();
+	}
+	rcu_read_unlock();
+done:
+	return ret;
+}
+
+#else
+static bool rt_mutex_spin_on_owner(struct rt_mutex *lock,
+				   struct task_struct *owner)
+{
+	return false;
+}
+#endif
+
 /**
  * __rt_mutex_slowlock() - Perform the wait-wake-try-to-take loop
  * @lock:		 the rt_mutex to take
@@ -1107,6 +1152,8 @@ __rt_mutex_slowlock(struct rt_mutex *lock, int state,
 	int ret = 0;
 
 	for (;;) {
+		struct rt_mutex_waiter *top_waiter = NULL;
+
 		/* Try to acquire the lock: */
 		if (try_to_take_rt_mutex(lock, current, waiter))
 			break;
@@ -1125,11 +1172,21 @@ __rt_mutex_slowlock(struct rt_mutex *lock, int state,
 				break;
 		}
 
+		top_waiter = rt_mutex_top_waiter(lock);
+
 		raw_spin_unlock_irq(&lock->wait_lock);
 
 		debug_rt_mutex_print_deadlock(waiter);
 
-		schedule();
+		/*
+		 * At this point the PI-dance is done, and, as the top waiter,
+		 * we are next in line for the lock. Try to spin on the current
+		 * owner for a while, in the hope that the lock will be released
+		 * soon. Otherwise fallback and block.
+		 */
+		if (top_waiter != waiter ||
+		    !rt_mutex_spin_on_owner(lock, rt_mutex_owner(lock)))
+			schedule();
 
 		raw_spin_lock_irq(&lock->wait_lock);
 		set_current_state(state);
@@ -1555,7 +1612,7 @@ EXPORT_SYMBOL_GPL(__rt_mutex_init);
  * rt_mutex_init_proxy_locked - initialize and lock a rt_mutex on behalf of a
  *				proxy owner
  *
- * @lock: 	the rt_mutex to be locked
+ * @lock:	the rt_mutex to be locked
  * @proxy_owner:the task to set as owner
  *
  * No locking. Caller has to do serializing itself
@@ -1573,7 +1630,7 @@ void rt_mutex_init_proxy_locked(struct rt_mutex *lock,
 /**
  * rt_mutex_proxy_unlock - release a lock on behalf of owner
  *
- * @lock: 	the rt_mutex to be locked
+ * @lock:	the rt_mutex to be locked
  *
  * No locking. Caller has to do serializing itself
  * Special API call for PI-futex support
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index 4f5f83c7d2d3..d30fc4797fe1 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -76,7 +76,7 @@ task_top_pi_waiter(struct task_struct *p)
 static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock)
 {
 	return (struct task_struct *)
-		((unsigned long)lock->owner & ~RT_MUTEX_OWNER_MASKALL);
+		((unsigned long)READ_ONCE(lock->owner) & ~RT_MUTEX_OWNER_MASKALL);
 }
 
 /*
-- 
2.6.6
