public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Mike Galbraith <umgwanakikbuti@gmail.com>
Subject: [PATCH 3.4 14/22] rtmutex: Plug slow unlock race
Date: Tue, 15 Jul 2014 16:17:19 -0700	[thread overview]
Message-ID: <20140715231622.540202454@linuxfoundation.org> (raw)
In-Reply-To: <20140715231622.121816073@linuxfoundation.org>

3.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit 27e35715df54cbc4f2d044f681802ae30479e7fb upstream.

When the rtmutex fast path is enabled the slow unlock function can
create the following situation:

spin_lock(foo->m->wait_lock);
foo->m->owner = NULL;
	    			rt_mutex_lock(foo->m); <-- fast path
				free = atomic_dec_and_test(foo->refcnt);
				rt_mutex_unlock(foo->m); <-- fast path
				if (free)
				   kfree(foo);

spin_unlock(foo->m->wait_lock); <--- Use after free.

Plug the race by changing the slow unlock to the following scheme:

     while (!rt_mutex_has_waiters(m)) {
     	    /* Clear the waiters bit in m->owner */
	    clear_rt_mutex_waiters(m);
      	    owner = rt_mutex_owner(m);
      	    spin_unlock(m->wait_lock);
      	    if (cmpxchg(m->owner, owner, 0) == owner)
      	       return;
      	    spin_lock(m->wait_lock);
     }

So in case of a new waiter incoming while the owner tries the slow
path unlock we have two situations:

 unlock(wait_lock);
					lock(wait_lock);
 cmpxchg(p, owner, 0) == owner
 	    	   			mark_rt_mutex_waiters(lock);
	 				acquire(lock);

Or:

 unlock(wait_lock);
					lock(wait_lock);
	 				mark_rt_mutex_waiters(lock);
 cmpxchg(p, owner, 0) != owner
					enqueue_waiter();
					unlock(wait_lock);
 lock(wait_lock);
 wakeup_next waiter();
 unlock(wait_lock);
					lock(wait_lock);
					acquire(lock);

If the fast path is disabled, then the simple

   m->owner = NULL;
   unlock(m->wait_lock);

is sufficient as all access to m->owner is serialized via
m->wait_lock;

Also document and clarify the wakeup_next_waiter function as suggested
by Oleg Nesterov.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140611183852.937945560@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/rtmutex.c |  115 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 109 insertions(+), 6 deletions(-)

--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -81,6 +81,47 @@ static inline void mark_rt_mutex_waiters
 		owner = *p;
 	} while (cmpxchg(p, owner, owner | RT_MUTEX_HAS_WAITERS) != owner);
 }
+
+/*
+ * Safe fastpath aware unlock:
+ * 1) Clear the waiters bit
+ * 2) Drop lock->wait_lock
+ * 3) Try to unlock the lock with cmpxchg
+ */
+static inline bool unlock_rt_mutex_safe(struct rt_mutex *lock)
+	__releases(lock->wait_lock)
+{
+	struct task_struct *owner = rt_mutex_owner(lock);
+
+	clear_rt_mutex_waiters(lock);
+	raw_spin_unlock(&lock->wait_lock);
+	/*
+	 * If a new waiter comes in between the unlock and the cmpxchg
+	 * we have two situations:
+	 *
+	 * unlock(wait_lock);
+	 *					lock(wait_lock);
+	 * cmpxchg(p, owner, 0) == owner
+	 *					mark_rt_mutex_waiters(lock);
+	 *					acquire(lock);
+	 * or:
+	 *
+	 * unlock(wait_lock);
+	 *					lock(wait_lock);
+	 *					mark_rt_mutex_waiters(lock);
+	 *
+	 * cmpxchg(p, owner, 0) != owner
+	 *					enqueue_waiter();
+	 *					unlock(wait_lock);
+	 * lock(wait_lock);
+	 * wake waiter();
+	 * unlock(wait_lock);
+	 *					lock(wait_lock);
+	 *					acquire(lock);
+	 */
+	return rt_mutex_cmpxchg(lock, owner, NULL);
+}
+
 #else
 # define rt_mutex_cmpxchg(l,c,n)	(0)
 static inline void mark_rt_mutex_waiters(struct rt_mutex *lock)
@@ -88,6 +129,17 @@ static inline void mark_rt_mutex_waiters
 	lock->owner = (struct task_struct *)
 			((unsigned long)lock->owner | RT_MUTEX_HAS_WAITERS);
 }
+
+/*
+ * Simple slow path only version: lock->owner is protected by lock->wait_lock.
+ */
+static inline bool unlock_rt_mutex_safe(struct rt_mutex *lock)
+	__releases(lock->wait_lock)
+{
+	lock->owner = NULL;
+	raw_spin_unlock(&lock->wait_lock);
+	return true;
+}
 #endif
 
 /*
@@ -519,7 +571,8 @@ static int task_blocks_on_rt_mutex(struc
 /*
  * Wake up the next waiter on the lock.
  *
- * Remove the top waiter from the current tasks waiter list and wake it up.
+ * Remove the top waiter from the current tasks pi waiter list and
+ * wake it up.
  *
  * Called with lock->wait_lock held.
  */
@@ -540,10 +593,23 @@ static void wakeup_next_waiter(struct rt
 	 */
 	plist_del(&waiter->pi_list_entry, &current->pi_waiters);
 
-	rt_mutex_set_owner(lock, NULL);
+	/*
+	 * As we are waking up the top waiter, and the waiter stays
+	 * queued on the lock until it gets the lock, this lock
+	 * obviously has waiters. Just set the bit here and this has
+	 * the added benefit of forcing all new tasks into the
+	 * slow path making sure no task of lower priority than
+	 * the top waiter can steal this lock.
+	 */
+	lock->owner = (void *) RT_MUTEX_HAS_WAITERS;
 
 	raw_spin_unlock_irqrestore(&current->pi_lock, flags);
 
+	/*
+	 * It's safe to dereference waiter as it cannot go away as
+	 * long as we hold lock->wait_lock. The waiter task needs to
+	 * acquire it in order to dequeue the waiter.
+	 */
 	wake_up_process(waiter->task);
 }
 
@@ -796,12 +862,49 @@ rt_mutex_slowunlock(struct rt_mutex *loc
 
 	rt_mutex_deadlock_account_unlock(current);
 
-	if (!rt_mutex_has_waiters(lock)) {
-		lock->owner = NULL;
-		raw_spin_unlock(&lock->wait_lock);
-		return;
+	/*
+	 * We must be careful here if the fast path is enabled. If we
+	 * have no waiters queued we cannot set owner to NULL here
+	 * because of:
+	 *
+	 * foo->lock->owner = NULL;
+	 *			rtmutex_lock(foo->lock);   <- fast path
+	 *			free = atomic_dec_and_test(foo->refcnt);
+	 *			rtmutex_unlock(foo->lock); <- fast path
+	 *			if (free)
+	 *				kfree(foo);
+	 * raw_spin_unlock(foo->lock->wait_lock);
+	 *
+	 * So for the fastpath enabled kernel:
+	 *
+	 * Nothing can set the waiters bit as long as we hold
+	 * lock->wait_lock. So we do the following sequence:
+	 *
+	 *	owner = rt_mutex_owner(lock);
+	 *	clear_rt_mutex_waiters(lock);
+	 *	raw_spin_unlock(&lock->wait_lock);
+	 *	if (cmpxchg(&lock->owner, owner, 0) == owner)
+	 *		return;
+	 *	goto retry;
+	 *
+	 * The fastpath disabled variant is simple as all access to
+	 * lock->owner is serialized by lock->wait_lock:
+	 *
+	 *	lock->owner = NULL;
+	 *	raw_spin_unlock(&lock->wait_lock);
+	 */
+	while (!rt_mutex_has_waiters(lock)) {
+		/* Drops lock->wait_lock ! */
+		if (unlock_rt_mutex_safe(lock) == true)
+			return;
+		/* Relock the rtmutex and try again */
+		raw_spin_lock(&lock->wait_lock);
 	}
 
+	/*
+	 * The wakeup next waiter path does not suffer from the above
+	 * race. See the comments there.
+	 */
 	wakeup_next_waiter(lock);
 
 	raw_spin_unlock(&lock->wait_lock);



  parent reply	other threads:[~2014-07-16  0:19 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-07-15 23:17 [PATCH 3.4 00/22] 3.4.99-stable review Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 01/22] usb: option: Add ID for Telewell TW-LTE 4G v2 Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 02/22] USB: cp210x: add support for Corsair usb dongle Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 03/22] USB: ftdi_sio: Add extra PID Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 04/22] cpuset,mempolicy: fix sleeping function called from invalid context Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 05/22] hwmon: (amc6821) Fix permissions for temp2_input Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 06/22] hwmon: (adm1029) Ensure the fan_div cache is updated in set_fan_div Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 07/22] powerpc/perf: Never program book3s PMCs with values >= 0x80000000 Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 08/22] ext4: clarify error count warning messages Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 10/22] tracing: Remove ftrace_stop/start() from reading the trace file Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 11/22] rtmutex: Fix deadlock detector for real Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 12/22] rtmutex: Detect changes in the pi lock chain Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 13/22] rtmutex: Handle deadlock detection smarter Greg Kroah-Hartman
2014-07-15 23:17 ` Greg Kroah-Hartman [this message]
2014-07-15 23:17 ` [PATCH 3.4 15/22] Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option" Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 16/22] x86-64, espfix: Dont leak bits 31:16 of %esp returning to 16-bit stack Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 17/22] x86, espfix: Move espfix definitions into a separate header file Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 18/22] x86, espfix: Fix broken header guard Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 19/22] x86, espfix: Make espfix64 a Kconfig option, fix UML Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 20/22] x86, espfix: Make it possible to disable 16-bit support Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 21/22] x86, ioremap: Speed up check for RAM pages Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.4 22/22] ACPI / battery: Retry to get battery information if failed during probing Greg Kroah-Hartman
2014-07-16  4:23 ` [PATCH 3.4 00/22] 3.4.99-stable review Guenter Roeck
2014-07-16 23:10 ` Greg Kroah-Hartman
2014-07-17 13:25 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140715231622.540202454@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=umgwanakikbuti@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox