From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Mike Galbraith <umgwanakikbuti@gmail.com>
Subject: [PATCH 3.10 32/44] rtmutex: Plug slow unlock race
Date: Tue, 15 Jul 2014 16:17:30 -0700 [thread overview]
Message-ID: <20140715231647.650801374@linuxfoundation.org> (raw)
In-Reply-To: <20140715231646.690728348@linuxfoundation.org>
3.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner <tglx@linutronix.de>
commit 27e35715df54cbc4f2d044f681802ae30479e7fb upstream.
When the rtmutex fast path is enabled the slow unlock function can
create the following situation:
spin_lock(foo->m->wait_lock);
foo->m->owner = NULL;
rt_mutex_lock(foo->m); <-- fast path
free = atomic_dec_and_test(foo->refcnt);
rt_mutex_unlock(foo->m); <-- fast path
if (free)
kfree(foo);
spin_unlock(foo->m->wait_lock); <--- Use after free.
Plug the race by changing the slow unlock to the following scheme:
while (!rt_mutex_has_waiters(m)) {
/* Clear the waiters bit in m->owner */
clear_rt_mutex_waiters(m);
owner = rt_mutex_owner(m);
spin_unlock(m->wait_lock);
if (cmpxchg(m->owner, owner, 0) == owner)
return;
spin_lock(m->wait_lock);
}
So in case of a new waiter incoming while the owner tries the slow
path unlock we have two situations:
unlock(wait_lock);
lock(wait_lock);
cmpxchg(p, owner, 0) == owner
mark_rt_mutex_waiters(lock);
acquire(lock);
Or:
unlock(wait_lock);
lock(wait_lock);
mark_rt_mutex_waiters(lock);
cmpxchg(p, owner, 0) != owner
enqueue_waiter();
unlock(wait_lock);
lock(wait_lock);
wakeup_next waiter();
unlock(wait_lock);
lock(wait_lock);
acquire(lock);
If the fast path is disabled, then the simple
m->owner = NULL;
unlock(m->wait_lock);
is sufficient as all access to m->owner is serialized via
m->wait_lock;
Also document and clarify the wakeup_next_waiter function as suggested
by Oleg Nesterov.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140611183852.937945560@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
kernel/rtmutex.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 109 insertions(+), 6 deletions(-)
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -82,6 +82,47 @@ static inline void mark_rt_mutex_waiters
owner = *p;
} while (cmpxchg(p, owner, owner | RT_MUTEX_HAS_WAITERS) != owner);
}
+
+/*
+ * Safe fastpath aware unlock:
+ * 1) Clear the waiters bit
+ * 2) Drop lock->wait_lock
+ * 3) Try to unlock the lock with cmpxchg
+ */
+static inline bool unlock_rt_mutex_safe(struct rt_mutex *lock)
+ __releases(lock->wait_lock)
+{
+ struct task_struct *owner = rt_mutex_owner(lock);
+
+ clear_rt_mutex_waiters(lock);
+ raw_spin_unlock(&lock->wait_lock);
+ /*
+ * If a new waiter comes in between the unlock and the cmpxchg
+ * we have two situations:
+ *
+ * unlock(wait_lock);
+ * lock(wait_lock);
+ * cmpxchg(p, owner, 0) == owner
+ * mark_rt_mutex_waiters(lock);
+ * acquire(lock);
+ * or:
+ *
+ * unlock(wait_lock);
+ * lock(wait_lock);
+ * mark_rt_mutex_waiters(lock);
+ *
+ * cmpxchg(p, owner, 0) != owner
+ * enqueue_waiter();
+ * unlock(wait_lock);
+ * lock(wait_lock);
+ * wake waiter();
+ * unlock(wait_lock);
+ * lock(wait_lock);
+ * acquire(lock);
+ */
+ return rt_mutex_cmpxchg(lock, owner, NULL);
+}
+
#else
# define rt_mutex_cmpxchg(l,c,n) (0)
static inline void mark_rt_mutex_waiters(struct rt_mutex *lock)
@@ -89,6 +130,17 @@ static inline void mark_rt_mutex_waiters
lock->owner = (struct task_struct *)
((unsigned long)lock->owner | RT_MUTEX_HAS_WAITERS);
}
+
+/*
+ * Simple slow path only version: lock->owner is protected by lock->wait_lock.
+ */
+static inline bool unlock_rt_mutex_safe(struct rt_mutex *lock)
+ __releases(lock->wait_lock)
+{
+ lock->owner = NULL;
+ raw_spin_unlock(&lock->wait_lock);
+ return true;
+}
#endif
/*
@@ -520,7 +572,8 @@ static int task_blocks_on_rt_mutex(struc
/*
* Wake up the next waiter on the lock.
*
- * Remove the top waiter from the current tasks waiter list and wake it up.
+ * Remove the top waiter from the current tasks pi waiter list and
+ * wake it up.
*
* Called with lock->wait_lock held.
*/
@@ -541,10 +594,23 @@ static void wakeup_next_waiter(struct rt
*/
plist_del(&waiter->pi_list_entry, ¤t->pi_waiters);
- rt_mutex_set_owner(lock, NULL);
+ /*
+ * As we are waking up the top waiter, and the waiter stays
+ * queued on the lock until it gets the lock, this lock
+ * obviously has waiters. Just set the bit here and this has
+ * the added benefit of forcing all new tasks into the
+ * slow path making sure no task of lower priority than
+ * the top waiter can steal this lock.
+ */
+ lock->owner = (void *) RT_MUTEX_HAS_WAITERS;
raw_spin_unlock_irqrestore(¤t->pi_lock, flags);
+ /*
+ * It's safe to dereference waiter as it cannot go away as
+ * long as we hold lock->wait_lock. The waiter task needs to
+ * acquire it in order to dequeue the waiter.
+ */
wake_up_process(waiter->task);
}
@@ -797,12 +863,49 @@ rt_mutex_slowunlock(struct rt_mutex *loc
rt_mutex_deadlock_account_unlock(current);
- if (!rt_mutex_has_waiters(lock)) {
- lock->owner = NULL;
- raw_spin_unlock(&lock->wait_lock);
- return;
+ /*
+ * We must be careful here if the fast path is enabled. If we
+ * have no waiters queued we cannot set owner to NULL here
+ * because of:
+ *
+ * foo->lock->owner = NULL;
+ * rtmutex_lock(foo->lock); <- fast path
+ * free = atomic_dec_and_test(foo->refcnt);
+ * rtmutex_unlock(foo->lock); <- fast path
+ * if (free)
+ * kfree(foo);
+ * raw_spin_unlock(foo->lock->wait_lock);
+ *
+ * So for the fastpath enabled kernel:
+ *
+ * Nothing can set the waiters bit as long as we hold
+ * lock->wait_lock. So we do the following sequence:
+ *
+ * owner = rt_mutex_owner(lock);
+ * clear_rt_mutex_waiters(lock);
+ * raw_spin_unlock(&lock->wait_lock);
+ * if (cmpxchg(&lock->owner, owner, 0) == owner)
+ * return;
+ * goto retry;
+ *
+ * The fastpath disabled variant is simple as all access to
+ * lock->owner is serialized by lock->wait_lock:
+ *
+ * lock->owner = NULL;
+ * raw_spin_unlock(&lock->wait_lock);
+ */
+ while (!rt_mutex_has_waiters(lock)) {
+ /* Drops lock->wait_lock ! */
+ if (unlock_rt_mutex_safe(lock) == true)
+ return;
+ /* Relock the rtmutex and try again */
+ raw_spin_lock(&lock->wait_lock);
}
+ /*
+ * The wakeup next waiter path does not suffer from the above
+ * race. See the comments there.
+ */
wakeup_next_waiter(lock);
raw_spin_unlock(&lock->wait_lock);
next prev parent reply other threads:[~2014-07-16 0:11 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-07-15 23:16 [PATCH 3.10 00/44] 3.10.49-stable review Greg Kroah-Hartman
2014-07-15 23:16 ` [PATCH 3.10 01/44] usb: option: Add ID for Telewell TW-LTE 4G v2 Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 02/44] USB: cp210x: add support for Corsair usb dongle Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 03/44] USB: ftdi_sio: Add extra PID Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 04/44] USB: serial: ftdi_sio: Add Infineon Triboard Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 05/44] parisc: add serial ports of C8000/1GHz machine to hardware database Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 06/44] workqueue: fix dev_set_uevent_suppress() imbalance Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 07/44] cpuset,mempolicy: fix sleeping function called from invalid context Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 08/44] workqueue: zero cpumask of wq_numa_possible_cpumask on init Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 09/44] hwmon: (amc6821) Fix permissions for temp2_input Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 10/44] hwmon: (adm1031) Fix writes to limit registers Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 11/44] hwmon: (adm1029) Ensure the fan_div cache is updated in set_fan_div Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 12/44] hwmon: (adm1021) Fix cache problem when writing temperature limits Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 13/44] ACPI / resources: only reject zero length resources based at address zero Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 14/44] powerpc/perf: Never program book3s PMCs with values >= 0x80000000 Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 15/44] powerpc/perf: Add PPMU_ARCH_207S define Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 16/44] powerpc/perf: Clear MMCR2 when enabling PMU Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 17/44] cpufreq: Makefile: fix compilation for davinci platform Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 18/44] crypto: sha512_ssse3 - fix byte count to bit count conversion Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 19/44] arm64: implement TASK_SIZE_OF Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 20/44] clk: spear3xx: Use proper control register offset Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 21/44] Drivers: hv: vmbus: Fix a bug in the channel callback dispatch code Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 22/44] dm io: fix a race condition in the wake up code for sync_io Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 23/44] ext4: fix unjournalled bg descriptor while initializing inode bitmap Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 24/44] ext4: clarify error count warning messages Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 25/44] ext4: disable synchronous transaction batching if max_batch_time==0 Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 26/44] drm/radeon: fix typo in golden register setup on evergreen Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 28/44] ring-buffer: Check if buffer exists before polling Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 29/44] rtmutex: Fix deadlock detector for real Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 30/44] rtmutex: Detect changes in the pi lock chain Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 31/44] rtmutex: Handle deadlock detection smarter Greg Kroah-Hartman
2014-07-15 23:17 ` Greg Kroah-Hartman [this message]
2014-07-15 23:17 ` [PATCH 3.10 33/44] score: normalize global variables exported by vmlinux.lds Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 34/44] Score: Implement the function csum_ipv6_magic Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 35/44] Score: The commit is for compiling successfully Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 36/44] Score: Modify the Makefile of Score, remove -mlong-calls for compiling Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 37/44] Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option" Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 38/44] x86-64, espfix: Dont leak bits 31:16 of %esp returning to 16-bit stack Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 39/44] x86, espfix: Move espfix definitions into a separate header file Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 40/44] x86, espfix: Fix broken header guard Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 41/44] x86, espfix: Make espfix64 a Kconfig option, fix UML Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 42/44] x86, espfix: Make it possible to disable 16-bit support Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 43/44] x86, ioremap: Speed up check for RAM pages Greg Kroah-Hartman
2014-07-15 23:17 ` [PATCH 3.10 44/44] ACPI / battery: Retry to get battery information if failed during probing Greg Kroah-Hartman
2014-07-16 4:25 ` [PATCH 3.10 00/44] 3.10.49-stable review Guenter Roeck
2014-07-16 23:10 ` Greg Kroah-Hartman
2014-07-17 13:24 ` Shuah Khan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140715231647.650801374@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=umgwanakikbuti@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox