From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Manfred Spraul <manfred@colorfullife.com>,
Mike Galbraith <bitbucket@online.de>,
Rik van Riel <riel@redhat.com>,
Davidlohr Bueso <davidlohr.bueso@hp.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Mike Galbraith <efault@gmx.de>
Subject: [ 64/69] ipc/sem.c: fix race in sem_lock()
Date: Wed, 16 Oct 2013 10:45:13 -0700 [thread overview]
Message-ID: <20131016174320.729324837@linuxfoundation.org> (raw)
In-Reply-To: <20131016174312.844154919@linuxfoundation.org>
3.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Manfred Spraul <manfred@colorfullife.com>
commit 5e9d527591421ccdb16acb8c23662231135d8686 upstream.
The exclusion of complex operations in sem_lock() is insufficient: after
acquiring the per-semaphore lock, a simple op must first check that
sem_perm.lock is not locked and only after that test check
complex_count. The current code does it the other way around - and that
creates a race. Details are below.
The patch is a complete rewrite of sem_lock(), based in part on the code
from Mike Galbraith. It removes all gotos and all loops and thus the
risk of livelocks.
I have tested the patch (together with the next one) on my i3 laptop and
it didn't cause any problems.
The bug is probably also present in 3.10 and 3.11, but for these kernels
it might be simpler just to move the test of sma->complex_count after
the spin_is_locked() test.
Details of the bug:
Assume:
- sma->complex_count = 0.
- Thread 1: semtimedop(complex op that must sleep)
- Thread 2: semtimedop(simple op).
Pseudo-Trace:
Thread 1: sem_lock(): acquire sem_perm.lock
Thread 1: sem_lock(): check for ongoing simple ops
Nothing ongoing, thread 2 is still before sem_lock().
Thread 1: try_atomic_semop()
<<< preempted.
Thread 2: sem_lock():
static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
int nsops)
{
int locknum;
again:
if (nsops == 1 && !sma->complex_count) {
struct sem *sem = sma->sem_base + sops->sem_num;
/* Lock just the semaphore we are interested in. */
spin_lock(&sem->lock);
/*
* If sma->complex_count was set while we were spinning,
* we may need to look at things we did not lock here.
*/
if (unlikely(sma->complex_count)) {
spin_unlock(&sem->lock);
goto lock_array;
}
<<<<<<<<<
<<< complex_count is still 0.
<<<
<<< Here it is preempted
<<<<<<<<<
Thread 1: try_atomic_semop() returns, notices that it must sleep.
Thread 1: increases sma->complex_count.
Thread 1: drops sem_perm.lock
Thread 2:
/*
* Another process is holding the global lock on the
* sem_array; we cannot enter our critical section,
* but have to wait for the global lock to be released.
*/
if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
spin_unlock(&sem->lock);
spin_unlock_wait(&sma->sem_perm.lock);
goto again;
}
<<< sem_perm.lock already dropped, thus no "goto again;"
locknum = sops->sem_num;
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
ipc/sem.c | 122 +++++++++++++++++++++++++++++++++++++++-----------------------
1 file changed, 78 insertions(+), 44 deletions(-)
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -253,70 +253,104 @@ static void sem_rcu_free(struct rcu_head
}
/*
+ * Wait until all currently ongoing simple ops have completed.
+ * Caller must own sem_perm.lock.
+ * New simple ops cannot start, because simple ops first check
+ * that sem_perm.lock is free.
+ */
+static void sem_wait_array(struct sem_array *sma)
+{
+ int i;
+ struct sem *sem;
+
+ for (i = 0; i < sma->sem_nsems; i++) {
+ sem = sma->sem_base + i;
+ spin_unlock_wait(&sem->lock);
+ }
+}
+
+/*
* If the request contains only one semaphore operation, and there are
* no complex transactions pending, lock only the semaphore involved.
* Otherwise, lock the entire semaphore array, since we either have
* multiple semaphores in our own semops, or we need to look at
* semaphores from other pending complex operations.
- *
- * Carefully guard against sma->complex_count changing between zero
- * and non-zero while we are spinning for the lock. The value of
- * sma->complex_count cannot change while we are holding the lock,
- * so sem_unlock should be fine.
- *
- * The global lock path checks that all the local locks have been released,
- * checking each local lock once. This means that the local lock paths
- * cannot start their critical sections while the global lock is held.
*/
static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
int nsops)
{
- int locknum;
- again:
- if (nsops == 1 && !sma->complex_count) {
- struct sem *sem = sma->sem_base + sops->sem_num;
+ struct sem *sem;
- /* Lock just the semaphore we are interested in. */
- spin_lock(&sem->lock);
+ if (nsops != 1) {
+ /* Complex operation - acquire a full lock */
+ ipc_lock_object(&sma->sem_perm);
- /*
- * If sma->complex_count was set while we were spinning,
- * we may need to look at things we did not lock here.
+ /* And wait until all simple ops that are processed
+ * right now have dropped their locks.
*/
- if (unlikely(sma->complex_count)) {
- spin_unlock(&sem->lock);
- goto lock_array;
- }
+ sem_wait_array(sma);
+ return -1;
+ }
+
+ /*
+ * Only one semaphore affected - try to optimize locking.
+ * The rules are:
+ * - optimized locking is possible if no complex operation
+ * is either enqueued or processed right now.
+ * - The test for enqueued complex ops is simple:
+ * sma->complex_count != 0
+ * - Testing for complex ops that are processed right now is
+ * a bit more difficult. Complex ops acquire the full lock
+ * and first wait that the running simple ops have completed.
+ * (see above)
+ * Thus: If we own a simple lock and the global lock is free
+ * and complex_count is now 0, then it will stay 0 and
+ * thus just locking sem->lock is sufficient.
+ */
+ sem = sma->sem_base + sops->sem_num;
+ if (sma->complex_count == 0) {
/*
- * Another process is holding the global lock on the
- * sem_array; we cannot enter our critical section,
- * but have to wait for the global lock to be released.
+ * It appears that no complex operation is around.
+ * Acquire the per-semaphore lock.
*/
- if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
- spin_unlock(&sem->lock);
- spin_unlock_wait(&sma->sem_perm.lock);
- goto again;
+ spin_lock(&sem->lock);
+
+ /* Then check that the global lock is free */
+ if (!spin_is_locked(&sma->sem_perm.lock)) {
+ /* spin_is_locked() is not a memory barrier */
+ smp_mb();
+
+ /* Now repeat the test of complex_count:
+ * It can't change anymore until we drop sem->lock.
+ * Thus: if is now 0, then it will stay 0.
+ */
+ if (sma->complex_count == 0) {
+ /* fast path successful! */
+ return sops->sem_num;
+ }
}
+ spin_unlock(&sem->lock);
+ }
+
+ /* slow path: acquire the full lock */
+ ipc_lock_object(&sma->sem_perm);
- locknum = sops->sem_num;
+ if (sma->complex_count == 0) {
+ /* False alarm:
+ * There is no complex operation, thus we can switch
+ * back to the fast path.
+ */
+ spin_lock(&sem->lock);
+ ipc_unlock_object(&sma->sem_perm);
+ return sops->sem_num;
} else {
- int i;
- /*
- * Lock the semaphore array, and wait for all of the
- * individual semaphore locks to go away. The code
- * above ensures no new single-lock holders will enter
- * their critical section while the array lock is held.
+ /* Not a false alarm, thus complete the sequence for a
+ * full lock.
*/
- lock_array:
- ipc_lock_object(&sma->sem_perm);
- for (i = 0; i < sma->sem_nsems; i++) {
- struct sem *sem = sma->sem_base + i;
- spin_unlock_wait(&sem->lock);
- }
- locknum = -1;
+ sem_wait_array(sma);
+ return -1;
}
- return locknum;
}
static inline void sem_unlock(struct sem_array *sma, int locknum)
next prev parent reply other threads:[~2013-10-16 18:07 UTC|newest]
Thread overview: 73+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-16 17:44 [ 00/69] 3.10.17-stable review Greg Kroah-Hartman
2013-10-16 17:44 ` [ 01/69] ALSA: snd-usb-usx2y: remove bogus frame checks Greg Kroah-Hartman
2013-10-16 17:44 ` [ 02/69] ALSA: hda - hdmi: Fix channel map switch not taking effect Greg Kroah-Hartman
2013-10-16 17:44 ` [ 03/69] ALSA: hda - Add fixup for ASUS N56VZ Greg Kroah-Hartman
2013-10-16 17:44 ` [ 04/69] ALSA: hda - Fix microphone for Sony VAIO Pro 13 (Haswell model) Greg Kroah-Hartman
2013-10-16 17:44 ` [ 05/69] random: run random_int_secret_init() run after all late_initcalls Greg Kroah-Hartman
2013-10-16 17:44 ` [ 06/69] vfs: allow O_PATH file descriptors for fstatfs() Greg Kroah-Hartman
2013-10-16 17:44 ` [ 07/69] i2c: omap: Clear ARDY bit twice Greg Kroah-Hartman
2013-10-16 17:44 ` [ 08/69] hwmon: (applesmc) Always read until end of data Greg Kroah-Hartman
2013-10-16 17:44 ` [ 09/69] Btrfs: use right root when checking for hash collision Greg Kroah-Hartman
2013-10-16 17:44 ` [ 10/69] ext4: fix memory leak in xattr Greg Kroah-Hartman
2013-10-16 17:44 ` [ 11/69] KVM: PPC: Book3S HV: Fix typo in saving DSCR Greg Kroah-Hartman
2013-10-16 17:44 ` [ 12/69] parisc: fix interruption handler to respect pagefault_disable() Greg Kroah-Hartman
2013-10-16 17:44 ` [ 13/69] ARM: Fix the world famous typo with is_gate_vma() Greg Kroah-Hartman
2013-10-16 17:44 ` [ 14/69] ARC: Setup Vector Table Base in early boot Greg Kroah-Hartman
2013-10-16 17:44 ` [ 15/69] ARC: SMP failed to boot due to missing IVT setup Greg Kroah-Hartman
2013-10-16 17:44 ` [ 16/69] ARC: Fix __udelay calculation Greg Kroah-Hartman
2013-10-16 17:44 ` [ 17/69] ARC: Handle zero-overhead-loop in unaligned access handler Greg Kroah-Hartman
2013-10-16 17:44 ` [ 18/69] ARC: Fix 32-bit wrap around in access_ok() Greg Kroah-Hartman
2013-10-16 17:44 ` [ 19/69] ARC: Workaround spinlock livelock in SMP SystemC simulation Greg Kroah-Hartman
2013-10-16 17:44 ` [ 20/69] ARC: Fix signal frame management for SA_SIGINFO Greg Kroah-Hartman
2013-10-16 17:44 ` [ 21/69] ARC: Ignore ptrace SETREGSET request for synthetic register "stop_pc" Greg Kroah-Hartman
2013-10-16 17:44 ` [ 22/69] watchdog: ts72xx_wdt: locking bug in ioctl Greg Kroah-Hartman
2013-10-16 17:44 ` [ 23/69] compiler/gcc4: Add quirk for asm goto miscompilation bug Greg Kroah-Hartman
2013-10-16 17:44 ` [ 24/69] ALSA: hda - Fix mono speakers and headset mic on Dell Vostro 5470 Greg Kroah-Hartman
2013-10-16 17:44 ` [ 25/69] cope with potentially long ->d_dname() output for shmem/hugetlb Greg Kroah-Hartman
2013-10-16 17:44 ` [ 26/69] drm/i915: Only apply DPMS to the encoder if enabled Greg Kroah-Hartman
2013-10-16 17:44 ` [ 27/69] drm/radeon: forever loop on error in radeon_do_test_moves() Greg Kroah-Hartman
2013-10-16 17:44 ` [ 28/69] drm/radeon: fix typo in CP DMA register headers Greg Kroah-Hartman
2013-10-16 17:44 ` [ 29/69] drm/radeon: fix hw contexts for SUMO2 asics Greg Kroah-Hartman
2013-10-16 17:44 ` [ 30/69] ipc: move rcu lock out of ipc_addid Greg Kroah-Hartman
2013-10-16 17:44 ` [ 31/69] ipc: introduce ipc object locking helpers Greg Kroah-Hartman
2013-10-16 17:44 ` [ 32/69] ipc: close open coded spin lock calls Greg Kroah-Hartman
2013-10-16 17:44 ` [ 33/69] ipc: move locking out of ipcctl_pre_down_nolock Greg Kroah-Hartman
2013-10-16 17:44 ` [ 34/69] ipc,msg: shorten critical region in msgctl_down Greg Kroah-Hartman
2013-10-16 17:44 ` [ 35/69] ipc,msg: introduce msgctl_nolock Greg Kroah-Hartman
2013-10-16 17:44 ` [ 36/69] ipc,msg: introduce lockless functions to obtain the ipc object Greg Kroah-Hartman
2013-10-16 17:44 ` [ 37/69] ipc,msg: make msgctl_nolock lockless Greg Kroah-Hartman
2013-10-16 17:44 ` [ 38/69] ipc,msg: shorten critical region in msgsnd Greg Kroah-Hartman
2013-10-16 17:44 ` [ 39/69] ipc,msg: shorten critical region in msgrcv Greg Kroah-Hartman
2013-10-16 17:44 ` [ 40/69] ipc: remove unused functions Greg Kroah-Hartman
2013-10-16 17:44 ` [ 41/69] ipc/util.c, ipc_rcu_alloc: cacheline align allocation Greg Kroah-Hartman
2013-10-16 17:44 ` [ 42/69] ipc/sem.c: cacheline align the semaphore structures Greg Kroah-Hartman
2013-10-16 17:44 ` [ 43/69] ipc/sem: separate wait-for-zero and alter tasks into seperate queues Greg Kroah-Hartman
2013-10-16 17:44 ` [ 44/69] ipc/sem.c: always use only one queue for alter operations Greg Kroah-Hartman
2013-10-16 17:44 ` [ 45/69] ipc/sem.c: replace shared sem_otime with per-semaphore value Greg Kroah-Hartman
2013-10-16 17:44 ` [ 46/69] ipc/sem.c: rename try_atomic_semop() to perform_atomic_semop(), docu update Greg Kroah-Hartman
2013-10-16 17:44 ` [ 47/69] ipc/msg.c: Fix lost wakeup in msgsnd() Greg Kroah-Hartman
2013-10-16 17:44 ` [ 48/69] ipc,shm: introduce lockless functions to obtain the ipc object Greg Kroah-Hartman
2013-10-16 17:44 ` [ 49/69] ipc,shm: shorten critical region in shmctl_down Greg Kroah-Hartman
2013-10-16 17:44 ` [ 50/69] ipc: drop ipcctl_pre_down Greg Kroah-Hartman
2013-10-16 17:45 ` [ 51/69] ipc,shm: introduce shmctl_nolock Greg Kroah-Hartman
2013-10-16 17:45 ` [ 52/69] ipc,shm: make shmctl_nolock lockless Greg Kroah-Hartman
2013-10-16 17:45 ` [ 53/69] ipc,shm: shorten critical region for shmctl Greg Kroah-Hartman
2013-10-16 17:45 ` [ 54/69] ipc,shm: cleanup do_shmat pasta Greg Kroah-Hartman
2013-10-16 17:45 ` [ 55/69] ipc,shm: shorten critical region for shmat Greg Kroah-Hartman
2013-10-16 17:45 ` [ 56/69] ipc: rename ids->rw_mutex Greg Kroah-Hartman
2013-10-16 17:45 ` [ 57/69] ipc,msg: drop msg_unlock Greg Kroah-Hartman
2013-10-16 17:45 ` [ 58/69] ipc: document general ipc locking scheme Greg Kroah-Hartman
2013-10-16 17:45 ` [ 59/69] ipc, shm: guard against non-existant vma in shmdt(2) Greg Kroah-Hartman
2013-10-16 17:45 ` [ 60/69] ipc: drop ipc_lock_by_ptr Greg Kroah-Hartman
2013-10-16 17:45 ` [ 61/69] ipc, shm: drop shm_lock_check Greg Kroah-Hartman
2013-10-16 17:45 ` [ 62/69] ipc: drop ipc_lock_check Greg Kroah-Hartman
2013-10-16 17:45 ` [ 63/69] ipc: fix race with LSMs Greg Kroah-Hartman
2013-10-16 17:45 ` Greg Kroah-Hartman [this message]
2013-10-16 17:45 ` [ 65/69] ipc/sem.c: optimize sem_lock() Greg Kroah-Hartman
2013-10-16 17:45 ` [ 66/69] ipc/sem.c: synchronize the proc interface Greg Kroah-Hartman
2013-10-16 17:45 ` [ 67/69] ipc/sem.c: update sem_otime for all operations Greg Kroah-Hartman
2013-10-16 17:45 ` [ 68/69] ipc,msg: prevent race with rmid in msgsnd,msgrcv Greg Kroah-Hartman
2013-10-16 17:45 ` [ 69/69] x86: avoid remapping data in parse_setup_data() Greg Kroah-Hartman
2013-10-16 22:11 ` [ 00/69] 3.10.17-stable review Guenter Roeck
2013-10-17 1:07 ` Greg Kroah-Hartman
2013-10-17 16:50 ` Shuah Khan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131016174320.729324837@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=bitbucket@online.de \
--cc=davidlohr.bueso@hp.com \
--cc=efault@gmx.de \
--cc=linux-kernel@vger.kernel.org \
--cc=manfred@colorfullife.com \
--cc=riel@redhat.com \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox