* [PATCH] ipc/sem.c: Fix complex_count vs. simple op race
@ 2016-01-02 12:04 Manfred Spraul
2016-01-04 13:02 ` Davidlohr Bueso
0 siblings, 1 reply; 5+ messages in thread
From: Manfred Spraul @ 2016-01-02 12:04 UTC (permalink / raw)
To: Andrew Morton
Cc: LKML, 1vier1, Ingo Molnar, Davidlohr Bueso, felixh,
Manfred Spraul, stable
Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a race:
sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
- a non-simple operations is ongoing (sma->sem_perm.lock held)
- a complex operation is sleeping (sma->complex_count != 0)
As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions. See below for more details
(or kernel bugzilla 105651).
The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.
The patch also updates stale documentation regarding the locking.
With regards to stable kernels:
The patch is required for all kernels that include the commit 6d07b68ce16a
("ipc/sem.c: optimize sem_lock()") (3.10?)
The alternative is to revert the patch that introduced the race.
Background:
Here is the race of the current implementation:
Thread A: (simple op)
- does the first "sma->complex_count == 0" test
Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
find Thread A, because Thread A does not own sem->lock yet.
- the thread does the operation, increases complex_count,
drops sem_lock, sleeps
Thread A:
- spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
- sleeps before the complex_count test
Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count
Thread A:
- does the complex_count test
Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.
Reported-by: felixh@informatik.uni-bremen.de
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
---
include/linux/sem.h | 1 +
ipc/sem.c | 124 ++++++++++++++++++++++++++++++----------------------
2 files changed, 72 insertions(+), 53 deletions(-)
diff --git a/include/linux/sem.h b/include/linux/sem.h
index 976ce3a..d0efd6e 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -21,6 +21,7 @@ struct sem_array {
struct list_head list_id; /* undo requests on this array */
int sem_nsems; /* no. of semaphores in array */
int complex_count; /* pending complex operations */
+ bool complex_mode; /* no parallel simple ops */
};
#ifdef CONFIG_SYSVIPC
diff --git a/ipc/sem.c b/ipc/sem.c
index b471e5a..87e1f5d 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -155,14 +155,21 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
/*
* Locking:
+ * a) global sem_lock() for read/write
* sem_undo.id_next,
* sem_array.complex_count,
- * sem_array.pending{_alter,_cont},
- * sem_array.sem_undo: global sem_lock() for read/write
- * sem_undo.proc_next: only "current" is allowed to read/write that field.
+ * sem_array.complex_mode
+ * sem_array.pending{_alter,_const},
+ * sem_array.sem_undo
*
+ * b) global or semaphore sem_lock() for read/write:
* sem_array.sem_base[i].pending_{const,alter}:
- * global or semaphore sem_lock() for read/write
+ * sem_array.complex_mode (for read)
+ *
+ * c) special:
+ * sem_undo_list.list_proc:
+ * * undo_list->lock for write
+ * * rcu for read
*/
#define sc_semmsl sem_ctls[0]
@@ -263,23 +270,25 @@ static void sem_rcu_free(struct rcu_head *head)
#define ipc_smp_acquire__after_spin_is_unlocked() smp_rmb()
/*
- * Wait until all currently ongoing simple ops have completed.
+ * Enter the mode suitable for non-simple operations:
* Caller must own sem_perm.lock.
- * New simple ops cannot start, because simple ops first check
- * that sem_perm.lock is free.
- * that a) sem_perm.lock is free and b) complex_count is 0.
*/
-static void sem_wait_array(struct sem_array *sma)
+static void complexmode_enter(struct sem_array *sma)
{
int i;
struct sem *sem;
- if (sma->complex_count) {
- /* The thread that increased sma->complex_count waited on
- * all sem->lock locks. Thus we don't need to wait again.
- */
+ if (sma->complex_mode) {
+ /* We are already in complex_mode. Nothing to do */
return;
}
+ WRITE_ONCE(sma->complex_mode, true);
+
+ /* We need a full barrier:
+ * The write to complex_mode must be visible
+ * before we read the first sem->lock spinlock state.
+ */
+ smp_mb();
for (i = 0; i < sma->sem_nsems; i++) {
sem = sma->sem_base + i;
@@ -289,6 +298,29 @@ static void sem_wait_array(struct sem_array *sma)
}
/*
+ * Try to leave the mode that disallows simple operations:
+ * Caller must own sem_perm.lock.
+ */
+static void complexmode_tryleave(struct sem_array *sma)
+{
+ if (sma->complex_count) {
+ /* Complex ops are sleeping.
+ * We must stay in complex mode
+ */
+ return;
+ }
+ /*
+ * Immediately after setting complex_mode to false,
+ * a simple op can start. Thus: all memory writes
+ * performed by the current operation must be visible
+ * before we set complex_mode to false.
+ */
+ smp_wmb();
+
+ WRITE_ONCE(sma->complex_mode, false);
+}
+
+/*
* If the request contains only one semaphore operation, and there are
* no complex transactions pending, lock only the semaphore involved.
* Otherwise, lock the entire semaphore array, since we either have
@@ -304,56 +336,38 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
/* Complex operation - acquire a full lock */
ipc_lock_object(&sma->sem_perm);
- /* And wait until all simple ops that are processed
- * right now have dropped their locks.
- */
- sem_wait_array(sma);
+ /* Prevent parallel simple ops */
+ complexmode_enter(sma);
return -1;
}
/*
* Only one semaphore affected - try to optimize locking.
- * The rules are:
- * - optimized locking is possible if no complex operation
- * is either enqueued or processed right now.
- * - The test for enqueued complex ops is simple:
- * sma->complex_count != 0
- * - Testing for complex ops that are processed right now is
- * a bit more difficult. Complex ops acquire the full lock
- * and first wait that the running simple ops have completed.
- * (see above)
- * Thus: If we own a simple lock and the global lock is free
- * and complex_count is now 0, then it will stay 0 and
- * thus just locking sem->lock is sufficient.
+ * Optimized locking is possible if no complex operation
+ * is either enqueued or processed right now.
+ *
+ * Both facts are tracked by complex_mode.
*/
sem = sma->sem_base + sops->sem_num;
- if (sma->complex_count == 0) {
+ /*
+ * Initial check for complex_mode. Just an optimization,
+ * no locking.
+ */
+ if (!READ_ONCE(sma->complex_mode)) {
/*
* It appears that no complex operation is around.
* Acquire the per-semaphore lock.
*/
spin_lock(&sem->lock);
- /* Then check that the global lock is free */
- if (!spin_is_locked(&sma->sem_perm.lock)) {
- /*
- * We need a memory barrier with acquire semantics,
- * otherwise we can race with another thread that does:
- * complex_count++;
- * spin_unlock(sem_perm.lock);
- */
- ipc_smp_acquire__after_spin_is_unlocked();
-
- /*
- * Now repeat the test of complex_count:
- * It can't change anymore until we drop sem->lock.
- * Thus: if is now 0, then it will stay 0.
- */
- if (sma->complex_count == 0) {
- /* fast path successful! */
- return sops->sem_num;
- }
+ /* Now repeat the test for complex_mode.
+ * A memory barrier is provided by the spin_lock()
+ * above.
+ */
+ if (!READ_ONCE(sma->complex_mode)) {
+ /* fast path successful! */
+ return sops->sem_num;
}
spin_unlock(&sem->lock);
}
@@ -373,7 +387,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
/* Not a false alarm, thus complete the sequence for a
* full lock.
*/
- sem_wait_array(sma);
+ complexmode_enter(sma);
return -1;
}
}
@@ -382,6 +396,7 @@ static inline void sem_unlock(struct sem_array *sma, int locknum)
{
if (locknum == -1) {
unmerge_queues(sma);
+ complexmode_tryleave(sma);
ipc_unlock_object(&sma->sem_perm);
} else {
struct sem *sem = sma->sem_base + locknum;
@@ -533,6 +548,7 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
}
sma->complex_count = 0;
+ sma->complex_mode = true; /* dropped by sem_unlock below */
INIT_LIST_HEAD(&sma->pending_alter);
INIT_LIST_HEAD(&sma->pending_const);
INIT_LIST_HEAD(&sma->list_id);
@@ -2186,10 +2202,10 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
/*
* The proc interface isn't aware of sem_lock(), it calls
* ipc_lock_object() directly (in sysvipc_find_ipc).
- * In order to stay compatible with sem_lock(), we must wait until
- * all simple semop() calls have left their critical regions.
+ * In order to stay compatible with sem_lock(), we must
+ * enter / leave complex_mode.
*/
- sem_wait_array(sma);
+ complexmode_enter(sma);
sem_otime = get_semotime(sma);
@@ -2206,6 +2222,8 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
sem_otime,
sma->sem_ctime);
+ complexmode_tryleave(sma);
+
return 0;
}
#endif
--
2.4.3
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] ipc/sem.c: Fix complex_count vs. simple op race
2016-01-02 12:04 [PATCH] ipc/sem.c: Fix complex_count vs. simple op race Manfred Spraul
@ 2016-01-04 13:02 ` Davidlohr Bueso
2016-01-04 18:32 ` Manfred Spraul
0 siblings, 1 reply; 5+ messages in thread
From: Davidlohr Bueso @ 2016-01-04 13:02 UTC (permalink / raw)
To: Manfred Spraul; +Cc: Andrew Morton, LKML, 1vier1, Ingo Molnar, felixh, stable
On Sat, 02 Jan 2016, Manfred Spraul wrote:
>Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a race:
>
>sem_lock has a fast path that allows parallel simple operations.
>There are two reasons why a simple operation cannot run in parallel:
>- a non-simple operations is ongoing (sma->sem_perm.lock held)
>- a complex operation is sleeping (sma->complex_count != 0)
>
>As both facts are stored independently, a thread can bypass the current
>checks by sleeping in the right positions. See below for more details
>(or kernel bugzilla 105651).
>
>The patch fixes that by creating one variable (complex_mode)
>that tracks both reasons why parallel operations are not possible.
>
>The patch also updates stale documentation regarding the locking.
>
>With regards to stable kernels:
>The patch is required for all kernels that include the commit 6d07b68ce16a
>("ipc/sem.c: optimize sem_lock()") (3.10?)
>
>The alternative is to revert the patch that introduced the race.
I am just now catching up with this, but a quick thought is that we probably
want to keep 6d07b68ce16a as waiting on unlocking all sem->lock should be
much worse for performance than keeping track of the complex 'mode'. Specially
if we have a large array.
Also, any idea what workload exposed this race? Anyway, will take a closer look
at the patch/issue.
Thanks,
Davidlohr
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] ipc/sem.c: Fix complex_count vs. simple op race
2016-01-04 13:02 ` Davidlohr Bueso
@ 2016-01-04 18:32 ` Manfred Spraul
0 siblings, 0 replies; 5+ messages in thread
From: Manfred Spraul @ 2016-01-04 18:32 UTC (permalink / raw)
To: Davidlohr Bueso; +Cc: Andrew Morton, LKML, 1vier1, Ingo Molnar, felixh, stable
[-- Attachment #1: Type: text/plain, Size: 1905 bytes --]
On 01/04/2016 02:02 PM, Davidlohr Bueso wrote:
> On Sat, 02 Jan 2016, Manfred Spraul wrote:
>
>> Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
>> race:
>>
>> sem_lock has a fast path that allows parallel simple operations.
>> There are two reasons why a simple operation cannot run in parallel:
>> - a non-simple operations is ongoing (sma->sem_perm.lock held)
>> - a complex operation is sleeping (sma->complex_count != 0)
>>
>> As both facts are stored independently, a thread can bypass the current
>> checks by sleeping in the right positions. See below for more details
>> (or kernel bugzilla 105651).
>>
>> The patch fixes that by creating one variable (complex_mode)
>> that tracks both reasons why parallel operations are not possible.
>>
>> The patch also updates stale documentation regarding the locking.
>>
>> With regards to stable kernels:
>> The patch is required for all kernels that include the commit
>> 6d07b68ce16a
>> ("ipc/sem.c: optimize sem_lock()") (3.10?)
>>
>> The alternative is to revert the patch that introduced the race.
>
> I am just now catching up with this, but a quick thought is that we
> probably
> want to keep 6d07b68ce16a as waiting on unlocking all sem->lock should be
> much worse for performance than keeping track of the complex 'mode'.
Keeping track is simple - and the fast path gets simpler compared to
ncurrent code.
> Specially
> if we have a large array.
Yes, I would prefer my patch as a backport.
But if someone has objections, then a revert would be an alternative.
>
> Also, any idea what workload exposed this race? Anyway, will take a
> closer look
> at the patch/issue.
It was found by a theoretical review:
The sem_lock code was used as an exercise at University Bremen.
--
Manfred
P.S.: If we replace the "bool" with an "int", we could even introduce a
hysteresis, to further reduce the amount of array scans.
[-- Attachment #2: 0001-ipc-sem-sem_lock-with-hysteresis.patch --]
[-- Type: text/x-patch, Size: 4195 bytes --]
>From d3ea4cddaff440c03b89b045ee5a0be8c4db623d Mon Sep 17 00:00:00 2001
From: Manfred Spraul <manfred@colorfullife.com>
Date: Mon, 4 Jan 2016 19:20:34 +0100
Subject: [PATCH] ipc/sem: sem_lock with hysteresis
Untested
May not work
May not pass checkpatch.pl
May not be an improvement at all, not even for microbenchmarks
May be a worthless pseudo improvement without any real-world
advantages (except more complexity and more risks for bugs)
---
include/linux/sem.h | 2 +-
ipc/sem.c | 60 ++++++++++++++++++++++++++++++++++++++---------------
2 files changed, 44 insertions(+), 18 deletions(-)
diff --git a/include/linux/sem.h b/include/linux/sem.h
index d0efd6e..6fb3227 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -21,7 +21,7 @@ struct sem_array {
struct list_head list_id; /* undo requests on this array */
int sem_nsems; /* no. of semaphores in array */
int complex_count; /* pending complex operations */
- bool complex_mode; /* no parallel simple ops */
+ int complex_mode; /* >0: no parallel simple ops */
};
#ifdef CONFIG_SYSVIPC
diff --git a/ipc/sem.c b/ipc/sem.c
index 87e1f5d..f580c7c 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -154,6 +154,13 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
#define SEMOPM_FAST 64 /* ~ 372 bytes on stack */
/*
+ * Switching from the mode suitable for complex ops
+ * to the mode for simple ops is costly. Therefore:
+ * use some hysteresis
+ */
+#define COMPLEX_MODE_ENTER 10
+
+/*
* Locking:
* a) global sem_lock() for read/write
* sem_undo.id_next,
@@ -278,11 +285,16 @@ static void complexmode_enter(struct sem_array *sma)
int i;
struct sem *sem;
- if (sma->complex_mode) {
- /* We are already in complex_mode. Nothing to do */
+ if (sma->complex_mode > 0) {
+ /*
+ * We are already in complex_mode.
+ * Nothing to do, just increase
+ * counter until we return to simple mode
+ */
+ WRITE_ONCE(sma->complex_mode, COMPLEX_MODE_ENTER);
return;
}
- WRITE_ONCE(sma->complex_mode, true);
+ WRITE_ONCE(sma->complex_mode, COMPLEX_MODE_ENTER);
/* We need a full barrier:
* The write to complex_mode must be visible
@@ -309,15 +321,18 @@ static void complexmode_tryleave(struct sem_array *sma)
*/
return;
}
+ if (sma->complex_mode == 0) {
+pr_info("error.");
+ }
/*
- * Immediately after setting complex_mode to false,
+ * Immediately after setting complex_mode to 0,
* a simple op can start. Thus: all memory writes
* performed by the current operation must be visible
* before we set complex_mode to false.
*/
smp_wmb();
- WRITE_ONCE(sma->complex_mode, false);
+ WRITE_ONCE(sma->complex_mode, sma->complex_mode-1);
}
/*
@@ -377,19 +392,30 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
if (sma->complex_count == 0) {
/* False alarm:
- * There is no complex operation, thus we can switch
- * back to the fast path.
- */
- spin_lock(&sem->lock);
- ipc_unlock_object(&sma->sem_perm);
- return sops->sem_num;
- } else {
- /* Not a false alarm, thus complete the sequence for a
- * full lock.
+ * There is no complex operation, check hysteresis
+ * If 0, switch back to the fast path.
*/
- complexmode_enter(sma);
- return -1;
+ if (sma->complex_mode > 0) {
+ /*
+ * barrier provided by spin_lock
+ * sma->sem_perm.lock
+ */
+
+ WRITE_ONCE(sma->complex_mode, sma->complex_mode-1);
+ }
+ if (sma->complex_mode == 0) {
+ spin_lock(&sem->lock);
+ ipc_unlock_object(&sma->sem_perm);
+ return sops->sem_num;
+ }
}
+ /*
+ * Not a false alarm, but we must be already in
+ * complex_mode: either because of waiting complex ops,
+ * or due to hysteresis.
+ */
+ WARN_ON(sma->complex_mode == 0);
+ return -1;
}
static inline void sem_unlock(struct sem_array *sma, int locknum)
@@ -548,7 +574,7 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
}
sma->complex_count = 0;
- sma->complex_mode = true; /* dropped by sem_unlock below */
+ WRITE_ONCE(sma->complex_mode, COMPLEX_MODE_ENTER);
INIT_LIST_HEAD(&sma->pending_alter);
INIT_LIST_HEAD(&sma->pending_const);
INIT_LIST_HEAD(&sma->list_id);
--
2.4.3
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH] ipc/sem.c: Fix complex_count vs. simple op race
@ 2016-07-21 17:54 Manfred Spraul
2016-07-21 20:16 ` Andrew Morton
0 siblings, 1 reply; 5+ messages in thread
From: Manfred Spraul @ 2016-07-21 17:54 UTC (permalink / raw)
To: H. Peter Anvin, Peter Zijlstra, Andrew Morton, Davidlohr Bueso
Cc: LKML, Thomas Gleixner, Ingo Molnar, 1vier1, felixh,
Manfred Spraul, stable
Next update:
- switch to smp_store_mb() instead of WRITE_ONCE();smp_mb();
- introduce SEM_GLOBAL_LOCK instead of magic -1.
- do not use READ_ONCE() for the unlocked&unordered test:
READ_ONCE doesn't make sense for unlocked&unordered code.
- document why smp_mb() is required after spin_lock().
Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
race:
sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
- a non-simple operations is ongoing (sma->sem_perm.lock held)
- a complex operation is sleeping (sma->complex_count != 0)
As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions. See below for more details
(or kernel bugzilla 105651).
The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.
The patch also updates stale documentation regarding the locking.
With regards to stable kernels:
The patch is required for all kernels that include the
commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)
The alternative is to revert the patch that introduced the race.
The patch is safe for backporting, i.e. it makes no assumptions
about memory barriers in spin_unlock_wait().
Background:
Here is the race of the current implementation:
Thread A: (simple op)
- does the first "sma->complex_count == 0" test
Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
find Thread A, because Thread A does not own sem->lock yet.
- the thread does the operation, increases complex_count,
drops sem_lock, sleeps
Thread A:
- spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
- sleeps before the complex_count test
Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count
Thread A:
- does the complex_count test
Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.
Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
Reported-by: felixh@informatik.uni-bremen.de
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
---
include/linux/sem.h | 1 +
ipc/sem.c | 138 +++++++++++++++++++++++++++++++---------------------
2 files changed, 84 insertions(+), 55 deletions(-)
diff --git a/include/linux/sem.h b/include/linux/sem.h
index 976ce3a..d0efd6e 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -21,6 +21,7 @@ struct sem_array {
struct list_head list_id; /* undo requests on this array */
int sem_nsems; /* no. of semaphores in array */
int complex_count; /* pending complex operations */
+ bool complex_mode; /* no parallel simple ops */
};
#ifdef CONFIG_SYSVIPC
diff --git a/ipc/sem.c b/ipc/sem.c
index ae72b3c..3987a97 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -162,14 +162,21 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
/*
* Locking:
+ * a) global sem_lock() for read/write
* sem_undo.id_next,
* sem_array.complex_count,
- * sem_array.pending{_alter,_cont},
- * sem_array.sem_undo: global sem_lock() for read/write
- * sem_undo.proc_next: only "current" is allowed to read/write that field.
+ * sem_array.complex_mode
+ * sem_array.pending{_alter,_const},
+ * sem_array.sem_undo
*
+ * b) global or semaphore sem_lock() for read/write:
* sem_array.sem_base[i].pending_{const,alter}:
- * global or semaphore sem_lock() for read/write
+ * sem_array.complex_mode (for read)
+ *
+ * c) special:
+ * sem_undo_list.list_proc:
+ * * undo_list->lock for write
+ * * rcu for read
*/
#define sc_semmsl sem_ctls[0]
@@ -260,30 +267,61 @@ static void sem_rcu_free(struct rcu_head *head)
}
/*
- * Wait until all currently ongoing simple ops have completed.
+ * Enter the mode suitable for non-simple operations:
* Caller must own sem_perm.lock.
- * New simple ops cannot start, because simple ops first check
- * that sem_perm.lock is free.
- * that a) sem_perm.lock is free and b) complex_count is 0.
*/
-static void sem_wait_array(struct sem_array *sma)
+static void complexmode_enter(struct sem_array *sma)
{
int i;
struct sem *sem;
- if (sma->complex_count) {
- /* The thread that increased sma->complex_count waited on
- * all sem->lock locks. Thus we don't need to wait again.
- */
+ if (sma->complex_mode) {
+ /* We are already in complex_mode. Nothing to do */
return;
}
+ /* We need a full barrier after seting complex_mode:
+ * The write to complex_mode must be visible
+ * before we read the first sem->lock spinlock state.
+ */
+ smp_store_mb(sma->complex_mode, true);
+
for (i = 0; i < sma->sem_nsems; i++) {
sem = sma->sem_base + i;
spin_unlock_wait(&sem->lock);
}
+ /*
+ * spin_unlock_wait() is not a memory barriers, it is only a
+ * control barrier. The code must pair with spin_unlock(&sem->lock),
+ * thus just the control barrier is insufficient.
+ *
+ * smp_rmb() is sufficient, as writes cannot pass the control barrier.
+ */
+ smp_rmb();
+}
+
+/*
+ * Try to leave the mode that disallows simple operations:
+ * Caller must own sem_perm.lock.
+ */
+static void complexmode_tryleave(struct sem_array *sma)
+{
+ if (sma->complex_count) {
+ /* Complex ops are sleeping.
+ * We must stay in complex mode
+ */
+ return;
+ }
+ /*
+ * Immediately after setting complex_mode to false,
+ * a simple op can start. Thus: all memory writes
+ * performed by the current operation must be visible
+ * before we set complex_mode to false.
+ */
+ smp_store_release(&sma->complex_mode, false);
}
+#define SEM_GLOBAL_LOCK (-1)
/*
* If the request contains only one semaphore operation, and there are
* no complex transactions pending, lock only the semaphore involved.
@@ -300,56 +338,42 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
/* Complex operation - acquire a full lock */
ipc_lock_object(&sma->sem_perm);
- /* And wait until all simple ops that are processed
- * right now have dropped their locks.
- */
- sem_wait_array(sma);
- return -1;
+ /* Prevent parallel simple ops */
+ complexmode_enter(sma);
+ return SEM_GLOBAL_LOCK;
}
/*
* Only one semaphore affected - try to optimize locking.
- * The rules are:
- * - optimized locking is possible if no complex operation
- * is either enqueued or processed right now.
- * - The test for enqueued complex ops is simple:
- * sma->complex_count != 0
- * - Testing for complex ops that are processed right now is
- * a bit more difficult. Complex ops acquire the full lock
- * and first wait that the running simple ops have completed.
- * (see above)
- * Thus: If we own a simple lock and the global lock is free
- * and complex_count is now 0, then it will stay 0 and
- * thus just locking sem->lock is sufficient.
+ * Optimized locking is possible if no complex operation
+ * is either enqueued or processed right now.
+ *
+ * Both facts are tracked by complex_mode.
*/
sem = sma->sem_base + sops->sem_num;
- if (sma->complex_count == 0) {
+ /*
+ * Initial check for complex_mode. Just an optimization,
+ * no locking, no memory barrier.
+ */
+ if (!sma->complex_mode) {
/*
* It appears that no complex operation is around.
* Acquire the per-semaphore lock.
*/
spin_lock(&sem->lock);
- /* Then check that the global lock is free */
- if (!spin_is_locked(&sma->sem_perm.lock)) {
- /*
- * We need a memory barrier with acquire semantics,
- * otherwise we can race with another thread that does:
- * complex_count++;
- * spin_unlock(sem_perm.lock);
- */
- smp_acquire__after_ctrl_dep();
+ /*
+ * See 51d7d5205d33
+ * ("powerpc: Add smp_mb() to arch_spin_is_locked()"):
+ * A full barrier is required: the write of sem->lock
+ * must be visible before the read is executed
+ */
+ smp_mb();
- /*
- * Now repeat the test of complex_count:
- * It can't change anymore until we drop sem->lock.
- * Thus: if is now 0, then it will stay 0.
- */
- if (sma->complex_count == 0) {
- /* fast path successful! */
- return sops->sem_num;
- }
+ if (!smp_load_acquire(&sma->complex_mode)) {
+ /* fast path successful! */
+ return sops->sem_num;
}
spin_unlock(&sem->lock);
}
@@ -369,15 +393,16 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
/* Not a false alarm, thus complete the sequence for a
* full lock.
*/
- sem_wait_array(sma);
- return -1;
+ complexmode_enter(sma);
+ return SEM_GLOBAL_LOCK;
}
}
static inline void sem_unlock(struct sem_array *sma, int locknum)
{
- if (locknum == -1) {
+ if (locknum == SEM_GLOBAL_LOCK) {
unmerge_queues(sma);
+ complexmode_tryleave(sma);
ipc_unlock_object(&sma->sem_perm);
} else {
struct sem *sem = sma->sem_base + locknum;
@@ -529,6 +554,7 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
}
sma->complex_count = 0;
+ sma->complex_mode = true; /* dropped by sem_unlock below */
INIT_LIST_HEAD(&sma->pending_alter);
INIT_LIST_HEAD(&sma->pending_const);
INIT_LIST_HEAD(&sma->list_id);
@@ -2184,10 +2210,10 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
/*
* The proc interface isn't aware of sem_lock(), it calls
* ipc_lock_object() directly (in sysvipc_find_ipc).
- * In order to stay compatible with sem_lock(), we must wait until
- * all simple semop() calls have left their critical regions.
+ * In order to stay compatible with sem_lock(), we must
+ * enter / leave complex_mode.
*/
- sem_wait_array(sma);
+ complexmode_enter(sma);
sem_otime = get_semotime(sma);
@@ -2204,6 +2230,8 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
sem_otime,
sma->sem_ctime);
+ complexmode_tryleave(sma);
+
return 0;
}
#endif
--
2.5.5
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] ipc/sem.c: Fix complex_count vs. simple op race
2016-07-21 17:54 Manfred Spraul
@ 2016-07-21 20:16 ` Andrew Morton
0 siblings, 0 replies; 5+ messages in thread
From: Andrew Morton @ 2016-07-21 20:16 UTC (permalink / raw)
To: Manfred Spraul
Cc: H. Peter Anvin, Peter Zijlstra, Davidlohr Bueso, LKML,
Thomas Gleixner, Ingo Molnar, 1vier1, felixh, stable
On Thu, 21 Jul 2016 19:54:55 +0200 Manfred Spraul <manfred@colorfullife.com> wrote:
> Next update:
> - switch to smp_store_mb() instead of WRITE_ONCE();smp_mb();
> - introduce SEM_GLOBAL_LOCK instead of magic -1.
> - do not use READ_ONCE() for the unlocked&unordered test:
> READ_ONCE doesn't make sense for unlocked&unordered code.
> - document why smp_mb() is required after spin_lock().
I assume "ipc/sem.c: remove duplicated memory barriers" is still
relevant?
From: Manfred Spraul <manfred@colorfullife.com>
Subject: ipc/sem.c: remove duplicated memory barriers
With 2c610022711 ("locking/qspinlock: Fix spin_unlock_wait() some more"),
memory barriers were added into spin_unlock_wait(). Thus another barrier
is not required.
And as explained in 055ce0fd1b8 ("locking/qspinlock: Add comments"),
spin_lock() provides a barrier so that reads within the critical section
cannot happen before the write for the lock is visible. i.e. spin_lock
provides an acquire barrier after the write of the lock variable, this
barrier pairs with the smp_mb() in complexmode_enter().
Link: http://lkml.kernel.org/r/1468386412-3608-3-git-send-email-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <1vier1@web.de>
Cc: <felixh@informatik.uni-bremen.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
ipc/sem.c | 16 ----------------
1 file changed, 16 deletions(-)
diff -puN ipc/sem.c~ipc-semc-remove-duplicated-memory-barriers ipc/sem.c
--- a/ipc/sem.c~ipc-semc-remove-duplicated-memory-barriers
+++ a/ipc/sem.c
@@ -290,14 +290,6 @@ static void complexmode_enter(struct sem
sem = sma->sem_base + i;
spin_unlock_wait(&sem->lock);
}
- /*
- * spin_unlock_wait() is not a memory barriers, it is only a
- * control barrier. The code must pair with spin_unlock(&sem->lock),
- * thus just the control barrier is insufficient.
- *
- * smp_rmb() is sufficient, as writes cannot pass the control barrier.
- */
- smp_rmb();
}
/*
@@ -363,14 +355,6 @@ static inline int sem_lock(struct sem_ar
*/
spin_lock(&sem->lock);
- /*
- * See 51d7d5205d33
- * ("powerpc: Add smp_mb() to arch_spin_is_locked()"):
- * A full barrier is required: the write of sem->lock
- * must be visible before the read is executed
- */
- smp_mb();
-
if (!smp_load_acquire(&sma->complex_mode)) {
/* fast path successful! */
return sops->sem_num;
_
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-07-21 20:16 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-01-02 12:04 [PATCH] ipc/sem.c: Fix complex_count vs. simple op race Manfred Spraul
2016-01-04 13:02 ` Davidlohr Bueso
2016-01-04 18:32 ` Manfred Spraul
-- strict thread matches above, loose matches on Subject: below --
2016-07-21 17:54 Manfred Spraul
2016-07-21 20:16 ` Andrew Morton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).