* [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
       [not found] <1468386412-3608-1-git-send-email-manfred@colorfullife.com>
@ 2016-07-13  5:06 ` Manfred Spraul
  2016-07-16  1:27   ` Davidlohr Bueso
  0 siblings, 1 reply; 12+ messages in thread
From: Manfred Spraul @ 2016-07-13  5:06 UTC (permalink / raw)
  To: H. Peter Anvin, Peter Zijlstra, Andrew Morton, Davidlohr Bueso
  Cc: LKML, Thomas Gleixner, Ingo Molnar, 1vier1, felixh,
        Manfred Spraul, stable

Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
race:

sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
- a non-simple operation is ongoing (sma->sem_perm.lock held)
- a complex operation is sleeping (sma->complex_count != 0)

As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions. See below for more details
(or kernel bugzilla 105651).

The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.

The patch also updates stale documentation regarding the locking.

With regards to stable kernels:
The patch is required for all kernels that include the
commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)

The alternative is to revert the patch that introduced the race.

The patch is safe for backporting, i.e. it makes no assumptions
about memory barriers in spin_unlock_wait() or that the acquire
within spin_lock() happens after setting the lock variable.

Background:
Here is the race of the current implementation:

Thread A: (simple op)
- does the first "sma->complex_count == 0" test

Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
  find Thread A, because Thread A does not own sem->lock yet.
- the thread does the operation, increases complex_count,
  drops sem_lock, sleeps

Thread A:
- spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
- sleeps before the complex_count test

Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count

Thread A:
- does the complex_count test

Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.

Full memory barriers are required to synchronize changes of
complex_mode and the lock operations.

Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
Reported-by: felixh@informatik.uni-bremen.de
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
---
 include/linux/sem.h |   1 +
 ipc/sem.c           | 130 +++++++++++++++++++++++++++++++---------------------
 2 files changed, 79 insertions(+), 52 deletions(-)

diff --git a/include/linux/sem.h b/include/linux/sem.h
index 976ce3a..d0efd6e 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -21,6 +21,7 @@ struct sem_array {
        struct list_head list_id;       /* undo requests on this array */
        int             sem_nsems;      /* no. of semaphores in array */
        int             complex_count;  /* pending complex operations */
+       bool            complex_mode;   /* no parallel simple ops */
 };

 #ifdef CONFIG_SYSVIPC
diff --git a/ipc/sem.c b/ipc/sem.c
index ae72b3c..0da63c8 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -162,14 +162,21 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);

 /*
  * Locking:
+ * a) global sem_lock() for read/write
  *     sem_undo.id_next,
  *     sem_array.complex_count,
- *     sem_array.pending{_alter,_cont},
- *     sem_array.sem_undo: global sem_lock() for read/write
- *     sem_undo.proc_next: only "current" is allowed to read/write that field.
+ *     sem_array.complex_mode
+ *     sem_array.pending{_alter,_const},
+ *     sem_array.sem_undo
  *
+ * b) global or semaphore sem_lock() for read/write:
  *     sem_array.sem_base[i].pending_{const,alter}:
- *             global or semaphore sem_lock() for read/write
+ *     sem_array.complex_mode (for read)
+ *
+ * c) special:
+ *     sem_undo_list.list_proc:
+ *     * undo_list->lock for write
+ *     * rcu for read
  */

 #define sc_semmsl      sem_ctls[0]
@@ -260,28 +267,59 @@ static void sem_rcu_free(struct rcu_head *head)
 }

 /*
- * Wait until all currently ongoing simple ops have completed.
+ * Enter the mode suitable for non-simple operations:
  * Caller must own sem_perm.lock.
- * New simple ops cannot start, because simple ops first check
- *     that sem_perm.lock is free.
- *     that a) sem_perm.lock is free and b) complex_count is 0.
  */
-static void sem_wait_array(struct sem_array *sma)
+static void complexmode_enter(struct sem_array *sma)
 {
        int i;
        struct sem *sem;

-       if (sma->complex_count) {
-               /* The thread that increased sma->complex_count waited on
-                * all sem->lock locks. Thus we don't need to wait again.
-                */
+       if (sma->complex_mode) {
+               /* We are already in complex_mode. Nothing to do */
                return;
        }
+       WRITE_ONCE(sma->complex_mode, true);
+
+       /* We need a full barrier:
+        * The write to complex_mode must be visible
+        * before we read the first sem->lock spinlock state.
+        */
+       smp_mb();

        for (i = 0; i < sma->sem_nsems; i++) {
                sem = sma->sem_base + i;
                spin_unlock_wait(&sem->lock);
        }
+       /*
+        * spin_unlock_wait() is not a memory barrier, it is only a
+        * control barrier. The code must pair with spin_unlock(&sem->lock),
+        * thus just the control barrier is insufficient.
+        *
+        * smp_rmb() is sufficient, as writes cannot pass the control barrier.
+        */
+       smp_rmb();
+}
+
+/*
+ * Try to leave the mode that disallows simple operations:
+ * Caller must own sem_perm.lock.
+ */
+static void complexmode_tryleave(struct sem_array *sma)
+{
+       if (sma->complex_count) {
+               /* Complex ops are sleeping.
+                * We must stay in complex mode
+                */
+               return;
+       }
+       /*
+        * Immediately after setting complex_mode to false,
+        * a simple op can start. Thus: all memory writes
+        * performed by the current operation must be visible
+        * before we set complex_mode to false.
+        */
+       smp_store_release(&sma->complex_mode, false);
 }

 /*
@@ -300,56 +338,40 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                /* Complex operation - acquire a full lock */
                ipc_lock_object(&sma->sem_perm);

-               /* And wait until all simple ops that are processed
-                * right now have dropped their locks.
-                */
-               sem_wait_array(sma);
+               /* Prevent parallel simple ops */
+               complexmode_enter(sma);
                return -1;
        }

        /*
         * Only one semaphore affected - try to optimize locking.
-        * The rules are:
-        * - optimized locking is possible if no complex operation
-        *   is either enqueued or processed right now.
-        * - The test for enqueued complex ops is simple:
-        *      sma->complex_count != 0
-        * - Testing for complex ops that are processed right now is
-        *   a bit more difficult. Complex ops acquire the full lock
-        *   and first wait that the running simple ops have completed.
-        *   (see above)
-        *   Thus: If we own a simple lock and the global lock is free
-        *      and complex_count is now 0, then it will stay 0 and
-        *      thus just locking sem->lock is sufficient.
+        * Optimized locking is possible if no complex operation
+        * is either enqueued or processed right now.
+        *
+        * Both facts are tracked by complex_mode.
         */
        sem = sma->sem_base + sops->sem_num;

-       if (sma->complex_count == 0) {
+       /*
+        * Initial check for complex_mode. Just an optimization,
+        * no locking, no memory barrier.
+        */
+       if (!READ_ONCE(sma->complex_mode)) {
                /*
                 * It appears that no complex operation is around.
                 * Acquire the per-semaphore lock.
                 */
                spin_lock(&sem->lock);

-               /* Then check that the global lock is free */
-               if (!spin_is_locked(&sma->sem_perm.lock)) {
-                       /*
-                        * We need a memory barrier with acquire semantics,
-                        * otherwise we can race with another thread that does:
-                        *      complex_count++;
-                        *      spin_unlock(sem_perm.lock);
-                        */
-                       smp_acquire__after_ctrl_dep();
+               /*
+                * A full barrier is required: the write of sem->lock
+                * must be visible before the read is executed
+                */
+               smp_mb();

-                       /*
-                        * Now repeat the test of complex_count:
-                        * It can't change anymore until we drop sem->lock.
-                        * Thus: if is now 0, then it will stay 0.
-                        */
-                       if (sma->complex_count == 0) {
-                               /* fast path successful! */
-                               return sops->sem_num;
-                       }
+               if (!smp_load_acquire(&sma->complex_mode)) {
+                       /* fast path successful! */
+                       return sops->sem_num;
                }
                spin_unlock(&sem->lock);
        }
@@ -369,7 +391,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                /* Not a false alarm, thus complete the sequence for a
                 * full lock.
                 */
-               sem_wait_array(sma);
+               complexmode_enter(sma);
                return -1;
        }
 }
@@ -378,6 +400,7 @@ static inline void sem_unlock(struct sem_array *sma, int locknum)
 {
        if (locknum == -1) {
                unmerge_queues(sma);
+               complexmode_tryleave(sma);
                ipc_unlock_object(&sma->sem_perm);
        } else {
                struct sem *sem = sma->sem_base + locknum;
@@ -529,6 +552,7 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
        }

        sma->complex_count = 0;
+       sma->complex_mode = true; /* dropped by sem_unlock below */
        INIT_LIST_HEAD(&sma->pending_alter);
        INIT_LIST_HEAD(&sma->pending_const);
        INIT_LIST_HEAD(&sma->list_id);
@@ -2184,10 +2208,10 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
        /*
         * The proc interface isn't aware of sem_lock(), it calls
         * ipc_lock_object() directly (in sysvipc_find_ipc).
-        * In order to stay compatible with sem_lock(), we must wait until
-        * all simple semop() calls have left their critical regions.
+        * In order to stay compatible with sem_lock(), we must
+        * enter / leave complex_mode.
         */
-       sem_wait_array(sma);
+       complexmode_enter(sma);

        sem_otime = get_semotime(sma);

@@ -2204,6 +2228,8 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
                   sem_otime,
                   sma->sem_ctime);

+       complexmode_tryleave(sma);
+
        return 0;
 }
 #endif
--
2.5.5
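A userspace model of the protocol above may help when experimenting with
the pairing. It is an illustrative sketch only, not kernel code: the
two-semaphore setup and all names are invented, C11 seq_cst fences stand
in for smp_mb(), acquire/release operations stand in for the kernel
primitives, and the complex_count handling is omitted.

/* model.c - build with: cc -std=c11 -pthread model.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define NSEMS 2

static atomic_bool complex_mode;        /* models sma->complex_mode */
static atomic_bool sem_locked[NSEMS];   /* models sem->lock */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* models the sem_lock() fast path; returns true with sem n "locked" */
static bool try_fast_path(int n)
{
        bool expected = false;

        /* initial check: just an optimization, no ordering needed */
        if (atomic_load_explicit(&complex_mode, memory_order_relaxed))
                return false;

        /* spin_lock(&sem->lock) */
        while (!atomic_compare_exchange_weak_explicit(&sem_locked[n],
                        &expected, true,
                        memory_order_acquire, memory_order_relaxed))
                expected = false;

        atomic_thread_fence(memory_order_seq_cst);      /* smp_mb() */

        if (!atomic_load_explicit(&complex_mode, memory_order_acquire))
                return true;            /* fast path successful */

        atomic_store_explicit(&sem_locked[n], false, memory_order_release);
        return false;                   /* caller must take the full lock */
}

/* models ipc_lock_object() + complexmode_enter() */
static void full_lock(void)
{
        pthread_mutex_lock(&global_lock);
        if (atomic_load_explicit(&complex_mode, memory_order_relaxed))
                return;                 /* already in complex mode */
        atomic_store_explicit(&complex_mode, true, memory_order_relaxed);
        atomic_thread_fence(memory_order_seq_cst);      /* smp_mb() */
        for (int i = 0; i < NSEMS; i++)         /* spin_unlock_wait() loop */
                while (atomic_load_explicit(&sem_locked[i],
                                memory_order_acquire))
                        ;
}

/* models complexmode_tryleave() + ipc_unlock_object() */
static void full_unlock(void)
{
        atomic_store_explicit(&complex_mode, false, memory_order_release);
        pthread_mutex_unlock(&global_lock);
}

int main(void)
{
        if (try_fast_path(0))           /* succeeds: no complex op around */
                atomic_store_explicit(&sem_locked[0], false,
                                      memory_order_release);
        full_lock();
        /* prints 0: the fast path refuses while complex_mode is set */
        printf("fast path while in complex mode: %d\n", try_fast_path(1));
        full_unlock();
        return 0;
}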
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
  2016-07-13  5:06 ` [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race Manfred Spraul
@ 2016-07-16  1:27   ` Davidlohr Bueso
  0 siblings, 0 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2016-07-16  1:27 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: H. Peter Anvin, Peter Zijlstra, Andrew Morton, LKML,
        Thomas Gleixner, Ingo Molnar, 1vier1, felixh, stable

On Wed, 13 Jul 2016, Manfred Spraul wrote:

>-static void sem_wait_array(struct sem_array *sma)
>+static void complexmode_enter(struct sem_array *sma)
> {
>        int i;
>        struct sem *sem;
>
>-       if (sma->complex_count) {
>-               /* The thread that increased sma->complex_count waited on
>-                * all sem->lock locks. Thus we don't need to wait again.
>-                */
>+       if (sma->complex_mode) {
>+               /* We are already in complex_mode. Nothing to do */
>                return;
>        }
>+       WRITE_ONCE(sma->complex_mode, true);

So we can actually save those READ/WRITE_ONCE calls for complex_mode,
as it's a bool and therefore tearing is not an issue.

>+
>+       /* We need a full barrier:
>+        * The write to complex_mode must be visible
>+        * before we read the first sem->lock spinlock state.
>+        */
>+       smp_mb();

smp_store_mb()?

> /*
>@@ -300,56 +338,40 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
>                /* Complex operation - acquire a full lock */
>                ipc_lock_object(&sma->sem_perm);
>
>-               /* And wait until all simple ops that are processed
>-                * right now have dropped their locks.
>-                */
>-               sem_wait_array(sma);
>+               /* Prevent parallel simple ops */
>+               complexmode_enter(sma);
>                return -1;

nit and unrelated: we should probably use some better label here than
a raw -1 (although I don't see it changing, just for nicer reading),
ie: SEM_OBJECT_LOCKED

Thanks,
Davidlohr
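For readers following the smp_store_mb() suggestion: per
Documentation/memory-barriers.txt it folds the store and the full
barrier into one primitive, roughly:

        smp_store_mb(sma->complex_mode, true);

        /* is shorthand for: */
        WRITE_ONCE(sma->complex_mode, true);
        smp_mb();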
[parent not found: <1466876272-3824-1-git-send-email-manfred@colorfullife.com>]
* [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
       [not found] <1466876272-3824-1-git-send-email-manfred@colorfullife.com>
@ 2016-06-25 17:37 ` Manfred Spraul
  2016-06-25 19:41   ` Holger Hoffstätte
  0 siblings, 1 reply; 12+ messages in thread
From: Manfred Spraul @ 2016-06-25 17:37 UTC (permalink / raw)
  To: H. Peter Anvin, Peter Zijlstra, Andrew Morton, Davidlohr Bueso
  Cc: LKML, Thomas Gleixner, Ingo Molnar, 1vier1, felixh,
        Manfred Spraul, stable

Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
race:

sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
- a non-simple operation is ongoing (sma->sem_perm.lock held)
- a complex operation is sleeping (sma->complex_count != 0)

As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions. See below for more details
(or kernel bugzilla 105651).

The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.

The patch also updates stale documentation regarding the locking.

With regards to stable kernels:
The patch is required for all kernels that include the
commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)

The alternative is to revert the patch that introduced the race.

Background:
Here is the race of the current implementation:

Thread A: (simple op)
- does the first "sma->complex_count == 0" test

Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
  find Thread A, because Thread A does not own sem->lock yet.
- the thread does the operation, increases complex_count,
  drops sem_lock, sleeps

Thread A:
- spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
- sleeps before the complex_count test

Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count

Thread A:
- does the complex_count test

Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.

Full memory barriers are required to synchronize changes of
complex_mode and the lock operations.

Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
Reported-by: felixh@informatik.uni-bremen.de
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
---
 include/linux/sem.h |   1 +
 ipc/sem.c           | 122 ++++++++++++++++++++++++++++++----------------------
 2 files changed, 71 insertions(+), 52 deletions(-)

diff --git a/include/linux/sem.h b/include/linux/sem.h
index 976ce3a..d0efd6e 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -21,6 +21,7 @@ struct sem_array {
        struct list_head list_id;       /* undo requests on this array */
        int             sem_nsems;      /* no. of semaphores in array */
        int             complex_count;  /* pending complex operations */
+       bool            complex_mode;   /* no parallel simple ops */
 };

 #ifdef CONFIG_SYSVIPC
diff --git a/ipc/sem.c b/ipc/sem.c
index ae72b3c..538f43a 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -162,14 +162,21 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);

 /*
  * Locking:
+ * a) global sem_lock() for read/write
  *     sem_undo.id_next,
  *     sem_array.complex_count,
- *     sem_array.pending{_alter,_cont},
- *     sem_array.sem_undo: global sem_lock() for read/write
- *     sem_undo.proc_next: only "current" is allowed to read/write that field.
+ *     sem_array.complex_mode
+ *     sem_array.pending{_alter,_const},
+ *     sem_array.sem_undo
  *
+ * b) global or semaphore sem_lock() for read/write:
  *     sem_array.sem_base[i].pending_{const,alter}:
- *             global or semaphore sem_lock() for read/write
+ *     sem_array.complex_mode (for read)
+ *
+ * c) special:
+ *     sem_undo_list.list_proc:
+ *     * undo_list->lock for write
+ *     * rcu for read
  */

 #define sc_semmsl      sem_ctls[0]
@@ -260,23 +267,25 @@ static void sem_rcu_free(struct rcu_head *head)
 }

 /*
- * Wait until all currently ongoing simple ops have completed.
+ * Enter the mode suitable for non-simple operations:
  * Caller must own sem_perm.lock.
- * New simple ops cannot start, because simple ops first check
- *     that sem_perm.lock is free.
- *     that a) sem_perm.lock is free and b) complex_count is 0.
  */
-static void sem_wait_array(struct sem_array *sma)
+static void complexmode_enter(struct sem_array *sma)
 {
        int i;
        struct sem *sem;

-       if (sma->complex_count) {
-               /* The thread that increased sma->complex_count waited on
-                * all sem->lock locks. Thus we don't need to wait again.
-                */
+       if (sma->complex_mode) {
+               /* We are already in complex_mode. Nothing to do */
                return;
        }
+       WRITE_ONCE(sma->complex_mode, true);
+
+       /* We need a full barrier:
+        * The write to complex_mode must be visible
+        * before we read the first sem->lock spinlock state.
+        */
+       smp_mb();

        for (i = 0; i < sma->sem_nsems; i++) {
                sem = sma->sem_base + i;
@@ -285,6 +294,27 @@ static void sem_wait_array(struct sem_array *sma)
 }

 /*
+ * Try to leave the mode that disallows simple operations:
+ * Caller must own sem_perm.lock.
+ */
+static void complexmode_tryleave(struct sem_array *sma)
+{
+       if (sma->complex_count) {
+               /* Complex ops are sleeping.
+                * We must stay in complex mode
+                */
+               return;
+       }
+       /*
+        * Immediately after setting complex_mode to false,
+        * a simple op can start. Thus: all memory writes
+        * performed by the current operation must be visible
+        * before we set complex_mode to false.
+        */
+       smp_store_release(&sma->complex_mode, false);
+}
+
+/*
  * If the request contains only one semaphore operation, and there are
  * no complex transactions pending, lock only the semaphore involved.
  * Otherwise, lock the entire semaphore array, since we either have
@@ -300,56 +330,40 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                /* Complex operation - acquire a full lock */
                ipc_lock_object(&sma->sem_perm);

-               /* And wait until all simple ops that are processed
-                * right now have dropped their locks.
-                */
-               sem_wait_array(sma);
+               /* Prevent parallel simple ops */
+               complexmode_enter(sma);
                return -1;
        }

        /*
         * Only one semaphore affected - try to optimize locking.
-        * The rules are:
-        * - optimized locking is possible if no complex operation
-        *   is either enqueued or processed right now.
-        * - The test for enqueued complex ops is simple:
-        *      sma->complex_count != 0
-        * - Testing for complex ops that are processed right now is
-        *   a bit more difficult. Complex ops acquire the full lock
-        *   and first wait that the running simple ops have completed.
-        *   (see above)
-        *   Thus: If we own a simple lock and the global lock is free
-        *      and complex_count is now 0, then it will stay 0 and
-        *      thus just locking sem->lock is sufficient.
+        * Optimized locking is possible if no complex operation
+        * is either enqueued or processed right now.
+        *
+        * Both facts are tracked by complex_mode.
         */
        sem = sma->sem_base + sops->sem_num;

-       if (sma->complex_count == 0) {
+       /*
+        * Initial check for complex_mode. Just an optimization,
+        * no locking.
+        */
+       if (!READ_ONCE(sma->complex_mode)) {
                /*
                 * It appears that no complex operation is around.
                 * Acquire the per-semaphore lock.
                 */
                spin_lock(&sem->lock);

-               /* Then check that the global lock is free */
-               if (!spin_is_locked(&sma->sem_perm.lock)) {
-                       /*
-                        * We need a memory barrier with acquire semantics,
-                        * otherwise we can race with another thread that does:
-                        *      complex_count++;
-                        *      spin_unlock(sem_perm.lock);
-                        */
-                       smp_acquire__after_ctrl_dep();
+               /*
+                * A full barrier is required: the write of sem->lock
+                * must be visible before the read is executed
+                */
+               smp_mb();

-                       /*
-                        * Now repeat the test of complex_count:
-                        * It can't change anymore until we drop sem->lock.
-                        * Thus: if is now 0, then it will stay 0.
-                        */
-                       if (sma->complex_count == 0) {
-                               /* fast path successful! */
-                               return sops->sem_num;
-                       }
+               if (!READ_ONCE(sma->complex_mode)) {
+                       /* fast path successful! */
+                       return sops->sem_num;
                }
                spin_unlock(&sem->lock);
        }
@@ -369,7 +383,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                /* Not a false alarm, thus complete the sequence for a
                 * full lock.
                 */
-               sem_wait_array(sma);
+               complexmode_enter(sma);
                return -1;
        }
 }
@@ -378,6 +392,7 @@ static inline void sem_unlock(struct sem_array *sma, int locknum)
 {
        if (locknum == -1) {
                unmerge_queues(sma);
+               complexmode_tryleave(sma);
                ipc_unlock_object(&sma->sem_perm);
        } else {
                struct sem *sem = sma->sem_base + locknum;
@@ -529,6 +544,7 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
        }

        sma->complex_count = 0;
+       sma->complex_mode = true; /* dropped by sem_unlock below */
        INIT_LIST_HEAD(&sma->pending_alter);
        INIT_LIST_HEAD(&sma->pending_const);
        INIT_LIST_HEAD(&sma->list_id);
@@ -2184,10 +2200,10 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
        /*
         * The proc interface isn't aware of sem_lock(), it calls
         * ipc_lock_object() directly (in sysvipc_find_ipc).
-        * In order to stay compatible with sem_lock(), we must wait until
-        * all simple semop() calls have left their critical regions.
+        * In order to stay compatible with sem_lock(), we must
+        * enter / leave complex_mode.
         */
-       sem_wait_array(sma);
+       complexmode_enter(sma);

        sem_otime = get_semotime(sma);

@@ -2204,6 +2220,8 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
                   sem_otime,
                   sma->sem_ctime);

+       complexmode_tryleave(sma);
+
        return 0;
 }
 #endif
--
2.5.5
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
  2016-06-25 17:37 ` Manfred Spraul
@ 2016-06-25 19:41   ` Holger Hoffstätte
  0 siblings, 0 replies; 12+ messages in thread
From: Holger Hoffstätte @ 2016-06-25 19:41 UTC (permalink / raw)
  To: stable; +Cc: linux-kernel

On Sat, 25 Jun 2016 19:37:51 +0200, Manfred Spraul wrote:

> Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
> race:
>
> sem_lock has a fast path that allows parallel simple operations.
> There are two reasons why a simple operation cannot run in parallel:
> - a non-simple operation is ongoing (sma->sem_perm.lock held)
> - a complex operation is sleeping (sma->complex_count != 0)
>
> As both facts are stored independently, a thread can bypass the current
> checks by sleeping in the right positions. See below for more details
> (or kernel bugzilla 105651).
>
> The patch fixes that by creating one variable (complex_mode)
> that tracks both reasons why parallel operations are not possible.
>
> The patch also updates stale documentation regarding the locking.
>
> With regards to stable kernels:
> The patch is required for all kernels that include the
> commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)
>
> The alternative is to revert the patch that introduced the race.
>
> Background:
> Here is the race of the current implementation:
>
> Thread A: (simple op)
> - does the first "sma->complex_count == 0" test
>
> Thread B: (complex op)
> - does sem_lock(): This includes an array scan. But the scan can't
>   find Thread A, because Thread A does not own sem->lock yet.
> - the thread does the operation, increases complex_count,
>   drops sem_lock, sleeps
>
> Thread A:
> - spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
> - sleeps before the complex_count test
>
> Thread C: (complex op)
> - does sem_lock (no array scan, complex_count==1)
> - wakes up Thread B.
> - decrements complex_count
>
> Thread A:
> - does the complex_count test
>
> Bug:
> Now both thread A and thread C operate on the same array, without
> any synchronization.
>
> Full memory barriers are required to synchronize changes of
> complex_mode and the lock operations.
>
> Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
> Reported-by: felixh@informatik.uni-bremen.de
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
> Cc: <stable@vger.kernel.org>

<snip>

>
>-               /* Then check that the global lock is free */
>-               if (!spin_is_locked(&sma->sem_perm.lock)) {
>-                       /*
>-                        * We need a memory barrier with acquire semantics,
>-                        * otherwise we can race with another thread that does:
>-                        *      complex_count++;
>-                        *      spin_unlock(sem_perm.lock);
>-                        */
>-                       smp_acquire__after_ctrl_dep();

This won't apply to -stable because smp_acquire__after_ctrl_dep() was
only recently added. I could merge this over 4.4.x by replacing it with
the previous definition, ipc_smp_acquire__after_spin_is_unlocked()
(just as you did in the first version of the patch).

hth,
Holger
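For anyone carrying this over 4.4.x along the lines Holger describes,
the only hunk that should need touching up is presumably the removed
context line; a sketch, assuming the 4.4 tree still calls the old
helper at that spot:

        -                       ipc_smp_acquire__after_spin_is_unlocked();

instead of

        -                       smp_acquire__after_ctrl_dep();

i.e. the '-' line of the patch must name the helper that the older
source actually contains, so that the hunk applies.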
[parent not found: <20160615152318.164b1ebd@canb.auug.org.au>]
* [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
       [not found] <20160615152318.164b1ebd@canb.auug.org.au>
@ 2016-06-18 20:02 ` Manfred Spraul
  2016-06-20 23:04   ` Andrew Morton
  2016-06-21  0:30   ` Davidlohr Bueso
  0 siblings, 2 replies; 12+ messages in thread
From: Manfred Spraul @ 2016-06-18 20:02 UTC (permalink / raw)
  To: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
        Peter Zijlstra, Andrew Morton
  Cc: LKML, linux-next, 1vier1, Davidlohr Bueso, felixh,
        Manfred Spraul, stable

Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
race:

sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
- a non-simple operation is ongoing (sma->sem_perm.lock held)
- a complex operation is sleeping (sma->complex_count != 0)

As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions. See below for more details
(or kernel bugzilla 105651).

The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.

The patch also updates stale documentation regarding the locking.

With regards to stable kernels:
The patch is required for all kernels that include the
commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)

The alternative is to revert the patch that introduced the race.

Background:
Here is the race of the current implementation:

Thread A: (simple op)
- does the first "sma->complex_count == 0" test

Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
  find Thread A, because Thread A does not own sem->lock yet.
- the thread does the operation, increases complex_count,
  drops sem_lock, sleeps

Thread A:
- spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
- sleeps before the complex_count test

Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count

Thread A:
- does the complex_count test

Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.

Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
Reported-by: felixh@informatik.uni-bremen.de
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
---
diff --git a/include/linux/sem.h b/include/linux/sem.h
index 976ce3a..d0efd6e 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -21,6 +21,7 @@ struct sem_array {
        struct list_head list_id;       /* undo requests on this array */
        int             sem_nsems;      /* no. of semaphores in array */
        int             complex_count;  /* pending complex operations */
+       bool            complex_mode;   /* no parallel simple ops */
 };

 #ifdef CONFIG_SYSVIPC
diff --git a/ipc/sem.c b/ipc/sem.c
index ae72b3c..db2e6fc 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -162,14 +162,21 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);

 /*
  * Locking:
+ * a) global sem_lock() for read/write
  *     sem_undo.id_next,
  *     sem_array.complex_count,
- *     sem_array.pending{_alter,_cont},
- *     sem_array.sem_undo: global sem_lock() for read/write
- *     sem_undo.proc_next: only "current" is allowed to read/write that field.
+ *     sem_array.complex_mode
+ *     sem_array.pending{_alter,_const},
+ *     sem_array.sem_undo
  *
+ * b) global or semaphore sem_lock() for read/write:
  *     sem_array.sem_base[i].pending_{const,alter}:
- *             global or semaphore sem_lock() for read/write
+ *     sem_array.complex_mode (for read)
+ *
+ * c) special:
+ *     sem_undo_list.list_proc:
+ *     * undo_list->lock for write
+ *     * rcu for read
  */

 #define sc_semmsl      sem_ctls[0]
@@ -260,23 +267,25 @@ static void sem_rcu_free(struct rcu_head *head)
 }

 /*
- * Wait until all currently ongoing simple ops have completed.
+ * Enter the mode suitable for non-simple operations:
  * Caller must own sem_perm.lock.
- * New simple ops cannot start, because simple ops first check
- *     that sem_perm.lock is free.
- *     that a) sem_perm.lock is free and b) complex_count is 0.
  */
-static void sem_wait_array(struct sem_array *sma)
+static void complexmode_enter(struct sem_array *sma)
 {
        int i;
        struct sem *sem;

-       if (sma->complex_count) {
-               /* The thread that increased sma->complex_count waited on
-                * all sem->lock locks. Thus we don't need to wait again.
-                */
+       if (sma->complex_mode) {
+               /* We are already in complex_mode. Nothing to do */
                return;
        }
+       WRITE_ONCE(sma->complex_mode, true);
+
+       /* We need a full barrier:
+        * The write to complex_mode must be visible
+        * before we read the first sem->lock spinlock state.
+        */
+       smp_mb();

        for (i = 0; i < sma->sem_nsems; i++) {
                sem = sma->sem_base + i;
@@ -285,6 +294,29 @@ static void sem_wait_array(struct sem_array *sma)
 }

 /*
+ * Try to leave the mode that disallows simple operations:
+ * Caller must own sem_perm.lock.
+ */
+static void complexmode_tryleave(struct sem_array *sma)
+{
+       if (sma->complex_count) {
+               /* Complex ops are sleeping.
+                * We must stay in complex mode
+                */
+               return;
+       }
+       /*
+        * Immediately after setting complex_mode to false,
+        * a simple op can start. Thus: all memory writes
+        * performed by the current operation must be visible
+        * before we set complex_mode to false.
+        */
+       smp_wmb();
+
+       WRITE_ONCE(sma->complex_mode, false);
+}
+
+/*
  * If the request contains only one semaphore operation, and there are
  * no complex transactions pending, lock only the semaphore involved.
  * Otherwise, lock the entire semaphore array, since we either have
@@ -300,56 +332,38 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                /* Complex operation - acquire a full lock */
                ipc_lock_object(&sma->sem_perm);

-               /* And wait until all simple ops that are processed
-                * right now have dropped their locks.
-                */
-               sem_wait_array(sma);
+               /* Prevent parallel simple ops */
+               complexmode_enter(sma);
                return -1;
        }

        /*
         * Only one semaphore affected - try to optimize locking.
-        * The rules are:
-        * - optimized locking is possible if no complex operation
-        *   is either enqueued or processed right now.
-        * - The test for enqueued complex ops is simple:
-        *      sma->complex_count != 0
-        * - Testing for complex ops that are processed right now is
-        *   a bit more difficult. Complex ops acquire the full lock
-        *   and first wait that the running simple ops have completed.
-        *   (see above)
-        *   Thus: If we own a simple lock and the global lock is free
-        *      and complex_count is now 0, then it will stay 0 and
-        *      thus just locking sem->lock is sufficient.
+        * Optimized locking is possible if no complex operation
+        * is either enqueued or processed right now.
+        *
+        * Both facts are tracked by complex_mode.
         */
        sem = sma->sem_base + sops->sem_num;

-       if (sma->complex_count == 0) {
+       /*
+        * Initial check for complex_mode. Just an optimization,
+        * no locking.
+        */
+       if (!READ_ONCE(sma->complex_mode)) {
                /*
                 * It appears that no complex operation is around.
                 * Acquire the per-semaphore lock.
                 */
                spin_lock(&sem->lock);

-               /* Then check that the global lock is free */
-               if (!spin_is_locked(&sma->sem_perm.lock)) {
-                       /*
-                        * We need a memory barrier with acquire semantics,
-                        * otherwise we can race with another thread that does:
-                        *      complex_count++;
-                        *      spin_unlock(sem_perm.lock);
-                        */
-                       smp_acquire__after_ctrl_dep();
-
-                       /*
-                        * Now repeat the test of complex_count:
-                        * It can't change anymore until we drop sem->lock.
-                        * Thus: if is now 0, then it will stay 0.
-                        */
-                       if (sma->complex_count == 0) {
-                               /* fast path successful! */
-                               return sops->sem_num;
-                       }
+               /* Now repeat the test for complex_mode.
+                * A memory barrier is provided by the spin_lock()
+                * above.
+                */
+               if (!READ_ONCE(sma->complex_mode)) {
+                       /* fast path successful! */
+                       return sops->sem_num;
                }
                spin_unlock(&sem->lock);
        }
@@ -369,7 +383,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                /* Not a false alarm, thus complete the sequence for a
                 * full lock.
                 */
-               sem_wait_array(sma);
+               complexmode_enter(sma);
                return -1;
        }
 }
@@ -378,6 +392,7 @@ static inline void sem_unlock(struct sem_array *sma, int locknum)
 {
        if (locknum == -1) {
                unmerge_queues(sma);
+               complexmode_tryleave(sma);
                ipc_unlock_object(&sma->sem_perm);
        } else {
                struct sem *sem = sma->sem_base + locknum;
@@ -529,6 +544,7 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
        }

        sma->complex_count = 0;
+       sma->complex_mode = true; /* dropped by sem_unlock below */
        INIT_LIST_HEAD(&sma->pending_alter);
        INIT_LIST_HEAD(&sma->pending_const);
        INIT_LIST_HEAD(&sma->list_id);
@@ -2184,10 +2200,10 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
        /*
         * The proc interface isn't aware of sem_lock(), it calls
         * ipc_lock_object() directly (in sysvipc_find_ipc).
-        * In order to stay compatible with sem_lock(), we must wait until
-        * all simple semop() calls have left their critical regions.
+        * In order to stay compatible with sem_lock(), we must
+        * enter / leave complex_mode.
         */
-       sem_wait_array(sma);
+       complexmode_enter(sma);

        sem_otime = get_semotime(sma);

@@ -2204,6 +2220,8 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
                   sem_otime,
                   sma->sem_ctime);

+       complexmode_tryleave(sma);
+
        return 0;
 }
 #endif
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
  2016-06-18 20:02 ` Manfred Spraul
@ 2016-06-20 23:04   ` Andrew Morton
  2016-06-23 18:55     ` Manfred Spraul
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2016-06-20 23:04 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
        Peter Zijlstra, LKML, linux-next, 1vier1, Davidlohr Bueso,
        felixh, stable

On Sat, 18 Jun 2016 22:02:21 +0200 Manfred Spraul <manfred@colorfullife.com> wrote:

> Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a race:
>
> sem_lock has a fast path that allows parallel simple operations.
> There are two reasons why a simple operation cannot run in parallel:
> - a non-simple operation is ongoing (sma->sem_perm.lock held)
> - a complex operation is sleeping (sma->complex_count != 0)
>
> As both facts are stored independently, a thread can bypass the current
> checks by sleeping in the right positions. See below for more details
> (or kernel bugzilla 105651).
>
> The patch fixes that by creating one variable (complex_mode)
> that tracks both reasons why parallel operations are not possible.
>
> The patch also updates stale documentation regarding the locking.
>
> With regards to stable kernels:
> The patch is required for all kernels that include the commit 6d07b68ce16a
> ("ipc/sem.c: optimize sem_lock()") (3.10?)

I've had this in -mm (and -next) since January 4, without issues. I
put it on hold because Davidlohr expressed concern about performance
regressions.

Your [2/2] should prevent those regressions (yes?), so I assume that any
kernel which has [1/2] really should have [2/2] as well. But without
any quantitative information, this is all mad guesswork.

What to do?

(The [2/2] changelog should explain that it is the cure to [1/2]'s
regressions, btw).
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
  2016-06-20 23:04 ` Andrew Morton
@ 2016-06-23 18:55   ` Manfred Spraul
  0 siblings, 0 replies; 12+ messages in thread
From: Manfred Spraul @ 2016-06-23 18:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
        Peter Zijlstra, LKML, linux-next, 1vier1, Davidlohr Bueso,
        felixh, stable

On 06/21/2016 01:04 AM, Andrew Morton wrote:
> On Sat, 18 Jun 2016 22:02:21 +0200 Manfred Spraul <manfred@colorfullife.com> wrote:
>
>> Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a race:
>>
>> sem_lock has a fast path that allows parallel simple operations.
>> There are two reasons why a simple operation cannot run in parallel:
>> - a non-simple operation is ongoing (sma->sem_perm.lock held)
>> - a complex operation is sleeping (sma->complex_count != 0)
>>
>> As both facts are stored independently, a thread can bypass the current
>> checks by sleeping in the right positions. See below for more details
>> (or kernel bugzilla 105651).
>>
>> The patch fixes that by creating one variable (complex_mode)
>> that tracks both reasons why parallel operations are not possible.
>>
>> The patch also updates stale documentation regarding the locking.
>>
>> With regards to stable kernels:
>> The patch is required for all kernels that include the commit 6d07b68ce16a
>> ("ipc/sem.c: optimize sem_lock()") (3.10?)
> I've had this in -mm (and -next) since January 4, without issues. I
> put it on hold because Davidlohr expressed concern about performance
> regressions.
I had several ideas how to fix it. The initial ideas probably had
performance issues. The current one doesn't have any issues.
It just took longer than expected to test it.

> Your [2/2] should prevent those regressions (yes?), so I assume that any
> kernel which has [1/2] really should have [2/2] as well. But without
> any quantitative information, this is all mad guesswork.
>
> What to do?
[2/2] is an improvement: it handles one case better than the current
code.

If you want: 3.10 improved scalability, but it introduced a performance
regression for one use case. [With 3.10, simple ops became parallel, but
complex ops had to perform a "for_each_sem() { spin_unlock_wait(); }"]
The patch fixes that.

--
    Manfred
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
  2016-06-18 20:02 ` Manfred Spraul
  2016-06-20 23:04   ` Andrew Morton
@ 2016-06-21  0:30   ` Davidlohr Bueso
  2016-06-23 19:22     ` Manfred Spraul
  1 sibling, 1 reply; 12+ messages in thread
From: Davidlohr Bueso @ 2016-06-21  0:30 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
        Peter Zijlstra, Andrew Morton, LKML, linux-next, 1vier1,
        felixh, stable

On Sat, 18 Jun 2016, Manfred Spraul wrote:

>diff --git a/include/linux/sem.h b/include/linux/sem.h
>index 976ce3a..d0efd6e 100644
>--- a/include/linux/sem.h
>+++ b/include/linux/sem.h
>@@ -21,6 +21,7 @@ struct sem_array {
>        struct list_head list_id;       /* undo requests on this array */
>        int             sem_nsems;      /* no. of semaphores in array */
>        int             complex_count;  /* pending complex operations */

I see this patch also takes care of complex_count needing READ/WRITE_ONCE,
as you change that to always be done with the big sem_perm lock.

>+       bool            complex_mode;   /* no parallel simple ops */

But I'm wondering about races with this one. Doesn't complex_mode need
acquire/release semantics?

> };
>

[...]

> /*
>- * Wait until all currently ongoing simple ops have completed.
>+ * Enter the mode suitable for non-simple operations:
>  * Caller must own sem_perm.lock.
>- * New simple ops cannot start, because simple ops first check
>- *     that sem_perm.lock is free.
>- *     that a) sem_perm.lock is free and b) complex_count is 0.
>  */
>-static void sem_wait_array(struct sem_array *sma)
>+static void complexmode_enter(struct sem_array *sma)
> {
>        int i;
>        struct sem *sem;
>
>-       if (sma->complex_count) {
>-               /* The thread that increased sma->complex_count waited on
>-                * all sem->lock locks. Thus we don't need to wait again.
>-                */
>+       if (sma->complex_mode) {
>+               /* We are already in complex_mode. Nothing to do */

This complex_mode load is serialized because both complexmode_enter() and
_tryleave(), which are the main calls that modify the variable, are
serialized by the sem_perm lock, right?

>                return;
>        }

Btw, I like that this logic is much simpler, just by reading the comments :)

>+       WRITE_ONCE(sma->complex_mode, true);
>+
>+       /* We need a full barrier:
>+        * The write to complex_mode must be visible
>+        * before we read the first sem->lock spinlock state.
>+        */
>+       smp_mb();
>
>        for (i = 0; i < sma->sem_nsems; i++) {
>                sem = sma->sem_base + i;
>@@ -285,6 +294,29 @@ static void sem_wait_array(struct sem_array *sma)
> }
>
> /*
>+ * Try to leave the mode that disallows simple operations:
>+ * Caller must own sem_perm.lock.
>+ */
>+static void complexmode_tryleave(struct sem_array *sma)
>+{
>+       if (sma->complex_count) {
>+               /* Complex ops are sleeping.
>+                * We must stay in complex mode
>+                */
>+               return;
>+       }
>+       /*
>+        * Immediately after setting complex_mode to false,
>+        * a simple op can start. Thus: all memory writes
>+        * performed by the current operation must be visible
>+        * before we set complex_mode to false.
>+        */
>+       smp_wmb();
>+
>+       WRITE_ONCE(sma->complex_mode, false);

smp_store_release()? See below.

>+}
>+
>+/*
> * If the request contains only one semaphore operation, and there are
> * no complex transactions pending, lock only the semaphore involved.
> * Otherwise, lock the entire semaphore array, since we either have
>@@ -300,56 +332,38 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
>                /* Complex operation - acquire a full lock */
>                ipc_lock_object(&sma->sem_perm);
>
>-               /* And wait until all simple ops that are processed
>-                * right now have dropped their locks.
>-                */
>-               sem_wait_array(sma);
>+               /* Prevent parallel simple ops */
>+               complexmode_enter(sma);
>                return -1;
>        }
>
>        /*
>         * Only one semaphore affected - try to optimize locking.
>-        * The rules are:
>-        * - optimized locking is possible if no complex operation
>-        *   is either enqueued or processed right now.
>-        * - The test for enqueued complex ops is simple:
>-        *      sma->complex_count != 0
>-        * - Testing for complex ops that are processed right now is
>-        *   a bit more difficult. Complex ops acquire the full lock
>-        *   and first wait that the running simple ops have completed.
>-        *   (see above)
>-        *   Thus: If we own a simple lock and the global lock is free
>-        *      and complex_count is now 0, then it will stay 0 and
>-        *      thus just locking sem->lock is sufficient.
>+        * Optimized locking is possible if no complex operation
>+        * is either enqueued or processed right now.
>+        *
>+        * Both facts are tracked by complex_mode.
>         */
>        sem = sma->sem_base + sops->sem_num;
>
>-       if (sma->complex_count == 0) {
>+       /*
>+        * Initial check for complex_mode. Just an optimization,
>+        * no locking.
>+        */
>+       if (!READ_ONCE(sma->complex_mode)) {

We have no lock (which is the whole point), so I think we need stronger
guarantees here to avoid racing with another thread that holds the
sem_perm lock and is entering complexmode for the first time, or vice
versa with tryleave(). An smp_load_acquire here would pair with the
suggested smp_store_release calls.

Thanks,
Davidlohr
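For reference, the guarantee that the suggested smp_store_release() /
smp_load_acquire() pairing gives, reduced to a generic two-CPU sketch
(flag, data and r are placeholder names, not the sem.c code):

        /* CPU 0 - e.g. the complexmode_tryleave() side */
        data = 42;                      /* writes done under the big lock */
        smp_store_release(&flag, false);

        /* CPU 1 - e.g. the fast-path side */
        if (!smp_load_acquire(&flag))   /* if this observes the release store */
                r = data;               /* ...then r is guaranteed to be 42 */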
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
  2016-06-21  0:30 ` Davidlohr Bueso
@ 2016-06-23 19:22   ` Manfred Spraul
  2016-06-28  5:27     ` Davidlohr Bueso
  0 siblings, 1 reply; 12+ messages in thread
From: Manfred Spraul @ 2016-06-23 19:22 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
        Peter Zijlstra, Andrew Morton, LKML, linux-next, 1vier1,
        felixh, stable

On 06/21/2016 02:30 AM, Davidlohr Bueso wrote:
> On Sat, 18 Jun 2016, Manfred Spraul wrote:
>
>> diff --git a/include/linux/sem.h b/include/linux/sem.h
>> index 976ce3a..d0efd6e 100644
>> --- a/include/linux/sem.h
>> +++ b/include/linux/sem.h
>> @@ -21,6 +21,7 @@ struct sem_array {
>>        struct list_head list_id;       /* undo requests on this array */
>>        int             sem_nsems;      /* no. of semaphores in array */
>>        int             complex_count;  /* pending complex operations */
>
> I see this patch also takes care of complex_count needing READ/WRITE_ONCE,
> as you change that to always be done with the big sem_perm lock.
>
>> +       bool            complex_mode;   /* no parallel simple ops */
>
> But I'm wondering about races with this one. Doesn't complex_mode need
> acquire/release semantics?
>
Yes. complex_mode needs acquire/release.

>> };
>>
>
> [...]
>
>> /*
>> - * Wait until all currently ongoing simple ops have completed.
>> + * Enter the mode suitable for non-simple operations:
>>  * Caller must own sem_perm.lock.
>> - * New simple ops cannot start, because simple ops first check
>> - *     that sem_perm.lock is free.
>> - *     that a) sem_perm.lock is free and b) complex_count is 0.
>>  */
>> -static void sem_wait_array(struct sem_array *sma)
>> +static void complexmode_enter(struct sem_array *sma)
>> {
>>        int i;
>>        struct sem *sem;
>>
>> -       if (sma->complex_count) {
>> -               /* The thread that increased sma->complex_count waited on
>> -                * all sem->lock locks. Thus we don't need to wait again.
>> -                */
>> +       if (sma->complex_mode) {
>> +               /* We are already in complex_mode. Nothing to do */
>
> This complex_mode load is serialized because both complexmode_enter() and
> _tryleave(), which are the main calls that modify the variable, are
> serialized by the sem_perm lock, right?
>
Exactly. Changes to complex_mode are protected by perm.lock.

>>                return;
>>        }
>
> Btw, I like that this logic is much simpler, just by reading the comments :)
>
>> +       WRITE_ONCE(sma->complex_mode, true);
>> +
>> +       /* We need a full barrier:
>> +        * The write to complex_mode must be visible
>> +        * before we read the first sem->lock spinlock state.
>> +        */
>> +       smp_mb();
>>
Theoretically: smp_store_acquire(). But this doesn't exist, so smp_mb().

>>        for (i = 0; i < sma->sem_nsems; i++) {
>>                sem = sma->sem_base + i;
>> @@ -285,6 +294,29 @@ static void sem_wait_array(struct sem_array *sma)
>> }
>>
>> /*
>> + * Try to leave the mode that disallows simple operations:
>> + * Caller must own sem_perm.lock.
>> + */
>> +static void complexmode_tryleave(struct sem_array *sma)
>> +{
>> +       if (sma->complex_count) {
>> +               /* Complex ops are sleeping.
>> +                * We must stay in complex mode
>> +                */
>> +               return;
>> +       }
>> +       /*
>> +        * Immediately after setting complex_mode to false,
>> +        * a simple op can start. Thus: all memory writes
>> +        * performed by the current operation must be visible
>> +        * before we set complex_mode to false.
>> +        */
>> +       smp_wmb();
>> +
>> +       WRITE_ONCE(sma->complex_mode, false);
>
> smp_store_release()? See below.
>
Yes.

>> +}
>> +
>> +/*
>> * If the request contains only one semaphore operation, and there are
>> * no complex transactions pending, lock only the semaphore involved.
>> * Otherwise, lock the entire semaphore array, since we either have
>> @@ -300,56 +332,38 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
>>                /* Complex operation - acquire a full lock */
>>                ipc_lock_object(&sma->sem_perm);
>>
>> -               /* And wait until all simple ops that are processed
>> -                * right now have dropped their locks.
>> -                */
>> -               sem_wait_array(sma);
>> +               /* Prevent parallel simple ops */
>> +               complexmode_enter(sma);
>>                return -1;
>>        }
>>
>>        /*
>>         * Only one semaphore affected - try to optimize locking.
>> -        * The rules are:
>> -        * - optimized locking is possible if no complex operation
>> -        *   is either enqueued or processed right now.
>> -        * - The test for enqueued complex ops is simple:
>> -        *      sma->complex_count != 0
>> -        * - Testing for complex ops that are processed right now is
>> -        *   a bit more difficult. Complex ops acquire the full lock
>> -        *   and first wait that the running simple ops have completed.
>> -        *   (see above)
>> -        *   Thus: If we own a simple lock and the global lock is free
>> -        *      and complex_count is now 0, then it will stay 0 and
>> -        *      thus just locking sem->lock is sufficient.
>> +        * Optimized locking is possible if no complex operation
>> +        * is either enqueued or processed right now.
>> +        *
>> +        * Both facts are tracked by complex_mode.
>>         */
>>        sem = sma->sem_base + sops->sem_num;
>>
>> -       if (sma->complex_count == 0) {
>> +       /*
>> +        * Initial check for complex_mode. Just an optimization,
>> +        * no locking.
>> +        */
>> +       if (!READ_ONCE(sma->complex_mode)) {
>
> We have no lock (which is the whole point), so I think we need stronger
> guarantees here to avoid racing with another thread that holds the
> sem_perm lock and is entering complexmode for the first time, or vice
> versa with tryleave(). An smp_load_acquire here would pair with the
> suggested smp_store_release calls.
>
Yes, you are right.
What I'm not sure yet is if smp_load_acquire() is sufficient:

Thread A:
>        if (!READ_ONCE(sma->complex_mode)) {
The code is test-and-test, no barrier requirements for the first test.

>                /*
>                 * It appears that no complex operation is around.
>                 * Acquire the per-semaphore lock.
>                 */
>                spin_lock(&sem->lock);
>
>                if (!smp_load_acquire(&sma->complex_mode)) {
>                        /* fast path successful! */
>                        return sops->sem_num;
>                }
>                spin_unlock(&sem->lock);
>        }

Thread B:
>        WRITE_ONCE(sma->complex_mode, true);
>
>        /* We need a full barrier:
>         * The write to complex_mode must be visible
>         * before we read the first sem->lock spinlock state.
>         */
>        smp_mb();
>
>        for (i = 0; i < sma->sem_nsems; i++) {
>                sem = sma->sem_base + i;
>                spin_unlock_wait(&sem->lock);
>        }

If thread A is allowed to issue read spinlock; read complex_mode;
write spinlock, then thread B would not notice that thread A is in
the critical section.

> Thanks,
> Davidlohr

I'll update the patch.

--
    Manfred
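The interleaving Manfred describes is the classic store-buffering
pattern; as a litmus-style sketch (schematic only: X models the
sem->lock lock word, Y models complex_mode):

        CPU A (simple op)               CPU B (complex op)
        -----------------               ------------------
        X = 1;   /* spin_lock */        Y = 1;   /* complex_mode = true */
        smp_mb();                       smp_mb();
        r1 = Y;  /* complex_mode? */    r2 = X;  /* spin_unlock_wait */

        Forbidden outcome: r1 == 0 && r2 == 0.

Without a full barrier on each side, both loads may observe the old
value and both CPUs would enter their critical sections at the same
time.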
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
  2016-06-23 19:22 ` Manfred Spraul
@ 2016-06-28  5:27   ` Davidlohr Bueso
  2016-06-30 19:28     ` Manfred Spraul
  0 siblings, 1 reply; 12+ messages in thread
From: Davidlohr Bueso @ 2016-06-28  5:27 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
        Peter Zijlstra, Andrew Morton, LKML, linux-next, 1vier1,
        felixh, stable

On Thu, 23 Jun 2016, Manfred Spraul wrote:

>What I'm not sure yet is if smp_load_acquire() is sufficient:
>
>Thread A:
>>       if (!READ_ONCE(sma->complex_mode)) {
>The code is test-and-test, no barrier requirements for the first test.

Yeah, it would just make us take the big lock unnecessarily, nothing
fatal, and I agree it's probably worth the optimization. It still might
be worth commenting.

>>               /*
>>                * It appears that no complex operation is around.
>>                * Acquire the per-semaphore lock.
>>                */
>>               spin_lock(&sem->lock);
>>
>>               if (!smp_load_acquire(&sma->complex_mode)) {
>>                       /* fast path successful! */
>>                       return sops->sem_num;
>>               }
>>               spin_unlock(&sem->lock);
>>       }
>
>Thread B:
>>       WRITE_ONCE(sma->complex_mode, true);
>>
>>       /* We need a full barrier:
>>        * The write to complex_mode must be visible
>>        * before we read the first sem->lock spinlock state.
>>        */
>>       smp_mb();
>>
>>       for (i = 0; i < sma->sem_nsems; i++) {
>>               sem = sma->sem_base + i;
>>               spin_unlock_wait(&sem->lock);
>>       }
>
>If thread A is allowed to issue read spinlock; read complex_mode;
>write spinlock, then thread B would not notice that thread A is in
>the critical section.

Are you referring to the sem->lock word not being visibly locked before
we read complex_mode (for the second time)? This issue was fixed in
2c610022711 (locking/qspinlock: Fix spin_unlock_wait() some more). So
smp_load_acquire should be enough afaict, or are you referring to
something else?

Thanks,
Davidlohr
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
  2016-06-28  5:27 ` Davidlohr Bueso
@ 2016-06-30 19:28   ` Manfred Spraul
  2016-07-01 16:52     ` Davidlohr Bueso
  0 siblings, 1 reply; 12+ messages in thread
From: Manfred Spraul @ 2016-06-30 19:28 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
        Peter Zijlstra, Andrew Morton, LKML, linux-next, 1vier1,
        felixh, stable

On 06/28/2016 07:27 AM, Davidlohr Bueso wrote:
> On Thu, 23 Jun 2016, Manfred Spraul wrote:
>
>> What I'm not sure yet is if smp_load_acquire() is sufficient:
>>
>> Thread A:
>>>       if (!READ_ONCE(sma->complex_mode)) {
>> The code is test-and-test, no barrier requirements for the first test.
>
> Yeah, it would just make us take the big lock unnecessarily, nothing
> fatal, and I agree it's probably worth the optimization. It still might
> be worth commenting.
>
I'll extend the comment: "no locking and no memory barrier"

>>>               /*
>>>                * It appears that no complex operation is around.
>>>                * Acquire the per-semaphore lock.
>>>                */
>>>               spin_lock(&sem->lock);
>>>
>>>               if (!smp_load_acquire(&sma->complex_mode)) {
>>>                       /* fast path successful! */
>>>                       return sops->sem_num;
>>>               }
>>>               spin_unlock(&sem->lock);
>>>       }
>>
>> Thread B:
>>>       WRITE_ONCE(sma->complex_mode, true);
>>>
>>>       /* We need a full barrier:
>>>        * The write to complex_mode must be visible
>>>        * before we read the first sem->lock spinlock state.
>>>        */
>>>       smp_mb();
>>>
>>>       for (i = 0; i < sma->sem_nsems; i++) {
>>>               sem = sma->sem_base + i;
>>>               spin_unlock_wait(&sem->lock);
>>>       }
>>
>> If thread A is allowed to issue read spinlock; read complex_mode;
>> write spinlock, then thread B would not notice that thread A is in
>> the critical section.
>
> Are you referring to the sem->lock word not being visibly locked before
> we read complex_mode (for the second time)? This issue was fixed in
> 2c610022711 (locking/qspinlock: Fix spin_unlock_wait() some more). So
> smp_load_acquire should be enough afaict, or are you referring to
> something else?
>
You are right, I didn't read this patch fully.
If I understand it right, it means that spin_lock() is both an acquire
and a release - for qspinlocks.

Is this valid for all spinlock implementations, for all architectures?
Otherwise: How can I detect in generic code whether I can rely on a
release inside spin_lock()?

--
    Manfred
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
  2016-06-30 19:28 ` Manfred Spraul
@ 2016-07-01 16:52   ` Davidlohr Bueso
  0 siblings, 0 replies; 12+ messages in thread
From: Davidlohr Bueso @ 2016-07-01 16:52 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
        Peter Zijlstra, Andrew Morton, LKML, linux-next, 1vier1,
        felixh, stable

On Thu, 30 Jun 2016, Manfred Spraul wrote:

>On 06/28/2016 07:27 AM, Davidlohr Bueso wrote:

>If I understand it right, it means that spin_lock() is both an acquire
>and a release - for qspinlocks.

I wouldn't say that: the _Q_LOCKED_VAL stores are still unordered inside
the acquire region, but no further than the unlock store-release; which
is fine for regular mutual exclusion.

>Is this valid for all spinlock implementations, for all architectures?
>Otherwise: How can I detect in generic code whether I can rely on a
>release inside spin_lock()?

With ticket locks this was never an issue, as special lock readers
(spin_unlock_wait(), spin_is_locked()) will always see the lock taken:
they observe waiters by adding themselves to the tail -- something the
above commit mimics and piggybacks on to save an expensive smp_rmb().
In addition, see 726328d92a4 (locking/spinlock, arch: Update and fix
spin_unlock_wait() implementations).

Thanks,
Davidlohr