From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wm0-f44.google.com ([74.125.82.44]:36926 "EHLO
	mail-wm0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753144AbcADScy (ORCPT );
	Mon, 4 Jan 2016 13:32:54 -0500
Received: by mail-wm0-f44.google.com with SMTP id f206so226549769wmf.0
	for ; Mon, 04 Jan 2016 10:32:52 -0800 (PST)
Subject: Re: [PATCH] ipc/sem.c: Fix complex_count vs. simple op race
To: Davidlohr Bueso
References: <1451736291-8115-1-git-send-email-manfred@colorfullife.com>
	<20160104130231.GA3013@linux-uzut.site>
Cc: Andrew Morton , LKML , 1vier1@web.de, Ingo Molnar ,
	felixh@informatik.uni-bremen.de, stable@vger.kernel.org
From: Manfred Spraul
Message-ID: <568ABAD2.4060707@colorfullife.com>
Date: Mon, 4 Jan 2016 19:32:50 +0100
MIME-Version: 1.0
In-Reply-To: <20160104130231.GA3013@linux-uzut.site>
Content-Type: multipart/mixed; boundary="------------080502020409050003040901"
Sender: stable-owner@vger.kernel.org
List-ID:

This is a multi-part message in MIME format.
--------------080502020409050003040901
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit

On 01/04/2016 02:02 PM, Davidlohr Bueso wrote:
> On Sat, 02 Jan 2016, Manfred Spraul wrote:
>
>> Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
>> race:
>>
>> sem_lock has a fast path that allows parallel simple operations.
>> There are two reasons why a simple operation cannot run in parallel:
>> - a non-simple operation is ongoing (sma->sem_perm.lock held)
>> - a complex operation is sleeping (sma->complex_count != 0)
>>
>> As both facts are stored independently, a thread can bypass the current
>> checks by sleeping in the right positions. See below for more details
>> (or kernel bugzilla 105651).
>>
>> The patch fixes that by creating one variable (complex_mode)
>> that tracks both reasons why parallel operations are not possible.
>>
>> The patch also updates stale documentation regarding the locking.
>>
>> With regards to stable kernels:
>> The patch is required for all kernels that include the commit
>> 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)
>>
>> The alternative is to revert the patch that introduced the race.
>
> I am just now catching up with this, but a quick thought is that we
> probably want to keep 6d07b68ce16a as waiting on unlocking all
> sem->lock should be much worse for performance than keeping track of
> the complex 'mode'.

Keeping track is simple - and the fast path gets simpler compared to
the current code.

> Specially if we have a large array.

Yes, I would prefer my patch as a backport.
But if someone has objections, then a revert would be an alternative.

> Also, any idea what workload exposed this race? Anyway, will take a
> closer look at the patch/issue.

It was found by a theoretical review: the sem_lock code was used as an
exercise at the University of Bremen.

--
    Manfred

P.S.: If we replace the "bool" with an "int", we could even introduce a
hysteresis, to further reduce the number of array scans.
--------------080502020409050003040901
Content-Type: text/x-patch; name="0001-ipc-sem-sem_lock-with-hysteresis.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="0001-ipc-sem-sem_lock-with-hysteresis.patch"

>From d3ea4cddaff440c03b89b045ee5a0be8c4db623d Mon Sep 17 00:00:00 2001
From: Manfred Spraul
Date: Mon, 4 Jan 2016 19:20:34 +0100
Subject: [PATCH] ipc/sem: sem_lock with hysteresis

Untested.
May not work.
May not pass checkpatch.pl.
May not be an improvement at all, not even for microbenchmarks.
May be a worthless pseudo improvement without any real-world advantages
(except more complexity and more risks for bugs).
---
 include/linux/sem.h |  2 +-
 ipc/sem.c           | 60 ++++++++++++++++++++++++++++++++++++++---------------
 2 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/include/linux/sem.h b/include/linux/sem.h
index d0efd6e..6fb3227 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -21,7 +21,7 @@ struct sem_array {
 	struct list_head	list_id;	/* undo requests on this array */
 	int			sem_nsems;	/* no. of semaphores in array */
 	int			complex_count;	/* pending complex operations */
-	bool			complex_mode;	/* no parallel simple ops */
+	int			complex_mode;	/* >0: no parallel simple ops */
 };

 #ifdef CONFIG_SYSVIPC
diff --git a/ipc/sem.c b/ipc/sem.c
index 87e1f5d..f580c7c 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -154,6 +154,13 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
 #define SEMOPM_FAST	64  /* ~ 372 bytes on stack */

 /*
+ * Switching from the mode suitable for complex ops
+ * to the mode for simple ops is costly. Therefore:
+ * use some hysteresis
+ */
+#define COMPLEX_MODE_ENTER	10
+
+/*
  * Locking:
  * a) global sem_lock() for read/write
  *    sem_undo.id_next,
@@ -278,11 +285,16 @@ static void complexmode_enter(struct sem_array *sma)
 	int i;
 	struct sem *sem;

-	if (sma->complex_mode) {
-		/* We are already in complex_mode. Nothing to do */
+	if (sma->complex_mode > 0) {
+		/*
+		 * We are already in complex_mode.
+		 * Nothing to do, just re-arm the
+		 * counter that delays the return to simple mode
+		 */
+		WRITE_ONCE(sma->complex_mode, COMPLEX_MODE_ENTER);
 		return;
 	}
-	WRITE_ONCE(sma->complex_mode, true);
+	WRITE_ONCE(sma->complex_mode, COMPLEX_MODE_ENTER);

 	/* We need a full barrier:
	 * The write to complex_mode must be visible
@@ -309,15 +321,18 @@ static void complexmode_tryleave(struct sem_array *sma)
 		 */
 		return;
 	}
+	if (sma->complex_mode == 0) {
+		pr_info("error.");
+	}

 	/*
-	 * Immediately after setting complex_mode to false,
+	 * Immediately after setting complex_mode to 0,
 	 * a simple op can start. Thus: all memory writes
 	 * performed by the current operation must be visible
 	 * before we set complex_mode to false.
 	 */
 	smp_wmb();
-	WRITE_ONCE(sma->complex_mode, false);
+	WRITE_ONCE(sma->complex_mode, sma->complex_mode-1);
 }

 /*
@@ -377,19 +392,30 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,

 	if (sma->complex_count == 0) {
 		/* False alarm:
-		 * There is no complex operation, thus we can switch
-		 * back to the fast path.
-		 */
-		spin_lock(&sem->lock);
-		ipc_unlock_object(&sma->sem_perm);
-		return sops->sem_num;
-	} else {
-		/* Not a false alarm, thus complete the sequence for a
-		 * full lock.
+		 * There is no complex operation; check the hysteresis.
+		 * If it is 0, switch back to the fast path.
 		 */
+		if (sma->complex_mode > 0) {
+			/*
+			 * barrier provided by spin_lock
+			 * sma->sem_perm.lock
+			 */
+
+			WRITE_ONCE(sma->complex_mode, sma->complex_mode-1);
+		}
+		if (sma->complex_mode == 0) {
+			spin_lock(&sem->lock);
+			ipc_unlock_object(&sma->sem_perm);
+			return sops->sem_num;
+		}
 	}
+	/*
+	 * Not a false alarm, but we must already be in
+	 * complex_mode: either because of waiting complex ops,
+	 * or due to hysteresis.
+	 */
+	WARN_ON(sma->complex_mode == 0);
+	return -1;
 }

 static inline void sem_unlock(struct sem_array *sma, int locknum)
@@ -548,7 +574,7 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
 	}

 	sma->complex_count = 0;
-	sma->complex_mode = true; /* dropped by sem_unlock below */
+	WRITE_ONCE(sma->complex_mode, COMPLEX_MODE_ENTER);
 	INIT_LIST_HEAD(&sma->pending_alter);
 	INIT_LIST_HEAD(&sma->pending_const);
 	INIT_LIST_HEAD(&sma->list_id);
-- 
2.4.3

--------------080502020409050003040901--