From: Manfred Spraul <manfred@colorfullife.com>
To: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
1vier1@web.de, Ingo Molnar <mingo@redhat.com>,
felixh@informatik.uni-bremen.de, stable@vger.kernel.org
Subject: Re: [PATCH] ipc/sem.c: Fix complex_count vs. simple op race
Date: Mon, 4 Jan 2016 19:32:50 +0100 [thread overview]
Message-ID: <568ABAD2.4060707@colorfullife.com> (raw)
In-Reply-To: <20160104130231.GA3013@linux-uzut.site>
[-- Attachment #1: Type: text/plain, Size: 1905 bytes --]
On 01/04/2016 02:02 PM, Davidlohr Bueso wrote:
> On Sat, 02 Jan 2016, Manfred Spraul wrote:
>
>> Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
>> race:
>>
>> sem_lock has a fast path that allows parallel simple operations.
>> There are two reasons why a simple operation cannot run in parallel:
>> - a non-simple operations is ongoing (sma->sem_perm.lock held)
>> - a complex operation is sleeping (sma->complex_count != 0)
>>
>> As both facts are stored independently, a thread can bypass the current
>> checks by sleeping in the right positions. See below for more details
>> (or kernel bugzilla 105651).
>>
>> The patch fixes that by creating one variable (complex_mode)
>> that tracks both reasons why parallel operations are not possible.
>>
>> The patch also updates stale documentation regarding the locking.
>>
>> With regards to stable kernels:
>> The patch is required for all kernels that include the commit
>> 6d07b68ce16a
>> ("ipc/sem.c: optimize sem_lock()") (3.10?)
>>
>> The alternative is to revert the patch that introduced the race.
>
> I am just now catching up with this, but a quick thought is that we
> probably
> want to keep 6d07b68ce16a as waiting on unlocking all sem->lock should be
> much worse for performance than keeping track of the complex 'mode'.
Keeping track is simple - and the fast path gets simpler compared to
ncurrent code.
> Specially
> if we have a large array.
Yes, I would prefer my patch as a backport.
But if someone has objections, then a revert would be an alternative.
>
> Also, any idea what workload exposed this race? Anyway, will take a
> closer look
> at the patch/issue.
It was found by a theoretical review:
The sem_lock code was used as an exercise at University Bremen.
--
Manfred
P.S.: If we replace the "bool" with an "int", we could even introduce a
hysteresis, to further reduce the amount of array scans.
[-- Attachment #2: 0001-ipc-sem-sem_lock-with-hysteresis.patch --]
[-- Type: text/x-patch, Size: 4195 bytes --]
>From d3ea4cddaff440c03b89b045ee5a0be8c4db623d Mon Sep 17 00:00:00 2001
From: Manfred Spraul <manfred@colorfullife.com>
Date: Mon, 4 Jan 2016 19:20:34 +0100
Subject: [PATCH] ipc/sem: sem_lock with hysteresis
Untested
May not work
May not pass checkpatch.pl
May not be an improvement at all, not even for microbenchmarks
May be a worthless pseudo improvement without any real-world
advantages (except more complexity and more risks for bugs)
---
include/linux/sem.h | 2 +-
ipc/sem.c | 60 ++++++++++++++++++++++++++++++++++++++---------------
2 files changed, 44 insertions(+), 18 deletions(-)
diff --git a/include/linux/sem.h b/include/linux/sem.h
index d0efd6e..6fb3227 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -21,7 +21,7 @@ struct sem_array {
struct list_head list_id; /* undo requests on this array */
int sem_nsems; /* no. of semaphores in array */
int complex_count; /* pending complex operations */
- bool complex_mode; /* no parallel simple ops */
+ int complex_mode; /* >0: no parallel simple ops */
};
#ifdef CONFIG_SYSVIPC
diff --git a/ipc/sem.c b/ipc/sem.c
index 87e1f5d..f580c7c 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -154,6 +154,13 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
#define SEMOPM_FAST 64 /* ~ 372 bytes on stack */
/*
+ * Switching from the mode suitable for complex ops
+ * to the mode for simple ops is costly. Therefore:
+ * use some hysteresis
+ */
+#define COMPLEX_MODE_ENTER 10
+
+/*
* Locking:
* a) global sem_lock() for read/write
* sem_undo.id_next,
@@ -278,11 +285,16 @@ static void complexmode_enter(struct sem_array *sma)
int i;
struct sem *sem;
- if (sma->complex_mode) {
- /* We are already in complex_mode. Nothing to do */
+ if (sma->complex_mode > 0) {
+ /*
+ * We are already in complex_mode.
+ * Nothing to do, just increase
+ * counter until we return to simple mode
+ */
+ WRITE_ONCE(sma->complex_mode, COMPLEX_MODE_ENTER);
return;
}
- WRITE_ONCE(sma->complex_mode, true);
+ WRITE_ONCE(sma->complex_mode, COMPLEX_MODE_ENTER);
/* We need a full barrier:
* The write to complex_mode must be visible
@@ -309,15 +321,18 @@ static void complexmode_tryleave(struct sem_array *sma)
*/
return;
}
+ if (sma->complex_mode == 0) {
+pr_info("error.");
+ }
/*
- * Immediately after setting complex_mode to false,
+ * Immediately after setting complex_mode to 0,
* a simple op can start. Thus: all memory writes
* performed by the current operation must be visible
* before we set complex_mode to false.
*/
smp_wmb();
- WRITE_ONCE(sma->complex_mode, false);
+ WRITE_ONCE(sma->complex_mode, sma->complex_mode-1);
}
/*
@@ -377,19 +392,30 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
if (sma->complex_count == 0) {
/* False alarm:
- * There is no complex operation, thus we can switch
- * back to the fast path.
- */
- spin_lock(&sem->lock);
- ipc_unlock_object(&sma->sem_perm);
- return sops->sem_num;
- } else {
- /* Not a false alarm, thus complete the sequence for a
- * full lock.
+ * There is no complex operation, check hysteresis
+ * If 0, switch back to the fast path.
*/
- complexmode_enter(sma);
- return -1;
+ if (sma->complex_mode > 0) {
+ /*
+ * barrier provided by spin_lock
+ * sma->sem_perm.lock
+ */
+
+ WRITE_ONCE(sma->complex_mode, sma->complex_mode-1);
+ }
+ if (sma->complex_mode == 0) {
+ spin_lock(&sem->lock);
+ ipc_unlock_object(&sma->sem_perm);
+ return sops->sem_num;
+ }
}
+ /*
+ * Not a false alarm, but we must be already in
+ * complex_mode: either because of waiting complex ops,
+ * or due to hysteresis.
+ */
+ WARN_ON(sma->complex_mode == 0);
+ return -1;
}
static inline void sem_unlock(struct sem_array *sma, int locknum)
@@ -548,7 +574,7 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
}
sma->complex_count = 0;
- sma->complex_mode = true; /* dropped by sem_unlock below */
+ WRITE_ONCE(sma->complex_mode, COMPLEX_MODE_ENTER);
INIT_LIST_HEAD(&sma->pending_alter);
INIT_LIST_HEAD(&sma->pending_const);
INIT_LIST_HEAD(&sma->list_id);
--
2.4.3
next prev parent reply other threads:[~2016-01-04 18:33 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-02 12:04 [PATCH] ipc/sem.c: Fix complex_count vs. simple op race Manfred Spraul
2016-01-04 13:02 ` Davidlohr Bueso
2016-01-04 18:32 ` Manfred Spraul [this message]
-- strict thread matches above, loose matches on Subject: below --
2016-07-21 17:54 Manfred Spraul
2016-07-21 20:16 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=568ABAD2.4060707@colorfullife.com \
--to=manfred@colorfullife.com \
--cc=1vier1@web.de \
--cc=akpm@linux-foundation.org \
--cc=dave@stgolabs.net \
--cc=felixh@informatik.uni-bremen.de \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.