From: Manfred Spraul <manfred@colorfullife.com>
To: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
1vier1@web.de, Ingo Molnar <mingo@redhat.com>,
felixh@informatik.uni-bremen.de, stable@vger.kernel.org
Subject: Re: [PATCH] ipc/sem.c: Fix complex_count vs. simple op race
Date: Mon, 4 Jan 2016 19:32:50 +0100 [thread overview]
Message-ID: <568ABAD2.4060707@colorfullife.com> (raw)
In-Reply-To: <20160104130231.GA3013@linux-uzut.site>
[-- Attachment #1: Type: text/plain, Size: 1905 bytes --]
On 01/04/2016 02:02 PM, Davidlohr Bueso wrote:
> On Sat, 02 Jan 2016, Manfred Spraul wrote:
>
>> Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
>> race:
>>
>> sem_lock has a fast path that allows parallel simple operations.
>> There are two reasons why a simple operation cannot run in parallel:
>> - a non-simple operations is ongoing (sma->sem_perm.lock held)
>> - a complex operation is sleeping (sma->complex_count != 0)
>>
>> As both facts are stored independently, a thread can bypass the current
>> checks by sleeping in the right positions. See below for more details
>> (or kernel bugzilla 105651).
>>
>> The patch fixes that by creating one variable (complex_mode)
>> that tracks both reasons why parallel operations are not possible.
>>
>> The patch also updates stale documentation regarding the locking.
>>
>> With regards to stable kernels:
>> The patch is required for all kernels that include the commit
>> 6d07b68ce16a
>> ("ipc/sem.c: optimize sem_lock()") (3.10?)
>>
>> The alternative is to revert the patch that introduced the race.
>
> I am just now catching up with this, but a quick thought is that we
> probably
> want to keep 6d07b68ce16a as waiting on unlocking all sem->lock should be
> much worse for performance than keeping track of the complex 'mode'.
Keeping track is simple - and the fast path gets simpler compared to
ncurrent code.
> Specially
> if we have a large array.
Yes, I would prefer my patch as a backport.
But if someone has objections, then a revert would be an alternative.
>
> Also, any idea what workload exposed this race? Anyway, will take a
> closer look
> at the patch/issue.
It was found by a theoretical review:
The sem_lock code was used as an exercise at University Bremen.
--
Manfred
P.S.: If we replace the "bool" with an "int", we could even introduce a
hysteresis, to further reduce the amount of array scans.
[-- Attachment #2: 0001-ipc-sem-sem_lock-with-hysteresis.patch --]
[-- Type: text/x-patch, Size: 4195 bytes --]
>From d3ea4cddaff440c03b89b045ee5a0be8c4db623d Mon Sep 17 00:00:00 2001
From: Manfred Spraul <manfred@colorfullife.com>
Date: Mon, 4 Jan 2016 19:20:34 +0100
Subject: [PATCH] ipc/sem: sem_lock with hysteresis
Untested
May not work
May not pass checkpatch.pl
May not be an improvement at all, not even for microbenchmarks
May be a worthless pseudo improvement without any real-world
advantages (except more complexity and more risks for bugs)
---
include/linux/sem.h | 2 +-
ipc/sem.c | 60 ++++++++++++++++++++++++++++++++++++++---------------
2 files changed, 44 insertions(+), 18 deletions(-)
diff --git a/include/linux/sem.h b/include/linux/sem.h
index d0efd6e..6fb3227 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -21,7 +21,7 @@ struct sem_array {
struct list_head list_id; /* undo requests on this array */
int sem_nsems; /* no. of semaphores in array */
int complex_count; /* pending complex operations */
- bool complex_mode; /* no parallel simple ops */
+ int complex_mode; /* >0: no parallel simple ops */
};
#ifdef CONFIG_SYSVIPC
diff --git a/ipc/sem.c b/ipc/sem.c
index 87e1f5d..f580c7c 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -154,6 +154,13 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
#define SEMOPM_FAST 64 /* ~ 372 bytes on stack */
/*
+ * Switching from the mode suitable for complex ops
+ * to the mode for simple ops is costly. Therefore:
+ * use some hysteresis
+ */
+#define COMPLEX_MODE_ENTER 10
+
+/*
* Locking:
* a) global sem_lock() for read/write
* sem_undo.id_next,
@@ -278,11 +285,16 @@ static void complexmode_enter(struct sem_array *sma)
int i;
struct sem *sem;
- if (sma->complex_mode) {
- /* We are already in complex_mode. Nothing to do */
+ if (sma->complex_mode > 0) {
+ /*
+ * We are already in complex_mode.
+ * Nothing to do, just increase
+ * counter until we return to simple mode
+ */
+ WRITE_ONCE(sma->complex_mode, COMPLEX_MODE_ENTER);
return;
}
- WRITE_ONCE(sma->complex_mode, true);
+ WRITE_ONCE(sma->complex_mode, COMPLEX_MODE_ENTER);
/* We need a full barrier:
* The write to complex_mode must be visible
@@ -309,15 +321,18 @@ static void complexmode_tryleave(struct sem_array *sma)
*/
return;
}
+ if (sma->complex_mode == 0) {
+pr_info("error.");
+ }
/*
- * Immediately after setting complex_mode to false,
+ * Immediately after setting complex_mode to 0,
* a simple op can start. Thus: all memory writes
* performed by the current operation must be visible
* before we set complex_mode to false.
*/
smp_wmb();
- WRITE_ONCE(sma->complex_mode, false);
+ WRITE_ONCE(sma->complex_mode, sma->complex_mode-1);
}
/*
@@ -377,19 +392,30 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
if (sma->complex_count == 0) {
/* False alarm:
- * There is no complex operation, thus we can switch
- * back to the fast path.
- */
- spin_lock(&sem->lock);
- ipc_unlock_object(&sma->sem_perm);
- return sops->sem_num;
- } else {
- /* Not a false alarm, thus complete the sequence for a
- * full lock.
+ * There is no complex operation, check hysteresis
+ * If 0, switch back to the fast path.
*/
- complexmode_enter(sma);
- return -1;
+ if (sma->complex_mode > 0) {
+ /*
+ * barrier provided by spin_lock
+ * sma->sem_perm.lock
+ */
+
+ WRITE_ONCE(sma->complex_mode, sma->complex_mode-1);
+ }
+ if (sma->complex_mode == 0) {
+ spin_lock(&sem->lock);
+ ipc_unlock_object(&sma->sem_perm);
+ return sops->sem_num;
+ }
}
+ /*
+ * Not a false alarm, but we must be already in
+ * complex_mode: either because of waiting complex ops,
+ * or due to hysteresis.
+ */
+ WARN_ON(sma->complex_mode == 0);
+ return -1;
}
static inline void sem_unlock(struct sem_array *sma, int locknum)
@@ -548,7 +574,7 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
}
sma->complex_count = 0;
- sma->complex_mode = true; /* dropped by sem_unlock below */
+ WRITE_ONCE(sma->complex_mode, COMPLEX_MODE_ENTER);
INIT_LIST_HEAD(&sma->pending_alter);
INIT_LIST_HEAD(&sma->pending_const);
INIT_LIST_HEAD(&sma->list_id);
--
2.4.3
next prev parent reply other threads:[~2016-01-04 18:32 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-02 12:04 [PATCH] ipc/sem.c: Fix complex_count vs. simple op race Manfred Spraul
2016-01-04 13:02 ` Davidlohr Bueso
2016-01-04 18:32 ` Manfred Spraul [this message]
-- strict thread matches above, loose matches on Subject: below --
2016-07-21 17:54 Manfred Spraul
2016-07-21 20:16 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=568ABAD2.4060707@colorfullife.com \
--to=manfred@colorfullife.com \
--cc=1vier1@web.de \
--cc=akpm@linux-foundation.org \
--cc=dave@stgolabs.net \
--cc=felixh@informatik.uni-bremen.de \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).