From: Tejun Heo <tj@kernel.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
laijs@cn.fujitsu.com, dipankar@in.ibm.com,
akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
peterz@infradead.org, rostedt@goodmis.org,
Valdis.Kletnieks@vt.edu, dhowells@redhat.com,
eric.dumazet@gmail.com, darren@dvhart.com
Subject: Re: [PATCH RFC tip/core/rcu 11/12] rcu: fix race condition in synchronize_sched_expedited()
Date: Tue, 09 Nov 2010 14:26:37 +0100
Message-ID: <4CD94C0D.3030007@kernel.org>
In-Reply-To: <1289095532-5398-11-git-send-email-paulmck@linux.vnet.ibm.com>

Hello, Paul.

On 11/07/2010 03:05 AM, Paul E. McKenney wrote:
> The new (early 2010) implementation of synchronize_sched_expedited() uses
> try_stop_cpus() to force a context switch on every CPU. It also permits
> concurrent calls to synchronize_sched_expedited() to share a single call
> to try_stop_cpus() through use of an atomically incremented
> synchronize_sched_expedited_count variable. Unfortunately, this is
> subject to failure as follows:
>
> o  Task A invokes synchronize_sched_expedited(), try_stop_cpus()
>    succeeds, but Task A is preempted before getting to the atomic
>    increment of synchronize_sched_expedited_count.
>
> o  Task B also invokes synchronize_sched_expedited(), with exactly
>    the same outcome as Task A.
>
> o  Task C also invokes synchronize_sched_expedited(), again with
>    exactly the same outcome as Tasks A and B.
>
> o  Task D also invokes synchronize_sched_expedited(), but only
>    gets as far as acquiring the mutex within try_stop_cpus()
>    before being preempted, interrupted, or otherwise delayed.
>
> o  Task E also invokes synchronize_sched_expedited(), but only
>    gets to the snapshotting of synchronize_sched_expedited_count.
>
> o  Tasks A, B, and C all increment synchronize_sched_expedited_count.
>
> o  Task E fails to get the mutex, so checks the new value
>    of synchronize_sched_expedited_count. It finds that the
>    value has increased, so (wrongly) assumes that its work
>    has been done, returning despite there having been no
>    expedited grace period since it began.
>
> The solution is to have the lowest-numbered CPU atomically increment
> the synchronize_sched_expedited_count variable within the
> synchronize_sched_expedited_cpu_stop() function, which is under
> the protection of the mutex acquired by try_stop_cpus(). However, this
> also requires that piggybacking tasks wait for three rather than two
> instances of try_stop_cpus(), because we cannot control the order in
> which the per-CPU callback functions occur.
How about something like the following? It's slightly bigger but I
think it's a bit easier to understand. Thanks.
diff --git a/kernel/sched.c b/kernel/sched.c
index aa14a56..0069be5 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -9342,7 +9342,8 @@ EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
 
 #else /* #ifndef CONFIG_SMP */
 
-static atomic_t synchronize_sched_expedited_count = ATOMIC_INIT(0);
+static atomic_t sync_sched_expedited_token = ATOMIC_INIT(0);
+static atomic_t sync_sched_expedited_done = ATOMIC_INIT(0);
 
 static int synchronize_sched_expedited_cpu_stop(void *data)
 {
@@ -9373,11 +9374,18 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
  */
 void synchronize_sched_expedited(void)
 {
-	int snap, trycount = 0;
+	int my_tok, tok, t, trycount = 0;
+
+	smp_mb(); /* ensure prior mod happens before getting token. */
+
+	/*
+	 * Get a token. This is used to coordinate with other
+	 * concurrent syncers and consolidate multiple syncs.
+	 */
+	my_tok = tok = atomic_inc_return(&sync_sched_expedited_token);
 
-	smp_mb(); /* ensure prior mod happens before capturing snap. */
-	snap = atomic_read(&synchronize_sched_expedited_count) + 1;
 	get_online_cpus();
+
 	while (try_stop_cpus(cpu_online_mask,
 			     synchronize_sched_expedited_cpu_stop,
 			     NULL) == -EAGAIN) {
@@ -9388,13 +9396,34 @@ void synchronize_sched_expedited(void)
 			synchronize_sched();
 			return;
 		}
-		if (atomic_read(&synchronize_sched_expedited_count) - snap > 0) {
+
+		/*
+		 * If the done count reached @my_tok, we know at least
+		 * one synchronization happened since we entered this
+		 * function.
+		 */
+		if (atomic_read(&sync_sched_expedited_done) - my_tok >= 0) {
 			smp_mb(); /* ensure test happens before caller kfree */
 			return;
 		}
+
 		get_online_cpus();
+
+		/* about to retry, get the latest token value */
+		tok = atomic_read(&sync_sched_expedited_token);
 	}
-	atomic_inc(&synchronize_sched_expedited_count);
+
+	/*
+	 * We now know that everything up to @tok is synchronized.
+	 * Update done counter which should always monotonically
+	 * increase (with wrapping considered).
+	 */
+	do {
+		t = atomic_read(&sync_sched_expedited_done);
+		if (t - tok >= 0)
+			break;
+	} while (atomic_cmpxchg(&sync_sched_expedited_done, t, tok) != t);
+
 	smp_mb__after_atomic_inc(); /* ensure post-GP actions seen after GP. */
 	put_online_cpus();
 }