From: Tejun Heo
To: "Paul E. McKenney"
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu, laijs@cn.fujitsu.com,
	dipankar@in.ibm.com, akpm@linux-foundation.org,
	mathieu.desnoyers@polymtl.ca, josh@joshtriplett.org, niv@us.ibm.com,
	tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org,
	Valdis.Kletnieks@vt.edu, dhowells@redhat.com, eric.dumazet@gmail.com,
	darren@dvhart.com
Subject: Re: [PATCH RFC tip/core/rcu 11/12] rcu: fix race condition in
	synchronize_sched_expedited()
Date: Tue, 09 Nov 2010 14:26:37 +0100
Message-ID: <4CD94C0D.3030007@kernel.org>
In-Reply-To: <1289095532-5398-11-git-send-email-paulmck@linux.vnet.ibm.com>
References: <20101107020507.GA4974@linux.vnet.ibm.com>
	<1289095532-5398-11-git-send-email-paulmck@linux.vnet.ibm.com>

Hello, Paul.

On 11/07/2010 03:05 AM, Paul E. McKenney wrote:
> The new (early 2010) implementation of synchronize_sched_expedited() uses
> try_stop_cpus() to force a context switch on every CPU.  It also permits
> concurrent calls to synchronize_sched_expedited() to share a single call
> to try_stop_cpus() through use of an atomically incremented
> synchronize_sched_expedited_count variable.  Unfortunately, this is
> subject to failure as follows:
>
> o	Task A invokes synchronize_sched_expedited(), try_stop_cpus()
>	succeeds, but Task A is preempted before getting to the atomic
>	increment of synchronize_sched_expedited_count.
>
> o	Task B also invokes synchronize_sched_expedited(), with exactly
>	the same outcome as Task A.
>
> o	Task C also invokes synchronize_sched_expedited(), again with
>	exactly the same outcome as Tasks A and B.
>
> o	Task D also invokes synchronize_sched_expedited(), but only
>	gets as far as acquiring the mutex within try_stop_cpus()
>	before being preempted, interrupted, or otherwise delayed.
>
> o	Task E also invokes synchronize_sched_expedited(), but only
>	gets to the snapshotting of synchronize_sched_expedited_count.
>
> o	Tasks A, B, and C all increment synchronize_sched_expedited_count.
>
> o	Task E fails to get the mutex, so checks the new value
>	of synchronize_sched_expedited_count.  It finds that the
>	value has increased, so (wrongly) assumes that its work
>	has been done, returning despite there having been no
>	expedited grace period since it began.
>
> The solution is to have the lowest-numbered CPU atomically increment
> the synchronize_sched_expedited_count variable within the
> synchronize_sched_expedited_cpu_stop() function, which is under
> the protection of the mutex acquired by try_stop_cpus().  However, this
> also requires that piggybacking tasks wait for three rather than two
> instances of try_stop_cpus(), because we cannot control the order in
> which the per-CPU callback functions occur.
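In case it helps make the failure concrete, here is a compressed replay
of the A-E interleaving above as a stand-alone, single-threaded
userspace C program.  This is only an illustrative model, not kernel
code: sync_count, stop_mutex_held, and gp_done_increment_pending() are
invented names, and the stop-machine mutex is just a flag.

/*
 * Deterministic replay of the snapshot race.  The counter starts at 0;
 * Tasks A, B, C have each completed an expedited grace period but have
 * not yet incremented the counter, and Task D holds the mutex inside
 * try_stop_cpus().
 */
#include <stdbool.h>
#include <stdio.h>

static int sync_count;          /* models synchronize_sched_expedited_count */
static bool stop_mutex_held;    /* models the mutex inside try_stop_cpus() */

/* A/B/C: try_stop_cpus() succeeded (a real GP ran), but the task was
 * preempted before it could increment the counter. */
static void gp_done_increment_pending(const char *task)
{
	printf("%s: expedited GP completed, increment still pending\n", task);
}

int main(void)
{
	gp_done_increment_pending("A");
	gp_done_increment_pending("B");
	gp_done_increment_pending("C");

	/* Task D: acquires the try_stop_cpus() mutex, then stalls. */
	stop_mutex_held = true;

	/* Task E enters and snapshots the counter. */
	int snap = sync_count + 1;              /* snap == 1 */

	/* Task E's try_stop_cpus() fails: D holds the mutex (-EAGAIN). */
	bool e_stopped_cpus = !stop_mutex_held; /* false */

	/* Tasks A, B, C finally perform their delayed increments. */
	sync_count += 3;

	/* Task E's fallback check: has the counter advanced past snap? */
	if (!e_stopped_cpus && sync_count - snap > 0)
		printf("E: count=%d snap=%d -> returns, but no GP has "
		       "elapsed since E began.  BUG.\n", sync_count, snap);
	return 0;
}

All three increments Task E observes belong to grace periods that
finished before E entered the function, so E's caller may free memory
that pre-existing readers still reference.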
How about something like the following?  It's slightly bigger but I
think it's a bit easier to understand.

Thanks.

diff --git a/kernel/sched.c b/kernel/sched.c
index aa14a56..0069be5 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -9342,7 +9342,8 @@ EXPORT_SYMBOL_GPL(synchronize_sched_expedited);
 
 #else /* #ifndef CONFIG_SMP */
 
-static atomic_t synchronize_sched_expedited_count = ATOMIC_INIT(0);
+static atomic_t sync_sched_expedited_token = ATOMIC_INIT(0);
+static atomic_t sync_sched_expedited_done = ATOMIC_INIT(0);
 
 static int synchronize_sched_expedited_cpu_stop(void *data)
 {
@@ -9373,11 +9374,18 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
  */
 void synchronize_sched_expedited(void)
 {
-	int snap, trycount = 0;
+	int my_tok, tok, t, trycount = 0;
+
+	smp_mb();  /* ensure prior mod happens before getting token. */
+
+	/*
+	 * Get a token.  This is used to coordinate with other
+	 * concurrent syncers and consolidate multiple syncs.
+	 */
+	my_tok = tok = atomic_inc_return(&sync_sched_expedited_token);
 
-	smp_mb();  /* ensure prior mod happens before capturing snap. */
-	snap = atomic_read(&synchronize_sched_expedited_count) + 1;
 	get_online_cpus();
+
 	while (try_stop_cpus(cpu_online_mask,
 			     synchronize_sched_expedited_cpu_stop,
 			     NULL) == -EAGAIN) {
@@ -9388,13 +9396,34 @@ void synchronize_sched_expedited(void)
 			synchronize_sched();
 			return;
 		}
-		if (atomic_read(&synchronize_sched_expedited_count) - snap > 0) {
+
+		/*
+		 * If the done count reached @my_tok, we know at least
+		 * one synchronization happened since we entered this
+		 * function.
+		 */
+		if (atomic_read(&sync_sched_expedited_done) - my_tok >= 0) {
 			smp_mb(); /* ensure test happens before caller kfree */
 			return;
 		}
+
 		get_online_cpus();
+
+		/* about to retry, get the latest token value */
+		tok = atomic_read(&sync_sched_expedited_token);
 	}
-	atomic_inc(&synchronize_sched_expedited_count);
+
+	/*
+	 * We now know that everything up to @tok is synchronized.
+	 * Update done counter which should always monotonically
+	 * increase (with wrapping considered).
+	 */
+	do {
+		t = atomic_read(&sync_sched_expedited_done);
+		if (t - tok >= 0)
+			break;
+	} while (atomic_cmpxchg(&sync_sched_expedited_done, t, tok) != t);
+
 	smp_mb__after_atomic_inc(); /* ensure post-GP actions seen after GP. */
 	put_online_cpus();
 }
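In case the wrap handling in the cmpxchg loop looks opaque, here is the
same "advance a free-running counter monotonically" idiom as a
stand-alone C11 sketch.  advance_done() and at_or_past() are made-up
names for illustration; like the kernel's signed-difference test, the
cast relies on two's complement wraparound, which the kernel build
guarantees.

#include <stdatomic.h>
#include <stdio.h>

static atomic_uint done;

/* True if counter value @a is at or past @b.  The subtraction is done
 * in unsigned arithmetic (defined on wrap) and the sign of the result
 * decides the order, exactly like "t - tok >= 0" above. */
static int at_or_past(unsigned int a, unsigned int b)
{
	return (int)(a - b) >= 0;
}

/* Advance @done to @tok unless another syncer already pushed it there
 * (or further); mirrors the cmpxchg loop in the patch. */
static void advance_done(unsigned int tok)
{
	unsigned int t = atomic_load(&done);

	while (!at_or_past(t, tok) &&
	       !atomic_compare_exchange_weak(&done, &t, tok))
		;	/* a failed CAS reloads @t; just retry */
}

int main(void)
{
	/* The token counter has wrapped: @done sits just below
	 * UINT_MAX while the new token is a small post-wrap value. */
	atomic_store(&done, 0xfffffffeu);
	advance_done(3u);	/* 3 is "newer" despite being smaller */
	printf("done=%u (advanced across the wrap: %s)\n",
	       atomic_load(&done),
	       at_or_past(atomic_load(&done), 0xfffffffeu) ? "yes" : "no");
	return 0;
}

One difference from the kernel primitive: C11's
atomic_compare_exchange_weak() writes the observed value back into @t
on failure, whereas atomic_cmpxchg() returns the old value, which is
why the patch compares the return value against @t instead.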