* Question about synchronize_sched_expedited()
@ 2010-10-25 15:43 Paul E. McKenney
From: Paul E. McKenney @ 2010-10-25 15:43 UTC (permalink / raw)
To: tj; +Cc: linux-kernel
Hello, Tejun,
I was taking another look at synchronize_sched_expedited(), and was
concerned about the scenario listed out in the following commit.
Is this scenario a real problem, or am I missing the synchronization
that makes it safe?
(If my concerns are valid, I should also be able to change this
to non-atomically increment synchronize_sched_expedited_count, but
one step at a time...)
Thanx, Paul
------------------------------------------------------------------------
commit 1c2f788a742b87f8fae692b0b3014732124ee3c6
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date: Mon Oct 25 07:39:22 2010 -0700
rcu: fix race condition in synchronize_sched_expedited()
The new (early 2010) implementation of synchronize_sched_expedited() uses
try_stop_cpus() to force a context switch on every CPU. It also permits
concurrent calls to synchronize_sched_expedited() to share a single call
to try_stop_cpus() through use of an atomically incremented
synchronize_sched_expedited_count variable. Unfortunately, this is
subject to failure as follows:
o Task A invokes synchronize_sched_expedited(), try_stop_cpus()
succeeds, but Task A is preempted before getting to the atomic
increment of synchronize_sched_expedited_count.
o Task B also invokes synchronize_sched_expedited(), with exactly
the same outcome as Task A.
o Task C also invokes synchronize_sched_expedited(), again with
exactly the same outcome as Tasks A and B.
o Task D also invokes synchronize_sched_expedited(), but only
gets as far as acquiring the mutex within try_stop_cpus()
before being preempted, interrupted, or otherwise delayed.
o Task E also invokes synchronize_sched_expedited(), but only
gets to the snapshotting of synchronize_sched_expedited_count.
o Tasks A, B, and C all increment synchronize_sched_expedited_count.
o Task E fails to get the mutex, so checks the new value
of synchronize_sched_expedited_count. It finds that the
value has increased, so (wrongly) assumes that its work
has been done, returning despite there having been no
expedited grace period since it began.
The solution is to have the lowest-numbered CPU atomically increment
the synchronize_sched_expedited_count variable within the
synchronize_sched_expedited_cpu_stop() function, which is under
the protection of the mutex acquired by try_stop_cpus().
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 32e76d4..16bf339 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1041,6 +1041,8 @@ static int synchronize_sched_expedited_cpu_stop(void *data)
* robustness against future implementation changes.
*/
smp_mb(); /* See above comment block. */
+ if (cpumask_first(cpu_online_mask) == smp_processor_id())
+ atomic_inc(&synchronize_sched_expedited_count);
return 0;
}
@@ -1077,7 +1079,6 @@ void synchronize_sched_expedited(void)
}
get_online_cpus();
}
- atomic_inc(&synchronize_sched_expedited_count);
smp_mb__after_atomic_inc(); /* ensure post-GP actions seen after GP. */
put_online_cpus();
}
* Re: Question about synchronize_sched_expedited()
From: Tejun Heo @ 2010-10-25 16:03 UTC (permalink / raw)
To: paulmck; +Cc: linux-kernel
Hello, Paul.
On 10/25/2010 05:43 PM, Paul E. McKenney wrote:
> Hello, Tejun,
>
> I was taking another look at synchronize_sched_expedited(), and was
> concerned about the scenario listed out in the following commit.
> Is this scenario a real problem, or am I missing the synchronization
> that makes it safe?
>
> (If my concerns are valid, I should also be able to change this
> to non-atomically increment synchronize_sched_expedited_count, but
> one step at a time...)
I think your concern is valid and this can happen w/o preemption given
enough cpus and perfect timing. Was the original code free from this
problem?
IMHO the counter based mechanism is a bit too difficult to ponder and
verify. Can we do more conventional double queueing (ie. flipping
pending and executing queues so that multiple sync calls can get
coalesced while another one is in progress)? That's what the code is
trying to achieve anyway, right?
Thanks.
--
tejun
* Re: Question about synchronize_sched_expedited()
From: Paul E. McKenney @ 2010-10-25 19:41 UTC (permalink / raw)
To: Tejun Heo; +Cc: linux-kernel
On Mon, Oct 25, 2010 at 06:03:43PM +0200, Tejun Heo wrote:
> Hello, Paul.
>
> On 10/25/2010 05:43 PM, Paul E. McKenney wrote:
> > Hello, Tejun,
> >
> > I was taking another look at synchronize_sched_expedited(), and was
> > concerned about the scenario listed out in the following commit.
> > Is this scenario a real problem, or am I missing the synchronization
> > that makes it safe?
> >
> > (If my concerns are valid, I should also be able to change this
> > to non-atomically increment synchronize_sched_expedited_count, but
> > one step at a time...)
>
> I think your concern is valid and this can happen w/o preemption given
> enough cpus and perfect timing. Was the original code free from this
> problem?
I believe so -- there was a mutex guarding the whole operation, including
the increment.
> IMHO the counter based mechanism is a bit too difficult to ponder and
> verify. Can we do more conventional double queueing (ie. flipping
> pending and executing queues so that multiple sync calls can get
> coalesced while another one is in progress)? That's what the code is
> trying to achieve anyway, right?
Hmmm... But it would be necessary to flip the queues somewhere, and
wouldn't determining where that somewhere was involve the same analysis
and complexity as determining where to increment the counter?
Thanx, Paul
* Re: Question about synchronize_sched_expedited()
From: Tejun Heo @ 2010-10-26 9:25 UTC (permalink / raw)
To: paulmck; +Cc: linux-kernel
Hello, Paul.
On 10/25/2010 09:41 PM, Paul E. McKenney wrote:
>> I think your concern is valid and this can happen w/o preemption given
>> enough cpus and perfect timing. Was the original code free from this
>> problem?
>
> I believe so -- there was a mutex guarding the whole operation, including
> the increment.
I see.
>> IMHO the counter based mechanism is a bit too difficult to ponder and
>> verify. Can we do more conventional double queueing (ie. flipping
>> pending and executing queues so that multiple sync calls can get
>> coalesced while another one is in progress)? That's what the code is
>> trying to achieve anyway, right?
>
> Hmmm... But it would be necessary to flip the queues somewhere, and
> wouldn't determining where that somewhere was involve the same analysis
> and complexity as determining where to increment the counter?
I was thinking something like the following.
	lock;
	if (list_empty(running)) {
		add myself to running;
		unlock;
	} else {
		remember list_empty(pending);
		append myself to pending queue;
		unlock and sleep;
		if (pending wasn't empty)
			return;
	}
	do it;
	lock;
	wake up all on running and clear it;
	list_splice_init(pending, running);
	wake up the first of running;
	unlock;
Thanks.
--
tejun