From: Peter Zijlstra <peterz@infradead.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
Jonathan Corbet <corbet@lwn.net>,
Michal Hocko <mhocko@kernel.org>,
David Howells <dhowells@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Will Deacon <will.deacon@arm.com>
Subject: Re: [PATCH] Documentation: Remove misleading examples of the barriers in wake_*()
Date: Tue, 6 Oct 2015 21:57:27 +0200
Message-ID: <20151006195727.GI11639@twins.programming.kicks-ass.net>
In-Reply-To: <20151006160450.GS3604@twins.programming.kicks-ass.net>
> On Mon, Sep 21, 2015 at 07:46:11PM +0200, Oleg Nesterov wrote:
> > In short, I got lost ;) Now I don't even understand why we do not need
> > another rmb() between p->on_rq and p->on_cpu. Suppose a thread T does
> >
> > set_current_state(...);
> > schedule();
> >
> > it can be preempted in between, after that we have "on_rq && !on_cpu".
> > Then it gets CPU again and calls schedule() which clears on_rq.
> >
> > What guarantees that if ttwu() sees on_rq == 0 cleared by schedule()
> > then it can _not_ still see the old value of on_cpu == 0?
I think you're right. Does the below adequately explain things?
I'll have another look tomorrow to see if I still agree with myself, but
for now I think I've convinced myself you're right.
---
Subject: sched: Fix race in try_to_wake_up() vs schedule()
Oleg noticed that it's possible to falsely observe p->on_cpu == 0 such
that we'll prematurely continue with the wakeup and effectively run p on
two CPUs at the same time.
Even though the overlap is very limited (the task is in the middle of
being scheduled out), it could still result in corruption of the
scheduler data structures.
	CPU0					CPU1

	set_current_state(...)
	<preempt_schedule>
	  context_switch(X, Y)
	    prepare_lock_switch(Y)
	      Y->on_cpu = 1;
	    finish_lock_switch(X)
	      store_release(X->on_cpu, 0);
						try_to_wake_up(X)
						  LOCK(p->pi_lock);
						  t = X->on_cpu; // 0
	  context_switch(Y, X)
	    prepare_lock_switch(X)
	      X->on_cpu = 1;
	    finish_lock_switch(Y)
	      store_release(Y->on_cpu, 0);
	</preempt_schedule>
	schedule();
	  deactivate_task(X);
	  X->on_rq = 0;
						  if (X->on_rq) // false
						  if (t) while (X->on_cpu)
						    cpu_relax();
	  context_switch(X, ..)
	    finish_lock_switch(X)
	      store_release(X->on_cpu, 0);
Avoid the load of X->on_cpu being hoisted over the X->on_rq load.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/core.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2084,6 +2084,25 @@ try_to_wake_up(struct task_struct *p, un
#ifdef CONFIG_SMP
/*
+ * Ensure we load p->on_cpu _after_ p->on_rq, otherwise it would be
+ * possible to, falsely, observe p->on_cpu == 0.
+ *
+ * One must be running (->on_cpu == 1) in order to remove oneself
+ * from the runqueue.
+ *
+ * [S] ->on_cpu = 1;      [L] ->on_rq
+ * UNLOCK rq->lock
+ *                        RMB
+ * LOCK   rq->lock
+ * [S] ->on_rq = 0;       [L] ->on_cpu
+ *
+ * Pairs with the full barrier implied in the UNLOCK+LOCK on rq->lock
+ * from the consecutive calls to schedule(); the first switching to our
+ * task, the second putting it to sleep.
+ */
+ smp_rmb();
+
+ /*
* If the owning (remote) cpu is still in the middle of schedule() with
* this task as prev, wait until its done referencing the task.
*/
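
For anyone who wants to poke at the ordering outside the kernel, here is a
minimal userspace sketch of the pairing described in the comment above, in
C11 atomics. It is an illustration under assumptions, not the kernel code:
struct task_model and the model_*() helpers are hypothetical names, the
seq_cst fence stands in for the full barrier implied by UNLOCK+LOCK on
rq->lock, and the acquire fence stands in for smp_rmb().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two task_struct fields discussed above. */
struct task_model {
	atomic_int on_cpu;
	atomic_int on_rq;
};

/* Scheduler side, first schedule(): the task is switched in. */
static void model_switch_in(struct task_model *t)
{
	/* [S] ->on_cpu = 1 */
	atomic_store_explicit(&t->on_cpu, 1, memory_order_relaxed);
}

/* Scheduler side, second schedule(): the running task dequeues itself. */
static void model_deactivate(struct task_model *t)
{
	/* Assumed stand-in for UNLOCK rq->lock + LOCK rq->lock. */
	atomic_thread_fence(memory_order_seq_cst);
	/* [S] ->on_rq = 0 */
	atomic_store_explicit(&t->on_rq, 0, memory_order_relaxed);
}

/*
 * Waker side: returns true when the waker has to spin until on_cpu clears.
 * The on_cpu load is issued only after the on_rq load, so a waker that
 * sees the on_rq = 0 store cannot still see the on_cpu value that
 * preceded the switch-in.
 */
static bool model_waker_must_wait(struct task_model *t)
{
	/* [L] ->on_rq */
	if (atomic_load_explicit(&t->on_rq, memory_order_relaxed))
		return false;	/* still queued; this path does not spin */

	/* Assumed stand-in for smp_rmb(). */
	atomic_thread_fence(memory_order_acquire);

	/* [L] ->on_cpu */
	return atomic_load_explicit(&t->on_cpu, memory_order_relaxed) != 0;
}
```

Run single-threaded this only exercises the control flow; the fences matter
once model_waker_must_wait() runs concurrently with the two scheduler-side
helpers on another thread.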