From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Krister Johansen <kjlx@templeofstupid.com>,
Ingo Molnar <mingo@kernel.org>,
linux-kernel@vger.kernel.org,
Paul Gortmaker <paul.gortmaker@windriver.com>,
Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH tip/sched/core] Add comments to aid in safer usage of swake_up.
Date: Wed, 14 Jun 2017 08:55:31 -0700 [thread overview]
Message-ID: <20170614155531.GO3721@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170614091015.01d7dc89@gandalf.local.home>
On Wed, Jun 14, 2017 at 09:10:15AM -0400, Steven Rostedt wrote:
> On Tue, 13 Jun 2017 20:58:43 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>
> > And here is the part you also need to look at:
>
> Why? We are talking about two different, unrelated variables modified
> on two different CPUs. I don't see where the overlap is.
It does sound like we are talking past each other.
Please see below for how I was interpreting your sequence of events.
> > ====
> >
> > (*) Overlapping loads and stores within a particular CPU will appear to be
> > ordered within that CPU. This means that for:
> >
> > a = READ_ONCE(*X); WRITE_ONCE(*X, b);
> >
> > the CPU will only issue the following sequence of memory operations:
> >
> > a = LOAD *X, STORE *X = b
> >
> > And for:
> >
> > WRITE_ONCE(*X, c); d = READ_ONCE(*X);
> >
> > the CPU will only issue:
> >
> > STORE *X = c, d = LOAD *X
> >
> > (Loads and stores overlap if they are targeted at overlapping pieces of
> > memory).
> >
> > ====
> >
> > This section needs some help -- the actual guarantee is stronger, that
> > all CPUs will agree on the order of volatile same-sized aligned accesses
> > to a given single location. So if a previous READ_ONCE() sees the new
> > value, any subsequent READ_ONCE() from that same variable is guaranteed
> > to also see the new value (or some later value).
> >
> > Does that help, or am I missing something here?
>
> Maybe I'm missing something. Let me rewrite what I first wrote, and
> then abstract it into a simpler version:
>
> Here's what I first wrote:
>
> (looking at __call_rcu_core() and rcu_gp_kthread()
>
> CPU0 CPU1
> ---- ----
> __call_rcu_core() {
>
> spin_lock(rnp_root)
> need_wake = __rcu_start_gp() {
> rcu_start_gp_advanced() {
> gp_flags = FLAG_INIT
> }
> }
>
> rcu_gp_kthread() {
> swait_event_interruptible(wq,
> gp_flags & FLAG_INIT) {
This is the first access to ->gp_flags from rcu_gp_kthread().
> spin_lock(q->lock)
>
> *fetch wq->task_list here! *
>
> list_add(wq->task_list, q->task_list)
> spin_unlock(q->lock);
>
> *fetch old value of gp_flags here *
This is the second access to ->gp_flags.
Since you are saying that ->gp_flags is only accessed once, perhaps
this code from spin_lock() down is intended to be an expansion of
swait_event_interruptible()?
#define swait_event_interruptible(wq, condition) \
({ \
int __ret = 0; \
if (!(condition)) \
__ret = __swait_event_interruptible(wq, condition); \
__ret; \
})
But no, in this case, we have the macro argument named "condition"
accessing ->gp_flags, and a control dependency forcing that access to
precede the spin_lock() in __prepare_to_swait(). We cannot acquire the
spinlock unless the condition is false, that is, the old value is fetched.
So there is a first fetch of ->gp_flags that is constrained to happen
before the spin_lock(). Any fetch of ->gp_flags after the spin_unlock()
must therefore be a second fetch. Which of course might still get the
old value because the update to ->gp_flags might not have propagated yet.
But it appears that you are worried about something else.
> spin_unlock(rnp_root)
>
> rcu_gp_kthread_wake() {
> swake_up(wq) {
> swait_active(wq) {
> list_empty(wq->task_list)
We don't hold q->lock here, so I am guessing that your concern is that
we aren't guaranteed to see the above list_add().
Is that the case?
If so, your suggested fix is to place an smp_mb() between
swait_event_interruptible()'s access to "condition" and
__prepare_to_swait()'s list_add(), correct? And also an
smp_mb() before swake_up()'s call to swait_active(), correct?
The second smp_mb() could be placed by the user, but the first
one cannot, at least not reasonably.
So did I get the point eventually? ;-)
Thanx, Paul
> } * return false *
>
> if (condition) * false *
> schedule();
>
>
> Now let's make it simpler. I'll even add the READ_ONCE and WRITE_ONCE
> where applicable.
>
>
> CPU0 CPU1
> ---- ----
> LOCK(A)
>
> LOCK(B)
> WRITE_ONCE(X, INIT)
>
> (the cpu may postpone writing X)
>
> (the cpu can fetch wq list here)
> list_add(wq, q)
>
> UNLOCK(B)
>
> (the cpu may fetch old value of X)
>
> (write of X happens here)
>
> if (READ_ONCE(X) != init)
> schedule();
>
> UNLOCK(A)
>
> if (list_empty(wq))
> return;
>
> Tell me again how the READ_ONCE() and WRITE_ONCE() helps in this
> scenario?
>
> Because we are using spinlocks, this wont be an issue for most
> architectures. The bug happens if the fetching of the list_empty()
> leaks into before the UNLOCK(A).
>
> If the reading/writing of the list and the reading/writing of gp_flags
> gets reversed in either direction by the CPU, then we have a problem.
>
> -- Steve
>
prev parent reply other threads:[~2017-06-14 15:55 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-09 3:25 [PATCH tip/sched/core] Add comments to aid in safer usage of swake_up Krister Johansen
2017-06-09 7:19 ` Peter Zijlstra
2017-06-09 12:45 ` Paul E. McKenney
2017-06-13 23:23 ` Steven Rostedt
2017-06-13 23:42 ` Paul E. McKenney
2017-06-14 1:15 ` Steven Rostedt
2017-06-14 3:58 ` Paul E. McKenney
2017-06-14 13:10 ` Steven Rostedt
2017-06-14 15:02 ` Steven Rostedt
2017-06-14 16:25 ` Krister Johansen
2017-06-15 4:18 ` Boqun Feng
2017-06-15 17:56 ` Paul E. McKenney
2017-06-16 1:07 ` Boqun Feng
2017-06-16 3:09 ` Paul E. McKenney
2017-08-10 12:10 ` [tip:locking/core] sched/wait: Remove the lockless swait_active() check in swake_up*() tip-bot for Boqun Feng
2017-06-14 15:55 ` Paul E. McKenney [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170614155531.GO3721@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=kjlx@templeofstupid.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=paul.gortmaker@windriver.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox