Re: [PATCH tip/sched/core] Add comments to aid in safer usage of swake_up.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Krister Johansen <kjlx@templeofstupid.com>,
	Ingo Molnar <mingo@kernel.org>,
	linux-kernel@vger.kernel.org,
	Paul Gortmaker <paul.gortmaker@windriver.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH tip/sched/core] Add comments to aid in safer usage of swake_up.
Date: Wed, 14 Jun 2017 08:55:31 -0700	[thread overview]
Message-ID: <20170614155531.GO3721@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170614091015.01d7dc89@gandalf.local.home>

On Wed, Jun 14, 2017 at 09:10:15AM -0400, Steven Rostedt wrote:
> On Tue, 13 Jun 2017 20:58:43 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > And here is the part you also need to look at:
> 
> Why? We are talking about two different, unrelated variables modified
> on two different CPUs. I don't see where the overlap is.

It does sound like we are talking past each other.

Please see below for how I was interpreting your sequence of events.

> > ====
> > 
> >  (*) Overlapping loads and stores within a particular CPU will appear to be
> >      ordered within that CPU.  This means that for:
> > 
> > 	a = READ_ONCE(*X); WRITE_ONCE(*X, b);
> > 
> >      the CPU will only issue the following sequence of memory operations:
> > 
> > 	a = LOAD *X, STORE *X = b
> > 
> >      And for:
> > 
> > 	WRITE_ONCE(*X, c); d = READ_ONCE(*X);
> > 
> >      the CPU will only issue:
> > 
> > 	STORE *X = c, d = LOAD *X
> > 
> >      (Loads and stores overlap if they are targeted at overlapping pieces of
> >      memory).
> > 
> > ====
> > 
> > This section needs some help -- the actual guarantee is stronger, that
> > all CPUs will agree on the order of volatile same-sized aligned accesses
> > to a given single location.  So if a previous READ_ONCE() sees the new
> > value, any subsequent READ_ONCE() from that same variable is guaranteed
> > to also see the new value (or some later value).
> > 
> > Does that help, or am I missing something here?
> 
> Maybe I'm missing something. Let me rewrite what I first wrote, and
> then abstract it into a simpler version:
> 
> Here's what I first wrote:
> 
> (looking at __call_rcu_core() and rcu_gp_kthread()
> 
> 	CPU0				CPU1
> 	----				----
> 				__call_rcu_core() {
> 
> 				 spin_lock(rnp_root)
> 				 need_wake = __rcu_start_gp() {
> 				  rcu_start_gp_advanced() {
> 				   gp_flags = FLAG_INIT
> 				  }
> 				 }
> 
>  rcu_gp_kthread() {
>    swait_event_interruptible(wq,
> 	gp_flags & FLAG_INIT) {

This is the first access to ->gp_flags from rcu_gp_kthread().

>    spin_lock(q->lock)
> 
> 				*fetch wq->task_list here! *
> 
>    list_add(wq->task_list, q->task_list)
>    spin_unlock(q->lock);
> 
>    *fetch old value of gp_flags here *

This is the second access to ->gp_flags.

Since you are saying that ->gp_flags is only accessed once, perhaps
this code from spin_lock() down is intended to be an expansion of
swait_event_interruptible()?

#define swait_event_interruptible(wq, condition)			\
({									\
	int __ret = 0;							\
	if (!(condition))						\
		__ret = __swait_event_interruptible(wq, condition);	\
	__ret;								\
})

But no, in this case, we have the macro argument named "condition"
accessing ->gp_flags, and a control dependency forcing that access to
precede the spin_lock() in __prepare_to_swait().  We cannot acquire the
spinlock unless the condition is false, that is, the old value is fetched.
So there is a first fetch of ->gp_flags that is constrained to happen
before the spin_lock().  Any fetch of ->gp_flags after the spin_unlock()
must therefore be a second fetch.  Which of course might still get the
old value because the update to ->gp_flags might not have propagated yet.

But it appears that you are worried about something else.

> 				 spin_unlock(rnp_root)
> 
> 				 rcu_gp_kthread_wake() {
> 				  swake_up(wq) {
> 				   swait_active(wq) {
> 				    list_empty(wq->task_list)

We don't hold q->lock here, so I am guessing that your concern is that
we aren't guaranteed to see the above list_add().

Is that the case?

If so, your suggested fix is to place an smp_mb() between
swait_event_interruptible()'s access to "condition" and
__prepare_to_swait()'s list_add(), correct?  And also an
smp_mb() before swake_up()'s call to swait_active(), correct?

The second smp_mb() could be placed by the user, but the first
one cannot, at least not reasonably.

So did I get the point eventually?  ;-)

							Thanx, Paul

> 				   } * return false *
> 
>   if (condition) * false *
>     schedule();
> 
> 
> Now let's make it simpler. I'll even add the READ_ONCE and WRITE_ONCE
> where applicable.
> 
> 
> 	CPU0				CPU1
> 	----				----
> 				LOCK(A)
> 
>  LOCK(B)
> 				 WRITE_ONCE(X, INIT)
> 
> 				 (the cpu may postpone writing X)
> 
> 				 (the cpu can fetch wq list here)
>   list_add(wq, q)
> 
>  UNLOCK(B)
> 
>  (the cpu may fetch old value of X)
> 
> 				 (write of X happens here)
> 
>  if (READ_ONCE(X) != init)
>    schedule();
> 
> 				UNLOCK(A)
> 
> 				 if (list_empty(wq))
> 				   return;
> 
> Tell me again how the READ_ONCE() and WRITE_ONCE() helps in this
> scenario?
> 
> Because we are using spinlocks, this wont be an issue for most
> architectures. The bug happens if the fetching of the list_empty()
> leaks into before the UNLOCK(A).
> 
> If the reading/writing of the list and the reading/writing of gp_flags
> gets reversed in either direction by the CPU, then we have a problem.
> 
> -- Steve
>

     prev parent reply	other threads:[~2017-06-14 15:55 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-09  3:25 [PATCH tip/sched/core] Add comments to aid in safer usage of swake_up Krister Johansen
2017-06-09  7:19 ` Peter Zijlstra
2017-06-09 12:45   ` Paul E. McKenney
2017-06-13 23:23     ` Steven Rostedt
2017-06-13 23:42       ` Paul E. McKenney
2017-06-14  1:15         ` Steven Rostedt
2017-06-14  3:58           ` Paul E. McKenney
2017-06-14 13:10             ` Steven Rostedt
2017-06-14 15:02               ` Steven Rostedt
2017-06-14 16:25                 ` Krister Johansen
2017-06-15  4:18                   ` Boqun Feng
2017-06-15 17:56                     ` Paul E. McKenney
2017-06-16  1:07                       ` Boqun Feng
2017-06-16  3:09                         ` Paul E. McKenney
2017-08-10 12:10                     ` [tip:locking/core] sched/wait: Remove the lockless swait_active() check in swake_up*() tip-bot for Boqun Feng
2017-06-14 15:55               ` Paul E. McKenney [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170614155531.GO3721@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=kjlx@templeofstupid.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=paul.gortmaker@windriver.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.