public inbox for linux-kernel@vger.kernel.org
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: Krister Johansen <kjlx@templeofstupid.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	linux-kernel@vger.kernel.org,
	Paul Gortmaker <paul.gortmaker@windriver.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH tip/sched/core] Add comments to aid in safer usage of swake_up.
Date: Thu, 15 Jun 2017 10:56:29 -0700	[thread overview]
Message-ID: <20170615175629.GE3721@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170615041828.zk3a3sfyudm5p6nl@tardis>

On Thu, Jun 15, 2017 at 12:18:28PM +0800, Boqun Feng wrote:
> On Wed, Jun 14, 2017 at 09:25:58AM -0700, Krister Johansen wrote:
> > On Wed, Jun 14, 2017 at 11:02:40AM -0400, Steven Rostedt wrote:
> > > On Wed, 14 Jun 2017 09:10:15 -0400
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > 
> > > > Now let's make it simpler. I'll even add the READ_ONCE and WRITE_ONCE
> > > > where applicable.
> > > > 
> > > > 
> > > > 	CPU0				CPU1
> > > > 	----				----
> > > > 				LOCK(A)
> > > > 
> > > >  LOCK(B)
> > > > 				 WRITE_ONCE(X, INIT)
> > > > 
> > > > 				 (the cpu may postpone writing X)
> > > > 
> > > > 				 (the cpu can fetch wq list here)
> > > >   list_add(wq, q)
> > > > 
> > > >  UNLOCK(B)
> > > > 
> > > >  (the cpu may fetch old value of X)
> > > > 
> > > > 				 (write of X happens here)
> > > > 
> > > >  if (READ_ONCE(X) != INIT)
> > > >    schedule();
> > > > 
> > > > 				UNLOCK(A)
> > > > 
> > > > 				 if (list_empty(wq))
> > > > 				   return;
> > > > 
> > > > Tell me again how the READ_ONCE() and WRITE_ONCE() help in this
> > > > scenario?
> > > > 
> > > > Because we are using spinlocks, this won't be an issue for most
> > > > architectures. The bug happens if the fetching of the list_empty()
> > > > leaks into before the UNLOCK(A).
> > > > 
> > > > If the reading/writing of the list and the reading/writing of gp_flags
> > > > gets reversed in either direction by the CPU, then we have a problem.
> > > 
> > > FYI..
> > > 
> > > Both sides need a memory barrier. Otherwise, even with a memory barrier
> > > on CPU1 we can still have:
> > > 
> > > 
> > > 	CPU0				CPU1
> > > 	----				----
> > > 
> > > 				LOCK(A)
> > >  LOCK(B)
> > > 
> > >  list_add(wq, q)
> > > 
> > >  (cpu waits to write wq list)
> > > 
> > >  (cpu fetches X)
> > > 
> > > 				 WRITE_ONCE(X, INIT)
> > > 
> > > 				UNLOCK(A)
> > > 
> > > 				smp_mb();
> > > 
> > > 				if (list_empty(wq))
> > > 				   return;
> > > 
> > >  (cpu writes wq list)
> > > 
> > >  UNLOCK(B)
> > > 
> > >  if (READ_ONCE(X) != INIT)
> > >    schedule()
> > > 
> > > 
> > > Luckily for us, there is a memory barrier on CPU0. In
> > > prepare_to_swait() we have:
> > > 
> > > 	raw_spin_lock_irqsave(&q->lock, flags);
> > > 	__prepare_to_swait(q, wait);
> > > 	set_current_state(state);
> > > 	raw_spin_unlock_irqrestore(&q->lock, flags);
> > > 
> > > And that set_current_state() call includes a memory barrier, which will
> > > prevent the above from happening, as the addition to the wq list must
> > > be flushed before fetching X.
> > > 
> > > I still strongly believe that the swait_active() requires a memory
> > > barrier.
> > 
> > FWIW, I agree.  There was an smp_mb() in RT-linux's equivalent of
> > swait_activate().
> > 
> > https://www.spinics.net/lists/linux-rt-users/msg10340.html
> > 
> > If the barrier goes in swait_active() then we don't have to require all
> > of the callers of swait_active and swake_up to issue the barrier
> > instead.  Handling this in swait_active is likely to be less error
> > prone.  Though, we could also do something like wq_has_sleeper() and use
> > that preferentially in swake_up and its variants.
> > 
> 
> I think it makes more sense to delete the swait_active() check in
> swake_up().  We seem to encourage users to do the quick check on the
> wait queue on their own, so why do the check again in swake_up()?
> Besides, wake_up() doesn't call waitqueue_active() outside the lock
> critical section either.
> 
> So how about the patch below (testing is in progress)? Peter?

It is quite possible that a problem I am seeing is caused by this, but
there are reasons to believe otherwise.  And in any case, the problem is
quite rare, taking tens or perhaps even hundreds of hours of rcutorture
to reproduce.

So, would you be willing to create a dedicated swait torture test to check
this out?  The usual approach would be to create a circle of kthreads,
with each waiting on the previous kthread and waking up the next one.
Each kthread, after being awakened, checks a variable that its waker
sets just before the wakeup.  Have another kthread check for hangs.

Possibly introduce timeouts and random delays to stir things up a bit.

But maybe such a test already exists.  Does anyone know of one?  I don't
see anything obvious.

Interested?

							Thanx, Paul

> Regards,
> Boqun
> 
> --------------------->8
> Subject: [PATCH] swait: Remove the lockless swait_active() check in
>  swake_up*()
> 
> Steven Rostedt reported a potential race in RCU core because of
> swake_up():
> 
>         CPU0                            CPU1
>         ----                            ----
>                                 __call_rcu_core() {
> 
>                                  spin_lock(rnp_root)
>                                  need_wake = __rcu_start_gp() {
>                                   rcu_start_gp_advanced() {
>                                    gp_flags = FLAG_INIT
>                                   }
>                                  }
> 
>  rcu_gp_kthread() {
>    swait_event_interruptible(wq,
>         gp_flags & FLAG_INIT) {
>    spin_lock(q->lock)
> 
>                                 *fetch wq->task_list here! *
> 
>    list_add(wq->task_list, q->task_list)
>    spin_unlock(q->lock);
> 
>    *fetch old value of gp_flags here *
> 
>                                  spin_unlock(rnp_root)
> 
>                                  rcu_gp_kthread_wake() {
>                                   swake_up(wq) {
>                                    swait_active(wq) {
>                                     list_empty(wq->task_list)
> 
>                                    } * return false *
> 
>   if (condition) * false *
>     schedule();
> 
> In this case, a wakeup is missed, which could cause the rcu_gp_kthread
> to wait for a long time.
> 
> The reason for this is that we do a lockless swait_active() check in
> swake_up(). To fix this, we can either 1) add an smp_mb() in swake_up()
> before swait_active() to provide the proper ordering, or 2) simply
> remove the swait_active() check in swake_up().
> 
> Solution 2 not only fixes this problem but also keeps the swait and
> wait APIs as close as possible, as wake_up() doesn't provide a full
> barrier and doesn't do a lockless check of the wait queue either.
> Moreover, there are users already using swait_active() to do their
> quick checks on the wait queues, so it makes little sense for
> swake_up() and swake_up_all() to do this on their own.
> 
> This patch then removes the lockless swait_active() check in swake_up()
> and swake_up_all().
> 
> Reported-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  kernel/sched/swait.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
> index 3d5610dcce11..2227e183e202 100644
> --- a/kernel/sched/swait.c
> +++ b/kernel/sched/swait.c
> @@ -33,9 +33,6 @@ void swake_up(struct swait_queue_head *q)
>  {
>  	unsigned long flags;
> 
> -	if (!swait_active(q))
> -		return;
> -
>  	raw_spin_lock_irqsave(&q->lock, flags);
>  	swake_up_locked(q);
>  	raw_spin_unlock_irqrestore(&q->lock, flags);
> @@ -51,9 +48,6 @@ void swake_up_all(struct swait_queue_head *q)
>  	struct swait_queue *curr;
>  	LIST_HEAD(tmp);
> 
> -	if (!swait_active(q))
> -		return;
> -
>  	raw_spin_lock_irq(&q->lock);
>  	list_splice_init(&q->task_list, &tmp);
>  	while (!list_empty(&tmp)) {
> -- 
> 2.13.0
> 


Thread overview: 16+ messages
2017-06-09  3:25 [PATCH tip/sched/core] Add comments to aid in safer usage of swake_up Krister Johansen
2017-06-09  7:19 ` Peter Zijlstra
2017-06-09 12:45   ` Paul E. McKenney
2017-06-13 23:23     ` Steven Rostedt
2017-06-13 23:42       ` Paul E. McKenney
2017-06-14  1:15         ` Steven Rostedt
2017-06-14  3:58           ` Paul E. McKenney
2017-06-14 13:10             ` Steven Rostedt
2017-06-14 15:02               ` Steven Rostedt
2017-06-14 16:25                 ` Krister Johansen
2017-06-15  4:18                   ` Boqun Feng
2017-06-15 17:56                     ` Paul E. McKenney [this message]
2017-06-16  1:07                       ` Boqun Feng
2017-06-16  3:09                         ` Paul E. McKenney
2017-08-10 12:10                     ` [tip:locking/core] sched/wait: Remove the lockless swait_active() check in swake_up*() tip-bot for Boqun Feng
2017-06-14 15:55               ` [PATCH tip/sched/core] Add comments to aid in safer usage of swake_up Paul E. McKenney
