From: Corey Minyard <minyard@acm.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users@vger.kernel.org,
	Corey Minyard <cminyard@mvista.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, tglx@linutronix.de,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH RT v2] Fix a lockup in wait_for_completion() and friends
Date: Thu, 9 May 2019 12:46:05 -0500
Message-ID: <20190509174605.GI16145@minyard.net>
In-Reply-To: <20190509161925.kul66w54wpjcinuc@linutronix.de>

On Thu, May 09, 2019 at 06:19:25PM +0200, Sebastian Andrzej Siewior wrote:
> Please:
>  - add some RT developers on Cc:
>  - add lkml
>  - use [PATCH RT] instead of just [PATCH] so it is visible that you target
>    the RT tree.

Will do.  I'll add your diagram below, too.

> 
> On 2019-05-08 15:57:28 [-0500], minyard@acm.org wrote:
> > From: Corey Minyard <cminyard@mvista.com>
> > 
> > The function do_wait_for_common() has a race condition that
> > can result in lockups waiting for completions.  Adding the thread
> > to (and removing the thread from) the wait queue for the completion
> > is done outside the do loop in that function.  However, if the thread
> > is woken up, the swake_up_locked() function will delete the entry
> > from the wait queue.  If that happens and another thread sneaks
> > in and decrements the done count in the completion to zero, the
> > loop will go around again, but the thread will no longer be in the
> > wait queue, so there is no way to wake it up.
> > 
> > Fix it by adding/removing the thread to/from the wait queue inside
> > the do loop.
> 
> So you are saying:
> 	T0			T1			    T2
> 	wait_for_completion()
> 	 do_wait_for_common()
> 	  __prepare_to_swait()
> 	   schedule()
> 	    		       complete()
> 			        x->done++ (0 -> 1)
> 				raw_spin_lock_irqsave()
> 				 swake_up_locked()           wait_for_completion()
> 				  wake_up_process(T0)
> 				  list_del_init()
> 				raw_spin_unlock_irqrestore()
> 	                                                      raw_spin_lock_irq(&x->wait.lock)
> 	 raw_spin_lock_irq(&x->wait.lock)                      x->done != UINT_MAX, 1 -> 0
> 							       return 1
> 							      raw_spin_unlock_irq(&x->wait.lock)
> 	 while (!x->done && timeout),
> 	 continue loop, not enqueued
> 	 on &x->wait
> 
> The difference compared to the non-swait based implementation is that
> swake_up_locked() removes woken up tasks from the list while the other
> implementation (wait_queue_entry based, default_wake_function()) does
> not. Buh

Yes, exactly.  I was wondering if swait could be changed to not remove
the waiter, but that seemed like a bad idea.  It is an unusual semantic,
though.
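
To make the difference in semantics concrete, here is a minimal user-space
sketch of the two wake styles.  It is a toy model, not kernel code:
struct wait_list, add_waiter(), still_queued() and wake_one() are
hypothetical names, and the remove_on_wake flag is an assumption that stands
in for the choice between a default_wake_function()-style wake (the entry
stays queued) and a swake_up_locked()-style wake (the waker deletes the
entry).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WAITERS 4

/* Toy wait list holding waiter ids; not a kernel data structure. */
struct wait_list {
	int ids[MAX_WAITERS];
	int nr;
};

static void add_waiter(struct wait_list *q, int id)
{
	assert(q->nr < MAX_WAITERS);
	q->ids[q->nr++] = id;
}

static bool still_queued(const struct wait_list *q, int id)
{
	for (int i = 0; i < q->nr; i++)
		if (q->ids[i] == id)
			return true;
	return false;
}

/*
 * Wake the first waiter and return its id, or -1 if the list is empty.
 * remove_on_wake == false: the entry stays queued (wait_queue_entry style).
 * remove_on_wake == true:  the entry is deleted by the waker (swait style).
 */
static int wake_one(struct wait_list *q, bool remove_on_wake)
{
	int id;

	if (q->nr == 0)
		return -1;
	id = q->ids[0];
	if (remove_on_wake) {
		for (int i = 1; i < q->nr; i++)
			q->ids[i - 1] = q->ids[i];
		q->nr--;
	}
	return id;
}
```

In this model, a waiter woken with remove_on_wake == false is still found by
still_queued() afterwards, while one woken with remove_on_wake == true is
not, so a later wake_one() on that list has nobody to wake.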

I thought some more about this, wondering why everything isn't keeling
over because of this.  I'm guessing that just about everything using
completions has a single waiter, so it doesn't matter.  I just wrote
some code that has a bunch of waiters, so I hit it.
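
The interleaving from the diagram above can be replayed deterministically in
a small single-threaded sketch.  This is only a user-space model under
stated assumptions: the struct task and struct completion here are toy types
that merely borrow the kernel names, enqueue(), swake_up_one() and
simulate_lost_wakeup() are hypothetical helpers, and locking is left out
because the steps run one after another.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 4

struct task {
	bool asleep;
};

/* Toy completion: a done counter plus a FIFO of waiting tasks. */
struct completion {
	unsigned int done;
	struct task *queue[MAX_WAITERS];
	int nr_queued;
};

static void enqueue(struct completion *x, struct task *t)
{
	assert(x->nr_queued < MAX_WAITERS);
	x->queue[x->nr_queued++] = t;
}

static bool is_queued(const struct completion *x, const struct task *t)
{
	for (int i = 0; i < x->nr_queued; i++)
		if (x->queue[i] == t)
			return true;
	return false;
}

/* swait-style wake: wake the first waiter and drop it from the queue. */
static struct task *swake_up_one(struct completion *x)
{
	struct task *t;

	if (x->nr_queued == 0)
		return NULL;
	t = x->queue[0];
	for (int i = 1; i < x->nr_queued; i++)
		x->queue[i - 1] = x->queue[i];
	x->nr_queued--;
	t->asleep = false;
	return t;
}

static struct task *complete(struct completion *x)
{
	x->done++;
	return swake_up_one(x);
}

/* Returns true if T0 ends up asleep, off the queue, with done == 1. */
static bool simulate_lost_wakeup(void)
{
	struct completion x = { .done = 0, .nr_queued = 0 };
	struct task t0 = { .asleep = false };
	struct task *woken;

	/* T0: done == 0, so enqueue once before the loop and sleep. */
	enqueue(&x, &t0);
	t0.asleep = true;

	/* T1: complete() wakes T0 and removes it from the queue. */
	complete(&x);

	/* T2: sees done == 1 and consumes it without ever sleeping. */
	x.done--;

	/* T0: runs, finds done == 0, loops and sleeps without re-enqueueing. */
	if (!x.done)
		t0.asleep = true;

	/* T1: a second complete() finds an empty queue and wakes nobody. */
	woken = complete(&x);

	return woken == NULL && t0.asleep && x.done == 1 && !is_queued(&x, &t0);
}
```

In this model simulate_lost_wakeup() returns true: the completion has been
signalled again, yet T0 stays asleep because it is no longer on the queue.
With a single waiter there is no T2 to consume the count in between, which
fits the guess that single-waiter users never see the problem.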

-corey

> 
> One question for the upstream completion implementation:
> completion_done() returns true if there are no waiters. It acquires the
> wait.lock to ensure that complete()/complete_all() is done. However,
> once complete() releases the lock it is guaranteed that the wake_up() (for
> the waiter) occurred. The waiter task still needs to remove itself
> from the wait-queue before the completion can be removed.
> Am I missing something?
> 
> > Fixes: a04ff6b4ec4ee7e ("completion: Use simple wait queues")
> > Signed-off-by: Corey Minyard <cminyard@mvista.com>
> > ---
> > I sent the wrong version of this; I had spotted the problem before but
> > didn't fix it here.  Adding the thread to the wait queue needs to come
> > after the signal check.  Sorry about the noise.
> > 
> >  kernel/sched/completion.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
> > index 755a58084978..4f9b4cc0c95a 100644
> > --- a/kernel/sched/completion.c
> > +++ b/kernel/sched/completion.c
> > @@ -70,20 +70,20 @@ do_wait_for_common(struct completion *x,
> >  		   long (*action)(long), long timeout, int state)
> >  {
> >  	if (!x->done) {
> > -		DECLARE_SWAITQUEUE(wait);
> > -
> > -		__prepare_to_swait(&x->wait, &wait);
> 
> You can keep DECLARE_SWAITQUEUE where it is; just remove __prepare_to_swait().
> 
> >  		do {
> > +			DECLARE_SWAITQUEUE(wait);
> > +
> >  			if (signal_pending_state(state, current)) {
> >  				timeout = -ERESTARTSYS;
> >  				break;
> >  			}
> > +			__prepare_to_swait(&x->wait, &wait);
> 
> Add this, yes, and you are done.
> 
> >  			__set_current_state(state);
> >  			raw_spin_unlock_irq(&x->wait.lock);
> >  			timeout = action(timeout);
> >  			raw_spin_lock_irq(&x->wait.lock);
> > +			__finish_swait(&x->wait, &wait);
> >  		} while (!x->done && timeout);
> > -		__finish_swait(&x->wait, &wait);
> >  		if (!x->done)
> >  			return timeout;
> >  	}
> 
> Sebastian
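
As a rough illustration of why moving the enqueue/dequeue into the loop
helps, the sketch below replays the same T0/T1/T2 interleaving with the
waiter re-queued on every iteration.  It is a single-threaded user-space
model, not the kernel implementation: prepare_to_wait() and finish_wait()
are hypothetical stand-ins for __prepare_to_swait() and __finish_swait(),
the struct types are toys, and locking and timeouts are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 4

struct task {
	bool asleep;
};

/* Toy completion: a done counter plus a FIFO of waiting tasks. */
struct completion {
	unsigned int done;
	struct task *queue[MAX_WAITERS];
	int nr_queued;
};

/* Stand-in for __prepare_to_swait(): queue the task if it is not queued. */
static void prepare_to_wait(struct completion *x, struct task *t)
{
	for (int i = 0; i < x->nr_queued; i++)
		if (x->queue[i] == t)
			return;
	assert(x->nr_queued < MAX_WAITERS);
	x->queue[x->nr_queued++] = t;
}

/* Stand-in for __finish_swait(): remove the task if it is still queued. */
static void finish_wait(struct completion *x, struct task *t)
{
	for (int i = 0; i < x->nr_queued; i++) {
		if (x->queue[i] != t)
			continue;
		for (int j = i + 1; j < x->nr_queued; j++)
			x->queue[j - 1] = x->queue[j];
		x->nr_queued--;
		return;
	}
}

/* done++ followed by a swait-style wake that dequeues the woken task. */
static struct task *complete(struct completion *x)
{
	struct task *t = NULL;

	x->done++;
	if (x->nr_queued > 0) {
		t = x->queue[0];
		finish_wait(x, t);
		t->asleep = false;
	}
	return t;
}

/* Returns true if the second complete() reaches T0 and T0 finishes. */
static bool simulate_fixed_loop(void)
{
	struct completion x = { .done = 0, .nr_queued = 0 };
	struct task t0 = { .asleep = false };
	struct task *woken;

	/* T0, iteration 1: enqueue inside the loop, then sleep. */
	prepare_to_wait(&x, &t0);
	t0.asleep = true;

	/* T1: complete() wakes T0; T2 then consumes the count. */
	complete(&x);
	x.done--;

	/* T0: finish_wait() is a no-op here, done == 0, so loop again. */
	finish_wait(&x, &t0);
	assert(x.done == 0);

	/* T0, iteration 2: re-enqueue before sleeping. */
	prepare_to_wait(&x, &t0);
	t0.asleep = true;

	/* T1: the second complete() now finds T0 on the queue. */
	woken = complete(&x);

	/* T0: leaves the loop and consumes the count. */
	finish_wait(&x, &t0);
	if (x.done)
		x.done--;

	return woken == &t0 && !t0.asleep && x.done == 0 && x.nr_queued == 0;
}
```

In this model simulate_fixed_loop() returns true: because T0 puts itself
back on the queue at the top of each iteration, the second complete() has a
waiter to wake and the count is consumed.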

