From: Corey Minyard <minyard@acm.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users@vger.kernel.org,
Corey Minyard <cminyard@mvista.com>,
Peter Zijlstra <peterz@infradead.org>,
linux-kernel@vger.kernel.org, tglx@linutronix.de,
Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [PATCH RT v2] Fix a lockup in wait_for_completion() and friends
Date: Thu, 9 May 2019 12:46:05 -0500
Message-ID: <20190509174605.GI16145@minyard.net>
In-Reply-To: <20190509161925.kul66w54wpjcinuc@linutronix.de>
On Thu, May 09, 2019 at 06:19:25PM +0200, Sebastian Andrzej Siewior wrote:
> Please:
> - add some RT developers on Cc:
> - add lkml
> - use [PATCH RT] instead of just [PATCH] so it is visible that you target
>   the RT tree.
Will do. I'll add your diagram below, too.
>
> On 2019-05-08 15:57:28 [-0500], minyard@acm.org wrote:
> > From: Corey Minyard <cminyard@mvista.com>
> >
> > The function call do_wait_for_common() has a race condition that
> > can result in lockups waiting for completions. Adding the thread
> > to (and removing the thread from) the wait queue for the completion
> > is done outside the do loop in that function. However, if the thread
> > is woken up, the swake_up_locked() function will delete the entry
> > from the wait queue. If that happens and another thread sneaks
> > in and decrements the done count in the completion to zero, the
> > loop will go around again, but the thread will no longer be in the
> > wait queue, so there is no way to wake it up.
> >
> > Fix it by adding/removing the thread to/from the wait queue inside
> > the do loop.
>
> So you are saying:
>         T0                         T1                            T2
>  wait_for_completion()
>   do_wait_for_common()
>    __prepare_to_swait()
>     schedule()
>                            complete()
>                             x->done++ (0 -> 1)
>                             raw_spin_lock_irqsave()
>                             swake_up_locked()           wait_for_completion()
>                              wake_up_process(T0)
>                              list_del_init()
>                             raw_spin_unlock_irqrestore()
>                                                         raw_spin_lock_irq(&x->wait.lock)
>  raw_spin_lock_irq(&x->wait.lock)                       x->done != UINT_MAX, 1 -> 0
>                                                         return 1
>                                                         raw_spin_unlock_irq(&x->wait.lock)
>  while (!x->done && timeout),
>  continue loop, not enqueued
>  on &x->wait
>
> The difference compared to the non-swait based implementation is that
> swake_up_locked() removes woken up tasks from the list while the other
> implementation (wait_queue_entry based, default_wake_function()) does
> not. Buh
Yes, exactly. I was wondering if swait could be changed to not remove
the waiter, but that seemed like a bad idea. It is an unusual semantic,
though.

I thought some more about this, wondering why everything isn't keeling
over because of this. I'm guessing that just about everything using
completions has a single waiter, so it doesn't matter. I just wrote
some code that has a bunch of waiters, so I hit it.

-corey
>
> One question for the upstream completion implementation:
> completion_done() returns true if there are no waiters. It acquires the
> wait.lock to ensure that complete()/complete_all() is done. However,
> once complete() releases the lock it is guaranteed that the wake_up() (for
> the waiter) occurred. The waiter task still needs to remove itself
> from the wait-queue before the completion can be removed.
> Am I missing something?
>
> > Fixes: a04ff6b4ec4ee7e ("completion: Use simple wait queues")
> > Signed-off-by: Corey Minyard <cminyard@mvista.com>
> > ---
> > I sent the wrong version of this, I had spotted this before but didn't
> > fix it here. Adding the thread to the wait queue needs to come after
> > the signal check. Sorry about the noise.
> >
> > kernel/sched/completion.c | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
> > index 755a58084978..4f9b4cc0c95a 100644
> > --- a/kernel/sched/completion.c
> > +++ b/kernel/sched/completion.c
> > @@ -70,20 +70,20 @@ do_wait_for_common(struct completion *x,
> >  		   long (*action)(long), long timeout, int state)
> >  {
> >  	if (!x->done) {
> > -		DECLARE_SWAITQUEUE(wait);
> > -
> > -		__prepare_to_swait(&x->wait, &wait);
>
> you can keep DECLARE_SWAITQUEUE, just remove __prepare_to_swait()
>
> >  		do {
> > +			DECLARE_SWAITQUEUE(wait);
> > +
> >  			if (signal_pending_state(state, current)) {
> >  				timeout = -ERESTARTSYS;
> >  				break;
> >  			}
> > +			__prepare_to_swait(&x->wait, &wait);
>
> add this, yes, and you are done.
>
> >  			__set_current_state(state);
> >  			raw_spin_unlock_irq(&x->wait.lock);
> >  			timeout = action(timeout);
> >  			raw_spin_lock_irq(&x->wait.lock);
> > +			__finish_swait(&x->wait, &wait);
> >  		} while (!x->done && timeout);
> > -		__finish_swait(&x->wait, &wait);
> >  		if (!x->done)
> >  			return timeout;
> >  	}
>
> Sebastian
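
For anyone who wants to poke at the interleaving without an RT kernel,
here is a rough single-threaded userspace model of it. This is only a
sketch under assumptions: every name in it (model_completion,
model_prepare(), t0_wakeable_after_race() and so on) is made up for
illustration and is not a kernel API, and it ignores locking, timeouts
and signals entirely. It just replays the T0/T1/T2 ordering from the
diagram and checks whether a later complete() can still wake T0, with
and without re-enqueueing inside the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 4

/* Hypothetical stand-in for a task sleeping on a completion. */
struct model_waiter {
	bool queued;	/* currently linked on the wait list? */
	bool running;	/* has been woken and is runnable? */
};

/* Hypothetical stand-in for struct completion with an swait-like list. */
struct model_completion {
	unsigned int done;
	struct model_waiter *list[MAX_WAITERS];
	size_t nr;
};

/* Models __prepare_to_swait(): enqueue only if not already queued. */
static void model_prepare(struct model_completion *x, struct model_waiter *w)
{
	if (!w->queued && x->nr < MAX_WAITERS) {
		x->list[x->nr++] = w;
		w->queued = true;
	}
}

/* Models swake_up_locked(): wake the first waiter AND dequeue it. */
static void model_swake_up_locked(struct model_completion *x)
{
	struct model_waiter *w;
	size_t i;

	if (x->nr == 0)
		return;
	w = x->list[0];
	for (i = 1; i < x->nr; i++)
		x->list[i - 1] = x->list[i];
	x->nr--;
	w->queued = false;
	w->running = true;
}

/* Models complete(): bump done, then wake one waiter. */
static void model_complete(struct model_completion *x)
{
	x->done++;
	model_swake_up_locked(x);
}

/*
 * Replays the T0/T1/T2 interleaving. Returns true if a second
 * complete() is able to wake T0 after T2 has stolen the first one.
 */
static bool t0_wakeable_after_race(bool requeue_in_loop)
{
	struct model_completion x = { 0 };
	struct model_waiter t0 = { 0 };

	/* T0: sees done == 0, enqueues itself once, goes to sleep. */
	model_prepare(&x, &t0);
	t0.running = false;

	/* T1: complete() takes done 0 -> 1, wakes and dequeues T0. */
	model_complete(&x);

	/* T2: wait_for_completion() finds done == 1 and consumes it. */
	x.done--;

	/* T0: runs, sees done == 0 again, so the loop goes around. */
	if (requeue_in_loop)
		model_prepare(&x, &t0);	/* patched loop re-enqueues */
	t0.running = false;		/* T0 sleeps again */

	/* Someone completes again; can T0 be found on the list? */
	model_complete(&x);
	return t0.running;
}
```

Calling t0_wakeable_after_race(false) models the enqueue-once behaviour
and leaves T0 asleep; t0_wakeable_after_race(true) models the
re-enqueue inside the loop and T0 gets woken. A model where the wake
function leaves the entry on the list (as the wait_queue_entry based
code does) would not need the re-enqueue at all.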