* wait_for_completion_timeout() spurious failure under heavy load?
@ 2008-06-19 22:04 Roland Dreier
2008-06-20 6:40 ` Jiri Slaby
0 siblings, 1 reply; 8+ messages in thread
From: Roland Dreier @ 2008-06-19 22:04 UTC (permalink / raw)
To: linux-kernel, mingo; +Cc: Eli Cohen, general
It seems that the current implementaton of wait_for_completion_timeout()
has a small problem under very high load for the common pattern:
if (!wait_for_completion_timeout(&done, timeout))
/* handle failure */
because the implementation very roughly does (lots of code deleted to
show the basic flow):
static inline long __sched
do_wait_for_common(struct completion *x, long timeout, int state)
{
if (x->done)
return timeout;
do {
timeout = schedule_timeout(timeout);
if (!timeout)
return timeout;
} while (!x->done);
return timeout;
}
so if the system is very busy and x->done is not set when
do_wait_for_common() is entered, it is possible that the first call to
schedule_timeout() returns 0 because the task doing wait_for_completion
doesn't get rescheduled for a long time, even if it is woken up early
enough. In this case, wait_for_completion_timeout() returns 0 without
even checking x->done again, and the code above falls into its failure
case purely for scheduler reasons, even if the hardware event or
whatever was being waited for happened early enough.
So would it make sense to add an extra test to do_wait_for() in the
timeout case and, say, return 1 if x->done is actually set? Something
like the patch below?
A quick audit (not exhaustive) of wait_for_completion_timeout() callers
seems to indicate that no one actually cares about the return value in
the success case -- they just test for 0 (timed out) versus non-zero
(wait succeeded).
Thanks,
Roland
diff --git a/kernel/sched.c b/kernel/sched.c
index eaf6751..3d04ec1 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4405,7 +4405,12 @@ do_wait_for_common(struct completion *x, long timeout, int state)
spin_lock_irq(&x->wait.lock);
if (!timeout) {
__remove_wait_queue(&x->wait, &wait);
- return timeout;
+ if (x->done) {
+ x->done--;
+ return 1;
+ } else {
+ return 0;
+ }
}
} while (!x->done);
__remove_wait_queue(&x->wait, &wait);
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: wait_for_completion_timeout() spurious failure under heavy load? 2008-06-19 22:04 wait_for_completion_timeout() spurious failure under heavy load? Roland Dreier @ 2008-06-20 6:40 ` Jiri Slaby 2008-06-20 11:20 ` Ingo Molnar 0 siblings, 1 reply; 8+ messages in thread From: Jiri Slaby @ 2008-06-20 6:40 UTC (permalink / raw) To: Roland Dreier; +Cc: linux-kernel, mingo, Eli Cohen, general Roland Dreier napsal(a): > It seems that the current implementaton of wait_for_completion_timeout() > has a small problem under very high load for the common pattern: > > if (!wait_for_completion_timeout(&done, timeout)) > /* handle failure */ > > because the implementation very roughly does (lots of code deleted to > show the basic flow): > > static inline long __sched > do_wait_for_common(struct completion *x, long timeout, int state) > { > if (x->done) > return timeout; > > do { > timeout = schedule_timeout(timeout); > > if (!timeout) > return timeout; > > } while (!x->done); > > return timeout; > } > > so if the system is very busy and x->done is not set when > do_wait_for_common() is entered, it is possible that the first call to > schedule_timeout() returns 0 because the task doing wait_for_completion Sorry, but how can schedule_timeout return 0 before the timeout expiration? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: wait_for_completion_timeout() spurious failure under heavy load? 2008-06-20 6:40 ` Jiri Slaby @ 2008-06-20 11:20 ` Ingo Molnar 2008-06-20 11:30 ` Peter Zijlstra ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: Ingo Molnar @ 2008-06-20 11:20 UTC (permalink / raw) To: Jiri Slaby Cc: Roland Dreier, linux-kernel, Eli Cohen, general, Oleg Nesterov, Peter Zijlstra * Jiri Slaby <jirislaby@gmail.com> wrote: >> do { >> timeout = schedule_timeout(timeout); >> >> if (!timeout) >> return timeout; >> >> } while (!x->done); >> >> return timeout; >> } >> >> so if the system is very busy and x->done is not set when >> do_wait_for_common() is entered, it is possible that the first call to >> schedule_timeout() returns 0 because the task doing wait_for_completion > > Sorry, but how can schedule_timeout return 0 before the timeout > expiration? the point would be that due to high load, the completion wakeup happens first, but then due to scheduling delays the timeout also occurs (later), before the wakeup related to x->done has managed to do its task. I.e. due to scheduling delays we report a spurious "timeout" failure, despite the completion occuring before the timeout. The timeout is really intended to be related to the delay of the completion event, not the delay of scheduling after that event happened. seems like a well-spotted race to me, i agree it's more robust to ignore the timeout if we can make progress on x->done, and return a 1 jiffy 'timeout remaining' value. Oleg, do you concur? but i'd do it not like this: > if (!timeout) { > __remove_wait_queue(&x->wait, &wait); > - return timeout; > + if (x->done) { > + x->done--; > + return 1; > + } else { > + return 0; > + } but like in the commit below. Agreed? Ingo --------------------> commit bb10ed0994927d433f6dbdf274fdb26cfcf516b7 Author: Roland Dreier <rdreier@cisco.com> Date: Thu Jun 19 15:04:07 2008 -0700 sched: fix wait_for_completion_timeout() spurious failure under heavy load It seems that the current implementaton of wait_for_completion_timeout() has a small problem under very high load for the common pattern: if (!wait_for_completion_timeout(&done, timeout)) /* handle failure */ because the implementation very roughly does (lots of code deleted to show the basic flow): static inline long __sched do_wait_for_common(struct completion *x, long timeout, int state) { if (x->done) return timeout; do { timeout = schedule_timeout(timeout); if (!timeout) return timeout; } while (!x->done); return timeout; } so if the system is very busy and x->done is not set when do_wait_for_common() is entered, it is possible that the first call to schedule_timeout() returns 0 because the task doing wait_for_completion doesn't get rescheduled for a long time, even if it is woken up early enough. In this case, wait_for_completion_timeout() returns 0 without even checking x->done again, and the code above falls into its failure case purely for scheduler reasons, even if the hardware event or whatever was being waited for happened early enough. It would make sense to add an extra test to do_wait_for() in the timeout case and return 1 if x->done is actually set. A quick audit (not exhaustive) of wait_for_completion_timeout() callers seems to indicate that no one actually cares about the return value in the success case -- they just test for 0 (timed out) versus non-zero (wait succeeded). Signed-off-by: Ingo Molnar <mingo@elte.hu> diff --git a/kernel/sched.c b/kernel/sched.c index 4a3cb06..577f160 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -4405,6 +4405,16 @@ do_wait_for_common(struct completion *x, long timeout, int state) spin_unlock_irq(&x->wait.lock); timeout = schedule_timeout(timeout); spin_lock_irq(&x->wait.lock); + + /* + * If the completion has arrived meanwhile + * then return 1 jiffy time left: + */ + if (x->done && !timeout) { + timeout = 1; + break; + } + if (!timeout) { __remove_wait_queue(&x->wait, &wait); return timeout; ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: wait_for_completion_timeout() spurious failure under heavy load? 2008-06-20 11:20 ` Ingo Molnar @ 2008-06-20 11:30 ` Peter Zijlstra 2008-06-20 14:14 ` Oleg Nesterov 2008-06-20 14:33 ` Roland Dreier 2 siblings, 0 replies; 8+ messages in thread From: Peter Zijlstra @ 2008-06-20 11:30 UTC (permalink / raw) To: Ingo Molnar Cc: Jiri Slaby, Roland Dreier, linux-kernel, Eli Cohen, general, Oleg Nesterov On Fri, 2008-06-20 at 13:20 +0200, Ingo Molnar wrote: > * Jiri Slaby <jirislaby@gmail.com> wrote: > > >> do { > >> timeout = schedule_timeout(timeout); > >> > >> if (!timeout) > >> return timeout; > >> > >> } while (!x->done); > >> > >> return timeout; > >> } > >> > >> so if the system is very busy and x->done is not set when > >> do_wait_for_common() is entered, it is possible that the first call to > >> schedule_timeout() returns 0 because the task doing wait_for_completion > > > > Sorry, but how can schedule_timeout return 0 before the timeout > > expiration? > > the point would be that due to high load, the completion wakeup happens > first, but then due to scheduling delays the timeout also occurs > (later), before the wakeup related to x->done has managed to do its > task. > > I.e. due to scheduling delays we report a spurious "timeout" failure, > despite the completion occuring before the timeout. The timeout is > really intended to be related to the delay of the completion event, not > the delay of scheduling after that event happened. > > seems like a well-spotted race to me, i agree it's more robust to ignore > the timeout if we can make progress on x->done, and return a 1 jiffy > 'timeout remaining' value. Oleg, do you concur? > > but i'd do it not like this: > > > if (!timeout) { > > __remove_wait_queue(&x->wait, &wait); > > - return timeout; > > + if (x->done) { > > + x->done--; > > + return 1; > > + } else { > > + return 0; > > + } > > but like in the commit below. Agreed? > > Ingo > > --------------------> > commit bb10ed0994927d433f6dbdf274fdb26cfcf516b7 > Author: Roland Dreier <rdreier@cisco.com> > Date: Thu Jun 19 15:04:07 2008 -0700 > > sched: fix wait_for_completion_timeout() spurious failure under heavy load > > It seems that the current implementaton of wait_for_completion_timeout() > has a small problem under very high load for the common pattern: > > if (!wait_for_completion_timeout(&done, timeout)) > /* handle failure */ > > because the implementation very roughly does (lots of code deleted to > show the basic flow): > > static inline long __sched > do_wait_for_common(struct completion *x, long timeout, int state) > { > if (x->done) > return timeout; > > do { > timeout = schedule_timeout(timeout); > > if (!timeout) > return timeout; > > } while (!x->done); > > return timeout; > } > > so if the system is very busy and x->done is not set when > do_wait_for_common() is entered, it is possible that the first call to > schedule_timeout() returns 0 because the task doing wait_for_completion > doesn't get rescheduled for a long time, even if it is woken up early > enough. > > In this case, wait_for_completion_timeout() returns 0 without even > checking x->done again, and the code above falls into its failure case > purely for scheduler reasons, even if the hardware event or whatever was > being waited for happened early enough. > > It would make sense to add an extra test to do_wait_for() in the timeout > case and return 1 if x->done is actually set. > > A quick audit (not exhaustive) of wait_for_completion_timeout() callers > seems to indicate that no one actually cares about the return value in > the success case -- they just test for 0 (timed out) versus non-zero > (wait succeeded). > > Signed-off-by: Ingo Molnar <mingo@elte.hu> Good catch, Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > diff --git a/kernel/sched.c b/kernel/sched.c > index 4a3cb06..577f160 100644 > --- a/kernel/sched.c > +++ b/kernel/sched.c > @@ -4405,6 +4405,16 @@ do_wait_for_common(struct completion *x, long timeout, int state) > spin_unlock_irq(&x->wait.lock); > timeout = schedule_timeout(timeout); > spin_lock_irq(&x->wait.lock); > + > + /* > + * If the completion has arrived meanwhile > + * then return 1 jiffy time left: > + */ > + if (x->done && !timeout) { > + timeout = 1; > + break; > + } > + > if (!timeout) { > __remove_wait_queue(&x->wait, &wait); > return timeout; ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: wait_for_completion_timeout() spurious failure under heavy load? 2008-06-20 11:20 ` Ingo Molnar 2008-06-20 11:30 ` Peter Zijlstra @ 2008-06-20 14:14 ` Oleg Nesterov 2008-06-20 14:32 ` Oleg Nesterov 2008-06-20 14:33 ` Roland Dreier 2 siblings, 1 reply; 8+ messages in thread From: Oleg Nesterov @ 2008-06-20 14:14 UTC (permalink / raw) To: Ingo Molnar Cc: Jiri Slaby, Roland Dreier, linux-kernel, Eli Cohen, general, Peter Zijlstra On 06/20, Ingo Molnar wrote: > > --- a/kernel/sched.c > +++ b/kernel/sched.c > @@ -4405,6 +4405,16 @@ do_wait_for_common(struct completion *x, long timeout, int state) > spin_unlock_irq(&x->wait.lock); > timeout = schedule_timeout(timeout); > spin_lock_irq(&x->wait.lock); > + > + /* > + * If the completion has arrived meanwhile > + * then return 1 jiffy time left: > + */ > + if (x->done && !timeout) { > + timeout = 1; > + break; > + } > + > if (!timeout) { > __remove_wait_queue(&x->wait, &wait); > return timeout; This is the real nitpick, but I wonder what is the right behaviour of wait_for_completion_timeout(x, 0) when x->done != 0. Perhaps we can return 1 in that case too, just for the consistency? IOW, how about the patch below? this also makes the code a bit simpler because we factor out __remove_wait_queue(). Oleg. --- kernel/sched.c +++ kernel/sched.c @@ -4746,15 +4746,13 @@ do_wait_for_common(struct completion *x, spin_unlock_irq(&x->wait.lock); timeout = schedule_timeout(timeout); spin_lock_irq(&x->wait.lock); - if (!timeout) { - __remove_wait_queue(&x->wait, &wait); - return timeout; - } - } while (!x->done); + } while (!x->done && timeout); __remove_wait_queue(&x->wait, &wait); + if (!x->done) + return 0; } x->done--; - return timeout; + return timeout ?: 1; } static long __sched ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: wait_for_completion_timeout() spurious failure under heavy load? 2008-06-20 14:14 ` Oleg Nesterov @ 2008-06-20 14:32 ` Oleg Nesterov 2008-06-20 15:21 ` Ingo Molnar 0 siblings, 1 reply; 8+ messages in thread From: Oleg Nesterov @ 2008-06-20 14:32 UTC (permalink / raw) To: Ingo Molnar Cc: Jiri Slaby, Roland Dreier, linux-kernel, Eli Cohen, general, Peter Zijlstra On 06/20, Oleg Nesterov wrote: > > On 06/20, Ingo Molnar wrote: > > > > --- a/kernel/sched.c > > +++ b/kernel/sched.c > > @@ -4405,6 +4405,16 @@ do_wait_for_common(struct completion *x, long timeout, int state) > > spin_unlock_irq(&x->wait.lock); > > timeout = schedule_timeout(timeout); > > spin_lock_irq(&x->wait.lock); > > + > > + /* > > + * If the completion has arrived meanwhile > > + * then return 1 jiffy time left: > > + */ > > + if (x->done && !timeout) { > > + timeout = 1; > > + break; > > + } > > + > > if (!timeout) { > > __remove_wait_queue(&x->wait, &wait); > > return timeout; > > This is the real nitpick, but I wonder what is the right behaviour > of wait_for_completion_timeout(x, 0) when x->done != 0. Perhaps we > can return 1 in that case too, just for the consistency? > > IOW, how about the patch below? this also makes the code a bit > simpler because we factor out __remove_wait_queue(). Even better, we can kill the first __remove_wait_queue() as well. Oleg. --- kernel/sched.c +++ kernel/sched.c @@ -4739,22 +4739,20 @@ do_wait_for_common(struct completion *x, signal_pending(current)) || (state == TASK_KILLABLE && fatal_signal_pending(current))) { - __remove_wait_queue(&x->wait, &wait); - return -ERESTARTSYS; + timeout = -ERESTARTSYS; + break; } __set_current_state(state); spin_unlock_irq(&x->wait.lock); timeout = schedule_timeout(timeout); spin_lock_irq(&x->wait.lock); - if (!timeout) { - __remove_wait_queue(&x->wait, &wait); - return timeout; - } - } while (!x->done); + } while (!x->done && timeout); __remove_wait_queue(&x->wait, &wait); + if (!x->done) + return timeout; } x->done--; - return timeout; + return timeout ?: 1; } static long __sched ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: wait_for_completion_timeout() spurious failure under heavy load? 2008-06-20 14:32 ` Oleg Nesterov @ 2008-06-20 15:21 ` Ingo Molnar 0 siblings, 0 replies; 8+ messages in thread From: Ingo Molnar @ 2008-06-20 15:21 UTC (permalink / raw) To: Oleg Nesterov Cc: Jiri Slaby, Roland Dreier, linux-kernel, Eli Cohen, general, Peter Zijlstra * Oleg Nesterov <oleg@tv-sign.ru> wrote: > > IOW, how about the patch below? this also makes the code a bit > > simpler because we factor out __remove_wait_queue(). > > Even better, we can kill the first __remove_wait_queue() as well. nice, thanks - applied it in the form below to tip/sched/urgent. Ingo ------------------> commit 6b8464474776dccf619283ee5510b0b795382dfb Author: Oleg Nesterov <oleg@tv-sign.ru> Date: Fri Jun 20 18:32:20 2008 +0400 sched: refactor wait_for_completion_timeout() Simplify the code and fix the boundary condition of wait_for_completion_timeout(,0). We can kill the first __remove_wait_queue() as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> diff --git a/kernel/sched.c b/kernel/sched.c index 577f160..bebf978 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -4398,32 +4398,20 @@ do_wait_for_common(struct completion *x, long timeout, int state) signal_pending(current)) || (state == TASK_KILLABLE && fatal_signal_pending(current))) { - __remove_wait_queue(&x->wait, &wait); - return -ERESTARTSYS; + timeout = -ERESTARTSYS; + break; } __set_current_state(state); spin_unlock_irq(&x->wait.lock); timeout = schedule_timeout(timeout); spin_lock_irq(&x->wait.lock); - - /* - * If the completion has arrived meanwhile - * then return 1 jiffy time left: - */ - if (x->done && !timeout) { - timeout = 1; - break; - } - - if (!timeout) { - __remove_wait_queue(&x->wait, &wait); - return timeout; - } - } while (!x->done); + } while (!x->done && timeout); __remove_wait_queue(&x->wait, &wait); + if (!x->done) + return timeout; } x->done--; - return timeout; + return timeout ?: 1; } static long __sched ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: wait_for_completion_timeout() spurious failure under heavy load? 2008-06-20 11:20 ` Ingo Molnar 2008-06-20 11:30 ` Peter Zijlstra 2008-06-20 14:14 ` Oleg Nesterov @ 2008-06-20 14:33 ` Roland Dreier 2 siblings, 0 replies; 8+ messages in thread From: Roland Dreier @ 2008-06-20 14:33 UTC (permalink / raw) To: Ingo Molnar Cc: Jiri Slaby, linux-kernel, Eli Cohen, general, Oleg Nesterov, Peter Zijlstra > but like in the commit below. Agreed? > + > + /* > + * If the completion has arrived meanwhile > + * then return 1 jiffy time left: > + */ > + if (x->done && !timeout) { > + timeout = 1; > + break; > + } > + Sure, that looks nice to me. - R. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2008-06-20 15:21 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-06-19 22:04 wait_for_completion_timeout() spurious failure under heavy load? Roland Dreier 2008-06-20 6:40 ` Jiri Slaby 2008-06-20 11:20 ` Ingo Molnar 2008-06-20 11:30 ` Peter Zijlstra 2008-06-20 14:14 ` Oleg Nesterov 2008-06-20 14:32 ` Oleg Nesterov 2008-06-20 15:21 ` Ingo Molnar 2008-06-20 14:33 ` Roland Dreier
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox