* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout()
@ 2013-06-04 19:28 Oleg Nesterov
2013-06-04 21:35 ` Imre Deak
0 siblings, 1 reply; 23+ messages in thread
From: Oleg Nesterov @ 2013-06-04 19:28 UTC (permalink / raw)
To: Andrew Morton, Daniel Vetter, Dave Jones, David Howells,
Imre Deak, Jens Axboe, Linus Torvalds, Lukas Czerner,
Paul E. McKenney
Cc: linux-kernel
Hello,
Just noticed this commit...
commit 4c663cfc523a88d97a8309b04a089c27dc57fd7e
Author: Imre Deak <imre.deak@intel.com>
Date: Fri May 24 15:55:09 2013 -0700
Many callers of the wait_event_timeout() and
wait_event_interruptible_timeout() expect that the return value will be
positive if the specified condition becomes true before the timeout
elapses. However, at the moment this isn't guaranteed. If the wake-up
handler is delayed enough, the time remaining until timeout will be
calculated as 0 - and passed back as a return value - even if the
condition became true before the timeout has passed.
OK, agreed.
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -217,6 +217,8 @@ do { \
if (!ret) \
break; \
} \
+ if (!ret && (condition)) \
+ ret = 1; \
finish_wait(&wq, &__wait); \
} while (0)
Well, this evaluates "condition" twice, perhaps it would be more
clean to do, say,
#define __wait_event_timeout(wq, condition, ret) \
do { \
DEFINE_WAIT(__wait); \
\
for (;;) { \
prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \
if (condition) { \
if (!ret) \
ret = 1; \
break; \
} else if (!ret) \
break; \
ret = schedule_timeout(ret); \
} \
finish_wait(&wq, &__wait); \
} while (0)
but this is minor.
@@ -233,8 +235,9 @@ do { \
* wake_up() has to be called after changing any variable that could
* change the result of the wait condition.
*
- * The function returns 0 if the @timeout elapsed, and the remaining
- * jiffies if the condition evaluated to true before the timeout elapsed.
+ * The function returns 0 if the @timeout elapsed, or the remaining
+ * jiffies (at least 1) if the @condition evaluated to %true before
+ * the @timeout elapsed.
This is still not true if timeout == 0.
Shouldn't we also change wait_event_timeout() ? Say,
#define wait_event_timeout(wq, condition, timeout) \
({ \
long __ret = timeout; \
if (!(condition)) \
__wait_event_timeout(wq, condition, __ret); \
else if (!__ret) \
__ret = 1; \
__ret; \
})
Or wait_event_timeout(timeout => 0) is not legal in a non-void context?
To me the code like
long wait_for_something(bool nonblock)
{
timeout = nonblock ? 0 : DEFAULT_TIMEOUT;
return wait_event_timeout(..., timeout);
}
looks reasonable and correct. But it is not?
Oleg.
^ permalink raw reply [flat|nested] 23+ messages in thread* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-06-04 19:28 [PATCH] wait: fix false timeouts when using wait_event_timeout() Oleg Nesterov @ 2013-06-04 21:35 ` Imre Deak 2013-06-04 21:40 ` Imre Deak 0 siblings, 1 reply; 23+ messages in thread From: Imre Deak @ 2013-06-04 21:35 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Daniel Vetter, Dave Jones, David Howells, Jens Axboe, Linus Torvalds, Lukas Czerner, Paul E. McKenney, linux-kernel On Tue, 2013-06-04 at 21:28 +0200, Oleg Nesterov wrote: > Hello, > > Just noticed this commit... > > commit 4c663cfc523a88d97a8309b04a089c27dc57fd7e > Author: Imre Deak <imre.deak@intel.com> > Date: Fri May 24 15:55:09 2013 -0700 > > Many callers of the wait_event_timeout() and > wait_event_interruptible_timeout() expect that the return value will be > positive if the specified condition becomes true before the timeout > elapses. However, at the moment this isn't guaranteed. If the wake-up > handler is delayed enough, the time remaining until timeout will be > calculated as 0 - and passed back as a return value - even if the > condition became true before the timeout has passed. > > OK, agreed. > > --- a/include/linux/wait.h > +++ b/include/linux/wait.h > @@ -217,6 +217,8 @@ do { \ > if (!ret) \ > break; \ > } \ > + if (!ret && (condition)) \ > + ret = 1; \ > finish_wait(&wq, &__wait); \ > } while (0) > > Well, this evaluates "condition" twice, perhaps it would be more > clean to do, say, > > #define __wait_event_timeout(wq, condition, ret) \ > do { \ > DEFINE_WAIT(__wait); \ > \ > for (;;) { \ > prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \ > if (condition) { \ > if (!ret) \ > ret = 1; \ > break; \ > } else if (!ret) \ > break; \ > ret = schedule_timeout(ret); \ > } \ > finish_wait(&wq, &__wait); \ > } while (0) > > but this is minor. > > @@ -233,8 +235,9 @@ do { \ > * wake_up() has to be called after changing any variable that could > * change the result of the wait condition. > * > - * The function returns 0 if the @timeout elapsed, and the remaining > - * jiffies if the condition evaluated to true before the timeout elapsed. > + * The function returns 0 if the @timeout elapsed, or the remaining > + * jiffies (at least 1) if the @condition evaluated to %true before > + * the @timeout elapsed. > > This is still not true if timeout == 0. > > Shouldn't we also change wait_event_timeout() ? Say, > > #define wait_event_timeout(wq, condition, timeout) \ > ({ \ > long __ret = timeout; \ > if (!(condition)) \ > __wait_event_timeout(wq, condition, __ret); \ > else if (!__ret) \ > __ret = 1; \ > __ret; \ > }) > > Or wait_event_timeout(timeout => 0) is not legal in a non-void context? > > To me the code like > > long wait_for_something(bool nonblock) > { > timeout = nonblock ? 0 : DEFAULT_TIMEOUT; > return wait_event_timeout(..., timeout); > } > > looks reasonable and correct. But it is not? I don't see why it would be not legal. Note though that in the above form wait_event_timeout(cond, 0) would still schedule() if cond is false, which is not what I'd expect from a non-blocking function. --Imre ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-06-04 21:35 ` Imre Deak @ 2013-06-04 21:40 ` Imre Deak 2013-06-05 16:37 ` Oleg Nesterov 0 siblings, 1 reply; 23+ messages in thread From: Imre Deak @ 2013-06-04 21:40 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, Daniel Vetter, Dave Jones, David Howells, Jens Axboe, Linus Torvalds, Lukas Czerner, Paul E. McKenney, linux-kernel On Wed, 2013-06-05 at 00:35 +0300, Imre Deak wrote: > On Tue, 2013-06-04 at 21:28 +0200, Oleg Nesterov wrote: > > Hello, > > > > Just noticed this commit... > > > > commit 4c663cfc523a88d97a8309b04a089c27dc57fd7e > > Author: Imre Deak <imre.deak@intel.com> > > Date: Fri May 24 15:55:09 2013 -0700 > > > > Many callers of the wait_event_timeout() and > > wait_event_interruptible_timeout() expect that the return value will be > > positive if the specified condition becomes true before the timeout > > elapses. However, at the moment this isn't guaranteed. If the wake-up > > handler is delayed enough, the time remaining until timeout will be > > calculated as 0 - and passed back as a return value - even if the > > condition became true before the timeout has passed. > > > > OK, agreed. > > > > --- a/include/linux/wait.h > > +++ b/include/linux/wait.h > > @@ -217,6 +217,8 @@ do { \ > > if (!ret) \ > > break; \ > > } \ > > + if (!ret && (condition)) \ > > + ret = 1; \ > > finish_wait(&wq, &__wait); \ > > } while (0) > > > > Well, this evaluates "condition" twice, perhaps it would be more > > clean to do, say, > > > > #define __wait_event_timeout(wq, condition, ret) \ > > do { \ > > DEFINE_WAIT(__wait); \ > > \ > > for (;;) { \ > > prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \ > > if (condition) { \ > > if (!ret) \ > > ret = 1; \ > > break; \ > > } else if (!ret) \ > > break; \ > > ret = schedule_timeout(ret); \ > > } \ > > finish_wait(&wq, &__wait); \ > > } while (0) > > > > but this is minor. > > > > @@ -233,8 +235,9 @@ do { \ > > * wake_up() has to be called after changing any variable that could > > * change the result of the wait condition. > > * > > - * The function returns 0 if the @timeout elapsed, and the remaining > > - * jiffies if the condition evaluated to true before the timeout elapsed. > > + * The function returns 0 if the @timeout elapsed, or the remaining > > + * jiffies (at least 1) if the @condition evaluated to %true before > > + * the @timeout elapsed. > > > > This is still not true if timeout == 0. > > > > Shouldn't we also change wait_event_timeout() ? Say, > > > > #define wait_event_timeout(wq, condition, timeout) \ > > ({ \ > > long __ret = timeout; \ > > if (!(condition)) \ > > __wait_event_timeout(wq, condition, __ret); \ > > else if (!__ret) \ > > __ret = 1; \ > > __ret; \ > > }) > > > > Or wait_event_timeout(timeout => 0) is not legal in a non-void context? > > > > To me the code like > > > > long wait_for_something(bool nonblock) > > { > > timeout = nonblock ? 0 : DEFAULT_TIMEOUT; > > return wait_event_timeout(..., timeout); > > } > > > > looks reasonable and correct. But it is not? > > I don't see why it would be not legal. Note though that in the above > form wait_event_timeout(cond, 0) would still schedule() if cond is > false, which is not what I'd expect from a non-blocking function. Ah sorry, if you also rewrite __wait_event_timeout() then timeout=>0 wouldn't schedule(), so things would work as expected. --Imre ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-06-04 21:40 ` Imre Deak @ 2013-06-05 16:37 ` Oleg Nesterov 2013-06-05 19:07 ` Oleg Nesterov 0 siblings, 1 reply; 23+ messages in thread From: Oleg Nesterov @ 2013-06-05 16:37 UTC (permalink / raw) To: Imre Deak Cc: Andrew Morton, Daniel Vetter, Dave Jones, David Howells, Jens Axboe, Linus Torvalds, Lukas Czerner, Paul E. McKenney, linux-kernel On 06/05, Imre Deak wrote: > > On Wed, 2013-06-05 at 00:35 +0300, Imre Deak wrote: > > On Tue, 2013-06-04 at 21:28 +0200, Oleg Nesterov wrote: > > > > > > Shouldn't we also change wait_event_timeout() ? Say, > > > > > > #define wait_event_timeout(wq, condition, timeout) \ > > > ({ \ > > > long __ret = timeout; \ > > > if (!(condition)) \ > > > __wait_event_timeout(wq, condition, __ret); \ > > > else if (!__ret) \ > > > __ret = 1; \ > > > __ret; \ > > > }) > > > > > > Or wait_event_timeout(timeout => 0) is not legal in a non-void context? > > > > > > To me the code like > > > > > > long wait_for_something(bool nonblock) > > > { > > > timeout = nonblock ? 0 : DEFAULT_TIMEOUT; > > > return wait_event_timeout(..., timeout); > > > } > > > > > > looks reasonable and correct. But it is not? > > > > I don't see why it would be not legal. Note though that in the above > > form wait_event_timeout(cond, 0) would still schedule() if cond is > > false, which is not what I'd expect from a non-blocking function. Yes, if false. But what if it is true? > Ah sorry, if you also rewrite __wait_event_timeout() then timeout=>0 > wouldn't schedule(), so things would work as expected. Can't understand... probably you missed my point. Let me try again. I think that wait_eveint_timeout(wq, COND, 0) should return !!(COND). But it doesn't, for example wait_event_timeout(wq, true, 0) == 0, this doesn'tlook right to me. And, this is off-topic, but wait_eveint_timeout/__wait_eveint_timeout do not match wait_event/__wait_event. IOW, you can't use __wait_eveint_timeout() if you do not need the fast-path check. So. How about #define __wait_event_timeout(wq, condition, timeout) \ ({ \ DEFINE_WAIT(__wait); \ long __ret = 0, __to = timeout; \ \ for (;;) { \ prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \ if (condition) { \ __ret = __to ?: 1; \ break; \ } \ if (!__to) \ break; \ __to = schedule_timeout(__to); \ } \ finish_wait(&wq, &__wait); \ __ret; \ }) #define wait_event_timeout(wq, condition, timeout) \ ({ \ long __ret; \ if (condition) \ __ret = (timeout) ?: 1; \ else \ __ret = __wait_event_timeout(wq, condition, timeout); \ __ret; \ }) ? Othwerwise we should document the fact that the caller should alvays verify timeout != 0 if it checks the result of wait_event_timeout(). ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-06-05 16:37 ` Oleg Nesterov @ 2013-06-05 19:07 ` Oleg Nesterov 2013-06-06 1:45 ` Tejun Heo 0 siblings, 1 reply; 23+ messages in thread From: Oleg Nesterov @ 2013-06-05 19:07 UTC (permalink / raw) To: Imre Deak Cc: Andrew Morton, Daniel Vetter, Dave Jones, David Howells, Jens Axboe, Linus Torvalds, Lukas Czerner, Paul E. McKenney, linux-kernel, Tejun Heo On 06/05, Oleg Nesterov wrote: > > I think that wait_eveint_timeout(wq, COND, 0) should return !!(COND). > But it doesn't, for example wait_event_timeout(wq, true, 0) == 0, this > doesn'tlook right to me. > > And, this is off-topic, but wait_eveint_timeout/__wait_eveint_timeout > do not match wait_event/__wait_event. IOW, you can't use > __wait_eveint_timeout() if you do not need the fast-path check. > > So. How about > > #define __wait_event_timeout(wq, condition, timeout) \ > ({ \ > DEFINE_WAIT(__wait); \ > long __ret = 0, __to = timeout; \ > \ > for (;;) { \ > prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \ > if (condition) { \ > __ret = __to ?: 1; \ > break; \ > } \ > if (!__to) \ > break; \ > __to = schedule_timeout(__to); \ > } \ > finish_wait(&wq, &__wait); \ > __ret; \ > }) > > #define wait_event_timeout(wq, condition, timeout) \ > ({ \ > long __ret; \ > if (condition) \ > __ret = (timeout) ?: 1; \ > else \ > __ret = __wait_event_timeout(wq, condition, timeout); \ > __ret; \ > }) > > ? > > Othwerwise we should document the fact that the caller should alvays verify > timeout != 0 if it checks the result of wait_event_timeout(). And in fact, perhaps we can implement wait_event_common() and avoid the code duplications? #define __wait_no_timeout(timeout) \ (__builtin_constant_p(timeout) && (timeout) == MAX_SCHEDULE_TIMEOUT) /* uglified signal_pending_state() */ #define __wait_signal_pending(state) \ ((state == TASK_INTERRUPTIBLE) ? signal_pending(current) : \ (state == TASK_KILLABLE) ? fatal_signal_pending(current) : \ 0) #define __wait_event_common(wq, condition, state, tout) \ ({ \ DEFINE_WAIT(__wait); \ long __ret = 0, __tout = tout; \ \ for (;;) { \ prepare_to_wait(&wq, &__wait, state); \ if (condition) { \ __ret = __wait_no_timeout(tout) ?: __tout ?: 1; \ break; \ } \ \ if (__wait_signal_pending(state)) { \ __ret = -ERESTARTSYS; \ break; \ } \ \ if (__wait_no_timeout(tout)) \ schedule(); \ else if (__tout) \ __tout = schedule_timeout(__tout); \ else \ break; \ } \ finish_wait(&wq, &__wait); \ __ret; \ }) #define wait_event_common(wq, condition, state, tout) \ ({ \ long __ret; \ if (condition) \ __ret = __wait_no_timeout(tout) ?: (tout) ?: 1; \ else \ __ret = __wait_event_common(wq, condition, state, tout);\ __ret; \ }) Then, for example, wait_event() becomes #define wait_event(wq, condition) \ wait_event_common(wq, condition, \ TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT) \ Hmm. I compiled the kernel with the patch below, $ size vmlinux text data bss dec hex filename - 4978601 2935080 10104832 18018513 112f0d1 vmlinux + 4977769 2930984 10104832 18013585 112dd91 vmlinux looks good... Oleg. diff --git a/include/linux/wait.h b/include/linux/wait.h index 1133695..359fffc 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -173,18 +173,52 @@ wait_queue_head_t *bit_waitqueue(void *, int); #define wake_up_interruptible_sync_poll(x, m) \ __wake_up_sync_key((x), TASK_INTERRUPTIBLE, 1, (void *) (m)) -#define __wait_event(wq, condition) \ -do { \ +#define __wait_no_timeout(timeout) \ + (__builtin_constant_p(timeout) && (timeout) == MAX_SCHEDULE_TIMEOUT) + +/* uglified signal_pending_state() */ +#define __wait_signal_pending(state) \ + ((state == TASK_INTERRUPTIBLE) ? signal_pending(current) : \ + (state == TASK_KILLABLE) ? fatal_signal_pending(current) : \ + 0) + +#define __wait_event_common(wq, condition, state, tout) \ +({ \ DEFINE_WAIT(__wait); \ + long __ret = 0, __tout = tout; \ \ for (;;) { \ - prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \ - if (condition) \ + prepare_to_wait(&wq, &__wait, state); \ + if (condition) { \ + __ret = __wait_no_timeout(tout) ?: __tout ?: 1; \ + break; \ + } \ + \ + if (__wait_signal_pending(state)) { \ + __ret = -ERESTARTSYS; \ + break; \ + } \ + \ + if (__wait_no_timeout(tout)) \ + schedule(); \ + else if (__tout) \ + __tout = schedule_timeout(__tout); \ + else \ break; \ - schedule(); \ } \ finish_wait(&wq, &__wait); \ -} while (0) + __ret; \ +}) + +#define wait_event_common(wq, condition, state, tout) \ +({ \ + long __ret; \ + if (condition) \ + __ret = __wait_no_timeout(tout) ?: (tout) ?: 1; \ + else \ + __ret = __wait_event_common(wq, condition, state, tout);\ + __ret; \ +}) /** * wait_event - sleep until a condition gets true @@ -198,29 +232,13 @@ do { \ * wake_up() has to be called after changing any variable that could * change the result of the wait condition. */ -#define wait_event(wq, condition) \ -do { \ - if (condition) \ - break; \ - __wait_event(wq, condition); \ -} while (0) +#define __wait_event(wq, condition) \ + __wait_event_common(wq, condition, \ + TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT) \ -#define __wait_event_timeout(wq, condition, ret) \ -do { \ - DEFINE_WAIT(__wait); \ - \ - for (;;) { \ - prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \ - if (condition) \ - break; \ - ret = schedule_timeout(ret); \ - if (!ret) \ - break; \ - } \ - if (!ret && (condition)) \ - ret = 1; \ - finish_wait(&wq, &__wait); \ -} while (0) +#define wait_event(wq, condition) \ + wait_event_common(wq, condition, \ + TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT) \ /** * wait_event_timeout - sleep until a condition gets true or a timeout elapses @@ -239,31 +257,13 @@ do { \ * jiffies (at least 1) if the @condition evaluated to %true before * the @timeout elapsed. */ -#define wait_event_timeout(wq, condition, timeout) \ -({ \ - long __ret = timeout; \ - if (!(condition)) \ - __wait_event_timeout(wq, condition, __ret); \ - __ret; \ -}) +#define __wait_event_timeout(wq, condition, timeout) \ + __wait_event_common(wq, condition, \ + TASK_UNINTERRUPTIBLE, timeout) -#define __wait_event_interruptible(wq, condition, ret) \ -do { \ - DEFINE_WAIT(__wait); \ - \ - for (;;) { \ - prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \ - if (condition) \ - break; \ - if (!signal_pending(current)) { \ - schedule(); \ - continue; \ - } \ - ret = -ERESTARTSYS; \ - break; \ - } \ - finish_wait(&wq, &__wait); \ -} while (0) +#define wait_event_timeout(wq, condition, timeout) \ + wait_event_common(wq, condition, \ + TASK_UNINTERRUPTIBLE, timeout) /** * wait_event_interruptible - sleep until a condition gets true @@ -280,35 +280,13 @@ do { \ * The function will return -ERESTARTSYS if it was interrupted by a * signal and 0 if @condition evaluated to true. */ -#define wait_event_interruptible(wq, condition) \ -({ \ - int __ret = 0; \ - if (!(condition)) \ - __wait_event_interruptible(wq, condition, __ret); \ - __ret; \ -}) +#define __wait_event_interruptible(wq, condition) \ + __wait_event_common(wq, condition, \ + TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT) -#define __wait_event_interruptible_timeout(wq, condition, ret) \ -do { \ - DEFINE_WAIT(__wait); \ - \ - for (;;) { \ - prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \ - if (condition) \ - break; \ - if (!signal_pending(current)) { \ - ret = schedule_timeout(ret); \ - if (!ret) \ - break; \ - continue; \ - } \ - ret = -ERESTARTSYS; \ - break; \ - } \ - if (!ret && (condition)) \ - ret = 1; \ - finish_wait(&wq, &__wait); \ -} while (0) +#define wait_event_interruptible(wq, condition) \ + wait_event_common(wq, condition, \ + TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT) /** * wait_event_interruptible_timeout - sleep until a condition gets true or a timeout elapses @@ -328,13 +306,13 @@ do { \ * a signal, or the remaining jiffies (at least 1) if the @condition * evaluated to %true before the @timeout elapsed. */ +#define __wait_event_interruptible_timeout(wq, condition, timeout) \ + __wait_event_common(wq, condition, \ + TASK_INTERRUPTIBLE, timeout) + #define wait_event_interruptible_timeout(wq, condition, timeout) \ -({ \ - long __ret = timeout; \ - if (!(condition)) \ - __wait_event_interruptible_timeout(wq, condition, __ret); \ - __ret; \ -}) + wait_event_common(wq, condition, \ + TASK_INTERRUPTIBLE, timeout) #define __wait_event_hrtimeout(wq, condition, timeout, state) \ ({ \ @@ -601,24 +579,6 @@ do { \ -#define __wait_event_killable(wq, condition, ret) \ -do { \ - DEFINE_WAIT(__wait); \ - \ - for (;;) { \ - prepare_to_wait(&wq, &__wait, TASK_KILLABLE); \ - if (condition) \ - break; \ - if (!fatal_signal_pending(current)) { \ - schedule(); \ - continue; \ - } \ - ret = -ERESTARTSYS; \ - break; \ - } \ - finish_wait(&wq, &__wait); \ -} while (0) - /** * wait_event_killable - sleep until a condition gets true * @wq: the waitqueue to wait on @@ -634,14 +594,13 @@ do { \ * The function will return -ERESTARTSYS if it was interrupted by a * signal and 0 if @condition evaluated to true. */ -#define wait_event_killable(wq, condition) \ -({ \ - int __ret = 0; \ - if (!(condition)) \ - __wait_event_killable(wq, condition, __ret); \ - __ret; \ -}) +#define __wait_event_killable(wq, condition) \ + __wait_event_common(wq, condition, \ + TASK_KILLABLE, MAX_SCHEDULE_TIMEOUT) +#define wait_event_killable(wq, condition) \ + wait_event_common(wq, condition, \ + TASK_KILLABLE, MAX_SCHEDULE_TIMEOUT) #define __wait_event_lock_irq(wq, condition, lock, cmd) \ do { \ ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-06-05 19:07 ` Oleg Nesterov @ 2013-06-06 1:45 ` Tejun Heo 2013-06-06 18:47 ` Oleg Nesterov 0 siblings, 1 reply; 23+ messages in thread From: Tejun Heo @ 2013-06-06 1:45 UTC (permalink / raw) To: Oleg Nesterov Cc: Imre Deak, Andrew Morton, Daniel Vetter, Dave Jones, David Howells, Jens Axboe, Linus Torvalds, Lukas Czerner, Paul E. McKenney, linux-kernel Hello, Oleg. On Wed, Jun 05, 2013 at 09:07:23PM +0200, Oleg Nesterov wrote: > And in fact, perhaps we can implement wait_event_common() and avoid the > code duplications? > > #define __wait_no_timeout(timeout) \ > (__builtin_constant_p(timeout) && (timeout) == MAX_SCHEDULE_TIMEOUT) > > /* uglified signal_pending_state() */ > #define __wait_signal_pending(state) \ > ((state == TASK_INTERRUPTIBLE) ? signal_pending(current) : \ > (state == TASK_KILLABLE) ? fatal_signal_pending(current) : \ > 0) > > #define __wait_event_common(wq, condition, state, tout) \ > ({ \ > DEFINE_WAIT(__wait); \ > long __ret = 0, __tout = tout; \ > \ > for (;;) { \ > prepare_to_wait(&wq, &__wait, state); \ > if (condition) { \ > __ret = __wait_no_timeout(tout) ?: __tout ?: 1; \ > break; \ > } \ > \ > if (__wait_signal_pending(state)) { \ > __ret = -ERESTARTSYS; \ > break; \ > } \ > \ > if (__wait_no_timeout(tout)) \ > schedule(); \ > else if (__tout) \ > __tout = schedule_timeout(__tout); \ > else \ > break; \ > } \ > finish_wait(&wq, &__wait); \ > __ret; \ > }) Heh, yeah, this looks good to me and a lot better than trying to do the same thing over and over again and ending up with subtle differences. > Hmm. I compiled the kernel with the patch below, > > $ size vmlinux > text data bss dec hex filename > - 4978601 2935080 10104832 18018513 112f0d1 vmlinux > + 4977769 2930984 10104832 18013585 112dd91 vmlinux Nice. Provided you went over assembly outputs of at least some combinations, please feel free to add Reviewed-by: Tejun Heo <tj@kernel.org> Thanks. -- tejun ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-06-06 1:45 ` Tejun Heo @ 2013-06-06 18:47 ` Oleg Nesterov 0 siblings, 0 replies; 23+ messages in thread From: Oleg Nesterov @ 2013-06-06 18:47 UTC (permalink / raw) To: Tejun Heo Cc: Imre Deak, Andrew Morton, Daniel Vetter, Dave Jones, David Howells, Jens Axboe, Linus Torvalds, Lukas Czerner, Paul E. McKenney, linux-kernel Hi Tejun, On 06/05, Tejun Heo wrote: > > Heh, yeah, this looks good to me and a lot better than trying to do > the same thing over and over again and ending up with subtle > differences. Yes, this is the goal. Of course we could fix wait_event_timout() and wait_event_interruptible_timeout() without unification, but this would add more copy-and-paste. > > Hmm. I compiled the kernel with the patch below, > > > > $ size vmlinux > > text data bss dec hex filename > > - 4978601 2935080 10104832 18018513 112f0d1 vmlinux > > + 4977769 2930984 10104832 18013585 112dd91 vmlinux > > Nice. Provided you went over assembly outputs of at least some > combinations, I did, and that is why I am surprized by the numbers above. Yes, gcc optimizes out the unwanted checks but the generated code differs, of course. And sometimes the difference looks "random" to me. Simplest example: extern wait_queue_head_t WQ; extern int COND; void we(void) { wait_event(WQ, COND); } before: we: pushq %rbp # movq %rsp, %rbp #, pushq %rbx # subq $56, %rsp #, call mcount movl COND(%rip), %edx # COND, testl %edx, %edx # jne .L5 #, leaq -64(%rbp), %rbx #, tmp66 movq $0, -64(%rbp) #, __wait movq $autoremove_wake_function, -48(%rbp) #, __wait.func #APP # 14 "/home/tmp/K/misc/arch/x86/include/asm/current.h" 1 movq %gs:current_task,%rax #, pfo_ret__ # 0 "" 2 #NO_APP movq %rax, -56(%rbp) # pfo_ret__, __wait.private leaq 24(%rbx), %rax #, tmp61 movq %rax, -40(%rbp) # tmp61, __wait.task_list.next movq %rax, -32(%rbp) # tmp61, __wait.task_list.prev .p2align 4,,10 .p2align 3 .L4: movl $2, %edx #, movq %rbx, %rsi # tmp66, movq $WQ, %rdi #, call prepare_to_wait # movl COND(%rip), %eax # COND, testl %eax, %eax # jne .L3 #, call schedule # jmp .L4 # .p2align 4,,10 .p2align 3 .L3: movq %rbx, %rsi # tmp66, movq $WQ, %rdi #, call finish_wait # .L5: addq $56, %rsp #, popq %rbx # leave ret after: we: pushq %rbp # movq %rsp, %rbp #, pushq %rbx # subq $56, %rsp #, call mcount movl COND(%rip), %edx # COND, testl %edx, %edx # jne .L5 #, leaq -64(%rbp), %rbx #, tmp66 movq $0, -64(%rbp) #, __wait movq $autoremove_wake_function, -48(%rbp) #, __wait.func #APP # 14 "/home/tmp/K/misc/arch/x86/include/asm/current.h" 1 movq %gs:current_task,%rax #, pfo_ret__ # 0 "" 2 #NO_APP movq %rax, -56(%rbp) # pfo_ret__, __wait.private leaq 24(%rbx), %rax #, tmp61 movq %rax, -40(%rbp) # tmp61, __wait.task_list.next movq %rax, -32(%rbp) # tmp61, __wait.task_list.prev .p2align 4,,10 .p2align 3 .L4: movl $2, %edx #, movq %rbx, %rsi # tmp66, movq $WQ, %rdi #, call prepare_to_wait # movl COND(%rip), %eax # COND, testl %eax, %eax # je .L3 #, movq %rbx, %rsi # tmp66, movq $WQ, %rdi #, call finish_wait # jmp .L5 # .p2align 4,,10 .p2align 3 .L3: call schedule # .p2align 4,,8 .p2align 3 jmp .L4 # .p2align 4,,10 .p2align 3 .L5: addq $56, %rsp #, popq %rbx # leave .p2align 4,,4 .p2align 3 ret As you can see, with this patch gcc generates a bit more code. But only because reorderes finish_wait() and inserts the additional nops, otherwise code the same. I think this difference is not "reliable" and can be ignored. But, the code like "return wait_even_timeout(true, non_const_timeout)" really generates more code and this is correct: the patch tries to fix the bug (at least I think this is a bug) and the additional code ensures that __ret != 0. > Reviewed-by: Tejun Heo <tj@kernel.org> Thanks! I'll send this patch soon. Oleg. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] wait: fix false timeouts when using wait_event_timeout()
@ 2013-05-02 8:58 Imre Deak
2013-05-02 9:36 ` Daniel Vetter
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Imre Deak @ 2013-05-02 8:58 UTC (permalink / raw)
To: Daniel Vetter, Paul E. McKenney, David Howells, Dave Jones,
Jens Axboe, Lukas Czerner, linux-kernel
Cc: Imre Deak
Many callers of the wait_event_timeout() and
wait_event_interruptible_timeout() expect that the return value will be
positive if the specified condition becomes true before the timeout
elapses. However, at the moment this isn't guaranteed. If the wake-up
handler is delayed enough, the time remaining until timeout will be
calculated as 0 - and passed back as a return value - even if the
condition became true before the timeout has passed.
Fix this by returning at least 1 if the condition becomes true. This
semantic is in line with what wait_for_condition_timeout() does; see
commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious
failure under heavy load".
Signed-off-by: Imre Deak <imre.deak@intel.com>
---
include/linux/wait.h | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 7cb64d4..5336842 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -217,6 +217,8 @@ do { \
if (!ret) \
break; \
} \
+ if (!ret && (condition)) \
+ ret = 1; \
finish_wait(&wq, &__wait); \
} while (0)
@@ -233,8 +235,9 @@ do { \
* wake_up() has to be called after changing any variable that could
* change the result of the wait condition.
*
- * The function returns 0 if the @timeout elapsed, and the remaining
- * jiffies if the condition evaluated to true before the timeout elapsed.
+ * The function returns 0 if the @timeout elapsed, or the remaining
+ * jiffies (at least 1) if the @condition evaluated to %true before
+ * the @timeout elapsed.
*/
#define wait_event_timeout(wq, condition, timeout) \
({ \
@@ -302,6 +305,8 @@ do { \
ret = -ERESTARTSYS; \
break; \
} \
+ if (!ret && (condition)) \
+ ret = 1; \
finish_wait(&wq, &__wait); \
} while (0)
@@ -318,9 +323,10 @@ do { \
* wake_up() has to be called after changing any variable that could
* change the result of the wait condition.
*
- * The function returns 0 if the @timeout elapsed, -ERESTARTSYS if it
- * was interrupted by a signal, and the remaining jiffies otherwise
- * if the condition evaluated to true before the timeout elapsed.
+ * Returns:
+ * 0 if the @timeout elapsed, -%ERESTARTSYS if it was interrupted by
+ * a signal, or the remaining jiffies (at least 1) if the @condition
+ * evaluated to %true before the @timeout elapsed.
*/
#define wait_event_interruptible_timeout(wq, condition, timeout) \
({ \
--
1.7.10.4
^ permalink raw reply related [flat|nested] 23+ messages in thread* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-05-02 8:58 Imre Deak @ 2013-05-02 9:36 ` Daniel Vetter 2013-05-07 23:12 ` Andrew Morton 2013-05-02 10:29 ` David Howells 2013-05-02 12:29 ` David Howells 2 siblings, 1 reply; 23+ messages in thread From: Daniel Vetter @ 2013-05-02 9:36 UTC (permalink / raw) To: Imre Deak Cc: Paul E. McKenney, David Howells, Dave Jones, Jens Axboe, Lukas Czerner, Linux Kernel Mailing List On Thu, May 2, 2013 at 10:58 AM, Imre Deak <imre.deak@intel.com> wrote: > Many callers of the wait_event_timeout() and > wait_event_interruptible_timeout() expect that the return value will be > positive if the specified condition becomes true before the timeout > elapses. However, at the moment this isn't guaranteed. If the wake-up > handler is delayed enough, the time remaining until timeout will be > calculated as 0 - and passed back as a return value - even if the > condition became true before the timeout has passed. > > Fix this by returning at least 1 if the condition becomes true. This > semantic is in line with what wait_for_condition_timeout() does; see > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious > failure under heavy load". > > Signed-off-by: Imre Deak <imre.deak@intel.com> We have 3 instances of this bug in drm/i915. One case even where we switch between the interruptible and not interruptible wait_event_timeout variants, foolishly presuming they have the same semantics. I very much like this. Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-05-02 9:36 ` Daniel Vetter @ 2013-05-07 23:12 ` Andrew Morton 2013-05-08 9:49 ` Imre Deak 0 siblings, 1 reply; 23+ messages in thread From: Andrew Morton @ 2013-05-07 23:12 UTC (permalink / raw) To: Daniel Vetter Cc: Imre Deak, Paul E. McKenney, David Howells, Dave Jones, Jens Axboe, Lukas Czerner, Linux Kernel Mailing List On Thu, 2 May 2013 11:36:56 +0200 Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > On Thu, May 2, 2013 at 10:58 AM, Imre Deak <imre.deak@intel.com> wrote: > > Many callers of the wait_event_timeout() and > > wait_event_interruptible_timeout() expect that the return value will be > > positive if the specified condition becomes true before the timeout > > elapses. However, at the moment this isn't guaranteed. If the wake-up > > handler is delayed enough, the time remaining until timeout will be > > calculated as 0 - and passed back as a return value - even if the > > condition became true before the timeout has passed. > > > > Fix this by returning at least 1 if the condition becomes true. This > > semantic is in line with what wait_for_condition_timeout() does; see > > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious > > failure under heavy load". > > > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > We have 3 instances of this bug in drm/i915. One case even where we > switch between the interruptible and not interruptible > wait_event_timeout variants, foolishly presuming they have the same > semantics. I very much like this. Let's think about scheduling this fix. Are any of the bugs which we expect this patch fixes serious enough to warrant merging it into 3.10? And -stable? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-05-07 23:12 ` Andrew Morton @ 2013-05-08 9:49 ` Imre Deak 0 siblings, 0 replies; 23+ messages in thread From: Imre Deak @ 2013-05-08 9:49 UTC (permalink / raw) To: Andrew Morton Cc: Daniel Vetter, Paul E. McKenney, David Howells, Dave Jones, Jens Axboe, Lukas Czerner, Linux Kernel Mailing List On Tue, 2013-05-07 at 16:12 -0700, Andrew Morton wrote: > On Thu, 2 May 2013 11:36:56 +0200 Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > > On Thu, May 2, 2013 at 10:58 AM, Imre Deak <imre.deak@intel.com> wrote: > > > Many callers of the wait_event_timeout() and > > > wait_event_interruptible_timeout() expect that the return value will be > > > positive if the specified condition becomes true before the timeout > > > elapses. However, at the moment this isn't guaranteed. If the wake-up > > > handler is delayed enough, the time remaining until timeout will be > > > calculated as 0 - and passed back as a return value - even if the > > > condition became true before the timeout has passed. > > > > > > Fix this by returning at least 1 if the condition becomes true. This > > > semantic is in line with what wait_for_condition_timeout() does; see > > > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious > > > failure under heavy load". > > > > > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > > > We have 3 instances of this bug in drm/i915. One case even where we > > switch between the interruptible and not interruptible > > wait_event_timeout variants, foolishly presuming they have the same > > semantics. I very much like this. > > Let's think about scheduling this fix. > > Are any of the bugs which we expect this patch fixes serious enough to > warrant merging it into 3.10? And -stable? There is at least [1], but I'm sure there is more similar reports about i915. I'd vote for -stable at least. --Imre [1] https://bugs.freedesktop.org/show_bug.cgi?id=64133 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-05-02 8:58 Imre Deak 2013-05-02 9:36 ` Daniel Vetter @ 2013-05-02 10:29 ` David Howells 2013-05-02 12:02 ` Imre Deak 2013-05-02 12:13 ` Daniel Vetter 2013-05-02 12:29 ` David Howells 2 siblings, 2 replies; 23+ messages in thread From: David Howells @ 2013-05-02 10:29 UTC (permalink / raw) To: Imre Deak Cc: dhowells, Daniel Vetter, Paul E. McKenney, Dave Jones, Jens Axboe, Lukas Czerner, linux-kernel Imre Deak <imre.deak@intel.com> wrote: > Many callers of the wait_event_timeout() and > wait_event_interruptible_timeout() expect that the return value will be > positive if the specified condition becomes true before the timeout > elapses. However, at the moment this isn't guaranteed. If the wake-up > handler is delayed enough, the time remaining until timeout will be > calculated as 0 - and passed back as a return value - even if the > condition became true before the timeout has passed. Fun. > Fix this by returning at least 1 if the condition becomes true. This > semantic is in line with what wait_for_condition_timeout() does; see > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious > failure under heavy load". But now you can't distinguish the timer expiring first, if the thread doing the waiting gets delayed sufficiently long for the event to happen. I'm not sure there's a good answer - except maybe making the timer expiry handler check the condition (which would likely get really yucky really quickly). David ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-05-02 10:29 ` David Howells @ 2013-05-02 12:02 ` Imre Deak 2013-05-02 12:13 ` Daniel Vetter 1 sibling, 0 replies; 23+ messages in thread From: Imre Deak @ 2013-05-02 12:02 UTC (permalink / raw) To: David Howells Cc: Daniel Vetter, Paul E. McKenney, Dave Jones, Jens Axboe, Lukas Czerner, linux-kernel On Thu, 2013-05-02 at 11:29 +0100, David Howells wrote: > Imre Deak <imre.deak@intel.com> wrote: > > > Many callers of the wait_event_timeout() and > > wait_event_interruptible_timeout() expect that the return value will be > > positive if the specified condition becomes true before the timeout > > elapses. However, at the moment this isn't guaranteed. If the wake-up > > handler is delayed enough, the time remaining until timeout will be > > calculated as 0 - and passed back as a return value - even if the > > condition became true before the timeout has passed. > > Fun. > > > Fix this by returning at least 1 if the condition becomes true. This > > semantic is in line with what wait_for_condition_timeout() does; see > > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious > > failure under heavy load". > > But now you can't distinguish the timer expiring first, if the thread doing > the waiting gets delayed sufficiently long for the event to happen. I'm trying to understand what sequence do you mean. I can think of the following - as an example - in case of starting a transaction that will set a completion flag: waiter completion handler start transaction set completion_flag ret = wait_event_timeout(timeout, completion_flag) In this case ret will be timeout which is the original behavior, so should be ok. One exception is if timeout=0 to begin with, since then - after this change - ret will be 1. But I can't see how that use case is useful. I guess I'm missing something, could you elaborate? --Imre > I'm not sure there's a good answer - except maybe making the timer expiry > handler check the condition (which would likely get really yucky really > quickly). > > David ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-05-02 10:29 ` David Howells 2013-05-02 12:02 ` Imre Deak @ 2013-05-02 12:13 ` Daniel Vetter 2013-05-02 12:23 ` Jens Axboe 1 sibling, 1 reply; 23+ messages in thread From: Daniel Vetter @ 2013-05-02 12:13 UTC (permalink / raw) To: David Howells Cc: Imre Deak, Paul E. McKenney, Dave Jones, Jens Axboe, Lukas Czerner, Linux Kernel Mailing List On Thu, May 2, 2013 at 12:29 PM, David Howells <dhowells@redhat.com> wrote: >> Fix this by returning at least 1 if the condition becomes true. This >> semantic is in line with what wait_for_condition_timeout() does; see >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious >> failure under heavy load". > > But now you can't distinguish the timer expiring first, if the thread doing > the waiting gets delayed sufficiently long for the event to happen. That can already happen, e.g. 1. wakeup happens and condition is true. 2. we compute remaining jiffies > 0 -> preempt 3. now wait_for_event_timeout returns. Only difference is that the delay/preempt happens in between 1. and 2., and then suddenly the wake up didn't happen in time (with the current return code semantics). So imo the current behaviour is simply a bug and will miss timely wakeups in some cases. The other way round, namely wait_for_event_timeout taking longer than the timeout is expected (and part of the interface for every timeout function). So all current callers already need to be able to cope with random preemption/delays pushing the total time before the call to wait_for_event and checking the return value over the timeout, even when condition was signalled in time. If there's any case which relies on accurate timeout detection that simply won't work with wait_for_event (they need an nmi or a hw timestamp counter or something similar). -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-05-02 12:13 ` Daniel Vetter @ 2013-05-02 12:23 ` Jens Axboe 2013-05-02 12:29 ` David Howells 2013-05-02 12:34 ` Imre Deak 0 siblings, 2 replies; 23+ messages in thread From: Jens Axboe @ 2013-05-02 12:23 UTC (permalink / raw) To: Daniel Vetter Cc: David Howells, Imre Deak, Paul E. McKenney, Dave Jones, Lukas Czerner, Linux Kernel Mailing List On Thu, May 02 2013, Daniel Vetter wrote: > On Thu, May 2, 2013 at 12:29 PM, David Howells <dhowells@redhat.com> wrote: > >> Fix this by returning at least 1 if the condition becomes true. This > >> semantic is in line with what wait_for_condition_timeout() does; see > >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious > >> failure under heavy load". > > > > But now you can't distinguish the timer expiring first, if the thread doing > > the waiting gets delayed sufficiently long for the event to happen. > > That can already happen, e.g. > > 1. wakeup happens and condition is true. > 2. we compute remaining jiffies > 0 > -> preempt > 3. now wait_for_event_timeout returns. > > Only difference is that the delay/preempt happens in between 1. and > 2., and then suddenly the wake up didn't happen in time (with the > current return code semantics). > > So imo the current behaviour is simply a bug and will miss timely > wakeups in some cases. > > The other way round, namely wait_for_event_timeout taking longer than > the timeout is expected (and part of the interface for every timeout > function). So all current callers already need to be able to cope with > random preemption/delays pushing the total time before the call to > wait_for_event and checking the return value over the timeout, even > when condition was signalled in time. > > If there's any case which relies on accurate timeout detection that > simply won't work with wait_for_event (they need an nmi or a hw > timestamp counter or something similar). I seriously doubt that anyone is depending on any sort of accuracy on the return. 1 jiffy is not going to make or break anything - in fact, jiffies could be incremented nsecs after the initial call. So a granularity of at least 1 is going to be expected in any case. The important bit here is that the API should behave as expected. And the most logical way to code that is to check the return value. I can easily see people forgetting to re-check the condition, hence you get a bug. The fact that you and the original reporter already had accidents with this is a clear sign that the logical way to use the API is not the correct one. IMHO, the change definitely makes sense. -- Jens Axboe ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-05-02 12:23 ` Jens Axboe @ 2013-05-02 12:29 ` David Howells 2013-05-02 12:34 ` Imre Deak 1 sibling, 0 replies; 23+ messages in thread From: David Howells @ 2013-05-02 12:29 UTC (permalink / raw) To: Jens Axboe Cc: dhowells, Daniel Vetter, Imre Deak, Paul E. McKenney, Dave Jones, Lukas Czerner, Linux Kernel Mailing List Jens Axboe <axboe@kernel.dk> wrote: > IMHO, the change definitely makes sense. Yeah... I think I agree. David ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-05-02 12:23 ` Jens Axboe 2013-05-02 12:29 ` David Howells @ 2013-05-02 12:34 ` Imre Deak 2013-05-02 12:54 ` Jens Axboe 1 sibling, 1 reply; 23+ messages in thread From: Imre Deak @ 2013-05-02 12:34 UTC (permalink / raw) To: Jens Axboe Cc: Daniel Vetter, David Howells, Paul E. McKenney, Dave Jones, Lukas Czerner, Linux Kernel Mailing List On Thu, 2013-05-02 at 14:23 +0200, Jens Axboe wrote: > On Thu, May 02 2013, Daniel Vetter wrote: > > On Thu, May 2, 2013 at 12:29 PM, David Howells <dhowells@redhat.com> wrote: > > >> Fix this by returning at least 1 if the condition becomes true. This > > >> semantic is in line with what wait_for_condition_timeout() does; see > > >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious > > >> failure under heavy load". > > > > > > But now you can't distinguish the timer expiring first, if the thread doing > > > the waiting gets delayed sufficiently long for the event to happen. > > > > That can already happen, e.g. > > > > 1. wakeup happens and condition is true. > > 2. we compute remaining jiffies > 0 > > -> preempt > > 3. now wait_for_event_timeout returns. > > > > Only difference is that the delay/preempt happens in between 1. and > > 2., and then suddenly the wake up didn't happen in time (with the > > current return code semantics). > > > > So imo the current behaviour is simply a bug and will miss timely > > wakeups in some cases. > > > > The other way round, namely wait_for_event_timeout taking longer than > > the timeout is expected (and part of the interface for every timeout > > function). So all current callers already need to be able to cope with > > random preemption/delays pushing the total time before the call to > > wait_for_event and checking the return value over the timeout, even > > when condition was signalled in time. > > > > If there's any case which relies on accurate timeout detection that > > simply won't work with wait_for_event (they need an nmi or a hw > > timestamp counter or something similar). > > I seriously doubt that anyone is depending on any sort of accuracy on > the return. 1 jiffy is not going to make or break anything - in fact, > jiffies could be incremented nsecs after the initial call. So a > granularity of at least 1 is going to be expected in any case. > > The important bit here is that the API should behave as expected. And > the most logical way to code that is to check the return value. I can > easily see people forgetting to re-check the condition, hence you get a > bug. The fact that you and the original reporter already had accidents > with this is a clear sign that the logical way to use the API is not the > correct one. > > IMHO, the change definitely makes sense. Ok, so taking courage of this answer ;P How about also the following? diff --git a/kernel/timer.c b/kernel/timer.c index dbf7a78..5a62456 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -1515,7 +1515,11 @@ signed long __sched schedule_timeout(signed long timeout) } } - expire = timeout + jiffies; + /* + * We can't be sure how close we are to the next tick, so +1 to + * guarantee that we wait at least timeout amount. + */ + expire = timeout + jiffies + 1; setup_timer_on_stack(&timer, process_timeout, (unsigned long)current); __mod_timer(&timer, expire, false, TIMER_NOT_PINNED); It'd plug a similar hole for wait_event_timeout() and similar users, who don't compensate for the above.. --Imre ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-05-02 12:34 ` Imre Deak @ 2013-05-02 12:54 ` Jens Axboe 2013-05-02 13:56 ` Imre Deak 0 siblings, 1 reply; 23+ messages in thread From: Jens Axboe @ 2013-05-02 12:54 UTC (permalink / raw) To: Imre Deak Cc: Daniel Vetter, David Howells, Paul E. McKenney, Dave Jones, Lukas Czerner, Linux Kernel Mailing List On Thu, May 02 2013, Imre Deak wrote: > On Thu, 2013-05-02 at 14:23 +0200, Jens Axboe wrote: > > On Thu, May 02 2013, Daniel Vetter wrote: > > > On Thu, May 2, 2013 at 12:29 PM, David Howells <dhowells@redhat.com> wrote: > > > >> Fix this by returning at least 1 if the condition becomes true. This > > > >> semantic is in line with what wait_for_condition_timeout() does; see > > > >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious > > > >> failure under heavy load". > > > > > > > > But now you can't distinguish the timer expiring first, if the thread doing > > > > the waiting gets delayed sufficiently long for the event to happen. > > > > > > That can already happen, e.g. > > > > > > 1. wakeup happens and condition is true. > > > 2. we compute remaining jiffies > 0 > > > -> preempt > > > 3. now wait_for_event_timeout returns. > > > > > > Only difference is that the delay/preempt happens in between 1. and > > > 2., and then suddenly the wake up didn't happen in time (with the > > > current return code semantics). > > > > > > So imo the current behaviour is simply a bug and will miss timely > > > wakeups in some cases. > > > > > > The other way round, namely wait_for_event_timeout taking longer than > > > the timeout is expected (and part of the interface for every timeout > > > function). So all current callers already need to be able to cope with > > > random preemption/delays pushing the total time before the call to > > > wait_for_event and checking the return value over the timeout, even > > > when condition was signalled in time. > > > > > > If there's any case which relies on accurate timeout detection that > > > simply won't work with wait_for_event (they need an nmi or a hw > > > timestamp counter or something similar). > > > > I seriously doubt that anyone is depending on any sort of accuracy on > > the return. 1 jiffy is not going to make or break anything - in fact, > > jiffies could be incremented nsecs after the initial call. So a > > granularity of at least 1 is going to be expected in any case. > > > > The important bit here is that the API should behave as expected. And > > the most logical way to code that is to check the return value. I can > > easily see people forgetting to re-check the condition, hence you get a > > bug. The fact that you and the original reporter already had accidents > > with this is a clear sign that the logical way to use the API is not the > > correct one. > > > > IMHO, the change definitely makes sense. > > Ok, so taking courage of this answer ;P How about also the following? > > diff --git a/kernel/timer.c b/kernel/timer.c > index dbf7a78..5a62456 100644 > --- a/kernel/timer.c > +++ b/kernel/timer.c > @@ -1515,7 +1515,11 @@ signed long __sched schedule_timeout(signed long > timeout) > } > } > > - expire = timeout + jiffies; > + /* > + * We can't be sure how close we are to the next tick, so +1 to > + * guarantee that we wait at least timeout amount. > + */ > + expire = timeout + jiffies + 1; > > setup_timer_on_stack(&timer, process_timeout, (unsigned long)current); > __mod_timer(&timer, expire, false, TIMER_NOT_PINNED); > > > It'd plug a similar hole for wait_event_timeout() and similar users, who > don't compensate for the above.. Any jiffy based API is going to have this issue. I think it's different from the original patch, which just makes the API potentially return something that is confusing. So not sure on the above, sorry. -- Jens Axboe ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-05-02 12:54 ` Jens Axboe @ 2013-05-02 13:56 ` Imre Deak 2013-05-02 14:04 ` Daniel Vetter 0 siblings, 1 reply; 23+ messages in thread From: Imre Deak @ 2013-05-02 13:56 UTC (permalink / raw) To: Jens Axboe Cc: Daniel Vetter, David Howells, Paul E. McKenney, Dave Jones, Lukas Czerner, Linux Kernel Mailing List On Thu, 2013-05-02 at 14:54 +0200, Jens Axboe wrote: > On Thu, May 02 2013, Imre Deak wrote: > > On Thu, 2013-05-02 at 14:23 +0200, Jens Axboe wrote: > > > On Thu, May 02 2013, Daniel Vetter wrote: > > > > On Thu, May 2, 2013 at 12:29 PM, David Howells <dhowells@redhat.com> wrote: > > > > >> Fix this by returning at least 1 if the condition becomes true. This > > > > >> semantic is in line with what wait_for_condition_timeout() does; see > > > > >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious > > > > >> failure under heavy load". > > > > > > > > > > But now you can't distinguish the timer expiring first, if the thread doing > > > > > the waiting gets delayed sufficiently long for the event to happen. > > > > > > > > That can already happen, e.g. > > > > > > > > 1. wakeup happens and condition is true. > > > > 2. we compute remaining jiffies > 0 > > > > -> preempt > > > > 3. now wait_for_event_timeout returns. > > > > > > > > Only difference is that the delay/preempt happens in between 1. and > > > > 2., and then suddenly the wake up didn't happen in time (with the > > > > current return code semantics). > > > > > > > > So imo the current behaviour is simply a bug and will miss timely > > > > wakeups in some cases. > > > > > > > > The other way round, namely wait_for_event_timeout taking longer than > > > > the timeout is expected (and part of the interface for every timeout > > > > function). So all current callers already need to be able to cope with > > > > random preemption/delays pushing the total time before the call to > > > > wait_for_event and checking the return value over the timeout, even > > > > when condition was signalled in time. > > > > > > > > If there's any case which relies on accurate timeout detection that > > > > simply won't work with wait_for_event (they need an nmi or a hw > > > > timestamp counter or something similar). > > > > > > I seriously doubt that anyone is depending on any sort of accuracy on > > > the return. 1 jiffy is not going to make or break anything - in fact, > > > jiffies could be incremented nsecs after the initial call. So a > > > granularity of at least 1 is going to be expected in any case. > > > > > > The important bit here is that the API should behave as expected. And > > > the most logical way to code that is to check the return value. I can > > > easily see people forgetting to re-check the condition, hence you get a > > > bug. The fact that you and the original reporter already had accidents > > > with this is a clear sign that the logical way to use the API is not the > > > correct one. > > > > > > IMHO, the change definitely makes sense. > > > > Ok, so taking courage of this answer ;P How about also the following? > > > > diff --git a/kernel/timer.c b/kernel/timer.c > > index dbf7a78..5a62456 100644 > > --- a/kernel/timer.c > > +++ b/kernel/timer.c > > @@ -1515,7 +1515,11 @@ signed long __sched schedule_timeout(signed long > > timeout) > > } > > } > > > > - expire = timeout + jiffies; > > + /* > > + * We can't be sure how close we are to the next tick, so +1 to > > + * guarantee that we wait at least timeout amount. > > + */ > > + expire = timeout + jiffies + 1; > > > > setup_timer_on_stack(&timer, process_timeout, (unsigned long)current); > > __mod_timer(&timer, expire, false, TIMER_NOT_PINNED); > > > > > > It'd plug a similar hole for wait_event_timeout() and similar users, who > > don't compensate for the above.. > > Any jiffy based API is going to have this issue. I think it's different > from the original patch, which just makes the API potentially return > something that is confusing. Yea, at least those that take a relative time. Usually the timeout is given with a big overhead, so there it won't be a problem. But I also found users like drivers/acpi/ec.c, that will pass a timeout of 1 jiffies, which may then result in premature timeouts. > So not sure on the above, sorry. Ok. A WARN to wait_event_timeout for the above case could still be a useful guard.. --Imre ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-05-02 13:56 ` Imre Deak @ 2013-05-02 14:04 ` Daniel Vetter 0 siblings, 0 replies; 23+ messages in thread From: Daniel Vetter @ 2013-05-02 14:04 UTC (permalink / raw) To: Imre Deak Cc: Jens Axboe, David Howells, Paul E. McKenney, Dave Jones, Lukas Czerner, Linux Kernel Mailing List On Thu, May 2, 2013 at 3:56 PM, Imre Deak <imre.deak@intel.com> wrote: > On Thu, 2013-05-02 at 14:54 +0200, Jens Axboe wrote: >> On Thu, May 02 2013, Imre Deak wrote: >> > On Thu, 2013-05-02 at 14:23 +0200, Jens Axboe wrote: >> > > On Thu, May 02 2013, Daniel Vetter wrote: >> > > > On Thu, May 2, 2013 at 12:29 PM, David Howells <dhowells@redhat.com> wrote: >> > > > >> Fix this by returning at least 1 if the condition becomes true. This >> > > > >> semantic is in line with what wait_for_condition_timeout() does; see >> > > > >> commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious >> > > > >> failure under heavy load". >> > > > > >> > > > > But now you can't distinguish the timer expiring first, if the thread doing >> > > > > the waiting gets delayed sufficiently long for the event to happen. >> > > > >> > > > That can already happen, e.g. >> > > > >> > > > 1. wakeup happens and condition is true. >> > > > 2. we compute remaining jiffies > 0 >> > > > -> preempt >> > > > 3. now wait_for_event_timeout returns. >> > > > >> > > > Only difference is that the delay/preempt happens in between 1. and >> > > > 2., and then suddenly the wake up didn't happen in time (with the >> > > > current return code semantics). >> > > > >> > > > So imo the current behaviour is simply a bug and will miss timely >> > > > wakeups in some cases. >> > > > >> > > > The other way round, namely wait_for_event_timeout taking longer than >> > > > the timeout is expected (and part of the interface for every timeout >> > > > function). So all current callers already need to be able to cope with >> > > > random preemption/delays pushing the total time before the call to >> > > > wait_for_event and checking the return value over the timeout, even >> > > > when condition was signalled in time. >> > > > >> > > > If there's any case which relies on accurate timeout detection that >> > > > simply won't work with wait_for_event (they need an nmi or a hw >> > > > timestamp counter or something similar). >> > > >> > > I seriously doubt that anyone is depending on any sort of accuracy on >> > > the return. 1 jiffy is not going to make or break anything - in fact, >> > > jiffies could be incremented nsecs after the initial call. So a >> > > granularity of at least 1 is going to be expected in any case. >> > > >> > > The important bit here is that the API should behave as expected. And >> > > the most logical way to code that is to check the return value. I can >> > > easily see people forgetting to re-check the condition, hence you get a >> > > bug. The fact that you and the original reporter already had accidents >> > > with this is a clear sign that the logical way to use the API is not the >> > > correct one. >> > > >> > > IMHO, the change definitely makes sense. >> > >> > Ok, so taking courage of this answer ;P How about also the following? >> > >> > diff --git a/kernel/timer.c b/kernel/timer.c >> > index dbf7a78..5a62456 100644 >> > --- a/kernel/timer.c >> > +++ b/kernel/timer.c >> > @@ -1515,7 +1515,11 @@ signed long __sched schedule_timeout(signed long >> > timeout) >> > } >> > } >> > >> > - expire = timeout + jiffies; >> > + /* >> > + * We can't be sure how close we are to the next tick, so +1 to >> > + * guarantee that we wait at least timeout amount. >> > + */ >> > + expire = timeout + jiffies + 1; >> > >> > setup_timer_on_stack(&timer, process_timeout, (unsigned long)current); >> > __mod_timer(&timer, expire, false, TIMER_NOT_PINNED); >> > >> > >> > It'd plug a similar hole for wait_event_timeout() and similar users, who >> > don't compensate for the above.. >> >> Any jiffy based API is going to have this issue. I think it's different >> from the original patch, which just makes the API potentially return >> something that is confusing. > > Yea, at least those that take a relative time. Usually the timeout is > given with a big overhead, so there it won't be a problem. But I also > found users like drivers/acpi/ec.c, that will pass a timeout of 1 > jiffies, which may then result in premature timeouts. In drm/i915 we have a bunch of those 1 jiffie timeouts, too, especially with low HZ configs. For easy comparison with hw docs we try to stick to the documented timeout limits and convert them with msec_to_jiffies (which does the right thing wrt rounding). But since these timeouts are often in the low ms range, that often means just 1 jiffie timeouts. So the timeout could be off by a large factor (and we have the bugzillas to prove it happens in reality). I'm not sure whether sprinkling +1 all over the codebase is a that nice approach. Adding the +1 in sched_timeout otoh would give us a nice idiot proof interface for driver guys like me. We also have some custom register wait loops using msleep(1) and jiffy based timeouts, and there our driver macro does the +1. That helped a lot in avoiding stupid bugs, but now we've converted a few things over to interrupts + timeout and the bugs are back ... -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-05-02 8:58 Imre Deak 2013-05-02 9:36 ` Daniel Vetter 2013-05-02 10:29 ` David Howells @ 2013-05-02 12:29 ` David Howells 2013-05-02 12:35 ` Jens Axboe 2 siblings, 1 reply; 23+ messages in thread From: David Howells @ 2013-05-02 12:29 UTC (permalink / raw) To: Imre Deak Cc: dhowells, Daniel Vetter, Paul E. McKenney, Dave Jones, Jens Axboe, Lukas Czerner, linux-kernel Imre Deak <imre.deak@intel.com> wrote: > Many callers of the wait_event_timeout() and > wait_event_interruptible_timeout() expect that the return value will be > positive if the specified condition becomes true before the timeout > elapses. However, at the moment this isn't guaranteed. If the wake-up > handler is delayed enough, the time remaining until timeout will be > calculated as 0 - and passed back as a return value - even if the > condition became true before the timeout has passed. > > Fix this by returning at least 1 if the condition becomes true. This > semantic is in line with what wait_for_condition_timeout() does; see > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious > failure under heavy load". > > Signed-off-by: Imre Deak <imre.deak@intel.com> Acked-by: David Howells <dhowells@redhat.com> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-05-02 12:29 ` David Howells @ 2013-05-02 12:35 ` Jens Axboe 2013-05-02 19:56 ` Imre Deak 0 siblings, 1 reply; 23+ messages in thread From: Jens Axboe @ 2013-05-02 12:35 UTC (permalink / raw) To: David Howells Cc: Imre Deak, Daniel Vetter, Paul E. McKenney, Dave Jones, Lukas Czerner, linux-kernel On Thu, May 02 2013, David Howells wrote: > Imre Deak <imre.deak@intel.com> wrote: > > > Many callers of the wait_event_timeout() and > > wait_event_interruptible_timeout() expect that the return value will be > > positive if the specified condition becomes true before the timeout > > elapses. However, at the moment this isn't guaranteed. If the wake-up > > handler is delayed enough, the time remaining until timeout will be > > calculated as 0 - and passed back as a return value - even if the > > condition became true before the timeout has passed. > > > > Fix this by returning at least 1 if the condition becomes true. This > > semantic is in line with what wait_for_condition_timeout() does; see > > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious > > failure under heavy load". > > > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > Acked-by: David Howells <dhowells@redhat.com> You can add mine, too: Acked-by: Jens Axboe <axboe@kernel.dk> -- Jens Axboe ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] wait: fix false timeouts when using wait_event_timeout() 2013-05-02 12:35 ` Jens Axboe @ 2013-05-02 19:56 ` Imre Deak 0 siblings, 0 replies; 23+ messages in thread From: Imre Deak @ 2013-05-02 19:56 UTC (permalink / raw) To: Jens Axboe Cc: David Howells, Daniel Vetter, Paul E. McKenney, Dave Jones, Lukas Czerner, linux-kernel On Thu, 2013-05-02 at 14:35 +0200, Jens Axboe wrote: > On Thu, May 02 2013, David Howells wrote: > > Imre Deak <imre.deak@intel.com> wrote: > > > > > Many callers of the wait_event_timeout() and > > > wait_event_interruptible_timeout() expect that the return value will be > > > positive if the specified condition becomes true before the timeout > > > elapses. However, at the moment this isn't guaranteed. If the wake-up > > > handler is delayed enough, the time remaining until timeout will be > > > calculated as 0 - and passed back as a return value - even if the > > > condition became true before the timeout has passed. > > > > > > Fix this by returning at least 1 if the condition becomes true. This > > > semantic is in line with what wait_for_condition_timeout() does; see > > > commit bb10ed09 - "sched: fix wait_for_completion_timeout() spurious > > > failure under heavy load". > > > > > > Signed-off-by: Imre Deak <imre.deak@intel.com> > > > > Acked-by: David Howells <dhowells@redhat.com> > > You can add mine, too: > > Acked-by: Jens Axboe <axboe@kernel.dk> Ok, I think we agree that the +1 thing in schedule_timeout() discussed in this thread should be handled separately, so if there is no other objection I'd be happy if this patch was merged through someone's tree as-is. --Imre ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2013-06-06 18:51 UTC | newest] Thread overview: 23+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-06-04 19:28 [PATCH] wait: fix false timeouts when using wait_event_timeout() Oleg Nesterov 2013-06-04 21:35 ` Imre Deak 2013-06-04 21:40 ` Imre Deak 2013-06-05 16:37 ` Oleg Nesterov 2013-06-05 19:07 ` Oleg Nesterov 2013-06-06 1:45 ` Tejun Heo 2013-06-06 18:47 ` Oleg Nesterov -- strict thread matches above, loose matches on Subject: below -- 2013-05-02 8:58 Imre Deak 2013-05-02 9:36 ` Daniel Vetter 2013-05-07 23:12 ` Andrew Morton 2013-05-08 9:49 ` Imre Deak 2013-05-02 10:29 ` David Howells 2013-05-02 12:02 ` Imre Deak 2013-05-02 12:13 ` Daniel Vetter 2013-05-02 12:23 ` Jens Axboe 2013-05-02 12:29 ` David Howells 2013-05-02 12:34 ` Imre Deak 2013-05-02 12:54 ` Jens Axboe 2013-05-02 13:56 ` Imre Deak 2013-05-02 14:04 ` Daniel Vetter 2013-05-02 12:29 ` David Howells 2013-05-02 12:35 ` Jens Axboe 2013-05-02 19:56 ` Imre Deak
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox