* [PATCH][4/4] poll()/select() timeout behavior
@ 2004-02-20 21:04 Bill Rugolsky Jr.
2004-02-21 4:13 ` Andrew Morton
0 siblings, 1 reply; 4+ messages in thread
From: Bill Rugolsky Jr. @ 2004-02-20 21:04 UTC (permalink / raw)
To: torvalds, akpm; +Cc: linux-kernel, brugolsky
This patch forces select() to wait *at least* the specified timeout if
no events have occurred, same as poll(). The SUSv3 man page for select(2)
says:
"If the timeout parameter is not a null pointer, it specifies a maximum
interval to wait for the selection to complete. If the specified
time interval expires without any requested operation becoming ready,
the function shall return."
Additionally:
"If the requested timeout interval requires a finer granularity than
the implementation supports, the actual timeout interval shall be
rounded up to the next supported value."
Unfortunately, fixing the fencepost error places a hard lower limit of
1/HZ on the time slept, and increases the average minimum sleep time
threefold, from 1/(2*HZ) jiffy to 3/(2*HZ).
Please consider applying.
Bill Rugolsky
--- linux/fs/select.c 2004-02-20 14:29:11.000000000 -0500
+++ linux/fs/select.c 2004-02-20 14:30:18.326814232 -0500
@@ -313,8 +313,8 @@
if (sec < 0 || usec < 0 || usec >= 1000000)
goto out_nofds;
- if ((unsigned long) sec < (MAX_SCHEDULE_TIMEOUT-1) / HZ - 1) {
- timeout = ROUND_UP(usec, 1000000/HZ);
+ if ((unsigned long) sec < (MAX_SCHEDULE_TIMEOUT-2) / HZ - 1) {
+ timeout = ROUND_UP(usec, 1000000/HZ) + 1;
timeout += sec * (unsigned long) HZ;
} else {
timeout = MAX_SCHEDULE_TIMEOUT-1;
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH][4/4] poll()/select() timeout behavior
2004-02-20 21:04 [PATCH][4/4] poll()/select() timeout behavior Bill Rugolsky Jr.
@ 2004-02-21 4:13 ` Andrew Morton
2004-02-21 16:18 ` Jamie Lokier
0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2004-02-21 4:13 UTC (permalink / raw)
To: Bill Rugolsky Jr.; +Cc: torvalds, linux-kernel
"Bill Rugolsky Jr." <brugolsky@telemetry-investments.com> wrote:
>
> This patch forces select() to wait *at least* the specified timeout if
> no events have occurred, same as poll(). The SUSv3 man page for select(2)
> says:
>
> "If the timeout parameter is not a null pointer, it specifies a maximum
> interval to wait for the selection to complete. If the specified
> time interval expires without any requested operation becoming ready,
> the function shall return."
>
> Additionally:
>
> "If the requested timeout interval requires a finer granularity than
> the implementation supports, the actual timeout interval shall be
> rounded up to the next supported value."
>
> Unfortunately, fixing the fencepost error places a hard lower limit of
> 1/HZ on the time slept, and increases the average minimum sleep time
> threefold, from 1/(2*HZ) jiffy to 3/(2*HZ).
I'm inclined to live with the current behaviour rather than
risk breaking existing apps.
>
> --- linux/fs/select.c 2004-02-20 14:29:11.000000000 -0500
> +++ linux/fs/select.c 2004-02-20 14:30:18.326814232 -0500
> @@ -313,8 +313,8 @@
> if (sec < 0 || usec < 0 || usec >= 1000000)
> goto out_nofds;
>
> - if ((unsigned long) sec < (MAX_SCHEDULE_TIMEOUT-1) / HZ - 1) {
> - timeout = ROUND_UP(usec, 1000000/HZ);
> + if ((unsigned long) sec < (MAX_SCHEDULE_TIMEOUT-2) / HZ - 1) {
> + timeout = ROUND_UP(usec, 1000000/HZ) + 1;
> timeout += sec * (unsigned long) HZ;
> } else {
> timeout = MAX_SCHEDULE_TIMEOUT-1;
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH][4/4] poll()/select() timeout behavior
2004-02-21 4:13 ` Andrew Morton
@ 2004-02-21 16:18 ` Jamie Lokier
2004-02-21 19:27 ` Bill Rugolsky Jr.
0 siblings, 1 reply; 4+ messages in thread
From: Jamie Lokier @ 2004-02-21 16:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: Bill Rugolsky Jr., torvalds, linux-kernel
Andrew Morton wrote:
> > Unfortunately, fixing the fencepost error places a hard lower limit of
> > 1/HZ on the time slept, and increases the average minimum sleep time
> > threefold, from 1/(2*HZ) jiffy to 3/(2*HZ).
>
> I'm inclined to live with the current behaviour rather than
> risk breaking existing apps.
select's behaviour is fun when trying to do smooth game animation on
X... Humans are pretty good at noticing jitter in the animation of a
moving object. Years ago, I ended up writing an estimator which
deduced the granularity and rounding of select(), so that I could then
_reduce_ the timeout given to select() followed by a busy wait up to
the desired time. That was needed for SunOS. Nowadays with 1kHz
jiffies it's not a problem, but not all systems have that.
So, I agree, the change might break current apps.
If the current behaviour is retained, shouldn't select(), poll() and
epoll() at least agree on the same rounding direction? poll/epoll
should be suitable as replacements for select, but I don't think they
are timing-wise.
(Btw, Bill, did you take a look at epoll too?)
-- Jamie
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH][4/4] poll()/select() timeout behavior
2004-02-21 16:18 ` Jamie Lokier
@ 2004-02-21 19:27 ` Bill Rugolsky Jr.
0 siblings, 0 replies; 4+ messages in thread
From: Bill Rugolsky Jr. @ 2004-02-21 19:27 UTC (permalink / raw)
To: Jamie Lokier; +Cc: Andrew Morton, torvalds, linux-kernel
On Sat, Feb 21, 2004 at 04:18:06PM +0000, Jamie Lokier wrote:
> select's behaviour is fun when trying to do smooth game animation on
> X... Humans are pretty good at noticing jitter in the animation of a
> moving object. Years ago, I ended up writing an estimator which
> deduced the granularity and rounding of select(), so that I could then
> _reduce_ the timeout given to select() followed by a busy wait up to
> the desired time. That was needed for SunOS. Nowadays with 1kHz
> jiffies it's not a problem, but not all systems have that.
>
> So, I agree, the change might break current apps.
Hi Jamie,
Sorry I forgot to CC you on this ...
I don't feel strongly about the select() fencepost change, so no matter.
The man page wording in the past has been "unfortunate", using phrases
like "no longer than", which is a ridiculously imprecise wording in
light of how timers and sleep queues actually work.
As for the behaviour of usec overflow, there are probably existing
apps that abuse the interface (which is strictly checked on other platforms).
Below is a merged patch which drops the select fencepost change and handles
usec overflow instead of rejecting it. It incorporates the poll overflow
fix and the "don't sleep forever" semantic (and cleans up the use of < v. <=
in my earlier patches).
> If the current behaviour is retained, shouldn't select(), poll() and
> epoll() at least agree on the same rounding direction? poll/epoll
> should be suitable as replacements for select, but I don't think they
> are timing-wise.
You can compare the two manpages:
http://www.opengroup.org/onlinepubs/007904975/functions/select.html
If none of the selected descriptors are ready for the requested operation,
the pselect() or select() function shall block until at least one of
the requested operations becomes ready, until the timeout occurs, or
until interrupted by a signal. The timeout parameter controls how long
the pselect() or select() function shall take before timing out. If
the timeout parameter is not a null pointer, it specifies a maximum
interval to wait for the selection to complete. If the specified time
interval expires without any requested operation becoming ready, the
function shall return. If the timeout parameter is a null pointer,
then the call to pselect() or select() shall block indefinitely until
at least one descriptor meets the specified criteria. To effect a poll,
the timeout parameter should not be a null pointer, and should point to
a zero-valued timespec structure.
...
Implementations may place limitations on the maximum timeout interval
supported. All implementations shall support a maximum timeout interval
of at least 31 days. If the timeout argument specifies a timeout interval
greater than the implementation-defined maximum value, the maximum value
shall be used as the actual timeout value. Implementations may also place
limitations on the granularity of timeout intervals. If the requested
timeout interval requires a finer granularity than the implementation
supports, the actual timeout interval shall be rounded up to the next
supported value.
[Btw, we don't honor the 31-day requirement on 32-bit platforms with HZ=1000;
I certainly don't care.]
http://www.opengroup.org/onlinepubs/007904975/functions/poll.html
If none of the defined events have occurred on any selected file
descriptor, poll() shall wait at least timeout milliseconds for an event
to occur on any of the selected file descriptors. If the value of timeout
is 0, poll() shall return immediately. If the value of timeout is -1,
poll() shall block until a requested event occurs or until the call
is interrupted.
Implementations may place limitations on the granularity of timeout
intervals. If the requested timeout interval requires a finer granularity
than the implementation supports, the actual timeout interval shall be
rounded up to the next supported value.
I think the intention is that poll() requires the +1 jiffy.
Since it is not stated explicitly in the manpage, this raises the question:
should poll() return early if the timeout exceeds the implementation
maximum? I think so, because in a correctly written poll() loop it does
no harm, while the existing behavior is problematic. The select() manpage
explicitly specifies the early return.
> (Btw, Bill, did you take a look at epoll too?)
I hadn't, but I just did, and it also has the poll() overflow problem.
Interestingly, it has the select() fencepost behavior.
Should epoll() behave like poll() or select(), or should
we modify poll() to also behave like select()?
In the patch below I've chosen the first option.
W.r.t the overflow calculation: in the typical (non-overflow) case
we can trade off a compare/branch for one of the constant divisions
by 1000. I have no clue which is more costly.
- Bill Rugolsky
--- linux/fs/select.c 2004-02-17 22:57:12.000000000 -0500
+++ linux/fs/select.c 2004-02-21 12:55:32.357875000 -0500
@@ -291,8 +291,6 @@
* Update: ERESTARTSYS breaks at least the xview clock binary, so
* I'm trying ERESTARTNOHAND which restart only when you want to.
*/
-#define MAX_SELECT_SECONDS \
- ((unsigned long) (MAX_SCHEDULE_TIMEOUT / HZ)-1)
asmlinkage long
sys_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp, struct timeval __user *tvp)
@@ -315,9 +313,16 @@
if (sec < 0 || usec < 0)
goto out_nofds;
- if ((unsigned long) sec < MAX_SELECT_SECONDS) {
- timeout = ROUND_UP(usec, 1000000/HZ);
- timeout += sec * (unsigned long) HZ;
+ /* be careful about overflow */
+ if ((unsigned long) sec < ((MAX_SCHEDULE_TIMEOUT-1) / HZ)) {
+ long fraction = ROUND_UP(usec, 1000000/HZ);
+ timeout = sec * (unsigned long) HZ;
+ if ((unsigned long) timeout < MAX_SCHEDULE_TIMEOUT - fraction)
+ timeout += fraction;
+ else
+ timeout = MAX_SCHEDULE_TIMEOUT-1;
+ } else {
+ timeout = MAX_SCHEDULE_TIMEOUT-1;
}
}
@@ -469,11 +474,17 @@
return -EINVAL;
if (timeout) {
- /* Careful about overflow in the intermediate values */
- if ((unsigned long) timeout < MAX_SCHEDULE_TIMEOUT / HZ)
- timeout = (unsigned long)(timeout*HZ+999)/1000+1;
- else /* Negative or overflow */
+ if (timeout < 0) {
timeout = MAX_SCHEDULE_TIMEOUT;
+ } else {
+ /* Careful about overflow in the intermediate values */
+ long seconds = timeout/1000;
+ timeout = ((timeout - 1000*seconds)*HZ + 999)/1000 + 1;
+ if (seconds < (MAX_SCHEDULE_TIMEOUT-2) / HZ)
+ timeout += seconds * HZ;
+ else
+ timeout = MAX_SCHEDULE_TIMEOUT-1;
+ }
}
poll_initwait(&table);
--- linux/fs/eventpoll.c 2004-02-17 22:57:29.000000000 -0500
+++ linux/fs/eventpoll.c 2004-02-21 13:38:29.987875000 -0500
@@ -1576,7 +1576,6 @@
{
int res, eavail;
unsigned long flags;
- long jtimeout;
wait_queue_t wait;
/*
@@ -1584,8 +1583,17 @@
* and the overflow condition. The passed timeout is in milliseconds,
* that why (t * HZ) / 1000.
*/
- jtimeout = timeout == -1 || timeout > (MAX_SCHEDULE_TIMEOUT - 1000) / HZ ?
- MAX_SCHEDULE_TIMEOUT: (timeout * HZ + 999) / 1000;
+ if (timeout < 0) {
+ timeout = MAX_SCHEDULE_TIMEOUT;
+ } else {
+ /* Careful about overflow in the intermediate values */
+ long seconds = timeout/1000;
+ timeout = ((timeout - 1000*seconds)*HZ + 999)/1000 + 1;
+ if (seconds < (MAX_SCHEDULE_TIMEOUT-2) / HZ)
+ timeout += seconds * HZ;
+ else
+ timeout = MAX_SCHEDULE_TIMEOUT-1;
+ }
retry:
write_lock_irqsave(&ep->lock, flags);
@@ -1607,7 +1615,7 @@
* to TASK_INTERRUPTIBLE before doing the checks.
*/
set_current_state(TASK_INTERRUPTIBLE);
- if (!list_empty(&ep->rdllist) || !jtimeout)
+ if (!list_empty(&ep->rdllist) || !timeout)
break;
if (signal_pending(current)) {
res = -EINTR;
@@ -1615,7 +1623,7 @@
}
write_unlock_irqrestore(&ep->lock, flags);
- jtimeout = schedule_timeout(jtimeout);
+ timeout = schedule_timeout(timeout);
write_lock_irqsave(&ep->lock, flags);
}
remove_wait_queue(&ep->wq, &wait);
@@ -1634,7 +1642,7 @@
* more luck.
*/
if (!res && eavail &&
- !(res = ep_events_transfer(ep, events, maxevents)) && jtimeout)
+ !(res = ep_events_transfer(ep, events, maxevents)) && timeout)
goto retry;
return res;
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2004-02-21 19:28 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-02-20 21:04 [PATCH][4/4] poll()/select() timeout behavior Bill Rugolsky Jr.
2004-02-21 4:13 ` Andrew Morton
2004-02-21 16:18 ` Jamie Lokier
2004-02-21 19:27 ` Bill Rugolsky Jr.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox