From: Jamie Lokier <jamie@shareable.org>
To: linux-fsdevel@vger.kernel.org
Subject: Re: Bug: epoll_wait timeout is shorter than requested
Date: Mon, 17 Jan 2005 16:48:36 +0000
Message-ID: <20050117164836.GA25815@mail.shareable.org>
In-Reply-To: <87acr8uizp.fsf@qrnik.zagroda>

Marcin 'Qrczak' Kowalczyk wrote:
> > If you call select with { 0, 10000 } - that is, 10 milliseconds, then
> > you get a delay between 0ms and 10ms on a 100Hz kernel.
> >
> > That is easy to measure. Just call select() in a loop and observe the
> > times.
>
> I think HZ is now 1000 on x86, so I can't determine experimentally
> which 1ms shifts come from the resolution of poll/epoll interface and
> which come from the timer frequency.
Sure you can.  Call select(0,0,0,0,{0,1000}) in a loop, and you'll
see it returns every 1ms.

Call poll(0,0,1) and you'll see it returns every 2ms.

Call epoll_wait(epfd,0,0,1) and you'll see it returns every 1ms.

That 2ms minimum wait with poll() is the problem, although it was more
of a problem when HZ was 100 (as it still is on some architectures).
> What you are saying implies that the amount by which select/epoll may
> shorten the timeout depends on the timer frequency. Thus a program
> which does not know the timer frequency can't know how much to make
> the timeout longer, without risking having to sleep again.
Correct.

Another way to look at it is that a program which doesn't know the
timer frequency can't know how much to make the poll() timeout shorter
if it wants the shortest non-zero timeouts, or if it is trying to time
an event as close as possible to an absolute time.

In my experience with a simple interactive X video game, the only
portable way to do this is to actually measure the times at which
select/poll return and deduce the OS's granularity and late/early
rounding using some kind of control system estimator.  (Then, if you
need finer granularity, as the game did, you need a busy loop to cover
the remaining sub-tick time.)
> I guess the 1ms here is actually the timer tick and that in case of
> epoll rules are the same, except that the timeout is specified in ms.
> That is, it is rounded up to a multiple of timer ticks, and then the
> actual timeout is between 0 and 1 tick shorter, such that it ends at
> some tick. Right?
Yes. Look at the kernel code: the only difference between epoll and
poll is that poll has "+1" at the end of the equation for the number
of ticks to wait.
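Written out as a sketch in C, following the description above rather
than quoting the kernel source (the function names and the explicit hz
parameter are mine, for illustration):

```c
#include <assert.h>

/* Ticks waited for an epoll_wait timeout: milliseconds rounded up
 * to a whole number of ticks. */
static long epoll_ticks(long timeout_ms, long hz)
{
    assert(timeout_ms >= 0 && hz > 0);
    return (timeout_ms * hz + 999) / 1000;
}

/* Ticks waited for a poll timeout: the same rounding, plus the
 * extra "+1" at the end. */
static long poll_ticks(long timeout_ms, long hz)
{
    return epoll_ticks(timeout_ms, hz) + 1;
}
```

With hz = 1000 and a 1ms timeout this gives 1 tick for epoll_wait and
2 ticks for poll, which matches the 1ms and 2ms loop periods mentioned
earlier.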
> This means that depending on the fraction of the current tick which
> has elapsed, and the fraction of the timeout we want to sleep, the
> optimal request may have two possible values. By optimal request I
> mean the one which will give us the shortest delay which is not
> shorter than the one we actually want. We don't know which request
> to give if we don't know the timer frequency.
>
> For example, assuming 100Hz clock, if we are 3.333ms after a tick
> and we want to sleep at least for 124ms, we should give some timeout
> between 121ms and 130ms. We will actually sleep 126.667ms, which is
> fine. But if we are 6.666ms after a tick and we want to sleep at
> least for 126ms, we should give some timeout between 131ms and 140ms.
> This will give us an actual delay of 133.334ms - one tick earlier
> would be too short.
>
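The arithmetic in that example can be checked with a small sketch in
C.  It assumes the model described above: the call is made phase_us
after a tick and the wait ends exactly on a later tick.  Times are in
integer microseconds, and the function names are hypothetical.

```c
#include <assert.h>

/* Smallest number of ticks n such that waking on the n-th tick
 * after the call sleeps at least want_us.  The call is made
 * phase_us after the most recent tick. */
static long ticks_needed(long want_us, long phase_us, long tick_us)
{
    assert(tick_us > 0 && phase_us >= 0 && phase_us < tick_us);
    assert(want_us >= 0);
    return (want_us + phase_us + tick_us - 1) / tick_us;
}

/* Actual sleep when the wait ends on the n-th tick after the call. */
static long actual_sleep_us(long ticks, long phase_us, long tick_us)
{
    return ticks * tick_us - phase_us;
}
```

For a 10ms tick, ticks_needed(124000, 3333, 10000) is 13 and the
actual sleep is 126667us; ticks_needed(126000, 6666, 10000) is 14 and
the actual sleep is 133334us.  Thirteen ticks in the second case would
give 123334us, which is too short.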
> So perhaps my program should indeed do what it currently does.
> Sometimes the actual delay will be too short and a separate epoll call
> will sleep the remaining tick. But if the program always added one
> tick, it would sometimes sleep one tick longer than necessary (and
> another problem would be that it doesn't know the timer frequency,
> so it can't add one tick).
>
> I think this gives the optimal behavior wrt. the number of ticks to
> sleep, and the only disadvantage is more syscalls in some cases.
>
> The kernel could make this better because it knows the timer frequency
> and it can determine the fraction of the current tick: it would make
> the delay longer by 1 tick in case the rest of the current tick is
> shorter than the fractional part of the delay (wrt. a tick) reduced by
> the unit of resolution of the interface (to allow for 1 unit to mean
> "until the next tick").
>
> But it would help only in case the tick is longer than the resolution
> of the epoll interface, so perhaps it's not worth the effort - I think
> today it's usually 1ms, equal to the epoll resolution. With select it
> would help more than with epoll, because the select interface has a
> finer resolution, but OTOH select is old-fashioned. And it would only
> help in saving some syscalls, it would not provide a behavior which
> is unimplementable today.
>
> > The man page for select says the timeout serves as an upper bound.
>
> Well, because of other processes a timeout can always become longer
> than requested. It should be an upper bound in the sense that it will
> return earlier if a fd is ready. But it should not return earlier if
> fds are not ready, pretending that the timeout expired while in fact
> it did not.
>
> Except that, as you say, it would prevent specifying a timeout
> "until the next tick, even if it's shorter than the resolution
> of the interface"...
>
> > By the way, select(), poll() and epoll_wait() all have another bug: if
> > the timeout parameter is too large, they'll wait *indefinitely*. They
> > call schedule_timeout(MAX_SCHEDULE_TIMEOUT) in that case, which just
> > calls schedule() with no timer.
>
> Oops. But this should be easy to fix: give MAX_SCHEDULE_TIMEOUT-1.
Yes. Feel free to submit the patch.
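A user-space sketch of what such a clamp could look like.  This is not
a kernel patch: SKETCH_MAX_TIMEOUT is a stand-in I am assuming for
MAX_SCHEDULE_TIMEOUT, the function name is hypothetical, and a
negative timeout is treated as "wait forever" in the poll/epoll style.

```c
#include <assert.h>
#include <limits.h>

/* Stand-in for the kernel's "no timer" sentinel (assumption). */
#define SKETCH_MAX_TIMEOUT LONG_MAX

/* Convert a millisecond timeout to ticks.  Only a negative timeout
 * maps to the "wait forever" sentinel; an oversized finite timeout
 * is clamped to the longest finite wait instead. */
static long ms_to_ticks_clamped(long timeout_ms, long hz)
{
    assert(hz > 0);
    if (timeout_ms < 0)
        return SKETCH_MAX_TIMEOUT;

    /* If timeout_ms * hz + 999 could overflow or reach the
     * sentinel, use the largest value that still arms a timer. */
    if (timeout_ms > (SKETCH_MAX_TIMEOUT - 1000) / hz)
        return SKETCH_MAX_TIMEOUT - 1;

    return (timeout_ms * hz + 999) / 1000;
}
```

The point of the clamp is that a very large but finite request still
ends eventually, rather than silently turning into an indefinite wait.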
> > If select/poll/epoll were implemented by the kernel reading the
> > current time accurately before deciding how many ticks to wait for,
> > they could satisfy SUSv3's constraint, _and_ allow the useful
> > behaviour of application events at the tick rate, _and_ reduce the
> > number of system calls in some programs which call select().
>
> Right.
>
> > If you want to change the code in fs/select.c and fs/eventpoll.c to
> > do this, please do so; I'll be happy to support the case for it.
>
> I'm still not sure what the behavior should be. It seems poll and
> epoll with their current interfaces can't be made better if the tick
> frequency is 1000Hz...
Correct - but the tick frequency isn't 1000Hz on all architectures and
it isn't likely to change either, because 1000Hz is too fast for
slower CPUs such as in routers and PDAs.
> > By the way, the most logically useful interface would take an
> > *absolute* end time, in any of the forms that the POSIX timer code
> > allows.
>
> Yes! I actually have an absolute time, and compute a timeout from it.
Nearly every program does.
> Even if a user of my language specifies a relative time, I convert it
> to absolute time first. Then it's converted to relative time in order
> to pass the timeout to epoll/poll/select, and then the kernel probably
> converts it to absolute time again.
Quite. Silly interface, isn't it? :) We even get to waste significant
numbers of cycles reading the timer chip every time. 2 microseconds
on your system, ~20 microseconds on some others.
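For reference, the user-space half of that round trip usually looks
something like this sketch in C.  The helper name ms_until is made up;
it rounds up so that the requested timeout is never shorter than the
time remaining, and clamps to the int range that poll and epoll_wait
accept.

```c
#include <assert.h>
#include <limits.h>
#include <time.h>

/* Milliseconds from `now` until `deadline`, rounded up.  Returns 0
 * if the deadline has already passed, and at most INT_MAX. */
static int ms_until(const struct timespec *deadline,
                    const struct timespec *now)
{
    assert(deadline != NULL && now != NULL);

    long long ns =
        (long long)(deadline->tv_sec - now->tv_sec) * 1000000000LL
        + (deadline->tv_nsec - now->tv_nsec);
    if (ns <= 0)
        return 0;

    long long ms = (ns + 999999) / 1000000;   /* round up to 1ms */
    return ms > INT_MAX ? INT_MAX : (int)ms;
}
```

Since the wait can still end before the deadline, a caller would
re-read the clock after the wait returns and call again with the new
remainder, which is exactly the extra clock read complained about
above.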
-- Jamie