From: Jamie Lokier <jamie@shareable.org>
To: linux-fsdevel@vger.kernel.org
Subject: Re: Bug: epoll_wait timeout is shorter than requested
Date: Mon, 17 Jan 2005 16:48:36 +0000 [thread overview]
Message-ID: <20050117164836.GA25815@mail.shareable.org> (raw)
In-Reply-To: <87acr8uizp.fsf@qrnik.zagroda>
Marcin 'Qrczak' Kowalczyk wrote:
> > If you call select with { 0, 10000 } - that is, 10 milliseconds, then
> > you get a delay between 0ms and 10ms on a 100Hz kernel.
> >
> > That is easy to measure. Just call select() in a loop and observe the
> > times.
>
> I think HZ is now 1000 on x86, so I can't determine experimentally
> which 1ms shifts come from the resolution of poll/epoll interface and
> which come from the timer frequency.
Sure you can. Call select(0,0,0,0,{0,1000}) in a loop, and you'll
see it returns every 1ms.
Call poll(0,0,1) and you'll see it returns every 2ms.
Call epoll_wait (epfd,0,0,1) and you'll see it returns every 1ms.
That 2ms minimum wait with poll() is the problem, although it was more
of a problem when HZ was 100 (as it still is on some architectures).
> What you are saying implies that the amount by which select/epoll may
> shorten the timeout depends on the timer frequency. Thus a program
> which does not know the timer frequency can't know how much to make
> the timeout longer, without risking having to sleep again.
Correct.
Another way to look at it is that a program which doesn't know the
timer frequency can't know how much to make the poll() timeout shorter
if it wants shortest non-zero timeouts, or if it is trying to time an
event as close as possible to an absolute time.
In my experience with a simple interactive X video game, the only
portable way to do this is to actually measure the times at which
select/poll return and deduce the OS's granularity and late/early
rounding using some kind of control system estimator. (Then if you
need finer granularity (as the game did) you need a busy loop to add
the remaining sub-tick time).
> I guess the 1ms here is actually the timer tick and that in case of
> epoll rules are the same, except that the timeout is specified in ms.
> That is, it is rounded up to a multiple of timer ticks, and then the
> actual timeout is between 0 and 1 tick shorter, such that it ends at
> some tick. Right?
Yes. Look at the kernel code: the only difference between epoll and
poll is that poll has "+1" at the end of the equation for the number
of ticks to wait.
> This means that depending on the fraction of the current tick which
> has elapsed, and the fraction of the timeout we want to sleep, the
> optimal request may have two possible values. By optimal request I
> mean the one which will give us the shortest delay which is not
> shorter than the one we actually want. We don't know which request
> to give if we don't know the timer frequency.
>
> For example, assuming 100Hz clock, if we are 3.333ms after a tick
> and we want to sleep at least for 124ms, we should give some timeout
> between 121ms and 130ms. We will actually sleep 126.667ms, which is
> fine. But if we are 6.666ms after a tick is and we want to sleep at
> least for 126ms, we should give some timeout between 131ms and 140ms.
> This will give us an actual delay of 133.334ms - one tick earlier
> would be too short.
>
> So perhaps my program should indeed do what it currently does.
> Sometimes the actual delay will be too short and a separate epoll call
> will sleep the remaining tick. But if the program always added one
> tick, it would sometimes sleep one tick longer than necessary (and
> another problem would be that it does know the timer frequency,
> so it can't add one tick).
>
> I think this gives the optimal behavior wrt. the number of ticks to
> sleep, and the only disadvantage is more syscalls in some cases.
>
> The kernel could make this better because it knows the timer frequency
> and it can determine the fraction of the current tick: it would make
> the delay longer by 1 tick in case the rest of the current tick is
> shorter than the fractional part of the delay (wrt. a tick) reduced by
> the unit of resolution of the interface (to allow for 1 unit to mean
> "until the next tick").
>
> But it would help only in case the tick is longer than the resolution
> of the epoll interface, so perhaps it's not worth the effort - I think
> today it's usually 1ms, equal to the epoll resolution. With select it
> would help more than with epoll, because the select interface has a
> finer resolution, but OTOH select is old-fashioned. And it would only
> help in saving some syscalls, it would not provide a behavior which
> is unimplementable today.
>
> > The man page for select says the timeout serves as an upper bound.
>
> Well, because of other processes a timeout can always become longer
> than requested. It should be an upper bound in the sense that it will
> return earlier if a fd is ready. But it should not return earlier if
> fds are not ready, pretending that the timeout expired while in fact
> it did not.
>
> Except that, as you say, it would prevent specifying a timeout
> "until the next tick, even if it's shorter than the resolution
> of the interface"...
>
> > By the way, select(), poll() and epoll_wait() all have another bug: if
> > the timeout parameter is too large, they'll wait *indefinitely*. They
> > call schedule_timeout(MAX_SCHEDULE_TIMEOUT) in that case, which just
> > calls schedule() with no timer.
>
> Oops. But this should be easy to fix: give MAX_SCHEDULE_TIMEOUT-1.
Yes. Feel free to submit the patch.
> > If select/poll/epoll were implemented by the kernel reading the
> > current time accurately before deciding how many ticks to wait for,
> > they could satisfy SUSv3's constraint, _and_ allow the useful
> > behaviour of application events at the tick rate, _and_ reduce the
> > number of system calls in some programs which call select().
>
> Right.
>
> > If you want to change the code in fs/select.c and fs/eventpoll.c to
> > do this, please do so; I'll be happy to support the case for it.
>
> I'm still not sure what the behavior should be. It seems poll and
> epoll with their current interfaces can't be made better if the tick
> frequency is 1000Hz...
Correct - but the tick frequency isn't 1000Hz on all architectures and
it isn't likely to change either, because 1000Hz is too fast for
slower CPUs such as in routers and PDAs.
> > By the way, the most logically useful interface would take an
> > *absolute* end time, in any of the forms that the POSIX timer code
> > allows.
>
> Yes! I actually have an absolute time, and compute a timeout from it.
Nearly every program does.
> Even if a user of my language specifies a relative time, I convert it
> to absolute time first. Then it's converted to relative time in order
> to pass the timeout to epoll/poll/select, and then the kernel probably
> converts it to absolute time again.
Quite. Silly interface, isn't it? :) We even get to waste significant
numbers of cycles reading the timer chip every time. 2 microseconds
on your system, ~20 microseconds on some others.
-- Jamie
next prev parent reply other threads:[~2005-01-17 16:48 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-01-17 11:15 Bug: epoll_wait timeout is shorter than requested Marcin 'Qrczak' Kowalczyk
2005-01-17 11:48 ` Jamie Lokier
2005-01-17 13:41 ` Marcin 'Qrczak' Kowalczyk
2005-01-17 14:33 ` Jamie Lokier
2005-01-17 14:43 ` Jamie Lokier
2005-01-17 16:18 ` Marcin 'Qrczak' Kowalczyk
2005-01-17 16:48 ` Jamie Lokier [this message]
2005-01-18 23:27 ` Marcin 'Qrczak' Kowalczyk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20050117164836.GA25815@mail.shareable.org \
--to=jamie@shareable.org \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).