From: Marcin 'Qrczak' Kowalczyk <qrczak@knm.org.pl>
To: linux-fsdevel@vger.kernel.org
Subject: Re: Bug: epoll_wait timeout is shorter than requested
Date: Mon, 17 Jan 2005 17:18:34 +0100
Message-ID: <87acr8uizp.fsf@qrnik.zagroda>
In-Reply-To: <20050117143348.GA23427@mail.shareable.org> (Jamie Lokier's message of "Mon, 17 Jan 2005 14:33:48 +0000")

Jamie Lokier <jamie@shareable.org> writes:

> If you call select with { 0, 10000 } - that is, 10 milliseconds, then
> you get a delay between 0ms and 10ms on a 100Hz kernel.
>
> That is easy to measure.  Just call select() in a loop and observe the
> times.

I think HZ is now 1000 on x86, so I can't determine experimentally
which 1ms shifts come from the resolution of poll/epoll interface and
which come from the timer frequency.

What you are saying implies that the amount by which select/epoll may
shorten the timeout depends on the timer frequency. Thus a program
which does not know the timer frequency can't know how much to make
the timeout longer, without risking having to sleep again.

I can observe that select rounds the timeout up to a multiple of 1ms,
and then waits for an amount between the resulting time and 1ms
shorter. If the timeout is 12.5ms, it will wait between 12ms and 13ms;
the same is true for any requested timeout > 12ms and <= 13ms.

I guess the 1ms here is actually the timer tick, and that for epoll
the rules are the same, except that the timeout is specified in ms.
That is, it is rounded up to a multiple of timer ticks, and then the
actual timeout is between 0 and 1 tick shorter, such that it ends at
some tick. Right?

This means that depending on the fraction of the current tick which
has elapsed, and the fraction of the timeout we want to sleep, the
optimal request may have two possible values. By optimal request I
mean the one which will give us the shortest delay which is not
shorter than the one we actually want. We don't know which request
to give if we don't know the timer frequency.

For example, assuming 100Hz clock, if we are 3.333ms after a tick
and we want to sleep at least for 124ms, we should give some timeout
between 121ms and 130ms. We will actually sleep 126.667ms, which is
fine. But if we are 6.666ms after a tick and we want to sleep at
least for 126ms, we should give some timeout between 131ms and 140ms.
This will give us an actual delay of 133.334ms - one tick earlier
would be too short.
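
The arithmetic of the example can be checked with a small sketch. It
assumes the model guessed above (a request of n ticks ends at the n-th
tick boundary counted from the last tick), which is not taken from the
kernel source. The function names are made up for illustration, and all
times are in microseconds:

```c
#include <assert.h>

/* Assumed model: a request of n_ticks ends at the n-th tick boundary
 * after the last tick, so the real delay is n_ticks * tick_us minus
 * phase_us, the time already elapsed since the last tick. */
long actual_delay_us(long n_ticks, long tick_us, long phase_us)
{
    assert(tick_us > 0 && phase_us >= 0 && phase_us < tick_us);
    return n_ticks * tick_us - phase_us;
}

/* Smallest tick count whose real delay is not shorter than want_us,
 * i.e. ceil((want_us + phase_us) / tick_us). */
long optimal_ticks(long want_us, long tick_us, long phase_us)
{
    assert(want_us >= 0 && tick_us > 0);
    assert(phase_us >= 0 && phase_us < tick_us);
    return (want_us + phase_us + tick_us - 1) / tick_us;
}
```

With tick_us = 10000, optimal_ticks(124000, 10000, 3333) is 13 ticks
(any request from 121ms to 130ms) for a real delay of 126667us, and
optimal_ticks(126000, 10000, 6666) is 14 ticks for 133334us. Knowing
tick_us and phase_us is exactly what a user program can't do.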

So perhaps my program should indeed do what it currently does.
Sometimes the actual delay will be too short and a separate epoll call
will sleep the remaining tick. But if the program always added one
tick, it would sometimes sleep one tick longer than necessary (and
another problem would be that it does not know the timer frequency,
so it can't add one tick).

I think this gives the optimal behavior wrt. the number of ticks to
sleep, and the only disadvantage is more syscalls in some cases.

The kernel could make this better because it knows the timer frequency
and it can determine the fraction of the current tick: it would make
the delay longer by 1 tick in case the rest of the current tick is
shorter than the fractional part of the delay (wrt. a tick) reduced by
the unit of resolution of the interface (to allow for 1 unit to mean
"until the next tick").

But it would help only in case the tick is longer than the resolution
of the epoll interface, so perhaps it's not worth the effort - I think
today it's usually 1ms, equal to the epoll resolution. With select it
would help more than with epoll, because the select interface has a
finer resolution, but OTOH select is old-fashioned. And it would only
help in saving some syscalls, it would not provide a behavior which
is unimplementable today.
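
The rule proposed above for the kernel can be sketched as plain
arithmetic. This is only an illustration, not kernel code: the function
name and parameters are made up, times are in microseconds, and it
assumes that the "fractional part" of a delay which is an exact multiple
of a tick counts as one full tick:

```c
#include <assert.h>

/* Number of ticks to sleep for a request of delay_us, given:
 *   tick_us  - length of one timer tick
 *   phase_us - time already elapsed in the current tick
 *   unit_us  - resolution of the interface (1000 for epoll_wait) */
long ticks_to_sleep(long delay_us, long tick_us, long phase_us, long unit_us)
{
    assert(delay_us > 0 && tick_us > 0 && unit_us > 0);
    assert(phase_us >= 0 && phase_us < tick_us);

    long ticks = (delay_us + tick_us - 1) / tick_us; /* round up */
    long frac  = delay_us - (ticks - 1) * tick_us;   /* in (0, tick_us] */
    long rest  = tick_us - phase_us;                 /* left in this tick */

    if (rest < frac - unit_us)
        ticks++;        /* otherwise the sleep would end too early */
    return ticks;
}
```

For the 100Hz example, ticks_to_sleep(124000, 10000, 3333, 1000) gives
13 and ticks_to_sleep(126000, 10000, 6666, 1000) gives 14, while a
request of one unit still means "until the next tick".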

> The man page for select says the timeout serves as an upper bound.

Well, because of other processes a timeout can always become longer
than requested. It should be an upper bound in the sense that it will
return earlier if a fd is ready. But it should not return earlier if
fds are not ready, pretending that the timeout expired while in fact
it did not.

Except that, as you say, it would prevent specifying a timeout
"until the next tick, even if it's shorter than the resolution
of the interface"...

> By the way, select(), poll() and epoll_wait() all have another bug: if
> the timeout parameter is too large, they'll wait *indefinitely*.  They
> call schedule_timeout(MAX_SCHEDULE_TIMEOUT) in that case, which just
> calls schedule() with no timer.

Oops. But this should be easy to fix: give MAX_SCHEDULE_TIMEOUT-1.
It's LONG_MAX ticks, i.e. 24 days on a 32-bit machine with 1000Hz
timer?
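
A quick check of that figure, as a sketch. The function name is made up
for illustration, and it assumes the tick count is a signed integer of
the given width:

```c
#include <assert.h>
#include <stdint.h>

/* Longest timeout expressible as a tick count, in days, when the
 * count is a signed integer of long_bits bits and the timer runs at
 * hz ticks per second. */
double max_timeout_days(int long_bits, int hz)
{
    assert(long_bits >= 2 && long_bits <= 64 && hz > 0);
    uint64_t max_ticks = (UINT64_C(1) << (long_bits - 1)) - 1; /* LONG_MAX */
    return (double)max_ticks / hz / 86400.0;
}
```

max_timeout_days(32, 1000) comes out a little under 25 days, and ten
times that with a 100Hz timer.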

> If select/poll/epoll were implemented by the kernel reading the
> current time accurately before deciding how many ticks to wait for,
> they could satisfy SUSv3's constraint, _and_ allow the useful
> behaviour of application events at the tick rate, _and_ reduce the
> number of system calls in some programs which call select().

Right.

> If you want to change the code in fs/select.c and fs/eventpoll.c to
> do this, please do so; I'll be happy to support the case for it.

I'm still not sure what the behavior should be. It seems poll and
epoll with their current interfaces can't be made better if the tick
frequency is 1000Hz...

> By the way, the most logically useful interface would take an
> *absolute* end time, in any of the forms that the POSIX timer code
> allows.

Yes! I actually have an absolute time, and compute a timeout from it.

Even if a user of my language specifies a relative time, I convert it
to absolute time first. Then it's converted to relative time in order
to pass the timeout to epoll/poll/select, and then the kernel probably
converts it to absolute time again.
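
The middle step of that chain looks roughly like this. It is a sketch,
not my runtime's actual code; the function name is made up, and both
times are assumed to come from the same clock:

```c
#include <assert.h>
#include <limits.h>
#include <time.h>

/* Convert an absolute deadline to a relative timeout in ms for
 * poll/epoll_wait. Rounds up so that the request is never shorter
 * than the time left; returns 0 if the deadline has already passed
 * and clamps the result to INT_MAX. */
int timeout_ms_until(struct timespec deadline, struct timespec now)
{
    assert(deadline.tv_nsec >= 0 && deadline.tv_nsec < 1000000000L);
    assert(now.tv_nsec >= 0 && now.tv_nsec < 1000000000L);

    long long ns = (long long)(deadline.tv_sec - now.tv_sec) * 1000000000LL
                 + (deadline.tv_nsec - now.tv_nsec);
    if (ns <= 0)
        return 0;

    long long ms = (ns + 999999) / 1000000;
    return ms > INT_MAX ? INT_MAX : (int)ms;
}
```

Rounding up loses the sub-millisecond part of the deadline, which an
interface taking the absolute time directly would keep. Given the
problem with too large timeouts mentioned above, a caller may prefer a
smaller cap than INT_MAX.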

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
