From mboxrd@z Thu Jan  1 00:00:00 1970
From: Marcin 'Qrczak' Kowalczyk <qrczak@knm.org.pl>
Subject: Re: Bug: epoll_wait timeout is shorter than requested
Date: Wed, 19 Jan 2005 00:27:55 +0100
Message-ID: <87is5ub9ms.fsf@qrnik.zagroda>
References: <87651wl32d.fsf@qrnik.zagroda>
	<20050117114821.GB20152@mail.shareable.org>
	<87r7kk41gp.fsf@qrnik.zagroda>
	<20050117143348.GA23427@mail.shareable.org>
	<87acr8uizp.fsf@qrnik.zagroda>
	<20050117164836.GA25815@mail.shareable.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from paf87.warszawa.sdi.tpnet.pl ([217.96.225.87]:23310 "EHLO
	qrnik.knm.org.pl") by vger.kernel.org with ESMTP id S261475AbVARX15
	(ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Tue, 18 Jan 2005 18:27:57 -0500
Received: from qrczak by qrnik.knm.org.pl with local (Exim 3.36 #1)
	id 1Cr2lP-0001lh-00
	for linux-fsdevel@vger.kernel.org; Wed, 19 Jan 2005 00:27:55 +0100
To: linux-fsdevel@vger.kernel.org
In-Reply-To: <20050117164836.GA25815@mail.shareable.org> (Jamie Lokier's
 message of "Mon, 17 Jan 2005 16:48:36 +0000")
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Jamie Lokier <jamie@shareable.org> writes:

> In my experience with a simple interactive X video game, the only
> portable way to do this is to actually measure the times at which
> select/poll return and deduce the OS's granularity and late/early
> rounding using some kind of control system estimator.  (Then if you
> need finer granularity (as the game did) you need a busy loop to add
> the remaining sub-tick time).

I just implemented something like this in the runtime of my language.
It was surprisingly easy to adjust to varying behaviors of poll and
epoll.

At ./configure time I measure how long poll/epoll actually waits when
asked to wait for 1ms, with the call started just after a timer tick
(that is, right after a previous delay has returned).
This is the only tricky part, because other activity in the system may
disturb the result, in either direction. So I do this 20 times, sort
the results, skip the 3 shortest and 7 longest delays, and expect the
rest to be equal when rounded up to milliseconds. If they are not
equal, perhaps the system is too busy to give reliable results;
I wait for 1 second and repeat the experiment, up to 50 times.
Any better idea?
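
A minimal sketch of the measurement and filtering step described
above, written in Python rather than taken from the actual runtime.
The names sample_poll_delay_us and granularity_ms are hypothetical,
and the use of a monotonic clock is an assumption of this sketch.
The 1-second wait and the retry limit of 50 are left to the caller.

```python
import select
import time


def sample_poll_delay_us(timeout_ms=1):
    """Measure one poll() delay in microseconds (Unix only).

    A first poll() call is used to line up with a timer tick, then a
    second call with the same timeout is timed.
    """
    poller = select.poll()          # no fds registered: pure timeout
    poller.poll(timeout_ms)         # previous delay, ends near a tick
    start = time.monotonic()
    poller.poll(timeout_ms)
    return int((time.monotonic() - start) * 1_000_000)


def granularity_ms(samples_us, skip_low=3, skip_high=7):
    """Reduce raw samples to one number, or None if they disagree.

    Sorts the samples, drops the shortest and longest outliers, rounds
    the rest up to whole milliseconds and requires them to be equal.
    """
    kept = sorted(samples_us)[skip_low:len(samples_us) - skip_high]
    if not kept:
        return None
    rounded = {(s + 999) // 1000 for s in kept}   # round up to ms
    return rounded.pop() if len(rounded) == 1 else None
```

A caller would collect 20 values from sample_poll_delay_us(), pass
them to granularity_ms(), and retry after a pause whenever the result
is None.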

The result is in practice 1ms for epoll with 1000Hz, 10ms with 100Hz,
and twice as much for poll. It is 3ms for poll on Alpha (Linux-2.2.20),
I'm not sure why.

Anyway, the program doesn't need to know whether the actual delay can
be shorter or longer than requested, nor the clock frequency. It must
only know this one number.

When running a program, my scheduler computes the timeout for
epoll/poll/select based on the earliest wakeup time of sleeping
threads. After calling epoll/poll/select, if still no thread is ready,
the scheduler loops again.

Looping again used to be a rare event, caused by one of:
- the system clock has been adjusted during waiting
- the timeout was longer than INT_MAX
- spurious wakeup from epoll happened (threads are not unregistered
  from the epoll fd until a spurious wakeup actually happens, because
  usually it does not happen before they would reregister anyway)
- epoll returned earlier than asked (which prompted me to report this
  as a problem here)
- poll/epoll/select failed with EINTR, yet handling the signal did not
  wake up a thread

But now I have changed how the timeout is computed. It is
rounded down instead of up; the measured time described above is
subtracted; 1ms is added; if the result is below 0, 0 is substituted.
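
The rule above fits in a few lines. This is only a sketch in Python;
the function name poll_timeout_ms and the choice of microseconds for
the two time arguments are assumptions, not the runtime's real
interface.

```python
def poll_timeout_ms(wakeup_us, now_us, granularity_ms):
    """Timeout in ms to pass to poll/epoll for a planned wakeup time.

    granularity_ms is the delay measured for a 1ms request.
    """
    remaining_ms = (wakeup_us - now_us) // 1000    # round down, not up
    # Subtract the measured delay, add 1ms, and never go below 0.
    return max(0, remaining_ms - granularity_ms + 1)
```

For example, with 25.7ms left and a measured delay of 10ms the
timeout becomes 25 - 10 + 1 = 16ms; with only 5ms left it is 0.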

This lets poll/epoll return *before* the planned wakeup time instead
of after (unless the system is busy of course), and the loop will
almost always spin if some thread is about to finish sleeping. In
the next iteration the computed delay will be 0 and the loop will
degenerate to busy waiting, calling gettimeofday and poll/epoll with
timeout 0 (or just gettimeofday if there are no fds to wait for).
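
The resulting loop might look roughly like the sketch below, again in
Python and reduced to a single sleeping thread with no fds. The names
sleep_until, now_us and wait_ms are hypothetical; the clock and the
wait function are passed in so the loop can be exercised with fakes,
and the default clock is monotonic rather than gettimeofday, which is
a substitution made for this sketch.

```python
import select
import time


def _monotonic_us():
    return time.monotonic_ns() // 1000


def _poll_wait(timeout_ms):
    # poll() with no registered fds just waits for the timeout (Unix).
    select.poll().poll(timeout_ms)


def sleep_until(wakeup_us, granularity_ms,
                now_us=_monotonic_us, wait_ms=_poll_wait):
    """Wait until now_us() >= wakeup_us; return the timeouts used."""
    timeouts = []
    while True:
        now = now_us()
        if now >= wakeup_us:
            return timeouts
        # Round down, subtract the measured delay, add 1ms, clamp to 0.
        timeout = max(0, (wakeup_us - now) // 1000 - granularity_ms + 1)
        timeouts.append(timeout)
        # Once timeout is 0 this returns at once: busy waiting.
        wait_ms(timeout)
```

The first call normally sleeps for most of the interval and returns
before the wakeup time; the remaining iterations use timeout 0 and
only poll the clock.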

If the program knew the timer frequency, the semantics of the timeout,
and the current fraction of the timer tick, it could save one tick of
busy waiting by making the timeout longer by one tick in some cases.
This is not worth the effort.

The timeout semantics of epoll are indeed slightly more useful than
those of poll: they make the busy waiting one tick shorter in some
cases.

If the measurement done at ./configure time does not apply at runtime,
the only bad things which can happen are inaccurate delays, or longer
busy waiting than needed. This is not critical, so I don't worry.

I didn't bother to perform the analogous adjustment for select,
because it is used only on systems which don't have poll.

Sleeping used to be accurate to about 1ms here, or 10ms with a 100Hz
clock. Now it's accurate to about 30us most of the time. Enough.
The busy waiting is not noticeable in CPU usage.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/