From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marcin 'Qrczak' Kowalczyk
Subject: Re: Bug: epoll_wait timeout is shorter than requested
Date: Wed, 19 Jan 2005 00:27:55 +0100
Message-ID: <87is5ub9ms.fsf@qrnik.zagroda>
References: <87651wl32d.fsf@qrnik.zagroda>
 <20050117114821.GB20152@mail.shareable.org>
 <87r7kk41gp.fsf@qrnik.zagroda>
 <20050117143348.GA23427@mail.shareable.org>
 <87acr8uizp.fsf@qrnik.zagroda>
 <20050117164836.GA25815@mail.shareable.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Received: from paf87.warszawa.sdi.tpnet.pl ([217.96.225.87]:23310 "EHLO
 qrnik.knm.org.pl") by vger.kernel.org with ESMTP id S261475AbVARX15
 (ORCPT ); Tue, 18 Jan 2005 18:27:57 -0500
Received: from qrczak by qrnik.knm.org.pl with local (Exim 3.36 #1)
 id 1Cr2lP-0001lh-00 for linux-fsdevel@vger.kernel.org;
 Wed, 19 Jan 2005 00:27:55 +0100
To: linux-fsdevel@vger.kernel.org
In-Reply-To: <20050117164836.GA25815@mail.shareable.org> (Jamie Lokier's
 message of "Mon, 17 Jan 2005 16:48:36 +0000")
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Jamie Lokier writes:

> In my experience with a simple interactive X video game, the only
> portable way to do this is to actually measure the times at which
> select/poll return and deduce the OS's granularity and late/early
> rounding using some kind of control system estimator. (Then if you
> need finer granularity (as the game did) you need a busy loop to add
> the remaining sub-tick time).

I just implemented something like this in the runtime of my language.
It was surprisingly easy to adjust to varying behaviors of poll and
epoll.

At ./configure time I measure the time poll/epoll will wait when asked
to wait for 1ms, started just after a timer tick (previous delay).
This is the only tricky part, because other activity in the system
may disturb the result, in either direction.
So I do this 20 times, sort the results, skip the 3 shortest and the
7 longest delays, and expect the others to be equal when rounded up
to milliseconds. If they are not equal, perhaps the system is too
busy to give reliable results; I wait for 1 second and repeat the
experiment, up to 50 times. Any better idea?

The result is in practice 1ms for epoll with 1000Hz, 10ms with 100Hz,
and twice as much for poll. It is 3ms for poll on Alpha
(Linux-2.2.20), I'm not sure why. Anyway, the program doesn't need to
know whether the actual delay can be shorter or longer than
requested, nor the clock frequency. It must only know this one
number.

When running a program, my scheduler computes the timeout for
epoll/poll/select based on the earliest wakeup time of sleeping
threads. After calling epoll/poll/select, if still no thread is
ready, the scheduler loops again. Looping again used to be rare; it
happened only when:
- the system clock has been adjusted during waiting
- the timeout was longer than INT_MAX
- spurious wakeup from epoll happened (threads are not unregistered
  from the epoll fd until a spurious wakeup actually happens, because
  usually it does not happen before they would reregister anyway)
- epoll returned earlier than asked (which prompted me to report this
  as a problem here)
- poll/epoll/select failed with EINTR, yet handling the signal did
  not wake up a thread

But now I changed the rules for computing the timeout. It is rounded
down instead of up; the measured time described above is subtracted;
1ms is added; if it got below 0, 0 is substituted.

This lets poll/epoll return *before* the planned wakeup time instead
of after (unless the system is busy of course), and the loop will
almost always spin if some thread is about to finish sleeping. In the
next iteration the computed delay will be 0 and the loop will
degenerate to busy waiting, calling gettimeofday and poll/epoll with
timeout 0 (or just gettimeofday if there are no fds to wait for).
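
The two computations described above (filtering the ./configure-time
measurements, and deriving the timeout passed to poll/epoll) can be
sketched roughly as follows. This is an illustration in Python, not
the code of the actual runtime: the function names, the use of integer
microseconds, and the clamp to INT_MAX are all assumptions made for
the example.

```python
INT_MAX = 2**31 - 1  # assumed size of the C int timeout argument


def stable_delay_ms(samples_us, skip_short=3, skip_long=7):
    """Filter measured 1ms-wait durations (hypothetical helper).

    samples_us: measured durations in microseconds (e.g. 20 of them).
    Drops the shortest and longest outliers, rounds the rest up to
    whole milliseconds, and returns that value if they all agree.
    Returns None if they disagree (the caller would wait and retry).
    """
    ordered = sorted(samples_us)
    kept = ordered[skip_short:len(ordered) - skip_long]
    # Ceiling division by 1000 using integers only.
    rounded = {-(-us // 1000) for us in kept}
    return rounded.pop() if len(rounded) == 1 else None


def timeout_ms(now_us, wakeup_us, measured_ms):
    """Timeout to pass to poll/epoll (hypothetical helper).

    Round the remaining time down to milliseconds, subtract the
    measured delay, add 1ms, and substitute 0 if the result is
    negative. The INT_MAX clamp is an assumption for very long sleeps.
    """
    remaining_ms = (wakeup_us - now_us) // 1000  # rounds down
    return min(max(0, remaining_ms - measured_ms + 1), INT_MAX)
```

For example, with a measured delay of 2ms and a thread due to wake up
in 10.5ms, timeout_ms(0, 10_500, 2) gives 9, so poll/epoll is expected
to return a little before the wakeup time and the remainder is spent
in the busy loop.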

If the program knew the timer frequency, the semantics of the
timeout, and the current fraction of the timer tick, it could save
one tick of busy waiting by making the timeout longer by one tick in
some cases. This is not worth the effort.

The timeout semantics of epoll are indeed slightly more useful than
those of poll: they make the busy waiting one tick shorter in some
cases.

If the measurement done at ./configure time does not apply at
runtime, the only bad things which can happen are inaccurate delays,
or longer busy waiting than needed. This is not critical, so I don't
worry.

I didn't bother to perform the analogous adjustment for select,
because it is used only on systems which don't have poll.

Sleeping used to be accurate to about 1ms here, or 10ms with a 100Hz
clock. Now it's accurate to about 30us most of the time. Enough. The
busy waiting is not noticeable in CPU usage.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/