* [PATCH] 'select' failure or signal should not update timeout [not found] <200207171430.g6HEUvY23619@aztec.santafe.edu> @ 2002-07-19 9:52 ` Paul Eggert 2002-07-20 0:38 ` Alan Cox 0 siblings, 1 reply; 23+ messages in thread From: Paul Eggert @ 2002-07-19 9:52 UTC (permalink / raw) To: linux-kernel; +Cc: rms, Alan Cox [This follows up a thread in the emacs-pretesters mailing list about a problem with Emacs, 'select', SA_RESTART, and interrupts.] > Date: Wed, 17 Jul 2002 08:30:57 -0600 (MDT) > From: Richard Stallman <rms@gnu.org> > > Is it true today that restarting a `select' call after a signal (with > SA_RESTART) alters the contents of the user's timeout parameter? I looked into this a bit more, and it turns out to be a problem in the Linux kernel. POSIX 1003.1-2001 <http://www.opengroup.org/onlinepubs/007904975/functions/select.html> says that 'select' may modify its timeout argument only "upon successful completion". However, the Linux kernel sometimes modifies the timeout argument even when 'select' fails or is interrupted. Here is a program that illustrates the problem. At the end of this message I enclose a proposed patch to the Linux kernel to fix the problem. /* Conformance test for POSIX 1003.1-2001's requirement that select() must not update its timeout argument unless it succeeds. 
*/

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

static void
moan (char const *string)
{
  perror (string);
  exit (1);
}

struct timeval timeout;
struct timeval timeout_when_in_handler;
struct sigaction act;

/* Snapshot the timeout as seen while the handler runs.  */
void
handle_sigalrm (int sig)
{
  timeout_when_in_handler = timeout;
}

enum { TIMEOUT_SECONDS = 5 };

int
main (int argc, char **argv)
{
  act.sa_handler = handle_sigalrm;
  act.sa_flags = SA_RESTART;
  if (sigaction (SIGALRM, &act, 0) != 0)
    moan ("sigaction");
  timeout.tv_sec = TIMEOUT_SECONDS;
  timeout_when_in_handler = timeout;
  /* Interrupt select() partway through its timeout.  */
  alarm (TIMEOUT_SECONDS / 2);
  if (select (0, 0, 0, 0, &timeout) != 0)
    {
      if (timeout.tv_sec != TIMEOUT_SECONDS)
        {
          perror ("select");
          /* tv_usec counts microseconds, so print six digits.  */
          fprintf (stderr,
                   "select failed, but timeout was updated to %ld.%.6ld seconds\n",
                   (long) timeout.tv_sec, (long) timeout.tv_usec);
        }
    }
  if (timeout_when_in_handler.tv_sec != TIMEOUT_SECONDS)
    fprintf (stderr,
             "timeout was updated to %ld.%.6ld seconds while signal handler was active\n",
             (long) timeout_when_in_handler.tv_sec,
             (long) timeout_when_in_handler.tv_usec);
  return 0;
}

Here is a proposed patch to Linux kernel 2.5.26. The patch also applies to Linux 2.4.18, though you have to ignore the patches to files that do not exist in 2.4.18. I haven't tested this patch, but it's pretty straightforward.
diff -prU6 2.5.26/arch/ia64/ia32/sys_ia32.c 2.5.26-select/arch/ia64/ia32/sys_ia32.c --- 2.5.26/arch/ia64/ia32/sys_ia32.c Tue Jul 16 16:49:35 2002 +++ 2.5.26-select/arch/ia64/ia32/sys_ia32.c Fri Jul 19 02:17:08 2002 @@ -1058,32 +1058,32 @@ sys32_select (int n, fd_set *inp, fd_set zero_fd_set(n, fds.res_in); zero_fd_set(n, fds.res_out); zero_fd_set(n, fds.res_ex); ret = do_select(n, &fds, &timeout); + if (ret < 0) + goto out; + if (!ret) { + ret = -ERESTARTNOHAND; + if (signal_pending(current)) + goto out; + ret = 0; + } + if (tvp32 && !(current->personality & STICKY_TIMEOUTS)) { time_t sec = 0, usec = 0; if (timeout) { sec = timeout / HZ; usec = timeout % HZ; usec *= (1000000/HZ); } if (put_user(sec, &tvp32->tv_sec) || put_user(usec, &tvp32->tv_usec)) { ret = -EFAULT; goto out; } - } - - if (ret < 0) - goto out; - if (!ret) { - ret = -ERESTARTNOHAND; - if (signal_pending(current)) - goto out; - ret = 0; } set_fd_set(n, inp, fds.res_in); set_fd_set(n, outp, fds.res_out); set_fd_set(n, exp, fds.res_ex); diff -prU6 2.5.26/arch/mips64/kernel/linux32.c 2.5.26-select/arch/mips64/kernel/linux32.c --- 2.5.26/arch/mips64/kernel/linux32.c Tue Jul 16 16:49:25 2002 +++ 2.5.26-select/arch/mips64/kernel/linux32.c Fri Jul 19 02:18:19 2002 @@ -1170,30 +1170,30 @@ asmlinkage int sys32_select(int n, u32 * zero_fd_set(n, fds.res_in); zero_fd_set(n, fds.res_out); zero_fd_set(n, fds.res_ex); ret = do_select(n, &fds, &timeout); + if (ret < 0) + goto out; + if (!ret) { + ret = -ERESTARTNOHAND; + if (signal_pending(current)) + goto out; + ret = 0; + } + if (tvp && !(current->personality & STICKY_TIMEOUTS)) { time_t sec = 0, usec = 0; if (timeout) { sec = timeout / HZ; usec = timeout % HZ; usec *= (1000000/HZ); } put_user(sec, &tvp->tv_sec); put_user(usec, &tvp->tv_usec); - } - - if (ret < 0) - goto out; - if (!ret) { - ret = -ERESTARTNOHAND; - if (signal_pending(current)) - goto out; - ret = 0; } set_fd_set32(nn, inp, fds.res_in); set_fd_set32(nn, outp, fds.res_out); set_fd_set32(nn, exp, 
fds.res_ex); diff -prU6 2.5.26/arch/ppc64/kernel/sys_ppc32.c 2.5.26-select/arch/ppc64/kernel/sys_ppc32.c --- 2.5.26/arch/ppc64/kernel/sys_ppc32.c Tue Jul 16 16:49:36 2002 +++ 2.5.26-select/arch/ppc64/kernel/sys_ppc32.c Fri Jul 19 02:17:58 2002 @@ -762,30 +762,30 @@ asmlinkage long sys32_select(int n, u32 zero_fd_set(n, fds.res_in); zero_fd_set(n, fds.res_out); zero_fd_set(n, fds.res_ex); ret = do_select(n, &fds, &timeout); + if (ret < 0) + goto out; + if (!ret) { + ret = -ERESTARTNOHAND; + if (signal_pending(current)) + goto out; + ret = 0; + } + if (tvp && !(current->personality & STICKY_TIMEOUTS)) { time_t sec = 0, usec = 0; if (timeout) { sec = timeout / HZ; usec = timeout % HZ; usec *= (1000000/HZ); } put_user(sec, &tvp->tv_sec); put_user(usec, &tvp->tv_usec); - } - - if (ret < 0) - goto out; - if (!ret) { - ret = -ERESTARTNOHAND; - if (signal_pending(current)) - goto out; - ret = 0; } set_fd_set32(nn, inp, fds.res_in); set_fd_set32(nn, outp, fds.res_out); set_fd_set32(nn, exp, fds.res_ex); diff -prU6 2.5.26/arch/s390x/kernel/linux32.c 2.5.26-select/arch/s390x/kernel/linux32.c --- 2.5.26/arch/s390x/kernel/linux32.c Tue Jul 16 16:49:27 2002 +++ 2.5.26-select/arch/s390x/kernel/linux32.c Fri Jul 19 02:18:27 2002 @@ -1420,30 +1420,30 @@ asmlinkage int sys32_select(int n, u32 * zero_fd_set(n, fds.res_in); zero_fd_set(n, fds.res_out); zero_fd_set(n, fds.res_ex); ret = do_select(n, &fds, &timeout); + if (ret < 0) + goto out; + if (!ret) { + ret = -ERESTARTNOHAND; + if (signal_pending(current)) + goto out; + ret = 0; + } + if (tvp && !(current->personality & STICKY_TIMEOUTS)) { int sec = 0, usec = 0; if (timeout) { sec = timeout / HZ; usec = timeout % HZ; usec *= (1000000/HZ); } put_user(sec, &tvp->tv_sec); put_user(usec, &tvp->tv_usec); - } - - if (ret < 0) - goto out; - if (!ret) { - ret = -ERESTARTNOHAND; - if (signal_pending(current)) - goto out; - ret = 0; } set_fd_set32(nn, inp, fds.res_in); set_fd_set32(nn, outp, fds.res_out); set_fd_set32(nn, exp, fds.res_ex); 
diff -prU6 2.5.26/arch/sparc64/kernel/sys_sparc32.c 2.5.26-select/arch/sparc64/kernel/sys_sparc32.c --- 2.5.26/arch/sparc64/kernel/sys_sparc32.c Tue Jul 16 16:49:35 2002 +++ 2.5.26-select/arch/sparc64/kernel/sys_sparc32.c Fri Jul 19 02:16:59 2002 @@ -1387,30 +1387,30 @@ asmlinkage int sys32_select(int n, u32 * zero_fd_set(n, fds.res_in); zero_fd_set(n, fds.res_out); zero_fd_set(n, fds.res_ex); ret = do_select(n, &fds, &timeout); + if (ret < 0) + goto out; + if (!ret) { + ret = -ERESTARTNOHAND; + if (signal_pending(current)) + goto out; + ret = 0; + } + if (tvp && !(current->personality & STICKY_TIMEOUTS)) { time_t sec = 0, usec = 0; if (timeout) { sec = timeout / HZ; usec = timeout % HZ; usec *= (1000000/HZ); } put_user(sec, &tvp->tv_sec); put_user(usec, &tvp->tv_usec); - } - - if (ret < 0) - goto out; - if (!ret) { - ret = -ERESTARTNOHAND; - if (signal_pending(current)) - goto out; - ret = 0; } set_fd_set32(nn, inp, fds.res_in); set_fd_set32(nn, outp, fds.res_out); set_fd_set32(nn, exp, fds.res_ex); diff -prU6 2.5.26/arch/x86_64/ia32/sys_ia32.c 2.5.26-select/arch/x86_64/ia32/sys_ia32.c --- 2.5.26/arch/x86_64/ia32/sys_ia32.c Tue Jul 16 16:49:32 2002 +++ 2.5.26-select/arch/x86_64/ia32/sys_ia32.c Fri Jul 19 02:18:07 2002 @@ -878,30 +878,30 @@ sys32_select(int n, fd_set *inp, fd_set zero_fd_set(n, fds.res_in); zero_fd_set(n, fds.res_out); zero_fd_set(n, fds.res_ex); ret = do_select(n, &fds, &timeout); + if (ret < 0) + goto out; + if (!ret) { + ret = -ERESTARTNOHAND; + if (signal_pending(current)) + goto out; + ret = 0; + } + if (tvp32 && !(current->personality & STICKY_TIMEOUTS)) { time_t sec = 0, usec = 0; if (timeout) { sec = timeout / HZ; usec = timeout % HZ; usec *= (1000000/HZ); } put_user(sec, (int *)&tvp32->tv_sec); put_user(usec, (int *)&tvp32->tv_usec); - } - - if (ret < 0) - goto out; - if (!ret) { - ret = -ERESTARTNOHAND; - if (signal_pending(current)) - goto out; - ret = 0; } set_fd_set(n, inp, fds.res_in); set_fd_set(n, outp, fds.res_out); set_fd_set(n, 
exp, fds.res_ex); diff -prU6 2.5.26/fs/select.c 2.5.26-select/fs/select.c --- 2.5.26/fs/select.c Tue Jul 16 16:49:24 2002 +++ 2.5.26-select/fs/select.c Fri Jul 19 02:18:41 2002 @@ -316,30 +316,30 @@ sys_select(int n, fd_set *inp, fd_set *o zero_fd_set(n, fds.res_in); zero_fd_set(n, fds.res_out); zero_fd_set(n, fds.res_ex); ret = do_select(n, &fds, &timeout); + if (ret < 0) + goto out; + if (!ret) { + ret = -ERESTARTNOHAND; + if (signal_pending(current)) + goto out; + ret = 0; + } + if (tvp && !(current->personality & STICKY_TIMEOUTS)) { time_t sec = 0, usec = 0; if (timeout) { sec = timeout / HZ; usec = timeout % HZ; usec *= (1000000/HZ); } put_user(sec, &tvp->tv_sec); put_user(usec, &tvp->tv_usec); - } - - if (ret < 0) - goto out; - if (!ret) { - ret = -ERESTARTNOHAND; - if (signal_pending(current)) - goto out; - ret = 0; } set_fd_set(n, inp, fds.res_in); set_fd_set(n, outp, fds.res_out); set_fd_set(n, exp, fds.res_ex); ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] 'select' failure or signal should not update timeout 2002-07-19 9:52 ` [PATCH] 'select' failure or signal should not update timeout Paul Eggert @ 2002-07-20 0:38 ` Alan Cox 2002-07-20 5:57 ` Linus Torvalds 2002-07-21 20:14 ` Richard Stallman 0 siblings, 2 replies; 23+ messages in thread From: Alan Cox @ 2002-07-20 0:38 UTC (permalink / raw) To: Paul Eggert; +Cc: linux-kernel, rms, Alan Cox

> <http://www.opengroup.org/onlinepubs/007904975/functions/select.html>
> says that 'select' may modify its timeout argument only "upon
> successful completion". However, the Linux kernel sometimes modifies
> the timeout argument even when 'select' fails or is interrupted.

This is extremely useful behaviour. POSIX is broken here. Fix it in the
C library or somewhere it doesn't harm the clueful.

You should raise this with the standards committee instead.

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] 'select' failure or signal should not update timeout 2002-07-20 0:38 ` Alan Cox @ 2002-07-20 5:57 ` Linus Torvalds 2002-07-21 15:36 ` Eric W. Biederman ` (2 more replies) 2002-07-21 20:14 ` Richard Stallman 1 sibling, 3 replies; 23+ messages in thread From: Linus Torvalds @ 2002-07-20 5:57 UTC (permalink / raw) To: linux-kernel In article <200207200038.g6K0cZO12086@devserv.devel.redhat.com>, Alan Cox <alan@redhat.com> wrote: >> <http://www.opengroup.org/onlinepubs/007904975/functions/select.html> >> says that 'select' may modify its timeout argument only "upon >> successful completion". However, the Linux kernel sometimes modifies >> the timeout argument even when 'select' fails or is interrupted. > >This is extremely useful behaviour. POSIX is broken here. Fix it in the >C library or somewhere it doesn't harm the clueful Personally, I've gotten to the point where I think that the select() time is broken. The thing is, nobody should really ever use timeouts, because the notion of "I want to sleep X seconds" is simply not _useful_ if the process also just got delayed by a page-out event as it said so. What does "X seconds" mean at that point? It's ambiguous - and the kernel will (quite naturally) just always assume that it is "X seconds from when the kernel got notified". A _useful_ interface would be to say "I want to sleep to at most time X" or "to at least time X". Those are unambiguous things to say, and are not open to interpretation. The "I want to sleep until at least time X" (or "at most time X") also has the added advantage that it is inherently re-startable - restarting the sleep has _no_ rounding issues, and again no ambiguity. Note that select() is definitely not the only offender here. Other system calls like "nanosleep()" have the exact same problem - what do you do if you get interrupted by a signal and need to restart? 
The Linux behaviour of modifying the timeout is a half-assed try for restartability, but the problem is that (a) nobody else does that or expects it to happen, despite the man-pages originally claiming that they were supposed to and (b) it inherently has rounding problems and other ambiguities - making it even less useful. Oh, well. I suspect almost nobody actually uses the Linux timeout feature because of the nonportability issues, making the whole mess even less tasty. Linus ^ permalink raw reply [flat|nested] 23+ messages in thread
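The rounding problem in point (b) can be made concrete with a small user-space model. This is only a sketch under stated assumptions: the write-back arithmetic copies the `timeout / HZ` and `timeout % HZ` lines from the patch earlier in the thread, while the round-up on entry (`tv_to_ticks`), the stand-in `struct tv`, and the `hz` parameter are assumptions made for illustration and are not taken from the kernel source.

```c
#include <assert.h>

enum { USEC_PER_SEC = 1000000 };

/* Stand-in for struct timeval, so the model needs no system headers.  */
struct tv { long sec; long usec; };

/* ASSUMPTION: a relative timeout is rounded up to whole ticks on entry.  */
static long
tv_to_ticks (struct tv t, long hz)
{
  long tick_usec;

  assert (hz > 0 && hz <= USEC_PER_SEC);
  tick_usec = USEC_PER_SEC / hz;
  return t.sec * hz + (t.usec + tick_usec - 1) / tick_usec;
}

/* Same arithmetic as the timeout write-back in the patch:
   sec = timeout / HZ; usec = (timeout % HZ) * (1000000 / HZ).  */
static struct tv
ticks_to_tv (long ticks, long hz)
{
  struct tv t;

  assert (hz > 0 && hz <= USEC_PER_SEC);
  t.sec = ticks / hz;
  t.usec = (ticks % hz) * (USEC_PER_SEC / hz);
  return t;
}
```

Under these assumptions and with `hz` set to 100, a request of 2.505000 s becomes 251 ticks, which reads back as 2.510000 s. A value written back after zero elapsed time therefore already differs from what the caller passed in.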
* Re: [PATCH] 'select' failure or signal should not update timeout 2002-07-20 5:57 ` Linus Torvalds @ 2002-07-21 15:36 ` Eric W. Biederman 2002-07-24 13:44 ` Jamie Lokier 2002-07-21 16:00 ` Christoph Rohland 2002-07-21 16:26 ` Ingo Molnar 2 siblings, 1 reply; 23+ messages in thread From: Eric W. Biederman @ 2002-07-21 15:36 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel

torvalds@transmeta.com (Linus Torvalds) writes:

> In article <200207200038.g6K0cZO12086@devserv.devel.redhat.com>,
> Alan Cox <alan@redhat.com> wrote:
> >> <http://www.opengroup.org/onlinepubs/007904975/functions/select.html>
> >> says that 'select' may modify its timeout argument only "upon
> >> successful completion". However, the Linux kernel sometimes modifies
> >> the timeout argument even when 'select' fails or is interrupted.
> >
> >This is extremely useful behaviour. POSIX is broken here. Fix it in the
> >C library or somewhere it doesn't harm the clueful
>
> Personally, I've gotten to the point where I think that the select()
> time is broken.
>
> The thing is, nobody should really ever use timeouts, because the notion
> of "I want to sleep X seconds" is simply not _useful_ if the process
> also just got delayed by a page-out event as it said so. What does "X
> seconds" mean at that point? It's ambiguous - and the kernel will (quite
> naturally) just always assume that it is "X seconds from when the kernel
> got notified".
>
> A _useful_ interface would be to say "I want to sleep to at most time X"
> or "to at least time X". Those are unambiguous things to say, and are
> not open to interpretation.

Sleeping until at most time X is only useful if the kernel can actually
make a guarantee like that. If you are doing hard real time, fine; otherwise
that doesn't work too well.

> The "I want to sleep until at least time X" (or "at most time X") also
> has the added advantage that it is inherently re-startable - restarting
> the sleep has _no_ rounding issues, and again no ambiguity.
> > Note that select() is definitely not the only offender here. Other > system calls like "nanosleep()" have the exact same problem - what do > you do if you get interrupted by a signal and need to restart? > > The Linux behaviour of modifying the timeout is a half-assed try for > restartability, but the problem is that (a) nobody else does that or > expects it to happen, despite the man-pages originally claiming that > they were supposed to and (b) it inherently has rounding problems and > other ambiguities - making it even less useful. > > Oh, well. > > I suspect almost nobody actually uses the Linux timeout feature because > of the nonportability issues, making the whole mess even less tasty. Actually I have had occasion in dosemu to not use the timeout features because it did not do a good job of attempting to sleep for X seconds. There can be a lot of time from when the kernel updates the timeout value, and when the system call is restarted. The desired semantics in this case were I want to sleep until time X, and I want to wake up as soon afterwards as is reasonable. Calling gettimeofday before restarting the system call resulted in a much better approximation of the desired result. Eric ^ permalink raw reply [flat|nested] 23+ messages in thread
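The approach described above, keeping an absolute wake-up time and reading the clock again before every restart, might look roughly like this in user space. It is a minimal sketch, not code from dosemu: `time_left` and `select_until` are hypothetical helper names, and the deadline is assumed to be expressed in gettimeofday()'s timebase, so it is still affected if the system clock is stepped.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Store (deadline - now) in *left, clamped at zero.
   Return 0 if the deadline has already passed, 1 otherwise.  */
static int
time_left (const struct timeval *deadline, const struct timeval *now,
           struct timeval *left)
{
  assert (deadline && now && left);
  left->tv_sec = deadline->tv_sec - now->tv_sec;
  left->tv_usec = deadline->tv_usec - now->tv_usec;
  if (left->tv_usec < 0)
    {
      left->tv_usec += 1000000;
      left->tv_sec -= 1;
    }
  if (left->tv_sec < 0)
    {
      left->tv_sec = 0;
      left->tv_usec = 0;
      return 0;
    }
  return 1;
}

/* Hypothetical wrapper: like select(), but takes an absolute deadline.
   After EINTR it recomputes the remaining time from the clock instead of
   trusting whatever the kernel left in the timeout argument.  */
static int
select_until (int nfds, fd_set *rd, fd_set *wr, fd_set *ex,
              const struct timeval *deadline)
{
  for (;;)
    {
      struct timeval now, left;
      int ret;

      gettimeofday (&now, NULL);
      time_left (deadline, &now, &left);  /* zero timeout once expired */
      ret = select (nfds, rd, wr, ex, &left);
      if (ret >= 0 || errno != EINTR)
        return ret;
    }
}
```

Because the timeout handed to select() is a fresh local on every iteration, the wrapper behaves the same whether or not the kernel updates the timeout on interruption.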
* Re: [PATCH] 'select' failure or signal should not update timeout 2002-07-21 15:36 ` Eric W. Biederman @ 2002-07-24 13:44 ` Jamie Lokier 2002-07-24 18:48 ` Linus Torvalds 0 siblings, 1 reply; 23+ messages in thread From: Jamie Lokier @ 2002-07-24 13:44 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Linus Torvalds, linux-kernel

Eric W. Biederman wrote:
> torvalds@transmeta.com (Linus Torvalds) writes:
> > A _useful_ interface would be to say "I want to sleep to at most time X"
> > or "to at least time X". Those are unambiguous things to say, and are
> > not open to interpretation.
>
> Sleeping until at most time X is only useful if the kernel can actually
> make a guarantee like that. If you are doing hard real time, fine; otherwise
> that doesn't work too well.

Oh, that would definitely be useful even if it's only a "soft"
guarantee. Especially with recent HZ changes.

Typical soft real-time code looks a bit like this pseudo-code (excuse
the bugs :-):

    void wait_until_time (const struct timeval * until)
    {
        struct timeval now, timeout;

        while (1) {
            gettimeofday (&now, 0);
            timeout.tv_sec = until->tv_sec - now.tv_sec;
            timeout.tv_usec = until->tv_usec - now.tv_usec;
            if (timeout.tv_usec < 0) {
                timeout.tv_usec += 1000000;
                timeout.tv_sec -= 1;
            }
            if (timeout.tv_sec < 0)
                break;  /* Finished! */

            /* Aim to wake one scheduler tick early.  SCHEDULER_GRANULARITY
               is in microseconds and must be less than one second. */
            timeout.tv_usec -= SCHEDULER_GRANULARITY;
            if (timeout.tv_usec < 0) {
                timeout.tv_usec += 1000000;
                timeout.tv_sec -= 1;
            }

            /* Busy wait if within scheduler granularity. */
            if (timeout.tv_sec >= 0) {
                select (0, 0, 0, 0, &timeout);
            }
        }
    }

Note that SCHEDULER_GRANULARITY is an architecture-specific and
OS-specific constant that has to be determined somehow.

The select() call in the above code is one that would, ideally, be "wait
until at most TIME" even if that is limited by the granularity of
scheduler timeouts. The scheduler may not be able to _guarantee_ to
schedule the process before TIME (fair enough, that's why we call it
soft real-time), but at least the tick calculations etc.
in the kernel would be rounded down, rather than up. -- Jamie ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] 'select' failure or signal should not update timeout 2002-07-24 13:44 ` Jamie Lokier @ 2002-07-24 18:48 ` Linus Torvalds 2002-07-24 19:07 ` Chris Friesen ` (3 more replies) 0 siblings, 4 replies; 23+ messages in thread From: Linus Torvalds @ 2002-07-24 18:48 UTC (permalink / raw) To: Jamie Lokier; +Cc: Eric W. Biederman, linux-kernel On Wed, 24 Jul 2002, Jamie Lokier wrote: > > Typical soft real-time code looks a bit like this pseudo-code (excuse > the bugs :-): Yup, looks familiar. The thing is, we cannot change existing select semantics, and the question is whether what most soft-realtime wants is actually select, or whether people really want a "waittimeofday()". Like your example, the only uses I've had personally (DVD playback) have really had an empty select, so it wasn't really select itself that was horribly important. Linus ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] 'select' failure or signal should not update timeout 2002-07-24 18:48 ` Linus Torvalds @ 2002-07-24 19:07 ` Chris Friesen 2002-07-24 23:30 ` Jamie Lokier ` (2 subsequent siblings) 3 siblings, 0 replies; 23+ messages in thread From: Chris Friesen @ 2002-07-24 19:07 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jamie Lokier, Eric W. Biederman, linux-kernel Linus Torvalds wrote: > The thing is, we cannot change existing select semantics, and the question > is whether what most soft-realtime wants is actually select, or whether > people really want a "waittimeofday()". Actually, I'd like a waitonmonotonicallyincreasingnonadjustablehighres64bittime(). Chris -- Chris Friesen | MailStop: 043/33/F10 Nortel Networks | work: (613) 765-0557 3500 Carling Avenue | fax: (613) 765-2986 Nepean, ON K2H 8E9 Canada | email: cfriesen@nortelnetworks.com ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] 'select' failure or signal should not update timeout 2002-07-24 18:48 ` Linus Torvalds 2002-07-24 19:07 ` Chris Friesen @ 2002-07-24 23:30 ` Jamie Lokier 2002-07-25 6:32 ` Rusty Russell 2002-07-25 16:35 ` Eric W. Biederman 3 siblings, 0 replies; 23+ messages in thread From: Jamie Lokier @ 2002-07-24 23:30 UTC (permalink / raw) To: Linus Torvalds; +Cc: Eric W. Biederman, linux-kernel Linus Torvalds wrote: > Like your example, the only uses I've had personally (DVD playback) have > really had an empty select, so it wasn't really select itself that was > horribly important. All the real examples I've encountered are waiting on file descriptors too -- and occasionally also signals. -- Jamie ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] 'select' failure or signal should not update timeout 2002-07-24 18:48 ` Linus Torvalds 2002-07-24 19:07 ` Chris Friesen 2002-07-24 23:30 ` Jamie Lokier @ 2002-07-25 6:32 ` Rusty Russell 2002-07-25 18:31 ` george anzinger 2002-07-28 5:40 ` David Schwartz 2002-07-25 16:35 ` Eric W. Biederman 3 siblings, 2 replies; 23+ messages in thread From: Rusty Russell @ 2002-07-25 6:32 UTC (permalink / raw) To: Linus Torvalds; +Cc: lk, ebiederm, linux-kernel On Wed, 24 Jul 2002 11:48:10 -0700 (PDT) Linus Torvalds <torvalds@transmeta.com> wrote: > The thing is, we cannot change existing select semantics, and the > question is whether what most soft-realtime wants is actually select, or > whether people really want a "waittimeofday()". NOT waittimeofday. You need a *new* measure which can't be set forwards or back if you want this to be sane. pthreads has absolute timeouts (eg. pthread_cond_timedwait), but they suck IRL for this reason. Of course, doesn't need any correlation with absolute time, it could be a "microseconds since boot" kind of thing. Rusty. -- there are those who do and those who hang on and you don't see too many doers quoting their contemporaries. -- Larry McVoy ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] 'select' failure or signal should not update timeout 2002-07-25 6:32 ` Rusty Russell @ 2002-07-25 18:31 ` george anzinger 2002-07-28 5:40 ` David Schwartz 1 sibling, 0 replies; 23+ messages in thread From: george anzinger @ 2002-07-25 18:31 UTC (permalink / raw) To: Rusty Russell; +Cc: Linus Torvalds, lk, ebiederm, linux-kernel Rusty Russell wrote: > > On Wed, 24 Jul 2002 11:48:10 -0700 (PDT) > Linus Torvalds <torvalds@transmeta.com> wrote: > > > The thing is, we cannot change existing select semantics, and the > > question is whether what most soft-realtime wants is actually select, or > > whether people really want a "waittimeofday()". > > NOT waittimeofday. You need a *new* measure which can't be set forwards > or back if you want this to be sane. pthreads has absolute timeouts (eg. > pthread_cond_timedwait), but they suck IRL for this reason. > > Of course, doesn't need any correlation with absolute time, it could be a > "microseconds since boot" kind of thing. > The POSIX clocks & timers API defines CLOCK_MONOTONIC for this sort of thing (CLOCK_MONOTONIC can not be set). It also defines an API for clock_nanosleep() that CAN use an absolute time which is supposed to follow any clock setting that is done. Combine the two and you have a fixed time definition. AND, guess what, the high-res-timers patch does all this and more. -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ Real time sched: http://sourceforge.net/projects/rtsched/ Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml ^ permalink raw reply [flat|nested] 23+ messages in thread
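On systems that provide the POSIX calls named above, combining CLOCK_MONOTONIC with an absolute-time clock_nanosleep() could be sketched as follows. `timespec_add_ns` and `sleep_until_monotonic` are hypothetical helper names introduced for this illustration; the sketch uses only the standard interface and makes no claim about how the high-res-timers patch implements it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* clock_nanosleep, CLOCK_MONOTONIC */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Return t advanced by nsec nanoseconds (nsec >= 0), normalized so that
   0 <= tv_nsec < 1000000000.  */
static struct timespec
timespec_add_ns (struct timespec t, long nsec)
{
  assert (nsec >= 0);
  t.tv_sec += nsec / 1000000000L;
  t.tv_nsec += nsec % 1000000000L;
  if (t.tv_nsec >= 1000000000L)
    {
      t.tv_nsec -= 1000000000L;
      t.tv_sec += 1;
    }
  return t;
}

/* Sleep until an absolute CLOCK_MONOTONIC time.  The deadline does not
   change across restarts, so an interrupted sleep is simply reissued.
   Returns 0 on success or an error number (not -1 with errno).  */
static int
sleep_until_monotonic (const struct timespec *deadline)
{
  int err;

  do
    err = clock_nanosleep (CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, NULL);
  while (err == EINTR);
  return err;
}
```

A caller would read CLOCK_MONOTONIC once with clock_gettime(), derive each later deadline with `timespec_add_ns`, and pass it to `sleep_until_monotonic`; setting the wall clock in the meantime does not move the wake-up point.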
* Re: [PATCH] 'select' failure or signal should not update timeout 2002-07-25 6:32 ` Rusty Russell 2002-07-25 18:31 ` george anzinger @ 2002-07-28 5:40 ` David Schwartz 1 sibling, 0 replies; 23+ messages in thread From: David Schwartz @ 2002-07-28 5:40 UTC (permalink / raw) To: rusty; +Cc: linux-kernel >NOT waittimeofday. You need a *new* measure which can't be set forwards >or back if you want this to be sane. pthreads has absolute timeouts (eg. >pthread_cond_timedwait), but they suck IRL for this reason. >Rusty. The usual way to deal with this is to have a 'clock watcher' thread. If the system time jumps any significant amount, you signal all condition variables. You're not guaranteed any particular latency anyway. I don't think a DVD playback skipping when the system time is changed by a large amount is unacceptable. However, the use of some sort of linear timebase is much more convenient for many things. DS ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] 'select' failure or signal should not update timeout 2002-07-24 18:48 ` Linus Torvalds ` (2 preceding siblings ...) 2002-07-25 6:32 ` Rusty Russell @ 2002-07-25 16:35 ` Eric W. Biederman 2002-07-25 17:15 ` Jamie Lokier 3 siblings, 1 reply; 23+ messages in thread From: Eric W. Biederman @ 2002-07-25 16:35 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jamie Lokier, linux-kernel

Linus Torvalds <torvalds@transmeta.com> writes:

> On Wed, 24 Jul 2002, Jamie Lokier wrote:
> >
> > Typical soft real-time code looks a bit like this pseudo-code (excuse
> > the bugs :-):
>
> Yup, looks familiar.
>
> The thing is, we cannot change existing select semantics, and the question
> is whether what most soft-realtime wants is actually select, or whether
> people really want a "waittimeofday()".
>
> Like your example, the only uses I've had personally (DVD playback) have
> really had an empty select, so it wasn't really select itself that was
> horribly important.

Barring minor quibbles, waittimeofday is essentially what we have today.
Fixing up the interface to take an absolute time, from an absolute
timer, cleans up some races but doesn't attack the fundamental problem.

There are two fundamental problems with the current interface. The
timer granularity is much too large, and we don't know the granularity
that user space cares about.

The posted wait_until_time implementation had one very interesting
aspect: the timer granularity of the kernel (HZ) was known to the
application, and it very deliberately rounded the sleep interval down
based on the kernel timer granularity, so it could busy wait all by
itself for the rest.

The problem: user space applications can only get what they want through
busy waiting. But they can tell when they have gotten what they want
because the gettimeofday resolution is much better than the kernel timer
resolution.

There are two states a unix box can be in.

cpu load_average < 1. In this state multiple processes run per
scheduler quantum.
They run for short periods of time and then go back to sleep. Latency
is very good because the other processes get out of the way, and a
sleeping process can count on running at the next timer tick.

cpu load_average > 1. In this state one or several processes run for
their full cpu quantum. Latency is bad, and opportunistically the timer
resolution does not get better.

When we have a cpu load average < 1, it is trivial to increase the
timer granularity to something resembling the gettimeofday resolution
simply by internally doing gettimeofday when schedule is called, and
adding those processes that have just become runnable to the run
queue. To get the most out of this the idle task would need to busy
wait looking for timer events, when we have an event scheduled before
the next timer tick.

The goal is twofold: to remove the need for user space applications to
busy wait, so sometimes the system can get something done while another
process is waiting, and to increase the internal kernel timer
granularity to the point where user space doesn't care anymore. With
this, the only time we sleep past the desired time is when the kernel
decides there is some higher priority task to run.

On the timer queue I would use either microseconds (the resolution of
struct timeval), or the natural resolution of the timer, instead of
something artificial like HZ. This allows faster machines under the
same load as slower machines to become more precise, with the same code.

Polling for timers only in schedule and the idle task means that when
the load is low the kernel offers more precision than when the load is
high. The frequency of timer interrupts would have to go up at some
point to handle some loads, but just keeping the load light would allow
the program to work as expected even without a kernel recompile.

Having user space know HZ and use it for their internal calculations is
dangerous because HZ will change as time goes by.
But having user space specify its desired timer resolution to the
kernel will allow the kernel to round the time to a place where it can
efficiently handle the timer, and still meet the user space deadline.

The most interesting use I have seen is a high performance local area
data transfer utility, that would do short sleeps in between sending
packets to avoid pushing the switch to the point where it would drop
packets. But it was perfectly fine if a new packet came in before it
was done waiting for the old packet to go out the wire.

Eric

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] 'select' failure or signal should not update timeout 2002-07-25 16:35 ` Eric W. Biederman @ 2002-07-25 17:15 ` Jamie Lokier 0 siblings, 0 replies; 23+ messages in thread From: Jamie Lokier @ 2002-07-25 17:15 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Linus Torvalds, linux-kernel Eric W. Biederman wrote: > When we have a cpu load average < 1, it is trivial to increase the > timer granularity to something resembling the gettimeofday resolution > simply by internally doing gettimeofday when schedule is called, and > adding those processes that have just become runnable to the run > queue. To get the most out of this the idle task would need to busy > wait looking for timer events, when we have an event scheduled before > the next timer tick. Unfortunately, this does not help "soft real-time" tasks like the hypothetical video game with a compile running in the background. That needs to preempt lower priority tasks somehow. Ideally, because they don't use much CPU but do want to run on time, it should be possible to run those programs using non-real-time priority, and they would run on time simply because they always have a high dynamic priority. To be fair, although 100Hz timer resolution wasn't good enough even for a simple "snake" video game with no other load (the eye detects the time variance as an apparent velocity variance), 1000Hz is probably fine. > The goal is twofold, to remove the need for user space applications to > busy wait, so sometimes the system can get something done another > process is waiting, and to increase the internal kernel timer > granularity to the point where user space doesn't care anymore. With > the only timer we sleep past the desired time is when the kernel > decides there is some higher priority task to run. What will happen if the timer granularity remains at 1000Hz when loadavg > 1 is that time-sensitive interactive apps will still busy wait for the remainder of a tick. 
_But_, if we can define select() or similar semantics to mean, as Linus suggested, "wait until at most TIME", then it becomes possible to avoid the busy wait at low loads (paradoxically). > The most interesting use I have seen is a high performance local area > data transfer utility, that would do short sleeps in between sending > packets to avoid pushing the switch to the point where it would drop > packets. But it was perfectly fine if a new packet came in before it > was done waiting for the old packet to go out the wire. That's the sort of thing I work on :) The resolution required of a packet shaper is measured in 10s of microseconds, though, so I just accept that user space must busy wait _all_ the time the link isn't idle. -- Jamie ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] 'select' failure or signal should not update timeout
2002-07-20 5:57 ` Linus Torvalds
2002-07-21 15:36 ` Eric W. Biederman
@ 2002-07-21 16:00 ` Christoph Rohland
2002-07-21 16:43 ` Linus Torvalds
2002-07-21 16:26 ` Ingo Molnar
2 siblings, 1 reply; 23+ messages in thread
From: Christoph Rohland @ 2002-07-21 16:00 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel

Hi Linus,

On Sat, 20 Jul 2002, Linus Torvalds wrote:
> The thing is, nobody should really ever use timeouts, because the
> notion of "I want to sleep X seconds" is simply not _useful_ if the
> process also just got delayed by a page-out event as it said so.
> What does "X seconds" mean at that point? It's ambiguous - and the
> kernel will (quite naturally) just always assume that it is "X
> seconds from when the kernel got notified".
>
> A _useful_ interface would be to say "I want to sleep to at most
> time X" or "to at least time X". Those are unambiguous things to
> say, and are not open to interpretation.

Yes, so everybody really using select assumes it's _at least_ X
seconds... So where's the problem? I always know it's at least in a
multiprocess environment. (At least as long as I do not want to fiddle
with scheduling and priorities)

> The Linux behaviour of modifying the timeout is a half-assed try for
> restartability, but the problem is that (a) nobody else does that or
> expects it to happen, despite the man-pages originally claiming that
> they were supposed to and (b) it inherently has rounding problems
> and other ambiguities - making it even less useful.

Yes, and probably select is one of the calls you most of the time use
because of portability. So IMHO a linuxism isn't worth the effort.

Greetings
Christoph
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] 'select' failure or signal should not update timeout
2002-07-21 16:00 ` Christoph Rohland
@ 2002-07-21 16:43 ` Linus Torvalds
2002-07-21 17:51 ` dean gaudet
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Linus Torvalds @ 2002-07-21 16:43 UTC (permalink / raw)
To: Christoph Rohland; +Cc: linux-kernel

On 21 Jul 2002, Christoph Rohland wrote:
>
> Yes, so everybody really using select assumes it's _at least_ X
> seconds... So where's the problem?

Have you tried to _do_ this? I doubt you have, since you think it works
well already.

The fact is, that if you're doing soft-realtime, you end up having to
call gettimeofday() a lot more than you should. Your timeouts are
fundamentally "real time" (ie they are _not_ of the type "I should show
the next frame in 0.0333 seconds" but they are really "I showed frame N
at time X, so I need to show frame N+1 at time X+0.0333").

The fact that select() and friends do not work with real time, but
offsets, and is not restartable means that you end up having to do two
gettimeofday() calls per select in these situations.

In contrast, if you could just rely on absolute time in select(), you
would be re-startable _and_ you'd not have to do the extra "what time is
it now, so that I know what timeout I need to use for the next thing"?

> Yes, and probably select is one of the calls you most of the time use
> because of portability. So IMHO a linuxism isn't worth the effort.

The fact is, the linuxism exists, and breaking it is worse than not
breaking it. The number of users is probably small, but they do exist.

Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
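[Editorial sketch, not part of the original message.] The "frame N+1 at time X+0.0333" point can be illustrated with a small C helper that derives every frame's due time from the time the first frame was shown, so that per-frame lateness does not accumulate. The name frame_deadline and its parameters are made up for this illustration; nothing like it appears in the thread.

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical helper: absolute time at which frame n is due, given the
 * time at which frame 0 was shown and a fixed period in microseconds.
 * Each deadline is computed from the first frame, not from the previous
 * wakeup, so a late wakeup does not shift later frames. */
static struct timeval frame_deadline(struct timeval first,
                                     unsigned long n,
                                     unsigned long period_usec)
{
    struct timeval due;
    unsigned long long usec;

    /* The input must be a normalized timeval. */
    assert(first.tv_usec >= 0 && first.tv_usec < 1000000);

    usec = (unsigned long long)first.tv_usec
         + (unsigned long long)n * period_usec;
    due.tv_sec = first.tv_sec + (time_t)(usec / 1000000ULL);
    due.tv_usec = (long)(usec % 1000000ULL);
    return due;
}
```

With a relative-timeout interface such as select(), the caller still has to subtract the current time from this deadline before each wait, which is the extra gettimeofday() call the message complains about.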
* Re: [PATCH] 'select' failure or signal should not update timeout
2002-07-21 16:43 ` Linus Torvalds
@ 2002-07-21 17:51 ` dean gaudet
2002-07-22 3:59 ` Edgar Toernig
2002-07-22 6:51 ` Christoph Rohland
2 siblings, 0 replies; 23+ messages in thread
From: dean gaudet @ 2002-07-21 17:51 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Christoph Rohland, linux-kernel

On Sun, 21 Jul 2002, Linus Torvalds wrote:
> The fact is, the linuxism exists, and breaking it is worse than not
> breaking it.

fortunately, glibc uses poll() rather than select() these days (so that
it avoids bugs with programs with huge numbers of fds). so that ancient
code in the libc5 resolver (see res_send) which still relies on this
linuxism is dying out :)

-dean
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] 'select' failure or signal should not update timeout
2002-07-21 16:43 ` Linus Torvalds
2002-07-21 17:51 ` dean gaudet
@ 2002-07-22 3:59 ` Edgar Toernig
2002-07-22 6:51 ` Christoph Rohland
2 siblings, 0 replies; 23+ messages in thread
From: Edgar Toernig @ 2002-07-22 3:59 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Christoph Rohland, linux-kernel

Linus Torvalds wrote:
>
> In contrast, if you could just rely on absolute time in select(), you
> would be re-startable _and_ you'd not have to do the extra "what time is
> it now, so that I know what timeout I need to use for the next thing"?

I agree. Absolute times are nicer.

Just one note: to make that work you need a sane time source!
gettimeofday jumps back and forth. You want a getuptime (or similar)
that gives a constant monotonous growing value not adjustable from
userspace (and preferably the same for all processes).

Ciao, ET.
^ permalink raw reply [flat|nested] 23+ messages in thread
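[Editorial sketch, not part of the original message.] On systems that implement the POSIX clock_gettime() interface with CLOCK_MONOTONIC, something close to the "getuptime" asked for here can be written as follows. The helper name monotonic_ns is invented for this note, and whether a given kernel of that era supported the clock is not something the thread states.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: nanoseconds elapsed since an unspecified fixed
 * point in the past, read from CLOCK_MONOTONIC. The value cannot be set
 * from user space. Returns -1 if the clock cannot be read. */
static int64_t monotonic_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + (int64_t)ts.tv_nsec;
}
```

Two successive readings never go backwards, so the difference between them is a usable elapsed time even if the wall clock is stepped in between.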
* Re: [PATCH] 'select' failure or signal should not update timeout
2002-07-21 16:43 ` Linus Torvalds
2002-07-21 17:51 ` dean gaudet
2002-07-22 3:59 ` Edgar Toernig
@ 2002-07-22 6:51 ` Christoph Rohland
2 siblings, 0 replies; 23+ messages in thread
From: Christoph Rohland @ 2002-07-22 6:51 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel

Hi Linus,

On Sun, 21 Jul 2002, Linus Torvalds wrote:
>> Yes, so everybody really using select assumes it's _at least_ X
>> seconds... So where's the problem?
>
> Have you tried to _do_ this? I doubt you have, since you think it
> works well already.

Well enough for me and my customers :-)

> The fact is, that if you're doing soft-realtime, you end up having
> to call gettimeofday() a lot more than you should.

OK, I do not do (soft-)realtime. For non-realtime needs the current
scheme with relative timeouts is easier to use since you do not need to
call gettimeofday at all.

Greetings
Christoph
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] 'select' failure or signal should not update timeout
2002-07-20 5:57 ` Linus Torvalds
2002-07-21 15:36 ` Eric W. Biederman
2002-07-21 16:00 ` Christoph Rohland
@ 2002-07-21 16:26 ` Ingo Molnar
2 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2002-07-21 16:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel

On Sat, 20 Jul 2002, Linus Torvalds wrote:
> The thing is, nobody should really ever use timeouts, because the notion
> of "I want to sleep X seconds" is simply not _useful_ if the process
> also just got delayed by a page-out event as it said so. What does "X
> seconds" mean at that point? It's ambiguous - and the kernel will (quite
> naturally) just always assume that it is "X seconds from when the kernel
> got notified".
>
> A _useful_ interface would be to say "I want to sleep to at most time X"
> or "to at least time X". Those are unambiguous things to say, and are
> not open to interpretation.

on the other hand, the application itself cannot even know what exact
absolute time it is, in any unambiguous form - what if right after the
gettimeofday() it got scheduled away and swapped out for many seconds?
so the notion of 'sleep until absolute time X' just brings the 'time
uncertainty' down one more level, it doesn't eliminate it.

the rounding issue is valid when an unlimited number of restarts are
allowed - N x relative timeouts are numerically inaccurate. But there is
no fundamental difference (only performance difference): correct
timeouts can be achieved even if the kernel interface only supports
relative timeouts: the application has to save the absolute target time
and has to recalculate the relative timeout based on the target date and
current date. (which involves multiple calls to gettimeofday(), so it's
additional overhead.)

Ingo
^ permalink raw reply [flat|nested] 23+ messages in thread
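[Editorial sketch, not part of the original message.] The scheme described in the last paragraph (save the absolute target time, then recompute the relative timeout before every wait) might look roughly like this in C. The names remaining_until and sleep_until are hypothetical, and the sketch uses gettimeofday() only because that is the clock the message talks about; it inherits that clock's jumps.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: deadline minus now, clamped at zero. */
static struct timeval remaining_until(const struct timeval *deadline,
                                      const struct timeval *now)
{
    struct timeval left = {0, 0};
    long long sec;
    long usec;

    /* Both inputs must be normalized timevals. */
    assert(deadline->tv_usec < 1000000 && now->tv_usec < 1000000);

    sec = (long long)deadline->tv_sec - (long long)now->tv_sec;
    usec = (long)deadline->tv_usec - (long)now->tv_usec;
    if (usec < 0) {
        sec -= 1;
        usec += 1000000;
    }
    if (sec >= 0) {
        left.tv_sec = (time_t)sec;
        left.tv_usec = usec;
    }
    return left;
}

/* Hypothetical helper: sleep until an absolute gettimeofday() deadline.
 * The relative timeout is recomputed from the saved deadline on every
 * pass, so the result does not depend on whether select() modified the
 * timeout when it was interrupted. Returns 0 once the deadline has
 * passed, -1 on any error other than EINTR. */
static int sleep_until(const struct timeval *deadline)
{
    for (;;) {
        struct timeval now, left;

        if (gettimeofday(&now, NULL) != 0)
            return -1;
        left = remaining_until(deadline, &now);
        if (select(0, NULL, NULL, NULL, &left) == 0)
            return 0;          /* timeout expired */
        if (errno != EINTR)
            return -1;         /* real failure */
        /* interrupted by a signal: loop and recompute */
    }
}
```

The cost is one gettimeofday() call per wait on top of whatever call produced the deadline, which is the additional overhead the message mentions.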
* Re: [PATCH] 'select' failure or signal should not update timeout
2002-07-20 0:38 ` Alan Cox
2002-07-20 5:57 ` Linus Torvalds
@ 2002-07-21 20:14 ` Richard Stallman
1 sibling, 0 replies; 23+ messages in thread
From: Richard Stallman @ 2002-07-21 20:14 UTC (permalink / raw)
To: alan; +Cc: eggert, linux-kernel, alan

    This is extremely useful behaviour. POSIX is broken here. Fix it in
    the C library or somewhere it doesn't harm the clueful

Why is it useful? For signal handlers to see how much waiting time is
left?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] 'select' failure or signal should not update timeout
@ 2002-07-20 3:59 dank
0 siblings, 0 replies; 23+ messages in thread
From: dank @ 2002-07-20 3:59 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org

Alan Cox wrote:
> > <http://www.opengroup.org/onlinepubs/007904975/functions/select.html>
> > says that 'select' may modify its timeout argument only "upon
> > successful completion". However, the Linux kernel sometimes modifies
> > the timeout argument even when 'select' fails or is interrupted.
>
> This is extremely useful behaviour. POSIX is broken here.

I tried to make use of this behavior back in 2.2 days, I think,
and ran into trouble. The time remaining wasn't quite right, I seem
to recall, making this nifty feature less useful. I've since
given up on it.

> Fix it in the C library or somewhere it doesn't harm the clueful

Can you give an example of a clueful package that makes
use of this feature and would be harmed if select() suddenly
became posix-compliant?

- Dan
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] 'select' failure or signal should not update timeout
@ 2002-07-21 3:34 Peter T. Breuer
0 siblings, 0 replies; 23+ messages in thread
From: Peter T. Breuer @ 2002-07-21 3:34 UTC (permalink / raw)
To: linux-kernel
Dan writes:
> Alan Cox wrote:
> > > <http://www.opengroup.org/onlinepubs/007904975/functions/select.html>
> > > says that 'select' may modify its timeout argument only "upon
> > > successful completion". However, the Linux kernel sometimes modifies
> > > the timeout argument even when 'select' fails or is interrupted.
> >
> > This is extremely useful behaviour. POSIX is broken here.
>
> I tried to make use of this behavior back in 2.2 days, I think,
> and ran into trouble. The time remaining wasn't quite right, I seem
> to recall, making this nifty feature less useful. I've since
> given up on it.
>
> > Fix it in the C library or somewhere it doesn't harm the clueful
>
> Can you give an example of a clueful package that makes
> use of this feature and would be harmed if select() suddenly
> became posix-compliant?
Daemons that I've written for linux-specific tasks all
use the select timeout in order to wait for an event for a fixed
amount of time, across possible interrupts.
That is to say, they watch the errno after return from select, and if
it was EINTR, then they reenter the select without further ado.
Since the timeout has changed to reflect the time remaining, that's
quite right.
This is typical of daemons doing tcp/ip. I guess one answer would be to
make tcp timeout more configurable.
Peter
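[Editorial sketch, not part of the original message.] The daemon pattern described above, re-entering select() after EINTR and trusting the kernel to have left the remaining time in the timeout, could be written along these lines. The helper name read_with_timeout is invented here. Note the assumption it depends on: only Linux updates the timeout this way, so on other systems each restart would wait for the full interval again.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable or *timeout runs out,
 * then read from it. After EINTR the call re-enters select() with the
 * same struct timeval; this is only correct if the kernel has updated
 * it to the time remaining (the Linux-specific behaviour under
 * discussion). Returns the read() result, or -1 with errno set to
 * ETIMEDOUT if nothing arrived in time. */
static ssize_t read_with_timeout(int fd, void *buf, size_t len,
                                 struct timeval *timeout)
{
    /* FD_SET is only defined for descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    for (;;) {
        fd_set readfds;
        int rc;

        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        rc = select(fd + 1, &readfds, NULL, NULL, timeout);
        if (rc > 0)
            return read(fd, buf, len);
        if (rc == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (errno != EINTR)
            return -1;
        /* EINTR: re-enter without touching *timeout */
    }
}
```

A portable variant would keep an absolute deadline and recompute the timeout before each call instead of reusing the modified value.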
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] 'select' failure or signal should not update timeout
@ 2002-07-28 10:33 linux
0 siblings, 0 replies; 23+ messages in thread
From: linux @ 2002-07-28 10:33 UTC (permalink / raw)
To: linux-kernel
Chris Friesen asked for:
> waitonmonotonicallyincreasingnonadjustablehighres64bittime()
Well, take the POSIX clock_gettime() interface and add clock_waittime().
Oh, wait.. they already did it. clock_nanosleep().
The POSIX folks realized that people want a variety of timers, and
so the functions take a clockid_t first argument, which is just an enum.
They defined two values, but leave the field open to others:
- CLOCK_MONOTONIC, which is what you want. Unspecified epoch
(possibly boot time), and never gets adjusted
- CLOCK_REALTIME, which is the classic time() UTC time.
Extensions define CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID.
The clock weenies are welcome to add CLOCK_TAI, CLOCK_GPS, CLOCK_UTS
(see Markus Kuhn's suggestion), CLOCK_UTC (with some "better" leap-second
handling), CLOCK_FREQADJUST (uses frequency but not phase adjustments),
CLOCK_NOSTEP (frequency and phase adjustments, but doesn't step),
and anything else you like.
Astronomers might add CLOCK_UT1, CLOCK_UT0, CLOCK_SIDERIAL, CLOCK_TDB,
CLOCK_TDT, CLOCK_TCG, CLOCK_TCB, and maybe a few things I haven't thought
of. The interface doesn't require that all of these be implemented in
the kernel, of course.
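[Editorial sketch, not part of the original message.] On a system that implements clock_nanosleep(), an absolute-deadline sleep on CLOCK_MONOTONIC can be written as below. The helper name sleep_for_ns_abs is made up for this note; the calls and flags (clock_gettime, clock_nanosleep, TIMER_ABSTIME) are the POSIX ones, and delay_ns is assumed to be non-negative.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: sleep until "now + delay_ns" on CLOCK_MONOTONIC.
 * Because the deadline is absolute, restarting after a signal needs no
 * recomputation: the same timespec is simply passed again.
 * Returns 0 on success or an error number. */
static int sleep_for_ns_abs(long delay_ns)
{
    struct timespec deadline;
    int rc;

    if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0)
        return errno;

    deadline.tv_sec += delay_ns / 1000000000L;
    deadline.tv_nsec += delay_ns % 1000000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* clock_nanosleep() returns the error number directly rather than
     * setting errno. */
    do {
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                             &deadline, NULL);
    } while (rc == EINTR);
    return rc;
}
```

This is the restartable "wait until time X" form discussed earlier in the thread, applied to a plain sleep rather than to select().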
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2002-07-28 10:30 UTC | newest]
Thread overview: 23+ messages
[not found] <200207171430.g6HEUvY23619@aztec.santafe.edu>
2002-07-19 9:52 ` [PATCH] 'select' failure or signal should not update timeout Paul Eggert
2002-07-20 0:38 ` Alan Cox
2002-07-20 5:57 ` Linus Torvalds
2002-07-21 15:36 ` Eric W. Biederman
2002-07-24 13:44 ` Jamie Lokier
2002-07-24 18:48 ` Linus Torvalds
2002-07-24 19:07 ` Chris Friesen
2002-07-24 23:30 ` Jamie Lokier
2002-07-25 6:32 ` Rusty Russell
2002-07-25 18:31 ` george anzinger
2002-07-28 5:40 ` David Schwartz
2002-07-25 16:35 ` Eric W. Biederman
2002-07-25 17:15 ` Jamie Lokier
2002-07-21 16:00 ` Christoph Rohland
2002-07-21 16:43 ` Linus Torvalds
2002-07-21 17:51 ` dean gaudet
2002-07-22 3:59 ` Edgar Toernig
2002-07-22 6:51 ` Christoph Rohland
2002-07-21 16:26 ` Ingo Molnar
2002-07-21 20:14 ` Richard Stallman
2002-07-20 3:59 dank
-- strict thread matches above, loose matches on Subject: below --
2002-07-21 3:34 Peter T. Breuer
2002-07-28 10:33 linux