* Re: [PATCH] Revised timerfd() interface @ 2007-08-25 6:41 Michael Kerrisk 2007-08-30 12:01 ` Davide Libenzi 0 siblings, 1 reply; 29+ messages in thread From: Michael Kerrisk @ 2007-08-25 6:41 UTC (permalink / raw) To: Davide Libenzi, Randy Dunlap Cc: Michael Kerrisk, Andrew Morton, Thomas Gleixner, lkml, Linux Torvalds, drepper, stable, Christoph Hellwig, Jan Engelhardt, Jonathan Corbet [resend, since LKML didn't like last send.] Randy, Thanks for the input. I will try to send a revised patch after I get back from holiday. Some comments below. Davide -- ping! Can you please offer your comments about this change, and also thoughts on Jon's and my comments about a more radical API change later in this thread. Cheers, Michael Cheers, Michael On 8/14/07, Randy Dunlap <rdunlap@xenotime.net> wrote: > > On Thu, 09 Aug 2007 23:11:45 +0200 Michael Kerrisk wrote: > > > Andrew, > > > > Here's my first shot at changing the timerfd() interface; patch > > is against 2.6.23-rc1-mm2. > > > > I think I got to the bottom of understanding the timer code in > > the end, but I may still have missed some things... > > > > This is my first kernel patch of any substance. Perhaps Randy > > or Christoph can give my toes a light toasting for the various > > boobs I've likely made. > > > > Thomas, would you please look over this patch to verify my > > timer code? > > > > Davide: would you please bless this interface change ;-). > > It makes it possible to: > > > > 1) Fetch the previous timer settings (i.e., interval, and time > > remaining until next expiration) while installing new settings > > > > 2) Retrieve the settings of an existing timer without changing > > the timer. > > > > Both of these features are supported by the older Unix timer APIs > > (setitimer/getitimer and timer_create/timer_settime/timer_gettime). > > Hi, > > OK, I'll make a few comments. > > 1. Provide a full changelog, including reasons why the patch is > needed. Don't assume that we have read the previous email threads. > (We haven't. :) Okay -- will do for a future revision of this patch. In the meantime, the most relevant earlier mail thread is this: http://article.gmane.org/gmane.linux.kernel/559193 The following are also of some relevance: http://article.gmane.org/gmane.linux.kernel/556043 http://article.gmane.org/gmane.linux.kernel/556124 > The changes are as follows: > > > > a) A further argument, outmr, can be used to retrieve the interval > > and time remaining before the expiration of a previously created > > timer. Thus the call signature is now: > > > > timerfd(int utmr, int clockid, int flags, > > ufd, Yes. > struct itimerspec *utmr, > > struct itimerspec *outmr) > > > > The outmr argument: > > > > 1) must be NULL for a new timer (ufd == -1). > > 2) can be NULL if we are updating an existing > > timer (i.e., utmr != NULL), but we are not > > interested in the previous timer value. > > 3) can be used to retrieve the time until next timer > > expiration, without changing the timer. > > (In this case, utmr == NULL and ufd >= 0.) > > > > b) Make the 'clockid' immutable: it can only be set > > if 'ufd' is -1 -- that is, on the timerfd() call that > > first creates a timer. (This is effectively > > the limitation that is imposed by POSIX timers: the > > clockid is specified when the timer is created with > > timer_create(), and can't be changed.) > > > > [In the 2.6.22 interface, the clockid of an existing > > timer can be changed with a further call to timerfd() > > that specifies the file descriptor of an existing timer.] > > > > The use cases are as follows: > > > > ufd = timerfd(-1, clockid, flags, utmr, NULL); > > to create a new timer with given clockid, flags, and utmr (initial > > expiration + interval). outmr must be NULL. > > > > ufd = timerfd(ufd, -1, flags, utmr, NULL); > > to change the flags and timer settings of an existing timer. > > (The clockid argument serves no purpose when modifying an > > existing timer, and as I've implemented things, the argument > > must be specified as -1 for an existing timer.) > > > > ufd = timerfd(ufd, -1, flags, utmr, &outmr); > > to change the flags and timer settings of an existing timer, and > > retrieve the time until the next expiration of the timer (and > > the associated interval). > > > > ufd = timerfd(ufd, -1, 0, NULL, &outmr); > > Return the time until the next expiration of the timer (and the > > associated interval), without changing the existing timer settings. > > The flags argument has no meaning in this case, and must be > > specified as 0. > > > > Other changes: > > > > * I've added checks for various combinations of arguments values > > that could be deemed invalid; there is probably room for debate > > about whether we want all of these checks. Comments in the > > patch describe these checks. > > > > * Appropriate, and possibly even correct, changes made in fs/compat.c. > > > > * Jan Engelhardt noted that a cast could be removed from Davide's > > original code; I've removed it. > > > > The changes are correct as far as I've tested them. I've attached a > > test program. Davide, could you subject this to further test please? > > > > I'm off on holiday the day after tomorrow, back in two weeks, with > > very limited computer access. I may have time for another pass of > > the code if comments come in over(euro)night, otherwise Davide may > > be willing to clean up whatever I messed up... > > > > Cheers, > > > > Michael > > > > diff -ruN linux-2.6.23-rc1-mm2-orig/fs/compat.c linux-2.6.23-rc1-mm2 > /fs/compat.c > > --- linux-2.6.23-rc1-mm2-orig/fs/compat.c 2007-08-05 10:43: > 30.000000000 +0200 > > +++ linux-2.6.23-rc1-mm2/fs/compat.c 2007-08-07 22:08:15.000000000+0200 > > @@ -2211,18 +2211,38 @@ > > #ifdef CONFIG_TIMERFD > > > > asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags, > > - const struct compat_itimerspec __user > *utmr) > > + const struct compat_itimerspec __user *utmr, > > + struct compat_itimerspec __user *old_utmr) > > { > > struct itimerspec t; > > struct itimerspec __user *ut; > > + long ret; > > > > - if (get_compat_itimerspec(&t, utmr)) > > - return -EFAULT; > > - ut = compat_alloc_user_space(sizeof(*ut)); > > - if (copy_to_user(ut, &t, sizeof(t))) > > - return -EFAULT; > > + /*:mtk: Framework taken from compat_sys_mq_getsetattr() */ > > Drop the :mtk: string(s). Yes, that was just a marker to myself for comments that should eventually be removed. > - return sys_timerfd(ufd, clockid, flags, ut); > > + ut = compat_alloc_user_space(2 * sizeof(*ut)); > > + > > + if (utmr) { > > + if (get_compat_itimerspec(&t, utmr)) > > + return -EFAULT; > > + if (copy_to_user(ut, &t, sizeof(t))) > > + return -EFAULT; > > + } > > + > > + ret = sys_timerfd(ufd, clockid, flags, > > + utmr ? ut : NULL, > > + old_utmr ? ut + 1 : NULL); > > + > > + if (ret) > > + return ret; > > + > > + if (old_utmr) { > > + if (copy_from_user(&t, old_ut, sizeof(t)) || > > I can't find <old_ut> anywhere. Should be old_utmr I guess. > > > + put_compat_itimerspec(&t, old_utmr)) > > put_compat_itimerspec(old_utmr, &t)) > > IOW, I think that the parameters are reversed (at least this way > their types match the function prototype). I'm not sure what happened there. I had a compiled, working kernel with the patch. Somehow I've messed up while making the patch, I guess. > + return -EFAULT; > > + } > > + > > + return 0; > > } > > > > #endif /* CONFIG_TIMERFD */ > > diff -ruN linux-2.6.23-rc1-mm2-orig/fs/timerfd.c linux-2.6.23-rc1-mm2 > /fs/timerfd.c > > --- linux-2.6.23-rc1-mm2-orig/fs/timerfd.c 2007-08-05 10:44: > 24.000000000 +0200 > > +++ linux-2.6.23-rc1-mm2/fs/timerfd.c 2007-08-09 22:11:25.000000000+0200 > > @@ -2,6 +2,8 @@ > > * fs/timerfd.c > > * > > * Copyright (C) 2007 Davide Libenzi <davidel@xmailserver.org> > > + * and Copyright (C) 2007 Google, Inc. > > + * (Author: Michael Kerrisk <mtk@google.com>) > > * > > * > > * Thanks to Thomas Gleixner for code reviews and useful comments. > > @@ -26,6 +28,12 @@ > > ktime_t tintv; > > wait_queue_head_t wqh; > > int expired; > > + int clockid; /* Established when timer is created, then > > + immutable */ > > + int overrun; /* Number of overruns for this timer so far, > > + updated on calls to timerfd() that retrieve > > + settings of an existing timer, reset to zero > > + by timerfd_read() */ > > }; > > > > /* > > @@ -61,6 +69,7 @@ > > hrtimer_init(&ctx->tmr, clockid, htmode); > > ctx->tmr.expires = texp; > > ctx->tmr.function = timerfd_tmrproc; > > + ctx->overrun = 0; > > if (texp.tv64 != 0) > > hrtimer_start(&ctx->tmr, texp, htmode); > > } > > @@ -130,10 +139,11 @@ > > * callback to avoid DoS attacks specifying a very > > * short timer period. > > */ > > - ticks = (u64) > > + ticks = ctx->overrun + > > hrtimer_forward(&ctx->tmr, > > > hrtimer_cb_get_time(&ctx->tmr), > > ctx->tintv); > > + ctx->overrun = 0; > > hrtimer_restart(&ctx->tmr); > > } else > > ticks = 1; > > @@ -151,24 +161,69 @@ > > }; > > > > asmlinkage long sys_timerfd(int ufd, int clockid, int flags, > > - const struct itimerspec __user *utmr) > > + const struct itimerspec __user *utmr, > > + struct itimerspec __user *old_utmr) > > { > > int error; > > struct timerfd_ctx *ctx; > > struct file *file; > > struct inode *inode; > > - struct itimerspec ktmr; > > + struct itimerspec ktmr, oktmr; > > > > - if (copy_from_user(&ktmr, utmr, sizeof(ktmr))) > > - return -EFAULT; > > + /* > > + * Not sure if we really need the first check below -- it > > + * relies on all clocks having a non-negative value. That's > > + * currently true, but I'm not sure that it is anywhere > > + * documented as a requirement. > > + * (The point of the check is that clockid has no meaning if > > + * we are changing the settings of an existing timer > > + * (i.e., ufd >= 0), so let's require it to be an otherwise > > + * unused value.) > > + */ > > > > - if (clockid != CLOCK_MONOTONIC && > > - clockid != CLOCK_REALTIME) > > + if (ufd >= 0) { > > + if (clockid != -1) > > + return -EINVAL; > > + } else if (clockid != CLOCK_MONOTONIC && clockid != > CLOCK_REALTIME) > > + return -EINVAL; > > + > > + /* > > + * Disallow any other bits in flags than those we implement. > > + */ > > + if ((flags & ~(TFD_TIMER_ABSTIME)) != 0) > > + return -EINVAL; > > + > > + /* > > + * Not sure if this check is strictly necessary, except to stop > > Use tab+space instead of spaces. Doh! Yes. > + * userspace trying to do something silly. Flags only has meaning > > + * if we are creating/modifying a timer. > > + */ > > + if (utmr == NULL && flags != 0) > > return -EINVAL; > > - if (!timespec_valid(&ktmr.it_value) || > > - !timespec_valid(&ktmr.it_interval)) > > + > > + /* > > + * If we are creating a new timer, then we must supply the setting > > + * and there is no previous timer setting to retrieve. > > + */ > > + if (ufd == -1 && (utmr == NULL || old_utmr != NULL)) > > + return -EINVAL; > > + > > + /* > > + * Caller must provide a new timer setting, or ask for previous > > + * setting, or both. > > + */ > > + if (utmr == NULL && old_utmr == NULL) > > return -EINVAL; > > > > + if (utmr) { > > + if (copy_from_user(&ktmr, utmr, sizeof(ktmr))) > > + return -EFAULT; > > + > > + if (!timespec_valid(&ktmr.it_value) || > > + !timespec_valid(&ktmr.it_interval)) > > + return -EINVAL; > > + } > > + > > if (ufd == -1) { > > ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); > > if (!ctx) > > @@ -176,6 +231,11 @@ > > > > init_waitqueue_head(&ctx->wqh); > > > > + /* > > + * Save the clockid since we need it later if we modify > > + * the timer. > > + */ > > + ctx->clockid = clockid; > > timerfd_setup(ctx, clockid, flags, &ktmr); > > > > /* > > @@ -195,23 +255,61 @@ > > fput(file); > > return -EINVAL; > > } > > - /* > > - * We need to stop the existing timer before reprogramming > > - * it to the new values. > > - */ > > - for (;;) { > > - spin_lock_irq(&ctx->wqh.lock); > > - if (hrtimer_try_to_cancel(&ctx->tmr) >= 0) > > - break; > > - spin_unlock_irq(&ctx->wqh.lock); > > - cpu_relax(); > > + > > + if (old_utmr) { > > + > > + /* > > + * Retrieve interval and time remaining before > > + * next expiration of timer. > > + */ > > + > > + struct hrtimer *timer = &ctx->tmr; > > + ktime_t iv = ctx->tintv; > > + ktime_t now, remaining; > > + > > + clockid = ctx->clockid; > > + > > + memset(&oktmr, 0, sizeof(struct itimerspec)); > > + if (iv.tv64) > > + oktmr.it_interval = ktime_to_timespec(iv); > > + > > + now = timer->base->get_time(); > > + ctx->overrun += hrtimer_forward(&ctx->tmr, > > + hrtimer_cb_get_time(&ctx->tmr), > > + ctx->tintv); > > + remaining = ktime_sub(timer->expires, now); > > + if (remaining.tv64 <= 0) > > + oktmr.it_value.tv_nsec = 1; > > + else > > + oktmr.it_value = > ktime_to_timespec(remaining); > > + > > + if (copy_to_user(old_utmr, &oktmr, sizeof > (oktmr))) { > > + fput(file); > > + return -EFAULT; > > + } > > } > > - /* > > - * Re-program the timer to the new value ... > > - */ > > - timerfd_setup(ctx, clockid, flags, &ktmr); > > > > - spin_unlock_irq(&ctx->wqh.lock); > > + if (utmr) { > > + /* > > + * Modify settings of existing timer. > > + * We need to stop the existing timer before > > + * reprogramming it to the new values. > > + */ > > + for (;;) { > > + spin_lock_irq(&ctx->wqh.lock); > > + if (hrtimer_try_to_cancel(&ctx->tmr) >= 0) > > + break; > > + spin_unlock_irq(&ctx->wqh.lock); > > + cpu_relax(); > > + } > > + > > + /* > > + * Re-program the timer to the new value ... > > + */ > > + timerfd_setup(ctx, ctx->clockid, flags, &ktmr); > > + > > + spin_unlock_irq(&ctx->wqh.lock); > > + } > > fput(file); > > } > > > > @@ -222,4 +320,3 @@ > > kfree(ctx); > > return error; > > } > > - > > diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/compat.h > linux-2.6.23-rc1-mm2/include/linux/compat.h > > --- linux-2.6.23-rc1-mm2-orig/include/linux/compat.h 2007-07-09 01:32: > 17.000000000 +0200 > > +++ linux-2.6.23-rc1-mm2/include/linux/compat.h 2007-08-05 17:46: > 33.000000000 +0200 > > @@ -265,7 +265,8 @@ > > const compat_sigset_t __user *sigmask, > > compat_size_t sigsetsize); > > asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags, > > - const struct compat_itimerspec __user > *utmr); > > + const struct compat_itimerspec __user *utmr, > > + struct compat_itimerspec __user *old_utmr); > > > > #endif /* CONFIG_COMPAT */ > > #endif /* _LINUX_COMPAT_H */ > > diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h > linux-2.6.23-rc1-mm2/include/linux/syscalls.h > > --- linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h 2007-08-05 > 10:44:32.000000000 +0200 > > +++ linux-2.6.23-rc1-mm2/include/linux/syscalls.h 2007-08-05 17:47: > 15.000000000 +0200 > > @@ -608,7 +608,8 @@ > > asmlinkage long sys_getcpu(unsigned __user *cpu, unsigned __user *node, > struct getcpu_cache __user *cache); > > asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, > size_t sizemask); > > asmlinkage long sys_timerfd(int ufd, int clockid, int flags, > > - const struct itimerspec __user *utmr); > > + const struct itimerspec __user *utmr, > > + struct itimerspec __user *old_utmr); > > asmlinkage long sys_eventfd(unsigned int count); > > asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t > len); > > > --- > ~Randy > *** Remember to use Documentation/SubmitChecklist when testing your code > *** > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-08-25 6:41 [PATCH] Revised timerfd() interface Michael Kerrisk @ 2007-08-30 12:01 ` Davide Libenzi 2007-09-04 8:03 ` Michael Kerrisk 0 siblings, 1 reply; 29+ messages in thread From: Davide Libenzi @ 2007-08-30 12:01 UTC (permalink / raw) To: Michael Kerrisk Cc: Randy Dunlap, Andrew Morton, Thomas Gleixner, lkml, Linux Torvalds, Ulrich Drepper, stable, Christoph Hellwig, Jan Engelhardt, Jonathan Corbet On Sat, 25 Aug 2007, Michael Kerrisk wrote: > Davide -- ping! Can you please offer your comments about this change, and > also thoughts on Jon's and my comments about a more radical API change > later in this thread. IMO the complexity of the resulting API (and resulting patch), and the ABI change, is not justified by the added value. - Davide ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-08-30 12:01 ` Davide Libenzi @ 2007-09-04 8:03 ` Michael Kerrisk 2007-09-04 8:18 ` Andrew Morton 0 siblings, 1 reply; 29+ messages in thread From: Michael Kerrisk @ 2007-09-04 8:03 UTC (permalink / raw) To: Davide Libenzi Cc: Randy Dunlap, Andrew Morton, Thomas Gleixner, lkml, Linux Torvalds, Ulrich Drepper, stable, Christoph Hellwig, Jan Engelhardt, Jonathan Corbet Davide, >> Davide -- ping! Can you please offer your comments about this change, and >> also thoughts on Jon's and my comments about a more radical API change >> later in this thread. > > IMO the complexity of the resulting API (and resulting patch), and the ABI > change, is not justified by the added value. Neither of the proposed APIs (either my multiplexed version of timerfd() or Jon's/my idea of using three system calls (like POSIX timers), or the notion of timerfd() integrated with POSIX timers) is more complicated than the existing POSIX timers API. The ABI change doesn't really matter, since timerfd() was broken in 2.6.22 anyway. Both previous APIs provided the features I have described provide: * the ability to fetch the old timer value when applying a new setting * the ability to non-destructively fetch the amount of time remaining on a timer. This is clearly useful for timers -- but you have not explained why you think this is not necessary for timerfd timers. Please -- let's do timerfd() better. Either three syscalls like: timerfd_create() timerfd_settime() timer_gettime() (the analogs of timer_create(), timer_settime(), timer_gettime()). Or (if possible, and even better) timerfd() integrated with POSIX timers. Then we have a good API for the coming decades. I'm prepared to help out with patches (for what my help is worth ;-)). Cheers, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages/ read the HOWTOHELP file and grep the source files for 'FIXME'. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-04 8:03 ` Michael Kerrisk @ 2007-09-04 8:18 ` Andrew Morton 2007-09-04 8:24 ` Michael Kerrisk 2007-09-04 15:25 ` Davide Libenzi 0 siblings, 2 replies; 29+ messages in thread From: Andrew Morton @ 2007-09-04 8:18 UTC (permalink / raw) To: Michael Kerrisk Cc: davidel, rdunlap, tglx, linux-kernel, torvalds, drepper, stable, hch, jengelh, corbet > On Tue, 04 Sep 2007 10:03:56 +0200 Michael Kerrisk <mtk-manpages@gmx.net> wrote: > Davide, > > >> Davide -- ping! Can you please offer your comments about this change, and > >> also thoughts on Jon's and my comments about a more radical API change > >> later in this thread. > > > > IMO the complexity of the resulting API (and resulting patch), and the ABI > > change, is not justified by the added value. > > Neither of the proposed APIs (either my multiplexed version of timerfd() > or Jon's/my idea of using three system calls (like POSIX timers), or > the notion of timerfd() integrated with POSIX timers) is more > complicated than the existing POSIX timers API. > > The ABI change doesn't really matter, since timerfd() was broken in 2.6.22 > anyway. > > Both previous APIs provided the features I have described provide: > > * the ability to fetch the old timer value when applying > a new setting > > * the ability to non-destructively fetch the amount of time remaining > on a timer. > > This is clearly useful for timers -- but you have not explained why > you think this is not necessary for timerfd timers. <wakes up> I'd have thought that the existing stuff would be near-useless without the capabilities which you describe? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-04 8:18 ` Andrew Morton @ 2007-09-04 8:24 ` Michael Kerrisk 2007-09-04 15:25 ` Davide Libenzi 1 sibling, 0 replies; 29+ messages in thread From: Michael Kerrisk @ 2007-09-04 8:24 UTC (permalink / raw) To: Andrew Morton Cc: corbet, jengelh, hch, stable, drepper, torvalds, linux-kernel, tglx, rdunlap, davidel [...] > > Neither of the proposed APIs (either my multiplexed version of > > timerfd() > > or Jon's/my idea of using three system calls (like POSIX timers), or > > the notion of timerfd() integrated with POSIX timers) is more > > complicated than the existing POSIX timers API. > > > > The ABI change doesn't really matter, since timerfd() was broken in > > 2.6.22 anyway. > > > > Both previous APIs provided the features I have described provide: > > > > * the ability to fetch the old timer value when applying > > a new setting > > > > * the ability to non-destructively fetch the amount of time remaining > > on a timer. > > > > This is clearly useful for timers -- but you have not explained why > > you think this is not necessary for timerfd timers. > > <wakes up> > > I'd have thought that the existing stuff would be near-useless without > the capabilities which you describe? Not useless, but certainly less functional than it can/should be (and with not too much effort on the kernel implementation side). Cheers, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages , read the HOWTOHELP file and grep the source files for 'FIXME'. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-04 8:18 ` Andrew Morton 2007-09-04 8:24 ` Michael Kerrisk @ 2007-09-04 15:25 ` Davide Libenzi 2007-09-04 15:39 ` Michael Kerrisk ` (2 more replies) 1 sibling, 3 replies; 29+ messages in thread From: Davide Libenzi @ 2007-09-04 15:25 UTC (permalink / raw) To: Andrew Morton Cc: Michael Kerrisk, rdunlap, tglx, Linux Kernel Mailing List, Linus Torvalds, Ulrich Drepper, stable, hch, jengelh, corbet On Tue, 4 Sep 2007, Andrew Morton wrote: > > On Tue, 04 Sep 2007 10:03:56 +0200 Michael Kerrisk <mtk-manpages@gmx.net> wrote: > > Davide, > > > > >> Davide -- ping! Can you please offer your comments about this change, and > > >> also thoughts on Jon's and my comments about a more radical API change > > >> later in this thread. > > > > > > IMO the complexity of the resulting API (and resulting patch), and the ABI > > > change, is not justified by the added value. > > > > Neither of the proposed APIs (either my multiplexed version of timerfd() > > or Jon's/my idea of using three system calls (like POSIX timers), or > > the notion of timerfd() integrated with POSIX timers) is more > > complicated than the existing POSIX timers API. > > > > The ABI change doesn't really matter, since timerfd() was broken in 2.6.22 > > anyway. > > > > Both previous APIs provided the features I have described provide: > > > > * the ability to fetch the old timer value when applying > > a new setting > > > > * the ability to non-destructively fetch the amount of time remaining > > on a timer. > > > > This is clearly useful for timers -- but you have not explained why > > you think this is not necessary for timerfd timers. > > <wakes up> > > I'd have thought that the existing stuff would be near-useless without the > capabilities which you describe? Useless like it'd be a motorcycle w/out a cup-holder :) Seriously, the ability to get the previous values from "something" could have a meaning if this something is a shared global resource (like signals for example). In the timerfd case this makes little sense, since you can create as many timerfd as you like and you do not need to share a single one by changing/restoring the original context. On top of that, the cup-holder addition would cost in terms of API clarity (or in terms of two additional system calls in the other case), and in terms of kernel code footprint. Costs that IMO are not balanced by the added values. - Davide ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-04 15:25 ` Davide Libenzi @ 2007-09-04 15:39 ` Michael Kerrisk 2007-09-04 22:41 ` Davide Libenzi 2007-09-04 20:49 ` Michael Kerrisk 2007-09-05 12:12 ` Denys Vlasenko 2 siblings, 1 reply; 29+ messages in thread From: Michael Kerrisk @ 2007-09-04 15:39 UTC (permalink / raw) To: Davide Libenzi, akpm Cc: corbet, jengelh, hch, stable, drepper, torvalds, linux-kernel, tglx, rdunlap Hi Davide, > > <wakes up> > > > > I'd have thought that the existing stuff would be near-useless without > > the capabilities which you describe? > > Useless like it'd be a motorcycle w/out a cup-holder :) > Seriously, the ability to get the previous values from "something" could > have a meaning if this something is a shared global resource (like > signals > for example). In the timerfd case this makes little sense, since you can > create as many timerfd as you like and you do not need to share a single > one by changing/restoring the original context. However, one can have multipe POSIX timers, just as you can have multiple timerfd timers; nevertheless POSIX timers provide the get and get-while-setting functionality. > On top of that, the cup-holder addition would cost in terms of API > clarity I agree my proposed API is not as clean as it could be, that's why I would favour: > (or in terms of two additional system calls in the other case), Or better still, have timerfd() integrated with POSIX tiemrs (if this is feasible). This givesus a simple API, exactly one new syscall, and all of the functionality of the existing POSIX timers API. > and in terms of kernel code footprint. Not sure what your concern is here. The total amount of new code for all of these options is pretty small. Cheers, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages , read the HOWTOHELP file and grep the source files for 'FIXME'. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-04 15:39 ` Michael Kerrisk @ 2007-09-04 22:41 ` Davide Libenzi 0 siblings, 0 replies; 29+ messages in thread From: Davide Libenzi @ 2007-09-04 22:41 UTC (permalink / raw) To: Michael Kerrisk Cc: Andrew Morton, corbet, jengelh, hch, stable, Ulrich Drepper, Linus Torvalds, Linux Kernel Mailing List, tglx, rdunlap On Tue, 4 Sep 2007, Michael Kerrisk wrote: > Hi Davide, > > > > <wakes up> > > > > > > I'd have thought that the existing stuff would be near-useless without > > > the capabilities which you describe? > > > > Useless like it'd be a motorcycle w/out a cup-holder :) > > Seriously, the ability to get the previous values from "something" could > > have a meaning if this something is a shared global resource (like > > signals > > for example). In the timerfd case this makes little sense, since you can > > create as many timerfd as you like and you do not need to share a single > > one by changing/restoring the original context. > > However, one can have multipe POSIX timers, just as you can > have multiple timerfd timers; nevertheless POSIX timers provide > the get and get-while-setting functionality. The fact that POSIX defined a certain API in a given way, does not automatically mean that every other API has to look exactly like that. POSIX has the tendency to bloat things up at times ;) > > and in terms of kernel code footprint. > > Not sure what your concern is here. The total amount of > new code for all of these options is pretty small. >From your patch: fs/compat.c | 34 ++++++++-- fs/timerfd.c | 147 +++++++++++++++++++++++++++++++++++++++-------- include/linux/compat.h | 3 include/linux/syscalls.h | 3 4 files changed, 153 insertions(+), 34 deletions(-) And the API definition becomes pretty messy. The other way is to add new system calls. 120+ lines of code more of new system calls wouldn't even be a problem in itself, if the added value was there. IMO, as I already said, the added value does not justify them. - Davide ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-04 15:25 ` Davide Libenzi 2007-09-04 15:39 ` Michael Kerrisk @ 2007-09-04 20:49 ` Michael Kerrisk 2007-09-04 22:44 ` Davide Libenzi 2007-09-05 12:12 ` Denys Vlasenko 2 siblings, 1 reply; 29+ messages in thread From: Michael Kerrisk @ 2007-09-04 20:49 UTC (permalink / raw) To: Davide Libenzi, akpm Cc: corbet, jengelh, hch, stable, drepper, torvalds, linux-kernel, tglx, rdunlap > > > The ABI change doesn't really matter, since timerfd() was broken in > > > 2.6.22 anyway. > > > > > > Both previous APIs provided the features I have described provide: > > > > > > * the ability to fetch the old timer value when applying > > > a new setting > > > > > > * the ability to non-destructively fetch the amount of time remaining > > > on a timer. > > > > > > This is clearly useful for timers -- but you have not explained why > > > you think this is not necessary for timerfd timers. > > > > <wakes up> > > > > I'd have thought that the existing stuff would be near-useless without > > the capabilities which you describe? > > Useless like it'd be a motorcycle w/out a cup-holder :) > Seriously, the ability to get the previous values from "something" could > have a meaning if this something is a shared global resource (like > signals > for example). In the timerfd case this makes little sense, since you can > create as many timerfd as you like and you do not need to share a single > one by changing/restoring the original context. Davide, As I think about this more, I see more problems with your argument. timerfd needs the ability to get and get-while-setting just as much as the earlier APIs. Consider a library that creates a timerfd file descriptor that is handed off to an application: that library may want to modify the timer settings without having to create a new file descriptor (the app mey not be able to be told about the new fd). Your argument just doesn't hold, AFAICS. Cheers, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages , read the HOWTOHELP file and grep the source files for 'FIXME'. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-04 20:49 ` Michael Kerrisk @ 2007-09-04 22:44 ` Davide Libenzi 2007-09-05 0:08 ` Michael Kerrisk 0 siblings, 1 reply; 29+ messages in thread From: Davide Libenzi @ 2007-09-04 22:44 UTC (permalink / raw) To: Michael Kerrisk Cc: Andrew Morton, corbet, jengelh, hch, stable, Ulrich Drepper, Linus Torvalds, Linux Kernel Mailing List, tglx, rdunlap On Tue, 4 Sep 2007, Michael Kerrisk wrote: > > Useless like it'd be a motorcycle w/out a cup-holder :) > > Seriously, the ability to get the previous values from "something" could > > have a meaning if this something is a shared global resource (like > > signals > > for example). In the timerfd case this makes little sense, since you can > > create as many timerfd as you like and you do not need to share a single > > one by changing/restoring the original context. > > Davide, > > As I think about this more, I see more problems with > your argument. timerfd needs the ability to get and > get-while-setting just as much as the earlier APIs. > Consider a library that creates a timerfd file descriptor that > is handed off to an application: that library may want > to modify the timer settings without having to create a > new file descriptor (the app mey not be able to be told about > the new fd). Your argument just doesn't hold, AFAICS. Such hypotethical library, in case it really wanted to offer such functionality, could simply return an handle instead of the raw fd, and take care of all that stuff in userspace. Again, mimicking POSIX APIs doesn't always take you in the right place. - Davide ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-04 22:44 ` Davide Libenzi @ 2007-09-05 0:08 ` Michael Kerrisk 2007-09-05 12:02 ` Andrew Morton 2007-09-05 16:14 ` Davide Libenzi 0 siblings, 2 replies; 29+ messages in thread From: Michael Kerrisk @ 2007-09-05 0:08 UTC (permalink / raw) To: Davide Libenzi Cc: rdunlap, tglx, linux-kernel, torvalds, drepper, stable, hch, jengelh, corbet, akpm Davide, > > As I think about this more, I see more problems with > > your argument. timerfd needs the ability to get and > > get-while-setting just as much as the earlier APIs. > > Consider a library that creates a timerfd file descriptor that > > is handed off to an application: that library may want > > to modify the timer settings without having to create a > > new file descriptor (the app mey not be able to be told about > > the new fd). Your argument just doesn't hold, AFAICS. > > Such hypotethical library, in case it really wanted to offer such > functionality, could simply return an handle instead of the raw fd, and > take care of all that stuff in userspace. Did I miss something? Is it not the case that as soon as the library returns a handle, rather than an fd, then the whole advantage of timerfd() (being able to select/poll/epoll on the timer as well as other fds) is lost? > Again, mimicking POSIX APIs doesn't always take you in the right place. POSIX may goof in places, but in general it is the result of many smart people thinking about how to design/standardize APIs. So the onus is on us to explain why they got this point wrong. And it is not merely POSIX that did things things in the way I've described: so did the earlier setitimer()/getitimer(). Cheers, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages , read the HOWTOHELP file and grep the source files for 'FIXME'. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-05 0:08 ` Michael Kerrisk @ 2007-09-05 12:02 ` Andrew Morton 2007-09-05 16:14 ` Davide Libenzi 1 sibling, 0 replies; 29+ messages in thread From: Andrew Morton @ 2007-09-05 12:02 UTC (permalink / raw) To: Michael Kerrisk Cc: davidel, rdunlap, tglx, linux-kernel, torvalds, drepper, stable, hch, jengelh, corbet > On Wed, 05 Sep 2007 02:08:31 +0200 "Michael Kerrisk" <mtk-manpages@gmx.net> wrote: > Davide, > > > > As I think about this more, I see more problems with > > > your argument. timerfd needs the ability to get and > > > get-while-setting just as much as the earlier APIs. > > > Consider a library that creates a timerfd file descriptor that > > > is handed off to an application: that library may want > > > to modify the timer settings without having to create a > > > new file descriptor (the app mey not be able to be told about > > > the new fd). Your argument just doesn't hold, AFAICS. > > > > Such hypotethical library, in case it really wanted to offer such > > functionality, could simply return an handle instead of the raw fd, and > > take care of all that stuff in userspace. > > Did I miss something? Is it not the case that as soon as the > library returns a handle, rather than an fd, then the whole > advantage of timerfd() (being able to select/poll/epoll on > the timer as well as other fds) is lost? > > > Again, mimicking POSIX APIs doesn't always take you in the right place. > > POSIX may goof in places, but in general it is the result of > many smart people thinking about how to design/standardize APIs. > So the onus is on us to explain why they got this point wrong. > And it is not merely POSIX that did things things in the > way I've described: so did the earlier setitimer()/getitimer(). > <wakes up even more> Seems that we need to pay some more attention to this, as time is pressing and we should get this stuff finalised in 2.6.23. Which means that most of us have some catching-up to do. Michael, could you please refresh our memories with a brief, from-scratch summary of what the current interface is, followed by a summary of what you believe to be the shortcomings to be? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-05 0:08 ` Michael Kerrisk 2007-09-05 12:02 ` Andrew Morton @ 2007-09-05 16:14 ` Davide Libenzi 2007-09-05 16:23 ` Michael Kerrisk 1 sibling, 1 reply; 29+ messages in thread From: Davide Libenzi @ 2007-09-05 16:14 UTC (permalink / raw) To: Michael Kerrisk Cc: rdunlap, tglx, Linux Kernel Mailing List, Linus Torvalds, Ulrich Drepper, stable, hch, jengelh, corbet, Andrew Morton On Wed, 5 Sep 2007, Michael Kerrisk wrote: > Davide, A Michael! > > > As I think about this more, I see more problems with > > > your argument. timerfd needs the ability to get and > > > get-while-setting just as much as the earlier APIs. > > > Consider a library that creates a timerfd file descriptor that > > > is handed off to an application: that library may want > > > to modify the timer settings without having to create a > > > new file descriptor (the app mey not be able to be told about > > > the new fd). Your argument just doesn't hold, AFAICS. > > > > Such hypotethical library, in case it really wanted to offer such > > functionality, could simply return an handle instead of the raw fd, and > > take care of all that stuff in userspace. > > Did I miss something? Is it not the case that as soon as the > library returns a handle, rather than an fd, then the whole > advantage of timerfd() (being able to select/poll/epoll on > the timer as well as other fds) is lost? Why? The handle would simply be a little struct where the timerfd fd is stored, and a XXX_getfd() would return it. So my point is, I doubt such functionalities are really needed, and I also argue that the kernel is the best place for such wrapper code to go. - Davide ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-05 16:14 ` Davide Libenzi @ 2007-09-05 16:23 ` Michael Kerrisk 2007-09-05 19:57 ` Davide Libenzi 0 siblings, 1 reply; 29+ messages in thread From: Michael Kerrisk @ 2007-09-05 16:23 UTC (permalink / raw) To: Davide Libenzi Cc: akpm, corbet, jengelh, hch, stable, drepper, torvalds, linux-kernel, tglx, rdunlap Hi Davide, > > > > As I think about this more, I see more problems with > > > > your argument. timerfd needs the ability to get and > > > > get-while-setting just as much as the earlier APIs. > > > > Consider a library that creates a timerfd file descriptor that > > > > is handed off to an application: that library may want > > > > to modify the timer settings without having to create a > > > > new file descriptor (the app mey not be able to be told about > > > > the new fd). Your argument just doesn't hold, AFAICS. > > > > > > Such hypotethical library, in case it really wanted to offer such > > > functionality, could simply return an handle instead of the raw fd, > > > and take care of all that stuff in userspace. > > > > Did I miss something? Is it not the case that as soon as the > > library returns a handle, rather than an fd, then the whole > > advantage of timerfd() (being able to select/poll/epoll on > > the timer as well as other fds) is lost? > > Why? The handle would simply be a little struct where the timerfd fd is > stored, and a XXX_getfd() would return it. > So my point is, I doubt such functionalities are really needed, and I > also argue that the kernel is the best place for such wrapper code > to go. So what happens if one thread (via the library) wants modify a timer's settings at the same timer as another thread is select()ing on it? The first thread can't do this by creating a new timerfd timer, since it wants to affect the select() in the other thread? Cheers, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages , read the HOWTOHELP file and grep the source files for 'FIXME'. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-05 16:23 ` Michael Kerrisk @ 2007-09-05 19:57 ` Davide Libenzi 2007-09-05 22:50 ` Michael Kerrisk 0 siblings, 1 reply; 29+ messages in thread From: Davide Libenzi @ 2007-09-05 19:57 UTC (permalink / raw) To: Michael Kerrisk Cc: Andrew Morton, corbet, jengelh, hch, stable, Ulrich Drepper, Linus Torvalds, Linux Kernel Mailing List, tglx, rdunlap On Wed, 5 Sep 2007, Michael Kerrisk wrote: > Hi Davide, > > > > > > As I think about this more, I see more problems with > > > > > your argument. timerfd needs the ability to get and > > > > > get-while-setting just as much as the earlier APIs. > > > > > Consider a library that creates a timerfd file descriptor that > > > > > is handed off to an application: that library may want > > > > > to modify the timer settings without having to create a > > > > > new file descriptor (the app mey not be able to be told about > > > > > the new fd). Your argument just doesn't hold, AFAICS. > > > > > > > > Such hypotethical library, in case it really wanted to offer such > > > > functionality, could simply return an handle instead of the raw fd, > > > > and take care of all that stuff in userspace. > > > > > > Did I miss something? Is it not the case that as soon as the > > > library returns a handle, rather than an fd, then the whole > > > advantage of timerfd() (being able to select/poll/epoll on > > > the timer as well as other fds) is lost? > > > > Why? The handle would simply be a little struct where the timerfd fd is > > stored, and a XXX_getfd() would return it. > > So my point is, I doubt such functionalities are really needed, and I > > also argue that the kernel is the best place for such wrapper code > > to go. > > So what happens if one thread (via the library) wants modify > a timer's settings at the same timer as another thread is > select()ing on it? The first thread can't do this by creating > a new timerfd timer, since it wants to affect the select() > in the other thread? It can be done w/out any problems. The select thread will be notified whenever the new timer setting expires. - Davide ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-05 19:57 ` Davide Libenzi @ 2007-09-05 22:50 ` Michael Kerrisk 2007-09-05 23:45 ` Davide Libenzi 0 siblings, 1 reply; 29+ messages in thread From: Michael Kerrisk @ 2007-09-05 22:50 UTC (permalink / raw) To: Davide Libenzi Cc: rdunlap, tglx, linux-kernel, torvalds, drepper, stable, hch, jengelh, corbet, akpm Hi Davide, > > > > > > As I think about this more, I see more problems with > > > > > > your argument. timerfd needs the ability to get and > > > > > > get-while-setting just as much as the earlier APIs. > > > > > > Consider a library that creates a timerfd file descriptor that > > > > > > is handed off to an application: that library may want > > > > > > to modify the timer settings without having to create a > > > > > > new file descriptor (the app mey not be able to be told about > > > > > > the new fd). Your argument just doesn't hold, AFAICS. > > > > > > > > > > Such hypotethical library, in case it really wanted to offer such > > > > > functionality, could simply return an handle instead of the raw > > > > > fd, and take care of all that stuff in userspace. > > > > > > > > Did I miss something? Is it not the case that as soon as the > > > > library returns a handle, rather than an fd, then the whole > > > > advantage of timerfd() (being able to select/poll/epoll on > > > > the timer as well as other fds) is lost? > > > > > > Why? The handle would simply be a little struct where the timerfd fd > > > is > > > stored, and a XXX_getfd() would return it. > > > So my point is, I doubt such functionalities are really needed, and I > > > also argue that the kernel is the best place for such wrapper code > > > to go. > > > > So what happens if one thread (via the library) wants modify > > a timer's settings at the same timer as another thread is > > select()ing on it? The first thread can't do this by creating > > a new timerfd timer, since it wants to affect the select() > > in the other thread? > > It can be done w/out any problems. The select thread will be notified > whenever the new timer setting expires. We are going in circles here. I think you are missing my point. Consider the following [[ Thread A: calls library function which creates a timerfd file descriptor. Thread B: calls select() on the timerfd file descriptor. Thread A: calls library function which wants to: a) modify timer settings, and retrieve copy of current timer settings, and later b) restore old timer settings. ]] This seems a quite reasonable use-case to me, and the existing interface simply can't support it. Cheers, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages , read the HOWTOHELP file and grep the source files for 'FIXME'. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-05 22:50 ` Michael Kerrisk @ 2007-09-05 23:45 ` Davide Libenzi 2007-09-06 6:58 ` Michael Kerrisk 0 siblings, 1 reply; 29+ messages in thread From: Davide Libenzi @ 2007-09-05 23:45 UTC (permalink / raw) To: Michael Kerrisk Cc: rdunlap, tglx, Linux Kernel Mailing List, Linus Torvalds, Ulrich Drepper, stable, hch, jengelh, corbet, Andrew Morton On Thu, 6 Sep 2007, Michael Kerrisk wrote: > Hi Davide, > > > > > > > > As I think about this more, I see more problems with > > > > > > > your argument. timerfd needs the ability to get and > > > > > > > get-while-setting just as much as the earlier APIs. > > > > > > > Consider a library that creates a timerfd file descriptor that > > > > > > > is handed off to an application: that library may want > > > > > > > to modify the timer settings without having to create a > > > > > > > new file descriptor (the app mey not be able to be told about > > > > > > > the new fd). Your argument just doesn't hold, AFAICS. > > > > > > > > > > > > Such hypotethical library, in case it really wanted to offer such > > > > > > functionality, could simply return an handle instead of the raw > > > > > > fd, and take care of all that stuff in userspace. > > > > > > > > > > Did I miss something? Is it not the case that as soon as the > > > > > library returns a handle, rather than an fd, then the whole > > > > > advantage of timerfd() (being able to select/poll/epoll on > > > > > the timer as well as other fds) is lost? > > > > > > > > Why? The handle would simply be a little struct where the timerfd fd > > > > is > > > > stored, and a XXX_getfd() would return it. > > > > So my point is, I doubt such functionalities are really needed, and I > > > > also argue that the kernel is the best place for such wrapper code > > > > to go. > > > > > > So what happens if one thread (via the library) wants modify > > > a timer's settings at the same timer as another thread is > > > select()ing on it? The first thread can't do this by creating > > > a new timerfd timer, since it wants to affect the select() > > > in the other thread? > > > > It can be done w/out any problems. The select thread will be notified > > whenever the new timer setting expires. > > We are going in circles here. I think you are missing my point. > Consider the following > > [[ > Thread A: calls library function which creates a timerfd file > descriptor. > > Thread B: calls select() on the timerfd file descriptor. > > Thread A: calls library function which wants to: > a) modify timer settings, and retrieve copy of current timer > settings, and later > b) restore old timer settings. > ]] > > This seems a quite reasonable use-case to me, and the existing > interface simply can't support it. "Quite reasonable"? :) I honestly doubt it, but anyway. Modulo error checking: struct tfd { int fd, clockid; struct itimerspec ts; }; struct tfd *tfd_create(int clockid, int flags, const struct itimerspec *ts) { struct tfd *th; th = malloc(sizeof(*th)); th->clockid = clockid; th->ts = *ts; th->fd = timerfd(-1, clockid, flags, ts); return th; } void tfd_close(struct tfd *th) { close(th->fd); free(th); } int tfd_getfd(const struct tfd *th) { return th->fd; } int tfd_gettime(const struct tfd *th, int *clockid, struct itimerspec *ts) { *clockid = th->clockid; *ts = th->ts; return 0; } int tfd_settime(struct tfd *th, int clockid, int flags, const struct itimerspec *ts) { th->fd = timerfd(th->fd, clockid, flags, ts); th->clockid = clockid; th->ts = *ts; return 0; } Wrap the get/set with a mutex in case you plan to shoot yourself in a foot by doing get/set from multiple threads ;) So, once again: - I sincerly doubt the above is common usage/design patters for timerfds * timerfds are not a common global resource, ala signals, that requires get+set+restore pattern - you can have many of them set to different times - Those IMO *very* special use cases can be handled in userspace with few lines of code, *if* really needed - Davide ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-05 23:45 ` Davide Libenzi @ 2007-09-06 6:58 ` Michael Kerrisk 2007-09-06 23:37 ` Davide Libenzi 0 siblings, 1 reply; 29+ messages in thread From: Michael Kerrisk @ 2007-09-06 6:58 UTC (permalink / raw) To: Davide Libenzi Cc: akpm, corbet, jengelh, hch, stable, drepper, torvalds, linux-kernel, tglx, rdunlap Hi Davide, > > > > > > > > As I think about this more, I see more problems with > > > > > > > > your argument. timerfd needs the ability to get and > > > > > > > > get-while-setting just as much as the earlier APIs. > > > > > > > > Consider a library that creates a timerfd file descriptor > > > > > > > > that > > > > > > > > is handed off to an application: that library may want > > > > > > > > to modify the timer settings without having to create a > > > > > > > > new file descriptor (the app mey not be able to be told > > > > > > > > about > > > > > > > > the new fd). Your argument just doesn't hold, AFAICS. > > > > > > > > > > > > > > Such hypotethical library, in case it really wanted to offer > > > > > > > such > > > > > > > functionality, could simply return an handle instead of the > > > > > > > raw fd, and take care of all that stuff in userspace. > > > > > > > > > > > > Did I miss something? Is it not the case that as soon as the > > > > > > library returns a handle, rather than an fd, then the whole > > > > > > advantage of timerfd() (being able to select/poll/epoll on > > > > > > the timer as well as other fds) is lost? > > > > > > > > > > Why? The handle would simply be a little struct where the > > > > > timerfd fd is > > > > > stored, and a XXX_getfd() would return it. > > > > > So my point is, I doubt such functionalities are really needed, > > > > > and I > > > > > also argue that the kernel is the best place for such wrapper ^ (By the way, I assume that back there, there was a missing "not", right?) > > > > > code to go. > > > > > > > > So what happens if one thread (via the library) wants modify > > > > a timer's settings at the same timer as another thread is > > > > select()ing on it? The first thread can't do this by creating > > > > a new timerfd timer, since it wants to affect the select() > > > > in the other thread? > > > > > > It can be done w/out any problems. The select thread will be notified > > > whenever the new timer setting expires. > > > > We are going in circles here. I think you are missing my point. > > Consider the following > > > > [[ > > Thread A: calls library function which creates a timerfd file > > descriptor. > > > > Thread B: calls select() on the timerfd file descriptor. > > > > Thread A: calls library function which wants to: > > a) modify timer settings, and retrieve copy of current timer > > settings, and later > > b) restore old timer settings. > > ]] > > > > This seems a quite reasonable use-case to me, and the existing > > interface simply can't support it. > > "Quite reasonable"? :) > I honestly doubt it, but anyway. You are asserting this in the face of two previous APIs designed by people who (at least in the case of POSIX timers) probably thoroughly examined and discussed existing APIs and practice. > Modulo error checking: I'm not sure what problem this code is supposed to solve. It doesn't solve the problem I have in mind; see below. > struct tfd { > int fd, clockid; > struct itimerspec ts; > }; > > struct tfd *tfd_create(int clockid, int flags, const struct itimerspec > *ts) { > struct tfd *th; > th = malloc(sizeof(*th)); > th->clockid = clockid; > th->ts = *ts; > th->fd = timerfd(-1, clockid, flags, ts); > return th; > } > > void tfd_close(struct tfd *th) { > close(th->fd); > free(th); > } > > int tfd_getfd(const struct tfd *th) { > return th->fd; > } > > int tfd_gettime(const struct tfd *th, int *clockid, struct itimerspec *ts) > { > *clockid = th->clockid; > *ts = th->ts; > return 0; > } This function is *not at all* equivalent to the "get" functionality of the previous APIs. The "get" functionality of POSIX timers (for example) returns a structure that contains the timer interval and the *time until the next expiration of the timer* (not the initial timer string, as your code above does). So come up with a reliable, race-free way of doing that in userspace. Then make it work for both CLOCK_MONOTONIC and CLOCK_REALTIME timers. (You'll certainly need to be making some additional system calls, by the way: at least some calls to clock_gettime().) > int tfd_settime(struct tfd *th, int clockid, int flags, > const struct itimerspec *ts) { > th->fd = timerfd(th->fd, clockid, flags, ts); > th->clockid = clockid; > th->ts = *ts; > return 0; > } > > Wrap the get/set with a mutex in case you plan to shoot yourself in a > foot by doing get/set from multiple threads ;) > So, once again: > > - I sincerly doubt the above is common usage/design patters for timerfds See my comments above in this message. You may doubt it, but all of the earlier API designers did not. > * timerfds are not a common global resource, ala signals, that requires > get+set+restore pattern - you can have many of them set to different > times No! In the example I described the library is able to create and control exactly *one* timerfd file descriptor. It wants to hand that fd back to the application and then perform arbitrary manipulations of the timer's settings. Meanwhile the application needs to (repeatedly) monitor that one file descriptor in a select()/poll()/epoll() (and so the library can't just arbitrarily create further file descriptors). > - Those IMO *very* special use cases can be handled in userspace with few > lines of code, *if* really needed You have not demonstrated this ;-). Your userspace code does not solve the problem. Best regards, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages , read the HOWTOHELP file and grep the source files for 'FIXME'. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-06 6:58 ` Michael Kerrisk @ 2007-09-06 23:37 ` Davide Libenzi 2007-09-10 3:15 ` Michael Kerrisk 0 siblings, 1 reply; 29+ messages in thread From: Davide Libenzi @ 2007-09-06 23:37 UTC (permalink / raw) To: Michael Kerrisk Cc: Andrew Morton, corbet, jengelh, hch, stable, Ulrich Drepper, Linus Torvalds, Linux Kernel Mailing List, tglx, rdunlap On Thu, 6 Sep 2007, Michael Kerrisk wrote: > You are asserting this in the face of two previous APIs designed > by people who (at least in the case of POSIX timers) probably > thoroughly examined and discussed existing APIs and practice. You really think that. Uhmm, ok. > This function is *not at all* equivalent to the "get" > functionality of the previous APIs. The "get" functionality > of POSIX timers (for example) returns a structure that contains > the timer interval and the *time until the next expiration of > the timer* (not the initial timer string, as your code above > does). > So come up with a reliable, race-free way of doing that in > userspace. Then make it work for both CLOCK_MONOTONIC and > CLOCK_REALTIME timers. (You'll certainly need to be making > some additional system calls, by the way: at least some > calls to clock_gettime().) No, I don't think I will. Since 1) it's so trivial it's not even challenging 2) you know it can be done (I assume) 3) it solves a non-case made up to justify an API/kernel-code bloating. But you don't need *my* signoff. Cook a working patch, make a case for it, and gather support and signoffs. I won't be acking it because, as I said many times, I think it's useless bloating. - Davide ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-06 23:37 ` Davide Libenzi @ 2007-09-10 3:15 ` Michael Kerrisk 0 siblings, 0 replies; 29+ messages in thread From: Michael Kerrisk @ 2007-09-10 3:15 UTC (permalink / raw) To: Davide Libenzi Cc: Andrew Morton, corbet, jengelh, hch, stable, Ulrich Drepper, Linus Torvalds, Linux Kernel Mailing List, tglx, rdunlap Hi Davide, >> This function is *not at all* equivalent to the "get" >> functionality of the previous APIs. The "get" functionality >> of POSIX timers (for example) returns a structure that contains >> the timer interval and the *time until the next expiration of >> the timer* (not the initial timer string, as your code above >> does). >> So come up with a reliable, race-free way of doing that in >> userspace. Then make it work for both CLOCK_MONOTONIC and >> CLOCK_REALTIME timers. (You'll certainly need to be making >> some additional system calls, by the way: at least some >> calls to clock_gettime().) > > No, I don't think I will. Since 1) it's so trivial it's not even > challenging Well, it seems to me that the proposed library wrapper code could start to get rather verbose by the time one adds in these changes (plus the mutexes). > 2) you know it can be done (I assume) Sorry -- I was probably speaking too rhetorically. In fact, it's not completely clear to me that it can (always) be done. For example, the answer depends on knowing the clockid. But what if the file descriptor has been handed off to some code that doesn't know the clock type (i.e., isn't accessing the fd via your suggested library)? Cheers, Michael > 3) it solves a non-case > made up to justify an API/kernel-code bloating. > But you don't need *my* signoff. Cook a working patch, make a case for it, > and gather support and signoffs. I won't be acking it because, as I said > many times, I think it's useless bloating. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-09-04 15:25 ` Davide Libenzi 2007-09-04 15:39 ` Michael Kerrisk 2007-09-04 20:49 ` Michael Kerrisk @ 2007-09-05 12:12 ` Denys Vlasenko 2007-09-05 15:32 ` timerfd redux Michael Kerrisk 2 siblings, 1 reply; 29+ messages in thread From: Denys Vlasenko @ 2007-09-05 12:12 UTC (permalink / raw) To: Davide Libenzi Cc: Andrew Morton, Michael Kerrisk, rdunlap, tglx, Linux Kernel Mailing List, Linus Torvalds, Ulrich Drepper, stable, hch, jengelh, corbet On Tuesday 04 September 2007 16:25, Davide Libenzi wrote: > On Tue, 4 Sep 2007, Andrew Morton wrote: > > > > On Tue, 04 Sep 2007 10:03:56 +0200 Michael Kerrisk <mtk-manpages@gmx.net> wrote: > > > Davide, > > > > > > >> Davide -- ping! Can you please offer your comments about this change, and > > > >> also thoughts on Jon's and my comments about a more radical API change > > > >> later in this thread. > > > > > > > > IMO the complexity of the resulting API (and resulting patch), and the ABI > > > > change, is not justified by the added value. > > > > > > Neither of the proposed APIs (either my multiplexed version of timerfd() > > > or Jon's/my idea of using three system calls (like POSIX timers), or > > > the notion of timerfd() integrated with POSIX timers) is more > > > complicated than the existing POSIX timers API. > > > > > > The ABI change doesn't really matter, since timerfd() was broken in 2.6.22 > > > anyway. > > > > > > Both previous APIs provided the features I have described provide: > > > > > > * the ability to fetch the old timer value when applying > > > a new setting > > > > > > * the ability to non-destructively fetch the amount of time remaining > > > on a timer. > > > > > > This is clearly useful for timers -- but you have not explained why > > > you think this is not necessary for timerfd timers. > > > > <wakes up> > > > > I'd have thought that the existing stuff would be near-useless without the > > capabilities which you describe? > > Useless like it'd be a motorcycle w/out a cup-holder :) > Seriously, the ability to get the previous values from "something" could > have a meaning if this something is a shared global resource (like signals > for example). In the timerfd case this makes little sense, since you can > create as many timerfd as you like and you do not need to share a single > one by changing/restoring the original context. I think at least ability to read remaining time from a timerfd is needed. -- vda ^ permalink raw reply [flat|nested] 29+ messages in thread
* timerfd redux 2007-09-05 12:12 ` Denys Vlasenko @ 2007-09-05 15:32 ` Michael Kerrisk 2007-09-13 2:39 ` Andrew Morton 0 siblings, 1 reply; 29+ messages in thread From: Michael Kerrisk @ 2007-09-05 15:32 UTC (permalink / raw) To: akpm Cc: corbet, jengelh, hch, stable, drepper, torvalds, linux-kernel, tglx, rdunlap, vda.linux, davidel [Was: Re: [PATCH] Revised timerfd() interface] > Michael, could you please refresh our memories with a brief, > from-scratch summary of what the current interface is, followed > by a summary of what you believe to be the shortcomings to be? Andrew, I'll break this up into parts: 1. the existing timerfd interface 2. timerfd limitations 3. possible solutions a) Add an argument b) Create an interface similar to POSIX timers c) Integrate timerfd with POSIX timers Cheers, Michael 1: the existing timerfd interface ================================= In 2.6.22, Davide added timerfd() with the following interface: returned_fd = timerfd(int fd, int clockid, int flags, struct itimerspec *utimer); If fd is -1, a new timer is created and started. The syscall returns a file descriptor for the timer. 'utimer' specifies the initial expiration and interval of the timer. 'clockid' is CLOCK_REALTIME or CLOCK_REALTIME. The 'utimer' value is relative, unless TFD_TIMER_ABSTIME is specified in 'flags', in which case the initial expiration is specified absolutely. If 'fd' is not -1, then the call modifies the existing timer referred to by the file descriptor 'fd'. The 'clockid', 'flags', and 'utimer' can all be modified. The return value is 'fd'. The key feature of timerfd() is that the caller can use select/poll/epoll to wait on traditional file descriptors and one or more timers. read() from a timerfd file descriptor (should) return a 4-byte integer that is the number of timer expirations since the last read. (If no expiration has so far occurred, read() will block.) IMPORTANT POINT: as implemented in 2.6.22, timerfd was broken: only a single byte of info was returned by read(). I regard this as a virtue: it gives us something closer to a blank slate for fixing the problems described below; furthermore, arguably at this point we could buy ourselves time by pulling timerfd() from 2.6.23, and taking more time to get things right in 2.6.24. (More details on timerfd() can be found here: http://lwn.net/Articles/245533/) 2. timerfd limitations ====================== Unix has two older timer interfaces: * setitimer/getitimer and * POSIX timers (timer_create/timer_settime/timer_gettime). timerfd() lacks two features that are present in the older interfaces: * Retrieve the previous setting of an existing timer when setting a new value for the timer. * Non-destructively fetch the timer remaining until the next expiration of the timer. The fact that this functionality is present in both older APIs strongly suggests that various applications really need both functionalities. (Davide has argued that timerfd() doesn't need the get-while-setting functionality because we can create multiple timerfd timers. However, POSIX timers also allow multiple timer instances, but nevertheless provide get-while-setting. I would estimate that this functionality would be useful for libraries that want to create and control a (single) timerfd file descriptor that is returned to the caller.) 3. possible solutions ===================== ====> a) Add an argument I proposed adding a further argument to timerfd(): old_utmr, which could be used to return the time remaining until expiry for an existing timer (http://marc.info/?l=linux-kernel&m=118669430305788&w=2 ). I proposed semantics that would allow get and get-while-setting functionality. Jon Corbet pointed out that my suggestion was starting to look like a multiplexing syscall. I agree. I now favor one of the remaining solutions. ====> b) Create an interface similar to POSIX timers Create an interface analogous to POSIX timers: fd = timerfd_create(clockid, flags); timerfd_settime(fd, flags, newtimervalue, &time_to_next_expire); timerfd_gettime(fd, &time_to_next_expire); Advantage: this would be a clean, fully functional API, and well understood by virtue of its analogy with the POSIX timers API. Disadvantage: three new system calls, rather than 1. This solution would be sufficient, IMO, but the next solution might be better. ====> c) Integrate timerfd with POSIX timers Make a very simple timerfd call that is integrated with the POSIX timers API. A POSIX timer is created using: int timer_create(clockid_t clockid, struct sigevent *evp, timer_t *timerid); We could then have a timerfd() call that returns a file descriptor for the newly created 'timerid': fd = timerfd(timer_t timerid); We could then use the POSIX timers API to operate on the timer (start it / modify it / fetch timer value): int timer_gettime(timer_t timerid, struct itimerspec *value); int timer_settime(timer_t timerid, int flags, const struct itimerspec *value, struct itimerspec *ovalue); And then read from 'fd' as before. Advantages: 1. Integration with an existing API. 2. Adds just a single system call 3. This strikes me as the most beautiful solution, if we can do it properly. Disadvantage: I'm not yet completely clear whether there are some features of the POSIX timers API that might preclude a clean integration. In particular, we would need to think a little about the semantics of timer_getoverrun(): int timer_getoverrun(timer_t timerid); I suspect it's fine, but we better think about it a little. We would also have to think about how the 'evp' argument of timer_create() would be used. This might be trickier. (Simplest might be to require evp.sigev_notify to be SIGEV_NONE, or perhaps a new flag, SIGEV_TIMERFD.) === END === ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: timerfd redux 2007-09-05 15:32 ` timerfd redux Michael Kerrisk @ 2007-09-13 2:39 ` Andrew Morton 2007-09-13 6:14 ` Michael Kerrisk 2007-09-13 8:13 ` Michael Kerrisk 0 siblings, 2 replies; 29+ messages in thread From: Andrew Morton @ 2007-09-13 2:39 UTC (permalink / raw) To: Michael Kerrisk Cc: corbet, jengelh, hch, stable, drepper, torvalds, linux-kernel, tglx, rdunlap, vda.linux, davidel On Wed, 05 Sep 2007 17:32:01 +0200 "Michael Kerrisk" <mtk-manpages@gmx.net> wrote: > [Was: Re: [PATCH] Revised timerfd() interface] > > > Michael, could you please refresh our memories with a brief, > > from-scratch summary of what the current interface is, followed > > by a summary of what you believe to be the shortcomings to be? > > Andrew, > > I'll break this up into parts: > > 1. the existing timerfd interface > 2. timerfd limitations > 3. possible solutions > a) Add an argument > b) Create an interface similar to POSIX timers > c) Integrate timerfd with POSIX timers > > Cheers, > > Michael > > > 1: the existing timerfd interface > ================================= > > In 2.6.22, Davide added timerfd() with the following interface: > > returned_fd = timerfd(int fd, int clockid, int flags, > struct itimerspec *utimer); > > If fd is -1, a new timer is created and started. The syscall > returns a file descriptor for the timer. 'utimer' specifies > the initial expiration and interval of the timer. > 'clockid' is CLOCK_REALTIME or CLOCK_REALTIME. The 'utimer' > value is relative, unless TFD_TIMER_ABSTIME is specified in > 'flags', in which case the initial expiration is specified > absolutely. > > If 'fd' is not -1, then the call modifies the existing timer > referred to by the file descriptor 'fd'. The 'clockid', 'flags', > and 'utimer' can all be modified. The return value is 'fd'. > > The key feature of timerfd() is that the caller can use > select/poll/epoll to wait on traditional file descriptors and > one or more timers. > > read() from a timerfd file descriptor (should) return a 4-byte > integer that is the number of timer expirations since the last > read. (If no expiration has so far occurred, read() will block.) > > IMPORTANT POINT: as implemented in 2.6.22, timerfd was broken: > only a single byte of info was returned by read(). I regard > this as a virtue: it gives us something closer to a blank slate > for fixing the problems described below; furthermore, > arguably at this point we could buy ourselves time by > pulling timerfd() from 2.6.23, and taking more time to get > things right in 2.6.24. > > (More details on timerfd() can be found here: > http://lwn.net/Articles/245533/) OK. > 2. timerfd limitations > ====================== > > Unix has two older timer interfaces: > > * setitimer/getitimer and > > * POSIX timers (timer_create/timer_settime/timer_gettime). > > timerfd() lacks two features that are present in the older > interfaces: > > * Retrieve the previous setting of an existing timer when > setting a new value for the timer. > > * Non-destructively fetch the timer remaining until the > next expiration of the timer. > > The fact that this functionality is present in both older APIs > strongly suggests that various applications really need both > functionalities. Yes, I can imagine applications wanting to do those things. > (Davide has argued that timerfd() doesn't need the > get-while-setting functionality because we can create multiple > timerfd timers. However, POSIX timers also allow multiple > timer instances, but nevertheless provide get-while-setting. > I would estimate that this functionality would be useful for > libraries that want to create and control a (single) timerfd > file descriptor that is returned to the caller.) Sure. If you're implementing a timeout and you want to reset it, you might indeed want to know how close the old one was to expiring. Davide's proposal sounds like an awkward workaround for missing functionality. Does Davide have a proposal for the non-destructive fetch? > 3. possible solutions I don't think we'll have this settled and coded in time for 2.6.23. So I think the prudent thing to do is to push this back to 2.6.24 and not offer sys_timerfd() in 2.6.23. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: timerfd redux 2007-09-13 2:39 ` Andrew Morton @ 2007-09-13 6:14 ` Michael Kerrisk 2007-09-13 8:13 ` Michael Kerrisk 1 sibling, 0 replies; 29+ messages in thread From: Michael Kerrisk @ 2007-09-13 6:14 UTC (permalink / raw) To: Andrew Morton Cc: davidel, vda.linux, rdunlap, tglx, linux-kernel, torvalds, drepper, stable, hch, jengelh, corbet > > [Was: Re: [PATCH] Revised timerfd() interface] > > > > > Michael, could you please refresh our memories with a brief, > > > from-scratch summary of what the current interface is, followed > > > by a summary of what you believe to be the shortcomings to be? > > > > Andrew, > > > > I'll break this up into parts: > > > > 1. the existing timerfd interface > > 2. timerfd limitations > > 3. possible solutions > > a) Add an argument > > b) Create an interface similar to POSIX timers > > c) Integrate timerfd with POSIX timers > > > > Cheers, > > > > Michael > > > > > > 1: the existing timerfd interface > > ================================= > > > > In 2.6.22, Davide added timerfd() with the following interface: > > > > returned_fd = timerfd(int fd, int clockid, int flags, > > struct itimerspec *utimer); > > > > If fd is -1, a new timer is created and started. The syscall > > returns a file descriptor for the timer. 'utimer' specifies > > the initial expiration and interval of the timer. > > 'clockid' is CLOCK_REALTIME or CLOCK_REALTIME. The 'utimer' > > value is relative, unless TFD_TIMER_ABSTIME is specified in > > 'flags', in which case the initial expiration is specified > > absolutely. > > > > If 'fd' is not -1, then the call modifies the existing timer > > referred to by the file descriptor 'fd'. The 'clockid', 'flags', > > and 'utimer' can all be modified. The return value is 'fd'. > > > > The key feature of timerfd() is that the caller can use > > select/poll/epoll to wait on traditional file descriptors and > > one or more timers. > > > > read() from a timerfd file descriptor (should) return a 4-byte > > integer that is the number of timer expirations since the last > > read. (If no expiration has so far occurred, read() will block.) > > > > IMPORTANT POINT: as implemented in 2.6.22, timerfd was broken: > > only a single byte of info was returned by read(). I regard > > this as a virtue: it gives us something closer to a blank slate > > for fixing the problems described below; furthermore, > > arguably at this point we could buy ourselves time by > > pulling timerfd() from 2.6.23, and taking more time to get > > things right in 2.6.24. > > > > (More details on timerfd() can be found here: > > http://lwn.net/Articles/245533/) > > OK. > > > 2. timerfd limitations > > ====================== > > > > Unix has two older timer interfaces: > > > > * setitimer/getitimer and > > > > * POSIX timers (timer_create/timer_settime/timer_gettime). > > > > timerfd() lacks two features that are present in the older > > interfaces: > > > > * Retrieve the previous setting of an existing timer when > > setting a new value for the timer. > > > > * Non-destructively fetch the timer remaining until the > > next expiration of the timer. > > > > The fact that this functionality is present in both older APIs > > strongly suggests that various applications really need both > > functionalities. > > Yes, I can imagine applications wanting to do those things. > > > (Davide has argued that timerfd() doesn't need the > > get-while-setting functionality because we can create multiple > > timerfd timers. However, POSIX timers also allow multiple > > timer instances, but nevertheless provide get-while-setting. > > I would estimate that this functionality would be useful for > > libraries that want to create and control a (single) timerfd > > file descriptor that is returned to the caller.) > > Sure. If you're implementing a timeout and you want to reset it, you > might indeed want to know how close the old one was to expiring. > > Davide's proposal sounds like an awkward workaround for missing > functionality. In the other thread I commented that the userspace solution starts to look pretty complex, and I doubt that it can be made to work in all cases. > Does Davide have a proposal for the non-destructive fetch? I don't think so, since he disagrees about it's necessity. > > 3. possible solutions > > I don't think we'll have this settled and coded in time for 2.6.23. So I > think the prudent thing to do is to push this back to 2.6.24 and not > offer sys_timerfd() in 2.6.23. I think this would be wise. I'd like to talk with Davide some more about the possibilities. Cheers, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages , read the HOWTOHELP file and grep the source files for 'FIXME'. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: timerfd redux 2007-09-13 2:39 ` Andrew Morton 2007-09-13 6:14 ` Michael Kerrisk @ 2007-09-13 8:13 ` Michael Kerrisk 2007-09-13 8:20 ` Andrew Morton 1 sibling, 1 reply; 29+ messages in thread From: Michael Kerrisk @ 2007-09-13 8:13 UTC (permalink / raw) To: Andrew Morton Cc: davidel, vda.linux, rdunlap, tglx, linux-kernel, torvalds, drepper, stable, hch, jengelh, corbet Andrew, > > 3. possible solutions > > I don't think we'll have this settled and coded in time for 2.6.23. So I > think the prudent thing to do is to push this back to 2.6.24 and not offer > sys_timerfd() in 2.6.23. Did you want a patch to remove the syscall number for now, or will you do that? Cheers, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages , read the HOWTOHELP file and grep the source files for 'FIXME'. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: timerfd redux 2007-09-13 8:13 ` Michael Kerrisk @ 2007-09-13 8:20 ` Andrew Morton 0 siblings, 0 replies; 29+ messages in thread From: Andrew Morton @ 2007-09-13 8:20 UTC (permalink / raw) To: Michael Kerrisk Cc: davidel, vda.linux, rdunlap, tglx, linux-kernel, torvalds, drepper, stable, hch, jengelh, corbet On Thu, 13 Sep 2007 10:13:59 +0200 "Michael Kerrisk" <mtk-manpages@gmx.net> wrote: > Andrew, > > > > 3. possible solutions > > > > I don't think we'll have this settled and coded in time for 2.6.23. So I > > think the prudent thing to do is to push this back to 2.6.24 and not offer > > sys_timerfd() in 2.6.23. > > Did you want a patch to remove the syscall number for now, > or will you do that? > Please send one over sometime. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Problems with timerfd()
@ 2007-07-23 6:32 Michael Kerrisk
2007-07-23 6:38 ` Andrew Morton
0 siblings, 1 reply; 29+ messages in thread
From: Michael Kerrisk @ 2007-07-23 6:32 UTC (permalink / raw)
To: Andrew Morton; +Cc: lkml, Linux Torvalds, Davide Libenzi, drepper
Andrew,
The timerfd() syscall went into 2.6.22. While writing the man page for
this syscall I've found some notable limitations of the interface, and I am
wondering whether you and Linus would consider having this interface fixed
for 2.6.23.
On the one hand, these fixes would be an ABI change, which is of course
bad. (However, as noted below, you have already accepted one of the ABI
changes that I suggested into -mm, after Davide submitted a patch.)
On the other hand, the interface has not yet made its way into a glibc
release, and the change will not break applications. (The 2.6.22 version
of the interface would just be "broken".)
Details of my suggested changes are below. A complication in all of this
is that on Friday, while I was part way though discussing this with Davide,
he went on vacation for a month and is likely to have only limited email
access during that time. (See my further thoughts about what to do while
Davide is away at the end of this mail message.) Our last communication,
after Davide had expressed reluctance about making some of the interface
changes, was a more extensive note from me describing the problems of the
interface.
The problems of the 2.6.22 timerfd() interface are as follows:
Problem 1
---------
The value returned by read(2)ing from a timerfd file descriptor is the
number of timer overruns. In 2.6.22, this value is 4 bytes, limiting the
overrun count to 2^32. Consider an application where the timer frequency
was 100 kHz (feasible in the not-too-distant future, I would guess), then
the overrun counter would cycle after ~40000 seconds (~11 hours).
Furthermore returning 4 bytes from the read() is inconsistent with eventfd
file descriptors, which return 8 byte integers from a read().
Davide has already submitted a patch to you to make read() from a timerfd
file descriptor return an 8 byte integer, and I understand it to have been
accepted into -mm.
Problem 2
---------
Existing timer APIs (Unix interval timers -- setitimer(2); POSIX timers --
timer_settime()) allow the caller to retrieve the previous setting of a
timer at the same time as a new timer setting is established. This permits
functionality such as the following for userland programs:
1. set a timer to go of at time X
2. modify the timer to go off at earlier time Z; return previous
timer settings (X)
3. When the timer Z expires, restore timer to expire at time X
timerfd() does not provide this functionality.
Problem 3
---------
Existing timer APIs (Unix interval timers -- getitimer(2); POSIX timers --
timer_gettime()) allow the caller to retrieve the time remaining until the
next expiration of the timer.
timerfd() does not provide this functionality.
Solution (proposed interface changes)
-------------------------------------
In response to my "Problem 2", Davide noted in the last message I got from
him before he went on vacation:
> But the old status of the timer is the union of clickid, flags and utmr.
> So, in theory, the whole set should be returned back, forcing a pretty
> drastic API change.
However, I think there is a reasonable solution to this problem, which I
outlined to Davide, but did not yet hear back from him about.
a) Make the 'clockid' immutable: say that it can only be set
if 'ufd' is -1 -- that is, on the timerfd() call that
first creates the timer. This would eliminate the need to
return the previous clockid value. (This is effectively
the limitation that is imposed by POSIX timers: the
clockid is specified when the timer is created with
timer_create(), and can't be changed.)
[In the 2.6.22 interface, the clockid of an existing
timer can be changed with a further call to timerfd()
that specifies the file descriptor of an existing timer.]
b) There is no need to return the previous 'flags' setting.
The POSIX timer functions (i.e., timer_settime()) do not
do this. Instead, timer_settime() always returns the
time until the next expiration would have occurred,
even if the TIMER_ABSTIME flag was specified when
the timer was set.
[The only 'flags' value currently implemented in
timerfd() is TFD_TIMER_ABSTIME, which is the
equivalent of TIMER_ABSTIME.]
With these design assumptions, the only thing that would need
to be added to timerfd() would be an argument used to return the time
until the previous timer would have expired + its interval.
The use cases would be as follows:
ufd = timerfd(-1, clockid, flags, utmr, NULL);
to create a new timer with given clockid, flags, and utmr (intial
expiration + interval).
ufd = timerfd(ufd, 0, flags, utmr, NULL);
to change the flags and timer settings of an existing timer.
ufd = timerfd(ufd, 0, flags, utmr, &old);
to change the flags and timer settings of an existing timer, and retrieve
the time until the next expiration of the timer (and the associated interval).
ufd = timerfd(ufd, 0, 0, NULL, &old);
Return the time until the next expiration of the timer (and the associated
interval), without changing the existing timer settings
Practical details
-----------------
Since Davide is away, my proposal is this: if you are prepared to consider
entertaining this ABI change, then I would try to write the patch. If when
we next hear from Davide (he may have intermittent email access over the
next month), he agrees that it is worth making the change, then I would
submit the change to -mm with the hope that time frames would allow for it
to make it into 2.6.23. (I would guess that delaying things for a month
means the fix might not make it into 2.6.23, and would thus simply make the
fix more painful and less feasible.)
What do you think?
Cheers,
Michael
PS For reference, the timerfd.2 man page describing the 2.6.22 interface is
below.
.TH TIMERFD 2 2007-07-17 Linux "Linux Programmer's Manual"
.SH NAME
timerfd \- create a timer that delivers notifications on a file descriptor
.SH SYNOPSIS
.\" FIXME . This header file may well change
.\" FIXME . Probably _GNU_SOURCE will be required
.\" FIXME . May require: Link with \fI\-lrt\f
.nf
.B #include <sys/timerfd.h>
.sp
.BR "int timerfd(int " ufd ", int " clockid ", int " flags ,
.BR " const struct itimerspec *" utmr );
.fi
.SH DESCRIPTION
.BR timerfd ()
creates and starts a new timer (or modifies the settings of an
existing timer) that delivers timer expiration
information via a file descriptor.
This provides an alternative to the use of
.BR setitimer (2)
or
.BR timer_create (3),
and has the advantage that the file descriptor may be monitored by
.BR poll (2)
and
.BR select (2).
.\" FIXME Davide, a question: timer_settime() and setitimer()
.\" both permit the caller to obtain the old value of the
.\" timer when modifying an existing timer. Why doesn't
.\" timerfd() provide this functionality?
The
.I ufd
argument is either \-1 to create a new timer,
or a file descriptor referring to an existing timerfd timer.
The remaining arguments specify the settings for the new timer,
or the modified settings for an existing timer.
The
.I clockid
argument specifies the clock that is used to mark the progress
of the timer, and must be either
.BR CLOCK_REALTIME
or
.BR CLOCK_MONOTONIC .
.B CLOCK_REALTIME
is a settable system-wide clock.
.B CLOCK_MONOTONIC
is a non-settable clock that is not affected
by discontinuous changes in the system clock
(e.g., manual changes to system time).
See also
.BR clock_getres (3).
The
.I flags
argument is either 0, to create a relative timer
.RI ( utmr.it_interval
specifies a relative time for the clock specified by
.IR clockid ),
or
.BR TFD_TIMER_ABSTIME ,
to create an absolute timer
.RI ( utmr.it_interval
specifies an absolute time for the clock specified by
.IR clockid ).
The
.I utmr
argument specifies the initial expiration and interval for the timer.
The
.I itimer
structure used for this argument contains two fields,
each of which is in turn a structure of type
.IR timespec :
.in +0.5i
.nf
struct timespec {
time_t tv_sec; /* Seconds */
long tv_nsec; /* Nanoseconds */
};
struct itimerspec {
struct timespec it_interval; /* Interval for periodic
timer */
struct timespec it_value; /* Initial expiration */
};
.fi
.in
.PP
.IR utmr.it_value
specifies the initial expiration of the timer,
in seconds and nanoseconds.
Setting both fields of
.IR utmr.it_value
to zero will disable an existing timer
.RI ( ufd
!= \-1),
or create a new timer that is not armed
.RI ( ufd
== \-1).
Setting one or both fields of
.I utmr.it_interval
to non-zero values specifies the period, in seconds and nanoseconds,
for repeated timer expirations after the initial expiration.
If both fields of
.I utmr.it_interval
are zero, the the timer expires just once, at the time specified by
.IR utmr.it_value .
.PP
.BR timerfd (2)
returns a file descriptor that supports the following operations:
.TP
.BR read (2)
.\" FIXME Davide, What I have written below is what
.\" I've determined from looking at the source code
.\" and from experimenting. But is it correct?
If the timer has already expired one or more times since it was created,
or since the last
.BR read (2),
then the buffer given to
.BR read (2)
returns an unsigned 4-byte integer
.RI ( uint32_t )
containing the number of expirations that have occurred.
.\" FIXME Davide, what if there are more expirations than can fit
.\" in a uint32_t? (Why wasn't this value uint64_t, as with
.\" eventfd()?)
.IP
If no timer expirations have occurred at the time of the
.BR read (2),
then the call either blocks until the next timer expiration,
or fails with the error
.B EAGAIN
if the file descriptor has been made non-blocking
(via the use of the
.BR fcntl (2)
.B F_SETFL
operation to set the
.B O_NONBLOCK
flag).
.IP
A
.BR read (2)
will fail with the error
.B EINVAL
if the size of the supplied buffer is less than 4 bytes.
.TP
.BR poll "(2), " select "(2) (and similar)"
The file descriptor is readable
(the
.BR select (2)
.I readfds
argument; the
.BR poll (2)
.B POLLIN
flag)
if one or more timer expirations have occurred.
.IP
The timerfd file descriptor also supports the other file-descriptor
multiplexing APIs:
.BR pselect (2),
.BR ppoll (2),
and
.BR epoll (7).
.SS fork(2) semantics
.\" FIXME Davide, is the following correct?
After a
.BR fork (2),
the child inherits a copy of the timerfd file descriptor.
The file descriptor refers to the same underlying
file object as the corresponding descriptor in the parent,
and
.BR read (2)s
in the child will return information about
expirations of the timer.
.SS execve(2) semantics
.\" FIXME Davide, is the following correct?
A timerfd file descriptor is preserved across
.BR execve (2),
and continues to generate file expirations.
.SH "RETURN VALUE"
On success,
.BR timerfd ()
returns a timerfd file descriptor;
this is either a new file descriptor (if
.I ufd
was \-1), or
.I ufd
if
.I ufd
was a valid timerfd file descriptor.
On error, \-1 is returned and
.I errno
is set to indicate the error.
.SH ERRORS
.TP
.B EBADF
The
.I ufd
file descriptor is not a valid file descriptor.
.TP
.B EINVAL
The
.I ufd
file descriptor is not a valid timerfd file descriptor.
The
.I clockid
argument is neither
.B CLOCK_MONOTONIC
nor
.BR CLOCK_REALTIME .
The
.I utmr
is not properly initialized (one of the
.I tv_nsec
falls outside the range zero to 999,999,999).
.TP
.B EMFILE
The per-process limit of open file descriptors has been reached.
.TP
.B ENFILE
The system limit on the total number of open files has been
reached.
.TP
.B ENODEV
Could not mount (internal) anonymous i-node device.
.TP
.B ENOMEM
There was insufficient memory to handle the requested
.I op
control operation.
.SH VERSIONS
.BR timerfd (2)
is available on Linux since kernel 2.6.22.
.\" FIXME . check later to see when glibc support is provided
As at July 2007 (glibc 2.6), the details of the glibc interface
have not been finalized, so that, for example,
the eventual header file may be different from that shown above.
.SH CONFORMING TO
.BR timerfd (2)
is Linux specific.
.SH EXAMPLE
.nf
.\" FIXME . Check later what header file glibc uses for timerfd
.\" FIXME . Probably glibc will require _GNU_SOURCE to be set
.\"
.\" The commented out code here is what we currently need until
.\" the required stuff is in glibc
.\"
.\" #define _GNU_SOURCE
.\" #include <sys/syscall.h>
.\" #include <unistd.h>
.\" #include <time.h>
.\" #if defined(__i386__)
.\" #define __NR_timerfd 322
.\" #endif
.\"
.\" static int
.\" timerfd(int ufd, int clockid, int flags, struct itimerspec *utmr) {
.\" return syscall(__NR_timerfd, ufd, clockid, flags, utmr);
.\" }
.\"
.\" #define TFD_TIMER_ABSTIME (1 << 0)
.\"
/* Link with -lrt */
#include <sys/timerfd.h> /* May yet change for glibc */
#include <time.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h> /* Definition of uint32_t */
#define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)
static void
print_elapsed_time(void)
{
static struct timespec start;
struct timespec curr;
static int first_call = 1;
int secs, nsecs;
if (first_call) {
first_call = 0;
if (clock_gettime(CLOCK_MONOTONIC, &start) == \-1)
die("clock_gettime");
}
if (clock_gettime(CLOCK_MONOTONIC, &curr) == \-1)
die("clock_gettime");
secs = curr.tv_sec \- start.tv_sec;
nsecs = curr.tv_nsec \- start.tv_nsec;
if (nsecs < 0) {
secs\-\-;
nsecs += 1000000000;
}
printf("%d.%03d: ", secs, (nsecs + 500000) / 1000000);
}
int
main(int argc, char *argv[])
{
struct itimerspec utmr;
int max_expirations, tot_exp, tfd;
struct timespec now;
uint32_t exp;
ssize_t s;
if ((argc != 2) && (argc != 4)) {
fprintf(stderr, "%s init\-secs [interval\-secs max\-exp]\\n",
argv[0]);
exit(EXIT_FAILURE);
}
if (clock_gettime(CLOCK_REALTIME, &now) == \-1)
die("clock_gettime");
/* Create a CLOCK_REALTIME absolute timer with initial
expiration and interval as specified in command line */
utmr.it_value.tv_sec = now.tv_sec + atoi(argv[1]);
utmr.it_value.tv_nsec = now.tv_nsec;
if (argc == 2) {
utmr.it_interval.tv_sec = 0;
max_expirations = 1;
} else {
utmr.it_interval.tv_sec = atoi(argv[2]);
max_expirations = atoi(argv[3]);
}
utmr.it_interval.tv_nsec = 0;
tfd = timerfd(\-1, CLOCK_REALTIME, TFD_TIMER_ABSTIME, &utmr);
if (tfd == \-1)
die("timerfd");
print_elapsed_time();
printf("timer started\\n");
.\" exp = 0; // ????? Without this initialization, the results from
.\" // read() are strange; it appears that read() is only
.\" // returning one byte of tick information, not four.
for (tot_exp = 0; tot_exp < max_expirations;) {
s = read(tfd, &exp, sizeof(uint32_t));
if (s != sizeof(uint32_t))
die("read");
tot_exp += exp;
print_elapsed_time();
printf("read: %u; total=%d\\n", exp, tot_exp);
}
exit(EXIT_SUCCESS);
}
.fi
.SH "SEE ALSO"
.BR eventfd (2),
.BR poll (2),
.BR read (2),
.BR select (2),
.BR signalfd (2),
.BR epoll (7),
.BR time (7)
.\" FIXME See: setitimer(2), timer_create(3), clock_settime(3)
.\" FIXME other timer syscalls, and have them refer to this page
.\" FIXME have SEE ALSO in time.7 refer to this page.
^ permalink raw reply [flat|nested] 29+ messages in thread* Re: Problems with timerfd() 2007-07-23 6:32 Problems with timerfd() Michael Kerrisk @ 2007-07-23 6:38 ` Andrew Morton 2007-07-25 18:18 ` Michael Kerrisk 0 siblings, 1 reply; 29+ messages in thread From: Andrew Morton @ 2007-07-23 6:38 UTC (permalink / raw) To: Michael Kerrisk; +Cc: lkml, Linux Torvalds, Davide Libenzi, drepper, stable On Mon, 23 Jul 2007 08:32:29 +0200 Michael Kerrisk <mtk-manpages@gmx.net> wrote: > Andrew, > > The timerfd() syscall went into 2.6.22. While writing the man page for > this syscall I've found some notable limitations of the interface, and I am > wondering whether you and Linus would consider having this interface fixed > for 2.6.23. > > On the one hand, these fixes would be an ABI change, which is of course > bad. (However, as noted below, you have already accepted one of the ABI > changes that I suggested into -mm, after Davide submitted a patch.) > > On the other hand, the interface has not yet made its way into a glibc > release, and the change will not break applications. (The 2.6.22 version > of the interface would just be "broken".) I think if the need is sufficient we can do this: fix it in 2.6.23 and in 2.6.22.x. That means that there will be a few broken-on-new-glibc kernels out in the wild, but very few I suspect. > Details of my suggested changes are below. A complication in all of this > is that on Friday, while I was part way though discussing this with Davide, > he went on vacation for a month and is likely to have only limited email > access during that time. (See my further thoughts about what to do while > Davide is away at the end of this mail message.) Our last communication, > after Davide had expressed reluctance about making some of the interface > changes, was a more extensive note from me describing the problems of the > interface. > > The problems of the 2.6.22 timerfd() interface are as follows: > > Problem 1 > --------- > > The value returned by read(2)ing from a timerfd file descriptor is the > number of timer overruns. In 2.6.22, this value is 4 bytes, limiting the > overrun count to 2^32. Consider an application where the timer frequency > was 100 kHz (feasible in the not-too-distant future, I would guess), then > the overrun counter would cycle after ~40000 seconds (~11 hours). > Furthermore returning 4 bytes from the read() is inconsistent with eventfd > file descriptors, which return 8 byte integers from a read(). > > Davide has already submitted a patch to you to make read() from a timerfd > file descriptor return an 8 byte integer, and I understand it to have been > accepted into -mm. argh. Nobody told me it was an ABI change! We'll need to consider merging make-timerfd-return-a-u64-and-fix-the-__put_user.patch into 2.6.22.x as well. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Problems with timerfd() 2007-07-23 6:38 ` Andrew Morton @ 2007-07-25 18:18 ` Michael Kerrisk 2007-07-25 22:12 ` Andrew Morton 0 siblings, 1 reply; 29+ messages in thread From: Michael Kerrisk @ 2007-07-25 18:18 UTC (permalink / raw) To: Andrew Morton; +Cc: lkml, Linux Torvalds, Davide Libenzi, drepper, stable Andrew, Andrew Morton wrote: > On Mon, 23 Jul 2007 08:32:29 +0200 Michael Kerrisk <mtk-manpages@gmx.net> wrote: > >> Andrew, >> >> The timerfd() syscall went into 2.6.22. While writing the man page for >> this syscall I've found some notable limitations of the interface, and I am >> wondering whether you and Linus would consider having this interface fixed >> for 2.6.23. >> >> On the one hand, these fixes would be an ABI change, which is of course >> bad. (However, as noted below, you have already accepted one of the ABI >> changes that I suggested into -mm, after Davide submitted a patch.) >> >> On the other hand, the interface has not yet made its way into a glibc >> release, and the change will not break applications. (The 2.6.22 version >> of the interface would just be "broken".) > > I think if the need is sufficient we can do this: fix it in 2.6.23 and in > 2.6.22.x. That means that there will be a few broken-on-new-glibc kernels > out in the wild, but very few I suspect. So I'm still not quite clear. Can I take it from your statement above that the proposed ABI changes would be admissible, as long as Davide is okay with them? Cheers, Michael -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages/ read the HOWTOHELP file and grep the source files for 'FIXME'. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Problems with timerfd() 2007-07-25 18:18 ` Michael Kerrisk @ 2007-07-25 22:12 ` Andrew Morton 2007-08-07 6:55 ` Michael Kerrisk 0 siblings, 1 reply; 29+ messages in thread From: Andrew Morton @ 2007-07-25 22:12 UTC (permalink / raw) To: Michael Kerrisk; +Cc: lkml, Linux Torvalds, Davide Libenzi, drepper, stable On Wed, 25 Jul 2007 20:18:51 +0200 Michael Kerrisk <mtk-manpages@gmx.net> wrote: > Andrew, > > Andrew Morton wrote: > > On Mon, 23 Jul 2007 08:32:29 +0200 Michael Kerrisk <mtk-manpages@gmx.net> wrote: > > > >> Andrew, > >> > >> The timerfd() syscall went into 2.6.22. While writing the man page for > >> this syscall I've found some notable limitations of the interface, and I am > >> wondering whether you and Linus would consider having this interface fixed > >> for 2.6.23. > >> > >> On the one hand, these fixes would be an ABI change, which is of course > >> bad. (However, as noted below, you have already accepted one of the ABI > >> changes that I suggested into -mm, after Davide submitted a patch.) > >> > >> On the other hand, the interface has not yet made its way into a glibc > >> release, and the change will not break applications. (The 2.6.22 version > >> of the interface would just be "broken".) > > > > I think if the need is sufficient we can do this: fix it in 2.6.23 and in > > 2.6.22.x. That means that there will be a few broken-on-new-glibc kernels > > out in the wild, but very few I suspect. > > So I'm still not quite clear. Can I take it from your statement above that > the proposed ABI changes would be admissible, as long as Davide is okay > with them? > yup, I'll send that diff into Linus and -stable and see what happens. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Problems with timerfd() 2007-07-25 22:12 ` Andrew Morton @ 2007-08-07 6:55 ` Michael Kerrisk 2007-08-09 21:11 ` [PATCH] Revised timerfd() interface Michael Kerrisk 0 siblings, 1 reply; 29+ messages in thread From: Michael Kerrisk @ 2007-08-07 6:55 UTC (permalink / raw) To: Andrew Morton Cc: lkml, Linux Torvalds, Davide Libenzi, drepper, stable, Christoph Hellwig, Randy Dunlap Andrew, I'm working on the changes to timerfd(), but must admit I am struggling to understand some of the kernel code for working with userspace timers (e.g., in kernel/posix-timers.c). Can you suggest anyone who could provide assistance? Cheers, Michael Andrew Morton wrote: > On Wed, 25 Jul 2007 20:18:51 +0200 > Michael Kerrisk <mtk-manpages@gmx.net> wrote: > >> Andrew, >> >> Andrew Morton wrote: >>> On Mon, 23 Jul 2007 08:32:29 +0200 Michael Kerrisk <mtk-manpages@gmx.net> wrote: >>> >>>> Andrew, >>>> >>>> The timerfd() syscall went into 2.6.22. While writing the man page for >>>> this syscall I've found some notable limitations of the interface, and I am >>>> wondering whether you and Linus would consider having this interface fixed >>>> for 2.6.23. >>>> >>>> On the one hand, these fixes would be an ABI change, which is of course >>>> bad. (However, as noted below, you have already accepted one of the ABI >>>> changes that I suggested into -mm, after Davide submitted a patch.) >>>> >>>> On the other hand, the interface has not yet made its way into a glibc >>>> release, and the change will not break applications. (The 2.6.22 version >>>> of the interface would just be "broken".) >>> I think if the need is sufficient we can do this: fix it in 2.6.23 and in >>> 2.6.22.x. That means that there will be a few broken-on-new-glibc kernels >>> out in the wild, but very few I suspect. >> So I'm still not quite clear. Can I take it from your statement above that >> the proposed ABI changes would be admissible, as long as Davide is okay >> with them? >> > > yup, I'll send that diff into Linus and -stable and see what happens. > -- Michael Kerrisk maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 Want to help with man page maintenance? Grab the latest tarball at http://www.kernel.org/pub/linux/docs/manpages/ read the HOWTOHELP file and grep the source files for 'FIXME'. ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH] Revised timerfd() interface 2007-08-07 6:55 ` Michael Kerrisk @ 2007-08-09 21:11 ` Michael Kerrisk 2007-08-13 23:34 ` Randy Dunlap 2007-08-15 14:40 ` Jonathan Corbet 0 siblings, 2 replies; 29+ messages in thread From: Michael Kerrisk @ 2007-08-09 21:11 UTC (permalink / raw) To: Andrew Morton, Thomas Gleixner Cc: Michael Kerrisk, lkml, Linux Torvalds, Davide Libenzi, drepper, stable, Christoph Hellwig, Randy Dunlap, Jan Engelhardt, mtk [-- Attachment #1: Type: text/plain, Size: 12283 bytes --] Andrew, Here's my first shot at changing the timerfd() interface; patch is against 2.6.23-rc1-mm2. I think I got to the bottom of understanding the timer code in the end, but I may still have missed some things... This is my first kernel patch of any substance. Perhaps Randy or Christoph can give my toes a light toasting for the various boobs I've likely made. Thomas, would you please look over this patch to verify my timer code? Davide: would you please bless this interface change ;-). It makes it possible to: 1) Fetch the previous timer settings (i.e., interval, and time remaining until next expiration) while installing new settings 2) Retrieve the settings of an existing timer without changing the timer. Both of these features are supported by the older Unix timer APIs (setitimer/getitimer and timer_create/timer_settime/timer_gettime). The changes are as follows: a) A further argument, outmr, can be used to retrieve the interval and time remaining before the expiration of a previously created timer. Thus the call signature is now: timerfd(int utmr, int clockid, int flags, struct itimerspec *utmr, struct itimerspec *outmr) The outmr argument: 1) must be NULL for a new timer (ufd == -1). 2) can be NULL if we are updating an existing timer (i.e., utmr != NULL), but we are not interested in the previous timer value. 3) can be used to retrieve the time until next timer expiration, without changing the timer. (In this case, utmr == NULL and ufd >= 0.) b) Make the 'clockid' immutable: it can only be set if 'ufd' is -1 -- that is, on the timerfd() call that first creates a timer. (This is effectively the limitation that is imposed by POSIX timers: the clockid is specified when the timer is created with timer_create(), and can't be changed.) [In the 2.6.22 interface, the clockid of an existing timer can be changed with a further call to timerfd() that specifies the file descriptor of an existing timer.] The use cases are as follows: ufd = timerfd(-1, clockid, flags, utmr, NULL); to create a new timer with given clockid, flags, and utmr (initial expiration + interval). outmr must be NULL. ufd = timerfd(ufd, -1, flags, utmr, NULL); to change the flags and timer settings of an existing timer. (The clockid argument serves no purpose when modifying an existing timer, and as I've implemented things, the argument must be specified as -1 for an existing timer.) ufd = timerfd(ufd, -1, flags, utmr, &outmr); to change the flags and timer settings of an existing timer, and retrieve the time until the next expiration of the timer (and the associated interval). ufd = timerfd(ufd, -1, 0, NULL, &outmr); Return the time until the next expiration of the timer (and the associated interval), without changing the existing timer settings. The flags argument has no meaning in this case, and must be specified as 0. Other changes: * I've added checks for various combinations of arguments values that could be deemed invalid; there is probably room for debate about whether we want all of these checks. Comments in the patch describe these checks. * Appropriate, and possibly even correct, changes made in fs/compat.c. * Jan Engelhardt noted that a cast could be removed from Davide's original code; I've removed it. The changes are correct as far as I've tested them. I've attached a test program. Davide, could you subject this to further test please? I'm off on holiday the day after tomorrow, back in two weeks, with very limited computer access. I may have time for another pass of the code if comments come in over(euro)night, otherwise Davide may be willing to clean up whatever I messed up... Cheers, Michael diff -ruN linux-2.6.23-rc1-mm2-orig/fs/compat.c linux-2.6.23-rc1-mm2/fs/compat.c --- linux-2.6.23-rc1-mm2-orig/fs/compat.c 2007-08-05 10:43:30.000000000 +0200 +++ linux-2.6.23-rc1-mm2/fs/compat.c 2007-08-07 22:08:15.000000000 +0200 @@ -2211,18 +2211,38 @@ #ifdef CONFIG_TIMERFD asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags, - const struct compat_itimerspec __user *utmr) + const struct compat_itimerspec __user *utmr, + struct compat_itimerspec __user *old_utmr) { struct itimerspec t; struct itimerspec __user *ut; + long ret; - if (get_compat_itimerspec(&t, utmr)) - return -EFAULT; - ut = compat_alloc_user_space(sizeof(*ut)); - if (copy_to_user(ut, &t, sizeof(t))) - return -EFAULT; + /*:mtk: Framework taken from compat_sys_mq_getsetattr() */ - return sys_timerfd(ufd, clockid, flags, ut); + ut = compat_alloc_user_space(2 * sizeof(*ut)); + + if (utmr) { + if (get_compat_itimerspec(&t, utmr)) + return -EFAULT; + if (copy_to_user(ut, &t, sizeof(t))) + return -EFAULT; + } + + ret = sys_timerfd(ufd, clockid, flags, + utmr ? ut : NULL, + old_utmr ? ut + 1 : NULL); + + if (ret) + return ret; + + if (old_utmr) { + if (copy_from_user(&t, old_ut, sizeof(t)) || + put_compat_itimerspec(&t, old_utmr)) + return -EFAULT; + } + + return 0; } #endif /* CONFIG_TIMERFD */ diff -ruN linux-2.6.23-rc1-mm2-orig/fs/timerfd.c linux-2.6.23-rc1-mm2/fs/timerfd.c --- linux-2.6.23-rc1-mm2-orig/fs/timerfd.c 2007-08-05 10:44:24.000000000 +0200 +++ linux-2.6.23-rc1-mm2/fs/timerfd.c 2007-08-09 22:11:25.000000000 +0200 @@ -2,6 +2,8 @@ * fs/timerfd.c * * Copyright (C) 2007 Davide Libenzi <davidel@xmailserver.org> + * and Copyright (C) 2007 Google, Inc. + * (Author: Michael Kerrisk <mtk@google.com>) * * * Thanks to Thomas Gleixner for code reviews and useful comments. @@ -26,6 +28,12 @@ ktime_t tintv; wait_queue_head_t wqh; int expired; + int clockid; /* Established when timer is created, then + immutable */ + int overrun; /* Number of overruns for this timer so far, + updated on calls to timerfd() that retrieve + settings of an existing timer, reset to zero + by timerfd_read() */ }; /* @@ -61,6 +69,7 @@ hrtimer_init(&ctx->tmr, clockid, htmode); ctx->tmr.expires = texp; ctx->tmr.function = timerfd_tmrproc; + ctx->overrun = 0; if (texp.tv64 != 0) hrtimer_start(&ctx->tmr, texp, htmode); } @@ -130,10 +139,11 @@ * callback to avoid DoS attacks specifying a very * short timer period. */ - ticks = (u64) + ticks = ctx->overrun + hrtimer_forward(&ctx->tmr, hrtimer_cb_get_time(&ctx->tmr), ctx->tintv); + ctx->overrun = 0; hrtimer_restart(&ctx->tmr); } else ticks = 1; @@ -151,24 +161,69 @@ }; asmlinkage long sys_timerfd(int ufd, int clockid, int flags, - const struct itimerspec __user *utmr) + const struct itimerspec __user *utmr, + struct itimerspec __user *old_utmr) { int error; struct timerfd_ctx *ctx; struct file *file; struct inode *inode; - struct itimerspec ktmr; + struct itimerspec ktmr, oktmr; - if (copy_from_user(&ktmr, utmr, sizeof(ktmr))) - return -EFAULT; + /* + * Not sure if we really need the first check below -- it + * relies on all clocks having a non-negative value. That's + * currently true, but I'm not sure that it is anywhere + * documented as a requirement. + * (The point of the check is that clockid has no meaning if + * we are changing the settings of an existing timer + * (i.e., ufd >= 0), so let's require it to be an otherwise + * unused value.) + */ - if (clockid != CLOCK_MONOTONIC && - clockid != CLOCK_REALTIME) + if (ufd >= 0) { + if (clockid != -1) + return -EINVAL; + } else if (clockid != CLOCK_MONOTONIC && clockid != CLOCK_REALTIME) + return -EINVAL; + + /* + * Disallow any other bits in flags than those we implement. + */ + if ((flags & ~(TFD_TIMER_ABSTIME)) != 0) + return -EINVAL; + + /* + * Not sure if this check is strictly necessary, except to stop + * userspace trying to do something silly. Flags only has meaning + * if we are creating/modifying a timer. + */ + if (utmr == NULL && flags != 0) return -EINVAL; - if (!timespec_valid(&ktmr.it_value) || - !timespec_valid(&ktmr.it_interval)) + + /* + * If we are creating a new timer, then we must supply the setting + * and there is no previous timer setting to retrieve. + */ + if (ufd == -1 && (utmr == NULL || old_utmr != NULL)) + return -EINVAL; + + /* + * Caller must provide a new timer setting, or ask for previous + * setting, or both. + */ + if (utmr == NULL && old_utmr == NULL) return -EINVAL; + if (utmr) { + if (copy_from_user(&ktmr, utmr, sizeof(ktmr))) + return -EFAULT; + + if (!timespec_valid(&ktmr.it_value) || + !timespec_valid(&ktmr.it_interval)) + return -EINVAL; + } + if (ufd == -1) { ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); if (!ctx) @@ -176,6 +231,11 @@ init_waitqueue_head(&ctx->wqh); + /* + * Save the clockid since we need it later if we modify + * the timer. + */ + ctx->clockid = clockid; timerfd_setup(ctx, clockid, flags, &ktmr); /* @@ -195,23 +255,61 @@ fput(file); return -EINVAL; } - /* - * We need to stop the existing timer before reprogramming - * it to the new values. - */ - for (;;) { - spin_lock_irq(&ctx->wqh.lock); - if (hrtimer_try_to_cancel(&ctx->tmr) >= 0) - break; - spin_unlock_irq(&ctx->wqh.lock); - cpu_relax(); + + if (old_utmr) { + + /* + * Retrieve interval and time remaining before + * next expiration of timer. + */ + + struct hrtimer *timer = &ctx->tmr; + ktime_t iv = ctx->tintv; + ktime_t now, remaining; + + clockid = ctx->clockid; + + memset(&oktmr, 0, sizeof(struct itimerspec)); + if (iv.tv64) + oktmr.it_interval = ktime_to_timespec(iv); + + now = timer->base->get_time(); + ctx->overrun += hrtimer_forward(&ctx->tmr, + hrtimer_cb_get_time(&ctx->tmr), + ctx->tintv); + remaining = ktime_sub(timer->expires, now); + if (remaining.tv64 <= 0) + oktmr.it_value.tv_nsec = 1; + else + oktmr.it_value = ktime_to_timespec(remaining); + + if (copy_to_user(old_utmr, &oktmr, sizeof (oktmr))) { + fput(file); + return -EFAULT; + } } - /* - * Re-program the timer to the new value ... - */ - timerfd_setup(ctx, clockid, flags, &ktmr); - spin_unlock_irq(&ctx->wqh.lock); + if (utmr) { + /* + * Modify settings of existing timer. + * We need to stop the existing timer before + * reprogramming it to the new values. + */ + for (;;) { + spin_lock_irq(&ctx->wqh.lock); + if (hrtimer_try_to_cancel(&ctx->tmr) >= 0) + break; + spin_unlock_irq(&ctx->wqh.lock); + cpu_relax(); + } + + /* + * Re-program the timer to the new value ... + */ + timerfd_setup(ctx, ctx->clockid, flags, &ktmr); + + spin_unlock_irq(&ctx->wqh.lock); + } fput(file); } @@ -222,4 +320,3 @@ kfree(ctx); return error; } - diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/compat.h linux-2.6.23-rc1-mm2/include/linux/compat.h --- linux-2.6.23-rc1-mm2-orig/include/linux/compat.h 2007-07-09 01:32:17.000000000 +0200 +++ linux-2.6.23-rc1-mm2/include/linux/compat.h 2007-08-05 17:46:33.000000000 +0200 @@ -265,7 +265,8 @@ const compat_sigset_t __user *sigmask, compat_size_t sigsetsize); asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags, - const struct compat_itimerspec __user *utmr); + const struct compat_itimerspec __user *utmr, + struct compat_itimerspec __user *old_utmr); #endif /* CONFIG_COMPAT */ #endif /* _LINUX_COMPAT_H */ diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h linux-2.6.23-rc1-mm2/include/linux/syscalls.h --- linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h 2007-08-05 10:44:32.000000000 +0200 +++ linux-2.6.23-rc1-mm2/include/linux/syscalls.h 2007-08-05 17:47:15.000000000 +0200 @@ -608,7 +608,8 @@ asmlinkage long sys_getcpu(unsigned __user *cpu, unsigned __user *node, struct getcpu_cache __user *cache); asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t sizemask); asmlinkage long sys_timerfd(int ufd, int clockid, int flags, - const struct itimerspec __user *utmr); + const struct itimerspec __user *utmr, + struct itimerspec __user *old_utmr); asmlinkage long sys_eventfd(unsigned int count); asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len); [-- Attachment #2: new_timerfd_errors_test.c --] [-- Type: text/x-csrc, Size: 9715 bytes --] /* new_timerfd_errors_test.c Copyright (C) 2007 Google Author: Michael Kerrisk <mtk@google.com> Run various tests against the revised timerfd() implementation. */ /* Link with -lrt */ #define _GNU_SOURCE #include <sys/syscall.h> #include <unistd.h> #include <time.h> #if defined(__i386__) #define __NR_timerfd 322 #endif static int timerfd(int ufd, int clockid, int flags, struct itimerspec *utmr, struct itimerspec *outmr) { return syscall(__NR_timerfd, ufd, clockid, flags, utmr, outmr); } #define TFD_TIMER_ABSTIME (1 << 0) /**********************************************************************/ // #include <sys/timerfd.h> #include <time.h> #include <unistd.h> #include <stdlib.h> #include <stdio.h> #include <stdint.h> #include <errno.h> /* * die() is for code that unexpectedly fails. * test_failed() is for tests where we didn't get a failure when we * should have. */ #define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0) static int failures = 0; #define test_failed(msg) do { \ fprintf(stderr, "TEST FAILED, because: %s\n", msg); \ failures++; \ } while (0); #define MILLISEC 1000000 /* Expressed in nanonsecs */ static int make_new_timerfd(int clockid, int flags, int vsecs, int vnsecs, int isecs, int insecs) { struct itimerspec utmr; utmr.it_value.tv_sec = vsecs; utmr.it_value.tv_nsec = vnsecs; utmr.it_interval.tv_sec = isecs; utmr.it_interval.tv_nsec = insecs; return timerfd(-1, clockid, flags, &utmr, NULL); } /* make_new_timerfd */ int main(int argc, char *argv[]) { struct itimerspec utmr, outmr; int ufd, ret; struct timespec now; uint64_t ticks; int j; /* * All of the following timer creations should succeed. */ ufd = make_new_timerfd(CLOCK_REALTIME, 0, 1, 0, 1, 0); if (ufd == -1) die("timerfd-REALTIME-flags=0"); if (clock_gettime(CLOCK_REALTIME, &now) == -1) die("clock_gettime"); ufd = make_new_timerfd(CLOCK_REALTIME, TFD_TIMER_ABSTIME, now.tv_sec + 1, now.tv_nsec, 1, 0); if (ufd == -1) die("timerfd-REALTIME-ABSTIME"); ufd = make_new_timerfd(CLOCK_MONOTONIC, 0, 1, 0, 1, 0); if (ufd == -1) die("timerfd-MONOTONIC-flags=0"); if (clock_gettime(CLOCK_MONOTONIC, &now) == -1) die("clock_gettime"); ufd = make_new_timerfd(CLOCK_MONOTONIC, TFD_TIMER_ABSTIME, now.tv_sec + 1, now.tv_nsec, 1, 0); if (ufd == -1) die("timerfd-MONOTONIC-ABSTIME"); ufd = make_new_timerfd(CLOCK_REALTIME, 0, 0, 0, 0, 0); if (ufd == -1) die("timerfd-MONOTONIC-zero-timer"); /* * Some argument values should not be permitted. */ ufd = make_new_timerfd(-1, 0, 1, 0, 1, 0); if (!(ufd == -1 && errno == EINVAL)) test_failed("allowed-negative-CLOCKID-for-new-timer"); ufd = make_new_timerfd(99, 0, 1, 0, 1, 0); if (!(ufd == -1 && errno == EINVAL)) test_failed("allowed-bad-clockid"); ufd = make_new_timerfd(CLOCK_MONOTONIC, 0xff, 1, 0, 1, 0); if (!(ufd == -1 && errno == EINVAL)) test_failed("allowed-invalid-flags"); /* * Non-canonical utmr values should be disallowed. */ ufd = make_new_timerfd(CLOCK_MONOTONIC, 0, -1, 0, 1, 0); if (!(ufd == -1 && errno == EINVAL)) test_failed("allowed-negative-it_value.tv_secs"); ufd = make_new_timerfd(CLOCK_MONOTONIC, 0, 1, -1, 1, 0); if (!(ufd == -1 && errno == EINVAL)) test_failed("allowed-negative-it_value.tv_nsecs"); ufd = make_new_timerfd(CLOCK_MONOTONIC, 0, 1, 0, -1, 0); if (!(ufd == -1 && errno == EINVAL)) test_failed("allowed-negative-it_interval.tv_secs"); ufd = make_new_timerfd(CLOCK_MONOTONIC, 0, 1, 0, 1, -1); if (!(ufd == -1 && errno == EINVAL)) test_failed("allowed-negative-it_interval.tv_nsecs"); ufd = make_new_timerfd(CLOCK_MONOTONIC, 0, 1, 2000000000, 1, 0); if (!(ufd == -1 && errno == EINVAL)) test_failed("allowed-non-canonical-it_value.tv_nsecs"); ufd = make_new_timerfd(CLOCK_MONOTONIC, 0, 1, 0, 1, 2000000000); if (!(ufd == -1 && errno == EINVAL)) test_failed("allowed-non-canonical-it_interval.tv_nsecs"); /* * At least one of utmr and outmr must be non-NULL. */ ufd = timerfd(-1, CLOCK_MONOTONIC, 0, NULL, NULL); if (!(ufd == -1 && errno == EINVAL)) test_failed("allowed-NULL-utmtr-plus-NULL-outmr"); /* * When creating a new timer, a non-NULL outmr is meaningless * and disallowed. */ utmr.it_value.tv_sec = 10; utmr.it_value.tv_nsec = 1; utmr.it_interval.tv_sec = 10; utmr.it_interval.tv_nsec = 1; ufd = timerfd(-1, CLOCK_MONOTONIC, 0, &utmr, &outmr); if (!(ufd == -1 && errno == EINVAL)) test_failed("allowed-NULL-outmr-for-new-timer"); /******************************************************************* * Various argument combinations are disallowed when modifying an * existing timer. */ ufd = timerfd(-1, CLOCK_REALTIME, 0, &utmr, NULL); if (ufd == -1) die("timerfd: modify setup"); /* * A clockid cannot be changed for an existing timer; * therefore it is required to be -1. (Note: We could not * require it to be zero, because CLOCK_REALTIME == 0.) */ ret = timerfd(ufd, 0, 0, &utmr, NULL); if (!(ret == -1 && errno == EINVAL)) test_failed("allowed-clockid-not-neg-1-clockid-for-timer-mod"); /* * If ufd >= 0 and utmr is NULL, then we are retrieving the setting * of an existing timer; in this case a non-zero flags value is * meaningless and disallowed. */ ret = timerfd(ufd, -1, TFD_TIMER_ABSTIME, NULL, &outmr); if (!(ret == -1 && errno == EINVAL)) test_failed("allowed-non-zero-flags-with-NULL-utmr"); /* * If ufd is not -1, then it must be a valid timerfd file descriptor. */ ret = timerfd(STDOUT_FILENO, -1, 0, &utmr, NULL); if (!(ret == -1 && errno == EINVAL)) test_failed("allowed-non-timerfd-file-descriptor"); ret = timerfd(99, -1, 0, &utmr, NULL); if (!(ret == -1 && errno == EBADF)) test_failed("allowed-bad-file-descriptor"); printf("Completed argument tests\n"); /* * Test operation of a single shot timer. */ ufd = make_new_timerfd(CLOCK_REALTIME, 0, 0, 500 * MILLISEC, 0, 0); if (ufd == -1) die("timerfd-MONOTONIC-ABSTIME"); if (read(ufd, &ticks, sizeof(ticks)) != sizeof(ticks)) die("read"); if (ticks != 1) test_failed("ticks-was-not-1") /* * The following timer should return an overrun count of 5 * for the read(). The fact that make multiple timerfd() * calls to retrieve the current timer settings should not * affect the overrun count. */ utmr.it_value.tv_sec = 1; utmr.it_value.tv_nsec = 0; utmr.it_interval.tv_sec = 2; utmr.it_interval.tv_nsec = 0; printf("\nMeasuring timer with interval %ld, frequency %ld, " "for 10 seconds\n", (long) utmr.it_value.tv_sec, (long) utmr.it_interval.tv_sec); ufd = timerfd(-1, CLOCK_REALTIME, 0, &utmr, NULL); if (ufd == -1) die("timerfd-MONOTONIC-ABSTIME"); for (j = 0; j < 10; j++) { sleep(1); ret = timerfd(ufd, -1, 0, NULL, &outmr); printf("%d - Got: %ld %9ld\n", j, (long) outmr.it_value.tv_sec, (long) outmr.it_value.tv_nsec); } if (read(ufd, &ticks, sizeof(ticks)) != sizeof(ticks)) die("read"); printf("Read (expiration count): %lld\n", ticks); if (ticks != 5) test_failed("ticks-was-not-5"); /* * Modify the settings of the previous timer. * The following timer should return an overrun count of 3 * for the read(). Making multiple timerfd() calls * to retrieve the current timer settings should not * affect the overrun count. */ utmr.it_value.tv_sec = 1; utmr.it_value.tv_nsec = 0; utmr.it_interval.tv_sec = 4; utmr.it_interval.tv_nsec = 0; printf("\nMeasuring timer with interval %ld, frequency %ld, " "for 10 seconds\n", (long) utmr.it_value.tv_sec, (long) utmr.it_interval.tv_sec); ret = timerfd(ufd, -1, 0, &utmr, &outmr); if (ret == -1) die("timerfd-MONOTONIC-ABSTIME"); for (j = 0; j < 10; j++) { sleep(1); ret = timerfd(ufd, -1, 0, NULL, &outmr); printf("%d - Got: %ld %9ld\n", j, (long) outmr.it_value.tv_sec, (long) outmr.it_value.tv_nsec); } if (read(ufd, &ticks, sizeof(ticks)) != sizeof(ticks)) die("read"); printf("Read (expiration count): %lld\n", ticks); if (ticks != 3) test_failed("ticks-was-not-3"); /* * If we set an interval timer to expire in the past, then it * the number of overruns should correspond to how far in the * past the timer was set. */ #define NEG_SECS 20 if (clock_gettime(CLOCK_REALTIME, &now) == -1) die("clock_gettime"); ufd = make_new_timerfd(CLOCK_REALTIME, TFD_TIMER_ABSTIME, now.tv_sec - NEG_SECS, now.tv_nsec, 1, 0); if (ufd == -1) die("timerfd-REALTIME-ABSTIME"); if (read(ufd, &ticks, sizeof(ticks)) != sizeof(ticks)) die("read"); printf("Read from already expired timer: %lld\n", ticks); if (ticks != (1 + NEG_SECS)) test_failed("wrong-value-read-from-past-timer"); if (failures == 0) printf("All tests passed successfully\n"); else printf("%d tests failed\n", failures); exit((failures == 0) ? EXIT_SUCCESS : EXIT_FAILURE); } ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-08-09 21:11 ` [PATCH] Revised timerfd() interface Michael Kerrisk @ 2007-08-13 23:34 ` Randy Dunlap 2007-08-15 14:40 ` Jonathan Corbet 1 sibling, 0 replies; 29+ messages in thread From: Randy Dunlap @ 2007-08-13 23:34 UTC (permalink / raw) To: Michael Kerrisk Cc: Andrew Morton, Thomas Gleixner, lkml, Linux Torvalds, Davide Libenzi, drepper, stable, Christoph Hellwig, Jan Engelhardt, mtk On Thu, 09 Aug 2007 23:11:45 +0200 Michael Kerrisk wrote: > Andrew, > > Here's my first shot at changing the timerfd() interface; patch > is against 2.6.23-rc1-mm2. > > I think I got to the bottom of understanding the timer code in > the end, but I may still have missed some things... > > This is my first kernel patch of any substance. Perhaps Randy > or Christoph can give my toes a light toasting for the various > boobs I've likely made. > > Thomas, would you please look over this patch to verify my > timer code? > > Davide: would you please bless this interface change ;-). > It makes it possible to: > > 1) Fetch the previous timer settings (i.e., interval, and time > remaining until next expiration) while installing new settings > > 2) Retrieve the settings of an existing timer without changing > the timer. > > Both of these features are supported by the older Unix timer APIs > (setitimer/getitimer and timer_create/timer_settime/timer_gettime). Hi, OK, I'll make a few comments. 1. Provide a full changelog, including reasons why the patch is needed. Don't assume that we have read the previous email threads. (We haven't. :) > The changes are as follows: > > a) A further argument, outmr, can be used to retrieve the interval > and time remaining before the expiration of a previously created > timer. Thus the call signature is now: > > timerfd(int utmr, int clockid, int flags, ufd, > struct itimerspec *utmr, > struct itimerspec *outmr) > > The outmr argument: > > 1) must be NULL for a new timer (ufd == -1). > 2) can be NULL if we are updating an existing > timer (i.e., utmr != NULL), but we are not > interested in the previous timer value. > 3) can be used to retrieve the time until next timer > expiration, without changing the timer. > (In this case, utmr == NULL and ufd >= 0.) > > b) Make the 'clockid' immutable: it can only be set > if 'ufd' is -1 -- that is, on the timerfd() call that > first creates a timer. (This is effectively > the limitation that is imposed by POSIX timers: the > clockid is specified when the timer is created with > timer_create(), and can't be changed.) > > [In the 2.6.22 interface, the clockid of an existing > timer can be changed with a further call to timerfd() > that specifies the file descriptor of an existing timer.] > > The use cases are as follows: > > ufd = timerfd(-1, clockid, flags, utmr, NULL); > to create a new timer with given clockid, flags, and utmr (initial > expiration + interval). outmr must be NULL. > > ufd = timerfd(ufd, -1, flags, utmr, NULL); > to change the flags and timer settings of an existing timer. > (The clockid argument serves no purpose when modifying an > existing timer, and as I've implemented things, the argument > must be specified as -1 for an existing timer.) > > ufd = timerfd(ufd, -1, flags, utmr, &outmr); > to change the flags and timer settings of an existing timer, and > retrieve the time until the next expiration of the timer (and > the associated interval). > > ufd = timerfd(ufd, -1, 0, NULL, &outmr); > Return the time until the next expiration of the timer (and the > associated interval), without changing the existing timer settings. > The flags argument has no meaning in this case, and must be > specified as 0. > > Other changes: > > * I've added checks for various combinations of arguments values > that could be deemed invalid; there is probably room for debate > about whether we want all of these checks. Comments in the > patch describe these checks. > > * Appropriate, and possibly even correct, changes made in fs/compat.c. > > * Jan Engelhardt noted that a cast could be removed from Davide's > original code; I've removed it. > > The changes are correct as far as I've tested them. I've attached a > test program. Davide, could you subject this to further test please? > > I'm off on holiday the day after tomorrow, back in two weeks, with > very limited computer access. I may have time for another pass of > the code if comments come in over(euro)night, otherwise Davide may > be willing to clean up whatever I messed up... > > Cheers, > > Michael > > diff -ruN linux-2.6.23-rc1-mm2-orig/fs/compat.c linux-2.6.23-rc1-mm2/fs/compat.c > --- linux-2.6.23-rc1-mm2-orig/fs/compat.c 2007-08-05 10:43:30.000000000 +0200 > +++ linux-2.6.23-rc1-mm2/fs/compat.c 2007-08-07 22:08:15.000000000 +0200 > @@ -2211,18 +2211,38 @@ > #ifdef CONFIG_TIMERFD > > asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags, > - const struct compat_itimerspec __user *utmr) > + const struct compat_itimerspec __user *utmr, > + struct compat_itimerspec __user *old_utmr) > { > struct itimerspec t; > struct itimerspec __user *ut; > + long ret; > > - if (get_compat_itimerspec(&t, utmr)) > - return -EFAULT; > - ut = compat_alloc_user_space(sizeof(*ut)); > - if (copy_to_user(ut, &t, sizeof(t))) > - return -EFAULT; > + /*:mtk: Framework taken from compat_sys_mq_getsetattr() */ Drop the :mtk: string(s). > - return sys_timerfd(ufd, clockid, flags, ut); > + ut = compat_alloc_user_space(2 * sizeof(*ut)); > + > + if (utmr) { > + if (get_compat_itimerspec(&t, utmr)) > + return -EFAULT; > + if (copy_to_user(ut, &t, sizeof(t))) > + return -EFAULT; > + } > + > + ret = sys_timerfd(ufd, clockid, flags, > + utmr ? ut : NULL, > + old_utmr ? ut + 1 : NULL); > + > + if (ret) > + return ret; > + > + if (old_utmr) { > + if (copy_from_user(&t, old_ut, sizeof(t)) || I can't find <old_ut> anywhere. Should be old_utmr I guess. > + put_compat_itimerspec(&t, old_utmr)) put_compat_itimerspec(old_utmr, &t)) IOW, I think that the parameters are reversed (at least this way their types match the function prototype). > + return -EFAULT; > + } > + > + return 0; > } > > #endif /* CONFIG_TIMERFD */ > diff -ruN linux-2.6.23-rc1-mm2-orig/fs/timerfd.c linux-2.6.23-rc1-mm2/fs/timerfd.c > --- linux-2.6.23-rc1-mm2-orig/fs/timerfd.c 2007-08-05 10:44:24.000000000 +0200 > +++ linux-2.6.23-rc1-mm2/fs/timerfd.c 2007-08-09 22:11:25.000000000 +0200 > @@ -2,6 +2,8 @@ > * fs/timerfd.c > * > * Copyright (C) 2007 Davide Libenzi <davidel@xmailserver.org> > + * and Copyright (C) 2007 Google, Inc. > + * (Author: Michael Kerrisk <mtk@google.com>) > * > * > * Thanks to Thomas Gleixner for code reviews and useful comments. > @@ -26,6 +28,12 @@ > ktime_t tintv; > wait_queue_head_t wqh; > int expired; > + int clockid; /* Established when timer is created, then > + immutable */ > + int overrun; /* Number of overruns for this timer so far, > + updated on calls to timerfd() that retrieve > + settings of an existing timer, reset to zero > + by timerfd_read() */ > }; > > /* > @@ -61,6 +69,7 @@ > hrtimer_init(&ctx->tmr, clockid, htmode); > ctx->tmr.expires = texp; > ctx->tmr.function = timerfd_tmrproc; > + ctx->overrun = 0; > if (texp.tv64 != 0) > hrtimer_start(&ctx->tmr, texp, htmode); > } > @@ -130,10 +139,11 @@ > * callback to avoid DoS attacks specifying a very > * short timer period. > */ > - ticks = (u64) > + ticks = ctx->overrun + > hrtimer_forward(&ctx->tmr, > hrtimer_cb_get_time(&ctx->tmr), > ctx->tintv); > + ctx->overrun = 0; > hrtimer_restart(&ctx->tmr); > } else > ticks = 1; > @@ -151,24 +161,69 @@ > }; > > asmlinkage long sys_timerfd(int ufd, int clockid, int flags, > - const struct itimerspec __user *utmr) > + const struct itimerspec __user *utmr, > + struct itimerspec __user *old_utmr) > { > int error; > struct timerfd_ctx *ctx; > struct file *file; > struct inode *inode; > - struct itimerspec ktmr; > + struct itimerspec ktmr, oktmr; > > - if (copy_from_user(&ktmr, utmr, sizeof(ktmr))) > - return -EFAULT; > + /* > + * Not sure if we really need the first check below -- it > + * relies on all clocks having a non-negative value. That's > + * currently true, but I'm not sure that it is anywhere > + * documented as a requirement. > + * (The point of the check is that clockid has no meaning if > + * we are changing the settings of an existing timer > + * (i.e., ufd >= 0), so let's require it to be an otherwise > + * unused value.) > + */ > > - if (clockid != CLOCK_MONOTONIC && > - clockid != CLOCK_REALTIME) > + if (ufd >= 0) { > + if (clockid != -1) > + return -EINVAL; > + } else if (clockid != CLOCK_MONOTONIC && clockid != CLOCK_REALTIME) > + return -EINVAL; > + > + /* > + * Disallow any other bits in flags than those we implement. > + */ > + if ((flags & ~(TFD_TIMER_ABSTIME)) != 0) > + return -EINVAL; > + > + /* > + * Not sure if this check is strictly necessary, except to stop Use tab+space instead of spaces. > + * userspace trying to do something silly. Flags only has meaning > + * if we are creating/modifying a timer. > + */ > + if (utmr == NULL && flags != 0) > return -EINVAL; > - if (!timespec_valid(&ktmr.it_value) || > - !timespec_valid(&ktmr.it_interval)) > + > + /* > + * If we are creating a new timer, then we must supply the setting > + * and there is no previous timer setting to retrieve. > + */ > + if (ufd == -1 && (utmr == NULL || old_utmr != NULL)) > + return -EINVAL; > + > + /* > + * Caller must provide a new timer setting, or ask for previous > + * setting, or both. > + */ > + if (utmr == NULL && old_utmr == NULL) > return -EINVAL; > > + if (utmr) { > + if (copy_from_user(&ktmr, utmr, sizeof(ktmr))) > + return -EFAULT; > + > + if (!timespec_valid(&ktmr.it_value) || > + !timespec_valid(&ktmr.it_interval)) > + return -EINVAL; > + } > + > if (ufd == -1) { > ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); > if (!ctx) > @@ -176,6 +231,11 @@ > > init_waitqueue_head(&ctx->wqh); > > + /* > + * Save the clockid since we need it later if we modify > + * the timer. > + */ > + ctx->clockid = clockid; > timerfd_setup(ctx, clockid, flags, &ktmr); > > /* > @@ -195,23 +255,61 @@ > fput(file); > return -EINVAL; > } > - /* > - * We need to stop the existing timer before reprogramming > - * it to the new values. > - */ > - for (;;) { > - spin_lock_irq(&ctx->wqh.lock); > - if (hrtimer_try_to_cancel(&ctx->tmr) >= 0) > - break; > - spin_unlock_irq(&ctx->wqh.lock); > - cpu_relax(); > + > + if (old_utmr) { > + > + /* > + * Retrieve interval and time remaining before > + * next expiration of timer. > + */ > + > + struct hrtimer *timer = &ctx->tmr; > + ktime_t iv = ctx->tintv; > + ktime_t now, remaining; > + > + clockid = ctx->clockid; > + > + memset(&oktmr, 0, sizeof(struct itimerspec)); > + if (iv.tv64) > + oktmr.it_interval = ktime_to_timespec(iv); > + > + now = timer->base->get_time(); > + ctx->overrun += hrtimer_forward(&ctx->tmr, > + hrtimer_cb_get_time(&ctx->tmr), > + ctx->tintv); > + remaining = ktime_sub(timer->expires, now); > + if (remaining.tv64 <= 0) > + oktmr.it_value.tv_nsec = 1; > + else > + oktmr.it_value = ktime_to_timespec(remaining); > + > + if (copy_to_user(old_utmr, &oktmr, sizeof (oktmr))) { > + fput(file); > + return -EFAULT; > + } > } > - /* > - * Re-program the timer to the new value ... > - */ > - timerfd_setup(ctx, clockid, flags, &ktmr); > > - spin_unlock_irq(&ctx->wqh.lock); > + if (utmr) { > + /* > + * Modify settings of existing timer. > + * We need to stop the existing timer before > + * reprogramming it to the new values. > + */ > + for (;;) { > + spin_lock_irq(&ctx->wqh.lock); > + if (hrtimer_try_to_cancel(&ctx->tmr) >= 0) > + break; > + spin_unlock_irq(&ctx->wqh.lock); > + cpu_relax(); > + } > + > + /* > + * Re-program the timer to the new value ... > + */ > + timerfd_setup(ctx, ctx->clockid, flags, &ktmr); > + > + spin_unlock_irq(&ctx->wqh.lock); > + } > fput(file); > } > > @@ -222,4 +320,3 @@ > kfree(ctx); > return error; > } > - > diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/compat.h linux-2.6.23-rc1-mm2/include/linux/compat.h > --- linux-2.6.23-rc1-mm2-orig/include/linux/compat.h 2007-07-09 01:32:17.000000000 +0200 > +++ linux-2.6.23-rc1-mm2/include/linux/compat.h 2007-08-05 17:46:33.000000000 +0200 > @@ -265,7 +265,8 @@ > const compat_sigset_t __user *sigmask, > compat_size_t sigsetsize); > asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags, > - const struct compat_itimerspec __user *utmr); > + const struct compat_itimerspec __user *utmr, > + struct compat_itimerspec __user *old_utmr); > > #endif /* CONFIG_COMPAT */ > #endif /* _LINUX_COMPAT_H */ > diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h linux-2.6.23-rc1-mm2/include/linux/syscalls.h > --- linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h 2007-08-05 10:44:32.000000000 +0200 > +++ linux-2.6.23-rc1-mm2/include/linux/syscalls.h 2007-08-05 17:47:15.000000000 +0200 > @@ -608,7 +608,8 @@ > asmlinkage long sys_getcpu(unsigned __user *cpu, unsigned __user *node, struct getcpu_cache __user *cache); > asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t sizemask); > asmlinkage long sys_timerfd(int ufd, int clockid, int flags, > - const struct itimerspec __user *utmr); > + const struct itimerspec __user *utmr, > + struct itimerspec __user *old_utmr); > asmlinkage long sys_eventfd(unsigned int count); > asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len); --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] Revised timerfd() interface 2007-08-09 21:11 ` [PATCH] Revised timerfd() interface Michael Kerrisk 2007-08-13 23:34 ` Randy Dunlap @ 2007-08-15 14:40 ` Jonathan Corbet 1 sibling, 0 replies; 29+ messages in thread From: Jonathan Corbet @ 2007-08-15 14:40 UTC (permalink / raw) To: Michael Kerrisk Cc: lkml, Linux Torvalds, Davide Libenzi, drepper, stable, Christoph Hellwig, Randy Dunlap, Jan Engelhardt, Andrew Morton, Thomas Gleixner Sorry for the late commentary...as I looked this over, one thing popped into my mind.... > b) Make the 'clockid' immutable: it can only be set > if 'ufd' is -1 -- that is, on the timerfd() call that > first creates a timer. timerfd() is looking increasingly like a multiplexor system call in disguise. There is no form of invocation which uses all of the arguments. Are we sure we wouldn't rather have something like: timerfd_open(clock); timerfd_adjust(fd, new_time, old_time); ? jon ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2007-09-13 8:22 UTC | newest] Thread overview: 29+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-08-25 6:41 [PATCH] Revised timerfd() interface Michael Kerrisk 2007-08-30 12:01 ` Davide Libenzi 2007-09-04 8:03 ` Michael Kerrisk 2007-09-04 8:18 ` Andrew Morton 2007-09-04 8:24 ` Michael Kerrisk 2007-09-04 15:25 ` Davide Libenzi 2007-09-04 15:39 ` Michael Kerrisk 2007-09-04 22:41 ` Davide Libenzi 2007-09-04 20:49 ` Michael Kerrisk 2007-09-04 22:44 ` Davide Libenzi 2007-09-05 0:08 ` Michael Kerrisk 2007-09-05 12:02 ` Andrew Morton 2007-09-05 16:14 ` Davide Libenzi 2007-09-05 16:23 ` Michael Kerrisk 2007-09-05 19:57 ` Davide Libenzi 2007-09-05 22:50 ` Michael Kerrisk 2007-09-05 23:45 ` Davide Libenzi 2007-09-06 6:58 ` Michael Kerrisk 2007-09-06 23:37 ` Davide Libenzi 2007-09-10 3:15 ` Michael Kerrisk 2007-09-05 12:12 ` Denys Vlasenko 2007-09-05 15:32 ` timerfd redux Michael Kerrisk 2007-09-13 2:39 ` Andrew Morton 2007-09-13 6:14 ` Michael Kerrisk 2007-09-13 8:13 ` Michael Kerrisk 2007-09-13 8:20 ` Andrew Morton -- strict thread matches above, loose matches on Subject: below -- 2007-07-23 6:32 Problems with timerfd() Michael Kerrisk 2007-07-23 6:38 ` Andrew Morton 2007-07-25 18:18 ` Michael Kerrisk 2007-07-25 22:12 ` Andrew Morton 2007-08-07 6:55 ` Michael Kerrisk 2007-08-09 21:11 ` [PATCH] Revised timerfd() interface Michael Kerrisk 2007-08-13 23:34 ` Randy Dunlap 2007-08-15 14:40 ` Jonathan Corbet
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.