All of lore.kernel.org
 help / color / mirror / Atom feed
* Problems with timerfd()
@ 2007-07-23  6:32 Michael Kerrisk
  2007-07-23  6:38 ` Andrew Morton
  2007-07-23 16:55 ` Problems with timerfd() Ray Lee
  0 siblings, 2 replies; 37+ messages in thread
From: Michael Kerrisk @ 2007-07-23  6:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, Linux Torvalds, Davide Libenzi, drepper

Andrew,

The timerfd() syscall went into 2.6.22.  While writing the man page for
this syscall I've found some notable limitations of the interface, and I am
wondering whether you and Linus would consider having this interface fixed
for 2.6.23.

On the one hand, these fixes would be an ABI change, which is of course
bad.  (However, as noted below, you have already accepted one of the ABI
changes that I suggested into -mm, after Davide submitted a patch.)

On the other hand, the interface has not yet made its way into a glibc
release, and the change will not break applications.  (The 2.6.22 version
of the interface would just be "broken".)

Details of my suggested changes are below.  A complication in all of this
is that on Friday, while I was part way though discussing this with Davide,
he went on vacation for a month and is likely to have only limited email
access during that time.  (See my further thoughts about what to do while
Davide is away at the end of this mail message.)  Our last communication,
after Davide had expressed reluctance about making some of the interface
changes, was a more extensive note from me describing the problems of the
interface.

The problems of the 2.6.22 timerfd() interface are as follows:

Problem 1
---------

The value returned by read(2)ing from a timerfd file descriptor is the
number of timer overruns.  In 2.6.22, this value is 4 bytes, limiting the
overrun count  to 2^32.  Consider an application where the timer frequency
was 100 kHz (feasible in the not-too-distant future, I would guess), then
the overrun counter would cycle after ~40000 seconds (~11 hours).
Furthermore returning 4 bytes from the read() is inconsistent with eventfd
file descriptors, which return 8 byte integers from a read().

Davide has already submitted a patch to you to make read() from a timerfd
file descriptor return an 8 byte integer, and I understand it to have been
accepted into -mm.

Problem 2
---------
Existing timer APIs (Unix interval timers -- setitimer(2); POSIX timers --
timer_settime()) allow the caller to retrieve the previous setting of a
timer at the same time as a new timer setting is established.  This permits
functionality such as the following for userland programs:

1. set a timer to go of at time X
2. modify the timer to go off at earlier time Z; return previous
   timer settings (X)
3. When the timer Z expires, restore timer to expire at time X

timerfd() does not provide this functionality.

Problem 3
---------

Existing timer APIs (Unix interval timers -- getitimer(2); POSIX timers --
timer_gettime()) allow the caller to retrieve the time remaining until the
next expiration of the timer.

timerfd() does not provide this functionality.

Solution (proposed interface changes)
-------------------------------------

In response to my "Problem 2", Davide noted in the last message I got from
him before he went on vacation:

> But the old status of the timer is the union of clickid, flags and utmr.
> So, in theory, the whole set should be returned back, forcing a pretty
> drastic API change.

However, I think there is a reasonable solution to this problem, which I
outlined to Davide, but did not yet hear back from him about.

a) Make the 'clockid' immutable: say that it can only be set
   if 'ufd' is -1 -- that is, on the timerfd() call that
   first creates the timer.  This would eliminate the need to
   return the previous clockid value.  (This is effectively
   the limitation that is imposed by POSIX timers: the
   clockid is specified when the timer is created with
   timer_create(), and can't be changed.)

   [In the 2.6.22 interface, the clockid of an existing
   timer can be changed with a further call to timerfd()
   that specifies the file descriptor of an existing timer.]

b) There is no need to return the previous 'flags' setting.
   The POSIX timer functions (i.e., timer_settime()) do not
   do this. Instead, timer_settime() always returns the
   time until the next expiration would have occurred,
   even if the TIMER_ABSTIME flag was specified when
   the timer was set.

   [The only 'flags' value currently implemented in
   timerfd() is TFD_TIMER_ABSTIME, which is the
   equivalent of TIMER_ABSTIME.]

With these design assumptions, the only thing that would need
to be added to timerfd() would be an argument used to return the time
until the previous timer would have expired + its interval.

The use cases would be as follows:

ufd = timerfd(-1, clockid, flags, utmr, NULL);
to create a new timer with given clockid, flags, and utmr (intial
expiration + interval).

ufd = timerfd(ufd, 0, flags, utmr, NULL);
to change the flags and timer settings of an existing timer.

ufd = timerfd(ufd, 0, flags, utmr, &old);
to change the flags and timer settings of an existing timer, and retrieve
the time until the next expiration of the timer (and the associated interval).

ufd = timerfd(ufd, 0, 0, NULL, &old);
Return the time until the next expiration of the timer (and the associated
interval), without changing the existing timer settings

Practical details
-----------------

Since Davide is away, my proposal is this: if you are prepared to consider
entertaining this ABI change, then I would try to write the patch.  If when
we next hear from Davide (he may have intermittent email access over the
next month), he agrees that it is worth making the change, then I would
submit the change to -mm with the hope that time frames would allow for it
to make it into 2.6.23.  (I would guess that delaying things for a month
means the fix might not make it into 2.6.23, and would thus simply make the
fix more painful and less feasible.)

What do you think?

Cheers,

Michael

PS For reference, the timerfd.2 man page describing the 2.6.22 interface is
below.

.TH TIMERFD 2 2007-07-17 Linux "Linux Programmer's Manual"
.SH NAME
timerfd \- create a timer that delivers notifications on a file descriptor
.SH SYNOPSIS
.\" FIXME . This header file may well change
.\" FIXME . Probably _GNU_SOURCE will be required
.\" FIXME . May require: Link with \fI\-lrt\f
.nf
.B #include <sys/timerfd.h>
.sp
.BR "int timerfd(int " ufd ", int " clockid ", int " flags ,
.BR "            const struct itimerspec *" utmr );
.fi
.SH DESCRIPTION
.BR timerfd ()
creates and starts a new timer (or modifies the settings of an
existing timer) that delivers timer expiration
information via a file descriptor.
This provides an alternative to the use of
.BR setitimer (2)
or
.BR timer_create (3),
and has the advantage that the file descriptor may be monitored by
.BR poll (2)
and
.BR select (2).
.\" FIXME Davide, a question: timer_settime() and setitimer()
.\" both permit the caller to obtain the old value of the
.\" timer when modifying an existing timer.  Why doesn't
.\" timerfd() provide this functionality?

The
.I ufd
argument is either \-1 to create a new timer,
or a file descriptor referring to an existing timerfd timer.
The remaining arguments specify the settings for the new timer,
or the modified settings for an existing timer.

The
.I clockid
argument specifies the clock that is used to mark the progress
of the timer, and must be either
.BR CLOCK_REALTIME
or
.BR CLOCK_MONOTONIC .
.B CLOCK_REALTIME
is a settable system-wide clock.
.B CLOCK_MONOTONIC
is a non-settable clock that is not affected
by discontinuous changes in the system clock
(e.g., manual changes to system time).
See also
.BR clock_getres (3).

The
.I flags
argument is either 0, to create a relative timer
.RI ( utmr.it_interval
specifies a relative time for the clock specified by
.IR clockid ),
or
.BR TFD_TIMER_ABSTIME ,
to create an absolute timer
.RI ( utmr.it_interval
specifies an absolute time for the clock specified by
.IR clockid ).

The
.I utmr
argument specifies the initial expiration and interval for the timer.
The
.I itimer
structure used for this argument contains two fields,
each of which is in turn a structure of type
.IR timespec :
.in +0.5i
.nf

struct timespec {
    time_t tv_sec;               /* Seconds */
    long   tv_nsec;              /* Nanoseconds */
};

struct itimerspec {
    struct timespec it_interval; /* Interval for periodic
                                    timer */
    struct timespec it_value;    /* Initial expiration */
};
.fi
.in
.PP
.IR utmr.it_value
specifies the initial expiration of the timer,
in seconds and nanoseconds.
Setting both fields of
.IR utmr.it_value
to zero will disable an existing timer
.RI ( ufd
!= \-1),
or create a new timer that is not armed
.RI ( ufd
== \-1).

Setting one or both fields of
.I utmr.it_interval
to non-zero values specifies the period, in seconds and nanoseconds,
for repeated timer expirations after the initial expiration.
If both fields of
.I utmr.it_interval
are zero, the the timer expires just once, at the time specified by
.IR utmr.it_value .
.PP
.BR timerfd (2)
returns a file descriptor that supports the following operations:
.TP
.BR read (2)
.\" FIXME Davide, What I have written below is what
.\" I've determined from looking at the source code
.\" and from experimenting.  But is it correct?
If the timer has already expired one or more times since it was created,
or since the last
.BR read (2),
then the buffer given to
.BR read (2)
returns an unsigned 4-byte integer
.RI ( uint32_t )
containing the number of expirations that have occurred.
.\" FIXME Davide, what if there are more expirations than can fit
.\" in a uint32_t?  (Why wasn't this value uint64_t, as with
.\" eventfd()?)
.IP
If no timer expirations have occurred at the time of the
.BR read (2),
then the call either blocks until the next timer expiration,
or fails with the error
.B EAGAIN
if the file descriptor has been made non-blocking
(via the use of the
.BR fcntl (2)
.B F_SETFL
operation to set the
.B O_NONBLOCK
flag).
.IP
A
.BR read (2)
will fail with the error
.B EINVAL
if the size of the supplied buffer is less than 4 bytes.
.TP
.BR poll "(2), " select "(2) (and similar)"
The file descriptor is readable
(the
.BR select (2)
.I readfds
argument; the
.BR poll (2)
.B POLLIN
flag)
if one or more timer expirations have occurred.
.IP
The timerfd file descriptor also supports the other file-descriptor
multiplexing APIs:
.BR pselect (2),
.BR ppoll (2),
and
.BR epoll (7).
.SS fork(2) semantics
.\" FIXME Davide, is the following correct?
After a
.BR fork (2),
the child inherits a copy of the timerfd file descriptor.
The file descriptor refers to the same underlying
file object as the corresponding descriptor in the parent,
and
.BR read (2)s
in the child will return information about
expirations of the timer.
.SS execve(2) semantics
.\" FIXME Davide, is the following correct?
A timerfd file descriptor is preserved across
.BR execve (2),
and continues to generate file expirations.
.SH "RETURN VALUE"
On success,
.BR timerfd ()
returns a timerfd file descriptor;
this is either a new file descriptor (if
.I ufd
was \-1), or
.I ufd
if
.I ufd
was a valid timerfd file descriptor.
On error, \-1 is returned and
.I errno
is set to indicate the error.
.SH ERRORS
.TP
.B EBADF
The
.I ufd
file descriptor is not a valid file descriptor.
.TP
.B EINVAL
The
.I ufd
file descriptor is not a valid timerfd file descriptor.
The
.I clockid
argument is neither
.B CLOCK_MONOTONIC
nor
.BR CLOCK_REALTIME .
The
.I utmr
is not properly initialized (one of the
.I tv_nsec
falls outside the range zero to 999,999,999).
.TP
.B EMFILE
The per-process limit of open file descriptors has been reached.
.TP
.B ENFILE
The system limit on the total number of open files has been
reached.
.TP
.B ENODEV
Could not mount (internal) anonymous i-node device.
.TP
.B ENOMEM
There was insufficient memory to handle the requested
.I op
control operation.
.SH VERSIONS
.BR timerfd (2)
is available on Linux since kernel 2.6.22.
.\" FIXME . check later to see when glibc support is provided
As at July 2007 (glibc 2.6), the details of the glibc interface
have not been finalized, so that, for example,
the eventual header file may be different from that shown above.
.SH CONFORMING TO
.BR timerfd (2)
is Linux specific.
.SH EXAMPLE
.nf

.\" FIXME . Check later what header file glibc uses for timerfd
.\" FIXME . Probably glibc will require _GNU_SOURCE to be set
.\"
.\" The commented out code here is what we currently need until
.\" the required stuff is in glibc
.\"
.\" #define _GNU_SOURCE
.\" #include <sys/syscall.h>
.\" #include <unistd.h>
.\" #include <time.h>
.\" #if defined(__i386__)
.\" #define __NR_timerfd 322
.\" #endif
.\"
.\" static int
.\" timerfd(int ufd, int clockid, int flags, struct itimerspec *utmr) {
.\"     return syscall(__NR_timerfd, ufd, clockid, flags, utmr);
.\" }
.\"
.\" #define TFD_TIMER_ABSTIME (1 << 0)
.\"
/* Link with -lrt */
#include <sys/timerfd.h>        /* May yet change for glibc */
#include <time.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>     /* Definition of uint32_t */

#define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

static void
print_elapsed_time(void)
{
    static struct timespec start;
    struct timespec curr;
    static int first_call = 1;
    int secs, nsecs;

    if (first_call) {
        first_call = 0;
        if (clock_gettime(CLOCK_MONOTONIC, &start) == \-1)
            die("clock_gettime");
    }

    if (clock_gettime(CLOCK_MONOTONIC, &curr) == \-1)
        die("clock_gettime");

    secs = curr.tv_sec \- start.tv_sec;
    nsecs = curr.tv_nsec \- start.tv_nsec;
    if (nsecs < 0) {
        secs\-\-;
        nsecs += 1000000000;
    }
    printf("%d.%03d: ", secs, (nsecs + 500000) / 1000000);
}

int
main(int argc, char *argv[])
{
    struct itimerspec utmr;
    int max_expirations, tot_exp, tfd;
    struct timespec now;
    uint32_t exp;
    ssize_t s;

    if ((argc != 2) && (argc != 4)) {
        fprintf(stderr, "%s init\-secs [interval\-secs max\-exp]\\n",
                argv[0]);
        exit(EXIT_FAILURE);
    }

    if (clock_gettime(CLOCK_REALTIME, &now) == \-1)
        die("clock_gettime");

    /* Create a CLOCK_REALTIME absolute timer with initial
       expiration and interval as specified in command line */

    utmr.it_value.tv_sec = now.tv_sec + atoi(argv[1]);
    utmr.it_value.tv_nsec = now.tv_nsec;
    if (argc == 2) {
        utmr.it_interval.tv_sec = 0;
        max_expirations = 1;
    } else {
        utmr.it_interval.tv_sec = atoi(argv[2]);
        max_expirations = atoi(argv[3]);
    }
    utmr.it_interval.tv_nsec = 0;

    tfd = timerfd(\-1, CLOCK_REALTIME, TFD_TIMER_ABSTIME, &utmr);
    if (tfd == \-1)
        die("timerfd");

    print_elapsed_time();
    printf("timer started\\n");

.\"    exp = 0; // ????? Without this initialization, the results from
.\"             // read() are strange; it appears that read() is only
.\"             // returning one byte of tick information, not four.
    for (tot_exp = 0; tot_exp < max_expirations;) {
        s = read(tfd, &exp, sizeof(uint32_t));
        if (s != sizeof(uint32_t))
            die("read");

        tot_exp += exp;
        print_elapsed_time();
        printf("read: %u; total=%d\\n", exp, tot_exp);
    }

    exit(EXIT_SUCCESS);
}
.fi
.SH "SEE ALSO"
.BR eventfd (2),
.BR poll (2),
.BR read (2),
.BR select (2),
.BR signalfd (2),
.BR epoll (7),
.BR time (7)
.\" FIXME See: setitimer(2), timer_create(3), clock_settime(3)
.\" FIXME other timer syscalls, and have them refer to this page
.\" FIXME have SEE ALSO in time.7 refer to this page.



^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: [PATCH] Revised timerfd() interface
@ 2007-08-25  6:41 Michael Kerrisk
  2007-08-30 12:01 ` Davide Libenzi
  0 siblings, 1 reply; 37+ messages in thread
From: Michael Kerrisk @ 2007-08-25  6:41 UTC (permalink / raw)
  To: Davide Libenzi, Randy Dunlap
  Cc: Michael Kerrisk, Andrew Morton, Thomas Gleixner, lkml,
	Linux Torvalds, drepper, stable, Christoph Hellwig,
	Jan Engelhardt, Jonathan Corbet

[resend, since LKML didn't like last send.]

Randy,

Thanks for the input.  I will try to send a revised patch after I get back
from holiday.  Some comments below.

Davide -- ping!  Can you please offer your comments about this change, and
also thoughts on Jon's and my comments about a more radical API change
later  in this thread.

Cheers,

Michael


Cheers,

Michael

On 8/14/07, Randy Dunlap <rdunlap@xenotime.net> wrote:
>
> On Thu, 09 Aug 2007 23:11:45 +0200 Michael Kerrisk wrote:
>
> > Andrew,
> >
> > Here's my first shot at changing the timerfd() interface; patch
> > is against 2.6.23-rc1-mm2.
> >
> > I think I got to the bottom of understanding the timer code in
> > the end, but I may still have missed some things...
> >
> > This is my first kernel patch of any substance.  Perhaps Randy
> > or Christoph can give my toes a light toasting for the various
> > boobs I've likely made.
> >
> > Thomas, would you please look over this patch to verify my
> > timer code?
> >
> > Davide: would you please bless this interface change ;-).
> > It makes it possible to:
> >
> > 1) Fetch the previous timer settings (i.e., interval, and time
> >    remaining until next expiration) while installing new settings
> >
> > 2) Retrieve the settings of an existing timer without changing
> >    the timer.
> >
> > Both of these features are supported by the older Unix timer APIs
> > (setitimer/getitimer and timer_create/timer_settime/timer_gettime).
>
> Hi,
>
> OK, I'll make a few comments.
>
> 1.  Provide a full changelog, including reasons why the patch is
> needed.  Don't assume that we have read the previous email threads.
> (We haven't. :)


Okay -- will do for a future revision of this patch.

In the meantime, the most relevant earlier mail thread is this:

http://article.gmane.org/gmane.linux.kernel/559193

The following are also of some relevance:

http://article.gmane.org/gmane.linux.kernel/556043
http://article.gmane.org/gmane.linux.kernel/556124

> The changes are as follows:
> >
> > a) A further argument, outmr, can be used to retrieve the interval
> >    and time remaining before the expiration of a previously created
> >    timer.  Thus the call signature is now:
> >
> >    timerfd(int utmr, int clockid, int flags,
>
>                  ufd,

Yes.

>            struct itimerspec *utmr,
> >            struct itimerspec *outmr)
> >
> >    The outmr argument:
> >
> >    1) must be NULL for a new timer (ufd == -1).
> >    2) can be NULL if we are updating an existing
> >       timer (i.e., utmr != NULL), but we are not
> >       interested in the previous timer value.
> >    3) can be used to retrieve the time until next timer
> >       expiration, without changing the timer.
> >       (In this case, utmr == NULL and ufd >= 0.)
> >
> > b) Make the 'clockid' immutable: it can only be set
> >    if 'ufd' is -1 -- that is, on the timerfd() call that
> >    first creates a timer.  (This is effectively
> >    the limitation that is imposed by POSIX timers: the
> >    clockid is specified when the timer is created with
> >    timer_create(), and can't be changed.)
> >
> >    [In the 2.6.22 interface, the clockid of an existing
> >    timer can be changed with a further call to timerfd()
> >    that specifies the file descriptor of an existing timer.]
> >
> > The use cases are as follows:
> >
> > ufd = timerfd(-1, clockid, flags, utmr, NULL);
> > to create a new timer with given clockid, flags, and utmr (initial
> > expiration + interval).  outmr must be NULL.
> >
> > ufd = timerfd(ufd, -1, flags, utmr, NULL);
> > to change the flags and timer settings of an existing timer.
> > (The clockid argument serves no purpose when modifying an
> > existing timer, and as I've implemented things, the argument
> > must be specified as -1 for an existing timer.)
> >
> > ufd = timerfd(ufd, -1, flags, utmr, &outmr);
> > to change the flags and timer settings of an existing timer, and
> > retrieve the time until the next expiration of the timer (and
> > the associated interval).
> >
> > ufd = timerfd(ufd, -1, 0, NULL, &outmr);
> > Return the time until the next expiration of the timer (and the
> > associated interval), without changing the existing timer settings.
> > The flags argument has no meaning in this case, and must be
> > specified as 0.
> >
> > Other changes:
> >
> > * I've added checks for various combinations of arguments values
> >   that could be deemed invalid; there is probably room for debate
> >   about whether we want all of these checks.  Comments in the
> >   patch describe these checks.
> >
> > * Appropriate, and possibly even correct, changes made in fs/compat.c.
> >
> > * Jan Engelhardt noted that a cast could be removed from Davide's
> >   original code; I've removed it.
> >
> > The changes are correct as far as I've tested them.  I've attached a
> > test program.  Davide, could you subject this to further test please?
> >
> > I'm off on holiday the day after tomorrow, back in two weeks, with
> > very limited computer access.  I may have time for another pass of
> > the code if comments come in over(euro)night, otherwise Davide may
> > be willing to clean up whatever I messed up...
> >
> > Cheers,
> >
> > Michael
> >
> > diff -ruN linux-2.6.23-rc1-mm2-orig/fs/compat.c linux-2.6.23-rc1-mm2
> /fs/compat.c
> > --- linux-2.6.23-rc1-mm2-orig/fs/compat.c     2007-08-05 10:43:
> 30.000000000 +0200
> > +++ linux-2.6.23-rc1-mm2/fs/compat.c  2007-08-07 22:08:15.000000000+0200
> > @@ -2211,18 +2211,38 @@
> >  #ifdef CONFIG_TIMERFD
> >
> >  asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags,
> > -                                const struct compat_itimerspec __user
> *utmr)
> > +                      const struct compat_itimerspec __user *utmr,
> > +                      struct compat_itimerspec __user *old_utmr)
> >  {
> >       struct itimerspec t;
> >       struct itimerspec __user *ut;
> > +     long ret;
> >
> > -     if (get_compat_itimerspec(&t, utmr))
> > -             return -EFAULT;
> > -     ut = compat_alloc_user_space(sizeof(*ut));
> > -     if (copy_to_user(ut, &t, sizeof(t)))
> > -             return -EFAULT;
> > +     /*:mtk: Framework taken from compat_sys_mq_getsetattr() */
>
> Drop the :mtk: string(s).


Yes, that was just a marker to myself  for comments that should eventually
be removed.

> -     return sys_timerfd(ufd, clockid, flags, ut);
> > +     ut = compat_alloc_user_space(2 * sizeof(*ut));
> > +
> > +     if (utmr) {
> > +             if (get_compat_itimerspec(&t, utmr))
> > +                     return -EFAULT;
> > +             if (copy_to_user(ut, &t, sizeof(t)))
> > +                     return -EFAULT;
> > +     }
> > +
> > +     ret = sys_timerfd(ufd, clockid, flags,
> > +                     utmr ? ut : NULL,
> > +                     old_utmr ? ut + 1 : NULL);
> > +
> > +     if (ret)
> > +             return ret;
> > +
> > +     if (old_utmr) {
> > +             if (copy_from_user(&t, old_ut, sizeof(t)) ||
>
> I can't find <old_ut> anywhere.  Should be old_utmr I guess.
>
> > +                 put_compat_itimerspec(&t, old_utmr))
>
>                     put_compat_itimerspec(old_utmr, &t))
>
> IOW, I think that the parameters are reversed (at least this way
> their types match the function prototype).


I'm not sure what happened there.  I had a compiled, working kernel with the patch.  Somehow I've messed up while making the patch, I guess.

> +                     return -EFAULT;
> > +     }
> > +
> > +     return 0;
> >  }
> >
> >  #endif /* CONFIG_TIMERFD */
> > diff -ruN linux-2.6.23-rc1-mm2-orig/fs/timerfd.c linux-2.6.23-rc1-mm2
> /fs/timerfd.c
> > --- linux-2.6.23-rc1-mm2-orig/fs/timerfd.c    2007-08-05 10:44:
> 24.000000000 +0200
> > +++ linux-2.6.23-rc1-mm2/fs/timerfd.c 2007-08-09 22:11:25.000000000+0200
> > @@ -2,6 +2,8 @@
> >   *  fs/timerfd.c
> >   *
> >   *  Copyright (C) 2007  Davide Libenzi <davidel@xmailserver.org>
> > + *  and Copyright (C) 2007 Google, Inc.
> > + *      (Author: Michael Kerrisk <mtk@google.com>)
> >   *
> >   *
> >   *  Thanks to Thomas Gleixner for code reviews and useful comments.
> > @@ -26,6 +28,12 @@
> >       ktime_t tintv;
> >       wait_queue_head_t wqh;
> >       int expired;
> > +     int clockid;    /* Established when timer is created, then
> > +                        immutable */
> > +     int overrun;    /* Number of overruns for this timer so far,
> > +                        updated on calls to timerfd() that retrieve
> > +                        settings of an existing timer, reset to zero
> > +                        by timerfd_read() */
> >  };
> >
> >  /*
> > @@ -61,6 +69,7 @@
> >       hrtimer_init(&ctx->tmr, clockid, htmode);
> >       ctx->tmr.expires = texp;
> >       ctx->tmr.function = timerfd_tmrproc;
> > +     ctx->overrun = 0;
> >       if (texp.tv64 != 0)
> >               hrtimer_start(&ctx->tmr, texp, htmode);
> >  }
> > @@ -130,10 +139,11 @@
> >                        * callback to avoid DoS attacks specifying a very
> >                        * short timer period.
> >                        */
> > -                     ticks = (u64)
> > +                     ticks = ctx->overrun +
> >                               hrtimer_forward(&ctx->tmr,
> >
> hrtimer_cb_get_time(&ctx->tmr),
> >                                               ctx->tintv);
> > +                     ctx->overrun = 0;
> >                       hrtimer_restart(&ctx->tmr);
> >               } else
> >                       ticks = 1;
> > @@ -151,24 +161,69 @@
> >  };
> >
> >  asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
> > -                         const struct itimerspec __user *utmr)
> > +                     const struct itimerspec __user *utmr,
> > +                     struct itimerspec __user *old_utmr)
> >  {
> >       int error;
> >       struct timerfd_ctx *ctx;
> >       struct file *file;
> >       struct inode *inode;
> > -     struct itimerspec ktmr;
> > +     struct itimerspec ktmr, oktmr;
> >
> > -     if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
> > -             return -EFAULT;
> > +     /*
> > +      * Not sure if we really need the first check below -- it
> > +      * relies on all clocks having a non-negative value.  That's
> > +      * currently true, but I'm not sure that it is anywhere
> > +      * documented as a requirement.
> > +      * (The point of the check is that clockid has no meaning if
> > +      * we are changing the settings of an existing timer
> > +      * (i.e., ufd >= 0), so let's require it to be an otherwise
> > +      * unused value.)
> > +      */
> >
> > -     if (clockid != CLOCK_MONOTONIC &&
> > -         clockid != CLOCK_REALTIME)
> > +     if (ufd >= 0) {
> > +             if (clockid != -1)
> > +                     return -EINVAL;
> > +     } else if (clockid != CLOCK_MONOTONIC && clockid !=
> CLOCK_REALTIME)
> > +             return -EINVAL;
> > +
> > +     /*
> > +      * Disallow any other bits in flags than those we implement.
> > +      */
> > +     if ((flags & ~(TFD_TIMER_ABSTIME)) != 0)
> > +             return -EINVAL;
> > +
> > +     /*
> > +         * Not sure if this check is strictly necessary, except to stop
>
> Use tab+space instead of spaces.


Doh!  Yes.

> +      * userspace trying to do something silly.  Flags only has meaning
> > +      * if we are creating/modifying a timer.
> > +      */
> > +     if (utmr == NULL && flags != 0)
> >               return -EINVAL;
> > -     if (!timespec_valid(&ktmr.it_value) ||
> > -         !timespec_valid(&ktmr.it_interval))
> > +
> > +     /*
> > +      * If we are creating a new timer, then we must supply the setting
> > +      * and there is no previous timer setting to retrieve.
> > +      */
> > +     if (ufd == -1 && (utmr == NULL || old_utmr != NULL))
> > +             return -EINVAL;
> > +
> > +     /*
> > +      * Caller must provide a new timer setting, or ask for previous
> > +      * setting, or both.
> > +      */
> > +     if (utmr == NULL && old_utmr == NULL)
> >               return -EINVAL;
> >
> > +     if (utmr) {
> > +             if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
> > +                     return -EFAULT;
> > +
> > +             if (!timespec_valid(&ktmr.it_value) ||
> > +                 !timespec_valid(&ktmr.it_interval))
> > +                     return -EINVAL;
> > +     }
> > +
> >       if (ufd == -1) {
> >               ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
> >               if (!ctx)
> > @@ -176,6 +231,11 @@
> >
> >               init_waitqueue_head(&ctx->wqh);
> >
> > +             /*
> > +              * Save the clockid since we need it later if we modify
> > +              * the timer.
> > +              */
> > +             ctx->clockid = clockid;
> >               timerfd_setup(ctx, clockid, flags, &ktmr);
> >
> >               /*
> > @@ -195,23 +255,61 @@
> >                       fput(file);
> >                       return -EINVAL;
> >               }
> > -             /*
> > -              * We need to stop the existing timer before reprogramming
> > -              * it to the new values.
> > -              */
> > -             for (;;) {
> > -                     spin_lock_irq(&ctx->wqh.lock);
> > -                     if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
> > -                             break;
> > -                     spin_unlock_irq(&ctx->wqh.lock);
> > -                     cpu_relax();
> > +
> > +             if (old_utmr) {
> > +
> > +                     /*
> > +                      * Retrieve interval and time remaining before
> > +                      * next expiration of timer.
> > +                      */
> > +
> > +                     struct hrtimer *timer = &ctx->tmr;
> > +                     ktime_t iv = ctx->tintv;
> > +                     ktime_t now, remaining;
> > +
> > +                     clockid = ctx->clockid;
> > +
> > +                     memset(&oktmr, 0, sizeof(struct itimerspec));
> > +                     if (iv.tv64)
> > +                             oktmr.it_interval = ktime_to_timespec(iv);
> > +
> > +                     now = timer->base->get_time();
> > +                     ctx->overrun += hrtimer_forward(&ctx->tmr,
> > +                                     hrtimer_cb_get_time(&ctx->tmr),
> > +                                     ctx->tintv);
> > +                     remaining = ktime_sub(timer->expires, now);
> > +                     if (remaining.tv64 <= 0)
> > +                             oktmr.it_value.tv_nsec = 1;
> > +                     else
> > +                             oktmr.it_value =
> ktime_to_timespec(remaining);
> > +
> > +                     if (copy_to_user(old_utmr, &oktmr, sizeof
> (oktmr))) {
> > +                             fput(file);
> > +                             return -EFAULT;
> > +                     }
> >               }
> > -             /*
> > -              * Re-program the timer to the new value ...
> > -              */
> > -             timerfd_setup(ctx, clockid, flags, &ktmr);
> >
> > -             spin_unlock_irq(&ctx->wqh.lock);
> > +             if (utmr) {
> > +                     /*
> > +                      * Modify settings of existing timer.
> > +                      * We need to stop the existing timer before
> > +                      * reprogramming it to the new values.
> > +                      */
> > +                     for (;;) {
> > +                             spin_lock_irq(&ctx->wqh.lock);
> > +                             if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
> > +                                     break;
> > +                             spin_unlock_irq(&ctx->wqh.lock);
> > +                             cpu_relax();
> > +                     }
> > +
> > +                     /*
> > +                      * Re-program the timer to the new value ...
> > +                      */
> > +                     timerfd_setup(ctx, ctx->clockid, flags, &ktmr);
> > +
> > +                     spin_unlock_irq(&ctx->wqh.lock);
> > +             }
> >               fput(file);
> >       }
> >
> > @@ -222,4 +320,3 @@
> >       kfree(ctx);
> >       return error;
> >  }
> > -
> > diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/compat.h
> linux-2.6.23-rc1-mm2/include/linux/compat.h
> > --- linux-2.6.23-rc1-mm2-orig/include/linux/compat.h  2007-07-09 01:32:
> 17.000000000 +0200
> > +++ linux-2.6.23-rc1-mm2/include/linux/compat.h       2007-08-05 17:46:
> 33.000000000 +0200
> > @@ -265,7 +265,8 @@
> >                               const compat_sigset_t __user *sigmask,
> >                                  compat_size_t sigsetsize);
> >  asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags,
> > -                             const struct compat_itimerspec __user
> *utmr);
> > +                     const struct compat_itimerspec __user *utmr,
> > +                     struct compat_itimerspec __user *old_utmr);
> >
> >  #endif /* CONFIG_COMPAT */
> >  #endif /* _LINUX_COMPAT_H */
> > diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h
> linux-2.6.23-rc1-mm2/include/linux/syscalls.h
> > --- linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h        2007-08-05
> 10:44:32.000000000 +0200
> > +++ linux-2.6.23-rc1-mm2/include/linux/syscalls.h     2007-08-05 17:47:
> 15.000000000 +0200
> > @@ -608,7 +608,8 @@
> >  asmlinkage long sys_getcpu(unsigned __user *cpu, unsigned __user *node,
> struct getcpu_cache __user *cache);
> >  asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask,
> size_t sizemask);
> >  asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
> > -                         const struct itimerspec __user *utmr);
> > +                     const struct itimerspec __user *utmr,
> > +                     struct itimerspec __user *old_utmr);
> >  asmlinkage long sys_eventfd(unsigned int count);
> >  asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t
> len);
>
>
> ---
> ~Randy
> *** Remember to use Documentation/SubmitChecklist when testing your code
> ***
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2007-09-10  3:18 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-23  6:32 Problems with timerfd() Michael Kerrisk
2007-07-23  6:38 ` Andrew Morton
2007-07-23  6:42   ` Andrew Morton
2007-07-23  8:02     ` Michael Kerrisk
2007-07-25 18:18   ` Michael Kerrisk
2007-07-25 22:12     ` Andrew Morton
2007-08-07  6:55       ` Michael Kerrisk
2007-08-07  7:36         ` Andrew Morton
2007-08-07  9:14           ` Michael Kerrisk
2007-08-09 21:11         ` [PATCH] Revised timerfd() interface Michael Kerrisk
2007-08-13 23:34           ` Randy Dunlap
2007-08-15 14:40           ` Jonathan Corbet
2007-07-23 16:55 ` Problems with timerfd() Ray Lee
2007-07-24  7:40   ` Michael Kerrisk
2007-07-24 15:22     ` Ray Lee
2007-07-24 15:56       ` Michael Kerrisk
  -- strict thread matches above, loose matches on Subject: below --
2007-08-25  6:41 [PATCH] Revised timerfd() interface Michael Kerrisk
2007-08-30 12:01 ` Davide Libenzi
2007-09-04  8:03   ` Michael Kerrisk
2007-09-04  8:18     ` Andrew Morton
2007-09-04  8:24       ` Michael Kerrisk
2007-09-04 15:25       ` Davide Libenzi
2007-09-04 15:39         ` Michael Kerrisk
2007-09-04 22:41           ` Davide Libenzi
2007-09-04 20:49         ` Michael Kerrisk
2007-09-04 22:44           ` Davide Libenzi
2007-09-05  0:08             ` Michael Kerrisk
2007-09-05 12:02               ` Andrew Morton
2007-09-05 16:14               ` Davide Libenzi
2007-09-05 16:23                 ` Michael Kerrisk
2007-09-05 19:57                   ` Davide Libenzi
2007-09-05 22:50                     ` Michael Kerrisk
2007-09-05 23:45                       ` Davide Libenzi
2007-09-06  6:58                         ` Michael Kerrisk
2007-09-06 23:37                           ` Davide Libenzi
2007-09-10  3:15                             ` Michael Kerrisk
2007-09-05 12:12         ` Denys Vlasenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.