public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Problems with timerfd()
@ 2007-07-23  6:32 Michael Kerrisk
  2007-07-23  6:38 ` Andrew Morton
  2007-07-23 16:55 ` Problems with timerfd() Ray Lee
  0 siblings, 2 replies; 16+ messages in thread
From: Michael Kerrisk @ 2007-07-23  6:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, Linux Torvalds, Davide Libenzi, drepper

Andrew,

The timerfd() syscall went into 2.6.22.  While writing the man page for
this syscall I've found some notable limitations of the interface, and I am
wondering whether you and Linus would consider having this interface fixed
for 2.6.23.

On the one hand, these fixes would be an ABI change, which is of course
bad.  (However, as noted below, you have already accepted one of the ABI
changes that I suggested into -mm, after Davide submitted a patch.)

On the other hand, the interface has not yet made its way into a glibc
release, and the change will not break applications.  (The 2.6.22 version
of the interface would just be "broken".)

Details of my suggested changes are below.  A complication in all of this
is that on Friday, while I was part way though discussing this with Davide,
he went on vacation for a month and is likely to have only limited email
access during that time.  (See my further thoughts about what to do while
Davide is away at the end of this mail message.)  Our last communication,
after Davide had expressed reluctance about making some of the interface
changes, was a more extensive note from me describing the problems of the
interface.

The problems of the 2.6.22 timerfd() interface are as follows:

Problem 1
---------

The value returned by read(2)ing from a timerfd file descriptor is the
number of timer overruns.  In 2.6.22, this value is 4 bytes, limiting the
overrun count  to 2^32.  Consider an application where the timer frequency
was 100 kHz (feasible in the not-too-distant future, I would guess), then
the overrun counter would cycle after ~40000 seconds (~11 hours).
Furthermore returning 4 bytes from the read() is inconsistent with eventfd
file descriptors, which return 8 byte integers from a read().

Davide has already submitted a patch to you to make read() from a timerfd
file descriptor return an 8 byte integer, and I understand it to have been
accepted into -mm.

Problem 2
---------
Existing timer APIs (Unix interval timers -- setitimer(2); POSIX timers --
timer_settime()) allow the caller to retrieve the previous setting of a
timer at the same time as a new timer setting is established.  This permits
functionality such as the following for userland programs:

1. set a timer to go of at time X
2. modify the timer to go off at earlier time Z; return previous
   timer settings (X)
3. When the timer Z expires, restore timer to expire at time X

timerfd() does not provide this functionality.

Problem 3
---------

Existing timer APIs (Unix interval timers -- getitimer(2); POSIX timers --
timer_gettime()) allow the caller to retrieve the time remaining until the
next expiration of the timer.

timerfd() does not provide this functionality.

Solution (proposed interface changes)
-------------------------------------

In response to my "Problem 2", Davide noted in the last message I got from
him before he went on vacation:

> But the old status of the timer is the union of clickid, flags and utmr.
> So, in theory, the whole set should be returned back, forcing a pretty
> drastic API change.

However, I think there is a reasonable solution to this problem, which I
outlined to Davide, but did not yet hear back from him about.

a) Make the 'clockid' immutable: say that it can only be set
   if 'ufd' is -1 -- that is, on the timerfd() call that
   first creates the timer.  This would eliminate the need to
   return the previous clockid value.  (This is effectively
   the limitation that is imposed by POSIX timers: the
   clockid is specified when the timer is created with
   timer_create(), and can't be changed.)

   [In the 2.6.22 interface, the clockid of an existing
   timer can be changed with a further call to timerfd()
   that specifies the file descriptor of an existing timer.]

b) There is no need to return the previous 'flags' setting.
   The POSIX timer functions (i.e., timer_settime()) do not
   do this. Instead, timer_settime() always returns the
   time until the next expiration would have occurred,
   even if the TIMER_ABSTIME flag was specified when
   the timer was set.

   [The only 'flags' value currently implemented in
   timerfd() is TFD_TIMER_ABSTIME, which is the
   equivalent of TIMER_ABSTIME.]

With these design assumptions, the only thing that would need
to be added to timerfd() would be an argument used to return the time
until the previous timer would have expired + its interval.

The use cases would be as follows:

ufd = timerfd(-1, clockid, flags, utmr, NULL);
to create a new timer with given clockid, flags, and utmr (intial
expiration + interval).

ufd = timerfd(ufd, 0, flags, utmr, NULL);
to change the flags and timer settings of an existing timer.

ufd = timerfd(ufd, 0, flags, utmr, &old);
to change the flags and timer settings of an existing timer, and retrieve
the time until the next expiration of the timer (and the associated interval).

ufd = timerfd(ufd, 0, 0, NULL, &old);
Return the time until the next expiration of the timer (and the associated
interval), without changing the existing timer settings

Practical details
-----------------

Since Davide is away, my proposal is this: if you are prepared to consider
entertaining this ABI change, then I would try to write the patch.  If when
we next hear from Davide (he may have intermittent email access over the
next month), he agrees that it is worth making the change, then I would
submit the change to -mm with the hope that time frames would allow for it
to make it into 2.6.23.  (I would guess that delaying things for a month
means the fix might not make it into 2.6.23, and would thus simply make the
fix more painful and less feasible.)

What do you think?

Cheers,

Michael

PS For reference, the timerfd.2 man page describing the 2.6.22 interface is
below.

.TH TIMERFD 2 2007-07-17 Linux "Linux Programmer's Manual"
.SH NAME
timerfd \- create a timer that delivers notifications on a file descriptor
.SH SYNOPSIS
.\" FIXME . This header file may well change
.\" FIXME . Probably _GNU_SOURCE will be required
.\" FIXME . May require: Link with \fI\-lrt\f
.nf
.B #include <sys/timerfd.h>
.sp
.BR "int timerfd(int " ufd ", int " clockid ", int " flags ,
.BR "            const struct itimerspec *" utmr );
.fi
.SH DESCRIPTION
.BR timerfd ()
creates and starts a new timer (or modifies the settings of an
existing timer) that delivers timer expiration
information via a file descriptor.
This provides an alternative to the use of
.BR setitimer (2)
or
.BR timer_create (3),
and has the advantage that the file descriptor may be monitored by
.BR poll (2)
and
.BR select (2).
.\" FIXME Davide, a question: timer_settime() and setitimer()
.\" both permit the caller to obtain the old value of the
.\" timer when modifying an existing timer.  Why doesn't
.\" timerfd() provide this functionality?

The
.I ufd
argument is either \-1 to create a new timer,
or a file descriptor referring to an existing timerfd timer.
The remaining arguments specify the settings for the new timer,
or the modified settings for an existing timer.

The
.I clockid
argument specifies the clock that is used to mark the progress
of the timer, and must be either
.BR CLOCK_REALTIME
or
.BR CLOCK_MONOTONIC .
.B CLOCK_REALTIME
is a settable system-wide clock.
.B CLOCK_MONOTONIC
is a non-settable clock that is not affected
by discontinuous changes in the system clock
(e.g., manual changes to system time).
See also
.BR clock_getres (3).

The
.I flags
argument is either 0, to create a relative timer
.RI ( utmr.it_interval
specifies a relative time for the clock specified by
.IR clockid ),
or
.BR TFD_TIMER_ABSTIME ,
to create an absolute timer
.RI ( utmr.it_interval
specifies an absolute time for the clock specified by
.IR clockid ).

The
.I utmr
argument specifies the initial expiration and interval for the timer.
The
.I itimer
structure used for this argument contains two fields,
each of which is in turn a structure of type
.IR timespec :
.in +0.5i
.nf

struct timespec {
    time_t tv_sec;               /* Seconds */
    long   tv_nsec;              /* Nanoseconds */
};

struct itimerspec {
    struct timespec it_interval; /* Interval for periodic
                                    timer */
    struct timespec it_value;    /* Initial expiration */
};
.fi
.in
.PP
.IR utmr.it_value
specifies the initial expiration of the timer,
in seconds and nanoseconds.
Setting both fields of
.IR utmr.it_value
to zero will disable an existing timer
.RI ( ufd
!= \-1),
or create a new timer that is not armed
.RI ( ufd
== \-1).

Setting one or both fields of
.I utmr.it_interval
to non-zero values specifies the period, in seconds and nanoseconds,
for repeated timer expirations after the initial expiration.
If both fields of
.I utmr.it_interval
are zero, the the timer expires just once, at the time specified by
.IR utmr.it_value .
.PP
.BR timerfd (2)
returns a file descriptor that supports the following operations:
.TP
.BR read (2)
.\" FIXME Davide, What I have written below is what
.\" I've determined from looking at the source code
.\" and from experimenting.  But is it correct?
If the timer has already expired one or more times since it was created,
or since the last
.BR read (2),
then the buffer given to
.BR read (2)
returns an unsigned 4-byte integer
.RI ( uint32_t )
containing the number of expirations that have occurred.
.\" FIXME Davide, what if there are more expirations than can fit
.\" in a uint32_t?  (Why wasn't this value uint64_t, as with
.\" eventfd()?)
.IP
If no timer expirations have occurred at the time of the
.BR read (2),
then the call either blocks until the next timer expiration,
or fails with the error
.B EAGAIN
if the file descriptor has been made non-blocking
(via the use of the
.BR fcntl (2)
.B F_SETFL
operation to set the
.B O_NONBLOCK
flag).
.IP
A
.BR read (2)
will fail with the error
.B EINVAL
if the size of the supplied buffer is less than 4 bytes.
.TP
.BR poll "(2), " select "(2) (and similar)"
The file descriptor is readable
(the
.BR select (2)
.I readfds
argument; the
.BR poll (2)
.B POLLIN
flag)
if one or more timer expirations have occurred.
.IP
The timerfd file descriptor also supports the other file-descriptor
multiplexing APIs:
.BR pselect (2),
.BR ppoll (2),
and
.BR epoll (7).
.SS fork(2) semantics
.\" FIXME Davide, is the following correct?
After a
.BR fork (2),
the child inherits a copy of the timerfd file descriptor.
The file descriptor refers to the same underlying
file object as the corresponding descriptor in the parent,
and
.BR read (2)s
in the child will return information about
expirations of the timer.
.SS execve(2) semantics
.\" FIXME Davide, is the following correct?
A timerfd file descriptor is preserved across
.BR execve (2),
and continues to generate file expirations.
.SH "RETURN VALUE"
On success,
.BR timerfd ()
returns a timerfd file descriptor;
this is either a new file descriptor (if
.I ufd
was \-1), or
.I ufd
if
.I ufd
was a valid timerfd file descriptor.
On error, \-1 is returned and
.I errno
is set to indicate the error.
.SH ERRORS
.TP
.B EBADF
The
.I ufd
file descriptor is not a valid file descriptor.
.TP
.B EINVAL
The
.I ufd
file descriptor is not a valid timerfd file descriptor.
The
.I clockid
argument is neither
.B CLOCK_MONOTONIC
nor
.BR CLOCK_REALTIME .
The
.I utmr
is not properly initialized (one of the
.I tv_nsec
falls outside the range zero to 999,999,999).
.TP
.B EMFILE
The per-process limit of open file descriptors has been reached.
.TP
.B ENFILE
The system limit on the total number of open files has been
reached.
.TP
.B ENODEV
Could not mount (internal) anonymous i-node device.
.TP
.B ENOMEM
There was insufficient memory to handle the requested
.I op
control operation.
.SH VERSIONS
.BR timerfd (2)
is available on Linux since kernel 2.6.22.
.\" FIXME . check later to see when glibc support is provided
As at July 2007 (glibc 2.6), the details of the glibc interface
have not been finalized, so that, for example,
the eventual header file may be different from that shown above.
.SH CONFORMING TO
.BR timerfd (2)
is Linux specific.
.SH EXAMPLE
.nf

.\" FIXME . Check later what header file glibc uses for timerfd
.\" FIXME . Probably glibc will require _GNU_SOURCE to be set
.\"
.\" The commented out code here is what we currently need until
.\" the required stuff is in glibc
.\"
.\" #define _GNU_SOURCE
.\" #include <sys/syscall.h>
.\" #include <unistd.h>
.\" #include <time.h>
.\" #if defined(__i386__)
.\" #define __NR_timerfd 322
.\" #endif
.\"
.\" static int
.\" timerfd(int ufd, int clockid, int flags, struct itimerspec *utmr) {
.\"     return syscall(__NR_timerfd, ufd, clockid, flags, utmr);
.\" }
.\"
.\" #define TFD_TIMER_ABSTIME (1 << 0)
.\"
/* Link with -lrt */
#include <sys/timerfd.h>        /* May yet change for glibc */
#include <time.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>     /* Definition of uint32_t */

#define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

static void
print_elapsed_time(void)
{
    static struct timespec start;
    struct timespec curr;
    static int first_call = 1;
    int secs, nsecs;

    if (first_call) {
        first_call = 0;
        if (clock_gettime(CLOCK_MONOTONIC, &start) == \-1)
            die("clock_gettime");
    }

    if (clock_gettime(CLOCK_MONOTONIC, &curr) == \-1)
        die("clock_gettime");

    secs = curr.tv_sec \- start.tv_sec;
    nsecs = curr.tv_nsec \- start.tv_nsec;
    if (nsecs < 0) {
        secs\-\-;
        nsecs += 1000000000;
    }
    printf("%d.%03d: ", secs, (nsecs + 500000) / 1000000);
}

int
main(int argc, char *argv[])
{
    struct itimerspec utmr;
    int max_expirations, tot_exp, tfd;
    struct timespec now;
    uint32_t exp;
    ssize_t s;

    if ((argc != 2) && (argc != 4)) {
        fprintf(stderr, "%s init\-secs [interval\-secs max\-exp]\\n",
                argv[0]);
        exit(EXIT_FAILURE);
    }

    if (clock_gettime(CLOCK_REALTIME, &now) == \-1)
        die("clock_gettime");

    /* Create a CLOCK_REALTIME absolute timer with initial
       expiration and interval as specified in command line */

    utmr.it_value.tv_sec = now.tv_sec + atoi(argv[1]);
    utmr.it_value.tv_nsec = now.tv_nsec;
    if (argc == 2) {
        utmr.it_interval.tv_sec = 0;
        max_expirations = 1;
    } else {
        utmr.it_interval.tv_sec = atoi(argv[2]);
        max_expirations = atoi(argv[3]);
    }
    utmr.it_interval.tv_nsec = 0;

    tfd = timerfd(\-1, CLOCK_REALTIME, TFD_TIMER_ABSTIME, &utmr);
    if (tfd == \-1)
        die("timerfd");

    print_elapsed_time();
    printf("timer started\\n");

.\"    exp = 0; // ????? Without this initialization, the results from
.\"             // read() are strange; it appears that read() is only
.\"             // returning one byte of tick information, not four.
    for (tot_exp = 0; tot_exp < max_expirations;) {
        s = read(tfd, &exp, sizeof(uint32_t));
        if (s != sizeof(uint32_t))
            die("read");

        tot_exp += exp;
        print_elapsed_time();
        printf("read: %u; total=%d\\n", exp, tot_exp);
    }

    exit(EXIT_SUCCESS);
}
.fi
.SH "SEE ALSO"
.BR eventfd (2),
.BR poll (2),
.BR read (2),
.BR select (2),
.BR signalfd (2),
.BR epoll (7),
.BR time (7)
.\" FIXME See: setitimer(2), timer_create(3), clock_settime(3)
.\" FIXME other timer syscalls, and have them refer to this page
.\" FIXME have SEE ALSO in time.7 refer to this page.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problems with timerfd()
  2007-07-23  6:32 Problems with timerfd() Michael Kerrisk
@ 2007-07-23  6:38 ` Andrew Morton
  2007-07-23  6:42   ` Andrew Morton
  2007-07-25 18:18   ` Michael Kerrisk
  2007-07-23 16:55 ` Problems with timerfd() Ray Lee
  1 sibling, 2 replies; 16+ messages in thread
From: Andrew Morton @ 2007-07-23  6:38 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: lkml, Linux Torvalds, Davide Libenzi, drepper, stable

On Mon, 23 Jul 2007 08:32:29 +0200 Michael Kerrisk <mtk-manpages@gmx.net> wrote:

> Andrew,
> 
> The timerfd() syscall went into 2.6.22.  While writing the man page for
> this syscall I've found some notable limitations of the interface, and I am
> wondering whether you and Linus would consider having this interface fixed
> for 2.6.23.
> 
> On the one hand, these fixes would be an ABI change, which is of course
> bad.  (However, as noted below, you have already accepted one of the ABI
> changes that I suggested into -mm, after Davide submitted a patch.)
> 
> On the other hand, the interface has not yet made its way into a glibc
> release, and the change will not break applications.  (The 2.6.22 version
> of the interface would just be "broken".)

I think if the need is sufficient we can do this: fix it in 2.6.23 and in
2.6.22.x.  That means that there will be a few broken-on-new-glibc kernels
out in the wild, but very few I suspect.

> Details of my suggested changes are below.  A complication in all of this
> is that on Friday, while I was part way though discussing this with Davide,
> he went on vacation for a month and is likely to have only limited email
> access during that time.  (See my further thoughts about what to do while
> Davide is away at the end of this mail message.)  Our last communication,
> after Davide had expressed reluctance about making some of the interface
> changes, was a more extensive note from me describing the problems of the
> interface.
> 
> The problems of the 2.6.22 timerfd() interface are as follows:
> 
> Problem 1
> ---------
> 
> The value returned by read(2)ing from a timerfd file descriptor is the
> number of timer overruns.  In 2.6.22, this value is 4 bytes, limiting the
> overrun count  to 2^32.  Consider an application where the timer frequency
> was 100 kHz (feasible in the not-too-distant future, I would guess), then
> the overrun counter would cycle after ~40000 seconds (~11 hours).
> Furthermore returning 4 bytes from the read() is inconsistent with eventfd
> file descriptors, which return 8 byte integers from a read().
> 
> Davide has already submitted a patch to you to make read() from a timerfd
> file descriptor return an 8 byte integer, and I understand it to have been
> accepted into -mm.

argh.  Nobody told me it was an ABI change!  We'll need to consider merging
make-timerfd-return-a-u64-and-fix-the-__put_user.patch into 2.6.22.x as
well.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problems with timerfd()
  2007-07-23  6:38 ` Andrew Morton
@ 2007-07-23  6:42   ` Andrew Morton
  2007-07-23  8:02     ` Michael Kerrisk
  2007-07-25 18:18   ` Michael Kerrisk
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2007-07-23  6:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michael Kerrisk, lkml, Linux Torvalds, Davide Libenzi, drepper,
	stable

On Sun, 22 Jul 2007 23:38:26 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> > Davide has already submitted a patch to you to make read() from a timerfd
> > file descriptor return an 8 byte integer, and I understand it to have been
> > accepted into -mm.
> 
> argh.  Nobody told me it was an ABI change!  We'll need to consider merging
> make-timerfd-return-a-u64-and-fix-the-__put_user.patch into 2.6.22.x as
> well.
> 

So I'm trying to write a halfway respectable description of that patch and
I'm stuck when it comes to describing what will happen if someone tries
to run a future timerfd-enabled glibc on 2.2.22 base.   In what manner
will it misbehave?  What are the consequences of this decision?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problems with timerfd()
  2007-07-23  6:42   ` Andrew Morton
@ 2007-07-23  8:02     ` Michael Kerrisk
  0 siblings, 0 replies; 16+ messages in thread
From: Michael Kerrisk @ 2007-07-23  8:02 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, Linux Torvalds, Davide Libenzi, drepper, stable



Andrew Morton wrote:
> On Sun, 22 Jul 2007 23:38:26 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> 
>>> Davide has already submitted a patch to you to make read() from a timerfd
>>> file descriptor return an 8 byte integer, and I understand it to have been
>>> accepted into -mm.
>> argh.  Nobody told me it was an ABI change!  We'll need to consider merging
>> make-timerfd-return-a-u64-and-fix-the-__put_user.patch into 2.6.22.x as
>> well.
>>
> 
> So I'm trying to write a halfway respectable description of that patch and
> I'm stuck when it comes to describing what will happen if someone tries
> to run a future timerfd-enabled glibc on 2.2.22 base.   In what manner
> will it misbehave?  What are the consequences of this decision?
> 

The difference would be that read() will only return 4 bytes, while the
application will expect 8.  If the application is checking the size of
returned value, as it should, then it will be able to detect the problem
(it could even be sophisticated enough to know that if this is a 4-byte
return, then it is running on an old 2.6.22 kernel).  If the application is
not checking the return from read(), then its 8-byte buffer will not be
filled -- the contents of the last 4 bytes will be undefined, so the u64
value as a whole will be junk.

Cheers,

Michael

-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?  Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages/
read the HOWTOHELP file and grep the source files for 'FIXME'.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problems with timerfd()
  2007-07-23  6:32 Problems with timerfd() Michael Kerrisk
  2007-07-23  6:38 ` Andrew Morton
@ 2007-07-23 16:55 ` Ray Lee
  2007-07-24  7:40   ` Michael Kerrisk
  1 sibling, 1 reply; 16+ messages in thread
From: Ray Lee @ 2007-07-23 16:55 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Andrew Morton, lkml, Linux Torvalds, Davide Libenzi, drepper

Hey there Michael, all,

On 7/22/07, Michael Kerrisk <mtk-manpages@gmx.net> wrote:
> Problem 1
> ---------
>
> The value returned by read(2)ing from a timerfd file descriptor is the
> number of timer overruns.  In 2.6.22, this value is 4 bytes, limiting the
> overrun count  to 2^32.  Consider an application where the timer frequency
> was 100 kHz (feasible in the not-too-distant future, I would guess), then
> the overrun counter would cycle after ~40000 seconds (~11 hours).
> Furthermore returning 4 bytes from the read() is inconsistent with eventfd
> file descriptors, which return 8 byte integers from a read().

I'm feeling slow, and think I'm missing something. Why is this an
issue? Wouldn't userspace just keep track of the last overrun count,
and subtract the two to get the overruns-since-last-read? That makes
it oblivious to rollovers, unless it can't manage to do a read once
every 11 hours.

(This is the same sort of thing we already have to deal with in
certain situations, such as network stat counters on 32 bit
platforms.)

Ray

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problems with timerfd()
  2007-07-23 16:55 ` Problems with timerfd() Ray Lee
@ 2007-07-24  7:40   ` Michael Kerrisk
  2007-07-24 15:22     ` Ray Lee
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Kerrisk @ 2007-07-24  7:40 UTC (permalink / raw)
  To: Ray Lee; +Cc: drepper, davidel, torvalds, linux-kernel, akpm

Ray,

> On 7/22/07, Michael Kerrisk <mtk-manpages@gmx.net> wrote:
> > Problem 1
> > ---------
> >
> > The value returned by read(2)ing from a timerfd file descriptor is 
> > the
> > number of timer overruns.  In 2.6.22, this value is 4 bytes, limiting
> > the overrun count  to 2^32.  Consider an application where the timer
> > frequency
> > was 100 kHz (feasible in the not-too-distant future, I would guess),
> > then
> > the overrun counter would cycle after ~40000 seconds (~11 hours).
> > Furthermore returning 4 bytes from the read() is inconsistent with
> > eventfd
> > file descriptors, which return 8 byte integers from a read().
> 
> I'm feeling slow, and think I'm missing something. Why is this an
> issue? Wouldn't userspace just keep track of the last overrun count,
> and subtract the two to get the overruns-since-last-read? 

The value returned by read() is the number of overruns since
the last read().

> That makes
> it oblivious to rollovers, unless it can't manage to do a read once
> every 11 hours.

That's the point that the change is meant to address.

> (This is the same sort of thing we already have to deal with in
> certain situations, such as network stat counters on 32 bit
> platforms.)

But userspace can't deal with the condition accurately, so why
require userspace to worry about this when we could just use
a 64-bit value instead?

Cheers,

Michael
-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 

Want to help with man page maintenance?  
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages , 
read the HOWTOHELP file and grep the source 
files for 'FIXME'.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problems with timerfd()
  2007-07-24  7:40   ` Michael Kerrisk
@ 2007-07-24 15:22     ` Ray Lee
  2007-07-24 15:56       ` Michael Kerrisk
  0 siblings, 1 reply; 16+ messages in thread
From: Ray Lee @ 2007-07-24 15:22 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: drepper, davidel, torvalds, linux-kernel, akpm

On 7/24/07, Michael Kerrisk <mtk-manpages@gmx.net> wrote:
> > On 7/22/07, Michael Kerrisk <mtk-manpages@gmx.net> wrote:
> > > The value returned by read(2)ing from a timerfd file descriptor is
> > > the
> > > number of timer overruns.  In 2.6.22, this value is 4 bytes, limiting
> > > the overrun count  to 2^32.  Consider an application where the timer
> > > frequency
> > > was 100 kHz (feasible in the not-too-distant future, I would guess),
> > > then
> > > the overrun counter would cycle after ~40000 seconds (~11 hours).
> > > Furthermore returning 4 bytes from the read() is inconsistent with
> > > eventfd
> > > file descriptors, which return 8 byte integers from a read().
> >
> > I'm feeling slow, and think I'm missing something. Why is this an
> > issue? Wouldn't userspace just keep track of the last overrun count,
> > and subtract the two to get the overruns-since-last-read?
>
> The value returned by read() is the number of overruns since
> the last read().

Okay, and this is an application local count, yes? (Not system-wide.)

> > That makes
> > it oblivious to rollovers, unless it can't manage to do a read once
> > every 11 hours.
>
> That's the point that the change is meant to address.

Once every 11 hours isn't a terrible price to pay. Once every 11
seconds would be another matter. It doesn't seem unreasonable to
expect an application that is already actively dealing with timers to
go and take a look at things every hour to see if there have been more
overruns.

> > (This is the same sort of thing we already have to deal with in
> > certain situations, such as network stat counters on 32 bit
> > platforms.)
>
> But userspace can't deal with the condition accurately,

Okay, perhaps this is where I'm missing something? If userspace wakes
up once every hour, checks the overrun counter against the previous
(new-old), and goes back to sleep, that'd be good enough, right?

> so why
> require userspace to worry about this when we could just use
> a 64-bit value instead?

<shrug> I don't have strong feelings either way. It just seemed like
something that could already be taken care of with today's interface.
Given that the discussion was about an API change between 2.6.22 and
2.6.23, I was looking for options to avoid that.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problems with timerfd()
  2007-07-24 15:22     ` Ray Lee
@ 2007-07-24 15:56       ` Michael Kerrisk
  0 siblings, 0 replies; 16+ messages in thread
From: Michael Kerrisk @ 2007-07-24 15:56 UTC (permalink / raw)
  To: Ray Lee; +Cc: akpm, linux-kernel, torvalds, davidel, drepper

> > > (This is the same sort of thing we already have to deal with in
> > > certain situations, such as network stat counters on 32 bit
> > > platforms.)
> >
> > But userspace can't deal with the condition accurately,
> 
> Okay, perhaps this is where I'm missing something? If userspace wakes
> up once every hour, checks the overrun counter against the previous
> (new-old), and goes back to sleep, that'd be good enough, right?

Yes.

> > so why
> > require userspace to worry about this when we could just use
> > a 64-bit value instead?
> 
> <shrug> I don't have strong feelings either way. It just seemed like
> something that could already be taken care of with today's interface.
> Given that the discussion was about an API change between 2.6.22 and
> 2.6.23, I was looking for options to avoid that.

In fact I don't have strong feelings on this part of the interface
design either.  I pointed out the limitation to Davide, and 
pointed out that the related eventfd interface read()s an 8-byte 
integer, and Davide then just fired off a patch to Andrew which 
went into --mm.

It's the other problems with the interface that bother me more
(inability to retrieve previous setting when changing
the timer; inability to retrieve time until next expiration
without changing the current setting).

Cheers,

Michael
-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 

Want to help with man page maintenance?  
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages , 
read the HOWTOHELP file and grep the source 
files for 'FIXME'.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problems with timerfd()
  2007-07-23  6:38 ` Andrew Morton
  2007-07-23  6:42   ` Andrew Morton
@ 2007-07-25 18:18   ` Michael Kerrisk
  2007-07-25 22:12     ` Andrew Morton
  1 sibling, 1 reply; 16+ messages in thread
From: Michael Kerrisk @ 2007-07-25 18:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, Linux Torvalds, Davide Libenzi, drepper, stable

Andrew,

Andrew Morton wrote:
> On Mon, 23 Jul 2007 08:32:29 +0200 Michael Kerrisk <mtk-manpages@gmx.net> wrote:
> 
>> Andrew,
>>
>> The timerfd() syscall went into 2.6.22.  While writing the man page for
>> this syscall I've found some notable limitations of the interface, and I am
>> wondering whether you and Linus would consider having this interface fixed
>> for 2.6.23.
>>
>> On the one hand, these fixes would be an ABI change, which is of course
>> bad.  (However, as noted below, you have already accepted one of the ABI
>> changes that I suggested into -mm, after Davide submitted a patch.)
>>
>> On the other hand, the interface has not yet made its way into a glibc
>> release, and the change will not break applications.  (The 2.6.22 version
>> of the interface would just be "broken".)
> 
> I think if the need is sufficient we can do this: fix it in 2.6.23 and in
> 2.6.22.x.  That means that there will be a few broken-on-new-glibc kernels
> out in the wild, but very few I suspect.

So I'm still not quite clear.  Can I take it from your statement above that
the proposed ABI changes would be admissible, as long as Davide is okay
with them?

Cheers,

Michael


-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?  Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages/
read the HOWTOHELP file and grep the source files for 'FIXME'.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problems with timerfd()
  2007-07-25 18:18   ` Michael Kerrisk
@ 2007-07-25 22:12     ` Andrew Morton
  2007-08-07  6:55       ` Michael Kerrisk
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2007-07-25 22:12 UTC (permalink / raw)
  To: Michael Kerrisk; +Cc: lkml, Linux Torvalds, Davide Libenzi, drepper, stable

On Wed, 25 Jul 2007 20:18:51 +0200
Michael Kerrisk <mtk-manpages@gmx.net> wrote:

> Andrew,
> 
> Andrew Morton wrote:
> > On Mon, 23 Jul 2007 08:32:29 +0200 Michael Kerrisk <mtk-manpages@gmx.net> wrote:
> > 
> >> Andrew,
> >>
> >> The timerfd() syscall went into 2.6.22.  While writing the man page for
> >> this syscall I've found some notable limitations of the interface, and I am
> >> wondering whether you and Linus would consider having this interface fixed
> >> for 2.6.23.
> >>
> >> On the one hand, these fixes would be an ABI change, which is of course
> >> bad.  (However, as noted below, you have already accepted one of the ABI
> >> changes that I suggested into -mm, after Davide submitted a patch.)
> >>
> >> On the other hand, the interface has not yet made its way into a glibc
> >> release, and the change will not break applications.  (The 2.6.22 version
> >> of the interface would just be "broken".)
> > 
> > I think if the need is sufficient we can do this: fix it in 2.6.23 and in
> > 2.6.22.x.  That means that there will be a few broken-on-new-glibc kernels
> > out in the wild, but very few I suspect.
> 
> So I'm still not quite clear.  Can I take it from your statement above that
> the proposed ABI changes would be admissible, as long as Davide is okay
> with them?
> 

yup, I'll send that diff into Linus and -stable and see what happens.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problems with timerfd()
  2007-07-25 22:12     ` Andrew Morton
@ 2007-08-07  6:55       ` Michael Kerrisk
  2007-08-07  7:36         ` Andrew Morton
  2007-08-09 21:11         ` [PATCH] Revised timerfd() interface Michael Kerrisk
  0 siblings, 2 replies; 16+ messages in thread
From: Michael Kerrisk @ 2007-08-07  6:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: lkml, Linux Torvalds, Davide Libenzi, drepper, stable,
	Christoph Hellwig, Randy Dunlap

Andrew,

I'm working on the changes to timerfd(), but must admit I am struggling to
understand some of the kernel code for working with userspace timers (e.g.,
in kernel/posix-timers.c).  Can you suggest anyone who could provide
assistance?

Cheers,

Michael

Andrew Morton wrote:
> On Wed, 25 Jul 2007 20:18:51 +0200
> Michael Kerrisk <mtk-manpages@gmx.net> wrote:
> 
>> Andrew,
>>
>> Andrew Morton wrote:
>>> On Mon, 23 Jul 2007 08:32:29 +0200 Michael Kerrisk <mtk-manpages@gmx.net> wrote:
>>>
>>>> Andrew,
>>>>
>>>> The timerfd() syscall went into 2.6.22.  While writing the man page for
>>>> this syscall I've found some notable limitations of the interface, and I am
>>>> wondering whether you and Linus would consider having this interface fixed
>>>> for 2.6.23.
>>>>
>>>> On the one hand, these fixes would be an ABI change, which is of course
>>>> bad.  (However, as noted below, you have already accepted one of the ABI
>>>> changes that I suggested into -mm, after Davide submitted a patch.)
>>>>
>>>> On the other hand, the interface has not yet made its way into a glibc
>>>> release, and the change will not break applications.  (The 2.6.22 version
>>>> of the interface would just be "broken".)
>>> I think if the need is sufficient we can do this: fix it in 2.6.23 and in
>>> 2.6.22.x.  That means that there will be a few broken-on-new-glibc kernels
>>> out in the wild, but very few I suspect.
>> So I'm still not quite clear.  Can I take it from your statement above that
>> the proposed ABI changes would be admissible, as long as Davide is okay
>> with them?
>>
> 
> yup, I'll send that diff into Linus and -stable and see what happens.
> 

-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7

Want to help with man page maintenance?  Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages/
read the HOWTOHELP file and grep the source files for 'FIXME'.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problems with timerfd()
  2007-08-07  6:55       ` Michael Kerrisk
@ 2007-08-07  7:36         ` Andrew Morton
  2007-08-07  9:14           ` Michael Kerrisk
  2007-08-09 21:11         ` [PATCH] Revised timerfd() interface Michael Kerrisk
  1 sibling, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2007-08-07  7:36 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: lkml, Linux Torvalds, Davide Libenzi, drepper, stable,
	Christoph Hellwig, Randy Dunlap

On Tue, 07 Aug 2007 08:55:01 +0200 Michael Kerrisk <mtk-manpages@gmx.net> wrote:

> I'm working on the changes to timerfd(), but must admit I am struggling to
> understand some of the kernel code for working with userspace timers (e.g.,
> in kernel/posix-timers.c).

Join the club.

>  Can you suggest anyone who could provide assistance?

The code was originally contributed by George Anzinger.  He has moved on
and Thomas Gleixner now does most work in there.  I'd expect that Thomas is
your guy, but he is nearing the end of a two-week vacation and will
presumably return to an akpm-sized inbox, so I'd give him a little time ;)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Problems with timerfd()
  2007-08-07  7:36         ` Andrew Morton
@ 2007-08-07  9:14           ` Michael Kerrisk
  0 siblings, 0 replies; 16+ messages in thread
From: Michael Kerrisk @ 2007-08-07  9:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Thomas Gleixner, rdunlap, hch, stable, drepper, davidel, torvalds,
	linux-kernel

> On Tue, 07 Aug 2007 08:55:01 +0200 Michael Kerrisk <mtk-manpages@gmx.net>
> wrote:
> 
> > I'm working on the changes to timerfd(), but must admit I am struggling
> > to understand some of the kernel code for working with userspace timers
> > (e.g., in kernel/posix-timers.c).
> 
> Join the club.

Ahhh good to know I'm not the only one challeged by the code.

> >  Can you suggest anyone who could provide assistance?
> 
> The code was originally contributed by George Anzinger.  

Yes, I pinged George a week or so back, but got no response.
(I think he's more or less retired nowadays.)

> He has moved on
> and Thomas Gleixner now does most work in there.  I'd expect that Thomas
> is
> your guy, but he is nearing the end of a two-week vacation and will
> presumably return to an akpm-sized inbox, so I'd give him a little time 
> ;)

Thanks -- Thomas CCed on this reply.

Unfortunately I myself go on holiday on Saturday for two weeks.
All of these holidays (I guess that Davide is only back in another
1.5 weeks or so) start to make the chances of getting change
in during the .23-rc window look tight...

Cheers,

Michael
-- 
Michael Kerrisk
maintainer of Linux man pages Sections 2, 3, 4, 5, and 7 

Want to help with man page maintenance?  
Grab the latest tarball at
http://www.kernel.org/pub/linux/docs/manpages , 
read the HOWTOHELP file and grep the source 
files for 'FIXME'.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] Revised timerfd() interface
  2007-08-07  6:55       ` Michael Kerrisk
  2007-08-07  7:36         ` Andrew Morton
@ 2007-08-09 21:11         ` Michael Kerrisk
  2007-08-13 23:34           ` Randy Dunlap
  2007-08-15 14:40           ` Jonathan Corbet
  1 sibling, 2 replies; 16+ messages in thread
From: Michael Kerrisk @ 2007-08-09 21:11 UTC (permalink / raw)
  To: Andrew Morton, Thomas Gleixner
  Cc: Michael Kerrisk, lkml, Linux Torvalds, Davide Libenzi, drepper,
	stable, Christoph Hellwig, Randy Dunlap, Jan Engelhardt, mtk

[-- Attachment #1: Type: text/plain, Size: 12283 bytes --]

Andrew,

Here's my first shot at changing the timerfd() interface; patch
is against 2.6.23-rc1-mm2.

I think I got to the bottom of understanding the timer code in
the end, but I may still have missed some things...

This is my first kernel patch of any substance.  Perhaps Randy
or Christoph can give my toes a light toasting for the various
boobs I've likely made.

Thomas, would you please look over this patch to verify my
timer code?

Davide: would you please bless this interface change ;-).
It makes it possible to:

1) Fetch the previous timer settings (i.e., interval, and time
   remaining until next expiration) while installing new settings

2) Retrieve the settings of an existing timer without changing
   the timer.

Both of these features are supported by the older Unix timer APIs
(setitimer/getitimer and timer_create/timer_settime/timer_gettime).

The changes are as follows:

a) A further argument, outmr, can be used to retrieve the interval
   and time remaining before the expiration of a previously created
   timer.  Thus the call signature is now:

   timerfd(int utmr, int clockid, int flags,
           struct itimerspec *utmr,
           struct itimerspec *outmr)

   The outmr argument:

   1) must be NULL for a new timer (ufd == -1).
   2) can be NULL if we are updating an existing
      timer (i.e., utmr != NULL), but we are not
      interested in the previous timer value.
   3) can be used to retrieve the time until next timer
      expiration, without changing the timer.
      (In this case, utmr == NULL and ufd >= 0.)

b) Make the 'clockid' immutable: it can only be set
   if 'ufd' is -1 -- that is, on the timerfd() call that
   first creates a timer.  (This is effectively
   the limitation that is imposed by POSIX timers: the
   clockid is specified when the timer is created with
   timer_create(), and can't be changed.)

   [In the 2.6.22 interface, the clockid of an existing
   timer can be changed with a further call to timerfd()
   that specifies the file descriptor of an existing timer.]

The use cases are as follows:

ufd = timerfd(-1, clockid, flags, utmr, NULL);
to create a new timer with given clockid, flags, and utmr (initial
expiration + interval).  outmr must be NULL.

ufd = timerfd(ufd, -1, flags, utmr, NULL);
to change the flags and timer settings of an existing timer.
(The clockid argument serves no purpose when modifying an
existing timer, and as I've implemented things, the argument
must be specified as -1 for an existing timer.)

ufd = timerfd(ufd, -1, flags, utmr, &outmr);
to change the flags and timer settings of an existing timer, and
retrieve the time until the next expiration of the timer (and
the associated interval).

ufd = timerfd(ufd, -1, 0, NULL, &outmr);
Return the time until the next expiration of the timer (and the
associated interval), without changing the existing timer settings.
The flags argument has no meaning in this case, and must be
specified as 0.

Other changes:

* I've added checks for various combinations of arguments values
  that could be deemed invalid; there is probably room for debate
  about whether we want all of these checks.  Comments in the
  patch describe these checks.

* Appropriate, and possibly even correct, changes made in fs/compat.c.

* Jan Engelhardt noted that a cast could be removed from Davide's
  original code; I've removed it.

The changes are correct as far as I've tested them.  I've attached a
test program.  Davide, could you subject this to further test please?

I'm off on holiday the day after tomorrow, back in two weeks, with
very limited computer access.  I may have time for another pass of
the code if comments come in over(euro)night, otherwise Davide may
be willing to clean up whatever I messed up...

Cheers,

Michael

diff -ruN linux-2.6.23-rc1-mm2-orig/fs/compat.c linux-2.6.23-rc1-mm2/fs/compat.c
--- linux-2.6.23-rc1-mm2-orig/fs/compat.c	2007-08-05 10:43:30.000000000 +0200
+++ linux-2.6.23-rc1-mm2/fs/compat.c	2007-08-07 22:08:15.000000000 +0200
@@ -2211,18 +2211,38 @@
 #ifdef CONFIG_TIMERFD

 asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags,
-				   const struct compat_itimerspec __user *utmr)
+			 const struct compat_itimerspec __user *utmr,
+			 struct compat_itimerspec __user *old_utmr)
 {
 	struct itimerspec t;
 	struct itimerspec __user *ut;
+	long ret;

-	if (get_compat_itimerspec(&t, utmr))
-		return -EFAULT;
-	ut = compat_alloc_user_space(sizeof(*ut));
-	if (copy_to_user(ut, &t, sizeof(t)))
-		return -EFAULT;
+	/*:mtk: Framework taken from compat_sys_mq_getsetattr() */

-	return sys_timerfd(ufd, clockid, flags, ut);
+	ut = compat_alloc_user_space(2 * sizeof(*ut));
+
+	if (utmr) {
+		if (get_compat_itimerspec(&t, utmr))
+			return -EFAULT;
+		if (copy_to_user(ut, &t, sizeof(t)))
+			return -EFAULT;
+	}
+
+	ret = sys_timerfd(ufd, clockid, flags,
+			utmr ? ut : NULL,
+			old_utmr ? ut + 1 : NULL);
+
+	if (ret)
+		return ret;
+
+	if (old_utmr) {
+	    	if (copy_from_user(&t, old_ut, sizeof(t)) ||
+		    put_compat_itimerspec(&t, old_utmr))
+			return -EFAULT;
+	}
+
+	return 0;
 }

 #endif /* CONFIG_TIMERFD */
diff -ruN linux-2.6.23-rc1-mm2-orig/fs/timerfd.c linux-2.6.23-rc1-mm2/fs/timerfd.c
--- linux-2.6.23-rc1-mm2-orig/fs/timerfd.c	2007-08-05 10:44:24.000000000 +0200
+++ linux-2.6.23-rc1-mm2/fs/timerfd.c	2007-08-09 22:11:25.000000000 +0200
@@ -2,6 +2,8 @@
  *  fs/timerfd.c
  *
  *  Copyright (C) 2007  Davide Libenzi <davidel@xmailserver.org>
+ *  and Copyright (C) 2007 Google, Inc.
+ *      (Author: Michael Kerrisk <mtk@google.com>)
  *
  *
  *  Thanks to Thomas Gleixner for code reviews and useful comments.
@@ -26,6 +28,12 @@
 	ktime_t tintv;
 	wait_queue_head_t wqh;
 	int expired;
+	int clockid;	/* Established when timer is created, then
+			   immutable */
+	int overrun;	/* Number of overruns for this timer so far,
+			   updated on calls to timerfd() that retrieve
+			   settings of an existing timer, reset to zero
+			   by timerfd_read() */
 };

 /*
@@ -61,6 +69,7 @@
 	hrtimer_init(&ctx->tmr, clockid, htmode);
 	ctx->tmr.expires = texp;
 	ctx->tmr.function = timerfd_tmrproc;
+	ctx->overrun = 0;
 	if (texp.tv64 != 0)
 		hrtimer_start(&ctx->tmr, texp, htmode);
 }
@@ -130,10 +139,11 @@
 			 * callback to avoid DoS attacks specifying a very
 			 * short timer period.
 			 */
-			ticks = (u64)
+			ticks = ctx->overrun +
 				hrtimer_forward(&ctx->tmr,
 						hrtimer_cb_get_time(&ctx->tmr),
 						ctx->tintv);
+			ctx->overrun = 0;
 			hrtimer_restart(&ctx->tmr);
 		} else
 			ticks = 1;
@@ -151,24 +161,69 @@
 };

 asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
-			    const struct itimerspec __user *utmr)
+			const struct itimerspec __user *utmr,
+			struct itimerspec __user *old_utmr)
 {
 	int error;
 	struct timerfd_ctx *ctx;
 	struct file *file;
 	struct inode *inode;
-	struct itimerspec ktmr;
+	struct itimerspec ktmr, oktmr;

-	if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
-		return -EFAULT;
+	/*
+	 * Not sure if we really need the first check below -- it
+	 * relies on all clocks having a non-negative value.  That's
+	 * currently true, but I'm not sure that it is anywhere
+	 * documented as a requirement.
+	 * (The point of the check is that clockid has no meaning if
+	 * we are changing the settings of an existing timer
+	 * (i.e., ufd >= 0), so let's require it to be an otherwise
+	 * unused value.)
+	 */

-	if (clockid != CLOCK_MONOTONIC &&
-	    clockid != CLOCK_REALTIME)
+	if (ufd >= 0) {
+		if (clockid != -1)
+			return -EINVAL;
+	} else if (clockid != CLOCK_MONOTONIC && clockid != CLOCK_REALTIME)
+		return -EINVAL;
+
+	/*
+	 * Disallow any other bits in flags than those we implement.
+	 */
+	if ((flags & ~(TFD_TIMER_ABSTIME)) != 0)
+		return -EINVAL;
+
+	/*
+         * Not sure if this check is strictly necessary, except to stop
+	 * userspace trying to do something silly.  Flags only has meaning
+	 * if we are creating/modifying a timer.
+	 */
+	if (utmr == NULL && flags != 0)
 		return -EINVAL;
-	if (!timespec_valid(&ktmr.it_value) ||
-	    !timespec_valid(&ktmr.it_interval))
+
+	/*
+	 * If we are creating a new timer, then we must supply the setting
+	 * and there is no previous timer setting to retrieve.
+	 */
+	if (ufd == -1 && (utmr == NULL || old_utmr != NULL))
+		return -EINVAL;
+	
+	/*
+	 * Caller must provide a new timer setting, or ask for previous
+	 * setting, or both.
+	 */
+	if (utmr == NULL && old_utmr == NULL)
 		return -EINVAL;

+	if (utmr) {
+		if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
+			return -EFAULT;
+
+		if (!timespec_valid(&ktmr.it_value) ||
+		    !timespec_valid(&ktmr.it_interval))
+			return -EINVAL;
+	}
+
 	if (ufd == -1) {
 		ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
 		if (!ctx)
@@ -176,6 +231,11 @@

 		init_waitqueue_head(&ctx->wqh);

+		/*
+		 * Save the clockid since we need it later if we modify
+		 * the timer.
+		 */
+		ctx->clockid = clockid;
 		timerfd_setup(ctx, clockid, flags, &ktmr);

 		/*
@@ -195,23 +255,61 @@
 			fput(file);
 			return -EINVAL;
 		}
-		/*
-		 * We need to stop the existing timer before reprogramming
-		 * it to the new values.
-		 */
-		for (;;) {
-			spin_lock_irq(&ctx->wqh.lock);
-			if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
-				break;
-			spin_unlock_irq(&ctx->wqh.lock);
-			cpu_relax();
+
+		if (old_utmr) {
+
+			/*
+			 * Retrieve interval and time remaining before
+			 * next expiration of timer.
+			 */
+
+			struct hrtimer *timer = &ctx->tmr;
+			ktime_t iv = ctx->tintv;
+			ktime_t now, remaining;
+
+			clockid = ctx->clockid;
+
+			memset(&oktmr, 0, sizeof(struct itimerspec));
+			if (iv.tv64)
+				oktmr.it_interval = ktime_to_timespec(iv);
+
+			now = timer->base->get_time();
+			ctx->overrun += hrtimer_forward(&ctx->tmr,
+					hrtimer_cb_get_time(&ctx->tmr),
+					ctx->tintv);
+			remaining = ktime_sub(timer->expires, now);
+			if (remaining.tv64 <= 0)
+				oktmr.it_value.tv_nsec = 1;
+			else
+				oktmr.it_value = ktime_to_timespec(remaining);
+
+			if (copy_to_user(old_utmr, &oktmr, sizeof (oktmr))) {
+				fput(file);
+				return -EFAULT;
+			}
 		}
-		/*
-		 * Re-program the timer to the new value ...
-		 */
-		timerfd_setup(ctx, clockid, flags, &ktmr);

-		spin_unlock_irq(&ctx->wqh.lock);
+		if (utmr) {
+			/*
+			 * Modify settings of existing timer.
+			 * We need to stop the existing timer before
+			 * reprogramming it to the new values.
+			 */
+			for (;;) {
+				spin_lock_irq(&ctx->wqh.lock);
+				if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
+					break;
+				spin_unlock_irq(&ctx->wqh.lock);
+				cpu_relax();
+			}
+
+			/*
+			 * Re-program the timer to the new value ...
+			 */
+			timerfd_setup(ctx, ctx->clockid, flags, &ktmr);
+
+			spin_unlock_irq(&ctx->wqh.lock);
+		}
 		fput(file);
 	}

@@ -222,4 +320,3 @@
 	kfree(ctx);
 	return error;
 }
-
diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/compat.h linux-2.6.23-rc1-mm2/include/linux/compat.h
--- linux-2.6.23-rc1-mm2-orig/include/linux/compat.h	2007-07-09 01:32:17.000000000 +0200
+++ linux-2.6.23-rc1-mm2/include/linux/compat.h	2007-08-05 17:46:33.000000000 +0200
@@ -265,7 +265,8 @@
 				const compat_sigset_t __user *sigmask,
                                 compat_size_t sigsetsize);
 asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags,
-				const struct compat_itimerspec __user *utmr);
+			const struct compat_itimerspec __user *utmr,
+			struct compat_itimerspec __user *old_utmr);

 #endif /* CONFIG_COMPAT */
 #endif /* _LINUX_COMPAT_H */
diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h linux-2.6.23-rc1-mm2/include/linux/syscalls.h
--- linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h	2007-08-05 10:44:32.000000000 +0200
+++ linux-2.6.23-rc1-mm2/include/linux/syscalls.h	2007-08-05 17:47:15.000000000 +0200
@@ -608,7 +608,8 @@
 asmlinkage long sys_getcpu(unsigned __user *cpu, unsigned __user *node, struct getcpu_cache __user *cache);
 asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t sizemask);
 asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
-			    const struct itimerspec __user *utmr);
+			const struct itimerspec __user *utmr,
+			struct itimerspec __user *old_utmr);
 asmlinkage long sys_eventfd(unsigned int count);
 asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len);



[-- Attachment #2: new_timerfd_errors_test.c --]
[-- Type: text/x-csrc, Size: 9715 bytes --]

/* new_timerfd_errors_test.c

   Copyright (C) 2007 Google
   Author: Michael Kerrisk <mtk@google.com>

   Run various tests against the revised timerfd() implementation.
*/
/* Link with -lrt */

#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>
#include <time.h>
#if defined(__i386__)
#define __NR_timerfd 322
#endif

static int
timerfd(int ufd, int clockid, int flags, struct itimerspec *utmr,
        struct itimerspec *outmr)
{
    return syscall(__NR_timerfd, ufd, clockid, flags, utmr, outmr);
}

#define TFD_TIMER_ABSTIME (1 << 0)

/**********************************************************************/

// #include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <errno.h>

/*
 * die() is for code that unexpectedly fails.
 * test_failed() is for tests where we didn't get a failure when we
 * should have.
*/
#define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

static int failures = 0;

#define test_failed(msg) do { \
        fprintf(stderr, "TEST FAILED, because: %s\n", msg); \
        failures++; \
    } while (0);


#define MILLISEC 1000000        /* Expressed in nanonsecs */

static int
make_new_timerfd(int clockid, int flags, int vsecs, int vnsecs,
        int isecs, int insecs)
{
    struct itimerspec utmr;

    utmr.it_value.tv_sec = vsecs;
    utmr.it_value.tv_nsec = vnsecs;
    utmr.it_interval.tv_sec = isecs;
    utmr.it_interval.tv_nsec = insecs;

    return timerfd(-1, clockid, flags, &utmr, NULL);
} /* make_new_timerfd */

int
main(int argc, char *argv[])
{
    struct itimerspec utmr, outmr;
    int ufd, ret;
    struct timespec now;
    uint64_t ticks;
    int j;

    /*
     * All of the following timer creations should succeed.
     */ 
    ufd = make_new_timerfd(CLOCK_REALTIME, 0, 1, 0, 1, 0);
    if (ufd == -1)
        die("timerfd-REALTIME-flags=0");

    if (clock_gettime(CLOCK_REALTIME, &now) == -1)
        die("clock_gettime");
    ufd = make_new_timerfd(CLOCK_REALTIME, TFD_TIMER_ABSTIME, 
                now.tv_sec + 1, now.tv_nsec, 1, 0);
    if (ufd == -1)
        die("timerfd-REALTIME-ABSTIME");

    ufd = make_new_timerfd(CLOCK_MONOTONIC, 0, 1, 0, 1, 0);
    if (ufd == -1)
        die("timerfd-MONOTONIC-flags=0");

    if (clock_gettime(CLOCK_MONOTONIC, &now) == -1)
        die("clock_gettime");
    ufd = make_new_timerfd(CLOCK_MONOTONIC, TFD_TIMER_ABSTIME, 
                now.tv_sec + 1, now.tv_nsec, 1, 0);
    if (ufd == -1)
        die("timerfd-MONOTONIC-ABSTIME");

    ufd = make_new_timerfd(CLOCK_REALTIME, 0, 0, 0, 0, 0);
    if (ufd == -1)
        die("timerfd-MONOTONIC-zero-timer");

    /*
     * Some argument values should not be permitted.
     */
    ufd = make_new_timerfd(-1, 0, 1, 0, 1, 0);
    if (!(ufd == -1 && errno == EINVAL))
        test_failed("allowed-negative-CLOCKID-for-new-timer");

    ufd = make_new_timerfd(99, 0, 1, 0, 1, 0);
    if (!(ufd == -1 && errno == EINVAL))
        test_failed("allowed-bad-clockid");

    ufd = make_new_timerfd(CLOCK_MONOTONIC, 0xff, 1, 0, 1, 0);
    if (!(ufd == -1 && errno == EINVAL))
        test_failed("allowed-invalid-flags");

    /*
     * Non-canonical utmr values should be disallowed.
     */
    ufd = make_new_timerfd(CLOCK_MONOTONIC, 0, -1, 0, 1, 0);
    if (!(ufd == -1 && errno == EINVAL))
        test_failed("allowed-negative-it_value.tv_secs");

    ufd = make_new_timerfd(CLOCK_MONOTONIC, 0, 1, -1, 1, 0);
    if (!(ufd == -1 && errno == EINVAL))
        test_failed("allowed-negative-it_value.tv_nsecs");

    ufd = make_new_timerfd(CLOCK_MONOTONIC, 0, 1, 0, -1, 0);
    if (!(ufd == -1 && errno == EINVAL))
        test_failed("allowed-negative-it_interval.tv_secs");

    ufd = make_new_timerfd(CLOCK_MONOTONIC, 0, 1, 0, 1, -1);
    if (!(ufd == -1 && errno == EINVAL))
        test_failed("allowed-negative-it_interval.tv_nsecs");

    ufd = make_new_timerfd(CLOCK_MONOTONIC, 0, 1, 2000000000, 1, 0);
    if (!(ufd == -1 && errno == EINVAL))
        test_failed("allowed-non-canonical-it_value.tv_nsecs");

    ufd = make_new_timerfd(CLOCK_MONOTONIC, 0, 1, 0, 1, 2000000000);
    if (!(ufd == -1 && errno == EINVAL))
        test_failed("allowed-non-canonical-it_interval.tv_nsecs");

    /*
     * At least one of utmr and outmr must be non-NULL.
     */

    ufd = timerfd(-1, CLOCK_MONOTONIC, 0, NULL, NULL);
    if (!(ufd == -1 && errno == EINVAL))
        test_failed("allowed-NULL-utmtr-plus-NULL-outmr");

    /*
     * When creating a new timer, a non-NULL outmr is meaningless
     * and disallowed.
     */
    utmr.it_value.tv_sec = 10;
    utmr.it_value.tv_nsec = 1;
    utmr.it_interval.tv_sec = 10;
    utmr.it_interval.tv_nsec = 1;
    ufd = timerfd(-1, CLOCK_MONOTONIC, 0, &utmr, &outmr);
    if (!(ufd == -1 && errno == EINVAL))
        test_failed("allowed-NULL-outmr-for-new-timer");

    /*******************************************************************
     * Various argument combinations are disallowed when modifying an
     * existing timer.
     */
    ufd = timerfd(-1, CLOCK_REALTIME, 0, &utmr, NULL);
    if (ufd == -1)
        die("timerfd: modify setup");

    /*
     * A clockid cannot be changed for an existing timer;
     * therefore it is required to be -1.  (Note: We could not
     * require it to be zero, because CLOCK_REALTIME == 0.)
     */
    ret = timerfd(ufd, 0, 0, &utmr, NULL);
    if (!(ret == -1 && errno == EINVAL))
        test_failed("allowed-clockid-not-neg-1-clockid-for-timer-mod");

    /*
     * If ufd >= 0 and utmr is NULL, then we are retrieving the setting
     * of an existing timer; in this case a non-zero flags value is
     * meaningless and disallowed.
     */
    ret = timerfd(ufd, -1, TFD_TIMER_ABSTIME, NULL, &outmr);
    if (!(ret == -1 && errno == EINVAL))
        test_failed("allowed-non-zero-flags-with-NULL-utmr");

    /*
     * If ufd is not -1, then it must be a valid timerfd file descriptor.
     */
    ret = timerfd(STDOUT_FILENO, -1, 0, &utmr, NULL);
    if (!(ret == -1 && errno == EINVAL))
        test_failed("allowed-non-timerfd-file-descriptor");

    ret = timerfd(99, -1, 0, &utmr, NULL);
    if (!(ret == -1 && errno == EBADF))
        test_failed("allowed-bad-file-descriptor");

    printf("Completed argument tests\n");

    /*
     * Test operation of a single shot timer.
     */
    ufd = make_new_timerfd(CLOCK_REALTIME, 0, 0, 500 * MILLISEC, 0, 0);
    if (ufd == -1)
        die("timerfd-MONOTONIC-ABSTIME");

    if (read(ufd, &ticks, sizeof(ticks)) != sizeof(ticks))
        die("read");

    if (ticks != 1)
        test_failed("ticks-was-not-1")

    /*
     * The following timer should return an overrun count of 5
     * for the read().  The fact that make multiple timerfd()
     * calls to retrieve the current timer settings should not
     * affect the overrun count.
     */
    utmr.it_value.tv_sec = 1;
    utmr.it_value.tv_nsec = 0;
    utmr.it_interval.tv_sec = 2;
    utmr.it_interval.tv_nsec = 0;
    printf("\nMeasuring timer with interval %ld, frequency %ld, "
            "for 10 seconds\n", (long) utmr.it_value.tv_sec,
            (long) utmr.it_interval.tv_sec);
    ufd = timerfd(-1, CLOCK_REALTIME, 0, &utmr, NULL);
    if (ufd == -1)
        die("timerfd-MONOTONIC-ABSTIME");

    for (j = 0; j < 10; j++) {
        sleep(1);
        ret = timerfd(ufd, -1, 0, NULL, &outmr);
        printf("%d - Got: %ld %9ld\n", j,
                (long) outmr.it_value.tv_sec,
                (long) outmr.it_value.tv_nsec);
    }

    if (read(ufd, &ticks, sizeof(ticks)) != sizeof(ticks))
        die("read");

    printf("Read (expiration count): %lld\n", ticks);
    if (ticks != 5)
        test_failed("ticks-was-not-5");

    /*
     * Modify the settings of the previous timer.
     * The following timer should return an overrun count of 3
     * for the read().  Making multiple timerfd() calls
     * to retrieve the current timer settings should not
     * affect the overrun count.
     */
    utmr.it_value.tv_sec = 1;
    utmr.it_value.tv_nsec = 0;
    utmr.it_interval.tv_sec = 4;
    utmr.it_interval.tv_nsec = 0;
    printf("\nMeasuring timer with interval %ld, frequency %ld, "
            "for 10 seconds\n", (long) utmr.it_value.tv_sec,
            (long) utmr.it_interval.tv_sec);
    ret = timerfd(ufd, -1, 0, &utmr, &outmr);
    if (ret == -1)
        die("timerfd-MONOTONIC-ABSTIME");

    for (j = 0; j < 10; j++) {
        sleep(1);
        ret = timerfd(ufd, -1, 0, NULL, &outmr);
        printf("%d - Got: %ld %9ld\n", j,
                (long) outmr.it_value.tv_sec,
                (long) outmr.it_value.tv_nsec);
    }

    if (read(ufd, &ticks, sizeof(ticks)) != sizeof(ticks))
        die("read");

    printf("Read (expiration count): %lld\n", ticks);
    if (ticks != 3)
        test_failed("ticks-was-not-3");

    /*
     * If we set an interval timer to expire in the past, then it
     * the number of overruns should correspond to how far in the
     * past the timer was set.
     */
#define NEG_SECS 20
    if (clock_gettime(CLOCK_REALTIME, &now) == -1)
        die("clock_gettime");
    ufd = make_new_timerfd(CLOCK_REALTIME, TFD_TIMER_ABSTIME, 
                now.tv_sec - NEG_SECS, now.tv_nsec, 1, 0);
    if (ufd == -1)
        die("timerfd-REALTIME-ABSTIME");

    if (read(ufd, &ticks, sizeof(ticks)) != sizeof(ticks))
        die("read");

    printf("Read from already expired timer: %lld\n", ticks);
    if (ticks != (1 + NEG_SECS))
        test_failed("wrong-value-read-from-past-timer");

    if (failures == 0) 
        printf("All tests passed successfully\n");
    else
        printf("%d tests failed\n", failures);
    exit((failures == 0) ? EXIT_SUCCESS : EXIT_FAILURE);
}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Revised timerfd() interface
  2007-08-09 21:11         ` [PATCH] Revised timerfd() interface Michael Kerrisk
@ 2007-08-13 23:34           ` Randy Dunlap
  2007-08-15 14:40           ` Jonathan Corbet
  1 sibling, 0 replies; 16+ messages in thread
From: Randy Dunlap @ 2007-08-13 23:34 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: Andrew Morton, Thomas Gleixner, lkml, Linux Torvalds,
	Davide Libenzi, drepper, stable, Christoph Hellwig,
	Jan Engelhardt, mtk

On Thu, 09 Aug 2007 23:11:45 +0200 Michael Kerrisk wrote:

> Andrew,
> 
> Here's my first shot at changing the timerfd() interface; patch
> is against 2.6.23-rc1-mm2.
> 
> I think I got to the bottom of understanding the timer code in
> the end, but I may still have missed some things...
> 
> This is my first kernel patch of any substance.  Perhaps Randy
> or Christoph can give my toes a light toasting for the various
> boobs I've likely made.
> 
> Thomas, would you please look over this patch to verify my
> timer code?
> 
> Davide: would you please bless this interface change ;-).
> It makes it possible to:
> 
> 1) Fetch the previous timer settings (i.e., interval, and time
>    remaining until next expiration) while installing new settings
> 
> 2) Retrieve the settings of an existing timer without changing
>    the timer.
> 
> Both of these features are supported by the older Unix timer APIs
> (setitimer/getitimer and timer_create/timer_settime/timer_gettime).

Hi,

OK, I'll make a few comments.

1.  Provide a full changelog, including reasons why the patch is
needed.  Don't assume that we have read the previous email threads.
(We haven't. :)


> The changes are as follows:
> 
> a) A further argument, outmr, can be used to retrieve the interval
>    and time remaining before the expiration of a previously created
>    timer.  Thus the call signature is now:
> 
>    timerfd(int utmr, int clockid, int flags,

                 ufd,

>            struct itimerspec *utmr,
>            struct itimerspec *outmr)
> 
>    The outmr argument:
> 
>    1) must be NULL for a new timer (ufd == -1).
>    2) can be NULL if we are updating an existing
>       timer (i.e., utmr != NULL), but we are not
>       interested in the previous timer value.
>    3) can be used to retrieve the time until next timer
>       expiration, without changing the timer.
>       (In this case, utmr == NULL and ufd >= 0.)
> 
> b) Make the 'clockid' immutable: it can only be set
>    if 'ufd' is -1 -- that is, on the timerfd() call that
>    first creates a timer.  (This is effectively
>    the limitation that is imposed by POSIX timers: the
>    clockid is specified when the timer is created with
>    timer_create(), and can't be changed.)
> 
>    [In the 2.6.22 interface, the clockid of an existing
>    timer can be changed with a further call to timerfd()
>    that specifies the file descriptor of an existing timer.]
> 
> The use cases are as follows:
> 
> ufd = timerfd(-1, clockid, flags, utmr, NULL);
> to create a new timer with given clockid, flags, and utmr (initial
> expiration + interval).  outmr must be NULL.
> 
> ufd = timerfd(ufd, -1, flags, utmr, NULL);
> to change the flags and timer settings of an existing timer.
> (The clockid argument serves no purpose when modifying an
> existing timer, and as I've implemented things, the argument
> must be specified as -1 for an existing timer.)
> 
> ufd = timerfd(ufd, -1, flags, utmr, &outmr);
> to change the flags and timer settings of an existing timer, and
> retrieve the time until the next expiration of the timer (and
> the associated interval).
> 
> ufd = timerfd(ufd, -1, 0, NULL, &outmr);
> Return the time until the next expiration of the timer (and the
> associated interval), without changing the existing timer settings.
> The flags argument has no meaning in this case, and must be
> specified as 0.
> 
> Other changes:
> 
> * I've added checks for various combinations of arguments values
>   that could be deemed invalid; there is probably room for debate
>   about whether we want all of these checks.  Comments in the
>   patch describe these checks.
> 
> * Appropriate, and possibly even correct, changes made in fs/compat.c.
> 
> * Jan Engelhardt noted that a cast could be removed from Davide's
>   original code; I've removed it.
> 
> The changes are correct as far as I've tested them.  I've attached a
> test program.  Davide, could you subject this to further test please?
> 
> I'm off on holiday the day after tomorrow, back in two weeks, with
> very limited computer access.  I may have time for another pass of
> the code if comments come in over(euro)night, otherwise Davide may
> be willing to clean up whatever I messed up...
> 
> Cheers,
> 
> Michael
> 
> diff -ruN linux-2.6.23-rc1-mm2-orig/fs/compat.c linux-2.6.23-rc1-mm2/fs/compat.c
> --- linux-2.6.23-rc1-mm2-orig/fs/compat.c	2007-08-05 10:43:30.000000000 +0200
> +++ linux-2.6.23-rc1-mm2/fs/compat.c	2007-08-07 22:08:15.000000000 +0200
> @@ -2211,18 +2211,38 @@
>  #ifdef CONFIG_TIMERFD
> 
>  asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags,
> -				   const struct compat_itimerspec __user *utmr)
> +			 const struct compat_itimerspec __user *utmr,
> +			 struct compat_itimerspec __user *old_utmr)
>  {
>  	struct itimerspec t;
>  	struct itimerspec __user *ut;
> +	long ret;
> 
> -	if (get_compat_itimerspec(&t, utmr))
> -		return -EFAULT;
> -	ut = compat_alloc_user_space(sizeof(*ut));
> -	if (copy_to_user(ut, &t, sizeof(t)))
> -		return -EFAULT;
> +	/*:mtk: Framework taken from compat_sys_mq_getsetattr() */

Drop the :mtk: string(s).

> -	return sys_timerfd(ufd, clockid, flags, ut);
> +	ut = compat_alloc_user_space(2 * sizeof(*ut));
> +
> +	if (utmr) {
> +		if (get_compat_itimerspec(&t, utmr))
> +			return -EFAULT;
> +		if (copy_to_user(ut, &t, sizeof(t)))
> +			return -EFAULT;
> +	}
> +
> +	ret = sys_timerfd(ufd, clockid, flags,
> +			utmr ? ut : NULL,
> +			old_utmr ? ut + 1 : NULL);
> +
> +	if (ret)
> +		return ret;
> +
> +	if (old_utmr) {
> +	    	if (copy_from_user(&t, old_ut, sizeof(t)) ||

I can't find <old_ut> anywhere.  Should be old_utmr I guess.

> +		    put_compat_itimerspec(&t, old_utmr))

		    put_compat_itimerspec(old_utmr, &t))

IOW, I think that the parameters are reversed (at least this way
their types match the function prototype).

> +			return -EFAULT;
> +	}
> +
> +	return 0;
>  }
> 
>  #endif /* CONFIG_TIMERFD */
> diff -ruN linux-2.6.23-rc1-mm2-orig/fs/timerfd.c linux-2.6.23-rc1-mm2/fs/timerfd.c
> --- linux-2.6.23-rc1-mm2-orig/fs/timerfd.c	2007-08-05 10:44:24.000000000 +0200
> +++ linux-2.6.23-rc1-mm2/fs/timerfd.c	2007-08-09 22:11:25.000000000 +0200
> @@ -2,6 +2,8 @@
>   *  fs/timerfd.c
>   *
>   *  Copyright (C) 2007  Davide Libenzi <davidel@xmailserver.org>
> + *  and Copyright (C) 2007 Google, Inc.
> + *      (Author: Michael Kerrisk <mtk@google.com>)
>   *
>   *
>   *  Thanks to Thomas Gleixner for code reviews and useful comments.
> @@ -26,6 +28,12 @@
>  	ktime_t tintv;
>  	wait_queue_head_t wqh;
>  	int expired;
> +	int clockid;	/* Established when timer is created, then
> +			   immutable */
> +	int overrun;	/* Number of overruns for this timer so far,
> +			   updated on calls to timerfd() that retrieve
> +			   settings of an existing timer, reset to zero
> +			   by timerfd_read() */
>  };
> 
>  /*
> @@ -61,6 +69,7 @@
>  	hrtimer_init(&ctx->tmr, clockid, htmode);
>  	ctx->tmr.expires = texp;
>  	ctx->tmr.function = timerfd_tmrproc;
> +	ctx->overrun = 0;
>  	if (texp.tv64 != 0)
>  		hrtimer_start(&ctx->tmr, texp, htmode);
>  }
> @@ -130,10 +139,11 @@
>  			 * callback to avoid DoS attacks specifying a very
>  			 * short timer period.
>  			 */
> -			ticks = (u64)
> +			ticks = ctx->overrun +
>  				hrtimer_forward(&ctx->tmr,
>  						hrtimer_cb_get_time(&ctx->tmr),
>  						ctx->tintv);
> +			ctx->overrun = 0;
>  			hrtimer_restart(&ctx->tmr);
>  		} else
>  			ticks = 1;
> @@ -151,24 +161,69 @@
>  };
> 
>  asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
> -			    const struct itimerspec __user *utmr)
> +			const struct itimerspec __user *utmr,
> +			struct itimerspec __user *old_utmr)
>  {
>  	int error;
>  	struct timerfd_ctx *ctx;
>  	struct file *file;
>  	struct inode *inode;
> -	struct itimerspec ktmr;
> +	struct itimerspec ktmr, oktmr;
> 
> -	if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
> -		return -EFAULT;
> +	/*
> +	 * Not sure if we really need the first check below -- it
> +	 * relies on all clocks having a non-negative value.  That's
> +	 * currently true, but I'm not sure that it is anywhere
> +	 * documented as a requirement.
> +	 * (The point of the check is that clockid has no meaning if
> +	 * we are changing the settings of an existing timer
> +	 * (i.e., ufd >= 0), so let's require it to be an otherwise
> +	 * unused value.)
> +	 */
> 
> -	if (clockid != CLOCK_MONOTONIC &&
> -	    clockid != CLOCK_REALTIME)
> +	if (ufd >= 0) {
> +		if (clockid != -1)
> +			return -EINVAL;
> +	} else if (clockid != CLOCK_MONOTONIC && clockid != CLOCK_REALTIME)
> +		return -EINVAL;
> +
> +	/*
> +	 * Disallow any other bits in flags than those we implement.
> +	 */
> +	if ((flags & ~(TFD_TIMER_ABSTIME)) != 0)
> +		return -EINVAL;
> +
> +	/*
> +         * Not sure if this check is strictly necessary, except to stop

Use tab+space instead of spaces.

> +	 * userspace trying to do something silly.  Flags only has meaning
> +	 * if we are creating/modifying a timer.
> +	 */
> +	if (utmr == NULL && flags != 0)
>  		return -EINVAL;
> -	if (!timespec_valid(&ktmr.it_value) ||
> -	    !timespec_valid(&ktmr.it_interval))
> +
> +	/*
> +	 * If we are creating a new timer, then we must supply the setting
> +	 * and there is no previous timer setting to retrieve.
> +	 */
> +	if (ufd == -1 && (utmr == NULL || old_utmr != NULL))
> +		return -EINVAL;
> +	
> +	/*
> +	 * Caller must provide a new timer setting, or ask for previous
> +	 * setting, or both.
> +	 */
> +	if (utmr == NULL && old_utmr == NULL)
>  		return -EINVAL;
> 
> +	if (utmr) {
> +		if (copy_from_user(&ktmr, utmr, sizeof(ktmr)))
> +			return -EFAULT;
> +
> +		if (!timespec_valid(&ktmr.it_value) ||
> +		    !timespec_valid(&ktmr.it_interval))
> +			return -EINVAL;
> +	}
> +
>  	if (ufd == -1) {
>  		ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
>  		if (!ctx)
> @@ -176,6 +231,11 @@
> 
>  		init_waitqueue_head(&ctx->wqh);
> 
> +		/*
> +		 * Save the clockid since we need it later if we modify
> +		 * the timer.
> +		 */
> +		ctx->clockid = clockid;
>  		timerfd_setup(ctx, clockid, flags, &ktmr);
> 
>  		/*
> @@ -195,23 +255,61 @@
>  			fput(file);
>  			return -EINVAL;
>  		}
> -		/*
> -		 * We need to stop the existing timer before reprogramming
> -		 * it to the new values.
> -		 */
> -		for (;;) {
> -			spin_lock_irq(&ctx->wqh.lock);
> -			if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
> -				break;
> -			spin_unlock_irq(&ctx->wqh.lock);
> -			cpu_relax();
> +
> +		if (old_utmr) {
> +
> +			/*
> +			 * Retrieve interval and time remaining before
> +			 * next expiration of timer.
> +			 */
> +
> +			struct hrtimer *timer = &ctx->tmr;
> +			ktime_t iv = ctx->tintv;
> +			ktime_t now, remaining;
> +
> +			clockid = ctx->clockid;
> +
> +			memset(&oktmr, 0, sizeof(struct itimerspec));
> +			if (iv.tv64)
> +				oktmr.it_interval = ktime_to_timespec(iv);
> +
> +			now = timer->base->get_time();
> +			ctx->overrun += hrtimer_forward(&ctx->tmr,
> +					hrtimer_cb_get_time(&ctx->tmr),
> +					ctx->tintv);
> +			remaining = ktime_sub(timer->expires, now);
> +			if (remaining.tv64 <= 0)
> +				oktmr.it_value.tv_nsec = 1;
> +			else
> +				oktmr.it_value = ktime_to_timespec(remaining);
> +
> +			if (copy_to_user(old_utmr, &oktmr, sizeof (oktmr))) {
> +				fput(file);
> +				return -EFAULT;
> +			}
>  		}
> -		/*
> -		 * Re-program the timer to the new value ...
> -		 */
> -		timerfd_setup(ctx, clockid, flags, &ktmr);
> 
> -		spin_unlock_irq(&ctx->wqh.lock);
> +		if (utmr) {
> +			/*
> +			 * Modify settings of existing timer.
> +			 * We need to stop the existing timer before
> +			 * reprogramming it to the new values.
> +			 */
> +			for (;;) {
> +				spin_lock_irq(&ctx->wqh.lock);
> +				if (hrtimer_try_to_cancel(&ctx->tmr) >= 0)
> +					break;
> +				spin_unlock_irq(&ctx->wqh.lock);
> +				cpu_relax();
> +			}
> +
> +			/*
> +			 * Re-program the timer to the new value ...
> +			 */
> +			timerfd_setup(ctx, ctx->clockid, flags, &ktmr);
> +
> +			spin_unlock_irq(&ctx->wqh.lock);
> +		}
>  		fput(file);
>  	}
> 
> @@ -222,4 +320,3 @@
>  	kfree(ctx);
>  	return error;
>  }
> -
> diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/compat.h linux-2.6.23-rc1-mm2/include/linux/compat.h
> --- linux-2.6.23-rc1-mm2-orig/include/linux/compat.h	2007-07-09 01:32:17.000000000 +0200
> +++ linux-2.6.23-rc1-mm2/include/linux/compat.h	2007-08-05 17:46:33.000000000 +0200
> @@ -265,7 +265,8 @@
>  				const compat_sigset_t __user *sigmask,
>                                  compat_size_t sigsetsize);
>  asmlinkage long compat_sys_timerfd(int ufd, int clockid, int flags,
> -				const struct compat_itimerspec __user *utmr);
> +			const struct compat_itimerspec __user *utmr,
> +			struct compat_itimerspec __user *old_utmr);
> 
>  #endif /* CONFIG_COMPAT */
>  #endif /* _LINUX_COMPAT_H */
> diff -ruN linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h linux-2.6.23-rc1-mm2/include/linux/syscalls.h
> --- linux-2.6.23-rc1-mm2-orig/include/linux/syscalls.h	2007-08-05 10:44:32.000000000 +0200
> +++ linux-2.6.23-rc1-mm2/include/linux/syscalls.h	2007-08-05 17:47:15.000000000 +0200
> @@ -608,7 +608,8 @@
>  asmlinkage long sys_getcpu(unsigned __user *cpu, unsigned __user *node, struct getcpu_cache __user *cache);
>  asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t sizemask);
>  asmlinkage long sys_timerfd(int ufd, int clockid, int flags,
> -			    const struct itimerspec __user *utmr);
> +			const struct itimerspec __user *utmr,
> +			struct itimerspec __user *old_utmr);
>  asmlinkage long sys_eventfd(unsigned int count);
>  asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len);


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] Revised timerfd() interface
  2007-08-09 21:11         ` [PATCH] Revised timerfd() interface Michael Kerrisk
  2007-08-13 23:34           ` Randy Dunlap
@ 2007-08-15 14:40           ` Jonathan Corbet
  1 sibling, 0 replies; 16+ messages in thread
From: Jonathan Corbet @ 2007-08-15 14:40 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: lkml, Linux Torvalds, Davide Libenzi, drepper, stable,
	Christoph Hellwig, Randy Dunlap, Jan Engelhardt, Andrew Morton,
	Thomas Gleixner

Sorry for the late commentary...as I looked this over, one thing popped
into my mind....

> b) Make the 'clockid' immutable: it can only be set
>    if 'ufd' is -1 -- that is, on the timerfd() call that
>    first creates a timer.

timerfd() is looking increasingly like a multiplexor system call in
disguise.  There is no form of invocation which uses all of the
arguments.  Are we sure we wouldn't rather have something like:

	timerfd_open(clock);
	timerfd_adjust(fd, new_time, old_time);

?

jon

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2007-08-15 14:40 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-23  6:32 Problems with timerfd() Michael Kerrisk
2007-07-23  6:38 ` Andrew Morton
2007-07-23  6:42   ` Andrew Morton
2007-07-23  8:02     ` Michael Kerrisk
2007-07-25 18:18   ` Michael Kerrisk
2007-07-25 22:12     ` Andrew Morton
2007-08-07  6:55       ` Michael Kerrisk
2007-08-07  7:36         ` Andrew Morton
2007-08-07  9:14           ` Michael Kerrisk
2007-08-09 21:11         ` [PATCH] Revised timerfd() interface Michael Kerrisk
2007-08-13 23:34           ` Randy Dunlap
2007-08-15 14:40           ` Jonathan Corbet
2007-07-23 16:55 ` Problems with timerfd() Ray Lee
2007-07-24  7:40   ` Michael Kerrisk
2007-07-24 15:22     ` Ray Lee
2007-07-24 15:56       ` Michael Kerrisk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox