public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* RFC: A revised timerfd API
@ 2007-09-18  7:27 Michael Kerrisk
  2007-09-18  7:30 ` Michael Kerrisk
  2007-09-18 16:51 ` Davide Libenzi
  0 siblings, 2 replies; 26+ messages in thread
From: Michael Kerrisk @ 2007-09-18  7:27 UTC (permalink / raw)
  To: Davide Libenzi
  Cc: Ulrich Drepper, geoff, lkml, Andrew Morton, Thomas Gleixner,
	Christoph Hellwig, Jonathan Corbet, Randy Dunlap, vda.linux,
	Linus Torvalds, Lee Schermerhorn

After my earlier mail (full thread here:
http://thread.gmane.org/gmane.linux.kernel/574430/focus=579368 )
it seems that some people agree we should give a bit more
thought to how a final timerfd interface should look.

Davide's original API and the limitations that I see in it,
are described in the message cited above.  In that message,
I proposed three alternatives (and I'm going to add a fourth
now) that provide "get-while-set" and "non-destructive-get"
functionality.  Before trying to implement anything I'd like to
get input on the various possible designs.

The four designs are:

a) A multiplexing timerfd() system call.
b) Creating three syscalls analogous to the POSIX timers API (i.e.,
   timerfd_create/timerfd_settime/timerfd_gettime).
c) Creating a simplified timerfd() system call that is integrated
   with the POSIX timers API.
d) Extending the POSIX timers API to support the timerfd concept.

My order of preference for the implementations is currently
something like (in descending order): d, b, c, a.

The details follow:

====> a) Add an argument (a multiplexing timerfd() system call)

In an earlier mail (http://thread.gmane.org/gmane.linux.kernel/559193 ).
I proposed adding a further argument to timerfd(): old_utmr, which could
be used to return the time remaining until expiry for an existing timer
The proposed semantics that would allow get and get-while-setting
functionality.

Advantages:
  1. Provides the desired get and get-while-setting functionality.
  2. It's a simple interface (a single system call).

Disadvantage:
Jon Corbet pointed out
(http://thread.gmane.org/gmane.linux.kernel/559193/focus=570709 )
that this interface was starting to look like a multiplexing syscall,
because there is no case where all of the arguments are used (see
the use-case descriptions in the earlier mail).

I'm inclined to agree with Jon; therefore one of the remaining
solutions may be preferable

====> b) Create a timerfd interface analogous to POSIX timers

Create an interface analogous to POSIX timers:

fd = timerfd_create(clockid, flags);

timerfd_settime(fd, flags, newtimervalue, &time_to_next_expire);

timerfd_gettime(fd, &time_to_next_expire);

Under this proposal, the manner of making a timer that does not
need "get-while-set" functionality remains fairly simple:

    fd = timerfd_create(clockid);

    timerfd_settime(fd, flags, newtimervalue, NULL);

Advantage: this would be a clean, fully functional API, and well
understood by virtue of its analogy with the POSIX timers API.

Disadvantage: 3 new system calls, rather than 1.

This solution would be sufficient, IMO, but one of the
next solutions might be better.

====> c) Integrate timerfd with POSIX timers

Make a very simple timerfd call that is integrated with the
POSIX timers API.  The POSIX timers API is detailed here:
http://linux.die.net/man/3/timer_create
http://linux.die.net/man/3/timer_settime

Under the POSIX timers API, a new timer is created using:

int timer_create(clockid_t clockid, struct sigevent *evp,
        timer_t *timerid);

We could then have a timerfd() call that returns a file descriptor
for the newly created 'timerid':

fd = timerfd(timer_t timerid);

We could then use the POSIX timers API to operate on the timer
(start it / modify it / fetch timer value):

int timer_settime(timer_t timerid, int flags,
        const struct itimerspec *value,
        struct itimerspec *ovalue);
int timer_gettime(timer_t timerid, struct itimerspec *value);

And then read() from 'fd' as before.

In the simple case (no "get" or "get-while-setting" functionality),
the use of API (c) would be:

    timer_create(clockid, &evp, &timerid);

    fd = timerfd(timerid);

    timer_settime(timerid, flags, &newvalue, NULL));

Advantages:
  1. Integration with an existing API.
  2. Adds just a single system call.
  3. It _might_ be possible to construct an interface that allows
     userland programs to do things like creating a timer fd for
     a POSIX timer that was created via some library that doesn't
     actually know about timer fds.  (I can already see problems with
     this, since that library will already expect to be delivering
     timer notifications somehow (via threads or signals), and it may
     be difficult to make the two notification mechanisms play
     together in a sane way.  But maybe someone else has a take on
     this that can rescue this idea.)

Disadvantages:
  1. Starts to get a little more clunky to use in the simple
     case shown above.

This strikes me as a more attractive solution than (b), if we can do
it properly -- that means: if we can achieve advantage 3
in some reasonable way.  If we can't achieve that, then probably
the next solution is better.

====> d) extend the POSIX timers API

Under the POSIX timers API, the evp argument of timer_create() is a
structure that allows the caller to specify how timer expirations
should be notified.  There are the following possibilities
(differentiated by the value assigned to evp.sigev_notify):

  i) notify via a signal: the caller specifies which signal the
     kernel should deliver when the timer expires.
     (SIGEV_SIGNAL)
 ii) notify by delivering a signal to the thread whose thread ID
     is specified in evp.  (This is Linux specific.)
     (SIGEV_THREAD_ID)
iii) notify via a thread: when the timer expires, the system starts
     a new thread which receives an argument that was specified in
     the evp structure. (SIGEV_THREAD)
 iv) no notification: the caller can monitor the timer state using
     timer_gettime(). (SIGEV_NONE)

In all of the above cases, the return value from timer_create()
is 0 for success, or -1 for failure.

We could extend the interface as follows:

1) Add a new flag for evp.sigev_notify: SIGEV_TIMERFD.
   This flag indicates that the caller wants timer
   notification via a file descriptor.
2) Whenevp.sigev_notify == SIGEV_TIMERFD, have a successful
   timer_create() call return a file descriptor (i.e., an
   integer >= 0).

Advantages:
  1. Integration with an existing API.
  2. No new system calls are required.
  3. This idea might even have a chance of getting standardized in
     POSIX one day, since (IMO) it integrates fairly cleanly with
     an existing API.

Disadvantages:
  1. The fact that the return value of a successful timer_create()
     is different for the SIGEV_TIMERFD case is a bit of a wart.

=====

That's it.  I welcome people's thoughts on these designs.

Cheers,

Michael

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2007-10-01 16:22 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-09-18  7:27 RFC: A revised timerfd API Michael Kerrisk
2007-09-18  7:30 ` Michael Kerrisk
2007-09-18  8:05   ` David Härdeman
2007-09-18  9:01     ` Michael Kerrisk
2007-09-18  9:27       ` Thomas Gleixner
2007-09-18  9:10   ` Thomas Gleixner
2007-09-18  9:30     ` Michael Kerrisk
2007-09-18  9:42       ` Thomas Gleixner
2007-09-18 11:08         ` Michael Kerrisk
2007-09-18 11:30           ` Thomas Gleixner
2007-09-18 13:13             ` David Härdeman
2007-09-22 13:03               ` Michael Kerrisk
2007-09-18 16:51 ` Davide Libenzi
2007-09-22 13:12   ` Michael Kerrisk
2007-09-22 14:32     ` Bernd Eckenfels
2007-09-22 16:07       ` Michael Kerrisk
2007-09-22 17:05         ` Thomas Gleixner
2007-09-22 23:37         ` David Härdeman
2007-09-22 17:10     ` Thomas Gleixner
2007-09-22 21:07     ` Davide Libenzi
2007-09-22 21:26       ` Thomas Gleixner
2007-09-22 23:21         ` Davide Libenzi
2007-09-23 17:33       ` Michael Kerrisk
2007-09-23 18:33         ` Davide Libenzi
2007-09-23 18:41           ` Davide Libenzi
2007-09-23 19:03             ` Michael Kerrisk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox