public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jamie Lokier <lk@tantalophile.demon.co.uk>
To: Davide Libenzi <davidel@xmailserver.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-aio@kvack.org, lse-tech@lists.sourceforge.net,
	Linus Torvalds <torvalds@transmeta.com>,
	Andrew Morton <akpm@digeo.com>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>
Subject: Re: Unifying epoll,aio,futexes etc. (What I really want from epoll)
Date: Fri, 1 Nov 2002 02:01:19 +0000	[thread overview]
Message-ID: <20021101020119.GC30865@bjl1.asuk.net> (raw)
In-Reply-To: <Pine.LNX.4.44.0210311642300.1562-100000@blue1.dev.mcafeelabs.com>

Davide Libenzi wrote:
> Jamie, the futex support can be easily done with one line of code patch. I
> still prefer the one-to-one mapping between futexes and files. It makes
> everything easier.

I do agree it is very simple and hence good.

> I don't really see futex creation/destroy as an high frequency event
> that might be suitable for optimization. Usually you have your own
> set of resources to be "protected" and in 95% of cases you know
> those resources from the beginning.

Well, I'll disagree but stay mostly quiet.  I think it is reasonable
to have a futex per _object_ in certain language run-times.
Allocation rate: 10,000,000 per second in some examples (f.e. certain
kinds of threaded simulator).

Hardly any of those will need associated fds, and I have no figures on
how many or how often, but you can see that futexes are sometimes used
in a very dynamic way because they are so cheap until contention.

That's the cool thing about futexes: there's absolutely zero kernel
overhead until contention, and only one "long" of overhead in user
space.

At contention, two syscalls resolves it synchronously: futex_wait,
futex_wake.  The async method using an fd with epoll takes five:
futex_fd, epoll_ctl, poll, futex_wake, futex_close.  That works, but
lacks the _cool_ factor that futexes have IMHO.  It should be:
futex_wait_async, futex_wake.

I realise my argument is a weak one though :)

> > > Timer, as long as you access them through a file* interface ( like futexes )
> > > will become trivial too. Another line should be sufficent for dnotify :
> >
> > Sorry (<humble/>), ignore timers.  Somehow I picked up the idea that
> > epoll_wait() didn't have a timeout from some example or other, which
> > was very silly of me.  I've read the patch properly now!  Of course
> > epoll supports timers - a timeout is quite enough for user space.
> 
> If you want to timeout I/O operations you can easily put a timer routine
> in your main event scheduler loop. But I still like the idea of timers
> easily accessible through a file* interface.

Sure, but using file * interface implies entering the kernel - that
can sometimes be skipped* if your timer queue is in user space.

* - it happens under heavy load, conveniently.

> > > void __inode_dir_notify(struct inode *inode, unsigned long event)
> >
> > Agreed.  This is looking good :)
> 
> I asked Linus what he thinks about this one-line patch.

I have no objections to it.  Generally, I'd like epoll to be able to
report _what_ the event was (not just POLL_RDNORM, but what kind of
dnotify event), but as I don't get to run on an ideal kernel [;)] I'll
be happy with POLL_RDNORM.

> I still believe that the 1:1 mapping is sufficent and with that in place (
> and the one line patch to kernel/futex.c ) futex support comes nicely.

It does work - actually, with ->poll() you don't need any lines in futex.c :)

Even if a specialised futex hook is added someday, the fd support will
continue to be useful.

> >    2. Add a check to EP_CTL_ADD which checks whether a file supports
> >       epoll notifications natively.  Perhaps a file_operations hook
> >       is in order here.  If it does, great.  If not, fall back to
> >       a generic mechanism that uses the file's ->poll() method.  (I
> >       haven't thought through for sure how plausible this is).
> >       Magically, every kind of fd works, including special devices,
> >       and the things that are most performance critical (sockets,
> >       pipes, futexes) are tuned.  Yum!
> 
> Yes, kind of. The hook for an efficent edge triggered event notification
> should be something like the socket one where you have a ->data_ready()
> and ->write_space(), where the caller of these callbacks know that signals
> has to be delivered on 0->1 transactions. With the poll hook you have the
> drawback that the wakeup list is invoked each time data arrives and this
> might generate a little bit too many events. This is no a problem since
> epoll collapse them, but still collapsing do cost CPU cycles.

You avoid the extra CPU cycles like this:

    1. EP_CTL_ADD adds the listener to the file's wait queue using
       ->poll(), and gets a free test of the object readiness [;)]

    2. When the transition happens, the wakeup will call your function,
       epoll_wakeup_function.  That removes the listener from the file's
       wait queue.  Note, you won't see any more wakeups from that file.

    3. When you report the event user space, _then_ you automatically
       add the listener back to the file's wait queue by calling ->poll().

This way, there are no spurious wakeups, and nothing to collapse.  I
would not be surprised if this is quite fast - perhaps as fast as the
special epoll hooks.

The nice feature that makes this possible is that waitqueues don't
wake up tasks any more: they simply call your choice of callback
function.  It was changed for aio, and it's a good change.

-- Jamie

  reply	other threads:[~2002-11-01  1:55 UTC|newest]

Thread overview: 117+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-10-28 19:14 [PATCH] epoll more scalable than poll Hanna Linder
2002-10-28 20:10 ` Hanna Linder
2002-10-28 20:56 ` Martin Waitz
2002-10-28 22:02   ` bert hubert
2002-10-28 22:15     ` bert hubert
2002-10-28 22:17   ` Davide Libenzi
2002-10-28 22:08 ` bert hubert
2002-10-28 22:12   ` [Lse-tech] " Shailabh Nagar
2002-10-28 22:37     ` Davide Libenzi
2002-10-28 22:29   ` Davide Libenzi
2002-10-28 22:58     ` and nicer too - " bert hubert
2002-10-28 23:23       ` Davide Libenzi
2002-10-28 23:44     ` Jamie Lokier
2002-10-29  0:02       ` Davide Libenzi
2002-10-29  1:51         ` Jamie Lokier
2002-10-29  5:06           ` Davide Libenzi
2002-10-29 11:20             ` Jamie Lokier
2002-10-30  0:16               ` Davide Libenzi
2002-10-29  0:03       ` bert hubert
2002-10-29  0:20         ` Davide Libenzi
2002-10-29  0:48         ` Jamie Lokier
2002-10-29  1:53           ` Jamie Lokier
2002-10-28 23:45   ` and nicer too - " John Gardiner Myers
2002-10-29  0:08     ` Davide Libenzi
2002-10-29 12:59       ` Martin Waitz
2002-10-29 15:19         ` bert hubert
2002-10-29 22:54           ` Martin Waitz
2002-10-30  2:24             ` Davide Libenzi
2002-10-30 19:38               ` Martin Waitz
2002-10-31  5:04                 ` Davide Libenzi
2002-10-29  0:18     ` bert hubert
2002-10-29  0:32       ` Davide Libenzi
2002-10-29  0:40         ` bert hubert
2002-10-29  0:57           ` Davide Libenzi
2002-10-29  0:53             ` bert hubert
2002-10-29  1:13               ` Davide Libenzi
2002-10-29  1:08                 ` [Lse-tech] " Hanna Linder
2002-10-29  1:39                   ` Davide Libenzi
2002-10-29  2:05               ` Jamie Lokier
2002-10-29  2:44                 ` Davide Libenzi
2002-10-29  4:01                   ` [PATCH] Updated sys_epoll now with man pages Hanna Linder
2002-10-29  5:09                     ` Andrew Morton
2002-10-29  5:28                       ` [Lse-tech] " Randy.Dunlap
2002-10-29  5:47                         ` Davide Libenzi
2002-10-29  5:41                           ` Randy.Dunlap
2002-10-29  6:12                             ` Davide Libenzi
2002-10-29  6:03                               ` Randy.Dunlap
2002-10-29  6:23                                 ` Davide Libenzi
2002-10-29 14:59                         ` Paul Larson
2002-10-29  5:31                       ` Davide Libenzi
2002-10-29  7:34                       ` Davide Libenzi
2002-10-29 11:04                       ` bert hubert
2002-10-29 15:30                       ` [Lse-tech] " Shailabh Nagar
2002-10-29 17:45                         ` Davide Libenzi
2002-10-29 19:30                           ` Hanna Linder
2002-10-29 19:49                             ` Davide Libenzi
2002-10-29 13:09             ` and nicer too - Re: [PATCH] epoll more scalable than poll bert hubert
2002-10-29 21:25               ` Davide Libenzi
2002-10-29 21:23                 ` Hanna Linder
2002-10-29 21:41                   ` Davide Libenzi
2002-10-29 23:06                     ` Hanna Linder
2002-10-29 23:14                       ` [Lse-tech] " Randy.Dunlap
2002-10-29 23:25                       ` Davide Libenzi
2002-10-29  1:47   ` Security critical race condition in epoll code John Gardiner Myers
2002-10-29  2:13     ` Davide Libenzi
2002-10-29  3:38     ` Davide Libenzi
2002-10-29 19:49   ` and nicer too - Re: [PATCH] epoll more scalable than poll John Gardiner Myers
2002-10-29 21:03     ` Davide Libenzi
2002-10-30  0:26       ` Jamie Lokier
2002-10-30  2:09         ` Davide Libenzi
2002-10-30  5:51         ` Davide Libenzi
2002-10-30  2:22       ` John Gardiner Myers
2002-10-30  3:51         ` Davide Libenzi
2002-10-31  2:07           ` John Gardiner Myers
2002-10-31  3:21             ` Davide Libenzi
2002-10-31 11:10               ` [Lse-tech] " Suparna Bhattacharya
2002-10-31 18:42                 ` Davide Libenzi
2002-10-30 23:01         ` Jamie Lokier
2002-10-30 23:53           ` Davide Libenzi
2002-10-31  0:52             ` Jamie Lokier
2002-10-31  4:15               ` Davide Libenzi
2002-10-31 15:07                 ` Jamie Lokier
2002-10-31 19:10                   ` Davide Libenzi
2002-11-01 17:42                     ` Dan Kegel
2002-11-01 17:45                       ` Davide Libenzi
2002-11-01 18:41                         ` Dan Kegel
2002-11-01 19:16                       ` Jamie Lokier
2002-11-01 20:04                         ` Charlie Krasic
2002-11-01 20:14                           ` Jamie Lokier
2002-11-01 20:22                         ` Mark Mielke
2002-10-31 15:41                 ` Unifying epoll,aio,futexes etc. (What I really want from epoll) Jamie Lokier
2002-10-31 15:48                   ` bert hubert
2002-10-31 16:45                   ` Alan Cox
2002-10-31 22:00                     ` Rusty Russell
2002-11-01  0:32                       ` Jamie Lokier
2002-11-01 13:23                       ` Alan Cox
2002-10-31 20:28                   ` Davide Libenzi
2002-10-31 23:02                     ` Jamie Lokier
2002-11-01  1:01                       ` Davide Libenzi
2002-11-01  2:01                         ` Jamie Lokier [this message]
2002-11-01 17:36                           ` Davide Libenzi
2002-11-01 20:45                         ` Jamie Lokier
2002-11-01  1:55                       ` Matthew D. Hall
2002-11-01  2:54                         ` Davide Libenzi
2002-11-01 18:18                           ` Dan Kegel
2002-11-01  2:56                         ` Jamie Lokier
2002-11-01  4:29                       ` Mark Mielke
2002-11-01  4:59                         ` Jamie Lokier
2002-11-01 23:27                       ` John Gardiner Myers
2002-11-02  4:55                         ` Mark Mielke
2002-11-02 15:41                         ` Jamie Lokier
2002-11-05 18:15                       ` pipe POLLOUT oddity John Gardiner Myers
2002-11-05 18:18                         ` Benjamin LaHaise
2002-11-01 23:16                   ` Unifying epoll,aio,futexes etc. (What I really want from epoll) John Gardiner Myers
2002-10-30 18:59       ` and nicer too - Re: [PATCH] epoll more scalable than poll Zach Brown
2002-10-30 19:25         ` Davide Libenzi
2002-10-31 16:54         ` Davide Libenzi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20021101020119.GC30865@bjl1.asuk.net \
    --to=lk@tantalophile.demon.co.uk \
    --cc=akpm@digeo.com \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=davidel@xmailserver.org \
    --cc=linux-aio@kvack.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lse-tech@lists.sourceforge.net \
    --cc=torvalds@transmeta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox