public inbox for linux-kernel@vger.kernel.org
From: Jamie Lokier <lk@tantalophile.demon.co.uk>
To: Davide Libenzi <davidel@xmailserver.org>
Cc: John Gardiner Myers <jgmyers@netscape.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-aio@kvack.org, lse-tech@lists.sourceforge.net
Subject: Re: and nicer too - Re: [PATCH] epoll more scalable than poll
Date: Thu, 31 Oct 2002 00:52:59 +0000
Message-ID: <20021031005259.GA25651@bjl1.asuk.net>
In-Reply-To: <Pine.LNX.4.44.0210301547170.1405-100000@blue1.dev.mcafeelabs.com>

Davide Libenzi wrote:
> The cost of the test will be basically the cost of a ->poll(), that is
> exactly the same cost of the very first read()/write() that you would do
> by following the current API rule.

No, the cost of ->poll() is somewhat less than read()/write(), because
the latter requires a system call and the former does not.  System
calls are still nowhere near as cheap as function calls.

> > I also agree with criticisms that epoll should test and send an event
> > on registration, but only _if_ the test is cheap.  Nothing to do with
> > correctness (I like the edge semantics as they are), but because
> > delivering one event is so infinitesimally low impact with epoll that
> > it's preferable to doing a single speculative read/write/whatever.
> >
> > Regarding the effectiveness of the optimisation, I'd guess that quite
> > a lot of incoming connections do not come with initial data in the
> > short scheduling time after a SYN (unless it's on a LAN).  I don't
> > know this for sure though.
> 
> Ok Jamie, try to explain me which kind of improvement this first drop will
> bring.

I have thought about an optimal server state machine.  (I presume from
your carefully thought out implementation that you have too).

In a state machine, each fd has some user-space state.  I've already
hinted at how this is used to prevent starvation/livelock on a busy
server, and make service fairer.

I would take that further and _defer_ the epoll_ctl() to register an
fd until the first time I have seen EAGAIN from that fd.  This is
because in some cases, epoll_ctl() would not be needed at all - so we
can remove that overhead, and the system call overhead.

Now you would force me to call read() or write() after the
epoll_ctl(), even though I _know_ the result is always going to be
EAGAIN.  You're forcing me to make an always redundant system call.
But I can't omit it, because that's a race condition.
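The deferred-registration idea above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `struct conn`, `conn_read()`, `set_nonblocking()` and the -2 "wait for an event" return value are hypothetical names made up for the sketch, not part of epoll or of any posted patch. It follows the current rule, so after the first EAGAIN it registers the fd and then loops to read once more, which is exactly the redundant system call complained about above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-fd user-space state; names are illustrative only. */
struct conn {
    int fd;          /* non-blocking descriptor */
    int registered;  /* has EPOLL_CTL_ADD been done for this fd yet? */
};

/* Helper: switch a descriptor to non-blocking mode. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Try to read from c->fd.  Returns the byte count (0 means EOF),
 * -1 on a real error, or -2 when the caller should go and wait
 * for an epoll event on this fd.
 */
ssize_t conn_read(int epfd, struct conn *c, void *buf, size_t len)
{
    assert(c != NULL);
    for (;;) {
        ssize_t n = read(c->fd, buf, len);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN)
            return -1;
        if (c->registered)
            return -2;  /* registered and would block: an event will come */

        /* First EAGAIN seen: only now pay for the registration. */
        struct epoll_event ev = { .events = EPOLLIN | EPOLLET };
        ev.data.ptr = c;
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev) < 0)
            return -1;
        c->registered = 1;
        /*
         * Assuming no event is generated at registration, data may have
         * arrived between the read() and the epoll_ctl(), so loop and
         * read once more to close the race.
         */
    }
}
```

A connection that is served entirely by its first read() never gets registered in this sketch. If registration itself tested the fd and queued an event, the function could return -2 straight after the epoll_ctl() and skip the extra read().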

So, I've thought about the _optimal_ state machine and it's clear that
epoll should test the condition on fd registration - for best
performance.  (Nothing to do with scalability, just raw performance).

> And also, how such first drop would not bring a "confusion" for the
> user, letting him think that he can go sleeping event w/out having first
> received EAGAIN. Isn't it better to say "you wait for events after EAGAIN",
> instead of "you wait for events after EAGAIN but after accept/connect".

Be careful with your rules.  epoll should work with blocking fds too,
if you understand the rules well enough, and fd registration doesn't
have to be done at the same time as accept/connect/pipe.

Your current rule in practice is:

	an event is generated on every "would-block" -> "ready" transition.
	after fd registration, you must treat the fd as "ready".

The proposed rule is this:

	an event is generated on every "would-block" -> "ready" transition.
	after fd registration, you may treat the fd as in any state you like.

The proposed rule is better because it permits better optimisations in
user space, as explained earlier.  (If you _really_ want to avoid the
call to ->poll() when user space doesn't care, make that a flag
argument to epoll_ctl()).
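Which of the two rules a given kernel follows can be checked from user space. The probe below is a minimal sketch, not a definitive test: `event_on_registration()` is a hypothetical helper name, and it looks only at one case, a pipe that is already readable when it is registered edge-triggered. It makes the fd ready first, registers it, and then polls the epoll fd with a zero timeout to see whether the registration alone queued an event.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Returns 1 if registering an already-readable fd produces an event
 * without any further "would-block" -> "ready" transition, 0 if it
 * does not, and -1 if the probe could not be set up.
 */
int event_on_registration(void)
{
    int p[2], epfd, n, result = -1;
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET }, out;

    if (pipe(p) < 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0)
        goto out_pipe;

    /* Make the read end "ready" before it is registered. */
    if (write(p[1], "x", 1) != 1)
        goto out_all;

    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0)
        goto out_all;

    /* Zero timeout: report only what registration itself queued. */
    n = epoll_wait(epfd, &out, 1, 0);
    if (n >= 0)
        result = (n == 1);

out_all:
    close(epfd);
out_pipe:
    close(p[0]);
    close(p[1]);
    return result;
}
```

A result of 1 means user space may treat a freshly registered fd as being in any state it likes, as in the proposed rule. A result of 0 means the fd must be treated as "ready" after registration and read or written until EAGAIN.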

enjoy :)
-- Jamie


