public inbox for linux-kernel@vger.kernel.org
From: "Christopher K. St. John" <cks@distributopia.com>
To: linux-kernel@vger.kernel.org
Cc: Davide Libenzi <davidel@xmailserver.org>, Dan Kegel <dank@kegel.com>
Subject: Re: [PATCH] /dev/epoll update ...
Date: Wed, 19 Sep 2001 18:24:41 -0500
Message-ID: <3BA92939.60AEE7DA@distributopia.com>
In-Reply-To: <XFMail.20010919151147.davidel@xmailserver.org>

Davide Libenzi wrote:
> 
> 1)      select()/poll();
> 2)      recv()/send();
> 
> vs :
> 
> 1)      if (recv()/send() == FAIL)
> 2)              ioctl(EP_POLL);
> 
> When there's no data/tx buffer full these will result in 2 syscalls while
> if data is available/tx buffer ok the first method will result in 2 syscalls
> while the second will never call the ioctl().
> It looks very linear to me, with select()/poll() you're asking for a state while
> with /dev/epoll you're asking for a state change.
> 

 Ok, if we're just disagreeing about the best api,
then I can live with that. But it appears we're
talking at cross-purposes, so I want to try this one
more time. I'll lay my thought process out in detail,
and you can tell me at which step I'm going wrong:


 Normally, you'd spend most of your time sitting in
ioctl(EP_POLL) waiting for something to happen. So
that's one syscall.

 If you get an event that indicates you can accept()
a new connection, then you do an accept(). Assume it
succeeds. That's two syscalls. Then you register
interest in the fd with a write to /dev/epoll, that's
three.

 With the current /dev/epoll, you must try to read()
the new socket before you go back to ioctl(EP_POLL),
just in case there is data available. You expect
there isn't, but you have to try. This is the step
I'm talking about. That's four.

 Assume data was not available, so you loop back
to ioctl(EP_POLL) and wait for an event. That's five
syscalls. The event comes in, you do another read()
on the socket, and probably get some data. That's
six syscalls to finally get your data.

 ioctl(kpfd, EP_POLL)   1     wait for events
 s = accept()           2     accept a new socket
 write(kpfd, s)         3     register interest
 n = read(s)            4 <-- annoying test-read
 ioctl(kpfd, EP_POLL)   5     wait for events
 n = read(s)            6     get some data

 You have a similar problem with writes, but I'm
guessing it's safe to assume the first write will
always succeed, so it's awkward but not a big
problem.

 If /dev/epoll tested the initial state of the socket,
then there would be no need for the test read:

 ioctl(kpfd, EP_POLL)   1     wait for events
 s = accept()           2     accept a new socket
 write(kpfd, s)         3     register interest
 ioctl(kpfd, EP_POLL)   4     wait for events
 n = read(s)            5     get some data
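
 As a sanity check on the counting, here is a toy model
of the two traces in C. It only tallies calls and never
opens /dev/epoll; syscalls_to_first_data() and both of
its parameters are names made up for the sketch. The
data_ready_at_register case is Davide's point that a
successful first read() never needs the ioctl().

```c
#include <stdbool.h>

/*
 * Toy model only: it counts the calls in the two traces above and
 * does not touch the real /dev/epoll device. The function and
 * parameter names are hypothetical.
 */
static int syscalls_to_first_data(bool reports_initial_state,
                                  bool data_ready_at_register)
{
    int n = 0;

    n++;    /* ioctl(kpfd, EP_POLL): listening socket becomes ready */
    n++;    /* s = accept() */
    n++;    /* write(kpfd, s): register interest in s */

    if (!reports_initial_state) {
        n++;    /* n = read(s): the mandatory test read */
        if (data_ready_at_register)
            return n;   /* the test read already returned the data */
    }

    n++;    /* ioctl(kpfd, EP_POLL): wait for (or collect) the event */
    n++;    /* n = read(s): get some data */
    return n;
}
```

 With no data waiting at registration the model gives 6
calls for the current behaviour and 5 with the initial
state test. With data already waiting it gives 4 and 5.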

 So, we've saved a syscall and, perhaps more importantly,
we don't have to keep a list of to-be-read-just-in-case
fds sitting around. I would rather make this a "clean
api" argument than a performance argument, since it's
unclear that there is really any significant speed
difference in practice.
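
 To make the bookkeeping concrete, here is one way the
just-in-case list might look. This is only a sketch:
struct pending_reads, pending_push(), pending_pop() and
MAX_PENDING are invented names, and a real server would
pick its own data structure.

```c
#include <stddef.h>

#define MAX_PENDING 1024    /* arbitrary capacity for the sketch */

/* Newly registered fds that still need one just-in-case read(). */
struct pending_reads {
    int    fds[MAX_PENDING];
    size_t count;
};

/* Remember fd right after registering it; 0 on success, -1 if full. */
static int pending_push(struct pending_reads *p, int fd)
{
    if (p->count == MAX_PENDING)
        return -1;
    p->fds[p->count++] = fd;
    return 0;
}

/* Take one fd off the list for its test read; -1 when the list is empty. */
static int pending_pop(struct pending_reads *p)
{
    if (p->count == 0)
        return -1;
    return p->fds[--p->count];
}
```

 The loop would call pending_push() after each
write(kpfd, s) and drain the list with pending_pop(),
doing one read() per fd, before going back to
ioctl(EP_POLL). If registration reported the initial
state, none of this would be needed.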

 Note that the number of unnecessary syscalls is much
greater than 20%, since on a heavily loaded server, you
could be doing thousands of unnecessary reads for every
ioctl(EP_POLL).

 On a fast local network you'd expect the test reads
to mostly return something, so it's no big deal. But
if you've got 10k very slow connections...

 There's a good summary of the problem in the Banga,
Mogul and Druschel[1] paper at:

  http://citeseer.nj.nec.com/banga99scalable.html

 Page 5, right hand column, third paragraph.

 By the way, thanks for the patch. I know I've been
complaining about it, but I wouldn't have bothered
unless I thought it was a good thing. I appreciate
your taking the time to write and release it.


-- 
Christopher St. John cks@distributopia.com
DistribuTopia http://www.distributopia.com
