From: Jamie Lokier <jamie@shareable.org>
To: Dirk Morris <dmorris@metavize.com>
Cc: Davide Libenzi <davidel@xmailserver.org>,
	Ben Mansell <ben@zeus.com>, Steven Dake <sdake@mvista.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: epoll reporting events when it hasn't been asked to
Date: Wed, 14 Apr 2004 22:48:17 +0100
Message-ID: <20040414214817.GH12105@mail.shareable.org>
In-Reply-To: <407D9D2F.3010901@metavize.com>

Dirk Morris wrote:
> From what I understand you're proposing to remove the fd from the set 
> lazily instead of immediately.
> Which will save system calls in the cases where the HUP/ERR condition 
> does not occur during the 'disabled' time.
> 
> In my case, which you may choose to disregard, this condition is not 
> irregular or in any way a special case.
> So the revision you have proposed is just an optimization.
> You could even use this same optimization with the disable feature 
> (disable it lazily) and get even better performance with the same number 
> of syscalls you proposed.

I don't think you would get any better performance even when HUP/ERR
conditions are commonplace.

A HUP condition means you cannot read from or write to the fd any more, so
even though you may defer handling it in userspace, there's nothing to
be gained from disabling all epoll events after you receive the HUP:
instead of lazy disabling, you might as well delete the fd from epoll
as soon as you receive the HUP even though you don't want to handle it yet.
(Btw, the comment about HUP in net/ipv4/tcp.c:tcp_poll() is illuminating).
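
A minimal sketch of that "delete on HUP, handle later" approach.  The
helper name collect_hangups() is hypothetical, and it assumes each fd
was registered with data.fd set to the fd itself:

```c
#include <sys/epoll.h>

/* Hypothetical helper: wait once on epfd.  Every fd that reports EPOLLHUP
 * is removed from the epoll set immediately and recorded in hup_fds[], so
 * the application can finish handling it (close, cleanup) whenever it likes.
 * Assumes data.fd was set to the fd at registration time.
 * Returns the number of fds recorded, or -1 if epoll_wait() fails. */
int collect_hangups(int epfd, int *hup_fds, int max_fds, int timeout_ms)
{
    struct epoll_event events[16];
    int n = epoll_wait(epfd, events, 16, timeout_ms);
    if (n < 0)
        return -1;

    int count = 0;
    for (int i = 0; i < n && count < max_fds; i++) {
        if (!(events[i].events & EPOLLHUP))
            continue;   /* other events are left to the caller's normal path */

        int fd = events[i].data.fd;
        /* The event argument is ignored for EPOLL_CTL_DEL, but passing a
         * non-NULL pointer keeps older kernels happy. */
        struct epoll_event unused = {0};
        if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, &unused) == 0)
            hup_fds[count++] = fd;
    }
    return count;
}
```

This costs one epoll_ctl() call per hung-up fd, the same as a "disable"
operation would, and the fd never shows up in epoll_wait() again.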

ERRs can occur many times while a socket is open so the algorithmic
efficiency is worth considering.

An ERR condition, at least on a socket, forces you to examine the
error before you can perform a further read or write.  That's because
read and write operations will both check for a pending error, so when
you know there's an error condition, you know that the next read or
write call is really a "tell me the error" call.
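
As a rough illustration of the "tell me the error" call, here is a
sketch using a hypothetical helper, take_pending_error().  It assumes a
socket on which the pending error is reported ahead of any data; some
socket types may hand back already-queued data first:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: make one non-blocking, peeking recv() call and see
 * whether it comes back with an error.  If the socket had a pending error,
 * the call fails with it and the error is consumed; no data is removed from
 * the socket because of MSG_PEEK.
 * Returns the errno value that was reported, or 0 if there was none.
 * (A bad descriptor would also show up here, e.g. as EBADF.) */
int take_pending_error(int sockfd)
{
    char byte;
    ssize_t n = recv(sockfd, &byte, sizeof byte, MSG_PEEK | MSG_DONTWAIT);

    if (n >= 0)
        return 0;   /* data or EOF: no error was reported */
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return 0;   /* nothing to read, and no error either */
    return errno;   /* this call was the "tell me the error" call */
}
```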

So, assuming you apply the lazy strategy, after you receive an ERR, in
principle you could decide that you want to do nothing with it until
the next IN or OUT which represents non-error data readiness, and then
you will examine the error code (by doing a read or write call) and
then read or write actual data.  Then indeed being able to ignore just
ERR and still listen for IN and/or OUT would make a difference.  But
you might as well just read the error condition using MSG_ERRQUEUE,
and spend your one system call that way instead - you can still defer
the processing of the error code.

There is a situation where that is algorithmically not as good as
being able to ignore just ERR: The example is when you are receiving a
malicious flood of ICMP packets which cause lots of error conditions
on a UDP socket (or other similar things), and you want to ignore all
of those while efficiently handling a lower rate of non-error data
transfer.  That's a very unusual situation and it doesn't occur except
under attack circumstances with UDP (because real error ICMPs are a
response to something you transmitted yourself).

Other socket types or devices might give ERR a different meaning which
makes ERR events common relative to read and write readiness.  If so,
they probably shouldn't.

> I see no downside, except that it no longer conforms to the semantics of 
> poll and select.
> Whether or not it's worth it to deviate from this behavior over such a 
> detail, I don't know. :)

I see two downsides.  They're not performance downsides, just practical:

  1. Currently, you can implement epoll in terms of poll(), if you
     have an epoll-based program and want to create epoll emulation
     functions for running on an old kernel.  If epoll were extended
     to permit ignoring HUP/ERR, that would no longer be possible.

  2. Perhaps most programs will use a flexible library like libevent
     or something made for the program.  It's possible that library
     will offer an API which sends the POLLIN/OUT/ERR/HUP bits to the
     application, and lets the application programmer interpret those
     bits in whatever way is appropriate.  If it becomes easy to
     ignore HUP when the library works with epoll, applications may
     accidentally end up depending on that, and will unexpectedly fail
     when they are run one day on an older system, or even another OS
     where the same library works by calling poll() or select().
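
Both points rest on poll() reporting HUP and ERR whether or not they
were asked for, which is why an emulation layer on top of poll() cannot
hide them.  A small sketch that shows this (poll_unrequested() is a
hypothetical name):

```c
#include <poll.h>

/* Hypothetical helper: poll a single fd while requesting no events at all.
 * Returns the revents bits that were reported anyway, 0 if nothing was
 * reported before the timeout, or -1 if poll() itself failed. */
int poll_unrequested(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = 0, .revents = 0 };
    int n = poll(&p, 1, timeout_ms);

    if (n < 0)
        return -1;
    return n > 0 ? p.revents : 0;
}
```

On the read end of a pipe whose write end has been closed, the result
includes POLLHUP even though events was 0.  An epoll emulation built on
poll() would have to pass that on, just as epoll does today.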

Summary: epoll is fine the way it is.  Imho.

-- Jamie
