From: Benjamin LaHaise <bcrl@kvack.org>
To: Zach Brown <zach.brown@oracle.com>
Cc: David Miller <davem@davemloft.net>,
Evgeniy Polyakov <johnpol@2ka.mipt.ru>,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [RFC 1/4] kevent: core files.
Date: Thu, 27 Jul 2006 16:58:06 -0400
Message-ID: <20060727205806.GD16971@kvack.org>
In-Reply-To: <44C91192.4090303@oracle.com>

On Thu, Jul 27, 2006 at 12:18:42PM -0700, Zach Brown wrote:
> The easy part is fixing up the somewhat obfuscated collection call.
> Instead of coming in through a multiplexer that magically treats a void
> * as a struct kevent_user_control followed by N ukevents (as specified
> in the kevent_user_control!) we'd turn it into a more explicit
> collection syscall:
>
> int kevent_getevents(int event_fd, struct ukevent *events,
>                      int min_events, int max_events,
>                      struct timeval *timeout);
You've just reinvented io_getevents(). What exactly are we getting from
reinventing this (aside from breaking existing apps and creating more of
an API mess)?
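
To make the overlap concrete, here is a minimal sketch of the proposed argument shape layered on the existing AIO collection syscall. The wrapper name getevents_via_aio() and the helper timeval_to_timespec() are hypothetical, used only for illustration. The sketch assumes Linux with the kernel AIO headers installed, and it returns struct io_event rather than the proposed struct ukevent, so it shows the calling convention only, not a kevent implementation.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/aio_abi.h>   /* aio_context_t, struct io_event */
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Convert the proposal's struct timeval into the struct timespec that
 * io_getevents() expects. */
struct timespec timeval_to_timespec(struct timeval tv)
{
    struct timespec ts;

    assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);
    ts.tv_sec = tv.tv_sec;
    ts.tv_nsec = (long)tv.tv_usec * 1000L;
    return ts;
}

/* Hypothetical wrapper: same argument shape as the proposed
 * kevent_getevents(), implemented on the existing io_getevents syscall. */
long getevents_via_aio(aio_context_t ctx, struct io_event *events,
                       long min_events, long max_events,
                       struct timeval *timeout)
{
    struct timespec ts, *tsp = NULL;

    if (timeout) {
        ts = timeval_to_timespec(*timeout);
        tsp = &ts;
    }
    /* min_events/max_events map directly onto min_nr/nr. */
    return syscall(SYS_io_getevents, ctx, min_events, max_events, events, tsp);
}
```

Apart from the timeout type and the event structure, every argument maps one-to-one, which is the point of the objection above.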
> Say we have a ring of event structs. AIO has this today, but it sort of
> gets it wrong because each event element doesn't specify whether it is
> owned by the kernel or userspace. (It really gets it wrong because it
> doesn't flush_dcache_page() after updating the ring via kmap(), but
> never mind that! No one actually uses this mmap() AIO ring.) In AIO
> today there is also a control struct mapped along with the ring that has
> head and tail pointers. We don't want to bounce that cacheline around.
> net/packet/af_packet.c gets this right with its tp_status member of
> tpacket_hdr.
That could be rev'd in the mmap() ring buffer, as there are compat and
incompat bits for changing the structure layout. As for bouncing the
cacheline of head/tail around, I don't think it matters on real machines,
as the multithreaded/SMP case will hit that cacheline bouncing if the
user is sharing the event ring between multiple threads on multiple CPUs.
The only way around that is to use multiple event rings, say one per node,
at which point you have to do load balancing of io requests explicitly
between queues (which might be worth it).
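
A rough userspace model of what explicit balancing across several rings could look like, assuming one ring per node. The struct event_ring layout and the pick_ring()/submit() helpers are hypothetical and exist only for this sketch. The policy shown (prefer the submitter's home ring, otherwise the least-loaded ring with space) is one possible choice, not something the thread settles on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-node ring accounting. */
struct event_ring {
    unsigned int inflight;   /* requests submitted, not yet collected */
    unsigned int capacity;   /* ring size, bounds inflight */
};

/* Pick the ring for a new request: prefer the submitter's home node so the
 * head/tail cacheline stays local, otherwise fall back to the least-loaded
 * ring that still has space. Returns -1 if every ring is full. */
int pick_ring(const struct event_ring rings[], size_t nr, size_t home)
{
    int best = -1;

    assert(home < nr);
    if (rings[home].inflight < rings[home].capacity)
        return (int)home;

    for (size_t i = 0; i < nr; i++) {
        if (rings[i].inflight >= rings[i].capacity)
            continue;
        if (best < 0 || rings[i].inflight < rings[(size_t)best].inflight)
            best = (int)i;
    }
    return best;
}

/* Account for one submitted request; returns the ring used or -1. */
int submit(struct event_ring rings[], size_t nr, size_t home)
{
    int idx = pick_ring(rings, nr, home);

    if (idx >= 0)
        rings[idx].inflight++;
    return idx;
}
```

The cost is visible in the fallback path: once the home ring is full, requests cross nodes, and the application has to collect from more than one ring.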
> So, great, glibc can now find pending events very quickly if they're
> waiting in the ring and can fall back to the collection syscall if it
> wants to wait and the ring is empty. If it consumes events via the
> syscall it increases its ring index by the number the syscall returned.
>
> There's two things we should address: level events and the notion of
> only submitting as much as fits in the ring.
>
> epoll and kevent both have the notion of an event type that always
> creates an event at the time of the collection syscall while the event
> source is on a ready list. Think of epoll calling ->poll(POLLOUT) for
> an empty socket buffer at every sys_epoll_wait() call. We can't have
> some source constantly spewing into the ring :/. We could fix this by
> the API requiring that level events can *only* be collected through the
> syscall interface. userspace could call into the collection syscall
> every N events collected through the ring, say. N would be tuned to
> amortize the syscall cost and still provide fairness or latency for the
> level sources. I'd be fine with that, especially when it's hidden off
> in glibc.
This is exactly why I think level triggered events are nasty. It's
impossible to do cleanly without requiring a syscall.
> Today AIO only allows submission of as many events as there is space in
> the ring. It mostly does this so its completion can drop an event in
> the ring from any context. If we back away from this so that we can
> have long-lived source registration generate multiple edge events (and I
> think we want to!), we have to be kind of careful. A source could
> generate an event while the ring is full. The event could go in a list
> but if userspace is collecting events in userspace the kernel won't be
> told when there's space. We'd first have to check this ready list when
> later events are generated so that pending events on the list aren't
> overlooked. Userspace would also want to use the collection syscall as
> the ring empties. Neither seems hard.
As soon as you allow queueing events up in kernel space, it becomes
necessary to do another syscall after pulling events out of the queue,
which is a waste of CPU cycles when you're under heavy load (exactly the
point at which you want the system to be its most efficient). Given that
growing the ring buffer is easy enough to do, I'm not sure that the hit
is worth it. At some point there has to be some form of flow control
involved, and it is much better if it is explicitly obvious where this
happens (as opposed to signal queues and our wonderful OOM handling).
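
The submission-time flow control described above can be sketched as a small single-threaded model. Everything here (struct ring, RING_SIZE, ring_submit(), ring_complete(), ring_consume()) is hypothetical and is not the kernel's AIO ring layout. It only illustrates the rule that a request is accepted if and only if a slot can be reserved for its completion.

```c
#include <assert.h>
#include <errno.h>

#define RING_SIZE 4u   /* hypothetical; a power of two so indices wrap cleanly */

struct ring {
    unsigned long long events[RING_SIZE]; /* completed event cookies */
    unsigned int head;      /* next slot userspace will consume */
    unsigned int tail;      /* next slot a completion will fill */
    unsigned int reserved;  /* submitted requests not yet completed */
};

/* Submission is the only place that can fail: a request is accepted only
 * if a ring slot can be reserved for its eventual completion. */
int ring_submit(struct ring *r)
{
    unsigned int used = r->tail - r->head;   /* completed but unconsumed */

    if (used + r->reserved >= RING_SIZE)
        return -EAGAIN;                      /* explicit flow control point */
    r->reserved++;
    return 0;
}

/* Completion never fails and never queues: its slot was reserved at submit. */
void ring_complete(struct ring *r, unsigned long long cookie)
{
    assert(r->reserved > 0);
    r->events[r->tail % RING_SIZE] = cookie;
    r->tail++;
    r->reserved--;
}

/* Userspace consumes without a syscall; returns 1 if an event was read. */
int ring_consume(struct ring *r, unsigned long long *cookie)
{
    if (r->head == r->tail)
        return 0;
    *cookie = r->events[r->head % RING_SIZE];
    r->head++;
    return 1;
}
```

Because back-pressure shows up as -EAGAIN at submit time, there is no kernel-side overflow list and no extra syscall needed after draining the ring. A real implementation would also need memory barriers and per-slot ownership, which this model leaves out.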
-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <dont@kvack.org>.