From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
To: Eric Dumazet <dada1@cosmosbay.com>
Cc: Johann Borck <johann.borck@densedata.com>,
Ulrich Drepper <drepper@redhat.com>,
Ulrich Drepper <drepper@gmail.com>,
lkml <linux-kernel@vger.kernel.org>,
David Miller <davem@davemloft.net>, Andrew Morton <akpm@osdl.org>,
netdev <netdev@vger.kernel.org>,
Zach Brown <zach.brown@oracle.com>,
Christoph Hellwig <hch@infradead.org>,
Chase Venters <chase.venters@clientec.com>
Subject: Re: [take19 1/4] kevent: Core files.
Date: Tue, 17 Oct 2006 20:01:55 +0400
Message-ID: <20061017160155.GA18522@2ka.mipt.ru>
In-Reply-To: <200610171732.28640.dada1@cosmosbay.com>
On Tue, Oct 17, 2006 at 05:32:28PM +0200, Eric Dumazet (dada1@cosmosbay.com) wrote:
> > So the most complex case is when the user is going to use both interfaces,
> > and the steps to take when the mapped ring buffer has overflowed.
> > In that case user can either read and mark some events as ready in ring
> > buffer (the latter is being done through special syscall), so kevent
> > core will put there new ready events.
> > User can also get events using usual syscall, in that case events in
> > ring buffer must be updated - and actually I implemented mapped buffer
> > in the way which allows to remove events from the queue - queue is a
> > FIFO, and the first entry to be obtained through syscall is _always_ the
> > first entry in the ring buffer.
> >
> > So when the user reads an event through the syscall (no matter if we are in
> > the overflow case or not), the event being read is easily accessible in the
> > ring buffer.
> >
> > So I propose following design for ring buffer (quite simple):
> > kernelspace maintains two indexes - to the first and the last events in
> > the ring buffer (and maximum size of the buffer of course).
> > When new event is marked as ready, some info is being copied into ring
> > buffer and index of the last entry is increased.
> > When event is being read through syscall it is _guaranteed_ that that
> > event will be at the position pointed by the index of the first
> > element, that index is then increased (thus opening new slot in the
> > buffer).
> > If index of the last entry reaches (with possible wrapping) index of the
> > first entry, that means that overflow has happened. In this case no new
> > events can be copied into ring buffer, so they are only placed into
> > ready queue (accessible through syscall kevent_get_events()).
> >
> > When user calls kevent_get_events() it will obtain the first element
> > (pointed by index of the first element in the ring buffer), and if there
> > is ready event, which is not placed into the ring buffer, it is
> > copied (with appropriate update of the last index and new overflow
> > condition).
>
> Well, I'm not sure it's good to do this 'move one event from ready list to slot
> X', one by one, because this event will likely be flushed out of processor
> cache (because we will have to consume 4096 events before reaching this one).
> I think it's better to batch this kind of 'push XX events' later, XX being
> small enough not to waste CPU cache, and when ring buffer is empty again.
Ok, that's possible.
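
A minimal userspace sketch of the scheme discussed above, for illustration
only. It is not the kevent code: struct kq, ev_ready(), ev_get(),
ring_refill() and the sizes RING_SIZE, QUEUE_MAX and BATCH are all
hypothetical names. It assumes the ring always mirrors the oldest entries of
the ready queue, and that refill after overflow happens in a batch once the
ring is empty again.

```c
#include <assert.h>

#define RING_SIZE 4u    /* hypothetical sizes, powers of two */
#define QUEUE_MAX 16u
#define BATCH     2u    /* the "XX" from the discussion */

struct kq {
    int queue[QUEUE_MAX];     /* ready queue (FIFO), read via syscall */
    unsigned q_head, q_tail;  /* free-running counters */
    int ring[RING_SIZE];      /* stand-in for the mapped ring buffer */
    unsigned first, last;     /* free-running first/last ring indexes */
};

static unsigned ring_used(const struct kq *k)  { return k->last - k->first; }
static unsigned queue_used(const struct kq *k) { return k->q_tail - k->q_head; }

/* Copy up to max queued events that are not yet mirrored in the ring. */
static void ring_refill(struct kq *k, unsigned max)
{
    for (unsigned i = 0; i < max; i++) {
        if (ring_used(k) == RING_SIZE || ring_used(k) == queue_used(k))
            break;
        unsigned src = k->q_head + ring_used(k); /* oldest unmirrored event */
        k->ring[k->last % RING_SIZE] = k->queue[src % QUEUE_MAX];
        k->last++;
    }
}

/* An event became ready: always queue it; mirror it into the ring only if
 * every older event is already there and a slot is free. */
static int ev_ready(struct kq *k, int ev)
{
    if (queue_used(k) == QUEUE_MAX)
        return -1;
    int in_sync = ring_used(k) == queue_used(k);
    k->queue[k->q_tail % QUEUE_MAX] = ev;
    k->q_tail++;
    if (in_sync)
        ring_refill(k, 1);    /* does nothing when the ring is full (overflow) */
    return 0;
}

/* Syscall path: pop the oldest event. If the ring holds anything, its first
 * entry is the same event, so its slot is released too. */
static int ev_get(struct kq *k, int *ev)
{
    if (queue_used(k) == 0)
        return -1;
    *ev = k->queue[k->q_head % QUEUE_MAX];
    if (ring_used(k) > 0) {
        assert(k->ring[k->first % RING_SIZE] == *ev);
        k->first++;
    }
    k->q_head++;
    if (ring_used(k) == 0)    /* ring drained: push the next batch */
        ring_refill(k, BATCH);
    return 0;
}
```

With these sizes, six ready events fill the four ring slots and leave two
only in the ready queue. Those two are copied into the ring in one batch
after the fourth ev_get() call empties it.
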
> mmap buffer is good for latency and minimum synchro between user thread and
> kernel producer. But once we hit an 'overflow', it is better to revert to a
> mode feeding XX events per syscall, to be sure it fits CPU caches : The user
> thread will do the copy from kernel memory to user memory, and this thread
> will shortly use those events in user land.
User can do both - either get events through syscall, or get them from
mapped ring buffer when it is refilled.
> BTW, maintaining coherency on mmap buffer is expensive : once an event is
> copied to mmap buffer, kernel has to issue a smp_mb() before updating the
> index, so that a user thread won't start to consume an event with random
> values because its CPU sees the update on index before updates on data.
There will be some tricks with barriers indeed.
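
The ordering requirement Eric describes can be sketched in portable C11,
purely as an illustration. The names struct mapped_ring, ring_publish() and
ring_consume() are hypothetical, and the release/acquire pairs below are a
userspace stand-in for the kernel barriers mentioned above, not the kevent
implementation. One producer and one consumer are assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SLOTS 8u                        /* hypothetical ring size */

struct event { int type; int data; };   /* hypothetical payload */

struct mapped_ring {
    struct event slot[SLOTS];
    atomic_uint last;    /* written by the producer only */
    atomic_uint first;   /* written by the consumer only */
};

/* Producer: write the slot first, then publish the index with release
 * ordering so the data is visible before the new index is. */
static bool ring_publish(struct mapped_ring *r, struct event ev)
{
    unsigned last  = atomic_load_explicit(&r->last, memory_order_relaxed);
    unsigned first = atomic_load_explicit(&r->first, memory_order_acquire);

    assert(last - first <= SLOTS);
    if (last - first == SLOTS)
        return false;                   /* overflow: no free slot */
    r->slot[last % SLOTS] = ev;
    atomic_store_explicit(&r->last, last + 1, memory_order_release);
    return true;
}

/* Consumer: the acquire load of the index pairs with the release store
 * above, so a slot is never read before its data has been written. */
static bool ring_consume(struct mapped_ring *r, struct event *out)
{
    unsigned first = atomic_load_explicit(&r->first, memory_order_relaxed);
    unsigned last  = atomic_load_explicit(&r->last, memory_order_acquire);

    if (first == last)
        return false;                   /* ring is empty */
    *out = r->slot[first % SLOTS];
    atomic_store_explicit(&r->first, first + 1, memory_order_release);
    return true;
}
```

The release store on the consumer side serves the opposite direction: the
producer must not reuse a slot until the consumer has finished copying it
out.
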
> Once all the queue is flushed in efficient way, we can switch to mmap mode
> again.
>
> Eric
Ok, there is one apologist for the mmap buffer implementation, who forced me
to create the first implementation, which was dropped due to my absence of
remote mind-reading abilities.
Ulrich, does the above approach sound good to you?
I really do not want to reimplement something that will again be
dismissed with the words 'no matter what you say, it is broken and I do not
want it' :).
--
Evgeniy Polyakov