* RE: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
@ 2006-04-28 17:02 Caitlin Bestler
2006-04-28 17:18 ` Stephen Hemminger
2006-04-28 17:25 ` Evgeniy Polyakov
0 siblings, 2 replies; 26+ messages in thread
From: Caitlin Bestler @ 2006-04-28 17:02 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: David S. Miller, kelly, rusty, netdev
Evgeniy Polyakov wrote:
> On Fri, Apr 28, 2006 at 08:59:19AM -0700, Caitlin Bestler
> (caitlinb@broadcom.com) wrote:
>>> Btw, how is it supposed to work without header split capabale
>>> hardware?
>>
>> Hardware that can classify packets is obviously capable of doing
>> header data separation, but that does not mean that it has to do so.
>>
>> If the host wants header data separation it's real value is that when
>> packets arrive in order that fewer distinct copies are required to
>> move the data to the user buffer (because separated data can be
>> placed back-to-back in a data-only ring). But that's an
>> optimization, it's not needed to make the idea worth doing, or even
>> necessarily in the first implementation.
>
> If there is dataflow, not flow of packets or flow of data
> with holes, it could be possible to modify recv() to just
> return the right pointer, so in theory userspace
> modifications would be minimal.
> With copy in place it completely does not differ from current
> design with copy_to_user() being used since memcpy() is just
> slightly faster than copy*user().
If the app is really ready to use a modified interface we might as well
just give them a QP/CQ interface. But I suppose "receive by pointer"
interfaces don't really stretch the sockets interface all that badly.
The key is that you have to decide how the buffer is released,
is it the next call? Or a separate call? Does releasing buffer
N+2 release buffers N and N+1? What you want to avoid
is having to keep a scoreboard of which buffers have been
released.
But in context, header/data separation would allow in order
packets to have the data be placed back to back, which
could allow a single recv to report the payload of multiple
successive TCP segments. So the benefit of header/data
separation remains the same, and I still say it's a optimization
that should not be made a requirement. The benefits of vj_channels
exist even without them. When the packet classifier runs on the
host, header/data separation would not be free. I want to enable
hardware offloads, not make the kernel bend over backwards
to emulate how hardware would work. I'm just hoping that we
can agree to let hardware do its work without being forced to
work the same way the kernel does (i.e., running down a long
list of arbitrary packet filter rules on a per packet basis).
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
2006-04-28 17:02 [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch Caitlin Bestler
@ 2006-04-28 17:18 ` Stephen Hemminger
2006-04-28 17:29 ` Evgeniy Polyakov
2006-04-28 19:10 ` David S. Miller
2006-04-28 17:25 ` Evgeniy Polyakov
1 sibling, 2 replies; 26+ messages in thread
From: Stephen Hemminger @ 2006-04-28 17:18 UTC (permalink / raw)
To: Caitlin Bestler; +Cc: Evgeniy Polyakov, David S. Miller, kelly, rusty, netdev
On Fri, 28 Apr 2006 10:02:10 -0700
"Caitlin Bestler" <caitlinb@broadcom.com> wrote:
> Evgeniy Polyakov wrote:
> > On Fri, Apr 28, 2006 at 08:59:19AM -0700, Caitlin Bestler
> > (caitlinb@broadcom.com) wrote:
> >>> Btw, how is it supposed to work without header split capabale
> >>> hardware?
> >>
> >> Hardware that can classify packets is obviously capable of doing
> >> header data separation, but that does not mean that it has to do so.
> >>
> >> If the host wants header data separation it's real value is that when
> >> packets arrive in order that fewer distinct copies are required to
> >> move the data to the user buffer (because separated data can be
> >> placed back-to-back in a data-only ring). But that's an
> >> optimization, it's not needed to make the idea worth doing, or even
> >> necessarily in the first implementation.
> >
> > If there is dataflow, not flow of packets or flow of data
> > with holes, it could be possible to modify recv() to just
> > return the right pointer, so in theory userspace
> > modifications would be minimal.
> > With copy in place it completely does not differ from current
> > design with copy_to_user() being used since memcpy() is just
> > slightly faster than copy*user().
>
> If the app is really ready to use a modified interface we might as well
> just give them a QP/CQ interface. But I suppose "receive by pointer"
> interfaces don't really stretch the sockets interface all that badly.
> The key is that you have to decide how the buffer is released,
> is it the next call? Or a separate call? Does releasing buffer
> N+2 release buffers N and N+1? What you want to avoid
> is having to keep a scoreboard of which buffers have been
> released.
>
Please just use existing AIO interface. We don't need another
interface. The number of interfaces increases the exposed bug
surface geometrically. Which means for each new interface, it
means testing and fixing bugs in every possible usage.
> But in context, header/data separation would allow in order
> packets to have the data be placed back to back, which
> could allow a single recv to report the payload of multiple
> successive TCP segments. So the benefit of header/data
> separation remains the same, and I still say it's a optimization
> that should not be made a requirement. The benefits of vj_channels
> exist even without them. When the packet classifier runs on the
> host, header/data separation would not be free. I want to enable
> hardware offloads, not make the kernel bend over backwards
> to emulate how hardware would work. I'm just hoping that we
> can agree to let hardware do its work without being forced to
> work the same way the kernel does (i.e., running down a long
> list of arbitrary packet filter rules on a per packet basis).
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
2006-04-28 17:18 ` Stephen Hemminger
@ 2006-04-28 17:29 ` Evgeniy Polyakov
2006-04-28 17:41 ` Stephen Hemminger
2006-04-28 19:10 ` David S. Miller
1 sibling, 1 reply; 26+ messages in thread
From: Evgeniy Polyakov @ 2006-04-28 17:29 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Caitlin Bestler, David S. Miller, kelly, rusty, netdev
On Fri, Apr 28, 2006 at 10:18:33AM -0700, Stephen Hemminger (shemminger@osdl.org) wrote:
> Please just use existing AIO interface. We don't need another
> interface. The number of interfaces increases the exposed bug
> surface geometrically. Which means for each new interface, it
> means testing and fixing bugs in every possible usage.
Networking AIO? Like [1] :)
That would be really good.
1. http://tservice.net.ru/~s0mbre/old/?section=projects&item=naio
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
2006-04-28 17:29 ` Evgeniy Polyakov
@ 2006-04-28 17:41 ` Stephen Hemminger
2006-04-28 17:55 ` Evgeniy Polyakov
0 siblings, 1 reply; 26+ messages in thread
From: Stephen Hemminger @ 2006-04-28 17:41 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: Caitlin Bestler, David S. Miller, kelly, rusty, netdev
On Fri, 28 Apr 2006 21:29:32 +0400
Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> On Fri, Apr 28, 2006 at 10:18:33AM -0700, Stephen Hemminger (shemminger@osdl.org) wrote:
> > Please just use existing AIO interface. We don't need another
> > interface. The number of interfaces increases the exposed bug
> > surface geometrically. Which means for each new interface, it
> > means testing and fixing bugs in every possible usage.
>
> Networking AIO? Like [1] :)
> That would be really good.
>
> 1. http://tservice.net.ru/~s0mbre/old/?section=projects&item=naio
>
The existing infrastructure is there in the syscall layer, it just
isn't really AIO for sockets. That naio project has two problems, first
they require driver changes, and he is doing it on the stupidest
of hardware, optimizing a 8139too is foolish. Second, introducing
kevents, seems unnecessary and hasn't been accepted in the mainline.
The existing linux AIO model seems sufficient:
http://lse.sourceforge.net/io/aio.html
There is work to put true Posix AIO on top of this.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
2006-04-28 17:41 ` Stephen Hemminger
@ 2006-04-28 17:55 ` Evgeniy Polyakov
2006-04-28 19:16 ` David S. Miller
0 siblings, 1 reply; 26+ messages in thread
From: Evgeniy Polyakov @ 2006-04-28 17:55 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Caitlin Bestler, David S. Miller, kelly, rusty, netdev
On Fri, Apr 28, 2006 at 10:41:18AM -0700, Stephen Hemminger (shemminger@osdl.org) wrote:
> On Fri, 28 Apr 2006 21:29:32 +0400
> Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
>
> > On Fri, Apr 28, 2006 at 10:18:33AM -0700, Stephen Hemminger (shemminger@osdl.org) wrote:
> > > Please just use existing AIO interface. We don't need another
> > > interface. The number of interfaces increases the exposed bug
> > > surface geometrically. Which means for each new interface, it
> > > means testing and fixing bugs in every possible usage.
> >
> > Networking AIO? Like [1] :)
> > That would be really good.
> >
> > 1. http://tservice.net.ru/~s0mbre/old/?section=projects&item=naio
> >
>
> The existing infrastructure is there in the syscall layer, it just
> isn't really AIO for sockets. That naio project has two problems, first
> they require driver changes, and he is doing it on the stupidest
> of hardware, optimizing a 8139too is foolish.
No, it does not. You confuse it with receiving zero-copy support which
allows to DMA data directly into VFS cache [1].
NAIO works for any kind of hardware and was tested with e1000 and showed
noticeble win in both CPU usage and network performance.
> Second, introducing
> kevents, seems unnecessary and hasn't been accepted in the mainline.
kevent was never sent to lkml@ although it showed over 40% win over epoll for
test web server. Sending it to lkml@ is just jumping into ... not into
technical world, so I posted it first here, but without much attention
though.
> The existing linux AIO model seems sufficient:
> http://lse.sourceforge.net/io/aio.html
>
> There is work to put true Posix AIO on top of this.
There are a lot of discussions about combining AIO with epoll and
combine them into something similar to kevent which allows to monitor
level and edge triggered events, to create proper state machine for AIO
compeltions. kevent [2] does exactly that. AIO works not as state
machine, but it's repeated-check design is more like postponing work
from one context to special thread.
1. receiving zero-copy support
http://tservice.net.ru/~s0mbre/old/?section=projects&item=recv_zero_copy
2. kevent system
http://tservice.net.ru/~s0mbre/old/?section=projects&item=kevent
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
2006-04-28 17:55 ` Evgeniy Polyakov
@ 2006-04-28 19:16 ` David S. Miller
2006-04-28 19:49 ` Stephen Hemminger
2006-04-28 19:52 ` [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch Evgeniy Polyakov
0 siblings, 2 replies; 26+ messages in thread
From: David S. Miller @ 2006-04-28 19:16 UTC (permalink / raw)
To: johnpol; +Cc: shemminger, caitlinb, kelly, rusty, netdev
From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Fri, 28 Apr 2006 21:55:39 +0400
> On Fri, Apr 28, 2006 at 10:41:18AM -0700, Stephen Hemminger (shemminger@osdl.org) wrote:
> > Second, introducing
> > kevents, seems unnecessary and hasn't been accepted in the mainline.
>
> kevent was never sent to lkml@ although it showed over 40% win over epoll for
> test web server. Sending it to lkml@ is just jumping into ... not into
> technical world, so I posted it first here, but without much attention
> though.
Frankly I found kevents to be a very strong idea.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
2006-04-28 19:16 ` David S. Miller
@ 2006-04-28 19:49 ` Stephen Hemminger
2006-04-28 19:59 ` Evgeniy Polyakov
2006-04-28 19:52 ` [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch Evgeniy Polyakov
1 sibling, 1 reply; 26+ messages in thread
From: Stephen Hemminger @ 2006-04-28 19:49 UTC (permalink / raw)
To: David S. Miller; +Cc: johnpol, caitlinb, kelly, rusty, netdev
On Fri, 28 Apr 2006 12:16:36 -0700 (PDT)
"David S. Miller" <davem@davemloft.net> wrote:
> From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
> Date: Fri, 28 Apr 2006 21:55:39 +0400
>
> > On Fri, Apr 28, 2006 at 10:41:18AM -0700, Stephen Hemminger (shemminger@osdl.org) wrote:
> > > Second, introducing
> > > kevents, seems unnecessary and hasn't been accepted in the mainline.
> >
> > kevent was never sent to lkml@ although it showed over 40% win over epoll for
> > test web server. Sending it to lkml@ is just jumping into ... not into
> > technical world, so I posted it first here, but without much attention
> > though.
>
> Frankly I found kevents to be a very strong idea.
But there is this huge semantic overload of kevent, poll, epoll, aio,
regular sendmsg/recv, posix aio, etc.
Perhaps a clean break with the socket interface is needed. Otherwise, there
are nasty complications with applications that mix old socket calls and new interface
on the same connection.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
2006-04-28 19:49 ` Stephen Hemminger
@ 2006-04-28 19:59 ` Evgeniy Polyakov
2006-04-28 22:00 ` David S. Miller
0 siblings, 1 reply; 26+ messages in thread
From: Evgeniy Polyakov @ 2006-04-28 19:59 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David S. Miller, caitlinb, kelly, rusty, netdev
On Fri, Apr 28, 2006 at 12:49:15PM -0700, Stephen Hemminger (shemminger@osdl.org) wrote:
> But there is this huge semantic overload of kevent, poll, epoll, aio,
> regular sendmsg/recv, posix aio, etc.
>
> Perhaps a clean break with the socket interface is needed. Otherwise, there
> are nasty complications with applications that mix old socket calls and new interface
> on the same connection.
kevent can be used as poll without any changes to the socket code.
There are two types of network related kevents - socket events
(recv/send/accept) and network aio, which can be turned completely off
in config.
There are following events which are supported by kevent:
o usual poll/select notifications
o inode notifications (create/remove)
o timer notifications
o socket notifications (send/recv/accept)
o network aio system
o fs aio (project closed, aio_sendfile() is being developed instead)
Any of the above can be turned off by config option.
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
2006-04-28 19:59 ` Evgeniy Polyakov
@ 2006-04-28 22:00 ` David S. Miller
2006-04-29 13:54 ` Evgeniy Polyakov
[not found] ` <20060429124451.GA19810@2ka.mipt.ru>
0 siblings, 2 replies; 26+ messages in thread
From: David S. Miller @ 2006-04-28 22:00 UTC (permalink / raw)
To: johnpol; +Cc: shemminger, caitlinb, kelly, rusty, netdev
From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Fri, 28 Apr 2006 23:59:30 +0400
> kevent can be used as poll without any changes to the socket code.
> There are two types of network related kevents - socket events
> (recv/send/accept) and network aio, which can be turned completely off
> in config.
> There are following events which are supported by kevent:
> o usual poll/select notifications
> o inode notifications (create/remove)
> o timer notifications
> o socket notifications (send/recv/accept)
> o network aio system
> o fs aio (project closed, aio_sendfile() is being developed instead)
>
> Any of the above can be turned off by config option.
Feel free to post the current version of your kevent patch
here so we can discuss something concrete.
Maybe you have even some toy example user applications that
use kevent that people can look at too? That might help
in understanding how it's supposed to be used.
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
2006-04-28 22:00 ` David S. Miller
@ 2006-04-29 13:54 ` Evgeniy Polyakov
[not found] ` <20060429124451.GA19810@2ka.mipt.ru>
1 sibling, 0 replies; 26+ messages in thread
From: Evgeniy Polyakov @ 2006-04-29 13:54 UTC (permalink / raw)
To: David S. Miller; +Cc: shemminger, caitlinb, kelly, rusty, netdev
[-- Attachment #1: Type: text/plain, Size: 2358 bytes --]
On Fri, Apr 28, 2006 at 03:00:56PM -0700, David S. Miller (davem@davemloft.net) wrote:
> From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
> Date: Fri, 28 Apr 2006 23:59:30 +0400
>
> > kevent can be used as poll without any changes to the socket code.
> > There are two types of network related kevents - socket events
> > (recv/send/accept) and network aio, which can be turned completely off
> > in config.
> > There are following events which are supported by kevent:
> > o usual poll/select notifications
> > o inode notifications (create/remove)
> > o timer notifications
> > o socket notifications (send/recv/accept)
> > o network aio system
> > o fs aio (project closed, aio_sendfile() is being developed instead)
> >
> > Any of the above can be turned off by config option.
>
> Feel free to post the current version of your kevent patch
> here so we can discuss something concrete.
>
> Maybe you have even some toy example user applications that
> use kevent that people can look at too? That might help
> in understanding how it's supposed to be used.
There are several at project's homepage [1] and in archive [2]:
evserver_epoll.c - epoll-based web server (pure epoll)
evserver_kevent.c - kevent-based web server (socket notifications)
evserver_poll.c - web server which uses kevent-based poll (poll/select
notifications)
evtest.c - can wait for any type of events. It was used to test
timer notifications.
naio_recv.c/naio_send.c - network AIO sending and receiving benchmarks
(sync/async)
aio_sendfile.c - aio sendfile benchmark (sendfile/aio_sendfile). Kernel
implementation is not 100% ready, pages are only asynchronously propagated
into VFS cache, but are not sent yet.
There are also links to benchmark results, comparison with FreeBSD kqueue, some
conclusions on kevent homepage [1].
Network AIO [3] homepage also contains additional NAIO benchmarks with
some graphs.
1. kevent project home page.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=kevent
2. kevent archive
http://tservice.net.ru/~s0mbre/archive/kevent/
3. Network AIO
http://tservice.net.ru/~s0mbre/old/?section=projects&item=naio
Current development kevent patchset (against 2.6.15-rc7, but could be
applied against later trees too) attached gzipped, sory if you get this
twice.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
--
Evgeniy Polyakov
[-- Attachment #2: kevent_full.diff.gz --]
[-- Type: application/x-gunzip, Size: 23625 bytes --]
^ permalink raw reply [flat|nested] 26+ messages in thread[parent not found: <20060429124451.GA19810@2ka.mipt.ru>]
* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
[not found] ` <20060429124451.GA19810@2ka.mipt.ru>
@ 2006-05-01 21:32 ` David S. Miller
2006-05-02 7:08 ` Evgeniy Polyakov
2006-05-02 8:10 ` [1/1] Kevent subsystem Evgeniy Polyakov
0 siblings, 2 replies; 26+ messages in thread
From: David S. Miller @ 2006-05-01 21:32 UTC (permalink / raw)
To: johnpol; +Cc: shemminger, caitlinb, kelly, rusty, netdev
From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Sat, 29 Apr 2006 16:44:51 +0400
> Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
I understand how in some ways this is work in progress,
but direct calls into ext3 from the kevent code? I'd
like stuff like that cleaned up before reviewing :-)
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
2006-05-01 21:32 ` David S. Miller
@ 2006-05-02 7:08 ` Evgeniy Polyakov
2006-05-02 8:10 ` [1/1] Kevent subsystem Evgeniy Polyakov
1 sibling, 0 replies; 26+ messages in thread
From: Evgeniy Polyakov @ 2006-05-02 7:08 UTC (permalink / raw)
To: David S. Miller; +Cc: shemminger, caitlinb, kelly, rusty, netdev
On Mon, May 01, 2006 at 02:32:46PM -0700, David S. Miller (davem@davemloft.net) wrote:
> From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
> Date: Sat, 29 Apr 2006 16:44:51 +0400
>
> > Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
>
> I understand how in some ways this is work in progress,
> but direct calls into ext3 from the kevent code? I'd
> like stuff like that cleaned up before reviewing :-)
Well, this only requires per address space ->get_block() callback,
which is what ext3_get_block() is.
I will update and resend patchset today.
Thank you.
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 26+ messages in thread
* [1/1] Kevent subsystem.
2006-05-01 21:32 ` David S. Miller
2006-05-02 7:08 ` Evgeniy Polyakov
@ 2006-05-02 8:10 ` Evgeniy Polyakov
1 sibling, 0 replies; 26+ messages in thread
From: Evgeniy Polyakov @ 2006-05-02 8:10 UTC (permalink / raw)
To: David S. Miller; +Cc: shemminger, caitlinb, kelly, rusty, netdev
[-- Attachment #1: Type: text/plain, Size: 1007 bytes --]
Kevent subsystem incorporates several AIO/kqueue design notes and
ideas. Kevent can be used both for edge and level triggered notifications.
It supports:
o socket notifications (accept, receiving and sending)
o network AIO (aio_send(), aio_recv() and aio_sendfile()) [3]
o inode notifications (create/remove)
o generic poll()/select() notifications
o timer notifications
More info, design notes, benchmarks (web server based on epoll, kevent,
kevent_poll; naio_send() vs. send(), naio_recv() vs. recv() with different number
of sending/receiving users) can be found on project's homepage [1].
Userspace interface was greatly described in LWN article [2].
1. kevent homepage.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=kevent
2. LWN article about kevent.
http://lwn.net/Articles/172844/
3. Network AIO (aio_send(), aio_recv(), aio_sendfile()).
http://tservice.net.ru/~s0mbre/old/?section=projects&item=naio
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
--
Evgeniy Polyakov
[-- Attachment #2: kevent_full.diff.3.gz --]
[-- Type: application/x-gunzip, Size: 24072 bytes --]
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
2006-04-28 19:16 ` David S. Miller
2006-04-28 19:49 ` Stephen Hemminger
@ 2006-04-28 19:52 ` Evgeniy Polyakov
1 sibling, 0 replies; 26+ messages in thread
From: Evgeniy Polyakov @ 2006-04-28 19:52 UTC (permalink / raw)
To: David S. Miller; +Cc: shemminger, caitlinb, kelly, rusty, netdev
On Fri, Apr 28, 2006 at 12:16:36PM -0700, David S. Miller (davem@davemloft.net) wrote:
> From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
> Date: Fri, 28 Apr 2006 21:55:39 +0400
>
> > On Fri, Apr 28, 2006 at 10:41:18AM -0700, Stephen Hemminger (shemminger@osdl.org) wrote:
> > > Second, introducing
> > > kevents, seems unnecessary and hasn't been accepted in the mainline.
> >
> > kevent was never sent to lkml@ although it showed over 40% win over epoll for
> > test web server. Sending it to lkml@ is just jumping into ... not into
> > technical world, so I posted it first here, but without much attention
> > though.
>
> Frankly I found kevents to be a very strong idea.
Glad to hear this.
I probably should resend patches netdev@ and (mar my karma) send it to
lkml@...?
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
2006-04-28 17:18 ` Stephen Hemminger
2006-04-28 17:29 ` Evgeniy Polyakov
@ 2006-04-28 19:10 ` David S. Miller
2006-04-28 20:46 ` Brent Cook
1 sibling, 1 reply; 26+ messages in thread
From: David S. Miller @ 2006-04-28 19:10 UTC (permalink / raw)
To: shemminger; +Cc: caitlinb, johnpol, kelly, rusty, netdev
From: Stephen Hemminger <shemminger@osdl.org>
Date: Fri, 28 Apr 2006 10:18:33 -0700
> Please just use existing AIO interface.
I totally disagree, the existing AIO interface is garbage.
We need new APIs to do this right, to get the ring buffer
and the zero-copy'ness correct.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
2006-04-28 19:10 ` David S. Miller
@ 2006-04-28 20:46 ` Brent Cook
0 siblings, 0 replies; 26+ messages in thread
From: Brent Cook @ 2006-04-28 20:46 UTC (permalink / raw)
To: David S. Miller; +Cc: shemminger, caitlinb, johnpol, kelly, rusty, netdev
On Friday 28 April 2006 14:10, David S. Miller wrote:
> From: Stephen Hemminger <shemminger@osdl.org>
> Date: Fri, 28 Apr 2006 10:18:33 -0700
>
> > Please just use existing AIO interface.
>
> I totally disagree, the existing AIO interface is garbage.
>
> We need new APIs to do this right, to get the ring buffer
> and the zero-copy'ness correct.
> -
Heh, like PF_RING? Just mmap a socket and read out some structures?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
2006-04-28 17:02 [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch Caitlin Bestler
2006-04-28 17:18 ` Stephen Hemminger
@ 2006-04-28 17:25 ` Evgeniy Polyakov
2006-04-28 19:14 ` David S. Miller
1 sibling, 1 reply; 26+ messages in thread
From: Evgeniy Polyakov @ 2006-04-28 17:25 UTC (permalink / raw)
To: Caitlin Bestler; +Cc: David S. Miller, kelly, rusty, netdev
On Fri, Apr 28, 2006 at 10:02:10AM -0700, Caitlin Bestler (caitlinb@broadcom.com) wrote:
> If the app is really ready to use a modified interface we might as well
> just give them a QP/CQ interface. But I suppose "receive by pointer"
> interfaces don't really stretch the sockets interface all that badly.
> The key is that you have to decide how the buffer is released,
> is it the next call? Or a separate call? Does releasing buffer
> N+2 release buffers N and N+1? What you want to avoid
> is having to keep a scoreboard of which buffers have been
> released.
>
> But in context, header/data separation would allow in order
> packets to have the data be placed back to back, which
> could allow a single recv to report the payload of multiple
> successive TCP segments. So the benefit of header/data
> separation remains the same, and I still say it's a optimization
> that should not be made a requirement. The benefits of vj_channels
> exist even without them. When the packet classifier runs on the
> host, header/data separation would not be free. I want to enable
> hardware offloads, not make the kernel bend over backwards
> to emulate how hardware would work. I'm just hoping that we
> can agree to let hardware do its work without being forced to
> work the same way the kernel does (i.e., running down a long
> list of arbitrary packet filter rules on a per packet basis).
I see your point, and respectfully disagree.
The more complex userspace interface we create the less users it will
have. It is completely unconvenient to read 100 bytes and receive only
80, since 20 were eaten by header. And what if we need only 20, but
packet contains 100, introduce per packet head pointer?
For purpose of benchmarking it works perfectly -
read the whole packet, one can event touch that data to emulate real
work, but for the real world it becomes practically unusabl.
But what we are talking about right now is a research project, not
production system, so we can create any interface we like since the main
goal, IMHO, is searching for the bottlenecks in the current stack and
ways of it's removal even by introducing new complex interface.
I would definitely like to see how your approach works for some
kind of real workloads and does it allow to
create faster and generally better systems.
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
2006-04-28 17:25 ` Evgeniy Polyakov
@ 2006-04-28 19:14 ` David S. Miller
0 siblings, 0 replies; 26+ messages in thread
From: David S. Miller @ 2006-04-28 19:14 UTC (permalink / raw)
To: johnpol; +Cc: caitlinb, kelly, rusty, netdev
From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Fri, 28 Apr 2006 21:25:36 +0400
> The more complex userspace interface we create the less users it will
> have. It is completely unconvenient to read 100 bytes and receive only
> 80, since 20 were eaten by header.
These bytes are charged to socket anyways, and allowing the
headers to be there is the only clean way to finesse the whole
zero-copy problem.
User can manage his data any way he likes. He can decide to take
advantage of the zero-copy layout we've provided, or he can copy
to put things into a format he is more happy with at the cost
of the copy.
^ permalink raw reply [flat|nested] 26+ messages in thread
* [1/1] Kevent subsystem.
@ 2006-06-22 17:14 Evgeniy Polyakov
2006-06-22 19:01 ` James Morris
` (2 more replies)
0 siblings, 3 replies; 26+ messages in thread
From: Evgeniy Polyakov @ 2006-06-22 17:14 UTC (permalink / raw)
To: David Miller; +Cc: netdev
[-- Attachment #1: Type: text/plain, Size: 1157 bytes --]
Hello.
Kevent subsystem incorporates several AIO/kqueue design notes and ideas.
Kevent can be used both for edge and level notifications. It supports
socket notifications, network AIO (aio_send(), aio_recv() and
aio_sendfile()), inode notifications (create/remove),
generic poll()/select() notifications and timer notifications.
It was tested against FreeBSD kqueue and Linux epoll and showed
noticeble performance win.
Network asynchronous IO operations were tested against Linux synchronous
socket code and showed noticeble performance win.
Patch against linux-2.6.17-git tree attached (gzipped).
I would like to hear some comments about the overall design,
implementation and plans about it's usefullness for generic kernel.
Design notes, patches, userspace application and perfomance tests can be
found at project's homepages.
1. Kevent subsystem.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=kevent
2. Network AIO.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=naio
3. LWN article about kevent.
http://lwn.net/Articles/172844/
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Thank you.
--
Evgeniy Polyakov
[-- Attachment #2: kevent-2.6.17-git.diff.gz --]
[-- Type: application/x-gunzip, Size: 24054 bytes --]
^ permalink raw reply [flat|nested] 26+ messages in thread* Re: [1/1] Kevent subsystem.
2006-06-22 17:14 [1/1] Kevent subsystem Evgeniy Polyakov
@ 2006-06-22 19:01 ` James Morris
2006-06-23 5:54 ` Evgeniy Polyakov
2006-06-22 19:53 ` Robert Iakobashvili
2006-06-23 6:12 ` YOSHIFUJI Hideaki / 吉藤英明
2 siblings, 1 reply; 26+ messages in thread
From: James Morris @ 2006-06-22 19:01 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: David Miller, netdev
On Thu, 22 Jun 2006, Evgeniy Polyakov wrote:
> Patch against linux-2.6.17-git tree attached (gzipped).
> I would like to hear some comments about the overall design,
> implementation and plans about it's usefullness for generic kernel.
Please send patches as in-line ascii text, along with documentation.
If they're too big, split them up logically into smaller pieces.
- James
--
James Morris
<jmorris@namei.org>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [1/1] Kevent subsystem.
2006-06-22 19:01 ` James Morris
@ 2006-06-23 5:54 ` Evgeniy Polyakov
0 siblings, 0 replies; 26+ messages in thread
From: Evgeniy Polyakov @ 2006-06-23 5:54 UTC (permalink / raw)
To: James Morris; +Cc: David Miller, netdev
On Thu, Jun 22, 2006 at 03:01:24PM -0400, James Morris (jmorris@namei.org) wrote:
> On Thu, 22 Jun 2006, Evgeniy Polyakov wrote:
>
> > Patch against linux-2.6.17-git tree attached (gzipped).
> > I would like to hear some comments about the overall design,
> > implementation and plans about it's usefullness for generic kernel.
>
> Please send patches as in-line ascii text, along with documentation.
>
>
> If they're too big, split them up logically into smaller pieces.
Hmm... It is too big (>100kb) to send as plain text.
I will try to split it to several pieces and resend in a couple of
moments, although they are quite strongly logically connected.
> - James
> --
> James Morris
> <jmorris@namei.org>
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [1/1] Kevent subsystem.
2006-06-22 17:14 [1/1] Kevent subsystem Evgeniy Polyakov
2006-06-22 19:01 ` James Morris
@ 2006-06-22 19:53 ` Robert Iakobashvili
2006-06-23 5:50 ` Evgeniy Polyakov
2006-06-23 6:12 ` YOSHIFUJI Hideaki / 吉藤英明
2 siblings, 1 reply; 26+ messages in thread
From: Robert Iakobashvili @ 2006-06-22 19:53 UTC (permalink / raw)
To: Evgeniy Polyakov; +Cc: netdev
Evgeniy,
On 6/22/06, Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> Kevent subsystem incorporates several AIO/kqueue design notes and ideas.
> Kevent can be used both for edge and level notifications. It supports
> socket notifications, network AIO (aio_send(), aio_recv() and
> aio_sendfile()), inode notifications (create/remove),
> generic poll()/select() notifications and timer notifications.
Great job!
Smooth integration with userland asynch POSIX frameworks
(e.g. ACE POSIX_Proactor) may require syscalls (or their emulation)
with POSIX interface:
* POSIX_API
* aio_read
* aio_write
* aio_suspend
* aio_error
* aio_return
* aio_cancel
where aio_suspend is very important.
--
Sincerely,
------------------------------------------------------------------
Robert Iakobashvili, coroberti at gmail dot com
Navigare necesse est, vivere non est necesse.
------------------------------------------------------------------
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [1/1] Kevent subsystem.
2006-06-22 19:53 ` Robert Iakobashvili
@ 2006-06-23 5:50 ` Evgeniy Polyakov
0 siblings, 0 replies; 26+ messages in thread
From: Evgeniy Polyakov @ 2006-06-23 5:50 UTC (permalink / raw)
To: Robert Iakobashvili; +Cc: netdev
On Thu, Jun 22, 2006 at 09:53:38PM +0200, Robert Iakobashvili (coroberti@gmail.com) wrote:
> Evgeniy,
>
> On 6/22/06, Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
>
> >Kevent subsystem incorporates several AIO/kqueue design notes and ideas.
> >Kevent can be used both for edge and level notifications. It supports
> >socket notifications, network AIO (aio_send(), aio_recv() and
> >aio_sendfile()), inode notifications (create/remove),
> >generic poll()/select() notifications and timer notifications.
>
> Great job!
> Smooth integration with userland asynch POSIX frameworks
> (e.g. ACE POSIX_Proactor) may require syscalls (or their emulation)
> with POSIX interface:
>
> * POSIX_API
> * aio_read
> * aio_write
> * aio_suspend
> * aio_error
> * aio_return
> * aio_cancel
>
> where aio_suspend is very important.
I've designed network AIO differently as how posix declared it's
functions. POSIX AIO is like rocket science - while one tries to
implement them all there is no time to do the real work.
Although all of them including suspend can be emulated on top of
existing calls to kevent.
> --
> Sincerely,
> ------------------------------------------------------------------
> Robert Iakobashvili, coroberti at gmail dot com
> Navigare necesse est, vivere non est necesse.
> ------------------------------------------------------------------
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [1/1] Kevent subsystem.
2006-06-22 17:14 [1/1] Kevent subsystem Evgeniy Polyakov
2006-06-22 19:01 ` James Morris
2006-06-22 19:53 ` Robert Iakobashvili
@ 2006-06-23 6:12 ` YOSHIFUJI Hideaki / 吉藤英明
2006-06-23 6:14 ` David Miller
2 siblings, 1 reply; 26+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2006-06-23 6:12 UTC (permalink / raw)
To: johnpol; +Cc: davem, netdev, yoshfuji
In article <20060622171436.GA26161@2ka.mipt.ru> (at Thu, 22 Jun 2006 21:14:37 +0400), Evgeniy Polyakov <johnpol@2ka.mipt.ru> says:
> Patch against linux-2.6.17-git tree attached (gzipped).
> I would like to hear some comments about the overall design,
> implementation and plans about it's usefullness for generic kernel.
>
> Design notes, patches, userspace application and perfomance tests can be
> found at project's homepages.
| diff --git a/arch/i386/kernel/syscall_table.S b/arch/i386/kernel/syscall_table.S
| index af56987..93e23ff 100644
| --- a/arch/i386/kernel/syscall_table.S
| +++ b/arch/i386/kernel/syscall_table.S
| @@ -316,3 +316,7 @@ ENTRY(sys_call_table)
| .long sys_sync_file_range
| .long sys_tee /* 315 */
| .long sys_vmsplice
| + .long sys_aio_recv
| + .long sys_aio_send
| + .long sys_aio_sendfile
| + .long sys_kevent_ctl
:
We do not waste syscall entries; we can probably use socketcall
for sys_aio_recv, sys_aio_send (and maybe, sys_aio_sendfile).
| diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
| index 25ecc6e..05d7086 100644
| --- a/net/ipv4/tcp_ipv4.c
| +++ b/net/ipv4/tcp_ipv4.c
:
Please do for tcp_ipv6.c as well. Thank you.
--
YOSHIFUJI Hideaki @ USAGI Project <yoshfuji@linux-ipv6.org>
GPG-FP : 9022 65EB 1ECF 3AD1 0BDF 80D8 4807 F894 E062 0EEA
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [1/1] Kevent subsystem.
2006-06-23 6:12 ` YOSHIFUJI Hideaki / 吉藤英明
@ 2006-06-23 6:14 ` David Miller
2006-06-23 6:18 ` YOSHIFUJI Hideaki / 吉藤英明
0 siblings, 1 reply; 26+ messages in thread
From: David Miller @ 2006-06-23 6:14 UTC (permalink / raw)
To: yoshfuji; +Cc: johnpol, netdev
From: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date: Fri, 23 Jun 2006 15:12:44 +0900 (JST)
> We do not waste syscall entries; we can probably use socketcall
> for sys_aio_recv, sys_aio_send (and maybe, sys_aio_sendfile).
socketcall is deprecated and some architectures do not even
have a slot for it in their system call tables, one example
is x86_64
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [1/1] Kevent subsystem.
2006-06-23 6:14 ` David Miller
@ 2006-06-23 6:18 ` YOSHIFUJI Hideaki / 吉藤英明
0 siblings, 0 replies; 26+ messages in thread
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2006-06-23 6:18 UTC (permalink / raw)
To: davem; +Cc: johnpol, netdev, yoshfuji
In article <20060622.231407.59469243.davem@davemloft.net> (at Thu, 22 Jun 2006 23:14:07 -0700 (PDT)), David Miller <davem@davemloft.net> says:
> From: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
> Date: Fri, 23 Jun 2006 15:12:44 +0900 (JST)
>
> > We do not waste syscall entries; we can probably use socketcall
> > for sys_aio_recv, sys_aio_send (and maybe, sys_aio_sendfile).
>
> socketcall is deprecated and some architectures do not even
> have a slot for it in their system call tables, one example
> is x86_64
Ah, okay. Thanks.
--yoshfuji
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2006-06-23 6:17 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-28 17:02 [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch Caitlin Bestler
2006-04-28 17:18 ` Stephen Hemminger
2006-04-28 17:29 ` Evgeniy Polyakov
2006-04-28 17:41 ` Stephen Hemminger
2006-04-28 17:55 ` Evgeniy Polyakov
2006-04-28 19:16 ` David S. Miller
2006-04-28 19:49 ` Stephen Hemminger
2006-04-28 19:59 ` Evgeniy Polyakov
2006-04-28 22:00 ` David S. Miller
2006-04-29 13:54 ` Evgeniy Polyakov
[not found] ` <20060429124451.GA19810@2ka.mipt.ru>
2006-05-01 21:32 ` David S. Miller
2006-05-02 7:08 ` Evgeniy Polyakov
2006-05-02 8:10 ` [1/1] Kevent subsystem Evgeniy Polyakov
2006-04-28 19:52 ` [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch Evgeniy Polyakov
2006-04-28 19:10 ` David S. Miller
2006-04-28 20:46 ` Brent Cook
2006-04-28 17:25 ` Evgeniy Polyakov
2006-04-28 19:14 ` David S. Miller
-- strict thread matches above, loose matches on Subject: below --
2006-06-22 17:14 [1/1] Kevent subsystem Evgeniy Polyakov
2006-06-22 19:01 ` James Morris
2006-06-23 5:54 ` Evgeniy Polyakov
2006-06-22 19:53 ` Robert Iakobashvili
2006-06-23 5:50 ` Evgeniy Polyakov
2006-06-23 6:12 ` YOSHIFUJI Hideaki / 吉藤英明
2006-06-23 6:14 ` David Miller
2006-06-23 6:18 ` YOSHIFUJI Hideaki / 吉藤英明
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).