From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
To: "David S. Miller" <davem@davemloft.net>
Cc: kelly@au1.ibm.com, rusty@rustcorp.com.au, netdev@vger.kernel.org
Subject: Re: [PATCH 1/3] Rough VJ Channel Implementation - vj_core.patch
Date: Fri, 28 Apr 2006 10:05:03 +0400
Message-ID: <20060428060503.GC17360@2ka.mipt.ru>
In-Reply-To: <20060427.130918.65400512.davem@davemloft.net>
On Thu, Apr 27, 2006 at 01:09:18PM -0700, David S. Miller (davem@davemloft.net) wrote:
> Evgeniy, the difference between this and your work is that you did not
> have an intelligent piece of hardware that could be told to recognize
> flows, and only put packets for a specific flow into that's flow's
> buffer pool.
There are also the most "intelligent" NICs, which use MMIO copy, like the
Realtek 8139 :), and those were used in the receiving zero-copy [1] project.
A special algorithm was researched for receiving zero-copy [1] to allow
putting non-page-aligned TCP frames into pages, but there was another
problem when a page was committed, since no byte-granular commit is allowed in VFS.
In this case we do not have that problem, but instead we must force userspace to
be very smart when dealing with mapped buffers, instead of calling a simple recv().
And for sending it must be even smarter, since data must be properly
aligned. And what about crappy hardware which can DMA only into a limited
memory area, or a NIC that cannot do sg? Or do we need remapping for a NIC
that cannot do checksum calculation?
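To make the "smart userspace" point concrete, here is a minimal sketch of
what a reader of a mapped buffer would have to do where recv() used to be
enough. Everything in it is an assumption for illustration only: the
struct chan_desc layout, the SLOT_SIZE value and the chan_read() helper are
hypothetical and are not taken from any patch in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-packet descriptor at the start of each mapped slot. */
struct chan_desc {
	uint32_t data_off;	/* payload offset from the start of the slot */
	uint32_t data_len;	/* payload length in bytes */
};

#define SLOT_SIZE 2048u		/* assumed size of one mapped slot */

/*
 * Copy the payload out of one mapped slot into dst.
 * Returns the number of bytes copied, or -1 if the descriptor is
 * inconsistent or dst is too small.
 */
static long chan_read(const uint8_t *slot, void *dst, size_t dst_len)
{
	struct chan_desc d;

	assert(slot != NULL && dst != NULL);

	/* Take a private copy so later checks and use see the same values. */
	memcpy(&d, slot, sizeof(d));

	/* Userspace has to validate what recv() would have hidden from it. */
	if (d.data_off < sizeof(d) || d.data_off > SLOT_SIZE)
		return -1;
	if (d.data_len > SLOT_SIZE - d.data_off || d.data_len > dst_len)
		return -1;

	/* The payload may start at any offset, so do not assume alignment. */
	memcpy(dst, slot + d.data_off, d.data_len);
	return (long)d.data_len;
}
```

Even this trivial reader has to carry offset handling, bounds checks and
alignment care, and it does not yet cover the sending side, buffer reuse or
several threads touching the same area.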
> > If we want to dma data from nic into premapped userspace area, this will
> > strike with message sizes/misalignment/slow read and so on, so
> > preallocation has even more problems.
>
> I do not really think this is an issue, we put the full packet into
> user space and teach it where the offset is to the actual data.
> We'll do the same things we do today to try and get the data area
> aligned. User can do whatever is logical and relevant on his end
> to deal with strange cases.
>
> In fact we can specify that card has to take some care to get data
> area of packet aligned on say an 8 byte boundary or something like
> that. When we don't have hardware assist, we are going to be doing
> copies.
Userspace would have to be very smart, and as we saw with various Java tests,
it is not that smart even now.
And what if pages are shared and several threads are trying to write
into the same remapped area? Will we use COW and be blamed like the Mach
and FreeBSD developers? :)
> > I do think that significant win in VJ's tests belongs not to remapping
> > and cache-oriented changes, but to move all protocol processing into
> > process' context.
>
> I partly disagree. The biggest win is eliminating all of the control
> overhead (all of "softint RX + protocol demux + IP route lookup +
> socket lookup" is turned into single flow demux), and the SMP safe
> data structure which makes it realistic enough to always move the bulk
> of the packet work to the socket's home cpu.
>
> I do not think userspace protocol implementation buys enough to
> justify it. We have to do the protection switch in and out of kernel
> space anyways, so why not still do the protected protocol processing
> work in the kernel? It is still being done on the user's behalf,
> contributes to his time slice, and avoids all of the terrible issues
> of userspace protocol implementations.
After the hard irq a softirq is scheduled, then later userspace is scheduled:
at least 2 context switches just to move a packet, and the "slow" userspace
code is interrupted by both irqs again...
I ran some tests on ppc32 embedded boards which showed that rescheduling
latency sometimes reaches milliseconds (with about 4 running processes
on a 200 MHz CPU). Although we do not have real-time requirements here,
it is not a good sign...
> And I also want to note that even if the whole idea explodes and
> cannot be made to work, there are good arguments for transitioning
> to SKB'less drivers for their own sake. So work will really not
> be lost.
>
> Let's have 100 different implementations of net channels! :-)
:)
--
Evgeniy Polyakov