From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
To: David Miller <davem@davemloft.net>
Cc: rdreier@cisco.com, ak@suse.de, tom@opengridcomputing.com,
netdev@vger.kernel.org, akpm@osdl.org
Subject: Re: RDMA will be reverted
Date: Tue, 25 Jul 2006 09:51:28 +0400
Message-ID: <20060725055127.GA5103@2ka.mipt.ru>
In-Reply-To: <20060724.150613.54186472.davem@davemloft.net>
On Mon, Jul 24, 2006 at 03:06:13PM -0700, David Miller (davem@davemloft.net) wrote:
> Don't get too excited about VJ netchannels, more and more roadblocks
> to their practicality are being found every day.
>
> For example, my idea to allow ESTABLISHED TCP socket demux to be done
> before netfilter is flawed. Connection tracking and NAT can change
> the packet ID and loop it back to us to hit exactly an ESTABLISHED TCP
> socket, therefore we must always hit netfilter first.

There is no problem with netfilter and process-context processing. When an
skb has been removed from the hardware list/array and is being processed by
netfilter in a netchannel (or in process context in general), nothing breaks
if the modified skb is rerouted into a different queue and state.

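The rerouting argument above can be sketched as a small userspace C simulation: the packet is dequeued in process context, a netfilter-like hook runs first, and if the hook changed the flow ID the packet is demuxed again and requeued on whichever channel now owns it. All names here (struct flow, struct pkt, struct channel, nat_hook(), netchannel_process()) are hypothetical and are not kernel APIs; the "NAT" is a single hard-coded port rewrite assumed purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow identifier: the TCP/IPv4 4-tuple. */
struct flow { uint32_t saddr, daddr; uint16_t sport, dport; };

/* Hypothetical stand-in for an skb: only the flow it belongs to. */
struct pkt { struct flow id; };

#define NCHAN 4
#define QLEN  8

/* Hypothetical netchannel: the flow it serves plus a small FIFO. */
struct channel { struct flow id; struct pkt q[QLEN]; int head, tail; };

static struct channel chans[NCHAN];

static int flow_eq(const struct flow *a, const struct flow *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

/* Demux by flow ID (linear scan; a real demux would use a hash). */
static struct channel *chan_lookup(const struct flow *id)
{
    for (int i = 0; i < NCHAN; i++)
        if (flow_eq(&chans[i].id, id))
            return &chans[i];
    return NULL;
}

/* Append a packet to a channel's FIFO; -1 if the FIFO is full. */
static int chan_push(struct channel *c, struct pkt p)
{
    assert(c != NULL);
    if (c->tail - c->head == QLEN)
        return -1;
    c->q[c->tail++ % QLEN] = p;
    return 0;
}

/* Hypothetical netfilter-like hook: a DNAT rule rewriting destination
 * port 8080 to 80. Returns 1 if the flow ID was changed. */
static int nat_hook(struct pkt *p)
{
    if (p->id.dport == 8080) {
        p->id.dport = 80;
        return 1;
    }
    return 0;
}

/* Process-context step: run the hook first, then re-demux on the
 * possibly rewritten flow ID. Returns the channel that now owns the
 * packet, or NULL if nothing matches or the target FIFO is full (drop). */
static struct channel *netchannel_process(struct channel *c, struct pkt p)
{
    if (nat_hook(&p)) {
        struct channel *dst = chan_lookup(&p.id);
        if (dst == NULL || chan_push(dst, p) < 0)
            return NULL;
        return dst;              /* rerouted to another queue */
    }
    return c;                    /* unchanged: caller keeps processing on c */
}
```

For example, with chans[0] serving an ESTABLISHED flow to port 80, a packet for port 8080 that first landed on some other channel is rewritten by nat_hook() and ends up queued on chans[0], which is exactly the "NAT loops it back to an ESTABLISHED socket" case from the quoted text.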
> All the original costs of route, netfilter, TCP socket lookup all
> reappear as we make VJ netchannels fit all the rules of real practical
> systems, eliminating their gains entirely. I will also note in
> passing that papers on related ideas, such as the Exokernel stuff, are
> very careful to not address the issue of how practical 1) their demux
> engine is and 2) the negative side effects of userspace TCP
> implementations. For an example of the latter, if you have some 1GB
> JAVA process you do not want to wake that monster up just to do some
> ACK processing or TCP window updates, yet if you don't you violate
> TCP's rules and risk spurious unnecessary retransmits.

I still plan to continue the userspace implementation.
If the gigantic-java-monster (tm) is going to read some data, it has
already been awakened, so it is in memory (with the linked TCP library),
and there is zero additional overhead.

> Furthermore, the VJ netchannel gains can be partially obtained from
> generic stateless facilities that we are going to get anyways.
> Networking chips supporting multiple MSI-X vectors, chosen by hashing
> the flow ID, can move TCP processing to "end nodes" which are cpu
> threads in this case, by having each such MSI-X vector target a
> different cpu thread.

And what if that CPU is very busy?
Linux would somehow have to tell the NIC which CPUs are valid targets
right now (and which may no longer be valid a second later), so the
scheduler must be tightly bound to the network internals.

Just my two cents.
--
Evgeniy Polyakov