From: David Miller <davem@davemloft.net>
To: rdreier@cisco.com
Cc: ak@suse.de, tom@opengridcomputing.com, netdev@vger.kernel.org,
akpm@osdl.org
Subject: Re: RDMA will be reverted
Date: Mon, 24 Jul 2006 15:06:13 -0700 (PDT)
Message-ID: <20060724.150613.54186472.davem@davemloft.net>
In-Reply-To: <adasllh9kj0.fsf@cisco.com>
From: Roland Dreier <rdreier@cisco.com>
Date: Tue, 04 Jul 2006 13:34:27 -0700
> Well, here's a quick overview, leaving out some of the details. The
> difference between TOE and iWARP/RDMA is really the interface that
> they present.
Thanks for the description Roland. It helps me understand the
situation better.
> The real issues for netdev are things like Steve Wise's patch to add
> route change notifiers, which could be used to tell RNICs when to
> update the next hop for a connection they're handling.
I'll probably put Steve's patches in soon.
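The route change notifier idea is essentially a publish/subscribe chain: the routing code announces that a next hop changed, and every registered RNIC driver updates the connections it is handling. Below is a minimal sketch in Python, assuming a plain callback list. `RouteChange`, `NotifierChain` and `FakeRnic` are hypothetical names for illustration, not the interfaces in Steve's patches, and prefix matching is omitted (entries are keyed by exact destination address).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RouteChange:
    """Hypothetical event: the next hop toward `dst` has changed."""
    dst: str
    new_next_hop: str


class NotifierChain:
    """Minimal observer chain; callbacks run in registration order."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[RouteChange], None]] = []

    def register(self, cb: Callable[[RouteChange], None]) -> None:
        self._callbacks.append(cb)

    def unregister(self, cb: Callable[[RouteChange], None]) -> None:
        self._callbacks.remove(cb)

    def notify(self, event: RouteChange) -> None:
        # Iterate over a copy so a callback may unregister itself safely.
        for cb in list(self._callbacks):
            cb(event)


@dataclass
class FakeRnic:
    """Hypothetical RNIC driver state: next hop per offloaded destination."""
    next_hop: Dict[str, str] = field(default_factory=dict)

    def on_route_change(self, event: RouteChange) -> None:
        # Only destinations the adapter is actually handling need updating.
        if event.dst in self.next_hop:
            self.next_hop[event.dst] = event.new_next_hop


chain = NotifierChain()
rnic = FakeRnic({"10.0.0.5": "192.168.1.1"})
chain.register(rnic.on_route_change)
chain.notify(RouteChange("10.0.0.5", "192.168.1.254"))
print(rnic.next_hop["10.0.0.5"])  # 192.168.1.254
```

The point of the callback design is that the RNIC never has to poll the routing table; it only reacts when the kernel tells it something relevant changed.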
> More generally, it would be interesting to see if it's possible to
> tie an RNIC into the kernel's packet filtering, so that disallowed
> connections don't get set up. This seems very similar in spirit to
> the problems around packet filtering that were raised for VJ
> netchannels.
Don't get too excited about VJ netchannels; more and more roadblocks
to their practicality are being found every day.
For example, my idea to allow ESTABLISHED TCP socket demux to be done
before netfilter is flawed. Connection tracking and NAT can change
the packet's identity (its addresses and ports) and loop it back to us
so that it hits exactly an ESTABLISHED TCP socket; therefore we must
always run netfilter first.
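A toy model of why demux before netfilter is unsafe: when a NAT rule rewrites the destination, a socket lookup on the on-the-wire 4-tuple misses the ESTABLISHED socket that the rewritten packet actually belongs to. The `Flow` type, the addresses and the `nat_prerouting()` DNAT rule are all hypothetical stand-ins for connection tracking and NAT.

```python
from typing import Dict, NamedTuple, Optional


class Flow(NamedTuple):
    saddr: str
    sport: int
    daddr: str
    dport: int


# ESTABLISHED sockets, keyed by the 4-tuple after any NAT rewrite.
established: Dict[Flow, str] = {
    Flow("203.0.113.7", 40000, "10.0.0.2", 8080): "sock_A",
}


def nat_prerouting(flow: Flow) -> Flow:
    """Hypothetical DNAT rule: 198.51.100.1:80 -> 10.0.0.2:8080."""
    if flow.daddr == "198.51.100.1" and flow.dport == 80:
        return flow._replace(daddr="10.0.0.2", dport=8080)
    return flow


def demux(flow: Flow) -> Optional[str]:
    """Exact-match socket lookup on the 4-tuple."""
    return established.get(flow)


incoming = Flow("203.0.113.7", 40000, "198.51.100.1", 80)

early = demux(incoming)                  # demux before netfilter: misses
late = demux(nat_prerouting(incoming))   # netfilter first: finds sock_A
print(early, late)                       # None sock_A
```

The same ordering problem also works in reverse: a rewritten or looped-back packet could match a different socket than its wire headers suggest, so the lookup is only correct after netfilter has had its say.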
All the original costs of route lookup, netfilter, and TCP socket
lookup reappear as we make VJ netchannels fit all the rules of real,
practical systems, eliminating their gains entirely. I will also note
in passing that papers on related ideas, such as the Exokernel stuff,
are very careful not to address 1) how practical their demux engine
is, or 2) the negative side effects of userspace TCP implementations.
As an example of the latter, if you have a 1GB Java process, you do
not want to wake that monster up just to do some ACK processing or TCP
window updates, yet if you don't, you violate TCP's rules and risk
spurious retransmits.
Furthermore, some of the VJ netchannel gains can be obtained from
generic stateless facilities that we are going to get anyway.
Networking chips supporting multiple MSI-X vectors, chosen by hashing
the flow ID, can move TCP processing to "end nodes", which in this
case are CPU threads, by having each such MSI-X vector target a
different CPU thread.
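A sketch of the stateless flow steering described above: hash the flow ID, use the hash to pick an MSI-X vector, and bind each vector to a different CPU thread. It uses the Toeplitz hash that RSS-capable NICs commonly implement. The key bytes, the assumed field order (source address, destination address, source port, destination port) and the `pick_vector()` modulo mapping are illustrative assumptions, not any particular chip's behavior.

```python
import ipaddress

# Example 40-byte hash key (illustrative; a driver programs its own key).
RSS_KEY = bytes([
    0x6D, 0x5A, 0x56, 0xDA, 0x25, 0x5B, 0x0E, 0xC2,
    0x41, 0x67, 0x25, 0x3D, 0x43, 0xA3, 0x8F, 0xB0,
    0xD0, 0xCA, 0x2B, 0xCB, 0xAE, 0x7B, 0x30, 0xB4,
    0x77, 0xCB, 0x2D, 0xA3, 0x80, 0x30, 0xF2, 0x0C,
    0x6A, 0x42, 0xB7, 0x3B, 0xBE, 0xAC, 0x01, 0xFA,
])


def toeplitz_hash(data: bytes, key: bytes = RSS_KEY) -> int:
    """32-bit Toeplitz hash: for every set input bit i, XOR in key bits i..i+31."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if len(data) * 8 > key_bits - 32:
        raise ValueError("input is too long for this key")
    result = 0
    for i in range(len(data) * 8):
        # Bit i of the input, counting from the MSB of the first byte.
        if (data[i // 8] >> (7 - i % 8)) & 1:
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def flow_bytes(saddr: str, daddr: str, sport: int, dport: int) -> bytes:
    """IPv4 TCP flow ID: src addr, dst addr, src port, dst port (assumed order)."""
    return (ipaddress.IPv4Address(saddr).packed
            + ipaddress.IPv4Address(daddr).packed
            + sport.to_bytes(2, "big")
            + dport.to_bytes(2, "big"))


def pick_vector(flow: bytes, n_vectors: int) -> int:
    """Simplified: real NICs index an indirection table with the hash's low bits."""
    return toeplitz_hash(flow) % n_vectors


flow = flow_bytes("66.9.149.187", "161.142.100.80", 2794, 1766)
print(pick_vector(flow, 8))  # same flow -> same vector -> same CPU, every time
```

Because the hash depends only on packet headers, every packet of a flow lands on the same vector and hence the same CPU, with no per-flow state kept in the chip or the kernel.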
The good news is that we've survived a long time without revolutions
like VJ netchannels, and the existing TCP stack can be improved
dramatically, in ways whose benefits people will see much sooner. For
example, Alexey Kuznetsov and I have some ideas on how to make the
most expensive TCP function for a sender, tcp_ack(), more efficient by
using different data structures for the retransmit queue and for the
loss/recovery packet SACK state.
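The specific data structures are not spelled out here, so the following is only a hypothetical illustration of the general direction: keep the retransmit queue ordered by sequence number so that a SACK block can be located with a binary search instead of a linear walk from the head. `Segment` and `RetransmitQueue` are made-up names, not kernel structures, and sequence number wraparound is ignored.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: int            # first sequence number in the segment
    end: int              # one past the last sequence number
    sacked: bool = False


class RetransmitQueue:
    """Segments kept sorted by start sequence (wraparound ignored)."""

    def __init__(self) -> None:
        self._starts: List[int] = []
        self._segs: List[Segment] = []

    def append(self, seg: Segment) -> None:
        # New data is always sent at the tail, so appending keeps the order.
        self._starts.append(seg.start)
        self._segs.append(seg)

    def cumulative_ack(self, ack: int) -> int:
        """Drop fully acked segments from the head; return how many were freed."""
        n = 0
        while n < len(self._segs) and self._segs[n].end <= ack:
            n += 1
        del self._starts[:n]
        del self._segs[:n]
        return n

    def mark_sacked(self, left: int, right: int) -> int:
        """Mark segments lying entirely in [left, right); bisect finds the first."""
        i = bisect.bisect_left(self._starts, left)
        marked = 0
        while i < len(self._segs) and self._segs[i].end <= right:
            if not self._segs[i].sacked:
                self._segs[i].sacked = True
                marked += 1
            i += 1
        return marked


q = RetransmitQueue()
for s in range(1000, 6000, 1000):
    q.append(Segment(s, s + 1000))
print(q.mark_sacked(3000, 5000))  # 2
print(q.cumulative_ack(3000))     # 2
```

A Python list makes the lookup logarithmic but head removal linear; in a kernel, a balanced tree keyed by sequence number would make both operations logarithmic, which is the kind of trade-off such a redesign of tcp_ack() would be weighing.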