From: Roland Dreier <rdreier@cisco.com>
To: Andi Kleen <ak@suse.de>
Cc: Tom Tucker <tom@opengridcomputing.com>,
David Miller <davem@davemloft.net>,
netdev@vger.kernel.org, akpm@osdl.org
Subject: Re: RDMA will be reverted
Date: Tue, 04 Jul 2006 13:34:27 -0700
Message-ID: <adasllh9kj0.fsf@cisco.com>
In-Reply-To: <200607011626.04539.ak@suse.de> (Andi Kleen's message of "Sat, 1 Jul 2006 16:26:04 +0200")
Andi> Perhaps a good start of that discussion David asked for
Andi> would be if you could give us an overview of the differences
Andi> and how you avoid the TOE problems.
Well, here's a quick overview, leaving out some of the details. The
difference between TOE and iWARP/RDMA is really the interface that
they present.
A TOE ("TCP Offload Engine") is a piece of hardware that offloads TCP
processing from the main system to handle regular sockets. There is
either some way to hand off a socket from the host stack to the TOE,
or a socket is created on the TOE to start with, but in both cases,
the TOE is handling processing for normal TCP sockets. This means
that the TOE has some hardware and/or firmware to do stateful TCP
processing.
An iWARP device, or RNIC (RDMA NIC), also usually has hardware and/or
firmware TCP processing, but this isn't exposed through the BSD socket
interface. Instead, an RNIC presents an interface more like an
InfiniBand HCA: work requests (sends, receives, RDMA operations) are
passed to the RNIC via work queues, and completion notification is
returned asynchronously via completion queues. An iWARP connection
can handle both send/receive ("two-sided") and get/put (RDMA or
"one-sided") operations.
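To make the work queue / completion queue model concrete, here is a rough toy model in Python. It only loosely mirrors the shape of the verbs interface (ibv_post_send() and ibv_poll_cq() in libibverbs). The class and method names (ToyRNIC, post_send, run, poll) are hypothetical, the receive queue is left out, and the "hardware" is just a run() method the test calls. The point is that posting a work request only enqueues it, and the caller learns the outcome later by polling a completion queue:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class WorkRequest:
    wr_id: int        # caller-chosen cookie, echoed back in the completion
    opcode: str       # e.g. "SEND", "RDMA_WRITE", "RDMA_READ"
    length: int = 0

@dataclass
class Completion:
    wr_id: int
    opcode: str
    status: str = "SUCCESS"

class CompletionQueue:
    def __init__(self):
        self._entries = deque()

    def push(self, wc):
        self._entries.append(wc)

    def poll(self, max_entries):
        """Non-blocking: return up to max_entries completions, possibly none."""
        out = []
        while self._entries and len(out) < max_entries:
            out.append(self._entries.popleft())
        return out

class ToyRNIC:
    """Hypothetical stand-in for the adapter: drains a send queue into a CQ."""
    def __init__(self, cq):
        self.cq = cq
        self.send_queue = deque()

    def post_send(self, wr):
        # Posting only enqueues the request; nothing has completed yet.
        self.send_queue.append(wr)

    def run(self, budget=1):
        """Simulate the device making progress on its own schedule."""
        for _ in range(min(budget, len(self.send_queue))):
            wr = self.send_queue.popleft()
            self.cq.push(Completion(wr.wr_id, wr.opcode))
```

Nothing shows up in the CQ until the "device" has processed the request, which is the asynchronous behaviour described above; a real consumer would poll in a loop or arm a completion event instead of calling run().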
For full details of the protocol used for this, you can look at the
drafts from the IETF RDDP working group, but basically an RDMA protocol
is layered on top of a connected stream protocol -- usually TCP, but
SCTP could be used as well.
A lot of the performance of iWARP comes from the RDMA/direct placement
capabilities -- for example an NFS/RDMA server can process requests
out of order and put data directly into the buffer that's waiting for
it, without using any CPU on the destination -- but even send/receive
operations can be useful.
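A small toy model of why direct placement lets data arrive out of order: each incoming segment names a registered buffer (by a tag) and an offset, so it can be written straight to its final location with no reassembly queue. This is only a sketch; the class names, the integer tag scheme, and the bounds checks are illustrative assumptions, not the wire format or API defined in the RDDP drafts:

```python
class ToyPlacementEngine:
    """Hypothetical model: tagged segments are written directly into
    pre-registered buffers, so arrival order does not matter."""

    def __init__(self):
        self._regions = {}
        self._next_tag = 0x100

    def register(self, size):
        """Receiver advertises a buffer; the returned tag is what the peer uses."""
        tag = self._next_tag
        self._next_tag += 1
        self._regions[tag] = bytearray(size)
        return tag

    def place(self, tag, offset, payload):
        """Write one tagged segment straight into its final location."""
        buf = self._regions.get(tag)
        if buf is None:
            raise KeyError(f"unknown tag {tag:#x}")
        end = offset + len(payload)
        if offset < 0 or end > len(buf):
            raise ValueError("segment falls outside the registered region")
        buf[offset:end] = payload

    def contents(self, tag):
        return bytes(self._regions[tag])
```

For example, delivering the segments (0, b"hell"), (4, b"o, w"), (8, b"orld") in reverse order still leaves b"hello, world" in the buffer, and no CPU work is needed on the receiving side beyond the initial registration.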
As a side note, an RNIC will also typically support the same sort of
kernel bypass as an IB HCA, where work queues can be safely mapped
into a userspace process's memory so that work requests can be posted
without a system call. In fact, people usually use RDMA as a
shorthand for the combination of the three features I described:
asynchronous work queues and completion queues, connections that
support both send/receive and RDMA, and kernel bypass.
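The kernel bypass point can be illustrated with a toy single-producer/single-consumer ring held in one shared buffer. The assumption here is that the buffer stands in for the pages a driver would map into the process, and that the producer counter stands in for a doorbell the device watches. The class name, the 16-byte entry layout, and the method names are all hypothetical. The posting path is just memory writes, with no system call involved:

```python
import struct

# Hypothetical entry layout: wr_id (u64), opcode (u32), length (u32).
ENTRY = struct.Struct("<QII")

class SharedWorkQueue:
    """Toy ring in one shared buffer: the application produces, the device consumes."""

    def __init__(self, depth):
        if depth <= 0 or depth & (depth - 1):
            raise ValueError("depth must be a power of two")
        self.depth = depth
        self.mem = bytearray(ENTRY.size * depth)  # stands in for mapped pages
        self.producer = 0  # advanced by the application
        self.consumer = 0  # advanced by the device

    def post(self, wr_id, opcode, length):
        """Userspace fast path: write the entry, then bump the 'doorbell'."""
        if self.producer - self.consumer == self.depth:
            return False  # ring full; caller must reap completions first
        slot = self.producer & (self.depth - 1)
        ENTRY.pack_into(self.mem, slot * ENTRY.size, wr_id, opcode, length)
        self.producer += 1
        return True

    def device_fetch(self):
        """What the adapter would do after noticing the producer index moved."""
        if self.consumer == self.producer:
            return None
        slot = self.consumer & (self.depth - 1)
        entry = ENTRY.unpack_from(self.mem, slot * ENTRY.size)
        self.consumer += 1
        return entry
```

A real driver would also need memory barriers between writing the entry and publishing the new index, and the doorbell would be something the device can actually observe rather than a Python attribute; the sketch only shows why no kernel transition is required on the posting path.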
In any case, RNIC support can be added to the existing IB stack with
fairly minor modifications -- you can search the netdev archives for
the patchsets posted by Steve Wise, but nearly all of the new code is
in the low-level hardware driver for the specific iWARP devices.
The real issues for netdev are things like Steve Wise's patch to add
route change notifiers, which could be used to tell RNICs when to
update the next hop for a connection they're handling. More
generally, it would be interesting to see if it's possible to tie an
RNIC into the kernel's packet filtering, so that disallowed
connections don't get set up. This seems very similar in spirit to
the problems around packet filtering that were raised for VJ netchannels.
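Roughly, the route change notifier idea looks like the toy below: the host stack calls a chain of registered callbacks when a route changes, and an RNIC driver on that chain updates the next hop for any connection it has offloaded to that destination. This is a userspace analogue of a kernel notifier chain, not the interface from Steve Wise's patch; every name here (NotifierChain, RouteChange, ToyRNICDriver) is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class RouteChange:
    dest: str          # destination whose route changed
    new_next_hop: str  # new gateway/neighbour for that destination

class NotifierChain:
    """Toy analogue of a kernel notifier chain: an ordered list of callbacks."""

    def __init__(self):
        self._callbacks = []

    def register(self, cb):
        self._callbacks.append(cb)

    def unregister(self, cb):
        self._callbacks.remove(cb)

    def call_chain(self, event):
        for cb in list(self._callbacks):
            cb(event)

@dataclass
class OffloadedConnection:
    dest: str
    next_hop: str

class ToyRNICDriver:
    """Hypothetical driver holding connections whose TCP state lives on the RNIC."""

    def __init__(self, chain):
        self.connections = []
        chain.register(self.on_route_change)

    def on_route_change(self, ev):
        # The host stack no longer sees these packets, so it has to tell
        # the adapter when the next hop for one of its peers changes.
        for conn in self.connections:
            if conn.dest == ev.dest:
                conn.next_hop = ev.new_next_hop
```

A packet filtering hook could follow the same pattern, with a callback consulted before the RNIC is allowed to set up a connection, though whether that can be made to fit the existing filtering code is exactly the open question above.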
- Roland