netdev.vger.kernel.org archive mirror
* RE: TOE, etc.
@ 2006-06-28 21:15 Caitlin Bestler
From: Caitlin Bestler @ 2006-06-28 21:15 UTC
  To: Jeff Garzik; +Cc: David Miller, swise, herbert, netdev

Jeff Garzik wrote:
> Caitlin Bestler wrote:
>> netdev-owner@vger.kernel.org wrote:
>>> From: Steve Wise <swise@opengridcomputing.com>
>>> Date: Wed, 28 Jun 2006 09:54:57 -0500
>>> 
>>>> Doesn't iSCSI have this same issue?
>>> Software iSCSI implementations don't have the issue because they go
>>> through the stack using normal sockets and normal device send and
>>> receive.
>> 
>> But hardware iSCSI implementations, which already exist, do not work
>> through normal sockets.
> 
> No, they work through normal SCSI stack...
> 
> 	Jeff

Correct.

But they then interface to the network using none of the network stack.
The normal SCSI stack does not control that in any way.

NFS over RDMA is part of the file system. That doesn't change the fact
that its use of IP addresses needs to be co-ordinated with the network
stack, and indeed that address-based authentication *assumes* that this
is the case. (And yes, there are preferable means of authentication, but
authenticating based on IP address is already supported.)

But back on the main point, if implementing SCSI services over a
TCP connection is acceptable even though it does not use a kernel
socket, why would it not be acceptable to implement RDMA services
over a TCP connection without using a kernel socket?


* RE: TOE, etc.
@ 2006-06-28 23:54 Caitlin Bestler
From: Caitlin Bestler @ 2006-06-28 23:54 UTC
  To: Jeff Garzik; +Cc: David Miller, swise, herbert, netdev

Jeff Garzik wrote:
> Caitlin Bestler wrote:
>> Jeff Garzik wrote:
>>> Caitlin Bestler wrote:
>>>> But hardware iSCSI implementations, which already exist, do not
>>>> work through normal sockets.
> 
>>> No, they work through normal SCSI stack...
> 
>> Correct.
>> 
>> But they then interface to the network using none of the network
>> stack. The normal SCSI stack does not control that in any way.
> 
> Correct.  And the network stack is completely unaware of
> whatever IP addresses, ARP tables, routing tables, etc. it is using.
> 
> 
>> NFS over RDMA is part of the file system. That doesn't change the
>> fact that its use of IP addresses needs to be co-ordinated with the
>> network stack, and indeed that address-based authentication
>> *assumes* that this is the case. (And yes, there are preferable
>> means of authentication, but authenticating based on IP address is
>> already supported.)
> 
> Sounds quite broken to me.
> 
> 
>> But back on the main point, if implementing SCSI services over a
>> TCP connection is acceptable even though it does not use a kernel
>> socket, why would it not be acceptable to implement RDMA services
>> over a TCP connection without using a kernel socket?
> 
> Because SCSI doesn't force nasty hooks into the net stack to
> allow for
> sharing of resources with a proprietary black box of unknown quality.
> 
> 	Jeff

RDMA can also solve all of these problems on its own. Complete with
giving the network administrator *no* conventional controls over the
IP address being used for RDMA services.

That means no standard ability to monitor connections, no standard
ability to control which connections are made with whom.

That is better? 

You seem to be practically demanding that RDMA build an entire
parallel stack.

Worse, that *each* RDMA vendor build an entire parallel stack.

Open source being what it is, that is not terribly difficult.
But exactly how does this benefit Linux users?

The proposed subscriptions are not about sharing *resources*; they
share *information* with device drivers. The quality of each
RDMA device driver will be just as knowable as that of a SCSI driver,
an InfiniBand HCA driver, a graphics driver, or a plain Ethernet
driver.
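
For concreteness, a minimal sketch of the kind of subscription being
proposed (the names follow the netevent patches posted on this list;
treat the details as illustrative, not as a final API):

#include <linux/module.h>
#include <linux/notifier.h>
#include <net/netevent.h>
#include <net/neighbour.h>

/* React to neighbour (ARP) updates so that addresses cached by an
 * RDMA device stay consistent with the kernel's neighbour table. */
static int demo_netevent_cb(struct notifier_block *nb,
			    unsigned long event, void *ctx)
{
	struct neighbour *neigh;

	switch (event) {
	case NETEVENT_NEIGH_UPDATE:
		neigh = ctx;
		/* push neigh->ha (the new link-layer address) down
		 * to the adapter here */
		break;
	case NETEVENT_REDIRECT:
		/* update any offloaded route state here */
		break;
	}
	return NOTIFY_DONE;
}

static struct notifier_block demo_nb = {
	.notifier_call = demo_netevent_cb,
};

static int __init demo_init(void)
{
	return register_netevent_notifier(&demo_nb);
}

static void __exit demo_exit(void)
{
	unregister_netevent_notifier(&demo_nb);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

Note that the driver only *reads* decisions the stack has already
made; nothing is handed over to the device.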




* RE: TOE, etc.
@ 2006-06-28 18:49 Caitlin Bestler
From: Caitlin Bestler @ 2006-06-28 18:49 UTC
  To: David Miller, swise; +Cc: herbert, jgarzik, netdev

netdev-owner@vger.kernel.org wrote:
> From: Steve Wise <swise@opengridcomputing.com>
> Date: Wed, 28 Jun 2006 09:54:57 -0500
> 
>> Doesn't iSCSI have this same issue?
> 
> Software iSCSI implementations don't have the issue because
> they go through the stack using normal sockets and normal
> device send and receive.

But hardware iSCSI implementations, which already exist,
do not work through normal sockets.
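
For reference, "normal sockets" here means the in-kernel socket API
that a software initiator drives. A minimal sketch of such a send path
(shown with current kernel signatures; the 2.6.17-era calls differ
slightly):

#include <linux/net.h>
#include <linux/in.h>
#include <linux/socket.h>
#include <linux/uio.h>
#include <net/net_namespace.h>

/* Open a TCP socket from kernel code and send a buffer through the
 * ordinary stack, the way a software iSCSI initiator does. */
static int demo_send(struct sockaddr_in *target, void *buf, size_t len)
{
	struct socket *sock;
	struct msghdr msg = { .msg_flags = MSG_NOSIGNAL };
	struct kvec vec = { .iov_base = buf, .iov_len = len };
	int ret;

	ret = sock_create_kern(&init_net, AF_INET, SOCK_STREAM,
			       IPPROTO_TCP, &sock);
	if (ret)
		return ret;

	ret = kernel_connect(sock, (struct sockaddr *)target,
			     sizeof(*target), 0);
	if (!ret)
		ret = kernel_sendmsg(sock, &msg, &vec, 1, len);

	sock_release(sock);
	return ret;
}

Everything here is visible to netfilter, tc, and tcpdump, because the
packets are built and routed by the stack itself.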


* RE: TOE, etc.
@ 2006-06-28 16:25 Caitlin Bestler
From: Caitlin Bestler @ 2006-06-28 16:25 UTC
  To: Herbert Xu, David Miller; +Cc: jgarzik, swise, netdev

Herbert Xu wrote:

> 
> Yes, however I think the same argument could be applied to TOE.
> 
> With their RDMA NIC, we'll have TCP/SCTP connections that
> bypass netfilter, tc, IPsec, AF_PACKET/tcpdump and the rest
> of our stack while at the same time it is using the same IP
> address as us and deciding what packets we will or won't see.
> 

The whole point of the patches that opengrid has proposed is to
allow control of these issues to remain with the kernel. That is
where the ownership of the IP address logically resides, and system
administrators will expect to be able to use one set of tools to
control what is done with a given IP address.

The bypassing is already going on with iSCSI devices and with
InfiniBand devices that use IP addresses. An RDMA/IP device just
makes it harder to ignore this problem, but the problem was already
there. SDP over IB is presented to Linux users essentially as a
TOE service. Connections are made with IP and socket semantics,
and yet there is no co-ordination on routes/netfilter/etc.
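
To make the SDP point concrete: the application opens what looks like
an ordinary stream socket, and only the address family selects the
offloaded transport. A user-space sketch, assuming OFED's AF_INET_SDP
convention (the constant is not in standard headers; the value shown
is the conventional one):

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

#define AF_INET_SDP 27	/* OFED convention, not a kernel constant */

int main(void)
{
	struct sockaddr_in peer = { .sin_family = AF_INET,
				    .sin_port = htons(5001) };
	int fd;

	inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);

	/* The same socket()/connect() calls an application always
	 * makes, but the connection is carried over InfiniBand,
	 * invisible to netfilter/tc/tcpdump. */
	fd = socket(AF_INET_SDP, SOCK_STREAM, 0);
	if (fd < 0 || connect(fd, (struct sockaddr *)&peer,
			      sizeof(peer)) < 0) {
		perror("sdp");
		return 1;
	}
	write(fd, "hello", 5);
	close(fd);
	return 0;
}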

I'll state right up front that I think stateful offload, when
co-ordinated with the OS, is better than stateless offload --
especially at 10G speeds.

But for plain TCP connections there are stateless offloads
available. As a product architect I am already seeking every
way to support stateless offload efficiently, to keep that
option viable for Linux users at as high a rate as possible.
That is why we are very interested in exploring a
hardware-friendly definition of vj_netchannels.

But with RDMA things are different. There is no such thing as
stateless RDMA. It is not RDMA over TCP that requires stateful
offload, it is RDMA itself. RDMA over InfiniBand is just as
much of a stateful offload as RDMA over TCP.

It is possible to build RDMA over TCP as a service that merely
uses memory mapping services in a mysterious way but is not
integrated with the network stack at all. That is essentially
how RDMA over IB currently works.

But I believe that integrating control over the IP address,
and the associated netfilter/routing/arp/pmtu/etc issues,
is the correct path. This logic should not be duplicated,
and its control must not be split.
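
Concretely, "not duplicated" means the RDMA side asks the kernel's FIB
and neighbour table for its answers instead of keeping private copies.
A rough sketch, using current in-kernel interfaces (the 2006-era
signatures differ):

#include <linux/err.h>
#include <net/route.h>
#include <net/flow.h>
#include <net/dst.h>
#include <net/neighbour.h>
#include <net/net_namespace.h>

static int demo_resolve(__be32 dst_ip)
{
	struct flowi4 fl4 = { .daddr = dst_ip };
	struct rtable *rt;
	struct neighbour *neigh;

	/* Ask the kernel's routing table, not a private copy of it. */
	rt = ip_route_output_key(&init_net, &fl4);
	if (IS_ERR(rt))
		return PTR_ERR(rt);

	/* Resolve the next hop through the kernel neighbour table. */
	neigh = dst_neigh_lookup(&rt->dst, &dst_ip);
	if (neigh) {
		/* hand neigh->ha and rt->dst.dev to the adapter here */
		neigh_release(neigh);
	}
	ip_rt_put(rt);
	return 0;
}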



* Re: [PATCH Round 3 0/2][RFC] Network Event Notifier Mechanism
@ 2006-06-28  3:37 Herbert Xu
From: Herbert Xu @ 2006-06-28  3:37 UTC
  To: Jeff Garzik; +Cc: Steve Wise, davem, netdev

On Tue, Jun 27, 2006 at 11:24:25PM -0400, Jeff Garzik wrote:
>
> I don't see how that position has changed?
> 
> http://linux-net.osdl.org/index.php/TOE

Well, I must say that RDMA over TCP smells very much like TOE.  They've
got an ARP table, a routing table, and presumably a TCP stack.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Thread overview: 14+ messages
2006-06-28 21:15 TOE, etc Caitlin Bestler
2006-06-28 23:43 ` Jeff Garzik
2006-06-29 14:09   ` Steve Wise
  -- strict thread matches above, loose matches on Subject: below --
2006-06-28 23:54 Caitlin Bestler
2006-06-28 18:49 Caitlin Bestler
2006-06-28 21:10 ` Jeff Garzik
2006-06-28 16:25 Caitlin Bestler
2006-06-28  3:37 [PATCH Round 3 0/2][RFC] Network Event Notifier Mechanism Herbert Xu
2006-06-28  4:18 ` TOE, etc. (was Re: [PATCH Round 3 0/2][RFC] Network Event Notifier Mechanism) Jeff Garzik
2006-06-28  4:29   ` Herbert Xu
2006-06-28  4:43     ` TOE, etc David Miller
2006-06-28  5:35       ` Herbert Xu
2006-06-28  6:31         ` David Miller
2006-06-28 14:41         ` Steve Wise
2006-06-28 14:54         ` Steve Wise
2006-06-28 18:36           ` David Miller
2006-06-28 18:56             ` Steve Wise
