netdev.vger.kernel.org archive mirror
* Re: TOE, etc.
  2006-06-28  4:29   ` Herbert Xu
@ 2006-06-28  4:43     ` David Miller
  2006-06-28  5:35       ` Herbert Xu
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2006-06-28  4:43 UTC
  To: herbert; +Cc: jgarzik, swise, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 28 Jun 2006 14:29:59 +1000

> On Wed, Jun 28, 2006 at 12:18:25AM -0400, Jeff Garzik wrote:
> > 
> > A PCI device that presents itself as a SCSI controller, but under the 
> > hood is really iSCSI-over-TCP smells like TOE.  Running a virtualized 
> > Linux guest on top of a proprietary stack [which provides networking 
> > services to guests] also smells like TOE.  :)
> 
> Agreed.  However, when they start adding hooks to the ARP table, the
> routing table, and PMTU management, it begs the question what more is
> there to add for TOE (well, user-space driven TOE at least)?

Socket state, and that is one thing I don't see them doing yet.

> Put it another way, I think the dividing line between TOE and iSCSI or
> virtualisation is exactly the interface between them and the Linux kernel.
> If the interface is an existing one such as SCSI or standard IP then it's
> OK.  However, when it starts poking in the guts of the Linux stack I'd say
> that it has crossed the line.

Yeah, it's starting to smell really bad.

But we have to realize they've already been given 95% of the
interfaces they need to speak IP using our routes and our neighbour
entries.

Right?


* Re: TOE, etc.
  2006-06-28  4:43     ` TOE, etc David Miller
@ 2006-06-28  5:35       ` Herbert Xu
  2006-06-28  6:31         ` David Miller
                           ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Herbert Xu @ 2006-06-28  5:35 UTC
  To: David Miller; +Cc: jgarzik, swise, netdev

On Tue, Jun 27, 2006 at 09:43:23PM -0700, David Miller wrote:
> 
> Socket state, and that is one thing I don't see them doing yet.

I wonder what happens when the Linux TCP stack attempts to open a
connection to a remote host when that connection is already open
in the RDMA NIC?  For that matter what happens if a Linux application
decides to listen on a TCP port already listened on by the RDMA
NIC?
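
As a rough illustration of the collision being asked about here (a toy
model only: `ConnTable`, `host_stack` and `rnic` are made-up names, not
kernel or driver interfaces, and the address and port are arbitrary), two
listener tables that never consult each other will both accept the same
address and port:

```python
class ConnTable:
    """Toy table of listening (ip, port) pairs owned by one TCP implementation."""

    def __init__(self, name):
        self.name = name
        self.listeners = set()

    def listen(self, ip, port):
        key = (ip, port)
        # Each table checks only its own entries, never the other table's.
        if key in self.listeners:
            raise OSError(f"{self.name}: address already in use: {key}")
        self.listeners.add(key)
        return key


host_stack = ConnTable("host stack")   # stands in for the Linux TCP stack
rnic = ConnTable("RDMA NIC")           # stands in for the adapter's own TCP

# Both calls succeed, because neither table knows about the other.
host_stack.listen("192.0.2.1", 2049)
rnic.listen("192.0.2.1", 2049)

# The same (ip, port) is now claimed twice with nobody arbitrating.
conflict = host_stack.listeners & rnic.listeners
print(conflict)
```

Within a single table the second `listen` on the same pair fails, which is
the check that is missing across the two implementations.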

The only saving grace is that they're only doing RDMA rather than
arbitrary TCP.  However, exactly the same infrastructure can be used
to do arbitrary TCP should they wish to.
 
> But we have to realize they've already been given 95% of the
> interfaces they need to speak IP using our routes and our neighbour
> entries.
> 
> Right?

Yes, however I think the same argument could be applied to TOE.

With their RDMA NIC, we'll have TCP/SCTP connections that bypass
netfilter, tc, IPsec, AF_PACKET/tcpdump and the rest of our stack
while at the same time it is using the same IP address as us and
deciding what packets we will or won't see.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: TOE, etc.
  2006-06-28  5:35       ` Herbert Xu
@ 2006-06-28  6:31         ` David Miller
  2006-06-28 14:41         ` Steve Wise
  2006-06-28 14:54         ` Steve Wise
  2 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2006-06-28  6:31 UTC
  To: herbert; +Cc: jgarzik, swise, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 28 Jun 2006 15:35:54 +1000

> With their RDMA NIC, we'll have TCP/SCTP connections that bypass
> netfilter, tc, IPsec, AF_PACKET/tcpdump and the rest of our stack
> while at the same time it is using the same IP address as us and
> deciding what packets we will or won't see.

That's true.  I don't think we should really add any more
help for these kinds of things then.


* Re: TOE, etc.
  2006-06-28  5:35       ` Herbert Xu
  2006-06-28  6:31         ` David Miller
@ 2006-06-28 14:41         ` Steve Wise
  2006-06-28 14:54         ` Steve Wise
  2 siblings, 0 replies; 14+ messages in thread
From: Steve Wise @ 2006-06-28 14:41 UTC
  To: Herbert Xu; +Cc: David Miller, jgarzik, netdev

On Wed, 2006-06-28 at 15:35 +1000, Herbert Xu wrote:
> On Tue, Jun 27, 2006 at 09:43:23PM -0700, David Miller wrote:
> > 
> > Socket state, and that is one thing I don't see them doing yet.
> 
> I wonder what happens when the Linux TCP stack attempts to open a
> connection to a remote host when that connection is already open
> in the RDMA NIC?  For that matter what happens if a Linux application
> decides to listen on a TCP port already listened on by the RDMA
> NIC?
> 
> The only saving grace is that they're only doing RDMA rather than
> arbitrary TCP.  However, exactly the same infrastructure can be used
> to do arbitrary TCP should they wish to.
>  
> > But we have to realize they've already been given 95% of the
> > interfaces they need to speak IP using our routes and our neighbour
> > entries.
> > 
> > Right?
> 
> Yes, however I think the same argument could be applied to TOE.
> 
> With their RDMA NIC, we'll have TCP/SCTP connections that bypass
> netfilter, tc, IPsec, AF_PACKET/tcpdump and the rest of our stack
> while at the same time it is using the same IP address as us and
> deciding what packets we will or won't see.
> 

Doesn't iSCSI have the same issue?  No netfilter, IPsec, tcpdump, etc...






* Re: TOE, etc.
  2006-06-28  5:35       ` Herbert Xu
  2006-06-28  6:31         ` David Miller
  2006-06-28 14:41         ` Steve Wise
@ 2006-06-28 14:54         ` Steve Wise
  2006-06-28 18:36           ` David Miller
  2 siblings, 1 reply; 14+ messages in thread
From: Steve Wise @ 2006-06-28 14:54 UTC
  To: Herbert Xu; +Cc: David Miller, jgarzik, netdev

On Wed, 2006-06-28 at 15:35 +1000, Herbert Xu wrote:
> On Tue, Jun 27, 2006 at 09:43:23PM -0700, David Miller wrote:
> > 
> > Socket state, and that is one thing I don't see them doing yet.
> 
> I wonder what happens when the Linux TCP stack attempts to open a
> connection to a remote host when that connection is already open
> in the RDMA NIC?  For that matter what happens if a Linux application
> decides to listen on a TCP port already listened on by the RDMA
> NIC?
> 

This issue would have to be handled by using separate IP addresses for
RDMA connections vs native stack TCP.

Consider NFS-RDMA server.  Through administration, it would be
configured to listen on the specific rdma ip addresses, and the native
stack tcp ip addresses and thus support both TCP and RDMA NFS
connections.
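
A minimal sketch of that administrative split, assuming a hypothetical
table `ADDRESS_OWNER` that assigns each local address to exactly one TCP
implementation (the addresses and the port are illustrative, not taken
from any real configuration):

```python
# Hypothetical administrative split: every local address has one owner.
ADDRESS_OWNER = {
    "192.0.2.1": "native",  # terminated by the Linux TCP stack
    "192.0.2.2": "rdma",    # terminated by the RDMA adapter
}

NFS_PORT = 2049  # illustrative port shared by both listeners


def owner_of(ip):
    """Return which implementation terminates connections to this address."""
    return ADDRESS_OWNER[ip]


def listeners_for(port):
    """Return the (owner, ip, port) listeners a dual-transport server would open."""
    return sorted((owner, ip, port) for ip, owner in ADDRESS_OWNER.items())


nfs_listeners = listeners_for(NFS_PORT)
for owner, ip, port in nfs_listeners:
    print(f"{owner:6s} listens on {ip}:{port}")
```

Because an address maps to one owner, the same port can be used on both
transports without the two implementations ever competing for a connection.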

There are definitely issues with this that could be resolved via tighter
integration, but that seems to not be a goal of the linux community at
this time...


> The only saving grace is that they're only doing RDMA rather than
> arbitrary TCP.  However, exactly the same infrastructure can be used
> to do arbitrary TCP should they wish to.
>  
> > But we have to realize they've already been given 95% of the
> > interfaces they need to speak IP using our routes and our neighbour
> > entries.
> > 
> > Right?
> 
> Yes, however I think the same argument could be applied to TOE.
> 
> With their RDMA NIC, we'll have TCP/SCTP connections that bypass
> netfilter, tc, IPsec, AF_PACKET/tcpdump and the rest of our stack
> while at the same time it is using the same IP address as us and
> deciding what packets we will or won't see.
> 


Doesn't iSCSI have this same issue?

Steve.






* RE: TOE, etc.
@ 2006-06-28 16:25 Caitlin Bestler
  0 siblings, 0 replies; 14+ messages in thread
From: Caitlin Bestler @ 2006-06-28 16:25 UTC
  To: Herbert Xu, David Miller; +Cc: jgarzik, swise, netdev

Herbert Xu wrote:

> 
> Yes, however I think the same argument could be applied to TOE.
> 
> With their RDMA NIC, we'll have TCP/SCTP connections that
> bypass netfilter, tc, IPsec, AF_PACKET/tcpdump and the rest
> of our stack while at the same time it is using the same IP
> address as us and deciding what packets we will or won't see.
> 

The whole point of the patches that opengrid has proposed is to
allow control of these issues to remain with the kernel. That is
where the ownership of the IP address logically resides, and system
administrators will expect to be able to use one set of tools to
control what is done with a given IP address.

The bypassing is already going on with iSCSI devices and with
InfiniBand devices that use IP addresses. An RDMA/IP device just
makes it harder to ignore this problem, but the problem was already
there. SDP over IB is presented to Linux users essentially as a
TOE service. Connections are made with IP and socket semantics,
and yet there is no co-ordination on routes/netfilter/etc.

I'll state right up front that I think stateful offload, when
co-ordinated with the OS, is better than stateless offload --
especially at 10G speeds.

But for plain TCP connections there are stateless offloads
available. As a product architect I am already looking for as
many ways as possible to support stateless offload efficiently,
so that the option stays viable for Linux users at as high a
rate as possible. That is why we are very interested in
exploring a hardware-friendly definition of vj_netchannels.

But with RDMA things are different. There is no such thing as
stateless RDMA. It is not RDMA over TCP that requires stateful
offload, it is RDMA itself. RDMA over InfiniBand is just as
much of a stateful offload as RDMA over TCP.

It is possible to build RDMA over TCP as a service that merely
uses memory mapping services in a mysterious way but is not
integrated with the network stack at all. That is essentially
how RDMA over IB is currently working.

But I believe that integrating control over the IP address,
and the associated netfilter/routing/arp/pmtu/etc issues,
is the correct path. This logic should not be duplicated,
and its control must not be split.




* Re: TOE, etc.
  2006-06-28 14:54         ` Steve Wise
@ 2006-06-28 18:36           ` David Miller
  2006-06-28 18:56             ` Steve Wise
  0 siblings, 1 reply; 14+ messages in thread
From: David Miller @ 2006-06-28 18:36 UTC
  To: swise; +Cc: herbert, jgarzik, netdev

From: Steve Wise <swise@opengridcomputing.com>
Date: Wed, 28 Jun 2006 09:54:57 -0500

> Doesn't iSCSI have this same issue?

Software iSCSI implementations don't have the issue because
they go through the stack using normal sockets and normal
device send and receive.


* RE: TOE, etc.
@ 2006-06-28 18:49 Caitlin Bestler
  2006-06-28 21:10 ` Jeff Garzik
  0 siblings, 1 reply; 14+ messages in thread
From: Caitlin Bestler @ 2006-06-28 18:49 UTC
  To: David Miller, swise; +Cc: herbert, jgarzik, netdev

netdev-owner@vger.kernel.org wrote:
> From: Steve Wise <swise@opengridcomputing.com>
> Date: Wed, 28 Jun 2006 09:54:57 -0500
> 
>> Doesn't iSCSI have this same issue?
> 
> Software iSCSI implementations don't have the issue because
> they go through the stack using normal sockets and normal
> device send and receive.

But hardware iSCSI implementations, which already exist,
do not work through normal sockets.



* Re: TOE, etc.
  2006-06-28 18:36           ` David Miller
@ 2006-06-28 18:56             ` Steve Wise
  0 siblings, 0 replies; 14+ messages in thread
From: Steve Wise @ 2006-06-28 18:56 UTC
  To: David Miller; +Cc: herbert, jgarzik, netdev

On Wed, 2006-06-28 at 11:36 -0700, David Miller wrote:
> From: Steve Wise <swise@opengridcomputing.com>
> Date: Wed, 28 Jun 2006 09:54:57 -0500
> 
> > Doesn't iSCSI have this same issue?
> 
> Software iSCSI implementations don't have the issue because
> they go through the stack using normal sockets and normal
> device send and receive.

Right.  I was assuming that in this thread we were talking about iSCSI
devices where the TCP stack is in HW/FW on the adapter...

Steve.




* Re: TOE, etc.
  2006-06-28 18:49 TOE, etc Caitlin Bestler
@ 2006-06-28 21:10 ` Jeff Garzik
  0 siblings, 0 replies; 14+ messages in thread
From: Jeff Garzik @ 2006-06-28 21:10 UTC
  To: Caitlin Bestler; +Cc: David Miller, swise, herbert, netdev

Caitlin Bestler wrote:
> netdev-owner@vger.kernel.org wrote:
>> From: Steve Wise <swise@opengridcomputing.com>
>> Date: Wed, 28 Jun 2006 09:54:57 -0500
>>
>>> Doesn't iSCSI have this same issue?
>> Software iSCSI implementations don't have the issue because
>> they go through the stack using normal sockets and normal
>> device send and receive.
> 
> But hardware iSCSI implementations, which already exist,
> do not work through normal sockets.

No, they work through normal SCSI stack...

	Jeff





* RE: TOE, etc.
@ 2006-06-28 21:15 Caitlin Bestler
  2006-06-28 23:43 ` Jeff Garzik
  0 siblings, 1 reply; 14+ messages in thread
From: Caitlin Bestler @ 2006-06-28 21:15 UTC
  To: Jeff Garzik; +Cc: David Miller, swise, herbert, netdev

Jeff Garzik wrote:
> Caitlin Bestler wrote:
>> netdev-owner@vger.kernel.org wrote:
>>> From: Steve Wise <swise@opengridcomputing.com>
>>> Date: Wed, 28 Jun 2006 09:54:57 -0500
>>> 
>>>> Doesn't iSCSI have this same issue?
>>> Software iSCSI implementations don't have the issue because they go
>>> through the stack using normal sockets and normal device send and
>>> receive.
>> 
>> But hardware iSCSI implementations, which already exist, do not work
>> through normal sockets.
> 
> No, they work through normal SCSI stack...
> 
> 	Jeff

Correct.

But they then interface to the network using none of the network stack.
The normal SCSI stack does not control that in any way.

NFS over RDMA is part of the file system. That doesn't change the fact
that its use of IP addresses needs to be co-ordinated with the network
stack, and indeed that address based authentication *assumes* that this
is the case. (and yes, there are preferable means of authentication, but
authenticating based on IP address is already supported).

But back on the main point, if implementing SCSI services over a
TCP connection is acceptable even though it does not use a kernel
socket, why would it not be acceptable to implement RDMA services
over a TCP connection without using a kernel socket?



* Re: TOE, etc.
  2006-06-28 21:15 Caitlin Bestler
@ 2006-06-28 23:43 ` Jeff Garzik
  2006-06-29 14:09   ` Steve Wise
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Garzik @ 2006-06-28 23:43 UTC
  To: Caitlin Bestler; +Cc: David Miller, swise, herbert, netdev

Caitlin Bestler wrote:
> Jeff Garzik wrote:
>> Caitlin Bestler wrote:
>>> But hardware iSCSI implementations, which already exist, do not work
>>> through normal sockets.

>> No, they work through normal SCSI stack...

> Correct.
> 
> But they then interface to the network using none of the network stack.
> The normal SCSI stack does not control that in any way.

Correct.  And the network stack is completely unaware of whatever IP 
addresses, ARP tables, routing tables, etc. it is using.


> NFS over RDMA is part of the file system. That doesn't change the fact
> that its use of IP addresses needs to be co-ordinated with the network
> stack, and indeed that address based authentication *assumes* that this
> is the case. (and yes, there are preferable means of authentication, but
> authenticating based on IP address is already supported).

Sounds quite broken to me.


> But back on the main point, if implementing SCSI services over a
> TCP connection is acceptable even though it does not use a kernel
> socket, why would it not be acceptable to implement RDMA services
> over a TCP connection without using a kernel socket?

Because SCSI doesn't force nasty hooks into the net stack to allow for 
sharing of resources with a proprietary black box of unknown quality.

	Jeff




* RE: TOE, etc.
@ 2006-06-28 23:54 Caitlin Bestler
  0 siblings, 0 replies; 14+ messages in thread
From: Caitlin Bestler @ 2006-06-28 23:54 UTC
  To: Jeff Garzik; +Cc: David Miller, swise, herbert, netdev

Jeff Garzik wrote:
> Caitlin Bestler wrote:
>> Jeff Garzik wrote:
>>> Caitlin Bestler wrote:
>>>> But hardware iSCSI implementations, which already exist, do not
>>>> work through normal sockets.
> 
>>> No, they work through normal SCSI stack...
> 
>> Correct.
>> 
>> But they then interface to the network using none of the network
>> stack. The normal SCSI stack does not control that in any way.
> 
> Correct.  And the network stack is completely unaware of
> whatever IP addresses, ARP tables, routing tables, etc. it is using.
> 
> 
>> NFS over RDMA is part of the file system. That doesn't change the
>> fact that its use of IP addresses needs to be co-ordinated with the
>> network stack, and indeed that address based authentication
>> *assumes* that this is the case. (and yes, there are preferable
>> means of authentication, but authenticating based on IP address is
>> already supported). 
> 
> Sounds quite broken to me.
> 
> 
>> But back on the main point, if implementing SCSI services over a
>> TCP connection is acceptable even though it does not use a kernel
>> socket, why would it not be acceptable to implement RDMA services
>> over a TCP connection without using a kernel socket?
> 
> Because SCSI doesn't force nasty hooks into the net stack to
> allow for sharing of resources with a proprietary black box of
> unknown quality.
> 
> 	Jeff

RDMA can also solve all of these problems on its own. Complete with
giving the network administrator *no* conventional controls over the
IP address being used for RDMA services.

That means no standard ability to monitor connections, no standard
ability to control which connections are made with whom.

That is better? 

You seem to be practically demanding that RDMA build an entire
parallel stack.

Worse, that *each* RDMA vendor build an entire parallel stack.

Open source being what it is, that is not terribly difficult.
But exactly how does this benefit Linux users?

The proposed subscriptions are not about sharing *resources*, they
share *information* with device drivers. The quality of each
RDMA device driver will be just as known as for a SCSI driver,
an InfiniBand HCA driver, a graphics driver or a plain Ethernet
driver.





* Re: TOE, etc.
  2006-06-28 23:43 ` Jeff Garzik
@ 2006-06-29 14:09   ` Steve Wise
  0 siblings, 0 replies; 14+ messages in thread
From: Steve Wise @ 2006-06-29 14:09 UTC
  To: Jeff Garzik; +Cc: Caitlin Bestler, David Miller, herbert, netdev


> 
> > But back on the main point, if implementing SCSI services over a
> > TCP connection is acceptable even though it does not use a kernel
> > socket, why would it not be acceptable to implement RDMA services
> > over a TCP connection without using a kernel socket?
> 
> Because SCSI doesn't force nasty hooks into the net stack to allow for 
> sharing of resources with a proprietary black box of unknown quality.
> 
> 	Jeff
> 

The netevent notifier patch in question seemed to be reasonable hooks
for notifying kernel and user mode modules of certain network events.
In fact, it went through 3 review cycles and was improved by feedback
from Dave and others.  For example, it was integrated with netlink to
provide user mode notifications.
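
For readers unfamiliar with the pattern, a notifier of this kind can be
sketched roughly as below. This is a toy publish/subscribe chain, not the
patch itself or the kernel's notifier API; `NotifierChain`, the event
names and `rdma_driver_cb` are hypothetical names chosen for the example.

```python
class NotifierChain:
    """Toy publish/subscribe chain; not the kernel's notifier API."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        self._callbacks.append(callback)

    def unregister(self, callback):
        self._callbacks.remove(callback)

    def call(self, event, data):
        # Deliver the event to every subscriber in registration order.
        for callback in list(self._callbacks):
            callback(event, data)


# Hypothetical event names, chosen only for this example.
NEIGH_UPDATE = "neigh_update"
PMTU_UPDATE = "pmtu_update"

seen = []


def rdma_driver_cb(event, data):
    # A real driver would refresh its cached neighbour or path state here.
    seen.append((event, data))


chain = NotifierChain()
chain.register(rdma_driver_cb)
chain.call(NEIGH_UPDATE, {"ip": "192.0.2.7", "lladdr": "00:00:5e:00:53:01"})
chain.call(PMTU_UPDATE, {"dst": "192.0.2.7", "mtu": 1400})

# After unregistering, later events no longer reach the driver.
chain.unregister(rdma_driver_cb)
chain.call(PMTU_UPDATE, {"dst": "192.0.2.7", "mtu": 1300})
```

The subscriber only receives information about events; the stack that
publishes them keeps ownership of the underlying tables.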

Then someone says 'TOE' and suddenly the hooks are nasty?


Steve.


