* Re: Linux support for RDMA
@ 2005-04-01 1:49 jaganav
2005-04-01 1:57 ` H. Peter Anvin
0 siblings, 1 reply; 6+ messages in thread
From: jaganav @ 2005-04-01 1:49 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Roland Dreier, Dmitry Yusupov, open-iscsi, David S. Miller, mpm,
andrea, michaelc, James.Bottomley, ksummit-2005-discuss, netdev,
Benjamin LaHaise
Quoting "H. Peter Anvin" <hpa@zytor.com>:
> Benjamin LaHaise wrote:
> >
> > I'm curious how the 10Gig ethernet market will pan out. Time and again
> > the market has shown that ethernet always has the cost advantage in the
> > end. If something like Intel's I/O Acceleration Technology makes it
> > that much easier for commodity ethernet to achieve similar performance
> > characteristics over ethernet to that of IB and fibre channel, the cost
> > advantage alone might switch some new customers over. But the hardware
> > isn't near what IB offers today, making IB an important niche filler.
> >
>
> From what I've seen coming down the pipe, I think 10GE is going to
> eventually win over IB, just like previous generations did over Token
> Ring, FDDI and other niche filler technologies. It doesn't, as you say,
> mean that e.g. IB doesn't matter *now*; furthermore, it also matters for
> the purpose of fixing the kind of issues that are going to have to be
> fixed anyway.
>
> -hpa
>
>
>
No doubt, Ethernet will eventually win. By the way, hasn't history already
proven this with ATM, specifically when the industry predicted that ATM would
replace Ethernet? :)
However, I have to agree with Ben that IB technology will fill an important
niche, particularly the low end of the High Performance Computing (HPC)
segment, which is currently in transition, moving away from proprietary
interconnects to industry-standard IB technology. Even though Ethernet may
eventually catch up with IB in terms of bandwidth, IB fabrics can offer
better latencies.
Thanks
Venkat
^ permalink raw reply [flat|nested] 6+ messages in thread

* Re: Linux support for RDMA
2005-04-01 1:49 Linux support for RDMA jaganav
@ 2005-04-01 1:57 ` H. Peter Anvin
0 siblings, 0 replies; 6+ messages in thread
From: H. Peter Anvin @ 2005-04-01 1:57 UTC (permalink / raw)
To: jaganav
Cc: Roland Dreier, Dmitry Yusupov, open-iscsi, David S. Miller, mpm,
andrea, michaelc, James.Bottomley, ksummit-2005-discuss, netdev,
Benjamin LaHaise

jaganav@us.ibm.com wrote:
>
> No doubt, Ethernet will eventually win .. btw, Hasn't history proven this over
> ATM? More specifically when the industry predicted that ATM will replace
> ethernet :)
>
> However, I'll have to agree with Ben that IB technolgy will fill an important
> niche segment, more specifically so in the low end of High Performance Computing
> (HPC) segment which is in a transition mode currently moving away from
> proprietary interconnects to industry standards based IB technology. Eventhough,
> ethernet may eventually may catch up with IB in terms of the bandwidth but IB
> fabrics can offer better latencies.
>

We've seen this over and over... Token Ring, FDDI, ATM, IB, ... all of
them "better" than the Ethernet of the day, but eventually
commoditization wins out.

With 10GE, Ethernet has finally stopped pretending to be CSMA/CD even;
"Ethernet" is now really nothing more than a collective name for a set
of somewhat compatible commodity networking technologies.

	-hpa

^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: Linux support for RDMA
@ 2005-04-02 1:59 jaganav
0 siblings, 0 replies; 6+ messages in thread
From: jaganav @ 2005-04-02 1:59 UTC (permalink / raw)
To: Dmitry Yusupov
Cc: Asgeir Eiriksson, H. Peter Anvin, Roland Dreier, open-iscsi,
David S. Miller, mpm, andrea, michaelc, James.Bottomley,
ksummit-2005-discuss, netdev, Benjamin LaHaise
Quoting Dmitry Yusupov <dima@neterion.com>:
> On Fri, 2005-04-01 at 15:50 -0800, Asgeir Eiriksson wrote:
> > Venkat
> >
> > Your assessment of the IB vs. Ethernet latencies isn't necessarily
> > correct.
> > - you already have available low latency 10GE switches (< 1us
> > port-to-port)
> > - you already have available low latency (cut-through processing) 10GE
> > TOE engines
> >
> > The Veritest verified 10GE TOE end-to-end latency is < 10us today
> > (end-to-end being from a Linux user-space-process to a Linux
> > user-space-process through a switch; full report with detail of the
> > setup is available at
> > http://www.chelsio.com/technology/Chelsio10GbE_Fujitsu.pdf)
> >
> > For comparison: the published IB latency numbers are around 5us today
> > and those use a polling receiver, and those don't include a context
> > switch(es) as does the Ethernet number quoted above.
>
> yep. I should agree in here. On 10Gbps network latencies numbers are
> around 5-15us. Even with non-TOE card, I managed to get 13us latency
> with regular TCP/IP stack.
>
> [root@localhost root]# ./nptcp -a -t -l 256 -u 98304 -i 256 -p 5100 -P - h
> 17.1.1.227
> Latency: 0.000013
> Now starting main loop
> 0: 256 bytes 7 times --> 131.37 Mbps in 0.000015 sec
> 1: 512 bytes 65 times --> 239.75 Mbps in 0.000016 sec
>
> Dima
When I mentioned latency, I meant the end-to-end measurement
(i.e. from application to application), not just the switching
or port-to-port latencies.
With IB, I have seen the best numbers range from 5 to 7 us,
which is far better than Ethernet today (15 to 35 us) on the
network we have. I am not denying that Ethernet is trying to
close the gap here, but IB has a relative advantage right now.
Good to see you got 5 us in one case, but what were the switch
and adapter latencies in that setup?
Thanks
Venkat
^ permalink raw reply [flat|nested] 6+ messages in thread

* RE: Linux support for RDMA
@ 2005-04-01 23:50 Asgeir Eiriksson
2005-04-02 0:02 ` Dmitry Yusupov
0 siblings, 1 reply; 6+ messages in thread
From: Asgeir Eiriksson @ 2005-04-01 23:50 UTC (permalink / raw)
To: jaganav, H. Peter Anvin
Cc: Roland Dreier, Dmitry Yusupov, open-iscsi, David S. Miller, mpm,
andrea, michaelc, James.Bottomley, ksummit-2005-discuss, netdev,
Benjamin LaHaise

Venkat

Your assessment of the IB vs. Ethernet latencies isn't necessarily
correct.
- you already have available low latency 10GE switches (< 1us
  port-to-port)
- you already have available low latency (cut-through processing) 10GE
  TOE engines

The Veritest verified 10GE TOE end-to-end latency is < 10us today
(end-to-end being from a Linux user-space-process to a Linux
user-space-process through a switch; full report with detail of the
setup is available at
http://www.chelsio.com/technology/Chelsio10GbE_Fujitsu.pdf)

For comparison: the published IB latency numbers are around 5us today
and those use a polling receiver, and those don't include a context
switch(es) as does the Ethernet number quoted above.

'Asgeir

^ permalink raw reply [flat|nested] 6+ messages in thread
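As a concrete reference point for the end-to-end numbers being traded in this
thread, the sketch below shows what a minimal TCP ping-pong latency measurement
looks like; NetPIPE's nptcp, quoted earlier, does the same thing far more
carefully. This is an illustrative sketch only: the peer address, port, and
iteration count are invented, it assumes the remote side simply echoes each
byte back, and it makes no claims about the hardware discussed here.

/*
 * Hypothetical TCP ping-pong latency sketch (client side).  Assumes the
 * peer at SERVER_ADDR echoes every byte straight back; the address, port
 * and iteration count are placeholders.  One-way latency is estimated as
 * half the average round-trip time.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

#define SERVER_ADDR     "17.1.1.227"    /* placeholder peer */
#define SERVER_PORT     5100
#define ITERATIONS      10000

int main(void)
{
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;
        struct sockaddr_in sa;
        char byte = 'x';
        struct timeval start, end;

        if (fd < 0) {
                perror("socket");
                return 1;
        }

        /* Disable Nagle so each 1-byte message goes out immediately. */
        setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));

        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_port = htons(SERVER_PORT);
        inet_pton(AF_INET, SERVER_ADDR, &sa.sin_addr);

        if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
                perror("connect");
                return 1;
        }

        gettimeofday(&start, NULL);
        for (int i = 0; i < ITERATIONS; i++) {
                if (write(fd, &byte, 1) != 1 || read(fd, &byte, 1) != 1) {
                        perror("ping-pong");
                        return 1;
                }
        }
        gettimeofday(&end, NULL);

        double usec = (end.tv_sec - start.tv_sec) * 1e6 +
                      (end.tv_usec - start.tv_usec);
        printf("avg one-way latency: %.1f us\n", usec / ITERATIONS / 2.0);

        close(fd);
        return 0;
}

Quoting half the measured round-trip time as the "latency" is the usual
convention, and is roughly how figures such as the nptcp output above are
commonly derived.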
* RE: Linux support for RDMA
2005-04-01 23:50 Asgeir Eiriksson
@ 2005-04-02 0:02 ` Dmitry Yusupov
0 siblings, 0 replies; 6+ messages in thread
From: Dmitry Yusupov @ 2005-04-02 0:02 UTC (permalink / raw)
To: Asgeir Eiriksson
Cc: jaganav, H. Peter Anvin, Roland Dreier, open-iscsi,
David S. Miller, mpm, andrea, michaelc, James.Bottomley,
ksummit-2005-discuss, netdev, Benjamin LaHaise

On Fri, 2005-04-01 at 15:50 -0800, Asgeir Eiriksson wrote:
> Venkat
>
> Your assessment of the IB vs. Ethernet latencies isn't necessarily
> correct.
> - you already have available low latency 10GE switches (< 1us
> port-to-port)
> - you already have available low latency (cut-through processing) 10GE
> TOE engines
>
> The Veritest verified 10GE TOE end-to-end latency is < 10us today
> (end-to-end being from a Linux user-space-process to a Linux
> user-space-process through a switch; full report with detail of the
> setup is available at
> http://www.chelsio.com/technology/Chelsio10GbE_Fujitsu.pdf)
>
> For comparison: the published IB latency numbers are around 5us today
> and those use a polling receiver, and those don't include a context
> switch(es) as does the Ethernet number quoted above.

yep. I should agree in here. On 10Gbps network latencies numbers are
around 5-15us. Even with non-TOE card, I managed to get 13us latency
with regular TCP/IP stack.

[root@localhost root]# ./nptcp -a -t -l 256 -u 98304 -i 256 -p 5100 -P -h 17.1.1.227
Latency: 0.000013
Now starting main loop
  0:   256 bytes      7 times -->    131.37 Mbps in 0.000015 sec
  1:   512 bytes     65 times -->    239.75 Mbps in 0.000016 sec

Dima

^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <20050324233921.GZ14202@opteron.random>]
[parent not found: <20050325034341.GV32638@waste.org>]
[parent not found: <20050327035149.GD4053@g5.random>]
* Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics
[not found] ` <20050327035149.GD4053@g5.random>
@ 2005-03-27 5:48 ` Matt Mackall
2005-03-27 6:33 ` Dmitry Yusupov
0 siblings, 1 reply; 6+ messages in thread
From: Matt Mackall @ 2005-03-27 5:48 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Mike Christie, Dmitry Yusupov, open-iscsi, James.Bottomley,
ksummit-2005-discuss, netdev

I'm cc:ing this to netdev, where this discussion really ought to be.
There's a separate networking summit and I suspect most of the
networking heavies aren't reading ksummit-discuss or open-iscsi.
It's getting rather far afield for ksummit-discuss so people should
trim that from follow-ups.

On Sun, Mar 27, 2005 at 05:51:49AM +0200, Andrea Arcangeli wrote:
> On Thu, Mar 24, 2005 at 07:43:41PM -0800, Matt Mackall wrote:
> > There may be network multipath. But I think we can have a single
> > socket mempool per logical device and a single skbuff mempool shared
> > among those sockets.
>
> If we'll have to reserve more than 1 packet per each socket context,
> then the mempool probably can't be shared.

I believe the mempool can be shared among all sockets that represent
the same storage device. Packets out any socket represent progress.

> I wonder if somebody has ever reproduced deadlocks
> by swapping on software-tcp-iscsi.

Yes, done before it was even called iSCSI.

> > And that still leaves us with the lack of buffers to receive ACKs
> > problem, which is perhaps worse.
>
> The mempooling should take care of the acks too.

The receive buffer is allocated at the time we DMA it from the card.
We have no idea of its contents and we won't know what socket mempool
to pull the receive skbuff from until much higher in the network
stack, which could be quite a while later if we're under OOM load. And
we can't have a mempool big enough to handle all the traffic that
might potentially be deferred for softirq processing when we're OOM,
especially at gigabit rates.

I think this is actually the tricky piece of the problem and solving
the socket and send buffer allocation doesn't help until this gets
figured out.

We could perhaps try to address this with another special receive-side
alloc_skb that fails most of the time on OOM but sometimes pulls from
a special reserve.

> Perhaps the mempooling overhead will be too huge to pay for it even when
> it's not necessary, in such case the iscsid will have to pass a new
> bitflag to the socket syscall, when it creates the socket meant to talk
> with the remote disk.

I think we probably attach a mempool to a socket after the fact. And
no, we can't have a mempool attached to every socket.

--
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply [flat|nested] 6+ messages in thread
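To make the mempool idea being discussed here concrete, below is a rough,
hypothetical sketch of a per-storage-device skb reserve built on the kernel's
mempool API. The iscsi_* names, the pool size, and the buffer size are
invented for illustration; this is not code from any proposed patch, and it
deliberately ignores the receive-side problem Matt describes, which is the
hard part.

/*
 * Hypothetical sketch only: a per-storage-device skb reserve built on
 * mempool_create()/mempool_alloc(), shared by all sockets that reach
 * the same device, along the lines of the discussion above.
 */
#include <linux/mempool.h>
#include <linux/skbuff.h>

#define ISCSI_SKB_RESERVE       16      /* packets held back for the device */
#define ISCSI_SKB_SIZE          1600    /* enough for an MTU-sized frame */

static void *iscsi_skb_mempool_alloc(gfp_t gfp_mask, void *pool_data)
{
        return alloc_skb(ISCSI_SKB_SIZE, gfp_mask);
}

static void iscsi_skb_mempool_free(void *element, void *pool_data)
{
        kfree_skb(element);
}

/* One pool per logical storage device, shared by all of its sockets. */
static mempool_t *iscsi_skb_pool;

static int iscsi_skb_reserve_init(void)
{
        iscsi_skb_pool = mempool_create(ISCSI_SKB_RESERVE,
                                        iscsi_skb_mempool_alloc,
                                        iscsi_skb_mempool_free, NULL);
        return iscsi_skb_pool ? 0 : -ENOMEM;
}

/*
 * On the send path, fall back to the reserve under memory pressure:
 * with a blocking gfp_mask, mempool_alloc() waits for an element to be
 * returned to the pool rather than failing outright, so writeout to the
 * remote disk can always make forward progress.
 */
static struct sk_buff *iscsi_alloc_skb(gfp_t gfp_mask)
{
        return mempool_alloc(iscsi_skb_pool, gfp_mask);
}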
* Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics
2005-03-27 5:48 ` [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics Matt Mackall
@ 2005-03-27 6:33 ` Dmitry Yusupov
2005-03-27 6:46 ` David S. Miller
0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Yusupov @ 2005-03-27 6:33 UTC (permalink / raw)
To: Matt Mackall
Cc: Andrea Arcangeli, Mike Christie, open-iscsi@googlegroups.com,
James.Bottomley, ksummit-2005-discuss, netdev

On Sat, 2005-03-26 at 21:48 -0800, Matt Mackall wrote:
> The receive buffer is allocated at the time we DMA it from the card.
> We have no idea of its contents and we won't know what socket mempool
> to pull the receive skbuff from until much higher in the network
> stack, which could be quite a while later if we're under OOM load. And
> we can't have a mempool big enough to handle all the traffic that
> might potentially be deferred for softirq processing when we're OOM,
> especially at gigabit rates.
>
> I think this is actually the tricky piece of the problem and solving
> the socket and send buffer allocation doesn't help until this gets
> figured out.
>
> We could perhaps try to address this with another special receive-side
> alloc_skb that fails most of the time on OOM but sometimes pulls from
> a special reserve.

nope. this will not solve the problem on receive or will just solve it
partially. The right way to solve it would be to provide special API
which will help to re-use link-layer's ring SKBs. i.e. TCP stack should
call NIC driver's callback after all SKB data been successfully copied
to the user space. At that point NIC driver will safely replenish HW
ring. This way we could avoid most of memory allocations on receive.

> > Perhaps the mempooling overhead will be too huge to pay for it even when
> > it's not necessary, in such case the iscsid will have to pass a new
> > bitflag to the socket syscall, when it creates the socket meant to talk
> > with the remote disk.
>
> I think we probably attach a mempool to a socket after the fact. And
> no, we can't have a mempool attached to every socket.

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics
2005-03-27 6:33 ` Dmitry Yusupov
@ 2005-03-27 6:46 ` David S. Miller
2005-03-28 19:45 ` Roland Dreier
0 siblings, 1 reply; 6+ messages in thread
From: David S. Miller @ 2005-03-27 6:46 UTC (permalink / raw)
To: Dmitry Yusupov
Cc: mpm, andrea, michaelc, open-iscsi, James.Bottomley,
ksummit-2005-discuss, netdev

On Sat, 26 Mar 2005 22:33:01 -0800
Dmitry Yusupov <dmitry_yus@yahoo.com> wrote:

> i.e. TCP stack should call NIC driver's callback after all SKB data
> been successfully copied to the user space. At that point NIC driver
> will safely replenish HW ring. This way we could avoid most of memory
> allocations on receive.

How does this solve your problem?  This is just simple SKB recycling,
and it's a pretty old idea.

TCP packets can be held on receive for arbitrary amounts of time.
This is especially true if data is received out of order or when
packets are dropped.  We can't even wake up the user until the holes
in the sequence space are filled.

Even if data is received properly and in order, there are no hard
guarentees about when the user will get back onto the CPU to get the
data copied to it.

During these gaps in time, you will need to keep your HW receive ring
populated with packets.

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics
2005-03-27 6:46 ` David S. Miller
@ 2005-03-28 19:45 ` Roland Dreier
[not found] ` <1112042936.5088.22.camel@beastie>
0 siblings, 1 reply; 6+ messages in thread
From: Roland Dreier @ 2005-03-28 19:45 UTC (permalink / raw)
To: David S. Miller
Cc: Dmitry Yusupov, mpm, andrea, michaelc, open-iscsi,
James.Bottomley, ksummit-2005-discuss, netdev

Let me slightly hijack this thread to throw out another topic that I
think is worth talking about at the kernel summit: handling remote DMA
(RDMA) network technologies.

As some of you might know, I'm one of the main authors of the
InfiniBand support in the kernel, and I think we have things fairly
well in hand there, although handling direct userspace access to RDMA
capabilities may raise some issues worth talking about.

However, there is also RDMA-over-TCP hardware beginning to be used,
based on the specs from the IETF rddp working group and the RDMA
Consortium.  I would hope that we can abstract out the common pieces
for InfiniBand and RDMA NIC (RNIC) support and morph
drivers/infiniband into a more general drivers/rdma.

This is not _that_ offtopic, since RDMA NICs provide another way of
handling OOM for iSCSI.  By having the NIC handle the network
transport through something like iSER, you avoid a lot of the issues
in this thread.  Having to reconnect to a target while OOM is still a
problem, but it seems no worse in principal than the issues with a
dump FC card that needs the host driver to handling fabric login.

I know that in the InfiniBand world, people have been able to run
stress tests of storage over SCSI RDMA Protocol (SRP) with very heavy
swapping going on and no deadlocks.  SRP is in effect network storage
with the transport handled by the IB hardware.

However there are some sticky points that I would be interested in
discussing.  For example, the IETF rddp drafts envisage what they call
a "dual stack" model: TCP connections are set up by the usual network
stack and run for a while in "streaming" mode until the application is
ready to start using RDMA.  At that point there is an "MPA"
negotiation and then the socket is handed over to the RNIC.  Clearly
moving the state from the kernel's stack to the RNIC is not trivial.

Other developers who have more direct experience with RNIC hardware or
perhaps just strong opinions may have other things in this area that
they'd like to talk about.

Thanks,
  Roland

^ permalink raw reply [flat|nested] 6+ messages in thread
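For readers who have not seen the "direct userspace access to RDMA
capabilities" model mentioned here, the sketch below shows the basic resource
setup using a libibverbs-style verbs API (in the form that userspace verbs API
later settled into): open a device, allocate a protection domain, and register
a buffer so the adapter can DMA into user memory without kernel copies. It is
an illustrative sketch under those assumptions, not code from the kernel's
InfiniBand stack or from this thread, and the buffer size is arbitrary.

/*
 * Hypothetical userspace RDMA setup sketch using a libibverbs-style API.
 * Queue pairs, completion queues and the actual RDMA operations are
 * omitted; only device open, PD allocation and memory registration are
 * shown.
 */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
        struct ibv_device **dev_list = ibv_get_device_list(NULL);
        if (!dev_list || !dev_list[0]) {
                fprintf(stderr, "no RDMA devices found\n");
                return 1;
        }

        struct ibv_context *ctx = ibv_open_device(dev_list[0]);
        if (!ctx) {
                fprintf(stderr, "failed to open device\n");
                return 1;
        }
        struct ibv_pd *pd = ibv_alloc_pd(ctx);

        /* Buffer that the adapter will read/write without kernel copies. */
        size_t len = 1 << 20;
        void *buf = malloc(len);
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_READ |
                                       IBV_ACCESS_REMOTE_WRITE);
        if (!mr) {
                perror("ibv_reg_mr");
                return 1;
        }

        printf("registered %zu bytes, lkey=0x%x rkey=0x%x\n",
               len, mr->lkey, mr->rkey);

        ibv_dereg_mr(mr);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(dev_list);
        free(buf);
        return 0;
}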
[parent not found: <1112042936.5088.22.camel@beastie>]
* Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics
[not found] ` <1112042936.5088.22.camel@beastie>
@ 2005-03-28 22:32 ` Benjamin LaHaise
2005-03-29 3:19 ` Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Roland Dreier
0 siblings, 1 reply; 6+ messages in thread
From: Benjamin LaHaise @ 2005-03-28 22:32 UTC (permalink / raw)
To: Dmitry Yusupov
Cc: open-iscsi, David S. Miller, mpm, andrea, michaelc,
James.Bottomley, ksummit-2005-discuss, netdev

On Mon, Mar 28, 2005 at 12:48:56PM -0800, Dmitry Yusupov wrote:
> If you have plans to start new project such as SoftRDMA than yes. lets
> discuss it since set of problems will be similar to what we've got with
> software iSCSI Initiators.

I'm somewhat interested in seeing a SoftRDMA project get off the
ground.  At least the NatSemi 83820 gige MAC is able to provide
early-rx interrupts that allow one to get an rx interrupt before the
full payload has arrived making it possible to write out a new rx
descriptor to place the payload wherever it is ultimately desired.  It
would be fun to work on if not the most performant RDMA implementation.

> I'm not a believer in any HW state-full protocol offloading technologies
> and that was one of my motivations to initiate Open-iSCSI project to
> prove that performance is not an issue anymore. And we succeeded, by
> showing comparable to iSCSI HW Initiator's numbers.

Agreed.  After working on a full TOE implementation, I think that the
niche market most TOE vendors are pursuing is not one that the Linux
community will ever develop for.  Hardware vendors that gradually add
offloading features from the NIC realm to speed up the existing
network stack are a much better fit with Linux.

> Though, for me, RDMA over TCP is an interesting topic from software
> implementation point of view. I was thinking about organizing new
> project. If someone knows that related work is already started - let me
> know since I might be interested to help.

Shall we create a new mailing list?  I guess it's time to update
majordomo... =)

		-ben

^ permalink raw reply [flat|nested] 6+ messages in thread
* Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-03-28 22:32 ` Benjamin LaHaise
@ 2005-03-29 3:19 ` Roland Dreier
2005-03-30 16:00 ` Benjamin LaHaise
0 siblings, 1 reply; 6+ messages in thread
From: Roland Dreier @ 2005-03-29 3:19 UTC (permalink / raw)
To: Benjamin LaHaise
Cc: Dmitry Yusupov, open-iscsi, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev

    Benjamin> Agreed.  After working on a full TOE implementation, I
    Benjamin> think that the niche market most TOE vendors are
    Benjamin> pursuing is not one that the Linux community will ever
    Benjamin> develop for.  Hardware vendors that gradually add
    Benjamin> offloading features from the NIC realm to speed up the
    Benjamin> existing network stack are a much better fit with Linux.

I have to admit I don't know much about the TOE / RDMA/TCP / RNIC (or
whatever you want to call it) world.  However I know that the large
majority of InfiniBand use right now is running on Linux, and I hope
the Linux community is willing to work with the IB community.

InfiniBand adoption is strong right now, with lots of large clusters
being built.  It seems reasonable that RDMA/TCP should be able to
compete in the same market.  Whether InfiniBand or RDMA/TCP or both
will survive or prosper is a good question, and I think it's too early
to tell yet.

 - R.

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-03-29 3:19 ` Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Roland Dreier
@ 2005-03-30 16:00 ` Benjamin LaHaise
2005-03-31 1:08 ` Linux support for RDMA H. Peter Anvin
0 siblings, 1 reply; 6+ messages in thread
From: Benjamin LaHaise @ 2005-03-30 16:00 UTC (permalink / raw)
To: Roland Dreier
Cc: Dmitry Yusupov, open-iscsi, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev

On Mon, Mar 28, 2005 at 07:19:35PM -0800, Roland Dreier wrote:
>     Benjamin> Agreed.  After working on a full TOE implementation, I
>     Benjamin> think that the niche market most TOE vendors are
>     Benjamin> pursuing is not one that the Linux community will ever
>     Benjamin> develop for.  Hardware vendors that gradually add
>     Benjamin> offloading features from the NIC realm to speed up the
>     Benjamin> existing network stack are a much better fit with Linux.
>
> I have to admit I don't know much about the TOE / RDMA/TCP / RNIC (or
> whatever you want to call it) world.  However I know that the large
> majority of InfiniBand use right now is running on Linux, and I hope
> the Linux community is willing to work with the IB community.

My comments were more directed to Full TOE implementations, which tend
to suffer from incomplete feature coverage if compared to the native
Linux TCP/IP stack.  Wedging a complete network stack onto a piece of
hardware does allow for better performance characteristics on
workloads where the networking overhead matters, but it comes at the
cost of not being able to trivially change the resulting stack.  Plus
there are very few vendors who are willing to release firmware code to
the open source community.

> InfiniBand adoption is strong right now, with lots of large clusters
> being built.  It seems reasonable that RDMA/TCP should be able to
> compete in the same market.  Whether InfiniBand or RDMA/TCP or both
> will survive or prosper is a good question, and I think it's too early
> to tell yet.

I'm curious how the 10Gig ethernet market will pan out.  Time and again
the market has shown that ethernet always has the cost advantage in the
end.  If something like Intel's I/O Acceleration Technology makes it
that much easier for commodity ethernet to achieve similar performance
characteristics over ethernet to that of IB and fibre channel, the cost
advantage alone might switch some new customers over.  But the hardware
isn't near what IB offers today, making IB an important niche filler.

		-ben
--
"Time is what keeps everything from happening all at once." -- John Wheeler

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Linux support for RDMA
2005-03-30 16:00 ` Benjamin LaHaise
@ 2005-03-31 1:08 ` H. Peter Anvin
0 siblings, 0 replies; 6+ messages in thread
From: H. Peter Anvin @ 2005-03-31 1:08 UTC (permalink / raw)
To: Benjamin LaHaise
Cc: Roland Dreier, Dmitry Yusupov, open-iscsi, David S. Miller, mpm,
andrea, michaelc, James.Bottomley, ksummit-2005-discuss, netdev

Benjamin LaHaise wrote:
>
> I'm curious how the 10Gig ethernet market will pan out.  Time and again
> the market has shown that ethernet always has the cost advantage in the
> end.  If something like Intel's I/O Acceleration Technology makes it
> that much easier for commodity ethernet to achieve similar performance
> characteristics over ethernet to that of IB and fibre channel, the cost
> advantage alone might switch some new customers over.  But the hardware
> isn't near what IB offers today, making IB an important niche filler.
>

From what I've seen coming down the pipe, I think 10GE is going to
eventually win over IB, just like previous generations did over Token
Ring, FDDI and other niche filler technologies.  It doesn't, as you say,
mean that e.g. IB doesn't matter *now*; furthermore, it also matters for
the purpose of fixing the kind of issues that are going to have to be
fixed anyway.

	-hpa

^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2005-04-02 1:59 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-04-01 1:49 Linux support for RDMA jaganav
2005-04-01 1:57 ` H. Peter Anvin
-- strict thread matches above, loose matches on Subject: below --
2005-04-02 1:59 jaganav
2005-04-01 23:50 Asgeir Eiriksson
2005-04-02 0:02 ` Dmitry Yusupov
[not found] <20050324233921.GZ14202@opteron.random>
[not found] ` <20050325034341.GV32638@waste.org>
[not found] ` <20050327035149.GD4053@g5.random>
2005-03-27 5:48 ` [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics Matt Mackall
2005-03-27 6:33 ` Dmitry Yusupov
2005-03-27 6:46 ` David S. Miller
2005-03-28 19:45 ` Roland Dreier
[not found] ` <1112042936.5088.22.camel@beastie>
2005-03-28 22:32 ` Benjamin LaHaise
2005-03-29 3:19 ` Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Roland Dreier
2005-03-30 16:00 ` Benjamin LaHaise
2005-03-31 1:08 ` Linux support for RDMA H. Peter Anvin