* Re: Linux support for RDMA
@ 2005-04-01 1:49 jaganav
2005-04-01 1:57 ` H. Peter Anvin
0 siblings, 1 reply; 6+ messages in thread
From: jaganav @ 2005-04-01 1:49 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Roland Dreier, Dmitry Yusupov, open-iscsi, David S. Miller, mpm,
andrea, michaelc, James.Bottomley, ksummit-2005-discuss, netdev,
Benjamin LaHaise
Quoting "H. Peter Anvin" <hpa@zytor.com>:
> Benjamin LaHaise wrote:
> >
> > I'm curious how the 10Gig ethernet market will pan out. Time and again
> > the market has shown that ethernet always has the cost advantage in the
> > end. If something like Intel's I/O Acceleration Technology makes it
> > that much easier for commodity ethernet to achieve similar performance
> > characteristics over ethernet to that of IB and fibre channel, the cost
> > advantage alone might switch some new customers over. But the hardware
> > isn't near what IB offers today, making IB an important niche filler.
> >
>
> From what I've seen coming down the pipe, I think 10GE is going to
> eventually win over IB, just like previous generations did over Token
> Ring, FDDI and other niche filler technologies. It doesn't, as you say,
> mean that e.g. IB doesn't matter *now*; furthermore, it also matters for
> the purpose of fixing the kind of issues that are going to have to be
> fixed anyway.
>
> -hpa
>
>
>
No doubt, Ethernet will eventually win. By the way, hasn't history already
proven this with ATM, specifically when the industry predicted that ATM would
replace Ethernet? :)
However, I have to agree with Ben that IB technology will fill an important
niche, particularly the low end of the High Performance Computing (HPC)
segment, which is currently in transition, moving away from proprietary
interconnects to industry-standard IB technology. Even though Ethernet may
eventually catch up with IB in terms of bandwidth, IB fabrics can offer
better latencies.
Thanks
Venkat
^ permalink raw reply [flat|nested] 6+ messages in thread

* Re: Linux support for RDMA
2005-04-01 1:49 Linux support for RDMA jaganav
@ 2005-04-01 1:57 ` H. Peter Anvin
0 siblings, 0 replies; 6+ messages in thread
From: H. Peter Anvin @ 2005-04-01 1:57 UTC (permalink / raw)
To: jaganav
Cc: Roland Dreier, Dmitry Yusupov, open-iscsi, David S. Miller, mpm,
andrea, michaelc, James.Bottomley, ksummit-2005-discuss, netdev,
Benjamin LaHaise

jaganav@us.ibm.com wrote:
>
> No doubt, Ethernet will eventually win .. btw, Hasn't history proven this over
> ATM? More specifically when the industry predicted that ATM will replace
> ethernet :)
>
> However, I'll have to agree with Ben that IB technolgy will fill an important
> niche segment, more specifically so in the low end of High Performance Computing
> (HPC) segment which is in a transition mode currently moving away from
> proprietary interconnects to industry standards based IB technology. Eventhough,
> ethernet may eventually may catch up with IB in terms of the bandwidth but IB
> fabrics can offer better latencies.
>

We've seen this over and over... Token Ring, FDDI, ATM, IB, ... all of
them "better" than the Ethernet of the day, but eventually
commoditization wins out.

With 10GE, Ethernet has finally stopped pretending to be CSMA/CD even;
"Ethernet" is now really nothing more than a collective name for a set
of somewhat compatible commodity networking technologies.

	-hpa

^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: Linux support for RDMA
@ 2005-04-02 1:59 jaganav
0 siblings, 0 replies; 6+ messages in thread
From: jaganav @ 2005-04-02 1:59 UTC (permalink / raw)
To: Dmitry Yusupov
Cc: Asgeir Eiriksson, H. Peter Anvin, Roland Dreier, open-iscsi,
David S. Miller, mpm, andrea, michaelc, James.Bottomley,
ksummit-2005-discuss, netdev, Benjamin LaHaise
Quoting Dmitry Yusupov <dima@neterion.com>:
> On Fri, 2005-04-01 at 15:50 -0800, Asgeir Eiriksson wrote:
> > Venkat
> >
> > Your assessment of the IB vs. Ethernet latencies isn't necessarily
> > correct.
> > - you already have available low latency 10GE switches (< 1us
> > port-to-port)
> > - you already have available low latency (cut-through processing) 10GE
> > TOE engines
> >
> > The Veritest verified 10GE TOE end-to-end latency is < 10us today
> > (end-to-end being from a Linux user-space-process to a Linux
> > user-space-process through a switch; full report with detail of the
> > setup is available at
> > http://www.chelsio.com/technology/Chelsio10GbE_Fujitsu.pdf)
> >
> > For comparison: the published IB latency numbers are around 5us today
> > and those use a polling receiver, and those don't include a context
> > switch(es) as does the Ethernet number quoted above.
>
> yep. I should agree in here. On 10Gbps network latencies numbers are
> around 5-15us. Even with non-TOE card, I managed to get 13us latency
> with regular TCP/IP stack.
>
> [root@localhost root]# ./nptcp -a -t -l 256 -u 98304 -i 256 -p 5100 -P - h
> 17.1.1.227
> Latency: 0.000013
> Now starting main loop
> 0: 256 bytes 7 times --> 131.37 Mbps in 0.000015 sec
> 1: 512 bytes 65 times --> 239.75 Mbps in 0.000016 sec
>
> Dima
When I mentioned latency, I meant the end-to-end measurement
(i.e. from application to application), not just the switching
or port-to-port latencies.
With IB, I have seen the best numbers range from 5 to 7 us,
which is far better than Ethernet today (15 to 35 us) on the
network we have. I am not denying that Ethernet is trying to
close the gap here, but IB has a relative advantage right now.
Good to see you got 5 us in one case, but what were the switch
and adapter latencies in that setup?
Thanks
Venkat
^ permalink raw reply [flat|nested] 6+ messages in thread

* RE: Linux support for RDMA
@ 2005-04-01 23:50 Asgeir Eiriksson
2005-04-02 0:02 ` Dmitry Yusupov
0 siblings, 1 reply; 6+ messages in thread
From: Asgeir Eiriksson @ 2005-04-01 23:50 UTC (permalink / raw)
To: jaganav, H. Peter Anvin
Cc: Roland Dreier, Dmitry Yusupov, open-iscsi, David S. Miller, mpm,
andrea, michaelc, James.Bottomley, ksummit-2005-discuss, netdev,
Benjamin LaHaise

Venkat

Your assessment of the IB vs. Ethernet latencies isn't necessarily
correct.
- you already have available low latency 10GE switches (< 1us
  port-to-port)
- you already have available low latency (cut-through processing) 10GE
  TOE engines

The Veritest verified 10GE TOE end-to-end latency is < 10us today
(end-to-end being from a Linux user-space-process to a Linux
user-space-process through a switch; full report with detail of the
setup is available at
http://www.chelsio.com/technology/Chelsio10GbE_Fujitsu.pdf)

For comparison: the published IB latency numbers are around 5us today
and those use a polling receiver, and those don't include a context
switch(es) as does the Ethernet number quoted above.

'Asgeir

^ permalink raw reply [flat|nested] 6+ messages in thread
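As a concrete reference point for the end-to-end numbers being traded in this
thread, the sketch below shows what a minimal TCP ping-pong latency measurement
looks like; NetPIPE's nptcp, quoted earlier, does the same thing far more
carefully. This is an illustrative sketch only: the peer address, port, and
iteration count are invented, it assumes the remote side simply echoes each
byte back, and it makes no claims about the hardware discussed here.

/*
 * Hypothetical TCP ping-pong latency sketch (client side).  Assumes the
 * peer at SERVER_ADDR echoes every byte straight back; the address, port
 * and iteration count are placeholders.  One-way latency is estimated as
 * half the average round-trip time.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

#define SERVER_ADDR     "17.1.1.227"    /* placeholder peer */
#define SERVER_PORT     5100
#define ITERATIONS      10000

int main(void)
{
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;
        struct sockaddr_in sa;
        char byte = 'x';
        struct timeval start, end;

        if (fd < 0) {
                perror("socket");
                return 1;
        }

        /* Disable Nagle so each 1-byte message goes out immediately. */
        setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));

        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_port = htons(SERVER_PORT);
        inet_pton(AF_INET, SERVER_ADDR, &sa.sin_addr);

        if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
                perror("connect");
                return 1;
        }

        gettimeofday(&start, NULL);
        for (int i = 0; i < ITERATIONS; i++) {
                if (write(fd, &byte, 1) != 1 || read(fd, &byte, 1) != 1) {
                        perror("ping-pong");
                        return 1;
                }
        }
        gettimeofday(&end, NULL);

        double usec = (end.tv_sec - start.tv_sec) * 1e6 +
                      (end.tv_usec - start.tv_usec);
        printf("avg one-way latency: %.1f us\n", usec / ITERATIONS / 2.0);

        close(fd);
        return 0;
}

Quoting half the measured round-trip time as the "latency" is the usual
convention, and is roughly how figures such as the nptcp output above are
commonly derived.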
* RE: Linux support for RDMA
2005-04-01 23:50 Asgeir Eiriksson
@ 2005-04-02 0:02 ` Dmitry Yusupov
0 siblings, 0 replies; 6+ messages in thread
From: Dmitry Yusupov @ 2005-04-02 0:02 UTC (permalink / raw)
To: Asgeir Eiriksson
Cc: jaganav, H. Peter Anvin, Roland Dreier, open-iscsi,
David S. Miller, mpm, andrea, michaelc, James.Bottomley,
ksummit-2005-discuss, netdev, Benjamin LaHaise

On Fri, 2005-04-01 at 15:50 -0800, Asgeir Eiriksson wrote:
> Venkat
>
> Your assessment of the IB vs. Ethernet latencies isn't necessarily
> correct.
> - you already have available low latency 10GE switches (< 1us
> port-to-port)
> - you already have available low latency (cut-through processing) 10GE
> TOE engines
>
> The Veritest verified 10GE TOE end-to-end latency is < 10us today
> (end-to-end being from a Linux user-space-process to a Linux
> user-space-process through a switch; full report with detail of the
> setup is available at
> http://www.chelsio.com/technology/Chelsio10GbE_Fujitsu.pdf)
>
> For comparison: the published IB latency numbers are around 5us today
> and those use a polling receiver, and those don't include a context
> switch(es) as does the Ethernet number quoted above.

yep. I should agree in here. On 10Gbps network latencies numbers are
around 5-15us. Even with non-TOE card, I managed to get 13us latency
with regular TCP/IP stack.

[root@localhost root]# ./nptcp -a -t -l 256 -u 98304 -i 256 -p 5100 -P -h 17.1.1.227
Latency: 0.000013
Now starting main loop
  0:   256 bytes      7 times -->    131.37 Mbps in 0.000015 sec
  1:   512 bytes     65 times -->    239.75 Mbps in 0.000016 sec

Dima

^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <20050324233921.GZ14202@opteron.random>]
[parent not found: <20050325034341.GV32638@waste.org>]
[parent not found: <20050327035149.GD4053@g5.random>]
* Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics
[not found] ` <20050327035149.GD4053@g5.random>
@ 2005-03-27 5:48 ` Matt Mackall
2005-03-27 6:33 ` Dmitry Yusupov
0 siblings, 1 reply; 6+ messages in thread
From: Matt Mackall @ 2005-03-27 5:48 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Mike Christie, Dmitry Yusupov, open-iscsi, James.Bottomley,
ksummit-2005-discuss, netdev

I'm cc:ing this to netdev, where this discussion really ought to be.
There's a separate networking summit and I suspect most of the
networking heavies aren't reading ksummit-discuss or open-iscsi.
It's getting rather far afield for ksummit-discuss so people should
trim that from follow-ups.

On Sun, Mar 27, 2005 at 05:51:49AM +0200, Andrea Arcangeli wrote:
> On Thu, Mar 24, 2005 at 07:43:41PM -0800, Matt Mackall wrote:
> > There may be network multipath. But I think we can have a single
> > socket mempool per logical device and a single skbuff mempool shared
> > among those sockets.
>
> If we'll have to reserve more than 1 packet per each socket context,
> then the mempool probably can't be shared.

I believe the mempool can be shared among all sockets that represent
the same storage device. Packets out any socket represent progress.

> I wonder if somebody has ever reproduced deadlocks
> by swapping on software-tcp-iscsi.

Yes, done before it was even called iSCSI.

> > And that still leaves us with the lack of buffers to receive ACKs
> > problem, which is perhaps worse.
>
> The mempooling should take care of the acks too.

The receive buffer is allocated at the time we DMA it from the card.
We have no idea of its contents and we won't know what socket mempool
to pull the receive skbuff from until much higher in the network
stack, which could be quite a while later if we're under OOM load. And
we can't have a mempool big enough to handle all the traffic that
might potentially be deferred for softirq processing when we're OOM,
especially at gigabit rates.

I think this is actually the tricky piece of the problem and solving
the socket and send buffer allocation doesn't help until this gets
figured out.

We could perhaps try to address this with another special receive-side
alloc_skb that fails most of the time on OOM but sometimes pulls from
a special reserve.

> Perhaps the mempooling overhead will be too huge to pay for it even when
> it's not necessary, in such case the iscsid will have to pass a new
> bitflag to the socket syscall, when it creates the socket meant to talk
> with the remote disk.

I think we probably attach a mempool to a socket after the fact. And
no, we can't have a mempool attached to every socket.

--
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply [flat|nested] 6+ messages in thread
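To make the mempool idea being discussed here concrete, below is a rough,
hypothetical sketch of a per-storage-device skb reserve built on the kernel's
mempool API. The iscsi_* names, the pool size, and the buffer size are
invented for illustration; this is not code from any proposed patch, and it
deliberately ignores the receive-side problem Matt describes, which is the
hard part.

/*
 * Hypothetical sketch only: a per-storage-device skb reserve built on
 * mempool_create()/mempool_alloc(), shared by all sockets that reach
 * the same device, along the lines of the discussion above.
 */
#include <linux/mempool.h>
#include <linux/skbuff.h>

#define ISCSI_SKB_RESERVE       16      /* packets held back for the device */
#define ISCSI_SKB_SIZE          1600    /* enough for an MTU-sized frame */

static void *iscsi_skb_mempool_alloc(gfp_t gfp_mask, void *pool_data)
{
        return alloc_skb(ISCSI_SKB_SIZE, gfp_mask);
}

static void iscsi_skb_mempool_free(void *element, void *pool_data)
{
        kfree_skb(element);
}

/* One pool per logical storage device, shared by all of its sockets. */
static mempool_t *iscsi_skb_pool;

static int iscsi_skb_reserve_init(void)
{
        iscsi_skb_pool = mempool_create(ISCSI_SKB_RESERVE,
                                        iscsi_skb_mempool_alloc,
                                        iscsi_skb_mempool_free, NULL);
        return iscsi_skb_pool ? 0 : -ENOMEM;
}

/*
 * On the send path, fall back to the reserve under memory pressure:
 * with a blocking gfp_mask, mempool_alloc() waits for an element to be
 * returned to the pool rather than failing outright, so writeout to the
 * remote disk can always make forward progress.
 */
static struct sk_buff *iscsi_alloc_skb(gfp_t gfp_mask)
{
        return mempool_alloc(iscsi_skb_pool, gfp_mask);
}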
* Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics
2005-03-27 5:48 ` [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics Matt Mackall
@ 2005-03-27 6:33 ` Dmitry Yusupov
2005-03-27 6:46 ` David S. Miller
0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Yusupov @ 2005-03-27 6:33 UTC (permalink / raw)
To: Matt Mackall
Cc: Andrea Arcangeli, Mike Christie, open-iscsi@googlegroups.com,
James.Bottomley, ksummit-2005-discuss, netdev

On Sat, 2005-03-26 at 21:48 -0800, Matt Mackall wrote:
> The receive buffer is allocated at the time we DMA it from the card.
> We have no idea of its contents and we won't know what socket mempool
> to pull the receive skbuff from until much higher in the network
> stack, which could be quite a while later if we're under OOM load. And
> we can't have a mempool big enough to handle all the traffic that
> might potentially be deferred for softirq processing when we're OOM,
> especially at gigabit rates.
>
> I think this is actually the tricky piece of the problem and solving
> the socket and send buffer allocation doesn't help until this gets
> figured out.
>
> We could perhaps try to address this with another special receive-side
> alloc_skb that fails most of the time on OOM but sometimes pulls from
> a special reserve.

nope. this will not solve the problem on receive or will just solve it
partially. The right way to solve it would be to provide special API
which will help to re-use link-layer's ring SKBs. i.e. TCP stack should
call NIC driver's callback after all SKB data been successfully copied
to the user space. At that point NIC driver will safely replenish HW
ring. This way we could avoid most of memory allocations on receive.

> > Perhaps the mempooling overhead will be too huge to pay for it even when
> > it's not necessary, in such case the iscsid will have to pass a new
> > bitflag to the socket syscall, when it creates the socket meant to talk
> > with the remote disk.
>
> I think we probably attach a mempool to a socket after the fact. And
> no, we can't have a mempool attached to every socket.

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics
2005-03-27 6:33 ` Dmitry Yusupov
@ 2005-03-27 6:46 ` David S. Miller
2005-03-28 19:45 ` Roland Dreier
0 siblings, 1 reply; 6+ messages in thread
From: David S. Miller @ 2005-03-27 6:46 UTC (permalink / raw)
To: Dmitry Yusupov
Cc: mpm, andrea, michaelc, open-iscsi, James.Bottomley,
ksummit-2005-discuss, netdev

On Sat, 26 Mar 2005 22:33:01 -0800
Dmitry Yusupov <dmitry_yus@yahoo.com> wrote:

> i.e. TCP stack should call NIC driver's callback after all SKB data
> been successfully copied to the user space. At that point NIC driver
> will safely replenish HW ring. This way we could avoid most of memory
> allocations on receive.

How does this solve your problem?  This is just simple SKB recycling,
and it's a pretty old idea.

TCP packets can be held on receive for arbitrary amounts of time.
This is especially true if data is received out of order or when
packets are dropped.  We can't even wake up the user until the holes
in the sequence space are filled.

Even if data is received properly and in order, there are no hard
guarentees about when the user will get back onto the CPU to get the
data copied to it.

During these gaps in time, you will need to keep your HW receive ring
populated with packets.

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics
2005-03-27 6:46 ` David S. Miller
@ 2005-03-28 19:45 ` Roland Dreier
[not found] ` <1112042936.5088.22.camel@beastie>
0 siblings, 1 reply; 6+ messages in thread
From: Roland Dreier @ 2005-03-28 19:45 UTC (permalink / raw)
To: David S. Miller
Cc: Dmitry Yusupov, mpm, andrea, michaelc, open-iscsi,
James.Bottomley, ksummit-2005-discuss, netdev

Let me slightly hijack this thread to throw out another topic that I
think is worth talking about at the kernel summit: handling remote DMA
(RDMA) network technologies.

As some of you might know, I'm one of the main authors of the
InfiniBand support in the kernel, and I think we have things fairly
well in hand there, although handling direct userspace access to RDMA
capabilities may raise some issues worth talking about.

However, there is also RDMA-over-TCP hardware beginning to be used,
based on the specs from the IETF rddp working group and the RDMA
Consortium.  I would hope that we can abstract out the common pieces
for InfiniBand and RDMA NIC (RNIC) support and morph
drivers/infiniband into a more general drivers/rdma.

This is not _that_ offtopic, since RDMA NICs provide another way of
handling OOM for iSCSI.  By having the NIC handle the network
transport through something like iSER, you avoid a lot of the issues
in this thread.  Having to reconnect to a target while OOM is still a
problem, but it seems no worse in principal than the issues with a
dump FC card that needs the host driver to handling fabric login.

I know that in the InfiniBand world, people have been able to run
stress tests of storage over SCSI RDMA Protocol (SRP) with very heavy
swapping going on and no deadlocks.  SRP is in effect network storage
with the transport handled by the IB hardware.

However there are some sticky points that I would be interested in
discussing.  For example, the IETF rddp drafts envisage what they call
a "dual stack" model: TCP connections are set up by the usual network
stack and run for a while in "streaming" mode until the application is
ready to start using RDMA.  At that point there is an "MPA"
negotiation and then the socket is handed over to the RNIC.  Clearly
moving the state from the kernel's stack to the RNIC is not trivial.

Other developers who have more direct experience with RNIC hardware or
perhaps just strong opinions may have other things in this area that
they'd like to talk about.

Thanks,
  Roland

^ permalink raw reply [flat|nested] 6+ messages in thread
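For readers who have not seen the "direct userspace access to RDMA
capabilities" model mentioned here, the sketch below shows the basic resource
setup using a libibverbs-style verbs API (in the form that userspace verbs API
later settled into): open a device, allocate a protection domain, and register
a buffer so the adapter can DMA into user memory without kernel copies. It is
an illustrative sketch under those assumptions, not code from the kernel's
InfiniBand stack or from this thread, and the buffer size is arbitrary.

/*
 * Hypothetical userspace RDMA setup sketch using a libibverbs-style API.
 * Queue pairs, completion queues and the actual RDMA operations are
 * omitted; only device open, PD allocation and memory registration are
 * shown.
 */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
        struct ibv_device **dev_list = ibv_get_device_list(NULL);
        if (!dev_list || !dev_list[0]) {
                fprintf(stderr, "no RDMA devices found\n");
                return 1;
        }

        struct ibv_context *ctx = ibv_open_device(dev_list[0]);
        if (!ctx) {
                fprintf(stderr, "failed to open device\n");
                return 1;
        }
        struct ibv_pd *pd = ibv_alloc_pd(ctx);

        /* Buffer that the adapter will read/write without kernel copies. */
        size_t len = 1 << 20;
        void *buf = malloc(len);
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_READ |
                                       IBV_ACCESS_REMOTE_WRITE);
        if (!mr) {
                perror("ibv_reg_mr");
                return 1;
        }

        printf("registered %zu bytes, lkey=0x%x rkey=0x%x\n",
               len, mr->lkey, mr->rkey);

        ibv_dereg_mr(mr);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(dev_list);
        free(buf);
        return 0;
}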
[parent not found: <1112042936.5088.22.camel@beastie>]
* Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics
[not found] ` <1112042936.5088.22.camel@beastie>
@ 2005-03-28 22:32 ` Benjamin LaHaise
2005-03-29 3:19 ` Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Roland Dreier
0 siblings, 1 reply; 6+ messages in thread
From: Benjamin LaHaise @ 2005-03-28 22:32 UTC (permalink / raw)
To: Dmitry Yusupov
Cc: open-iscsi, David S. Miller, mpm, andrea, michaelc,
James.Bottomley, ksummit-2005-discuss, netdev

On Mon, Mar 28, 2005 at 12:48:56PM -0800, Dmitry Yusupov wrote:
> If you have plans to start new project such as SoftRDMA than yes. lets
> discuss it since set of problems will be similar to what we've got with
> software iSCSI Initiators.

I'm somewhat interested in seeing a SoftRDMA project get off the
ground.  At least the NatSemi 83820 gige MAC is able to provide
early-rx interrupts that allow one to get an rx interrupt before the
full payload has arrived making it possible to write out a new rx
descriptor to place the payload wherever it is ultimately desired.  It
would be fun to work on if not the most performant RDMA implementation.

> I'm not a believer in any HW state-full protocol offloading technologies
> and that was one of my motivations to initiate Open-iSCSI project to
> prove that performance is not an issue anymore. And we succeeded, by
> showing comparable to iSCSI HW Initiator's numbers.

Agreed.  After working on a full TOE implementation, I think that the
niche market most TOE vendors are pursuing is not one that the Linux
community will ever develop for.  Hardware vendors that gradually add
offloading features from the NIC realm to speed up the existing
network stack are a much better fit with Linux.

> Though, for me, RDMA over TCP is an interesting topic from software
> implementation point of view. I was thinking about organizing new
> project. If someone knows that related work is already started - let me
> know since I might be interested to help.

Shall we create a new mailing list?  I guess it's time to update
majordomo... =)

		-ben

^ permalink raw reply [flat|nested] 6+ messages in thread
* Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-03-28 22:32 ` Benjamin LaHaise
@ 2005-03-29 3:19 ` Roland Dreier
2005-03-30 16:00 ` Benjamin LaHaise
0 siblings, 1 reply; 6+ messages in thread
From: Roland Dreier @ 2005-03-29 3:19 UTC (permalink / raw)
To: Benjamin LaHaise
Cc: Dmitry Yusupov, open-iscsi, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev

    Benjamin> Agreed.  After working on a full TOE implementation, I
    Benjamin> think that the niche market most TOE vendors are
    Benjamin> pursuing is not one that the Linux community will ever
    Benjamin> develop for.  Hardware vendors that gradually add
    Benjamin> offloading features from the NIC realm to speed up the
    Benjamin> existing network stack are a much better fit with Linux.

I have to admit I don't know much about the TOE / RDMA/TCP / RNIC (or
whatever you want to call it) world.  However I know that the large
majority of InfiniBand use right now is running on Linux, and I hope
the Linux community is willing to work with the IB community.

InfiniBand adoption is strong right now, with lots of large clusters
being built.  It seems reasonable that RDMA/TCP should be able to
compete in the same market.  Whether InfiniBand or RDMA/TCP or both
will survive or prosper is a good question, and I think it's too early
to tell yet.

 - R.

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-03-29 3:19 ` Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Roland Dreier
@ 2005-03-30 16:00 ` Benjamin LaHaise
2005-03-31 1:08 ` Linux support for RDMA H. Peter Anvin
0 siblings, 1 reply; 6+ messages in thread
From: Benjamin LaHaise @ 2005-03-30 16:00 UTC (permalink / raw)
To: Roland Dreier
Cc: Dmitry Yusupov, open-iscsi, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev

On Mon, Mar 28, 2005 at 07:19:35PM -0800, Roland Dreier wrote:
>     Benjamin> Agreed.  After working on a full TOE implementation, I
>     Benjamin> think that the niche market most TOE vendors are
>     Benjamin> pursuing is not one that the Linux community will ever
>     Benjamin> develop for.  Hardware vendors that gradually add
>     Benjamin> offloading features from the NIC realm to speed up the
>     Benjamin> existing network stack are a much better fit with Linux.
>
> I have to admit I don't know much about the TOE / RDMA/TCP / RNIC (or
> whatever you want to call it) world.  However I know that the large
> majority of InfiniBand use right now is running on Linux, and I hope
> the Linux community is willing to work with the IB community.

My comments were more directed to Full TOE implementations, which tend
to suffer from incomplete feature coverage if compared to the native
Linux TCP/IP stack.  Wedging a complete network stack onto a piece of
hardware does allow for better performance characteristics on
workloads where the networking overhead matters, but it comes at the
cost of not being able to trivially change the resulting stack.  Plus
there are very few vendors who are willing to release firmware code to
the open source community.

> InfiniBand adoption is strong right now, with lots of large clusters
> being built.  It seems reasonable that RDMA/TCP should be able to
> compete in the same market.  Whether InfiniBand or RDMA/TCP or both
> will survive or prosper is a good question, and I think it's too early
> to tell yet.

I'm curious how the 10Gig ethernet market will pan out.  Time and again
the market has shown that ethernet always has the cost advantage in the
end.  If something like Intel's I/O Acceleration Technology makes it
that much easier for commodity ethernet to achieve similar performance
characteristics over ethernet to that of IB and fibre channel, the cost
advantage alone might switch some new customers over.  But the hardware
isn't near what IB offers today, making IB an important niche filler.

		-ben
--
"Time is what keeps everything from happening all at once." -- John Wheeler

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Linux support for RDMA
2005-03-30 16:00 ` Benjamin LaHaise
@ 2005-03-31 1:08 ` H. Peter Anvin
0 siblings, 0 replies; 6+ messages in thread
From: H. Peter Anvin @ 2005-03-31 1:08 UTC (permalink / raw)
To: Benjamin LaHaise
Cc: Roland Dreier, Dmitry Yusupov, open-iscsi, David S. Miller, mpm,
andrea, michaelc, James.Bottomley, ksummit-2005-discuss, netdev

Benjamin LaHaise wrote:
>
> I'm curious how the 10Gig ethernet market will pan out.  Time and again
> the market has shown that ethernet always has the cost advantage in the
> end.  If something like Intel's I/O Acceleration Technology makes it
> that much easier for commodity ethernet to achieve similar performance
> characteristics over ethernet to that of IB and fibre channel, the cost
> advantage alone might switch some new customers over.  But the hardware
> isn't near what IB offers today, making IB an important niche filler.
>

From what I've seen coming down the pipe, I think 10GE is going to
eventually win over IB, just like previous generations did over Token
Ring, FDDI and other niche filler technologies.  It doesn't, as you say,
mean that e.g. IB doesn't matter *now*; furthermore, it also matters for
the purpose of fixing the kind of issues that are going to have to be
fixed anyway.

	-hpa

^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2005-04-02 1:59 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-04-01 1:49 Linux support for RDMA jaganav
2005-04-01 1:57 ` H. Peter Anvin
-- strict thread matches above, loose matches on Subject: below --
2005-04-02 1:59 jaganav
2005-04-01 23:50 Asgeir Eiriksson
2005-04-02 0:02 ` Dmitry Yusupov
[not found] <20050324233921.GZ14202@opteron.random>
[not found] ` <20050325034341.GV32638@waste.org>
[not found] ` <20050327035149.GD4053@g5.random>
2005-03-27 5:48 ` [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics Matt Mackall
2005-03-27 6:33 ` Dmitry Yusupov
2005-03-27 6:46 ` David S. Miller
2005-03-28 19:45 ` Roland Dreier
[not found] ` <1112042936.5088.22.camel@beastie>
2005-03-28 22:32 ` Benjamin LaHaise
2005-03-29 3:19 ` Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Roland Dreier
2005-03-30 16:00 ` Benjamin LaHaise
2005-03-31 1:08 ` Linux support for RDMA H. Peter Anvin