* Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
[not found] ` <1112042936.5088.22.camel@beastie>
2005-03-28 22:32 ` Benjamin LaHaise
@ 2005-03-29 3:14 ` Roland Dreier
1 sibling, 0 replies; 23+ messages in thread
From: Roland Dreier @ 2005-03-29 3:14 UTC (permalink / raw)
To: Dmitry Yusupov
Cc: open-iscsi, David S. Miller, mpm, andrea, michaelc,
James.Bottomley, ksummit-2005-discuss, netdev
Dmitry> Basically, HW offloading all kind of is a different
Dmitry> subject. Yes, iSER/RDMA/RNIC will help to avoid bunch of
Dmitry> problems but at the same time will add bunch of new
Dmitry> problems. OOM/deadlock problem we are discussing is a
Dmitry> software, *not* hardware related.
Yes, that's why I said I was hijacking the topic to bring up something
else I was interested in :)
Dmitry> If you have plans to start new project such as SoftRDMA
Dmitry> than yes. lets discuss it since set of problems will be
Dmitry> similar to what we've got with software iSCSI Initiators.
No, I don't have plans for such a project, although I would be
interested in participating in a small way. Unfortunately I'm
involved in too many other things to do much real work.
My main interest comes from the InfiniBand world. Right now we have
the beginnings of good support for IB in drivers/infiniband, but
people are always talking to me about adding support for RDMA/TCP
hardware. I think we should be able to evolve the current InfiniBand
API to a more generic RDMA API, and I would hope that a "SoftRDMA"
project can fit in as just another low-level device driver (sort of
the same way software iSCSI sits under the SCSI stack).
In fact I think SoftRDMA would be very good for this generalization
work, as it would force us to come up with very flexible APIs.
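As a rough illustration of the layering described above, here is a minimal sketch in Python rather than kernel C. Every name in it (RdmaDevice, SoftRdmaDevice, register_memory, rdma_write, place_payload) is hypothetical and is not taken from drivers/infiniband or any existing API; it only shows upper-layer code written against a transport-neutral interface, with a software provider plugged in underneath as just another device.

```python
from abc import ABC, abstractmethod


class RdmaDevice(ABC):
    """Hypothetical transport-neutral interface a low-level driver implements."""

    @abstractmethod
    def register_memory(self, buf: bytearray) -> int:
        """Make buf a target for remote writes and return a key for it."""

    @abstractmethod
    def rdma_write(self, key: int, offset: int, data: bytes) -> None:
        """Place data into the registered buffer identified by key."""


class SoftRdmaDevice(RdmaDevice):
    """Software-only provider: placement is an ordinary memory copy."""

    def __init__(self) -> None:
        self._regions = {}   # key -> registered bytearray
        self._next_key = 1

    def register_memory(self, buf: bytearray) -> int:
        key = self._next_key
        self._next_key += 1
        self._regions[key] = buf
        return key

    def rdma_write(self, key: int, offset: int, data: bytes) -> None:
        region = self._regions[key]
        if offset < 0 or offset + len(data) > len(region):
            raise ValueError("write outside registered region")
        region[offset:offset + len(data)] = data


def place_payload(dev: RdmaDevice, key: int, payload: bytes) -> None:
    """Upper-layer code: depends only on RdmaDevice, not on the provider."""
    dev.rdma_write(key, 0, payload)


dev = SoftRdmaDevice()
target = bytearray(8)
key = dev.register_memory(target)
place_payload(dev, key, b"iser")
```

An InfiniBand or RDMA/TCP provider would implement the same two methods with hardware placement instead of a copy, and place_payload would not change.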
Dmitry> I'm not a believer in any HW state-full protocol
Dmitry> offloading technologies and that was one of my motivations
Dmitry> to initiate Open-iSCSI project to prove that performance
Dmitry> is not an issue anymore. And we succeeded, by showing
Dmitry> comparable to iSCSI HW Initiator's numbers.
Fair enough. I think I agree that HW offload is not really justified
if all you care about is storage, although a cheap iSCSI HBA that
handles all the transport and just lets the host queue IOs seems like
a reasonable thing to put in a server that has work to do beyond
running a storage stack.
It seems that many people are using RDMA hardware (mostly InfiniBand
now, maybe RDMA/TCP will catch on) for other reasons. In
those cases users often want to share the same fabric and NIC for
storage too. But my main interest right now is in getting RDMA
working well on Linux for the users that are already out there -- I
know many IB clusters with hundreds and even thousands of nodes are
being built all the time, so InfiniBand must be solving some real
problems for users.
- R.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-03-28 22:32 ` Benjamin LaHaise
@ 2005-03-29 3:19 ` Roland Dreier
2005-03-30 16:00 ` Benjamin LaHaise
0 siblings, 1 reply; 23+ messages in thread
From: Roland Dreier @ 2005-03-29 3:19 UTC (permalink / raw)
To: Benjamin LaHaise
Cc: Dmitry Yusupov, open-iscsi, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev
Benjamin> Agreed. After working on a full TOE implementation, I
Benjamin> think that the niche market most TOE vendors are
Benjamin> pursuing is not one that the Linux community will ever
Benjamin> develop for. Hardware vendors that gradually add
Benjamin> offloading features from the NIC realm to speed up the
Benjamin> existing network stack are a much better fit with Linux.
I have to admit I don't know much about the TOE / RDMA/TCP / RNIC (or
whatever you want to call it) world. However I know that the large
majority of InfiniBand use right now is running on Linux, and I hope
the Linux community is willing to work with the IB community.
InfiniBand adoption is strong right now, with lots of large clusters
being built. It seems reasonable that RDMA/TCP should be able to
compete in the same market. Whether InfiniBand or RDMA/TCP or both
will survive or prosper is a good question, and I think it's too early
to tell yet.
- R.
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-03-29 3:19 ` Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Roland Dreier
@ 2005-03-30 16:00 ` Benjamin LaHaise
0 siblings, 0 replies; 23+ messages in thread
From: Benjamin LaHaise @ 2005-03-30 16:00 UTC (permalink / raw)
To: Roland Dreier
Cc: Dmitry Yusupov, open-iscsi, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev
On Mon, Mar 28, 2005 at 07:19:35PM -0800, Roland Dreier wrote:
> Benjamin> Agreed. After working on a full TOE implementation, I
> Benjamin> think that the niche market most TOE vendors are
> Benjamin> pursuing is not one that the Linux community will ever
> Benjamin> develop for. Hardware vendors that gradually add
> Benjamin> offloading features from the NIC realm to speed up the
> Benjamin> existing network stack are a much better fit with Linux.
>
> I have to admit I don't know much about the TOE / RDMA/TCP / RNIC (or
> whatever you want to call it) world. However I know that the large
> majority of InfiniBand use right now is running on Linux, and I hope
> the Linux community is willing to work with the IB community.
My comments were more directed to Full TOE implementations, which tend
to suffer from incomplete feature coverage if compared to the native
Linux TCP/IP stack. Wedging a complete network stack onto a piece of
hardware does allow for better performance characteristics on workloads
where the networking overhead matters, but it comes at the cost of not
being able to trivially change the resulting stack. Plus there are
very few vendors who are willing to release firmware code to the open
source community.
> InfiniBand adoption is strong right now, with lots of large clusters
> being built. It seems reasonable that RDMA/TCP should be able to
> compete in the same market. Whether InfiniBand or RDMA/TCP or both
> will survive or prosper is a good question, and I think it's too early
> to tell yet.
I'm curious how the 10Gig ethernet market will pan out. Time and again
the market has shown that ethernet always has the cost advantage in the
end. If something like Intel's I/O Acceleration Technology makes it
that much easier for commodity ethernet to achieve similar performance
characteristics over ethernet to that of IB and fibre channel, the cost
advantage alone might switch some new customers over. But the hardware
isn't near what IB offers today, making IB an important niche filler.
-ben
--
"Time is what keeps everything from happening all at once." -- John Wheeler
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
@ 2005-04-01 2:13 jaganav
2005-04-01 23:43 ` Stephen Hemminger
0 siblings, 1 reply; 23+ messages in thread
From: jaganav @ 2005-04-01 2:13 UTC (permalink / raw)
To: Roland Dreier
Cc: Benjamin LaHaise, Dmitry Yusupov, open-iscsi, David S. Miller,
mpm, andrea, michaelc, James.Bottomley, ksummit-2005-discuss,
netdev, bmt
Quoting Roland Dreier <roland@topspin.com>:
> I have to admit I don't know much about the TOE / RDMA/TCP / RNIC (or
> whatever you want to call it) world. However I know that the large
> majority of InfiniBand use right now is running on Linux, and I hope
> the Linux community is willing to work with the IB community.
>
Just want to let everyone know that we have started an opensource
effort (www.openrdma.org) for enablement of RNICs (RDMA enabled NICs). This
community has now come up with an architecture
(http://rdma.sourceforge.net/architecture.pdf) to build this support in Linux.
Would really appreciate if you review and provide any comments. We have just
started to hack but no code is available on this project yet.
Thanks
Venkat
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-01 2:13 jaganav
@ 2005-04-01 23:43 ` Stephen Hemminger
2005-04-02 1:37 ` jaganav
0 siblings, 1 reply; 23+ messages in thread
From: Stephen Hemminger @ 2005-04-01 23:43 UTC (permalink / raw)
To: jaganav
Cc: Roland Dreier, Benjamin LaHaise, Dmitry Yusupov, open-iscsi,
David S. Miller, mpm, andrea, michaelc, James.Bottomley,
ksummit-2005-discuss, netdev, bmt
On Thu, 31 Mar 2005 21:13:39 -0500
jaganav@us.ibm.com wrote:
> Quoting Roland Dreier <roland@topspin.com>:
> > I have to admit I don't know much about the TOE / RDMA/TCP / RNIC (or
> > whatever you want to call it) world. However I know that the large
> > majority of InfiniBand use right now is running on Linux, and I hope
> > the Linux community is willing to work with the IB community.
> >
>
> Just want to let everyone know that we have started an opensource
> effort (www.openrdma.org) for enablement of RNICs (RDMA enabled NICs). This
> community has now come up with an architecture
> (http://rdma.sourceforge.net/architecture.pdf) to build this support in Linux.
> Would really appreciate if you review and provide any comments. We have just
> started to hack but no code is available on this project yet.
>
> Thanks
> Venkat
OpenRdma is a misnomer, because as I read your architecture you are trying to
create a "kernel abstraction layer" for closed source vendor RDMA drivers. This will
never be accepted, please go back to the drawing board and figure out how to make
real open source drivers.
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
@ 2005-04-02 1:37 ` jaganav
2005-04-02 5:27 ` Greg KH
2005-04-04 16:50 ` Stephen Hemminger
0 siblings, 2 replies; 23+ messages in thread
From: jaganav @ 2005-04-02 1:37 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Roland Dreier, Benjamin LaHaise, Dmitry Yusupov, open-iscsi,
David S. Miller, mpm, andrea, michaelc, James.Bottomley,
ksummit-2005-discuss, netdev, bmt
Quoting Stephen Hemminger <shemminger@osdl.org>:
> On Thu, 31 Mar 2005 21:13:39 -0500
> jaganav@us.ibm.com wrote:
>
> > Quoting Roland Dreier <roland@topspin.com>:
> > > I have to admit I don't know much about the TOE / RDMA/TCP / RNIC (or
> > > whatever you want to call it) world. However I know that the large
> > > majority of InfiniBand use right now is running on Linux, and I hope
> > > the Linux community is willing to work with the IB community.
> > >
> >
> > Just want to let everyone know that we have started an opensource
> > effort (www.openrdma.org) for enablement of RNICs (RDMA enabled NICs).
> > This community has now come up with an architecture
> > (http://rdma.sourceforge.net/architecture.pdf) to build this support in
> > Linux. Would really appreciate if you review and provide any comments.
> > We have just started to hack but no code is available on this project yet.
> >
> > Thanks
> > Venkat
>
> OpenRdma is a misnomer, because as I read your architecture you are trying
> to create a "kernel abstraction layer" for closed source vendor RDMA drivers.
> This will never be accepted, please go back to the drawing board and figure
> out how to make real open source drivers.
>
>
First let me say that the purpose of this project is to
make the entire stack (with all of the enablement layers)
including the drivers opensourced.
The kernel abstraction layer will be built
around the standards-based (opengroup.org/icsc) RNIC-PI
interface, which allows the RNIC vendors to opensource
their drivers using that interface. BTW, the RNIC-PI
interface is a work in progress and the first draft
is targeted to be published soon.
Several RNIC adapter vendors, who contribute to the
openRDMA effort, are quite willing to opensource
their drivers through openRDMA project.
BTW, I understood why you got the impression
that this is for closed source vendor drivers:
It is not our intention to allow the kernel verbs
provider code (kVP) to be private; that was
an error. Thanks for pointing this out,
and we'll make this change soon.
Thanks
Venkat
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-02 1:37 ` jaganav
@ 2005-04-02 5:27 ` Greg KH
2005-04-02 6:02 ` Greg KH
2005-04-04 16:50 ` Stephen Hemminger
1 sibling, 1 reply; 23+ messages in thread
From: Greg KH @ 2005-04-02 5:27 UTC (permalink / raw)
To: jaganav
Cc: Stephen Hemminger, Roland Dreier, Benjamin LaHaise,
Dmitry Yusupov, open-iscsi, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev, bmt
On Fri, Apr 01, 2005 at 08:37:13PM -0500, jaganav@us.ibm.com wrote:
>
> Several RNIC adapter vendors, who contribute to the
> openRDMA effort, are quite willing to opensource
> their drivers through openRDMA project.
"Several"? Why not all?
And why the dual license? What good is writing Linux kernel code that
is BSD licensed for such a core component? Didn't you all learn from
the openib licensing mess?
thanks,
greg k-h
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-02 5:27 ` Greg KH
@ 2005-04-02 6:02 ` Greg KH
2005-04-02 15:01 ` Andrea Arcangeli
0 siblings, 1 reply; 23+ messages in thread
From: Greg KH @ 2005-04-02 6:02 UTC (permalink / raw)
To: jaganav
Cc: Stephen Hemminger, Roland Dreier, Benjamin LaHaise,
Dmitry Yusupov, open-iscsi, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev, bmt
On Fri, Apr 01, 2005 at 09:27:38PM -0800, Greg KH wrote:
> On Fri, Apr 01, 2005 at 08:37:13PM -0500, jaganav@us.ibm.com wrote:
> >
> > Several RNIC adapter vendors, who contribute to the
> > openRDMA effort, are quite willing to opensource
> > their drivers through openRDMA project.
>
> "Several"? Why not all?
>
> And why the dual license? What good is writing Linux kernel code that
> is BSD licensed for such a core component? Didn't you all learn from
> the openib licensing mess?
Oh, and for those of you who might not know what mess I am talking
about:
The openib code was set up to be dual GPL and BSD licensed for the
express purpose of taking the openib code and placing it into a closed
source operating system (not any of the *BSDs). Needless to say, this
has prevented me from doing any openib work, and probably the same for a
number of other Linux kernel developers.
If you all wish to duplicate this stupidity, feel free, but do not
expect to get any help from the community...
thanks,
greg k-h
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
@ 2005-04-02 7:29 jaganav
2005-04-02 18:27 ` Matthew Wilcox
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: jaganav @ 2005-04-02 7:29 UTC (permalink / raw)
To: Greg KH
Cc: Stephen Hemminger, Roland Dreier, Benjamin LaHaise,
Dmitry Yusupov, open-iscsi, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev, bmt
Quoting Greg KH <greg@kroah.com>:
> On Fri, Apr 01, 2005 at 09:27:38PM -0800, Greg KH wrote:
> > On Fri, Apr 01, 2005 at 08:37:13PM -0500, jaganav@us.ibm.com wrote:
> > >
> > > Several RNIC adapter vendors, who contribute to the
> > > openRDMA effort, are quite willing to opensource
> > > their drivers through openRDMA project.
> >
> > "Several"? Why not all?
Because I haven't heard from 'all' of them yet that they would opensource.
I am sure every vendor will do so once most of the other vendors are
opensourcing theirs, but I can't speak for them. I have asked in the past and will
continue to ask every vendor to opensource their driver and make it part of
the openRDMA stack.
> >
> > And why the dual license? What good is writing Linux kernel code that
> > is BSD licensed for such a core component? Didn't you all learn from
> > the openib licensing mess?
>
> Oh, and for those of you who might not know what mess I am talking
> about:
>
> The openib code was set up to be dual GPL and BSD licensed for the
> express purpose of taking the openib code and placing it into a closed
> source operating system (not any of the *BSDs). Needless to say, this
> has prevented me from doing any openib work, and probably the same for a
> number of other Linux kernel developers.
>
Absolutely understand the dual-license mess with openIB code. -:)
However the intention of the dual license with OpenRDMA is not to place
the code in closed source OSes but specifically to support BSD*; in fact, the
request was specifically made by most of the adapter vendors, as they wanted to offer
the same on BSD platforms as well.
BTW, unlike OpenIB initial stack (i.e. Gen1) which was already developed when it
got opensourced, the openRDMA code is developed from scratch in true opensource
fashion (of course, OpenIB has also followed this approach for their next
generation stack though) with no ifdef code for BSD*.
If this dual license is a concern to other kernel developers as well from
contributing to OpenRDMA, we would seriously consider this and discuss with the
adapter vendors.
Thanks
Venkat
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-02 6:02 ` Greg KH
@ 2005-04-02 15:01 ` Andrea Arcangeli
0 siblings, 0 replies; 23+ messages in thread
From: Andrea Arcangeli @ 2005-04-02 15:01 UTC (permalink / raw)
To: Greg KH
Cc: jaganav, Stephen Hemminger, Roland Dreier, Benjamin LaHaise,
Dmitry Yusupov, open-iscsi, David S. Miller, mpm, michaelc,
James.Bottomley, ksummit-2005-discuss, netdev, bmt
On Fri, Apr 01, 2005 at 10:02:16PM -0800, Greg KH wrote:
> If you all wish to duplicate this stupidity, feel free, but do not
> expect to get any help from the community...
And just in case: do not expect to be allowed to use stuff like the
rbtree.[ch] which is GPL'd (not LGPL). (ib patches from topspin
originally relicensed rbtree.[ch] under BSD...)
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-02 7:29 jaganav
@ 2005-04-02 18:27 ` Matthew Wilcox
2005-04-03 1:26 ` Grant Grundler
2005-04-05 15:04 ` Rik van Riel
2 siblings, 0 replies; 23+ messages in thread
From: Matthew Wilcox @ 2005-04-02 18:27 UTC (permalink / raw)
To: jaganav
Cc: Greg KH, Stephen Hemminger, Roland Dreier, Benjamin LaHaise,
Dmitry Yusupov, open-iscsi, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev, bmt
On Sat, Apr 02, 2005 at 02:29:51AM -0500, jaganav@us.ibm.com wrote:
> If this dual license is a concern to other kernel developers as well from
> contributing to OpenRDMA, we would seriously consider this and discuss with the
> adapter vendors.
Yes, it's a serious concern. Please release the code under the GPL only.
--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-02 7:29 jaganav
2005-04-02 18:27 ` Matthew Wilcox
@ 2005-04-03 1:26 ` Grant Grundler
2005-04-05 15:04 ` Rik van Riel
2 siblings, 0 replies; 23+ messages in thread
From: Grant Grundler @ 2005-04-03 1:26 UTC (permalink / raw)
To: jaganav
Cc: Greg KH, Stephen Hemminger, Roland Dreier, Benjamin LaHaise,
Dmitry Yusupov, open-iscsi, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev, bmt
On Sat, Apr 02, 2005 at 02:29:51AM -0500, jaganav@us.ibm.com wrote:
> If this dual license is a concern to other kernel developers as well from
> contributing to OpenRDMA, we would seriously consider this and discuss
> with the adapter vendors.
I'm not concerned with it. If *BSD can thrive with its license,
I don't see why it's a problem for linux.
HP is going to pay me to work on the code regardless of the license.
Projects I work on privately happen to be GPL though I'm not religious
about it. If people choose NOT to volunteer time/effort on dual licensed
code, I understand and respect that. There are enough worthy GPL only
projects out there.
I'm speaking for myself and NOT for HP.
grant
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-02 19:07 [Ksummit-2005-discuss] Summary of 2005 Kernel Summit ProposedTopics Asgeir Eiriksson
@ 2005-04-04 0:56 ` Dmitry Yusupov
2005-04-04 6:34 ` Grant Grundler
0 siblings, 1 reply; 23+ messages in thread
From: Dmitry Yusupov @ 2005-04-04 0:56 UTC (permalink / raw)
To: open-iscsi@googlegroups.com
Cc: David S. Miller, mpm, andrea, michaelc, James.Bottomley,
ksummit-2005-discuss, netdev
On Sat, 2005-04-02 at 11:07 -0800, Asgeir Eiriksson wrote:
> Dmitry
> The CPU cycles is only at most half of the story with the other half
> being the memory sub-system BW.
>
> So the validity of your observation depends on the BW we're talking
> about, i.e. if the client is using a fraction of 10Gbps for RDMA (or
> DDP, e.g. iSCSI DDP), yes then that fraction amounts to a fraction of
> the memory sub-system total BW so we don't much care about the extra
> copy.
>
> The situation is different if the client wants something close to 10Gbps
> (already have such client applications), because today 10Gbps is still a
> big chunk of the overall memory BW so you really care about eliminating
> that copy via DDP.
I do not get your concern with memory BW. With good AMD box V40Z(SUN)
you can get 5.3GBytes/sec. Even with 10Gbps full speed you have 80%
left. PCI-X BUS BW is bigger concern...
> 'Asgeir
>
> > -----Original Message-----
> > From: netdev-bounce@oss.sgi.com [mailto:netdev-bounce@oss.sgi.com] On
> > Behalf Of Dmitry Yusupov
> > Sent: Saturday, April 02, 2005 10:09 AM
> > To: open-iscsi@googlegroups.com
> > Cc: David S. Miller; mpm@selenic.com; andrea@suse.de;
> > michaelc@cs.wisc.edu; James.Bottomley@HansenPartnership.com;
> > ksummit-2005-discuss@thunk.org; netdev@oss.sgi.com
> > Subject: Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit
> > ProposedTopics
> >
> > On Mon, 2005-03-28 at 17:32 -0500, Benjamin LaHaise wrote:
> > > On Mon, Mar 28, 2005 at 12:48:56PM -0800, Dmitry Yusupov wrote:
> > > > If you have plans to start new project such as SoftRDMA than yes.
> > > > lets discuss it since set of problems will be similar to what we've
> > > > got with software iSCSI Initiators.
> > >
> > > I'm somewhat interested in seeing a SoftRDMA project get off the
> > > ground. At least the NatSemi 83820 gige MAC is able to provide early-rx
> > > interrupts that allow one to get an rx interrupt before the full payload
> > > has arrived making it possible to write out a new rx descriptor to place
> > > the payload wherever it is ultimately desired. It would be fun to work
> > > on if not the most performant RDMA implementation.
> >
> > I see a lot of skepticism around early-rx interrupt schema. It might
> > work for gige, but i'm not sure if it will fit into 10g.
> >
> > What RDMA gives us is zero-copy on receive and new networking api which
> > has a potential to be HW accelerated. SoftRDMA will never avoid copying
> > on receive. But benefit for SoftRDMA would be its availability on client
> > sides. It is free and it could be easily deployed. Soon Intel & Co will
> > give us 2,4,8... multi-core CPUs for around 200$ :), So, who cares if
> > one of those cores will do receive side copying?
> >
>
>
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-04 0:56 ` Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Dmitry Yusupov
@ 2005-04-04 6:34 ` Grant Grundler
2005-04-04 7:10 ` David S. Miller
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Grant Grundler @ 2005-04-04 6:34 UTC (permalink / raw)
To: Dmitry Yusupov
Cc: open-iscsi@googlegroups.com, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev
On Sun, Apr 03, 2005 at 05:56:11PM -0700, Dmitry Yusupov wrote:
> I do not get your concern with memory BW. With good AMD box V40Z(SUN)
> you can get 5.3GBytes/sec. Even with 10Gbps full speed you have 80%
> left. PCI-X BUS BW is bigger concern...
Yes and No. PCI-X isn't fast enough but the data only crosses
the PCI-X bus once. Think about the data flow:
1) DMA to RAM
2) load into CPU cache
3) store back into RAM
We are down to 40% left...graphics folks won't like you.
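The arithmetic behind these percentages can be sketched in a few lines of Python. This is only a back-of-the-envelope model using the figures quoted in this thread (5.3 GBytes/sec of memory bandwidth, a 10Gbps link at full receive rate); the function name memory_bw_left and the assumption that each of the three steps costs one full pass over the memory bus are illustrative, not measurements.

```python
def memory_bw_left(link_gbps, mem_gbytes_per_s, passes):
    """Fraction of memory bandwidth left after `passes` trips of the
    link's data over the memory bus (simplified model, no overheads)."""
    link_gbytes_per_s = link_gbps / 8.0          # 10 Gbps -> 1.25 GBytes/sec
    used = passes * link_gbytes_per_s / mem_gbytes_per_s
    return 1.0 - used


# One pass: the NIC DMAs the data into RAM and nothing else touches it.
one_pass = memory_bw_left(10, 5.3, passes=1)

# Three passes: DMA to RAM, load into CPU cache, store back into RAM.
three_passes = memory_bw_left(10, 5.3, passes=3)

print(f"one pass:     {one_pass:.0%} left")
print(f"three passes: {three_passes:.0%} left")
```

With one pass roughly three quarters of the memory bandwidth remains, which the thread rounds to 80%. Rounding each pass to 20% gives the 40% quoted above for three passes; with the unrounded 1.25 GBytes/sec per pass the remainder is closer to 30%.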
grant
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-04 6:34 ` Grant Grundler
@ 2005-04-04 7:10 ` David S. Miller
2005-04-04 12:58 ` Ming Zhang
2005-04-04 16:31 ` Grant Grundler
2005-04-04 12:56 ` Ming Zhang
2005-04-04 16:54 ` Dmitry Yusupov
2 siblings, 2 replies; 23+ messages in thread
From: David S. Miller @ 2005-04-04 7:10 UTC (permalink / raw)
To: Grant Grundler
Cc: dmitry_yus, open-iscsi, mpm, andrea, michaelc, James.Bottomley,
ksummit-2005-discuss, netdev
On Mon, 4 Apr 2005 00:34:56 -0600
Grant Grundler <grundler@parisc-linux.org> wrote:
> Yes and No. PCI-X isn't fast enough but the data only crosses
> the PCI-X bus once. Think about the data flow:
> 1) DMA to RAM
> 2) load into CPU cache
> 3) store back into RAM
>
> We are down to 40% left...graphics folks won't like you.
But you're missing the point, which is that the memory system
always catches up to the networking technology.
We'll have that %60 back before you know it when we have
PCI-Z and DDR8 or whatever even in $500.00USD desktop machines.
And those systems will be present by the time we put together
this complicated infrastructure for RDMA.
RDMA is like cache coloring page allocators, it's for yesterday's
technology that we won't be using tomorrow. :-)
Those steps #2 and #3 in your data flow are powerful, it is what
gives us flexibility. And in a general purpose OS that is important.
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-04 6:34 ` Grant Grundler
2005-04-04 7:10 ` David S. Miller
@ 2005-04-04 12:56 ` Ming Zhang
2005-04-04 16:54 ` Dmitry Yusupov
2 siblings, 0 replies; 23+ messages in thread
From: Ming Zhang @ 2005-04-04 12:56 UTC (permalink / raw)
To: open-iscsi
Cc: Dmitry Yusupov, David S. Miller, mpm, andrea, michaelc,
James.Bottomley, ksummit-2005-discuss, netdev
yes, it travels 3 times instead of 1 time. and it is duplex. send traffic
will take another 20%. so total 80% or it can never run that fast.
ming
On Mon, 2005-04-04 at 02:34, Grant Grundler wrote:
> On Sun, Apr 03, 2005 at 05:56:11PM -0700, Dmitry Yusupov wrote:
> > I do not get your concern with memory BW. With good AMD box V40Z(SUN)
> > you can get 5.3GBytes/sec. Even with 10Gbps full speed you have 80%
> > left. PCI-X BUS BW is bigger concern...
>
> Yes and No. PCI-X isn't fast enough but the data only crosses
> the PCI-X bus once. Think about the data flow:
> 1) DMA to RAM
> 2) load into CPU cache
> 3) store back into RAM
>
> We are down to 40% left...graphics folks won't like you.
>
> grant
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-04 7:10 ` David S. Miller
@ 2005-04-04 12:58 ` Ming Zhang
2005-04-04 16:31 ` Grant Grundler
1 sibling, 0 replies; 23+ messages in thread
From: Ming Zhang @ 2005-04-04 12:58 UTC (permalink / raw)
To: open-iscsi
Cc: Grant Grundler, Dmitry Yusupov, mpm, andrea, michaelc,
James.Bottomley, ksummit-2005-discuss, netdev
On Mon, 2005-04-04 at 03:10, David S. Miller wrote:
> On Mon, 4 Apr 2005 00:34:56 -0600
> Grant Grundler <grundler@parisc-linux.org> wrote:
>
> > Yes and No. PCI-X isn't fast enough but the data only crosses
> > the PCI-X bus once. Think about the data flow:
> > 1) DMA to RAM
> > 2) load into CPU cache
> > 3) store back into RAM
> >
> > We are down to 40% left...graphics folks won't like you.
>
> But you're missing the point, which is that the memory system
> always catches up to the networking technology.
>
> We'll have that %60 back before you know it when we have
> PCI-Z and DDR8 or whatever even in $500.00USD desktop machines.
10G is supposed to be deployed in 2005 and 2006. while i did not see
DDR4 come out yet.
>
> And those systems will be present by the time we put together
> this complicated infrastructure for RDMA.
>
> RDMA is like cache coloring page allocators, it's for yesterday's
> technology that we won't be using tomorrow. :-)
>
> Those steps #2 and #3 in your data flow are powerful, it is what
> gives us flexibility. And in a general purpose OS that is important.
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-04 7:10 ` David S. Miller
2005-04-04 12:58 ` Ming Zhang
@ 2005-04-04 16:31 ` Grant Grundler
1 sibling, 0 replies; 23+ messages in thread
From: Grant Grundler @ 2005-04-04 16:31 UTC (permalink / raw)
To: David S. Miller
Cc: dmitry_yus, open-iscsi, mpm, andrea, michaelc, James.Bottomley,
ksummit-2005-discuss, netdev
On Mon, Apr 04, 2005 at 12:10:00AM -0700, David S. Miller wrote:
> On Mon, 4 Apr 2005 00:34:56 -0600
> Grant Grundler <grundler@parisc-linux.org> wrote:
>
> > Yes and No. PCI-X isn't fast enough but the data only crosses
> > the PCI-X bus once. Think about the data flow:
> > 1) DMA to RAM
> > 2) load into CPU cache
> > 3) store back into RAM
> >
> > We are down to 40% left...graphics folks won't like you.
>
> But you're missing the point, which is that the memory system
> always catches up to the networking technology.
No. Bus bandwidth catches up to "a" networking technology - not
the "current" technology.
Networking and graphics are usually starving for bus bandwidth.
> We'll have that %60 back before you know it when we have
> PCI-Z and DDR8 or whatever even in $500.00USD desktop machines.
Yes, I agree. That's certainly how it went for 100bt and gige.
Even laptops come with gige now. But we aren't in that part
"of the curve" for IB or 10GigE *yet*.
> And those systems will be present by the time we put together
> this complicated infrastructure for RDMA.
And that will be fine for "general use".
> RDMA is like cache coloring page allocators, it's for yesterday's
> technology that we won't be using tomorrow. :-)
>
> Those steps #2 and #3 in your data flow are powerful, it is what
> gives us flexibility.
Agreed - some very cool things have been done with it.
And for general use, it'll perform sufficiently well over gige.
In the future, I agree IB or 10gigE will too.
> And in a general purpose OS that is important.
I think most of the people interested in IB and 10GigE aren't looking
for "general use". They have a particular application in mind
and they want to maximize performance for dollar spent.
Things like "science appliance", "router", "data warehouse" come to mind.
"General Use" will be a reality only when the dollar cost comes down
so those new technologies can compete with gige.
thanks,
grant
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-02 1:37 ` jaganav
2005-04-02 5:27 ` Greg KH
@ 2005-04-04 16:50 ` Stephen Hemminger
1 sibling, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2005-04-04 16:50 UTC (permalink / raw)
To: jaganav
Cc: Roland Dreier, Benjamin LaHaise, Dmitry Yusupov, open-iscsi,
David S. Miller, mpm, andrea, michaelc, James.Bottomley,
ksummit-2005-discuss, netdev, bmt
On Fri, 1 Apr 2005 20:37:13 -0500
jaganav@us.ibm.com wrote:
> Quoting Stephen Hemminger <shemminger@osdl.org>:
>
> > On Thu, 31 Mar 2005 21:13:39 -0500
> > jaganav@us.ibm.com wrote:
> >
> > > Quoting Roland Dreier <roland@topspin.com>:
> > > > I have to admit I don't know much about the TOE / RDMA/TCP / RNIC (or
> > > > whatever you want to call it) world. However I know that the large
> > > > majority of InfiniBand use right now is running on Linux, and I hope
> > > > the Linux community is willing to work with the IB community.
> > > >
> > >
> > > Just want to let everyone know that we have started an opensource
> > > effort (www.openrdma.org) for enablement of RNICs (RDMA enabled NICs).
> > > This community has now come up with an architecture
> > > (http://rdma.sourceforge.net/architecture.pdf) to build this support in
> > > Linux. Would really appreciate if you review and provide any comments.
> > > We have just started to hack but no code is available on this project yet.
> > >
> > > Thanks
> > > Venkat
> >
> > OpenRdma is a misnomer, because as I read your architecture you are trying
> > to create a "kernel abstraction layer" for closed source vendor RDMA drivers.
> > This will never be accepted, please go back to the drawing board and figure
> > out how to make real open source drivers.
> >
> >
>
> First let me say that the purpose of this project is to
> make the entire stack (with all of the enablement layers)
> including the drivers opensourced.
How about putting out code early that implements a subset of
what you want (and not waiting till you think it is done).
> The kernel abstraction layer will be built
> around standards based (opengroup.org/icsc) RNIC-PI
> interface and which allows the RNIC vendors to opensource
> their drivers using that interface. BTW, RNIC-PI
> interface is work-in-progress and the first draft
> is targeted to be published soon.
But standards based abstraction layer is sure to be limited to least common
denominator. The locking model and setup/teardown are sure to be different
under each OS.
Also, it is impossible to build any decent abstraction top-down in a "waterfall"
model. You need to have an iterative process that refines and is willing to have
compatibility restrictions. Also, the kernel community hates interfaces and code
where there is a big *don't go here* box that prevents fixing bugs and improving
interfaces. Linux puts a big emphasis on long-term maintainability of code.
Another issue that concerns me is making sure that all the security and
policy are maintained when doing RDMA. How do you do firewalling and
routing when you are allowing adapter to control the world?
> Several RNIC adapter vendors, who contribute to the
> openRDMA effort, are quite willing to opensource
> their drivers through openRDMA project.
>
> BTW, I understood why you got the impression
> that this is for closed source vendor drivers:
> Our intention is not to allow the kernel verbs
> provider code (kVP) to be private and that was
> an error. Thanks for pointing this out
> but we'll make this change soon.
You are attacking a hard problem. Thanks for the effort.
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-04 6:34 ` Grant Grundler
2005-04-04 7:10 ` David S. Miller
2005-04-04 12:56 ` Ming Zhang
@ 2005-04-04 16:54 ` Dmitry Yusupov
2005-04-04 19:11 ` Grant Grundler
2 siblings, 1 reply; 23+ messages in thread
From: Dmitry Yusupov @ 2005-04-04 16:54 UTC (permalink / raw)
To: Grant Grundler
Cc: open-iscsi@googlegroups.com, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev
On Mon, 2005-04-04 at 00:34 -0600, Grant Grundler wrote:
> On Sun, Apr 03, 2005 at 05:56:11PM -0700, Dmitry Yusupov wrote:
> > I do not get your concern with memory BW. With good AMD box V40Z(SUN)
> > you can get 5.3GBytes/sec. Even with 10Gbps full speed you have 80%
> > left. PCI-X BUS BW is bigger concern...
>
> Yes and No. PCI-X isn't fast enough but the data only crosses
> the PCI-X bus once. Think about the data flow:
> 1) DMA to RAM
yes.
> 2) load into CPU cache
yes.
> 3) store back into RAM
no. we are talking about receive side optimization only.
why do you think store back into RAM comes to the picture?
also keep in mind that we have huge L2 & L3 caches today and write
operation is usually very well buffered.
> We are down to 40% left...graphics folks won't like you.
>
> grant
>
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-04 16:54 ` Dmitry Yusupov
@ 2005-04-04 19:11 ` Grant Grundler
0 siblings, 0 replies; 23+ messages in thread
From: Grant Grundler @ 2005-04-04 19:11 UTC (permalink / raw)
To: Dmitry Yusupov
Cc: open-iscsi@googlegroups.com, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev
On Mon, Apr 04, 2005 at 09:54:10AM -0700, Dmitry Yusupov wrote:
> > 3) store back into RAM
>
> no. we are talking about receive side optimization only.
> why do you think store back into RAM comes to the picture?
Application eventually wants to read the data.
> also keep in mind that we have huge L2 & L3 caches today and write
> operation is usually very well buffered.
Agreed. But how effective the cache is will depend on if the CPU
(application) can process the data as fast as it arrives (and still
be in the cache). Otherwise the data will get pushed out in (3)
and recalled later when the app can consume it (4th time across).
It also assumes the application is running on a CPU core that shares
the cache with the CPU that did the copy. If the CPU is saturated
with the copy (ok, assume we've got 2 Cores per socket), then the
other CPU has to be *assigned* manually to make sure it does the
other part.
Jamal learned all this when he moved to a dual core PPC for
his fast routing work. Jamal, did that ever make it into
a paper?
grant
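
The percentages traded in this subthread can be checked with a little arithmetic. The sketch below is not from any of the posters. It assumes decimal units (10 Gbit/s = 1.25 GByte/s), takes the 5.3 GBytes/sec memory bandwidth quoted for the V40Z at face value, and assumes every crossing of the memory bus costs one full line rate of bandwidth. The function name `fraction_left` is made up for illustration.

```python
def fraction_left(mem_bw_gbytes, link_gbits, crossings):
    """Fraction of memory bandwidth still free after line-rate data
    crosses the memory bus `crossings` times (simplified model)."""
    link_gbytes = link_gbits / 8.0               # Gbit/s -> GByte/s
    used = crossings * link_gbytes / mem_bw_gbytes
    return 1.0 - used

MEM_BW_GBYTES = 5.3   # figure quoted in the thread for the V40Z
LINK_GBITS = 10.0     # 10GigE at full speed

# Crossings as counted in the thread:
#   1 = DMA to RAM only
#   3 = DMA to RAM, load into CPU cache, store back into RAM
#   4 = as above, plus the application reading the data back later
for crossings in (1, 3, 4):
    left = fraction_left(MEM_BW_GBYTES, LINK_GBITS, crossings)
    print(f"{crossings} crossing(s): {left:.1%} of memory bandwidth left")
```

Under these assumptions one crossing leaves roughly 76% (the "80% left" above), three crossings leave roughly 29% (the "40% left" figure follows if each crossing is rounded down to 20%), and a fourth crossing leaves about 6%. Real numbers will differ with cache hits, write buffering and protocol overhead, which is the point being argued.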
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-02 7:29 jaganav
2005-04-02 18:27 ` Matthew Wilcox
2005-04-03 1:26 ` Grant Grundler
@ 2005-04-05 15:04 ` Rik van Riel
2 siblings, 0 replies; 23+ messages in thread
From: Rik van Riel @ 2005-04-05 15:04 UTC (permalink / raw)
To: jaganav
Cc: Greg KH, Stephen Hemminger, Roland Dreier, Benjamin LaHaise,
Dmitry Yusupov, open-iscsi, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev, bmt
On Sat, 2 Apr 2005 jaganav@us.ibm.com wrote:
> If this dual license is a concern to other kernel developers as well
> from contributing to OpenRDMA, we would seriously consider this and
> discuss with the adapter vendors.
It could be a problem when trying to reuse existing
GPL code, eg. to hook into locking mechanisms. It
could also be a problem if you touch data structures
that are protected by RCU.
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
@ 2005-04-05 22:19 jaganav
0 siblings, 0 replies; 23+ messages in thread
From: jaganav @ 2005-04-05 22:19 UTC (permalink / raw)
To: Rik van Riel
Cc: Greg KH, Stephen Hemminger, Roland Dreier, Benjamin LaHaise,
Dmitry Yusupov, open-iscsi, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev, bmt
Quoting Rik van Riel <riel@redhat.com>:
> On Sat, 2 Apr 2005 jaganav@us.ibm.com wrote:
>
> > If this dual license is a concern to other kernel developers as well
> > from contributing to OpenRDMA, we would seriously consider this and
> > discuss with the adapter vendors.
>
> It could be a problem when trying to reuse existing
> GPL code, eg. to hook into locking mechanisms. It
> could also be a problem if you touch data structures
> that are protected by RCU.
Right, this could be one of the significant porting issues
but it may become a real concern. Thanks.
Thanks
Venkat
end of thread, other threads:[~2005-04-05 22:19 UTC | newest]
Thread overview: 23+ messages
2005-04-05 22:19 Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) jaganav
-- strict thread matches above, loose matches on Subject: below --
2005-04-02 19:07 [Ksummit-2005-discuss] Summary of 2005 Kernel Summit ProposedTopics Asgeir Eiriksson
2005-04-04 0:56 ` Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Dmitry Yusupov
2005-04-04 6:34 ` Grant Grundler
2005-04-04 7:10 ` David S. Miller
2005-04-04 12:58 ` Ming Zhang
2005-04-04 16:31 ` Grant Grundler
2005-04-04 12:56 ` Ming Zhang
2005-04-04 16:54 ` Dmitry Yusupov
2005-04-04 19:11 ` Grant Grundler
2005-04-02 7:29 jaganav
2005-04-02 18:27 ` Matthew Wilcox
2005-04-03 1:26 ` Grant Grundler
2005-04-05 15:04 ` Rik van Riel
2005-04-01 2:13 jaganav
2005-04-01 23:43 ` Stephen Hemminger
2005-04-02 1:37 ` jaganav
2005-04-02 5:27 ` Greg KH
2005-04-02 6:02 ` Greg KH
2005-04-02 15:01 ` Andrea Arcangeli
2005-04-04 16:50 ` Stephen Hemminger
[not found] <4241D106.8050302@cs.wisc.edu>
[not found] ` <20050324101622S.fujita.tomonori@lab.ntt.co.jp>
[not found] ` <1111628393.1548.307.camel@beastie>
[not found] ` <20050324113312W.fujita.tomonori@lab.ntt.co.jp>
[not found] ` <1111633846.1548.318.camel@beastie>
[not found] ` <20050324215922.GT14202@opteron.random>
[not found] ` <424346FE.20704@cs.wisc.edu>
[not found] ` <20050324233921.GZ14202@opteron.random>
[not found] ` <20050325034341.GV32638@waste.org>
[not found] ` <20050327035149.GD4053@g5.random>
2005-03-27 5:48 ` [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics Matt Mackall
2005-03-27 6:33 ` Dmitry Yusupov
2005-03-27 6:46 ` David S. Miller
2005-03-28 19:45 ` Roland Dreier
[not found] ` <1112042936.5088.22.camel@beastie>
2005-03-28 22:32 ` Benjamin LaHaise
2005-03-29 3:19 ` Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Roland Dreier
2005-03-30 16:00 ` Benjamin LaHaise
2005-03-29 3:14 ` Roland Dreier