* RE: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit ProposedTopics
@ 2005-04-02 19:07 Asgeir Eiriksson
2005-04-02 19:14 ` Ming Zhang
2005-04-04 0:56 ` Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Dmitry Yusupov
0 siblings, 2 replies; 11+ messages in thread
From: Asgeir Eiriksson @ 2005-04-02 19:07 UTC (permalink / raw)
To: Dmitry Yusupov, open-iscsi
Cc: David S. Miller, mpm, andrea, michaelc, James.Bottomley,
ksummit-2005-discuss, netdev
Dmitry
The CPU cycles are only at most half of the story, with the other half
being the memory sub-system BW.
So the validity of your observation depends on the BW we're talking
about, i.e. if the client is only using a fraction of 10Gbps for RDMA
(or DDP, e.g. iSCSI DDP), then that fraction amounts to a fraction of
the memory sub-system's total BW, so we don't much care about the extra
copy.
The situation is different if the client wants something close to 10Gbps
(we already have such client applications), because today 10Gbps is still
a big chunk of the overall memory BW, so you really care about eliminating
that copy via DDP.
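To put rough numbers on that, here is a minimal sketch; the 6.4 GB/s
memory figure is an assumed example, and the extra copy is modelled as
one read plus one write of every payload byte:

#include <stdio.h>

/* Cost of the receive-side copy as a share of memory BW; both the
 * memory BW figure and the two-crossings model are assumptions. */
static void show(double link_gbps, double mem_gbytes)
{
        double payload = link_gbps / 8.0;   /* GB/s of payload on the wire */
        double copy_bw = 2.0 * payload;     /* CPU reads + writes the data */

        printf("%4.1f Gbps: copy costs %4.1f%% of memory BW\n",
               link_gbps, 100.0 * copy_bw / mem_gbytes);
}

int main(void)
{
        double mem = 6.4;    /* assumed memory sub-system BW, GB/s   */

        show(1.0, mem);      /* client using a fraction of the link  */
        show(10.0, mem);     /* client pushing the full 10Gbps       */
        return 0;
}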
'Asgeir
> -----Original Message-----
> From: netdev-bounce@oss.sgi.com [mailto:netdev-bounce@oss.sgi.com] On
> Behalf Of Dmitry Yusupov
> Sent: Saturday, April 02, 2005 10:09 AM
> To: open-iscsi@googlegroups.com
> Cc: David S. Miller; mpm@selenic.com; andrea@suse.de;
> michaelc@cs.wisc.edu; James.Bottomley@HansenPartnership.com;
> ksummit-2005-discuss@thunk.org; netdev@oss.sgi.com
> Subject: Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit
> ProposedTopics
>
> On Mon, 2005-03-28 at 17:32 -0500, Benjamin LaHaise wrote:
> > On Mon, Mar 28, 2005 at 12:48:56PM -0800, Dmitry Yusupov wrote:
> > > If you have plans to start new project such as SoftRDMA than yes. lets
> > > discuss it since set of problems will be similar to what we've got with
> > > software iSCSI Initiators.
> >
> > I'm somewhat interested in seeing a SoftRDMA project get off the ground.
> > At least the NatSemi 83820 gige MAC is able to provide early-rx interrupts
> > that allow one to get an rx interrupt before the full payload has arrived
> > making it possible to write out a new rx descriptor to place the payload
> > wherever it is ultimately desired. It would be fun to work on if not the
> > most performant RDMA implementation.
>
> I see a lot of skepticism around early-rx interrupt schema. It might
> work for gige, but i'm not sure if it will fit into 10g.
>
> What RDMA gives us is zero-copy on receive and new networking api which
> has a potential to be HW accelerated. SoftRDMA will never avoid copying
> on receive. But benefit for SoftRDMA would be its availability on client
> sides. It is free and it could be easily deployed. Soon Intel & Co will
> give us 2,4,8... multi-core CPUs for around 200$ :), So, who cares if
> one of those cores will do receive side copying?
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit ProposedTopics
2005-04-02 19:07 [Ksummit-2005-discuss] Summary of 2005 Kernel Summit ProposedTopics Asgeir Eiriksson
@ 2005-04-02 19:14 ` Ming Zhang
2005-04-04 0:56 ` Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Dmitry Yusupov
1 sibling, 0 replies; 11+ messages in thread
From: Ming Zhang @ 2005-04-02 19:14 UTC (permalink / raw)
To: open-iscsi
Cc: Dmitry Yusupov, David S. Miller, mpm, andrea, michaelc,
James.Bottomley, ksummit-2005-discuss, netdev
Yes, thanks for explaining this in more detail.
Copy avoidance is one main goal of RDMA; the BW gap is the bottleneck.
ming
On Sat, 2005-04-02 at 14:07, Asgeir Eiriksson wrote:
> Dmitry
>
> The CPU cycles is only at most half of the story with the other half
> being the memory sub-system BW.
>
> So the validity of your observation depends on the BW we're talking
> about, i.e. if the client is using a fraction of 10Gbps for RDMA (or
> DDP, e.g. iSCSI DDP), yes then that fraction amounts to a fraction of
> the memory sub-system total BW so we don't much care about the extra
> copy.
>
> The situation is different if the client wants something close to 10Gbps
> (already have such client applications), because today 10Gbps is still a
> big chunk of the overall memory BW so you really care about eliminating
> that copy via DDP.
>
> 'Asgeir
>
> > -----Original Message-----
> > From: netdev-bounce@oss.sgi.com [mailto:netdev-bounce@oss.sgi.com] On
> > Behalf Of Dmitry Yusupov
> > Sent: Saturday, April 02, 2005 10:09 AM
> > To: open-iscsi@googlegroups.com
> > Cc: David S. Miller; mpm@selenic.com; andrea@suse.de;
> > michaelc@cs.wisc.edu; James.Bottomley@HansenPartnership.com;
> > ksummit-2005-discuss@thunk.org; netdev@oss.sgi.com
> > Subject: Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit
> > ProposedTopics
> >
> > On Mon, 2005-03-28 at 17:32 -0500, Benjamin LaHaise wrote:
> > > On Mon, Mar 28, 2005 at 12:48:56PM -0800, Dmitry Yusupov wrote:
> > > > If you have plans to start new project such as SoftRDMA than yes. lets
> > > > discuss it since set of problems will be similar to what we've got with
> > > > software iSCSI Initiators.
> > >
> > > I'm somewhat interested in seeing a SoftRDMA project get off the ground.
> > > At least the NatSemi 83820 gige MAC is able to provide early-rx interrupts
> > > that allow one to get an rx interrupt before the full payload has arrived
> > > making it possible to write out a new rx descriptor to place the payload
> > > wherever it is ultimately desired. It would be fun to work on if not the
> > > most performant RDMA implementation.
> >
> > I see a lot of skepticism around early-rx interrupt schema. It might
> > work for gige, but i'm not sure if it will fit into 10g.
> >
> > What RDMA gives us is zero-copy on receive and new networking api which
> > has a potential to be HW accelerated. SoftRDMA will never avoid copying
> > on receive. But benefit for SoftRDMA would be its availability on client
> > sides. It is free and it could be easily deployed. Soon Intel & Co will
> > give us 2,4,8... multi-core CPUs for around 200$ :), So, who cares if
> > one of those cores will do receive side copying?
> >
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-02 19:07 [Ksummit-2005-discuss] Summary of 2005 Kernel Summit ProposedTopics Asgeir Eiriksson
2005-04-02 19:14 ` Ming Zhang
@ 2005-04-04 0:56 ` Dmitry Yusupov
2005-04-04 6:34 ` Grant Grundler
1 sibling, 1 reply; 11+ messages in thread
From: Dmitry Yusupov @ 2005-04-04 0:56 UTC (permalink / raw)
To: open-iscsi@googlegroups.com
Cc: David S. Miller, mpm, andrea, michaelc, James.Bottomley,
ksummit-2005-discuss, netdev
On Sat, 2005-04-02 at 11:07 -0800, Asgeir Eiriksson wrote:
> Dmitry
> The CPU cycles is only at most half of the story with the other half
> being the memory sub-system BW.
>
> So the validity of your observation depends on the BW we're talking
> about, i.e. if the client is using a fraction of 10Gbps for RDMA (or
> DDP, e.g. iSCSI DDP), yes then that fraction amounts to a fraction of
> the memory sub-system total BW so we don't much care about the extra
> copy.
>
> The situation is different if the client wants something close to 10Gbps
> (already have such client applications), because today 10Gbps is still a
> big chunk of the overall memory BW so you really care about eliminating
> that copy via DDP.
I do not get your concern with memory BW. With a good AMD box like the
Sun V40z you can get 5.3 GBytes/sec. Even at full 10Gbps speed you have
80% left. PCI-X bus BW is the bigger concern...
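For reference, a quick sketch of the two busses being compared here; the
PCI-X figure is an assumed 64-bit/133MHz peak, and the memory figure is
the one quoted above:

#include <stdio.h>

/* Compare the 10Gbps payload rate against the two busses it has to
 * cross; both bus figures are peak/assumed numbers, not measurements. */
int main(void)
{
        double wire   = 10.0 / 8.0;   /* 10Gbps of payload ~ 1.25 GB/s      */
        double pcix   = 1.06;         /* assumed PCI-X 64-bit @ 133MHz peak */
        double memory = 5.3;          /* memory BW quoted for the V40z      */

        printf("wire vs PCI-X:  %.0f%%\n", 100.0 * wire / pcix);
        printf("wire vs memory: %.0f%%\n", 100.0 * wire / memory);
        return 0;
}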
> 'Asgeir
>
> > -----Original Message-----
> > From: netdev-bounce@oss.sgi.com [mailto:netdev-bounce@oss.sgi.com] On
> > Behalf Of Dmitry Yusupov
> > Sent: Saturday, April 02, 2005 10:09 AM
> > To: open-iscsi@googlegroups.com
> > Cc: David S. Miller; mpm@selenic.com; andrea@suse.de;
> > michaelc@cs.wisc.edu; James.Bottomley@HansenPartnership.com;
> > ksummit-2005-discuss@thunk.org; netdev@oss.sgi.com
> > Subject: Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit
> > ProposedTopics
> >
> > On Mon, 2005-03-28 at 17:32 -0500, Benjamin LaHaise wrote:
> > > On Mon, Mar 28, 2005 at 12:48:56PM -0800, Dmitry Yusupov wrote:
> > > > If you have plans to start new project such as SoftRDMA than yes. lets
> > > > discuss it since set of problems will be similar to what we've got with
> > > > software iSCSI Initiators.
> > >
> > > I'm somewhat interested in seeing a SoftRDMA project get off the ground.
> > > At least the NatSemi 83820 gige MAC is able to provide early-rx interrupts
> > > that allow one to get an rx interrupt before the full payload has arrived
> > > making it possible to write out a new rx descriptor to place the payload
> > > wherever it is ultimately desired. It would be fun to work on if not the
> > > most performant RDMA implementation.
> >
> > I see a lot of skepticism around early-rx interrupt schema. It might
> > work for gige, but i'm not sure if it will fit into 10g.
> >
> > What RDMA gives us is zero-copy on receive and new networking api which
> > has a potential to be HW accelerated. SoftRDMA will never avoid copying
> > on receive. But benefit for SoftRDMA would be its availability on client
> > sides. It is free and it could be easily deployed. Soon Intel & Co will
> > give us 2,4,8... multi-core CPUs for around 200$ :), So, who cares if
> > one of those cores will do receive side copying?
> >
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-04 0:56 ` Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics) Dmitry Yusupov
@ 2005-04-04 6:34 ` Grant Grundler
2005-04-04 7:10 ` David S. Miller
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Grant Grundler @ 2005-04-04 6:34 UTC (permalink / raw)
To: Dmitry Yusupov
Cc: open-iscsi@googlegroups.com, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev
On Sun, Apr 03, 2005 at 05:56:11PM -0700, Dmitry Yusupov wrote:
> I do not get your concern with memory BW. With good AMD box V40Z(SUN)
> you can get 5.3GBytes/sec. Even with 10Gbps full speed you have 80%
> left. PCI-X BUS BW is bigger concern...
Yes and No. PCI-X isn't fast enough but the data only crosses
the PCI-X bus once. Think about the data flow:
1) DMA to RAM
2) load into CPU cache
3) store back into RAM
We are down to 40% left...graphics folks won't like you.
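One way to arrive at that 40% figure, assuming the inbound stream is
capped at PCI-X's ~1.06 GB/s peak and that every byte crosses the memory
system three times (both assumptions for illustration):

#include <stdio.h>

/* Tally the three memory crossings (DMA in, CPU read, CPU write)
 * against the 5.3 GB/s memory BW quoted earlier; the ~1.06 GB/s
 * inbound rate is an assumed PCI-X-limited feed. */
int main(void)
{
        double inbound = 1.06;            /* GB/s coming off the PCI-X bus */
        double memory  = 5.3;             /* GB/s of memory sub-system BW  */
        double used    = 3.0 * inbound;   /* three crossings per byte      */

        printf("memory BW used: %.0f%%, left: %.0f%%\n",
               100.0 * used / memory, 100.0 * (1.0 - used / memory));
        return 0;
}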
grant
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-04 6:34 ` Grant Grundler
@ 2005-04-04 7:10 ` David S. Miller
2005-04-04 12:58 ` Ming Zhang
2005-04-04 16:31 ` Grant Grundler
2005-04-04 12:56 ` Ming Zhang
2005-04-04 16:54 ` Dmitry Yusupov
2 siblings, 2 replies; 11+ messages in thread
From: David S. Miller @ 2005-04-04 7:10 UTC (permalink / raw)
To: Grant Grundler
Cc: dmitry_yus, open-iscsi, mpm, andrea, michaelc, James.Bottomley,
ksummit-2005-discuss, netdev
On Mon, 4 Apr 2005 00:34:56 -0600
Grant Grundler <grundler@parisc-linux.org> wrote:
> Yes and No. PCI-X isn't fast enough but the data only crosses
> the PCI-X bus once. Think about the data flow:
> 1) DMA to RAM
> 2) load into CPU cache
> 3) store back into RAM
>
> We are down to 40% left...graphics folks won't like you.
But you're missing the point, which is that the memory system
always catches up to the networking technology.
We'll have that 60% back before you know it when we have
PCI-Z and DDR8 or whatever even in $500 USD desktop machines.
And those systems will be present by the time we put together
this complicated infrastructure for RDMA.
RDMA is like cache coloring page allocators, it's for yesterday's
technology that we won't be using tomorrow. :-)
Those steps #2 and #3 in your data flow are powerful, it is what
gives us flexibility. And in a general purpose OS that is important.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-04 7:10 ` David S. Miller
@ 2005-04-04 12:58 ` Ming Zhang
2005-04-04 16:31 ` Grant Grundler
1 sibling, 0 replies; 11+ messages in thread
From: Ming Zhang @ 2005-04-04 12:58 UTC (permalink / raw)
To: open-iscsi
Cc: Grant Grundler, Dmitry Yusupov, mpm, andrea, michaelc,
James.Bottomley, ksummit-2005-discuss, netdev
On Mon, 2005-04-04 at 03:10, David S. Miller wrote:
> On Mon, 4 Apr 2005 00:34:56 -0600
> Grant Grundler <grundler@parisc-linux.org> wrote:
>
> > Yes and No. PCI-X isn't fast enough but the data only crosses
> > the PCI-X bus once. Think about the data flow:
> > 1) DMA to RAM
> > 2) load into CPU cache
> > 3) store back into RAM
> >
> > We are down to 40% left...graphics folks won't like you.
>
> But you're missing the point, which is that the memory system
> always catches up to the networking technology.
>
> We'll have that 60% back before you know it when we have
> PCI-Z and DDR8 or whatever even in $500 USD desktop machines.
10G is supposed to be deployed in 2005 and 2006, while I have not seen
DDR4 come out yet.
>
> And those systems will be present by the time we put together
> this complicated infrastructure for RDMA.
>
> RDMA is like cache coloring page allocators, it's for yesterday's
> technology that we won't be using tomorrow. :-)
>
> Those steps #2 and #3 in your data flow are powerful, it is what
> gives us flexibility. And in a general purpose OS that is important.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-04 7:10 ` David S. Miller
2005-04-04 12:58 ` Ming Zhang
@ 2005-04-04 16:31 ` Grant Grundler
1 sibling, 0 replies; 11+ messages in thread
From: Grant Grundler @ 2005-04-04 16:31 UTC (permalink / raw)
To: David S. Miller
Cc: dmitry_yus, open-iscsi, mpm, andrea, michaelc, James.Bottomley,
ksummit-2005-discuss, netdev
On Mon, Apr 04, 2005 at 12:10:00AM -0700, David S. Miller wrote:
> On Mon, 4 Apr 2005 00:34:56 -0600
> Grant Grundler <grundler@parisc-linux.org> wrote:
>
> > Yes and No. PCI-X isn't fast enough but the data only crosses
> > the PCI-X bus once. Think about the data flow:
> > 1) DMA to RAM
> > 2) load into CPU cache
> > 3) store back into RAM
> >
> > We are down to 40% left...graphics folks won't like you.
>
> But you're missing the point, which is that the memory system
> always catches up to the networking technology.
No. Bus bandwidth catches up to "a" networking technology - not
the "current" technology.
Networking and graphics are usually starving for bus bandwidth.
> We'll have that 60% back before you know it when we have
> PCI-Z and DDR8 or whatever even in $500 USD desktop machines.
Yes, I agree. That's certainly how it went for 100bt and gige.
Even laptops come with gige now. But we aren't in that part
"of the curve" for IB or 10GigE *yet*.
> And those systems will be present by the time we put together
> this complicated infrastructure for RDMA.
And that will be fine for "general use".
> RDMA is like cache coloring page allocators, it's for yesterday's
> technology that we won't be using tomorrow. :-)
>
> Those steps #2 and #3 in your data flow are powerful, it is what
> gives us flexibility.
Agreed - some very cool things have been done with it.
And for general use, it'll perform sufficiently well over gige.
In the future, I agree IB or 10gigE will too.
> And in a general purpose OS that is important.
I think most of the people interested in IB and 10GigE aren't looking
for "general use". They have a particular application in mind
and they want to maximize performance for dollar spent.
Things like "science appliance", "router", "data warehouse" come to mind.
"General Use" will be a reality only when the dollar cost comes down
so those new technologies can compete with gige.
thanks,
grant
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-04 6:34 ` Grant Grundler
2005-04-04 7:10 ` David S. Miller
@ 2005-04-04 12:56 ` Ming Zhang
2005-04-04 16:54 ` Dmitry Yusupov
2 siblings, 0 replies; 11+ messages in thread
From: Ming Zhang @ 2005-04-04 12:56 UTC (permalink / raw)
To: open-iscsi
Cc: Dmitry Yusupov, David S. Miller, mpm, andrea, michaelc,
James.Bottomley, ksummit-2005-discuss, netdev
Yes, it travels 3 times instead of 1 time. And it is duplex; send traffic
will take another 20%, so 80% in total, or it can never run that fast.
ming
On Mon, 2005-04-04 at 02:34, Grant Grundler wrote:
> On Sun, Apr 03, 2005 at 05:56:11PM -0700, Dmitry Yusupov wrote:
> > I do not get your concern with memory BW. With good AMD box V40Z(SUN)
> > you can get 5.3GBytes/sec. Even with 10Gbps full speed you have 80%
> > left. PCI-X BUS BW is bigger concern...
>
> Yes and No. PCI-X isn't fast enough but the data only crosses
> the PCI-X bus once. Think about the data flow:
> 1) DMA to RAM
> 2) load into CPU cache
> 3) store back into RAM
>
> We are down to 40% left...graphics folks won't like you.
>
> grant
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-04 6:34 ` Grant Grundler
2005-04-04 7:10 ` David S. Miller
2005-04-04 12:56 ` Ming Zhang
@ 2005-04-04 16:54 ` Dmitry Yusupov
2005-04-04 19:11 ` Grant Grundler
2 siblings, 1 reply; 11+ messages in thread
From: Dmitry Yusupov @ 2005-04-04 16:54 UTC (permalink / raw)
To: Grant Grundler
Cc: open-iscsi@googlegroups.com, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev
On Mon, 2005-04-04 at 00:34 -0600, Grant Grundler wrote:
> On Sun, Apr 03, 2005 at 05:56:11PM -0700, Dmitry Yusupov wrote:
> > I do not get your concern with memory BW. With good AMD box V40Z(SUN)
> > you can get 5.3GBytes/sec. Even with 10Gbps full speed you have 80%
> > left. PCI-X BUS BW is bigger concern...
>
> Yes and No. PCI-X isn't fast enough but the data only crosses
> the PCI-X bus once. Think about the data flow:
> 1) DMA to RAM
yes.
> 2) load into CPU cache
yes.
> 3) store back into RAM
No. We are talking about receive side optimization only.
Why do you think store back into RAM comes into the picture?
Also keep in mind that we have huge L2 & L3 caches today, and write
operations are usually very well buffered.
> We are down to 40% left...graphics folks won't like you.
>
> grant
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Linux support for RDMA (was: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit Proposed Topics)
2005-04-04 16:54 ` Dmitry Yusupov
@ 2005-04-04 19:11 ` Grant Grundler
0 siblings, 0 replies; 11+ messages in thread
From: Grant Grundler @ 2005-04-04 19:11 UTC (permalink / raw)
To: Dmitry Yusupov
Cc: open-iscsi@googlegroups.com, David S. Miller, mpm, andrea,
michaelc, James.Bottomley, ksummit-2005-discuss, netdev
On Mon, Apr 04, 2005 at 09:54:10AM -0700, Dmitry Yusupov wrote:
> > 3) store back into RAM
>
> no. we are talking about receive side optimization only.
> why do you think store back into RAM comes to the picture?
Application eventually wants to read the data.
> also keep in mind that we have huge L2 & L3 caches today and write
> operation is usually very well buffered.
Agreed. But how effective the cache is will depend on whether the CPU
(application) can process the data as fast as it arrives (while it is
still in the cache). Otherwise the data will get pushed out in (3)
and recalled later when the app can consume it (a 4th time across).
It also assumes the application is running on a CPU core that shares
the cache with the CPU that did the copy. If the CPU is saturated
with the copy (ok, assume we've got 2 Cores per socket), then the
other CPU has to be *assigned* manually to make sure it does the
other part.
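A minimal user-space sketch of that manual assignment, assuming the
cache-sharing sibling of the copying CPU is known up front (core 1 below
is just a placeholder):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Pin the consuming process onto a core assumed to share a cache with
 * the CPU doing the receive-side copies; picking the right core number
 * still requires knowing the machine's topology. */
int main(void)
{
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(1, &mask);      /* assumed cache-sharing sibling core */

        if (sched_setaffinity(0, sizeof(mask), &mask) < 0) {
                perror("sched_setaffinity");
                return 1;
        }
        /* ... consume the received data from here on ... */
        return 0;
}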
Jamal learned all this when he moved to a dual core PPC for
his fast routing work. Jamal, did that ever make it into
a paper?
grant
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit ProposedTopics
@ 2005-03-29 0:44 Asgeir Eiriksson
0 siblings, 0 replies; 11+ messages in thread
From: Asgeir Eiriksson @ 2005-03-29 0:44 UTC (permalink / raw)
To: Dmitry Yusupov, open-iscsi
Cc: David S. Miller, mpm, andrea, michaelc, James.Bottomley,
ksummit-2005-discuss, netdev
> -----Original Message-----
> From: netdev-bounce@oss.sgi.com [mailto:netdev-bounce@oss.sgi.com] On
> Behalf Of Dmitry Yusupov
> Sent: Monday, March 28, 2005 12:49 PM
> To: open-iscsi@googlegroups.com
> Cc: David S. Miller; mpm@selenic.com; andrea@suse.de;
> michaelc@cs.wisc.edu; James.Bottomley@HansenPartnership.com;
> ksummit-2005-discuss@thunk.org; netdev@oss.sgi.com
> Subject: Re: [Ksummit-2005-discuss] Summary of 2005 Kernel Summit
> ProposedTopics
>
> Basically, HW offloading all kind of is a different subject.
> Yes, iSER/RDMA/RNIC will help to avoid bunch of problems but at the same
> time will add bunch of new problems. OOM/deadlock problem we are
> discussing is a software, *not* hardware related.
>
> If you have plans to start new project such as SoftRDMA than yes. lets
> discuss it since set of problems will be similar to what we've got with
> software iSCSI Initiators.
>
> I'm not a believer in any HW state-full protocol offloading technologies
> and that was one of my motivations to initiate Open-iSCSI project to
> prove that performance is not an issue anymore. And we succeeded, by
> showing comparable to iSCSI HW Initiator's numbers.
>
Dmitry
Care to be more specific about the performance you achieved?
You might want to contrast your numbers with the VeriTest-verified numbers
of 800+ MBps and 600+ KOPS achieved by the Chelsio HBA with stateful offload
using either a 1500B or 9KB MTU (for full detail see the VeriTest report at
http://www.chelsio.com/technology/Chelsio10GbE_iSCSI_report.pdf).
'Asgeir
> Though, for me, RDMA over TCP is an interesting topic from software
> implementation point of view. I was thinking about organizing new
> project. If someone knows that related work is already started - let me
> know since I might be interested to help.
>
> Dmitry
>
> On Mon, 2005-03-28 at 11:45 -0800, Roland Dreier wrote:
> > Let me slightly hijack this thread to throw out another topic that I
> > think is worth talking about at the kernel summit: handling remote DMA
> > (RDMA) network technologies.
> >
> > As some of you might know, I'm one of the main authors of the
> > InfiniBand support in the kernel, and I think we have things fairly
> > well in hand there, although handling direct userspace access to RDMA
> > capabilities may raise some issues worth talking about.
> >
> > However, there is also RDMA-over-TCP hardware beginning to be used,
> > based on the specs from the IETF rddp working group and the RDMA
> > Consortium. I would hope that we can abstract out the common pieces
> > for InfiniBand and RDMA NIC (RNIC) support and morph
> > drivers/infiniband into a more general drivers/rdma.
> >
> > This is not _that_ offtopic, since RDMA NICs provide another way of
> > handling OOM for iSCSI. By having the NIC handle the network
> > transport through something like iSER, you avoid a lot of the issues
> > in this thread. Having to reconnect to a target while OOM is still a
> > problem, but it seems no worse in principal than the issues with a
> > dump FC card that needs the host driver to handling fabric login.
> >
> > I know that in the InfiniBand world, people have been able to run
> > stress tests of storage over SCSI RDMA Protocol (SRP) with very heavy
> > swapping going on and no deadlocks. SRP is in effect network storage
> > with the transport handled by the IB hardware.
> >
> > However there are some sticky points that I would be interested in
> > discussing. For example, the IETF rddp drafts envisage what they call
> > a "dual stack" model: TCP connections are set up by the usual network
> > stack and run for a while in "streaming" mode until the application is
> > ready to start using RDMA. At that point there is an "MPA"
> > negotiation and then the socket is handed over to the RNIC. Clearly
> > moving the state from the kernel's stack to the RNIC is not trivial.
> >
> > Other developers who have more direct experience with RNIC hardware or
> > perhaps just strong opinions may have other things in this area that
> > they'd like to talk about.
> >
> > Thanks,
> > Roland
>
^ permalink raw reply [flat|nested] 11+ messages in thread