From: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
To: Tom Talpey <tom-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>
Cc: Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	Linux NFS Mailing List
	<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Steve French <smfrench-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: [PATCH v1 00/16] NFS/RDMA patches proposed for 4.1
Date: Wed, 6 May 2015 10:20:05 -0600
Message-ID: <20150506162005.GA11331@obsidianresearch.com>
In-Reply-To: <55495D41.5090502-CLs1Zie5N5HQT0dZR+AlfA@public.gmane.org>

On Tue, May 05, 2015 at 08:16:01PM -0400, Tom Talpey wrote:

> >The specific use-case of a RDMA to/from a logical linear region broken
> >up into HW pages is incredibly kernel specific, and very friendly to
> >hardware support.
> >
> >Heck, on modern systems 100% of these requirements can be solved just
> >by using the IOMMU. No need for the HCA at all. (HCA may be more
> >performant, of course)
> 
> I don't agree on "100%", because IOMMUs don't have the same protection
> attributes as RDMA adapters (local R, local W, remote R, remote W).

No, you do get protection. The IOMMU isn't the only resource; it would
still have to be combined with several pre-set-up MRs that have the
proper protection attributes. You'd map the page list into the address
space that is covered by an MR that has the protection attributes
needed.

> Also they don't support handles for page lists quite like
> STags/RMRs, so they require additional (R)DMA scatter/gather. But, I
> agree with your point that they translate addresses just great.

??? The entire point of using the IOMMU in a context like this is to
linearize the page list into a DMA'able address range. How could you
ever need to scatter/gather when your memory is linear?
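
To make the linearization point concrete, here is a toy user-space
model, not kernel code. Every name in it (toy_iommu_window,
toy_iommu_map, toy_iommu_translate) is hypothetical and stands in for
what an IOMMU mapping does conceptually: scattered physical pages are
placed behind one contiguous I/O virtual range, so the device only
ever sees a single base address and length. In the scheme described
above, that range would sit inside a pre-set-up MR carrying the needed
protection attributes; the sketch does not model the MR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  ((uint64_t)1 << TOY_PAGE_SHIFT)
#define TOY_MAX_PAGES  64

/* Hypothetical toy IOMMU window: page slot i of the window maps to phys[i]. */
struct toy_iommu_window {
	uint64_t iova_base;            /* start of the linear I/O virtual range */
	size_t   npages;               /* number of pages mapped in the window */
	uint64_t phys[TOY_MAX_PAGES];  /* page-aligned physical addresses */
};

/*
 * Map a scattered page list into one contiguous IOVA range.
 * Returns 0 on success, -1 on a misaligned address or too many pages.
 */
static int toy_iommu_map(struct toy_iommu_window *w, uint64_t iova_base,
			 const uint64_t *pages, size_t npages)
{
	if (npages > TOY_MAX_PAGES || (iova_base & (TOY_PAGE_SIZE - 1)))
		return -1;
	for (size_t i = 0; i < npages; i++) {
		if (pages[i] & (TOY_PAGE_SIZE - 1))
			return -1;
		w->phys[i] = pages[i];
	}
	w->iova_base = iova_base;
	w->npages = npages;
	return 0;
}

/*
 * Translate a device-visible address to a physical address.
 * Returns 0 on success, -1 if the address is outside the window.
 */
static int toy_iommu_translate(const struct toy_iommu_window *w,
			       uint64_t iova, uint64_t *phys_out)
{
	uint64_t off, idx;

	if (iova < w->iova_base)
		return -1;
	off = iova - w->iova_base;
	idx = off >> TOY_PAGE_SHIFT;
	if (idx >= w->npages)
		return -1;
	*phys_out = w->phys[idx] + (off & (TOY_PAGE_SIZE - 1));
	return 0;
}
```

With three scattered pages mapped at one base, a transfer covering all
three can be described to the device as a single (base, 3 * page size)
element, which is why no gather list is needed on the RDMA side.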

> >'post outbound rdma send/write of page region'
> 
> A bunch of writes followed by a send is a common sequence, but not
> very complex (I think).

So, I wasn't clear: I mean a general API that can post a SEND or RDMA
WRITE using a logically linear page list as the data source.

So this results in one of:
 1) A SEND with a gather list
 2) A SEND with a temporary linearized MR
 3) A series of RDMA WRITEs with gather lists
 4) An RDMA WRITE with a temporary linearized MR

Picking one depends on the performance of the HCA and the various
features it supports. Even just the really simple options of #1 and #3
become a bit more complex when you want to take advantage of
transparent huge pages to reduce gather list length.

For instance, deciding when to trade off #3 vs #4 is going to be very
driver specific.
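
A rough sketch of what such a selection could look like, written as
plain user-space C. All names here (toy_hca_caps, toy_pick_strategy,
toy_count_gather_entries) and the policy itself are assumptions made
up for illustration; a real driver would weigh measured costs, and the
sketch ignores per-entry length limits. It does show two things from
the list above: physically contiguous pages (as with a transparent
huge page) collapse into one gather entry, and a single SEND that
still exceeds the gather limit has to fall back to a temporary MR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* The four outcomes listed above. */
enum toy_post_strategy {
	TOY_SEND_GATHER,          /* 1) SEND with a gather list */
	TOY_SEND_TEMP_MR,         /* 2) SEND with a temporary linearized MR */
	TOY_WRITE_GATHER_SERIES,  /* 3) series of RDMA WRITEs with gather lists */
	TOY_WRITE_TEMP_MR,        /* 4) RDMA WRITE with a temporary linearized MR */
};

/* Hypothetical capability summary; not a real verbs structure. */
struct toy_hca_caps {
	size_t max_sge;        /* gather entries allowed per work request */
	bool   cheap_temp_mr;  /* assumption: temporary MRs are cheap on this HCA */
};

/* Count gather entries after merging physically contiguous pages. */
static size_t toy_count_gather_entries(const uint64_t *pages, size_t npages)
{
	size_t entries = 0;

	for (size_t i = 0; i < npages; i++) {
		if (i == 0 || pages[i] != pages[i - 1] + TOY_PAGE_SIZE)
			entries++;
	}
	return entries;
}

/* Illustrative policy only: pick one of the four strategies. */
static enum toy_post_strategy
toy_pick_strategy(const struct toy_hca_caps *caps, bool is_send,
		  const uint64_t *pages, size_t npages)
{
	size_t entries = toy_count_gather_entries(pages, npages);

	if (is_send) {
		/* One SEND is one work request, so it cannot be split. */
		return entries <= caps->max_sge ? TOY_SEND_GATHER
						: TOY_SEND_TEMP_MR;
	}
	if (entries > caps->max_sge && caps->cheap_temp_mr)
		return TOY_WRITE_TEMP_MR;
	/* Fits in one WRITE, or gets split into several. */
	return TOY_WRITE_GATHER_SERIES;
}
```

The single boolean cheap_temp_mr is exactly the part that would be
driver specific; it stands in for whatever cost model the HCA driver
actually has.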

> >'prepare inbound rdma write of page region'
> 
> This is memory registration, with remote writability. That's what
> the rpcrdma_register_external() API in xprtrdma/verbs.c does. It
> takes a private rpcrdma structure, but it supports multiple memreg
> strategies and pretty much does what you expect. I'm sure someone
> could abstract it upward.

Right, most likely an implementation would just pull the NFS code into
the core; I think it is the broadest version we have?

> >'complete X'
> 
> This is trickier - invalidation has many interesting error cases.
> But, on a sunny day with the breeze at our backs, sure.

I don't mean send+invalidate. This is the 'free' for the 'alloc' that
the above APIs might need (i.e. the temporary MR). You can't fail to
free the MR - that would be an insane API :)
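
The shape of that alloc/free pairing, as a toy sketch. The names
(toy_mr_pool, toy_xfer, toy_xfer_prepare, toy_xfer_complete) are
hypothetical and this is not the xprtrdma or verbs API; the only point
is that the prepare side may fail while the complete side returns
void.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool of temporary MRs, reduced to a counter. */
struct toy_mr_pool {
	size_t in_use;
	size_t capacity;
};

/* Hypothetical per-transfer state handed back by the 'alloc' side. */
struct toy_xfer {
	struct toy_mr_pool *pool;
	bool holds_temp_mr;
};

/* The 'alloc' side: may fail (returns -1) when no temporary MR is left. */
static int toy_xfer_prepare(struct toy_xfer *x, struct toy_mr_pool *pool,
			    bool need_temp_mr)
{
	x->pool = pool;
	x->holds_temp_mr = false;
	if (need_temp_mr) {
		if (pool->in_use == pool->capacity)
			return -1;
		pool->in_use++;
		x->holds_temp_mr = true;
	}
	return 0;
}

/* The 'free' side: returns void, so callers have no error path to handle. */
static void toy_xfer_complete(struct toy_xfer *x)
{
	if (x->holds_temp_mr) {
		x->pool->in_use--;
		x->holds_temp_mr = false;
	}
}
```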

> If Linux upper layers considered adopting a similar approach by
> carefully inserting RDMA operations conditionally, it can make
> the lower layer's job much more efficient. And, efficiency is speed.
> And in the end, the API throughout the stack will be simpler.

No idea for Linux. It seems to me most of the use cases we are talking
about here are not actually assuming a socket: NFS-RDMA, SRP, iSER and
Lustre are all explicitly driving verbs and explicitly working with
page lists for their high speed side.

Does that mean we are already doing what you are talking about?

Jason