From: Tom Talpey <tom@talpey.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Chuck Lever <chuck.lever@oracle.com>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
linux-rdma@vger.kernel.org
Subject: Re: [PATCH v1 00/16] NFS/RDMA patches proposed for 4.1
Date: Tue, 05 May 2015 16:57:55 -0400
Message-ID: <55492ED3.7000507@talpey.com>
In-Reply-To: <20150505191012.GA21164@infradead.org>

On 5/5/2015 3:10 PM, Christoph Hellwig wrote:
> On Tue, May 05, 2015 at 02:14:30PM -0400, Tom Talpey wrote:
>> As you might guess, I can go on at length about this. :-) But, if
>> you have a kernel service, the ability to pin memory, and you
>> want it to go fast, you want FRWR.
>
> Basically most in-kernel consumers seem to have the same requirements:
>
> - register a struct page, which can be kernel or user memory (it's
> probably pinned in your terms, but we don't really use that much in
> kernelspace).

Actually, I strongly disagree that the in-kernel consumers want to
register a struct page. They want to register a list of pages, often
a rather long one. They want this because it allows the RDMA layer to
address the list with a single memory handle. This is where things
get tricky.
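To make "address the list with a single memory handle" concrete, here is a minimal userspace C sketch. It is a model only, not kernel or verbs code: struct mem_region, region_resolve() and the 4096-byte PAGE_SZ are hypothetical names and values chosen for illustration. It shows how one handle plus a byte offset can resolve to an address inside a list of pages that need not be physically adjacent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u /* assumed page size for this sketch */

/* Hypothetical model of a registered region; not a kernel or verbs type. */
struct mem_region {
    uint32_t handle;           /* the single handle a peer would present */
    const uint64_t *page_addr; /* page-aligned address of each pinned page */
    size_t npages;             /* number of entries in page_addr */
    uint32_t fbo;              /* first byte offset into page_addr[0] */
    uint64_t length;           /* total bytes covered by the handle */
};

/*
 * Resolve a handle-relative byte offset to an address.
 * Returns 0 on success, -1 if the offset falls outside the region.
 */
static int region_resolve(const struct mem_region *mr, uint64_t off,
                          uint64_t *addr)
{
    assert(mr != NULL && addr != NULL);

    if (off >= mr->length)
        return -1;

    /* Position counted from the start of the first page. */
    uint64_t pos = (uint64_t)mr->fbo + off;
    size_t idx = (size_t)(pos / PAGE_SZ);

    if (idx >= mr->npages)
        return -1;

    *addr = mr->page_addr[idx] + pos % PAGE_SZ;
    return 0;
}
```

With three scattered pages, an FBO of 512 and a length of 8192, offset 0 resolves to byte 512 of the first page, and offset 3584 resolves to the first byte of the second page, even though the two pages are not adjacent in memory.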

The term "pinned" or "wired" is used because, in order to do RDMA, the
page needs to have a fixed mapping to this handle. Usually, that means
a physical address. There are some new approaches that allow the NIC
to raise a fault and/or walk kernel page tables, but one way or the
other the page had better be resident. RDMA NICs, generally speaking,
don't buffer in-flight RDMA data, nor do you want them to.

> - In many but not all cases we might need an offset/length for each
> page (think struct bvec, paged sk_buffs, or scatterlists of some
> sort), in others an offset/len for the whole set of pages is fine,
> but that's a superset of the one above.

Yep, RDMA calls this FBO (first byte offset) and length. Further, the
protocol requires that the data itself be contiguous within the
registration: the FBO can be non-zero, but no other holes may be present.
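The contiguity rule can be sketched as a check over per-page (offset, length) segments of the kind Christoph describes. This is a userspace C illustration under stated assumptions: struct page_seg, segs_form_one_region() and the 4096-byte PAGE_SZ are hypothetical, not taken from any kernel API. Only the first segment may start at a non-zero offset, and only the last may stop short of the end of its page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u /* assumed page size for this sketch */

/* Hypothetical per-page segment, similar in spirit to a bvec entry. */
struct page_seg {
    uint32_t offset; /* byte offset of the data within its page */
    uint32_t len;    /* number of data bytes in this page */
};

/*
 * Return true if the segments can be covered by one registration
 * described by a single FBO and length, i.e. there are no holes
 * other than the leading offset in the first page.
 */
static bool segs_form_one_region(const struct page_seg *segs, size_t n,
                                 uint32_t *fbo, uint64_t *length)
{
    assert(fbo != NULL && length != NULL);

    if (segs == NULL || n == 0)
        return false;

    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t end = (uint64_t)segs[i].offset + segs[i].len;

        if (segs[i].len == 0 || end > PAGE_SZ)
            return false;
        /* A later page starting mid-page would leave a hole. */
        if (i > 0 && segs[i].offset != 0)
            return false;
        /* An earlier page ending before its boundary would too. */
        if (i + 1 < n && end != PAGE_SZ)
            return false;
        total += segs[i].len;
    }

    *fbo = segs[0].offset;
    *length = total;
    return true;
}
```

A caller holding a segment list that fails this check would have to split it across several registrations, which is one reason a per-page offset/length interface is not a simple superset of what a single registration can express.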

> - we usually want it to be as fast as possible

In the case of file protocols such as NFS/RDMA and SMB Direct, as well
as block protocols such as iSER, these registrations are set up and
torn down on a per-I/O basis, in order to protect the data from
misbehaving peers or misbehaving hardware. So to me as a storage
protocol provider, "usually" means "always".

I totally get where you're coming from. My main question is whether
it's possible to nail down the requirements of some useful common API.
It has been tried before, shall I say.

Tom.