From: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
To: 'Chuck Lever' <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: 'Sagi Grimberg'
<sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
'Linux NFS Mailing List'
<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH v1 06/10] svcrdma: Plant reader function in struct svcxprt_rdma
Date: Mon, 12 Jan 2015 10:45:41 -0600 [thread overview]
Message-ID: <54B3FA35.4030003@opengridcomputing.com> (raw)
In-Reply-To: <006b01d02e84$907f5890$b17e09b0$@opengridcomputing.com>
On 1/12/2015 10:26 AM, Steve Wise wrote:
>
>> -----Original Message-----
>> From: Chuck Lever [mailto:chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org]
>> Sent: Monday, January 12, 2015 10:20 AM
>> To: Steve Wise
>> Cc: Sagi Grimberg; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Linux NFS Mailing List
>> Subject: Re: [PATCH v1 06/10] svcrdma: Plant reader function in struct svcxprt_rdma
>>
>>
>> On Jan 12, 2015, at 11:08 AM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
>>
>>>
>>>> -----Original Message-----
>>>> From: Chuck Lever [mailto:chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org]
>>>> Sent: Sunday, January 11, 2015 6:41 PM
>>>> To: Sagi Grimberg; Steve Wise
>>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Linux NFS Mailing List
>>>> Subject: Re: [PATCH v1 06/10] svcrdma: Plant reader function in struct svcxprt_rdma
>>>>
>>>>
>>>> On Jan 11, 2015, at 12:45 PM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
>>>>
>>>>> On 1/9/2015 9:22 PM, Chuck Lever wrote:
>>>>>> The RDMA reader function doesn't change once an svcxprt is
>>>>>> instantiated. Instead of checking sc_devcap during every incoming
>>>>>> RPC, set the reader function once when the connection is accepted.
>>>>> General question(s),
>>>>>
>>>>> Any specific reason why to use FRMR in the server side? And why only
>>>>> for reads and not writes? Sorry if these are dumb questions...
>>>> Steve Wise presented patches a few months back to add FRMR, he
>>>> would have to answer this. Steve has a selection of iWARP adapters
>>>> and maybe could provide some idea of performance impact. I have
>>>> only CX-[23] here.
>>>>
>>> The RDMA RPC server has always tried to use FRMR for RDMA reads as far as
>>> I recall. The patch I submitted refactored the design in order to make it
>>> more efficient and to fix some bugs. Unlike IB, the iWARP protocol allows
>>> only 1 target/sink SGE in an RDMA read request message, so an FRMR is used
>>> to create that single target/sink SGE, allowing 1 read to be submitted
>>> instead of many.
>> How does this work when the client uses PHYSICAL memory registration?
> Each page would require a separate rdma read WR. That is why we use FRMRs. :)
Correction: each physical scatter/gather entry would require a separate
read WR. There may be contiguous chunks of physical memory that can be
described with one RDMA SGE...
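For illustration, the effect can be modeled in plain C: under iWARP's
one-SGE-per-read-WR rule, the READ WR count equals the number of physically
contiguous runs in the data sink. This is a minimal sketch with made-up
addresses, not the kernel's data structures:

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* Count the RDMA READ work requests needed when each WR may carry
 * only one SGE (the iWARP rule): physically contiguous pages collapse
 * into a single SGE, so the WR count equals the number of contiguous
 * runs in the sink buffer. */
static int read_wrs_needed(const unsigned long *page_phys, int npages)
{
	int runs = 0;
	int i;

	for (i = 0; i < npages; i++) {
		if (i == 0 || page_phys[i] != page_phys[i - 1] + PAGE_SIZE)
			runs++;
	}
	return runs;
}
```

A fully scattered sink needs one READ WR per page this way, while an FRMR
collapses the whole sink behind one rkey, so a single READ WR suffices
regardless of the physical layout.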
>> It can't form a read/write list SGE larger than a page, thus the
>> server must emit an RDMA READ or WRITE for each page in the payload.
>>
>> Curious, have you tried using iWARP with PHYSICAL MR on the client?
>>
> No I haven't.
>
>>> I believe that FRMR allows for more efficient IO, since without it you
>>> end up with large SGLs of 4KB entries and lots of read requests. However,
>>> I have no data to back that up. I would think that the write side (NFS
>>> READ) could also benefit from FRMRs. It could use refactoring too,
>>> because I believe it still creates an intermediate data structure to hold
>>> the write chunks rather than translating them directly into the RDMA SGLs
>>> needed for the IO. See send_write_chunks() and send_write(), and how they
>>> create a svc_rdma_req_map vector first and then translate that into the
>>> SGL needed for the RDMA writes.
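The direct translation suggested here could look roughly like this: a
hypothetical sketch that walks one write chunk and emits page-bounded SGEs
with no intermediate map. The names and layout are illustrative, not the
actual svc_rdma code:

```c
#include <assert.h>

#define PAGE_SIZE 4096U

struct sge {
	unsigned long addr;	/* DMA address of this piece */
	unsigned int  length;	/* bytes in this piece */
};

/* Split one write chunk (offset/length into a mapped buffer at
 * 'base') directly into page-bounded SGEs, skipping any intermediate
 * request-map structure.  Returns the number of SGEs produced. */
static int chunk_to_sges(unsigned long base, unsigned int offset,
			 unsigned int length, struct sge *out, int max)
{
	int n = 0;

	while (length > 0 && n < max) {
		unsigned int room = PAGE_SIZE - (offset & (PAGE_SIZE - 1));
		unsigned int len = length < room ? length : room;

		out[n].addr = base + offset;
		out[n].length = len;
		offset += len;
		length -= len;
		n++;
	}
	return n;
}
```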
>>>
>>>
>>>> My next step is to do some performance measurement to see if FRMR
>>>> is worth the trouble, at least with the cards on hand.
>>>>
>>>> I notice that the lcl case does not seem to work with my CX-3 Pro.
>>>> Probably a bug I will have to address first.
>>>>
>>>>> Sagi.
>>>>>
>>>>>> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>>>> ---
>>>>>>
>>>>>> include/linux/sunrpc/svc_rdma.h | 10 ++++
>>>>>> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 71 +++++++++++-------------------
>>>>>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +
>>>>>> 3 files changed, 39 insertions(+), 44 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
>>>>>> index 2280325..f161e30 100644
>>>>>> --- a/include/linux/sunrpc/svc_rdma.h
>>>>>> +++ b/include/linux/sunrpc/svc_rdma.h
>>>>>> @@ -150,6 +150,10 @@ struct svcxprt_rdma {
>>>>>> struct ib_cq *sc_rq_cq;
>>>>>> struct ib_cq *sc_sq_cq;
>>>>>> struct ib_mr *sc_phys_mr; /* MR for server memory */
>>>>>> + int (*sc_reader)(struct svcxprt_rdma *,
>>>>>> + struct svc_rqst *,
>>>>>> + struct svc_rdma_op_ctxt *,
>>>>>> + int *, u32 *, u32, u32, u64, bool);
>>>>>> u32 sc_dev_caps; /* distilled device caps */
>>>>>> u32 sc_dma_lkey; /* local dma key */
>>>>>> unsigned int sc_frmr_pg_list_len;
>>>>>> @@ -195,6 +199,12 @@ extern int svc_rdma_xdr_get_reply_hdr_len(struct rpcrdma_msg *);
>>>>>>
>>>>>> /* svc_rdma_recvfrom.c */
>>>>>> extern int svc_rdma_recvfrom(struct svc_rqst *);
>>>>>> +extern int rdma_read_chunk_lcl(struct svcxprt_rdma *, struct svc_rqst *,
>>>>>> + struct svc_rdma_op_ctxt *, int *, u32 *,
>>>>>> + u32, u32, u64, bool);
>>>>>> +extern int rdma_read_chunk_frmr(struct svcxprt_rdma *, struct svc_rqst *,
>>>>>> + struct svc_rdma_op_ctxt *, int *, u32 *,
>>>>>> + u32, u32, u64, bool);
>>>>>>
>>>>>> /* svc_rdma_sendto.c */
>>>>>> extern int svc_rdma_sendto(struct svc_rqst *);
>>>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>>>> index 577f865..c3aebc1 100644
>>>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>>>> @@ -117,26 +117,16 @@ static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
>>>>>> return min_t(int, sge_count, xprt->sc_max_sge);
>>>>>> }
>>>>>>
>>>>>> -typedef int (*rdma_reader_fn)(struct svcxprt_rdma *xprt,
>>>>>> - struct svc_rqst *rqstp,
>>>>>> - struct svc_rdma_op_ctxt *head,
>>>>>> - int *page_no,
>>>>>> - u32 *page_offset,
>>>>>> - u32 rs_handle,
>>>>>> - u32 rs_length,
>>>>>> - u64 rs_offset,
>>>>>> - int last);
>>>>>> -
>>>>>> /* Issue an RDMA_READ using the local lkey to map the data sink */
>>>>>> -static int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
>>>>>> - struct svc_rqst *rqstp,
>>>>>> - struct svc_rdma_op_ctxt *head,
>>>>>> - int *page_no,
>>>>>> - u32 *page_offset,
>>>>>> - u32 rs_handle,
>>>>>> - u32 rs_length,
>>>>>> - u64 rs_offset,
>>>>>> - int last)
>>>>>> +int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
>>>>>> + struct svc_rqst *rqstp,
>>>>>> + struct svc_rdma_op_ctxt *head,
>>>>>> + int *page_no,
>>>>>> + u32 *page_offset,
>>>>>> + u32 rs_handle,
>>>>>> + u32 rs_length,
>>>>>> + u64 rs_offset,
>>>>>> + bool last)
>>>>>> {
>>>>>> struct ib_send_wr read_wr;
>>>>>> int pages_needed = PAGE_ALIGN(*page_offset + rs_length) >> PAGE_SHIFT;
>>>>>> @@ -221,15 +211,15 @@ static int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
>>>>>> }
>>>>>>
>>>>>> /* Issue an RDMA_READ using an FRMR to map the data sink */
>>>>>> -static int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
>>>>>> - struct svc_rqst *rqstp,
>>>>>> - struct svc_rdma_op_ctxt *head,
>>>>>> - int *page_no,
>>>>>> - u32 *page_offset,
>>>>>> - u32 rs_handle,
>>>>>> - u32 rs_length,
>>>>>> - u64 rs_offset,
>>>>>> - int last)
>>>>>> +int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
>>>>>> + struct svc_rqst *rqstp,
>>>>>> + struct svc_rdma_op_ctxt *head,
>>>>>> + int *page_no,
>>>>>> + u32 *page_offset,
>>>>>> + u32 rs_handle,
>>>>>> + u32 rs_length,
>>>>>> + u64 rs_offset,
>>>>>> + bool last)
>>>>>> {
>>>>>> struct ib_send_wr read_wr;
>>>>>> struct ib_send_wr inv_wr;
>>>>>> @@ -374,9 +364,9 @@ static int rdma_read_chunks(struct svcxprt_rdma *xprt,
>>>>>> {
>>>>>> int page_no, ret;
>>>>>> struct rpcrdma_read_chunk *ch;
>>>>>> - u32 page_offset, byte_count;
>>>>>> + u32 handle, page_offset, byte_count;
>>>>>> u64 rs_offset;
>>>>>> - rdma_reader_fn reader;
>>>>>> + bool last;
>>>>>>
>>>>>> /* If no read list is present, return 0 */
>>>>>> ch = svc_rdma_get_read_chunk(rmsgp);
>>>>>> @@ -399,27 +389,20 @@ static int rdma_read_chunks(struct svcxprt_rdma *xprt,
>>>>>> head->arg.len = rqstp->rq_arg.len;
>>>>>> head->arg.buflen = rqstp->rq_arg.buflen;
>>>>>>
>>>>>> - /* Use FRMR if supported */
>>>>>> - if (xprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)
>>>>>> - reader = rdma_read_chunk_frmr;
>>>>>> - else
>>>>>> - reader = rdma_read_chunk_lcl;
>>>>>> -
>>>>>> page_no = 0; page_offset = 0;
>>>>>> for (ch = (struct rpcrdma_read_chunk *)&rmsgp->rm_body.rm_chunks[0];
>>>>>> ch->rc_discrim != 0; ch++) {
>>>>>> -
>>>>>> + handle = be32_to_cpu(ch->rc_target.rs_handle);
>>>>>> + byte_count = be32_to_cpu(ch->rc_target.rs_length);
>>>>>> xdr_decode_hyper((__be32 *)&ch->rc_target.rs_offset,
>>>>>> &rs_offset);
>>>>>> - byte_count = ntohl(ch->rc_target.rs_length);
>>>>>>
>>>>>> while (byte_count > 0) {
>>>>>> - ret = reader(xprt, rqstp, head,
>>>>>> - &page_no, &page_offset,
>>>>>> - ntohl(ch->rc_target.rs_handle),
>>>>>> - byte_count, rs_offset,
>>>>>> - ((ch+1)->rc_discrim == 0) /* last */
>>>>>> - );
>>>>>> + last = (ch + 1)->rc_discrim == xdr_zero;
>>>>>> + ret = xprt->sc_reader(xprt, rqstp, head,
>>>>>> + &page_no, &page_offset,
>>>>>> + handle, byte_count,
>>>>>> + rs_offset, last);
>>>>>> if (ret < 0)
>>>>>> goto err;
>>>>>> byte_count -= ret;
>>>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>>>>> index f2e059b..f609c1c 100644
>>>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>>>>> @@ -974,10 +974,12 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>>>>>> * NB: iWARP requires remote write access for the data sink
>>>>>> * of an RDMA_READ. IB does not.
>>>>>> */
>>>>>> + newxprt->sc_reader = rdma_read_chunk_lcl;
>>>>>> if (devattr.device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) {
>>>>>> newxprt->sc_frmr_pg_list_len =
>>>>>> devattr.max_fast_reg_page_list_len;
>>>>>> newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_FAST_REG;
>>>>>> + newxprt->sc_reader = rdma_read_chunk_frmr;
>>>>>> }
>>>>>>
>>>>>> /*
>>>>>>
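The dispatch change in the hunk above, picking the reader once at accept
time instead of testing sc_dev_caps on every RPC, boils down to an ordinary
function-pointer pattern. A toy model, with invented names rather than the
kernel verbs API:

```c
#include <assert.h>

#define DEVCAP_FAST_REG 0x1	/* stand-in for SVCRDMA_DEVCAP_FAST_REG */

struct xprt {
	int (*reader)(struct xprt *);	/* stand-in for sc_reader */
};

static int read_chunk_lcl(struct xprt *x)  { (void)x; return 1; }
static int read_chunk_frmr(struct xprt *x) { (void)x; return 2; }

/* Choose the reader implementation once, when the "connection" is
 * accepted; per-request code just calls x->reader() with no
 * capability check in the hot path. */
static void xprt_accept(struct xprt *x, unsigned int dev_caps)
{
	x->reader = read_chunk_lcl;	/* default: local lkey path */
	if (dev_caps & DEVCAP_FAST_REG)
		x->reader = read_chunk_frmr;
}
```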
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>> --
>>>> Chuck Lever
>>>> chuck[dot]lever[at]oracle[dot]com
>>>>
>>>
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com
>>
>