public inbox for linux-rdma@vger.kernel.org
From: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
To: 'Chuck Lever' <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: 'Sagi Grimberg'
	<sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	'Linux NFS Mailing List'
	<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH v1 06/10] svcrdma: Plant reader function in struct svcxprt_rdma
Date: Mon, 12 Jan 2015 10:45:41 -0600	[thread overview]
Message-ID: <54B3FA35.4030003@opengridcomputing.com> (raw)
In-Reply-To: <006b01d02e84$907f5890$b17e09b0$@opengridcomputing.com>

On 1/12/2015 10:26 AM, Steve Wise wrote:
>
>> -----Original Message-----
>> From: Chuck Lever [mailto:chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org]
>> Sent: Monday, January 12, 2015 10:20 AM
>> To: Steve Wise
>> Cc: Sagi Grimberg; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Linux NFS Mailing List
>> Subject: Re: [PATCH v1 06/10] svcrdma: Plant reader function in struct svcxprt_rdma
>>
>>
>> On Jan 12, 2015, at 11:08 AM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
>>
>>>
>>>> -----Original Message-----
>>>> From: Chuck Lever [mailto:chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org]
>>>> Sent: Sunday, January 11, 2015 6:41 PM
>>>> To: Sagi Grimberg; Steve Wise
>>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Linux NFS Mailing List
>>>> Subject: Re: [PATCH v1 06/10] svcrdma: Plant reader function in struct svcxprt_rdma
>>>>
>>>>
>>>> On Jan 11, 2015, at 12:45 PM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
>>>>
>>>>> On 1/9/2015 9:22 PM, Chuck Lever wrote:
>>>>>> The RDMA reader function doesn't change once an svcxprt is
>>>>>> instantiated. Instead of checking sc_devcap during every incoming
>>>>>> RPC, set the reader function once when the connection is accepted.
>>>>> General question(s),
>>>>>
>>>>> Any specific reason to use FRMR on the server side? And why only
>>>>> for reads and not writes? Sorry if these are dumb questions...
>>>> Steve Wise presented patches a few months back to add FRMR, he
>>>> would have to answer this. Steve has a selection of iWARP adapters
>>>> and maybe could provide some idea of performance impact. I have
>>>> only CX-[23] here.
>>>>
>>> The rdma rpc server has always tried to use FRMR for rdma reads as far as I recall.  The patch I submitted refactored the design in
>>> order to make it more efficient and to fix some bugs.  Unlike IB, the iWARP protocol only allows 1 target/sink SGE in an rdma read
>>> request message, so an FRMR is used to create that single target/sink SGE allowing 1 read to be submitted instead of many.
>> How does this work when the client uses PHYSICAL memory registration?
> Each page would require a separate rdma read WR.  That is why we use FRMRs. :)

Correction: each physical scatter/gather entry would require a separate 
read WR.  There may be contiguous chunks of physical memory that can be 
described with one RDMA SGE...


>> It can't form a read/write list SGE larger than a page, thus the
>> server must emit an RDMA READ or WRITE for each page in the payload.
>>
>> Curious, have you tried using iWARP with PHYSICAL MR on the client?
>>
> No I haven't.
>
>>> I believe that the FRMR allows for more efficient IO since w/o it you end up with large SGLs of 4K each and lots of read requests.
>>> However, I have no data to back that up.  I would think that the write side (NFS READ) could also benefit from FRMRs too.  It also
>>> could use refactoring, because I believe it still creates an intermediate data structure to hold the write chunks vs just
>>> translating them directly into the RDMA SGLs needed for the IO.  See send_write_chunks() and send_write() and how they create a
>>> svc_rdma_req_map vector first and then translate that into the SGL needed for the rdma writes.
>>>
>>>
>>>> My next step is to do some performance measurement to see if FRMR
>>>> is worth the trouble, at least with the cards on hand.
>>>>
>>>> I notice that the lcl case does not seem to work with my CX-3 Pro.
>>>> Probably a bug I will have to address first.
>>>>
>>>>> Sagi.
>>>>>
>>>>>> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>>>> ---
>>>>>>
>>>>>> include/linux/sunrpc/svc_rdma.h          |   10 ++++
>>>>>> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |   71 +++++++++++-------------------
>>>>>> net/sunrpc/xprtrdma/svc_rdma_transport.c |    2 +
>>>>>> 3 files changed, 39 insertions(+), 44 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
>>>>>> index 2280325..f161e30 100644
>>>>>> --- a/include/linux/sunrpc/svc_rdma.h
>>>>>> +++ b/include/linux/sunrpc/svc_rdma.h
>>>>>> @@ -150,6 +150,10 @@ struct svcxprt_rdma {
>>>>>> 	struct ib_cq         *sc_rq_cq;
>>>>>> 	struct ib_cq         *sc_sq_cq;
>>>>>> 	struct ib_mr         *sc_phys_mr;	/* MR for server memory */
>>>>>> +	int		     (*sc_reader)(struct svcxprt_rdma *,
>>>>>> +					  struct svc_rqst *,
>>>>>> +					  struct svc_rdma_op_ctxt *,
>>>>>> +					  int *, u32 *, u32, u32, u64, bool);
>>>>>> 	u32		     sc_dev_caps;	/* distilled device caps */
>>>>>> 	u32		     sc_dma_lkey;	/* local dma key */
>>>>>> 	unsigned int	     sc_frmr_pg_list_len;
>>>>>> @@ -195,6 +199,12 @@ extern int svc_rdma_xdr_get_reply_hdr_len(struct rpcrdma_msg *);
>>>>>>
>>>>>> /* svc_rdma_recvfrom.c */
>>>>>> extern int svc_rdma_recvfrom(struct svc_rqst *);
>>>>>> +extern int rdma_read_chunk_lcl(struct svcxprt_rdma *, struct svc_rqst *,
>>>>>> +			       struct svc_rdma_op_ctxt *, int *, u32 *,
>>>>>> +			       u32, u32, u64, bool);
>>>>>> +extern int rdma_read_chunk_frmr(struct svcxprt_rdma *, struct svc_rqst *,
>>>>>> +				struct svc_rdma_op_ctxt *, int *, u32 *,
>>>>>> +				u32, u32, u64, bool);
>>>>>>
>>>>>> /* svc_rdma_sendto.c */
>>>>>> extern int svc_rdma_sendto(struct svc_rqst *);
>>>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>>>> index 577f865..c3aebc1 100644
>>>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>>>> @@ -117,26 +117,16 @@ static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
>>>>>> 		return min_t(int, sge_count, xprt->sc_max_sge);
>>>>>> }
>>>>>>
>>>>>> -typedef int (*rdma_reader_fn)(struct svcxprt_rdma *xprt,
>>>>>> -			      struct svc_rqst *rqstp,
>>>>>> -			      struct svc_rdma_op_ctxt *head,
>>>>>> -			      int *page_no,
>>>>>> -			      u32 *page_offset,
>>>>>> -			      u32 rs_handle,
>>>>>> -			      u32 rs_length,
>>>>>> -			      u64 rs_offset,
>>>>>> -			      int last);
>>>>>> -
>>>>>> /* Issue an RDMA_READ using the local lkey to map the data sink */
>>>>>> -static int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
>>>>>> -			       struct svc_rqst *rqstp,
>>>>>> -			       struct svc_rdma_op_ctxt *head,
>>>>>> -			       int *page_no,
>>>>>> -			       u32 *page_offset,
>>>>>> -			       u32 rs_handle,
>>>>>> -			       u32 rs_length,
>>>>>> -			       u64 rs_offset,
>>>>>> -			       int last)
>>>>>> +int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
>>>>>> +			struct svc_rqst *rqstp,
>>>>>> +			struct svc_rdma_op_ctxt *head,
>>>>>> +			int *page_no,
>>>>>> +			u32 *page_offset,
>>>>>> +			u32 rs_handle,
>>>>>> +			u32 rs_length,
>>>>>> +			u64 rs_offset,
>>>>>> +			bool last)
>>>>>> {
>>>>>> 	struct ib_send_wr read_wr;
>>>>>> 	int pages_needed = PAGE_ALIGN(*page_offset + rs_length) >> PAGE_SHIFT;
>>>>>> @@ -221,15 +211,15 @@ static int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
>>>>>> }
>>>>>>
>>>>>> /* Issue an RDMA_READ using an FRMR to map the data sink */
>>>>>> -static int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
>>>>>> -				struct svc_rqst *rqstp,
>>>>>> -				struct svc_rdma_op_ctxt *head,
>>>>>> -				int *page_no,
>>>>>> -				u32 *page_offset,
>>>>>> -				u32 rs_handle,
>>>>>> -				u32 rs_length,
>>>>>> -				u64 rs_offset,
>>>>>> -				int last)
>>>>>> +int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
>>>>>> +			 struct svc_rqst *rqstp,
>>>>>> +			 struct svc_rdma_op_ctxt *head,
>>>>>> +			 int *page_no,
>>>>>> +			 u32 *page_offset,
>>>>>> +			 u32 rs_handle,
>>>>>> +			 u32 rs_length,
>>>>>> +			 u64 rs_offset,
>>>>>> +			 bool last)
>>>>>> {
>>>>>> 	struct ib_send_wr read_wr;
>>>>>> 	struct ib_send_wr inv_wr;
>>>>>> @@ -374,9 +364,9 @@ static int rdma_read_chunks(struct svcxprt_rdma *xprt,
>>>>>> {
>>>>>> 	int page_no, ret;
>>>>>> 	struct rpcrdma_read_chunk *ch;
>>>>>> -	u32 page_offset, byte_count;
>>>>>> +	u32 handle, page_offset, byte_count;
>>>>>> 	u64 rs_offset;
>>>>>> -	rdma_reader_fn reader;
>>>>>> +	bool last;
>>>>>>
>>>>>> 	/* If no read list is present, return 0 */
>>>>>> 	ch = svc_rdma_get_read_chunk(rmsgp);
>>>>>> @@ -399,27 +389,20 @@ static int rdma_read_chunks(struct svcxprt_rdma *xprt,
>>>>>> 	head->arg.len = rqstp->rq_arg.len;
>>>>>> 	head->arg.buflen = rqstp->rq_arg.buflen;
>>>>>>
>>>>>> -	/* Use FRMR if supported */
>>>>>> -	if (xprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)
>>>>>> -		reader = rdma_read_chunk_frmr;
>>>>>> -	else
>>>>>> -		reader = rdma_read_chunk_lcl;
>>>>>> -
>>>>>> 	page_no = 0; page_offset = 0;
>>>>>> 	for (ch = (struct rpcrdma_read_chunk *)&rmsgp->rm_body.rm_chunks[0];
>>>>>> 	     ch->rc_discrim != 0; ch++) {
>>>>>> -
>>>>>> +		handle = be32_to_cpu(ch->rc_target.rs_handle);
>>>>>> +		byte_count = be32_to_cpu(ch->rc_target.rs_length);
>>>>>> 		xdr_decode_hyper((__be32 *)&ch->rc_target.rs_offset,
>>>>>> 				 &rs_offset);
>>>>>> -		byte_count = ntohl(ch->rc_target.rs_length);
>>>>>>
>>>>>> 		while (byte_count > 0) {
>>>>>> -			ret = reader(xprt, rqstp, head,
>>>>>> -				     &page_no, &page_offset,
>>>>>> -				     ntohl(ch->rc_target.rs_handle),
>>>>>> -				     byte_count, rs_offset,
>>>>>> -				     ((ch+1)->rc_discrim == 0) /* last */
>>>>>> -				     );
>>>>>> +			last = (ch + 1)->rc_discrim == xdr_zero;
>>>>>> +			ret = xprt->sc_reader(xprt, rqstp, head,
>>>>>> +					      &page_no, &page_offset,
>>>>>> +					      handle, byte_count,
>>>>>> +					      rs_offset, last);
>>>>>> 			if (ret < 0)
>>>>>> 				goto err;
>>>>>> 			byte_count -= ret;
>>>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>>>>> index f2e059b..f609c1c 100644
>>>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>>>>> @@ -974,10 +974,12 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>>>>>> 	 * NB:	iWARP requires remote write access for the data sink
>>>>>> 	 *	of an RDMA_READ. IB does not.
>>>>>> 	 */
>>>>>> +	newxprt->sc_reader = rdma_read_chunk_lcl;
>>>>>> 	if (devattr.device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) {
>>>>>> 		newxprt->sc_frmr_pg_list_len =
>>>>>> 			devattr.max_fast_reg_page_list_len;
>>>>>> 		newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_FAST_REG;
>>>>>> +		newxprt->sc_reader = rdma_read_chunk_frmr;
>>>>>> 	}
>>>>>>
>>>>>> 	/*
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>> --
>>>> Chuck Lever
>>>> chuck[dot]lever[at]oracle[dot]com
>>>>
>>>
>

