public inbox for linux-rdma@vger.kernel.org
From: "Steve Wise" <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
To: 'Chuck Lever' <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: 'Sagi Grimberg'
	<sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	'Linux NFS Mailing List'
	<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: RE: [PATCH v1 06/10] svcrdma: Plant reader function in struct svcxprt_rdma
Date: Mon, 12 Jan 2015 10:26:48 -0600
Message-ID: <006b01d02e84$907f5890$b17e09b0$@opengridcomputing.com>
In-Reply-To: <A84D07C5-1879-49ED-A181-6FFC76B4864B-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>



> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org]
> Sent: Monday, January 12, 2015 10:20 AM
> To: Steve Wise
> Cc: Sagi Grimberg; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Linux NFS Mailing List
> Subject: Re: [PATCH v1 06/10] svcrdma: Plant reader function in struct svcxprt_rdma
> 
> 
> On Jan 12, 2015, at 11:08 AM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
> 
> >
> >
> >> -----Original Message-----
> >> From: Chuck Lever [mailto:chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org]
> >> Sent: Sunday, January 11, 2015 6:41 PM
> >> To: Sagi Grimberg; Steve Wise
> >> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Linux NFS Mailing List
> >> Subject: Re: [PATCH v1 06/10] svcrdma: Plant reader function in struct svcxprt_rdma
> >>
> >>
> >> On Jan 11, 2015, at 12:45 PM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> >>
> >>> On 1/9/2015 9:22 PM, Chuck Lever wrote:
> >>>> The RDMA reader function doesn't change once an svcxprt is
> >>>> instantiated. Instead of checking sc_devcap during every incoming
> >>>> RPC, set the reader function once when the connection is accepted.
> >>>
> >>> General question(s),
> >>>
> >>> Any specific reason why to use FRMR in the server side? And why only
> >>> for reads and not writes? Sorry if these are dumb questions...
> >>
> >> Steve Wise presented patches a few months back to add FRMR, he
> >> would have to answer this. Steve has a selection of iWARP adapters
> >> and maybe could provide some idea of performance impact. I have
> >> only CX-[23] here.
> >>
> >
> > The rdma rpc server has always tried to use FRMR for rdma reads as far
> > as I recall.  The patch I submitted refactored the design in order to
> > make it more efficient and to fix some bugs.  Unlike IB, the iWARP
> > protocol only allows 1 target/sink SGE in an rdma read request message,
> > so an FRMR is used to create that single target/sink SGE, allowing 1
> > read to be submitted instead of many.
> 
> How does this work when the client uses PHYSICAL memory registration?

Each page would require a separate rdma read WR.  That is why we use FRMRs. :)

> It can't form a read/write list SGE larger than a page, thus the
> server must emit an RDMA READ or WRITE for each page in the payload.
> 
> Curious, have you tried using iWARP with PHYSICAL MR on the client?
> 

No, I haven't.

> > I believe that the FRMR allows for more efficient IO since w/o it you
> > end up with large SGLs of 4K each and lots of read requests.  However,
> > I have no data to back that up.  I would think that the write side
> > (NFS READ) could also benefit from FRMRs.  It also could use
> > refactoring, because I believe it still creates an intermediate data
> > structure to hold the write chunks vs just translating them directly
> > into the RDMA SGLs needed for the IO.  See send_write_chunks() and
> > send_write() and how they create a svc_rdma_req_map vector first and
> > then translate that into the SGL needed for the rdma writes.
> >
> >
> >> My next step is to do some performance measurement to see if FRMR
> >> is worth the trouble, at least with the cards on hand.
> >>
> >> I notice that the lcl case does not seem to work with my CX-3 Pro.
> >> Probably a bug I will have to address first.
> >>
> >
> >>
> >>> Sagi.
> >>>
> >>>> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> >>>> ---
> >>>>
> >>>> include/linux/sunrpc/svc_rdma.h          |   10 ++++
> >>>> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c  |   71 +++++++++++-------------------
> >>>> net/sunrpc/xprtrdma/svc_rdma_transport.c |    2 +
> >>>> 3 files changed, 39 insertions(+), 44 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
> >>>> index 2280325..f161e30 100644
> >>>> --- a/include/linux/sunrpc/svc_rdma.h
> >>>> +++ b/include/linux/sunrpc/svc_rdma.h
> >>>> @@ -150,6 +150,10 @@ struct svcxprt_rdma {
> >>>> 	struct ib_cq         *sc_rq_cq;
> >>>> 	struct ib_cq         *sc_sq_cq;
> >>>> 	struct ib_mr         *sc_phys_mr;	/* MR for server memory */
> >>>> +	int		     (*sc_reader)(struct svcxprt_rdma *,
> >>>> +					  struct svc_rqst *,
> >>>> +					  struct svc_rdma_op_ctxt *,
> >>>> +					  int *, u32 *, u32, u32, u64, bool);
> >>>> 	u32		     sc_dev_caps;	/* distilled device caps */
> >>>> 	u32		     sc_dma_lkey;	/* local dma key */
> >>>> 	unsigned int	     sc_frmr_pg_list_len;
> >>>> @@ -195,6 +199,12 @@ extern int svc_rdma_xdr_get_reply_hdr_len(struct rpcrdma_msg *);
> >>>>
> >>>> /* svc_rdma_recvfrom.c */
> >>>> extern int svc_rdma_recvfrom(struct svc_rqst *);
> >>>> +extern int rdma_read_chunk_lcl(struct svcxprt_rdma *, struct svc_rqst *,
> >>>> +			       struct svc_rdma_op_ctxt *, int *, u32 *,
> >>>> +			       u32, u32, u64, bool);
> >>>> +extern int rdma_read_chunk_frmr(struct svcxprt_rdma *, struct svc_rqst *,
> >>>> +				struct svc_rdma_op_ctxt *, int *, u32 *,
> >>>> +				u32, u32, u64, bool);
> >>>>
> >>>> /* svc_rdma_sendto.c */
> >>>> extern int svc_rdma_sendto(struct svc_rqst *);
> >>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >>>> index 577f865..c3aebc1 100644
> >>>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> >>>> @@ -117,26 +117,16 @@ static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
> >>>> 		return min_t(int, sge_count, xprt->sc_max_sge);
> >>>> }
> >>>>
> >>>> -typedef int (*rdma_reader_fn)(struct svcxprt_rdma *xprt,
> >>>> -			      struct svc_rqst *rqstp,
> >>>> -			      struct svc_rdma_op_ctxt *head,
> >>>> -			      int *page_no,
> >>>> -			      u32 *page_offset,
> >>>> -			      u32 rs_handle,
> >>>> -			      u32 rs_length,
> >>>> -			      u64 rs_offset,
> >>>> -			      int last);
> >>>> -
> >>>> /* Issue an RDMA_READ using the local lkey to map the data sink */
> >>>> -static int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
> >>>> -			       struct svc_rqst *rqstp,
> >>>> -			       struct svc_rdma_op_ctxt *head,
> >>>> -			       int *page_no,
> >>>> -			       u32 *page_offset,
> >>>> -			       u32 rs_handle,
> >>>> -			       u32 rs_length,
> >>>> -			       u64 rs_offset,
> >>>> -			       int last)
> >>>> +int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
> >>>> +			struct svc_rqst *rqstp,
> >>>> +			struct svc_rdma_op_ctxt *head,
> >>>> +			int *page_no,
> >>>> +			u32 *page_offset,
> >>>> +			u32 rs_handle,
> >>>> +			u32 rs_length,
> >>>> +			u64 rs_offset,
> >>>> +			bool last)
> >>>> {
> >>>> 	struct ib_send_wr read_wr;
> >>>> 	int pages_needed = PAGE_ALIGN(*page_offset + rs_length) >> PAGE_SHIFT;
> >>>> @@ -221,15 +211,15 @@ static int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
> >>>> }
> >>>>
> >>>> /* Issue an RDMA_READ using an FRMR to map the data sink */
> >>>> -static int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
> >>>> -				struct svc_rqst *rqstp,
> >>>> -				struct svc_rdma_op_ctxt *head,
> >>>> -				int *page_no,
> >>>> -				u32 *page_offset,
> >>>> -				u32 rs_handle,
> >>>> -				u32 rs_length,
> >>>> -				u64 rs_offset,
> >>>> -				int last)
> >>>> +int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
> >>>> +			 struct svc_rqst *rqstp,
> >>>> +			 struct svc_rdma_op_ctxt *head,
> >>>> +			 int *page_no,
> >>>> +			 u32 *page_offset,
> >>>> +			 u32 rs_handle,
> >>>> +			 u32 rs_length,
> >>>> +			 u64 rs_offset,
> >>>> +			 bool last)
> >>>> {
> >>>> 	struct ib_send_wr read_wr;
> >>>> 	struct ib_send_wr inv_wr;
> >>>> @@ -374,9 +364,9 @@ static int rdma_read_chunks(struct svcxprt_rdma *xprt,
> >>>> {
> >>>> 	int page_no, ret;
> >>>> 	struct rpcrdma_read_chunk *ch;
> >>>> -	u32 page_offset, byte_count;
> >>>> +	u32 handle, page_offset, byte_count;
> >>>> 	u64 rs_offset;
> >>>> -	rdma_reader_fn reader;
> >>>> +	bool last;
> >>>>
> >>>> 	/* If no read list is present, return 0 */
> >>>> 	ch = svc_rdma_get_read_chunk(rmsgp);
> >>>> @@ -399,27 +389,20 @@ static int rdma_read_chunks(struct svcxprt_rdma *xprt,
> >>>> 	head->arg.len = rqstp->rq_arg.len;
> >>>> 	head->arg.buflen = rqstp->rq_arg.buflen;
> >>>>
> >>>> -	/* Use FRMR if supported */
> >>>> -	if (xprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)
> >>>> -		reader = rdma_read_chunk_frmr;
> >>>> -	else
> >>>> -		reader = rdma_read_chunk_lcl;
> >>>> -
> >>>> 	page_no = 0; page_offset = 0;
> >>>> 	for (ch = (struct rpcrdma_read_chunk *)&rmsgp->rm_body.rm_chunks[0];
> >>>> 	     ch->rc_discrim != 0; ch++) {
> >>>> -
> >>>> +		handle = be32_to_cpu(ch->rc_target.rs_handle);
> >>>> +		byte_count = be32_to_cpu(ch->rc_target.rs_length);
> >>>> 		xdr_decode_hyper((__be32 *)&ch->rc_target.rs_offset,
> >>>> 				 &rs_offset);
> >>>> -		byte_count = ntohl(ch->rc_target.rs_length);
> >>>>
> >>>> 		while (byte_count > 0) {
> >>>> -			ret = reader(xprt, rqstp, head,
> >>>> -				     &page_no, &page_offset,
> >>>> -				     ntohl(ch->rc_target.rs_handle),
> >>>> -				     byte_count, rs_offset,
> >>>> -				     ((ch+1)->rc_discrim == 0) /* last */
> >>>> -				     );
> >>>> +			last = (ch + 1)->rc_discrim == xdr_zero;
> >>>> +			ret = xprt->sc_reader(xprt, rqstp, head,
> >>>> +					      &page_no, &page_offset,
> >>>> +					      handle, byte_count,
> >>>> +					      rs_offset, last);
> >>>> 			if (ret < 0)
> >>>> 				goto err;
> >>>> 			byte_count -= ret;
> >>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> >>>> index f2e059b..f609c1c 100644
> >>>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> >>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> >>>> @@ -974,10 +974,12 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
> >>>> 	 * NB:	iWARP requires remote write access for the data sink
> >>>> 	 *	of an RDMA_READ. IB does not.
> >>>> 	 */
> >>>> +	newxprt->sc_reader = rdma_read_chunk_lcl;
> >>>> 	if (devattr.device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) {
> >>>> 		newxprt->sc_frmr_pg_list_len =
> >>>> 			devattr.max_fast_reg_page_list_len;
> >>>> 		newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_FAST_REG;
> >>>> +		newxprt->sc_reader = rdma_read_chunk_frmr;
> >>>> 	}
> >>>>
> >>>> 	/*
> >>>>
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> >>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>
> >>>
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >> --
> >> Chuck Lever
> >> chuck[dot]lever[at]oracle[dot]com
> >>
> >
> >
> 
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
> 


