All of lore.kernel.org
 help / color / mirror / Atom feed
From: "J. Bruce Fields" <bfields@fieldses.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v1] sunrpc: Allocate up to RPCSVC_MAXPAGES per svc_rqst
Date: Fri, 30 Jun 2017 16:47:33 -0400	[thread overview]
Message-ID: <20170630204733.GE14868@fieldses.org> (raw)
In-Reply-To: <222F5E88-CC11-418B-B9FF-1EF10DE60BEB@oracle.com>

On Fri, Jun 30, 2017 at 04:33:23PM -0400, Chuck Lever wrote:
> 
> > On Jun 30, 2017, at 4:16 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > I like that better.
> > 
> > I'm still pretty confused.
> > 
> > The long comment above RPCSVC_MAXPAGES convinces me that it's
> > calculating the maximum number of pages a TCP request might need.  But
> > apparently rq_pages has never been large enough to hold that many pages.
> > Has nobody ever tried an unaligned 1MB read?
> 
> I'm not sure how the TCP transport works. If it tries to fill the
> head iovec completely, then goes into the page list, then it might
> not ever need the extra page on the end for the GETATTR.

I think an unaligned read of N pages ends up with rq_pages like:

	1st page: read request
	2nd page: head+tail of reply
	3rd page: partial first page of read data
	N-1 pages: read data
	last page: partial last page of read data

So, N+3 pages.

> For NFS/RDMA, the transport currently aligns the incoming data so
> it always starts at byte 0 of the first page in the xdr_buf's page
> list. That's going to have to change if we want direct placement
> of the write payload, and at that point an unaligned payload will
> be possible.
> 
> 
> > We could use comments each time elsewhere when we add another random 1
> > or 2 somewhere.  What does sv_max_mesg really mean, why are we adding 2
> > to it in svc_alloc_arg, etc.
> 
> I'm open to suggestions. A comment in svc_alloc_arg might just
> refer weary readers to the comment in front of the definition of
> RPCSVC_MAXPAGES.

There's 3 extra pages in rq_pages beyond what's needed to hold a max IO
size buffer, and I'd like the comment to explain *which* 2 of those
three pages we're accounting for here.  I could figure that out if I
knew what xv_max_mesg meant, but the comment there just says 1 page is
added for "overhead", which isn't helpful.

> Let me know if there's anything else you want to see changed, or
> if this patch is now acceptable.
> 
> 
> > Might make more sense if RPCSVC_MAXPAYLOAD was explicitly calculated as
> > a max over the compiled-in transports, too.
> 
> IMO sv_max_mesg could be expressed in pages, instead of bytes, so
> that the computation in svc_alloc_arg isn't repeated for every RPC.

Makes sense to me.

> 
> 
> > I feel like we've actually been through this before, but somehow I'm
> > still confused every time I look at it.
> > 
> > --b.
> > 
> > On Fri, Jun 30, 2017 at 12:03:54PM -0400, Chuck Lever wrote:
> >> svcrdma needs 259 pages for 1MB NFSv4.0 READ requests:
> 
> This should be
> 
> "svcrdma needs 259 pages for 1MB NFSv4.0 WRITE requests:"

Oh, got it.

--b.

> 
> 
> >> - 1 page for the transport header and head iovec
> >> - 256 pages for the data payload
> >> - 1 page for the trailing GETATTR request (since NFSD XDR decoding
> >>   does not look for a tail iovec, the GETATTR is stuck at the end
> >>   of the rqstp->rq_arg.pages list)
> >> - 1 page for building the reply xdr_buf
> >> 
> >> But RPCSVC_MAXPAGES is already 259 (on x86_64). The problem is that
> >> svc_alloc_arg never allocates that many pages. To address this:
> >> 
> >> 1. The final element of rq_pages always points to NULL. To
> >>   accommodate up to 259 pages in rq_pages, add an extra element
> >>   to rq_pages for the array termination sentinel.
> >> 
> >> 2. Adjust the calculation of "pages" to match how RPCSVC_MAXPAGES
> >>   is calculated, so it can go up to 259. Bruce noted that the
> >>   calculation assumes sv_max_mesg is a multiple of PAGE_SIZE,
> >>   which might not always be true. I didn't change this assumption.
> >> 
> >> 3. Change the loop boundaries to allow 259 pages to be allocated.
> >> 
> >> Additional clean-up: WARN_ON_ONCE adds an extra conditional branch,
> >> which is basically never taken. And there's no need to dump the
> >> stack here because svc_alloc_arg has only one caller.
> >> 
> >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >> ---
> >> Hi Bruce-
> >> 
> >> This is what I proposed yesterday evening. Instead of changing
> >> the value of RPCSVC_MAXPAGES, ensure that svc_alloc_arg can
> >> safely allocate that many pages.
> >> 
> >> 
> >> include/linux/sunrpc/svc.h |    2 +-
> >> net/sunrpc/svc_xprt.c      |   10 ++++++----
> >> 2 files changed, 7 insertions(+), 5 deletions(-)
> >> 
> >> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> >> index 11cef5a..d741399 100644
> >> --- a/include/linux/sunrpc/svc.h
> >> +++ b/include/linux/sunrpc/svc.h
> >> @@ -246,7 +246,7 @@ struct svc_rqst {
> >> 	size_t			rq_xprt_hlen;	/* xprt header len */
> >> 	struct xdr_buf		rq_arg;
> >> 	struct xdr_buf		rq_res;
> >> -	struct page *		rq_pages[RPCSVC_MAXPAGES];
> >> +	struct page		*rq_pages[RPCSVC_MAXPAGES + 1];
> >> 	struct page *		*rq_respages;	/* points into rq_pages */
> >> 	struct page *		*rq_next_page; /* next reply page to use */
> >> 	struct page *		*rq_page_end;  /* one past the last page */
> >> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> >> index 7bfe1fb..d16a8b4 100644
> >> --- a/net/sunrpc/svc_xprt.c
> >> +++ b/net/sunrpc/svc_xprt.c
> >> @@ -659,11 +659,13 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> >> 	int i;
> >> 
> >> 	/* now allocate needed pages.  If we get a failure, sleep briefly */
> >> -	pages = (serv->sv_max_mesg + PAGE_SIZE) / PAGE_SIZE;
> >> -	WARN_ON_ONCE(pages >= RPCSVC_MAXPAGES);
> >> -	if (pages >= RPCSVC_MAXPAGES)
> >> +	pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
> >> +	if (pages > RPCSVC_MAXPAGES) {
> >> +		pr_warn_once("svc: warning: pages=%u > RPCSVC_MAXPAGES=%lu\n",
> >> +			     pages, RPCSVC_MAXPAGES);
> >> 		/* use as many pages as possible */
> >> -		pages = RPCSVC_MAXPAGES - 1;
> >> +		pages = RPCSVC_MAXPAGES;
> >> +	}
> >> 	for (i = 0; i < pages ; i++)
> >> 		while (rqstp->rq_pages[i] == NULL) {
> >> 			struct page *p = alloc_page(GFP_KERNEL);
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> Chuck Lever
> 
> 

  reply	other threads:[~2017-06-30 20:47 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-30 16:03 [PATCH v1] sunrpc: Allocate up to RPCSVC_MAXPAGES per svc_rqst Chuck Lever
2017-06-30 20:16 ` J. Bruce Fields
2017-06-30 20:33   ` Chuck Lever
2017-06-30 20:47     ` J. Bruce Fields [this message]
2017-06-30 21:23       ` Chuck Lever
2017-07-05 14:36         ` Chuck Lever
2017-07-06 14:02           ` J. Bruce Fields
2017-07-11 20:50             ` J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170630204733.GE14868@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.