All of lore.kernel.org
 help / color / mirror / Atom feed
From: "J. Bruce Fields" <bfields@fieldses.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: nfsd: managing pages under network I/O
Date: Wed, 19 Oct 2016 15:16:08 -0400	[thread overview]
Message-ID: <20161019191608.GD23282@fieldses.org> (raw)
In-Reply-To: <CFE1B548-5AC0-451E-BC9D-253395443355@oracle.com>

On Wed, Oct 19, 2016 at 02:28:40PM -0400, Chuck Lever wrote:
> 
> > On Oct 19, 2016, at 1:26 PM, bfields@fieldses.org wrote:
> > 
> > On Wed, Oct 12, 2016 at 01:42:26PM -0400, Chuck Lever wrote:
> >> I'm studying the way that the ->recvfrom and ->sendto calls work
> >> for RPC-over-RDMA.
> >> 
> >> The ->sendto path moves pages out of the svc_rqst before posting
> >> I/O (RDMA Write and Send). Once the work is posted, ->sendto
> >> returns, and looks like svc_rqst is released at that point. The
> >> subsequent completion of the Send then releases those moved pages.
> >> 
> >> I'm wondering if the transport can be simplified: instead of
> >> moving pages around, ->sendto could just wait until the Write and
> >> Send activity is complete, then return. The upper layer then
> >> releases everything.
> > 
> > I don't understand what problem you're trying to fix.  Is "moving pages
> > around" really especially complicated or expensive?
> 
> Moving pages is complicated and undocumented, and adds host CPU and
> memory costs (loads and stores). There is also a whole lot of page
> allocator traffic in this code, and that will be enabling and
> disabling interrupts and bottom-halves with some frequency.

I thought "moving pages around" here is basically just this, from
isvc_rdma_sendto.c:send_reply():

	pages = rqstp->rq_next_page - rqstp->rq_respages;
        for (page_no = 0; page_no < pages; page_no++) {
                ctxt->pages[page_no+1] = rqstp->rq_respages[page_no];
		...
                rqstp->rq_respages[page_no] = NULL;
		...
	}

So we're just copying an array of page pointers from one place to
another, and zeroing out the source array.

For a short reply that could be 1 page or even none.  In the worst case
(a 1MB read result) that could be 256 8-byte pointers, so 2K.

Am I missing something?  Has that up-to-2K operation been a problem?

--b.

> 
> The less involvement by the host CPU the better. The current RDMA
> transport code documents the RPC-over-RDMA protocol to some extent,
> but the need for moving the pages is to the reader's imagination.
> 
> From my code audit:
> 
> The svc_rqst is essentially a static piece of memory, and the calling
> NFSD kthread will re-use it immediately when svc_sendto() returns. It
> cannot be manipulated asynchronously. The recvfrom calls and the
> sendto call for one RPC request can each use a different svc_rqst,
> for example. So if the transport needs to use pages for longer the
> length of one svc_sendto call (for example, to wait for RDMA Write to
> complete) then it has to move those pages out of the svc_rqst before
> it returns. (Let me know if I've got this wrong).
> 
> I'm falling back to looking at ways to clean up and document the
> movement of these pages so it is better understood why it is done.
> There also could be some opportunity to share page movement helpers
> between the RDMA read and write path (and perhaps the TCP transport
> too).
> 
> HTH.
> 
> 
> > --b.
> > 
> >> 
> >> Another option would be for ->sendto to return a value that means
> >> the transport will release the svc_rqst and pages.
> >> 
> >> Or, the svc_rqst could be reference counted.
> >> 
> >> Anyone have thoughts about this?
> 
> 
> --
> Chuck Lever
> 
> 

  reply	other threads:[~2016-10-19 19:16 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-12 17:42 nfsd: managing pages under network I/O Chuck Lever
2016-10-13  6:35 ` Christoph Hellwig
2016-10-13 13:36   ` Chuck Lever
2016-10-13 14:32     ` Chuck Lever
2016-10-19 17:26 ` J. Bruce Fields
2016-10-19 18:28   ` Chuck Lever
2016-10-19 19:16     ` J. Bruce Fields [this message]
2016-10-19 19:21       ` Chuck Lever
2016-10-19 19:26         ` J. Bruce Fields

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161019191608.GD23282@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.