From: NeilBrown <neilb@suse.de>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: releasing result pages in svc_xprt_release()
Date: Mon, 01 Feb 2021 10:45:31 +1100
Message-ID: <878s88fz6s.fsf@notabene.neil.brown.name>
In-Reply-To: <597824E7-3942-4F11-958F-A6E247330A9E@oracle.com>
On Fri, Jan 29 2021, Chuck Lever wrote:
>> On Jan 29, 2021, at 5:43 PM, NeilBrown <neilb@suse.de> wrote:
>>
>> On Fri, Jan 29 2021, Chuck Lever wrote:
>>
>>> Hi Neil-
>>>
>>> I'd like to reduce the amount of page allocation that NFSD does,
>>> and was wondering about the release and reset of pages in
>>> svc_xprt_release(). This logic was added when the socket transport
>>> was converted to use kernel_sendpage() back in 2002. Do you
>>> remember why releasing the result pages is necessary?
>>>
>>
>> Hi Chuck,
>> as I recall, kernel_sendpage() (or sock->ops->sendpage() as it was
>> then) takes a reference to the page and will hold that reference until
>> the content has been sent and ACKed. nfsd has no way to know when the
>> ACK comes, so cannot know when the page can be re-used, so it must
>> release the page and allocate a new one.
>>
>> This is the price we pay for zero-copy, and I acknowledge that it is a
>> real price. I wouldn't be surprised if the trade-offs between
>> zero-copy and single-copy change over time, and between different
>> hardware.
>
> Very interesting, thanks for the history! Two observations:
>
> - I thought without MSG_DONTWAIT, the sendpage operation would be
> totally synchronous -- when the network layer was done with retransmissions,
> it would unblock the caller. But that's likely a mistaken assumption
> on my part. That could be why sendmsg is so much slower than sendpage
> in this particular application.
>
On the "send" side, I think MSG_DONTWAIT is primarily about memory
allocation. sendmsg() returns as soon as the message is queued. If
it needs to allocate memory (or wait for space in a restricted queue)
before it can queue the message, then MSG_DONTWAIT says "fail instead".
It certainly doesn't wait for successful xmit and ack.
On the "recv" side it is quite different of course.
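To make the send-side behaviour concrete, here is a minimal userspace
sketch (not nfsd code). The helper name queue_until_refused() and the
4 KiB chunk size are hypothetical, chosen only for illustration. It
queues data with MSG_DONTWAIT and reports the errno from the first
send() that is refused. Each successful send() returns once the data is
queued, whether or not the peer has read anything.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Queue 4 KiB chunks on fd using MSG_DONTWAIT until send() refuses.
 * Returns the errno of the refusing call (EAGAIN/EWOULDBLOCK when the
 * send queue is full), or 0 if max_bytes were queued without refusal.
 * *queued receives the number of bytes the socket accepted.
 */
static int queue_until_refused(int fd, size_t max_bytes, size_t *queued)
{
	char chunk[4096];

	assert(fd >= 0 && queued != NULL);
	memset(chunk, 'x', sizeof(chunk));
	*queued = 0;

	while (*queued < max_bytes) {
		/* Success means "queued", not "transmitted and ACKed". */
		ssize_t n = send(fd, chunk, sizeof(chunk), MSG_DONTWAIT);

		if (n < 0)
			return errno;	/* would have had to wait: fail instead */
		*queued += (size_t)n;
	}
	return 0;
}
```

Called on one end of a socketpair(AF_UNIX, SOCK_STREAM, 0, sv) whose
other end is never read, I would expect this to accept some data and
then return EAGAIN once the queue fills. Without MSG_DONTWAIT the same
send() would block at that point instead.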
> - IIUC, nfsd_splice_read() replaces anonymous pages in rq_pages with
> actual page cache pages. Those of course cannot be used to construct
> subsequent RPC Replies, so that introduces a second release requirement.
Yep. I wonder if those pages are protected against concurrent updates,
so that a computed checksum will remain accurate.
>
> So I have a way to make the first case unnecessary for RPC/RDMA. It
> has a reliable Send completion mechanism. Sounds like releasing is
> still necessary for TCP, though; maybe that could be done in the
> xpo_release_rqst callback.
It isn't clear to me what particular cost you are trying to reduce. Is
handing a page back from RDMA to nfsd cheaper than nfsd calling
alloc_page(), or do you hope to keep batches of pages together to avoid
multi-page overheads, or is this about cache-hot pages, or ???
>
> As far as nfsd_splice_read(), I had thought of moving those pages to
> a separate array which would always be released. That would need to
> deal with the transport requirements above.
>
> If nothing else, I would like to add mention of these requirements
> somewhere in the code too.
Strongly agree with that.
>
> What's your opinion?
To form a coherent opinion, I would need to know what the problem is.
I certainly accept that there could be performance problems in releasing
and re-allocating pages which might be resolved by batching, or by copying,
or by better tracking. But without knowing what hot-spot you want to
cool down, I cannot think about how that fits into the big picture.
So: what exactly is the problem that you see?
Thanks,
NeilBrown
>
>
> --
> Chuck Lever