From: Chuck Lever <chuck.lever@oracle.com>
To: Mike Snitzer <snitzer@kernel.org>
Cc: linux-nfs@vger.kernel.org, Neil Brown <neilb@suse.de>,
Jeff Layton <jlayton@kernel.org>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>
Subject: Re: [PATCH] nfsd: add the ability to enable use of RWF_DONTCACHE for all nfsd IO
Date: Thu, 20 Feb 2025 13:17:42 -0500 [thread overview]
Message-ID: <ce92e5f6-d7cb-47ef-ad96-d334212a51f1@oracle.com> (raw)
In-Reply-To: <20250220171205.12092-1-snitzer@kernel.org>
[ Adding NFSD reviewers ... ]
On 2/20/25 12:12 PM, Mike Snitzer wrote:
> Add nfsd 'nfsd_dontcache' modparam so that "Any data read or written
> by nfsd will be removed from the page cache upon completion."
>
> nfsd_dontcache is disabled by default. It may be enabled with:
> echo Y > /sys/module/nfsd/parameters/nfsd_dontcache
A per-export setting like an export option would be nicer. Also, does
it make sense to make it a separate control for READ and one for WRITE?
My trick knee suggests caching read results is still going to add
significant value, but write, not so much.
However, to add any such administrative control, I'd like to see some
performance numbers. I think we need to enumerate the cases (I/O types)
that are most interesting to examine: small memory NFS servers; lots of
small unaligned I/O; server-side CPU per byte; storage interrupt rates;
any others?
And let's see some user/admin documentation (eg when should this setting
be enabled? when would it be contra-indicated?)
The same arguments that applied to Cedric's request to make maximum RPC
size a tunable setting apply here. Do we want to carry a manual setting
for this mechanism for a long time, or do we expect that the setting can
become automatic/uninteresting after a period of experimentation?
* It might be argued that putting these experimental tunables under /sys
eliminates the support longevity question, since there aren't strict
rules about removing files under /sys.
> FOP_DONTCACHE must be advertised as supported by the underlying
> filesystem (e.g. XFS), otherwise if/when 'nfsd_dontcache' is enabled
> all IO will fail with -EOPNOTSUPP.
It would be better all around if NFSD simply ignored the setting in the
cases where the underlying file system doesn't implement DONTCACHE.
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfsd/vfs.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 29cb7b812d71..d7e49004e93d 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -955,6 +955,11 @@ nfsd_open_verified(struct svc_fh *fhp, int may_flags, struct file **filp)
> return __nfsd_open(fhp, S_IFREG, may_flags, filp);
> }
>
> +static bool nfsd_dontcache __read_mostly = false;
> +module_param(nfsd_dontcache, bool, 0644);
> +MODULE_PARM_DESC(nfsd_dontcache,
> + "Any data read or written by nfsd will be removed from the page cache upon completion.");
> +
> /*
> * Grab and keep cached pages associated with a file in the svc_rqst
> * so that they can be passed to the network sendmsg routines
> @@ -1084,6 +1089,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> loff_t ppos = offset;
> struct page *page;
> ssize_t host_err;
> + rwf_t flags = 0;
>
> v = 0;
> total = *count;
> @@ -1097,9 +1103,12 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> }
> WARN_ON_ONCE(v > ARRAY_SIZE(rqstp->rq_vec));
>
> + if (nfsd_dontcache)
> + flags |= RWF_DONTCACHE;
> +
> trace_nfsd_read_vector(rqstp, fhp, offset, *count);
> iov_iter_kvec(&iter, ITER_DEST, rqstp->rq_vec, v, *count);
> - host_err = vfs_iter_read(file, &iter, &ppos, 0);
> + host_err = vfs_iter_read(file, &iter, &ppos, flags);
> return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
> }
>
> @@ -1186,6 +1195,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
> if (stable && !fhp->fh_use_wgather)
> flags |= RWF_SYNC;
>
> + if (nfsd_dontcache)
> + flags |= RWF_DONTCACHE;
> +
> iov_iter_kvec(&iter, ITER_SOURCE, vec, vlen, *cnt);
> since = READ_ONCE(file->f_wb_err);
> if (verf)
> @@ -1237,6 +1249,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
> */
> bool nfsd_read_splice_ok(struct svc_rqst *rqstp)
> {
> + if (nfsd_dontcache) /* force the use of vfs_iter_read for reads */
> + return false;
> +
Urgh.
So I've been mulling over simply removing the splice read path.
- Less code, less complexity, smaller test matrix
- How much of a performance loss would result?
- Would such a change make it easier to pass whole folios from
the file system directly to the network layer?
> switch (svc_auth_flavor(rqstp)) {
> case RPC_AUTH_GSS_KRB5I:
> case RPC_AUTH_GSS_KRB5P:
--
Chuck Lever
next prev parent reply other threads:[~2025-02-20 18:18 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-20 17:12 [PATCH] nfsd: add the ability to enable use of RWF_DONTCACHE for all nfsd IO Mike Snitzer
2025-02-20 18:17 ` Chuck Lever [this message]
2025-02-21 15:02 ` Mike Snitzer
2025-02-21 15:25 ` Jeff Layton
2025-02-21 15:36 ` Mike Snitzer
2025-02-21 15:42 ` Jeff Layton
2025-02-21 15:46 ` Chuck Lever
2025-02-21 16:13 ` Trond Myklebust
2025-02-21 18:42 ` Jeff Layton
2025-02-21 19:18 ` Trond Myklebust
2025-02-21 15:39 ` Chuck Lever
2025-02-21 15:46 ` Jeff Layton
2025-02-21 15:50 ` Chuck Lever
2025-02-20 19:00 ` [PATCH] " Jeff Layton
2025-02-20 19:15 ` Chuck Lever
2025-02-21 15:25 ` Mike Snitzer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ce92e5f6-d7cb-47ef-ad96-d334212a51f1@oracle.com \
--to=chuck.lever@oracle.com \
--cc=Dai.Ngo@oracle.com \
--cc=jlayton@kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=okorniev@redhat.com \
--cc=snitzer@kernel.org \
--cc=tom@talpey.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox