From: Mike Snitzer <snitzer@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-nfs@vger.kernel.org, Neil Brown <neilb@suse.de>,
Jeff Layton <jlayton@kernel.org>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
axboe@kernel.dk
Subject: Re: nfsd: add the ability to enable use of RWF_DONTCACHE for all nfsd IO
Date: Fri, 21 Feb 2025 10:02:12 -0500 [thread overview]
Message-ID: <Z7iVdHcnGveg-gbg@kernel.org> (raw)
In-Reply-To: <ce92e5f6-d7cb-47ef-ad96-d334212a51f1@oracle.com>
On Thu, Feb 20, 2025 at 01:17:42PM -0500, Chuck Lever wrote:
> [ Adding NFSD reviewers ... ]
>
> On 2/20/25 12:12 PM, Mike Snitzer wrote:
> > Add nfsd 'nfsd_dontcache' modparam so that "Any data read or written
> > by nfsd will be removed from the page cache upon completion."
> >
> > nfsd_dontcache is disabled by default. It may be enabled with:
> > echo Y > /sys/module/nfsd/parameters/nfsd_dontcache
>
> A per-export setting like an export option would be nicer. Also, does
> it make sense to make it a separate control for READ and one for WRITE?
> My trick knee suggests caching read results is still going to add
> significant value, but write, not so much.
My intent was to make 6.14's DONTCACHE feature testable in the context
of nfsd in a no-frills way. I realize the nfsd_dontcache knob skews too
raw and lacks polish, but I'm inclined to expose such coarse-grained
opt-in knobs to encourage others' discovery (and answers to some of the
questions you pose below). I also hope to enlist all NFSD reviewers'
help in categorizing/documenting where DONTCACHE helps/hurts. ;)
And I agree that ultimately per-export control is needed. I'll take
the time to implement that, and hope to have something more suitable in
time for LSF.
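For the READ-vs-WRITE split specifically, a minimal sketch (untested,
and the knob names below are just placeholders) could be as simple as
carrying two module parameters instead of one in fs/nfsd/vfs.c:

  /* hypothetical per-direction knobs, both default to disabled */
  static bool nfsd_dontcache_read __read_mostly;
  static bool nfsd_dontcache_write __read_mostly;
  module_param(nfsd_dontcache_read, bool, 0644);
  MODULE_PARM_DESC(nfsd_dontcache_read,
  	"Data read by nfsd is removed from the page cache upon completion.");
  module_param(nfsd_dontcache_write, bool, 0644);
  MODULE_PARM_DESC(nfsd_dontcache_write,
  	"Data written by nfsd is removed from the page cache upon completion.");

  /* in nfsd_iter_read() */
  if (nfsd_dontcache_read)
  	flags |= RWF_DONTCACHE;

  /* in nfsd_vfs_write() */
  if (nfsd_dontcache_write)
  	flags |= RWF_DONTCACHE;

That would at least let us benchmark reads and writes independently
before deciding what the eventual per-export interface should look like.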
> However, to add any such administrative control, I'd like to see some
> performance numbers. I think we need to enumerate the cases (I/O types)
> that are most interesting to examine: small memory NFS servers; lots of
> small unaligned I/O; server-side CPU per byte; storage interrupt rates;
> any others?
>
> And let's see some user/admin documentation (eg when should this setting
> be enabled? when would it be contra-indicated?)
>
> The same arguments that applied to Cedric's request to make maximum RPC
> size a tunable setting apply here. Do we want to carry a manual setting
> for this mechanism for a long time, or do we expect that the setting can
> become automatic/uninteresting after a period of experimentation?
>
> * It might be argued that putting these experimental tunables under /sys
> eliminates the support longevity question, since there aren't strict
> rules about removing files under /sys.
Right, I do think a sysfs knob (that defaults to disabled and requires
user opt-in) is a pretty useful and benign means to expose experimental
functionality.
And I agree with everything you said is needed above. I haven't had
time to focus on DONTCACHE since ~December; I just picked up my old
patches from that time and decided to send the NFSD one since DONTCACHE
has been merged for 6.14.
> > FOP_DONTCACHE must be advertised as supported by the underlying
> > filesystem (e.g. XFS), otherwise if/when 'nfsd_dontcache' is enabled
> > all IO will fail with -EOPNOTSUPP.
>
> It would be better all around if NFSD simply ignored the setting in the
> cases where the underlying file system doesn't implement DONTCACHE.
I'll work on making it so.
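Roughly this (untested sketch, assuming that testing the filesystem's
fop_flags for FOP_DONTCACHE at the point where the flag is set is the
right gate):

  /* only request dontcache when the underlying fs advertises support */
  if (nfsd_dontcache && (file->f_op->fop_flags & FOP_DONTCACHE))
  	flags |= RWF_DONTCACHE;

That way filesystems without FOP_DONTCACHE just keep today's behaviour
rather than failing the IO with -EOPNOTSUPP.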
>
>
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> > fs/nfsd/vfs.c | 17 ++++++++++++++++-
> > 1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 29cb7b812d71..d7e49004e93d 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -955,6 +955,11 @@ nfsd_open_verified(struct svc_fh *fhp, int may_flags, struct file **filp)
> > return __nfsd_open(fhp, S_IFREG, may_flags, filp);
> > }
> >
> > +static bool nfsd_dontcache __read_mostly = false;
> > +module_param(nfsd_dontcache, bool, 0644);
> > +MODULE_PARM_DESC(nfsd_dontcache,
> > + "Any data read or written by nfsd will be removed from the page cache upon completion.");
> > +
> > /*
> > * Grab and keep cached pages associated with a file in the svc_rqst
> > * so that they can be passed to the network sendmsg routines
> > @@ -1084,6 +1089,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > loff_t ppos = offset;
> > struct page *page;
> > ssize_t host_err;
> > + rwf_t flags = 0;
> >
> > v = 0;
> > total = *count;
> > @@ -1097,9 +1103,12 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > }
> > WARN_ON_ONCE(v > ARRAY_SIZE(rqstp->rq_vec));
> >
> > + if (nfsd_dontcache)
> > + flags |= RWF_DONTCACHE;
> > +
> > trace_nfsd_read_vector(rqstp, fhp, offset, *count);
> > iov_iter_kvec(&iter, ITER_DEST, rqstp->rq_vec, v, *count);
> > - host_err = vfs_iter_read(file, &iter, &ppos, 0);
> > + host_err = vfs_iter_read(file, &iter, &ppos, flags);
> > return nfsd_finish_read(rqstp, fhp, file, offset, count, eof, host_err);
> > }
> >
> > @@ -1186,6 +1195,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
> > if (stable && !fhp->fh_use_wgather)
> > flags |= RWF_SYNC;
> >
> > + if (nfsd_dontcache)
> > + flags |= RWF_DONTCACHE;
> > +
> > iov_iter_kvec(&iter, ITER_SOURCE, vec, vlen, *cnt);
> > since = READ_ONCE(file->f_wb_err);
> > if (verf)
> > @@ -1237,6 +1249,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
> > */
> > bool nfsd_read_splice_ok(struct svc_rqst *rqstp)
> > {
> > + if (nfsd_dontcache) /* force the use of vfs_iter_read for reads */
> > + return false;
> > +
>
> Urgh.
Heh, yeah, bypassing splice was needed given dontcache hooks off vfs_iter_read.
> So I've been mulling over simply removing the splice read path.
>
> - Less code, less complexity, smaller test matrix
>
> - How much of a performance loss would result?
>
> - Would such a change make it easier to pass whole folios from
> the file system directly to the network layer?
Good to know.
Thanks,
Mike