From: Jeff Layton <jlayton@kernel.org>
To: Trond Myklebust <trondmy@hammerspace.com>,
"snitzer@kernel.org" <snitzer@kernel.org>,
"chuck.lever@oracle.com" <chuck.lever@oracle.com>
Cc: "okorniev@redhat.com" <okorniev@redhat.com>,
"tom@talpey.com" <tom@talpey.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"Dai.Ngo@oracle.com" <Dai.Ngo@oracle.com>,
"neilb@suse.de" <neilb@suse.de>,
"axboe@kernel.dk" <axboe@kernel.dk>
Subject: Re: nfsd: add the ability to enable use of RWF_DONTCACHE for all nfsd IO
Date: Fri, 21 Feb 2025 13:42:50 -0500
Message-ID: <42400116f9098ec7f5acc70c2450dd52a2bf8f21.camel@kernel.org>
In-Reply-To: <7b1574e2499da99986c432f815abccb2e5a6c7f5.camel@hammerspace.com>

On Fri, 2025-02-21 at 16:13 +0000, Trond Myklebust wrote:
> On Fri, 2025-02-21 at 10:46 -0500, Chuck Lever wrote:
> > On 2/21/25 10:36 AM, Mike Snitzer wrote:
> > > On Fri, Feb 21, 2025 at 10:25:03AM -0500, Jeff Layton wrote:
> > > > On Fri, 2025-02-21 at 10:02 -0500, Mike Snitzer wrote:
> > > > > My intent was to make 6.14's DONTCACHE feature able to be tested
> > > > > in the context of nfsd in a no-frills way. I realize adding the
> > > > > nfsd_dontcache knob skews toward too raw, lacks polish. But I'm
> > > > > inclined to expose such coarse-grained opt-in knobs to encourage
> > > > > others' discovery (and answers to some of the questions you pose
> > > > > below). I also hope to enlist all NFSD reviewers' help in
> > > > > categorizing/documenting where DONTCACHE helps/hurts. ;)
> > > > >
> > > > > And I agree that ultimately per-export control is needed. I'll take
> > > > > the time to implement that, hopeful to have something more suitable
> > > > > in time for LSF.
> > > >
> > > > Would it make more sense to hook DONTCACHE up to the IO_ADVISE
> > > > operation in RFC7862? IO_ADVISE4_NOREUSE sounds like it has similar
> > > > meaning? That would give the clients a way to do this on a per-open
> > > > basis.
> > >
> > > Just thinking aloud here but: Using a DONTCACHE scalpel on a per-open
> > > basis quite likely wouldn't provide the required page reclaim relief
> > > if the server is being hammered with normal buffered IO. Sure, that
> > > particular DONTCACHE IO wouldn't contribute to the problem but it
> > > would still be impacted by those not opting to use DONTCACHE on entry
> > > to the server due to needing pages for its DONTCACHE buffered IO.
> >
> > For this initial work, which is to provide a mechanism for
> > experimentation, IMO exposing the setting to clients won't be all
> > that helpful.
> >
> > But there are some applications/workloads on clients where exposure
> > could be beneficial -- for instance, a backup job, where NFSD would
> > benefit by knowing it doesn't have to maintain the job's written data
> > in its page cache. I regard that as a later evolutionary improvement,
> > though.
> >
> > Jorge proposed adding the NFSv4.2 IO_ADVISE operation to NFSD, but I
> > think we first need to a) work out and document appropriate semantics
> > for each hint, because the spec does not provide specifics, and b)
> > perform some extensive benchmarking to understand their value and
> > impact.
> >
> >
>
> That puts the onus on the application running on the client to decide
> the caching semantics of the server which:
>    A. Is a terrible idea™. The application may know how it wants to use
>       the cached data, and be able to somewhat confidently manage its
>       own pagecache. However in almost all cases, it will have no basis
>       for understanding how the server should manage its cache. The
>       latter really is a job for the sysadmin to figure out.
>    B. Is impractical, because even if you can figure out a policy, it
>       requires rewriting the application to manage the server cache.
>    C. Will require additional APIs on the NFSv4.2 client to expose the
>       IO_ADVISE operation. You cannot just map it to posix_fadvise()
>       and/or posix_madvise(), because IO_ADVISE is designed to manage a
>       completely different caching layer. At best, we might be able to
>       rally one or two more distributed filesystems to implement
>       similar functionality and share an API, however there is no
>       chance this API will be useful for ordinary filesystems.
>

You could map this to RWF_DONTCACHE itself. I know that's really
intended as a hint to the local kernel, but it seems reasonable that if
the application is giving the kernel a DONTCACHE hint, we could pass
that along to the server as well. The server is under no obligation to
do anything with it, just like the kernel with RWF_DONTCACHE.

We could put an IO_ADVISE in a READ or READ_PLUS compound like so:

    PUTFH + IO_ADVISE(IO_ADVISE4_NOREUSE for ranges being read) + READ_PLUS or READ ...

On the server, we could track those ranges in the compound and enable
RWF_DONTCACHE for any subsequent reads or writes.
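
To make the server-side tracking a bit more concrete, here is a rough
userspace sketch of the bookkeeping, not actual nfsd code. Every name in
it (the sketch_* structs and helpers, SKETCH_DONTCACHE, the fixed-size
range table) is hypothetical, and the flag is only a stand-in for
RWF_DONTCACHE. It assumes the ranges live only for the duration of one
compound and that a count of 0 means "everything from the offset
onward".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RWF_DONTCACHE; not the real uapi value. */
#define SKETCH_DONTCACHE	0x1u
#define SKETCH_MAX_RANGES	8

struct sketch_range {
	uint64_t offset;
	uint64_t count;		/* 0: from offset onward (sketch convention) */
};

/* Per-compound state: cleared at PUTFH, filled in by IO_ADVISE. */
struct sketch_compound {
	size_t nr;
	struct sketch_range noreuse[SKETCH_MAX_RANGES];
};

/* A new current filehandle invalidates previously advised ranges. */
static void sketch_putfh(struct sketch_compound *c)
{
	c->nr = 0;
}

/* Record a NOREUSE range. It is only a hint, so dropping it is allowed. */
static bool sketch_io_advise_noreuse(struct sketch_compound *c,
				     uint64_t offset, uint64_t count)
{
	if (c->nr == SKETCH_MAX_RANGES)
		return false;
	c->noreuse[c->nr].offset = offset;
	c->noreuse[c->nr].count = count;
	c->nr++;
	return true;
}

/* End of [offset, offset + count), saturating instead of wrapping. */
static uint64_t sketch_end(uint64_t offset, uint64_t count)
{
	return count > UINT64_MAX - offset ? UINT64_MAX : offset + count;
}

/* Flags for a later READ/WRITE in the same compound. */
static unsigned int sketch_io_flags(const struct sketch_compound *c,
				    uint64_t offset, uint64_t count)
{
	uint64_t io_end = sketch_end(offset, count);

	if (count == 0)
		return 0;
	for (size_t i = 0; i < c->nr; i++) {
		const struct sketch_range *r = &c->noreuse[i];
		uint64_t r_end = r->count ? sketch_end(r->offset, r->count)
					  : UINT64_MAX;

		/* Half-open intervals overlap if each starts before the other ends. */
		if (offset < r_end && r->offset < io_end)
			return SKETCH_DONTCACHE;
	}
	return 0;
}
```

In this sketch any overlap with an advised range marks the whole I/O as
DONTCACHE. Splitting the I/O at the range boundaries would be more
precise, but that seems like overkill for a hint.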

All that said, I don't object to some sort of mechanism to turn this on
more globally, particularly since that would allow us to use this with
v3 I/O as well.
--
Jeff Layton <jlayton@kernel.org>