Linux NFS development
From: Jeff Layton <jlayton@kernel.org>
To: Trond Myklebust <trondmy@hammerspace.com>,
	"snitzer@kernel.org" <snitzer@kernel.org>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>
Cc: "okorniev@redhat.com" <okorniev@redhat.com>,
	"tom@talpey.com" <tom@talpey.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"Dai.Ngo@oracle.com" <Dai.Ngo@oracle.com>,
	"neilb@suse.de" <neilb@suse.de>,
	"axboe@kernel.dk" <axboe@kernel.dk>
Subject: Re: nfsd: add the ability to enable use of RWF_DONTCACHE for all nfsd IO
Date: Fri, 21 Feb 2025 13:42:50 -0500	[thread overview]
Message-ID: <42400116f9098ec7f5acc70c2450dd52a2bf8f21.camel@kernel.org> (raw)
In-Reply-To: <7b1574e2499da99986c432f815abccb2e5a6c7f5.camel@hammerspace.com>

On Fri, 2025-02-21 at 16:13 +0000, Trond Myklebust wrote:
> On Fri, 2025-02-21 at 10:46 -0500, Chuck Lever wrote:
> > On 2/21/25 10:36 AM, Mike Snitzer wrote:
> > > On Fri, Feb 21, 2025 at 10:25:03AM -0500, Jeff Layton wrote:
> > > > On Fri, 2025-02-21 at 10:02 -0500, Mike Snitzer wrote:
> > > > > My intent was to make 6.14's DONTCACHE feature able to be tested
> > > > > in the context of nfsd in a no-frills way.  I realize adding the
> > > > > nfsd_dontcache knob skews toward too raw and lacks polish.  But I'm
> > > > > inclined to expose such coarse-grained opt-in knobs to encourage
> > > > > others' discovery (and answers to some of the questions you pose
> > > > > below).  I also hope to enlist all NFSD reviewers' help in
> > > > > categorizing/documenting where DONTCACHE helps/hurts. ;)
> > > > > 
> > > > > And I agree that ultimately per-export control is needed.  I'll
> > > > > take the time to implement that, hopeful to have something more
> > > > > suitable in time for LSF.
> > > > 
> > > > Would it make more sense to hook DONTCACHE up to the IO_ADVISE
> > > > operation in RFC 7862? IO_ADVISE4_NOREUSE sounds like it has
> > > > similar meaning? That would give the clients a way to do this on
> > > > a per-open basis.
> > > 
> > > Just thinking aloud here, but: using a DONTCACHE scalpel on a
> > > per-open basis quite likely wouldn't provide the required page
> > > reclaim relief if the server is being hammered with normal buffered
> > > IO.  Sure, that particular DONTCACHE IO wouldn't contribute to the
> > > problem, but it would still be impacted by those not opting to use
> > > DONTCACHE, since on entry to the server it still needs pages for
> > > its own DONTCACHE buffered IO.
> > 
> > For this initial work, which is to provide a mechanism for
> > experimentation, IMO exposing the setting to clients won't be all
> > that helpful.
> > 
> > But there are some applications/workloads on clients where exposure
> > could be beneficial -- for instance, a backup job, where NFSD would
> > benefit by knowing it doesn't have to maintain the job's written data
> > in its page cache. I regard that as a later evolutionary improvement,
> > though.
> > 
> > Jorge proposed adding the NFSv4.2 IO_ADVISE operation to NFSD, but I
> > think we first need to a) work out and document appropriate semantics
> > for each hint, because the spec does not provide specifics, and b)
> > perform some extensive benchmarking to understand their value and
> > impact.
> > 
> > 
> 
> That puts the onus on the application running on the client to decide
> the caching semantics of the server which:
>    A. Is a terrible idea™. The application may know how it wants to use
>       the cached data, and be able to somewhat confidently manage its
>       own pagecache. However in almost all cases, it will have no basis
>       for understanding how the server should manage its cache. The
>       latter really is a job for the sysadmin to figure out.
>    B. Is impractical, because even if you can figure out a policy, it
>       requires rewriting the application to manage the server cache.
>    C. Will require additional APIs on the NFSv4.2 client to expose the
>       IO_ADVISE operation. You cannot just map it to posix_fadvise()
>       and/or posix_madvise(), because IO_ADVISE is designed to manage a
>       completely different caching layer. At best, we might be able to
>       rally one or two more distributed filesystems to implement
>       similar functionality and share an API, however there is no
>       chance this API will be useful for ordinary filesystems.
> 

You could map this to RWF_DONTCACHE itself. I know that's really
intended as a hint to the local kernel, but it seems reasonable that if
the application is giving the kernel a DONTCACHE hint, we could pass
that along to the server as well. The server is under no obligation to
do anything with it, just like the kernel with RWF_DONTCACHE.

We could put an IO_ADVISE in a READ or READ_PLUS compound like so:

    PUTFH + IO_ADVISE(IO_ADVISE4_NOREUSE for ranges being read) + READ_PLUS or READ ...

On the server, we could track those ranges in the compound and enable
RWF_DONTCACHE for any subsequent reads or writes.

All that said, I don't object to some sort of mechanism to turn this on
more globally, particularly since that would allow us to use this with
v3 I/O as well.
-- 
Jeff Layton <jlayton@kernel.org>


Thread overview: 16+ messages
2025-02-20 17:12 [PATCH] nfsd: add the ability to enable use of RWF_DONTCACHE for all nfsd IO Mike Snitzer
2025-02-20 18:17 ` Chuck Lever
2025-02-21 15:02   ` Mike Snitzer
2025-02-21 15:25     ` Jeff Layton
2025-02-21 15:36       ` Mike Snitzer
2025-02-21 15:42         ` Jeff Layton
2025-02-21 15:46         ` Chuck Lever
2025-02-21 16:13           ` Trond Myklebust
2025-02-21 18:42             ` Jeff Layton [this message]
2025-02-21 19:18               ` Trond Myklebust
2025-02-21 15:39     ` Chuck Lever
2025-02-21 15:46       ` Jeff Layton
2025-02-21 15:50         ` Chuck Lever
2025-02-20 19:00 ` [PATCH] " Jeff Layton
2025-02-20 19:15   ` Chuck Lever
2025-02-21 15:25     ` Mike Snitzer
