From: Mike Snitzer <snitzer@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-nfs@vger.kernel.org, Jeff Layton <jlayton@kernel.org>
Subject: Re: [PATCH v8 3/7] NFSD: add io_cache_read controls to debugfs interface
Date: Wed, 3 Sep 2025 11:07:29 -0400 [thread overview]
Message-ID: <aLhZsfJMwsGu1eu3@kernel.org> (raw)
In-Reply-To: <1c69b5dd-ec65-438f-9b9c-af8013619afa@oracle.com>
On Wed, Sep 03, 2025 at 10:38:45AM -0400, Chuck Lever wrote:
> On 8/26/25 2:57 PM, Mike Snitzer wrote:
> > Add 'io_cache_read' to NFSD's debugfs interface so that: Any data
> > read by NFSD will either be:
> > - cached using page cache (NFSD_IO_BUFFERED=1)
> > - cached but removed from the page cache upon completion
> > (NFSD_IO_DONTCACHE=2).
> > - not cached (NFSD_IO_DIRECT=3)
> >
> > io_cache_read may be set by writing to:
> > /sys/kernel/debug/nfsd/io_cache_read
> >
> > If NFSD_IO_DONTCACHE is specified using 2, FOP_DONTCACHE must be
> > advertised as supported by the underlying filesystem (e.g. XFS),
> > otherwise all IO flagged with RWF_DONTCACHE will fail with
> > -EOPNOTSUPP.
> >
> > If NFSD_IO_DIRECT is specified using 3, the IO must be aligned
> > relative to the underlying block device's logical_block_size. Also the
> > memory buffer used to store the read must be aligned relative to the
> > underlying block device's dma_alignment.
> >
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > fs/nfsd/debugfs.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++
> > fs/nfsd/nfsd.h | 9 ++++++++
> > fs/nfsd/vfs.c | 18 +++++++++++++++
> > 3 files changed, 84 insertions(+)
> >
> > diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
> > index 84b0c8b559dc9..3cadd45868b48 100644
> > --- a/fs/nfsd/debugfs.c
> > +++ b/fs/nfsd/debugfs.c
> > @@ -27,11 +27,65 @@ static int nfsd_dsr_get(void *data, u64 *val)
> > static int nfsd_dsr_set(void *data, u64 val)
> > {
> > nfsd_disable_splice_read = (val > 0) ? true : false;
> > + if (!nfsd_disable_splice_read) {
> > + /*
> > + * Cannot use NFSD_IO_DONTCACHE or NFSD_IO_DIRECT
> > + * if splice_read is enabled.
> > + */
> > + nfsd_io_cache_read = NFSD_IO_BUFFERED;
> > + }
> > return 0;
> > }
> >
> > DEFINE_DEBUGFS_ATTRIBUTE(nfsd_dsr_fops, nfsd_dsr_get, nfsd_dsr_set, "%llu\n");
> >
> > +/*
> > + * /sys/kernel/debug/nfsd/io_cache_read
> > + *
> > + * Contents:
> > + * %1: NFS READ will use buffered IO
> > + * %2: NFS READ will use dontcache (buffered IO w/ dropbehind)
> > + * %3: NFS READ will use direct IO
> > + *
> > + * The default value of this setting is zero (UNSPECIFIED).
>
> Hi Mike -
>
> I can't remember why we have the UNSPECIFIED setting? IME a debug
> file reflects the current setting, so if our default behavior is
> "buffered" then the first "cat io_cache_read" should reflect that
> rather than "I haven't been changed yet". This doesn't seem like the
> usual semantics of a /sys/kernel/debug file, IME.
Jeff had convincing justification for his request, from:
https://lore.kernel.org/linux-nfs/e5a0d1e435196c55acbdc491b43b6380cbef5599.camel@kernel.org/
"I think the default case should leave nfsd_io_cache_read alone and
return an error. If we add new values later, and someone tries to use
them on an old kernel, it's better to make that attempt error out.
Ditto for the write side controls."
> For example, a user space application may want to read io_cache_read
> to find out what the current behavior is. If it gets 0 (UNSPECIFIED)
> then it has found out nothing.
Right, that is a negative user experience until/unless user is
informed. Having Documentation file may ease that though?
> However, if there is a good reason to keep UNSPECIFIED, then you
> need to add a " %0: NFS READ uses the default behavior" to the
> documenting comment for nfsd_io_cache_{read,write}.
>
> My preference is to remove NFSD_IO_UNSPECIFIED from this patch
> and her sister (4/7).
I don't have a strong preference, when I first implemented it I had it
how you'd prefer. But Jeff's kernel downgrade scenario still seems
like a prophetic nice catch.
Jeff, what is your thinking at this point?
> > + * This setting takes immediate effect for all NFS versions,
> > + * all exports, and in all NFSD net namespaces.
> > + */
> > +
> > +static int nfsd_io_cache_read_get(void *data, u64 *val)
> > +{
> > + *val = nfsd_io_cache_read;
> > + return 0;
> > +}
> > +
> > +static int nfsd_io_cache_read_set(void *data, u64 val)
> > +{
> > + int ret = 0;
> > +
> > + switch (val) {
> > + case NFSD_IO_BUFFERED:
> > + nfsd_io_cache_read = NFSD_IO_BUFFERED;
> > + break;
> > + case NFSD_IO_DONTCACHE:
> > + case NFSD_IO_DIRECT:
> > + /*
> > + * Must disable splice_read when enabling
> > + * NFSD_IO_DONTCACHE or NFSD_IO_DIRECT.
> > + */
> > + nfsd_disable_splice_read = true;
> > + nfsd_io_cache_read = val;
> > + break;
> > + default:
> > + ret = -EINVAL;
> > + break;
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +DEFINE_DEBUGFS_ATTRIBUTE(nfsd_io_cache_read_fops, nfsd_io_cache_read_get,
> > + nfsd_io_cache_read_set, "%llu\n");
> > +
> > void nfsd_debugfs_exit(void)
> > {
> > debugfs_remove_recursive(nfsd_top_dir);
> > @@ -44,4 +98,7 @@ void nfsd_debugfs_init(void)
> >
> > debugfs_create_file("disable-splice-read", S_IWUSR | S_IRUGO,
> > nfsd_top_dir, NULL, &nfsd_dsr_fops);
> > +
> > + debugfs_create_file("io_cache_read", S_IWUSR | S_IRUGO,
> > + nfsd_top_dir, NULL, &nfsd_io_cache_read_fops);
> > }
> > diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> > index 1cd0bed57bc2f..6ef799405145f 100644
> > --- a/fs/nfsd/nfsd.h
> > +++ b/fs/nfsd/nfsd.h
> > @@ -153,6 +153,15 @@ static inline void nfsd_debugfs_exit(void) {}
> >
> > extern bool nfsd_disable_splice_read __read_mostly;
> >
> > +enum {
> > + NFSD_IO_UNSPECIFIED = 0,
> > + NFSD_IO_BUFFERED,
> > + NFSD_IO_DONTCACHE,
> > + NFSD_IO_DIRECT,
> > +};
> > +
> > +extern u64 nfsd_io_cache_read __read_mostly;
>
> And then here, initialize nfsd_io_cache_read to reflect the default
> behavior. That would be NFSD_IO_BUFFERED for now... then later we might
> want to change it to NFSD_IO_DIRECT, for instance.
>
> Same suggestion for 4/7.
Ah ok, I can see the way forward to default to NFSD_IO_BUFFERED but
_not_ default to it when erroring (if the user specified some unknown
value).
I'll run with that (despite just asking Jeff's opinion above, I'm the
one who came up with the awkward UNSPECIFIED state when honoring
Jeff's early feedback).
> > +
> > extern int nfsd_max_blksize;
> >
> > static inline int nfsd_v4client(struct svc_rqst *rq)
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 79439ad93880a..8ea8b80097195 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -49,6 +49,7 @@
> > #define NFSDDBG_FACILITY NFSDDBG_FILEOP
> >
> > bool nfsd_disable_splice_read __read_mostly;
> > +u64 nfsd_io_cache_read __read_mostly;
> >
> > /**
> > * nfserrno - Map Linux errnos to NFS errnos
> > @@ -1099,6 +1100,23 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > size_t len;
> >
> > init_sync_kiocb(&kiocb, file);
> > +
> > + switch (nfsd_io_cache_read) {
> > + case NFSD_IO_DIRECT:
> > + /* Verify ondisk and memory DIO alignment */
> > + if (nf->nf_dio_mem_align && nf->nf_dio_read_offset_align &&
> > + (((offset | *count) & (nf->nf_dio_read_offset_align - 1)) == 0) &&
> > + (base & (nf->nf_dio_mem_align - 1)) == 0)
> > + kiocb.ki_flags = IOCB_DIRECT;
> > + break;
> > + case NFSD_IO_DONTCACHE:
> > + kiocb.ki_flags = IOCB_DONTCACHE;
> > + fallthrough;
>
> Nit: Make this "break;". This is brittle: if someone adds something to
> the NFSD_IO_BUFFERED arm but happens to miss that the DONTCACHE arm
> above it is "fallthrough" then we have a latent bug.
>
> Same suggestion for 4/7.
Sure, but just FYI, the misaligned DIO WRITE patch does need
fallthrough from NFSD_IO_DONTCACHE to NFSD_IO_BUFFERED:
case NFSD_IO_DONTCACHE:
kiocb.ki_flags |= IOCB_DONTCACHE;
fallthrough;
case NFSD_IO_UNSPECIFIED:
case NFSD_IO_BUFFERED:
+ host_err = nfsd_issue_write_buffered(rqstp, file,
+ nvecs, cnt, &kiocb);
break;
}
But rather than preemptively lay the foundation for that in 4/7 I'll
just be explicit in that 6/7 patch.
next prev parent reply other threads:[~2025-09-03 15:07 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-26 18:57 [PATCH v8 0/7] NFSD: add "NFSD DIRECT" and "NFSD DONTCACHE" IO modes Mike Snitzer
2025-08-26 18:57 ` [PATCH v8 1/7] NFSD: filecache: add STATX_DIOALIGN and STATX_DIO_READ_ALIGN support Mike Snitzer
2025-08-26 18:57 ` [PATCH v8 2/7] NFSD: pass nfsd_file to nfsd_iter_read() Mike Snitzer
2025-08-26 18:57 ` [PATCH v8 3/7] NFSD: add io_cache_read controls to debugfs interface Mike Snitzer
2025-09-03 14:38 ` Chuck Lever
2025-09-03 15:07 ` Mike Snitzer [this message]
2025-09-03 16:02 ` Mike Snitzer
2025-09-03 16:12 ` Chuck Lever
2025-09-03 16:50 ` Mike Snitzer
2025-08-26 18:57 ` [PATCH v8 4/7] NFSD: add io_cache_write " Mike Snitzer
2025-08-26 18:57 ` [PATCH v8 5/7] NFSD: issue READs using O_DIRECT even if IO is misaligned Mike Snitzer
2025-08-27 15:34 ` Chuck Lever
2025-08-27 19:41 ` Mike Snitzer
2025-08-27 20:56 ` Chuck Lever
2025-08-27 23:15 ` Mike Snitzer
2025-08-28 1:57 ` Chuck Lever
2025-08-28 8:09 ` Mike Snitzer
2025-08-28 14:53 ` Chuck Lever
2025-08-28 18:52 ` Mike Snitzer
2025-08-30 17:38 ` [RFC PATCH 0/2] some progress on rpcrdma bug [was: Re: [PATCH v8 5/7] NFSD: issue READs using O_DIRECT even if IO is misaligned] Mike Snitzer
2025-08-30 17:38 ` [RFC PATCH 1/2] NFSD: fix misaligned DIO READ to not use a start_extra_page, exposes rpcrdma bug? Mike Snitzer
2025-09-02 14:04 ` Chuck Lever
2025-09-02 15:56 ` Chuck Lever
2025-09-02 17:59 ` Chuck Lever
2025-09-02 21:06 ` Mike Snitzer
2025-09-02 21:16 ` Chuck Lever
2025-09-02 21:27 ` Mike Snitzer
2025-09-02 22:18 ` Mike Snitzer
2025-09-04 19:07 ` Chuck Lever
2025-09-04 21:00 ` Mike Snitzer
2025-09-04 14:42 ` Mike Snitzer
2025-09-04 15:12 ` Chuck Lever
2025-09-04 16:10 ` Chuck Lever
2025-09-04 16:33 ` Mike Snitzer
2025-09-04 17:54 ` Chuck Lever
2025-08-30 17:38 ` [RFC PATCH 2/2] NFSD: use /end/ of rq_pages for front_pad page, simpler workaround for rpcrdma bug Mike Snitzer
2025-08-30 18:53 ` [RFC PATCH 0/2] some progress on rpcrdma bug [was: Re: [PATCH v8 5/7] NFSD: issue READs using O_DIRECT even if IO is misaligned] Mike Snitzer
2025-08-28 16:36 ` [PATCH v8 5/7] NFSD: issue READs using O_DIRECT even if IO is misaligned Jeff Layton
2025-08-28 16:22 ` Jeff Layton
2025-08-28 16:27 ` Chuck Lever
2025-08-26 18:57 ` [PATCH v8 6/7] NFSD: issue WRITEs " Mike Snitzer
2025-08-26 18:57 ` [PATCH v8 7/7] NFSD: add nfsd_analyze_read_dio and nfsd_analyze_write_dio trace events Mike Snitzer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aLhZsfJMwsGu1eu3@kernel.org \
--to=snitzer@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=jlayton@kernel.org \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox