From: Mike Snitzer <snitzer@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>, linux-nfs@vger.kernel.org
Subject: Re: [PATCH v4 2/3] NFSD: add new NFSD_IO_DIRECT variants that may override stable_how
Date: Thu, 6 Nov 2025 15:17:20 -0500
Message-ID: <aQ0CUPcYYg6-5IJ1@kernel.org>
In-Reply-To: <c1f4d144-826e-4c27-821c-47652a7b67d2@oracle.com>
On Wed, Nov 05, 2025 at 01:49:29PM -0500, Chuck Lever wrote:
> On 11/5/25 12:42 PM, Mike Snitzer wrote:
> > NFSD_IO_DIRECT_WRITE_FILE_SYNC is direct IO with stable_how=NFS_FILE_SYNC.
> > NFSD_IO_DIRECT_WRITE_DATA_SYNC is direct IO with stable_how=NFS_DATA_SYNC.
> >
> > The stable_how associated with each is a hint in the form of a "floor"
> > value for stable_how. Meaning if the client provided stable_how is
> > already of higher value it will not be changed.
> >
> > These permutations of NFSD_IO_DIRECT allow to experiment with also
> > elevating stable_how and sending it back to the client. Which for
> > NFSD_IO_DIRECT_WRITE_FILE_SYNC will cause the client to elide its
> > COMMIT.
> >
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> > fs/nfsd/debugfs.c | 7 ++++++-
> > fs/nfsd/nfsd.h | 2 ++
> > fs/nfsd/vfs.c | 46 ++++++++++++++++++++++++++++++++++------------
> > 3 files changed, 42 insertions(+), 13 deletions(-)
> >
> > diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
> > index 7f44689e0a53..8538e29ed2ab 100644
> > --- a/fs/nfsd/debugfs.c
> > +++ b/fs/nfsd/debugfs.c
> > @@ -68,7 +68,7 @@ static int nfsd_io_cache_read_set(void *data, u64 val)
> > case NFSD_IO_DIRECT:
> > /*
> > * Must disable splice_read when enabling
> > - * NFSD_IO_DONTCACHE.
> > + * NFSD_IO_DONTCACHE and NFSD_IO_DIRECT.
> > */
> > nfsd_disable_splice_read = true;
> > nfsd_io_cache_read = val;
> > @@ -90,6 +90,9 @@ DEFINE_DEBUGFS_ATTRIBUTE(nfsd_io_cache_read_fops, nfsd_io_cache_read_get,
> > * Contents:
> > * %0: NFS WRITE will use buffered IO
> > * %1: NFS WRITE will use dontcache (buffered IO w/ dropbehind)
> > + * %2: NFS WRITE will use direct IO with stable_how=NFS_UNSTABLE
> > + * %3: NFS WRITE will use direct IO with stable_how=NFS_DATA_SYNC
> > + * %4: NFS WRITE will use direct IO with stable_how=NFS_FILE_SYNC
> > *
> > * This setting takes immediate effect for all NFS versions,
> > * all exports, and in all NFSD net namespaces.
> > @@ -109,6 +112,8 @@ static int nfsd_io_cache_write_set(void *data, u64 val)
> > case NFSD_IO_BUFFERED:
> > case NFSD_IO_DONTCACHE:
> > case NFSD_IO_DIRECT:
> > + case NFSD_IO_DIRECT_WRITE_DATA_SYNC:
> > + case NFSD_IO_DIRECT_WRITE_FILE_SYNC:
> > nfsd_io_cache_write = val;
> > break;
> > default:
> > diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> > index e4263326ca4a..10eca169392b 100644
> > --- a/fs/nfsd/nfsd.h
> > +++ b/fs/nfsd/nfsd.h
> > @@ -161,6 +161,8 @@ enum {
> > NFSD_IO_BUFFERED,
> > NFSD_IO_DONTCACHE,
> > NFSD_IO_DIRECT,
> > + NFSD_IO_DIRECT_WRITE_DATA_SYNC,
> > + NFSD_IO_DIRECT_WRITE_FILE_SYNC,
> > };
> >
> > extern u64 nfsd_io_cache_read __read_mostly;
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index a4700c917c72..1b61185e96a9 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1367,15 +1367,45 @@ nfsd_write_dio_iters_init(struct bio_vec *bvec, unsigned int nvecs,
> > args->flags_buffered |= IOCB_DONTCACHE;
> > }
> >
> > +static void
> > +nfsd_init_write_kiocb_from_stable(u32 stable_floor,
> > + struct kiocb *kiocb,
> > + u32 *stable_how)
> > +{
> > + if (stable_floor < *stable_how)
> > + return; /* stable_how already set higher */
> > +
> > + *stable_how = stable_floor;
> > +
> > + switch (stable_floor) {
> > + case NFS_FILE_SYNC:
> > + /* persist data and timestamps */
> > + kiocb->ki_flags |= IOCB_DSYNC | IOCB_SYNC;
> > + break;
> > + case NFS_DATA_SYNC:
> > + /* persist data only */
> > + kiocb->ki_flags |= IOCB_DSYNC;
> > + break;
> > + }
> > +}
> > +
> > static noinline_for_stack int
> > nfsd_direct_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > struct nfsd_file *nf, u32 *stable_how, unsigned int nvecs,
> > unsigned long *cnt, struct kiocb *kiocb)
> > {
> > + u32 stable_floor = NFS_UNSTABLE;
> > struct nfsd_write_dio_args args;
> > ssize_t host_err;
> > unsigned int i;
> >
> > + if (nfsd_io_cache_write == NFSD_IO_DIRECT_WRITE_FILE_SYNC)
> > + stable_floor = NFS_FILE_SYNC;
> > + else if (nfsd_io_cache_write == NFSD_IO_DIRECT_WRITE_DATA_SYNC)
> > + stable_floor = NFS_DATA_SYNC;
> > + if (stable_floor != NFS_UNSTABLE)
> > + nfsd_init_write_kiocb_from_stable(stable_floor, kiocb,
> > + stable_how);
> > args.nf = nf;
> > nfsd_write_dio_iters_init(rqstp->rq_bvec, nvecs, kiocb, *cnt, &args);
> >
> > @@ -1461,18 +1491,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > stable = NFS_UNSTABLE;
> > init_sync_kiocb(&kiocb, file);
> > kiocb.ki_pos = offset;
> > - if (likely(!fhp->fh_use_wgather)) {
> > - switch (stable) {
> > - case NFS_FILE_SYNC:
> > - /* persist data and timestamps */
> > - kiocb.ki_flags |= IOCB_DSYNC | IOCB_SYNC;
> > - break;
> > - case NFS_DATA_SYNC:
> > - /* persist data only */
> > - kiocb.ki_flags |= IOCB_DSYNC;
> > - break;
> > - }
> > - }
> > + if (likely(!fhp->fh_use_wgather))
> > + nfsd_init_write_kiocb_from_stable(stable, &kiocb, stable_how);
> >
> > nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
> >
> > @@ -1482,6 +1502,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >
> > switch (nfsd_io_cache_write) {
> > case NFSD_IO_DIRECT:
> > + case NFSD_IO_DIRECT_WRITE_DATA_SYNC:
> > + case NFSD_IO_DIRECT_WRITE_FILE_SYNC:
> > host_err = nfsd_direct_write(rqstp, fhp, nf, stable_how,
> > nvecs, cnt, &kiocb);
> > stable = *stable_how;
>
>
> I asked for the use of a file_sync export option because we need to test
> the BUFFERED cache mode as well as DIRECT. So, continue to experiment
> with this one, but I don't plan to merge it for now.
Doesn't the client already have the ability to control NFS_UNSTABLE,
NFS_DATA_SYNC and NFS_FILE_SYNC? What experiment are you looking to
run?

If you are just looking to compare NFS_FILE_SYNC performance of
NFSD_IO_BUFFERED versus NFSD_IO_DIRECT, then using the client control
is fine, right?

Anyway, maybe I'm just being overly concerned about the permanence of
an export option. I thought it best to avoid an export option for now,
given we do seem to have adequate controls for an NFS_FILE_SYNC
performance bakeoff.

Here is a rebased patch that applies on top of Christoph's cleanup and
my incremental Documentation patch. I would appreciate us exposing
this NFSD stable_how "floor" control so others can try it. But if this
still isn't OK, due to it being in terms of NFSD_IO_DIRECT debugfs
knobs, then I can pursue a generic export option that works for all
NFS IO modes.
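
To make the "floor" semantics concrete, here is a minimal userspace
sketch of the same comparison the patch below performs. It is only an
illustration, not the kernel code: the SKETCH_* names, the flag bit
values and the function name are hypothetical stand-ins for the
NFS_* stable_how values, IOCB_DSYNC/IOCB_SYNC and
nfsd_init_write_kiocb_from_stable().

```c
#include <stdint.h>

/* Hypothetical stand-ins for NFS_UNSTABLE, NFS_DATA_SYNC, NFS_FILE_SYNC. */
enum sketch_stable_how {
	SKETCH_UNSTABLE  = 0,
	SKETCH_DATA_SYNC = 1,
	SKETCH_FILE_SYNC = 2,
};

/* Hypothetical flag bits standing in for IOCB_DSYNC and IOCB_SYNC. */
#define SKETCH_DSYNC (1u << 0)
#define SKETCH_SYNC  (1u << 1)

/*
 * Raise *stable_how to at least 'floor' and set the matching sync
 * flags. If the caller's *stable_how is already higher than the
 * floor, nothing is changed.
 */
void sketch_apply_stable_floor(uint32_t floor, uint32_t *flags,
			       uint32_t *stable_how)
{
	if (floor < *stable_how)
		return;	/* stable_how already set higher */

	*stable_how = floor;

	switch (floor) {
	case SKETCH_FILE_SYNC:
		/* persist data and timestamps */
		*flags |= SKETCH_DSYNC | SKETCH_SYNC;
		break;
	case SKETCH_DATA_SYNC:
		/* persist data only */
		*flags |= SKETCH_DSYNC;
		break;
	default:
		break;
	}
}
```

With a client-provided SKETCH_UNSTABLE and a floor of SKETCH_FILE_SYNC,
stable_how is raised to SKETCH_FILE_SYNC and both flags are set. With a
client-provided SKETCH_FILE_SYNC and a floor of SKETCH_DATA_SYNC, the
call leaves stable_how and the flags untouched.
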
Thanks,
Mike
From: Mike Snitzer <snitzer@kernel.org>
Date: Thu, 30 Oct 2025 17:53:09 -0400
Subject: [PATCH v4 rebased] NFSD: add new NFSD_IO_DIRECT variants that may override stable_how
NFSD_IO_DIRECT_WRITE_FILE_SYNC is direct IO with stable_how=NFS_FILE_SYNC.
NFSD_IO_DIRECT_WRITE_DATA_SYNC is direct IO with stable_how=NFS_DATA_SYNC.

The stable_how associated with each is a hint in the form of a "floor"
value for stable_how. That is, if the client-provided stable_how is
already higher, it will not be changed.

These permutations of NFSD_IO_DIRECT make it possible to experiment
with also elevating stable_how and sending it back to the client,
which for NFSD_IO_DIRECT_WRITE_FILE_SYNC will cause the client to
elide its COMMIT.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
.../filesystems/nfs/nfsd-io-modes.rst | 6 ++-
fs/nfsd/debugfs.c | 7 ++-
fs/nfsd/nfsd.h | 2 +
fs/nfsd/vfs.c | 54 +++++++++++++------
4 files changed, 51 insertions(+), 18 deletions(-)
diff --git a/Documentation/filesystems/nfs/nfsd-io-modes.rst b/Documentation/filesystems/nfs/nfsd-io-modes.rst
index e3a522d09766..a2194ec45c76 100644
--- a/Documentation/filesystems/nfs/nfsd-io-modes.rst
+++ b/Documentation/filesystems/nfs/nfsd-io-modes.rst
@@ -23,11 +23,13 @@ Based on the configured settings, NFSD's IO will either be:
- cached using page cache (NFSD_IO_BUFFERED=0)
- cached but removed from page cache on completion (NFSD_IO_DONTCACHE=1)
- not cached stable_how=NFS_UNSTABLE (NFSD_IO_DIRECT=2)
+- not cached stable_how=NFS_DATA_SYNC (NFSD_IO_DIRECT_WRITE_DATA_SYNC=3)
+- not cached stable_how=NFS_FILE_SYNC (NFSD_IO_DIRECT_WRITE_FILE_SYNC=4)
-To set an NFSD IO mode, write a supported value (0 - 2) to the
+To set an NFSD IO mode, write a supported value (0 - 4) to the
corresponding IO operation's debugfs interface, e.g.:
echo 2 > /sys/kernel/debug/nfsd/io_cache_read
- echo 2 > /sys/kernel/debug/nfsd/io_cache_write
+ echo 4 > /sys/kernel/debug/nfsd/io_cache_write
To check which IO mode NFSD is using for READ or WRITE, simply read the
corresponding IO operation's debugfs interface, e.g.:
diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
index 7f44689e0a53..8538e29ed2ab 100644
--- a/fs/nfsd/debugfs.c
+++ b/fs/nfsd/debugfs.c
@@ -68,7 +68,7 @@ static int nfsd_io_cache_read_set(void *data, u64 val)
case NFSD_IO_DIRECT:
/*
* Must disable splice_read when enabling
- * NFSD_IO_DONTCACHE.
+ * NFSD_IO_DONTCACHE and NFSD_IO_DIRECT.
*/
nfsd_disable_splice_read = true;
nfsd_io_cache_read = val;
@@ -90,6 +90,9 @@ DEFINE_DEBUGFS_ATTRIBUTE(nfsd_io_cache_read_fops, nfsd_io_cache_read_get,
* Contents:
* %0: NFS WRITE will use buffered IO
* %1: NFS WRITE will use dontcache (buffered IO w/ dropbehind)
+ * %2: NFS WRITE will use direct IO with stable_how=NFS_UNSTABLE
+ * %3: NFS WRITE will use direct IO with stable_how=NFS_DATA_SYNC
+ * %4: NFS WRITE will use direct IO with stable_how=NFS_FILE_SYNC
*
* This setting takes immediate effect for all NFS versions,
* all exports, and in all NFSD net namespaces.
@@ -109,6 +112,8 @@ static int nfsd_io_cache_write_set(void *data, u64 val)
case NFSD_IO_BUFFERED:
case NFSD_IO_DONTCACHE:
case NFSD_IO_DIRECT:
+ case NFSD_IO_DIRECT_WRITE_DATA_SYNC:
+ case NFSD_IO_DIRECT_WRITE_FILE_SYNC:
nfsd_io_cache_write = val;
break;
default:
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index e4263326ca4a..10eca169392b 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -161,6 +161,8 @@ enum {
NFSD_IO_BUFFERED,
NFSD_IO_DONTCACHE,
NFSD_IO_DIRECT,
+ NFSD_IO_DIRECT_WRITE_DATA_SYNC,
+ NFSD_IO_DIRECT_WRITE_FILE_SYNC,
};
extern u64 nfsd_io_cache_read __read_mostly;
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 326cf6f717b3..101c18d79208 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1353,16 +1353,47 @@ nfsd_write_dio_iters_init(struct nfsd_file *nf, struct bio_vec *bvec,
return 1;
}
+static void
+nfsd_init_write_kiocb_from_stable(u32 stable_floor,
+ struct kiocb *kiocb,
+ u32 *stable_how)
+{
+ if (stable_floor < *stable_how)
+ return; /* stable_how already set higher */
+
+ *stable_how = stable_floor;
+
+ switch (stable_floor) {
+ case NFS_FILE_SYNC:
+ /* persist data and timestamps */
+ kiocb->ki_flags |= IOCB_DSYNC | IOCB_SYNC;
+ break;
+ case NFS_DATA_SYNC:
+ /* persist data only */
+ kiocb->ki_flags |= IOCB_DSYNC;
+ break;
+ }
+}
+
static noinline_for_stack int
nfsd_direct_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
- struct nfsd_file *nf, unsigned int nvecs,
+ struct nfsd_file *nf, u32 *stable_how, unsigned int nvecs,
unsigned long *cnt, struct kiocb *kiocb)
{
+ u32 stable_floor = NFS_UNSTABLE;
struct file *file = nf->nf_file;
struct nfsd_write_dio_seg segments[3];
unsigned int nsegs = 0, i;
ssize_t host_err;
+ if (nfsd_io_cache_write == NFSD_IO_DIRECT_WRITE_FILE_SYNC)
+ stable_floor = NFS_FILE_SYNC;
+ else if (nfsd_io_cache_write == NFSD_IO_DIRECT_WRITE_DATA_SYNC)
+ stable_floor = NFS_DATA_SYNC;
+ if (stable_floor != NFS_UNSTABLE)
+ nfsd_init_write_kiocb_from_stable(stable_floor, kiocb,
+ stable_how);
+
nsegs = nfsd_write_dio_iters_init(nf, rqstp->rq_bvec, nvecs,
kiocb, *cnt, segments);
@@ -1445,18 +1476,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
stable = NFS_UNSTABLE;
init_sync_kiocb(&kiocb, file);
kiocb.ki_pos = offset;
- if (likely(!fhp->fh_use_wgather)) {
- switch (stable) {
- case NFS_FILE_SYNC:
- /* persist data and timestamps */
- kiocb.ki_flags |= IOCB_DSYNC | IOCB_SYNC;
- break;
- case NFS_DATA_SYNC:
- /* persist data only */
- kiocb.ki_flags |= IOCB_DSYNC;
- break;
- }
- }
+ if (likely(!fhp->fh_use_wgather))
+ nfsd_init_write_kiocb_from_stable(stable, &kiocb, stable_how);
nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
@@ -1466,8 +1487,11 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
switch (nfsd_io_cache_write) {
case NFSD_IO_DIRECT:
- host_err = nfsd_direct_write(rqstp, fhp, nf, nvecs,
- cnt, &kiocb);
+ case NFSD_IO_DIRECT_WRITE_DATA_SYNC:
+ case NFSD_IO_DIRECT_WRITE_FILE_SYNC:
+ host_err = nfsd_direct_write(rqstp, fhp, nf, stable_how,
+ nvecs, cnt, &kiocb);
+ stable = *stable_how;
break;
case NFSD_IO_DONTCACHE:
if (file->f_op->fop_flags & FOP_DONTCACHE)
--
2.43.0