From: Mike Snitzer <snitzer@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>, linux-nfs@vger.kernel.org
Subject: Re: [PATCH v4 2/3] NFSD: add new NFSD_IO_DIRECT variants that may override stable_how
Date: Thu, 6 Nov 2025 15:17:20 -0500
Message-ID: <aQ0CUPcYYg6-5IJ1@kernel.org>
In-Reply-To: <c1f4d144-826e-4c27-821c-47652a7b67d2@oracle.com>

On Wed, Nov 05, 2025 at 01:49:29PM -0500, Chuck Lever wrote:
> On 11/5/25 12:42 PM, Mike Snitzer wrote:
> > NFSD_IO_DIRECT_WRITE_FILE_SYNC is direct IO with stable_how=NFS_FILE_SYNC.
> > NFSD_IO_DIRECT_WRITE_DATA_SYNC is direct IO with stable_how=NFS_DATA_SYNC.
> > 
> > The stable_how associated with each is a hint in the form of a "floor"
> > value for stable_how: if the client-provided stable_how is already
> > higher, it is left unchanged.
> > 
> > These permutations of NFSD_IO_DIRECT make it possible to experiment
> > with also elevating stable_how and returning it to the client, which,
> > for NFSD_IO_DIRECT_WRITE_FILE_SYNC, causes the client to elide its
> > COMMIT.
> > 
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> >  fs/nfsd/debugfs.c |  7 ++++++-
> >  fs/nfsd/nfsd.h    |  2 ++
> >  fs/nfsd/vfs.c     | 46 ++++++++++++++++++++++++++++++++++------------
> >  3 files changed, 42 insertions(+), 13 deletions(-)
> > 
> > diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
> > index 7f44689e0a53..8538e29ed2ab 100644
> > --- a/fs/nfsd/debugfs.c
> > +++ b/fs/nfsd/debugfs.c
> > @@ -68,7 +68,7 @@ static int nfsd_io_cache_read_set(void *data, u64 val)
> >  	case NFSD_IO_DIRECT:
> >  		/*
> >  		 * Must disable splice_read when enabling
> > -		 * NFSD_IO_DONTCACHE.
> > +		 * NFSD_IO_DONTCACHE and NFSD_IO_DIRECT.
> >  		 */
> >  		nfsd_disable_splice_read = true;
> >  		nfsd_io_cache_read = val;
> > @@ -90,6 +90,9 @@ DEFINE_DEBUGFS_ATTRIBUTE(nfsd_io_cache_read_fops, nfsd_io_cache_read_get,
> >   * Contents:
> >   *   %0: NFS WRITE will use buffered IO
> >   *   %1: NFS WRITE will use dontcache (buffered IO w/ dropbehind)
> > + *   %2: NFS WRITE will use direct IO with stable_how=NFS_UNSTABLE
> > + *   %3: NFS WRITE will use direct IO with stable_how=NFS_DATA_SYNC
> > + *   %4: NFS WRITE will use direct IO with stable_how=NFS_FILE_SYNC
> >   *
> >   * This setting takes immediate effect for all NFS versions,
> >   * all exports, and in all NFSD net namespaces.
> > @@ -109,6 +112,8 @@ static int nfsd_io_cache_write_set(void *data, u64 val)
> >  	case NFSD_IO_BUFFERED:
> >  	case NFSD_IO_DONTCACHE:
> >  	case NFSD_IO_DIRECT:
> > +	case NFSD_IO_DIRECT_WRITE_DATA_SYNC:
> > +	case NFSD_IO_DIRECT_WRITE_FILE_SYNC:
> >  		nfsd_io_cache_write = val;
> >  		break;
> >  	default:
> > diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> > index e4263326ca4a..10eca169392b 100644
> > --- a/fs/nfsd/nfsd.h
> > +++ b/fs/nfsd/nfsd.h
> > @@ -161,6 +161,8 @@ enum {
> >  	NFSD_IO_BUFFERED,
> >  	NFSD_IO_DONTCACHE,
> >  	NFSD_IO_DIRECT,
> > +	NFSD_IO_DIRECT_WRITE_DATA_SYNC,
> > +	NFSD_IO_DIRECT_WRITE_FILE_SYNC,
> >  };
> >  
> >  extern u64 nfsd_io_cache_read __read_mostly;
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index a4700c917c72..1b61185e96a9 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1367,15 +1367,45 @@ nfsd_write_dio_iters_init(struct bio_vec *bvec, unsigned int nvecs,
> >  		args->flags_buffered |= IOCB_DONTCACHE;
> >  }
> >  
> > +static void
> > +nfsd_init_write_kiocb_from_stable(u32 stable_floor,
> > +				  struct kiocb *kiocb,
> > +				  u32 *stable_how)
> > +{
> > +	if (stable_floor < *stable_how)
> > +		return; /* stable_how already set higher */
> > +
> > +	*stable_how = stable_floor;
> > +
> > +	switch (stable_floor) {
> > +	case NFS_FILE_SYNC:
> > +		/* persist data and timestamps */
> > +		kiocb->ki_flags |= IOCB_DSYNC | IOCB_SYNC;
> > +		break;
> > +	case NFS_DATA_SYNC:
> > +		/* persist data only */
> > +		kiocb->ki_flags |= IOCB_DSYNC;
> > +		break;
> > +	}
> > +}
> > +
> >  static noinline_for_stack int
> >  nfsd_direct_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  		  struct nfsd_file *nf, u32 *stable_how, unsigned int nvecs,
> >  		  unsigned long *cnt, struct kiocb *kiocb)
> >  {
> > +	u32 stable_floor = NFS_UNSTABLE;
> >  	struct nfsd_write_dio_args args;
> >  	ssize_t host_err;
> >  	unsigned int i;
> >  
> > +	if (nfsd_io_cache_write == NFSD_IO_DIRECT_WRITE_FILE_SYNC)
> > +		stable_floor = NFS_FILE_SYNC;
> > +	else if (nfsd_io_cache_write == NFSD_IO_DIRECT_WRITE_DATA_SYNC)
> > +		stable_floor = NFS_DATA_SYNC;
> > +	if (stable_floor != NFS_UNSTABLE)
> > +		nfsd_init_write_kiocb_from_stable(stable_floor, kiocb,
> > +						  stable_how);
> >  	args.nf = nf;
> >  	nfsd_write_dio_iters_init(rqstp->rq_bvec, nvecs, kiocb, *cnt, &args);
> >  
> > @@ -1461,18 +1491,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  		stable = NFS_UNSTABLE;
> >  	init_sync_kiocb(&kiocb, file);
> >  	kiocb.ki_pos = offset;
> > -	if (likely(!fhp->fh_use_wgather)) {
> > -		switch (stable) {
> > -		case NFS_FILE_SYNC:
> > -			/* persist data and timestamps */
> > -			kiocb.ki_flags |= IOCB_DSYNC | IOCB_SYNC;
> > -			break;
> > -		case NFS_DATA_SYNC:
> > -			/* persist data only */
> > -			kiocb.ki_flags |= IOCB_DSYNC;
> > -			break;
> > -		}
> > -	}
> > +	if (likely(!fhp->fh_use_wgather))
> > +		nfsd_init_write_kiocb_from_stable(stable, &kiocb, stable_how);
> >  
> >  	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
> >  
> > @@ -1482,6 +1502,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  
> >  	switch (nfsd_io_cache_write) {
> >  	case NFSD_IO_DIRECT:
> > +	case NFSD_IO_DIRECT_WRITE_DATA_SYNC:
> > +	case NFSD_IO_DIRECT_WRITE_FILE_SYNC:
> >  		host_err = nfsd_direct_write(rqstp, fhp, nf, stable_how,
> >  					     nvecs, cnt, &kiocb);
> >  		stable = *stable_how;
> 
> 
> I asked for the use of a file_sync export option because we need to test
> the BUFFERED cache mode as well as DIRECT. So, continue to experiment
> with this one, but I don't plan to merge it for now.

Doesn't the client have the ability to control NFS_UNSTABLE,
NFS_DATA_SYNC and NFS_FILE_SYNC already?  What experiment are you
looking to run?

If you're just looking to compare NFS_FILE_SYNC performance of
NFSD_IO_BUFFERED versus NFSD_IO_DIRECT, then using the client control
is fine, right?

Anyway, maybe I'm just being overly concerned about the permanence of
an export option.  I thought it best to avoid one for now, given that we
already seem to have adequate controls for an NFS_FILE_SYNC performance
bakeoff.
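The io_cache_write knob used for such a bakeoff can also be flipped between runs from C. The helper below is a hypothetical sketch, not part of NFSD or any existing tool: `nfsd_set_write_mode()`, `NFSD_WRITE_MODE_PATH` and `NFSD_WRITE_MODE_MAX` are made-up names, and it assumes debugfs is mounted at /sys/kernel/debug, that the caller may write there (normally root), and that values 0 to 4 are the accepted ones, as this patch's nfsd-io-modes.rst update documents.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Assumption: debugfs is mounted at /sys/kernel/debug. */
#define NFSD_WRITE_MODE_PATH "/sys/kernel/debug/nfsd/io_cache_write"

/* Highest io_cache_write value documented once this patch is applied. */
#define NFSD_WRITE_MODE_MAX 4u

/*
 * Hypothetical helper: write @mode to the knob at @path, as
 * "echo <mode> > <path>" would. Returns 0 on success or a negative errno.
 */
static int nfsd_set_write_mode(const char *path, unsigned int mode)
{
	char buf[16];
	ssize_t n;
	int len, fd;

	assert(path != NULL);
	if (mode > NFSD_WRITE_MODE_MAX)
		return -EINVAL;	/* reject values the knob does not document */

	len = snprintf(buf, sizeof(buf), "%u\n", mode);
	fd = open(path, O_WRONLY | O_TRUNC);	/* never create the file */
	if (fd < 0)
		return -errno;

	n = write(fd, buf, (size_t)len);
	if (n < 0) {
		int err = -errno;

		close(fd);
		return err;
	}
	if (close(fd) < 0)
		return -errno;
	return n == len ? 0 : -EIO;	/* treat a short write as an error */
}
```

For the BUFFERED versus DIRECT comparison, calling `nfsd_set_write_mode(NFSD_WRITE_MODE_PATH, 0)` before one run and `nfsd_set_write_mode(NFSD_WRITE_MODE_PATH, 2)` before the next matches the `echo` commands in the documentation, with the client choosing NFS_FILE_SYNC in both runs.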

Here is a rebased patch that applies on top of Christoph's cleanup and
my incremental Documentation patch.  I would appreciate us exposing
this NFSD stable_how "floor" control so others can try it.  But if this
still isn't OK, due to it being in terms of NFSD_IO_DIRECT debugfs
knobs, then I can pursue a generic export option that works for all
NFSD IO modes.
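The "floor" semantics can be modeled outside the kernel. This is a minimal user-space sketch, not the NFSD code: the `SH_*`, `IO_*` and `WFLAG_*` names and the functions `floor_for_mode()` and `apply_stable_floor()` are hypothetical stand-ins for stable_how, the nfsd.h enum, IOCB_DSYNC/IOCB_SYNC, and nfsd_init_write_kiocb_from_stable(). Only the stable_how values 0, 1 and 2 come from the NFS protocol itself.

```c
#include <assert.h>
#include <stdint.h>

/* stable_how values as defined by the NFS protocol (RFC 1813). */
enum stable_how {
	SH_UNSTABLE = 0,
	SH_DATA_SYNC = 1,
	SH_FILE_SYNC = 2,
};

/* io_cache_write modes, in the order of the nfsd.h enum in this patch. */
enum io_mode {
	IO_BUFFERED = 0,
	IO_DONTCACHE = 1,
	IO_DIRECT = 2,
	IO_DIRECT_WRITE_DATA_SYNC = 3,
	IO_DIRECT_WRITE_FILE_SYNC = 4,
};

/* Hypothetical stand-ins for IOCB_DSYNC and IOCB_SYNC. */
#define WFLAG_DSYNC 0x1u
#define WFLAG_SYNC 0x2u

/* Map an io_cache_write mode to the stable_how floor it implies. */
static uint32_t floor_for_mode(enum io_mode mode)
{
	switch (mode) {
	case IO_DIRECT_WRITE_FILE_SYNC:
		return SH_FILE_SYNC;
	case IO_DIRECT_WRITE_DATA_SYNC:
		return SH_DATA_SYNC;
	default:
		return SH_UNSTABLE;
	}
}

/*
 * Raise *stable_how to @floor unless the client already asked for a
 * higher level, then set the sync flags the resulting level needs.
 */
static void apply_stable_floor(uint32_t floor, uint32_t *flags,
			       uint32_t *stable_how)
{
	assert(floor <= SH_FILE_SYNC);
	if (floor < *stable_how)
		return;	/* client-provided stable_how is already higher */

	*stable_how = floor;
	switch (floor) {
	case SH_FILE_SYNC:
		*flags |= WFLAG_DSYNC | WFLAG_SYNC;	/* data and timestamps */
		break;
	case SH_DATA_SYNC:
		*flags |= WFLAG_DSYNC;			/* data only */
		break;
	default:
		break;				/* SH_UNSTABLE: no sync flags */
	}
}
```

With mode 4 and a client that sent UNSTABLE, the reply is raised to FILE_SYNC and both sync flags are set; with mode 3 and a client that already sent FILE_SYNC, neither the reply nor the flags change.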

Thanks,
Mike

From: Mike Snitzer <snitzer@kernel.org>
Date: Thu, 30 Oct 2025 17:53:09 -0400
Subject: [PATCH v4 rebased] NFSD: add new NFSD_IO_DIRECT variants that may override stable_how

NFSD_IO_DIRECT_WRITE_FILE_SYNC is direct IO with stable_how=NFS_FILE_SYNC.
NFSD_IO_DIRECT_WRITE_DATA_SYNC is direct IO with stable_how=NFS_DATA_SYNC.

The stable_how associated with each is a hint in the form of a "floor"
value for stable_how: if the client-provided stable_how is already
higher, it is left unchanged.

These permutations of NFSD_IO_DIRECT make it possible to experiment
with also elevating stable_how and returning it to the client, which,
for NFSD_IO_DIRECT_WRITE_FILE_SYNC, causes the client to elide its
COMMIT.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 .../filesystems/nfs/nfsd-io-modes.rst         |  6 ++-
 fs/nfsd/debugfs.c                             |  7 ++-
 fs/nfsd/nfsd.h                                |  2 +
 fs/nfsd/vfs.c                                 | 54 +++++++++++++------
 4 files changed, 51 insertions(+), 18 deletions(-)

diff --git a/Documentation/filesystems/nfs/nfsd-io-modes.rst b/Documentation/filesystems/nfs/nfsd-io-modes.rst
index e3a522d09766..a2194ec45c76 100644
--- a/Documentation/filesystems/nfs/nfsd-io-modes.rst
+++ b/Documentation/filesystems/nfs/nfsd-io-modes.rst
@@ -23,11 +23,13 @@ Based on the configured settings, NFSD's IO will either be:
 - cached using page cache (NFSD_IO_BUFFERED=0)
 - cached but removed from page cache on completion (NFSD_IO_DONTCACHE=1)
 - not cached stable_how=NFS_UNSTABLE (NFSD_IO_DIRECT=2)
+- not cached stable_how=NFS_DATA_SYNC (NFSD_IO_DIRECT_WRITE_DATA_SYNC=3)
+- not cached stable_how=NFS_FILE_SYNC (NFSD_IO_DIRECT_WRITE_FILE_SYNC=4)
 
-To set an NFSD IO mode, write a supported value (0 - 2) to the
+To set an NFSD IO mode, write a supported value (0 - 4) to the
 corresponding IO operation's debugfs interface, e.g.:
   echo 2 > /sys/kernel/debug/nfsd/io_cache_read
-  echo 2 > /sys/kernel/debug/nfsd/io_cache_write
+  echo 4 > /sys/kernel/debug/nfsd/io_cache_write
 
 To check which IO mode NFSD is using for READ or WRITE, simply read the
 corresponding IO operation's debugfs interface, e.g.:
diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
index 7f44689e0a53..8538e29ed2ab 100644
--- a/fs/nfsd/debugfs.c
+++ b/fs/nfsd/debugfs.c
@@ -68,7 +68,7 @@ static int nfsd_io_cache_read_set(void *data, u64 val)
 	case NFSD_IO_DIRECT:
 		/*
 		 * Must disable splice_read when enabling
-		 * NFSD_IO_DONTCACHE.
+		 * NFSD_IO_DONTCACHE and NFSD_IO_DIRECT.
 		 */
 		nfsd_disable_splice_read = true;
 		nfsd_io_cache_read = val;
@@ -90,6 +90,9 @@ DEFINE_DEBUGFS_ATTRIBUTE(nfsd_io_cache_read_fops, nfsd_io_cache_read_get,
  * Contents:
  *   %0: NFS WRITE will use buffered IO
  *   %1: NFS WRITE will use dontcache (buffered IO w/ dropbehind)
+ *   %2: NFS WRITE will use direct IO with stable_how=NFS_UNSTABLE
+ *   %3: NFS WRITE will use direct IO with stable_how=NFS_DATA_SYNC
+ *   %4: NFS WRITE will use direct IO with stable_how=NFS_FILE_SYNC
  *
  * This setting takes immediate effect for all NFS versions,
  * all exports, and in all NFSD net namespaces.
@@ -109,6 +112,8 @@ static int nfsd_io_cache_write_set(void *data, u64 val)
 	case NFSD_IO_BUFFERED:
 	case NFSD_IO_DONTCACHE:
 	case NFSD_IO_DIRECT:
+	case NFSD_IO_DIRECT_WRITE_DATA_SYNC:
+	case NFSD_IO_DIRECT_WRITE_FILE_SYNC:
 		nfsd_io_cache_write = val;
 		break;
 	default:
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index e4263326ca4a..10eca169392b 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -161,6 +161,8 @@ enum {
 	NFSD_IO_BUFFERED,
 	NFSD_IO_DONTCACHE,
 	NFSD_IO_DIRECT,
+	NFSD_IO_DIRECT_WRITE_DATA_SYNC,
+	NFSD_IO_DIRECT_WRITE_FILE_SYNC,
 };
 
 extern u64 nfsd_io_cache_read __read_mostly;
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 326cf6f717b3..101c18d79208 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1353,16 +1353,47 @@ nfsd_write_dio_iters_init(struct nfsd_file *nf, struct bio_vec *bvec,
 	return 1;
 }
 
+static void
+nfsd_init_write_kiocb_from_stable(u32 stable_floor,
+				  struct kiocb *kiocb,
+				  u32 *stable_how)
+{
+	if (stable_floor < *stable_how)
+		return; /* stable_how already set higher */
+
+	*stable_how = stable_floor;
+
+	switch (stable_floor) {
+	case NFS_FILE_SYNC:
+		/* persist data and timestamps */
+		kiocb->ki_flags |= IOCB_DSYNC | IOCB_SYNC;
+		break;
+	case NFS_DATA_SYNC:
+		/* persist data only */
+		kiocb->ki_flags |= IOCB_DSYNC;
+		break;
+	}
+}
+
 static noinline_for_stack int
 nfsd_direct_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
-		  struct nfsd_file *nf, unsigned int nvecs,
+		  struct nfsd_file *nf, u32 *stable_how, unsigned int nvecs,
 		  unsigned long *cnt, struct kiocb *kiocb)
 {
+	u32 stable_floor = NFS_UNSTABLE;
 	struct file *file = nf->nf_file;
 	struct nfsd_write_dio_seg segments[3];
 	unsigned int nsegs = 0, i;
 	ssize_t host_err;
 
+	if (nfsd_io_cache_write == NFSD_IO_DIRECT_WRITE_FILE_SYNC)
+		stable_floor = NFS_FILE_SYNC;
+	else if (nfsd_io_cache_write == NFSD_IO_DIRECT_WRITE_DATA_SYNC)
+		stable_floor = NFS_DATA_SYNC;
+	if (stable_floor != NFS_UNSTABLE)
+		nfsd_init_write_kiocb_from_stable(stable_floor, kiocb,
+						  stable_how);
+
 	nsegs = nfsd_write_dio_iters_init(nf, rqstp->rq_bvec, nvecs,
 			kiocb, *cnt, segments);
 
@@ -1445,18 +1476,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		stable = NFS_UNSTABLE;
 	init_sync_kiocb(&kiocb, file);
 	kiocb.ki_pos = offset;
-	if (likely(!fhp->fh_use_wgather)) {
-		switch (stable) {
-		case NFS_FILE_SYNC:
-			/* persist data and timestamps */
-			kiocb.ki_flags |= IOCB_DSYNC | IOCB_SYNC;
-			break;
-		case NFS_DATA_SYNC:
-			/* persist data only */
-			kiocb.ki_flags |= IOCB_DSYNC;
-			break;
-		}
-	}
+	if (likely(!fhp->fh_use_wgather))
+		nfsd_init_write_kiocb_from_stable(stable, &kiocb, stable_how);
 
 	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
 
@@ -1466,8 +1487,11 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	switch (nfsd_io_cache_write) {
 	case NFSD_IO_DIRECT:
-		host_err = nfsd_direct_write(rqstp, fhp, nf, nvecs,
-					     cnt, &kiocb);
+	case NFSD_IO_DIRECT_WRITE_DATA_SYNC:
+	case NFSD_IO_DIRECT_WRITE_FILE_SYNC:
+		host_err = nfsd_direct_write(rqstp, fhp, nf, stable_how,
+					     nvecs, cnt, &kiocb);
+		stable = *stable_how;
 		break;
 	case NFSD_IO_DONTCACHE:
 		if (file->f_op->fop_flags & FOP_DONTCACHE)
-- 
2.43.0


