From: Mike Snitzer <snitzer@kernel.org>
To: Chuck Lever <cel@kernel.org>
Cc: NeilBrown <neil@brown.name>, Jeff Layton <jlayton@kernel.org>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
	linux-nfs@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [PATCH v11 2/3] NFSD: Implement NFSD_IO_DIRECT for NFS WRITE
Date: Fri, 7 Nov 2025 12:18:19 -0500
Message-ID: <aQ4p2xumOVlOxlkl@kernel.org>
In-Reply-To: <20251107153422.4373-3-cel@kernel.org>

On Fri, Nov 07, 2025 at 10:34:21AM -0500, Chuck Lever wrote:
> From: Mike Snitzer <snitzer@kernel.org>
> 
> When NFSD_IO_DIRECT is selected via the
> /sys/kernel/debug/nfsd/io_cache_write experimental tunable, split
> incoming unaligned NFS WRITE requests into a prefix, middle and
> suffix segment, as needed. The middle segment is now DIO-aligned and
> the prefix and/or suffix are unaligned. Synchronous buffered IO is
> used for the unaligned segments, and IOCB_DIRECT is used for the
> middle DIO-aligned extent.
> 
> Although IOCB_DIRECT avoids the use of the page cache, by itself it
> doesn't guarantee data durability. For UNSTABLE WRITE requests,
> durability is obtained by a subsequent NFS COMMIT request.
> 
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> Co-developed-by: Chuck Lever <chuck.lever@oracle.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/debugfs.c |   1 +
>  fs/nfsd/trace.h   |   1 +
>  fs/nfsd/vfs.c     | 140 ++++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 138 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
> index 00eb1ecef6ac..7f44689e0a53 100644
> --- a/fs/nfsd/debugfs.c
> +++ b/fs/nfsd/debugfs.c
> @@ -108,6 +108,7 @@ static int nfsd_io_cache_write_set(void *data, u64 val)
>  	switch (val) {
>  	case NFSD_IO_BUFFERED:
>  	case NFSD_IO_DONTCACHE:
> +	case NFSD_IO_DIRECT:
>  		nfsd_io_cache_write = val;
>  		break;
>  	default:
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index 85a1521ad757..8047a6d97b81 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -469,6 +469,7 @@ DEFINE_NFSD_IO_EVENT(read_io_done);
>  DEFINE_NFSD_IO_EVENT(read_done);
>  DEFINE_NFSD_IO_EVENT(write_start);
>  DEFINE_NFSD_IO_EVENT(write_opened);
> +DEFINE_NFSD_IO_EVENT(write_direct);
>  DEFINE_NFSD_IO_EVENT(write_io_done);
>  DEFINE_NFSD_IO_EVENT(write_done);
>  DEFINE_NFSD_IO_EVENT(commit_start);
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 5333d49910d9..7e56be190170 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1254,6 +1254,131 @@ static int wait_for_concurrent_writes(struct file *file)
>  	return err;
>  }
>  
> +struct nfsd_write_dio_seg {
> +	struct iov_iter			iter;
> +	int				flags;
> +};
> +
> +static unsigned long
> +iov_iter_bvec_offset(const struct iov_iter *iter)
> +{
> +	return (unsigned long)(iter->bvec->bv_offset + iter->iov_offset);
> +}
> +
> +static void
> +nfsd_write_dio_seg_init(struct nfsd_write_dio_seg *segment,
> +			struct bio_vec *bvec, unsigned int nvecs,
> +			unsigned long total, size_t start, size_t len,
> +			struct kiocb *iocb)
> +{
> +	iov_iter_bvec(&segment->iter, ITER_SOURCE, bvec, nvecs, total);
> +	if (start)
> +		iov_iter_advance(&segment->iter, start);
> +	iov_iter_truncate(&segment->iter, len);
> +	segment->flags = iocb->ki_flags;
> +}
> +
> +static unsigned int
> +nfsd_write_dio_iters_init(struct nfsd_file *nf, struct bio_vec *bvec,
> +			  unsigned int nvecs, struct kiocb *iocb,
> +			  unsigned long total,
> +			  struct nfsd_write_dio_seg segments[3])
> +{
> +	u32 offset_align = nf->nf_dio_offset_align;
> +	loff_t prefix_end, orig_end, middle_end;
> +	u32 mem_align = nf->nf_dio_mem_align;
> +	size_t prefix, middle, suffix;
> +	loff_t offset = iocb->ki_pos;
> +	unsigned int nsegs = 0;
> +
> +	/*
> +	 * Check if direct I/O is feasible for this write request.
> +	 * If alignments are not available, the write is too small,
> +	 * or no alignment can be found, fall back to buffered I/O.
> +	 */
> +	if (unlikely(!mem_align || !offset_align) ||
> +	    unlikely(total < max(offset_align, mem_align)))
> +		goto no_dio;
> +
> +	prefix_end = round_up(offset, offset_align);
> +	orig_end = offset + total;
> +	middle_end = round_down(orig_end, offset_align);
> +
> +	prefix = prefix_end - offset;
> +	middle = middle_end - prefix_end;
> +	suffix = orig_end - middle_end;
> +
> +	if (!middle)
> +		goto no_dio;
> +
> +	if (prefix)
> +		nfsd_write_dio_seg_init(&segments[nsegs++], bvec,
> +					nvecs, total, 0, prefix, iocb);
> +
> +	nfsd_write_dio_seg_init(&segments[nsegs], bvec, nvecs,
> +				total, prefix, middle, iocb);
> +
> +	/*
> +	 * Check if the bvec iterator is aligned for direct I/O.
> +	 *
> +	 * bvecs generated from RPC receive buffers are contiguous: After
> +	 * the first bvec, all subsequent bvecs start at bv_offset zero
> +	 * (page-aligned). Therefore, only the first bvec is checked.
> +	 */
> +	if (iov_iter_bvec_offset(&segments[nsegs].iter) & (mem_align - 1))
> +		goto no_dio;
> +	segments[nsegs].flags |= IOCB_DIRECT;
> +	nsegs++;
> +
> +	if (suffix)
> +		nfsd_write_dio_seg_init(&segments[nsegs++], bvec, nvecs, total,
> +					prefix + middle, suffix, iocb);
> +
> +	return nsegs;
> +
> +no_dio:
> +	/*
> +	 * No DIO alignment possible - pack into single non-DIO segment.
> +	 * IOCB_DONTCACHE preserves the intent of NFSD_IO_DIRECT.
> +	 */
> +	nfsd_write_dio_seg_init(&segments[0], bvec, nvecs, total, 0,
> +				total, iocb);
> +	if (nf->nf_file->f_op->fop_flags & FOP_DONTCACHE)
> +		segments[nsegs].flags |= IOCB_DONTCACHE;

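This needs to be: segments[0].flags |= IOCB_DONTCACHE;

nsegs is not necessarily zero at the no_dio label. If a prefix segment
was set up before the mem_align check failed, nsegs is already 1, so
the flag lands on segments[1] rather than on segments[0], the segment
that was just initialized.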
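
For illustration only, here is a minimal userspace sketch of the split
arithmetic and the no_dio fallback from the quoted hunk. It is not
kernel code. The names seg_model and split_model, and the mem_ok and
fixed parameters, are hypothetical. The sketch collapses
nf_dio_offset_align and nf_dio_mem_align into a single align value,
assumes the file supports FOP_DONTCACHE, and assumes the no_dio path
returns 1 (the quote above ends before the return statement).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct nfsd_write_dio_seg. */
struct seg_model {
	uint64_t start;		/* file offset where the segment begins */
	size_t len;		/* segment length in bytes */
	int direct;		/* models IOCB_DIRECT */
	int dontcache;		/* models IOCB_DONTCACHE */
};

/*
 * Models nfsd_write_dio_iters_init(). mem_ok stands in for the bvec
 * memory-alignment check; fixed selects segments[0] (the suggested
 * fix) instead of segments[nsegs] (as posted) in the fallback path.
 */
static unsigned int
split_model(uint64_t offset, size_t total, uint32_t align, int mem_ok,
	    int fixed, struct seg_model segs[3])
{
	uint64_t mask = (uint64_t)align - 1;
	uint64_t prefix_end, orig_end, middle_end;
	size_t prefix, middle, suffix;
	unsigned int nsegs = 0;

	assert(align && (align & (align - 1)) == 0);	/* power of two */
	memset(segs, 0, 3 * sizeof(*segs));

	if (total < align)
		goto no_dio;

	prefix_end = (offset + mask) & ~mask;		/* round_up() */
	orig_end = offset + total;
	middle_end = orig_end & ~mask;			/* round_down() */

	prefix = prefix_end - offset;
	middle = middle_end - prefix_end;
	suffix = orig_end - middle_end;

	if (!middle)
		goto no_dio;

	if (prefix)
		segs[nsegs++] = (struct seg_model){ offset, prefix, 0, 0 };

	segs[nsegs] = (struct seg_model){ prefix_end, middle, 0, 0 };
	if (!mem_ok)
		goto no_dio;		/* nsegs may already be 1 here */
	segs[nsegs++].direct = 1;

	if (suffix)
		segs[nsegs++] = (struct seg_model){ middle_end, suffix, 0, 0 };

	return nsegs;

no_dio:
	/* Pack the whole write into a single buffered segment. */
	segs[0] = (struct seg_model){ offset, total, 0, 0 };
	segs[fixed ? 0 : nsegs].dontcache = 1;
	return 1;
}
```

With offset 100, total 4096 and align 512, the model yields a 412-byte
prefix, a 3584-byte direct middle and a 100-byte suffix. With the same
inputs and mem_ok set to 0, the as-posted index marks segs[1] and
leaves segs[0], the only segment returned, without the DONTCACHE flag.
When the offset is already aligned there is no prefix, nsegs is still 0
at no_dio, and both forms behave the same.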

Mike
