From: Mike Snitzer <snitzer@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>,
	Jeff Layton <jlayton@kernel.org>,
	Trond Myklebust <trond.myklebust@hammerspace.com>,
	Anna Schumaker <anna.schumaker@oracle.com>
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH v5 13/13] nfs/direct: add misaligned WRITE handling
Date: Thu, 24 Jul 2025 15:31:02 -0400
Message-ID: <20250724193102.65111-14-snitzer@kernel.org>
In-Reply-To: <20250724193102.65111-1-snitzer@kernel.org>

The NFS client already handles misaligned O_DIRECT IO by sending it
out to NFSD via RPC, so this commit's new capabilities only benefit
LOCALIO. They require the nfs modparam:
  localio_O_DIRECT_align_misaligned_IO=Y

When enabled, misaligned WRITE IO is split into a start, middle and
end as needed. The large middle extent is DIO-aligned and the start
and/or end are misaligned (due to each being a partial page).

Like the READ support that came before this WRITE support, the
nfs_analyze_dio trace event shows how the NFS client split a given
misaligned IO into a mix of misaligned page(s) and a DIO-aligned
extent.

This combination of trace events is useful for LOCALIO WRITEs:

  echo 1 > /sys/kernel/tracing/events/nfs/nfs_analyze_dio/enable
  echo 1 > /sys/kernel/tracing/events/nfs/nfs_initiate_write/enable
  echo 1 > /sys/kernel/tracing/events/nfs/nfs_writeback_done/enable
  echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable

Which for this dd command:

  dd if=/dev/zero of=/mnt/share1/test bs=47008 count=2 oflag=direct

Results in:

              dd-63257   [001] ..... 83742.427650: nfs_analyze_dio: WRITE offset=0 len=47008 start=0+0 middle=0+45056 end=45056+1952
              dd-63257   [001] ..... 83742.427659: nfs_initiate_write: fileid=00:2e:219750 fhandle=0xf6927a01 offset=0 count=45056 stable=UNSTABLE
              dd-63257   [001] ..... 83742.427662: nfs_initiate_write: fileid=00:2e:219750 fhandle=0xf6927a01 offset=45056 count=1952 stable=UNSTABLE
  kworker/u193:3-62985   [011] ..... 83742.427664: xfs_file_direct_write: dev 259:22 ino 0x5e0000a3 disize 0x0 pos 0x0 bytecount 0xb000
  kworker/u193:3-62985   [011] ..... 83742.427695: nfs_writeback_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=0 count=45056 res=45056 stable=UNSTABLE verifier=a8b37e6803d1eb1e
  kworker/u193:4-63221   [004] ..... 83742.427699: nfs_writeback_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=45056 count=1952 res=1952 stable=UNSTABLE verifier=a8b37e6803d1eb1e

              dd-63257   [001] ..... 83742.427755: nfs_analyze_dio: WRITE offset=47008 len=47008 start=47008+2144 middle=49152+40960 end=90112+3904
              dd-63257   [001] ..... 83742.427758: nfs_initiate_write: fileid=00:2e:219750 fhandle=0xf6927a01 offset=47008 count=2144 stable=UNSTABLE
              dd-63257   [001] ..... 83742.427760: nfs_initiate_write: fileid=00:2e:219750 fhandle=0xf6927a01 offset=49152 count=40960 stable=UNSTABLE
  kworker/u193:4-63221   [004] ..... 83742.427761: nfs_writeback_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=47008 count=2144 res=2144 stable=UNSTABLE verifier=a8b37e6803d1eb1e
              dd-63257   [001] ..... 83742.427763: nfs_initiate_write: fileid=00:2e:219750 fhandle=0xf6927a01 offset=90112 count=3904 stable=UNSTABLE
  kworker/u193:4-63221   [004] ..... 83742.427763: xfs_file_direct_write: dev 259:22 ino 0x5e0000a3 disize 0xb7a0 pos 0xc000 bytecount 0xa000
  kworker/u193:4-63221   [004] ..... 83742.427783: nfs_writeback_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=49152 count=40960 res=40960 stable=UNSTABLE verifier=a8b37e6803d1eb1e
  kworker/u193:3-62985   [011] ..... 83742.427788: nfs_writeback_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=90112 count=3904 res=3904 stable=UNSTABLE verifier=a8b37e6803d1eb1e

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs/direct.c   | 84 ++++++++++++++++++++++++++++++++++++++++++++---
 fs/nfs/internal.h |  1 +
 2 files changed, 80 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 4e1e668eaa1f..80c2ca37cf28 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -1048,11 +1048,19 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 		if (result < 0)
 			break;
 
-		bytes = result;
-		npages = (result + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
+		/* Limit the number of bytes serviced each iteration to aligned batches */
+		if (pos < dreq->middle_offset && dreq->start_len)
+			bytes = min_t(size_t, dreq->start_len, result);
+		else if (pos < dreq->end_offset && dreq->middle_len)
+			bytes = min_t(size_t, dreq->middle_len, result);
+		else
+			bytes = result;
+		npages = (bytes + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
+
 		for (i = 0; i < npages; i++) {
 			struct nfs_page *req;
 			unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
+			bool issue_dio_now = false;
 
 			req = nfs_page_create_from_page(dreq->ctx, pagevec[i],
 							pgbase, pos, req_len);
@@ -1068,6 +1076,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 			}
 
 			pgbase = 0;
+			result -= req_len;
 			bytes -= req_len;
 			requested_bytes += req_len;
 			pos += req_len;
@@ -1077,9 +1086,27 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 				continue;
 			}
 
+			/* Looking ahead, is this req the end of the start or middle? */
+			if (bytes == 0) {
+				if ((dreq->start_len &&
+				     pos == dreq->middle_offset && result >= dreq->middle_len) ||
+				    (dreq->end_len &&
+				     pos == dreq->end_offset && result == dreq->end_len)) {
+					desc.pg_doio_now = 1;
+					issue_dio_now = true;
+					/* Reset iter to the last boundary, issue the current
+					 * req and then handle iter to next boundary or end.
+					 */
+					iov_iter_revert(iter, result);
+				}
+			}
+
 			nfs_lock_request(req);
-			if (nfs_pageio_add_request(&desc, req))
+			if (nfs_pageio_add_request(&desc, req)) {
+				if (issue_dio_now)
+					break;
 				continue;
+			}
 
 			/* Exit on hard errors */
 			if (desc.pg_error < 0 && desc.pg_error != -EAGAIN) {
@@ -1120,6 +1147,50 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 	return requested_bytes;
 }
 
+/*
+ * If localio_O_DIRECT_align_misaligned_IO is enabled, split a misaligned
+ * WRITE into a DIO-aligned middle and a misaligned head and/or tail.
+ */
+static bool nfs_analyze_write_dio(loff_t offset, __u32 len,
+				  struct nfs_direct_req *dreq)
+{
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+	/* Hardcoded to PAGE_SIZE (since don't have LOCALIO nfsd_file's
+	 * dio_alignment), works for smaller alignment too (e.g. 512b).
+	 */
+	u32 dio_blocksize = PAGE_SIZE;
+	loff_t start_end, orig_end, middle_end;
+
+	/* Return early if feature disabled, if IO is irreparably
+	 * misaligned (len < PAGE_SIZE) or if IO is already DIO-aligned.
+	 */
+	if (!nfs_localio_O_DIRECT_align_misaligned_IO() ||
+	    unlikely(len < dio_blocksize) ||
+	    (((offset | len) & (dio_blocksize-1)) == 0))
+		return false;
+
+	start_end = round_up(offset, dio_blocksize);
+	orig_end = offset + len;
+	middle_end = round_down(orig_end, dio_blocksize);
+
+	dreq->io_start = offset;
+	dreq->max_count = orig_end - offset;
+
+	dreq->start_len = start_end - offset;
+	dreq->middle_offset = start_end;
+	dreq->middle_len = middle_end - start_end;
+	dreq->end_offset = middle_end;
+	dreq->end_len = orig_end - middle_end;
+
+	trace_nfs_analyze_dio(WRITE, offset, len, offset, dreq->start_len,
+			      dreq->middle_offset, dreq->middle_len,
+			      dreq->end_offset, dreq->end_len);
+	return true;
+#else
+	return false;
+#endif
+}
+
 /**
  * nfs_file_direct_write - file direct write operation for NFS files
  * @iocb: target I/O control block
@@ -1176,9 +1247,12 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter,
 	if (!dreq)
 		goto out;
 
+	if (swap || !nfs_analyze_write_dio(pos, count, dreq)) {
+		dreq->max_count = count;
+		dreq->io_start = pos;
+	}
+
 	dreq->inode = inode;
-	dreq->max_count = count;
-	dreq->io_start = pos;
 	dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
 	l_ctx = nfs_get_lock_context(dreq->ctx);
 	if (IS_ERR(l_ctx)) {
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 06a15bf08357..8daed5b1aa50 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -995,6 +995,7 @@ struct nfs_direct_req {
 	struct bio_vec *        start_extra_bvec;
 	loff_t			middle_offset;	/* Offset for start of DIO-aligned middle */
 	loff_t			end_offset;	/* Offset for start of DIO-aligned end */
+	ssize_t			start_len;	/* Length for misaligned first page */
 	ssize_t			middle_len;	/* Length for DIO-aligned middle */
 	ssize_t			end_len;	/* Length for misaligned last page */
 };
-- 
2.44.0