linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] NFSD DIRECT: add handling for misaligned WRITEs
@ 2025-07-30 23:05 Mike Snitzer
  2025-07-30 23:05 ` [PATCH 1/3] NFSD: rename and update nfsd_read_vector_dio trace event to nfsd_analyze_dio Mike Snitzer
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Mike Snitzer @ 2025-07-30 23:05 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton; +Cc: linux-nfs

Hi,

This series builds on what has been staged in the nfsd-testing branch.

This code has proven to work well during my testing.  Any suggestions
for further refinement are welcome.

Thanks,
Mike

Mike Snitzer (3):
  NFSD: rename and update nfsd_read_vector_dio trace event to nfsd_analyze_dio
  NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs
  NFSD: issue WRITEs using O_DIRECT even if IO is misaligned

 fs/nfsd/trace.h |  37 ++++++----
 fs/nfsd/vfs.c   | 188 +++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 180 insertions(+), 45 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] NFSD: rename and update nfsd_read_vector_dio trace event to nfsd_analyze_dio
  2025-07-30 23:05 [PATCH 0/3] NFSD DIRECT: add handling for misaligned WRITEs Mike Snitzer
@ 2025-07-30 23:05 ` Mike Snitzer
  2025-07-31 15:31   ` Jeff Layton
  2025-07-30 23:05 ` [PATCH 2/3] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs Mike Snitzer
  2025-07-30 23:05 ` [PATCH 3/3] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned Mike Snitzer
  2 siblings, 1 reply; 9+ messages in thread
From: Mike Snitzer @ 2025-07-30 23:05 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton; +Cc: linux-nfs

From: Mike Snitzer <snitzer@hammerspace.com>

Rename nfsd_read_vector_dio trace event to nfsd_analyze_dio and update
it so that it provides useful tracing for both READ and WRITE.  This
prepares for nfsd_vfs_write() to also make use of it when handling
misaligned WRITEs.

Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
 fs/nfsd/trace.h | 37 ++++++++++++++++++++++++-------------
 fs/nfsd/vfs.c   | 11 ++++++-----
 2 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 55055482f8a8..51b47fd041a8 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -473,41 +473,52 @@ DEFINE_NFSD_IO_EVENT(write_done);
 DEFINE_NFSD_IO_EVENT(commit_start);
 DEFINE_NFSD_IO_EVENT(commit_done);
 
-TRACE_EVENT(nfsd_read_vector_dio,
+TRACE_EVENT(nfsd_analyze_dio,
 	TP_PROTO(struct svc_rqst *rqstp,
 		 struct svc_fh	*fhp,
+		 u32		rw,
 		 u64		offset,
 		 u32		len,
-		 loff_t         start,
-		 loff_t         start_extra,
-		 loff_t         end,
-		 loff_t         end_extra),
-	TP_ARGS(rqstp, fhp, offset, len, start, start_extra, end, end_extra),
+		 loff_t		start,
+		 ssize_t	start_len,
+		 loff_t		middle,
+		 ssize_t	middle_len,
+		 loff_t		end,
+		 ssize_t	end_len),
+	TP_ARGS(rqstp, fhp, rw, offset, len, start, start_len, middle, middle_len, end, end_len),
 	TP_STRUCT__entry(
 		__field(u32, xid)
 		__field(u32, fh_hash)
+		__field(u32, rw)
 		__field(u64, offset)
 		__field(u32, len)
 		__field(loff_t, start)
-		__field(loff_t, start_extra)
+		__field(ssize_t, start_len)
+		__field(loff_t, middle)
+		__field(ssize_t, middle_len)
 		__field(loff_t, end)
-		__field(loff_t, end_extra)
+		__field(ssize_t, end_len)
 	),
 	TP_fast_assign(
 		__entry->xid = be32_to_cpu(rqstp->rq_xid);
 		__entry->fh_hash = knfsd_fh_hash(&fhp->fh_handle);
+		__entry->rw = rw;
 		__entry->offset = offset;
 		__entry->len = len;
 		__entry->start = start;
-		__entry->start_extra = start_extra;
+		__entry->start_len = start_len;
+		__entry->middle = middle;
+		__entry->middle_len = middle_len;
 		__entry->end = end;
-		__entry->end_extra = end_extra;
+		__entry->end_len = end_len;
 	),
-	TP_printk("xid=0x%08x fh_hash=0x%08x offset=%llu len=%u start=%llu+%llu end=%llu-%llu",
+	TP_printk("xid=0x%08x fh_hash=0x%08x %s offset=%llu len=%u start=%llu+%lu middle=%llu+%lu end=%llu+%lu",
 		  __entry->xid, __entry->fh_hash,
+		  __entry->rw ? "WRITE" : "READ",
 		  __entry->offset, __entry->len,
-		  __entry->start, __entry->start_extra,
-		  __entry->end, __entry->end_extra)
+		  __entry->start, __entry->start_len,
+		  __entry->middle, __entry->middle_len,
+		  __entry->end, __entry->end_len)
 );
 
 DECLARE_EVENT_CLASS(nfsd_err_class,
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 46189020172f..0863350c4259 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1094,7 +1094,7 @@ static bool nfsd_analyze_read_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
 				  struct nfsd_read_dio *read_dio)
 {
 	const u32 dio_blocksize = nf->nf_dio_read_offset_align;
-	loff_t orig_end = offset + len;
+	loff_t middle_end, orig_end = offset + len;
 
 	if (WARN_ONCE(!nf->nf_dio_mem_align || !nf->nf_dio_read_offset_align,
 		      "%s: underlying filesystem has not provided DIO alignment info\n",
@@ -1133,10 +1133,11 @@ static bool nfsd_analyze_read_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	}
 
 	/* Show original offset and count, and how it was expanded for DIO */
-	trace_nfsd_read_vector_dio(rqstp, fhp, offset, len,
-				   read_dio->start, read_dio->start_extra,
-				   read_dio->end, read_dio->end_extra);
-
+	middle_end = read_dio->end - read_dio->end_extra;
+	trace_nfsd_analyze_dio(rqstp, fhp, READ, offset, len,
+			       read_dio->start, read_dio->start_extra,
+			       offset, (middle_end - offset),
+			       middle_end, read_dio->end_extra);
 	return true;
 }
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/3] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs
  2025-07-30 23:05 [PATCH 0/3] NFSD DIRECT: add handling for misaligned WRITEs Mike Snitzer
  2025-07-30 23:05 ` [PATCH 1/3] NFSD: rename and update nfsd_read_vector_dio trace event to nfsd_analyze_dio Mike Snitzer
@ 2025-07-30 23:05 ` Mike Snitzer
  2025-07-30 23:05 ` [PATCH 3/3] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned Mike Snitzer
  2 siblings, 0 replies; 9+ messages in thread
From: Mike Snitzer @ 2025-07-30 23:05 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton; +Cc: linux-nfs

From: Mike Snitzer <snitzer@hammerspace.com>

Refactor nfsd_vfs_write() to support splitting a WRITE into parts
(which will be either misaligned or DIO-aligned).  Doing so in a
preliminary commit just allows for indentation and slight
transformation to be more easily understood and reviewed.

Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
 fs/nfsd/vfs.c | 50 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 0863350c4259..72fd0a11ffa3 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1341,7 +1341,6 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	struct super_block	*sb = file_inode(file)->i_sb;
 	struct kiocb		kiocb;
 	struct svc_export	*exp;
-	struct iov_iter		iter;
 	errseq_t		since;
 	__be32			nfserr;
 	int			host_err;
@@ -1349,6 +1348,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	unsigned int		pflags = current->flags;
 	bool			restore_flags = false;
 	unsigned int		nvecs;
+	struct iov_iter		iter_stack[1];
+	struct iov_iter		*iter = iter_stack;
+	unsigned int		n_iters = 0;
 
 	trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
 
@@ -1378,14 +1380,15 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		kiocb.ki_flags |= IOCB_DSYNC;
 
 	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
-	iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
+	iov_iter_bvec(&iter[0], ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
+	n_iters++;
 
 	switch (nfsd_io_cache_write) {
 	case NFSD_IO_DIRECT:
 		/* direct I/O must be aligned to device logical sector size */
 		if (nf->nf_dio_mem_align && nf->nf_dio_offset_align &&
 		    (((offset | *cnt) & (nf->nf_dio_offset_align-1)) == 0) &&
-		    iov_iter_is_aligned(&iter, nf->nf_dio_mem_align - 1,
+		    iov_iter_is_aligned(&iter[0], nf->nf_dio_mem_align - 1,
 					nf->nf_dio_offset_align - 1))
 			kiocb.ki_flags = IOCB_DIRECT;
 		break;
@@ -1396,25 +1399,32 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		break;
 	}
 
-	since = READ_ONCE(file->f_wb_err);
-	if (verf)
-		nfsd_copy_write_verifier(verf, nn);
-	host_err = vfs_iocb_iter_write(file, &kiocb, &iter);
-	if (host_err < 0) {
-		commit_reset_write_verifier(nn, rqstp, host_err);
-		goto out_nfserr;
-	}
-	*cnt = host_err;
-	nfsd_stats_io_write_add(nn, exp, *cnt);
-	fsnotify_modify(file);
-	host_err = filemap_check_wb_err(file->f_mapping, since);
-	if (host_err < 0)
-		goto out_nfserr;
+	*cnt = 0;
+	for (int i = 0; i < n_iters; i++) {
+		since = READ_ONCE(file->f_wb_err);
+		if (verf)
+			nfsd_copy_write_verifier(verf, nn);
 
-	if (stable && fhp->fh_use_wgather) {
-		host_err = wait_for_concurrent_writes(file);
-		if (host_err < 0)
+		host_err = vfs_iocb_iter_write(file, &kiocb, &iter[i]);
+		if (host_err < 0) {
 			commit_reset_write_verifier(nn, rqstp, host_err);
+			goto out_nfserr;
+		}
+		*cnt += host_err;
+		nfsd_stats_io_write_add(nn, exp, host_err);
+
+		fsnotify_modify(file);
+		host_err = filemap_check_wb_err(file->f_mapping, since);
+		if (host_err < 0)
+			goto out_nfserr;
+
+		if (stable && fhp->fh_use_wgather) {
+			host_err = wait_for_concurrent_writes(file);
+			if (host_err < 0) {
+				commit_reset_write_verifier(nn, rqstp, host_err);
+				goto out_nfserr;
+			}
+		}
 	}
 
 out_nfserr:
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/3] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned
  2025-07-30 23:05 [PATCH 0/3] NFSD DIRECT: add handling for misaligned WRITEs Mike Snitzer
  2025-07-30 23:05 ` [PATCH 1/3] NFSD: rename and update nfsd_read_vector_dio trace event to nfsd_analyze_dio Mike Snitzer
  2025-07-30 23:05 ` [PATCH 2/3] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs Mike Snitzer
@ 2025-07-30 23:05 ` Mike Snitzer
  2 siblings, 0 replies; 9+ messages in thread
From: Mike Snitzer @ 2025-07-30 23:05 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton; +Cc: linux-nfs

From: Mike Snitzer <snitzer@hammerspace.com>

If NFSD_IO_DIRECT is used, split any misaligned WRITE into a start,
middle and end as needed. The large middle extent is DIO-aligned and
the start and/or end are misaligned. Buffered IO is used for the
misaligned extents and O_DIRECT is used for the middle DIO-aligned
extent.

The nfsd_analyze_dio trace event shows how NFSD splits a given
misaligned IO into a mix of misaligned extent(s) and a DIO-aligned
extent.

This combination of trace events is useful:

  echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_opened/enable
  echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_analyze_dio/enable
  echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_io_done/enable
  echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable

Which for this dd command:

  dd if=/dev/zero of=/mnt/share1/test bs=47008 count=2 oflag=direct

Results in:

  nfsd-55714   [043] ..... 79976.260851: nfsd_write_opened: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008
  nfsd-55714   [043] ..... 79976.260852: nfsd_analyze_dio: xid=0x966c5d2d fh_hash=0x4d34e6c1 WRITE offset=0 len=47008 start=0+0 middle=0+45056 end=45056+1952
  nfsd-55714   [043] ..... 79976.260857: xfs_file_direct_write: dev 259:12 ino 0x3e00008f disize 0x0 pos 0x0 bytecount 0xb000
  nfsd-55714   [043] ..... 79976.260965: nfsd_write_io_done: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008

  nfsd-55714   [043] ..... 79976.307762: nfsd_write_opened: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008
  nfsd-55714   [043] ..... 79976.307762: nfsd_analyze_dio: xid=0x67e5ce6f fh_hash=0x4d34e6c1 WRITE offset=47008 len=47008 start=47008+2144 middle=49152+40960 end=90112+3904
  nfsd-55714   [043] ..... 79976.307797: xfs_file_direct_write: dev 259:12 ino 0x3e00008f disize 0xc000 pos 0xc000 bytecount 0xa000
  nfsd-55714   [043] ..... 79976.307866: nfsd_write_io_done: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008

Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
 fs/nfsd/vfs.c | 135 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 124 insertions(+), 11 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 72fd0a11ffa3..96b346a95bfb 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1314,6 +1314,113 @@ static int wait_for_concurrent_writes(struct file *file)
 	return err;
 }
 
+struct nfsd_write_dio
+{
+	loff_t middle_offset;	/* Offset for start of DIO-aligned middle */
+	loff_t end_offset;	/* Offset for start of DIO-aligned end */
+	ssize_t	start_len;	/* Length for misaligned first extent */
+	ssize_t	middle_len;	/* Length for DIO-aligned middle extent */
+	ssize_t	end_len;	/* Length for misaligned last extent */
+};
+
+static void init_nfsd_write_dio(struct nfsd_write_dio *write_dio)
+{
+	memset(write_dio, 0, sizeof(*write_dio));
+}
+
+static bool nfsd_analyze_write_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
+				   struct nfsd_file *nf, loff_t offset,
+				   unsigned long len, struct nfsd_write_dio *write_dio)
+{
+	const u32 dio_blocksize = nf->nf_dio_offset_align;
+	loff_t orig_end, middle_end, start_end, start_offset = offset;
+	ssize_t start_len = len;
+	bool aligned = true;
+
+	if (WARN_ONCE(!nf->nf_dio_mem_align || !dio_blocksize,
+		      "%s: underlying filesystem has not provided DIO alignment info\n",
+		      __func__))
+		return false;
+
+	if (WARN_ONCE(dio_blocksize > PAGE_SIZE,
+		      "%s: underlying storage's dio_blocksize=%u > PAGE_SIZE=%lu\n",
+		      __func__, dio_blocksize, PAGE_SIZE))
+		return false;
+
+	if (unlikely(len < dio_blocksize)) {
+		aligned = false;
+		goto out;
+	}
+
+	if (((offset | len) & (dio_blocksize-1)) == 0) {
+		/* already DIO-aligned, no misaligned head or tail */
+		write_dio->middle_offset = offset;
+		write_dio->middle_len = len;
+		/* clear these for the benefit of trace_nfsd_analyze_dio */
+		start_offset = 0;
+		start_len = 0;
+		goto out;
+	}
+
+	start_end = round_up(offset, dio_blocksize);
+	start_len = start_end - offset;
+	orig_end = offset + len;
+	middle_end = round_down(orig_end, dio_blocksize);
+
+	write_dio->start_len = start_len;
+	write_dio->middle_offset = start_end;
+	write_dio->middle_len = middle_end - start_end;
+	write_dio->end_offset = middle_end;
+	write_dio->end_len = orig_end - middle_end;
+out:
+	trace_nfsd_analyze_dio(rqstp, fhp, WRITE, offset, len, start_offset, start_len,
+			       write_dio->middle_offset, write_dio->middle_len,
+			       write_dio->end_offset, write_dio->end_len);
+	return aligned;
+}
+
+/*
+ * Setup as many as 3 iov_iter based on extents possibly described by @write_dio.
+ * @iterp: pointer to pointer to onstack array of 3 iov_iter structs from caller.
+ * @rq_bvec: backing bio_vec used to setup all 3 iov_iter permutations.
+ * @nvecs: number of segments in @rq_bvec
+ * @cnt: size of the request in bytes
+ * @write_dio: nfsd_write_dio struct that describes start, middle and end extents.
+ *
+ * Returns the number of iov_iter that were setup.
+ */
+static int nfsd_setup_write_iters(struct iov_iter **iterp, struct bio_vec *rq_bvec,
+				  unsigned int nvecs, unsigned long cnt,
+				  struct nfsd_write_dio *write_dio)
+{
+	int n_iters = 0;
+	struct iov_iter *iters = *iterp;
+
+	/* Setup misaligned start? */
+	if (write_dio->start_len) {
+		iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
+		iters[n_iters].count = write_dio->start_len;
+		n_iters++;
+	}
+
+	/* Setup possibly DIO-aligned middle */
+	iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
+	if (write_dio->start_len)
+		iov_iter_advance(&iters[n_iters], write_dio->start_len);
+	iters[n_iters].count -= write_dio->end_len;
+	n_iters++;
+
+	/* Setup misaligned end? */
+	if (write_dio->end_len) {
+		iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
+		iov_iter_advance(&iters[n_iters],
+				 write_dio->start_len + write_dio->middle_len);
+		n_iters++;
+	}
+
+	return n_iters;
+}
+
 /**
  * nfsd_vfs_write - write data to an already-open file
  * @rqstp: RPC execution context
@@ -1348,9 +1455,11 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	unsigned int		pflags = current->flags;
 	bool			restore_flags = false;
 	unsigned int		nvecs;
-	struct iov_iter		iter_stack[1];
+	struct iov_iter		iter_stack[3];
 	struct iov_iter		*iter = iter_stack;
 	unsigned int		n_iters = 0;
+	bool                    dio_aligned = false;
+	struct nfsd_write_dio	write_dio;
 
 	trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
 
@@ -1379,18 +1488,12 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (stable && !fhp->fh_use_wgather)
 		kiocb.ki_flags |= IOCB_DSYNC;
 
-	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
-	iov_iter_bvec(&iter[0], ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
-	n_iters++;
-
+	init_nfsd_write_dio(&write_dio);
 	switch (nfsd_io_cache_write) {
 	case NFSD_IO_DIRECT:
-		/* direct I/O must be aligned to device logical sector size */
-		if (nf->nf_dio_mem_align && nf->nf_dio_offset_align &&
-		    (((offset | *cnt) & (nf->nf_dio_offset_align-1)) == 0) &&
-		    iov_iter_is_aligned(&iter[0], nf->nf_dio_mem_align - 1,
-					nf->nf_dio_offset_align - 1))
-			kiocb.ki_flags = IOCB_DIRECT;
+		if (nfsd_analyze_write_dio(rqstp, fhp, nf, offset,
+					   *cnt, &write_dio))
+			dio_aligned = true;
 		break;
 	case NFSD_IO_DONTCACHE:
 		kiocb.ki_flags = IOCB_DONTCACHE;
@@ -1399,12 +1502,22 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		break;
 	}
 
+	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
+	n_iters = nfsd_setup_write_iters(&iter, rqstp->rq_bvec, nvecs, *cnt, &write_dio);
+
 	*cnt = 0;
 	for (int i = 0; i < n_iters; i++) {
 		since = READ_ONCE(file->f_wb_err);
 		if (verf)
 			nfsd_copy_write_verifier(verf, nn);
 
+		if (dio_aligned) {
+			if (iov_iter_is_aligned(&iter[i], nf->nf_dio_mem_align - 1,
+						nf->nf_dio_offset_align - 1))
+				kiocb.ki_flags |= IOCB_DIRECT;
+			else
+				kiocb.ki_flags &= ~IOCB_DIRECT;
+		}
 		host_err = vfs_iocb_iter_write(file, &kiocb, &iter[i]);
 		if (host_err < 0) {
 			commit_reset_write_verifier(nn, rqstp, host_err);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] NFSD: rename and update nfsd_read_vector_dio trace event to nfsd_analyze_dio
  2025-07-30 23:05 ` [PATCH 1/3] NFSD: rename and update nfsd_read_vector_dio trace event to nfsd_analyze_dio Mike Snitzer
@ 2025-07-31 15:31   ` Jeff Layton
  2025-07-31 17:04     ` Mike Snitzer
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Layton @ 2025-07-31 15:31 UTC (permalink / raw)
  To: Mike Snitzer, Chuck Lever; +Cc: linux-nfs

On Wed, 2025-07-30 at 19:05 -0400, Mike Snitzer wrote:
> From: Mike Snitzer <snitzer@hammerspace.com>
> 
> Rename nfsd_read_vector_dio trace event to nfsd_analyze_dio and update
> it so that it provides useful tracing for both READ and WRITE.  This
> prepares for nfsd_vfs_write() to also make use of it when handling
> misaligned WRITEs.
> 
> Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
> ---
>  fs/nfsd/trace.h | 37 ++++++++++++++++++++++++-------------
>  fs/nfsd/vfs.c   | 11 ++++++-----
>  2 files changed, 30 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index 55055482f8a8..51b47fd041a8 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -473,41 +473,52 @@ DEFINE_NFSD_IO_EVENT(write_done);
>  DEFINE_NFSD_IO_EVENT(commit_start);
>  DEFINE_NFSD_IO_EVENT(commit_done);
>  
> -TRACE_EVENT(nfsd_read_vector_dio,
> +TRACE_EVENT(nfsd_analyze_dio,
>  	TP_PROTO(struct svc_rqst *rqstp,
>  		 struct svc_fh	*fhp,
> +		 u32		rw,

I would do this a bit differently. You're hardcoding READ and WRITE
into both tracepoints. I would turn this trace event into a class a'la
DECLARE_EVENT_CLASS, and then just define two different tracepoints
(maybe trace_nfsd_analyze_read/write_dio). Then you can just drop the
above u32 field, and it'll still be evident whether it's a read or
write in the log.

>  		 u64		offset,
>  		 u32		len,
> -		 loff_t         start,
> -		 loff_t         start_extra,
> -		 loff_t         end,
> -		 loff_t         end_extra),
> -	TP_ARGS(rqstp, fhp, offset, len, start, start_extra, end, end_extra),
> +		 loff_t		start,
> +		 ssize_t	start_len,
> +		 loff_t		middle,
> +		 ssize_t	middle_len,
> +		 loff_t		end,
> +		 ssize_t	end_len),
> +	TP_ARGS(rqstp, fhp, rw, offset, len, start, start_len, middle, middle_len, end, end_len),
>  	TP_STRUCT__entry(
>  		__field(u32, xid)
>  		__field(u32, fh_hash)
> +		__field(u32, rw)
>  		__field(u64, offset)
>  		__field(u32, len)
>  		__field(loff_t, start)
> -		__field(loff_t, start_extra)
> +		__field(ssize_t, start_len)
> +		__field(loff_t, middle)
> +		__field(ssize_t, middle_len)
>  		__field(loff_t, end)
> -		__field(loff_t, end_extra)
> +		__field(ssize_t, end_len)
>  	),
>  	TP_fast_assign(
>  		__entry->xid = be32_to_cpu(rqstp->rq_xid);
>  		__entry->fh_hash = knfsd_fh_hash(&fhp->fh_handle);
> +		__entry->rw = rw;
>  		__entry->offset = offset;
>  		__entry->len = len;
>  		__entry->start = start;
> -		__entry->start_extra = start_extra;
> +		__entry->start_len = start_len;
> +		__entry->middle = middle;
> +		__entry->middle_len = middle_len;
>  		__entry->end = end;
> -		__entry->end_extra = end_extra;
> +		__entry->end_len = end_len;
>  	),
> -	TP_printk("xid=0x%08x fh_hash=0x%08x offset=%llu len=%u start=%llu+%llu end=%llu-%llu",
> +	TP_printk("xid=0x%08x fh_hash=0x%08x %s offset=%llu len=%u start=%llu+%lu middle=%llu+%lu end=%llu+%lu",
>  		  __entry->xid, __entry->fh_hash,
> +		  __entry->rw ? "WRITE" : "READ",
>  		  __entry->offset, __entry->len,
> -		  __entry->start, __entry->start_extra,
> -		  __entry->end, __entry->end_extra)
> +		  __entry->start, __entry->start_len,
> +		  __entry->middle, __entry->middle_len,
> +		  __entry->end, __entry->end_len)
>  );
>  
>  DECLARE_EVENT_CLASS(nfsd_err_class,
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 46189020172f..0863350c4259 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1094,7 +1094,7 @@ static bool nfsd_analyze_read_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  				  struct nfsd_read_dio *read_dio)
>  {
>  	const u32 dio_blocksize = nf->nf_dio_read_offset_align;
> -	loff_t orig_end = offset + len;
> +	loff_t middle_end, orig_end = offset + len;
>  
>  	if (WARN_ONCE(!nf->nf_dio_mem_align || !nf->nf_dio_read_offset_align,
>  		      "%s: underlying filesystem has not provided DIO alignment info\n",
> @@ -1133,10 +1133,11 @@ static bool nfsd_analyze_read_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	}
>  
>  	/* Show original offset and count, and how it was expanded for DIO */
> -	trace_nfsd_read_vector_dio(rqstp, fhp, offset, len,
> -				   read_dio->start, read_dio->start_extra,
> -				   read_dio->end, read_dio->end_extra);
> -
> +	middle_end = read_dio->end - read_dio->end_extra;
> +	trace_nfsd_analyze_dio(rqstp, fhp, READ, offset, len,
> +			       read_dio->start, read_dio->start_extra,
> +			       offset, (middle_end - offset),
> +			       middle_end, read_dio->end_extra);
>  	return true;
>  }
>  

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] NFSD: rename and update nfsd_read_vector_dio trace event to nfsd_analyze_dio
  2025-07-31 15:31   ` Jeff Layton
@ 2025-07-31 17:04     ` Mike Snitzer
  2025-07-31 18:10       ` Chuck Lever
  2025-07-31 19:04       ` Jeff Layton
  0 siblings, 2 replies; 9+ messages in thread
From: Mike Snitzer @ 2025-07-31 17:04 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Chuck Lever, linux-nfs

On Thu, Jul 31, 2025 at 11:31:29AM -0400, Jeff Layton wrote:
> On Wed, 2025-07-30 at 19:05 -0400, Mike Snitzer wrote:
> > From: Mike Snitzer <snitzer@hammerspace.com>
> > 
> > Rename nfsd_read_vector_dio trace event to nfsd_analyze_dio and update
> > it so that it provides useful tracing for both READ and WRITE.  This
> > prepares for nfsd_vfs_write() to also make use of it when handling
> > misaligned WRITEs.
> > 
> > Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
> > ---
> >  fs/nfsd/trace.h | 37 ++++++++++++++++++++++++-------------
> >  fs/nfsd/vfs.c   | 11 ++++++-----
> >  2 files changed, 30 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> > index 55055482f8a8..51b47fd041a8 100644
> > --- a/fs/nfsd/trace.h
> > +++ b/fs/nfsd/trace.h
> > @@ -473,41 +473,52 @@ DEFINE_NFSD_IO_EVENT(write_done);
> >  DEFINE_NFSD_IO_EVENT(commit_start);
> >  DEFINE_NFSD_IO_EVENT(commit_done);
> >  
> > -TRACE_EVENT(nfsd_read_vector_dio,
> > +TRACE_EVENT(nfsd_analyze_dio,
> >  	TP_PROTO(struct svc_rqst *rqstp,
> >  		 struct svc_fh	*fhp,
> > +		 u32		rw,
> 
> I would do this a bit differently. You're hardcoding READ and WRITE
> into both tracepoints. I would turn this trace event into a class a'la
> DECLARE_EVENT_CLASS, and then just define two different tracepoints
> (maybe trace_nfsd_analyze_read/write_dio). Then you can just drop the
> above u32 field, and it'll still be evident whether it's a read or
> write in the log.

Seems overkill to me, and also forces the need for user to enable
discrete tracepoints.  Could be good, could be bad.

But if I should be taking your feedback as: "do what Jeff suggested",
that's fine too, happy to do so! ;)

Thanks,
Mike

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] NFSD: rename and update nfsd_read_vector_dio trace event to nfsd_analyze_dio
  2025-07-31 17:04     ` Mike Snitzer
@ 2025-07-31 18:10       ` Chuck Lever
  2025-07-31 18:12         ` Mike Snitzer
  2025-07-31 19:04       ` Jeff Layton
  1 sibling, 1 reply; 9+ messages in thread
From: Chuck Lever @ 2025-07-31 18:10 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-nfs, Jeff Layton

On 7/31/25 1:04 PM, Mike Snitzer wrote:
> On Thu, Jul 31, 2025 at 11:31:29AM -0400, Jeff Layton wrote:
>> On Wed, 2025-07-30 at 19:05 -0400, Mike Snitzer wrote:
>>> From: Mike Snitzer <snitzer@hammerspace.com>
>>>
>>> Rename nfsd_read_vector_dio trace event to nfsd_analyze_dio and update
>>> it so that it provides useful tracing for both READ and WRITE.  This
>>> prepares for nfsd_vfs_write() to also make use of it when handling
>>> misaligned WRITEs.
>>>
>>> Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
>>> ---
>>>  fs/nfsd/trace.h | 37 ++++++++++++++++++++++++-------------
>>>  fs/nfsd/vfs.c   | 11 ++++++-----
>>>  2 files changed, 30 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
>>> index 55055482f8a8..51b47fd041a8 100644
>>> --- a/fs/nfsd/trace.h
>>> +++ b/fs/nfsd/trace.h
>>> @@ -473,41 +473,52 @@ DEFINE_NFSD_IO_EVENT(write_done);
>>>  DEFINE_NFSD_IO_EVENT(commit_start);
>>>  DEFINE_NFSD_IO_EVENT(commit_done);
>>>  
>>> -TRACE_EVENT(nfsd_read_vector_dio,
>>> +TRACE_EVENT(nfsd_analyze_dio,
>>>  	TP_PROTO(struct svc_rqst *rqstp,
>>>  		 struct svc_fh	*fhp,
>>> +		 u32		rw,
>>
>> I would do this a bit differently. You're hardcoding READ and WRITE
>> into both tracepoints. I would turn this trace event into a class a'la
>> DECLARE_EVENT_CLASS, and then just define two different tracepoints
>> (maybe trace_nfsd_analyze_read/write_dio). Then you can just drop the
>> above u32 field, and it'll still be evident whether it's a read or
>> write in the log.
> 
> Seems overkill to me, and also forces the need for user to enable
> discrete tracepoints.

Users can enable trace points with globs, so that's not a big deal in
most cases. What is more important is that, when you define individual
trace points, you can enable tracing for only reads or only writes.

The common trope is to define a class, as Jeff suggested. The I/O
direction is then recorded in the trace point and the extra field
is no longer necessary.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] NFSD: rename and update nfsd_read_vector_dio trace event to nfsd_analyze_dio
  2025-07-31 18:10       ` Chuck Lever
@ 2025-07-31 18:12         ` Mike Snitzer
  0 siblings, 0 replies; 9+ messages in thread
From: Mike Snitzer @ 2025-07-31 18:12 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, Jeff Layton

On Thu, Jul 31, 2025 at 02:10:09PM -0400, Chuck Lever wrote:
> On 7/31/25 1:04 PM, Mike Snitzer wrote:
> > On Thu, Jul 31, 2025 at 11:31:29AM -0400, Jeff Layton wrote:
> >> On Wed, 2025-07-30 at 19:05 -0400, Mike Snitzer wrote:
> >>> From: Mike Snitzer <snitzer@hammerspace.com>
> >>>
> >>> Rename nfsd_read_vector_dio trace event to nfsd_analyze_dio and update
> >>> it so that it provides useful tracing for both READ and WRITE.  This
> >>> prepares for nfsd_vfs_write() to also make use of it when handling
> >>> misaligned WRITEs.
> >>>
> >>> Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
> >>> ---
> >>>  fs/nfsd/trace.h | 37 ++++++++++++++++++++++++-------------
> >>>  fs/nfsd/vfs.c   | 11 ++++++-----
> >>>  2 files changed, 30 insertions(+), 18 deletions(-)
> >>>
> >>> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> >>> index 55055482f8a8..51b47fd041a8 100644
> >>> --- a/fs/nfsd/trace.h
> >>> +++ b/fs/nfsd/trace.h
> >>> @@ -473,41 +473,52 @@ DEFINE_NFSD_IO_EVENT(write_done);
> >>>  DEFINE_NFSD_IO_EVENT(commit_start);
> >>>  DEFINE_NFSD_IO_EVENT(commit_done);
> >>>  
> >>> -TRACE_EVENT(nfsd_read_vector_dio,
> >>> +TRACE_EVENT(nfsd_analyze_dio,
> >>>  	TP_PROTO(struct svc_rqst *rqstp,
> >>>  		 struct svc_fh	*fhp,
> >>> +		 u32		rw,
> >>
> >> I would do this a bit differently. You're hardcoding READ and WRITE
> >> into both tracepoints. I would turn this trace event into a class a'la
> >> DECLARE_EVENT_CLASS, and then just define two different tracepoints
> >> (maybe trace_nfsd_analyze_read/write_dio). Then you can just drop the
> >> above u32 field, and it'll still be evident whether it's a read or
> >> write in the log.
> > 
> > Seems overkill to me, and also forces the need for user to enable
> > discrete tracepoints.
> 
> Users can enable trace points with globs, so that's not a big deal in
> most cases. What is more important is that, when you define individual
> trace points, you can enable tracing for only reads or only writes.
> 
> The common trope is to define a class, as Jeff suggested. The I/O
> direction is then recorded in the trace point and the extra field
> is no longer necessary.

OK, I'll fix it up, thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3] NFSD: rename and update nfsd_read_vector_dio trace event to nfsd_analyze_dio
  2025-07-31 17:04     ` Mike Snitzer
  2025-07-31 18:10       ` Chuck Lever
@ 2025-07-31 19:04       ` Jeff Layton
  1 sibling, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2025-07-31 19:04 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Chuck Lever, linux-nfs

On Thu, 2025-07-31 at 13:04 -0400, Mike Snitzer wrote:
> On Thu, Jul 31, 2025 at 11:31:29AM -0400, Jeff Layton wrote:
> > On Wed, 2025-07-30 at 19:05 -0400, Mike Snitzer wrote:
> > > From: Mike Snitzer <snitzer@hammerspace.com>
> > > 
> > > Rename nfsd_read_vector_dio trace event to nfsd_analyze_dio and update
> > > it so that it provides useful tracing for both READ and WRITE.  This
> > > prepares for nfsd_vfs_write() to also make use of it when handling
> > > misaligned WRITEs.
> > > 
> > > Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
> > > ---
> > >  fs/nfsd/trace.h | 37 ++++++++++++++++++++++++-------------
> > >  fs/nfsd/vfs.c   | 11 ++++++-----
> > >  2 files changed, 30 insertions(+), 18 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> > > index 55055482f8a8..51b47fd041a8 100644
> > > --- a/fs/nfsd/trace.h
> > > +++ b/fs/nfsd/trace.h
> > > @@ -473,41 +473,52 @@ DEFINE_NFSD_IO_EVENT(write_done);
> > >  DEFINE_NFSD_IO_EVENT(commit_start);
> > >  DEFINE_NFSD_IO_EVENT(commit_done);
> > >  
> > > -TRACE_EVENT(nfsd_read_vector_dio,
> > > +TRACE_EVENT(nfsd_analyze_dio,
> > >  	TP_PROTO(struct svc_rqst *rqstp,
> > >  		 struct svc_fh	*fhp,
> > > +		 u32		rw,
> > 
> > I would do this a bit differently. You're hardcoding READ and WRITE
> > into both tracepoints. I would turn this trace event into a class a'la
> > DECLARE_EVENT_CLASS, and then just define two different tracepoints
> > (maybe trace_nfsd_analyze_read/write_dio). Then you can just drop the
> > above u32 field, and it'll still be evident whether it's a read or
> > write in the log.
> 
> Seems overkill to me, and also forces the need for user to enable
> discrete tracepoints.  Could be good, could be bad.
> 
> But if I should be taking your feedback as: "do what Jeff suggested",
> that's fine too, happy to do so! ;)
> 

I'd prefer it changed. It's pretty simple to convert a TRACE_EVENT to a
tracepoint class. It'll also consume less memory.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-07-31 19:04 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-30 23:05 [PATCH 0/3] NFSD DIRECT: add handling for misaligned WRITEs Mike Snitzer
2025-07-30 23:05 ` [PATCH 1/3] NFSD: rename and update nfsd_read_vector_dio trace event to nfsd_analyze_dio Mike Snitzer
2025-07-31 15:31   ` Jeff Layton
2025-07-31 17:04     ` Mike Snitzer
2025-07-31 18:10       ` Chuck Lever
2025-07-31 18:12         ` Mike Snitzer
2025-07-31 19:04       ` Jeff Layton
2025-07-30 23:05 ` [PATCH 2/3] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs Mike Snitzer
2025-07-30 23:05 ` [PATCH 3/3] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned Mike Snitzer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).