* [PATCH v4 0/4] NFSD DIRECT: misaligned READs fixup, add handling for misaligned WRITEs
@ 2025-08-05 18:44 Mike Snitzer
2025-08-05 18:44 ` [PATCH v4 1/4] NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read() Mike Snitzer
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Mike Snitzer @ 2025-08-05 18:44 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton; +Cc: linux-nfs
Hi,
This series builds on what has been staged in the nfsd-testing branch.
This code has proven to work well during my testing. Any suggestions
for further refinement are welcome.
Changes since v3:
- fixed nfsd_vfs_write() so work that only needs to happen once, after
IO is submitted, isn't performed each iteration of the for loop,
thanks for catching this Chuck.
- updated NFSD's misaligned READ and WRITE handling to not use
iov_iter_is_aligned() because it will soon be removed upstream.
- Chuck, provided both you and Jeff agree with patch 1's incremental
changes, patch 1 should be folded into what you already have in your
nfsd-testing branch (more justification in patch 1's header)
- dropped the NFSD misaligned DIO NFS reexport patch in favor of
adding STATX_DIOALIGN support to NFS (the patch to add support will
be provided in the next NFS DIRECT v7 patchset that I'll post soon).
Changes since v2:
- fixed patch 2 by moving redundant work out of nfsd_vfs_write()'s for
loop, thanks to Jeff's review.
- added Jeff's Reviewed-by to patches 1-3.
- Left patch 4 in the series because it is pragmatic, but feel free to
drop it if you'd prefer to see this cat skinned a different way.
Changes since v1:
- switched to using an EVENT_CLASS to create nfsd_analyze_{read,write}_dio
- added 4th patch, if user configured use of NFSD_IO_DIRECT then NFS
reexports should use it too (in future, with per-export controls
we'll have the benefit of finer-grained control; but until then we'd
do well to offer comprehensive use of NFSD_IO_DIRECT if it enabled).
Thanks,
Mike
Mike Snitzer (4):
NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read()
NFSD: refactor nfsd_read_vector_dio to EVENT_CLASS useful for READ and WRITE
NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs
NFSD: issue WRITEs using O_DIRECT even if IO is misaligned
fs/nfsd/trace.h | 52 +++++++++----
fs/nfsd/vfs.c | 199 +++++++++++++++++++++++++++++++++++++++---------
2 files changed, 202 insertions(+), 49 deletions(-)
--
2.44.0
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v4 1/4] NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read()
2025-08-05 18:44 [PATCH v4 0/4] NFSD DIRECT: misaligned READs fixup, add handling for misaligned WRITEs Mike Snitzer
@ 2025-08-05 18:44 ` Mike Snitzer
2025-08-06 13:18 ` Chuck Lever
2025-08-05 18:44 ` [PATCH v4 2/4] NFSD: refactor nfsd_read_vector_dio to EVENT_CLASS useful for READ and WRITE Mike Snitzer
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Mike Snitzer @ 2025-08-05 18:44 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton; +Cc: linux-nfs
From: Mike Snitzer <snitzer@hammerspace.com>
Check the bvec is DIO-aligned while creating it, saves CPU cycles by
avoiding iterating the bvec elements a second time using
iov_iter_is_aligned().
This prepares for Keith Busch's near-term removal of the
iov_iter_is_aligned() interface. This fixes cel/nfsd-testing commit
5d78ac1e674b4 ("NFSD: issue READs using O_DIRECT even if IO is
misaligned") and it should be folded into that commit so that NFSD
doesn't require iov_iter_is_aligned() while it is being removed
upstream in parallel.
Fixes: cel/nfsd-testing 5d78ac1e674b4 ("NFSD: issue READs using O_DIRECT even if IO is misaligned")
Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
fs/nfsd/vfs.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 46189020172fb..e1751d3715264 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1226,7 +1226,10 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
*/
offset = read_dio.start;
in_count = read_dio.end - offset;
- kiocb.ki_flags = IOCB_DIRECT;
+ /* Verify ondisk DIO alignment, memory addrs checked below */
+ if (likely(((offset | in_count) &
+ (nf->nf_dio_read_offset_align - 1)) == 0))
+ kiocb.ki_flags = IOCB_DIRECT;
}
} else if (nfsd_io_cache_read == NFSD_IO_DONTCACHE)
kiocb.ki_flags = IOCB_DONTCACHE;
@@ -1236,16 +1239,24 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
v = 0;
total = in_count;
if (read_dio.start_extra) {
- bvec_set_page(&rqstp->rq_bvec[v++], read_dio.start_extra_page,
+ bvec_set_page(&rqstp->rq_bvec[v], read_dio.start_extra_page,
read_dio.start_extra, PAGE_SIZE - read_dio.start_extra);
+ if (unlikely((kiocb.ki_flags & IOCB_DIRECT) &&
+ rqstp->rq_bvec[v].bv_offset & (nf->nf_dio_mem_align - 1)))
+ kiocb.ki_flags &= ~IOCB_DIRECT;
total -= read_dio.start_extra;
+ v++;
}
while (total) {
len = min_t(size_t, total, PAGE_SIZE - base);
- bvec_set_page(&rqstp->rq_bvec[v++], *(rqstp->rq_next_page++),
- len, base);
+ bvec_set_page(&rqstp->rq_bvec[v], *(rqstp->rq_next_page++), len, base);
+ /* No need to verify memory is DIO-aligned since bv_offset is 0 */
+ if (unlikely((kiocb.ki_flags & IOCB_DIRECT) && base &&
+ (base & (nf->nf_dio_mem_align - 1))))
+ kiocb.ki_flags &= ~IOCB_DIRECT;
total -= len;
base = 0;
+ v++;
}
if (WARN_ONCE(v > rqstp->rq_maxpages,
"%s: v=%lu exceeds rqstp->rq_maxpages=%lu\n", __func__,
@@ -1256,16 +1267,6 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (!host_err) {
trace_nfsd_read_vector(rqstp, fhp, offset, in_count);
iov_iter_bvec(&iter, ITER_DEST, rqstp->rq_bvec, v, in_count);
-
- /* Double check nfsd_analyze_read_dio's DIO-aligned result */
- if (unlikely((kiocb.ki_flags & IOCB_DIRECT) &&
- !iov_iter_is_aligned(&iter,
- nf->nf_dio_mem_align - 1,
- nf->nf_dio_read_offset_align - 1))) {
- /* Fallback to buffered IO */
- kiocb.ki_flags &= ~IOCB_DIRECT;
- }
-
host_err = vfs_iocb_iter_read(file, &kiocb, &iter);
}
--
2.44.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v4 2/4] NFSD: refactor nfsd_read_vector_dio to EVENT_CLASS useful for READ and WRITE
2025-08-05 18:44 [PATCH v4 0/4] NFSD DIRECT: misaligned READs fixup, add handling for misaligned WRITEs Mike Snitzer
2025-08-05 18:44 ` [PATCH v4 1/4] NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read() Mike Snitzer
@ 2025-08-05 18:44 ` Mike Snitzer
2025-08-05 18:44 ` [PATCH v4 3/4] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs Mike Snitzer
2025-08-05 18:44 ` [PATCH v4 4/4] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned Mike Snitzer
3 siblings, 0 replies; 13+ messages in thread
From: Mike Snitzer @ 2025-08-05 18:44 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton; +Cc: linux-nfs
Transform nfsd_read_vector_dio trace event into nfsd_analyze_dio_class
and use it to create nfsd_analyze_read_dio and nfsd_analyze_write_dio
trace events.
This prepares for nfsd_vfs_write() to also make use of it when
handling misaligned WRITEs.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/trace.h | 52 ++++++++++++++++++++++++++++++++++++-------------
fs/nfsd/vfs.c | 11 ++++++-----
2 files changed, 44 insertions(+), 19 deletions(-)
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 55055482f8a84..4173bd9344b6b 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -473,25 +473,29 @@ DEFINE_NFSD_IO_EVENT(write_done);
DEFINE_NFSD_IO_EVENT(commit_start);
DEFINE_NFSD_IO_EVENT(commit_done);
-TRACE_EVENT(nfsd_read_vector_dio,
+DECLARE_EVENT_CLASS(nfsd_analyze_dio_class,
TP_PROTO(struct svc_rqst *rqstp,
struct svc_fh *fhp,
u64 offset,
u32 len,
- loff_t start,
- loff_t start_extra,
- loff_t end,
- loff_t end_extra),
- TP_ARGS(rqstp, fhp, offset, len, start, start_extra, end, end_extra),
+ loff_t start,
+ ssize_t start_len,
+ loff_t middle,
+ ssize_t middle_len,
+ loff_t end,
+ ssize_t end_len),
+ TP_ARGS(rqstp, fhp, offset, len, start, start_len, middle, middle_len, end, end_len),
TP_STRUCT__entry(
__field(u32, xid)
__field(u32, fh_hash)
__field(u64, offset)
__field(u32, len)
__field(loff_t, start)
- __field(loff_t, start_extra)
+ __field(ssize_t, start_len)
+ __field(loff_t, middle)
+ __field(ssize_t, middle_len)
__field(loff_t, end)
- __field(loff_t, end_extra)
+ __field(ssize_t, end_len)
),
TP_fast_assign(
__entry->xid = be32_to_cpu(rqstp->rq_xid);
@@ -499,16 +503,36 @@ TRACE_EVENT(nfsd_read_vector_dio,
__entry->offset = offset;
__entry->len = len;
__entry->start = start;
- __entry->start_extra = start_extra;
+ __entry->start_len = start_len;
+ __entry->middle = middle;
+ __entry->middle_len = middle_len;
__entry->end = end;
- __entry->end_extra = end_extra;
+ __entry->end_len = end_len;
),
- TP_printk("xid=0x%08x fh_hash=0x%08x offset=%llu len=%u start=%llu+%llu end=%llu-%llu",
+ TP_printk("xid=0x%08x fh_hash=0x%08x offset=%llu len=%u start=%llu+%lu middle=%llu+%lu end=%llu+%lu",
__entry->xid, __entry->fh_hash,
__entry->offset, __entry->len,
- __entry->start, __entry->start_extra,
- __entry->end, __entry->end_extra)
-);
+ __entry->start, __entry->start_len,
+ __entry->middle, __entry->middle_len,
+ __entry->end, __entry->end_len)
+)
+
+#define DEFINE_NFSD_ANALYZE_DIO_EVENT(name) \
+DEFINE_EVENT(nfsd_analyze_dio_class, nfsd_analyze_##name##_dio, \
+ TP_PROTO(struct svc_rqst *rqstp, \
+ struct svc_fh *fhp, \
+ u64 offset, \
+ u32 len, \
+ loff_t start, \
+ ssize_t start_len, \
+ loff_t middle, \
+ ssize_t middle_len, \
+ loff_t end, \
+ ssize_t end_len), \
+ TP_ARGS(rqstp, fhp, offset, len, start, start_len, middle, middle_len, end, end_len))
+
+DEFINE_NFSD_ANALYZE_DIO_EVENT(read);
+DEFINE_NFSD_ANALYZE_DIO_EVENT(write);
DECLARE_EVENT_CLASS(nfsd_err_class,
TP_PROTO(struct svc_rqst *rqstp,
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index e1751d3715264..609b85f8bde3e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1094,7 +1094,7 @@ static bool nfsd_analyze_read_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
struct nfsd_read_dio *read_dio)
{
const u32 dio_blocksize = nf->nf_dio_read_offset_align;
- loff_t orig_end = offset + len;
+ loff_t middle_end, orig_end = offset + len;
if (WARN_ONCE(!nf->nf_dio_mem_align || !nf->nf_dio_read_offset_align,
"%s: underlying filesystem has not provided DIO alignment info\n",
@@ -1133,10 +1133,11 @@ static bool nfsd_analyze_read_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
}
/* Show original offset and count, and how it was expanded for DIO */
- trace_nfsd_read_vector_dio(rqstp, fhp, offset, len,
- read_dio->start, read_dio->start_extra,
- read_dio->end, read_dio->end_extra);
-
+ middle_end = read_dio->end - read_dio->end_extra;
+ trace_nfsd_analyze_read_dio(rqstp, fhp, offset, len,
+ read_dio->start, read_dio->start_extra,
+ offset, (middle_end - offset),
+ middle_end, read_dio->end_extra);
return true;
}
--
2.44.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v4 3/4] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs
2025-08-05 18:44 [PATCH v4 0/4] NFSD DIRECT: misaligned READs fixup, add handling for misaligned WRITEs Mike Snitzer
2025-08-05 18:44 ` [PATCH v4 1/4] NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read() Mike Snitzer
2025-08-05 18:44 ` [PATCH v4 2/4] NFSD: refactor nfsd_read_vector_dio to EVENT_CLASS useful for READ and WRITE Mike Snitzer
@ 2025-08-05 18:44 ` Mike Snitzer
2025-08-05 18:44 ` [PATCH v4 4/4] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned Mike Snitzer
3 siblings, 0 replies; 13+ messages in thread
From: Mike Snitzer @ 2025-08-05 18:44 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton; +Cc: linux-nfs
Refactor nfsd_vfs_write() to support splitting a WRITE into parts
(which will be either misaligned or DIO-aligned). Doing so in a
preliminary commit just allows for indentation and slight
transformation to be more easily understood and reviewed.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/vfs.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 609b85f8bde3e..0d4f9f452d466 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1342,7 +1342,6 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
struct super_block *sb = file_inode(file)->i_sb;
struct kiocb kiocb;
struct svc_export *exp;
- struct iov_iter iter;
errseq_t since;
__be32 nfserr;
int host_err;
@@ -1350,6 +1349,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
unsigned int pflags = current->flags;
bool restore_flags = false;
unsigned int nvecs;
+ struct iov_iter iter_stack[1];
+ struct iov_iter *iter = iter_stack;
+ unsigned int n_iters = 0;
trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
@@ -1379,14 +1381,15 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
kiocb.ki_flags |= IOCB_DSYNC;
nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
- iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
+ iov_iter_bvec(&iter[0], ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
+ n_iters++;
switch (nfsd_io_cache_write) {
case NFSD_IO_DIRECT:
/* direct I/O must be aligned to device logical sector size */
if (nf->nf_dio_mem_align && nf->nf_dio_offset_align &&
(((offset | *cnt) & (nf->nf_dio_offset_align-1)) == 0) &&
- iov_iter_is_aligned(&iter, nf->nf_dio_mem_align - 1,
+ iov_iter_is_aligned(&iter[0], nf->nf_dio_mem_align - 1,
nf->nf_dio_offset_align - 1))
kiocb.ki_flags = IOCB_DIRECT;
break;
@@ -1400,13 +1403,17 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
since = READ_ONCE(file->f_wb_err);
if (verf)
nfsd_copy_write_verifier(verf, nn);
- host_err = vfs_iocb_iter_write(file, &kiocb, &iter);
- if (host_err < 0) {
- commit_reset_write_verifier(nn, rqstp, host_err);
- goto out_nfserr;
+ *cnt = 0;
+ for (int i = 0; i < n_iters; i++) {
+ host_err = vfs_iocb_iter_write(file, &kiocb, &iter[i]);
+ if (host_err < 0) {
+ commit_reset_write_verifier(nn, rqstp, host_err);
+ goto out_nfserr;
+ }
+ *cnt += host_err;
+ nfsd_stats_io_write_add(nn, exp, host_err);
}
- *cnt = host_err;
- nfsd_stats_io_write_add(nn, exp, *cnt);
+
fsnotify_modify(file);
host_err = filemap_check_wb_err(file->f_mapping, since);
if (host_err < 0)
--
2.44.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH v4 4/4] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned
2025-08-05 18:44 [PATCH v4 0/4] NFSD DIRECT: misaligned READs fixup, add handling for misaligned WRITEs Mike Snitzer
` (2 preceding siblings ...)
2025-08-05 18:44 ` [PATCH v4 3/4] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs Mike Snitzer
@ 2025-08-05 18:44 ` Mike Snitzer
2025-08-06 13:53 ` Chuck Lever
3 siblings, 1 reply; 13+ messages in thread
From: Mike Snitzer @ 2025-08-05 18:44 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton; +Cc: linux-nfs
If NFSD_IO_DIRECT is used, split any misaligned WRITE into a start,
middle and end as needed. The large middle extent is DIO-aligned and
the start and/or end are misaligned. Buffered IO is used for the
misaligned extents and O_DIRECT is used for the middle DIO-aligned
extent.
The nfsd_analyze_write_dio trace event shows how NFSD splits a given
misaligned WRITE into a mix of misaligned extent(s) and a DIO-aligned
extent.
This combination of trace events is useful:
echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_opened/enable
echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_analyze_write_dio/enable
echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_io_done/enable
echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable
Which for this dd command:
dd if=/dev/zero of=/mnt/share1/test bs=47008 count=2 oflag=direct
Results in:
nfsd-55714 [043] ..... 79976.260851: nfsd_write_opened: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008
nfsd-55714 [043] ..... 79976.260852: nfsd_analyze_write_dio: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008 start=0+0 middle=0+45056 end=45056+1952
nfsd-55714 [043] ..... 79976.260857: xfs_file_direct_write: dev 259:12 ino 0x3e00008f disize 0x0 pos 0x0 bytecount 0xb000
nfsd-55714 [043] ..... 79976.260965: nfsd_write_io_done: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008
nfsd-55714 [043] ..... 79976.307762: nfsd_write_opened: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008
nfsd-55714 [043] ..... 79976.307762: nfsd_analyze_write_dio: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008 start=47008+2144 middle=49152+40960 end=90112+3904
nfsd-55714 [043] ..... 79976.307797: xfs_file_direct_write: dev 259:12 ino 0x3e00008f disize 0xc000 pos 0xc000 bytecount 0xa000
nfsd-55714 [043] ..... 79976.307866: nfsd_write_io_done: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/vfs.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 131 insertions(+), 11 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 0d4f9f452d466..4980800fab66e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1315,6 +1315,121 @@ static int wait_for_concurrent_writes(struct file *file)
return err;
}
+struct nfsd_write_dio
+{
+ loff_t middle_offset; /* Offset for start of DIO-aligned middle */
+ loff_t end_offset; /* Offset for start of DIO-aligned end */
+ ssize_t start_len; /* Length for misaligned first extent */
+ ssize_t middle_len; /* Length for DIO-aligned middle extent */
+ ssize_t end_len; /* Length for misaligned last extent */
+};
+
+static void init_nfsd_write_dio(struct nfsd_write_dio *write_dio)
+{
+ memset(write_dio, 0, sizeof(*write_dio));
+}
+
+static bool nfsd_analyze_write_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ struct nfsd_file *nf, loff_t offset,
+ unsigned long len, struct nfsd_write_dio *write_dio)
+{
+ const u32 dio_blocksize = nf->nf_dio_offset_align;
+ loff_t orig_end, middle_end, start_end, start_offset = offset;
+ ssize_t start_len = len;
+ bool aligned = true;
+
+ if (WARN_ONCE(!nf->nf_dio_mem_align || !dio_blocksize,
+ "%s: underlying filesystem has not provided DIO alignment info\n",
+ __func__))
+ return false;
+
+ if (WARN_ONCE(dio_blocksize > PAGE_SIZE,
+ "%s: underlying storage's dio_blocksize=%u > PAGE_SIZE=%lu\n",
+ __func__, dio_blocksize, PAGE_SIZE))
+ return false;
+
+ if (unlikely(len < dio_blocksize)) {
+ aligned = false;
+ goto out;
+ }
+
+ if (((offset | len) & (dio_blocksize-1)) == 0) {
+ /* already DIO-aligned, no misaligned head or tail */
+ write_dio->middle_offset = offset;
+ write_dio->middle_len = len;
+ /* clear these for the benefit of trace_nfsd_analyze_write_dio */
+ start_offset = 0;
+ start_len = 0;
+ goto out;
+ }
+
+ start_end = round_up(offset, dio_blocksize);
+ start_len = start_end - offset;
+ orig_end = offset + len;
+ middle_end = round_down(orig_end, dio_blocksize);
+
+ write_dio->start_len = start_len;
+ write_dio->middle_offset = start_end;
+ write_dio->middle_len = middle_end - start_end;
+ write_dio->end_offset = middle_end;
+ write_dio->end_len = orig_end - middle_end;
+out:
+ trace_nfsd_analyze_write_dio(rqstp, fhp, offset, len, start_offset, start_len,
+ write_dio->middle_offset, write_dio->middle_len,
+ write_dio->end_offset, write_dio->end_len);
+ return aligned;
+}
+
+/*
+ * Setup as many as 3 iov_iter based on extents possibly described by @write_dio.
+ * @iterp: pointer to pointer to onstack array of 3 iov_iter structs from caller.
+ * @iter_is_dio_aligned: pointer to onstack array of 3 bools from caller.
+ * @dio_aligned: bool that reflects nfsd_analyze_write_dio()'s return
+ * @rq_bvec: backing bio_vec used to setup all 3 iov_iter permutations.
+ * @nvecs: number of segments in @rq_bvec
+ * @cnt: size of the request in bytes
+ * @write_dio: nfsd_write_dio struct that describes start, middle and end extents.
+ *
+ * Returns the number of iov_iter that were setup.
+ */
+static int nfsd_setup_write_iters(struct iov_iter **iterp, bool *iter_is_dio_aligned,
+ bool dio_aligned, struct bio_vec *rq_bvec,
+ unsigned int nvecs, unsigned long cnt,
+ struct nfsd_write_dio *write_dio)
+{
+ int n_iters = 0;
+ struct iov_iter *iters = *iterp;
+
+ /* Setup misaligned start? */
+ if (write_dio->start_len) {
+ iter_is_dio_aligned[n_iters] = false;
+ iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
+ iters[n_iters].count = write_dio->start_len;
+ n_iters++;
+ }
+
+ /* Setup possibly DIO-aligned middle */
+ iter_is_dio_aligned[n_iters] = dio_aligned;
+ iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
+ if (dio_aligned) {
+ if (write_dio->start_len)
+ iov_iter_advance(&iters[n_iters], write_dio->start_len);
+ iters[n_iters].count -= write_dio->end_len;
+ }
+ n_iters++;
+
+ /* Setup misaligned end? */
+ if (write_dio->end_len) {
+ iter_is_dio_aligned[n_iters] = false;
+ iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
+ iov_iter_advance(&iters[n_iters],
+ write_dio->start_len + write_dio->middle_len);
+ n_iters++;
+ }
+
+ return n_iters;
+}
+
/**
* nfsd_vfs_write - write data to an already-open file
* @rqstp: RPC execution context
@@ -1349,9 +1464,12 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
unsigned int pflags = current->flags;
bool restore_flags = false;
unsigned int nvecs;
- struct iov_iter iter_stack[1];
+ struct iov_iter iter_stack[3];
struct iov_iter *iter = iter_stack;
unsigned int n_iters = 0;
+ bool iov_iter_is_dio_aligned[3];
+ bool dio_aligned = false;
+ struct nfsd_write_dio write_dio;
trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
@@ -1380,18 +1498,12 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (stable && !fhp->fh_use_wgather)
kiocb.ki_flags |= IOCB_DSYNC;
- nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
- iov_iter_bvec(&iter[0], ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
- n_iters++;
-
+ init_nfsd_write_dio(&write_dio);
switch (nfsd_io_cache_write) {
case NFSD_IO_DIRECT:
- /* direct I/O must be aligned to device logical sector size */
- if (nf->nf_dio_mem_align && nf->nf_dio_offset_align &&
- (((offset | *cnt) & (nf->nf_dio_offset_align-1)) == 0) &&
- iov_iter_is_aligned(&iter[0], nf->nf_dio_mem_align - 1,
- nf->nf_dio_offset_align - 1))
- kiocb.ki_flags = IOCB_DIRECT;
+ if (nfsd_analyze_write_dio(rqstp, fhp, nf, offset,
+ *cnt, &write_dio))
+ dio_aligned = true;
break;
case NFSD_IO_DONTCACHE:
kiocb.ki_flags = IOCB_DONTCACHE;
@@ -1400,11 +1512,19 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
break;
}
+ nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
+ n_iters = nfsd_setup_write_iters(&iter, iov_iter_is_dio_aligned, dio_aligned,
+ rqstp->rq_bvec, nvecs, *cnt, &write_dio);
+
since = READ_ONCE(file->f_wb_err);
if (verf)
nfsd_copy_write_verifier(verf, nn);
*cnt = 0;
for (int i = 0; i < n_iters; i++) {
+ if (iov_iter_is_dio_aligned[i])
+ kiocb.ki_flags |= IOCB_DIRECT;
+ else
+ kiocb.ki_flags &= ~IOCB_DIRECT;
host_err = vfs_iocb_iter_write(file, &kiocb, &iter[i]);
if (host_err < 0) {
commit_reset_write_verifier(nn, rqstp, host_err);
--
2.44.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v4 1/4] NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read()
2025-08-05 18:44 ` [PATCH v4 1/4] NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read() Mike Snitzer
@ 2025-08-06 13:18 ` Chuck Lever
2025-08-06 15:57 ` Mike Snitzer
0 siblings, 1 reply; 13+ messages in thread
From: Chuck Lever @ 2025-08-06 13:18 UTC (permalink / raw)
To: Mike Snitzer, Jeff Layton; +Cc: linux-nfs
On 8/5/25 2:44 PM, Mike Snitzer wrote:
> From: Mike Snitzer <snitzer@hammerspace.com>
>
> Check the bvec is DIO-aligned while creating it, saves CPU cycles by
> avoiding iterating the bvec elements a second time using
> iov_iter_is_aligned().
>
> This prepares for Keith Busch's near-term removal of the
> iov_iter_is_aligned() interface. This fixes cel/nfsd-testing commit
> 5d78ac1e674b4 ("NFSD: issue READs using O_DIRECT even if IO is
> misaligned") and it should be folded into that commit so that NFSD
> doesn't require iov_iter_is_aligned() while it is being removed
> upstream in parallel.
>
> Fixes: cel/nfsd-testing 5d78ac1e674b4 ("NFSD: issue READs using O_DIRECT even if IO is misaligned")
> Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
> ---
> fs/nfsd/vfs.c | 29 +++++++++++++++--------------
> 1 file changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 46189020172fb..e1751d3715264 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1226,7 +1226,10 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> */
> offset = read_dio.start;
> in_count = read_dio.end - offset;
> - kiocb.ki_flags = IOCB_DIRECT;
> + /* Verify ondisk DIO alignment, memory addrs checked below */
> + if (likely(((offset | in_count) &
> + (nf->nf_dio_read_offset_align - 1)) == 0))
> + kiocb.ki_flags = IOCB_DIRECT;
> }
> } else if (nfsd_io_cache_read == NFSD_IO_DONTCACHE)
> kiocb.ki_flags = IOCB_DONTCACHE;
> @@ -1236,16 +1239,24 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> v = 0;
> total = in_count;
> if (read_dio.start_extra) {
> - bvec_set_page(&rqstp->rq_bvec[v++], read_dio.start_extra_page,
> + bvec_set_page(&rqstp->rq_bvec[v], read_dio.start_extra_page,
> read_dio.start_extra, PAGE_SIZE - read_dio.start_extra);
> + if (unlikely((kiocb.ki_flags & IOCB_DIRECT) &&
> + rqstp->rq_bvec[v].bv_offset & (nf->nf_dio_mem_align - 1)))
> + kiocb.ki_flags &= ~IOCB_DIRECT;
> total -= read_dio.start_extra;
> + v++;
> }
> while (total) {
> len = min_t(size_t, total, PAGE_SIZE - base);
> - bvec_set_page(&rqstp->rq_bvec[v++], *(rqstp->rq_next_page++),
> - len, base);
> + bvec_set_page(&rqstp->rq_bvec[v], *(rqstp->rq_next_page++), len, base);
> + /* No need to verify memory is DIO-aligned since bv_offset is 0 */
> + if (unlikely((kiocb.ki_flags & IOCB_DIRECT) && base &&
> + (base & (nf->nf_dio_mem_align - 1))))
> + kiocb.ki_flags &= ~IOCB_DIRECT;
> total -= len;
> base = 0;
> + v++;
> }
> if (WARN_ONCE(v > rqstp->rq_maxpages,
> "%s: v=%lu exceeds rqstp->rq_maxpages=%lu\n", __func__,
> @@ -1256,16 +1267,6 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (!host_err) {
> trace_nfsd_read_vector(rqstp, fhp, offset, in_count);
> iov_iter_bvec(&iter, ITER_DEST, rqstp->rq_bvec, v, in_count);
> -
> - /* Double check nfsd_analyze_read_dio's DIO-aligned result */
> - if (unlikely((kiocb.ki_flags & IOCB_DIRECT) &&
> - !iov_iter_is_aligned(&iter,
> - nf->nf_dio_mem_align - 1,
> - nf->nf_dio_read_offset_align - 1))) {
> - /* Fallback to buffered IO */
> - kiocb.ki_flags &= ~IOCB_DIRECT;
> - }
> -
> host_err = vfs_iocb_iter_read(file, &kiocb, &iter);
> }
>
Hi Mike,
In cases where the SQUASHME patch is this large, I usually drop the
patch (or series) in nfsd-testing and ask the contributor to rebase and
repost. This gets the new version of the patch properly archived on
lore, for one thing.
Before reposting, please do run checkpatch.pl on the series.
--
Chuck Lever
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v4 4/4] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned
2025-08-05 18:44 ` [PATCH v4 4/4] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned Mike Snitzer
@ 2025-08-06 13:53 ` Chuck Lever
2025-08-06 15:55 ` Mike Snitzer
0 siblings, 1 reply; 13+ messages in thread
From: Chuck Lever @ 2025-08-06 13:53 UTC (permalink / raw)
To: Mike Snitzer; +Cc: linux-nfs, Jeff Layton
On 8/5/25 2:44 PM, Mike Snitzer wrote:
> If NFSD_IO_DIRECT is used, split any misaligned WRITE into a start,
> middle and end as needed. The large middle extent is DIO-aligned and
> the start and/or end are misaligned. Buffered IO is used for the
> misaligned extents and O_DIRECT is used for the middle DIO-aligned
> extent.
>
> The nfsd_analyze_write_dio trace event shows how NFSD splits a given
> misaligned WRITE into a mix of misaligned extent(s) and a DIO-aligned
> extent.
>
> This combination of trace events is useful:
>
> echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_opened/enable
> echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_analyze_write_dio/enable
> echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_io_done/enable
> echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable
>
> Which for this dd command:
>
> dd if=/dev/zero of=/mnt/share1/test bs=47008 count=2 oflag=direct
>
> Results in:
>
> nfsd-55714 [043] ..... 79976.260851: nfsd_write_opened: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008
> nfsd-55714 [043] ..... 79976.260852: nfsd_analyze_write_dio: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008 start=0+0 middle=0+45056 end=45056+1952
> nfsd-55714 [043] ..... 79976.260857: xfs_file_direct_write: dev 259:12 ino 0x3e00008f disize 0x0 pos 0x0 bytecount 0xb000
> nfsd-55714 [043] ..... 79976.260965: nfsd_write_io_done: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008
>
> nfsd-55714 [043] ..... 79976.307762: nfsd_write_opened: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008
> nfsd-55714 [043] ..... 79976.307762: nfsd_analyze_write_dio: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008 start=47008+2144 middle=49152+40960 end=90112+3904
> nfsd-55714 [043] ..... 79976.307797: xfs_file_direct_write: dev 259:12 ino 0x3e00008f disize 0xc000 pos 0xc000 bytecount 0xa000
> nfsd-55714 [043] ..... 79976.307866: nfsd_write_io_done: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> ---
> fs/nfsd/vfs.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 131 insertions(+), 11 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 0d4f9f452d466..4980800fab66e 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1315,6 +1315,121 @@ static int wait_for_concurrent_writes(struct file *file)
> return err;
> }
>
> +struct nfsd_write_dio
> +{
struct nfsd_write_dio {
> + loff_t middle_offset; /* Offset for start of DIO-aligned middle */
> + loff_t end_offset; /* Offset for start of DIO-aligned end */
> + ssize_t start_len; /* Length for misaligned first extent */
> + ssize_t middle_len; /* Length for DIO-aligned middle extent */
> + ssize_t end_len; /* Length for misaligned last extent */
> +};
> +
> +static void init_nfsd_write_dio(struct nfsd_write_dio *write_dio)
> +{
> + memset(write_dio, 0, sizeof(*write_dio));
> +}
> +
> +static bool nfsd_analyze_write_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
> + struct nfsd_file *nf, loff_t offset,
> + unsigned long len, struct nfsd_write_dio *write_dio)
> +{
> + const u32 dio_blocksize = nf->nf_dio_offset_align;
> + loff_t orig_end, middle_end, start_end, start_offset = offset;
> + ssize_t start_len = len;
> + bool aligned = true;
> +
> + if (WARN_ONCE(!nf->nf_dio_mem_align || !dio_blocksize,
> + "%s: underlying filesystem has not provided DIO alignment info\n",
> + __func__))
> + return false;
> +
> + if (WARN_ONCE(dio_blocksize > PAGE_SIZE,
> + "%s: underlying storage's dio_blocksize=%u > PAGE_SIZE=%lu\n",
> + __func__, dio_blocksize, PAGE_SIZE))
> + return false;
> +
> + if (unlikely(len < dio_blocksize)) {
> + aligned = false;
> + goto out;
> + }
> +
> + if (((offset | len) & (dio_blocksize-1)) == 0) {
> + /* already DIO-aligned, no misaligned head or tail */
> + write_dio->middle_offset = offset;
> + write_dio->middle_len = len;
> + /* clear these for the benefit of trace_nfsd_analyze_write_dio */
> + start_offset = 0;
> + start_len = 0;
> + goto out;
> + }
> +
> + start_end = round_up(offset, dio_blocksize);
> + start_len = start_end - offset;
> + orig_end = offset + len;
> + middle_end = round_down(orig_end, dio_blocksize);
> +
> + write_dio->start_len = start_len;
> + write_dio->middle_offset = start_end;
> + write_dio->middle_len = middle_end - start_end;
> + write_dio->end_offset = middle_end;
> + write_dio->end_len = orig_end - middle_end;
> +out:
> + trace_nfsd_analyze_write_dio(rqstp, fhp, offset, len, start_offset, start_len,
> + write_dio->middle_offset, write_dio->middle_len,
> + write_dio->end_offset, write_dio->end_len);
> + return aligned;
> +}
> +
> +/*
> + * Setup as many as 3 iov_iter based on extents possibly described by @write_dio.
> + * @iterp: pointer to pointer to onstack array of 3 iov_iter structs from caller.
> + * @iter_is_dio_aligned: pointer to onstack array of 3 bools from caller.
> + * @dio_aligned: bool that reflects nfsd_analyze_write_dio()'s return
> + * @rq_bvec: backing bio_vec used to setup all 3 iov_iter permutations.
> + * @nvecs: number of segments in @rq_bvec
> + * @cnt: size of the request in bytes
> + * @write_dio: nfsd_write_dio struct that describes start, middle and end extents.
> + *
> + * Returns the number of iov_iter that were setup.
> + */
> +static int nfsd_setup_write_iters(struct iov_iter **iterp, bool *iter_is_dio_aligned,
> + bool dio_aligned, struct bio_vec *rq_bvec,
> + unsigned int nvecs, unsigned long cnt,
> + struct nfsd_write_dio *write_dio)
> +{
> + int n_iters = 0;
> + struct iov_iter *iters = *iterp;
> +
> + /* Setup misaligned start? */
> + if (write_dio->start_len) {
> + iter_is_dio_aligned[n_iters] = false;
> + iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
> + iters[n_iters].count = write_dio->start_len;
> + n_iters++;
> + }
> +
> + /* Setup possibly DIO-aligned middle */
> + iter_is_dio_aligned[n_iters] = dio_aligned;
> + iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
> + if (dio_aligned) {
> + if (write_dio->start_len)
> + iov_iter_advance(&iters[n_iters], write_dio->start_len);
> + iters[n_iters].count -= write_dio->end_len;
> + }
> + n_iters++;
> +
> + /* Setup misaligned end? */
> + if (write_dio->end_len) {
> + iter_is_dio_aligned[n_iters] = false;
> + iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
> + iov_iter_advance(&iters[n_iters],
> + write_dio->start_len + write_dio->middle_len);
> + n_iters++;
> + }
> +
> + return n_iters;
> +}
> +
> /**
> * nfsd_vfs_write - write data to an already-open file
> * @rqstp: RPC execution context
> @@ -1349,9 +1464,12 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> unsigned int pflags = current->flags;
> bool restore_flags = false;
> unsigned int nvecs;
> - struct iov_iter iter_stack[1];
> + struct iov_iter iter_stack[3];
struct iov_iter isn't that small. This is going to grow the stack frame
substantially but is used for only the direct I/O case.
> struct iov_iter *iter = iter_stack;
> unsigned int n_iters = 0;
> + bool iov_iter_is_dio_aligned[3];
> + bool dio_aligned = false;
> + struct nfsd_write_dio write_dio;
>
> trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
>
> @@ -1380,18 +1498,12 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (stable && !fhp->fh_use_wgather)
> kiocb.ki_flags |= IOCB_DSYNC;
>
> - nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
> - iov_iter_bvec(&iter[0], ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
> - n_iters++;
> -
> + init_nfsd_write_dio(&write_dio);
I assume init_nfsd_write_dio() is going to be called only once.
Is there a plan to make it more than a memset() ? Can it be called
in only direct I/O mode?
> switch (nfsd_io_cache_write) {
> case NFSD_IO_DIRECT:
> - /* direct I/O must be aligned to device logical sector size */
> - if (nf->nf_dio_mem_align && nf->nf_dio_offset_align &&
> - (((offset | *cnt) & (nf->nf_dio_offset_align-1)) == 0) &&
> - iov_iter_is_aligned(&iter[0], nf->nf_dio_mem_align - 1,
> - nf->nf_dio_offset_align - 1))
> - kiocb.ki_flags = IOCB_DIRECT;
> + if (nfsd_analyze_write_dio(rqstp, fhp, nf, offset,
> + *cnt, &write_dio))
> + dio_aligned = true;
How about
dio_aligned = nfsd_analyze_write_dio(rqstp, fhp, nf,
offset, *cnt,
&write_dio);
Let's make nfsd_analyze_write_dio a "noinline" so that the compiler
removes it from the hot path in page cache I/O mode.
> break;
> case NFSD_IO_DONTCACHE:
> kiocb.ki_flags = IOCB_DONTCACHE;
> @@ -1400,11 +1512,19 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> break;
> }
>
> + nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
> + n_iters = nfsd_setup_write_iters(&iter, iov_iter_is_dio_aligned, dio_aligned,
> + rqstp->rq_bvec, nvecs, *cnt, &write_dio);
Is there a plan to use buffer re-alignment for the other two I/O modes?
I ask because there are many more conditional branches now, and they
seem to be useful only if there are multiple iters. And it looks like
there are multiple iters only in the direct I/O case.
Generally what we do in situations like this is create utility functions
that contain code common to all paths, and have the separate paths use
those helpers in the combination that they need. Not only is the
instruction path length shorter for each individual path, but the
resulting source code is much more legible.
> +
> since = READ_ONCE(file->f_wb_err);
> if (verf)
> nfsd_copy_write_verifier(verf, nn);
> *cnt = 0;
> for (int i = 0; i < n_iters; i++) {
> + if (iov_iter_is_dio_aligned[i])
> + kiocb.ki_flags |= IOCB_DIRECT;
> + else
> + kiocb.ki_flags &= ~IOCB_DIRECT;
> host_err = vfs_iocb_iter_write(file, &kiocb, &iter[i]);
> if (host_err < 0) {
> commit_reset_write_verifier(nn, rqstp, host_err);
--
Chuck Lever
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v4 4/4] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned
2025-08-06 13:53 ` Chuck Lever
@ 2025-08-06 15:55 ` Mike Snitzer
0 siblings, 0 replies; 13+ messages in thread
From: Mike Snitzer @ 2025-08-06 15:55 UTC (permalink / raw)
To: Chuck Lever; +Cc: linux-nfs, Jeff Layton
On Wed, Aug 06, 2025 at 09:53:10AM -0400, Chuck Lever wrote:
> On 8/5/25 2:44 PM, Mike Snitzer wrote:
> > If NFSD_IO_DIRECT is used, split any misaligned WRITE into a start,
> > middle and end as needed. The large middle extent is DIO-aligned and
> > the start and/or end are misaligned. Buffered IO is used for the
> > misaligned extents and O_DIRECT is used for the middle DIO-aligned
> > extent.
> >
> > The nfsd_analyze_write_dio trace event shows how NFSD splits a given
> > misaligned WRITE into a mix of misaligned extent(s) and a DIO-aligned
> > extent.
> >
> > This combination of trace events is useful:
> >
> > echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_opened/enable
> > echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_analyze_write_dio/enable
> > echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_io_done/enable
> > echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable
> >
> > Which for this dd command:
> >
> > dd if=/dev/zero of=/mnt/share1/test bs=47008 count=2 oflag=direct
> >
> > Results in:
> >
> > nfsd-55714 [043] ..... 79976.260851: nfsd_write_opened: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008
> > nfsd-55714 [043] ..... 79976.260852: nfsd_analyze_write_dio: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008 start=0+0 middle=0+45056 end=45056+1952
> > nfsd-55714 [043] ..... 79976.260857: xfs_file_direct_write: dev 259:12 ino 0x3e00008f disize 0x0 pos 0x0 bytecount 0xb000
> > nfsd-55714 [043] ..... 79976.260965: nfsd_write_io_done: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008
> >
> > nfsd-55714 [043] ..... 79976.307762: nfsd_write_opened: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008
> > nfsd-55714 [043] ..... 79976.307762: nfsd_analyze_write_dio: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008 start=47008+2144 middle=49152+40960 end=90112+3904
> > nfsd-55714 [043] ..... 79976.307797: xfs_file_direct_write: dev 259:12 ino 0x3e00008f disize 0xc000 pos 0xc000 bytecount 0xa000
> > nfsd-55714 [043] ..... 79976.307866: nfsd_write_io_done: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008
> >
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > fs/nfsd/vfs.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 131 insertions(+), 11 deletions(-)
> >
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 0d4f9f452d466..4980800fab66e 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1315,6 +1315,121 @@ static int wait_for_concurrent_writes(struct file *file)
> > return err;
> > }
> >
> > +struct nfsd_write_dio
> > +{
>
> struct nfsd_write_dio {
Yeap, fixed now ;)
> > + loff_t middle_offset; /* Offset for start of DIO-aligned middle */
> > + loff_t end_offset; /* Offset for start of DIO-aligned end */
> > + ssize_t start_len; /* Length for misaligned first extent */
> > + ssize_t middle_len; /* Length for DIO-aligned middle extent */
> > + ssize_t end_len; /* Length for misaligned last extent */
> > +};
> > +
> > +static void init_nfsd_write_dio(struct nfsd_write_dio *write_dio)
> > +{
> > + memset(write_dio, 0, sizeof(*write_dio));
> > +}
> > +
> > +static bool nfsd_analyze_write_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > + struct nfsd_file *nf, loff_t offset,
> > + unsigned long len, struct nfsd_write_dio *write_dio)
> > +{
> > + const u32 dio_blocksize = nf->nf_dio_offset_align;
> > + loff_t orig_end, middle_end, start_end, start_offset = offset;
> > + ssize_t start_len = len;
> > + bool aligned = true;
> > +
> > + if (WARN_ONCE(!nf->nf_dio_mem_align || !dio_blocksize,
> > + "%s: underlying filesystem has not provided DIO alignment info\n",
> > + __func__))
> > + return false;
> > +
> > + if (WARN_ONCE(dio_blocksize > PAGE_SIZE,
> > + "%s: underlying storage's dio_blocksize=%u > PAGE_SIZE=%lu\n",
> > + __func__, dio_blocksize, PAGE_SIZE))
> > + return false;
> > +
> > + if (unlikely(len < dio_blocksize)) {
> > + aligned = false;
> > + goto out;
> > + }
> > +
> > + if (((offset | len) & (dio_blocksize-1)) == 0) {
> > + /* already DIO-aligned, no misaligned head or tail */
> > + write_dio->middle_offset = offset;
> > + write_dio->middle_len = len;
> > + /* clear these for the benefit of trace_nfsd_analyze_write_dio */
> > + start_offset = 0;
> > + start_len = 0;
> > + goto out;
> > + }
> > +
> > + start_end = round_up(offset, dio_blocksize);
> > + start_len = start_end - offset;
> > + orig_end = offset + len;
> > + middle_end = round_down(orig_end, dio_blocksize);
> > +
> > + write_dio->start_len = start_len;
> > + write_dio->middle_offset = start_end;
> > + write_dio->middle_len = middle_end - start_end;
> > + write_dio->end_offset = middle_end;
> > + write_dio->end_len = orig_end - middle_end;
> > +out:
> > + trace_nfsd_analyze_write_dio(rqstp, fhp, offset, len, start_offset, start_len,
> > + write_dio->middle_offset, write_dio->middle_len,
> > + write_dio->end_offset, write_dio->end_len);
> > + return aligned;
> > +}
> > +
> > +/*
> > + * Setup as many as 3 iov_iter based on extents possibly described by @write_dio.
> > + * @iterp: pointer to pointer to onstack array of 3 iov_iter structs from caller.
> > + * @iter_is_dio_aligned: pointer to onstack array of 3 bools from caller.
> > + * @dio_aligned: bool that reflects nfsd_analyze_write_dio()'s return
> > + * @rq_bvec: backing bio_vec used to setup all 3 iov_iter permutations.
> > + * @nvecs: number of segments in @rq_bvec
> > + * @cnt: size of the request in bytes
> > + * @write_dio: nfsd_write_dio struct that describes start, middle and end extents.
> > + *
> > + * Returns the number of iov_iter that were setup.
> > + */
> > +static int nfsd_setup_write_iters(struct iov_iter **iterp, bool *iter_is_dio_aligned,
> > + bool dio_aligned, struct bio_vec *rq_bvec,
> > + unsigned int nvecs, unsigned long cnt,
> > + struct nfsd_write_dio *write_dio)
> > +{
> > + int n_iters = 0;
> > + struct iov_iter *iters = *iterp;
> > +
> > + /* Setup misaligned start? */
> > + if (write_dio->start_len) {
> > + iter_is_dio_aligned[n_iters] = false;
> > + iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
> > + iters[n_iters].count = write_dio->start_len;
> > + n_iters++;
> > + }
> > +
> > + /* Setup possibly DIO-aligned middle */
> > + iter_is_dio_aligned[n_iters] = dio_aligned;
> > + iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
> > + if (dio_aligned) {
> > + if (write_dio->start_len)
> > + iov_iter_advance(&iters[n_iters], write_dio->start_len);
> > + iters[n_iters].count -= write_dio->end_len;
> > + }
> > + n_iters++;
> > +
> > + /* Setup misaligned end? */
> > + if (write_dio->end_len) {
> > + iter_is_dio_aligned[n_iters] = false;
> > + iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
> > + iov_iter_advance(&iters[n_iters],
> > + write_dio->start_len + write_dio->middle_len);
> > + n_iters++;
> > + }
> > +
> > + return n_iters;
> > +}
> > +
> > /**
> > * nfsd_vfs_write - write data to an already-open file
> > * @rqstp: RPC execution context
> > @@ -1349,9 +1464,12 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > unsigned int pflags = current->flags;
> > bool restore_flags = false;
> > unsigned int nvecs;
> > - struct iov_iter iter_stack[1];
> > + struct iov_iter iter_stack[3];
>
> struct iov_iter isn't that small. This is going to grow the stack frame
> substantially but is used for only the direct I/O case.
Yes, that's the only lingering footprint I have after another pass at
cleanup based on your IPL feedback yesterday.
I expect to be able to push the use of multiple iov_iter down into the
O_DIRECT path only.
> > struct iov_iter *iter = iter_stack;
> > unsigned int n_iters = 0;
> > + bool iov_iter_is_dio_aligned[3];
> > + bool dio_aligned = false;
> > + struct nfsd_write_dio write_dio;
> >
> > trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
> >
> > @@ -1380,18 +1498,12 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > if (stable && !fhp->fh_use_wgather)
> > kiocb.ki_flags |= IOCB_DSYNC;
> >
> > - nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
> > - iov_iter_bvec(&iter[0], ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
> > - n_iters++;
> > -
> > + init_nfsd_write_dio(&write_dio);
>
> I assume init_nfsd_write_dio() is going to be called only once.
> Is there a plan to make it more than a memset() ? Can it be called
> in only direct I/O mode?
Yes, I've fixed this.
> > switch (nfsd_io_cache_write) {
> > case NFSD_IO_DIRECT:
> > - /* direct I/O must be aligned to device logical sector size */
> > - if (nf->nf_dio_mem_align && nf->nf_dio_offset_align &&
> > - (((offset | *cnt) & (nf->nf_dio_offset_align-1)) == 0) &&
> > - iov_iter_is_aligned(&iter[0], nf->nf_dio_mem_align - 1,
> > - nf->nf_dio_offset_align - 1))
> > - kiocb.ki_flags = IOCB_DIRECT;
> > + if (nfsd_analyze_write_dio(rqstp, fhp, nf, offset,
> > + *cnt, &write_dio))
> > + dio_aligned = true;
>
> How about
>
> dio_aligned = nfsd_analyze_write_dio(rqstp, fhp, nf,
> offset, *cnt,
> &write_dio);
I've iterated on things a bit, no longer need dio_aligned variable.
> Let's make nfsd_analyze_write_dio a "noinline" so that the compiler
> removes it from the hot path in page cache I/O mode.
OK, will do.
> > break;
> > case NFSD_IO_DONTCACHE:
> > kiocb.ki_flags = IOCB_DONTCACHE;
> > @@ -1400,11 +1512,19 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > break;
> > }
> >
> > + nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
> > + n_iters = nfsd_setup_write_iters(&iter, iov_iter_is_dio_aligned, dio_aligned,
> > + rqstp->rq_bvec, nvecs, *cnt, &write_dio);
>
> Is there a plan to use buffer re-alignment for the other two I/O modes?
>
> I ask because there are many more conditional branches now, and they
> seem to be useful only if there are multiple iters. And it looks like
> there are multiple iters only in the direct I/O case.
>
> Generally what we do in situations like this is create utility functions
> that contain code common to all paths, and have the separate paths use
> those helpers in the combination that they need. Not only is the
> instruction path length shorter for each individual path, but the
> resulting source code is much more legible.
Yes, I am now working on doing just that.
Thanks,
Mike
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v4 1/4] NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read()
2025-08-06 13:18 ` Chuck Lever
@ 2025-08-06 15:57 ` Mike Snitzer
2025-08-06 15:58 ` Chuck Lever
2025-08-07 15:50 ` sparse warnings with nfsd-testing [was: Re: [PATCH v4 1/4] NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read()] Mike Snitzer
0 siblings, 2 replies; 13+ messages in thread
From: Mike Snitzer @ 2025-08-06 15:57 UTC (permalink / raw)
To: Chuck Lever; +Cc: Jeff Layton, linux-nfs
On Wed, Aug 06, 2025 at 09:18:51AM -0400, Chuck Lever wrote:
> On 8/5/25 2:44 PM, Mike Snitzer wrote:
> > From: Mike Snitzer <snitzer@hammerspace.com>
> >
> > Check the bvec is DIO-aligned while creating it, saves CPU cycles by
> > avoiding iterating the bvec elements a second time using
> > iov_iter_is_aligned().
> >
> > This prepares for Keith Busch's near-term removal of the
> > iov_iter_is_aligned() interface. This fixes cel/nfsd-testing commit
> > 5d78ac1e674b4 ("NFSD: issue READs using O_DIRECT even if IO is
> > misaligned") and it should be folded into that commit so that NFSD
> > doesn't require iov_iter_is_aligned() while it is being removed
> > upstream in parallel.
> >
> > Fixes: cel/nfsd-testing 5d78ac1e674b4 ("NFSD: issue READs using O_DIRECT even if IO is misaligned")
> > Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
> > ---
> > fs/nfsd/vfs.c | 29 +++++++++++++++--------------
> > 1 file changed, 15 insertions(+), 14 deletions(-)
> >
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 46189020172fb..e1751d3715264 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1226,7 +1226,10 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > */
> > offset = read_dio.start;
> > in_count = read_dio.end - offset;
> > - kiocb.ki_flags = IOCB_DIRECT;
> > + /* Verify ondisk DIO alignment, memory addrs checked below */
> > + if (likely(((offset | in_count) &
> > + (nf->nf_dio_read_offset_align - 1)) == 0))
> > + kiocb.ki_flags = IOCB_DIRECT;
> > }
> > } else if (nfsd_io_cache_read == NFSD_IO_DONTCACHE)
> > kiocb.ki_flags = IOCB_DONTCACHE;
> > @@ -1236,16 +1239,24 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > v = 0;
> > total = in_count;
> > if (read_dio.start_extra) {
> > - bvec_set_page(&rqstp->rq_bvec[v++], read_dio.start_extra_page,
> > + bvec_set_page(&rqstp->rq_bvec[v], read_dio.start_extra_page,
> > read_dio.start_extra, PAGE_SIZE - read_dio.start_extra);
> > + if (unlikely((kiocb.ki_flags & IOCB_DIRECT) &&
> > + rqstp->rq_bvec[v].bv_offset & (nf->nf_dio_mem_align - 1)))
> > + kiocb.ki_flags &= ~IOCB_DIRECT;
> > total -= read_dio.start_extra;
> > + v++;
> > }
> > while (total) {
> > len = min_t(size_t, total, PAGE_SIZE - base);
> > - bvec_set_page(&rqstp->rq_bvec[v++], *(rqstp->rq_next_page++),
> > - len, base);
> > + bvec_set_page(&rqstp->rq_bvec[v], *(rqstp->rq_next_page++), len, base);
> > + /* No need to verify memory is DIO-aligned since bv_offset is 0 */
> > + if (unlikely((kiocb.ki_flags & IOCB_DIRECT) && base &&
> > + (base & (nf->nf_dio_mem_align - 1))))
> > + kiocb.ki_flags &= ~IOCB_DIRECT;
> > total -= len;
> > base = 0;
> > + v++;
> > }
> > if (WARN_ONCE(v > rqstp->rq_maxpages,
> > "%s: v=%lu exceeds rqstp->rq_maxpages=%lu\n", __func__,
> > @@ -1256,16 +1267,6 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > if (!host_err) {
> > trace_nfsd_read_vector(rqstp, fhp, offset, in_count);
> > iov_iter_bvec(&iter, ITER_DEST, rqstp->rq_bvec, v, in_count);
> > -
> > - /* Double check nfsd_analyze_read_dio's DIO-aligned result */
> > - if (unlikely((kiocb.ki_flags & IOCB_DIRECT) &&
> > - !iov_iter_is_aligned(&iter,
> > - nf->nf_dio_mem_align - 1,
> > - nf->nf_dio_read_offset_align - 1))) {
> > - /* Fallback to buffered IO */
> > - kiocb.ki_flags &= ~IOCB_DIRECT;
> > - }
> > -
> > host_err = vfs_iocb_iter_read(file, &kiocb, &iter);
> > }
> >
>
> Hi Mike,
>
> In cases where the SQUASHME patch is this large, I usually drop the
> patch (or series) in nfsd-testing and ask the contributor to rebase and
> repost. This gets the new version of the patch properly archived on
> lore, for one thing.
Yeah, make sense, I missed that iov_iter_is_aligned() was used early
on in the series too, so I'll fixup further back.
> Before reposting, please do run checkpatch.pl on the series.
Will do, will also ensure bisect safe and that sparse is happy.
Thanks,
Mike
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v4 1/4] NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read()
2025-08-06 15:57 ` Mike Snitzer
@ 2025-08-06 15:58 ` Chuck Lever
2025-08-07 15:50 ` sparse warnings with nfsd-testing [was: Re: [PATCH v4 1/4] NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read()] Mike Snitzer
1 sibling, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2025-08-06 15:58 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Jeff Layton, linux-nfs
On 8/6/25 11:57 AM, Mike Snitzer wrote:
> On Wed, Aug 06, 2025 at 09:18:51AM -0400, Chuck Lever wrote:
>> On 8/5/25 2:44 PM, Mike Snitzer wrote:
>>> From: Mike Snitzer <snitzer@hammerspace.com>
>>>
>>> Check the bvec is DIO-aligned while creating it, saves CPU cycles by
>>> avoiding iterating the bvec elements a second time using
>>> iov_iter_is_aligned().
>>>
>>> This prepares for Keith Busch's near-term removal of the
>>> iov_iter_is_aligned() interface. This fixes cel/nfsd-testing commit
>>> 5d78ac1e674b4 ("NFSD: issue READs using O_DIRECT even if IO is
>>> misaligned") and it should be folded into that commit so that NFSD
>>> doesn't require iov_iter_is_aligned() while it is being removed
>>> upstream in parallel.
>>>
>>> Fixes: cel/nfsd-testing 5d78ac1e674b4 ("NFSD: issue READs using O_DIRECT even if IO is misaligned")
>>> Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
>>> ---
>>> fs/nfsd/vfs.c | 29 +++++++++++++++--------------
>>> 1 file changed, 15 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>> index 46189020172fb..e1751d3715264 100644
>>> --- a/fs/nfsd/vfs.c
>>> +++ b/fs/nfsd/vfs.c
>>> @@ -1226,7 +1226,10 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>> */
>>> offset = read_dio.start;
>>> in_count = read_dio.end - offset;
>>> - kiocb.ki_flags = IOCB_DIRECT;
>>> + /* Verify ondisk DIO alignment, memory addrs checked below */
>>> + if (likely(((offset | in_count) &
>>> + (nf->nf_dio_read_offset_align - 1)) == 0))
>>> + kiocb.ki_flags = IOCB_DIRECT;
>>> }
>>> } else if (nfsd_io_cache_read == NFSD_IO_DONTCACHE)
>>> kiocb.ki_flags = IOCB_DONTCACHE;
>>> @@ -1236,16 +1239,24 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>> v = 0;
>>> total = in_count;
>>> if (read_dio.start_extra) {
>>> - bvec_set_page(&rqstp->rq_bvec[v++], read_dio.start_extra_page,
>>> + bvec_set_page(&rqstp->rq_bvec[v], read_dio.start_extra_page,
>>> read_dio.start_extra, PAGE_SIZE - read_dio.start_extra);
>>> + if (unlikely((kiocb.ki_flags & IOCB_DIRECT) &&
>>> + rqstp->rq_bvec[v].bv_offset & (nf->nf_dio_mem_align - 1)))
>>> + kiocb.ki_flags &= ~IOCB_DIRECT;
>>> total -= read_dio.start_extra;
>>> + v++;
>>> }
>>> while (total) {
>>> len = min_t(size_t, total, PAGE_SIZE - base);
>>> - bvec_set_page(&rqstp->rq_bvec[v++], *(rqstp->rq_next_page++),
>>> - len, base);
>>> + bvec_set_page(&rqstp->rq_bvec[v], *(rqstp->rq_next_page++), len, base);
>>> + /* No need to verify memory is DIO-aligned since bv_offset is 0 */
>>> + if (unlikely((kiocb.ki_flags & IOCB_DIRECT) && base &&
>>> + (base & (nf->nf_dio_mem_align - 1))))
>>> + kiocb.ki_flags &= ~IOCB_DIRECT;
>>> total -= len;
>>> base = 0;
>>> + v++;
>>> }
>>> if (WARN_ONCE(v > rqstp->rq_maxpages,
>>> "%s: v=%lu exceeds rqstp->rq_maxpages=%lu\n", __func__,
>>> @@ -1256,16 +1267,6 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>> if (!host_err) {
>>> trace_nfsd_read_vector(rqstp, fhp, offset, in_count);
>>> iov_iter_bvec(&iter, ITER_DEST, rqstp->rq_bvec, v, in_count);
>>> -
>>> - /* Double check nfsd_analyze_read_dio's DIO-aligned result */
>>> - if (unlikely((kiocb.ki_flags & IOCB_DIRECT) &&
>>> - !iov_iter_is_aligned(&iter,
>>> - nf->nf_dio_mem_align - 1,
>>> - nf->nf_dio_read_offset_align - 1))) {
>>> - /* Fallback to buffered IO */
>>> - kiocb.ki_flags &= ~IOCB_DIRECT;
>>> - }
>>> -
>>> host_err = vfs_iocb_iter_read(file, &kiocb, &iter);
>>> }
>>>
>>
>> Hi Mike,
>>
>> In cases where the SQUASHME patch is this large, I usually drop the
>> patch (or series) in nfsd-testing and ask the contributor to rebase and
>> repost. This gets the new version of the patch properly archived on
>> lore, for one thing.
>
> Yeah, make sense, I missed that iov_iter_is_aligned() was used early
> on in the series too, so I'll fixup further back.
>
>> Before reposting, please do run checkpatch.pl on the series.
>
> Will do, will also ensure bisect safe and that sparse is happy.
Thanks for the changes, sounds like the right path forward.
--
Chuck Lever
^ permalink raw reply [flat|nested] 13+ messages in thread
* sparse warnings with nfsd-testing [was: Re: [PATCH v4 1/4] NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read()]
2025-08-06 15:57 ` Mike Snitzer
2025-08-06 15:58 ` Chuck Lever
@ 2025-08-07 15:50 ` Mike Snitzer
2025-08-07 15:51 ` Chuck Lever
1 sibling, 1 reply; 13+ messages in thread
From: Mike Snitzer @ 2025-08-07 15:50 UTC (permalink / raw)
To: Chuck Lever; +Cc: Jeff Layton, linux-nfs
On Wed, Aug 06, 2025 at 11:57:42AM -0400, Mike Snitzer wrote:
> On Wed, Aug 06, 2025 at 09:18:51AM -0400, Chuck Lever wrote:
>
> > Before reposting, please do run checkpatch.pl on the series.
>
> Will do, will also ensure bisect safe and that sparse is happy.
FYI, I'm preparing my next patchset and sparse is happy with it, but I
wanted to share these warnings seen with nfsd-testing through commit
ae83299cc048e ("NFSD: Fix last write offset handling in
layoutcommit"):
fs/nfsd/nfs4state.c: note: in included file (through include/linux/wait.h, include/linux/wait_bit.h, include/linux/fs.h):
./include/linux/list.h:229:25: warning: context imbalance in 'put_clnt_odstate' - unexpected unlock
fs/nfsd/nfs4state.c:1188:9: warning: context imbalance in 'nfs4_put_stid' - unexpected unlock
I haven't looked at them closer. Could be you're well aware of them?
Mike
ps. but full disclosure, my baseline kernel is 6.12.24, I haven't
tried sparse against the nfsd-testing branch itself (which is based on
your nfsd-6.17); but my 6.12.24 kernel does have all NFS/NFSD/SUNRPC
changes through nfsd-6.17 + nfsd-testing commit ae83299cc048e).
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: sparse warnings with nfsd-testing [was: Re: [PATCH v4 1/4] NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read()]
2025-08-07 15:50 ` sparse warnings with nfsd-testing [was: Re: [PATCH v4 1/4] NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read()] Mike Snitzer
@ 2025-08-07 15:51 ` Chuck Lever
2025-08-07 15:53 ` Mike Snitzer
0 siblings, 1 reply; 13+ messages in thread
From: Chuck Lever @ 2025-08-07 15:51 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Jeff Layton, linux-nfs
On 8/7/25 11:50 AM, Mike Snitzer wrote:
> On Wed, Aug 06, 2025 at 11:57:42AM -0400, Mike Snitzer wrote:
>> On Wed, Aug 06, 2025 at 09:18:51AM -0400, Chuck Lever wrote:
>>
>>> Before reposting, please do run checkpatch.pl on the series.
>>
>> Will do, will also ensure bisect safe and that sparse is happy.
>
> FYI, I'm preparing my next patchset and sparse is happy with it, but I
> wanted to share these warnings seen with nfsd-testing through commit
> ae83299cc048e ("NFSD: Fix last write offset handling in
> layoutcommit"):
>
> fs/nfsd/nfs4state.c: note: in included file (through include/linux/wait.h, include/linux/wait_bit.h, include/linux/fs.h):
> ./include/linux/list.h:229:25: warning: context imbalance in 'put_clnt_odstate' - unexpected unlock
> fs/nfsd/nfs4state.c:1188:9: warning: context imbalance in 'nfs4_put_stid' - unexpected unlock
>
> I haven't looked at them closer. Could be you're well aware of them?
These warnings have been there forever. I'm told they are the result of
bugs in sparse.
> ps. but full disclosure, my baseline kernel is 6.12.24, I haven't
> tried sparse against the nfsd-testing branch itself (which is based on
> your nfsd-6.17); but my 6.12.24 kernel does have all NFS/NFSD/SUNRPC
> changes through nfsd-6.17 + nfsd-testing commit ae83299cc048e).
--
Chuck Lever
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: sparse warnings with nfsd-testing [was: Re: [PATCH v4 1/4] NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read()]
2025-08-07 15:51 ` Chuck Lever
@ 2025-08-07 15:53 ` Mike Snitzer
0 siblings, 0 replies; 13+ messages in thread
From: Mike Snitzer @ 2025-08-07 15:53 UTC (permalink / raw)
To: Chuck Lever; +Cc: Jeff Layton, linux-nfs
On Thu, Aug 07, 2025 at 11:51:49AM -0400, Chuck Lever wrote:
> On 8/7/25 11:50 AM, Mike Snitzer wrote:
> > On Wed, Aug 06, 2025 at 11:57:42AM -0400, Mike Snitzer wrote:
> >> On Wed, Aug 06, 2025 at 09:18:51AM -0400, Chuck Lever wrote:
> >>
> >>> Before reposting, please do run checkpatch.pl on the series.
> >>
> >> Will do, will also ensure bisect safe and that sparse is happy.
> >
> > FYI, I'm preparing my next patchset and sparse is happy with it, but I
> > wanted to share these warnings seen with nfsd-testing through commit
> > ae83299cc048e ("NFSD: Fix last write offset handling in
> > layoutcommit"):
> >
> > fs/nfsd/nfs4state.c: note: in included file (through include/linux/wait.h, include/linux/wait_bit.h, include/linux/fs.h):
> > ./include/linux/list.h:229:25: warning: context imbalance in 'put_clnt_odstate' - unexpected unlock
> > fs/nfsd/nfs4state.c:1188:9: warning: context imbalance in 'nfs4_put_stid' - unexpected unlock
> >
> > I haven't looked at them closer. Could be you're well aware of them?
>
> These warnings have been there forever. I'm told they are the result of
> bugs in sparse.
OK, good to know, thanks!
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-08-07 15:53 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-05 18:44 [PATCH v4 0/4] NFSD DIRECT: misaligned READs fixup, add handling for misaligned WRITEs Mike Snitzer
2025-08-05 18:44 ` [PATCH v4 1/4] NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read() Mike Snitzer
2025-08-06 13:18 ` Chuck Lever
2025-08-06 15:57 ` Mike Snitzer
2025-08-06 15:58 ` Chuck Lever
2025-08-07 15:50 ` sparse warnings with nfsd-testing [was: Re: [PATCH v4 1/4] NFSD: avoid using iov_iter_is_aligned() in nfsd_iter_read()] Mike Snitzer
2025-08-07 15:51 ` Chuck Lever
2025-08-07 15:53 ` Mike Snitzer
2025-08-05 18:44 ` [PATCH v4 2/4] NFSD: refactor nfsd_read_vector_dio to EVENT_CLASS useful for READ and WRITE Mike Snitzer
2025-08-05 18:44 ` [PATCH v4 3/4] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs Mike Snitzer
2025-08-05 18:44 ` [PATCH v4 4/4] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned Mike Snitzer
2025-08-06 13:53 ` Chuck Lever
2025-08-06 15:55 ` Mike Snitzer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).