* [PATCH v2 0/4] NFSD DIRECT: add handling for misaligned WRITEs
@ 2025-07-31 19:44 Mike Snitzer
2025-07-31 19:44 ` [PATCH v2 1/4] NFSD: refactor nfsd_read_vector_dio to EVENT_CLASS useful for READ and WRITE Mike Snitzer
` (3 more replies)
0 siblings, 4 replies; 19+ messages in thread
From: Mike Snitzer @ 2025-07-31 19:44 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton; +Cc: linux-nfs
Hi,
This series builds on what has been staged in the nfsd-testing branch.
This code has proven to work well during my testing. Any suggestions
for further refinement are welcome.
Changes since v1:
- switched to using an EVENT_CLASS to create nfsd_analyze_{read,write}_dio
- added a 4th patch: if the user has configured NFSD_IO_DIRECT, NFS
  reexports should use it too (in the future, per-export controls will
  give us finer-grained control; until then we'd do well to offer
  comprehensive use of NFSD_IO_DIRECT if it is enabled).
Thanks,
Mike
Mike Snitzer (4):
NFSD: refactor nfsd_read_vector_dio to EVENT_CLASS useful for READ and WRITE
NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs
NFSD: issue WRITEs using O_DIRECT even if IO is misaligned
NFSD: handle unaligned DIO for NFS reexport
fs/nfs/export.c | 3 +-
fs/nfsd/filecache.c | 11 +++
fs/nfsd/trace.h | 52 ++++++++---
fs/nfsd/vfs.c | 188 ++++++++++++++++++++++++++++++++-------
include/linux/exportfs.h | 13 +++
5 files changed, 220 insertions(+), 47 deletions(-)
--
2.44.0
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v2 1/4] NFSD: refactor nfsd_read_vector_dio to EVENT_CLASS useful for READ and WRITE
2025-07-31 19:44 [PATCH v2 0/4] NFSD DIRECT: add handling for misaligned WRITEs Mike Snitzer
@ 2025-07-31 19:44 ` Mike Snitzer
2025-07-31 20:28 ` Jeff Layton
2025-07-31 19:44 ` [PATCH v2 2/4] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs Mike Snitzer
` (2 subsequent siblings)
3 siblings, 1 reply; 19+ messages in thread
From: Mike Snitzer @ 2025-07-31 19:44 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton; +Cc: linux-nfs
Transform nfsd_read_vector_dio trace event into nfsd_analyze_dio_class
and use it to create nfsd_analyze_read_dio and nfsd_analyze_write_dio
trace events.
This prepares for nfsd_vfs_write() to also make use of it when
handling misaligned WRITEs.
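For illustration only, here is a userspace sketch of how the READ side fills the new start/middle/end fields. read_trace_args() follows the vfs.c hunk in this patch. expand_read() is an assumption about how the expanded range is derived (round the start down and the end up to the DIO block size), and the struct and function names are hypothetical stand-ins rather than the kernel's own.

```c
#include <assert.h>
#include <stdint.h>

/* Field names follow struct nfsd_read_dio as used in the vfs.c hunk. */
struct read_dio {
	int64_t start;		/* expanded, DIO-aligned start offset */
	int64_t start_extra;	/* bytes added in front of the requested offset */
	int64_t end;		/* expanded, DIO-aligned end offset */
	int64_t end_extra;	/* bytes added after the requested end */
};

/* Hypothetical container for the six extent arguments of the event class. */
struct dio_trace_args {
	int64_t start, start_len, middle, middle_len, end, end_len;
};

/* Assumption: the READ is expanded outward to block boundaries. */
static struct read_dio expand_read(int64_t offset, int64_t len,
				   int64_t blocksize)
{
	struct read_dio r;
	int64_t orig_end = offset + len;

	/* the mask arithmetic below needs a power-of-two block size */
	assert(blocksize > 0 && (blocksize & (blocksize - 1)) == 0);

	r.start = offset & ~(blocksize - 1);
	r.start_extra = offset - r.start;
	r.end = (orig_end + blocksize - 1) & ~(blocksize - 1);
	r.end_extra = r.end - orig_end;
	return r;
}

/* Same mapping as the trace_nfsd_analyze_read_dio() call in this patch. */
static struct dio_trace_args read_trace_args(int64_t offset,
					     const struct read_dio *r)
{
	int64_t middle_end = r->end - r->end_extra;
	struct dio_trace_args t = {
		.start = r->start,
		.start_len = r->start_extra,
		.middle = offset,
		.middle_len = middle_end - offset,
		.end = middle_end,
		.end_len = r->end_extra,
	};

	return t;
}
```

With these definitions the "middle" extent is always the range the client asked for, and start/end describe the padding that the expansion added around it.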
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/trace.h | 52 ++++++++++++++++++++++++++++++++++++-------------
fs/nfsd/vfs.c | 11 ++++++-----
2 files changed, 44 insertions(+), 19 deletions(-)
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 55055482f8a84..4173bd9344b6b 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -473,25 +473,29 @@ DEFINE_NFSD_IO_EVENT(write_done);
DEFINE_NFSD_IO_EVENT(commit_start);
DEFINE_NFSD_IO_EVENT(commit_done);
-TRACE_EVENT(nfsd_read_vector_dio,
+DECLARE_EVENT_CLASS(nfsd_analyze_dio_class,
TP_PROTO(struct svc_rqst *rqstp,
struct svc_fh *fhp,
u64 offset,
u32 len,
- loff_t start,
- loff_t start_extra,
- loff_t end,
- loff_t end_extra),
- TP_ARGS(rqstp, fhp, offset, len, start, start_extra, end, end_extra),
+ loff_t start,
+ ssize_t start_len,
+ loff_t middle,
+ ssize_t middle_len,
+ loff_t end,
+ ssize_t end_len),
+ TP_ARGS(rqstp, fhp, offset, len, start, start_len, middle, middle_len, end, end_len),
TP_STRUCT__entry(
__field(u32, xid)
__field(u32, fh_hash)
__field(u64, offset)
__field(u32, len)
__field(loff_t, start)
- __field(loff_t, start_extra)
+ __field(ssize_t, start_len)
+ __field(loff_t, middle)
+ __field(ssize_t, middle_len)
__field(loff_t, end)
- __field(loff_t, end_extra)
+ __field(ssize_t, end_len)
),
TP_fast_assign(
__entry->xid = be32_to_cpu(rqstp->rq_xid);
@@ -499,16 +503,36 @@ TRACE_EVENT(nfsd_read_vector_dio,
__entry->offset = offset;
__entry->len = len;
__entry->start = start;
- __entry->start_extra = start_extra;
+ __entry->start_len = start_len;
+ __entry->middle = middle;
+ __entry->middle_len = middle_len;
__entry->end = end;
- __entry->end_extra = end_extra;
+ __entry->end_len = end_len;
),
- TP_printk("xid=0x%08x fh_hash=0x%08x offset=%llu len=%u start=%llu+%llu end=%llu-%llu",
+ TP_printk("xid=0x%08x fh_hash=0x%08x offset=%llu len=%u start=%llu+%lu middle=%llu+%lu end=%llu+%lu",
__entry->xid, __entry->fh_hash,
__entry->offset, __entry->len,
- __entry->start, __entry->start_extra,
- __entry->end, __entry->end_extra)
-);
+ __entry->start, __entry->start_len,
+ __entry->middle, __entry->middle_len,
+ __entry->end, __entry->end_len)
+)
+
+#define DEFINE_NFSD_ANALYZE_DIO_EVENT(name) \
+DEFINE_EVENT(nfsd_analyze_dio_class, nfsd_analyze_##name##_dio, \
+ TP_PROTO(struct svc_rqst *rqstp, \
+ struct svc_fh *fhp, \
+ u64 offset, \
+ u32 len, \
+ loff_t start, \
+ ssize_t start_len, \
+ loff_t middle, \
+ ssize_t middle_len, \
+ loff_t end, \
+ ssize_t end_len), \
+ TP_ARGS(rqstp, fhp, offset, len, start, start_len, middle, middle_len, end, end_len))
+
+DEFINE_NFSD_ANALYZE_DIO_EVENT(read);
+DEFINE_NFSD_ANALYZE_DIO_EVENT(write);
DECLARE_EVENT_CLASS(nfsd_err_class,
TP_PROTO(struct svc_rqst *rqstp,
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 46189020172fb..35c29b8ade9c3 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1094,7 +1094,7 @@ static bool nfsd_analyze_read_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
struct nfsd_read_dio *read_dio)
{
const u32 dio_blocksize = nf->nf_dio_read_offset_align;
- loff_t orig_end = offset + len;
+ loff_t middle_end, orig_end = offset + len;
if (WARN_ONCE(!nf->nf_dio_mem_align || !nf->nf_dio_read_offset_align,
"%s: underlying filesystem has not provided DIO alignment info\n",
@@ -1133,10 +1133,11 @@ static bool nfsd_analyze_read_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
}
/* Show original offset and count, and how it was expanded for DIO */
- trace_nfsd_read_vector_dio(rqstp, fhp, offset, len,
- read_dio->start, read_dio->start_extra,
- read_dio->end, read_dio->end_extra);
-
+ middle_end = read_dio->end - read_dio->end_extra;
+ trace_nfsd_analyze_read_dio(rqstp, fhp, offset, len,
+ read_dio->start, read_dio->start_extra,
+ offset, (middle_end - offset),
+ middle_end, read_dio->end_extra);
return true;
}
--
2.44.0
* [PATCH v2 2/4] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs
2025-07-31 19:44 [PATCH v2 0/4] NFSD DIRECT: add handling for misaligned WRITEs Mike Snitzer
2025-07-31 19:44 ` [PATCH v2 1/4] NFSD: refactor nfsd_read_vector_dio to EVENT_CLASS useful for READ and WRITE Mike Snitzer
@ 2025-07-31 19:44 ` Mike Snitzer
2025-07-31 20:28 ` Jeff Layton
2025-07-31 20:54 ` Jeff Layton
2025-07-31 19:44 ` [PATCH v2 3/4] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned Mike Snitzer
2025-07-31 19:44 ` [PATCH v2 4/4] NFSD: handle unaligned DIO for NFS reexport Mike Snitzer
3 siblings, 2 replies; 19+ messages in thread
From: Mike Snitzer @ 2025-07-31 19:44 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton; +Cc: linux-nfs
Refactor nfsd_vfs_write() to support splitting a WRITE into parts
(which will be either misaligned or DIO-aligned). Doing so in a
preliminary commit just allows for indentation and slight
transformation to be more easily understood and reviewed.
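As a rough userspace sketch of the loop shape this refactor introduces: the byte count is reset, each part is written in turn, successful results are accumulated, and the first failure ends the loop. write_parts() and its scripted results array are hypothetical stand-ins for the per-iterator vfs_iocb_iter_write() calls, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/*
 * results[i] is the scripted outcome of writing part i: a byte count on
 * success or a negative errno-style value on failure.
 */
static ssize_t write_parts(const ssize_t *results, size_t n_parts,
			   size_t *cnt)
{
	*cnt = 0;
	for (size_t i = 0; i < n_parts; i++) {
		ssize_t ret = results[i];	/* stand-in for the write call */

		if (ret < 0)
			return ret;		/* stop at the first failure */
		*cnt += (size_t)ret;		/* accumulate, do not overwrite */
	}
	return 0;
}
```

Note that on failure *cnt still holds the bytes written by the earlier parts, which is why the patch switches from assigning *cnt to adding to it.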
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/vfs.c | 50 ++++++++++++++++++++++++++++++--------------------
1 file changed, 30 insertions(+), 20 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 35c29b8ade9c3..e4855c32dad12 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1341,7 +1341,6 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
struct super_block *sb = file_inode(file)->i_sb;
struct kiocb kiocb;
struct svc_export *exp;
- struct iov_iter iter;
errseq_t since;
__be32 nfserr;
int host_err;
@@ -1349,6 +1348,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
unsigned int pflags = current->flags;
bool restore_flags = false;
unsigned int nvecs;
+ struct iov_iter iter_stack[1];
+ struct iov_iter *iter = iter_stack;
+ unsigned int n_iters = 0;
trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
@@ -1378,14 +1380,15 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
kiocb.ki_flags |= IOCB_DSYNC;
nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
- iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
+ iov_iter_bvec(&iter[0], ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
+ n_iters++;
switch (nfsd_io_cache_write) {
case NFSD_IO_DIRECT:
/* direct I/O must be aligned to device logical sector size */
if (nf->nf_dio_mem_align && nf->nf_dio_offset_align &&
(((offset | *cnt) & (nf->nf_dio_offset_align-1)) == 0) &&
- iov_iter_is_aligned(&iter, nf->nf_dio_mem_align - 1,
+ iov_iter_is_aligned(&iter[0], nf->nf_dio_mem_align - 1,
nf->nf_dio_offset_align - 1))
kiocb.ki_flags = IOCB_DIRECT;
break;
@@ -1396,25 +1399,32 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
break;
}
- since = READ_ONCE(file->f_wb_err);
- if (verf)
- nfsd_copy_write_verifier(verf, nn);
- host_err = vfs_iocb_iter_write(file, &kiocb, &iter);
- if (host_err < 0) {
- commit_reset_write_verifier(nn, rqstp, host_err);
- goto out_nfserr;
- }
- *cnt = host_err;
- nfsd_stats_io_write_add(nn, exp, *cnt);
- fsnotify_modify(file);
- host_err = filemap_check_wb_err(file->f_mapping, since);
- if (host_err < 0)
- goto out_nfserr;
+ *cnt = 0;
+ for (int i = 0; i < n_iters; i++) {
+ since = READ_ONCE(file->f_wb_err);
+ if (verf)
+ nfsd_copy_write_verifier(verf, nn);
- if (stable && fhp->fh_use_wgather) {
- host_err = wait_for_concurrent_writes(file);
- if (host_err < 0)
+ host_err = vfs_iocb_iter_write(file, &kiocb, &iter[i]);
+ if (host_err < 0) {
commit_reset_write_verifier(nn, rqstp, host_err);
+ goto out_nfserr;
+ }
+ *cnt += host_err;
+ nfsd_stats_io_write_add(nn, exp, host_err);
+
+ fsnotify_modify(file);
+ host_err = filemap_check_wb_err(file->f_mapping, since);
+ if (host_err < 0)
+ goto out_nfserr;
+
+ if (stable && fhp->fh_use_wgather) {
+ host_err = wait_for_concurrent_writes(file);
+ if (host_err < 0) {
+ commit_reset_write_verifier(nn, rqstp, host_err);
+ goto out_nfserr;
+ }
+ }
}
out_nfserr:
--
2.44.0
* [PATCH v2 3/4] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned
2025-07-31 19:44 [PATCH v2 0/4] NFSD DIRECT: add handling for misaligned WRITEs Mike Snitzer
2025-07-31 19:44 ` [PATCH v2 1/4] NFSD: refactor nfsd_read_vector_dio to EVENT_CLASS useful for READ and WRITE Mike Snitzer
2025-07-31 19:44 ` [PATCH v2 2/4] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs Mike Snitzer
@ 2025-07-31 19:44 ` Mike Snitzer
2025-07-31 20:53 ` Jeff Layton
2025-07-31 19:44 ` [PATCH v2 4/4] NFSD: handle unaligned DIO for NFS reexport Mike Snitzer
3 siblings, 1 reply; 19+ messages in thread
From: Mike Snitzer @ 2025-07-31 19:44 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton; +Cc: linux-nfs
If NFSD_IO_DIRECT is used, split any misaligned WRITE into a start,
middle and end as needed. The large middle extent is DIO-aligned and
the start and/or end are misaligned. Buffered IO is used for the
misaligned extents and O_DIRECT is used for the middle DIO-aligned
extent.
The nfsd_analyze_write_dio trace event shows how NFSD splits a given
misaligned WRITE into a mix of misaligned extent(s) and a DIO-aligned
extent.
This combination of trace events is useful:
echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_opened/enable
echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_analyze_write_dio/enable
echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_io_done/enable
echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable
For this dd command:
dd if=/dev/zero of=/mnt/share1/test bs=47008 count=2 oflag=direct
the trace output is:
nfsd-55714 [043] ..... 79976.260851: nfsd_write_opened: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008
nfsd-55714 [043] ..... 79976.260852: nfsd_analyze_write_dio: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008 start=0+0 middle=0+45056 end=45056+1952
nfsd-55714 [043] ..... 79976.260857: xfs_file_direct_write: dev 259:12 ino 0x3e00008f disize 0x0 pos 0x0 bytecount 0xb000
nfsd-55714 [043] ..... 79976.260965: nfsd_write_io_done: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008
nfsd-55714 [043] ..... 79976.307762: nfsd_write_opened: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008
nfsd-55714 [043] ..... 79976.307762: nfsd_analyze_write_dio: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008 start=47008+2144 middle=49152+40960 end=90112+3904
nfsd-55714 [043] ..... 79976.307797: xfs_file_direct_write: dev 259:12 ino 0x3e00008f disize 0xc000 pos 0xc000 bytecount 0xa000
nfsd-55714 [043] ..... 79976.307866: nfsd_write_io_done: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008
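The extents reported by nfsd_analyze_write_dio above can be reproduced in userspace with a small sketch of the same arithmetic. This mirrors the logic of nfsd_analyze_write_dio() in the diff below, but struct write_split and split_write() are hypothetical names, and the 4096-byte block size is an assumption that happens to match the numbers in the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct nfsd_write_dio. */
struct write_split {
	int64_t middle_offset;	/* start of the DIO-aligned middle */
	int64_t end_offset;	/* start of the misaligned tail */
	int64_t start_len;	/* length of the misaligned head */
	int64_t middle_len;	/* length of the DIO-aligned middle */
	int64_t end_len;	/* length of the misaligned tail */
};

/* Both helpers require blocksize to be a power of two. */
static int64_t round_down_pow2(int64_t x, int64_t blocksize)
{
	return x & ~(blocksize - 1);
}

static int64_t round_up_pow2(int64_t x, int64_t blocksize)
{
	return round_down_pow2(x + blocksize - 1, blocksize);
}

/*
 * Returns false, leaving *s zeroed, when the WRITE is shorter than one
 * block and is therefore not worth splitting.
 */
static bool split_write(int64_t offset, int64_t len, int64_t blocksize,
			struct write_split *s)
{
	int64_t orig_end = offset + len;
	int64_t start_end, middle_end;

	assert(blocksize > 0 && (blocksize & (blocksize - 1)) == 0);
	*s = (struct write_split){ 0 };

	if (len < blocksize)
		return false;

	if (((offset | len) & (blocksize - 1)) == 0) {
		/* already aligned: the whole WRITE is the middle */
		s->middle_offset = offset;
		s->middle_len = len;
		return true;
	}

	start_end = round_up_pow2(offset, blocksize);
	middle_end = round_down_pow2(orig_end, blocksize);

	s->start_len = start_end - offset;
	s->middle_offset = start_end;
	s->middle_len = middle_end - start_end;
	s->end_offset = middle_end;
	s->end_len = orig_end - middle_end;
	return true;
}
```

For the second WRITE in the trace (offset=47008 len=47008) this yields a 2144-byte head, a 40960-byte middle at 49152 and a 3904-byte tail at 90112; the middle length 0xa000 matches the bytecount in the xfs_file_direct_write event.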
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/vfs.c | 135 ++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 124 insertions(+), 11 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index e4855c32dad12..23360825455a2 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1314,6 +1314,113 @@ static int wait_for_concurrent_writes(struct file *file)
return err;
}
+struct nfsd_write_dio
+{
+ loff_t middle_offset; /* Offset for start of DIO-aligned middle */
+ loff_t end_offset; /* Offset for start of misaligned end */
+ ssize_t start_len; /* Length for misaligned first extent */
+ ssize_t middle_len; /* Length for DIO-aligned middle extent */
+ ssize_t end_len; /* Length for misaligned last extent */
+};
+
+static void init_nfsd_write_dio(struct nfsd_write_dio *write_dio)
+{
+ memset(write_dio, 0, sizeof(*write_dio));
+}
+
+static bool nfsd_analyze_write_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ struct nfsd_file *nf, loff_t offset,
+ unsigned long len, struct nfsd_write_dio *write_dio)
+{
+ const u32 dio_blocksize = nf->nf_dio_offset_align;
+ loff_t orig_end, middle_end, start_end, start_offset = offset;
+ ssize_t start_len = len;
+ bool aligned = true;
+
+ if (WARN_ONCE(!nf->nf_dio_mem_align || !dio_blocksize,
+ "%s: underlying filesystem has not provided DIO alignment info\n",
+ __func__))
+ return false;
+
+ if (WARN_ONCE(dio_blocksize > PAGE_SIZE,
+ "%s: underlying storage's dio_blocksize=%u > PAGE_SIZE=%lu\n",
+ __func__, dio_blocksize, PAGE_SIZE))
+ return false;
+
+ if (unlikely(len < dio_blocksize)) {
+ aligned = false;
+ goto out;
+ }
+
+ if (((offset | len) & (dio_blocksize-1)) == 0) {
+ /* already DIO-aligned, no misaligned head or tail */
+ write_dio->middle_offset = offset;
+ write_dio->middle_len = len;
+ /* clear these for the benefit of trace_nfsd_analyze_write_dio */
+ start_offset = 0;
+ start_len = 0;
+ goto out;
+ }
+
+ start_end = round_up(offset, dio_blocksize);
+ start_len = start_end - offset;
+ orig_end = offset + len;
+ middle_end = round_down(orig_end, dio_blocksize);
+
+ write_dio->start_len = start_len;
+ write_dio->middle_offset = start_end;
+ write_dio->middle_len = middle_end - start_end;
+ write_dio->end_offset = middle_end;
+ write_dio->end_len = orig_end - middle_end;
+out:
+ trace_nfsd_analyze_write_dio(rqstp, fhp, offset, len, start_offset, start_len,
+ write_dio->middle_offset, write_dio->middle_len,
+ write_dio->end_offset, write_dio->end_len);
+ return aligned;
+}
+
+/*
+ * Setup as many as 3 iov_iter based on extents possibly described by @write_dio.
+ * @iterp: pointer to pointer to onstack array of 3 iov_iter structs from caller.
+ * @rq_bvec: backing bio_vec used to setup all 3 iov_iter permutations.
+ * @nvecs: number of segments in @rq_bvec
+ * @cnt: size of the request in bytes
+ * @write_dio: nfsd_write_dio struct that describes start, middle and end extents.
+ *
+ * Returns the number of iov_iter that were setup.
+ */
+static int nfsd_setup_write_iters(struct iov_iter **iterp, struct bio_vec *rq_bvec,
+ unsigned int nvecs, unsigned long cnt,
+ struct nfsd_write_dio *write_dio)
+{
+ int n_iters = 0;
+ struct iov_iter *iters = *iterp;
+
+ /* Setup misaligned start? */
+ if (write_dio->start_len) {
+ iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
+ iters[n_iters].count = write_dio->start_len;
+ n_iters++;
+ }
+
+ /* Setup possibly DIO-aligned middle */
+ iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
+ if (write_dio->start_len)
+ iov_iter_advance(&iters[n_iters], write_dio->start_len);
+ iters[n_iters].count -= write_dio->end_len;
+ n_iters++;
+
+ /* Setup misaligned end? */
+ if (write_dio->end_len) {
+ iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
+ iov_iter_advance(&iters[n_iters],
+ write_dio->start_len + write_dio->middle_len);
+ n_iters++;
+ }
+
+ return n_iters;
+}
+
/**
* nfsd_vfs_write - write data to an already-open file
* @rqstp: RPC execution context
@@ -1348,9 +1455,11 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
unsigned int pflags = current->flags;
bool restore_flags = false;
unsigned int nvecs;
- struct iov_iter iter_stack[1];
+ struct iov_iter iter_stack[3];
struct iov_iter *iter = iter_stack;
unsigned int n_iters = 0;
+ bool dio_aligned = false;
+ struct nfsd_write_dio write_dio;
trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
@@ -1379,18 +1488,12 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (stable && !fhp->fh_use_wgather)
kiocb.ki_flags |= IOCB_DSYNC;
- nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
- iov_iter_bvec(&iter[0], ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
- n_iters++;
-
+ init_nfsd_write_dio(&write_dio);
switch (nfsd_io_cache_write) {
case NFSD_IO_DIRECT:
- /* direct I/O must be aligned to device logical sector size */
- if (nf->nf_dio_mem_align && nf->nf_dio_offset_align &&
- (((offset | *cnt) & (nf->nf_dio_offset_align-1)) == 0) &&
- iov_iter_is_aligned(&iter[0], nf->nf_dio_mem_align - 1,
- nf->nf_dio_offset_align - 1))
- kiocb.ki_flags = IOCB_DIRECT;
+ if (nfsd_analyze_write_dio(rqstp, fhp, nf, offset,
+ *cnt, &write_dio))
+ dio_aligned = true;
break;
case NFSD_IO_DONTCACHE:
kiocb.ki_flags = IOCB_DONTCACHE;
@@ -1399,12 +1502,22 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
break;
}
+ nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
+ n_iters = nfsd_setup_write_iters(&iter, rqstp->rq_bvec, nvecs, *cnt, &write_dio);
+
*cnt = 0;
for (int i = 0; i < n_iters; i++) {
since = READ_ONCE(file->f_wb_err);
if (verf)
nfsd_copy_write_verifier(verf, nn);
+ if (dio_aligned) {
+ if (iov_iter_is_aligned(&iter[i], nf->nf_dio_mem_align - 1,
+ nf->nf_dio_offset_align - 1))
+ kiocb.ki_flags |= IOCB_DIRECT;
+ else
+ kiocb.ki_flags &= ~IOCB_DIRECT;
+ }
host_err = vfs_iocb_iter_write(file, &kiocb, &iter[i]);
if (host_err < 0) {
commit_reset_write_verifier(nn, rqstp, host_err);
--
2.44.0
* [PATCH v2 4/4] NFSD: handle unaligned DIO for NFS reexport
2025-07-31 19:44 [PATCH v2 0/4] NFSD DIRECT: add handling for misaligned WRITEs Mike Snitzer
` (2 preceding siblings ...)
2025-07-31 19:44 ` [PATCH v2 3/4] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned Mike Snitzer
@ 2025-07-31 19:44 ` Mike Snitzer
2025-07-31 20:58 ` Jeff Layton
3 siblings, 1 reply; 19+ messages in thread
From: Mike Snitzer @ 2025-07-31 19:44 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton; +Cc: linux-nfs
NFS doesn't have any DIO alignment constraints but it doesn't support
STATX_DIOALIGN, so update NFSD such that it doesn't disable the use of
NFSD_IO_DIRECT if it is reexporting NFS.
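A minimal userspace sketch of the decision this patch adds, under stated assumptions: the flag values are copied from the exportfs.h hunk below, SKETCH_PAGE_SIZE assumes 4 KiB pages, and struct dio_align and pick_dio_align() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in the exportfs.h hunk of this patch. */
#define EXPORT_OP_NOLOCKS		0x40
#define EXPORT_OP_NO_DIOALIGN_NEEDED	0x80

#define SKETCH_PAGE_SIZE		4096u	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the nf_dio_* fields of struct nfsd_file. */
struct dio_align {
	uint32_t mem_align;
	uint32_t offset_align;
	uint32_t read_offset_align;
};

static bool handles_unaligned_dio(unsigned long flags)
{
	return (flags & EXPORT_OP_NO_DIOALIGN_NEEDED) != 0;
}

/*
 * Returns true and fills *a with accommodating defaults when the
 * filesystem has declared that it copes with unaligned DIO. Returns false
 * when the caller still has to query the filesystem for its alignment.
 */
static bool pick_dio_align(unsigned long flags, struct dio_align *a)
{
	if (!handles_unaligned_dio(flags))
		return false;

	a->mem_align = 4;
	a->offset_align = SKETCH_PAGE_SIZE;
	a->read_offset_align = a->offset_align;
	return true;
}
```

The point of the defaults is that the alignment fields end up non-zero, so the analyze helpers no longer bail out for an NFS reexport.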
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/export.c | 3 ++-
fs/nfsd/filecache.c | 11 +++++++++++
include/linux/exportfs.h | 13 +++++++++++++
3 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/export.c b/fs/nfs/export.c
index e9c233b6fd209..2cae75ba6b35d 100644
--- a/fs/nfs/export.c
+++ b/fs/nfs/export.c
@@ -155,5 +155,6 @@ const struct export_operations nfs_export_ops = {
EXPORT_OP_REMOTE_FS |
EXPORT_OP_NOATOMIC_ATTR |
EXPORT_OP_FLUSH_ON_CLOSE |
- EXPORT_OP_NOLOCKS,
+ EXPORT_OP_NOLOCKS |
+ EXPORT_OP_NO_DIOALIGN_NEEDED,
};
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 5601e839a72da..ea489dd44fd9a 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -1066,6 +1066,17 @@ nfsd_file_getattr(const struct svc_fh *fhp, struct nfsd_file *nf)
nfsd_io_cache_write != NFSD_IO_DIRECT))
return nfs_ok;
+ if (exportfs_handles_unaligned_dio(nf->nf_file->f_path.mnt->mnt_sb->s_export_op)) {
+ /* Underlying filesystem doesn't support STATX_DIOALIGN
+ * but it can handle all unaligned DIO, so establish
+ * DIO alignment that is accommodating.
+ */
+ nf->nf_dio_mem_align = 4;
+ nf->nf_dio_offset_align = PAGE_SIZE;
+ nf->nf_dio_read_offset_align = nf->nf_dio_offset_align;
+ return nfs_ok;
+ }
+
status = fh_getattr(fhp, &stat);
if (status != nfs_ok)
return status;
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 9369a607224c1..626b8486dd985 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -247,6 +247,7 @@ struct export_operations {
*/
#define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */
#define EXPORT_OP_NOLOCKS (0x40) /* no file locking support */
+#define EXPORT_OP_NO_DIOALIGN_NEEDED (0x80) /* fs can handle unaligned DIO */
unsigned long flags;
};
@@ -262,6 +263,18 @@ exportfs_cannot_lock(const struct export_operations *export_ops)
return export_ops->flags & EXPORT_OP_NOLOCKS;
}
+/**
+ * exportfs_handles_unaligned_dio() - check if export can handle unaligned DIO
+ * @export_ops: the nfs export operations to check
+ *
+ * Returns true if the export can handle unaligned DIO.
+ */
+static inline bool
+exportfs_handles_unaligned_dio(const struct export_operations *export_ops)
+{
+ return export_ops->flags & EXPORT_OP_NO_DIOALIGN_NEEDED;
+}
+
extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
int *max_len, struct inode *parent,
int flags);
--
2.44.0
* Re: [PATCH v2 2/4] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs
2025-07-31 19:44 ` [PATCH v2 2/4] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs Mike Snitzer
@ 2025-07-31 20:28 ` Jeff Layton
2025-07-31 20:49 ` Mike Snitzer
2025-07-31 20:54 ` Jeff Layton
1 sibling, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2025-07-31 20:28 UTC (permalink / raw)
To: Mike Snitzer, Chuck Lever; +Cc: linux-nfs
On Thu, 2025-07-31 at 15:44 -0400, Mike Snitzer wrote:
> Refactor nfsd_vfs_write() to support splitting a WRITE into parts
> (which will be either misaligned or DIO-aligned). Doing so in a
> preliminary commit just allows for indentation and slight
> transformation to be more easily understood and reviewed.
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfsd/vfs.c | 50 ++++++++++++++++++++++++++++++--------------------
> 1 file changed, 30 insertions(+), 20 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 35c29b8ade9c3..e4855c32dad12 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1341,7 +1341,6 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> struct super_block *sb = file_inode(file)->i_sb;
> struct kiocb kiocb;
> struct svc_export *exp;
> - struct iov_iter iter;
> errseq_t since;
> __be32 nfserr;
> int host_err;
> @@ -1349,6 +1348,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> unsigned int pflags = current->flags;
> bool restore_flags = false;
> unsigned int nvecs;
> + struct iov_iter iter_stack[1];
> + struct iov_iter *iter = iter_stack;
> + unsigned int n_iters = 0;
>
> trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
>
> @@ -1378,14 +1380,15 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> kiocb.ki_flags |= IOCB_DSYNC;
>
> nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
> - iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
> + iov_iter_bvec(&iter[0], ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
> + n_iters++;
>
> switch (nfsd_io_cache_write) {
> case NFSD_IO_DIRECT:
> /* direct I/O must be aligned to device logical sector size */
> if (nf->nf_dio_mem_align && nf->nf_dio_offset_align &&
> (((offset | *cnt) & (nf->nf_dio_offset_align-1)) == 0) &&
> - iov_iter_is_aligned(&iter, nf->nf_dio_mem_align - 1,
> + iov_iter_is_aligned(&iter[0], nf->nf_dio_mem_align - 1,
> nf->nf_dio_offset_align - 1))
> kiocb.ki_flags = IOCB_DIRECT;
> break;
> @@ -1396,25 +1399,32 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> break;
> }
>
> - since = READ_ONCE(file->f_wb_err);
> - if (verf)
> - nfsd_copy_write_verifier(verf, nn);
> - host_err = vfs_iocb_iter_write(file, &kiocb, &iter);
> - if (host_err < 0) {
> - commit_reset_write_verifier(nn, rqstp, host_err);
> - goto out_nfserr;
> - }
> - *cnt = host_err;
> - nfsd_stats_io_write_add(nn, exp, *cnt);
> - fsnotify_modify(file);
> - host_err = filemap_check_wb_err(file->f_mapping, since);
> - if (host_err < 0)
> - goto out_nfserr;
> + *cnt = 0;
> + for (int i = 0; i < n_iters; i++) {
> + since = READ_ONCE(file->f_wb_err);
The above assignment can stay outside the loop. No need to resample it
on every pass, and doing that could cause you to miss errors.
> + if (verf)
> + nfsd_copy_write_verifier(verf, nn);
>
The verf doesn't need to be copied every time either.
> - if (stable && fhp->fh_use_wgather) {
> - host_err = wait_for_concurrent_writes(file);
> - if (host_err < 0)
> + host_err = vfs_iocb_iter_write(file, &kiocb, &iter[i]);
> + if (host_err < 0) {
> commit_reset_write_verifier(nn, rqstp, host_err);
> + goto out_nfserr;
> + }
> + *cnt += host_err;
> + nfsd_stats_io_write_add(nn, exp, host_err);
> +
> + fsnotify_modify(file);
> + host_err = filemap_check_wb_err(file->f_mapping, since);
> + if (host_err < 0)
> + goto out_nfserr;
> +
> + if (stable && fhp->fh_use_wgather) {
> + host_err = wait_for_concurrent_writes(file);
> + if (host_err < 0) {
> + commit_reset_write_verifier(nn, rqstp, host_err);
> + goto out_nfserr;
> + }
> + }
> }
>
> out_nfserr:
The rest looks good though.
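To illustrate the point about resampling, here is a deliberately simplified toy model. It is not the kernel's errseq_t: the counter, sample_wb_err(), wb_err_since() and run_passes() are all hypothetical. It only shows why taking the sample once before the loop catches an error that lands between two passes, while resampling on every pass can miss it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy writeback error sequence: bumped once per reported error. */
static unsigned int wb_err_seq;

static unsigned int sample_wb_err(void)
{
	return wb_err_seq;
}

static bool wb_err_since(unsigned int since)
{
	return wb_err_seq != since;
}

/*
 * Run n_passes write passes. One writeback error is injected right after
 * the check of pass error_after_pass, i.e. between two passes.
 * Returns true if any pass noticed the error.
 */
static bool run_passes(int n_passes, int error_after_pass, bool resample)
{
	unsigned int since = sample_wb_err();
	bool seen = false;

	for (int i = 0; i < n_passes; i++) {
		if (resample)
			since = sample_wb_err();

		/* ... the write for pass i would be issued here ... */

		if (wb_err_since(since))
			seen = true;

		if (i == error_after_pass)
			wb_err_seq++;	/* error lands between passes */
	}
	return seen;
}
```

With resampling, the pass that follows the error takes its sample after the counter has already moved, so its check compares equal and the error goes unreported.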
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v2 1/4] NFSD: refactor nfsd_read_vector_dio to EVENT_CLASS useful for READ and WRITE
2025-07-31 19:44 ` [PATCH v2 1/4] NFSD: refactor nfsd_read_vector_dio to EVENT_CLASS useful for READ and WRITE Mike Snitzer
@ 2025-07-31 20:28 ` Jeff Layton
0 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2025-07-31 20:28 UTC (permalink / raw)
To: Mike Snitzer, Chuck Lever; +Cc: linux-nfs
On Thu, 2025-07-31 at 15:44 -0400, Mike Snitzer wrote:
> Transform nfsd_read_vector_dio trace event into nfsd_analyze_dio_class
> and use it to create nfsd_analyze_read_dio and nfsd_analyze_write_dio
> trace events.
>
> This prepares for nfsd_vfs_write() to also make use of it when
> handling misaligned WRITEs.
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfsd/trace.h | 52 ++++++++++++++++++++++++++++++++++++-------------
> fs/nfsd/vfs.c | 11 ++++++-----
> 2 files changed, 44 insertions(+), 19 deletions(-)
>
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index 55055482f8a84..4173bd9344b6b 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -473,25 +473,29 @@ DEFINE_NFSD_IO_EVENT(write_done);
> DEFINE_NFSD_IO_EVENT(commit_start);
> DEFINE_NFSD_IO_EVENT(commit_done);
>
> -TRACE_EVENT(nfsd_read_vector_dio,
> +DECLARE_EVENT_CLASS(nfsd_analyze_dio_class,
> TP_PROTO(struct svc_rqst *rqstp,
> struct svc_fh *fhp,
> u64 offset,
> u32 len,
> - loff_t start,
> - loff_t start_extra,
> - loff_t end,
> - loff_t end_extra),
> - TP_ARGS(rqstp, fhp, offset, len, start, start_extra, end, end_extra),
> + loff_t start,
> + ssize_t start_len,
> + loff_t middle,
> + ssize_t middle_len,
> + loff_t end,
> + ssize_t end_len),
> + TP_ARGS(rqstp, fhp, offset, len, start, start_len, middle, middle_len, end, end_len),
> TP_STRUCT__entry(
> __field(u32, xid)
> __field(u32, fh_hash)
> __field(u64, offset)
> __field(u32, len)
> __field(loff_t, start)
> - __field(loff_t, start_extra)
> + __field(ssize_t, start_len)
> + __field(loff_t, middle)
> + __field(ssize_t, middle_len)
> __field(loff_t, end)
> - __field(loff_t, end_extra)
> + __field(ssize_t, end_len)
> ),
> TP_fast_assign(
> __entry->xid = be32_to_cpu(rqstp->rq_xid);
> @@ -499,16 +503,36 @@ TRACE_EVENT(nfsd_read_vector_dio,
> __entry->offset = offset;
> __entry->len = len;
> __entry->start = start;
> - __entry->start_extra = start_extra;
> + __entry->start_len = start_len;
> + __entry->middle = middle;
> + __entry->middle_len = middle_len;
> __entry->end = end;
> - __entry->end_extra = end_extra;
> + __entry->end_len = end_len;
> ),
> - TP_printk("xid=0x%08x fh_hash=0x%08x offset=%llu len=%u start=%llu+%llu end=%llu-%llu",
> + TP_printk("xid=0x%08x fh_hash=0x%08x offset=%llu len=%u start=%llu+%lu middle=%llu+%lu end=%llu+%lu",
> __entry->xid, __entry->fh_hash,
> __entry->offset, __entry->len,
> - __entry->start, __entry->start_extra,
> - __entry->end, __entry->end_extra)
> -);
> + __entry->start, __entry->start_len,
> + __entry->middle, __entry->middle_len,
> + __entry->end, __entry->end_len)
> +)
> +
> +#define DEFINE_NFSD_ANALYZE_DIO_EVENT(name) \
> +DEFINE_EVENT(nfsd_analyze_dio_class, nfsd_analyze_##name##_dio, \
> + TP_PROTO(struct svc_rqst *rqstp, \
> + struct svc_fh *fhp, \
> + u64 offset, \
> + u32 len, \
> + loff_t start, \
> + ssize_t start_len, \
> + loff_t middle, \
> + ssize_t middle_len, \
> + loff_t end, \
> + ssize_t end_len), \
> + TP_ARGS(rqstp, fhp, offset, len, start, start_len, middle, middle_len, end, end_len))
> +
> +DEFINE_NFSD_ANALYZE_DIO_EVENT(read);
> +DEFINE_NFSD_ANALYZE_DIO_EVENT(write);
>
> DECLARE_EVENT_CLASS(nfsd_err_class,
> TP_PROTO(struct svc_rqst *rqstp,
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 46189020172fb..35c29b8ade9c3 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1094,7 +1094,7 @@ static bool nfsd_analyze_read_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
> struct nfsd_read_dio *read_dio)
> {
> const u32 dio_blocksize = nf->nf_dio_read_offset_align;
> - loff_t orig_end = offset + len;
> + loff_t middle_end, orig_end = offset + len;
>
> if (WARN_ONCE(!nf->nf_dio_mem_align || !nf->nf_dio_read_offset_align,
> "%s: underlying filesystem has not provided DIO alignment info\n",
> @@ -1133,10 +1133,11 @@ static bool nfsd_analyze_read_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
> }
>
> /* Show original offset and count, and how it was expanded for DIO */
> - trace_nfsd_read_vector_dio(rqstp, fhp, offset, len,
> - read_dio->start, read_dio->start_extra,
> - read_dio->end, read_dio->end_extra);
> -
> + middle_end = read_dio->end - read_dio->end_extra;
> + trace_nfsd_analyze_read_dio(rqstp, fhp, offset, len,
> + read_dio->start, read_dio->start_extra,
> + offset, (middle_end - offset),
> + middle_end, read_dio->end_extra);
> return true;
> }
>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v2 2/4] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs
2025-07-31 20:28 ` Jeff Layton
@ 2025-07-31 20:49 ` Mike Snitzer
0 siblings, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2025-07-31 20:49 UTC (permalink / raw)
To: Jeff Layton; +Cc: Chuck Lever, linux-nfs
On Thu, Jul 31, 2025 at 04:28:13PM -0400, Jeff Layton wrote:
> On Thu, 2025-07-31 at 15:44 -0400, Mike Snitzer wrote:
> > Refactor nfsd_vfs_write() to support splitting a WRITE into parts
> > (which will be either misaligned or DIO-aligned). Doing so in a
> > preliminary commit just allows for indentation and slight
> > transformation to be more easily understood and reviewed.
> >
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> > fs/nfsd/vfs.c | 50 ++++++++++++++++++++++++++++++--------------------
> > 1 file changed, 30 insertions(+), 20 deletions(-)
> >
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 35c29b8ade9c3..e4855c32dad12 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1341,7 +1341,6 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > struct super_block *sb = file_inode(file)->i_sb;
> > struct kiocb kiocb;
> > struct svc_export *exp;
> > - struct iov_iter iter;
> > errseq_t since;
> > __be32 nfserr;
> > int host_err;
> > @@ -1349,6 +1348,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > unsigned int pflags = current->flags;
> > bool restore_flags = false;
> > unsigned int nvecs;
> > + struct iov_iter iter_stack[1];
> > + struct iov_iter *iter = iter_stack;
> > + unsigned int n_iters = 0;
> >
> > trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
> >
> > @@ -1378,14 +1380,15 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > kiocb.ki_flags |= IOCB_DSYNC;
> >
> > nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
> > - iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
> > + iov_iter_bvec(&iter[0], ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
> > + n_iters++;
> >
> > switch (nfsd_io_cache_write) {
> > case NFSD_IO_DIRECT:
> > /* direct I/O must be aligned to device logical sector size */
> > if (nf->nf_dio_mem_align && nf->nf_dio_offset_align &&
> > (((offset | *cnt) & (nf->nf_dio_offset_align-1)) == 0) &&
> > - iov_iter_is_aligned(&iter, nf->nf_dio_mem_align - 1,
> > + iov_iter_is_aligned(&iter[0], nf->nf_dio_mem_align - 1,
> > nf->nf_dio_offset_align - 1))
> > kiocb.ki_flags = IOCB_DIRECT;
> > break;
> > @@ -1396,25 +1399,32 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > break;
> > }
> >
> > - since = READ_ONCE(file->f_wb_err);
> > - if (verf)
> > - nfsd_copy_write_verifier(verf, nn);
> > - host_err = vfs_iocb_iter_write(file, &kiocb, &iter);
> > - if (host_err < 0) {
> > - commit_reset_write_verifier(nn, rqstp, host_err);
> > - goto out_nfserr;
> > - }
> > - *cnt = host_err;
> > - nfsd_stats_io_write_add(nn, exp, *cnt);
> > - fsnotify_modify(file);
> > - host_err = filemap_check_wb_err(file->f_mapping, since);
> > - if (host_err < 0)
> > - goto out_nfserr;
> > + *cnt = 0;
> > + for (int i = 0; i < n_iters; i++) {
> > + since = READ_ONCE(file->f_wb_err);
>
> The above assignment can stay outside the loop. No need to resample it
> on every pass, and doing that could cause you to miss errors.
>
> > + if (verf)
> > + nfsd_copy_write_verifier(verf, nn);
> >
>
> The verf doesn't need to be copied every time either.
OK, ack to both, will fix.
> > - if (stable && fhp->fh_use_wgather) {
> > - host_err = wait_for_concurrent_writes(file);
> > - if (host_err < 0)
> > + host_err = vfs_iocb_iter_write(file, &kiocb, &iter[i]);
> > + if (host_err < 0) {
> > commit_reset_write_verifier(nn, rqstp, host_err);
> > + goto out_nfserr;
> > + }
> > + *cnt += host_err;
> > + nfsd_stats_io_write_add(nn, exp, host_err);
> > +
> > + fsnotify_modify(file);
> > + host_err = filemap_check_wb_err(file->f_mapping, since);
> > + if (host_err < 0)
> > + goto out_nfserr;
> > +
> > + if (stable && fhp->fh_use_wgather) {
> > + host_err = wait_for_concurrent_writes(file);
> > + if (host_err < 0) {
> > + commit_reset_write_verifier(nn, rqstp, host_err);
> > + goto out_nfserr;
> > + }
> > + }
> > }
> >
> > out_nfserr:
>
> The rest looks good though.
Thanks for the review!
* Re: [PATCH v2 3/4] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned
2025-07-31 19:44 ` [PATCH v2 3/4] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned Mike Snitzer
@ 2025-07-31 20:53 ` Jeff Layton
0 siblings, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2025-07-31 20:53 UTC (permalink / raw)
To: Mike Snitzer, Chuck Lever; +Cc: linux-nfs
On Thu, 2025-07-31 at 15:44 -0400, Mike Snitzer wrote:
> If NFSD_IO_DIRECT is used, split any misaligned WRITE into a start,
> middle and end as needed. The large middle extent is DIO-aligned and
> the start and/or end are misaligned. Buffered IO is used for the
> misaligned extents and O_DIRECT is used for the middle DIO-aligned
> extent.
>
> The nfsd_analyze_write_dio trace event shows how NFSD splits a given
> misaligned WRITE into a mix of misaligned extent(s) and a DIO-aligned
> extent.
>
> This combination of trace events is useful:
>
> echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_opened/enable
> echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_analyze_write_dio/enable
> echo 1 > /sys/kernel/tracing/events/nfsd/nfsd_write_io_done/enable
> echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable
>
> Which for this dd command:
>
> dd if=/dev/zero of=/mnt/share1/test bs=47008 count=2 oflag=direct
>
> Results in:
>
> nfsd-55714 [043] ..... 79976.260851: nfsd_write_opened: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008
> nfsd-55714 [043] ..... 79976.260852: nfsd_analyze_write_dio: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008 start=0+0 middle=0+45056 end=45056+1952
> nfsd-55714 [043] ..... 79976.260857: xfs_file_direct_write: dev 259:12 ino 0x3e00008f disize 0x0 pos 0x0 bytecount 0xb000
> nfsd-55714 [043] ..... 79976.260965: nfsd_write_io_done: xid=0x966c5d2d fh_hash=0x4d34e6c1 offset=0 len=47008
>
> nfsd-55714 [043] ..... 79976.307762: nfsd_write_opened: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008
> nfsd-55714 [043] ..... 79976.307762: nfsd_analyze_write_dio: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008 start=47008+2144 middle=49152+40960 end=90112+3904
> nfsd-55714 [043] ..... 79976.307797: xfs_file_direct_write: dev 259:12 ino 0x3e00008f disize 0xc000 pos 0xc000 bytecount 0xa000
> nfsd-55714 [043] ..... 79976.307866: nfsd_write_io_done: xid=0x67e5ce6f fh_hash=0x4d34e6c1 offset=47008 len=47008
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfsd/vfs.c | 135 ++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 124 insertions(+), 11 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index e4855c32dad12..23360825455a2 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1314,6 +1314,113 @@ static int wait_for_concurrent_writes(struct file *file)
> return err;
> }
>
> +struct nfsd_write_dio
> +{
> + loff_t middle_offset; /* Offset for start of DIO-aligned middle */
> + loff_t end_offset; /* Offset for start of DIO-aligned end */
> + ssize_t start_len; /* Length for misaligned first extent */
> + ssize_t middle_len; /* Length for DIO-aligned middle extent */
> + ssize_t end_len; /* Length for misaligned last extent */
> +};
> +
> +static void init_nfsd_write_dio(struct nfsd_write_dio *write_dio)
> +{
> + memset(write_dio, 0, sizeof(*write_dio));
> +}
> +
> +static bool nfsd_analyze_write_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
> + struct nfsd_file *nf, loff_t offset,
> + unsigned long len, struct nfsd_write_dio *write_dio)
> +{
> + const u32 dio_blocksize = nf->nf_dio_offset_align;
> + loff_t orig_end, middle_end, start_end, start_offset = offset;
> + ssize_t start_len = len;
> + bool aligned = true;
> +
> + if (WARN_ONCE(!nf->nf_dio_mem_align || !dio_blocksize,
> + "%s: underlying filesystem has not provided DIO alignment info\n",
> + __func__))
> + return false;
> +
> + if (WARN_ONCE(dio_blocksize > PAGE_SIZE,
> + "%s: underlying storage's dio_blocksize=%u > PAGE_SIZE=%lu\n",
> + __func__, dio_blocksize, PAGE_SIZE))
> + return false;
> +
> + if (unlikely(len < dio_blocksize)) {
> + aligned = false;
> + goto out;
> + }
> +
> + if (((offset | len) & (dio_blocksize-1)) == 0) {
> + /* already DIO-aligned, no misaligned head or tail */
> + write_dio->middle_offset = offset;
> + write_dio->middle_len = len;
> + /* clear these for the benefit of trace_nfsd_analyze_write_dio */
> + start_offset = 0;
> + start_len = 0;
> + goto out;
> + }
> +
> + start_end = round_up(offset, dio_blocksize);
> + start_len = start_end - offset;
> + orig_end = offset + len;
> + middle_end = round_down(orig_end, dio_blocksize);
> +
> + write_dio->start_len = start_len;
> + write_dio->middle_offset = start_end;
> + write_dio->middle_len = middle_end - start_end;
> + write_dio->end_offset = middle_end;
> + write_dio->end_len = orig_end - middle_end;
> +out:
> + trace_nfsd_analyze_write_dio(rqstp, fhp, offset, len, start_offset, start_len,
> + write_dio->middle_offset, write_dio->middle_len,
> + write_dio->end_offset, write_dio->end_len);
> + return aligned;
> +}
> +
> +/*
> + * Setup as many as 3 iov_iter based on extents possibly described by @write_dio.
> + * @iterp: pointer to pointer to onstack array of 3 iov_iter structs from caller.
> + * @rq_bvec: backing bio_vec used to setup all 3 iov_iter permutations.
> + * @nvecs: number of segments in @rq_bvec
> + * @cnt: size of the request in bytes
> + * @write_dio: nfsd_write_dio struct that describes start, middle and end extents.
> + *
> + * Returns the number of iov_iter that were setup.
> + */
> +static int nfsd_setup_write_iters(struct iov_iter **iterp, struct bio_vec *rq_bvec,
> + unsigned int nvecs, unsigned long cnt,
> + struct nfsd_write_dio *write_dio)
> +{
> + int n_iters = 0;
> + struct iov_iter *iters = *iterp;
> +
> + /* Setup misaligned start? */
> + if (write_dio->start_len) {
> + iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
> + iters[n_iters].count = write_dio->start_len;
> + n_iters++;
> + }
> +
> + /* Setup possibly DIO-aligned middle */
> + iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
> + if (write_dio->start_len)
> + iov_iter_advance(&iters[n_iters], write_dio->start_len);
> + iters[n_iters].count -= write_dio->end_len;
> + n_iters++;
> +
> + /* Setup misaligned end? */
> + if (write_dio->end_len) {
> + iov_iter_bvec(&iters[n_iters], ITER_SOURCE, rq_bvec, nvecs, cnt);
> + iov_iter_advance(&iters[n_iters],
> + write_dio->start_len + write_dio->middle_len);
> + n_iters++;
> + }
> +
> + return n_iters;
> +}
> +
> /**
> * nfsd_vfs_write - write data to an already-open file
> * @rqstp: RPC execution context
> @@ -1348,9 +1455,11 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> unsigned int pflags = current->flags;
> bool restore_flags = false;
> unsigned int nvecs;
> - struct iov_iter iter_stack[1];
> + struct iov_iter iter_stack[3];
> struct iov_iter *iter = iter_stack;
> unsigned int n_iters = 0;
> + bool dio_aligned = false;
> + struct nfsd_write_dio write_dio;
>
> trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
>
> @@ -1379,18 +1488,12 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> if (stable && !fhp->fh_use_wgather)
> kiocb.ki_flags |= IOCB_DSYNC;
>
> - nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
> - iov_iter_bvec(&iter[0], ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
> - n_iters++;
> -
> + init_nfsd_write_dio(&write_dio);
> switch (nfsd_io_cache_write) {
> case NFSD_IO_DIRECT:
> - /* direct I/O must be aligned to device logical sector size */
> - if (nf->nf_dio_mem_align && nf->nf_dio_offset_align &&
> - (((offset | *cnt) & (nf->nf_dio_offset_align-1)) == 0) &&
> - iov_iter_is_aligned(&iter[0], nf->nf_dio_mem_align - 1,
> - nf->nf_dio_offset_align - 1))
> - kiocb.ki_flags = IOCB_DIRECT;
> + if (nfsd_analyze_write_dio(rqstp, fhp, nf, offset,
> + *cnt, &write_dio))
> + dio_aligned = true;
> break;
> case NFSD_IO_DONTCACHE:
> kiocb.ki_flags = IOCB_DONTCACHE;
> @@ -1399,12 +1502,22 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> break;
> }
>
> + nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
> + n_iters = nfsd_setup_write_iters(&iter, rqstp->rq_bvec, nvecs, *cnt, &write_dio);
> +
> *cnt = 0;
> for (int i = 0; i < n_iters; i++) {
> since = READ_ONCE(file->f_wb_err);
> if (verf)
> nfsd_copy_write_verifier(verf, nn);
>
> + if (dio_aligned) {
> + if (iov_iter_is_aligned(&iter[i], nf->nf_dio_mem_align - 1,
> + nf->nf_dio_offset_align - 1))
> + kiocb.ki_flags |= IOCB_DIRECT;
> + else
> + kiocb.ki_flags &= ~IOCB_DIRECT;
> + }
> host_err = vfs_iocb_iter_write(file, &kiocb, &iter[i]);
> if (host_err < 0) {
> commit_reset_write_verifier(nn, rqstp, host_err);
Reviewed-by: Jeff Layton <jlayton@kernel.org>
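
For readers following the trace output quoted above, the start/middle/end
arithmetic can be reproduced in userspace. This is only a sketch under
stated assumptions: split_write() and struct write_split are hypothetical
names (not the kernel's), round_up()/round_down() are open-coded as mask
operations, and the 4096-byte block size is inferred from the trace rather
than stated in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the patch's struct nfsd_write_dio. */
struct write_split {
	uint64_t start_len;	/* misaligned head, begins at the original offset */
	uint64_t middle_offset;	/* first blocksize-aligned byte */
	uint64_t middle_len;	/* DIO-aligned middle extent */
	uint64_t end_offset;	/* where the misaligned tail begins */
	uint64_t end_len;	/* misaligned tail */
};

/*
 * Split [offset, offset + len) around blocksize boundaries.
 * Returns false when the WRITE is shorter than one block, in which
 * case no split is described and *s is left zeroed.
 */
static bool split_write(uint64_t offset, uint64_t len, uint64_t blocksize,
			struct write_split *s)
{
	const uint64_t mask = blocksize - 1;
	const uint64_t orig_end = offset + len;
	uint64_t start_end, middle_end;

	/* The mask arithmetic below is only valid for powers of two. */
	assert(blocksize != 0 && (blocksize & mask) == 0);

	*s = (struct write_split){ 0 };
	if (len < blocksize)
		return false;

	start_end = (offset + mask) & ~mask;	/* round_up(offset) */
	middle_end = orig_end & ~mask;		/* round_down(orig_end) */

	s->start_len = start_end - offset;
	s->middle_offset = start_end;
	s->middle_len = middle_end - start_end;
	s->end_offset = middle_end;
	s->end_len = orig_end - middle_end;
	return true;
}
```

With blocksize 4096, split_write(47008, 47008, ...) gives a 2144-byte
head, a 40960-byte middle at 49152 and a 3904-byte tail at 90112, which
matches the second nfsd_analyze_write_dio line in the commit message.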
* Re: [PATCH v2 2/4] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs
2025-07-31 19:44 ` [PATCH v2 2/4] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs Mike Snitzer
2025-07-31 20:28 ` Jeff Layton
@ 2025-07-31 20:54 ` Jeff Layton
1 sibling, 0 replies; 19+ messages in thread
From: Jeff Layton @ 2025-07-31 20:54 UTC (permalink / raw)
To: Mike Snitzer, Chuck Lever; +Cc: linux-nfs
On Thu, 2025-07-31 at 15:44 -0400, Mike Snitzer wrote:
> Refactor nfsd_vfs_write() to support splitting a WRITE into parts
> (which will be either misaligned or DIO-aligned). Doing so in a
> preliminary commit just allows for indentation and slight
> transformation to be more easily understood and reviewed.
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfsd/vfs.c | 50 ++++++++++++++++++++++++++++++--------------------
> 1 file changed, 30 insertions(+), 20 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 35c29b8ade9c3..e4855c32dad12 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1341,7 +1341,6 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> struct super_block *sb = file_inode(file)->i_sb;
> struct kiocb kiocb;
> struct svc_export *exp;
> - struct iov_iter iter;
> errseq_t since;
> __be32 nfserr;
> int host_err;
> @@ -1349,6 +1348,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> unsigned int pflags = current->flags;
> bool restore_flags = false;
> unsigned int nvecs;
> + struct iov_iter iter_stack[1];
> + struct iov_iter *iter = iter_stack;
> + unsigned int n_iters = 0;
>
> trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
>
> @@ -1378,14 +1380,15 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> kiocb.ki_flags |= IOCB_DSYNC;
>
> nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
> - iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
> + iov_iter_bvec(&iter[0], ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
> + n_iters++;
>
> switch (nfsd_io_cache_write) {
> case NFSD_IO_DIRECT:
> /* direct I/O must be aligned to device logical sector size */
> if (nf->nf_dio_mem_align && nf->nf_dio_offset_align &&
> (((offset | *cnt) & (nf->nf_dio_offset_align-1)) == 0) &&
> - iov_iter_is_aligned(&iter, nf->nf_dio_mem_align - 1,
> + iov_iter_is_aligned(&iter[0], nf->nf_dio_mem_align - 1,
> nf->nf_dio_offset_align - 1))
> kiocb.ki_flags = IOCB_DIRECT;
> break;
> @@ -1396,25 +1399,32 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> break;
> }
>
> - since = READ_ONCE(file->f_wb_err);
> - if (verf)
> - nfsd_copy_write_verifier(verf, nn);
> - host_err = vfs_iocb_iter_write(file, &kiocb, &iter);
> - if (host_err < 0) {
> - commit_reset_write_verifier(nn, rqstp, host_err);
> - goto out_nfserr;
> - }
> - *cnt = host_err;
> - nfsd_stats_io_write_add(nn, exp, *cnt);
> - fsnotify_modify(file);
> - host_err = filemap_check_wb_err(file->f_mapping, since);
> - if (host_err < 0)
> - goto out_nfserr;
> + *cnt = 0;
> + for (int i = 0; i < n_iters; i++) {
> + since = READ_ONCE(file->f_wb_err);
> + if (verf)
> + nfsd_copy_write_verifier(verf, nn);
>
Once you move the above bits outside the loop, you can add:
Reviewed-by: Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v2 4/4] NFSD: handle unaligned DIO for NFS reexport
2025-07-31 19:44 ` [PATCH v2 4/4] NFSD: handle unaligned DIO for NFS reexport Mike Snitzer
@ 2025-07-31 20:58 ` Jeff Layton
2025-07-31 21:28 ` Mike Snitzer
0 siblings, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2025-07-31 20:58 UTC (permalink / raw)
To: Mike Snitzer, Chuck Lever; +Cc: linux-nfs
On Thu, 2025-07-31 at 15:44 -0400, Mike Snitzer wrote:
> NFS doesn't have any DIO alignment constraints but it doesn't support
> STATX_DIOALIGN, so update NFSD such that it doesn't disable the use of
> NFSD_IO_DIRECT if it is reexporting NFS.
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfs/export.c | 3 ++-
> fs/nfsd/filecache.c | 11 +++++++++++
> include/linux/exportfs.h | 13 +++++++++++++
> 3 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/export.c b/fs/nfs/export.c
> index e9c233b6fd209..2cae75ba6b35d 100644
> --- a/fs/nfs/export.c
> +++ b/fs/nfs/export.c
> @@ -155,5 +155,6 @@ const struct export_operations nfs_export_ops = {
> EXPORT_OP_REMOTE_FS |
> EXPORT_OP_NOATOMIC_ATTR |
> EXPORT_OP_FLUSH_ON_CLOSE |
> - EXPORT_OP_NOLOCKS,
> + EXPORT_OP_NOLOCKS |
> + EXPORT_OP_NO_DIOALIGN_NEEDED,
> };
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 5601e839a72da..ea489dd44fd9a 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -1066,6 +1066,17 @@ nfsd_file_getattr(const struct svc_fh *fhp, struct nfsd_file *nf)
> nfsd_io_cache_write != NFSD_IO_DIRECT))
> return nfs_ok;
>
> + if (exportfs_handles_unaligned_dio(nf->nf_file->f_path.mnt->mnt_sb->s_export_op)) {
> + /* Underlying filesystem doesn't support STATX_DIOALIGN
> + * but it can handle all unaligned DIO, so establish
> + * DIO alignment that is accommodating.
> + */
> + nf->nf_dio_mem_align = 4;
> + nf->nf_dio_offset_align = PAGE_SIZE;
> + nf->nf_dio_read_offset_align = nf->nf_dio_offset_align;
> + return nfs_ok;
> + }
> +
> status = fh_getattr(fhp, &stat);
> if (status != nfs_ok)
> return status;
> diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> index 9369a607224c1..626b8486dd985 100644
> --- a/include/linux/exportfs.h
> +++ b/include/linux/exportfs.h
> @@ -247,6 +247,7 @@ struct export_operations {
> */
> #define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */
> #define EXPORT_OP_NOLOCKS (0x40) /* no file locking support */
> +#define EXPORT_OP_NO_DIOALIGN_NEEDED (0x80) /* fs can handle unaligned DIO */
> unsigned long flags;
> };
>
> @@ -262,6 +263,18 @@ exportfs_cannot_lock(const struct export_operations *export_ops)
> return export_ops->flags & EXPORT_OP_NOLOCKS;
> }
>
> +/**
> + * exportfs_handles_unaligned_dio() - check if export can handle unaligned DIO
> + * @export_ops: the nfs export operations to check
> + *
> + * Returns true if the export can handle unaligned DIO.
> + */
> +static inline bool
> +exportfs_handles_unaligned_dio(const struct export_operations *export_ops)
> +{
> + return export_ops->flags & EXPORT_OP_NO_DIOALIGN_NEEDED;
> +}
> +
> extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
> int *max_len, struct inode *parent,
> int flags);
Would it not be simpler (better?) to add support for STATX_DIOALIGN to
NFS, and just have it report '1' for both values?
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v2 4/4] NFSD: handle unaligned DIO for NFS reexport
2025-07-31 20:58 ` Jeff Layton
@ 2025-07-31 21:28 ` Mike Snitzer
2025-07-31 21:45 ` Jeff Layton
2025-07-31 21:48 ` Mike Snitzer
0 siblings, 2 replies; 19+ messages in thread
From: Mike Snitzer @ 2025-07-31 21:28 UTC (permalink / raw)
To: Jeff Layton; +Cc: Chuck Lever, linux-nfs, hch
On Thu, Jul 31, 2025 at 04:58:00PM -0400, Jeff Layton wrote:
> On Thu, 2025-07-31 at 15:44 -0400, Mike Snitzer wrote:
> > NFS doesn't have any DIO alignment constraints but it doesn't support
> > STATX_DIOALIGN, so update NFSD such that it doesn't disable the use of
> > NFSD_IO_DIRECT if it is reexporting NFS.
> >
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> > fs/nfs/export.c | 3 ++-
> > fs/nfsd/filecache.c | 11 +++++++++++
> > include/linux/exportfs.h | 13 +++++++++++++
> > 3 files changed, 26 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nfs/export.c b/fs/nfs/export.c
> > index e9c233b6fd209..2cae75ba6b35d 100644
> > --- a/fs/nfs/export.c
> > +++ b/fs/nfs/export.c
> > @@ -155,5 +155,6 @@ const struct export_operations nfs_export_ops = {
> > EXPORT_OP_REMOTE_FS |
> > EXPORT_OP_NOATOMIC_ATTR |
> > EXPORT_OP_FLUSH_ON_CLOSE |
> > - EXPORT_OP_NOLOCKS,
> > + EXPORT_OP_NOLOCKS |
> > + EXPORT_OP_NO_DIOALIGN_NEEDED,
> > };
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index 5601e839a72da..ea489dd44fd9a 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -1066,6 +1066,17 @@ nfsd_file_getattr(const struct svc_fh *fhp, struct nfsd_file *nf)
> > nfsd_io_cache_write != NFSD_IO_DIRECT))
> > return nfs_ok;
> >
> > + if (exportfs_handles_unaligned_dio(nf->nf_file->f_path.mnt->mnt_sb->s_export_op)) {
> > + /* Underlying filesystem doesn't support STATX_DIOALIGN
> > + * but it can handle all unaligned DIO, so establish
> > + * DIO alignment that is accommodating.
> > + */
> > + nf->nf_dio_mem_align = 4;
> > + nf->nf_dio_offset_align = PAGE_SIZE;
> > + nf->nf_dio_read_offset_align = nf->nf_dio_offset_align;
> > + return nfs_ok;
> > + }
> > +
> > status = fh_getattr(fhp, &stat);
> > if (status != nfs_ok)
> > return status;
> > diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> > index 9369a607224c1..626b8486dd985 100644
> > --- a/include/linux/exportfs.h
> > +++ b/include/linux/exportfs.h
> > @@ -247,6 +247,7 @@ struct export_operations {
> > */
> > #define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */
> > #define EXPORT_OP_NOLOCKS (0x40) /* no file locking support */
> > +#define EXPORT_OP_NO_DIOALIGN_NEEDED (0x80) /* fs can handle unaligned DIO */
> > unsigned long flags;
> > };
> >
> > @@ -262,6 +263,18 @@ exportfs_cannot_lock(const struct export_operations *export_ops)
> > return export_ops->flags & EXPORT_OP_NOLOCKS;
> > }
> >
> > +/**
> > + * exportfs_handles_unaligned_dio() - check if export can handle unaligned DIO
> > + * @export_ops: the nfs export operations to check
> > + *
> > + * Returns true if the export can handle unaligned DIO.
> > + */
> > +static inline bool
> > +exportfs_handles_unaligned_dio(const struct export_operations *export_ops)
> > +{
> > + return export_ops->flags & EXPORT_OP_NO_DIOALIGN_NEEDED;
> > +}
> > +
> > extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
> > int *max_len, struct inode *parent,
> > int flags);
>
>
> Would it not be simpler (better?) to add support for STATX_DIOALIGN to
> NFS, and just have it report '1' for both values?
I suppose adding NFS support for STATX_DIOALIGN, that doesn't actually
go over the wire, does make sense.
But I wouldn't think setting them to 1 is valid. Pretty sure they need
to be a power-of-2 (since they are used as masks passed to
iov_iter_is_aligned).
In addition, we want to make sure NFS's default DIO alignment (which
isn't informed by actual DIO alignment advertised by NFSD's underlying
filesystem and hardware, e.g. XFS and NVMe) is able to be compatible
with both finer (512b) and coarser (4096b) grained DIO alignment.
Only way to achieve that would be to skew toward the coarse-grained
end of the spectrum, right?
More conservative but more likely to work with everything.
Mike
* Re: [PATCH v2 4/4] NFSD: handle unaligned DIO for NFS reexport
2025-07-31 21:28 ` Mike Snitzer
@ 2025-07-31 21:45 ` Jeff Layton
2025-07-31 22:14 ` Mike Snitzer
2025-08-01 23:17 ` Tom Talpey
2025-07-31 21:48 ` Mike Snitzer
1 sibling, 2 replies; 19+ messages in thread
From: Jeff Layton @ 2025-07-31 21:45 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Chuck Lever, linux-nfs, hch
On Thu, 2025-07-31 at 17:28 -0400, Mike Snitzer wrote:
> On Thu, Jul 31, 2025 at 04:58:00PM -0400, Jeff Layton wrote:
> > On Thu, 2025-07-31 at 15:44 -0400, Mike Snitzer wrote:
> > > NFS doesn't have any DIO alignment constraints but it doesn't support
> > > STATX_DIOALIGN, so update NFSD such that it doesn't disable the use of
> > > NFSD_IO_DIRECT if it is reexporting NFS.
> > >
> > > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > > ---
> > > fs/nfs/export.c | 3 ++-
> > > fs/nfsd/filecache.c | 11 +++++++++++
> > > include/linux/exportfs.h | 13 +++++++++++++
> > > 3 files changed, 26 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/nfs/export.c b/fs/nfs/export.c
> > > index e9c233b6fd209..2cae75ba6b35d 100644
> > > --- a/fs/nfs/export.c
> > > +++ b/fs/nfs/export.c
> > > @@ -155,5 +155,6 @@ const struct export_operations nfs_export_ops = {
> > > EXPORT_OP_REMOTE_FS |
> > > EXPORT_OP_NOATOMIC_ATTR |
> > > EXPORT_OP_FLUSH_ON_CLOSE |
> > > - EXPORT_OP_NOLOCKS,
> > > + EXPORT_OP_NOLOCKS |
> > > + EXPORT_OP_NO_DIOALIGN_NEEDED,
> > > };
> > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > index 5601e839a72da..ea489dd44fd9a 100644
> > > --- a/fs/nfsd/filecache.c
> > > +++ b/fs/nfsd/filecache.c
> > > @@ -1066,6 +1066,17 @@ nfsd_file_getattr(const struct svc_fh *fhp, struct nfsd_file *nf)
> > > nfsd_io_cache_write != NFSD_IO_DIRECT))
> > > return nfs_ok;
> > >
> > > + if (exportfs_handles_unaligned_dio(nf->nf_file->f_path.mnt->mnt_sb->s_export_op)) {
> > > + /* Underlying filesystem doesn't support STATX_DIOALIGN
> > > + * but it can handle all unaligned DIO, so establish
> > > + * DIO alignment that is accommodating.
> > > + */
> > > + nf->nf_dio_mem_align = 4;
> > > + nf->nf_dio_offset_align = PAGE_SIZE;
> > > + nf->nf_dio_read_offset_align = nf->nf_dio_offset_align;
> > > + return nfs_ok;
> > > + }
> > > +
> > > status = fh_getattr(fhp, &stat);
> > > if (status != nfs_ok)
> > > return status;
> > > diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> > > index 9369a607224c1..626b8486dd985 100644
> > > --- a/include/linux/exportfs.h
> > > +++ b/include/linux/exportfs.h
> > > @@ -247,6 +247,7 @@ struct export_operations {
> > > */
> > > #define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */
> > > #define EXPORT_OP_NOLOCKS (0x40) /* no file locking support */
> > > +#define EXPORT_OP_NO_DIOALIGN_NEEDED (0x80) /* fs can handle unaligned DIO */
> > > unsigned long flags;
> > > };
> > >
> > > @@ -262,6 +263,18 @@ exportfs_cannot_lock(const struct export_operations *export_ops)
> > > return export_ops->flags & EXPORT_OP_NOLOCKS;
> > > }
> > >
> > > +/**
> > > + * exportfs_handles_unaligned_dio() - check if export can handle unaligned DIO
> > > + * @export_ops: the nfs export operations to check
> > > + *
> > > + * Returns true if the export can handle unaligned DIO.
> > > + */
> > > +static inline bool
> > > +exportfs_handles_unaligned_dio(const struct export_operations *export_ops)
> > > +{
> > > + return export_ops->flags & EXPORT_OP_NO_DIOALIGN_NEEDED;
> > > +}
> > > +
> > > extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
> > > int *max_len, struct inode *parent,
> > > int flags);
> >
> >
> > Would it not be simpler (better?) to add support for STATX_DIOALIGN to
> > NFS, and just have it report '1' for both values?
>
> I suppose adding NFS support for STATX_DIOALIGN, that doesn't actually
> go over the wire, does make sense.
>
The NFS protocol doesn't have any alignment restrictions. The NFS
client supports DIO, but doesn't enforce any sort of alignment
restriction on userland.
> But I wouldn't think setting them to 1 is valid. Pretty sure they need
> to be a power-of-2 (since they are used as masks passed to
> iov_iter_is_aligned).
>
2^0 == 1 :-)
This might be a good thing to bring up to the greater fsdevel
community. What should filesystems that support DIO but don't enforce
any alignment restrictions report for that attribute?
'1' would seem to be the natural thing to return in that case. Maybe we
need to special case that in some of the helpers?
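
To make the mask arithmetic concrete, here is a minimal userspace sketch
of the offset/length check that nfsd_vfs_write() open-codes as
((offset | *cnt) & (align - 1)) == 0. The helper names is_pow2() and
dio_offset_aligned() are made up for illustration. The sketch only shows
that an alignment of 1 yields a mask of 0, so every offset and length
passes; it says nothing about how iov_iter_is_aligned() or other kernel
helpers treat a zero mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True for 1, 2, 4, 8, ... (1 is 2^0, so it counts as a power of two). */
static bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Check that both offset and len are multiples of align, using the
 * same mask trick as the quoted patch. align must be a power of two.
 */
static bool dio_offset_aligned(uint64_t offset, uint64_t len, uint64_t align)
{
	assert(is_pow2(align));
	return ((offset | len) & (align - 1)) == 0;
}
```

With align == 1 the misaligned 47008-byte WRITE from the earlier dd
example passes the check, while the same request fails it for align == 512
or align == 4096.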
> In addition, we want to make sure NFS's default DIO alignment (which
> isn't informed by actual DIO alignment advertised by NFSD's underlying
> filesystem and hardware, e.g. XFS and NVMe) is able to be compatible
> with both finer (512b) and coarser (4096b) grained DIO alignment.
> Only way to achieve that would be to skew toward the coarse-grained
> end of the spectrum, right?
>
> More conservative but more likely to work with everything.
>
I don't think NFS has ever enforced a particular alignment on userland,
at least not with regular network transport. Does RDMA change this?
In any case, I'm fine with taking this for now as a stopgap fix, but we
should aim to plumb proper support for STATX_DIOALIGN in the client
sometime soon. Applications are going to start using that attribute,
and if they get back that it's unsupported, some may fail or fall back
on buffered I/O on NFS.
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v2 4/4] NFSD: handle unaligned DIO for NFS reexport
2025-07-31 21:28 ` Mike Snitzer
2025-07-31 21:45 ` Jeff Layton
@ 2025-07-31 21:48 ` Mike Snitzer
2025-08-01 14:07 ` Chuck Lever
1 sibling, 1 reply; 19+ messages in thread
From: Mike Snitzer @ 2025-07-31 21:48 UTC (permalink / raw)
To: Jeff Layton; +Cc: Chuck Lever, linux-nfs, hch
On Thu, Jul 31, 2025 at 05:28:13PM -0400, Mike Snitzer wrote:
> On Thu, Jul 31, 2025 at 04:58:00PM -0400, Jeff Layton wrote:
> > On Thu, 2025-07-31 at 15:44 -0400, Mike Snitzer wrote:
> > > NFS doesn't have any DIO alignment constraints but it doesn't support
> > > STATX_DIOALIGN, so update NFSD such that it doesn't disable the use of
> > > NFSD_IO_DIRECT if it is reexporting NFS.
> > >
> > > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > > ---
> > > fs/nfs/export.c | 3 ++-
> > > fs/nfsd/filecache.c | 11 +++++++++++
> > > include/linux/exportfs.h | 13 +++++++++++++
> > > 3 files changed, 26 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/nfs/export.c b/fs/nfs/export.c
> > > index e9c233b6fd209..2cae75ba6b35d 100644
> > > --- a/fs/nfs/export.c
> > > +++ b/fs/nfs/export.c
> > > @@ -155,5 +155,6 @@ const struct export_operations nfs_export_ops = {
> > > EXPORT_OP_REMOTE_FS |
> > > EXPORT_OP_NOATOMIC_ATTR |
> > > EXPORT_OP_FLUSH_ON_CLOSE |
> > > - EXPORT_OP_NOLOCKS,
> > > + EXPORT_OP_NOLOCKS |
> > > + EXPORT_OP_NO_DIOALIGN_NEEDED,
> > > };
> > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > index 5601e839a72da..ea489dd44fd9a 100644
> > > --- a/fs/nfsd/filecache.c
> > > +++ b/fs/nfsd/filecache.c
> > > @@ -1066,6 +1066,17 @@ nfsd_file_getattr(const struct svc_fh *fhp, struct nfsd_file *nf)
> > > nfsd_io_cache_write != NFSD_IO_DIRECT))
> > > return nfs_ok;
> > >
> > > + if (exportfs_handles_unaligned_dio(nf->nf_file->f_path.mnt->mnt_sb->s_export_op)) {
> > > + /* Underlying filesystem doesn't support STATX_DIOALIGN
> > > + * but it can handle all unaligned DIO, so establish
> > > + * DIO alignment that is accommodating.
> > > + */
> > > + nf->nf_dio_mem_align = 4;
> > > + nf->nf_dio_offset_align = PAGE_SIZE;
> > > + nf->nf_dio_read_offset_align = nf->nf_dio_offset_align;
> > > + return nfs_ok;
> > > + }
> > > +
> > > status = fh_getattr(fhp, &stat);
> > > if (status != nfs_ok)
> > > return status;
> > > diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> > > index 9369a607224c1..626b8486dd985 100644
> > > --- a/include/linux/exportfs.h
> > > +++ b/include/linux/exportfs.h
> > > @@ -247,6 +247,7 @@ struct export_operations {
> > > */
> > > #define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */
> > > #define EXPORT_OP_NOLOCKS (0x40) /* no file locking support */
> > > +#define EXPORT_OP_NO_DIOALIGN_NEEDED (0x80) /* fs can handle unaligned DIO */
> > > unsigned long flags;
> > > };
> > >
> > > @@ -262,6 +263,18 @@ exportfs_cannot_lock(const struct export_operations *export_ops)
> > > return export_ops->flags & EXPORT_OP_NOLOCKS;
> > > }
> > >
> > > +/**
> > > + * exportfs_handles_unaligned_dio() - check if export can handle unaligned DIO
> > > + * @export_ops: the nfs export operations to check
> > > + *
> > > + * Returns true if the export can handle unaligned DIO.
> > > + */
> > > +static inline bool
> > > +exportfs_handles_unaligned_dio(const struct export_operations *export_ops)
> > > +{
> > > + return export_ops->flags & EXPORT_OP_NO_DIOALIGN_NEEDED;
> > > +}
> > > +
> > > extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
> > > int *max_len, struct inode *parent,
> > > int flags);
> >
> >
> > Would it not be simpler (better?) to add support for STATX_DIOALIGN to
> > NFS, and just have it report '1' for both values?
>
> I suppose adding NFS support for STATX_DIOALIGN, that doesn't actually
> go over the wire, does make sense.
>
> But I wouldn't think setting them to 1 valid. Pretty sure they need
> to be a power-of-2 (since they are used as masks passed to
> iov_iter_is_aligned).
>
> In addition, we want to make sure NFS's default DIO alignment (which
> isn't informed by actual DIO alignment advertised by NFSD's underlying
> filesystem and hardware, e.g. XFS and NVMe) is able to be compatible
> with both finer (512b) and coarser (4096b) grained DIO alignment.
> Only way to achieve that would be to skew toward the coarse-grained
> end of the spectrum, right?
>
> More conservative but more likely to work with everything.
Thinking/looking further: I really do prefer the approach I took in
this patch (over the suggestion to implement STATX_DIOALIGN for NFS).
Otherwise NFS would be forced to needlessly issue an RPC via fh_getattr()
even though we're talking about NFS faking its STATX_DIOALIGN response
(so it doesn't need to do the work of a full-blown GETATTR).
This would be wasteful for the NFS reexport use case.
* Re: [PATCH v2 4/4] NFSD: handle unaligned DIO for NFS reexport
2025-07-31 21:45 ` Jeff Layton
@ 2025-07-31 22:14 ` Mike Snitzer
2025-08-01 23:17 ` Tom Talpey
1 sibling, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2025-07-31 22:14 UTC (permalink / raw)
To: Jeff Layton; +Cc: Chuck Lever, linux-nfs, hch
On Thu, Jul 31, 2025 at 05:45:31PM -0400, Jeff Layton wrote:
> On Thu, 2025-07-31 at 17:28 -0400, Mike Snitzer wrote:
> > On Thu, Jul 31, 2025 at 04:58:00PM -0400, Jeff Layton wrote:
> > > On Thu, 2025-07-31 at 15:44 -0400, Mike Snitzer wrote:
> > > > NFS doesn't have any DIO alignment constraints but it doesn't support
> > > > STATX_DIOALIGN, so update NFSD such that it doesn't disable the use of
> > > > NFSD_IO_DIRECT if it is reexporting NFS.
> > > >
> > > > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > > > ---
> > > > fs/nfs/export.c | 3 ++-
> > > > fs/nfsd/filecache.c | 11 +++++++++++
> > > > include/linux/exportfs.h | 13 +++++++++++++
> > > > 3 files changed, 26 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/fs/nfs/export.c b/fs/nfs/export.c
> > > > index e9c233b6fd209..2cae75ba6b35d 100644
> > > > --- a/fs/nfs/export.c
> > > > +++ b/fs/nfs/export.c
> > > > @@ -155,5 +155,6 @@ const struct export_operations nfs_export_ops = {
> > > > EXPORT_OP_REMOTE_FS |
> > > > EXPORT_OP_NOATOMIC_ATTR |
> > > > EXPORT_OP_FLUSH_ON_CLOSE |
> > > > - EXPORT_OP_NOLOCKS,
> > > > + EXPORT_OP_NOLOCKS |
> > > > + EXPORT_OP_NO_DIOALIGN_NEEDED,
> > > > };
> > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > > index 5601e839a72da..ea489dd44fd9a 100644
> > > > --- a/fs/nfsd/filecache.c
> > > > +++ b/fs/nfsd/filecache.c
> > > > @@ -1066,6 +1066,17 @@ nfsd_file_getattr(const struct svc_fh *fhp, struct nfsd_file *nf)
> > > > nfsd_io_cache_write != NFSD_IO_DIRECT))
> > > > return nfs_ok;
> > > >
> > > > + if (exportfs_handles_unaligned_dio(nf->nf_file->f_path.mnt->mnt_sb->s_export_op)) {
> > > > + /* Underlying filesystem doesn't support STATX_DIOALIGN
> > > > + * but it can handle all unaligned DIO, so establish
> > > > + * DIO alignment that is accommodating.
> > > > + */
> > > > + nf->nf_dio_mem_align = 4;
> > > > + nf->nf_dio_offset_align = PAGE_SIZE;
> > > > + nf->nf_dio_read_offset_align = nf->nf_dio_offset_align;
> > > > + return nfs_ok;
> > > > + }
> > > > +
> > > > status = fh_getattr(fhp, &stat);
> > > > if (status != nfs_ok)
> > > > return status;
> > > > diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> > > > index 9369a607224c1..626b8486dd985 100644
> > > > --- a/include/linux/exportfs.h
> > > > +++ b/include/linux/exportfs.h
> > > > @@ -247,6 +247,7 @@ struct export_operations {
> > > > */
> > > > #define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */
> > > > #define EXPORT_OP_NOLOCKS (0x40) /* no file locking support */
> > > > +#define EXPORT_OP_NO_DIOALIGN_NEEDED (0x80) /* fs can handle unaligned DIO */
> > > > unsigned long flags;
> > > > };
> > > >
> > > > @@ -262,6 +263,18 @@ exportfs_cannot_lock(const struct export_operations *export_ops)
> > > > return export_ops->flags & EXPORT_OP_NOLOCKS;
> > > > }
> > > >
> > > > +/**
> > > > + * exportfs_handles_unaligned_dio() - check if export can handle unaligned DIO
> > > > + * @export_ops: the nfs export operations to check
> > > > + *
> > > > + * Returns true if the export can handle unaligned DIO.
> > > > + */
> > > > +static inline bool
> > > > +exportfs_handles_unaligned_dio(const struct export_operations *export_ops)
> > > > +{
> > > > + return export_ops->flags & EXPORT_OP_NO_DIOALIGN_NEEDED;
> > > > +}
> > > > +
> > > > extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
> > > > int *max_len, struct inode *parent,
> > > > int flags);
> > >
> > >
> > > Would it not be simpler (better?) to add support for STATX_DIOALIGN to
> > > NFS, and just have it report '1' for both values?
> >
> > I suppose adding NFS support for STATX_DIOALIGN, that doesn't actually
> > go over the wire, does make sense.
> >
>
> The NFS protocol doesn't have any alignment restrictions. The NFS
> client supports DIO, but doesn't enforce any sort of alignment
> restriction on userland.
>
> > But I wouldn't think setting them to 1 valid. Pretty sure they need
> > to be a power-of-2 (since they are used as masks passed to
> > iov_iter_is_aligned).
> >
>
> 2^0 == 1 :-)
>
> This might be a good thing to bring up to the greater fsdevel
> community. What should filesystems that support DIO but don't enforce
> any alignment restrictions report for that attribute?
>
> '1' would seem to be the natural thing to return in that case. Maybe we
> need to special case that in some of the helpers?
>
> > In addition, we want to make sure NFS's default DIO alignment (which
> > isn't informed by actual DIO alignment advertised by NFSD's underlying
> > filesystem and hardware, e.g. XFS and NVMe) is able to be compatible
> > with both finer (512b) and coarser (4096b) grained DIO alignment.
> > Only way to achieve that would be to skew toward the coarse-grained
> > end of the spectrum, right?
> >
> > More conservative but more likely to work with everything.
> >
>
>
> I don't think NFS has ever enforced a particular alignment on userland,
> at least not with regular network transport. Does RDMA change this?
Not _exactly_ sure what you're asking. But no, as you mentioned, NFS
doesn't have any DIO alignment constraints -- so it certainly isn't
imposing any. I'm not looking to impose any either. I'm just trying
to have NFSD and NFS offer a sane response in the reexport case ;)
One that doesn't limit the utility of NFSD doing work to shape the IO
so that it is compatible with the remote NFSD(s) by the time it gets
to an NFSD that _actually_ sits on top of a local filesystem like XFS.
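
To make the alignment point concrete, here is a rough userspace sketch
(not the code from this series, and not kernel code) of a mask-based
alignment check and of splitting a misaligned WRITE into a misaligned
head, an aligned middle and a misaligned tail. The names is_pow2(),
is_aligned(), split_write() and struct dio_split are hypothetical and
exist only for illustration. It also shows that an alignment of 1
yields a mask of 0, so every offset passes the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one WRITE split into three segments. */
struct dio_split {
	uint64_t head_len;   /* misaligned bytes before the first boundary */
	uint64_t middle_len; /* aligned bytes that could be issued as DIO */
	uint64_t tail_len;   /* misaligned bytes after the last boundary */
};

/* True if align is a nonzero power of two; 1 == 2^0 qualifies. */
static bool is_pow2(uint64_t align)
{
	return align != 0 && (align & (align - 1)) == 0;
}

/* Mask-based check: with align == 1 the mask is 0, so any value passes. */
static bool is_aligned(uint64_t value, uint64_t align)
{
	assert(is_pow2(align));
	return (value & (align - 1)) == 0;
}

/* Split [offset, offset + len) around align boundaries (no overflow checks). */
static struct dio_split split_write(uint64_t offset, uint64_t len,
				    uint64_t align)
{
	struct dio_split s = { 0, 0, 0 };
	uint64_t mask, start, end;

	assert(is_pow2(align));
	mask = align - 1;
	start = (offset + mask) & ~mask; /* first boundary at or after offset */
	end = (offset + len) & ~mask;    /* last boundary at or before the end */

	if (start >= end) {
		/* No whole aligned block inside the range: treat it all as head. */
		s.head_len = len;
		return s;
	}

	s.head_len = start - offset;
	s.middle_len = end - start;
	s.tail_len = offset + len - end;
	return s;
}
```

With a 4096-byte offset alignment, a 10000-byte WRITE at offset 100
splits into 3996 + 4096 + 1908 bytes. With an alignment of 1 the whole
WRITE lands in the middle segment, which is why a coarser value gives
the reexporting NFSD more to work with.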
> In any case, I'm fine with taking this for now as a stopgap fix, but we
> should aim to plumb proper support for STATX_DIOALIGN in the client
> sometime soon. Applications are going to start using that attribute,
> and if they get back that it's unsupported, some may fail or fall back
> on buffered I/O on NFS.
That is a valid concern. Maybe we'd do well to make it possible for
both NFSD _and_ NFS to avoid going over the wire if all they are
asked to provide is STATX_DIOALIGN | STATX_DIO_READ_ALIGN (via
request_mask).
Currently, fh_getattr() is used (which expects to be querying a local
filesystem) so it is heavier than we need it to be given we're just
looking to populate the nfsd_file's DIO alignment attrs in
nfsd_file_getattr().
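
As a rough illustration of that idea (a userspace model, not a proposed
patch), the check could look something like the following. Everything
prefixed with ex_ or EX_ is hypothetical; the EX_STATX_* values are
assumed to mirror the corresponding uapi request_mask bits, and the
emulated alignment values are left to the caller since the right
numbers are still under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for statx request_mask bits (assumed values). */
#define EX_STATX_SIZE           0x00000200U
#define EX_STATX_DIOALIGN       0x00002000U
#define EX_STATX_DIO_READ_ALIGN 0x00020000U

/* Hypothetical container for the DIO alignment attributes. */
struct ex_dio_attrs {
	uint32_t result_mask;
	uint32_t dio_mem_align;
	uint32_t dio_offset_align;
	uint32_t dio_read_offset_align;
};

/* True if the caller asked for DIO alignment attributes and nothing else. */
static bool ex_dio_only_request(uint32_t request_mask)
{
	const uint32_t dio_bits = EX_STATX_DIOALIGN | EX_STATX_DIO_READ_ALIGN;

	return request_mask != 0 && (request_mask & ~dio_bits) == 0;
}

/*
 * Returns true if the request was answered from the emulated values,
 * false if the caller must fall back to a full attribute fetch (which,
 * for NFS, may mean an RPC).
 */
static bool ex_getattr_fast_path(uint32_t request_mask,
				 const struct ex_dio_attrs *emulated,
				 struct ex_dio_attrs *out)
{
	assert(emulated != NULL && out != NULL);

	if (!ex_dio_only_request(request_mask))
		return false;

	*out = *emulated;
	out->result_mask = request_mask;
	return true;
}
```

A request for only the DIO alignment bits is answered locally, while a
request that also asks for something like the size takes the slow path.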
Mike
* Re: [PATCH v2 4/4] NFSD: handle unaligned DIO for NFS reexport
2025-07-31 21:48 ` Mike Snitzer
@ 2025-08-01 14:07 ` Chuck Lever
2025-08-01 14:33 ` Jeff Layton
0 siblings, 1 reply; 19+ messages in thread
From: Chuck Lever @ 2025-08-01 14:07 UTC (permalink / raw)
To: Mike Snitzer, Jeff Layton; +Cc: linux-nfs, hch
On 7/31/25 5:48 PM, Mike Snitzer wrote:
> On Thu, Jul 31, 2025 at 05:28:13PM -0400, Mike Snitzer wrote:
>> On Thu, Jul 31, 2025 at 04:58:00PM -0400, Jeff Layton wrote:
>>> On Thu, 2025-07-31 at 15:44 -0400, Mike Snitzer wrote:
>>>> NFS doesn't have any DIO alignment constraints but it doesn't support
>>>> STATX_DIOALIGN, so update NFSD such that it doesn't disable the use of
>>>> NFSD_IO_DIRECT if it is reexporting NFS.
>>>>
>>>> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
>>>> ---
>>>> fs/nfs/export.c | 3 ++-
>>>> fs/nfsd/filecache.c | 11 +++++++++++
>>>> include/linux/exportfs.h | 13 +++++++++++++
>>>> 3 files changed, 26 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/nfs/export.c b/fs/nfs/export.c
>>>> index e9c233b6fd209..2cae75ba6b35d 100644
>>>> --- a/fs/nfs/export.c
>>>> +++ b/fs/nfs/export.c
>>>> @@ -155,5 +155,6 @@ const struct export_operations nfs_export_ops = {
>>>> EXPORT_OP_REMOTE_FS |
>>>> EXPORT_OP_NOATOMIC_ATTR |
>>>> EXPORT_OP_FLUSH_ON_CLOSE |
>>>> - EXPORT_OP_NOLOCKS,
>>>> + EXPORT_OP_NOLOCKS |
>>>> + EXPORT_OP_NO_DIOALIGN_NEEDED,
>>>> };
>>>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>>>> index 5601e839a72da..ea489dd44fd9a 100644
>>>> --- a/fs/nfsd/filecache.c
>>>> +++ b/fs/nfsd/filecache.c
>>>> @@ -1066,6 +1066,17 @@ nfsd_file_getattr(const struct svc_fh *fhp, struct nfsd_file *nf)
>>>> nfsd_io_cache_write != NFSD_IO_DIRECT))
>>>> return nfs_ok;
>>>>
>>>> + if (exportfs_handles_unaligned_dio(nf->nf_file->f_path.mnt->mnt_sb->s_export_op)) {
>>>> + /* Underlying filesystem doesn't support STATX_DIOALIGN
>>>> + * but it can handle all unaligned DIO, so establish
>>>> + * DIO alignment that is accommodating.
>>>> + */
>>>> + nf->nf_dio_mem_align = 4;
>>>> + nf->nf_dio_offset_align = PAGE_SIZE;
>>>> + nf->nf_dio_read_offset_align = nf->nf_dio_offset_align;
>>>> + return nfs_ok;
>>>> + }
>>>> +
>>>> status = fh_getattr(fhp, &stat);
>>>> if (status != nfs_ok)
>>>> return status;
>>>> diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
>>>> index 9369a607224c1..626b8486dd985 100644
>>>> --- a/include/linux/exportfs.h
>>>> +++ b/include/linux/exportfs.h
>>>> @@ -247,6 +247,7 @@ struct export_operations {
>>>> */
>>>> #define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */
>>>> #define EXPORT_OP_NOLOCKS (0x40) /* no file locking support */
>>>> +#define EXPORT_OP_NO_DIOALIGN_NEEDED (0x80) /* fs can handle unaligned DIO */
>>>> unsigned long flags;
>>>> };
>>>>
>>>> @@ -262,6 +263,18 @@ exportfs_cannot_lock(const struct export_operations *export_ops)
>>>> return export_ops->flags & EXPORT_OP_NOLOCKS;
>>>> }
>>>>
>>>> +/**
>>>> + * exportfs_handles_unaligned_dio() - check if export can handle unaligned DIO
>>>> + * @export_ops: the nfs export operations to check
>>>> + *
>>>> + * Returns true if the export can handle unaligned DIO.
>>>> + */
>>>> +static inline bool
>>>> +exportfs_handles_unaligned_dio(const struct export_operations *export_ops)
>>>> +{
>>>> + return export_ops->flags & EXPORT_OP_NO_DIOALIGN_NEEDED;
>>>> +}
>>>> +
>>>> extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
>>>> int *max_len, struct inode *parent,
>>>> int flags);
>>>
>>>
>>> Would it not be simpler (better?) to add support for STATX_DIOALIGN to
>>> NFS, and just have it report '1' for both values?
>>
>> I suppose adding NFS support for STATX_DIOALIGN, that doesn't actually
>> go over the wire, does make sense.
>>
>> But I wouldn't think setting them to 1 valid. Pretty sure they need
>> to be a power-of-2 (since they are used as masks passed to
>> iov_iter_is_aligned).
>>
>> In addition, we want to make sure NFS's default DIO alignment (which
>> isn't informed by actual DIO alignment advertised by NFSD's underlying
>> filesystem and hardware, e.g. XFS and NVMe) is able to be compatible
>> with both finer (512b) and coarser (4096b) grained DIO alignment.
>> Only way to achieve that would be to skew toward the coarse-grained
>> end of the spectrum, right?
>>
>> More conservative but more likely to work with everything.
>
> Thinking/looking further: I really do prefer the approach I took in
> this patch (over the suggestion to implement STATX_DIOALIGN for NFS).
>
> Otherwise NFS would forced to needlessly issue an RPC via fh_getattr()
> even though we're talking about NFS faking its STATX_DIOALIGN response
> (so it doesn't need to do the work to do a full-blown GETATTR).
>
> This would be wasteful for the NFS reexport usecase.
Jeff's point is that applications (and in particular, user space NFS
servers) will use statx() to discover these values. The NFS client has
to implement STATX_DIOALIGN.
I don't buy the idea that using vfs_getattr() to call into the NFS
client is wasteful here. Isn't this done once when the nfsd_file
is opened? And, since these are emulated values that are not fetched
via the NFS protocol, wouldn't that mean the NFS client could respond
without sending an RPC?
I prefer to not add the exception processing to NFSD if, in the medium
to long run, the NFS client has to get support for DIOALIGN anyway.
--
Chuck Lever
* Re: [PATCH v2 4/4] NFSD: handle unaligned DIO for NFS reexport
2025-08-01 14:07 ` Chuck Lever
@ 2025-08-01 14:33 ` Jeff Layton
2025-08-01 16:06 ` Mike Snitzer
0 siblings, 1 reply; 19+ messages in thread
From: Jeff Layton @ 2025-08-01 14:33 UTC (permalink / raw)
To: Chuck Lever, Mike Snitzer; +Cc: linux-nfs, hch
On Fri, 2025-08-01 at 10:07 -0400, Chuck Lever wrote:
> On 7/31/25 5:48 PM, Mike Snitzer wrote:
> > On Thu, Jul 31, 2025 at 05:28:13PM -0400, Mike Snitzer wrote:
> > > On Thu, Jul 31, 2025 at 04:58:00PM -0400, Jeff Layton wrote:
> > > > On Thu, 2025-07-31 at 15:44 -0400, Mike Snitzer wrote:
> > > > > NFS doesn't have any DIO alignment constraints but it doesn't support
> > > > > STATX_DIOALIGN, so update NFSD such that it doesn't disable the use of
> > > > > NFSD_IO_DIRECT if it is reexporting NFS.
> > > > >
> > > > > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > > > > ---
> > > > > fs/nfs/export.c | 3 ++-
> > > > > fs/nfsd/filecache.c | 11 +++++++++++
> > > > > include/linux/exportfs.h | 13 +++++++++++++
> > > > > 3 files changed, 26 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/fs/nfs/export.c b/fs/nfs/export.c
> > > > > index e9c233b6fd209..2cae75ba6b35d 100644
> > > > > --- a/fs/nfs/export.c
> > > > > +++ b/fs/nfs/export.c
> > > > > @@ -155,5 +155,6 @@ const struct export_operations nfs_export_ops = {
> > > > > EXPORT_OP_REMOTE_FS |
> > > > > EXPORT_OP_NOATOMIC_ATTR |
> > > > > EXPORT_OP_FLUSH_ON_CLOSE |
> > > > > - EXPORT_OP_NOLOCKS,
> > > > > + EXPORT_OP_NOLOCKS |
> > > > > + EXPORT_OP_NO_DIOALIGN_NEEDED,
> > > > > };
> > > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > > > index 5601e839a72da..ea489dd44fd9a 100644
> > > > > --- a/fs/nfsd/filecache.c
> > > > > +++ b/fs/nfsd/filecache.c
> > > > > @@ -1066,6 +1066,17 @@ nfsd_file_getattr(const struct svc_fh *fhp, struct nfsd_file *nf)
> > > > > nfsd_io_cache_write != NFSD_IO_DIRECT))
> > > > > return nfs_ok;
> > > > >
> > > > > + if (exportfs_handles_unaligned_dio(nf->nf_file->f_path.mnt->mnt_sb->s_export_op)) {
> > > > > + /* Underlying filesystem doesn't support STATX_DIOALIGN
> > > > > + * but it can handle all unaligned DIO, so establish
> > > > > + * DIO alignment that is accommodating.
> > > > > + */
> > > > > + nf->nf_dio_mem_align = 4;
> > > > > + nf->nf_dio_offset_align = PAGE_SIZE;
> > > > > + nf->nf_dio_read_offset_align = nf->nf_dio_offset_align;
> > > > > + return nfs_ok;
> > > > > + }
> > > > > +
> > > > > status = fh_getattr(fhp, &stat);
> > > > > if (status != nfs_ok)
> > > > > return status;
> > > > > diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> > > > > index 9369a607224c1..626b8486dd985 100644
> > > > > --- a/include/linux/exportfs.h
> > > > > +++ b/include/linux/exportfs.h
> > > > > @@ -247,6 +247,7 @@ struct export_operations {
> > > > > */
> > > > > #define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */
> > > > > #define EXPORT_OP_NOLOCKS (0x40) /* no file locking support */
> > > > > +#define EXPORT_OP_NO_DIOALIGN_NEEDED (0x80) /* fs can handle unaligned DIO */
> > > > > unsigned long flags;
> > > > > };
> > > > >
> > > > > @@ -262,6 +263,18 @@ exportfs_cannot_lock(const struct export_operations *export_ops)
> > > > > return export_ops->flags & EXPORT_OP_NOLOCKS;
> > > > > }
> > > > >
> > > > > +/**
> > > > > + * exportfs_handles_unaligned_dio() - check if export can handle unaligned DIO
> > > > > + * @export_ops: the nfs export operations to check
> > > > > + *
> > > > > + * Returns true if the export can handle unaligned DIO.
> > > > > + */
> > > > > +static inline bool
> > > > > +exportfs_handles_unaligned_dio(const struct export_operations *export_ops)
> > > > > +{
> > > > > + return export_ops->flags & EXPORT_OP_NO_DIOALIGN_NEEDED;
> > > > > +}
> > > > > +
> > > > > extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
> > > > > int *max_len, struct inode *parent,
> > > > > int flags);
> > > >
> > > >
> > > > Would it not be simpler (better?) to add support for STATX_DIOALIGN to
> > > > NFS, and just have it report '1' for both values?
> > >
> > > I suppose adding NFS support for STATX_DIOALIGN, that doesn't actually
> > > go over the wire, does make sense.
> > >
> > > But I wouldn't think setting them to 1 valid. Pretty sure they need
> > > to be a power-of-2 (since they are used as masks passed to
> > > iov_iter_is_aligned).
> > >
> > > In addition, we want to make sure NFS's default DIO alignment (which
> > > isn't informed by actual DIO alignment advertised by NFSD's underlying
> > > filesystem and hardware, e.g. XFS and NVMe) is able to be compatible
> > > with both finer (512b) and coarser (4096b) grained DIO alignment.
> > > Only way to achieve that would be to skew toward the coarse-grained
> > > end of the spectrum, right?
> > >
> > > More conservative but more likely to work with everything.
> >
> > Thinking/looking further: I really do prefer the approach I took in
> > this patch (over the suggestion to implement STATX_DIOALIGN for NFS).
> >
> > Otherwise NFS would forced to needlessly issue an RPC via fh_getattr()
> > even though we're talking about NFS faking its STATX_DIOALIGN response
> > (so it doesn't need to do the work to do a full-blown GETATTR).
> >
> > This would be wasteful for the NFS reexport usecase.
>
> Jeff's point is that applications (and in particular, user space NFS
> servers) will use statx() to discover these values. The NFS client has
> to implement STATX_DIOALIGN.
>
> I don't buy the idea that using vfs_getattr() to call into the NFS
> client is wasteful here. Isn't this done once when the nfsd_file
> is opened? And, since these are emulated values that are not fetched
> via the NFS protocol, wouldn't that mean the NFS client could respond
> without sending an RPC?
>
> I prefer to not add the exception processing to NFSD if, in the medium
> to long run, the NFS client has to get support for DIOALIGN anyway.
>
I too think this would be a better approach. We have other exportable
filesystems that have no DIO alignment restrictions too (Ceph comes to
mind, but there are others). It would be nice if they "just worked" and
didn't have to do special EXPORT_* flag shenanigans.
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v2 4/4] NFSD: handle unaligned DIO for NFS reexport
2025-08-01 14:33 ` Jeff Layton
@ 2025-08-01 16:06 ` Mike Snitzer
0 siblings, 0 replies; 19+ messages in thread
From: Mike Snitzer @ 2025-08-01 16:06 UTC (permalink / raw)
To: Jeff Layton; +Cc: Chuck Lever, linux-nfs, hch
On Fri, Aug 01, 2025 at 10:33:02AM -0400, Jeff Layton wrote:
> On Fri, 2025-08-01 at 10:07 -0400, Chuck Lever wrote:
> > On 7/31/25 5:48 PM, Mike Snitzer wrote:
> > > On Thu, Jul 31, 2025 at 05:28:13PM -0400, Mike Snitzer wrote:
> > > > On Thu, Jul 31, 2025 at 04:58:00PM -0400, Jeff Layton wrote:
> > > > > On Thu, 2025-07-31 at 15:44 -0400, Mike Snitzer wrote:
> > > > > > NFS doesn't have any DIO alignment constraints but it doesn't support
> > > > > > STATX_DIOALIGN, so update NFSD such that it doesn't disable the use of
> > > > > > NFSD_IO_DIRECT if it is reexporting NFS.
> > > > > >
> > > > > > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > > > > > ---
> > > > > > fs/nfs/export.c | 3 ++-
> > > > > > fs/nfsd/filecache.c | 11 +++++++++++
> > > > > > include/linux/exportfs.h | 13 +++++++++++++
> > > > > > 3 files changed, 26 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/fs/nfs/export.c b/fs/nfs/export.c
> > > > > > index e9c233b6fd209..2cae75ba6b35d 100644
> > > > > > --- a/fs/nfs/export.c
> > > > > > +++ b/fs/nfs/export.c
> > > > > > @@ -155,5 +155,6 @@ const struct export_operations nfs_export_ops = {
> > > > > > EXPORT_OP_REMOTE_FS |
> > > > > > EXPORT_OP_NOATOMIC_ATTR |
> > > > > > EXPORT_OP_FLUSH_ON_CLOSE |
> > > > > > - EXPORT_OP_NOLOCKS,
> > > > > > + EXPORT_OP_NOLOCKS |
> > > > > > + EXPORT_OP_NO_DIOALIGN_NEEDED,
> > > > > > };
> > > > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > > > > index 5601e839a72da..ea489dd44fd9a 100644
> > > > > > --- a/fs/nfsd/filecache.c
> > > > > > +++ b/fs/nfsd/filecache.c
> > > > > > @@ -1066,6 +1066,17 @@ nfsd_file_getattr(const struct svc_fh *fhp, struct nfsd_file *nf)
> > > > > > nfsd_io_cache_write != NFSD_IO_DIRECT))
> > > > > > return nfs_ok;
> > > > > >
> > > > > > + if (exportfs_handles_unaligned_dio(nf->nf_file->f_path.mnt->mnt_sb->s_export_op)) {
> > > > > > + /* Underlying filesystem doesn't support STATX_DIOALIGN
> > > > > > + * but it can handle all unaligned DIO, so establish
> > > > > > + * DIO alignment that is accommodating.
> > > > > > + */
> > > > > > + nf->nf_dio_mem_align = 4;
> > > > > > + nf->nf_dio_offset_align = PAGE_SIZE;
> > > > > > + nf->nf_dio_read_offset_align = nf->nf_dio_offset_align;
> > > > > > + return nfs_ok;
> > > > > > + }
> > > > > > +
> > > > > > status = fh_getattr(fhp, &stat);
> > > > > > if (status != nfs_ok)
> > > > > > return status;
> > > > > > diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
> > > > > > index 9369a607224c1..626b8486dd985 100644
> > > > > > --- a/include/linux/exportfs.h
> > > > > > +++ b/include/linux/exportfs.h
> > > > > > @@ -247,6 +247,7 @@ struct export_operations {
> > > > > > */
> > > > > > #define EXPORT_OP_FLUSH_ON_CLOSE (0x20) /* fs flushes file data on close */
> > > > > > #define EXPORT_OP_NOLOCKS (0x40) /* no file locking support */
> > > > > > +#define EXPORT_OP_NO_DIOALIGN_NEEDED (0x80) /* fs can handle unaligned DIO */
> > > > > > unsigned long flags;
> > > > > > };
> > > > > >
> > > > > > @@ -262,6 +263,18 @@ exportfs_cannot_lock(const struct export_operations *export_ops)
> > > > > > return export_ops->flags & EXPORT_OP_NOLOCKS;
> > > > > > }
> > > > > >
> > > > > > +/**
> > > > > > + * exportfs_handles_unaligned_dio() - check if export can handle unaligned DIO
> > > > > > + * @export_ops: the nfs export operations to check
> > > > > > + *
> > > > > > + * Returns true if the export can handle unaligned DIO.
> > > > > > + */
> > > > > > +static inline bool
> > > > > > +exportfs_handles_unaligned_dio(const struct export_operations *export_ops)
> > > > > > +{
> > > > > > + return export_ops->flags & EXPORT_OP_NO_DIOALIGN_NEEDED;
> > > > > > +}
> > > > > > +
> > > > > > extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
> > > > > > int *max_len, struct inode *parent,
> > > > > > int flags);
> > > > >
> > > > >
> > > > > Would it not be simpler (better?) to add support for STATX_DIOALIGN to
> > > > > NFS, and just have it report '1' for both values?
> > > >
> > > > I suppose adding NFS support for STATX_DIOALIGN, that doesn't actually
> > > > go over the wire, does make sense.
> > > >
> > > > But I wouldn't think setting them to 1 valid. Pretty sure they need
> > > > to be a power-of-2 (since they are used as masks passed to
> > > > iov_iter_is_aligned).
> > > >
> > > > In addition, we want to make sure NFS's default DIO alignment (which
> > > > isn't informed by actual DIO alignment advertised by NFSD's underlying
> > > > filesystem and hardware, e.g. XFS and NVMe) is able to be compatible
> > > > with both finer (512b) and coarser (4096b) grained DIO alignment.
> > > > Only way to achieve that would be to skew toward the coarse-grained
> > > > end of the spectrum, right?
> > > >
> > > > More conservative but more likely to work with everything.
> > >
> > > Thinking/looking further: I really do prefer the approach I took in
> > > this patch (over the suggestion to implement STATX_DIOALIGN for NFS).
> > >
> > > Otherwise NFS would forced to needlessly issue an RPC via fh_getattr()
> > > even though we're talking about NFS faking its STATX_DIOALIGN response
> > > (so it doesn't need to do the work to do a full-blown GETATTR).
> > >
> > > This would be wasteful for the NFS reexport usecase.
> >
> > Jeff's point is that applications (and in particular, user space NFS
> > servers) will use statx() to discover these values. The NFS client has
> > to implement STATX_DIOALIGN.
I agree, I'll add it to my TODO.
Adding that support is now very important for the NFS reexport case.
But joining them is a choice. They don't _need_ to be tightly coupled.
We could just have NFS and NFSD call the same helper _and_ set the
EXPORT_OP_NO_DIOALIGN_NEEDED I proposed.
> > I don't buy the idea that using vfs_getattr() to call into the NFS
> > client is wasteful here. Isn't this done once when the nfsd_file
> > is opened? And, since these are emulated values that are not fetched
> > via the NFS protocol, wouldn't that mean the NFS client could respond
> > without sending an RPC?
Not if the request_mask includes more than simply asking for
(STATX_DIOALIGN | STATX_DIO_READ_ALIGN) -- which currently will always
be the case for fh_getattr(). But yes, fair point that it only
happens for nfsd_file open.
> > I prefer to not add the exception processing to NFSD if, in the medium
> > to long run, the NFS client has to get support for DIOALIGN anyway.
> >
>
> I too think this would be a better approach. We have other exportable
> filesystems that have no DIO alignment restrictions too (Ceph comes to
> mind, but there are others). It would be nice if they "just worked" and
> didn't have to do special EXPORT_* flag shenanigans.
Not trying to make this harder than it is, just saying my piece and
then acquiescing...
Each filesystem where this is applicable just setting a single well
defined export flag is considerably simpler than the alternative (of
needing to construct an emulated STATX_DIOALIGN response that will
accomplish the goal of NFSD transforming misaligned DIO such that it
isn't too fine-grained to be useful once the rubber meets the road and
the DIO is issued to the local filesystem exported by the ultimate
NFSD endpoint).
Maybe for bonus points, not holding breath, I'll see about adding an
EXPORT_SYMBOL helper <handwave>somewhere</handwave> that will allow
other filesystems like Ceph to benefit too. But to my eyes, an export
flag provides that ;)
Anyway, we have agreement that:
1) You'll drop this 4th patch for NFS reexport
2) I'll work on implementing STATX_DIOALIGN support for NFS
Thanks,
Mike
* Re: [PATCH v2 4/4] NFSD: handle unaligned DIO for NFS reexport
2025-07-31 21:45 ` Jeff Layton
2025-07-31 22:14 ` Mike Snitzer
@ 2025-08-01 23:17 ` Tom Talpey
1 sibling, 0 replies; 19+ messages in thread
From: Tom Talpey @ 2025-08-01 23:17 UTC (permalink / raw)
To: Jeff Layton, Mike Snitzer; +Cc: Chuck Lever, linux-nfs, hch
Just a late interjection on this:
On 7/31/2025 5:45 PM, Jeff Layton wrote:
> I don't think NFS has ever enforced a particular alignment on userland,
> at least not with regular network transport. Does RDMA change this?
RDMA imposes no alignment restrictions per se, it's byte-aligned
and byte-length for both inline and directly-placed payloads.
Some NFS/rpcrdma implementations may reserve direct placement only
for page lists. I believe early Linux NFS clients did, but that was
long ago. Like, 20 years. There's no transport requirement for it.
Tom.
Thread overview: 19+ messages
2025-07-31 19:44 [PATCH v2 0/4] NFSD DIRECT: add handling for misaligned WRITEs Mike Snitzer
2025-07-31 19:44 ` [PATCH v2 1/4] NFSD: refactor nfsd_read_vector_dio to EVENT_CLASS useful for READ and WRITE Mike Snitzer
2025-07-31 20:28 ` Jeff Layton
2025-07-31 19:44 ` [PATCH v2 2/4] NFSD: prepare nfsd_vfs_write() to use O_DIRECT on misaligned WRITEs Mike Snitzer
2025-07-31 20:28 ` Jeff Layton
2025-07-31 20:49 ` Mike Snitzer
2025-07-31 20:54 ` Jeff Layton
2025-07-31 19:44 ` [PATCH v2 3/4] NFSD: issue WRITEs using O_DIRECT even if IO is misaligned Mike Snitzer
2025-07-31 20:53 ` Jeff Layton
2025-07-31 19:44 ` [PATCH v2 4/4] NFSD: handle unaligned DIO for NFS reexport Mike Snitzer
2025-07-31 20:58 ` Jeff Layton
2025-07-31 21:28 ` Mike Snitzer
2025-07-31 21:45 ` Jeff Layton
2025-07-31 22:14 ` Mike Snitzer
2025-08-01 23:17 ` Tom Talpey
2025-07-31 21:48 ` Mike Snitzer
2025-08-01 14:07 ` Chuck Lever
2025-08-01 14:33 ` Jeff Layton
2025-08-01 16:06 ` Mike Snitzer