* [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO
@ 2025-07-22 2:49 Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 1/7] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Mike Snitzer
` (7 more replies)
0 siblings, 8 replies; 14+ messages in thread
From: Mike Snitzer @ 2025-07-22 2:49 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
Hi,
This "NFS DIRECT" series depends on the "NFSD DIRECT" series here:
https://lore.kernel.org/linux-nfs/20250714224216.14329-1-snitzer@kernel.org/
(for the benefit of nfsd_file_dio_alignment patch in this series)
The first patch was posted as part of a LOCALIO revert series I posted
a week or so ago, thankfully that series isn't needed thanks to Trond
and Neil's efforts. BUT the first patch is needed, has Reviewed-by
from Jeff and Neil and is marked for stable@.
The biggest change in v2 is the introduction of O_DIRECT misaligned
READ and WRITE handling for the benefit of LOCALIO. Please see patches
6 and 7 for more details.
Changes since v1:
- renamed nfs modparam from localio_O_DIRECT_align_misaligned_READ to
localio_O_DIRECT_align_misaligned_IO (is used for misaligned READ
and WRITE support in fs/nfs/direct.c)
- added misaligned O_DIRECT handling for both READ and WRITE to
fs/nfs/direct.c which in practice obviates LOCALIO's need to
fallback to sending misaligned READs to NFSD.
- But the 5th patch that adds LOCALIO support to fallback to NFSD is a
useful backup mechanism (that will hopefully never be needed unless
some fs/nfs/direct.c bug gets introduced in the future). Patch 5
also provides refactoring that is useful.
Thanks,
Mike
Mike Snitzer (7):
nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()
nfs/localio: make trace_nfs_local_open_fh more useful
nfs/localio: add nfsd_file_dio_alignment
nfs/localio: refactor iocb initialization
nfs/localio: fallback to NFSD for misaligned O_DIRECT READs
nfs/direct: add misaligned READ handling
nfs/direct: add misaligned WRITE handling
fs/nfs/direct.c | 262 +++++++++++++++++++++++--
fs/nfs/flexfilelayout/flexfilelayout.c | 1 +
fs/nfs/internal.h | 17 +-
fs/nfs/localio.c | 231 ++++++++++++++--------
fs/nfs/nfstrace.h | 47 ++++-
fs/nfs/pagelist.c | 22 ++-
fs/nfsd/localio.c | 11 ++
include/linux/nfs_page.h | 1 +
include/linux/nfslocalio.h | 2 +
9 files changed, 485 insertions(+), 109 deletions(-)
--
2.44.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 1/7] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()
2025-07-22 2:49 [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO Mike Snitzer
@ 2025-07-22 2:49 ` Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 2/7] nfs/localio: make trace_nfs_local_open_fh more useful Mike Snitzer
` (6 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2025-07-22 2:49 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
From: Mike Snitzer <snitzer@hammerspace.com>
Previously nfs_local_probe() was made to disable and then attempt to
re-enable LOCALIO (via LOCALIO protocol handshake) if/when it was
called and LOCALIO already enabled.
Vague memory for _why_ this was the case is that this was useful
if/when a local NFS server were to be restarted with a local NFS
client connected to it.
But as it happens this causes an absurd amount of LOCALIO flapping
which has a side-effect of too much IO being needlessly sent to NFSD
(using RPC over the loopback network interface). This is the
definition of "serious performance loss" (that negates the point of
having LOCALIO).
So remove this mis-optimization for re-enabling LOCALIO if/when an NFS
server is restarted (which is an extremely rare thing to do). Will
revisit testing that scenario again but in the meantime this patch
restores the full benefit of LOCALIO.
Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: NeilBrown <neil@brown.name>
---
fs/nfs/localio.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 510d0a16cfe9..ecfe22a105ea 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -180,10 +180,8 @@ static void nfs_local_probe(struct nfs_client *clp)
return;
}
- if (nfs_client_is_local(clp)) {
- /* If already enabled, disable and re-enable */
- nfs_localio_disable_client(clp);
- }
+ if (nfs_client_is_local(clp))
+ return;
if (!nfs_uuid_begin(&clp->cl_uuid))
return;
@@ -244,7 +242,8 @@ __nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
case -ENOMEM:
case -ENXIO:
case -ENOENT:
- /* Revalidate localio, will disable if unsupported */
+ /* Revalidate localio */
+ nfs_localio_disable_client(clp);
nfs_local_probe(clp);
}
}
--
2.44.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 2/7] nfs/localio: make trace_nfs_local_open_fh more useful
2025-07-22 2:49 [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 1/7] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Mike Snitzer
@ 2025-07-22 2:49 ` Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 3/7] nfs/localio: add nfsd_file_dio_alignment Mike Snitzer
` (5 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2025-07-22 2:49 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
From: Mike Snitzer <snitzer@hammerspace.com>
Always trigger trace event when LOCALIO opens a file.
Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
fs/nfs/localio.c | 5 +++--
fs/nfs/nfstrace.h | 6 +++---
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index ecfe22a105ea..0b54f01299d2 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -231,13 +231,13 @@ __nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
struct nfsd_file __rcu **pnf,
const fmode_t mode)
{
+ int status = 0;
struct nfsd_file *localio;
localio = nfs_open_local_fh(&clp->cl_uuid, clp->cl_rpcclient,
cred, fh, nfl, pnf, mode);
if (IS_ERR(localio)) {
- int status = PTR_ERR(localio);
- trace_nfs_local_open_fh(fh, mode, status);
+ status = PTR_ERR(localio);
switch (status) {
case -ENOMEM:
case -ENXIO:
@@ -247,6 +247,7 @@ __nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
nfs_local_probe(clp);
}
}
+ trace_nfs_local_open_fh(fh, mode, status);
return localio;
}
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index f49f064c5ee5..feadaa6dee63 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -1708,10 +1708,10 @@ TRACE_EVENT(nfs_local_open_fh,
),
TP_printk(
- "error=%d fhandle=0x%08x mode=%s",
- __entry->error,
+ "fhandle=0x%08x mode=%s result=%d",
__entry->fhandle,
- show_fs_fmode_flags(__entry->fmode)
+ show_fs_fmode_flags(__entry->fmode),
+ __entry->error
)
);
--
2.44.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 3/7] nfs/localio: add nfsd_file_dio_alignment
2025-07-22 2:49 [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 1/7] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 2/7] nfs/localio: make trace_nfs_local_open_fh more useful Mike Snitzer
@ 2025-07-22 2:49 ` Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 4/7] nfs/localio: refactor iocb initialization Mike Snitzer
` (4 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2025-07-22 2:49 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
From: Mike Snitzer <snitzer@hammerspace.com>
And use it to avoid issuing misaligned IO using O_DIRECT.
Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
fs/nfs/localio.c | 26 ++++++++++++++++++++++----
fs/nfsd/localio.c | 11 +++++++++++
include/linux/nfslocalio.h | 2 ++
3 files changed, 35 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 0b54f01299d2..0c48db38f74f 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -322,12 +322,10 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
return NULL;
}
+ init_sync_kiocb(&iocb->kiocb, file);
if (localio_O_DIRECT_semantics &&
- test_bit(NFS_IOHDR_ODIRECT, &hdr->flags)) {
- iocb->kiocb.ki_filp = file;
+ test_bit(NFS_IOHDR_ODIRECT, &hdr->flags))
iocb->kiocb.ki_flags = IOCB_DIRECT;
- } else
- init_sync_kiocb(&iocb->kiocb, file);
iocb->kiocb.ki_pos = hdr->args.offset;
iocb->hdr = hdr;
@@ -346,6 +344,26 @@ nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
hdr->args.count + hdr->args.pgbase);
if (hdr->args.pgbase != 0)
iov_iter_advance(i, hdr->args.pgbase);
+
+ if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+ u32 nf_dio_mem_align, nf_dio_offset_align, nf_dio_read_offset_align;
+ /* Verify the IO is DIO-aligned as required */
+ nfs_to->nfsd_file_dio_alignment(iocb->localio, &nf_dio_mem_align,
+ &nf_dio_offset_align,
+ &nf_dio_read_offset_align);
+ if (dir == READ)
+ nf_dio_offset_align = nf_dio_read_offset_align;
+ /* direct I/O must be aligned to device logical sector size */
+ if (nf_dio_mem_align && nf_dio_offset_align &&
+ (((hdr->args.offset | hdr->args.count) & (nf_dio_offset_align-1)) == 0) &&
+ iov_iter_is_aligned(i, nf_dio_mem_align - 1,
+ nf_dio_offset_align - 1))
+ return 0;
+
+ /* Fallback to using buffered for this misaligned IO */
+ iocb->kiocb.ki_flags &= ~IOCB_DIRECT;
+ iocb->kiocb.ki_filp->f_flags &= ~O_DIRECT;
+ }
}
static void
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 269fa9391dc4..be710d809a3b 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -117,12 +117,23 @@ nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
return localio;
}
+static void nfsd_file_dio_alignment(struct nfsd_file *nf,
+ u32 *nf_dio_mem_align,
+ u32 *nf_dio_offset_align,
+ u32 *nf_dio_read_offset_align)
+{
+ *nf_dio_mem_align = nf->nf_dio_mem_align;
+ *nf_dio_offset_align = nf->nf_dio_offset_align;
+ *nf_dio_read_offset_align = nf->nf_dio_read_offset_align;
+}
+
static const struct nfsd_localio_operations nfsd_localio_ops = {
.nfsd_net_try_get = nfsd_net_try_get,
.nfsd_net_put = nfsd_net_put,
.nfsd_open_local_fh = nfsd_open_local_fh,
.nfsd_file_put_local = nfsd_file_put_local,
.nfsd_file_file = nfsd_file_file,
+ .nfsd_file_dio_alignment = nfsd_file_dio_alignment,
};
void nfsd_localio_ops_init(void)
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 59ea90bd136b..3d91043254e6 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -64,6 +64,8 @@ struct nfsd_localio_operations {
const fmode_t);
struct net *(*nfsd_file_put_local)(struct nfsd_file __rcu **);
struct file *(*nfsd_file_file)(struct nfsd_file *);
+ void (*nfsd_file_dio_alignment)(struct nfsd_file *,
+ u32 *, u32 *, u32 *);
} ____cacheline_aligned;
extern void nfsd_localio_ops_init(void);
--
2.44.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 4/7] nfs/localio: refactor iocb initialization
2025-07-22 2:49 [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO Mike Snitzer
` (2 preceding siblings ...)
2025-07-22 2:49 ` [PATCH v2 3/7] nfs/localio: add nfsd_file_dio_alignment Mike Snitzer
@ 2025-07-22 2:49 ` Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 5/7] nfs/localio: fallback to NFSD for misaligned O_DIRECT READs Mike Snitzer
` (3 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2025-07-22 2:49 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
From: Mike Snitzer <snitzer@hammerspace.com>
No functional change, but slighty cleaner.
Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
fs/nfs/localio.c | 56 ++++++++++++++++++++++--------------------------
1 file changed, 26 insertions(+), 30 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 0c48db38f74f..9ce242454c66 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -282,23 +282,6 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
}
EXPORT_SYMBOL_GPL(nfs_local_open_fh);
-static struct bio_vec *
-nfs_bvec_alloc_and_import_pagevec(struct page **pagevec,
- unsigned int npages, gfp_t flags)
-{
- struct bio_vec *bvec, *p;
-
- bvec = kmalloc_array(npages, sizeof(*bvec), flags);
- if (bvec != NULL) {
- for (p = bvec; npages > 0; p++, pagevec++, npages--) {
- p->bv_page = *pagevec;
- p->bv_len = PAGE_SIZE;
- p->bv_offset = 0;
- }
- }
- return bvec;
-}
-
static void
nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
{
@@ -315,8 +298,9 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
iocb = kmalloc(sizeof(*iocb), flags);
if (iocb == NULL)
return NULL;
- iocb->bvec = nfs_bvec_alloc_and_import_pagevec(hdr->page_array.pagevec,
- hdr->page_array.npages, flags);
+
+ iocb->bvec = kmalloc_array(hdr->page_array.npages,
+ sizeof(struct bio_vec), flags);
if (iocb->bvec == NULL) {
kfree(iocb);
return NULL;
@@ -339,8 +323,22 @@ static void
nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
{
struct nfs_pgio_header *hdr = iocb->hdr;
+ struct page **pagevec = hdr->page_array.pagevec;
+ unsigned long v, total;
+ size_t len;
- iov_iter_bvec(i, dir, iocb->bvec, hdr->page_array.npages,
+ v = 0;
+ total = hdr->args.count + hdr->args.pgbase;
+ while (total) {
+ len = min_t(size_t, total, PAGE_SIZE);
+ bvec_set_page(&iocb->bvec[v], *(pagevec++),
+ len, 0);
+ total -= len;
+ ++v;
+ }
+ WARN_ON_ONCE(v != hdr->page_array.npages);
+
+ iov_iter_bvec(i, dir, iocb->bvec, v,
hdr->args.count + hdr->args.pgbase);
if (hdr->args.pgbase != 0)
iov_iter_advance(i, hdr->args.pgbase);
@@ -469,6 +467,10 @@ static void nfs_local_call_read(struct work_struct *work)
save_cred = override_creds(filp->f_cred);
nfs_local_iter_init(&iter, iocb, READ);
+ if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+ iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
+ iocb->aio_complete_work = nfs_local_read_aio_complete_work;
+ }
status = filp->f_op->read_iter(&iocb->kiocb, &iter);
if (status != -EIOCBQUEUED) {
@@ -502,11 +504,6 @@ nfs_do_local_read(struct nfs_pgio_header *hdr,
nfs_local_pgio_init(hdr, call_ops);
hdr->res.eof = false;
- if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
- iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
- iocb->aio_complete_work = nfs_local_read_aio_complete_work;
- }
-
INIT_WORK(&iocb->work, nfs_local_call_read);
queue_work(nfslocaliod_workqueue, &iocb->work);
@@ -663,6 +660,10 @@ static void nfs_local_call_write(struct work_struct *work)
save_cred = override_creds(filp->f_cred);
nfs_local_iter_init(&iter, iocb, WRITE);
+ if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+ iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
+ iocb->aio_complete_work = nfs_local_write_aio_complete_work;
+ }
file_start_write(filp);
status = filp->f_op->write_iter(&iocb->kiocb, &iter);
@@ -712,11 +713,6 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
- if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
- iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
- iocb->aio_complete_work = nfs_local_write_aio_complete_work;
- }
-
INIT_WORK(&iocb->work, nfs_local_call_write);
queue_work(nfslocaliod_workqueue, &iocb->work);
--
2.44.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 5/7] nfs/localio: fallback to NFSD for misaligned O_DIRECT READs
2025-07-22 2:49 [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO Mike Snitzer
` (3 preceding siblings ...)
2025-07-22 2:49 ` [PATCH v2 4/7] nfs/localio: refactor iocb initialization Mike Snitzer
@ 2025-07-22 2:49 ` Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 6/7] nfs/direct: add misaligned READ handling Mike Snitzer
` (2 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2025-07-22 2:49 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
LOCALIO is left enabled, it just isn't used for any misaligned
O_DIRECT READ larger than 32K.
If LOCALIO determines that an O_DIRECT READ is misaligned then it
makes sense to immediately issue the READ remotely via NFSD (which has
the ability to expand a misaligned O_DIRECT READ to be DIO-aligned)
if/when NFSD is configured to use O_DIRECT for READ IO with:
echo 3 > /sys/kernel/debug/nfsd/io_cache_read
This change in behavior for LOCALIO's O_DIRECT support really should
be dependent on NFSD running with io_cache_read=3 but there isn't an
interface to check for that (currently). This fallback is sub-optimal
due to resorting to using RPC and will only serve as a last resort
if/when NFS client's O_DIRECT support isn't able to align misaligned
IO (support will be added in subsequent patches).
Add 'localio_O_DIRECT_align_misaligned_IO' modparm, which depends on
localio_O_DIRECT_semantics=Y, to control if LOCALIO will make best
effort to transform misaligned IO to DIO-aligned (e.g. expanding
misaligned READ to DIO-aligned).
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 1 +
fs/nfs/internal.h | 9 +-
fs/nfs/localio.c | 139 +++++++++++++++++--------
fs/nfs/pagelist.c | 15 ++-
4 files changed, 113 insertions(+), 51 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 4bea008dbebd..fbcf7c5ac118 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1911,6 +1911,7 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh, FMODE_READ);
if (localio) {
hdr->task.tk_start = ktime_get();
+ // FIXME: if fallback occurs is this stats start bogus?
ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
}
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 3a2ff094216c..16b0351a441a 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -463,13 +463,14 @@ extern struct nfsd_file *nfs_local_open_fh(struct nfs_client *,
struct nfs_file_localio *,
const fmode_t);
extern int nfs_local_doio(struct nfs_client *,
- struct nfsd_file *,
+ struct nfsd_file **,
struct nfs_pgio_header *,
const struct rpc_call_ops *);
extern int nfs_local_commit(struct nfsd_file *,
struct nfs_commit_data *,
const struct rpc_call_ops *, int);
extern bool nfs_server_is_local(const struct nfs_client *clp);
+extern bool nfs_localio_O_DIRECT_align_misaligned_IO(void);
#else /* CONFIG_NFS_LOCALIO */
static inline void nfs_local_probe(struct nfs_client *clp) {}
@@ -482,7 +483,7 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
return NULL;
}
static inline int nfs_local_doio(struct nfs_client *clp,
- struct nfsd_file *localio,
+ struct nfsd_file **localio,
struct nfs_pgio_header *hdr,
const struct rpc_call_ops *call_ops)
{
@@ -498,6 +499,10 @@ static inline bool nfs_server_is_local(const struct nfs_client *clp)
{
return false;
}
+static inline bool nfs_localio_O_DIRECT_align_misaligned_IO(void)
+{
+ return false;
+}
#endif /* CONFIG_NFS_LOCALIO */
/* super.c */
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 9ce242454c66..f61cfe42d745 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -36,6 +36,7 @@ struct nfs_local_kiocb {
struct nfs_pgio_header *hdr;
struct work_struct work;
void (*aio_complete_work)(struct work_struct *);
+ struct iov_iter iter ____cacheline_aligned;
struct nfsd_file *localio;
};
@@ -54,6 +55,12 @@ module_param(localio_O_DIRECT_semantics, bool, 0644);
MODULE_PARM_DESC(localio_O_DIRECT_semantics,
"LOCALIO will use O_DIRECT semantics to filesystem.");
+static bool localio_O_DIRECT_align_misaligned_IO __read_mostly = true;
+module_param(localio_O_DIRECT_align_misaligned_IO, bool, 0644);
+/* This feature also depends on: echo 2 > /sys/kernel/debug/nfsd/io_cache_read */
+MODULE_PARM_DESC(localio_O_DIRECT_align_misaligned_IO,
+ "If LOCALIO_O_DIRECT_semantics=Y make best effort to transform misaligned IO to DIO-aligned.");
+
static inline bool nfs_client_is_local(const struct nfs_client *clp)
{
return !!rcu_access_pointer(clp->cl_uuid.net);
@@ -65,6 +72,12 @@ bool nfs_server_is_local(const struct nfs_client *clp)
}
EXPORT_SYMBOL_GPL(nfs_server_is_local);
+bool nfs_localio_O_DIRECT_align_misaligned_IO(void)
+{
+ return localio_O_DIRECT_align_misaligned_IO;
+}
+EXPORT_SYMBOL_GPL(nfs_localio_O_DIRECT_align_misaligned_IO);
+
/*
* UUID_IS_LOCAL XDR functions
*/
@@ -319,8 +332,8 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
return iocb;
}
-static void
-nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
+static int
+nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int rw)
{
struct nfs_pgio_header *hdr = iocb->hdr;
struct page **pagevec = hdr->page_array.pagevec;
@@ -338,7 +351,7 @@ nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
}
WARN_ON_ONCE(v != hdr->page_array.npages);
- iov_iter_bvec(i, dir, iocb->bvec, v,
+ iov_iter_bvec(i, rw, iocb->bvec, v,
hdr->args.count + hdr->args.pgbase);
if (hdr->args.pgbase != 0)
iov_iter_advance(i, hdr->args.pgbase);
@@ -349,7 +362,7 @@ nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
nfs_to->nfsd_file_dio_alignment(iocb->localio, &nf_dio_mem_align,
&nf_dio_offset_align,
&nf_dio_read_offset_align);
- if (dir == READ)
+ if (rw == ITER_DEST)
nf_dio_offset_align = nf_dio_read_offset_align;
/* direct I/O must be aligned to device logical sector size */
if (nf_dio_mem_align && nf_dio_offset_align &&
@@ -358,10 +371,21 @@ nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
nf_dio_offset_align - 1))
return 0;
+ /* Only send misaligned READ to NFSD if 32K or larger */
+ if (localio_O_DIRECT_align_misaligned_IO &&
+ (rw == ITER_DEST) && (hdr->args.count >= (32 << 10))) {
+ /*
+ * Fallback to sending this READ to NFSD since it
+ * can expand misaligned READ IO to be DIO-aligned.
+ */
+ return -ENOSYS;
+ }
/* Fallback to using buffered for this misaligned IO */
iocb->kiocb.ki_flags &= ~IOCB_DIRECT;
iocb->kiocb.ki_filp->f_flags &= ~O_DIRECT;
}
+
+ return 0;
}
static void
@@ -394,13 +418,18 @@ nfs_local_pgio_done(struct nfs_pgio_header *hdr, long status)
}
}
-static void
-nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
+static void nfs_local_iocb_release(struct nfs_local_kiocb *iocb)
{
- struct nfs_pgio_header *hdr = iocb->hdr;
-
nfs_local_file_put(iocb->localio);
nfs_local_iocb_free(iocb);
+}
+
+static void
+nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+
+ nfs_local_iocb_release(iocb);
nfs_local_hdr_release(hdr, hdr->task.tk_ops);
}
@@ -461,18 +490,16 @@ static void nfs_local_call_read(struct work_struct *work)
container_of(work, struct nfs_local_kiocb, work);
struct file *filp = iocb->kiocb.ki_filp;
const struct cred *save_cred;
- struct iov_iter iter;
ssize_t status;
save_cred = override_creds(filp->f_cred);
- nfs_local_iter_init(&iter, iocb, READ);
if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
iocb->aio_complete_work = nfs_local_read_aio_complete_work;
}
- status = filp->f_op->read_iter(&iocb->kiocb, &iter);
+ status = filp->f_op->read_iter(&iocb->kiocb, &iocb->iter);
if (status != -EIOCBQUEUED) {
nfs_local_read_done(iocb, status);
nfs_local_pgio_release(iocb);
@@ -482,25 +509,14 @@ static void nfs_local_call_read(struct work_struct *work)
}
static int
-nfs_do_local_read(struct nfs_pgio_header *hdr,
- struct nfsd_file *localio,
+nfs_local_do_read(struct nfs_local_kiocb *iocb,
const struct rpc_call_ops *call_ops)
{
- struct nfs_local_kiocb *iocb;
- struct file *file = nfs_to->nfsd_file_file(localio);
-
- /* Don't support filesystems without read_iter */
- if (!file->f_op->read_iter)
- return -EAGAIN;
+ struct nfs_pgio_header *hdr = iocb->hdr;
dprintk("%s: vfs_read count=%u pos=%llu\n",
__func__, hdr->args.count, hdr->args.offset);
- iocb = nfs_local_iocb_alloc(hdr, file, GFP_KERNEL);
- if (iocb == NULL)
- return -ENOMEM;
- iocb->localio = localio;
-
nfs_local_pgio_init(hdr, call_ops);
hdr->res.eof = false;
@@ -653,20 +669,18 @@ static void nfs_local_call_write(struct work_struct *work)
struct file *filp = iocb->kiocb.ki_filp;
unsigned long old_flags = current->flags;
const struct cred *save_cred;
- struct iov_iter iter;
ssize_t status;
current->flags |= PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO;
save_cred = override_creds(filp->f_cred);
- nfs_local_iter_init(&iter, iocb, WRITE);
if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
iocb->aio_complete_work = nfs_local_write_aio_complete_work;
}
file_start_write(filp);
- status = filp->f_op->write_iter(&iocb->kiocb, &iter);
+ status = filp->f_op->write_iter(&iocb->kiocb, &iocb->iter);
file_end_write(filp);
if (status != -EIOCBQUEUED) {
nfs_local_write_done(iocb, status);
@@ -679,26 +693,15 @@ static void nfs_local_call_write(struct work_struct *work)
}
static int
-nfs_do_local_write(struct nfs_pgio_header *hdr,
- struct nfsd_file *localio,
+nfs_local_do_write(struct nfs_local_kiocb *iocb,
const struct rpc_call_ops *call_ops)
{
- struct nfs_local_kiocb *iocb;
- struct file *file = nfs_to->nfsd_file_file(localio);
-
- /* Don't support filesystems without write_iter */
- if (!file->f_op->write_iter)
- return -EAGAIN;
+ struct nfs_pgio_header *hdr = iocb->hdr;
dprintk("%s: vfs_write count=%u pos=%llu %s\n",
__func__, hdr->args.count, hdr->args.offset,
(hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");
- iocb = nfs_local_iocb_alloc(hdr, file, GFP_NOIO);
- if (iocb == NULL)
- return -ENOMEM;
- iocb->localio = localio;
-
switch (hdr->args.stable) {
default:
break;
@@ -719,32 +722,78 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
return 0;
}
-int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio,
+static struct nfs_local_kiocb *
+nfs_local_iocb_init(struct nfs_pgio_header *hdr, struct nfsd_file **localio)
+{
+ struct file *file = nfs_to->nfsd_file_file(*localio);
+ struct nfs_local_kiocb *iocb;
+ gfp_t gfp_mask;
+ int rw, status;
+
+ if (hdr->rw_mode & FMODE_READ) {
+ if (!file->f_op->read_iter)
+ return ERR_PTR(-EOPNOTSUPP);
+ gfp_mask = GFP_KERNEL;
+ rw = ITER_DEST;
+ } else {
+ if (!file->f_op->write_iter)
+ return ERR_PTR(-EOPNOTSUPP);
+ gfp_mask = GFP_NOIO;
+ rw = ITER_SOURCE;
+ }
+
+ iocb = nfs_local_iocb_alloc(hdr, file, gfp_mask);
+ if (iocb == NULL)
+ return ERR_PTR(-ENOMEM);
+ iocb->hdr = hdr;
+ iocb->localio = *localio;
+
+ status = nfs_local_iter_init(&iocb->iter, iocb, rw);
+ if (status == -ENOSYS) {
+ /* close nfsd_file and clear localio,
+ * this informs callers that IO should
+ * be serviced remotely.
+ */
+ nfs_local_iocb_release(iocb);
+ *localio = NULL;
+ return ERR_PTR(status);
+ }
+ WARN_ON_ONCE(status != 0);
+
+ return iocb;
+}
+
+int nfs_local_doio(struct nfs_client *clp, struct nfsd_file **localio,
struct nfs_pgio_header *hdr,
const struct rpc_call_ops *call_ops)
{
+ struct nfs_local_kiocb *iocb;
int status = 0;
if (!hdr->args.count)
return 0;
+ iocb = nfs_local_iocb_init(hdr, localio);
+ if (IS_ERR(iocb))
+ return PTR_ERR(iocb);
+
switch (hdr->rw_mode) {
case FMODE_READ:
- status = nfs_do_local_read(hdr, localio, call_ops);
+ status = nfs_local_do_read(iocb, call_ops);
break;
case FMODE_WRITE:
- status = nfs_do_local_write(hdr, localio, call_ops);
+ status = nfs_local_do_write(iocb, call_ops);
break;
default:
dprintk("%s: invalid mode: %d\n", __func__,
hdr->rw_mode);
- status = -EINVAL;
+ status = -EOPNOTSUPP;
}
if (status != 0) {
if (status == -EAGAIN)
nfs_localio_disable_client(clp);
- nfs_local_file_put(localio);
+ nfs_local_iocb_release(iocb);
hdr->task.tk_status = status;
nfs_local_hdr_release(hdr, call_ops);
}
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 11968dcb7243..9ddff27e96e9 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -762,9 +762,17 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
hdr->args.count,
(unsigned long long)hdr->args.offset);
- if (localio)
- return nfs_local_doio(NFS_SERVER(hdr->inode)->nfs_client,
- localio, hdr, call_ops);
+ if (localio) {
+ int status = nfs_local_doio(NFS_SERVER(hdr->inode)->nfs_client,
+ &localio, hdr, call_ops);
+ /* nfs_local_doio() will clear localio and return -ENOSYS if
+ * it is prudent to immediately service this IO remotely.
+ */
+ if (status != -ENOSYS)
+ return status;
+ WARN_ON_ONCE(localio != NULL);
+ /* fallthrough */
+ }
task = rpc_run_task(&task_setup_data);
if (IS_ERR(task))
@@ -959,7 +967,6 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
ret = nfs_generic_pgio(desc, hdr);
if (ret == 0) {
struct nfs_client *clp = NFS_SERVER(hdr->inode)->nfs_client;
-
struct nfsd_file *localio =
nfs_local_open_fh(clp, hdr->cred, hdr->args.fh,
&hdr->args.context->nfl,
--
2.44.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 6/7] nfs/direct: add misaligned READ handling
2025-07-22 2:49 [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO Mike Snitzer
` (4 preceding siblings ...)
2025-07-22 2:49 ` [PATCH v2 5/7] nfs/localio: fallback to NFSD for misaligned O_DIRECT READs Mike Snitzer
@ 2025-07-22 2:49 ` Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 7/7] nfs/direct: add misaligned WRITE handling Mike Snitzer
2025-07-23 18:40 ` [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO Mike Snitzer
7 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2025-07-22 2:49 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
Because the NFS client will already happily handle misaligned O_DIRECT
IO (by sending it out to NFSD via RPC) this commit's new capabilities
are for the benefit of LOCALIO and require the nfs modparam:
localio_O_DIRECT_align_misaligned_IO=Y
When enabled, misaligned READ IO is expanded to consist of a
DIO-aligned extent followed by a single misaligned tail page (due to
it being a partial page).
Also add an nfs_analyze_dio trace event that shows how the NFS client
split a given misaligned IO into a mix of misaligned page(s) and a
DIO-aligned extent.
This combination of trace events is useful for LOCALIO READs:
echo 1 > /sys/kernel/tracing/events/nfs/nfs_analyze_dio/enable
echo 1 > /sys/kernel/tracing/events/nfs/nfs_initiate_read/enable
echo 1 > /sys/kernel/tracing/events/nfs/nfs_readpage_done/enable
echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_read/enable
Which for this dd command:
dd if=/mnt/share1/test of=/dev/null bs=47008 count=2 iflag=direct
Results in:
dd-63258 [002] ..... 83742.428577: nfs_analyze_dio: READ offset=0 len=47008 start=0+0 middle=0+45056 end=45056+1952
dd-63258 [002] ..... 83742.428591: nfs_initiate_read: fileid=00:2e:219750 fhandle=0xf6927a01 offset=0 count=45056
kworker/u193:3-62985 [011] ..... 83742.428594: xfs_file_direct_read: dev 259:22 ino 0x5e0000a3 disize 0x16f40 pos 0x0 bytecount 0xb000
dd-63258 [002] ..... 83742.428595: nfs_initiate_read: fileid=00:2e:219750 fhandle=0xf6927a01 offset=45056 count=1952
kworker/u193:4-63221 [004] ..... 83742.428598: nfs_readpage_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=45056 count=1952 res=1952
kworker/u193:4-63221 [004] ..... 83742.428613: nfs_readpage_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=0 count=45056 res=45056
dd-63258 [002] ..... 83742.428619: nfs_analyze_dio: READ offset=47008 len=47008 start=45056+1952 middle=47008+43104 end=90112+3904
dd-63258 [002] ..... 83742.428622: nfs_initiate_read: fileid=00:2e:219750 fhandle=0xf6927a01 offset=45056 count=45056
dd-63258 [002] ..... 83742.428624: nfs_initiate_read: fileid=00:2e:219750 fhandle=0xf6927a01 offset=90112 count=3904
kworker/u193:4-63221 [004] ..... 83742.428624: xfs_file_direct_read: dev 259:22 ino 0x5e0000a3 disize 0x16f40 pos 0xb000 bytecount 0xb000
kworker/u193:3-62985 [011] ..... 83742.428628: nfs_readpage_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=90112 count=3904 res=3904 eof
kworker/u193:3-62985 [011] ..... 83742.428642: nfs_readpage_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=45056 count=45056 res=45056
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/direct.c | 173 ++++++++++++++++++++++++++++++++++++---
fs/nfs/internal.h | 7 ++
fs/nfs/nfstrace.h | 41 ++++++++++
fs/nfs/pagelist.c | 7 ++
include/linux/nfs_page.h | 1 +
5 files changed, 218 insertions(+), 11 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 48d89716193a..ba644daa5e14 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -210,6 +210,13 @@ static void nfs_direct_req_free(struct kref *kref)
nfs_put_lock_context(dreq->l_ctx);
if (dreq->ctx != NULL)
put_nfs_open_context(dreq->ctx);
+
+ if (dreq->start_extra_bvec != NULL) {
+ if (dreq->start_extra_bvec->bv_page != NULL)
+ __free_page(dreq->start_extra_bvec->bv_page);
+ kfree(dreq->start_extra_bvec);
+ }
+
kmem_cache_free(nfs_direct_cachep, dreq);
}
@@ -264,6 +271,10 @@ static void nfs_direct_complete(struct nfs_direct_req *dreq)
if (dreq->count != 0) {
res = (long) dreq->count;
WARN_ON_ONCE(dreq->count < 0);
+ /* Reduce res by front_pad */
+ if ((dreq->start_extra_bvec != NULL) &&
+ res >= dreq->start_extra_bvec->bv_len)
+ res -= dreq->start_extra_bvec->bv_len;
}
dreq->iocb->ki_complete(dreq->iocb, res);
}
@@ -285,6 +296,15 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
}
nfs_direct_count_bytes(dreq, hdr);
+
+ if (dreq->start_extra_bvec != NULL && (dreq->count == dreq->max_count)) {
+ unsigned front_pad = dreq->start_extra_bvec->bv_len;
+
+ hdr->res.count -= front_pad;
+ hdr->good_bytes -= front_pad;
+ hdr->args.count -= front_pad;
+ hdr->args.offset += front_pad;
+ }
spin_unlock(&dreq->lock);
nfs_update_delegated_atime(dreq->inode);
@@ -353,6 +373,30 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
desc.pg_dreq = dreq;
inode_dio_begin(inode);
+ if (dreq->start_extra_bvec != NULL) {
+ struct nfs_page *req;
+ size_t pgbase = dreq->start_extra_bvec->bv_offset;
+ unsigned int front_pad = dreq->start_extra_bvec->bv_len;
+
+ /* Must force start pos to DIO-aligned start */
+ WARN_ON(pos != dreq->io_start);
+ req = nfs_page_create_from_page(dreq->ctx,
+ dreq->start_extra_bvec->bv_page,
+ pgbase, pos, front_pad);
+ if (IS_ERR(req)) {
+ result = PTR_ERR(req);
+ goto out;
+ }
+ if (!nfs_pageio_add_request(&desc, req)) {
+ result = desc.pg_error;
+ nfs_release_request(req);
+ goto out;
+ }
+
+ requested_bytes += front_pad;
+ pos += front_pad;
+ }
+
while (iov_iter_count(iter)) {
struct page **pagevec;
size_t bytes;
@@ -363,12 +407,19 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
rsize, &pgbase);
if (result < 0)
break;
-
- bytes = result;
- npages = (result + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
+
+ /* Limit the first batch of pages to DIO-aligned boundary? */
+ if (pos < dreq->end_offset && dreq->middle_len)
+ bytes = min_t(size_t, dreq->middle_len, result);
+ else
+ bytes = result;
+ npages = (bytes + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
+
for (i = 0; i < npages; i++) {
struct nfs_page *req;
unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
+ bool issue_dio_now = false;
+
/* XXX do we need to do the eof zeroing found in async_filler? */
req = nfs_page_create_from_page(dreq->ctx, pagevec[i],
pgbase, pos, req_len);
@@ -376,15 +427,33 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
result = PTR_ERR(req);
break;
}
+
+ pgbase = 0;
+ result -= req_len;
+ bytes -= req_len;
+ requested_bytes += req_len;
+ pos += req_len;
+
+ /* Looking ahead, is this req the end of the DIO-aligned middle? */
+ if (bytes == 0 && dreq->end_len &&
+ pos == dreq->end_offset && result == dreq->end_len) {
+ desc.pg_doio_now = 1;
+ issue_dio_now = true;
+ /* Reset iter to the last page (known misaligned),
+ * issue previous DIO-aligned page and then handle
+ * the last partial page stored in iter
+ */
+ iov_iter_revert(iter, result);
+ }
+
if (!nfs_pageio_add_request(&desc, req)) {
result = desc.pg_error;
nfs_release_request(req);
break;
}
- pgbase = 0;
- bytes -= req_len;
- requested_bytes += req_len;
- pos += req_len;
+
+ if (issue_dio_now)
+ break;
}
nfs_direct_release_pages(pagevec, npages);
kvfree(pagevec);
@@ -398,6 +467,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
* If no bytes were started, return the error, and let the
* generic layer handle the completion.
*/
+out:
if (requested_bytes == 0) {
inode_dio_end(inode);
nfs_direct_req_release(dreq);
@@ -409,6 +479,65 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
return requested_bytes;
}
+/*
+ * If localio_O_DIRECT_align_misaligned_READ enabled, expand any
+ * misaligned READ to include the previous DIO-aligned block.
+ * - FIXME: expanding the end to also be DIO-aligned requires a
+ * bounce page that must be copied to original partial end page.
+ */
+static bool nfs_analyze_read_dio(loff_t offset, __u32 len,
+ struct nfs_direct_req *dreq)
+{
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ /* Hardcoded to PAGE_SIZE (since don't have LOCALIO nfsd_file's
+ * dio_alignment), works for smaller alignment too (e.g. 512b).
+ */
+ u32 dio_blocksize = PAGE_SIZE;
+ loff_t start, front_pad, orig_end, middle_end;
+
+ if (!nfs_localio_O_DIRECT_align_misaligned_IO())
+ return false;
+
+ start = round_down(offset, dio_blocksize);
+ front_pad = offset - start;
+ orig_end = offset + len;
+ middle_end = round_down(orig_end, dio_blocksize);
+
+ if (front_pad) {
+ gfp_t gfp_mask = nfs_io_gfp_mask();
+
+ dreq->start_extra_bvec = kmalloc(sizeof(struct bio_vec), gfp_mask);
+ if (dreq->start_extra_bvec == NULL)
+ return false;
+ dreq->start_extra_bvec->bv_page = alloc_page(gfp_mask);
+ if (dreq->start_extra_bvec->bv_page == NULL) {
+ kfree(dreq->start_extra_bvec);
+ dreq->start_extra_bvec = NULL;
+ return false;
+ }
+
+ bvec_set_page(dreq->start_extra_bvec,
+ dreq->start_extra_bvec->bv_page,
+ front_pad, PAGE_SIZE - front_pad);
+ }
+
+ dreq->middle_offset = offset;
+ dreq->middle_len = middle_end - offset;
+ dreq->end_offset = middle_end;
+ dreq->end_len = orig_end - middle_end;
+
+ dreq->io_start = start;
+ dreq->max_count = orig_end - start;
+
+ trace_nfs_analyze_dio(READ, offset, len, start, front_pad,
+ dreq->middle_offset, dreq->middle_len,
+ dreq->end_offset, dreq->end_len);
+ return true;
+#else
+ return false;
+#endif
+}
+
/**
* nfs_file_direct_read - file direct read operation for NFS files
* @iocb: target I/O control block
@@ -439,6 +568,9 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
struct nfs_lock_context *l_ctx;
ssize_t result, requested;
size_t count = iov_iter_count(iter);
+ size_t in_count = count;
+ unsigned int front_pad = 0;
+
nfs_add_stats(mapping->host, NFSIOS_DIRECTREADBYTES, count);
dfprintk(FILE, "NFS: direct read(%pD2, %zd@%Ld)\n",
@@ -455,9 +587,20 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
if (dreq == NULL)
goto out;
+ if (!swap && nfs_analyze_read_dio(iocb->ki_pos, count, dreq)) {
+ /* note that dreq values do include front_pad
+ * (dreq->io_start -> dreq->start_extra_bvec->bv_offset)
+ */
+ iocb->ki_pos = dreq->io_start;
+ count = dreq->max_count;
+ if (dreq->start_extra_bvec)
+ front_pad = dreq->start_extra_bvec->bv_len;
+ } else {
+ dreq->io_start = iocb->ki_pos;
+ dreq->max_count = count;
+ }
+
dreq->inode = inode;
- dreq->max_count = count;
- dreq->io_start = iocb->ki_pos;
dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
l_ctx = nfs_get_lock_context(dreq->ctx);
if (IS_ERR(l_ctx)) {
@@ -483,16 +626,24 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
}
}
- NFS_I(inode)->read_io += count;
+ NFS_I(inode)->read_io += in_count;
requested = nfs_direct_read_schedule_iovec(dreq, iter, iocb->ki_pos);
if (!swap)
nfs_end_io_direct(inode);
if (requested > 0) {
+ if (front_pad) {
+ /* given the iov_iter_revert below, must exclude the
+ * front_pad (dreq->start_extra_bvec) from requested,
+ */
+ requested -= front_pad;
+ }
+
result = nfs_direct_wait(dreq);
if (result > 0) {
- requested -= result;
+ if (front_pad && result >= front_pad)
+ result -= front_pad;
iocb->ki_pos += result;
}
iov_iter_revert(iter, requested);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 16b0351a441a..6ab9d345518a 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -990,4 +990,11 @@ struct nfs_direct_req {
/* for read */
#define NFS_ODIRECT_SHOULD_DIRTY (3) /* dirty user-space page after read */
#define NFS_ODIRECT_DONE INT_MAX /* write verification failed */
+
+ /* State for expanding misaligned IO to be DIO-aligned (for LOCALIO) */
+ struct bio_vec * start_extra_bvec;
+ loff_t middle_offset; /* Offset for start of DIO-aligned middle */
+ loff_t end_offset; /* Offset for start of DIO-aligned end */
+ ssize_t middle_len; /* Length for DIO-aligned middle */
+ ssize_t end_len; /* Length for misaligned last page */
};
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index feadaa6dee63..ceeca867d174 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -1594,6 +1594,47 @@ DEFINE_NFS_DIRECT_REQ_EVENT(nfs_direct_write_completion);
DEFINE_NFS_DIRECT_REQ_EVENT(nfs_direct_write_schedule_iovec);
DEFINE_NFS_DIRECT_REQ_EVENT(nfs_direct_write_reschedule_io);
+TRACE_EVENT(nfs_analyze_dio,
+ TP_PROTO(u32 rw,
+ u64 offset,
+ u32 len,
+ loff_t start,
+ loff_t start_extra,
+ loff_t middle,
+ loff_t middle_len,
+ loff_t end,
+ loff_t end_len),
+ TP_ARGS(rw, offset, len, start, start_extra, middle, middle_len, end, end_len),
+ TP_STRUCT__entry(
+ __field(u32, rw)
+ __field(u64, offset)
+ __field(u32, len)
+ __field(loff_t, start)
+ __field(loff_t, start_extra)
+ __field(loff_t, middle)
+ __field(loff_t, middle_len)
+ __field(loff_t, end)
+ __field(loff_t, end_len)
+ ),
+ TP_fast_assign(
+ __entry->rw = rw;
+ __entry->offset = offset;
+ __entry->len = len;
+ __entry->start = start;
+ __entry->start_extra = start_extra;
+ __entry->middle = middle;
+ __entry->middle_len = middle_len;
+ __entry->end = end;
+ __entry->end_len = end_len;
+ ),
+ TP_printk("%s offset=%llu len=%u start=%llu+%llu middle=%llu+%llu end=%llu+%llu",
+ __entry->rw ? "WRITE" : "READ",
+ __entry->offset, __entry->len,
+ __entry->start, __entry->start_extra,
+ __entry->middle, __entry->middle_len,
+ __entry->end, __entry->end_len)
+);
+
TRACE_EVENT(nfs_fh_to_dentry,
TP_PROTO(
const struct super_block *sb,
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 9ddff27e96e9..8d877360042d 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -832,6 +832,7 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
int io_flags)
{
desc->pg_moreio = 0;
+ desc->pg_doio_now = 0;
desc->pg_inode = inode;
desc->pg_ops = pg_ops;
desc->pg_completion_ops = compl_ops;
@@ -1141,6 +1142,8 @@ nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
return size;
nfs_list_move_request(req, &mirror->pg_list);
mirror->pg_count += req->wb_bytes;
+ if (desc->pg_doio_now)
+ return 0; /* trigger nfs_pageio_doio() in caller */
return req->wb_bytes;
}
@@ -1220,6 +1223,10 @@ static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
nfs_pageio_doio(desc);
if (desc->pg_error < 0 || mirror->pg_recoalesce)
return 0;
+ if (desc->pg_doio_now) {
+ desc->pg_doio_now = 0;
+ return 1;
+ }
/* retry add_request for this subreq */
nfs_page_group_lock(req);
continue;
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 1a017b5b476f..5d4d4e5c8025 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -118,6 +118,7 @@ struct nfs_pageio_descriptor {
u32 pg_mirror_idx; /* current mirror */
unsigned short pg_maxretrans;
unsigned char pg_moreio : 1;
+ unsigned char pg_doio_now : 1;
};
/* arbitrarily selected limit to number of mirrors */
--
2.44.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 7/7] nfs/direct: add misaligned WRITE handling
2025-07-22 2:49 [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO Mike Snitzer
` (5 preceding siblings ...)
2025-07-22 2:49 ` [PATCH v2 6/7] nfs/direct: add misaligned READ handling Mike Snitzer
@ 2025-07-22 2:49 ` Mike Snitzer
2025-07-23 18:40 ` [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO Mike Snitzer
7 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2025-07-22 2:49 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs
Because the NFS client will already happily handle misaligned O_DIRECT
IO (by sending it out to NFSD via RPC) this commit's new capabilities
are for the benefit of LOCALIO and require the nfs modparam:
localio_O_DIRECT_align_misaligned_IO=Y
When enabled, misaligned WRITE IO is split into a start, middle and
end as needed. The large middle extent is DIO-aligned and the start
and/or end are misaligned (due to each being a partial page).
Like the READ support that came before this WRITE support, the
nfs_analyze_dio trace event shows how the NFS client split a given
misaligned IO into a mix of misaligned page(s) and a DIO-aligned
extent.
This combination of trace events is useful for LOCALIO WRITEs:
echo 1 > /sys/kernel/tracing/events/nfs/nfs_analyze_dio/enable
echo 1 > /sys/kernel/tracing/events/nfs/nfs_initiate_write/enable
echo 1 > /sys/kernel/tracing/events/nfs/nfs_writeback_done/enable
echo 1 > /sys/kernel/tracing/events/xfs/xfs_file_direct_write/enable
Which for this dd command:
dd if=/dev/zero of=/mnt/share1/test bs=47008 count=2 oflag=direct
Results in:
dd-63257 [001] ..... 83742.427650: nfs_analyze_dio: WRITE offset=0 len=47008 start=0+0 middle=0+45056 end=45056+1952
dd-63257 [001] ..... 83742.427659: nfs_initiate_write: fileid=00:2e:219750 fhandle=0xf6927a01 offset=0 count=45056 stable=UNSTABLE
dd-63257 [001] ..... 83742.427662: nfs_initiate_write: fileid=00:2e:219750 fhandle=0xf6927a01 offset=45056 count=1952 stable=UNSTABLE
kworker/u193:3-62985 [011] ..... 83742.427664: xfs_file_direct_write: dev 259:22 ino 0x5e0000a3 disize 0x0 pos 0x0 bytecount 0xb000
kworker/u193:3-62985 [011] ..... 83742.427695: nfs_writeback_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=0 count=45056 res=45056 stable=UNSTABLE verifier=a8b37e6803d1eb1e
kworker/u193:4-63221 [004] ..... 83742.427699: nfs_writeback_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=45056 count=1952 res=1952 stable=UNSTABLE verifier=a8b37e6803d1eb1e
dd-63257 [001] ..... 83742.427755: nfs_analyze_dio: WRITE offset=47008 len=47008 start=47008+2144 middle=49152+40960 end=90112+3904
dd-63257 [001] ..... 83742.427758: nfs_initiate_write: fileid=00:2e:219750 fhandle=0xf6927a01 offset=47008 count=2144 stable=UNSTABLE
dd-63257 [001] ..... 83742.427760: nfs_initiate_write: fileid=00:2e:219750 fhandle=0xf6927a01 offset=49152 count=40960 stable=UNSTABLE
kworker/u193:4-63221 [004] ..... 83742.427761: nfs_writeback_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=47008 count=2144 res=2144 stable=UNSTABLE verifier=a8b37e6803d1eb1e
dd-63257 [001] ..... 83742.427763: nfs_initiate_write: fileid=00:2e:219750 fhandle=0xf6927a01 offset=90112 count=3904 stable=UNSTABLE
kworker/u193:4-63221 [004] ..... 83742.427763: xfs_file_direct_write: dev 259:22 ino 0x5e0000a3 disize 0xb7a0 pos 0xc000 bytecount 0xa000
kworker/u193:4-63221 [004] ..... 83742.427783: nfs_writeback_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=49152 count=40960 res=40960 stable=UNSTABLE verifier=a8b37e6803d1eb1e
kworker/u193:3-62985 [011] ..... 83742.427788: nfs_writeback_done: error=0 fileid=00:2e:219750 fhandle=0xf6927a01 offset=90112 count=3904 res=3904 stable=UNSTABLE verifier=a8b37e6803d1eb1e
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/direct.c | 89 ++++++++++++++++++++++++++++++++++++++++++++---
fs/nfs/internal.h | 1 +
2 files changed, 85 insertions(+), 5 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index ba644daa5e14..a4f6827072c5 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -1043,11 +1043,19 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
if (result < 0)
break;
- bytes = result;
- npages = (result + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
+ /* Limit the amount of bytes serviced each iteration to aligned batches */
+ if (pos < dreq->middle_offset && dreq->start_len)
+ bytes = min_t(size_t, dreq->start_len, result);
+ else if (pos < dreq->end_offset && dreq->middle_len)
+ bytes = min_t(size_t, dreq->middle_len, result);
+ else
+ bytes = result;
+ npages = (bytes + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
+
for (i = 0; i < npages; i++) {
struct nfs_page *req;
unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
+ bool issue_dio_now = false;
req = nfs_page_create_from_page(dreq->ctx, pagevec[i],
pgbase, pos, req_len);
@@ -1063,6 +1071,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
}
pgbase = 0;
+ result -= req_len;
bytes -= req_len;
requested_bytes += req_len;
pos += req_len;
@@ -1072,9 +1081,27 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
continue;
}
+ /* Looking ahead, is this req the end of the start or middle? */
+ if (bytes == 0) {
+ if ((dreq->start_len &&
+ pos == dreq->middle_offset && result >= dreq->middle_len) ||
+ (dreq->end_len &&
+ pos == dreq->end_offset && result == dreq->end_len)) {
+ desc.pg_doio_now = 1;
+ issue_dio_now = true;
+ /* Reset iter to the last boundary, isse the current
+ * req and then handle iter to next boundary or end.
+ */
+ iov_iter_revert(iter, result);
+ }
+ }
+
nfs_lock_request(req);
- if (nfs_pageio_add_request(&desc, req))
+ if (nfs_pageio_add_request(&desc, req)) {
+ if (issue_dio_now)
+ break;
continue;
+ }
/* Exit on hard errors */
if (desc.pg_error < 0 && desc.pg_error != -EAGAIN) {
@@ -1115,6 +1142,55 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
return requested_bytes;
}
+/*
+ * If localio_O_DIRECT_align_misaligned_WRITE enabled, split misaligned
+ * WRITE to a DIO-aligned middle and misaligned head and/or tail.
+ */
+static bool nfs_analyze_write_dio(loff_t offset, __u32 len,
+ struct nfs_direct_req *dreq)
+{
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ /* Hardcoded to PAGE_SIZE (since don't have LOCALIO nfsd_file's
+ * dio_alignment), works for smaller alignment too (e.g. 512b).
+ */
+ u32 dio_blocksize = PAGE_SIZE;
+ loff_t start_end, orig_end, middle_end;
+ ssize_t start_len = len;
+ bool aligned = true;
+
+ if (!nfs_localio_O_DIRECT_align_misaligned_IO())
+ return false;
+
+ if (unlikely(len < dio_blocksize)) {
+ dreq->io_start = offset;
+ dreq->max_count = len;
+ aligned = false;
+ goto out;
+ }
+
+ start_end = round_up(offset, dio_blocksize);
+ start_len = start_end - offset;
+ orig_end = offset + len;
+ middle_end = round_down(orig_end, dio_blocksize);
+
+ dreq->io_start = offset;
+ dreq->max_count = orig_end - offset;
+
+ dreq->start_len = start_len;
+ dreq->middle_offset = start_end;
+ dreq->middle_len = middle_end - start_end;
+ dreq->end_offset = middle_end;
+ dreq->end_len = orig_end - middle_end;
+out:
+ trace_nfs_analyze_dio(WRITE, offset, len, offset, start_len,
+ dreq->middle_offset, dreq->middle_len,
+ dreq->end_offset, dreq->end_len);
+ return aligned;
+#else
+ return false;
+#endif
+}
+
/**
* nfs_file_direct_write - file direct write operation for NFS files
* @iocb: target I/O control block
@@ -1171,9 +1247,12 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter,
if (!dreq)
goto out;
+ if (swap || !nfs_analyze_write_dio(pos, count, dreq)) {
+ dreq->max_count = count;
+ dreq->io_start = pos;
+ }
+
dreq->inode = inode;
- dreq->max_count = count;
- dreq->io_start = pos;
dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
l_ctx = nfs_get_lock_context(dreq->ctx);
if (IS_ERR(l_ctx)) {
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 6ab9d345518a..d07dc3e6316d 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -995,6 +995,7 @@ struct nfs_direct_req {
struct bio_vec * start_extra_bvec;
loff_t middle_offset; /* Offset for start of DIO-aligned middle */
loff_t end_offset; /* Offset for start of DIO-aligned end */
+ ssize_t start_len; /* Length for misaligned first page */
ssize_t middle_len; /* Length for DIO-aligned middle */
ssize_t end_len; /* Length for misaligned last page */
};
--
2.44.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO
2025-07-22 2:49 [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO Mike Snitzer
` (6 preceding siblings ...)
2025-07-22 2:49 ` [PATCH v2 7/7] nfs/direct: add misaligned WRITE handling Mike Snitzer
@ 2025-07-23 18:40 ` Mike Snitzer
2025-07-23 18:42 ` Chuck Lever
7 siblings, 1 reply; 14+ messages in thread
From: Mike Snitzer @ 2025-07-23 18:40 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, Chuck Lever, Jeff Layton
On Mon, Jul 21, 2025 at 10:49:17PM -0400, Mike Snitzer wrote:
> Hi,
>
> This "NFS DIRECT" series depends on the "NFSD DIRECT" series here:
> https://lore.kernel.org/linux-nfs/20250714224216.14329-1-snitzer@kernel.org/
> (for the benefit of nfsd_file_dio_alignment patch in this series)
>
> The first patch was posted as part of a LOCALIO revert series I posted
> a week or so ago, thankfully that series isn't needed thanks to Trond
> and Neil's efforts. BUT the first patch is needed, has Reviewed-by
> from Jeff and Neil and is marked for stable@.
>
> The biggest change in v2 is the introduction of O_DIRECT misaligned
> READ and WRITE handling for the benefit of LOCALIO. Please see patches
> 6 and 7 for more details.
Turns out that these NFS client (fs/nfs/direct.c) changes that focused
on benefiting LOCALIO actually also benefit NFSD if/when it is
configured to use O_DIRECT [0].
Such that the NFS client's O_DIRECT code will split any misaligned
WRITE IO and NFSD will then happily issue the DIO-aligned "middle" of
a given misaligned WRITE IO down to the underlying filesystem.
Same goes for READ, NFS client will expand the front of any misaligned
READ such that the bulk of the IO becomes DIO-aligned (only the final
partial tail page is misaligned).
So with the NFS client changes in this series we actually don't _need_
NFSD to have the same type of misaligned IO analysis and handling to
expand READs or split WRITEs to be DIO-aligned. Which reduces work
that NFSD needs to do by leaning on the NFS client having the
capability.
Which means that we _could_ drop the more complicated NFSD misaligned
READ change (last patch [1]) from the v4 NFSD DIRECT patchset I just
posted [2]. And just do the basic NFSD enablement for DIRECT and
DONTCACHE (so we'd still need the v4 NFSD patchest's patches 1-4).
Thanks,
Mike
[0]: https://lore.kernel.org/linux-nfs/20250723154351.59042-1-snitzer@kernel.org/
[1]: https://lore.kernel.org/linux-nfs/20250723154351.59042-6-snitzer@kernel.org/
[2]: https://lore.kernel.org/linux-nfs/20250723154351.59042-1-snitzer@kernel.org/
>
> Changes since v1:
> - renamed nfs modparam from localio_O_DIRECT_align_misaligned_READ to
> localio_O_DIRECT_align_misaligned_IO (is used for misaligned READ
> and WRITE support in fs/nfs/direct.c)
> - added misaligned O_DIRECT handling for both READ and WRITE to
> fs/nfs/direct.c which in practice obviates LOCALIO's need to
> fallback to sending misaligned READs to NFSD.
> - But the 5th patch that adds LOCALIO support to fallback to NFSD is a
> useful backup mechanism (that will hopefully never be needed unless
> some fs/nfs/direct.c bug gets introduced in the future). Patch 5
> also provides refactoring that is useful.
>
> Thanks,
> Mike
>
> Mike Snitzer (7):
> nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()
> nfs/localio: make trace_nfs_local_open_fh more useful
> nfs/localio: add nfsd_file_dio_alignment
> nfs/localio: refactor iocb initialization
> nfs/localio: fallback to NFSD for misaligned O_DIRECT READs
> nfs/direct: add misaligned READ handling
> nfs/direct: add misaligned WRITE handling
>
> fs/nfs/direct.c | 262 +++++++++++++++++++++++--
> fs/nfs/flexfilelayout/flexfilelayout.c | 1 +
> fs/nfs/internal.h | 17 +-
> fs/nfs/localio.c | 231 ++++++++++++++--------
> fs/nfs/nfstrace.h | 47 ++++-
> fs/nfs/pagelist.c | 22 ++-
> fs/nfsd/localio.c | 11 ++
> include/linux/nfs_page.h | 1 +
> include/linux/nfslocalio.h | 2 +
> 9 files changed, 485 insertions(+), 109 deletions(-)
>
> --
> 2.44.0
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO
2025-07-23 18:40 ` [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO Mike Snitzer
@ 2025-07-23 18:42 ` Chuck Lever
2025-07-23 23:53 ` Mike Snitzer
0 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever @ 2025-07-23 18:42 UTC (permalink / raw)
To: Mike Snitzer, Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, Jeff Layton
On 7/23/25 2:40 PM, Mike Snitzer wrote:
> On Mon, Jul 21, 2025 at 10:49:17PM -0400, Mike Snitzer wrote:
>> Hi,
>>
>> This "NFS DIRECT" series depends on the "NFSD DIRECT" series here:
>> https://lore.kernel.org/linux-nfs/20250714224216.14329-1-snitzer@kernel.org/
>> (for the benefit of nfsd_file_dio_alignment patch in this series)
>>
>> The first patch was posted as part of a LOCALIO revert series I posted
>> a week or so ago, thankfully that series isn't needed thanks to Trond
>> and Neil's efforts. BUT the first patch is needed, has Reviewed-by
>> from Jeff and Neil and is marked for stable@.
>>
>> The biggest change in v2 is the introduction of O_DIRECT misaligned
>> READ and WRITE handling for the benefit of LOCALIO. Please see patches
>> 6 and 7 for more details.
>
> Turns out that these NFS client (fs/nfs/direct.c) changes that focused
> on benefiting LOCALIO actually also benefit NFSD if/when it is
> configured to use O_DIRECT [0].
>
> Such that the NFS client's O_DIRECT code will split any misaligned
> WRITE IO and NFSD will then happily issue the DIO-aligned "middle" of
> a given misaligned WRITE IO down to the underlying filesystem.
>
> Same goes for READ, NFS client will expand the front of any misaligned
> READ such that the bulk of the IO becomes DIO-aligned (only the final
> partial tail page is misaligned).
>
> So with the NFS client changes in this series we actually don't _need_
> NFSD to have the same type of misaligned IO analysis and handling to
> expand READs or split WRITEs to be DIO-aligned. Which reduces work
> that NFSD needs to do by leaning on the NFS client having the
> capability.
I'm not quite following. Does that apply to non-Linux NFS clients too?
> Which means that we _could_ drop the more complicated NFSD misaligned
> READ change (last patch [1]) from the v4 NFSD DIRECT patchset I just
> posted [2]. And just do the basic NFSD enablement for DIRECT and
> DONTCACHE (so we'd still need the v4 NFSD patchest's patches 1-4).
>
> Thanks,
> Mike
>
> [0]: https://lore.kernel.org/linux-nfs/20250723154351.59042-1-snitzer@kernel.org/
> [1]: https://lore.kernel.org/linux-nfs/20250723154351.59042-6-snitzer@kernel.org/
> [2]: https://lore.kernel.org/linux-nfs/20250723154351.59042-1-snitzer@kernel.org/
>
>>
>> Changes since v1:
>> - renamed nfs modparam from localio_O_DIRECT_align_misaligned_READ to
>> localio_O_DIRECT_align_misaligned_IO (is used for misaligned READ
>> and WRITE support in fs/nfs/direct.c)
>> - added misaligned O_DIRECT handling for both READ and WRITE to
>> fs/nfs/direct.c which in practice obviates LOCALIO's need to
>> fallback to sending misaligned READs to NFSD.
>> - But the 5th patch that adds LOCALIO support to fallback to NFSD is a
>> useful backup mechanism (that will hopefully never be needed unless
>> some fs/nfs/direct.c bug gets introduced in the future). Patch 5
>> also provides refactoring that is useful.
>>
>> Thanks,
>> Mike
>>
>> Mike Snitzer (7):
>> nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()
>> nfs/localio: make trace_nfs_local_open_fh more useful
>> nfs/localio: add nfsd_file_dio_alignment
>> nfs/localio: refactor iocb initialization
>> nfs/localio: fallback to NFSD for misaligned O_DIRECT READs
>> nfs/direct: add misaligned READ handling
>> nfs/direct: add misaligned WRITE handling
>>
>> fs/nfs/direct.c | 262 +++++++++++++++++++++++--
>> fs/nfs/flexfilelayout/flexfilelayout.c | 1 +
>> fs/nfs/internal.h | 17 +-
>> fs/nfs/localio.c | 231 ++++++++++++++--------
>> fs/nfs/nfstrace.h | 47 ++++-
>> fs/nfs/pagelist.c | 22 ++-
>> fs/nfsd/localio.c | 11 ++
>> include/linux/nfs_page.h | 1 +
>> include/linux/nfslocalio.h | 2 +
>> 9 files changed, 485 insertions(+), 109 deletions(-)
>>
>> --
>> 2.44.0
>>
>>
--
Chuck Lever
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO
2025-07-23 18:42 ` Chuck Lever
@ 2025-07-23 23:53 ` Mike Snitzer
2025-07-23 23:58 ` Mike Snitzer
2025-07-24 13:28 ` Chuck Lever
0 siblings, 2 replies; 14+ messages in thread
From: Mike Snitzer @ 2025-07-23 23:53 UTC (permalink / raw)
To: Chuck Lever; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs, Jeff Layton
On Wed, Jul 23, 2025 at 02:42:56PM -0400, Chuck Lever wrote:
> On 7/23/25 2:40 PM, Mike Snitzer wrote:
> > On Mon, Jul 21, 2025 at 10:49:17PM -0400, Mike Snitzer wrote:
> >> Hi,
> >>
> >> This "NFS DIRECT" series depends on the "NFSD DIRECT" series here:
> >> https://lore.kernel.org/linux-nfs/20250714224216.14329-1-snitzer@kernel.org/
> >> (for the benefit of nfsd_file_dio_alignment patch in this series)
> >>
> >> The first patch was posted as part of a LOCALIO revert series I posted
> >> a week or so ago, thankfully that series isn't needed thanks to Trond
> >> and Neil's efforts. BUT the first patch is needed, has Reviewed-by
> >> from Jeff and Neil and is marked for stable@.
> >>
> >> The biggest change in v2 is the introduction of O_DIRECT misaligned
> >> READ and WRITE handling for the benefit of LOCALIO. Please see patches
> >> 6 and 7 for more details.
> >
> > Turns out that these NFS client (fs/nfs/direct.c) changes that focused
> > on benefiting LOCALIO actually also benefit NFSD if/when it is
> > configured to use O_DIRECT [0].
> >
> > Such that the NFS client's O_DIRECT code will split any misaligned
> > WRITE IO and NFSD will then happily issue the DIO-aligned "middle" of
> > a given misaligned WRITE IO down to the underlying filesystem.
> >
> > Same goes for READ, NFS client will expand the front of any misaligned
> > READ such that the bulk of the IO becomes DIO-aligned (only the final
> > partial tail page is misaligned).
> >
> > So with the NFS client changes in this series we actually don't _need_
> > NFSD to have the same type of misaligned IO analysis and handling to
> > expand READs or split WRITEs to be DIO-aligned. Which reduces work
> > that NFSD needs to do by leaning on the NFS client having the
> > capability.
>
> I'm not quite following. Does that apply to non-Linux NFS clients too?
No, old Linux clients without these changes or non-Linux clients
wouldn't have the capabilities offered (extending READs, splitting
WRITEs to be DIO-aligned). The question is: do we care? Long-term:
probably.
I'd be fine with the NFSD DIRECT's misaligned IO support (READ already
implemented/posted [0], WRITE partially implemented but not posted) to
be land upstream so that we have the benefit of making the most of
NFSD's O_DIRECT support even if the client doesn't take steps to issue
IO that is DIO-aligned.
Whichever way we go, it is potentially convenient that the less
"involved" NFSD DIRECT patch 5 [0] could be withheld initially. Let
the NFS client do the lifting for us assessing how well NFSD's
O_DIRECT works and yet have confidence that we can deploy support for
old Linux clients or non-Linux clients in the future by merging that
patch 5 (and NFSD misaligned WRITE support comparable to NFS's WRITE
splitting in this series [1]) in a secondary phase.
Good to have options.
Mike
[0]: https://lore.kernel.org/linux-nfs/20250723154351.59042-6-snitzer@kernel.org/
[1]: https://lore.kernel.org/linux-nfs/20250722024924.49877-8-snitzer@kernel.org/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO
2025-07-23 23:53 ` Mike Snitzer
@ 2025-07-23 23:58 ` Mike Snitzer
2025-07-24 13:28 ` Chuck Lever
1 sibling, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2025-07-23 23:58 UTC (permalink / raw)
To: Chuck Lever; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs, Jeff Layton
On Wed, Jul 23, 2025 at 07:53:04PM -0400, Mike Snitzer wrote:
> On Wed, Jul 23, 2025 at 02:42:56PM -0400, Chuck Lever wrote:
> > On 7/23/25 2:40 PM, Mike Snitzer wrote:
> > > On Mon, Jul 21, 2025 at 10:49:17PM -0400, Mike Snitzer wrote:
> > >> Hi,
> > >>
> > >> This "NFS DIRECT" series depends on the "NFSD DIRECT" series here:
> > >> https://lore.kernel.org/linux-nfs/20250714224216.14329-1-snitzer@kernel.org/
> > >> (for the benefit of nfsd_file_dio_alignment patch in this series)
> > >>
> > >> The first patch was posted as part of a LOCALIO revert series I posted
> > >> a week or so ago, thankfully that series isn't needed thanks to Trond
> > >> and Neil's efforts. BUT the first patch is needed, has Reviewed-by
> > >> from Jeff and Neil and is marked for stable@.
> > >>
> > >> The biggest change in v2 is the introduction of O_DIRECT misaligned
> > >> READ and WRITE handling for the benefit of LOCALIO. Please see patches
> > >> 6 and 7 for more details.
> > >
> > > Turns out that these NFS client (fs/nfs/direct.c) changes that focused
> > > on benefiting LOCALIO actually also benefit NFSD if/when it is
> > > configured to use O_DIRECT [0].
> > >
> > > Such that the NFS client's O_DIRECT code will split any misaligned
> > > WRITE IO and NFSD will then happily issue the DIO-aligned "middle" of
> > > a given misaligned WRITE IO down to the underlying filesystem.
> > >
> > > Same goes for READ, NFS client will expand the front of any misaligned
> > > READ such that the bulk of the IO becomes DIO-aligned (only the final
> > > partial tail page is misaligned).
> > >
> > > So with the NFS client changes in this series we actually don't _need_
> > > NFSD to have the same type of misaligned IO analysis and handling to
> > > expand READs or split WRITEs to be DIO-aligned. Which reduces work
> > > that NFSD needs to do by leaning on the NFS client having the
> > > capability.
> >
> > I'm not quite following. Does that apply to non-Linux NFS clients too?
>
> No, old Linux clients without these changes or non-Linux clients
> wouldn't have the capabilities offered (extending READs, splitting
> WRITEs to be DIO-aligned). The question is: do we care? Long-term:
> probably.
>
> I'd be fine with the NFSD DIRECT's misaligned IO support (READ already
> implemented/posted [0], WRITE partially implemented but not posted) to
> be land upstream so that we have the benefit of making the most of
> NFSD's O_DIRECT support even if the client doesn't take steps to issue
> IO that is DIO-aligned.
>
> Whichever way we go, it is potentially convenient that the less
> "involved" NFSD DIRECT patch 5 [0] could be withheld initially.
I meant to say: the _more_ involved NFSD DIRECT patch 5 ;)
(that said, that really patch isn't so bad... but I do recall you
hoping it cleaner, and that it'd need refinement in the future).
> [0]: https://lore.kernel.org/linux-nfs/20250723154351.59042-6-snitzer@kernel.org/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO
2025-07-23 23:53 ` Mike Snitzer
2025-07-23 23:58 ` Mike Snitzer
@ 2025-07-24 13:28 ` Chuck Lever
2025-07-24 19:39 ` Mike Snitzer
1 sibling, 1 reply; 14+ messages in thread
From: Chuck Lever @ 2025-07-24 13:28 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs, Jeff Layton
On 7/23/25 7:53 PM, Mike Snitzer wrote:
> On Wed, Jul 23, 2025 at 02:42:56PM -0400, Chuck Lever wrote:
>> On 7/23/25 2:40 PM, Mike Snitzer wrote:
>>> On Mon, Jul 21, 2025 at 10:49:17PM -0400, Mike Snitzer wrote:
>>>> Hi,
>>>>
>>>> This "NFS DIRECT" series depends on the "NFSD DIRECT" series here:
>>>> https://lore.kernel.org/linux-nfs/20250714224216.14329-1-snitzer@kernel.org/
>>>> (for the benefit of nfsd_file_dio_alignment patch in this series)
>>>>
>>>> The first patch was posted as part of a LOCALIO revert series I posted
>>>> a week or so ago, thankfully that series isn't needed thanks to Trond
>>>> and Neil's efforts. BUT the first patch is needed, has Reviewed-by
>>>> from Jeff and Neil and is marked for stable@.
>>>>
>>>> The biggest change in v2 is the introduction of O_DIRECT misaligned
>>>> READ and WRITE handling for the benefit of LOCALIO. Please see patches
>>>> 6 and 7 for more details.
>>>
>>> Turns out that these NFS client (fs/nfs/direct.c) changes that focused
>>> on benefiting LOCALIO actually also benefit NFSD if/when it is
>>> configured to use O_DIRECT [0].
>>>
>>> Such that the NFS client's O_DIRECT code will split any misaligned
>>> WRITE IO and NFSD will then happily issue the DIO-aligned "middle" of
>>> a given misaligned WRITE IO down to the underlying filesystem.
>>>
>>> Same goes for READ, NFS client will expand the front of any misaligned
>>> READ such that the bulk of the IO becomes DIO-aligned (only the final
>>> partial tail page is misaligned).
>>>
>>> So with the NFS client changes in this series we actually don't _need_
>>> NFSD to have the same type of misaligned IO analysis and handling to
>>> expand READs or split WRITEs to be DIO-aligned. Which reduces work
>>> that NFSD needs to do by leaning on the NFS client having the
>>> capability.
>>
>> I'm not quite following. Does that apply to non-Linux NFS clients too?
>
> No, old Linux clients without these changes or non-Linux clients
> wouldn't have the capabilities offered (extending READs, splitting
> WRITEs to be DIO-aligned). The question is: do we care? Long-term:
> probably.
It sounds like we can't rely on clients to align the payload for NFSD.
So I'd say "we care".
Maybe NFSD could recognize when the content is already properly aligned
and take a shorter code path? I'm not familiar enough with your patches
yet to make a guess.
> I'd be fine with the NFSD DIRECT's misaligned IO support (READ already
> implemented/posted [0], WRITE partially implemented but not posted) to
> be land upstream so that we have the benefit of making the most of
> NFSD's O_DIRECT support even if the client doesn't take steps to issue
> IO that is DIO-aligned.
>
> Whichever way we go, it is potentially convenient that the less
> "involved" NFSD DIRECT patch 5 [0] could be withheld initially. Let
> the NFS client do the lifting for us assessing how well NFSD's
> O_DIRECT works and yet have confidence that we can deploy support for
> old Linux clients or non-Linux clients in the future by merging that
> patch 5 (and NFSD misaligned WRITE support comparable to NFS's WRITE
> splitting in this series [1]) in a secondary phase.
>
> Good to have options.
>
> Mike
>
> [0]: https://lore.kernel.org/linux-nfs/20250723154351.59042-6-snitzer@kernel.org/
> [1]: https://lore.kernel.org/linux-nfs/20250722024924.49877-8-snitzer@kernel.org/
--
Chuck Lever
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO
2025-07-24 13:28 ` Chuck Lever
@ 2025-07-24 19:39 ` Mike Snitzer
0 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2025-07-24 19:39 UTC (permalink / raw)
To: Chuck Lever; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs, Jeff Layton
On Thu, Jul 24, 2025 at 09:28:39AM -0400, Chuck Lever wrote:
> On 7/23/25 7:53 PM, Mike Snitzer wrote:
> > On Wed, Jul 23, 2025 at 02:42:56PM -0400, Chuck Lever wrote:
> >> On 7/23/25 2:40 PM, Mike Snitzer wrote:
> >>> On Mon, Jul 21, 2025 at 10:49:17PM -0400, Mike Snitzer wrote:
> >>>> Hi,
> >>>>
> >>>> This "NFS DIRECT" series depends on the "NFSD DIRECT" series here:
> >>>> https://lore.kernel.org/linux-nfs/20250714224216.14329-1-snitzer@kernel.org/
> >>>> (for the benefit of nfsd_file_dio_alignment patch in this series)
> >>>>
> >>>> The first patch was posted as part of a LOCALIO revert series I posted
> >>>> a week or so ago, thankfully that series isn't needed thanks to Trond
> >>>> and Neil's efforts. BUT the first patch is needed, has Reviewed-by
> >>>> from Jeff and Neil and is marked for stable@.
> >>>>
> >>>> The biggest change in v2 is the introduction of O_DIRECT misaligned
> >>>> READ and WRITE handling for the benefit of LOCALIO. Please see patches
> >>>> 6 and 7 for more details.
> >>>
> >>> Turns out that these NFS client (fs/nfs/direct.c) changes that focused
> >>> on benefiting LOCALIO actually also benefit NFSD if/when it is
> >>> configured to use O_DIRECT [0].
> >>>
> >>> Such that the NFS client's O_DIRECT code will split any misaligned
> >>> WRITE IO and NFSD will then happily issue the DIO-aligned "middle" of
> >>> a given misaligned WRITE IO down to the underlying filesystem.
> >>>
> >>> Same goes for READ, NFS client will expand the front of any misaligned
> >>> READ such that the bulk of the IO becomes DIO-aligned (only the final
> >>> partial tail page is misaligned).
> >>>
> >>> So with the NFS client changes in this series we actually don't _need_
> >>> NFSD to have the same type of misaligned IO analysis and handling to
> >>> expand READs or split WRITEs to be DIO-aligned. Which reduces work
> >>> that NFSD needs to do by leaning on the NFS client having the
> >>> capability.
> >>
> >> I'm not quite following. Does that apply to non-Linux NFS clients too?
> >
> > No, old Linux clients without these changes or non-Linux clients
> > wouldn't have the capabilities offered (extending READs, splitting
> > WRITEs to be DIO-aligned). The question is: do we care? Long-term:
> > probably.
>
> It sounds like we can't rely on clients to align the payload for NFSD.
> So I'd say "we care".
Not old or non-Linux clients, no...
> Maybe NFSD could recognize when the content is already properly aligned
> and take a shorter code path? I'm not familiar enough with your patches
> yet to make a guess.
Its pretty well optimized as-is, yes. So that isn't a major concern.
I've just posted a rolled up v5 that keeps the NFSD capability, thanks
for your guidance.
Mike
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-07-24 19:39 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-22 2:49 [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 1/7] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 2/7] nfs/localio: make trace_nfs_local_open_fh more useful Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 3/7] nfs/localio: add nfsd_file_dio_alignment Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 4/7] nfs/localio: refactor iocb initialization Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 5/7] nfs/localio: fallback to NFSD for misaligned O_DIRECT READs Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 6/7] nfs/direct: add misaligned READ handling Mike Snitzer
2025-07-22 2:49 ` [PATCH v2 7/7] nfs/direct: add misaligned WRITE handling Mike Snitzer
2025-07-23 18:40 ` [PATCH v2 0/7] NFS DIRECT: handle misaligned READ and WRITE for LOCALIO Mike Snitzer
2025-07-23 18:42 ` Chuck Lever
2025-07-23 23:53 ` Mike Snitzer
2025-07-23 23:58 ` Mike Snitzer
2025-07-24 13:28 ` Chuck Lever
2025-07-24 19:39 ` Mike Snitzer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).