Linux NFS development
 help / color / mirror / Atom feed
* [RFC PATCH 0/6] NFS: LOCALIO improvements and support for misaligned O_DIRECT READs
@ 2025-07-08 16:20 Mike Snitzer
  2025-07-08 16:20 ` [RFC PATCH 1/6] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Mike Snitzer
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Mike Snitzer @ 2025-07-08 16:20 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, snitzer

Hi,

This patchset benefits from this NFSD patchset:
https://lore.kernel.org/linux-nfs/20250708160619.64800-1-snitzer@kernel.org/
(particularly due to patch 6 leaning heavily on NFSD's ability to
expand misaligned O_DIRECT READS to be DIO-aligned).

First 3 patches are general LOCALIO improvements.
Patches 4 - 6 added dio_alignment awareness to LOCALIO and make it
possible for LOCALIO to punt IO over to NFSD (via loopback network) so
that it can take advantage of NFSD's io_cache_read=2 to handle
misaligned O_DIRECT READs so that they are issued as DIO-aligned.

Thanks,
Mike

Mike Snitzer (6):
  nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()
  nfs/localio: add localio_async_probe modparm
  nfs/localio: make trace_nfs_local_open_fh more useful
  nfs/localio: add nfsd_file_dio_alignment
  nfs/localio: refactor iocb initialization
  nfs/localio: fallback to NFSD for misaligned O_DIRECT READs

 fs/nfs/flexfilelayout/flexfilelayout.c |   1 +
 fs/nfs/internal.h                      |   4 +-
 fs/nfs/localio.c                       | 232 ++++++++++++++++---------
 fs/nfs/nfstrace.h                      |   6 +-
 fs/nfs/pagelist.c                      |  15 +-
 fs/nfsd/localio.c                      |  11 ++
 include/linux/nfslocalio.h             |   2 +
 7 files changed, 178 insertions(+), 93 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [RFC PATCH 1/6] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()
  2025-07-08 16:20 [RFC PATCH 0/6] NFS: LOCALIO improvements and support for misaligned O_DIRECT READs Mike Snitzer
@ 2025-07-08 16:20 ` Mike Snitzer
  2025-07-08 16:20 ` [RFC PATCH 2/6] nfs/localio: add localio_async_probe modparm Mike Snitzer
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Mike Snitzer @ 2025-07-08 16:20 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, snitzer, Mike Snitzer

From: Mike Snitzer <snitzer@hammerspace.com>

Previously nfs_local_probe() was made to disable and then attempt to
re-enable LOCALIO (via LOCALIO protocol handshake) if/when it was
called and LOCALIO already enabled.

Vague memory for _why_ this was the case is that this was useful
if/when a local NFS server were to be restarted with a local NFS
client connected to it.

But as it happens this causes an absurd amount of LOCALIO flapping
which has a side-effect of too much IO being needlessly sent to NFSD
(using RPC over the loopback network interface).  This is the
definition of "serious performance loss" (that negates the point of
having LOCALIO).

So remove this mis-optimization for re-enabling LOCALIO if/when an NFS
server is restarted (which is an extremely rare thing to do).  Will
revisit testing that scenario again but in the meantime this patch
restores the full benefit of LOCALIO.

Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
 fs/nfs/localio.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index ef12dd279539..b728b9305a0f 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -180,10 +180,8 @@ static void nfs_local_probe(struct nfs_client *clp)
 		return;
 	}
 
-	if (nfs_client_is_local(clp)) {
-		/* If already enabled, disable and re-enable */
-		nfs_localio_disable_client(clp);
-	}
+	if (nfs_client_is_local(clp))
+		return;
 
 	if (!nfs_uuid_begin(&clp->cl_uuid))
 		return;
@@ -237,7 +235,8 @@ __nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
 		case -ENOMEM:
 		case -ENXIO:
 		case -ENOENT:
-			/* Revalidate localio, will disable if unsupported */
+			/* Revalidate localio */
+			nfs_localio_disable_client(clp);
 			nfs_local_probe(clp);
 		}
 	}
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 2/6] nfs/localio: add localio_async_probe modparm
  2025-07-08 16:20 [RFC PATCH 0/6] NFS: LOCALIO improvements and support for misaligned O_DIRECT READs Mike Snitzer
  2025-07-08 16:20 ` [RFC PATCH 1/6] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Mike Snitzer
@ 2025-07-08 16:20 ` Mike Snitzer
  2025-07-08 16:20 ` [RFC PATCH 3/6] nfs/localio: make trace_nfs_local_open_fh more useful Mike Snitzer
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Mike Snitzer @ 2025-07-08 16:20 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, snitzer, Mike Snitzer

From: Mike Snitzer <snitzer@hammerspace.com>

This knob influences the LOCALIO handshake so that it happens
synchronously upon NFS client creation.. which reduces the window of
opportunity for a bunch of IO to flood page cache and send out over to
NFSD before LOCALIO handshake negotiates that the client and server
are local.  The knob is:
  echo N > /sys/module/nfs/parameters/localio_async_probe

Fixes: 1ff4716f420b ("NFS: always probe for LOCALIO support asynchronously")
Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
 fs/nfs/localio.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index b728b9305a0f..c05dc8a09653 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -49,6 +49,11 @@ struct nfs_local_fsync_ctx {
 static bool localio_enabled __read_mostly = true;
 module_param(localio_enabled, bool, 0644);
 
+static bool localio_async_probe __read_mostly = true;
+module_param(localio_async_probe, bool, 0644);
+MODULE_PARM_DESC(localio_async_probe,
+		 "Probe for LOCALIO support asynchronously.");
+
 static bool localio_O_DIRECT_semantics __read_mostly = false;
 module_param(localio_O_DIRECT_semantics, bool, 0644);
 MODULE_PARM_DESC(localio_O_DIRECT_semantics,
@@ -203,7 +208,10 @@ void nfs_local_probe_async_work(struct work_struct *work)
 
 void nfs_local_probe_async(struct nfs_client *clp)
 {
-	queue_work(nfsiod_workqueue, &clp->cl_local_probe_work);
+	if (likely(localio_async_probe))
+		queue_work(nfsiod_workqueue, &clp->cl_local_probe_work);
+	else
+		nfs_local_probe(clp);
 }
 EXPORT_SYMBOL_GPL(nfs_local_probe_async);
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 3/6] nfs/localio: make trace_nfs_local_open_fh more useful
  2025-07-08 16:20 [RFC PATCH 0/6] NFS: LOCALIO improvements and support for misaligned O_DIRECT READs Mike Snitzer
  2025-07-08 16:20 ` [RFC PATCH 1/6] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Mike Snitzer
  2025-07-08 16:20 ` [RFC PATCH 2/6] nfs/localio: add localio_async_probe modparm Mike Snitzer
@ 2025-07-08 16:20 ` Mike Snitzer
  2025-07-08 16:20 ` [RFC PATCH 4/6] nfs/localio: add nfsd_file_dio_alignment Mike Snitzer
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Mike Snitzer @ 2025-07-08 16:20 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, snitzer, Mike Snitzer

From: Mike Snitzer <snitzer@hammerspace.com>

Always trigger trace event when LOCALIO opens a file.

Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
 fs/nfs/localio.c  | 5 +++--
 fs/nfs/nfstrace.h | 6 +++---
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index c05dc8a09653..67de26392c4a 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -232,13 +232,13 @@ __nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
 		    struct nfsd_file __rcu **pnf,
 		    const fmode_t mode)
 {
+	int status = 0;
 	struct nfsd_file *localio;
 
 	localio = nfs_open_local_fh(&clp->cl_uuid, clp->cl_rpcclient,
 				    cred, fh, nfl, pnf, mode);
 	if (IS_ERR(localio)) {
-		int status = PTR_ERR(localio);
-		trace_nfs_local_open_fh(fh, mode, status);
+		status = PTR_ERR(localio);
 		switch (status) {
 		case -ENOMEM:
 		case -ENXIO:
@@ -248,6 +248,7 @@ __nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
 			nfs_local_probe(clp);
 		}
 	}
+	trace_nfs_local_open_fh(fh, mode, status);
 	return localio;
 }
 
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index f49f064c5ee5..feadaa6dee63 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -1708,10 +1708,10 @@ TRACE_EVENT(nfs_local_open_fh,
 		),
 
 		TP_printk(
-			"error=%d fhandle=0x%08x mode=%s",
-			__entry->error,
+			"fhandle=0x%08x mode=%s result=%d",
 			__entry->fhandle,
-			show_fs_fmode_flags(__entry->fmode)
+			show_fs_fmode_flags(__entry->fmode),
+			__entry->error
 		)
 );
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 4/6] nfs/localio: add nfsd_file_dio_alignment
  2025-07-08 16:20 [RFC PATCH 0/6] NFS: LOCALIO improvements and support for misaligned O_DIRECT READs Mike Snitzer
                   ` (2 preceding siblings ...)
  2025-07-08 16:20 ` [RFC PATCH 3/6] nfs/localio: make trace_nfs_local_open_fh more useful Mike Snitzer
@ 2025-07-08 16:20 ` Mike Snitzer
  2025-07-08 16:20 ` [RFC PATCH 5/6] nfs/localio: refactor iocb initialization Mike Snitzer
  2025-07-08 16:20 ` [RFC PATCH 6/6] nfs/localio: fallback to NFSD for misaligned O_DIRECT READs Mike Snitzer
  5 siblings, 0 replies; 9+ messages in thread
From: Mike Snitzer @ 2025-07-08 16:20 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, snitzer, Mike Snitzer

From: Mike Snitzer <snitzer@hammerspace.com>

And use it to avoid issuing misaligned IO using O_DIRECT.

Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
 fs/nfs/localio.c           | 34 ++++++++++++++++++++++++++++++----
 fs/nfsd/localio.c          | 11 +++++++++++
 include/linux/nfslocalio.h |  2 ++
 3 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 67de26392c4a..e551690e3f6b 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -37,6 +37,10 @@ struct nfs_local_kiocb {
 	struct work_struct	work;
 	void (*aio_complete_work)(struct work_struct *);
 	struct nfsd_file	*localio;
+	/* local copy of nfsd_file's dio alignment attrs */
+	u32			nf_dio_mem_align;
+	u32			nf_dio_offset_align;
+	u32			nf_dio_read_offset_align;
 };
 
 struct nfs_local_fsync_ctx {
@@ -323,12 +327,10 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
 		return NULL;
 	}
 
+	init_sync_kiocb(&iocb->kiocb, file);
 	if (localio_O_DIRECT_semantics &&
-	    test_bit(NFS_IOHDR_ODIRECT, &hdr->flags)) {
-		iocb->kiocb.ki_filp = file;
+	    test_bit(NFS_IOHDR_ODIRECT, &hdr->flags))
 		iocb->kiocb.ki_flags = IOCB_DIRECT;
-	} else
-		init_sync_kiocb(&iocb->kiocb, file);
 
 	iocb->kiocb.ki_pos = hdr->args.offset;
 	iocb->hdr = hdr;
@@ -342,11 +344,29 @@ static void
 nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
 {
 	struct nfs_pgio_header *hdr = iocb->hdr;
+	u32 nf_dio_mem_align, nf_dio_offset_align;
+
+	if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+		nf_dio_mem_align = iocb->nf_dio_mem_align;
+		if (dir == READ)
+			nf_dio_offset_align = iocb->nf_dio_read_offset_align;
+		else
+			nf_dio_offset_align = iocb->nf_dio_offset_align;
+	}
 
 	iov_iter_bvec(i, dir, iocb->bvec, hdr->page_array.npages,
 		      hdr->args.count + hdr->args.pgbase);
 	if (hdr->args.pgbase != 0)
 		iov_iter_advance(i, hdr->args.pgbase);
+	/* Verify the IO is DIO-aligned as needed */
+	if (iocb->kiocb.ki_flags & IOCB_DIRECT &&
+	    !iov_iter_is_aligned(i, nf_dio_mem_align - 1,
+				 nf_dio_offset_align - 1)) {
+		/* Fallback to buffered IO */
+		iocb->kiocb.ki_flags &= ~IOCB_DIRECT;
+		iocb->kiocb.ki_complete = NULL;
+		iocb->aio_complete_work = NULL;
+	}
 }
 
 static void
@@ -481,6 +501,9 @@ nfs_do_local_read(struct nfs_pgio_header *hdr,
 	if (iocb == NULL)
 		return -ENOMEM;
 	iocb->localio = localio;
+	nfs_to->nfsd_file_dio_alignment(localio, &iocb->nf_dio_mem_align,
+					&iocb->nf_dio_offset_align,
+					&iocb->nf_dio_read_offset_align);
 
 	nfs_local_pgio_init(hdr, call_ops);
 	hdr->res.eof = false;
@@ -680,6 +703,9 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
 	if (iocb == NULL)
 		return -ENOMEM;
 	iocb->localio = localio;
+	nfs_to->nfsd_file_dio_alignment(localio, &iocb->nf_dio_mem_align,
+					&iocb->nf_dio_offset_align,
+					&iocb->nf_dio_read_offset_align);
 
 	switch (hdr->args.stable) {
 	default:
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 4f6468eb2adf..3eb1997289fe 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -116,6 +116,16 @@ nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
 	return localio;
 }
 
+static void nfsd_file_dio_alignment(struct nfsd_file *nf,
+				    u32 *nf_dio_mem_align,
+				    u32 *nf_dio_offset_align,
+				    u32 *nf_dio_read_offset_align)
+{
+	*nf_dio_mem_align = nf->nf_dio_mem_align;
+	*nf_dio_offset_align = nf->nf_dio_offset_align;
+	*nf_dio_read_offset_align = nf->nf_dio_read_offset_align;
+}
+
 static const struct nfsd_localio_operations nfsd_localio_ops = {
 	.nfsd_net_try_get  = nfsd_net_try_get,
 	.nfsd_net_put  = nfsd_net_put,
@@ -123,6 +133,7 @@ static const struct nfsd_localio_operations nfsd_localio_ops = {
 	.nfsd_file_put_local = nfsd_file_put_local,
 	.nfsd_file_get_local = nfsd_file_get_local,
 	.nfsd_file_file = nfsd_file_file,
+	.nfsd_file_dio_alignment = nfsd_file_dio_alignment,
 };
 
 void nfsd_localio_ops_init(void)
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 453d9de3d70b..f9d07fef7dba 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -65,6 +65,8 @@ struct nfsd_localio_operations {
 	struct net *(*nfsd_file_put_local)(struct nfsd_file *);
 	struct nfsd_file *(*nfsd_file_get_local)(struct nfsd_file *);
 	struct file *(*nfsd_file_file)(struct nfsd_file *);
+	void (*nfsd_file_dio_alignment)(struct nfsd_file *,
+					u32 *, u32 *, u32 *);
 } ____cacheline_aligned;
 
 extern void nfsd_localio_ops_init(void);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 5/6] nfs/localio: refactor iocb initialization
  2025-07-08 16:20 [RFC PATCH 0/6] NFS: LOCALIO improvements and support for misaligned O_DIRECT READs Mike Snitzer
                   ` (3 preceding siblings ...)
  2025-07-08 16:20 ` [RFC PATCH 4/6] nfs/localio: add nfsd_file_dio_alignment Mike Snitzer
@ 2025-07-08 16:20 ` Mike Snitzer
  2025-07-10  7:23   ` Christoph Hellwig
  2025-07-08 16:20 ` [RFC PATCH 6/6] nfs/localio: fallback to NFSD for misaligned O_DIRECT READs Mike Snitzer
  5 siblings, 1 reply; 9+ messages in thread
From: Mike Snitzer @ 2025-07-08 16:20 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, snitzer, Mike Snitzer

From: Mike Snitzer <snitzer@hammerspace.com>

No functional change, but slighty cleaner.

Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
 fs/nfs/localio.c | 60 ++++++++++++++++++++++--------------------------
 1 file changed, 28 insertions(+), 32 deletions(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index e551690e3f6b..d577fb27fd2f 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -287,23 +287,6 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
 }
 EXPORT_SYMBOL_GPL(nfs_local_open_fh);
 
-static struct bio_vec *
-nfs_bvec_alloc_and_import_pagevec(struct page **pagevec,
-		unsigned int npages, gfp_t flags)
-{
-	struct bio_vec *bvec, *p;
-
-	bvec = kmalloc_array(npages, sizeof(*bvec), flags);
-	if (bvec != NULL) {
-		for (p = bvec; npages > 0; p++, pagevec++, npages--) {
-			p->bv_page = *pagevec;
-			p->bv_len = PAGE_SIZE;
-			p->bv_offset = 0;
-		}
-	}
-	return bvec;
-}
-
 static void
 nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
 {
@@ -320,8 +303,9 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
 	iocb = kmalloc(sizeof(*iocb), flags);
 	if (iocb == NULL)
 		return NULL;
-	iocb->bvec = nfs_bvec_alloc_and_import_pagevec(hdr->page_array.pagevec,
-			hdr->page_array.npages, flags);
+	// FIXME: would do well to allocate memory based on NUMA node.
+	iocb->bvec = kmalloc_array(hdr->page_array.npages,
+				   sizeof(struct bio_vec), flags);
 	if (iocb->bvec == NULL) {
 		kfree(iocb);
 		return NULL;
@@ -344,7 +328,10 @@ static void
 nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
 {
 	struct nfs_pgio_header *hdr = iocb->hdr;
+	struct page **pagevec = hdr->page_array.pagevec;
 	u32 nf_dio_mem_align, nf_dio_offset_align;
+	unsigned long v, total;
+	size_t len;
 
 	if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
 		nf_dio_mem_align = iocb->nf_dio_mem_align;
@@ -354,18 +341,29 @@ nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
 			nf_dio_offset_align = iocb->nf_dio_offset_align;
 	}
 
-	iov_iter_bvec(i, dir, iocb->bvec, hdr->page_array.npages,
+	v = 0;
+	total = hdr->args.count + hdr->args.pgbase;
+	while (total) {
+		len = min_t(size_t, total, PAGE_SIZE);
+		bvec_set_page(&iocb->bvec[v], *(pagevec++),
+			      len, 0);
+		total -= len;
+		++v;
+	}
+	WARN_ON_ONCE(v != hdr->page_array.npages);
+
+	iov_iter_bvec(i, dir, iocb->bvec, v,
 		      hdr->args.count + hdr->args.pgbase);
 	if (hdr->args.pgbase != 0)
 		iov_iter_advance(i, hdr->args.pgbase);
+
 	/* Verify the IO is DIO-aligned as needed */
 	if (iocb->kiocb.ki_flags & IOCB_DIRECT &&
 	    !iov_iter_is_aligned(i, nf_dio_mem_align - 1,
 				 nf_dio_offset_align - 1)) {
 		/* Fallback to buffered IO */
 		iocb->kiocb.ki_flags &= ~IOCB_DIRECT;
-		iocb->kiocb.ki_complete = NULL;
-		iocb->aio_complete_work = NULL;
+		iocb->kiocb.ki_filp->f_flags &= ~O_DIRECT;
 	}
 }
 
@@ -472,6 +470,10 @@ static void nfs_local_call_read(struct work_struct *work)
 	save_cred = override_creds(filp->f_cred);
 
 	nfs_local_iter_init(&iter, iocb, READ);
+	if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+		iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
+		iocb->aio_complete_work = nfs_local_read_aio_complete_work;
+	}
 
 	status = filp->f_op->read_iter(&iocb->kiocb, &iter);
 	if (status != -EIOCBQUEUED) {
@@ -508,11 +510,6 @@ nfs_do_local_read(struct nfs_pgio_header *hdr,
 	nfs_local_pgio_init(hdr, call_ops);
 	hdr->res.eof = false;
 
-	if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
-		iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
-		iocb->aio_complete_work = nfs_local_read_aio_complete_work;
-	}
-
 	INIT_WORK(&iocb->work, nfs_local_call_read);
 	queue_work(nfslocaliod_workqueue, &iocb->work);
 
@@ -669,6 +666,10 @@ static void nfs_local_call_write(struct work_struct *work)
 	save_cred = override_creds(filp->f_cred);
 
 	nfs_local_iter_init(&iter, iocb, WRITE);
+	if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+		iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
+		iocb->aio_complete_work = nfs_local_write_aio_complete_work;
+	}
 
 	file_start_write(filp);
 	status = filp->f_op->write_iter(&iocb->kiocb, &iter);
@@ -721,11 +722,6 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
 
 	nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
 
-	if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
-		iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
-		iocb->aio_complete_work = nfs_local_write_aio_complete_work;
-	}
-
 	INIT_WORK(&iocb->work, nfs_local_call_write);
 	queue_work(nfslocaliod_workqueue, &iocb->work);
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [RFC PATCH 6/6] nfs/localio: fallback to NFSD for misaligned O_DIRECT READs
  2025-07-08 16:20 [RFC PATCH 0/6] NFS: LOCALIO improvements and support for misaligned O_DIRECT READs Mike Snitzer
                   ` (4 preceding siblings ...)
  2025-07-08 16:20 ` [RFC PATCH 5/6] nfs/localio: refactor iocb initialization Mike Snitzer
@ 2025-07-08 16:20 ` Mike Snitzer
  5 siblings, 0 replies; 9+ messages in thread
From: Mike Snitzer @ 2025-07-08 16:20 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, snitzer, Mike Snitzer

From: Mike Snitzer <snitzer@hammerspace.com>

LOCALIO is left enabled, it just isn't used for any misaligned
O_DIRECT READ.

If LOCALIO determines that an O_DIRECT READ is misaligned then it
makes sense to immediately issue the READ remotely via NFSD (which has
the ability to expand a misaligned O_DIRECT READ to be DIO-aligned)
if/when NFSD is configured to use O_DIRECT for READ IO with:
  echo 2 > /sys/kernel/debug/nfsd/io_cache_read

This change in behavior for LOCALIO's O_DIRECT support really should
be dependent on NFSD running with io_cache_read=2 but there isn't an
interface to check for that (currently).

Add 'localio_O_DIRECT_align_misaligned_READ' modparm, which depends on
localio_O_DIRECT_semantics=Y, to control if LOCALIO will expand
misaligned READ to DIO-aligned (boolean control is Y by default).

Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
 fs/nfs/flexfilelayout/flexfilelayout.c |   1 +
 fs/nfs/internal.h                      |   4 +-
 fs/nfs/localio.c                       | 152 +++++++++++++++----------
 fs/nfs/pagelist.c                      |  15 ++-
 4 files changed, 107 insertions(+), 65 deletions(-)

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 4bea008dbebd..fbcf7c5ac118 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1911,6 +1911,7 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
 	localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh, FMODE_READ);
 	if (localio) {
 		hdr->task.tk_start = ktime_get();
+		// FIXME: if fallback occurs is this stats start bogus?
 		ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
 	}
 
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 3a2ff094216c..5c39e143efc0 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -463,7 +463,7 @@ extern struct nfsd_file *nfs_local_open_fh(struct nfs_client *,
 					   struct nfs_file_localio *,
 					   const fmode_t);
 extern int nfs_local_doio(struct nfs_client *,
-			  struct nfsd_file *,
+			  struct nfsd_file **,
 			  struct nfs_pgio_header *,
 			  const struct rpc_call_ops *);
 extern int nfs_local_commit(struct nfsd_file *,
@@ -482,7 +482,7 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
 	return NULL;
 }
 static inline int nfs_local_doio(struct nfs_client *clp,
-				 struct nfsd_file *localio,
+				 struct nfsd_file **localio,
 				 struct nfs_pgio_header *hdr,
 				 const struct rpc_call_ops *call_ops)
 {
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index d577fb27fd2f..3eab88715d5b 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -36,11 +36,8 @@ struct nfs_local_kiocb {
 	struct nfs_pgio_header	*hdr;
 	struct work_struct	work;
 	void (*aio_complete_work)(struct work_struct *);
+	struct iov_iter		iter ____cacheline_aligned;
 	struct nfsd_file	*localio;
-	/* local copy of nfsd_file's dio alignment attrs */
-	u32			nf_dio_mem_align;
-	u32			nf_dio_offset_align;
-	u32			nf_dio_read_offset_align;
 };
 
 struct nfs_local_fsync_ctx {
@@ -63,6 +60,12 @@ module_param(localio_O_DIRECT_semantics, bool, 0644);
 MODULE_PARM_DESC(localio_O_DIRECT_semantics,
 		 "LOCALIO will use O_DIRECT semantics to filesystem.");
 
+static bool localio_O_DIRECT_align_misaligned_READ __read_mostly = true;
+module_param(localio_O_DIRECT_align_misaligned_READ, bool, 0644);
+/* This feature also depends on: echo 2 > /sys/kernel/debug/nfsd/io_cache_read */
+MODULE_PARM_DESC(localio_O_DIRECT_align_misaligned_READ,
+		 "If LOCALIO_O_DIRECT_semantics=Y it will expand misaligned READ to DIO-aligned.");
+
 static inline bool nfs_client_is_local(const struct nfs_client *clp)
 {
 	return !!rcu_access_pointer(clp->cl_uuid.net);
@@ -324,21 +327,23 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
 	return iocb;
 }
 
-static void
-nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
+static int
+nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int rw)
 {
 	struct nfs_pgio_header *hdr = iocb->hdr;
 	struct page **pagevec = hdr->page_array.pagevec;
-	u32 nf_dio_mem_align, nf_dio_offset_align;
+	u32 nf_dio_mem_align, nf_dio_offset_align, nf_dio_read_offset_align;
 	unsigned long v, total;
 	size_t len;
 
 	if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
-		nf_dio_mem_align = iocb->nf_dio_mem_align;
-		if (dir == READ)
-			nf_dio_offset_align = iocb->nf_dio_read_offset_align;
+		nfs_to->nfsd_file_dio_alignment(iocb->localio, &nf_dio_mem_align,
+						&nf_dio_offset_align,
+						&nf_dio_read_offset_align);
+		if (rw == ITER_DEST)
+			nf_dio_offset_align = nf_dio_read_offset_align;
 		else
-			nf_dio_offset_align = iocb->nf_dio_offset_align;
+			nf_dio_offset_align = nf_dio_offset_align;
 	}
 
 	v = 0;
@@ -352,7 +357,7 @@ nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
 	}
 	WARN_ON_ONCE(v != hdr->page_array.npages);
 
-	iov_iter_bvec(i, dir, iocb->bvec, v,
+	iov_iter_bvec(i, rw, iocb->bvec, v,
 		      hdr->args.count + hdr->args.pgbase);
 	if (hdr->args.pgbase != 0)
 		iov_iter_advance(i, hdr->args.pgbase);
@@ -361,10 +366,20 @@ nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
 	if (iocb->kiocb.ki_flags & IOCB_DIRECT &&
 	    !iov_iter_is_aligned(i, nf_dio_mem_align - 1,
 				 nf_dio_offset_align - 1)) {
-		/* Fallback to buffered IO */
+		if (localio_O_DIRECT_align_misaligned_READ &&
+		    rw == ITER_DEST) {
+			/*
+			 * Fallback to sending this READ to NFSD since it
+			 * can expand misaligned READ IO to be DIO-aligned.
+			 */
+			return -ENOSYS;
+		}
+		/* Fallback to using buffered IO for this WRITE */
 		iocb->kiocb.ki_flags &= ~IOCB_DIRECT;
 		iocb->kiocb.ki_filp->f_flags &= ~O_DIRECT;
 	}
+
+	return 0;
 }
 
 static void
@@ -397,13 +412,18 @@ nfs_local_pgio_done(struct nfs_pgio_header *hdr, long status)
 	}
 }
 
-static void
-nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
+static void nfs_local_iocb_release(struct nfs_local_kiocb *iocb)
 {
-	struct nfs_pgio_header *hdr = iocb->hdr;
-
 	nfs_local_file_put(iocb->localio);
 	nfs_local_iocb_free(iocb);
+}
+
+static void
+nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
+{
+	struct nfs_pgio_header *hdr = iocb->hdr;
+
+	nfs_local_iocb_release(iocb);
 	nfs_local_hdr_release(hdr, hdr->task.tk_ops);
 }
 
@@ -464,18 +484,16 @@ static void nfs_local_call_read(struct work_struct *work)
 		container_of(work, struct nfs_local_kiocb, work);
 	struct file *filp = iocb->kiocb.ki_filp;
 	const struct cred *save_cred;
-	struct iov_iter iter;
 	ssize_t status;
 
 	save_cred = override_creds(filp->f_cred);
 
-	nfs_local_iter_init(&iter, iocb, READ);
 	if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
 		iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
 		iocb->aio_complete_work = nfs_local_read_aio_complete_work;
 	}
 
-	status = filp->f_op->read_iter(&iocb->kiocb, &iter);
+	status = filp->f_op->read_iter(&iocb->kiocb, &iocb->iter);
 	if (status != -EIOCBQUEUED) {
 		nfs_local_read_done(iocb, status);
 		nfs_local_pgio_release(iocb);
@@ -485,28 +503,14 @@ static void nfs_local_call_read(struct work_struct *work)
 }
 
 static int
-nfs_do_local_read(struct nfs_pgio_header *hdr,
-		  struct nfsd_file *localio,
+nfs_local_do_read(struct nfs_local_kiocb *iocb,
 		  const struct rpc_call_ops *call_ops)
 {
-	struct nfs_local_kiocb *iocb;
-	struct file *file = nfs_to->nfsd_file_file(localio);
-
-	/* Don't support filesystems without read_iter */
-	if (!file->f_op->read_iter)
-		return -EAGAIN;
+	struct nfs_pgio_header *hdr = iocb->hdr;
 
 	dprintk("%s: vfs_read count=%u pos=%llu\n",
 		__func__, hdr->args.count, hdr->args.offset);
 
-	iocb = nfs_local_iocb_alloc(hdr, file, GFP_KERNEL);
-	if (iocb == NULL)
-		return -ENOMEM;
-	iocb->localio = localio;
-	nfs_to->nfsd_file_dio_alignment(localio, &iocb->nf_dio_mem_align,
-					&iocb->nf_dio_offset_align,
-					&iocb->nf_dio_read_offset_align);
-
 	nfs_local_pgio_init(hdr, call_ops);
 	hdr->res.eof = false;
 
@@ -659,20 +663,18 @@ static void nfs_local_call_write(struct work_struct *work)
 	struct file *filp = iocb->kiocb.ki_filp;
 	unsigned long old_flags = current->flags;
 	const struct cred *save_cred;
-	struct iov_iter iter;
 	ssize_t status;
 
 	current->flags |= PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO;
 	save_cred = override_creds(filp->f_cred);
 
-	nfs_local_iter_init(&iter, iocb, WRITE);
 	if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
 		iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
 		iocb->aio_complete_work = nfs_local_write_aio_complete_work;
 	}
 
 	file_start_write(filp);
-	status = filp->f_op->write_iter(&iocb->kiocb, &iter);
+	status = filp->f_op->write_iter(&iocb->kiocb, &iocb->iter);
 	file_end_write(filp);
 	if (status != -EIOCBQUEUED) {
 		nfs_local_write_done(iocb, status);
@@ -685,29 +687,15 @@ static void nfs_local_call_write(struct work_struct *work)
 }
 
 static int
-nfs_do_local_write(struct nfs_pgio_header *hdr,
-		   struct nfsd_file *localio,
+nfs_local_do_write(struct nfs_local_kiocb *iocb,
 		   const struct rpc_call_ops *call_ops)
 {
-	struct nfs_local_kiocb *iocb;
-	struct file *file = nfs_to->nfsd_file_file(localio);
-
-	/* Don't support filesystems without write_iter */
-	if (!file->f_op->write_iter)
-		return -EAGAIN;
+	struct nfs_pgio_header *hdr = iocb->hdr;
 
 	dprintk("%s: vfs_write count=%u pos=%llu %s\n",
 		__func__, hdr->args.count, hdr->args.offset,
 		(hdr->args.stable == NFS_UNSTABLE) ?  "unstable" : "stable");
 
-	iocb = nfs_local_iocb_alloc(hdr, file, GFP_NOIO);
-	if (iocb == NULL)
-		return -ENOMEM;
-	iocb->localio = localio;
-	nfs_to->nfsd_file_dio_alignment(localio, &iocb->nf_dio_mem_align,
-					&iocb->nf_dio_offset_align,
-					&iocb->nf_dio_read_offset_align);
-
 	switch (hdr->args.stable) {
 	default:
 		break;
@@ -728,21 +716,67 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
 	return 0;
 }
 
-int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio,
+static struct nfs_local_kiocb *
+nfs_local_iocb_init(struct nfs_pgio_header *hdr, struct nfsd_file **localio)
+{
+	struct file *file = nfs_to->nfsd_file_file(*localio);
+	struct nfs_local_kiocb *iocb;
+	gfp_t gfp_mask;
+	int rw, status;
+
+	if (hdr->rw_mode & FMODE_READ) {
+		if (!file->f_op->read_iter)
+			return ERR_PTR(-EINVAL);
+		gfp_mask = GFP_KERNEL;
+		rw = ITER_DEST;
+	} else {
+		if (!file->f_op->write_iter)
+			return ERR_PTR(-EINVAL);
+		gfp_mask = GFP_NOIO;
+		rw = ITER_SOURCE;
+	}
+
+	iocb = nfs_local_iocb_alloc(hdr, file, gfp_mask);
+	if (iocb == NULL)
+		return ERR_PTR(-ENOMEM);
+	iocb->hdr = hdr;
+	iocb->localio = *localio;
+
+	status = nfs_local_iter_init(&iocb->iter, iocb, rw);
+	if (status == -ENOSYS) {
+		/* close nfsd_file and clear localio,
+		 * this informs callers that IO should
+		 * be serviced remotely.
+		 */
+		nfs_local_iocb_release(iocb);
+		*localio = NULL;
+		return ERR_PTR(status);
+	}
+	WARN_ON_ONCE(status != 0);
+
+	return iocb;
+}
+
+int nfs_local_doio(struct nfs_client *clp, struct nfsd_file **localio,
 		   struct nfs_pgio_header *hdr,
 		   const struct rpc_call_ops *call_ops)
 {
+	struct nfs_local_kiocb *iocb;
 	int status = 0;
 
 	if (!hdr->args.count)
 		return 0;
 
+	iocb = nfs_local_iocb_init(hdr, localio);
+	if (IS_ERR(iocb))
+		return PTR_ERR(iocb);
+
 	switch (hdr->rw_mode) {
 	case FMODE_READ:
-		status = nfs_do_local_read(hdr, localio, call_ops);
+		status = nfs_local_do_read(iocb, call_ops);
 		break;
 	case FMODE_WRITE:
-		status = nfs_do_local_write(hdr, localio, call_ops);
+		status = nfs_local_do_write(iocb, call_ops);
 		break;
 	default:
 		dprintk("%s: invalid mode: %d\n", __func__,
@@ -753,7 +787,7 @@ int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio,
 	if (status != 0) {
 		if (status == -EAGAIN)
 			nfs_localio_disable_client(clp);
-		nfs_local_file_put(localio);
+		nfs_local_iocb_release(iocb);
 		hdr->task.tk_status = status;
 		nfs_local_hdr_release(hdr, call_ops);
 	}
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 11968dcb7243..9ddff27e96e9 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -762,9 +762,17 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
 		hdr->args.count,
 		(unsigned long long)hdr->args.offset);
 
-	if (localio)
-		return nfs_local_doio(NFS_SERVER(hdr->inode)->nfs_client,
-				      localio, hdr, call_ops);
+	if (localio) {
+		int status = nfs_local_doio(NFS_SERVER(hdr->inode)->nfs_client,
+					    &localio, hdr, call_ops);
+		/* nfs_local_doio() will clear localio and return -ENOSYS if
+		 * it is prudent to immediately service this IO remotely.
+		 */
+		if (status != -ENOSYS)
+			return status;
+		WARN_ON_ONCE(localio != NULL);
+		/* fallthrough */
+	}
 
 	task = rpc_run_task(&task_setup_data);
 	if (IS_ERR(task))
@@ -959,7 +967,6 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
 	ret = nfs_generic_pgio(desc, hdr);
 	if (ret == 0) {
 		struct nfs_client *clp = NFS_SERVER(hdr->inode)->nfs_client;
-
 		struct nfsd_file *localio =
 			nfs_local_open_fh(clp, hdr->cred, hdr->args.fh,
 					  &hdr->args.context->nfl,
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 5/6] nfs/localio: refactor iocb initialization
  2025-07-08 16:20 ` [RFC PATCH 5/6] nfs/localio: refactor iocb initialization Mike Snitzer
@ 2025-07-10  7:23   ` Christoph Hellwig
  2025-07-10  7:32     ` Mike Snitzer
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2025-07-10  7:23 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs, Mike Snitzer

> +	// FIXME: would do well to allocate memory based on NUMA node.
> +	iocb->bvec = kmalloc_array(hdr->page_array.npages,
> +				   sizeof(struct bio_vec), flags);

I don't think that's a "FIXME".  Either do it or leave it, but given
that memory is allocated on the local node by default I can't see why
that would be useful here for a short lived allocation.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH 5/6] nfs/localio: refactor iocb initialization
  2025-07-10  7:23   ` Christoph Hellwig
@ 2025-07-10  7:32     ` Mike Snitzer
  0 siblings, 0 replies; 9+ messages in thread
From: Mike Snitzer @ 2025-07-10  7:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Trond Myklebust, Anna Schumaker, linux-nfs, Mike Snitzer

On Thu, Jul 10, 2025 at 12:23:35AM -0700, Christoph Hellwig wrote:
> > +	// FIXME: would do well to allocate memory based on NUMA node.
> > +	iocb->bvec = kmalloc_array(hdr->page_array.npages,
> > +				   sizeof(struct bio_vec), flags);
> 
> I don't think that's a "FIXME".  Either do it or leave it, but given
> that memory is allocated on the local node by default I can't see why
> that would be useful here for a short lived allocation. 

Yeah, meant to remove that FIXME.. will do for v3.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-07-10  7:32 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-08 16:20 [RFC PATCH 0/6] NFS: LOCALIO improvements and support for misaligned O_DIRECT READs Mike Snitzer
2025-07-08 16:20 ` [RFC PATCH 1/6] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Mike Snitzer
2025-07-08 16:20 ` [RFC PATCH 2/6] nfs/localio: add localio_async_probe modparm Mike Snitzer
2025-07-08 16:20 ` [RFC PATCH 3/6] nfs/localio: make trace_nfs_local_open_fh more useful Mike Snitzer
2025-07-08 16:20 ` [RFC PATCH 4/6] nfs/localio: add nfsd_file_dio_alignment Mike Snitzer
2025-07-08 16:20 ` [RFC PATCH 5/6] nfs/localio: refactor iocb initialization Mike Snitzer
2025-07-10  7:23   ` Christoph Hellwig
2025-07-10  7:32     ` Mike Snitzer
2025-07-08 16:20 ` [RFC PATCH 6/6] nfs/localio: fallback to NFSD for misaligned O_DIRECT READs Mike Snitzer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox