public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Fix misc localio issues
@ 2026-01-03 17:14 Trond Myklebust
  2026-01-03 17:14 ` [PATCH 1/4] NFS/localio: Stop further I/O upon hitting an error Trond Myklebust
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Trond Myklebust @ 2026-01-03 17:14 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

This series fixes the following issues:
- Data corruption when localio hits a transient error.
- Short write handling can trigger spurious ENOSPC errors


Trond Myklebust (4):
  NFS/localio: Stop further I/O upon hitting an error
  NFS/localio: Deal with page bases that are > PAGE_SIZE
  NFS/localio: Handle short writes by retrying
  NFS/localio: Cleanup the nfs_local_pgio_done() parameters

 fs/nfs/localio.c | 106 +++++++++++++++++++++++++++++------------------
 1 file changed, 66 insertions(+), 40 deletions(-)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] NFS/localio: Stop further I/O upon hitting an error
  2026-01-03 17:14 [PATCH 0/4] Fix misc localio issues Trond Myklebust
@ 2026-01-03 17:14 ` Trond Myklebust
  2026-01-05 17:19   ` Mike Snitzer
  2026-01-03 17:14 ` [PATCH 2/4] NFS/localio: Deal with page bases that are > PAGE_SIZE Trond Myklebust
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2026-01-03 17:14 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

If the call into the filesystem results in an I/O error, then the next
chunk of data won't be contiguous with the end of the last successful
chunk. So break out of the I/O loop and report the results.
Currently the localio code will do this for a short read/write, but not
for an error.

Fixes: 6a218b9c3183 ("nfs/localio: do not issue misaligned DIO out-of-order")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/localio.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index b98bb292fef0..611088ec5a99 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -619,7 +619,6 @@ static void nfs_local_call_read(struct work_struct *work)
 		container_of(work, struct nfs_local_kiocb, work);
 	struct file *filp = iocb->kiocb.ki_filp;
 	const struct cred *save_cred;
-	bool force_done = false;
 	ssize_t status;
 	int n_iters;
 
@@ -638,13 +637,13 @@ static void nfs_local_call_read(struct work_struct *work)
 			iocb->kiocb.ki_flags &= ~IOCB_DIRECT;
 
 		status = filp->f_op->read_iter(&iocb->kiocb, &iocb->iters[i]);
-		if (status != -EIOCBQUEUED) {
-			if (unlikely(status >= 0 && status < iocb->iters[i].count))
-				force_done = true; /* Partial read */
-			if (nfs_local_pgio_done(iocb, status, force_done)) {
-				nfs_local_read_iocb_done(iocb);
-				break;
-			}
+		if (status == -EIOCBQUEUED)
+			continue;
+		/* Break on completion, errors, or short reads */
+		if (nfs_local_pgio_done(iocb, status, false) || status < 0 ||
+		    (size_t)status < iov_iter_count(&iocb->iters[i])) {
+			nfs_local_read_iocb_done(iocb);
+			break;
 		}
 	}
 
@@ -825,7 +824,6 @@ static void nfs_local_call_write(struct work_struct *work)
 	struct file *filp = iocb->kiocb.ki_filp;
 	unsigned long old_flags = current->flags;
 	const struct cred *save_cred;
-	bool force_done = false;
 	ssize_t status;
 	int n_iters;
 
@@ -846,13 +844,13 @@ static void nfs_local_call_write(struct work_struct *work)
 			iocb->kiocb.ki_flags &= ~IOCB_DIRECT;
 
 		status = filp->f_op->write_iter(&iocb->kiocb, &iocb->iters[i]);
-		if (status != -EIOCBQUEUED) {
-			if (unlikely(status >= 0 && status < iocb->iters[i].count))
-				force_done = true; /* Partial write */
-			if (nfs_local_pgio_done(iocb, status, force_done)) {
-				nfs_local_write_iocb_done(iocb);
-				break;
-			}
+		if (status == -EIOCBQUEUED)
+			continue;
+		/* Break on completion, errors, or short writes */
+		if (nfs_local_pgio_done(iocb, status, false) || status < 0 ||
+		    (size_t)status < iov_iter_count(&iocb->iters[i])) {
+			nfs_local_write_iocb_done(iocb);
+			break;
 		}
 	}
 	file_end_write(filp);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/4] NFS/localio: Deal with page bases that are > PAGE_SIZE
  2026-01-03 17:14 [PATCH 0/4] Fix misc localio issues Trond Myklebust
  2026-01-03 17:14 ` [PATCH 1/4] NFS/localio: Stop further I/O upon hitting an error Trond Myklebust
@ 2026-01-03 17:14 ` Trond Myklebust
  2026-01-05 17:40   ` Mike Snitzer
  2026-01-03 17:14 ` [PATCH 3/4] NFS/localio: Handle short writes by retrying Trond Myklebust
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2026-01-03 17:14 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

When resending requests, etc, the page base can quickly grow larger than
the page size.

Fixes: 091bdcfcece0 ("nfs/localio: refactor iocb and iov_iter_bvec initialization")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/localio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 611088ec5a99..c5f975bb5a64 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -461,6 +461,8 @@ nfs_local_iters_init(struct nfs_local_kiocb *iocb, int rw)
 	v = 0;
 	total = hdr->args.count;
 	base = hdr->args.pgbase;
+	pagevec += base >> PAGE_SHIFT;
+	base &= ~PAGE_MASK;
 	while (total && v < hdr->page_array.npages) {
 		len = min_t(size_t, total, PAGE_SIZE - base);
 		bvec_set_page(&iocb->bvec[v], *pagevec, len, base);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/4] NFS/localio: Handle short writes by retrying
  2026-01-03 17:14 [PATCH 0/4] Fix misc localio issues Trond Myklebust
  2026-01-03 17:14 ` [PATCH 1/4] NFS/localio: Stop further I/O upon hitting an error Trond Myklebust
  2026-01-03 17:14 ` [PATCH 2/4] NFS/localio: Deal with page bases that are > PAGE_SIZE Trond Myklebust
@ 2026-01-03 17:14 ` Trond Myklebust
  2026-01-05 18:04   ` Mike Snitzer
  2026-01-03 17:15 ` [PATCH 4/4] NFS/localio: Cleanup the nfs_local_pgio_done() parameters Trond Myklebust
  2026-01-07 16:08 ` [PATCH 0/4] NFS/localio: various improvements Mike Snitzer
  4 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2026-01-03 17:14 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

The current code for handling short writes in localio just truncates the
I/O and then sets an error. While that is close to how the ordinary NFS
code behaves, it does mean there is a chance the data that got written
is lost because it isn't persisted.
To fix this, change localio so that the upper layers can direct the
behaviour to persist any unstable data by rewriting it, and then
continuing writing until an ENOSPC is hit.

Fixes: 70ba381e1a43 ("nfs: add LOCALIO support")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/localio.c | 64 +++++++++++++++++++++++++++++++++++-------------
 1 file changed, 47 insertions(+), 17 deletions(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index c5f975bb5a64..87abebbedbab 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -58,6 +58,11 @@ struct nfs_local_fsync_ctx {
 static bool localio_enabled __read_mostly = true;
 module_param(localio_enabled, bool, 0644);
 
+static int nfs_local_do_read(struct nfs_local_kiocb *iocb,
+			     const struct rpc_call_ops *call_ops);
+static int nfs_local_do_write(struct nfs_local_kiocb *iocb,
+			      const struct rpc_call_ops *call_ops);
+
 static inline bool nfs_client_is_local(const struct nfs_client *clp)
 {
 	return !!rcu_access_pointer(clp->cl_uuid.net);
@@ -542,13 +547,50 @@ nfs_local_iocb_release(struct nfs_local_kiocb *iocb)
 	nfs_local_iocb_free(iocb);
 }
 
-static void
-nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
+static void nfs_local_pgio_restart(struct nfs_local_kiocb *iocb,
+				   struct nfs_pgio_header *hdr)
+{
+	int status = 0;
+
+	iocb->kiocb.ki_pos = hdr->args.offset;
+	iocb->kiocb.ki_flags &= ~(IOCB_DSYNC | IOCB_SYNC | IOCB_DIRECT);
+	iocb->kiocb.ki_complete = NULL;
+	iocb->aio_complete_work = NULL;
+	iocb->end_iter_index = -1;
+
+	switch (hdr->rw_mode) {
+	case FMODE_READ:
+		nfs_local_iters_init(iocb, ITER_DEST);
+		status = nfs_local_do_read(iocb, hdr->task.tk_ops);
+		break;
+	case FMODE_WRITE:
+		nfs_local_iters_init(iocb, ITER_SOURCE);
+		status = nfs_local_do_write(iocb, hdr->task.tk_ops);
+		break;
+	default:
+		status = -EOPNOTSUPP;
+	}
+
+	if (status != 0) {
+		nfs_local_iocb_release(iocb);
+		hdr->task.tk_status = status;
+		nfs_local_hdr_release(hdr, hdr->task.tk_ops);
+	}
+}
+
+static void nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
 {
 	struct nfs_pgio_header *hdr = iocb->hdr;
+	struct rpc_task *task = &hdr->task;
+
+	task->tk_action = NULL;
+	task->tk_ops->rpc_call_done(task, hdr);
 
-	nfs_local_iocb_release(iocb);
-	nfs_local_hdr_release(hdr, hdr->task.tk_ops);
+	if (task->tk_action == NULL) {
+		nfs_local_iocb_release(iocb);
+		task->tk_ops->rpc_release(hdr);
+	} else
+		nfs_local_pgio_restart(iocb, hdr);
 }
 
 /*
@@ -776,19 +818,7 @@ static void nfs_local_write_done(struct nfs_local_kiocb *iocb)
 		pr_info_ratelimited("nfs: Unexpected direct I/O write alignment failure\n");
 	}
 
-	/* Handle short writes as if they are ENOSPC */
-	status = hdr->res.count;
-	if (status > 0 && status < hdr->args.count) {
-		hdr->mds_offset += status;
-		hdr->args.offset += status;
-		hdr->args.pgbase += status;
-		hdr->args.count -= status;
-		nfs_set_pgio_error(hdr, -ENOSPC, hdr->args.offset);
-		status = -ENOSPC;
-		/* record -ENOSPC in terms of nfs_local_pgio_done */
-		(void) nfs_local_pgio_done(iocb, status, true);
-	}
-	if (hdr->task.tk_status < 0)
+	if (status < 0)
 		nfs_reset_boot_verifier(hdr->inode);
 }
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/4] NFS/localio: Cleanup the nfs_local_pgio_done() parameters
  2026-01-03 17:14 [PATCH 0/4] Fix misc localio issues Trond Myklebust
                   ` (2 preceding siblings ...)
  2026-01-03 17:14 ` [PATCH 3/4] NFS/localio: Handle short writes by retrying Trond Myklebust
@ 2026-01-03 17:15 ` Trond Myklebust
  2026-01-05 17:24   ` Mike Snitzer
  2026-01-07 16:08 ` [PATCH 0/4] NFS/localio: various improvements Mike Snitzer
  4 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2026-01-03 17:15 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Remove the redundant 'force' parameter.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/localio.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 87abebbedbab..8caf2ffc7e43 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -517,8 +517,7 @@ nfs_local_pgio_init(struct nfs_pgio_header *hdr,
 		hdr->task.tk_start = ktime_get();
 }
 
-static bool
-nfs_local_pgio_done(struct nfs_local_kiocb *iocb, long status, bool force)
+static bool nfs_local_pgio_done(struct nfs_local_kiocb *iocb, long status)
 {
 	struct nfs_pgio_header *hdr = iocb->hdr;
 
@@ -533,9 +532,6 @@ nfs_local_pgio_done(struct nfs_local_kiocb *iocb, long status, bool force)
 		hdr->task.tk_status = status;
 	}
 
-	if (force)
-		return true;
-
 	BUG_ON(atomic_read(&iocb->n_iters) <= 0);
 	return atomic_dec_and_test(&iocb->n_iters);
 }
@@ -651,7 +647,7 @@ static void nfs_local_read_aio_complete(struct kiocb *kiocb, long ret)
 		container_of(kiocb, struct nfs_local_kiocb, kiocb);
 
 	/* AIO completion of DIO read should always be last to complete */
-	if (unlikely(!nfs_local_pgio_done(iocb, ret, false)))
+	if (unlikely(!nfs_local_pgio_done(iocb, ret)))
 		return;
 
 	nfs_local_pgio_aio_complete(iocb); /* Calls nfs_local_read_aio_complete_work */
@@ -684,7 +680,7 @@ static void nfs_local_call_read(struct work_struct *work)
 		if (status == -EIOCBQUEUED)
 			continue;
 		/* Break on completion, errors, or short reads */
-		if (nfs_local_pgio_done(iocb, status, false) || status < 0 ||
+		if (nfs_local_pgio_done(iocb, status) || status < 0 ||
 		    (size_t)status < iov_iter_count(&iocb->iters[i])) {
 			nfs_local_read_iocb_done(iocb);
 			break;
@@ -843,7 +839,7 @@ static void nfs_local_write_aio_complete(struct kiocb *kiocb, long ret)
 		container_of(kiocb, struct nfs_local_kiocb, kiocb);
 
 	/* AIO completion of DIO write should always be last to complete */
-	if (unlikely(!nfs_local_pgio_done(iocb, ret, false)))
+	if (unlikely(!nfs_local_pgio_done(iocb, ret)))
 		return;
 
 	nfs_local_pgio_aio_complete(iocb); /* Calls nfs_local_write_aio_complete_work */
@@ -879,7 +875,7 @@ static void nfs_local_call_write(struct work_struct *work)
 		if (status == -EIOCBQUEUED)
 			continue;
 		/* Break on completion, errors, or short writes */
-		if (nfs_local_pgio_done(iocb, status, false) || status < 0 ||
+		if (nfs_local_pgio_done(iocb, status) || status < 0 ||
 		    (size_t)status < iov_iter_count(&iocb->iters[i])) {
 			nfs_local_write_iocb_done(iocb);
 			break;
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] NFS/localio: Stop further I/O upon hitting an error
  2026-01-03 17:14 ` [PATCH 1/4] NFS/localio: Stop further I/O upon hitting an error Trond Myklebust
@ 2026-01-05 17:19   ` Mike Snitzer
  2026-01-05 17:35     ` Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2026-01-05 17:19 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Sat, Jan 03, 2026 at 12:14:57PM -0500, Trond Myklebust wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> If the call into the filesystem results in an I/O error, then the next
> chunk of data won't be contiguous with the end of the last successful
> chunk. So break out of the I/O loop and report the results.
> Currently the localio code will do this for a short read/write, but not
> for an error.
> 
> Fixes: 6a218b9c3183 ("nfs/localio: do not issue misaligned DIO out-of-order")
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Thanks, definitely cleaner to not have the awkward force_done flag,
sorry for that nastiness.

But this one needs to be rebased on tip of linus' master due to commit
3af870aedbff ("nfs/localio: fix regression due to out-of-order
__put_cred") which landed after the 6.19 merge.

Like so:

From: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Sat, 3 Jan 2026 12:14:57 -0500
Subject: [PATCH] NFS/localio: Stop further I/O upon hitting an error

If the call into the filesystem results in an I/O error, then the next
chunk of data won't be contiguous with the end of the last successful
chunk. So break out of the I/O loop and report the results.
Currently the localio code will do this for a short read/write, but not
for an error.

Fixes: 6a218b9c3183 ("nfs/localio: do not issue misaligned DIO out-of-order")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs/localio.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index ed2a7efaf8f20..97e4733d04714 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -619,7 +619,6 @@ static void nfs_local_call_read(struct work_struct *work)
 		container_of(work, struct nfs_local_kiocb, work);
 	struct file *filp = iocb->kiocb.ki_filp;
 	const struct cred *save_cred;
-	bool force_done = false;
 	ssize_t status;
 	int n_iters;
 
@@ -639,13 +638,13 @@ static void nfs_local_call_read(struct work_struct *work)
 		status = filp->f_op->read_iter(&iocb->kiocb, &iocb->iters[i]);
 		revert_creds(save_cred);
 
-		if (status != -EIOCBQUEUED) {
-			if (unlikely(status >= 0 && status < iocb->iters[i].count))
-				force_done = true; /* Partial read */
-			if (nfs_local_pgio_done(iocb, status, force_done)) {
-				nfs_local_read_iocb_done(iocb);
-				break;
-			}
+		if (status == -EIOCBQUEUED)
+			continue;
+		/* Break on completion, errors, or short reads */
+		if (nfs_local_pgio_done(iocb, status, false) || status < 0 ||
+		    (size_t)status < iov_iter_count(&iocb->iters[i])) {
+			nfs_local_read_iocb_done(iocb);
+			break;
 		}
 	}
 }
@@ -824,7 +823,6 @@ static void nfs_local_call_write(struct work_struct *work)
 	struct file *filp = iocb->kiocb.ki_filp;
 	unsigned long old_flags = current->flags;
 	const struct cred *save_cred;
-	bool force_done = false;
 	ssize_t status;
 	int n_iters;
 
@@ -847,13 +845,13 @@ static void nfs_local_call_write(struct work_struct *work)
 		status = filp->f_op->write_iter(&iocb->kiocb, &iocb->iters[i]);
 		revert_creds(save_cred);
 
-		if (status != -EIOCBQUEUED) {
-			if (unlikely(status >= 0 && status < iocb->iters[i].count))
-				force_done = true; /* Partial write */
-			if (nfs_local_pgio_done(iocb, status, force_done)) {
-				nfs_local_write_iocb_done(iocb);
-				break;
-			}
+		if (status == -EIOCBQUEUED)
+			continue;
+		/* Break on completion, errors, or short writes */
+		if (nfs_local_pgio_done(iocb, status, false) || status < 0 ||
+		    (size_t)status < iov_iter_count(&iocb->iters[i])) {
+			nfs_local_write_iocb_done(iocb);
+			break;
 		}
 	}
 	file_end_write(filp);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] NFS/localio: Cleanup the nfs_local_pgio_done() parameters
  2026-01-03 17:15 ` [PATCH 4/4] NFS/localio: Cleanup the nfs_local_pgio_done() parameters Trond Myklebust
@ 2026-01-05 17:24   ` Mike Snitzer
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Snitzer @ 2026-01-05 17:24 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Sat, Jan 03, 2026 at 12:15:00PM -0500, Trond Myklebust wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> Remove the redundant 'force' parameter.
> 
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Reviewed-by: Mike Snitzer <snitzer@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] NFS/localio: Stop further I/O upon hitting an error
  2026-01-05 17:19   ` Mike Snitzer
@ 2026-01-05 17:35     ` Trond Myklebust
  0 siblings, 0 replies; 17+ messages in thread
From: Trond Myklebust @ 2026-01-05 17:35 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-nfs

On Mon, 2026-01-05 at 12:19 -0500, Mike Snitzer wrote:
> On Sat, Jan 03, 2026 at 12:14:57PM -0500, Trond Myklebust wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > If the call into the filesystem results in an I/O error, then the
> > next
> > chunk of data won't be contiguous with the end of the last
> > successful
> > chunk. So break out of the I/O loop and report the results.
> > Currently the localio code will do this for a short read/write, but
> > not
> > for an error.
> > 
> > Fixes: 6a218b9c3183 ("nfs/localio: do not issue misaligned DIO out-
> > of-order")
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> Thanks, definitely cleaner to not have the awkward force_done flag,
> sorry for that nastiness.
> 
> But this one needs to be rebased on tip of linus' master due to
> commit
> 3af870aedbff ("nfs/localio: fix regression due to out-of-order
> __put_cred") which landed after the 6.19 merge.
> 
> Like so:
> 
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> Date: Sat, 3 Jan 2026 12:14:57 -0500
> Subject: [PATCH] NFS/localio: Stop further I/O upon hitting an error
> 
> If the call into the filesystem results in an I/O error, then the
> next
> chunk of data won't be contiguous with the end of the last successful
> chunk. So break out of the I/O loop and report the results.
> Currently the localio code will do this for a short read/write, but
> not
> for an error.
> 
> Fixes: 6a218b9c3183 ("nfs/localio: do not issue misaligned DIO out-
> of-order")
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> Reviewed-by: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  fs/nfs/localio.c | 30 ++++++++++++++----------------
>  1 file changed, 14 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index ed2a7efaf8f20..97e4733d04714 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -619,7 +619,6 @@ static void nfs_local_call_read(struct
> work_struct *work)
>  		container_of(work, struct nfs_local_kiocb, work);
>  	struct file *filp = iocb->kiocb.ki_filp;
>  	const struct cred *save_cred;
> -	bool force_done = false;
>  	ssize_t status;
>  	int n_iters;
>  
> @@ -639,13 +638,13 @@ static void nfs_local_call_read(struct
> work_struct *work)
>  		status = filp->f_op->read_iter(&iocb->kiocb, &iocb-
> >iters[i]);
>  		revert_creds(save_cred);
>  
> -		if (status != -EIOCBQUEUED) {
> -			if (unlikely(status >= 0 && status < iocb-
> >iters[i].count))
> -				force_done = true; /* Partial read
> */
> -			if (nfs_local_pgio_done(iocb, status,
> force_done)) {
> -				nfs_local_read_iocb_done(iocb);
> -				break;
> -			}
> +		if (status == -EIOCBQUEUED)
> +			continue;
> +		/* Break on completion, errors, or short reads */
> +		if (nfs_local_pgio_done(iocb, status, false) ||
> status < 0 ||
> +		    (size_t)status < iov_iter_count(&iocb-
> >iters[i])) {
> +			nfs_local_read_iocb_done(iocb);
> +			break;
>  		}
>  	}
>  }
> @@ -824,7 +823,6 @@ static void nfs_local_call_write(struct
> work_struct *work)
>  	struct file *filp = iocb->kiocb.ki_filp;
>  	unsigned long old_flags = current->flags;
>  	const struct cred *save_cred;
> -	bool force_done = false;
>  	ssize_t status;
>  	int n_iters;
>  
> @@ -847,13 +845,13 @@ static void nfs_local_call_write(struct
> work_struct *work)
>  		status = filp->f_op->write_iter(&iocb->kiocb, &iocb-
> >iters[i]);
>  		revert_creds(save_cred);
>  
> -		if (status != -EIOCBQUEUED) {
> -			if (unlikely(status >= 0 && status < iocb-
> >iters[i].count))
> -				force_done = true; /* Partial write
> */
> -			if (nfs_local_pgio_done(iocb, status,
> force_done)) {
> -				nfs_local_write_iocb_done(iocb);
> -				break;
> -			}
> +		if (status == -EIOCBQUEUED)
> +			continue;
> +		/* Break on completion, errors, or short writes */
> +		if (nfs_local_pgio_done(iocb, status, false) ||
> status < 0 ||
> +		    (size_t)status < iov_iter_count(&iocb-
> >iters[i])) {
> +			nfs_local_write_iocb_done(iocb);
> +			break;
>  		}
>  	}
>  	file_end_write(filp);

Yep. I rebased my testing branch yesterday, and caught that.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] NFS/localio: Deal with page bases that are > PAGE_SIZE
  2026-01-03 17:14 ` [PATCH 2/4] NFS/localio: Deal with page bases that are > PAGE_SIZE Trond Myklebust
@ 2026-01-05 17:40   ` Mike Snitzer
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Snitzer @ 2026-01-05 17:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Sat, Jan 03, 2026 at 12:14:58PM -0500, Trond Myklebust wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> When resending requests, etc, the page base can quickly grow larger than
> the page size.
> 
> Fixes: 091bdcfcece0 ("nfs/localio: refactor iocb and iov_iter_bvec initialization")
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

This was bound to happen given the "No functional change." in the
header for commit 091bdcfcece0 -- wasn't purely a refactor, ugh.

Should the implications of not having this fix be made clearer in this
patch header?

In any case, very nice catch:

Reviewed-by: Mike Snitzer <snitzer@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] NFS/localio: Handle short writes by retrying
  2026-01-03 17:14 ` [PATCH 3/4] NFS/localio: Handle short writes by retrying Trond Myklebust
@ 2026-01-05 18:04   ` Mike Snitzer
  2026-01-05 18:09     ` Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2026-01-05 18:04 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Sat, Jan 03, 2026 at 12:14:59PM -0500, Trond Myklebust wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> The current code for handling short writes in localio just truncates the
> I/O and then sets an error. While that is close to how the ordinary NFS
> code behaves, it does mean there is a chance the data that got written
> is lost because it isn't persisted.
> To fix this, change localio so that the upper layers can direct the
> behaviour to persist any unstable data by rewriting it, and then
> continuing writing until an ENOSPC is hit.
> 
> Fixes: 70ba381e1a43 ("nfs: add LOCALIO support")
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

This is a pretty subtle fix in that it depends on rpc_call_done
conditionally setting task->tk_action -- is it worth adding a relevant
code comment in nfs_local_pgio_release()?

Additional inline review comment below.

> ---
>  fs/nfs/localio.c | 64 +++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 47 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index c5f975bb5a64..87abebbedbab 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -58,6 +58,11 @@ struct nfs_local_fsync_ctx {
>  static bool localio_enabled __read_mostly = true;
>  module_param(localio_enabled, bool, 0644);
>  
> +static int nfs_local_do_read(struct nfs_local_kiocb *iocb,
> +			     const struct rpc_call_ops *call_ops);
> +static int nfs_local_do_write(struct nfs_local_kiocb *iocb,
> +			      const struct rpc_call_ops *call_ops);
> +
>  static inline bool nfs_client_is_local(const struct nfs_client *clp)
>  {
>  	return !!rcu_access_pointer(clp->cl_uuid.net);
> @@ -542,13 +547,50 @@ nfs_local_iocb_release(struct nfs_local_kiocb *iocb)
>  	nfs_local_iocb_free(iocb);
>  }
>  
> -static void
> -nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
> +static void nfs_local_pgio_restart(struct nfs_local_kiocb *iocb,
> +				   struct nfs_pgio_header *hdr)
> +{
> +	int status = 0;
> +
> +	iocb->kiocb.ki_pos = hdr->args.offset;
> +	iocb->kiocb.ki_flags &= ~(IOCB_DSYNC | IOCB_SYNC | IOCB_DIRECT);
> +	iocb->kiocb.ki_complete = NULL;
> +	iocb->aio_complete_work = NULL;
> +	iocb->end_iter_index = -1;
> +
> +	switch (hdr->rw_mode) {
> +	case FMODE_READ:
> +		nfs_local_iters_init(iocb, ITER_DEST);
> +		status = nfs_local_do_read(iocb, hdr->task.tk_ops);
> +		break;
> +	case FMODE_WRITE:
> +		nfs_local_iters_init(iocb, ITER_SOURCE);
> +		status = nfs_local_do_write(iocb, hdr->task.tk_ops);
> +		break;
> +	default:
> +		status = -EOPNOTSUPP;
> +	}

If this is a restart, then hdr->rw_mode will never not be FMODE_READ
or FMODE_WRITE.  As such, hdr->task.tk_ops will have been initialized
(as a side-effect of the original nfs_local_do_{read,write}) _and_
reinitialized by the above new calls to them.

My point: "default" case shouldn't ever be possible.  So should a
comment be added?  Switch to BUG_ON?  Do nothing about it?

Mike

> +
> +	if (status != 0) {
> +		nfs_local_iocb_release(iocb);
> +		hdr->task.tk_status = status;
> +		nfs_local_hdr_release(hdr, hdr->task.tk_ops);
> +	}
> +}
> +
> +static void nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
>  {
>  	struct nfs_pgio_header *hdr = iocb->hdr;
> +	struct rpc_task *task = &hdr->task;
> +
> +	task->tk_action = NULL;
> +	task->tk_ops->rpc_call_done(task, hdr);
>  
> -	nfs_local_iocb_release(iocb);
> -	nfs_local_hdr_release(hdr, hdr->task.tk_ops);
> +	if (task->tk_action == NULL) {
> +		nfs_local_iocb_release(iocb);
> +		task->tk_ops->rpc_release(hdr);
> +	} else
> +		nfs_local_pgio_restart(iocb, hdr);
>  }
>  
>  /*
> @@ -776,19 +818,7 @@ static void nfs_local_write_done(struct nfs_local_kiocb *iocb)
>  		pr_info_ratelimited("nfs: Unexpected direct I/O write alignment failure\n");
>  	}
>  
> -	/* Handle short writes as if they are ENOSPC */
> -	status = hdr->res.count;
> -	if (status > 0 && status < hdr->args.count) {
> -		hdr->mds_offset += status;
> -		hdr->args.offset += status;
> -		hdr->args.pgbase += status;
> -		hdr->args.count -= status;
> -		nfs_set_pgio_error(hdr, -ENOSPC, hdr->args.offset);
> -		status = -ENOSPC;
> -		/* record -ENOSPC in terms of nfs_local_pgio_done */
> -		(void) nfs_local_pgio_done(iocb, status, true);
> -	}
> -	if (hdr->task.tk_status < 0)
> +	if (status < 0)
>  		nfs_reset_boot_verifier(hdr->inode);
>  }
>  
> -- 
> 2.52.0
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] NFS/localio: Handle short writes by retrying
  2026-01-05 18:04   ` Mike Snitzer
@ 2026-01-05 18:09     ` Trond Myklebust
  2026-01-05 18:30       ` Mike Snitzer
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2026-01-05 18:09 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: linux-nfs

On Mon, 2026-01-05 at 13:04 -0500, Mike Snitzer wrote:
> On Sat, Jan 03, 2026 at 12:14:59PM -0500, Trond Myklebust wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > The current code for handling short writes in localio just
> > truncates the
> > I/O and then sets an error. While that is close to how the ordinary
> > NFS
> > code behaves, it does mean there is a chance the data that got
> > written
> > is lost because it isn't persisted.
> > To fix this, change localio so that the upper layers can direct the
> > behaviour to persist any unstable data by rewriting it, and then
> > continuing writing until an ENOSPC is hit.
> > 
> > Fixes: 70ba381e1a43 ("nfs: add LOCALIO support")
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> This is a pretty subtle fix in that it depends on rpc_call_done
> conditionally setting task->tk_action -- is it worth adding a
> relevant
> code comment in nfs_local_pgio_release()?
> 

That's how restarts work in the RPC code: after the rpc_call_done
callback is done, rpc_exit_task() will check for whether or not the
task->tk_action got reset, and if so, it will prepare a new RPC call.

> Additional inline review comment below.
> 
> > ---
> >  fs/nfs/localio.c | 64 +++++++++++++++++++++++++++++++++++---------
> > ----
> >  1 file changed, 47 insertions(+), 17 deletions(-)
> > 
> > diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> > index c5f975bb5a64..87abebbedbab 100644
> > --- a/fs/nfs/localio.c
> > +++ b/fs/nfs/localio.c
> > @@ -58,6 +58,11 @@ struct nfs_local_fsync_ctx {
> >  static bool localio_enabled __read_mostly = true;
> >  module_param(localio_enabled, bool, 0644);
> >  
> > +static int nfs_local_do_read(struct nfs_local_kiocb *iocb,
> > +			     const struct rpc_call_ops *call_ops);
> > +static int nfs_local_do_write(struct nfs_local_kiocb *iocb,
> > +			      const struct rpc_call_ops
> > *call_ops);
> > +
> >  static inline bool nfs_client_is_local(const struct nfs_client
> > *clp)
> >  {
> >  	return !!rcu_access_pointer(clp->cl_uuid.net);
> > @@ -542,13 +547,50 @@ nfs_local_iocb_release(struct nfs_local_kiocb
> > *iocb)
> >  	nfs_local_iocb_free(iocb);
> >  }
> >  
> > -static void
> > -nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
> > +static void nfs_local_pgio_restart(struct nfs_local_kiocb *iocb,
> > +				   struct nfs_pgio_header *hdr)
> > +{
> > +	int status = 0;
> > +
> > +	iocb->kiocb.ki_pos = hdr->args.offset;
> > +	iocb->kiocb.ki_flags &= ~(IOCB_DSYNC | IOCB_SYNC |
> > IOCB_DIRECT);
> > +	iocb->kiocb.ki_complete = NULL;
> > +	iocb->aio_complete_work = NULL;
> > +	iocb->end_iter_index = -1;
> > +
> > +	switch (hdr->rw_mode) {
> > +	case FMODE_READ:
> > +		nfs_local_iters_init(iocb, ITER_DEST);
> > +		status = nfs_local_do_read(iocb, hdr-
> > >task.tk_ops);
> > +		break;
> > +	case FMODE_WRITE:
> > +		nfs_local_iters_init(iocb, ITER_SOURCE);
> > +		status = nfs_local_do_write(iocb, hdr-
> > >task.tk_ops);
> > +		break;
> > +	default:
> > +		status = -EOPNOTSUPP;
> > +	}
> 
> If this is a restart, then hdr->rw_mode will never not be FMODE_READ
> or FMODE_WRITE.  As such, hdr->task.tk_ops will have been initialized
> (as a side-effect of the original nfs_local_do_{read,write}) _and_
> reinitialized by the above new calls to them.
> 
> My point: "default" case shouldn't ever be possible.  So should a
> comment be added?  Switch to BUG_ON?  Do nothing about it?
> 

I considered a BUG_ON(), but it shouldn't really matter. All this does
now is cancel the restart.

> Mike
> 
> > +
> > +	if (status != 0) {
> > +		nfs_local_iocb_release(iocb);
> > +		hdr->task.tk_status = status;
> > +		nfs_local_hdr_release(hdr, hdr->task.tk_ops);
> > +	}
> > +}
> > +
> > +static void nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
> >  {
> >  	struct nfs_pgio_header *hdr = iocb->hdr;
> > +	struct rpc_task *task = &hdr->task;
> > +
> > +	task->tk_action = NULL;
> > +	task->tk_ops->rpc_call_done(task, hdr);
> >  
> > -	nfs_local_iocb_release(iocb);
> > -	nfs_local_hdr_release(hdr, hdr->task.tk_ops);
> > +	if (task->tk_action == NULL) {
> > +		nfs_local_iocb_release(iocb);
> > +		task->tk_ops->rpc_release(hdr);
> > +	} else
> > +		nfs_local_pgio_restart(iocb, hdr);
> >  }
> >  
> >  /*
> > @@ -776,19 +818,7 @@ static void nfs_local_write_done(struct
> > nfs_local_kiocb *iocb)
> >  		pr_info_ratelimited("nfs: Unexpected direct I/O
> > write alignment failure\n");
> >  	}
> >  
> > -	/* Handle short writes as if they are ENOSPC */
> > -	status = hdr->res.count;
> > -	if (status > 0 && status < hdr->args.count) {
> > -		hdr->mds_offset += status;
> > -		hdr->args.offset += status;
> > -		hdr->args.pgbase += status;
> > -		hdr->args.count -= status;
> > -		nfs_set_pgio_error(hdr, -ENOSPC, hdr-
> > >args.offset);
> > -		status = -ENOSPC;
> > -		/* record -ENOSPC in terms of nfs_local_pgio_done
> > */
> > -		(void) nfs_local_pgio_done(iocb, status, true);
> > -	}
> > -	if (hdr->task.tk_status < 0)
> > +	if (status < 0)
> >  		nfs_reset_boot_verifier(hdr->inode);
> >  }
> >  
> > -- 
> > 2.52.0
> > 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/4] NFS/localio: Handle short writes by retrying
  2026-01-05 18:09     ` Trond Myklebust
@ 2026-01-05 18:30       ` Mike Snitzer
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Snitzer @ 2026-01-05 18:30 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Mon, Jan 05, 2026 at 01:09:50PM -0500, Trond Myklebust wrote:
> On Mon, 2026-01-05 at 13:04 -0500, Mike Snitzer wrote:
> > On Sat, Jan 03, 2026 at 12:14:59PM -0500, Trond Myklebust wrote:
> > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > 
> > > The current code for handling short writes in localio just
> > > truncates the
> > > I/O and then sets an error. While that is close to how the ordinary
> > > NFS
> > > code behaves, it does mean there is a chance the data that got
> > > written
> > > is lost because it isn't persisted.
> > > To fix this, change localio so that the upper layers can direct the
> > > behaviour to persist any unstable data by rewriting it, and then
> > > continuing writing until an ENOSPC is hit.
> > > 
> > > Fixes: 70ba381e1a43 ("nfs: add LOCALIO support")
> > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > This is a pretty subtle fix in that it depends on rpc_call_done
> > conditionally setting task->tk_action -- is it worth adding a
> > relevant
> > code comment in nfs_local_pgio_release()?
> > 
> 
> That's how restarts work in the RPC code: after the rpc_call_done
> callback is done, rpc_exit_task() will check for whether or not the
> task->tk_action got reset, and if so, it will prepare a new RPC call.
> 
> > Additional inline review comment below.
> > 
> > > ---
> > >  fs/nfs/localio.c | 64 +++++++++++++++++++++++++++++++++++---------
> > > ----
> > >  1 file changed, 47 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> > > index c5f975bb5a64..87abebbedbab 100644
> > > --- a/fs/nfs/localio.c
> > > +++ b/fs/nfs/localio.c
> > > @@ -58,6 +58,11 @@ struct nfs_local_fsync_ctx {
> > >  static bool localio_enabled __read_mostly = true;
> > >  module_param(localio_enabled, bool, 0644);
> > >  
> > > +static int nfs_local_do_read(struct nfs_local_kiocb *iocb,
> > > +			     const struct rpc_call_ops *call_ops);
> > > +static int nfs_local_do_write(struct nfs_local_kiocb *iocb,
> > > +			      const struct rpc_call_ops
> > > *call_ops);
> > > +
> > >  static inline bool nfs_client_is_local(const struct nfs_client
> > > *clp)
> > >  {
> > >  	return !!rcu_access_pointer(clp->cl_uuid.net);
> > > @@ -542,13 +547,50 @@ nfs_local_iocb_release(struct nfs_local_kiocb
> > > *iocb)
> > >  	nfs_local_iocb_free(iocb);
> > >  }
> > >  
> > > -static void
> > > -nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
> > > +static void nfs_local_pgio_restart(struct nfs_local_kiocb *iocb,
> > > +				   struct nfs_pgio_header *hdr)
> > > +{
> > > +	int status = 0;
> > > +
> > > +	iocb->kiocb.ki_pos = hdr->args.offset;
> > > +	iocb->kiocb.ki_flags &= ~(IOCB_DSYNC | IOCB_SYNC |
> > > IOCB_DIRECT);
> > > +	iocb->kiocb.ki_complete = NULL;
> > > +	iocb->aio_complete_work = NULL;
> > > +	iocb->end_iter_index = -1;
> > > +
> > > +	switch (hdr->rw_mode) {
> > > +	case FMODE_READ:
> > > +		nfs_local_iters_init(iocb, ITER_DEST);
> > > +		status = nfs_local_do_read(iocb, hdr-
> > > >task.tk_ops);
> > > +		break;
> > > +	case FMODE_WRITE:
> > > +		nfs_local_iters_init(iocb, ITER_SOURCE);
> > > +		status = nfs_local_do_write(iocb, hdr-
> > > >task.tk_ops);
> > > +		break;
> > > +	default:
> > > +		status = -EOPNOTSUPP;
> > > +	}
> > 
> > If this is a restart, then hdr->rw_mode will never not be FMODE_READ
> > or FMODE_WRITE.  As such, hdr->task.tk_ops will have been initialized
> > (as a side-effect of the original nfs_local_do_{read,write}) _and_
> > reinitialized by the above new calls to them.
> > 
> > My point: "default" case shouldn't ever be possible.  So should a
> > comment be added?  Switch to BUG_ON?  Do nothing about it?
> > 
> 
> I considered a BUG_ON(), but it shouldn't really matter. All this does
> now is cancel the restart.

OK, thanks.

Reviewed-by: Mike Snitzer <snitzer@kernel.org>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/4] NFS/localio: various improvements
  2026-01-03 17:14 [PATCH 0/4] Fix misc localio issues Trond Myklebust
                   ` (3 preceding siblings ...)
  2026-01-03 17:15 ` [PATCH 4/4] NFS/localio: Cleanup the nfs_local_pgio_done() parameters Trond Myklebust
@ 2026-01-07 16:08 ` Mike Snitzer
  2026-01-07 16:08   ` [PATCH 1/4] NFS/localio: prevent direct reclaim recursion into NFS via nfs_writepages Mike Snitzer
                     ` (3 more replies)
  4 siblings, 4 replies; 17+ messages in thread
From: Mike Snitzer @ 2026-01-07 16:08 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

Hi,

This series has been rebased on Trond's recent patchset:
https://lore.kernel.org/linux-nfs/cover.1767459435.git.trond.myklebust@hammerspace.com/

but the first 2 patches were developed while chasing a particularly
nasty deadlock that was reproducible when putting LOCALIO under heavy
load on systems with smaller amounts of memory (16G).  Ultimately that
deadlock happens to also be fixed by Trond's recent commit here:
https://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=cce0be6eb4971456b703aaeafd571650d314bcca

FYI, prior to that nfs_release_folio() deadlock fix, tasks would be
stuck like:

[92]            "khugepaged"
[<0>] wait_on_commit+0x71/0xb0 [nfs]
[<0>] __nfs_commit_inode+0x131/0x180 [nfs]
[<0>] nfs_wb_folio+0xa7/0x1b0 [nfs]
[<0>] nfs_release_folio+0x6b/0x120 [nfs]
[<0>] split_huge_page_to_list_to_order+0x367/0x840
[<0>] migrate_pages_batch+0x208/0x7b0
[<0>] migrate_pages+0x220/0x550
[<0>] compact_zone+0x300/0x6e0
[<0>] compact_zone_order+0xc7/0x110
[<0>] try_to_compact_pages+0xe6/0x2d0
[<0>] __alloc_pages_direct_compact+0x8a/0x210
[<0>] __alloc_pages_slowpath.constprop.0+0x218/0x9c0
[<0>] __alloc_pages_noprof+0x335/0x350
[<0>] __folio_alloc_noprof+0x14/0x30
[<0>] alloc_charge_folio+0xdd/0x1b0
[<0>] collapse_huge_page+0x63/0x700
[<0>] hpage_collapse_scan_pmd+0x627/0x750
[<0>] khugepaged_scan_mm_slot.constprop.0+0x3cf/0x5a0
[<0>] khugepaged+0x105/0x200
[<0>] kthread+0xef/0x230
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1a/0x30

and:

[12813]         "nfsd"
[<0>] wait_on_commit+0x71/0xb0 [nfs]
[<0>] __nfs_commit_inode+0x131/0x180 [nfs]
[<0>] nfs_wb_folio+0xa7/0x1b0 [nfs]
[<0>] nfs_release_folio+0x6b/0x120 [nfs]
[<0>] split_huge_page_to_list_to_order+0x367/0x840
[<0>] migrate_pages_batch+0x208/0x7b0
[<0>] migrate_pages+0x220/0x550
[<0>] compact_zone+0x300/0x6e0
[<0>] compact_zone_order+0xc7/0x110
[<0>] try_to_compact_pages+0xe6/0x2d0
[<0>] __alloc_pages_direct_compact+0x8a/0x210
[<0>] __alloc_pages_slowpath.constprop.0+0x218/0x9c0
[<0>] __alloc_pages_noprof+0x335/0x350
[<0>] alloc_pages_mpol_noprof+0x8f/0x1f0
[<0>] folio_alloc_noprof+0x5b/0xb0
[<0>] __filemap_get_folio+0x177/0x330
[<0>] nfs_write_begin+0x81/0x370 [nfs]
[<0>] generic_perform_write+0x91/0x290
[<0>] nfs_file_write+0x1ea/0x2f0 [nfs]
[<0>] vfs_iocb_iter_write+0xc0/0x210
[<0>] nfsd_vfs_write+0x26e/0x760 [nfsd]
[<0>] nfsd_write+0x17c/0x1c0 [nfsd]
[<0>] nfsd3_proc_write+0x10b/0x1a0 [nfsd]
[<0>] nfsd_dispatch+0xea/0x220 [nfsd]
[<0>] svc_process_common+0x4cd/0x6c0 [sunrpc]
[<0>] svc_process+0x145/0x210 [sunrpc]
[<0>] svc_handle_xprt+0x481/0x550 [sunrpc]
[<0>] svc_recv+0x170/0x2c0 [sunrpc]
[<0>] nfsd+0x8f/0xf0 [nfsd]
[<0>] kthread+0xef/0x230
[<0>] ret_from_fork+0x31/0x50
[<0>] ret_from_fork_asm+0x1a/0x30

Anyway, I think this LOCALIO series is worthwhile on its own and ready
for wider review and hopeful inclusion.

Thanks,
Mike

Mike Snitzer (4):
  NFS/localio: prevent direct reclaim recursion into NFS via nfs_writepages
  NFS/localio: use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit
  NFS/localio: remove -EAGAIN handling in nfs_local_doio()
  NFS/localio: switch nfs_local_do_read and nfs_local_do_write to return void

 fs/nfs/localio.c | 60 +++++++++++++++++++++++++++++-------------------
 1 file changed, 36 insertions(+), 24 deletions(-)

-- 
2.44.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] NFS/localio: prevent direct reclaim recursion into NFS via nfs_writepages
  2026-01-07 16:08 ` [PATCH 0/4] NFS/localio: various improvements Mike Snitzer
@ 2026-01-07 16:08   ` Mike Snitzer
  2026-01-07 16:08   ` [PATCH 2/4] NFS/localio: use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit Mike Snitzer
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Mike Snitzer @ 2026-01-07 16:08 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

From: Mike Snitzer <snitzer@hammerspace.com>

LOCALIO is an NFS loopback mount optimization that avoids using the
network for READ, WRITE and COMMIT if the NFS client and server are
determined to be on the same system. But because LOCALIO is still
fundamentally "just NFS loopback mount" it is susceptible to recursion
deadlock via direct reclaim, e.g.: NFS LOCALIO down to XFS and then
back into NFS via nfs_writepages.

Fix LOCALIO's potential for direct reclaim deadlock by ensuring that
all its page cache allocations are done from GFP_NOFS context.

Thanks to Ben Coddington for pointing out commit ad22c7a043c2 ("xfs:
prevent stack overflows from page cache allocation").

Reported-by: John Cagle <john.cagle@hammerspace.com>
Tested-by: Allen Lu <allen.lu@hammerspace.com>
Suggested-by: Benjamin Coddington <bcodding@hammerspace.com>
Fixes: 70ba381e1a43 ("nfs: add LOCALIO support")
Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
 fs/nfs/localio.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index d21047f7e4528..c38e7d4685e2f 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -291,6 +291,18 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
 }
 EXPORT_SYMBOL_GPL(nfs_local_open_fh);
 
+/*
+ * Ensure all page cache allocations are done from GFP_NOFS context to
+ * prevent direct reclaim recursion back into NFS via nfs_writepages.
+ */
+static void
+nfs_local_mapping_set_gfp_nofs_context(struct address_space *m)
+{
+	gfp_t gfp_mask = mapping_gfp_mask(m);
+
+	mapping_set_gfp_mask(m, (gfp_mask & ~(__GFP_FS)));
+}
+
 static void
 nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
 {
@@ -315,6 +327,7 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
 		return NULL;
 	}
 
+	nfs_local_mapping_set_gfp_nofs_context(file->f_mapping);
 	init_sync_kiocb(&iocb->kiocb, file);
 
 	iocb->hdr = hdr;
@@ -1004,6 +1017,8 @@ nfs_local_run_commit(struct file *filp, struct nfs_commit_data *data)
 			end = LLONG_MAX;
 	}
 
+	nfs_local_mapping_set_gfp_nofs_context(filp->f_mapping);
+
 	dprintk("%s: commit %llu - %llu\n", __func__, start, end);
 	return vfs_fsync_range(filp, start, end, 0);
 }
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/4] NFS/localio: use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit
  2026-01-07 16:08 ` [PATCH 0/4] NFS/localio: various improvements Mike Snitzer
  2026-01-07 16:08   ` [PATCH 1/4] NFS/localio: prevent direct reclaim recursion into NFS via nfs_writepages Mike Snitzer
@ 2026-01-07 16:08   ` Mike Snitzer
  2026-01-07 16:08   ` [PATCH 3/4] NFS/localio: remove -EAGAIN handling in nfs_local_doio() Mike Snitzer
  2026-01-07 16:08   ` [PATCH 4/4] NFS/localio: switch nfs_local_do_read and nfs_local_do_write to return void Mike Snitzer
  3 siblings, 0 replies; 17+ messages in thread
From: Mike Snitzer @ 2026-01-07 16:08 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

From: Mike Snitzer <snitzer@hammerspace.com>

nfslocaliod_workqueue is a non-memreclaim workqueue (it isn't
initialized with WQ_MEM_RECLAIM), see commit b9f5dd57f4a5
("nfs/localio: use dedicated workqueues for filesystem read and
write").

Use nfslocaliod_workqueue for LOCALIO's SYNC work.

Also, set PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO in
nfs_local_fsync_work.

Fixes: b9f5dd57f4a5 ("nfs/localio: use dedicated workqueues for filesystem read and write")
Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
---
 fs/nfs/localio.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index c38e7d4685e2f..9ec1652cc5369 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -1060,17 +1060,22 @@ nfs_local_fsync_ctx_free(struct nfs_local_fsync_ctx *ctx)
 static void
 nfs_local_fsync_work(struct work_struct *work)
 {
+	unsigned long old_flags = current->flags;
 	struct nfs_local_fsync_ctx *ctx;
 	int status;
 
 	ctx = container_of(work, struct nfs_local_fsync_ctx, work);
 
+	current->flags |= PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO;
+
 	status = nfs_local_run_commit(nfs_to->nfsd_file_file(ctx->localio),
 				      ctx->data);
 	nfs_local_commit_done(ctx->data, status);
 	if (ctx->done != NULL)
 		complete(ctx->done);
 	nfs_local_fsync_ctx_free(ctx);
+
+	current->flags = old_flags;
 }
 
 static struct nfs_local_fsync_ctx *
@@ -1094,7 +1099,7 @@ int nfs_local_commit(struct nfsd_file *localio,
 {
 	struct nfs_local_fsync_ctx *ctx;
 
-	ctx = nfs_local_fsync_ctx_alloc(data, localio, GFP_KERNEL);
+	ctx = nfs_local_fsync_ctx_alloc(data, localio, GFP_NOIO);
 	if (!ctx) {
 		nfs_local_commit_done(data, -ENOMEM);
 		nfs_local_release_commit_data(localio, data, call_ops);
@@ -1106,10 +1111,10 @@ int nfs_local_commit(struct nfsd_file *localio,
 	if (how & FLUSH_SYNC) {
 		DECLARE_COMPLETION_ONSTACK(done);
 		ctx->done = &done;
-		queue_work(nfsiod_workqueue, &ctx->work);
+		queue_work(nfslocaliod_workqueue, &ctx->work);
 		wait_for_completion(&done);
 	} else
-		queue_work(nfsiod_workqueue, &ctx->work);
+		queue_work(nfslocaliod_workqueue, &ctx->work);
 
 	return 0;
 }
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/4] NFS/localio: remove -EAGAIN handling in nfs_local_doio()
  2026-01-07 16:08 ` [PATCH 0/4] NFS/localio: various improvements Mike Snitzer
  2026-01-07 16:08   ` [PATCH 1/4] NFS/localio: prevent direct reclaim recursion into NFS via nfs_writepages Mike Snitzer
  2026-01-07 16:08   ` [PATCH 2/4] NFS/localio: use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit Mike Snitzer
@ 2026-01-07 16:08   ` Mike Snitzer
  2026-01-07 16:08   ` [PATCH 4/4] NFS/localio: switch nfs_local_do_read and nfs_local_do_write to return void Mike Snitzer
  3 siblings, 0 replies; 17+ messages in thread
From: Mike Snitzer @ 2026-01-07 16:08 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

Handling -EAGAIN in nfs_local_doio() was introduced with commit
0978e5b85fc08 (nfs_do_local_{read,write} were made to have negative
checks for correspoding iter method) but commit e43e9a3a3d66
since eliminated the possibility for this -EAGAIN early return.

So remove nfs_local_doio()'s -EAGAIN handling that calls
nfs_localio_disable_client() -- while it should never happen from
nfs_do_local_{read,write} this particular -EAGAIN handling is now
"dead" and so it has become a liability.

Fixes: e43e9a3a3d66 ("nfs/localio: refactor iocb initialization")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs/localio.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 9ec1652cc5369..fa37425c5784b 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -989,8 +989,6 @@ int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio,
 	}
 
 	if (status != 0) {
-		if (status == -EAGAIN)
-			nfs_localio_disable_client(clp);
 		nfs_local_iocb_release(iocb);
 		hdr->task.tk_status = status;
 		nfs_local_hdr_release(hdr, call_ops);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/4] NFS/localio: switch nfs_local_do_read and nfs_local_do_write to return void
  2026-01-07 16:08 ` [PATCH 0/4] NFS/localio: various improvements Mike Snitzer
                     ` (2 preceding siblings ...)
  2026-01-07 16:08   ` [PATCH 3/4] NFS/localio: remove -EAGAIN handling in nfs_local_doio() Mike Snitzer
@ 2026-01-07 16:08   ` Mike Snitzer
  3 siblings, 0 replies; 17+ messages in thread
From: Mike Snitzer @ 2026-01-07 16:08 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

Both nfs_local_do_read and nfs_local_do_write only return 0 at the
end, so switch them to returning void.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs/localio.c | 32 +++++++++++++-------------------
 1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index fa37425c5784b..fa9c0bc0be9d0 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -58,10 +58,10 @@ struct nfs_local_fsync_ctx {
 static bool localio_enabled __read_mostly = true;
 module_param(localio_enabled, bool, 0644);
 
-static int nfs_local_do_read(struct nfs_local_kiocb *iocb,
-			     const struct rpc_call_ops *call_ops);
-static int nfs_local_do_write(struct nfs_local_kiocb *iocb,
+static void nfs_local_do_read(struct nfs_local_kiocb *iocb,
 			      const struct rpc_call_ops *call_ops);
+static void nfs_local_do_write(struct nfs_local_kiocb *iocb,
+			       const struct rpc_call_ops *call_ops);
 
 static inline bool nfs_client_is_local(const struct nfs_client *clp)
 {
@@ -570,17 +570,17 @@ static void nfs_local_pgio_restart(struct nfs_local_kiocb *iocb,
 	switch (hdr->rw_mode) {
 	case FMODE_READ:
 		nfs_local_iters_init(iocb, ITER_DEST);
-		status = nfs_local_do_read(iocb, hdr->task.tk_ops);
+		nfs_local_do_read(iocb, hdr->task.tk_ops);
 		break;
 	case FMODE_WRITE:
 		nfs_local_iters_init(iocb, ITER_SOURCE);
-		status = nfs_local_do_write(iocb, hdr->task.tk_ops);
+		nfs_local_do_write(iocb, hdr->task.tk_ops);
 		break;
 	default:
 		status = -EOPNOTSUPP;
 	}
 
-	if (status != 0) {
+	if (unlikely(status != 0)) {
 		nfs_local_iocb_release(iocb);
 		hdr->task.tk_status = status;
 		nfs_local_hdr_release(hdr, hdr->task.tk_ops);
@@ -702,9 +702,8 @@ static void nfs_local_call_read(struct work_struct *work)
 	}
 }
 
-static int
-nfs_local_do_read(struct nfs_local_kiocb *iocb,
-		  const struct rpc_call_ops *call_ops)
+static void nfs_local_do_read(struct nfs_local_kiocb *iocb,
+			      const struct rpc_call_ops *call_ops)
 {
 	struct nfs_pgio_header *hdr = iocb->hdr;
 
@@ -716,8 +715,6 @@ nfs_local_do_read(struct nfs_local_kiocb *iocb,
 
 	INIT_WORK(&iocb->work, nfs_local_call_read);
 	queue_work(nfslocaliod_workqueue, &iocb->work);
-
-	return 0;
 }
 
 static void
@@ -900,9 +897,8 @@ static void nfs_local_call_write(struct work_struct *work)
 	current->flags = old_flags;
 }
 
-static int
-nfs_local_do_write(struct nfs_local_kiocb *iocb,
-		   const struct rpc_call_ops *call_ops)
+static void nfs_local_do_write(struct nfs_local_kiocb *iocb,
+			       const struct rpc_call_ops *call_ops)
 {
 	struct nfs_pgio_header *hdr = iocb->hdr;
 
@@ -926,8 +922,6 @@ nfs_local_do_write(struct nfs_local_kiocb *iocb,
 
 	INIT_WORK(&iocb->work, nfs_local_call_write);
 	queue_work(nfslocaliod_workqueue, &iocb->work);
-
-	return 0;
 }
 
 static struct nfs_local_kiocb *
@@ -977,10 +971,10 @@ int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio,
 
 	switch (hdr->rw_mode) {
 	case FMODE_READ:
-		status = nfs_local_do_read(iocb, call_ops);
+		nfs_local_do_read(iocb, call_ops);
 		break;
 	case FMODE_WRITE:
-		status = nfs_local_do_write(iocb, call_ops);
+		nfs_local_do_write(iocb, call_ops);
 		break;
 	default:
 		dprintk("%s: invalid mode: %d\n", __func__,
@@ -988,7 +982,7 @@ int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio,
 		status = -EOPNOTSUPP;
 	}
 
-	if (status != 0) {
+	if (unlikely(status != 0)) {
 		nfs_local_iocb_release(iocb);
 		hdr->task.tk_status = status;
 		nfs_local_hdr_release(hdr, call_ops);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2026-01-07 16:09 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-03 17:14 [PATCH 0/4] Fix misc localio issues Trond Myklebust
2026-01-03 17:14 ` [PATCH 1/4] NFS/localio: Stop further I/O upon hitting an error Trond Myklebust
2026-01-05 17:19   ` Mike Snitzer
2026-01-05 17:35     ` Trond Myklebust
2026-01-03 17:14 ` [PATCH 2/4] NFS/localio: Deal with page bases that are > PAGE_SIZE Trond Myklebust
2026-01-05 17:40   ` Mike Snitzer
2026-01-03 17:14 ` [PATCH 3/4] NFS/localio: Handle short writes by retrying Trond Myklebust
2026-01-05 18:04   ` Mike Snitzer
2026-01-05 18:09     ` Trond Myklebust
2026-01-05 18:30       ` Mike Snitzer
2026-01-03 17:15 ` [PATCH 4/4] NFS/localio: Cleanup the nfs_local_pgio_done() parameters Trond Myklebust
2026-01-05 17:24   ` Mike Snitzer
2026-01-07 16:08 ` [PATCH 0/4] NFS/localio: various improvements Mike Snitzer
2026-01-07 16:08   ` [PATCH 1/4] NFS/localio: prevent direct reclaim recursion into NFS via nfs_writepages Mike Snitzer
2026-01-07 16:08   ` [PATCH 2/4] NFS/localio: use GFP_NOIO and non-memreclaim workqueue in nfs_local_commit Mike Snitzer
2026-01-07 16:08   ` [PATCH 3/4] NFS/localio: remove -EAGAIN handling in nfs_local_doio() Mike Snitzer
2026-01-07 16:08   ` [PATCH 4/4] NFS/localio: switch nfs_local_do_read and nfs_local_do_write to return void Mike Snitzer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox