[PATCH AUTOSEL 6.19-6.18] netfs: when subreq is marked for retry, do not check if it faced an error

public inbox for linux-fsdevel@vger.kernel.org

From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Shyam Prasad N <sprasad@microsoft.com>,
	David Howells <dhowells@redhat.com>,
	Steve French <stfrench@microsoft.com>,
	Sasha Levin <sashal@kernel.org>,
	pc@manguebit.org, netfs@lists.linux.dev,
	linux-fsdevel@vger.kernel.org
Subject: [PATCH AUTOSEL 6.19-6.18] netfs: when subreq is marked for retry, do not check if it faced an error
Date: Tue, 10 Feb 2026 18:31:05 -0500
Message-ID: <20260210233123.2905307-20-sashal@kernel.org>
In-Reply-To: <20260210233123.2905307-1-sashal@kernel.org>

From: Shyam Prasad N <sprasad@microsoft.com>

[ Upstream commit 82e8885bd7633a36ee9050e6d7f348a4155eed5f ]

The *_subreq_terminated functions today only process the NEED_RETRY
flag when the subreq was successful or failed with EAGAIN error.
However, there could be other retriable errors for network filesystems.

Avoid this by processing the NEED_RETRY irrespective of the error
code faced by the subreq. If it was specifically marked for retry,
the error code must not matter.

Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

## Comprehensive Analysis

### 1. Commit Message Analysis

The commit title "netfs: when subreq is marked for retry, do not check
if it faced an error" clearly describes a bug fix. The commit explains
that `*_subreq_terminated` functions only process the
`NETFS_SREQ_NEED_RETRY` flag when the subrequest was successful or had
an `-EAGAIN` error, but network filesystems may set `NEED_RETRY` for
other error codes.

The commit has:
- **Acked-by: David Howells** — the creator and maintainer of the netfs
  subsystem
- **Signed-off-by: Shyam Prasad N** — a CIFS/SMB kernel developer at
  Microsoft
- **Signed-off-by: Steve French** — the CIFS/SMB subsystem maintainer

### 2. Code Change Analysis — The Bug

**Read Path Bug** in `netfs_read_subreq_terminated()`
(`read_collect.c`):

When CIFS handles a `MID_REQUEST_SUBMITTED` or `MID_RETRY_NEEDED`
response, it marks the subrequest for retry and records `-EAGAIN`
(`smb2pdu.c` lines 4633-4640):

```4639:4640:fs/smb/client/smb2pdu.c
                __set_bit(NETFS_SREQ_NEED_RETRY, &rdata->subreq.flags);
                rdata->result = -EAGAIN;
```

Line 4707 then copies that result into the subrequest
(`rdata->subreq.error = rdata->result;`, so `-EAGAIN`), and line 4710
calls `netfs_read_subreq_terminated()`.

In the pre-patch code:

```532:547:fs/netfs/read_collect.c
        if (!subreq->error && subreq->transferred < subreq->len) {
                // ... this block is SKIPPED because error is -EAGAIN (non-zero)
        }
```

Then:

```549:560:fs/netfs/read_collect.c
        if (unlikely(subreq->error < 0)) {
                trace_netfs_failure(rreq, subreq, subreq->error, netfs_fail_read);
                if (subreq->source == NETFS_READ_FROM_CACHE) {
                        netfs_stat(&netfs_n_rh_read_failed);
                        __set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
                } else {
                        netfs_stat(&netfs_n_rh_download_failed);
                        __set_bit(NETFS_SREQ_FAILED, &subreq->flags); // BUG: overrides NEED_RETRY!
                }
```

For server downloads, `NETFS_SREQ_FAILED` is set **despite**
`NETFS_SREQ_NEED_RETRY` already being set by CIFS. Since the collection
code at line 269 checks `FAILED` **before** `NEED_RETRY` at line 277,
and the retry code at line 94 abandons if `FAILED` is set, the
subrequest is **permanently failed** instead of being retried.
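
The interaction described above can be sketched as a small userspace model.
This is only an illustration under stated assumptions: the `rd_*` names, the
flag bit values, and the three-way outcome enum are hypothetical stand-ins
invented for this sketch, not kernel definitions, and the real collector does
considerably more than this.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bits for this model; not the kernel's values. */
enum {
	RD_NEED_RETRY = 1u << 0,
	RD_FAILED     = 1u << 1,
};

enum rd_source { RD_FROM_CACHE, RD_FROM_SERVER };
enum rd_outcome { RD_OK, RD_RETRY, RD_ABANDON };

struct rd_subreq {
	unsigned int flags;
	int error;
	enum rd_source source;
};

/* Pre-patch termination: any negative error on a server read sets FAILED,
 * regardless of whether the filesystem already asked for a retry.
 */
void rd_terminated_prepatch(struct rd_subreq *s)
{
	assert(s != NULL);
	if (s->error < 0) {
		if (s->source == RD_FROM_CACHE)
			s->flags |= RD_NEED_RETRY;
		else
			s->flags |= RD_FAILED;
	}
}

/* Collection order as described in the analysis: FAILED is tested first,
 * so it wins whenever both flags are set.
 */
enum rd_outcome rd_collect(const struct rd_subreq *s)
{
	if (s->flags & RD_FAILED)
		return RD_ABANDON;
	if (s->flags & RD_NEED_RETRY)
		return RD_RETRY;
	return RD_OK;
}
```

In this model, a server read that arrives with `RD_NEED_RETRY` already set and
`error = -EAGAIN` leaves `rd_terminated_prepatch()` with both flags set, and
`rd_collect()` then returns `RD_ABANDON` rather than `RD_RETRY`.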

**Write Path Bug** in `netfs_write_subrequest_terminated()`
(`write_collect.c`):

```493:498:fs/netfs/write_collect.c
        if (IS_ERR_VALUE(transferred_or_error)) {
                subreq->error = transferred_or_error;
                if (subreq->error == -EAGAIN)
                        set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
                else
                        set_bit(NETFS_SREQ_FAILED, &subreq->flags);
```

AFS sets `NETFS_SREQ_NEED_RETRY` for auth errors (`-EACCES`, `-EPERM`,
`-ENOKEY`, etc.) at `fs/afs/write.c:165` before calling
`netfs_write_subrequest_terminated()` with those error codes. Since the
error is not `-EAGAIN`, the netfs layer overrides the filesystem's retry
request with `FAILED`.
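
A minimal userspace sketch of the write-path flag decision, before and after
the patch, may make the difference easier to see. The `wr_*` names and flag
bit values are hypothetical and exist only for this illustration; tracing,
statistics, and the per-source handling that follows in the real function are
left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bits for this model; not the kernel's values. */
enum {
	WR_NEED_RETRY = 1u << 0,
	WR_FAILED     = 1u << 1,
};

struct wr_subreq {
	unsigned int flags;
	long error;
};

/* Pre-patch: the generic layer decides, and only -EAGAIN means retry. */
void wr_terminated_prepatch(struct wr_subreq *s, long err)
{
	assert(s != NULL);
	s->error = err;
	if (err == -EAGAIN)
		s->flags |= WR_NEED_RETRY;
	else
		s->flags |= WR_FAILED;
}

/* Post-patch: FAILED is applied only if the filesystem did not already
 * mark the subrequest for retry; the error code itself is not inspected.
 */
void wr_terminated_postpatch(struct wr_subreq *s, long err)
{
	assert(s != NULL);
	s->error = err;
	if (!(s->flags & WR_NEED_RETRY))
		s->flags |= WR_FAILED;
}
```

With `WR_NEED_RETRY` pre-set and `-EACCES` passed in, the pre-patch model ends
with both flags set, while the post-patch model leaves only `WR_NEED_RETRY`.
Conversely, a bare `-EAGAIN` with no flag pre-set yields `WR_NEED_RETRY`
before the patch and `WR_FAILED` after it.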

### 3. The Fix

1. **`read_collect.c`**: Adds a check for `NETFS_SREQ_NEED_RETRY` before
   the error handling block. If NEED_RETRY is set, it pauses the request
   and jumps past error checks via `goto skip_error_checks`, preventing
   FAILED from being set.

2. **`write_collect.c`**: Replaces the EAGAIN-only check with a check
   for whether NEED_RETRY is already set. If set, FAILED is not applied.
   This delegates retry decisions to the filesystem.

3. **`read_retry.c` / `write_issue.c`**: Adds `subreq->error = 0` in
   `netfs_reissue_read()` and `netfs_reissue_write()` to clear stale
   error codes before reissuing.

4. **`read_retry.c` abandon path**: Removes the `!subreq->error` check
   since subrequests with `NEED_RETRY` can now legitimately carry error
   codes.
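
The read-side changes in items 1 and 3 can be modelled roughly as follows.
This is a simplified userspace sketch, not the kernel code: the `fx_*` names,
the flag bit values, and the `paused` boolean (standing in for
`NETFS_RREQ_PAUSE`) are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model; not the kernel's values. */
enum {
	FX_NEED_RETRY = 1u << 0,
	FX_FAILED     = 1u << 1,
};

struct fx_request {
	bool paused;		/* models NETFS_RREQ_PAUSE */
};

struct fx_subreq {
	unsigned int flags;
	int error;
	bool from_cache;
};

/* Post-patch termination: a pre-set NEED_RETRY pauses the request and
 * skips the error checks, so FAILED is never added on top of it.
 */
void fx_read_terminated(struct fx_request *r, struct fx_subreq *s)
{
	assert(r != NULL && s != NULL);
	if (s->flags & FX_NEED_RETRY) {
		r->paused = true;
		return;		/* models "goto skip_error_checks" */
	}
	if (s->error < 0) {
		if (s->from_cache)
			s->flags |= FX_NEED_RETRY;
		else
			s->flags |= FX_FAILED;
		r->paused = true;
	}
}

/* Reissue: the stale error code is cleared before resubmission. */
void fx_reissue(struct fx_subreq *s)
{
	assert(s != NULL);
	s->error = 0;
}
```

In this model, a server read terminated with `FX_NEED_RETRY` and `-EAGAIN`
keeps only `FX_NEED_RETRY`, pauses the request, and carries no error after
`fx_reissue()`. A server read that fails without the retry flag is still
marked `FX_FAILED`.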

### 4. User Impact

This bug directly affects:
- **CIFS/SMB users** experiencing network disruptions (laptop lid
  close/open, WiFi drops, VPN changes): reads fail with errors instead
  of transparently retrying
- **AFS users** needing key rotation during writes: writes fail instead
  of retrying with a different key

These are common real-world scenarios for enterprise and laptop users.

### 5. Risk Assessment

**Positive:**
- Small patch (~30 lines across 4 files)
- ACKed by the netfs subsystem maintainer
- Logically sound: filesystems know better than the generic layer when
  to retry
- Addresses concrete, user-facing failures

**Concerns:**
- The write path removes the EAGAIN→NEED_RETRY fallback. When
  `afs_alloc_operation()` fails in AFS (line 133), the code passes
  -EAGAIN without pre-setting NEED_RETRY, so that case would now fail
  instead of being retried. However, David Howells (AFS author) ACKed
  this change, accepting the behavioral shift.
- 9p similarly doesn't pre-set NEED_RETRY — -EAGAIN from 9p would no
  longer trigger retry. This is likely intentional: the filesystem
  should decide, not the generic layer.
- The affected code (with the new collector architecture) exists from
  v6.14 onward. v6.14 has a slightly different function signature for
  `netfs_write_subrequest_terminated` (with `was_async` parameter),
  requiring minor backport adjustment.

### 6. Dependency Check

- No missing dependencies: `NETFS_RREQ_PAUSE`,
  `netfs_rreq_trace_set_pause`, and all other referenced symbols exist
  in the current stable trees.
- The patch is self-contained and doesn't appear to be part of a series.
- For v6.18 stable, the patch should apply cleanly as the code matches
  exactly.

### 7. Conclusion

This commit fixes a real, user-visible bug where network filesystem I/O
operations marked for retry by CIFS and AFS are incorrectly treated as
permanent failures by the netfs generic layer. The bug causes spurious
I/O errors during common scenarios like network reconnections and
authentication retries. The fix is small, reviewed by the subsystem
maintainer, and addresses concrete failure paths. While it includes a
deliberate behavioral shift in the write path (removing EAGAIN
special-casing), this was reviewed and ACKed by the relevant maintainers. The
risk is low relative to the benefit of fixing CIFS reconnection failures
for a large user base.

**YES**

 fs/netfs/read_collect.c  | 10 ++++++++++
 fs/netfs/read_retry.c    |  4 ++--
 fs/netfs/write_collect.c |  8 ++++----
 fs/netfs/write_issue.c   |  1 +
 4 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index 7a0ffa675fb17..137f0e28a44c5 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -546,6 +546,15 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq)
 		}
 	}
 
+	/* If need retry is set, error should not matter unless we hit too many
+	 * retries. Pause the generation of new subreqs
+	 */
+	if (test_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) {
+		trace_netfs_rreq(rreq, netfs_rreq_trace_set_pause);
+		set_bit(NETFS_RREQ_PAUSE, &rreq->flags);
+		goto skip_error_checks;
+	}
+
 	if (unlikely(subreq->error < 0)) {
 		trace_netfs_failure(rreq, subreq, subreq->error, netfs_fail_read);
 		if (subreq->source == NETFS_READ_FROM_CACHE) {
@@ -559,6 +568,7 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq)
 		set_bit(NETFS_RREQ_PAUSE, &rreq->flags);
 	}
 
+skip_error_checks:
 	trace_netfs_sreq(subreq, netfs_sreq_trace_terminated);
 	netfs_subreq_clear_in_progress(subreq);
 	netfs_put_subrequest(subreq, netfs_sreq_trace_put_terminated);
diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c
index b99e84a8170af..7793ba5e3e8fc 100644
--- a/fs/netfs/read_retry.c
+++ b/fs/netfs/read_retry.c
@@ -12,6 +12,7 @@
 static void netfs_reissue_read(struct netfs_io_request *rreq,
 			       struct netfs_io_subrequest *subreq)
 {
+	subreq->error = 0;
 	__clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
 	__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
 	netfs_stat(&netfs_n_rh_retry_read_subreq);
@@ -242,8 +243,7 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq)
 	subreq = list_next_entry(subreq, rreq_link);
 abandon:
 	list_for_each_entry_from(subreq, &stream->subrequests, rreq_link) {
-		if (!subreq->error &&
-		    !test_bit(NETFS_SREQ_FAILED, &subreq->flags) &&
+		if (!test_bit(NETFS_SREQ_FAILED, &subreq->flags) &&
 		    !test_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags))
 			continue;
 		subreq->error = -ENOMEM;
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index cbf3d9194c7bf..61eab34ea67ef 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -492,11 +492,11 @@ void netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error)
 
 	if (IS_ERR_VALUE(transferred_or_error)) {
 		subreq->error = transferred_or_error;
-		if (subreq->error == -EAGAIN)
-			set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
-		else
+		/* if need retry is set, error should not matter */
+		if (!test_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags)) {
 			set_bit(NETFS_SREQ_FAILED, &subreq->flags);
-		trace_netfs_failure(wreq, subreq, transferred_or_error, netfs_fail_write);
+			trace_netfs_failure(wreq, subreq, transferred_or_error, netfs_fail_write);
+		}
 
 		switch (subreq->source) {
 		case NETFS_WRITE_TO_CACHE:
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index dd8743bc8d7fe..34894da5a23ec 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -250,6 +250,7 @@ void netfs_reissue_write(struct netfs_io_stream *stream,
 	iov_iter_truncate(&subreq->io_iter, size);
 
 	subreq->retry_count++;
+	subreq->error = 0;
 	__clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
 	__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
 	netfs_stat(&netfs_n_wh_retry_write_subreq);
-- 
2.51.0
