public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Nikhil Jha <njha@janestreet.com>,
	Chuck Lever <chuck.lever@oracle.com>,
	Anna Schumaker <anna.schumaker@oracle.com>,
	Sasha Levin <sashal@kernel.org>,
	trondmy@kernel.org, anna@kernel.org, jlayton@kernel.org,
	linux-nfs@vger.kernel.org
Subject: [PATCH AUTOSEL 6.15 13/21] sunrpc: don't immediately retransmit on seqno miss
Date: Fri,  6 Jun 2025 11:41:38 -0400	[thread overview]
Message-ID: <20250606154147.546388-13-sashal@kernel.org> (raw)
In-Reply-To: <20250606154147.546388-1-sashal@kernel.org>

From: Nikhil Jha <njha@janestreet.com>

[ Upstream commit fadc0f3bb2de8c570ced6d9c1f97222213d93140 ]

RFC2203 requires that retransmitted messages use a new gss sequence
number, but the same XID. This means that if the server is just slow
(e.x. overloaded), the client might receive a response using an older
seqno than the one it has recorded.

Currently, Linux's client immediately retransmits in this case. However,
this leads to a lot of wasted retransmits until the server eventually
responds faster than the client can resend.

Client -> SEQ 1 -> Server
Client -> SEQ 2 -> Server
Client <- SEQ 1 <- Server (misses, expecting seqno = 2)
Client -> SEQ 3 -> Server (immediate retransmission on miss)
Client <- SEQ 2 <- Server (misses, expecting seqno = 3)
Client -> SEQ 4 -> Server (immediate retransmission on miss)
... and so on ...

This commit makes it so that we ignore messages with bad checksums
due to seqnum mismatch, and rely on the usual timeout behavior for
retransmission instead of doing so immediately.

Signed-off-by: Nikhil Jha <njha@janestreet.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

Based on my analysis of the commit and the similar backported commits as
reference, here is my assessment:

**YES**

This commit should be backported to stable kernel trees. Here's my
detailed analysis:

## Key Analysis Points:

### 1. **Bug Fix Nature**:
This commit addresses a clear performance bug in SUNRPC's RPCSEC_GSS
sequence number handling. The current code immediately retransmits when
it receives a response with an older sequence number than expected,
leading to excessive and wasteful retransmissions when the server is
simply slow.

### 2. **Alignment with RFC2203**:
The commit correctly implements RFC2203 requirements, which state that
retransmitted messages should use new GSS sequence numbers but keep the
same XID. The current behavior creates a retransmission storm that
violates the spirit of this requirement.

### 3. **Code Changes Analysis**:
The modification in `rpc_decode_header()` at the `out_verifier` label is
minimal and targeted:

**Before** (lines 2773-2776):
```c
case -EACCES:
    /* Re-encode with a fresh cred */
    fallthrough;
default:
    goto out_garbage;
```

**After**:
```c
case -EACCES:
    /* possible RPCSEC_GSS out-of-sequence event (RFC2203),
     - reset recv state and keep waiting, don't retransmit
     */
    task->tk_rqstp->rq_reply_bytes_recvd = 0;
    task->tk_status = xprt_request_enqueue_receive(task);
    task->tk_action = call_transmit_status;
    return -EBADMSG;
default:
    goto out_garbage;
```

### 4. **Risk Assessment - LOW**:
- **Scope**: Changes are confined to a specific error handling path in
  SUNRPC client code
- **Behavioral change**: Instead of immediate retransmission, it now
  waits for normal timeout-based retransmission
- **Fallback**: The normal timeout mechanism still provides
  retransmission if needed
- **Error handling**: Uses existing infrastructure
  (`xprt_request_enqueue_receive`, `call_transmit_status`)

### 5. **Comparison with Similar Backported Commits**:
This commit shares characteristics with successful backports:
- **Similar Commit #1**: Also fixed SUNRPC verifier handling by changing
  `goto out_err` to `goto out_garbage`
- **Similar Commit #2**: Fixed incomplete RPC message handling with
  proper state management
- **Similar Commit #5**: Fixed RPCSEC_GSS sequence number limit handling

All these were backported because they fixed protocol compliance issues
with minimal risk.

### 6. **User Impact**:
- **Performance improvement**: Significantly reduces unnecessary
  retransmissions under server load
- **Network efficiency**: Reduces wasted bandwidth and server load
- **No breaking changes**: Maintains compatibility while improving
  behavior

### 7. **Stability Criteria Compliance**:
✅ **Important bugfix**: Fixes performance degradation and protocol
compliance
✅ **Minimal risk**: Small, localized change to error handling path
✅ **No new features**: Pure bugfix, no architectural changes
✅ **Subsystem-contained**: Changes only affect SUNRPC client verifier
handling
✅ **Clear benefit**: Reduces retransmission storms in production
environments

The commit follows the stable tree rules perfectly: it's a targeted fix
for a real-world performance problem with minimal risk of regression,
affecting only the specific error case of RPCSEC_GSS sequence number
mismatches.

 net/sunrpc/clnt.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 6f75862d97820..21426c3049d35 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2771,8 +2771,13 @@ rpc_decode_header(struct rpc_task *task, struct xdr_stream *xdr)
 	case -EPROTONOSUPPORT:
 		goto out_err;
 	case -EACCES:
-		/* Re-encode with a fresh cred */
-		fallthrough;
+		/* possible RPCSEC_GSS out-of-sequence event (RFC2203),
+		 * reset recv state and keep waiting, don't retransmit
+		 */
+		task->tk_rqstp->rq_reply_bytes_recvd = 0;
+		task->tk_status = xprt_request_enqueue_receive(task);
+		task->tk_action = call_transmit_status;
+		return -EBADMSG;
 	default:
 		goto out_garbage;
 	}
-- 
2.39.5


  parent reply	other threads:[~2025-06-06 15:42 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-06 15:41 [PATCH AUTOSEL 6.15 01/21] cifs: Correctly set SMB1 SessionKey field in Session Setup Request Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 02/21] cifs: Fix cifs_query_path_info() for Windows NT servers Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 03/21] cifs: Fix encoding of SMB1 Session Setup NTLMSSP Request in non-UNICODE mode Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 04/21] NFSv4: Always set NLINK even if the server doesn't support it Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 05/21] NFSv4.2: fix listxattr to return selinux security label Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 06/21] NFSv4.2: fix setattr caching of TIME_[MODIFY|ACCESS]_SET when timestamps are delegated Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 07/21] mailbox: Not protect module_put with spin_lock_irqsave Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 08/21] mfd: max77541: Fix wakeup source leaks on device unbind Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 09/21] mfd: max14577: " Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 10/21] mfd: max77705: " Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 11/21] mfd: 88pm886: " Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 12/21] mfd: sprd-sc27xx: " Sasha Levin
2025-06-06 15:41 ` Sasha Levin [this message]
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 14/21] hwmon: (isl28022) Fix current reading calculation Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 15/21] dm vdo indexer: don't read request structure after enqueuing Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 16/21] leds: multicolor: Fix intensity setting while SW blinking Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 17/21] fuse: fix race between concurrent setattrs from multiple nodes Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 18/21] cxl/region: Add a dev_err() on missing target list entries Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 19/21] cxl: core/region - ignore interleave granularity when ways=1 Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 20/21] NFSv4: xattr handlers should check for absent nfs filehandles Sasha Levin
2025-06-06 15:41 ` [PATCH AUTOSEL 6.15 21/21] hwmon: (pmbus/max34440) Fix support for max34451 Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250606154147.546388-13-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=anna.schumaker@oracle.com \
    --cc=anna@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=njha@janestreet.com \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    --cc=trondmy@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox