[PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant

linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant
@ 2025-03-19 17:02 Nikhil Jha via B4 Relay
  2025-03-19 17:02 ` [PATCH v2 1/2] sunrpc: implement rfc2203 rpcsec_gss seqnum cache Nikhil Jha via B4 Relay
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Nikhil Jha via B4 Relay @ 2025-03-19 17:02 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Jeff Layton,
	Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers
  Cc: linux-nfs, linux-kernel, netdev, linux-trace-kernel, Nikhil Jha

When the client retransmits an operation (for example, because the
server is slow to respond), a new GSS sequence number is associated with
the XID. In the current kernel code the original sequence number is
discarded. Subsequently, if a response to the original request is
received there will be a GSS sequence number mismatch. A mismatch will
trigger another retransmit, possibly repeating the cycle, and after some
number of failed retries EACCES is returned.

RFC2203, section 5.3.3.1 suggests a possible solution... “cache the
RPCSEC_GSS sequence number of each request it sends” and "compute the
checksum of each sequence number in the cache to try to match the
checksum in the reply's verifier." This is what FreeBSD’s implementation
does (rpc_gss_validate in sys/rpc/rpcsec_gss/rpcsec_gss.c).

However, even with this cache, retransmits directly caused by a seqno
mismatch can still cause a bad message interleaving that results in this
bug. The RFC already suggests ignoring incorrect seqnos on the server
side, and this seems symmetric, so this patchset also applies that
behavior to the client.

These two patches are *not* dependent on each other. I tested them by
delaying packets with a Python script hooked up to NFQUEUE. If it would
be helpful I can send this script along as well.

Signed-off-by: Nikhil Jha <njha@janestreet.com>
---
Changes since v1:
 * Maintain the invariant that the first seqno is always first in
   rq_seqnos, so that it doesn't need to be stored twice.
 * Minor formatting, and resending with proper mailing-list headers so the
   patches are easier to work with.

---
Nikhil Jha (2):
      sunrpc: implement rfc2203 rpcsec_gss seqnum cache
      sunrpc: don't immediately retransmit on seqno miss

 include/linux/sunrpc/xprt.h    | 17 +++++++++++-
 include/trace/events/rpcgss.h  |  4 +--
 include/trace/events/sunrpc.h  |  2 +-
 net/sunrpc/auth_gss/auth_gss.c | 59 ++++++++++++++++++++++++++----------------
 net/sunrpc/clnt.c              |  9 +++++--
 net/sunrpc/xprt.c              |  3 ++-
 6 files changed, 64 insertions(+), 30 deletions(-)
---
base-commit: 7eb172143d5508b4da468ed59ee857c6e5e01da6
change-id: 20250314-rfc2203-seqnum-cache-52389d14f567

Best regards,
-- 
Nikhil Jha <njha@janestreet.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/2] sunrpc: implement rfc2203 rpcsec_gss seqnum cache
  2025-03-19 17:02 [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant Nikhil Jha via B4 Relay
@ 2025-03-19 17:02 ` Nikhil Jha via B4 Relay
  2025-03-19 17:02 ` [PATCH v2 2/2] sunrpc: don't immediately retransmit on seqno miss Nikhil Jha via B4 Relay
  2025-03-20 13:16 ` [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant Chuck Lever
  2 siblings, 0 replies; 8+ messages in thread
From: Nikhil Jha via B4 Relay @ 2025-03-19 17:02 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Jeff Layton,
	Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers
  Cc: linux-nfs, linux-kernel, netdev, linux-trace-kernel, Nikhil Jha

From: Nikhil Jha <njha@janestreet.com>

This implements a sequence number cache of the last three (right now
hardcoded) sent sequence numbers for a given XID, as suggested by the
RFC.

From RFC2203 5.3.3.1:

"Note that the sequence number algorithm requires that the client
increment the sequence number even if it is retrying a request with
the same RPC transaction identifier.  It is not infrequent for
clients to get into a situation where they send two or more attempts
and a slow server sends the reply for the first attempt. With
RPCSEC_GSS, each request and reply will have a unique sequence
number. If the client wishes to improve turn around time on the RPC
call, it can cache the RPCSEC_GSS sequence number of each request it
sends. Then when it receives a response with a matching RPC
transaction identifier, it can compute the checksum of each sequence
number in the cache to try to match the checksum in the reply's
verifier."

Signed-off-by: Nikhil Jha <njha@janestreet.com>
---
 include/linux/sunrpc/xprt.h    | 17 +++++++++++-
 include/trace/events/rpcgss.h  |  4 +--
 include/trace/events/sunrpc.h  |  2 +-
 net/sunrpc/auth_gss/auth_gss.c | 59 ++++++++++++++++++++++++++----------------
 net/sunrpc/xprt.c              |  3 ++-
 5 files changed, 57 insertions(+), 28 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 81b952649d35e3ad4fa8c7e77388ac2ceb44ce60..f46d1fb8f71ae22233f15691b2cf312f73656d9d 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -30,6 +30,8 @@
 #define RPC_MAXCWND(xprt)	((xprt)->max_reqs << RPC_CWNDSHIFT)
 #define RPCXPRT_CONGESTED(xprt) ((xprt)->cong >= (xprt)->cwnd)
 
+#define RPC_GSS_SEQNO_ARRAY_SIZE 3U
+
 enum rpc_display_format_t {
 	RPC_DISPLAY_ADDR = 0,
 	RPC_DISPLAY_PORT,
@@ -66,7 +68,8 @@ struct rpc_rqst {
 	struct rpc_cred *	rq_cred;	/* Bound cred */
 	__be32			rq_xid;		/* request XID */
 	int			rq_cong;	/* has incremented xprt->cong */
-	u32			rq_seqno;	/* gss seq no. used on req. */
+	u32			rq_seqnos[RPC_GSS_SEQNO_ARRAY_SIZE];	/* past gss req seq nos. */
+	unsigned int		rq_seqno_count;	/* number of entries in rq_seqnos */
 	int			rq_enc_pages_num;
 	struct page		**rq_enc_pages;	/* scratch pages for use by
 						   gss privacy code */
@@ -119,6 +122,18 @@ struct rpc_rqst {
 #define rq_svec			rq_snd_buf.head
 #define rq_slen			rq_snd_buf.len
 
+static inline int xprt_rqst_add_seqno(struct rpc_rqst *req, u32 seqno)
+{
+	if (likely(req->rq_seqno_count < RPC_GSS_SEQNO_ARRAY_SIZE))
+		req->rq_seqno_count++;
+
+	/* Shift array to make room for the newest element at the beginning */
+	memmove(&req->rq_seqnos[1], &req->rq_seqnos[0],
+		(RPC_GSS_SEQNO_ARRAY_SIZE - 1) * sizeof(req->rq_seqnos[0]));
+	req->rq_seqnos[0] = seqno;
+	return 0;
+}
+
 /* RPC transport layer security policies */
 enum xprtsec_policies {
 	RPC_XPRTSEC_NONE = 0,
diff --git a/include/trace/events/rpcgss.h b/include/trace/events/rpcgss.h
index b0b6300a0cabdbaf1c08460a474aaf092b548e53..8aeae06cf434d05547b4cffb14f01901cfba4142 100644
--- a/include/trace/events/rpcgss.h
+++ b/include/trace/events/rpcgss.h
@@ -409,7 +409,7 @@ TRACE_EVENT(rpcgss_seqno,
 		__entry->task_id = task->tk_pid;
 		__entry->client_id = task->tk_client->cl_clid;
 		__entry->xid = be32_to_cpu(rqst->rq_xid);
-		__entry->seqno = rqst->rq_seqno;
+		__entry->seqno = *rqst->rq_seqnos;
 	),
 
 	TP_printk(SUNRPC_TRACE_TASK_SPECIFIER " xid=0x%08x seqno=%u",
@@ -440,7 +440,7 @@ TRACE_EVENT(rpcgss_need_reencode,
 		__entry->client_id = task->tk_client->cl_clid;
 		__entry->xid = be32_to_cpu(task->tk_rqstp->rq_xid);
 		__entry->seq_xmit = seq_xmit;
-		__entry->seqno = task->tk_rqstp->rq_seqno;
+		__entry->seqno = *task->tk_rqstp->rq_seqnos;
 		__entry->ret = ret;
 	),
 
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index 851841336ee65c1d03fc3565a96f5dbe409c1d64..bb01a0ca93aa72a310146bba97ca58b851a85653 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -1099,7 +1099,7 @@ TRACE_EVENT(xprt_transmit,
 		__entry->client_id = rqst->rq_task->tk_client ?
 			rqst->rq_task->tk_client->cl_clid : -1;
 		__entry->xid = be32_to_cpu(rqst->rq_xid);
-		__entry->seqno = rqst->rq_seqno;
+		__entry->seqno = *rqst->rq_seqnos;
 		__entry->status = status;
 	),
 
diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 369310909fc98596c8a06db8ac3c976a719ca7b2..0fa244f16876f3c434fd507b4d53c5eefd748ce4 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -1545,6 +1545,7 @@ static int gss_marshal(struct rpc_task *task, struct xdr_stream *xdr)
 	struct kvec	iov;
 	struct xdr_buf	verf_buf;
 	int status;
+	u32 seqno;
 
 	/* Credential */
 
@@ -1556,15 +1557,16 @@ static int gss_marshal(struct rpc_task *task, struct xdr_stream *xdr)
 	cred_len = p++;
 
 	spin_lock(&ctx->gc_seq_lock);
-	req->rq_seqno = (ctx->gc_seq < MAXSEQ) ? ctx->gc_seq++ : MAXSEQ;
+	seqno = (ctx->gc_seq < MAXSEQ) ? ctx->gc_seq++ : MAXSEQ;
+	xprt_rqst_add_seqno(req, seqno);
 	spin_unlock(&ctx->gc_seq_lock);
-	if (req->rq_seqno == MAXSEQ)
+	if (*req->rq_seqnos == MAXSEQ)
 		goto expired;
 	trace_rpcgss_seqno(task);
 
 	*p++ = cpu_to_be32(RPC_GSS_VERSION);
 	*p++ = cpu_to_be32(ctx->gc_proc);
-	*p++ = cpu_to_be32(req->rq_seqno);
+	*p++ = cpu_to_be32(*req->rq_seqnos);
 	*p++ = cpu_to_be32(gss_cred->gc_service);
 	p = xdr_encode_netobj(p, &ctx->gc_wire_ctx);
 	*cred_len = cpu_to_be32((p - (cred_len + 1)) << 2);
@@ -1678,17 +1680,31 @@ gss_refresh_null(struct rpc_task *task)
 	return 0;
 }
 
+static u32
+gss_validate_seqno_mic(struct gss_cl_ctx *ctx, u32 seqno, __be32 *seq, __be32 *p, u32 len)
+{
+	struct kvec iov;
+	struct xdr_buf verf_buf;
+	struct xdr_netobj mic;
+
+	*seq = cpu_to_be32(seqno);
+	iov.iov_base = seq;
+	iov.iov_len = 4;
+	xdr_buf_from_iov(&iov, &verf_buf);
+	mic.data = (u8 *)p;
+	mic.len = len;
+	return gss_verify_mic(ctx->gc_gss_ctx, &verf_buf, &mic);
+}
+
 static int
 gss_validate(struct rpc_task *task, struct xdr_stream *xdr)
 {
 	struct rpc_cred *cred = task->tk_rqstp->rq_cred;
 	struct gss_cl_ctx *ctx = gss_cred_get_ctx(cred);
 	__be32		*p, *seq = NULL;
-	struct kvec	iov;
-	struct xdr_buf	verf_buf;
-	struct xdr_netobj mic;
 	u32		len, maj_stat;
 	int		status;
+	int		i = 1; /* don't recheck the first item */
 
 	p = xdr_inline_decode(xdr, 2 * sizeof(*p));
 	if (!p)
@@ -1705,13 +1721,10 @@ gss_validate(struct rpc_task *task, struct xdr_stream *xdr)
 	seq = kmalloc(4, GFP_KERNEL);
 	if (!seq)
 		goto validate_failed;
-	*seq = cpu_to_be32(task->tk_rqstp->rq_seqno);
-	iov.iov_base = seq;
-	iov.iov_len = 4;
-	xdr_buf_from_iov(&iov, &verf_buf);
-	mic.data = (u8 *)p;
-	mic.len = len;
-	maj_stat = gss_verify_mic(ctx->gc_gss_ctx, &verf_buf, &mic);
+	maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[0], seq, p, len);
+	/* RFC 2203 5.3.3.1 - compute the checksum of each sequence number in the cache */
+	while (unlikely(maj_stat == GSS_S_BAD_SIG && i < task->tk_rqstp->rq_seqno_count))
+		maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i], seq, p, len);
 	if (maj_stat == GSS_S_CONTEXT_EXPIRED)
 		clear_bit(RPCAUTH_CRED_UPTODATE, &cred->cr_flags);
 	if (maj_stat)
@@ -1750,7 +1763,7 @@ gss_wrap_req_integ(struct rpc_cred *cred, struct gss_cl_ctx *ctx,
 	if (!p)
 		goto wrap_failed;
 	integ_len = p++;
-	*p = cpu_to_be32(rqstp->rq_seqno);
+	*p = cpu_to_be32(*rqstp->rq_seqnos);
 
 	if (rpcauth_wrap_req_encode(task, xdr))
 		goto wrap_failed;
@@ -1847,7 +1860,7 @@ gss_wrap_req_priv(struct rpc_cred *cred, struct gss_cl_ctx *ctx,
 	if (!p)
 		goto wrap_failed;
 	opaque_len = p++;
-	*p = cpu_to_be32(rqstp->rq_seqno);
+	*p = cpu_to_be32(*rqstp->rq_seqnos);
 
 	if (rpcauth_wrap_req_encode(task, xdr))
 		goto wrap_failed;
@@ -2001,7 +2014,7 @@ gss_unwrap_resp_integ(struct rpc_task *task, struct rpc_cred *cred,
 	offset = rcv_buf->len - xdr_stream_remaining(xdr);
 	if (xdr_stream_decode_u32(xdr, &seqno))
 		goto unwrap_failed;
-	if (seqno != rqstp->rq_seqno)
+	if (seqno != *rqstp->rq_seqnos)
 		goto bad_seqno;
 	if (xdr_buf_subsegment(rcv_buf, &gss_data, offset, len))
 		goto unwrap_failed;
@@ -2045,7 +2058,7 @@ gss_unwrap_resp_integ(struct rpc_task *task, struct rpc_cred *cred,
 	trace_rpcgss_unwrap_failed(task);
 	goto out;
 bad_seqno:
-	trace_rpcgss_bad_seqno(task, rqstp->rq_seqno, seqno);
+	trace_rpcgss_bad_seqno(task, *rqstp->rq_seqnos, seqno);
 	goto out;
 bad_mic:
 	trace_rpcgss_verify_mic(task, maj_stat);
@@ -2077,7 +2090,7 @@ gss_unwrap_resp_priv(struct rpc_task *task, struct rpc_cred *cred,
 	if (maj_stat != GSS_S_COMPLETE)
 		goto bad_unwrap;
 	/* gss_unwrap decrypted the sequence number */
-	if (be32_to_cpup(p++) != rqstp->rq_seqno)
+	if (be32_to_cpup(p++) != *rqstp->rq_seqnos)
 		goto bad_seqno;
 
 	/* gss_unwrap redacts the opaque blob from the head iovec.
@@ -2093,7 +2106,7 @@ gss_unwrap_resp_priv(struct rpc_task *task, struct rpc_cred *cred,
 	trace_rpcgss_unwrap_failed(task);
 	return -EIO;
 bad_seqno:
-	trace_rpcgss_bad_seqno(task, rqstp->rq_seqno, be32_to_cpup(--p));
+	trace_rpcgss_bad_seqno(task, *rqstp->rq_seqnos, be32_to_cpup(--p));
 	return -EIO;
 bad_unwrap:
 	trace_rpcgss_unwrap(task, maj_stat);
@@ -2118,14 +2131,14 @@ gss_xmit_need_reencode(struct rpc_task *task)
 	if (!ctx)
 		goto out;
 
-	if (gss_seq_is_newer(req->rq_seqno, READ_ONCE(ctx->gc_seq)))
+	if (gss_seq_is_newer(*req->rq_seqnos, READ_ONCE(ctx->gc_seq)))
 		goto out_ctx;
 
 	seq_xmit = READ_ONCE(ctx->gc_seq_xmit);
-	while (gss_seq_is_newer(req->rq_seqno, seq_xmit)) {
+	while (gss_seq_is_newer(*req->rq_seqnos, seq_xmit)) {
 		u32 tmp = seq_xmit;
 
-		seq_xmit = cmpxchg(&ctx->gc_seq_xmit, tmp, req->rq_seqno);
+		seq_xmit = cmpxchg(&ctx->gc_seq_xmit, tmp, *req->rq_seqnos);
 		if (seq_xmit == tmp) {
 			ret = false;
 			goto out_ctx;
@@ -2134,7 +2147,7 @@ gss_xmit_need_reencode(struct rpc_task *task)
 
 	win = ctx->gc_win;
 	if (win > 0)
-		ret = !gss_seq_is_newer(req->rq_seqno, seq_xmit - win);
+		ret = !gss_seq_is_newer(*req->rq_seqnos, seq_xmit - win);
 
 out_ctx:
 	gss_put_ctx(ctx);
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 09f245cda5262a572c450237419c80b183a83568..395bdfdaf4313eb85292f5a99bba818bbb268a76 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1365,7 +1365,7 @@ xprt_request_enqueue_transmit(struct rpc_task *task)
 				INIT_LIST_HEAD(&req->rq_xmit2);
 				goto out;
 			}
-		} else if (!req->rq_seqno) {
+		} else if (req->rq_seqno_count == 0) {
 			list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) {
 				if (pos->rq_task->tk_owner != task->tk_owner)
 					continue;
@@ -1898,6 +1898,7 @@ xprt_request_init(struct rpc_task *task)
 	req->rq_snd_buf.bvec = NULL;
 	req->rq_rcv_buf.bvec = NULL;
 	req->rq_release_snd_buf = NULL;
+	req->rq_seqno_count = 0;
 	xprt_init_majortimeo(task, req, task->tk_client->cl_timeout);
 
 	trace_xprt_reserve(req);

-- 
2.43.5



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/2] sunrpc: don't immediately retransmit on seqno miss
  2025-03-19 17:02 [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant Nikhil Jha via B4 Relay
  2025-03-19 17:02 ` [PATCH v2 1/2] sunrpc: implement rfc2203 rpcsec_gss seqnum cache Nikhil Jha via B4 Relay
@ 2025-03-19 17:02 ` Nikhil Jha via B4 Relay
  2025-03-20 13:16 ` [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant Chuck Lever
  2 siblings, 0 replies; 8+ messages in thread
From: Nikhil Jha via B4 Relay @ 2025-03-19 17:02 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Jeff Layton,
	Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers
  Cc: linux-nfs, linux-kernel, netdev, linux-trace-kernel, Nikhil Jha

From: Nikhil Jha <njha@janestreet.com>

RFC2203 requires that retransmitted messages use a new gss sequence
number, but the same XID. This means that if the server is just slow
(e.x. overloaded), the client might receive a response using an older
seqno than the one it has recorded.

Currently, Linux's client immediately retransmits in this case. However,
this leads to a lot of wasted retransmits until the server eventually
responds faster than the client can resend.

Client -> SEQ 1 -> Server
Client -> SEQ 2 -> Server
Client <- SEQ 1 <- Server (misses, expecting seqno = 2)
Client -> SEQ 3 -> Server (immediate retransmission on miss)
Client <- SEQ 2 <- Server (misses, expecting seqno = 3)
Client -> SEQ 4 -> Server (immediate retransmission on miss)
... and so on ...

This commit makes it so that we ignore messages with bad checksums
due to seqnum mismatch, and rely on the usual timeout behavior for
retransmission instead of doing so immediately.

Signed-off-by: Nikhil Jha <njha@janestreet.com>
---
 net/sunrpc/clnt.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 2fe88ea79a70c134e58abfb03fca230883eddf1f..98c0880ec18a8b80371e20f65e7a57a8d9244f7f 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2760,8 +2760,13 @@ rpc_decode_header(struct rpc_task *task, struct xdr_stream *xdr)
 	case -EPROTONOSUPPORT:
 		goto out_err;
 	case -EACCES:
-		/* Re-encode with a fresh cred */
-		fallthrough;
+		/* possible RPCSEC_GSS out-of-sequence event (RFC2203),
+		 * reset recv state and keep waiting, don't retransmit
+		 */
+		task->tk_rqstp->rq_reply_bytes_recvd = 0;
+		task->tk_status = xprt_request_enqueue_receive(task);
+		task->tk_action = call_transmit_status;
+		return -EBADMSG;
 	default:
 		goto out_garbage;
 	}

-- 
2.43.5



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant
  2025-03-19 17:02 [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant Nikhil Jha via B4 Relay
  2025-03-19 17:02 ` [PATCH v2 1/2] sunrpc: implement rfc2203 rpcsec_gss seqnum cache Nikhil Jha via B4 Relay
  2025-03-19 17:02 ` [PATCH v2 2/2] sunrpc: don't immediately retransmit on seqno miss Nikhil Jha via B4 Relay
@ 2025-03-20 13:16 ` Chuck Lever
  2025-06-11 18:50   ` Nikhil Jha
  2 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2025-03-20 13:16 UTC (permalink / raw)
  To: njha, Trond Myklebust, Anna Schumaker, Jeff Layton, Neil Brown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers
  Cc: linux-nfs, linux-kernel, netdev, linux-trace-kernel

On 3/19/25 1:02 PM, Nikhil Jha via B4 Relay wrote:
> When the client retransmits an operation (for example, because the
> server is slow to respond), a new GSS sequence number is associated with
> the XID. In the current kernel code the original sequence number is
> discarded. Subsequently, if a response to the original request is
> received there will be a GSS sequence number mismatch. A mismatch will
> trigger another retransmit, possibly repeating the cycle, and after some
> number of failed retries EACCES is returned.
> 
> RFC2203, section 5.3.3.1 suggests a possible solution... “cache the
> RPCSEC_GSS sequence number of each request it sends” and "compute the
> checksum of each sequence number in the cache to try to match the
> checksum in the reply's verifier." This is what FreeBSD’s implementation
> does (rpc_gss_validate in sys/rpc/rpcsec_gss/rpcsec_gss.c).
> 
> However, even with this cache, retransmits directly caused by a seqno
> mismatch can still cause a bad message interleaving that results in this
> bug. The RFC already suggests ignoring incorrect seqnos on the server
> side, and this seems symmetric, so this patchset also applies that
> behavior to the client.
> 
> These two patches are *not* dependent on each other. I tested them by
> delaying packets with a Python script hooked up to NFQUEUE. If it would
> be helpful I can send this script along as well.
> 
> Signed-off-by: Nikhil Jha <njha@janestreet.com>
> ---
> Changes since v1:
>  * Maintain the invariant that the first seqno is always first in
>    rq_seqnos, so that it doesn't need to be stored twice.
>  * Minor formatting, and resending with proper mailing-list headers so the
>    patches are easier to work with.
> 
> ---
> Nikhil Jha (2):
>       sunrpc: implement rfc2203 rpcsec_gss seqnum cache
>       sunrpc: don't immediately retransmit on seqno miss
> 
>  include/linux/sunrpc/xprt.h    | 17 +++++++++++-
>  include/trace/events/rpcgss.h  |  4 +--
>  include/trace/events/sunrpc.h  |  2 +-
>  net/sunrpc/auth_gss/auth_gss.c | 59 ++++++++++++++++++++++++++----------------
>  net/sunrpc/clnt.c              |  9 +++++--
>  net/sunrpc/xprt.c              |  3 ++-
>  6 files changed, 64 insertions(+), 30 deletions(-)
> ---
> base-commit: 7eb172143d5508b4da468ed59ee857c6e5e01da6
> change-id: 20250314-rfc2203-seqnum-cache-52389d14f567
> 
> Best regards,

This seems like a sensible thing to do to me.

Acked-by: Chuck Lever <chuck.lever@oracle.com>

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant
  2025-03-20 13:16 ` [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant Chuck Lever
@ 2025-06-11 18:50   ` Nikhil Jha
  2025-06-11 18:54     ` Chuck Lever
  0 siblings, 1 reply; 8+ messages in thread
From: Nikhil Jha @ 2025-06-11 18:50 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Trond Myklebust, Anna Schumaker, Jeff Layton, Neil Brown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-nfs,
	linux-kernel, netdev, linux-trace-kernel

On Thu, Mar 20, 2025 at 09:16:15AM -0400, Chuck Lever wrote:
> On 3/19/25 1:02 PM, Nikhil Jha via B4 Relay wrote:
> > When the client retransmits an operation (for example, because the
> > server is slow to respond), a new GSS sequence number is associated with
> > the XID. In the current kernel code the original sequence number is
> > discarded. Subsequently, if a response to the original request is
> > received there will be a GSS sequence number mismatch. A mismatch will
> > trigger another retransmit, possibly repeating the cycle, and after some
> > number of failed retries EACCES is returned.
> > 
> > RFC2203, section 5.3.3.1 suggests a possible solution... “cache the
> > RPCSEC_GSS sequence number of each request it sends” and "compute the
> > checksum of each sequence number in the cache to try to match the
> > checksum in the reply's verifier." This is what FreeBSD’s implementation
> > does (rpc_gss_validate in sys/rpc/rpcsec_gss/rpcsec_gss.c).
> > 
> > However, even with this cache, retransmits directly caused by a seqno
> > mismatch can still cause a bad message interleaving that results in this
> > bug. The RFC already suggests ignoring incorrect seqnos on the server
> > side, and this seems symmetric, so this patchset also applies that
> > behavior to the client.
> > 
> > These two patches are *not* dependent on each other. I tested them by
> > delaying packets with a Python script hooked up to NFQUEUE. If it would
> > be helpful I can send this script along as well.
> > 
> > Signed-off-by: Nikhil Jha <njha@janestreet.com>
> > ---
> > Changes since v1:
> >  * Maintain the invariant that the first seqno is always first in
> >    rq_seqnos, so that it doesn't need to be stored twice.
> >  * Minor formatting, and resending with proper mailing-list headers so the
> >    patches are easier to work with.
> > 
> > ---
> > Nikhil Jha (2):
> >       sunrpc: implement rfc2203 rpcsec_gss seqnum cache
> >       sunrpc: don't immediately retransmit on seqno miss
> > 
> >  include/linux/sunrpc/xprt.h    | 17 +++++++++++-
> >  include/trace/events/rpcgss.h  |  4 +--
> >  include/trace/events/sunrpc.h  |  2 +-
> >  net/sunrpc/auth_gss/auth_gss.c | 59 ++++++++++++++++++++++++++----------------
> >  net/sunrpc/clnt.c              |  9 +++++--
> >  net/sunrpc/xprt.c              |  3 ++-
> >  6 files changed, 64 insertions(+), 30 deletions(-)
> > ---
> > base-commit: 7eb172143d5508b4da468ed59ee857c6e5e01da6
> > change-id: 20250314-rfc2203-seqnum-cache-52389d14f567
> > 
> > Best regards,
> 
> This seems like a sensible thing to do to me.
> 
> Acked-by: Chuck Lever <chuck.lever@oracle.com>
> 
> -- 
> Chuck Lever

Hi,

We've been running this patch for a while now and noticed a (very silly
in hindsight) bug.

maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i], seq, p, len);

needs to be

maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i++], seq, p, len);

Or the kernel gets stuck in a loop when you have more than two retries.
I can resend this patch but I noticed it's already made its way into
quite a few trees. Should this be a separate patch instead?

- Nikhil






^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant
  2025-06-11 18:50   ` Nikhil Jha
@ 2025-06-11 18:54     ` Chuck Lever
  2025-06-11 19:05       ` Nikhil Jha
  0 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2025-06-11 18:54 UTC (permalink / raw)
  To: Nikhil Jha
  Cc: Trond Myklebust, Anna Schumaker, Jeff Layton, Neil Brown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-nfs,
	linux-kernel, netdev, linux-trace-kernel

On 6/11/25 2:50 PM, Nikhil Jha wrote:
> On Thu, Mar 20, 2025 at 09:16:15AM -0400, Chuck Lever wrote:
>> On 3/19/25 1:02 PM, Nikhil Jha via B4 Relay wrote:
>>> When the client retransmits an operation (for example, because the
>>> server is slow to respond), a new GSS sequence number is associated with
>>> the XID. In the current kernel code the original sequence number is
>>> discarded. Subsequently, if a response to the original request is
>>> received there will be a GSS sequence number mismatch. A mismatch will
>>> trigger another retransmit, possibly repeating the cycle, and after some
>>> number of failed retries EACCES is returned.
>>>
>>> RFC2203, section 5.3.3.1 suggests a possible solution... “cache the
>>> RPCSEC_GSS sequence number of each request it sends” and "compute the
>>> checksum of each sequence number in the cache to try to match the
>>> checksum in the reply's verifier." This is what FreeBSD’s implementation
>>> does (rpc_gss_validate in sys/rpc/rpcsec_gss/rpcsec_gss.c).
>>>
>>> However, even with this cache, retransmits directly caused by a seqno
>>> mismatch can still cause a bad message interleaving that results in this
>>> bug. The RFC already suggests ignoring incorrect seqnos on the server
>>> side, and this seems symmetric, so this patchset also applies that
>>> behavior to the client.
>>>
>>> These two patches are *not* dependent on each other. I tested them by
>>> delaying packets with a Python script hooked up to NFQUEUE. If it would
>>> be helpful I can send this script along as well.
>>>
>>> Signed-off-by: Nikhil Jha <njha@janestreet.com>
>>> ---
>>> Changes since v1:
>>>  * Maintain the invariant that the first seqno is always first in
>>>    rq_seqnos, so that it doesn't need to be stored twice.
>>>  * Minor formatting, and resending with proper mailing-list headers so the
>>>    patches are easier to work with.
>>>
>>> ---
>>> Nikhil Jha (2):
>>>       sunrpc: implement rfc2203 rpcsec_gss seqnum cache
>>>       sunrpc: don't immediately retransmit on seqno miss
>>>
>>>  include/linux/sunrpc/xprt.h    | 17 +++++++++++-
>>>  include/trace/events/rpcgss.h  |  4 +--
>>>  include/trace/events/sunrpc.h  |  2 +-
>>>  net/sunrpc/auth_gss/auth_gss.c | 59 ++++++++++++++++++++++++++----------------
>>>  net/sunrpc/clnt.c              |  9 +++++--
>>>  net/sunrpc/xprt.c              |  3 ++-
>>>  6 files changed, 64 insertions(+), 30 deletions(-)
>>> ---
>>> base-commit: 7eb172143d5508b4da468ed59ee857c6e5e01da6
>>> change-id: 20250314-rfc2203-seqnum-cache-52389d14f567
>>>
>>> Best regards,
>>
>> This seems like a sensible thing to do to me.
>>
>> Acked-by: Chuck Lever <chuck.lever@oracle.com>
>>
>> -- 
>> Chuck Lever
> 
> Hi,
> 
> We've been running this patch for a while now and noticed a (very silly
> in hindsight) bug.
> 
> maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i], seq, p, len);
> 
> needs to be
> 
> maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i++], seq, p, len);
> 
> Or the kernel gets stuck in a loop when you have more than two retries.
> I can resend this patch but I noticed it's already made its way into
> quite a few trees. Should this be a separate patch instead?

The course of action depends on what trees you found the patch in.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant
  2025-06-11 18:54     ` Chuck Lever
@ 2025-06-11 19:05       ` Nikhil Jha
  2025-06-11 19:10         ` Chuck Lever
  0 siblings, 1 reply; 8+ messages in thread
From: Nikhil Jha @ 2025-06-11 19:05 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Trond Myklebust, Anna Schumaker, Jeff Layton, Neil Brown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-nfs,
	linux-kernel, netdev, linux-trace-kernel

On Wed, Jun 11, 2025 at 02:54:09PM -0400, Chuck Lever wrote:
> On 6/11/25 2:50 PM, Nikhil Jha wrote:
> > On Thu, Mar 20, 2025 at 09:16:15AM -0400, Chuck Lever wrote:
> >> On 3/19/25 1:02 PM, Nikhil Jha via B4 Relay wrote:
> >>> When the client retransmits an operation (for example, because the
> >>> server is slow to respond), a new GSS sequence number is associated with
> >>> the XID. In the current kernel code the original sequence number is
> >>> discarded. Subsequently, if a response to the original request is
> >>> received there will be a GSS sequence number mismatch. A mismatch will
> >>> trigger another retransmit, possibly repeating the cycle, and after some
> >>> number of failed retries EACCES is returned.
> >>>
> >>> RFC2203, section 5.3.3.1 suggests a possible solution... “cache the
> >>> RPCSEC_GSS sequence number of each request it sends” and "compute the
> >>> checksum of each sequence number in the cache to try to match the
> >>> checksum in the reply's verifier." This is what FreeBSD’s implementation
> >>> does (rpc_gss_validate in sys/rpc/rpcsec_gss/rpcsec_gss.c).
> >>>
> >>> However, even with this cache, retransmits directly caused by a seqno
> >>> mismatch can still cause a bad message interleaving that results in this
> >>> bug. The RFC already suggests ignoring incorrect seqnos on the server
> >>> side, and this seems symmetric, so this patchset also applies that
> >>> behavior to the client.
> >>>
> >>> These two patches are *not* dependent on each other. I tested them by
> >>> delaying packets with a Python script hooked up to NFQUEUE. If it would
> >>> be helpful I can send this script along as well.
> >>>
> >>> Signed-off-by: Nikhil Jha <njha@janestreet.com>
> >>> ---
> >>> Changes since v1:
> >>>  * Maintain the invariant that the first seqno is always first in
> >>>    rq_seqnos, so that it doesn't need to be stored twice.
> >>>  * Minor formatting, and resending with proper mailing-list headers so the
> >>>    patches are easier to work with.
> >>>
> >>> ---
> >>> Nikhil Jha (2):
> >>>       sunrpc: implement rfc2203 rpcsec_gss seqnum cache
> >>>       sunrpc: don't immediately retransmit on seqno miss
> >>>
> >>>  include/linux/sunrpc/xprt.h    | 17 +++++++++++-
> >>>  include/trace/events/rpcgss.h  |  4 +--
> >>>  include/trace/events/sunrpc.h  |  2 +-
> >>>  net/sunrpc/auth_gss/auth_gss.c | 59 ++++++++++++++++++++++++++----------------
> >>>  net/sunrpc/clnt.c              |  9 +++++--
> >>>  net/sunrpc/xprt.c              |  3 ++-
> >>>  6 files changed, 64 insertions(+), 30 deletions(-)
> >>> ---
> >>> base-commit: 7eb172143d5508b4da468ed59ee857c6e5e01da6
> >>> change-id: 20250314-rfc2203-seqnum-cache-52389d14f567
> >>>
> >>> Best regards,
> >>
> >> This seems like a sensible thing to do to me.
> >>
> >> Acked-by: Chuck Lever <chuck.lever@oracle.com>
> >>
> >> -- 
> >> Chuck Lever
> > 
> > Hi,
> > 
> > We've been running this patch for a while now and noticed a (very silly
> > in hindsight) bug.
> > 
> > maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i], seq, p, len);
> > 
> > needs to be
> > 
> > maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i++], seq, p, len);
> > 
> > Or the kernel gets stuck in a loop when you have more than two retries.
> > I can resend this patch but I noticed it's already made its way into
> > quite a few trees. Should this be a separate patch instead?
> 
> The course of action depends on what trees you found the patch in.
> 
> 
> -- 
> Chuck Lever

It shows up here, so I think it's in v6.16-rc1 already.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.16-rc1&id=08d6ee6d8a10aef958c2af16bb121070290ed589

- Nikhil



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant
  2025-06-11 19:05       ` Nikhil Jha
@ 2025-06-11 19:10         ` Chuck Lever
  0 siblings, 0 replies; 8+ messages in thread
From: Chuck Lever @ 2025-06-11 19:10 UTC (permalink / raw)
  To: Nikhil Jha
  Cc: Trond Myklebust, Anna Schumaker, Jeff Layton, Neil Brown,
	Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-nfs,
	linux-kernel, netdev, linux-trace-kernel

On 6/11/25 3:05 PM, Nikhil Jha wrote:
> On Wed, Jun 11, 2025 at 02:54:09PM -0400, Chuck Lever wrote:
>> On 6/11/25 2:50 PM, Nikhil Jha wrote:
>>> On Thu, Mar 20, 2025 at 09:16:15AM -0400, Chuck Lever wrote:
>>>> On 3/19/25 1:02 PM, Nikhil Jha via B4 Relay wrote:
>>>>> When the client retransmits an operation (for example, because the
>>>>> server is slow to respond), a new GSS sequence number is associated with
>>>>> the XID. In the current kernel code the original sequence number is
>>>>> discarded. Subsequently, if a response to the original request is
>>>>> received there will be a GSS sequence number mismatch. A mismatch will
>>>>> trigger another retransmit, possibly repeating the cycle, and after some
>>>>> number of failed retries EACCES is returned.
>>>>>
>>>>> RFC2203, section 5.3.3.1 suggests a possible solution... “cache the
>>>>> RPCSEC_GSS sequence number of each request it sends” and "compute the
>>>>> checksum of each sequence number in the cache to try to match the
>>>>> checksum in the reply's verifier." This is what FreeBSD’s implementation
>>>>> does (rpc_gss_validate in sys/rpc/rpcsec_gss/rpcsec_gss.c).
>>>>>
>>>>> However, even with this cache, retransmits directly caused by a seqno
>>>>> mismatch can still cause a bad message interleaving that results in this
>>>>> bug. The RFC already suggests ignoring incorrect seqnos on the server
>>>>> side, and this seems symmetric, so this patchset also applies that
>>>>> behavior to the client.
>>>>>
>>>>> These two patches are *not* dependent on each other. I tested them by
>>>>> delaying packets with a Python script hooked up to NFQUEUE. If it would
>>>>> be helpful I can send this script along as well.
>>>>>
>>>>> Signed-off-by: Nikhil Jha <njha@janestreet.com>
>>>>> ---
>>>>> Changes since v1:
>>>>>  * Maintain the invariant that the first seqno is always first in
>>>>>    rq_seqnos, so that it doesn't need to be stored twice.
>>>>>  * Minor formatting, and resending with proper mailing-list headers so the
>>>>>    patches are easier to work with.
>>>>>
>>>>> ---
>>>>> Nikhil Jha (2):
>>>>>       sunrpc: implement rfc2203 rpcsec_gss seqnum cache
>>>>>       sunrpc: don't immediately retransmit on seqno miss
>>>>>
>>>>>  include/linux/sunrpc/xprt.h    | 17 +++++++++++-
>>>>>  include/trace/events/rpcgss.h  |  4 +--
>>>>>  include/trace/events/sunrpc.h  |  2 +-
>>>>>  net/sunrpc/auth_gss/auth_gss.c | 59 ++++++++++++++++++++++++++----------------
>>>>>  net/sunrpc/clnt.c              |  9 +++++--
>>>>>  net/sunrpc/xprt.c              |  3 ++-
>>>>>  6 files changed, 64 insertions(+), 30 deletions(-)
>>>>> ---
>>>>> base-commit: 7eb172143d5508b4da468ed59ee857c6e5e01da6
>>>>> change-id: 20250314-rfc2203-seqnum-cache-52389d14f567
>>>>>
>>>>> Best regards,
>>>>
>>>> This seems like a sensible thing to do to me.
>>>>
>>>> Acked-by: Chuck Lever <chuck.lever@oracle.com>
>>>>
>>>> -- 
>>>> Chuck Lever
>>>
>>> Hi,
>>>
>>> We've been running this patch for a while now and noticed a (very silly
>>> in hindsight) bug.
>>>
>>> maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i], seq, p, len);
>>>
>>> needs to be
>>>
>>> maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i++], seq, p, len);
>>>
>>> Or the kernel gets stuck in a loop when you have more than two retries.
>>> I can resend this patch but I noticed it's already made its way into
>>> quite a few trees. Should this be a separate patch instead?
>>
>> The course of action depends on what trees you found the patch in.
>>
>>
>> -- 
>> Chuck Lever
> 
> It shows up here, so I think it's in v6.16-rc1 already.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.16-rc1&id=08d6ee6d8a10aef958c2af16bb121070290ed589

In that case, post your fix delta To: Anna, Cc: linux-nfs@ and she will
apply it for a subsequent upstream pull request for v6.16-rc.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-06-11 19:11 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-19 17:02 [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant Nikhil Jha via B4 Relay
2025-03-19 17:02 ` [PATCH v2 1/2] sunrpc: implement rfc2203 rpcsec_gss seqnum cache Nikhil Jha via B4 Relay
2025-03-19 17:02 ` [PATCH v2 2/2] sunrpc: don't immediately retransmit on seqno miss Nikhil Jha via B4 Relay
2025-03-20 13:16 ` [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant Chuck Lever
2025-06-11 18:50   ` Nikhil Jha
2025-06-11 18:54     ` Chuck Lever
2025-06-11 19:05       ` Nikhil Jha
2025-06-11 19:10         ` Chuck Lever

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).