public inbox for netdev@vger.kernel.org
* [PATCH net-next v5 0/6] TLS read_sock performance scalability
@ 2026-03-24 12:53 Chuck Lever
  2026-03-24 12:53 ` [PATCH net-next v5 1/6] tls: Purge async_hold in tls_decrypt_async_wait() Chuck Lever
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Chuck Lever @ 2026-03-24 12:53 UTC (permalink / raw)
  To: john.fastabend, kuba, sd
  Cc: netdev, kernel-tls-handshake, Chuck Lever, Hannes Reinecke,
	Alistair Francis

I'd like to encourage in-kernel kTLS consumers (i.e., NFS and
NVMe/TCP) to coalesce on the use of read_sock. When I suggested
this to Hannes, he reported a number of nagging performance
scalability issues with read_sock. This series is an attempt to
run these issues down and get them fixed before we convert the
above sock_recvmsg consumers over to read_sock.

Batch async decryption and its submit/deliver scaffolding were
dropped from this series because async_capable is always false
for TLS 1.3, which NFS and NVMe/TCP both require. Async crypto
support for TLS 1.3 is a prerequisite for revisiting that work.

---
Changes since v4:
- Drop batch async decryption and submit/deliver restructure:
  async_capable is always false for TLS 1.3, so the new code
  was unreachable for NFS and NVMe/TCP
- Purge async_hold directly in tls_decrypt_async_wait() and drop
  the tls_decrypt_async_drain() wrapper
- Merge tls_strp_check_rcv_quiet() into tls_strp_check_rcv() with
  a bool wake parameter; fix lost wakeup on the recvmsg exit path

Changes since v3:
- Clarify why tls_decrypt_async_drain() is separate from _wait()
- Fold tls_err_abort() into tls_rx_one_record(), drop tls_rx_decrypt_record()
- Move backlog flush into tls_rx_rec_wait() so all RX paths benefit

Changes since v2:
- Fix short read self tests

Changes since v1:
- Add C11 reference
- Extend data_ready reduction to recvmsg and splice
- Restructure read_sock and recvmsg using shared helpers

---
Chuck Lever (6):
      tls: Purge async_hold in tls_decrypt_async_wait()
      tls: Abort the connection on decrypt failure
      tls: Fix dangling skb pointer in tls_sw_read_sock()
      tls: Factor tls_strp_msg_release() from tls_strp_msg_done()
      tls: Suppress spurious saved_data_ready on all receive paths
      tls: Flush backlog before waiting for a new record

 net/tls/tls.h      |  4 ++--
 net/tls/tls_main.c |  2 +-
 net/tls/tls_strp.c | 42 +++++++++++++++++++++++++++++++-----------
 net/tls/tls_sw.c   | 51 ++++++++++++++++++++++++++++++---------------------
 4 files changed, 64 insertions(+), 35 deletions(-)
---
base-commit: fb78a629b4f0eb399b413f6c093a3da177b3a4eb
change-id: 20260317-tls-read-sock-a0022c9df265

Best regards,
--  
Chuck Lever


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next v5 1/6] tls: Purge async_hold in tls_decrypt_async_wait()
  2026-03-24 12:53 [PATCH net-next v5 0/6] TLS read_sock performance scalability Chuck Lever
@ 2026-03-24 12:53 ` Chuck Lever
  2026-03-26 10:32   ` Hannes Reinecke
  2026-03-24 12:53 ` [PATCH net-next v5 2/6] tls: Abort the connection on decrypt failure Chuck Lever
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever @ 2026-03-24 12:53 UTC (permalink / raw)
  To: john.fastabend, kuba, sd; +Cc: netdev, kernel-tls-handshake, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

The async_hold queue pins encrypted input skbs while
the AEAD engine references their scatterlist data. Once
tls_decrypt_async_wait() returns, every AEAD operation
has completed and the engine no longer references those
skbs, so they can be freed unconditionally.

Move __skb_queue_purge(&ctx->async_hold) into
tls_decrypt_async_wait() so the purge is centralized:
every caller -- recvmsg's drain path and the -EBUSY
fallback in tls_do_decryption() -- releases held skbs
on synchronization without each site managing the
purge independently. Any future call site, such as a
batch decryption path in tls_sw_read_sock(), then
gets the release for free.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/tls/tls_sw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index a656ce235758..20f8fc84c5f5 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -246,6 +246,7 @@ static int tls_decrypt_async_wait(struct tls_sw_context_rx *ctx)
 		crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
 	atomic_inc(&ctx->decrypt_pending);
 
+	__skb_queue_purge(&ctx->async_hold);
 	return ctx->async_wait.err;
 }
 
@@ -2224,7 +2225,6 @@ int tls_sw_recvmsg(struct sock *sk,
 
 		/* Wait for all previously submitted records to be decrypted */
 		ret = tls_decrypt_async_wait(ctx);
-		__skb_queue_purge(&ctx->async_hold);
 
 		if (ret) {
 			if (err >= 0 || err == -EINPROGRESS)

-- 
2.53.0



* [PATCH net-next v5 2/6] tls: Abort the connection on decrypt failure
  2026-03-24 12:53 [PATCH net-next v5 0/6] TLS read_sock performance scalability Chuck Lever
  2026-03-24 12:53 ` [PATCH net-next v5 1/6] tls: Purge async_hold in tls_decrypt_async_wait() Chuck Lever
@ 2026-03-24 12:53 ` Chuck Lever
  2026-03-24 12:53 ` [PATCH net-next v5 3/6] tls: Fix dangling skb pointer in tls_sw_read_sock() Chuck Lever
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2026-03-24 12:53 UTC (permalink / raw)
  To: john.fastabend, kuba, sd
  Cc: netdev, kernel-tls-handshake, Chuck Lever, Hannes Reinecke

From: Chuck Lever <chuck.lever@oracle.com>

recvmsg, read_sock, and splice_read each open-code a
tls_err_abort() call after tls_rx_one_record() fails. Move
the abort into tls_rx_one_record() so each receive path
shares a single decrypt-and-abort sequence.

A tls_check_pending_rekey() failure after successful
decryption no longer triggers tls_err_abort(). That path
fires only when skb_copy_bits() fails on a valid skb,
which is not a realistic scenario.

Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/tls/tls_sw.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 20f8fc84c5f5..5626fdd4ea0a 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1799,6 +1799,9 @@ static int tls_check_pending_rekey(struct sock *sk, struct tls_context *ctx,
 	return 0;
 }
 
+/* Decrypt and return one TLS record. On decrypt failure the connection is
+ * aborted (sk_err set) before returning a negative errno.
+ */
 static int tls_rx_one_record(struct sock *sk, struct msghdr *msg,
 			     struct tls_decrypt_arg *darg)
 {
@@ -1810,8 +1813,10 @@ static int tls_rx_one_record(struct sock *sk, struct msghdr *msg,
 	err = tls_decrypt_device(sk, msg, tls_ctx, darg);
 	if (!err)
 		err = tls_decrypt_sw(sk, tls_ctx, msg, darg);
-	if (err < 0)
+	if (err < 0) {
+		tls_err_abort(sk, -EBADMSG);
 		return err;
+	}
 
 	rxm = strp_msg(darg->skb);
 	rxm->offset += prot->prepend_size;
@@ -2122,10 +2127,8 @@ int tls_sw_recvmsg(struct sock *sk,
 			darg.async = false;
 
 		err = tls_rx_one_record(sk, msg, &darg);
-		if (err < 0) {
-			tls_err_abort(sk, -EBADMSG);
+		if (err < 0)
 			goto recv_end;
-		}
 
 		async |= darg.async;
 
@@ -2284,10 +2287,8 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
 		memset(&darg.inargs, 0, sizeof(darg.inargs));
 
 		err = tls_rx_one_record(sk, NULL, &darg);
-		if (err < 0) {
-			tls_err_abort(sk, -EBADMSG);
+		if (err < 0)
 			goto splice_read_end;
-		}
 
 		tls_rx_rec_done(ctx);
 		skb = darg.skb;
@@ -2370,10 +2371,8 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
 			memset(&darg.inargs, 0, sizeof(darg.inargs));
 
 			err = tls_rx_one_record(sk, NULL, &darg);
-			if (err < 0) {
-				tls_err_abort(sk, -EBADMSG);
+			if (err < 0)
 				goto read_sock_end;
-			}
 
 			released = tls_read_flush_backlog(sk, prot, INT_MAX,
 							  0, decrypted,

-- 
2.53.0



* [PATCH net-next v5 3/6] tls: Fix dangling skb pointer in tls_sw_read_sock()
  2026-03-24 12:53 [PATCH net-next v5 0/6] TLS read_sock performance scalability Chuck Lever
  2026-03-24 12:53 ` [PATCH net-next v5 1/6] tls: Purge async_hold in tls_decrypt_async_wait() Chuck Lever
  2026-03-24 12:53 ` [PATCH net-next v5 2/6] tls: Abort the connection on decrypt failure Chuck Lever
@ 2026-03-24 12:53 ` Chuck Lever
  2026-03-24 12:53 ` [PATCH net-next v5 4/6] tls: Factor tls_strp_msg_release() from tls_strp_msg_done() Chuck Lever
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2026-03-24 12:53 UTC (permalink / raw)
  To: john.fastabend, kuba, sd
  Cc: netdev, kernel-tls-handshake, Chuck Lever, Hannes Reinecke,
	Alistair Francis

From: Chuck Lever <chuck.lever@oracle.com>

Per ISO/IEC 9899:2011 section 6.2.4p2, a pointer value becomes
indeterminate when the object it points to reaches the end of its
lifetime; Annex J.2 classifies the use of such a value as undefined
behavior. In tls_sw_read_sock(), consume_skb(skb) in the
fully-consumed path frees the skb, but the "do { } while (skb)"
loop condition then evaluates that freed pointer. Although the
value is never dereferenced -- the loop either continues and
overwrites skb, or exits -- any future change that adds a
dereference between consume_skb() and the loop condition would
produce a silent use-after-free.

Fixes: 662fbcec32f4 ("net/tls: implement ->read_sock()")
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/tls/tls_sw.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 5626fdd4ea0a..5fdd43a55f1e 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2356,7 +2356,7 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
 		goto read_sock_end;
 
 	decrypted = 0;
-	do {
+	for (;;) {
 		if (!skb_queue_empty(&ctx->rx_list)) {
 			skb = __skb_dequeue(&ctx->rx_list);
 			rxm = strp_msg(skb);
@@ -2405,10 +2405,11 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
 				goto read_sock_requeue;
 		} else {
 			consume_skb(skb);
+			skb = NULL;
 			if (!desc->count)
-				skb = NULL;
+				break;
 		}
-	} while (skb);
+	}
 
 read_sock_end:
 	tls_rx_reader_release(sk, ctx);

-- 
2.53.0



* [PATCH net-next v5 4/6] tls: Factor tls_strp_msg_release() from tls_strp_msg_done()
  2026-03-24 12:53 [PATCH net-next v5 0/6] TLS read_sock performance scalability Chuck Lever
                   ` (2 preceding siblings ...)
  2026-03-24 12:53 ` [PATCH net-next v5 3/6] tls: Fix dangling skb pointer in tls_sw_read_sock() Chuck Lever
@ 2026-03-24 12:53 ` Chuck Lever
  2026-03-24 12:53 ` [PATCH net-next v5 5/6] tls: Suppress spurious saved_data_ready on all receive paths Chuck Lever
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2026-03-24 12:53 UTC (permalink / raw)
  To: john.fastabend, kuba, sd
  Cc: netdev, kernel-tls-handshake, Chuck Lever, Hannes Reinecke,
	Alistair Francis

From: Chuck Lever <chuck.lever@oracle.com>

tls_strp_msg_done() conflates releasing the current record with
checking for the next one via tls_strp_check_rcv(). The next
patch defers that check to each receive path's exit point, which
requires releasing a record without immediately triggering the
check, so separate the release step into tls_strp_msg_release().
tls_strp_msg_done() is preserved as a wrapper for existing
callers.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/tls/tls.h      |  1 +
 net/tls/tls_strp.c | 15 ++++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls.h b/net/tls/tls.h
index e8f81a006520..a97f1acef31d 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -193,6 +193,7 @@ int tls_strp_init(struct tls_strparser *strp, struct sock *sk);
 void tls_strp_data_ready(struct tls_strparser *strp);
 
 void tls_strp_check_rcv(struct tls_strparser *strp);
+void tls_strp_msg_release(struct tls_strparser *strp);
 void tls_strp_msg_done(struct tls_strparser *strp);
 
 int tls_rx_msg_size(struct tls_strparser *strp, struct sk_buff *skb);
diff --git a/net/tls/tls_strp.c b/net/tls/tls_strp.c
index 98e12f0ff57e..a7648ebde162 100644
--- a/net/tls/tls_strp.c
+++ b/net/tls/tls_strp.c
@@ -581,7 +581,16 @@ static void tls_strp_work(struct work_struct *w)
 	release_sock(strp->sk);
 }
 
-void tls_strp_msg_done(struct tls_strparser *strp)
+/**
+ * tls_strp_msg_release - release the current strparser message
+ * @strp: TLS stream parser instance
+ *
+ * Release the current record without triggering a check for the
+ * next record. Callers must invoke tls_strp_check_rcv() before
+ * releasing the socket lock, or queued data will stall until
+ * the next tls_strp_data_ready() event.
+ */
+void tls_strp_msg_release(struct tls_strparser *strp)
 {
 	WARN_ON(!strp->stm.full_len);
 
@@ -592,7 +601,11 @@ void tls_strp_msg_done(struct tls_strparser *strp)
 
 	WRITE_ONCE(strp->msg_ready, 0);
 	memset(&strp->stm, 0, sizeof(strp->stm));
+}
 
+void tls_strp_msg_done(struct tls_strparser *strp)
+{
+	tls_strp_msg_release(strp);
 	tls_strp_check_rcv(strp);
 }
 

-- 
2.53.0



* [PATCH net-next v5 5/6] tls: Suppress spurious saved_data_ready on all receive paths
  2026-03-24 12:53 [PATCH net-next v5 0/6] TLS read_sock performance scalability Chuck Lever
                   ` (3 preceding siblings ...)
  2026-03-24 12:53 ` [PATCH net-next v5 4/6] tls: Factor tls_strp_msg_release() from tls_strp_msg_done() Chuck Lever
@ 2026-03-24 12:53 ` Chuck Lever
  2026-03-24 12:53 ` [PATCH net-next v5 6/6] tls: Flush backlog before waiting for a new record Chuck Lever
  2026-03-26  9:10 ` [PATCH net-next v5 0/6] TLS read_sock performance scalability patchwork-bot+netdevbpf
  6 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2026-03-24 12:53 UTC (permalink / raw)
  To: john.fastabend, kuba, sd
  Cc: netdev, kernel-tls-handshake, Chuck Lever, Alistair Francis,
	Hannes Reinecke

From: Chuck Lever <chuck.lever@oracle.com>

Each record release via tls_strp_msg_done() triggers
tls_strp_check_rcv(), which calls tls_rx_msg_ready() and
fires saved_data_ready(). During a multi-record receive,
the first N-1 wakeups are pure overhead: the caller is
already running and will pick up subsequent records on
the next loop iteration. On the splice_read path the
per-record wakeup is similarly unnecessary because the
caller still holds the socket lock.

Replace tls_strp_msg_done() with tls_strp_msg_release()
in all three receive paths (read_sock, recvmsg,
splice_read), deferring the tls_strp_check_rcv() call
to each path's exit point. Remove the direct
tls_rx_msg_ready() calls from tls_strp_copyin() and
tls_strp_read_sock() so that parsing a record no
longer fires the callback, and add a @wake parameter
to tls_strp_check_rcv() so callers can parse queued
data without notifying.

With no remaining callers, tls_strp_msg_done() and its
wrapper tls_rx_rec_done() are removed.

Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/tls/tls.h      |  3 +--
 net/tls/tls_main.c |  2 +-
 net/tls/tls_strp.c | 35 +++++++++++++++++++++--------------
 net/tls/tls_sw.c   | 22 +++++++++++++++-------
 4 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/net/tls/tls.h b/net/tls/tls.h
index a97f1acef31d..f41dac6305f4 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -192,9 +192,8 @@ void tls_strp_stop(struct tls_strparser *strp);
 int tls_strp_init(struct tls_strparser *strp, struct sock *sk);
 void tls_strp_data_ready(struct tls_strparser *strp);
 
-void tls_strp_check_rcv(struct tls_strparser *strp);
+void tls_strp_check_rcv(struct tls_strparser *strp, bool wake);
 void tls_strp_msg_release(struct tls_strparser *strp);
-void tls_strp_msg_done(struct tls_strparser *strp);
 
 int tls_rx_msg_size(struct tls_strparser *strp, struct sk_buff *skb);
 void tls_rx_msg_ready(struct tls_strparser *strp);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index fd39acf41a61..c10a3fd7fc17 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -769,7 +769,7 @@ static int do_tls_setsockopt_conf(struct sock *sk, sockptr_t optval,
 	} else {
 		struct tls_sw_context_rx *rx_ctx = tls_sw_ctx_rx(ctx);
 
-		tls_strp_check_rcv(&rx_ctx->strp);
+		tls_strp_check_rcv(&rx_ctx->strp, true);
 	}
 	return 0;
 
diff --git a/net/tls/tls_strp.c b/net/tls/tls_strp.c
index a7648ebde162..b0b2ca92fa99 100644
--- a/net/tls/tls_strp.c
+++ b/net/tls/tls_strp.c
@@ -368,7 +368,6 @@ static int tls_strp_copyin(read_descriptor_t *desc, struct sk_buff *in_skb,
 		desc->count = 0;
 
 		WRITE_ONCE(strp->msg_ready, 1);
-		tls_rx_msg_ready(strp);
 	}
 
 	return ret;
@@ -539,18 +538,32 @@ static int tls_strp_read_sock(struct tls_strparser *strp)
 		return tls_strp_read_copy(strp, false);
 
 	WRITE_ONCE(strp->msg_ready, 1);
-	tls_rx_msg_ready(strp);
 
 	return 0;
 }
 
-void tls_strp_check_rcv(struct tls_strparser *strp)
+/**
+ * tls_strp_check_rcv - parse queued data and optionally notify
+ * @strp: TLS stream parser instance
+ * @wake: if true, fire consumer notification when a message is ready
+ *
+ * When @wake is false, queued data is parsed without consumer
+ * notification. A subsequent call with @wake set to true is
+ * required before the socket lock is released; otherwise queued
+ * data stalls until the next tls_strp_data_ready() event.
+ */
+void tls_strp_check_rcv(struct tls_strparser *strp, bool wake)
 {
-	if (unlikely(strp->stopped) || strp->msg_ready)
+	if (unlikely(strp->stopped))
 		return;
 
-	if (tls_strp_read_sock(strp) == -ENOMEM)
-		queue_work(tls_strp_wq, &strp->work);
+	if (!strp->msg_ready) {
+		if (tls_strp_read_sock(strp) == -ENOMEM)
+			queue_work(tls_strp_wq, &strp->work);
+	}
+
+	if (wake && strp->msg_ready)
+		tls_rx_msg_ready(strp);
 }
 
 /* Lower sock lock held */
@@ -568,7 +581,7 @@ void tls_strp_data_ready(struct tls_strparser *strp)
 		return;
 	}
 
-	tls_strp_check_rcv(strp);
+	tls_strp_check_rcv(strp, true);
 }
 
 static void tls_strp_work(struct work_struct *w)
@@ -577,7 +590,7 @@ static void tls_strp_work(struct work_struct *w)
 		container_of(w, struct tls_strparser, work);
 
 	lock_sock(strp->sk);
-	tls_strp_check_rcv(strp);
+	tls_strp_check_rcv(strp, true);
 	release_sock(strp->sk);
 }
 
@@ -603,12 +616,6 @@ void tls_strp_msg_release(struct tls_strparser *strp)
 	memset(&strp->stm, 0, sizeof(strp->stm));
 }
 
-void tls_strp_msg_done(struct tls_strparser *strp)
-{
-	tls_strp_msg_release(strp);
-	tls_strp_check_rcv(strp);
-}
-
 void tls_strp_stop(struct tls_strparser *strp)
 {
 	strp->stopped = 1;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 5fdd43a55f1e..8fb2f2a93846 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1373,7 +1373,11 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
 			return ret;
 
 		if (!skb_queue_empty(&sk->sk_receive_queue)) {
-			tls_strp_check_rcv(&ctx->strp);
+			/* Defer notification to the exit point;
+			 * this thread will consume the record
+			 * directly.
+			 */
+			tls_strp_check_rcv(&ctx->strp, false);
 			if (tls_strp_msg_ready(ctx))
 				break;
 		}
@@ -1859,9 +1863,9 @@ static int tls_record_content_type(struct msghdr *msg, struct tls_msg *tlm,
 	return 1;
 }
 
-static void tls_rx_rec_done(struct tls_sw_context_rx *ctx)
+static void tls_rx_rec_release(struct tls_sw_context_rx *ctx)
 {
-	tls_strp_msg_done(&ctx->strp);
+	tls_strp_msg_release(&ctx->strp);
 }
 
 /* This function traverses the rx_list in tls receive context to copies the
@@ -2142,7 +2146,7 @@ int tls_sw_recvmsg(struct sock *sk,
 		err = tls_record_content_type(msg, tls_msg(darg.skb), &control);
 		if (err <= 0) {
 			DEBUG_NET_WARN_ON_ONCE(darg.zc);
-			tls_rx_rec_done(ctx);
+			tls_rx_rec_release(ctx);
 put_on_rx_list_err:
 			__skb_queue_tail(&ctx->rx_list, darg.skb);
 			goto recv_end;
@@ -2156,7 +2160,8 @@ int tls_sw_recvmsg(struct sock *sk,
 		/* TLS 1.3 may have updated the length by more than overhead */
 		rxm = strp_msg(darg.skb);
 		chunk = rxm->full_len;
-		tls_rx_rec_done(ctx);
+		tls_rx_rec_release(ctx);
+		tls_strp_check_rcv(&ctx->strp, false);
 
 		if (!darg.zc) {
 			bool partially_consumed = chunk > len;
@@ -2250,6 +2255,7 @@ int tls_sw_recvmsg(struct sock *sk,
 	copied += decrypted;
 
 end:
+	tls_strp_check_rcv(&ctx->strp, true);
 	tls_rx_reader_unlock(sk, ctx);
 	if (psock)
 		sk_psock_put(sk, psock);
@@ -2290,7 +2296,7 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
 		if (err < 0)
 			goto splice_read_end;
 
-		tls_rx_rec_done(ctx);
+		tls_rx_rec_release(ctx);
 		skb = darg.skb;
 	}
 
@@ -2317,6 +2323,7 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
 	consume_skb(skb);
 
 splice_read_end:
+	tls_strp_check_rcv(&ctx->strp, true);
 	tls_rx_reader_unlock(sk, ctx);
 	return copied ? : err;
 
@@ -2382,7 +2389,7 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
 			tlm = tls_msg(skb);
 			decrypted += rxm->full_len;
 
-			tls_rx_rec_done(ctx);
+			tls_rx_rec_release(ctx);
 		}
 
 		/* read_sock does not support reading control messages */
@@ -2412,6 +2419,7 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
 	}
 
 read_sock_end:
+	tls_strp_check_rcv(&ctx->strp, true);
 	tls_rx_reader_release(sk, ctx);
 	return copied ? : err;
 

-- 
2.53.0



* [PATCH net-next v5 6/6] tls: Flush backlog before waiting for a new record
  2026-03-24 12:53 [PATCH net-next v5 0/6] TLS read_sock performance scalability Chuck Lever
                   ` (4 preceding siblings ...)
  2026-03-24 12:53 ` [PATCH net-next v5 5/6] tls: Suppress spurious saved_data_ready on all receive paths Chuck Lever
@ 2026-03-24 12:53 ` Chuck Lever
  2026-03-24 16:18   ` Sabrina Dubroca
  2026-03-26  9:10 ` [PATCH net-next v5 0/6] TLS read_sock performance scalability patchwork-bot+netdevbpf
  6 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever @ 2026-03-24 12:53 UTC (permalink / raw)
  To: john.fastabend, kuba, sd
  Cc: netdev, kernel-tls-handshake, Chuck Lever, Hannes Reinecke

From: Chuck Lever <chuck.lever@oracle.com>

While lock_sock is held, incoming TCP segments land on
sk->sk_backlog rather than sk->sk_receive_queue.
tls_rx_rec_wait() inspects only sk_receive_queue, so
backlog data remains invisible. For non-blocking callers
(read_sock, and recvmsg or splice_read with MSG_DONTWAIT)
this causes a spurious -EAGAIN. For blocking callers it
forces an unnecessary sleep/wakeup cycle.

Flush the backlog inside tls_rx_rec_wait() before checking
sk_receive_queue so the strparser can parse newly-arrived
segments immediately.

Fixes: 20ffc7adf53a ("net/tls: missing received data after fast remote close")
Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/tls/tls_sw.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 8fb2f2a93846..84c4ae0330d1 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1372,6 +1372,7 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
 		if (ret < 0)
 			return ret;
 
+		sk_flush_backlog(sk);
 		if (!skb_queue_empty(&sk->sk_receive_queue)) {
 			/* Defer notification to the exit point;
 			 * this thread will consume the record

-- 
2.53.0



* Re: [PATCH net-next v5 6/6] tls: Flush backlog before waiting for a new record
  2026-03-24 12:53 ` [PATCH net-next v5 6/6] tls: Flush backlog before waiting for a new record Chuck Lever
@ 2026-03-24 16:18   ` Sabrina Dubroca
  2026-03-24 19:07     ` Chuck Lever
  0 siblings, 1 reply; 12+ messages in thread
From: Sabrina Dubroca @ 2026-03-24 16:18 UTC (permalink / raw)
  To: Chuck Lever
  Cc: john.fastabend, kuba, netdev, kernel-tls-handshake, Chuck Lever,
	Hannes Reinecke

2026-03-24, 08:53:28 -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> While lock_sock is held, incoming TCP segments land on
> sk->sk_backlog rather than sk->sk_receive_queue.
> tls_rx_rec_wait() inspects only sk_receive_queue, so
> backlog data remains invisible. For non-blocking callers
> (read_sock, and recvmsg or splice_read with MSG_DONTWAIT)
> this causes a spurious -EAGAIN. For blocking callers it
> forces an unnecessary sleep/wakeup cycle.
> 
> Flush the backlog inside tls_rx_rec_wait() before checking
> sk_receive_queue so the strparser can parse newly-arrived
> segments immediately.
> 
> Fixes: 20ffc7adf53a ("net/tls: missing received data after fast remote close")

How did you pick that Fixes tag? That commit mentions FIN/connection
closing, which doesn't seem related to the local backlog.

And it's quite possible there was a similar problem when kTLS was
using the generic strparser, but the code has changed so much with
84c61fe1a75b ("tls: rx: do not use the standard strparser") and the
work around that, that blaming something older probably doesn't make
too much sense.

> Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  net/tls/tls_sw.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 8fb2f2a93846..84c4ae0330d1 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -1372,6 +1372,7 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
>  		if (ret < 0)
>  			return ret;
>  
> +		sk_flush_backlog(sk);

Do we need to update released when this returns true, like callers of
tls_read_flush_backlog() do? I also wonder if we'd want to update the
caller's flushed_at to avoid bypassing the "smart checks" in
tls_read_flush_backlog().

>  		if (!skb_queue_empty(&sk->sk_receive_queue)) {
>  			/* Defer notification to the exit point;
>  			 * this thread will consume the record

-- 
Sabrina


* Re: [PATCH net-next v5 6/6] tls: Flush backlog before waiting for a new record
  2026-03-24 16:18   ` Sabrina Dubroca
@ 2026-03-24 19:07     ` Chuck Lever
  2026-03-26  8:59       ` Sabrina Dubroca
  0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever @ 2026-03-24 19:07 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: john.fastabend, Jakub Kicinski, netdev, kernel-tls-handshake,
	Chuck Lever, Hannes Reinecke



On Tue, Mar 24, 2026, at 12:18 PM, Sabrina Dubroca wrote:
> 2026-03-24, 08:53:28 -0400, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>> 
>> While lock_sock is held, incoming TCP segments land on
>> sk->sk_backlog rather than sk->sk_receive_queue.
>> tls_rx_rec_wait() inspects only sk_receive_queue, so
>> backlog data remains invisible. For non-blocking callers
>> (read_sock, and recvmsg or splice_read with MSG_DONTWAIT)
>> this causes a spurious -EAGAIN. For blocking callers it
>> forces an unnecessary sleep/wakeup cycle.
>> 
>> Flush the backlog inside tls_rx_rec_wait() before checking
>> sk_receive_queue so the strparser can parse newly-arrived
>> segments immediately.
>> 
>> Fixes: 20ffc7adf53a ("net/tls: missing received data after fast remote close")
>
> How did you pick that Fixes tag? That commit mentions FIN/connection
> closing, which doesn't seem related to the local backlog.

20ffc7adf53a introduced the sk_receive_queue check inside the
wait loop (then called tls_wait_data(), later refactored into
tls_rx_rec_wait()).

When lock_sock is held, incoming TCP segments land on
sk->sk_backlog, not sk->sk_receive_queue. The sk_receive_queue
check introduced by 20ffc7adf53a doesn't see backlog data.


>> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
>> index 8fb2f2a93846..84c4ae0330d1 100644
>> --- a/net/tls/tls_sw.c
>> +++ b/net/tls/tls_sw.c
>> @@ -1372,6 +1372,7 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
>>  		if (ret < 0)
>>  			return ret;
>>  
>> +		sk_flush_backlog(sk);
>
> Do we need to update released when this returns true, like callers of
> tls_read_flush_backlog() do?

Good catch. v6 will do that.


> I also wonder if we'd want to update the
> caller's flushed_at to avoid bypassing the "smart checks" in
> tls_read_flush_backlog().

The flush in tls_rx_rec_wait() only fires when the loop finds
no ready message, which is the cold path. The redundant flush
from tls_read_flush_backlog() on the next iteration is wasteful
but harmless. I'm not sure the additional complexity would be
worth it, but if you believe it will add some value, let me
know and I will add it.


>>  		if (!skb_queue_empty(&sk->sk_receive_queue)) {
>>  			/* Defer notification to the exit point;
>>  			 * this thread will consume the record
>
> -- 
> Sabrina

-- 
Chuck Lever


* Re: [PATCH net-next v5 6/6] tls: Flush backlog before waiting for a new record
  2026-03-24 19:07     ` Chuck Lever
@ 2026-03-26  8:59       ` Sabrina Dubroca
  0 siblings, 0 replies; 12+ messages in thread
From: Sabrina Dubroca @ 2026-03-26  8:59 UTC (permalink / raw)
  To: Chuck Lever
  Cc: john.fastabend, Jakub Kicinski, netdev, kernel-tls-handshake,
	Chuck Lever, Hannes Reinecke

2026-03-24, 15:07:00 -0400, Chuck Lever wrote:
> 
> 
> On Tue, Mar 24, 2026, at 12:18 PM, Sabrina Dubroca wrote:
> > 2026-03-24, 08:53:28 -0400, Chuck Lever wrote:
> >> From: Chuck Lever <chuck.lever@oracle.com>
> >> 
> >> While lock_sock is held, incoming TCP segments land on
> >> sk->sk_backlog rather than sk->sk_receive_queue.
> >> tls_rx_rec_wait() inspects only sk_receive_queue, so
> >> backlog data remains invisible. For non-blocking callers
> >> (read_sock, and recvmsg or splice_read with MSG_DONTWAIT)
> >> this causes a spurious -EAGAIN. For blocking callers it
> >> forces an unnecessary sleep/wakeup cycle.
> >> 
> >> Flush the backlog inside tls_rx_rec_wait() before checking
> >> sk_receive_queue so the strparser can parse newly-arrived
> >> segments immediately.
> >> 
> >> Fixes: 20ffc7adf53a ("net/tls: missing received data after fast remote close")
> >
> > How did you pick that Fixes tag? That commit mentions FIN/connection
> > closing, which doesn't seem related to the local backlog.
> 
> 20ffc7adf53a introduced the sk_receive_queue check inside the
> wait loop (then called tls_wait_data(), later refactored into
> tls_rx_rec_wait()).
> 
> When lock_sock is held, incoming TCP segments land on
> sk->sk_backlog, not sk->sk_receive_queue. The sk_receive_queue
> check introduced by 20ffc7adf53a doesn't see backlog data.

But without this check, we'd go straight to the EAGAIN/sleep cycle, so
things were even worse before?

(btw, you don't need a Fixes tag for net-next patches. if you think
this is a bug, the patch should be extracted from this series and
submitted separately for the "net" tree, with the appropriate Fixes
tag)


> >> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> >> index 8fb2f2a93846..84c4ae0330d1 100644
> >> --- a/net/tls/tls_sw.c
> >> +++ b/net/tls/tls_sw.c
> >> @@ -1372,6 +1372,7 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
> >>  		if (ret < 0)
> >>  			return ret;
> >>  
> >> +		sk_flush_backlog(sk);
> >
> > Do we need to update released when this returns true, like callers of
> > tls_read_flush_backlog() do?
> 
> Good catch. v6 will do that.
> 
> 
> > I also wonder if we'd want to update the
> > caller's flushed_at to avoid bypassing the "smart checks" in
> > tls_read_flush_backlog().
> 
> The flush in tls_rx_rec_wait() only fires when the loop finds
> no ready message, which is the cold path.

Right.

> The redundant flush
> from tls_read_flush_backlog() on the next iteration is wasteful
> but harmless. I'm not sure the additional complexity would be
> worth it, but if you believe it will add some value, let me
> know and I will add it.

No, that sounds ok. But probably worth a quick mention in the commit
message about this wasteful sk_flush_backlog() when we've already gone
on the cold path, and we can revisit in the future if needed.

-- 
Sabrina

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next v5 0/6] TLS read_sock performance scalability
  2026-03-24 12:53 [PATCH net-next v5 0/6] TLS read_sock performance scalability Chuck Lever
                   ` (5 preceding siblings ...)
  2026-03-24 12:53 ` [PATCH net-next v5 6/6] tls: Flush backlog before waiting for a new record Chuck Lever
@ 2026-03-26  9:10 ` patchwork-bot+netdevbpf
  6 siblings, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-03-26  9:10 UTC (permalink / raw)
  To: Chuck Lever
  Cc: john.fastabend, kuba, sd, netdev, kernel-tls-handshake,
	chuck.lever, hare, alistair.francis

Hello:

This series was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Tue, 24 Mar 2026 08:53:22 -0400 you wrote:
> I'd like to encourage in-kernel kTLS consumers (i.e., NFS and
> NVMe/TCP) to coalesce on the use of read_sock. When I suggested
> this to Hannes, he reported a number of nagging performance
> scalability issues with read_sock. This series is an attempt to
> run these issues down and get them fixed before we convert the
> above sock_recvmsg consumers over to read_sock.
> 
> [...]

Here is the summary with links:
  - [net-next,v5,1/6] tls: Purge async_hold in tls_decrypt_async_wait()
    https://git.kernel.org/netdev/net/c/84a8335d8300
  - [net-next,v5,2/6] tls: Abort the connection on decrypt failure
    (no matching commit)
  - [net-next,v5,3/6] tls: Fix dangling skb pointer in tls_sw_read_sock()
    (no matching commit)
  - [net-next,v5,4/6] tls: Factor tls_strp_msg_release() from tls_strp_msg_done()
    (no matching commit)
  - [net-next,v5,5/6] tls: Suppress spurious saved_data_ready on all receive paths
    (no matching commit)
  - [net-next,v5,6/6] tls: Flush backlog before waiting for a new record
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next v5 1/6] tls: Purge async_hold in tls_decrypt_async_wait()
  2026-03-24 12:53 ` [PATCH net-next v5 1/6] tls: Purge async_hold in tls_decrypt_async_wait() Chuck Lever
@ 2026-03-26 10:32   ` Hannes Reinecke
  0 siblings, 0 replies; 12+ messages in thread
From: Hannes Reinecke @ 2026-03-26 10:32 UTC (permalink / raw)
  To: Chuck Lever, john.fastabend, kuba, sd
  Cc: netdev, kernel-tls-handshake, Chuck Lever

On 3/24/26 13:53, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> The async_hold queue pins encrypted input skbs while
> the AEAD engine references their scatterlist data. Once
> tls_decrypt_async_wait() returns, every AEAD operation
> has completed and the engine no longer references those
> skbs, so they can be freed unconditionally.
> 
> A subsequent patch adds batch async decryption to
> tls_sw_read_sock(), introducing a new call site that
> must drain pending AEAD operations and release held
> skbs. Move __skb_queue_purge(&ctx->async_hold) into
> tls_decrypt_async_wait() so the purge is centralized
> and every caller -- recvmsg's drain path, the -EBUSY
> fallback in tls_do_decryption(), and the new read_sock
> batch path -- releases held skbs on synchronization
> without each site managing the purge independently.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   net/tls/tls_sw.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-03-26 10:32 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-24 12:53 [PATCH net-next v5 0/6] TLS read_sock performance scalability Chuck Lever
2026-03-24 12:53 ` [PATCH net-next v5 1/6] tls: Purge async_hold in tls_decrypt_async_wait() Chuck Lever
2026-03-26 10:32   ` Hannes Reinecke
2026-03-24 12:53 ` [PATCH net-next v5 2/6] tls: Abort the connection on decrypt failure Chuck Lever
2026-03-24 12:53 ` [PATCH net-next v5 3/6] tls: Fix dangling skb pointer in tls_sw_read_sock() Chuck Lever
2026-03-24 12:53 ` [PATCH net-next v5 4/6] tls: Factor tls_strp_msg_release() from tls_strp_msg_done() Chuck Lever
2026-03-24 12:53 ` [PATCH net-next v5 5/6] tls: Suppress spurious saved_data_ready on all receive paths Chuck Lever
2026-03-24 12:53 ` [PATCH net-next v5 6/6] tls: Flush backlog before waiting for a new record Chuck Lever
2026-03-24 16:18   ` Sabrina Dubroca
2026-03-24 19:07     ` Chuck Lever
2026-03-26  8:59       ` Sabrina Dubroca
2026-03-26  9:10 ` [PATCH net-next v5 0/6] TLS read_sock performance scalability patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox