* [PATCH net-next v9 0/5] TLS read_sock performance scalability
@ 2026-04-29 21:48 Chuck Lever
2026-04-29 21:48 ` [PATCH net-next v9 1/5] tls: Abort the connection on decrypt failure Chuck Lever
` (6 more replies)
0 siblings, 7 replies; 16+ messages in thread
From: Chuck Lever @ 2026-04-29 21:48 UTC (permalink / raw)
To: John Fastabend, Jakub Kicinski, Sabrina Dubroca
Cc: Eric Dumazet, Simon Horman, Paolo Abeni, netdev,
kernel-tls-handshake, Chuck Lever, Hannes Reinecke,
Alistair Francis
I'd like to encourage in-kernel kTLS consumers (i.e., NFS and
NVMe/TCP) to coalesce on the use of read_sock. When I suggested
this to Hannes, he reported a few performance scalability issues
with read_sock. However, batch async decryption and its
submit/deliver scaffolding were dropped from this series because
async_capable is always false for TLS 1.3, the TLS version that
NFS and NVMe/TCP both require. Async crypto support for TLS 1.3
is a prerequisite for revisiting that work.
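For context, the interface both consumers would share is the
->read_sock() entry point in struct proto_ops. Below is a minimal
sketch of a caller, assuming the usual sock/skbuff headers;
example_read_actor(), example_receive(), and struct example_buf are
illustrative names only, not code from NFS or NVMe/TCP:

	struct example_buf {
		void	*data;
		size_t	size;
		size_t	used;
	};

	/* Hypothetical actor: copy decrypted record bytes into the
	 * caller's buffer. Returning fewer bytes than offered signals
	 * backpressure to the read_sock implementation.
	 */
	static int example_read_actor(read_descriptor_t *desc, struct sk_buff *skb,
				      unsigned int offset, size_t len)
	{
		struct example_buf *buf = desc->arg.data;
		size_t want = min_t(size_t, len, desc->count);

		if (skb_copy_bits(skb, offset, buf->data + buf->used, want))
			return -EFAULT;
		buf->used += want;
		desc->count -= want;
		return want;
	}

	static int example_receive(struct socket *sock, struct example_buf *buf)
	{
		read_descriptor_t desc = {
			.arg.data	= buf,
			.count		= buf->size,
		};

		/* On a kTLS socket this resolves to tls_sw_read_sock(),
		 * which decrypts each record and feeds the plaintext to
		 * the actor.
		 */
		return sock->ops->read_sock(sock->sk, &desc, example_read_actor);
	}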
This series is now only a set of clean-ups. Support for async
has been deferred until after TLS KeyUpdate has been merged.
---
Changes since v8:
- Address review comments from sashiko
- Patch 2: Requeue partially consumed skb to prevent leak
- Patch 5: Re-check sk_err so RST during flush surfaces as
-ECONNRESET instead of EOF
- Address review comments from gpt-5.5
- Patch 4: Restore msg_ready early-return in tls_strp_check_rcv()
so the queued strp_work doesn't double-wake the consumer
- Patch 4: Add tls_strparser msg_announced bit so the recvmsg
exit-point handoff doesn't re-fire saved_data_ready() for a
record BH or the worker already announced (rx_list-only drain
path)
Changes since v7:
- Rebased on net-next (v7.1-rc1)
Changes since v6:
- Rebased on net-next, v5's 1/6 was merged upstream
Changes since v5:
- Patch 6: Set released = true when sk_flush_backlog() returns
true, so tls_strp_msg_load() knows the socket lock was
released (Sabrina)
- Patch 6: Drop Fixes tag; submit bug fix separately via net
if warranted (Sabrina)
- Patch 6: Note redundant flush on cold path in commit message
(Sabrina)
Changes since v4:
- Drop batch async decryption and submit/deliver restructure:
async_capable is always false for TLS 1.3, so the new code
was unreachable for NFS and NVMe/TCP
- Purge async_hold directly in tls_decrypt_async_wait() and drop
the tls_decrypt_async_drain() wrapper
- Merge tls_strp_check_rcv_quiet() into tls_strp_check_rcv() with
a bool wake parameter; fix lost wakeup on the recvmsg exit path
Changes since v3:
- Clarify why tls_decrypt_async_drain() is separate from _wait()
- Fold tls_err_abort() into tls_rx_one_record(), drop tls_rx_decrypt_record()
- Move backlog flush into tls_rx_rec_wait() so all RX paths benefit
Changes since v2:
- Fix short read self tests
Changes since v1:
- Add C11 reference
- Extend data_ready reduction to recvmsg and splice
- Restructure read_sock and recvmsg using shared helpers
---
Chuck Lever (5):
tls: Abort the connection on decrypt failure
tls: Fix dangling skb pointer in tls_sw_read_sock()
tls: Factor tls_strp_msg_release() from tls_strp_msg_done()
tls: Suppress spurious saved_data_ready on all receive paths
tls: Flush backlog before waiting for a new record
include/net/tls.h | 4 +++
net/tls/tls.h | 4 +--
net/tls/tls_main.c | 2 +-
net/tls/tls_strp.c | 36 ++++++++++++++++++++------
net/tls/tls_sw.c | 76 ++++++++++++++++++++++++++++++++++++------------------
5 files changed, 86 insertions(+), 36 deletions(-)
---
base-commit: 09942ddedcb960f9e78fd817ec33f501d1040c5b
change-id: 20260317-tls-read-sock-a0022c9df265
Best regards,
--
Chuck Lever
* [PATCH net-next v9 1/5] tls: Abort the connection on decrypt failure
2026-04-29 21:48 [PATCH net-next v9 0/5] TLS read_sock performance scalability Chuck Lever
@ 2026-04-29 21:48 ` Chuck Lever
2026-05-03 1:20 ` Jakub Kicinski
2026-04-29 21:48 ` [PATCH net-next v9 2/5] tls: Fix dangling skb pointer in tls_sw_read_sock() Chuck Lever
` (5 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2026-04-29 21:48 UTC (permalink / raw)
To: John Fastabend, Jakub Kicinski, Sabrina Dubroca
Cc: Eric Dumazet, Simon Horman, Paolo Abeni, netdev,
kernel-tls-handshake, Chuck Lever, Hannes Reinecke
From: Chuck Lever <chuck.lever@oracle.com>
recvmsg, read_sock, and splice_read each open-code a
tls_err_abort() call after tls_rx_one_record() fails. Move
the abort into tls_rx_one_record() so each receive path
shares a single decrypt-and-abort sequence.
A tls_check_pending_rekey() failure after successful
decryption no longer triggers tls_err_abort(). That path
fires only when skb_copy_bits() fails on a valid skb,
which is not a realistic scenario.
Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/tls/tls_sw.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 94d2ae0daa8c..244ac8ed4b01 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1809,6 +1809,9 @@ static int tls_check_pending_rekey(struct sock *sk, struct tls_context *ctx,
return 0;
}
+/* Decrypt and return one TLS record. On decrypt failure the connection is
+ * aborted (sk_err set) before returning a negative errno.
+ */
static int tls_rx_one_record(struct sock *sk, struct msghdr *msg,
struct tls_decrypt_arg *darg)
{
@@ -1820,8 +1823,10 @@ static int tls_rx_one_record(struct sock *sk, struct msghdr *msg,
err = tls_decrypt_device(sk, msg, tls_ctx, darg);
if (!err)
err = tls_decrypt_sw(sk, tls_ctx, msg, darg);
- if (err < 0)
+ if (err < 0) {
+ tls_err_abort(sk, -EBADMSG);
return err;
+ }
rxm = strp_msg(darg->skb);
rxm->offset += prot->prepend_size;
@@ -2132,10 +2137,8 @@ int tls_sw_recvmsg(struct sock *sk,
darg.async = false;
err = tls_rx_one_record(sk, msg, &darg);
- if (err < 0) {
- tls_err_abort(sk, -EBADMSG);
+ if (err < 0)
goto recv_end;
- }
async |= darg.async;
@@ -2294,10 +2297,8 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
memset(&darg.inargs, 0, sizeof(darg.inargs));
err = tls_rx_one_record(sk, NULL, &darg);
- if (err < 0) {
- tls_err_abort(sk, -EBADMSG);
+ if (err < 0)
goto splice_read_end;
- }
tls_rx_rec_done(ctx);
skb = darg.skb;
@@ -2380,10 +2381,8 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
memset(&darg.inargs, 0, sizeof(darg.inargs));
err = tls_rx_one_record(sk, NULL, &darg);
- if (err < 0) {
- tls_err_abort(sk, -EBADMSG);
+ if (err < 0)
goto read_sock_end;
- }
released = tls_read_flush_backlog(sk, prot, INT_MAX,
0, decrypted,
--
2.53.0
* [PATCH net-next v9 2/5] tls: Fix dangling skb pointer in tls_sw_read_sock()
2026-04-29 21:48 [PATCH net-next v9 0/5] TLS read_sock performance scalability Chuck Lever
2026-04-29 21:48 ` [PATCH net-next v9 1/5] tls: Abort the connection on decrypt failure Chuck Lever
@ 2026-04-29 21:48 ` Chuck Lever
2026-05-03 1:05 ` Jakub Kicinski
2026-04-29 21:48 ` [PATCH net-next v9 3/5] tls: Factor tls_strp_msg_release() from tls_strp_msg_done() Chuck Lever
` (4 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2026-04-29 21:48 UTC (permalink / raw)
To: John Fastabend, Jakub Kicinski, Sabrina Dubroca
Cc: Eric Dumazet, Simon Horman, Paolo Abeni, netdev,
kernel-tls-handshake, Chuck Lever, Hannes Reinecke,
Alistair Francis
From: Chuck Lever <chuck.lever@oracle.com>
Two related defects in the receive loop of tls_sw_read_sock()
share a single fix.
Per ISO/IEC 9899:2011 section 6.2.4p2, a pointer value becomes
indeterminate when the object it points to reaches the end of
its lifetime; Annex J.2 classifies the use of such a value as
undefined behavior. consume_skb(skb) in the fully-consumed path
frees the skb, but the "do { } while (skb)" loop condition then
evaluates that freed pointer. Although the value is never
dereferenced -- the loop either continues and overwrites skb,
or exits -- any future change that adds a dereference between
consume_skb() and the loop condition would produce a silent
use-after-free.
Separately, when read_actor() consumes only part of a record
(used < rxm->full_len) but desc->count is still non-zero, the
existing code updates rxm in place and falls through to the
next loop iteration. The next iteration then unconditionally
overwrites skb without freeing or requeuing the partially
consumed buffer, leaking the skb and silently dropping stream
data.
A read_actor returning fewer bytes than offered is in every
case a backpressure signal; the only correct response is to
requeue and exit. Replace the do/while with an explicit
for (;;), requeue unconditionally on partial consume, and break
on exhausted desc->count after a full consume.
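Condensed, the loop shape after this patch looks like the following
(record acquisition, decryption, and the surrounding declarations are
elided; the diff below has the real code):

	for (;;) {
		/* ... dequeue or decrypt the next record into skb/rxm ... */

		used = read_actor(desc, skb, rxm->offset, rxm->full_len);
		if (used <= 0) {
			if (!copied)
				err = used;
			__skb_queue_tail(&ctx->rx_list, skb);
			break;
		}
		copied += used;

		if (used < rxm->full_len) {
			/* Partial consume == backpressure: requeue the
			 * remainder for the next call.
			 */
			rxm->offset += used;
			rxm->full_len -= used;
			__skb_queue_tail(&ctx->rx_list, skb);
			break;
		}

		consume_skb(skb);
		skb = NULL;		/* never evaluate the freed pointer */
		if (!desc->count)
			break;
	}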
Fixes: 662fbcec32f4 ("net/tls: implement ->read_sock()")
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/tls/tls_sw.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 244ac8ed4b01..c58d3b0b0a8a 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2366,7 +2366,7 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
goto read_sock_end;
decrypted = 0;
- do {
+ for (;;) {
if (!skb_queue_empty(&ctx->rx_list)) {
skb = __skb_dequeue(&ctx->rx_list);
rxm = strp_msg(skb);
@@ -2411,14 +2411,13 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
if (used < rxm->full_len) {
rxm->offset += used;
rxm->full_len -= used;
- if (!desc->count)
- goto read_sock_requeue;
- } else {
- consume_skb(skb);
- if (!desc->count)
- skb = NULL;
+ goto read_sock_requeue;
}
- } while (skb);
+ consume_skb(skb);
+ skb = NULL;
+ if (!desc->count)
+ break;
+ }
read_sock_end:
tls_rx_reader_release(sk, ctx);
--
2.53.0
* [PATCH net-next v9 3/5] tls: Factor tls_strp_msg_release() from tls_strp_msg_done()
2026-04-29 21:48 [PATCH net-next v9 0/5] TLS read_sock performance scalability Chuck Lever
2026-04-29 21:48 ` [PATCH net-next v9 1/5] tls: Abort the connection on decrypt failure Chuck Lever
2026-04-29 21:48 ` [PATCH net-next v9 2/5] tls: Fix dangling skb pointer in tls_sw_read_sock() Chuck Lever
@ 2026-04-29 21:48 ` Chuck Lever
2026-05-03 1:09 ` Jakub Kicinski
2026-04-29 21:48 ` [PATCH net-next v9 4/5] tls: Suppress spurious saved_data_ready on all receive paths Chuck Lever
` (3 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2026-04-29 21:48 UTC (permalink / raw)
To: John Fastabend, Jakub Kicinski, Sabrina Dubroca
Cc: Eric Dumazet, Simon Horman, Paolo Abeni, netdev,
kernel-tls-handshake, Chuck Lever, Hannes Reinecke,
Alistair Francis
From: Chuck Lever <chuck.lever@oracle.com>
tls_strp_msg_done() conflates releasing the current record with
checking for the next one via tls_strp_check_rcv(). Batch
processing requires releasing a record without immediately
triggering that check, so the release step is separated into
tls_strp_msg_release(). tls_strp_msg_done() is preserved as a
wrapper for existing callers.
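Sketched usage contract for the new helper (the caller shown here is
illustrative, not a function added by this patch):

	/* Caller holds the socket lock, e.g. via tls_rx_reader_lock(). */
	tls_strp_msg_release(&ctx->strp);	/* drop the record; no check for the next one */

	/* ... consume more data, e.g. from ctx->rx_list ... */

	tls_strp_check_rcv(&ctx->strp);		/* must run before the lock is
						 * released, or queued data stalls
						 * until the next
						 * tls_strp_data_ready() event
						 */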
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/tls/tls.h | 1 +
net/tls/tls_strp.c | 15 ++++++++++++++-
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/net/tls/tls.h b/net/tls/tls.h
index e8f81a006520..a97f1acef31d 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -193,6 +193,7 @@ int tls_strp_init(struct tls_strparser *strp, struct sock *sk);
void tls_strp_data_ready(struct tls_strparser *strp);
void tls_strp_check_rcv(struct tls_strparser *strp);
+void tls_strp_msg_release(struct tls_strparser *strp);
void tls_strp_msg_done(struct tls_strparser *strp);
int tls_rx_msg_size(struct tls_strparser *strp, struct sk_buff *skb);
diff --git a/net/tls/tls_strp.c b/net/tls/tls_strp.c
index 98e12f0ff57e..a7648ebde162 100644
--- a/net/tls/tls_strp.c
+++ b/net/tls/tls_strp.c
@@ -581,7 +581,16 @@ static void tls_strp_work(struct work_struct *w)
release_sock(strp->sk);
}
-void tls_strp_msg_done(struct tls_strparser *strp)
+/**
+ * tls_strp_msg_release - release the current strparser message
+ * @strp: TLS stream parser instance
+ *
+ * Release the current record without triggering a check for the
+ * next record. Callers must invoke tls_strp_check_rcv() before
+ * releasing the socket lock, or queued data will stall until
+ * the next tls_strp_data_ready() event.
+ */
+void tls_strp_msg_release(struct tls_strparser *strp)
{
WARN_ON(!strp->stm.full_len);
@@ -592,7 +601,11 @@ void tls_strp_msg_done(struct tls_strparser *strp)
WRITE_ONCE(strp->msg_ready, 0);
memset(&strp->stm, 0, sizeof(strp->stm));
+}
+void tls_strp_msg_done(struct tls_strparser *strp)
+{
+ tls_strp_msg_release(strp);
tls_strp_check_rcv(strp);
}
--
2.53.0
* [PATCH net-next v9 4/5] tls: Suppress spurious saved_data_ready on all receive paths
2026-04-29 21:48 [PATCH net-next v9 0/5] TLS read_sock performance scalability Chuck Lever
` (2 preceding siblings ...)
2026-04-29 21:48 ` [PATCH net-next v9 3/5] tls: Factor tls_strp_msg_release() from tls_strp_msg_done() Chuck Lever
@ 2026-04-29 21:48 ` Chuck Lever
2026-05-03 1:19 ` Jakub Kicinski
2026-04-29 21:48 ` [PATCH net-next v9 5/5] tls: Flush backlog before waiting for a new record Chuck Lever
` (2 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2026-04-29 21:48 UTC (permalink / raw)
To: John Fastabend, Jakub Kicinski, Sabrina Dubroca
Cc: Eric Dumazet, Simon Horman, Paolo Abeni, netdev,
kernel-tls-handshake, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Each record release via tls_strp_msg_done() triggers
tls_strp_check_rcv(), which calls tls_rx_msg_ready() and
fires saved_data_ready(). During a multi-record receive,
the first N-1 wakeups are pure overhead: the caller is
already running and will pick up subsequent records on
the next loop iteration. On the splice_read path the
per-record wakeup is similarly unnecessary because the
caller still holds the socket lock.
Replace tls_strp_msg_done() with tls_strp_msg_release()
in all three receive paths (read_sock, recvmsg,
splice_read), deferring the consumer notification to
each path's exit point. Factor tls_rx_msg_ready() out
of tls_strp_read_sock(), and add a @wake parameter to
tls_strp_check_rcv() so callers can parse queued data
without notifying. tls_strp_check_rcv() retains its
no-op-on-msg_ready semantics, so the BH and worker
notification paths fire saved_data_ready() at most once
per parsed record.
The exit points then invoke tls_rx_msg_ready() once,
covering records the inline parse loop left behind for
a subsequent reader. To keep that final notification
idempotent against records that BH or the worker has already
announced, tls_strparser gains a msg_announced bit:
tls_rx_msg_ready() sets it when firing saved_data_ready();
the bit is cleared whenever the parsed record is wiped,
by tls_strp_msg_release() on consumption or by
tls_strp_msg_load() when the lower socket loses bytes
from under the parse. A second call for the same parsed
record, as happens when recvmsg() satisfies the request
from ctx->rx_list without touching the strparser, is then
a no-op.
With no remaining callers, tls_strp_msg_done() and its
wrapper tls_rx_rec_done() are removed.
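Condensed into one composite, the receive-path pattern after this
patch is:

	/* Sketch only: example_rx_path() is not a function added by
	 * this patch; decryption, copying, locking, and error handling
	 * are elided.
	 */
	static void example_rx_path(struct tls_sw_context_rx *ctx)
	{
		while (tls_strp_msg_ready(ctx)) {
			/* ... decrypt the current record and copy it out ... */

			/* Drop the record without waking anyone: the
			 * first N-1 wakeups of a multi-record receive
			 * disappear here.
			 */
			tls_strp_msg_release(&ctx->strp);

			/* Parse the next record quietly (wake == false). */
			tls_strp_check_rcv(&ctx->strp, false);
		}

		/* One announcement at the exit point; tls_rx_msg_ready()
		 * is guarded by msg_announced, so a record BH or the
		 * worker already announced is not re-announced.
		 */
		tls_rx_handoff(ctx);
	}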
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
include/net/tls.h | 4 ++++
net/tls/tls.h | 3 +--
net/tls/tls_main.c | 2 +-
net/tls/tls_strp.c | 29 ++++++++++++++++++-----------
net/tls/tls_sw.c | 38 +++++++++++++++++++++++++++++++-------
5 files changed, 55 insertions(+), 21 deletions(-)
diff --git a/include/net/tls.h b/include/net/tls.h
index ebd2550280ae..b5c00a93f6ba 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -111,10 +111,14 @@ struct tls_sw_context_tx {
struct tls_strparser {
struct sock *sk;
+ /* Bitfield word and msg_ready are serialized by the lower
+ * socket lock; BH and worker contexts both acquire it.
+ */
u32 mark : 8;
u32 stopped : 1;
u32 copy_mode : 1;
u32 mixed_decrypted : 1;
+ u32 msg_announced : 1;
bool msg_ready;
diff --git a/net/tls/tls.h b/net/tls/tls.h
index a97f1acef31d..f41dac6305f4 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -192,9 +192,8 @@ void tls_strp_stop(struct tls_strparser *strp);
int tls_strp_init(struct tls_strparser *strp, struct sock *sk);
void tls_strp_data_ready(struct tls_strparser *strp);
-void tls_strp_check_rcv(struct tls_strparser *strp);
+void tls_strp_check_rcv(struct tls_strparser *strp, bool wake);
void tls_strp_msg_release(struct tls_strparser *strp);
-void tls_strp_msg_done(struct tls_strparser *strp);
int tls_rx_msg_size(struct tls_strparser *strp, struct sk_buff *skb);
void tls_rx_msg_ready(struct tls_strparser *strp);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index fd39acf41a61..c10a3fd7fc17 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -769,7 +769,7 @@ static int do_tls_setsockopt_conf(struct sock *sk, sockptr_t optval,
} else {
struct tls_sw_context_rx *rx_ctx = tls_sw_ctx_rx(ctx);
- tls_strp_check_rcv(&rx_ctx->strp);
+ tls_strp_check_rcv(&rx_ctx->strp, true);
}
return 0;
diff --git a/net/tls/tls_strp.c b/net/tls/tls_strp.c
index a7648ebde162..bf88fad58b9b 100644
--- a/net/tls/tls_strp.c
+++ b/net/tls/tls_strp.c
@@ -368,7 +368,6 @@ static int tls_strp_copyin(read_descriptor_t *desc, struct sk_buff *in_skb,
desc->count = 0;
WRITE_ONCE(strp->msg_ready, 1);
- tls_rx_msg_ready(strp);
}
return ret;
@@ -492,6 +491,7 @@ bool tls_strp_msg_load(struct tls_strparser *strp, bool force_refresh)
if (!strp->copy_mode && force_refresh) {
if (unlikely(tcp_inq(strp->sk) < strp->stm.full_len)) {
WRITE_ONCE(strp->msg_ready, 0);
+ strp->msg_announced = 0;
memset(&strp->stm, 0, sizeof(strp->stm));
return false;
}
@@ -539,18 +539,30 @@ static int tls_strp_read_sock(struct tls_strparser *strp)
return tls_strp_read_copy(strp, false);
WRITE_ONCE(strp->msg_ready, 1);
- tls_rx_msg_ready(strp);
return 0;
}
-void tls_strp_check_rcv(struct tls_strparser *strp)
+/**
+ * tls_strp_check_rcv - parse queued data and optionally notify
+ * @strp: TLS stream parser instance
+ * @wake: if true, fire consumer notification when a record is newly
+ * parsed by this call
+ *
+ * Returns immediately when a record is already ready; the wake fires
+ * only on transitions from no-record to record-ready. Callers that
+ * need to notify a waiter about a record parsed by another path
+ * should invoke tls_rx_msg_ready() directly.
+ */
+void tls_strp_check_rcv(struct tls_strparser *strp, bool wake)
{
if (unlikely(strp->stopped) || strp->msg_ready)
return;
if (tls_strp_read_sock(strp) == -ENOMEM)
queue_work(tls_strp_wq, &strp->work);
+ else if (wake && strp->msg_ready)
+ tls_rx_msg_ready(strp);
}
/* Lower sock lock held */
@@ -568,7 +580,7 @@ void tls_strp_data_ready(struct tls_strparser *strp)
return;
}
- tls_strp_check_rcv(strp);
+ tls_strp_check_rcv(strp, true);
}
static void tls_strp_work(struct work_struct *w)
@@ -577,7 +589,7 @@ static void tls_strp_work(struct work_struct *w)
container_of(w, struct tls_strparser, work);
lock_sock(strp->sk);
- tls_strp_check_rcv(strp);
+ tls_strp_check_rcv(strp, true);
release_sock(strp->sk);
}
@@ -600,15 +612,10 @@ void tls_strp_msg_release(struct tls_strparser *strp)
tls_strp_flush_anchor_copy(strp);
WRITE_ONCE(strp->msg_ready, 0);
+ strp->msg_announced = 0;
memset(&strp->stm, 0, sizeof(strp->stm));
}
-void tls_strp_msg_done(struct tls_strparser *strp)
-{
- tls_strp_msg_release(strp);
- tls_strp_check_rcv(strp);
-}
-
void tls_strp_stop(struct tls_strparser *strp)
{
strp->stopped = 1;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index c58d3b0b0a8a..cbb068266bab 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1383,7 +1383,11 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
return ret;
if (!skb_queue_empty(&sk->sk_receive_queue)) {
- tls_strp_check_rcv(&ctx->strp);
+ /* Defer notification to the exit point;
+ * this thread will consume the record
+ * directly.
+ */
+ tls_strp_check_rcv(&ctx->strp, false);
if (tls_strp_msg_ready(ctx))
break;
}
@@ -1869,9 +1873,17 @@ static int tls_record_content_type(struct msghdr *msg, struct tls_msg *tlm,
return 1;
}
-static void tls_rx_rec_done(struct tls_sw_context_rx *ctx)
+/* Parse any data left in the lower socket and hand off a single
+ * notification to the next reader. tls_rx_msg_ready() is a no-op
+ * when the current record has already been announced, so paths
+ * that drained ctx->rx_list without touching the strparser do
+ * not re-fire saved_data_ready() for a record BH or the worker
+ * already announced.
+ */
+static void tls_rx_handoff(struct tls_sw_context_rx *ctx)
{
- tls_strp_msg_done(&ctx->strp);
+ tls_strp_check_rcv(&ctx->strp, false);
+ tls_rx_msg_ready(&ctx->strp);
}
/* This function traverses the rx_list in tls receive context to copies the
@@ -2152,7 +2164,7 @@ int tls_sw_recvmsg(struct sock *sk,
err = tls_record_content_type(msg, tls_msg(darg.skb), &control);
if (err <= 0) {
DEBUG_NET_WARN_ON_ONCE(darg.zc);
- tls_rx_rec_done(ctx);
+ tls_strp_msg_release(&ctx->strp);
put_on_rx_list_err:
__skb_queue_tail(&ctx->rx_list, darg.skb);
goto recv_end;
@@ -2166,7 +2178,8 @@ int tls_sw_recvmsg(struct sock *sk,
/* TLS 1.3 may have updated the length by more than overhead */
rxm = strp_msg(darg.skb);
chunk = rxm->full_len;
- tls_rx_rec_done(ctx);
+ tls_strp_msg_release(&ctx->strp);
+ tls_strp_check_rcv(&ctx->strp, false);
if (!darg.zc) {
bool partially_consumed = chunk > len;
@@ -2260,6 +2273,7 @@ int tls_sw_recvmsg(struct sock *sk,
copied += decrypted;
end:
+ tls_rx_handoff(ctx);
tls_rx_reader_unlock(sk, ctx);
if (psock)
sk_psock_put(sk, psock);
@@ -2300,7 +2314,7 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
if (err < 0)
goto splice_read_end;
- tls_rx_rec_done(ctx);
+ tls_strp_msg_release(&ctx->strp);
skb = darg.skb;
}
@@ -2327,6 +2341,7 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
consume_skb(skb);
splice_read_end:
+ tls_rx_handoff(ctx);
tls_rx_reader_unlock(sk, ctx);
return copied ? : err;
@@ -2392,7 +2407,7 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
tlm = tls_msg(skb);
decrypted += rxm->full_len;
- tls_rx_rec_done(ctx);
+ tls_strp_msg_release(&ctx->strp);
}
/* read_sock does not support reading control messages */
@@ -2420,6 +2435,7 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
}
read_sock_end:
+ tls_rx_handoff(ctx);
tls_rx_reader_release(sk, ctx);
return copied ? : err;
@@ -2504,10 +2520,18 @@ int tls_rx_msg_size(struct tls_strparser *strp, struct sk_buff *skb)
return ret;
}
+/* Fire saved_data_ready() at most once per parsed record.
+ * msg_announced is cleared by tls_strp_msg_release() when the
+ * current record is consumed, arming the next announcement.
+ */
void tls_rx_msg_ready(struct tls_strparser *strp)
{
struct tls_sw_context_rx *ctx;
+ if (!READ_ONCE(strp->msg_ready) || strp->msg_announced)
+ return;
+ strp->msg_announced = 1;
+
ctx = container_of(strp, struct tls_sw_context_rx, strp);
ctx->saved_data_ready(strp->sk);
}
--
2.53.0
* [PATCH net-next v9 5/5] tls: Flush backlog before waiting for a new record
2026-04-29 21:48 [PATCH net-next v9 0/5] TLS read_sock performance scalability Chuck Lever
` (3 preceding siblings ...)
2026-04-29 21:48 ` [PATCH net-next v9 4/5] tls: Suppress spurious saved_data_ready on all receive paths Chuck Lever
@ 2026-04-29 21:48 ` Chuck Lever
2026-04-29 23:13 ` [PATCH net-next v9 0/5] TLS read_sock performance scalability Jakub Kicinski
2026-05-03 1:04 ` Jakub Kicinski
6 siblings, 0 replies; 16+ messages in thread
From: Chuck Lever @ 2026-04-29 21:48 UTC (permalink / raw)
To: John Fastabend, Jakub Kicinski, Sabrina Dubroca
Cc: Eric Dumazet, Simon Horman, Paolo Abeni, netdev,
kernel-tls-handshake, Chuck Lever, Hannes Reinecke
From: Chuck Lever <chuck.lever@oracle.com>
While lock_sock is held, incoming TCP segments land on
sk->sk_backlog rather than sk->sk_receive_queue.
tls_rx_rec_wait() inspects only sk_receive_queue, so backlog
data remains invisible. For non-blocking callers (read_sock,
and recvmsg or splice_read with MSG_DONTWAIT) this causes a
spurious -EAGAIN. For blocking callers it forces an
unnecessary sleep/wakeup cycle.
Flush the backlog inside tls_rx_rec_wait() before checking
sk_receive_queue so the strparser can parse newly-arrived
segments immediately. On the next loop iteration
tls_read_flush_backlog() may redundantly flush, but this
path is cold and the cost is negligible.
Backlog processing can run tcp_reset(), which sets both
sk->sk_err = ECONNRESET and (via tcp_done()) sk->sk_shutdown
= SHUTDOWN_MASK. The pre-existing sk_err check at the top of the
loop has already run by the time the flush happens, so without
further care the freshly-set error would be masked: the
sk_shutdown test that follows would return 0. Re-check sk_err
immediately before the sk_shutdown test so a connection
abort surfaces as -ECONNRESET rather than a clean EOF.
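Condensed, the resulting shape of the wait loop is below (psock
handling, timeouts, and the sleep itself are elided; see the hunks
for the exact placement):

	while (!tls_strp_msg_ready(ctx)) {
		if (sk->sk_err)
			return sock_error(sk);

		/* New: move backlogged segments to sk_receive_queue. */
		if (sk_flush_backlog(sk))
			released = true;

		if (!skb_queue_empty(&sk->sk_receive_queue)) {
			tls_strp_check_rcv(&ctx->strp, false);
			if (tls_strp_msg_ready(ctx))
				break;
		}

		/* New: the flush may have run tcp_reset(); re-check so
		 * the abort surfaces as -ECONNRESET, not EOF.
		 */
		if (sk->sk_err)
			return sock_error(sk);

		if (sk->sk_shutdown & RCV_SHUTDOWN)
			return 0;

		/* ... otherwise wait for more data to arrive ... */
	}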
Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/tls/tls_sw.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index cbb068266bab..b888aaa505c0 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1382,6 +1382,8 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
if (ret < 0)
return ret;
+ if (sk_flush_backlog(sk))
+ released = true;
if (!skb_queue_empty(&sk->sk_receive_queue)) {
/* Defer notification to the exit point;
* this thread will consume the record
@@ -1392,6 +1394,8 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
break;
}
+ if (sk->sk_err)
+ return sock_error(sk);
if (sk->sk_shutdown & RCV_SHUTDOWN)
return 0;
--
2.53.0
* Re: [PATCH net-next v9 0/5] TLS read_sock performance scalability
2026-04-29 21:48 [PATCH net-next v9 0/5] TLS read_sock performance scalability Chuck Lever
` (4 preceding siblings ...)
2026-04-29 21:48 ` [PATCH net-next v9 5/5] tls: Flush backlog before waiting for a new record Chuck Lever
@ 2026-04-29 23:13 ` Jakub Kicinski
2026-04-29 23:15 ` Chuck Lever
2026-05-03 1:04 ` Jakub Kicinski
6 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2026-04-29 23:13 UTC (permalink / raw)
To: Chuck Lever
Cc: John Fastabend, Sabrina Dubroca, Eric Dumazet, Simon Horman,
Paolo Abeni, netdev, kernel-tls-handshake, Chuck Lever,
Hannes Reinecke, Alistair Francis
On Wed, 29 Apr 2026 17:48:07 -0400 Chuck Lever wrote:
> Changes since v8:
> - Address review comments from sashiko
> - Patch 2: Requeue partially consumed skb to prevent leak
> - Patch 5: Re-check sk_err so RST during flush surfaces as
> -ECONNRESET instead of EOF
> - Address review comments from gpt-5.5
Is someone running gpt-5.5 on the public submissions?
> - Patch 4: Restore msg_ready early-return in tls_strp_check_rcv()
> so the queued strp_work doesn't double-wake the consumer
> - Patch 4: Add tls_strparser msg_announced bit so the recvmsg
> exit-point handoff doesn't re-fire saved_data_ready() for a
> record BH or the worker already announced (rx_list-only drain
> path)
* Re: [PATCH net-next v9 0/5] TLS read_sock performance scalability
2026-04-29 23:13 ` [PATCH net-next v9 0/5] TLS read_sock performance scalability Jakub Kicinski
@ 2026-04-29 23:15 ` Chuck Lever
0 siblings, 0 replies; 16+ messages in thread
From: Chuck Lever @ 2026-04-29 23:15 UTC (permalink / raw)
To: Jakub Kicinski
Cc: John Fastabend, Sabrina Dubroca, Eric Dumazet, Simon Horman,
Paolo Abeni, netdev, kernel-tls-handshake, Chuck Lever,
Hannes Reinecke, Alistair Francis
On Wed, Apr 29, 2026, at 7:13 PM, Jakub Kicinski wrote:
> On Wed, 29 Apr 2026 17:48:07 -0400 Chuck Lever wrote:
>> Changes since v8:
>> - Address review comments from sashiko
>> - Patch 2: Requeue partially consumed skb to prevent leak
>> - Patch 5: Re-check sk_err so RST during flush surfaces as
>> -ECONNRESET instead of EOF
>> - Address review comments from gpt-5.5
>
> Is someone running gpt-5.5 on the public submissions?
>
>> - Patch 4: Restore msg_ready early-return in tls_strp_check_rcv()
>> so the queued strp_work doesn't double-wake the consumer
>> - Patch 4: Add tls_strparser msg_announced bit so the recvmsg
>> exit-point handoff doesn't re-fire saved_data_ready() for a
>> record BH or the worker already announced (rx_list-only drain
>> path)
I'm using codex to review the patches.
--
Chuck Lever
* Re: [PATCH net-next v9 0/5] TLS read_sock performance scalability
2026-04-29 21:48 [PATCH net-next v9 0/5] TLS read_sock performance scalability Chuck Lever
` (5 preceding siblings ...)
2026-04-29 23:13 ` [PATCH net-next v9 0/5] TLS read_sock performance scalability Jakub Kicinski
@ 2026-05-03 1:04 ` Jakub Kicinski
2026-05-03 19:34 ` Chuck Lever
6 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2026-05-03 1:04 UTC (permalink / raw)
To: Chuck Lever
Cc: John Fastabend, Sabrina Dubroca, Eric Dumazet, Simon Horman,
Paolo Abeni, netdev, kernel-tls-handshake, Chuck Lever,
Hannes Reinecke, Alistair Francis
On Wed, 29 Apr 2026 17:48:07 -0400 Chuck Lever wrote:
> I'd like to encourage in-kernel kTLS consumers (i.e., NFS and
> NVMe/TCP) to coalesce on the use of read_sock. When I suggested
> this to Hannes, he reported a few performance scalability issues
> with read_sock.
Meaning, this series achieves.. what right now?
I mean - the headline is "performance scalability" and there's no
performance testing result in any of the messages :S
Patch 5 for instance "seems logical" but how much difference does
it make?
> However, batch async decryption and its
> submit/deliver scaffolding were dropped from this series because
> async_capable is always false for TLS 1.3, the TLS version that
> NFS and NVMe/TCP both require. Async crypto support for TLS 1.3
> is a prerequisite for revisiting that work.
>
> This series is now only a set of clean-ups. Support for async
> has been deferred until after TLS KeyUpdate has been merged.
What does "after TLS KeyUpdate has been merged" mean?
KeyUpdate is supported.. You mean in NFS? Or in async?
FTR async support is a major pain and we'd rather get rid of it
(and switch away from crypto API) than extend it.
* Re: [PATCH net-next v9 2/5] tls: Fix dangling skb pointer in tls_sw_read_sock()
2026-04-29 21:48 ` [PATCH net-next v9 2/5] tls: Fix dangling skb pointer in tls_sw_read_sock() Chuck Lever
@ 2026-05-03 1:05 ` Jakub Kicinski
0 siblings, 0 replies; 16+ messages in thread
From: Jakub Kicinski @ 2026-05-03 1:05 UTC (permalink / raw)
To: Chuck Lever
Cc: John Fastabend, Sabrina Dubroca, Eric Dumazet, Simon Horman,
Paolo Abeni, netdev, kernel-tls-handshake, Chuck Lever,
Hannes Reinecke, Alistair Francis
On Wed, 29 Apr 2026 17:48:09 -0400 Chuck Lever wrote:
> if (used < rxm->full_len) {
> rxm->offset += used;
> rxm->full_len -= used;
> - if (!desc->count)
> - goto read_sock_requeue;
> - } else {
> - consume_skb(skb);
> - if (!desc->count)
> - skb = NULL;
> + goto read_sock_requeue;
> }
> - } while (skb);
> + consume_skb(skb);
> + skb = NULL;
> + if (!desc->count)
> + break;
> + }
This diverges from how TCP behaves, AFAICT.
Short read is not a signal to break for TCP.
* Re: [PATCH net-next v9 3/5] tls: Factor tls_strp_msg_release() from tls_strp_msg_done()
2026-04-29 21:48 ` [PATCH net-next v9 3/5] tls: Factor tls_strp_msg_release() from tls_strp_msg_done() Chuck Lever
@ 2026-05-03 1:09 ` Jakub Kicinski
0 siblings, 0 replies; 16+ messages in thread
From: Jakub Kicinski @ 2026-05-03 1:09 UTC (permalink / raw)
To: Chuck Lever
Cc: John Fastabend, Sabrina Dubroca, Eric Dumazet, Simon Horman,
Paolo Abeni, netdev, kernel-tls-handshake, Chuck Lever,
Hannes Reinecke, Alistair Francis
On Wed, 29 Apr 2026 17:48:10 -0400 Chuck Lever wrote:
> -void tls_strp_msg_done(struct tls_strparser *strp)
> +/**
> + * tls_strp_msg_release - release the current strparser message
> + * @strp: TLS stream parser instance
> + *
> + * Release the current record without triggering a check for the
> + * next record. Callers must invoke tls_strp_check_rcv() before
> + * releasing the socket lock, or queued data will stall until
> + * the next tls_strp_data_ready() event.
> + */
Please respect local style - don't add kdoc on internal functions.
This is not exported, just add the "body" of the comment above
the function. Kdoc on internal functions is a waste of LOC and
it's easy to forget when adding arguments.
> +void tls_strp_msg_release(struct tls_strparser *strp)
release -> consume
In context of TLS we "release" a socket when we unlock it.
And we "consume" and skb when we free it. So "consume" matches
the semantics better, no?
* Re: [PATCH net-next v9 4/5] tls: Suppress spurious saved_data_ready on all receive paths
2026-04-29 21:48 ` [PATCH net-next v9 4/5] tls: Suppress spurious saved_data_ready on all receive paths Chuck Lever
@ 2026-05-03 1:19 ` Jakub Kicinski
0 siblings, 0 replies; 16+ messages in thread
From: Jakub Kicinski @ 2026-05-03 1:19 UTC (permalink / raw)
To: Chuck Lever
Cc: John Fastabend, Sabrina Dubroca, Eric Dumazet, Simon Horman,
Paolo Abeni, netdev, kernel-tls-handshake, Chuck Lever
On Wed, 29 Apr 2026 17:48:11 -0400 Chuck Lever wrote:
> -void tls_strp_check_rcv(struct tls_strparser *strp)
> +/**
> + * tls_strp_check_rcv - parse queued data and optionally notify
> + * @strp: TLS stream parser instance
> + * @wake: if true, fire consumer notification when a record is newly
> + * parsed by this call
> + *
> + * Returns immediately when a record is already ready; the wake fires
> + * only on transitions from no-record to record-ready. Callers that
> + * need to notify a waiter about a record parsed by another path
> + * should invoke tls_rx_msg_ready() directly.
> + */
nit/reminder: no kdoc
> +void tls_strp_check_rcv(struct tls_strparser *strp, bool wake)
I wonder if it'd make sense to s/wake/announce/ for consistency?
> {
> if (unlikely(strp->stopped) || strp->msg_ready)
> return;
>
> if (tls_strp_read_sock(strp) == -ENOMEM)
> queue_work(tls_strp_wq, &strp->work);
> + else if (wake && strp->msg_ready)
> + tls_rx_msg_ready(strp);
> }
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index c58d3b0b0a8a..cbb068266bab 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -1383,7 +1383,11 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
> return ret;
>
> if (!skb_queue_empty(&sk->sk_receive_queue)) {
> - tls_strp_check_rcv(&ctx->strp);
> + /* Defer notification to the exit point;
> + * this thread will consume the record
> + * directly.
this line could be un-wrapped AFAICT
> + */
> + tls_strp_check_rcv(&ctx->strp, false);
> if (tls_strp_msg_ready(ctx))
> break;
> }
> @@ -1869,9 +1873,17 @@ static int tls_record_content_type(struct msghdr *msg, struct tls_msg *tlm,
> return 1;
> }
>
> -static void tls_rx_rec_done(struct tls_sw_context_rx *ctx)
> +/* Parse any data left in the lower socket and hand off a single
> + * notification to the next reader. tls_rx_msg_ready() is a no-op
> + * when the current record has already been announced, so paths
> + * that drained ctx->rx_list without touching the strparser do
> + * not re-fire saved_data_ready() for a record BH or the worker
> + * already announced.
> + */
> +static void tls_rx_handoff(struct tls_sw_context_rx *ctx)
> {
> - tls_strp_msg_done(&ctx->strp);
> + tls_strp_check_rcv(&ctx->strp, false);
> + tls_rx_msg_ready(&ctx->strp);
> }
>
> /* This function traverses the rx_list in tls receive context to copies the
> @@ -2152,7 +2164,7 @@ int tls_sw_recvmsg(struct sock *sk,
> err = tls_record_content_type(msg, tls_msg(darg.skb), &control);
> if (err <= 0) {
> DEBUG_NET_WARN_ON_ONCE(darg.zc);
> - tls_rx_rec_done(ctx);
> + tls_strp_msg_release(&ctx->strp);
> put_on_rx_list_err:
> __skb_queue_tail(&ctx->rx_list, darg.skb);
> goto recv_end;
> @@ -2166,7 +2178,8 @@ int tls_sw_recvmsg(struct sock *sk,
> /* TLS 1.3 may have updated the length by more than overhead */
> rxm = strp_msg(darg.skb);
> chunk = rxm->full_len;
> - tls_rx_rec_done(ctx);
> + tls_strp_msg_release(&ctx->strp);
> + tls_strp_check_rcv(&ctx->strp, false);
>
> if (!darg.zc) {
> bool partially_consumed = chunk > len;
> @@ -2260,6 +2273,7 @@ int tls_sw_recvmsg(struct sock *sk,
> copied += decrypted;
>
> end:
> + tls_rx_handoff(ctx);
> tls_rx_reader_unlock(sk, ctx);
> if (psock)
> sk_psock_put(sk, psock);
> @@ -2300,7 +2314,7 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
> if (err < 0)
> goto splice_read_end;
>
> - tls_rx_rec_done(ctx);
> + tls_strp_msg_release(&ctx->strp);
> skb = darg.skb;
> }
>
> @@ -2327,6 +2341,7 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
> consume_skb(skb);
>
> splice_read_end:
> + tls_rx_handoff(ctx);
I don't get why the tls_rx_handoff() thing exists, TBH
Maybe it made sense in the context of the code which is no longer part
of the series. But without it it looks pretty odd. Feels like you should
simply be suppressing notifications when tls_strp_check_rcv() is called
by tls_rx_rec_done() and then tls_rx_reader_release() should make sure
it flushes the ->data_ready if it's exiting on a read && !announced
condition. In rcvmsg() path you're now calling tls_strp_check_rcv()
twice.
> tls_rx_reader_unlock(sk, ctx);
> return copied ? : err;
>
> @@ -2392,7 +2407,7 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
> tlm = tls_msg(skb);
> decrypted += rxm->full_len;
>
> - tls_rx_rec_done(ctx);
> + tls_strp_msg_release(&ctx->strp);
> }
>
> /* read_sock does not support reading control messages */
> @@ -2420,6 +2435,7 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
> }
>
> read_sock_end:
> + tls_rx_handoff(ctx);
> tls_rx_reader_release(sk, ctx);
> return copied ? : err;
>
> @@ -2504,10 +2520,18 @@ int tls_rx_msg_size(struct tls_strparser *strp, struct sk_buff *skb)
> return ret;
> }
>
> +/* Fire saved_data_ready() at most once per parsed record.
> + * msg_announced is cleared by tls_strp_msg_release() when the
> + * current record is consumed, arming the next announcement.
> + */
> void tls_rx_msg_ready(struct tls_strparser *strp)
Given the change to the semantics maybe this should be named
tls_rx_msg_maybe_announce() or _flush_announce() now ?
> {
> struct tls_sw_context_rx *ctx;
>
> + if (!READ_ONCE(strp->msg_ready) || strp->msg_announced)
> + return;
> + strp->msg_announced = 1;
> +
> ctx = container_of(strp, struct tls_sw_context_rx, strp);
> ctx->saved_data_ready(strp->sk);
> }
>
* Re: [PATCH net-next v9 1/5] tls: Abort the connection on decrypt failure
2026-04-29 21:48 ` [PATCH net-next v9 1/5] tls: Abort the connection on decrypt failure Chuck Lever
@ 2026-05-03 1:20 ` Jakub Kicinski
0 siblings, 0 replies; 16+ messages in thread
From: Jakub Kicinski @ 2026-05-03 1:20 UTC (permalink / raw)
To: Chuck Lever
Cc: John Fastabend, Sabrina Dubroca, Eric Dumazet, Simon Horman,
Paolo Abeni, netdev, kernel-tls-handshake, Chuck Lever,
Hannes Reinecke
On Wed, 29 Apr 2026 17:48:08 -0400 Chuck Lever wrote:
> Subject: [PATCH net-next v9 1/5] tls: Abort the connection on decrypt failure
I was gonna apply this one but the subject gives me pause.
It really oversells what this trivial refactor does..
* Re: [PATCH net-next v9 0/5] TLS read_sock performance scalability
2026-05-03 1:04 ` Jakub Kicinski
@ 2026-05-03 19:34 ` Chuck Lever
2026-05-04 13:33 ` Sabrina Dubroca
0 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2026-05-03 19:34 UTC (permalink / raw)
To: Jakub Kicinski
Cc: John Fastabend, Sabrina Dubroca, Eric Dumazet, Simon Horman,
Paolo Abeni, netdev, kernel-tls-handshake, Chuck Lever,
Hannes Reinecke, Alistair Francis
On 5/3/26 3:04 AM, Jakub Kicinski wrote:
> On Wed, 29 Apr 2026 17:48:07 -0400 Chuck Lever wrote:
>> I'd like to encourage in-kernel kTLS consumers (i.e., NFS and
>> NVMe/TCP) to coalesce on the use of read_sock. When I suggested
>> this to Hannes, he reported a few performance scalability issues
>> with read_sock.
>
> Meaning, this series achieves.. what right now?
> I mean - the headline is "performance scalability" and there's no
> performance testing result in any of the messages :S
> Patch 5 for instance "seems logical" but how much difference does
> it make?
The cover Subject: line has not been changed so all the revisions of
this series can be located easily.
The cover letter makes it clear that the series is now only a clean-up
series. Since async_capable is set to false for TLSv1.3, there is no
performance benefit to these changes, so I don't intend to justify
them on performance grounds.
>> However, batch async decryption and its
>> submit/deliver scaffolding were dropped from this series because
>> async_capable is always false for TLS 1.3, the TLS version that
>> NFS and NVMe/TCP both require. Async crypto support for TLS 1.3
>> is a prerequisite for revisiting that work.
>>
>> This series is now only a set of clean-ups. Support for async
>> has been deferred until after TLS KeyUpdate has been merged.
>
> What does "after TLS KeyUpdate has been merged" mean?
> KeyUpdate is supported.. You mean in NFS? Or in async?
We want to support TLS KeyUpdate in the in-kernel TLS consumers, which
include NFSD, the NFS client, the NVMe/TCP host, and the NVMe/TCP
target. There are two pre-requisites:
1. The in-kernel TLS consumers need to reliably and securely handle TLS
Alerts. That is coming in the next series I plan to post.
2. The TLS handshake upcall needs to handle KeyUpdate operations. That
is the series Alistair has been posting since forever, and is waiting
on getting this series and support for TLS Alerts merged into the
four in-kernel TLS consumers listed above.
> FTR async support is a major pain and we'd rather get rid of it
> (and switch away from cryto API) than extend it.
That would have been nice to know three months ago when I started work
on this series.
Is there nothing left to do here but drop this series? We'd really like
to get TLS KeyUpdate working for in-kernel TLS consumers, so anything
that can move this process forward is welcome.
--
Chuck Lever
* Re: [PATCH net-next v9 0/5] TLS read_sock performance scalability
2026-05-03 19:34 ` Chuck Lever
@ 2026-05-04 13:33 ` Sabrina Dubroca
2026-05-04 15:59 ` Chuck Lever
0 siblings, 1 reply; 16+ messages in thread
From: Sabrina Dubroca @ 2026-05-04 13:33 UTC (permalink / raw)
To: Chuck Lever
Cc: Jakub Kicinski, John Fastabend, Eric Dumazet, Simon Horman,
Paolo Abeni, netdev, kernel-tls-handshake, Chuck Lever,
Hannes Reinecke, Alistair Francis
2026-05-03, 21:34:01 +0200, Chuck Lever wrote:
> On 5/3/26 3:04 AM, Jakub Kicinski wrote:
> > On Wed, 29 Apr 2026 17:48:07 -0400 Chuck Lever wrote:
> >> I'd like to encourage in-kernel kTLS consumers (i.e., NFS and
> >> NVMe/TCP) to coalesce on the use of read_sock. When I suggested
> >> this to Hannes, he reported a few performance scalability issues
> >> with read_sock.
> >
> > Meaning, this series achieves.. what right now?
> > I mean - the headline is "performance scalability" and there's no
> > performance testing result in any of the messages :S
> > Patch 5 for instance "seems logical" but how much difference does
> > it make?
>
> The cover Subject: line has not been changed so all the revisions of
> this series can be located easily.
(not to bikeshed, links to lore also do that)
> The cover letter makes it clear that the series is now only a clean-up
> series. Since async_capable is set to false for TLSv1.3, there is no
> performance benefit to these changes, so I don't intend to post a
> motivation for it based on performance.
Maybe I misunderstood, but I thought there was a somewhat noticeable
benefit to the "suppress spurious wakeups" patch (not +20%, but at
least improved behavior for some users of kTLS), and maybe the "flush
backlog" one.
Patch 2 may still be beneficial (though it's now mixing 2 separate
changes), and patch 1 is a very reasonable code cleanup.
Patch 4 does feel like a pretty large amount of churn if it has no
observable benefit.
> > FTR async support is a major pain and we'd rather get rid of it
> (and switch away from crypto API) than extend it.
(the "don't use crypto API" thing is news to me, that seems to be
trendy these days)
> That would have been nice to know three months ago when I started work
> on this series.
I remember discussing this with Jakub, but I don't know if that was
on-list. There's been a lot of bugs in that code.
> We'd really like
> to get TLS KeyUpdate working for in-kernel TLS consumers, so anything
> that can move this process forward is welcome.
But net/tls doesn't need any changes for that, right? net/handshake
maybe, but that's a separate "component" (MAINTAINERS entry).
--
Sabrina
* Re: [PATCH net-next v9 0/5] TLS read_sock performance scalability
2026-05-04 13:33 ` Sabrina Dubroca
@ 2026-05-04 15:59 ` Chuck Lever
0 siblings, 0 replies; 16+ messages in thread
From: Chuck Lever @ 2026-05-04 15:59 UTC (permalink / raw)
To: Sabrina Dubroca
Cc: Jakub Kicinski, John Fastabend, Eric Dumazet, Simon Horman,
Paolo Abeni, netdev, kernel-tls-handshake, Chuck Lever,
Hannes Reinecke, Alistair Francis
On Mon, May 4, 2026, at 3:33 PM, Sabrina Dubroca wrote:
> 2026-05-03, 21:34:01 +0200, Chuck Lever wrote:
>> On 5/3/26 3:04 AM, Jakub Kicinski wrote:
>> > On Wed, 29 Apr 2026 17:48:07 -0400 Chuck Lever wrote:
>> >> I'd like to encourage in-kernel kTLS consumers (i.e., NFS and
>> >> NVMe/TCP) to coalesce on the use of read_sock. When I suggested
>> >> this to Hannes, he reported a few performance scalability issues
>> >> with read_sock.
>> >
>> > Meaning, this series achieves.. what right now?
>> > I mean - the headline is "performance scalability" and there's no
>> > performance testing result in any of the messages :S
>> > Patch 5 for instance "seems logical" but how much difference does
>> > it make?
>>
>> The cover Subject: line has not been changed so all the revisions of
>> this series can be located easily.
>
> (not to bikeshed, links to lore also do that)
>
>> The cover letter makes it clear that the series is now only a clean-up
>> series. Since async_capable is set to false for TLSv1.3, there is no
>> performance benefit to these changes, so I don't intend to post a
>> motivation for it based on performance.
>
> Maybe I misunderstood, but I thought there was a somewhat noticeable
> benefit to the "suppress spurious wakeups" patch (not +20%, but at
> least improved behavior for some users of kTLS), and maybe the "flush
> backlog" one.
>
> Patch 2 may still be beneficial (though it's now mixing 2 separate
> changes), and patch 1 is a very reasonable code cleanup.
>
> Patch 4 does feel like a pretty large amount of churn if it has no
> observable benefit.
There is potential benefit to eliminating spurious wake-ups,
but nothing I've found to be observable at the application
level.
>> We'd really like
>> to get TLS KeyUpdate working for in-kernel TLS consumers, so anything
>> that can move this process forward is welcome.
>
> But net/tls doesn't need any changes for that, right?
>> 1. The in-kernel TLS consumers need to reliably and securely handle TLS
>> Alerts. That is coming in the next series I plan to post.
That series will make changes to net/tls/.
--
Chuck Lever