* [PATCH v3 0/8] TLS read_sock performance scalability
@ 2026-03-12 1:47 Chuck Lever
2026-03-12 1:47 ` [PATCH v3 1/8] tls: Factor tls_decrypt_async_drain() from recvmsg Chuck Lever
` (7 more replies)
0 siblings, 8 replies; 15+ messages in thread
From: Chuck Lever @ 2026-03-12 1:47 UTC (permalink / raw)
To: john.fastabend, kuba, sd; +Cc: netdev, kernel-tls-handshake, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
I'd like to encourage the in-kernel kTLS consumers (i.e., NFS and
NVMe/TCP) to converge on read_sock. When I suggested this to
Hannes, he reported a number of nagging performance scalability
issues with read_sock. This series is an attempt to run those
issues down and get them fixed before we convert the above
sock_recvmsg consumers over to read_sock.
While I assemble performance data, let's nail down the preferred
code structure.
Base commit: 05e059510edf ("Merge branch 'eth-fbnic-add-fbnic-self-tests'")
---
Changes since v2:
- Fix short read self tests
Changes since v1:
- Add C11 reference
- Extend data_ready reduction to recvmsg and splice
- Restructure read_sock and recvmsg using shared helpers
Chuck Lever (8):
tls: Factor tls_decrypt_async_drain() from recvmsg
tls: Factor tls_rx_decrypt_record() helper
tls: Fix dangling skb pointer in tls_sw_read_sock()
tls: Factor tls_strp_msg_release() from tls_strp_msg_done()
tls: Suppress spurious saved_data_ready on all receive paths
tls: Flush backlog before tls_rx_rec_wait in read_sock
tls: Restructure tls_sw_read_sock() into submit/deliver phases
tls: Enable batch async decryption in read_sock
net/tls/tls.h | 3 +-
net/tls/tls_strp.c | 34 ++++++--
net/tls/tls_sw.c | 213 ++++++++++++++++++++++++++++++++-------------
3 files changed, 184 insertions(+), 66 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v3 1/8] tls: Factor tls_decrypt_async_drain() from recvmsg
2026-03-12 1:47 [PATCH v3 0/8] TLS read_sock performance scalability Chuck Lever
@ 2026-03-12 1:47 ` Chuck Lever
2026-03-12 4:34 ` Alistair Francis
2026-03-16 10:13 ` Sabrina Dubroca
2026-03-12 1:47 ` [PATCH v3 2/8] tls: Factor tls_rx_decrypt_record() helper Chuck Lever
` (6 subsequent siblings)
7 siblings, 2 replies; 15+ messages in thread
From: Chuck Lever @ 2026-03-12 1:47 UTC (permalink / raw)
To: john.fastabend, kuba, sd
Cc: netdev, kernel-tls-handshake, Chuck Lever, Hannes Reinecke
From: Chuck Lever <chuck.lever@oracle.com>
The recvmsg path pairs tls_decrypt_async_wait() with
__skb_queue_purge(&ctx->async_hold). Bundling the two into
tls_decrypt_async_drain() gives later patches a single call for
async teardown.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/tls/tls_sw.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index a656ce235758..cedcc82669db 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -249,6 +249,18 @@ static int tls_decrypt_async_wait(struct tls_sw_context_rx *ctx)
return ctx->async_wait.err;
}
+/* Collect all pending async AEAD completions and release the
+ * skbs held for them. Returns the crypto error if any
+ * operation failed, zero otherwise.
+ */
+static int tls_decrypt_async_drain(struct tls_sw_context_rx *ctx)
+{
+ int ret = tls_decrypt_async_wait(ctx);
+
+ __skb_queue_purge(&ctx->async_hold);
+ return ret;
+}
+
static int tls_do_decryption(struct sock *sk,
struct scatterlist *sgin,
struct scatterlist *sgout,
@@ -2223,8 +2235,7 @@ int tls_sw_recvmsg(struct sock *sk,
int ret;
/* Wait for all previously submitted records to be decrypted */
- ret = tls_decrypt_async_wait(ctx);
- __skb_queue_purge(&ctx->async_hold);
+ ret = tls_decrypt_async_drain(ctx);
if (ret) {
if (err >= 0 || err == -EINPROGRESS)
--
2.52.0
* [PATCH v3 2/8] tls: Factor tls_rx_decrypt_record() helper
2026-03-12 1:47 [PATCH v3 0/8] TLS read_sock performance scalability Chuck Lever
2026-03-12 1:47 ` [PATCH v3 1/8] tls: Factor tls_decrypt_async_drain() from recvmsg Chuck Lever
@ 2026-03-12 1:47 ` Chuck Lever
2026-03-12 4:35 ` Alistair Francis
2026-03-16 10:20 ` Sabrina Dubroca
2026-03-12 1:47 ` [PATCH v3 3/8] tls: Fix dangling skb pointer in tls_sw_read_sock() Chuck Lever
` (5 subsequent siblings)
7 siblings, 2 replies; 15+ messages in thread
From: Chuck Lever @ 2026-03-12 1:47 UTC (permalink / raw)
To: john.fastabend, kuba, sd
Cc: netdev, kernel-tls-handshake, Chuck Lever, Hannes Reinecke
From: Chuck Lever <chuck.lever@oracle.com>
recvmsg, read_sock, and splice_read each open-code the
same sequence: zero-initialize the decrypt arguments, call
tls_rx_one_record(), and abort the connection on failure.
Extract tls_rx_decrypt_record() so each receive path shares
a single decrypt-and-abort primitive. Each call site still
initializes darg.inargs separately, since recvmsg sets zc
and async between the memset and the decrypt call.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/tls/tls_sw.c | 29 +++++++++++++++++------------
1 file changed, 17 insertions(+), 12 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index cedcc82669db..81e0e8aaa6f9 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1832,6 +1832,17 @@ static int tls_rx_one_record(struct sock *sk, struct msghdr *msg,
return tls_check_pending_rekey(sk, tls_ctx, darg->skb);
}
+/* Decrypt one record and abort the connection on failure. */
+static int tls_rx_decrypt_record(struct sock *sk, struct msghdr *msg,
+ struct tls_decrypt_arg *darg)
+{
+ int err = tls_rx_one_record(sk, msg, darg);
+
+ if (err < 0)
+ tls_err_abort(sk, -EBADMSG);
+ return err;
+}
+
int decrypt_skb(struct sock *sk, struct scatterlist *sgout)
{
struct tls_decrypt_arg darg = { .zc = true, };
@@ -2132,11 +2143,9 @@ int tls_sw_recvmsg(struct sock *sk,
else
darg.async = false;
- err = tls_rx_one_record(sk, msg, &darg);
- if (err < 0) {
- tls_err_abort(sk, -EBADMSG);
+ err = tls_rx_decrypt_record(sk, msg, &darg);
+ if (err < 0)
goto recv_end;
- }
async |= darg.async;
@@ -2294,11 +2303,9 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
memset(&darg.inargs, 0, sizeof(darg.inargs));
- err = tls_rx_one_record(sk, NULL, &darg);
- if (err < 0) {
- tls_err_abort(sk, -EBADMSG);
+ err = tls_rx_decrypt_record(sk, NULL, &darg);
+ if (err < 0)
goto splice_read_end;
- }
tls_rx_rec_done(ctx);
skb = darg.skb;
@@ -2380,11 +2387,9 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
memset(&darg.inargs, 0, sizeof(darg.inargs));
- err = tls_rx_one_record(sk, NULL, &darg);
- if (err < 0) {
- tls_err_abort(sk, -EBADMSG);
+ err = tls_rx_decrypt_record(sk, NULL, &darg);
+ if (err < 0)
goto read_sock_end;
- }
released = tls_read_flush_backlog(sk, prot, INT_MAX,
0, decrypted,
--
2.52.0
* [PATCH v3 3/8] tls: Fix dangling skb pointer in tls_sw_read_sock()
2026-03-12 1:47 [PATCH v3 0/8] TLS read_sock performance scalability Chuck Lever
2026-03-12 1:47 ` [PATCH v3 1/8] tls: Factor tls_decrypt_async_drain() from recvmsg Chuck Lever
2026-03-12 1:47 ` [PATCH v3 2/8] tls: Factor tls_rx_decrypt_record() helper Chuck Lever
@ 2026-03-12 1:47 ` Chuck Lever
2026-03-12 1:48 ` [PATCH v3 4/8] tls: Factor tls_strp_msg_release() from tls_strp_msg_done() Chuck Lever
` (4 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever @ 2026-03-12 1:47 UTC (permalink / raw)
To: john.fastabend, kuba, sd
Cc: netdev, kernel-tls-handshake, Chuck Lever, Hannes Reinecke,
Alistair Francis
From: Chuck Lever <chuck.lever@oracle.com>
Per ISO/IEC 9899:2011 section 6.2.4p2, a pointer value becomes
indeterminate when the object it points to reaches the end of its
lifetime; Annex J.2 classifies the use of such a value as undefined
behavior. In tls_sw_read_sock(), consume_skb(skb) in the
fully-consumed path frees the skb, but the "do { } while (skb)"
loop condition then evaluates that freed pointer. Although the
value is never dereferenced -- the loop either continues and
overwrites skb, or exits -- any future change that adds a
dereference between consume_skb() and the loop condition would
produce a silent use-after-free.
Fixes: 662fbcec32f4 ("net/tls: implement ->read_sock()")
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/tls/tls_sw.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 81e0e8aaa6f9..e5d0447cbba6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2373,7 +2373,7 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
goto read_sock_end;
decrypted = 0;
- do {
+ for (;;) {
if (!skb_queue_empty(&ctx->rx_list)) {
skb = __skb_dequeue(&ctx->rx_list);
rxm = strp_msg(skb);
@@ -2422,10 +2422,11 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
goto read_sock_requeue;
} else {
consume_skb(skb);
+ skb = NULL;
if (!desc->count)
- skb = NULL;
+ break;
}
- } while (skb);
+ }
read_sock_end:
tls_rx_reader_release(sk, ctx);
--
2.52.0
* [PATCH v3 4/8] tls: Factor tls_strp_msg_release() from tls_strp_msg_done()
2026-03-12 1:47 [PATCH v3 0/8] TLS read_sock performance scalability Chuck Lever
` (2 preceding siblings ...)
2026-03-12 1:47 ` [PATCH v3 3/8] tls: Fix dangling skb pointer in tls_sw_read_sock() Chuck Lever
@ 2026-03-12 1:48 ` Chuck Lever
2026-03-12 1:48 ` [PATCH v3 5/8] tls: Suppress spurious saved_data_ready on all receive paths Chuck Lever
` (3 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever @ 2026-03-12 1:48 UTC (permalink / raw)
To: john.fastabend, kuba, sd
Cc: netdev, kernel-tls-handshake, Chuck Lever, Hannes Reinecke,
Alistair Francis
From: Chuck Lever <chuck.lever@oracle.com>
tls_strp_msg_done() conflates releasing the current record with
checking for the next one via tls_strp_check_rcv(). Batch
processing requires releasing a record without immediately
triggering that check, so the release step is separated into
tls_strp_msg_release(). tls_strp_msg_done() is preserved as a
wrapper for existing callers.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/tls/tls.h | 1 +
net/tls/tls_strp.c | 15 ++++++++++++++-
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/net/tls/tls.h b/net/tls/tls.h
index e8f81a006520..a97f1acef31d 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -193,6 +193,7 @@ int tls_strp_init(struct tls_strparser *strp, struct sock *sk);
void tls_strp_data_ready(struct tls_strparser *strp);
void tls_strp_check_rcv(struct tls_strparser *strp);
+void tls_strp_msg_release(struct tls_strparser *strp);
void tls_strp_msg_done(struct tls_strparser *strp);
int tls_rx_msg_size(struct tls_strparser *strp, struct sk_buff *skb);
diff --git a/net/tls/tls_strp.c b/net/tls/tls_strp.c
index 98e12f0ff57e..a7648ebde162 100644
--- a/net/tls/tls_strp.c
+++ b/net/tls/tls_strp.c
@@ -581,7 +581,16 @@ static void tls_strp_work(struct work_struct *w)
release_sock(strp->sk);
}
-void tls_strp_msg_done(struct tls_strparser *strp)
+/**
+ * tls_strp_msg_release - release the current strparser message
+ * @strp: TLS stream parser instance
+ *
+ * Release the current record without triggering a check for the
+ * next record. Callers must invoke tls_strp_check_rcv() before
+ * releasing the socket lock, or queued data will stall until
+ * the next tls_strp_data_ready() event.
+ */
+void tls_strp_msg_release(struct tls_strparser *strp)
{
WARN_ON(!strp->stm.full_len);
@@ -592,7 +601,11 @@ void tls_strp_msg_done(struct tls_strparser *strp)
WRITE_ONCE(strp->msg_ready, 0);
memset(&strp->stm, 0, sizeof(strp->stm));
+}
+void tls_strp_msg_done(struct tls_strparser *strp)
+{
+ tls_strp_msg_release(strp);
tls_strp_check_rcv(strp);
}
--
2.52.0
* [PATCH v3 5/8] tls: Suppress spurious saved_data_ready on all receive paths
2026-03-12 1:47 [PATCH v3 0/8] TLS read_sock performance scalability Chuck Lever
` (3 preceding siblings ...)
2026-03-12 1:48 ` [PATCH v3 4/8] tls: Factor tls_strp_msg_release() from tls_strp_msg_done() Chuck Lever
@ 2026-03-12 1:48 ` Chuck Lever
2026-03-12 1:48 ` [PATCH v3 6/8] tls: Flush backlog before tls_rx_rec_wait in read_sock Chuck Lever
` (2 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever @ 2026-03-12 1:48 UTC (permalink / raw)
To: john.fastabend, kuba, sd
Cc: netdev, kernel-tls-handshake, Chuck Lever, Alistair Francis,
Hannes Reinecke
From: Chuck Lever <chuck.lever@oracle.com>
Each record release via tls_strp_msg_done() triggers
tls_strp_check_rcv(), which calls tls_rx_msg_ready() and
fires saved_data_ready(). During a multi-record receive,
the first N-1 wakeups are pure overhead: the caller is
already running and will pick up subsequent records on
the next loop iteration. The same waste occurs on the
recvmsg and splice_read paths.
Replace tls_strp_msg_done() with tls_strp_msg_release() in
all three receive paths (read_sock, recvmsg, splice_read),
deferring the tls_strp_check_rcv() call to each path's
exit point. Factor tls_rx_msg_ready() out of
tls_strp_read_sock() so that parsing a record no longer
fires the callback directly, and introduce
tls_strp_check_rcv_quiet() for use in tls_rx_rec_wait(),
which parses queued data without notifying.
With no remaining callers, tls_strp_msg_done() and its
wrapper tls_rx_rec_done() are removed.
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/tls/tls.h | 2 +-
net/tls/tls_strp.c | 27 +++++++++++++++++++--------
net/tls/tls_sw.c | 21 ++++++++++++++-------
3 files changed, 34 insertions(+), 16 deletions(-)
diff --git a/net/tls/tls.h b/net/tls/tls.h
index a97f1acef31d..0ab3b83c3724 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -193,8 +193,8 @@ int tls_strp_init(struct tls_strparser *strp, struct sock *sk);
void tls_strp_data_ready(struct tls_strparser *strp);
void tls_strp_check_rcv(struct tls_strparser *strp);
+void tls_strp_check_rcv_quiet(struct tls_strparser *strp);
void tls_strp_msg_release(struct tls_strparser *strp);
-void tls_strp_msg_done(struct tls_strparser *strp);
int tls_rx_msg_size(struct tls_strparser *strp, struct sk_buff *skb);
void tls_rx_msg_ready(struct tls_strparser *strp);
diff --git a/net/tls/tls_strp.c b/net/tls/tls_strp.c
index a7648ebde162..6cf274380da2 100644
--- a/net/tls/tls_strp.c
+++ b/net/tls/tls_strp.c
@@ -368,7 +368,6 @@ static int tls_strp_copyin(read_descriptor_t *desc, struct sk_buff *in_skb,
desc->count = 0;
WRITE_ONCE(strp->msg_ready, 1);
- tls_rx_msg_ready(strp);
}
return ret;
@@ -539,11 +538,27 @@ static int tls_strp_read_sock(struct tls_strparser *strp)
return tls_strp_read_copy(strp, false);
WRITE_ONCE(strp->msg_ready, 1);
- tls_rx_msg_ready(strp);
return 0;
}
+/**
+ * tls_strp_check_rcv_quiet - parse without consumer notification
+ * @strp: TLS stream parser instance
+ *
+ * Parse queued data without firing the consumer notification. A subsequent
+ * tls_strp_check_rcv() is required before the socket lock is released;
+ * otherwise queued data stalls until the next tls_strp_data_ready() event.
+ */
+void tls_strp_check_rcv_quiet(struct tls_strparser *strp)
+{
+ if (unlikely(strp->stopped) || strp->msg_ready)
+ return;
+
+ if (tls_strp_read_sock(strp) == -ENOMEM)
+ queue_work(tls_strp_wq, &strp->work);
+}
+
void tls_strp_check_rcv(struct tls_strparser *strp)
{
if (unlikely(strp->stopped) || strp->msg_ready)
@@ -551,6 +566,8 @@ void tls_strp_check_rcv(struct tls_strparser *strp)
if (tls_strp_read_sock(strp) == -ENOMEM)
queue_work(tls_strp_wq, &strp->work);
+ else if (strp->msg_ready)
+ tls_rx_msg_ready(strp);
}
/* Lower sock lock held */
@@ -603,12 +620,6 @@ void tls_strp_msg_release(struct tls_strparser *strp)
memset(&strp->stm, 0, sizeof(strp->stm));
}
-void tls_strp_msg_done(struct tls_strparser *strp)
-{
- tls_strp_msg_release(strp);
- tls_strp_check_rcv(strp);
-}
-
void tls_strp_stop(struct tls_strparser *strp)
{
strp->stopped = 1;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index e5d0447cbba6..43d37b0e6d59 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1384,7 +1384,10 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
return ret;
if (!skb_queue_empty(&sk->sk_receive_queue)) {
- tls_strp_check_rcv(&ctx->strp);
+ /* tls_strp_check_rcv() is called at each receive
+ * path's exit before the socket lock is released.
+ */
+ tls_strp_check_rcv_quiet(&ctx->strp);
if (tls_strp_msg_ready(ctx))
break;
}
@@ -1876,9 +1879,9 @@ static int tls_record_content_type(struct msghdr *msg, struct tls_msg *tlm,
return 1;
}
-static void tls_rx_rec_done(struct tls_sw_context_rx *ctx)
+static void tls_rx_rec_release(struct tls_sw_context_rx *ctx)
{
- tls_strp_msg_done(&ctx->strp);
+ tls_strp_msg_release(&ctx->strp);
}
/* This function traverses the rx_list in tls receive context to copies the
@@ -2159,7 +2162,7 @@ int tls_sw_recvmsg(struct sock *sk,
err = tls_record_content_type(msg, tls_msg(darg.skb), &control);
if (err <= 0) {
DEBUG_NET_WARN_ON_ONCE(darg.zc);
- tls_rx_rec_done(ctx);
+ tls_rx_rec_release(ctx);
put_on_rx_list_err:
__skb_queue_tail(&ctx->rx_list, darg.skb);
goto recv_end;
@@ -2173,7 +2176,8 @@ int tls_sw_recvmsg(struct sock *sk,
/* TLS 1.3 may have updated the length by more than overhead */
rxm = strp_msg(darg.skb);
chunk = rxm->full_len;
- tls_rx_rec_done(ctx);
+ tls_rx_rec_release(ctx);
+ tls_strp_check_rcv_quiet(&ctx->strp);
if (!darg.zc) {
bool partially_consumed = chunk > len;
@@ -2267,6 +2271,7 @@ int tls_sw_recvmsg(struct sock *sk,
copied += decrypted;
end:
+ tls_strp_check_rcv(&ctx->strp);
tls_rx_reader_unlock(sk, ctx);
if (psock)
sk_psock_put(sk, psock);
@@ -2307,7 +2312,7 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
if (err < 0)
goto splice_read_end;
- tls_rx_rec_done(ctx);
+ tls_rx_rec_release(ctx);
skb = darg.skb;
}
@@ -2334,6 +2339,7 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
consume_skb(skb);
splice_read_end:
+ tls_strp_check_rcv(&ctx->strp);
tls_rx_reader_unlock(sk, ctx);
return copied ? : err;
@@ -2399,7 +2405,7 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
tlm = tls_msg(skb);
decrypted += rxm->full_len;
- tls_rx_rec_done(ctx);
+ tls_rx_rec_release(ctx);
}
/* read_sock does not support reading control messages */
@@ -2429,6 +2435,7 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
}
read_sock_end:
+ tls_strp_check_rcv(&ctx->strp);
tls_rx_reader_release(sk, ctx);
return copied ? : err;
--
2.52.0
* [PATCH v3 6/8] tls: Flush backlog before tls_rx_rec_wait in read_sock
2026-03-12 1:47 [PATCH v3 0/8] TLS read_sock performance scalability Chuck Lever
` (4 preceding siblings ...)
2026-03-12 1:48 ` [PATCH v3 5/8] tls: Suppress spurious saved_data_ready on all receive paths Chuck Lever
@ 2026-03-12 1:48 ` Chuck Lever
2026-03-16 17:17 ` Sabrina Dubroca
2026-03-12 1:48 ` [PATCH v3 7/8] tls: Restructure tls_sw_read_sock() into submit/deliver phases Chuck Lever
2026-03-12 1:48 ` [PATCH v3 8/8] tls: Enable batch async decryption in read_sock Chuck Lever
7 siblings, 1 reply; 15+ messages in thread
From: Chuck Lever @ 2026-03-12 1:48 UTC (permalink / raw)
To: john.fastabend, kuba, sd
Cc: netdev, kernel-tls-handshake, Chuck Lever, Alistair Francis,
Hannes Reinecke
From: Chuck Lever <chuck.lever@oracle.com>
While lock_sock is held during read_sock, incoming TCP segments
land on sk->sk_backlog rather than sk->sk_receive_queue.
tls_rx_rec_wait() inspects only sk_receive_queue, so backlog
data remains invisible until release_sock() drains it, forcing
an extra workqueue cycle for records that arrive during
decryption.
Calling sk_flush_backlog() before tls_rx_rec_wait() moves
backlog data into sk_receive_queue, where tls_strp_check_rcv()
can parse it immediately. The existing tls_read_flush_backlog
call after decryption is retained for TCP window management.
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/tls/tls_sw.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 43d37b0e6d59..7e1560d5ab79 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2387,6 +2387,11 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
} else {
struct tls_decrypt_arg darg;
+ /* Drain backlog so segments that arrived while the
+ * lock was held appear on sk_receive_queue before
+ * tls_rx_rec_wait waits for a new record.
+ */
+ sk_flush_backlog(sk);
err = tls_rx_rec_wait(sk, NULL, true, released);
if (err <= 0)
goto read_sock_end;
--
2.52.0
* [PATCH v3 7/8] tls: Restructure tls_sw_read_sock() into submit/deliver phases
2026-03-12 1:47 [PATCH v3 0/8] TLS read_sock performance scalability Chuck Lever
` (5 preceding siblings ...)
2026-03-12 1:48 ` [PATCH v3 6/8] tls: Flush backlog before tls_rx_rec_wait in read_sock Chuck Lever
@ 2026-03-12 1:48 ` Chuck Lever
2026-03-12 1:48 ` [PATCH v3 8/8] tls: Enable batch async decryption in read_sock Chuck Lever
7 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever @ 2026-03-12 1:48 UTC (permalink / raw)
To: john.fastabend, kuba, sd
Cc: netdev, kernel-tls-handshake, Chuck Lever, Hannes Reinecke
From: Chuck Lever <chuck.lever@oracle.com>
Pipelining multiple AEAD operations requires separating decryption
from delivery so that several records can be submitted before any
are passed to the read_actor callback. The main loop in
tls_sw_read_sock() is split into two explicit phases: a submit
phase that decrypts one record onto ctx->rx_list, and a deliver
phase that drains rx_list and passes each cleartext skb to the
read_actor callback.
With a single record per submit phase, behavior is identical to the
previous code. A subsequent patch will extend the submit phase to
pipeline multiple AEAD operations.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/tls/tls_sw.c | 79 +++++++++++++++++++++++++-----------------------
1 file changed, 41 insertions(+), 38 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 7e1560d5ab79..6d54d350bced 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2354,8 +2354,8 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
struct tls_prot_info *prot = &tls_ctx->prot_info;
- struct strp_msg *rxm = NULL;
struct sk_buff *skb = NULL;
+ struct strp_msg *rxm;
struct sk_psock *psock;
size_t flushed_at = 0;
bool released = true;
@@ -2380,17 +2380,15 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
decrypted = 0;
for (;;) {
- if (!skb_queue_empty(&ctx->rx_list)) {
- skb = __skb_dequeue(&ctx->rx_list);
- rxm = strp_msg(skb);
- tlm = tls_msg(skb);
- } else {
- struct tls_decrypt_arg darg;
+ struct tls_decrypt_arg darg;
- /* Drain backlog so segments that arrived while the
- * lock was held appear on sk_receive_queue before
- * tls_rx_rec_wait waits for a new record.
- */
+ /* Phase 1: Submit -- decrypt one record onto rx_list.
+ * Flush the backlog first so that segments that
+ * arrived while the lock was held appear on
+ * sk_receive_queue before tls_rx_rec_wait waits
+ * for a new record.
+ */
+ if (skb_queue_empty(&ctx->rx_list)) {
sk_flush_backlog(sk);
err = tls_rx_rec_wait(sk, NULL, true, released);
if (err <= 0)
@@ -2405,38 +2403,43 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
released = tls_read_flush_backlog(sk, prot, INT_MAX,
0, decrypted,
&flushed_at);
- skb = darg.skb;
+ decrypted += strp_msg(darg.skb)->full_len;
+ tls_rx_rec_release(ctx);
+ __skb_queue_tail(&ctx->rx_list, darg.skb);
+ }
+
+ /* Phase 2: Deliver -- drain rx_list to read_actor */
+ while ((skb = __skb_dequeue(&ctx->rx_list)) != NULL) {
rxm = strp_msg(skb);
tlm = tls_msg(skb);
- decrypted += rxm->full_len;
- tls_rx_rec_release(ctx);
- }
-
- /* read_sock does not support reading control messages */
- if (tlm->control != TLS_RECORD_TYPE_DATA) {
- err = -EINVAL;
- goto read_sock_requeue;
- }
-
- used = read_actor(desc, skb, rxm->offset, rxm->full_len);
- if (used <= 0) {
- if (!copied)
- err = used;
- goto read_sock_requeue;
- }
- copied += used;
- if (used < rxm->full_len) {
- rxm->offset += used;
- rxm->full_len -= used;
- if (!desc->count)
+ /* read_sock does not support reading control messages */
+ if (tlm->control != TLS_RECORD_TYPE_DATA) {
+ err = -EINVAL;
goto read_sock_requeue;
- } else {
- consume_skb(skb);
- skb = NULL;
- if (!desc->count)
- break;
+ }
+
+ used = read_actor(desc, skb, rxm->offset,
+ rxm->full_len);
+ if (used <= 0) {
+ if (!copied)
+ err = used;
+ goto read_sock_requeue;
+ }
+ copied += used;
+ if (used < rxm->full_len) {
+ rxm->offset += used;
+ rxm->full_len -= used;
+ if (!desc->count)
+ goto read_sock_requeue;
+ } else {
+ consume_skb(skb);
+ skb = NULL;
+ }
}
+ /* Drain all of rx_list before honoring !desc->count */
+ if (!desc->count)
+ break;
}
read_sock_end:
--
2.52.0
* [PATCH v3 8/8] tls: Enable batch async decryption in read_sock
2026-03-12 1:47 [PATCH v3 0/8] TLS read_sock performance scalability Chuck Lever
` (6 preceding siblings ...)
2026-03-12 1:48 ` [PATCH v3 7/8] tls: Restructure tls_sw_read_sock() into submit/deliver phases Chuck Lever
@ 2026-03-12 1:48 ` Chuck Lever
7 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever @ 2026-03-12 1:48 UTC (permalink / raw)
To: john.fastabend, kuba, sd
Cc: netdev, kernel-tls-handshake, Chuck Lever, Hannes Reinecke
From: Chuck Lever <chuck.lever@oracle.com>
tls_sw_read_sock() decrypts one TLS record at a time, blocking until
each AEAD operation completes before proceeding. Hardware async
crypto engines depend on pipelining multiple operations to achieve
full throughput, and the one-at-a-time model prevents that. Kernel
consumers such as NVMe-TCP and NFSD (when using TLS) are therefore
unable to benefit from hardware offload.
When ctx->async_capable is true, the submit phase now loops up to
TLS_READ_SOCK_BATCH (16) records. The first record waits via
tls_rx_rec_wait(); subsequent iterations use tls_strp_msg_ready()
and tls_strp_check_rcv() to collect records already queued on the
socket without blocking. Each record is submitted with darg.async
set, and all resulting skbs are appended to rx_list.
After the submit loop, a single tls_decrypt_async_drain() collects
all pending AEAD completions before the deliver phase passes
cleartext records to the consumer. The batch bound of 16 limits
concurrent memory consumption to 16 cleartext skbs plus their AEAD
contexts. If async_capable is false, the loop exits after one
record and the async wait is skipped, preserving prior behavior.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/tls/tls_sw.c | 95 +++++++++++++++++++++++++++++++++++++++---------
1 file changed, 78 insertions(+), 17 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 6d54d350bced..e8d92faa7296 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -261,6 +261,12 @@ static int tls_decrypt_async_drain(struct tls_sw_context_rx *ctx)
return ret;
}
+/* Submit an AEAD decrypt request. On success with darg->async set,
+ * the caller must not touch aead_req; the completion handler frees
+ * it. Every error return clears darg->async and guarantees no
+ * in-flight AEAD operation remains -- callers rely on this to
+ * safely free aead_req and to skip async drain on error paths.
+ */
static int tls_do_decryption(struct sock *sk,
struct scatterlist *sgin,
struct scatterlist *sgout,
@@ -2348,6 +2354,13 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
goto splice_read_end;
}
+/* Bound on concurrent async AEAD submissions per read_sock
+ * call. Chosen to fill typical hardware crypto pipelines
+ * without excessive memory consumption (each in-flight record
+ * holds one cleartext skb plus its AEAD request context).
+ */
+#define TLS_READ_SOCK_BATCH 16
+
int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
sk_read_actor_t read_actor)
{
@@ -2359,6 +2372,7 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
struct sk_psock *psock;
size_t flushed_at = 0;
bool released = true;
+ bool async = false;
struct tls_msg *tlm;
ssize_t copied = 0;
ssize_t decrypted;
@@ -2381,31 +2395,68 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
decrypted = 0;
for (;;) {
struct tls_decrypt_arg darg;
+ int nr_async = 0;
- /* Phase 1: Submit -- decrypt one record onto rx_list.
+ /* Phase 1: Submit -- decrypt records onto rx_list.
* Flush the backlog first so that segments that
* arrived while the lock was held appear on
* sk_receive_queue before tls_rx_rec_wait waits
* for a new record.
*/
if (skb_queue_empty(&ctx->rx_list)) {
- sk_flush_backlog(sk);
- err = tls_rx_rec_wait(sk, NULL, true, released);
- if (err <= 0)
+ while (nr_async < TLS_READ_SOCK_BATCH) {
+ if (nr_async == 0) {
+ sk_flush_backlog(sk);
+ err = tls_rx_rec_wait(sk, NULL,
+ true,
+ released);
+ if (err <= 0)
+ goto read_sock_end;
+ } else {
+ if (!tls_strp_msg_ready(ctx)) {
+ tls_strp_check_rcv_quiet(&ctx->strp);
+ if (!tls_strp_msg_ready(ctx))
+ break;
+ }
+ if (!tls_strp_msg_load(&ctx->strp,
+ released))
+ break;
+ }
+
+ memset(&darg.inargs, 0, sizeof(darg.inargs));
+ darg.async = ctx->async_capable;
+
+ err = tls_rx_decrypt_record(sk, NULL,
+ &darg);
+ if (err < 0)
+ goto read_sock_end;
+
+ async |= darg.async;
+ released = tls_read_flush_backlog(sk, prot,
+ INT_MAX,
+ 0,
+ decrypted,
+ &flushed_at);
+ decrypted += strp_msg(darg.skb)->full_len;
+ tls_rx_rec_release(ctx);
+ __skb_queue_tail(&ctx->rx_list, darg.skb);
+ nr_async++;
+
+ if (!ctx->async_capable)
+ break;
+ }
+ }
+
+ /* Async wait -- collect pending AEAD completions */
+ if (async) {
+ int ret = tls_decrypt_async_drain(ctx);
+
+ async = false;
+ if (ret) {
+ __skb_queue_purge(&ctx->rx_list);
+ err = ret;
goto read_sock_end;
-
- memset(&darg.inargs, 0, sizeof(darg.inargs));
-
- err = tls_rx_decrypt_record(sk, NULL, &darg);
- if (err < 0)
- goto read_sock_end;
-
- released = tls_read_flush_backlog(sk, prot, INT_MAX,
- 0, decrypted,
- &flushed_at);
- decrypted += strp_msg(darg.skb)->full_len;
- tls_rx_rec_release(ctx);
- __skb_queue_tail(&ctx->rx_list, darg.skb);
+ }
}
/* Phase 2: Deliver -- drain rx_list to read_actor */
@@ -2443,6 +2494,16 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
}
read_sock_end:
+ if (async) {
+ int ret = tls_decrypt_async_drain(ctx);
+
+ __skb_queue_purge(&ctx->rx_list);
+ /* Preserve the error that triggered early exit;
+ * a crypto drain error is secondary.
+ */
+ if (ret && !err)
+ err = ret;
+ }
tls_strp_check_rcv(&ctx->strp);
tls_rx_reader_release(sk, ctx);
return copied ? : err;
--
2.52.0
* Re: [PATCH v3 1/8] tls: Factor tls_decrypt_async_drain() from recvmsg
2026-03-12 1:47 ` [PATCH v3 1/8] tls: Factor tls_decrypt_async_drain() from recvmsg Chuck Lever
@ 2026-03-12 4:34 ` Alistair Francis
2026-03-16 10:13 ` Sabrina Dubroca
1 sibling, 0 replies; 15+ messages in thread
From: Alistair Francis @ 2026-03-12 4:34 UTC (permalink / raw)
To: Chuck Lever
Cc: john.fastabend, kuba, sd, netdev, kernel-tls-handshake,
Chuck Lever, Hannes Reinecke
On Thu, Mar 12, 2026 at 11:48 AM Chuck Lever <cel@kernel.org> wrote:
>
> From: Chuck Lever <chuck.lever@oracle.com>
>
> The recvmsg path pairs tls_decrypt_async_wait() with
> __skb_queue_purge(&ctx->async_hold). Bundling the two into
> tls_decrypt_async_drain() gives later patches a single call for
> async teardown.
>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Alistair
> ---
> net/tls/tls_sw.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index a656ce235758..cedcc82669db 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -249,6 +249,18 @@ static int tls_decrypt_async_wait(struct tls_sw_context_rx *ctx)
> return ctx->async_wait.err;
> }
>
> +/* Collect all pending async AEAD completions and release the
> + * skbs held for them. Returns the crypto error if any
> + * operation failed, zero otherwise.
> + */
> +static int tls_decrypt_async_drain(struct tls_sw_context_rx *ctx)
> +{
> + int ret = tls_decrypt_async_wait(ctx);
> +
> + __skb_queue_purge(&ctx->async_hold);
> + return ret;
> +}
> +
> static int tls_do_decryption(struct sock *sk,
> struct scatterlist *sgin,
> struct scatterlist *sgout,
> @@ -2223,8 +2235,7 @@ int tls_sw_recvmsg(struct sock *sk,
> int ret;
>
> /* Wait for all previously submitted records to be decrypted */
> - ret = tls_decrypt_async_wait(ctx);
> - __skb_queue_purge(&ctx->async_hold);
> + ret = tls_decrypt_async_drain(ctx);
>
> if (ret) {
> if (err >= 0 || err == -EINPROGRESS)
> --
> 2.52.0
>
>
* Re: [PATCH v3 2/8] tls: Factor tls_rx_decrypt_record() helper
2026-03-12 1:47 ` [PATCH v3 2/8] tls: Factor tls_rx_decrypt_record() helper Chuck Lever
@ 2026-03-12 4:35 ` Alistair Francis
2026-03-16 10:20 ` Sabrina Dubroca
1 sibling, 0 replies; 15+ messages in thread
From: Alistair Francis @ 2026-03-12 4:35 UTC (permalink / raw)
To: Chuck Lever
Cc: john.fastabend, kuba, sd, netdev, kernel-tls-handshake,
Chuck Lever, Hannes Reinecke
On Thu, Mar 12, 2026 at 11:48 AM Chuck Lever <cel@kernel.org> wrote:
>
> From: Chuck Lever <chuck.lever@oracle.com>
>
> recvmsg, read_sock, and splice_read each open-code the
> same sequence: zero-initialize the decrypt arguments, call
> tls_rx_one_record(), and abort the connection on failure.
>
> Extract tls_rx_decrypt_record() so each receive path shares
> a single decrypt-and-abort primitive. Each call site still
> initializes darg.inargs separately, since recvmsg sets zc
> and async between the memset and the decrypt call.
>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Alistair
> ---
> net/tls/tls_sw.c | 29 +++++++++++++++++------------
> 1 file changed, 17 insertions(+), 12 deletions(-)
>
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index cedcc82669db..81e0e8aaa6f9 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -1832,6 +1832,17 @@ static int tls_rx_one_record(struct sock *sk, struct msghdr *msg,
> return tls_check_pending_rekey(sk, tls_ctx, darg->skb);
> }
>
> +/* Decrypt one record and abort the connection on failure. */
> +static int tls_rx_decrypt_record(struct sock *sk, struct msghdr *msg,
> + struct tls_decrypt_arg *darg)
> +{
> + int err = tls_rx_one_record(sk, msg, darg);
> +
> + if (err < 0)
> + tls_err_abort(sk, -EBADMSG);
> + return err;
> +}
> +
> int decrypt_skb(struct sock *sk, struct scatterlist *sgout)
> {
> struct tls_decrypt_arg darg = { .zc = true, };
> @@ -2132,11 +2143,9 @@ int tls_sw_recvmsg(struct sock *sk,
> else
> darg.async = false;
>
> - err = tls_rx_one_record(sk, msg, &darg);
> - if (err < 0) {
> - tls_err_abort(sk, -EBADMSG);
> + err = tls_rx_decrypt_record(sk, msg, &darg);
> + if (err < 0)
> goto recv_end;
> - }
>
> async |= darg.async;
>
> @@ -2294,11 +2303,9 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
>
> memset(&darg.inargs, 0, sizeof(darg.inargs));
>
> - err = tls_rx_one_record(sk, NULL, &darg);
> - if (err < 0) {
> - tls_err_abort(sk, -EBADMSG);
> + err = tls_rx_decrypt_record(sk, NULL, &darg);
> + if (err < 0)
> goto splice_read_end;
> - }
>
> tls_rx_rec_done(ctx);
> skb = darg.skb;
> @@ -2380,11 +2387,9 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
>
> memset(&darg.inargs, 0, sizeof(darg.inargs));
>
> - err = tls_rx_one_record(sk, NULL, &darg);
> - if (err < 0) {
> - tls_err_abort(sk, -EBADMSG);
> + err = tls_rx_decrypt_record(sk, NULL, &darg);
> + if (err < 0)
> goto read_sock_end;
> - }
>
> released = tls_read_flush_backlog(sk, prot, INT_MAX,
> 0, decrypted,
> --
> 2.52.0
>
>
* Re: [PATCH v3 1/8] tls: Factor tls_decrypt_async_drain() from recvmsg
2026-03-12 1:47 ` [PATCH v3 1/8] tls: Factor tls_decrypt_async_drain() from recvmsg Chuck Lever
2026-03-12 4:34 ` Alistair Francis
@ 2026-03-16 10:13 ` Sabrina Dubroca
1 sibling, 0 replies; 15+ messages in thread
From: Sabrina Dubroca @ 2026-03-16 10:13 UTC (permalink / raw)
To: Chuck Lever
Cc: john.fastabend, kuba, netdev, kernel-tls-handshake, Chuck Lever,
Hannes Reinecke
2026-03-11, 21:47:57 -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> The recvmsg path pairs tls_decrypt_async_wait() with
> __skb_queue_purge(&ctx->async_hold). Bundling the two into
> tls_decrypt_async_drain() gives later patches a single call for
> async teardown.
I was wondering if tls_decrypt_async_wait() without
__skb_queue_purge() is ever the right thing. Once we've waited for all
pending decryptions, async_hold's job is done, we don't need to keep
all that memory around anymore.
Should we just move recvmsg's __skb_queue_purge() into
tls_decrypt_async_wait()?
--
Sabrina
* Re: [PATCH v3 2/8] tls: Factor tls_rx_decrypt_record() helper
2026-03-12 1:47 ` [PATCH v3 2/8] tls: Factor tls_rx_decrypt_record() helper Chuck Lever
2026-03-12 4:35 ` Alistair Francis
@ 2026-03-16 10:20 ` Sabrina Dubroca
2026-03-17 7:06 ` Hannes Reinecke
1 sibling, 1 reply; 15+ messages in thread
From: Sabrina Dubroca @ 2026-03-16 10:20 UTC (permalink / raw)
To: Chuck Lever
Cc: john.fastabend, kuba, netdev, kernel-tls-handshake, Chuck Lever,
Hannes Reinecke
2026-03-11, 21:47:58 -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> recvmsg, read_sock, and splice_read each open-code the
> same sequence: zero-initialize the decrypt arguments, call
> tls_rx_one_record(), and abort the connection on failure.
>
> Extract tls_rx_decrypt_record() so each receive path shares
> a single decrypt-and-abort primitive. Each call site still
> initializes darg.inargs separately, since recvmsg sets zc
> and async between the memset and the decrypt call.
Is there any reason to keep tls_rx_one_record()? You're replacing all
existing callers, and not introducing new users in this series. Seems
like what you want is just move the tls_err_abort() into
tls_rx_one_record().
(I'm not convinced that "abort the connection on every error (decrypt
fail or ENOMEM or whatever)" is right, but that's a separate question)
--
Sabrina
* Re: [PATCH v3 6/8] tls: Flush backlog before tls_rx_rec_wait in read_sock
2026-03-12 1:48 ` [PATCH v3 6/8] tls: Flush backlog before tls_rx_rec_wait in read_sock Chuck Lever
@ 2026-03-16 17:17 ` Sabrina Dubroca
0 siblings, 0 replies; 15+ messages in thread
From: Sabrina Dubroca @ 2026-03-16 17:17 UTC (permalink / raw)
To: Chuck Lever
Cc: john.fastabend, kuba, netdev, kernel-tls-handshake, Chuck Lever,
Alistair Francis, Hannes Reinecke
2026-03-11, 21:48:02 -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> While lock_sock is held during read_sock, incoming TCP segments
> land on sk->sk_backlog rather than sk->sk_receive_queue.
> tls_rx_rec_wait() inspects only sk_receive_queue, so backlog
> data remains invisible until release_sock() drains it, forcing
> an extra workqueue cycle for records that arrive during
> decryption.
>
> Calling sk_flush_backlog() before tls_rx_rec_wait() moves
> backlog data into sk_receive_queue, where tls_strp_check_rcv()
> can parse it immediately. The existing tls_read_flush_backlog
> call after decryption is retained for TCP window management.
I'm really confused by this.
- Why is the existing tls_read_flush_backlog not enough?
- and what is it still accomplishing now that we're flushing before
every tls_rx_rec_wait?
- Why are other RX paths not affected?
Your first paragraph and the comment in the diff kind of say the
problem is that tls_rx_rec_wait() doesn't try to feed from the
backlog when sk_receive_queue is empty.
> @@ -2387,6 +2387,11 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
> } else {
> struct tls_decrypt_arg darg;
>
> + /* Drain backlog so segments that arrived while the
> + * lock was held appear on sk_receive_queue before
> + * tls_rx_rec_wait waits for a new record.
> + */
> + sk_flush_backlog(sk);
> err = tls_rx_rec_wait(sk, NULL, true, released);
> if (err <= 0)
> goto read_sock_end;
--
Sabrina
* Re: [PATCH v3 2/8] tls: Factor tls_rx_decrypt_record() helper
2026-03-16 10:20 ` Sabrina Dubroca
@ 2026-03-17 7:06 ` Hannes Reinecke
0 siblings, 0 replies; 15+ messages in thread
From: Hannes Reinecke @ 2026-03-17 7:06 UTC (permalink / raw)
To: Sabrina Dubroca, Chuck Lever
Cc: john.fastabend, kuba, netdev, kernel-tls-handshake, Chuck Lever
On 3/16/26 11:20, Sabrina Dubroca wrote:
> 2026-03-11, 21:47:58 -0400, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> recvmsg, read_sock, and splice_read each open-code the
>> same sequence: zero-initialize the decrypt arguments, call
>> tls_rx_one_record(), and abort the connection on failure.
>>
>> Extract tls_rx_decrypt_record() so each receive path shares
>> a single decrypt-and-abort primitive. Each call site still
>> initializes darg.inargs separately, since recvmsg sets zc
>> and async between the memset and the decrypt call.
>
> Is there any reason to keep tls_rx_one_record()? You're replacing all
> existing callers, and not introducing new users in this series. Seems
> like what you want is just move the tls_err_abort() into
> tls_rx_one_record().
>
> (I'm not convinced that "abort the connection on every error (decrypt
> fail or ENOMEM or whatever)" is right, but that's a separate question)
>
It certainly is the sane option. Any error should be considered a
transmission error, so something has happened during transmission.
As the main point of TLS is transmission security, we cannot assume
anything (like being able to pick up the existing connection again),
and have to drop the connection to re-establish the security context.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich