From: Chuck Lever <cel@kernel.org>
To: John Fastabend <john.fastabend@gmail.com>,
Jakub Kicinski <kuba@kernel.org>,
Sabrina Dubroca <sd@queasysnail.net>
Cc: Eric Dumazet <edumazet@google.com>,
Simon Horman <horms@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
netdev@vger.kernel.org, kernel-tls-handshake@lists.linux.dev,
Chuck Lever <chuck.lever@oracle.com>
Subject: [PATCH net-next v9 4/5] tls: Suppress spurious saved_data_ready on all receive paths
Date: Wed, 29 Apr 2026 17:48:11 -0400
Message-ID: <20260429-tls-read-sock-v9-4-39e71aa7810f@oracle.com>
In-Reply-To: <20260429-tls-read-sock-v9-0-39e71aa7810f@oracle.com>
From: Chuck Lever <chuck.lever@oracle.com>
Each record release via tls_strp_msg_done() triggers
tls_strp_check_rcv(), which calls tls_rx_msg_ready() and
fires saved_data_ready(). During a multi-record receive,
the first N-1 wakeups are pure overhead: the caller is
already running and will pick up subsequent records on
the next loop iteration. On the splice_read path the
per-record wakeup is similarly unnecessary because the
caller still holds the socket lock.
Replace tls_strp_msg_done() with tls_strp_msg_release()
in all three receive paths (read_sock, recvmsg,
splice_read), deferring the consumer notification to
each path's exit point. Factor tls_rx_msg_ready() out
of tls_strp_copyin() and tls_strp_read_sock(), and add
a @wake parameter to
tls_strp_check_rcv() so callers can parse queued data
without notifying. tls_strp_check_rcv() retains its
no-op-on-msg_ready semantics, so the BH and worker
notification paths fire saved_data_ready() at most once
per parsed record.
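
The effect of deferring the notification can be sketched with a
small user-space model. This is an illustration only, not kernel
code: struct wake_model and the wake_model_*() helpers are
hypothetical names, the "lower socket" is reduced to a counter of
queued records, and the budget argument stands in for whatever
makes a reader stop early.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only; every name here is hypothetical. */
struct wake_model {
	int queued;      /* complete records waiting in the lower socket */
	bool msg_ready;  /* a parsed record is available to the reader */
	int wakeups;     /* times the consumer notification fired */
};

/* Parse one queued record if none is ready; notify only when asked. */
void wake_model_check_rcv(struct wake_model *m, bool wake)
{
	if (m->msg_ready || m->queued == 0)
		return;
	m->queued--;
	m->msg_ready = true;
	if (wake)
		m->wakeups++;
}

/* Drop the current record without parsing the next one. */
void wake_model_release(struct wake_model *m)
{
	m->msg_ready = false;
}

/* Old scheme: every release re-parses and notifies. */
int wake_model_per_record(int nrecords)
{
	struct wake_model m = { .queued = nrecords };

	wake_model_check_rcv(&m, false);  /* reader parses the first record */
	while (m.msg_ready) {
		wake_model_release(&m);
		wake_model_check_rcv(&m, true);  /* per-record wakeup */
	}
	return m.wakeups;
}

/* New scheme: parse silently in the loop, notify once at the exit. */
int wake_model_deferred(int nrecords, int budget)
{
	struct wake_model m = { .queued = nrecords };

	wake_model_check_rcv(&m, false);
	while (m.msg_ready && budget-- > 0) {
		wake_model_release(&m);
		wake_model_check_rcv(&m, false);
	}
	if (m.msg_ready)  /* exit point: a record is left for the next reader */
		m.wakeups++;
	return m.wakeups;
}
```

With four queued records the per-record variant counts three
notifications, all aimed at a reader that is already running.
The deferred variant counts none when the reader drains
everything and one when it stops after two records.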
The exit points then invoke tls_rx_msg_ready() once,
covering records the inline parse loop left behind for
a subsequent reader. To keep that final notification
idempotent against records BH or the worker has already
announced, tls_strparser gains a msg_announced bit:
tls_rx_msg_ready() sets it when firing saved_data_ready();
the bit is cleared whenever the parsed record is wiped,
by tls_strp_msg_release() on consumption or by
tls_strp_msg_load() when the lower socket loses bytes
from under the parse. A second call for the same parsed
record, as happens when recvmsg() satisfies the request
from ctx->rx_list without touching the strparser, is then
a no-op.
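
The announce-once rule described above can be sketched in user
space as follows. This is a simplified model under stated
assumptions, not the kernel implementation: struct ann_model and
the ann_*() helpers are hypothetical names, locking is omitted,
and saved_data_ready() is replaced by a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model only; every name here is hypothetical. */
struct ann_model {
	bool msg_ready;      /* a parsed record is pending */
	bool msg_announced;  /* the pending record was already announced */
	int wakeups;         /* stands in for saved_data_ready() calls */
};

/* A record has been parsed (by BH, the worker, or the reader). */
void ann_parse(struct ann_model *m)
{
	m->msg_ready = true;
}

/* Notify at most once per parsed record. */
void ann_rx_msg_ready(struct ann_model *m)
{
	if (!m->msg_ready || m->msg_announced)
		return;
	m->msg_announced = true;
	m->wakeups++;
}

/* Wiping the record re-arms the announcement for the next one. */
void ann_release(struct ann_model *m)
{
	m->msg_ready = false;
	m->msg_announced = false;
}
```

In this model, calling ann_rx_msg_ready() twice for the same
parsed record bumps the counter once, calling it with no record
pending does nothing, and a release followed by a new parse
allows exactly one further notification.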
With no remaining callers, tls_strp_msg_done() and its
wrapper tls_rx_rec_done() are removed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
include/net/tls.h | 4 ++++
net/tls/tls.h | 3 +--
net/tls/tls_main.c | 2 +-
net/tls/tls_strp.c | 29 ++++++++++++++++++-----------
net/tls/tls_sw.c | 38 +++++++++++++++++++++++++++++++-------
5 files changed, 55 insertions(+), 21 deletions(-)
diff --git a/include/net/tls.h b/include/net/tls.h
index ebd2550280ae..b5c00a93f6ba 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -111,10 +111,14 @@ struct tls_sw_context_tx {
struct tls_strparser {
struct sock *sk;
+ /* Bitfield word and msg_ready are serialized by the lower
+ * socket lock; BH and worker contexts both acquire it.
+ */
u32 mark : 8;
u32 stopped : 1;
u32 copy_mode : 1;
u32 mixed_decrypted : 1;
+ u32 msg_announced : 1;
bool msg_ready;
diff --git a/net/tls/tls.h b/net/tls/tls.h
index a97f1acef31d..f41dac6305f4 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -192,9 +192,8 @@ void tls_strp_stop(struct tls_strparser *strp);
int tls_strp_init(struct tls_strparser *strp, struct sock *sk);
void tls_strp_data_ready(struct tls_strparser *strp);
-void tls_strp_check_rcv(struct tls_strparser *strp);
+void tls_strp_check_rcv(struct tls_strparser *strp, bool wake);
void tls_strp_msg_release(struct tls_strparser *strp);
-void tls_strp_msg_done(struct tls_strparser *strp);
int tls_rx_msg_size(struct tls_strparser *strp, struct sk_buff *skb);
void tls_rx_msg_ready(struct tls_strparser *strp);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index fd39acf41a61..c10a3fd7fc17 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -769,7 +769,7 @@ static int do_tls_setsockopt_conf(struct sock *sk, sockptr_t optval,
} else {
struct tls_sw_context_rx *rx_ctx = tls_sw_ctx_rx(ctx);
- tls_strp_check_rcv(&rx_ctx->strp);
+ tls_strp_check_rcv(&rx_ctx->strp, true);
}
return 0;
diff --git a/net/tls/tls_strp.c b/net/tls/tls_strp.c
index a7648ebde162..bf88fad58b9b 100644
--- a/net/tls/tls_strp.c
+++ b/net/tls/tls_strp.c
@@ -368,7 +368,6 @@ static int tls_strp_copyin(read_descriptor_t *desc, struct sk_buff *in_skb,
desc->count = 0;
WRITE_ONCE(strp->msg_ready, 1);
- tls_rx_msg_ready(strp);
}
return ret;
@@ -492,6 +491,7 @@ bool tls_strp_msg_load(struct tls_strparser *strp, bool force_refresh)
if (!strp->copy_mode && force_refresh) {
if (unlikely(tcp_inq(strp->sk) < strp->stm.full_len)) {
WRITE_ONCE(strp->msg_ready, 0);
+ strp->msg_announced = 0;
memset(&strp->stm, 0, sizeof(strp->stm));
return false;
}
@@ -539,18 +539,30 @@ static int tls_strp_read_sock(struct tls_strparser *strp)
return tls_strp_read_copy(strp, false);
WRITE_ONCE(strp->msg_ready, 1);
- tls_rx_msg_ready(strp);
return 0;
}
-void tls_strp_check_rcv(struct tls_strparser *strp)
+/**
+ * tls_strp_check_rcv - parse queued data and optionally notify
+ * @strp: TLS stream parser instance
+ * @wake: if true, fire consumer notification when a record is newly
+ * parsed by this call
+ *
+ * Returns immediately when a record is already ready; the wake fires
+ * only on transitions from no-record to record-ready. Callers that
+ * need to notify a waiter about a record parsed by another path
+ * should invoke tls_rx_msg_ready() directly.
+ */
+void tls_strp_check_rcv(struct tls_strparser *strp, bool wake)
{
if (unlikely(strp->stopped) || strp->msg_ready)
return;
if (tls_strp_read_sock(strp) == -ENOMEM)
queue_work(tls_strp_wq, &strp->work);
+ else if (wake && strp->msg_ready)
+ tls_rx_msg_ready(strp);
}
/* Lower sock lock held */
@@ -568,7 +580,7 @@ void tls_strp_data_ready(struct tls_strparser *strp)
return;
}
- tls_strp_check_rcv(strp);
+ tls_strp_check_rcv(strp, true);
}
static void tls_strp_work(struct work_struct *w)
@@ -577,7 +589,7 @@ static void tls_strp_work(struct work_struct *w)
container_of(w, struct tls_strparser, work);
lock_sock(strp->sk);
- tls_strp_check_rcv(strp);
+ tls_strp_check_rcv(strp, true);
release_sock(strp->sk);
}
@@ -600,15 +612,10 @@ void tls_strp_msg_release(struct tls_strparser *strp)
tls_strp_flush_anchor_copy(strp);
WRITE_ONCE(strp->msg_ready, 0);
+ strp->msg_announced = 0;
memset(&strp->stm, 0, sizeof(strp->stm));
}
-void tls_strp_msg_done(struct tls_strparser *strp)
-{
- tls_strp_msg_release(strp);
- tls_strp_check_rcv(strp);
-}
-
void tls_strp_stop(struct tls_strparser *strp)
{
strp->stopped = 1;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index c58d3b0b0a8a..cbb068266bab 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1383,7 +1383,11 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
return ret;
if (!skb_queue_empty(&sk->sk_receive_queue)) {
- tls_strp_check_rcv(&ctx->strp);
+ /* Defer notification to the exit point;
+ * this thread will consume the record
+ * directly.
+ */
+ tls_strp_check_rcv(&ctx->strp, false);
if (tls_strp_msg_ready(ctx))
break;
}
@@ -1869,9 +1873,17 @@ static int tls_record_content_type(struct msghdr *msg, struct tls_msg *tlm,
return 1;
}
-static void tls_rx_rec_done(struct tls_sw_context_rx *ctx)
+/* Parse any data left in the lower socket and hand off a single
+ * notification to the next reader. tls_rx_msg_ready() is a no-op
+ * when the current record has already been announced, so paths
+ * that drained ctx->rx_list without touching the strparser do
+ * not re-fire saved_data_ready() for a record BH or the worker
+ * already announced.
+ */
+static void tls_rx_handoff(struct tls_sw_context_rx *ctx)
{
- tls_strp_msg_done(&ctx->strp);
+ tls_strp_check_rcv(&ctx->strp, false);
+ tls_rx_msg_ready(&ctx->strp);
}
/* This function traverses the rx_list in tls receive context to copies the
@@ -2152,7 +2164,7 @@ int tls_sw_recvmsg(struct sock *sk,
err = tls_record_content_type(msg, tls_msg(darg.skb), &control);
if (err <= 0) {
DEBUG_NET_WARN_ON_ONCE(darg.zc);
- tls_rx_rec_done(ctx);
+ tls_strp_msg_release(&ctx->strp);
put_on_rx_list_err:
__skb_queue_tail(&ctx->rx_list, darg.skb);
goto recv_end;
@@ -2166,7 +2178,8 @@ int tls_sw_recvmsg(struct sock *sk,
/* TLS 1.3 may have updated the length by more than overhead */
rxm = strp_msg(darg.skb);
chunk = rxm->full_len;
- tls_rx_rec_done(ctx);
+ tls_strp_msg_release(&ctx->strp);
+ tls_strp_check_rcv(&ctx->strp, false);
if (!darg.zc) {
bool partially_consumed = chunk > len;
@@ -2260,6 +2273,7 @@ int tls_sw_recvmsg(struct sock *sk,
copied += decrypted;
end:
+ tls_rx_handoff(ctx);
tls_rx_reader_unlock(sk, ctx);
if (psock)
sk_psock_put(sk, psock);
@@ -2300,7 +2314,7 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
if (err < 0)
goto splice_read_end;
- tls_rx_rec_done(ctx);
+ tls_strp_msg_release(&ctx->strp);
skb = darg.skb;
}
@@ -2327,6 +2341,7 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
consume_skb(skb);
splice_read_end:
+ tls_rx_handoff(ctx);
tls_rx_reader_unlock(sk, ctx);
return copied ? : err;
@@ -2392,7 +2407,7 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
tlm = tls_msg(skb);
decrypted += rxm->full_len;
- tls_rx_rec_done(ctx);
+ tls_strp_msg_release(&ctx->strp);
}
/* read_sock does not support reading control messages */
@@ -2420,6 +2435,7 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
}
read_sock_end:
+ tls_rx_handoff(ctx);
tls_rx_reader_release(sk, ctx);
return copied ? : err;
@@ -2504,10 +2520,18 @@ int tls_rx_msg_size(struct tls_strparser *strp, struct sk_buff *skb)
return ret;
}
+/* Fire saved_data_ready() at most once per parsed record.
+ * msg_announced is cleared by tls_strp_msg_release() when the
+ * current record is consumed, arming the next announcement.
+ */
void tls_rx_msg_ready(struct tls_strparser *strp)
{
struct tls_sw_context_rx *ctx;
+ if (!READ_ONCE(strp->msg_ready) || strp->msg_announced)
+ return;
+ strp->msg_announced = 1;
+
ctx = container_of(strp, struct tls_sw_context_rx, strp);
ctx->saved_data_ready(strp->sk);
}
--
2.53.0
Thread overview: 16+ messages
2026-04-29 21:48 [PATCH net-next v9 0/5] TLS read_sock performance scalability Chuck Lever
2026-04-29 21:48 ` [PATCH net-next v9 1/5] tls: Abort the connection on decrypt failure Chuck Lever
2026-05-03 1:20 ` Jakub Kicinski
2026-04-29 21:48 ` [PATCH net-next v9 2/5] tls: Fix dangling skb pointer in tls_sw_read_sock() Chuck Lever
2026-05-03 1:05 ` Jakub Kicinski
2026-04-29 21:48 ` [PATCH net-next v9 3/5] tls: Factor tls_strp_msg_release() from tls_strp_msg_done() Chuck Lever
2026-05-03 1:09 ` Jakub Kicinski
2026-04-29 21:48 ` Chuck Lever [this message]
2026-05-03 1:19 ` [PATCH net-next v9 4/5] tls: Suppress spurious saved_data_ready on all receive paths Jakub Kicinski
2026-04-29 21:48 ` [PATCH net-next v9 5/5] tls: Flush backlog before waiting for a new record Chuck Lever
2026-04-29 23:13 ` [PATCH net-next v9 0/5] TLS read_sock performance scalability Jakub Kicinski
2026-04-29 23:15 ` Chuck Lever
2026-05-03 1:04 ` Jakub Kicinski
2026-05-03 19:34 ` Chuck Lever
2026-05-04 13:33 ` Sabrina Dubroca
2026-05-04 15:59 ` Chuck Lever