* [PATCH net v2 0/4] net: tls: net: tls: fix a few random bugs
@ 2026-05-11 17:49 Jakub Kicinski
2026-05-11 17:49 ` [PATCH net v2 1/4] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring Jakub Kicinski
` (4 more replies)
0 siblings, 5 replies; 17+ messages in thread
From: Jakub Kicinski @ 2026-05-11 17:49 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sd,
john.fastabend, bpf, Jakub Kicinski
Fix a few random bugs, from external reports and my local scan
with various AI tools. Mostly corner cases in code which I don't
think TLS maintainers would consider "battle tested".
v2:
- patches 2 and 3 are new (Sashiko report)
- patch 4 is rewritten to remove the code instead of fixing it
- drop the selftests, they were a little too specific, more PoC
triggers than selftests, and Sashiko kept nit picking
- old patches 1 and 2 were already applied
- old patch 3 is gone since it can't trigger today (I will send it
to net-next)
v1: https://lore.kernel.org/20260429222944.2139041-1-kuba@kernel.org
Jakub Kicinski (4):
net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg
ring
net: tls: prevent chain-after-chain in plain text SG
net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf
verdict
net: tls: remove bad rollback and UAF on ENOSPC
net/tls/tls_sw.c | 44 +++++++++++++++++++++-----------------------
1 file changed, 21 insertions(+), 23 deletions(-)
--
2.54.0
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH net v2 1/4] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring
2026-05-11 17:49 [PATCH net v2 0/4] net: tls: net: tls: fix a few random bugs Jakub Kicinski
@ 2026-05-11 17:49 ` Jakub Kicinski
2026-05-12 10:21 ` Sabrina Dubroca
2026-05-11 17:49 ` [PATCH net v2 2/4] net: tls: prevent chain-after-chain in plain text SG Jakub Kicinski
` (3 subsequent siblings)
4 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2026-05-11 17:49 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sd,
john.fastabend, bpf, Jakub Kicinski, 钱一铭,
daniel, jonathan.lemon
When an sk_msg scatterlist ring wraps (sg.end < sg.start),
tls_push_record() chains the tail portion of the ring to the head
using sg_chain(). An extra entry in the sg array is reserved for
this:
struct sk_msg_sg {
[...]
/* The extra two elements:
* 1) used for chaining the front and sections when the list becomes
* partitioned (e.g. end < start). The crypto APIs require the
* chaining;
* 2) to chain tailer SG entries after the message.
*/
struct scatterlist data[MAX_MSG_FRAGS + 2];
The current code uses MAX_SKB_FRAGS + 1 as the ring size:
sg_chain(&msg_pl->sg.data[msg_pl->sg.start],
MAX_SKB_FRAGS - msg_pl->sg.start + 1,
msg_pl->sg.data);
This places the chain pointer at
sg_chain(data[start], (MAX_SKB_FRAGS - msg_start + 1) .. =
&data[start] + (MAX_SKB_FRAGS - msg_start + 1) - 1 =
data[start + (MAX_SKB_FRAGS - start + 1) - 1] =
data[MAX_SKB_FRAGS]
instead of the true last entry. This is likely due to a "race" of
the commit under Fixes landing close to
commit 031097d9e079 ("bpf: sk_msg, zap ingress queue on psock down")
Convert to ARRAY_SIZE and drop the data[start] / - start (as suggested
by Sabrina).
Reported-by: 钱一铭 <yimingqian591@gmail.com>
Fixes: 9aaaa56845a0 ("bpf: Sockmap/tls, skmsg can have wrapped skmsg that needs extra chaining")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: john.fastabend@gmail.com
CC: sd@queasysnail.net
CC: daniel@iogearbox.net
CC: jonathan.lemon@gmail.com
CC: bpf@vger.kernel.org
---
net/tls/tls_sw.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 2590e855f6a5..2608b0c01849 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -800,11 +800,9 @@ static int tls_push_record(struct sock *sk, int flags,
sg_mark_end(sk_msg_elem(msg_pl, i));
}
- if (msg_pl->sg.end < msg_pl->sg.start) {
- sg_chain(&msg_pl->sg.data[msg_pl->sg.start],
- MAX_SKB_FRAGS - msg_pl->sg.start + 1,
+ if (msg_pl->sg.end < msg_pl->sg.start)
+ sg_chain(msg_pl->sg.data, ARRAY_SIZE(msg_pl->sg.data),
msg_pl->sg.data);
- }
i = msg_pl->sg.start;
sg_chain(rec->sg_aead_in, 2, &msg_pl->sg.data[i]);
--
2.54.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH net v2 2/4] net: tls: prevent chain-after-chain in plain text SG
2026-05-11 17:49 [PATCH net v2 0/4] net: tls: net: tls: fix a few random bugs Jakub Kicinski
2026-05-11 17:49 ` [PATCH net v2 1/4] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring Jakub Kicinski
@ 2026-05-11 17:49 ` Jakub Kicinski
2026-05-12 11:09 ` Sabrina Dubroca
2026-05-13 0:10 ` sashiko-bot
2026-05-11 17:49 ` [PATCH net v2 3/4] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict Jakub Kicinski
` (2 subsequent siblings)
4 siblings, 2 replies; 17+ messages in thread
From: Jakub Kicinski @ 2026-05-11 17:49 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sd,
john.fastabend, bpf, Jakub Kicinski, Sashiko, daniel,
jonathan.lemon
Sashiko points out that if end = 0 (start != 0) the current
code will create a chain link to content type right after
the wrap link:
This would create a chain where the wrap link points directly
to another chain link. The scatterlist API sg_next iterator
does not recursively resolve consecutive chain links.
meaning this is illegal input to crypto.
The wrapping link is unnecessary if end = 0. end is the entry after
the last one used so end = 0 means there's nothing pushed after
the wrap:
end start i
v v v
[ ]...[ ][ d ][ d ][ d ][ d ][rsv for wrap]
Skip the wrapping in this case.
TLS 1.3 can use the "wrapping slot" for it's chaining if end = 0.
This avoids the chain-after-chain.
Move the wrap chaining before marking END and chaining off content
type, that feels like more logical ordering to me, but should not
matter from functional perspective.
Reported-by: Sashiko <sashiko-bot@kernel.org>
Fixes: 9aaaa56845a0 ("bpf: Sockmap/tls, skmsg can have wrapped skmsg that needs extra chaining")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: john.fastabend@gmail.com
CC: sd@queasysnail.net
CC: daniel@iogearbox.net
CC: jonathan.lemon@gmail.com
---
net/tls/tls_sw.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 2608b0c01849..3bfdaf5e64f5 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -789,21 +789,33 @@ static int tls_push_record(struct sock *sk, int flags,
i = msg_pl->sg.end;
sk_msg_iter_var_prev(i);
+ /* msg_pl->sg.data is a ring; data[MAX+1] is reserved for the wrap
+ * link (frags won't use it). 'i' is now the last filled entry:
+ *
+ * i end start
+ * v v v [ rsv ]
+ * [ d ][ d ][ ][ ]...[ ][ d ][ d ][ d ][chain]
+ * ^ END v
+ * `-----------------------------------------'
+ *
+ * Note that SGL does not allow chain-after-chain, so for TLS 1.3,
+ * we must make sure we don't create the wrap entry and then chain
+ * link to content_type immediately at index 0.
+ */
+ if (i < msg_pl->sg.start)
+ sg_chain(msg_pl->sg.data, ARRAY_SIZE(msg_pl->sg.data),
+ msg_pl->sg.data);
+
rec->content_type = record_type;
if (prot->version == TLS_1_3_VERSION) {
/* Add content type to end of message. No padding added */
sg_set_buf(&rec->sg_content_type, &rec->content_type, 1);
sg_mark_end(&rec->sg_content_type);
- sg_chain(msg_pl->sg.data, msg_pl->sg.end + 1,
- &rec->sg_content_type);
+ sg_chain(msg_pl->sg.data, i + 2, &rec->sg_content_type);
} else {
sg_mark_end(sk_msg_elem(msg_pl, i));
}
- if (msg_pl->sg.end < msg_pl->sg.start)
- sg_chain(msg_pl->sg.data, ARRAY_SIZE(msg_pl->sg.data),
- msg_pl->sg.data);
-
i = msg_pl->sg.start;
sg_chain(rec->sg_aead_in, 2, &msg_pl->sg.data[i]);
--
2.54.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH net v2 3/4] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict
2026-05-11 17:49 [PATCH net v2 0/4] net: tls: net: tls: fix a few random bugs Jakub Kicinski
2026-05-11 17:49 ` [PATCH net v2 1/4] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring Jakub Kicinski
2026-05-11 17:49 ` [PATCH net v2 2/4] net: tls: prevent chain-after-chain in plain text SG Jakub Kicinski
@ 2026-05-11 17:49 ` Jakub Kicinski
2026-05-12 9:47 ` Jiayuan Chen
2026-05-13 0:39 ` sashiko-bot
2026-05-11 17:49 ` [PATCH net v2 4/4] net: tls: remove bad rollback and UAF on ENOSPC Jakub Kicinski
2026-05-12 9:28 ` [PATCH net v2 0/4] net: tls: net: tls: fix a few random bugs Jakub Sitnicki
4 siblings, 2 replies; 17+ messages in thread
From: Jakub Kicinski @ 2026-05-11 17:49 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sd,
john.fastabend, bpf, Jakub Kicinski, Sashiko
bpf_exec_tx_verdict() may return having modified the record
and the plaintext/encrypted sk_msg pointers. We must always
reload those pointers after calling bpf_exec_tx_verdict().
On the wait_for_memory path after sk_stream_wait_memory() returns,
the post-wait contains a shortcut:
if (ctx->open_rec && msg_en->sg.size < required_size)
goto alloc_encrypted;
which dereferences the cached msg_en, which can equally point at
a freed record if the prior bpf_exec_tx_verdict() split the open
rec before returning -ENOMEM. Drop the shortcut it seems to have
only been an optimization to skip trivial intro of the loop.
Reported-by: Sashiko <sashiko-bot@kernel.org>
Fixes: 54a3ecaeeeae ("bpf: fix ktls panic with sockmap")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
net/tls/tls_sw.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 3bfdaf5e64f5..360f71fd7884 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1112,7 +1112,6 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
if (!sk_stream_memory_free(sk))
goto wait_for_sndbuf;
-alloc_encrypted:
ret = tls_alloc_encrypted_msg(sk, required_size);
if (ret) {
if (ret != -ENOSPC)
@@ -1255,9 +1254,6 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
tls_trim_both_msgs(sk, orig_size);
goto send_end;
}
-
- if (ctx->open_rec && msg_en->sg.size < required_size)
- goto alloc_encrypted;
}
send_end:
--
2.54.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH net v2 4/4] net: tls: remove bad rollback and UAF on ENOSPC
2026-05-11 17:49 [PATCH net v2 0/4] net: tls: net: tls: fix a few random bugs Jakub Kicinski
` (2 preceding siblings ...)
2026-05-11 17:49 ` [PATCH net v2 3/4] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict Jakub Kicinski
@ 2026-05-11 17:49 ` Jakub Kicinski
2026-05-13 1:39 ` sashiko-bot
2026-05-12 9:28 ` [PATCH net v2 0/4] net: tls: net: tls: fix a few random bugs Jakub Sitnicki
4 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2026-05-11 17:49 UTC (permalink / raw)
To: davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sd,
john.fastabend, bpf, Jakub Kicinski
As explained in commit 54a3ecaeeeae ("bpf: fix ktls panic with sockmap")
once we call BPF there's no way for us to rollback the iter
and copy data, since BPF may have modified the message.
This is regardless of whether BPF set up cork or not.
Remove the attempt to roll back iter completely. This removes a UAF
since BPF may have modified msg_pl and rec, so these pointers were
stale.
Note that I'm entirely unsure what the expected behavior is here
for BPF. Feels like this path must not be exercised by normal
applications / existing deployments in the first place.
Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
net/tls/tls_sw.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 360f71fd7884..22b77840e35a 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1164,11 +1164,8 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
else if (ret == -ENOMEM)
goto wait_for_memory;
else if (ctx->open_rec && ret == -ENOSPC) {
- if (msg_pl->cork_bytes) {
- ret = 0;
- goto send_end;
- }
- goto rollback_iter;
+ ret = 0;
+ goto send_end;
} else if (ret != -EAGAIN)
goto send_end;
}
@@ -1180,11 +1177,6 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
}
continue;
-rollback_iter:
- copied -= try_to_copy;
- sk_msg_sg_copy_clear(msg_pl, first);
- iov_iter_revert(&msg->msg_iter,
- msg_pl->sg.size - orig_size);
fallback_to_reg_send:
sk_msg_trim(sk, msg_pl, orig_size);
}
--
2.54.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH net v2 0/4] net: tls: net: tls: fix a few random bugs
2026-05-11 17:49 [PATCH net v2 0/4] net: tls: net: tls: fix a few random bugs Jakub Kicinski
` (3 preceding siblings ...)
2026-05-11 17:49 ` [PATCH net v2 4/4] net: tls: remove bad rollback and UAF on ENOSPC Jakub Kicinski
@ 2026-05-12 9:28 ` Jakub Sitnicki
2026-05-12 9:37 ` Sabrina Dubroca
4 siblings, 1 reply; 17+ messages in thread
From: Jakub Sitnicki @ 2026-05-12 9:28 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sd,
john.fastabend, bpf, Jiayuan Chen
General question - do we know about any kTLS with sockmap-redirects
users out there? I've been asking around at conferences for a couple
years now and haven't heard about any consumers of this feature.
If there are no known users, maybe we can have kTLS+sockmap glue code
behind a build-time config option and put it on a path to deprecation?
WDYT?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net v2 0/4] net: tls: net: tls: fix a few random bugs
2026-05-12 9:28 ` [PATCH net v2 0/4] net: tls: net: tls: fix a few random bugs Jakub Sitnicki
@ 2026-05-12 9:37 ` Sabrina Dubroca
0 siblings, 0 replies; 17+ messages in thread
From: Sabrina Dubroca @ 2026-05-12 9:37 UTC (permalink / raw)
To: Jakub Sitnicki, Daniel Borkmann, john.fastabend
Cc: Jakub Kicinski, davem, netdev, edumazet, pabeni, andrew+netdev,
horms, bpf, Jiayuan Chen
2026-05-12, 11:28:59 +0200, Jakub Sitnicki wrote:
> General question - do we know about any kTLS with sockmap-redirects
> users out there? I've been asking around at conferences for a couple
> years now and haven't heard about any consumers of this feature.
>
> If there are no known users, maybe we can have kTLS+sockmap glue code
> behind a build-time config option and put it on a path to deprecation?
>
> WDYT?
I was also thinking about this recently (well, more about ripping it
out directly, without a "polite" deprecation path). It's a lot of
complexity (on the same level as the async crypto code, which is also
causing us some pain).
In
https://lore.kernel.org/netdev/47b7dc3c-7092-49fe-b849-6847b73df86d@linux.dev/
Jiayuan Chen linked to a 2018 LPC presentation, which is... not strong
evidence of current use :)
Daniel, John?
--
Sabrina
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net v2 3/4] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict
2026-05-11 17:49 ` [PATCH net v2 3/4] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict Jakub Kicinski
@ 2026-05-12 9:47 ` Jiayuan Chen
2026-05-12 16:04 ` Jakub Kicinski
2026-05-13 0:39 ` sashiko-bot
1 sibling, 1 reply; 17+ messages in thread
From: Jiayuan Chen @ 2026-05-12 9:47 UTC (permalink / raw)
To: Jakub Kicinski, davem
Cc: netdev, edumazet, pabeni, andrew+netdev, horms, sd,
john.fastabend, bpf, Sashiko
On 5/12/26 1:49 AM, Jakub Kicinski wrote:
> bpf_exec_tx_verdict() may return having modified the record
> and the plaintext/encrypted sk_msg pointers. We must always
> reload those pointers after calling bpf_exec_tx_verdict().
>
> On the wait_for_memory path after sk_stream_wait_memory() returns,
> the post-wait contains a shortcut:
>
> if (ctx->open_rec && msg_en->sg.size < required_size)
> goto alloc_encrypted;
>
> which dereferences the cached msg_en, which can equally point at
> a freed record if the prior bpf_exec_tx_verdict() split the open
> rec before returning -ENOMEM. Drop the shortcut it seems to have
> only been an optimization to skip trivial intro of the loop.
>
> Reported-by: Sashiko <sashiko-bot@kernel.org>
> Fixes: 54a3ecaeeeae ("bpf: fix ktls panic with sockmap")
Is the blamed commit correct? I don't think I touched the following code
in this commit.
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> net/tls/tls_sw.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 3bfdaf5e64f5..360f71fd7884 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -1112,7 +1112,6 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
> if (!sk_stream_memory_free(sk))
> goto wait_for_sndbuf;
>
> -alloc_encrypted:
> ret = tls_alloc_encrypted_msg(sk, required_size);
> if (ret) {
> if (ret != -ENOSPC)
> @@ -1255,9 +1254,6 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
> tls_trim_both_msgs(sk, orig_size);
> goto send_end;
> }
> -
> - if (ctx->open_rec && msg_en->sg.size < required_size)
> - goto alloc_encrypted;
> }
>
> send_end:
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net v2 1/4] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring
2026-05-11 17:49 ` [PATCH net v2 1/4] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring Jakub Kicinski
@ 2026-05-12 10:21 ` Sabrina Dubroca
0 siblings, 0 replies; 17+ messages in thread
From: Sabrina Dubroca @ 2026-05-12 10:21 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
john.fastabend, bpf, 钱一铭, daniel,
jonathan.lemon
2026-05-11, 10:49:17 -0700, Jakub Kicinski wrote:
> When an sk_msg scatterlist ring wraps (sg.end < sg.start),
> tls_push_record() chains the tail portion of the ring to the head
> using sg_chain(). An extra entry in the sg array is reserved for
> this:
>
> struct sk_msg_sg {
> [...]
> /* The extra two elements:
> * 1) used for chaining the front and sections when the list becomes
> * partitioned (e.g. end < start). The crypto APIs require the
> * chaining;
> * 2) to chain tailer SG entries after the message.
> */
> struct scatterlist data[MAX_MSG_FRAGS + 2];
>
> The current code uses MAX_SKB_FRAGS + 1 as the ring size:
>
> sg_chain(&msg_pl->sg.data[msg_pl->sg.start],
> MAX_SKB_FRAGS - msg_pl->sg.start + 1,
> msg_pl->sg.data);
>
> This places the chain pointer at
>
> sg_chain(data[start], (MAX_SKB_FRAGS - msg_start + 1) .. =
> &data[start] + (MAX_SKB_FRAGS - msg_start + 1) - 1 =
> data[start + (MAX_SKB_FRAGS - start + 1) - 1] =
> data[MAX_SKB_FRAGS]
>
> instead of the true last entry. This is likely due to a "race" of
> the commit under Fixes landing close to
> commit 031097d9e079 ("bpf: sk_msg, zap ingress queue on psock down")
>
> Convert to ARRAY_SIZE and drop the data[start] / - start (as suggested
> by Sabrina).
>
> Reported-by: 钱一铭 <yimingqian591@gmail.com>
> Fixes: 9aaaa56845a0 ("bpf: Sockmap/tls, skmsg can have wrapped skmsg that needs extra chaining")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
--
Sabrina
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net v2 2/4] net: tls: prevent chain-after-chain in plain text SG
2026-05-11 17:49 ` [PATCH net v2 2/4] net: tls: prevent chain-after-chain in plain text SG Jakub Kicinski
@ 2026-05-12 11:09 ` Sabrina Dubroca
2026-05-12 16:03 ` Jakub Kicinski
2026-05-13 0:10 ` sashiko-bot
1 sibling, 1 reply; 17+ messages in thread
From: Sabrina Dubroca @ 2026-05-12 11:09 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
john.fastabend, bpf, Sashiko, daniel, jonathan.lemon
2026-05-11, 10:49:18 -0700, Jakub Kicinski wrote:
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 2608b0c01849..3bfdaf5e64f5 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -789,21 +789,33 @@ static int tls_push_record(struct sock *sk, int flags,
> i = msg_pl->sg.end;
> sk_msg_iter_var_prev(i);
>
> + /* msg_pl->sg.data is a ring; data[MAX+1] is reserved for the wrap
> + * link (frags won't use it). 'i' is now the last filled entry:
> + *
> + * i end start
> + * v v v [ rsv ]
> + * [ d ][ d ][ ][ ]...[ ][ d ][ d ][ d ][chain]
> + * ^ END v
> + * `-----------------------------------------'
> + *
> + * Note that SGL does not allow chain-after-chain, so for TLS 1.3,
> + * we must make sure we don't create the wrap entry and then chain
> + * link to content_type immediately at index 0.
> + */
All this wrapping with extra "hidden" slots is so confusing...
> + if (i < msg_pl->sg.start)
> + sg_chain(msg_pl->sg.data, ARRAY_SIZE(msg_pl->sg.data),
> + msg_pl->sg.data);
> +
> rec->content_type = record_type;
> if (prot->version == TLS_1_3_VERSION) {
> /* Add content type to end of message. No padding added */
> sg_set_buf(&rec->sg_content_type, &rec->content_type, 1);
> sg_mark_end(&rec->sg_content_type);
> - sg_chain(msg_pl->sg.data, msg_pl->sg.end + 1,
> - &rec->sg_content_type);
> + sg_chain(msg_pl->sg.data, i + 2, &rec->sg_content_type);
Probably a silly question: why do we need to chain the content type?
Could we just drop it directly into the right slot of msg_pl?
(I'm also a bit puzzled by the "last_filled + 2"/"end + 1", because
that would leave an empty slot between the last_filled slot and the
content_type? and maybe even overwrite the first actual message chunk
if we had filled the message?)
[all this could quite possibly be bogus, which I think isn't a good
sign for the maintainability of this code]
--
Sabrina
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net v2 2/4] net: tls: prevent chain-after-chain in plain text SG
2026-05-12 11:09 ` Sabrina Dubroca
@ 2026-05-12 16:03 ` Jakub Kicinski
2026-05-12 22:30 ` Sabrina Dubroca
0 siblings, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2026-05-12 16:03 UTC (permalink / raw)
To: Sabrina Dubroca
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
john.fastabend, bpf, Sashiko, daniel, jonathan.lemon
On Tue, 12 May 2026 13:09:36 +0200 Sabrina Dubroca wrote:
> 2026-05-11, 10:49:18 -0700, Jakub Kicinski wrote:
> > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> > index 2608b0c01849..3bfdaf5e64f5 100644
> > --- a/net/tls/tls_sw.c
> > +++ b/net/tls/tls_sw.c
> > @@ -789,21 +789,33 @@ static int tls_push_record(struct sock *sk, int flags,
> > i = msg_pl->sg.end;
> > sk_msg_iter_var_prev(i);
> >
> > + /* msg_pl->sg.data is a ring; data[MAX+1] is reserved for the wrap
> > + * link (frags won't use it). 'i' is now the last filled entry:
> > + *
> > + * i end start
> > + * v v v [ rsv ]
> > + * [ d ][ d ][ ][ ]...[ ][ d ][ d ][ d ][chain]
> > + * ^ END v
> > + * `-----------------------------------------'
> > + *
> > + * Note that SGL does not allow chain-after-chain, so for TLS 1.3,
> > + * we must make sure we don't create the wrap entry and then chain
> > + * link to content_type immediately at index 0.
> > + */
>
> All this wrapping with extra "hidden" slots is so confusing...
Yes, always takes me an hour to swap the context back in.
I was hoping the diagram would help.
> > + if (i < msg_pl->sg.start)
> > + sg_chain(msg_pl->sg.data, ARRAY_SIZE(msg_pl->sg.data),
> > + msg_pl->sg.data);
> > +
> > rec->content_type = record_type;
> > if (prot->version == TLS_1_3_VERSION) {
> > /* Add content type to end of message. No padding added */
> > sg_set_buf(&rec->sg_content_type, &rec->content_type, 1);
> > sg_mark_end(&rec->sg_content_type);
> > - sg_chain(msg_pl->sg.data, msg_pl->sg.end + 1,
> > - &rec->sg_content_type);
> > + sg_chain(msg_pl->sg.data, i + 2, &rec->sg_content_type);
>
> Probably a silly question: why do we need to chain the content type?
> Could we just drop it directly into the right slot of msg_pl?
Good question, IDK either. Maybe anticipating the need for padding?
Do you prefer that as the fix, or follow up in net-next is fine?
> (I'm also a bit puzzled by the "last_filled + 2"/"end + 1", because
> that would leave an empty slot between the last_filled slot and the
> content_type? and maybe even overwrite the first actual message chunk
> if we had filled the message?)
Incredibly confusingly the value passed to sg_chain is the arrays size.
So it's entry target + 1. IDK if that answer your question but it always
confuses me. I even added it to the comment above but looks like I ended
up removing it.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net v2 3/4] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict
2026-05-12 9:47 ` Jiayuan Chen
@ 2026-05-12 16:04 ` Jakub Kicinski
0 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2026-05-12 16:04 UTC (permalink / raw)
To: Jiayuan Chen
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, sd,
john.fastabend, bpf, Sashiko
On Tue, 12 May 2026 17:47:13 +0800 Jiayuan Chen wrote:
> > Reported-by: Sashiko <sashiko-bot@kernel.org>
> > Fixes: 54a3ecaeeeae ("bpf: fix ktls panic with sockmap")
>
> Is the blamed commit correct? I don't think I touched the following code
> in this commit.
Fair, will update.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net v2 2/4] net: tls: prevent chain-after-chain in plain text SG
2026-05-12 16:03 ` Jakub Kicinski
@ 2026-05-12 22:30 ` Sabrina Dubroca
2026-05-13 0:14 ` Jakub Kicinski
0 siblings, 1 reply; 17+ messages in thread
From: Sabrina Dubroca @ 2026-05-12 22:30 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
john.fastabend, bpf, Sashiko, daniel, jonathan.lemon
2026-05-12, 09:03:42 -0700, Jakub Kicinski wrote:
> On Tue, 12 May 2026 13:09:36 +0200 Sabrina Dubroca wrote:
> > 2026-05-11, 10:49:18 -0700, Jakub Kicinski wrote:
> > > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> > > index 2608b0c01849..3bfdaf5e64f5 100644
> > > --- a/net/tls/tls_sw.c
> > > +++ b/net/tls/tls_sw.c
> > > @@ -789,21 +789,33 @@ static int tls_push_record(struct sock *sk, int flags,
> > > i = msg_pl->sg.end;
> > > sk_msg_iter_var_prev(i);
> > >
> > > + /* msg_pl->sg.data is a ring; data[MAX+1] is reserved for the wrap
> > > + * link (frags won't use it). 'i' is now the last filled entry:
> > > + *
> > > + * i end start
> > > + * v v v [ rsv ]
> > > + * [ d ][ d ][ ][ ]...[ ][ d ][ d ][ d ][chain]
> > > + * ^ END v
> > > + * `-----------------------------------------'
> > > + *
> > > + * Note that SGL does not allow chain-after-chain, so for TLS 1.3,
> > > + * we must make sure we don't create the wrap entry and then chain
> > > + * link to content_type immediately at index 0.
> > > + */
> >
> > All this wrapping with extra "hidden" slots is so confusing...
>
> Yes, always takes me an hour to swap the context back in.
> I was hoping the diagram would help.
I've more or less managed to wrap my head around it due to the
previous patch, but it still takes me way too long just to check the
logic makes sense.
> > > + if (i < msg_pl->sg.start)
> > > + sg_chain(msg_pl->sg.data, ARRAY_SIZE(msg_pl->sg.data),
> > > + msg_pl->sg.data);
> > > +
> > > rec->content_type = record_type;
> > > if (prot->version == TLS_1_3_VERSION) {
> > > /* Add content type to end of message. No padding added */
> > > sg_set_buf(&rec->sg_content_type, &rec->content_type, 1);
> > > sg_mark_end(&rec->sg_content_type);
> > > - sg_chain(msg_pl->sg.data, msg_pl->sg.end + 1,
> > > - &rec->sg_content_type);
> > > + sg_chain(msg_pl->sg.data, i + 2, &rec->sg_content_type);
> >
> > Probably a silly question: why do we need to chain the content type?
> > Could we just drop it directly into the right slot of msg_pl?
>
> Good question, IDK either. Maybe anticipating the need for padding?
> Do you prefer that as the fix, or follow up in net-next is fine?
I'd go with "make it a bit more straightforward" and remove the
chaining. If you want to retain the hunk moving the other chaining
(for wrapping) around, go ahead. Not sure it would still make sense.
> > (I'm also a bit puzzled by the "last_filled + 2"/"end + 1", because
> > that would leave an empty slot between the last_filled slot and the
> > content_type? and maybe even overwrite the first actual message chunk
> > if we had filled the message?)
>
> Incredibly confusingly the value passed to sg_chain is the arrays size.
> So it's entry target + 1. IDK if that answer your question but it always
> confuses me. I even added it to the comment above but looks like I ended
> up removing it.
Ugh, right. I read all this stuff last time you sent those patches,
and a bit earlier as I was reviewing patch 1 again... and I had
already forgotten when I got to this. Thanks.
--
Sabrina
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net v2 2/4] net: tls: prevent chain-after-chain in plain text SG
2026-05-11 17:49 ` [PATCH net v2 2/4] net: tls: prevent chain-after-chain in plain text SG Jakub Kicinski
2026-05-12 11:09 ` Sabrina Dubroca
@ 2026-05-13 0:10 ` sashiko-bot
1 sibling, 0 replies; 17+ messages in thread
From: sashiko-bot @ 2026-05-13 0:10 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Critical] The patch fails to handle empty TLS records (size == 0, start == end) when a BPF program pops all data, leading to a chain-after-chain kernel panic or premature scatterlist termination.
--
commit a19dc6e44cc96cee4ac7cb297275155ee108e228
Author: Jakub Kicinski <kuba@kernel.org>
net: tls: prevent chain-after-chain in plain text SG
This commit adjusts the scatterlist chaining logic in tls_push_record() to
prevent an illegal chain-after-chain configuration when end = 0.
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 2608b0c01849f..3bfdaf5e64f50 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -789,21 +789,33 @@ static int tls_push_record(struct sock *sk, int flags,
> i = msg_pl->sg.end;
> sk_msg_iter_var_prev(i);
Does this code handle the case where a BPF program pops all data and leaves
an empty record where msg_pl->sg.size == 0 and start == end?
[ ... ]
> + if (i < msg_pl->sg.start)
> + sg_chain(msg_pl->sg.data, ARRAY_SIZE(msg_pl->sg.data),
> + msg_pl->sg.data);
If start == end == 0, i wraps to NR_MSG_FRAG_IDS - 1. The condition
i < msg_pl->sg.start evaluates to false, skipping the wrap link.
> rec->content_type = record_type;
> if (prot->version == TLS_1_3_VERSION) {
> /* Add content type to end of message. No padding added */
> sg_set_buf(&rec->sg_content_type, &rec->content_type, 1);
> sg_mark_end(&rec->sg_content_type);
> - sg_chain(msg_pl->sg.data, msg_pl->sg.end + 1,
> - &rec->sg_content_type);
> + sg_chain(msg_pl->sg.data, i + 2, &rec->sg_content_type);
In the start == end == 0 scenario, the content type link is placed at
data[NR_MSG_FRAG_IDS].
If start == end > 0 (for example, start = 5 and end = 5), i becomes 4.
The condition i < msg_pl->sg.start is true, so the wrap link is created.
Here, msg_pl->sg.data is chained at i + 2 (which is 6, meaning data[5])
to rec->sg_content_type.
> } else {
> sg_mark_end(sk_msg_elem(msg_pl, i));
> }
>
> - if (msg_pl->sg.end < msg_pl->sg.start)
> - sg_chain(msg_pl->sg.data, ARRAY_SIZE(msg_pl->sg.data),
> - msg_pl->sg.data);
> -
> i = msg_pl->sg.start;
> sg_chain(rec->sg_aead_in, 2, &msg_pl->sg.data[i]);
When start == end == 0, sg_aead_in chains to data[0]. Since the list starting
at data[0] never links to data[NR_MSG_FRAG_IDS], the scatterlist iterator
might hit a premature SG_END at data[0] instead of finding the content type
link. Can this cause the crypto API to fail because the expected cryptlen
is not satisfied?
When start == end > 0 (e.g., 5), sg_aead_in is chained to msg_pl->sg.data[5].
Since data[5] is already a chain link to rec->sg_content_type, doesn't this
result in an illegal chain-after-chain configuration that the scatterlist API
cannot resolve?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260511174920.433155-1-kuba@kernel.org?part=2
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net v2 2/4] net: tls: prevent chain-after-chain in plain text SG
2026-05-12 22:30 ` Sabrina Dubroca
@ 2026-05-13 0:14 ` Jakub Kicinski
0 siblings, 0 replies; 17+ messages in thread
From: Jakub Kicinski @ 2026-05-13 0:14 UTC (permalink / raw)
To: Sabrina Dubroca
Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
john.fastabend, bpf, Sashiko, daniel, jonathan.lemon
On Wed, 13 May 2026 00:30:32 +0200 Sabrina Dubroca wrote:
> > > Probably a silly question: why do we need to chain the content type?
> > > Could we just drop it directly into the right slot of msg_pl?
> >
> > Good question, IDK either. Maybe anticipating the need for padding?
> > Do you prefer that as the fix, or follow up in net-next is fine?
>
> I'd go with "make it a bit more straightforward" and remove the
> chaining. If you want to retain the hunk moving the other chaining
> (for wrapping) around, go ahead. Not sure it would still make sense.
AFAICT start=N end=0 and tls content_type chaining are unnecessary.
So I think we can keep this fix as it.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net v2 3/4] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict
2026-05-11 17:49 ` [PATCH net v2 3/4] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict Jakub Kicinski
2026-05-12 9:47 ` Jiayuan Chen
@ 2026-05-13 0:39 ` sashiko-bot
1 sibling, 0 replies; 17+ messages in thread
From: sashiko-bot @ 2026-05-13 0:39 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Critical] Silent data loss, stream corruption, and race condition in the trim_sgl error path on wait_for_memory failure.
- [High] Removing the alloc_encrypted shortcut eliminates the verdict retry mechanism, causing connection hangs on ENOMEM.
--
commit 1c563620cb643329347417380b885f1f7483dc0f
Author: Jakub Kicinski <kuba@kernel.org>
net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict
bpf_exec_tx_verdict() may modify the record and the plaintext/encrypted
sk_msg pointers, requiring them to be reloaded. The post-wait shortcut
could dereference a cached msg_en pointing to a freed record if the
verdict split the open rec before returning -ENOMEM. This commit drops
the shortcut to prevent a use-after-free.
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 3bfdaf5e64f50..360f71fd78842 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
[ ... ]
> @@ -1255,9 +1254,6 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
> tls_trim_both_msgs(sk, orig_size);
While this error path wasn't introduced by this patch, is it safe?
If bpf_exec_tx_verdict() returns -ENOMEM, execution jumps to wait_for_memory.
At this point, try_to_copy bytes have already been appended to the record,
msg_iter advanced, and copied incremented.
sk_stream_wait_memory() releases the socket lock and sleeps. If it wakes up and
returns an error, execution jumps to trim_sgl and calls
tls_trim_both_msgs(sk, orig_size) to truncate the record.
Does this discard the try_to_copy bytes without decrementing copied or
reverting msg_iter, returning success to userspace for deleted data?
Because the socket lock was dropped during the sleep, could a concurrent thread
have pushed the record and allocated a new one, meaning trimming this new
record using the stale orig_size from before the sleep corrupts the concurrent
thread's stream?
Also, if the verdict split the record before returning -ENOMEM, would orig_size
refer to the freed record, causing out-of-bounds trimming logic?
> goto send_end;
> }
> -
> - if (ctx->open_rec && msg_en->sg.size < required_size)
> - goto alloc_encrypted;
Does removing this shortcut eliminate the verdict retry mechanism?
If the verdict fails with -ENOMEM on the final data chunk (where
msg_data_left(msg) == 0) and memory subsequently becomes available, removing
this jump means execution falls through to the loop condition and immediately
terminates.
The record is fully constructed in ctx->open_rec but is never pushed to the
engine or transmitted. Does this leave the data stuck in the buffer
indefinitely until a subsequent sendmsg or socket close triggers a flush,
causing the protocol to hang?
> }
>
> send_end:
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260511174920.433155-1-kuba@kernel.org?part=3
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH net v2 4/4] net: tls: remove bad rollback and UAF on ENOSPC
2026-05-11 17:49 ` [PATCH net v2 4/4] net: tls: remove bad rollback and UAF on ENOSPC Jakub Kicinski
@ 2026-05-13 1:39 ` sashiko-bot
0 siblings, 0 replies; 17+ messages in thread
From: sashiko-bot @ 2026-05-13 1:39 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Critical] Silent data loss and stale state usage (UAF) in the -ENOMEM error path.
- [High] BPF policy bypass due to uncleared psock->eval state on -ENOMEM.
--
commit 9867dad26c6eb929c35af6ad7b485aac95c01aef
Author: Jakub Kicinski <kuba@kernel.org>
net: tls: remove bad rollback and UAF on ENOSPC
This commit removes the attempt to roll back the iterator when BPF
encounters an ENOSPC error. Since BPF may have modified the message,
variables like orig_size and other pointers become stale, so rolling
them back leads to use-after-free or corruption.
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 360f71fd78842..22b77840e35ae 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -1164,11 +1164,8 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
> else if (ret == -ENOMEM)
> goto wait_for_memory;
The commit message notes that BPF may have modified the message, making
pointers and state stale. Does this same stale state issue still exist in
the -ENOMEM path?
If bpf_exec_tx_verdict() executes BPF (which may push or pop data) and then
fails with -ENOMEM, execution jumps to wait_for_memory:
wait_for_memory:
ret = sk_stream_wait_memory(sk, &timeo);
if (ret) {
trim_sgl:
if (ctx->open_rec)
tls_trim_both_msgs(sk, orig_size);
goto send_end;
}
If sk_stream_wait_memory() fails, the code calls tls_trim_both_msgs(sk,
orig_size). Can trimming to the pre-BPF orig_size corrupt the scatterlist
if BPF changed the size?
Additionally, copied was already incremented by try_to_copy. Since the
function returns copied, might it falsely indicate to userspace that the
discarded bytes were successfully sent, causing silent data corruption?
What happens if sk_stream_wait_memory() succeeds instead? The loop in
tls_sw_sendmsg_locked() restarts, appending more data from userspace to
the open record.
If tls_push_record() inside the bpf_exec_tx_verdict() path returns -ENOMEM,
does it return early without resetting psock->eval to __SK_NONE?
If so, when bpf_exec_tx_verdict() is called again, could it see psock->eval
evaluating to __SK_PASS and skip the BPF evaluation entirely, allowing
the newly appended data to be sent without verification?
> else if (ctx->open_rec && ret == -ENOSPC) {
> - if (msg_pl->cork_bytes) {
> - ret = 0;
> - goto send_end;
> - }
> - goto rollback_iter;
> + ret = 0;
> + goto send_end;
> } else if (ret != -EAGAIN)
> goto send_end;
> }
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260511174920.433155-1-kuba@kernel.org?part=4
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2026-05-13 1:39 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-11 17:49 [PATCH net v2 0/4] net: tls: net: tls: fix a few random bugs Jakub Kicinski
2026-05-11 17:49 ` [PATCH net v2 1/4] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring Jakub Kicinski
2026-05-12 10:21 ` Sabrina Dubroca
2026-05-11 17:49 ` [PATCH net v2 2/4] net: tls: prevent chain-after-chain in plain text SG Jakub Kicinski
2026-05-12 11:09 ` Sabrina Dubroca
2026-05-12 16:03 ` Jakub Kicinski
2026-05-12 22:30 ` Sabrina Dubroca
2026-05-13 0:14 ` Jakub Kicinski
2026-05-13 0:10 ` sashiko-bot
2026-05-11 17:49 ` [PATCH net v2 3/4] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict Jakub Kicinski
2026-05-12 9:47 ` Jiayuan Chen
2026-05-12 16:04 ` Jakub Kicinski
2026-05-13 0:39 ` sashiko-bot
2026-05-11 17:49 ` [PATCH net v2 4/4] net: tls: remove bad rollback and UAF on ENOSPC Jakub Kicinski
2026-05-13 1:39 ` sashiko-bot
2026-05-12 9:28 ` [PATCH net v2 0/4] net: tls: net: tls: fix a few random bugs Jakub Sitnicki
2026-05-12 9:37 ` Sabrina Dubroca
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox