Netdev List
 help / color / mirror / Atom feed
* [PATCH net v3] net: tls: use sync AEAD for sk_msg BPF sockets
@ 2026-05-26  2:51 Christopher Lusk
  2026-05-26  6:44 ` Jiayuan Chen
  0 siblings, 1 reply; 4+ messages in thread
From: Christopher Lusk @ 2026-05-26  2:51 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: John Fastabend, Sabrina Dubroca, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Alexei Starovoitov, Daniel Borkmann,
	netdev, bpf, linux-kernel, stable

The kTLS TX path can hand an open record to a sk_msg verdict
program before encryption.  If the verdict applies fewer bytes
than the open record contains, tls_push_record() splits
ctx->open_rec into the record being encrypted and a remainder.
The synchronous path reattaches that remainder before continuing.

With an async AEAD provider, crypto_aead_encrypt() can return
-EINPROGRESS after ctx->open_rec has been unhooked but before the
split remainder is reattached.  The remainder is no longer
reachable through ctx->open_rec or ctx->tx_list, silently dropping
transmitted data and leaking the unreachable tls_rec.  The same
composition also entangles the user-page zerocopy lifetime rules
with an async completion path.

A sockmap cannot be attached to a socket after an inet ULP is
installed: sk_psock_init() returns -EINVAL when
inet_csk_has_ulp() is true.  So the supported ordering for
sockmap + kTLS TX is sockmap first, TLS_TX setup second.  When
TLS_TX setup sees an existing sk_psock, allocate the AEAD with
CRYPTO_ALG_ASYNC masked out and latch the TX zerocopy gate
(sw_ctx_tx->async_capable) so the buggy composition becomes
structurally unreachable.  Ordinary kTLS sockets without sk_msg
BPF attached are unaffected and continue to use async-capable
providers.

Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
Cc: stable@vger.kernel.org # 4.20+
Signed-off-by: Christopher Lusk <clusk@northecho.dev>
Assisted-by: Codex:gpt-5.5
Assisted-by: Claude:claude-opus-4-7
---

Changes since v2 [1]:
- Per netdev maintainer guidance [2], replace the Option-C
  drain-on-error fix with a setup-time surface narrowing in
  tls_set_sw_offload(): when a sockmap is already attached at
  TLS_TX setup, request a synchronous AEAD (CRYPTO_ALG_ASYNC in
  the allocation mask) and set sw_ctx_tx->async_capable = 1.
  Both moves are needed: latching async_capable alone disables
  zerocopy but tls_do_encryption() can still return -EINPROGRESS
  on the copy path; selecting a sync provider removes that return
  path for sk_msg-attached sockets.
- Drop the selftest from the series per Jakub's note that the
  existing sockmap + TLS coverage at
  tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c exercises
  this configuration [3].  That suite covers sockmap + kTLS
  policy paths broadly; the specific async-pcrypt pass-then-drop
  failure mode from the v2 reproducer was validated for v3 on
  QEMU/KVM with a KASAN+LOCKDEP-instrumented kernel against net
  base 2156a29aecff before send.
- Single-patch series.

Changes since v1:
- v1's remainder-rooting fix was incomplete; Sashiko AI review
  surfaced a real UAF in the v2 follow-up that John Fastabend
  endorsed on the v1 thread [4].  The surface-narrowing approach
  in v3 makes both failure modes unreachable by avoiding the
  async + sk_msg composition entirely rather than patching each
  continuation point.

[1] https://lore.kernel.org/all/20260521025840.976378-1-clusk@northecho.dev/
[2] https://lore.kernel.org/all/20260525133028.58494274@kernel.org/
[3] https://lore.kernel.org/all/20260525133048.2dc6d8d3@kernel.org/
[4] https://lore.kernel.org/all/huduxtn6parzgiaf5cyiyrrvjjvx6jsdedowvrd4nkwmuyeind@j6migjgofh2i/

 net/tls/tls_sw.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 964ebc268..0000000 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2867,7 +2867,20 @@ int tls_set_sw_offload(struct sock *sk, int tx,
 	rec_seq = crypto_info_rec_seq(src_crypto_info, cipher_desc);

 	if (!*aead) {
-		*aead = crypto_alloc_aead(cipher_desc->cipher_name, 0, 0);
+		u32 mask = 0;
+
+		if (tx) {
+			struct sk_psock *psock;
+
+			psock = sk_psock_get(sk);
+			if (psock) {
+				mask = CRYPTO_ALG_ASYNC;
+				sw_ctx_tx->async_capable = 1;
+				sk_psock_put(sk, psock);
+			}
+		}
+
+		*aead = crypto_alloc_aead(cipher_desc->cipher_name, 0, mask);
 		if (IS_ERR(*aead)) {
 			rc = PTR_ERR(*aead);
 			*aead = NULL;
--
2.54.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net v3] net: tls: use sync AEAD for sk_msg BPF sockets
  2026-05-26  2:51 [PATCH net v3] net: tls: use sync AEAD for sk_msg BPF sockets Christopher Lusk
@ 2026-05-26  6:44 ` Jiayuan Chen
  2026-05-26 23:11   ` Jakub Kicinski
  0 siblings, 1 reply; 4+ messages in thread
From: Jiayuan Chen @ 2026-05-26  6:44 UTC (permalink / raw)
  To: Christopher Lusk, Jakub Kicinski
  Cc: John Fastabend, Sabrina Dubroca, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Alexei Starovoitov, Daniel Borkmann,
	netdev, bpf, linux-kernel, stable


On 5/26/26 10:51 AM, Christopher Lusk wrote:
> The kTLS TX path can hand an open record to a sk_msg verdict
> program before encryption.  If the verdict applies fewer bytes
> than the open record contains, tls_push_record() splits
> ctx->open_rec into the record being encrypted and a remainder.
> The synchronous path reattaches that remainder before continuing.
>
> With an async AEAD provider, crypto_aead_encrypt() can return
> -EINPROGRESS after ctx->open_rec has been unhooked but before the
> split remainder is reattached.  The remainder is no longer
> reachable through ctx->open_rec or ctx->tx_list, silently dropping
> transmitted data and leaking the unreachable tls_rec.  The same
> composition also entangles the user-page zerocopy lifetime rules
> with an async completion path.
>
> A sockmap cannot be attached to a socket after an inet ULP is
> installed: sk_psock_init() returns -EINVAL when
> inet_csk_has_ulp() is true.  So the supported ordering for
> sockmap + kTLS TX is sockmap first, TLS_TX setup second.  When
> TLS_TX setup sees an existing sk_psock, allocate the AEAD with
> CRYPTO_ALG_ASYNC masked out and latch the TX zerocopy gate
> (sw_ctx_tx->async_capable) so the buggy composition becomes
> structurally unreachable.  Ordinary kTLS sockets without sk_msg
> BPF attached are unaffected and continue to use async-capable
> providers.
>
> Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
> Cc: stable@vger.kernel.org # 4.20+
> Signed-off-by: Christopher Lusk <clusk@northecho.dev>
> Assisted-by: Codex:gpt-5.5
> Assisted-by: Claude:claude-opus-4-7
> ---
>
> Changes since v2 [1]:
> - Per netdev maintainer guidance [2], replace the Option-C
>    drain-on-error fix with a setup-time surface narrowing in
>    tls_set_sw_offload(): when a sockmap is already attached at
>    TLS_TX setup, request a synchronous AEAD (CRYPTO_ALG_ASYNC in
>    the allocation mask) and set sw_ctx_tx->async_capable = 1.
>    Both moves are needed: latching async_capable alone disables
>    zerocopy but tls_do_encryption() can still return -EINPROGRESS
>    on the copy path; selecting a sync provider removes that return
>    path for sk_msg-attached sockets.
> - Drop the selftest from the series per Jakub's note that the
>    existing sockmap + TLS coverage at
>    tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c exercises
>    this configuration [3].  That suite covers sockmap + kTLS
>    policy paths broadly; the specific async-pcrypt pass-then-drop
>    failure mode from the v2 reproducer was validated for v3 on
>    QEMU/KVM with a KASAN+LOCKDEP-instrumented kernel against net
>    base 2156a29aecff before send.
> - Single-patch series.
>
> Changes since v1:
> - v1's remainder-rooting fix was incomplete; Sashiko AI review
>    surfaced a real UAF in the v2 follow-up that John Fastabend
>    endorsed on the v1 thread [4].  The surface-narrowing approach
>    in v3 makes both failure modes unreachable by avoiding the
>    async + sk_msg composition entirely rather than patching each
>    continuation point.
>
> [1] https://lore.kernel.org/all/20260521025840.976378-1-clusk@northecho.dev/
> [2] https://lore.kernel.org/all/20260525133028.58494274@kernel.org/
> [3] https://lore.kernel.org/all/20260525133048.2dc6d8d3@kernel.org/
> [4] https://lore.kernel.org/all/huduxtn6parzgiaf5cyiyrrvjjvx6jsdedowvrd4nkwmuyeind@j6migjgofh2i/
>
>   net/tls/tls_sw.c | 15 ++++++++++++++-
>   1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 964ebc268..0000000 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -2867,7 +2867,20 @@ int tls_set_sw_offload(struct sock *sk, int tx,
>   	rec_seq = crypto_info_rec_seq(src_crypto_info, cipher_desc);
>
>   	if (!*aead) {
> -		*aead = crypto_alloc_aead(cipher_desc->cipher_name, 0, 0);
> +		u32 mask = 0;
> +
> +		if (tx) {
> +			struct sk_psock *psock;
> +
> +			psock = sk_psock_get(sk);
> +			if (psock) {
> +				mask = CRYPTO_ALG_ASYNC;
> +				sw_ctx_tx->async_capable = 1;
> +				sk_psock_put(sk, psock);
> +			}
> +		}
> +
> +		*aead = crypto_alloc_aead(cipher_desc->cipher_name, 0, mask);
>   		if (IS_ERR(*aead)) {
>   			rc = PTR_ERR(*aead);
>   			*aead = NULL;
> --
> 2.54.0

If async_capable is set to 1, the zerocopy path in tls_sw_sendmsg() is 
skipped.
Unfortunately ktls with bpf_msg_pop_data() does not work correctly under 
this
copy path.

tls_clone_plaintext_msg() aliases msg_pl onto msg_en's plaintext area 
(in-place encryption).

BPF runs bpf_msg_pop_data(msg, 0, 2). This shifts msg_pl's SG entry 
forward by 2 bytes.
The two SGs now point to the same page at different offsets. Physical 
memory overlaps but the start of
address differ.

I think selecting a sync provider via mask = CRYPTO_ALG_ASYNC is 
sufficient to
remove the -EINPROGRESS return path.

May be time to remove skmsg from ktls? (disable by default first, 
re-enable via a new ktls module_param?)



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net v3] net: tls: use sync AEAD for sk_msg BPF sockets
  2026-05-26  6:44 ` Jiayuan Chen
@ 2026-05-26 23:11   ` Jakub Kicinski
  2026-05-27  5:09     ` Jiayuan Chen
  0 siblings, 1 reply; 4+ messages in thread
From: Jakub Kicinski @ 2026-05-26 23:11 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: Christopher Lusk, John Fastabend, Sabrina Dubroca,
	David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Alexei Starovoitov, Daniel Borkmann, netdev, bpf, linux-kernel,
	stable

On Tue, 26 May 2026 14:44:24 +0800 Jiayuan Chen wrote:
> If async_capable is set to 1, the zerocopy path in tls_sw_sendmsg() is 
> skipped.
> Unfortunately ktls with bpf_msg_pop_data() does not work correctly under 
> this
> copy path.
> 
> tls_clone_plaintext_msg() aliases msg_pl onto msg_en's plaintext area 
> (in-place encryption).
> 
> BPF runs bpf_msg_pop_data(msg, 0, 2). This shifts msg_pl's SG entry 
> forward by 2 bytes.
> The two SGs now point to the same page at different offsets. Physical 
> memory overlaps but the start of
> address differ.

Ugh, do you mean that the memcopy path is broken? There are other
conditions under which we may fall into it than just !async_capable :(
Small send with MSG_MORE is probably the easiest?

So we need to fix that one way or the other.

> I think selecting a sync provider via mask = CRYPTO_ALG_ASYNC is 
> sufficient to
> remove the -EINPROGRESS return path.
> 
> May be time to remove skmsg from ktls? (disable by default first, 
> re-enable via a new ktls module_param?)

Yes, we asked John F off-list to get his attention and I think there's
only a vague plan to start using kTLS + sockmap, no current user
(sorry if I misread / misremembered).

module params aren't a great API. If we want to deprecate it let's just
remove the integration in net-next. You have my vote..

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net v3] net: tls: use sync AEAD for sk_msg BPF sockets
  2026-05-26 23:11   ` Jakub Kicinski
@ 2026-05-27  5:09     ` Jiayuan Chen
  0 siblings, 0 replies; 4+ messages in thread
From: Jiayuan Chen @ 2026-05-27  5:09 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Christopher Lusk, John Fastabend, Sabrina Dubroca,
	David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Alexei Starovoitov, Daniel Borkmann, netdev, bpf, linux-kernel,
	stable


On 5/27/26 7:11 AM, Jakub Kicinski wrote:
> On Tue, 26 May 2026 14:44:24 +0800 Jiayuan Chen wrote:
>> If async_capable is set to 1, the zerocopy path in tls_sw_sendmsg() is
>> skipped.
>> Unfortunately ktls with bpf_msg_pop_data() does not work correctly under
>> this
>> copy path.
>>
>> tls_clone_plaintext_msg() aliases msg_pl onto msg_en's plaintext area
>> (in-place encryption).
>>
>> BPF runs bpf_msg_pop_data(msg, 0, 2). This shifts msg_pl's SG entry
>> forward by 2 bytes.
>> The two SGs now point to the same page at different offsets. Physical
>> memory overlaps but the start of
>> address differ.
> Ugh, do you mean that the memcopy path is broken? There are other
> conditions under which we may fall into it than just !async_capable :(
> Small send with MSG_MORE is probably the easiest?
>
> So we need to fix that one way or the other.


Yes, the memcopy path is broken, but only when combined with sockmap's 
pop helper.


msg_pl and msg_en share the underlying page:

                        msg_pl           msg_pl end
                          ^                     ^
                   |------|------------------|-------|
                   | hdr |   plaintext     |  tag  |
                   |------|------------------|-------|
                   ^                                      ^
                   |                                       |
               msg_en                         msg_en end

Before encryption, sge->offset += prot->prepend_size is applied
to msg_en so that the encryption's dst and src point to the same
block of memory.

But once pop has run — i.e. msg_pl's start advances — the encryption's 
dst and src
are no longer the same.

crypto_ctr_crypt():
When dst and src have the same address, crypto saves the encryption 
result into a
temporary buffer and then writes it back to dst.

When dst and src have different addresses, the crypto module treats them 
as two

separate buffers and stops considering in-place mode.

it's complicated to process pop/push + head/mid/tail...

>> I think selecting a sync provider via mask = CRYPTO_ALG_ASYNC is
>> sufficient to
>> remove the -EINPROGRESS return path.
>>
>> May be time to remove skmsg from ktls? (disable by default first,
>> re-enable via a new ktls module_param?)
> Yes, we asked John F off-list to get his attention and I think there's
> only a vague plan to start using kTLS + sockmap, no current user
> (sorry if I misread / misremembered).
>
> module params aren't a great API. If we want to deprecate it let's just
> remove the integration in net-next. You have my vote..

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-05-27  5:09 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-26  2:51 [PATCH net v3] net: tls: use sync AEAD for sk_msg BPF sockets Christopher Lusk
2026-05-26  6:44 ` Jiayuan Chen
2026-05-26 23:11   ` Jakub Kicinski
2026-05-27  5:09     ` Jiayuan Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox