From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: Christopher Lusk <clusk@northecho.dev>, Jakub Kicinski <kuba@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>,
Sabrina Dubroca <sd@queasysnail.net>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
netdev@vger.kernel.org, bpf@vger.kernel.org,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH net v3] net: tls: use sync AEAD for sk_msg BPF sockets
Date: Tue, 26 May 2026 14:44:24 +0800 [thread overview]
Message-ID: <d92bc603-e345-4dee-9ae9-6ad45e4e6642@linux.dev> (raw)
In-Reply-To: <20260526025154.60607-1-clusk@northecho.dev>
On 5/26/26 10:51 AM, Christopher Lusk wrote:
> The kTLS TX path can hand an open record to a sk_msg verdict
> program before encryption. If the verdict applies fewer bytes
> than the open record contains, tls_push_record() splits
> ctx->open_rec into the record being encrypted and a remainder.
> The synchronous path reattaches that remainder before continuing.
>
> With an async AEAD provider, crypto_aead_encrypt() can return
> -EINPROGRESS after ctx->open_rec has been unhooked but before the
> split remainder is reattached. The remainder is no longer
> reachable through ctx->open_rec or ctx->tx_list, silently dropping
> transmitted data and leaking the unreachable tls_rec. The same
> composition also entangles the user-page zerocopy lifetime rules
> with an async completion path.
>
> A sockmap cannot be attached to a socket after an inet ULP is
> installed: sk_psock_init() returns -EINVAL when
> inet_csk_has_ulp() is true. So the supported ordering for
> sockmap + kTLS TX is sockmap first, TLS_TX setup second. When
> TLS_TX setup sees an existing sk_psock, allocate the AEAD with
> CRYPTO_ALG_ASYNC masked out and latch the TX zerocopy gate
> (sw_ctx_tx->async_capable) so the buggy composition becomes
> structurally unreachable. Ordinary kTLS sockets without sk_msg
> BPF attached are unaffected and continue to use async-capable
> providers.
>
> Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
> Cc: stable@vger.kernel.org # 4.20+
> Signed-off-by: Christopher Lusk <clusk@northecho.dev>
> Assisted-by: Codex:gpt-5.5
> Assisted-by: Claude:claude-opus-4-7
> ---
>
> Changes since v2 [1]:
> - Per netdev maintainer guidance [2], replace the Option-C
> drain-on-error fix with a setup-time surface narrowing in
> tls_set_sw_offload(): when a sockmap is already attached at
> TLS_TX setup, request a synchronous AEAD (CRYPTO_ALG_ASYNC in
> the allocation mask) and set sw_ctx_tx->async_capable = 1.
> Both moves are needed: latching async_capable alone disables
> zerocopy but tls_do_encryption() can still return -EINPROGRESS
> on the copy path; selecting a sync provider removes that return
> path for sk_msg-attached sockets.
> - Drop the selftest from the series per Jakub's note that the
> existing sockmap + TLS coverage at
> tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c exercises
> this configuration [3]. That suite covers sockmap + kTLS
> policy paths broadly; the specific async-pcrypt pass-then-drop
> failure mode from the v2 reproducer was validated for v3 on
> QEMU/KVM with a KASAN+LOCKDEP-instrumented kernel against net
> base 2156a29aecff before send.
> - Single-patch series.
>
> Changes since v1:
> - v1's remainder-rooting fix was incomplete; Sashiko AI review
> surfaced a real UAF in the v2 follow-up that John Fastabend
> endorsed on the v1 thread [4]. The surface-narrowing approach
> in v3 makes both failure modes unreachable by avoiding the
> async + sk_msg composition entirely rather than patching each
> continuation point.
>
> [1] https://lore.kernel.org/all/20260521025840.976378-1-clusk@northecho.dev/
> [2] https://lore.kernel.org/all/20260525133028.58494274@kernel.org/
> [3] https://lore.kernel.org/all/20260525133048.2dc6d8d3@kernel.org/
> [4] https://lore.kernel.org/all/huduxtn6parzgiaf5cyiyrrvjjvx6jsdedowvrd4nkwmuyeind@j6migjgofh2i/
>
> net/tls/tls_sw.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 964ebc268..0000000 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -2867,7 +2867,20 @@ int tls_set_sw_offload(struct sock *sk, int tx,
> rec_seq = crypto_info_rec_seq(src_crypto_info, cipher_desc);
>
> if (!*aead) {
> - *aead = crypto_alloc_aead(cipher_desc->cipher_name, 0, 0);
> + u32 mask = 0;
> +
> + if (tx) {
> + struct sk_psock *psock;
> +
> + psock = sk_psock_get(sk);
> + if (psock) {
> + mask = CRYPTO_ALG_ASYNC;
> + sw_ctx_tx->async_capable = 1;
> + sk_psock_put(sk, psock);
> + }
> + }
> +
> + *aead = crypto_alloc_aead(cipher_desc->cipher_name, 0, mask);
> if (IS_ERR(*aead)) {
> rc = PTR_ERR(*aead);
> *aead = NULL;
> --
> 2.54.0
If async_capable is set to 1, the zerocopy path in tls_sw_sendmsg() is
skipped.
Unfortunately ktls with bpf_msg_pop_data() does not work correctly under
this
copy path.
tls_clone_plaintext_msg() aliases msg_pl onto msg_en's plaintext area
(in-place encryption).
BPF runs bpf_msg_pop_data(msg, 0, 2). This shifts msg_pl's SG entry
forward by 2 bytes.
The two SGs now point to the same page at different offsets. Physical
memory overlaps but the start of
address differ.
I think selecting a sync provider via mask = CRYPTO_ALG_ASYNC is
sufficient to
remove the -EINPROGRESS return path.
May be time to remove skmsg from ktls? (disable by default first,
re-enable via a new ktls module_param?)
prev parent reply other threads:[~2026-05-26 6:44 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-26 2:51 [PATCH net v3] net: tls: use sync AEAD for sk_msg BPF sockets Christopher Lusk
2026-05-26 6:44 ` Jiayuan Chen [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d92bc603-e345-4dee-9ae9-6ad45e4e6642@linux.dev \
--to=jiayuan.chen@linux.dev \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=clusk@northecho.dev \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sd@queasysnail.net \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox