BPF List
* [PATCH net v5 0/2] bpf, skmsg: fix verdict sk_data_ready racing with ktls rx
@ 2026-05-17 14:56 Xingwang Xiang
  2026-05-17 14:56 ` [PATCH net v5 1/2] " Xingwang Xiang
  2026-05-17 14:56 ` [PATCH net v5 2/2] selftests/bpf: add regression test for ktls+sockmap verdict UAF Xingwang Xiang
  0 siblings, 2 replies; 5+ messages in thread
From: Xingwang Xiang @ 2026-05-17 14:56 UTC
  To: john.fastabend, kuba, mrpre
  Cc: jakub, sd, davem, pabeni, horms, netdev, daniel, bpf,
	Xingwang Xiang

sk_psock_verdict_data_ready() lacks the tls_sw_has_ctx_rx() guard that
sk_psock_strp_data_ready() gained in e91de6afa81c.  When a socket is
inserted into a sockmap (BPF_SK_SKB_VERDICT) before TLS RX is configured,
the missing guard causes tcp_read_skb() to drain sk_receive_queue without
advancing copied_seq, leaving a dangling frag_list pointer that
tls_decrypt_sg() walks — a use-after-free.

Patch 1 mirrors the fix from e91de6afa81c: add the tls_sw_has_ctx_rx()
check to sk_psock_verdict_data_ready() so that when a TLS RX context is
present the function defers to psock->saved_data_ready (sock_def_readable)
instead of calling tcp_read_skb().

Patch 2 adds a selftest that drives the vulnerable sequence end-to-end
and verifies recv() returns the correct decrypted data.

Xingwang Xiang (2):
  bpf, skmsg: fix verdict sk_data_ready racing with ktls rx
  selftests/bpf: add regression test for ktls+sockmap verdict UAF

 net/core/skmsg.c                              |   9 +-
 .../selftests/bpf/prog_tests/sockmap_ktls.c   | 103 ++++++++++++++++++
 .../selftests/bpf/progs/test_sockmap_ktls.c   |  21 ++++
 3 files changed, 131 insertions(+), 2 deletions(-)


* [PATCH net v5 1/2] bpf, skmsg: fix verdict sk_data_ready racing with ktls rx
  2026-05-17 14:56 [PATCH net v5 0/2] bpf, skmsg: fix verdict sk_data_ready racing with ktls rx Xingwang Xiang
@ 2026-05-17 14:56 ` Xingwang Xiang
  2026-05-18 14:57   ` sashiko-bot
  2026-05-17 14:56 ` [PATCH net v5 2/2] selftests/bpf: add regression test for ktls+sockmap verdict UAF Xingwang Xiang
  1 sibling, 1 reply; 5+ messages in thread
From: Xingwang Xiang @ 2026-05-17 14:56 UTC
  To: john.fastabend, kuba, mrpre
  Cc: jakub, sd, davem, pabeni, horms, netdev, daniel, bpf,
	Xingwang Xiang

sk_psock_strp_data_ready() already checks tls_sw_has_ctx_rx() and
defers to psock->saved_data_ready when a TLS RX context is present,
avoiding a conflict with the TLS strparser's ownership of the receive
queue (commit e91de6afa81c, "bpf: Fix running sk_skb program types
with ktls").

sk_psock_verdict_data_ready() has no equivalent guard.  When a socket
is inserted into a sockmap (BPF_SK_SKB_VERDICT) before TLS RX is
configured, tls_sw_strparser_arm() saves sk_psock_verdict_data_ready
as rx_ctx->saved_data_ready.  On data arrival:

  tls_data_ready -> tls_strp_data_ready -> tls_rx_msg_ready
    -> saved_data_ready() = sk_psock_verdict_data_ready()
      -> tcp_read_skb() drains sk_receive_queue via __skb_unlink()
         without calling tcp_eat_skb(), so copied_seq is not advanced.

tls_strp_msg_load() then finds tcp_inq() >= full_len (stale), calls
tcp_recv_skb() on the now-empty queue, hits WARN_ON_ONCE(!first), and
returns with rx_ctx->strp.anchor.frag_list pointing at a psock-owned
(potentially freed) skb.  tls_decrypt_sg() subsequently walks that
frag_list: use-after-free.

Apply the same fix as sk_psock_strp_data_ready(): if a TLS RX context
is present, call psock->saved_data_ready (sock_def_readable) to wake
recv() waiters and return immediately, leaving the receive queue
untouched.  TLS retains sole ownership of the queue and decrypts the
record normally through tls_sw_recvmsg().

Signed-off-by: Xingwang Xiang <v3rdant.xiang@gmail.com>
---
 net/core/skmsg.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 6187a83bd..e1850caf1 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1268,12 +1268,19 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
 static void sk_psock_verdict_data_ready(struct sock *sk)
 {
 	const struct proto_ops *ops = NULL;
+	struct sk_psock *psock;
 	struct socket *sock;
 	int copied;
 
 	trace_sk_data_ready(sk);
 
 	rcu_read_lock();
+	psock = sk_psock(sk);
+	if (psock && tls_sw_has_ctx_rx(sk)) {
+		psock->saved_data_ready(sk);
+		rcu_read_unlock();
+		return;
+	}
 	sock = READ_ONCE(sk->sk_socket);
 	if (likely(sock))
 		ops = READ_ONCE(sock->ops);
@@ -1283,8 +1290,6 @@ static void sk_psock_verdict_data_ready(struct sock *sk)
 
 	copied = ops->read_skb(sk, sk_psock_verdict_recv);
 	if (copied >= 0) {
-		struct sk_psock *psock;
-
 		rcu_read_lock();
 		psock = sk_psock(sk);
 		if (psock)

* [PATCH net v5 2/2] selftests/bpf: add regression test for ktls+sockmap verdict UAF
  2026-05-17 14:56 [PATCH net v5 0/2] bpf, skmsg: fix verdict sk_data_ready racing with ktls rx Xingwang Xiang
  2026-05-17 14:56 ` [PATCH net v5 1/2] " Xingwang Xiang
@ 2026-05-17 14:56 ` Xingwang Xiang
  2026-05-18 14:57   ` sashiko-bot
  1 sibling, 1 reply; 5+ messages in thread
From: Xingwang Xiang @ 2026-05-17 14:56 UTC
  To: john.fastabend, kuba, mrpre
  Cc: jakub, sd, davem, pabeni, horms, netdev, daniel, bpf,
	Xingwang Xiang

Test the scenario where a socket is inserted into a sockmap with a
BPF_SK_SKB_VERDICT program before TLS RX is configured.  Previously
sk_psock_verdict_data_ready() would call tcp_read_skb() and drain the
receive queue without advancing copied_seq, causing tls_decrypt_sg()
to walk a dangling frag_list pointer (use-after-free).

The test drives the full vulnerable sequence and verifies that after
the fix recv() returns the correct decrypted data.

Signed-off-by: Xingwang Xiang <v3rdant.xiang@gmail.com>
---
 .../selftests/bpf/prog_tests/sockmap_ktls.c   | 103 ++++++++++++++++++
 .../selftests/bpf/progs/test_sockmap_ktls.c   |  21 ++++
 2 files changed, 124 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c
index b87e7f39e..6ed8e149e 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c
@@ -417,6 +417,107 @@ static void run_tests(int family, enum bpf_map_type map_type)
 	close(map);
 }
 
+/*
+ * Regression test for the KTLS + sockmap (verdict) reverse-order UAF.
+ *
+ * Vulnerable sequence:
+ *   1. Insert receiver socket into sockmap with BPF_SK_SKB_VERDICT program.
+ *      sk->sk_data_ready becomes sk_psock_verdict_data_ready.
+ *   2. Configure TLS RX: tls_sw_strparser_arm() saves
+ *      sk_psock_verdict_data_ready as rx_ctx->saved_data_ready.
+ *
+ * When data arrives, tls_rx_msg_ready() calls saved_data_ready() =
+ * sk_psock_verdict_data_ready(), which calls tcp_read_skb() and drains
+ * sk_receive_queue via __skb_unlink() without advancing copied_seq.
+ * tls_strp_msg_load() then finds the queue empty while tcp_inq() is still
+ * non-zero, hits WARN_ON_ONCE(!first), and leaves a dangling frag_list
+ * pointer that tls_decrypt_sg() walks — a use-after-free.
+ *
+ * The fix adds a tls_sw_has_ctx_rx() check to sk_psock_verdict_data_ready(),
+ * mirroring what sk_psock_strp_data_ready() already does: when a TLS RX
+ * context is present, defer to psock->saved_data_ready (sock_def_readable)
+ * instead of calling tcp_read_skb(), so TLS retains sole ownership of the
+ * receive queue.  Data is then decrypted and returned correctly by
+ * tls_sw_recvmsg().
+ */
+static void test_sockmap_ktls_verdict_with_tls_rx(int family, int sotype)
+{
+	struct tls12_crypto_info_aes_gcm_128 crypto_info = {};
+	char send_buf[] = "hello ktls sockmap reverse order";
+	char recv_buf[sizeof(send_buf)] = {};
+	struct test_sockmap_ktls *skel;
+	int c = -1, p = -1, zero = 0;
+	int prog_fd, map_fd;
+	ssize_t n;
+	int err;
+
+	skel = test_sockmap_ktls__open_and_load();
+	if (!ASSERT_TRUE(skel, "open_and_load"))
+		return;
+
+	err = create_pair(family, sotype, &c, &p);
+	if (!ASSERT_OK(err, "create_pair"))
+		goto out;
+
+	prog_fd = bpf_program__fd(skel->progs.prog_skb_verdict_pass);
+	map_fd = bpf_map__fd(skel->maps.sock_map_verdict);
+
+	err = bpf_prog_attach(prog_fd, map_fd, BPF_SK_SKB_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach sk_skb verdict"))
+		goto out;
+
+	/* Step 1: configure TLS TX on sender (no sockmap involvement) */
+	err = setsockopt(c, IPPROTO_TCP, TCP_ULP, "tls", strlen("tls"));
+	if (!ASSERT_OK(err, "setsockopt(TCP_ULP) client"))
+		goto out;
+
+	crypto_info.info.version = TLS_1_2_VERSION;
+	crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
+	memset(crypto_info.key, 0x01, sizeof(crypto_info.key));
+	memset(crypto_info.salt, 0x02, sizeof(crypto_info.salt));
+
+	err = setsockopt(c, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));
+	if (!ASSERT_OK(err, "setsockopt(TLS_TX)"))
+		goto out;
+
+	/* Step 2: insert receiver into sockmap BEFORE TLS RX */
+	err = bpf_map_update_elem(map_fd, &zero, &p, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem"))
+		goto out;
+
+	/* Step 3: configure TLS RX AFTER sockmap insertion */
+	err = setsockopt(p, IPPROTO_TCP, TCP_ULP, "tls", strlen("tls"));
+	if (!ASSERT_OK(err, "setsockopt(TCP_ULP) server"))
+		goto out;
+
+	err = setsockopt(p, SOL_TLS, TLS_RX, &crypto_info, sizeof(crypto_info));
+	if (!ASSERT_OK(err, "setsockopt(TLS_RX)"))
+		goto out;
+
+	/*
+	 * A buggy kernel hits WARN_ON_ONCE in tls_strp_load_anchor_with_queue
+	 * and may UAF in tls_decrypt_sg here.  With the fix,
+	 * sk_psock_verdict_data_ready defers to sock_def_readable and TLS
+	 * decrypts the record normally.
+	 */
+	n = send(c, send_buf, sizeof(send_buf), 0);
+	if (!ASSERT_EQ(n, (ssize_t)sizeof(send_buf), "send"))
+		goto out;
+
+	n = recv_timeout(p, recv_buf, sizeof(recv_buf), 0, 5);
+	if (!ASSERT_EQ(n, (ssize_t)sizeof(send_buf), "recv"))
+		goto out;
+
+	ASSERT_OK(memcmp(send_buf, recv_buf, sizeof(send_buf)), "data integrity");
+
+out:
+	if (c != -1)
+		close(c);
+	if (p != -1)
+		close(p);
+	test_sockmap_ktls__destroy(skel);
+}
+
 static void run_ktls_test(int family, int sotype)
 {
 	if (test__start_subtest("tls simple offload"))
@@ -429,6 +530,8 @@ static void run_ktls_test(int family, int sotype)
 		test_sockmap_ktls_tx_no_buf(family, sotype, true);
 	if (test__start_subtest("tls tx with pop"))
 		test_sockmap_ktls_tx_pop(family, sotype);
+	if (test__start_subtest("tls verdict with tls rx"))
+		test_sockmap_ktls_verdict_with_tls_rx(family, sotype);
 }
 
 void test_sockmap_ktls(void)
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_ktls.c b/tools/testing/selftests/bpf/progs/test_sockmap_ktls.c
index 83df4919c..facafeaf4 100644
--- a/tools/testing/selftests/bpf/progs/test_sockmap_ktls.c
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_ktls.c
@@ -17,6 +17,13 @@ struct {
 	__type(value, int);
 } sock_map SEC(".maps");
 
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, 2);
+	__type(key, int);
+	__type(value, int);
+} sock_map_verdict SEC(".maps");
+
 SEC("sk_msg")
 int prog_sk_policy(struct sk_msg_md *msg)
 {
@@ -38,3 +45,17 @@ int prog_sk_policy_redir(struct sk_msg_md *msg)
 	bpf_msg_apply_bytes(msg, apply_bytes);
 	return bpf_msg_redirect_map(msg, &sock_map, two, 0);
 }
+
+/*
+ * Verdict program for the reverse-order TLS/sockmap regression test.
+ * Returns SK_PASS so tcp_read_skb() drains the receive queue via
+ * sk_psock_verdict_recv() without calling tcp_eat_skb(), which is
+ * the precondition for the KTLS strparser frag_list UAF.
+ */
+SEC("sk_skb/verdict")
+int prog_skb_verdict_pass(struct __sk_buff *skb)
+{
+	return SK_PASS;
+}
+
+char _license[] SEC("license") = "GPL";

* Re: [PATCH net v5 1/2] bpf, skmsg: fix verdict sk_data_ready racing with ktls rx
  2026-05-17 14:56 ` [PATCH net v5 1/2] " Xingwang Xiang
@ 2026-05-18 14:57   ` sashiko-bot
  0 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-05-18 14:57 UTC
  To: Xingwang Xiang; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 2 potential issues to consider:
- [Critical] Concurrent sockmap deletion causes a deterministic NULL pointer dereference and kernel panic.
- [High] Type confusion leading to out-of-bounds reads and BPF verdict bypass due to blind ULP context cast.
--

commit 99c04859dba221cd63c5238a47d1833983548f83
Author: Xingwang Xiang <v3rdant.xiang@gmail.com>

bpf, skmsg: fix verdict sk_data_ready racing with ktls rx

sk_psock_strp_data_ready() already checks tls_sw_has_ctx_rx() and
defers to psock->saved_data_ready when a TLS RX context is present.
This patch applies the same logic to sk_psock_verdict_data_ready() to
prevent a use-after-free caused by tcp_read_skb draining the queue
without advancing copied_seq.

> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 6187a83bd7411..e1850caf1a71a 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1268,12 +1268,19 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
>  static void sk_psock_verdict_data_ready(struct sock *sk)
>  {
>  	const struct proto_ops *ops = NULL;
> +	struct sk_psock *psock;
>  	struct socket *sock;
>  	int copied;
>  
>  	trace_sk_data_ready(sk);
>  
>  	rcu_read_lock();
> +	psock = sk_psock(sk);
> +	if (psock && tls_sw_has_ctx_rx(sk)) {

Is it possible for tls_sw_has_ctx_rx() to cause a type confusion regression here?

tls_sw_has_ctx_rx() calls tls_get_ctx(), which casts icsk->icsk_ulp_data
to a struct tls_context pointer. If a socket has a different ULP configured,
like espintcp, and is attached to a sockmap, icsk_ulp_data will point to a
different structure like espintcp_ctx.

Reading from this mismatched structure could yield garbage data. If that
data is non-zero, the code might falsely assume a KTLS RX context is active
and incorrectly skip the BPF verdict program.

While this problem already exists in sk_psock_strp_data_ready(), does this
change spread the same issue to the verdict path?

> +		psock->saved_data_ready(sk);

Can this lead to a NULL pointer dereference regression during concurrent
sockmap deletion?

sk_psock_verdict_data_ready() executes here without holding
sk->sk_callback_lock. If userspace concurrently removes the socket from the
sockmap, sk_psock_stop_verdict() will be called, which explicitly sets
psock->saved_data_ready to NULL.

Since this code doesn't hold read_lock_bh(&sk->sk_callback_lock) or check if
saved_data_ready is NULL before calling it, could it execute the newly nullified
pointer and cause a crash?

I notice this vulnerability already exists in sk_psock_strp_data_ready(), but
does this duplicate it into the verdict handler?

> +		rcu_read_unlock();
> +		return;
> +	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260517145630.20521-1-v3rdant.xiang@gmail.com?part=1

* Re: [PATCH net v5 2/2] selftests/bpf: add regression test for ktls+sockmap verdict UAF
  2026-05-17 14:56 ` [PATCH net v5 2/2] selftests/bpf: add regression test for ktls+sockmap verdict UAF Xingwang Xiang
@ 2026-05-18 14:57   ` sashiko-bot
  0 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-05-18 14:57 UTC
  To: Xingwang Xiang; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
- [Low] Usage of ASSERT_TRUE for skeleton pointer checks instead of ASSERT_OK_PTR.
--

commit 2da876f992ba88c42d69b2910d90f780e2c1b89f
Author: Xingwang Xiang <v3rdant.xiang@gmail.com>

selftests/bpf: add regression test for ktls+sockmap verdict UAF

This commit adds a regression test for a use-after-free vulnerability
involving KTLS and sockmap verdict programs. The test creates a scenario
where TLS RX is configured after inserting the receiver socket into a
sockmap.

> diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c
> index b87e7f39e15a8..6ed8e149e3d58 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c

[ ... ]

> @@ -417,6 +417,107 @@ static void run_tests(int family, enum bpf_map_type map_type)
>  	close(map);
>  }
>  
> +/*
> + * Regression test for the KTLS + sockmap (verdict) reverse-order UAF.
> + *
> + * Vulnerable sequence:

[ ... ]

> +static void test_sockmap_ktls_verdict_with_tls_rx(int family, int sotype)
> +{
> +	struct tls12_crypto_info_aes_gcm_128 crypto_info = {};
> +	char send_buf[] = "hello ktls sockmap reverse order";
> +	char recv_buf[sizeof(send_buf)] = {};
> +	struct test_sockmap_ktls *skel;
> +	int c = -1, p = -1, zero = 0;
> +	int prog_fd, map_fd;
> +	ssize_t n;
> +	int err;
> +
> +	skel = test_sockmap_ktls__open_and_load();
> +	if (!ASSERT_TRUE(skel, "open_and_load"))

This isn't a bug, but should this use ASSERT_OK_PTR() instead?

ASSERT_OK_PTR() handles IS_ERR_OR_NULL checks and provides better
debugging output for pointers on failure.

> +		return;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260517145630.20521-1-v3rdant.xiang@gmail.com?part=2
