BPF List
 help / color / mirror / Atom feed
* [PATCH net 0/7] net: tls: fix a few random bugs
@ 2026-04-29 22:29 Jakub Kicinski
  2026-04-29 22:29 ` [PATCH net 1/7] net: tls: fix silent data drop under pipe back-pressure Jakub Kicinski
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Jakub Kicinski @ 2026-04-29 22:29 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, bpf,
	john.fastabend, sd, linux-kselftest, Jakub Kicinski

Fix a few random bugs, from external reports and my local scan
with various AI tools. Mostly corner cases in code which I don't
think TLS maintainers would consider "battle tested".

Jakub Kicinski (7):
  net: tls: fix silent data drop under pipe back-pressure
  selftests: tls: add test for data loss on small pipe
  net: tls: fix page pin leak on sendpage_ok() failure
  net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg
    ring
  selftests: bpf: cover wrapped sk_msg ring chaining in ktls TX path
  net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf
    verdict
  selftests: bpf: cover tls_sw_sendmsg UAF after bpf_exec_tx_verdict
    split

 net/tls/tls_device.c                          |   2 +
 net/tls/tls_sw.c                              |  17 ++-
 .../selftests/bpf/prog_tests/sockmap_ktls.c   | 139 ++++++++++++++++++
 .../selftests/bpf/progs/test_sockmap_ktls.c   |  10 ++
 tools/testing/selftests/net/tls.c             |  43 ++++++
 5 files changed, 207 insertions(+), 4 deletions(-)

-- 
2.54.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net 1/7] net: tls: fix silent data drop under pipe back-pressure
  2026-04-29 22:29 [PATCH net 0/7] net: tls: fix a few random bugs Jakub Kicinski
@ 2026-04-29 22:29 ` Jakub Kicinski
  2026-04-29 22:29 ` [PATCH net 2/7] selftests: tls: add test for data loss on small pipe Jakub Kicinski
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2026-04-29 22:29 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, bpf,
	john.fastabend, sd, linux-kselftest, Jakub Kicinski

tls_sw_splice_read() uses len when advancing rxm->offset / rxm->full_len
after skb_splice_bits(), rather than copied (the actual number of bytes
successfully spliced into the pipe). When the destination pipe cannot
accept all the requested bytes, splice_to_pipe() returns fewer bytes
than len, and 'len - copied' of data is effectively skipped over.

Fixes: e062fe99cccd ("tls: splice_read: fix accessing pre-processed records")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: john.fastabend@gmail.com
CC: sd@queasysnail.net
---
 net/tls/tls_sw.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 798243eabb1f..2590e855f6a5 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2317,9 +2317,9 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
 	if (copied < 0)
 		goto splice_requeue;
 
-	if (chunk < rxm->full_len) {
-		rxm->offset += len;
-		rxm->full_len -= len;
+	if (copied < rxm->full_len) {
+		rxm->offset += copied;
+		rxm->full_len -= copied;
 		goto splice_requeue;
 	}
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net 2/7] selftests: tls: add test for data loss on small pipe
  2026-04-29 22:29 [PATCH net 0/7] net: tls: fix a few random bugs Jakub Kicinski
  2026-04-29 22:29 ` [PATCH net 1/7] net: tls: fix silent data drop under pipe back-pressure Jakub Kicinski
@ 2026-04-29 22:29 ` Jakub Kicinski
  2026-04-30 22:31   ` sashiko-bot
  2026-04-29 22:29 ` [PATCH net 3/7] net: tls: fix page pin leak on sendpage_ok() failure Jakub Kicinski
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2026-04-29 22:29 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, bpf,
	john.fastabend, sd, linux-kselftest, Jakub Kicinski, shuah

Add selftest for data loss on short splice.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: john.fastabend@gmail.com
CC: sd@queasysnail.net
CC: shuah@kernel.org
CC: linux-kselftest@vger.kernel.org
---
 tools/testing/selftests/net/tls.c | 43 +++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 9e2ccea13d70..30a236b8e9f7 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -946,6 +946,49 @@ TEST_F(tls, peek_and_splice)
 	EXPECT_EQ(memcmp(mem_send, mem_recv, send_len), 0);
 }
 
+TEST_F(tls, splice_to_pipe_small)
+{
+	int send_len = TLS_PAYLOAD_MAX_LEN;
+	char mem_send[TLS_PAYLOAD_MAX_LEN];
+	char mem_recv[TLS_PAYLOAD_MAX_LEN];
+	size_t total = 0;
+	int p[2];
+
+	memrnd(mem_send, sizeof(mem_send));
+
+	ASSERT_GE(pipe(p), 0);
+
+	/* Shrink pipe to 1 page (typically 4096 bytes) to force multiple
+	 * splice iterations for a 16384-byte TLS record.
+	 */
+	EXPECT_GE(fcntl(p[1], F_SETPIPE_SZ, 4096), 4096);
+
+	EXPECT_EQ(send(self->fd, mem_send, send_len, 0), send_len);
+
+	while (total < (size_t)send_len) {
+		ssize_t spliced, drained;
+
+		spliced = splice(self->cfd, NULL, p[1], NULL,
+				 send_len - total, 0);
+		EXPECT_GT(spliced, 0);
+		if (spliced <= 0)
+			break;
+
+		drained = read(p[0], mem_recv + total, spliced);
+		EXPECT_EQ(drained, spliced);
+		if (drained <= 0)
+			break;
+
+		total += drained;
+	}
+
+	EXPECT_EQ(total, (size_t)send_len);
+	EXPECT_EQ(memcmp(mem_send, mem_recv, send_len), 0);
+
+	close(p[0]);
+	close(p[1]);
+}
+
 #define MAX_FRAGS 48
 TEST_F(tls, splice_short)
 {
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net 3/7] net: tls: fix page pin leak on sendpage_ok() failure
  2026-04-29 22:29 [PATCH net 0/7] net: tls: fix a few random bugs Jakub Kicinski
  2026-04-29 22:29 ` [PATCH net 1/7] net: tls: fix silent data drop under pipe back-pressure Jakub Kicinski
  2026-04-29 22:29 ` [PATCH net 2/7] selftests: tls: add test for data loss on small pipe Jakub Kicinski
@ 2026-04-29 22:29 ` Jakub Kicinski
  2026-04-30 22:31   ` sashiko-bot
  2026-04-29 22:29 ` [PATCH net 4/7] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring Jakub Kicinski
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2026-04-29 22:29 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, bpf,
	john.fastabend, sd, linux-kselftest, Jakub Kicinski, dhowells

When iov_iter_extract_pages() is called on a user-backed iterator, it
pins the extracted pages via pin_user_pages_fast(). If the subsequent
sendpage_ok() check fails, both tls_sw_sendmsg_splice() and
tls_push_data() (device offload path) call iov_iter_revert() to unwind
the iterator position, but iov_iter_revert() only adjusts the cursor -
it does not release the page pin taken by the extract.

Fixes: 45e5be844ab6 ("tls/sw: Convert tls_sw_sendpage() to use MSG_SPLICE_PAGES")
Fixes: 24763c9c0980 ("tls/device: Support MSG_SPLICE_PAGES")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: john.fastabend@gmail.com
CC: sd@queasysnail.net
CC: dhowells@redhat.com
---
 net/tls/tls_device.c | 2 ++
 net/tls/tls_sw.c     | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 741aef09bfd3..6aae8d04f976 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -508,6 +508,8 @@ static int tls_push_data(struct sock *sk,
 			copy = rc;
 
 			if (WARN_ON_ONCE(!sendpage_ok(zc_pfrag.page))) {
+				if (iov_iter_extract_will_pin(iter))
+					unpin_user_page(zc_pfrag.page);
 				iov_iter_revert(iter, copy);
 				rc = -EIO;
 				goto handle_error;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 2590e855f6a5..906a1998c630 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1018,6 +1018,8 @@ static int tls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg,
 			return part ?: -EIO;
 
 		if (WARN_ON_ONCE(!sendpage_ok(page))) {
+			if (iov_iter_extract_will_pin(&msg->msg_iter))
+				unpin_user_page(page);
 			iov_iter_revert(&msg->msg_iter, part);
 			return -EIO;
 		}
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net 4/7] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring
  2026-04-29 22:29 [PATCH net 0/7] net: tls: fix a few random bugs Jakub Kicinski
                   ` (2 preceding siblings ...)
  2026-04-29 22:29 ` [PATCH net 3/7] net: tls: fix page pin leak on sendpage_ok() failure Jakub Kicinski
@ 2026-04-29 22:29 ` Jakub Kicinski
  2026-04-30 22:31   ` sashiko-bot
  2026-05-01 16:40   ` Sabrina Dubroca
  2026-04-29 22:29 ` [PATCH net 5/7] selftests: bpf: cover wrapped sk_msg ring chaining in ktls TX path Jakub Kicinski
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 19+ messages in thread
From: Jakub Kicinski @ 2026-04-29 22:29 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, bpf,
	john.fastabend, sd, linux-kselftest, Jakub Kicinski,
	钱一铭, daniel, jonathan.lemon

When an sk_msg scatterlist ring wraps (sg.end < sg.start),
tls_push_record() chains the tail portion of the ring to the head
using sg_chain(). The entry count passed to sg_chain() determines
where the chain pointer is written: at prv[prv_nents - 1].

The current code uses MAX_SKB_FRAGS (17) as the ring size:

    sg_chain(&msg_pl->sg.data[msg_pl->sg.start],
             MAX_SKB_FRAGS - msg_pl->sg.start + 1,
             msg_pl->sg.data);

This places the chain pointer at data[start + (MAX_SKB_FRAGS - start
+ 1) - 1] = data[MAX_SKB_FRAGS] = data[17]. However, since commit
031097d9e079 ("bpf: sk_msg, zap ingress queue on psock down") expanded
the ring from MAX_MSG_FRAGS to NR_MSG_FRAG_IDS (18) positions,
data[17] is a valid ring slot that can hold live scatterlist entries.
The chain pointer must land at data[NR_MSG_FRAG_IDS] (index 18), the
reserved chaining slot.

Every other wrapped-ring arithmetic operation in the sk_msg subsystem
(sk_msg_iter_dist, sk_msg_iter_var_next, sk_msg_iter_var_prev,
bpf_msg_pull_data) correctly uses NR_MSG_FRAG_IDS as the ring modulus.
This sg_chain call is the sole remaining use of MAX_SKB_FRAGS for
ring-modulus arithmetic and was introduced after the ring expansion.

Reported-by: 钱一铭 <yimingqian591@gmail.com>
Fixes: 9aaaa56845a0 ("bpf: Sockmap/tls, skmsg can have wrapped skmsg that needs extra chaining")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: john.fastabend@gmail.com
CC: sd@queasysnail.net
CC: daniel@iogearbox.net
CC: jonathan.lemon@gmail.com
CC: bpf@vger.kernel.org
---
 net/tls/tls_sw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 906a1998c630..600e13effaab 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -802,7 +802,7 @@ static int tls_push_record(struct sock *sk, int flags,
 
 	if (msg_pl->sg.end < msg_pl->sg.start) {
 		sg_chain(&msg_pl->sg.data[msg_pl->sg.start],
-			 MAX_SKB_FRAGS - msg_pl->sg.start + 1,
+			 NR_MSG_FRAG_IDS - msg_pl->sg.start + 1,
 			 msg_pl->sg.data);
 	}
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net 5/7] selftests: bpf: cover wrapped sk_msg ring chaining in ktls TX path
  2026-04-29 22:29 [PATCH net 0/7] net: tls: fix a few random bugs Jakub Kicinski
                   ` (3 preceding siblings ...)
  2026-04-29 22:29 ` [PATCH net 4/7] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring Jakub Kicinski
@ 2026-04-29 22:29 ` Jakub Kicinski
  2026-04-30 17:58   ` Jiayuan Chen
  2026-04-30 22:31   ` sashiko-bot
  2026-04-29 22:29 ` [PATCH net 6/7] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict Jakub Kicinski
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 19+ messages in thread
From: Jakub Kicinski @ 2026-04-29 22:29 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, bpf,
	john.fastabend, sd, linux-kselftest, Jakub Kicinski, ast, daniel,
	andrii, martin.lau, eddyz87, memxor, song, yonghong.song, jolsa,
	shuah, jiayuan.chen, isolodrai

Add a regression test for the off-by-one in tls_push_record() where
the sg_chain() entry count was MAX_SKB_FRAGS instead of NR_MSG_FRAG_IDS,
causing the chain pointer to overwrite a live ring slot when an sk_msg
scatterlist ring wrapped (sg.end < sg.start).

The new "tls tx wrapped sg chain" subtest:
1. attaches an SK_MSG program (prog_sk_policy_drop) that drops the first
   N bytes of a message via bpf_msg_apply_bytes() + SK_DROP,
2. splices 17 single-byte frags through a kTLS TX socket so the ring
   fills to sg.start=16, sg.end=17,
3. removes the socket from the sockmap and sends one more byte, which
   wraps sg.end to 0 and exercises the wrap branch in tls_push_record().

Without the fix the kernel hangs on the wrapping send (the corrupted
chain pointer leaves the sg traversal stuck); with the fix the test
completes cleanly.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: ast@kernel.org
CC: daniel@iogearbox.net
CC: andrii@kernel.org
CC: martin.lau@linux.dev
CC: eddyz87@gmail.com
CC: memxor@gmail.com
CC: song@kernel.org
CC: yonghong.song@linux.dev
CC: jolsa@kernel.org
CC: shuah@kernel.org
CC: john.fastabend@gmail.com
CC: jiayuan.chen@linux.dev
CC: isolodrai@meta.com
CC: bpf@vger.kernel.org
CC: linux-kselftest@vger.kernel.org
---
 .../selftests/bpf/prog_tests/sockmap_ktls.c   | 83 +++++++++++++++++++
 .../selftests/bpf/progs/test_sockmap_ktls.c   |  8 ++
 2 files changed, 91 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c
index b87e7f39e15a..8ab7f4cdc614 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c
@@ -4,6 +4,7 @@
  * Tests for sockmap/sockhash holding kTLS sockets.
  */
 #include <error.h>
+#include <fcntl.h>
 #include <netinet/tcp.h>
 #include <linux/tls.h>
 #include "test_progs.h"
@@ -403,6 +404,86 @@ static void test_sockmap_ktls_tx_pop(int family, int sotype)
 	test_sockmap_ktls__destroy(skel);
 }
 
+static void test_sockmap_ktls_tx_wrapped_chain(int family, int sotype)
+{
+	int c = -1, p = -1, one = 1, prog_fd, map_fd;
+	int pipefd[2] = { -1, -1 };
+	struct test_sockmap_ktls *skel;
+	char byte;
+	ssize_t n;
+	int err, i;
+
+	skel = test_sockmap_ktls__open_and_load();
+	if (!ASSERT_TRUE(skel, "open ktls skel"))
+		return;
+
+	err = create_pair(family, sotype, &c, &p);
+	if (!ASSERT_OK(err, "create_pair()"))
+		goto out;
+
+	prog_fd = bpf_program__fd(skel->progs.prog_sk_policy_drop);
+	map_fd = bpf_map__fd(skel->maps.sock_map);
+
+	err = bpf_prog_attach(prog_fd, map_fd, BPF_SK_MSG_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach sk msg"))
+		goto out;
+
+	err = bpf_map_update_elem(map_fd, &one, &c, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(c)"))
+		goto out;
+
+	err = init_ktls_pairs(c, p);
+	if (!ASSERT_OK(err, "init_ktls_pairs(c, p)"))
+		goto out;
+
+	/* packetized pipe so each splice frag becomes its own sg entry */
+	err = pipe2(pipefd, O_DIRECT);
+	if (!ASSERT_OK(err, "pipe2"))
+		goto out;
+	err = fcntl(pipefd[0], F_SETPIPE_SZ, 17 * 4096);
+	if (!ASSERT_GE(err, 17 * 4096, "F_SETPIPE_SZ"))
+		goto out;
+
+	for (i = 0; i < 17; i++) {
+		byte = 'A' + i;
+		if (!ASSERT_EQ(write(pipefd[1], &byte, 1), 1, "write to pipe"))
+			goto out;
+	}
+
+	/* drop the first 16 bytes so sg.start advances to 16 */
+	skel->bss->apply_bytes = 16;
+
+	n = splice(pipefd[0], NULL, c, NULL, 17, 0);
+	if (n < 0)
+		ASSERT_EQ(errno, EACCES, "splice errno");
+
+	err = bpf_map_delete_elem(map_fd, &one);
+	if (!ASSERT_OK(err, "bpf_map_delete_elem"))
+		goto out;
+	usleep(50000);
+
+	while (recv(p, &byte, 1, MSG_DONTWAIT) > 0)
+		;
+
+	/* this send wraps sg.end to 0 and trips the wrap branch */
+	byte = 'X';
+	n = send(c, &byte, 1, MSG_DONTWAIT);
+	if (n < 0)
+		ASSERT_TRUE(errno == EAGAIN || errno == EACCES || errno == EPIPE,
+			    "send errno");
+
+out:
+	if (pipefd[0] != -1)
+		close(pipefd[0]);
+	if (pipefd[1] != -1)
+		close(pipefd[1]);
+	if (c != -1)
+		close(c);
+	if (p != -1)
+		close(p);
+	test_sockmap_ktls__destroy(skel);
+}
+
 static void run_tests(int family, enum bpf_map_type map_type)
 {
 	int map;
@@ -429,6 +510,8 @@ static void run_ktls_test(int family, int sotype)
 		test_sockmap_ktls_tx_no_buf(family, sotype, true);
 	if (test__start_subtest("tls tx with pop"))
 		test_sockmap_ktls_tx_pop(family, sotype);
+	if (test__start_subtest("tls tx wrapped sg chain"))
+		test_sockmap_ktls_tx_wrapped_chain(family, sotype);
 }
 
 void test_sockmap_ktls(void)
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_ktls.c b/tools/testing/selftests/bpf/progs/test_sockmap_ktls.c
index 83df4919c224..18de4d7cd816 100644
--- a/tools/testing/selftests/bpf/progs/test_sockmap_ktls.c
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_ktls.c
@@ -38,3 +38,11 @@ int prog_sk_policy_redir(struct sk_msg_md *msg)
 	bpf_msg_apply_bytes(msg, apply_bytes);
 	return bpf_msg_redirect_map(msg, &sock_map, two, 0);
 }
+
+SEC("sk_msg")
+int prog_sk_policy_drop(struct sk_msg_md *msg)
+{
+	if (apply_bytes > 0)
+		bpf_msg_apply_bytes(msg, apply_bytes);
+	return SK_DROP;
+}
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net 6/7] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict
  2026-04-29 22:29 [PATCH net 0/7] net: tls: fix a few random bugs Jakub Kicinski
                   ` (4 preceding siblings ...)
  2026-04-29 22:29 ` [PATCH net 5/7] selftests: bpf: cover wrapped sk_msg ring chaining in ktls TX path Jakub Kicinski
@ 2026-04-29 22:29 ` Jakub Kicinski
  2026-04-30 17:50   ` Jiayuan Chen
  2026-04-30 22:31   ` sashiko-bot
  2026-04-29 22:29 ` [PATCH net 7/7] selftests: bpf: cover tls_sw_sendmsg UAF after bpf_exec_tx_verdict split Jakub Kicinski
  2026-05-03  2:20 ` [PATCH net 0/7] net: tls: fix a few random bugs patchwork-bot+netdevbpf
  7 siblings, 2 replies; 19+ messages in thread
From: Jakub Kicinski @ 2026-04-29 22:29 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, bpf,
	john.fastabend, sd, linux-kselftest, Jakub Kicinski, Alessandro G,
	jiayuan.chen, ast

After bpf_exec_tx_verdict() returns in the zerocopy path, the local
msg_pl/msg_en pointers may be stale. If a BPF program set apply_bytes
such that tls_push_record() splits the open record via
tls_split_open_record(), ctx->open_rec is replaced with the split
remainder while the original record is pushed to the tx_list and may
be freed by tls_tx_records(). The caller's cached msg_pl/msg_en still
reference the old (now-freed) record.

This is triggered when bpf_exec_tx_verdict() returns -ENOSPC (BPF set
cork_bytes > remaining data) after an internal record split: the code
dereferences msg_pl->cork_bytes on the freed record, causing a UAF.

Reported-by: Alessandro G <ale.grpp@gmail.com>
Fixes: 54a3ecaeeeae ("bpf: fix ktls panic with sockmap")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: john.fastabend@gmail.com
CC: sd@queasysnail.net
CC: jiayuan.chen@linux.dev
CC: ast@kernel.org
CC: bpf@vger.kernel.org
---
 net/tls/tls_sw.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 600e13effaab..d086b43fc675 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1157,6 +1157,13 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
 				else if (ret == -ENOMEM)
 					goto wait_for_memory;
 				else if (ctx->open_rec && ret == -ENOSPC) {
+					/* bpf_exec_tx_verdict() may have
+					 * called tls_split_open_record(),
+					 * freeing the old record. Re-fetch.
+					 */
+					rec = ctx->open_rec;
+					msg_pl = &rec->msg_plaintext;
+					msg_en = &rec->msg_encrypted;
 					if (msg_pl->cork_bytes) {
 						ret = 0;
 						goto send_end;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net 7/7] selftests: bpf: cover tls_sw_sendmsg UAF after bpf_exec_tx_verdict split
  2026-04-29 22:29 [PATCH net 0/7] net: tls: fix a few random bugs Jakub Kicinski
                   ` (5 preceding siblings ...)
  2026-04-29 22:29 ` [PATCH net 6/7] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict Jakub Kicinski
@ 2026-04-29 22:29 ` Jakub Kicinski
  2026-04-30 17:55   ` Jiayuan Chen
  2026-05-03  2:20 ` [PATCH net 0/7] net: tls: fix a few random bugs patchwork-bot+netdevbpf
  7 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2026-04-29 22:29 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, bpf,
	john.fastabend, sd, linux-kselftest, Jakub Kicinski, andrii,
	eddyz87, ast, daniel, martin.lau, memxor, song, yonghong.song,
	jolsa, shuah, jiayuan.chen, isolodrai

Add a regression test for the use-after-free in tls_sw_sendmsg_locked()
where the cached msg_pl pointer becomes stale after bpf_exec_tx_verdict()
returns -ENOSPC: tls_push_record() may have called
tls_split_open_record() which replaces ctx->open_rec and frees the old
record, but the caller still dereferences msg_pl->cork_bytes.

Reusing prog_sk_policy with apply_bytes=1000 + cork_bytes=800, a single
1500-byte send on a kTLS TX socket in a sockmap drives the split-and-free
path. Without the fix, KASAN reports slab-use-after-free in tls_sw_sendmsg
and the kernel hangs; with the fix the test completes cleanly.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: andrii@kernel.org
CC: eddyz87@gmail.com
CC: ast@kernel.org
CC: daniel@iogearbox.net
CC: martin.lau@linux.dev
CC: memxor@gmail.com
CC: song@kernel.org
CC: yonghong.song@linux.dev
CC: jolsa@kernel.org
CC: shuah@kernel.org
CC: john.fastabend@gmail.com
CC: jiayuan.chen@linux.dev
CC: isolodrai@meta.com
CC: bpf@vger.kernel.org
CC: linux-kselftest@vger.kernel.org
---
 .../selftests/bpf/prog_tests/sockmap_ktls.c   | 56 +++++++++++++++++++
 .../selftests/bpf/progs/test_sockmap_ktls.c   |  2 +
 2 files changed, 58 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c
index 8ab7f4cdc614..d0bc8d39893a 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c
@@ -404,6 +404,60 @@ static void test_sockmap_ktls_tx_pop(int family, int sotype)
 	test_sockmap_ktls__destroy(skel);
 }
 
+static void test_sockmap_ktls_tx_apply_cork_uaf(int family, int sotype)
+{
+	int c = -1, p = -1, one = 1, prog_fd, map_fd;
+	struct test_sockmap_ktls *skel;
+	char buf[1500];
+	ssize_t n;
+	int err;
+
+	skel = test_sockmap_ktls__open_and_load();
+	if (!ASSERT_TRUE(skel, "open ktls skel"))
+		return;
+
+	err = create_pair(family, sotype, &c, &p);
+	if (!ASSERT_OK(err, "create_pair()"))
+		goto out;
+
+	prog_fd = bpf_program__fd(skel->progs.prog_sk_policy);
+	map_fd = bpf_map__fd(skel->maps.sock_map);
+
+	err = bpf_prog_attach(prog_fd, map_fd, BPF_SK_MSG_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach sk msg"))
+		goto out;
+
+	err = bpf_map_update_elem(map_fd, &one, &c, BPF_NOEXIST);
+	if (!ASSERT_OK(err, "bpf_map_update_elem(c)"))
+		goto out;
+
+	/* apply_bytes < send drives tls_split_open_record(); cork_bytes >
+	 * remaining returns -ENOSPC after the split frees the old rec
+	 */
+	skel->bss->apply_bytes = 1000;
+	skel->bss->cork_byte = 800;
+
+	err = init_ktls_pairs(c, p);
+	if (!ASSERT_OK(err, "init_ktls_pairs(c, p)"))
+		goto out;
+
+	memset(buf, 'A', sizeof(buf));
+	n = send(c, buf, sizeof(buf), MSG_DONTWAIT);
+	if (n < 0)
+		ASSERT_TRUE(errno == ENOSPC || errno == EAGAIN, "send errno");
+
+	n = send(c, buf, sizeof(buf), MSG_DONTWAIT);
+	if (n < 0)
+		ASSERT_TRUE(errno == ENOSPC || errno == EAGAIN, "send errno");
+
+out:
+	if (c != -1)
+		close(c);
+	if (p != -1)
+		close(p);
+	test_sockmap_ktls__destroy(skel);
+}
+
 static void test_sockmap_ktls_tx_wrapped_chain(int family, int sotype)
 {
 	int c = -1, p = -1, one = 1, prog_fd, map_fd;
@@ -512,6 +566,8 @@ static void run_ktls_test(int family, int sotype)
 		test_sockmap_ktls_tx_pop(family, sotype);
 	if (test__start_subtest("tls tx wrapped sg chain"))
 		test_sockmap_ktls_tx_wrapped_chain(family, sotype);
+	if (test__start_subtest("tls tx apply cork uaf"))
+		test_sockmap_ktls_tx_apply_cork_uaf(family, sotype);
 }
 
 void test_sockmap_ktls(void)
diff --git a/tools/testing/selftests/bpf/progs/test_sockmap_ktls.c b/tools/testing/selftests/bpf/progs/test_sockmap_ktls.c
index 18de4d7cd816..b34f0c0f9f83 100644
--- a/tools/testing/selftests/bpf/progs/test_sockmap_ktls.c
+++ b/tools/testing/selftests/bpf/progs/test_sockmap_ktls.c
@@ -20,6 +20,8 @@ struct {
 SEC("sk_msg")
 int prog_sk_policy(struct sk_msg_md *msg)
 {
+	if (apply_bytes > 0)
+		bpf_msg_apply_bytes(msg, apply_bytes);
 	if (cork_byte > 0)
 		bpf_msg_cork_bytes(msg, cork_byte);
 	if (push_start > 0 && push_end > 0)
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net 6/7] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict
  2026-04-29 22:29 ` [PATCH net 6/7] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict Jakub Kicinski
@ 2026-04-30 17:50   ` Jiayuan Chen
  2026-04-30 22:31   ` sashiko-bot
  1 sibling, 0 replies; 19+ messages in thread
From: Jiayuan Chen @ 2026-04-30 17:50 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, bpf,
	john.fastabend, sd, linux-kselftest, Jakub Kicinski, Alessandro G,
	ast

April 29, 2026 at 3:29 PM, "Jakub Kicinski" <kuba@kernel.org mailto:kuba@kernel.org?to=%22Jakub%20Kicinski%22%20%3Ckuba%40kernel.org%3E > wrote:


> 
> After bpf_exec_tx_verdict() returns in the zerocopy path, the local
> msg_pl/msg_en pointers may be stale. If a BPF program set apply_bytes
> such that tls_push_record() splits the open record via
> tls_split_open_record(), ctx->open_rec is replaced with the split
> remainder while the original record is pushed to the tx_list and may
> be freed by tls_tx_records(). The caller's cached msg_pl/msg_en still
> reference the old (now-freed) record.
> 
> This is triggered when bpf_exec_tx_verdict() returns -ENOSPC (BPF set
> cork_bytes > remaining data) after an internal record split: the code
> dereferences msg_pl->cork_bytes on the freed record, causing a UAF.
> 
> Reported-by: Alessandro G <ale.grpp@gmail.com>
> Fixes: 54a3ecaeeeae ("bpf: fix ktls panic with sockmap")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: john.fastabend@gmail.com
> CC: sd@queasysnail.net
> CC: jiayuan.chen@linux.dev
> CC: ast@kernel.org
> CC: bpf@vger.kernel.org
> ---
>  net/tls/tls_sw.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 600e13effaab..d086b43fc675 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -1157,6 +1157,13 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
>  else if (ret == -ENOMEM)
>  goto wait_for_memory;
>  else if (ctx->open_rec && ret == -ENOSPC) {
> + /* bpf_exec_tx_verdict() may have
> + * called tls_split_open_record(),
> + * freeing the old record. Re-fetch.
> + */
> + rec = ctx->open_rec;
> + msg_pl = &rec->msg_plaintext;
> + msg_en = &rec->msg_encrypted;
>  if (msg_pl->cork_bytes) {
>  ret = 0;
>  goto send_end;
> -- 
> 2.54.0
> 

Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev mailto:jiayuan.chen@linux.dev >

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 7/7] selftests: bpf: cover tls_sw_sendmsg UAF after bpf_exec_tx_verdict split
  2026-04-29 22:29 ` [PATCH net 7/7] selftests: bpf: cover tls_sw_sendmsg UAF after bpf_exec_tx_verdict split Jakub Kicinski
@ 2026-04-30 17:55   ` Jiayuan Chen
  0 siblings, 0 replies; 19+ messages in thread
From: Jiayuan Chen @ 2026-04-30 17:55 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, bpf,
	john.fastabend, sd, linux-kselftest, Jakub Kicinski, andrii,
	eddyz87, ast, daniel, martin.lau, memxor, song, yonghong.song,
	jolsa, shuah, isolodrai

2026年4月29日 15:29, "Jakub Kicinski" <kuba@kernel.org mailto:kuba@kernel.org?to=%22Jakub%20Kicinski%22%20%3Ckuba%40kernel.org%3E > wrote:


> 
> Add a regression test for the use-after-free in tls_sw_sendmsg_locked()
> where the cached msg_pl pointer becomes stale after bpf_exec_tx_verdict()
> returns -ENOSPC: tls_push_record() may have called
> tls_split_open_record() which replaces ctx->open_rec and frees the old
> record, but the caller still dereferences msg_pl->cork_bytes.
> 
> Reusing prog_sk_policy with apply_bytes=1000 + cork_bytes=800, a single
> 1500-byte send on a kTLS TX socket in a sockmap drives the split-and-free
> path. Without the fix, KASAN reports slab-use-after-free in tls_sw_sendmsg
> and the kernel hangs; with the fix the test completes cleanly.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: andrii@kernel.org
> CC: eddyz87@gmail.com
> CC: ast@kernel.org
> CC: daniel@iogearbox.net
> CC: martin.lau@linux.dev
> CC: memxor@gmail.com
> CC: song@kernel.org
> CC: yonghong.song@linux.dev
> CC: jolsa@kernel.org
> CC: shuah@kernel.org
> CC: john.fastabend@gmail.com
> CC: jiayuan.chen@linux.dev
> CC: isolodrai@meta.com
> CC: bpf@vger.kernel.org
> CC: linux-kselftest@vger.kernel.org
> ---
>  

Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 5/7] selftests: bpf: cover wrapped sk_msg ring chaining in ktls TX path
  2026-04-29 22:29 ` [PATCH net 5/7] selftests: bpf: cover wrapped sk_msg ring chaining in ktls TX path Jakub Kicinski
@ 2026-04-30 17:58   ` Jiayuan Chen
  2026-04-30 22:31   ` sashiko-bot
  1 sibling, 0 replies; 19+ messages in thread
From: Jiayuan Chen @ 2026-04-30 17:58 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, bpf,
	john.fastabend, sd, linux-kselftest, Jakub Kicinski, ast, daniel,
	andrii, martin.lau, eddyz87, memxor, song, yonghong.song, jolsa,
	shuah, isolodrai

2026年4月29日 15:29, "Jakub Kicinski" <kuba@kernel.org mailto:kuba@kernel.org?to=%22Jakub%20Kicinski%22%20%3Ckuba%40kernel.org%3E > wrote:


> 
> Add a regression test for the off-by-one in tls_push_record() where
> the sg_chain() entry count was MAX_SKB_FRAGS instead of NR_MSG_FRAG_IDS,
> causing the chain pointer to overwrite a live ring slot when an sk_msg
> scatterlist ring wrapped (sg.end < sg.start).
> 
> The new "tls tx wrapped sg chain" subtest:
> 1. attaches an SK_MSG program (prog_sk_policy_drop) that drops the first
>  N bytes of a message via bpf_msg_apply_bytes() + SK_DROP,
> 2. splices 17 single-byte frags through a kTLS TX socket so the ring
>  fills to sg.start=16, sg.end=17,
> 3. removes the socket from the sockmap and sends one more byte, which
>  wraps sg.end to 0 and exercises the wrap branch in tls_push_record().
> 
> Without the fix the kernel hangs on the wrapping send (the corrupted
> chain pointer leaves the sg traversal stuck); with the fix the test
> completes cleanly.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: ast@kernel.org
> CC: daniel@iogearbox.net
> CC: andrii@kernel.org
> CC: martin.lau@linux.dev
> CC: eddyz87@gmail.com
> CC: memxor@gmail.com
> CC: song@kernel.org
> CC: yonghong.song@linux.dev
> CC: jolsa@kernel.org
> CC: shuah@kernel.org
> CC: john.fastabend@gmail.com
> CC: jiayuan.chen@linux.dev
> CC: isolodrai@meta.com
> CC: bpf@vger.kernel.org
> CC: linux-kselftest@vger.kernel.org
> ---
>  

Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 2/7] selftests: tls: add test for data loss on small pipe
  2026-04-29 22:29 ` [PATCH net 2/7] selftests: tls: add test for data loss on small pipe Jakub Kicinski
@ 2026-04-30 22:31   ` sashiko-bot
  0 siblings, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-04-30 22:31 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Test logic flaw on architectures with large page sizes defeats the purpose of the test.
- [Medium] Unhandled failure paths and use of EXPECT_EQ cause uninitialized memory reads and test hangs.
--

commit 52c9391392d3ea9a9876d20000032a6a0985383b
Author: Jakub Kicinski <kuba@kernel.org>

selftests: tls: add test for data loss on small pipe

This commit adds a new selftest splice_to_pipe_small to the TLS test suite.
The test verifies that data is not lost when using splice to move data from
a TLS socket into a small pipe that requires multiple iterations to drain.

> diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
> --- a/tools/testing/selftests/net/tls.c
> +++ b/tools/testing/selftests/net/tls.c

[ ... ]

> +TEST_F(tls, splice_to_pipe_small)
> +{
> +	int send_len = TLS_PAYLOAD_MAX_LEN;
> +	char mem_send[TLS_PAYLOAD_MAX_LEN];
> +	char mem_recv[TLS_PAYLOAD_MAX_LEN];
> +	size_t total = 0;
> +	int p[2];
> +
> +	memrnd(mem_send, sizeof(mem_send));
> +
> +	ASSERT_GE(pipe(p), 0);
> +
> +	/* Shrink pipe to 1 page (typically 4096 bytes) to force multiple
> +	 * splice iterations for a 16384-byte TLS record.
> +	 */
> +	EXPECT_GE(fcntl(p[1], F_SETPIPE_SZ, 4096), 4096);

Does this test behave as expected on architectures with large page sizes?

The kernel rounds pipe capacities up to the system page size. On architectures
with a 64KB page size (such as ARM64 or PowerPC), the pipe size will be set
to 64KB instead of 4096 bytes.

Since the 16KB payload easily fits within a 64KB pipe, the splice call would
transfer the entire payload in a single iteration, bypassing the intended
multiple-iteration behavior.

> +	EXPECT_EQ(send(self->fd, mem_send, send_len, 0), send_len);

Will this test hang on a short send?

Unlike ASSERT_EQ, EXPECT_EQ allows execution to continue on failure. If send()
transmits a partial payload, total will not advance enough relative to the
data on the socket. The subsequent splice() could then block indefinitely
waiting for data that will never arrive.

> +	while (total < (size_t)send_len) {
> +		ssize_t spliced, drained;
> +
> +		spliced = splice(self->cfd, NULL, p[1], NULL,
> +				 send_len - total, 0);
> +		EXPECT_GT(spliced, 0);
> +		if (spliced <= 0)
> +			break;
> +
> +		drained = read(p[0], mem_recv + total, spliced);
> +		EXPECT_EQ(drained, spliced);
> +		if (drained <= 0)
> +			break;
> +
> +		total += drained;
> +	}
> +
> +	EXPECT_EQ(total, (size_t)send_len);
> +	EXPECT_EQ(memcmp(mem_send, mem_recv, send_len), 0);

Could this read uninitialized memory?

If splice() or read() return <= 0, the loop breaks early with total < send_len.

Because mem_recv is uninitialized stack memory, the memcmp() call will
read past mem_recv + total into uninitialized memory, triggering undefined
behavior and potential memory sanitizer failures.

> +
> +	close(p[0]);
> +	close(p[1]);
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260429222944.2139041-1-kuba@kernel.org?part=2

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 3/7] net: tls: fix page pin leak on sendpage_ok() failure
  2026-04-29 22:29 ` [PATCH net 3/7] net: tls: fix page pin leak on sendpage_ok() failure Jakub Kicinski
@ 2026-04-30 22:31   ` sashiko-bot
  0 siblings, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-04-30 22:31 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Incomplete fix for page pin leak: dead code in the error path and a permanent leak on the success path for user-backed iterators.
--

commit 6f0253d94e63d44e6e53f2c84d2aead405564b2b
Author: Jakub Kicinski <kuba@kernel.org>

net: tls: fix page pin leak on sendpage_ok() failure
    
When iov_iter_extract_pages() is called on a user-backed iterator, it
pins the extracted pages via pin_user_pages_fast(). If the subsequent
sendpage_ok() check fails, both tls_sw_sendmsg_splice() and
tls_push_data() (device offload path) call iov_iter_revert() to unwind
the iterator position, but iov_iter_revert() only adjusts the cursor -
it does not release the page pin taken by the extract.

> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> index 99c8eff9783e..77922e72c685 100644
> --- a/net/tls/tls_device.c
> +++ b/net/tls/tls_device.c
> @@ -508,6 +508,8 @@ static int tls_push_data(struct sock *sk,
>  			copy = rc;
>  
>  			if (WARN_ON_ONCE(!sendpage_ok(zc_pfrag.page))) {
> +				if (iov_iter_extract_will_pin(iter))
> +					unpin_user_page(zc_pfrag.page);

Is this error path reachable for user-backed iterators?

sendpage_ok() only returns false for slab pages or pages with zero refcount.
User pages extracted via pin_user_pages_fast() will always have a positive
refcount and are never slab pages.

Additionally, does this code leak the page pin on the success path?

On success, the extracted page is passed to tls_append_frag(), which acquires
an additional standard page reference via get_page(). The original FOLL_PIN
reference (which includes GUP_PIN_COUNTING_BIAS) is never released by TLS.

When the packet is eventually transmitted and freed, the network stack calls
standard put_page() (via __skb_frag_unref()), which drops only the standard
reference. The pin bias is never cleared, causing a permanent memory leak on
every successful extraction.

If the intent is to support user-backed iterators in MSG_SPLICE_PAGES paths,
this same unpin omission appears to exist in the core MSG_SPLICE_PAGES helper
skb_splice_from_iter() as well.

>  				iov_iter_revert(iter, copy);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260429222944.2139041-1-kuba@kernel.org?part=3

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 4/7] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring
  2026-04-29 22:29 ` [PATCH net 4/7] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring Jakub Kicinski
@ 2026-04-30 22:31   ` sashiko-bot
  2026-05-01 16:40   ` Sabrina Dubroca
  1 sibling, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-04-30 22:31 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Critical] In TLS 1.3, a wrapped scatterlist with `end == 0` creates a chain-to-chain pointer structure that the scatterlist API cannot traverse, leading to invalid `struct page` dereferences in the crypto engine.
--

commit ea91dff65607bff05b2c2e5b43839119ab72f451
Author: Jakub Kicinski <kuba@kernel.org>

net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring

This commit fixes the entry count passed to sg_chain when chaining a
wrapped sk_msg ring. It replaces MAX_SKB_FRAGS with NR_MSG_FRAG_IDS to
ensure the chaining pointer lands in the correct reserved slot.

> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index bf8f022e1895..2c6a3c4a9fcc 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
[ ... ]
>  	if (prot->version == TLS_1_3_VERSION) {
>  		/* Add content type to end of message.  No padding added */
>  		sg_set_buf(&rec->sg_content_type, &rec->content_type, 1);
>  		sg_mark_end(&rec->sg_content_type);
>  		sg_chain(msg_pl->sg.data, msg_pl->sg.end + 1,
>  			 &rec->sg_content_type);
>  	} else {
>  		sg_mark_end(sk_msg_elem(msg_pl, i));
>  	}
>  
>  	if (msg_pl->sg.end < msg_pl->sg.start) {
>  		sg_chain(&msg_pl->sg.data[msg_pl->sg.start],
> -			 MAX_SKB_FRAGS - msg_pl->sg.start + 1,
> +			 NR_MSG_FRAG_IDS - msg_pl->sg.start + 1,
>  			 msg_pl->sg.data);
>  	}

If the wrapped message perfectly fills the end of the array such that
msg_pl->sg.end is 0, does the TLS 1.3 block create a chain pointer at
msg_pl->sg.data[0]?

Because the ring wrapped, the second sg_chain call links the tail of the
array to msg_pl->sg.data, which starts at index 0.

This would create a chain where the wrap link points directly to another chain
link. The scatterlist API sg_next iterator does not recursively resolve
consecutive chain links.

When stepping past the end of the ring, sg_next would follow the wrap chain,
land on msg_pl->sg.data[0], and return that chain element directly to the
caller.

If the crypto engine then calls sg_page on this returned element, could it
interpret the chain pointer as a struct page address and panic?

While this patch fixes the ring modulus so it no longer corrupts valid payload
data, does the underlying double-chain issue still exist for TLS 1.3 records
that wrap exactly at msg_pl->sg.end == 0?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260429222944.2139041-1-kuba@kernel.org?part=4

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 5/7] selftests: bpf: cover wrapped sk_msg ring chaining in ktls TX path
  2026-04-29 22:29 ` [PATCH net 5/7] selftests: bpf: cover wrapped sk_msg ring chaining in ktls TX path Jakub Kicinski
  2026-04-30 17:58   ` Jiayuan Chen
@ 2026-04-30 22:31   ` sashiko-bot
  1 sibling, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-04-30 22:31 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Medium] Test fragility due to hardcoded fragment count allows silent false positive passes
- [Medium] Flawed splice() return value validation masks setup failures
- [Low] Usage of ASSERT_TRUE() for pointer validation instead of ASSERT_OK_PTR()
--

commit ff8ec72cdd60f8b61132bc64e2221ed120c71020
Author: Jakub Kicinski <kuba@kernel.org>

selftests: bpf: cover wrapped sk_msg ring chaining in ktls TX path

This commit adds a regression test for an off-by-one error in
tls_push_record(). It uses a BPF program to drop bytes and splices
data to force an sk_msg scatterlist ring to wrap and trigger the bug.

> diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c
> --- a/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_ktls.c

[ ... ]

> +static void test_sockmap_ktls_tx_wrapped_chain(int family, int sotype)
> +{
> +	int c = -1, p = -1, one = 1, prog_fd, map_fd;
> +	int pipefd[2] = { -1, -1 };
> +	struct test_sockmap_ktls *skel;
> +	char byte;
> +	ssize_t n;
> +	int err, i;
> +
> +	skel = test_sockmap_ktls__open_and_load();
> +	if (!ASSERT_TRUE(skel, "open ktls skel"))

This isn't a bug, but should this use ASSERT_OK_PTR() instead of
ASSERT_TRUE() for validating the skeleton pointer? ASSERT_OK_PTR()
is specifically designated for checking pointers in BPF selftests
and properly handles both NULL and ERR_PTR values.

> +		return;

[ ... ]

> +	/* packetized pipe so each splice frag becomes its own sg entry */
> +	err = pipe2(pipefd, O_DIRECT);
> +	if (!ASSERT_OK(err, "pipe2"))
> +		goto out;
> +	err = fcntl(pipefd[0], F_SETPIPE_SZ, 17 * 4096);
> +	if (!ASSERT_GE(err, 17 * 4096, "F_SETPIPE_SZ"))
> +		goto out;
> +
> +	for (i = 0; i < 17; i++) {
> +		byte = 'A' + i;
> +		if (!ASSERT_EQ(write(pipefd[1], &byte, 1), 1, "write to pipe"))
> +			goto out;
> +	}

Will this silently fail to wrap the ring if the kernel is configured
with a CONFIG_MAX_SKB_FRAGS value greater than 17?

The test relies on 17 fragments to fill the ring, as the default capacity
for NR_MSG_FRAG_IDS is MAX_SKB_FRAGS + 1. However, MAX_SKB_FRAGS can be
configured up to 45. If the capacity exceeds 17, the 18th byte sent later
in the test will just be placed at index 17, so sg.end will never wrap to 0.

Would it be more robust to dynamically determine the fragment limit by
reading /proc/sys/net/core/max_skb_frags instead of hardcoding 17?

> +
> +	/* drop the first 16 bytes so sg.start advances to 16 */
> +	skel->bss->apply_bytes = 16;
> +
> +	n = splice(pipefd[0], NULL, c, NULL, 17, 0);
> +	if (n < 0)
> +		ASSERT_EQ(errno, EACCES, "splice errno");

Could this splice check mask potential test setup failures?

If the BPF program drops 16 bytes as expected, tls_sw_sendmsg() will
return the remaining 1 byte, making splice() return 1. The n < 0 check
evaluates to false and is bypassed.

However, if the BPF program fails to drop the bytes due to a setup issue
or future regression, splice() will return 17. The n < 0 check is still
bypassed, but the full sk_msg ring will be flushed. The subsequent
1-byte send() will then start a new ring at sg.end = 1 instead of
wrapping the existing one.

Should this strictly assert the exact expected return value, for example
using ASSERT_EQ(n, 1, "splice returns remaining byte"), to guarantee
the ring state is set up correctly?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260429222944.2139041-1-kuba@kernel.org?part=5

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 6/7] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict
  2026-04-29 22:29 ` [PATCH net 6/7] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict Jakub Kicinski
  2026-04-30 17:50   ` Jiayuan Chen
@ 2026-04-30 22:31   ` sashiko-bot
  1 sibling, 0 replies; 19+ messages in thread
From: sashiko-bot @ 2026-04-30 22:31 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Critical] Arithmetic underflow, state corruption, and silent data duplication in the iterator rollback path.
- [High] Use-After-Free when `bpf_exec_tx_verdict` returns `-ENOMEM`.
--

commit ba7ab075901b934710734a96c18f344dbc28cf04
Author: Jakub Kicinski <kuba@kernel.org>

net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict

This patch fixes a use-after-free in tls_sw_sendmsg_locked() when
bpf_exec_tx_verdict() returns -ENOSPC. It updates msg_pl and msg_en after a
possible record split to avoid referencing a freed record.

> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 2c6a3c4a9fcc..4f8d656f39a3 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -1157,6 +1157,13 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
>  				else if (ret == -ENOMEM)
>  					goto wait_for_memory;

If bpf_exec_tx_verdict() splits the record, pushes it to tx_list, and then
encounters an -ENOMEM error while transmitting, it returns -ENOMEM.

This jumps to wait_for_memory without refetching msg_en. If
sk_stream_wait_memory() returns 0, the loop continues and evaluates:

	if (ctx->open_rec && msg_en->sg.size < required_size)
		goto alloc_encrypted;

Could this result in a use-after-free read when dereferencing msg_en->sg.size,
since msg_en still points to the freed original record?

>  				else if (ctx->open_rec && ret == -ENOSPC) {
> +					/* bpf_exec_tx_verdict() may have
> +					 * called tls_split_open_record(),
> +					 * freeing the old record. Re-fetch.
> +					 */
> +					rec = ctx->open_rec;
> +					msg_pl = &rec->msg_plaintext;
> +					msg_en = &rec->msg_encrypted;
>  					if (msg_pl->cork_bytes) {
>  						ret = 0;
>  						goto send_end;

If bpf_exec_tx_verdict() splits the record and then returns -ENOSPC (for
example, from a failed sk_msg_alloc on the remainder), the new msg_pl points
to the remainder record. If msg_pl->cork_bytes is 0, execution falls through
to rollback_iter:

rollback_iter:
	copied -= try_to_copy;
	sk_msg_sg_copy_clear(msg_pl, first);
	iov_iter_revert(&msg->msg_iter,
			msg_pl->sg.size - orig_size);

Since msg_pl is now the remainder record, its size might be smaller than
orig_size. Does the subtraction in iov_iter_revert() underflow and corrupt
the iterator state?

Also, sk_msg_sg_copy_clear(msg_pl, first) is called, but 'first' was captured
from the original record. Could this cause an out-of-bounds access on the
remainder's scatterlist?

Additionally, is it safe to roll back the iterator after the first chunk of
the split record was successfully transmitted? It seems this could lead to
the application re-sending already committed data.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260429222944.2139041-1-kuba@kernel.org?part=6

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 4/7] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring
  2026-04-29 22:29 ` [PATCH net 4/7] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring Jakub Kicinski
  2026-04-30 22:31   ` sashiko-bot
@ 2026-05-01 16:40   ` Sabrina Dubroca
  2026-05-03  1:26     ` Jakub Kicinski
  1 sibling, 1 reply; 19+ messages in thread
From: Sabrina Dubroca @ 2026-05-01 16:40 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, bpf,
	john.fastabend, linux-kselftest, 钱一铭, daniel,
	jonathan.lemon

[I'm slowly trying to catch up with the chunk of ktls patches that
were posted in the past 3 days]

2026-04-29, 15:29:41 -0700, Jakub Kicinski wrote:
[...]
> Every other wrapped-ring arithmetic operation in the sk_msg subsystem
> (sk_msg_iter_dist, sk_msg_iter_var_next, sk_msg_iter_var_prev,
> bpf_msg_pull_data) correctly uses NR_MSG_FRAG_IDS as the ring modulus.
> This sg_chain call is the sole remaining use of MAX_SKB_FRAGS for
> ring-modulus arithmetic and was introduced after the ring expansion.

[...]
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 906a1998c630..600e13effaab 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -802,7 +802,7 @@ static int tls_push_record(struct sock *sk, int flags,
>  
>  	if (msg_pl->sg.end < msg_pl->sg.start) {
>  		sg_chain(&msg_pl->sg.data[msg_pl->sg.start],
> -			 MAX_SKB_FRAGS - msg_pl->sg.start + 1,
> +			 NR_MSG_FRAG_IDS - msg_pl->sg.start + 1,
>  			 msg_pl->sg.data);

And get rid of the [start] / NR - start dance to make this code a bit
clearer?

This should maybe even be an skmsg helper to avoid "random" code
making assumptions on the size of the sg.data array. The last
paragraph in your commit message also tends to suggest that.

-- 
Sabrina

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 4/7] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring
  2026-05-01 16:40   ` Sabrina Dubroca
@ 2026-05-03  1:26     ` Jakub Kicinski
  0 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2026-05-03  1:26 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, bpf,
	john.fastabend, linux-kselftest, 钱一铭, daniel,
	jonathan.lemon

On Fri, 1 May 2026 18:40:27 +0200 Sabrina Dubroca wrote:
> > -			 MAX_SKB_FRAGS - msg_pl->sg.start + 1,
> > +			 NR_MSG_FRAG_IDS - msg_pl->sg.start + 1,
> >  			 msg_pl->sg.data);  
> 
> And get rid of the [start] / NR - start dance to make this code a bit
> clearer?

Heh, yes. In hindsight that is quite a silly construct..

Let me apply the first 2 since the first fix is trivial and hopefully
find some time to check what Sashiko was saying some time this well.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net 0/7] net: tls: fix a few random bugs
  2026-04-29 22:29 [PATCH net 0/7] net: tls: fix a few random bugs Jakub Kicinski
                   ` (6 preceding siblings ...)
  2026-04-29 22:29 ` [PATCH net 7/7] selftests: bpf: cover tls_sw_sendmsg UAF after bpf_exec_tx_verdict split Jakub Kicinski
@ 2026-05-03  2:20 ` patchwork-bot+netdevbpf
  7 siblings, 0 replies; 19+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-05-03  2:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, bpf,
	john.fastabend, sd, linux-kselftest

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 29 Apr 2026 15:29:37 -0700 you wrote:
> Fix a few random bugs, from external reports and my local scan
> with various AI tools. Mostly corner cases in code which I don't
> think TLS maintainers would consider "battle tested".
> 
> Jakub Kicinski (7):
>   net: tls: fix silent data drop under pipe back-pressure
>   selftests: tls: add test for data loss on small pipe
>   net: tls: fix page pin leak on sendpage_ok() failure
>   net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg
>     ring
>   selftests: bpf: cover wrapped sk_msg ring chaining in ktls TX path
>   net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf
>     verdict
>   selftests: bpf: cover tls_sw_sendmsg UAF after bpf_exec_tx_verdict
>     split
> 
> [...]

Here is the summary with links:
  - [net,1/7] net: tls: fix silent data drop under pipe back-pressure
    https://git.kernel.org/netdev/net/c/7e7be31bfdb0
  - [net,2/7] selftests: tls: add test for data loss on small pipe
    https://git.kernel.org/netdev/net/c/bd3a4795d574
  - [net,3/7] net: tls: fix page pin leak on sendpage_ok() failure
    (no matching commit)
  - [net,4/7] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring
    (no matching commit)
  - [net,5/7] selftests: bpf: cover wrapped sk_msg ring chaining in ktls TX path
    (no matching commit)
  - [net,6/7] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict
    (no matching commit)
  - [net,7/7] selftests: bpf: cover tls_sw_sendmsg UAF after bpf_exec_tx_verdict split
    (no matching commit)

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2026-05-03  2:20 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-29 22:29 [PATCH net 0/7] net: tls: fix a few random bugs Jakub Kicinski
2026-04-29 22:29 ` [PATCH net 1/7] net: tls: fix silent data drop under pipe back-pressure Jakub Kicinski
2026-04-29 22:29 ` [PATCH net 2/7] selftests: tls: add test for data loss on small pipe Jakub Kicinski
2026-04-30 22:31   ` sashiko-bot
2026-04-29 22:29 ` [PATCH net 3/7] net: tls: fix page pin leak on sendpage_ok() failure Jakub Kicinski
2026-04-30 22:31   ` sashiko-bot
2026-04-29 22:29 ` [PATCH net 4/7] net: tls: fix off-by-one in sg_chain entry count for wrapped sk_msg ring Jakub Kicinski
2026-04-30 22:31   ` sashiko-bot
2026-05-01 16:40   ` Sabrina Dubroca
2026-05-03  1:26     ` Jakub Kicinski
2026-04-29 22:29 ` [PATCH net 5/7] selftests: bpf: cover wrapped sk_msg ring chaining in ktls TX path Jakub Kicinski
2026-04-30 17:58   ` Jiayuan Chen
2026-04-30 22:31   ` sashiko-bot
2026-04-29 22:29 ` [PATCH net 6/7] net: tls: fix use-after-free in tls_sw_sendmsg_locked after bpf verdict Jakub Kicinski
2026-04-30 17:50   ` Jiayuan Chen
2026-04-30 22:31   ` sashiko-bot
2026-04-29 22:29 ` [PATCH net 7/7] selftests: bpf: cover tls_sw_sendmsg UAF after bpf_exec_tx_verdict split Jakub Kicinski
2026-04-30 17:55   ` Jiayuan Chen
2026-05-03  2:20 ` [PATCH net 0/7] net: tls: fix a few random bugs patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox