public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v4 0/4] bpf: Reject TCP_NODELAY in TCP header option
@ 2026-04-21 15:58 KaFai Wan
  2026-04-21 15:58 ` [PATCH bpf-next v4 1/4] bpf: Reject TCP_NODELAY in TCP header option callbacks KaFai Wan
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: KaFai Wan @ 2026-04-21 15:58 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, memxor,
	song, yonghong.song, jolsa, sdf, davem, edumazet, kuba, pabeni,
	horms, dsahern, shuah, ihor.solodrai, kafai.wan, jiayuan.chen,
	hoyeon.lee, ameryhung, bpf, linux-kernel, netdev, linux-kselftest

This small patchset is about avoid infinite recursion in TCP header option callbacks
and bpf-tcp-cc callbacks via TCP_NODELAY setsockopt.

v4:
 - Fix the test case for TCP header option callbacks (Martin and Jiayuan)
 - Reject TCP_NODELAY in bpf-tcp-cc callbacks (AI and Martin)
 - Add a test case for bpf-tcp-cc

v3:
 - Remove CONFIG_INET check and add comment (Martin and Jiayuan)
 - Fix the test case (Martin)
 https://lore.kernel.org/bpf/20260417092035.2299913-1-kafai.wan@linux.dev/

v2:
 - Reject TCP_NODELAY in bpf_sock_ops_setsockopt() (AI and Martin)
 https://lore.kernel.org/bpf/20260416112308.1820332-1-kafai.wan@linux.dev/

v1:
 https://lore.kernel.org/bpf/20260414112310.1285783-1-kafai.wan@linux.dev/

---
KaFai Wan (4):
  bpf: Reject TCP_NODELAY in TCP header option callbacks
  bpf: Reject TCP_NODELAY in bpf-tcp-cc
  selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
  selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY

 include/linux/bpf.h                           |  1 +
 net/core/filter.c                             | 30 +++++++++++++++++++
 net/ipv4/bpf_tcp_ca.c                         |  2 +-
 .../selftests/bpf/prog_tests/bpf_tcp_ca.c     |  4 +++
 .../bpf/prog_tests/tcp_hdr_options.c          |  6 ++++
 tools/testing/selftests/bpf/progs/bpf_cubic.c | 12 ++++++++
 .../bpf/progs/test_misc_tcp_hdr_options.c     | 15 +++++++++-
 7 files changed, 68 insertions(+), 2 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH bpf-next v4 1/4] bpf: Reject TCP_NODELAY in TCP header option callbacks
  2026-04-21 15:58 [PATCH bpf-next v4 0/4] bpf: Reject TCP_NODELAY in TCP header option KaFai Wan
@ 2026-04-21 15:58 ` KaFai Wan
  2026-04-21 16:51   ` bot+bpf-ci
  2026-04-21 15:58 ` [PATCH bpf-next v4 2/4] bpf: Reject TCP_NODELAY in bpf-tcp-cc KaFai Wan
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: KaFai Wan @ 2026-04-21 15:58 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, memxor,
	song, yonghong.song, jolsa, sdf, davem, edumazet, kuba, pabeni,
	horms, dsahern, shuah, ihor.solodrai, kafai.wan, jiayuan.chen,
	hoyeon.lee, ameryhung, bpf, linux-kernel, netdev, linux-kselftest
  Cc: Quan Sun, Yinhao Hu, Kaiyan Mei

A BPF_SOCK_OPS program can enable
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB or
BPF_SOCK_OPS_WRITE_HDR_OPT_CB.

In these callbacks, bpf_setsockopt(TCP_NODELAY) can reach
__tcp_sock_set_nodelay(), which can call tcp_push_pending_frames().

From BPF_SOCK_OPS_HDR_OPT_LEN_CB, tcp_push_pending_frames() can call
tcp_current_mss(), which calls tcp_established_options() and re-enters
bpf_skops_hdr_opt_len().

BPF_SOCK_OPS_HDR_OPT_LEN_CB
  -> bpf_setsockopt(TCP_NODELAY)
    -> tcp_push_pending_frames()
      -> tcp_current_mss()
        -> tcp_established_options()
          -> bpf_skops_hdr_opt_len()
            -> BPF_SOCK_OPS_HDR_OPT_LEN_CB

From BPF_SOCK_OPS_WRITE_HDR_OPT_CB, tcp_push_pending_frames() can call
tcp_write_xmit(), which calls tcp_transmit_skb().  That path recomputes
header option length through tcp_established_options() and
bpf_skops_hdr_opt_len() before re-entering bpf_skops_write_hdr_opt().

BPF_SOCK_OPS_WRITE_HDR_OPT_CB
  -> bpf_setsockopt(TCP_NODELAY)
    -> tcp_push_pending_frames()
      -> tcp_write_xmit()
        -> tcp_transmit_skb()
          -> tcp_established_options()
            -> bpf_skops_hdr_opt_len()
          -> bpf_skops_write_hdr_opt()
            -> BPF_SOCK_OPS_WRITE_HDR_OPT_CB

This leads to unbounded recursion and can overflow the kernel stack.

Reject TCP_NODELAY with -EOPNOTSUPP in bpf_sock_ops_setsockopt()
when bpf_setsockopt() is called from
BPF_SOCK_OPS_HDR_OPT_LEN_CB or BPF_SOCK_OPS_WRITE_HDR_OPT_CB.

Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
 net/core/filter.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 5fa9189eb772..96849f4c1fbc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5833,6 +5833,12 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	if (!is_locked_tcp_sock_ops(bpf_sock))
 		return -EOPNOTSUPP;
 
+	/* TCP_NODELAY triggers tcp_push_pending_frames() and re-enters these callbacks. */
+	if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
+	     bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
+	    level == SOL_TCP && optname == TCP_NODELAY)
+		return -EOPNOTSUPP;
+
 	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH bpf-next v4 2/4] bpf: Reject TCP_NODELAY in bpf-tcp-cc
  2026-04-21 15:58 [PATCH bpf-next v4 0/4] bpf: Reject TCP_NODELAY in TCP header option KaFai Wan
  2026-04-21 15:58 ` [PATCH bpf-next v4 1/4] bpf: Reject TCP_NODELAY in TCP header option callbacks KaFai Wan
@ 2026-04-21 15:58 ` KaFai Wan
  2026-04-21 15:58 ` [PATCH bpf-next v4 3/4] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks KaFai Wan
  2026-04-21 15:58 ` [PATCH bpf-next v4 4/4] selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY KaFai Wan
  3 siblings, 0 replies; 6+ messages in thread
From: KaFai Wan @ 2026-04-21 15:58 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, memxor,
	song, yonghong.song, jolsa, sdf, davem, edumazet, kuba, pabeni,
	horms, dsahern, shuah, ihor.solodrai, kafai.wan, jiayuan.chen,
	hoyeon.lee, ameryhung, bpf, linux-kernel, netdev, linux-kselftest

A BPF TCP congestion control program can call bpf_setsockopt() from
its callbacks. In current kernels, if it calls
bpf_setsockopt(TCP_NODELAY) from cwnd_event_tx_start(), the call can
re-enter the TCP transmit path before the outer tcp_transmit_skb()
has completed and advanced the send head.

This can re-trigger CA_EVENT_TX_START and lead to unbounded recursion:

  tcp_transmit_skb()
    -> tcp_event_data_sent()
      -> tcp_ca_event(sk, CA_EVENT_TX_START)
        -> cwnd_event_tx_start()
          -> bpf_setsockopt(TCP_NODELAY)
            -> tcp_push_pending_frames()
              -> tcp_write_xmit()
                -> tcp_transmit_skb()

This leads to unbounded recursion and can overflow the kernel stack.

Reject TCP_NODELAY with -EOPNOTSUPP for bpf-tcp-cc by introducing
a dedicated setsockopt proto for BPF_PROG_TYPE_STRUCT_OPS TCP
congestion control programs.

Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
 include/linux/bpf.h   |  1 +
 net/core/filter.c     | 24 ++++++++++++++++++++++++
 net/ipv4/bpf_tcp_ca.c |  2 +-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3cb6b9e70080..cf75da8a12bd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3725,6 +3725,7 @@ extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
 extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
 extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
 extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
+extern const struct bpf_func_proto bpf_sk_setsockopt_nodelay_proto;
 extern const struct bpf_func_proto bpf_unlocked_sk_setsockopt_proto;
 extern const struct bpf_func_proto bpf_unlocked_sk_getsockopt_proto;
 extern const struct bpf_func_proto bpf_find_vma_proto;
diff --git a/net/core/filter.c b/net/core/filter.c
index 96849f4c1fbc..1140f4b55ab5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5688,6 +5688,30 @@ const struct bpf_func_proto bpf_sk_getsockopt_proto = {
 	.arg5_type	= ARG_CONST_SIZE,
 };
 
+BPF_CALL_5(bpf_sk_setsockopt_nodelay, struct sock *, sk, int, level,
+	   int, optname, char *, optval, int, optlen)
+{
+	/*
+	 * TCP_NODELAY triggers tcp_push_pending_frames() and re-enters
+	 * CA_EVENT_TX_START in bpf_tcp_cc, reject it in all bpf_tcp_cc.
+	 */
+	if (level == SOL_TCP && optname == TCP_NODELAY)
+		return -EOPNOTSUPP;
+
+	return _bpf_setsockopt(sk, level, optname, optval, optlen);
+}
+
+const struct bpf_func_proto bpf_sk_setsockopt_nodelay_proto = {
+	.func		= bpf_sk_setsockopt_nodelay,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_MEM | MEM_RDONLY,
+	.arg5_type	= ARG_CONST_SIZE,
+};
+
 BPF_CALL_5(bpf_unlocked_sk_setsockopt, struct sock *, sk, int, level,
 	   int, optname, char *, optval, int, optlen)
 {
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 008edc7f6688..791e15063237 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -168,7 +168,7 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
 		 */
 		if (prog_ops_moff(prog) !=
 		    offsetof(struct tcp_congestion_ops, release))
-			return &bpf_sk_setsockopt_proto;
+			return &bpf_sk_setsockopt_nodelay_proto;
 		return NULL;
 	case BPF_FUNC_getsockopt:
 		/* Since get/setsockopt is usually expected to
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH bpf-next v4 3/4] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
  2026-04-21 15:58 [PATCH bpf-next v4 0/4] bpf: Reject TCP_NODELAY in TCP header option KaFai Wan
  2026-04-21 15:58 ` [PATCH bpf-next v4 1/4] bpf: Reject TCP_NODELAY in TCP header option callbacks KaFai Wan
  2026-04-21 15:58 ` [PATCH bpf-next v4 2/4] bpf: Reject TCP_NODELAY in bpf-tcp-cc KaFai Wan
@ 2026-04-21 15:58 ` KaFai Wan
  2026-04-21 15:58 ` [PATCH bpf-next v4 4/4] selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY KaFai Wan
  3 siblings, 0 replies; 6+ messages in thread
From: KaFai Wan @ 2026-04-21 15:58 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, memxor,
	song, yonghong.song, jolsa, sdf, davem, edumazet, kuba, pabeni,
	horms, dsahern, shuah, ihor.solodrai, kafai.wan, jiayuan.chen,
	hoyeon.lee, ameryhung, bpf, linux-kernel, netdev, linux-kselftest

Add a sockops selftest for the TCP_NODELAY restriction in
BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB.

With BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG enabled,
bpf_setsockopt(TCP_NODELAY) returns -EOPNOTSUPP from
BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB, avoiding
unbounded recursion and kernel stack overflow.

Other cases continue to work as before, including
BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB.

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 .../selftests/bpf/prog_tests/tcp_hdr_options.c    |  6 ++++++
 .../bpf/progs/test_misc_tcp_hdr_options.c         | 15 ++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
index 56685fc03c7e..21632e0946c5 100644
--- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
@@ -507,6 +507,12 @@ static void misc(void)
 
 	ASSERT_EQ(misc_skel->bss->nr_hwtstamp, 0, "nr_hwtstamp");
 
+	ASSERT_TRUE(misc_skel->data->nodelay_est_ok, "nodelay_est_ok");
+
+	ASSERT_TRUE(misc_skel->data->nodelay_hdr_len_reject, "nodelay_hdr_len_reject");
+
+	ASSERT_TRUE(misc_skel->data->nodelay_write_hdr_reject, "nodelay_write_hdr_reject");
+
 check_linum:
 	ASSERT_FALSE(check_error_linum(&sk_fds), "check_error_linum");
 	sk_fds_close(&sk_fds);
diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
index d487153a839d..e77ec6791092 100644
--- a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
@@ -29,6 +29,10 @@ unsigned int nr_syn = 0;
 unsigned int nr_fin = 0;
 unsigned int nr_hwtstamp = 0;
 
+bool nodelay_est_ok = true;
+bool nodelay_hdr_len_reject = true;
+bool nodelay_write_hdr_reject = true;
+
 /* Check the header received from the active side */
 static int __check_active_hdr_in(struct bpf_sock_ops *skops, bool check_syn)
 {
@@ -300,7 +304,7 @@ static int handle_passive_estab(struct bpf_sock_ops *skops)
 SEC("sockops")
 int misc_estab(struct bpf_sock_ops *skops)
 {
-	int true_val = 1;
+	int true_val = 1, false_val = 0, ret;
 
 	switch (skops->op) {
 	case BPF_SOCK_OPS_TCP_LISTEN_CB:
@@ -316,10 +320,19 @@ int misc_estab(struct bpf_sock_ops *skops)
 	case BPF_SOCK_OPS_PARSE_HDR_OPT_CB:
 		return handle_parse_hdr(skops);
 	case BPF_SOCK_OPS_HDR_OPT_LEN_CB:
+		ret = bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+		nodelay_hdr_len_reject &= ret == -EOPNOTSUPP;
+
 		return handle_hdr_opt_len(skops);
 	case BPF_SOCK_OPS_WRITE_HDR_OPT_CB:
+		ret = bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+		nodelay_write_hdr_reject &= ret == -EOPNOTSUPP;
+
 		return handle_write_hdr_opt(skops);
 	case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+		ret = bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &false_val, sizeof(false_val));
+		nodelay_est_ok &= ret == 0;
+
 		return handle_passive_estab(skops);
 	}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH bpf-next v4 4/4] selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY
  2026-04-21 15:58 [PATCH bpf-next v4 0/4] bpf: Reject TCP_NODELAY in TCP header option KaFai Wan
                   ` (2 preceding siblings ...)
  2026-04-21 15:58 ` [PATCH bpf-next v4 3/4] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks KaFai Wan
@ 2026-04-21 15:58 ` KaFai Wan
  3 siblings, 0 replies; 6+ messages in thread
From: KaFai Wan @ 2026-04-21 15:58 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, memxor,
	song, yonghong.song, jolsa, sdf, davem, edumazet, kuba, pabeni,
	horms, dsahern, shuah, ihor.solodrai, kafai.wan, jiayuan.chen,
	hoyeon.lee, ameryhung, bpf, linux-kernel, netdev, linux-kselftest

Add a bpf_tcp_ca selftest for the TCP_NODELAY restriction in
bpf-tcp-cc.

Update bpf_cubic to exercise init() and cwnd_event_tx_start(),
and check that both callbacks reject bpf_setsockopt(TCP_NODELAY)
with -EOPNOTSUPP.

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
 tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c |  4 ++++
 tools/testing/selftests/bpf/progs/bpf_cubic.c       | 12 ++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
index f829b6f09bc9..4f632aa3a79e 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
@@ -112,6 +112,10 @@ static void test_cubic(void)
 
 	ASSERT_EQ(cubic_skel->bss->bpf_cubic_acked_called, 1, "pkts_acked called");
 
+	ASSERT_TRUE(cubic_skel->data->nodelay_init_reject, "init reject nodelay option");
+	ASSERT_TRUE(cubic_skel->data->nodelay_cwnd_event_tx_start_reject,
+		    "cwnd_event_tx_start reject nodelay option");
+
 	bpf_link__destroy(link);
 	bpf_cubic__destroy(cubic_skel);
 }
diff --git a/tools/testing/selftests/bpf/progs/bpf_cubic.c b/tools/testing/selftests/bpf/progs/bpf_cubic.c
index ce18a4db813f..b941ab3ebad5 100644
--- a/tools/testing/selftests/bpf/progs/bpf_cubic.c
+++ b/tools/testing/selftests/bpf/progs/bpf_cubic.c
@@ -16,6 +16,7 @@
 
 #include "bpf_tracing_net.h"
 #include <bpf/bpf_tracing.h>
+#include <errno.h>
 
 char _license[] SEC("license") = "GPL";
 
@@ -170,10 +171,17 @@ static void bictcp_hystart_reset(struct sock *sk)
 	ca->sample_cnt = 0;
 }
 
+bool nodelay_init_reject = true;
+bool nodelay_cwnd_event_tx_start_reject = true;
+
 SEC("struct_ops")
 void BPF_PROG(bpf_cubic_init, struct sock *sk)
 {
 	struct bpf_bictcp *ca = inet_csk_ca(sk);
+	int true_val = 1, ret;
+
+	ret = bpf_setsockopt(sk, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+	nodelay_init_reject &= ret == -EOPNOTSUPP;
 
 	bictcp_reset(ca);
 
@@ -189,8 +197,12 @@ void BPF_PROG(bpf_cubic_cwnd_event_tx_start, struct sock *sk)
 {
 	struct bpf_bictcp *ca = inet_csk_ca(sk);
 	__u32 now = tcp_jiffies32;
+	int true_val = 1, ret;
 	__s32 delta;
 
+	ret = bpf_setsockopt(sk, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+	nodelay_cwnd_event_tx_start_reject &= ret == -EOPNOTSUPP;
+
 	delta = now - tcp_sk(sk)->lsndtime;
 
 	/* We were application limited (idle) for a while.
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next v4 1/4] bpf: Reject TCP_NODELAY in TCP header option callbacks
  2026-04-21 15:58 ` [PATCH bpf-next v4 1/4] bpf: Reject TCP_NODELAY in TCP header option callbacks KaFai Wan
@ 2026-04-21 16:51   ` bot+bpf-ci
  0 siblings, 0 replies; 6+ messages in thread
From: bot+bpf-ci @ 2026-04-21 16:51 UTC (permalink / raw)
  To: kafai.wan, ast, daniel, john.fastabend, andrii, martin.lau,
	eddyz87, memxor, song, yonghong.song, jolsa, sdf, davem, edumazet,
	kuba, pabeni, horms, dsahern, shuah, ihor.solodrai, kafai.wan,
	jiayuan.chen, hoyeon.lee, ameryhung, bpf, linux-kernel, netdev,
	linux-kselftest
  Cc: 2022090917019, dddddd, M202472210, ast, andrii, daniel,
	martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 2641 bytes --]

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 5fa9189eb772..96849f4c1fbc 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5833,6 +5833,12 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
>  	if (!is_locked_tcp_sock_ops(bpf_sock))
>  		return -EOPNOTSUPP;
>
> +	/* TCP_NODELAY triggers tcp_push_pending_frames() and re-enters these callbacks. */
> +	if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
> +	     bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
> +	    level == SOL_TCP && optname == TCP_NODELAY)
> +		return -EOPNOTSUPP;
> +

Does this same recursion vulnerability exist for BPF TCP congestion
control algorithms using BPF_PROG_TYPE_STRUCT_OPS?

If a BPF congestion control algorithm invokes bpf_setsockopt(TCP_NODELAY)
from its cwnd_event callback when handling CA_EVENT_TX_START, could it
trigger unbounded recursion through this path:

BPF_PROG_TYPE_STRUCT_OPS cwnd_event callback (CA_EVENT_TX_START)
  -> bpf_setsockopt(TCP_NODELAY)
    -> tcp_push_pending_frames()
      -> tcp_write_xmit()
        -> tcp_transmit_skb()
          -> tcp_event_data_sent()
            -> cwnd_event()
              -> BPF congestion control cwnd_event callback

The author confirmed this issue in the v3 discussion thread and Martin
KaFai Lau suggested creating a new func_proto
(bpf_sk_setsockopt_nodelay_proto) to reject TCP_NODELAY for all struct_ops
programs, but this patch only addresses BPF_SOCK_OPS callbacks.

Reference: https://lore.kernel.org/bpf/20260418092235.98444C19424@smtp.kernel.org/


Does this fix also need to block TCP_CORK when the value is 0?

When bpf_setsockopt() is called with SOL_TCP and TCP_CORK with value 0
(disabling cork), the call path goes through __tcp_sock_set_cork() in
net/ipv4/tcp.c, which unconditionally calls tcp_push_pending_frames(sk)
when cork is disabled:

__tcp_sock_set_cork()
  -> tcp_push_pending_frames()
    -> tcp_current_mss()
      -> tcp_established_options()
        -> bpf_skops_hdr_opt_len()
          -> BPF_SOCK_OPS_HDR_OPT_LEN_CB

This triggers the same recursion chain described in the commit message,
but the patch only blocks TCP_NODELAY, not TCP_CORK with value 0.

Reference: https://lore.kernel.org/bpf/5fea6b5e9d1a13e72edf8222a8ebe50cbb660b37e62b87b65dab6f6f317943ec@mail.kernel.org/

>  	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
>  }
>


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24733356810

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-04-21 16:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-21 15:58 [PATCH bpf-next v4 0/4] bpf: Reject TCP_NODELAY in TCP header option KaFai Wan
2026-04-21 15:58 ` [PATCH bpf-next v4 1/4] bpf: Reject TCP_NODELAY in TCP header option callbacks KaFai Wan
2026-04-21 16:51   ` bot+bpf-ci
2026-04-21 15:58 ` [PATCH bpf-next v4 2/4] bpf: Reject TCP_NODELAY in bpf-tcp-cc KaFai Wan
2026-04-21 15:58 ` [PATCH bpf-next v4 3/4] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks KaFai Wan
2026-04-21 15:58 ` [PATCH bpf-next v4 4/4] selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY KaFai Wan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox