* [PATCH bpf-next v4 0/4] bpf: Reject TCP_NODELAY in TCP header option
From: KaFai Wan @ 2026-04-21 15:58 UTC (permalink / raw)
To: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, memxor,
song, yonghong.song, jolsa, sdf, davem, edumazet, kuba, pabeni,
horms, dsahern, shuah, ihor.solodrai, kafai.wan, jiayuan.chen,
hoyeon.lee, ameryhung, bpf, linux-kernel, netdev, linux-kselftest
This small patchset avoids infinite recursion in TCP header option callbacks
and bpf-tcp-cc callbacks when they call bpf_setsockopt(TCP_NODELAY).
v4:
- Fix the test case for TCP header option callbacks (Martin and Jiayuan)
- Reject TCP_NODELAY in bpf-tcp-cc callbacks (AI and Martin)
- Add a test case for bpf-tcp-cc
v3:
- Remove CONFIG_INET check and add comment (Martin and Jiayuan)
- Fix the test case (Martin)
https://lore.kernel.org/bpf/20260417092035.2299913-1-kafai.wan@linux.dev/
v2:
- Reject TCP_NODELAY in bpf_sock_ops_setsockopt() (AI and Martin)
https://lore.kernel.org/bpf/20260416112308.1820332-1-kafai.wan@linux.dev/
v1:
https://lore.kernel.org/bpf/20260414112310.1285783-1-kafai.wan@linux.dev/
---
KaFai Wan (4):
bpf: Reject TCP_NODELAY in TCP header option callbacks
bpf: Reject TCP_NODELAY in bpf-tcp-cc
selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY
include/linux/bpf.h | 1 +
net/core/filter.c | 30 +++++++++++++++++++
net/ipv4/bpf_tcp_ca.c | 2 +-
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 4 +++
.../bpf/prog_tests/tcp_hdr_options.c | 6 ++++
tools/testing/selftests/bpf/progs/bpf_cubic.c | 12 ++++++++
.../bpf/progs/test_misc_tcp_hdr_options.c | 15 +++++++++-
7 files changed, 68 insertions(+), 2 deletions(-)
--
2.43.0
* [PATCH bpf-next v4 1/4] bpf: Reject TCP_NODELAY in TCP header option callbacks
From: KaFai Wan @ 2026-04-21 15:58 UTC (permalink / raw)
Cc: Quan Sun, Yinhao Hu, Kaiyan Mei
A BPF_SOCK_OPS program can enable
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB or
BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
In these callbacks, bpf_setsockopt(TCP_NODELAY) can reach
__tcp_sock_set_nodelay(), which can call tcp_push_pending_frames().
From BPF_SOCK_OPS_HDR_OPT_LEN_CB, tcp_push_pending_frames() can call
tcp_current_mss(), which calls tcp_established_options() and re-enters
bpf_skops_hdr_opt_len().
  BPF_SOCK_OPS_HDR_OPT_LEN_CB
    -> bpf_setsockopt(TCP_NODELAY)
    -> tcp_push_pending_frames()
    -> tcp_current_mss()
    -> tcp_established_options()
    -> bpf_skops_hdr_opt_len()
    -> BPF_SOCK_OPS_HDR_OPT_LEN_CB
From BPF_SOCK_OPS_WRITE_HDR_OPT_CB, tcp_push_pending_frames() can call
tcp_write_xmit(), which calls tcp_transmit_skb(). That path recomputes
header option length through tcp_established_options() and
bpf_skops_hdr_opt_len() before re-entering bpf_skops_write_hdr_opt().
  BPF_SOCK_OPS_WRITE_HDR_OPT_CB
    -> bpf_setsockopt(TCP_NODELAY)
    -> tcp_push_pending_frames()
    -> tcp_write_xmit()
    -> tcp_transmit_skb()
    -> tcp_established_options()
    -> bpf_skops_hdr_opt_len()
    -> bpf_skops_write_hdr_opt()
    -> BPF_SOCK_OPS_WRITE_HDR_OPT_CB
This leads to unbounded recursion and can overflow the kernel stack.
Reject TCP_NODELAY with -EOPNOTSUPP in bpf_sock_ops_setsockopt()
when bpf_setsockopt() is called from
BPF_SOCK_OPS_HDR_OPT_LEN_CB or BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
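
For illustration only, the decision described above can be modelled in plain userspace C. This is a sketch, not the kernel code: the enum, the MODEL_* constants and model_sock_ops_check() are hypothetical names invented here, and the numeric values are stand-ins rather than the UAPI definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in callback identifiers for this sketch only (hypothetical). */
enum model_op {
	MODEL_OP_PASSIVE_ESTABLISHED,
	MODEL_OP_HDR_OPT_LEN,
	MODEL_OP_WRITE_HDR_OPT,
};

/* Stand-in option constants for this sketch only (hypothetical names). */
#define MODEL_SOL_TCP     6
#define MODEL_TCP_NODELAY 1
#define MODEL_TCP_CORK    3

/*
 * Returns -EOPNOTSUPP for the combination the patch rejects
 * (a header option callback asking for SOL_TCP/TCP_NODELAY),
 * and 0 when the request would be passed on unchanged.
 */
int model_sock_ops_check(enum model_op op, int level, int optname)
{
	bool hdr_opt_cb = op == MODEL_OP_HDR_OPT_LEN ||
			  op == MODEL_OP_WRITE_HDR_OPT;

	if (hdr_opt_cb && level == MODEL_SOL_TCP &&
	    optname == MODEL_TCP_NODELAY)
		return -EOPNOTSUPP;

	return 0;
}
```

In this model only the two header option callbacks are affected; the same request from MODEL_OP_PASSIVE_ESTABLISHED, or a different optname from a header option callback, returns 0.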
Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
net/core/filter.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/net/core/filter.c b/net/core/filter.c
index 5fa9189eb772..96849f4c1fbc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5833,6 +5833,12 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
if (!is_locked_tcp_sock_ops(bpf_sock))
return -EOPNOTSUPP;
+ /* TCP_NODELAY triggers tcp_push_pending_frames() and re-enters these callbacks. */
+ if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
+ bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
+ level == SOL_TCP && optname == TCP_NODELAY)
+ return -EOPNOTSUPP;
+
return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
}
--
2.43.0
* [PATCH bpf-next v4 2/4] bpf: Reject TCP_NODELAY in bpf-tcp-cc
From: KaFai Wan @ 2026-04-21 15:58 UTC (permalink / raw)
A BPF TCP congestion control program can call bpf_setsockopt() from
its callbacks. In current kernels, if it calls
bpf_setsockopt(TCP_NODELAY) from cwnd_event_tx_start(), the call can
re-enter the TCP transmit path before the outer tcp_transmit_skb()
has completed and advanced the send head.
This can re-trigger CA_EVENT_TX_START through the following path:
  tcp_transmit_skb()
    -> tcp_event_data_sent()
    -> tcp_ca_event(sk, CA_EVENT_TX_START)
    -> cwnd_event_tx_start()
    -> bpf_setsockopt(TCP_NODELAY)
    -> tcp_push_pending_frames()
    -> tcp_write_xmit()
    -> tcp_transmit_skb()
This leads to unbounded recursion and can overflow the kernel stack.
Reject TCP_NODELAY with -EOPNOTSUPP for bpf-tcp-cc by introducing
a dedicated setsockopt proto for BPF_PROG_TYPE_STRUCT_OPS TCP
congestion control programs.
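
The re-entry can be sketched as a small userspace model, purely for illustration. None of this is kernel code: struct model_sock, the model_* functions and MODEL_DEPTH_LIMIT are hypothetical, and the depth cap exists only so the sketch terminates where the real path would keep recursing.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_DEPTH_LIMIT 64	/* arbitrary cap so the sketch terminates */

struct model_sock {
	int pending;		/* queued data; the send head is not advanced mid-call */
	int depth;		/* current nesting of model_transmit() */
	int max_depth;		/* deepest nesting observed */
	int reject_nodelay;	/* 1 models the behaviour after this patch */
};

static void model_transmit(struct model_sock *sk);

/* Models bpf_setsockopt(TCP_NODELAY) as seen from the CC callback. */
static int model_setsockopt_nodelay(struct model_sock *sk)
{
	if (sk->reject_nodelay)
		return -EOPNOTSUPP;
	/* Models the push of pending frames re-entering the transmit path. */
	if (sk->pending)
		model_transmit(sk);
	return 0;
}

/* Models a cwnd_event_tx_start() callback that sets TCP_NODELAY. */
static void model_cwnd_event_tx_start(struct model_sock *sk)
{
	(void)model_setsockopt_nodelay(sk);
}

/* Models the transmit path raising the TX_START event. */
static void model_transmit(struct model_sock *sk)
{
	sk->depth++;
	if (sk->depth > sk->max_depth)
		sk->max_depth = sk->depth;
	if (sk->depth < MODEL_DEPTH_LIMIT)
		model_cwnd_event_tx_start(sk);
	sk->depth--;
}

/* Runs one transmit and reports how deep the nesting went. */
int model_max_depth(int reject_nodelay)
{
	struct model_sock sk = { .pending = 1, .reject_nodelay = reject_nodelay };

	model_transmit(&sk);
	return sk.max_depth;
}
```

With rejection enabled the model never nests beyond the outer transmit; without it, nesting runs until the artificial cap is hit.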
Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
include/linux/bpf.h | 1 +
net/core/filter.c | 24 ++++++++++++++++++++++++
net/ipv4/bpf_tcp_ca.c | 2 +-
3 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3cb6b9e70080..cf75da8a12bd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3725,6 +3725,7 @@ extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
+extern const struct bpf_func_proto bpf_sk_setsockopt_nodelay_proto;
extern const struct bpf_func_proto bpf_unlocked_sk_setsockopt_proto;
extern const struct bpf_func_proto bpf_unlocked_sk_getsockopt_proto;
extern const struct bpf_func_proto bpf_find_vma_proto;
diff --git a/net/core/filter.c b/net/core/filter.c
index 96849f4c1fbc..1140f4b55ab5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5688,6 +5688,30 @@ const struct bpf_func_proto bpf_sk_getsockopt_proto = {
.arg5_type = ARG_CONST_SIZE,
};
+BPF_CALL_5(bpf_sk_setsockopt_nodelay, struct sock *, sk, int, level,
+ int, optname, char *, optval, int, optlen)
+{
+ /*
+ * TCP_NODELAY triggers tcp_push_pending_frames() and re-enters
+ * CA_EVENT_TX_START in bpf_tcp_cc, reject it in all bpf_tcp_cc.
+ */
+ if (level == SOL_TCP && optname == TCP_NODELAY)
+ return -EOPNOTSUPP;
+
+ return _bpf_setsockopt(sk, level, optname, optval, optlen);
+}
+
+const struct bpf_func_proto bpf_sk_setsockopt_nodelay_proto = {
+ .func = bpf_sk_setsockopt_nodelay,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+ .arg2_type = ARG_ANYTHING,
+ .arg3_type = ARG_ANYTHING,
+ .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
+ .arg5_type = ARG_CONST_SIZE,
+};
+
BPF_CALL_5(bpf_unlocked_sk_setsockopt, struct sock *, sk, int, level,
int, optname, char *, optval, int, optlen)
{
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 008edc7f6688..791e15063237 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -168,7 +168,7 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
*/
if (prog_ops_moff(prog) !=
offsetof(struct tcp_congestion_ops, release))
- return &bpf_sk_setsockopt_proto;
+ return &bpf_sk_setsockopt_nodelay_proto;
return NULL;
case BPF_FUNC_getsockopt:
/* Since get/setsockopt is usually expected to
--
2.43.0
* [PATCH bpf-next v4 3/4] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
From: KaFai Wan @ 2026-04-21 15:58 UTC (permalink / raw)
Add a sockops selftest for the TCP_NODELAY restriction in
BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
With BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG enabled,
bpf_setsockopt(TCP_NODELAY) returns -EOPNOTSUPP from
BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB, avoiding
unbounded recursion and kernel stack overflow.
Other cases continue to work as before, including
BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB.
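
The test program records results with statements of the form "flag &= ret == -EOPNOTSUPP". As a hedged userspace sketch of that idiom (fold_expect() and all_calls_returned() are hypothetical helpers written for this note, not part of the selftest):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Folds one return value into a sticky pass flag. "==" binds tighter
 * than "&=", so the comparison is evaluated first and its 0/1 result
 * is then ANDed into the flag.
 */
static bool fold_expect(bool flag, int ret, int expected)
{
	flag &= ret == expected;
	return flag;
}

/* Returns true only if every entry in rets[] equals expected. */
bool all_calls_returned(const int *rets, int n, int expected)
{
	bool ok = true;	/* starts true, like the selftest globals */

	for (int i = 0; i < n; i++)
		ok = fold_expect(ok, rets[i], expected);
	return ok;
}
```

Once a single call returns something unexpected the flag stays false. A flag that starts true also stays true when no call is folded in, so in this model a check of this shape cannot tell "always rejected" apart from "never invoked".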
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
.../selftests/bpf/prog_tests/tcp_hdr_options.c | 6 ++++++
.../bpf/progs/test_misc_tcp_hdr_options.c | 15 ++++++++++++++-
2 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
index 56685fc03c7e..21632e0946c5 100644
--- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
@@ -507,6 +507,12 @@ static void misc(void)
ASSERT_EQ(misc_skel->bss->nr_hwtstamp, 0, "nr_hwtstamp");
+ ASSERT_TRUE(misc_skel->data->nodelay_est_ok, "nodelay_est_ok");
+
+ ASSERT_TRUE(misc_skel->data->nodelay_hdr_len_reject, "nodelay_hdr_len_reject");
+
+ ASSERT_TRUE(misc_skel->data->nodelay_write_hdr_reject, "nodelay_write_hdr_reject");
+
check_linum:
ASSERT_FALSE(check_error_linum(&sk_fds), "check_error_linum");
sk_fds_close(&sk_fds);
diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
index d487153a839d..e77ec6791092 100644
--- a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
@@ -29,6 +29,10 @@ unsigned int nr_syn = 0;
unsigned int nr_fin = 0;
unsigned int nr_hwtstamp = 0;
+bool nodelay_est_ok = true;
+bool nodelay_hdr_len_reject = true;
+bool nodelay_write_hdr_reject = true;
+
/* Check the header received from the active side */
static int __check_active_hdr_in(struct bpf_sock_ops *skops, bool check_syn)
{
@@ -300,7 +304,7 @@ static int handle_passive_estab(struct bpf_sock_ops *skops)
SEC("sockops")
int misc_estab(struct bpf_sock_ops *skops)
{
- int true_val = 1;
+ int true_val = 1, false_val = 0, ret;
switch (skops->op) {
case BPF_SOCK_OPS_TCP_LISTEN_CB:
@@ -316,10 +320,19 @@ int misc_estab(struct bpf_sock_ops *skops)
case BPF_SOCK_OPS_PARSE_HDR_OPT_CB:
return handle_parse_hdr(skops);
case BPF_SOCK_OPS_HDR_OPT_LEN_CB:
+ ret = bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+ nodelay_hdr_len_reject &= ret == -EOPNOTSUPP;
+
return handle_hdr_opt_len(skops);
case BPF_SOCK_OPS_WRITE_HDR_OPT_CB:
+ ret = bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+ nodelay_write_hdr_reject &= ret == -EOPNOTSUPP;
+
return handle_write_hdr_opt(skops);
case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+ ret = bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &false_val, sizeof(false_val));
+ nodelay_est_ok &= ret == 0;
+
return handle_passive_estab(skops);
}
--
2.43.0
* [PATCH bpf-next v4 4/4] selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY
From: KaFai Wan @ 2026-04-21 15:58 UTC (permalink / raw)
Add a bpf_tcp_ca selftest for the TCP_NODELAY restriction in
bpf-tcp-cc.
Update bpf_cubic to exercise init() and cwnd_event_tx_start(),
and check that both callbacks reject bpf_setsockopt(TCP_NODELAY)
with -EOPNOTSUPP.
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c | 4 ++++
tools/testing/selftests/bpf/progs/bpf_cubic.c | 12 ++++++++++++
2 files changed, 16 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
index f829b6f09bc9..4f632aa3a79e 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_tcp_ca.c
@@ -112,6 +112,10 @@ static void test_cubic(void)
ASSERT_EQ(cubic_skel->bss->bpf_cubic_acked_called, 1, "pkts_acked called");
+ ASSERT_TRUE(cubic_skel->data->nodelay_init_reject, "init reject nodelay option");
+ ASSERT_TRUE(cubic_skel->data->nodelay_cwnd_event_tx_start_reject,
+ "cwnd_event_tx_start reject nodelay option");
+
bpf_link__destroy(link);
bpf_cubic__destroy(cubic_skel);
}
diff --git a/tools/testing/selftests/bpf/progs/bpf_cubic.c b/tools/testing/selftests/bpf/progs/bpf_cubic.c
index ce18a4db813f..b941ab3ebad5 100644
--- a/tools/testing/selftests/bpf/progs/bpf_cubic.c
+++ b/tools/testing/selftests/bpf/progs/bpf_cubic.c
@@ -16,6 +16,7 @@
#include "bpf_tracing_net.h"
#include <bpf/bpf_tracing.h>
+#include <errno.h>
char _license[] SEC("license") = "GPL";
@@ -170,10 +171,17 @@ static void bictcp_hystart_reset(struct sock *sk)
ca->sample_cnt = 0;
}
+bool nodelay_init_reject = true;
+bool nodelay_cwnd_event_tx_start_reject = true;
+
SEC("struct_ops")
void BPF_PROG(bpf_cubic_init, struct sock *sk)
{
struct bpf_bictcp *ca = inet_csk_ca(sk);
+ int true_val = 1, ret;
+
+ ret = bpf_setsockopt(sk, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+ nodelay_init_reject &= ret == -EOPNOTSUPP;
bictcp_reset(ca);
@@ -189,8 +197,12 @@ void BPF_PROG(bpf_cubic_cwnd_event_tx_start, struct sock *sk)
{
struct bpf_bictcp *ca = inet_csk_ca(sk);
__u32 now = tcp_jiffies32;
+ int true_val = 1, ret;
__s32 delta;
+ ret = bpf_setsockopt(sk, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+ nodelay_cwnd_event_tx_start_reject &= ret == -EOPNOTSUPP;
+
delta = now - tcp_sk(sk)->lsndtime;
/* We were application limited (idle) for a while.
--
2.43.0
* Re: [PATCH bpf-next v4 1/4] bpf: Reject TCP_NODELAY in TCP header option callbacks
From: bot+bpf-ci @ 2026-04-21 16:51 UTC (permalink / raw)
To: kafai.wan, ast, daniel, john.fastabend, andrii, martin.lau,
eddyz87, memxor, song, yonghong.song, jolsa, sdf, davem, edumazet,
kuba, pabeni, horms, dsahern, shuah, ihor.solodrai, kafai.wan,
jiayuan.chen, hoyeon.lee, ameryhung, bpf, linux-kernel, netdev,
linux-kselftest
Cc: 2022090917019, dddddd, M202472210, ast, andrii, daniel,
martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 5fa9189eb772..96849f4c1fbc 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5833,6 +5833,12 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
> if (!is_locked_tcp_sock_ops(bpf_sock))
> return -EOPNOTSUPP;
>
> + /* TCP_NODELAY triggers tcp_push_pending_frames() and re-enters these callbacks. */
> + if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
> + bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
> + level == SOL_TCP && optname == TCP_NODELAY)
> + return -EOPNOTSUPP;
> +
Does this same recursion vulnerability exist for BPF TCP congestion
control algorithms using BPF_PROG_TYPE_STRUCT_OPS?
If a BPF congestion control algorithm invokes bpf_setsockopt(TCP_NODELAY)
from its cwnd_event callback when handling CA_EVENT_TX_START, could it
trigger unbounded recursion through this path:
  BPF_PROG_TYPE_STRUCT_OPS cwnd_event callback (CA_EVENT_TX_START)
    -> bpf_setsockopt(TCP_NODELAY)
    -> tcp_push_pending_frames()
    -> tcp_write_xmit()
    -> tcp_transmit_skb()
    -> tcp_event_data_sent()
    -> cwnd_event()
    -> BPF congestion control cwnd_event callback
The author confirmed this issue in the v3 discussion thread and Martin
KaFai Lau suggested creating a new func_proto
(bpf_sk_setsockopt_nodelay_proto) to reject TCP_NODELAY for all struct_ops
programs, but this patch only addresses BPF_SOCK_OPS callbacks.
Reference: https://lore.kernel.org/bpf/20260418092235.98444C19424@smtp.kernel.org/
Does this fix also need to block TCP_CORK when the value is 0?
When bpf_setsockopt() is called with SOL_TCP and TCP_CORK with value 0
(disabling cork), the call path goes through __tcp_sock_set_cork() in
net/ipv4/tcp.c, which unconditionally calls tcp_push_pending_frames(sk)
when cork is disabled:
  __tcp_sock_set_cork()
    -> tcp_push_pending_frames()
    -> tcp_current_mss()
    -> tcp_established_options()
    -> bpf_skops_hdr_opt_len()
    -> BPF_SOCK_OPS_HDR_OPT_LEN_CB
This triggers the same recursion chain described in the commit message,
but the patch only blocks TCP_NODELAY, not TCP_CORK with value 0.
Reference: https://lore.kernel.org/bpf/5fea6b5e9d1a13e72edf8222a8ebe50cbb660b37e62b87b65dab6f6f317943ec@mail.kernel.org/
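
To make the question concrete, here is a userspace sketch of what a value-aware check could look like if TCP_CORK with value 0 were also rejected in the header option callbacks. It is an assumption for discussion, not something this series does or that the thread has agreed on; model_hdr_opt_cb_check() and the MODEL_* constants are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in option constants for this sketch only (hypothetical names). */
#define MODEL_SOL_TCP     6
#define MODEL_TCP_NODELAY 1
#define MODEL_TCP_CORK    3

/*
 * Models a check applied only while running inside a header option
 * callback. TCP_NODELAY is rejected regardless of value, as in the
 * patch; TCP_CORK is rejected only when the value is 0, the case the
 * review says pushes pending frames.
 */
int model_hdr_opt_cb_check(int level, int optname, int optval)
{
	if (level != MODEL_SOL_TCP)
		return 0;
	if (optname == MODEL_TCP_NODELAY)
		return -EOPNOTSUPP;
	if (optname == MODEL_TCP_CORK && optval == 0)
		return -EOPNOTSUPP;
	return 0;
}
```

Unlike the optname-only test in the patch, this variant has to look at the option value, which in the real helper would mean reading optval before deciding.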
> return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
> }
>
---
AI reviewed your patch. Please fix the bug or reply by email explaining why it is not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/24733356810