* [PATCH bpf-next 0/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
From: KaFai Wan @ 2026-04-14 11:23 UTC
To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
    ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
    yonghong.song, jolsa, shuah, kafai.wan, sdf, netdev, linux-kernel,
    bpf, linux-kselftest

This small patchset avoids infinite recursion in
bpf_skops_hdr_opt_len() triggered by a TCP_NODELAY setsockopt from the
header option callback.

---
KaFai Wan (2):
  bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
  selftests/bpf: Cover TCP_NODELAY in hdr opt callback

 net/ipv4/tcp.c                                |  5 ++-
 .../bpf/prog_tests/tcp_hdr_options.c          | 34 +++++++++++++++++++
 .../bpf/progs/test_misc_tcp_hdr_options.c     | 18 ++++++++++
 3 files changed, 56 insertions(+), 1 deletion(-)

-- 
2.43.0

* [PATCH bpf-next 1/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
From: KaFai Wan @ 2026-04-14 11:23 UTC
To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
    ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
    yonghong.song, jolsa, shuah, kafai.wan, sdf, netdev, linux-kernel,
    bpf, linux-kselftest
Cc: Quan Sun, Yinhao Hu, Kaiyan Mei

A BPF_SOCK_OPS program can enable
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB.

That reaches __tcp_sock_set_nodelay(), which may call
tcp_push_pending_frames(). The transmit path then computes TCP
options again, re-enters bpf_skops_hdr_opt_len(), and invokes the
same BPF callback recursively. This can loop until the kernel
stack overflows.

TCP_NODELAY is not safe from the header option callback context.
Reject it with -EOPNOTSUPP when TCP header option callbacks are
enabled on the socket, so the callback cannot recurse back into
tcp_push_pending_frames() through do_tcp_setsockopt().

Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
 net/ipv4/tcp.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 202a4e57a218..7ac4c98be19d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4004,7 +4004,10 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
 
 	switch (optname) {
 	case TCP_NODELAY:
-		__tcp_sock_set_nodelay(sk, val);
+		if (val && BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG))
+			err = -EOPNOTSUPP;
+		else
+			__tcp_sock_set_nodelay(sk, val);
 		break;
 
 	case TCP_THIN_LINEAR_TIMEOUTS:
-- 
2.43.0

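The call chain described in the commit message can be modelled in plain userspace C. The sketch below is only an illustration, not kernel code: every model_* name, the MAX_DEPTH cap standing in for "the kernel stack overflows", and the MODEL_EOPNOTSUPP constant are assumptions made for the example. It also assumes that enabling TCP_NODELAY always pushes pending frames, whereas the commit message only says it may.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 64        /* assumed stand-in for "kernel stack exhausted" */
#define MODEL_EOPNOTSUPP 95 /* assumed error value, for the model only */

/* Hypothetical userspace model of the socket state involved. */
struct model_sock {
	bool write_hdr_opt_cb;    /* models BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG */
	bool reject_when_flagged; /* true models the patched do_tcp_setsockopt() */
	bool nodelay;
	bool overflowed;
	int depth;                /* current callback nesting */
	int max_depth_seen;       /* deepest nesting reached */
};

static int model_setsockopt_nodelay(struct model_sock *sk, int val);

/* Models bpf_skops_hdr_opt_len(): runs the BPF callback if the flag is set. */
static void model_hdr_opt_len(struct model_sock *sk)
{
	if (!sk->write_hdr_opt_cb)
		return;

	sk->depth++;
	if (sk->depth > sk->max_depth_seen)
		sk->max_depth_seen = sk->depth;
	if (sk->depth >= MAX_DEPTH) {
		/* The real kernel would have overflowed its stack here. */
		sk->overflowed = true;
		sk->depth--;
		return;
	}

	/* The "BPF program": set TCP_NODELAY from the HDR_OPT_LEN callback. */
	(void)model_setsockopt_nodelay(sk, 1);
	sk->depth--;
}

/* Models tcp_push_pending_frames(): the transmit path sizes options again. */
static void model_push_pending_frames(struct model_sock *sk)
{
	model_hdr_opt_len(sk);
}

/* Models the TCP_NODELAY case of do_tcp_setsockopt(). */
static int model_setsockopt_nodelay(struct model_sock *sk, int val)
{
	if (sk->reject_when_flagged && val && sk->write_hdr_opt_cb)
		return -MODEL_EOPNOTSUPP;

	sk->nodelay = val != 0;
	if (val)
		model_push_pending_frames(sk);
	return 0;
}

/* Runs one transmit and returns the deepest callback nesting observed. */
static int model_transmit(bool patched)
{
	struct model_sock sk = {
		.write_hdr_opt_cb = true,
		.reject_when_flagged = patched,
	};

	model_push_pending_frames(&sk);
	return sk.max_depth_seen;
}
```

In this model, model_transmit(false) nests until it hits the MAX_DEPTH cap, while model_transmit(true) runs the callback once and stops because the nested setsockopt is rejected.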
* Re: [PATCH bpf-next 1/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
From: KaFai Wan @ 2026-04-14 13:56 UTC
To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
    ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
    yonghong.song, jolsa, shuah, sdf, netdev, linux-kernel, bpf,
    linux-kselftest
Cc: Quan Sun, Yinhao Hu, Kaiyan Mei

On Tue, 2026-04-14 at 19:23 +0800, KaFai Wan wrote:

AI is right and I'm late to the issue. Please ignore this.
Sorry for the noise.

> A BPF_SOCK_OPS program can enable
> BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
> bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB.
>
> That reaches __tcp_sock_set_nodelay(), which may call
> tcp_push_pending_frames(). The transmit path then computes TCP
> options again, re-enters bpf_skops_hdr_opt_len(), and invokes the
> same BPF callback recursively. This can loop until the kernel
> stack overflows.
>
> TCP_NODELAY is not safe from the header option callback context.
> Reject it with -EOPNOTSUPP when TCP header option callbacks are
> enabled on the socket, so the callback cannot recurse back into
> tcp_push_pending_frames() through do_tcp_setsockopt().
>
> Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
> Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
> Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
> ---
>  net/ipv4/tcp.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 202a4e57a218..7ac4c98be19d 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4004,7 +4004,10 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
>
>  	switch (optname) {
>  	case TCP_NODELAY:
> -		__tcp_sock_set_nodelay(sk, val);
> +		if (val && BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG))
> +			err = -EOPNOTSUPP;
> +		else
> +			__tcp_sock_set_nodelay(sk, val);
>  		break;
>
>  	case TCP_THIN_LINEAR_TIMEOUTS:

-- 
Thanks,
KaFai

* Re: [PATCH bpf-next 1/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
From: Martin KaFai Lau @ 2026-04-15 17:31 UTC
To: KaFai Wan
Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
    ast, daniel, andrii, eddyz87, memxor, song, yonghong.song, jolsa,
    shuah, sdf, netdev, linux-kernel, bpf, linux-kselftest, Quan Sun,
    Yinhao Hu, Kaiyan Mei

On Tue, Apr 14, 2026 at 07:23:09PM +0800, KaFai Wan wrote:
> A BPF_SOCK_OPS program can enable
> BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
> bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB.
>
> That reaches __tcp_sock_set_nodelay(), which may call
> tcp_push_pending_frames(). The transmit path then computes TCP
> options again, re-enters bpf_skops_hdr_opt_len(), and invokes the
> same BPF callback recursively. This can loop until the kernel
> stack overflows.
>
> TCP_NODELAY is not safe from the header option callback context.
> Reject it with -EOPNOTSUPP when TCP header option callbacks are
> enabled on the socket, so the callback cannot recurse back into
> tcp_push_pending_frames() through do_tcp_setsockopt().
>
> Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
> Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
> Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
> ---
>  net/ipv4/tcp.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 202a4e57a218..7ac4c98be19d 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4004,7 +4004,10 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
>
>  	switch (optname) {
>  	case TCP_NODELAY:
> -		__tcp_sock_set_nodelay(sk, val);
> +		if (val && BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG))

This will break the setsockopt() syscall and also break existing bpf
progs that call bpf_setsockopt(TCP_NODELAY) in callbacks other than
BPF_SOCK_OPS_HDR_OPT_LEN_CB/BPF_SOCK_OPS_WRITE_HDR_OPT_CB.

Let's brainstorm other options suggested on the list that have a
smaller blast radius.

pw-bot: cr

> +			err = -EOPNOTSUPP;
> +		else
> +			__tcp_sock_set_nodelay(sk, val);
>  		break;
>
>  	case TCP_THIN_LINEAR_TIMEOUTS:
> --
> 2.43.0
>

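The review objects that the check keys on a per-socket flag rather than on the calling context, so it also rejects callers that cannot recurse. As a purely hypothetical illustration of what a smaller blast radius could look like (the thread does not say which alternative is preferred), the userspace sketch below tracks whether the header option callback is currently running and rejects TCP_NODELAY only in that window. All guard_* names, the in_hdr_opt_cb field, and MODEL_EOPNOTSUPP are assumptions made for the example and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_EOPNOTSUPP 95 /* assumed error value, for the model only */

/* Hypothetical userspace model; none of these fields are kernel fields. */
struct guard_sock {
	bool write_hdr_opt_cb; /* models BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG */
	bool in_hdr_opt_cb;    /* assumed: true only while the callback runs */
	bool nodelay;
	int cb_runs;           /* how many times the callback was invoked */
	int cb_last_ret;       /* result of the callback's own setsockopt */
};

static int guard_set_nodelay(struct guard_sock *sk, int val);

/* Models the header option length callback being invoked on transmit. */
static void guard_hdr_opt_len(struct guard_sock *sk)
{
	if (!sk->write_hdr_opt_cb)
		return;

	sk->in_hdr_opt_cb = true;
	sk->cb_runs++;
	/* The "BPF program" tries to set TCP_NODELAY from the callback. */
	sk->cb_last_ret = guard_set_nodelay(sk, 1);
	sk->in_hdr_opt_cb = false;
}

/* Rejects TCP_NODELAY only when called from inside the callback. */
static int guard_set_nodelay(struct guard_sock *sk, int val)
{
	if (val && sk->in_hdr_opt_cb)
		return -MODEL_EOPNOTSUPP;

	sk->nodelay = val != 0;
	if (val)
		guard_hdr_opt_len(sk); /* models push -> option sizing */
	return 0;
}

/*
 * Syscall-style caller on a socket that has the flag set: the outer call
 * succeeds, and the nested call from the callback is rejected exactly once.
 */
static bool guard_demo(void)
{
	struct guard_sock sk = { .write_hdr_opt_cb = true };
	int ret = guard_set_nodelay(&sk, 1);

	return ret == 0 && sk.nodelay && sk.cb_runs == 1 &&
	       sk.cb_last_ret == -MODEL_EOPNOTSUPP;
}
```

In this model guard_demo() returns true: a caller outside the callback can still enable TCP_NODELAY on a socket with the flag set, which is the behaviour the posted check would have broken, while the re-entrant call is still stopped after one level.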
* [PATCH bpf-next 2/2] selftests/bpf: Cover TCP_NODELAY in hdr opt callback
From: KaFai Wan @ 2026-04-14 11:23 UTC
To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
    ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
    yonghong.song, jolsa, shuah, kafai.wan, sdf, netdev, linux-kernel,
    bpf, linux-kselftest

Add a sockops test program that enables
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG on connection setup and calls
bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB.

Exercise the connection by sending data after the socket is
established. Before the fix, this setup can recurse through
tcp_push_pending_frames() and bpf_skops_hdr_opt_len() until the kernel
hits a stack guard page. After the fix, the connection continues to
make forward progress and the data exchange completes.

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
 .../bpf/prog_tests/tcp_hdr_options.c          | 34 +++++++++++++++++++
 .../bpf/progs/test_misc_tcp_hdr_options.c     | 18 ++++++++++
 2 files changed, 52 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
index 56685fc03c7e..f361f9c7bf59 100644
--- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
@@ -513,6 +513,39 @@ static void misc(void)
 	bpf_link__destroy(link);
 }
 
+static void hdr_sockopt(void)
+{
+	const char send_msg[] = "MISC!!!";
+	char recv_msg[sizeof(send_msg)];
+	const unsigned int nr_data = 2;
+	struct bpf_link *link;
+	struct sk_fds sk_fds;
+	int i, ret;
+
+	link = bpf_program__attach_cgroup(misc_skel->progs.misc_hdr_sockopt, cg_fd);
+	if (!ASSERT_OK_PTR(link, "attach_cgroup(misc_hdr_sockopt)"))
+		return;
+
+	if (sk_fds_connect(&sk_fds, false)) {
+		bpf_link__destroy(link);
+		return;
+	}
+
+	for (i = 0; i < nr_data; i++) {
+		ret = send(sk_fds.active_fd, send_msg, sizeof(send_msg), 0);
+		if (!ASSERT_EQ(ret, sizeof(send_msg), "send(msg)"))
+			goto check_linum;
+
+		ret = read(sk_fds.passive_fd, recv_msg, sizeof(recv_msg));
+		if (!ASSERT_EQ(ret, sizeof(send_msg), "read(msg)"))
+			goto check_linum;
+	}
+
+check_linum:
+	sk_fds_close(&sk_fds);
+	bpf_link__destroy(link);
+}
+
 struct test {
 	const char *desc;
 	void (*run)(void);
@@ -526,6 +559,7 @@ static struct test tests[] = {
 	DEF_TEST(fastopen_estab),
 	DEF_TEST(fin),
 	DEF_TEST(misc),
+	DEF_TEST(hdr_sockopt),
 };
 
 void test_tcp_hdr_options(void)
diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
index d487153a839d..e1dc7246193e 100644
--- a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
@@ -326,4 +326,22 @@ int misc_estab(struct bpf_sock_ops *skops)
 	return CG_OK;
 }
 
+SEC("sockops")
+int misc_hdr_sockopt(struct bpf_sock_ops *skops)
+{
+	int true_val = 1;
+
+	switch (skops->op) {
+	case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
+	case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+		set_hdr_cb_flags(skops, 0);
+		break;
+	case BPF_SOCK_OPS_HDR_OPT_LEN_CB:
+		bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+		break;
+	}
+
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.43.0