* [PATCH bpf-next 0/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
@ 2026-04-14 11:23 KaFai Wan
  2026-04-14 11:23 ` [PATCH bpf-next 1/2] " KaFai Wan
  2026-04-14 11:23 ` [PATCH bpf-next 2/2] selftests/bpf: Cover TCP_NODELAY in hdr opt callback KaFai Wan
  0 siblings, 2 replies; 5+ messages in thread
From: KaFai Wan @ 2026-04-14 11:23 UTC
To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
    ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
    yonghong.song, jolsa, shuah, kafai.wan, sdf, netdev, linux-kernel,
    bpf, linux-kselftest

This small patchset avoids infinite recursion in
bpf_skops_hdr_opt_len() triggered by a TCP_NODELAY setsockopt.

---

KaFai Wan (2):
  bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
  selftests/bpf: Cover TCP_NODELAY in hdr opt callback

 net/ipv4/tcp.c                                |  5 ++-
 .../bpf/prog_tests/tcp_hdr_options.c          | 34 +++++++++++++++++++
 .../bpf/progs/test_misc_tcp_hdr_options.c     | 18 ++++++++++
 3 files changed, 56 insertions(+), 1 deletion(-)

-- 
2.43.0

* [PATCH bpf-next 1/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
  2026-04-14 11:23 [PATCH bpf-next 0/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks KaFai Wan
@ 2026-04-14 11:23 ` KaFai Wan
  2026-04-14 13:56   ` KaFai Wan
  2026-04-15 17:31   ` Martin KaFai Lau
  2026-04-14 11:23 ` [PATCH bpf-next 2/2] selftests/bpf: Cover TCP_NODELAY in hdr opt callback KaFai Wan
  1 sibling, 2 replies; 5+ messages in thread
From: KaFai Wan @ 2026-04-14 11:23 UTC
To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
    ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
    yonghong.song, jolsa, shuah, kafai.wan, sdf, netdev, linux-kernel,
    bpf, linux-kselftest
Cc: Quan Sun, Yinhao Hu, Kaiyan Mei

A BPF_SOCK_OPS program can enable
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB.

That reaches __tcp_sock_set_nodelay(), which may call
tcp_push_pending_frames(). The transmit path then computes TCP
options again, re-enters bpf_skops_hdr_opt_len(), and invokes the
same BPF callback recursively. This can loop until the kernel
stack overflows.

TCP_NODELAY is not safe from the header option callback context.
Reject it with -EOPNOTSUPP when TCP header option callbacks are
enabled on the socket, so the callback cannot recurse back into
tcp_push_pending_frames() through do_tcp_setsockopt().

Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
 net/ipv4/tcp.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 202a4e57a218..7ac4c98be19d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4004,7 +4004,10 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
 
 	switch (optname) {
 	case TCP_NODELAY:
-		__tcp_sock_set_nodelay(sk, val);
+		if (val && BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG))
+			err = -EOPNOTSUPP;
+		else
+			__tcp_sock_set_nodelay(sk, val);
 		break;
 
 	case TCP_THIN_LINEAR_TIMEOUTS:
-- 
2.43.0

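[Editor's sketch, not part of the posted patch] The re-entry chain described in the commit message can be modeled in plain userspace C. This is only a toy under stated assumptions: every toy_* name is hypothetical, pending data is assumed to stay queued while options are being computed, and an artificial depth limit stands in for the stack overflow so the model terminates.

```c
#include <stdbool.h>

/* Toy userspace model. Every toy_* name is hypothetical, not a kernel symbol. */
struct toy_sock {
	bool hdr_opt_cb_enabled;	/* stands in for BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG */
	bool has_pending_data;		/* assumption: unsent data stays queued */
	int depth;			/* current nesting of the option-length callback */
	int max_depth;			/* deepest nesting observed */
	int depth_limit;		/* artificial cap so the model terminates */
};

static void toy_push_pending_frames(struct toy_sock *sk);

/* Models __tcp_sock_set_nodelay(): turning NODELAY on may push pending data. */
static void toy_set_nodelay(struct toy_sock *sk, int val)
{
	if (val && sk->has_pending_data)
		toy_push_pending_frames(sk);
}

/* Models the BPF program run for BPF_SOCK_OPS_HDR_OPT_LEN_CB. */
static void toy_hdr_opt_len_cb(struct toy_sock *sk)
{
	sk->depth++;
	if (sk->depth > sk->max_depth)
		sk->max_depth = sk->depth;
	/* The cap exists only in this model; the report describes unbounded nesting. */
	if (sk->depth < sk->depth_limit)
		toy_set_nodelay(sk, 1);
	sk->depth--;
}

/* Models the transmit path: option length is computed before a segment is sent. */
static void toy_push_pending_frames(struct toy_sock *sk)
{
	if (sk->hdr_opt_cb_enabled)
		toy_hdr_opt_len_cb(sk);
}

/* Returns the deepest callback nesting reached for one transmit attempt. */
int toy_max_nesting(bool hdr_opt_cb_enabled, int depth_limit)
{
	struct toy_sock sk = {
		.hdr_opt_cb_enabled = hdr_opt_cb_enabled,
		.has_pending_data = true,
		.depth_limit = depth_limit,
	};

	toy_push_pending_frames(&sk);
	return sk.max_depth;
}
```

With the callback enabled, toy_max_nesting() always climbs to whatever limit is passed in, which is the model's analogue of recursing until the stack is exhausted. With the callback disabled, the nesting stays at zero.
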
* Re: [PATCH bpf-next 1/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
  2026-04-14 11:23 ` [PATCH bpf-next 1/2] " KaFai Wan
@ 2026-04-14 13:56   ` KaFai Wan
  2026-04-15 17:31   ` Martin KaFai Lau
  1 sibling, 0 replies; 5+ messages in thread
From: KaFai Wan @ 2026-04-14 13:56 UTC
To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
    ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
    yonghong.song, jolsa, shuah, sdf, netdev, linux-kernel, bpf,
    linux-kselftest
Cc: Quan Sun, Yinhao Hu, Kaiyan Mei

On Tue, 2026-04-14 at 19:23 +0800, KaFai Wan wrote:

AI is right and I'm late for the issue. Please ignore this.
Sorry for the noise.

> A BPF_SOCK_OPS program can enable
> BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
> bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB.
>
> That reaches __tcp_sock_set_nodelay(), which may call
> tcp_push_pending_frames(). The transmit path then computes TCP
> options again, re-enters bpf_skops_hdr_opt_len(), and invokes the
> same BPF callback recursively. This can loop until the kernel
> stack overflows.
>
> TCP_NODELAY is not safe from the header option callback context.
> Reject it with -EOPNOTSUPP when TCP header option callbacks are
> enabled on the socket, so the callback cannot recurse back into
> tcp_push_pending_frames() through do_tcp_setsockopt().
>
> Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
> Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
> Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
> ---
>  net/ipv4/tcp.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 202a4e57a218..7ac4c98be19d 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4004,7 +4004,10 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
>  
>  	switch (optname) {
>  	case TCP_NODELAY:
> -		__tcp_sock_set_nodelay(sk, val);
> +		if (val && BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG))
> +			err = -EOPNOTSUPP;
> +		else
> +			__tcp_sock_set_nodelay(sk, val);
>  		break;
>  
>  	case TCP_THIN_LINEAR_TIMEOUTS:

-- 
Thanks,
KaFai

* Re: [PATCH bpf-next 1/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
  2026-04-14 11:23 ` [PATCH bpf-next 1/2] " KaFai Wan
  2026-04-14 13:56   ` KaFai Wan
@ 2026-04-15 17:31   ` Martin KaFai Lau
  1 sibling, 0 replies; 5+ messages in thread
From: Martin KaFai Lau @ 2026-04-15 17:31 UTC
To: KaFai Wan
Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
    ast, daniel, andrii, eddyz87, memxor, song, yonghong.song, jolsa,
    shuah, sdf, netdev, linux-kernel, bpf, linux-kselftest, Quan Sun,
    Yinhao Hu, Kaiyan Mei

On Tue, Apr 14, 2026 at 07:23:09PM +0800, KaFai Wan wrote:
> A BPF_SOCK_OPS program can enable
> BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
> bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB.
>
> That reaches __tcp_sock_set_nodelay(), which may call
> tcp_push_pending_frames(). The transmit path then computes TCP
> options again, re-enters bpf_skops_hdr_opt_len(), and invokes the
> same BPF callback recursively. This can loop until the kernel
> stack overflows.
>
> TCP_NODELAY is not safe from the header option callback context.
> Reject it with -EOPNOTSUPP when TCP header option callbacks are
> enabled on the socket, so the callback cannot recurse back into
> tcp_push_pending_frames() through do_tcp_setsockopt().
>
> Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
> Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
> Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
> ---
>  net/ipv4/tcp.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 202a4e57a218..7ac4c98be19d 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4004,7 +4004,10 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
>  
>  	switch (optname) {
>  	case TCP_NODELAY:
> -		__tcp_sock_set_nodelay(sk, val);
> +		if (val && BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG))

It will break the setsockopt syscall and also break existing bpf
progs that call bpf_setsockopt(TCP_NODELAY) in a CB other than
BPF_SOCK_OPS_HDR_OPT_LEN_CB/BPF_SOCK_OPS_WRITE_HDR_OPT_CB.

Let's brainstorm other options suggested on the list that have a
smaller blast radius.

pw-bot: cr

> +			err = -EOPNOTSUPP;
> +		else
> +			__tcp_sock_set_nodelay(sk, val);
>  		break;
>  
>  	case TCP_THIN_LINEAR_TIMEOUTS:
> -- 
> 2.43.0
>

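[Editor's sketch, not part of the thread] The blast-radius objection above can be illustrated with a small userspace truth table. The first function mirrors the condition in the posted hunk, which looks only at the value and the per-socket flag. The second is a purely hypothetical caller-aware check, shown only for contrast; it is not a proposal made in this thread, and the toy_* names and the caller enum are assumptions for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for who issued the TCP_NODELAY request. */
enum toy_caller {
	TOY_SYSCALL,		/* userspace setsockopt() */
	TOY_ESTABLISHED_CB,	/* bpf_setsockopt() from an *_ESTABLISHED_CB */
	TOY_HDR_OPT_LEN_CB,	/* bpf_setsockopt() from BPF_SOCK_OPS_HDR_OPT_LEN_CB */
	TOY_WRITE_HDR_OPT_CB,	/* bpf_setsockopt() from BPF_SOCK_OPS_WRITE_HDR_OPT_CB */
};

/* Mirrors the posted hunk: the caller is never consulted. */
static int toy_check_by_flag(enum toy_caller caller, bool hdr_cb_flag, int val)
{
	(void)caller;
	return (val && hdr_cb_flag) ? -EOPNOTSUPP : 0;
}

/* Hypothetical narrower check: reject only from the header option callbacks. */
static int toy_check_by_caller(enum toy_caller caller, bool hdr_cb_flag, int val)
{
	bool in_hdr_cb = caller == TOY_HDR_OPT_LEN_CB ||
			 caller == TOY_WRITE_HDR_OPT_CB;

	(void)hdr_cb_flag;
	return (val && in_hdr_cb) ? -EOPNOTSUPP : 0;
}
```

Under the flag-based condition, a plain setsockopt() syscall and an established-callback caller are both rejected as soon as the flag is set on the socket, which is the breakage the review points out. How a real fix would identify the calling context is left open here.
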
* [PATCH bpf-next 2/2] selftests/bpf: Cover TCP_NODELAY in hdr opt callback
  2026-04-14 11:23 [PATCH bpf-next 0/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks KaFai Wan
  2026-04-14 11:23 ` [PATCH bpf-next 1/2] " KaFai Wan
@ 2026-04-14 11:23 ` KaFai Wan
  1 sibling, 0 replies; 5+ messages in thread
From: KaFai Wan @ 2026-04-14 11:23 UTC
To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
    ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
    yonghong.song, jolsa, shuah, kafai.wan, sdf, netdev, linux-kernel,
    bpf, linux-kselftest

Add a sockops test program that enables
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG on connection setup and calls
bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB.

Exercise the connection by sending data after the socket is
established. Before the fix, this setup can recurse through
tcp_push_pending_frames() and bpf_skops_hdr_opt_len() until the
kernel hits a stack guard page. After the fix, the connection
continues to make forward progress and the data exchange completes.

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
 .../bpf/prog_tests/tcp_hdr_options.c          | 34 +++++++++++++++++++
 .../bpf/progs/test_misc_tcp_hdr_options.c     | 18 ++++++++++
 2 files changed, 52 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
index 56685fc03c7e..f361f9c7bf59 100644
--- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
@@ -513,6 +513,39 @@ static void misc(void)
 	bpf_link__destroy(link);
 }
 
+static void hdr_sockopt(void)
+{
+	const char send_msg[] = "MISC!!!";
+	char recv_msg[sizeof(send_msg)];
+	const unsigned int nr_data = 2;
+	struct bpf_link *link;
+	struct sk_fds sk_fds;
+	int i, ret;
+
+	link = bpf_program__attach_cgroup(misc_skel->progs.misc_hdr_sockopt, cg_fd);
+	if (!ASSERT_OK_PTR(link, "attach_cgroup(misc_hdr_sockopt)"))
+		return;
+
+	if (sk_fds_connect(&sk_fds, false)) {
+		bpf_link__destroy(link);
+		return;
+	}
+
+	for (i = 0; i < nr_data; i++) {
+		ret = send(sk_fds.active_fd, send_msg, sizeof(send_msg), 0);
+		if (!ASSERT_EQ(ret, sizeof(send_msg), "send(msg)"))
+			goto check_linum;
+
+		ret = read(sk_fds.passive_fd, recv_msg, sizeof(recv_msg));
+		if (!ASSERT_EQ(ret, sizeof(send_msg), "read(msg)"))
+			goto check_linum;
+	}
+
+check_linum:
+	sk_fds_close(&sk_fds);
+	bpf_link__destroy(link);
+}
+
 struct test {
 	const char *desc;
 	void (*run)(void);
@@ -526,6 +559,7 @@ static struct test tests[] = {
 	DEF_TEST(fastopen_estab),
 	DEF_TEST(fin),
 	DEF_TEST(misc),
+	DEF_TEST(hdr_sockopt),
 };
 
 void test_tcp_hdr_options(void)
diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
index d487153a839d..e1dc7246193e 100644
--- a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
@@ -326,4 +326,22 @@ int misc_estab(struct bpf_sock_ops *skops)
 	return CG_OK;
 }
 
+SEC("sockops")
+int misc_hdr_sockopt(struct bpf_sock_ops *skops)
+{
+	int true_val = 1;
+
+	switch (skops->op) {
+	case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
+	case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+		set_hdr_cb_flags(skops, 0);
+		break;
+	case BPF_SOCK_OPS_HDR_OPT_LEN_CB:
+		bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+		break;
+	}
+
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.43.0
