* [PATCH bpf-next 0/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
@ 2026-04-14 11:23 KaFai Wan
2026-04-14 11:23 ` [PATCH bpf-next 1/2] " KaFai Wan
2026-04-14 11:23 ` [PATCH bpf-next 2/2] selftests/bpf: Cover TCP_NODELAY in hdr opt callback KaFai Wan
0 siblings, 2 replies; 5+ messages in thread
From: KaFai Wan @ 2026-04-14 11:23 UTC (permalink / raw)
To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
yonghong.song, jolsa, shuah, kafai.wan, sdf, netdev, linux-kernel,
bpf, linux-kselftest
This small patchset avoids an infinite recursion in bpf_skops_hdr_opt_len()
that can be triggered via the TCP_NODELAY setsockopt.
---
KaFai Wan (2):
bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
selftests/bpf: Cover TCP_NODELAY in hdr opt callback
net/ipv4/tcp.c | 5 ++-
.../bpf/prog_tests/tcp_hdr_options.c | 34 +++++++++++++++++++
.../bpf/progs/test_misc_tcp_hdr_options.c | 18 ++++++++++
3 files changed, 56 insertions(+), 1 deletion(-)
--
2.43.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH bpf-next 1/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
2026-04-14 11:23 [PATCH bpf-next 0/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks KaFai Wan
@ 2026-04-14 11:23 ` KaFai Wan
2026-04-14 13:56 ` KaFai Wan
2026-04-15 17:31 ` Martin KaFai Lau
2026-04-14 11:23 ` [PATCH bpf-next 2/2] selftests/bpf: Cover TCP_NODELAY in hdr opt callback KaFai Wan
1 sibling, 2 replies; 5+ messages in thread
From: KaFai Wan @ 2026-04-14 11:23 UTC (permalink / raw)
To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
yonghong.song, jolsa, shuah, kafai.wan, sdf, netdev, linux-kernel,
bpf, linux-kselftest
Cc: Quan Sun, Yinhao Hu, Kaiyan Mei
A BPF_SOCK_OPS program can enable
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB.

That reaches __tcp_sock_set_nodelay(), which may call
tcp_push_pending_frames(). The transmit path then computes TCP
options again, re-enters bpf_skops_hdr_opt_len(), and invokes the
same BPF callback recursively. This can loop until the kernel
stack overflows.

TCP_NODELAY is not safe from the header option callback context.
Reject it with -EOPNOTSUPP when TCP header option callbacks are
enabled on the socket, so the callback cannot recurse back into
tcp_push_pending_frames() through do_tcp_setsockopt().

Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
net/ipv4/tcp.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
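
The call chain described in the commit message can be modeled in user space. The sketch below is illustrative only and is not kernel code: every `model_*` name, the `MODEL_MAX_DEPTH` cap, and the struct fields are assumptions made up for this example. The cap stands in for the finite kernel stack, so the model stops where the real kernel would overflow.

```c
#include <stdbool.h>

/* Hypothetical cap standing in for the finite kernel stack. */
#define MODEL_MAX_DEPTH 8

/* Hypothetical socket state; field names are not the kernel's. */
struct model_sock {
	bool hdr_opt_cb_enabled; /* models BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG */
	bool pending_frames;     /* models unsent data queued on the socket */
	int depth;               /* current nesting of the callback */
	int max_depth;           /* deepest nesting observed */
};

static void model_push_pending_frames(struct model_sock *sk);

/* Models a BPF program that sets TCP_NODELAY from the HDR_OPT_LEN callback. */
static void model_hdr_opt_len_cb(struct model_sock *sk)
{
	sk->depth++;
	if (sk->depth > sk->max_depth)
		sk->max_depth = sk->depth;

	/* Models __tcp_sock_set_nodelay(): push only if data is pending.
	 * The depth check is the artificial stop; the kernel has none.
	 */
	if (sk->depth < MODEL_MAX_DEPTH && sk->pending_frames)
		model_push_pending_frames(sk);

	sk->depth--;
}

/* Models the transmit path recomputing TCP options, which runs the callback. */
static void model_push_pending_frames(struct model_sock *sk)
{
	if (sk->hdr_opt_cb_enabled)
		model_hdr_opt_len_cb(sk);
}

/* Runs one transmit attempt and reports the deepest callback nesting. */
int model_run(bool hdr_opt_cb_enabled, bool pending_frames)
{
	struct model_sock sk = {
		.hdr_opt_cb_enabled = hdr_opt_cb_enabled,
		.pending_frames = pending_frames,
	};

	model_push_pending_frames(&sk);
	return sk.max_depth;
}
```

With the callback enabled and data pending, `model_run()` nests until it hits the cap. With no pending data the callback runs once and returns, which is why the commit message says `__tcp_sock_set_nodelay()` "may" call `tcp_push_pending_frames()`.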
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 202a4e57a218..7ac4c98be19d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4004,7 +4004,10 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
 	switch (optname) {
 	case TCP_NODELAY:
-		__tcp_sock_set_nodelay(sk, val);
+		if (val && BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG))
+			err = -EOPNOTSUPP;
+		else
+			__tcp_sock_set_nodelay(sk, val);
 		break;
 
 	case TCP_THIN_LINEAR_TIMEOUTS:
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH bpf-next 2/2] selftests/bpf: Cover TCP_NODELAY in hdr opt callback
2026-04-14 11:23 [PATCH bpf-next 0/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks KaFai Wan
2026-04-14 11:23 ` [PATCH bpf-next 1/2] " KaFai Wan
@ 2026-04-14 11:23 ` KaFai Wan
1 sibling, 0 replies; 5+ messages in thread
From: KaFai Wan @ 2026-04-14 11:23 UTC (permalink / raw)
To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
yonghong.song, jolsa, shuah, kafai.wan, sdf, netdev, linux-kernel,
bpf, linux-kselftest
Add a sockops test program that enables
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG on connection setup and calls
bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB.
Exercise the connection by sending data after the socket is
established. Before the fix, this setup can recurse through
tcp_push_pending_frames() and bpf_skops_hdr_opt_len() until the
kernel hits a stack guard page. After the fix, the connection
continues to make forward progress and the data exchange completes.
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
.../bpf/prog_tests/tcp_hdr_options.c | 34 +++++++++++++++++++
.../bpf/progs/test_misc_tcp_hdr_options.c | 18 ++++++++++
2 files changed, 52 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
index 56685fc03c7e..f361f9c7bf59 100644
--- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
@@ -513,6 +513,39 @@ static void misc(void)
 	bpf_link__destroy(link);
 }
 
+static void hdr_sockopt(void)
+{
+	const char send_msg[] = "MISC!!!";
+	char recv_msg[sizeof(send_msg)];
+	const unsigned int nr_data = 2;
+	struct bpf_link *link;
+	struct sk_fds sk_fds;
+	int i, ret;
+
+	link = bpf_program__attach_cgroup(misc_skel->progs.misc_hdr_sockopt, cg_fd);
+	if (!ASSERT_OK_PTR(link, "attach_cgroup(misc_hdr_sockopt)"))
+		return;
+
+	if (sk_fds_connect(&sk_fds, false)) {
+		bpf_link__destroy(link);
+		return;
+	}
+
+	for (i = 0; i < nr_data; i++) {
+		ret = send(sk_fds.active_fd, send_msg, sizeof(send_msg), 0);
+		if (!ASSERT_EQ(ret, sizeof(send_msg), "send(msg)"))
+			goto check_linum;
+
+		ret = read(sk_fds.passive_fd, recv_msg, sizeof(recv_msg));
+		if (!ASSERT_EQ(ret, sizeof(send_msg), "read(msg)"))
+			goto check_linum;
+	}
+
+check_linum:
+	sk_fds_close(&sk_fds);
+	bpf_link__destroy(link);
+}
+
 struct test {
 	const char *desc;
 	void (*run)(void);
@@ -526,6 +559,7 @@ static struct test tests[] = {
 	DEF_TEST(fastopen_estab),
 	DEF_TEST(fin),
 	DEF_TEST(misc),
+	DEF_TEST(hdr_sockopt),
 };
 
 void test_tcp_hdr_options(void)
diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
index d487153a839d..e1dc7246193e 100644
--- a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
@@ -326,4 +326,22 @@ int misc_estab(struct bpf_sock_ops *skops)
 	return CG_OK;
 }
 
+SEC("sockops")
+int misc_hdr_sockopt(struct bpf_sock_ops *skops)
+{
+	int true_val = 1;
+
+	switch (skops->op) {
+	case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
+	case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+		set_hdr_cb_flags(skops, 0);
+		break;
+	case BPF_SOCK_OPS_HDR_OPT_LEN_CB:
+		bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+		break;
+	}
+
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
--
2.43.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
2026-04-14 11:23 ` [PATCH bpf-next 1/2] " KaFai Wan
@ 2026-04-14 13:56 ` KaFai Wan
2026-04-15 17:31 ` Martin KaFai Lau
1 sibling, 0 replies; 5+ messages in thread
From: KaFai Wan @ 2026-04-14 13:56 UTC (permalink / raw)
To: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
ast, daniel, andrii, martin.lau, eddyz87, memxor, song,
yonghong.song, jolsa, shuah, sdf, netdev, linux-kernel, bpf,
linux-kselftest
Cc: Quan Sun, Yinhao Hu, Kaiyan Mei
On Tue, 2026-04-14 at 19:23 +0800, KaFai Wan wrote:
AI is right and I'm late for the issue. Please ignore this. Sorry for the noise.
> A BPF_SOCK_OPS program can enable
> BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
> bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB.
>
> That reaches __tcp_sock_set_nodelay(), which may call
> tcp_push_pending_frames(). The transmit path then computes TCP
> options again, re-enters bpf_skops_hdr_opt_len(), and invokes the
> same BPF callback recursively. This can loop until the kernel
> stack overflows.
>
> TCP_NODELAY is not safe from the header option callback context.
> Reject it with -EOPNOTSUPP when TCP header option callbacks are
> enabled on the socket, so the callback cannot recurse back into
> tcp_push_pending_frames() through do_tcp_setsockopt().
>
> Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
> Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
> Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
> ---
> net/ipv4/tcp.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 202a4e57a218..7ac4c98be19d 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4004,7 +4004,10 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
>
> switch (optname) {
> case TCP_NODELAY:
> - __tcp_sock_set_nodelay(sk, val);
> + if (val && BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG))
> + err = -EOPNOTSUPP;
> + else
> + __tcp_sock_set_nodelay(sk, val);
> break;
>
> case TCP_THIN_LINEAR_TIMEOUTS:
--
Thanks,
KaFai
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH bpf-next 1/2] bpf: tcp: Reject TCP_NODELAY from BPF hdr opt callbacks
2026-04-14 11:23 ` [PATCH bpf-next 1/2] " KaFai Wan
2026-04-14 13:56 ` KaFai Wan
@ 2026-04-15 17:31 ` Martin KaFai Lau
1 sibling, 0 replies; 5+ messages in thread
From: Martin KaFai Lau @ 2026-04-15 17:31 UTC (permalink / raw)
To: KaFai Wan
Cc: edumazet, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms,
ast, daniel, andrii, eddyz87, memxor, song, yonghong.song, jolsa,
shuah, sdf, netdev, linux-kernel, bpf, linux-kselftest, Quan Sun,
Yinhao Hu, Kaiyan Mei
On Tue, Apr 14, 2026 at 07:23:09PM +0800, KaFai Wan wrote:
> A BPF_SOCK_OPS program can enable
> BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
> bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB.
>
> That reaches __tcp_sock_set_nodelay(), which may call
> tcp_push_pending_frames(). The transmit path then computes TCP
> options again, re-enters bpf_skops_hdr_opt_len(), and invokes the
> same BPF callback recursively. This can loop until the kernel
> stack overflows.
>
> TCP_NODELAY is not safe from the header option callback context.
> Reject it with -EOPNOTSUPP when TCP header option callbacks are
> enabled on the socket, so the callback cannot recurse back into
> tcp_push_pending_frames() through do_tcp_setsockopt().
>
> Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
> Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
> Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
> ---
> net/ipv4/tcp.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 202a4e57a218..7ac4c98be19d 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4004,7 +4004,10 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
>
> switch (optname) {
> case TCP_NODELAY:
> - __tcp_sock_set_nodelay(sk, val);
> + if (val && BPF_SOCK_OPS_TEST_FLAG(tp, BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG))

This will break the setsockopt() syscall and also break existing BPF
programs that call bpf_setsockopt(TCP_NODELAY) in callbacks other than
BPF_SOCK_OPS_HDR_OPT_LEN_CB/BPF_SOCK_OPS_WRITE_HDR_OPT_CB.

Let's brainstorm the other options suggested on the list that have a
smaller blast radius.

pw-bot: cr
> + err = -EOPNOTSUPP;
> + else
> + __tcp_sock_set_nodelay(sk, val);
> break;
>
> case TCP_THIN_LINEAR_TIMEOUTS:
> --
> 2.43.0
>
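
To make the blast-radius concern above concrete, here is a small user-space model. It is a sketch under stated assumptions, not kernel code: the `model_*` names and the `enum model_caller` labels are hypothetical, and the kernel's `do_tcp_setsockopt()` receives no such caller argument. `model_posted_check()` mirrors the condition in the posted patch, which looks only at the value and the socket flag. `model_scoped_check()` is purely illustrative of what a narrower condition would reject; it is not one of the options discussed on the list.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for where the TCP_NODELAY request originates. */
enum model_caller {
	MODEL_SYSCALL,              /* setsockopt(2) from user space */
	MODEL_BPF_ESTABLISHED_CB,   /* bpf_setsockopt() from an unrelated sockops callback */
	MODEL_BPF_HDR_OPT_LEN_CB,   /* bpf_setsockopt() from BPF_SOCK_OPS_HDR_OPT_LEN_CB */
	MODEL_BPF_WRITE_HDR_OPT_CB, /* bpf_setsockopt() from BPF_SOCK_OPS_WRITE_HDR_OPT_CB */
};

/* Mirrors the posted condition: the caller is never consulted. */
int model_posted_check(enum model_caller caller, int val, bool hdr_opt_flag)
{
	(void)caller; /* deliberately unused, as in the posted patch */
	if (val && hdr_opt_flag)
		return -EOPNOTSUPP;
	return 0;
}

/* Illustrative only: reject just the two header option callback contexts. */
int model_scoped_check(enum model_caller caller, int val, bool hdr_opt_flag)
{
	bool in_hdr_opt_cb = caller == MODEL_BPF_HDR_OPT_LEN_CB ||
			     caller == MODEL_BPF_WRITE_HDR_OPT_CB;

	if (val && hdr_opt_flag && in_hdr_opt_cb)
		return -EOPNOTSUPP;
	return 0;
}
```

In this model the posted check returns -EOPNOTSUPP for `MODEL_SYSCALL` and `MODEL_BPF_ESTABLISHED_CB` whenever the flag is set on the socket, which is the regression described in the review. The scoped variant lets those callers through and rejects only the two header option callbacks.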
^ permalink raw reply [flat|nested] 5+ messages in thread