* [PATCH bpf v2 0/2] bpf: Reject TCP_NODELAY in TCP header option callbacks @ 2026-04-16 11:23 KaFai Wan 2026-04-16 11:23 ` [PATCH bpf v2 1/2] " KaFai Wan 2026-04-16 11:23 ` [PATCH bpf v2 2/2] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks KaFai Wan 0 siblings, 2 replies; 9+ messages in thread From: KaFai Wan @ 2026-04-16 11:23 UTC (permalink / raw) To: martin.lau, daniel, john.fastabend, sdf, ast, andrii, eddyz87, memxor, song, yonghong.song, jolsa, davem, edumazet, kuba, pabeni, horms, shuah, jiayuan.chen, kafai.wan, bpf, netdev, linux-kernel, linux-kselftest This small patchset is about avoid infinite recursion in TCP header option callbacks via TCP_NODELAY setsockopt. v2: - Reject TCP_NODELAY in bpf_sock_ops_setsockopt() (AI and Martin) v1: https://lore.kernel.org/bpf/20260414112310.1285783-1-kafai.wan@linux.dev/ --- KaFai Wan (2): bpf: Reject TCP_NODELAY in TCP header option callbacks selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks net/core/filter.c | 5 ++ .../bpf/prog_tests/tcp_hdr_options.c | 54 +++++++++++++++++++ .../bpf/progs/test_misc_tcp_hdr_options.c | 40 ++++++++++++++ 3 files changed, 99 insertions(+) -- 2.43.0 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH bpf v2 1/2] bpf: Reject TCP_NODELAY in TCP header option callbacks 2026-04-16 11:23 [PATCH bpf v2 0/2] bpf: Reject TCP_NODELAY in TCP header option callbacks KaFai Wan @ 2026-04-16 11:23 ` KaFai Wan 2026-04-16 17:35 ` Martin KaFai Lau 2026-04-17 2:43 ` Jiayuan Chen 2026-04-16 11:23 ` [PATCH bpf v2 2/2] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks KaFai Wan 1 sibling, 2 replies; 9+ messages in thread From: KaFai Wan @ 2026-04-16 11:23 UTC (permalink / raw) To: martin.lau, daniel, john.fastabend, sdf, ast, andrii, eddyz87, memxor, song, yonghong.song, jolsa, davem, edumazet, kuba, pabeni, horms, shuah, jiayuan.chen, kafai.wan, bpf, netdev, linux-kernel, linux-kselftest Cc: Quan Sun, Yinhao Hu, Kaiyan Mei A BPF_SOCK_OPS program can enable BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB or BPF_SOCK_OPS_WRITE_HDR_OPT_CB. In these callbacks, bpf_setsockopt(TCP_NODELAY) can reach __tcp_sock_set_nodelay(), which can call tcp_push_pending_frames(). From BPF_SOCK_OPS_HDR_OPT_LEN_CB, tcp_push_pending_frames() can call tcp_current_mss(), which calls tcp_established_options() and re-enters bpf_skops_hdr_opt_len(). BPF_SOCK_OPS_HDR_OPT_LEN_CB -> bpf_setsockopt(TCP_NODELAY) -> tcp_push_pending_frames() -> tcp_current_mss() -> tcp_established_options() -> bpf_skops_hdr_opt_len() -> BPF_SOCK_OPS_HDR_OPT_LEN_CB From BPF_SOCK_OPS_WRITE_HDR_OPT_CB, tcp_push_pending_frames() can call tcp_write_xmit(), which calls tcp_transmit_skb(). That path recomputes header option length through tcp_established_options() and bpf_skops_hdr_opt_len() before re-entering bpf_skops_write_hdr_opt(). BPF_SOCK_OPS_WRITE_HDR_OPT_CB -> bpf_setsockopt(TCP_NODELAY) -> tcp_push_pending_frames() -> tcp_write_xmit() -> tcp_transmit_skb() -> tcp_established_options() -> bpf_skops_hdr_opt_len() -> bpf_skops_write_hdr_opt() -> BPF_SOCK_OPS_WRITE_HDR_OPT_CB This leads to unbounded recursion and can overflow the kernel stack. Reject TCP_NODELAY with -EOPNOTSUPP in bpf_sock_ops_setsockopt() when bpf_setsockopt() is called from BPF_SOCK_OPS_HDR_OPT_LEN_CB or BPF_SOCK_OPS_WRITE_HDR_OPT_CB. Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn> Reported-by: Yinhao Hu <dddddd@hust.edu.cn> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/ Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt") Signed-off-by: KaFai Wan <kafai.wan@linux.dev> --- net/core/filter.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/net/core/filter.c b/net/core/filter.c index fcfcb72663ca..911ff04bca5a 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock, if (!is_locked_tcp_sock_ops(bpf_sock)) return -EOPNOTSUPP; + if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB || + bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) && + IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == TCP_NODELAY) + return -EOPNOTSUPP; + return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen); } -- 2.43.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: Reject TCP_NODELAY in TCP header option callbacks 2026-04-16 11:23 ` [PATCH bpf v2 1/2] " KaFai Wan @ 2026-04-16 17:35 ` Martin KaFai Lau 2026-04-17 1:35 ` KaFai Wan 2026-04-17 2:43 ` Jiayuan Chen 1 sibling, 1 reply; 9+ messages in thread From: Martin KaFai Lau @ 2026-04-16 17:35 UTC (permalink / raw) To: KaFai Wan Cc: daniel, john.fastabend, sdf, ast, andrii, eddyz87, memxor, song, yonghong.song, jolsa, davem, edumazet, kuba, pabeni, horms, shuah, jiayuan.chen, bpf, netdev, linux-kernel, linux-kselftest, Quan Sun, Yinhao Hu, Kaiyan Mei On Thu, Apr 16, 2026 at 07:23:07PM +0800, KaFai Wan wrote: > diff --git a/net/core/filter.c b/net/core/filter.c > index fcfcb72663ca..911ff04bca5a 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock, > if (!is_locked_tcp_sock_ops(bpf_sock)) > return -EOPNOTSUPP; > > + if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB || > + bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) && > + IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == TCP_NODELAY) IS_ENABLED(CONFIG_INET) is unnecessary. pw-bot: cr > + return -EOPNOTSUPP; > + > return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen); > } > > -- > 2.43.0 > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: Reject TCP_NODELAY in TCP header option callbacks 2026-04-16 17:35 ` Martin KaFai Lau @ 2026-04-17 1:35 ` KaFai Wan 0 siblings, 0 replies; 9+ messages in thread From: KaFai Wan @ 2026-04-17 1:35 UTC (permalink / raw) To: Martin KaFai Lau Cc: daniel, john.fastabend, sdf, ast, andrii, eddyz87, memxor, song, yonghong.song, jolsa, davem, edumazet, kuba, pabeni, horms, shuah, jiayuan.chen, bpf, netdev, linux-kernel, linux-kselftest, Quan Sun, Yinhao Hu, Kaiyan Mei On Thu, 2026-04-16 at 10:35 -0700, Martin KaFai Lau wrote: > On Thu, Apr 16, 2026 at 07:23:07PM +0800, KaFai Wan wrote: > > diff --git a/net/core/filter.c b/net/core/filter.c > > index fcfcb72663ca..911ff04bca5a 100644 > > --- a/net/core/filter.c > > +++ b/net/core/filter.c > > @@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock, > > if (!is_locked_tcp_sock_ops(bpf_sock)) > > return -EOPNOTSUPP; > > > > + if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB || > > + bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) && > > + IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == TCP_NODELAY) > > IS_ENABLED(CONFIG_INET) is unnecessary. ok, will remove in next version. > > pw-bot: cr > > > + return -EOPNOTSUPP; > > + > > return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen); > > } > > > > -- > > 2.43.0 > > -- Thanks, KaFai ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: Reject TCP_NODELAY in TCP header option callbacks 2026-04-16 11:23 ` [PATCH bpf v2 1/2] " KaFai Wan 2026-04-16 17:35 ` Martin KaFai Lau @ 2026-04-17 2:43 ` Jiayuan Chen 2026-04-17 9:27 ` KaFai Wan 1 sibling, 1 reply; 9+ messages in thread From: Jiayuan Chen @ 2026-04-17 2:43 UTC (permalink / raw) To: KaFai Wan, martin.lau, daniel, john.fastabend, sdf, ast, andrii, eddyz87, memxor, song, yonghong.song, jolsa, davem, edumazet, kuba, pabeni, horms, shuah, jiayuan.chen, bpf, netdev, linux-kernel, linux-kselftest Cc: Quan Sun, Yinhao Hu, Kaiyan Mei On 4/16/26 7:23 PM, KaFai Wan wrote: > A BPF_SOCK_OPS program can enable > BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call > bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB or > BPF_SOCK_OPS_WRITE_HDR_OPT_CB. > > In these callbacks, bpf_setsockopt(TCP_NODELAY) can reach > __tcp_sock_set_nodelay(), which can call tcp_push_pending_frames(). > > From BPF_SOCK_OPS_HDR_OPT_LEN_CB, tcp_push_pending_frames() can call > tcp_current_mss(), which calls tcp_established_options() and re-enters > bpf_skops_hdr_opt_len(). > > BPF_SOCK_OPS_HDR_OPT_LEN_CB > -> bpf_setsockopt(TCP_NODELAY) > -> tcp_push_pending_frames() > -> tcp_current_mss() > -> tcp_established_options() > -> bpf_skops_hdr_opt_len() > -> BPF_SOCK_OPS_HDR_OPT_LEN_CB > > From BPF_SOCK_OPS_WRITE_HDR_OPT_CB, tcp_push_pending_frames() can call > tcp_write_xmit(), which calls tcp_transmit_skb(). That path recomputes > header option length through tcp_established_options() and > bpf_skops_hdr_opt_len() before re-entering bpf_skops_write_hdr_opt(). > > BPF_SOCK_OPS_WRITE_HDR_OPT_CB > -> bpf_setsockopt(TCP_NODELAY) > -> tcp_push_pending_frames() > -> tcp_write_xmit() > -> tcp_transmit_skb() > -> tcp_established_options() > -> bpf_skops_hdr_opt_len() > -> bpf_skops_write_hdr_opt() > -> BPF_SOCK_OPS_WRITE_HDR_OPT_CB > > This leads to unbounded recursion and can overflow the kernel stack. > > Reject TCP_NODELAY with -EOPNOTSUPP in bpf_sock_ops_setsockopt() > when bpf_setsockopt() is called from > BPF_SOCK_OPS_HDR_OPT_LEN_CB or BPF_SOCK_OPS_WRITE_HDR_OPT_CB. > > Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn> > Reported-by: Yinhao Hu <dddddd@hust.edu.cn> > Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> > Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/ > Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt") > Signed-off-by: KaFai Wan <kafai.wan@linux.dev> > --- > net/core/filter.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/net/core/filter.c b/net/core/filter.c > index fcfcb72663ca..911ff04bca5a 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock, > if (!is_locked_tcp_sock_ops(bpf_sock)) > return -EOPNOTSUPP; > > + if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB || > + bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) && > + IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == TCP_NODELAY) > + return -EOPNOTSUPP; > + > return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen); > } > A simple comment is recommended: /* TCP_NODELAY triggers tcp_push_pending_frames() and re-enters these callbacks. */ Also like Martin pointed before, BPF_SOCK_OPS_HDR_OPT_LEN_CB / BPF_SOCK_OPS_WRITE_HDR_OPT_CB can only be produced under CONFIG_INET so IS_ENABLED(CONFIG_INET) is dead. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 1/2] bpf: Reject TCP_NODELAY in TCP header option callbacks 2026-04-17 2:43 ` Jiayuan Chen @ 2026-04-17 9:27 ` KaFai Wan 0 siblings, 0 replies; 9+ messages in thread From: KaFai Wan @ 2026-04-17 9:27 UTC (permalink / raw) To: Jiayuan Chen, martin.lau, daniel, john.fastabend, sdf, ast, andrii, eddyz87, memxor, song, yonghong.song, jolsa, davem, edumazet, kuba, pabeni, horms, shuah, bpf, netdev, linux-kernel, linux-kselftest Cc: Quan Sun, Yinhao Hu, Kaiyan Mei On Fri, 2026-04-17 at 10:43 +0800, Jiayuan Chen wrote: > > On 4/16/26 7:23 PM, KaFai Wan wrote: > > A BPF_SOCK_OPS program can enable > > BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call > > bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB or > > BPF_SOCK_OPS_WRITE_HDR_OPT_CB. > > > > In these callbacks, bpf_setsockopt(TCP_NODELAY) can reach > > __tcp_sock_set_nodelay(), which can call tcp_push_pending_frames(). > > > > From BPF_SOCK_OPS_HDR_OPT_LEN_CB, tcp_push_pending_frames() can call > > tcp_current_mss(), which calls tcp_established_options() and re-enters > > bpf_skops_hdr_opt_len(). > > > > BPF_SOCK_OPS_HDR_OPT_LEN_CB > > -> bpf_setsockopt(TCP_NODELAY) > > -> tcp_push_pending_frames() > > -> tcp_current_mss() > > -> tcp_established_options() > > -> bpf_skops_hdr_opt_len() > > -> BPF_SOCK_OPS_HDR_OPT_LEN_CB > > > > From BPF_SOCK_OPS_WRITE_HDR_OPT_CB, tcp_push_pending_frames() can call > > tcp_write_xmit(), which calls tcp_transmit_skb(). That path recomputes > > header option length through tcp_established_options() and > > bpf_skops_hdr_opt_len() before re-entering bpf_skops_write_hdr_opt(). > > > > BPF_SOCK_OPS_WRITE_HDR_OPT_CB > > -> bpf_setsockopt(TCP_NODELAY) > > -> tcp_push_pending_frames() > > -> tcp_write_xmit() > > -> tcp_transmit_skb() > > -> tcp_established_options() > > -> bpf_skops_hdr_opt_len() > > -> bpf_skops_write_hdr_opt() > > -> BPF_SOCK_OPS_WRITE_HDR_OPT_CB > > > > This leads to unbounded recursion and can overflow the kernel stack. > > > > Reject TCP_NODELAY with -EOPNOTSUPP in bpf_sock_ops_setsockopt() > > when bpf_setsockopt() is called from > > BPF_SOCK_OPS_HDR_OPT_LEN_CB or BPF_SOCK_OPS_WRITE_HDR_OPT_CB. > > > > Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn> > > Reported-by: Yinhao Hu <dddddd@hust.edu.cn> > > Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> > > Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/ > > Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt") > > Signed-off-by: KaFai Wan <kafai.wan@linux.dev> > > --- > > net/core/filter.c | 5 +++++ > > 1 file changed, 5 insertions(+) > > > > diff --git a/net/core/filter.c b/net/core/filter.c > > index fcfcb72663ca..911ff04bca5a 100644 > > --- a/net/core/filter.c > > +++ b/net/core/filter.c > > @@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock, > > if (!is_locked_tcp_sock_ops(bpf_sock)) > > return -EOPNOTSUPP; > > > > + if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB || > > + bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) && > > + IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == TCP_NODELAY) > > + return -EOPNOTSUPP; > > + > > return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen); > > } > > > > A simple comment is recommended: > > /* TCP_NODELAY triggers tcp_push_pending_frames() and re-enters these > callbacks. */ > ok. Added in new version. > > Also like Martin pointed before, BPF_SOCK_OPS_HDR_OPT_LEN_CB / > BPF_SOCK_OPS_WRITE_HDR_OPT_CB > > can only be produced under CONFIG_INET so IS_ENABLED(CONFIG_INET) is dead. > -- Thanks, KaFai ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH bpf v2 2/2] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks 2026-04-16 11:23 [PATCH bpf v2 0/2] bpf: Reject TCP_NODELAY in TCP header option callbacks KaFai Wan 2026-04-16 11:23 ` [PATCH bpf v2 1/2] " KaFai Wan @ 2026-04-16 11:23 ` KaFai Wan 2026-04-16 19:06 ` Martin KaFai Lau 1 sibling, 1 reply; 9+ messages in thread From: KaFai Wan @ 2026-04-16 11:23 UTC (permalink / raw) To: martin.lau, daniel, john.fastabend, sdf, ast, andrii, eddyz87, memxor, song, yonghong.song, jolsa, davem, edumazet, kuba, pabeni, horms, shuah, jiayuan.chen, kafai.wan, bpf, netdev, linux-kernel, linux-kselftest Add a sockops selftest for the TCP_NODELAY restriction in BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB. The test program calls bpf_setsockopt(TCP_NODELAY) from ACTIVE_ESTABLISHED_CB and PASSIVE_ESTABLISHED_CB to verify that it is still allowed outside the TCP header option callbacks. It then enables BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG, sends data to exercise the TCP header option path, and checks that bpf_setsockopt(TCP_NODELAY) returns -EOPNOTSUPP from both BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB while the connection continues to make forward progress. Signed-off-by: KaFai Wan <kafai.wan@linux.dev> --- .../bpf/prog_tests/tcp_hdr_options.c | 54 +++++++++++++++++++ .../bpf/progs/test_misc_tcp_hdr_options.c | 40 ++++++++++++++ 2 files changed, 94 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c index 56685fc03c7e..2d738c0c4259 100644 --- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c +++ b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c @@ -513,6 +513,59 @@ static void misc(void) bpf_link__destroy(link); } +static void hdr_sockopt(void) +{ + const char send_msg[] = "MISC!!!"; + char recv_msg[sizeof(send_msg)]; + const unsigned int nr_data = 2; + struct bpf_link *link; + struct sk_fds sk_fds; + int i, ret, true_val = 1; + + lport_linum_map_fd = bpf_map__fd(misc_skel->maps.lport_linum_map); + + link = bpf_program__attach_cgroup(misc_skel->progs.misc_hdr_sockopt, cg_fd); + if (!ASSERT_OK_PTR(link, "attach_cgroup(misc_hdr_sockopt)")) + return; + + if (sk_fds_connect(&sk_fds, false)) { + bpf_link__destroy(link); + return; + } + + ret = setsockopt(sk_fds.active_fd, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val)); + if (!ASSERT_OK(ret, "setsockopt(TCP_NODELAY) active")) + goto check_linum; + + ret = setsockopt(sk_fds.passive_fd, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val)); + if (!ASSERT_OK(ret, "setsockopt(TCP_NODELAY) passive")) + goto check_linum; + + for (i = 0; i < nr_data; i++) { + ret = send(sk_fds.active_fd, send_msg, sizeof(send_msg), 0); + if (!ASSERT_EQ(ret, sizeof(send_msg), "send(msg)")) + goto check_linum; + + ret = read(sk_fds.passive_fd, recv_msg, sizeof(recv_msg)); + if (!ASSERT_EQ(ret, sizeof(send_msg), "read(msg)")) + goto check_linum; + } + + ASSERT_NEQ(misc_skel->bss->nr_hdr_sockopt_estab, 0, "nr_hdr_sockopt_estab"); + ASSERT_EQ(misc_skel->bss->nr_hdr_sockopt_estab_err, 0, "nr_hdr_sockopt_estab_err"); + + ASSERT_NEQ(misc_skel->bss->nr_hdr_sockopt_len, 0, "nr_hdr_sockopt_len"); + ASSERT_EQ(misc_skel->bss->nr_hdr_sockopt_len_err, 0, "nr_hdr_sockopt_len_err"); + + ASSERT_NEQ(misc_skel->bss->nr_hdr_sockopt_write, 0, "nr_hdr_sockopt_write"); + ASSERT_EQ(misc_skel->bss->nr_hdr_sockopt_write_err, 0, "nr_hdr_sockopt_write_err"); + +check_linum: + ASSERT_FALSE(check_error_linum(&sk_fds), "check_error_linum"); + sk_fds_close(&sk_fds); + bpf_link__destroy(link); +} + struct test { const char *desc; void (*run)(void); @@ -526,6 +579,7 @@ static struct test tests[] = { DEF_TEST(fastopen_estab), DEF_TEST(fin), DEF_TEST(misc), + DEF_TEST(hdr_sockopt), }; void test_tcp_hdr_options(void) diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c index d487153a839d..a8cf7c4e7ed2 100644 --- a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c +++ b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c @@ -28,6 +28,12 @@ unsigned int nr_data = 0; unsigned int nr_syn = 0; unsigned int nr_fin = 0; unsigned int nr_hwtstamp = 0; +unsigned int nr_hdr_sockopt_estab = 0; +unsigned int nr_hdr_sockopt_estab_err = 0; +unsigned int nr_hdr_sockopt_len = 0; +unsigned int nr_hdr_sockopt_len_err = 0; +unsigned int nr_hdr_sockopt_write = 0; +unsigned int nr_hdr_sockopt_write_err = 0; /* Check the header received from the active side */ static int __check_active_hdr_in(struct bpf_sock_ops *skops, bool check_syn) @@ -326,4 +332,38 @@ int misc_estab(struct bpf_sock_ops *skops) return CG_OK; } +SEC("sockops") +int misc_hdr_sockopt(struct bpf_sock_ops *skops) +{ + int true_val = 1; + int ret; + + switch (skops->op) { + case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB: + case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB: + nr_hdr_sockopt_estab++; + set_hdr_cb_flags(skops, 0); + ret = bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val)); + if (ret) + nr_hdr_sockopt_estab_err++; + break; + case BPF_SOCK_OPS_HDR_OPT_LEN_CB: + nr_hdr_sockopt_len++; + ret = bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val)); + if (ret != -EOPNOTSUPP) + nr_hdr_sockopt_len_err++; + /* just trigger BPF_SOCK_OPS_WRITE_HDR_OPT_CB */ + bpf_reserve_hdr_opt(skops, 12, 0); + break; + case BPF_SOCK_OPS_WRITE_HDR_OPT_CB: + nr_hdr_sockopt_write++; + ret = bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val)); + if (ret != -EOPNOTSUPP) + nr_hdr_sockopt_write_err++; + break; + } + + return CG_OK; +} + char _license[] SEC("license") = "GPL"; -- 2.43.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 2/2] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks 2026-04-16 11:23 ` [PATCH bpf v2 2/2] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks KaFai Wan @ 2026-04-16 19:06 ` Martin KaFai Lau 2026-04-17 3:07 ` KaFai Wan 0 siblings, 1 reply; 9+ messages in thread From: Martin KaFai Lau @ 2026-04-16 19:06 UTC (permalink / raw) To: KaFai Wan Cc: daniel, john.fastabend, sdf, ast, andrii, eddyz87, memxor, song, yonghong.song, jolsa, davem, edumazet, kuba, pabeni, horms, shuah, jiayuan.chen, bpf, netdev, linux-kernel, linux-kselftest On Thu, Apr 16, 2026 at 07:23:08PM +0800, KaFai Wan wrote: > index 56685fc03c7e..2d738c0c4259 100644 > --- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c > +++ b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c > @@ -513,6 +513,59 @@ static void misc(void) > bpf_link__destroy(link); > } > > +static void hdr_sockopt(void) > +{ > + const char send_msg[] = "MISC!!!"; > + char recv_msg[sizeof(send_msg)]; > + const unsigned int nr_data = 2; > + struct bpf_link *link; > + struct sk_fds sk_fds; > + int i, ret, true_val = 1; > + > + lport_linum_map_fd = bpf_map__fd(misc_skel->maps.lport_linum_map); > + > + link = bpf_program__attach_cgroup(misc_skel->progs.misc_hdr_sockopt, cg_fd); > + if (!ASSERT_OK_PTR(link, "attach_cgroup(misc_hdr_sockopt)")) > + return; > + > + if (sk_fds_connect(&sk_fds, false)) { > + bpf_link__destroy(link); > + return; > + } > + > + ret = setsockopt(sk_fds.active_fd, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val)); > + if (!ASSERT_OK(ret, "setsockopt(TCP_NODELAY) active")) > + goto check_linum; > + > + ret = setsockopt(sk_fds.passive_fd, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val)); Why are these two setsockopt(TCP_NODELAY) calls needed? Instead of creating a new "void hdr_sockopt(void)", can the test be done in the existing "void misc(void)" by doing bpf_setsockopt(TCP_NODELAY) in the misc_estab() bpf prog? The PASSIVE_ESTABLISHED_CB can do the bpf_setsockopt(TCP_NODELAY, 0) if it wants to keep the same expectation on Nagle. The BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB can do bpf_setsockopt(TCP_NODELAY, 1) to test recursion and the error return value. > void test_tcp_hdr_options(void) > diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c > index d487153a839d..a8cf7c4e7ed2 100644 > --- a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c > +++ b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c > @@ -28,6 +28,12 @@ unsigned int nr_data = 0; > unsigned int nr_syn = 0; > unsigned int nr_fin = 0; > unsigned int nr_hwtstamp = 0; > +unsigned int nr_hdr_sockopt_estab = 0; > +unsigned int nr_hdr_sockopt_estab_err = 0; > +unsigned int nr_hdr_sockopt_len = 0; > +unsigned int nr_hdr_sockopt_len_err = 0; > +unsigned int nr_hdr_sockopt_write = 0; > +unsigned int nr_hdr_sockopt_write_err = 0; nr_hdr_sockopt_estab, nr_hdr_sockopt_len, and nr_hdr_sockopt_write are unnecessary. These tests have already been covered in some ways. Mostly a nit. The new counters are used in both connections. Note the existing nr_xxx is exclusively used in either active or passive, so there is no parallel counting in practice. Instead of counting, just use a bool nodelay_est_ok, nodelay_hdr_len_err, nodelay_write_err and assert them to be true in userspace. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH bpf v2 2/2] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks 2026-04-16 19:06 ` Martin KaFai Lau @ 2026-04-17 3:07 ` KaFai Wan 0 siblings, 0 replies; 9+ messages in thread From: KaFai Wan @ 2026-04-17 3:07 UTC (permalink / raw) To: Martin KaFai Lau Cc: daniel, john.fastabend, sdf, ast, andrii, eddyz87, memxor, song, yonghong.song, jolsa, davem, edumazet, kuba, pabeni, horms, shuah, jiayuan.chen, bpf, netdev, linux-kernel, linux-kselftest On Thu, 2026-04-16 at 12:06 -0700, Martin KaFai Lau wrote: > On Thu, Apr 16, 2026 at 07:23:08PM +0800, KaFai Wan wrote: > > index 56685fc03c7e..2d738c0c4259 100644 > > --- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c > > +++ b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c > > @@ -513,6 +513,59 @@ static void misc(void) > > bpf_link__destroy(link); > > } > > > > +static void hdr_sockopt(void) > > +{ > > + const char send_msg[] = "MISC!!!"; > > + char recv_msg[sizeof(send_msg)]; > > + const unsigned int nr_data = 2; > > + struct bpf_link *link; > > + struct sk_fds sk_fds; > > + int i, ret, true_val = 1; > > + > > + lport_linum_map_fd = bpf_map__fd(misc_skel->maps.lport_linum_map); > > + > > + link = bpf_program__attach_cgroup(misc_skel->progs.misc_hdr_sockopt, cg_fd); > > + if (!ASSERT_OK_PTR(link, "attach_cgroup(misc_hdr_sockopt)")) > > + return; > > + > > + if (sk_fds_connect(&sk_fds, false)) { > > + bpf_link__destroy(link); > > + return; > > + } > > + > > + ret = setsockopt(sk_fds.active_fd, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val)); > > + if (!ASSERT_OK(ret, "setsockopt(TCP_NODELAY) active")) > > + goto check_linum; > > + > > + ret = setsockopt(sk_fds.passive_fd, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val)); > > Why are these two setsockopt(TCP_NODELAY) calls needed? > > Instead of creating a new "void hdr_sockopt(void)", can the test be done in the > existing "void misc(void)" by doing bpf_setsockopt(TCP_NODELAY) in the > misc_estab() bpf prog? Oh, I see. I meant to test on both active and passive side. We can only test on active side in the existing "void misc(void)". > > The PASSIVE_ESTABLISHED_CB can do the bpf_setsockopt(TCP_NODELAY, 0) > if it wants to keep the same expectation on Nagle. The > BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB > can do bpf_setsockopt(TCP_NODELAY, 1) to test recursion and > the error return value. > > > void test_tcp_hdr_options(void) > > diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c > > b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c > > index d487153a839d..a8cf7c4e7ed2 100644 > > --- a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c > > +++ b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c > > @@ -28,6 +28,12 @@ unsigned int nr_data = 0; > > unsigned int nr_syn = 0; > > unsigned int nr_fin = 0; > > unsigned int nr_hwtstamp = 0; > > +unsigned int nr_hdr_sockopt_estab = 0; > > +unsigned int nr_hdr_sockopt_estab_err = 0; > > +unsigned int nr_hdr_sockopt_len = 0; > > +unsigned int nr_hdr_sockopt_len_err = 0; > > +unsigned int nr_hdr_sockopt_write = 0; > > +unsigned int nr_hdr_sockopt_write_err = 0; > > nr_hdr_sockopt_estab, nr_hdr_sockopt_len, and nr_hdr_sockopt_write > are unnecessary. These tests have already been covered in some ways. yes, they are unnecessary in existing misc_estab() > > Mostly a nit. The new counters are used in both connections. Note the > existing nr_xxx is exclusively used in either active or passive, > so there is no parallel counting in practice. > > Instead of counting, just use a bool nodelay_est_ok, > nodelay_hdr_len_err, nodelay_write_err and assert them > to be true in userspace. indeed. will fix these in next version. -- Thanks, KaFai ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-04-17 9:28 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-16 11:23 [PATCH bpf v2 0/2] bpf: Reject TCP_NODELAY in TCP header option callbacks KaFai Wan 2026-04-16 11:23 ` [PATCH bpf v2 1/2] " KaFai Wan 2026-04-16 17:35 ` Martin KaFai Lau 2026-04-17 1:35 ` KaFai Wan 2026-04-17 2:43 ` Jiayuan Chen 2026-04-17 9:27 ` KaFai Wan 2026-04-16 11:23 ` [PATCH bpf v2 2/2] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks KaFai Wan 2026-04-16 19:06 ` Martin KaFai Lau 2026-04-17 3:07 ` KaFai Wan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox