public inbox for linux-kselftest@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf v2 0/2] bpf: Reject TCP_NODELAY in TCP header option callbacks
@ 2026-04-16 11:23 KaFai Wan
  2026-04-16 11:23 ` [PATCH bpf v2 1/2] " KaFai Wan
  2026-04-16 11:23 ` [PATCH bpf v2 2/2] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks KaFai Wan
  0 siblings, 2 replies; 9+ messages in thread
From: KaFai Wan @ 2026-04-16 11:23 UTC (permalink / raw)
  To: martin.lau, daniel, john.fastabend, sdf, ast, andrii, eddyz87,
	memxor, song, yonghong.song, jolsa, davem, edumazet, kuba, pabeni,
	horms, shuah, jiayuan.chen, kafai.wan, bpf, netdev, linux-kernel,
	linux-kselftest

This small patchset is about avoid infinite recursion in TCP header option
callbacks via TCP_NODELAY setsockopt.

v2:
 - Reject TCP_NODELAY in bpf_sock_ops_setsockopt() (AI and Martin)

v1:
 https://lore.kernel.org/bpf/20260414112310.1285783-1-kafai.wan@linux.dev/

---
KaFai Wan (2):
  bpf: Reject TCP_NODELAY in TCP header option callbacks
  selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks

 net/core/filter.c                             |  5 ++
 .../bpf/prog_tests/tcp_hdr_options.c          | 54 +++++++++++++++++++
 .../bpf/progs/test_misc_tcp_hdr_options.c     | 40 ++++++++++++++
 3 files changed, 99 insertions(+)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH bpf v2 1/2] bpf: Reject TCP_NODELAY in TCP header option callbacks
  2026-04-16 11:23 [PATCH bpf v2 0/2] bpf: Reject TCP_NODELAY in TCP header option callbacks KaFai Wan
@ 2026-04-16 11:23 ` KaFai Wan
  2026-04-16 17:35   ` Martin KaFai Lau
  2026-04-17  2:43   ` Jiayuan Chen
  2026-04-16 11:23 ` [PATCH bpf v2 2/2] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks KaFai Wan
  1 sibling, 2 replies; 9+ messages in thread
From: KaFai Wan @ 2026-04-16 11:23 UTC (permalink / raw)
  To: martin.lau, daniel, john.fastabend, sdf, ast, andrii, eddyz87,
	memxor, song, yonghong.song, jolsa, davem, edumazet, kuba, pabeni,
	horms, shuah, jiayuan.chen, kafai.wan, bpf, netdev, linux-kernel,
	linux-kselftest
  Cc: Quan Sun, Yinhao Hu, Kaiyan Mei

A BPF_SOCK_OPS program can enable
BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB or
BPF_SOCK_OPS_WRITE_HDR_OPT_CB.

In these callbacks, bpf_setsockopt(TCP_NODELAY) can reach
__tcp_sock_set_nodelay(), which can call tcp_push_pending_frames().

From BPF_SOCK_OPS_HDR_OPT_LEN_CB, tcp_push_pending_frames() can call
tcp_current_mss(), which calls tcp_established_options() and re-enters
bpf_skops_hdr_opt_len().

BPF_SOCK_OPS_HDR_OPT_LEN_CB
  -> bpf_setsockopt(TCP_NODELAY)
    -> tcp_push_pending_frames()
      -> tcp_current_mss()
        -> tcp_established_options()
          -> bpf_skops_hdr_opt_len()
            -> BPF_SOCK_OPS_HDR_OPT_LEN_CB

From BPF_SOCK_OPS_WRITE_HDR_OPT_CB, tcp_push_pending_frames() can call
tcp_write_xmit(), which calls tcp_transmit_skb().  That path recomputes
header option length through tcp_established_options() and
bpf_skops_hdr_opt_len() before re-entering bpf_skops_write_hdr_opt().

BPF_SOCK_OPS_WRITE_HDR_OPT_CB
  -> bpf_setsockopt(TCP_NODELAY)
    -> tcp_push_pending_frames()
      -> tcp_write_xmit()
        -> tcp_transmit_skb()
          -> tcp_established_options()
            -> bpf_skops_hdr_opt_len()
          -> bpf_skops_write_hdr_opt()
            -> BPF_SOCK_OPS_WRITE_HDR_OPT_CB

This leads to unbounded recursion and can overflow the kernel stack.

Reject TCP_NODELAY with -EOPNOTSUPP in bpf_sock_ops_setsockopt()
when bpf_setsockopt() is called from
BPF_SOCK_OPS_HDR_OPT_LEN_CB or BPF_SOCK_OPS_WRITE_HDR_OPT_CB.

Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
 net/core/filter.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index fcfcb72663ca..911ff04bca5a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	if (!is_locked_tcp_sock_ops(bpf_sock))
 		return -EOPNOTSUPP;
 
+	if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
+	     bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
+	    IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == TCP_NODELAY)
+		return -EOPNOTSUPP;
+
 	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH bpf v2 2/2] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
  2026-04-16 11:23 [PATCH bpf v2 0/2] bpf: Reject TCP_NODELAY in TCP header option callbacks KaFai Wan
  2026-04-16 11:23 ` [PATCH bpf v2 1/2] " KaFai Wan
@ 2026-04-16 11:23 ` KaFai Wan
  2026-04-16 19:06   ` Martin KaFai Lau
  1 sibling, 1 reply; 9+ messages in thread
From: KaFai Wan @ 2026-04-16 11:23 UTC (permalink / raw)
  To: martin.lau, daniel, john.fastabend, sdf, ast, andrii, eddyz87,
	memxor, song, yonghong.song, jolsa, davem, edumazet, kuba, pabeni,
	horms, shuah, jiayuan.chen, kafai.wan, bpf, netdev, linux-kernel,
	linux-kselftest

Add a sockops selftest for the TCP_NODELAY restriction in
BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB.

The test program calls bpf_setsockopt(TCP_NODELAY) from
ACTIVE_ESTABLISHED_CB and PASSIVE_ESTABLISHED_CB to verify that it is
still allowed outside the TCP header option callbacks.

It then enables BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG, sends data to
exercise the TCP header option path, and checks that
bpf_setsockopt(TCP_NODELAY) returns -EOPNOTSUPP from both
BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB while the
connection continues to make forward progress.

Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
 .../bpf/prog_tests/tcp_hdr_options.c          | 54 +++++++++++++++++++
 .../bpf/progs/test_misc_tcp_hdr_options.c     | 40 ++++++++++++++
 2 files changed, 94 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
index 56685fc03c7e..2d738c0c4259 100644
--- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
@@ -513,6 +513,59 @@ static void misc(void)
 	bpf_link__destroy(link);
 }
 
+static void hdr_sockopt(void)
+{
+	const char send_msg[] = "MISC!!!";
+	char recv_msg[sizeof(send_msg)];
+	const unsigned int nr_data = 2;
+	struct bpf_link *link;
+	struct sk_fds sk_fds;
+	int i, ret, true_val = 1;
+
+	lport_linum_map_fd = bpf_map__fd(misc_skel->maps.lport_linum_map);
+
+	link = bpf_program__attach_cgroup(misc_skel->progs.misc_hdr_sockopt, cg_fd);
+	if (!ASSERT_OK_PTR(link, "attach_cgroup(misc_hdr_sockopt)"))
+		return;
+
+	if (sk_fds_connect(&sk_fds, false)) {
+		bpf_link__destroy(link);
+		return;
+	}
+
+	ret = setsockopt(sk_fds.active_fd, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+	if (!ASSERT_OK(ret, "setsockopt(TCP_NODELAY) active"))
+		goto check_linum;
+
+	ret = setsockopt(sk_fds.passive_fd, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+	if (!ASSERT_OK(ret, "setsockopt(TCP_NODELAY) passive"))
+		goto check_linum;
+
+	for (i = 0; i < nr_data; i++) {
+		ret = send(sk_fds.active_fd, send_msg, sizeof(send_msg), 0);
+		if (!ASSERT_EQ(ret, sizeof(send_msg), "send(msg)"))
+			goto check_linum;
+
+		ret = read(sk_fds.passive_fd, recv_msg, sizeof(recv_msg));
+		if (!ASSERT_EQ(ret, sizeof(send_msg), "read(msg)"))
+			goto check_linum;
+	}
+
+	ASSERT_NEQ(misc_skel->bss->nr_hdr_sockopt_estab, 0, "nr_hdr_sockopt_estab");
+	ASSERT_EQ(misc_skel->bss->nr_hdr_sockopt_estab_err, 0, "nr_hdr_sockopt_estab_err");
+
+	ASSERT_NEQ(misc_skel->bss->nr_hdr_sockopt_len, 0, "nr_hdr_sockopt_len");
+	ASSERT_EQ(misc_skel->bss->nr_hdr_sockopt_len_err, 0, "nr_hdr_sockopt_len_err");
+
+	ASSERT_NEQ(misc_skel->bss->nr_hdr_sockopt_write, 0, "nr_hdr_sockopt_write");
+	ASSERT_EQ(misc_skel->bss->nr_hdr_sockopt_write_err, 0, "nr_hdr_sockopt_write_err");
+
+check_linum:
+	ASSERT_FALSE(check_error_linum(&sk_fds), "check_error_linum");
+	sk_fds_close(&sk_fds);
+	bpf_link__destroy(link);
+}
+
 struct test {
 	const char *desc;
 	void (*run)(void);
@@ -526,6 +579,7 @@ static struct test tests[] = {
 	DEF_TEST(fastopen_estab),
 	DEF_TEST(fin),
 	DEF_TEST(misc),
+	DEF_TEST(hdr_sockopt),
 };
 
 void test_tcp_hdr_options(void)
diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
index d487153a839d..a8cf7c4e7ed2 100644
--- a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
@@ -28,6 +28,12 @@ unsigned int nr_data = 0;
 unsigned int nr_syn = 0;
 unsigned int nr_fin = 0;
 unsigned int nr_hwtstamp = 0;
+unsigned int nr_hdr_sockopt_estab = 0;
+unsigned int nr_hdr_sockopt_estab_err = 0;
+unsigned int nr_hdr_sockopt_len = 0;
+unsigned int nr_hdr_sockopt_len_err = 0;
+unsigned int nr_hdr_sockopt_write = 0;
+unsigned int nr_hdr_sockopt_write_err = 0;
 
 /* Check the header received from the active side */
 static int __check_active_hdr_in(struct bpf_sock_ops *skops, bool check_syn)
@@ -326,4 +332,38 @@ int misc_estab(struct bpf_sock_ops *skops)
 	return CG_OK;
 }
 
+SEC("sockops")
+int misc_hdr_sockopt(struct bpf_sock_ops *skops)
+{
+	int true_val = 1;
+	int ret;
+
+	switch (skops->op) {
+	case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
+	case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+		nr_hdr_sockopt_estab++;
+		set_hdr_cb_flags(skops, 0);
+		ret = bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+		if (ret)
+			nr_hdr_sockopt_estab_err++;
+		break;
+	case BPF_SOCK_OPS_HDR_OPT_LEN_CB:
+		nr_hdr_sockopt_len++;
+		ret = bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+		if (ret != -EOPNOTSUPP)
+			nr_hdr_sockopt_len_err++;
+		/* just trigger BPF_SOCK_OPS_WRITE_HDR_OPT_CB */
+		bpf_reserve_hdr_opt(skops, 12, 0);
+		break;
+	case BPF_SOCK_OPS_WRITE_HDR_OPT_CB:
+		nr_hdr_sockopt_write++;
+		ret = bpf_setsockopt(skops, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
+		if (ret != -EOPNOTSUPP)
+			nr_hdr_sockopt_write_err++;
+		break;
+	}
+
+	return CG_OK;
+}
+
 char _license[] SEC("license") = "GPL";
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf v2 1/2] bpf: Reject TCP_NODELAY in TCP header option callbacks
  2026-04-16 11:23 ` [PATCH bpf v2 1/2] " KaFai Wan
@ 2026-04-16 17:35   ` Martin KaFai Lau
  2026-04-17  1:35     ` KaFai Wan
  2026-04-17  2:43   ` Jiayuan Chen
  1 sibling, 1 reply; 9+ messages in thread
From: Martin KaFai Lau @ 2026-04-16 17:35 UTC (permalink / raw)
  To: KaFai Wan
  Cc: daniel, john.fastabend, sdf, ast, andrii, eddyz87, memxor, song,
	yonghong.song, jolsa, davem, edumazet, kuba, pabeni, horms, shuah,
	jiayuan.chen, bpf, netdev, linux-kernel, linux-kselftest,
	Quan Sun, Yinhao Hu, Kaiyan Mei

On Thu, Apr 16, 2026 at 07:23:07PM +0800, KaFai Wan wrote:
> diff --git a/net/core/filter.c b/net/core/filter.c
> index fcfcb72663ca..911ff04bca5a 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
>  	if (!is_locked_tcp_sock_ops(bpf_sock))
>  		return -EOPNOTSUPP;
>  
> +	if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
> +	     bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
> +	    IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == TCP_NODELAY)

IS_ENABLED(CONFIG_INET) is unnecessary.

pw-bot: cr

> +		return -EOPNOTSUPP;
> +
>  	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
>  }
>  
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf v2 2/2] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
  2026-04-16 11:23 ` [PATCH bpf v2 2/2] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks KaFai Wan
@ 2026-04-16 19:06   ` Martin KaFai Lau
  2026-04-17  3:07     ` KaFai Wan
  0 siblings, 1 reply; 9+ messages in thread
From: Martin KaFai Lau @ 2026-04-16 19:06 UTC (permalink / raw)
  To: KaFai Wan
  Cc: daniel, john.fastabend, sdf, ast, andrii, eddyz87, memxor, song,
	yonghong.song, jolsa, davem, edumazet, kuba, pabeni, horms, shuah,
	jiayuan.chen, bpf, netdev, linux-kernel, linux-kselftest

On Thu, Apr 16, 2026 at 07:23:08PM +0800, KaFai Wan wrote:
> index 56685fc03c7e..2d738c0c4259 100644
> --- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
> +++ b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
> @@ -513,6 +513,59 @@ static void misc(void)
>  	bpf_link__destroy(link);
>  }
>  
> +static void hdr_sockopt(void)
> +{
> +	const char send_msg[] = "MISC!!!";
> +	char recv_msg[sizeof(send_msg)];
> +	const unsigned int nr_data = 2;
> +	struct bpf_link *link;
> +	struct sk_fds sk_fds;
> +	int i, ret, true_val = 1;
> +
> +	lport_linum_map_fd = bpf_map__fd(misc_skel->maps.lport_linum_map);
> +
> +	link = bpf_program__attach_cgroup(misc_skel->progs.misc_hdr_sockopt, cg_fd);
> +	if (!ASSERT_OK_PTR(link, "attach_cgroup(misc_hdr_sockopt)"))
> +		return;
> +
> +	if (sk_fds_connect(&sk_fds, false)) {
> +		bpf_link__destroy(link);
> +		return;
> +	}
> +
> +	ret = setsockopt(sk_fds.active_fd, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
> +	if (!ASSERT_OK(ret, "setsockopt(TCP_NODELAY) active"))
> +		goto check_linum;
> +
> +	ret = setsockopt(sk_fds.passive_fd, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));

Why are these two setsockopt(TCP_NODELAY) calls needed?

Instead of creating a new "void hdr_sockopt(void)", can the test be done in the
existing "void misc(void)" by doing bpf_setsockopt(TCP_NODELAY) in the
misc_estab() bpf prog?

The PASSIVE_ESTABLISHED_CB can do the bpf_setsockopt(TCP_NODELAY, 0)
if it wants to keep the same expectation on Nagle. The
BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB
can do bpf_setsockopt(TCP_NODELAY, 1) to test recursion and
the error return value.

>  void test_tcp_hdr_options(void)
> diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
> index d487153a839d..a8cf7c4e7ed2 100644
> --- a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
> +++ b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
> @@ -28,6 +28,12 @@ unsigned int nr_data = 0;
>  unsigned int nr_syn = 0;
>  unsigned int nr_fin = 0;
>  unsigned int nr_hwtstamp = 0;
> +unsigned int nr_hdr_sockopt_estab = 0;
> +unsigned int nr_hdr_sockopt_estab_err = 0;
> +unsigned int nr_hdr_sockopt_len = 0;
> +unsigned int nr_hdr_sockopt_len_err = 0;
> +unsigned int nr_hdr_sockopt_write = 0;
> +unsigned int nr_hdr_sockopt_write_err = 0;

nr_hdr_sockopt_estab, nr_hdr_sockopt_len, and nr_hdr_sockopt_write
are unnecessary. These tests have already been covered in some ways.

Mostly a nit. The new counters are used in both connections. Note the
existing nr_xxx is exclusively used in either active or passive,
so there is no parallel counting in practice.

Instead of counting, just use a bool nodelay_est_ok,
nodelay_hdr_len_err, nodelay_write_err and assert them
to be true in userspace.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf v2 1/2] bpf: Reject TCP_NODELAY in TCP header option callbacks
  2026-04-16 17:35   ` Martin KaFai Lau
@ 2026-04-17  1:35     ` KaFai Wan
  0 siblings, 0 replies; 9+ messages in thread
From: KaFai Wan @ 2026-04-17  1:35 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: daniel, john.fastabend, sdf, ast, andrii, eddyz87, memxor, song,
	yonghong.song, jolsa, davem, edumazet, kuba, pabeni, horms, shuah,
	jiayuan.chen, bpf, netdev, linux-kernel, linux-kselftest,
	Quan Sun, Yinhao Hu, Kaiyan Mei

On Thu, 2026-04-16 at 10:35 -0700, Martin KaFai Lau wrote:
> On Thu, Apr 16, 2026 at 07:23:07PM +0800, KaFai Wan wrote:
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index fcfcb72663ca..911ff04bca5a 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
> >  	if (!is_locked_tcp_sock_ops(bpf_sock))
> >  		return -EOPNOTSUPP;
> >  
> > +	if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
> > +	     bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
> > +	    IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == TCP_NODELAY)
> 
> IS_ENABLED(CONFIG_INET) is unnecessary.

ok, will remove in next version.
> 
> pw-bot: cr
> 
> > +		return -EOPNOTSUPP;
> > +
> >  	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
> >  }
> >  
> > -- 
> > 2.43.0
> > 

-- 
Thanks,
KaFai

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf v2 1/2] bpf: Reject TCP_NODELAY in TCP header option callbacks
  2026-04-16 11:23 ` [PATCH bpf v2 1/2] " KaFai Wan
  2026-04-16 17:35   ` Martin KaFai Lau
@ 2026-04-17  2:43   ` Jiayuan Chen
  2026-04-17  9:27     ` KaFai Wan
  1 sibling, 1 reply; 9+ messages in thread
From: Jiayuan Chen @ 2026-04-17  2:43 UTC (permalink / raw)
  To: KaFai Wan, martin.lau, daniel, john.fastabend, sdf, ast, andrii,
	eddyz87, memxor, song, yonghong.song, jolsa, davem, edumazet,
	kuba, pabeni, horms, shuah, jiayuan.chen, bpf, netdev,
	linux-kernel, linux-kselftest
  Cc: Quan Sun, Yinhao Hu, Kaiyan Mei


On 4/16/26 7:23 PM, KaFai Wan wrote:
> A BPF_SOCK_OPS program can enable
> BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
> bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB or
> BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
>
> In these callbacks, bpf_setsockopt(TCP_NODELAY) can reach
> __tcp_sock_set_nodelay(), which can call tcp_push_pending_frames().
>
>  From BPF_SOCK_OPS_HDR_OPT_LEN_CB, tcp_push_pending_frames() can call
> tcp_current_mss(), which calls tcp_established_options() and re-enters
> bpf_skops_hdr_opt_len().
>
> BPF_SOCK_OPS_HDR_OPT_LEN_CB
>    -> bpf_setsockopt(TCP_NODELAY)
>      -> tcp_push_pending_frames()
>        -> tcp_current_mss()
>          -> tcp_established_options()
>            -> bpf_skops_hdr_opt_len()
>              -> BPF_SOCK_OPS_HDR_OPT_LEN_CB
>
>  From BPF_SOCK_OPS_WRITE_HDR_OPT_CB, tcp_push_pending_frames() can call
> tcp_write_xmit(), which calls tcp_transmit_skb().  That path recomputes
> header option length through tcp_established_options() and
> bpf_skops_hdr_opt_len() before re-entering bpf_skops_write_hdr_opt().
>
> BPF_SOCK_OPS_WRITE_HDR_OPT_CB
>    -> bpf_setsockopt(TCP_NODELAY)
>      -> tcp_push_pending_frames()
>        -> tcp_write_xmit()
>          -> tcp_transmit_skb()
>            -> tcp_established_options()
>              -> bpf_skops_hdr_opt_len()
>            -> bpf_skops_write_hdr_opt()
>              -> BPF_SOCK_OPS_WRITE_HDR_OPT_CB
>
> This leads to unbounded recursion and can overflow the kernel stack.
>
> Reject TCP_NODELAY with -EOPNOTSUPP in bpf_sock_ops_setsockopt()
> when bpf_setsockopt() is called from
> BPF_SOCK_OPS_HDR_OPT_LEN_CB or BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
>
> Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
> Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
> Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
> ---
>   net/core/filter.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index fcfcb72663ca..911ff04bca5a 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
>   	if (!is_locked_tcp_sock_ops(bpf_sock))
>   		return -EOPNOTSUPP;
>   
> +	if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
> +	     bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
> +	    IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == TCP_NODELAY)
> +		return -EOPNOTSUPP;
> +
>   	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
>   }
>   

A simple comment is recommended:

/* TCP_NODELAY triggers tcp_push_pending_frames() and re-enters these 
callbacks. */


Also like Martin pointed before, BPF_SOCK_OPS_HDR_OPT_LEN_CB / 
BPF_SOCK_OPS_WRITE_HDR_OPT_CB

can only be produced under CONFIG_INET so IS_ENABLED(CONFIG_INET) is dead.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf v2 2/2] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
  2026-04-16 19:06   ` Martin KaFai Lau
@ 2026-04-17  3:07     ` KaFai Wan
  0 siblings, 0 replies; 9+ messages in thread
From: KaFai Wan @ 2026-04-17  3:07 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: daniel, john.fastabend, sdf, ast, andrii, eddyz87, memxor, song,
	yonghong.song, jolsa, davem, edumazet, kuba, pabeni, horms, shuah,
	jiayuan.chen, bpf, netdev, linux-kernel, linux-kselftest

On Thu, 2026-04-16 at 12:06 -0700, Martin KaFai Lau wrote:
> On Thu, Apr 16, 2026 at 07:23:08PM +0800, KaFai Wan wrote:
> > index 56685fc03c7e..2d738c0c4259 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
> > @@ -513,6 +513,59 @@ static void misc(void)
> >  	bpf_link__destroy(link);
> >  }
> >  
> > +static void hdr_sockopt(void)
> > +{
> > +	const char send_msg[] = "MISC!!!";
> > +	char recv_msg[sizeof(send_msg)];
> > +	const unsigned int nr_data = 2;
> > +	struct bpf_link *link;
> > +	struct sk_fds sk_fds;
> > +	int i, ret, true_val = 1;
> > +
> > +	lport_linum_map_fd = bpf_map__fd(misc_skel->maps.lport_linum_map);
> > +
> > +	link = bpf_program__attach_cgroup(misc_skel->progs.misc_hdr_sockopt, cg_fd);
> > +	if (!ASSERT_OK_PTR(link, "attach_cgroup(misc_hdr_sockopt)"))
> > +		return;
> > +
> > +	if (sk_fds_connect(&sk_fds, false)) {
> > +		bpf_link__destroy(link);
> > +		return;
> > +	}
> > +
> > +	ret = setsockopt(sk_fds.active_fd, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
> > +	if (!ASSERT_OK(ret, "setsockopt(TCP_NODELAY) active"))
> > +		goto check_linum;
> > +
> > +	ret = setsockopt(sk_fds.passive_fd, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
> 
> Why are these two setsockopt(TCP_NODELAY) calls needed?
> 
> Instead of creating a new "void hdr_sockopt(void)", can the test be done in the
> existing "void misc(void)" by doing bpf_setsockopt(TCP_NODELAY) in the
> misc_estab() bpf prog?

Oh, I see. I meant to test on both active and passive side. We can only test on active side in the
existing "void misc(void)".
> 
> The PASSIVE_ESTABLISHED_CB can do the bpf_setsockopt(TCP_NODELAY, 0)
> if it wants to keep the same expectation on Nagle. The
> BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB
> can do bpf_setsockopt(TCP_NODELAY, 1) to test recursion and
> the error return value.
> 
> >  void test_tcp_hdr_options(void)
> > diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
> > b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
> > index d487153a839d..a8cf7c4e7ed2 100644
> > --- a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
> > +++ b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
> > @@ -28,6 +28,12 @@ unsigned int nr_data = 0;
> >  unsigned int nr_syn = 0;
> >  unsigned int nr_fin = 0;
> >  unsigned int nr_hwtstamp = 0;
> > +unsigned int nr_hdr_sockopt_estab = 0;
> > +unsigned int nr_hdr_sockopt_estab_err = 0;
> > +unsigned int nr_hdr_sockopt_len = 0;
> > +unsigned int nr_hdr_sockopt_len_err = 0;
> > +unsigned int nr_hdr_sockopt_write = 0;
> > +unsigned int nr_hdr_sockopt_write_err = 0;
> 
> nr_hdr_sockopt_estab, nr_hdr_sockopt_len, and nr_hdr_sockopt_write
> are unnecessary. These tests have already been covered in some ways.

yes, they are unnecessary in existing misc_estab()
> 
> Mostly a nit. The new counters are used in both connections. Note the
> existing nr_xxx is exclusively used in either active or passive,
> so there is no parallel counting in practice.
> 
> Instead of counting, just use a bool nodelay_est_ok,
> nodelay_hdr_len_err, nodelay_write_err and assert them
> to be true in userspace.

indeed. will fix these in next version.

-- 
Thanks,
KaFai

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf v2 1/2] bpf: Reject TCP_NODELAY in TCP header option callbacks
  2026-04-17  2:43   ` Jiayuan Chen
@ 2026-04-17  9:27     ` KaFai Wan
  0 siblings, 0 replies; 9+ messages in thread
From: KaFai Wan @ 2026-04-17  9:27 UTC (permalink / raw)
  To: Jiayuan Chen, martin.lau, daniel, john.fastabend, sdf, ast,
	andrii, eddyz87, memxor, song, yonghong.song, jolsa, davem,
	edumazet, kuba, pabeni, horms, shuah, bpf, netdev, linux-kernel,
	linux-kselftest
  Cc: Quan Sun, Yinhao Hu, Kaiyan Mei

On Fri, 2026-04-17 at 10:43 +0800, Jiayuan Chen wrote:
> 
> On 4/16/26 7:23 PM, KaFai Wan wrote:
> > A BPF_SOCK_OPS program can enable
> > BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
> > bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB or
> > BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
> > 
> > In these callbacks, bpf_setsockopt(TCP_NODELAY) can reach
> > __tcp_sock_set_nodelay(), which can call tcp_push_pending_frames().
> > 
> >  From BPF_SOCK_OPS_HDR_OPT_LEN_CB, tcp_push_pending_frames() can call
> > tcp_current_mss(), which calls tcp_established_options() and re-enters
> > bpf_skops_hdr_opt_len().
> > 
> > BPF_SOCK_OPS_HDR_OPT_LEN_CB
> >    -> bpf_setsockopt(TCP_NODELAY)
> >      -> tcp_push_pending_frames()
> >        -> tcp_current_mss()
> >          -> tcp_established_options()
> >            -> bpf_skops_hdr_opt_len()
> >              -> BPF_SOCK_OPS_HDR_OPT_LEN_CB
> > 
> >  From BPF_SOCK_OPS_WRITE_HDR_OPT_CB, tcp_push_pending_frames() can call
> > tcp_write_xmit(), which calls tcp_transmit_skb().  That path recomputes
> > header option length through tcp_established_options() and
> > bpf_skops_hdr_opt_len() before re-entering bpf_skops_write_hdr_opt().
> > 
> > BPF_SOCK_OPS_WRITE_HDR_OPT_CB
> >    -> bpf_setsockopt(TCP_NODELAY)
> >      -> tcp_push_pending_frames()
> >        -> tcp_write_xmit()
> >          -> tcp_transmit_skb()
> >            -> tcp_established_options()
> >              -> bpf_skops_hdr_opt_len()
> >            -> bpf_skops_write_hdr_opt()
> >              -> BPF_SOCK_OPS_WRITE_HDR_OPT_CB
> > 
> > This leads to unbounded recursion and can overflow the kernel stack.
> > 
> > Reject TCP_NODELAY with -EOPNOTSUPP in bpf_sock_ops_setsockopt()
> > when bpf_setsockopt() is called from
> > BPF_SOCK_OPS_HDR_OPT_LEN_CB or BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
> > 
> > Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> > Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> > Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> > Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
> > Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
> > Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
> > ---
> >   net/core/filter.c | 5 +++++
> >   1 file changed, 5 insertions(+)
> > 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index fcfcb72663ca..911ff04bca5a 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
> >   	if (!is_locked_tcp_sock_ops(bpf_sock))
> >   		return -EOPNOTSUPP;
> >   
> > +	if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
> > +	     bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
> > +	    IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == TCP_NODELAY)
> > +		return -EOPNOTSUPP;
> > +
> >   	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
> >   }
> >   
> 
> A simple comment is recommended:
> 
> /* TCP_NODELAY triggers tcp_push_pending_frames() and re-enters these 
> callbacks. */
> 
ok. Added in new version.
> 
> Also like Martin pointed before, BPF_SOCK_OPS_HDR_OPT_LEN_CB / 
> BPF_SOCK_OPS_WRITE_HDR_OPT_CB
> 
> can only be produced under CONFIG_INET so IS_ENABLED(CONFIG_INET) is dead.
> 

-- 
Thanks,
KaFai

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-04-17  9:27 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-16 11:23 [PATCH bpf v2 0/2] bpf: Reject TCP_NODELAY in TCP header option callbacks KaFai Wan
2026-04-16 11:23 ` [PATCH bpf v2 1/2] " KaFai Wan
2026-04-16 17:35   ` Martin KaFai Lau
2026-04-17  1:35     ` KaFai Wan
2026-04-17  2:43   ` Jiayuan Chen
2026-04-17  9:27     ` KaFai Wan
2026-04-16 11:23 ` [PATCH bpf v2 2/2] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks KaFai Wan
2026-04-16 19:06   ` Martin KaFai Lau
2026-04-17  3:07     ` KaFai Wan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox