From: KaFai Wan <kafai.wan@linux.dev>
To: ast@kernel.org, daniel@iogearbox.net, john.fastabend@gmail.com,
andrii@kernel.org, martin.lau@linux.dev, eddyz87@gmail.com,
memxor@gmail.com, song@kernel.org, yonghong.song@linux.dev,
jolsa@kernel.org, sdf@fomichev.me, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
horms@kernel.org, dsahern@kernel.org, shuah@kernel.org,
ihor.solodrai@linux.dev, kafai.wan@linux.dev,
jiayuan.chen@linux.dev, hoyeon.lee@suse.com, ameryhung@gmail.com,
bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: [PATCH bpf-next v4 2/4] bpf: Reject TCP_NODELAY in bpf-tcp-cc
Date: Tue, 21 Apr 2026 23:58:02 +0800 [thread overview]
Message-ID: <20260421155804.135786-3-kafai.wan@linux.dev> (raw)
In-Reply-To: <20260421155804.135786-1-kafai.wan@linux.dev>
A BPF TCP congestion control program can call bpf_setsockopt() from
its callbacks. In current kernels, if it calls
bpf_setsockopt(TCP_NODELAY) from cwnd_event_tx_start(), the call can
re-enter the TCP transmit path before the outer tcp_transmit_skb()
has completed and advanced the send head.
This can re-trigger CA_EVENT_TX_START and lead to unbounded recursion:
tcp_transmit_skb()
-> tcp_event_data_sent()
-> tcp_ca_event(sk, CA_EVENT_TX_START)
-> cwnd_event_tx_start()
-> bpf_setsockopt(TCP_NODELAY)
-> tcp_push_pending_frames()
-> tcp_write_xmit()
-> tcp_transmit_skb()
This leads to unbounded recursion and can overflow the kernel stack.
Reject TCP_NODELAY with -EOPNOTSUPP for bpf-tcp-cc by introducing
a dedicated setsockopt proto for BPF_PROG_TYPE_STRUCT_OPS TCP
congestion control programs.
Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
---
include/linux/bpf.h | 1 +
net/core/filter.c | 24 ++++++++++++++++++++++++
net/ipv4/bpf_tcp_ca.c | 2 +-
3 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3cb6b9e70080..cf75da8a12bd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3725,6 +3725,7 @@ extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
+extern const struct bpf_func_proto bpf_sk_setsockopt_nodelay_proto;
extern const struct bpf_func_proto bpf_unlocked_sk_setsockopt_proto;
extern const struct bpf_func_proto bpf_unlocked_sk_getsockopt_proto;
extern const struct bpf_func_proto bpf_find_vma_proto;
diff --git a/net/core/filter.c b/net/core/filter.c
index 96849f4c1fbc..1140f4b55ab5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5688,6 +5688,30 @@ const struct bpf_func_proto bpf_sk_getsockopt_proto = {
.arg5_type = ARG_CONST_SIZE,
};
+BPF_CALL_5(bpf_sk_setsockopt_nodelay, struct sock *, sk, int, level,
+ int, optname, char *, optval, int, optlen)
+{
+ /*
+ * TCP_NODELAY triggers tcp_push_pending_frames() and re-enters
+ * CA_EVENT_TX_START in bpf_tcp_cc, reject it in all bpf_tcp_cc.
+ */
+ if (level == SOL_TCP && optname == TCP_NODELAY)
+ return -EOPNOTSUPP;
+
+ return _bpf_setsockopt(sk, level, optname, optval, optlen);
+}
+
+const struct bpf_func_proto bpf_sk_setsockopt_nodelay_proto = {
+ .func = bpf_sk_setsockopt_nodelay,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+ .arg2_type = ARG_ANYTHING,
+ .arg3_type = ARG_ANYTHING,
+ .arg4_type = ARG_PTR_TO_MEM | MEM_RDONLY,
+ .arg5_type = ARG_CONST_SIZE,
+};
+
BPF_CALL_5(bpf_unlocked_sk_setsockopt, struct sock *, sk, int, level,
int, optname, char *, optval, int, optlen)
{
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 008edc7f6688..791e15063237 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -168,7 +168,7 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
*/
if (prog_ops_moff(prog) !=
offsetof(struct tcp_congestion_ops, release))
- return &bpf_sk_setsockopt_proto;
+ return &bpf_sk_setsockopt_nodelay_proto;
return NULL;
case BPF_FUNC_getsockopt:
/* Since get/setsockopt is usually expected to
--
2.43.0
next prev parent reply other threads:[~2026-04-21 15:59 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-21 15:58 [PATCH bpf-next v4 0/4] bpf: Reject TCP_NODELAY in TCP header option KaFai Wan
2026-04-21 15:58 ` [PATCH bpf-next v4 1/4] bpf: Reject TCP_NODELAY in TCP header option callbacks KaFai Wan
2026-04-21 16:51 ` bot+bpf-ci
2026-04-21 15:58 ` KaFai Wan [this message]
2026-04-21 15:58 ` [PATCH bpf-next v4 3/4] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks KaFai Wan
2026-04-21 15:58 ` [PATCH bpf-next v4 4/4] selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY KaFai Wan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260421155804.135786-3-kafai.wan@linux.dev \
--to=kafai.wan@linux.dev \
--cc=ameryhung@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=eddyz87@gmail.com \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=hoyeon.lee@suse.com \
--cc=ihor.solodrai@linux.dev \
--cc=jiayuan.chen@linux.dev \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=memxor@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf@fomichev.me \
--cc=shuah@kernel.org \
--cc=song@kernel.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox