From: KaFai Wan <kafai.wan@linux.dev>
To: Martin KaFai Lau <martin.lau@linux.dev>
Cc: sashiko@lists.linux.dev, bpf@vger.kernel.org
Subject: Re: [PATCH bpf v3 1/2] bpf: Reject TCP_NODELAY in TCP header option callbacks
Date: Tue, 21 Apr 2026 23:50:29 +0800
Message-ID: <b546fbe2c2e8bc3783b18944df01d2e15171ed70.camel@linux.dev>
In-Reply-To: <2026420174537.e2om.martin.lau@linux.dev>

On Mon, 2026-04-20 at 11:12 -0700, Martin KaFai Lau wrote:
> On Mon, Apr 20, 2026 at 09:41:06PM +0800, KaFai Wan wrote:
> > > Does this same recursion vulnerability exist for BPF TCP congestion control
> > > algorithms using BPF_PROG_TYPE_STRUCT_OPS?
> > > 
> > > If a BPF congestion control algorithm invokes bpf_setsockopt(TCP_NODELAY)
> > > from its cwnd_event callback when handling CA_EVENT_TX_START, could it
> > > trigger the same unbounded recursion?
> > > 
> > > When the kernel transmits the first packet of a data train via
> > > tcp_transmit_skb(), it invokes tcp_event_data_sent(). Because
> > > tp->packets_out is not incremented until later, tcp_packets_in_flight(tp)
> > > evaluates to 0, triggering tcp_ca_event(sk, CA_EVENT_TX_START).
> > > 
> > > If the BPF program then calls bpf_setsockopt(TCP_NODELAY), it would result
> > > in this call chain:
> > > 
> > > tcp_transmit_skb()
> > >   tcp_event_data_sent() -> invokes CA_EVENT_TX_START
> > >     cwnd_event()
> > >       bpf_setsockopt(TCP_NODELAY)
> > >         tcp_push_pending_frames()
> > >           tcp_write_xmit()
> > > 
> > > Since the outer tcp_transmit_skb() hasn't finished, the send head hasn't
> > > advanced. Wouldn't tcp_write_xmit() see the same SKB, attempt to transmit
> > > it again, and re-enter tcp_transmit_skb() causing an infinite recursion?
> > > 
> > You are right. I can reproduce this. 
> >  
> > > Should the restriction on TCP_NODELAY be enforced at a broader level, such
> > > as inside _bpf_setsockopt(), to protect contexts holding the socket lock
> > > during TX paths?
> > > 
> > We can check in sol_tcp_sockopt().
> 
> I don't know how it can use the socket lock to single out this case.
> All bpf programs that are allowed to call bpf_setsockopt should
> have the sock lock held. Maybe I am missing something obvious.

I tried to find a way to determine whether the sk is currently in the TX path
(inside tcp_transmit_skb()), but didn't succeed.

> 
> In bpf_tcp_ca_get_func_proto, it checks what ops can do bpf_sk_setsockopt_proto.
> Right now, it rejects the "release" ops. One option is to create a new
> func_proto, bpf_sk_setsockopt_nodelay_proto, to reject TCP_NODELAY.
> Instead of checking cwnd_event[_tx_start] in bpf_tcp_ca_get_func_proto,
> I would return bpf_sk_setsockopt_nodelay_proto for all ops. We can revisit
> and be more selective in the future if the hammer turns out to be too big.
> "release" ops will remain disallowed from calling bpf_setsockopt.

Great, I'll try this one. bpf_getsockopt(TCP_NODELAY) cannot trigger the
infinite recursion, so I will keep it as is.
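
To make the re-entry easier to reason about, here is a small userspace toy
model of the call chain discussed above. It is only a sketch, not kernel code:
every toy_* name is hypothetical, TOY_MAX_DEPTH is an artificial cap so the
model terminates where the real path would keep recursing, and the -EOPNOTSUPP
return value is an assumption for illustration rather than what the patch
will necessarily return.

```c
/* Userspace toy model only. All toy_* names are hypothetical. */
#include <errno.h>
#include <stdbool.h>

#define TOY_TCP_NODELAY 1 /* stand-in for TCP_NODELAY */
#define TOY_MAX_DEPTH   8 /* artificial cap; the real recursion is unbounded */

struct toy_sock {
	int send_head;       /* index of the next segment to send */
	int nr_segs;         /* number of queued segments */
	int depth;           /* current nesting of toy_transmit_skb() */
	int max_depth;       /* deepest nesting observed */
	bool reject_nodelay; /* models a setsockopt proto that rejects TCP_NODELAY */
	int last_err;        /* result of the last toy_setsockopt() call */
};

void toy_write_xmit(struct toy_sock *sk);

/* Models bpf_setsockopt(): TCP_NODELAY pushes pending frames unless rejected. */
int toy_setsockopt(struct toy_sock *sk, int optname)
{
	if (optname == TOY_TCP_NODELAY) {
		if (sk->reject_nodelay)
			return -EOPNOTSUPP; /* assumed error code */
		toy_write_xmit(sk); /* models tcp_push_pending_frames() */
	}
	return 0;
}

/* Models a BPF cwnd_event handler reacting to CA_EVENT_TX_START. */
void toy_cwnd_event(struct toy_sock *sk)
{
	sk->last_err = toy_setsockopt(sk, TOY_TCP_NODELAY);
}

/* Models tcp_transmit_skb(): the CA event fires before the send head moves. */
void toy_transmit_skb(struct toy_sock *sk)
{
	sk->depth++;
	if (sk->depth > sk->max_depth)
		sk->max_depth = sk->depth;
	if (sk->depth < TOY_MAX_DEPTH)
		toy_cwnd_event(sk);
	sk->depth--;
}

/* Models tcp_write_xmit(): the send head advances only after transmit returns. */
void toy_write_xmit(struct toy_sock *sk)
{
	while (sk->send_head < sk->nr_segs) {
		toy_transmit_skb(sk);
		sk->send_head++;
	}
}

/* Queue one segment, transmit it, and report the deepest nesting reached. */
int toy_run(bool reject_nodelay)
{
	struct toy_sock sk = { .nr_segs = 1, .reject_nodelay = reject_nodelay };

	toy_write_xmit(&sk);
	return sk.max_depth;
}
```

In this model, toy_run(false) nests until it hits the artificial cap, because
each nested toy_write_xmit() still sees the same unsent segment, while
toy_run(true) stays at a single level since the option is refused before any
frames are pushed.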

-- 
Thanks,
KaFai
