From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: Kuniyuki Iwashima <kuniyu@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Yonghong Song <yonghong.song@linux.dev>,
	John Fastabend <john.fastabend@gmail.com>,
	Stanislav Fomichev <sdf@fomichev.me>,
	Eric Dumazet <edumazet@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	Willem de Bruijn <willemb@google.com>,
	Tenzin Ukyab <ukyab@berkeley.edu>,
	Kuniyuki Iwashima <kuni1840@gmail.com>,
	bpf@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH v1 bpf-next 7/8] bpf: tcp: Add SOCK_OPS rcvlowat hook.
Date: Fri, 8 May 2026 20:19:32 +0800
Message-ID: <9362bf10-9ede-4005-8e63-a18dafd7fab0@linux.dev>
In-Reply-To: <CAAVpQUBv0Uc4Xi-4wK2S63FqtXHHkLqJTotOxxDyhqFknoZG_Q@mail.gmail.com>


On 5/8/26 7:30 PM, Kuniyuki Iwashima wrote:
> On Fri, May 8, 2026 at 3:37 AM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>>
>> On 5/8/26 3:33 PM, Kuniyuki Iwashima wrote:
>>> Now, it is time to add the new hooks for BPF_SOCK_OPS_RCVLOWAT_CB.
>>>
>>> Let's invoke the BPF SOCK_OPS prog when
>>>
>>>     1. TCP stack enqueues skb to sk->sk_receive_queue
>>>        -> tcp_queue_rcv(), tcp_ofo_queue(), and tcp_fastopen_add_skb()
>>>
>>>     2. TCP recvmsg() completes
>>>        -> __tcp_cleanup_rbuf()
>>>
>>> This will allow the BPF prog to parse each skb and dynamically
>>> adjust sk->sk_rcvlowat to suppress unnecessary EPOLLIN wakeups
>>> until sufficient data (e.g., a full RPC frame) is available
>>> in the receive queue.
>>>
>>> Note that the direct access to bpf_sock_ops.data is intentionally
>>> disabled by passing 0 as end_offset.
>>>
>>> Instead, the BPF prog is supposed to use bpf_skb_load_bytes()
>>> with bpf_sock_ops because payload is not in the linear area
>>> with TCP header/data split on and skb may contain a RPC
>>> descriptor in skb frag.  This also simplifies the BPF prog.
>>>
>>> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
>>> ---
>>>    include/net/tcp.h       | 14 ++++++++++++++
>>>    net/ipv4/tcp.c          |  2 ++
>>>    net/ipv4/tcp_fastopen.c |  2 ++
>>>    net/ipv4/tcp_input.c    | 10 ++++++++++
>>>    4 files changed, 28 insertions(+)
>>>
>>> diff --git a/include/net/tcp.h b/include/net/tcp.h
>>> index 4e9e634e276b..003e46c9b500 100644
>>> --- a/include/net/tcp.h
>>> +++ b/include/net/tcp.h
>>> @@ -737,6 +737,20 @@ static inline struct request_sock *cookie_bpf_check(struct net *net, struct sock
>>>    }
>>>    #endif
>>>
>>> +#ifdef CONFIG_CGROUP_BPF
>>> +void bpf_skops_rcvlowat(struct sock *sk, struct sk_buff *skb);
>>> +
>>> +static inline void tcp_bpf_rcvlowat(struct sock *sk, struct sk_buff *skb)
>>> +{
>>> +     if (BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), BPF_SOCK_OPS_RCVLOWAT_CB_FLAG))
>>> +             bpf_skops_rcvlowat(sk, skb);
>>> +}
>>> +#else
>>> +static inline void tcp_bpf_rcvlowat(struct sock *sk, struct sk_buff *skb)
>>> +{
>>> +}
>>> +#endif
>>> +
>>>    /* From net/ipv6/syncookies.c */
>>>    int __cookie_v6_check(const struct ipv6hdr *iph, const struct tcphdr *th);
>>>    struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index 1d9e52fc454f..80144b97a87a 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -1602,6 +1602,8 @@ void __tcp_cleanup_rbuf(struct sock *sk, int copied)
>>>                tcp_mstamp_refresh(tp);
>>>                tcp_send_ack(sk);
>>>        }
>>> +
>>> +     tcp_bpf_rcvlowat(sk, NULL);
>>>    }
>>>
>> tcp_read_skb (process frame 1 and __skb_unlink)
>> └─ sk_psock_verdict_recv
>>       └─ sk_psock_verdict_apply
>>           └─ tcp_eat_skb
>>               └─ tcp_cleanup_rbuf
>>                   └─ __tcp_cleanup_rbuf
>>                       └─ BPF RCVLOWAT_CB
>>                           └─ bpf_sock_ops_tcp_set_rcvlowat (wakeup=true)
>>                               └─ tcp_data_ready
>>                                   └─ sk_psock_verdict_data_ready
>>                                       └─ tcp_read_skb (frame 2)
>>                                           └─ ... → tcp_read_skb (frame 3) ...
>>
>> For strparser, it uses read_sock instead of read_skb, so it will become
>> more complicated...
> To be clear, this feature is NOT to use strparser/sockmap.
>> I think this will cause a stack overflow when many skbs are in the receive
>> queue, or infinite recursion (not tested), for sockmap/kTLS/strparser.
>>
> BPF user is responsible for not doing silly things.
>
> tcp_bpf_strp_read_sock() can have loop detection logic,
> but it's only if really needed.


Similar infinite recursion problems for reference:
  https://lore.kernel.org/r/20220929070407.965581-5-martin.lau@linux.dev
  https://lore.kernel.org/bpf/20260421155804.135786-1-kafai.wan@linux.dev/

They were solved on the ops side, not on the TCP side.


Can we try to handle it on the BPF/OPS side first, and only
prevent it elsewhere if that is not feasible?
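
As a rough illustration of an ops-side guard, here is a minimal userspace
model of the call chain quoted above. It is only a sketch under stated
assumptions: every name in it (model_sock, model_read_skb, model_rcvlowat_cb,
model_set_rcvlowat, model_max_depth) is hypothetical and is not a kernel API,
and the guard is a plain re-entrancy flag, which may or may not match how the
referenced fixes were done. It compares the nesting depth of the read path
with and without the flag.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_sock {
	int queued_frames;  /* frames still sitting in the receive queue */
	bool guard_enabled; /* use the ops-side re-entrancy flag or not */
	bool in_cb;         /* set while the rcvlowat callback is running */
	int depth;          /* current nesting depth of the read path */
	int max_depth;      /* deepest nesting observed */
};

static void model_read_skb(struct model_sock *sk);

/* Stands in for the "set rcvlowat with wakeup" step: the wakeup reaches
 * data_ready, which re-enters the read path if a frame is still queued.
 */
static void model_set_rcvlowat(struct model_sock *sk)
{
	if (sk->queued_frames > 0)
		model_read_skb(sk);
}

/* Stands in for running the SOCK_OPS prog from the cleanup path. */
static void model_rcvlowat_cb(struct model_sock *sk)
{
	bool was_in_cb = sk->in_cb;

	/* Ops-side guard: refuse to run the callback from inside itself. */
	if (sk->guard_enabled && sk->in_cb)
		return;

	sk->in_cb = true;
	model_set_rcvlowat(sk);
	sk->in_cb = was_in_cb;
}

/* Stands in for read_skb: consume one frame, then run the cleanup hook. */
static void model_read_skb(struct model_sock *sk)
{
	sk->depth++;
	if (sk->depth > sk->max_depth)
		sk->max_depth = sk->depth;

	sk->queued_frames--;
	model_rcvlowat_cb(sk);

	sk->depth--;
}

/* Drain all frames; the outer loop plays the role of later wakeups. */
int model_max_depth(int frames, bool guard_enabled)
{
	struct model_sock sk = {
		.queued_frames = frames,
		.guard_enabled = guard_enabled,
	};

	while (sk.queued_frames > 0)
		model_read_skb(&sk);

	return sk.max_depth;
}
```

In this model, model_max_depth(n, false) nests once per queued frame, while
model_max_depth(n, true) stays at a depth of at most two regardless of n,
because the nested call returns before re-entering the callback and the
remaining frames are handled by the outer loop.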


Thanks

