From: Martin KaFai Lau <martin.lau@linux.dev>
To: Kuniyuki Iwashima <kuniyu@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Yonghong Song <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
Stanislav Fomichev <sdf@fomichev.me>,
Eric Dumazet <edumazet@google.com>,
Neal Cardwell <ncardwell@google.com>,
Willem de Bruijn <willemb@google.com>,
Tenzin Ukyab <ukyab@berkeley.edu>,
Kuniyuki Iwashima <kuni1840@gmail.com>,
bpf@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH v3 bpf-next 10/11] bpf: tcp: Add SOCK_OPS rcvlowat hook.
Date: Tue, 26 May 2026 13:47:36 -0700
Message-ID: <202652620435.m7WK.martin.lau@linux.dev>
In-Reply-To: <20260523083001.2911931-11-kuniyu@google.com>

On Sat, May 23, 2026 at 08:29:39AM +0000, Kuniyuki Iwashima wrote:
> Now, it is time to add the new hooks for BPF_SOCK_OPS_RCVQ_CB.
>
> Let's invoke the BPF SOCK_OPS prog when
>
> 1. TCP stack enqueues skb to sk->sk_receive_queue
> -> tcp_queue_rcv(), tcp_ofo_queue(), and tcp_fastopen_add_skb()
>
> 2. TCP recvmsg() completes
> -> __tcp_cleanup_rbuf()
>
> This will allow the BPF prog to parse each skb and dynamically
> adjust sk->sk_rcvlowat to suppress unnecessary EPOLLIN wakeups
> until sufficient data (e.g., a full RPC frame) is available
> in the receive queue.
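
To make the "full RPC frame" idea concrete, here is a minimal userspace sketch of the low-water-mark computation. It is not code from this series: the wire format (a 4-byte big-endian payload length followed by the payload), the RPC_HDR_LEN constant, and rpc_frame_lowat() are all assumptions made for illustration, and a real BPF prog would read the header from the skb and apply the result through the kfunc added earlier in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format, assumed only for this sketch: each RPC frame is
 * a 4-byte big-endian payload length followed by the payload itself.
 */
#define RPC_HDR_LEN 4u

/* Return the low-water mark to request while the frame at the head of the
 * queue is incomplete. buf/queued model the unread bytes in the receive
 * queue; this is plain userspace C, not kernel or BPF code.
 */
static uint32_t rpc_frame_lowat(const uint8_t *buf, size_t queued)
{
	uint32_t payload_len;

	assert(buf != NULL || queued == 0);

	/* Header still incomplete: wake the reader once it has fully arrived. */
	if (queued < RPC_HDR_LEN)
		return RPC_HDR_LEN;

	payload_len = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
		      ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];

	/* Saturate instead of wrapping on an absurd length field. */
	if (payload_len > UINT32_MAX - RPC_HDR_LEN)
		return UINT32_MAX;

	/* Wake the reader only when header and payload are both queued. */
	return RPC_HDR_LEN + payload_len;
}
```

With a header announcing 256 bytes of payload, the sketch asks for 260 bytes before a wakeup; with only 2 bytes queued it asks for the 4-byte header first.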
>
> Note that the direct access to bpf_sock_ops.data is intentionally
> disabled by passing 0 as end_offset.
>
> Instead, the BPF prog is supposed to use bpf_skb_load_bytes()
> with bpf_sock_ops because the payload is not in the linear area
> when TCP header/data split is on, and the skb may contain an RPC
> descriptor in an skb frag. This also simplifies the BPF prog.
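
The point about non-linear payload can be illustrated with a small userspace model. Everything below is a hypothetical stand-in (struct model_skb, struct model_frag, model_load_bytes()), not the kernel's struct sk_buff or the real helper; it only mirrors the offset-based calling shape of bpf_skb_load_bytes() to show why a copy by offset works where a pointer into the linear area would not.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical model of a non-linear skb: a linear area
 * followed by fragments. This is not the kernel's struct sk_buff.
 */
struct model_frag {
	const uint8_t *data;
	size_t len;
};

struct model_skb {
	const uint8_t *linear;
	size_t linear_len;
	const struct model_frag *frags;
	size_t nr_frags;
};

/* Copy len bytes starting at offset into to, walking the linear area and
 * then each fragment in order. Returns 0 on success; on an out-of-range
 * request the destination is zeroed and -EFAULT is returned.
 */
static int model_load_bytes(const struct model_skb *skb, size_t offset,
			    void *to, size_t len)
{
	uint8_t *dst = to;
	size_t total = len;
	size_t i;

	assert(skb != NULL && to != NULL);

	/* Chunk 0 is the linear area; chunk i (i > 0) is frags[i - 1]. */
	for (i = 0; i <= skb->nr_frags && len > 0; i++) {
		const uint8_t *src = i ? skb->frags[i - 1].data : skb->linear;
		size_t src_len = i ? skb->frags[i - 1].len : skb->linear_len;
		size_t n;

		/* The requested range starts beyond this chunk: skip it. */
		if (offset >= src_len) {
			offset -= src_len;
			continue;
		}

		n = src_len - offset;
		if (n > len)
			n = len;
		memcpy(dst, src + offset, n);
		dst += n;
		len -= n;
		offset = 0;
	}

	if (len) {
		memset(to, 0, total);
		return -EFAULT;
	}
	return 0;
}
```

With header/data split, the linear area in this model would hold only headers, so any read of the RPC descriptor necessarily crosses into a fragment, which is the case the offset-based copy handles.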
>
> The placement of tcp_bpf_rcvlowat() in tcp_ofo_queue() and
> tcp_fastopen_add_skb() is chosen to provide the same snapshot
> as tcp_queue_rcv().
>
> For example, if tcp_bpf_rcvlowat() were called before updating
> TCP_SKB_CB(skb)->seq in tcp_fastopen_add_skb(), BPF prog would
> need to implement an unlikely if branch to strip SYN.
>
> In addition, the TCP stack can queue overlapping skbs into the
> receive queue. Once rcv_nxt is updated for a new skb, the BPF prog
> cannot infer the previous rcv_nxt from skb->len.
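
A worked example of the overlap arithmetic may help here. This is a userspace sketch under my own assumptions about what the prog needs (the helper names seq_before(), skb_overlap() and skb_new_bytes() are hypothetical); it shows that the count of new bytes depends on rcv_nxt as it was before the update, and is not simply the skb length.

```c
#include <assert.h>
#include <stdint.h>

/* Sequence-space comparison that tolerates 32-bit wraparound, written in the
 * style of the kernel's before() helper.
 */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Bytes at the front of [seq, end_seq) that rcv_nxt already covers. */
static uint32_t skb_overlap(uint32_t seq, uint32_t end_seq, uint32_t rcv_nxt)
{
	assert(!seq_before(end_seq, seq));

	if (!seq_before(seq, rcv_nxt))
		return 0;		/* starts at or after rcv_nxt */
	if (!seq_before(rcv_nxt, end_seq))
		return end_seq - seq;	/* entirely duplicate data */
	return rcv_nxt - seq;		/* partial overlap at the front */
}

/* Bytes in [seq, end_seq) that are new relative to rcv_nxt. */
static uint32_t skb_new_bytes(uint32_t seq, uint32_t end_seq, uint32_t rcv_nxt)
{
	return (end_seq - seq) - skb_overlap(seq, end_seq, rcv_nxt);
}
```

For an skb covering 1000..1500 with rcv_nxt at 1200, 200 bytes overlap and 300 are new. If the hook ran after rcv_nxt had already moved to 1500, the length of 500 alone could not recover that split.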
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> ---
> v2: Add explanation of tcp_bpf_rcvlowat() placement.
> ---
> include/net/tcp.h | 12 ++++++++++++
> net/ipv4/tcp.c | 2 ++
> net/ipv4/tcp_fastopen.c | 2 ++
> net/ipv4/tcp_input.c | 10 ++++++++++
> 4 files changed, 26 insertions(+)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index bc95d8e7b62e..a409f2ea710f 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -2889,12 +2889,24 @@ static inline void bpf_skops_init_skb(struct bpf_sock_ops_kern *skops,
> 	skops->skb = skb;
> 	skops->skb_data_end = skb->data + end_offset;
> }
> +
> +void bpf_skops_rcvlowat(struct sock *sk, struct sk_buff *skb);
> +
> +static inline void tcp_bpf_rcvlowat(struct sock *sk, struct sk_buff *skb)
> +{
> +	if (BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), BPF_SOCK_OPS_RCVQ_CB_FLAG))
> +		bpf_skops_rcvlowat(sk, skb);
> +}
> #else
> static inline void bpf_skops_init_skb(struct bpf_sock_ops_kern *skops,
> 				      struct sk_buff *skb,
> 				      unsigned int end_offset)
> {
> }
> +
> +static inline void tcp_bpf_rcvlowat(struct sock *sk, struct sk_buff *skb)
> +{
> +}
> #endif
>
> /* Call BPF_SOCK_OPS program that returns an int. If the return value
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 3afeb69a547a..f7e32891bb4e 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1602,6 +1602,8 @@ void __tcp_cleanup_rbuf(struct sock *sk, int copied)
> 		tcp_mstamp_refresh(tp);
> 		tcp_send_ack(sk);
> 	}
> +
> +	tcp_bpf_rcvlowat(sk, NULL);

Hmm... so a NULL skb is how the bpf prog tells which call site invoked it?
With a NULL skb, what does the bpf prog usually do?