From: Martin KaFai Lau <martin.lau@linux.dev>
To: Kuniyuki Iwashima <kuniyu@google.com>
Cc: Jason Xing <kerneljasonxing@gmail.com>,
	 Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	 Andrii Nakryiko <andrii@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>,
	 Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Yonghong Song <yonghong.song@linux.dev>,
	 John Fastabend <john.fastabend@gmail.com>,
	Stanislav Fomichev <sdf@fomichev.me>,
	 Eric Dumazet <edumazet@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	 Willem de Bruijn <willemb@google.com>,
	Tenzin Ukyab <ukyab@berkeley.edu>,
	 Kuniyuki Iwashima <kuni1840@gmail.com>,
	bpf@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB.
Date: Tue, 26 May 2026 15:18:54 -0700
Message-ID: <2026526214538.Vhly.martin.lau@linux.dev>
In-Reply-To: <CAAVpQUA0UNhR8AWfd0WC-krYJeXb0ebNs-V4u9kgweyNw_XHtg@mail.gmail.com>

On Tue, May 26, 2026 at 02:21:56PM -0700, Kuniyuki Iwashima wrote:
> On Tue, May 26, 2026 at 1:34 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >
> > On Sat, May 23, 2026 at 08:29:32AM +0000, Kuniyuki Iwashima wrote:
> > > When a TCP skb is queued to sk->sk_receive_queue, BPF SOCK_OPS
> > > prog can be called with BPF_SOCK_OPS_RCVQ_CB.
> > >
> > > In this hook, we want to parse the RPC descriptor in the skb
> > > and adjust sk->sk_rcvlowat based on the RPC frame size.
> > >
> > > However, we cannot access payload via bpf_sock_ops.data on
> > > modern NICs with TCP header/data split on as the payload is
> > > not placed in the linear area.
> > >
> > > Let's support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB.
> > >
> > > Three notes:
> > >
> > >   1) bpf_sock_ops_kern.skb will be NULL when the BPF prog is
> > >       invoked from recvmsg().
> > >
> > >   2) Access to bpf_sock_ops.data will be disabled by passing
> > >       0 end_offset to bpf_skops_init_skb().
> > >
> > >   3) ____bpf_skb_load_bytes() is called directly instead of
> > >      __bpf_skb_load_bytes() to allow compilers to inline it
> > >      instead of generating a tail-call.
> >
> > Some observations below.
> >
> > >
> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> > > ---
> > > v2: Explain why using ____ version instead of __
> > > ---
> > >  net/core/filter.c | 34 ++++++++++++++++++++++++++++++++++
> > >  1 file changed, 34 insertions(+)
> > >
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index 4a50fe2cd863..fa8a7c7d86eb 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -7760,6 +7760,38 @@ static const struct bpf_func_proto bpf_sk_assign_proto = {
> > >       .arg3_type      = ARG_ANYTHING,
> > >  };
> > >
> > > +BPF_CALL_4(bpf_sock_ops_skb_load_bytes, struct bpf_sock_ops_kern *, bpf_sock,
> > > +        u32, offset, void *, to, u32, len)
> > > +{
> > > +     int err;
> > > +
> > > +     if (bpf_sock->op != BPF_SOCK_OPS_RCVQ_CB) {
> >
> > bpf_dynptr_from_skb() and bpf_dynptr_slice() kfunc could also be considered.
> > One less bpf_sock->op check in filter.c to maintain and could also avoid
> > a data copy. There is a bpf_cast_to_kern_ctx() to get to a trusted
> > skops_kern pointer but this will need changes in verifier.c to get to
> > skops_kern->skb (e.g. in type_is_trusted_or_null) and this is the tradeoff.
> 
> Maybe a dumb question, but does it add extra cost (extra dynptr
> function call?) if data overlaps two frags, or can dynptr handle it
> seamlessly with a single bpf_dynptr_slice() ?

Right, there is an extra bpf_dynptr_from_skb() call. I don't think it has
been benchmarked.

If I read it correctly, unlike bpf_xdp_pointer(), skb_header_pointer()
will still copy even when the data sits in a single frag. It works well when
the data is within the headlen, and the worst case is a copy, which is the
same as load_bytes.
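
To make the copy-vs-pointer behaviour concrete, here is a minimal userspace
sketch. It is a model only, not kernel code: struct model_skb,
model_copy_bits() and model_header_pointer() are hypothetical names, and the
skb is simplified to a linear head plus one frag. It assumes the usual
skb_header_pointer() contract: return a direct pointer when the range is
within the headlen, otherwise copy into the caller's buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for an skb: a linear head plus a single frag.
 * This is NOT the kernel's struct sk_buff.
 */
struct model_skb {
	const unsigned char *head;	/* linear area */
	size_t headlen;
	const unsigned char *frag;	/* one non-linear fragment */
	size_t fraglen;
};

/* Copy len bytes starting at offset, crossing the head/frag boundary
 * if needed. Returns 0 on success, -1 if the range is out of bounds.
 */
static int model_copy_bits(const struct model_skb *skb, size_t offset,
			   void *to, size_t len)
{
	unsigned char *dst = to;
	size_t total = skb->headlen + skb->fraglen;

	if (offset > total || len > total - offset)
		return -1;

	if (offset < skb->headlen) {
		size_t n = skb->headlen - offset;

		if (n > len)
			n = len;
		memcpy(dst, skb->head + offset, n);
		dst += n;
		offset += n;
		len -= n;
	}

	if (len)
		memcpy(dst, skb->frag + (offset - skb->headlen), len);

	return 0;
}

/* Zero-copy only when the whole range is in the linear area.
 * Anything touching the frag is copied into buffer, even if it
 * lies entirely inside that one frag.
 */
static const void *model_header_pointer(const struct model_skb *skb,
					size_t offset, size_t len,
					void *buffer)
{
	assert(skb != NULL);

	if (offset <= skb->headlen && len <= skb->headlen - offset)
		return skb->head + offset;

	if (!buffer || model_copy_bits(skb, offset, buffer, len) < 0)
		return NULL;

	return buffer;
}
```

With header/data split, the payload starts at the headlen boundary, so in
this model every payload read takes the copy path, which is the same cost as
load_bytes.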

It is a readonly use case. Maybe the bpf prog can directly read the frag.
Regardless, it is useful to have a kfunc/helper to read it.

> 
> In our case, the data copy is ~16 bytes, so the cost will not be
> a big problem I think.
> 
> 
> >
> > If this new rcvq callback is added to the 'bpf_tcp_ops' proposal [1],
> > all this will go away. 'struct sk_buff *skb' can be directly passed to an
> > ops of the 'bpf_tcp_ops'. Supporting '*skb' in a struct_ops has already
> > been done in the bpf_qdisc.
> >
> > [1]: https://lore.kernel.org/bpf/20260519215841.2984970-11-martin.lau@linux.dev/
> 
> Oh I missed the series, the struct_ops conversion looks nice !
> Since this work isn't urgent, I can wait for your series if mine
> churns it.
> 
> Jason's series is adding a new op, and I guess this can be
> integrated too ?
> https://lore.kernel.org/bpf/20260521135244.40869-5-kerneljasonxing@gmail.com/

imo, a new sock_ops cb should be added as an ops in struct_ops. For example,
in patch 4 of that series, bpf_skops_rx_timestamping assigns a u64 to 'u32
args[4]', which adds tech debt to the current sock_ops interface.
For the timestamping case, it could be a separate ops for
'struct sock' instead of 'struct tcp_sock' because it should
at least work for both TCP and UDP.
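
As a rough illustration of why a u64 does not fit cleanly into 'u32 args[4]',
here is a small userspace sketch. It does not reproduce how the referenced
patch encodes the value; struct model_skops and the three helpers are
hypothetical names used only to show the two options a 32-bit slot array
leaves: truncate, or split across two slots and reassemble by convention.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the args[] part of the sock_ops context. */
struct model_skops {
	uint32_t args[4];
};

/* Storing a 64-bit value into one 32-bit slot drops the high half. */
static void model_store_truncating(struct model_skops *ops, uint64_t ts)
{
	ops->args[0] = (uint32_t)ts;
}

/* Splitting preserves the value, but every consumer must know that
 * args[0] and args[1] belong together and in which order.
 */
static void model_store_split(struct model_skops *ops, uint64_t ts)
{
	ops->args[0] = (uint32_t)(ts & 0xffffffffu);	/* low 32 bits */
	ops->args[1] = (uint32_t)(ts >> 32);		/* high 32 bits */
}

/* Reassemble the value written by model_store_split(). */
static uint64_t model_load_split(const struct model_skops *ops)
{
	return ((uint64_t)ops->args[1] << 32) | ops->args[0];
}
```

A struct_ops callback could instead take the timestamp as a typed u64
argument, so no such packing convention would be needed.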
