From: "Toke Høiland-Jørgensen" <toke@kernel.org>
To: Stanislav Fomichev <sdf@google.com>
Cc: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
andrii@kernel.org, martin.lau@linux.dev, song@kernel.org,
yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org,
haoluo@google.com, jolsa@kernel.org, willemb@google.com,
dsahern@kernel.org, magnus.karlsson@intel.com, bjorn@kernel.org,
maciej.fijalkowski@intel.com, netdev@vger.kernel.org
Subject: Re: [RFC bpf-next 0/7] bpf: netdev TX metadata
Date: Tue, 13 Jun 2023 21:10:17 +0200
Message-ID: <87v8frw546.fsf@toke.dk>
In-Reply-To: <CAKH8qBt5tQ69Zs9kYGc7j-_3Yx9D6+pmS4KCN5G0s9UkX545Mg@mail.gmail.com>

Stanislav Fomichev <sdf@google.com> writes:

> On Tue, Jun 13, 2023 at 10:18 AM Toke Høiland-Jørgensen <toke@kernel.org> wrote:
>>
>> Stanislav Fomichev <sdf@google.com> writes:
>>
>> > On 06/12, Toke Høiland-Jørgensen wrote:
>> >> Some immediate thoughts after glancing through this:
>> >>
>> >> > --- Use cases ---
>> >> >
>> >> > The goal of this series is to add two new standard-ish places
>> >> > in the transmit path:
>> >> >
>> >> > 1. Right before the packet is transmitted (with access to TX
>> >> > descriptors)
>> >> > 2. Right after the packet is actually transmitted and we've received the
>> >> > completion (again, with access to TX completion descriptors)
>> >> >
>> >> > Accessing TX descriptors unlocks the following use-cases:
>> >> >
>> >> > - Setting device hints at TX: XDP/AF_XDP might use these new hooks to
>> >> > use device offloads. The existing case implements TX timestamp.
>> >> > - Observability: global per-netdev hooks can be used for tracing
>> >> > the packets and exploring completion descriptors for all sorts of
>> >> > device errors.
>> >> >
>> >> > Accessing TX descriptors also means that the hooks have to be called
>> >> > from the drivers.
>> >> >
>> >> > The hooks are a light-weight alternative to XDP at egress and currently
>> >> > don't provide any packet modification abilities. However, eventually,
>> >> > we can expose new kfuncs to operate on the packet (or, rather, the actual
>> >> > descriptors; for performance's sake).
>> >>
>> >> dynptr?
>> >
>> > Haven't considered, let me explore, but not sure what it buys us
>> > here?
>>
>> API consistency, certainly. Possibly also performance, if using the
>> slice thing that gets you a direct pointer to the pkt data? Not sure
>> about that, though, haven't done extensive benchmarking of dynptr yet...
>
> Same. Let's keep it on the table, I'll try to explore. I was just
> thinking that having less abstraction here might be better
> performance-wise.

Sure, let's evaluate this once we have performance numbers.

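To make "the slice thing" concrete: as I understand the semantics, a slice request returns a direct pointer when the requested range is contiguous, and only falls back to copying into a caller-supplied buffer when it is not. Below is a minimal userspace model of that behaviour, not kernel code. The names pkt_model, pkt_slice and pkt_slice_demo are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model of a packet split across two fragments. */
struct pkt_model {
	const uint8_t *frag[2];
	size_t frag_len[2];
};

/*
 * Model of the slice semantics: return a direct pointer when
 * [off, off + len) lies inside a single fragment; otherwise gather the
 * bytes into the caller-provided buffer and return that buffer.
 * Returns NULL for empty or out-of-bounds requests, or when a copy is
 * needed but no buffer was supplied.
 */
const void *pkt_slice(const struct pkt_model *p, size_t off,
		      void *buf, size_t len)
{
	size_t total = p->frag_len[0] + p->frag_len[1];
	size_t first;

	if (len == 0 || off > total || len > total - off)
		return NULL;

	/* Fast path: the whole range sits in one fragment, no copy. */
	if (off + len <= p->frag_len[0])
		return p->frag[0] + off;
	if (off >= p->frag_len[0])
		return p->frag[1] + (off - p->frag_len[0]);

	/* Slow path: the range straddles the fragment boundary. */
	if (!buf)
		return NULL;
	first = p->frag_len[0] - off;
	memcpy(buf, p->frag[0] + off, first);
	memcpy((uint8_t *)buf + first, p->frag[1], len - first);
	return buf;
}

/* Small self-check exercising both paths; returns 1 on success. */
int pkt_slice_demo(void)
{
	static const uint8_t a[] = { 1, 2, 3, 4 };
	static const uint8_t b[] = { 5, 6, 7, 8 };
	struct pkt_model p = { { a, b }, { sizeof(a), sizeof(b) } };
	uint8_t buf[4];

	/* Contiguous request: direct pointer into the first fragment. */
	assert(pkt_slice(&p, 1, buf, 2) == a + 1);
	/* Straddling request: bytes 3..6 are gathered into buf. */
	assert(pkt_slice(&p, 2, buf, 4) == buf);
	return buf[0] == 3 && buf[3] == 6;
}
```

If the common case is the contiguous one, the abstraction costs a bounds check and nothing else, which is why benchmarking seems worthwhile before ruling it out.
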
>> >> > --- UAPI ---
>> >> >
>> >> > The hooks are implemented in a HID-BPF style. Meaning they don't
>> >> > expose any UAPI and are implemented as tracing programs that call
>> >> > a bunch of kfuncs. The attach/detach operations happen via BPF syscall
>> >> > programs. The series expands device-bound infrastructure to tracing
>> >> > programs.
>> >>
>> >> Not a fan of the "attach from BPF syscall program" thing. These are part
>> >> of the XDP data path API, and I think we should expose them as proper
>> >> bpf_link attachments from userspace with introspection etc. But I guess
>> >> the bpf_mprog thing will give us that?
>> >
>> > bpf_mprog will just make those attach kfuncs return the link fd. The
>> > syscall program will still stay :-(
>>
>> Why does the attachment have to be done this way, exactly? Couldn't we
>> just use the regular bpf_link attachment from userspace? AFAICT it's not
>> really piggy-backing on the function override thing anyway when the
>> attachment is per-dev? Or am I misunderstanding how all this works?
>
> It's UAPI vs non-UAPI. I'm assuming kfunc makes it non-UAPI and gives
> us an opportunity to fix things.
> We can do it via a regular syscall path if there is a consensus.

Yeah, the API exposed to the BPF program is kfunc-based in any case. If
we were to at some point conclude that this whole thing was not useful
at all and deprecate it, it doesn't seem to me that it makes much
difference whether that means "you can no longer create a link
attachment of this type via BPF_LINK_CREATE" or "you can no longer
create a link attachment of this type via BPF_PROG_RUN of a syscall type
program".

>> >> > --- skb vs xdp ---
>> >> >
>> >> > The hooks operate on a new light-weight devtx_frame which contains:
>> >> > - data
>> >> > - len
>> >> > - sinfo
>> >> >
>> >> > This should allow us to have a unified (from BPF POV) place at TX
>> >> > and not be super-taxing (we need to copy 2 pointers + len to the stack
>> >> > for each invocation).
>> >>
>> >> Not sure what I think about this one. At the very least I think we
>> >> should expose xdp->data_meta as well. I'm not sure what the use case for
>> >> accessing skbs is? If that *is* indeed useful, probably there will also
>> >> end up being a use case for accessing the full skb?
>> >
>> > skb_shared_info has meta_len, but afaik, xdp doesn't use it. Maybe
>> > a good opportunity to unify? Or probably won't work because if
>> > xdp_frame doesn't have frags, it won't have sinfo?
>>
>> No, it won't. But why do we need this unification between the skb and
>> xdp paths in the first place? Doesn't the skb path already have support
>> for these things? Seems like we could just stick to making this xdp-only
>> and keeping xdp_frame as the ctx argument?
>
> For skb path, I'm assuming we can read sinfo->meta_len; it feels nice
> to make it work for both cases?
> We can always export metadata len via some kfunc, sure.

I wasn't referring to the metadata field specifically when talking about
the skb path. I'm wondering why we need these hooks to work for the skb
path at all? :)
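
For reference, the context object under discussion can be modeled roughly as below. This is a userspace sketch only: the field names (data, len, sinfo) come from the cover letter, while the field types, the opaque skb_shared_info stand-in, and the helpers devtx_frame_init and devtx_hook_model are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-in for the kernel's struct skb_shared_info. */
struct skb_shared_info;

/*
 * Field names follow the cover letter; the types are assumptions
 * made for this sketch.
 */
struct devtx_frame {
	void *data;
	uint16_t len;
	struct skb_shared_info *sinfo; /* NULL here models a frame without frags */
};

/* Build the on-stack context: two pointers plus a length are copied. */
struct devtx_frame devtx_frame_init(void *data, uint16_t len,
				    struct skb_shared_info *sinfo)
{
	struct devtx_frame f = { .data = data, .len = len, .sinfo = sinfo };

	return f;
}

/* Hypothetical hook body: accepts any frame with a non-empty linear part. */
int devtx_hook_model(const struct devtx_frame *f)
{
	assert(f != NULL);
	return f->data != NULL && f->len > 0;
}
```

Nothing in this model depends on an skb, which is the point of the question: an xdp-only hook could take xdp_frame as its ctx and drop the extra copy.
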

-Toke