Re: [RFC bpf-next 0/7] bpf: netdev TX metadata

public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed

From: "Toke Høiland-Jørgensen" <toke@kernel.org>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Stanislav Fomichev <sdf@google.com>
Cc: bpf <bpf@vger.kernel.org>, "Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Song Liu" <song@kernel.org>, "Yonghong Song" <yhs@fb.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@kernel.org>, "Hao Luo" <haoluo@google.com>,
	"Jiri Olsa" <jolsa@kernel.org>,
	"Willem de Bruijn" <willemb@google.com>,
	"David Ahern" <dsahern@kernel.org>,
	"Karlsson, Magnus" <magnus.karlsson@intel.com>,
	"Björn Töpel" <bjorn@kernel.org>,
	"Fijalkowski, Maciej" <maciej.fijalkowski@intel.com>,
	"Network Development" <netdev@vger.kernel.org>
Subject: Re: [RFC bpf-next 0/7] bpf: netdev TX metadata
Date: Wed, 14 Jun 2023 13:59:57 +0200	[thread overview]
Message-ID: <877cs6l0ea.fsf@toke.dk> (raw)
In-Reply-To: <CAADnVQ+CCOw9_LbCAaFz0593eydKNb7RxnGr6_FatUOKmvPmBg@mail.gmail.com>

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Tue, Jun 13, 2023 at 4:16 PM Stanislav Fomichev <sdf@google.com> wrote:
>>
>> On Tue, Jun 13, 2023 at 3:32 PM Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>> >
>> > On Tue, Jun 13, 2023 at 2:17 PM Stanislav Fomichev <sdf@google.com> wrote:
>> > >
>> > > > >> >> > --- UAPI ---
>> > > > >> >> >
>> > > > >> >> > The hooks are implemented in a HID-BPF style. Meaning they don't
>> > > > >> >> > expose any UAPI and are implemented as tracing programs that call
>> > > > >> >> > a bunch of kfuncs. The attach/detach operation happen via BPF syscall
>> > > > >> >> > programs. The series expands device-bound infrastructure to tracing
>> > > > >> >> > programs.
>> > > > >> >>
>> > > > >> >> Not a fan of the "attach from BPF syscall program" thing. These are part
>> > > > >> >> of the XDP data path API, and I think we should expose them as proper
>> > > > >> >> bpf_link attachments from userspace with introspection etc. But I guess
>> > > > >> >> the bpf_mprog thing will give us that?
>> > > > >> >
>> > > > >> > bpf_mprog will just make those attach kfuncs return the link fd. The
>> > > > >> > syscall program will still stay :-(
>> > > > >>
>> > > > >> Why does the attachment have to be done this way, exactly? Couldn't we
>> > > > >> just use the regular bpf_link attachment from userspace? AFAICT it's not
>> > > > >> really piggy-backing on the function override thing anyway when the
>> > > > >> attachment is per-dev? Or am I misunderstanding how all this works?
>> > > > >
>> > > > > It's UAPI vs non-UAPI. I'm assuming kfunc makes it non-UAPI and gives
>> > > > > us an opportunity to fix things.
>> > > > > We can do it via a regular syscall path if there is a consensus.
>> > > >
>> > > > Yeah, the API exposed to the BPF program is kfunc-based in any case. If
>> > > > we were to at some point conclude that this whole thing was not useful
>> > > > at all and deprecate it, it doesn't seem to me that it makes much
>> > > > difference whether that means "you can no longer create a link
>> > > > attachment of this type via BPF_LINK_CREATE" or "you can no longer
>> > > > create a link attachment of this type via BPF_PROG_RUN of a syscall type
>> > > > program" doesn't really seem like a significant detail to me...
>> > >
>> > > In this case, why do you prefer it to go via regular syscall? Seems
>> > > like we can avoid a bunch of boileplate syscall work with a kfunc that
>> > > does the attachment?
>> > > We might as well abstract it at, say, libbpf layer which would
>> > > generate/load this small bpf program to call a kfunc.
>> >
>> > I'm not sure we're on the same page here.
>> > imo using syscall bpf prog that calls kfunc to do a per-device attach
>> > is an overkill here.
>> > It's an experimental feature, but you're already worried about
>> > multiple netdevs?
>> >
>> > Can you add an empty nop function and attach to it tracing style
>> > with fentry ?
>> > It won't be per-netdev, but do you have to do per-device demux
>> > by the kernel? Can your tracing bpf prog do that instead?
>> > It's just an ifindex compare.
>> > This way than non-uapi bits will be even smaller and no need
>> > to change struct netdevice.
>>
>> It's probably going to work if each driver has a separate set of tx
>> fentry points, something like:
>>   {veth,mlx5,etc}_devtx_submit()
>>   {veth,mlx5,etc}_devtx_complete()

I really don't get the opposition to exposing proper APIs; as a
dataplane developer I want to attach a program to an interface. The
kernel's role is to provide a consistent interface for this, not to
require users to become driver developers just to get at the required
details.

> Right. And per-driver descriptors.
> The 'generic' xdptx metadata is unlikely to be practical.
> Marshaling in and out of it is going to be too perf sensitive.
> I'd just add an attach point in the driver with enough
> args for bpf progs to make sense of the context and extend
> the verifier to make few safe fields writeable.

This is a rehashing of the argument we had on the RX side: just exposing
descriptors is a bad API because it forces BPF programs to deal with
hardware errata - which is exactly the kind of thing that belongs in a
driver. Exposing kfuncs allows drivers to deal with hardware quirks
while exposing a consistent API to (BPF) users.

> kfuncs to read/request timestamp are probably too slow.

Which is why we should be inlining them :)

-Toke

next prev parent reply	other threads:[~2023-06-14 12:00 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-12 17:23 [RFC bpf-next 0/7] bpf: netdev TX metadata Stanislav Fomichev
2023-06-12 17:23 ` [RFC bpf-next 1/7] bpf: rename some xdp-metadata functions into dev-bound Stanislav Fomichev
2023-06-12 17:23 ` [RFC bpf-next 2/7] bpf: resolve single typedef when walking structs Stanislav Fomichev
2023-06-12 17:23 ` [RFC bpf-next 3/7] bpf: implement devtx hook points Stanislav Fomichev
2023-06-13 14:54   ` Willem de Bruijn
2023-06-13 19:00     ` Stanislav Fomichev
2023-06-13 19:29       ` Willem de Bruijn
2023-06-13 15:08   ` Simon Horman
2023-06-13 19:00     ` Stanislav Fomichev
2023-06-14  7:02       ` Simon Horman
2023-06-14 17:18         ` Stanislav Fomichev
2023-06-16  5:46   ` Kui-Feng Lee
2023-06-16 17:32     ` Stanislav Fomichev
2023-06-12 17:23 ` [RFC bpf-next 4/7] bpf: implement devtx timestamp kfunc Stanislav Fomichev
2023-06-13 15:14   ` Simon Horman
2023-06-13 18:39     ` Stanislav Fomichev
2023-06-12 17:23 ` [RFC bpf-next 5/7] net: veth: implement devtx timestamp kfuncs Stanislav Fomichev
2023-06-12 17:23 ` [RFC bpf-next 6/7] selftests/bpf: extend xdp_metadata with devtx kfuncs Stanislav Fomichev
2023-06-13 14:47   ` Willem de Bruijn
2023-06-13 19:00     ` Stanislav Fomichev
2023-06-12 17:23 ` [RFC bpf-next 7/7] selftests/bpf: extend xdp_hw_metadata " Stanislav Fomichev
2023-06-13 15:03   ` Willem de Bruijn
2023-06-13 19:00     ` Stanislav Fomichev
2023-06-12 21:00 ` [RFC bpf-next 0/7] bpf: netdev TX metadata Toke Høiland-Jørgensen
2023-06-13 16:32   ` Stanislav Fomichev
2023-06-13 17:18     ` Toke Høiland-Jørgensen
2023-06-13 18:39       ` Stanislav Fomichev
2023-06-13 19:10         ` Toke Høiland-Jørgensen
2023-06-13 21:17           ` Stanislav Fomichev
2023-06-13 22:32             ` Alexei Starovoitov
2023-06-13 23:16               ` Stanislav Fomichev
2023-06-14  4:19                 ` Alexei Starovoitov
2023-06-14 11:59                   ` Toke Høiland-Jørgensen [this message]
2023-06-14 16:27                     ` Alexei Starovoitov
2023-06-15 12:36                       ` Toke Høiland-Jørgensen
2023-06-15 16:10                         ` Alexei Starovoitov
2023-06-15 16:31                           ` Stanislav Fomichev
2023-06-16  1:50                             ` Jakub Kicinski
2023-06-16  0:09   ` Stanislav Fomichev
2023-06-16  8:12     ` Magnus Karlsson
2023-06-16 17:32       ` Stanislav Fomichev
2023-06-16 23:10         ` Stanislav Fomichev
2023-06-19  7:15           ` Magnus Karlsson
2023-06-14  3:31 ` Jakub Kicinski
2023-06-14  3:54   ` David Ahern
2023-06-14  5:05     ` Jakub Kicinski
2023-06-14 17:17       ` Stanislav Fomichev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877cs6l0ea.fsf@toke.dk \
    --to=toke@kernel.org \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dsahern@kernel.org \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=maciej.fijalkowski@intel.com \
    --cc=magnus.karlsson@intel.com \
    --cc=martin.lau@linux.dev \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=willemb@google.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox