From: John Fastabend <john.fastabend@gmail.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"John Fastabend" <john.fastabend@gmail.com>
Cc: Stanislav Fomichev <sdf@google.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Donald Hunter <donald.hunter@gmail.com>,
bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
KP Singh <kpsingh@kernel.org>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>,
Network Development <netdev@vger.kernel.org>
Subject: Re: [RFC bpf-next v2 11/11] net/mlx5e: Support TX timestamp metadata
Date: Fri, 30 Jun 2023 17:52:05 -0700
Message-ID: <649f78b57358c_30943208c4@john.notmuch>
In-Reply-To: <87y1k2fq9m.fsf@toke.dk>
Toke Høiland-Jørgensen wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
>
> > On Tue, 27 Jun 2023 14:43:57 -0700 John Fastabend wrote:
> >> What I think would be the most straight-forward thing and most flexible
> >> is to create a <drvname>_devtx_submit_skb(<drivname>descriptor, sk_buff)
> >> and <drvname>_devtx_submit_xdp(<drvname>descriptor, xdp_frame) and then
> >> corresponding calls for <drvname>_devtx_complete_{skb|xdp}() Then you
> >> don't spend any cycles building the metadata thing or have to even
> >> worry about read kfuncs. The BPF program has read access to any
> >> fields they need. And with the skb, xdp pointer we have the context
> >> that created the descriptor and generate meaningful metrics.
> >
> > Sorry but this is not going to happen without my nack. DPDK was a much
> > cleaner bifurcation point than trying to write datapath drivers in BPF.
> > Users having to learn how to render descriptors for all the NICs
> > and queue formats out there is not reasonable. Isovalent hired

I would expect BPF/driver experts to write the libraries for the
datapath API that the network/switch developer is going to use. I would
even put the BPF programs in the kernel and ship them with the release
if that helps.
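
For illustration, a rough sketch of the kind of "library" code I mean is
below. The hook name, the devtx_raw_ctx layout, and the idea that the
completion entry is handed to the program as a raw pointer are all
hypothetical here; the point is only that the mlx5-specific read can live
entirely in BPF, with libbpf's CO-RE relocations resolving the field
offset against the kernel/module BTF at load time:

/* Hypothetical per-driver helper shipped as a BPF library. The
 * devtx_raw_ctx layout is made up for this sketch; struct mlx5_cqe64 is
 * the driver's completion entry type (if mlx5 is built as a module,
 * generate the type with a bpftool BTF dump of mlx5_core instead of
 * relying on vmlinux.h). The offset of ->timestamp is relocated at load
 * time rather than hard-coded.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

struct devtx_raw_ctx {
	void *descriptor;	/* raw TX completion entry */
	__u32 ifindex;
};

static __always_inline __u64 mlx5_tx_completion_ts(struct devtx_raw_ctx *ctx)
{
	struct mlx5_cqe64 *cqe = ctx->descriptor;

	/* Value is returned as the hardware wrote it (big endian); the
	 * byte swap is left out to keep the sketch short.
	 */
	return BPF_CORE_READ(cqe, timestamp);
}

If the struct changes shape in a future kernel, the same object is simply
re-relocated at load time, which is the portability property I come back
to below.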

We have different visions, I think, of who the BPF user writing XDP
programs is.

> > a lot of former driver developers so you may feel like it's a good
> > idea, as a middleware provider. But for the rest of us the matrix
> > of HW x queue format x people writing BPF is too large. If we can

It's nice, though, that we have good coverage for XDP, so yes, the matrix
is big. Even with kfuncs we still need someone to write the support;
my view is that it's just a question of whether they write it in BPF
or in C as a reader kfunc. I suspect for these advanced features it's
only a subset of drivers, at least up front. Either way, BPF or C, you
are stuck finding someone to write that code.
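
For contrast, the C route is roughly the reader-kfunc shape sketched
below. The function and context names here are illustrative, not the ones
from this RFC; the point is that the same mlx5-specific knowledge now has
to be written, merged, and maintained in-tree:

/* Sketch of the in-kernel alternative: a driver-provided reader kfunc.
 * Names are made up for illustration, and the BTF_ID registration
 * boilerplate that makes the kfunc callable from BPF is left out.
 */
#include <linux/bpf.h>
#include <linux/mlx5/device.h>

struct devtx_frame_ctx {
	struct mlx5_cqe64 *cqe;		/* completion entry for the frame */
};

__bpf_kfunc int mlx5e_devtx_read_tx_timestamp(const struct devtx_frame_ctx *ctx,
					      u64 *timestamp)
{
	if (!ctx->cqe)
		return -ENODATA;

	*timestamp = be64_to_cpu(ctx->cqe->timestamp);
	return 0;
}

Every additional field a program wants to see means another one of these,
plus the agreement to add it.
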
> > write some poor man's DPDK / common BPF driver library to be selected
> > at linking time - we can as well provide a generic interface in
> > the kernel itself. Again, we never merged explicit DPDK support,
> > your idea is strictly worse.
>
> I agree: we're writing an operating system kernel here. The *whole
> point* of an operating system is to provide an abstraction over
> different types of hardware and provide a common API so users don't have
> to deal with the hardware details.

And just to be clear, what we sacrifice then is forwards/backwards
portability. If it's a kernel kfunc, we need to add a kfunc for
every field we want to read, and that field is only available from the
release that adds it. Further, adding it will need some general agreement
that it's useful. A hardware vendor won't be able to expose some
arbitrary field and get access to it. So we lose that by doing kfuncs.
It's pushing complexity into the kernel, where we then have to maintain
it, when we could push the complexity into BPF and maintain it as user
space code and BPF code. It's a choice to make, I think.

Also, abstraction can cost cycles. Here we have to prepare the context
structure and call the kfunc. The kfunc can be inlined if folks do the
work. It may be a small cost, but it's not free.

>
> I feel like there's some tension between "BPF as a dataplane API" and
> "BPF as a kernel extension language" here, especially as the BPF

Agree. I'm obviously not optimizing for ease of use of the dataplane
API as BPF. IMO, though, even with the kfunc abstraction it's niche work
to write low-level datapath code that requires exposing a user
API higher up the stack. With a DSL (P4, ...), for example, you could
abstract away the complexity and then compile down into these
details. Or, if you like tables, an OpenFlow-style table interface
would provide a table API.

> subsystem has grown more features in the latter direction. In my mind,
> XDP is still very much a dataplane API; in fact that's one of the main
> selling points wrt DPDK: you can get high performance networking but
> still take advantage of the kernel drivers and other abstractions that

I think we agree on the goal: a fast datapath for the NIC.

> the kernel provides. If you're going for raw performance and the ability
> to twiddle every tiny detail of the hardware, DPDK fills that niche
> quite nicely (and also shows us the pains of going that route).

Summary on my side: with raw descriptor reads we minimize kernel
complexity, we don't need to decide up front what we will want to read
in the future, and we need folks who understand the hardware regardless
of whether the code lives in BPF or C. Having per-field kfuncs certainly
makes it easier to pick what to read, but we also have BTF, which already
solves this struct/offset problem for non-networking use cases.
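
To be concrete about that last point, this is the same BTF/CO-RE pattern
tracing programs rely on today; a minimal non-networking example (the
field choices are just for illustration):

/* Read fields out of task_struct at exec time with no per-field kernel
 * helper; the offsets of ->tgid and ->comm are relocated against the
 * running kernel's BTF when the program is loaded.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

char LICENSE[] SEC("license") = "GPL";

SEC("tp_btf/sched_process_exec")
int BPF_PROG(trace_exec, struct task_struct *p, pid_t old_pid,
	     struct linux_binprm *bprm)
{
	char comm[16];
	pid_t tgid = BPF_CORE_READ(p, tgid);

	BPF_CORE_READ_STR_INTO(&comm, p, comm);
	bpf_printk("exec: %s tgid=%d", comm, tgid);
	return 0;
}

A raw-descriptor hook would lean on exactly that mechanism, just pointed
at a vendor completion/descriptor struct instead of task_struct.
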
>
> -Toke
>