From: Jesper Dangaard Brouer <hawk@kernel.org>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>,
"Lorenzo Bianconi" <lorenzo@kernel.org>,
"Stanislav Fomichev" <stfomichev@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
bpf@vger.kernel.org, netdev@vger.kernel.org,
Jakub Kicinski <kuba@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <borkmann@iogearbox.net>,
Eric Dumazet <eric.dumazet@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
Paolo Abeni <pabeni@redhat.com>,
sdf@fomichev.me, kernel-team@cloudflare.com,
arthur@arthurfabre.com, jakub@cloudflare.com,
Magnus Karlsson <magnus.karlsson@intel.com>,
Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Subject: Re: [PATCH bpf-next V1 7/7] net: xdp: update documentation for xdp-rx-metadata.rst
Date: Tue, 17 Jun 2025 16:47:18 +0200 [thread overview]
Message-ID: <76a5330e-dc52-41ea-89c2-ddcde4b414bd@kernel.org> (raw)
In-Reply-To: <87h60e4meo.fsf@toke.dk>
On 17/06/2025 13.50, Toke Høiland-Jørgensen wrote:
> Lorenzo Bianconi <lorenzo@kernel.org> writes:
>
>>> On 06/16, Lorenzo Bianconi wrote:
>>>> On Jun 10, Stanislav Fomichev wrote:
>>>>> On 06/11, Lorenzo Bianconi wrote:
>>>>>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>>>>>>
>>>>>> [...]
>>>>>>>>>
>>>>>>>>> Why not have a new flag for bpf_redirect that transparently stores all
>>>>>>>>> available metadata? If you care only about the redirect -> skb case.
>>>>>>>>> Might give us more wiggle room in the future to make it work with
>>>>>>>>> traits.
>>>>>>>>
>>>>>>>> Also q from my side: If I understand the proposal correctly, in order to fully
>>>>>>>> populate an skb at some point, you have to call all the bpf_xdp_metadata_* kfuncs
>>>>>>>> to collect the data from the driver descriptors (indirect call), and then yet
>>>>>>>> again all equivalent bpf_xdp_store_rx_* kfuncs to re-store the data in struct
>>>>>>>> xdp_rx_meta again. This seems rather costly and once you add more kfuncs with
>>>>>>>> meta data aren't you better off switching to tc(x) directly so the driver can
>>>>>>>> do all this natively? :/
>>>>>>>
>>>>>>> I agree that the "one kfunc per metadata item" scales poorly. IIRC, the
>>>>>>> hope was (back when we added the initial HW metadata support) that we
>>>>>>> would be able to inline them to avoid the function call overhead.
>>>>>>>
>>>>>>> That being said, even with half a dozen function calls, that's still a
>>>>>>> lot less overhead from going all the way to TC(x). The goal of the use
>>>>>>> case here is to do as little work as possible on the CPU that initially
>>>>>>> receives the packet, instead moving the network stack processing (and
>>>>>>> skb allocation) to a different CPU with cpumap.
>>>>>>>
>>>>>>> So even if the *total* amount of work being done is a bit higher because
>>>>>>> of the kfunc overhead, that can still be beneficial because it's split
>>>>>>> between two (or more) CPUs.
>>>>>>>
>>>>>>> I'm sure Jesper has some concrete benchmarks for this lying around
>>>>>>> somewhere, hopefully he can share those :)
>>>>>>
>>>>>> Another possible approach would be to have some utility functions (not kfuncs)
>>>>>> used to 'store' the hw metadata in the xdp_frame that are executed in each
>>>>>> driver codebase before performing XDP_REDIRECT. The downside of this approach
>>>>>> is we need to parse the hw metadata twice if the eBPF program that is bounded
>>>>>> to the NIC is consuming these info. What do you think?
>>>>>
>>>>> That's the option I was asking about. I'm assuming we should be able
>>>>> to reuse existing xmo metadata callbacks for this. We should be able
>>>>> to hide it from the drivers also hopefully.
>>>>
>>>> If we move the hw metadata 'store' operations to the driver codebase (running
>>>> xmo metadata callbacks before performing XDP_REDIRECT), we will parse the hw
>>>> metadata twice if we attach to the NIC an AF_XDP program consuming the hw
>>>> metadata, right? One parsing is done by the AF_XDP hw metadata kfunc, and the
>>>> second one would be performed by the native driver codebase.
>>>
>>> The native driver codebase will parse the hw metadata only if the
>>> bpf_redirect set some flag, so unless I'm missing something, there
>>> should not be double parsing. (but it's all user controlled, so doesn't
>>> sound like a problem?)
>>
>> I do not have a strong opinion about it, I guess it is fine, but I am not
>> 100% sure if it fits in Jesper's use case.
>> @Jesper: any input on it?
>
> FWIW, one of the selling points of XDP is (IMO) that it allows you to
> basically override any processing the stack does. I think this should
> apply to hardware metadata as well (for instance, if the HW metadata
> indicates that a packet is TCP, and XDP performs encapsulation before
> PASSing it, the metadata should be overridden to reflect this).
I fully agree :-)
> So if the driver populates these fields natively, I think this should
> either happen before the XDP program is run (so it can be overridden),
> or it should check if the XDP program already set the values and leave
> them be if so. Both of those obviously incur overhead; not sure which
> would be more expensive, though...
>
Yes, if the XDP BPF-prog choose to override a field, then it's value
should "win". As I explained in [0], this is our first main production
use-case. To override the RX-hash with the tunnel inner-headers.
[0]
https://lore.kernel.org/all/ca38f2ed-999f-4ce1-8035-8ee9247f27f2@kernel.org/
Later we will look at using the vlan tag. Today we have disabled HW
vlan-offloading, because XDP originally didn't support accessing HW vlan
tags. Next hurdle for us is our usage of tail-calls, which cannot
access the HW RX-metadata via the "get" kfuncs. Not part of this
patchset, but we have considered if someone calls the "store" kfunc, if
tail-calls invoking the "get" kfunc could return the stored value
(even-though it is not device bound).
The last field is the HW timestamp, which we also don't currently use.
Notice that this patchset stores the timestamp in skb_shared_info area.
Storing in this area will likely cause a cache-miss. Thus, "if the
driver populates these fields natively" and automatically then it will
be a performance concern.
For now, I just need to override the RX-hash and have this survive to
CPUMAP (+veth). Adding some flag that stores all three HW metadata
natively in the driver is something that we can add later IMHO.
--Jesper
next prev parent reply other threads:[~2025-06-17 14:47 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-03 17:45 [PATCH bpf-next V1 0/7] xdp: Propagate RX HW hints for XDP_REDIRECTed packets via xdp_frame Jesper Dangaard Brouer
2025-06-03 17:45 ` [PATCH bpf-next V1 1/7] net: xdp: Add xdp_rx_meta structure Jesper Dangaard Brouer
2025-06-03 17:46 ` [PATCH bpf-next V1 2/7] selftests/bpf: Adjust test for maximum packet size in xdp_do_redirect Jesper Dangaard Brouer
2025-06-03 17:46 ` [PATCH bpf-next V1 3/7] net: xdp: Add kfuncs to store hw metadata in xdp_buff Jesper Dangaard Brouer
2025-06-16 21:55 ` Jakub Kicinski
2025-06-03 17:46 ` [PATCH bpf-next V1 4/7] net: xdp: Set skb hw metadata from xdp_frame Jesper Dangaard Brouer
2025-06-03 17:46 ` [PATCH bpf-next V1 5/7] net: veth: Read xdp metadata from rx_meta struct if available Jesper Dangaard Brouer
2025-06-03 17:46 ` [PATCH bpf-next V1 6/7] bpf: selftests: Add rx_meta store kfuncs selftest Jesper Dangaard Brouer
2025-06-06 21:57 ` Alexei Starovoitov
2025-06-06 22:16 ` Lorenzo Bianconi
2025-06-03 17:46 ` [PATCH bpf-next V1 7/7] net: xdp: update documentation for xdp-rx-metadata.rst Jesper Dangaard Brouer
2025-06-06 2:45 ` Stanislav Fomichev
2025-06-10 13:48 ` Daniel Borkmann
2025-06-10 20:12 ` Toke Høiland-Jørgensen
2025-06-10 22:26 ` Lorenzo Bianconi
2025-06-11 3:40 ` Stanislav Fomichev
2025-06-13 10:59 ` Jesper Dangaard Brouer
2025-06-16 15:34 ` Stanislav Fomichev
2025-06-17 16:15 ` Jesper Dangaard Brouer
2025-06-17 20:47 ` Stanislav Fomichev
2025-06-16 12:37 ` Lorenzo Bianconi
2025-06-16 15:45 ` Stanislav Fomichev
2025-06-16 16:40 ` Lorenzo Bianconi
2025-06-17 11:50 ` Toke Høiland-Jørgensen
2025-06-17 14:47 ` Jesper Dangaard Brouer [this message]
2025-06-17 15:10 ` Performance impact of disabling VLAN offload [was: Re: [PATCH bpf-next V1 7/7] net: xdp: update documentation for xdp-rx-metadata.rst] Toke Høiland-Jørgensen
2025-06-19 12:09 ` Jesper Dangaard Brouer
2025-06-19 12:23 ` Toke Høiland-Jørgensen
2025-06-13 11:18 ` [PATCH bpf-next V1 7/7] net: xdp: update documentation for xdp-rx-metadata.rst Daniel Borkmann
2025-06-16 11:51 ` Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=76a5330e-dc52-41ea-89c2-ddcde4b414bd@kernel.org \
--to=hawk@kernel.org \
--cc=arthur@arthurfabre.com \
--cc=ast@kernel.org \
--cc=borkmann@iogearbox.net \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=jakub@cloudflare.com \
--cc=kernel-team@cloudflare.com \
--cc=kuba@kernel.org \
--cc=lorenzo@kernel.org \
--cc=maciej.fijalkowski@intel.com \
--cc=magnus.karlsson@intel.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf@fomichev.me \
--cc=stfomichev@gmail.com \
--cc=toke@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.