Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sdf@google.com
To: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: brouer@redhat.com, bpf@vger.kernel.org, netdev@vger.kernel.org,
	xdp-hints@xdp-project.net, larysa.zaremba@intel.com,
	memxor@gmail.com, Lorenzo Bianconi <lorenzo@kernel.org>,
	mtahhan@redhat.com,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Daniel Borkmann <borkmann@iogearbox.net>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	dave@dtucker.co.uk, Magnus Karlsson <magnus.karlsson@intel.com>,
	bjorn@kernel.org
Subject: Re: [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF
Date: Wed, 5 Oct 2022 11:43:00 -0700	[thread overview]
Message-ID: <Yz3QNM7061WmXDHS@google.com> (raw)
In-Reply-To: <982b9125-f849-5e1c-0082-7239b8c8eebf@redhat.com>

On 10/05, Jesper Dangaard Brouer wrote:


> On 04/10/2022 20.26, Stanislav Fomichev wrote:
> > On Tue, Oct 4, 2022 at 2:29 AM Jesper Dangaard Brouer
> > <jbrouer@redhat.com> wrote:
> > >
> > >
> > > On 04/10/2022 01.55, sdf@google.com wrote:
> > > > On 09/07, Jesper Dangaard Brouer wrote:
> > > > > This patchset expose the traditional hardware offload hints to  
> XDP and
> > > > > rely on BTF to expose the layout to users.
> > > >
> > > > > Main idea is that the kernel and NIC drivers simply defines the  
> struct
> > > > > layouts they choose to use for XDP-hints. These XDP-hints structs  
> gets
> > > > > naturally and automatically described via BTF and implicitly  
> exported to
> > > > > users. NIC drivers populate and records their own BTF ID as the  
> last
> > > > > member in XDP metadata area (making it easily accessible by AF_XDP
> > > > > userspace at a known negative offset from packet data start).
> > > >
> > > > > Naming conventions for the structs (xdp_hints_*) is used such that
> > > > > userspace can find and decode the BTF layout and match against the
> > > > > provided BTF IDs. Thus, no new UAPI interfaces are needed for  
> exporting
> > > > > what XDP-hints a driver supports.
> > > >
> > > > > The patch "i40e: Add xdp_hints_union" introduce the idea of  
> creating a
> > > > > union named "xdp_hints_union" in every driver, which contains all
> > > > > xdp_hints_* struct this driver can support. This makes it  
> easier/quicker
> > > > > to find and parse the relevant BTF types.  (Seeking input before  
> fixing
> > > > > up all drivers in patchset).
> > > >
> > > >
> > > > > The main different from RFC-v1:
> > > > >    - Drop idea of BTF "origin" (vmlinux, module or local)
> > > > >    - Instead to use full 64-bit BTF ID that combine object+type ID
> > > >
> > > > > I've taken some of Alexandr/Larysa's libbpf patches and integrated
> > > > > those.
> > > >
> > > > > Patchset exceeds netdev usually max 15 patches rule. My excuse is  
> three
> > > > > NIC drivers (i40e, ixgbe and mvneta) gets XDP-hints support and  
> which
> > > > > required some refactoring to remove the SKB dependencies.
> > > >
> > > > Hey Jesper,
> > > >
> > > > I took a quick look at the series.
> > > Appreciate that! :-)
> > >
> > > > Do we really need the enum with the flags?
> > >
> > > The primary reason for using enum is that these gets exposed as BTF.
> > > The proposal is that userspace/BTF need to obtain the flags via BTF,
> > > such that they don't become UAPI, but something we can change later.
> > >
> > > > We might eventually hit that "first 16 bits are reserved" issue?
> > > >
> > > > Instead of exposing enum with the flags, why not solve it as  
> follows:
> > > > a. We define UAPI struct xdp_rx_hints with _all_ possible hints
> > >
> > > How can we know _all_ possible hints from the beginning(?).
> > >
> > > UAPI + central struct dictating all possible hints, will limit  
> innovation.
> >
> > We don't need to know them all in advance. The same way we don't know
> > them all for flags enum. That UAPI xdp_rx_hints can be extended any
> > time some driver needs some new hint offload. The benefit here is that
> > we have a "common registry" of all offloads and different drivers have
> > an opportunity to share.
> >
> > Think of it like current __sk_buff vs sk_buff. xdp_rx_hints is a fake
> > uapi struct (__sk_buff) and the access to it gets translated into
> > <device>_xdp_rx_hints offsets (sk_buff).
> >
> > > > b. Each device defines much denser <device>_xdp_rx_hints struct  
> with the
> > > >      metadata that it supports
> > >
> > > Thus, the NIC device is limited to what is defined in UAPI struct
> > > xdp_rx_hints.  Again this limits innovation.
> >
> > I guess what I'm missing from your series is the bpf/userspace side.
> > Do you have an example on the bpf side that will work for, say,
> > xdp_hints_ixgbe_timestamp?
> >
> > Suppose, you pass this custom hints btf_id via xdp_md as proposed,

> I just want to reiterate why we place btf_full_id at the "end inline".
> This makes it easily available for AF_XDP to consume.  Plus, we already
> have to write info into this metadata cache-line anyway, thus it's
> almost free.  Moving bpf_full_id into xdp_md, will require expanding
> both xdp_buff and xdp_frame (+ extra store for converting
> buff-to-frame). If AF_XDP need this btf_full_id the BPF-prog _could_
> move/copy it from xdp_md to metadata, but that will just waste cycles,
> why not just store it once in a known location.

> One option, for convenience, would be to map xdp_md->bpf_full_id to load
> the btf_full_id value from the metadata.  But that would essentially be
> syntax-sugar and adds UAPI.

> > what's the action on the bpf side to consume this?
> >
> > If (ctx_hints_btf_id == xdp_hints_ixgbe_timestamp_btf_id /* supposedly
> > populated at runtime by libbpf? */) {

> See e.g. bpf_core_type_id_kernel(struct xdp_hints_ixgbe_timestamp)
> AFAIK libbpf will make this a constant at load/setup time, and give us
> dead-code elimination.

Even with bpf_core_type_id_kernel() you still would have the following:

	if (ctx_hints_btf_id == bpf_core_type_id_kernel(struct xdp_hints_ixgbe)) {
	} else if (the same for every driver that has custom hints) {
	}

Toke has a good suggestion on hiding this behind a helper; either
pre-generated on the libbpf side or a kfunc. We should try to hide
this per-device logic if possible; otherwise we'll get to per-device
XDP programs that only work on some special deployments. OTOH, we'll
probably get there with the hints anyway?

> >    // do something with rx_timestamp
> >    // also, handle xdp_hints_ixgbe and then xdp_hints_common ?
> > } else if (ctx_hints_btf_id == xdp_hints_ixgbe) {
> >    // do something else
> >    // plus explicitly handle xdp_hints_common here?
> > } else {
> >    // handle xdp_hints_common
> > }

> I added a BPF-helper that can tell us if layout if compatible with
> xdp_hints_common, which is basically the only UAPI the patchset  
> introduces.
> The handle xdp_hints_common code should be common.

> I'm not super happy with the BPF-helper approach, so suggestions are
> welcome.  E.g. xdp_md/ctx->is_hint_common could be one approach and
> ctx->has_hint (ctx is often called xdp so it reads xdp->has_hint).

> One feature I need from the BPF-helper is to "disable" the xdp_hints and
> allow the BPF-prog to use the entire metadata area for something else
> (avoiding it to be misintrepreted by next prog or after redirect).

As mentioned in the previous emails, let's try to have a bpf side
example/selftest for the next round? I also feel like xdp_hints_common is
a bit distracting. It makes the common case easy and it hides the
discussion/complexity about per-device hints. Maybe we can drop this
common case at all? Why can't every driver has a custom hints struct?
If we agree that naming/size will be the same across them (and review
catches/guaranteed that), why do we even care about having common
xdp_hints_common struct?

> > What I'd like to avoid is an xdp program targeting specific drivers.
> > Where possible, we should aim towards something like "if this device
> > has rx_timestamp offload -> use it without depending too much on
> > specific btf_ids.
> >

> I do understand your wish, and adding rx_timestamps to xdp_hints_common
> would be too easy (and IMHO wasting u64/8-bytes for all packets not
> needing this timestamp).  Hopefully we can come up with a good solution
> together.

> One idea would be to extend libbpf to lookup or translate struct name

>   struct xdp_hints_DRIVER_timestamp {
>     __u64 rx_timestamp;
>   } __attribute__((preserve_access_index));

> into e.g. xdp_hints_i40e_timestamp, if an ifindex was provided when  
> loading
> the XDP prog.  And the bpf_core_type_id_kernel() result of the struct
> returning id from xdp_hints_i40e_timestamp.

> But this ideas doesn't really work for the veth redirect use-case :-(
> As veth need to handle xdp_hints from other drivers.

Agreed. If we want redirect to work, then the parsing should be either
mostly pre-generated by libbpf to include all possible btf ids that
matter; or done similarly by a kfunc. The idea that we can pre-generate
per-device bpf program seems to be out of the window now?

> > > > c. The subset of fields in <device>_xdp_rx_hints should match the  
> ones from
> > > >      xdp_rx_hints (we essentially standardize on the field  
> names/sizes)
> > > > d. We expose <device>_xdp_rx_hints btf id via netlink for each  
> device
> > >
> > > For this proposed design you would still need more than one BTF ID or
> > > <device>_xdp_rx_hints struct's, because not all packets contains all
> > > hints. The most common case is HW timestamping, which some HW only
> > > supports for PTP frames.
> > >
> > > Plus, I don't see a need to expose anything via netlink, as we can  
> just
> > > use the existing BTF information from the module.  Thus, avoiding to
> > > creating more UAPI.
> >
> > See above. I think even with your series, that btf_id info should also
> > come via netlink so the programs can query it before loading and do
> > the required adjustments. Otherwise, I'm not sure I understand what I
> > need to do with a btf_id that comes via xdp_md/xdp_frame. It seems too
> > late? I need to know them in advance to at least populate those ids
> > into the bpf program itself?

> Yes, we need to know these IDs in advance and can.  I don't think we need
> the netlink interface, as we can already read out the BTF layout and IDs
> today.  I coded it up in userspace, where the intented consumer is AF_XDP
> (as libbpf already does this itself).

> See this code:
>   -  
> https://github.com/xdp-project/bpf-examples/blob/master/BTF-playground/btf_module_ids.c
>   -  
> https://github.com/xdp-project/bpf-examples/blob/master/BTF-playground/btf_module_read.c

SG, if we can have some convention on the names where we can reliably
parse out all possible structs with the hints, let's rely solely on
vmlinux+vmlinux module btf.

> > > > e. libbpf will query and do offset relocations for
> > > >      xdp_rx_hints -> <device>_xdp_rx_hints at load time
> > > >
> > > > Would that work? Then it seems like we can replace bitfields with  
> the
> > >
> > > I used to be a fan of bitfields, until I discovered that they are bad
> > > for performance, because compilers cannot optimize these.
> >
> > Ack, good point, something to keep in mind.
> >
> > > > following:
> > > >
> > > >     if (bpf_core_field_exists(struct xdp_rx_hints, vlan_tci)) {
> > > >       /* use that hint */
> > >
> > > Fairly often a VLAN will not be set in packets, so we still have to  
> read
> > > and check a bitfield/flag if the VLAN value is valid. (Guess it is
> > > implicit in above code).
> >
> > That's a fair point. Then we need two signals?
> >
> > 1. Whether this particular offload is supported for the device at all
> > (via that bpf_core_field_exists or something similar)
> > 2. Whether this particular packet has particular metadata (via your
> > proposed flags)
> >
> > if (device I'm attaching xdp to has vlan offload) { // via
> > bpf_core_field_exists?
> >    if (particular packet comes with a vlan tag) { // via your proposed
> > bitfield flags?
> >    }
> > }
> >
> > Or are we assuming that (2) is fast enough and we don't care about
> > (1)? Because (1) can 'if (0)' the whole branch and make the verifier
> > remove that part.
> >
> > > >     }
> > > >
> > > > All we need here is for libbpf to, again, do xdp_rx_hints ->
> > > > <device>_xdp_rx_hints translation before it evaluates
> > > > bpf_core_field_exists()?
> > > >
> > > > Thoughts? Any downsides? Am I missing something?
> > > >
> > >
> > > Well, the downside is primarily that this design limits innovation.
> > >
> > > Each time a NIC driver want to introduce a new hardware hint, they  
> have
> > > to update the central UAPI xdp_rx_hints struct first.
> > >
> > > The design in the patchset is to open for innovation.  Driver can  
> extend
> > > their own xdp_hints_<driver>_xxx struct(s).  They still have to land
> > > their patches upstream, but avoid mangling a central UAPI struct. As
> > > upstream we review driver changes and should focus on sane struct  
> member
> > > naming(+size) especially if this "sounds" like a hint/feature that  
> more
> > > driver are likely to support.  With help from BTF relocations, a new
> > > driver can support same hint/feature if naming(+size) match (without
> > > necessary the same offset in the struct).
> >
> > The opposite side of this approach is that we'll have 'ixgbe_hints'
> > with 'rx_timestamp' and 'mvneta_hints' with something like
> > 'rx_tstamp'.

> Well, as I wrote reviewers should ask drivers to use the same member name.

SG!

> > > > Also, about the TX side: I feel like the same can be applied there,
> > > > the program works with xdp_tx_hints and libbpf will rewrite to
> > > > <device>_xdp_tx_hints. xdp_tx_hints might have fields  
> like "has_tx_vlan:1";
> > > > those, presumably, can be relocatable by libbpf as well?
> > > >
> > >
> > > Good to think ahead for TX-side, even-though I think we should focus  
> on
> > > landing RX-side first.
> > >
> > > I notice your naming xdp_rx_hints vs. xdp_tx_hints.  I have named the
> > > common struct xdp_hints_common, without a RX/TX direction indication.
> > > Maybe this is wrong of me, but my thinking was that most of the common
> > > hints can be directly used as TX-side hints.  I'm hoping TX-side
> > > xdp-hints will need to do little-to-non adjustment, before using the
> > > hints as TX "instruction".  I'm hoping that XDP-redirect will just  
> work
> > > and xmit driver can use XDP-hints area.
> > >
> > > Please correct me if I'm wrong.
> > > The checksum fields hopefully translates to similar TX  
> offload "actions".
> > > The VLAN offload hint should translate directly to TX-side.
> > >
> > > I can easily be convinced we should name it xdp_hints_rx_common from  
> the
> > > start, but then I will propose that xdp_hints_tx_common have the
> > > checksum and VLAN fields+flags at same locations, such that we don't
> > > take any performance hint for moving them to "TX-side" hints, making
> > > XDP-redirect just work.
> >
> > Might be good to think about this beforehand. I agree that most of the
> > layout should hopefully match. However once case that I'm interested
> > in is rx_timestamp vs tx_timestamp. For rx, I'm getting the timestamp
> > in the metadata; for tx, I'm merely setting a flag somewhere to
> > request it for async delivery later (I hope we plan to support that
> > for af_xdp?). So the layout might be completely different :-(
> >

> Yes, it is definitely in my plans to support handling at TX-completion
> time, so you can extract the TX-wire-timestamp.  This is easy for AF_XDP
> as it has the CQ (Completion Queue) step.

> I'm getting ahead of myself, but for XDP I imagine that driver will
> populate this xdp_tx_hint in DMA TX-completion function, and we can add
> a kfunc "not-a-real-hook" to xdp_return_frame that can run another XDP
> BPF-prog that can inspect the xdp_tx_hint in metadata.

Can we also place that xdp_tx_hint somewhere in the completion ring
for AF_XDP to consume?

> At this proposed kfunc xdp_return_frame call point, we likely cannot know
> what driver that produced the xdp_hints metadata either, and thus not lock
> our design or BTF-reloacations to assume which driver is it loaded on.

> [... cut ... getting too long]

> --Jesper

next prev parent reply	other threads:[~2022-10-05 18:45 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-07 15:45 [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 01/18] libbpf: factor out BTF loading from load_module_btfs() Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 02/18] libbpf: try to load vmlinux BTF from the kernel first Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 03/18] libbpf: patch module BTF obj+type ID into BPF insns Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 04/18] net: create xdp_hints_common and set functions Jesper Dangaard Brouer
2022-09-09 10:49   ` Burakov, Anatoly
2022-09-09 14:13     ` [xdp-hints] " Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 05/18] net: add net_device feature flag for XDP-hints Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 06/18] xdp: controlling XDP-hints from BPF-prog via helper Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 07/18] i40e: Refactor i40e_ptp_rx_hwtstamp Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 08/18] i40e: refactor i40e_rx_checksum with helper Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 09/18] bpf: export btf functions for modules Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 10/18] btf: Add helper for kernel modules to lookup full BTF ID Jesper Dangaard Brouer
2022-09-07 15:45 ` [PATCH RFCv2 bpf-next 11/18] i40e: add XDP-hints handling Jesper Dangaard Brouer
2022-09-07 15:46 ` [PATCH RFCv2 bpf-next 12/18] net: use XDP-hints in xdp_frame to SKB conversion Jesper Dangaard Brouer
2022-09-07 15:46 ` [PATCH RFCv2 bpf-next 13/18] mvneta: add XDP-hints support Jesper Dangaard Brouer
2022-09-07 15:46 ` [PATCH RFCv2 bpf-next 14/18] i40e: Add xdp_hints_union Jesper Dangaard Brouer
2022-09-07 15:46 ` [PATCH RFCv2 bpf-next 15/18] ixgbe: enable xdp-hints Jesper Dangaard Brouer
2022-09-08  1:17   ` kernel test robot
2022-09-07 15:46 ` [PATCH RFCv2 bpf-next 16/18] ixgbe: add rx timestamp xdp hints support Jesper Dangaard Brouer
2022-09-07 15:46 ` [PATCH RFCv2 bpf-next 17/18] xsk: AF_XDP xdp-hints support in desc options Jesper Dangaard Brouer
2022-09-08  8:06   ` Magnus Karlsson
2022-09-08 10:10     ` Maryam Tahhan
2022-09-08 15:04       ` Jesper Dangaard Brouer
2022-09-09  6:43         ` Magnus Karlsson
2022-09-09  8:12           ` Maryam Tahhan
2022-09-09  9:42             ` Jesper Dangaard Brouer
2022-09-09 10:14               ` Magnus Karlsson
2022-09-09 12:35                 ` [xdp-hints] " Jesper Dangaard Brouer
2022-09-09 12:44                   ` Magnus Karlsson
2022-09-07 15:46 ` [PATCH RFCv2 bpf-next 18/18] ixgbe: AF_XDP xdp-hints processing in ixgbe_clean_rx_irq_zc Jesper Dangaard Brouer
2022-09-08  9:30 ` [xdp-hints] [PATCH RFCv2 bpf-next 00/18] XDP-hints: XDP gaining access to HW offload hints via BTF Alexander Lobakin
2022-09-09 13:48   ` Jesper Dangaard Brouer
2022-10-03 23:55 ` sdf
2022-10-04  9:29   ` Jesper Dangaard Brouer
2022-10-04 18:26     ` Stanislav Fomichev
2022-10-05  0:25       ` Martin KaFai Lau
2022-10-05  0:59         ` Jakub Kicinski
2022-10-05  1:02           ` Stanislav Fomichev
2022-10-05  1:24             ` Jakub Kicinski
2022-10-05  2:15               ` Stanislav Fomichev
2022-10-05 19:26                 ` Martin KaFai Lau
2022-10-06  9:14                   ` Magnus Karlsson
2022-10-06 15:29                     ` Jesper Dangaard Brouer
2022-10-11  6:29                       ` Martin KaFai Lau
2022-10-11 11:57                         ` Jesper Dangaard Brouer
2022-10-05 10:06             ` [xdp-hints] " Toke Høiland-Jørgensen
2022-10-05 18:47               ` sdf
2022-10-06  8:19                 ` Maryam Tahhan
2022-10-06 17:22                   ` sdf
2022-10-05 14:19             ` Jesper Dangaard Brouer
2022-10-06 14:59               ` Jakub Kicinski
2022-10-05 13:43         ` Jesper Dangaard Brouer
2022-10-05 16:29       ` Jesper Dangaard Brouer
2022-10-05 18:43         ` sdf [this message]
2022-10-06 17:47           ` Jesper Dangaard Brouer
2022-10-07 15:05             ` [xdp-hints] " David Ahern
2022-10-05 13:14     ` Burakov, Anatoly

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yz3QNM7061WmXDHS@google.com \
    --to=sdf@google.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bjorn@kernel.org \
    --cc=borkmann@iogearbox.net \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=dave@dtucker.co.uk \
    --cc=jbrouer@redhat.com \
    --cc=larysa.zaremba@intel.com \
    --cc=lorenzo@kernel.org \
    --cc=magnus.karlsson@intel.com \
    --cc=memxor@gmail.com \
    --cc=mtahhan@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=xdp-hints@xdp-project.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.