From: Lorenzo Bianconi <lorenzo@kernel.org>
To: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>,
	Stanislav Fomichev <stfomichev@gmail.com>,
	bpf@vger.kernel.org, netdev@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <borkmann@iogearbox.net>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Paolo Abeni <pabeni@redhat.com>,
	sdf@fomichev.me, kernel-team@cloudflare.com,
	arthur@arthurfabre.com, jakub@cloudflare.com,
	Jesse Brandeburg <jbrandeburg@cloudflare.com>
Subject: Re: [PATCH bpf-next V2 0/7] xdp: Allow BPF to set RX hints for XDP_REDIRECTed packets
Date: Fri, 18 Jul 2025 11:55:04 +0200
Message-ID: <aHoZ-LtKT9p5FKAD@lore-desk>
In-Reply-To: <fbb026f9-54cf-49ba-b0dc-0df0f54c6961@kernel.org>

> 
> 
> On 16/07/2025 23.20, Jakub Kicinski wrote:
> > On Wed, 16 Jul 2025 13:17:53 +0200 Lorenzo Bianconi wrote:
> > > > > I can't see what the non-redirected use-case could be. Can you please provide
> > > > > more details?
> > > > > Moreover, can it be solved without storing the rx_hash (or the other
> > > > > hw-metadata) in a non-driver specific format?
> > > > 
> > > > Having setters feels more generic than narrowly solving only the redirect,
> > > > but I don't have a good use-case in mind.
> > > > > Storing the hw-metadata in some hw-specific format in the xdp_frame will not
> > > > > allow us to consume it directly when building the skb, and we will need to
> > > > > decode it again. What is the upside/use-case of this approach? (not considering
> > > > > the orthogonality with the get method).
> > > > 
> > > > If we add the store kfuncs to regular drivers, the metadata won't be stored
> > > > in the xdp_frame; it will go into the rx descriptors so the regular path that
> > > > builds skbs will use it.
> > > 
> > > IIUC, the described use-case would be to modify the hw metadata via a
> > > 'setter' kfunc executed by an eBPF program bound to the NIC and to store
> > > the new metadata in the DMA descriptor in order to be consumed by the driver
> > > codebase building the skb, right?
> > > If so:
> > > - we can get the same result just by storing (via a kfunc) the modified hw
> > >    metadata in the xdp_buff struct using a well-known/generic layout and
> > >    consuming it in the driver codebase (e.g. if the bound eBPF program
> > >    returns XDP_PASS) using a generic xdp utility routine. This part is not in
> > >    the current series.
> > > - Using this approach we are still not preserving the hw metadata if we pass
> > >    the xdp_frame to a remote CPU returning XDP_REDIRECT (we need to add more
> > >    code)
> > > - I am not completely sure if we can always modify the DMA descriptor directly
> > >    since it is DMA mapped.
> 
> Let me explain why writing into the RX descriptors is a bad idea.
> The DMA descriptors are allocated as coherent DMA (dma_alloc_coherent).
> This is memory that is shared with the NIC hardware device, which
> implies cache-line coherence.  NIC performance is tightly coupled to
> limiting cache misses for descriptors.  One common trick is to pack more
> descriptors into a single cache-line.  Thus, if we start to write into
> the current RX-descriptor, then we invalidate that cache-line as seen from
> the device, and the next RX-descriptor (from this cache-line) will be in an
> unfortunate coherency state.  Behind the scenes this might lead to some
> extra PCIe transactions.
> 
> By writing to the xdp_frame, we don't have to modify the DMA descriptors
> directly and risk invalidating cache lines for the NIC.
> 
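To put rough numbers on the cache-line sharing described above, here is a
minimal userspace sketch. It assumes a 64-byte cache line and a
cache-line-aligned descriptor ring; the helper names and the descriptor sizes
are illustrative assumptions and are not taken from any driver.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines (common, but architecture dependent). */
#define CACHE_LINE_SIZE 64

/* Hypothetical helper: number of descriptors packed into one cache line. */
static size_t descs_per_cache_line(size_t desc_size)
{
	assert(desc_size > 0 && desc_size <= CACHE_LINE_SIZE);
	return CACHE_LINE_SIZE / desc_size;
}

/*
 * Hypothetical helper: index of the cache line holding descriptor 'idx',
 * assuming the ring base address is cache-line aligned.
 */
static size_t desc_cache_line(size_t idx, size_t desc_size)
{
	assert(desc_size > 0 && desc_size <= CACHE_LINE_SIZE);
	return (idx * desc_size) / CACHE_LINE_SIZE;
}

/*
 * Nonzero if descriptors 'a' and 'b' live in the same cache line, i.e. a
 * CPU write to one of them also touches the line holding the other.
 */
static int descs_share_cache_line(size_t a, size_t b, size_t desc_size)
{
	return desc_cache_line(a, desc_size) == desc_cache_line(b, desc_size);
}
```

With an assumed 16-byte descriptor, descs_per_cache_line(16) gives 4, so a
write to descriptor 0 lands in the same line as descriptors 1 to 3, which the
NIC may still have to fill.
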
> > > 
> > > What do you think?
> > 
> > FWIW I commented on an earlier revision to similar effect as Stanislav.
> > To me the main concern is that we're adding another adhoc scheme, and
> > are making xdp_frame grow into a para-skb. We added XDP to make raw
> > packet access fast, now we're making drivers convert metadata twice :/
> 
> Thanks for the feedback. I can see why you'd be concerned about adding
> another adhoc scheme or making xdp_frame grow into a "para-skb".
> 
> However, I'd like to frame this as part of a long-term plan we've been
> calling the "mini-SKB" concept. This isn't a new idea, but a
> continuation of architectural discussions from as far back as [2016].
> 
> The long-term goal, described in these presentations from [2018] and
> [2019], has always been to evolve the xdp_frame to handle more hardware
> offloads, with the ultimate vision of moving SKB allocation out of NIC
> drivers entirely. In the future, the netstack could perform L3
> forwarding (and L2 bridging) directly on these enhanced xdp_frames
> [2019-slide20]. The main blocker for this vision has been the lack of
> hardware metadata in the xdp_frame.
> 
> This patchset is a small but necessary first step towards that goal. It
> focuses on the concrete XDP_REDIRECT use-case where we can immediately
> benefit for our production use-case. Storing this metadata in the
> xdp_frame is fundamental to the plan. It's no coincidence the fields are
> compatible with the SKB; they need to be.
> 
> I'm certainly open to debating the bigger picture, but I hope we can
> agree that it shouldn't hold up this first step, which solves an
> immediate need. Perhaps we can evaluate the merits of this specific
> change first, and discuss the overall architecture in parallel?

Considering the XDP_REDIRECT use-case, this series will allow us (in the
future) to avoid recomputing the packet checksum when redirecting the frame
into a veth and then into a container, yielding a significant performance
improvement.
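
As a rough illustration of the idea (not what the series implements), here is
a minimal userspace sketch of carrying a checksum status along with the frame
so that the skb-building side can skip software verification. All sketch_*
names, the flag and the struct layouts are hypothetical; only the notion of
"none" versus "unnecessary" checksum states mirrors what the stack already
uses for skbs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical checksum states, loosely modelled on the skb ones. */
enum sketch_csum_status {
	SKETCH_CSUM_NONE = 0,        /* nothing known, stack must verify */
	SKETCH_CSUM_UNNECESSARY = 1, /* already validated upstream */
};

/* Hypothetical flag: the metadata carries a valid checksum status. */
#define SKETCH_META_HAS_CSUM (1u << 0)

/* Hypothetical generic metadata stored alongside the frame. */
struct sketch_rx_meta {
	uint32_t flags;
	enum sketch_csum_status csum;
};

struct sketch_frame {
	struct sketch_rx_meta meta;
};

struct sketch_skb {
	enum sketch_csum_status ip_summed;
};

/* Store side: what a BPF-called setter could record before the redirect. */
static void sketch_store_csum(struct sketch_frame *frame,
			      enum sketch_csum_status status)
{
	assert(frame);
	frame->meta.csum = status;
	frame->meta.flags |= SKETCH_META_HAS_CSUM;
}

/* Consume side: e.g. the device that turns the redirected frame into an skb. */
static void sketch_build_skb(const struct sketch_frame *frame,
			     struct sketch_skb *skb)
{
	assert(frame && skb);
	skb->ip_summed = SKETCH_CSUM_NONE;
	if (frame->meta.flags & SKETCH_META_HAS_CSUM)
		skb->ip_summed = frame->meta.csum;
}

/* True if the stack would still have to checksum the packet in software. */
static bool sketch_needs_sw_csum(const struct sketch_skb *skb)
{
	return skb->ip_summed != SKETCH_CSUM_UNNECESSARY;
}
```

Without stored metadata the skb ends up in the "none" state and the checksum
has to be verified in software; once the status travels with the frame, that
work can be skipped on the receiving side.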

Regards,
Lorenzo

> 
> --Jesper
> 
> 
> Links:
> ------
> [2019] XDP closer integration with network stack
>  - https://people.netfilter.org/hawk/presentations/KernelRecipes2019/xdp-netstack-concert.pdf
>  - https://github.com/xdp-project/xdp-project/blob/main/conference/KernelRecipes2019/xdp-netstack-concert.org#slide-move-skb-allocations-out-of-nic-drivers
>  - [2019-slide20] https://github.com/xdp-project/xdp-project/blob/main/conference/KernelRecipes2019/xdp-netstack-concert.org#slide-fun-with-xdp_frame-before-skb-alloc
> 
> [2018] LPC Networking Track: XDP - challenges and future work
>  - https://people.netfilter.org/hawk/presentations/LinuxPlumbers2018/
>  - https://github.com/xdp-project/xdp-project/blob/main/conference/LinuxPlumbers2018/presentation-lpc2018-xdp-future.org#topic-moving-skb-allocation-out-of-driver
> 
> [2016] Network Performance Workshop
>  - https://people.netfilter.org/hawk/presentations/NetDev1.2_2016/net_performance_workshop_netdev1.2.pdf
