netdev.vger.kernel.org archive mirror
From: Lorenzo Bianconi <lorenzo@kernel.org>
To: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>,
	Stanislav Fomichev <stfomichev@gmail.com>,
	bpf@vger.kernel.org, netdev@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <borkmann@iogearbox.net>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	Paolo Abeni <pabeni@redhat.com>,
	sdf@fomichev.me, kernel-team@cloudflare.com,
	arthur@arthurfabre.com, jakub@cloudflare.com,
	Jesse Brandeburg <jbrandeburg@cloudflare.com>
Subject: Re: [PATCH bpf-next V2 0/7] xdp: Allow BPF to set RX hints for XDP_REDIRECTed packets
Date: Fri, 18 Jul 2025 11:55:04 +0200
Message-ID: <aHoZ-LtKT9p5FKAD@lore-desk>
In-Reply-To: <fbb026f9-54cf-49ba-b0dc-0df0f54c6961@kernel.org>

> 
> 
> On 16/07/2025 23.20, Jakub Kicinski wrote:
> > On Wed, 16 Jul 2025 13:17:53 +0200 Lorenzo Bianconi wrote:
> > > > > I can't see what the non-redirected use-case could be. Can you please provide
> > > > > more details?
> > > > > Moreover, can it be solved without storing the rx_hash (or the other
> > > > > hw-metadata) in a non-driver specific format?
> > > > 
> > > > Having setters feels more generic than narrowly solving only the redirect,
> > > > but I don't have a good use-case in mind.
> > > > > > Storing the hw-metadata in some hw-specific format in the xdp_frame will
> > > > > > not allow us to consume it directly when building the skb, so we would
> > > > > > have to decode it again. What is the upside/use-case of this approach
> > > > > > (not considering the orthogonality with the get method)?
> > > > 
> > > > > If we add the store kfuncs to regular drivers, the metadata won't be stored
> > > > > in the xdp_frame; it will go into the rx descriptors so the regular path that
> > > > > builds skbs will use it.
> > > 
> > > > IIUC, the described use-case would be to modify the hw metadata via a
> > > > 'setter' kfunc executed by an eBPF program bound to the NIC and to store
> > > > the new metadata in the DMA descriptor in order to be consumed by the driver
> > > > code building the skb, right?
> > > If so:
> > > > - we can get the same result by storing (via a kfunc) the modified hw
> > > >    metadata in the xdp_buff struct using a well-known/generic layout and
> > > >    consuming it in the driver code (e.g. if the bound eBPF program
> > > >    returns XDP_PASS) using a generic xdp utility routine. This part is not in
> > > >    the current series.
> > > > - Using this approach we are still not preserving the hw metadata if we pass
> > > >    the xdp_frame to a remote CPU returning XDP_REDIRECT (we need to add more
> > > >    code)
> > > > - I am not completely sure we can always modify the DMA descriptor directly
> > > >    since it is DMA mapped.
> 
> Let me explain why writing into the RX descriptors is a bad idea.
> The DMA descriptors are allocated as coherent DMA (dma_alloc_coherent).
> This is memory that is shared with the NIC hardware device, which
> implies cache-line coherence.  NIC performance is tightly coupled to
> limiting cache misses for descriptors.  One common trick is to pack more
> descriptors into a single cache-line.  Thus, if we start to write into
> the current RX-descriptor, we invalidate that cache-line as seen from
> the device, and the next RX-descriptor (from this cache-line) will be in
> an unfortunate coherency state.  Behind the scenes this might lead to
> some extra PCIe transactions.
> 
> By writing to the xdp_frame, we don't have to modify the DMA descriptors
> directly and risk invalidating cache lines for the NIC.
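
To make the cache-line sharing point above concrete, here is a minimal
userspace sketch. Everything in it is an assumption for illustration: the
16-byte demo_rx_desc layout, the 64-byte cache line, and a ring base that
is cache-line aligned. Real NIC descriptor formats differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64u /* assumed cache-line size in bytes */

/* Hypothetical 16-byte RX descriptor; not taken from any real driver. */
struct demo_rx_desc {
	uint64_t addr;     /* DMA address of the packet buffer */
	uint32_t rss_hash; /* hash written back by the device */
	uint16_t len;      /* received length */
	uint16_t status;   /* completion/status bits */
};

static_assert(sizeof(struct demo_rx_desc) == 16,
	      "demo descriptor is expected to be 16 bytes");

/* Number of descriptors packed into one cache line. */
static inline size_t demo_descs_per_line(void)
{
	return DEMO_CACHE_LINE / sizeof(struct demo_rx_desc);
}

/* Cache-line index of descriptor idx, assuming an aligned ring base. */
static inline size_t demo_line_of(size_t idx)
{
	return idx * sizeof(struct demo_rx_desc) / DEMO_CACHE_LINE;
}

/* True if a CPU write to descriptor a also dirties the line holding b. */
static inline bool demo_shares_line(size_t a, size_t b)
{
	return demo_line_of(a) == demo_line_of(b);
}
```

With these assumed sizes, descriptors 4..7 sit in the same line, so a CPU
write to descriptor 4 touches the line the device will use for 5, 6 and 7.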
> 
> > > 
> > > What do you think?
> > 
> > FWIW I commented on an earlier revision to similar effect as Stanislav.
> > To me the main concern is that we're adding another adhoc scheme, and
> > are making xdp_frame grow into a para-skb. We added XDP to make raw
> > packet access fast, now we're making drivers convert metadata twice :/
> 
> Thanks for the feedback. I can see why you'd be concerned about adding
> another adhoc scheme or making xdp_frame grow into a "para-skb".
> 
> However, I'd like to frame this as part of a long-term plan we've been
> calling the "mini-SKB" concept. This isn't a new idea, but a
> continuation of architectural discussions from as far back as [2016].
> 
> The long-term goal, described in these presentations from [2018] and
> [2019], has always been to evolve the xdp_frame to handle more hardware
> offloads, with the ultimate vision of moving SKB allocation out of NIC
> drivers entirely. In the future, the netstack could perform L3
> forwarding (and L2 bridging) directly on these enhanced xdp_frames
> [2019-slide20]. The main blocker for this vision has been the lack of
> hardware metadata in the xdp_frame.
> 
> This patchset is a small but necessary first step towards that goal. It
> focuses on the concrete XDP_REDIRECT use-case, where our production
> setup can benefit immediately. Storing this metadata in the
> xdp_frame is fundamental to the plan. It's no coincidence the fields are
> compatible with the SKB; they need to be.
> 
> I'm certainly open to debating the bigger picture, but I hope we can
> agree that it shouldn't hold up this first step, which solves an
> immediate need. Perhaps we can evaluate the merits of this specific
> change first, and discuss the overall architecture in parallel?

Considering the XDP_REDIRECT use-case, this series will allow us (in the
future) to avoid recomputing the packet checksum when redirecting the frame
into a veth and then into a container, yielding a significant performance
improvement.
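
As a rough userspace sketch of that idea (not the structures or kfuncs
from this series): metadata is stored once in a generic layout next to the
frame, and the code building the packet on the receiving side copies it
over and skips the software checksum when the frame says it was already
verified. All demo_* names, fields and flags below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical validity flags for the generic metadata area. */
enum demo_meta_flags {
	DEMO_META_HAS_HASH = 1u << 0,
	DEMO_META_CSUM_OK  = 1u << 1,
};

/* Hypothetical generic layout carried along with the frame. */
struct demo_rx_meta {
	uint32_t flags;
	uint32_t hash;
};

static_assert(sizeof(struct demo_rx_meta) == 8, "demo layout is 8 bytes");

struct demo_frame { struct demo_rx_meta meta; }; /* stand-in for a frame */

struct demo_pkt {                                /* stand-in for an skb */
	uint32_t hash;
	bool hash_valid;
	bool csum_verified;
};

/* "Setter" side: what the XDP program would record before redirecting. */
static inline void demo_meta_set_hash(struct demo_frame *f, uint32_t hash)
{
	f->meta.hash = hash;
	f->meta.flags |= DEMO_META_HAS_HASH;
}

static inline void demo_meta_set_csum_ok(struct demo_frame *f)
{
	f->meta.flags |= DEMO_META_CSUM_OK;
}

/* Consumer side: copy metadata when building the packet after redirect. */
static inline void demo_build_pkt(const struct demo_frame *f,
				  struct demo_pkt *p)
{
	p->hash_valid = f->meta.flags & DEMO_META_HAS_HASH;
	p->hash = p->hash_valid ? f->meta.hash : 0;
	p->csum_verified = f->meta.flags & DEMO_META_CSUM_OK;
}

/* The receiver only falls back to a software checksum if needed. */
static inline bool demo_needs_sw_csum(const struct demo_pkt *p)
{
	return !p->csum_verified;
}
```

A frame with no metadata set still works; it simply takes the software
checksum path, as it does today.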

Regards,
Lorenzo

> 
> --Jesper
> 
> 
> Links:
> ------
> [2019] XDP closer integration with network stack
>  - https://people.netfilter.org/hawk/presentations/KernelRecipes2019/xdp-netstack-concert.pdf
>  - https://github.com/xdp-project/xdp-project/blob/main/conference/KernelRecipes2019/xdp-netstack-concert.org#slide-move-skb-allocations-out-of-nic-drivers
>  - [2019-slide20] https://github.com/xdp-project/xdp-project/blob/main/conference/KernelRecipes2019/xdp-netstack-concert.org#slide-fun-with-xdp_frame-before-skb-alloc
> 
> [2018] LPC Networking Track: XDP - challenges and future work
>  - https://people.netfilter.org/hawk/presentations/LinuxPlumbers2018/
>  - https://github.com/xdp-project/xdp-project/blob/main/conference/LinuxPlumbers2018/presentation-lpc2018-xdp-future.org#topic-moving-skb-allocation-out-of-driver
> 
> [2016] Network Performance Workshop
>  - https://people.netfilter.org/hawk/presentations/NetDev1.2_2016/net_performance_workshop_netdev1.2.pdf
