From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: <netdev@vger.kernel.org>, <bpf@vger.kernel.org>,
	<magnus.karlsson@intel.com>, <stfomichev@gmail.com>,
	<kuba@kernel.org>, <pabeni@redhat.com>, <horms@kernel.org>,
	<bjorn@kernel.org>, <lorenzo@kernel.org>, <toke@redhat.com>,
	Liang Chen <liangchen.linux@gmail.com>,
	Yunsheng Lin <linyunsheng@huawei.com>,
	huangjie.albert <huangjie.albert@bytedance.com>
Subject: Re: [PATCH RFC net-next 0/4] xdp: reuse generic skb XDP handling for veth
Date: Fri, 15 May 2026 20:18:54 +0200	[thread overview]
Message-ID: <agdjjtlXxlzy6Mx5@boxer> (raw)
In-Reply-To: <d784e8d3-e6af-4437-93f2-3103720b8454@kernel.org>

On Thu, May 14, 2026 at 07:13:07AM +0200, Jesper Dangaard Brouer wrote:
> 
> 
> On 09/05/2026 10.48, Maciej Fijalkowski wrote:
> > Hi,
> > 
> > this series is an attempt to make skb-backed XDP handling in veth use
> > the generic skb XDP machinery instead of converting skb-backed packets
> > into xdp_frames for XDP_TX and XDP_REDIRECT.
> 
> I support this idea.  Thanks for working on this Maciej.

Hi Jesper, good to read!

> 
> I basically proposed the same back in Aug 2023 [1].  My patchset [2] was
> motivated by a 23.5% performance improvement, see [3], that comes from
> avoiding reallocating most veth packets when XDP is loaded.
> 
> Please read veth_benchmark04.org [4] as it documents a number of
> pitfalls and highlights that the main trick to avoid reallocation is
> changing the net_device needed_headroom (when XDP is loaded).

Thanks, will do
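
If I read the needed_headroom trick right, the core of it would be roughly
the below (untested sketch on my side; where exactly this gets called from
in veth's XDP setup is my assumption, not something taken from your series):

/* Sketch: grow needed_headroom while an XDP program is attached so skbs
 * reaching veth already carry enough headroom and the reallocation in the
 * XDP path can be skipped for most packets. The hook point is assumed.
 */
static void veth_set_xdp_headroom(struct net_device *dev, bool xdp_on)
{
	if (xdp_on)
		dev->needed_headroom = max_t(unsigned short,
					     dev->needed_headroom,
					     XDP_PACKET_HEADROOM);
	else
		dev->needed_headroom = 0;
}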

> 
> [1] https://lore.kernel.org/all/169272715407.1975370.3989385869434330916.stgit@firesoul/
> [2] https://lore.kernel.org/all/169272709850.1975370.16698220879817216294.stgit@firesoul/
> [3] https://github.com/xdp-project/xdp-project/blob/main/areas/core/veth_benchmark03.org
> [4] https://github.com/xdp-project/xdp-project/blob/main/areas/core/veth_benchmark04.org
> 
> 
> > This was brought up by
> > Jakub during review of previous work, which was focused on addressing
> > splats from AF_XDP selftests shrinking multi-buffer packets:
> > 
> >    https://lore.kernel.org/bpf/20251029221315.2694841-1-maciej.fijalkowski@intel.com/
> > 
> > veth currently has two different XDP input paths:
> >    - xdp_frame input, coming through ndo_xdp_xmit(), for example from
> >      devmap redirects; and
> >    - skb input, coming through ndo_start_xmit(), for example from the
> >      regular networking stack, pktgen, or other skb producers.
> > 
> > The xdp_frame path is naturally frame-based and this series keeps it on
> > the existing veth xdp_frame handling path. The skb path, however, is
> > still fundamentally skb-backed, but today veth converts it into an
> > xdp_frame for XDP_TX/XDP_REDIRECT. That conversion is awkward and has
> > been pointed out as undesirable before, because veth has to pin skb data,
> > construct an xdp_frame view of it, and then restore/massage XDP memory
> > metadata around that conversion.
> > 
> 
> Yes, I really dislike the veth approach of stealing the SKB "head"/data
> page by bumping the page refcnt directly.  It is awkward, as you say.
> I would be happy to see that code improved.
> 
> > This series takes a different approach: skb-backed veth XDP now reuses
> > the generic skb XDP execution path. veth provides its own xdp_buff,
> > xdp_rxq_info and page_pool to the generic XDP helper, so the generic code
> > can still perform the required skb COW using veth's page_pool when the
> > skb head/frags cannot be used directly. XDP_TX then uses generic_xdp_tx()
> > and XDP_REDIRECT uses xdp_do_generic_redirect(), while the xdp_frame
> > input path keeps using the existing veth xdp_frame TX/redirect handling.
> 
> I also used this approach in my patchset, so I like it.
> 
> https://lore.kernel.org/all/169272715407.1975370.3989385869434330916.stgit@firesoul/
> 
> > The problem this series tries to address more generally is that
> > skb-backed generic XDP can end up with memory whose provenance is not
> > described correctly by a single queue-level MEM_TYPE_PAGE_SHARED value.
> > When the skb is COWed the underlying memory is page_pool-backed, but the
> > current logic does not account for that.
> > 
> > For that reason the series introduces MEM_TYPE_PAGE_POOL_OR_SHARED. This
> > type is not bound to a single registered page_pool allocator. Instead,
> > the return path inspects the individual netmem and dispatches either to
> > the page_pool return path or to page_frag_free(). This lets generic
> > skb-backed XDP handle mixed page_pool/page-shared memory without
> > mutating rxq->mem.type per packet.
> 
> I'm unsure about the introduction of MEM_TYPE_PAGE_POOL_OR_SHARED, an
> ambiguous memory type.  In [5] I considered adding two new memory types,
> MEM_TYPE_KMALLOC_SKB and MEM_TYPE_SKB_SMALL_HEAD_CACHE, that __xdp_return
> would handle, but I labeled it as an "uncertain approach" myself.

I assume your hacks were done on top of veth without a page_pool being used
for reallocations? Otherwise I still need to understand your reasoning. I had
a bit of a shattered week, so expect a response on Monday.

Just as a reminder, I need this to be fixed: currently the AF_XDP test suite
splats heavily when we hit skb_pp_cow_data() calls and __xdp_return() still
sees MEM_TYPE_PAGE_SHARED when a packet is shrunk via bpf_xdp_adjust_tail().
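
For reference, the dispatch I have in mind on the return path looks roughly
like the below (sketch only; is_pp_page() is a placeholder for whatever
page_pool-ownership check the final patches use, not an existing helper, and
the page->pp access is likewise approximate):

/* Sketch of the MEM_TYPE_PAGE_POOL_OR_SHARED return path: instead of
 * trusting a single rxq-level mem.type, inspect the buffer itself and pick
 * the matching free routine. is_pp_page() stands in for the real
 * page_pool-ownership test; none of the naming here is final.
 */
static void xdp_return_pp_or_shared(void *data, bool napi_direct)
{
	struct page *page = virt_to_head_page(data);

	if (is_pp_page(page))	/* head was COWed by skb_pp_cow_data() */
		page_pool_put_full_page(page->pp, page, napi_direct);
	else			/* ordinary page-shared skb memory */
		page_frag_free(data);
}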

Also, it seems all parties agree (per Jakub's response) that we should come
up with common conditions for taking the conversion path (and bump veth's
needed_headroom).
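
As a starting point for that discussion, the conditions I would expect such
a common helper to check are roughly the following (sketch from memory of
what the generic path and veth check today; the exact condition set and the
helper name are up for debate):

/* Sketch: a single predicate both generic XDP and veth could use to decide
 * whether an skb must be reallocated (COWed) before running the program.
 * The condition set is illustrative, not a final proposal.
 */
static bool xdp_skb_needs_cow(const struct sk_buff *skb)
{
	return skb_cloned(skb) || skb_shared(skb) ||
	       skb_headroom(skb) < XDP_PACKET_HEADROOM ||
	       skb_is_nonlinear(skb);
}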

To be fully transparent, I assume we should include Toke in the discussion,
as you two had a discussion back on your RFC.

Thanks,
Maciej
> 
> [5] https://github.com/xdp-project/xdp-project/blob/main/areas/core/veth_benchmark04.org#uncertain-approach
> 
> 
> > The veth part also removes the old rq->xdp_mem juggling. For incoming
> > xdp_frames, veth now uses a local rxq view whose mem.type is taken from
> > frame->mem_type. This preserves the frame's original memory type for
> > XDP_TX/XDP_REDIRECT without overwriting the persistent rq->xdp_rxq memory
> > model used by skb-backed generic XDP.
> > 
> > One visible datapath change is that skb-backed veth XDP_TX no longer
> > uses veth's xdp_frame bulk queue. It now follows the generic skb XDP_TX
> > path. The xdp_frame-originated path is unchanged and still uses the
> > existing veth bulk path. The old skb-backed path had batching, but it
> > also paid the cost of converting skb-backed packets into xdp_frames. The
> > new path removes that conversion and keeps skb-backed packets on the
> > generic skb XDP path. From my local tests, which consisted of
> > pktgen + xdp_bench, I did not notice any major performance regressions;
> > however, Lorenzo and Jesper might disagree here, hence the RFC status.
> > I am fed up with internal wars with Sashiko, so I would be pleased to
> > get some human feedback.
> > 
> 
> My benchmarking in the above documents showed that the needed_headroom
> change was a bigger performance boost than what was lost by dropping the
> batching.
> 
> For a long time, I have considered adding batching to the generic-XDP code
> path, simply via the SKB-list trick.  I did an experiment doing this in the
> past and my benchmarking showed a 30% TX performance boost, because
> xmit_more takes effect.  Given people are not supposed to use generic-XDP
> if they care about performance, I never followed up.  If veth starts to use
> this generic redirect code path, then it makes sense to do this.  In
> production we have code that does XDP redirect of SKBs into veth (I have
> recommended redirecting native xdp_frames instead, but because the traffic
> first needs to travel through some DDoS filters in iptables, that cannot be
> done without first moving those filters into XDP).
> 
> 
> > In particular, let's discuss:
> > 
> >    - whether MEM_TYPE_PAGE_POOL_OR_SHARED is an acceptable way to describe
> >      skb-backed generic XDP memory that may be page_pool-backed after COW
> >      or ordinary page-shared memory otherwise;
> > 
> >    - whether passing caller-provided xdp_buff/rxq/page_pool context into
> >      the generic skb XDP helper is the right API shape;
> > 
> >    - whether letting veth provide its own page_pool to generic XDP is
> >      acceptable for avoiding the old skb->xdp_frame conversion; could
> >      veth just piggy-back on the system's page_pool? Even though it could,
> >      we still need the xdp_buff to be passed (metadata) and other refactors
> >      that allow veth to bump stats and do the redirect flush;
> > 
> >    - whether skb-backed veth XDP_TX using generic_xdp_tx(), while keeping
> >      xdp_frame-originated traffic on the existing veth bulk path, is an
> >      acceptable split;
> > 
> >    - the INDIRECT_CALL for taking the COW path; I wanted to preserve
> >      existing behavior, but is it really needed, or would it be
> >      possible to come up with conditions that would cover both the
> >      generic XDP path and veth?
> > 
> > FWIW I do like the end result on the veth side. If I missed CCing someone,
> > mea culpa.
> 
> Added Cc
> - Liang Chen <liangchen.linux@gmail.com>
> - Yunsheng Lin <linyunsheng@huawei.com>
> - huangjie.albert@bytedance.com
> 
> > 
> > Thanks,
> > Maciej
> > 
> > Maciej Fijalkowski (4):
> >    xdp: add mixed page_pool/page_shared memory type
> >    xdp: return status from generic_xdp_tx()
> >    xdp: split generic XDP skb handling
> >    veth: use generic skb XDP handling
> > 
> >   drivers/net/veth.c        | 179 ++++++++------------------------------
> >   include/linux/netdevice.h |  33 ++++++-
> >   include/net/xdp.h         |   1 +
> >   kernel/bpf/devmap.c       |   2 +-
> >   net/core/dev.c            | 124 ++++++++++++++++++++------
> >   net/core/filter.c         |   2 +-
> >   net/core/xdp.c            |  54 ++++++++++--
> >   7 files changed, 216 insertions(+), 179 deletions(-)
> > 
> 

Thread overview: 15+ messages
2026-05-09  8:48 [PATCH RFC net-next 0/4] xdp: reuse generic skb XDP handling for veth Maciej Fijalkowski
2026-05-09  8:48 ` [PATCH RFC net-next 1/4] xdp: add mixed page_pool/page_shared memory type Maciej Fijalkowski
2026-05-09  8:48 ` [PATCH RFC net-next 2/4] xdp: return status from generic_xdp_tx() Maciej Fijalkowski
2026-05-12 12:57   ` Björn Töpel
2026-05-12 17:13     ` Maciej Fijalkowski
2026-05-09  8:48 ` [PATCH RFC net-next 3/4] xdp: split generic XDP skb handling Maciej Fijalkowski
2026-05-09  8:48 ` [PATCH RFC net-next 4/4] veth: use generic skb XDP handling Maciej Fijalkowski
2026-05-12 14:32   ` Björn Töpel
2026-05-12 17:06     ` Maciej Fijalkowski
2026-05-13 11:31       ` Björn Töpel
2026-05-12 12:55 ` [PATCH RFC net-next 0/4] xdp: reuse generic skb XDP handling for veth Björn Töpel
2026-05-12 17:12   ` Maciej Fijalkowski
2026-05-14  5:13 ` Jesper Dangaard Brouer
2026-05-15 18:18   ` Maciej Fijalkowski [this message]
2026-05-15  0:54 ` Jakub Kicinski
