From: Jesper Dangaard Brouer <hawk@kernel.org>
To: Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
	netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, magnus.karlsson@intel.com,
	stfomichev@gmail.com, kuba@kernel.org, pabeni@redhat.com,
	horms@kernel.org, bjorn@kernel.org, lorenzo@kernel.org,
	toke@redhat.com, Liang Chen <liangchen.linux@gmail.com>,
	Yunsheng Lin <linyunsheng@huawei.com>,
	"huangjie.albert" <huangjie.albert@bytedance.com>
Subject: Re: [PATCH RFC net-next 0/4] xdp: reuse generic skb XDP handling for veth
Date: Thu, 14 May 2026 07:13:07 +0200	[thread overview]
Message-ID: <d784e8d3-e6af-4437-93f2-3103720b8454@kernel.org> (raw)
In-Reply-To: <20260509084858.773921-1-maciej.fijalkowski@intel.com>



On 09/05/2026 10.48, Maciej Fijalkowski wrote:
> Hi,
> 
> this series is an attempt to make skb-backed XDP handling in veth use
> the generic skb XDP machinery instead of converting skb-backed packets
> into xdp_frames for XDP_TX and XDP_REDIRECT. 

I support this idea.  Thanks for working on this Maciej.

I basically proposed the same back in Aug 2023 [1].  My patchset [2] was
motivated by a 23.5% performance improvement (see [3]) that comes from
avoiding reallocating most veth packets when XDP is loaded.

Please read veth_benchmark04.org [4], as it documents a number of
pitfalls and highlights that the main trick to avoid reallocation is
changing the net_device needed_headroom (when XDP is loaded).

[1] 
https://lore.kernel.org/all/169272715407.1975370.3989385869434330916.stgit@firesoul/
[2] 
https://lore.kernel.org/all/169272709850.1975370.16698220879817216294.stgit@firesoul/
[3] 
https://github.com/xdp-project/xdp-project/blob/main/areas/core/veth_benchmark03.org
[4] 
https://github.com/xdp-project/xdp-project/blob/main/areas/core/veth_benchmark04.org


> This was brought up by
> Jakub during review of previous work, which was focused on addressing
> splats from AF_XDP selftests shrinking multi-buffer packet:
> 
>    https://lore.kernel.org/bpf/20251029221315.2694841-1-maciej.fijalkowski@intel.com/
> 
> veth currently has two different XDP input paths:
>    - xdp_frame input, coming through ndo_xdp_xmit(), for example from
>      devmap redirects; and
>    - skb input, coming through ndo_start_xmit(), for example from the
>      regular networking stack, pktgen, or other skb producers.
> 
> The xdp_frame path is naturally frame-based and this series keeps it on
> the existing veth xdp_frame handling path. The skb path, however, is
> still fundamentally skb-backed, but today veth converts it into an
> xdp_frame for XDP_TX/XDP_REDIRECT. That conversion is awkward and has
> been pointed out as undesirable before, because veth has to pin skb data,
> construct an xdp_frame view of it, and then restore/massage XDP memory
> metadata around that conversion.
> 

Yes, I really dislike the veth approach of stealing the SKB "head"/data
page by bumping the page refcnt directly.  It is awkward, as you say.
I would be happy to see that code improved.

> This series takes a different approach: skb-backed veth XDP now reuses
> the generic skb XDP execution path. veth provides its own xdp_buff,
> xdp_rxq_info and page_pool to the generic XDP helper, so the generic code
> can still perform the required skb COW using veth's page_pool when the
> skb head/frags cannot be used directly. XDP_TX then uses generic_xdp_tx()
> and XDP_REDIRECT uses xdp_do_generic_redirect(), while the xdp_frame
> input path keeps using the existing veth xdp_frame TX/redirect handling.

I also used this approach in my patchset, so I like it.

https://lore.kernel.org/all/169272715407.1975370.3989385869434330916.stgit@firesoul/

> The problem this series tries to address more generally is that
> skb-backed generic XDP can end up with memory whose provenance is not
> described correctly by a single queue-level MEM_TYPE_PAGE_SHARED value.
> When skb is COWed the underlying memory is page_pool backed but current
> logic does not respect it.
> 
> For that reason the series introduces MEM_TYPE_PAGE_POOL_OR_SHARED. This
> type is not bound to a single registered page_pool allocator. Instead,
> the return path inspects the individual netmem and dispatches either to
> the page_pool return path or to page_frag_free(). This lets generic
> skb-backed XDP handle mixed page_pool/page-shared memory without
> mutating rxq->mem.type per packet.

I'm unsure about the introduction of MEM_TYPE_PAGE_POOL_OR_SHARED, an 
ambiguous memory type.  In [5] I considered adding two new memory types, 
MEM_TYPE_KMALLOC_SKB and MEM_TYPE_SKB_SMALL_HEAD_CACHE, that __xdp_return 
would handle, but I labeled it as an "uncertain approach" myself.

[5] 
https://github.com/xdp-project/xdp-project/blob/main/areas/core/veth_benchmark04.org#uncertain-approach


> The veth part also removes the old rq->xdp_mem juggling. For incoming
> xdp_frames, veth now uses a local rxq view whose mem.type is taken from
> frame->mem_type. This preserves the frame's original memory type for
> XDP_TX/XDP_REDIRECT without overwriting the persistent rq->xdp_rxq memory
> model used by skb-backed generic XDP.
> 
> One visible datapath change is that skb-backed veth XDP_TX no longer
> uses veth's xdp_frame bulk queue. It now follows the generic skb XDP_TX
> path. The xdp_frame-originated path is unchanged and still uses the
> existing veth bulk path. The old skb-backed path had batching, but it
> also paid the cost of converting skb-backed packets into xdp_frames. The
> new path removes that conversion and keeps skb-backed packets on the
> generic skb XDP path. From my local tests that consisted of
> pktgen + xdp_bench I did not notice any major performance regressions,
> however Lorenzo and Jesper might disagree here, hence the RFC status. I
> am fed up with internal wars with Sashiko, so I would be pleased to get
> some human feedback.
> 

My benchmarking in the documents above showed that the needed_headroom 
change was a bigger performance boost than losing the batching.

For a long time, I have considered adding batching to the generic-XDP 
code path, simply via the SKB-list trick.  I did an experiment doing this 
in the past, and my benchmarking showed a 30% TX performance boost, 
because xmit_more takes effect.  Given that people are not supposed to 
use generic-XDP if they care about performance, I never followed up.  If 
veth starts to use this generic redirect code path, then it makes sense 
to do this.  In production we have code that does XDP redirect of SKBs 
into veth (I have recommended redirecting native xdp_frame's instead, 
but because traffic first needs to travel through some DDoS filters in 
iptables, that cannot be done without first moving those filters into XDP).


> In particular, let's discuss:
> 
>    - whether MEM_TYPE_PAGE_POOL_OR_SHARED is an acceptable way to describe
>      skb-backed generic XDP memory that may be page_pool-backed after COW
>      or ordinary page-shared memory otherwise;
> 
>    - whether passing caller-provided xdp_buff/rxq/page_pool context into
>      the generic skb XDP helper is the right API shape;
> 
>    - whether letting veth provide its own page_pool to generic XDP is
>      acceptable for avoiding the old skb->xdp_frame conversion; could
>      veth just piggy-back on system's page_pool? even though it could,
>      we still need xdp_buff being passed (metadata) and other refactors
>      that allow veth to bump stats and do the redirect flush;
> 
>    - whether skb-backed veth XDP_TX using generic_xdp_tx(), while keeping
>      xdp_frame-originated traffic on the existing veth bulk path, is an
>      acceptable split;
> 
>    - the INDIRECT_CALL for taking the COW path; I wanted to preserve
>      existing behavior, but is it really needed or maybe it would be
>      possible to come up with conditions that would cover both generic
>      XDP path and veth?
> 
> FWIW I do like the end result on veth side. If I missed CCing someone,
> mea culpa.

Added Cc
- Liang Chen <liangchen.linux@gmail.com>
- Yunsheng Lin <linyunsheng@huawei.com>
- huangjie.albert@bytedance.com

> 
> Thanks,
> Maciej
> 
> Maciej Fijalkowski (4):
>    xdp: add mixed page_pool/page_shared memory type
>    xdp: return status from generic_xdp_tx()
>    xdp: split generic XDP skb handling
>    veth: use generic skb XDP handling
> 
>   drivers/net/veth.c        | 179 ++++++++------------------------------
>   include/linux/netdevice.h |  33 ++++++-
>   include/net/xdp.h         |   1 +
>   kernel/bpf/devmap.c       |   2 +-
>   net/core/dev.c            | 124 ++++++++++++++++++++------
>   net/core/filter.c         |   2 +-
>   net/core/xdp.c            |  54 ++++++++++--
>   7 files changed, 216 insertions(+), 179 deletions(-)
> 


Thread overview: 13+ messages
2026-05-09  8:48 [PATCH RFC net-next 0/4] xdp: reuse generic skb XDP handling for veth Maciej Fijalkowski
2026-05-09  8:48 ` [PATCH RFC net-next 1/4] xdp: add mixed page_pool/page_shared memory type Maciej Fijalkowski
2026-05-09  8:48 ` [PATCH RFC net-next 2/4] xdp: return status from generic_xdp_tx() Maciej Fijalkowski
2026-05-12 12:57   ` Björn Töpel
2026-05-12 17:13     ` Maciej Fijalkowski
2026-05-09  8:48 ` [PATCH RFC net-next 3/4] xdp: split generic XDP skb handling Maciej Fijalkowski
2026-05-09  8:48 ` [PATCH RFC net-next 4/4] veth: use generic skb XDP handling Maciej Fijalkowski
2026-05-12 14:32   ` Björn Töpel
2026-05-12 17:06     ` Maciej Fijalkowski
2026-05-13 11:31       ` Björn Töpel
2026-05-12 12:55 ` [PATCH RFC net-next 0/4] xdp: reuse generic skb XDP handling for veth Björn Töpel
2026-05-12 17:12   ` Maciej Fijalkowski
2026-05-14  5:13 ` Jesper Dangaard Brouer [this message]
