From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: <bpf@vger.kernel.org>, <ast@kernel.org>, <daniel@iogearbox.net>,
	<hawk@kernel.org>, <ilias.apalodimas@linaro.org>,
	<lorenzo@kernel.org>, <kuba@kernel.org>, <netdev@vger.kernel.org>,
	<magnus.karlsson@intel.com>, <andrii@kernel.org>,
	<stfomichev@gmail.com>, <aleksander.lobakin@intel.com>
Subject: Re: [PATCH v2 bpf 2/2] veth: update mem type in xdp_buff
Date: Fri, 17 Oct 2025 18:52:50 +0200
Message-ID: <aPJ0YqfH+pdSIbVS@boxer>
In-Reply-To: <87a51pij2l.fsf@toke.dk>

On Fri, Oct 17, 2025 at 06:33:54PM +0200, Toke Høiland-Jørgensen wrote:
> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:
> 
> > Veth calls skb_pp_cow_data(), which makes the underlying memory
> > originate from the system page_pool. With CONFIG_DEBUG_VM=y and an XDP
> > program that uses bpf_xdp_adjust_tail(), the following splat was observed:
> >
> > [   32.204881] BUG: Bad page state in process test_progs  pfn:11c98b
> > [   32.207167] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11c98b
> > [   32.210084] flags: 0x1fffe0000000000(node=0|zone=1|lastcpupid=0x7fff)
> > [   32.212493] raw: 01fffe0000000000 dead000000000040 ff11000123c9b000 0000000000000000
> > [   32.218056] raw: 0000000000000000 0000000000000001 00000000ffffffff 0000000000000000
> > [   32.220900] page dumped because: page_pool leak
> > [   32.222636] Modules linked in: bpf_testmod(O) bpf_preload
> > [   32.224632] CPU: 6 UID: 0 PID: 3612 Comm: test_progs Tainted: G O        6.17.0-rc5-gfec474d29325 #6969 PREEMPT
> > [   32.224638] Tainted: [O]=OOT_MODULE
> > [   32.224639] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> > [   32.224641] Call Trace:
> > [   32.224644]  <IRQ>
> > [   32.224646]  dump_stack_lvl+0x4b/0x70
> > [   32.224653]  bad_page.cold+0xbd/0xe0
> > [   32.224657]  __free_frozen_pages+0x838/0x10b0
> > [   32.224660]  ? skb_pp_cow_data+0x782/0xc30
> > [   32.224665]  bpf_xdp_shrink_data+0x221/0x530
> > [   32.224668]  ? skb_pp_cow_data+0x6d1/0xc30
> > [   32.224671]  bpf_xdp_adjust_tail+0x598/0x810
> > [   32.224673]  ? xsk_destruct_skb+0x321/0x800
> > [   32.224678]  bpf_prog_004ac6bb21de57a7_xsk_xdp_adjust_tail+0x52/0xd6
> > [   32.224681]  veth_xdp_rcv_skb+0x45d/0x15a0
> > [   32.224684]  ? get_stack_info_noinstr+0x16/0xe0
> > [   32.224688]  ? veth_set_channels+0x920/0x920
> > [   32.224691]  ? get_stack_info+0x2f/0x80
> > [   32.224693]  ? unwind_next_frame+0x3af/0x1df0
> > [   32.224697]  veth_xdp_rcv.constprop.0+0x38a/0xbe0
> > [   32.224700]  ? common_startup_64+0x13e/0x148
> > [   32.224703]  ? veth_xdp_rcv_one+0xcd0/0xcd0
> > [   32.224706]  ? stack_trace_save+0x84/0xa0
> > [   32.224709]  ? stack_depot_save_flags+0x28/0x820
> > [   32.224713]  ? __resched_curr.constprop.0+0x332/0x3b0
> > [   32.224716]  ? timerqueue_add+0x217/0x320
> > [   32.224719]  veth_poll+0x115/0x5e0
> > [   32.224722]  ? veth_xdp_rcv.constprop.0+0xbe0/0xbe0
> > [   32.224726]  ? update_load_avg+0x1cb/0x12d0
> > [   32.224730]  ? update_cfs_group+0x121/0x2c0
> > [   32.224733]  __napi_poll+0xa0/0x420
> > [   32.224736]  net_rx_action+0x901/0xe90
> > [   32.224740]  ? run_backlog_napi+0x50/0x50
> > [   32.224743]  ? clockevents_program_event+0x1cc/0x280
> > [   32.224746]  ? hrtimer_interrupt+0x31e/0x7c0
> > [   32.224749]  handle_softirqs+0x151/0x430
> > [   32.224752]  do_softirq+0x3f/0x60
> > [   32.224755]  </IRQ>
> >
> > This happens because an xdp_rxq with its mem model set to
> > MEM_TYPE_PAGE_SHARED was used when initializing the xdp_buff.
> >
> > Fix this by using the new helper xdp_convert_skb_to_buff() which,
> > besides initializing and preparing the xdp_buff, checks whether the page
> > used for the linear part of the xdp_buff comes from a page_pool. We
> > assume that the linear data and the frags have the same memory provider,
> > as the XDP API currently does not provide a way to distinguish them (the
> > mem model is registered for the *whole* Rx queue, whereas here we are
> > dealing with single-buffer granularity).
> >
> > In order to meet the skb layout expected by the new helper, pull the mac
> > header before the conversion from skb to xdp_buff.
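> >
> > A rough sketch of the resulting flow (the helper itself lands in patch
> > 1/2; its exact signature and the page_pool_page_is_pp() ownership test
> > are assumptions here, not the literal code):
> >
> > 	__skb_pull(skb, skb->data - skb_mac_header(skb));
> > 	xdp_convert_skb_to_buff(skb, xdp, &rq->xdp_rxq);
> >
> > where the helper, after the usual xdp_init_buff()/xdp_prepare_buff(),
> > would do something like:
> >
> > 	if (page_pool_page_is_pp(virt_to_page(xdp->data)))
> > 		xdp->rxq->mem.type = MEM_TYPE_PAGE_POOL;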
> >
> > However, that alone is not enough: before an xdp_buff leaves veth via
> > XDP_{TX,REDIRECT}, the mem type on the xdp_rxq associated with it is
> > restored to its original model. We need to respect the previous setting
> > at least until the buff is converted to a frame, since the frame carries
> > the mem_type. Also add a page_pool variant of veth_xdp_get() so that we
> > avoid a refcount underflow when draining a page frag.
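> >
> > The intended ordering on the tx/redirect path is then roughly (a sketch,
> > not the literal diff):
> >
> > 	veth_xdp_get(xdp);			/* pp-aware refcounting */
> > 	frame = xdp_convert_buff_to_frame(xdp);	/* frame records mem_type */
> > 	rq->xdp_rxq.mem.type = MEM_TYPE_PAGE_SHARED; /* restore only now */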
> >
> > Fixes: 0ebab78cbcbf ("net: veth: add page_pool for page recycling")
> > Reported-by: Alexei Starovoitov <ast@kernel.org>
> > Closes: https://lore.kernel.org/bpf/CAADnVQ+bBofJDfieyOYzSmSujSfJwDTQhiz3aJw7hE+4E2_iPA@mail.gmail.com/
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> >  drivers/net/veth.c | 43 +++++++++++++++++++++++++++----------------
> >  1 file changed, 27 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > index a3046142cb8e..eeeee7bba685 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> > @@ -733,7 +733,7 @@ static void veth_xdp_rcv_bulk_skb(struct veth_rq *rq, void **frames,
> >  	}
> >  }
> >  
> > -static void veth_xdp_get(struct xdp_buff *xdp)
> > +static void veth_xdp_get_shared(struct xdp_buff *xdp)
> >  {
> >  	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
> >  	int i;
> > @@ -746,12 +746,33 @@ static void veth_xdp_get(struct xdp_buff *xdp)
> >  		__skb_frag_ref(&sinfo->frags[i]);
> >  }
> >  
> > +static void veth_xdp_get_pp(struct xdp_buff *xdp)
> > +{
> > +	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
> > +	int i;
> > +
> > +	page_pool_ref_page(virt_to_page(xdp->data));
> > +	if (likely(!xdp_buff_has_frags(xdp)))
> > +		return;
> > +
> > +	for (i = 0; i < sinfo->nr_frags; i++) {
> > +		skb_frag_t *frag = &sinfo->frags[i];
> > +
> > +		page_pool_ref_page(netmem_to_page(frag->netmem));
> > +	}
> > +}
> > +
> > +static void veth_xdp_get(struct xdp_buff *xdp)
> > +{
> > +	xdp->rxq->mem.type == MEM_TYPE_PAGE_POOL ?
> > +		veth_xdp_get_pp(xdp) : veth_xdp_get_shared(xdp);
> > +}
> > +
> >  static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> >  					struct xdp_buff *xdp,
> >  					struct sk_buff **pskb)
> >  {
> >  	struct sk_buff *skb = *pskb;
> > -	u32 frame_sz;
> >  
> >  	if (skb_shared(skb) || skb_head_is_locked(skb) ||
> >  	    skb_shinfo(skb)->nr_frags ||
> > @@ -762,19 +783,9 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> >  		skb = *pskb;
> >  	}
> >  
> > -	/* SKB "head" area always have tailroom for skb_shared_info */
> > -	frame_sz = skb_end_pointer(skb) - skb->head;
> > -	frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > -	xdp_init_buff(xdp, frame_sz, &rq->xdp_rxq);
> > -	xdp_prepare_buff(xdp, skb->head, skb_headroom(skb),
> > -			 skb_headlen(skb), true);
> > +	__skb_pull(*pskb, skb->data - skb_mac_header(skb));
> 
> veth_xdp_rcv_skb() does:
> 
> 	__skb_push(skb, skb->data - skb_mac_header(skb));
> 	if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
> 
> so how about just getting rid of that push instead of doing the opposite
> pull straight after? :)

Hi Toke,

I believe this is done so that we get a proper headroom representation,
which is needed for the XDP_PACKET_HEADROOM comparison. Maybe we could be
smarter here and, for example, subtract the mac header length? However, I
wanted to preserve the old behavior.
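
For reference, the comparison I mean sits at the top of
veth_convert_skb_to_xdp_buff() and looks roughly like this (paraphrased
from the current code):

	if (skb_shared(skb) || skb_head_is_locked(skb) ||
	    skb_shinfo(skb)->nr_frags ||
	    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
		if (skb_pp_cow_data(rq->page_pool, pskb, XDP_PACKET_HEADROOM))
			goto drop;
		...
	}

so skb->data has to sit at the mac header for the headroom math to mean
what XDP expects.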

Thanks for the review!
Maciej

> 
> -Toke
> 
