From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: <bpf@vger.kernel.org>, <ast@kernel.org>, <daniel@iogearbox.net>,
<hawk@kernel.org>, <ilias.apalodimas@linaro.org>,
<lorenzo@kernel.org>, <kuba@kernel.org>, <netdev@vger.kernel.org>,
<magnus.karlsson@intel.com>, <andrii@kernel.org>,
<stfomichev@gmail.com>, <aleksander.lobakin@intel.com>
Subject: Re: [PATCH v2 bpf 2/2] veth: update mem type in xdp_buff
Date: Fri, 17 Oct 2025 18:52:50 +0200 [thread overview]
Message-ID: <aPJ0YqfH+pdSIbVS@boxer> (raw)
In-Reply-To: <87a51pij2l.fsf@toke.dk>
On Fri, Oct 17, 2025 at 06:33:54PM +0200, Toke Høiland-Jørgensen wrote:
> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:
>
> > Veth calls skb_pp_cow_data() which makes the underlying memory to
> > originate from system page_pool. For CONFIG_DEBUG_VM=y and XDP program
> > that uses bpf_xdp_adjust_tail(), following splat was observed:
> >
> > [ 32.204881] BUG: Bad page state in process test_progs pfn:11c98b
> > [ 32.207167] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11c98b
> > [ 32.210084] flags: 0x1fffe0000000000(node=0|zone=1|lastcpupid=0x7fff)
> > [ 32.212493] raw: 01fffe0000000000 dead000000000040 ff11000123c9b000 0000000000000000
> > [ 32.218056] raw: 0000000000000000 0000000000000001 00000000ffffffff 0000000000000000
> > [ 32.220900] page dumped because: page_pool leak
> > [ 32.222636] Modules linked in: bpf_testmod(O) bpf_preload
> > [ 32.224632] CPU: 6 UID: 0 PID: 3612 Comm: test_progs Tainted: G O 6.17.0-rc5-gfec474d29325 #6969 PREEMPT
> > [ 32.224638] Tainted: [O]=OOT_MODULE
> > [ 32.224639] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> > [ 32.224641] Call Trace:
> > [ 32.224644] <IRQ>
> > [ 32.224646] dump_stack_lvl+0x4b/0x70
> > [ 32.224653] bad_page.cold+0xbd/0xe0
> > [ 32.224657] __free_frozen_pages+0x838/0x10b0
> > [ 32.224660] ? skb_pp_cow_data+0x782/0xc30
> > [ 32.224665] bpf_xdp_shrink_data+0x221/0x530
> > [ 32.224668] ? skb_pp_cow_data+0x6d1/0xc30
> > [ 32.224671] bpf_xdp_adjust_tail+0x598/0x810
> > [ 32.224673] ? xsk_destruct_skb+0x321/0x800
> > [ 32.224678] bpf_prog_004ac6bb21de57a7_xsk_xdp_adjust_tail+0x52/0xd6
> > [ 32.224681] veth_xdp_rcv_skb+0x45d/0x15a0
> > [ 32.224684] ? get_stack_info_noinstr+0x16/0xe0
> > [ 32.224688] ? veth_set_channels+0x920/0x920
> > [ 32.224691] ? get_stack_info+0x2f/0x80
> > [ 32.224693] ? unwind_next_frame+0x3af/0x1df0
> > [ 32.224697] veth_xdp_rcv.constprop.0+0x38a/0xbe0
> > [ 32.224700] ? common_startup_64+0x13e/0x148
> > [ 32.224703] ? veth_xdp_rcv_one+0xcd0/0xcd0
> > [ 32.224706] ? stack_trace_save+0x84/0xa0
> > [ 32.224709] ? stack_depot_save_flags+0x28/0x820
> > [ 32.224713] ? __resched_curr.constprop.0+0x332/0x3b0
> > [ 32.224716] ? timerqueue_add+0x217/0x320
> > [ 32.224719] veth_poll+0x115/0x5e0
> > [ 32.224722] ? veth_xdp_rcv.constprop.0+0xbe0/0xbe0
> > [ 32.224726] ? update_load_avg+0x1cb/0x12d0
> > [ 32.224730] ? update_cfs_group+0x121/0x2c0
> > [ 32.224733] __napi_poll+0xa0/0x420
> > [ 32.224736] net_rx_action+0x901/0xe90
> > [ 32.224740] ? run_backlog_napi+0x50/0x50
> > [ 32.224743] ? clockevents_program_event+0x1cc/0x280
> > [ 32.224746] ? hrtimer_interrupt+0x31e/0x7c0
> > [ 32.224749] handle_softirqs+0x151/0x430
> > [ 32.224752] do_softirq+0x3f/0x60
> > [ 32.224755] </IRQ>
> >
> > It's because xdp_rxq with mem model set to MEM_TYPE_PAGE_SHARED was used
> > when initializing xdp_buff.
> >
> > Fix this by using new helper xdp_convert_skb_to_buff() that, besides
> > init/prepare xdp_buff, will check if page used for linear part of
> > xdp_buff comes from page_pool. We assume that linear data and frags will
> > have same memory provider as currently XDP API does not provide us a way
> > to distinguish it (the mem model is registered for *whole* Rx queue and
> > here we speak about single buffer granularity).
> >
> > In order to meet expected skb layout by new helper, pull the mac header
> > before conversion from skb to xdp_buff.
> >
> > However, that is not enough as before releasing xdp_buff out of veth via
> > XDP_{TX,REDIRECT}, mem type on xdp_rxq associated with xdp_buff is
> > restored to its original model. We need to respect previous setting at
> > least until buff is converted to frame, as frame carries the mem_type.
> > Add a page_pool variant of veth_xdp_get() so that we avoid refcount
> > underflow when draining page frag.
> >
> > Fixes: 0ebab78cbcbf ("net: veth: add page_pool for page recycling")
> > Reported-by: Alexei Starovoitov <ast@kernel.org>
> > Closes: https://lore.kernel.org/bpf/CAADnVQ+bBofJDfieyOYzSmSujSfJwDTQhiz3aJw7hE+4E2_iPA@mail.gmail.com/
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> > drivers/net/veth.c | 43 +++++++++++++++++++++++++++----------------
> > 1 file changed, 27 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> > index a3046142cb8e..eeeee7bba685 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> > @@ -733,7 +733,7 @@ static void veth_xdp_rcv_bulk_skb(struct veth_rq *rq, void **frames,
> > }
> > }
> >
> > -static void veth_xdp_get(struct xdp_buff *xdp)
> > +static void veth_xdp_get_shared(struct xdp_buff *xdp)
> > {
> > struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
> > int i;
> > @@ -746,12 +746,33 @@ static void veth_xdp_get(struct xdp_buff *xdp)
> > __skb_frag_ref(&sinfo->frags[i]);
> > }
> >
> > +static void veth_xdp_get_pp(struct xdp_buff *xdp)
> > +{
> > + struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
> > + int i;
> > +
> > + page_pool_ref_page(virt_to_page(xdp->data));
> > + if (likely(!xdp_buff_has_frags(xdp)))
> > + return;
> > +
> > + for (i = 0; i < sinfo->nr_frags; i++) {
> > + skb_frag_t *frag = &sinfo->frags[i];
> > +
> > + page_pool_ref_page(netmem_to_page(frag->netmem));
> > + }
> > +}
> > +
> > +static void veth_xdp_get(struct xdp_buff *xdp)
> > +{
> > + xdp->rxq->mem.type == MEM_TYPE_PAGE_POOL ?
> > + veth_xdp_get_pp(xdp) : veth_xdp_get_shared(xdp);
> > +}
> > +
> > static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> > struct xdp_buff *xdp,
> > struct sk_buff **pskb)
> > {
> > struct sk_buff *skb = *pskb;
> > - u32 frame_sz;
> >
> > if (skb_shared(skb) || skb_head_is_locked(skb) ||
> > skb_shinfo(skb)->nr_frags ||
> > @@ -762,19 +783,9 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
> > skb = *pskb;
> > }
> >
> > - /* SKB "head" area always have tailroom for skb_shared_info */
> > - frame_sz = skb_end_pointer(skb) - skb->head;
> > - frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > - xdp_init_buff(xdp, frame_sz, &rq->xdp_rxq);
> > - xdp_prepare_buff(xdp, skb->head, skb_headroom(skb),
> > - skb_headlen(skb), true);
> > + __skb_pull(*pskb, skb->data - skb_mac_header(skb));
>
> veth_xdp_rcv_skb() does:
>
> __skb_push(skb, skb->data - skb_mac_header(skb));
> if (veth_convert_skb_to_xdp_buff(rq, xdp, &skb))
>
> so how about just getting rid of that push instead of doing the opposite
> pull straight after? :)
Hi Toke,
I believe this is done so we get a proper headroom representation which is
needed for XDP_PACKET_HEADROOM comparison. Maybe we could be smarter here
and for example subtract mac header length? However I wanted to preserve
old behavior.
Thanks for review!
Maciej
>
> -Toke
>
next prev parent reply other threads:[~2025-10-17 16:53 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-17 14:31 [PATCH v2 bpf 0/2] xdp: fix page_pool leaks Maciej Fijalkowski
2025-10-17 14:31 ` [PATCH v2 bpf 1/2] xdp: update xdp_rxq_info's mem type in XDP generic hook Maciej Fijalkowski
2025-10-17 16:45 ` Toke Høiland-Jørgensen
2025-10-20 11:20 ` Jesper Dangaard Brouer
2025-10-20 15:36 ` Alexander Lobakin
2025-10-20 17:53 ` Maciej Fijalkowski
2025-10-22 1:01 ` Jakub Kicinski
2025-10-22 11:22 ` Maciej Fijalkowski
2025-10-22 14:04 ` Jakub Kicinski
2025-10-17 14:31 ` [PATCH v2 bpf 2/2] veth: update mem type in xdp_buff Maciej Fijalkowski
2025-10-17 16:33 ` Toke Høiland-Jørgensen
2025-10-17 16:52 ` Maciej Fijalkowski [this message]
2025-10-17 17:51 ` Toke Høiland-Jørgensen
2025-10-17 18:12 ` Toke Høiland-Jørgensen
2025-10-17 20:08 ` Maciej Fijalkowski
2025-10-20 10:25 ` Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aPJ0YqfH+pdSIbVS@boxer \
--to=maciej.fijalkowski@intel.com \
--cc=aleksander.lobakin@intel.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=hawk@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=kuba@kernel.org \
--cc=lorenzo@kernel.org \
--cc=magnus.karlsson@intel.com \
--cc=netdev@vger.kernel.org \
--cc=stfomichev@gmail.com \
--cc=toke@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.