* [PATCH bpf 0/2] xdp: fix page_pool leaks
@ 2025-10-03 14:02 Maciej Fijalkowski
2025-10-03 14:02 ` [PATCH bpf 1/2] xdp: update xdp_rxq_info's mem type in XDP generic hook Maciej Fijalkowski
2025-10-03 14:02 ` [PATCH bpf 2/2] veth: update mem type in xdp_buff Maciej Fijalkowski
0 siblings, 2 replies; 13+ messages in thread
From: Maciej Fijalkowski @ 2025-10-03 14:02 UTC (permalink / raw)
To: bpf, ast, daniel, hawk, ilias.apalodimas, toke, lorenzo
Cc: netdev, magnus.karlsson, andrii, stfomichev, aleksander.lobakin,
Maciej Fijalkowski
Hi,
this set fixes page_pool leaks which happen when a fragment is released
within an XDP program while the memory type is set incorrectly. The
reports come from syzbot and the AF_XDP test suite.
This work was started by Octavian Purdila; below you can find links to
the two previous revisions:
v2:
https://lore.kernel.org/bpf/20251001074704.2817028-1-tavip@google.com/
- use a local xdp_rxq_info structure and update mem.type instead of
skipping the page_pool path when the netdev xdp_rxq_info is not set to
MEM_TYPE_PAGE_POOL (which is currently always the case)
v1:
https://lore.kernel.org/netdev/20250924060843.2280499-1-tavip@google.com/
In this approach we use a small helper built around
page_pool_page_is_pp(), which is enough to set the correct mem type in
xdp_rxq_info.
Thanks!
Maciej Fijalkowski (2):
xdp: update xdp_rxq_info's mem type in XDP generic hook
veth: update mem type in xdp_buff
drivers/net/veth.c | 2 ++
include/net/xdp.h | 7 +++++++
net/core/dev.c | 2 ++
3 files changed, 11 insertions(+)
--
2.43.0
* [PATCH bpf 1/2] xdp: update xdp_rxq_info's mem type in XDP generic hook
2025-10-03 14:02 [PATCH bpf 0/2] xdp: fix page_pool leaks Maciej Fijalkowski
@ 2025-10-03 14:02 ` Maciej Fijalkowski
2025-10-03 17:29 ` Alexei Starovoitov
2025-10-03 14:02 ` [PATCH bpf 2/2] veth: update mem type in xdp_buff Maciej Fijalkowski
1 sibling, 1 reply; 13+ messages in thread
From: Maciej Fijalkowski @ 2025-10-03 14:02 UTC (permalink / raw)
To: bpf, ast, daniel, hawk, ilias.apalodimas, toke, lorenzo
Cc: netdev, magnus.karlsson, andrii, stfomichev, aleksander.lobakin,
Maciej Fijalkowski, syzbot+ff145014d6b0ce64a173, Ihor Solodrai,
Octavian Purdila
Currently, the generic XDP hook uses xdp_rxq_info from netstack Rx queues,
which do not have their XDP memory model registered. There is a case when
an XDP program calls the bpf_xdp_adjust_tail() BPF helper in a way that
releases the underlying memory: this happens when the shrink consumes
enough bytes and the XDP buffer has fragments. For this action the memory
model knowledge passed to the XDP program is crucial, so that the core can
call a suitable function for freeing/recycling the page.
For netstack queues it defaults to MEM_TYPE_PAGE_SHARED (0) due to the
lack of mem model registration. The problem we're fixing here arises when
the kernel has copied the skb to a new buffer backed by the system
page_pool and the XDP buffer is built around it. Then, when
bpf_xdp_adjust_tail() calls __xdp_return(), it acts incorrectly because
the mem type is not set to MEM_TYPE_PAGE_POOL, and a page is leaked.
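For reference, the dispatch in question looks roughly like the simplified
sketch below (illustration only, not the exact kernel code - the real
__xdp_return() also handles e.g. MEM_TYPE_XSK_BUFF_POOL and goes through
netmem helpers):

static void xdp_return_dispatch_sketch(void *data, enum xdp_mem_type mem_type)
{
	struct page *page = virt_to_page(data);

	switch (mem_type) {
	case MEM_TYPE_PAGE_POOL:
		/* recycle the page back to its owning page_pool */
		page_pool_put_full_page(page->pp, page, false);
		break;
	case MEM_TYPE_PAGE_SHARED:
		/* plain page-fragment free; for a page_pool backed page
		 * this is the leak described above
		 */
		page_frag_free(data);
		break;
	default:
		put_page(page);
		break;
	}
}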
To address this, introduce a small helper, xdp_update_mem_type(), that
can also be used at other call sites, such as veth, which are exposed to
this problem as well. Here we call it right before executing the XDP
program in the generic XDP hook.
This problem was triggered by syzbot as well as by the AF_XDP test suite,
which is about to be integrated into BPF CI.
Reported-by: syzbot+ff145014d6b0ce64a173@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6756c37b.050a0220.a30f1.019a.GAE@google.com/
Fixes: e6d5dbdd20aa ("xdp: add multi-buff support for xdp running in generic mode")
Tested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Co-developed-by: Octavian Purdila <tavip@google.com>
Signed-off-by: Octavian Purdila <tavip@google.com> # whole analysis, testing, initiating a fix
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> # commit msg and proposed more robust fix
---
include/net/xdp.h | 7 +++++++
net/core/dev.c | 2 ++
2 files changed, 9 insertions(+)
diff --git a/include/net/xdp.h b/include/net/xdp.h
index f288c348a6c1..5568e41cc191 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -336,6 +336,13 @@ xdp_update_skb_shared_info(struct sk_buff *skb, u8 nr_frags,
skb->pfmemalloc |= pfmemalloc;
}
+static inline void
+xdp_update_mem_type(struct xdp_buff *xdp)
+{
+ xdp->rxq->mem.type = page_pool_page_is_pp(virt_to_page(xdp->data)) ?
+ MEM_TYPE_PAGE_POOL : MEM_TYPE_PAGE_SHARED;
+}
+
/* Avoids inlining WARN macro in fast-path */
void xdp_warn(const char *msg, const char *func, const int line);
#define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)
diff --git a/net/core/dev.c b/net/core/dev.c
index 93a25d87b86b..076cd4a4b73f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5269,6 +5269,8 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp,
orig_bcast = is_multicast_ether_addr_64bits(eth->h_dest);
orig_eth_type = eth->h_proto;
+ xdp_update_mem_type(xdp);
+
act = bpf_prog_run_xdp(xdp_prog, xdp);
/* check if bpf_xdp_adjust_head was used */
--
2.43.0
* [PATCH bpf 2/2] veth: update mem type in xdp_buff
2025-10-03 14:02 [PATCH bpf 0/2] xdp: fix page_pool leaks Maciej Fijalkowski
2025-10-03 14:02 ` [PATCH bpf 1/2] xdp: update xdp_rxq_info's mem type in XDP generic hook Maciej Fijalkowski
@ 2025-10-03 14:02 ` Maciej Fijalkowski
2025-10-03 23:10 ` Jakub Kicinski
1 sibling, 1 reply; 13+ messages in thread
From: Maciej Fijalkowski @ 2025-10-03 14:02 UTC (permalink / raw)
To: bpf, ast, daniel, hawk, ilias.apalodimas, toke, lorenzo
Cc: netdev, magnus.karlsson, andrii, stfomichev, aleksander.lobakin,
Maciej Fijalkowski
Veth calls skb_pp_cow_data(), which makes the underlying memory originate
from the system page_pool. With CONFIG_DEBUG_VM=y and an XDP program that
uses bpf_xdp_adjust_tail(), the following splat was observed:
[ 32.204881] BUG: Bad page state in process test_progs pfn:11c98b
[ 32.207167] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11c98b
[ 32.210084] flags: 0x1fffe0000000000(node=0|zone=1|lastcpupid=0x7fff)
[ 32.212493] raw: 01fffe0000000000 dead000000000040 ff11000123c9b000 0000000000000000
[ 32.218056] raw: 0000000000000000 0000000000000001 00000000ffffffff 0000000000000000
[ 32.220900] page dumped because: page_pool leak
[ 32.222636] Modules linked in: bpf_testmod(O) bpf_preload
[ 32.224632] CPU: 6 UID: 0 PID: 3612 Comm: test_progs Tainted: G O 6.17.0-rc5-gfec474d29325 #6969 PREEMPT
[ 32.224638] Tainted: [O]=OOT_MODULE
[ 32.224639] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 32.224641] Call Trace:
[ 32.224644] <IRQ>
[ 32.224646] dump_stack_lvl+0x4b/0x70
[ 32.224653] bad_page.cold+0xbd/0xe0
[ 32.224657] __free_frozen_pages+0x838/0x10b0
[ 32.224660] ? skb_pp_cow_data+0x782/0xc30
[ 32.224665] bpf_xdp_shrink_data+0x221/0x530
[ 32.224668] ? skb_pp_cow_data+0x6d1/0xc30
[ 32.224671] bpf_xdp_adjust_tail+0x598/0x810
[ 32.224673] ? xsk_destruct_skb+0x321/0x800
[ 32.224678] bpf_prog_004ac6bb21de57a7_xsk_xdp_adjust_tail+0x52/0xd6
[ 32.224681] veth_xdp_rcv_skb+0x45d/0x15a0
[ 32.224684] ? get_stack_info_noinstr+0x16/0xe0
[ 32.224688] ? veth_set_channels+0x920/0x920
[ 32.224691] ? get_stack_info+0x2f/0x80
[ 32.224693] ? unwind_next_frame+0x3af/0x1df0
[ 32.224697] veth_xdp_rcv.constprop.0+0x38a/0xbe0
[ 32.224700] ? common_startup_64+0x13e/0x148
[ 32.224703] ? veth_xdp_rcv_one+0xcd0/0xcd0
[ 32.224706] ? stack_trace_save+0x84/0xa0
[ 32.224709] ? stack_depot_save_flags+0x28/0x820
[ 32.224713] ? __resched_curr.constprop.0+0x332/0x3b0
[ 32.224716] ? timerqueue_add+0x217/0x320
[ 32.224719] veth_poll+0x115/0x5e0
[ 32.224722] ? veth_xdp_rcv.constprop.0+0xbe0/0xbe0
[ 32.224726] ? update_load_avg+0x1cb/0x12d0
[ 32.224730] ? update_cfs_group+0x121/0x2c0
[ 32.224733] __napi_poll+0xa0/0x420
[ 32.224736] net_rx_action+0x901/0xe90
[ 32.224740] ? run_backlog_napi+0x50/0x50
[ 32.224743] ? clockevents_program_event+0x1cc/0x280
[ 32.224746] ? hrtimer_interrupt+0x31e/0x7c0
[ 32.224749] handle_softirqs+0x151/0x430
[ 32.224752] do_softirq+0x3f/0x60
[ 32.224755] </IRQ>
This is because an xdp_rxq with its mem model set to MEM_TYPE_PAGE_SHARED
was used when initializing the xdp_buff.
Fix this by using the new helper xdp_update_mem_type(), which checks
whether the page used for the linear part of the xdp_buff comes from a
page_pool. We assume that the linear data and the frags have the same
memory provider, as the current XDP API does not provide a way to
distinguish them (the mem model is registered for the *whole* Rx queue,
while here we are dealing with single-buffer granularity).
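For completeness, an XDP program along these lines is enough to trigger it
once the tail shrink reaches into the last fragment (an illustrative sketch
only, not the exact selftest program; the -512 delta is arbitrary):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp.frags")
int xsk_xdp_adjust_tail(struct xdp_md *ctx)
{
	/* Shrink by more than the last fragment holds so that the kernel
	 * has to release that fragment's page via __xdp_return().
	 */
	if (bpf_xdp_adjust_tail(ctx, -512))
		return XDP_DROP;

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";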
Fixes: 0ebab78cbcbf ("net: veth: add page_pool for page recycling")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Closes: https://lore.kernel.org/bpf/CAADnVQ+bBofJDfieyOYzSmSujSfJwDTQhiz3aJw7hE+4E2_iPA@mail.gmail.com/
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
drivers/net/veth.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index a3046142cb8e..2ccc40e57d12 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -814,6 +814,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
orig_data = xdp->data;
orig_data_end = xdp->data_end;
+ xdp_update_mem_type(xdp);
+
act = bpf_prog_run_xdp(xdp_prog, xdp);
switch (act) {
--
2.43.0
* Re: [PATCH bpf 1/2] xdp: update xdp_rxq_info's mem type in XDP generic hook
2025-10-03 14:02 ` [PATCH bpf 1/2] xdp: update xdp_rxq_info's mem type in XDP generic hook Maciej Fijalkowski
@ 2025-10-03 17:29 ` Alexei Starovoitov
2025-10-07 19:39 ` Maciej Fijalkowski
0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2025-10-03 17:29 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
Ilias Apalodimas, Toke Høiland-Jørgensen,
Lorenzo Bianconi, Network Development, Karlsson, Magnus,
Andrii Nakryiko, Stanislav Fomichev, Alexander Lobakin,
syzbot+ff145014d6b0ce64a173, Ihor Solodrai, Octavian Purdila
On Fri, Oct 3, 2025 at 7:03 AM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> Currently, generic XDP hook uses xdp_rxq_info from netstack Rx queues
> which do not have its XDP memory model registered. There is a case when
> XDP program calls bpf_xdp_adjust_tail() BPF helper that releases
> underlying memory. This happens when it consumes enough amount of bytes
> and when XDP buffer has fragments. For this action the memory model
> knowledge passed to XDP program is crucial so that core can call
> suitable function for freeing/recycling the page.
>
> For netstack queues it defaults to MEM_TYPE_PAGE_SHARED (0) due to lack
> of mem model registration. The problem we're fixing here is when kernel
> copied the skb to new buffer backed by system's page_pool and XDP buffer
> is built around it. Then when bpf_xdp_adjust_tail() calls
> __xdp_return(), it acts incorrectly due to mem type not being set to
> MEM_TYPE_PAGE_POOL and causes a page leak.
>
> For this purpose introduce a small helper, xdp_update_mem_type(), that
> could be used on other callsites such as veth which are open to this
> problem as well. Here we call it right before executing XDP program in
> generic XDP hook.
>
> This problem was triggered by syzbot as well as AF_XDP test suite which
> is about to be integrated to BPF CI.
>
> Reported-by: syzbot+ff145014d6b0ce64a173@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/netdev/6756c37b.050a0220.a30f1.019a.GAE@google.com/
> Fixes: e6d5dbdd20aa ("xdp: add multi-buff support for xdp running in generic mode")
> Tested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
> Co-developed-by: Octavian Purdila <tavip@google.com>
> Signed-off-by: Octavian Purdila <tavip@google.com> # whole analysis, testing, initiating a fix
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> # commit msg and proposed more robust fix
> ---
> include/net/xdp.h | 7 +++++++
> net/core/dev.c | 2 ++
> 2 files changed, 9 insertions(+)
>
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index f288c348a6c1..5568e41cc191 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -336,6 +336,13 @@ xdp_update_skb_shared_info(struct sk_buff *skb, u8 nr_frags,
> skb->pfmemalloc |= pfmemalloc;
> }
>
> +static inline void
> +xdp_update_mem_type(struct xdp_buff *xdp)
> +{
> + xdp->rxq->mem.type = page_pool_page_is_pp(virt_to_page(xdp->data)) ?
> + MEM_TYPE_PAGE_POOL : MEM_TYPE_PAGE_SHARED;
> +}
AI says that it's racy and I think it's onto something.
See
https://github.com/kernel-patches/bpf/actions/runs/18224704286/job/51892959919
and
https://github.com/kernel-patches/bpf/actions/runs/18224704286/job/51892959937
click on 'Check review-inline.txt'
* Re: [PATCH bpf 2/2] veth: update mem type in xdp_buff
2025-10-03 14:02 ` [PATCH bpf 2/2] veth: update mem type in xdp_buff Maciej Fijalkowski
@ 2025-10-03 23:10 ` Jakub Kicinski
2025-10-07 14:59 ` Maciej Fijalkowski
0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2025-10-03 23:10 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: bpf, ast, daniel, hawk, ilias.apalodimas, toke, lorenzo, netdev,
magnus.karlsson, andrii, stfomichev, aleksander.lobakin
On Fri, 3 Oct 2025 16:02:43 +0200 Maciej Fijalkowski wrote:
> + xdp_update_mem_type(xdp);
> +
> act = bpf_prog_run_xdp(xdp_prog, xdp);
The new helper doesn't really express what's going on. Developers
won't know what we are updating mem_type() to, and why. Right?
My thinking was that we should try to bake the rxq into "conversion"
APIs; draft diff below. It's very much unfinished and I'm probably missing
some cases, but hopefully it gets the point across:
diff --git a/include/net/xdp.h b/include/net/xdp.h
index aa742f413c35..e7f75d551d8f 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -384,9 +384,21 @@ struct sk_buff *xdp_build_skb_from_frame(struct xdp_frame *xdpf,
struct net_device *dev);
struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf);
+/* Initialize rxq struct on the stack for processing @frame.
+ * Not necessary when processing in context of a driver which has a real rxq,
+ * and passes it to xdp_convert_frame_to_buff().
+ */
+static inline
+void xdp_rxq_prep_on_stack(const struct xdp_frame *frame,
+ struct xdp_rxq_info *rxq)
+{
+ rxq->dev = xdpf->dev_rx;
+ /* TODO: report queue_index to xdp_rxq_info */
+}
+
static inline
void xdp_convert_frame_to_buff(const struct xdp_frame *frame,
- struct xdp_buff *xdp)
+ struct xdp_buff *xdp, struct xdp_rxq_info *rxq)
{
xdp->data_hard_start = frame->data - frame->headroom - sizeof(*frame);
xdp->data = frame->data;
@@ -394,6 +406,22 @@ void xdp_convert_frame_to_buff(const struct xdp_frame *frame,
xdp->data_meta = frame->data - frame->metasize;
xdp->frame_sz = frame->frame_sz;
xdp->flags = frame->flags;
+
+ rxq->mem.type = xdpf->mem_type;
+}
+
+/* Initialize an xdp_buff from an skb.
+ *
+ * Note: if skb has frags skb_cow_data_for_xdp() must be called first,
+ * or caller must otherwise guarantee that the frags come from a page pool
+ */
+static inline
+void xdp_convert_skb_to_buff(const struct xdp_frame *frame,
+ struct xdp_buff *xdp, struct xdp_rxq_info *rxq)
+{
+ // copy the init_buff / prep_buff here
+
+ rxq->mem.type = MEM_TYPE_PAGE_POOL; /* see note above the function */
}
static inline
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 703e5df1f4ef..60ba15bbec59 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -193,11 +193,8 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
u32 act;
int err;
- rxq.dev = xdpf->dev_rx;
- rxq.mem.type = xdpf->mem_type;
- /* TODO: report queue_index to xdp_rxq_info */
-
- xdp_convert_frame_to_buff(xdpf, &xdp);
+ xdp_rxq_prep_on_stack(xdpf, &rxq);
+ xdp_convert_frame_to_buff(xdpf, &xdp, &rxq);
act = bpf_prog_run_xdp(rcpu->prog, &xdp);
switch (act) {
* Re: [PATCH bpf 2/2] veth: update mem type in xdp_buff
2025-10-03 23:10 ` Jakub Kicinski
@ 2025-10-07 14:59 ` Maciej Fijalkowski
2025-10-08 1:11 ` Jakub Kicinski
0 siblings, 1 reply; 13+ messages in thread
From: Maciej Fijalkowski @ 2025-10-07 14:59 UTC (permalink / raw)
To: Jakub Kicinski
Cc: bpf, ast, daniel, hawk, ilias.apalodimas, toke, lorenzo, netdev,
magnus.karlsson, andrii, stfomichev, aleksander.lobakin
On Fri, Oct 03, 2025 at 04:10:26PM -0700, Jakub Kicinski wrote:
> On Fri, 3 Oct 2025 16:02:43 +0200 Maciej Fijalkowski wrote:
> > + xdp_update_mem_type(xdp);
> > +
> > act = bpf_prog_run_xdp(xdp_prog, xdp);
>
> The new helper doesn't really express what's going on. Developers
> won't know what are we updating mem_type() to, and why. Right?
Hey, sorry for the delay.
Agreed that it lacks a sufficient comment explaining the purpose behind it.
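Something along these lines maybe - same helper as in the patch, just with
a comment spelling out the intent (wording to be polished in v2):

/* Packets handed to an XDP program outside of a driver's native Rx path
 * (generic hook, veth skb path) may or may not sit in page_pool memory,
 * e.g. after skb_pp_cow_data(). Derive the memory model from the page
 * backing the linear area so that a later __xdp_return(), e.g. from
 * bpf_xdp_adjust_tail(), frees/recycles the memory correctly.
 */
static inline void
xdp_update_mem_type(struct xdp_buff *xdp)
{
	xdp->rxq->mem.type = page_pool_page_is_pp(virt_to_page(xdp->data)) ?
			     MEM_TYPE_PAGE_POOL : MEM_TYPE_PAGE_SHARED;
}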
>
> My thinking was that we should try to bake the rxq into "conversion"
> APIs, draft diff below, very much unfinished and I'm probably missing
> some cases but hopefully gets the point across:
That is not related IMHO. The bugs being fixed have existing rxqs. It's
just the mem type that needs to be correctly set per packet.
Plus we do *not* convert a frame to a buff here, which was your initial (on
point) comment WRT on-stack rxqs. Traffic comes as skbs from the peer's
ndo_start_xmit(). What you're referring to is the case where the source is
an xdp_frame (in veth this is when ndo_xdp_xmit or XDP_TX is used).
However, the problem pointed out by the AI (!) is something we should fix,
as for XDP_{TX,REDIRECT} the xdp_rxq_info is overwritten and the mem type
update is lost.
>
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index aa742f413c35..e7f75d551d8f 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -384,9 +384,21 @@ struct sk_buff *xdp_build_skb_from_frame(struct xdp_frame *xdpf,
> struct net_device *dev);
> struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf);
>
> +/* Initialize rxq struct on the stack for processing @frame.
> + * Not necessary when processing in context of a driver which has a real rxq,
> + * and passes it to xdp_convert_frame_to_buff().
> + */
> +static inline
> +void xdp_rxq_prep_on_stack(const struct xdp_frame *frame,
> + struct xdp_rxq_info *rxq)
> +{
> + rxq->dev = xdpf->dev_rx;
> + /* TODO: report queue_index to xdp_rxq_info */
> +}
> +
> static inline
> void xdp_convert_frame_to_buff(const struct xdp_frame *frame,
> - struct xdp_buff *xdp)
> + struct xdp_buff *xdp, struct xdp_rxq_info *rxq)
> {
> xdp->data_hard_start = frame->data - frame->headroom - sizeof(*frame);
> xdp->data = frame->data;
> @@ -394,6 +406,22 @@ void xdp_convert_frame_to_buff(const struct xdp_frame *frame,
> xdp->data_meta = frame->data - frame->metasize;
> xdp->frame_sz = frame->frame_sz;
> xdp->flags = frame->flags;
> +
> + rxq->mem.type = xdpf->mem_type;
> +}
> +
> +/* Initialize an xdp_buff from an skb.
> + *
> + * Note: if skb has frags skb_cow_data_for_xdp() must be called first,
> + * or caller must otherwise guarantee that the frags come from a page pool
> + */
> +static inline
> +void xdp_convert_skb_to_buff(const struct xdp_frame *frame,
> + struct xdp_buff *xdp, struct xdp_rxq_info *rxq)
I would expect to get an skb as the input here
> +{
> + // copy the init_buff / prep_buff here
> +
> + rxq->mem.type = MEM_TYPE_PAGE_POOL; /* see note above the function */
> }
>
> static inline
> diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
> index 703e5df1f4ef..60ba15bbec59 100644
> --- a/kernel/bpf/cpumap.c
> +++ b/kernel/bpf/cpumap.c
> @@ -193,11 +193,8 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
> u32 act;
> int err;
>
> - rxq.dev = xdpf->dev_rx;
> - rxq.mem.type = xdpf->mem_type;
> - /* TODO: report queue_index to xdp_rxq_info */
> -
> - xdp_convert_frame_to_buff(xdpf, &xdp);
> + xdp_rxq_prep_on_stack(xdpf, &rxq);
> + xdp_convert_frame_to_buff(xdpf, &xdp, &rxq);
>
> act = bpf_prog_run_xdp(rcpu->prog, &xdp);
> switch (act) {
>
>
* Re: [PATCH bpf 1/2] xdp: update xdp_rxq_info's mem type in XDP generic hook
2025-10-03 17:29 ` Alexei Starovoitov
@ 2025-10-07 19:39 ` Maciej Fijalkowski
2025-10-07 20:18 ` Alexei Starovoitov
0 siblings, 1 reply; 13+ messages in thread
From: Maciej Fijalkowski @ 2025-10-07 19:39 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
Ilias Apalodimas, Toke Høiland-Jørgensen,
Lorenzo Bianconi, Network Development, Karlsson, Magnus,
Andrii Nakryiko, Stanislav Fomichev, Alexander Lobakin,
syzbot+ff145014d6b0ce64a173, Ihor Solodrai, Octavian Purdila
On Fri, Oct 03, 2025 at 10:29:08AM -0700, Alexei Starovoitov wrote:
> On Fri, Oct 3, 2025 at 7:03 AM Maciej Fijalkowski
> <maciej.fijalkowski@intel.com> wrote:
> >
> > Currently, generic XDP hook uses xdp_rxq_info from netstack Rx queues
> > which do not have its XDP memory model registered. There is a case when
> > XDP program calls bpf_xdp_adjust_tail() BPF helper that releases
> > underlying memory. This happens when it consumes enough amount of bytes
> > and when XDP buffer has fragments. For this action the memory model
> > knowledge passed to XDP program is crucial so that core can call
> > suitable function for freeing/recycling the page.
> >
> > For netstack queues it defaults to MEM_TYPE_PAGE_SHARED (0) due to lack
> > of mem model registration. The problem we're fixing here is when kernel
> > copied the skb to new buffer backed by system's page_pool and XDP buffer
> > is built around it. Then when bpf_xdp_adjust_tail() calls
> > __xdp_return(), it acts incorrectly due to mem type not being set to
> > MEM_TYPE_PAGE_POOL and causes a page leak.
> >
> > For this purpose introduce a small helper, xdp_update_mem_type(), that
> > could be used on other callsites such as veth which are open to this
> > problem as well. Here we call it right before executing XDP program in
> > generic XDP hook.
> >
> > This problem was triggered by syzbot as well as AF_XDP test suite which
> > is about to be integrated to BPF CI.
> >
> > Reported-by: syzbot+ff145014d6b0ce64a173@syzkaller.appspotmail.com
> > Closes: https://lore.kernel.org/netdev/6756c37b.050a0220.a30f1.019a.GAE@google.com/
> > Fixes: e6d5dbdd20aa ("xdp: add multi-buff support for xdp running in generic mode")
> > Tested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
> > Co-developed-by: Octavian Purdila <tavip@google.com>
> > Signed-off-by: Octavian Purdila <tavip@google.com> # whole analysis, testing, initiating a fix
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> # commit msg and proposed more robust fix
> > ---
> > include/net/xdp.h | 7 +++++++
> > net/core/dev.c | 2 ++
> > 2 files changed, 9 insertions(+)
> >
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index f288c348a6c1..5568e41cc191 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -336,6 +336,13 @@ xdp_update_skb_shared_info(struct sk_buff *skb, u8 nr_frags,
> > skb->pfmemalloc |= pfmemalloc;
> > }
> >
> > +static inline void
> > +xdp_update_mem_type(struct xdp_buff *xdp)
> > +{
> > + xdp->rxq->mem.type = page_pool_page_is_pp(virt_to_page(xdp->data)) ?
> > + MEM_TYPE_PAGE_POOL : MEM_TYPE_PAGE_SHARED;
> > +}
>
> AI says that it's racy and I think it's onto something.
> See
> https://github.com/kernel-patches/bpf/actions/runs/18224704286/job/51892959919
> and
> https://github.com/kernel-patches/bpf/actions/runs/18224704286/job/51892959937
How do we respond to this in the future?
On the first one the AI missed the fact that we run under NAPI, so two
CPUs won't play with the same Rx queue concurrently. The second one is a
valid thing to be fixed.
>
> click on 'Check review-inline.txt'
* Re: [PATCH bpf 1/2] xdp: update xdp_rxq_info's mem type in XDP generic hook
2025-10-07 19:39 ` Maciej Fijalkowski
@ 2025-10-07 20:18 ` Alexei Starovoitov
0 siblings, 0 replies; 13+ messages in thread
From: Alexei Starovoitov @ 2025-10-07 20:18 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
Ilias Apalodimas, Toke Høiland-Jørgensen,
Lorenzo Bianconi, Network Development, Karlsson, Magnus,
Andrii Nakryiko, Stanislav Fomichev, Alexander Lobakin,
syzbot+ff145014d6b0ce64a173, Ihor Solodrai, Octavian Purdila
On Tue, Oct 7, 2025 at 12:39 PM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Fri, Oct 03, 2025 at 10:29:08AM -0700, Alexei Starovoitov wrote:
> > On Fri, Oct 3, 2025 at 7:03 AM Maciej Fijalkowski
> > <maciej.fijalkowski@intel.com> wrote:
> > >
> > > Currently, generic XDP hook uses xdp_rxq_info from netstack Rx queues
> > > which do not have its XDP memory model registered. There is a case when
> > > XDP program calls bpf_xdp_adjust_tail() BPF helper that releases
> > > underlying memory. This happens when it consumes enough amount of bytes
> > > and when XDP buffer has fragments. For this action the memory model
> > > knowledge passed to XDP program is crucial so that core can call
> > > suitable function for freeing/recycling the page.
> > >
> > > For netstack queues it defaults to MEM_TYPE_PAGE_SHARED (0) due to lack
> > > of mem model registration. The problem we're fixing here is when kernel
> > > copied the skb to new buffer backed by system's page_pool and XDP buffer
> > > is built around it. Then when bpf_xdp_adjust_tail() calls
> > > __xdp_return(), it acts incorrectly due to mem type not being set to
> > > MEM_TYPE_PAGE_POOL and causes a page leak.
> > >
> > > For this purpose introduce a small helper, xdp_update_mem_type(), that
> > > could be used on other callsites such as veth which are open to this
> > > problem as well. Here we call it right before executing XDP program in
> > > generic XDP hook.
> > >
> > > This problem was triggered by syzbot as well as AF_XDP test suite which
> > > is about to be integrated to BPF CI.
> > >
> > > Reported-by: syzbot+ff145014d6b0ce64a173@syzkaller.appspotmail.com
> > > Closes: https://lore.kernel.org/netdev/6756c37b.050a0220.a30f1.019a.GAE@google.com/
> > > Fixes: e6d5dbdd20aa ("xdp: add multi-buff support for xdp running in generic mode")
> > > Tested-by: Ihor Solodrai <ihor.solodrai@linux.dev>
> > > Co-developed-by: Octavian Purdila <tavip@google.com>
> > > Signed-off-by: Octavian Purdila <tavip@google.com> # whole analysis, testing, initiating a fix
> > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> # commit msg and proposed more robust fix
> > > ---
> > > include/net/xdp.h | 7 +++++++
> > > net/core/dev.c | 2 ++
> > > 2 files changed, 9 insertions(+)
> > >
> > > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > > index f288c348a6c1..5568e41cc191 100644
> > > --- a/include/net/xdp.h
> > > +++ b/include/net/xdp.h
> > > @@ -336,6 +336,13 @@ xdp_update_skb_shared_info(struct sk_buff *skb, u8 nr_frags,
> > > skb->pfmemalloc |= pfmemalloc;
> > > }
> > >
> > > +static inline void
> > > +xdp_update_mem_type(struct xdp_buff *xdp)
> > > +{
> > > + xdp->rxq->mem.type = page_pool_page_is_pp(virt_to_page(xdp->data)) ?
> > > + MEM_TYPE_PAGE_POOL : MEM_TYPE_PAGE_SHARED;
> > > +}
> >
> > AI says that it's racy and I think it's onto something.
> > See
> > https://github.com/kernel-patches/bpf/actions/runs/18224704286/job/51892959919
> > and
> > https://github.com/kernel-patches/bpf/actions/runs/18224704286/job/51892959937
>
> How do we respond to this in future?
The bot will start sending emails eventually, so developers can
respond directly with email to maintainers.
> On first one AI missed the fact we run under napi so two cpus won't play
> with same rx queue concurrently. The second is valid thing to be fixed.
Good point about napi.
* Re: [PATCH bpf 2/2] veth: update mem type in xdp_buff
2025-10-07 14:59 ` Maciej Fijalkowski
@ 2025-10-08 1:11 ` Jakub Kicinski
2025-10-08 9:22 ` Maciej Fijalkowski
0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2025-10-08 1:11 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: bpf, ast, daniel, hawk, ilias.apalodimas, toke, lorenzo, netdev,
magnus.karlsson, andrii, stfomichev, aleksander.lobakin
On Tue, 7 Oct 2025 16:59:21 +0200 Maciej Fijalkowski wrote:
> > My thinking was that we should try to bake the rxq into "conversion"
> > APIs, draft diff below, very much unfinished and I'm probably missing
> > some cases but hopefully gets the point across:
>
> That is not related IMHO. The bugs being fixed have existing rxqs. It's
> just the mem type that needs to be correctly set per packet.
>
> Plus we do *not* convert frame to buff here which was your initial (on
> point) comment WRT onstack rxqs. Traffic comes as skbs from peer's
> ndo_start_xmit(). What you're referring to is when source is xdp_frame (in
> veth case this is when ndo_xdp_xmit or XDP_TX is used).
I guess we're slipping into a philosophical discussion, but I'd say
that the problem is that the rxq stores part of what is de facto xdp buff
state. It is evacuated into the xdp frame when the frame is constructed,
as the packet is detached from the driver context. We need to reconstitute
it when we convert a frame (skb, or anything else) back into an xdp buff.
xdp_convert_buff_to_frame() and xdp_convert_frame_to_buff() should be
a mirror image of each other, to put it more concisely.
> However the problem pointed out by AI (!) is something we should fix as
> for XDP_{TX,REDIRECT} xdp_rxq_info is overwritten and mem type update is
> lost.
> > +/* Initialize an xdp_buff from an skb.
> > + *
> > + * Note: if skb has frags skb_cow_data_for_xdp() must be called first,
> > + * or caller must otherwise guarantee that the frags come from a page pool
> > + */
> > +static inline
> > +void xdp_convert_skb_to_buff(const struct xdp_frame *frame,
> > + struct xdp_buff *xdp, struct xdp_rxq_info *rxq)
>
> I would expect to get skb as an input here
Joł. Don't nit pick my draft diff :D It's not meant as a working patch.
* Re: [PATCH bpf 2/2] veth: update mem type in xdp_buff
2025-10-08 1:11 ` Jakub Kicinski
@ 2025-10-08 9:22 ` Maciej Fijalkowski
2025-10-08 10:37 ` Maciej Fijalkowski
0 siblings, 1 reply; 13+ messages in thread
From: Maciej Fijalkowski @ 2025-10-08 9:22 UTC (permalink / raw)
To: Jakub Kicinski
Cc: bpf, ast, daniel, hawk, ilias.apalodimas, toke, lorenzo, netdev,
magnus.karlsson, andrii, stfomichev, aleksander.lobakin
On Tue, Oct 07, 2025 at 06:11:53PM -0700, Jakub Kicinski wrote:
> On Tue, 7 Oct 2025 16:59:21 +0200 Maciej Fijalkowski wrote:
> > > My thinking was that we should try to bake the rxq into "conversion"
> > > APIs, draft diff below, very much unfinished and I'm probably missing
> > > some cases but hopefully gets the point across:
> >
> > That is not related IMHO. The bugs being fixed have existing rxqs. It's
> > just the mem type that needs to be correctly set per packet.
> >
> > Plus we do *not* convert frame to buff here which was your initial (on
> > point) comment WRT onstack rxqs. Traffic comes as skbs from peer's
> > ndo_start_xmit(). What you're referring to is when source is xdp_frame (in
> > veth case this is when ndo_xdp_xmit or XDP_TX is used).
>
> I guess we're slipping into a philosophical discussion but I'd say
> that the problem is that rxq stores part of what is de facto xdp buff
> state. It is evacuated into the xdp frame when frame is constructed,
> as packet is detached from driver context. We need to reconstitute it
> > when we convert frame (skb, or anything else) back into an xdp buff.
So let us have the mem type per xdp_buff then. It feels clunky anyway to
change it on the whole rxq on a per-xdp_buff basis. Maybe then everyone
will be happy?
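Roughly something like this - a pure sketch, names made up, not part of
this series:

/* Sketch only: assume struct xdp_buff grows a mem_type field so that
 * helpers like bpf_xdp_adjust_tail() can consult the buffer itself
 * instead of xdp->rxq->mem.type.
 */
static inline void xdp_buff_set_mem_type(struct xdp_buff *xdp,
					 enum xdp_mem_type type)
{
	xdp->mem_type = type;
}

static inline enum xdp_mem_type
xdp_buff_get_mem_type(const struct xdp_buff *xdp)
{
	return xdp->mem_type;
}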
>
> xdp_convert_buff_to_frame() and xdp_convert_frame_to_buff() should be
> a mirror image of each other, to put it more concisely.
>
> > However the problem pointed out by AI (!) is something we should fix as
> > for XDP_{TX,REDIRECT} xdp_rxq_info is overwritten and mem type update is
> > lost.
>
> > > +/* Initialize an xdp_buff from an skb.
> > > + *
> > > + * Note: if skb has frags skb_cow_data_for_xdp() must be called first,
> > > + * or caller must otherwise guarantee that the frags come from a page pool
> > > + */
> > > +static inline
> > > +void xdp_convert_skb_to_buff(const struct xdp_frame *frame,
> > > + struct xdp_buff *xdp, struct xdp_rxq_info *rxq)
> >
> > I would expect to get skb as an input here
>
> Joł. Don't nit pick my draft diff :D It's not meant as a working patch.
Polish sneaked in, so this really got under your skin :D
>
* Re: [PATCH bpf 2/2] veth: update mem type in xdp_buff
2025-10-08 9:22 ` Maciej Fijalkowski
@ 2025-10-08 10:37 ` Maciej Fijalkowski
2025-10-13 23:24 ` Jakub Kicinski
0 siblings, 1 reply; 13+ messages in thread
From: Maciej Fijalkowski @ 2025-10-08 10:37 UTC (permalink / raw)
To: Jakub Kicinski
Cc: bpf, ast, daniel, hawk, ilias.apalodimas, toke, lorenzo, netdev,
magnus.karlsson, andrii, stfomichev, aleksander.lobakin
On Wed, Oct 08, 2025 at 11:22:26AM +0200, Maciej Fijalkowski wrote:
> On Tue, Oct 07, 2025 at 06:11:53PM -0700, Jakub Kicinski wrote:
> > On Tue, 7 Oct 2025 16:59:21 +0200 Maciej Fijalkowski wrote:
> > > > My thinking was that we should try to bake the rxq into "conversion"
> > > > APIs, draft diff below, very much unfinished and I'm probably missing
> > > > some cases but hopefully gets the point across:
> > >
> > > That is not related IMHO. The bugs being fixed have existing rxqs. It's
> > > just the mem type that needs to be correctly set per packet.
> > >
> > > Plus we do *not* convert frame to buff here which was your initial (on
> > > point) comment WRT onstack rxqs. Traffic comes as skbs from peer's
> > > ndo_start_xmit(). What you're referring to is when source is xdp_frame (in
> > > veth case this is when ndo_xdp_xmit or XDP_TX is used).
> >
> > I guess we're slipping into a philosophical discussion but I'd say
> > that the problem is that rxq stores part of what is de facto xdp buff
> > state. It is evacuated into the xdp frame when frame is constructed,
> > as packet is detached from driver context. We need to reconstitute it
> > when we convert frame (skb, or anything else) back into an xdp buff.
>
> So let us have mem type per xdp_buff then. Feels clunky anyways to change
> it on whole rxq on xdp_buff basis. Maybe then everyone will be happy?
...however would we be fine with taking a potential performance hit?
>
> >
> > xdp_convert_buff_to_frame() and xdp_convert_frame_to_buff() should be
> > a mirror image of each other, to put it more concisely.
> >
> > > However the problem pointed out by AI (!) is something we should fix as
> > > for XDP_{TX,REDIRECT} xdp_rxq_info is overwritten and mem type update is
> > > lost.
> >
> > > > +/* Initialize an xdp_buff from an skb.
> > > > + *
> > > > + * Note: if skb has frags skb_cow_data_for_xdp() must be called first,
> > > > + * or caller must otherwise guarantee that the frags come from a page pool
> > > > + */
> > > > +static inline
> > > > +void xdp_convert_skb_to_buff(const struct xdp_frame *frame,
> > > > + struct xdp_buff *xdp, struct xdp_rxq_info *rxq)
> > >
> > > I would expect to get skb as an input here
> >
> > Joł. Don't nit pick my draft diff :D It's not meant as a working patch.
>
> Polish sneaked in so this got really under your skin :D
>
> >
* Re: [PATCH bpf 2/2] veth: update mem type in xdp_buff
2025-10-08 10:37 ` Maciej Fijalkowski
@ 2025-10-13 23:24 ` Jakub Kicinski
2025-10-14 16:53 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2025-10-13 23:24 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: bpf, ast, daniel, hawk, ilias.apalodimas, toke, lorenzo, netdev,
magnus.karlsson, andrii, stfomichev, aleksander.lobakin
On Wed, 8 Oct 2025 12:37:22 +0200 Maciej Fijalkowski wrote:
> > > I guess we're slipping into a philosophical discussion but I'd say
> > > that the problem is that rxq stores part of what is de facto xdp buff
> > > state. It is evacuated into the xdp frame when frame is constructed,
> > > as packet is detached from driver context. We need to reconstitute it
> > > when we convert frame (skb, or anything else) back info an xdp buff.
> >
> > So let us have mem type per xdp_buff then. Feels clunky anyways to change
> > it on whole rxq on xdp_buff basis. Maybe then everyone will be happy?
>
> ...however would we be fine with taking a potential performance hit?
I'd think the perf hit would be a blocker; supposedly it's in the rxq for
a reason. We are updating it per packet in the few places that are
coded up correctly (cpumap), so while it is indeed kinda weird, we're
not making it any worse?
Maybe others disagree. I don't feel super strongly. My gut feeling is
that what I drafted is the best we can do in a fix.
Sorry for the delay, PTO.
* Re: [PATCH bpf 2/2] veth: update mem type in xdp_buff
2025-10-13 23:24 ` Jakub Kicinski
@ 2025-10-14 16:53 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2025-10-14 16:53 UTC (permalink / raw)
To: Jakub Kicinski, Maciej Fijalkowski
Cc: bpf, ast, daniel, hawk, ilias.apalodimas, lorenzo, netdev,
magnus.karlsson, andrii, stfomichev, aleksander.lobakin
Jakub Kicinski <kuba@kernel.org> writes:
> On Wed, 8 Oct 2025 12:37:22 +0200 Maciej Fijalkowski wrote:
>> > > I guess we're slipping into a philosophical discussion but I'd say
>> > > that the problem is that rxq stores part of what is de facto xdp buff
>> > > state. It is evacuated into the xdp frame when frame is constructed,
>> > > as packet is detached from driver context. We need to reconstitute it
>> > > when we convert frame (skb, or anything else) back into an xdp buff.
>> >
>> > So let us have mem type per xdp_buff then. Feels clunky anyways to change
>> > it on whole rxq on xdp_buff basis. Maybe then everyone will be happy?
>>
>> ...however would we be fine with taking a potential performance hit?
>
> I'd think the perf hit will be a blocker, supposedly it's in rxq for
> a reason. We are updating it per packet in the few places that are
> coded up correctly (cpumap) so while it is indeed kinda weird we're
> not making it any worse?
>
> Maybe others disagree. I don't feel super strongly. My gut feeling is
> what I drafted is best we can do in a fix.
I'd tend to agree, although I don't have a good intuition for how much
of a performance hit this would end up being.
-Toke