* [PATCH net] xdp: use multi-buff only if receive queue supports page pool
@ 2025-09-24  6:08 Octavian Purdila
  2025-09-25  0:09 ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Octavian Purdila @ 2025-09-24  6:08 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, horms, ast, daniel, hawk, john.fastabend,
	sdf, uniyu, ahmed.zaki, aleksander.lobakin, toke, lorenzo, netdev,
	bpf, Octavian Purdila, syzbot+ff145014d6b0ce64a173

When a BPF program that supports BPF_F_XDP_HAS_FRAGS issues
bpf_xdp_adjust_tail and a large packet is injected via /dev/net/tun, a
crash occurs due to a bad page state being detected (page_pool leak).
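
For illustration, a frags-aware program of roughly this shape
exercises the shrinking path (sketch only, not the syzkaller
reproducer; the program name and the -4096 delta are arbitrary):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("xdp.frags") /* loaded with BPF_F_XDP_HAS_FRAGS */
  int shrink_tail(struct xdp_md *ctx)
  {
          /* Shrinking a multi-buffer packet releases tail frags based
           * on the receive queue memory type. */
          bpf_xdp_adjust_tail(ctx, -4096);
          return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";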

This is because xdp_buff does not record the type of memory and
instead relies on the netdev receive queue xdp info. Since the TUN/TAP
driver uses the MEM_TYPE_PAGE_SHARED memory model, shrinking will
eventually call page_frag_free. But with the current multi-buff
support for BPF_F_XDP_HAS_FRAGS programs, buffers are allocated via
the page pool.

To fix this issue, check that the receive queue memory model is
MEM_TYPE_PAGE_POOL before using multi-buffs.

Reported-by: syzbot+ff145014d6b0ce64a173@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6756c37b.050a0220.a30f1.019a.GAE@google.com/
Fixes: e6d5dbdd20aa ("xdp: add multi-buff support for xdp running in generic mode")
Signed-off-by: Octavian Purdila <tavip@google.com>
---
 net/core/dev.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 8d49b2198d07..b195ee3068c2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5335,13 +5335,18 @@ static int
 netif_skb_check_for_xdp(struct sk_buff **pskb, const struct bpf_prog *prog)
 {
 	struct sk_buff *skb = *pskb;
+	struct netdev_rx_queue *rxq;
 	int err, hroom, troom;
 
-	local_lock_nested_bh(&system_page_pool.bh_lock);
-	err = skb_cow_data_for_xdp(this_cpu_read(system_page_pool.pool), pskb, prog);
-	local_unlock_nested_bh(&system_page_pool.bh_lock);
-	if (!err)
-		return 0;
+	rxq = netif_get_rxqueue(skb);
+	if (rxq->xdp_rxq.mem.type == MEM_TYPE_PAGE_POOL) {
+		local_lock_nested_bh(&system_page_pool.bh_lock);
+		err = skb_cow_data_for_xdp(this_cpu_read(system_page_pool.pool),
+					   pskb, prog);
+		local_unlock_nested_bh(&system_page_pool.bh_lock);
+		if (!err)
+			return 0;
+	}
 
 	/* In case we have to go down the path and also linearize,
 	 * then lets do the pskb_expand_head() work just once here.
-- 
2.51.0.534.gc79095c0ca-goog



* Re: [PATCH net] xdp: use multi-buff only if receive queue supports page pool
  2025-09-24  6:08 [PATCH net] xdp: use multi-buff only if receive queue supports page pool Octavian Purdila
@ 2025-09-25  0:09 ` Jakub Kicinski
  2025-09-25  7:53   ` Octavian Purdila
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2025-09-25  0:09 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: davem, edumazet, pabeni, horms, ast, daniel, hawk, john.fastabend,
	sdf, uniyu, ahmed.zaki, aleksander.lobakin, toke, lorenzo, netdev,
	bpf, syzbot+ff145014d6b0ce64a173

On Wed, 24 Sep 2025 06:08:42 +0000 Octavian Purdila wrote:
> When a BPF program that supports BPF_F_XDP_HAS_FRAGS is issuing
> bpf_xdp_adjust_tail and a large packet is injected via /dev/net/tun a
> crash occurs due to detecting a bad page state (page_pool leak).
> 
> This is because xdp_buff does not record the type of memory and
> instead relies on the netdev receive queue xdp info. Since the TUN/TAP
> driver is using a MEM_TYPE_PAGE_SHARED memory model buffer, shrinking
> will eventually call page_frag_free. But with current multi-buff
> support for BPF_F_XDP_HAS_FRAGS programs buffers are allocated via the
> page pool.
> 
> To fix this issue check that the receive queue memory mode is of
> MEM_TYPE_PAGE_POOL before using multi-buffs.

This can also happen on veth, right? And veth re-stamps the Rx queues.


* Re: [PATCH net] xdp: use multi-buff only if receive queue supports page pool
  2025-09-25  0:09 ` Jakub Kicinski
@ 2025-09-25  7:53   ` Octavian Purdila
  2025-09-25  9:42     ` Maciej Fijalkowski
  0 siblings, 1 reply; 10+ messages in thread
From: Octavian Purdila @ 2025-09-25  7:53 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, edumazet, pabeni, horms, ast, daniel, hawk, john.fastabend,
	sdf, ahmed.zaki, aleksander.lobakin, toke, lorenzo, netdev, bpf,
	syzbot+ff145014d6b0ce64a173, Kuniyuki Iwashima

On Wed, Sep 24, 2025 at 5:09 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 24 Sep 2025 06:08:42 +0000 Octavian Purdila wrote:
> > When a BPF program that supports BPF_F_XDP_HAS_FRAGS is issuing
> > bpf_xdp_adjust_tail and a large packet is injected via /dev/net/tun a
> > crash occurs due to detecting a bad page state (page_pool leak).
> >
> > This is because xdp_buff does not record the type of memory and
> > instead relies on the netdev receive queue xdp info. Since the TUN/TAP
> > driver is using a MEM_TYPE_PAGE_SHARED memory model buffer, shrinking
> > will eventually call page_frag_free. But with current multi-buff
> > support for BPF_F_XDP_HAS_FRAGS programs buffers are allocated via the
> > page pool.
> >
> > To fix this issue check that the receive queue memory mode is of
> > MEM_TYPE_PAGE_POOL before using multi-buffs.
>
> This can also happen on veth, right? And veth re-stamps the Rx queues.

I am not sure if re-stamps will have ill effects.

The allocation and deallocation for this issue happen while
processing the same packet (receive skb -> skb_pp_cow_data ->
page_pool alloc ... __bpf_prog_run ->  bpf_xdp_adjust_tail).

IIUC, if the veth re-stamps the RX queue to MEM_TYPE_PAGE_POOL,
skb_pp_cow_data will proceed to allocate from page_pool and
bpf_xdp_adjust_tail will correctly free from page_pool.


* Re: [PATCH net] xdp: use multi-buff only if receive queue supports page pool
  2025-09-25  7:53   ` Octavian Purdila
@ 2025-09-25  9:42     ` Maciej Fijalkowski
  2025-09-26  2:12       ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Maciej Fijalkowski @ 2025-09-25  9:42 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: Jakub Kicinski, davem, edumazet, pabeni, horms, ast, daniel, hawk,
	john.fastabend, sdf, ahmed.zaki, aleksander.lobakin, toke,
	lorenzo, netdev, bpf, syzbot+ff145014d6b0ce64a173,
	Kuniyuki Iwashima

On Thu, Sep 25, 2025 at 12:53:53AM -0700, Octavian Purdila wrote:
> On Wed, Sep 24, 2025 at 5:09 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Wed, 24 Sep 2025 06:08:42 +0000 Octavian Purdila wrote:
> > > When a BPF program that supports BPF_F_XDP_HAS_FRAGS is issuing
> > > bpf_xdp_adjust_tail and a large packet is injected via /dev/net/tun a
> > > crash occurs due to detecting a bad page state (page_pool leak).
> > >
> > > This is because xdp_buff does not record the type of memory and
> > > instead relies on the netdev receive queue xdp info. Since the TUN/TAP
> > > driver is using a MEM_TYPE_PAGE_SHARED memory model buffer, shrinking
> > > will eventually call page_frag_free. But with current multi-buff
> > > support for BPF_F_XDP_HAS_FRAGS programs buffers are allocated via the
> > > page pool.
> > >
> > > To fix this issue check that the receive queue memory mode is of
> > > MEM_TYPE_PAGE_POOL before using multi-buffs.
> >
> > This can also happen on veth, right? And veth re-stamps the Rx queues.

What do you mean by 're-stamps' in this case?

> 
> I am not sure if re-stamps will have ill effects.
> 
> The allocation and deallocation for this issue happens while
> processing the same packet (receive skb -> skb_pp_cow_data ->
> page_pool alloc ... __bpf_prog_run ->  bpf_xdp_adjust_tail).
> 
> IIUC, if the veth re-stamps the RX queue to MEM_TYPE_PAGE_POOL
> skb_pp_cow_data will proceed to allocate from page_pool and
> bpf_xdp_adjust_tail will correctly free from page_pool.

netif_get_rxqueue() gives you a pointer to the netstack queue, not
the driver one. Then you take the xdp_rxq from there. Do we even
register a memory model for these queues? Or am I missing something
here.

We're in the generic XDP hook, where driver specifics should not
matter, IMHO.

> 


* Re: [PATCH net] xdp: use multi-buff only if receive queue supports page pool
  2025-09-25  9:42     ` Maciej Fijalkowski
@ 2025-09-26  2:12       ` Jakub Kicinski
  2025-09-26  7:33         ` Octavian Purdila
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2025-09-26  2:12 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Octavian Purdila, davem, edumazet, pabeni, horms, ast, daniel,
	hawk, john.fastabend, sdf, ahmed.zaki, aleksander.lobakin, toke,
	lorenzo, netdev, bpf, syzbot+ff145014d6b0ce64a173,
	Kuniyuki Iwashima

On Thu, 25 Sep 2025 11:42:04 +0200 Maciej Fijalkowski wrote:
> On Thu, Sep 25, 2025 at 12:53:53AM -0700, Octavian Purdila wrote:
> > On Wed, Sep 24, 2025 at 5:09 PM Jakub Kicinski <kuba@kernel.org> wrote:  
> > >
> > > On Wed, 24 Sep 2025 06:08:42 +0000 Octavian Purdila wrote:  
>  [...]  
> > >
> > > This can also happen on veth, right? And veth re-stamps the Rx queues.  
> 
> What do you mean by 're-stamps' in this case?
> 
> > 
> > I am not sure if re-stamps will have ill effects.
> > 
> > The allocation and deallocation for this issue happens while
> > processing the same packet (receive skb -> skb_pp_cow_data ->
> > page_pool alloc ... __bpf_prog_run ->  bpf_xdp_adjust_tail).
> > 
> > IIUC, if the veth re-stamps the RX queue to MEM_TYPE_PAGE_POOL
> > skb_pp_cow_data will proceed to allocate from page_pool and
> > bpf_xdp_adjust_tail will correctly free from page_pool.  
> 
> netif_get_rxqueue() gives you a pointer the netstack queue, not the driver
> one. Then you take the xdp_rxq from there. Do we even register memory
> model for these queues? Or am I missing something here.
> 
> We're in generic XDP hook where driver specifics should not matter here
> IMHO.

Well, IDK how helpful the flow below would be but:

veth_xdp_xmit() -> [ptr ring] -> veth_xdp_rcv() -> veth_xdp_rcv_one() 
                                                               |
                            | xdp_convert_frame_to_buff()   <-'
    ( "re-stamps" ;) ->     | xdp->rxq = &rq->xdp_rxq;
  can eat frags but now rxq | bpf_prog_run_xdp()
         is veth's          |

I just glanced at the code, so >50% chance I'm wrong, but that's what
I meant.


* Re: [PATCH net] xdp: use multi-buff only if receive queue supports page pool
  2025-09-26  2:12       ` Jakub Kicinski
@ 2025-09-26  7:33         ` Octavian Purdila
  2025-09-26 11:24           ` Maciej Fijalkowski
  0 siblings, 1 reply; 10+ messages in thread
From: Octavian Purdila @ 2025-09-26  7:33 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Maciej Fijalkowski, davem, edumazet, pabeni, horms, ast, daniel,
	hawk, john.fastabend, sdf, ahmed.zaki, aleksander.lobakin, toke,
	lorenzo, netdev, bpf, syzbot+ff145014d6b0ce64a173,
	Kuniyuki Iwashima

On Thu, Sep 25, 2025 at 7:12 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 25 Sep 2025 11:42:04 +0200 Maciej Fijalkowski wrote:
> > On Thu, Sep 25, 2025 at 12:53:53AM -0700, Octavian Purdila wrote:
> > > On Wed, Sep 24, 2025 at 5:09 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > >
> > > > On Wed, 24 Sep 2025 06:08:42 +0000 Octavian Purdila wrote:
> >  [...]
> > > >
> > > > This can also happen on veth, right? And veth re-stamps the Rx queues.
> >
> > What do you mean by 're-stamps' in this case?
> >
> > >
> > > I am not sure if re-stamps will have ill effects.
> > >
> > > The allocation and deallocation for this issue happens while
> > > processing the same packet (receive skb -> skb_pp_cow_data ->
> > > page_pool alloc ... __bpf_prog_run ->  bpf_xdp_adjust_tail).
> > >
> > > IIUC, if the veth re-stamps the RX queue to MEM_TYPE_PAGE_POOL
> > > skb_pp_cow_data will proceed to allocate from page_pool and
> > > bpf_xdp_adjust_tail will correctly free from page_pool.
> >
> > netif_get_rxqueue() gives you a pointer the netstack queue, not the driver
> > one. Then you take the xdp_rxq from there. Do we even register memory
> > model for these queues? Or am I missing something here.
> >

Ah, yes, you are right. So my comment in the commit message about
TUN/TAP registering a page shared memory model is wrong. But I think
the fix is still correct for the reported syzkaller issue. From
bpf_prog_run_generic_xdp:

        rxqueue = netif_get_rxqueue(skb);
        xdp_init_buff(xdp, frame_sz, rxq: &rxqueue->xdp_rxq);

So xdp_buff's rxq is set to the netstack queue for the generic XDP
hook. And adding the check in netif_skb_check_for_xdp based on the
netstack queue should be correct, right?

> > We're in generic XDP hook where driver specifics should not matter here
> > IMHO.
>
> Well, IDK how helpful the flow below would be but:
>
> veth_xdp_xmit() -> [ptr ring] -> veth_xdp_rcv() -> veth_xdp_rcv_one()
>                                                                |
>                             | xdp_convert_frame_to_buff()   <-'
>     ( "re-stamps" ;) ->     | xdp->rxq = &rq->xdp_rxq;
>   can eat frags but now rxq | bpf_prog_run_xdp()
>          is veth's          |
>
> I just glanced at the code so >50% changes I'm wrong, but that's what
> I meant.

Thanks for the clarification, I thought that "re-stamps" means the:

    xdp->rxq->mem.type = frame->mem_type;

from veth_xdp_rcv_one in the XDP_TX/XDP_REDIRECT cases.

And yes, now I think the same issue can happen because veth sets the
memory model to MEM_TYPE_PAGE_SHARED but veth_convert_skb_to_xdp_buff
calls skb_pp_cow_data, which uses page_pool for allocations. I'll try
to see if I can adapt the syzkaller repro to trigger it for
confirmation.
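
The registration is roughly this (abridged sketch, from memory, of
the veth rxq setup; the exact context may differ):

  /* rxq memory model is registered as page-shared, yet
   * veth_convert_skb_to_xdp_buff() -> skb_pp_cow_data() can back the
   * frags with page_pool pages */
  xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, MEM_TYPE_PAGE_SHARED, NULL);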


* Re: [PATCH net] xdp: use multi-buff only if receive queue supports page pool
  2025-09-26  7:33         ` Octavian Purdila
@ 2025-09-26 11:24           ` Maciej Fijalkowski
  2025-09-26 19:40             ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Maciej Fijalkowski @ 2025-09-26 11:24 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: Jakub Kicinski, davem, edumazet, pabeni, horms, ast, daniel, hawk,
	john.fastabend, sdf, ahmed.zaki, aleksander.lobakin, toke,
	lorenzo, netdev, bpf, syzbot+ff145014d6b0ce64a173,
	Kuniyuki Iwashima

On Fri, Sep 26, 2025 at 12:33:46AM -0700, Octavian Purdila wrote:
> On Thu, Sep 25, 2025 at 7:12 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Thu, 25 Sep 2025 11:42:04 +0200 Maciej Fijalkowski wrote:
> > > On Thu, Sep 25, 2025 at 12:53:53AM -0700, Octavian Purdila wrote:
> > > > On Wed, Sep 24, 2025 at 5:09 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > >
> > > > > On Wed, 24 Sep 2025 06:08:42 +0000 Octavian Purdila wrote:
> > >  [...]
> > > > >
> > > > > This can also happen on veth, right? And veth re-stamps the Rx queues.
> > >
> > > What do you mean by 're-stamps' in this case?
> > >
> > > >
> > > > I am not sure if re-stamps will have ill effects.
> > > >
> > > > The allocation and deallocation for this issue happens while
> > > > processing the same packet (receive skb -> skb_pp_cow_data ->
> > > > page_pool alloc ... __bpf_prog_run ->  bpf_xdp_adjust_tail).
> > > >
> > > > IIUC, if the veth re-stamps the RX queue to MEM_TYPE_PAGE_POOL
> > > > skb_pp_cow_data will proceed to allocate from page_pool and
> > > > bpf_xdp_adjust_tail will correctly free from page_pool.
> > >
> > > netif_get_rxqueue() gives you a pointer the netstack queue, not the driver
> > > one. Then you take the xdp_rxq from there. Do we even register memory
> > > model for these queues? Or am I missing something here.
> > >
> 
> Ah, yes, you are right. So my comment in the commit message about
> TUN/TAP registering a page shared memory model is wrong. But I think
> the fix is still correct for the reported syzkaller issue. From
> bpf_prog_run_generic_xdp:
> 
>         rxqueue = netif_get_rxqueue(skb);
>         xdp_init_buff(xdp, frame_sz, rxq: &rxqueue->xdp_rxq);
> 
> So xdp_buff's rxq is set to the netstack queue for the generic XDP
> hook. And adding the check in netif_skb_check_for_xdp based on the
> netstack queue should be correct, right?

Per my limited understanding, your change makes skb_cow_data_for_xdp()
dead code, as I don't see a mem model being registered for these stack
queues - netif_alloc_rx_queues() only calls xdp_rxq_info_reg(), and
mem.type defaults to MEM_TYPE_PAGE_SHARED as it's defined as 0, which
means it's never going to be MEM_TYPE_PAGE_POOL.
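
For reference, the memory types (abridged from include/net/xdp.h) are:

  /* a zeroed xdp_rxq_info therefore reports MEM_TYPE_PAGE_SHARED */
  enum xdp_mem_type {
          MEM_TYPE_PAGE_SHARED = 0,
          MEM_TYPE_PAGE_ORDER0,
          MEM_TYPE_PAGE_POOL,
          MEM_TYPE_XSK_BUFF_POOL,
          MEM_TYPE_MAX,
  };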

IMHO that single case where we rewrite the skb to memory backed by
page pool should be reflected in mem.type, so that __xdp_return(),
potentially used in bpf helpers, can act correctly.
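
For context, that dispatch is roughly (simplified sketch; in the real
__xdp_return() the pool is recovered from the page/netmem itself, so
the function name and parameters here are illustrative):

  static void xdp_return_sketch(void *data, enum xdp_mem_type type,
                                struct page_pool *pool)
  {
          switch (type) {
          case MEM_TYPE_PAGE_POOL:
                  /* hand the page back to its page_pool */
                  page_pool_put_full_page(pool, virt_to_head_page(data),
                                          false);
                  break;
          case MEM_TYPE_PAGE_SHARED:
                  /* page_frag reference drop: wrong for a page_pool
                   * page, which is what trips the bad page state */
                  page_frag_free(data);
                  break;
          default:
                  break;
          }
  }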

> 
> > > We're in generic XDP hook where driver specifics should not matter here
> > > IMHO.
> >
> > Well, IDK how helpful the flow below would be but:
> >
> > veth_xdp_xmit() -> [ptr ring] -> veth_xdp_rcv() -> veth_xdp_rcv_one()
> >                                                                |
> >                             | xdp_convert_frame_to_buff()   <-'
> >     ( "re-stamps" ;) ->     | xdp->rxq = &rq->xdp_rxq;
> >   can eat frags but now rxq | bpf_prog_run_xdp()
> >          is veth's          |
> >
> > I just glanced at the code so >50% changes I'm wrong, but that's what
> > I meant.
> 
> Thanks for the clarification, I thought that "re-stamps" means the:
> 
>     xdp->rxq->mem.type = frame->mem_type;
> 
> from veth_xdp_rcv_one in the XDP_TX/XDP_REDIRECT cases.
> 
> And yes, now I think the same issue can happen because veth sets the
> memory model to MEM_TYPE_PAGE_SHARED but veth_convert_skb_to_xdp_buff
> calls skb_pp_cow_data that uses page_pool for allocations. I'll try to
> see if I can adapt the syzkaller repro to trigger it for confirmation.

That is a good catch.



* Re: [PATCH net] xdp: use multi-buff only if receive queue supports page pool
  2025-09-26 11:24           ` Maciej Fijalkowski
@ 2025-09-26 19:40             ` Jakub Kicinski
  2025-09-30  0:01               ` Octavian Purdila
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2025-09-26 19:40 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Octavian Purdila, davem, edumazet, pabeni, horms, ast, daniel,
	hawk, john.fastabend, sdf, ahmed.zaki, aleksander.lobakin, toke,
	lorenzo, netdev, bpf, syzbot+ff145014d6b0ce64a173,
	Kuniyuki Iwashima

On Fri, 26 Sep 2025 13:24:12 +0200 Maciej Fijalkowski wrote:
> On Fri, Sep 26, 2025 at 12:33:46AM -0700, Octavian Purdila wrote:
> > On Thu, Sep 25, 2025 at 7:12 PM Jakub Kicinski <kuba@kernel.org> wrote:  
> > Ah, yes, you are right. So my comment in the commit message about
> > TUN/TAP registering a page shared memory model is wrong. But I think
> > the fix is still correct for the reported syzkaller issue. From
> > bpf_prog_run_generic_xdp:
> > 
> >         rxqueue = netif_get_rxqueue(skb);
> >         xdp_init_buff(xdp, frame_sz, rxq: &rxqueue->xdp_rxq);
> > 
> > So xdp_buff's rxq is set to the netstack queue for the generic XDP
> > hook. And adding the check in netif_skb_check_for_xdp based on the
> > netstack queue should be correct, right?  
> 
> Per my limited understanding your change is making skb_cow_data_for_xdp()
> a dead code as I don't see mem model being registered for these stack
> queues - netif_alloc_rx_queues() only calls xdp_rxq_info_reg() and
> mem.type defaults to MEM_TYPE_PAGE_SHARED as it's defined as 0, which
> means it's never going to be MEM_TYPE_PAGE_POOL.

Hah, that's a great catch, how did I miss that..

The reason for the cow is that frags can be shared and we are not
allowed to modify them. It's orthogonal.

> IMHO that single case where we rewrite skb to memory backed by page pool
> should have it reflected in mem.type so __xdp_return() potentially used in
> bpf helpers could act correctly.
> 
> > > Well, IDK how helpful the flow below would be but:
> > >
> > > veth_xdp_xmit() -> [ptr ring] -> veth_xdp_rcv() -> veth_xdp_rcv_one()
> > >                                                                |
> > >                             | xdp_convert_frame_to_buff()   <-'
> > >     ( "re-stamps" ;) ->     | xdp->rxq = &rq->xdp_rxq;
> > >   can eat frags but now rxq | bpf_prog_run_xdp()
> > >          is veth's          |
> > >
> > > I just glanced at the code so >50% changes I'm wrong, but that's what
> > > I meant.  
> > 
> > Thanks for the clarification, I thought that "re-stamps" means the:
> > 
> >     xdp->rxq->mem.type = frame->mem_type;
> > 
> > from veth_xdp_rcv_one in the XDP_TX/XDP_REDIRECT cases.
> > 
> > And yes, now I think the same issue can happen because veth sets the
> > memory model to MEM_TYPE_PAGE_SHARED but veth_convert_skb_to_xdp_buff
> > calls skb_pp_cow_data that uses page_pool for allocations. I'll try to
> > see if I can adapt the syzkaller repro to trigger it for confirmation.  
> 
> That is a good catch.

FWIW I think all calls to xdp_convert_frame_to_buff() must come with
the hack that cpu_map_bpf_prog_run_xdp() is doing today. Declare rxq
on the stack and fill in the mem info there. I wonder if we should add
something to the API (xdp_convert_frame_to_buff()) to make sure people
don't forget to do this..
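
Roughly the cpumap pattern, for reference (abridged; field names
differ a bit across versions):

  /* cpu_map_bpf_prog_run_xdp(), abridged: the on-stack rxq carries
   * the per-frame dev and memory model while the program runs */
  struct xdp_rxq_info rxq = {};
  struct xdp_buff xdp;

  xdp.rxq = &rxq;
  rxq.dev = xdpf->dev_rx;
  rxq.mem.type = xdpf->mem_type;
  xdp_convert_frame_to_buff(xdpf, &xdp);
  act = bpf_prog_run_xdp(prog, &xdp);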


* Re: [PATCH net] xdp: use multi-buff only if receive queue supports page pool
  2025-09-26 19:40             ` Jakub Kicinski
@ 2025-09-30  0:01               ` Octavian Purdila
  2025-09-30 17:41                 ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Octavian Purdila @ 2025-09-30  0:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Maciej Fijalkowski, davem, edumazet, pabeni, horms, ast, daniel,
	hawk, john.fastabend, sdf, ahmed.zaki, aleksander.lobakin, toke,
	lorenzo, netdev, bpf, syzbot+ff145014d6b0ce64a173,
	Kuniyuki Iwashima

On Fri, Sep 26, 2025 at 12:40 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 26 Sep 2025 13:24:12 +0200 Maciej Fijalkowski wrote:
> > On Fri, Sep 26, 2025 at 12:33:46AM -0700, Octavian Purdila wrote:
> > > On Thu, Sep 25, 2025 at 7:12 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > Ah, yes, you are right. So my comment in the commit message about
> > > TUN/TAP registering a page shared memory model is wrong. But I think
> > > the fix is still correct for the reported syzkaller issue. From
> > > bpf_prog_run_generic_xdp:
> > >
> > >         rxqueue = netif_get_rxqueue(skb);
> > >         xdp_init_buff(xdp, frame_sz, rxq: &rxqueue->xdp_rxq);
> > >
> > > So xdp_buff's rxq is set to the netstack queue for the generic XDP
> > > hook. And adding the check in netif_skb_check_for_xdp based on the
> > > netstack queue should be correct, right?
> >
> > Per my limited understanding your change is making skb_cow_data_for_xdp()
> > a dead code as I don't see mem model being registered for these stack
> > queues - netif_alloc_rx_queues() only calls xdp_rxq_info_reg() and
> > mem.type defaults to MEM_TYPE_PAGE_SHARED as it's defined as 0, which
> > means it's never going to be MEM_TYPE_PAGE_POOL.
>
> Hah, that's a great catch, how did I miss that..
>
> The reason for the cow is that frags can be shared, we are not allowed
> to modify them. It's orthogonal.
>

Could we use the same hack that you mention below (declare rxq on the
stack and fill in the mem info there) for do_xdp_generic?

@@ -5442,7 +5448,10 @@ int do_xdp_generic(const struct bpf_prog *xdp_prog, struct sk_buff **pskb)
        struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;

        if (xdp_prog) {
-               struct xdp_buff xdp;
+               struct xdp_rxq_info rxq = {};
+               struct xdp_buff xdp = {
+                       .rxq = &rxq,
+               };
                u32 act;
                int err;

and then explicitly set the mem type:

@@ -5331,14 +5332,19 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp,
        return act;
 }

-static int
-netif_skb_check_for_xdp(struct sk_buff **pskb, const struct bpf_prog *prog)
+static int netif_skb_check_for_xdp(struct sk_buff **pskb,
+                                  const struct bpf_prog *prog,
+                                  struct xdp_rxq_info *rxq)
 {
        struct sk_buff *skb = *pskb;
        int err, hroom, troom;
+       struct page_pool *pool;

        local_lock_nested_bh(&system_page_pool.bh_lock);
-       err = skb_cow_data_for_xdp(this_cpu_read(system_page_pool.pool), pskb, prog);
+       pool = this_cpu_read(system_page_pool.pool);
+       err = skb_cow_data_for_xdp(pool, pskb, prog);
+       rxq->mem.type = MEM_TYPE_PAGE_POOL;
+       rxq->mem.id = pool->xdp_mem_id;
        local_unlock_nested_bh(&system_page_pool.bh_lock);
        if (!err)
                return 0;


> > IMHO that single case where we rewrite skb to memory backed by page pool
> > should have it reflected in mem.type so __xdp_return() potentially used in
> > bpf helpers could act correctly.
> >
> > > > Well, IDK how helpful the flow below would be but:
> > > >
> > > > veth_xdp_xmit() -> [ptr ring] -> veth_xdp_rcv() -> veth_xdp_rcv_one()
> > > >                                                                |
> > > >                             | xdp_convert_frame_to_buff()   <-'
> > > >     ( "re-stamps" ;) ->     | xdp->rxq = &rq->xdp_rxq;
> > > >   can eat frags but now rxq | bpf_prog_run_xdp()
> > > >          is veth's          |
> > > >
> > > > I just glanced at the code so >50% changes I'm wrong, but that's what
> > > > I meant.
> > >
> > > Thanks for the clarification, I thought that "re-stamps" means the:
> > >
> > >     xdp->rxq->mem.type = frame->mem_type;
> > >
> > > from veth_xdp_rcv_one in the XDP_TX/XDP_REDIRECT cases.
> > >
> > > And yes, now I think the same issue can happen because veth sets the
> > > memory model to MEM_TYPE_PAGE_SHARED but veth_convert_skb_to_xdp_buff
> > > calls skb_pp_cow_data that uses page_pool for allocations. I'll try to
> > > see if I can adapt the syzkaller repro to trigger it for confirmation.
> >
> > That is a good catch.
>
> FWIW I think all calls to xdp_convert_frame_to_buff() must come with
> the hack that cpu_map_bpf_prog_run_xdp() is doing today. Declare rxq
> on the stack and fill in the mem info there. I wonder if we should add
> something to the API (xdp_convert_frame_to_buff()) to make sure people
> don't forget to do this..


* Re: [PATCH net] xdp: use multi-buff only if receive queue supports page pool
  2025-09-30  0:01               ` Octavian Purdila
@ 2025-09-30 17:41                 ` Jakub Kicinski
  0 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2025-09-30 17:41 UTC (permalink / raw)
  To: Octavian Purdila
  Cc: Maciej Fijalkowski, davem, edumazet, pabeni, horms, ast, daniel,
	hawk, john.fastabend, sdf, ahmed.zaki, aleksander.lobakin, toke,
	lorenzo, netdev, bpf, syzbot+ff145014d6b0ce64a173,
	Kuniyuki Iwashima

On Mon, 29 Sep 2025 17:01:49 -0700 Octavian Purdila wrote:
>         local_lock_nested_bh(&system_page_pool.bh_lock);
> -       err = skb_cow_data_for_xdp(this_cpu_read(system_page_pool.pool), pskb, prog);
> +       pool = this_cpu_read(system_page_pool.pool);
> +       err = skb_cow_data_for_xdp(pool, pskb, prog);
> +       rxq->mem.type = MEM_TYPE_PAGE_POOL;
> +       rxq->mem.id = pool->xdp_mem_id;
>         local_unlock_nested_bh(&system_page_pool.bh_lock);

Yes, LGTM. I _think_ that only skb_cow_data_for_xdp() has to be under
the lock here, but it doesn't really matter.

