From: Chris Arges <carges@cloudflare.com>
To: Dragos Tatulea <dtatulea@nvidia.com>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
	kernel-team <kernel-team@cloudflare.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	tariqt@nvidia.com, saeedm@nvidia.com,
	Leon Romanovsky <leon@kernel.org>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	Simon Horman <horms@kernel.org>,
	Andrew Rzeznik <arzeznik@cloudflare.com>,
	Yan Zhai <yan@cloudflare.com>
Subject: Re: [BUG] mlx5_core memory management issue
Date: Thu, 7 Aug 2025 11:45:40 -0500
Message-ID: <aJTYNG1AroAnvV31@861G6M3>
In-Reply-To: <jlvrzm6q7dnai6nf5v3ifhtwqlnvvrdg5driqomnl5q4lzfxmk@tmwaadjob5yd>

On 2025-07-24 17:01:16, Dragos Tatulea wrote:
> On Wed, Jul 23, 2025 at 01:48:07PM -0500, Chris Arges wrote:
> > 
> > Ok, we can reproduce this problem!
> > 
> > I tried to simplify this reproducer, but it seems like what's needed is:
> > - xdp program attached to mlx5 NIC
> > - cpumap redirect
> > - device redirect (map or just bpf_redirect)
> > - frame gets turned into an skb
> > Then from another machine send many flows of UDP traffic to trigger the problem.
> > 
> > I've put together a program that reproduces the issue here:
> > - https://github.com/arges/xdp-redirector
> >
> Much appreciated! I fumbled around initially, not managing to get
> traffic to the xdp_devmap stage. But further debugging revealed that GRO
> needs to be enabled on the veth devices for XDP redir to work to the
> xdp_devmap. After that I managed to reproduce your issue.
> 
> Now I can start looking into it.
> 

Dragos,

There was a similar reference counting issue identified in:
https://lore.kernel.org/all/20250801170754.2439577-1-kuba@kernel.org/

Part of the commit message mentioned:
> Unfortunately for fbnic since commit f7dc3248dcfb ("skbuff: Optimization
> of SKB coalescing for page pool") core _may_ actually take two extra
> pp refcounts, if one of them is returned before driver gives up the bias
> the ret < 0 check in page_pool_unref_netmem() will trigger.

To help debug the mlx5 issue triggered by XDP redirection, I built a
kernel with commit f7dc3248dcfb reverted, but unfortunately I was still able
to reproduce the issue.

I am happy to try other experiments if you have any ideas.

Thanks,
--chris
