Re: [mellanox/mlx5-next RFC 1/1] net/mlx5: RX, Fix refcount warning on frag page release


From: Dragos Tatulea <dtatulea@nvidia.com>
To: "Nabil S. Alramli" <dev@nalramli.com>,
	saeedm@nvidia.com, tariqt@nvidia.com, mbloch@nvidia.com
Cc: nalramli@fastly.com, leon@kernel.org, andrew+netdev@lunn.ch,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [mellanox/mlx5-next RFC 1/1] net/mlx5: RX, Fix refcount warning on frag page release
Date: Fri, 26 Jun 2026 15:12:32 +0200
Message-ID: <9f150145-d95c-4a90-a358-5b33ab78a8ef@nvidia.com>
In-Reply-To: <20260625174059.2879717-2-dev@nalramli.com>



On 25.06.26 19:40, Nabil S. Alramli wrote:
> Under memory pressure, mlx5 driver has WARNING during fragmented page
> release. This happens because there is a discrepancy between what mlx5
> thinks the page fragment counter is vs what the page_pool actually says it
> is.
> 
The mlx5 frag counter is not the same as pp_ref_count. The page gets
split into 64 parts during page allocation. The frag counter tracks how
many of those frags have been used.
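
As a rough illustration of how the two counters relate, here is a minimal
userspace model. It is not the driver code: `model_page`, `model_alloc`,
`model_use_frag`, `model_put_frag` and `model_release` are names made up
for this sketch, and `MODEL_FRAGS_PER_PAGE` only mirrors the 64-way split
mentioned above. It assumes that the release path hands back, in one step,
the share of the split that was never used (64 minus the used frags).

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_FRAGS_PER_PAGE 64 /* the page is split into 64 parts */

/* Hypothetical stand-in for one fragmented page; not a kernel struct. */
struct model_page {
	long pp_ref_count; /* models the page_pool reference count */
	uint16_t frags;    /* models the driver-side count of used frags */
};

/* Allocation: the pool count starts at the full split, no frag used yet. */
static void model_alloc(struct model_page *p)
{
	p->pp_ref_count = MODEL_FRAGS_PER_PAGE;
	p->frags = 0;
}

/* Using a frag only bumps the driver-side counter. */
static void model_use_frag(struct model_page *p)
{
	assert(p->frags < MODEL_FRAGS_PER_PAGE);
	p->frags++;
}

/* A consumer of one frag drops the single reference it was given. */
static void model_put_frag(struct model_page *p)
{
	p->pp_ref_count--;
}

/*
 * Driver release (assumed behaviour): give back every reference that was
 * never handed out. Returns the remaining pool count.
 */
static long model_release(struct model_page *p)
{
	long drain = MODEL_FRAGS_PER_PAGE - p->frags;

	p->pp_ref_count -= drain;
	return p->pp_ref_count;
}
```

In this model, using 3 frags and then calling `model_release()` leaves a
count of 3, one reference per frag still held elsewhere. The count stays
non-negative as long as `frags` matches the number of references that were
actually handed out.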

> The cause of the issue is page allocations on concurrent cpus, which
> increment the non-atomic u16 page counter mlx5e_frag_page.frags, while at
> the same time the page reference counter net_iov.pp_ref_count is atomically
> incremented. That sometimes leads to a difference in the counts and
> therefore triggers the warning in page_pool_unref_netmem:
> 
page_pool page allocations must not happen in parallel on different CPUs.
Each queue has its own page_pool, and allocation happens within the NAPI of
that queue, which sticks to a single CPU. Only the release path supports
releasing on another CPU (release to ring).

How did you encounter this scenario of having parallel allocations on
different CPUs from the same page_pool?
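
To make the concern concrete: if two contexts ever did update the same
non-atomic counter at once, one increment could be lost, and the release
path would then give back more references than it owns. The sketch below
replays such an interleaving deterministically in plain C, without threads.
Every name in it is hypothetical, and it only models the arithmetic under
the assumption that release drains 64 minus the used frags. It is not the
driver code and does not claim that this interleaving happens in mlx5.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_FRAGS_PER_PAGE 64 /* the page is split into 64 parts */

/* Hypothetical stand-in for one fragmented page; not a kernel struct. */
struct model_page {
	long pp_ref_count; /* models the page_pool reference count */
	uint16_t frags;    /* models the non-atomic driver-side counter */
};

/* Replays a lost update: both contexts load frags before either stores. */
static void model_racy_double_use(struct model_page *p)
{
	uint16_t seen_a = p->frags; /* context A loads */
	uint16_t seen_b = p->frags; /* context B loads the same value */

	p->frags = (uint16_t)(seen_a + 1); /* A stores */
	p->frags = (uint16_t)(seen_b + 1); /* B overwrites, one increment lost */
}

/*
 * Two frags are used and later freed by their consumers, then the page is
 * released. Returns the final pool count for the racy or the serial case.
 */
static long model_final_count(int racy)
{
	struct model_page p = {
		.pp_ref_count = MODEL_FRAGS_PER_PAGE,
		.frags = 0,
	};

	if (racy)
		model_racy_double_use(&p);
	else
		p.frags += 2; /* two serialized increments */

	assert(p.frags <= MODEL_FRAGS_PER_PAGE);

	p.pp_ref_count -= 2;                              /* consumers drop their refs */
	p.pp_ref_count -= MODEL_FRAGS_PER_PAGE - p.frags; /* release drains the rest */
	return p.pp_ref_count;
}
```

With serialized updates the model ends at 0. With the replayed lost update
it ends at -1, which is the kind of result the `WARN_ON(ret < 0)` check
reports. That is why the question of how parallel allocations could occur
matters here.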

> ```
> 	ret = atomic_long_sub_return(nr, pp_ref_count);
> 	WARN_ON(ret < 0);
> ```
> 
> The actual stack trace looks like this:
> 
> ```
> WARNING: CPU: 37 PID: 447795 at include/net/page_pool/helpers.h:277 mlx5e_page_release_fragmented.isra.0+0x51/0x60 [mlx5_core]
> Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
> Hardware name: *
> RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x51/0x60 [mlx5_core]
> RSP: 0018:ffffc90019814d98 EFLAGS: 00010293
> RAX: 000000000000003f RBX: ffff88c0993d0a10 RCX: ffffea02424592c0
> RDX: 0000000000000001 RSI: ffffea02424592c0 RDI: ffff88c090e20000
> RBP: 000000000000000a R08: 0000000000001409 R09: 0000000000000006
> R10: 0000000000000000 R11: ffff88c095fbc040 R12: 000000000000141f
> R13: 0000000000000009 R14: ffff88c090e20000 R15: 0000000000000001
> FS:  00007f34149fa6c0(0000) GS:ffff89200fa40000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ed0265eb000 CR3: 0000005091cbe000 CR4: 0000000000350ef0
> Call Trace:
>  <IRQ>
>  mlx5e_free_rx_wqes+0x7b/0xa0 [mlx5_core]
>  mlx5e_post_rx_wqes+0x1ac/0x5a0 [mlx5_core]
>  mlx5e_napi_poll+0x5e5/0x6f0 [mlx5_core]
>  __napi_poll+0x2b/0x1a0
>  net_rx_action+0x30e/0x370
>  ? sched_clock+0x9/0x10
>  ? sched_clock_cpu+0xf/0x170
>  handle_softirqs+0xe2/0x2a0
>  common_interrupt+0x85/0xa0
>  </IRQ>
>  <TASK>
>  asm_common_interrupt+0x26/0x40
> RIP: 0010:page_counter_uncharge+0x34/0x90
> RSP: 0018:ffffc900e728bb00 EFLAGS: 00000213
> RAX: ffff88aff4762000 RBX: ffff88aff4762100 RCX: 0000000000000304
> RDX: 0000000000000001 RSI: 00000000004e9e1a RDI: ffff88aff4762100
> RBP: 0000000000000001 R08: ffff891ea0560048 R09: 00007ffffffff000
> R10: 0000000000001000 R11: ffff891ae8061b00 R12: ffffffffffffffff
> R13: ffff89107fcfd4c0 R14: ffff891ae8061b00 R15: ffff892002fe1400
>  uncharge_batch+0x40/0xd0
> ```
>
Can you provide more data on how you reproduced this? That would help
narrow down the bug. Reproduction steps would be ideal.

> The fix is to use an atomic page fragment counter, so it will always match
> the number of references held in the page_pool.
>
This is not the right fix. The mlx5 page frag counter is not atomic
on purpose because all changes to it happen only within the NAPI
context.

Thanks,
Dragos
