From: Chris Arges <carges@cloudflare.com>
To: Dragos Tatulea <dtatulea@nvidia.com>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
kernel-team <kernel-team@cloudflare.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
tariqt@nvidia.com, saeedm@nvidia.com,
Leon Romanovsky <leon@kernel.org>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>,
Simon Horman <horms@kernel.org>,
Andrew Rzeznik <arzeznik@cloudflare.com>,
Yan Zhai <yan@cloudflare.com>
Subject: Re: [BUG] mlx5_core memory management issue
Date: Wed, 23 Jul 2025 13:48:07 -0500
Message-ID: <aIEuZy6fUj_4wtQ6@861G6M3>
In-Reply-To: <dhqeshvesjhyxeimyh6nttlkrrhoxwpmjpn65tesani3tmne5v@msusvzdhuuin>
On 2025-07-04 12:37:36, Dragos Tatulea wrote:
> On Thu, Jul 03, 2025 at 10:49:20AM -0500, Chris Arges wrote:
> > When running iperf through a set of XDP programs we were able to crash
> > machines with NICs using the mlx5_core driver. We were able to confirm
> > that other NICs/drivers did not exhibit the same problem, and suspect
> > this could be a memory management issue in the driver code.
> > Specifically we found a WARNING at include/net/page_pool/helpers.h:277
> > mlx5e_page_release_fragmented.isra. We are able to demonstrate this
> > issue in production using hardware, but cannot easily bisect because
> > we don’t have a simple reproducer.
> >
> Thanks for the report! We will investigate.
>
> > I wanted to share stack traces in
> > order to help us further debug and understand if anyone else has run
> > into this issue. We are currently working on getting more crashdumps
> > and doing further analysis.
> >
> >
> > The test setup looks like the following:
> > ┌─────┐
> > │mlx5 │
> > │NIC  │
> > └──┬──┘
> >    │xdp ebpf program (does encap and XDP_TX)
> >    │
> >    ▼
> > ┌──────────────────────┐
> > │xdp.frags             │
> > │                      │
> > └──┬───────────────────┘
> >    │tailcall
> >    │BPF_REDIRECT_MAP (using CPUMAP bpf type)
> >    ▼
> > ┌──────────────────────┐
> > │xdp.frags/cpumap      │
> > │                      │
> > └──┬───────────────────┘
> >    │BPF_REDIRECT to veth (*potential trigger for issue)
> >    │
> >    ▼
> > ┌──────┐
> > │veth  │
> > │      │
> > └──┬───┘
> >    │
> >    │
> >    ▼
> >
> > Here an mlx5 NIC has an xdp.frags program attached which tailcalls via
> > BPF_REDIRECT_MAP into an xdp.frags/cpumap. For our reproducer we can
> > choose a random valid CPU to reproduce the issue. Once that packet
> > reaches the xdp.frags/cpumap program we then do another BPF_REDIRECT
> > to a veth device which has an XDP program which redirects to an
> > XSKMAP. It wasn’t until we added the additional BPF_REDIRECT to the
> > veth device that we noticed this issue.
> >
> Would it be possible to try to use a single program that redirects to
> the XSKMAP and check that the issue reproduces?
>
> > When running with 6.12.30 to 6.12.32 kernels we are able to see the
> > following KASAN use-after-free WARNINGs followed by a page fault which
> > crashes the machine. We have not been able to test earlier or later
> > kernels. I’ve tried to map symbols to lines of code for clarity.
> >
> Thanks for the KASAN reports, they are very useful. Keep us posted
> if you have other updates. A first quick look didn't reveal anything
> obvious from our side but we will keep looking.
>
> Thanks,
> Dragos
OK, we can reproduce this problem!
I tried to simplify the reproducer, but it seems that all of the following are needed:
- an XDP program attached to the mlx5 NIC
- a cpumap redirect
- a device redirect (via a map or just bpf_redirect())
- the frame gets converted into an skb
Then, from another machine, send many UDP flows to trigger the problem.
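A minimal sketch of such a traffic source is below. It is not the generator from the xdp-redirector repository; it assumes that "many flows" means many distinct UDP 5-tuples, which RSS on the receiving NIC would typically spread across RX queues. send_udp_flows() and its parameters are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send pkts_per_flow datagrams on each of nflows UDP
 * sockets, round-robin. All sockets stay open until the end, so each one is
 * auto-bound to a distinct ephemeral source port on its first sendto() and
 * forms its own 5-tuple (flow).
 * Returns the number of datagrams the kernel accepted, or -1 on setup error. */
static int send_udp_flows(const char *dst_ip, unsigned short dst_port,
			  int nflows, int pkts_per_flow)
{
	struct sockaddr_in dst;
	char payload[64];
	int *fds;
	int f, i, sent = 0;

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(dst_port);
	if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
		return -1;
	memset(payload, 'x', sizeof(payload));

	fds = calloc((size_t)nflows, sizeof(*fds));
	if (!fds)
		return -1;
	for (f = 0; f < nflows; f++) {
		fds[f] = socket(AF_INET, SOCK_DGRAM, 0);
		if (fds[f] < 0) {
			/* Clean up the sockets opened so far. */
			while (f-- > 0)
				close(fds[f]);
			free(fds);
			return -1;
		}
	}

	/* Interleave flows so they are all active at the same time. */
	for (i = 0; i < pkts_per_flow; i++)
		for (f = 0; f < nflows; f++)
			if (sendto(fds[f], payload, sizeof(payload), 0,
				   (struct sockaddr *)&dst, sizeof(dst)) ==
			    (ssize_t)sizeof(payload))
				sent++;

	for (f = 0; f < nflows; f++)
		close(fds[f]);
	free(fds);
	return sent;
}
```

Point dst_ip at an address on the machine with the mlx5 NIC and raise nflows; the values in any particular call are placeholders.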
I've put together a program that reproduces the issue here:
- https://github.com/arges/xdp-redirector
In general, the failure manifests as a variety of WARNs, such as:
include/net/page_pool/helpers.h:277 mlx5e_page_release_fragmented.isra.0+0xf7/0x150 [mlx5_core]
Then the machine crashes.
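For context, a simplified userspace model of the page_pool fragment accounting that mlx5e_page_release_fragmented() relies on is sketched below. It assumes the WARN at helpers.h:277 is the negative-count check in page_pool_unref_page(), and that the driver biases each page with MLX5E_PAGECNT_BIAS_MAX references, counts handed-out fragments in frags, and drains the unused remainder on release. model_page, model_unref_page() and model_frag_lifecycle() are hypothetical names; this is an illustration, not the kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the pp_ref_count of a page_pool page. */
struct model_page {
	atomic_long pp_ref_count;
	bool warned;	/* set where the kernel would WARN_ON(ret < 0) */
};

/* Simplified model of page_pool_unref_page(): drop nr references and
 * return how many remain (0 means the caller may recycle the page). */
static long model_unref_page(struct model_page *p, long nr)
{
	long ret;

	/* Caller holds every remaining reference: page is free to recycle. */
	if (atomic_load(&p->pp_ref_count) == nr) {
		atomic_store(&p->pp_ref_count, 1);
		return 0;
	}

	ret = atomic_fetch_sub(&p->pp_ref_count, nr) - nr;
	if (ret < 0) {
		p->warned = true;
		fprintf(stderr, "model WARN: pp_ref_count went negative (%ld)\n", ret);
	}
	if (ret == 0)
		atomic_store(&p->pp_ref_count, 1);
	return ret;
}

/* Driver-style lifecycle: bias the page with bias_max references, hand out
 * `frags` fragments, let consumers release `releases` references, then let
 * the driver drain bias_max - frags. Returns the result of the drain. */
static long model_frag_lifecycle(long bias_max, long frags, long releases,
				 bool *warned)
{
	struct model_page p;
	long i;

	atomic_init(&p.pp_ref_count, bias_max);
	p.warned = false;

	for (i = 0; i < releases; i++)
		model_unref_page(&p, 1);

	long ret = model_unref_page(&p, bias_max - frags);
	*warned = p.warned;
	return ret;
}
```

In the model, one release too many (for example a fragment freed twice somewhere along the redirect path, purely as a hypothetical) is enough to push the driver's final drain below zero and trigger the WARN.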
I was able to capture a crash dump that shows:
```
PID: 0 TASK: ffff8c0910134380 CPU: 76 COMMAND: "swapper/76"
#0 [fffffe10906d3ea8] crash_nmi_callback at ffffffffadc5c4fd
#1 [fffffe10906d3eb0] default_do_nmi at ffffffffae9524f0
#2 [fffffe10906d3ed0] exc_nmi at ffffffffae952733
#3 [fffffe10906d3ef0] end_repeat_nmi at ffffffffaea01bfd
[exception RIP: io_serial_in+25]
RIP: ffffffffae4cd489 RSP: ffffb3c60d6049e8 RFLAGS: 00000002
RAX: ffffffffae4cd400 RBX: 00000000000025d8 RCX: 0000000000000000
RDX: 00000000000002fd RSI: 0000000000000005 RDI: ffffffffb10a9cb0
RBP: 0000000000000000 R8: 2d2d2d2d2d2d2d2d R9: 656820747563205b
R10: 000000002d2d2d2d R11: 000000002d2d2d2d R12: ffffffffb0fa5610
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffb10a9cb0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#4 [ffffb3c60d6049e8] io_serial_in at ffffffffae4cd489
#5 [ffffb3c60d6049e8] serial8250_console_write at ffffffffae4d2fcf
#6 [ffffb3c60d604a80] console_flush_all at ffffffffadd1cf26
#7 [ffffb3c60d604b00] console_unlock at ffffffffadd1d1df
#8 [ffffb3c60d604b48] vprintk_emit at ffffffffadd1dda1
#9 [ffffb3c60d604b98] _printk at ffffffffae90250c
#10 [ffffb3c60d604bf8] report_bug.cold at ffffffffae95001d
#11 [ffffb3c60d604c38] handle_bug at ffffffffae950e91
#12 [ffffb3c60d604c58] exc_invalid_op at ffffffffae9512b7
#13 [ffffb3c60d604c70] asm_exc_invalid_op at ffffffffaea0123a
[exception RIP: mlx5e_page_release_fragmented+85]
RIP: ffffffffc25f75c5 RSP: ffffb3c60d604d20 RFLAGS: 00010293
RAX: 000000000000003f RBX: ffff8bfa8f059fd0 RCX: ffffe3bf1992a180
RDX: 000000000000003d RSI: ffffe3bf1992a180 RDI: ffff8bf9b0784000
RBP: 0000000000000040 R8: 00000000000001d2 R9: 0000000000000006
R10: ffff8c06de22f380 R11: ffff8bfcfe6cd680 R12: 00000000000001d2
R13: 000000000000002b R14: ffff8bf9b0784000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#14 [ffffb3c60d604d20] mlx5e_free_rx_wqes at ffffffffc25f7e2f [mlx5_core]
#15 [ffffb3c60d604d58] mlx5e_post_rx_wqes at ffffffffc25f877c [mlx5_core]
#16 [ffffb3c60d604dc0] mlx5e_napi_poll at ffffffffc25fdd27 [mlx5_core]
#17 [ffffb3c60d604e20] __napi_poll at ffffffffae6a8ddb
#18 [ffffb3c60d604e90] __napi_poll at ffffffffae6a8db5
#19 [ffffb3c60d604e98] net_rx_action at ffffffffae6a95f1
#20 [ffffb3c60d604f98] handle_softirqs at ffffffffadc9d4bf
#21 [ffffb3c60d604fe8] irq_exit_rcu at ffffffffadc9e057
#22 [ffffb3c60d604ff0] common_interrupt at ffffffffae952015
--- <IRQ stack> ---
#23 [ffffb3c60c837de8] asm_common_interrupt at ffffffffaea01466
[exception RIP: cpuidle_enter_state+184]
RIP: ffffffffae955c38 RSP: ffffb3c60c837e98 RFLAGS: 00000202
RAX: ffff8c0cffc00000 RBX: ffff8c0911002400 RCX: 0000000000000000
RDX: 00003c630b2d073a RSI: ffffffe519600d10 RDI: 0000000000000000
RBP: 0000000000000001 R8: 0000000000000002 R9: 0000000000000001
R10: ffff8c0cffc330c4 R11: 071c71c71c71c71c R12: ffffffffb05ff820
R13: 00003c630b2d073a R14: 0000000000000001 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#24 [ffffb3c60c837ed0] cpuidle_enter at ffffffffae64b4ad
#25 [ffffb3c60c837ef0] do_idle at ffffffffadcfa7c6
#26 [ffffb3c60c837f30] cpu_startup_entry at ffffffffadcfaa09
#27 [ffffb3c60c837f40] start_secondary at ffffffffadc5ec77
#28 [ffffb3c60c837f50] common_startup_64 at ffffffffadc24d5d
```
Assuming the x86_64 calling convention (first argument in RDI, second in RSI):
RDI=ffff8bf9b0784000 (rq)
RSI=ffffe3bf1992a180 (frag_page)
```
static void mlx5e_page_release_fragmented(struct mlx5e_rq *rq,
					  struct mlx5e_frag_page *frag_page)
{
	u16 drain_count = MLX5E_PAGECNT_BIAS_MAX - frag_page->frags;
	struct page *page = frag_page->page;

	if (page_pool_unref_page(page, drain_count) == 0)
		page_pool_put_unrefed_page(rq->page_pool, page, -1, true);
}
```
```
crash> struct mlx5e_frag_page ffffe3bf1992a180
struct mlx5e_frag_page {
  page = 0x26ffff800000000,
  frags = 49856
}
```
frags = 49856 looks bogus: assuming frags should never exceed
MLX5E_PAGECNT_BIAS_MAX, the u16 subtraction that computes drain_count wraps
around, so drain_count ends up as an unexpectedly large number.
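Taking the dumped frags value at face value, the wraparound can be checked in userspace. This sketch mirrors only the drain_count line of mlx5e_page_release_fragmented() and assumes MLX5E_PAGECNT_BIAS_MAX is PAGE_SIZE / 64, i.e. 64 with 4 KiB pages (worth confirming in en.h for the exact tree). MODEL_PAGE_SIZE, MODEL_BIAS_MAX and model_drain_count() are hypothetical names.

```c
#include <stdint.h>

/* ASSUMPTION: MLX5E_PAGECNT_BIAS_MAX == PAGE_SIZE / 64 with 4 KiB pages. */
#define MODEL_PAGE_SIZE 4096u
#define MODEL_BIAS_MAX  (MODEL_PAGE_SIZE / 64)

/* Mirrors: u16 drain_count = MLX5E_PAGECNT_BIAS_MAX - frag_page->frags;
 * Storing the result in a 16-bit variable means any frags larger than the
 * bias wraps modulo 65536 instead of going negative. */
static uint16_t model_drain_count(uint16_t frags)
{
	uint16_t drain_count = MODEL_BIAS_MAX - frags;

	return drain_count;
}
```

Under that assumption, frags = 49856 gives drain_count = 15744, whereas any frags in [0, 64] yields a drain_count in [0, 64].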
Let me know what additional experiments would be useful here.
--chris