public inbox for netdev@vger.kernel.org
* mlx5 XDP redirect leaking memory on kernel 6.3
@ 2023-05-23 15:55 Jesper Dangaard Brouer
From: Jesper Dangaard Brouer @ 2023-05-23 15:55 UTC
  To: Dragos Tatulea, Saeed Mahameed, Saeed Mahameed, Tariq Toukan,
	Tariq Toukan, Netdev, Yunsheng Lin
  Cc: brouer, atzin, mkabat, kheib, Jiri Benc, bpf, Felix Maurer,
	Alexander Duyck, Ilias Apalodimas, Lorenzo Bianconi,
	Maxim Mikityanskiy


When the mlx5 driver runs an XDP program doing XDP_REDIRECT, memory is
leaked. Other XDP actions, like XDP_DROP, XDP_PASS and XDP_TX, work
correctly. I tested both redirecting back out the same mlx5 device and
a cpumap redirect (with XDP_PASS); both leak.

After removing the XDP prog, which also causes mlx5 to release the
page_pool, the leaks become visible via the page_pool's periodic
inflight reports. I also use this bpftrace[1] tool to detect the
problem faster (without waiting 60 sec for a report).

  [1] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/bpftrace/page_pool_track_shutdown01.bt

I've been debugging and reading through the code for a couple of days,
but I haven't found the root cause yet. I would appreciate new ideas on
where to look and fresh eyes on the issue.

Lin: it looks like mlx5 uses PP_FLAG_PAGE_FRAG, and my current
suspicion is that the mlx5 driver doesn't fully release the bias count
(hint: see MLX5E_PAGECNT_BIAS_MAX).

--Jesper


Extra info about my device. I'm providing this because the mlx5 driver
can use different allocation modes depending on the HW and the device
priv-flags setup.

$ ethtool --show-priv-flags mlx5p1
Private flags for mlx5p1:
rx_cqe_moder       : on
tx_cqe_moder       : off
rx_cqe_compress    : off
rx_striding_rq     : on
rx_no_csum_complete: off
xdp_tx_mpwqe       : on
skb_tx_mpwqe       : on
tx_port_ts         : off

$ ethtool -i mlx5p1
driver: mlx5_core
version: 6.4.0-rc2-net-next-vm-lock-dbg+
firmware-version: 16.23.1020 (MT_0000000009)
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

$ lspci -v | grep 03:00.0
03:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]



Thread overview (15 messages):
2023-05-23 15:55 mlx5 XDP redirect leaking memory on kernel 6.3 Jesper Dangaard Brouer
2023-05-23 16:35 ` Dragos Tatulea
2023-05-24 11:26   ` Yunsheng Lin
2023-05-24 11:29     ` Yunsheng Lin
2023-05-24 12:03     ` Dragos Tatulea
2023-05-24 12:43       ` Yunsheng Lin
2023-05-24 15:28   ` Jesper Dangaard Brouer
2023-07-13  9:20   ` Jesper Dangaard Brouer
2023-07-13 10:11     ` Dragos Tatulea
2023-07-13 14:58       ` Jesper Dangaard Brouer
2023-07-13 15:31         ` Greg KH
2023-07-17 14:37           ` Dragos Tatulea
2023-07-17 14:42             ` gregkh
2023-07-17 15:15               ` Dragos Tatulea
2023-07-28 13:14                 ` Jesper Dangaard Brouer
