public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/5] net/mlx5e: XDP, Add support for multi-packet per page
@ 2026-03-19  7:50 Tariq Toukan
  2026-03-19  7:50 ` [PATCH net-next 1/5] net/mlx5e: XSK, Increase size for chunk_size param Tariq Toukan
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Tariq Toukan @ 2026-03-19  7:50 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, linux-rdma, linux-kernel, bpf,
	Gal Pressman, Moshe Shemesh, Dragos Tatulea, Carolina Jubran

This change removes the limitation of having one packet per page in XDP
mode. This has the following implications:

- XDP in Striding RQ mode can now be used on 64K page systems.

- XDP in Legacy RQ mode used a single packet per page, which is quite
  inefficient on 64K page systems. The improvement can be observed
  with an XDP_DROP test when running in Legacy RQ mode on an ARM
  Neoverse-N1 system with a 64K page size:
  +-----------------------------------------------+
  | MTU  | baseline   | this change | improvement |
  |------+------------+-------------+-------------|
  | 1500 | 15.55 Mpps | 18.99 Mpps  | 22.0 %      |
  | 9000 | 15.53 Mpps | 18.24 Mpps  | 17.5 %      |
  +-----------------------------------------------+

After lifting this limitation, the series switches to using fragments
for the linear side page in non-linear mode. This small improvement is
most visible in XDP_DROP tests with small 64B packets and an MTU large
enough for Striding RQ to be in non-linear mode:
+----------------------------------------------------------------------+
| System               | MTU  | baseline   | this change | improvement |
|----------------------+------+------------+-------------+-------------|
| 4K page x86_64 [1]   | 9000 | 26.30 Mpps | 30.45 Mpps  | 15.80 %     |
| 64K page aarch64 [2] | 9000 | 15.27 Mpps | 20.10 Mpps  | 31.62 %     |
+----------------------------------------------------------------------+

This series does not cover the xsk (AF_XDP) paths for 64K page systems.


Dragos Tatulea (5):
  net/mlx5e: XSK, Increase size for chunk_size param
  net/mlx5e: XDP, Improve dma address calculation of linear part for
    XDP_TX
  net/mlx5e: XDP, Remove stride size limitation
  net/mlx5e: XDP, Use a single linear page per rq
  net/mlx5e: XDP, Use page fragments for linear data in multibuf-mode

 drivers/net/ethernet/mellanox/mlx5/core/en.h  | 12 +++-
 .../ethernet/mellanox/mlx5/core/en/params.c   | 11 +---
 .../ethernet/mellanox/mlx5/core/en/params.h   |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 50 ++++++++++++++--
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 59 +++++++++++++++----
 6 files changed, 107 insertions(+), 29 deletions(-)


base-commit: a7fb05cbb8f989fa5a81818be9680464cff9d717
-- 
2.44.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net-next 1/5] net/mlx5e: XSK, Increase size for chunk_size param
  2026-03-19  7:50 [PATCH net-next 0/5] net/mlx5e: XDP, Add support for multi-packet per page Tariq Toukan
@ 2026-03-19  7:50 ` Tariq Toukan
  2026-03-19  7:50 ` [PATCH net-next 2/5] net/mlx5e: XDP, Improve dma address calculation of linear part for XDP_TX Tariq Toukan
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Tariq Toukan @ 2026-03-19  7:50 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, linux-rdma, linux-kernel, bpf,
	Gal Pressman, Moshe Shemesh, Dragos Tatulea, Carolina Jubran

From: Dragos Tatulea <dtatulea@nvidia.com>

When 64K pages are used, chunk_size can take the value 64K, which
doesn't fit in a u16. This results in overflows that are detected
in mlx5e_mpwrq_log_wqe_sz().

Increase the type to u32 to fix this.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/params.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
index 9b1a2aed17c3..275f9be53a34 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.h
@@ -8,7 +8,7 @@
 
 struct mlx5e_xsk_param {
 	u16 headroom;
-	u16 chunk_size;
+	u32 chunk_size;
 	bool unaligned;
 };
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next 2/5] net/mlx5e: XDP, Improve dma address calculation of linear part for XDP_TX
  2026-03-19  7:50 [PATCH net-next 0/5] net/mlx5e: XDP, Add support for multi-packet per page Tariq Toukan
  2026-03-19  7:50 ` [PATCH net-next 1/5] net/mlx5e: XSK, Increase size for chunk_size param Tariq Toukan
@ 2026-03-19  7:50 ` Tariq Toukan
  2026-03-19  7:50 ` [PATCH net-next 3/5] net/mlx5e: XDP, Remove stride size limitation Tariq Toukan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Tariq Toukan @ 2026-03-19  7:50 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, linux-rdma, linux-kernel, bpf,
	Gal Pressman, Moshe Shemesh, Dragos Tatulea, Carolina Jubran

From: Dragos Tatulea <dtatulea@nvidia.com>

When calculating the DMA address of the linear part of an XDP frame,
the formula assumes that there is a single XDP buffer per page. Extend
the formula to allow multiple XDP buffers per page by calculating the
data offset within the page.

This prepares for the upcoming removal of the single XDP buffer per
page limitation, after which the old formula would no longer be
correct.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 04e1b5fa4825..d3bab198c99c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -123,7 +123,7 @@ mlx5e_xmit_xdp_buff(struct mlx5e_xdpsq *sq, struct mlx5e_rq *rq,
 	 * mode.
 	 */
 
-	dma_addr = page_pool_get_dma_addr(page) + (xdpf->data - (void *)xdpf);
+	dma_addr = page_pool_get_dma_addr(page) + offset_in_page(xdpf->data);
 	dma_sync_single_for_device(sq->pdev, dma_addr, xdptxd->len, DMA_BIDIRECTIONAL);
 
 	if (xdptxd->has_frags) {
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next 3/5] net/mlx5e: XDP, Remove stride size limitation
  2026-03-19  7:50 [PATCH net-next 0/5] net/mlx5e: XDP, Add support for multi-packet per page Tariq Toukan
  2026-03-19  7:50 ` [PATCH net-next 1/5] net/mlx5e: XSK, Increase size for chunk_size param Tariq Toukan
  2026-03-19  7:50 ` [PATCH net-next 2/5] net/mlx5e: XDP, Improve dma address calculation of linear part for XDP_TX Tariq Toukan
@ 2026-03-19  7:50 ` Tariq Toukan
  2026-03-19  7:50 ` [PATCH net-next 4/5] net/mlx5e: XDP, Use a single linear page per rq Tariq Toukan
  2026-03-19  7:50 ` [PATCH net-next 5/5] net/mlx5e: XDP, Use page fragments for linear data in multibuf-mode Tariq Toukan
  4 siblings, 0 replies; 8+ messages in thread
From: Tariq Toukan @ 2026-03-19  7:50 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, linux-rdma, linux-kernel, bpf,
	Gal Pressman, Moshe Shemesh, Dragos Tatulea, Carolina Jubran

From: Dragos Tatulea <dtatulea@nvidia.com>

Currently XDP mode always uses PAGE_SIZE strides. This limitation
existed because page fragment counting was not implemented when XDP
support was added. The limitation caused further issues on systems
with larger pages (e.g. 64K):

- XDP for Striding RQ was effectively disabled on such systems.

- Legacy RQ allows the configuration but uses a fixed scheme of one XDP
  buffer per page, which is inefficient.

As fragment counting was added during the driver conversion to
page_pool and the support for XDP multi-buffer, it is now possible
to remove this stride size limitation. This patch does just that.

Now it is possible to use XDP on systems with higher page sizes (e.g.
64K):

- For Striding RQ, loading the program is no longer blocked.
  Although a 64K page can fit any packet, MTUs that result in a
  stride > 8K will still put the RQ in non-linear mode, because
  the HW doesn't support strides larger than 8K.

- For Legacy RQ, the stride size was PAGE_SIZE, which was very
  inefficient. Now the stride size is calculated relative to the MTU.
  Legacy RQ will always be in linear mode on larger system pages.

  This can be observed with an XDP_DROP test [1] when running
  in Legacy RQ mode on an ARM Neoverse-N1 system with a 64K
  page size:
  +-----------------------------------------------+
  | MTU  | baseline   | this change | improvement |
  |------+------------+-------------+-------------|
  | 1500 | 15.55 Mpps | 18.99 Mpps  | 22.0 %      |
  | 9000 | 15.53 Mpps | 18.24 Mpps  | 17.5 %      |
  +-----------------------------------------------+

There are performance benefits for Striding RQ mode as well:

- Striding RQ non-linear mode now uses 256B strides, just like
  non-XDP mode.

- Striding RQ linear mode can now fit multiple XDP buffers per page,
  depending on the MTU. For example, on 4K page systems with a small
  enough MTU, 2 XDP buffers fit in one page.

The above benefits for Striding RQ can be observed with an
XDP_DROP test [1] when running on a 4K page x86_64 system
(Intel Xeon Platinum 8580):
  +-----------------------------------------------+
  | MTU  | baseline   | this change | improvement |
  |------+------------+-------------+-------------|
  | 1000 | 28.36 Mpps | 33.98 Mpps  | 19.82 %     |
  | 9000 | 20.76 Mpps | 26.30 Mpps  | 26.70 %     |
  +-----------------------------------------------+

[1] Test description:
- xdp-bench with XDP_DROP
- RX: single queue
- TX: sends 64B packets to saturate CPU on RX side

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/params.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 26bb31c56e45..1f4a547917ba 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -298,12 +298,9 @@ static u32 mlx5e_rx_get_linear_stride_sz(struct mlx5_core_dev *mdev,
 	 * no_head_tail_room should be set in the case of XDP with Striding RQ
 	 * when SKB is not linear. This is because another page is allocated for the linear part.
 	 */
-	sz = roundup_pow_of_two(mlx5e_rx_get_linear_sz_skb(params, no_head_tail_room));
+	sz = mlx5e_rx_get_linear_sz_skb(params, no_head_tail_room);
 
-	/* XDP in mlx5e doesn't support multiple packets per page.
-	 * Do not assume sz <= PAGE_SIZE if params->xdp_prog is set.
-	 */
-	return params->xdp_prog && sz < PAGE_SIZE ? PAGE_SIZE : sz;
+	return roundup_pow_of_two(sz);
 }
 
 static u8 mlx5e_mpwqe_log_pkts_per_wqe(struct mlx5_core_dev *mdev,
@@ -453,10 +450,6 @@ u8 mlx5e_mpwqe_get_log_stride_size(struct mlx5_core_dev *mdev,
 		return order_base_2(mlx5e_rx_get_linear_stride_sz(mdev, params,
 								  rqo, true));
 
-	/* XDP in mlx5e doesn't support multiple packets per page. */
-	if (params->xdp_prog)
-		return PAGE_SHIFT;
-
 	return MLX5_MPWRQ_DEF_LOG_STRIDE_SZ(mdev);
 }
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next 4/5] net/mlx5e: XDP, Use a single linear page per rq
  2026-03-19  7:50 [PATCH net-next 0/5] net/mlx5e: XDP, Add support for multi-packet per page Tariq Toukan
                   ` (2 preceding siblings ...)
  2026-03-19  7:50 ` [PATCH net-next 3/5] net/mlx5e: XDP, Remove stride size limitation Tariq Toukan
@ 2026-03-19  7:50 ` Tariq Toukan
  2026-03-19  7:50 ` [PATCH net-next 5/5] net/mlx5e: XDP, Use page fragments for linear data in multibuf-mode Tariq Toukan
  4 siblings, 0 replies; 8+ messages in thread
From: Tariq Toukan @ 2026-03-19  7:50 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, linux-rdma, linux-kernel, bpf,
	Gal Pressman, Moshe Shemesh, Dragos Tatulea, Carolina Jubran

From: Dragos Tatulea <dtatulea@nvidia.com>

Currently in striding rq there is one mlx5e_frag_page member per WQE for
the linear page. This linear page is used only in XDP multi-buffer mode.
This is wasteful because only one linear page is needed per rq: the page
gets refreshed on every packet, regardless of WQE. Furthermore, it is
not needed in other modes (non-XDP, XDP single-buffer).

This change moves the linear page into its own structure (struct
mlx5_mpw_linear_info) and allocates it only when necessary.

A special structure is created because an upcoming patch will extend
this structure to support fragmentation of the linear page.

This patch has no functional changes.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  6 ++-
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 37 ++++++++++++++++---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 17 +++++----
 3 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index c7ac6ebe8290..592234780f2b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -591,10 +591,13 @@ union mlx5e_alloc_units {
 struct mlx5e_mpw_info {
 	u16 consumed_strides;
 	DECLARE_BITMAP(skip_release_bitmap, MLX5_MPWRQ_MAX_PAGES_PER_WQE);
-	struct mlx5e_frag_page linear_page;
 	union mlx5e_alloc_units alloc_units;
 };
 
+struct mlx5e_mpw_linear_info {
+	struct mlx5e_frag_page frag_page;
+};
+
 #define MLX5E_MAX_RX_FRAGS 4
 
 struct mlx5e_rq;
@@ -689,6 +692,7 @@ struct mlx5e_rq {
 			u8                     umr_wqebbs;
 			u8                     mtts_per_wqe;
 			u8                     umr_mode;
+			struct mlx5e_mpw_linear_info *linear_info;
 			struct mlx5e_shampo_hd *shampo;
 		} mpwqe;
 	};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index f7009da94f0b..8b3c82f6f038 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -369,6 +369,29 @@ static int mlx5e_rq_alloc_mpwqe_info(struct mlx5e_rq *rq, int node)
 	return 0;
 }
 
+static int mlx5e_rq_alloc_mpwqe_linear_info(struct mlx5e_rq *rq, int node,
+					    struct mlx5e_params *params,
+					    struct mlx5e_rq_opt_param *rqo,
+					    u32 *pool_size)
+{
+	struct mlx5_core_dev *mdev = rq->mdev;
+	struct mlx5e_mpw_linear_info *li;
+
+	if (mlx5e_rx_mpwqe_is_linear_skb(mdev, params, rqo) ||
+	    !params->xdp_prog)
+		return 0;
+
+	li = kvzalloc_node(sizeof(*li), GFP_KERNEL, node);
+	if (!li)
+		return -ENOMEM;
+
+	rq->mpwqe.linear_info = li;
+
+	/* additional page per packet for the linear part */
+	*pool_size *= 2;
+
+	return 0;
+}
 
 static u8 mlx5e_mpwrq_access_mode(enum mlx5e_mpwrq_umr_mode umr_mode)
 {
@@ -915,10 +938,6 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 			mlx5e_mpwqe_get_log_rq_size(mdev, params, rqo);
 		pool_order = rq->mpwqe.page_shift - PAGE_SHIFT;
 
-		if (!mlx5e_rx_mpwqe_is_linear_skb(mdev, params, rqo) &&
-		    params->xdp_prog)
-			pool_size *= 2; /* additional page per packet for the linear part */
-
 		rq->mpwqe.log_stride_sz =
 				mlx5e_mpwqe_get_log_stride_size(mdev, params,
 								rqo);
@@ -936,10 +955,15 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 		if (err)
 			goto err_rq_mkey;
 
-		err = mlx5_rq_shampo_alloc(mdev, params, rq_param, rq, node);
+		err = mlx5e_rq_alloc_mpwqe_linear_info(rq, node, params, rqo,
+						       &pool_size);
 		if (err)
 			goto err_free_mpwqe_info;
 
+		err = mlx5_rq_shampo_alloc(mdev, params, rq_param, rq, node);
+		if (err)
+			goto err_free_mpwqe_linear_info;
+
 		break;
 	default: /* MLX5_WQ_TYPE_CYCLIC */
 		err = mlx5_wq_cyc_create(mdev, &rq_param->wq, rqc_wq,
@@ -1054,6 +1078,8 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 	switch (rq->wq_type) {
 	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
 		mlx5e_rq_free_shampo(rq);
+err_free_mpwqe_linear_info:
+		kvfree(rq->mpwqe.linear_info);
 err_free_mpwqe_info:
 		kvfree(rq->mpwqe.info);
 err_rq_mkey:
@@ -1081,6 +1107,7 @@ static void mlx5e_free_rq(struct mlx5e_rq *rq)
 	switch (rq->wq_type) {
 	case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
 		mlx5e_rq_free_shampo(rq);
+		kvfree(rq->mpwqe.linear_info);
 		kvfree(rq->mpwqe.info);
 		mlx5_core_destroy_mkey(rq->mdev, be32_to_cpu(rq->mpwqe.umr_mkey_be));
 		mlx5e_free_mpwqe_rq_drop_page(rq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index f5c0e2a0ada9..feb042d84b8e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1869,6 +1869,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 	struct mlx5e_frag_page *frag_page = &wi->alloc_units.frag_pages[page_idx];
 	u16 headlen = min_t(u16, MLX5E_RX_MAX_HEAD, cqe_bcnt);
 	struct mlx5e_frag_page *head_page = frag_page;
+	struct mlx5e_frag_page *linear_page = NULL;
 	struct mlx5e_xdp_buff *mxbuf = &rq->mxbuf;
 	u32 page_size = BIT(rq->mpwqe.page_shift);
 	u32 frag_offset    = head_offset;
@@ -1897,13 +1898,15 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 	if (prog) {
 		/* area for bpf_xdp_[store|load]_bytes */
 		net_prefetchw(netmem_address(frag_page->netmem) + frag_offset);
+
+		linear_page = &rq->mpwqe.linear_info->frag_page;
 		if (unlikely(mlx5e_page_alloc_fragmented(rq->page_pool,
-							 &wi->linear_page))) {
+							 linear_page))) {
 			rq->stats->buff_alloc_err++;
 			return NULL;
 		}
 
-		va = netmem_address(wi->linear_page.netmem);
+		va = netmem_address(linear_page->netmem);
 		net_prefetchw(va); /* xdp_frame data area */
 		linear_hr = XDP_PACKET_HEADROOM;
 		linear_data_len = 0;
@@ -1966,10 +1969,10 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 				for (pfp = head_page; pfp < frag_page; pfp++)
 					pfp->frags++;
 
-				wi->linear_page.frags++;
+				linear_page->frags++;
 			}
 			mlx5e_page_release_fragmented(rq->page_pool,
-						      &wi->linear_page);
+						      linear_page);
 			return NULL; /* page/packet was consumed by XDP */
 		}
 
@@ -1988,13 +1991,13 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 			mxbuf->xdp.data - mxbuf->xdp.data_meta);
 		if (unlikely(!skb)) {
 			mlx5e_page_release_fragmented(rq->page_pool,
-						      &wi->linear_page);
+						      linear_page);
 			return NULL;
 		}
 
 		skb_mark_for_recycle(skb);
-		wi->linear_page.frags++;
-		mlx5e_page_release_fragmented(rq->page_pool, &wi->linear_page);
+		linear_page->frags++;
+		mlx5e_page_release_fragmented(rq->page_pool, linear_page);
 
 		if (xdp_buff_has_frags(&mxbuf->xdp)) {
 			struct mlx5e_frag_page *pagep;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next 5/5] net/mlx5e: XDP, Use page fragments for linear data in multibuf-mode
  2026-03-19  7:50 [PATCH net-next 0/5] net/mlx5e: XDP, Add support for multi-packet per page Tariq Toukan
                   ` (3 preceding siblings ...)
  2026-03-19  7:50 ` [PATCH net-next 4/5] net/mlx5e: XDP, Use a single linear page per rq Tariq Toukan
@ 2026-03-19  7:50 ` Tariq Toukan
  2026-03-24  2:42   ` Jakub Kicinski
  4 siblings, 1 reply; 8+ messages in thread
From: Tariq Toukan @ 2026-03-19  7:50 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, linux-rdma, linux-kernel, bpf,
	Gal Pressman, Moshe Shemesh, Dragos Tatulea, Carolina Jubran

From: Dragos Tatulea <dtatulea@nvidia.com>

Currently in XDP multi-buffer mode for Striding RQ, a whole page is
allocated for the linear part of the XDP buffer. This is wasteful,
especially on systems with larger page sizes.

This change splits the page into fixed-size fragments. The page is
replenished only when the maximum number of allowed fragments has been
consumed; an unused fragment is simply recycled for the next packet.
This works out well for XDP_DROP: in the most extreme case (dropping
everything), zero fragments are consumed, so a single linear page
allocation covers the lifetime of the XDP program.

The previous page_pool size increase (doubling the size) is now
over-provisioned: with fragments there are far fewer allocations (1/8
as many for a 4K page). So drop the page_pool size extension
altogether for the linear side page.

This small improvement is most visible in XDP_DROP tests with small
64B packets and an MTU large enough for Striding RQ to be in
non-linear mode:
+----------------------------------------------------------------------+
| System               | MTU  | baseline   | this change | improvement |
|----------------------+------+------------+-------------+-------------|
| 4K page x86_64 [1]   | 9000 | 26.30 Mpps | 30.45 Mpps  | 15.80 %     |
| 64K page aarch64 [2] | 9000 | 15.27 Mpps | 20.10 Mpps  | 31.62 %     |
+----------------------------------------------------------------------+

[1] Intel Xeon Platinum 8580
[2] ARM Neoverse-N1

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  6 +++
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 25 ++++++---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 54 +++++++++++++++----
 3 files changed, 68 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 592234780f2b..2270e2e550dd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -82,6 +82,9 @@ struct page_pool;
 
 #define MLX5E_PAGECNT_BIAS_MAX U16_MAX
 #define MLX5E_RX_MAX_HEAD (256)
+#define MLX5E_XDP_LOG_MAX_LINEAR_SZ \
+	order_base_2(MLX5_SKB_FRAG_SZ(XDP_PACKET_HEADROOM + MLX5E_RX_MAX_HEAD))
+
 #define MLX5E_SHAMPO_LOG_HEADER_ENTRY_SIZE (8)
 #define MLX5E_SHAMPO_WQ_HEADER_PER_PAGE \
 	(PAGE_SIZE >> MLX5E_SHAMPO_LOG_HEADER_ENTRY_SIZE)
@@ -596,6 +599,7 @@ struct mlx5e_mpw_info {
 
 struct mlx5e_mpw_linear_info {
 	struct mlx5e_frag_page frag_page;
+	u16 max_frags;
 };
 
 #define MLX5E_MAX_RX_FRAGS 4
@@ -1081,6 +1085,8 @@ bool mlx5e_reset_rx_moderation(struct dim_cq_moder *cq_moder, u8 cq_period_mode,
 bool mlx5e_reset_rx_channels_moderation(struct mlx5e_channels *chs, u8 cq_period_mode,
 					bool dim_enabled, bool keep_dim_state);
 
+void mlx5e_mpwqe_dealloc_linear_page(struct mlx5e_rq *rq);
+
 struct mlx5e_sq_param;
 int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params,
 		     struct mlx5e_sq_param *param, struct xsk_buff_pool *xsk_pool,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8b3c82f6f038..b376abc561fd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -371,11 +371,11 @@ static int mlx5e_rq_alloc_mpwqe_info(struct mlx5e_rq *rq, int node)
 
 static int mlx5e_rq_alloc_mpwqe_linear_info(struct mlx5e_rq *rq, int node,
 					    struct mlx5e_params *params,
-					    struct mlx5e_rq_opt_param *rqo,
-					    u32 *pool_size)
+					    struct mlx5e_rq_opt_param *rqo)
 {
 	struct mlx5_core_dev *mdev = rq->mdev;
 	struct mlx5e_mpw_linear_info *li;
+	u32 linear_frag_count;
 
 	if (mlx5e_rx_mpwqe_is_linear_skb(mdev, params, rqo) ||
 	    !params->xdp_prog)
@@ -385,10 +385,22 @@ static int mlx5e_rq_alloc_mpwqe_linear_info(struct mlx5e_rq *rq, int node,
 	if (!li)
 		return -ENOMEM;
 
+	linear_frag_count =
+		BIT(rq->mpwqe.page_shift - MLX5E_XDP_LOG_MAX_LINEAR_SZ);
+	if (linear_frag_count > U16_MAX) {
+		netdev_warn(rq->netdev,
+			    "rq %d: linear_frag_count (%u) larger than expected (%u), page_shift: %u, log_max_linear_sz: %u\n",
+			    rq->ix, linear_frag_count, U16_MAX,
+			    rq->mpwqe.page_shift, MLX5E_XDP_LOG_MAX_LINEAR_SZ);
+		kvfree(li);
+		return -EINVAL;
+	}
+
+	li->max_frags = linear_frag_count;
 	rq->mpwqe.linear_info = li;
 
-	/* additional page per packet for the linear part */
-	*pool_size *= 2;
+	/* Set to max to force allocation on first run. */
+	li->frag_page.frags = li->max_frags;
 
 	return 0;
 }
@@ -955,8 +967,7 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 		if (err)
 			goto err_rq_mkey;
 
-		err = mlx5e_rq_alloc_mpwqe_linear_info(rq, node, params, rqo,
-						       &pool_size);
+		err = mlx5e_rq_alloc_mpwqe_linear_info(rq, node, params, rqo);
 		if (err)
 			goto err_free_mpwqe_info;
 
@@ -1347,6 +1358,8 @@ void mlx5e_free_rx_descs(struct mlx5e_rq *rq)
 			mlx5_wq_ll_pop(wq, wqe_ix_be,
 				       &wqe->next.next_wqe_index);
 		}
+
+		mlx5e_mpwqe_dealloc_linear_page(rq);
 	} else {
 		struct mlx5_wq_cyc *wq = &rq->wqe.wq;
 		u16 missing = mlx5_wq_cyc_missing(wq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index feb042d84b8e..2ac38536afe9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -300,6 +300,35 @@ static void mlx5e_page_release_fragmented(struct page_pool *pp,
 		page_pool_put_unrefed_netmem(pp, netmem, -1, true);
 }
 
+static int mlx5e_mpwqe_linear_page_refill(struct mlx5e_rq *rq)
+{
+	struct mlx5e_mpw_linear_info *li = rq->mpwqe.linear_info;
+
+	if (likely(li->frag_page.frags < li->max_frags))
+		return 0;
+
+	if (likely(li->frag_page.netmem)) {
+		mlx5e_page_release_fragmented(rq->page_pool, &li->frag_page);
+		li->frag_page.netmem = 0;
+	}
+
+	return mlx5e_page_alloc_fragmented(rq->page_pool, &li->frag_page);
+}
+
+static void *mlx5e_mpwqe_get_linear_page_frag(struct mlx5e_rq *rq)
+{
+	struct mlx5e_mpw_linear_info *li = rq->mpwqe.linear_info;
+	u32 frag_offset;
+
+	if (unlikely(mlx5e_mpwqe_linear_page_refill(rq)))
+		return NULL;
+
+	frag_offset = li->frag_page.frags << MLX5E_XDP_LOG_MAX_LINEAR_SZ;
+	WARN_ON(frag_offset >= BIT(rq->mpwqe.page_shift));
+
+	return netmem_address(li->frag_page.netmem) + frag_offset;
+}
+
 static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq,
 				    struct mlx5e_wqe_frag_info *frag)
 {
@@ -702,6 +731,16 @@ static void mlx5e_dealloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	bitmap_fill(wi->skip_release_bitmap, rq->mpwqe.pages_per_wqe);
 }
 
+void mlx5e_mpwqe_dealloc_linear_page(struct mlx5e_rq *rq)
+{
+	struct mlx5e_mpw_linear_info *li = rq->mpwqe.linear_info;
+
+	if (!li || !li->frag_page.netmem)
+		return;
+
+	mlx5e_page_release_fragmented(rq->page_pool, &li->frag_page);
+}
+
 INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq)
 {
 	struct mlx5_wq_cyc *wq = &rq->wqe.wq;
@@ -1899,18 +1938,17 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 		/* area for bpf_xdp_[store|load]_bytes */
 		net_prefetchw(netmem_address(frag_page->netmem) + frag_offset);
 
-		linear_page = &rq->mpwqe.linear_info->frag_page;
-		if (unlikely(mlx5e_page_alloc_fragmented(rq->page_pool,
-							 linear_page))) {
+		va = mlx5e_mpwqe_get_linear_page_frag(rq);
+		if (!va) {
 			rq->stats->buff_alloc_err++;
 			return NULL;
 		}
 
-		va = netmem_address(linear_page->netmem);
 		net_prefetchw(va); /* xdp_frame data area */
 		linear_hr = XDP_PACKET_HEADROOM;
 		linear_data_len = 0;
 		linear_frame_sz = MLX5_SKB_FRAG_SZ(linear_hr + MLX5E_RX_MAX_HEAD);
+		linear_page = &rq->mpwqe.linear_info->frag_page;
 	} else {
 		skb = napi_alloc_skb(rq->cq.napi,
 				     ALIGN(MLX5E_RX_MAX_HEAD, sizeof(long)));
@@ -1971,8 +2009,6 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 
 				linear_page->frags++;
 			}
-			mlx5e_page_release_fragmented(rq->page_pool,
-						      linear_page);
 			return NULL; /* page/packet was consumed by XDP */
 		}
 
@@ -1989,15 +2025,11 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 			rq, mxbuf->xdp.data_hard_start, linear_frame_sz,
 			mxbuf->xdp.data - mxbuf->xdp.data_hard_start, len,
 			mxbuf->xdp.data - mxbuf->xdp.data_meta);
-		if (unlikely(!skb)) {
-			mlx5e_page_release_fragmented(rq->page_pool,
-						      linear_page);
+		if (unlikely(!skb))
 			return NULL;
-		}
 
 		skb_mark_for_recycle(skb);
 		linear_page->frags++;
-		mlx5e_page_release_fragmented(rq->page_pool, linear_page);
 
 		if (xdp_buff_has_frags(&mxbuf->xdp)) {
 			struct mlx5e_frag_page *pagep;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 5/5] net/mlx5e: XDP, Use page fragments for linear data in multibuf-mode
  2026-03-19  7:50 ` [PATCH net-next 5/5] net/mlx5e: XDP, Use page fragments for linear data in multibuf-mode Tariq Toukan
@ 2026-03-24  2:42   ` Jakub Kicinski
  2026-03-24  8:50     ` Dragos Tatulea
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2026-03-24  2:42 UTC (permalink / raw)
  To: tariqt
  Cc: Jakub Kicinski, leon, linux-kernel, netdev, moshe, daniel,
	edumazet, andrew+netdev, hawk, dtatulea, ast, mbloch,
	john.fastabend, bpf, cjubran, linux-rdma, davem, pabeni, gal,
	saeedm

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.

For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
---
net/mlx5e: XDP, Use page fragments for linear data in multibuf-mode

This commit optimizes XDP multi-buffer mode by splitting pages into fixed-
size fragments instead of allocating whole pages for the linear part of XDP
buffers. This reduces allocations significantly, especially on systems with
larger page sizes, improving performance by 15.80% on 4K pages and 31.62%
on 64K pages for XDP_DROP tests.

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> index 592234780f2b..2270e2e550dd 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h

[ ... ]

> @@ -1081,6 +1085,8 @@ bool mlx5e_reset_rx_moderation(struct dim_cq_moder *cq_moder, u8 cq_period_mode,
>  bool mlx5e_reset_rx_channels_moderation(struct mlx5e_channels *chs, u8 cq_period_mode,
>  					bool dim_enabled, bool keep_dim_state);
>
> +void mlx5e_mpwqe_dealloc_linear_page(struct mlx5e_rq *rq);
> +
>  struct mlx5e_sq_param;
>  int mlx5e_open_xdpsq(struct mlx5e_channel *c, struct mlx5e_params *params,
>  		     struct mlx5e_sq_param *param, struct xsk_buff_pool *xsk_pool,

[ ... ]

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 8b3c82f6f038..b376abc561fd 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c

[ ... ]

> @@ -385,10 +385,22 @@ static int mlx5e_rq_alloc_mpwqe_linear_info(struct mlx5e_rq *rq, int node,
>  	if (!li)
>  		return -ENOMEM;
>
> +	linear_frag_count =
> +		BIT(rq->mpwqe.page_shift - MLX5E_XDP_LOG_MAX_LINEAR_SZ);
> +	if (linear_frag_count > U16_MAX) {
> +		netdev_warn(rq->netdev,
> +			    "rq %d: linear_frag_count (%u) larger than expected (%u), page_shift: %u, log_max_linear_sz: %u\n",
> +			    rq->ix, linear_frag_count, U16_MAX,
> +			    rq->mpwqe.page_shift, MLX5E_XDP_LOG_MAX_LINEAR_SZ);
> +		kvfree(li);
> +		return -EINVAL;
> +	}
> +
> +	li->max_frags = linear_frag_count;
>  	rq->mpwqe.linear_info = li;
>
> -	/* additional page per packet for the linear part */
> -	*pool_size *= 2;
> +	/* Set to max to force allocation on first run. */
> +	li->frag_page.frags = li->max_frags;
>
>  	return 0;
>  }

[ ... ]

> @@ -1347,6 +1358,8 @@ void mlx5e_free_rx_descs(struct mlx5e_rq *rq)
>  			mlx5_wq_ll_pop(wq, wqe_ix_be,
>  				       &wqe->next.next_wqe_index);
>  		}
> +
> +		mlx5e_mpwqe_dealloc_linear_page(rq);
>  	} else {
>  		struct mlx5_wq_cyc *wq = &rq->wqe.wq;
>  		u16 missing = mlx5_wq_cyc_missing(wq);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index feb042d84b8e..2ac38536afe9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c

[ ... ]

> @@ -300,6 +300,35 @@ static void mlx5e_page_release_fragmented(struct page_pool *pp,
>  		page_pool_put_unrefed_netmem(pp, netmem, -1, true);
>  }
>
> +static int mlx5e_mpwqe_linear_page_refill(struct mlx5e_rq *rq)
> +{
> +	struct mlx5e_mpw_linear_info *li = rq->mpwqe.linear_info;
> +
> +	if (likely(li->frag_page.frags < li->max_frags))
> +		return 0;
> +
> +	if (likely(li->frag_page.netmem)) {
> +		mlx5e_page_release_fragmented(rq->page_pool, &li->frag_page);
> +		li->frag_page.netmem = 0;
> +	}
> +
> +	return mlx5e_page_alloc_fragmented(rq->page_pool, &li->frag_page);
> +}
> +
> +static void *mlx5e_mpwqe_get_linear_page_frag(struct mlx5e_rq *rq)
> +{
> +	struct mlx5e_mpw_linear_info *li = rq->mpwqe.linear_info;
> +	u32 frag_offset;
> +
> +	if (unlikely(mlx5e_mpwqe_linear_page_refill(rq)))
> +		return NULL;
> +
> +	frag_offset = li->frag_page.frags << MLX5E_XDP_LOG_MAX_LINEAR_SZ;
> +	WARN_ON(frag_offset >= BIT(rq->mpwqe.page_shift));
> +
> +	return netmem_address(li->frag_page.netmem) + frag_offset;
> +}
> +

[ ... ]

> @@ -702,6 +731,16 @@ static void mlx5e_dealloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
>  	bitmap_fill(wi->skip_release_bitmap, rq->mpwqe.pages_per_wqe);
>  }
>
> +void mlx5e_mpwqe_dealloc_linear_page(struct mlx5e_rq *rq)
> +{
> +	struct mlx5e_mpw_linear_info *li = rq->mpwqe.linear_info;
> +
> +	if (!li || !li->frag_page.netmem)
> +		return;
> +
> +	mlx5e_page_release_fragmented(rq->page_pool, &li->frag_page);
> +}

Does this function need to reset li->frag_page.netmem to 0 and
li->frag_page.frags to li->max_frags after releasing the page?

When an RQ is flushed and then reactivated (for example, in error recovery
via mlx5e_rx_reporter_err_rq_cqe_recover()), the following sequence occurs:

mlx5e_rx_reporter_err_rq_cqe_recover()
  mlx5e_flush_rq()
    mlx5e_free_rx_descs()
      mlx5e_mpwqe_dealloc_linear_page()
        mlx5e_page_release_fragmented()  [releases the page]
        [but doesn't reset netmem or frags]
  mlx5e_activate_rq()
    [RQ becomes active again]

When the next XDP multi-buffer packet arrives, mlx5e_mpwqe_get_linear_page_frag()
is called:

mlx5e_skb_from_cqe_mpwrq_nonlinear()
  mlx5e_mpwqe_get_linear_page_frag()
    mlx5e_mpwqe_linear_page_refill()
      if (likely(li->frag_page.frags < li->max_frags))
        return 0;  [skips allocation if frags < max_frags]
    netmem_address(li->frag_page.netmem) + frag_offset  [UAF]

If li->frag_page.frags was not reset to li->max_frags in
mlx5e_mpwqe_dealloc_linear_page(), the refill function will see frags <
max_frags and skip the allocation, then compute the address using the stale
netmem pointer that was already returned to the page pool.

Shouldn't this match the initialization in mlx5e_rq_alloc_mpwqe_linear_info()
which sets li->frag_page.frags = li->max_frags?
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 5/5] net/mlx5e: XDP, Use page fragments for linear data in multibuf-mode
  2026-03-24  2:42   ` Jakub Kicinski
@ 2026-03-24  8:50     ` Dragos Tatulea
  0 siblings, 0 replies; 8+ messages in thread
From: Dragos Tatulea @ 2026-03-24  8:50 UTC (permalink / raw)
  To: Jakub Kicinski, tariqt
  Cc: leon, linux-kernel, netdev, moshe, daniel, edumazet,
	andrew+netdev, hawk, ast, mbloch, john.fastabend, bpf, cjubran,
	linux-rdma, davem, pabeni, gal, saeedm



On 24.03.26 03:42, Jakub Kicinski wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> 
> For local reproduction steps see https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> [...]
> 
>> @@ -702,6 +731,16 @@ static void mlx5e_dealloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
>>  	bitmap_fill(wi->skip_release_bitmap, rq->mpwqe.pages_per_wqe);
>>  }
>>
>> +void mlx5e_mpwqe_dealloc_linear_page(struct mlx5e_rq *rq)
>> +{
>> +	struct mlx5e_mpw_linear_info *li = rq->mpwqe.linear_info;
>> +
>> +	if (!li || !li->frag_page.netmem)
>> +		return;
>> +
>> +	mlx5e_page_release_fragmented(rq->page_pool, &li->frag_page);
>> +}
> 
> Does this function need to reset li->frag_page.netmem to 0 and
> li->frag_page.frags to li->max_frags after releasing the page?
> 
> When an RQ is flushed and then reactivated (for example, in error recovery
> via mlx5e_rx_reporter_err_rq_cqe_recover()), the following sequence occurs:
> 
> mlx5e_rx_reporter_err_rq_cqe_recover()
>   mlx5e_flush_rq()
>     mlx5e_free_rx_descs()
>       mlx5e_mpwqe_dealloc_linear_page()
>         mlx5e_page_release_fragmented()  [releases the page]
>         [but doesn't reset netmem or frags]
>   mlx5e_activate_rq()
>     [RQ becomes active again]
> 
> When the next XDP multi-buffer packet arrives, mlx5e_mpwqe_get_linear_page_frag()
> is called:
> 
> mlx5e_skb_from_cqe_mpwrq_nonlinear()
>   mlx5e_mpwqe_get_linear_page_frag()
>     mlx5e_mpwqe_linear_page_refill()
>       if (likely(li->frag_page.frags < li->max_frags))
>         return 0;  [skips allocation if frags < max_frags]
>     netmem_address(li->frag_page.netmem) + frag_offset  [UAF]
> 
> If li->frag_page.frags was not reset to li->max_frags in
> mlx5e_mpwqe_dealloc_linear_page(), the refill function will see frags <
> max_frags and skip the allocation, then compute the address using the stale
> netmem pointer that was already returned to the page pool.
> 
> Shouldn't this match the initialization in mlx5e_rq_alloc_mpwqe_linear_info()
> which sets li->frag_page.frags = li->max_frags?
Good catch. Will address in v2.

Thanks,
Dragos


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-03-24  8:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-19  7:50 [PATCH net-next 0/5] net/mlx5e: XDP, Add support for multi-packet per page Tariq Toukan
2026-03-19  7:50 ` [PATCH net-next 1/5] net/mlx5e: XSK, Increase size for chunk_size param Tariq Toukan
2026-03-19  7:50 ` [PATCH net-next 2/5] net/mlx5e: XDP, Improve dma address calculation of linear part for XDP_TX Tariq Toukan
2026-03-19  7:50 ` [PATCH net-next 3/5] net/mlx5e: XDP, Remove stride size limitation Tariq Toukan
2026-03-19  7:50 ` [PATCH net-next 4/5] net/mlx5e: XDP, Use a single linear page per rq Tariq Toukan
2026-03-19  7:50 ` [PATCH net-next 5/5] net/mlx5e: XDP, Use page fragments for linear data in multibuf-mode Tariq Toukan
2026-03-24  2:42   ` Jakub Kicinski
2026-03-24  8:50     ` Dragos Tatulea
