public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Tariq Toukan <tariqt@nvidia.com>
To: Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
	<netdev@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, Gal Pressman <gal@nvidia.com>,
	Moshe Shemesh <moshe@nvidia.com>,
	Cosmin Ratiu <cratiu@nvidia.com>,
	Dragos Tatulea <dtatulea@nvidia.com>
Subject: [PATCH net-next 2/3] net/mlx5e: SHAMPO, Improve allocation recovery
Date: Mon, 12 Jan 2026 15:22:08 +0200	[thread overview]
Message-ID: <1768224129-1600265-3-git-send-email-tariqt@nvidia.com> (raw)
In-Reply-To: <1768224129-1600265-1-git-send-email-tariqt@nvidia.com>

From: Dragos Tatulea <dtatulea@nvidia.com>

When memory providers are used, there is a disconnect between the
page_pool size and the available memory in the provider. This means
that the page_pool can run out of memory if the user didn't provision
a large enough buffer.

Under these conditions, mlx5 gets stuck trying to allocate new
buffers without being able to release existing buffers. This happens due
to the optimization introduced in commit 4c2a13236807
("net/mlx5e: RX, Defer page release in striding rq for better recycling")
which delays WQE releases to increase the chance of page_pool direct
recycling. The optimization was developed before memory providers
existed and this circumstance was not considered.

This patch unblocks the queue by reclaiming pages from WQEs that can be
freed and doing a one-shot retry. A WQE can be freed when:
1) All its strides have been consumed (WQE is no longer in linked list).
2) The WQE pages/netmems have not been previously released.

This reclaim mechanism is useful for regular pages as well.

Note that provisioning memory that can't fill even one MPWQE (64
4K pages) will still render the queue unusable. Same when
the application doesn't release its buffers for various reasons.
Or a combination of the two: a very small buffer is provisioned,
application releases buffers in bulk, bulk size never reached
=> queue is stuck.

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 26 +++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 57e20beb05dc..aae4db392992 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1083,11 +1083,24 @@ int mlx5e_poll_ico_cq(struct mlx5e_cq *cq)
 	return i;
 }
 
+static void mlx5e_reclaim_mpwqe_pages(struct mlx5e_rq *rq, int head,
+				      int reclaim)
+{
+	struct mlx5_wq_ll *wq = &rq->mpwqe.wq;
+
+	for (int i = 0; i < reclaim; i++) {
+		head = mlx5_wq_ll_get_wqe_next_ix(wq, head);
+
+		mlx5e_dealloc_rx_mpwqe(rq, head);
+	}
+}
+
 INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq)
 {
 	struct mlx5_wq_ll *wq = &rq->mpwqe.wq;
 	u8  umr_completed = rq->mpwqe.umr_completed;
 	struct mlx5e_icosq *sq = rq->icosq;
+	bool reclaimed = false;
 	int alloc_err = 0;
 	u8  missing, i;
 	u16 head;
@@ -1122,11 +1135,20 @@ INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq)
 		/* Deferred free for better page pool cache usage. */
 		mlx5e_free_rx_mpwqe(rq, wi);
 
+retry:
 		alloc_err = rq->xsk_pool ? mlx5e_xsk_alloc_rx_mpwqe(rq, head) :
 					   mlx5e_alloc_rx_mpwqe(rq, head);
+		if (unlikely(alloc_err)) {
+			int reclaim = i - 1;
 
-		if (unlikely(alloc_err))
-			break;
+			if (reclaimed || !reclaim)
+				break;
+
+			mlx5e_reclaim_mpwqe_pages(rq, head, reclaim);
+			reclaimed = true;
+
+			goto retry;
+		}
 		head = mlx5_wq_ll_get_wqe_next_ix(wq, head);
 	} while (--i);
 
-- 
2.31.1


  parent reply	other threads:[~2026-01-12 13:23 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-12 13:22 [PATCH net-next 0/3] net/mlx5e: RX datapath enhancements Tariq Toukan
2026-01-12 13:22 ` [PATCH net-next 1/3] net/mlx5e: RX, Drop oversized packets in non-linear mode Tariq Toukan
2026-01-15 14:41   ` Paolo Abeni
2026-01-12 13:22 ` Tariq Toukan [this message]
2026-01-12 13:22 ` [PATCH net-next 3/3] net/mlx5e: SHAMPO, Switch to header memcpy Tariq Toukan
2026-01-15 14:46   ` Paolo Abeni
2026-01-15 13:57 ` [PATCH net-next 0/3] net/mlx5e: RX datapath enhancements Simon Horman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1768224129-1600265-3-git-send-email-tariqt@nvidia.com \
    --to=tariqt@nvidia.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=cratiu@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=dtatulea@nvidia.com \
    --cc=edumazet@google.com \
    --cc=gal@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mbloch@nvidia.com \
    --cc=moshe@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=saeedm@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox