[PATCH net-next 2/6] net/mlx5e: Use regular ICOSQ for triggering NAPI

public inbox for linux-kernel@vger.kernel.org

From: Tariq Toukan <tariqt@nvidia.com>
To: Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	<netdev@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <bpf@vger.kernel.org>,
	Gal Pressman <gal@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Moshe Shemesh <moshe@nvidia.com>, William Tu <witu@nvidia.com>,
	Dragos Tatulea <dtatulea@nvidia.com>,
	Nimrod Oren <noren@nvidia.com>, Alex Lazar <alazar@nvidia.com>
Subject: [PATCH net-next 2/6] net/mlx5e: Use regular ICOSQ for triggering NAPI
Date: Wed, 12 Nov 2025 11:29:05 +0200
Message-ID: <1762939749-1165658-3-git-send-email-tariqt@nvidia.com>
In-Reply-To: <1762939749-1165658-1-git-send-email-tariqt@nvidia.com>

From: William Tu <witu@nvidia.com>

Before the cited commit, the ICOSQ was used to post a NOP WQE to
trigger a hardware interrupt and start NAPI, but this mechanism
suffered from a race condition: mlx5e_alloc_rx_mpwqe() may post UMR
WQEs to the ICOSQ _before_ the NOP WQE is posted. The cited commit
fixed the issue by replacing the ICOSQ with the async ICOSQ as the
queue used to post the NOP WQE that triggers the hardware interrupt
and NAPI.

This patch changes it back, replacing the async ICOSQ with the
regular ICOSQ so that later patches can save memory, and solves the
race by adding a new SQ state, MLX5E_SQ_STATE_LOCK_NEEDED, for
synchronizing the start of NAPI.

What it does:
- Switch the trigger path from the async ICOSQ to the regular ICOSQ
  to reduce the need for the async SQ.
- Introduce MLX5E_SQ_STATE_LOCK_NEEDED and the helpers
  mlx5e_icosq_sync_lock() and mlx5e_icosq_sync_unlock() to prevent
  the race where UMR WQEs could be posted before the NOP WQE used to
  trigger NAPI.
- Call synchronize_net() once per trigger cycle to quiesce in-flight
  softirqs before serializing the NOP WQE and any UMR postings via
  the ICOSQ lock.
- Wrap ICOSQ UMR posting in en_rx.c and xsk/rx.c with the new
  conditional lock.
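
For readers unfamiliar with the pattern, the conditional locking
described above can be sketched in user-space C roughly as follows.
This is only an illustration under stated assumptions, not the driver
code: every demo_* name is hypothetical, a C11 atomic_flag spinlock
stands in for sq->lock, an atomic_bool stands in for the
MLX5E_SQ_STATE_LOCK_NEEDED bit, and the synchronize_net() step and
the in_softirq() distinction have no equivalent here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mlx5e_icosq. */
struct demo_icosq {
	atomic_bool lock_needed; /* role of MLX5E_SQ_STATE_LOCK_NEEDED */
	atomic_flag lock;        /* role of sq->lock */
	unsigned int pc;         /* producer counter */
};

static void demo_icosq_init(struct demo_icosq *sq)
{
	atomic_init(&sq->lock_needed, false);
	atomic_flag_clear(&sq->lock);
	sq->pc = 0;
}

/* Take the lock only while a trigger cycle is in progress. */
static bool demo_sync_lock(struct demo_icosq *sq)
{
	if (!atomic_load(&sq->lock_needed))
		return false;

	while (atomic_flag_test_and_set_explicit(&sq->lock,
						 memory_order_acquire))
		; /* spin */
	return true;
}

static void demo_sync_unlock(struct demo_icosq *sq, bool locked)
{
	if (locked)
		atomic_flag_clear_explicit(&sq->lock, memory_order_release);
}

/* Fast path (UMR posting); returns whether the lock was taken. */
static bool demo_post_umr(struct demo_icosq *sq, unsigned int wqebbs)
{
	bool locked = demo_sync_lock(sq);

	sq->pc += wqebbs;
	demo_sync_unlock(sq, locked);
	return locked;
}

/* Slow path (NAPI trigger): flag the SQ, post one NOP, clear the flag. */
static void demo_trigger_napi(struct demo_icosq *sq)
{
	bool locked;

	if (!atomic_exchange(&sq->lock_needed, true)) {
		/*
		 * The patch calls synchronize_net() at this point so that
		 * in-flight softirqs observe the flag. Not modeled here.
		 */
	}

	locked = demo_sync_lock(sq);
	assert(locked); /* the flag was set just above */
	sq->pc += 1;    /* stands in for the NOP WQE */
	demo_sync_unlock(sq, locked);

	atomic_store(&sq->lock_needed, false);
}
```

The intent of the design, as the sketch tries to show, is that the
fast path pays only for a flag test when no trigger is pending and
takes the lock only during a trigger cycle.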

Signed-off-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  | 40 ++++++++++++++++++-
 .../mellanox/mlx5/core/en/reporter_tx.c       |  1 +
 .../ethernet/mellanox/mlx5/core/en/xsk/rx.c   |  3 ++
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 14 +++++--
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  3 ++
 5 files changed, 56 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 70bc878bd2c2..9ee07fa19896 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -388,6 +388,7 @@ enum {
 	MLX5E_SQ_STATE_DIM,
 	MLX5E_SQ_STATE_PENDING_XSK_TX,
 	MLX5E_SQ_STATE_PENDING_TLS_RX_RESYNC,
+	MLX5E_SQ_STATE_LOCK_NEEDED,
 	MLX5E_NUM_SQ_STATES, /* Must be kept last */
 };
 
@@ -751,7 +752,7 @@ struct mlx5e_rq {
 
 enum mlx5e_channel_state {
 	MLX5E_CHANNEL_STATE_XSK,
-	MLX5E_CHANNEL_NUM_STATES
+	MLX5E_CHANNEL_NUM_STATES, /* Must be kept last */
 };
 
 struct mlx5e_channel {
@@ -801,6 +802,43 @@ struct mlx5e_channel {
 	struct dim_cq_moder        tx_cq_moder;
 };
 
+enum mlx5e_lock_type {
+	MLX5E_LOCK_TYPE_NONE,
+	MLX5E_LOCK_TYPE_SOFTIRQ,
+	MLX5E_LOCK_TYPE_BH,
+};
+
+static inline enum mlx5e_lock_type
+mlx5e_icosq_sync_lock(struct mlx5e_icosq *sq)
+{
+	if (!test_bit(MLX5E_SQ_STATE_LOCK_NEEDED, &sq->state))
+		return MLX5E_LOCK_TYPE_NONE;
+
+	if (in_softirq()) {
+		spin_lock(&sq->lock);
+		return MLX5E_LOCK_TYPE_SOFTIRQ;
+	}
+
+	spin_lock_bh(&sq->lock);
+	return MLX5E_LOCK_TYPE_BH;
+}
+
+static inline void mlx5e_icosq_sync_unlock(struct mlx5e_icosq *sq,
+					   enum mlx5e_lock_type lock_type)
+{
+	switch (lock_type) {
+	case MLX5E_LOCK_TYPE_SOFTIRQ:
+		spin_unlock(&sq->lock);
+		break;
+	case MLX5E_LOCK_TYPE_BH:
+		spin_unlock_bh(&sq->lock);
+		break;
+	case MLX5E_LOCK_TYPE_NONE:
+	default:
+		break;
+	}
+}
+
 struct mlx5e_ptp;
 
 struct mlx5e_channels {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 9e2cf191ed30..4adc1adf9897 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -15,6 +15,7 @@ static const char * const sq_sw_state_type_name[] = {
 	[MLX5E_SQ_STATE_DIM] = "dim",
 	[MLX5E_SQ_STATE_PENDING_XSK_TX] = "pending_xsk_tx",
 	[MLX5E_SQ_STATE_PENDING_TLS_RX_RESYNC] = "pending_tls_rx_resync",
+	[MLX5E_SQ_STATE_LOCK_NEEDED] = "lock_needed",
 };
 
 static int mlx5e_wait_for_sq_flush(struct mlx5e_txqsq *sq)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index 2b05536d564a..a96fd7f65485 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -21,6 +21,7 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	struct mlx5e_mpw_info *wi = mlx5e_get_mpw_info(rq, ix);
 	struct mlx5e_icosq *icosq = rq->icosq;
 	struct mlx5_wq_cyc *wq = &icosq->wq;
+	enum mlx5e_lock_type sync_locked;
 	struct mlx5e_umr_wqe *umr_wqe;
 	struct xdp_buff **xsk_buffs;
 	int batch, i;
@@ -47,6 +48,7 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 			goto err_reuse_batch;
 	}
 
+	sync_locked = mlx5e_icosq_sync_lock(icosq);
 	pi = mlx5e_icosq_get_next_pi(icosq, rq->mpwqe.umr_wqebbs);
 	umr_wqe = mlx5_wq_cyc_get_wqe(wq, pi);
 	memcpy(umr_wqe, &rq->mpwqe.umr_wqe, sizeof(struct mlx5e_umr_wqe));
@@ -143,6 +145,7 @@ int mlx5e_xsk_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	};
 
 	icosq->pc += rq->mpwqe.umr_wqebbs;
+	mlx5e_icosq_sync_unlock(icosq, sync_locked);
 
 	icosq->doorbell_cseg = &umr_wqe->hdr.ctrl;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 590707dc6f0e..80fb09d902f5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2751,11 +2751,17 @@ static int mlx5e_channel_stats_alloc(struct mlx5e_priv *priv, int ix, int cpu)
 
 void mlx5e_trigger_napi_icosq(struct mlx5e_channel *c)
 {
-	struct mlx5e_icosq *async_icosq = &c->async_icosq;
+	enum mlx5e_lock_type locked_type;
 
-	spin_lock_bh(&async_icosq->lock);
-	mlx5e_trigger_irq(async_icosq);
-	spin_unlock_bh(&async_icosq->lock);
+	if (!test_and_set_bit(MLX5E_SQ_STATE_LOCK_NEEDED, &c->icosq.state))
+		synchronize_net();
+
+	locked_type = mlx5e_icosq_sync_lock(&c->icosq);
+	mlx5e_trigger_irq(&c->icosq);
+	if (locked_type != MLX5E_LOCK_TYPE_NONE)
+		mlx5e_icosq_sync_unlock(&c->icosq, locked_type);
+
+	clear_bit(MLX5E_SQ_STATE_LOCK_NEEDED, &c->icosq.state);
 }
 
 void mlx5e_trigger_napi_sched(struct napi_struct *napi)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 1f6930c77437..b54844d80922 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -776,6 +776,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	struct mlx5e_icosq *sq = rq->icosq;
 	struct mlx5e_frag_page *frag_page;
 	struct mlx5_wq_cyc *wq = &sq->wq;
+	enum mlx5e_lock_type sync_locked;
 	struct mlx5e_umr_wqe *umr_wqe;
 	u32 offset; /* 17-bit value with MTT. */
 	u16 pi;
@@ -788,6 +789,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 			goto err;
 	}
 
+	sync_locked = mlx5e_icosq_sync_lock(sq);
 	pi = mlx5e_icosq_get_next_pi(sq, rq->mpwqe.umr_wqebbs);
 	umr_wqe = mlx5_wq_cyc_get_wqe(wq, pi);
 	memcpy(umr_wqe, &rq->mpwqe.umr_wqe, sizeof(struct mlx5e_umr_wqe));
@@ -835,6 +837,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	};
 
 	sq->pc += rq->mpwqe.umr_wqebbs;
+	mlx5e_icosq_sync_unlock(sq, sync_locked);
 
 	sq->doorbell_cseg = &umr_wqe->hdr.ctrl;
 
-- 
2.31.1
