* [PATCH net-next] net/mlx5: Use dma_wmb() for completion queue doorbell updates
@ 2026-04-02  5:52 lirongqing
From: lirongqing @ 2026-04-02  5:52 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Boris Pismenny, Richard Cochran, Cosmin Ratiu, Dragos Tatulea,
	Carolina Jubran, Li RongQing, Kees Cook, Akiva Goldberger,
	Simon Horman, netdev, linux-rdma, linux-kernel, bpf

From: Li RongQing <lirongqing@baidu.com>

dma_wmb() barriers are specifically for ordering writes to DMA
coherent memory that is accessible to both the CPU and DMA capable
devices.

The dma_wmb() barrier is lighter than wmb() on some architectures
because it only ensures ordering for DMA writes, not for all writes
including MMIO accesses.

In the MLX5 driver, completion queue (CQ) doorbell records are
allocated as DMA coherent memory via mlx5_dma_zalloc_coherent_node().
The CQ update pattern is:
  1. Update CQ space (device reads via DMA)
  2. Update doorbell record (device reads via DMA)
  3. Memory barrier
  4. Enable more CQEs

Since only DMA coherent memory accesses are involved (no MMIO accesses
follow), it is safe to use dma_wmb() instead of wmb().
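
For illustration only, the four steps above can be modelled in userspace
C. This is a sketch under stated assumptions, not the driver code:
struct model_cq, model_poll_cq() and model_dma_wmb() are hypothetical
names, and a C11 release fence stands in for dma_wmb(). The fence orders
stores only as seen by other CPU threads, so the sketch shows where the
barrier sits in the sequence rather than real device-visible ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's dma_wmb(). A C11 release
 * fence orders earlier stores before later ones for other CPU threads only;
 * it does not model visibility to a DMA-capable device.
 */
#define model_dma_wmb() atomic_thread_fence(memory_order_release)

/* Hypothetical, simplified state; these are not the mlx5 structures. */
struct model_cq {
	uint32_t cq_cc;             /* CQ consumer counter */
	_Atomic uint32_t db_record; /* stands in for the doorbell record */
	uint32_t sq_cc;             /* SQ consumer counter, published last */
};

/* Consume `completions` CQEs and return the new SQ consumer counter. */
static uint32_t model_poll_cq(struct model_cq *cq, uint32_t completions)
{
	assert(cq);

	/* 1. Update CQ space: account for the CQEs just consumed. */
	cq->cq_cc += completions;

	/* 2. Update the doorbell record with the new consumer counter. */
	atomic_store_explicit(&cq->db_record, cq->cq_cc, memory_order_relaxed);

	/* 3. Barrier: the stores above are ordered before the store below. */
	model_dma_wmb();

	/* 4. Enable more CQEs by releasing the matching SQ entries. */
	cq->sq_cc += completions;

	return cq->sq_cc;
}
```

Calling model_poll_cq() on a zero-initialised struct model_cq with 3 and
then 2 completions leaves both db_record and sq_cc at 5.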

This change improves performance slightly on architectures where
dma_wmb() is lighter than wmb().

Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c    | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c     | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c     | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c   | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/wc.c        | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
index 1b76647..7bd6dfc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
@@ -259,7 +259,7 @@ static bool mlx5e_ptp_poll_ts_cq(struct mlx5e_cq *cq, int napi_budget)
 	mlx5_cqwq_update_db_record(cqwq);
 
 	/* ensure cq space is freed before enabling more cqes */
-	wmb();
+	dma_wmb();
 
 	while (metadata_buff_sz > 0)
 		mlx5e_ptp_metadata_fifo_push(&ptpsq->metadata_freelist,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 80f9fc1..dde8856 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -805,7 +805,7 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)
 	mlx5_cqwq_update_db_record(&cq->wq);
 
 	/* ensure cq space is freed before enabling more cqes */
-	wmb();
+	dma_wmb();
 
 	sq->cc = sqcc;
 	return (i == MLX5E_TX_CQ_POLL_BUDGET);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 268e208..f17e7f1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2447,7 +2447,7 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
 	mlx5_cqwq_update_db_record(cqwq);
 
 	/* ensure cq space is freed before enabling more cqes */
-	wmb();
+	dma_wmb();
 
 	return work_done;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 9f02726..7ba319f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -849,7 +849,7 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
 	mlx5_cqwq_update_db_record(&cq->wq);
 
 	/* ensure cq space is freed before enabling more cqes */
-	wmb();
+	dma_wmb();
 
 	sq->dma_fifo_cc = dma_fifo_cc;
 	sq->cc = sqcc;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
index 1f6bde5..1341874 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
@@ -384,7 +384,7 @@ static inline void mlx5_fpga_conn_cqes(struct mlx5_fpga_conn *conn,
 
 	mlx5_fpga_dbg(conn->fdev, "Re-arming CQ with cc# %u\n", conn->cq.wq.cc);
 	/* ensure cq space is freed before enabling more cqes */
-	wmb();
+	dma_wmb();
 	mlx5_fpga_conn_arm_cq(conn);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c
index 614cd57..8f7a89a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c
@@ -421,7 +421,7 @@ int mlx5_aso_poll_cq(struct mlx5_aso *aso, bool with_data)
 	mlx5_cqwq_update_db_record(&cq->wq);
 
 	/* ensure cq space is freed before enabling more cqes */
-	wmb();
+	dma_wmb();
 
 	if (with_data)
 		aso->cc += MLX5_ASO_WQEBBS_DATA;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wc.c b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
index 7d3d4d7..1afbdd19 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/wc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
@@ -314,7 +314,7 @@ static void mlx5_wc_post_nop(struct mlx5_wc_sq *sq, unsigned int *offset,
 	/* ensure doorbell record is visible to device before ringing the
 	 * doorbell
 	 */
-	wmb();
+	dma_wmb();
 
 	mlx5_iowrite64_copy(sq, mmio_wqe, sizeof(mmio_wqe), *offset);
 
-- 
2.9.4



* Re: [PATCH net-next] net/mlx5: Use dma_wmb() for completion queue doorbell updates
From: Tariq Toukan @ 2026-04-09 20:46 UTC (permalink / raw)
  To: lirongqing, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Boris Pismenny, Richard Cochran, Cosmin Ratiu, Dragos Tatulea,
	Carolina Jubran, Kees Cook, Akiva Goldberger, Simon Horman,
	netdev, linux-rdma, linux-kernel, bpf



On 02/04/2026 8:52, lirongqing wrote:
> From: Li RongQing <lirongqing@baidu.com>
> 
> dma_wmb() barriers are specifically for ordering writes to DMA
> coherent memory that is accessible to both the CPU and DMA capable
> devices.
> 
> The dma_wmb() barrier is lighter than wmb() on some architectures
> because it only ensures ordering for DMA writes, not for all writes
> including MMIO accesses.
> 
> In the MLX5 driver, completion queue (CQ) doorbell records are
> allocated as DMA coherent memory via mlx5_dma_zalloc_coherent_node().
> The CQ update pattern is:
>    1. Update CQ space (device reads via DMA)
>    2. Update doorbell record (device reads via DMA)
>    3. Memory barrier
>    4. Enable more CQEs
> 
> Since only DMA coherent memory accesses are involved (no MMIO accesses
> follow), it is safe to use dma_wmb() instead of wmb().
> 
> This change improves performance slightly on architectures where
> dma_wmb() is lighter than wmb().
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---

Hi,

Sorry for the delay.
Thanks for your patch.

The idea looks valid.
This is the kind of patch that should go through intensive testing
before acceptance. I'm picking it up for internal testing and will update.

PS: I know you have one more patch [1] pending testing. It looks good so
far; I'll verify and send an update soon.

Regards,
Tariq

[1] 
https://patchwork.kernel.org/project/netdevbpf/patch/20260317003544.2583-1-lirongqing@baidu.com/

>   drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c    | 2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c    | 2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_rx.c     | 2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/en_tx.c     | 2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c | 2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c   | 2 +-
>   drivers/net/ethernet/mellanox/mlx5/core/wc.c        | 2 +-
>   7 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> index 1b76647..7bd6dfc 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> @@ -259,7 +259,7 @@ static bool mlx5e_ptp_poll_ts_cq(struct mlx5e_cq *cq, int napi_budget)
>   	mlx5_cqwq_update_db_record(cqwq);
>   
>   	/* ensure cq space is freed before enabling more cqes */
> -	wmb();
> +	dma_wmb();
>   
>   	while (metadata_buff_sz > 0)
>   		mlx5e_ptp_metadata_fifo_push(&ptpsq->metadata_freelist,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index 80f9fc1..dde8856 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -805,7 +805,7 @@ bool mlx5e_poll_xdpsq_cq(struct mlx5e_cq *cq)
>   	mlx5_cqwq_update_db_record(&cq->wq);
>   
>   	/* ensure cq space is freed before enabling more cqes */
> -	wmb();
> +	dma_wmb();
>   
>   	sq->cc = sqcc;
>   	return (i == MLX5E_TX_CQ_POLL_BUDGET);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index 268e208..f17e7f1 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -2447,7 +2447,7 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
>   	mlx5_cqwq_update_db_record(cqwq);
>   
>   	/* ensure cq space is freed before enabling more cqes */
> -	wmb();
> +	dma_wmb();
>   
>   	return work_done;
>   }
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> index 9f02726..7ba319f 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
> @@ -849,7 +849,7 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
>   	mlx5_cqwq_update_db_record(&cq->wq);
>   
>   	/* ensure cq space is freed before enabling more cqes */
> -	wmb();
> +	dma_wmb();
>   
>   	sq->dma_fifo_cc = dma_fifo_cc;
>   	sq->cc = sqcc;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
> index 1f6bde5..1341874 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/conn.c
> @@ -384,7 +384,7 @@ static inline void mlx5_fpga_conn_cqes(struct mlx5_fpga_conn *conn,
>   
>   	mlx5_fpga_dbg(conn->fdev, "Re-arming CQ with cc# %u\n", conn->cq.wq.cc);
>   	/* ensure cq space is freed before enabling more cqes */
> -	wmb();
> +	dma_wmb();
>   	mlx5_fpga_conn_arm_cq(conn);
>   }
>   
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c
> index 614cd57..8f7a89a 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c
> @@ -421,7 +421,7 @@ int mlx5_aso_poll_cq(struct mlx5_aso *aso, bool with_data)
>   	mlx5_cqwq_update_db_record(&cq->wq);
>   
>   	/* ensure cq space is freed before enabling more cqes */
> -	wmb();
> +	dma_wmb();
>   
>   	if (with_data)
>   		aso->cc += MLX5_ASO_WQEBBS_DATA;
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wc.c b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
> index 7d3d4d7..1afbdd19 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/wc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
> @@ -314,7 +314,7 @@ static void mlx5_wc_post_nop(struct mlx5_wc_sq *sq, unsigned int *offset,
>   	/* ensure doorbell record is visible to device before ringing the
>   	 * doorbell
>   	 */
> -	wmb();
> +	dma_wmb();
>   
>   	mlx5_iowrite64_copy(sq, mmio_wqe, sizeof(mmio_wqe), *offset);
>   

