public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net V2 0/6] mlx5 misc fixes 2026-02-18
@ 2026-02-18  7:28 Tariq Toukan
  2026-02-18  7:28 ` [PATCH net V2 1/6] net/mlx5: Fix multiport device check over light SFs Tariq Toukan
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Tariq Toukan @ 2026-02-18  7:28 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Moshe Shemesh,
	Jacob Keller

Hi,

This patchset provides misc bug fixes from the team to the mlx5
core and Eth drivers.

Thanks,
Tariq.

V2:
- Add review tags (Jacob).
- Use poll_timeout_us and variants (Jacob).
- Link to V1: https://lore.kernel.org/all/20260212103217.1752943-1-tariqt@nvidia.com/

Cosmin Ratiu (2):
  net/mlx5e: Fix deadlocks between devlink and netdev instance locks
  net/mlx5e: Use unsigned for mlx5e_get_max_num_channels

Gal Pressman (3):
  net/mlx5e: Fix misidentification of ASO CQE during poll loop
  net/mlx5: Fix misidentification of write combining CQE during poll
    loop
  net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event

Shay Drory (1):
  net/mlx5: Fix multiport device check over light SFs

 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  3 +-
 .../net/ethernet/mellanox/mlx5/core/en/ptp.c  | 14 -----
 .../mellanox/mlx5/core/en/reporter_rx.c       | 13 +++++
 .../mellanox/mlx5/core/en/reporter_tx.c       | 52 +++++++++++++++++--
 .../ethernet/mellanox/mlx5/core/en/tc/meter.c | 10 ++--
 .../mellanox/mlx5/core/en_accel/macsec.c      | 13 ++---
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 40 --------------
 drivers/net/ethernet/mellanox/mlx5/core/wc.c  | 14 ++---
 include/linux/mlx5/driver.h                   |  4 +-
 9 files changed, 78 insertions(+), 85 deletions(-)


base-commit: ccd8e87748ad083047d6c8544c5809b7f96cc8df
-- 
2.44.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net V2 1/6] net/mlx5: Fix multiport device check over light SFs
  2026-02-18  7:28 [PATCH net V2 0/6] mlx5 misc fixes 2026-02-18 Tariq Toukan
@ 2026-02-18  7:28 ` Tariq Toukan
  2026-02-18  7:29 ` [PATCH net V2 2/6] net/mlx5e: Fix misidentification of ASO CQE during poll loop Tariq Toukan
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Tariq Toukan @ 2026-02-18  7:28 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Moshe Shemesh,
	Jacob Keller, Shay Drory

From: Shay Drory <shayd@nvidia.com>

The driver uses the num_vhca_ports capability to distinguish between
a multiport master device and a multiport slave device. num_vhca_ports
is a capability the driver sets according to the MAX num_vhca_ports
capability reported by FW. Light SFs, on the other hand, do not set
this capability.

This leads to wrong results whenever a light SF checks whether it is
a multiport master or slave.

Therefore, use the MAX capability to distinguish between master and
slave devices.

Fixes: e71383fb9cd1 ("net/mlx5: Light probe local SFs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
---
 include/linux/mlx5/driver.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index e2d067b1e67b..04dcd09f7517 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1282,12 +1282,12 @@ static inline bool mlx5_rl_is_supported(struct mlx5_core_dev *dev)
 static inline int mlx5_core_is_mp_slave(struct mlx5_core_dev *dev)
 {
 	return MLX5_CAP_GEN(dev, affiliate_nic_vport_criteria) &&
-	       MLX5_CAP_GEN(dev, num_vhca_ports) <= 1;
+	       MLX5_CAP_GEN_MAX(dev, num_vhca_ports) <= 1;
 }
 
 static inline int mlx5_core_is_mp_master(struct mlx5_core_dev *dev)
 {
-	return MLX5_CAP_GEN(dev, num_vhca_ports) > 1;
+	return MLX5_CAP_GEN_MAX(dev, num_vhca_ports) > 1;
 }
 
 static inline int mlx5_core_mp_enabled(struct mlx5_core_dev *dev)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net V2 2/6] net/mlx5e: Fix misidentification of ASO CQE during poll loop
  2026-02-18  7:28 [PATCH net V2 0/6] mlx5 misc fixes 2026-02-18 Tariq Toukan
  2026-02-18  7:28 ` [PATCH net V2 1/6] net/mlx5: Fix multiport device check over light SFs Tariq Toukan
@ 2026-02-18  7:29 ` Tariq Toukan
  2026-02-18  7:29 ` [PATCH net V2 3/6] net/mlx5: Fix misidentification of write combining " Tariq Toukan
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Tariq Toukan @ 2026-02-18  7:29 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Moshe Shemesh,
	Jacob Keller, Jianbo Liu

From: Gal Pressman <gal@nvidia.com>

The ASO completion poll loop uses usleep_range(), which can sleep much
longer than requested due to scheduler latency. Under load, we
witnessed a 20ms+ delay until the process was rescheduled, causing the
jiffies-based timeout to expire while the thread was sleeping.

The original do-while loop structure (poll, sleep, check timeout) would
exit without a final poll when waking after timeout, missing a CQE that
arrived during sleep.

Instead of the open-coded do-while loop, use the kernel's
read_poll_timeout(), which always performs an additional check after
the sleep expiration and is less error-prone.

Note: read_poll_timeout() doesn't accept a sleep range; by passing a
sleep_us of 10, the sleep range effectively changes from 2-10 to 3-10
usecs.

Fixes: 739cfa34518e ("net/mlx5: Make ASO poll CQ usable in atomic context")
Fixes: 7e3fce82d945 ("net/mlx5e: Overcome slow response for first macsec ASO WQE")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/tc/meter.c  | 10 +++-------
 .../net/ethernet/mellanox/mlx5/core/en_accel/macsec.c  | 10 +++-------
 2 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/meter.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/meter.c
index 7819fb297280..d5d9146efca6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/meter.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/meter.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
 // Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
+#include <linux/iopoll.h>
 #include <linux/math64.h>
 #include "lib/aso.h"
 #include "en/tc/post_act.h"
@@ -115,7 +116,6 @@ mlx5e_tc_meter_modify(struct mlx5_core_dev *mdev,
 	struct mlx5e_flow_meters *flow_meters;
 	u8 cir_man, cir_exp, cbs_man, cbs_exp;
 	struct mlx5_aso_wqe *aso_wqe;
-	unsigned long expires;
 	struct mlx5_aso *aso;
 	u64 rate, burst;
 	u8 ds_cnt;
@@ -187,12 +187,8 @@ mlx5e_tc_meter_modify(struct mlx5_core_dev *mdev,
 	mlx5_aso_post_wqe(aso, true, &aso_wqe->ctrl);
 
 	/* With newer FW, the wait for the first ASO WQE is more than 2us, put the wait 10ms. */
-	expires = jiffies + msecs_to_jiffies(10);
-	do {
-		err = mlx5_aso_poll_cq(aso, true);
-		if (err)
-			usleep_range(2, 10);
-	} while (err && time_is_after_jiffies(expires));
+	read_poll_timeout(mlx5_aso_poll_cq, err, !err, 10, 10 * USEC_PER_MSEC,
+			  false, aso, true);
 	mutex_unlock(&flow_meters->aso_lock);
 
 	return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
index 528b04d4de41..641cd3a2cdfa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
@@ -5,6 +5,7 @@
 #include <linux/mlx5/mlx5_ifc.h>
 #include <linux/xarray.h>
 #include <linux/if_vlan.h>
+#include <linux/iopoll.h>
 
 #include "en.h"
 #include "lib/aso.h"
@@ -1397,7 +1398,6 @@ static int macsec_aso_query(struct mlx5_core_dev *mdev, struct mlx5e_macsec *mac
 	struct mlx5e_macsec_aso *aso;
 	struct mlx5_aso_wqe *aso_wqe;
 	struct mlx5_aso *maso;
-	unsigned long expires;
 	int err;
 
 	aso = &macsec->aso;
@@ -1411,12 +1411,8 @@ static int macsec_aso_query(struct mlx5_core_dev *mdev, struct mlx5e_macsec *mac
 	macsec_aso_build_wqe_ctrl_seg(aso, &aso_wqe->aso_ctrl, NULL);
 
 	mlx5_aso_post_wqe(maso, false, &aso_wqe->ctrl);
-	expires = jiffies + msecs_to_jiffies(10);
-	do {
-		err = mlx5_aso_poll_cq(maso, false);
-		if (err)
-			usleep_range(2, 10);
-	} while (err && time_is_after_jiffies(expires));
+	read_poll_timeout(mlx5_aso_poll_cq, err, !err, 10, 10 * USEC_PER_MSEC,
+			  false, maso, false);
 
 	if (err)
 		goto err_out;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net V2 3/6] net/mlx5: Fix misidentification of write combining CQE during poll loop
  2026-02-18  7:28 [PATCH net V2 0/6] mlx5 misc fixes 2026-02-18 Tariq Toukan
  2026-02-18  7:28 ` [PATCH net V2 1/6] net/mlx5: Fix multiport device check over light SFs Tariq Toukan
  2026-02-18  7:29 ` [PATCH net V2 2/6] net/mlx5e: Fix misidentification of ASO CQE during poll loop Tariq Toukan
@ 2026-02-18  7:29 ` Tariq Toukan
  2026-02-18  7:29 ` [PATCH net V2 4/6] net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event Tariq Toukan
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Tariq Toukan @ 2026-02-18  7:29 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Moshe Shemesh,
	Jacob Keller, Jianbo Liu

From: Gal Pressman <gal@nvidia.com>

The write combining completion poll loop uses usleep_range(), which
can sleep much longer than requested due to scheduler latency. Under
load, we witnessed a 20ms+ delay until the process was rescheduled,
causing the jiffies-based timeout to expire while the thread was
sleeping.

The original do-while loop structure (poll, sleep, check timeout) would
exit without a final poll when waking after timeout, missing a CQE that
arrived during sleep.

Instead of the open-coded do-while loop, use the kernel's
poll_timeout_us(), which always performs an additional check after the
sleep expiration and is less error-prone.

Note: poll_timeout_us() doesn't accept a sleep range; by passing a
sleep_us of 10, the sleep range effectively changes from 2-10 to 3-10
usecs.

Fixes: d98995b4bf98 ("net/mlx5: Reimplement write combining test")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/wc.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wc.c b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
index 815a7c97d6b0..04d03be1bb77 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/wc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/wc.c
@@ -2,6 +2,7 @@
 // Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
 #include <linux/io.h>
+#include <linux/iopoll.h>
 #include <linux/mlx5/transobj.h>
 #include "lib/clock.h"
 #include "mlx5_core.h"
@@ -15,7 +16,7 @@
 #define TEST_WC_NUM_WQES 255
 #define TEST_WC_LOG_CQ_SZ (order_base_2(TEST_WC_NUM_WQES))
 #define TEST_WC_SQ_LOG_WQ_SZ TEST_WC_LOG_CQ_SZ
-#define TEST_WC_POLLING_MAX_TIME_JIFFIES msecs_to_jiffies(100)
+#define TEST_WC_POLLING_MAX_TIME_USEC (100 * USEC_PER_MSEC)
 
 struct mlx5_wc_cq {
 	/* data path - accessed per cqe */
@@ -359,7 +360,6 @@ static int mlx5_wc_poll_cq(struct mlx5_wc_sq *sq)
 static void mlx5_core_test_wc(struct mlx5_core_dev *mdev)
 {
 	unsigned int offset = 0;
-	unsigned long expires;
 	struct mlx5_wc_sq *sq;
 	int i, err;
 
@@ -389,13 +389,9 @@ static void mlx5_core_test_wc(struct mlx5_core_dev *mdev)
 
 	mlx5_wc_post_nop(sq, &offset, true);
 
-	expires = jiffies + TEST_WC_POLLING_MAX_TIME_JIFFIES;
-	do {
-		err = mlx5_wc_poll_cq(sq);
-		if (err)
-			usleep_range(2, 10);
-	} while (mdev->wc_state == MLX5_WC_STATE_UNINITIALIZED &&
-		 time_is_after_jiffies(expires));
+	poll_timeout_us(mlx5_wc_poll_cq(sq),
+			mdev->wc_state != MLX5_WC_STATE_UNINITIALIZED, 10,
+			TEST_WC_POLLING_MAX_TIME_USEC, false);
 
 	mlx5_wc_destroy_sq(sq);
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net V2 4/6] net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event
  2026-02-18  7:28 [PATCH net V2 0/6] mlx5 misc fixes 2026-02-18 Tariq Toukan
                   ` (2 preceding siblings ...)
  2026-02-18  7:29 ` [PATCH net V2 3/6] net/mlx5: Fix misidentification of write combining " Tariq Toukan
@ 2026-02-18  7:29 ` Tariq Toukan
  2026-02-18  7:29 ` [PATCH net V2 5/6] net/mlx5e: Fix deadlocks between devlink and netdev instance locks Tariq Toukan
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Tariq Toukan @ 2026-02-18  7:29 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Moshe Shemesh,
	Jacob Keller, Jianbo Liu

From: Gal Pressman <gal@nvidia.com>

The macsec_aso_set_arm_event function calls mlx5_aso_poll_cq once
without a retry loop. If the CQE is not immediately available after
posting the WQE, the function fails unnecessarily.

Use read_poll_timeout() to poll for the CQE, sleeping 3-10 usecs
between attempts, consistent with the other ASO polling code paths in
the driver.

Fixes: 739cfa34518e ("net/mlx5: Make ASO poll CQ usable in atomic context")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
index 641cd3a2cdfa..90b3bc5f9166 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
@@ -1386,7 +1386,8 @@ static int macsec_aso_set_arm_event(struct mlx5_core_dev *mdev, struct mlx5e_mac
 			   MLX5_ACCESS_ASO_OPC_MOD_MACSEC);
 	macsec_aso_build_ctrl(aso, &aso_wqe->aso_ctrl, in);
 	mlx5_aso_post_wqe(maso, false, &aso_wqe->ctrl);
-	err = mlx5_aso_poll_cq(maso, false);
+	read_poll_timeout(mlx5_aso_poll_cq, err, !err, 10, 10 * USEC_PER_MSEC,
+			  false, maso, false);
 	mutex_unlock(&aso->aso_lock);
 
 	return err;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net V2 5/6] net/mlx5e: Fix deadlocks between devlink and netdev instance locks
  2026-02-18  7:28 [PATCH net V2 0/6] mlx5 misc fixes 2026-02-18 Tariq Toukan
                   ` (3 preceding siblings ...)
  2026-02-18  7:29 ` [PATCH net V2 4/6] net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event Tariq Toukan
@ 2026-02-18  7:29 ` Tariq Toukan
  2026-03-05  2:33   ` Jinjie Ruan
  2026-02-18  7:29 ` [PATCH net V2 6/6] net/mlx5e: Use unsigned for mlx5e_get_max_num_channels Tariq Toukan
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 12+ messages in thread
From: Tariq Toukan @ 2026-02-18  7:29 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Moshe Shemesh,
	Jacob Keller, Cosmin Ratiu, Dragos Tatulea

From: Cosmin Ratiu <cratiu@nvidia.com>

In the mentioned "Fixes" commit, various work tasks triggering devlink
health reporter recovery were switched to use netdev_trylock to protect
against concurrent teardown of the channels being recovered. But this
had the side effect of introducing potential deadlocks because of
incorrect lock ordering.
The correct lock order is described by the init flow:
probe_one -> mlx5_init_one (acquires devlink lock)
-> mlx5_init_one_devl_locked -> mlx5_register_device
-> mlx5_rescan_drivers_locked -...-> mlx5e_probe -> _mlx5e_probe
-> register_netdev (acquires rtnl lock)
-> register_netdevice (acquires netdev lock)
=> devlink lock -> rtnl lock -> netdev lock.

But in the current recovery flow, the order is wrong:
mlx5e_tx_err_cqe_work (acquires netdev lock)
-> mlx5e_reporter_tx_err_cqe -> mlx5e_health_report
-> devlink_health_report (acquires devlink lock => boom!)
-> devlink_health_reporter_recover
-> mlx5e_tx_reporter_recover -> mlx5e_tx_reporter_recover_from_ctx
-> mlx5e_tx_reporter_err_cqe_recover

The same pattern exists in:
mlx5e_reporter_rx_timeout
mlx5e_reporter_tx_ptpsq_unhealthy
mlx5e_reporter_tx_timeout

Fix these by moving the netdev_trylock calls from the work handlers
lower in the call stack, into the respective recovery functions, where
they are actually necessary.

Fixes: 8f7b00307bf1 ("net/mlx5e: Convert mlx5 netdevs to instance locking")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/ptp.c  | 14 -----
 .../mellanox/mlx5/core/en/reporter_rx.c       | 13 +++++
 .../mellanox/mlx5/core/en/reporter_tx.c       | 52 +++++++++++++++++--
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 40 --------------
 4 files changed, 61 insertions(+), 58 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
index 424f8a2728a3..74660e7fe674 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
@@ -457,22 +457,8 @@ static void mlx5e_ptpsq_unhealthy_work(struct work_struct *work)
 {
 	struct mlx5e_ptpsq *ptpsq =
 		container_of(work, struct mlx5e_ptpsq, report_unhealthy_work);
-	struct mlx5e_txqsq *sq = &ptpsq->txqsq;
-
-	/* Recovering the PTP SQ means re-enabling NAPI, which requires the
-	 * netdev instance lock. However, SQ closing has to wait for this work
-	 * task to finish while also holding the same lock. So either get the
-	 * lock or find that the SQ is no longer enabled and thus this work is
-	 * not relevant anymore.
-	 */
-	while (!netdev_trylock(sq->netdev)) {
-		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
-			return;
-		msleep(20);
-	}
 
 	mlx5e_reporter_tx_ptpsq_unhealthy(ptpsq);
-	netdev_unlock(sq->netdev);
 }
 
 static int mlx5e_ptp_open_txqsq(struct mlx5e_ptp *c, u32 tisn,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
index 0686fbdd5a05..6efb626b5506 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
@@ -1,6 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2019 Mellanox Technologies.
 
+#include <net/netdev_lock.h>
+
 #include "health.h"
 #include "params.h"
 #include "txrx.h"
@@ -177,6 +179,16 @@ static int mlx5e_rx_reporter_timeout_recover(void *ctx)
 	rq = ctx;
 	priv = rq->priv;
 
+	/* Acquire netdev instance lock to synchronize with channel close and
+	 * reopen flows. Either successfully obtain the lock, or detect that
+	 * channels are closing for another reason, making this work no longer
+	 * necessary.
+	 */
+	while (!netdev_trylock(rq->netdev)) {
+		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &rq->priv->state))
+			return 0;
+		msleep(20);
+	}
 	mutex_lock(&priv->state_lock);
 
 	eq = rq->cq.mcq.eq;
@@ -186,6 +198,7 @@ static int mlx5e_rx_reporter_timeout_recover(void *ctx)
 		clear_bit(MLX5E_SQ_STATE_ENABLED, &rq->icosq->state);
 
 	mutex_unlock(&priv->state_lock);
+	netdev_unlock(rq->netdev);
 
 	return err;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 4adc1adf9897..60ba840e00fa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -1,6 +1,8 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /* Copyright (c) 2019 Mellanox Technologies. */
 
+#include <net/netdev_lock.h>
+
 #include "health.h"
 #include "en/ptp.h"
 #include "en/devlink.h"
@@ -79,6 +81,18 @@ static int mlx5e_tx_reporter_err_cqe_recover(void *ctx)
 	if (!test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
 		return 0;
 
+	/* Recovering queues means re-enabling NAPI, which requires the netdev
+	 * instance lock. However, SQ closing flows have to wait for work tasks
+	 * to finish while also holding the netdev instance lock. So either get
+	 * the lock or find that the SQ is no longer enabled and thus this work
+	 * is not relevant anymore.
+	 */
+	while (!netdev_trylock(dev)) {
+		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
+			return 0;
+		msleep(20);
+	}
+
 	err = mlx5_core_query_sq_state(mdev, sq->sqn, &state);
 	if (err) {
 		netdev_err(dev, "Failed to query SQ 0x%x state. err = %d\n",
@@ -114,9 +128,11 @@ static int mlx5e_tx_reporter_err_cqe_recover(void *ctx)
 	else
 		mlx5e_trigger_napi_sched(sq->cq.napi);
 
+	netdev_unlock(dev);
 	return 0;
 out:
 	clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state);
+	netdev_unlock(dev);
 	return err;
 }
 
@@ -137,10 +153,24 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
 	sq = to_ctx->sq;
 	eq = sq->cq.mcq.eq;
 	priv = sq->priv;
+
+	/* Recovering the TX queues implies re-enabling NAPI, which requires
+	 * the netdev instance lock.
+	 * However, channel closing flows have to wait for this work to finish
+	 * while holding the same lock. So either get the lock or find that
+	 * channels are being closed for other reason and this work is not
+	 * relevant anymore.
+	 */
+	while (!netdev_trylock(sq->netdev)) {
+		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state))
+			return 0;
+		msleep(20);
+	}
+
 	err = mlx5e_health_channel_eq_recover(sq->netdev, eq, sq->cq.ch_stats);
 	if (!err) {
 		to_ctx->status = 0; /* this sq recovered */
-		return err;
+		goto out;
 	}
 
 	mutex_lock(&priv->state_lock);
@@ -148,7 +178,7 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
 	mutex_unlock(&priv->state_lock);
 	if (!err) {
 		to_ctx->status = 1; /* all channels recovered */
-		return err;
+		goto out;
 	}
 
 	to_ctx->status = err;
@@ -156,7 +186,8 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
 	netdev_err(priv->netdev,
 		   "mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(%d).\n",
 		   err);
-
+out:
+	netdev_unlock(sq->netdev);
 	return err;
 }
 
@@ -173,10 +204,22 @@ static int mlx5e_tx_reporter_ptpsq_unhealthy_recover(void *ctx)
 		return 0;
 
 	priv = ptpsq->txqsq.priv;
+	netdev = priv->netdev;
+
+	/* Recovering the PTP SQ means re-enabling NAPI, which requires the
+	 * netdev instance lock. However, SQ closing has to wait for this work
+	 * task to finish while also holding the same lock. So either get the
+	 * lock or find that the SQ is no longer enabled and thus this work is
+	 * not relevant anymore.
+	 */
+	while (!netdev_trylock(netdev)) {
+		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &ptpsq->txqsq.state))
+			return 0;
+		msleep(20);
+	}
 
 	mutex_lock(&priv->state_lock);
 	chs = &priv->channels;
-	netdev = priv->netdev;
 
 	carrier_ok = netif_carrier_ok(netdev);
 	netif_carrier_off(netdev);
@@ -193,6 +236,7 @@ static int mlx5e_tx_reporter_ptpsq_unhealthy_recover(void *ctx)
 		netif_carrier_on(netdev);
 
 	mutex_unlock(&priv->state_lock);
+	netdev_unlock(netdev);
 
 	return err;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 4b8084420816..73f4805feac7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -631,19 +631,7 @@ static void mlx5e_rq_timeout_work(struct work_struct *timeout_work)
 					   struct mlx5e_rq,
 					   rx_timeout_work);
 
-	/* Acquire netdev instance lock to synchronize with channel close and
-	 * reopen flows. Either successfully obtain the lock, or detect that
-	 * channels are closing for another reason, making this work no longer
-	 * necessary.
-	 */
-	while (!netdev_trylock(rq->netdev)) {
-		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &rq->priv->state))
-			return;
-		msleep(20);
-	}
-
 	mlx5e_reporter_rx_timeout(rq);
-	netdev_unlock(rq->netdev);
 }
 
 static int mlx5e_alloc_mpwqe_rq_drop_page(struct mlx5e_rq *rq)
@@ -1952,20 +1940,7 @@ void mlx5e_tx_err_cqe_work(struct work_struct *recover_work)
 	struct mlx5e_txqsq *sq = container_of(recover_work, struct mlx5e_txqsq,
 					      recover_work);
 
-	/* Recovering queues means re-enabling NAPI, which requires the netdev
-	 * instance lock. However, SQ closing flows have to wait for work tasks
-	 * to finish while also holding the netdev instance lock. So either get
-	 * the lock or find that the SQ is no longer enabled and thus this work
-	 * is not relevant anymore.
-	 */
-	while (!netdev_trylock(sq->netdev)) {
-		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
-			return;
-		msleep(20);
-	}
-
 	mlx5e_reporter_tx_err_cqe(sq);
-	netdev_unlock(sq->netdev);
 }
 
 static struct dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
@@ -5105,19 +5080,6 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
 	struct net_device *netdev = priv->netdev;
 	int i;
 
-	/* Recovering the TX queues implies re-enabling NAPI, which requires
-	 * the netdev instance lock.
-	 * However, channel closing flows have to wait for this work to finish
-	 * while holding the same lock. So either get the lock or find that
-	 * channels are being closed for other reason and this work is not
-	 * relevant anymore.
-	 */
-	while (!netdev_trylock(netdev)) {
-		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state))
-			return;
-		msleep(20);
-	}
-
 	for (i = 0; i < netdev->real_num_tx_queues; i++) {
 		struct netdev_queue *dev_queue =
 			netdev_get_tx_queue(netdev, i);
@@ -5130,8 +5092,6 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
 		/* break if tried to reopened channels */
 			break;
 	}
-
-	netdev_unlock(netdev);
 }
 
 static void mlx5e_tx_timeout(struct net_device *dev, unsigned int txqueue)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH net V2 6/6] net/mlx5e: Use unsigned for mlx5e_get_max_num_channels
  2026-02-18  7:28 [PATCH net V2 0/6] mlx5 misc fixes 2026-02-18 Tariq Toukan
                   ` (4 preceding siblings ...)
  2026-02-18  7:29 ` [PATCH net V2 5/6] net/mlx5e: Fix deadlocks between devlink and netdev instance locks Tariq Toukan
@ 2026-02-18  7:29 ` Tariq Toukan
  2026-02-18 23:49 ` [PATCH net V2 0/6] mlx5 misc fixes 2026-02-18 Keller, Jacob E
  2026-02-19 17:40 ` patchwork-bot+netdevbpf
  7 siblings, 0 replies; 12+ messages in thread
From: Tariq Toukan @ 2026-02-18  7:29 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch, netdev,
	linux-rdma, linux-kernel, Gal Pressman, Moshe Shemesh,
	Jacob Keller, Cosmin Ratiu, Dragos Tatulea

From: Cosmin Ratiu <cratiu@nvidia.com>

The max number of channels is always an unsigned int; use the correct
type to fix compilation errors seen with strict type checking, e.g.:

error: call to ‘__compiletime_assert_1110’ declared with attribute
  error: min(mlx5e_get_devlink_param_num_doorbells(mdev),
  mlx5e_get_max_num_channels(mdev)) signedness error

Fixes: 74a8dadac17e ("net/mlx5e: Preparations for supporting larger number of channels")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index a7de3a3efc49..5215360c347b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -180,7 +180,8 @@ static inline u16 mlx5_min_rx_wqes(int wq_type, u32 wq_size)
 }
 
 /* Use this function to get max num channels (rxqs/txqs) only to create netdev */
-static inline int mlx5e_get_max_num_channels(struct mlx5_core_dev *mdev)
+static inline unsigned int
+mlx5e_get_max_num_channels(struct mlx5_core_dev *mdev)
 {
 	return is_kdump_kernel() ?
 		MLX5E_MIN_NUM_CHANNELS :
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* RE: [PATCH net V2 0/6] mlx5 misc fixes 2026-02-18
  2026-02-18  7:28 [PATCH net V2 0/6] mlx5 misc fixes 2026-02-18 Tariq Toukan
                   ` (5 preceding siblings ...)
  2026-02-18  7:29 ` [PATCH net V2 6/6] net/mlx5e: Use unsigned for mlx5e_get_max_num_channels Tariq Toukan
@ 2026-02-18 23:49 ` Keller, Jacob E
  2026-02-19 17:40 ` patchwork-bot+netdevbpf
  7 siblings, 0 replies; 12+ messages in thread
From: Keller, Jacob E @ 2026-02-18 23:49 UTC (permalink / raw)
  To: Tariq Toukan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Mark Bloch,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, Gal Pressman, Moshe Shemesh



> -----Original Message-----
> From: Tariq Toukan <tariqt@nvidia.com>
> Sent: Tuesday, February 17, 2026 11:29 PM
> To: Eric Dumazet <edumazet@google.com>; Jakub Kicinski
> <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Andrew Lunn
> <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>
> Cc: Saeed Mahameed <saeedm@nvidia.com>; Leon Romanovsky
> <leon@kernel.org>; Tariq Toukan <tariqt@nvidia.com>; Mark Bloch
> <mbloch@nvidia.com>; netdev@vger.kernel.org; linux-rdma@vger.kernel.org;
> linux-kernel@vger.kernel.org; Gal Pressman <gal@nvidia.com>; Moshe
> Shemesh <moshe@nvidia.com>; Keller, Jacob E <jacob.e.keller@intel.com>
> Subject: [PATCH net V2 0/6] mlx5 misc fixes 2026-02-18
> 
> Hi,
> 
> This patchset provides misc bug fixes from the team to the mlx5
> core and Eth drivers.
> 
> Thanks,
> Tariq.
> 
> V2:
> - Add review tags (Jacob).
> - Use poll_timeout_us and variants (Jacob).
> - Link to V1: https://lore.kernel.org/all/20260212103217.1752943-1-
> tariqt@nvidia.com/
> 

Everything looks good in v2, thanks for switching to iopoll.

Reviewed-by: Jacob Keller <Jacob.e.keller@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net V2 0/6] mlx5 misc fixes 2026-02-18
  2026-02-18  7:28 [PATCH net V2 0/6] mlx5 misc fixes 2026-02-18 Tariq Toukan
                   ` (6 preceding siblings ...)
  2026-02-18 23:49 ` [PATCH net V2 0/6] mlx5 misc fixes 2026-02-18 Keller, Jacob E
@ 2026-02-19 17:40 ` patchwork-bot+netdevbpf
  7 siblings, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-02-19 17:40 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: edumazet, kuba, pabeni, andrew+netdev, davem, saeedm, leon,
	mbloch, netdev, linux-rdma, linux-kernel, gal, moshe,
	jacob.e.keller

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 18 Feb 2026 09:28:58 +0200 you wrote:
> Hi,
> 
> This patchset provides misc bug fixes from the team to the mlx5
> core and Eth drivers.
> 
> Thanks,
> Tariq.
> 
> [...]

Here is the summary with links:
  - [net,V2,1/6] net/mlx5: Fix multiport device check over light SFs
    https://git.kernel.org/netdev/net/c/47bf2e813817
  - [net,V2,2/6] net/mlx5e: Fix misidentification of ASO CQE during poll loop
    https://git.kernel.org/netdev/net/c/ae3cb71e6c4d
  - [net,V2,3/6] net/mlx5: Fix misidentification of write combining CQE during poll loop
    https://git.kernel.org/netdev/net/c/d451994ebc7d
  - [net,V2,4/6] net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event
    https://git.kernel.org/netdev/net/c/9854b243ce42
  - [net,V2,5/6] net/mlx5e: Fix deadlocks between devlink and netdev instance locks
    https://git.kernel.org/netdev/net/c/83ac0304a2d7
  - [net,V2,6/6] net/mlx5e: Use unsigned for mlx5e_get_max_num_channels
    https://git.kernel.org/netdev/net/c/57a94d4b22b0

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net V2 5/6] net/mlx5e: Fix deadlocks between devlink and netdev instance locks
  2026-02-18  7:29 ` [PATCH net V2 5/6] net/mlx5e: Fix deadlocks between devlink and netdev instance locks Tariq Toukan
@ 2026-03-05  2:33   ` Jinjie Ruan
  2026-03-05 12:19     ` Cosmin Ratiu
  0 siblings, 1 reply; 12+ messages in thread
From: Jinjie Ruan @ 2026-03-05  2:33 UTC (permalink / raw)
  To: Tariq Toukan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Mark Bloch, netdev, linux-rdma,
	linux-kernel, Gal Pressman, Moshe Shemesh, Jacob Keller,
	Cosmin Ratiu, Dragos Tatulea



On 2026/2/18 15:29, Tariq Toukan wrote:
> From: Cosmin Ratiu <cratiu@nvidia.com>
> 
> In the mentioned "Fixes" commit, various work tasks triggering devlink
> health reporter recovery were switched to use netdev_trylock to protect
> against concurrent tear down of the channels being recovered. But this
> had the side effect of introducing potential deadlocks because of
> incorrect lock ordering.
> 
> The correct lock order is described by the init flow:
> probe_one -> mlx5_init_one (acquires devlink lock)
> -> mlx5_init_one_devl_locked -> mlx5_register_device
> -> mlx5_rescan_drivers_locked -...-> mlx5e_probe -> _mlx5e_probe
> -> register_netdev (acquires rtnl lock)
> -> register_netdevice (acquires netdev lock)
> => devlink lock -> rtnl lock -> netdev lock.
> 
> But in the current recovery flow, the order is wrong:
> mlx5e_tx_err_cqe_work (acquires netdev lock)
> -> mlx5e_reporter_tx_err_cqe -> mlx5e_health_report
> -> devlink_health_report (acquires devlink lock => boom!)
> -> devlink_health_reporter_recover
> -> mlx5e_tx_reporter_recover -> mlx5e_tx_reporter_recover_from_ctx
> -> mlx5e_tx_reporter_err_cqe_recover
> 
> The same pattern exists in:
> mlx5e_reporter_rx_timeout
> mlx5e_reporter_tx_ptpsq_unhealthy
> mlx5e_reporter_tx_timeout
> 

On 7.0-rc2, it seems that a similar problem still exists, causing the
ARM64 kernel to fail to boot on the Kunpeng 920, as shown below:

[  242.676635][ T1644] INFO: task kworker/u1280:1:1671 blocked for more
than 120 seconds.
[  242.682141][ T1644]       Not tainted 7.0.0-rc2+ #3
[  242.684942][ T1644] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.690552][ T1644] task:kworker/u1280:1 state:D stack:0     pid:1671
 tgid:1671  ppid:2      task_flags:0x4208060 flags:0x00000010
[  242.696332][ T1644] Workqueue: mlx5_health0000:97:00.0
mlx5_fw_reporter_err_work [mlx5_core]
[  242.702324][ T1644] Call trace:
[  242.705187][ T1644]  __switch_to+0xdc/0x108 (T)
[  242.707936][ T1644]  __schedule+0x2a0/0x8a8
[  242.710647][ T1644]  schedule+0x3c/0xc0
[  242.713321][ T1644]  schedule_preempt_disabled+0x2c/0x50
[  242.715875][ T1644]  __mutex_lock.constprop.0+0x344/0x918
[  242.718421][ T1644]  __mutex_lock_slowpath+0x1c/0x30
[  242.720885][ T1644]  mutex_lock+0x50/0x68
[  242.723278][ T1644]  devl_lock+0x1c/0x30
[  242.725607][ T1644]  devlink_health_report+0x240/0x328
[  242.727902][ T1644]  mlx5_fw_reporter_err_work+0xa0/0xb0 [mlx5_core]
[  242.730333][ T1644]  process_one_work+0x180/0x4f8
[  242.732687][ T1644]  worker_thread+0x208/0x280
[  242.734976][ T1644]  kthread+0x128/0x138
[  242.737217][ T1644]  ret_from_fork+0x10/0x20
[  242.739599][ T1644] INFO: task kworker/u1280:1:1671 is blocked on a
mutex likely owned by task kworker/240:2:2582.
[  242.744002][ T1644] task:kworker/240:2   state:D stack:0     pid:2582
 tgid:2582  ppid:2      task_flags:0x4208060 flags:0x00000010
[  242.748447][ T1644] Workqueue: sync_wq local_pci_probe_callback
[  242.750654][ T1644] Call trace:
[  242.752793][ T1644]  __switch_to+0xdc/0x108 (T)
[  242.754882][ T1644]  __schedule+0x2a0/0x8a8
[  242.756946][ T1644]  schedule+0x3c/0xc0
[  242.758951][ T1644]  schedule_timeout+0x80/0x120
[  242.760903][ T1644]  __wait_for_common+0xc4/0x1d0
[  242.762796][ T1644]  wait_for_completion_timeout+0x28/0x40
[  242.764670][ T1644]  wait_func+0x180/0x240 [mlx5_core]
[  242.766533][ T1644]  mlx5_cmd_invoke+0x244/0x3e0 [mlx5_core]
[  242.768338][ T1644]  cmd_exec+0x208/0x448 [mlx5_core]
[  242.770153][ T1644]  mlx5_cmd_do+0x38/0x80 [mlx5_core]
[  242.771974][ T1644]  mlx5_cmd_exec+0x2c/0x60 [mlx5_core]
[  242.773848][ T1644]  mlx5_core_create_mkey+0x70/0x120 [mlx5_core]
[  242.775712][ T1644]  mlx5_fw_tracer_create_mkey+0x114/0x180 [mlx5_core]
[  242.777609][ T1644]  mlx5_fw_tracer_init.part.0+0xb0/0x1f0 [mlx5_core]
[  242.779495][ T1644]  mlx5_fw_tracer_init+0x24/0x40 [mlx5_core]
[  242.781380][ T1644]  mlx5_load+0x78/0x360 [mlx5_core]
[  242.783256][ T1644]  mlx5_init_one_devl_locked+0xd0/0x278 [mlx5_core]
[  242.785231][ T1644]  probe_one+0xe0/0x208 [mlx5_core]
[  242.787159][ T1644]  local_pci_probe+0x48/0xb8
[  242.789038][ T1644]  local_pci_probe_callback+0x24/0x40
[  242.790876][ T1644]  process_one_work+0x180/0x4f8
[  242.792731][ T1644]  worker_thread+0x208/0x280
[  242.794578][ T1644]  kthread+0x128/0x138
[  242.796427][ T1644]  ret_from_fork+0x10/0x20
[  242.798277][ T1644] INFO: task systemd-udevd:2281 blocked for more
than 120 seconds.
[  242.801795][ T1644]       Not tainted 7.0.0-rc2+ #3
[  242.803542][ T1644] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.807056][ T1644] task:systemd-udevd   state:D stack:0     pid:2281
 tgid:2281  ppid:2256   task_flags:0x400140 flags:0x00000811
[  242.810829][ T1644] Call trace:
[  242.812779][ T1644]  __switch_to+0xdc/0x108 (T)
[  242.814681][ T1644]  __schedule+0x2a0/0x8a8
[  242.816609][ T1644]  schedule+0x3c/0xc0
[  242.818499][ T1644]  schedule_timeout+0x10c/0x120
[  242.820388][ T1644]  __wait_for_common+0xc4/0x1d0
[  242.822267][ T1644]  wait_for_completion+0x28/0x40
[  242.824168][ T1644]  __flush_work+0x7c/0xf8
[  242.825983][ T1644]  flush_work+0x1c/0x30
[  242.827816][ T1644]  pci_call_probe+0x174/0x1e0
[  242.829652][ T1644]  pci_device_probe+0x98/0x108
[  242.831455][ T1644]  call_driver_probe+0x34/0x158
[  242.833261][ T1644]  really_probe+0xc0/0x320
[  242.835082][ T1644]  __driver_probe_device+0x88/0x190
[  242.836843][ T1644]  driver_probe_device+0x48/0x120
[  242.838607][ T1644]  __driver_attach+0x138/0x280
[  242.840355][ T1644]  bus_for_each_dev+0x80/0xe8
[  242.842095][ T1644]  driver_attach+0x2c/0x40
[  242.843830][ T1644]  bus_add_driver+0x128/0x258
[  242.845564][ T1644]  driver_register+0x68/0x138
[  242.847285][ T1644]  __pci_register_driver+0x4c/0x60
[  242.849038][ T1644]  mlx5_init+0x7c/0xff8 [mlx5_core]
[  242.850871][ T1644]  do_one_initcall+0x50/0x498
[  242.852561][ T1644]  do_init_module+0x60/0x280
[  242.854220][ T1644]  load_module+0x3d8/0x6a8
[  242.855856][ T1644]  init_module_from_file+0xe4/0x108
[  242.857470][ T1644]  idempotent_init_module+0x190/0x290
[  242.859022][ T1644]  __arm64_sys_finit_module+0x74/0xf8
[  242.860544][ T1644]  invoke_syscall+0x50/0x120
[  242.861996][ T1644]  el0_svc_common.constprop.0+0xc8/0xf0
[  242.863457][ T1644]  do_el0_svc+0x24/0x38
[  242.864914][ T1644]  el0_svc+0x34/0x170
[  242.866361][ T1644]  el0t_64_sync_handler+0xa0/0xe8
[  242.867839][ T1644]  el0t_64_sync+0x190/0x198


> Fix these by moving the netdev_trylock calls from the work handlers
> lower in the call stack, in the respective recovery functions, where
> they are actually necessary.
> 
> Fixes: 8f7b00307bf1 ("net/mlx5e: Convert mlx5 netdevs to instance locking")
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> ---
>  .../net/ethernet/mellanox/mlx5/core/en/ptp.c  | 14 -----
>  .../mellanox/mlx5/core/en/reporter_rx.c       | 13 +++++
>  .../mellanox/mlx5/core/en/reporter_tx.c       | 52 +++++++++++++++++--
>  .../net/ethernet/mellanox/mlx5/core/en_main.c | 40 --------------
>  4 files changed, 61 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> index 424f8a2728a3..74660e7fe674 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> @@ -457,22 +457,8 @@ static void mlx5e_ptpsq_unhealthy_work(struct work_struct *work)
>  {
>  	struct mlx5e_ptpsq *ptpsq =
>  		container_of(work, struct mlx5e_ptpsq, report_unhealthy_work);
> -	struct mlx5e_txqsq *sq = &ptpsq->txqsq;
> -
> -	/* Recovering the PTP SQ means re-enabling NAPI, which requires the
> -	 * netdev instance lock. However, SQ closing has to wait for this work
> -	 * task to finish while also holding the same lock. So either get the
> -	 * lock or find that the SQ is no longer enabled and thus this work is
> -	 * not relevant anymore.
> -	 */
> -	while (!netdev_trylock(sq->netdev)) {
> -		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
> -			return;
> -		msleep(20);
> -	}
>  
>  	mlx5e_reporter_tx_ptpsq_unhealthy(ptpsq);
> -	netdev_unlock(sq->netdev);
>  }
>  
>  static int mlx5e_ptp_open_txqsq(struct mlx5e_ptp *c, u32 tisn,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
> index 0686fbdd5a05..6efb626b5506 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
> @@ -1,6 +1,8 @@
>  // SPDX-License-Identifier: GPL-2.0
>  // Copyright (c) 2019 Mellanox Technologies.
>  
> +#include <net/netdev_lock.h>
> +
>  #include "health.h"
>  #include "params.h"
>  #include "txrx.h"
> @@ -177,6 +179,16 @@ static int mlx5e_rx_reporter_timeout_recover(void *ctx)
>  	rq = ctx;
>  	priv = rq->priv;
>  
> +	/* Acquire netdev instance lock to synchronize with channel close and
> +	 * reopen flows. Either successfully obtain the lock, or detect that
> +	 * channels are closing for another reason, making this work no longer
> +	 * necessary.
> +	 */
> +	while (!netdev_trylock(rq->netdev)) {
> +		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &rq->priv->state))
> +			return 0;
> +		msleep(20);
> +	}
>  	mutex_lock(&priv->state_lock);
>  
>  	eq = rq->cq.mcq.eq;
> @@ -186,6 +198,7 @@ static int mlx5e_rx_reporter_timeout_recover(void *ctx)
>  		clear_bit(MLX5E_SQ_STATE_ENABLED, &rq->icosq->state);
>  
>  	mutex_unlock(&priv->state_lock);
> +	netdev_unlock(rq->netdev);
>  
>  	return err;
>  }
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
> index 4adc1adf9897..60ba840e00fa 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
> @@ -1,6 +1,8 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>  /* Copyright (c) 2019 Mellanox Technologies. */
>  
> +#include <net/netdev_lock.h>
> +
>  #include "health.h"
>  #include "en/ptp.h"
>  #include "en/devlink.h"
> @@ -79,6 +81,18 @@ static int mlx5e_tx_reporter_err_cqe_recover(void *ctx)
>  	if (!test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
>  		return 0;
>  
> +	/* Recovering queues means re-enabling NAPI, which requires the netdev
> +	 * instance lock. However, SQ closing flows have to wait for work tasks
> +	 * to finish while also holding the netdev instance lock. So either get
> +	 * the lock or find that the SQ is no longer enabled and thus this work
> +	 * is not relevant anymore.
> +	 */
> +	while (!netdev_trylock(dev)) {
> +		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
> +			return 0;
> +		msleep(20);
> +	}
> +
>  	err = mlx5_core_query_sq_state(mdev, sq->sqn, &state);
>  	if (err) {
>  		netdev_err(dev, "Failed to query SQ 0x%x state. err = %d\n",
> @@ -114,9 +128,11 @@ static int mlx5e_tx_reporter_err_cqe_recover(void *ctx)
>  	else
>  		mlx5e_trigger_napi_sched(sq->cq.napi);
>  
> +	netdev_unlock(dev);
>  	return 0;
>  out:
>  	clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state);
> +	netdev_unlock(dev);
>  	return err;
>  }
>  
> @@ -137,10 +153,24 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
>  	sq = to_ctx->sq;
>  	eq = sq->cq.mcq.eq;
>  	priv = sq->priv;
> +
> +	/* Recovering the TX queues implies re-enabling NAPI, which requires
> +	 * the netdev instance lock.
> +	 * However, channel closing flows have to wait for this work to finish
> +	 * while holding the same lock. So either get the lock or find that
> +	 * channels are being closed for other reason and this work is not
> +	 * relevant anymore.
> +	 */
> +	while (!netdev_trylock(sq->netdev)) {
> +		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state))
> +			return 0;
> +		msleep(20);
> +	}
> +
>  	err = mlx5e_health_channel_eq_recover(sq->netdev, eq, sq->cq.ch_stats);
>  	if (!err) {
>  		to_ctx->status = 0; /* this sq recovered */
> -		return err;
> +		goto out;
>  	}
>  
>  	mutex_lock(&priv->state_lock);
> @@ -148,7 +178,7 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
>  	mutex_unlock(&priv->state_lock);
>  	if (!err) {
>  		to_ctx->status = 1; /* all channels recovered */
> -		return err;
> +		goto out;
>  	}
>  
>  	to_ctx->status = err;
> @@ -156,7 +186,8 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
>  	netdev_err(priv->netdev,
>  		   "mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(%d).\n",
>  		   err);
> -
> +out:
> +	netdev_unlock(sq->netdev);
>  	return err;
>  }
>  
> @@ -173,10 +204,22 @@ static int mlx5e_tx_reporter_ptpsq_unhealthy_recover(void *ctx)
>  		return 0;
>  
>  	priv = ptpsq->txqsq.priv;
> +	netdev = priv->netdev;
> +
> +	/* Recovering the PTP SQ means re-enabling NAPI, which requires the
> +	 * netdev instance lock. However, SQ closing has to wait for this work
> +	 * task to finish while also holding the same lock. So either get the
> +	 * lock or find that the SQ is no longer enabled and thus this work is
> +	 * not relevant anymore.
> +	 */
> +	while (!netdev_trylock(netdev)) {
> +		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &ptpsq->txqsq.state))
> +			return 0;
> +		msleep(20);
> +	}
>  
>  	mutex_lock(&priv->state_lock);
>  	chs = &priv->channels;
> -	netdev = priv->netdev;
>  
>  	carrier_ok = netif_carrier_ok(netdev);
>  	netif_carrier_off(netdev);
> @@ -193,6 +236,7 @@ static int mlx5e_tx_reporter_ptpsq_unhealthy_recover(void *ctx)
>  		netif_carrier_on(netdev);
>  
>  	mutex_unlock(&priv->state_lock);
> +	netdev_unlock(netdev);
>  
>  	return err;
>  }
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 4b8084420816..73f4805feac7 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -631,19 +631,7 @@ static void mlx5e_rq_timeout_work(struct work_struct *timeout_work)
>  					   struct mlx5e_rq,
>  					   rx_timeout_work);
>  
> -	/* Acquire netdev instance lock to synchronize with channel close and
> -	 * reopen flows. Either successfully obtain the lock, or detect that
> -	 * channels are closing for another reason, making this work no longer
> -	 * necessary.
> -	 */
> -	while (!netdev_trylock(rq->netdev)) {
> -		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &rq->priv->state))
> -			return;
> -		msleep(20);
> -	}
> -
>  	mlx5e_reporter_rx_timeout(rq);
> -	netdev_unlock(rq->netdev);
>  }
>  
>  static int mlx5e_alloc_mpwqe_rq_drop_page(struct mlx5e_rq *rq)
> @@ -1952,20 +1940,7 @@ void mlx5e_tx_err_cqe_work(struct work_struct *recover_work)
>  	struct mlx5e_txqsq *sq = container_of(recover_work, struct mlx5e_txqsq,
>  					      recover_work);
>  
> -	/* Recovering queues means re-enabling NAPI, which requires the netdev
> -	 * instance lock. However, SQ closing flows have to wait for work tasks
> -	 * to finish while also holding the netdev instance lock. So either get
> -	 * the lock or find that the SQ is no longer enabled and thus this work
> -	 * is not relevant anymore.
> -	 */
> -	while (!netdev_trylock(sq->netdev)) {
> -		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
> -			return;
> -		msleep(20);
> -	}
> -
>  	mlx5e_reporter_tx_err_cqe(sq);
> -	netdev_unlock(sq->netdev);
>  }
>  
>  static struct dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
> @@ -5105,19 +5080,6 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
>  	struct net_device *netdev = priv->netdev;
>  	int i;
>  
> -	/* Recovering the TX queues implies re-enabling NAPI, which requires
> -	 * the netdev instance lock.
> -	 * However, channel closing flows have to wait for this work to finish
> -	 * while holding the same lock. So either get the lock or find that
> -	 * channels are being closed for other reason and this work is not
> -	 * relevant anymore.
> -	 */
> -	while (!netdev_trylock(netdev)) {
> -		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state))
> -			return;
> -		msleep(20);
> -	}
> -
>  	for (i = 0; i < netdev->real_num_tx_queues; i++) {
>  		struct netdev_queue *dev_queue =
>  			netdev_get_tx_queue(netdev, i);
> @@ -5130,8 +5092,6 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
>  		/* break if tried to reopened channels */
>  			break;
>  	}
> -
> -	netdev_unlock(netdev);
>  }
>  
>  static void mlx5e_tx_timeout(struct net_device *dev, unsigned int txqueue)


* Re: [PATCH net V2 5/6] net/mlx5e: Fix deadlocks between devlink and netdev instance locks
  2026-03-05  2:33   ` Jinjie Ruan
@ 2026-03-05 12:19     ` Cosmin Ratiu
  2026-03-06  1:54       ` Jinjie Ruan
  0 siblings, 1 reply; 12+ messages in thread
From: Cosmin Ratiu @ 2026-03-05 12:19 UTC (permalink / raw)
  To: Tariq Toukan, davem@davemloft.net, ruanjinjie@huawei.com,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	andrew+netdev@lunn.ch
  Cc: linux-rdma@vger.kernel.org, Gal Pressman, Dragos Tatulea,
	Mark Bloch, Moshe Shemesh, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, jacob.e.keller@intel.com, leon@kernel.org,
	Saeed Mahameed

On Thu, 2026-03-05 at 10:33 +0800, Jinjie Ruan wrote:
> 
> 
> On 2026/2/18 15:29, Tariq Toukan wrote:
> > From: Cosmin Ratiu <cratiu@nvidia.com>
> > 
> > In the mentioned "Fixes" commit, various work tasks triggering
> > devlink
> > health reporter recovery were switched to use netdev_trylock to
> > protect
> > against concurrent tear down of the channels being recovered. But
> > this
> > had the side effect of introducing potential deadlocks because of
> > incorrect lock ordering.
> > 
> > The correct lock order is described by the init flow:
> > probe_one -> mlx5_init_one (acquires devlink lock)
> > -> mlx5_init_one_devl_locked -> mlx5_register_device
> > -> mlx5_rescan_drivers_locked -...-> mlx5e_probe -> _mlx5e_probe
> > -> register_netdev (acquires rtnl lock)
> > -> register_netdevice (acquires netdev lock)
> > => devlink lock -> rtnl lock -> netdev lock.
> > 
> > But in the current recovery flow, the order is wrong:
> > mlx5e_tx_err_cqe_work (acquires netdev lock)
> > -> mlx5e_reporter_tx_err_cqe -> mlx5e_health_report
> > -> devlink_health_report (acquires devlink lock => boom!)
> > -> devlink_health_reporter_recover
> > -> mlx5e_tx_reporter_recover -> mlx5e_tx_reporter_recover_from_ctx
> > -> mlx5e_tx_reporter_err_cqe_recover
> > 
> > The same pattern exists in:
> > mlx5e_reporter_rx_timeout
> > mlx5e_reporter_tx_ptpsq_unhealthy
> > mlx5e_reporter_tx_timeout
> > 
> 
> On 7.0-rc2, it seems that a similar problem still exists, causing the
> ARM64 kernel to fail to boot on the Kunpeng 920, as shown below:

Thank you for the report. From it, I can tell that this is something
else:
Task2 (mlx5_init_one_devl_locked -> ... -> mlx5_fw_tracer_init) is
holding the devlink lock and busy executing a firmware command for more
than 120 seconds. There should be no other locks held on that path.
Task1 (mlx5_fw_reporter_err_work -> devlink_health_report) is trying to
acquire the same devlink lock and is stuck on that.
Task3 (mlx5_init -> __pci_register_driver -> ...) is stuck waiting for
all devices to finish probing/registering (and is waiting for task2 to
finish).

I don't know why task2 takes so long to execute a fw command. In order
to rule out potential deadlocks, could you boot your kernel with
CONFIG_LOCKDEP enabled?

Cosmin.
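For anyone reproducing this, a suggested debug config fragment along the lines of Cosmin's request (exact option dependencies vary by kernel version; `CONFIG_PROVE_LOCKING` selects `CONFIG_LOCKDEP`):

```kconfig
# Lock correctness checking (selects CONFIG_LOCKDEP)
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_DEBUG_MUTEXES=y
# Keep hung task detection on to catch long stalls like the one above
CONFIG_DETECT_HUNG_TASK=y
```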

> 
> [  242.676635][ T1644] INFO: task kworker/u1280:1:1671 blocked for
> more
> than 120 seconds.
> [  242.682141][ T1644]       Not tainted 7.0.0-rc2+ #3
> [  242.684942][ T1644] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  242.690552][ T1644] task:kworker/u1280:1 state:D stack:0    
> pid:1671
>  tgid:1671  ppid:2      task_flags:0x4208060 flags:0x00000010
> [  242.696332][ T1644] Workqueue: mlx5_health0000:97:00.0
> mlx5_fw_reporter_err_work [mlx5_core]
> [  242.702324][ T1644] Call trace:
> [  242.705187][ T1644]  __switch_to+0xdc/0x108 (T)
> [  242.707936][ T1644]  __schedule+0x2a0/0x8a8
> [  242.710647][ T1644]  schedule+0x3c/0xc0
> [  242.713321][ T1644]  schedule_preempt_disabled+0x2c/0x50
> [  242.715875][ T1644]  __mutex_lock.constprop.0+0x344/0x918
> [  242.718421][ T1644]  __mutex_lock_slowpath+0x1c/0x30
> [  242.720885][ T1644]  mutex_lock+0x50/0x68
> [  242.723278][ T1644]  devl_lock+0x1c/0x30
> [  242.725607][ T1644]  devlink_health_report+0x240/0x328
> [  242.727902][ T1644]  mlx5_fw_reporter_err_work+0xa0/0xb0
> [mlx5_core]
> [  242.730333][ T1644]  process_one_work+0x180/0x4f8
> [  242.732687][ T1644]  worker_thread+0x208/0x280
> [  242.734976][ T1644]  kthread+0x128/0x138
> [  242.737217][ T1644]  ret_from_fork+0x10/0x20
> [  242.739599][ T1644] INFO: task kworker/u1280:1:1671 is blocked on
> a
> mutex likely owned by task kworker/240:2:2582.
> [  242.744002][ T1644] task:kworker/240:2   state:D stack:0    
> pid:2582
>  tgid:2582  ppid:2      task_flags:0x4208060 flags:0x00000010
> [  242.748447][ T1644] Workqueue: sync_wq local_pci_probe_callback
> [  242.750654][ T1644] Call trace:
> [  242.752793][ T1644]  __switch_to+0xdc/0x108 (T)
> [  242.754882][ T1644]  __schedule+0x2a0/0x8a8
> [  242.756946][ T1644]  schedule+0x3c/0xc0
> [  242.758951][ T1644]  schedule_timeout+0x80/0x120
> [  242.760903][ T1644]  __wait_for_common+0xc4/0x1d0
> [  242.762796][ T1644]  wait_for_completion_timeout+0x28/0x40
> [  242.764670][ T1644]  wait_func+0x180/0x240 [mlx5_core]
> [  242.766533][ T1644]  mlx5_cmd_invoke+0x244/0x3e0 [mlx5_core]
> [  242.768338][ T1644]  cmd_exec+0x208/0x448 [mlx5_core]
> [  242.770153][ T1644]  mlx5_cmd_do+0x38/0x80 [mlx5_core]
> [  242.771974][ T1644]  mlx5_cmd_exec+0x2c/0x60 [mlx5_core]
> [  242.773848][ T1644]  mlx5_core_create_mkey+0x70/0x120 [mlx5_core]
> [  242.775712][ T1644]  mlx5_fw_tracer_create_mkey+0x114/0x180
> [mlx5_core]
> [  242.777609][ T1644]  mlx5_fw_tracer_init.part.0+0xb0/0x1f0
> [mlx5_core]
> [  242.779495][ T1644]  mlx5_fw_tracer_init+0x24/0x40 [mlx5_core]
> [  242.781380][ T1644]  mlx5_load+0x78/0x360 [mlx5_core]
> [  242.783256][ T1644]  mlx5_init_one_devl_locked+0xd0/0x278
> [mlx5_core]
> [  242.785231][ T1644]  probe_one+0xe0/0x208 [mlx5_core]
> [  242.787159][ T1644]  local_pci_probe+0x48/0xb8
> [  242.789038][ T1644]  local_pci_probe_callback+0x24/0x40
> [  242.790876][ T1644]  process_one_work+0x180/0x4f8
> [  242.792731][ T1644]  worker_thread+0x208/0x280
> [  242.794578][ T1644]  kthread+0x128/0x138
> [  242.796427][ T1644]  ret_from_fork+0x10/0x20
> [  242.798277][ T1644] INFO: task systemd-udevd:2281 blocked for more
> than 120 seconds.
> [  242.801795][ T1644]       Not tainted 7.0.0-rc2+ #3
> [  242.803542][ T1644] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  242.807056][ T1644] task:systemd-udevd   state:D stack:0    
> pid:2281
>  tgid:2281  ppid:2256   task_flags:0x400140 flags:0x00000811
> [  242.810829][ T1644] Call trace:
> [  242.812779][ T1644]  __switch_to+0xdc/0x108 (T)
> [  242.814681][ T1644]  __schedule+0x2a0/0x8a8
> [  242.816609][ T1644]  schedule+0x3c/0xc0
> [  242.818499][ T1644]  schedule_timeout+0x10c/0x120
> [  242.820388][ T1644]  __wait_for_common+0xc4/0x1d0
> [  242.822267][ T1644]  wait_for_completion+0x28/0x40
> [  242.824168][ T1644]  __flush_work+0x7c/0xf8
> [  242.825983][ T1644]  flush_work+0x1c/0x30
> [  242.827816][ T1644]  pci_call_probe+0x174/0x1e0
> [  242.829652][ T1644]  pci_device_probe+0x98/0x108
> [  242.831455][ T1644]  call_driver_probe+0x34/0x158
> [  242.833261][ T1644]  really_probe+0xc0/0x320
> [  242.835082][ T1644]  __driver_probe_device+0x88/0x190
> [  242.836843][ T1644]  driver_probe_device+0x48/0x120
> [  242.838607][ T1644]  __driver_attach+0x138/0x280
> [  242.840355][ T1644]  bus_for_each_dev+0x80/0xe8
> [  242.842095][ T1644]  driver_attach+0x2c/0x40
> [  242.843830][ T1644]  bus_add_driver+0x128/0x258
> [  242.845564][ T1644]  driver_register+0x68/0x138
> [  242.847285][ T1644]  __pci_register_driver+0x4c/0x60
> [  242.849038][ T1644]  mlx5_init+0x7c/0xff8 [mlx5_core]
> [  242.850871][ T1644]  do_one_initcall+0x50/0x498
> [  242.852561][ T1644]  do_init_module+0x60/0x280
> [  242.854220][ T1644]  load_module+0x3d8/0x6a8
> [  242.855856][ T1644]  init_module_from_file+0xe4/0x108
> [  242.857470][ T1644]  idempotent_init_module+0x190/0x290
> [  242.859022][ T1644]  __arm64_sys_finit_module+0x74/0xf8
> [  242.860544][ T1644]  invoke_syscall+0x50/0x120
> [  242.861996][ T1644]  el0_svc_common.constprop.0+0xc8/0xf0
> [  242.863457][ T1644]  do_el0_svc+0x24/0x38
> [  242.864914][ T1644]  el0_svc+0x34/0x170
> [  242.866361][ T1644]  el0t_64_sync_handler+0xa0/0xe8
> [  242.867839][ T1644]  el0t_64_sync+0x190/0x198
> 
> 
> > Fix these by moving the netdev_trylock calls from the work handlers
> > lower in the call stack, in the respective recovery functions,
> > where
> > they are actually necessary.
> > 
> > Fixes: 8f7b00307bf1 ("net/mlx5e: Convert mlx5 netdevs to instance
> > locking")
> > Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
> > Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
> > Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> > Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> > ---
> >  .../net/ethernet/mellanox/mlx5/core/en/ptp.c  | 14 -----
> >  .../mellanox/mlx5/core/en/reporter_rx.c       | 13 +++++
> >  .../mellanox/mlx5/core/en/reporter_tx.c       | 52
> > +++++++++++++++++--
> >  .../net/ethernet/mellanox/mlx5/core/en_main.c | 40 --------------
> >  4 files changed, 61 insertions(+), 58 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> > index 424f8a2728a3..74660e7fe674 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
> > @@ -457,22 +457,8 @@ static void mlx5e_ptpsq_unhealthy_work(struct
> > work_struct *work)
> >  {
> >  	struct mlx5e_ptpsq *ptpsq =
> >  		container_of(work, struct mlx5e_ptpsq,
> > report_unhealthy_work);
> > -	struct mlx5e_txqsq *sq = &ptpsq->txqsq;
> > -
> > -	/* Recovering the PTP SQ means re-enabling NAPI, which
> > requires the
> > -	 * netdev instance lock. However, SQ closing has to wait
> > for this work
> > -	 * task to finish while also holding the same lock. So
> > either get the
> > -	 * lock or find that the SQ is no longer enabled and thus
> > this work is
> > -	 * not relevant anymore.
> > -	 */
> > -	while (!netdev_trylock(sq->netdev)) {
> > -		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
> > -			return;
> > -		msleep(20);
> > -	}
> >  
> >  	mlx5e_reporter_tx_ptpsq_unhealthy(ptpsq);
> > -	netdev_unlock(sq->netdev);
> >  }
> >  
> >  static int mlx5e_ptp_open_txqsq(struct mlx5e_ptp *c, u32 tisn,
> > diff --git
> > a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
> > index 0686fbdd5a05..6efb626b5506 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
> > @@ -1,6 +1,8 @@
> >  // SPDX-License-Identifier: GPL-2.0
> >  // Copyright (c) 2019 Mellanox Technologies.
> >  
> > +#include <net/netdev_lock.h>
> > +
> >  #include "health.h"
> >  #include "params.h"
> >  #include "txrx.h"
> > @@ -177,6 +179,16 @@ static int
> > mlx5e_rx_reporter_timeout_recover(void *ctx)
> >  	rq = ctx;
> >  	priv = rq->priv;
> >  
> > +	/* Acquire netdev instance lock to synchronize with
> > channel close and
> > +	 * reopen flows. Either successfully obtain the lock, or
> > detect that
> > +	 * channels are closing for another reason, making this
> > work no longer
> > +	 * necessary.
> > +	 */
> > +	while (!netdev_trylock(rq->netdev)) {
> > +		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &rq-
> > >priv->state))
> > +			return 0;
> > +		msleep(20);
> > +	}
> >  	mutex_lock(&priv->state_lock);
> >  
> >  	eq = rq->cq.mcq.eq;
> > @@ -186,6 +198,7 @@ static int
> > mlx5e_rx_reporter_timeout_recover(void *ctx)
> >  		clear_bit(MLX5E_SQ_STATE_ENABLED, &rq->icosq-
> > >state);
> >  
> >  	mutex_unlock(&priv->state_lock);
> > +	netdev_unlock(rq->netdev);
> >  
> >  	return err;
> >  }
> > diff --git
> > a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
> > index 4adc1adf9897..60ba840e00fa 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
> > @@ -1,6 +1,8 @@
> >  /* SPDX-License-Identifier: GPL-2.0 */
> >  /* Copyright (c) 2019 Mellanox Technologies. */
> >  
> > +#include <net/netdev_lock.h>
> > +
> >  #include "health.h"
> >  #include "en/ptp.h"
> >  #include "en/devlink.h"
> > @@ -79,6 +81,18 @@ static int
> > mlx5e_tx_reporter_err_cqe_recover(void *ctx)
> >  	if (!test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
> >  		return 0;
> >  
> > +	/* Recovering queues means re-enabling NAPI, which
> > requires the netdev
> > +	 * instance lock. However, SQ closing flows have to wait
> > for work tasks
> > +	 * to finish while also holding the netdev instance lock.
> > So either get
> > +	 * the lock or find that the SQ is no longer enabled and
> > thus this work
> > +	 * is not relevant anymore.
> > +	 */
> > +	while (!netdev_trylock(dev)) {
> > +		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
> > +			return 0;
> > +		msleep(20);
> > +	}
> > +
> >  	err = mlx5_core_query_sq_state(mdev, sq->sqn, &state);
> >  	if (err) {
> >  		netdev_err(dev, "Failed to query SQ 0x%x state.
> > err = %d\n",
> > @@ -114,9 +128,11 @@ static int
> > mlx5e_tx_reporter_err_cqe_recover(void *ctx)
> >  	else
> >  		mlx5e_trigger_napi_sched(sq->cq.napi);
> >  
> > +	netdev_unlock(dev);
> >  	return 0;
> >  out:
> >  	clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state);
> > +	netdev_unlock(dev);
> >  	return err;
> >  }
> >  
> > @@ -137,10 +153,24 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
> >  	sq = to_ctx->sq;
> >  	eq = sq->cq.mcq.eq;
> >  	priv = sq->priv;
> > +
> > +	/* Recovering the TX queues implies re-enabling NAPI, which requires
> > +	 * the netdev instance lock.
> > +	 * However, channel closing flows have to wait for this work to finish
> > +	 * while holding the same lock. So either get the lock or find that
> > +	 * channels are being closed for other reason and this work is not
> > +	 * relevant anymore.
> > +	 */
> > +	while (!netdev_trylock(sq->netdev)) {
> > +		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state))
> > +			return 0;
> > +		msleep(20);
> > +	}
> > +
> >  	err = mlx5e_health_channel_eq_recover(sq->netdev, eq, sq->cq.ch_stats);
> >  	if (!err) {
> >  		to_ctx->status = 0; /* this sq recovered */
> > -		return err;
> > +		goto out;
> >  	}
> >  
> >  	mutex_lock(&priv->state_lock);
> > @@ -148,7 +178,7 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
> >  	mutex_unlock(&priv->state_lock);
> >  	if (!err) {
> >  		to_ctx->status = 1; /* all channels recovered */
> > -		return err;
> > +		goto out;
> >  	}
> >  
> >  	to_ctx->status = err;
> > @@ -156,7 +186,8 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
> >  	netdev_err(priv->netdev,
> >  		   "mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(%d).\n",
> >  		   err);
> > -
> > +out:
> > +	netdev_unlock(sq->netdev);
> >  	return err;
> >  }
> >  
> > @@ -173,10 +204,22 @@ static int mlx5e_tx_reporter_ptpsq_unhealthy_recover(void *ctx)
> >  		return 0;
> >  
> >  	priv = ptpsq->txqsq.priv;
> > +	netdev = priv->netdev;
> > +
> > +	/* Recovering the PTP SQ means re-enabling NAPI, which requires the
> > +	 * netdev instance lock. However, SQ closing has to wait for this work
> > +	 * task to finish while also holding the same lock. So either get the
> > +	 * lock or find that the SQ is no longer enabled and thus this work is
> > +	 * not relevant anymore.
> > +	 */
> > +	while (!netdev_trylock(netdev)) {
> > +		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &ptpsq->txqsq.state))
> > +			return 0;
> > +		msleep(20);
> > +	}
> >  
> >  	mutex_lock(&priv->state_lock);
> >  	chs = &priv->channels;
> > -	netdev = priv->netdev;
> >  
> >  	carrier_ok = netif_carrier_ok(netdev);
> >  	netif_carrier_off(netdev);
> > @@ -193,6 +236,7 @@ static int mlx5e_tx_reporter_ptpsq_unhealthy_recover(void *ctx)
> >  		netif_carrier_on(netdev);
> >  
> >  	mutex_unlock(&priv->state_lock);
> > +	netdev_unlock(netdev);
> >  
> >  	return err;
> >  }
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index 4b8084420816..73f4805feac7 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -631,19 +631,7 @@ static void mlx5e_rq_timeout_work(struct work_struct *timeout_work)
> >  					   struct mlx5e_rq,
> >  					   rx_timeout_work);
> >  
> > -	/* Acquire netdev instance lock to synchronize with channel close and
> > -	 * reopen flows. Either successfully obtain the lock, or detect that
> > -	 * channels are closing for another reason, making this work no longer
> > -	 * necessary.
> > -	 */
> > -	while (!netdev_trylock(rq->netdev)) {
> > -		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &rq->priv->state))
> > -			return;
> > -		msleep(20);
> > -	}
> > -
> >  	mlx5e_reporter_rx_timeout(rq);
> > -	netdev_unlock(rq->netdev);
> >  }
> >  
> >  static int mlx5e_alloc_mpwqe_rq_drop_page(struct mlx5e_rq *rq)
> > @@ -1952,20 +1940,7 @@ void mlx5e_tx_err_cqe_work(struct work_struct *recover_work)
> >  	struct mlx5e_txqsq *sq = container_of(recover_work, struct mlx5e_txqsq,
> >  					      recover_work);
> >  
> > -	/* Recovering queues means re-enabling NAPI, which requires the netdev
> > -	 * instance lock. However, SQ closing flows have to wait for work tasks
> > -	 * to finish while also holding the netdev instance lock. So either get
> > -	 * the lock or find that the SQ is no longer enabled and thus this work
> > -	 * is not relevant anymore.
> > -	 */
> > -	while (!netdev_trylock(sq->netdev)) {
> > -		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
> > -			return;
> > -		msleep(20);
> > -	}
> > -
> >  	mlx5e_reporter_tx_err_cqe(sq);
> > -	netdev_unlock(sq->netdev);
> >  }
> >  
> >  static struct dim_cq_moder mlx5e_get_def_tx_moderation(u8 cq_period_mode)
> > @@ -5105,19 +5080,6 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
> >  	struct net_device *netdev = priv->netdev;
> >  	int i;
> >  
> > -	/* Recovering the TX queues implies re-enabling NAPI, which requires
> > -	 * the netdev instance lock.
> > -	 * However, channel closing flows have to wait for this work to finish
> > -	 * while holding the same lock. So either get the lock or find that
> > -	 * channels are being closed for other reason and this work is not
> > -	 * relevant anymore.
> > -	 */
> > -	while (!netdev_trylock(netdev)) {
> > -		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state))
> > -			return;
> > -		msleep(20);
> > -	}
> > -
> >  	for (i = 0; i < netdev->real_num_tx_queues; i++) {
> >  		struct netdev_queue *dev_queue =
> >  			netdev_get_tx_queue(netdev, i);
> > @@ -5130,8 +5092,6 @@ static void mlx5e_tx_timeout_work(struct work_struct *work)
> >  		/* break if tried to reopened channels */
> >  			break;
> >  	}
> > -
> > -	netdev_unlock(netdev);
> >  }
> >  
> >  static void mlx5e_tx_timeout(struct net_device *dev, unsigned int txqueue)
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH net V2 5/6] net/mlx5e: Fix deadlocks between devlink and netdev instance locks
  2026-03-05 12:19     ` Cosmin Ratiu
@ 2026-03-06  1:54       ` Jinjie Ruan
  0 siblings, 0 replies; 12+ messages in thread
From: Jinjie Ruan @ 2026-03-06  1:54 UTC (permalink / raw)
  To: Cosmin Ratiu, Tariq Toukan, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	andrew+netdev@lunn.ch
  Cc: linux-rdma@vger.kernel.org, Gal Pressman, Dragos Tatulea,
	Mark Bloch, Moshe Shemesh, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, jacob.e.keller@intel.com, leon@kernel.org,
	Saeed Mahameed



On 2026/3/5 20:19, Cosmin Ratiu wrote:
> On Thu, 2026-03-05 at 10:33 +0800, Jinjie Ruan wrote:
>>
>>
>> On 2026/2/18 15:29, Tariq Toukan wrote:
>>> From: Cosmin Ratiu <cratiu@nvidia.com>
>>>
>>> In the mentioned "Fixes" commit, various work tasks triggering
>>> devlink
>>> health reporter recovery were switched to use netdev_trylock to
>>> protect
>>> against concurrent tear down of the channels being recovered. But
>>> this
>>> had the side effect of introducing potential deadlocks because of
>>> incorrect lock ordering.
>>>
>>> The correct lock order is described by the init flow:
>>> probe_one -> mlx5_init_one (acquires devlink lock)
>>> -> mlx5_init_one_devl_locked -> mlx5_register_device
>>> -> mlx5_rescan_drivers_locked -...-> mlx5e_probe -> _mlx5e_probe
>>> -> register_netdev (acquires rtnl lock)
>>> -> register_netdevice (acquires netdev lock)
>>> => devlink lock -> rtnl lock -> netdev lock.
>>>
>>> But in the current recovery flow, the order is wrong:
>>> mlx5e_tx_err_cqe_work (acquires netdev lock)
>>> -> mlx5e_reporter_tx_err_cqe -> mlx5e_health_report
>>> -> devlink_health_report (acquires devlink lock => boom!)
>>> -> devlink_health_reporter_recover
>>> -> mlx5e_tx_reporter_recover -> mlx5e_tx_reporter_recover_from_ctx
>>> -> mlx5e_tx_reporter_err_cqe_recover
>>>
>>> The same pattern exists in:
>>> mlx5e_reporter_rx_timeout
>>> mlx5e_reporter_tx_ptpsq_unhealthy
>>> mlx5e_reporter_tx_timeout
>>>
>>
>> On 7.0-rc2, it seems that a similar problem still exists, causing the
>> ARM64 kernel to fail to boot on the Kunpeng 920, as shown below:
> 
> Thank you for the report. From it, I can understand that it's something
> else:
> Task2 (mlx5_init_one_devl_locked -> ... -> mlx5_fw_tracer_init) is
> holding the devlink lock and busy executing a firmware command for more
> than 120 seconds. There should be no other locks held on that path.
> Task1 (mlx5_fw_reporter_err_work -> devlink_health_report) is trying to
> acquire the same devlink lock and is stuck on that.
> Task3 (mlx5_init -> __pci_register_driver -> ...) is stuck waiting for
> all devices to finish probing/registering (and is waiting for task2 to
> finish).
> 
> I don't know why task2 is taking so long to execute a fw command. To
> rule out potential deadlocks, could you boot your kernel with
> CONFIG_LOCKDEP enabled?
> 
> Cosmin.

Thank you for your reply and detailed analysis.

With CONFIG_LOCKDEP enabled, the log is as follows:

[  242.675370][ T1645] INFO: task systemd-udevd:2258 blocked for more
than 120 seconds.
[  242.680860][ T1645]       Not tainted 7.0.0-rc2+ #6
[  242.683570][ T1645] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.688970][ T1645] task:systemd-udevd   state:D stack:0     pid:2258
 tgid:2258  ppid:2250   task_flags:0x400140 flags:0x00000811
[  242.694616][ T1645] Call trace:
[  242.697422][ T1645]  __switch_to+0xdc/0x108 (T)
[  242.700222][ T1645]  __schedule+0x2f4/0x690
[  242.702897][ T1645]  schedule+0x58/0x118
[  242.705538][ T1645]  schedule_timeout+0x114/0x128
[  242.708114][ T1645]  __wait_for_common+0xc4/0x1d0
[  242.710618][ T1645]  wait_for_completion+0x28/0x40
[  242.713084][ T1645]  __flush_work+0x7c/0xf8
[  242.715478][ T1645]  flush_work+0x1c/0x30
[  242.717831][ T1645]  pci_call_probe+0x200/0x368
[  242.720089][ T1645]  pci_device_probe+0x98/0x108
[  242.722297][ T1645]  call_driver_probe+0x34/0x158
[  242.724472][ T1645]  really_probe+0xc0/0x320
[  242.726638][ T1645]  __driver_probe_device+0x88/0x190
[  242.728839][ T1645]  driver_probe_device+0x48/0x120
[  242.730963][ T1645]  __driver_attach+0x168/0x2b0
[  242.733068][ T1645]  bus_for_each_dev+0x80/0xe8
[  242.735147][ T1645]  driver_attach+0x2c/0x40
[  242.737154][ T1645]  bus_add_driver+0x128/0x258
[  242.739136][ T1645]  driver_register+0x68/0x138
[  242.741000][ T1645]  __pci_register_driver+0x68/0x80
[  242.742846][ T1645]  mlx5_init+0x58/0xff8 [mlx5_core]
[  242.744707][ T1645]  do_one_initcall+0x70/0x5d8
[  242.746428][ T1645]  do_init_module+0x60/0x280
[  242.748106][ T1645]  load_module+0x5a0/0x650
[  242.749742][ T1645]  init_module_from_file+0xe4/0x108
[  242.751367][ T1645]  idempotent_init_module+0x198/0x298
[  242.752926][ T1645]  __arm64_sys_finit_module+0x74/0xf8
[  242.754451][ T1645]  invoke_syscall+0x50/0x120
[  242.755926][ T1645]  el0_svc_common.constprop.0+0xc8/0xf0
[  242.757385][ T1645]  do_el0_svc+0x24/0x38
[  242.758883][ T1645]  el0_svc+0x50/0x338
[  242.760317][ T1645]  el0t_64_sync_handler+0xa0/0xe8
[  242.761788][ T1645]  el0t_64_sync+0x190/0x198
[  242.763267][ T1645] INFO: task systemd-udevd:2268 blocked for more
than 120 seconds.
[  242.766309][ T1645]       Not tainted 7.0.0-rc2+ #6
[  242.767901][ T1645] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.771201][ T1645] task:systemd-udevd   state:D stack:0     pid:2268
 tgid:2268  ppid:2250   task_flags:0x400140 flags:0x00000811
[  242.774771][ T1645] Call trace:
[  242.776586][ T1645]  __switch_to+0xdc/0x108 (T)
[  242.778395][ T1645]  __schedule+0x2f4/0x690
[  242.780204][ T1645]  schedule+0x58/0x118
[  242.781977][ T1645]  schedule_preempt_disabled+0x2c/0x50
[  242.783797][ T1645]  __mutex_lock+0x45c/0x820
[  242.785597][ T1645]  mutex_lock_nested+0x2c/0x40
[  242.787418][ T1645]  __driver_attach+0x38/0x2b0
[  242.789228][ T1645]  bus_for_each_dev+0x80/0xe8
[  242.791047][ T1645]  driver_attach+0x2c/0x40
[  242.792834][ T1645]  bus_add_driver+0x128/0x258
[  242.794625][ T1645]  driver_register+0x68/0x138
[  242.796426][ T1645]  __pci_register_driver+0x68/0x80
[  242.798223][ T1645]  hpre_init+0x84/0xff8 [hisi_hpre]
[  242.800050][ T1645]  do_one_initcall+0x70/0x5d8
[  242.801848][ T1645]  do_init_module+0x60/0x280
[  242.803652][ T1645]  load_module+0x5a0/0x650
[  242.805415][ T1645]  init_module_from_file+0xe4/0x108
[  242.807196][ T1645]  idempotent_init_module+0x198/0x298
[  242.808957][ T1645]  __arm64_sys_finit_module+0x74/0xf8
[  242.810748][ T1645]  invoke_syscall+0x50/0x120
[  242.812543][ T1645]  el0_svc_common.constprop.0+0xc8/0xf0
[  242.814361][ T1645]  do_el0_svc+0x24/0x38
[  242.816149][ T1645]  el0_svc+0x50/0x338
[  242.817853][ T1645]  el0t_64_sync_handler+0xa0/0xe8
[  242.819573][ T1645]  el0t_64_sync+0x190/0x198
[  242.821407][ T1645] INFO: task systemd-udevd:2268 is blocked on a
mutex likely owned by task systemd-udevd:2258.
[  242.824739][ T1645] INFO: task systemd-udevd:2296 blocked for more
than 120 seconds.
[  242.828074][ T1645]       Not tainted 7.0.0-rc2+ #6
[  242.829738][ T1645] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.833313][ T1645] task:systemd-udevd   state:D stack:0     pid:2296
 tgid:2296  ppid:2250   task_flags:0x400140 flags:0x00000811
[  242.837101][ T1645] Call trace:
[  242.839050][ T1645]  __switch_to+0xdc/0x108 (T)
[  242.840992][ T1645]  __schedule+0x2f4/0x690
[  242.842921][ T1645]  schedule+0x58/0x118
[  242.844809][ T1645]  schedule_preempt_disabled+0x2c/0x50
[  242.846736][ T1645]  __mutex_lock+0x45c/0x820
[  242.848727][ T1645]  mutex_lock_nested+0x2c/0x40
[  242.850635][ T1645]  __driver_attach+0x38/0x2b0
[  242.852577][ T1645]  bus_for_each_dev+0x80/0xe8
[  242.854486][ T1645]  driver_attach+0x2c/0x40
[  242.856414][ T1645]  bus_add_driver+0x128/0x258
[  242.858330][ T1645]  driver_register+0x68/0x138
[  242.860261][ T1645]  __pci_register_driver+0x68/0x80
[  242.862175][ T1645]  sas_v3_pci_driver_init+0x30/0xff8 [hisi_sas_v3_hw]
[  242.864158][ T1645]  do_one_initcall+0x70/0x5d8
[  242.866104][ T1645]  do_init_module+0x60/0x280
[  242.868053][ T1645]  load_module+0x5a0/0x650
[  242.869956][ T1645]  init_module_from_file+0xe4/0x108
[  242.871903][ T1645]  idempotent_init_module+0x198/0x298
[  242.873836][ T1645]  __arm64_sys_finit_module+0x74/0xf8
[  242.875817][ T1645]  invoke_syscall+0x50/0x120
[  242.877732][ T1645]  el0_svc_common.constprop.0+0xc8/0xf0
[  242.879638][ T1645]  do_el0_svc+0x24/0x38
[  242.881453][ T1645]  el0_svc+0x50/0x338
[  242.883222][ T1645]  el0t_64_sync_handler+0xa0/0xe8
[  242.884968][ T1645]  el0t_64_sync+0x190/0x198
[  242.886792][ T1645] INFO: task systemd-udevd:2296 is blocked on a
mutex likely owned by task systemd-udevd:2258.
[  242.890229][ T1645] INFO: task kworker/u1280:2:2688 blocked for more
than 121 seconds.
[  242.893604][ T1645]       Not tainted 7.0.0-rc2+ #6
[  242.895348][ T1645] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.898968][ T1645] task:kworker/u1280:2 state:D stack:0     pid:2688
 tgid:2688  ppid:2      task_flags:0x4208060 flags:0x00000010
[  242.902888][ T1645] Workqueue: mlx5_health0000:97:00.1
mlx5_fw_reporter_err_work [mlx5_core]
[  242.906994][ T1645] Call trace:
[  242.909061][ T1645]  __switch_to+0xdc/0x108 (T)
[  242.911132][ T1645]  __schedule+0x2f4/0x690
[  242.913192][ T1645]  schedule+0x58/0x118
[  242.915264][ T1645]  schedule_preempt_disabled+0x2c/0x50
[  242.917338][ T1645]  __mutex_lock+0x45c/0x820
[  242.919435][ T1645]  mutex_lock_nested+0x2c/0x40
[  242.921429][ T1645]  devl_lock+0x20/0x38
[  242.923457][ T1645]  devlink_health_report+0x1e0/0x2a0
[  242.925493][ T1645]  mlx5_fw_reporter_err_work+0xa0/0xb0 [mlx5_core]
[  242.927634][ T1645]  process_one_work+0x23c/0x768
[  242.929686][ T1645]  worker_thread+0x230/0x2e0
[  242.931763][ T1645]  kthread+0x128/0x138
[  242.933798][ T1645]  ret_from_fork+0x10/0x20
[  242.935947][ T1645] INFO: task kworker/u1280:2:2688 is blocked on a
mutex likely owned by task kworker/240:1:1908.
[  242.940133][ T1645] task:kworker/240:1   state:D stack:0     pid:1908
 tgid:1908  ppid:2      task_flags:0x4208060 flags:0x00000010
[  242.944611][ T1645] Workqueue: sync_wq local_pci_probe_callback
[  242.946870][ T1645] Call trace:
[  242.949076][ T1645]  __switch_to+0xdc/0x108 (T)
[  242.951232][ T1645]  __schedule+0x2f4/0x690
[  242.953311][ T1645]  schedule+0x58/0x118
[  242.955374][ T1645]  schedule_timeout+0x88/0x128
[  242.957382][ T1645]  __wait_for_common+0xc4/0x1d0
[  242.959348][ T1645]  wait_for_completion_timeout+0x28/0x40
[  242.961269][ T1645]  wait_func+0x180/0x240 [mlx5_core]
[  242.963155][ T1645]  mlx5_cmd_invoke+0x290/0x430 [mlx5_core]
[  242.964980][ T1645]  cmd_exec+0x218/0x458 [mlx5_core]
[  242.966758][ T1645]  mlx5_cmd_do+0x38/0x80 [mlx5_core]
[  242.968559][ T1645]  mlx5_cmd_exec+0x2c/0x60 [mlx5_core]
[  242.970337][ T1645]  mlx5_core_create_mkey+0x70/0x120 [mlx5_core]
[  242.972172][ T1645]  mlx5_fw_tracer_create_mkey+0x114/0x180 [mlx5_core]
[  242.974070][ T1645]  mlx5_fw_tracer_init.part.0+0xb4/0x1f8 [mlx5_core]
[  242.975980][ T1645]  mlx5_fw_tracer_init+0x24/0x40 [mlx5_core]
[  242.977816][ T1645]  mlx5_load+0x7c/0x320 [mlx5_core]
[  242.979677][ T1645]  mlx5_init_one_devl_locked+0xd4/0x278 [mlx5_core]
[  242.981560][ T1645]  probe_one+0xe0/0x208 [mlx5_core]
[  242.983465][ T1645]  local_pci_probe+0x48/0xb8
[  242.985367][ T1645]  local_pci_probe_callback+0x24/0x40
[  242.987191][ T1645]  process_one_work+0x23c/0x768
[  242.989030][ T1645]  worker_thread+0x230/0x2e0
[  242.990877][ T1645]  kthread+0x128/0x138
[  242.992732][ T1645]  ret_from_fork+0x10/0x20
[  242.994563][ T1645]
[  242.994563][ T1645] Showing all locks held in the system:
[  242.998086][ T1645] 1 lock held by khungtaskd/1645:
[  242.999721][ T1645]  #0: ffffc4e789aa5160
(rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x18/0x200
[  243.003067][ T1645] 2 locks held by pr/ttyAMA-1/1664:
[  243.004790][ T1645] 5 locks held by kworker/240:1/1908:
[  243.006498][ T1645]  #0: ffff084012a6af48
((wq_completion)sync_wq){+.+.}-{0:0}, at: process_one_work+0x1c0/0x768
[  243.010130][ T1645]  #1: ffff80009bc83d90
((work_completion)(&arg.work)){+.+.}-{0:0}, at: process_one_work+0x1e8/0x768
[  243.014018][ T1645]  #2: ffff2820116b2250
(&devlink->lock_key#4){+.+.}-{4:4}, at: devl_lock+0x20/0x38
[  243.018112][ T1645]  #3: ffff2820116b2f10
(&dev->lock_key#4){+.+.}-{4:4}, at: mlx5_init_one_devl_locked+0x40/0x278
[mlx5_core]
[  243.022472][ T1645]  #4: ffff282018220528
(&tracer->state_lock){+.+.}-{4:4}, at:
mlx5_fw_tracer_init.part.0+0x3c/0x1f8 [mlx5_core]
[  243.027125][ T1645] 1 lock held by systemd-udevd/2258:
[  243.029415][ T1645]  #0: ffff0840147c21b8 (&dev->mutex){....}-{4:4},
at: __driver_attach+0x15c/0x2b0
[  243.034166][ T1645] 1 lock held by systemd-udevd/2268:
[  243.036590][ T1645]  #0: ffff0840147c21b8 (&dev->mutex){....}-{4:4},
at: __driver_attach+0x38/0x2b0
[  243.041493][ T1645] 1 lock held by systemd-udevd/2296:
[  243.043988][ T1645]  #0: ffff0840147c21b8 (&dev->mutex){....}-{4:4},
at: __driver_attach+0x38/0x2b0
[  243.049135][ T1645] 1 lock held by systemd-udevd/2571:
[  243.051747][ T1645]  #0: ffff2840060bbeb0
(&sb->s_type->i_mutex_key#13){++++}-{4:4}, at: blkdev_read_iter+0x7c/0x188
[  243.057193][ T1645] 3 locks held by kworker/u1280:2/2688:
[  243.059988][ T1645]  #0: ffff2820116d5948
((wq_completion)mlx5_health0000:97:00.1){+.+.}-{0:0}, at:
process_one_work+0x1c0/0x768
[  243.065740][ T1645]  #1: ffff8000abb73d90
((work_completion)(&health->report_work)){+.+.}-{0:0}, at:
process_one_work+0x1e8/0x768
[  243.071648][ T1645]  #2: ffff2820116b2250
(&devlink->lock_key#4){+.+.}-{4:4}, at: devl_lock+0x20/0x38
[  243.077622][ T1645] 2 locks held by kworker/u1280:3/2692:
[  243.080654][ T1645]  #0: ffff2820116d7548
((wq_completion)mlx5_fw_tracer#4){+.+.}-{0:0}, at:
process_one_work+0x1c0/0x768
[  243.086670][ T1645]  #1: ffff8000abb93d90
((work_completion)(&tracer->read_fw_strings_work)){+.+.}-{0:0}, at:
process_one_work+0x1e8/0x768
[  243.092720][ T1645]


> 
>>
>> [  242.676635][ T1644] INFO: task kworker/u1280:1:1671 blocked for
>> more
>> than 120 seconds.
>> [  242.682141][ T1644]       Not tainted 7.0.0-rc2+ #3
>> [  242.684942][ T1644] "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [  242.690552][ T1644] task:kworker/u1280:1 state:D stack:0    
>> pid:1671
>>  tgid:1671  ppid:2      task_flags:0x4208060 flags:0x00000010
>> [  242.696332][ T1644] Workqueue: mlx5_health0000:97:00.0
>> mlx5_fw_reporter_err_work [mlx5_core]
>> [  242.702324][ T1644] Call trace:
>> [  242.705187][ T1644]  __switch_to+0xdc/0x108 (T)
>> [  242.707936][ T1644]  __schedule+0x2a0/0x8a8
>> [  242.710647][ T1644]  schedule+0x3c/0xc0
>> [  242.713321][ T1644]  schedule_preempt_disabled+0x2c/0x50
>> [  242.715875][ T1644]  __mutex_lock.constprop.0+0x344/0x918
>> [  242.718421][ T1644]  __mutex_lock_slowpath+0x1c/0x30
>> [  242.720885][ T1644]  mutex_lock+0x50/0x68
>> [  242.723278][ T1644]  devl_lock+0x1c/0x30
>> [  242.725607][ T1644]  devlink_health_report+0x240/0x328
>> [  242.727902][ T1644]  mlx5_fw_reporter_err_work+0xa0/0xb0
>> [mlx5_core]
>> [  242.730333][ T1644]  process_one_work+0x180/0x4f8
>> [  242.732687][ T1644]  worker_thread+0x208/0x280
>> [  242.734976][ T1644]  kthread+0x128/0x138
>> [  242.737217][ T1644]  ret_from_fork+0x10/0x20
>> [  242.739599][ T1644] INFO: task kworker/u1280:1:1671 is blocked on
>> a
>> mutex likely owned by task kworker/240:2:2582.
>> [  242.744002][ T1644] task:kworker/240:2   state:D stack:0    
>> pid:2582
>>  tgid:2582  ppid:2      task_flags:0x4208060 flags:0x00000010
>> [  242.748447][ T1644] Workqueue: sync_wq local_pci_probe_callback
>> [  242.750654][ T1644] Call trace:
>> [  242.752793][ T1644]  __switch_to+0xdc/0x108 (T)
>> [  242.754882][ T1644]  __schedule+0x2a0/0x8a8
>> [  242.756946][ T1644]  schedule+0x3c/0xc0
>> [  242.758951][ T1644]  schedule_timeout+0x80/0x120
>> [  242.760903][ T1644]  __wait_for_common+0xc4/0x1d0
>> [  242.762796][ T1644]  wait_for_completion_timeout+0x28/0x40
>> [  242.764670][ T1644]  wait_func+0x180/0x240 [mlx5_core]
>> [  242.766533][ T1644]  mlx5_cmd_invoke+0x244/0x3e0 [mlx5_core]
>> [  242.768338][ T1644]  cmd_exec+0x208/0x448 [mlx5_core]
>> [  242.770153][ T1644]  mlx5_cmd_do+0x38/0x80 [mlx5_core]
>> [  242.771974][ T1644]  mlx5_cmd_exec+0x2c/0x60 [mlx5_core]
>> [  242.773848][ T1644]  mlx5_core_create_mkey+0x70/0x120 [mlx5_core]
>> [  242.775712][ T1644]  mlx5_fw_tracer_create_mkey+0x114/0x180
>> [mlx5_core]
>> [  242.777609][ T1644]  mlx5_fw_tracer_init.part.0+0xb0/0x1f0
>> [mlx5_core]
>> [  242.779495][ T1644]  mlx5_fw_tracer_init+0x24/0x40 [mlx5_core]
>> [  242.781380][ T1644]  mlx5_load+0x78/0x360 [mlx5_core]
>> [  242.783256][ T1644]  mlx5_init_one_devl_locked+0xd0/0x278
>> [mlx5_core]
>> [  242.785231][ T1644]  probe_one+0xe0/0x208 [mlx5_core]
>> [  242.787159][ T1644]  local_pci_probe+0x48/0xb8
>> [  242.789038][ T1644]  local_pci_probe_callback+0x24/0x40
>> [  242.790876][ T1644]  process_one_work+0x180/0x4f8
>> [  242.792731][ T1644]  worker_thread+0x208/0x280
>> [  242.794578][ T1644]  kthread+0x128/0x138
>> [  242.796427][ T1644]  ret_from_fork+0x10/0x20
>> [  242.798277][ T1644] INFO: task systemd-udevd:2281 blocked for more
>> than 120 seconds.
>> [  242.801795][ T1644]       Not tainted 7.0.0-rc2+ #3
>> [  242.803542][ T1644] "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [  242.807056][ T1644] task:systemd-udevd   state:D stack:0    
>> pid:2281
>>  tgid:2281  ppid:2256   task_flags:0x400140 flags:0x00000811
>> [  242.810829][ T1644] Call trace:
>> [  242.812779][ T1644]  __switch_to+0xdc/0x108 (T)
>> [  242.814681][ T1644]  __schedule+0x2a0/0x8a8
>> [  242.816609][ T1644]  schedule+0x3c/0xc0
>> [  242.818499][ T1644]  schedule_timeout+0x10c/0x120
>> [  242.820388][ T1644]  __wait_for_common+0xc4/0x1d0
>> [  242.822267][ T1644]  wait_for_completion+0x28/0x40
>> [  242.824168][ T1644]  __flush_work+0x7c/0xf8
>> [  242.825983][ T1644]  flush_work+0x1c/0x30
>> [  242.827816][ T1644]  pci_call_probe+0x174/0x1e0
>> [  242.829652][ T1644]  pci_device_probe+0x98/0x108
>> [  242.831455][ T1644]  call_driver_probe+0x34/0x158
>> [  242.833261][ T1644]  really_probe+0xc0/0x320
>> [  242.835082][ T1644]  __driver_probe_device+0x88/0x190
>> [  242.836843][ T1644]  driver_probe_device+0x48/0x120
>> [  242.838607][ T1644]  __driver_attach+0x138/0x280
>> [  242.840355][ T1644]  bus_for_each_dev+0x80/0xe8
>> [  242.842095][ T1644]  driver_attach+0x2c/0x40
>> [  242.843830][ T1644]  bus_add_driver+0x128/0x258
>> [  242.845564][ T1644]  driver_register+0x68/0x138
>> [  242.847285][ T1644]  __pci_register_driver+0x4c/0x60
>> [  242.849038][ T1644]  mlx5_init+0x7c/0xff8 [mlx5_core]
>> [  242.850871][ T1644]  do_one_initcall+0x50/0x498
>> [  242.852561][ T1644]  do_init_module+0x60/0x280
>> [  242.854220][ T1644]  load_module+0x3d8/0x6a8
>> [  242.855856][ T1644]  init_module_from_file+0xe4/0x108
>> [  242.857470][ T1644]  idempotent_init_module+0x190/0x290
>> [  242.859022][ T1644]  __arm64_sys_finit_module+0x74/0xf8
>> [  242.860544][ T1644]  invoke_syscall+0x50/0x120
>> [  242.861996][ T1644]  el0_svc_common.constprop.0+0xc8/0xf0
>> [  242.863457][ T1644]  do_el0_svc+0x24/0x38
>> [  242.864914][ T1644]  el0_svc+0x34/0x170
>> [  242.866361][ T1644]  el0t_64_sync_handler+0xa0/0xe8
>> [  242.867839][ T1644]  el0t_64_sync+0x190/0x198
>>
>>
>>> Fix these by moving the netdev_trylock calls from the work handlers
>>> lower in the call stack, in the respective recovery functions,
>>> where
>>> they are actually necessary.
>>>
>>> Fixes: 8f7b00307bf1 ("net/mlx5e: Convert mlx5 netdevs to instance
>>> locking")
>>> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
>>> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
>>> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
>>> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
>>> ---
>>>  .../net/ethernet/mellanox/mlx5/core/en/ptp.c  | 14 -----
>>>  .../mellanox/mlx5/core/en/reporter_rx.c       | 13 +++++
>>>  .../mellanox/mlx5/core/en/reporter_tx.c       | 52 +++++++++++++++++--
>>>  .../net/ethernet/mellanox/mlx5/core/en_main.c | 40 --------------
>>>  4 files changed, 61 insertions(+), 58 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>>> index 424f8a2728a3..74660e7fe674 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>>> @@ -457,22 +457,8 @@ static void mlx5e_ptpsq_unhealthy_work(struct work_struct *work)
>>>  {
>>>  	struct mlx5e_ptpsq *ptpsq =
>>>  		container_of(work, struct mlx5e_ptpsq, report_unhealthy_work);
>>> -	struct mlx5e_txqsq *sq = &ptpsq->txqsq;
>>> -
>>> -	/* Recovering the PTP SQ means re-enabling NAPI, which requires the
>>> -	 * netdev instance lock. However, SQ closing has to wait for this work
>>> -	 * task to finish while also holding the same lock. So either get the
>>> -	 * lock or find that the SQ is no longer enabled and thus this work is
>>> -	 * not relevant anymore.
>>> -	 */
>>> -	while (!netdev_trylock(sq->netdev)) {
>>> -		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
>>> -			return;
>>> -		msleep(20);
>>> -	}
>>>  
>>>  	mlx5e_reporter_tx_ptpsq_unhealthy(ptpsq);
>>> -	netdev_unlock(sq->netdev);
>>>  }
>>>  
>>>  static int mlx5e_ptp_open_txqsq(struct mlx5e_ptp *c, u32 tisn,
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
>>> index 0686fbdd5a05..6efb626b5506 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c
>>> @@ -1,6 +1,8 @@
>>>  // SPDX-License-Identifier: GPL-2.0
>>>  // Copyright (c) 2019 Mellanox Technologies.
>>>  
>>> +#include <net/netdev_lock.h>
>>> +
>>>  #include "health.h"
>>>  #include "params.h"
>>>  #include "txrx.h"
>>> @@ -177,6 +179,16 @@ static int mlx5e_rx_reporter_timeout_recover(void *ctx)
>>>  	rq = ctx;
>>>  	priv = rq->priv;
>>>  
>>> +	/* Acquire netdev instance lock to synchronize with channel close and
>>> +	 * reopen flows. Either successfully obtain the lock, or detect that
>>> +	 * channels are closing for another reason, making this work no longer
>>> +	 * necessary.
>>> +	 */
>>> +	while (!netdev_trylock(rq->netdev)) {
>>> +		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &rq->priv->state))
>>> +			return 0;
>>> +		msleep(20);
>>> +	}
>>>  	mutex_lock(&priv->state_lock);
>>>  
>>>  	eq = rq->cq.mcq.eq;
>>> @@ -186,6 +198,7 @@ static int mlx5e_rx_reporter_timeout_recover(void *ctx)
>>>  		clear_bit(MLX5E_SQ_STATE_ENABLED, &rq->icosq->state);
>>>  
>>>  	mutex_unlock(&priv->state_lock);
>>> +	netdev_unlock(rq->netdev);
>>>  
>>>  	return err;
>>>  }
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
>>> index 4adc1adf9897..60ba840e00fa 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
>>> @@ -1,6 +1,8 @@
>>>  /* SPDX-License-Identifier: GPL-2.0 */
>>>  /* Copyright (c) 2019 Mellanox Technologies. */
>>>  
>>> +#include <net/netdev_lock.h>
>>> +
>>>  #include "health.h"
>>>  #include "en/ptp.h"
>>>  #include "en/devlink.h"
>>> @@ -79,6 +81,18 @@ static int mlx5e_tx_reporter_err_cqe_recover(void *ctx)
>>>  	if (!test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
>>>  		return 0;
>>>  
>>> +	/* Recovering queues means re-enabling NAPI, which requires the netdev
>>> +	 * instance lock. However, SQ closing flows have to wait for work tasks
>>> +	 * to finish while also holding the netdev instance lock. So either get
>>> +	 * the lock or find that the SQ is no longer enabled and thus this work
>>> +	 * is not relevant anymore.
>>> +	 */
>>> +	while (!netdev_trylock(dev)) {
>>> +		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
>>> +			return 0;
>>> +		msleep(20);
>>> +	}
>>> +
>>>  	err = mlx5_core_query_sq_state(mdev, sq->sqn, &state);
>>>  	if (err) {
>>>  		netdev_err(dev, "Failed to query SQ 0x%x state. err = %d\n",
>>> @@ -114,9 +128,11 @@ static int mlx5e_tx_reporter_err_cqe_recover(void *ctx)
>>>  	else
>>>  		mlx5e_trigger_napi_sched(sq->cq.napi);
>>>  
>>> +	netdev_unlock(dev);
>>>  	return 0;
>>>  out:
>>>  	clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state);
>>> +	netdev_unlock(dev);
>>>  	return err;
>>>  }
>>>  
>>> @@ -137,10 +153,24 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
>>>  	sq = to_ctx->sq;
>>>  	eq = sq->cq.mcq.eq;
>>>  	priv = sq->priv;
>>> +
>>> +	/* Recovering the TX queues implies re-enabling NAPI, which requires
>>> +	 * the netdev instance lock.
>>> +	 * However, channel closing flows have to wait for this work to finish
>>> +	 * while holding the same lock. So either get the lock or find that
>>> +	 * channels are being closed for other reason and this work is not
>>> +	 * relevant anymore.
>>> +	 */
>>> +	while (!netdev_trylock(sq->netdev)) {
>>> +		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv->state))
>>> +			return 0;
>>> +		msleep(20);
>>> +	}
>>> +
>>>  	err = mlx5e_health_channel_eq_recover(sq->netdev, eq, sq->cq.ch_stats);
>>>  	if (!err) {
>>>  		to_ctx->status = 0; /* this sq recovered */
>>> -		return err;
>>> +		goto out;
>>>  	}
>>>  
>>>  	mutex_lock(&priv->state_lock);
>>> @@ -148,7 +178,7 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
>>>  	mutex_unlock(&priv->state_lock);
>>>  	if (!err) {
>>>  		to_ctx->status = 1; /* all channels recovered */
>>> -		return err;
>>> +		goto out;
>>>  	}
>>>  
>>>  	to_ctx->status = err;
>>> @@ -156,7 +186,8 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx)
>>>  	netdev_err(priv->netdev,
>>>  		   "mlx5e_safe_reopen_channels failed recovering from a tx_timeout, err(%d).\n",
>>>  		   err);
>>> -
>>> +out:
>>> +	netdev_unlock(sq->netdev);
>>>  	return err;
>>>  }
>>>  
>>> @@ -173,10 +204,22 @@ static int mlx5e_tx_reporter_ptpsq_unhealthy_recover(void *ctx)
>>>  		return 0;
>>>  
>>>  	priv = ptpsq->txqsq.priv;
>>> +	netdev = priv->netdev;
>>> +
>>> +	/* Recovering the PTP SQ means re-enabling NAPI, which requires the
>>> +	 * netdev instance lock. However, SQ closing has to wait for this work
>>> +	 * task to finish while also holding the same lock. So either get the
>>> +	 * lock or find that the SQ is no longer enabled and thus this work is
>>> +	 * not relevant anymore.
>>> +	 */
>>> +	while (!netdev_trylock(netdev)) {
>>> +		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &ptpsq->txqsq.state))
>>> +			return 0;
>>> +		msleep(20);
>>> +	}
>>>  
>>>  	mutex_lock(&priv->state_lock);
>>>  	chs = &priv->channels;
>>> -	netdev = priv->netdev;
>>>  
>>>  	carrier_ok = netif_carrier_ok(netdev);
>>>  	netif_carrier_off(netdev);
>>> @@ -193,6 +236,7 @@ static int mlx5e_tx_reporter_ptpsq_unhealthy_recover(void *ctx)
>>>  		netif_carrier_on(netdev);
>>>  
>>>  	mutex_unlock(&priv->state_lock);
>>> +	netdev_unlock(netdev);
>>>  
>>>  	return err;
>>>  }
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> index 4b8084420816..73f4805feac7 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> @@ -631,19 +631,7 @@ static void mlx5e_rq_timeout_work(struct
>>> work_struct *timeout_work)
>>>  					   struct mlx5e_rq,
>>>  					   rx_timeout_work);
>>>  
>>> -	/* Acquire netdev instance lock to synchronize with
>>> channel close and
>>> -	 * reopen flows. Either successfully obtain the lock, or
>>> detect that
>>> -	 * channels are closing for another reason, making this
>>> work no longer
>>> -	 * necessary.
>>> -	 */
>>> -	while (!netdev_trylock(rq->netdev)) {
>>> -		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &rq-
>>>> priv->state))
>>> -			return;
>>> -		msleep(20);
>>> -	}
>>> -
>>>  	mlx5e_reporter_rx_timeout(rq);
>>> -	netdev_unlock(rq->netdev);
>>>  }
>>>  
>>>  static int mlx5e_alloc_mpwqe_rq_drop_page(struct mlx5e_rq *rq)
>>> @@ -1952,20 +1940,7 @@ void mlx5e_tx_err_cqe_work(struct
>>> work_struct *recover_work)
>>>  	struct mlx5e_txqsq *sq = container_of(recover_work, struct
>>> mlx5e_txqsq,
>>>  					      recover_work);
>>>  
>>> -	/* Recovering queues means re-enabling NAPI, which
>>> requires the netdev
>>> -	 * instance lock. However, SQ closing flows have to wait
>>> for work tasks
>>> -	 * to finish while also holding the netdev instance lock.
>>> So either get
>>> -	 * the lock or find that the SQ is no longer enabled and
>>> thus this work
>>> -	 * is not relevant anymore.
>>> -	 */
>>> -	while (!netdev_trylock(sq->netdev)) {
>>> -		if (!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state))
>>> -			return;
>>> -		msleep(20);
>>> -	}
>>> -
>>>  	mlx5e_reporter_tx_err_cqe(sq);
>>> -	netdev_unlock(sq->netdev);
>>>  }
>>>  
>>>  static struct dim_cq_moder mlx5e_get_def_tx_moderation(u8
>>> cq_period_mode)
>>> @@ -5105,19 +5080,6 @@ static void mlx5e_tx_timeout_work(struct
>>> work_struct *work)
>>>  	struct net_device *netdev = priv->netdev;
>>>  	int i;
>>>  
>>> -	/* Recovering the TX queues implies re-enabling NAPI,
>>> which requires
>>> -	 * the netdev instance lock.
>>> -	 * However, channel closing flows have to wait for this
>>> work to finish
>>> -	 * while holding the same lock. So either get the lock or
>>> find that
>>> -	 * channels are being closed for other reason and this
>>> work is not
>>> -	 * relevant anymore.
>>> -	 */
>>> -	while (!netdev_trylock(netdev)) {
>>> -		if (!test_bit(MLX5E_STATE_CHANNELS_ACTIVE, &priv-
>>>> state))
>>> -			return;
>>> -		msleep(20);
>>> -	}
>>> -
>>>  	for (i = 0; i < netdev->real_num_tx_queues; i++) {
>>>  		struct netdev_queue *dev_queue =
>>>  			netdev_get_tx_queue(netdev, i);
>>> @@ -5130,8 +5092,6 @@ static void mlx5e_tx_timeout_work(struct
>>> work_struct *work)
>>>  		/* break if tried to reopened channels */
>>>  			break;
>>>  	}
>>> -
>>> -	netdev_unlock(netdev);
>>>  }
>>>  
>>>  static void mlx5e_tx_timeout(struct net_device *dev, unsigned int
>>> txqueue)
>>
> 


end of thread, other threads:[~2026-03-06  1:54 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-02-18  7:28 [PATCH net V2 0/6] mlx5 misc fixes 2026-02-18 Tariq Toukan
2026-02-18  7:28 ` [PATCH net V2 1/6] net/mlx5: Fix multiport device check over light SFs Tariq Toukan
2026-02-18  7:29 ` [PATCH net V2 2/6] net/mlx5e: Fix misidentification of ASO CQE during poll loop Tariq Toukan
2026-02-18  7:29 ` [PATCH net V2 3/6] net/mlx5: Fix misidentification of write combining " Tariq Toukan
2026-02-18  7:29 ` [PATCH net V2 4/6] net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event Tariq Toukan
2026-02-18  7:29 ` [PATCH net V2 5/6] net/mlx5e: Fix deadlocks between devlink and netdev instance locks Tariq Toukan
2026-03-05  2:33   ` Jinjie Ruan
2026-03-05 12:19     ` Cosmin Ratiu
2026-03-06  1:54       ` Jinjie Ruan
2026-02-18  7:29 ` [PATCH net V2 6/6] net/mlx5e: Use unsigned for mlx5e_get_max_num_channels Tariq Toukan
2026-02-18 23:49 ` [PATCH net V2 0/6] mlx5 misc fixes 2026-02-18 Keller, Jacob E
2026-02-19 17:40 ` patchwork-bot+netdevbpf
