* [PATCH v8 1/2] IB/mlx5: Fix success return path and mutex initialization
2026-04-05 21:57 [PATCH v8 0/2] Fix loopback leaks and return paths Prathamesh Deshpande
@ 2026-04-05 21:57 ` Prathamesh Deshpande
2026-04-05 21:57 ` [PATCH v8 2/2] IB/mlx5: Fix loopback refcounting leaks and premature disable Prathamesh Deshpande
From: Prathamesh Deshpande @ 2026-04-05 21:57 UTC
To: linux-rdma
Cc: prathameshdeshpande7, dledford, haggaie, jgg, leon, linux-kernel
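Fix an incorrect return path in mlx5_ib_alloc_transport_domain(). When
loopback does not need to be enabled (the port is not Ethernet, or
neither disable_local_lb_uc nor disable_local_lb_mc is supported), the
success path returned the err variable, which could hold an
uninitialized value, instead of an explicit 0.

Additionally, move the dev->lb.mutex initialization from
mlx5_ib_stage_caps_init() to mlx5_ib_stage_init_init(). Previously the
mutex was initialized only on Ethernet ports that support
disable_local_lb_uc or disable_local_lb_mc, so other devices could
access it uninitialized, for example from create_raw_packet_qp_tir().
Initializing it unconditionally guarantees the mutex is valid before
any such access, preventing a null pointer dereference.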
Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com>
---
v8:
- Resubmitted as a fresh, independent thread per maintainer request.
- No functional changes since v7.
v7:
- Split from the main loopback refactor into a standalone patch to
improve bisection and isolate the return-value fix.
v1-v6:
- Part of the combined "IB/mlx5: Fix loopback enablement state and
resource leaks" patch.
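The effect of moving the initialization can be shown with a small userspace model. This is a hypothetical sketch, not driver code. The types fake_mutex and fake_dev, and every fake_* helper, are invented stand-ins. fake_mutex_lock() returns -1 at the point where the kernel would operate on an uninitialized mutex. The helpers mirror three pieces of the patch: the conditional block removed from mlx5_ib_stage_caps_init(), the unconditional init added to mlx5_ib_stage_init_init(), and the patched tail of mlx5_ib_alloc_transport_domain().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mutex stand-in that records whether it was initialized. */
struct fake_mutex {
	bool initialized;
	bool locked;
};

static void fake_mutex_init(struct fake_mutex *m)
{
	m->initialized = true;
	m->locked = false;
}

/* Reports misuse instead of crashing when the mutex was never initialized. */
static int fake_mutex_lock(struct fake_mutex *m)
{
	if (!m->initialized)
		return -1;
	m->locked = true;
	return 0;
}

static void fake_mutex_unlock(struct fake_mutex *m)
{
	m->locked = false;
}

struct fake_dev {
	bool port_is_eth;    /* port_type == MLX5_CAP_PORT_TYPE_ETH */
	bool cap_disable_lb; /* disable_local_lb_uc || disable_local_lb_mc */
	struct {
		struct fake_mutex mutex;
		int user_td;
		int qps;
	} lb;
};

/* Pre-patch placement: the conditional block removed from stage_caps_init. */
static void old_caps_init(struct fake_dev *dev)
{
	if (dev->port_is_eth && dev->cap_disable_lb)
		fake_mutex_init(&dev->lb.mutex);
}

/* Post-patch placement: unconditional, as in mlx5_ib_stage_init_init(). */
static void new_init_init(struct fake_dev *dev)
{
	fake_mutex_init(&dev->lb.mutex);
}

/* Stand-in for mlx5_ib_enable_lb(); counting only, no hardware access. */
static int fake_enable_lb(struct fake_dev *dev, bool td, bool qp)
{
	if (fake_mutex_lock(&dev->lb.mutex))
		return -1; /* the kernel would be locking an uninitialized mutex */
	if (td)
		dev->lb.user_td++;
	if (qp)
		dev->lb.qps++;
	fake_mutex_unlock(&dev->lb.mutex);
	return 0;
}

/* Tail of the patched mlx5_ib_alloc_transport_domain(). */
static int fake_alloc_td(struct fake_dev *dev)
{
	if (!dev->port_is_eth || !dev->cap_disable_lb)
		return 0; /* success: loopback handling not needed */
	return fake_enable_lb(dev, true, false);
}
```

Take an Ethernet device that lacks both disable_local_lb capabilities. Here old_caps_init() leaves the mutex uninitialized, so a later fake_enable_lb(dev, false, true) call, standing in for the raw packet QP path, returns -1. After new_init_init(), the same call succeeds. fake_alloc_td() returns 0 without touching the loopback counters.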
drivers/infiniband/hw/mlx5/main.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index b74bf2697655..f49f746bc5bd 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2068,7 +2068,7 @@ static int mlx5_ib_alloc_transport_domain(struct mlx5_ib_dev *dev, u32 *tdn,
 	if ((MLX5_CAP_GEN(dev->mdev, port_type) != MLX5_CAP_PORT_TYPE_ETH) ||
 	    (!MLX5_CAP_GEN(dev->mdev, disable_local_lb_uc) &&
 	     !MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc)))
-		return err;
+		return 0;

 	return mlx5_ib_enable_lb(dev, true, false);
 }
@@ -4515,6 +4515,7 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
 	mutex_init(&dev->data_direct_lock);
 	INIT_LIST_HEAD(&dev->qp_list);
 	spin_lock_init(&dev->reset_flow_resource_lock);
+	mutex_init(&dev->lb.mutex);
 	xa_init(&dev->odp_mkeys);
 	xa_init(&dev->sig_mrs);
 	atomic_set(&dev->mkey_var, 0);
@@ -4786,11 +4787,6 @@ static int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
 	if (err)
 		return err;

-	if ((MLX5_CAP_GEN(dev->mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH) &&
-	    (MLX5_CAP_GEN(dev->mdev, disable_local_lb_uc) ||
-	     MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc)))
-		mutex_init(&dev->lb.mutex);
-
 	if (MLX5_CAP_GEN_64(dev->mdev, general_obj_types) &
 	    MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q) {
 		err = mlx5_ib_init_var_region(dev);
--
2.43.0
* [PATCH v8 2/2] IB/mlx5: Fix loopback refcounting leaks and premature disable
From: Prathamesh Deshpande @ 2026-04-05 21:57 UTC
To: linux-rdma
Cc: prathameshdeshpande7, dledford, haggaie, jgg, leon, linux-kernel
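Update mlx5_ib_enable_lb() and mlx5_ib_disable_lb() so that the user_td
and qps counters are always updated symmetrically.

Both functions currently return before touching the counters when
force_enable is set. If force_enable changes between a matching enable
and disable call (for example during dynamic unbinding), the counters
become unbalanced and can leak or underflow. The original hardware
deactivation condition (user_td == 1 && qps == 0) has two further
problems. It never disables loopback when user_td is 0, for example on
systems without transport domains. It can also deactivate loopback
prematurely in multi-user scenarios.

Fix these by:
- Updating the counters before checking the force_enable gate.
- Enabling hardware loopback whenever user_td or qps is at least 1, and
  disabling it only when both reach zero.
- Rolling back the counter updates if the hardware enable command fails.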
Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com>
---
v8:
- Resubmitted as a fresh, independent thread per maintainer request.
- No functional changes since v7.
v7:
- Split into a separate patch for better bisection.
- Moved force_enable check after increments/decrements to fix leaks [Sashiko].
- Updated hardware disable condition to a strict zero-check.
v6:
- Always update software counters regardless of force_enable to prevent
underflows during dynamic unbinding [Sashiko].
- Updated disable condition to user_td <= 1 to prevent HW state leaks
on systems without transport domains [Sashiko].
- Rebased on rdma/for-next to resolve baseline application failures.
v5:
- Moved mutex_init to stage_init_init to prevent crashes on non-ETH hardware.
- Implemented 'goto unlock' for concurrency safety in enable/disable paths.
- Added atomic rollback and fixed tdn leak.
v4:
- Moved rollback logic into mlx5_ib_enable_lb() to ensure atomicity
within the mutex and prevent race conditions [Sashiko].
v3:
- Also call mlx5_ib_disable_lb() on failure to roll back software state/counters
[Sashiko].
v2:
- Added deallocation of tdn if mlx5_ib_enable_lb() fails [Sashiko].
- Reworded commit message to reflect the functional fix and credit the tool.
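The counting rules in the commit message can be exercised outside the kernel. This is a minimal userspace sketch that uses a pthread mutex in place of dev->lb.mutex. struct lb_model and fake_update_local_lb() are hypothetical stand-ins. The hw_fail, hw_enables and hw_disables fields exist only to drive and observe the model. The bodies of model_enable_lb() and model_disable_lb() follow the patched mlx5_ib_enable_lb() and mlx5_ib_disable_lb().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of dev->lb; not driver code. */
struct lb_model {
	pthread_mutex_t mutex;
	int user_td;
	int qps;
	bool enabled;      /* software view of the hardware loopback state */
	bool force_enable; /* when set, hardware state is left alone */
	bool hw_fail;      /* demo knob: make hardware "enable" fail */
	int hw_enables;    /* successful hardware enable commands */
	int hw_disables;   /* hardware disable commands issued */
};

/* Stand-in for mlx5_nic_vport_update_local_lb(); -5 mimics -EIO. */
static int fake_update_local_lb(struct lb_model *lb, bool enable)
{
	if (enable) {
		if (lb->hw_fail)
			return -5;
		lb->hw_enables++;
	} else {
		lb->hw_disables++;
	}
	return 0;
}

static int model_enable_lb(struct lb_model *lb, bool td, bool qp)
{
	int err = 0;

	pthread_mutex_lock(&lb->mutex);

	/* Always count the user, even while force_enable is set. */
	if (td)
		lb->user_td++;
	if (qp)
		lb->qps++;

	if (lb->force_enable)
		goto unlock;

	if (!lb->enabled && (lb->user_td >= 1 || lb->qps >= 1)) {
		err = fake_update_local_lb(lb, true);
		if (err) {
			/* Roll back so a failed enable leaves no trace. */
			if (td)
				lb->user_td--;
			if (qp)
				lb->qps--;
		} else {
			lb->enabled = true;
		}
	}

unlock:
	pthread_mutex_unlock(&lb->mutex);
	return err;
}

static void model_disable_lb(struct lb_model *lb, bool td, bool qp)
{
	pthread_mutex_lock(&lb->mutex);

	if (td)
		lb->user_td--;
	if (qp)
		lb->qps--;

	if (lb->force_enable)
		goto unlock;

	/* Turn hardware off only once no user of either kind remains. */
	if (lb->enabled && lb->user_td == 0 && lb->qps == 0) {
		fake_update_local_lb(lb, false);
		lb->enabled = false;
	}

unlock:
	pthread_mutex_unlock(&lb->mutex);
}
```

In this model, set force_enable before an enable call and clear it before the matching disable call; qps ends at 0. Under the pre-patch ordering, the enable call returned before incrementing while the disable call still decremented, which would have left qps at -1.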
drivers/infiniband/hw/mlx5/main.c | 37 ++++++++++++++++++-------------
1 file changed, 21 insertions(+), 16 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index f49f746bc5bd..fde72ebe721a 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2009,23 +2009,29 @@ int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev, bool td, bool qp)
 {
 	int err = 0;

-	if (dev->lb.force_enable)
-		return 0;
-
 	mutex_lock(&dev->lb.mutex);
+
 	if (td)
 		dev->lb.user_td++;
 	if (qp)
 		dev->lb.qps++;

-	if (dev->lb.user_td == 2 ||
-	    dev->lb.qps == 1) {
-		if (!dev->lb.enabled) {
-			err = mlx5_nic_vport_update_local_lb(dev->mdev, true);
+	if (dev->lb.force_enable)
+		goto unlock;
+
+	if (!dev->lb.enabled && (dev->lb.user_td >= 1 || dev->lb.qps >= 1)) {
+		err = mlx5_nic_vport_update_local_lb(dev->mdev, true);
+		if (err) {
+			if (td)
+				dev->lb.user_td--;
+			if (qp)
+				dev->lb.qps--;
+		} else {
 			dev->lb.enabled = true;
 		}
 	}

+unlock:
 	mutex_unlock(&dev->lb.mutex);

 	return err;
@@ -2033,23 +2039,22 @@ int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev, bool td, bool qp)

 void mlx5_ib_disable_lb(struct mlx5_ib_dev *dev, bool td, bool qp)
 {
-	if (dev->lb.force_enable)
-		return;
-
 	mutex_lock(&dev->lb.mutex);
+
 	if (td)
 		dev->lb.user_td--;
 	if (qp)
 		dev->lb.qps--;

-	if (dev->lb.user_td == 1 &&
-	    dev->lb.qps == 0) {
-		if (dev->lb.enabled) {
-			mlx5_nic_vport_update_local_lb(dev->mdev, false);
-			dev->lb.enabled = false;
-		}
+	if (dev->lb.force_enable)
+		goto unlock;
+
+	if (dev->lb.enabled && (dev->lb.user_td == 0 && dev->lb.qps == 0)) {
+		mlx5_nic_vport_update_local_lb(dev->mdev, false);
+		dev->lb.enabled = false;
 	}

+unlock:
 	mutex_unlock(&dev->lb.mutex);
 }

--
2.43.0