public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
From: Prathamesh Deshpande <prathameshdeshpande7@gmail.com>
To: prathameshdeshpande7@gmail.com
Cc: dledford@redhat.com, haggaie@mellanox.com, jgg@ziepe.ca,
	leon@kernel.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org
Subject: [PATCH v6] IB/mlx5: Fix loopback enablement state and resource leaks
Date: Sun,  5 Apr 2026 00:07:58 +0100	[thread overview]
Message-ID: <20260404230759.15131-1-prathameshdeshpande7@gmail.com> (raw)
In-Reply-To: <20260404215155.13254-1-prathameshdeshpande7@gmail.com>

In mlx5_ib_alloc_transport_domain(), an early success path was
incorrectly returning a variable instead of a literal 0. Additionally,
the loopback (LB) infrastructure required updates to address
initialization and concurrency issues:

1. Initialization: The LB mutex was initialized conditionally, but
   paths like create_raw_packet_qp_tir() in qp.c call LB functions
   regardless of those conditions. Move initialization to
   stage_init_init() to ensure the lock is always valid.
2. State Integrity: Update mlx5_ib_enable_lb() to roll back reference
   counters if the hardware command fails, ensuring software state
   matches hardware.
3. Concurrency: Move 'force_enable' checks inside the mutex in both
   enable/disable paths to prevent potential counter underflows.
4. Resource Leak: Deallocate the transport domain if LB enablement
   fails in mlx5_ib_alloc_transport_domain().

Signed-off-by: Prathamesh Deshpande <prathameshdeshpande7@gmail.com>
---
v6:
- Always update software counters regardless of force_enable to prevent 
  underflows during dynamic unbinding [Sashiko].
- Updated disable condition to user_td <= 1 to prevent HW state leaks 
  on systems without transport domains [Sashiko].
- Rebased on rdma/for-next to resolve baseline application failures.
v5:
- Moved mutex_init to stage_init_init to prevent crashes on non-ETH hardware.
- Implemented 'goto unlock' for concurrency safety in enable/disable paths.
- Added atomic rollback and fixed tdn leak.
v4:
- Moved rollback logic into mlx5_ib_enable_lb() to ensure atomicity
  within the mutex and prevent race conditions [Sashiko].
v3:
- Also call mlx5_ib_disable_lb() on failure to roll back software state/counters
  [Sashiko].
v2:
- Added deallocation of tdn if mlx5_ib_enable_lb() fails [Sashiko].
- Reworded commit message to reflect the functional fix and credit the tool.

 drivers/infiniband/hw/mlx5/main.c | 39 +++++++++++++++++++------------
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index b74bf2697655..c2fafa00b60c 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2009,23 +2009,31 @@ int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev, bool td, bool qp)
 {
 	int err = 0;
 
-	if (dev->lb.force_enable)
-		return 0;
-
 	mutex_lock(&dev->lb.mutex);
 	if (td)
 		dev->lb.user_td++;
 	if (qp)
 		dev->lb.qps++;
 
-	if (dev->lb.user_td == 2 ||
+	if (dev->lb.force_enable)
+		goto unlock;
+
+	if (dev->lb.user_td == 1 ||
 	    dev->lb.qps == 1) {
 		if (!dev->lb.enabled) {
 			err = mlx5_nic_vport_update_local_lb(dev->mdev, true);
-			dev->lb.enabled = true;
+			if (err) {
+				if (td)
+					dev->lb.user_td--;
+				if (qp)
+					dev->lb.qps--;
+			} else {
+				dev->lb.enabled = true;
+			}
 		}
 	}
 
+unlock:
 	mutex_unlock(&dev->lb.mutex);
 
 	return err;
@@ -2033,16 +2041,16 @@ int mlx5_ib_enable_lb(struct mlx5_ib_dev *dev, bool td, bool qp)
 
 void mlx5_ib_disable_lb(struct mlx5_ib_dev *dev, bool td, bool qp)
 {
+	mutex_lock(&dev->lb.mutex);
 	if (dev->lb.force_enable)
-		return;
+		goto unlock;
 
-	mutex_lock(&dev->lb.mutex);
 	if (td)
 		dev->lb.user_td--;
 	if (qp)
 		dev->lb.qps--;
 
-	if (dev->lb.user_td == 1 &&
+	if (dev->lb.user_td <= 1 &&
 	    dev->lb.qps == 0) {
 		if (dev->lb.enabled) {
 			mlx5_nic_vport_update_local_lb(dev->mdev, false);
@@ -2050,6 +2058,7 @@ void mlx5_ib_disable_lb(struct mlx5_ib_dev *dev, bool td, bool qp)
 		}
 	}
 
+unlock:
 	mutex_unlock(&dev->lb.mutex);
 }
 
@@ -2068,9 +2077,13 @@ static int mlx5_ib_alloc_transport_domain(struct mlx5_ib_dev *dev, u32 *tdn,
 	if ((MLX5_CAP_GEN(dev->mdev, port_type) != MLX5_CAP_PORT_TYPE_ETH) ||
 	    (!MLX5_CAP_GEN(dev->mdev, disable_local_lb_uc) &&
 	     !MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc)))
-		return err;
+		return 0;
+
+	err = mlx5_ib_enable_lb(dev, true, false);
+	if (err)
+		mlx5_cmd_dealloc_transport_domain(dev->mdev, *tdn, uid);
 
-	return mlx5_ib_enable_lb(dev, true, false);
+	return err;
 }
 
 static void mlx5_ib_dealloc_transport_domain(struct mlx5_ib_dev *dev, u32 tdn,
@@ -4515,6 +4528,7 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
 	mutex_init(&dev->data_direct_lock);
 	INIT_LIST_HEAD(&dev->qp_list);
 	spin_lock_init(&dev->reset_flow_resource_lock);
+	mutex_init(&dev->lb.mutex);
 	xa_init(&dev->odp_mkeys);
 	xa_init(&dev->sig_mrs);
 	atomic_set(&dev->mkey_var, 0);
@@ -4786,11 +4800,6 @@ static int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
 	if (err)
 		return err;
 
-	if ((MLX5_CAP_GEN(dev->mdev, port_type) == MLX5_CAP_PORT_TYPE_ETH) &&
-	    (MLX5_CAP_GEN(dev->mdev, disable_local_lb_uc) ||
-	     MLX5_CAP_GEN(dev->mdev, disable_local_lb_mc)))
-		mutex_init(&dev->lb.mutex);
-
 	if (MLX5_CAP_GEN_64(dev->mdev, general_obj_types) &
 			MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q) {
 		err = mlx5_ib_init_var_region(dev);
-- 
2.43.0


  reply	other threads:[~2026-04-04 23:08 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-31  1:28 [PATCH] IB/mlx5: Clarify success return path in mlx5_ib_alloc_transport_domain Prathamesh Deshpande
2026-03-31 13:48 ` Leon Romanovsky
2026-03-31 23:04   ` [PATCH v2] IB/mlx5: Fix tdn leak " Prathamesh Deshpande
2026-04-01 22:35     ` [PATCH v3] IB/mlx5: Fix tdn leak and state corruption " Prathamesh Deshpande
2026-04-01 23:52       ` [PATCH v4] IB/mlx5: Fix state corruption and resource leaks in loopback enablement Prathamesh Deshpande
2026-04-04 21:51         ` [PATCH v5] IB/mlx5: Fix loopback enablement state and resource leaks Prathamesh Deshpande
2026-04-04 23:07           ` Prathamesh Deshpande [this message]
2026-04-05 13:09             ` [PATCH v7 0/2] Fix loopback leaks and return paths Prathamesh Deshpande
2026-04-05 13:09               ` [PATCH v7 1/2] IB/mlx5: Fix success return path and mutex initialization Prathamesh Deshpande
2026-04-05 13:09               ` [PATCH v7 2/2] IB/mlx5: Fix loopback refcounting leaks and premature disable Prathamesh Deshpande
2026-04-05 19:22               ` [PATCH v7 0/2] Fix loopback leaks and return paths Leon Romanovsky
2026-04-05 22:09                 ` Prathamesh Deshpande

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260404230759.15131-1-prathameshdeshpande7@gmail.com \
    --to=prathameshdeshpande7@gmail.com \
    --cc=dledford@redhat.com \
    --cc=haggaie@mellanox.com \
    --cc=jgg@ziepe.ca \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox