Linux RDMA and InfiniBand development
* [PATCH] IB/mlx4: Fix stale CM id_map entries when RTU is never received
@ 2026-05-07 15:47 Praveen Kumar Kannoju
  2026-05-11  7:50 ` Praveen Kannoju
  2026-06-03  0:26 ` Jason Gunthorpe
  0 siblings, 2 replies; 3+ messages in thread
From: Praveen Kumar Kannoju @ 2026-05-07 15:47 UTC
  To: yishaih, jgg, leon, linux-rdma, linux-kernel
  Cc: anand.a.khoje, manjunath.b.patil, Praveen Kumar Kannoju

mlx4_ib_multiplex_cm_handler() allocates an id_map_entry for CM
transactions, but the entry is only released on DREQ or REJ flows.

In the duplicate REP scenario, cm_dup_rep_handler() may be invoked when
the remote side receives a REP for which no matching cm_id_priv exists.
In that case the CM handshake never reaches RTU, and the sender side may
never receive the DREQ or REJ that would trigger cleanup.

As a result, the allocated id_map_entry is never released and remains as
a stale mapping.

Fix this by scheduling delayed cleanup, with a timeout of
RTU_RECEIVE_TIMEOUT (60 * HZ), immediately after allocating the
id_map_entry. The delayed work is cancelled once CM_RTU_ATTR_ID is seen
in either the multiplex or the demux handler, indicating that the CM
handshake completed successfully.

This ensures abandoned mappings are eventually reclaimed even when RTU is
never received.

Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@oracle.com>
---
 drivers/infiniband/hw/mlx4/cm.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
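
The lifecycle described in the commit message can be sketched as a small
userspace model. This is only an illustration, not driver code: every
name below (model_entry, model_alloc, model_rtu, model_tick,
RTU_TIMEOUT_TICKS) is hypothetical, the tick counter stands in for
jiffies, and an explicit reap pass stands in for the delayed work item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES       8
#define RTU_TIMEOUT_TICKS 60 /* hypothetical stand-in for RTU_RECEIVE_TIMEOUT */

/* Hypothetical model of one id_map entry with its cleanup timer. */
struct model_entry {
	bool in_use;            /* slot holds a live mapping */
	bool timer_armed;       /* cleanup is still pending */
	unsigned long deadline; /* tick at which cleanup fires */
	int sl_cm_id;
};

struct model_map {
	struct model_entry e[MAX_ENTRIES];
};

/* Allocate a mapping and arm its cleanup timer right away. */
static struct model_entry *model_alloc(struct model_map *m, int sl_cm_id,
				       unsigned long now)
{
	assert(m != NULL);
	for (size_t i = 0; i < MAX_ENTRIES; i++) {
		struct model_entry *e = &m->e[i];

		if (!e->in_use) {
			e->in_use = true;
			e->sl_cm_id = sl_cm_id;
			e->timer_armed = true;
			e->deadline = now + RTU_TIMEOUT_TICKS;
			return e;
		}
	}
	return NULL; /* map full */
}

/* RTU seen: the handshake completed, so cancel the pending cleanup. */
static void model_rtu(struct model_entry *e)
{
	assert(e != NULL);
	e->timer_armed = false;
}

/* Reap every entry whose armed timer has expired; return how many. */
static size_t model_tick(struct model_map *m, unsigned long now)
{
	size_t reaped = 0;

	assert(m != NULL);
	for (size_t i = 0; i < MAX_ENTRIES; i++) {
		struct model_entry *e = &m->e[i];

		if (e->in_use && e->timer_armed && now >= e->deadline) {
			e->in_use = false;
			e->timer_armed = false;
			reaped++;
		}
	}
	return reaped;
}

/* Count mappings that are still live. */
static size_t model_live(const struct model_map *m)
{
	size_t n = 0;

	assert(m != NULL);
	for (size_t i = 0; i < MAX_ENTRIES; i++)
		n += m->e[i].in_use ? 1 : 0;
	return n;
}
```

In this model an entry that sees RTU before its deadline survives every
later reap pass, while an entry that never sees RTU is reclaimed on the
first pass at or after the deadline. The model leaves out locking,
device teardown, and the existing DREQ/REJ cleanup path.
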

diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
index 63a868a3822f..700a840d491d 100644
--- a/drivers/infiniband/hw/mlx4/cm.c
+++ b/drivers/infiniband/hw/mlx4/cm.c
@@ -299,6 +299,7 @@ static void schedule_delayed(struct ib_device *ibdev, struct id_map_entry *id)
 }
 
 #define REJ_REASON(m) be16_to_cpu(((struct cm_generic_msg *)(m))->rej_reason)
+#define RTU_RECEIVE_TIMEOUT  (60 * HZ)
 int mlx4_ib_multiplex_cm_handler(struct ib_device *ibdev, int port, int slave_id,
 		struct ib_mad *mad)
 {
@@ -321,6 +322,9 @@ int mlx4_ib_multiplex_cm_handler(struct ib_device *ibdev, int port, int slave_id
 				__func__, slave_id, sl_cm_id);
 			return PTR_ERR(id);
 		}
+
+		schedule_delayed_work(&id->timeout, RTU_RECEIVE_TIMEOUT);
+
 	} else if (mad->mad_hdr.attr_id == CM_REJ_ATTR_ID ||
 		   mad->mad_hdr.attr_id == CM_SIDR_REP_ATTR_ID) {
 		return 0;
@@ -335,6 +339,9 @@ int mlx4_ib_multiplex_cm_handler(struct ib_device *ibdev, int port, int slave_id
 		return -EINVAL;
 	}
 
+	if (mad->mad_hdr.attr_id == CM_RTU_ATTR_ID)
+		cancel_delayed_work_sync(&id->timeout);
+
 cont:
 	set_local_comm_id(mad, id->pv_cm_id);
 
@@ -479,6 +486,9 @@ int mlx4_ib_demux_cm_handler(struct ib_device *ibdev, int port, int *slave,
 	    mad->mad_hdr.attr_id == CM_REJ_ATTR_ID)
 		schedule_delayed(ibdev, id);
 
+	if (mad->mad_hdr.attr_id == CM_RTU_ATTR_ID)
+		cancel_delayed_work_sync(&id->timeout);
+
 	return 0;
 }
 
-- 
2.43.7

