public inbox for linux-rdma@vger.kernel.org
* [PATCH for-next] RDMA/cm: Base cm_id destruction timeout on CMA values
From: Håkon Bugge @ 2025-10-21 13:27 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Sean Hefty, Vlad Dumitrescu,
	Or Har-Toov, Håkon Bugge, Jacob Moroni, Manjunath Patil
  Cc: linux-rdma, linux-kernel

When a GSI MAD packet is sent on the QP, it will potentially be
retried CMA_MAX_CM_RETRIES times with a timeout value of:

    4.096usec * 2 ^ CMA_CM_RESPONSE_TIMEOUT

With the default CMA values, the retries add up to ~64 seconds in total.

The cm_id_priv's refcount will be incremented for this period.
Therefore, the timeout value waiting for a cm_id destruction must be
based on the effective timeout of MAD packets.  To provide additional
leeway, we add 25% to this timeout and use that instead of the
constant 10 seconds timeout, which may result in false negatives.

Fixes: 96d9cbe2f2ff ("RDMA/cm: add timeout to cm_destroy_id wait")
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
---
 drivers/infiniband/core/cm.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 01bede8ba1055..2a36a93459592 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -34,7 +34,6 @@ MODULE_AUTHOR("Sean Hefty");
 MODULE_DESCRIPTION("InfiniBand CM");
 MODULE_LICENSE("Dual BSD/GPL");
 
-#define CM_DESTROY_ID_WAIT_TIMEOUT 10000 /* msecs */
 #define CM_DIRECT_RETRY_CTX ((void *) 1UL)
 #define CM_MRA_SETTING 24 /* 4.096us * 2^24 = ~68.7 seconds */
 
@@ -1057,6 +1056,7 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 {
 	struct cm_id_private *cm_id_priv;
 	enum ib_cm_state old_state;
+	unsigned long timeout;
 	struct cm_work *work;
 	int ret;
 
@@ -1167,10 +1167,9 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err)
 
 	xa_erase(&cm.local_id_table, cm_local_id(cm_id->local_id));
 	cm_deref_id(cm_id_priv);
+	timeout = msecs_to_jiffies((cm_id_priv->max_cm_retries * cm_id_priv->timeout_ms * 5) / 4);
 	do {
-		ret = wait_for_completion_timeout(&cm_id_priv->comp,
-						  msecs_to_jiffies(
-						  CM_DESTROY_ID_WAIT_TIMEOUT));
+		ret = wait_for_completion_timeout(&cm_id_priv->comp, timeout);
 		if (!ret) /* timeout happened */
 			cm_destroy_id_wait_timeout(cm_id, old_state);
 	} while (!ret);
-- 
2.43.5
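
For readers following the arithmetic in the commit message, here is a minimal user-space sketch of the calculation. It is not the kernel code. The defaults of 20 for CMA_CM_RESPONSE_TIMEOUT and 15 for CMA_MAX_CM_RETRIES are assumptions consistent with the ~64 seconds quoted above, and the helper names iba_timeout_ms() and destroy_wait_ms() are hypothetical. The in-kernel timeout_ms field may be derived differently, so treat the numbers as illustrative only.

```c
#include <stdint.h>

/* Assumed defaults; check drivers/infiniband/core/cma.c for the real values. */
#define ASSUMED_CM_RESPONSE_TIMEOUT 20 /* exponent in 4.096 us * 2^n */
#define ASSUMED_MAX_CM_RETRIES      15

/*
 * Hypothetical helper: convert an IBA timeout exponent to milliseconds.
 * 4.096 us is exactly 4096 ns, so 4.096 us * 2^n == 4096 ns << n.
 * The result is truncated to whole milliseconds.
 */
static uint64_t iba_timeout_ms(unsigned int exponent)
{
	uint64_t ns = (uint64_t)4096 << exponent;

	return ns / 1000000;
}

/*
 * Hypothetical helper mirroring the expression in the patch:
 * total retry time plus 25% leeway, i.e. retries * timeout * 5 / 4.
 */
static uint64_t destroy_wait_ms(unsigned int retries, uint64_t timeout_ms)
{
	return retries * timeout_ms * 5 / 4;
}
```

Under these assumptions, iba_timeout_ms(20) is 4294 ms per attempt, 15 attempts add up to 64410 ms, and destroy_wait_ms(15, 4294) is 80512 ms. That is roughly 80 seconds, compared with the previous constant of 10 seconds.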



* Re: [PATCH for-next] RDMA/cm: Base cm_id destruction timeout on CMA values
From: Leon Romanovsky @ 2025-10-27 11:36 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Jason Gunthorpe, Sean Hefty, Vlad Dumitrescu, Or Har-Toov,
	Jacob Moroni, Manjunath Patil, linux-rdma, linux-kernel

On Tue, Oct 21, 2025 at 03:27:33PM +0200, Håkon Bugge wrote:
> When a GSI MAD packet is sent on the QP, it will potentially be
> retried CMA_MAX_CM_RETRIES times with a timeout value of:
> 
>     4.096usec * 2 ^ CMA_CM_RESPONSE_TIMEOUT
> 
> The above equates to ~64 seconds using the default CMA values.
> 
> The cm_id_priv's refcount will be incremented for this period.
> Therefore, the timeout value waiting for a cm_id destruction must be
> based on the effective timeout of MAD packets.  To provide additional
> leeway, we add 25% to this timeout and use that instead of the
> constant 10 seconds timeout, which may result in false negatives.
> 
> Fixes: 96d9cbe2f2ff ("RDMA/cm: add timeout to cm_destroy_id wait")

I applied the patch but removed this Fixes line. Most likely someone will
complain that this patch breaks their flow.

Thanks


* Re: [PATCH for-next] RDMA/cm: Base cm_id destruction timeout on CMA values
From: Leon Romanovsky @ 2025-10-27 11:36 UTC (permalink / raw)
  To: Jason Gunthorpe, Sean Hefty, Vlad Dumitrescu, Or Har-Toov,
	Jacob Moroni, Manjunath Patil, Håkon Bugge
  Cc: linux-rdma, linux-kernel


On Tue, 21 Oct 2025 15:27:33 +0200, Håkon Bugge wrote:
> When a GSI MAD packet is sent on the QP, it will potentially be
> retried CMA_MAX_CM_RETRIES times with a timeout value of:
> 
>     4.096usec * 2 ^ CMA_CM_RESPONSE_TIMEOUT
> 
> The above equates to ~64 seconds using the default CMA values.
> 
> [...]

Applied, thanks!

[1/1] RDMA/cm: Base cm_id destruction timeout on CMA values
      https://git.kernel.org/rdma/rdma/c/58aca1f3de059c

Best regards,
-- 
Leon Romanovsky <leon@kernel.org>



* Re: [PATCH for-next] RDMA/cm: Base cm_id destruction timeout on CMA values
From: Haakon Bugge @ 2025-10-28 12:28 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Jason Gunthorpe, Sean Hefty, Vlad Dumitrescu, Or Har-Toov,
	Jacob Moroni, Manjunath Patil, OFED mailing list,
	linux-kernel@vger.kernel.org



> On 27 Oct 2025, at 12:36, Leon Romanovsky <leon@kernel.org> wrote:
> 
> 
> On Tue, 21 Oct 2025 15:27:33 +0200, Håkon Bugge wrote:
>> When a GSI MAD packet is sent on the QP, it will potentially be
>> retried CMA_MAX_CM_RETRIES times with a timeout value of:
>> 
>>    4.096usec * 2 ^ CMA_CM_RESPONSE_TIMEOUT
>> 
>> The above equates to ~64 seconds using the default CMA values.
>> 
>> [...]
> 
> Applied, thanks!
> 

Thanks, Leon!


Håkon



