Linux RDMA and InfiniBand development
* [PATCH] IB/mlx4: delete allocated id_map_entry while sending REJ
@ 2026-05-06  9:08 Praveen Kumar Kannoju
  2026-05-12 12:58 ` Leon Romanovsky
  2026-06-02 19:07 ` Jason Gunthorpe
  0 siblings, 2 replies; 4+ messages in thread
From: Praveen Kumar Kannoju @ 2026-05-06  9:08 UTC
  To: yishaih, jgg, leon, linux-rdma, linux-kernel
  Cc: anand.a.khoje, manjunath.b.patil, Praveen Kumar Kannoju

During scenarios where a REJ is sent after a REQ or REP, the allocated
id_map_entry remains in memory, resulting in a memory leak. Scheduling the
entry for deletion during REJ handling, if it is not NULL, resolves the
issue.

Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@oracle.com>
---
 drivers/infiniband/hw/mlx4/cm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
index 63a868a3822f..21f2f401ed61 100644
--- a/drivers/infiniband/hw/mlx4/cm.c
+++ b/drivers/infiniband/hw/mlx4/cm.c
@@ -321,10 +321,9 @@ int mlx4_ib_multiplex_cm_handler(struct ib_device *ibdev, int port, int slave_id
 				__func__, slave_id, sl_cm_id);
 			return PTR_ERR(id);
 		}
-	} else if (mad->mad_hdr.attr_id == CM_REJ_ATTR_ID ||
-		   mad->mad_hdr.attr_id == CM_SIDR_REP_ATTR_ID) {
+	} else if (mad->mad_hdr.attr_id == CM_SIDR_REP_ATTR_ID)
 		return 0;
-	} else {
+	else {
 		sl_cm_id = get_local_comm_id(mad);
 		id = id_map_get(ibdev, &pv_cm_id, slave_id, sl_cm_id);
 	}
@@ -338,7 +337,8 @@ int mlx4_ib_multiplex_cm_handler(struct ib_device *ibdev, int port, int slave_id
 cont:
 	set_local_comm_id(mad, id->pv_cm_id);
 
-	if (mad->mad_hdr.attr_id == CM_DREQ_ATTR_ID)
+	if (mad->mad_hdr.attr_id == CM_DREQ_ATTR_ID ||
+	    mad->mad_hdr.attr_id == CM_REJ_ATTR_ID)
 		schedule_delayed(ibdev, id);
 	return 0;
 }
-- 
2.43.7
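
Editorial sketch, not part of the patch: the control-flow change above can
be modelled in user space to show why an entry allocated on REQ stays
allocated when a REJ follows. This is a toy model under stated assumptions.
The enum values, struct model, handle() and leaks_after_req_rej() are
hypothetical names; only the branch structure is taken from the diff, and
the real driver compares mad->mad_hdr.attr_id and calls id_map_get() and
schedule_delayed().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical message kinds standing in for the CM_*_ATTR_ID values. */
enum attr { ATTR_REQ, ATTR_REJ, ATTR_SIDR_REP, ATTR_DREQ };

/* Toy stand-in for a single id_map_entry (assumption: one slave, one id). */
struct model {
	bool entry_live;       /* an entry has been allocated */
	bool delete_scheduled; /* stand-in for schedule_delayed() having run */
};

/* Mirrors the branch structure of the handler before and after the patch. */
static int handle(struct model *m, enum attr a, bool patched)
{
	if (a == ATTR_REQ) {
		m->entry_live = true; /* first stanza: look up or allocate */
	} else if (a == ATTR_SIDR_REP || (a == ATTR_REJ && !patched)) {
		return 0; /* early return: the entry is never looked up */
	} else if (!m->entry_live) {
		return -1; /* lookup found nothing (assumed error path) */
	}

	/* Tail of the handler: DREQ, and REJ once patched, schedule deletion. */
	if (a == ATTR_DREQ || (a == ATTR_REJ && patched))
		m->delete_scheduled = true;
	return 0;
}

/* True if a REQ followed by a REJ leaves an entry nobody will delete. */
static bool leaks_after_req_rej(bool patched)
{
	struct model m = { false, false };

	handle(&m, ATTR_REQ, patched);
	handle(&m, ATTR_REJ, patched);
	return m.entry_live && !m.delete_scheduled;
}

/* Self-check of the two cases the commit message describes. */
static void model_self_check(void)
{
	assert(leaks_after_req_rej(false)); /* old flow: entry is stranded */
	assert(!leaks_after_req_rej(true)); /* patched flow: deletion queued */
}
```

Calling model_self_check() from a test program exercises both flows. The
model ignores locking and entry lifetime, so it says nothing about the
races raised later in the thread.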



* Re: [PATCH] IB/mlx4: delete allocated id_map_entry while sending REJ
  2026-05-06  9:08 [PATCH] IB/mlx4: delete allocated id_map_entry while sending REJ Praveen Kumar Kannoju
@ 2026-05-12 12:58 ` Leon Romanovsky
  2026-05-13 10:46   ` Praveen Kannoju
  2026-06-02 19:07 ` Jason Gunthorpe
  1 sibling, 1 reply; 4+ messages in thread
From: Leon Romanovsky @ 2026-05-12 12:58 UTC
  To: Praveen Kumar Kannoju
  Cc: yishaih, jgg, linux-rdma, linux-kernel, anand.a.khoje,
	manjunath.b.patil

On Wed, May 06, 2026 at 09:08:24AM +0000, Praveen Kumar Kannoju wrote:
> During scenarios where a REJ is sent after a REQ or REP, the allocated
> id_map_entry remains in memory, resulting in a memory leak. Scheduling the
> entry for deletion during REJ handling, if it is not NULL, resolves the
> issue.

Do you have kmemleak output to prove the leak?

> 
> Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@oracle.com>
> ---
>  drivers/infiniband/hw/mlx4/cm.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
> index 63a868a3822f..21f2f401ed61 100644
> --- a/drivers/infiniband/hw/mlx4/cm.c
> +++ b/drivers/infiniband/hw/mlx4/cm.c
> @@ -321,10 +321,9 @@ int mlx4_ib_multiplex_cm_handler(struct ib_device *ibdev, int port, int slave_id
>  				__func__, slave_id, sl_cm_id);
>  			return PTR_ERR(id);
>  		}
> -	} else if (mad->mad_hdr.attr_id == CM_REJ_ATTR_ID ||
> -		   mad->mad_hdr.attr_id == CM_SIDR_REP_ATTR_ID) {
> +	} else if (mad->mad_hdr.attr_id == CM_SIDR_REP_ATTR_ID)
>  		return 0;
> -	} else {
> +	else {

It is now similar to the "if (...  && REJ_REASON(mad) == IB_CM_REJ_TIMEOUT)"
for active-side timeout above.

Thanks

>  		sl_cm_id = get_local_comm_id(mad);
>  		id = id_map_get(ibdev, &pv_cm_id, slave_id, sl_cm_id);
>  	}
> @@ -338,7 +337,8 @@ int mlx4_ib_multiplex_cm_handler(struct ib_device *ibdev, int port, int slave_id
>  cont:
>  	set_local_comm_id(mad, id->pv_cm_id);
>  
> -	if (mad->mad_hdr.attr_id == CM_DREQ_ATTR_ID)
> +	if (mad->mad_hdr.attr_id == CM_DREQ_ATTR_ID ||
> +	    mad->mad_hdr.attr_id == CM_REJ_ATTR_ID)
>  		schedule_delayed(ibdev, id);
>  	return 0;
>  }
> -- 
> 2.43.7
> 


* RE: [PATCH] IB/mlx4: delete allocated id_map_entry while sending REJ
  2026-05-12 12:58 ` Leon Romanovsky
@ 2026-05-13 10:46   ` Praveen Kannoju
  0 siblings, 0 replies; 4+ messages in thread
From: Praveen Kannoju @ 2026-05-13 10:46 UTC
  To: Leon Romanovsky
  Cc: yishaih@nvidia.com, jgg@ziepe.ca, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, Anand Khoje, Manjunath Patil

Confidential - Oracle Restricted \Including External Recipients

Hi Leon,
Thank you for the review.
I will reproduce the issue, collect the kmemleak output, and reply with it.

-
Praveen.


> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Tuesday, May 12, 2026 6:28 PM
> To: Praveen Kannoju <praveen.kannoju@oracle.com>
> Cc: yishaih@nvidia.com; jgg@ziepe.ca; linux-rdma@vger.kernel.org;
> linux-kernel@vger.kernel.org; Anand Khoje <anand.a.khoje@oracle.com>;
> Manjunath Patil <manjunath.b.patil@oracle.com>
> Subject: Re: [PATCH] IB/mlx4: delete allocated id_map_entry while sending REJ
>
> On Wed, May 06, 2026 at 09:08:24AM +0000, Praveen Kumar Kannoju wrote:
> > During scenarios where a REJ is sent after a REQ or REP, the allocated
> > id_map_entry remains in memory, resulting in a memory leak. Scheduling
> > the entry for deletion during REJ handling, if it is not NULL,
> > resolves the issue.
>
> Do you have kmemleak output to prove the leak?
>
> >
> > Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@oracle.com>
> > ---
> >  drivers/infiniband/hw/mlx4/cm.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
> > index 63a868a3822f..21f2f401ed61 100644
> > --- a/drivers/infiniband/hw/mlx4/cm.c
> > +++ b/drivers/infiniband/hw/mlx4/cm.c
> > @@ -321,10 +321,9 @@ int mlx4_ib_multiplex_cm_handler(struct ib_device *ibdev, int port, int slave_id
> >                             __func__, slave_id, sl_cm_id);
> >                     return PTR_ERR(id);
> >             }
> > -   } else if (mad->mad_hdr.attr_id == CM_REJ_ATTR_ID ||
> > -              mad->mad_hdr.attr_id == CM_SIDR_REP_ATTR_ID) {
> > +   } else if (mad->mad_hdr.attr_id == CM_SIDR_REP_ATTR_ID)
> >             return 0;
> > -   } else {
> > +   else {
>
> It is now similar to the "if (...  && REJ_REASON(mad) == IB_CM_REJ_TIMEOUT)"
> for active-side timeout above.
>
> Thanks
>
> >             sl_cm_id = get_local_comm_id(mad);
> >             id = id_map_get(ibdev, &pv_cm_id, slave_id, sl_cm_id);
> >     }
> > @@ -338,7 +337,8 @@ int mlx4_ib_multiplex_cm_handler(struct ib_device *ibdev, int port, int slave_id
> >  cont:
> >     set_local_comm_id(mad, id->pv_cm_id);
> >
> > -   if (mad->mad_hdr.attr_id == CM_DREQ_ATTR_ID)
> > +   if (mad->mad_hdr.attr_id == CM_DREQ_ATTR_ID ||
> > +       mad->mad_hdr.attr_id == CM_REJ_ATTR_ID)
> >             schedule_delayed(ibdev, id);
> >     return 0;
> >  }
> > --
> > 2.43.7
> >


* Re: [PATCH] IB/mlx4: delete allocated id_map_entry while sending REJ
  2026-05-06  9:08 [PATCH] IB/mlx4: delete allocated id_map_entry while sending REJ Praveen Kumar Kannoju
  2026-05-12 12:58 ` Leon Romanovsky
@ 2026-06-02 19:07 ` Jason Gunthorpe
  1 sibling, 0 replies; 4+ messages in thread
From: Jason Gunthorpe @ 2026-06-02 19:07 UTC
  To: Praveen Kumar Kannoju
  Cc: yishaih, leon, linux-rdma, linux-kernel, anand.a.khoje,
	manjunath.b.patil

On Wed, May 06, 2026 at 09:08:24AM +0000, Praveen Kumar Kannoju wrote:
> During scenarios where a REJ is sent after a REQ or REP, the allocated
> id_map_entry remains in memory, resulting in a memory leak. Scheduling the
> entry for deletion during REJ handling, if it is not NULL, resolves the
> issue.

Well, the leak seems quite likely, but I'm not sure about this fix.

This code looks quite odd and it seems to have other races as well, so
IDK..

> Signed-off-by: Praveen Kumar Kannoju <praveen.kannoju@oracle.com>
> ---
>  drivers/infiniband/hw/mlx4/cm.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
> index 63a868a3822f..21f2f401ed61 100644
> --- a/drivers/infiniband/hw/mlx4/cm.c
> +++ b/drivers/infiniband/hw/mlx4/cm.c
> @@ -321,10 +321,9 @@ int mlx4_ib_multiplex_cm_handler(struct ib_device *ibdev, int port, int slave_id
>  				__func__, slave_id, sl_cm_id);
>  			return PTR_ERR(id);
>  		}
> -	} else if (mad->mad_hdr.attr_id == CM_REJ_ATTR_ID ||
> -		   mad->mad_hdr.attr_id == CM_SIDR_REP_ATTR_ID) {
> +	} else if (mad->mad_hdr.attr_id == CM_SIDR_REP_ATTR_ID)
>  		return 0;
> -	} else {
> +	else {
>  		sl_cm_id = get_local_comm_id(mad);
>  		id = id_map_get(ibdev, &pv_cm_id, slave_id, sl_cm_id);
>  	}

What is this change for?

It does look like ignoring the rej isn't right, but then also why does
this rej just search and free but the rej in the prior stanza is
allocating too?

> @@ -338,7 +337,8 @@ int mlx4_ib_multiplex_cm_handler(struct ib_device *ibdev, int port, int slave_id
>  cont:
>  	set_local_comm_id(mad, id->pv_cm_id);
>  
> -	if (mad->mad_hdr.attr_id == CM_DREQ_ATTR_ID)
> +	if (mad->mad_hdr.attr_id == CM_DREQ_ATTR_ID ||
> +	    mad->mad_hdr.attr_id == CM_REJ_ATTR_ID)
>  		schedule_delayed(ibdev, id);
>  	return 0;
>  }

SIDR seems troubled as well.

AI pointed out the use of id like this is racy too.

Broadly, this seems like it might be the right direction, but the
commit message should explain this logic a lot better.

Jason

