* [PATCH 1/1] nvme-rdma: Fix memory leak during queue allocation
From: Max Gurtovoy @ 2017-11-08 13:00 UTC


If the nvme_rdma_wait_for_cm timeout expires before we get an
established or rejected event from rdma_cm (after rdma_connect
succeeded), we end up leaking the IB resources allocated for that
queue.
This scenario can easily be reproduced by running a traffic test
during port toggling.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/rdma.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 0ebb539..fcb278a 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -545,13 +545,16 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
 	if (ret) {
 		dev_info(ctrl->ctrl.device,
 			"rdma_resolve_addr wait failed (%d).\n", ret);
-		goto out_destroy_cm_id;
+		goto out_destroy_queue_ib;
 	}
 
 	clear_bit(NVME_RDMA_Q_DELETING, &queue->flags);
 
 	return 0;
 
+out_destroy_queue_ib:
+	if (ret == -ETIMEDOUT)
+		nvme_rdma_destroy_queue_ib(queue);
 out_destroy_cm_id:
 	rdma_destroy_id(queue->cm_id);
 	return ret;
-- 
1.8.3.1


* [PATCH 1/1] nvme-rdma: Fix memory leak during queue allocation
From: Christoph Hellwig @ 2017-11-09  9:38 UTC


On Wed, Nov 08, 2017 at 03:00:32PM +0200, Max Gurtovoy wrote:
> If the nvme_rdma_wait_for_cm timeout expires before we get an
> established or rejected event from rdma_cm (after rdma_connect
> succeeded), we end up leaking the IB resources allocated for that
> queue.
> This scenario can easily be reproduced by running a traffic test
> during port toggling.
> 
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>

Looks fine, but a short comment on the special casing of
-ETIMEDOUT would be nice.


* [PATCH 1/1] nvme-rdma: Fix memory leak during queue allocation
From: Sagi Grimberg @ 2017-11-09 11:02 UTC



> If the nvme_rdma_wait_for_cm timeout expires before we get an
> established or rejected event from rdma_cm (after rdma_connect
> succeeded), we end up leaking the IB resources allocated for that
> queue.
> This scenario can easily be reproduced by running a traffic test
> during port toggling.
> 
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
> ---
>   drivers/nvme/host/rdma.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 0ebb539..fcb278a 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -545,13 +545,16 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
>   	if (ret) {
>   		dev_info(ctrl->ctrl.device,
>   			"rdma_resolve_addr wait failed (%d).\n", ret);

Are you rebased? This message has changed, I think.

> -		goto out_destroy_cm_id;
> +		goto out_destroy_queue_ib;
>   	}
>   
>   	clear_bit(NVME_RDMA_Q_DELETING, &queue->flags);
>   
>   	return 0;
>   
> +out_destroy_queue_ib:
> +	if (ret == -ETIMEDOUT)
> +		nvme_rdma_destroy_queue_ib(queue);

This does not look safe to me. What prevents nvme_rdma_cm_handler
from destroying the ib queue as well? I think we need to destroy the
cm_id first (which guarantees that we will never handle further cma
events) and only then destroy the ib queue if needed.


* [PATCH 1/1] nvme-rdma: Fix memory leak during queue allocation
From: Max Gurtovoy @ 2017-11-09 11:09 UTC




On 11/9/2017 1:02 PM, Sagi Grimberg wrote:
> 
>> If the nvme_rdma_wait_for_cm timeout expires before we get an
>> established or rejected event from rdma_cm (after rdma_connect
>> succeeded), we end up leaking the IB resources allocated for that
>> queue.
>> This scenario can easily be reproduced by running a traffic test
>> during port toggling.
>>
>> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
>> ---
>>   drivers/nvme/host/rdma.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>> index 0ebb539..fcb278a 100644
>> --- a/drivers/nvme/host/rdma.c
>> +++ b/drivers/nvme/host/rdma.c
>> @@ -545,13 +545,16 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
>>  	if (ret) {
>>  		dev_info(ctrl->ctrl.device,
>>  			"rdma_resolve_addr wait failed (%d).\n", ret);
> 
> Are you rebased? This message has changed, I think.

I'm working on mainline master. Should I work on top of nvme-4.15? From
what I saw a few days ago, it wasn't rebased on top of 4.14-rc8.

> 
>> -		goto out_destroy_cm_id;
>> +		goto out_destroy_queue_ib;
>>  	}
>>  	clear_bit(NVME_RDMA_Q_DELETING, &queue->flags);
>>  	return 0;
>> +out_destroy_queue_ib:
>> +	if (ret == -ETIMEDOUT)
>> +		nvme_rdma_destroy_queue_ib(queue);
> 
> This does not look safe to me. What prevents nvme_rdma_cm_handler
> from destroying the ib queue as well? I think we need to destroy the
> cm_id first (which guarantees that we will never handle further cma
> events) and only then destroy the ib queue if needed.

You mean we always need to destroy the cm_id before destroying the ib queue?


* [PATCH 1/1] nvme-rdma: Fix memory leak during queue allocation
From: Sagi Grimberg @ 2017-11-09 11:40 UTC



>> Are you rebased? This message has changed, I think.
> 
> I'm working on mainline master. Should I work on top of nvme-4.15? From
> what I saw a few days ago, it wasn't rebased on top of 4.14-rc8.

Yes, or at least make sure your patches apply on the latest nvme-4.XX
branch, where we collect patches.

>>> -		goto out_destroy_cm_id;
>>> +		goto out_destroy_queue_ib;
>>>  	}
>>>  	clear_bit(NVME_RDMA_Q_DELETING, &queue->flags);
>>>  	return 0;
>>> +out_destroy_queue_ib:
>>> +	if (ret == -ETIMEDOUT)
>>> +		nvme_rdma_destroy_queue_ib(queue);
>>
>> This does not look safe to me. What prevents nvme_rdma_cm_handler
>> from destroying the ib queue as well? I think we need to destroy the
>> cm_id first (which guarantees that we will never handle further cma
>> events) and only then destroy the ib queue if needed.
> 
> You mean we always need to destroy the cm_id before destroying the ib queue?

Yes, this is the only way to guarantee that the cm handler won't race
with this call site.
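
The teardown ordering discussed above can be illustrated with a small
userspace model. This is only a sketch, not driver code and not the fix
that was eventually merged: struct mock_queue, the mock_* helpers and the
ib_allocated flag are all hypothetical stand-ins. The one property taken
from the thread is that once the cm_id is destroyed, no further CM events
are handled, so the IB resources can then be checked and freed without
racing against the handler.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the driver's queue; not kernel code. */
struct mock_queue {
	bool cm_id_alive;       /* CM handler may still run while this is true */
	bool ib_allocated;      /* assumed flag: IB resources currently exist */
	int step;               /* increments on every teardown call */
	int cm_id_destroyed_at; /* step at which the cm_id was destroyed */
	int ib_destroyed_at;    /* step at which the IB resources were freed */
};

/* Models rdma_destroy_id(): after it returns, no CM events are handled. */
static void mock_destroy_cm_id(struct mock_queue *q)
{
	q->cm_id_alive = false;
	q->cm_id_destroyed_at = ++q->step;
}

/* Models nvme_rdma_destroy_queue_ib(); must not race with the CM handler. */
static void mock_destroy_queue_ib(struct mock_queue *q)
{
	assert(!q->cm_id_alive); /* the ordering rule from the review */
	q->ib_allocated = false;
	q->ib_destroyed_at = ++q->step;
}

/*
 * wait_ret models the result of waiting for the CM. ib_left_allocated
 * models whether IB resources still exist when the wait returns, for
 * example after a timeout with no established or rejected event.
 */
static int mock_alloc_queue(struct mock_queue *q, int wait_ret,
			    bool ib_left_allocated)
{
	q->cm_id_alive = true;
	q->ib_allocated = ib_left_allocated;

	if (wait_ret)
		goto out_destroy_cm_id;

	return 0;

out_destroy_cm_id:
	/* Destroy the cm_id first so the handler can no longer run. */
	mock_destroy_cm_id(q);
	/* Only now is it safe to inspect and free the IB resources. */
	if (q->ib_allocated)
		mock_destroy_queue_ib(q);
	return wait_ret;
}
```

In this model the decision to free the IB resources depends on whether
they exist rather than on the specific error code, which is one possible
reading of "destroy the ib queue if needed"; the real driver would need
its own way to track that state.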
