* [PATCH 1/1] nvme-rdma: Fix memory leak during queue allocation
@ 2017-11-08 13:00 Max Gurtovoy
2017-11-09 9:38 ` Christoph Hellwig
2017-11-09 11:02 ` Sagi Grimberg
0 siblings, 2 replies; 5+ messages in thread
From: Max Gurtovoy @ 2017-11-08 13:00 UTC
If the nvme_rdma_wait_for_cm timeout expires before we get an
established or rejected event from rdma_cm (although rdma_connect
succeeded), we end up leaking the IB resources for the dedicated
queue.
This scenario can easily be reproduced by running a traffic test during
port toggling.
Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
drivers/nvme/host/rdma.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 0ebb539..fcb278a 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -545,13 +545,16 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
 	if (ret) {
 		dev_info(ctrl->ctrl.device,
 			"rdma_resolve_addr wait failed (%d).\n", ret);
-		goto out_destroy_cm_id;
+		goto out_destroy_queue_ib;
 	}
 
 	clear_bit(NVME_RDMA_Q_DELETING, &queue->flags);
 
 	return 0;
 
+out_destroy_queue_ib:
+	if (ret == -ETIMEDOUT)
+		nvme_rdma_destroy_queue_ib(queue);
 out_destroy_cm_id:
 	rdma_destroy_id(queue->cm_id);
 	return ret;
--
1.8.3.1
From: Christoph Hellwig @ 2017-11-09 9:38 UTC
On Wed, Nov 08, 2017 at 03:00:32PM +0200, Max Gurtovoy wrote:
> If the nvme_rdma_wait_for_cm timeout expires before we get an
> established or rejected event from rdma_cm (although rdma_connect
> succeeded), we end up leaking the IB resources for the dedicated
> queue.
> This scenario can easily be reproduced by running a traffic test during
> port toggling.
>
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
Looks fine, but a short comment explaining the special casing of
-ETIMEDOUT would be nice.
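For illustration only, the kind of comment requested here could look roughly like the one in the sketch below. This is a user-space model, not kernel code: struct model_queue and the model_* helpers are hypothetical stand-ins for struct nvme_rdma_queue, nvme_rdma_destroy_queue_ib() and rdma_destroy_id(). The claim that the CM event handler has already released the IB resources for non-timeout errors is an assumption of this model, not something the patch states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct nvme_rdma_queue: tracks ownership only. */
struct model_queue {
	bool ib_allocated;	/* IB resources (QP, CQ, ...) still held */
	bool cm_id_alive;	/* rdma_cm id not yet destroyed */
};

/* Stand-in for nvme_rdma_destroy_queue_ib(); must run at most once. */
static void model_destroy_queue_ib(struct model_queue *q)
{
	assert(q->ib_allocated);
	q->ib_allocated = false;
}

/* Stand-in for rdma_destroy_id(). */
static void model_destroy_cm_id(struct model_queue *q)
{
	q->cm_id_alive = false;
}

/*
 * Models the error labels of the patched nvme_rdma_alloc_queue().
 * 'ret' is the (non-zero) result of waiting for the CM event.
 */
static int model_alloc_queue_error(struct model_queue *q, int ret)
{
	/*
	 * -ETIMEDOUT means the wait expired before an established or
	 * rejected event arrived, so nothing has released the IB
	 * resources yet and they must be released here. For any other
	 * error this model ASSUMES the CM event handler already did so.
	 */
	if (ret == -ETIMEDOUT)
		model_destroy_queue_ib(q);
	model_destroy_cm_id(q);
	return ret;
}
```

Calling model_alloc_queue_error() with -ETIMEDOUT on a queue that still holds its IB resources releases them and then destroys the cm_id; with any other error value the helper only destroys the cm_id.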
From: Sagi Grimberg @ 2017-11-09 11:02 UTC
> If the nvme_rdma_wait_for_cm timeout expires before we get an
> established or rejected event from rdma_cm (although rdma_connect
> succeeded), we end up leaking the IB resources for the dedicated
> queue.
> This scenario can easily be reproduced by running a traffic test during
> port toggling.
>
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
> ---
> drivers/nvme/host/rdma.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 0ebb539..fcb278a 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -545,13 +545,16 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
> if (ret) {
> dev_info(ctrl->ctrl.device,
> "rdma_resolve_addr wait failed (%d).\n", ret);
Are you rebased? I think this message has changed.
> - goto out_destroy_cm_id;
> + goto out_destroy_queue_ib;
> }
>
> clear_bit(NVME_RDMA_Q_DELETING, &queue->flags);
>
> return 0;
>
> +out_destroy_queue_ib:
> + if (ret == -ETIMEDOUT)
> + nvme_rdma_destroy_queue_ib(queue);
This does not look safe to me. What prevents nvme_rdma_cm_handler from
destroying the ib queue as well? I think we need to destroy the
cm_id first (which guarantees that we will never handle further cma
events) and only then destroy the ib queue if needed.
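The ordering proposed above can be sketched with a small single-threaded user-space model. It is not kernel code and does not reproduce real concurrency; it only shows why destroying the cm_id first leaves a single owner for the IB resources. struct model_queue, the model_* functions and the ib_allocated flag (one hypothetical way to express "if needed") are all assumptions made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct nvme_rdma_queue. */
struct model_queue {
	bool cm_id_alive;	/* CM events can still be delivered */
	bool ib_allocated;	/* hypothetical "IB resources held" flag */
	int ib_destroy_calls;	/* more than 1 would mean a double free */
};

/* Stand-in for nvme_rdma_destroy_queue_ib(). */
static void model_destroy_queue_ib(struct model_queue *q)
{
	assert(q->ib_allocated);
	q->ib_allocated = false;
	q->ib_destroy_calls++;
}

/* Models an error event reaching the CM handler; returns true if it ran. */
static bool model_cm_error_event(struct model_queue *q)
{
	if (!q->cm_id_alive)
		return false;	/* id destroyed: the handler can no longer run */
	if (q->ib_allocated)
		model_destroy_queue_ib(q);
	return true;
}

/* Failure path of queue allocation, in the suggested order. */
static int model_alloc_queue_fail(struct model_queue *q, int ret)
{
	q->cm_id_alive = false;		/* 1. destroy the cm_id first */
	if (q->ib_allocated)		/* 2. then the IB queue, if needed */
		model_destroy_queue_ib(q);
	return ret;
}
```

In this model, an error event that arrives after model_alloc_queue_fail() is ignored, and one that arrived before it has already cleared ib_allocated, so the IB resources are released exactly once in both orders.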
From: Max Gurtovoy @ 2017-11-09 11:09 UTC
On 11/9/2017 1:02 PM, Sagi Grimberg wrote:
>
>> If the nvme_rdma_wait_for_cm timeout expires before we get an
>> established or rejected event from rdma_cm (although rdma_connect
>> succeeded), we end up leaking the IB resources for the dedicated
>> queue.
>> This scenario can easily be reproduced by running a traffic test during
>> port toggling.
>>
>> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
>> ---
>>   drivers/nvme/host/rdma.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>> index 0ebb539..fcb278a 100644
>> --- a/drivers/nvme/host/rdma.c
>> +++ b/drivers/nvme/host/rdma.c
>> @@ -545,13 +545,16 @@ static int nvme_rdma_alloc_queue(struct
>> nvme_rdma_ctrl *ctrl,
>>       if (ret) {
>>           dev_info(ctrl->ctrl.device,
>>               "rdma_resolve_addr wait failed (%d).\n", ret);
>
> Are you rebased? I think this message has changed.
I'm working on top of mainline master. Should I work on top of nvme-4.15? From
what I saw a few days ago, it wasn't rebased on top of 4.14-rc8.
>
>> -        goto out_destroy_cm_id;
>> +        goto out_destroy_queue_ib;
>>       }
>>       clear_bit(NVME_RDMA_Q_DELETING, &queue->flags);
>>       return 0;
>> +out_destroy_queue_ib:
>> +    if (ret == -ETIMEDOUT)
>> +        nvme_rdma_destroy_queue_ib(queue);
>
> This does not look safe to me. What prevents nvme_rdma_cm_handler from
> destroying the ib queue as well? I think we need to destroy the
> cm_id first (which guarantees that we will never handle further cma
> events) and only then destroy the ib queue if needed.
You mean we always need to destroy the cm_id before destroying the ib queue?
From: Sagi Grimberg @ 2017-11-09 11:40 UTC
>> Are you rebased? I think this message has changed.
>
> I'm working on top of mainline master. Should I work on top of nvme-4.15? From
> what I saw a few days ago, it wasn't rebased on top of 4.14-rc8.
Yes, or at least make sure your patches apply on the latest nvme-4.XX
branch, which is where we collect patches.
>>> -        goto out_destroy_cm_id;
>>> +        goto out_destroy_queue_ib;
>>>       }
>>>       clear_bit(NVME_RDMA_Q_DELETING, &queue->flags);
>>>       return 0;
>>> +out_destroy_queue_ib:
>>> +    if (ret == -ETIMEDOUT)
>>> +        nvme_rdma_destroy_queue_ib(queue);
>>
>> This does not look safe to me. What prevents nvme_rdma_cm_handler from
>> destroying the ib queue as well? I think we need to destroy the
>> cm_id first (which guarantees that we will never handle further cma
>> events) and only then destroy the ib queue if needed.
>
> You mean we always need to destroy the cm_id before destroying the ib queue?
Yes, this is the only way to guarantee that the cm handler won't race
with this call site.