* [PATCH v3 1/1] nvme-rdma: Fix memory leak during queue allocation
@ 2017-11-22 17:01 Max Gurtovoy
2017-11-28 12:36 ` Christoph Hellwig
0 siblings, 1 reply; 5+ messages in thread
From: Max Gurtovoy @ 2017-11-22 17:01 UTC (permalink / raw)
In case nvme_rdma_wait_for_cm timeout expires before we get
an established or rejected event (rdma_connect succeeded) from
rdma_cm, we end up with leaking the ib transport resources for
dedicated queue. This scenario can easily reproduced using traffic
test during port toggling.
Also, in order to protect from parallel ib queue destruction, that
may be invoked from different context's, introduce new flag that
stands for transport readiness. While we're here, protect also against
a situation that we can receive rdma_cm events during ib queue destruction.
Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
Changes from v1:
- added new queue flag bit NVME_RDMA_Q_TR_READY to avoid parallel destruction
- guarantee that cm_id destroyed before ib queue resources (from Sagi)
- rebase over nvme-4.15
Changes from v2:
- remove redundant code (from Sagi)
---
drivers/nvme/host/rdma.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 2c59710..98c9e0d 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -73,6 +73,7 @@ struct nvme_rdma_request {
enum nvme_rdma_queue_flags {
NVME_RDMA_Q_ALLOCATED = 0,
NVME_RDMA_Q_LIVE = 1,
+ NVME_RDMA_Q_TR_READY = 2,
};
struct nvme_rdma_queue {
@@ -428,10 +429,16 @@ static int nvme_rdma_dev_get(struct nvme_rdma_device *dev)
static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
{
- struct nvme_rdma_device *dev = queue->device;
- struct ib_device *ibdev = dev->dev;
+ struct nvme_rdma_device *dev;
+ struct ib_device *ibdev;
- rdma_destroy_qp(queue->cm_id);
+ if (!test_and_clear_bit(NVME_RDMA_Q_TR_READY, &queue->flags))
+ return;
+
+ dev = queue->device;
+ ibdev = dev->dev;
+
+ ib_destroy_qp(queue->qp);
ib_free_cq(queue->ib_cq);
nvme_rdma_free_ring(ibdev, queue->rsp_ring, queue->queue_size,
@@ -482,6 +489,8 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
goto out_destroy_qp;
}
+ set_bit(NVME_RDMA_Q_TR_READY, &queue->flags);
+
return 0;
out_destroy_qp:
@@ -546,6 +555,7 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
out_destroy_cm_id:
rdma_destroy_id(queue->cm_id);
+ nvme_rdma_destroy_queue_ib(queue);
return ret;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH v3 1/1] nvme-rdma: Fix memory leak during queue allocation
2017-11-22 17:01 [PATCH v3 1/1] nvme-rdma: Fix memory leak during queue allocation Max Gurtovoy
@ 2017-11-28 12:36 ` Christoph Hellwig
2017-11-28 12:42 ` Christoph Hellwig
0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2017-11-28 12:36 UTC (permalink / raw)
Looks good and applied to nvme-4.15.
In the long run I wonder if we want a real queue state machine intead
of the flags..
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v3 1/1] nvme-rdma: Fix memory leak during queue allocation
2017-11-28 12:36 ` Christoph Hellwig
@ 2017-11-28 12:42 ` Christoph Hellwig
2017-11-28 14:18 ` Max Gurtovoy
0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2017-11-28 12:42 UTC (permalink / raw)
On Tue, Nov 28, 2017@01:36:47PM +0100, Christoph Hellwig wrote:
> Looks good and applied to nvme-4.15.
>
> In the long run I wonder if we want a real queue state machine intead
> of the flags..
Actually, that was a little fast. This conflicts with the MR pool
switch.
Also why does it switch from rdma_destroy_qp to ib_destroy_qp?
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v3 1/1] nvme-rdma: Fix memory leak during queue allocation
2017-11-28 12:42 ` Christoph Hellwig
@ 2017-11-28 14:18 ` Max Gurtovoy
2017-11-28 14:26 ` Christoph Hellwig
0 siblings, 1 reply; 5+ messages in thread
From: Max Gurtovoy @ 2017-11-28 14:18 UTC (permalink / raw)
On 11/28/2017 2:42 PM, Christoph Hellwig wrote:
> On Tue, Nov 28, 2017@01:36:47PM +0100, Christoph Hellwig wrote:
>> Looks good and applied to nvme-4.15.
>>
>> In the long run I wonder if we want a real queue state machine intead
>> of the flags..
>
> Actually, that was a little fast. This conflicts with the MR pool
> switch.
I'll resend f needed.
>
> Also why does it switch from rdma_destroy_qp to ib_destroy_qp?
>
because the cm_id might be destroyed (see the error flow).
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v3 1/1] nvme-rdma: Fix memory leak during queue allocation
2017-11-28 14:18 ` Max Gurtovoy
@ 2017-11-28 14:26 ` Christoph Hellwig
0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2017-11-28 14:26 UTC (permalink / raw)
On Tue, Nov 28, 2017@04:18:38PM +0200, Max Gurtovoy wrote:
>
>
> On 11/28/2017 2:42 PM, Christoph Hellwig wrote:
>> On Tue, Nov 28, 2017@01:36:47PM +0100, Christoph Hellwig wrote:
>>> Looks good and applied to nvme-4.15.
>>>
>>> In the long run I wonder if we want a real queue state machine intead
>>> of the flags..
>>
>> Actually, that was a little fast. This conflicts with the MR pool
>> switch.
>
> I'll resend f needed.
Yes, please.
>>
>> Also why does it switch from rdma_destroy_qp to ib_destroy_qp?
>>
>
> because the cm_id might be destroyed (see the error flow).
Needs detailed comments please.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-11-28 14:26 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-11-22 17:01 [PATCH v3 1/1] nvme-rdma: Fix memory leak during queue allocation Max Gurtovoy
2017-11-28 12:36 ` Christoph Hellwig
2017-11-28 12:42 ` Christoph Hellwig
2017-11-28 14:18 ` Max Gurtovoy
2017-11-28 14:26 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox