From: sagi@grimberg.me (Sagi Grimberg)
Subject: [PATCH 2/2] nvme-rdma: move admin queue cleanup to nvme_rdma_free_ctrl
Date: Sun, 17 Jul 2016 09:01:31 +0300 [thread overview]
Message-ID: <578B1F3B.3020100@grimberg.me> (raw)
In-Reply-To: <0a9b01d1deb0$d46ba5d0$7d42f170$@opengridcomputing.com>
> Hey Sagi, here is some lite reading for you. :)
>
> Prelude: As part of disconnecting an iwarp connection, the iwarp provider needs
> to post an IW_CM_EVENT_CLOSE event to iw_cm, which is scheduled onto the
> singlethread workq thread for iw_cm.
>
> Here is what happens with Sagi's patch:
>
> nvme_rdma_device_unplug() calls nvme_rdma_stop_queue() which calls
> rdma_disconnect(). This triggers the disconnect. iw_cxgb4 posts the
> IW_CM_EVENT_CLOSE to iw_cm, which ends up calling cm_close_handler() in the
> iw_cm workq thread context. cm_close_handler() calls the rdma_cm event handler
> for this cm_id, function cm_iw_handler(), which blocks until any currently
> running event handler for this cm_id finishes. It does this by calling
> cm_disable_callback(). However since this whole unplug process is running in
> the event handler function for this same cm_id, the iw_cm workq thread is now
> stuck in a deadlock. nvme_rdma_device_unplug() however, continues on and
> schedules the controller delete worker thread and waits for it to complete. The
> delete controller worker thread tries to disconnect and destroy all the
> remaining IO queues, but gets stuck in the destroy() path on the first IO queue
> because the iw_cm workq thread is already stuck, and processing the CLOSE event
> is required to release a reference the iw_cm has on the iwarp providers qp. So
> everything comes to a grinding halt....
>
> Now: Ming's 2 patches avoid this deadlock because the cm_id that received the
> device removal event is disconnected/destroyed _only after_ all the controller
> queues are disconnected/destroyed. So nvme_rdma_device_unplug() doesn't get
> stuck waiting for the controller to delete the io queues, and only after that
> completes, does it delete the cm_id/qp that got the device removal event. It
> then returns thus causing the rdma_cm to release the cm_id's callback mutex.
> This causes the iw_cm workq thread to now unblock and we continue on. (can you
> say house of cards?)
>
> So the net is: the cm_id that received the device remove event _must_ be
> disconnect/destroyed _last_.
Hey Steve, thanks for the detailed description.
IMHO, this really looks like a buggy design in iWARP connection
management implementation. The fact that a rdma_disconnect forward
progress is dependent on other cm_id execution looks wrong and
backwards to me.
Given that the device is being removed altogether I don't think that
calling rdma_disconnect is important at all. Given the fact that
ib_drain_qp makes sure that the queue-pair is in error state does
this incremental patch resolve the iwarp-deadlock you are seeing?
--
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index fd90b5c00aae..e7eec7d8c705 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1342,8 +1342,12 @@ static int nvme_rdma_device_unplug(struct
nvme_rdma_queue *queue)
goto queue_delete;
}
+ /*
+ * iwcm does not handle rdma_disconnect within DEVICE_REMOVAL
+ * event very well, so we settle with qp drain
+ */
+ ib_drain_qp(queue->qp);
/* Free this queue ourselves */
- nvme_rdma_stop_queue(queue);
nvme_rdma_destroy_queue_ib(queue);
/* Return non-zero so the cm_id will destroy implicitly */
--
next prev parent reply other threads:[~2016-07-17 6:01 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-13 21:26 [PATCH 0/2] nvme-rdma: device removal crash fixes Ming Lin
2016-07-13 21:26 ` [PATCH 1/2] nvme-rdma: grab reference for device removal event Ming Lin
2016-07-13 21:33 ` Steve Wise
2016-07-13 21:26 ` [PATCH 2/2] nvme-rdma: move admin queue cleanup to nvme_rdma_free_ctrl Ming Lin
2016-07-13 21:33 ` Steve Wise
2016-07-13 23:19 ` J Freyensee
2016-07-13 23:36 ` Ming Lin
2016-07-13 23:59 ` J Freyensee
2016-07-14 6:39 ` Ming Lin
2016-07-14 17:09 ` J Freyensee
2016-07-14 18:04 ` Ming Lin
2016-07-14 9:15 ` Sagi Grimberg
2016-07-14 9:17 ` Sagi Grimberg
2016-07-14 14:30 ` Steve Wise
2016-07-14 14:44 ` Sagi Grimberg
2016-07-14 14:59 ` Steve Wise
[not found] ` <011301d1dde0$4450e4e0$ccf2aea0$@opengridcomputing.com>
2016-07-14 15:02 ` Steve Wise
2016-07-14 15:26 ` Steve Wise
2016-07-14 21:27 ` Steve Wise
2016-07-15 15:52 ` Steve Wise
2016-07-17 6:01 ` Sagi Grimberg [this message]
2016-07-18 14:55 ` Steve Wise
2016-07-18 15:47 ` Steve Wise
2016-07-18 16:34 ` Steve Wise
2016-07-18 18:04 ` Steve Wise
2016-07-13 21:58 ` [PATCH 0/2] nvme-rdma: device removal crash fixes Steve Wise
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=578B1F3B.3020100@grimberg.me \
--to=sagi@grimberg.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.