* [PATCH WIP/RFC v3 1/6] iw_cxgb4: call dev_put() on l2t allocation failure
From: Steve Wise @ 2016-08-29 21:26 UTC (permalink / raw)
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---
drivers/infiniband/hw/cxgb4/cm.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index b6a953a..434a7fb 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -2117,8 +2117,10 @@ static int import_ep(struct c4iw_ep *ep, int iptype, __u8 *peer_ip,
 	}
 	ep->l2t = cxgb4_l2t_get(cdev->rdev.lldi.l2t,
 				n, pdev, rt_tos2priority(tos));
-	if (!ep->l2t)
+	if (!ep->l2t) {
+		dev_put(pdev);
 		goto out;
+	}
 	ep->mtu = pdev->mtu;
 	ep->tx_chan = cxgb4_port_chan(pdev);
 	ep->smac_idx = cxgb4_tp_smt_idx(adapter_type,
--
2.7.0
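
The fix pairs the reference taken on the net_device earlier in import_ep() with a dev_put() on the new failure branch. A minimal sketch of the get/put idiom, with hypothetical helper names (not the driver's actual code):

	/*
	 * Sketch: every reference taken before a fallible step needs a
	 * matching put on each failure path.
	 */
	struct net_device *pdev;

	pdev = hold_real_netdev(n);		/* hypothetical: takes a reference */
	if (!pdev)
		return -ENODEV;
	entry = alloc_l2t_entry(pdev);		/* hypothetical allocator */
	if (!entry) {
		dev_put(pdev);			/* undo the hold before bailing out */
		return -ENOMEM;
	}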
* [PATCH WIP/RFC v3 2/6] iw_cxgb4: block module unload until all ep resources are released
From: Steve Wise @ 2016-08-29 21:26 UTC (permalink / raw)
Otherwise an endpoint can still be closing down, causing a
touch-after-free crash.
Fixes: ad61a4c7a9b7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref")
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---
drivers/infiniband/hw/cxgb4/cm.c | 2 ++
drivers/infiniband/hw/cxgb4/device.c | 5 +++++
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 1 +
3 files changed, 8 insertions(+)
diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 434a7fb..80f9889 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -333,6 +333,8 @@ static void remove_ep_tid(struct c4iw_ep *ep)

 	spin_lock_irqsave(&ep->com.dev->lock, flags);
 	_remove_handle(ep->com.dev, &ep->com.dev->hwtid_idr, ep->hwtid, 0);
+	if (idr_is_empty(&ep->com.dev->hwtid_idr))
+		wake_up(&ep->com.dev->wait);
 	spin_unlock_irqrestore(&ep->com.dev->lock, flags);
 }
diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index 071d733..3c4b212 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -872,9 +872,13 @@ static void c4iw_rdev_close(struct c4iw_rdev *rdev)
 static void c4iw_dealloc(struct uld_ctx *ctx)
 {
 	c4iw_rdev_close(&ctx->dev->rdev);
+	WARN_ON_ONCE(!idr_is_empty(&ctx->dev->cqidr));
 	idr_destroy(&ctx->dev->cqidr);
+	WARN_ON_ONCE(!idr_is_empty(&ctx->dev->qpidr));
 	idr_destroy(&ctx->dev->qpidr);
+	WARN_ON_ONCE(!idr_is_empty(&ctx->dev->mmidr));
 	idr_destroy(&ctx->dev->mmidr);
+	wait_event(ctx->dev->wait, idr_is_empty(&ctx->dev->hwtid_idr));
 	idr_destroy(&ctx->dev->hwtid_idr);
 	idr_destroy(&ctx->dev->stid_idr);
 	idr_destroy(&ctx->dev->atid_idr);
@@ -992,6 +996,7 @@ static struct c4iw_dev *c4iw_alloc(const struct cxgb4_lld_info *infop)
 	mutex_init(&devp->rdev.stats.lock);
 	mutex_init(&devp->db_mutex);
 	INIT_LIST_HEAD(&devp->db_fc_list);
+	init_waitqueue_head(&devp->wait);
 	devp->avail_ird = devp->rdev.lldi.max_ird_adapter;

 	if (c4iw_debugfs_root) {
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index aa47e0a..4b83b84 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -263,6 +263,7 @@ struct c4iw_dev {
 	struct idr stid_idr;
 	struct list_head db_fc_list;
 	u32 avail_ird;
+	wait_queue_head_t wait;
 };

 static inline struct c4iw_dev *to_c4iw_dev(struct ib_device *ibdev)
--
2.7.0
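
For readers unfamiliar with the idiom, here is a minimal, self-contained sketch (hypothetical names, not the driver's code) of the wait_event()/wake_up() pairing the patch uses to block unload until the last endpoint releases its hardware TID:

	#include <linux/idr.h>
	#include <linux/wait.h>

	static DEFINE_IDR(hwtid_idr);
	static DECLARE_WAIT_QUEUE_HEAD(unload_wait);

	/*
	 * Called as each endpoint releases its hardware TID (the driver
	 * does this under a spinlock, omitted here for brevity).
	 */
	static void put_ep_tid(int tid)
	{
		idr_remove(&hwtid_idr, tid);
		if (idr_is_empty(&hwtid_idr))
			wake_up(&unload_wait);
	}

	/* Module teardown: sleep until every endpoint is gone. */
	static void drain_ep_tids(void)
	{
		wait_event(unload_wait, idr_is_empty(&hwtid_idr));
		idr_destroy(&hwtid_idr);
	}

Note that the wake_up() must come after the idr update that makes the wait condition true, which is exactly where the patch places it.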
* [PATCH WIP/RFC v3 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush
From: Steve Wise @ 2016-08-29 21:27 UTC (permalink / raw)
Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---
drivers/nvme/host/rdma.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index c133256..8d22b95 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1356,9 +1356,15 @@ static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
 		ret = 1;
 	}

-	/* Queue controller deletion */
+	/*
+	 * Queue controller deletion. Keep a reference until all
+	 * work is flushed since delete_work will free the ctrl mem
+	 */
+	kref_get(&ctrl->ctrl.kref);
 	queue_work(nvme_rdma_wq, &ctrl->delete_work);
 	flush_work(&ctrl->delete_work);
+	nvme_put_ctrl(&ctrl->ctrl);
+
 	return ret;
 }

@@ -1705,15 +1711,18 @@ static int __nvme_rdma_del_ctrl(struct nvme_rdma_ctrl *ctrl)
 static int nvme_rdma_del_ctrl(struct nvme_ctrl *nctrl)
 {
 	struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
-	int ret;
+	int ret = 0;

+	/*
+	 * Keep a reference until all work is flushed since
+	 * __nvme_rdma_del_ctrl can free the ctrl mem
+	 */
+	kref_get(&ctrl->ctrl.kref);
 	ret = __nvme_rdma_del_ctrl(ctrl);
-	if (ret)
-		return ret;
-
-	flush_work(&ctrl->delete_work);
-
-	return 0;
+	if (!ret)
+		flush_work(&ctrl->delete_work);
+	nvme_put_ctrl(&ctrl->ctrl);
+	return ret;
 }
static void nvme_rdma_remove_ctrl_work(struct work_struct *work)
--
2.7.0
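
The pattern here is worth calling out: holding a reference across queue_work()/flush_work() keeps the work handler's final put from freeing the object while the caller is still touching it. A generic sketch under hypothetical names (not the nvme code itself):

	#include <linux/kref.h>
	#include <linux/slab.h>
	#include <linux/workqueue.h>

	struct ctrl {
		struct kref kref;
		struct work_struct delete_work;
	};

	static void ctrl_release(struct kref *kref)
	{
		kfree(container_of(kref, struct ctrl, kref));
	}

	static void delete_work_fn(struct work_struct *work)
	{
		struct ctrl *c = container_of(work, struct ctrl, delete_work);

		/* ... tear the controller down ... */
		kref_put(&c->kref, ctrl_release);	/* may drop the last ref */
	}

	static void delete_and_wait(struct ctrl *c, struct workqueue_struct *wq)
	{
		kref_get(&c->kref);			/* pin c across the flush */
		queue_work(wq, &c->delete_work);
		flush_work(&c->delete_work);		/* safe: c cannot be freed yet */
		kref_put(&c->kref, ctrl_release);
	}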
* [PATCH WIP/RFC v3 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure
From: Steve Wise @ 2016-08-29 21:28 UTC (permalink / raw)
After address resolution, the nvme_rdma_queue rdma resources are
allocated. If rdma route resolution or the connect fails, or the
controller reconnect times out and gives up, then the rdma resources
need to be freed. Otherwise, rdma resources are leaked.
Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---
drivers/nvme/host/rdma.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 8d22b95..c2f6cc6 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -87,6 +87,7 @@ struct nvme_rdma_request {

 enum nvme_rdma_queue_flags {
 	NVME_RDMA_Q_CONNECTED		= (1 << 0),
+	NVME_RDMA_IB_QUEUE_ALLOCATED	= (1 << 1),
 };

 struct nvme_rdma_queue {
@@ -488,6 +489,8 @@ static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
 	struct nvme_rdma_device *dev = queue->device;
 	struct ib_device *ibdev = dev->dev;

+	if (!test_and_clear_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags))
+		return;
 	rdma_destroy_qp(queue->cm_id);
 	ib_free_cq(queue->ib_cq);

@@ -538,6 +541,7 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue,
 		ret = -ENOMEM;
 		goto out_destroy_qp;
 	}
+	set_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags);

 	return 0;

@@ -595,6 +599,8 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
 	return 0;

 out_destroy_cm_id:
+	if (test_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &ctrl->queues[0].flags))
+		nvme_rdma_destroy_queue_ib(queue);
 	rdma_destroy_id(queue->cm_id);
 	return ret;
 }
--
2.7.0
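
The new flag makes the destroy path idempotent: only the caller that atomically clears the bit performs the teardown, and a queue whose IB resources were never allocated is skipped entirely. A stripped-down sketch of the idiom (hypothetical names):

	#include <linux/bitops.h>

	enum { RES_IB_ALLOCATED = 0 };

	struct res {
		unsigned long flags;
		/* ... qp, cq, ... */
	};

	static void create_res(struct res *r)
	{
		/* ... allocate the qp/cq ... */
		set_bit(RES_IB_ALLOCATED, &r->flags);	/* only after success */
	}

	static void destroy_res(struct res *r)
	{
		/*
		 * Atomically claim the teardown; a second caller, or a
		 * resource that never allocated anything, simply returns.
		 */
		if (!test_and_clear_bit(RES_IB_ALLOCATED, &r->flags))
			return;
		/* ... rdma_destroy_qp()/ib_free_cq() would go here ... */
	}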
* [PATCH WIP/RFC v3 5/6] nvme-rdma: add DELETING queue flag
From: Sagi Grimberg @ 2016-08-29 21:28 UTC (permalink / raw)
When we get a surprise disconnect from the target, we queue a periodic
reconnect (which is the sane thing to do...).

We only move the queues out of CONNECTED when we retry the reconnect
(after 10 seconds in the default case), but we stop the blk queues
immediately so we are not bothered with traffic from then on. If delete()
kicks in during this period, the queues are still in the CONNECTED state.

Part of the delete sequence is trying to issue a ctrl shutdown if the
admin queue is CONNECTED (which it is!). This request is issued but gets
stuck in blk-mq waiting for the queues to start again, which might be
what prevents forward progress...

The patch separates the queue flags into CONNECTED and DELETING. Now we
move out of CONNECTED as soon as error recovery kicks in (before stopping
the queues), and DELETING is set when we start the queue deletion.
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
drivers/nvme/host/rdma.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index c2f6cc6..8036c23 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -88,6 +88,7 @@ struct nvme_rdma_request {
 enum nvme_rdma_queue_flags {
 	NVME_RDMA_Q_CONNECTED		= (1 << 0),
 	NVME_RDMA_IB_QUEUE_ALLOCATED	= (1 << 1),
+	NVME_RDMA_Q_DELETING		= (1 << 2),
 };

 struct nvme_rdma_queue {
@@ -619,7 +620,7 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)

 static void nvme_rdma_stop_and_free_queue(struct nvme_rdma_queue *queue)
 {
-	if (!test_and_clear_bit(NVME_RDMA_Q_CONNECTED, &queue->flags))
+	if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags))
 		return;
 	nvme_rdma_stop_queue(queue);
 	nvme_rdma_free_queue(queue);
@@ -772,8 +773,13 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 {
 	struct nvme_rdma_ctrl *ctrl = container_of(work,
 			struct nvme_rdma_ctrl, err_work);
+	int i;

 	nvme_stop_keep_alive(&ctrl->ctrl);
+
+	for (i = 0; i < ctrl->queue_count; i++)
+		clear_bit(NVME_RDMA_Q_CONNECTED, &ctrl->queues[i].flags);
+
 	if (ctrl->queue_count > 1)
 		nvme_stop_queues(&ctrl->ctrl);
 	blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);
@@ -1353,7 +1359,7 @@ static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
 	cancel_delayed_work_sync(&ctrl->reconnect_work);

 	/* Disable the queue so ctrl delete won't free it */
-	if (test_and_clear_bit(NVME_RDMA_Q_CONNECTED, &queue->flags)) {
+	if (!test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags)) {
 		/* Free this queue ourselves */
 		nvme_rdma_stop_queue(queue);
 		nvme_rdma_destroy_queue_ib(queue);
--
2.7.0
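
The key point in the change above is that CONNECTED now only tracks link state and may be cleared early by error recovery, while DELETING is a separate, atomically-taken claim on the teardown itself. A sketch of the split (hypothetical names, not the actual driver structures):

	#include <linux/bitops.h>

	enum { Q_CONNECTED = 0, Q_DELETING = 1 };

	struct queue {
		unsigned long flags;
	};

	/* Error recovery: drop CONNECTED early, but free nothing. */
	static void error_recovery(struct queue *q)
	{
		clear_bit(Q_CONNECTED, &q->flags);
		/* ... stop blk-mq queues, schedule reconnect ... */
	}

	/*
	 * Deletion: the first caller to set DELETING owns the free, so
	 * the teardown runs exactly once even if CONNECTED was already
	 * cleared by error recovery.
	 */
	static void stop_and_free(struct queue *q)
	{
		if (test_and_set_bit(Q_DELETING, &q->flags))
			return;
		/* ... stop and free the queue ... */
	}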
* [PATCH WIP/RFC v3 6/6] nvme-rdma: use ib_client API to detect device removal
From: Steve Wise @ 2016-08-30 16:25 UTC (permalink / raw)
Change nvme-rdma to use the IB Client API to detect device removal.
This has the wonderful benefit of being able to blow away all the
ib/rdma_cm resources for the device being removed. No craziness about
not destroying the cm_id handling the event. No deadlocks due to broken
iw_cm/rdma_cm/iwarp dependencies. And no need to have a bound cm_id
around during controller recovery/reconnect to catch device removal
events.
We don't use the device_add aspect of the ib_client service since we only
want to create resources for an IB device if we have a target utilizing
that device.
Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---
drivers/nvme/host/rdma.c | 111 ++++++++++++++++++-----------------------------
1 file changed, 43 insertions(+), 68 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 8036c23..ecb4162 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1322,64 +1322,6 @@ out_destroy_queue_ib:
 	return ret;
 }

-/**
- * nvme_rdma_device_unplug() - Handle RDMA device unplug
- * @queue:	Queue that owns the cm_id that caught the event
- *
- * DEVICE_REMOVAL event notifies us that the RDMA device is about
- * to unplug so we should take care of destroying our RDMA resources.
- * This event will be generated for each allocated cm_id.
- *
- * In our case, the RDMA resources are managed per controller and not
- * only per queue. So the way we handle this is we trigger an implicit
- * controller deletion upon the first DEVICE_REMOVAL event we see, and
- * hold the event inflight until the controller deletion is completed.
- *
- * One exception that we need to handle is the destruction of the cm_id
- * that caught the event. Since we hold the callout until the controller
- * deletion is completed, we'll deadlock if the controller deletion will
- * call rdma_destroy_id on this queue's cm_id. Thus, we claim ownership
- * of destroying this queue before-hand, destroy the queue resources,
- * then queue the controller deletion which won't destroy this queue and
- * we destroy the cm_id implicitely by returning a non-zero rc to the callout.
- */
-static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
-{
-	struct nvme_rdma_ctrl *ctrl = queue->ctrl;
-	int ret = 0;
-
-	/* Own the controller deletion */
-	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
-		return 0;
-
-	dev_warn(ctrl->ctrl.device,
-		"Got rdma device removal event, deleting ctrl\n");
-
-	/* Get rid of reconnect work if its running */
-	cancel_delayed_work_sync(&ctrl->reconnect_work);
-
-	/* Disable the queue so ctrl delete won't free it */
-	if (!test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags)) {
-		/* Free this queue ourselves */
-		nvme_rdma_stop_queue(queue);
-		nvme_rdma_destroy_queue_ib(queue);
-
-		/* Return non-zero so the cm_id will destroy implicitly */
-		ret = 1;
-	}
-
-	/*
-	 * Queue controller deletion. Keep a reference until all
-	 * work is flushed since delete_work will free the ctrl mem
-	 */
-	kref_get(&ctrl->ctrl.kref);
-	queue_work(nvme_rdma_wq, &ctrl->delete_work);
-	flush_work(&ctrl->delete_work);
-	nvme_put_ctrl(&ctrl->ctrl);
-
-	return ret;
-}
-
 static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
 		struct rdma_cm_event *ev)
 {
@@ -1421,8 +1363,8 @@ static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
 		nvme_rdma_error_recovery(queue->ctrl);
 		break;
 	case RDMA_CM_EVENT_DEVICE_REMOVAL:
-		/* return 1 means impliciy CM ID destroy */
-		return nvme_rdma_device_unplug(queue);
+		/* device removal is handled via the ib_client API */
+		break;
 	default:
 		dev_err(queue->ctrl->ctrl.device,
 			"Unexpected RDMA CM event (%d)\n", ev->event);
@@ -2031,27 +1973,60 @@ static struct nvmf_transport_ops nvme_rdma_transport = {
 	.create_ctrl	= nvme_rdma_create_ctrl,
 };

+static void nvme_rdma_add_one(struct ib_device *ib_device)
+{
+	pr_info("Device %s available for use\n", ib_device->name);
+}
+
+static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
+{
+	struct nvme_rdma_ctrl *ctrl;
+
+	pr_info("Removing resources for device %s\n", ib_device->name);
+
+	/* Delete all controllers using this device */
+	mutex_lock(&nvme_rdma_ctrl_mutex);
+	list_for_each_entry(ctrl, &nvme_rdma_ctrl_list, list) {
+		if (ctrl->device->dev != ib_device)
+			continue;
+		dev_info(ctrl->ctrl.device,
+			"Removing ctrl: NQN \"%s\", addr %pISp\n",
+			ctrl->ctrl.opts->subsysnqn, &ctrl->addr);
+		__nvme_rdma_del_ctrl(ctrl);
+	}
+	mutex_unlock(&nvme_rdma_ctrl_mutex);
+
+	flush_workqueue(nvme_rdma_wq);
+}
+
+static struct ib_client nvme_rdma_ib_client = {
+	.name	= "nvme_rdma",
+	.add	= nvme_rdma_add_one,
+	.remove	= nvme_rdma_remove_one
+};
+
 static int __init nvme_rdma_init_module(void)
 {
+	int ret;
+
 	nvme_rdma_wq = create_workqueue("nvme_rdma_wq");
 	if (!nvme_rdma_wq)
 		return -ENOMEM;

+	ret = ib_register_client(&nvme_rdma_ib_client);
+	if (ret) {
+		destroy_workqueue(nvme_rdma_wq);
+		return ret;
+	}
+
 	nvmf_register_transport(&nvme_rdma_transport);
 	return 0;
 }

 static void __exit nvme_rdma_cleanup_module(void)
 {
-	struct nvme_rdma_ctrl *ctrl;
-
 	nvmf_unregister_transport(&nvme_rdma_transport);
-
-	mutex_lock(&nvme_rdma_ctrl_mutex);
-	list_for_each_entry(ctrl, &nvme_rdma_ctrl_list, list)
-		__nvme_rdma_del_ctrl(ctrl);
-	mutex_unlock(&nvme_rdma_ctrl_mutex);
-
+	ib_unregister_client(&nvme_rdma_ib_client);
 	destroy_workqueue(nvme_rdma_wq);
 }
--
2.7.0
* [PATCH WIP/RFC v3 0/6] nvme-rdma device removal fixes
From: Steve Wise @ 2016-08-30 16:26 UTC (permalink / raw)
This series is a Work In Progress (WIP) attempting to address several
problems when shutting down an nvme-rdma host while its controllers are
attempting to reconnect to a target that is no longer reachable.

I'm still testing, but I welcome review, specifically of the last patch,
which solves the problem of reliably detecting a device removal.
To tickle these bugs:
1) attach over iw_cxgb4 to 10 devices on a target.
2) 'ifconfig down' the target's interface
3) wait for keep-alive to fire and begin reconnecting (~15-20 seconds)
4) do one of these on the host:
- rmmod iw_cxgb4
- reboot
- reboot -f
Changes since v2:
- refactor/simplify the remove_one function.
- nvme-rdma module remove function doesn't need to explicitly
remove the controllers; they will be removed as part of
ib_client unregister.
- removed forward declarations.
Changes since v1:
- the big change was the patch 6 rewrite: use the ib_client API to handle
device removal instead of rdma_cm device removal events.
- tweaked patch 5 to avoid bisect issues
- small code rework on patch 3 based on Christoph's suggestion
- clear_bit() -> !test_and_clear_bit() in patch 4 (Christoph's comment)
- add reviewed-by tags.
---
Sagi Grimberg (1):
nvme-rdma: add DELETING queue flag
Steve Wise (5):
iw_cxgb4: call dev_put() on l2t allocation failure
iw_cxgb4: block module unload until all ep resources are released
nvme_rdma: keep a ref on the ctrl during delete/flush
nvme-rdma: destroy nvme queue rdma resources on connect failure
nvme-rdma: use ib_client API to detect device removal
drivers/infiniband/hw/cxgb4/cm.c | 6 +-
drivers/infiniband/hw/cxgb4/device.c | 5 ++
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 1 +
drivers/nvme/host/rdma.c | 136 ++++++++++++++++-----------------
4 files changed, 77 insertions(+), 71 deletions(-)
--
2.7.0
* [PATCH WIP/RFC v3 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush
From: Sagi Grimberg @ 2016-08-30 17:24 UTC (permalink / raw)
> - /* Queue controller deletion */
> + /*
> + * Queue controller deletion. Keep a reference until all
> + * work is flushed since delete_work will free the ctrl mem
> + */
> + kref_get(&ctrl->ctrl.kref);
kref_get_unless_zero?
* [PATCH WIP/RFC v3 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush
From: Steve Wise @ 2016-08-30 17:48 UTC (permalink / raw)
>
>
> > - /* Queue controller deletion */
> > + /*
> > + * Queue controller deletion. Keep a reference until all
> > + * work is flushed since delete_work will free the ctrl mem
> > + */
> > + kref_get(&ctrl->ctrl.kref);
>
> kref_get_unless_zero?
>
At the top of nvme_rdma_device_unplug(), we change the ctrl state to
DELETING. If that succeeded, we own the delete process; if we are about
to kref the controller, how could its refcount be 0? If it is, the
controller has probably already been freed, which means we're in a bad
way...
* [PATCH WIP/RFC v3 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush
From: Sagi Grimberg @ 2016-08-31 5:06 UTC (permalink / raw)
>>> - /* Queue controller deletion */
>>> + /*
>>> + * Queue controller deletion. Keep a reference until all
>>> + * work is flushed since delete_work will free the ctrl mem
>>> + */
>>> + kref_get(&ctrl->ctrl.kref);
>>
>> kref_get_unless_zero?
>>
>
> At the top of nvme_rdma_device_unplug(), we change the ctrl state to DELETING.
> If that succeeded, and thus we own the delete process, and are about to kref the
> controller, how could the controller ref be 0? If it is, it is probably freed
> which means we're in a bad way...
I meant in the del_ctrl part...
* [PATCH WIP/RFC v3 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush
From: Steve Wise @ 2016-08-31 14:29 UTC (permalink / raw)
> > At the top of nvme_rdma_device_unplug(), we change the ctrl state to
> > DELETING. If that succeeded, we own the delete process; if we are about
> > to kref the controller, how could its refcount be 0? If it is, the
> > controller has probably already been freed, which means we're in a bad
> > way...
>
> I meant in the del_ctrl part...
>
Oh. :)
Does this fully avoid the race?
@@ -1671,7 +1671,8 @@ static int nvme_rdma_del_ctrl(struct nvme_ctrl *nctrl)
 	 * Keep a reference until all work is flushed since
 	 * __nvme_rdma_del_ctrl can free the ctrl mem
 	 */
-	kref_get(&ctrl->ctrl.kref);
+	if (!kref_get_unless_zero(&ctrl->ctrl.kref))
+		return -ENOMEM;
 	ret = __nvme_rdma_del_ctrl(ctrl);
 	if (!ret)
 		flush_work(&ctrl->delete_work);
* [PATCH WIP/RFC v3 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush
From: Sagi Grimberg @ 2016-09-01 6:40 UTC (permalink / raw)
> Oh. :)
>
> Does this fully avoid the race?
>
> @@ -1671,7 +1671,8 @@ static int nvme_rdma_del_ctrl(struct nvme_ctrl *nctrl)
> * Keep a reference until all work is flushed since
> * __nvme_rdma_del_ctrl can free the ctrl mem
> */
> - kref_get(&ctrl->ctrl.kref);
> + if (!kref_get_unless_zero(&ctrl->ctrl.kref))
> + return -ENOMEM;
We shouldn't return an -ENOMEM failure, because we'll only get here if
someone else has already deleted the controller; -EBUSY is more appropriate.
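
Putting Steve's diff and the -EBUSY suggestion together, the delete path the thread converged on would look roughly like this (a sketch of the agreed direction, not a final patch):

	static int nvme_rdma_del_ctrl(struct nvme_ctrl *nctrl)
	{
		struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
		int ret;

		/*
		 * Fail fast if the last reference is already gone: someone
		 * else deleted the controller and it may already be freed.
		 */
		if (!kref_get_unless_zero(&ctrl->ctrl.kref))
			return -EBUSY;
		ret = __nvme_rdma_del_ctrl(ctrl);
		if (!ret)
			flush_work(&ctrl->delete_work);
		nvme_put_ctrl(&ctrl->ctrl);
		return ret;
	}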
* [PATCH WIP/RFC v3 1/6] iw_cxgb4: call dev_put() on l2t allocation failure
From: Christoph Hellwig @ 2016-09-01 8:34 UTC (permalink / raw)
Looks fine,
Reviewed-by: Christoph Hellwig <hch at lst.de>
* [PATCH WIP/RFC v3 2/6] iw_cxgb4: block module unload until all ep resources are released
From: Christoph Hellwig @ 2016-09-01 8:35 UTC (permalink / raw)
On Mon, Aug 29, 2016 at 02:26:48PM -0700, Steve Wise wrote:
> Otherwise an endpoint can be still closing down causing a touch
> after free crash.
This also adds a few WARN_ON calls that might be worth mentioning in the
changelog.
Otherwise looks fine:
Reviewed-by: Christoph Hellwig <hch at lst.de>
* [PATCH WIP/RFC v3 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush
From: Christoph Hellwig @ 2016-09-01 8:36 UTC (permalink / raw)
On Tue, Aug 30, 2016 at 08:24:50PM +0300, Sagi Grimberg wrote:
>
>> - /* Queue controller deletion */
>> + /*
>> + * Queue controller deletion. Keep a reference until all
>> + * work is flushed since delete_work will free the ctrl mem
>> + */
>> + kref_get(&ctrl->ctrl.kref);
>
> kref_get_unless_zero?
How could the refcount be zero here?
* [PATCH WIP/RFC v3 6/6] nvme-rdma: use ib_client API to detect device removal
From: Christoph Hellwig @ 2016-09-01 8:36 UTC (permalink / raw)
On Tue, Aug 30, 2016 at 09:25:46AM -0700, Steve Wise wrote:
> Change nvme-rdma to use the IB Client API to detect device removal.
> This has the wonderful benefit of being able to blow away all the
> ib/rdma_cm resources for the device being removed. No craziness about
> not destroying the cm_id handling the event. No deadlocks due to broken
> iw_cm/rdma_cm/iwarp dependencies. And no need to have a bound cm_id
> around during controller recovery/reconnect to catch device removal
> events.
>
> We don't use the device_add aspect of the ib_client service since we only
> want to create resources for an IB device if we have a target utilizing
> that device.
>
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> ---
> drivers/nvme/host/rdma.c | 111 ++++++++++++++++++-----------------------------
> 1 file changed, 43 insertions(+), 68 deletions(-)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 8036c23..ecb4162 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1322,64 +1322,6 @@ out_destroy_queue_ib:
> return ret;
> }
>
> -/**
> - * nvme_rdma_device_unplug() - Handle RDMA device unplug
> - * @queue: Queue that owns the cm_id that caught the event
> - *
> - * DEVICE_REMOVAL event notifies us that the RDMA device is about
> - * to unplug so we should take care of destroying our RDMA resources.
> - * This event will be generated for each allocated cm_id.
> - *
> - * In our case, the RDMA resources are managed per controller and not
> - * only per queue. So the way we handle this is we trigger an implicit
> - * controller deletion upon the first DEVICE_REMOVAL event we see, and
> - * hold the event inflight until the controller deletion is completed.
> - *
> - * One exception that we need to handle is the destruction of the cm_id
> - * that caught the event. Since we hold the callout until the controller
> - * deletion is completed, we'll deadlock if the controller deletion will
> - * call rdma_destroy_id on this queue's cm_id. Thus, we claim ownership
> - * of destroying this queue before-hand, destroy the queue resources,
> - * then queue the controller deletion which won't destroy this queue and
> - * we destroy the cm_id implicitely by returning a non-zero rc to the callout.
> - */
> -static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
> -{
> - struct nvme_rdma_ctrl *ctrl = queue->ctrl;
> - int ret = 0;
> -
> - /* Own the controller deletion */
> - if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
> - return 0;
> -
> - dev_warn(ctrl->ctrl.device,
> - "Got rdma device removal event, deleting ctrl\n");
> -
> - /* Get rid of reconnect work if its running */
> - cancel_delayed_work_sync(&ctrl->reconnect_work);
> -
> - /* Disable the queue so ctrl delete won't free it */
> - if (!test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags)) {
> - /* Free this queue ourselves */
> - nvme_rdma_stop_queue(queue);
> - nvme_rdma_destroy_queue_ib(queue);
> -
> - /* Return non-zero so the cm_id will destroy implicitly */
> - ret = 1;
> - }
> -
> - /*
> - * Queue controller deletion. Keep a reference until all
> - * work is flushed since delete_work will free the ctrl mem
> - */
> - kref_get(&ctrl->ctrl.kref);
> - queue_work(nvme_rdma_wq, &ctrl->delete_work);
> - flush_work(&ctrl->delete_work);
> - nvme_put_ctrl(&ctrl->ctrl);
> -
> - return ret;
> -}
> -
> static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
> struct rdma_cm_event *ev)
> {
> @@ -1421,8 +1363,8 @@ static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
> nvme_rdma_error_recovery(queue->ctrl);
> break;
> case RDMA_CM_EVENT_DEVICE_REMOVAL:
> - /* return 1 means impliciy CM ID destroy */
> - return nvme_rdma_device_unplug(queue);
> + /* device removal is handled via the ib_client API */
> + break;
> default:
> dev_err(queue->ctrl->ctrl.device,
> "Unexpected RDMA CM event (%d)\n", ev->event);
> @@ -2031,27 +1973,60 @@ static struct nvmf_transport_ops nvme_rdma_transport = {
> .create_ctrl = nvme_rdma_create_ctrl,
> };
>
> +static void nvme_rdma_add_one(struct ib_device *ib_device)
> +{
> + pr_info("Device %s available for use\n", ib_device->name);
> +}
> +
> +static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
> +{
> + struct nvme_rdma_ctrl *ctrl;
> +
> + pr_info("Removing resources for device %s\n", ib_device->name);
Can we avoid these noisy printks?
Otherwise this looks fine:
Reviewed-by: Christoph Hellwig <hch at lst.de>
* [PATCH WIP/RFC v3 5/6] nvme-rdma: add DELETING queue flag
From: Christoph Hellwig @ 2016-09-01 8:41 UTC (permalink / raw)
Steve,
this mail pretends to be from Sagi, which given the threading looks
unlikely. Can you fix up your patch mail scripts so that it appears
from you and has another
From: Sagi Grimberg <sagi@grimberg.me>
line in the mail body?
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index c2f6cc6..8036c23 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -88,6 +88,7 @@ struct nvme_rdma_request {
> enum nvme_rdma_queue_flags {
> NVME_RDMA_Q_CONNECTED = (1 << 0),
> NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1),
> + NVME_RDMA_Q_DELETING = (1 << 2),
Can we please align these now that we have a few?

	NVME_RDMA_Q_CONNECTED		= (1 << 0),
	NVME_RDMA_IB_QUEUE_ALLOCATED	= (1 << 1),
	NVME_RDMA_Q_DELETING		= (1 << 2),
Otherwise this looks fine:
Reviewed-by: Christoph Hellwig <hch at lst.de>
* [PATCH WIP/RFC v3 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush
From: Sagi Grimberg @ 2016-09-01 9:31 UTC (permalink / raw)
On 01/09/16 11:36, Christoph Hellwig wrote:
> On Tue, Aug 30, 2016 at 08:24:50PM +0300, Sagi Grimberg wrote:
>>
>>> - /* Queue controller deletion */
>>> + /*
>>> + * Queue controller deletion. Keep a reference until all
>>> + * work is flushed since delete_work will free the ctrl mem
>>> + */
>>> + kref_get(&ctrl->ctrl.kref);
>>
>> kref_get_unless_zero?
>
> How could the refcount be zero here?
If another del_ctrl is invoked while the first puts the
final ref?
* [PATCH WIP/RFC v3 5/6] nvme-rdma: add DELETING queue flag
From: Steve Wise @ 2016-09-01 13:34 UTC (permalink / raw)
> -----Original Message-----
> From: Christoph Hellwig [mailto:hch at lst.de]
> Sent: Thursday, September 01, 2016 3:41 AM
> To: swise at opengridcomputing.com; Sagi Grimberg
> Cc: hch at lst.de; linux-nvme at lists.infradead.org
> Subject: Re: [PATCH WIP/RFC v3 5/6] nvme-rdma: add DELETING queue flag
>
> Steve,
>
> this mail pretends to be from Sagi, which given the threading looks
> unlikely. Can you fix up your patch mail scripts so that it appears
> from you and has another
>
> From: Sagi Grimberg <sagi at grimberg.me>
>
> line in the mail body?
Yes, I screwed up here. I'll fix it in v4.
>
> > diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> > index c2f6cc6..8036c23 100644
> > --- a/drivers/nvme/host/rdma.c
> > +++ b/drivers/nvme/host/rdma.c
> > @@ -88,6 +88,7 @@ struct nvme_rdma_request {
> > enum nvme_rdma_queue_flags {
> > NVME_RDMA_Q_CONNECTED = (1 << 0),
> > NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1),
> > + NVME_RDMA_Q_DELETING = (1 << 2),
>
> Can we please align these now that we have a few?
>
> 	NVME_RDMA_Q_CONNECTED		= (1 << 0),
> 	NVME_RDMA_IB_QUEUE_ALLOCATED	= (1 << 1),
> 	NVME_RDMA_Q_DELETING		= (1 << 2),
> NVME_RDMA_Q_DELETING = (1 << 2),
>
>
> Otherwise this looks fine:
>
> Reviewed-by: Christoph Hellwig <hch at lst.de>
* [PATCH WIP/RFC v3 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure
From: Steve Wise @ 2016-09-01 15:00 UTC (permalink / raw)
> After address resolution, the nvme_rdma_queue rdma resources are
> allocated. If rdma route resolution or the connect fails, or the
> controller reconnect times out and gives up, then the rdma resources
> need to be freed. Otherwise, rdma resources are leaked.
>
> Reviewed-by: Christoph Hellwig <hch at lst.de>
> Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
> Signed-off-by: Steve Wise <swise at opengridcomputing.com>
> ---
> drivers/nvme/host/rdma.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 8d22b95..c2f6cc6 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -87,6 +87,7 @@ struct nvme_rdma_request {
>
>  enum nvme_rdma_queue_flags {
>  	NVME_RDMA_Q_CONNECTED		= (1 << 0),
> +	NVME_RDMA_IB_QUEUE_ALLOCATED	= (1 << 1),
>  };
>
>  struct nvme_rdma_queue {
> @@ -488,6 +489,8 @@ static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
>  	struct nvme_rdma_device *dev = queue->device;
>  	struct ib_device *ibdev = dev->dev;
>
> +	if (!test_and_clear_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags))
> +		return;
>  	rdma_destroy_qp(queue->cm_id);
>  	ib_free_cq(queue->ib_cq);
>
> @@ -538,6 +541,7 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue,
>  		ret = -ENOMEM;
>  		goto out_destroy_qp;
>  	}
> +	set_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags);
>
>  	return 0;
>
> @@ -595,6 +599,8 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
>  	return 0;
>
>  out_destroy_cm_id:
> +	if (test_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &ctrl->queues[0].flags))
> +		nvme_rdma_destroy_queue_ib(queue);
I'm changing this to test_and_clear_bit() just to keep the ALLOCATED bit off
when there is no queue allocated.

Also, if nvme_rdma_init_queue() does detect a failure and destroys the cm_id
and queue, then the calling function, nvme_rdma_init_io_queues(), must not try
to stop_and_free the queue that failed. So I need this extra change:
@@ -659,7 +659,7 @@ static int nvme_rdma_init_io_queues(struct nvme_rdma_ctrl *ctrl)
 	return 0;

 out_free_queues:
-	for (; i >= 1; i--)
+	for (i--; i >= 1; i--)
 		nvme_rdma_stop_and_free_queue(&ctrl->queues[i]);
I found this by forcing rdma rejects (accidentally :)).
Steve.
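
The off-by-one matters because when nvme_rdma_init_queue() fails at index i, queue i was never successfully set up (and, with the change above, cleans up after itself), so the unwind must start at i - 1. Schematically, with the loop bodies and the queue_size argument approximated:

	int i, ret;

	for (i = 1; i < ctrl->queue_count; i++) {
		ret = nvme_rdma_init_queue(ctrl, i, queue_size);
		if (ret)
			goto out_free_queues;
	}
	return 0;

out_free_queues:
	for (i--; i >= 1; i--)	/* skip index i: it failed and freed itself */
		nvme_rdma_stop_and_free_queue(&ctrl->queues[i]);
	return ret;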