* [PATCH WIP/RFC v2 1/6] iw_cxgb4: call dev_put() on l2t allocation failure
2016-08-29 21:35 [PATCH WIP/RFC v2 0/6] nvme-rdma device removal fixes Steve Wise
@ 2016-08-29 21:26 ` Steve Wise
2016-08-29 21:26 ` [PATCH WIP/RFC v2 2/6] iw_cxgb4: block module unload until all ep resources are released Steve Wise
` (4 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-29 21:26 UTC (permalink / raw)
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---
drivers/infiniband/hw/cxgb4/cm.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index b6a953a..434a7fb 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -2117,8 +2117,10 @@ static int import_ep(struct c4iw_ep *ep, int iptype, __u8 *peer_ip,
}
ep->l2t = cxgb4_l2t_get(cdev->rdev.lldi.l2t,
n, pdev, rt_tos2priority(tos));
- if (!ep->l2t)
+ if (!ep->l2t) {
+ dev_put(pdev);
goto out;
+ }
ep->mtu = pdev->mtu;
ep->tx_chan = cxgb4_port_chan(pdev);
ep->smac_idx = cxgb4_tp_smt_idx(adapter_type,
--
2.7.0
* [PATCH WIP/RFC v2 2/6] iw_cxgb4: block module unload until all ep resources are released
2016-08-29 21:35 [PATCH WIP/RFC v2 0/6] nvme-rdma device removal fixes Steve Wise
2016-08-29 21:26 ` [PATCH WIP/RFC v2 1/6] iw_cxgb4: call dev_put() on l2t allocation failure Steve Wise
@ 2016-08-29 21:26 ` Steve Wise
2016-08-29 21:27 ` [PATCH WIP/RFC v2 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush Steve Wise
` (3 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-29 21:26 UTC (permalink / raw)
Otherwise an endpoint can still be closing down, causing a touch-after-free
crash.
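In outline (an illustration only, using the names from the diff below): the
last endpoint to drop its hwtid wakes the unload path, which now blocks until
the hwtid idr is empty before tearing the device down:

	/* remove_ep_tid(): wake the waiter once the last hwtid is gone */
	if (idr_is_empty(&ep->com.dev->hwtid_idr))
		wake_up(&ep->com.dev->wait);

	/* c4iw_dealloc(): block module unload until every ep is released */
	wait_event(ctx->dev->wait, idr_is_empty(&ctx->dev->hwtid_idr));
	idr_destroy(&ctx->dev->hwtid_idr);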
Fixes: ad61a4c7a9b7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref")
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---
drivers/infiniband/hw/cxgb4/cm.c | 2 ++
drivers/infiniband/hw/cxgb4/device.c | 5 +++++
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 1 +
3 files changed, 8 insertions(+)
diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 434a7fb..80f9889 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -333,6 +333,8 @@ static void remove_ep_tid(struct c4iw_ep *ep)
spin_lock_irqsave(&ep->com.dev->lock, flags);
_remove_handle(ep->com.dev, &ep->com.dev->hwtid_idr, ep->hwtid, 0);
+ if (idr_is_empty(&ep->com.dev->hwtid_idr))
+ wake_up(&ep->com.dev->wait);
spin_unlock_irqrestore(&ep->com.dev->lock, flags);
}
diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index 071d733..3c4b212 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -872,9 +872,13 @@ static void c4iw_rdev_close(struct c4iw_rdev *rdev)
static void c4iw_dealloc(struct uld_ctx *ctx)
{
c4iw_rdev_close(&ctx->dev->rdev);
+ WARN_ON_ONCE(!idr_is_empty(&ctx->dev->cqidr));
idr_destroy(&ctx->dev->cqidr);
+ WARN_ON_ONCE(!idr_is_empty(&ctx->dev->qpidr));
idr_destroy(&ctx->dev->qpidr);
+ WARN_ON_ONCE(!idr_is_empty(&ctx->dev->mmidr));
idr_destroy(&ctx->dev->mmidr);
+ wait_event(ctx->dev->wait, idr_is_empty(&ctx->dev->hwtid_idr));
idr_destroy(&ctx->dev->hwtid_idr);
idr_destroy(&ctx->dev->stid_idr);
idr_destroy(&ctx->dev->atid_idr);
@@ -992,6 +996,7 @@ static struct c4iw_dev *c4iw_alloc(const struct cxgb4_lld_info *infop)
mutex_init(&devp->rdev.stats.lock);
mutex_init(&devp->db_mutex);
INIT_LIST_HEAD(&devp->db_fc_list);
+ init_waitqueue_head(&devp->wait);
devp->avail_ird = devp->rdev.lldi.max_ird_adapter;
if (c4iw_debugfs_root) {
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index aa47e0a..4b83b84 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -263,6 +263,7 @@ struct c4iw_dev {
struct idr stid_idr;
struct list_head db_fc_list;
u32 avail_ird;
+ wait_queue_head_t wait;
};
static inline struct c4iw_dev *to_c4iw_dev(struct ib_device *ibdev)
--
2.7.0
* [PATCH WIP/RFC v2 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush
2016-08-29 21:35 [PATCH WIP/RFC v2 0/6] nvme-rdma device removal fixes Steve Wise
2016-08-29 21:26 ` [PATCH WIP/RFC v2 1/6] iw_cxgb4: call dev_put() on l2t allocation failure Steve Wise
2016-08-29 21:26 ` [PATCH WIP/RFC v2 2/6] iw_cxgb4: block module unload until all ep resources are released Steve Wise
@ 2016-08-29 21:27 ` Steve Wise
2016-08-29 21:28 ` [PATCH WIP/RFC v2 5/6] nvme-rdma: add DELETING queue flag Sagi Grimberg
` (2 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-29 21:27 UTC (permalink / raw)
Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---
drivers/nvme/host/rdma.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index c133256..8d22b95 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1356,9 +1356,15 @@ static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
ret = 1;
}
- /* Queue controller deletion */
+ /*
+ * Queue controller deletion. Keep a reference until all
+ * work is flushed since delete_work will free the ctrl mem
+ */
+ kref_get(&ctrl->ctrl.kref);
queue_work(nvme_rdma_wq, &ctrl->delete_work);
flush_work(&ctrl->delete_work);
+ nvme_put_ctrl(&ctrl->ctrl);
+
return ret;
}
@@ -1705,15 +1711,18 @@ static int __nvme_rdma_del_ctrl(struct nvme_rdma_ctrl *ctrl)
static int nvme_rdma_del_ctrl(struct nvme_ctrl *nctrl)
{
struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
- int ret;
+ int ret = 0;
+ /*
+ * Keep a reference until all work is flushed since
+ * __nvme_rdma_del_ctrl can free the ctrl mem
+ */
+ kref_get(&ctrl->ctrl.kref);
ret = __nvme_rdma_del_ctrl(ctrl);
- if (ret)
- return ret;
-
- flush_work(&ctrl->delete_work);
-
- return 0;
+ if (!ret)
+ flush_work(&ctrl->delete_work);
+ nvme_put_ctrl(&ctrl->ctrl);
+ return ret;
}
static void nvme_rdma_remove_ctrl_work(struct work_struct *work)
--
2.7.0
* [PATCH WIP/RFC v2 5/6] nvme-rdma: add DELETING queue flag
2016-08-29 21:35 [PATCH WIP/RFC v2 0/6] nvme-rdma device removal fixes Steve Wise
` (2 preceding siblings ...)
2016-08-29 21:27 ` [PATCH WIP/RFC v2 3/6] nvme_rdma: keep a ref on the ctrl during delete/flush Steve Wise
@ 2016-08-29 21:28 ` Sagi Grimberg
2016-08-29 21:28 ` [PATCH WIP/RFC v2 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure Steve Wise
2016-08-29 21:35 ` [PATCH WIP/RFC v2 6/6] nvme-rdma: use ib_client API to detect device removal Steve Wise
5 siblings, 0 replies; 15+ messages in thread
From: Sagi Grimberg @ 2016-08-29 21:28 UTC (permalink / raw)
When we get a surprise disconnect from the target we queue a periodic
reconnect (which is the sane thing to do...).
We only move the queues out of CONNECTED when we attempt to reconnect (after
10 seconds in the default case), but we stop the blk queues immediately so
we are not bothered with traffic from then on. If delete() kicks in during
this window the queues are still in the CONNECTED state.
Part of the delete sequence is issuing a ctrl shutdown if the admin queue is
CONNECTED (which it is!). That request is issued but gets stuck in blk-mq
waiting for the queues to start again, which may be what is preventing
forward progress...
This patch separates the queue flags into CONNECTED and DELETING. We now move
out of CONNECTED as soon as error recovery kicks in (before stopping the
queues), and DELETING is set when queue deletion starts.
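In outline (illustration only, using the names from the diff below): teardown
is now keyed off DELETING rather than CONNECTED, so it still runs exactly once
even after error recovery has already cleared CONNECTED:

	/* error recovery: drop CONNECTED early, before stopping the queues */
	for (i = 0; i < ctrl->queue_count; i++)
		clear_bit(NVME_RDMA_Q_CONNECTED, &ctrl->queues[i].flags);

	/* stop_and_free: whoever sets DELETING first does the teardown */
	if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags))
		return;
	nvme_rdma_stop_queue(queue);
	nvme_rdma_free_queue(queue);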
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
drivers/nvme/host/rdma.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index c2f6cc6..8036c23 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -88,6 +88,7 @@ struct nvme_rdma_request {
enum nvme_rdma_queue_flags {
NVME_RDMA_Q_CONNECTED = (1 << 0),
NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1),
+ NVME_RDMA_Q_DELETING = (1 << 2),
};
struct nvme_rdma_queue {
@@ -619,7 +620,7 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
static void nvme_rdma_stop_and_free_queue(struct nvme_rdma_queue *queue)
{
- if (!test_and_clear_bit(NVME_RDMA_Q_CONNECTED, &queue->flags))
+ if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags))
return;
nvme_rdma_stop_queue(queue);
nvme_rdma_free_queue(queue);
@@ -772,8 +773,13 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
{
struct nvme_rdma_ctrl *ctrl = container_of(work,
struct nvme_rdma_ctrl, err_work);
+ int i;
nvme_stop_keep_alive(&ctrl->ctrl);
+
+ for (i = 0; i < ctrl->queue_count; i++)
+ clear_bit(NVME_RDMA_Q_CONNECTED, &ctrl->queues[i].flags);
+
if (ctrl->queue_count > 1)
nvme_stop_queues(&ctrl->ctrl);
blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);
@@ -1353,7 +1359,7 @@ static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
cancel_delayed_work_sync(&ctrl->reconnect_work);
/* Disable the queue so ctrl delete won't free it */
- if (test_and_clear_bit(NVME_RDMA_Q_CONNECTED, &queue->flags)) {
+ if (!test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags)) {
/* Free this queue ourselves */
nvme_rdma_stop_queue(queue);
nvme_rdma_destroy_queue_ib(queue);
--
2.7.0
* [PATCH WIP/RFC v2 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure
2016-08-29 21:35 [PATCH WIP/RFC v2 0/6] nvme-rdma device removal fixes Steve Wise
` (3 preceding siblings ...)
2016-08-29 21:28 ` [PATCH WIP/RFC v2 5/6] nvme-rdma: add DELETING queue flag Sagi Grimberg
@ 2016-08-29 21:28 ` Steve Wise
2016-08-29 21:35 ` [PATCH WIP/RFC v2 6/6] nvme-rdma: use ib_client API to detect device removal Steve Wise
5 siblings, 0 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-29 21:28 UTC (permalink / raw)
After address resolution, the nvme_rdma_queue rdma resources are
allocated. If rdma route resolution or the connect fails, or the
controller reconnect times out and gives up, then the rdma resources
need to be freed. Otherwise, rdma resources are leaked.
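In outline (illustration only, using the names from the diff below): the queue
now records whether its IB resources exist, so every failure path can call the
destroy routine unconditionally:

	/* after the CQ and QP have been created successfully */
	set_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags);

	/* nvme_rdma_destroy_queue_ib(): no-op unless resources were allocated */
	if (!test_and_clear_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags))
		return;
	rdma_destroy_qp(queue->cm_id);
	ib_free_cq(queue->ib_cq);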
Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---
drivers/nvme/host/rdma.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 8d22b95..c2f6cc6 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -87,6 +87,7 @@ struct nvme_rdma_request {
enum nvme_rdma_queue_flags {
NVME_RDMA_Q_CONNECTED = (1 << 0),
+ NVME_RDMA_IB_QUEUE_ALLOCATED = (1 << 1),
};
struct nvme_rdma_queue {
@@ -488,6 +489,8 @@ static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
struct nvme_rdma_device *dev = queue->device;
struct ib_device *ibdev = dev->dev;
+ if (!test_and_clear_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags))
+ return;
rdma_destroy_qp(queue->cm_id);
ib_free_cq(queue->ib_cq);
@@ -538,6 +541,7 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue,
ret = -ENOMEM;
goto out_destroy_qp;
}
+ set_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &queue->flags);
return 0;
@@ -595,6 +599,8 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
return 0;
out_destroy_cm_id:
+ if (test_bit(NVME_RDMA_IB_QUEUE_ALLOCATED, &ctrl->queues[0].flags))
+ nvme_rdma_destroy_queue_ib(queue);
rdma_destroy_id(queue->cm_id);
return ret;
}
--
2.7.0
* [PATCH WIP/RFC v2 6/6] nvme-rdma: use ib_client API to detect device removal
2016-08-29 21:35 [PATCH WIP/RFC v2 0/6] nvme-rdma device removal fixes Steve Wise
` (4 preceding siblings ...)
2016-08-29 21:28 ` [PATCH WIP/RFC v2 4/6] nvme-rdma: destroy nvme queue rdma resources on connect failure Steve Wise
@ 2016-08-29 21:35 ` Steve Wise
2016-08-30 7:06 ` Christoph Hellwig
2016-08-30 15:46 ` Sagi Grimberg
5 siblings, 2 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-29 21:35 UTC (permalink / raw)
Change nvme-rdma to use the IB Client API to detect device removal.
This has the wonderful benefit of being able to blow away all the
ib/rdma_cm resources for the device being removed. No craziness about
not destroying the cm_id handling the event. No deadlocks due to broken
iw_cm/rdma_cm/iwarp dependencies. And no need to have a bound cm_id
around during controller recovery/reconnect to catch device removal
events.
We don't use the device_add aspect of the ib_client service since we only
want to create resources for an IB device if we have a target utilizing
that device.
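In outline (illustration only; the full diff is below), the driver registers
an ib_client whose remove callback deletes every controller bound to the
ib_device that is going away:

	static struct ib_client nvme_rdma_ib_client = {
		.name   = "nvme_rdma",
		.add    = nvme_rdma_add_one,	/* no-op: resources are created on demand */
		.remove = nvme_rdma_remove_one	/* tear down all ctrls on this ib_device */
	};

	/* module init */
	ret = ib_register_client(&nvme_rdma_ib_client);

	/* module exit */
	ib_unregister_client(&nvme_rdma_ib_client);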
Signed-off-by: Steve Wise <swise at opengridcomputing.com>
---
drivers/nvme/host/rdma.c | 129 +++++++++++++++++++++++++----------------------
1 file changed, 69 insertions(+), 60 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 8036c23..77dbc5e 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1322,64 +1322,6 @@ out_destroy_queue_ib:
return ret;
}
-/**
- * nvme_rdma_device_unplug() - Handle RDMA device unplug
- * @queue: Queue that owns the cm_id that caught the event
- *
- * DEVICE_REMOVAL event notifies us that the RDMA device is about
- * to unplug so we should take care of destroying our RDMA resources.
- * This event will be generated for each allocated cm_id.
- *
- * In our case, the RDMA resources are managed per controller and not
- * only per queue. So the way we handle this is we trigger an implicit
- * controller deletion upon the first DEVICE_REMOVAL event we see, and
- * hold the event inflight until the controller deletion is completed.
- *
- * One exception that we need to handle is the destruction of the cm_id
- * that caught the event. Since we hold the callout until the controller
- * deletion is completed, we'll deadlock if the controller deletion will
- * call rdma_destroy_id on this queue's cm_id. Thus, we claim ownership
- * of destroying this queue before-hand, destroy the queue resources,
- * then queue the controller deletion which won't destroy this queue and
- * we destroy the cm_id implicitely by returning a non-zero rc to the callout.
- */
-static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
-{
- struct nvme_rdma_ctrl *ctrl = queue->ctrl;
- int ret = 0;
-
- /* Own the controller deletion */
- if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
- return 0;
-
- dev_warn(ctrl->ctrl.device,
- "Got rdma device removal event, deleting ctrl\n");
-
- /* Get rid of reconnect work if its running */
- cancel_delayed_work_sync(&ctrl->reconnect_work);
-
- /* Disable the queue so ctrl delete won't free it */
- if (!test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags)) {
- /* Free this queue ourselves */
- nvme_rdma_stop_queue(queue);
- nvme_rdma_destroy_queue_ib(queue);
-
- /* Return non-zero so the cm_id will destroy implicitly */
- ret = 1;
- }
-
- /*
- * Queue controller deletion. Keep a reference until all
- * work is flushed since delete_work will free the ctrl mem
- */
- kref_get(&ctrl->ctrl.kref);
- queue_work(nvme_rdma_wq, &ctrl->delete_work);
- flush_work(&ctrl->delete_work);
- nvme_put_ctrl(&ctrl->ctrl);
-
- return ret;
-}
-
static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
struct rdma_cm_event *ev)
{
@@ -1421,8 +1363,8 @@ static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
nvme_rdma_error_recovery(queue->ctrl);
break;
case RDMA_CM_EVENT_DEVICE_REMOVAL:
- /* return 1 means impliciy CM ID destroy */
- return nvme_rdma_device_unplug(queue);
+ /* device removal is handled via the ib_client API */
+ break;
default:
dev_err(queue->ctrl->ctrl.device,
"Unexpected RDMA CM event (%d)\n", ev->event);
@@ -2031,12 +1973,78 @@ static struct nvmf_transport_ops nvme_rdma_transport = {
.create_ctrl = nvme_rdma_create_ctrl,
};
+static void nvme_rdma_add_one(struct ib_device *ib_device);
+static void nvme_rdma_remove_one(struct ib_device *ib_device,
+ void *client_data);
+
+static struct ib_client nvme_rdma_ib_client = {
+ .name = "nvme_rdma",
+ .add = nvme_rdma_add_one,
+ .remove = nvme_rdma_remove_one
+};
+
+static void nvme_rdma_add_one(struct ib_device *ib_device)
+{
+ /* devices are added dynamically as targets use them */
+}
+
+static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
+{
+ struct nvme_rdma_ctrl *ctrl, *tmp;
+
+ pr_info("Removing resources for device %s\n", ib_device->name);
+
+ mutex_lock(&nvme_rdma_ctrl_mutex);
+ list_for_each_entry_safe(ctrl, tmp, &nvme_rdma_ctrl_list, list) {
+ int delete_ctrl;
+
+ if (ctrl->device->dev != ib_device)
+ continue;
+
+ /*
+ * Keep a reference until all work is flushed since
+ * __nvme_rdma_del_ctrl can free the ctrl mem
+ */
+ kref_get(&ctrl->ctrl.kref);
+ delete_ctrl = nvme_change_ctrl_state(&ctrl->ctrl,
+ NVME_CTRL_DELETING);
+ mutex_unlock(&nvme_rdma_ctrl_mutex);
+
+ dev_info(ctrl->ctrl.device,
+ "Removing ctrl: NQN \"%s\", addr %pISp\n",
+ ctrl->ctrl.opts->subsysnqn, &ctrl->addr);
+
+ if (delete_ctrl) {
+
+ /* Get rid of reconnect work if its running */
+ cancel_delayed_work_sync(&ctrl->reconnect_work);
+
+ /* queue controller deletion */
+ queue_work(nvme_rdma_wq, &ctrl->delete_work);
+ }
+
+ /* wait for deletion work to complete */
+ flush_work(&ctrl->delete_work);
+ nvme_put_ctrl(&ctrl->ctrl);
+ mutex_lock(&nvme_rdma_ctrl_mutex);
+ }
+ mutex_unlock(&nvme_rdma_ctrl_mutex);
+}
+
static int __init nvme_rdma_init_module(void)
{
+ int ret;
+
nvme_rdma_wq = create_workqueue("nvme_rdma_wq");
if (!nvme_rdma_wq)
return -ENOMEM;
+ ret = ib_register_client(&nvme_rdma_ib_client);
+ if (ret) {
+ destroy_workqueue(nvme_rdma_wq);
+ return ret;
+ }
+
nvmf_register_transport(&nvme_rdma_transport);
return 0;
}
@@ -2052,6 +2060,7 @@ static void __exit nvme_rdma_cleanup_module(void)
__nvme_rdma_del_ctrl(ctrl);
mutex_unlock(&nvme_rdma_ctrl_mutex);
+ ib_unregister_client(&nvme_rdma_ib_client);
destroy_workqueue(nvme_rdma_wq);
}
--
2.7.0
* [PATCH WIP/RFC v2 6/6] nvme-rdma: use ib_client API to detect device removal
2016-08-29 21:35 ` [PATCH WIP/RFC v2 6/6] nvme-rdma: use ib_client API to detect device removal Steve Wise
@ 2016-08-30 7:06 ` Christoph Hellwig
2016-08-30 14:05 ` Steve Wise
2016-08-30 15:46 ` Sagi Grimberg
1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2016-08-30 7:06 UTC (permalink / raw)
> +static void nvme_rdma_add_one(struct ib_device *ib_device);
> +static void nvme_rdma_remove_one(struct ib_device *ib_device,
> + void *client_data);
Can we avoid the forward declarations?
> +static void nvme_rdma_add_one(struct ib_device *ib_device)
> +{
> + /* devices are added dynamically as targets use them */
> +}
Is the add callback mandatory in the core? Would be nice if we could
fix that up in an incremental patch.
> +static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
> +{
> + struct nvme_rdma_ctrl *ctrl, *tmp;
> +
> + pr_info("Removing resources for device %s\n", ib_device->name);
> +
> + mutex_lock(&nvme_rdma_ctrl_mutex);
> + list_for_each_entry_safe(ctrl, tmp, &nvme_rdma_ctrl_list, list) {
> + int delete_ctrl;
> +
> + if (ctrl->device->dev != ib_device)
> + continue;
Also not important for now, but we probably should move to a per-device
list here if we move forward with this approach.
> + mutex_unlock(&nvme_rdma_ctrl_mutex);
After unlocking nvme_rdma_ctrl_mutex we will need to restart the
list walk, as others might have changed the list (e.g. for a different
device). Maybe we should do the per-device list from the start to
simplify this loop.
> @@ -2052,6 +2060,7 @@ static void __exit nvme_rdma_cleanup_module(void)
> __nvme_rdma_del_ctrl(ctrl);
> mutex_unlock(&nvme_rdma_ctrl_mutex);
>
> + ib_unregister_client(&nvme_rdma_ib_client);
We shouldn't need the deletion loop above because now ib_unregister_client
takes care of that, right?
* [PATCH WIP/RFC v2 6/6] nvme-rdma: use ib_client API to detect device removal
2016-08-30 7:06 ` Christoph Hellwig
@ 2016-08-30 14:05 ` Steve Wise
2016-08-30 15:44 ` Steve Wise
0 siblings, 1 reply; 15+ messages in thread
From: Steve Wise @ 2016-08-30 14:05 UTC (permalink / raw)
>
> > +static void nvme_rdma_add_one(struct ib_device *ib_device);
> > +static void nvme_rdma_remove_one(struct ib_device *ib_device,
> > + void *client_data);
>
> Can we avoid the forward declarations?
>
Yes, will do.
> > +static void nvme_rdma_add_one(struct ib_device *ib_device)
> > +{
> > + /* devices are added dynamically as targets use them */
> > +}
>
> Is the add callback mandatory in the core? Would be nice if we could
> fix that up in an incremental patch.
>
We can just not populate the add function. From ib_register_device():
---
list_for_each_entry(client, &client_list, list)
if (client->add && !add_client_context(device, client))
client->add(device);
---
> > +static void nvme_rdma_remove_one(struct ib_device *ib_device, void
> *client_data)
> > +{
> > + struct nvme_rdma_ctrl *ctrl, *tmp;
> > +
> > + pr_info("Removing resources for device %s\n", ib_device->name);
> > +
> > + mutex_lock(&nvme_rdma_ctrl_mutex);
> > + list_for_each_entry_safe(ctrl, tmp, &nvme_rdma_ctrl_list, list) {
> > + int delete_ctrl;
> > +
> > + if (ctrl->device->dev != ib_device)
> > + continue;
>
> Also not important for now, but we probably should move to a per-device
> list here if we move forward with this approach.
>
> > + mutex_unlock(&nvme_rdma_ctrl_mutex);
>
> After unlocking nvme_rdma_ctrl_mutex we will need to restart the
> list walk, as others might have changed the list (e.g. for a different
> device). Maybe we should do the per-device list from the start to
> simplify this loop.
>
How would changing the list by adding/removing controllers for another device
cause this logic here to miss a controller associated with the device being
removed?
> > @@ -2052,6 +2060,7 @@ static void __exit nvme_rdma_cleanup_module(void)
> > __nvme_rdma_del_ctrl(ctrl);
> > mutex_unlock(&nvme_rdma_ctrl_mutex);
> >
> > + ib_unregister_client(&nvme_rdma_ib_client);
>
> We shouldn't need the deletion loop above because now ib_unregister_client
> takes care of that, right?
Correct.
* [PATCH WIP/RFC v2 6/6] nvme-rdma: use ib_client API to detect device removal
2016-08-30 14:05 ` Steve Wise
@ 2016-08-30 15:44 ` Steve Wise
0 siblings, 0 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-30 15:44 UTC (permalink / raw)
> > > +static void nvme_rdma_add_one(struct ib_device *ib_device)
> > > +{
> > > + /* devices are added dynamically as targets use them */
> > > +}
> >
> > Is the add callback mandatory in the core? Would be nice if we could
> > fix that up in an incremental patch.
> >
>
> We can just not populate the add function. From ib_register_device():
>
> ---
> list_for_each_entry(client, &client_list, list)
> if (client->add && !add_client_context(device, client))
> client->add(device);
> ---
I'm wrong. If you don't provide an add function, it doesn't call
add_client_context() and thus the client isn't associated with this device and
won't get remove() calls.
Do you think this should be changed in the core? I think it is simpler just
to have a noop add function, and it keeps the current symmetry of the
ib_client API...
Steve.
* [PATCH WIP/RFC v2 6/6] nvme-rdma: use ib_client API to detect device removal
2016-08-29 21:35 ` [PATCH WIP/RFC v2 6/6] nvme-rdma: use ib_client API to detect device removal Steve Wise
2016-08-30 7:06 ` Christoph Hellwig
@ 2016-08-30 15:46 ` Sagi Grimberg
2016-08-30 15:48 ` Steve Wise
1 sibling, 1 reply; 15+ messages in thread
From: Sagi Grimberg @ 2016-08-30 15:46 UTC (permalink / raw)
> +static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
> +{
> + struct nvme_rdma_ctrl *ctrl, *tmp;
> +
> + pr_info("Removing resources for device %s\n", ib_device->name);
> +
> + mutex_lock(&nvme_rdma_ctrl_mutex);
> + list_for_each_entry_safe(ctrl, tmp, &nvme_rdma_ctrl_list, list) {
> + int delete_ctrl;
> +
> + if (ctrl->device->dev != ib_device)
> + continue;
> +
> + /*
> + * Keep a reference until all work is flushed since
> + * __nvme_rdma_del_ctrl can free the ctrl mem
> + */
> + kref_get(&ctrl->ctrl.kref);
> + delete_ctrl = nvme_change_ctrl_state(&ctrl->ctrl,
> + NVME_CTRL_DELETING);
> + mutex_unlock(&nvme_rdma_ctrl_mutex);
> +
> + dev_info(ctrl->ctrl.device,
> + "Removing ctrl: NQN \"%s\", addr %pISp\n",
> + ctrl->ctrl.opts->subsysnqn, &ctrl->addr);
> +
> + if (delete_ctrl) {
> +
> + /* Get rid of reconnect work if its running */
> + cancel_delayed_work_sync(&ctrl->reconnect_work);
> +
> + /* queue controller deletion */
> + queue_work(nvme_rdma_wq, &ctrl->delete_work);
> + }
> +
> + /* wait for deletion work to complete */
> + flush_work(&ctrl->delete_work);
> + nvme_put_ctrl(&ctrl->ctrl);
> + mutex_lock(&nvme_rdma_ctrl_mutex);
> + }
> + mutex_unlock(&nvme_rdma_ctrl_mutex);
> +}
Will this code work?
mutex_lock(&nvme_rdma_ctrl_mutex);
list_for_each_entry(ctrl, &nvme_rdma_ctrl_list, list) {
if (ctrl->device->dev != ib_device)
continue;
__nvme_rdma_del_ctrl(ctrl);
}
mutex_unlock(&nvme_rdma_ctrl_mutex);
flush_workqueue(nvme_rdma_wq);
* [PATCH WIP/RFC v2 6/6] nvme-rdma: use ib_client API to detect device removal
2016-08-30 15:46 ` Sagi Grimberg
@ 2016-08-30 15:48 ` Steve Wise
2016-08-30 15:56 ` Sagi Grimberg
0 siblings, 1 reply; 15+ messages in thread
From: Steve Wise @ 2016-08-30 15:48 UTC (permalink / raw)
> > +static void nvme_rdma_remove_one(struct ib_device *ib_device, void
> *client_data)
> > +{
> > + struct nvme_rdma_ctrl *ctrl, *tmp;
> > +
> > + pr_info("Removing resources for device %s\n", ib_device->name);
> > +
> > + mutex_lock(&nvme_rdma_ctrl_mutex);
> > + list_for_each_entry_safe(ctrl, tmp, &nvme_rdma_ctrl_list, list) {
> > + int delete_ctrl;
> > +
> > + if (ctrl->device->dev != ib_device)
> > + continue;
> > +
> > + /*
> > + * Keep a reference until all work is flushed since
> > + * __nvme_rdma_del_ctrl can free the ctrl mem
> > + */
> > + kref_get(&ctrl->ctrl.kref);
> > + delete_ctrl = nvme_change_ctrl_state(&ctrl->ctrl,
> > + NVME_CTRL_DELETING);
> > + mutex_unlock(&nvme_rdma_ctrl_mutex);
> > +
> > + dev_info(ctrl->ctrl.device,
> > + "Removing ctrl: NQN \"%s\", addr %pISp\n",
> > + ctrl->ctrl.opts->subsysnqn, &ctrl->addr);
> > +
> > + if (delete_ctrl) {
> > +
> > + /* Get rid of reconnect work if its running */
> > + cancel_delayed_work_sync(&ctrl->reconnect_work);
> > +
> > + /* queue controller deletion */
> > + queue_work(nvme_rdma_wq, &ctrl->delete_work);
> > + }
> > +
> > + /* wait for deletion work to complete */
> > + flush_work(&ctrl->delete_work);
> > + nvme_put_ctrl(&ctrl->ctrl);
> > + mutex_lock(&nvme_rdma_ctrl_mutex);
> > + }
> > + mutex_unlock(&nvme_rdma_ctrl_mutex);
> > +}
>
> Will this code work?
>
> mutex_lock(&nvme_rdma_ctrl_mutex);
> list_for_each_entry(ctrl, &nvme_rdma_ctrl_list, list) {
> if (ctrl->device->dev != ib_device)
> continue;
>
> __nvme_rdma_del_ctrl(ctrl);
> }
> mutex_unlock(&nvme_rdma_ctrl_mutex);
>
> flush_workqueue(nvme_rdma_wq);
I'll try this. It is cleaner/easier to understand.
I'm also moving the controller list to per-device as Christoph recommends.
Steve.
* [PATCH WIP/RFC v2 6/6] nvme-rdma: use ib_client API to detect device removal
2016-08-30 15:48 ` Steve Wise
@ 2016-08-30 15:56 ` Sagi Grimberg
2016-08-30 15:57 ` Steve Wise
2016-08-30 16:00 ` Christoph Hellwig
0 siblings, 2 replies; 15+ messages in thread
From: Sagi Grimberg @ 2016-08-30 15:56 UTC (permalink / raw)
>> Will this code work?
>>
>> mutex_lock(&nvme_rdma_ctrl_mutex);
>> list_for_each_entry(ctrl, &nvme_rdma_ctrl_list, list) {
>> if (ctrl->device->dev != ib_device)
>> continue;
>>
>> __nvme_rdma_del_ctrl(ctrl);
>> }
>> mutex_unlock(&nvme_rdma_ctrl_mutex);
>>
>> flush_workqueue(nvme_rdma_wq);
>
> I'll try this. It is cleaner/easier to understand.
>
> I'm also moving the controller list to per-device as Christoph recommends.
I'm still not sure it's a real must. This would only save us the check
(as we'd get the device from the client data).
* [PATCH WIP/RFC v2 6/6] nvme-rdma: use ib_client API to detect device removal
2016-08-30 15:56 ` Sagi Grimberg
@ 2016-08-30 15:57 ` Steve Wise
2016-08-30 16:00 ` Christoph Hellwig
1 sibling, 0 replies; 15+ messages in thread
From: Steve Wise @ 2016-08-30 15:57 UTC (permalink / raw)
>
> >> Will this code work?
> >>
> >> mutex_lock(&nvme_rdma_ctrl_mutex);
> >> list_for_each_entry(ctrl, &nvme_rdma_ctrl_list, list) {
> >> if (ctrl->device->dev != ib_device)
> >> continue;
> >>
> >> __nvme_rdma_del_ctrl(ctrl);
> >> }
> >> mutex_unlock(&nvme_rdma_ctrl_mutex);
> >>
> >> flush_workqueue(nvme_rdma_wq);
> >
> > I'll try this. It is cleaner/easier to understand.
> >
> > I'm also moving the controller list to per-device as Christoph recommends.
>
> I'm still not sure if it's a real must. This will only save us the
> condition (as we'll get the device in the client data)
True, and your logic avoids the unlock/relock.
* [PATCH WIP/RFC v2 6/6] nvme-rdma: use ib_client API to detect device removal
2016-08-30 15:56 ` Sagi Grimberg
2016-08-30 15:57 ` Steve Wise
@ 2016-08-30 16:00 ` Christoph Hellwig
1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2016-08-30 16:00 UTC (permalink / raw)
On Tue, Aug 30, 2016@06:56:13PM +0300, Sagi Grimberg wrote:
> I'm still not sure if it's a real must. This will only save us the
> condition (as we'll get the device in the client data)
We can skip it for now.