* [PATCH rfc 01/30] nvme: Add admin connect request queue
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
In case we reconnect with inflight admin IO we
need to make sure that the connect comes before
the admin command. This can only be achieved by
using a separate request queue for admin connects.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/fabrics.c | 2 +-
drivers/nvme/host/fc.c | 11 ++++++++++-
drivers/nvme/host/nvme.h | 1 +
drivers/nvme/host/rdma.c | 19 ++++++++++++++-----
drivers/nvme/target/loop.c | 17 +++++++++++++----
5 files changed, 39 insertions(+), 11 deletions(-)
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 64db2c46c5ea..bd99bbb1faa3 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -412,7 +412,7 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
strncpy(data->subsysnqn, ctrl->opts->subsysnqn, NVMF_NQN_SIZE);
strncpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE);
- ret = __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, &res,
+ ret = __nvme_submit_sync_cmd(ctrl->admin_connect_q, &cmd, &res,
data, sizeof(*data), 0, NVME_QID_ANY, 1,
BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
if (ret) {
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 5d5ecefd8dbe..25ee49037edb 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1703,6 +1703,7 @@ nvme_fc_ctrl_free(struct kref *ref)
list_del(&ctrl->ctrl_list);
spin_unlock_irqrestore(&ctrl->rport->lock, flags);
+ blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
blk_cleanup_queue(ctrl->ctrl.admin_q);
blk_mq_free_tag_set(&ctrl->admin_tag_set);
@@ -2745,6 +2746,12 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
goto out_free_admin_tag_set;
}
+ ctrl->ctrl.admin_connect_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+ if (IS_ERR(ctrl->ctrl.admin_connect_q)) {
+ ret = PTR_ERR(ctrl->ctrl.admin_connect_q);
+ goto out_cleanup_admin_q;
+ }
+
/*
* Would have been nice to init io queues tag set as well.
* However, we require interaction from the controller
@@ -2754,7 +2761,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_fc_ctrl_ops, 0);
if (ret)
- goto out_cleanup_admin_q;
+ goto out_cleanup_admin_connect_q;
/* at this point, teardown path changes to ref counting on nvme ctrl */
@@ -2791,6 +2798,8 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
return &ctrl->ctrl;
+out_cleanup_admin_connect_q:
+ blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
out_cleanup_admin_q:
blk_cleanup_queue(ctrl->ctrl.admin_q);
out_free_admin_tag_set:
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index f27c58b860f4..67147b49d992 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -121,6 +121,7 @@ struct nvme_ctrl {
const struct nvme_ctrl_ops *ops;
struct request_queue *admin_q;
struct request_queue *connect_q;
+ struct request_queue *admin_connect_q;
struct device *dev;
struct kref kref;
int instance;
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 7533138d2244..cb7f81d9098f 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -661,6 +661,7 @@ static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl)
nvme_rdma_free_qe(ctrl->queues[0].device->dev, &ctrl->async_event_sqe,
sizeof(struct nvme_command), DMA_TO_DEVICE);
nvme_rdma_stop_and_free_queue(&ctrl->queues[0]);
+ blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
blk_cleanup_queue(ctrl->ctrl.admin_q);
blk_mq_free_tag_set(&ctrl->admin_tag_set);
nvme_rdma_dev_put(ctrl->device);
@@ -1583,9 +1584,15 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl)
goto out_free_tagset;
}
+ ctrl->ctrl.admin_connect_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+ if (IS_ERR(ctrl->ctrl.admin_connect_q)) {
+ error = PTR_ERR(ctrl->ctrl.admin_connect_q);
+ goto out_cleanup_queue;
+ }
+
error = nvmf_connect_admin_queue(&ctrl->ctrl);
if (error)
- goto out_cleanup_queue;
+ goto out_cleanup_connect_queue;
set_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[0].flags);
@@ -1593,7 +1600,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl)
if (error) {
dev_err(ctrl->ctrl.device,
"prop_get NVME_REG_CAP failed\n");
- goto out_cleanup_queue;
+ goto out_cleanup_connect_queue;
}
ctrl->ctrl.sqsize =
@@ -1601,25 +1608,27 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl)
error = nvme_enable_ctrl(&ctrl->ctrl, ctrl->cap);
if (error)
- goto out_cleanup_queue;
+ goto out_cleanup_connect_queue;
ctrl->ctrl.max_hw_sectors =
(ctrl->max_fr_pages - 1) << (PAGE_SHIFT - 9);
error = nvme_init_identify(&ctrl->ctrl);
if (error)
- goto out_cleanup_queue;
+ goto out_cleanup_connect_queue;
error = nvme_rdma_alloc_qe(ctrl->queues[0].device->dev,
&ctrl->async_event_sqe, sizeof(struct nvme_command),
DMA_TO_DEVICE);
if (error)
- goto out_cleanup_queue;
+ goto out_cleanup_connect_queue;
nvme_start_keep_alive(&ctrl->ctrl);
return 0;
+out_cleanup_connect_queue:
+ blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
out_cleanup_queue:
blk_cleanup_queue(ctrl->ctrl.admin_q);
out_free_tagset:
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index 86c09e2a1490..edd9ee04de02 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -278,6 +278,7 @@ static const struct blk_mq_ops nvme_loop_admin_mq_ops = {
static void nvme_loop_destroy_admin_queue(struct nvme_loop_ctrl *ctrl)
{
nvmet_sq_destroy(&ctrl->queues[0].nvme_sq);
+ blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
blk_cleanup_queue(ctrl->ctrl.admin_q);
blk_mq_free_tag_set(&ctrl->admin_tag_set);
}
@@ -384,15 +385,21 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
goto out_free_tagset;
}
+ ctrl->ctrl.admin_connect_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+ if (IS_ERR(ctrl->ctrl.admin_connect_q)) {
+ error = PTR_ERR(ctrl->ctrl.admin_connect_q);
+ goto out_cleanup_queue;
+ }
+
error = nvmf_connect_admin_queue(&ctrl->ctrl);
if (error)
- goto out_cleanup_queue;
+ goto out_cleanup_connect_queue;
error = nvmf_reg_read64(&ctrl->ctrl, NVME_REG_CAP, &ctrl->cap);
if (error) {
dev_err(ctrl->ctrl.device,
"prop_get NVME_REG_CAP failed\n");
- goto out_cleanup_queue;
+ goto out_cleanup_connect_queue;
}
ctrl->ctrl.sqsize =
@@ -400,19 +407,21 @@ static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
error = nvme_enable_ctrl(&ctrl->ctrl, ctrl->cap);
if (error)
- goto out_cleanup_queue;
+ goto out_cleanup_connect_queue;
ctrl->ctrl.max_hw_sectors =
(NVME_LOOP_MAX_SEGMENTS - 1) << (PAGE_SHIFT - 9);
error = nvme_init_identify(&ctrl->ctrl);
if (error)
- goto out_cleanup_queue;
+ goto out_cleanup_connect_queue;
nvme_start_keep_alive(&ctrl->ctrl);
return 0;
+out_cleanup_connect_queue:
+ blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
out_cleanup_queue:
blk_cleanup_queue(ctrl->ctrl.admin_q);
out_free_tagset:
--
2.7.4
* Re: [PATCH rfc 01/30] nvme: Add admin connect request queue
From: Christoph Hellwig @ 2017-06-19 7:13 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, linux-block
On Sun, Jun 18, 2017 at 06:21:35PM +0300, Sagi Grimberg wrote:
> In case we reconnect with inflight admin IO we
> need to make sure that the connect comes before
> the admin command. This can only be achieved by
> using a separate request queue for admin connects.
Use up a few more lines of the available space for your lines? :)
Wouldn't a head insertion also solve the problem?
* Re: [PATCH rfc 01/30] nvme: Add admin connect request queue
From: Sagi Grimberg @ 2017-06-19 7:49 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-nvme, Keith Busch, linux-block
>> In case we reconnect with inflight admin IO we
>> need to make sure that the connect comes before
>> the admin command. This can only be achieved by
>> using a separate request queue for admin connects.
>
> Use up a few more lines of the available space for your lines? :)
I warned in the cover-letter that the change logs are pure
negligence at the moment :)
> Wouldn't a head insertion also solve the problem?
The head insertion will not protect against it, because we must
invoke blk_mq_start_stopped_hw_queues on the admin_q request queue
so the admin connect can make progress; at this point (and before
we actually queue up the connect) pending admin commands can sneak
in...
However, you raise a valid point. I think I added this before we
had the queue_is_ready protection, which will reject the command
if the queue is not LIVE (unless it's a connect). I think the
reason it's still in is that I tested this with loop, which
doesn't have a per-queue state machine.
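
(For reference, the queue_is_ready gate being referred to is roughly
of the following shape; this is a paraphrased sketch, not a verbatim
copy of the rdma driver code:)

static inline int nvme_rdma_queue_is_ready(struct nvme_rdma_queue *queue,
		struct request *rq)
{
	if (unlikely(!test_bit(NVME_RDMA_Q_LIVE, &queue->flags))) {
		struct nvme_command *cmd = nvme_req(rq)->cmd;

		/* while not LIVE, only a fabrics connect capsule may pass */
		if (!blk_rq_is_passthrough(rq) ||
		    cmd->common.opcode != nvme_fabrics_command ||
		    cmd->fabrics.fctype != nvme_fabrics_type_connect)
			return -EIO;
	}
	return 0;
}

The important property is that only a fabrics connect can get through
while the queue is not marked LIVE.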
I'm still wondering if it's a good idea to rely on the transport
queue state to reject non-connect requests on non-LIVE queues.
If/when we introduce a queue representation to the core and drive
the state machine there, then we could actually rely on it (I do
have some code for it, but it's a pretty massive change which
cannot be added in an incremental fashion).
Thoughts?
* Re: [PATCH rfc 01/30] nvme: Add admin connect request queue
From: Christoph Hellwig @ 2017-06-19 12:30 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: Christoph Hellwig, linux-nvme, Keith Busch, linux-block
On Mon, Jun 19, 2017 at 10:49:15AM +0300, Sagi Grimberg wrote:
> However, you raise a valid point. I think I added this before we
> had the queue_is_ready protection, which will reject the command
> if the queue is not LIVE (unless it's a connect). I think the
> reason it's still in is that I tested this with loop, which
> doesn't have a per-queue state machine.
Yeah.
> I'm still wondering if it's a good idea to rely on the transport
> queue state to reject non-connect requests on non-LIVE queues.
> If/when we introduce a queue representation to the core and drive
> the state machine there, then we could actually rely on it (I do
> have some code for it, but it's a pretty massive change which
> cannot be added in an incremental fashion).
I suspect moving the state machine to the core is a good idea. Note that
the current nvme_rdma_queue_is_ready hack actually seems a bit too simple -
even after the connect we should only allow get/set Property. Nevermind
the additional complications if/when authentication is implemented.
* Re: [PATCH rfc 01/30] nvme: Add admin connect request queue
From: Hannes Reinecke @ 2017-06-19 15:56 UTC (permalink / raw)
To: Sagi Grimberg, Christoph Hellwig; +Cc: linux-nvme, Keith Busch, linux-block
On 06/19/2017 09:49 AM, Sagi Grimberg wrote:
>
>>> In case we reconnect with inflight admin IO we
>>> need to make sure that the connect comes before
>>> the admin command. This can only be achieved by
>>> using a separate request queue for admin connects.
>>
>> Use up a few more lines of the available space for your lines? :)
>
> I warned in the cover-letter that the change logs are pure
> negligence at the moment :)
>
>> Wouldn't a head insertion also solve the problem?
>
> The head insertion will not protect against it, because we must
> invoke blk_mq_start_stopped_hw_queues on the admin_q request queue
> so the admin connect can make progress; at this point (and before
> we actually queue up the connect) pending admin commands can sneak
> in...
>
> However, you raise a valid point. I think I added this before we
> had the queue_is_ready protection, which will reject the command
> if the queue is not LIVE (unless it's a connect). I think the
> reason it's still in is that I tested this with loop, which
> doesn't have a per-queue state machine.
>
> I'm still wondering if it's a good idea to rely on the transport
> queue state to reject non-connect requests on non-LIVE queues.
> If/when we introduce a queue representation to the core and drive
> the state machine there, then we could actually rely on it (I do
> have some code for it, but it's a pretty massive change which
> cannot be added in an incremental fashion).
>
> Thoughts?
I very much prefer this solution, i.e. rejecting all commands if the
queue state is not 'LIVE', and making the 'connect' command the only
one accepted in that state.
(Actually, we would need a state 'READY_TO_CONNECT' or somesuch to make
this work properly.)
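
A minimal sketch of what such a per-queue state machine could look
like; the names below (including the READY_TO_CONNECT-style state)
are hypothetical and not taken from this series or any driver:

enum nvme_queue_state {
	NVME_QUEUE_NEW,			/* allocated, transport not connected */
	NVME_QUEUE_READY_TO_CONNECT,	/* only the fabrics connect may pass */
	NVME_QUEUE_LIVE,		/* normal command flow allowed */
	NVME_QUEUE_DELETING,		/* tearing down, reject everything */
};

static bool nvme_queue_accepts_cmd(enum nvme_queue_state state,
		bool is_fabrics_connect)
{
	switch (state) {
	case NVME_QUEUE_LIVE:
		return true;
	case NVME_QUEUE_READY_TO_CONNECT:
		return is_fabrics_connect;
	default:
		return false;
	}
}

Driving this from the core would let every transport share the same
gating instead of open-coding a LIVE bit per driver.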
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
* [PATCH rfc 02/30] nvme-rdma: Don't alloc/free the tagset on reset
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
Also don't alloc/free the admin and admin_connect request queues.
This is not something we should do on controller resets.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 118 ++++++++++++++++++++++++++---------------------
1 file changed, 65 insertions(+), 53 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index cb7f81d9098f..3e4c6aa119ee 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -656,15 +656,19 @@ static int nvme_rdma_init_io_queues(struct nvme_rdma_ctrl *ctrl)
return ret;
}
-static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl)
+static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl, bool remove)
{
+ nvme_rdma_stop_queue(&ctrl->queues[0]);
+ if (remove) {
+ blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
+ blk_cleanup_queue(ctrl->ctrl.admin_q);
+ blk_mq_free_tag_set(&ctrl->admin_tag_set);
+ nvme_rdma_dev_put(ctrl->device);
+ }
+
nvme_rdma_free_qe(ctrl->queues[0].device->dev, &ctrl->async_event_sqe,
sizeof(struct nvme_command), DMA_TO_DEVICE);
- nvme_rdma_stop_and_free_queue(&ctrl->queues[0]);
- blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
- blk_cleanup_queue(ctrl->ctrl.admin_q);
- blk_mq_free_tag_set(&ctrl->admin_tag_set);
- nvme_rdma_dev_put(ctrl->device);
+ nvme_rdma_free_queue(&ctrl->queues[0]);
}
static void nvme_rdma_free_ctrl(struct nvme_ctrl *nctrl)
@@ -1542,7 +1546,7 @@ static const struct blk_mq_ops nvme_rdma_admin_mq_ops = {
.timeout = nvme_rdma_timeout,
};
-static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl)
+static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new)
{
int error;
@@ -1551,43 +1555,44 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl)
return error;
ctrl->device = ctrl->queues[0].device;
-
- /*
- * We need a reference on the device as long as the tag_set is alive,
- * as the MRs in the request structures need a valid ib_device.
- */
- error = -EINVAL;
- if (!nvme_rdma_dev_get(ctrl->device))
- goto out_free_queue;
-
ctrl->max_fr_pages = min_t(u32, NVME_RDMA_MAX_SEGMENTS,
ctrl->device->dev->attrs.max_fast_reg_page_list_len);
- memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set));
- ctrl->admin_tag_set.ops = &nvme_rdma_admin_mq_ops;
- ctrl->admin_tag_set.queue_depth = NVME_RDMA_AQ_BLKMQ_DEPTH;
- ctrl->admin_tag_set.reserved_tags = 2; /* connect + keep-alive */
- ctrl->admin_tag_set.numa_node = NUMA_NO_NODE;
- ctrl->admin_tag_set.cmd_size = sizeof(struct nvme_rdma_request) +
- SG_CHUNK_SIZE * sizeof(struct scatterlist);
- ctrl->admin_tag_set.driver_data = ctrl;
- ctrl->admin_tag_set.nr_hw_queues = 1;
- ctrl->admin_tag_set.timeout = ADMIN_TIMEOUT;
-
- error = blk_mq_alloc_tag_set(&ctrl->admin_tag_set);
- if (error)
- goto out_put_dev;
-
- ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
- if (IS_ERR(ctrl->ctrl.admin_q)) {
- error = PTR_ERR(ctrl->ctrl.admin_q);
- goto out_free_tagset;
- }
+ if (new) {
+ /*
+ * We need a reference on the device as long as the tag_set is alive,
+ * as the MRs in the request structures need a valid ib_device.
+ */
+ error = -EINVAL;
+ if (!nvme_rdma_dev_get(ctrl->device))
+ goto out_free_queue;
+
+ memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set));
+ ctrl->admin_tag_set.ops = &nvme_rdma_admin_mq_ops;
+ ctrl->admin_tag_set.queue_depth = NVME_RDMA_AQ_BLKMQ_DEPTH;
+ ctrl->admin_tag_set.reserved_tags = 2; /* connect + keep-alive */
+ ctrl->admin_tag_set.numa_node = NUMA_NO_NODE;
+ ctrl->admin_tag_set.cmd_size = sizeof(struct nvme_rdma_request) +
+ SG_CHUNK_SIZE * sizeof(struct scatterlist);
+ ctrl->admin_tag_set.driver_data = ctrl;
+ ctrl->admin_tag_set.nr_hw_queues = 1;
+ ctrl->admin_tag_set.timeout = ADMIN_TIMEOUT;
+
+ error = blk_mq_alloc_tag_set(&ctrl->admin_tag_set);
+ if (error)
+ goto out_put_dev;
+
+ ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+ if (IS_ERR(ctrl->ctrl.admin_q)) {
+ error = PTR_ERR(ctrl->ctrl.admin_q);
+ goto out_free_tagset;
+ }
- ctrl->ctrl.admin_connect_q = blk_mq_init_queue(&ctrl->admin_tag_set);
- if (IS_ERR(ctrl->ctrl.admin_connect_q)) {
- error = PTR_ERR(ctrl->ctrl.admin_connect_q);
- goto out_cleanup_queue;
+ ctrl->ctrl.admin_connect_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+ if (IS_ERR(ctrl->ctrl.admin_connect_q)) {
+ error = PTR_ERR(ctrl->ctrl.admin_connect_q);
+ goto out_cleanup_queue;
+ }
}
error = nvmf_connect_admin_queue(&ctrl->ctrl);
@@ -1596,6 +1601,8 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl)
set_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[0].flags);
+ blk_mq_start_stopped_hw_queues(ctrl->ctrl.admin_q, true);
+
error = nvmf_reg_read64(&ctrl->ctrl, NVME_REG_CAP, &ctrl->cap);
if (error) {
dev_err(ctrl->ctrl.device,
@@ -1628,21 +1635,26 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl)
return 0;
out_cleanup_connect_queue:
- blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
+ if (new)
+ blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
out_cleanup_queue:
- blk_cleanup_queue(ctrl->ctrl.admin_q);
+ if (new)
+ blk_cleanup_queue(ctrl->ctrl.admin_q);
out_free_tagset:
- /* disconnect and drain the queue before freeing the tagset */
- nvme_rdma_stop_queue(&ctrl->queues[0]);
- blk_mq_free_tag_set(&ctrl->admin_tag_set);
+ if (new) {
+ /* disconnect and drain the queue before freeing the tagset */
+ nvme_rdma_stop_queue(&ctrl->queues[0]);
+ blk_mq_free_tag_set(&ctrl->admin_tag_set);
+ }
out_put_dev:
- nvme_rdma_dev_put(ctrl->device);
+ if (new)
+ nvme_rdma_dev_put(ctrl->device);
out_free_queue:
nvme_rdma_free_queue(&ctrl->queues[0]);
return error;
}
-static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl)
+static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
{
nvme_stop_keep_alive(&ctrl->ctrl);
cancel_work_sync(&ctrl->err_work);
@@ -1661,14 +1673,14 @@ static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl)
blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
nvme_cancel_request, &ctrl->ctrl);
- nvme_rdma_destroy_admin_queue(ctrl);
+ nvme_rdma_destroy_admin_queue(ctrl, shutdown);
}
static void __nvme_rdma_remove_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
{
nvme_uninit_ctrl(&ctrl->ctrl);
if (shutdown)
- nvme_rdma_shutdown_ctrl(ctrl);
+ nvme_rdma_shutdown_ctrl(ctrl, shutdown);
if (ctrl->ctrl.tagset) {
blk_cleanup_queue(ctrl->ctrl.connect_q);
@@ -1731,9 +1743,9 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
int ret;
bool changed;
- nvme_rdma_shutdown_ctrl(ctrl);
+ nvme_rdma_shutdown_ctrl(ctrl, false);
- ret = nvme_rdma_configure_admin_queue(ctrl);
+ ret = nvme_rdma_configure_admin_queue(ctrl, false);
if (ret) {
/* ctrl is already shutdown, just remove the ctrl */
INIT_WORK(&ctrl->delete_work, nvme_rdma_remove_ctrl_work);
@@ -1898,7 +1910,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
if (!ctrl->queues)
goto out_uninit_ctrl;
- ret = nvme_rdma_configure_admin_queue(ctrl);
+ ret = nvme_rdma_configure_admin_queue(ctrl, true);
if (ret)
goto out_kfree_queues;
@@ -1959,7 +1971,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
out_remove_admin_queue:
nvme_stop_keep_alive(&ctrl->ctrl);
- nvme_rdma_destroy_admin_queue(ctrl);
+ nvme_rdma_destroy_admin_queue(ctrl, true);
out_kfree_queues:
kfree(ctrl->queues);
out_uninit_ctrl:
--
2.7.4
* Re: [PATCH rfc 02/30] nvme-rdma: Don't alloc/free the tagset on reset
From: Christoph Hellwig @ 2017-06-19 7:18 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, linux-block
> +static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl, bool remove)
> {
> + nvme_rdma_stop_queue(&ctrl->queues[0]);
> + if (remove) {
> + blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
> + blk_cleanup_queue(ctrl->ctrl.admin_q);
> + blk_mq_free_tag_set(&ctrl->admin_tag_set);
> + nvme_rdma_dev_put(ctrl->device);
> + }
> +
> nvme_rdma_free_qe(ctrl->queues[0].device->dev, &ctrl->async_event_sqe,
> sizeof(struct nvme_command), DMA_TO_DEVICE);
> + nvme_rdma_free_queue(&ctrl->queues[0]);
I don't like the calling convention. We only have two callers
anyway. So I'd much rather only keep the code inside the if above
in the new nvme_rdma_destroy_admin_queue that is only called at shutdown
time, and opencode the calls to nvme_rdma_stop_queue, nvme_rdma_free_qe
and nvme_rdma_free_queue in the callers.
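
Roughly what that would look like (an assumed shape, not an actual
follow-up patch): nvme_rdma_destroy_admin_queue() keeps only the
shutdown-time work, and each caller open-codes the per-queue teardown
it needs.

/* only called at shutdown/removal time */
static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl)
{
	blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
	blk_cleanup_queue(ctrl->ctrl.admin_q);
	blk_mq_free_tag_set(&ctrl->admin_tag_set);
	nvme_rdma_dev_put(ctrl->device);
}

	/* ...and each caller (e.g. the reconnect path) then does the
	 * per-queue teardown itself:
	 */
	nvme_rdma_stop_queue(&ctrl->queues[0]);
	nvme_rdma_free_qe(ctrl->queues[0].device->dev, &ctrl->async_event_sqe,
			sizeof(struct nvme_command), DMA_TO_DEVICE);
	nvme_rdma_free_queue(&ctrl->queues[0]);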
> -static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl)
> +static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new)
PCIe is just checking for a non-null admin_q. But I think we should
just split this into two functions, one for the shared code at the end
and one just for the first-time setup, with the nvme_rdma_init_queue
call open coded.
> error = nvmf_connect_admin_queue(&ctrl->ctrl);
> @@ -1596,6 +1601,8 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl)
>
> set_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[0].flags);
>
> + blk_mq_start_stopped_hw_queues(ctrl->ctrl.admin_q, true);
> +
Where does this come from?
* Re: [PATCH rfc 02/30] nvme-rdma: Don't alloc/free the tagset on reset
From: Sagi Grimberg @ 2017-06-19 7:59 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-nvme, Keith Busch, linux-block
On 19/06/17 10:18, Christoph Hellwig wrote:
>> +static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl, bool remove)
>> {
>> + nvme_rdma_stop_queue(&ctrl->queues[0]);
>> + if (remove) {
>> + blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
>> + blk_cleanup_queue(ctrl->ctrl.admin_q);
>> + blk_mq_free_tag_set(&ctrl->admin_tag_set);
>> + nvme_rdma_dev_put(ctrl->device);
>> + }
>> +
>> nvme_rdma_free_qe(ctrl->queues[0].device->dev, &ctrl->async_event_sqe,
>> sizeof(struct nvme_command), DMA_TO_DEVICE);
>> + nvme_rdma_free_queue(&ctrl->queues[0]);
>
> I don't like the calling convention. We only have two callers
> anyway. So I'd much rather only keep the code inside the if above
> in the new nvme_rdma_destroy_admin_queue that is only called at shutdown
> time, and opencode the calls to nvme_rdma_stop_queue, nvme_rdma_free_qe
> and nvme_rdma_free_queue in the callers.
We can do that, but this tries to eliminate duplicate code as
much as possible. It's not like the convention is unprecedented...
>> -static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl)
>> +static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new)
>
> PCIe is just checking for a non-null admin_q.
Which I don't like very much :)
> But I think we should
> just split this into two functions, one for the shared code at the end
> and one just for the first-time setup, with the nvme_rdma_init_queue
> call open coded.
We can split, but I'm less keen on the idea of open-coding
nvme_rdma_init_queue at the call-sites.
>> error = nvmf_connect_admin_queue(&ctrl->ctrl);
>> @@ -1596,6 +1601,8 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl)
>>
>> set_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[0].flags);
>>
>> + blk_mq_start_stopped_hw_queues(ctrl->ctrl.admin_q, true);
>> +
>
> Where does this come from?
Spilled in I guess...
* Re: [PATCH rfc 02/30] nvme-rdma: Don't alloc/free the tagset on reset
From: James Smart @ 2017-07-10 18:50 UTC (permalink / raw)
To: Christoph Hellwig, Sagi Grimberg; +Cc: Keith Busch, linux-block, linux-nvme
On 6/19/2017 12:18 AM, Christoph Hellwig wrote:
>> +static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl, bool remove)
>> {
>> + nvme_rdma_stop_queue(&ctrl->queues[0]);
>> + if (remove) {
>> + blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
>> + blk_cleanup_queue(ctrl->ctrl.admin_q);
>> + blk_mq_free_tag_set(&ctrl->admin_tag_set);
>> + nvme_rdma_dev_put(ctrl->device);
>> + }
>> +
>> nvme_rdma_free_qe(ctrl->queues[0].device->dev, &ctrl->async_event_sqe,
>> sizeof(struct nvme_command), DMA_TO_DEVICE);
>> + nvme_rdma_free_queue(&ctrl->queues[0]);
> I don't like the calling convention. We only have two callers
> anyway. So I'd much rather only keep the code inside the if above
> in the new nvme_rdma_destroy_admin_queue that is only called at shutdown
> time, and opencode the calls to nvme_rdma_stop_queue, nvme_rdma_free_qe
> and nvme_rdma_free_queue in the callers.
>
Any chance you can make the organization like what I did with FC and
avoid all the "new" and "remove" flags?
e.g. code blocks for:
- allocation/initialization for the controller and the tag sets.
Basically initial allocation/creation of everything that would be the
os-facing side of the controller.
- an association (or call it a session) create. Basically everything
that makes the link-side ties to the subsystem and creates the
controller and its connections. Does admin queue creation, controller
init, and io queue creation, and enablement of the blk-mq queues as it
does so.
- an association teardown. Basically everything that stops the blk-mq
queues and tears down the link-side ties to the controller.
- a final controller teardown, which removes it from the system.
Everything that terminates the os-facing side of the controller.
-- james
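
A hypothetical skeleton of the split James describes; the nvme_xxx_*
names are made up for illustration and are not from the FC driver or
from this series:

/* 1) os-facing allocation: ctrl struct, tag sets, request queues */
static int nvme_xxx_alloc_ctrl(struct nvme_xxx_ctrl *ctrl);

/*
 * 2) association (session) create: admin queue setup, controller init,
 *    io queue creation, unquiescing the blk-mq queues as each comes up
 */
static int nvme_xxx_create_association(struct nvme_xxx_ctrl *ctrl);

/*
 * 3) association teardown: quiesce the blk-mq queues, fail or flush
 *    inflight requests, drop the link-side ties to the controller
 */
static void nvme_xxx_delete_association(struct nvme_xxx_ctrl *ctrl);

/* 4) final teardown: remove the os-facing side of the controller */
static void nvme_xxx_free_ctrl(struct nvme_xxx_ctrl *ctrl);

Reset and reconnect then become delete_association() followed by
create_association(), without any "new"/"remove" flags in the setup
and teardown helpers.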
* [PATCH rfc 03/30] nvme-rdma: reuse configure/destroy admin queue
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
We have all we need in these functions now that they are
aware of whether we are doing a full instantiation/removal.
For that we move nvme_rdma_configure_admin_queue to avoid
a forward declaration, and we add a blk_mq_ops forward declaration.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 253 ++++++++++++++++++++++-------------------------
1 file changed, 119 insertions(+), 134 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 3e4c6aa119ee..5fef5545e365 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -140,6 +140,9 @@ static DEFINE_MUTEX(device_list_mutex);
static LIST_HEAD(nvme_rdma_ctrl_list);
static DEFINE_MUTEX(nvme_rdma_ctrl_mutex);
+static const struct blk_mq_ops nvme_rdma_mq_ops;
+static const struct blk_mq_ops nvme_rdma_admin_mq_ops;
+
/*
* Disabling this option makes small I/O goes faster, but is fundamentally
* unsafe. With it turned off we will have to register a global rkey that
@@ -562,20 +565,22 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
static void nvme_rdma_stop_queue(struct nvme_rdma_queue *queue)
{
+ if (test_bit(NVME_RDMA_Q_DELETING, &queue->flags))
+ return;
rdma_disconnect(queue->cm_id);
ib_drain_qp(queue->qp);
}
static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
{
+ if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags))
+ return;
nvme_rdma_destroy_queue_ib(queue);
rdma_destroy_id(queue->cm_id);
}
static void nvme_rdma_stop_and_free_queue(struct nvme_rdma_queue *queue)
{
- if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags))
- return;
nvme_rdma_stop_queue(queue);
nvme_rdma_free_queue(queue);
}
@@ -671,6 +676,116 @@ static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl, bool remo
nvme_rdma_free_queue(&ctrl->queues[0]);
}
+static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new)
+{
+ int error;
+
+ error = nvme_rdma_init_queue(ctrl, 0, NVME_AQ_DEPTH);
+ if (error)
+ return error;
+
+ ctrl->device = ctrl->queues[0].device;
+ ctrl->max_fr_pages = min_t(u32, NVME_RDMA_MAX_SEGMENTS,
+ ctrl->device->dev->attrs.max_fast_reg_page_list_len);
+
+ if (new) {
+ /*
+ * We need a reference on the device as long as the tag_set is alive,
+ * as the MRs in the request structures need a valid ib_device.
+ */
+ error = -EINVAL;
+ if (!nvme_rdma_dev_get(ctrl->device))
+ goto out_free_queue;
+
+ memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set));
+ ctrl->admin_tag_set.ops = &nvme_rdma_admin_mq_ops;
+ ctrl->admin_tag_set.queue_depth = NVME_RDMA_AQ_BLKMQ_DEPTH;
+ ctrl->admin_tag_set.reserved_tags = 2; /* connect + keep-alive */
+ ctrl->admin_tag_set.numa_node = NUMA_NO_NODE;
+ ctrl->admin_tag_set.cmd_size = sizeof(struct nvme_rdma_request) +
+ SG_CHUNK_SIZE * sizeof(struct scatterlist);
+ ctrl->admin_tag_set.driver_data = ctrl;
+ ctrl->admin_tag_set.nr_hw_queues = 1;
+ ctrl->admin_tag_set.timeout = ADMIN_TIMEOUT;
+
+ error = blk_mq_alloc_tag_set(&ctrl->admin_tag_set);
+ if (error)
+ goto out_put_dev;
+
+ ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+ if (IS_ERR(ctrl->ctrl.admin_q)) {
+ error = PTR_ERR(ctrl->ctrl.admin_q);
+ goto out_free_tagset;
+ }
+
+ ctrl->ctrl.admin_connect_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+ if (IS_ERR(ctrl->ctrl.admin_connect_q)) {
+ error = PTR_ERR(ctrl->ctrl.admin_connect_q);
+ goto out_cleanup_queue;
+ }
+ } else {
+ error = blk_mq_reinit_tagset(&ctrl->admin_tag_set);
+ if (error)
+ goto out_free_queue;
+ }
+
+ error = nvmf_connect_admin_queue(&ctrl->ctrl);
+ if (error)
+ goto out_cleanup_connect_queue;
+
+ set_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[0].flags);
+
+ error = nvmf_reg_read64(&ctrl->ctrl, NVME_REG_CAP, &ctrl->cap);
+ if (error) {
+ dev_err(ctrl->ctrl.device,
+ "prop_get NVME_REG_CAP failed\n");
+ goto out_cleanup_connect_queue;
+ }
+
+ ctrl->ctrl.sqsize =
+ min_t(int, NVME_CAP_MQES(ctrl->cap), ctrl->ctrl.sqsize);
+
+ error = nvme_enable_ctrl(&ctrl->ctrl, ctrl->cap);
+ if (error)
+ goto out_cleanup_connect_queue;
+
+ ctrl->ctrl.max_hw_sectors =
+ (ctrl->max_fr_pages - 1) << (PAGE_SHIFT - 9);
+
+ error = nvme_init_identify(&ctrl->ctrl);
+ if (error)
+ goto out_cleanup_connect_queue;
+
+ error = nvme_rdma_alloc_qe(ctrl->queues[0].device->dev,
+ &ctrl->async_event_sqe, sizeof(struct nvme_command),
+ DMA_TO_DEVICE);
+ if (error)
+ goto out_cleanup_connect_queue;
+
+ nvme_start_keep_alive(&ctrl->ctrl);
+
+ return 0;
+
+out_cleanup_connect_queue:
+ if (new)
+ blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
+out_cleanup_queue:
+ if (new)
+ blk_cleanup_queue(ctrl->ctrl.admin_q);
+out_free_tagset:
+ if (new) {
+ /* disconnect and drain the queue before freeing the tagset */
+ nvme_rdma_stop_queue(&ctrl->queues[0]);
+ blk_mq_free_tag_set(&ctrl->admin_tag_set);
+ }
+out_put_dev:
+ if (new)
+ nvme_rdma_dev_put(ctrl->device);
+out_free_queue:
+ nvme_rdma_free_queue(&ctrl->queues[0]);
+ return error;
+}
+
static void nvme_rdma_free_ctrl(struct nvme_ctrl *nctrl)
{
struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
@@ -725,28 +840,12 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
goto requeue;
}
- nvme_rdma_stop_and_free_queue(&ctrl->queues[0]);
+ nvme_rdma_destroy_admin_queue(ctrl, false);
- ret = blk_mq_reinit_tagset(&ctrl->admin_tag_set);
- if (ret)
- goto requeue;
-
- ret = nvme_rdma_init_queue(ctrl, 0, NVME_AQ_DEPTH);
- if (ret)
- goto requeue;
-
- ret = nvmf_connect_admin_queue(&ctrl->ctrl);
- if (ret)
- goto requeue;
-
- set_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[0].flags);
-
- ret = nvme_enable_ctrl(&ctrl->ctrl, ctrl->cap);
+ ret = nvme_rdma_configure_admin_queue(ctrl, false);
if (ret)
goto requeue;
- nvme_start_keep_alive(&ctrl->ctrl);
-
if (ctrl->queue_count > 1) {
ret = nvme_rdma_init_io_queues(ctrl);
if (ret)
@@ -760,12 +859,6 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
WARN_ON_ONCE(!changed);
ctrl->ctrl.nr_reconnects = 0;
-
- if (ctrl->queue_count > 1) {
- nvme_queue_scan(&ctrl->ctrl);
- nvme_queue_async_events(&ctrl->ctrl);
- }
-
dev_info(ctrl->ctrl.device, "Successfully reconnected\n");
return;
@@ -1546,114 +1639,6 @@ static const struct blk_mq_ops nvme_rdma_admin_mq_ops = {
.timeout = nvme_rdma_timeout,
};
-static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new)
-{
- int error;
-
- error = nvme_rdma_init_queue(ctrl, 0, NVME_AQ_DEPTH);
- if (error)
- return error;
-
- ctrl->device = ctrl->queues[0].device;
- ctrl->max_fr_pages = min_t(u32, NVME_RDMA_MAX_SEGMENTS,
- ctrl->device->dev->attrs.max_fast_reg_page_list_len);
-
- if (new) {
- /*
- * We need a reference on the device as long as the tag_set is alive,
- * as the MRs in the request structures need a valid ib_device.
- */
- error = -EINVAL;
- if (!nvme_rdma_dev_get(ctrl->device))
- goto out_free_queue;
-
- memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set));
- ctrl->admin_tag_set.ops = &nvme_rdma_admin_mq_ops;
- ctrl->admin_tag_set.queue_depth = NVME_RDMA_AQ_BLKMQ_DEPTH;
- ctrl->admin_tag_set.reserved_tags = 2; /* connect + keep-alive */
- ctrl->admin_tag_set.numa_node = NUMA_NO_NODE;
- ctrl->admin_tag_set.cmd_size = sizeof(struct nvme_rdma_request) +
- SG_CHUNK_SIZE * sizeof(struct scatterlist);
- ctrl->admin_tag_set.driver_data = ctrl;
- ctrl->admin_tag_set.nr_hw_queues = 1;
- ctrl->admin_tag_set.timeout = ADMIN_TIMEOUT;
-
- error = blk_mq_alloc_tag_set(&ctrl->admin_tag_set);
- if (error)
- goto out_put_dev;
-
- ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
- if (IS_ERR(ctrl->ctrl.admin_q)) {
- error = PTR_ERR(ctrl->ctrl.admin_q);
- goto out_free_tagset;
- }
-
- ctrl->ctrl.admin_connect_q = blk_mq_init_queue(&ctrl->admin_tag_set);
- if (IS_ERR(ctrl->ctrl.admin_connect_q)) {
- error = PTR_ERR(ctrl->ctrl.admin_connect_q);
- goto out_cleanup_queue;
- }
- }
-
- error = nvmf_connect_admin_queue(&ctrl->ctrl);
- if (error)
- goto out_cleanup_connect_queue;
-
- set_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[0].flags);
-
- blk_mq_start_stopped_hw_queues(ctrl->ctrl.admin_q, true);
-
- error = nvmf_reg_read64(&ctrl->ctrl, NVME_REG_CAP, &ctrl->cap);
- if (error) {
- dev_err(ctrl->ctrl.device,
- "prop_get NVME_REG_CAP failed\n");
- goto out_cleanup_connect_queue;
- }
-
- ctrl->ctrl.sqsize =
- min_t(int, NVME_CAP_MQES(ctrl->cap), ctrl->ctrl.sqsize);
-
- error = nvme_enable_ctrl(&ctrl->ctrl, ctrl->cap);
- if (error)
- goto out_cleanup_connect_queue;
-
- ctrl->ctrl.max_hw_sectors =
- (ctrl->max_fr_pages - 1) << (PAGE_SHIFT - 9);
-
- error = nvme_init_identify(&ctrl->ctrl);
- if (error)
- goto out_cleanup_connect_queue;
-
- error = nvme_rdma_alloc_qe(ctrl->queues[0].device->dev,
- &ctrl->async_event_sqe, sizeof(struct nvme_command),
- DMA_TO_DEVICE);
- if (error)
- goto out_cleanup_connect_queue;
-
- nvme_start_keep_alive(&ctrl->ctrl);
-
- return 0;
-
-out_cleanup_connect_queue:
- if (new)
- blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
-out_cleanup_queue:
- if (new)
- blk_cleanup_queue(ctrl->ctrl.admin_q);
-out_free_tagset:
- if (new) {
- /* disconnect and drain the queue before freeing the tagset */
- nvme_rdma_stop_queue(&ctrl->queues[0]);
- blk_mq_free_tag_set(&ctrl->admin_tag_set);
- }
-out_put_dev:
- if (new)
- nvme_rdma_dev_put(ctrl->device);
-out_free_queue:
- nvme_rdma_free_queue(&ctrl->queues[0]);
- return error;
-}
-
static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
{
nvme_stop_keep_alive(&ctrl->ctrl);
--
2.7.4
* Re: [PATCH rfc 03/30] nvme-rdma: reuse configure/destroy admin queue
From: Christoph Hellwig @ 2017-06-19 7:20 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, linux-block
On Sun, Jun 18, 2017 at 06:21:37PM +0300, Sagi Grimberg wrote:
> We have all we need in these functions now that they are
> aware of whether we are doing a full instantiation/removal.
>
> For that we move nvme_rdma_configure_admin_queue to avoid
> a forward declaration, and we add a blk_mq_ops forward declaration.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
> drivers/nvme/host/rdma.c | 253 ++++++++++++++++++++++-------------------------
> 1 file changed, 119 insertions(+), 134 deletions(-)
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 3e4c6aa119ee..5fef5545e365 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -140,6 +140,9 @@ static DEFINE_MUTEX(device_list_mutex);
> static LIST_HEAD(nvme_rdma_ctrl_list);
> static DEFINE_MUTEX(nvme_rdma_ctrl_mutex);
>
> +static const struct blk_mq_ops nvme_rdma_mq_ops;
> +static const struct blk_mq_ops nvme_rdma_admin_mq_ops;
> +
> /*
> * Disabling this option makes small I/O goes faster, but is fundamentally
> * unsafe. With it turned off we will have to register a global rkey that
> @@ -562,20 +565,22 @@ static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
>
> static void nvme_rdma_stop_queue(struct nvme_rdma_queue *queue)
> {
> + if (test_bit(NVME_RDMA_Q_DELETING, &queue->flags))
> + return;
> rdma_disconnect(queue->cm_id);
> ib_drain_qp(queue->qp);
> }
>
> static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
> {
> + if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags))
> + return;
> nvme_rdma_destroy_queue_ib(queue);
> rdma_destroy_id(queue->cm_id);
> }
>
> static void nvme_rdma_stop_and_free_queue(struct nvme_rdma_queue *queue)
> {
> - if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags))
> - return;
> nvme_rdma_stop_queue(queue);
> nvme_rdma_free_queue(queue);
> }
> @@ -671,6 +676,116 @@ static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl, bool remo
> nvme_rdma_free_queue(&ctrl->queues[0]);
> }
>
> +static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new)
Too long line..
Also what about moving the helpers into the right place in the
previous patch that already re-indented most of this?
* [PATCH rfc 04/30] nvme-rdma: introduce configure/destroy io queues
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
Similar to how we handle the admin queue.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 193 +++++++++++++++++++++++------------------------
1 file changed, 96 insertions(+), 97 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 5fef5545e365..bbe39dd378b5 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -579,18 +579,31 @@ static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
rdma_destroy_id(queue->cm_id);
}
-static void nvme_rdma_stop_and_free_queue(struct nvme_rdma_queue *queue)
+static void nvme_rdma_free_io_queues(struct nvme_rdma_ctrl *ctrl)
{
- nvme_rdma_stop_queue(queue);
- nvme_rdma_free_queue(queue);
+ int i;
+
+ for (i = 1; i < ctrl->queue_count; i++)
+ nvme_rdma_free_queue(&ctrl->queues[i]);
}
-static void nvme_rdma_free_io_queues(struct nvme_rdma_ctrl *ctrl)
+static void nvme_rdma_stop_io_queues(struct nvme_rdma_ctrl *ctrl)
{
int i;
for (i = 1; i < ctrl->queue_count; i++)
- nvme_rdma_stop_and_free_queue(&ctrl->queues[i]);
+ nvme_rdma_stop_queue(&ctrl->queues[i]);
+}
+
+static void nvme_rdma_destroy_io_queues(struct nvme_rdma_ctrl *ctrl, bool remove)
+{
+ nvme_rdma_stop_io_queues(ctrl);
+ if (remove) {
+ blk_cleanup_queue(ctrl->ctrl.connect_q);
+ blk_mq_free_tag_set(&ctrl->tag_set);
+ nvme_rdma_dev_put(ctrl->device);
+ }
+ nvme_rdma_free_io_queues(ctrl);
}
static int nvme_rdma_connect_io_queues(struct nvme_rdma_ctrl *ctrl)
@@ -602,15 +615,15 @@ static int nvme_rdma_connect_io_queues(struct nvme_rdma_ctrl *ctrl)
if (ret) {
dev_info(ctrl->ctrl.device,
"failed to connect i/o queue: %d\n", ret);
- goto out_free_queues;
+ goto out_stop_queues;
}
set_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[i].flags);
}
return 0;
-out_free_queues:
- nvme_rdma_free_io_queues(ctrl);
+out_stop_queues:
+ nvme_rdma_stop_io_queues(ctrl);
return ret;
}
@@ -656,8 +669,73 @@ static int nvme_rdma_init_io_queues(struct nvme_rdma_ctrl *ctrl)
out_free_queues:
for (i--; i >= 1; i--)
- nvme_rdma_stop_and_free_queue(&ctrl->queues[i]);
+ nvme_rdma_free_queue(&ctrl->queues[i]);
+
+ return ret;
+}
+
+static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
+{
+ int ret;
+ ret = nvme_rdma_init_io_queues(ctrl);
+ if (ret)
+ return ret;
+
+ if (new) {
+ /*
+ * We need a reference on the device as long as the tag_set is alive,
+ * as the MRs in the request structures need a valid ib_device.
+ */
+ ret = -EINVAL;
+ if (!nvme_rdma_dev_get(ctrl->device))
+ goto out_free_io_queues;
+
+ memset(&ctrl->tag_set, 0, sizeof(ctrl->tag_set));
+ ctrl->tag_set.ops = &nvme_rdma_mq_ops;
+ ctrl->tag_set.queue_depth = ctrl->ctrl.opts->queue_size;
+ ctrl->tag_set.reserved_tags = 1; /* fabric connect */
+ ctrl->tag_set.numa_node = NUMA_NO_NODE;
+ ctrl->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+ ctrl->tag_set.cmd_size = sizeof(struct nvme_rdma_request) +
+ SG_CHUNK_SIZE * sizeof(struct scatterlist);
+ ctrl->tag_set.driver_data = ctrl;
+ ctrl->tag_set.nr_hw_queues = ctrl->queue_count - 1;
+ ctrl->tag_set.timeout = NVME_IO_TIMEOUT;
+
+ ret = blk_mq_alloc_tag_set(&ctrl->tag_set);
+ if (ret)
+ goto out_put_dev;
+ ctrl->ctrl.tagset = &ctrl->tag_set;
+
+ ctrl->ctrl.connect_q = blk_mq_init_queue(&ctrl->tag_set);
+ if (IS_ERR(ctrl->ctrl.connect_q)) {
+ ret = PTR_ERR(ctrl->ctrl.connect_q);
+ goto out_free_tag_set;
+ }
+ } else {
+ ret = blk_mq_reinit_tagset(&ctrl->tag_set);
+ if (ret)
+ goto out_free_io_queues;
+ }
+
+ ret = nvme_rdma_connect_io_queues(ctrl);
+ if (ret)
+ goto out_cleanup_connect_q;
+
+ return 0;
+
+out_cleanup_connect_q:
+ if (new)
+ blk_cleanup_queue(ctrl->ctrl.connect_q);
+out_free_tag_set:
+ if (new)
+ blk_mq_free_tag_set(&ctrl->tag_set);
+out_put_dev:
+ if (new)
+ nvme_rdma_dev_put(ctrl->device);
+out_free_io_queues:
+ nvme_rdma_free_io_queues(ctrl);
return ret;
}
@@ -832,13 +910,8 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
++ctrl->ctrl.nr_reconnects;
- if (ctrl->queue_count > 1) {
- nvme_rdma_free_io_queues(ctrl);
-
- ret = blk_mq_reinit_tagset(&ctrl->tag_set);
- if (ret)
- goto requeue;
- }
+ if (ctrl->ctrl.opts->nr_io_queues)
+ nvme_rdma_destroy_io_queues(ctrl, false);
nvme_rdma_destroy_admin_queue(ctrl, false);
@@ -846,12 +919,8 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
if (ret)
goto requeue;
- if (ctrl->queue_count > 1) {
- ret = nvme_rdma_init_io_queues(ctrl);
- if (ret)
- goto requeue;
-
- ret = nvme_rdma_connect_io_queues(ctrl);
+ if (ctrl->ctrl.opts->nr_io_queues) {
+ ret = nvme_rdma_configure_io_queues(ctrl, false);
if (ret)
goto requeue;
}
@@ -1645,11 +1714,11 @@ static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
cancel_work_sync(&ctrl->err_work);
cancel_delayed_work_sync(&ctrl->reconnect_work);
- if (ctrl->queue_count > 1) {
+ if (ctrl->ctrl.opts->nr_io_queues) {
nvme_stop_queues(&ctrl->ctrl);
blk_mq_tagset_busy_iter(&ctrl->tag_set,
nvme_cancel_request, &ctrl->ctrl);
- nvme_rdma_free_io_queues(ctrl);
+ nvme_rdma_destroy_io_queues(ctrl, shutdown);
}
if (test_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[0].flags))
@@ -1667,12 +1736,6 @@ static void __nvme_rdma_remove_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
if (shutdown)
nvme_rdma_shutdown_ctrl(ctrl, shutdown);
- if (ctrl->ctrl.tagset) {
- blk_cleanup_queue(ctrl->ctrl.connect_q);
- blk_mq_free_tag_set(&ctrl->tag_set);
- nvme_rdma_dev_put(ctrl->device);
- }
-
nvme_put_ctrl(&ctrl->ctrl);
}
@@ -1737,16 +1800,8 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
goto del_dead_ctrl;
}
- if (ctrl->queue_count > 1) {
- ret = blk_mq_reinit_tagset(&ctrl->tag_set);
- if (ret)
- goto del_dead_ctrl;
-
- ret = nvme_rdma_init_io_queues(ctrl);
- if (ret)
- goto del_dead_ctrl;
-
- ret = nvme_rdma_connect_io_queues(ctrl);
+ if (ctrl->ctrl.opts->nr_io_queues) {
+ ret = nvme_rdma_configure_io_queues(ctrl, false);
if (ret)
goto del_dead_ctrl;
}
@@ -1782,62 +1837,6 @@ static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
.get_address = nvmf_get_address,
};
-static int nvme_rdma_create_io_queues(struct nvme_rdma_ctrl *ctrl)
-{
- int ret;
-
- ret = nvme_rdma_init_io_queues(ctrl);
- if (ret)
- return ret;
-
- /*
- * We need a reference on the device as long as the tag_set is alive,
- * as the MRs in the request structures need a valid ib_device.
- */
- ret = -EINVAL;
- if (!nvme_rdma_dev_get(ctrl->device))
- goto out_free_io_queues;
-
- memset(&ctrl->tag_set, 0, sizeof(ctrl->tag_set));
- ctrl->tag_set.ops = &nvme_rdma_mq_ops;
- ctrl->tag_set.queue_depth = ctrl->ctrl.opts->queue_size;
- ctrl->tag_set.reserved_tags = 1; /* fabric connect */
- ctrl->tag_set.numa_node = NUMA_NO_NODE;
- ctrl->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
- ctrl->tag_set.cmd_size = sizeof(struct nvme_rdma_request) +
- SG_CHUNK_SIZE * sizeof(struct scatterlist);
- ctrl->tag_set.driver_data = ctrl;
- ctrl->tag_set.nr_hw_queues = ctrl->queue_count - 1;
- ctrl->tag_set.timeout = NVME_IO_TIMEOUT;
-
- ret = blk_mq_alloc_tag_set(&ctrl->tag_set);
- if (ret)
- goto out_put_dev;
- ctrl->ctrl.tagset = &ctrl->tag_set;
-
- ctrl->ctrl.connect_q = blk_mq_init_queue(&ctrl->tag_set);
- if (IS_ERR(ctrl->ctrl.connect_q)) {
- ret = PTR_ERR(ctrl->ctrl.connect_q);
- goto out_free_tag_set;
- }
-
- ret = nvme_rdma_connect_io_queues(ctrl);
- if (ret)
- goto out_cleanup_connect_q;
-
- return 0;
-
-out_cleanup_connect_q:
- blk_cleanup_queue(ctrl->ctrl.connect_q);
-out_free_tag_set:
- blk_mq_free_tag_set(&ctrl->tag_set);
-out_put_dev:
- nvme_rdma_dev_put(ctrl->device);
-out_free_io_queues:
- nvme_rdma_free_io_queues(ctrl);
- return ret;
-}
-
static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
struct nvmf_ctrl_options *opts)
{
@@ -1930,7 +1929,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
}
if (opts->nr_io_queues) {
- ret = nvme_rdma_create_io_queues(ctrl);
+ ret = nvme_rdma_configure_io_queues(ctrl, true);
if (ret)
goto out_remove_admin_queue;
}
--
2.7.4
* [PATCH rfc 05/30] nvme-rdma: introduce nvme_rdma_start_queue
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
This should pair with nvme_rdma_stop_queue. While this
is not a complete 1:1 reverse, it still pairs up pretty
well because in fabrics we don't have a disconnect capsule;
we simply tear down the transport association.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 32 +++++++++++++++++++++++---------
1 file changed, 23 insertions(+), 9 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index bbe39dd378b5..69ebfb61d599 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -606,24 +606,38 @@ static void nvme_rdma_destroy_io_queues(struct nvme_rdma_ctrl *ctrl, bool remove
nvme_rdma_free_io_queues(ctrl);
}
-static int nvme_rdma_connect_io_queues(struct nvme_rdma_ctrl *ctrl)
+static int nvme_rdma_start_queue(struct nvme_rdma_ctrl *ctrl, int idx)
+{
+ int ret;
+
+ if (idx)
+ ret = nvmf_connect_io_queue(&ctrl->ctrl, idx);
+ else
+ ret = nvmf_connect_admin_queue(&ctrl->ctrl);
+
+ if (ret)
+ dev_info(ctrl->ctrl.device,
+ "failed to connect queue: %d ret=%d\n", idx, ret);
+ return ret;
+}
+
+static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl)
{
int i, ret = 0;
for (i = 1; i < ctrl->queue_count; i++) {
- ret = nvmf_connect_io_queue(&ctrl->ctrl, i);
- if (ret) {
- dev_info(ctrl->ctrl.device,
- "failed to connect i/o queue: %d\n", ret);
+ ret = nvme_rdma_start_queue(ctrl, i);
+ if (ret)
goto out_stop_queues;
- }
+
set_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[i].flags);
}
return 0;
out_stop_queues:
- nvme_rdma_stop_io_queues(ctrl);
+ for (i--; i >= 1; i--)
+ nvme_rdma_stop_queue(&ctrl->queues[i]);
return ret;
}
@@ -719,7 +733,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
goto out_free_io_queues;
}
- ret = nvme_rdma_connect_io_queues(ctrl);
+ ret = nvme_rdma_start_io_queues(ctrl);
if (ret)
goto out_cleanup_connect_q;
@@ -807,7 +821,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new
goto out_free_queue;
}
- error = nvmf_connect_admin_queue(&ctrl->ctrl);
+ error = nvme_rdma_start_queue(ctrl, 0);
if (error)
goto out_cleanup_connect_queue;
--
2.7.4
* [PATCH rfc 06/30] nvme-rdma: rename nvme_rdma_init_queue to nvme_rdma_alloc_queue
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
Give it a name symmetric to nvme_rdma_free_queue. Also pass in
the ctrl sqsize+1 and not the opts queue_size, and drop a failure
message that is already covered by earlier verbosity.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 69ebfb61d599..c8016150dc21 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -508,7 +508,7 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
return ret;
}
-static int nvme_rdma_init_queue(struct nvme_rdma_ctrl *ctrl,
+static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
int idx, size_t queue_size)
{
struct nvme_rdma_queue *queue;
@@ -641,7 +641,7 @@ static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl)
return ret;
}
-static int nvme_rdma_init_io_queues(struct nvme_rdma_ctrl *ctrl)
+static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
{
struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
struct ib_device *ibdev = ctrl->device->dev;
@@ -670,13 +670,10 @@ static int nvme_rdma_init_io_queues(struct nvme_rdma_ctrl *ctrl)
"creating %d I/O queues.\n", nr_io_queues);
for (i = 1; i < ctrl->queue_count; i++) {
- ret = nvme_rdma_init_queue(ctrl, i,
- ctrl->ctrl.opts->queue_size);
- if (ret) {
- dev_info(ctrl->ctrl.device,
- "failed to initialize i/o queue: %d\n", ret);
+ ret = nvme_rdma_alloc_queue(ctrl, i,
+ ctrl->ctrl.sqsize + 1);
+ if (ret)
goto out_free_queues;
- }
}
return 0;
@@ -692,7 +689,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
{
int ret;
- ret = nvme_rdma_init_io_queues(ctrl);
+ ret = nvme_rdma_alloc_io_queues(ctrl);
if (ret)
return ret;
@@ -772,7 +769,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new
{
int error;
- error = nvme_rdma_init_queue(ctrl, 0, NVME_AQ_DEPTH);
+ error = nvme_rdma_alloc_queue(ctrl, 0, NVME_AQ_DEPTH);
if (error)
return error;
--
2.7.4
* [PATCH rfc 07/30] nvme-rdma: make stop/free queue receive a ctrl and qid struct
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
Make it symmetrical to alloc/start queue.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index c8016150dc21..86998de90f52 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -563,16 +563,20 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
return ret;
}
-static void nvme_rdma_stop_queue(struct nvme_rdma_queue *queue)
+static void nvme_rdma_stop_queue(struct nvme_rdma_ctrl *ctrl, int qid)
{
+ struct nvme_rdma_queue *queue = &ctrl->queues[qid];
+
if (test_bit(NVME_RDMA_Q_DELETING, &queue->flags))
return;
rdma_disconnect(queue->cm_id);
ib_drain_qp(queue->qp);
}
-static void nvme_rdma_free_queue(struct nvme_rdma_queue *queue)
+static void nvme_rdma_free_queue(struct nvme_rdma_ctrl *ctrl, int qid)
{
+ struct nvme_rdma_queue *queue = &ctrl->queues[qid];
+
if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags))
return;
nvme_rdma_destroy_queue_ib(queue);
@@ -584,7 +588,7 @@ static void nvme_rdma_free_io_queues(struct nvme_rdma_ctrl *ctrl)
int i;
for (i = 1; i < ctrl->queue_count; i++)
- nvme_rdma_free_queue(&ctrl->queues[i]);
+ nvme_rdma_free_queue(ctrl, i);
}
static void nvme_rdma_stop_io_queues(struct nvme_rdma_ctrl *ctrl)
@@ -592,7 +596,7 @@ static void nvme_rdma_stop_io_queues(struct nvme_rdma_ctrl *ctrl)
int i;
for (i = 1; i < ctrl->queue_count; i++)
- nvme_rdma_stop_queue(&ctrl->queues[i]);
+ nvme_rdma_stop_queue(ctrl, i);
}
static void nvme_rdma_destroy_io_queues(struct nvme_rdma_ctrl *ctrl, bool remove)
@@ -637,7 +641,7 @@ static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl)
out_stop_queues:
for (i--; i >= 1; i--)
- nvme_rdma_stop_queue(&ctrl->queues[i]);
+ nvme_rdma_stop_queue(ctrl, i);
return ret;
}
@@ -680,7 +684,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
out_free_queues:
for (i--; i >= 1; i--)
- nvme_rdma_free_queue(&ctrl->queues[i]);
+ nvme_rdma_free_queue(ctrl, i);
return ret;
}
@@ -752,7 +756,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl, bool remove)
{
- nvme_rdma_stop_queue(&ctrl->queues[0]);
+ nvme_rdma_stop_queue(ctrl, 0);
if (remove) {
blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
blk_cleanup_queue(ctrl->ctrl.admin_q);
@@ -762,7 +766,7 @@ static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl, bool remo
nvme_rdma_free_qe(ctrl->queues[0].device->dev, &ctrl->async_event_sqe,
sizeof(struct nvme_command), DMA_TO_DEVICE);
- nvme_rdma_free_queue(&ctrl->queues[0]);
+ nvme_rdma_free_queue(ctrl, 0);
}
static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new)
@@ -864,14 +868,14 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new
out_free_tagset:
if (new) {
/* disconnect and drain the queue before freeing the tagset */
- nvme_rdma_stop_queue(&ctrl->queues[0]);
+ nvme_rdma_stop_queue(ctrl, 0);
blk_mq_free_tag_set(&ctrl->admin_tag_set);
}
out_put_dev:
if (new)
nvme_rdma_dev_put(ctrl->device);
out_free_queue:
- nvme_rdma_free_queue(&ctrl->queues[0]);
+ nvme_rdma_free_queue(ctrl, 0);
return error;
}
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 08/30] nvme-rdma: cleanup error path in controller reset
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (6 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 07/30] nvme-rdma: make stop/free queue receive a ctrl and qid struct Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-19 12:40 ` Christoph Hellwig
2017-07-10 18:57 ` James Smart
2017-06-18 15:21 ` [PATCH rfc 09/30] nvme: Move queue_count to the nvme_ctrl Sagi Grimberg
` (22 subsequent siblings)
30 siblings, 2 replies; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
No need to queue an extra work item just to indirect the controller
uninit and put the final reference.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 39 ++++++++++++---------------------------
1 file changed, 12 insertions(+), 27 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 86998de90f52..099b3d7b6721 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1745,21 +1745,14 @@ static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
nvme_rdma_destroy_admin_queue(ctrl, shutdown);
}
-static void __nvme_rdma_remove_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
-{
- nvme_uninit_ctrl(&ctrl->ctrl);
- if (shutdown)
- nvme_rdma_shutdown_ctrl(ctrl, shutdown);
-
- nvme_put_ctrl(&ctrl->ctrl);
-}
-
static void nvme_rdma_del_ctrl_work(struct work_struct *work)
{
struct nvme_rdma_ctrl *ctrl = container_of(work,
struct nvme_rdma_ctrl, delete_work);
- __nvme_rdma_remove_ctrl(ctrl, true);
+ nvme_uninit_ctrl(&ctrl->ctrl);
+ nvme_rdma_shutdown_ctrl(ctrl, true);
+ nvme_put_ctrl(&ctrl->ctrl);
}
static int __nvme_rdma_del_ctrl(struct nvme_rdma_ctrl *ctrl)
@@ -1791,14 +1784,6 @@ static int nvme_rdma_del_ctrl(struct nvme_ctrl *nctrl)
return ret;
}
-static void nvme_rdma_remove_ctrl_work(struct work_struct *work)
-{
- struct nvme_rdma_ctrl *ctrl = container_of(work,
- struct nvme_rdma_ctrl, delete_work);
-
- __nvme_rdma_remove_ctrl(ctrl, false);
-}
-
static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
{
struct nvme_rdma_ctrl *ctrl =
@@ -1809,16 +1794,13 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
nvme_rdma_shutdown_ctrl(ctrl, false);
ret = nvme_rdma_configure_admin_queue(ctrl, false);
- if (ret) {
- /* ctrl is already shutdown, just remove the ctrl */
- INIT_WORK(&ctrl->delete_work, nvme_rdma_remove_ctrl_work);
- goto del_dead_ctrl;
- }
+ if (ret)
+ goto out_destroy_admin;
if (ctrl->ctrl.opts->nr_io_queues) {
ret = nvme_rdma_configure_io_queues(ctrl, false);
if (ret)
- goto del_dead_ctrl;
+ goto out_destroy_io;
}
changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
@@ -1832,10 +1814,13 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
return;
-del_dead_ctrl:
- /* Deleting this dead controller... */
+out_destroy_io:
+ nvme_rdma_destroy_io_queues(ctrl, true);
+out_destroy_admin:
+ nvme_rdma_destroy_admin_queue(ctrl, true);
dev_warn(ctrl->ctrl.device, "Removing after reset failure\n");
- WARN_ON(!queue_work(nvme_wq, &ctrl->delete_work));
+ nvme_uninit_ctrl(&ctrl->ctrl);
+ nvme_put_ctrl(&ctrl->ctrl);
}
static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* Re: [PATCH rfc 08/30] nvme-rdma: cleanup error path in controller reset
2017-06-18 15:21 ` [PATCH rfc 08/30] nvme-rdma: cleanup error path in controller reset Sagi Grimberg
@ 2017-06-19 12:40 ` Christoph Hellwig
2017-07-10 18:57 ` James Smart
1 sibling, 0 replies; 69+ messages in thread
From: Christoph Hellwig @ 2017-06-19 12:40 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, linux-block
On Sun, Jun 18, 2017 at 06:21:42PM +0300, Sagi Grimberg wrote:
> No need to queue an extra work item just to indirect the controller
> uninit and put the final reference.
Maybe my memory is a little vague, but didn't we need the work_struct
for something? At least it would serialize all the removals for example.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH rfc 08/30] nvme-rdma: cleanup error path in controller reset
2017-06-18 15:21 ` [PATCH rfc 08/30] nvme-rdma: cleanup error path in controller reset Sagi Grimberg
2017-06-19 12:40 ` Christoph Hellwig
@ 2017-07-10 18:57 ` James Smart
1 sibling, 0 replies; 69+ messages in thread
From: James Smart @ 2017-07-10 18:57 UTC (permalink / raw)
To: Sagi Grimberg, linux-nvme; +Cc: Keith Busch, linux-block, Christoph Hellwig
On 6/18/2017 8:21 AM, Sagi Grimberg wrote:
> No need to queue an extra work item just to indirect the controller
> uninit and put the final reference.
>
>
> static void nvme_rdma_del_ctrl_work(struct work_struct *work)
> {
> struct nvme_rdma_ctrl *ctrl = container_of(work,
> struct nvme_rdma_ctrl, delete_work);
>
> - __nvme_rdma_remove_ctrl(ctrl, true);
> + nvme_uninit_ctrl(&ctrl->ctrl);
> + nvme_rdma_shutdown_ctrl(ctrl, true);
> + nvme_put_ctrl(&ctrl->ctrl);
> }
>
> static int __nvme_rdma_del_ctrl(struct nvme_rdma_ctrl *ctrl)
> @@ -1791,14 +1784,6 @@ static int nvme_rdma_del_ctrl(struct nvme_ctrl *nctrl)
> return ret;
> }
>
...
> @@ -1832,10 +1814,13 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
>
> return;
>
> -del_dead_ctrl:
> - /* Deleting this dead controller... */
> +out_destroy_io:
> + nvme_rdma_destroy_io_queues(ctrl, true);
> +out_destroy_admin:
> + nvme_rdma_destroy_admin_queue(ctrl, true);
> dev_warn(ctrl->ctrl.device, "Removing after reset failure\n");
> - WARN_ON(!queue_work(nvme_wq, &ctrl->delete_work));
> + nvme_uninit_ctrl(&ctrl->ctrl);
> + nvme_put_ctrl(&ctrl->ctrl);
> }
>
> static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
Recommend calls to nvme_stop_keep_alive() prior to nvme_uninit_ctrl().
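For illustration, the delete work with that ordering applied would look
roughly like this (a sketch based on the hunk quoted above, not the
patch as posted):

	static void nvme_rdma_del_ctrl_work(struct work_struct *work)
	{
		struct nvme_rdma_ctrl *ctrl = container_of(work,
				struct nvme_rdma_ctrl, delete_work);

		/* quiesce keep-alive before the ctrl is uninitialized */
		nvme_stop_keep_alive(&ctrl->ctrl);
		nvme_uninit_ctrl(&ctrl->ctrl);
		nvme_rdma_shutdown_ctrl(ctrl, true);
		nvme_put_ctrl(&ctrl->ctrl);
	}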
-- james
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH rfc 09/30] nvme: Move queue_count to the nvme_ctrl
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (7 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 08/30] nvme-rdma: cleanup error path in controller reset Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-19 12:41 ` Christoph Hellwig
2017-06-18 15:21 ` [PATCH rfc 10/30] nvme: Add admin_tagset pointer to nvme_ctrl Sagi Grimberg
` (21 subsequent siblings)
30 siblings, 1 reply; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
We are trying to move queue setup and teardown to the core, so also
introduce ctrl->max_queues, which tells us the maximum number of queues
allowed. For now only rdma is converted; the rest will follow.
queue_count was replaced mechanically with:
sed 's/ctrl->queue_count/ctrl->ctrl.queue_count/g'
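To illustrate the intended split (a sketch assembled from the hunks
below, not additional code):

	/* sized once at controller creation: the upper bound */
	ctrl->ctrl.max_queues = opts->nr_io_queues + 1;	/* +1 for admin queue */

	/* recomputed on every (re)connect: what we actually got back
	 * from nvme_set_queue_count() */
	ctrl->ctrl.queue_count = nr_io_queues + 1;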
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/nvme.h | 2 ++
drivers/nvme/host/rdma.c | 47 ++++++++++++++++++++++-------------------------
2 files changed, 24 insertions(+), 25 deletions(-)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 67147b49d992..415a5ea4759c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -142,6 +142,8 @@ struct nvme_ctrl {
u16 cntlid;
u32 ctrl_config;
+ u32 max_queues;
+ u32 queue_count;
u32 page_size;
u32 max_hw_sectors;
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 099b3d7b6721..2b23f88bedfe 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -104,7 +104,6 @@ struct nvme_rdma_queue {
struct nvme_rdma_ctrl {
/* read only in the hot path */
struct nvme_rdma_queue *queues;
- u32 queue_count;
/* other member variables */
struct blk_mq_tag_set tag_set;
@@ -353,7 +352,7 @@ static int nvme_rdma_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
struct nvme_rdma_ctrl *ctrl = data;
struct nvme_rdma_queue *queue = &ctrl->queues[hctx_idx + 1];
- BUG_ON(hctx_idx >= ctrl->queue_count);
+ BUG_ON(hctx_idx >= ctrl->ctrl.max_queues);
hctx->driver_data = queue;
return 0;
@@ -587,7 +586,7 @@ static void nvme_rdma_free_io_queues(struct nvme_rdma_ctrl *ctrl)
{
int i;
- for (i = 1; i < ctrl->queue_count; i++)
+ for (i = 1; i < ctrl->ctrl.queue_count; i++)
nvme_rdma_free_queue(ctrl, i);
}
@@ -595,7 +594,7 @@ static void nvme_rdma_stop_io_queues(struct nvme_rdma_ctrl *ctrl)
{
int i;
- for (i = 1; i < ctrl->queue_count; i++)
+ for (i = 1; i < ctrl->ctrl.queue_count; i++)
nvme_rdma_stop_queue(ctrl, i);
}
@@ -629,7 +628,7 @@ static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl)
{
int i, ret = 0;
- for (i = 1; i < ctrl->queue_count; i++) {
+ for (i = 1; i < ctrl->ctrl.queue_count; i++) {
ret = nvme_rdma_start_queue(ctrl, i);
if (ret)
goto out_stop_queues;
@@ -647,13 +646,11 @@ static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl)
static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
{
- struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
+ unsigned int nr_io_queues = ctrl->ctrl.max_queues - 1;
struct ib_device *ibdev = ctrl->device->dev;
- unsigned int nr_io_queues;
int i, ret;
- nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
-
+ nr_io_queues = min(nr_io_queues, num_online_cpus());
/*
* we map queues according to the device irq vectors for
* optimal locality so we don't need more queues than
@@ -666,14 +663,14 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
if (ret)
return ret;
- ctrl->queue_count = nr_io_queues + 1;
- if (ctrl->queue_count < 2)
+ ctrl->ctrl.queue_count = nr_io_queues + 1;
+ if (ctrl->ctrl.queue_count < 2)
return 0;
dev_info(ctrl->ctrl.device,
"creating %d I/O queues.\n", nr_io_queues);
- for (i = 1; i < ctrl->queue_count; i++) {
+ for (i = 1; i < ctrl->ctrl.queue_count; i++) {
ret = nvme_rdma_alloc_queue(ctrl, i,
ctrl->ctrl.sqsize + 1);
if (ret)
@@ -715,7 +712,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
ctrl->tag_set.cmd_size = sizeof(struct nvme_rdma_request) +
SG_CHUNK_SIZE * sizeof(struct scatterlist);
ctrl->tag_set.driver_data = ctrl;
- ctrl->tag_set.nr_hw_queues = ctrl->queue_count - 1;
+ ctrl->tag_set.nr_hw_queues = ctrl->ctrl.max_queues - 1;
ctrl->tag_set.timeout = NVME_IO_TIMEOUT;
ret = blk_mq_alloc_tag_set(&ctrl->tag_set);
@@ -925,7 +922,7 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
++ctrl->ctrl.nr_reconnects;
- if (ctrl->ctrl.opts->nr_io_queues)
+ if (ctrl->ctrl.max_queues > 1)
nvme_rdma_destroy_io_queues(ctrl, false);
nvme_rdma_destroy_admin_queue(ctrl, false);
@@ -934,7 +931,7 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
if (ret)
goto requeue;
- if (ctrl->ctrl.opts->nr_io_queues) {
+ if (ctrl->ctrl.max_queues > 1) {
ret = nvme_rdma_configure_io_queues(ctrl, false);
if (ret)
goto requeue;
@@ -961,15 +958,15 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
nvme_stop_keep_alive(&ctrl->ctrl);
- for (i = 0; i < ctrl->queue_count; i++)
+ for (i = 0; i < ctrl->ctrl.queue_count; i++)
clear_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[i].flags);
- if (ctrl->queue_count > 1)
+ if (ctrl->ctrl.queue_count > 1)
nvme_stop_queues(&ctrl->ctrl);
blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);
/* We must take care of fastfail/requeue all our inflight requests */
- if (ctrl->queue_count > 1)
+ if (ctrl->ctrl.queue_count > 1)
blk_mq_tagset_busy_iter(&ctrl->tag_set,
nvme_cancel_request, &ctrl->ctrl);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
@@ -1729,7 +1726,7 @@ static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
cancel_work_sync(&ctrl->err_work);
cancel_delayed_work_sync(&ctrl->reconnect_work);
- if (ctrl->ctrl.opts->nr_io_queues) {
+ if (ctrl->ctrl.max_queues > 1) {
nvme_stop_queues(&ctrl->ctrl);
blk_mq_tagset_busy_iter(&ctrl->tag_set,
nvme_cancel_request, &ctrl->ctrl);
@@ -1797,7 +1794,7 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
if (ret)
goto out_destroy_admin;
- if (ctrl->ctrl.opts->nr_io_queues) {
+ if (ctrl->ctrl.max_queues > 1) {
ret = nvme_rdma_configure_io_queues(ctrl, false);
if (ret)
goto out_destroy_io;
@@ -1806,7 +1803,7 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
WARN_ON_ONCE(!changed);
- if (ctrl->queue_count > 1) {
+ if (ctrl->ctrl.queue_count > 1) {
nvme_start_queues(&ctrl->ctrl);
nvme_queue_scan(&ctrl->ctrl);
nvme_queue_async_events(&ctrl->ctrl);
@@ -1884,12 +1881,12 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
INIT_WORK(&ctrl->delete_work, nvme_rdma_del_ctrl_work);
INIT_WORK(&ctrl->ctrl.reset_work, nvme_rdma_reset_ctrl_work);
- ctrl->queue_count = opts->nr_io_queues + 1; /* +1 for admin queue */
+ ctrl->ctrl.max_queues = opts->nr_io_queues + 1;
ctrl->ctrl.sqsize = opts->queue_size - 1;
ctrl->ctrl.kato = opts->kato;
ret = -ENOMEM;
- ctrl->queues = kcalloc(ctrl->queue_count, sizeof(*ctrl->queues),
+ ctrl->queues = kcalloc(ctrl->ctrl.max_queues, sizeof(*ctrl->queues),
GFP_KERNEL);
if (!ctrl->queues)
goto out_uninit_ctrl;
@@ -1928,7 +1925,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
opts->queue_size = ctrl->ctrl.sqsize + 1;
}
- if (opts->nr_io_queues) {
+ if (ctrl->ctrl.max_queues > 1) {
ret = nvme_rdma_configure_io_queues(ctrl, true);
if (ret)
goto out_remove_admin_queue;
@@ -1946,7 +1943,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
list_add_tail(&ctrl->list, &nvme_rdma_ctrl_list);
mutex_unlock(&nvme_rdma_ctrl_mutex);
- if (opts->nr_io_queues) {
+ if (ctrl->ctrl.max_queues > 1) {
nvme_queue_scan(&ctrl->ctrl);
nvme_queue_async_events(&ctrl->ctrl);
}
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 10/30] nvme: Add admin_tagset pointer to nvme_ctrl
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (8 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 09/30] nvme: Move queue_count to the nvme_ctrl Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-19 12:41 ` Christoph Hellwig
2017-06-18 15:21 ` [PATCH rfc 11/30] nvme: move controller cap to struct nvme_ctrl Sagi Grimberg
` (20 subsequent siblings)
30 siblings, 1 reply; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
Will be used when we centralize control flows. Only rdma for now.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/nvme.h | 1 +
drivers/nvme/host/rdma.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 415a5ea4759c..3be59634b4af 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -126,6 +126,7 @@ struct nvme_ctrl {
struct kref kref;
int instance;
struct blk_mq_tag_set *tagset;
+ struct blk_mq_tag_set *admin_tagset;
struct list_head namespaces;
struct mutex namespaces_mutex;
struct device *device; /* char device */
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 2b23f88bedfe..58ed2ae3cd35 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -801,6 +801,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new
error = blk_mq_alloc_tag_set(&ctrl->admin_tag_set);
if (error)
goto out_put_dev;
+ ctrl->ctrl.admin_tagset = &ctrl->admin_tag_set;
ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
if (IS_ERR(ctrl->ctrl.admin_q)) {
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* Re: [PATCH rfc 10/30] nvme: Add admin_tagset pointer to nvme_ctrl
2017-06-18 15:21 ` [PATCH rfc 10/30] nvme: Add admin_tagset pointer to nvme_ctrl Sagi Grimberg
@ 2017-06-19 12:41 ` Christoph Hellwig
2017-06-19 13:58 ` Sagi Grimberg
0 siblings, 1 reply; 69+ messages in thread
From: Christoph Hellwig @ 2017-06-19 12:41 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, linux-block
On Sun, Jun 18, 2017 at 06:21:44PM +0300, Sagi Grimberg wrote:
> Will be used when we centralize control flows. only
> rdma for now.
Should we at some point move the tag_sets themselves to the generic
ctrl instead of just pointers?
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH rfc 10/30] nvme: Add admin_tagset pointer to nvme_ctrl
2017-06-19 12:41 ` Christoph Hellwig
@ 2017-06-19 13:58 ` Sagi Grimberg
0 siblings, 0 replies; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-19 13:58 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-nvme, Keith Busch, linux-block
>> Will be used when we centralize control flows. only
>> rdma for now.
>
> Should we at some point move the tag_sets themselves to the generic
> ctrl instead of just pointers?
We can easily do that, but the tagsets are heavily read in the hot path
so I was careful not to completely move them to nvme_ctrl which is not
arranged for it at all (and transports through it far back in their
struct).
Once we actually get some of this merged we should look into arranging
the transport controllers to be:
struct transport_ctrl {
	/* transport specific, accessed in the hot path */
	...
	struct nvme_ctrl ctrl; /* hot members first */
	/* transport specific bookkeeping */
	...
};
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH rfc 11/30] nvme: move controller cap to struct nvme_ctrl
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (9 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 10/30] nvme: Add admin_tagset pointer to nvme_ctrl Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-19 12:42 ` Christoph Hellwig
2017-06-18 15:21 ` [PATCH rfc 12/30] nvme-rdma: disable controller in reset instead of shutdown Sagi Grimberg
` (19 subsequent siblings)
30 siblings, 1 reply; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
Will be used in centralized code later. Only rdma for now.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/nvme.h | 1 +
drivers/nvme/host/rdma.c | 7 +++----
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 3be59634b4af..5b75f6a81764 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -159,6 +159,7 @@ struct nvme_ctrl {
u16 kas;
u8 npss;
u8 apsta;
+ u64 cap;
unsigned int kato;
bool subsystem;
unsigned long quirks;
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 58ed2ae3cd35..cd637e28647b 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -119,7 +119,6 @@ struct nvme_rdma_ctrl {
struct blk_mq_tag_set admin_tag_set;
struct nvme_rdma_device *device;
- u64 cap;
u32 max_fr_pages;
struct sockaddr_storage addr;
@@ -826,7 +825,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new
set_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[0].flags);
- error = nvmf_reg_read64(&ctrl->ctrl, NVME_REG_CAP, &ctrl->cap);
+ error = nvmf_reg_read64(&ctrl->ctrl, NVME_REG_CAP, &ctrl->ctrl.cap);
if (error) {
dev_err(ctrl->ctrl.device,
"prop_get NVME_REG_CAP failed\n");
@@ -834,9 +833,9 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new
}
ctrl->ctrl.sqsize =
- min_t(int, NVME_CAP_MQES(ctrl->cap), ctrl->ctrl.sqsize);
+ min_t(int, NVME_CAP_MQES(ctrl->ctrl.cap), ctrl->ctrl.sqsize);
- error = nvme_enable_ctrl(&ctrl->ctrl, ctrl->cap);
+ error = nvme_enable_ctrl(&ctrl->ctrl, ctrl->ctrl.cap);
if (error)
goto out_cleanup_connect_queue;
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 12/30] nvme-rdma: disable controller in reset instead of shutdown
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (10 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 11/30] nvme: move controller cap to struct nvme_ctrl Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-19 12:43 ` Christoph Hellwig
2017-06-18 15:21 ` [PATCH rfc 13/30] nvme-rdma: move queue LIVE/DELETING flags settings to queue routines Sagi Grimberg
` (18 subsequent siblings)
30 siblings, 1 reply; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
For controller resets we can avoid a full controller shutdown.
This makes rdma similar to pci.
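In other words (a simplified sketch of the distinction; the register
semantics noted below come from the NVMe spec, not from this patch):

	if (shutdown) {
		/* delete/remove: normal shutdown via CC.SHN, then wait
		 * for CSTS.SHST to signal shutdown complete */
		nvme_shutdown_ctrl(&ctrl->ctrl);
	} else {
		/* reset: just clear CC.EN and wait for CSTS.RDY to drop,
		 * the same thing the pci driver does on reset */
		nvme_disable_ctrl(&ctrl->ctrl, ctrl->ctrl.cap);
	}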
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index cd637e28647b..34518c90609a 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1720,7 +1720,7 @@ static const struct blk_mq_ops nvme_rdma_admin_mq_ops = {
.timeout = nvme_rdma_timeout,
};
-static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
+static void nvme_rdma_teardown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
{
nvme_stop_keep_alive(&ctrl->ctrl);
cancel_work_sync(&ctrl->err_work);
@@ -1733,8 +1733,12 @@ static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
nvme_rdma_destroy_io_queues(ctrl, shutdown);
}
- if (test_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[0].flags))
- nvme_shutdown_ctrl(&ctrl->ctrl);
+ if (test_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[0].flags)) {
+ if (shutdown)
+ nvme_shutdown_ctrl(&ctrl->ctrl);
+ else
+ nvme_disable_ctrl(&ctrl->ctrl, ctrl->ctrl.cap);
+ }
blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
@@ -1748,7 +1752,7 @@ static void nvme_rdma_del_ctrl_work(struct work_struct *work)
struct nvme_rdma_ctrl, delete_work);
nvme_uninit_ctrl(&ctrl->ctrl);
- nvme_rdma_shutdown_ctrl(ctrl, true);
+ nvme_rdma_teardown_ctrl(ctrl, true);
nvme_put_ctrl(&ctrl->ctrl);
}
@@ -1788,7 +1792,7 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
int ret;
bool changed;
- nvme_rdma_shutdown_ctrl(ctrl, false);
+ nvme_rdma_teardown_ctrl(ctrl, false);
ret = nvme_rdma_configure_admin_queue(ctrl, false);
if (ret)
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 13/30] nvme-rdma: move queue LIVE/DELETING flags settings to queue routines
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (11 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 12/30] nvme-rdma: disable controller in reset instead of shutdown Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-19 12:44 ` Christoph Hellwig
2017-06-18 15:21 ` [PATCH rfc 14/30] nvme-rdma: stop queues instead of simply flipping their state Sagi Grimberg
` (17 subsequent siblings)
30 siblings, 1 reply; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 34518c90609a..d9524389aedd 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -565,7 +565,7 @@ static void nvme_rdma_stop_queue(struct nvme_rdma_ctrl *ctrl, int qid)
{
struct nvme_rdma_queue *queue = &ctrl->queues[qid];
- if (test_bit(NVME_RDMA_Q_DELETING, &queue->flags))
+ if (!test_and_clear_bit(NVME_RDMA_Q_LIVE, &queue->flags))
return;
rdma_disconnect(queue->cm_id);
ib_drain_qp(queue->qp);
@@ -617,7 +617,9 @@ static int nvme_rdma_start_queue(struct nvme_rdma_ctrl *ctrl, int idx)
else
ret = nvmf_connect_admin_queue(&ctrl->ctrl);
- if (ret)
+ if (!ret)
+ set_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[idx].flags);
+ else
dev_info(ctrl->ctrl.device,
"failed to connect queue: %d ret=%d\n", idx, ret);
return ret;
@@ -631,8 +633,6 @@ static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl)
ret = nvme_rdma_start_queue(ctrl, i);
if (ret)
goto out_stop_queues;
-
- set_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[i].flags);
}
return 0;
@@ -823,8 +823,6 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new
if (error)
goto out_cleanup_connect_queue;
- set_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[0].flags);
-
error = nvmf_reg_read64(&ctrl->ctrl, NVME_REG_CAP, &ctrl->ctrl.cap);
if (error) {
dev_err(ctrl->ctrl.device,
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 14/30] nvme-rdma: stop queues instead of simply flipping their state
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (12 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 13/30] nvme-rdma: move queue LIVE/DELETING flags settings to queue routines Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-19 12:44 ` Christoph Hellwig
2017-06-18 15:21 ` [PATCH rfc 15/30] nvme-rdma: don't check queue state for shutdown/disable Sagi Grimberg
` (16 subsequent siblings)
30 siblings, 1 reply; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
If we move the queues out of the LIVE state, we might as well stop
them (drain, for rdma). Do it after we stop the sw request queues to
prevent a stray request from sneaking into .queue_rq after we stop the
queue.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index d9524389aedd..fbe2ca4f4ba3 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -952,16 +952,15 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
{
struct nvme_rdma_ctrl *ctrl = container_of(work,
struct nvme_rdma_ctrl, err_work);
- int i;
nvme_stop_keep_alive(&ctrl->ctrl);
- for (i = 0; i < ctrl->ctrl.queue_count; i++)
- clear_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[i].flags);
-
- if (ctrl->ctrl.queue_count > 1)
+ if (ctrl->ctrl.queue_count > 1) {
nvme_stop_queues(&ctrl->ctrl);
+ nvme_rdma_stop_io_queues(ctrl);
+ }
blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);
+ nvme_rdma_stop_queue(ctrl, 0);
/* We must take care of fastfail/requeue all our inflight requests */
if (ctrl->ctrl.queue_count > 1)
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 15/30] nvme-rdma: don't check queue state for shutdown/disable
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (13 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 14/30] nvme-rdma: stop queues instead of simply flipping their state Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-19 12:44 ` Christoph Hellwig
2017-06-18 15:21 ` [PATCH rfc 16/30] nvme-rdma: move tagset allocation to a dedicated routine Sagi Grimberg
` (15 subsequent siblings)
30 siblings, 1 reply; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
If the queue is not ready, the command will be rejected in .queue_rq
anyway, so there is no need to check the queue state before issuing
the shutdown/disable.
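The guard this relies on sits in the .queue_rq path; roughly (an
illustrative sketch showing the idea only -- the exact check in the
driver is more involved):

	if (unlikely(!test_bit(NVME_RDMA_Q_LIVE, &queue->flags)))
		return BLK_MQ_RQ_QUEUE_BUSY;	/* blk-mq will retry later */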
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index fbe2ca4f4ba3..700aef42c4f2 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1730,12 +1730,10 @@ static void nvme_rdma_teardown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
nvme_rdma_destroy_io_queues(ctrl, shutdown);
}
- if (test_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[0].flags)) {
- if (shutdown)
- nvme_shutdown_ctrl(&ctrl->ctrl);
- else
- nvme_disable_ctrl(&ctrl->ctrl, ctrl->ctrl.cap);
- }
+ if (shutdown)
+ nvme_shutdown_ctrl(&ctrl->ctrl);
+ else
+ nvme_disable_ctrl(&ctrl->ctrl, ctrl->ctrl.cap);
blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 16/30] nvme-rdma: move tagset allocation to a dedicated routine
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (14 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 15/30] nvme-rdma: don't check queue state for shutdown/disable Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-19 12:45 ` Christoph Hellwig
2017-06-18 15:21 ` [PATCH rfc 17/30] nvme-rdma: move admin specific resources to alloc_queue Sagi Grimberg
` (14 subsequent siblings)
30 siblings, 1 reply; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 148 ++++++++++++++++++++++++++---------------------
1 file changed, 83 insertions(+), 65 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 700aef42c4f2..c1ffdb823cbb 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -506,6 +506,72 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
return ret;
}
+static void nvme_rdma_free_tagset(struct nvme_ctrl *nctrl, bool admin)
+{
+ struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
+ struct blk_mq_tag_set *set = admin ?
+ &ctrl->admin_tag_set : &ctrl->tag_set;
+
+ nvme_rdma_dev_put(ctrl->device);
+ blk_mq_free_tag_set(set);
+}
+
+static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
+ bool admin)
+{
+ struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
+ struct blk_mq_tag_set *set;
+ int ret;
+
+ if (admin) {
+ set = &ctrl->admin_tag_set;
+ memset(set, 0, sizeof(*set));
+ set->ops = &nvme_rdma_admin_mq_ops;
+ set->queue_depth = NVME_RDMA_AQ_BLKMQ_DEPTH;
+ set->reserved_tags = 2; /* connect + keep-alive */
+ set->numa_node = NUMA_NO_NODE;
+ set->cmd_size = sizeof(struct nvme_rdma_request) +
+ SG_CHUNK_SIZE * sizeof(struct scatterlist);
+ set->driver_data = ctrl;
+ set->nr_hw_queues = 1;
+ set->timeout = ADMIN_TIMEOUT;
+ } else {
+ set = &ctrl->tag_set;
+ memset(set, 0, sizeof(*set));
+ set->ops = &nvme_rdma_mq_ops;
+ set->queue_depth = nctrl->opts->queue_size;
+ set->reserved_tags = 1; /* fabric connect */
+ set->numa_node = NUMA_NO_NODE;
+ set->flags = BLK_MQ_F_SHOULD_MERGE;
+ set->cmd_size = sizeof(struct nvme_rdma_request) +
+ SG_CHUNK_SIZE * sizeof(struct scatterlist);
+ set->driver_data = ctrl;
+ set->nr_hw_queues = nctrl->queue_count - 1;
+ set->timeout = NVME_IO_TIMEOUT;
+ }
+
+ ret = blk_mq_alloc_tag_set(set);
+ if (ret)
+ goto out;
+
+ /*
+ * We need a reference on the device as long as the tag_set is alive,
+ * as the MRs in the request structures need a valid ib_device.
+ */
+ ret = nvme_rdma_dev_get(ctrl->device);
+ if (!ret) {
+ ret = -EINVAL;
+ goto out_free_tagset;
+ }
+
+ return set;
+
+out_free_tagset:
+ blk_mq_free_tag_set(set);
+out:
+ return ERR_PTR(ret);
+}
+
static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
int idx, size_t queue_size)
{
@@ -602,8 +668,7 @@ static void nvme_rdma_destroy_io_queues(struct nvme_rdma_ctrl *ctrl, bool remove
nvme_rdma_stop_io_queues(ctrl);
if (remove) {
blk_cleanup_queue(ctrl->ctrl.connect_q);
- blk_mq_free_tag_set(&ctrl->tag_set);
- nvme_rdma_dev_put(ctrl->device);
+ nvme_rdma_free_tagset(&ctrl->ctrl, false);
}
nvme_rdma_free_io_queues(ctrl);
}
@@ -694,38 +759,19 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
return ret;
if (new) {
- /*
- * We need a reference on the device as long as the tag_set is alive,
- * as the MRs in the request structures need a valid ib_device.
- */
- ret = -EINVAL;
- if (!nvme_rdma_dev_get(ctrl->device))
+ ctrl->ctrl.tagset = nvme_rdma_alloc_tagset(&ctrl->ctrl, false);
+ if (IS_ERR(ctrl->ctrl.tagset)) {
+ ret = PTR_ERR(ctrl->ctrl.tagset);
goto out_free_io_queues;
+ }
- memset(&ctrl->tag_set, 0, sizeof(ctrl->tag_set));
- ctrl->tag_set.ops = &nvme_rdma_mq_ops;
- ctrl->tag_set.queue_depth = ctrl->ctrl.opts->queue_size;
- ctrl->tag_set.reserved_tags = 1; /* fabric connect */
- ctrl->tag_set.numa_node = NUMA_NO_NODE;
- ctrl->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
- ctrl->tag_set.cmd_size = sizeof(struct nvme_rdma_request) +
- SG_CHUNK_SIZE * sizeof(struct scatterlist);
- ctrl->tag_set.driver_data = ctrl;
- ctrl->tag_set.nr_hw_queues = ctrl->ctrl.max_queues - 1;
- ctrl->tag_set.timeout = NVME_IO_TIMEOUT;
-
- ret = blk_mq_alloc_tag_set(&ctrl->tag_set);
- if (ret)
- goto out_put_dev;
- ctrl->ctrl.tagset = &ctrl->tag_set;
-
- ctrl->ctrl.connect_q = blk_mq_init_queue(&ctrl->tag_set);
+ ctrl->ctrl.connect_q = blk_mq_init_queue(ctrl->ctrl.tagset);
if (IS_ERR(ctrl->ctrl.connect_q)) {
ret = PTR_ERR(ctrl->ctrl.connect_q);
goto out_free_tag_set;
}
} else {
- ret = blk_mq_reinit_tagset(&ctrl->tag_set);
+ ret = blk_mq_reinit_tagset(ctrl->ctrl.tagset);
if (ret)
goto out_free_io_queues;
}
@@ -741,10 +787,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
blk_cleanup_queue(ctrl->ctrl.connect_q);
out_free_tag_set:
if (new)
- blk_mq_free_tag_set(&ctrl->tag_set);
-out_put_dev:
- if (new)
- nvme_rdma_dev_put(ctrl->device);
+ nvme_rdma_free_tagset(&ctrl->ctrl, false);
out_free_io_queues:
nvme_rdma_free_io_queues(ctrl);
return ret;
@@ -756,8 +799,7 @@ static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl, bool remo
if (remove) {
blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
blk_cleanup_queue(ctrl->ctrl.admin_q);
- blk_mq_free_tag_set(&ctrl->admin_tag_set);
- nvme_rdma_dev_put(ctrl->device);
+ nvme_rdma_free_tagset(&ctrl->ctrl, true);
}
nvme_rdma_free_qe(ctrl->queues[0].device->dev, &ctrl->async_event_sqe,
@@ -778,43 +820,25 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new
ctrl->device->dev->attrs.max_fast_reg_page_list_len);
if (new) {
- /*
- * We need a reference on the device as long as the tag_set is alive,
- * as the MRs in the request structures need a valid ib_device.
- */
- error = -EINVAL;
- if (!nvme_rdma_dev_get(ctrl->device))
+ ctrl->ctrl.admin_tagset = nvme_rdma_alloc_tagset(&ctrl->ctrl, true);
+ if (IS_ERR(ctrl->ctrl.admin_tagset)) {
+ error = PTR_ERR(ctrl->ctrl.admin_tagset);
goto out_free_queue;
+ }
- memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set));
- ctrl->admin_tag_set.ops = &nvme_rdma_admin_mq_ops;
- ctrl->admin_tag_set.queue_depth = NVME_RDMA_AQ_BLKMQ_DEPTH;
- ctrl->admin_tag_set.reserved_tags = 2; /* connect + keep-alive */
- ctrl->admin_tag_set.numa_node = NUMA_NO_NODE;
- ctrl->admin_tag_set.cmd_size = sizeof(struct nvme_rdma_request) +
- SG_CHUNK_SIZE * sizeof(struct scatterlist);
- ctrl->admin_tag_set.driver_data = ctrl;
- ctrl->admin_tag_set.nr_hw_queues = 1;
- ctrl->admin_tag_set.timeout = ADMIN_TIMEOUT;
-
- error = blk_mq_alloc_tag_set(&ctrl->admin_tag_set);
- if (error)
- goto out_put_dev;
- ctrl->ctrl.admin_tagset = &ctrl->admin_tag_set;
-
- ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+ ctrl->ctrl.admin_q = blk_mq_init_queue(ctrl->ctrl.admin_tagset);
if (IS_ERR(ctrl->ctrl.admin_q)) {
error = PTR_ERR(ctrl->ctrl.admin_q);
goto out_free_tagset;
}
- ctrl->ctrl.admin_connect_q = blk_mq_init_queue(&ctrl->admin_tag_set);
+ ctrl->ctrl.admin_connect_q = blk_mq_init_queue(ctrl->ctrl.admin_tagset);
if (IS_ERR(ctrl->ctrl.admin_connect_q)) {
error = PTR_ERR(ctrl->ctrl.admin_connect_q);
goto out_cleanup_queue;
}
} else {
- error = blk_mq_reinit_tagset(&ctrl->admin_tag_set);
+ error = blk_mq_reinit_tagset(ctrl->ctrl.admin_tagset);
if (error)
goto out_free_queue;
}
@@ -861,14 +885,8 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new
if (new)
blk_cleanup_queue(ctrl->ctrl.admin_q);
out_free_tagset:
- if (new) {
- /* disconnect and drain the queue before freeing the tagset */
- nvme_rdma_stop_queue(ctrl, 0);
- blk_mq_free_tag_set(&ctrl->admin_tag_set);
- }
-out_put_dev:
if (new)
- nvme_rdma_dev_put(ctrl->device);
+ nvme_rdma_free_tagset(&ctrl->ctrl, true);
out_free_queue:
nvme_rdma_free_queue(ctrl, 0);
return error;
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 17/30] nvme-rdma: move admin specific resources to alloc_queue
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (15 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 16/30] nvme-rdma: move tagset allocation to a dedicated routine Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-19 12:46 ` Christoph Hellwig
2017-06-18 15:21 ` [PATCH rfc 18/30] nvme-rdma: limit max_queues to rdma device number of completion vectors Sagi Grimberg
` (13 subsequent siblings)
30 siblings, 1 reply; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
We're trying to make admin queue configuration generic, so
move the rdma specifics to the queue allocation (based on
the queue index passed).
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 38 ++++++++++++++++++++++----------------
1 file changed, 22 insertions(+), 16 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index c1ffdb823cbb..7f4b66cf67cc 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -618,6 +618,23 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
goto out_destroy_cm_id;
}
+ if (!idx) {
+ ctrl->device = ctrl->queues[0].device;
+ ctrl->max_fr_pages = min_t(u32, NVME_RDMA_MAX_SEGMENTS,
+ ctrl->device->dev->attrs.max_fast_reg_page_list_len);
+ ctrl->ctrl.max_hw_sectors =
+ (ctrl->max_fr_pages - 1) << (PAGE_SHIFT - 9);
+
+ ret = nvme_rdma_alloc_qe(ctrl->queues[0].device->dev,
+ &ctrl->async_event_sqe, sizeof(struct nvme_command),
+ DMA_TO_DEVICE);
+ if (ret) {
+ nvme_rdma_destroy_queue_ib(&ctrl->queues[0]);
+ goto out_destroy_cm_id;
+ }
+
+ }
+
clear_bit(NVME_RDMA_Q_DELETING, &queue->flags);
return 0;
@@ -643,6 +660,11 @@ static void nvme_rdma_free_queue(struct nvme_rdma_ctrl *ctrl, int qid)
if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags))
return;
+
+ if (!qid)
+ nvme_rdma_free_qe(ctrl->queues[0].device->dev,
+ &ctrl->async_event_sqe, sizeof(struct nvme_command),
+ DMA_TO_DEVICE);
nvme_rdma_destroy_queue_ib(queue);
rdma_destroy_id(queue->cm_id);
}
@@ -801,9 +823,6 @@ static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl, bool remo
blk_cleanup_queue(ctrl->ctrl.admin_q);
nvme_rdma_free_tagset(&ctrl->ctrl, true);
}
-
- nvme_rdma_free_qe(ctrl->queues[0].device->dev, &ctrl->async_event_sqe,
- sizeof(struct nvme_command), DMA_TO_DEVICE);
nvme_rdma_free_queue(ctrl, 0);
}
@@ -815,10 +834,6 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new
if (error)
return error;
- ctrl->device = ctrl->queues[0].device;
- ctrl->max_fr_pages = min_t(u32, NVME_RDMA_MAX_SEGMENTS,
- ctrl->device->dev->attrs.max_fast_reg_page_list_len);
-
if (new) {
ctrl->ctrl.admin_tagset = nvme_rdma_alloc_tagset(&ctrl->ctrl, true);
if (IS_ERR(ctrl->ctrl.admin_tagset)) {
@@ -861,19 +876,10 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new
if (error)
goto out_cleanup_connect_queue;
- ctrl->ctrl.max_hw_sectors =
- (ctrl->max_fr_pages - 1) << (PAGE_SHIFT - 9);
-
error = nvme_init_identify(&ctrl->ctrl);
if (error)
goto out_cleanup_connect_queue;
- error = nvme_rdma_alloc_qe(ctrl->queues[0].device->dev,
- &ctrl->async_event_sqe, sizeof(struct nvme_command),
- DMA_TO_DEVICE);
- if (error)
- goto out_cleanup_connect_queue;
-
nvme_start_keep_alive(&ctrl->ctrl);
return 0;
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 18/30] nvme-rdma: limit max_queues to rdma device number of completion vectors
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (16 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 17/30] nvme-rdma: move admin specific resources to alloc_queue Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-18 15:21 ` [PATCH rfc 19/30] nvme-rdma: call ops->reg_read64 instead of nvmf_reg_read64 Sagi Grimberg
` (12 subsequent siblings)
30 siblings, 0 replies; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
nvme_rdma_alloc_io_queues is heading to generic code, so we want to
decouple it from transport-specific dependencies such as the rdma
device's number of completion vectors.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 7f4b66cf67cc..ce63dd40e6b4 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -625,6 +625,14 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
ctrl->ctrl.max_hw_sectors =
(ctrl->max_fr_pages - 1) << (PAGE_SHIFT - 9);
+ /*
+ * we map queues according to the device irq vectors for
+ * optimal locality so we don't need more queues than
+ * completion vectors.
+ */
+ ctrl->ctrl.max_queues = min_t(u32, ctrl->ctrl.max_queues,
+ ctrl->device->dev->num_comp_vectors + 1);
+
ret = nvme_rdma_alloc_qe(ctrl->queues[0].device->dev,
&ctrl->async_event_sqe, sizeof(struct nvme_command),
DMA_TO_DEVICE);
@@ -632,7 +640,6 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
nvme_rdma_destroy_queue_ib(&ctrl->queues[0]);
goto out_destroy_cm_id;
}
-
}
clear_bit(NVME_RDMA_Q_DELETING, &queue->flags);
@@ -733,18 +740,9 @@ static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl)
static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
{
unsigned int nr_io_queues = ctrl->ctrl.max_queues - 1;
- struct ib_device *ibdev = ctrl->device->dev;
int i, ret;
nr_io_queues = min(nr_io_queues, num_online_cpus());
- /*
- * we map queues according to the device irq vectors for
- * optimal locality so we don't need more queues than
- * completion vectors.
- */
- nr_io_queues = min_t(unsigned int, nr_io_queues,
- ibdev->num_comp_vectors);
-
ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
if (ret)
return ret;
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 19/30] nvme-rdma: call ops->reg_read64 instead of nvmf_reg_read64
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (17 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 18/30] nvme-rdma: limit max_queues to rdma device number of completion vectors Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-18 15:21 ` [PATCH rfc 20/30] nvme: add err, reconnect and delete work items to nvme core Sagi Grimberg
` (11 subsequent siblings)
30 siblings, 0 replies; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
Make it a generic routine.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index ce63dd40e6b4..753e66c1d77d 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -860,7 +860,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new
if (error)
goto out_cleanup_connect_queue;
- error = nvmf_reg_read64(&ctrl->ctrl, NVME_REG_CAP, &ctrl->ctrl.cap);
+ error = ctrl->ctrl.ops->reg_read64(&ctrl->ctrl, NVME_REG_CAP, &ctrl->ctrl.cap);
if (error) {
dev_err(ctrl->ctrl.device,
"prop_get NVME_REG_CAP failed\n");
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 20/30] nvme: add err, reconnect and delete work items to nvme core
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (18 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 19/30] nvme-rdma: call ops->reg_read64 instead of nvmf_reg_read64 Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-19 12:49 ` Christoph Hellwig
2017-06-18 15:21 ` [PATCH rfc 21/30] nvme-rdma: plumb nvme_ctrl down the call stack Sagi Grimberg
` (10 subsequent siblings)
30 siblings, 1 reply; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
We intend for these handlers to become generic, so add them to
the nvme core controller struct.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/nvme.h | 4 +++
drivers/nvme/host/rdma.c | 69 ++++++++++++++++++++++++------------------------
2 files changed, 38 insertions(+), 35 deletions(-)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 5b75f6a81764..c604d471aa3d 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -164,6 +164,7 @@ struct nvme_ctrl {
bool subsystem;
unsigned long quirks;
struct nvme_id_power_state psd[32];
+
struct work_struct scan_work;
struct work_struct async_event_work;
struct delayed_work ka_work;
@@ -181,6 +182,9 @@ struct nvme_ctrl {
u16 icdoff;
u16 maxcmd;
int nr_reconnects;
+ struct work_struct delete_work;
+ struct work_struct err_work;
+ struct delayed_work reconnect_work;
struct nvmf_ctrl_options *opts;
};
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 753e66c1d77d..6ce5054d4470 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -107,13 +107,9 @@ struct nvme_rdma_ctrl {
/* other member variables */
struct blk_mq_tag_set tag_set;
- struct work_struct delete_work;
- struct work_struct err_work;
struct nvme_rdma_qe async_event_sqe;
- struct delayed_work reconnect_work;
-
struct list_head list;
struct blk_mq_tag_set admin_tag_set;
@@ -925,18 +921,19 @@ static void nvme_rdma_reconnect_or_remove(struct nvme_rdma_ctrl *ctrl)
if (nvmf_should_reconnect(&ctrl->ctrl)) {
dev_info(ctrl->ctrl.device, "Reconnecting in %d seconds...\n",
ctrl->ctrl.opts->reconnect_delay);
- queue_delayed_work(nvme_wq, &ctrl->reconnect_work,
+ queue_delayed_work(nvme_wq, &ctrl->ctrl.reconnect_work,
ctrl->ctrl.opts->reconnect_delay * HZ);
} else {
dev_info(ctrl->ctrl.device, "Removing controller...\n");
- queue_work(nvme_wq, &ctrl->delete_work);
+ queue_work(nvme_wq, &ctrl->ctrl.delete_work);
}
}
static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
{
- struct nvme_rdma_ctrl *ctrl = container_of(to_delayed_work(work),
- struct nvme_rdma_ctrl, reconnect_work);
+ struct nvme_ctrl *nctrl = container_of(to_delayed_work(work),
+ struct nvme_ctrl, reconnect_work);
+ struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
bool changed;
int ret;
@@ -972,8 +969,9 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
static void nvme_rdma_error_recovery_work(struct work_struct *work)
{
- struct nvme_rdma_ctrl *ctrl = container_of(work,
- struct nvme_rdma_ctrl, err_work);
+ struct nvme_ctrl *nctrl = container_of(work,
+ struct nvme_ctrl, err_work);
+ struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
nvme_stop_keep_alive(&ctrl->ctrl);
@@ -1006,7 +1004,7 @@ static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl)
if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING))
return;
- queue_work(nvme_wq, &ctrl->err_work);
+ queue_work(nvme_wq, &ctrl->ctrl.err_work);
}
static void nvme_rdma_wr_error(struct ib_cq *cq, struct ib_wc *wc,
@@ -1742,8 +1740,8 @@ static const struct blk_mq_ops nvme_rdma_admin_mq_ops = {
static void nvme_rdma_teardown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
{
nvme_stop_keep_alive(&ctrl->ctrl);
- cancel_work_sync(&ctrl->err_work);
- cancel_delayed_work_sync(&ctrl->reconnect_work);
+ cancel_work_sync(&ctrl->ctrl.err_work);
+ cancel_delayed_work_sync(&ctrl->ctrl.reconnect_work);
if (ctrl->ctrl.max_queues > 1) {
nvme_stop_queues(&ctrl->ctrl);
@@ -1765,17 +1763,18 @@ static void nvme_rdma_teardown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
static void nvme_rdma_del_ctrl_work(struct work_struct *work)
{
- struct nvme_rdma_ctrl *ctrl = container_of(work,
- struct nvme_rdma_ctrl, delete_work);
+ struct nvme_ctrl *nctrl = container_of(work,
+ struct nvme_ctrl, delete_work);
+ struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
nvme_uninit_ctrl(&ctrl->ctrl);
nvme_rdma_teardown_ctrl(ctrl, true);
nvme_put_ctrl(&ctrl->ctrl);
}
-static int __nvme_rdma_del_ctrl(struct nvme_rdma_ctrl *ctrl)
+static int __nvme_rdma_del_ctrl(struct nvme_ctrl *ctrl)
{
- if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
+ if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING))
return -EBUSY;
if (!queue_work(nvme_wq, &ctrl->delete_work))
@@ -1784,28 +1783,28 @@ static int __nvme_rdma_del_ctrl(struct nvme_rdma_ctrl *ctrl)
return 0;
}
-static int nvme_rdma_del_ctrl(struct nvme_ctrl *nctrl)
+static int nvme_rdma_del_ctrl(struct nvme_ctrl *ctrl)
{
- struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
int ret = 0;
/*
* Keep a reference until all work is flushed since
* __nvme_rdma_del_ctrl can free the ctrl mem
*/
- if (!kref_get_unless_zero(&ctrl->ctrl.kref))
+ if (!kref_get_unless_zero(&ctrl->kref))
return -EBUSY;
ret = __nvme_rdma_del_ctrl(ctrl);
if (!ret)
flush_work(&ctrl->delete_work);
- nvme_put_ctrl(&ctrl->ctrl);
+ nvme_put_ctrl(ctrl);
return ret;
}
static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
{
- struct nvme_rdma_ctrl *ctrl =
- container_of(work, struct nvme_rdma_ctrl, ctrl.reset_work);
+ struct nvme_ctrl *nctrl = container_of(work,
+ struct nvme_ctrl, reset_work);
+ struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
int ret;
bool changed;
@@ -1866,7 +1865,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
if (!ctrl)
return ERR_PTR(-ENOMEM);
- ctrl->ctrl.opts = opts;
+
INIT_LIST_HEAD(&ctrl->list);
if (opts->mask & NVMF_OPT_TRSVCID)
@@ -1891,21 +1890,21 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
}
}
+ ctrl->ctrl.opts = opts;
+ ctrl->ctrl.max_queues = opts->nr_io_queues + 1;
+ ctrl->ctrl.sqsize = opts->queue_size - 1;
+ ctrl->ctrl.kato = opts->kato;
+ INIT_DELAYED_WORK(&ctrl->ctrl.reconnect_work,
+ nvme_rdma_reconnect_ctrl_work);
+ INIT_WORK(&ctrl->ctrl.err_work, nvme_rdma_error_recovery_work);
+ INIT_WORK(&ctrl->ctrl.delete_work, nvme_rdma_del_ctrl_work);
+ INIT_WORK(&ctrl->ctrl.reset_work, nvme_rdma_reset_ctrl_work);
+
ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_rdma_ctrl_ops,
0 /* no quirks, we're perfect! */);
if (ret)
goto out_free_ctrl;
- INIT_DELAYED_WORK(&ctrl->reconnect_work,
- nvme_rdma_reconnect_ctrl_work);
- INIT_WORK(&ctrl->err_work, nvme_rdma_error_recovery_work);
- INIT_WORK(&ctrl->delete_work, nvme_rdma_del_ctrl_work);
- INIT_WORK(&ctrl->ctrl.reset_work, nvme_rdma_reset_ctrl_work);
-
- ctrl->ctrl.max_queues = opts->nr_io_queues + 1;
- ctrl->ctrl.sqsize = opts->queue_size - 1;
- ctrl->ctrl.kato = opts->kato;
-
ret = -ENOMEM;
ctrl->queues = kcalloc(ctrl->ctrl.max_queues, sizeof(*ctrl->queues),
GFP_KERNEL);
@@ -2011,7 +2010,7 @@ static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
dev_info(ctrl->ctrl.device,
"Removing ctrl: NQN \"%s\", addr %pISp\n",
ctrl->ctrl.opts->subsysnqn, &ctrl->addr);
- __nvme_rdma_del_ctrl(ctrl);
+ __nvme_rdma_del_ctrl(&ctrl->ctrl);
}
mutex_unlock(&nvme_rdma_ctrl_mutex);
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* Re: [PATCH rfc 20/30] nvme: add err, reconnect and delete work items to nvme core
2017-06-18 15:21 ` [PATCH rfc 20/30] nvme: add err, reconnect and delete work items to nvme core Sagi Grimberg
@ 2017-06-19 12:49 ` Christoph Hellwig
2017-06-19 14:14 ` Sagi Grimberg
0 siblings, 1 reply; 69+ messages in thread
From: Christoph Hellwig @ 2017-06-19 12:49 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, linux-block
On Sun, Jun 18, 2017 at 06:21:54PM +0300, Sagi Grimberg wrote:
> We intend for these handlers to become generic, so add them to
> the nvme core controller struct.
Do you remember why we actually need all the different work items?
We need err_work to recover from RDMA QP-level errors. But how
is it so different from a reset in that respect? Similarly why
do we need reset to be different from reconnect? Especially as
reconnect sort of is the reset of fabrics.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH rfc 20/30] nvme: add err, reconnect and delete work items to nvme core
2017-06-19 12:49 ` Christoph Hellwig
@ 2017-06-19 14:14 ` Sagi Grimberg
0 siblings, 0 replies; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-19 14:14 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-nvme, Keith Busch, linux-block
>> We intend for these handlers to become generic, so add them to
>> the nvme core controller struct.
>
> Do you remember why we actually need all the different work items?
I remember documenting it at some point, but either it got lost
somewhere or I don't remember...
> We need err_work to recover from RDMA QP-level errors.
transport errors are usually detected in soft-irq context, so we queue
up err_work to (roughly as sketched below):
1. stop + drain queues
2. fail inflight I/O.
3. queue delayed reconnect (reconnect_work)
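In code that is roughly (condensed from the existing error recovery
work in this series; admin queue handling omitted):

	static void nvme_rdma_error_recovery_work(struct work_struct *work)
	{
		struct nvme_rdma_ctrl *ctrl = container_of(work,
				struct nvme_rdma_ctrl, err_work);

		nvme_stop_keep_alive(&ctrl->ctrl);
		/* 1. stop + drain queues */
		nvme_stop_queues(&ctrl->ctrl);
		nvme_rdma_stop_io_queues(ctrl);
		/* 2. fast-fail inflight I/O so upper layers can fail over */
		blk_mq_tagset_busy_iter(&ctrl->tag_set,
				nvme_cancel_request, &ctrl->ctrl);
		/* 3. queue delayed reconnect */
		nvme_rdma_reconnect_or_remove(ctrl);
	}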
> But how is it so different from a reset in that respect? Similarly why
> do we need reset to be different from reconnect? Especially as
> reconnect sort of is the reset of fabrics.
Hmm, resets and reconnects are indeed similar, but one difference
is that in resets we do not fast-fail inflight I/O, as the expectation
is that recovery should be immediate (it also matches pci in
that respect), while we consider reconnect something that can last
for a while, so we fail fast to allow failover and continue reconnect
attempts quietly. Another difference is that a failed reset results in
controller removal, while for reconnects we have to exhaust
ctrl_loss_tmo.
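The ctrl_loss_tmo part is the nvmf_should_reconnect() check that
reconnect_or_remove consults; roughly (paraphrasing the fabrics helper,
where max_reconnects is derived from ctrl_loss_tmo / reconnect_delay):

	static inline bool nvmf_should_reconnect(struct nvme_ctrl *ctrl)
	{
		/* -1 means keep reconnecting forever */
		if (ctrl->opts->max_reconnects == -1 ||
		    ctrl->nr_reconnects < ctrl->opts->max_reconnects)
			return true;

		return false;
	}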
We could change things, like merging reconnect and reset and introducing
a concept of an "on-host reset". Not sure it would be any less confusing
though...
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH rfc 21/30] nvme-rdma: plumb nvme_ctrl down the call stack
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (19 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 20/30] nvme: add err, reconnect and delete work items to nvme core Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-18 15:21 ` [PATCH rfc 22/30] nvme-rdma: Split create_ctrl to transport specific and generic parts Sagi Grimberg
` (9 subsequent siblings)
30 siblings, 0 replies; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
We are trying to make these routines generic, so pass
nvme_ctrl down the call stack and only access the rdma
ctrl at the bottom of the stack, where we call out to
the transport.
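At the bottom of the stack this is just the usual container_of cast,
as in this sketch of the pattern used throughout the hunks below:

	static inline struct nvme_rdma_ctrl *to_rdma_ctrl(struct nvme_ctrl *nctrl)
	{
		return container_of(nctrl, struct nvme_rdma_ctrl, ctrl);
	}

so each transport-specific helper now takes a struct nvme_ctrl * and
starts with ctrl = to_rdma_ctrl(nctrl), keeping its old body otherwise.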
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 251 +++++++++++++++++++++++------------------------
1 file changed, 125 insertions(+), 126 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 6ce5054d4470..e656b9b17d67 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -568,14 +568,14 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
return ERR_PTR(ret);
}
-static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
+static int nvme_rdma_alloc_queue(struct nvme_ctrl *nctrl,
int idx, size_t queue_size)
{
- struct nvme_rdma_queue *queue;
+ struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
+ struct nvme_rdma_queue *queue = &ctrl->queues[idx];
struct sockaddr *src_addr = NULL;
int ret;
- queue = &ctrl->queues[idx];
queue->ctrl = ctrl;
init_completion(&queue->cm_done);
@@ -647,8 +647,9 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
return ret;
}
-static void nvme_rdma_stop_queue(struct nvme_rdma_ctrl *ctrl, int qid)
+static void nvme_rdma_stop_queue(struct nvme_ctrl *nctrl, int qid)
{
+ struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
struct nvme_rdma_queue *queue = &ctrl->queues[qid];
if (!test_and_clear_bit(NVME_RDMA_Q_LIVE, &queue->flags))
@@ -657,8 +658,9 @@ static void nvme_rdma_stop_queue(struct nvme_rdma_ctrl *ctrl, int qid)
ib_drain_qp(queue->qp);
}
-static void nvme_rdma_free_queue(struct nvme_rdma_ctrl *ctrl, int qid)
+static void nvme_rdma_free_queue(struct nvme_ctrl *nctrl, int qid)
{
+ struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
struct nvme_rdma_queue *queue = &ctrl->queues[qid];
if (test_and_set_bit(NVME_RDMA_Q_DELETING, &queue->flags))
@@ -672,54 +674,55 @@ static void nvme_rdma_free_queue(struct nvme_rdma_ctrl *ctrl, int qid)
rdma_destroy_id(queue->cm_id);
}
-static void nvme_rdma_free_io_queues(struct nvme_rdma_ctrl *ctrl)
+static void nvme_rdma_free_io_queues(struct nvme_ctrl *ctrl)
{
int i;
- for (i = 1; i < ctrl->ctrl.queue_count; i++)
+ for (i = 1; i < ctrl->queue_count; i++)
nvme_rdma_free_queue(ctrl, i);
}
-static void nvme_rdma_stop_io_queues(struct nvme_rdma_ctrl *ctrl)
+static void nvme_rdma_stop_io_queues(struct nvme_ctrl *ctrl)
{
int i;
- for (i = 1; i < ctrl->ctrl.queue_count; i++)
+ for (i = 1; i < ctrl->queue_count; i++)
nvme_rdma_stop_queue(ctrl, i);
}
-static void nvme_rdma_destroy_io_queues(struct nvme_rdma_ctrl *ctrl, bool remove)
+static void nvme_rdma_destroy_io_queues(struct nvme_ctrl *ctrl, bool remove)
{
nvme_rdma_stop_io_queues(ctrl);
if (remove) {
- blk_cleanup_queue(ctrl->ctrl.connect_q);
- nvme_rdma_free_tagset(&ctrl->ctrl, false);
+ blk_cleanup_queue(ctrl->connect_q);
+ nvme_rdma_free_tagset(ctrl, false);
}
nvme_rdma_free_io_queues(ctrl);
}
-static int nvme_rdma_start_queue(struct nvme_rdma_ctrl *ctrl, int idx)
+static int nvme_rdma_start_queue(struct nvme_ctrl *nctrl, int idx)
{
+ struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
int ret;
if (idx)
- ret = nvmf_connect_io_queue(&ctrl->ctrl, idx);
+ ret = nvmf_connect_io_queue(nctrl, idx);
else
- ret = nvmf_connect_admin_queue(&ctrl->ctrl);
+ ret = nvmf_connect_admin_queue(nctrl);
if (!ret)
set_bit(NVME_RDMA_Q_LIVE, &ctrl->queues[idx].flags);
else
- dev_info(ctrl->ctrl.device,
+ dev_info(nctrl->device,
"failed to connect queue: %d ret=%d\n", idx, ret);
return ret;
}
-static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl)
+static int nvme_rdma_start_io_queues(struct nvme_ctrl *ctrl)
{
int i, ret = 0;
- for (i = 1; i < ctrl->ctrl.queue_count; i++) {
+ for (i = 1; i < ctrl->queue_count; i++) {
ret = nvme_rdma_start_queue(ctrl, i);
if (ret)
goto out_stop_queues;
@@ -733,26 +736,26 @@ static int nvme_rdma_start_io_queues(struct nvme_rdma_ctrl *ctrl)
return ret;
}
-static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
+static int nvme_rdma_alloc_io_queues(struct nvme_ctrl *ctrl)
{
- unsigned int nr_io_queues = ctrl->ctrl.max_queues - 1;
+ unsigned int nr_io_queues = ctrl->max_queues - 1;
int i, ret;
nr_io_queues = min(nr_io_queues, num_online_cpus());
- ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
+ ret = nvme_set_queue_count(ctrl, &nr_io_queues);
if (ret)
return ret;
- ctrl->ctrl.queue_count = nr_io_queues + 1;
- if (ctrl->ctrl.queue_count < 2)
+ ctrl->queue_count = nr_io_queues + 1;
+ if (ctrl->queue_count < 2)
return 0;
- dev_info(ctrl->ctrl.device,
+ dev_info(ctrl->device,
"creating %d I/O queues.\n", nr_io_queues);
- for (i = 1; i < ctrl->ctrl.queue_count; i++) {
+ for (i = 1; i < ctrl->queue_count; i++) {
ret = nvme_rdma_alloc_queue(ctrl, i,
- ctrl->ctrl.sqsize + 1);
+ ctrl->sqsize + 1);
if (ret)
goto out_free_queues;
}
@@ -766,7 +769,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_rdma_ctrl *ctrl)
return ret;
}
-static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
+static int nvme_rdma_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
{
int ret;
@@ -775,19 +778,19 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
return ret;
if (new) {
- ctrl->ctrl.tagset = nvme_rdma_alloc_tagset(&ctrl->ctrl, false);
- if (IS_ERR(ctrl->ctrl.tagset)) {
- ret = PTR_ERR(ctrl->ctrl.tagset);
+ ctrl->tagset = nvme_rdma_alloc_tagset(ctrl, false);
+ if (IS_ERR(ctrl->tagset)) {
+ ret = PTR_ERR(ctrl->tagset);
goto out_free_io_queues;
}
- ctrl->ctrl.connect_q = blk_mq_init_queue(ctrl->ctrl.tagset);
- if (IS_ERR(ctrl->ctrl.connect_q)) {
- ret = PTR_ERR(ctrl->ctrl.connect_q);
+ ctrl->connect_q = blk_mq_init_queue(ctrl->tagset);
+ if (IS_ERR(ctrl->connect_q)) {
+ ret = PTR_ERR(ctrl->connect_q);
goto out_free_tag_set;
}
} else {
- ret = blk_mq_reinit_tagset(ctrl->ctrl.tagset);
+ ret = blk_mq_reinit_tagset(ctrl->tagset);
if (ret)
goto out_free_io_queues;
}
@@ -800,27 +803,27 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
out_cleanup_connect_q:
if (new)
- blk_cleanup_queue(ctrl->ctrl.connect_q);
+ blk_cleanup_queue(ctrl->connect_q);
out_free_tag_set:
if (new)
- nvme_rdma_free_tagset(&ctrl->ctrl, false);
+ nvme_rdma_free_tagset(ctrl, false);
out_free_io_queues:
nvme_rdma_free_io_queues(ctrl);
return ret;
}
-static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl, bool remove)
+static void nvme_rdma_destroy_admin_queue(struct nvme_ctrl *ctrl, bool remove)
{
nvme_rdma_stop_queue(ctrl, 0);
if (remove) {
- blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
- blk_cleanup_queue(ctrl->ctrl.admin_q);
- nvme_rdma_free_tagset(&ctrl->ctrl, true);
+ blk_cleanup_queue(ctrl->admin_connect_q);
+ blk_cleanup_queue(ctrl->admin_q);
+ nvme_rdma_free_tagset(ctrl, true);
}
nvme_rdma_free_queue(ctrl, 0);
}
-static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new)
+static int nvme_rdma_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
{
int error;
@@ -829,25 +832,25 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new
return error;
if (new) {
- ctrl->ctrl.admin_tagset = nvme_rdma_alloc_tagset(&ctrl->ctrl, true);
- if (IS_ERR(ctrl->ctrl.admin_tagset)) {
- error = PTR_ERR(ctrl->ctrl.admin_tagset);
+ ctrl->admin_tagset = nvme_rdma_alloc_tagset(ctrl, true);
+ if (IS_ERR(ctrl->admin_tagset)) {
+ error = PTR_ERR(ctrl->admin_tagset);
goto out_free_queue;
}
- ctrl->ctrl.admin_q = blk_mq_init_queue(ctrl->ctrl.admin_tagset);
- if (IS_ERR(ctrl->ctrl.admin_q)) {
- error = PTR_ERR(ctrl->ctrl.admin_q);
+ ctrl->admin_q = blk_mq_init_queue(ctrl->admin_tagset);
+ if (IS_ERR(ctrl->admin_q)) {
+ error = PTR_ERR(ctrl->admin_q);
goto out_free_tagset;
}
- ctrl->ctrl.admin_connect_q = blk_mq_init_queue(ctrl->ctrl.admin_tagset);
- if (IS_ERR(ctrl->ctrl.admin_connect_q)) {
- error = PTR_ERR(ctrl->ctrl.admin_connect_q);
+ ctrl->admin_connect_q = blk_mq_init_queue(ctrl->admin_tagset);
+ if (IS_ERR(ctrl->admin_connect_q)) {
+ error = PTR_ERR(ctrl->admin_connect_q);
goto out_cleanup_queue;
}
} else {
- error = blk_mq_reinit_tagset(ctrl->ctrl.admin_tagset);
+ error = blk_mq_reinit_tagset(ctrl->admin_tagset);
if (error)
goto out_free_queue;
}
@@ -856,37 +859,37 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl, bool new
if (error)
goto out_cleanup_connect_queue;
- error = ctrl->ctrl.ops->reg_read64(&ctrl->ctrl, NVME_REG_CAP, &ctrl->ctrl.cap);
+ error = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &ctrl->cap);
if (error) {
- dev_err(ctrl->ctrl.device,
+ dev_err(ctrl->device,
"prop_get NVME_REG_CAP failed\n");
goto out_cleanup_connect_queue;
}
- ctrl->ctrl.sqsize =
- min_t(int, NVME_CAP_MQES(ctrl->ctrl.cap), ctrl->ctrl.sqsize);
+ ctrl->sqsize =
+ min_t(int, NVME_CAP_MQES(ctrl->cap), ctrl->sqsize);
- error = nvme_enable_ctrl(&ctrl->ctrl, ctrl->ctrl.cap);
+ error = nvme_enable_ctrl(ctrl, ctrl->cap);
if (error)
goto out_cleanup_connect_queue;
- error = nvme_init_identify(&ctrl->ctrl);
+ error = nvme_init_identify(ctrl);
if (error)
goto out_cleanup_connect_queue;
- nvme_start_keep_alive(&ctrl->ctrl);
+ nvme_start_keep_alive(ctrl);
return 0;
out_cleanup_connect_queue:
if (new)
- blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
+ blk_cleanup_queue(ctrl->admin_connect_q);
out_cleanup_queue:
if (new)
- blk_cleanup_queue(ctrl->ctrl.admin_q);
+ blk_cleanup_queue(ctrl->admin_q);
out_free_tagset:
if (new)
- nvme_rdma_free_tagset(&ctrl->ctrl, true);
+ nvme_rdma_free_tagset(ctrl, true);
out_free_queue:
nvme_rdma_free_queue(ctrl, 0);
return error;
@@ -909,37 +912,36 @@ static void nvme_rdma_free_ctrl(struct nvme_ctrl *nctrl)
kfree(ctrl);
}
-static void nvme_rdma_reconnect_or_remove(struct nvme_rdma_ctrl *ctrl)
+static void nvme_rdma_reconnect_or_remove(struct nvme_ctrl *ctrl)
{
/* If we are resetting/deleting then do nothing */
- if (ctrl->ctrl.state != NVME_CTRL_RECONNECTING) {
- WARN_ON_ONCE(ctrl->ctrl.state == NVME_CTRL_NEW ||
- ctrl->ctrl.state == NVME_CTRL_LIVE);
+ if (ctrl->state != NVME_CTRL_RECONNECTING) {
+ WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
+ ctrl->state == NVME_CTRL_LIVE);
return;
}
- if (nvmf_should_reconnect(&ctrl->ctrl)) {
- dev_info(ctrl->ctrl.device, "Reconnecting in %d seconds...\n",
- ctrl->ctrl.opts->reconnect_delay);
- queue_delayed_work(nvme_wq, &ctrl->ctrl.reconnect_work,
- ctrl->ctrl.opts->reconnect_delay * HZ);
+ if (nvmf_should_reconnect(ctrl)) {
+ dev_info(ctrl->device, "Reconnecting in %d seconds...\n",
+ ctrl->opts->reconnect_delay);
+ queue_delayed_work(nvme_wq, &ctrl->reconnect_work,
+ ctrl->opts->reconnect_delay * HZ);
} else {
- dev_info(ctrl->ctrl.device, "Removing controller...\n");
- queue_work(nvme_wq, &ctrl->ctrl.delete_work);
+ dev_info(ctrl->device, "Removing controller...\n");
+ queue_work(nvme_wq, &ctrl->delete_work);
}
}
static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
{
- struct nvme_ctrl *nctrl = container_of(to_delayed_work(work),
+ struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
struct nvme_ctrl, reconnect_work);
- struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
bool changed;
int ret;
- ++ctrl->ctrl.nr_reconnects;
+ ++ctrl->nr_reconnects;
- if (ctrl->ctrl.max_queues > 1)
+ if (ctrl->max_queues > 1)
nvme_rdma_destroy_io_queues(ctrl, false);
nvme_rdma_destroy_admin_queue(ctrl, false);
@@ -948,53 +950,52 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
if (ret)
goto requeue;
- if (ctrl->ctrl.max_queues > 1) {
+ if (ctrl->max_queues > 1) {
ret = nvme_rdma_configure_io_queues(ctrl, false);
if (ret)
goto requeue;
}
- changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
+ changed = nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE);
WARN_ON_ONCE(!changed);
- ctrl->ctrl.nr_reconnects = 0;
- dev_info(ctrl->ctrl.device, "Successfully reconnected\n");
+ ctrl->nr_reconnects = 0;
+ dev_info(ctrl->device, "Successfully reconnected\n");
return;
requeue:
- dev_info(ctrl->ctrl.device, "Failed reconnect attempt %d\n",
- ctrl->ctrl.nr_reconnects);
+ dev_info(ctrl->device, "Failed reconnect attempt %d\n",
+ ctrl->nr_reconnects);
nvme_rdma_reconnect_or_remove(ctrl);
}
static void nvme_rdma_error_recovery_work(struct work_struct *work)
{
- struct nvme_ctrl *nctrl = container_of(work,
+ struct nvme_ctrl *ctrl = container_of(work,
struct nvme_ctrl, err_work);
- struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
- nvme_stop_keep_alive(&ctrl->ctrl);
+ nvme_stop_keep_alive(ctrl);
- if (ctrl->ctrl.queue_count > 1) {
- nvme_stop_queues(&ctrl->ctrl);
+ if (ctrl->queue_count > 1) {
+ nvme_stop_queues(ctrl);
nvme_rdma_stop_io_queues(ctrl);
}
- blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);
+ blk_mq_stop_hw_queues(ctrl->admin_q);
nvme_rdma_stop_queue(ctrl, 0);
/* We must take care of fastfail/requeue all our inflight requests */
- if (ctrl->ctrl.queue_count > 1)
- blk_mq_tagset_busy_iter(&ctrl->tag_set,
- nvme_cancel_request, &ctrl->ctrl);
- blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
- nvme_cancel_request, &ctrl->ctrl);
+ if (ctrl->queue_count > 1)
+ blk_mq_tagset_busy_iter(ctrl->tagset,
+ nvme_cancel_request, ctrl);
+ blk_mq_tagset_busy_iter(ctrl->admin_tagset,
+ nvme_cancel_request, ctrl);
/*
* queues are not a live anymore, so restart the queues to fail fast
* new IO
*/
- blk_mq_start_stopped_hw_queues(ctrl->ctrl.admin_q, true);
- nvme_start_queues(&ctrl->ctrl);
+ blk_mq_start_stopped_hw_queues(ctrl->admin_q, true);
+ nvme_start_queues(ctrl);
nvme_rdma_reconnect_or_remove(ctrl);
}
@@ -1737,39 +1738,38 @@ static const struct blk_mq_ops nvme_rdma_admin_mq_ops = {
.timeout = nvme_rdma_timeout,
};
-static void nvme_rdma_teardown_ctrl(struct nvme_rdma_ctrl *ctrl, bool shutdown)
+static void nvme_rdma_teardown_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
{
- nvme_stop_keep_alive(&ctrl->ctrl);
- cancel_work_sync(&ctrl->ctrl.err_work);
- cancel_delayed_work_sync(&ctrl->ctrl.reconnect_work);
+ nvme_stop_keep_alive(ctrl);
+ cancel_work_sync(&ctrl->err_work);
+ cancel_delayed_work_sync(&ctrl->reconnect_work);
- if (ctrl->ctrl.max_queues > 1) {
- nvme_stop_queues(&ctrl->ctrl);
- blk_mq_tagset_busy_iter(&ctrl->tag_set,
- nvme_cancel_request, &ctrl->ctrl);
+ if (ctrl->max_queues > 1) {
+ nvme_stop_queues(ctrl);
+ blk_mq_tagset_busy_iter(ctrl->tagset,
+ nvme_cancel_request, ctrl);
nvme_rdma_destroy_io_queues(ctrl, shutdown);
}
if (shutdown)
- nvme_shutdown_ctrl(&ctrl->ctrl);
+ nvme_shutdown_ctrl(ctrl);
else
- nvme_disable_ctrl(&ctrl->ctrl, ctrl->ctrl.cap);
+ nvme_disable_ctrl(ctrl, ctrl->cap);
- blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);
- blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
- nvme_cancel_request, &ctrl->ctrl);
+ blk_mq_stop_hw_queues(ctrl->admin_q);
+ blk_mq_tagset_busy_iter(ctrl->admin_tagset,
+ nvme_cancel_request, ctrl);
nvme_rdma_destroy_admin_queue(ctrl, shutdown);
}
static void nvme_rdma_del_ctrl_work(struct work_struct *work)
{
- struct nvme_ctrl *nctrl = container_of(work,
+ struct nvme_ctrl *ctrl = container_of(work,
struct nvme_ctrl, delete_work);
- struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
- nvme_uninit_ctrl(&ctrl->ctrl);
+ nvme_uninit_ctrl(ctrl);
nvme_rdma_teardown_ctrl(ctrl, true);
- nvme_put_ctrl(&ctrl->ctrl);
+ nvme_put_ctrl(ctrl);
}
static int __nvme_rdma_del_ctrl(struct nvme_ctrl *ctrl)
@@ -1802,9 +1802,8 @@ static int nvme_rdma_del_ctrl(struct nvme_ctrl *ctrl)
static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
{
- struct nvme_ctrl *nctrl = container_of(work,
+ struct nvme_ctrl *ctrl = container_of(work,
struct nvme_ctrl, reset_work);
- struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
int ret;
bool changed;
@@ -1814,19 +1813,19 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
if (ret)
goto out_destroy_admin;
- if (ctrl->ctrl.max_queues > 1) {
+ if (ctrl->max_queues > 1) {
ret = nvme_rdma_configure_io_queues(ctrl, false);
if (ret)
goto out_destroy_io;
}
- changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
+ changed = nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE);
WARN_ON_ONCE(!changed);
- if (ctrl->ctrl.queue_count > 1) {
- nvme_start_queues(&ctrl->ctrl);
- nvme_queue_scan(&ctrl->ctrl);
- nvme_queue_async_events(&ctrl->ctrl);
+ if (ctrl->queue_count > 1) {
+ nvme_start_queues(ctrl);
+ nvme_queue_scan(ctrl);
+ nvme_queue_async_events(ctrl);
}
return;
@@ -1835,9 +1834,9 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
nvme_rdma_destroy_io_queues(ctrl, true);
out_destroy_admin:
nvme_rdma_destroy_admin_queue(ctrl, true);
- dev_warn(ctrl->ctrl.device, "Removing after reset failure\n");
- nvme_uninit_ctrl(&ctrl->ctrl);
- nvme_put_ctrl(&ctrl->ctrl);
+ dev_warn(ctrl->device, "Removing after reset failure\n");
+ nvme_uninit_ctrl(ctrl);
+ nvme_put_ctrl(ctrl);
}
static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
@@ -1911,7 +1910,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
if (!ctrl->queues)
goto out_uninit_ctrl;
- ret = nvme_rdma_configure_admin_queue(ctrl, true);
+ ret = nvme_rdma_configure_admin_queue(&ctrl->ctrl, true);
if (ret)
goto out_kfree_queues;
@@ -1946,7 +1945,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
}
if (ctrl->ctrl.max_queues > 1) {
- ret = nvme_rdma_configure_io_queues(ctrl, true);
+ ret = nvme_rdma_configure_io_queues(&ctrl->ctrl, true);
if (ret)
goto out_remove_admin_queue;
}
@@ -1972,7 +1971,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
out_remove_admin_queue:
nvme_stop_keep_alive(&ctrl->ctrl);
- nvme_rdma_destroy_admin_queue(ctrl, true);
+ nvme_rdma_destroy_admin_queue(&ctrl->ctrl, true);
out_kfree_queues:
kfree(ctrl->queues);
out_uninit_ctrl:
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 22/30] nvme-rdma: Split create_ctrl to transport specific and generic parts
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (20 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 21/30] nvme-rdma: plumb nvme_ctrl down the call stack Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-18 15:21 ` [PATCH rfc 23/30] nvme: add low level queue and tagset controller ops Sagi Grimberg
` (8 subsequent siblings)
30 siblings, 0 replies; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
Most of controller creation is simply setting up the admin queue and tags,
submitting a set of admin commands, and allocating io tags and io queues.
We can make that generic; next we will move it into the core.
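The transport-agnostic skeleton that falls out of this split looks roughly
like the following (condensed from nvme_rdma_probe_ctrl below; work-item
init, kref handling and the scan/AEN kick-off are omitted):

	ret = nvme_init_ctrl(ctrl, dev, ops, quirks);
	if (ret)
		return ret;

	ret = nvme_rdma_configure_admin_queue(ctrl, true);
	if (ret)
		goto out_uninit_ctrl;

	/* transport-specific sanity checks (icdoff, sgls, queue_size) */
	ret = nvme_rdma_verify_ctrl(ctrl);
	if (ret)
		goto out_remove_admin_queue;

	if (ctrl->max_queues > 1) {
		ret = nvme_rdma_configure_io_queues(ctrl, true);
		if (ret)
			goto out_remove_admin_queue;
	}

	changed = nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE);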
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/rdma.c | 160 +++++++++++++++++++++++++++--------------------
1 file changed, 91 insertions(+), 69 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index e656b9b17d67..0036ddcbc138 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1839,6 +1839,41 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
nvme_put_ctrl(ctrl);
}
+static int nvme_rdma_verify_ctrl(struct nvme_ctrl *ctrl)
+{
+ struct nvmf_ctrl_options *opts = ctrl->opts;
+
+ /* sanity check icdoff */
+ if (ctrl->icdoff) {
+ dev_err(ctrl->device, "icdoff is not supported!\n");
+ return -EINVAL;
+ }
+
+ /* sanity check keyed sgls */
+ if (!(ctrl->sgls & (1 << 20))) {
+ dev_err(ctrl->device, "Mandatory keyed sgls are not support\n");
+ return -EINVAL;
+ }
+
+ if (opts->queue_size > ctrl->maxcmd) {
+ /* warn if maxcmd is lower than queue_size */
+ dev_warn(ctrl->device,
+ "queue_size %zu > ctrl maxcmd %u, clamping down\n",
+ opts->queue_size, ctrl->maxcmd);
+ opts->queue_size = ctrl->maxcmd;
+ }
+
+ if (opts->queue_size > ctrl->sqsize + 1) {
+ /* warn if sqsize is lower than queue_size */
+ dev_warn(ctrl->device,
+ "queue_size %zu > ctrl sqsize %u, clamping down\n",
+ opts->queue_size, ctrl->sqsize + 1);
+ opts->queue_size = ctrl->sqsize + 1;
+ }
+
+ return 0;
+}
+
static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
.name = "rdma",
.module = THIS_MODULE,
@@ -1853,12 +1888,62 @@ static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
.get_address = nvmf_get_address,
};
+static int nvme_rdma_probe_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
+ const struct nvme_ctrl_ops *ops, unsigned long quirks,
+ unsigned int nr_io_queues, size_t queue_size, int kato)
+{
+ bool changed;
+ int ret;
+
+ INIT_WORK(&ctrl->delete_work, nvme_rdma_del_ctrl_work);
+ INIT_WORK(&ctrl->reset_work, nvme_rdma_reset_ctrl_work);
+
+ ctrl->max_queues = nr_io_queues + 1; /* +1 for admin queue */
+ ctrl->sqsize = queue_size - 1; /* 0's based */
+ ctrl->kato = kato;
+
+ ret = nvme_init_ctrl(ctrl, dev, ops, quirks);
+ if (ret)
+ return ret;
+
+ ret = nvme_rdma_configure_admin_queue(ctrl, true);
+ if (ret)
+ goto out_uninit_ctrl;
+
+ ret = nvme_rdma_verify_ctrl(ctrl);
+ if (ret)
+ goto out_remove_admin_queue;
+
+ if (ctrl->max_queues > 1) {
+ ret = nvme_rdma_configure_io_queues(ctrl, true);
+ if (ret)
+ goto out_remove_admin_queue;
+ }
+
+ changed = nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE);
+ WARN_ON_ONCE(!changed);
+
+ kref_get(&ctrl->kref);
+
+ if (ctrl->queue_count > 1) {
+ nvme_queue_scan(ctrl);
+ nvme_queue_async_events(ctrl);
+ }
+
+ return 0;
+
+out_remove_admin_queue:
+ nvme_rdma_destroy_admin_queue(ctrl, true);
+out_uninit_ctrl:
+ nvme_uninit_ctrl(ctrl);
+ return ret;
+}
+
static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
struct nvmf_ctrl_options *opts)
{
struct nvme_rdma_ctrl *ctrl;
int ret;
- bool changed;
char *port;
ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
@@ -1866,6 +1951,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&ctrl->list);
+ ctrl->ctrl.opts = opts;
if (opts->mask & NVMF_OPT_TRSVCID)
port = opts->trsvcid;
@@ -1889,97 +1975,33 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
}
}
- ctrl->ctrl.opts = opts;
- ctrl->ctrl.max_queues = opts->nr_io_queues + 1;
- ctrl->ctrl.sqsize = opts->queue_size - 1;
- ctrl->ctrl.kato = opts->kato;
INIT_DELAYED_WORK(&ctrl->ctrl.reconnect_work,
nvme_rdma_reconnect_ctrl_work);
INIT_WORK(&ctrl->ctrl.err_work, nvme_rdma_error_recovery_work);
INIT_WORK(&ctrl->ctrl.delete_work, nvme_rdma_del_ctrl_work);
- INIT_WORK(&ctrl->ctrl.reset_work, nvme_rdma_reset_ctrl_work);
-
- ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_rdma_ctrl_ops,
- 0 /* no quirks, we're perfect! */);
- if (ret)
- goto out_free_ctrl;
ret = -ENOMEM;
- ctrl->queues = kcalloc(ctrl->ctrl.max_queues, sizeof(*ctrl->queues),
+ ctrl->queues = kcalloc(opts->nr_io_queues + 1, sizeof(*ctrl->queues),
GFP_KERNEL);
if (!ctrl->queues)
- goto out_uninit_ctrl;
+ goto out_free_ctrl;
- ret = nvme_rdma_configure_admin_queue(&ctrl->ctrl, true);
+ ret = nvme_rdma_probe_ctrl(&ctrl->ctrl, dev, &nvme_rdma_ctrl_ops,
+ 0, opts->nr_io_queues, opts->queue_size, opts->kato);
if (ret)
goto out_kfree_queues;
- /* sanity check icdoff */
- if (ctrl->ctrl.icdoff) {
- dev_err(ctrl->ctrl.device, "icdoff is not supported!\n");
- ret = -EINVAL;
- goto out_remove_admin_queue;
- }
-
- /* sanity check keyed sgls */
- if (!(ctrl->ctrl.sgls & (1 << 20))) {
- dev_err(ctrl->ctrl.device, "Mandatory keyed sgls are not support\n");
- ret = -EINVAL;
- goto out_remove_admin_queue;
- }
-
- if (opts->queue_size > ctrl->ctrl.maxcmd) {
- /* warn if maxcmd is lower than queue_size */
- dev_warn(ctrl->ctrl.device,
- "queue_size %zu > ctrl maxcmd %u, clamping down\n",
- opts->queue_size, ctrl->ctrl.maxcmd);
- opts->queue_size = ctrl->ctrl.maxcmd;
- }
-
- if (opts->queue_size > ctrl->ctrl.sqsize + 1) {
- /* warn if sqsize is lower than queue_size */
- dev_warn(ctrl->ctrl.device,
- "queue_size %zu > ctrl sqsize %u, clamping down\n",
- opts->queue_size, ctrl->ctrl.sqsize + 1);
- opts->queue_size = ctrl->ctrl.sqsize + 1;
- }
-
- if (ctrl->ctrl.max_queues > 1) {
- ret = nvme_rdma_configure_io_queues(&ctrl->ctrl, true);
- if (ret)
- goto out_remove_admin_queue;
- }
-
- changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
- WARN_ON_ONCE(!changed);
-
dev_info(ctrl->ctrl.device, "new ctrl: NQN \"%s\", addr %pISpcs\n",
ctrl->ctrl.opts->subsysnqn, &ctrl->addr);
- kref_get(&ctrl->ctrl.kref);
-
mutex_lock(&nvme_rdma_ctrl_mutex);
list_add_tail(&ctrl->list, &nvme_rdma_ctrl_list);
mutex_unlock(&nvme_rdma_ctrl_mutex);
- if (ctrl->ctrl.max_queues > 1) {
- nvme_queue_scan(&ctrl->ctrl);
- nvme_queue_async_events(&ctrl->ctrl);
- }
-
return &ctrl->ctrl;
-out_remove_admin_queue:
- nvme_stop_keep_alive(&ctrl->ctrl);
- nvme_rdma_destroy_admin_queue(&ctrl->ctrl, true);
out_kfree_queues:
kfree(ctrl->queues);
-out_uninit_ctrl:
- nvme_uninit_ctrl(&ctrl->ctrl);
- nvme_put_ctrl(&ctrl->ctrl);
- if (ret > 0)
- ret = -EIO;
- return ERR_PTR(ret);
out_free_ctrl:
kfree(ctrl);
return ERR_PTR(ret);
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 23/30] nvme: add low level queue and tagset controller ops
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (21 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 22/30] nvme-rdma: Split create_ctrl to transport specific and generic parts Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-18 15:21 ` [PATCH rfc 24/30] nvme-pci: rename to nvme_pci_configure_admin_queue Sagi Grimberg
` (7 subsequent siblings)
30 siblings, 0 replies; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
This is preparation for moving a lot of the shared control
plane logic to the nvme core.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/nvme.h | 10 ++++++++++
drivers/nvme/host/rdma.c | 46 +++++++++++++++++++++++++++-------------------
2 files changed, 37 insertions(+), 19 deletions(-)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index c604d471aa3d..18aac677a96c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -233,6 +233,16 @@ struct nvme_ctrl_ops {
int (*delete_ctrl)(struct nvme_ctrl *ctrl);
const char *(*get_subsysnqn)(struct nvme_ctrl *ctrl);
int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
+
+ int (*alloc_hw_queue)(struct nvme_ctrl *ctrl, int idx,
+ size_t queue_size);
+ void (*free_hw_queue)(struct nvme_ctrl *ctrl, int idx);
+ int (*start_hw_queue)(struct nvme_ctrl *ctrl, int idx);
+ void (*stop_hw_queue)(struct nvme_ctrl *ctrl, int idx);
+ struct blk_mq_tag_set *(*alloc_tagset)(struct nvme_ctrl *ctrl,
+ bool admin);
+ void (*free_tagset)(struct nvme_ctrl *ctrl, bool admin);
+ int (*verify_ctrl)(struct nvme_ctrl *ctrl);
};
static inline bool nvme_ctrl_ready(struct nvme_ctrl *ctrl)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 0036ddcbc138..a32c8a710ad4 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -679,7 +679,7 @@ static void nvme_rdma_free_io_queues(struct nvme_ctrl *ctrl)
int i;
for (i = 1; i < ctrl->queue_count; i++)
- nvme_rdma_free_queue(ctrl, i);
+ ctrl->ops->free_hw_queue(ctrl, i);
}
static void nvme_rdma_stop_io_queues(struct nvme_ctrl *ctrl)
@@ -687,7 +687,7 @@ static void nvme_rdma_stop_io_queues(struct nvme_ctrl *ctrl)
int i;
for (i = 1; i < ctrl->queue_count; i++)
- nvme_rdma_stop_queue(ctrl, i);
+ ctrl->ops->stop_hw_queue(ctrl, i);
}
static void nvme_rdma_destroy_io_queues(struct nvme_ctrl *ctrl, bool remove)
@@ -695,7 +695,7 @@ static void nvme_rdma_destroy_io_queues(struct nvme_ctrl *ctrl, bool remove)
nvme_rdma_stop_io_queues(ctrl);
if (remove) {
blk_cleanup_queue(ctrl->connect_q);
- nvme_rdma_free_tagset(ctrl, false);
+ ctrl->ops->free_tagset(ctrl, false);
}
nvme_rdma_free_io_queues(ctrl);
}
@@ -723,7 +723,7 @@ static int nvme_rdma_start_io_queues(struct nvme_ctrl *ctrl)
int i, ret = 0;
for (i = 1; i < ctrl->queue_count; i++) {
- ret = nvme_rdma_start_queue(ctrl, i);
+ ret = ctrl->ops->start_hw_queue(ctrl, i);
if (ret)
goto out_stop_queues;
}
@@ -732,7 +732,7 @@ static int nvme_rdma_start_io_queues(struct nvme_ctrl *ctrl)
out_stop_queues:
for (i--; i >= 1; i--)
- nvme_rdma_stop_queue(ctrl, i);
+ ctrl->ops->stop_hw_queue(ctrl, i);
return ret;
}
@@ -754,7 +754,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_ctrl *ctrl)
"creating %d I/O queues.\n", nr_io_queues);
for (i = 1; i < ctrl->queue_count; i++) {
- ret = nvme_rdma_alloc_queue(ctrl, i,
+ ret = ctrl->ops->alloc_hw_queue(ctrl, i,
ctrl->sqsize + 1);
if (ret)
goto out_free_queues;
@@ -764,7 +764,7 @@ static int nvme_rdma_alloc_io_queues(struct nvme_ctrl *ctrl)
out_free_queues:
for (i--; i >= 1; i--)
- nvme_rdma_free_queue(ctrl, i);
+ ctrl->ops->free_hw_queue(ctrl, i);
return ret;
}
@@ -778,7 +778,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
return ret;
if (new) {
- ctrl->tagset = nvme_rdma_alloc_tagset(ctrl, false);
+ ctrl->tagset = ctrl->ops->alloc_tagset(ctrl, false);
if (IS_ERR(ctrl->tagset)) {
ret = PTR_ERR(ctrl->tagset);
goto out_free_io_queues;
@@ -806,7 +806,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
blk_cleanup_queue(ctrl->connect_q);
out_free_tag_set:
if (new)
- nvme_rdma_free_tagset(ctrl, false);
+ ctrl->ops->free_tagset(ctrl, false);
out_free_io_queues:
nvme_rdma_free_io_queues(ctrl);
return ret;
@@ -814,25 +814,25 @@ static int nvme_rdma_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
static void nvme_rdma_destroy_admin_queue(struct nvme_ctrl *ctrl, bool remove)
{
- nvme_rdma_stop_queue(ctrl, 0);
+ ctrl->ops->stop_hw_queue(ctrl, 0);
if (remove) {
blk_cleanup_queue(ctrl->admin_connect_q);
blk_cleanup_queue(ctrl->admin_q);
- nvme_rdma_free_tagset(ctrl, true);
+ ctrl->ops->free_tagset(ctrl, true);
}
- nvme_rdma_free_queue(ctrl, 0);
+ ctrl->ops->free_hw_queue(ctrl, 0);
}
static int nvme_rdma_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
{
int error;
- error = nvme_rdma_alloc_queue(ctrl, 0, NVME_AQ_DEPTH);
+ error = ctrl->ops->alloc_hw_queue(ctrl, 0, NVME_AQ_DEPTH);
if (error)
return error;
if (new) {
- ctrl->admin_tagset = nvme_rdma_alloc_tagset(ctrl, true);
+ ctrl->admin_tagset = ctrl->ops->alloc_tagset(ctrl, true);
if (IS_ERR(ctrl->admin_tagset)) {
error = PTR_ERR(ctrl->admin_tagset);
goto out_free_queue;
@@ -855,7 +855,7 @@ static int nvme_rdma_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
goto out_free_queue;
}
- error = nvme_rdma_start_queue(ctrl, 0);
+ error = ctrl->ops->start_hw_queue(ctrl, 0);
if (error)
goto out_cleanup_connect_queue;
@@ -889,9 +889,9 @@ static int nvme_rdma_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
blk_cleanup_queue(ctrl->admin_q);
out_free_tagset:
if (new)
- nvme_rdma_free_tagset(ctrl, true);
+ ctrl->ops->free_tagset(ctrl, true);
out_free_queue:
- nvme_rdma_free_queue(ctrl, 0);
+ ctrl->ops->free_hw_queue(ctrl, 0);
return error;
}
@@ -981,7 +981,7 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
nvme_rdma_stop_io_queues(ctrl);
}
blk_mq_stop_hw_queues(ctrl->admin_q);
- nvme_rdma_stop_queue(ctrl, 0);
+ ctrl->ops->stop_hw_queue(ctrl, 0);
/* We must take care of fastfail/requeue all our inflight requests */
if (ctrl->queue_count > 1)
@@ -1886,6 +1886,14 @@ static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
.delete_ctrl = nvme_rdma_del_ctrl,
.get_subsysnqn = nvmf_get_subsysnqn,
.get_address = nvmf_get_address,
+
+ .alloc_hw_queue = nvme_rdma_alloc_queue,
+ .free_hw_queue = nvme_rdma_free_queue,
+ .start_hw_queue = nvme_rdma_start_queue,
+ .stop_hw_queue = nvme_rdma_stop_queue,
+ .alloc_tagset = nvme_rdma_alloc_tagset,
+ .free_tagset = nvme_rdma_free_tagset,
+ .verify_ctrl = nvme_rdma_verify_ctrl,
};
static int nvme_rdma_probe_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
@@ -1910,7 +1918,7 @@ static int nvme_rdma_probe_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
if (ret)
goto out_uninit_ctrl;
- ret = nvme_rdma_verify_ctrl(ctrl);
+ ret = ctrl->ops->verify_ctrl(ctrl);
if (ret)
goto out_remove_admin_queue;
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 24/30] nvme-pci: rename to nvme_pci_configure_admin_queue
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (22 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 23/30] nvme: add low level queue and tagset controller ops Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-19 7:20 ` Christoph Hellwig
2017-06-18 15:21 ` [PATCH rfc 25/30] nvme: move control plane handling to nvme core Sagi Grimberg
` (6 subsequent siblings)
30 siblings, 1 reply; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
We are going to need the name for the core routine...
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/pci.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 32a98e2740ad..628f1edb0acd 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1385,7 +1385,7 @@ static int nvme_remap_bar(struct nvme_dev *dev, unsigned long size)
return 0;
}
-static int nvme_configure_admin_queue(struct nvme_dev *dev)
+static int nvme_pci_configure_admin_queue(struct nvme_dev *dev)
{
int result;
u32 aqa;
@@ -2093,7 +2093,7 @@ static void nvme_reset_work(struct work_struct *work)
if (result)
goto out;
- result = nvme_configure_admin_queue(dev);
+ result = nvme_pci_configure_admin_queue(dev);
if (result)
goto out;
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 25/30] nvme: move control plane handling to nvme core
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (23 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 24/30] nvme-pci: rename to nvme_pci_configure_admin_queue Sagi Grimberg
@ 2017-06-18 15:21 ` Sagi Grimberg
2017-06-19 12:55 ` Christoph Hellwig
2017-06-18 15:22 ` [PATCH rfc 26/30] nvme-fabrics: handle reconnects in fabrics library Sagi Grimberg
` (5 subsequent siblings)
30 siblings, 1 reply; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:21 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
Handle controller setup, reset and delete in the core.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/core.c | 373 +++++++++++++++++++++++++++++++++++++++++++++++
drivers/nvme/host/nvme.h | 12 ++
drivers/nvme/host/rdma.c | 372 +---------------------------------------------
3 files changed, 393 insertions(+), 364 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 17a10549d688..6937ba26ff2c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2670,6 +2670,379 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
}
EXPORT_SYMBOL_GPL(nvme_start_queues);
+static void nvme_free_io_queues(struct nvme_ctrl *ctrl)
+{
+ int i;
+
+ for (i = 1; i < ctrl->queue_count; i++)
+ ctrl->ops->free_hw_queue(ctrl, i);
+}
+
+void nvme_stop_io_queues(struct nvme_ctrl *ctrl)
+{
+ int i;
+
+ for (i = 1; i < ctrl->queue_count; i++)
+ ctrl->ops->stop_hw_queue(ctrl, i);
+}
+EXPORT_SYMBOL_GPL(nvme_stop_io_queues);
+
+static int nvme_start_io_queues(struct nvme_ctrl *ctrl)
+{
+ int i, ret = 0;
+
+ for (i = 1; i < ctrl->queue_count; i++) {
+ ret = ctrl->ops->start_hw_queue(ctrl, i);
+ if (ret)
+ goto out_stop_queues;
+ }
+
+ return 0;
+
+out_stop_queues:
+ for (i--; i >= 1; i--)
+ ctrl->ops->stop_hw_queue(ctrl, i);
+ return ret;
+}
+
+static int nvme_alloc_io_queues(struct nvme_ctrl *ctrl)
+{
+ unsigned int nr_io_queues = ctrl->max_queues - 1;
+ int i, ret;
+
+ nr_io_queues = min(nr_io_queues, num_online_cpus());
+ ret = nvme_set_queue_count(ctrl, &nr_io_queues);
+ if (ret)
+ return ret;
+
+ ctrl->queue_count = nr_io_queues + 1;
+ if (ctrl->queue_count < 2)
+ return 0;
+
+ dev_info(ctrl->device,
+ "creating %d I/O queues.\n", nr_io_queues);
+
+ for (i = 1; i < ctrl->queue_count; i++) {
+ ret = ctrl->ops->alloc_hw_queue(ctrl, i,
+ ctrl->sqsize + 1);
+ if (ret)
+ goto out_free_queues;
+ }
+
+ return 0;
+
+out_free_queues:
+ for (i--; i >= 1; i--)
+ ctrl->ops->free_hw_queue(ctrl, i);
+
+ return ret;
+}
+
+void nvme_destroy_io_queues(struct nvme_ctrl *ctrl, bool remove)
+{
+ nvme_stop_io_queues(ctrl);
+ if (remove) {
+ if (ctrl->ops->flags & NVME_F_FABRICS)
+ blk_cleanup_queue(ctrl->connect_q);
+ ctrl->ops->free_tagset(ctrl, false);
+ }
+ nvme_free_io_queues(ctrl);
+}
+EXPORT_SYMBOL_GPL(nvme_destroy_io_queues);
+
+int nvme_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
+{
+ int ret;
+
+ ret = nvme_alloc_io_queues(ctrl);
+ if (ret)
+ return ret;
+
+ if (new) {
+ ctrl->tagset = ctrl->ops->alloc_tagset(ctrl, false);
+ if (IS_ERR(ctrl->tagset)) {
+ ret = PTR_ERR(ctrl->tagset);
+ goto out_free_io_queues;
+ }
+
+ if (ctrl->ops->flags & NVME_F_FABRICS) {
+ ctrl->connect_q = blk_mq_init_queue(ctrl->tagset);
+ if (IS_ERR(ctrl->connect_q)) {
+ ret = PTR_ERR(ctrl->connect_q);
+ goto out_free_tag_set;
+ }
+ }
+ } else {
+ ret = blk_mq_reinit_tagset(ctrl->tagset);
+ if (ret)
+ goto out_free_io_queues;
+ }
+
+ ret = nvme_start_io_queues(ctrl);
+ if (ret)
+ goto out_cleanup_connect_q;
+
+ return 0;
+
+out_cleanup_connect_q:
+ if (new && (ctrl->ops->flags & NVME_F_FABRICS))
+ blk_cleanup_queue(ctrl->connect_q);
+out_free_tag_set:
+ if (new)
+ ctrl->ops->free_tagset(ctrl, false);
+out_free_io_queues:
+ nvme_free_io_queues(ctrl);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(nvme_configure_io_queues);
+
+void nvme_destroy_admin_queue(struct nvme_ctrl *ctrl, bool remove)
+{
+ ctrl->ops->stop_hw_queue(ctrl, 0);
+ if (remove) {
+ if (ctrl->ops->flags & NVME_F_FABRICS)
+ blk_cleanup_queue(ctrl->admin_connect_q);
+ blk_cleanup_queue(ctrl->admin_q);
+ ctrl->ops->free_tagset(ctrl, true);
+ }
+ ctrl->ops->free_hw_queue(ctrl, 0);
+}
+EXPORT_SYMBOL_GPL(nvme_destroy_admin_queue);
+
+int nvme_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
+{
+ int error;
+
+ error = ctrl->ops->alloc_hw_queue(ctrl, 0, NVME_AQ_DEPTH);
+ if (error)
+ return error;
+
+ if (new) {
+ ctrl->admin_tagset = ctrl->ops->alloc_tagset(ctrl, true);
+ if (IS_ERR(ctrl->admin_tagset)) {
+ error = PTR_ERR(ctrl->admin_tagset);
+ goto out_free_queue;
+ }
+
+ ctrl->admin_q = blk_mq_init_queue(ctrl->admin_tagset);
+ if (IS_ERR(ctrl->admin_q)) {
+ error = PTR_ERR(ctrl->admin_q);
+ goto out_free_tagset;
+ }
+
+ if (ctrl->ops->flags & NVME_F_FABRICS) {
+ ctrl->admin_connect_q =
+ blk_mq_init_queue(ctrl->admin_tagset);
+ if (IS_ERR(ctrl->admin_connect_q)) {
+ error = PTR_ERR(ctrl->admin_connect_q);
+ goto out_cleanup_queue;
+ }
+ }
+ } else {
+ error = blk_mq_reinit_tagset(ctrl->admin_tagset);
+ if (error)
+ goto out_free_queue;
+ }
+
+ error = ctrl->ops->start_hw_queue(ctrl, 0);
+ if (error)
+ goto out_cleanup_connect_queue;
+
+ error = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &ctrl->cap);
+ if (error) {
+ dev_err(ctrl->device,
+ "prop_get NVME_REG_CAP failed\n");
+ goto out_cleanup_connect_queue;
+ }
+
+ ctrl->sqsize = min_t(int, NVME_CAP_MQES(ctrl->cap), ctrl->sqsize);
+
+ error = nvme_enable_ctrl(ctrl, ctrl->cap);
+ if (error)
+ goto out_cleanup_connect_queue;
+
+ error = nvme_init_identify(ctrl);
+ if (error)
+ goto out_cleanup_connect_queue;
+
+ nvme_start_keep_alive(ctrl);
+
+ return 0;
+
+out_cleanup_connect_queue:
+ if (new && (ctrl->ops->flags & NVME_F_FABRICS))
+ blk_cleanup_queue(ctrl->admin_connect_q);
+out_cleanup_queue:
+ if (new)
+ blk_cleanup_queue(ctrl->admin_q);
+out_free_tagset:
+ if (new)
+ ctrl->ops->free_tagset(ctrl, true);
+out_free_queue:
+ ctrl->ops->free_hw_queue(ctrl, 0);
+ return error;
+}
+EXPORT_SYMBOL_GPL(nvme_configure_admin_queue);
+
+static void nvme_teardown_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
+{
+ nvme_stop_keep_alive(ctrl);
+ cancel_work_sync(&ctrl->err_work);
+ cancel_delayed_work_sync(&ctrl->reconnect_work);
+
+ if (ctrl->max_queues > 1) {
+ nvme_stop_queues(ctrl);
+ blk_mq_tagset_busy_iter(ctrl->tagset,
+ nvme_cancel_request, ctrl);
+ nvme_destroy_io_queues(ctrl, shutdown);
+ }
+
+ if (shutdown)
+ nvme_shutdown_ctrl(ctrl);
+ else
+ nvme_disable_ctrl(ctrl, ctrl->cap);
+
+ blk_mq_stop_hw_queues(ctrl->admin_q);
+ blk_mq_tagset_busy_iter(ctrl->admin_tagset,
+ nvme_cancel_request, ctrl);
+ nvme_destroy_admin_queue(ctrl, shutdown);
+}
+
+static void nvme_del_ctrl_work(struct work_struct *work)
+{
+ struct nvme_ctrl *ctrl = container_of(work,
+ struct nvme_ctrl, delete_work);
+
+ nvme_uninit_ctrl(ctrl);
+ nvme_teardown_ctrl(ctrl, true);
+ nvme_put_ctrl(ctrl);
+}
+
+int __nvme_del_ctrl(struct nvme_ctrl *ctrl)
+{
+ if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING))
+ return -EBUSY;
+
+ if (!queue_work(nvme_wq, &ctrl->delete_work))
+ return -EBUSY;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(__nvme_del_ctrl);
+
+int nvme_del_ctrl(struct nvme_ctrl *ctrl)
+{
+ int ret = 0;
+
+ /*
+ * Keep a reference until all work is flushed since
+ * __nvme_del_ctrl can free the ctrl mem
+ */
+ if (!kref_get_unless_zero(&ctrl->kref))
+ return -EBUSY;
+
+ ret = __nvme_del_ctrl(ctrl);
+ if (!ret)
+ flush_work(&ctrl->delete_work);
+
+ nvme_put_ctrl(ctrl);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(nvme_del_ctrl);
+
+static void nvme_reset_ctrl_work(struct work_struct *work)
+{
+ struct nvme_ctrl *ctrl = container_of(work,
+ struct nvme_ctrl, reset_work);
+ int ret;
+ bool changed;
+
+ nvme_teardown_ctrl(ctrl, false);
+
+ blk_mq_start_stopped_hw_queues(ctrl->admin_q, true);
+
+ ret = nvme_configure_admin_queue(ctrl, false);
+ if (ret)
+ goto out_destroy_admin;
+
+ if (ctrl->max_queues > 1) {
+ ret = nvme_configure_io_queues(ctrl, false);
+ if (ret)
+ goto out_destroy_io;
+ }
+
+ changed = nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE);
+ WARN_ON_ONCE(!changed);
+
+ if (ctrl->queue_count > 1) {
+ nvme_start_queues(ctrl);
+ nvme_queue_scan(ctrl);
+ nvme_queue_async_events(ctrl);
+ }
+
+ return;
+
+out_destroy_io:
+ nvme_destroy_io_queues(ctrl, true);
+out_destroy_admin:
+ nvme_destroy_admin_queue(ctrl, true);
+ dev_warn(ctrl->device, "Removing after reset failure\n");
+ nvme_uninit_ctrl(ctrl);
+ nvme_put_ctrl(ctrl);
+}
+
+int nvme_probe_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
+ const struct nvme_ctrl_ops *ops, unsigned long quirks,
+ unsigned int nr_io_queues, size_t queue_size, int kato)
+{
+ bool changed;
+ int ret;
+
+ INIT_WORK(&ctrl->delete_work, nvme_del_ctrl_work);
+ INIT_WORK(&ctrl->reset_work, nvme_reset_ctrl_work);
+
+ ctrl->max_queues = nr_io_queues + 1; /* +1 for admin queue */
+ ctrl->sqsize = queue_size - 1; /* 0's based */
+ ctrl->kato = kato;
+
+ ret = nvme_init_ctrl(ctrl, dev, ops, quirks);
+ if (ret)
+ return ret;
+
+ ret = nvme_configure_admin_queue(ctrl, true);
+ if (ret)
+ goto out_uninit_ctrl;
+
+ ret = ctrl->ops->verify_ctrl(ctrl);
+ if (ret)
+ goto out_remove_admin_queue;
+
+ if (ctrl->max_queues > 1) {
+ ret = nvme_configure_io_queues(ctrl, true);
+ if (ret)
+ goto out_remove_admin_queue;
+ }
+
+ changed = nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE);
+ WARN_ON_ONCE(!changed);
+
+ kref_get(&ctrl->kref);
+
+ if (ctrl->queue_count > 1) {
+ nvme_queue_scan(ctrl);
+ nvme_queue_async_events(ctrl);
+ }
+
+ return 0;
+
+out_remove_admin_queue:
+ nvme_destroy_admin_queue(ctrl, true);
+out_uninit_ctrl:
+ nvme_uninit_ctrl(ctrl);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(nvme_probe_ctrl);
+
int __init nvme_core_init(void)
{
int result;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 18aac677a96c..c231caf0e486 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -388,6 +388,18 @@ static inline struct nvme_ns *nvme_get_ns_from_dev(struct device *dev)
return dev_to_disk(dev)->private_data;
}
+void nvme_stop_io_queues(struct nvme_ctrl *ctrl);
+void nvme_destroy_io_queues(struct nvme_ctrl *ctrl, bool remove);
+int nvme_configure_io_queues(struct nvme_ctrl *ctrl, bool new);
+void nvme_destroy_admin_queue(struct nvme_ctrl *ctrl, bool remove);
+int nvme_configure_admin_queue(struct nvme_ctrl *ctrl, bool new);
+int __nvme_del_ctrl(struct nvme_ctrl *ctrl);
+int nvme_del_ctrl(struct nvme_ctrl *ctrl);
+int nvme_reset_ctrl(struct nvme_ctrl *ctrl);
+int nvme_probe_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
+ const struct nvme_ctrl_ops *ops, unsigned long quirks,
+ unsigned int nr_io_queues, size_t queue_size, int kato);
+
int __init nvme_core_init(void);
void nvme_core_exit(void);
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index a32c8a710ad4..9b8c819f2bd7 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -674,32 +674,6 @@ static void nvme_rdma_free_queue(struct nvme_ctrl *nctrl, int qid)
rdma_destroy_id(queue->cm_id);
}
-static void nvme_rdma_free_io_queues(struct nvme_ctrl *ctrl)
-{
- int i;
-
- for (i = 1; i < ctrl->queue_count; i++)
- ctrl->ops->free_hw_queue(ctrl, i);
-}
-
-static void nvme_rdma_stop_io_queues(struct nvme_ctrl *ctrl)
-{
- int i;
-
- for (i = 1; i < ctrl->queue_count; i++)
- ctrl->ops->stop_hw_queue(ctrl, i);
-}
-
-static void nvme_rdma_destroy_io_queues(struct nvme_ctrl *ctrl, bool remove)
-{
- nvme_rdma_stop_io_queues(ctrl);
- if (remove) {
- blk_cleanup_queue(ctrl->connect_q);
- ctrl->ops->free_tagset(ctrl, false);
- }
- nvme_rdma_free_io_queues(ctrl);
-}
-
static int nvme_rdma_start_queue(struct nvme_ctrl *nctrl, int idx)
{
struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
@@ -718,183 +692,6 @@ static int nvme_rdma_start_queue(struct nvme_ctrl *nctrl, int idx)
return ret;
}
-static int nvme_rdma_start_io_queues(struct nvme_ctrl *ctrl)
-{
- int i, ret = 0;
-
- for (i = 1; i < ctrl->queue_count; i++) {
- ret = ctrl->ops->start_hw_queue(ctrl, i);
- if (ret)
- goto out_stop_queues;
- }
-
- return 0;
-
-out_stop_queues:
- for (i--; i >= 1; i--)
- ctrl->ops->stop_hw_queue(ctrl, i);
- return ret;
-}
-
-static int nvme_rdma_alloc_io_queues(struct nvme_ctrl *ctrl)
-{
- unsigned int nr_io_queues = ctrl->max_queues - 1;
- int i, ret;
-
- nr_io_queues = min(nr_io_queues, num_online_cpus());
- ret = nvme_set_queue_count(ctrl, &nr_io_queues);
- if (ret)
- return ret;
-
- ctrl->queue_count = nr_io_queues + 1;
- if (ctrl->queue_count < 2)
- return 0;
-
- dev_info(ctrl->device,
- "creating %d I/O queues.\n", nr_io_queues);
-
- for (i = 1; i < ctrl->queue_count; i++) {
- ret = ctrl->ops->alloc_hw_queue(ctrl, i,
- ctrl->sqsize + 1);
- if (ret)
- goto out_free_queues;
- }
-
- return 0;
-
-out_free_queues:
- for (i--; i >= 1; i--)
- ctrl->ops->free_hw_queue(ctrl, i);
-
- return ret;
-}
-
-static int nvme_rdma_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
-{
- int ret;
-
- ret = nvme_rdma_alloc_io_queues(ctrl);
- if (ret)
- return ret;
-
- if (new) {
- ctrl->tagset = ctrl->ops->alloc_tagset(ctrl, false);
- if (IS_ERR(ctrl->tagset)) {
- ret = PTR_ERR(ctrl->tagset);
- goto out_free_io_queues;
- }
-
- ctrl->connect_q = blk_mq_init_queue(ctrl->tagset);
- if (IS_ERR(ctrl->connect_q)) {
- ret = PTR_ERR(ctrl->connect_q);
- goto out_free_tag_set;
- }
- } else {
- ret = blk_mq_reinit_tagset(ctrl->tagset);
- if (ret)
- goto out_free_io_queues;
- }
-
- ret = nvme_rdma_start_io_queues(ctrl);
- if (ret)
- goto out_cleanup_connect_q;
-
- return 0;
-
-out_cleanup_connect_q:
- if (new)
- blk_cleanup_queue(ctrl->connect_q);
-out_free_tag_set:
- if (new)
- ctrl->ops->free_tagset(ctrl, false);
-out_free_io_queues:
- nvme_rdma_free_io_queues(ctrl);
- return ret;
-}
-
-static void nvme_rdma_destroy_admin_queue(struct nvme_ctrl *ctrl, bool remove)
-{
- ctrl->ops->stop_hw_queue(ctrl, 0);
- if (remove) {
- blk_cleanup_queue(ctrl->admin_connect_q);
- blk_cleanup_queue(ctrl->admin_q);
- ctrl->ops->free_tagset(ctrl, true);
- }
- ctrl->ops->free_hw_queue(ctrl, 0);
-}
-
-static int nvme_rdma_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
-{
- int error;
-
- error = ctrl->ops->alloc_hw_queue(ctrl, 0, NVME_AQ_DEPTH);
- if (error)
- return error;
-
- if (new) {
- ctrl->admin_tagset = ctrl->ops->alloc_tagset(ctrl, true);
- if (IS_ERR(ctrl->admin_tagset)) {
- error = PTR_ERR(ctrl->admin_tagset);
- goto out_free_queue;
- }
-
- ctrl->admin_q = blk_mq_init_queue(ctrl->admin_tagset);
- if (IS_ERR(ctrl->admin_q)) {
- error = PTR_ERR(ctrl->admin_q);
- goto out_free_tagset;
- }
-
- ctrl->admin_connect_q = blk_mq_init_queue(ctrl->admin_tagset);
- if (IS_ERR(ctrl->admin_connect_q)) {
- error = PTR_ERR(ctrl->admin_connect_q);
- goto out_cleanup_queue;
- }
- } else {
- error = blk_mq_reinit_tagset(ctrl->admin_tagset);
- if (error)
- goto out_free_queue;
- }
-
- error = ctrl->ops->start_hw_queue(ctrl, 0);
- if (error)
- goto out_cleanup_connect_queue;
-
- error = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &ctrl->cap);
- if (error) {
- dev_err(ctrl->device,
- "prop_get NVME_REG_CAP failed\n");
- goto out_cleanup_connect_queue;
- }
-
- ctrl->sqsize =
- min_t(int, NVME_CAP_MQES(ctrl->cap), ctrl->sqsize);
-
- error = nvme_enable_ctrl(ctrl, ctrl->cap);
- if (error)
- goto out_cleanup_connect_queue;
-
- error = nvme_init_identify(ctrl);
- if (error)
- goto out_cleanup_connect_queue;
-
- nvme_start_keep_alive(ctrl);
-
- return 0;
-
-out_cleanup_connect_queue:
- if (new)
- blk_cleanup_queue(ctrl->admin_connect_q);
-out_cleanup_queue:
- if (new)
- blk_cleanup_queue(ctrl->admin_q);
-out_free_tagset:
- if (new)
- ctrl->ops->free_tagset(ctrl, true);
-out_free_queue:
- ctrl->ops->free_hw_queue(ctrl, 0);
- return error;
-}
-
static void nvme_rdma_free_ctrl(struct nvme_ctrl *nctrl)
{
struct nvme_rdma_ctrl *ctrl = to_rdma_ctrl(nctrl);
@@ -942,16 +739,16 @@ static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
++ctrl->nr_reconnects;
if (ctrl->max_queues > 1)
- nvme_rdma_destroy_io_queues(ctrl, false);
+ nvme_destroy_io_queues(ctrl, false);
- nvme_rdma_destroy_admin_queue(ctrl, false);
+ nvme_destroy_admin_queue(ctrl, false);
- ret = nvme_rdma_configure_admin_queue(ctrl, false);
+ ret = nvme_configure_admin_queue(ctrl, false);
if (ret)
goto requeue;
if (ctrl->max_queues > 1) {
- ret = nvme_rdma_configure_io_queues(ctrl, false);
+ ret = nvme_configure_io_queues(ctrl, false);
if (ret)
goto requeue;
}
@@ -978,7 +775,7 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
if (ctrl->queue_count > 1) {
nvme_stop_queues(ctrl);
- nvme_rdma_stop_io_queues(ctrl);
+ nvme_stop_io_queues(ctrl);
}
blk_mq_stop_hw_queues(ctrl->admin_q);
ctrl->ops->stop_hw_queue(ctrl, 0);
@@ -1738,107 +1535,6 @@ static const struct blk_mq_ops nvme_rdma_admin_mq_ops = {
.timeout = nvme_rdma_timeout,
};
-static void nvme_rdma_teardown_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
-{
- nvme_stop_keep_alive(ctrl);
- cancel_work_sync(&ctrl->err_work);
- cancel_delayed_work_sync(&ctrl->reconnect_work);
-
- if (ctrl->max_queues > 1) {
- nvme_stop_queues(ctrl);
- blk_mq_tagset_busy_iter(ctrl->tagset,
- nvme_cancel_request, ctrl);
- nvme_rdma_destroy_io_queues(ctrl, shutdown);
- }
-
- if (shutdown)
- nvme_shutdown_ctrl(ctrl);
- else
- nvme_disable_ctrl(ctrl, ctrl->cap);
-
- blk_mq_stop_hw_queues(ctrl->admin_q);
- blk_mq_tagset_busy_iter(ctrl->admin_tagset,
- nvme_cancel_request, ctrl);
- nvme_rdma_destroy_admin_queue(ctrl, shutdown);
-}
-
-static void nvme_rdma_del_ctrl_work(struct work_struct *work)
-{
- struct nvme_ctrl *ctrl = container_of(work,
- struct nvme_ctrl, delete_work);
-
- nvme_uninit_ctrl(ctrl);
- nvme_rdma_teardown_ctrl(ctrl, true);
- nvme_put_ctrl(ctrl);
-}
-
-static int __nvme_rdma_del_ctrl(struct nvme_ctrl *ctrl)
-{
- if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING))
- return -EBUSY;
-
- if (!queue_work(nvme_wq, &ctrl->delete_work))
- return -EBUSY;
-
- return 0;
-}
-
-static int nvme_rdma_del_ctrl(struct nvme_ctrl *ctrl)
-{
- int ret = 0;
-
- /*
- * Keep a reference until all work is flushed since
- * __nvme_rdma_del_ctrl can free the ctrl mem
- */
- if (!kref_get_unless_zero(&ctrl->kref))
- return -EBUSY;
- ret = __nvme_rdma_del_ctrl(ctrl);
- if (!ret)
- flush_work(&ctrl->delete_work);
- nvme_put_ctrl(ctrl);
- return ret;
-}
-
-static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
-{
- struct nvme_ctrl *ctrl = container_of(work,
- struct nvme_ctrl, reset_work);
- int ret;
- bool changed;
-
- nvme_rdma_teardown_ctrl(ctrl, false);
-
- ret = nvme_rdma_configure_admin_queue(ctrl, false);
- if (ret)
- goto out_destroy_admin;
-
- if (ctrl->max_queues > 1) {
- ret = nvme_rdma_configure_io_queues(ctrl, false);
- if (ret)
- goto out_destroy_io;
- }
-
- changed = nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE);
- WARN_ON_ONCE(!changed);
-
- if (ctrl->queue_count > 1) {
- nvme_start_queues(ctrl);
- nvme_queue_scan(ctrl);
- nvme_queue_async_events(ctrl);
- }
-
- return;
-
-out_destroy_io:
- nvme_rdma_destroy_io_queues(ctrl, true);
-out_destroy_admin:
- nvme_rdma_destroy_admin_queue(ctrl, true);
- dev_warn(ctrl->device, "Removing after reset failure\n");
- nvme_uninit_ctrl(ctrl);
- nvme_put_ctrl(ctrl);
-}
-
static int nvme_rdma_verify_ctrl(struct nvme_ctrl *ctrl)
{
struct nvmf_ctrl_options *opts = ctrl->opts;
@@ -1883,7 +1579,7 @@ static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
.reg_write32 = nvmf_reg_write32,
.free_ctrl = nvme_rdma_free_ctrl,
.submit_async_event = nvme_rdma_submit_async_event,
- .delete_ctrl = nvme_rdma_del_ctrl,
+ .delete_ctrl = nvme_del_ctrl,
.get_subsysnqn = nvmf_get_subsysnqn,
.get_address = nvmf_get_address,
@@ -1896,57 +1592,6 @@ static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
.verify_ctrl = nvme_rdma_verify_ctrl,
};
-static int nvme_rdma_probe_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
- const struct nvme_ctrl_ops *ops, unsigned long quirks,
- unsigned int nr_io_queues, size_t queue_size, int kato)
-{
- bool changed;
- int ret;
-
- INIT_WORK(&ctrl->delete_work, nvme_rdma_del_ctrl_work);
- INIT_WORK(&ctrl->reset_work, nvme_rdma_reset_ctrl_work);
-
- ctrl->max_queues = nr_io_queues + 1; /* +1 for admin queue */
- ctrl->sqsize = queue_size - 1; /* 0's based */
- ctrl->kato = kato;
-
- ret = nvme_init_ctrl(ctrl, dev, ops, quirks);
- if (ret)
- return ret;
-
- ret = nvme_rdma_configure_admin_queue(ctrl, true);
- if (ret)
- goto out_uninit_ctrl;
-
- ret = ctrl->ops->verify_ctrl(ctrl);
- if (ret)
- goto out_remove_admin_queue;
-
- if (ctrl->max_queues > 1) {
- ret = nvme_rdma_configure_io_queues(ctrl, true);
- if (ret)
- goto out_remove_admin_queue;
- }
-
- changed = nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE);
- WARN_ON_ONCE(!changed);
-
- kref_get(&ctrl->kref);
-
- if (ctrl->queue_count > 1) {
- nvme_queue_scan(ctrl);
- nvme_queue_async_events(ctrl);
- }
-
- return 0;
-
-out_remove_admin_queue:
- nvme_rdma_destroy_admin_queue(ctrl, true);
-out_uninit_ctrl:
- nvme_uninit_ctrl(ctrl);
- return ret;
-}
-
static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
struct nvmf_ctrl_options *opts)
{
@@ -1986,7 +1631,6 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
INIT_DELAYED_WORK(&ctrl->ctrl.reconnect_work,
nvme_rdma_reconnect_ctrl_work);
INIT_WORK(&ctrl->ctrl.err_work, nvme_rdma_error_recovery_work);
- INIT_WORK(&ctrl->ctrl.delete_work, nvme_rdma_del_ctrl_work);
ret = -ENOMEM;
ctrl->queues = kcalloc(opts->nr_io_queues + 1, sizeof(*ctrl->queues),
@@ -1994,7 +1638,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
if (!ctrl->queues)
goto out_free_ctrl;
- ret = nvme_rdma_probe_ctrl(&ctrl->ctrl, dev, &nvme_rdma_ctrl_ops,
+ ret = nvme_probe_ctrl(&ctrl->ctrl, dev, &nvme_rdma_ctrl_ops,
0, opts->nr_io_queues, opts->queue_size, opts->kato);
if (ret)
goto out_kfree_queues;
@@ -2039,7 +1683,7 @@ static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
dev_info(ctrl->ctrl.device,
"Removing ctrl: NQN \"%s\", addr %pISp\n",
ctrl->ctrl.opts->subsysnqn, &ctrl->addr);
- __nvme_rdma_del_ctrl(&ctrl->ctrl);
+ __nvme_del_ctrl(&ctrl->ctrl);
}
mutex_unlock(&nvme_rdma_ctrl_mutex);
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* Re: [PATCH rfc 25/30] nvme: move control plane handling to nvme core
2017-06-18 15:21 ` [PATCH rfc 25/30] nvme: move control plane handling to nvme core Sagi Grimberg
@ 2017-06-19 12:55 ` Christoph Hellwig
2017-06-19 16:24 ` Sagi Grimberg
0 siblings, 1 reply; 69+ messages in thread
From: Christoph Hellwig @ 2017-06-19 12:55 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, linux-block
> +static void nvme_free_io_queues(struct nvme_ctrl *ctrl)
> +{
> + int i;
> +
> + for (i = 1; i < ctrl->queue_count; i++)
> + ctrl->ops->free_hw_queue(ctrl, i);
> +}
> +
> +void nvme_stop_io_queues(struct nvme_ctrl *ctrl)
> +{
> + int i;
> +
> + for (i = 1; i < ctrl->queue_count; i++)
> + ctrl->ops->stop_hw_queue(ctrl, i);
> +}
> +EXPORT_SYMBOL_GPL(nvme_stop_io_queues);
At least for PCIe this is going to work very differently, so I'm not
sure this part makes much sense in the core. Maybe in Fabrics?
Or at least make the callouts operate on all I/O queues, which would
suit PCIe a lot more.
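A rough sketch of what such whole-controller callouts might look like
(simplified stand-in declarations with made-up names, not code from this
series and not the real kernel structures):
	/* Sketch only: whole-controller I/O queue callouts alongside the
	 * per-qid one from the posted series.  struct nvme_ctrl is left
	 * opaque here; these declarations are purely illustrative. */
	struct nvme_ctrl;

	struct nvme_ctrl_ops_sketch {
		/* per-queue callout, as posted */
		void (*stop_hw_queue)(struct nvme_ctrl *ctrl, int qid);
		/* whole-controller callouts, as suggested above for PCIe */
		void (*stop_io_queues)(struct nvme_ctrl *ctrl);
		void (*free_io_queues)(struct nvme_ctrl *ctrl);
	};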
> + error = ctrl->ops->start_hw_queue(ctrl, 0);
> + if (error)
> + goto out_cleanup_connect_queue;
> +
> + error = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &ctrl->cap);
> + if (error) {
> + dev_err(ctrl->device,
> + "prop_get NVME_REG_CAP failed\n");
> + goto out_cleanup_connect_queue;
> + }
> +
> + ctrl->sqsize = min_t(int, NVME_CAP_MQES(ctrl->cap), ctrl->sqsize);
> +
> + error = nvme_enable_ctrl(ctrl, ctrl->cap);
> + if (error)
> + goto out_cleanup_connect_queue;
I'm not sure this ordering is going to work for PCIe..
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH rfc 25/30] nvme: move control plane handling to nvme core
2017-06-19 12:55 ` Christoph Hellwig
@ 2017-06-19 16:24 ` Sagi Grimberg
0 siblings, 0 replies; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-19 16:24 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-nvme, Keith Busch, linux-block
>> +static void nvme_free_io_queues(struct nvme_ctrl *ctrl)
>> +{
>> + int i;
>> +
>> + for (i = 1; i < ctrl->queue_count; i++)
>> + ctrl->ops->free_hw_queue(ctrl, i);
>> +}
>> +
>> +void nvme_stop_io_queues(struct nvme_ctrl *ctrl)
>> +{
>> + int i;
>> +
>> + for (i = 1; i < ctrl->queue_count; i++)
>> + ctrl->ops->stop_hw_queue(ctrl, i);
>> +}
>> +EXPORT_SYMBOL_GPL(nvme_stop_io_queues);
>
> At least for PCIe this is going to work very differently, so I'm not
> sure this part makes much sense in the core. Maybe in Fabrics?
> Or at least make the callouts operate on all I/O queues, which would
> suit PCIe a lot more.
Yeah, I spent some time thinking about the async nature of queue
removal for PCI... I started from ->stop/free_io_queues callouts
but hated the fact that we need to iterate in exactly the same way
in every driver...
We could have optional stop/free_io_queues callouts that the core
would call instead, if implemented?
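As a minimal sketch of that idea (simplified stand-in types and made-up
"sketch_" names, not code from this series), the core could prefer a
whole-controller callout when the transport provides one and fall back
to the per-queue loop otherwise:
	/* Sketch only: dispatch between an optional whole-controller callout
	 * and the generic per-queue iteration.  The types are simplified
	 * stand-ins for nvme_ctrl/nvme_ctrl_ops, not the real structures. */
	struct sketch_ctrl;

	struct sketch_ctrl_ops {
		void (*stop_hw_queue)(struct sketch_ctrl *ctrl, int qid);
		void (*stop_io_queues)(struct sketch_ctrl *ctrl);	/* optional */
	};

	struct sketch_ctrl {
		const struct sketch_ctrl_ops *ops;
		int queue_count;
	};

	static void sketch_stop_io_queues(struct sketch_ctrl *ctrl)
	{
		int i;

		if (ctrl->ops->stop_io_queues) {
			/* transport (e.g. PCIe) tears down all I/O queues at once */
			ctrl->ops->stop_io_queues(ctrl);
			return;
		}
		/* generic fallback: iterate queue by queue, as the core does now */
		for (i = 1; i < ctrl->queue_count; i++)
			ctrl->ops->stop_hw_queue(ctrl, i);
	}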
>> + error = ctrl->ops->start_hw_queue(ctrl, 0);
>> + if (error)
>> + goto out_cleanup_connect_queue;
>> +
>> + error = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &ctrl->cap);
>> + if (error) {
>> + dev_err(ctrl->device,
>> + "prop_get NVME_REG_CAP failed\n");
>> + goto out_cleanup_connect_queue;
>> + }
>> +
>> + ctrl->sqsize = min_t(int, NVME_CAP_MQES(ctrl->cap), ctrl->sqsize);
>> +
>> + error = nvme_enable_ctrl(ctrl, ctrl->cap);
>> + if (error)
>> + goto out_cleanup_connect_queue;
>
> I'm not sure this ordering is going to work for PCIe..
This one is easy to reverse...
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH rfc 26/30] nvme-fabrics: handle reconnects in fabrics library
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (24 preceding siblings ...)
2017-06-18 15:21 ` [PATCH rfc 25/30] nvme: move control plane handling to nvme core Sagi Grimberg
@ 2017-06-18 15:22 ` Sagi Grimberg
2017-06-18 15:22 ` [PATCH rfc 27/30] nvme-loop: convert to nvme-core control plane management Sagi Grimberg
` (4 subsequent siblings)
30 siblings, 0 replies; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:22 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/fabrics.c | 102 ++++++++++++++++++++++++++++++++++++++++
drivers/nvme/host/fabrics.h | 1 +
drivers/nvme/host/rdma.c | 112 +++-----------------------------------------
3 files changed, 109 insertions(+), 106 deletions(-)
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index bd99bbb1faa3..b543d52f00d0 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -813,6 +813,104 @@ void nvmf_free_options(struct nvmf_ctrl_options *opts)
}
EXPORT_SYMBOL_GPL(nvmf_free_options);
+static void nvmf_reconnect_or_remove(struct nvme_ctrl *ctrl)
+{
+ /* If we are resetting/deleting then do nothing */
+ if (ctrl->state != NVME_CTRL_RECONNECTING) {
+ WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
+ ctrl->state == NVME_CTRL_LIVE);
+ return;
+ }
+
+ if (nvmf_should_reconnect(ctrl)) {
+ dev_info(ctrl->device, "Reconnecting in %d seconds...\n",
+ ctrl->opts->reconnect_delay);
+ queue_delayed_work(nvme_wq, &ctrl->reconnect_work,
+ ctrl->opts->reconnect_delay * HZ);
+ } else {
+ dev_info(ctrl->device, "Removing controller...\n");
+ queue_work(nvme_wq, &ctrl->delete_work);
+ }
+}
+
+static void nvmf_reconnect_ctrl_work(struct work_struct *work)
+{
+ struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
+ struct nvme_ctrl, reconnect_work);
+ bool changed;
+ int ret;
+
+ ++ctrl->nr_reconnects;
+
+ if (ctrl->max_queues > 1)
+ nvme_destroy_io_queues(ctrl, false);
+
+ nvme_destroy_admin_queue(ctrl, false);
+
+ ret = nvme_configure_admin_queue(ctrl, false);
+ if (ret)
+ goto requeue;
+
+ if (ctrl->max_queues > 1) {
+ ret = nvme_configure_io_queues(ctrl, false);
+ if (ret)
+ goto requeue;
+ }
+
+ changed = nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE);
+ WARN_ON_ONCE(!changed);
+ ctrl->nr_reconnects = 0;
+
+ if (ctrl->queue_count > 1) {
+ nvme_queue_scan(ctrl);
+ nvme_queue_async_events(ctrl);
+ }
+
+ dev_info(ctrl->device, "Successfully reconnected\n");
+
+ return;
+
+requeue:
+ dev_info(ctrl->device, "Failed reconnect attempt %d\n",
+ ctrl->nr_reconnects);
+ nvmf_reconnect_or_remove(ctrl);
+}
+
+static void nvmf_error_recovery_work(struct work_struct *work)
+{
+ struct nvme_ctrl *ctrl = container_of(work,
+ struct nvme_ctrl, err_work);
+
+ nvme_stop_keep_alive(ctrl);
+
+ if (ctrl->queue_count > 1) {
+ nvme_stop_queues(ctrl);
+ nvme_stop_io_queues(ctrl);
+ }
+ blk_mq_stop_hw_queues(ctrl->admin_q);
+ ctrl->ops->stop_hw_queue(ctrl, 0);
+
+ /* We must take care of fastfail/requeue all our inflight requests */
+ if (ctrl->queue_count > 1)
+ blk_mq_tagset_busy_iter(ctrl->tagset,
+ nvme_cancel_request, ctrl);
+ blk_mq_tagset_busy_iter(ctrl->admin_tagset,
+ nvme_cancel_request, ctrl);
+ nvme_start_queues(ctrl);
+ blk_mq_start_stopped_hw_queues(ctrl->admin_q, true);
+
+ nvmf_reconnect_or_remove(ctrl);
+}
+
+void nvmf_error_recovery(struct nvme_ctrl *ctrl)
+{
+ if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RECONNECTING))
+ return;
+
+ queue_work(nvme_wq, &ctrl->err_work);
+}
+EXPORT_SYMBOL_GPL(nvmf_error_recovery);
+
#define NVMF_REQUIRED_OPTS (NVMF_OPT_TRANSPORT | NVMF_OPT_NQN)
#define NVMF_ALLOWED_OPTS (NVMF_OPT_QUEUE_SIZE | NVMF_OPT_NR_IO_QUEUES | \
NVMF_OPT_KATO | NVMF_OPT_HOSTNQN)
@@ -866,6 +964,10 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
goto out_unlock;
}
+ INIT_DELAYED_WORK(&ctrl->reconnect_work,
+ nvmf_reconnect_ctrl_work);
+ INIT_WORK(&ctrl->err_work, nvmf_error_recovery_work);
+
mutex_unlock(&nvmf_transports_mutex);
return ctrl;
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index f1c9bd7ae7ff..c8f6ea03e288 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -140,6 +140,7 @@ void nvmf_unregister_transport(struct nvmf_transport_ops *ops);
void nvmf_free_options(struct nvmf_ctrl_options *opts);
const char *nvmf_get_subsysnqn(struct nvme_ctrl *ctrl);
int nvmf_get_address(struct nvme_ctrl *ctrl, char *buf, int size);
+void nvmf_error_recovery(struct nvme_ctrl *ctrl);
bool nvmf_should_reconnect(struct nvme_ctrl *ctrl);
#endif /* _NVME_FABRICS_H */
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 9b8c819f2bd7..4f20ade3f752 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -709,102 +709,6 @@ static void nvme_rdma_free_ctrl(struct nvme_ctrl *nctrl)
kfree(ctrl);
}
-static void nvme_rdma_reconnect_or_remove(struct nvme_ctrl *ctrl)
-{
- /* If we are resetting/deleting then do nothing */
- if (ctrl->state != NVME_CTRL_RECONNECTING) {
- WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
- ctrl->state == NVME_CTRL_LIVE);
- return;
- }
-
- if (nvmf_should_reconnect(ctrl)) {
- dev_info(ctrl->device, "Reconnecting in %d seconds...\n",
- ctrl->opts->reconnect_delay);
- queue_delayed_work(nvme_wq, &ctrl->reconnect_work,
- ctrl->opts->reconnect_delay * HZ);
- } else {
- dev_info(ctrl->device, "Removing controller...\n");
- queue_work(nvme_wq, &ctrl->delete_work);
- }
-}
-
-static void nvme_rdma_reconnect_ctrl_work(struct work_struct *work)
-{
- struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
- struct nvme_ctrl, reconnect_work);
- bool changed;
- int ret;
-
- ++ctrl->nr_reconnects;
-
- if (ctrl->max_queues > 1)
- nvme_destroy_io_queues(ctrl, false);
-
- nvme_destroy_admin_queue(ctrl, false);
-
- ret = nvme_configure_admin_queue(ctrl, false);
- if (ret)
- goto requeue;
-
- if (ctrl->max_queues > 1) {
- ret = nvme_configure_io_queues(ctrl, false);
- if (ret)
- goto requeue;
- }
-
- changed = nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE);
- WARN_ON_ONCE(!changed);
- ctrl->nr_reconnects = 0;
- dev_info(ctrl->device, "Successfully reconnected\n");
-
- return;
-
-requeue:
- dev_info(ctrl->device, "Failed reconnect attempt %d\n",
- ctrl->nr_reconnects);
- nvme_rdma_reconnect_or_remove(ctrl);
-}
-
-static void nvme_rdma_error_recovery_work(struct work_struct *work)
-{
- struct nvme_ctrl *ctrl = container_of(work,
- struct nvme_ctrl, err_work);
-
- nvme_stop_keep_alive(ctrl);
-
- if (ctrl->queue_count > 1) {
- nvme_stop_queues(ctrl);
- nvme_stop_io_queues(ctrl);
- }
- blk_mq_stop_hw_queues(ctrl->admin_q);
- ctrl->ops->stop_hw_queue(ctrl, 0);
-
- /* We must take care of fastfail/requeue all our inflight requests */
- if (ctrl->queue_count > 1)
- blk_mq_tagset_busy_iter(ctrl->tagset,
- nvme_cancel_request, ctrl);
- blk_mq_tagset_busy_iter(ctrl->admin_tagset,
- nvme_cancel_request, ctrl);
-
- /*
- * queues are not a live anymore, so restart the queues to fail fast
- * new IO
- */
- blk_mq_start_stopped_hw_queues(ctrl->admin_q, true);
- nvme_start_queues(ctrl);
-
- nvme_rdma_reconnect_or_remove(ctrl);
-}
-
-static void nvme_rdma_error_recovery(struct nvme_rdma_ctrl *ctrl)
-{
- if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RECONNECTING))
- return;
-
- queue_work(nvme_wq, &ctrl->ctrl.err_work);
-}
-
static void nvme_rdma_wr_error(struct ib_cq *cq, struct ib_wc *wc,
const char *op)
{
@@ -816,7 +720,7 @@ static void nvme_rdma_wr_error(struct ib_cq *cq, struct ib_wc *wc,
"%s for CQE 0x%p failed with status %s (%d)\n",
op, wc->wr_cqe,
ib_wc_status_msg(wc->status), wc->status);
- nvme_rdma_error_recovery(ctrl);
+ nvmf_error_recovery(&ctrl->ctrl);
}
static void nvme_rdma_memreg_done(struct ib_cq *cq, struct ib_wc *wc)
@@ -867,7 +771,7 @@ static void nvme_rdma_unmap_data(struct nvme_rdma_queue *queue,
dev_err(ctrl->ctrl.device,
"Queueing INV WR for rkey %#x failed (%d)\n",
req->mr->rkey, res);
- nvme_rdma_error_recovery(queue->ctrl);
+ nvmf_error_recovery(&queue->ctrl->ctrl);
}
}
@@ -1147,7 +1051,7 @@ static int nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
dev_err(queue->ctrl->ctrl.device,
"tag 0x%x on QP %#x not found\n",
cqe->command_id, queue->qp->qp_num);
- nvme_rdma_error_recovery(queue->ctrl);
+ nvmf_error_recovery(&queue->ctrl->ctrl);
return ret;
}
req = blk_mq_rq_to_pdu(rq);
@@ -1358,7 +1262,7 @@ static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
case RDMA_CM_EVENT_TIMEWAIT_EXIT:
dev_dbg(queue->ctrl->ctrl.device,
"disconnect received - connection closed\n");
- nvme_rdma_error_recovery(queue->ctrl);
+ nvmf_error_recovery(&queue->ctrl->ctrl);
break;
case RDMA_CM_EVENT_DEVICE_REMOVAL:
/* device removal is handled via the ib_client API */
@@ -1366,7 +1270,7 @@ static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
default:
dev_err(queue->ctrl->ctrl.device,
"Unexpected RDMA CM event (%d)\n", ev->event);
- nvme_rdma_error_recovery(queue->ctrl);
+ nvmf_error_recovery(&queue->ctrl->ctrl);
break;
}
@@ -1384,7 +1288,7 @@ nvme_rdma_timeout(struct request *rq, bool reserved)
struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
/* queue error recovery */
- nvme_rdma_error_recovery(req->queue->ctrl);
+ nvmf_error_recovery(&req->queue->ctrl->ctrl);
/* fail with DNR on cmd timeout */
nvme_req(rq)->status = NVME_SC_ABORT_REQ | NVME_SC_DNR;
@@ -1628,10 +1532,6 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
}
}
- INIT_DELAYED_WORK(&ctrl->ctrl.reconnect_work,
- nvme_rdma_reconnect_ctrl_work);
- INIT_WORK(&ctrl->ctrl.err_work, nvme_rdma_error_recovery_work);
-
ret = -ENOMEM;
ctrl->queues = kcalloc(opts->nr_io_queues + 1, sizeof(*ctrl->queues),
GFP_KERNEL);
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 27/30] nvme-loop: convert to nvme-core control plane management
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (25 preceding siblings ...)
2017-06-18 15:22 ` [PATCH rfc 26/30] nvme-fabrics: handle reconnects in fabrics library Sagi Grimberg
@ 2017-06-18 15:22 ` Sagi Grimberg
2017-06-18 15:22 ` [PATCH rfc 28/30] nvme: update tagset nr_hw_queues when reallocating io queues Sagi Grimberg
` (3 subsequent siblings)
30 siblings, 0 replies; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:22 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
Rip out all the controller and queue control plane code;
only maintain queue alloc/free/start/stop and tagset alloc/free.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/target/loop.c | 424 ++++++++++++---------------------------------
1 file changed, 110 insertions(+), 314 deletions(-)
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index edd9ee04de02..f176b473a2dd 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -241,7 +241,7 @@ static int nvme_loop_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
struct nvme_loop_ctrl *ctrl = data;
struct nvme_loop_queue *queue = &ctrl->queues[hctx_idx + 1];
- BUG_ON(hctx_idx >= ctrl->queue_count);
+ BUG_ON(hctx_idx >= ctrl->ctrl.max_queues);
hctx->driver_data = queue;
return 0;
@@ -275,268 +275,137 @@ static const struct blk_mq_ops nvme_loop_admin_mq_ops = {
.timeout = nvme_loop_timeout,
};
-static void nvme_loop_destroy_admin_queue(struct nvme_loop_ctrl *ctrl)
+static int nvme_loop_verify_ctrl(struct nvme_ctrl *ctrl)
{
- nvmet_sq_destroy(&ctrl->queues[0].nvme_sq);
- blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
- blk_cleanup_queue(ctrl->ctrl.admin_q);
- blk_mq_free_tag_set(&ctrl->admin_tag_set);
-}
-
-static void nvme_loop_free_ctrl(struct nvme_ctrl *nctrl)
-{
- struct nvme_loop_ctrl *ctrl = to_loop_ctrl(nctrl);
+ struct nvmf_ctrl_options *opts = ctrl->opts;
- if (list_empty(&ctrl->list))
- goto free_ctrl;
-
- mutex_lock(&nvme_loop_ctrl_mutex);
- list_del(&ctrl->list);
- mutex_unlock(&nvme_loop_ctrl_mutex);
-
- if (nctrl->tagset) {
- blk_cleanup_queue(ctrl->ctrl.connect_q);
- blk_mq_free_tag_set(&ctrl->tag_set);
+ if (opts->queue_size > ctrl->maxcmd) {
+ /* warn if maxcmd is lower than queue_size */
+ dev_warn(ctrl->device,
+ "queue_size %zu > ctrl maxcmd %u, clamping down\n",
+ opts->queue_size, ctrl->maxcmd);
+ opts->queue_size = ctrl->maxcmd;
}
- kfree(ctrl->queues);
- nvmf_free_options(nctrl->opts);
-free_ctrl:
- kfree(ctrl);
-}
-static void nvme_loop_destroy_io_queues(struct nvme_loop_ctrl *ctrl)
-{
- int i;
-
- for (i = 1; i < ctrl->queue_count; i++)
- nvmet_sq_destroy(&ctrl->queues[i].nvme_sq);
-}
-
-static int nvme_loop_init_io_queues(struct nvme_loop_ctrl *ctrl)
-{
- struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
- unsigned int nr_io_queues;
- int ret, i;
-
- nr_io_queues = min(opts->nr_io_queues, num_online_cpus());
- ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
- if (ret || !nr_io_queues)
- return ret;
-
- dev_info(ctrl->ctrl.device, "creating %d I/O queues.\n", nr_io_queues);
-
- for (i = 1; i <= nr_io_queues; i++) {
- ctrl->queues[i].ctrl = ctrl;
- ret = nvmet_sq_init(&ctrl->queues[i].nvme_sq);
- if (ret)
- goto out_destroy_queues;
-
- ctrl->queue_count++;
+ if (opts->queue_size > ctrl->sqsize + 1) {
+ /* warn if sqsize is lower than queue_size */
+ dev_warn(ctrl->device,
+ "queue_size %zu > ctrl sqsize %u, clamping down\n",
+ opts->queue_size, ctrl->sqsize + 1);
+ opts->queue_size = ctrl->sqsize + 1;
}
return 0;
-
-out_destroy_queues:
- nvme_loop_destroy_io_queues(ctrl);
- return ret;
}
-static int nvme_loop_connect_io_queues(struct nvme_loop_ctrl *ctrl)
+static void nvme_loop_free_tagset(struct nvme_ctrl *nctrl, bool admin)
{
- int i, ret;
-
- for (i = 1; i < ctrl->queue_count; i++) {
- ret = nvmf_connect_io_queue(&ctrl->ctrl, i);
- if (ret)
- return ret;
- }
+ struct nvme_loop_ctrl *ctrl = to_loop_ctrl(nctrl);
+ struct blk_mq_tag_set *set = admin ?
+ &ctrl->admin_tag_set : &ctrl->tag_set;
- return 0;
+ blk_mq_free_tag_set(set);
}
-static int nvme_loop_configure_admin_queue(struct nvme_loop_ctrl *ctrl)
+static struct blk_mq_tag_set *nvme_loop_alloc_tagset(struct nvme_ctrl *nctrl,
+ bool admin)
{
- int error;
-
- memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set));
- ctrl->admin_tag_set.ops = &nvme_loop_admin_mq_ops;
- ctrl->admin_tag_set.queue_depth = NVME_LOOP_AQ_BLKMQ_DEPTH;
- ctrl->admin_tag_set.reserved_tags = 2; /* connect + keep-alive */
- ctrl->admin_tag_set.numa_node = NUMA_NO_NODE;
- ctrl->admin_tag_set.cmd_size = sizeof(struct nvme_loop_iod) +
- SG_CHUNK_SIZE * sizeof(struct scatterlist);
- ctrl->admin_tag_set.driver_data = ctrl;
- ctrl->admin_tag_set.nr_hw_queues = 1;
- ctrl->admin_tag_set.timeout = ADMIN_TIMEOUT;
-
- ctrl->queues[0].ctrl = ctrl;
- error = nvmet_sq_init(&ctrl->queues[0].nvme_sq);
- if (error)
- return error;
- ctrl->queue_count = 1;
-
- error = blk_mq_alloc_tag_set(&ctrl->admin_tag_set);
- if (error)
- goto out_free_sq;
-
- ctrl->ctrl.admin_q = blk_mq_init_queue(&ctrl->admin_tag_set);
- if (IS_ERR(ctrl->ctrl.admin_q)) {
- error = PTR_ERR(ctrl->ctrl.admin_q);
- goto out_free_tagset;
- }
-
- ctrl->ctrl.admin_connect_q = blk_mq_init_queue(&ctrl->admin_tag_set);
- if (IS_ERR(ctrl->ctrl.admin_connect_q)) {
- error = PTR_ERR(ctrl->ctrl.admin_connect_q);
- goto out_cleanup_queue;
- }
-
- error = nvmf_connect_admin_queue(&ctrl->ctrl);
- if (error)
- goto out_cleanup_connect_queue;
+ struct nvme_loop_ctrl *ctrl = to_loop_ctrl(nctrl);
+ struct blk_mq_tag_set *set;
+ int ret;
- error = nvmf_reg_read64(&ctrl->ctrl, NVME_REG_CAP, &ctrl->cap);
- if (error) {
- dev_err(ctrl->ctrl.device,
- "prop_get NVME_REG_CAP failed\n");
- goto out_cleanup_connect_queue;
+ if (admin) {
+ set = &ctrl->admin_tag_set;
+ memset(set, 0, sizeof(*set));
+ set->ops = &nvme_loop_admin_mq_ops;
+ set->queue_depth = NVME_LOOP_AQ_BLKMQ_DEPTH;
+ set->reserved_tags = 2; /* connect + keep-alive */
+ set->numa_node = NUMA_NO_NODE;
+ set->cmd_size = sizeof(struct nvme_loop_iod) +
+ SG_CHUNK_SIZE * sizeof(struct scatterlist);
+ set->driver_data = ctrl;
+ set->nr_hw_queues = 1;
+ set->timeout = ADMIN_TIMEOUT;
+ } else {
+ set = &ctrl->tag_set;
+ memset(set, 0, sizeof(*set));
+ set->ops = &nvme_loop_mq_ops;
+ set->queue_depth = nctrl->opts->queue_size;
+ set->reserved_tags = 1; /* fabric connect */
+ set->numa_node = NUMA_NO_NODE;
+ set->flags = BLK_MQ_F_SHOULD_MERGE;
+ set->cmd_size = sizeof(struct nvme_loop_iod) +
+ SG_CHUNK_SIZE * sizeof(struct scatterlist);
+ set->driver_data = ctrl;
+ set->nr_hw_queues = nctrl->queue_count - 1;
+ set->timeout = NVME_IO_TIMEOUT;
}
- ctrl->ctrl.sqsize =
- min_t(int, NVME_CAP_MQES(ctrl->cap), ctrl->ctrl.sqsize);
-
- error = nvme_enable_ctrl(&ctrl->ctrl, ctrl->cap);
- if (error)
- goto out_cleanup_connect_queue;
-
- ctrl->ctrl.max_hw_sectors =
- (NVME_LOOP_MAX_SEGMENTS - 1) << (PAGE_SHIFT - 9);
-
- error = nvme_init_identify(&ctrl->ctrl);
- if (error)
- goto out_cleanup_connect_queue;
-
- nvme_start_keep_alive(&ctrl->ctrl);
-
- return 0;
+ ret = blk_mq_alloc_tag_set(set);
+ if (ret)
+ return ERR_PTR(ret);
-out_cleanup_connect_queue:
- blk_cleanup_queue(ctrl->ctrl.admin_connect_q);
-out_cleanup_queue:
- blk_cleanup_queue(ctrl->ctrl.admin_q);
-out_free_tagset:
- blk_mq_free_tag_set(&ctrl->admin_tag_set);
-out_free_sq:
- nvmet_sq_destroy(&ctrl->queues[0].nvme_sq);
- return error;
+ return set;
}
-static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
+static void nvme_loop_free_queue(struct nvme_ctrl *nctrl, int qid)
{
- nvme_stop_keep_alive(&ctrl->ctrl);
-
- if (ctrl->queue_count > 1) {
- nvme_stop_queues(&ctrl->ctrl);
- blk_mq_tagset_busy_iter(&ctrl->tag_set,
- nvme_cancel_request, &ctrl->ctrl);
- nvme_loop_destroy_io_queues(ctrl);
- }
-
- if (ctrl->ctrl.state == NVME_CTRL_LIVE)
- nvme_shutdown_ctrl(&ctrl->ctrl);
-
- blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);
- blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
- nvme_cancel_request, &ctrl->ctrl);
- nvme_loop_destroy_admin_queue(ctrl);
}
-static void nvme_loop_del_ctrl_work(struct work_struct *work)
+static void nvme_loop_stop_queue(struct nvme_ctrl *nctrl, int qid)
{
- struct nvme_loop_ctrl *ctrl = container_of(work,
- struct nvme_loop_ctrl, delete_work);
+ struct nvme_loop_ctrl *ctrl = to_loop_ctrl(nctrl);
- nvme_uninit_ctrl(&ctrl->ctrl);
- nvme_loop_shutdown_ctrl(ctrl);
- nvme_put_ctrl(&ctrl->ctrl);
+ nvmet_sq_destroy(&ctrl->queues[qid].nvme_sq);
}
-static int __nvme_loop_del_ctrl(struct nvme_loop_ctrl *ctrl)
+static int nvme_loop_start_queue(struct nvme_ctrl *nctrl, int qid)
{
- if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING))
- return -EBUSY;
+ int ret;
- if (!queue_work(nvme_wq, &ctrl->delete_work))
- return -EBUSY;
+ if (qid)
+ ret = nvmf_connect_io_queue(nctrl, qid);
+ else
+ ret = nvmf_connect_admin_queue(nctrl);
- return 0;
+ if (ret)
+ dev_info(nctrl->device,
+ "failed to connect queue: %d ret=%d\n", qid, ret);
+ return ret;
}
-static int nvme_loop_del_ctrl(struct nvme_ctrl *nctrl)
+static int nvme_loop_alloc_queue(struct nvme_ctrl *nctrl,
+ int qid, size_t queue_size)
{
struct nvme_loop_ctrl *ctrl = to_loop_ctrl(nctrl);
int ret;
- ret = __nvme_loop_del_ctrl(ctrl);
+ ctrl->queues[qid].ctrl = ctrl;
+ ret = nvmet_sq_init(&ctrl->queues[qid].nvme_sq);
if (ret)
return ret;
- flush_work(&ctrl->delete_work);
+ if (!qid)
+ nvme_loop_init_iod(ctrl, &ctrl->async_event_iod, 0);
return 0;
}
-static void nvme_loop_delete_ctrl(struct nvmet_ctrl *nctrl)
+static void nvme_loop_free_ctrl(struct nvme_ctrl *nctrl)
{
- struct nvme_loop_ctrl *ctrl;
+ struct nvme_loop_ctrl *ctrl = to_loop_ctrl(nctrl);
+
+ if (list_empty(&ctrl->list))
+ goto free_ctrl;
mutex_lock(&nvme_loop_ctrl_mutex);
- list_for_each_entry(ctrl, &nvme_loop_ctrl_list, list) {
- if (ctrl->ctrl.cntlid == nctrl->cntlid)
- __nvme_loop_del_ctrl(ctrl);
- }
+ list_del(&ctrl->list);
mutex_unlock(&nvme_loop_ctrl_mutex);
-}
-
-static void nvme_loop_reset_ctrl_work(struct work_struct *work)
-{
- struct nvme_loop_ctrl *ctrl =
- container_of(work, struct nvme_loop_ctrl, ctrl.reset_work);
- bool changed;
- int ret;
-
- nvme_loop_shutdown_ctrl(ctrl);
-
- ret = nvme_loop_configure_admin_queue(ctrl);
- if (ret)
- goto out_disable;
-
- ret = nvme_loop_init_io_queues(ctrl);
- if (ret)
- goto out_destroy_admin;
-
- ret = nvme_loop_connect_io_queues(ctrl);
- if (ret)
- goto out_destroy_io;
-
- changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
- WARN_ON_ONCE(!changed);
-
- nvme_queue_scan(&ctrl->ctrl);
- nvme_queue_async_events(&ctrl->ctrl);
-
- nvme_start_queues(&ctrl->ctrl);
-
- return;
-out_destroy_io:
- nvme_loop_destroy_io_queues(ctrl);
-out_destroy_admin:
- nvme_loop_destroy_admin_queue(ctrl);
-out_disable:
- dev_warn(ctrl->ctrl.device, "Removing after reset failure\n");
- nvme_uninit_ctrl(&ctrl->ctrl);
- nvme_put_ctrl(&ctrl->ctrl);
+ kfree(ctrl->queues);
+ nvmf_free_options(nctrl->opts);
+free_ctrl:
+ kfree(ctrl);
}
static const struct nvme_ctrl_ops nvme_loop_ctrl_ops = {
@@ -548,139 +417,66 @@ static const struct nvme_ctrl_ops nvme_loop_ctrl_ops = {
.reg_write32 = nvmf_reg_write32,
.free_ctrl = nvme_loop_free_ctrl,
.submit_async_event = nvme_loop_submit_async_event,
- .delete_ctrl = nvme_loop_del_ctrl,
+ .delete_ctrl = nvme_del_ctrl,
.get_subsysnqn = nvmf_get_subsysnqn,
+ .alloc_hw_queue = nvme_loop_alloc_queue,
+ .free_hw_queue = nvme_loop_free_queue,
+ .start_hw_queue = nvme_loop_start_queue,
+ .stop_hw_queue = nvme_loop_stop_queue,
+ .alloc_tagset = nvme_loop_alloc_tagset,
+ .free_tagset = nvme_loop_free_tagset,
+ .verify_ctrl = nvme_loop_verify_ctrl,
};
-static int nvme_loop_create_io_queues(struct nvme_loop_ctrl *ctrl)
-{
- int ret;
-
- ret = nvme_loop_init_io_queues(ctrl);
- if (ret)
- return ret;
-
- memset(&ctrl->tag_set, 0, sizeof(ctrl->tag_set));
- ctrl->tag_set.ops = &nvme_loop_mq_ops;
- ctrl->tag_set.queue_depth = ctrl->ctrl.opts->queue_size;
- ctrl->tag_set.reserved_tags = 1; /* fabric connect */
- ctrl->tag_set.numa_node = NUMA_NO_NODE;
- ctrl->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
- ctrl->tag_set.cmd_size = sizeof(struct nvme_loop_iod) +
- SG_CHUNK_SIZE * sizeof(struct scatterlist);
- ctrl->tag_set.driver_data = ctrl;
- ctrl->tag_set.nr_hw_queues = ctrl->queue_count - 1;
- ctrl->tag_set.timeout = NVME_IO_TIMEOUT;
- ctrl->ctrl.tagset = &ctrl->tag_set;
-
- ret = blk_mq_alloc_tag_set(&ctrl->tag_set);
- if (ret)
- goto out_destroy_queues;
-
- ctrl->ctrl.connect_q = blk_mq_init_queue(&ctrl->tag_set);
- if (IS_ERR(ctrl->ctrl.connect_q)) {
- ret = PTR_ERR(ctrl->ctrl.connect_q);
- goto out_free_tagset;
- }
-
- ret = nvme_loop_connect_io_queues(ctrl);
- if (ret)
- goto out_cleanup_connect_q;
-
- return 0;
-
-out_cleanup_connect_q:
- blk_cleanup_queue(ctrl->ctrl.connect_q);
-out_free_tagset:
- blk_mq_free_tag_set(&ctrl->tag_set);
-out_destroy_queues:
- nvme_loop_destroy_io_queues(ctrl);
- return ret;
-}
-
static struct nvme_ctrl *nvme_loop_create_ctrl(struct device *dev,
struct nvmf_ctrl_options *opts)
{
struct nvme_loop_ctrl *ctrl;
- bool changed;
int ret;
ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
if (!ctrl)
return ERR_PTR(-ENOMEM);
- ctrl->ctrl.opts = opts;
- INIT_LIST_HEAD(&ctrl->list);
-
- INIT_WORK(&ctrl->delete_work, nvme_loop_del_ctrl_work);
- INIT_WORK(&ctrl->ctrl.reset_work, nvme_loop_reset_ctrl_work);
-
- ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_loop_ctrl_ops,
- 0 /* no quirks, we're perfect! */);
- if (ret)
- goto out_put_ctrl;
ret = -ENOMEM;
-
- ctrl->ctrl.sqsize = opts->queue_size - 1;
- ctrl->ctrl.kato = opts->kato;
-
ctrl->queues = kcalloc(opts->nr_io_queues + 1, sizeof(*ctrl->queues),
GFP_KERNEL);
if (!ctrl->queues)
- goto out_uninit_ctrl;
+ goto out_free_ctrl;
- ret = nvme_loop_configure_admin_queue(ctrl);
+ ret = nvme_probe_ctrl(&ctrl->ctrl, dev, &nvme_loop_ctrl_ops,
+ 0, opts->nr_io_queues, opts->queue_size, opts->kato);
if (ret)
goto out_free_queues;
- if (opts->queue_size > ctrl->ctrl.maxcmd) {
- /* warn if maxcmd is lower than queue_size */
- dev_warn(ctrl->ctrl.device,
- "queue_size %zu > ctrl maxcmd %u, clamping down\n",
- opts->queue_size, ctrl->ctrl.maxcmd);
- opts->queue_size = ctrl->ctrl.maxcmd;
- }
-
- if (opts->nr_io_queues) {
- ret = nvme_loop_create_io_queues(ctrl);
- if (ret)
- goto out_remove_admin_queue;
- }
-
- nvme_loop_init_iod(ctrl, &ctrl->async_event_iod, 0);
-
dev_info(ctrl->ctrl.device,
"new ctrl: \"%s\"\n", ctrl->ctrl.opts->subsysnqn);
- kref_get(&ctrl->ctrl.kref);
-
- changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
- WARN_ON_ONCE(!changed);
-
mutex_lock(&nvme_loop_ctrl_mutex);
list_add_tail(&ctrl->list, &nvme_loop_ctrl_list);
mutex_unlock(&nvme_loop_ctrl_mutex);
- if (opts->nr_io_queues) {
- nvme_queue_scan(&ctrl->ctrl);
- nvme_queue_async_events(&ctrl->ctrl);
- }
-
return &ctrl->ctrl;
-out_remove_admin_queue:
- nvme_loop_destroy_admin_queue(ctrl);
out_free_queues:
kfree(ctrl->queues);
-out_uninit_ctrl:
- nvme_uninit_ctrl(&ctrl->ctrl);
-out_put_ctrl:
- nvme_put_ctrl(&ctrl->ctrl);
- if (ret > 0)
- ret = -EIO;
+out_free_ctrl:
+ kfree(ctrl);
return ERR_PTR(ret);
}
+static void nvme_loop_delete_ctrl(struct nvmet_ctrl *nctrl)
+{
+ struct nvme_loop_ctrl *ctrl;
+
+ mutex_lock(&nvme_loop_ctrl_mutex);
+ list_for_each_entry(ctrl, &nvme_loop_ctrl_list, list) {
+ if (ctrl->ctrl.cntlid == nctrl->cntlid)
+ __nvme_del_ctrl(&ctrl->ctrl);
+ }
+ mutex_unlock(&nvme_loop_ctrl_mutex);
+}
+
static int nvme_loop_add_port(struct nvmet_port *port)
{
/*
@@ -744,7 +540,7 @@ static void __exit nvme_loop_cleanup_module(void)
mutex_lock(&nvme_loop_ctrl_mutex);
list_for_each_entry_safe(ctrl, next, &nvme_loop_ctrl_list, list)
- __nvme_loop_del_ctrl(ctrl);
+ __nvme_del_ctrl(&ctrl->ctrl);
mutex_unlock(&nvme_loop_ctrl_mutex);
flush_workqueue(nvme_wq);
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* [PATCH rfc 28/30] nvme: update tagset nr_hw_queues when reallocating io queues
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (26 preceding siblings ...)
2017-06-18 15:22 ` [PATCH rfc 27/30] nvme-loop: convert to nvme-core control plane management Sagi Grimberg
@ 2017-06-18 15:22 ` Sagi Grimberg
2017-06-19 7:21 ` Christoph Hellwig
2017-06-18 15:22 ` [PATCH rfc 29/30] nvme: add sed-opal ctrl manipulation in admin configuration Sagi Grimberg
` (2 subsequent siblings)
30 siblings, 1 reply; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:22 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6937ba26ff2c..476c49c0601f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2776,6 +2776,9 @@ int nvme_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
ret = blk_mq_reinit_tagset(ctrl->tagset);
if (ret)
goto out_free_io_queues;
+
+ blk_mq_update_nr_hw_queues(ctrl->tagset,
+ ctrl->queue_count - 1);
}
ret = nvme_start_io_queues(ctrl);
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* Re: [PATCH rfc 28/30] nvme: update tagset nr_hw_queues when reallocating io queues
2017-06-18 15:22 ` [PATCH rfc 28/30] nvme: update tagset nr_hw_queues when reallocating io queues Sagi Grimberg
@ 2017-06-19 7:21 ` Christoph Hellwig
2017-06-19 8:06 ` Ming Lei
0 siblings, 1 reply; 69+ messages in thread
From: Christoph Hellwig @ 2017-06-19 7:21 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-nvme, Christoph Hellwig, Keith Busch, linux-block, Ming Lei
On Sun, Jun 18, 2017 at 06:22:02PM +0300, Sagi Grimberg wrote:
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Could use a changelog.
Ming: does this solve your problem of not seeing the new queues
after a qemu CPU hotplug + reset?
> ---
> drivers/nvme/host/core.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 6937ba26ff2c..476c49c0601f 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2776,6 +2776,9 @@ int nvme_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
> ret = blk_mq_reinit_tagset(ctrl->tagset);
> if (ret)
> goto out_free_io_queues;
> +
> + blk_mq_update_nr_hw_queues(ctrl->tagset,
> + ctrl->queue_count - 1);
> }
>
> ret = nvme_start_io_queues(ctrl);
> --
> 2.7.4
---end quoted text---
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH rfc 28/30] nvme: update tagset nr_hw_queues when reallocating io queues
2017-06-19 7:21 ` Christoph Hellwig
@ 2017-06-19 8:06 ` Ming Lei
2017-06-19 16:21 ` Sagi Grimberg
0 siblings, 1 reply; 69+ messages in thread
From: Ming Lei @ 2017-06-19 8:06 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Sagi Grimberg, Keith Busch, linux-block, linux-nvme
On Mon, Jun 19, 2017 at 09:21:48AM +0200, Christoph Hellwig wrote:
> On Sun, Jun 18, 2017 at 06:22:02PM +0300, Sagi Grimberg wrote:
> > Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
>
> Could use a changelog.
>
> Ming: does this solve your problem of not seeing the new queues
> after a qemu CPU hotplug + reset?
The issue I observed is that there isn't an NVMe reset triggered after
a CPU becomes online.
It may take a while since the test needs the whole patchset applied.
Or is it fine to figure out a separate fix for this issue? Sagi?
Once the test is done, I will update you.
Thanks,
Ming
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH rfc 28/30] nvme: update tagset nr_hw_queues when reallocating io queues
2017-06-19 8:06 ` Ming Lei
@ 2017-06-19 16:21 ` Sagi Grimberg
0 siblings, 0 replies; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-19 16:21 UTC (permalink / raw)
To: Ming Lei, Christoph Hellwig; +Cc: Keith Busch, linux-block, linux-nvme
>>> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
>>
>> Could use a changelog.
>>
>> Ming: does this solve your problem of not seeing the new queues
>> after a qemu CPU hotplug + reset?
>
> The issue I observed is that there isn't an NVMe reset triggered after
> a CPU becomes online.
This won't help with that.
> It may take a while since the test needs the whole patchset applied.
> Or is it fine to figure out a separate fix for this issue? Sagi?
No need, all this doesn't even touch PCI for now...
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH rfc 29/30] nvme: add sed-opal ctrl manipulation in admin configuration
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (27 preceding siblings ...)
2017-06-18 15:22 ` [PATCH rfc 28/30] nvme: update tagset nr_hw_queues when reallocating io queues Sagi Grimberg
@ 2017-06-18 15:22 ` Sagi Grimberg
2017-06-19 7:22 ` Christoph Hellwig
2017-06-18 15:22 ` [PATCH rfc 30/30] nvme: Add queue freeze/unfreeze handling on controller resets Sagi Grimberg
2017-06-18 15:24 ` [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
30 siblings, 1 reply; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:22 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/core.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 476c49c0601f..f4800b8e47a0 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2493,6 +2493,7 @@ static void nvme_free_ctrl(struct kref *kref)
put_device(ctrl->device);
nvme_release_instance(ctrl);
ida_destroy(&ctrl->ns_ida);
+ free_opal_dev(ctrl->opal_dev);
ctrl->ops->free_ctrl(ctrl);
}
@@ -2814,6 +2815,7 @@ EXPORT_SYMBOL_GPL(nvme_destroy_admin_queue);
int nvme_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
{
+ bool was_suspend = !!(ctrl->ctrl_config & NVME_CC_SHN_NORMAL);
int error;
error = ctrl->ops->alloc_hw_queue(ctrl, 0, NVME_AQ_DEPTH);
@@ -2868,6 +2870,16 @@ int nvme_configure_admin_queue(struct nvme_ctrl *ctrl, bool new)
if (error)
goto out_cleanup_connect_queue;
+ if (ctrl->oacs & NVME_CTRL_OACS_SEC_SUPP) {
+ if (!ctrl->opal_dev)
+ ctrl->opal_dev = init_opal_dev(ctrl, &nvme_sec_submit);
+ else if (was_suspend)
+ opal_unlock_from_suspend(ctrl->opal_dev);
+ } else {
+ free_opal_dev(ctrl->opal_dev);
+ ctrl->opal_dev = NULL;
+ }
+
nvme_start_keep_alive(ctrl);
return 0;
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* Re: [PATCH rfc 29/30] nvme: add sed-opal ctrl manipulation in admin configuration
2017-06-18 15:22 ` [PATCH rfc 29/30] nvme: add sed-opal ctrl manipulation in admin configuration Sagi Grimberg
@ 2017-06-19 7:22 ` Christoph Hellwig
2017-06-19 8:03 ` Sagi Grimberg
0 siblings, 1 reply; 69+ messages in thread
From: Christoph Hellwig @ 2017-06-19 7:22 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, Keith Busch, linux-block
On Sun, Jun 18, 2017 at 06:22:03PM +0300, Sagi Grimberg wrote:
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
The subject sounds odd and it could use a changelog. But I'd love to
pick this change up ASAP as it's the right thing to do..
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH rfc 29/30] nvme: add sed-opal ctrl manipulation in admin configuration
2017-06-19 7:22 ` Christoph Hellwig
@ 2017-06-19 8:03 ` Sagi Grimberg
2017-06-19 12:55 ` Christoph Hellwig
0 siblings, 1 reply; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-19 8:03 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-nvme, Keith Busch, linux-block
> The subject sounds odd and it could use a changelog. But I'd love to
> pick this change up ASAP as it's the right thing to do..
How? Where would you place it? There is no nvme_configure_admin_queue in
nvme-core.
I suppose we could put it in nvme_init_ctrl?
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: [PATCH rfc 29/30] nvme: add sed-opal ctrl manipulation in admin configuration
2017-06-19 8:03 ` Sagi Grimberg
@ 2017-06-19 12:55 ` Christoph Hellwig
0 siblings, 0 replies; 69+ messages in thread
From: Christoph Hellwig @ 2017-06-19 12:55 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: Christoph Hellwig, linux-nvme, Keith Busch, linux-block
On Mon, Jun 19, 2017 at 11:03:36AM +0300, Sagi Grimberg wrote:
>
>> The subject sounds odd and it could use a changelog. But I'd love to
>> pick this change up ASAP as it's the right thing to do..
>
> How? where would you place it? there is no nvme_configure_admin_queue in
> nvme-core.
Doh. Yeah, let's keep it where it is.
^ permalink raw reply [flat|nested] 69+ messages in thread
* [PATCH rfc 30/30] nvme: Add queue freeze/unfreeze handling on controller resets
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (28 preceding siblings ...)
2017-06-18 15:22 ` [PATCH rfc 29/30] nvme: add sed-opal ctrl manipulation in admin configuration Sagi Grimberg
@ 2017-06-18 15:22 ` Sagi Grimberg
2017-06-18 15:24 ` [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
30 siblings, 0 replies; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:22 UTC (permalink / raw)
To: linux-nvme; +Cc: Christoph Hellwig, Keith Busch, linux-block
Just copy what we have in nvme-pci. It's a generic flow anyway.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
drivers/nvme/host/core.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f4800b8e47a0..959b6c39f22c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2901,15 +2901,40 @@ EXPORT_SYMBOL_GPL(nvme_configure_admin_queue);
static void nvme_teardown_ctrl(struct nvme_ctrl *ctrl, bool shutdown)
{
+ bool dead = true;
+
nvme_stop_keep_alive(ctrl);
cancel_work_sync(&ctrl->err_work);
cancel_delayed_work_sync(&ctrl->reconnect_work);
if (ctrl->max_queues > 1) {
+ u32 csts;
+
+ if (!ctrl->ops->reg_read32(ctrl, NVME_REG_CSTS, &csts)) {
+ nvme_start_freeze(ctrl);
+ dead = !!((csts & NVME_CSTS_CFS) ||
+ !(csts & NVME_CSTS_RDY));
+ }
+
+ /*
+ * Give the controller a chance to complete all entered requests
+ * if doing a safe shutdown.
+ */
+ if (!dead && shutdown)
+ nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT);
+
nvme_stop_queues(ctrl);
blk_mq_tagset_busy_iter(ctrl->tagset,
nvme_cancel_request, ctrl);
nvme_destroy_io_queues(ctrl, shutdown);
+
+ /*
+ * The driver will not be starting up queues again if shutting
+ * down so must flush all entered requests to their failed
+ * completion to avoid deadlocking blk-mq hot-cpu notifier.
+ */
+ if (shutdown)
+ nvme_start_queues(ctrl);
}
if (shutdown)
@@ -2991,6 +3016,8 @@ static void nvme_reset_ctrl_work(struct work_struct *work)
if (ctrl->queue_count > 1) {
nvme_start_queues(ctrl);
+ nvme_wait_freeze(ctrl);
+ nvme_unfreeze(ctrl);
nvme_queue_scan(ctrl);
nvme_queue_async_events(ctrl);
}
--
2.7.4
^ permalink raw reply related [flat|nested] 69+ messages in thread
* Re: [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects
2017-06-18 15:21 [PATCH rfc 00/30] centralize nvme controller reset, delete and periodic reconnects Sagi Grimberg
` (29 preceding siblings ...)
2017-06-18 15:22 ` [PATCH rfc 30/30] nvme: Add queue freeze/unfreeze handling on controller resets Sagi Grimberg
@ 2017-06-18 15:24 ` Sagi Grimberg
30 siblings, 0 replies; 69+ messages in thread
From: Sagi Grimberg @ 2017-06-18 15:24 UTC (permalink / raw)
To: linux-nvme; +Cc: Keith Busch, linux-block, Christoph Hellwig
Oh, forgot to add the gitweb for all this:
http://git.infradead.org/users/sagi/linux.git nvme-central-reset-delete-err2
^ permalink raw reply [flat|nested] 69+ messages in thread