[PATCHv2 0/5] NVMe fixes and updates, version 2

linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCHv2 0/5] NVMe fixes and updates, version 2
@ 2015-12-31 16:41 Keith Busch
  2015-12-31 16:41 ` [PATCHv2 1/5] NVMe: Fix admin queue ring wrap Keith Busch
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Keith Busch @ 2015-12-31 16:41 UTC (permalink / raw)


Changes since v1:

  Fixed the patch order. Previously 4/5 and 5/5 were swapped.

  Added a code comment to make the clarify the queue full condition.

  Better commit log messages.

  Moved the "struct completion" from an nvme_queue union to struct nvme_dev.

  Simplified the shutdown patch based on review comments. The logic is
  pushed into a single admin queue handling function, and the result is
  more understandable.

There are suggestions for other mechanisms to handle async io queue
deletion (patch 4/5, previously 5/5), but I'm sticking with the original
idea this time. I'll spin another patch if the alternate proposals compel
new version.

Keith Busch (5):
  NVMe: Fix admin queue ring wrap
  NVMe: Use a retryable error code on reset
  NVMe: Remove queue freezing on resets
  NVMe: IO queue deletion re-write
  NVMe: Shutdown controller only for power-off

 drivers/nvme/host/pci.c | 314 +++++++++++++++++-------------------------------
 1 file changed, 112 insertions(+), 202 deletions(-)

-- 
2.6.2.307.g37023ba

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCHv2 1/5] NVMe: Fix admin queue ring wrap
  2015-12-31 16:41 [PATCHv2 0/5] NVMe fixes and updates, version 2 Keith Busch
@ 2015-12-31 16:41 ` Keith Busch
  2016-01-03 16:44   ` Sagi Grimberg
  2015-12-31 16:41 ` [PATCHv2 2/5] NVMe: Use a retryable error code on reset Keith Busch
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2015-12-31 16:41 UTC (permalink / raw)


The tag set queue depth needs to be one less than the h/w queue depth
so we don't wrap the circular buffer. This conforms to the specification
defined "Full Queue" condition.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/pci.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b82bbea..67cb406 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1271,7 +1271,12 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 	if (!dev->ctrl.admin_q) {
 		dev->admin_tagset.ops = &nvme_mq_admin_ops;
 		dev->admin_tagset.nr_hw_queues = 1;
-		dev->admin_tagset.queue_depth = NVME_AQ_BLKMQ_DEPTH;
+
+		/*
+		 * Subtract one to leave an empty queue entry for 'Full Queue'
+		 * condition. See NVM-Express 1.2 specification, section 4.1.2.
+		 */
+		dev->admin_tagset.queue_depth = NVME_AQ_BLKMQ_DEPTH - 1;
 		dev->admin_tagset.timeout = ADMIN_TIMEOUT;
 		dev->admin_tagset.numa_node = dev_to_node(dev->dev);
 		dev->admin_tagset.cmd_size = nvme_cmd_size(dev);
-- 
2.6.2.307.g37023ba

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCHv2 1/5] NVMe: Fix admin queue ring wrap
  2015-12-31 16:41 ` [PATCHv2 1/5] NVMe: Fix admin queue ring wrap Keith Busch
@ 2016-01-03 16:44   ` Sagi Grimberg
  0 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2016-01-03 16:44 UTC (permalink / raw)


This looks good,

Reviewed-by: Sagi Grimberg <sagig at mellanox.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCHv2 2/5] NVMe: Use a retryable error code on reset
  2015-12-31 16:41 [PATCHv2 0/5] NVMe fixes and updates, version 2 Keith Busch
  2015-12-31 16:41 ` [PATCHv2 1/5] NVMe: Fix admin queue ring wrap Keith Busch
@ 2015-12-31 16:41 ` Keith Busch
  2016-01-03 16:54   ` Sagi Grimberg
  2015-12-31 16:41 ` [PATCHv2 3/5] NVMe: Remove queue freezing on resets Keith Busch
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2015-12-31 16:41 UTC (permalink / raw)


The negative status has the "do not retry" bit set, which makes it not
retryable. Use a fake status that can potentially be retried on reset.

An aborted command's status is still overridden by the timeout handler
with the CANCELLED status, which is needed for initialization to
distinguish an unresponsive controller.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 67cb406..71f43ff 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1017,7 +1017,7 @@ static void nvme_cancel_queue_ios(struct request *req, void *data, bool reserved
 	dev_warn(nvmeq->q_dmadev,
 		 "Cancelling I/O %d QID %d\n", req->tag, nvmeq->qid);
 
-	status = NVME_SC_CANCELLED;
+	status = NVME_SC_ABORT_REQ;
 	if (blk_queue_dying(req->q))
 		status |= NVME_SC_DNR;
 	blk_mq_complete_request(req, status);
-- 
2.6.2.307.g37023ba

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCHv2 2/5] NVMe: Use a retryable error code on reset
  2015-12-31 16:41 ` [PATCHv2 2/5] NVMe: Use a retryable error code on reset Keith Busch
@ 2016-01-03 16:54   ` Sagi Grimberg
  0 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2016-01-03 16:54 UTC (permalink / raw)


> The negative status has the "do not retry" bit set, which makes it not
> retryable. Use a fake status that can potentially be retried on reset.
>
> An aborted command's status is still overridden by the timeout handler
> with the CANCELLED status, which is needed for initialization to
> distinguish an unresponsive controller.
>
> Signed-off-by: Keith Busch <keith.busch at intel.com>

Reviewed-by: Sagi Grimberg <sagig at mellanox.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCHv2 3/5] NVMe: Remove queue freezing on resets
  2015-12-31 16:41 [PATCHv2 0/5] NVMe fixes and updates, version 2 Keith Busch
  2015-12-31 16:41 ` [PATCHv2 1/5] NVMe: Fix admin queue ring wrap Keith Busch
  2015-12-31 16:41 ` [PATCHv2 2/5] NVMe: Use a retryable error code on reset Keith Busch
@ 2015-12-31 16:41 ` Keith Busch
  2016-01-03 16:41   ` Sagi Grimberg
  2015-12-31 16:41 ` [PATCHv2 4/5] NVMe: IO queue deletion re-write Keith Busch
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2015-12-31 16:41 UTC (permalink / raw)


NVMe submits all commands through the block layer now. This means the
driver can let requests queue at the blk-mq hardware context since there
is no path that bypasses this anymore, so this patch removes freezing
the queues on reset. The driver simply stops the h/w queues from running
during a reset instead.

This also fixes a WARN in percpu_ref_reinit when the queue was unfrozen
with requeued requests.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/pci.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 71f43ff..5e1e8cd 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1064,7 +1064,7 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
 	spin_unlock_irq(&nvmeq->q_lock);
 
 	if (!nvmeq->qid && nvmeq->dev->ctrl.admin_q)
-		blk_mq_freeze_queue_start(nvmeq->dev->ctrl.admin_q);
+		blk_mq_stop_hw_queues(nvmeq->dev->ctrl.admin_q);
 
 	irq_set_affinity_hint(vector, NULL);
 	free_irq(vector, nvmeq);
@@ -1296,7 +1296,7 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 			return -ENODEV;
 		}
 	} else
-		blk_mq_unfreeze_queue(dev->ctrl.admin_q);
+		blk_mq_start_stopped_hw_queues(dev->ctrl.admin_q, true);
 
 	return 0;
 }
@@ -1908,13 +1908,11 @@ static void nvme_dev_list_remove(struct nvme_dev *dev)
 		kthread_stop(tmp);
 }
 
-static void nvme_freeze_queues(struct nvme_dev *dev)
+static void nvme_stop_queues(struct nvme_dev *dev)
 {
 	struct nvme_ns *ns;
 
 	list_for_each_entry(ns, &dev->ctrl.namespaces, list) {
-		blk_mq_freeze_queue_start(ns->queue);
-
 		spin_lock_irq(ns->queue->queue_lock);
 		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
 		spin_unlock_irq(ns->queue->queue_lock);
@@ -1924,13 +1922,12 @@ static void nvme_freeze_queues(struct nvme_dev *dev)
 	}
 }
 
-static void nvme_unfreeze_queues(struct nvme_dev *dev)
+static void nvme_start_queues(struct nvme_dev *dev)
 {
 	struct nvme_ns *ns;
 
 	list_for_each_entry(ns, &dev->ctrl.namespaces, list) {
 		queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
-		blk_mq_unfreeze_queue(ns->queue);
 		blk_mq_start_stopped_hw_queues(ns->queue, true);
 		blk_mq_kick_requeue_list(ns->queue);
 	}
@@ -1945,7 +1942,7 @@ static void nvme_dev_shutdown(struct nvme_dev *dev)
 
 	mutex_lock(&dev->shutdown_lock);
 	if (dev->bar) {
-		nvme_freeze_queues(dev);
+		nvme_stop_queues(dev);
 		csts = readl(dev->bar + NVME_REG_CSTS);
 	}
 	if (csts & NVME_CSTS_CFS || !(csts & NVME_CSTS_RDY)) {
@@ -2054,7 +2051,7 @@ static void nvme_reset_work(struct work_struct *work)
 		dev_warn(dev->dev, "IO queues not created\n");
 		nvme_remove_namespaces(&dev->ctrl);
 	} else {
-		nvme_unfreeze_queues(dev);
+		nvme_start_queues(dev);
 		nvme_dev_add(dev);
 	}
 
-- 
2.6.2.307.g37023ba

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCHv2 3/5] NVMe: Remove queue freezing on resets
  2015-12-31 16:41 ` [PATCHv2 3/5] NVMe: Remove queue freezing on resets Keith Busch
@ 2016-01-03 16:41   ` Sagi Grimberg
  0 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2016-01-03 16:41 UTC (permalink / raw)


Since this wasn't added in v2:

Tested-by: Sagi Grimberg <sagig at mellanox.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCHv2 4/5] NVMe: IO queue deletion re-write
  2015-12-31 16:41 [PATCHv2 0/5] NVMe fixes and updates, version 2 Keith Busch
                   ` (2 preceding siblings ...)
  2015-12-31 16:41 ` [PATCHv2 3/5] NVMe: Remove queue freezing on resets Keith Busch
@ 2015-12-31 16:41 ` Keith Busch
  2015-12-31 16:41 ` [PATCHv2 5/5] NVMe: Shutdown controller only for power-off Keith Busch
  2015-12-31 18:38 ` [PATCHv2 0/5] NVMe fixes and updates, version 2 Sagi Grimberg
  5 siblings, 0 replies; 16+ messages in thread
From: Keith Busch @ 2015-12-31 16:41 UTC (permalink / raw)


The nvme driver deletes IO queues asynchronously since this operation
may potentially take an undesirable amount of time if done serially with
a large number of queues.

The driver used to manage coordinating asynchronous deletions. This
patch simplifies that by leveraging the block layer rather than using
kthread workers and chaining complicated callbacks.

Beyond just being a simpler method, this also fixes a theoretical hang
if the controller stops responding while the worker thread was queued
deeper than the admin queue depth.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/pci.c | 250 ++++++++++++++++--------------------------------
 1 file changed, 80 insertions(+), 170 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5e1e8cd..d2eecd8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -89,13 +89,6 @@ static void nvme_process_cq(struct nvme_queue *nvmeq);
 static void nvme_remove_dead_ctrl(struct nvme_dev *dev);
 static void nvme_dev_shutdown(struct nvme_dev *dev);
 
-struct async_cmd_info {
-	struct kthread_work work;
-	struct kthread_worker *worker;
-	int status;
-	void *ctx;
-};
-
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
  */
@@ -125,9 +118,11 @@ struct nvme_dev {
 	u64 cmb_size;
 	u32 cmbsz;
 	unsigned long flags;
+
 #define NVME_CTRL_RESETTING    0
 
 	struct nvme_ctrl ctrl;
+	struct completion ioq_wait;
 };
 
 static inline struct nvme_dev *to_nvme_dev(struct nvme_ctrl *ctrl)
@@ -159,7 +154,7 @@ struct nvme_queue {
 	u16 qid;
 	u8 cq_phase;
 	u8 cqe_seen;
-	struct async_cmd_info cmdinfo;
+	struct nvme_delete_queue dq;
 };
 
 /*
@@ -844,15 +839,6 @@ static void nvme_submit_async_event(struct nvme_dev *dev)
 	__nvme_submit_cmd(dev->queues[0], &c);
 }
 
-static void async_cmd_info_endio(struct request *req, int error)
-{
-	struct async_cmd_info *cmdinfo = req->end_io_data;
-
-	cmdinfo->status = req->errors;
-	queue_kthread_work(cmdinfo->worker, &cmdinfo->work);
-	blk_mq_free_request(req);
-}
-
 static int adapter_delete_queue(struct nvme_dev *dev, u8 opcode, u16 id)
 {
 	struct nvme_command c;
@@ -1600,6 +1586,82 @@ static void nvme_dev_scan(struct work_struct *work)
 	nvme_set_irq_hints(dev);
 }
 
+static void nvme_del_queue_end(struct request *req, int error)
+{
+	unsigned long flags;
+	struct nvme_queue *nvmeq = req->end_io_data;
+
+	blk_mq_free_request(req);
+
+	spin_lock_irqsave(&nvmeq->q_lock, flags);
+	nvme_process_cq(nvmeq);
+	spin_unlock_irqrestore(&nvmeq->q_lock, flags);
+
+	complete(&nvmeq->dev->ioq_wait);
+}
+
+static void nvme_del_sq_end(struct request *req, int error)
+{
+	struct nvme_queue *nvmeq = req->end_io_data;
+
+	if (error) {
+		nvme_del_queue_end(req, error);
+		return;
+	}
+
+	nvmeq->dq.opcode = nvme_admin_delete_cq;
+	req->end_io = nvme_del_queue_end;
+
+	/* Must requeue. This callback occurs in irq w/ q_lock held */
+	nvme_requeue_req(req);
+}
+
+static int nvme_delete_queue(struct nvme_queue *nvmeq, struct request_queue *q)
+{
+	struct request *req;
+
+	req = nvme_alloc_request(q, (struct nvme_command *)&nvmeq->dq,
+							BLK_MQ_REQ_NOWAIT);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	memset(&nvmeq->dq, 0, sizeof(nvmeq->dq));
+	nvmeq->dq.opcode = nvme_admin_delete_sq;
+	nvmeq->dq.qid = cpu_to_le16(nvmeq->qid);
+
+	req->end_io_data = nvmeq;
+	req->timeout = ADMIN_TIMEOUT;
+
+	blk_execute_rq_nowait(q, NULL, req, false, nvme_del_sq_end);
+	return 0;
+}
+
+static void nvme_disable_io_queues(struct nvme_dev *dev)
+{
+	struct request_queue *q = dev->ctrl.admin_q;
+	int i = dev->queue_count - 1, sent = 0;
+	unsigned long timeout;
+
+	reinit_completion(&dev->ioq_wait);
+ retry:
+	timeout = ADMIN_TIMEOUT;
+	for (; i > 0; i--) {
+		struct nvme_queue *nvmeq = dev->queues[i];
+
+		nvme_suspend_queue(nvmeq);
+		if (nvme_delete_queue(nvmeq, q))
+			break;
+		++sent;
+	}
+	while (sent--) {
+		timeout = wait_for_completion_io_timeout(&dev->ioq_wait, timeout);
+		if (timeout == 0)
+			return;
+		if (i)
+			goto retry;
+	}
+}
+
 /*
  * Return: error value if an error occurred setting up the queues or calling
  * Identify Device.  0 if these succeeded, even if adding some of the
@@ -1711,159 +1773,6 @@ static void nvme_dev_unmap(struct nvme_dev *dev)
 	}
 }
 
-struct nvme_delq_ctx {
-	struct task_struct *waiter;
-	struct kthread_worker *worker;
-	atomic_t refcount;
-};
-
-static void nvme_wait_dq(struct nvme_delq_ctx *dq, struct nvme_dev *dev)
-{
-	dq->waiter = current;
-	mb();
-
-	for (;;) {
-		set_current_state(TASK_KILLABLE);
-		if (!atomic_read(&dq->refcount))
-			break;
-		if (!schedule_timeout(ADMIN_TIMEOUT) ||
-					fatal_signal_pending(current)) {
-			/*
-			 * Disable the controller first since we can't trust it
-			 * at this point, but leave the admin queue enabled
-			 * until all queue deletion requests are flushed.
-			 * FIXME: This may take a while if there are more h/w
-			 * queues than admin tags.
-			 */
-			set_current_state(TASK_RUNNING);
-			nvme_disable_ctrl(&dev->ctrl,
-				lo_hi_readq(dev->bar + NVME_REG_CAP));
-			nvme_clear_queue(dev->queues[0]);
-			flush_kthread_worker(dq->worker);
-			nvme_disable_queue(dev, 0);
-			return;
-		}
-	}
-	set_current_state(TASK_RUNNING);
-}
-
-static void nvme_put_dq(struct nvme_delq_ctx *dq)
-{
-	atomic_dec(&dq->refcount);
-	if (dq->waiter)
-		wake_up_process(dq->waiter);
-}
-
-static struct nvme_delq_ctx *nvme_get_dq(struct nvme_delq_ctx *dq)
-{
-	atomic_inc(&dq->refcount);
-	return dq;
-}
-
-static void nvme_del_queue_end(struct nvme_queue *nvmeq)
-{
-	struct nvme_delq_ctx *dq = nvmeq->cmdinfo.ctx;
-	nvme_put_dq(dq);
-
-	spin_lock_irq(&nvmeq->q_lock);
-	nvme_process_cq(nvmeq);
-	spin_unlock_irq(&nvmeq->q_lock);
-}
-
-static int adapter_async_del_queue(struct nvme_queue *nvmeq, u8 opcode,
-						kthread_work_func_t fn)
-{
-	struct request *req;
-	struct nvme_command c;
-
-	memset(&c, 0, sizeof(c));
-	c.delete_queue.opcode = opcode;
-	c.delete_queue.qid = cpu_to_le16(nvmeq->qid);
-
-	init_kthread_work(&nvmeq->cmdinfo.work, fn);
-
-	req = nvme_alloc_request(nvmeq->dev->ctrl.admin_q, &c, 0);
-	if (IS_ERR(req))
-		return PTR_ERR(req);
-
-	req->timeout = ADMIN_TIMEOUT;
-	req->end_io_data = &nvmeq->cmdinfo;
-	blk_execute_rq_nowait(req->q, NULL, req, 0, async_cmd_info_endio);
-	return 0;
-}
-
-static void nvme_del_cq_work_handler(struct kthread_work *work)
-{
-	struct nvme_queue *nvmeq = container_of(work, struct nvme_queue,
-							cmdinfo.work);
-	nvme_del_queue_end(nvmeq);
-}
-
-static int nvme_delete_cq(struct nvme_queue *nvmeq)
-{
-	return adapter_async_del_queue(nvmeq, nvme_admin_delete_cq,
-						nvme_del_cq_work_handler);
-}
-
-static void nvme_del_sq_work_handler(struct kthread_work *work)
-{
-	struct nvme_queue *nvmeq = container_of(work, struct nvme_queue,
-							cmdinfo.work);
-	int status = nvmeq->cmdinfo.status;
-
-	if (!status)
-		status = nvme_delete_cq(nvmeq);
-	if (status)
-		nvme_del_queue_end(nvmeq);
-}
-
-static int nvme_delete_sq(struct nvme_queue *nvmeq)
-{
-	return adapter_async_del_queue(nvmeq, nvme_admin_delete_sq,
-						nvme_del_sq_work_handler);
-}
-
-static void nvme_del_queue_start(struct kthread_work *work)
-{
-	struct nvme_queue *nvmeq = container_of(work, struct nvme_queue,
-							cmdinfo.work);
-	if (nvme_delete_sq(nvmeq))
-		nvme_del_queue_end(nvmeq);
-}
-
-static void nvme_disable_io_queues(struct nvme_dev *dev)
-{
-	int i;
-	DEFINE_KTHREAD_WORKER_ONSTACK(worker);
-	struct nvme_delq_ctx dq;
-	struct task_struct *kworker_task = kthread_run(kthread_worker_fn,
-					&worker, "nvme%d", dev->ctrl.instance);
-
-	if (IS_ERR(kworker_task)) {
-		dev_err(dev->dev,
-			"Failed to create queue del task\n");
-		for (i = dev->queue_count - 1; i > 0; i--)
-			nvme_disable_queue(dev, i);
-		return;
-	}
-
-	dq.waiter = NULL;
-	atomic_set(&dq.refcount, 0);
-	dq.worker = &worker;
-	for (i = dev->queue_count - 1; i > 0; i--) {
-		struct nvme_queue *nvmeq = dev->queues[i];
-
-		if (nvme_suspend_queue(nvmeq))
-			continue;
-		nvmeq->cmdinfo.ctx = nvme_get_dq(&dq);
-		nvmeq->cmdinfo.worker = dq.worker;
-		init_kthread_work(&nvmeq->cmdinfo.work, nvme_del_queue_start);
-		queue_kthread_work(dq.worker, &nvmeq->cmdinfo.work);
-	}
-	nvme_wait_dq(&dq, dev);
-	kthread_stop(kworker_task);
-}
-
 static int nvme_dev_list_add(struct nvme_dev *dev)
 {
 	bool start_thread = false;
@@ -2171,6 +2080,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	INIT_WORK(&dev->reset_work, nvme_reset_work);
 	INIT_WORK(&dev->remove_work, nvme_remove_dead_ctrl_work);
 	mutex_init(&dev->shutdown_lock);
+	init_completion(&dev->ioq_wait);
 
 	result = nvme_setup_prp_pools(dev);
 	if (result)
-- 
2.6.2.307.g37023ba

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCHv2 5/5] NVMe: Shutdown controller only for power-off
  2015-12-31 16:41 [PATCHv2 0/5] NVMe fixes and updates, version 2 Keith Busch
                   ` (3 preceding siblings ...)
  2015-12-31 16:41 ` [PATCHv2 4/5] NVMe: IO queue deletion re-write Keith Busch
@ 2015-12-31 16:41 ` Keith Busch
  2016-01-03 16:52   ` Sagi Grimberg
  2015-12-31 18:38 ` [PATCHv2 0/5] NVMe fixes and updates, version 2 Sagi Grimberg
  5 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2015-12-31 16:41 UTC (permalink / raw)


We don't need to shutdown a controller for a reset. A controller in a
shutdown state may take longer to become ready than one that was simply
disabled. This patch has the driver shut down the controller only if it
is about to be powered off or being removed. When taking the controller
down for a reset reason, the controller will only be disabled instead.

Function names have been updated in this patch to reflect their changed
semantics.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/pci.c | 40 +++++++++++++++++++---------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d2eecd8..eb37ef6 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -87,7 +87,7 @@ struct nvme_queue;
 static int nvme_reset(struct nvme_dev *dev);
 static void nvme_process_cq(struct nvme_queue *nvmeq);
 static void nvme_remove_dead_ctrl(struct nvme_dev *dev);
-static void nvme_dev_shutdown(struct nvme_dev *dev);
+static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
 
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
@@ -933,7 +933,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn(dev->dev,
 			 "I/O %d QID %d timeout, disable controller\n",
 			 req->tag, nvmeq->qid);
-		nvme_dev_shutdown(dev);
+		nvme_dev_disable(dev, false);
 		req->errors = NVME_SC_CANCELLED;
 		return BLK_EH_HANDLED;
 	}
@@ -947,7 +947,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn(dev->dev,
 			 "I/O %d QID %d timeout, reset controller\n",
 			 req->tag, nvmeq->qid);
-		nvme_dev_shutdown(dev);
+		nvme_dev_disable(dev, false);
 		queue_work(nvme_workq, &dev->reset_work);
 
 		/*
@@ -1066,21 +1066,20 @@ static void nvme_clear_queue(struct nvme_queue *nvmeq)
 	spin_unlock_irq(&nvmeq->q_lock);
 }
 
-static void nvme_disable_queue(struct nvme_dev *dev, int qid)
+static void nvme_disable_admin_queue(struct nvme_dev *dev, bool shutdown)
 {
-	struct nvme_queue *nvmeq = dev->queues[qid];
+	struct nvme_queue *nvmeq = dev->queues[0];
 
 	if (!nvmeq)
 		return;
 	if (nvme_suspend_queue(nvmeq))
 		return;
 
-	/* Don't tell the adapter to delete the admin queue.
-	 * Don't tell a removed adapter to delete IO queues. */
-	if (qid && readl(dev->bar + NVME_REG_CSTS) != -1) {
-		adapter_delete_sq(dev, qid);
-		adapter_delete_cq(dev, qid);
-	}
+	if (shutdown)
+		nvme_shutdown_ctrl(&dev->ctrl);
+	else
+		nvme_disable_ctrl(&dev->ctrl, lo_hi_readq(
+						dev->bar + NVME_REG_CAP));
 
 	spin_lock_irq(&nvmeq->q_lock);
 	nvme_process_cq(nvmeq);
@@ -1842,7 +1841,7 @@ static void nvme_start_queues(struct nvme_dev *dev)
 	}
 }
 
-static void nvme_dev_shutdown(struct nvme_dev *dev)
+static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 {
 	int i;
 	u32 csts = -1;
@@ -1861,8 +1860,7 @@ static void nvme_dev_shutdown(struct nvme_dev *dev)
 		}
 	} else {
 		nvme_disable_io_queues(dev);
-		nvme_shutdown_ctrl(&dev->ctrl);
-		nvme_disable_queue(dev, 0);
+		nvme_disable_admin_queue(dev, shutdown);
 	}
 	nvme_dev_unmap(dev);
 
@@ -1921,7 +1919,7 @@ static void nvme_reset_work(struct work_struct *work)
 	 * moving on.
 	 */
 	if (dev->bar)
-		nvme_dev_shutdown(dev);
+		nvme_dev_disable(dev, false);
 
 	set_bit(NVME_CTRL_RESETTING, &dev->flags);
 
@@ -1975,7 +1973,7 @@ static void nvme_reset_work(struct work_struct *work)
 	dev->ctrl.admin_q = NULL;
 	dev->queues[0]->tags = NULL;
  disable:
-	nvme_disable_queue(dev, 0);
+	nvme_disable_admin_queue(dev, false);
  unmap:
 	nvme_dev_unmap(dev);
  out:
@@ -2110,7 +2108,7 @@ static void nvme_reset_notify(struct pci_dev *pdev, bool prepare)
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
 
 	if (prepare)
-		nvme_dev_shutdown(dev);
+		nvme_dev_disable(dev, false);
 	else
 		queue_work(nvme_workq, &dev->reset_work);
 }
@@ -2118,7 +2116,7 @@ static void nvme_reset_notify(struct pci_dev *pdev, bool prepare)
 static void nvme_shutdown(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
-	nvme_dev_shutdown(dev);
+	nvme_dev_disable(dev, true);
 }
 
 static void nvme_remove(struct pci_dev *pdev)
@@ -2134,7 +2132,7 @@ static void nvme_remove(struct pci_dev *pdev)
 	flush_work(&dev->scan_work);
 	nvme_remove_namespaces(&dev->ctrl);
 	nvme_uninit_ctrl(&dev->ctrl);
-	nvme_dev_shutdown(dev);
+	nvme_dev_disable(dev, true);
 	nvme_dev_remove_admin(dev);
 	nvme_free_queues(dev, 0);
 	nvme_release_cmb(dev);
@@ -2148,7 +2146,7 @@ static int nvme_suspend(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct nvme_dev *ndev = pci_get_drvdata(pdev);
 
-	nvme_dev_shutdown(ndev);
+	nvme_dev_disable(ndev, true);
 	return 0;
 }
 
@@ -2179,7 +2177,7 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
 	case pci_channel_io_normal:
 		return PCI_ERS_RESULT_CAN_RECOVER;
 	case pci_channel_io_frozen:
-		nvme_dev_shutdown(dev);
+		nvme_dev_disable(dev, false);
 		return PCI_ERS_RESULT_NEED_RESET;
 	case pci_channel_io_perm_failure:
 		return PCI_ERS_RESULT_DISCONNECT;
-- 
2.6.2.307.g37023ba

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCHv2 5/5] NVMe: Shutdown controller only for power-off
  2015-12-31 16:41 ` [PATCHv2 5/5] NVMe: Shutdown controller only for power-off Keith Busch
@ 2016-01-03 16:52   ` Sagi Grimberg
  2016-01-04 15:54     ` Keith Busch
  0 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2016-01-03 16:52 UTC (permalink / raw)



> We don't need to shutdown a controller for a reset. A controller in a
> shutdown state may take longer to become ready than one that was simply
> disabled.

Just out of curiosity, what's "longer"? how much does this patch save us?

Otherwise,

Reviewed-by: Sagi Grimberg <sagig at mellanox.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCHv2 5/5] NVMe: Shutdown controller only for power-off
  2016-01-03 16:52   ` Sagi Grimberg
@ 2016-01-04 15:54     ` Keith Busch
  2016-01-04 16:08       ` Sagi Grimberg
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2016-01-04 15:54 UTC (permalink / raw)


On Sun, Jan 03, 2016@06:52:23PM +0200, Sagi Grimberg wrote:
> 
> >We don't need to shutdown a controller for a reset. A controller in a
> >shutdown state may take longer to become ready than one that was simply
> >disabled.
> 
> Just out of curiosity, what's "longer"? how much does this patch save us?

It depends on the device's capacity in my experience. Testing on a 2TB
P3700, a complete reset takes about 8 seconds when using controller
shutdown, and ~3 seconds without it.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCHv2 5/5] NVMe: Shutdown controller only for power-off
  2016-01-04 15:54     ` Keith Busch
@ 2016-01-04 16:08       ` Sagi Grimberg
  0 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2016-01-04 16:08 UTC (permalink / raw)



>> Just out of curiosity, what's "longer"? how much does this patch save us?
>
> It depends on the device's capacity in my experience. Testing on a 2TB
> P3700, a complete reset takes about 8 seconds when using controller
> shutdown, and ~3 seconds without it.

Fair enough.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCHv2 0/5] NVMe fixes and updates, version 2
  2015-12-31 16:41 [PATCHv2 0/5] NVMe fixes and updates, version 2 Keith Busch
                   ` (4 preceding siblings ...)
  2015-12-31 16:41 ` [PATCHv2 5/5] NVMe: Shutdown controller only for power-off Keith Busch
@ 2015-12-31 18:38 ` Sagi Grimberg
  2015-12-31 18:40   ` Keith Busch
  5 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2015-12-31 18:38 UTC (permalink / raw)




On 31/12/2015 18:41, Keith Busch wrote:
> Changes since v1:
>
>    Fixed the patch order. Previously 4/5 and 5/5 were swapped.
>
>    Added a code comment to make the clarify the queue full condition.
>
>    Better commit log messages.
>
>    Moved the "struct completion" from an nvme_queue union to struct nvme_dev.
>
>    Simplified the shutdown patch based on review comments. The logic is
>    pushed into a single admin queue handling function, and the result is
>    more understandable.
>
> There are suggestions for other mechanisms to handle async io queue
> deletion (patch 4/5, previously 5/5), but I'm sticking with the original
> idea this time. I'll spin another patch if the alternate proposals compel
> new version.
>
> Keith Busch (5):
>    NVMe: Fix admin queue ring wrap
>    NVMe: Use a retryable error code on reset
>    NVMe: Remove queue freezing on resets

Any particular reason for not rebasing on top of my patch and
Christoph's "namespace list locking and ioctl fixes V2"?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCHv2 0/5] NVMe fixes and updates, version 2
  2015-12-31 18:38 ` [PATCHv2 0/5] NVMe fixes and updates, version 2 Sagi Grimberg
@ 2015-12-31 18:40   ` Keith Busch
  2015-12-31 18:42     ` Sagi Grimberg
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2015-12-31 18:40 UTC (permalink / raw)


On Thu, Dec 31, 2015@08:38:21PM +0200, Sagi Grimberg wrote:
> >Keith Busch (5):
> >   NVMe: Fix admin queue ring wrap
> >   NVMe: Use a retryable error code on reset
> >   NVMe: Remove queue freezing on resets
> 
> Any particular reason for not rebasing on top of my patch and
> Christoph's "namespace list locking and ioctl fixes V2"?

I'm building off Jens' tree. I'm happy to rebase on Christoph's.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCHv2 0/5] NVMe fixes and updates, version 2
  2015-12-31 18:40   ` Keith Busch
@ 2015-12-31 18:42     ` Sagi Grimberg
  2016-01-02 16:59       ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2015-12-31 18:42 UTC (permalink / raw)



> I'm building off Jens' tree. I'm happy to rebase on Christoph's.

That would help, thanks!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCHv2 0/5] NVMe fixes and updates, version 2
  2015-12-31 18:42     ` Sagi Grimberg
@ 2016-01-02 16:59       ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2016-01-02 16:59 UTC (permalink / raw)


On Thu, Dec 31, 2015@08:42:30PM +0200, Sagi Grimberg wrote:
> 
> >I'm building off Jens' tree. I'm happy to rebase on Christoph's.
> 
> That would help, thanks!

A couple more reviews from you and Keith for the series could ensure
they are part of Jens' tree ASAP :)

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-01-04 16:08 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-12-31 16:41 [PATCHv2 0/5] NVMe fixes and updates, version 2 Keith Busch
2015-12-31 16:41 ` [PATCHv2 1/5] NVMe: Fix admin queue ring wrap Keith Busch
2016-01-03 16:44   ` Sagi Grimberg
2015-12-31 16:41 ` [PATCHv2 2/5] NVMe: Use a retryable error code on reset Keith Busch
2016-01-03 16:54   ` Sagi Grimberg
2015-12-31 16:41 ` [PATCHv2 3/5] NVMe: Remove queue freezing on resets Keith Busch
2016-01-03 16:41   ` Sagi Grimberg
2015-12-31 16:41 ` [PATCHv2 4/5] NVMe: IO queue deletion re-write Keith Busch
2015-12-31 16:41 ` [PATCHv2 5/5] NVMe: Shutdown controller only for power-off Keith Busch
2016-01-03 16:52   ` Sagi Grimberg
2016-01-04 15:54     ` Keith Busch
2016-01-04 16:08       ` Sagi Grimberg
2015-12-31 18:38 ` [PATCHv2 0/5] NVMe fixes and updates, version 2 Sagi Grimberg
2015-12-31 18:40   ` Keith Busch
2015-12-31 18:42     ` Sagi Grimberg
2016-01-02 16:59       ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).