[PATCHv3 0/5] NVMe fixes and updates, v3

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCHv3 0/5] NVMe fixes and updates, v3
@ 2016-01-04 16:10 Keith Busch
  2016-01-04 16:10 ` [PATCHv3 1/5] NVMe: Fix admin queue ring wrap Keith Busch
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Keith Busch @ 2016-01-04 16:10 UTC (permalink / raw)


This new version just merges up with the latest from Christoph and Sagi.

I guess 4/5 is still controversial, but confident it will make sense
when the failure cases are exercised.

Keith Busch (5):
  NVMe: Fix admin queue ring wrap
  NVMe: Use a retryable error code on reset
  NVMe: Remove queue freezing on resets
  NVMe: IO queue deletion re-write
  NVMe: Shutdown controller only for power-off

 drivers/nvme/host/core.c |   7 +-
 drivers/nvme/host/nvme.h |   4 +-
 drivers/nvme/host/pci.c  | 307 +++++++++++++++++------------------------------
 3 files changed, 114 insertions(+), 204 deletions(-)

-- 
2.6.2.307.g37023ba

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHv3 1/5] NVMe: Fix admin queue ring wrap
  2016-01-04 16:10 [PATCHv3 0/5] NVMe fixes and updates, v3 Keith Busch
@ 2016-01-04 16:10 ` Keith Busch
  2016-01-04 16:10 ` [PATCHv3 2/5] NVMe: Use a retryable error code on reset Keith Busch
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Keith Busch @ 2016-01-04 16:10 UTC (permalink / raw)


The tag set queue depth needs to be one less than the h/w queue depth
so we don't wrap the circular buffer. This conforms to the specification
defined "Full Queue" condition.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/pci.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a7e5499..30ed2ab 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1271,7 +1271,12 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 	if (!dev->ctrl.admin_q) {
 		dev->admin_tagset.ops = &nvme_mq_admin_ops;
 		dev->admin_tagset.nr_hw_queues = 1;
-		dev->admin_tagset.queue_depth = NVME_AQ_BLKMQ_DEPTH;
+
+		/*
+		 * Subtract one to leave an empty queue entry for 'Full Queue'
+		 * condition. See NVM-Express 1.2 specification, section 4.1.2.
+		 */
+		dev->admin_tagset.queue_depth = NVME_AQ_BLKMQ_DEPTH - 1;
 		dev->admin_tagset.timeout = ADMIN_TIMEOUT;
 		dev->admin_tagset.numa_node = dev_to_node(dev->dev);
 		dev->admin_tagset.cmd_size = nvme_cmd_size(dev);
-- 
2.6.2.307.g37023ba

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHv3 2/5] NVMe: Use a retryable error code on reset
  2016-01-04 16:10 [PATCHv3 0/5] NVMe fixes and updates, v3 Keith Busch
  2016-01-04 16:10 ` [PATCHv3 1/5] NVMe: Fix admin queue ring wrap Keith Busch
@ 2016-01-04 16:10 ` Keith Busch
  2016-01-04 16:10 ` [PATCHv3 3/5] NVMe: Remove queue freezing on resets Keith Busch
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Keith Busch @ 2016-01-04 16:10 UTC (permalink / raw)


A negative status has the "do not retry" bit set, which makes it not
retryable.  Use a fake status that can potentially be retried on reset.

An aborted command's status is overridden by the timeout handler so
that it won't be retried, which is necessary to keep initialization from
getting into a reset loop.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 30ed2ab..ac6c7af 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1017,7 +1017,7 @@ static void nvme_cancel_queue_ios(struct request *req, void *data, bool reserved
 	dev_warn(nvmeq->q_dmadev,
 		 "Cancelling I/O %d QID %d\n", req->tag, nvmeq->qid);
 
-	status = NVME_SC_CANCELLED;
+	status = NVME_SC_ABORT_REQ;
 	if (blk_queue_dying(req->q))
 		status |= NVME_SC_DNR;
 	blk_mq_complete_request(req, status);
-- 
2.6.2.307.g37023ba

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHv3 3/5] NVMe: Remove queue freezing on resets
  2016-01-04 16:10 [PATCHv3 0/5] NVMe fixes and updates, v3 Keith Busch
  2016-01-04 16:10 ` [PATCHv3 1/5] NVMe: Fix admin queue ring wrap Keith Busch
  2016-01-04 16:10 ` [PATCHv3 2/5] NVMe: Use a retryable error code on reset Keith Busch
@ 2016-01-04 16:10 ` Keith Busch
  2016-01-04 16:10 ` [PATCHv3 4/5] NVMe: IO queue deletion re-write Keith Busch
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Keith Busch @ 2016-01-04 16:10 UTC (permalink / raw)


NVMe submits all commands through the block layer now. This means we
can let requests queue at the blk-mq hardware context since there is no
path that bypasses this anymore so we don't need to freeze the queues
anymore. The driver can simply stop the h/w queues from running during
a reset instead.

This also fixes a WARN in percpu_ref_reinit when the queue was unfrozen
with requeued requests.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/core.c | 7 ++-----
 drivers/nvme/host/nvme.h | 4 ++--
 drivers/nvme/host/pci.c  | 8 ++++----
 3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8da4a8a..e31a256 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1372,14 +1372,12 @@ out:
 	return ret;
 }
 
-void nvme_freeze_queues(struct nvme_ctrl *ctrl)
+void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		blk_mq_freeze_queue_start(ns->queue);
-
 		spin_lock_irq(ns->queue->queue_lock);
 		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
 		spin_unlock_irq(ns->queue->queue_lock);
@@ -1390,14 +1388,13 @@ void nvme_freeze_queues(struct nvme_ctrl *ctrl)
 	mutex_unlock(&ctrl->namespaces_mutex);
 }
 
-void nvme_unfreeze_queues(struct nvme_ctrl *ctrl)
+void nvme_start_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
 		queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
-		blk_mq_unfreeze_queue(ns->queue);
 		blk_mq_start_stopped_hw_queues(ns->queue, true);
 		blk_mq_kick_requeue_list(ns->queue);
 	}
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 4437592..4722fad 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -238,8 +238,8 @@ int nvme_init_identify(struct nvme_ctrl *ctrl);
 void nvme_scan_namespaces(struct nvme_ctrl *ctrl);
 void nvme_remove_namespaces(struct nvme_ctrl *ctrl);
 
-void nvme_freeze_queues(struct nvme_ctrl *ctrl);
-void nvme_unfreeze_queues(struct nvme_ctrl *ctrl);
+void nvme_stop_queues(struct nvme_ctrl *ctrl);
+void nvme_start_queues(struct nvme_ctrl *ctrl);
 
 struct request *nvme_alloc_request(struct request_queue *q,
 		struct nvme_command *cmd, unsigned int flags);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ac6c7af..953fe48 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1064,7 +1064,7 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
 	spin_unlock_irq(&nvmeq->q_lock);
 
 	if (!nvmeq->qid && nvmeq->dev->ctrl.admin_q)
-		blk_mq_freeze_queue_start(nvmeq->dev->ctrl.admin_q);
+		blk_mq_stop_hw_queues(nvmeq->dev->ctrl.admin_q);
 
 	irq_set_affinity_hint(vector, NULL);
 	free_irq(vector, nvmeq);
@@ -1296,7 +1296,7 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 			return -ENODEV;
 		}
 	} else
-		blk_mq_unfreeze_queue(dev->ctrl.admin_q);
+		blk_mq_start_stopped_hw_queues(dev->ctrl.admin_q, true);
 
 	return 0;
 }
@@ -1917,7 +1917,7 @@ static void nvme_dev_shutdown(struct nvme_dev *dev)
 
 	mutex_lock(&dev->shutdown_lock);
 	if (dev->bar) {
-		nvme_freeze_queues(&dev->ctrl);
+		nvme_stop_queues(&dev->ctrl);
 		csts = readl(dev->bar + NVME_REG_CSTS);
 	}
 	if (csts & NVME_CSTS_CFS || !(csts & NVME_CSTS_RDY)) {
@@ -2026,7 +2026,7 @@ static void nvme_reset_work(struct work_struct *work)
 		dev_warn(dev->dev, "IO queues not created\n");
 		nvme_remove_namespaces(&dev->ctrl);
 	} else {
-		nvme_unfreeze_queues(&dev->ctrl);
+		nvme_start_queues(&dev->ctrl);
 		nvme_dev_add(dev);
 	}
 
-- 
2.6.2.307.g37023ba

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHv3 4/5] NVMe: IO queue deletion re-write
  2016-01-04 16:10 [PATCHv3 0/5] NVMe fixes and updates, v3 Keith Busch
                   ` (2 preceding siblings ...)
  2016-01-04 16:10 ` [PATCHv3 3/5] NVMe: Remove queue freezing on resets Keith Busch
@ 2016-01-04 16:10 ` Keith Busch
  2016-01-06 13:04   ` Christoph Hellwig
  2016-01-04 16:10 ` [PATCHv3 5/5] NVMe: Shutdown controller only for power-off Keith Busch
  2016-01-04 16:36 ` [PATCHv3 0/5] NVMe fixes and updates, v3 Christoph Hellwig
  5 siblings, 1 reply; 12+ messages in thread
From: Keith Busch @ 2016-01-04 16:10 UTC (permalink / raw)


The nvme driver deletes IO queues asynchronously since this operation
may potentially take an undesirable amount of time with a large number
of queues if done serially.

The driver used to manage coordinating asynchronous deletions. This
patch simplifies that by leveraging the block layer rather than using
kthread workers and chaining more complicated callbacks.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/pci.c | 250 ++++++++++++++++--------------------------------
 1 file changed, 80 insertions(+), 170 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 953fe48..49829a7 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -89,13 +89,6 @@ static void nvme_process_cq(struct nvme_queue *nvmeq);
 static void nvme_remove_dead_ctrl(struct nvme_dev *dev);
 static void nvme_dev_shutdown(struct nvme_dev *dev);
 
-struct async_cmd_info {
-	struct kthread_work work;
-	struct kthread_worker *worker;
-	int status;
-	void *ctx;
-};
-
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
  */
@@ -125,9 +118,11 @@ struct nvme_dev {
 	u64 cmb_size;
 	u32 cmbsz;
 	unsigned long flags;
+
 #define NVME_CTRL_RESETTING    0
 
 	struct nvme_ctrl ctrl;
+	struct completion ioq_wait;
 };
 
 static inline struct nvme_dev *to_nvme_dev(struct nvme_ctrl *ctrl)
@@ -159,7 +154,7 @@ struct nvme_queue {
 	u16 qid;
 	u8 cq_phase;
 	u8 cqe_seen;
-	struct async_cmd_info cmdinfo;
+	struct nvme_delete_queue dq;
 };
 
 /*
@@ -844,15 +839,6 @@ static void nvme_submit_async_event(struct nvme_dev *dev)
 	__nvme_submit_cmd(dev->queues[0], &c);
 }
 
-static void async_cmd_info_endio(struct request *req, int error)
-{
-	struct async_cmd_info *cmdinfo = req->end_io_data;
-
-	cmdinfo->status = req->errors;
-	queue_kthread_work(cmdinfo->worker, &cmdinfo->work);
-	blk_mq_free_request(req);
-}
-
 static int adapter_delete_queue(struct nvme_dev *dev, u8 opcode, u16 id)
 {
 	struct nvme_command c;
@@ -1600,6 +1586,82 @@ static void nvme_dev_scan(struct work_struct *work)
 	nvme_set_irq_hints(dev);
 }
 
+static void nvme_del_queue_end(struct request *req, int error)
+{
+	unsigned long flags;
+	struct nvme_queue *nvmeq = req->end_io_data;
+
+	blk_mq_free_request(req);
+
+	spin_lock_irqsave(&nvmeq->q_lock, flags);
+	nvme_process_cq(nvmeq);
+	spin_unlock_irqrestore(&nvmeq->q_lock, flags);
+
+	complete(&nvmeq->dev->ioq_wait);
+}
+
+static void nvme_del_sq_end(struct request *req, int error)
+{
+	struct nvme_queue *nvmeq = req->end_io_data;
+
+	if (error) {
+		nvme_del_queue_end(req, error);
+		return;
+	}
+
+	nvmeq->dq.opcode = nvme_admin_delete_cq;
+	req->end_io = nvme_del_queue_end;
+
+	/* Must requeue. This callback occurs in irq w/ q_lock held */
+	nvme_requeue_req(req);
+}
+
+static int nvme_delete_queue(struct nvme_queue *nvmeq, struct request_queue *q)
+{
+	struct request *req;
+
+	req = nvme_alloc_request(q, (struct nvme_command *)&nvmeq->dq,
+							BLK_MQ_REQ_NOWAIT);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	memset(&nvmeq->dq, 0, sizeof(nvmeq->dq));
+	nvmeq->dq.opcode = nvme_admin_delete_sq;
+	nvmeq->dq.qid = cpu_to_le16(nvmeq->qid);
+
+	req->end_io_data = nvmeq;
+	req->timeout = ADMIN_TIMEOUT;
+
+	blk_execute_rq_nowait(q, NULL, req, false, nvme_del_sq_end);
+	return 0;
+}
+
+static void nvme_disable_io_queues(struct nvme_dev *dev)
+{
+	struct request_queue *q = dev->ctrl.admin_q;
+	int i = dev->queue_count - 1, sent = 0;
+	unsigned long timeout;
+
+	reinit_completion(&dev->ioq_wait);
+ retry:
+	timeout = ADMIN_TIMEOUT;
+	for (; i > 0; i--) {
+		struct nvme_queue *nvmeq = dev->queues[i];
+
+		nvme_suspend_queue(nvmeq);
+		if (nvme_delete_queue(nvmeq, q))
+			break;
+		++sent;
+	}
+	while (sent--) {
+		timeout = wait_for_completion_io_timeout(&dev->ioq_wait, timeout);
+		if (timeout == 0)
+			return;
+		if (i)
+			goto retry;
+	}
+}
+
 /*
  * Return: error value if an error occurred setting up the queues or calling
  * Identify Device.  0 if these succeeded, even if adding some of the
@@ -1711,159 +1773,6 @@ static void nvme_dev_unmap(struct nvme_dev *dev)
 	}
 }
 
-struct nvme_delq_ctx {
-	struct task_struct *waiter;
-	struct kthread_worker *worker;
-	atomic_t refcount;
-};
-
-static void nvme_wait_dq(struct nvme_delq_ctx *dq, struct nvme_dev *dev)
-{
-	dq->waiter = current;
-	mb();
-
-	for (;;) {
-		set_current_state(TASK_KILLABLE);
-		if (!atomic_read(&dq->refcount))
-			break;
-		if (!schedule_timeout(ADMIN_TIMEOUT) ||
-					fatal_signal_pending(current)) {
-			/*
-			 * Disable the controller first since we can't trust it
-			 * at this point, but leave the admin queue enabled
-			 * until all queue deletion requests are flushed.
-			 * FIXME: This may take a while if there are more h/w
-			 * queues than admin tags.
-			 */
-			set_current_state(TASK_RUNNING);
-			nvme_disable_ctrl(&dev->ctrl,
-				lo_hi_readq(dev->bar + NVME_REG_CAP));
-			nvme_clear_queue(dev->queues[0]);
-			flush_kthread_worker(dq->worker);
-			nvme_disable_queue(dev, 0);
-			return;
-		}
-	}
-	set_current_state(TASK_RUNNING);
-}
-
-static void nvme_put_dq(struct nvme_delq_ctx *dq)
-{
-	atomic_dec(&dq->refcount);
-	if (dq->waiter)
-		wake_up_process(dq->waiter);
-}
-
-static struct nvme_delq_ctx *nvme_get_dq(struct nvme_delq_ctx *dq)
-{
-	atomic_inc(&dq->refcount);
-	return dq;
-}
-
-static void nvme_del_queue_end(struct nvme_queue *nvmeq)
-{
-	struct nvme_delq_ctx *dq = nvmeq->cmdinfo.ctx;
-	nvme_put_dq(dq);
-
-	spin_lock_irq(&nvmeq->q_lock);
-	nvme_process_cq(nvmeq);
-	spin_unlock_irq(&nvmeq->q_lock);
-}
-
-static int adapter_async_del_queue(struct nvme_queue *nvmeq, u8 opcode,
-						kthread_work_func_t fn)
-{
-	struct request *req;
-	struct nvme_command c;
-
-	memset(&c, 0, sizeof(c));
-	c.delete_queue.opcode = opcode;
-	c.delete_queue.qid = cpu_to_le16(nvmeq->qid);
-
-	init_kthread_work(&nvmeq->cmdinfo.work, fn);
-
-	req = nvme_alloc_request(nvmeq->dev->ctrl.admin_q, &c, 0);
-	if (IS_ERR(req))
-		return PTR_ERR(req);
-
-	req->timeout = ADMIN_TIMEOUT;
-	req->end_io_data = &nvmeq->cmdinfo;
-	blk_execute_rq_nowait(req->q, NULL, req, 0, async_cmd_info_endio);
-	return 0;
-}
-
-static void nvme_del_cq_work_handler(struct kthread_work *work)
-{
-	struct nvme_queue *nvmeq = container_of(work, struct nvme_queue,
-							cmdinfo.work);
-	nvme_del_queue_end(nvmeq);
-}
-
-static int nvme_delete_cq(struct nvme_queue *nvmeq)
-{
-	return adapter_async_del_queue(nvmeq, nvme_admin_delete_cq,
-						nvme_del_cq_work_handler);
-}
-
-static void nvme_del_sq_work_handler(struct kthread_work *work)
-{
-	struct nvme_queue *nvmeq = container_of(work, struct nvme_queue,
-							cmdinfo.work);
-	int status = nvmeq->cmdinfo.status;
-
-	if (!status)
-		status = nvme_delete_cq(nvmeq);
-	if (status)
-		nvme_del_queue_end(nvmeq);
-}
-
-static int nvme_delete_sq(struct nvme_queue *nvmeq)
-{
-	return adapter_async_del_queue(nvmeq, nvme_admin_delete_sq,
-						nvme_del_sq_work_handler);
-}
-
-static void nvme_del_queue_start(struct kthread_work *work)
-{
-	struct nvme_queue *nvmeq = container_of(work, struct nvme_queue,
-							cmdinfo.work);
-	if (nvme_delete_sq(nvmeq))
-		nvme_del_queue_end(nvmeq);
-}
-
-static void nvme_disable_io_queues(struct nvme_dev *dev)
-{
-	int i;
-	DEFINE_KTHREAD_WORKER_ONSTACK(worker);
-	struct nvme_delq_ctx dq;
-	struct task_struct *kworker_task = kthread_run(kthread_worker_fn,
-					&worker, "nvme%d", dev->ctrl.instance);
-
-	if (IS_ERR(kworker_task)) {
-		dev_err(dev->dev,
-			"Failed to create queue del task\n");
-		for (i = dev->queue_count - 1; i > 0; i--)
-			nvme_disable_queue(dev, i);
-		return;
-	}
-
-	dq.waiter = NULL;
-	atomic_set(&dq.refcount, 0);
-	dq.worker = &worker;
-	for (i = dev->queue_count - 1; i > 0; i--) {
-		struct nvme_queue *nvmeq = dev->queues[i];
-
-		if (nvme_suspend_queue(nvmeq))
-			continue;
-		nvmeq->cmdinfo.ctx = nvme_get_dq(&dq);
-		nvmeq->cmdinfo.worker = dq.worker;
-		init_kthread_work(&nvmeq->cmdinfo.work, nvme_del_queue_start);
-		queue_kthread_work(dq.worker, &nvmeq->cmdinfo.work);
-	}
-	nvme_wait_dq(&dq, dev);
-	kthread_stop(kworker_task);
-}
-
 static int nvme_dev_list_add(struct nvme_dev *dev)
 {
 	bool start_thread = false;
@@ -2146,6 +2055,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	INIT_WORK(&dev->reset_work, nvme_reset_work);
 	INIT_WORK(&dev->remove_work, nvme_remove_dead_ctrl_work);
 	mutex_init(&dev->shutdown_lock);
+	init_completion(&dev->ioq_wait);
 
 	result = nvme_setup_prp_pools(dev);
 	if (result)
-- 
2.6.2.307.g37023ba

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHv3 5/5] NVMe: Shutdown controller only for power-off
  2016-01-04 16:10 [PATCHv3 0/5] NVMe fixes and updates, v3 Keith Busch
                   ` (3 preceding siblings ...)
  2016-01-04 16:10 ` [PATCHv3 4/5] NVMe: IO queue deletion re-write Keith Busch
@ 2016-01-04 16:10 ` Keith Busch
  2016-01-04 16:36 ` [PATCHv3 0/5] NVMe fixes and updates, v3 Christoph Hellwig
  5 siblings, 0 replies; 12+ messages in thread
From: Keith Busch @ 2016-01-04 16:10 UTC (permalink / raw)


We don't need to shutdown a controller for a reset. A controller in a
shutdown state may take longer to become ready than one that was simply
disabled. This patch has the driver shut down a controller only if the
device is about to be powered off or being removed. When taking the
controller down for a reset reason, the controller will be disabled
instead.

Function names have been updated in this patch to reflect their changed
semantics.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/pci.c | 40 +++++++++++++++++++---------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 49829a7..3f536c2 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -87,7 +87,7 @@ struct nvme_queue;
 static int nvme_reset(struct nvme_dev *dev);
 static void nvme_process_cq(struct nvme_queue *nvmeq);
 static void nvme_remove_dead_ctrl(struct nvme_dev *dev);
-static void nvme_dev_shutdown(struct nvme_dev *dev);
+static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
 
 /*
  * Represents an NVM Express device.  Each nvme_dev is a PCI function.
@@ -933,7 +933,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn(dev->dev,
 			 "I/O %d QID %d timeout, disable controller\n",
 			 req->tag, nvmeq->qid);
-		nvme_dev_shutdown(dev);
+		nvme_dev_disable(dev, false);
 		req->errors = NVME_SC_CANCELLED;
 		return BLK_EH_HANDLED;
 	}
@@ -947,7 +947,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 		dev_warn(dev->dev,
 			 "I/O %d QID %d timeout, reset controller\n",
 			 req->tag, nvmeq->qid);
-		nvme_dev_shutdown(dev);
+		nvme_dev_disable(dev, false);
 		queue_work(nvme_workq, &dev->reset_work);
 
 		/*
@@ -1066,21 +1066,20 @@ static void nvme_clear_queue(struct nvme_queue *nvmeq)
 	spin_unlock_irq(&nvmeq->q_lock);
 }
 
-static void nvme_disable_queue(struct nvme_dev *dev, int qid)
+static void nvme_disable_admin_queue(struct nvme_dev *dev, bool shutdown)
 {
-	struct nvme_queue *nvmeq = dev->queues[qid];
+	struct nvme_queue *nvmeq = dev->queues[0];
 
 	if (!nvmeq)
 		return;
 	if (nvme_suspend_queue(nvmeq))
 		return;
 
-	/* Don't tell the adapter to delete the admin queue.
-	 * Don't tell a removed adapter to delete IO queues. */
-	if (qid && readl(dev->bar + NVME_REG_CSTS) != -1) {
-		adapter_delete_sq(dev, qid);
-		adapter_delete_cq(dev, qid);
-	}
+	if (shutdown)
+		nvme_shutdown_ctrl(&dev->ctrl);
+	else
+		nvme_disable_ctrl(&dev->ctrl, lo_hi_readq(
+						dev->bar + NVME_REG_CAP));
 
 	spin_lock_irq(&nvmeq->q_lock);
 	nvme_process_cq(nvmeq);
@@ -1817,7 +1816,7 @@ static void nvme_dev_list_remove(struct nvme_dev *dev)
 		kthread_stop(tmp);
 }
 
-static void nvme_dev_shutdown(struct nvme_dev *dev)
+static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 {
 	int i;
 	u32 csts = -1;
@@ -1836,8 +1835,7 @@ static void nvme_dev_shutdown(struct nvme_dev *dev)
 		}
 	} else {
 		nvme_disable_io_queues(dev);
-		nvme_shutdown_ctrl(&dev->ctrl);
-		nvme_disable_queue(dev, 0);
+		nvme_disable_admin_queue(dev, shutdown);
 	}
 	nvme_dev_unmap(dev);
 
@@ -1896,7 +1894,7 @@ static void nvme_reset_work(struct work_struct *work)
 	 * moving on.
 	 */
 	if (dev->bar)
-		nvme_dev_shutdown(dev);
+		nvme_dev_disable(dev, false);
 
 	set_bit(NVME_CTRL_RESETTING, &dev->flags);
 
@@ -1950,7 +1948,7 @@ static void nvme_reset_work(struct work_struct *work)
 	dev->ctrl.admin_q = NULL;
 	dev->queues[0]->tags = NULL;
  disable:
-	nvme_disable_queue(dev, 0);
+	nvme_disable_admin_queue(dev, false);
  unmap:
 	nvme_dev_unmap(dev);
  out:
@@ -2085,7 +2083,7 @@ static void nvme_reset_notify(struct pci_dev *pdev, bool prepare)
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
 
 	if (prepare)
-		nvme_dev_shutdown(dev);
+		nvme_dev_disable(dev, false);
 	else
 		queue_work(nvme_workq, &dev->reset_work);
 }
@@ -2093,7 +2091,7 @@ static void nvme_reset_notify(struct pci_dev *pdev, bool prepare)
 static void nvme_shutdown(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
-	nvme_dev_shutdown(dev);
+	nvme_dev_disable(dev, true);
 }
 
 static void nvme_remove(struct pci_dev *pdev)
@@ -2109,7 +2107,7 @@ static void nvme_remove(struct pci_dev *pdev)
 	flush_work(&dev->scan_work);
 	nvme_remove_namespaces(&dev->ctrl);
 	nvme_uninit_ctrl(&dev->ctrl);
-	nvme_dev_shutdown(dev);
+	nvme_dev_disable(dev, true);
 	nvme_dev_remove_admin(dev);
 	nvme_free_queues(dev, 0);
 	nvme_release_cmb(dev);
@@ -2123,7 +2121,7 @@ static int nvme_suspend(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct nvme_dev *ndev = pci_get_drvdata(pdev);
 
-	nvme_dev_shutdown(ndev);
+	nvme_dev_disable(ndev, true);
 	return 0;
 }
 
@@ -2154,7 +2152,7 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
 	case pci_channel_io_normal:
 		return PCI_ERS_RESULT_CAN_RECOVER;
 	case pci_channel_io_frozen:
-		nvme_dev_shutdown(dev);
+		nvme_dev_disable(dev, false);
 		return PCI_ERS_RESULT_NEED_RESET;
 	case pci_channel_io_perm_failure:
 		return PCI_ERS_RESULT_DISCONNECT;
-- 
2.6.2.307.g37023ba

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHv3 0/5] NVMe fixes and updates, v3
  2016-01-04 16:10 [PATCHv3 0/5] NVMe fixes and updates, v3 Keith Busch
                   ` (4 preceding siblings ...)
  2016-01-04 16:10 ` [PATCHv3 5/5] NVMe: Shutdown controller only for power-off Keith Busch
@ 2016-01-04 16:36 ` Christoph Hellwig
  5 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2016-01-04 16:36 UTC (permalink / raw)


I'm fine with the first three, and I think I alredy gave your my review,
but if not it's here:

Reviewed-by: Christoph Hellwig <hch at lst.de>

I really dislike 4, and I'll look over your previous mails once I
got an urgent issue of my back.  5 looks fine but depends on 4, so
let's wait on that for now as well.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHv3 4/5] NVMe: IO queue deletion re-write
  2016-01-04 16:10 ` [PATCHv3 4/5] NVMe: IO queue deletion re-write Keith Busch
@ 2016-01-06 13:04   ` Christoph Hellwig
  2016-01-06 15:44     ` Keith Busch
  2016-01-06 19:51     ` Keith Busch
  0 siblings, 2 replies; 12+ messages in thread
From: Christoph Hellwig @ 2016-01-06 13:04 UTC (permalink / raw)


This is my counter-suggestion to the requeue abuse:

simply do two passes of your algorithm, one for the SQs, one for the
CQs.  Patch relative to the whole series below:

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 3f536c2..4f3d1e3 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -154,7 +154,6 @@ struct nvme_queue {
 	u16 qid;
 	u8 cq_phase;
 	u8 cqe_seen;
-	struct nvme_delete_queue dq;
 };
 
 /*
@@ -1587,57 +1586,52 @@ static void nvme_dev_scan(struct work_struct *work)
 
 static void nvme_del_queue_end(struct request *req, int error)
 {
-	unsigned long flags;
 	struct nvme_queue *nvmeq = req->end_io_data;
 
 	blk_mq_free_request(req);
-
-	spin_lock_irqsave(&nvmeq->q_lock, flags);
-	nvme_process_cq(nvmeq);
-	spin_unlock_irqrestore(&nvmeq->q_lock, flags);
-
 	complete(&nvmeq->dev->ioq_wait);
 }
 
-static void nvme_del_sq_end(struct request *req, int error)
+static void nvme_del_cq_end(struct request *req, int error)
 {
 	struct nvme_queue *nvmeq = req->end_io_data;
 
-	if (error) {
-		nvme_del_queue_end(req, error);
-		return;
-	}
+	if (!error) {
+		unsigned long flags;
 
-	nvmeq->dq.opcode = nvme_admin_delete_cq;
-	req->end_io = nvme_del_queue_end;
+		spin_lock_irqsave(&nvmeq->q_lock, flags);
+		nvme_process_cq(nvmeq);
+		spin_unlock_irqrestore(&nvmeq->q_lock, flags);
+	}
 
-	/* Must requeue. This callback occurs in irq w/ q_lock held */
-	nvme_requeue_req(req);
+	nvme_del_queue_end(req, error);
 }
 
-static int nvme_delete_queue(struct nvme_queue *nvmeq, struct request_queue *q)
+static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
 {
+	struct request_queue *q = nvmeq->dev->ctrl.admin_q;
 	struct request *req;
+	struct nvme_command cmd;
 
-	req = nvme_alloc_request(q, (struct nvme_command *)&nvmeq->dq,
-							BLK_MQ_REQ_NOWAIT);
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.delete_queue.opcode = nvme_admin_delete_sq;
+	cmd.delete_queue.qid = cpu_to_le16(nvmeq->qid);
+
+	req = nvme_alloc_request(q, &cmd, BLK_MQ_REQ_NOWAIT);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
-	memset(&nvmeq->dq, 0, sizeof(nvmeq->dq));
-	nvmeq->dq.opcode = nvme_admin_delete_sq;
-	nvmeq->dq.qid = cpu_to_le16(nvmeq->qid);
-
-	req->end_io_data = nvmeq;
 	req->timeout = ADMIN_TIMEOUT;
+	req->end_io_data = nvmeq;
 
-	blk_execute_rq_nowait(q, NULL, req, false, nvme_del_sq_end);
+	blk_execute_rq_nowait(q, NULL, req, false,
+			opcode == nvme_admin_delete_cq ?
+				nvme_del_cq_end : nvme_del_queue_end);
 	return 0;
 }
 
-static void nvme_disable_io_queues(struct nvme_dev *dev)
+static int nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode)
 {
-	struct request_queue *q = dev->ctrl.admin_q;
 	int i = dev->queue_count - 1, sent = 0;
 	unsigned long timeout;
 
@@ -1648,17 +1642,19 @@ static void nvme_disable_io_queues(struct nvme_dev *dev)
 		struct nvme_queue *nvmeq = dev->queues[i];
 
 		nvme_suspend_queue(nvmeq);
-		if (nvme_delete_queue(nvmeq, q))
+		if (nvme_delete_queue(nvmeq, opcode))
 			break;
 		++sent;
 	}
 	while (sent--) {
 		timeout = wait_for_completion_io_timeout(&dev->ioq_wait, timeout);
 		if (timeout == 0)
-			return;
+			return -EIO;
 		if (i)
 			goto retry;
 	}
+
+	return 0;
 }
 
 /*
@@ -1834,7 +1830,8 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 			nvme_suspend_queue(nvmeq);
 		}
 	} else {
-		nvme_disable_io_queues(dev);
+		if (!nvme_disable_io_queues(dev, nvme_admin_delete_sq))
+			nvme_disable_io_queues(dev, nvme_admin_delete_cq);
 		nvme_disable_admin_queue(dev, shutdown);
 	}
 	nvme_dev_unmap(dev);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCHv3 4/5] NVMe: IO queue deletion re-write
  2016-01-06 13:04   ` Christoph Hellwig
@ 2016-01-06 15:44     ` Keith Busch
  2016-01-12 20:33       ` Jens Axboe
  2016-01-06 19:51     ` Keith Busch
  1 sibling, 1 reply; 12+ messages in thread
From: Keith Busch @ 2016-01-06 15:44 UTC (permalink / raw)


On Wed, Jan 06, 2016@05:04:19AM -0800, Christoph Hellwig wrote:
> This is my counter-suggestion to the requeue abuse:
> 
> simply do two passes of your algorithm, one for the SQs, one for the
> CQs.  Patch relative to the whole series below:

Yes, thanks, this looks great!

Just a nitpick, I think it'd look better semantically to have
nvme_disable_io_queues handle switching the opcodes instead of having
to call it twice, but this is fine as-is.

Acked-by: Keith Busch <keith.busch at intel.com>
 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 3f536c2..4f3d1e3 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -154,7 +154,6 @@ struct nvme_queue {
>  	u16 qid;
>  	u8 cq_phase;
>  	u8 cqe_seen;
> -	struct nvme_delete_queue dq;
>  };
>  
>  /*
> @@ -1587,57 +1586,52 @@ static void nvme_dev_scan(struct work_struct *work)
>  
>  static void nvme_del_queue_end(struct request *req, int error)
>  {
> -	unsigned long flags;
>  	struct nvme_queue *nvmeq = req->end_io_data;
>  
>  	blk_mq_free_request(req);
> -
> -	spin_lock_irqsave(&nvmeq->q_lock, flags);
> -	nvme_process_cq(nvmeq);
> -	spin_unlock_irqrestore(&nvmeq->q_lock, flags);
> -
>  	complete(&nvmeq->dev->ioq_wait);
>  }
>  
> -static void nvme_del_sq_end(struct request *req, int error)
> +static void nvme_del_cq_end(struct request *req, int error)
>  {
>  	struct nvme_queue *nvmeq = req->end_io_data;
>  
> -	if (error) {
> -		nvme_del_queue_end(req, error);
> -		return;
> -	}
> +	if (!error) {
> +		unsigned long flags;
>  
> -	nvmeq->dq.opcode = nvme_admin_delete_cq;
> -	req->end_io = nvme_del_queue_end;
> +		spin_lock_irqsave(&nvmeq->q_lock, flags);
> +		nvme_process_cq(nvmeq);
> +		spin_unlock_irqrestore(&nvmeq->q_lock, flags);
> +	}
>  
> -	/* Must requeue. This callback occurs in irq w/ q_lock held */
> -	nvme_requeue_req(req);
> +	nvme_del_queue_end(req, error);
>  }
>  
> -static int nvme_delete_queue(struct nvme_queue *nvmeq, struct request_queue *q)
> +static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
>  {
> +	struct request_queue *q = nvmeq->dev->ctrl.admin_q;
>  	struct request *req;
> +	struct nvme_command cmd;
>  
> -	req = nvme_alloc_request(q, (struct nvme_command *)&nvmeq->dq,
> -							BLK_MQ_REQ_NOWAIT);
> +	memset(&cmd, 0, sizeof(cmd));
> +	cmd.delete_queue.opcode = nvme_admin_delete_sq;
> +	cmd.delete_queue.qid = cpu_to_le16(nvmeq->qid);
> +
> +	req = nvme_alloc_request(q, &cmd, BLK_MQ_REQ_NOWAIT);
>  	if (IS_ERR(req))
>  		return PTR_ERR(req);
>  
> -	memset(&nvmeq->dq, 0, sizeof(nvmeq->dq));
> -	nvmeq->dq.opcode = nvme_admin_delete_sq;
> -	nvmeq->dq.qid = cpu_to_le16(nvmeq->qid);
> -
> -	req->end_io_data = nvmeq;
>  	req->timeout = ADMIN_TIMEOUT;
> +	req->end_io_data = nvmeq;
>  
> -	blk_execute_rq_nowait(q, NULL, req, false, nvme_del_sq_end);
> +	blk_execute_rq_nowait(q, NULL, req, false,
> +			opcode == nvme_admin_delete_cq ?
> +				nvme_del_cq_end : nvme_del_queue_end);
>  	return 0;
>  }
>  
> -static void nvme_disable_io_queues(struct nvme_dev *dev)
> +static int nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode)
>  {
> -	struct request_queue *q = dev->ctrl.admin_q;
>  	int i = dev->queue_count - 1, sent = 0;
>  	unsigned long timeout;
>  
> @@ -1648,17 +1642,19 @@ static void nvme_disable_io_queues(struct nvme_dev *dev)
>  		struct nvme_queue *nvmeq = dev->queues[i];
>  
>  		nvme_suspend_queue(nvmeq);
> -		if (nvme_delete_queue(nvmeq, q))
> +		if (nvme_delete_queue(nvmeq, opcode))
>  			break;
>  		++sent;
>  	}
>  	while (sent--) {
>  		timeout = wait_for_completion_io_timeout(&dev->ioq_wait, timeout);
>  		if (timeout == 0)
> -			return;
> +			return -EIO;
>  		if (i)
>  			goto retry;
>  	}
> +
> +	return 0;
>  }
>  
>  /*
> @@ -1834,7 +1830,8 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
>  			nvme_suspend_queue(nvmeq);
>  		}
>  	} else {
> -		nvme_disable_io_queues(dev);
> +		if (!nvme_disable_io_queues(dev, nvme_admin_delete_sq))
> +			nvme_disable_io_queues(dev, nvme_admin_delete_cq);
>  		nvme_disable_admin_queue(dev, shutdown);
>  	}
>  	nvme_dev_unmap(dev);

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHv3 4/5] NVMe: IO queue deletion re-write
  2016-01-06 13:04   ` Christoph Hellwig
  2016-01-06 15:44     ` Keith Busch
@ 2016-01-06 19:51     ` Keith Busch
  2016-01-07 18:54       ` Christoph Hellwig
  1 sibling, 1 reply; 12+ messages in thread
From: Keith Busch @ 2016-01-06 19:51 UTC (permalink / raw)


On Wed, Jan 06, 2016@05:04:19AM -0800, Christoph Hellwig wrote:
> -static int nvme_delete_queue(struct nvme_queue *nvmeq, struct request_queue *q)
> +static int nvme_delete_queue(struct nvme_queue *nvmeq, u8 opcode)
>  {
> +	struct request_queue *q = nvmeq->dev->ctrl.admin_q;
>  	struct request *req;
> +	struct nvme_command cmd;
>  
> -	req = nvme_alloc_request(q, (struct nvme_command *)&nvmeq->dq,
> -							BLK_MQ_REQ_NOWAIT);
> +	memset(&cmd, 0, sizeof(cmd));
> +	cmd.delete_queue.opcode = nvme_admin_delete_sq;

Now I actually ran with the proposal and found this minor opcode
oversight. Should be:

+	cmd.delete_queue.opcode = opcode;

After that, it's successful.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHv3 4/5] NVMe: IO queue deletion re-write
  2016-01-06 19:51     ` Keith Busch
@ 2016-01-07 18:54       ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2016-01-07 18:54 UTC (permalink / raw)


On Wed, Jan 06, 2016@07:51:12PM +0000, Keith Busch wrote:
> Now I actually ran with the proposal and found this minor opcode
> oversight. Should be:
> 
> +	cmd.delete_queue.opcode = opcode;
> 
> After that, it's successful.

Ok.  Feel free to fold the whole thing into your original patch.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCHv3 4/5] NVMe: IO queue deletion re-write
  2016-01-06 15:44     ` Keith Busch
@ 2016-01-12 20:33       ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2016-01-12 20:33 UTC (permalink / raw)


On 01/06/2016 08:44 AM, Keith Busch wrote:
> On Wed, Jan 06, 2016@05:04:19AM -0800, Christoph Hellwig wrote:
>> This is my counter-suggestion to the requeue abuse:
>>
>> simply do two passes of your algorithm, one for the SQs, one for the
>> CQs.  Patch relative to the whole series below:
>
> Yes, thanks, this looks great!
>
> Just a nitpick, I think it'd look better semantically to have
> nvme_disable_io_queues handle switching the opcodes instead of having
> to call it twice, but this is fine as-is.
>
> Acked-by: Keith Busch <keith.busch at intel.com>

Keith, are you going to re-post 4-5/5?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-01-12 20:33 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-01-04 16:10 [PATCHv3 0/5] NVMe fixes and updates, v3 Keith Busch
2016-01-04 16:10 ` [PATCHv3 1/5] NVMe: Fix admin queue ring wrap Keith Busch
2016-01-04 16:10 ` [PATCHv3 2/5] NVMe: Use a retryable error code on reset Keith Busch
2016-01-04 16:10 ` [PATCHv3 3/5] NVMe: Remove queue freezing on resets Keith Busch
2016-01-04 16:10 ` [PATCHv3 4/5] NVMe: IO queue deletion re-write Keith Busch
2016-01-06 13:04   ` Christoph Hellwig
2016-01-06 15:44     ` Keith Busch
2016-01-12 20:33       ` Jens Axboe
2016-01-06 19:51     ` Keith Busch
2016-01-07 18:54       ` Christoph Hellwig
2016-01-04 16:10 ` [PATCHv3 5/5] NVMe: Shutdown controller only for power-off Keith Busch
2016-01-04 16:36 ` [PATCHv3 0/5] NVMe fixes and updates, v3 Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.