[PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes


* [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes
@ 2015-01-07  2:57 Keith Busch
  2015-01-07  2:57 ` [PATCHv2 01/10] blk-mq: Wake tasks entering queue on dying Keith Busch
                   ` (11 more replies)
  0 siblings, 12 replies; 20+ messages in thread
From: Keith Busch @ 2015-01-07  2:57 UTC (permalink / raw)


Second try, this time tested against many more scenarios than before
with error injection and surprise hot-removal and intermittent resets.

I'm adding a lot of stuff outside the driver, but I didn't find a
cleaner way to do a lot of these things. This makes me a little nervous,
so please let me know if anything seems amiss here. I don't think any
of the blk-mq changes could possibly be harmful to anyone else since
nvme is the only driver that uses most of the additions.

The only remaining issue I found is that unfreezing queues might trigger
the percpu_ref_reinit WARN_ON_ONCE when the driver restarts a
request_queue with queued up IOs.

This is against linux-block/for-next.

Jens,
I believe the fourth one ("abort requeue list") is from you, but I didn't
find the patch.

Keith Busch (10):
  blk-mq: Wake tasks entering a dying queue 
  blk-mq: Export test for started requests
  blk-mq: Let drivers cancel requeue_work
  blk-mq: Export abort requeue list
  blk-mq: Allow requests to never expire
  blk-mq: End unstarted requests on a dying queue
  NVMe: Start driver allocated requests
  NVMe: Start and stop h/w queues on reset
  NVMe: Admin queue error handling
  NVMe: Command abort handling fixes

 block/blk-mq.c            |   46 ++++++++++++++-
 drivers/block/nvme-core.c |  142 +++++++++++++++++++++++++++++++--------------
 include/linux/blk-mq.h    |    3 +
 include/linux/blkdev.h    |    1 +
 4 files changed, 146 insertions(+), 46 deletions(-)

-- 
1.7.10.4


* [PATCHv2 01/10] blk-mq: Wake tasks entering queue on dying
  2015-01-07  2:57 [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Keith Busch
@ 2015-01-07  2:57 ` Keith Busch
  2015-01-07 16:18   ` Jens Axboe
  2015-01-07  2:57 ` [PATCHv2 02/10] blk-mq: Export test for started requests Keith Busch
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: Keith Busch @ 2015-01-07  2:57 UTC (permalink / raw)


When the queue is set to dying, wake up tasks that are waiting on a frozen
queue so they realize it is dying and abandon their request.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 block/blk-mq.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1a41d7a..6d83ee6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -160,6 +160,7 @@ void blk_mq_wake_waiters(struct request_queue *q)
 	queue_for_each_hw_ctx(q, hctx, i)
 		if (blk_mq_hw_queue_mapped(hctx))
 			blk_mq_tag_wakeup_all(hctx->tags, true);
+	wake_up_all(&q->mq_freeze_wq);
 }
 
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
-- 
1.7.10.4


* [PATCHv2 02/10] blk-mq: Export test for started requests
  2015-01-07  2:57 [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Keith Busch
  2015-01-07  2:57 ` [PATCHv2 01/10] blk-mq: Wake tasks entering queue on dying Keith Busch
@ 2015-01-07  2:57 ` Keith Busch
  2015-01-07  2:57 ` [PATCHv2 03/10] blk-mq: Let drivers cancel requeue_work Keith Busch
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Keith Busch @ 2015-01-07  2:57 UTC (permalink / raw)


Drivers can iterate over all allocated request tags, but their callback
needs a way to know if the driver started the request in the first place.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 block/blk-mq.c         |    6 ++++++
 include/linux/blk-mq.h |    1 +
 2 files changed, 7 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6d83ee6..82930c0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -398,6 +398,12 @@ void blk_mq_complete_request(struct request *rq)
 }
 EXPORT_SYMBOL(blk_mq_complete_request);
 
+int blk_mq_request_started(struct request *rq)
+{
+	return test_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
+}
+EXPORT_SYMBOL_GPL(blk_mq_request_started);
+
 void blk_mq_start_request(struct request *rq)
 {
 	struct request_queue *q = rq->q;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 3b43f50..8bbd082 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -195,6 +195,7 @@ static inline u16 blk_mq_unique_tag_to_tag(u32 unique_tag)
 struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *, const int ctx_index);
 struct blk_mq_hw_ctx *blk_mq_alloc_single_hw_queue(struct blk_mq_tag_set *, unsigned int, int);
 
+int blk_mq_request_started(struct request *rq);
 void blk_mq_start_request(struct request *rq);
 void blk_mq_end_request(struct request *rq, int error);
 void __blk_mq_end_request(struct request *rq, int error);
-- 
1.7.10.4
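
The started test exported by this patch can be sketched in userspace as a
single bit test. This is only a model under stated assumptions: the
model_* names and the bit numbers are hypothetical stand-ins, and C11
atomics take the place of the kernel's test_bit() on rq->atomic_flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit numbers; not taken from the kernel headers. */
enum { MODEL_ATOM_COMPLETE = 0, MODEL_ATOM_STARTED = 1 };

struct model_request {
	atomic_ulong atomic_flags;	/* stand-in for rq->atomic_flags */
};

/* Plays the role of blk_mq_start_request(): mark the request started. */
static void model_start_request(struct model_request *rq)
{
	assert(rq != NULL);
	atomic_fetch_or(&rq->atomic_flags, 1UL << MODEL_ATOM_STARTED);
}

/* Plays the role of blk_mq_request_started(): test the started bit. */
static bool model_request_started(struct model_request *rq)
{
	assert(rq != NULL);
	return (atomic_load(&rq->atomic_flags) &
		(1UL << MODEL_ATOM_STARTED)) != 0;
}

/*
 * Shape of a per-tag callback: a request the driver never started is
 * left alone. Returns 1 if the request would be cancelled, 0 otherwise.
 */
static int model_cancel_if_started(struct model_request *rq)
{
	if (!model_request_started(rq))
		return 0;
	return 1;
}
```

A callback walking every allocated tag would call model_cancel_if_started()
once per request and act only on the ones that return 1.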


* [PATCHv2 03/10] blk-mq: Let drivers cancel requeue_work
  2015-01-07  2:57 [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Keith Busch
  2015-01-07  2:57 ` [PATCHv2 01/10] blk-mq: Wake tasks entering queue on dying Keith Busch
  2015-01-07  2:57 ` [PATCHv2 02/10] blk-mq: Export test for started requests Keith Busch
@ 2015-01-07  2:57 ` Keith Busch
  2015-01-07  2:57 ` [PATCHv2 04/10] blk-mq: Export abort requeue list Keith Busch
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Keith Busch @ 2015-01-07  2:57 UTC (permalink / raw)


Requeueing requests runs the h/w queues from a work_queue, which may
override a driver's request to temporarily stop them. This patch exports
a method to cancel q->requeue_work so a driver can be assured its
stopped h/w queues won't be started up when it is not ready.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 block/blk-mq.c         |    6 ++++++
 include/linux/blk-mq.h |    1 +
 2 files changed, 7 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 82930c0..a976db4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -521,6 +521,12 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
 }
 EXPORT_SYMBOL(blk_mq_add_to_requeue_list);
 
+void blk_mq_cancel_requeue_work(struct request_queue *q)
+{
+	cancel_work_sync(&q->requeue_work);
+}
+EXPORT_SYMBOL_GPL(blk_mq_cancel_requeue_work);
+
 void blk_mq_kick_requeue_list(struct request_queue *q)
 {
 	kblockd_schedule_work(&q->requeue_work);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8bbd082..b509ef5 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -202,6 +202,7 @@ void __blk_mq_end_request(struct request *rq, int error);
 
 void blk_mq_requeue_request(struct request *rq);
 void blk_mq_add_to_requeue_list(struct request *rq, bool at_head);
+void blk_mq_cancel_requeue_work(struct request_queue *q);
 void blk_mq_kick_requeue_list(struct request_queue *q);
 void blk_mq_complete_request(struct request *rq);
 
-- 
1.7.10.4


* [PATCHv2 04/10] blk-mq: Export abort requeue list
  2015-01-07  2:57 [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Keith Busch
                   ` (2 preceding siblings ...)
  2015-01-07  2:57 ` [PATCHv2 03/10] blk-mq: Let drivers cancel requeue_work Keith Busch
@ 2015-01-07  2:57 ` Keith Busch
  2015-01-07  2:57 ` [PATCHv2 05/10] blk-mq: Allow requests to never expire Keith Busch
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Keith Busch @ 2015-01-07  2:57 UTC (permalink / raw)


This patch lets a driver abort all requeued requests that are still
pending when the h/w queue will never become available again.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 block/blk-mq.c         |   20 ++++++++++++++++++++
 include/linux/blk-mq.h |    1 +
 2 files changed, 21 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a976db4..f6e1225 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -533,6 +533,26 @@ void blk_mq_kick_requeue_list(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_mq_kick_requeue_list);
 
+void blk_mq_abort_requeue_list(struct request_queue *q)
+{
+	unsigned long flags;
+	LIST_HEAD(rq_list);
+
+	spin_lock_irqsave(&q->requeue_lock, flags);
+	list_splice_init(&q->requeue_list, &rq_list);
+	spin_unlock_irqrestore(&q->requeue_lock, flags);
+
+	while (!list_empty(&rq_list)) {
+		struct request *rq;
+
+		rq = list_first_entry(&rq_list, struct request, queuelist);
+		list_del_init(&rq->queuelist);
+		rq->errors = -EIO;
+		blk_mq_end_request(rq, rq->errors);
+	}
+}
+EXPORT_SYMBOL(blk_mq_abort_requeue_list);
+
 static inline bool is_flush_request(struct request *rq,
 		struct blk_flush_queue *fq, unsigned int tag)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index b509ef5..a09728e 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -204,6 +204,7 @@ void blk_mq_requeue_request(struct request *rq);
 void blk_mq_add_to_requeue_list(struct request *rq, bool at_head);
 void blk_mq_cancel_requeue_work(struct request_queue *q);
 void blk_mq_kick_requeue_list(struct request_queue *q);
+void blk_mq_abort_requeue_list(struct request_queue *q);
 void blk_mq_complete_request(struct request *rq);
 
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx);
-- 
1.7.10.4
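
The pattern in blk_mq_abort_requeue_list() is to detach the whole list
while holding the lock, then end each request after dropping it. A minimal
userspace sketch of that pattern follows. It assumes a singly linked list
and a C11 atomic_flag spinlock in place of q->requeue_list and
q->requeue_lock, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct model_request {
	struct model_request *next;
	int errors;
	int ended;			/* set once the request is completed */
};

struct model_queue {
	atomic_flag requeue_lock;		/* stand-in for q->requeue_lock */
	struct model_request *requeue_list;	/* stand-in for q->requeue_list */
};

static void model_lock(struct model_queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->requeue_lock,
						 memory_order_acquire))
		;	/* spin until the flag is released */
}

static void model_unlock(struct model_queue *q)
{
	atomic_flag_clear_explicit(&q->requeue_lock, memory_order_release);
}

static void model_add_to_requeue_list(struct model_queue *q,
				      struct model_request *rq)
{
	model_lock(q);
	rq->next = q->requeue_list;
	q->requeue_list = rq;
	model_unlock(q);
}

/* Ends every requeued request with -EIO; returns how many were ended. */
static int model_abort_requeue_list(struct model_queue *q)
{
	struct model_request *rq_list;
	int count = 0;

	assert(q != NULL);

	/* Detach the shared list under the lock ... */
	model_lock(q);
	rq_list = q->requeue_list;
	q->requeue_list = NULL;
	model_unlock(q);

	/* ... and complete the private copy without holding it. */
	while (rq_list) {
		struct model_request *rq = rq_list;

		rq_list = rq->next;
		rq->next = NULL;
		rq->errors = -EIO;
		rq->ended = 1;
		count++;
	}
	return count;
}
```

Completing requests outside the lock keeps the critical section short and
avoids calling completion code with the requeue lock held.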


* [PATCHv2 05/10] blk-mq: Allow requests to never expire
  2015-01-07  2:57 [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Keith Busch
                   ` (3 preceding siblings ...)
  2015-01-07  2:57 ` [PATCHv2 04/10] blk-mq: Export abort requeue list Keith Busch
@ 2015-01-07  2:57 ` Keith Busch
  2015-01-07 16:16   ` Jens Axboe
  2015-01-07  2:58 ` [PATCHv2 06/10] blk-mq: End unstarted requests on a dying queue Keith Busch
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: Keith Busch @ 2015-01-07  2:57 UTC (permalink / raw)


Some requests may be started but have no guarantee they'll ever
complete. This defines a special timeout value that a driver can use so
the request will never be timed out.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 block/blk-mq.c         |    8 +++++---
 include/linux/blkdev.h |    1 +
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6e1225..5dbd315b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -414,7 +414,8 @@ void blk_mq_start_request(struct request *rq)
 	if (unlikely(blk_bidi_rq(rq)))
 		rq->next_rq->resid_len = blk_rq_bytes(rq->next_rq);
 
-	blk_add_timer(rq);
+	if (rq->timeout != REQ_NO_TIMEOUT)
+		blk_add_timer(rq);
 
 	/*
 	 * Ensure that ->deadline is visible before set the started
@@ -613,13 +614,14 @@ void blk_mq_rq_timed_out(struct request *req, bool reserved)
 		break;
 	}
 }
-		
+
 static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 		struct request *rq, void *priv, bool reserved)
 {
 	struct blk_mq_timeout_data *data = priv;
 
-	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags))
+	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags)  ||
+				rq->timeout == REQ_NO_TIMEOUT)
 		return;
 
 	if (time_after_eq(jiffies, rq->deadline)) {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 92f4b4b..096b4f7f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -198,6 +198,7 @@ struct request {
 
 	unsigned long deadline;
 	struct list_head timeout_list;
+#define REQ_NO_TIMEOUT UINT_MAX
 	unsigned int timeout;
 	int retries;
 
-- 
1.7.10.4


* [PATCHv2 06/10] blk-mq: End unstarted requests on a dying queue
  2015-01-07  2:57 [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Keith Busch
                   ` (4 preceding siblings ...)
  2015-01-07  2:57 ` [PATCHv2 05/10] blk-mq: Allow requests to never expire Keith Busch
@ 2015-01-07  2:58 ` Keith Busch
  2015-01-07  2:58 ` [PATCHv2 07/10] NVMe: Start driver allocated requests Keith Busch
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Keith Busch @ 2015-01-07  2:58 UTC (permalink / raw)


Requests that haven't been started prior to a queue dying can be ended
in error without having to wait for them to time out.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 block/blk-mq.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5dbd315b..1443c77 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -621,8 +621,13 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 	struct blk_mq_timeout_data *data = priv;
 
 	if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags)  ||
-				rq->timeout == REQ_NO_TIMEOUT)
+				rq->timeout == REQ_NO_TIMEOUT) {
+		if (unlikely(blk_queue_dying(rq->q))) {
+			rq->errors = -EIO;
+			blk_mq_complete_request(rq);
+		}
 		return;
+	}
 
 	if (time_after_eq(jiffies, rq->deadline)) {
 		if (!blk_mark_rq_complete(rq))
-- 
1.7.10.4
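
With patches 05 and 06 applied, the per-request decision in
blk_mq_check_expired() can be modelled as a small pure function. The
sketch below is a userspace approximation: the model_* names and the
action enum are hypothetical, and the real function also tracks the next
deadline to arm, which is collapsed into MODEL_PENDING here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NO_TIMEOUT UINT_MAX	/* plays the role of REQ_NO_TIMEOUT */

enum model_action {
	MODEL_SKIP,		/* nothing to do for this request */
	MODEL_END_EIO,		/* queue is dying: complete with -EIO */
	MODEL_TIMED_OUT,	/* deadline reached: run timeout handling */
	MODEL_PENDING,		/* started, deadline not reached yet */
};

struct model_request {
	bool started;
	unsigned int timeout;
	unsigned long deadline;
};

static enum model_action model_check_expired(const struct model_request *rq,
					     bool queue_dying,
					     unsigned long now)
{
	assert(rq != NULL);

	/*
	 * As in the patch, unstarted requests and no-timeout requests
	 * share one branch, so both are ended on a dying queue.
	 */
	if (!rq->started || rq->timeout == MODEL_NO_TIMEOUT)
		return queue_dying ? MODEL_END_EIO : MODEL_SKIP;

	/* Wrap-safe "now >= deadline", in the spirit of time_after_eq(). */
	if ((long)(now - rq->deadline) >= 0)
		return MODEL_TIMED_OUT;

	return MODEL_PENDING;
}
```

One consequence visible in the model is that a started request marked with
the no-timeout value is also ended with -EIO once the queue is dying.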


* [PATCHv2 07/10] NVMe: Start driver allocated requests
  2015-01-07  2:57 [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Keith Busch
                   ` (5 preceding siblings ...)
  2015-01-07  2:58 ` [PATCHv2 06/10] blk-mq: End unstarted requests on a dying queue Keith Busch
@ 2015-01-07  2:58 ` Keith Busch
  2015-01-07  2:58 ` [PATCHv2 08/10] NVMe: Start and stop h/w queues on reset Keith Busch
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Keith Busch @ 2015-01-07  2:58 UTC (permalink / raw)


Once the nvme callback is set for a request, the driver can start it
and make it available for timeout handling. This fixes potential
deadlocks on startup and shutdown when a device is unresponsive, since
commands that time out on a device that is not initialized can now be
cancelled.

Asynchronous requests do not have any expected timeout, so they use
the new "REQ_NO_TIMEOUT" value.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/block/nvme-core.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index f7d083b..ff3012b 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -215,6 +215,7 @@ static void nvme_set_info(struct nvme_cmd_info *cmd, void *ctx,
 	cmd->fn = handler;
 	cmd->ctx = ctx;
 	cmd->aborted = 0;
+	blk_mq_start_request(blk_mq_rq_from_pdu(cmd));
 }
 
 /* Special values must be less than 0x1000 */
@@ -664,8 +665,6 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 		}
 	}
 
-	blk_mq_start_request(req);
-
 	nvme_set_info(cmd, iod, req_completion);
 	spin_lock_irq(&nvmeq->q_lock);
 	if (req->cmd_flags & REQ_DISCARD)
@@ -835,6 +834,7 @@ static int nvme_submit_async_admin_req(struct nvme_dev *dev)
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
+	req->timeout = REQ_NO_TIMEOUT;
 	cmd_info = blk_mq_rq_to_pdu(req);
 	nvme_set_info(cmd_info, req, async_req_completion);
 
@@ -1086,8 +1086,16 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 
 	dev_warn(nvmeq->q_dmadev, "Timeout I/O %d QID %d\n", req->tag,
 							nvmeq->qid);
-	if (nvmeq->dev->initialized)
-		nvme_abort_req(req);
+
+	if (!nvmeq->dev->initialized) {
+		/*
+		 * Force cancelled command frees the request, which requires we
+		 * return BLK_EH_NOT_HANDLED.
+		 */
+		nvme_cancel_queue_ios(nvmeq->hctx, req, nvmeq, reserved);
+		return BLK_EH_NOT_HANDLED;
+	}
+	nvme_abort_req(req);
 
 	/*
 	 * The aborted req will be completed on receiving the abort req.
-- 
1.7.10.4


* [PATCHv2 08/10] NVMe: Start and stop h/w queues on reset
  2015-01-07  2:57 [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Keith Busch
                   ` (6 preceding siblings ...)
  2015-01-07  2:58 ` [PATCHv2 07/10] NVMe: Start driver allocated requests Keith Busch
@ 2015-01-07  2:58 ` Keith Busch
  2015-01-07  3:08   ` Keith Busch
  2015-01-07  2:58 ` [PATCHv2 09/10] NVMe: Admin queue error handling Keith Busch
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: Keith Busch @ 2015-01-07  2:58 UTC (permalink / raw)


This freezes and stops all the queues on device shutdown and restarts
them on resume, and fixes hotplug and reset issues when the controller
is actively being used.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/block/nvme-core.c |   42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index ff3012b..571577c 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -433,7 +433,10 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
 		if (!(status & NVME_SC_DNR || blk_noretry_request(req))
 		    && (jiffies - req->start_time) < req->timeout) {
 			blk_mq_requeue_request(req);
-			blk_mq_kick_requeue_list(req->q);
+			spin_lock(req->q->queue_lock);
+			if (!blk_queue_stopped(req->q))
+				blk_mq_kick_requeue_list(req->q);
+			spin_unlock(req->q->queue_lock);
 			return;
 		}
 		req->errors = nvme_error_status(status);
@@ -2391,6 +2394,34 @@ static void nvme_dev_list_remove(struct nvme_dev *dev)
 		kthread_stop(tmp);
 }
 
+static void nvme_freeze_queues(struct nvme_dev *dev)
+{
+	struct nvme_ns *ns;
+
+	list_for_each_entry(ns, &dev->namespaces, list) {
+		blk_mq_freeze_queue_start(ns->queue);
+
+		spin_lock(ns->queue->queue_lock);
+		queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
+		spin_unlock(ns->queue->queue_lock);
+
+		blk_mq_cancel_requeue_work(ns->queue);
+		blk_mq_stop_hw_queues(ns->queue);
+	}
+}
+
+static void nvme_unfreeze_queues(struct nvme_dev *dev)
+{
+	struct nvme_ns *ns;
+
+	list_for_each_entry(ns, &dev->namespaces, list) {
+		queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
+		blk_mq_unfreeze_queue(ns->queue);
+		blk_mq_start_stopped_hw_queues(ns->queue, true);
+		blk_mq_kick_requeue_list(ns->queue);
+	}
+}
+
 static void nvme_dev_shutdown(struct nvme_dev *dev)
 {
 	int i;
@@ -2399,8 +2430,10 @@ static void nvme_dev_shutdown(struct nvme_dev *dev)
 	dev->initialized = 0;
 	nvme_dev_list_remove(dev);
 
-	if (dev->bar)
+	if (dev->bar) {
+		nvme_freeze_queues(dev);
 		csts = readl(&dev->bar->csts);
+	}
 	if (csts & NVME_CSTS_CFS || !(csts & NVME_CSTS_RDY)) {
 		for (i = dev->queue_count - 1; i >= 0; i--) {
 			struct nvme_queue *nvmeq = dev->queues[i];
@@ -2654,6 +2687,9 @@ static int nvme_dev_resume(struct nvme_dev *dev)
 		dev->reset_workfn = nvme_remove_disks;
 		queue_work(nvme_workq, &dev->reset_work);
 		spin_unlock(&dev_list_lock);
+	} else {
+		nvme_unfreeze_queues(dev);
+		nvme_set_irq_hints(dev);
 	}
 	dev->initialized = 1;
 	return 0;
@@ -2791,8 +2827,8 @@ static void nvme_remove(struct pci_dev *pdev)
 	pci_set_drvdata(pdev, NULL);
 	flush_work(&dev->reset_work);
 	misc_deregister(&dev->miscdev);
-	nvme_dev_remove(dev);
 	nvme_dev_shutdown(dev);
+	nvme_dev_remove(dev);
 	nvme_dev_remove_admin(dev);
 	nvme_free_queues(dev, 0);
 	nvme_free_admin_tags(dev);
-- 
1.7.10.4


* [PATCHv2 09/10] NVMe: Admin queue error handling
  2015-01-07  2:57 [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Keith Busch
                   ` (7 preceding siblings ...)
  2015-01-07  2:58 ` [PATCHv2 08/10] NVMe: Start and stop h/w queues on reset Keith Busch
@ 2015-01-07  2:58 ` Keith Busch
  2015-01-07  2:58 ` [PATCHv2 10/10] NVMe: Command abort handling fixes Keith Busch
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Keith Busch @ 2015-01-07  2:58 UTC (permalink / raw)


This protects the admin queue access in a variety of scenarios:

Its request_queue is reference counted once it is allocated, so it can't
be deleted while there is an open reference on the controller. This fixes
a use-after-free error in some hot-remove scenarios.

The queue is frozen on reset after all IO queues have been deleted
since the controller cannot accept commands after this anyway, so new
requests will block until the reset completes. Since the queue has to
be unfrozen, the function doing that was moved to the point after the
h/w queue was initialized, which is probably where it should have gone
in the first place.

Special handling is done if the controller becomes unresponsive on a
shutdown to forcefully cancel commands, and then wait for the remaining
queue deletion workers to finish.

This patch also removes the software signals that were previously used
to tell the worker thread to die, since that mechanism is no longer used
in this path, and fixes the signedness of cq_vector so we don't disable
a queue twice.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/block/nvme-core.c |   67 +++++++++++++++++++++++----------------------
 1 file changed, 35 insertions(+), 32 deletions(-)

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index 571577c..f20e6c6 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -106,7 +106,7 @@ struct nvme_queue {
 	dma_addr_t cq_dma_addr;
 	u32 __iomem *q_db;
 	u16 q_depth;
-	u16 cq_vector;
+	s16 cq_vector;
 	u16 sq_head;
 	u16 sq_tail;
 	u16 cq_head;
@@ -1186,6 +1186,8 @@ static void nvme_disable_queue(struct nvme_dev *dev, int qid)
 		adapter_delete_sq(dev, qid);
 		adapter_delete_cq(dev, qid);
 	}
+	if (!qid)
+		blk_mq_freeze_queue_start(dev->admin_q);
 	nvme_clear_queue(nvmeq);
 }
 
@@ -1372,6 +1374,14 @@ static struct blk_mq_ops nvme_mq_ops = {
 	.timeout	= nvme_timeout,
 };
 
+static void nvme_dev_remove_admin(struct nvme_dev *dev)
+{
+	if (dev->admin_q && !blk_queue_dying(dev->admin_q)) {
+		blk_cleanup_queue(dev->admin_q);
+		blk_mq_free_tag_set(&dev->admin_tagset);
+	}
+}
+
 static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 {
 	if (!dev->admin_q) {
@@ -1391,17 +1401,16 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 			blk_mq_free_tag_set(&dev->admin_tagset);
 			return -ENOMEM;
 		}
-	}
+		if (!blk_get_queue(dev->admin_q)) {
+			nvme_dev_remove_admin(dev);
+			return -ENODEV;
+		}
+	} else
+		blk_mq_unfreeze_queue(dev->admin_q);
 
 	return 0;
 }
 
-static void nvme_free_admin_tags(struct nvme_dev *dev)
-{
-	if (dev->admin_q)
-		blk_mq_free_tag_set(&dev->admin_tagset);
-}
-
 static int nvme_configure_admin_queue(struct nvme_dev *dev)
 {
 	int result;
@@ -1456,19 +1465,13 @@ static int nvme_configure_admin_queue(struct nvme_dev *dev)
 	if (result)
 		goto free_nvmeq;
 
-	result = nvme_alloc_admin_tags(dev);
-	if (result)
-		goto free_nvmeq;
-
 	nvmeq->cq_vector = 0;
 	result = queue_request_irq(dev, nvmeq, nvmeq->irqname);
 	if (result)
-		goto free_tags;
+		goto free_nvmeq;
 
 	return result;
 
- free_tags:
-	nvme_free_admin_tags(dev);
  free_nvmeq:
 	nvme_free_queues(dev, 0);
 	return result;
@@ -2251,15 +2254,19 @@ static void nvme_wait_dq(struct nvme_delq_ctx *dq, struct nvme_dev *dev)
 		set_current_state(TASK_KILLABLE);
 		if (!atomic_read(&dq->refcount))
 			break;
-		if (!schedule_timeout(ADMIN_TIMEOUT) ||
-					fatal_signal_pending(current)) {
+		if (!schedule_timeout(ADMIN_TIMEOUT)) {
+			/*
+			 * Disable the controller first since we can't trust it
+			 * at this point, but leave the admin queue enabled
+			 * until all queue deletion requests are flushed. This
+			 * may take a while if there are more h/w queues than
+			 * admin tags.
+			 */
 			set_current_state(TASK_RUNNING);
-
 			nvme_disable_ctrl(dev, readq(&dev->bar->cap));
-			nvme_disable_queue(dev, 0);
-
-			send_sig(SIGKILL, dq->worker->task, 1);
+			nvme_clear_queue(dev->queues[0]);
 			flush_kthread_worker(dq->worker);
+			nvme_disable_queue(dev, 0);
 			return;
 		}
 	}
@@ -2336,7 +2343,6 @@ static void nvme_del_queue_start(struct kthread_work *work)
 {
 	struct nvme_queue *nvmeq = container_of(work, struct nvme_queue,
 							cmdinfo.work);
-	allow_signal(SIGKILL);
 	if (nvme_delete_sq(nvmeq))
 		nvme_del_queue_end(nvmeq);
 }
@@ -2448,12 +2454,6 @@ static void nvme_dev_shutdown(struct nvme_dev *dev)
 	nvme_dev_unmap(dev);
 }
 
-static void nvme_dev_remove_admin(struct nvme_dev *dev)
-{
-	if (dev->admin_q && !blk_queue_dying(dev->admin_q))
-		blk_cleanup_queue(dev->admin_q);
-}
-
 static void nvme_dev_remove(struct nvme_dev *dev)
 {
 	struct nvme_ns *ns;
@@ -2543,6 +2543,7 @@ static void nvme_free_dev(struct kref *kref)
 	nvme_free_namespaces(dev);
 	nvme_release_instance(dev);
 	blk_mq_free_tag_set(&dev->tagset);
+	blk_put_queue(dev->admin_q);
 	kfree(dev->queues);
 	kfree(dev->entry);
 	kfree(dev);
@@ -2639,15 +2640,18 @@ static int nvme_dev_start(struct nvme_dev *dev)
 	}
 
 	nvme_init_queue(dev->queues[0], 0);
-
-	result = nvme_setup_io_queues(dev);
+	result = nvme_alloc_admin_tags(dev);
 	if (result)
 		goto disable;
 
-	nvme_set_irq_hints(dev);
+	result = nvme_setup_io_queues(dev);
+	if (result)
+		goto free_tags;
 
 	return result;
 
+ free_tags:
+	nvme_dev_remove_admin(dev);
  disable:
 	nvme_disable_queue(dev, 0);
 	nvme_dev_list_remove(dev);
@@ -2831,7 +2835,6 @@ static void nvme_remove(struct pci_dev *pdev)
 	nvme_dev_remove(dev);
 	nvme_dev_remove_admin(dev);
 	nvme_free_queues(dev, 0);
-	nvme_free_admin_tags(dev);
 	nvme_release_prp_pools(dev);
 	kref_put(&dev->kref, nvme_free_dev);
 }
-- 
1.7.10.4
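
The cq_vector signedness change is easier to see in isolation. The sketch
below assumes the driver marks a suspended queue by storing -1 in
cq_vector and later compares the field against -1; the thread does not
show that code, so treat this as an illustration of the C integer
promotion rule rather than a copy of the driver. All names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct queue_u16 { uint16_t cq_vector; };	/* before the patch */
struct queue_s16 { int16_t cq_vector; };	/* after the patch */

/* The field is promoted to int before the comparison. */
static bool vector_is_suspended(int vector)
{
	return vector == -1;
}

/* Returns true if the queue was already suspended. */
static bool suspend_u16(struct queue_u16 *q)
{
	assert(q != NULL);
	if (vector_is_suspended(q->cq_vector))
		return true;		/* never taken: 65535 != -1 */
	q->cq_vector = (uint16_t)-1;	/* stored as 65535 */
	return false;
}

/* Returns true if the queue was already suspended. */
static bool suspend_s16(struct queue_s16 *q)
{
	assert(q != NULL);
	if (vector_is_suspended(q->cq_vector))
		return true;		/* taken on the second call */
	q->cq_vector = -1;
	return false;
}
```

With the unsigned field the "already suspended" check never fires, so a
second suspend would repeat the teardown. With the signed field the second
call returns early.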


* [PATCHv2 10/10] NVMe: Command abort handling fixes
  2015-01-07  2:57 [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Keith Busch
                   ` (8 preceding siblings ...)
  2015-01-07  2:58 ` [PATCHv2 09/10] NVMe: Admin queue error handling Keith Busch
@ 2015-01-07  2:58 ` Keith Busch
  2015-01-07  7:42 ` [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Ming Lin
  2015-01-07 16:22 ` Jens Axboe
  11 siblings, 0 replies; 20+ messages in thread
From: Keith Busch @ 2015-01-07  2:58 UTC (permalink / raw)


Aborts all requeued commands prior to killing the request_queue. For
commands that time out on a dying request queue, set the "Do Not Retry"
bit on the command status so the command cannot be requeued. Finally, if
the driver is requested to abort a command it did not start, do nothing.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/block/nvme-core.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index f20e6c6..a7f663c 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -1067,15 +1067,22 @@ static void nvme_cancel_queue_ios(struct blk_mq_hw_ctx *hctx,
 	void *ctx;
 	nvme_completion_fn fn;
 	struct nvme_cmd_info *cmd;
-	static struct nvme_completion cqe = {
-		.status = cpu_to_le16(NVME_SC_ABORT_REQ << 1),
-	};
+	struct nvme_completion cqe;
+
+	if (!blk_mq_request_started(req))
+		return;
 
 	cmd = blk_mq_rq_to_pdu(req);
 
 	if (cmd->ctx == CMD_CTX_CANCELLED)
 		return;
 
+	if (blk_queue_dying(req->q))
+		cqe.status = cpu_to_le16((NVME_SC_ABORT_REQ | NVME_SC_DNR) << 1);
+	else
+		cqe.status = cpu_to_le16(NVME_SC_ABORT_REQ << 1);
+
+
 	dev_warn(nvmeq->q_dmadev, "Cancelling I/O %d QID %d\n",
 						req->tag, nvmeq->qid);
 	ctx = cancel_cmd_info(cmd, &fn);
@@ -2461,8 +2468,10 @@ static void nvme_dev_remove(struct nvme_dev *dev)
 	list_for_each_entry(ns, &dev->namespaces, list) {
 		if (ns->disk->flags & GENHD_FL_UP)
 			del_gendisk(ns->disk);
-		if (!blk_queue_dying(ns->queue))
+		if (!blk_queue_dying(ns->queue)) {
+			blk_mq_abort_requeue_list(ns->queue);
 			blk_cleanup_queue(ns->queue);
+		}
 	}
 }
 
-- 
1.7.10.4
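
How the "Do Not Retry" bit keeps a cancelled command from being requeued
can be sketched as follows. The numeric values are assumptions (0x7 for
the abort status and 0x4000 for DNR, which is what the Linux NVMe header
is believed to define), host byte order replaces cpu_to_le16(), and the
shift back by one bit on the completion side is also assumed, since the
thread only shows the retry test itself. The model_* names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SC_ABORT_REQ	0x7	/* assumed value of NVME_SC_ABORT_REQ */
#define MODEL_SC_DNR		0x4000	/* assumed value of NVME_SC_DNR */

/* Build the completion status the way the patched cancel path does. */
static uint16_t model_cancel_status(bool queue_dying)
{
	uint16_t sc = MODEL_SC_ABORT_REQ;

	if (queue_dying)
		sc |= MODEL_SC_DNR;

	/* The code must fit in 15 bits: bit 0 is left for the phase tag. */
	assert((sc & 0x8000) == 0);
	return (uint16_t)(sc << 1);
}

/* Mirror of the retry test quoted from req_completion(). */
static bool model_may_requeue(uint16_t cqe_status, bool noretry,
			      unsigned long age, unsigned long timeout)
{
	uint16_t status = cqe_status >> 1;	/* drop the phase bit */

	return !((status & MODEL_SC_DNR) || noretry) && age < timeout;
}
```

On a live queue the cancelled command stays eligible for a retry, while on
a dying queue the DNR bit forces it down the error path.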


* [PATCHv2 08/10] NVMe: Start and stop h/w queues on reset
  2015-01-07  2:58 ` [PATCHv2 08/10] NVMe: Start and stop h/w queues on reset Keith Busch
@ 2015-01-07  3:08   ` Keith Busch
  0 siblings, 0 replies; 20+ messages in thread
From: Keith Busch @ 2015-01-07  3:08 UTC (permalink / raw)


On Tue, 6 Jan 2015, Keith Busch wrote:
> @@ -433,7 +433,10 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
> 		if (!(status & NVME_SC_DNR || blk_noretry_request(req))
> 		    && (jiffies - req->start_time) < req->timeout) {
> 			blk_mq_requeue_request(req);
> -			blk_mq_kick_requeue_list(req->q);
> +			spin_lock(req->q->queue_lock);
> +			if (!blk_queue_stopped(req->q))
> +				blk_mq_kick_requeue_list(req->q);
> +			spin_unlock(req->q->queue_lock);
> 			return;
> 		}

Ugh, been at this too long and screwed up my hand-merge when
squashing and splitting into manageable patches. The above should use
irq save/restore locks.

@@ -433,7 +433,10 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
   		if (!(status & NVME_SC_DNR || blk_noretry_request(req))
   		    && (jiffies - req->start_time) < req->timeout) {
+ 			unsigned long flags;
   			blk_mq_requeue_request(req);
-			blk_mq_kick_requeue_list(req->q);
+			spin_lock_irqsave(req->q->queue_lock, flags);
+			if (!blk_queue_stopped(req->q))
+				blk_mq_kick_requeue_list(req->q);
+			spin_unlock_irqrestore(req->q->queue_lock, flags);
  			return;
  		}


* [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes
  2015-01-07  2:57 [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Keith Busch
                   ` (9 preceding siblings ...)
  2015-01-07  2:58 ` [PATCHv2 10/10] NVMe: Command abort handling fixes Keith Busch
@ 2015-01-07  7:42 ` Ming Lin
  2015-01-07 16:20   ` Keith Busch
  2015-01-07 16:22 ` Jens Axboe
  11 siblings, 1 reply; 20+ messages in thread
From: Ming Lin @ 2015-01-07  7:42 UTC (permalink / raw)


On Tue, Jan 6, 2015 at 6:57 PM, Keith Busch <keith.busch@intel.com> wrote:
> Second try, this time tested against many more scenarios than before
> with error injection and surprise hot-removal and intermittent resets.
>
> I'm adding a lot of stuff outside the driver, but I didn't find a
> cleaner way to do a lot of these things. This makes me a little nervous,
> so please let me know if anything seems amiss here. I don't think any
> of the blk-mq changes could possibly be harmful to anyone else since
> nvme is the only driver that uses most of the additions.
>
> The only remaining issue I found is that unfreezing queues might trigger
> the percpu_ref_reinit WARN_ON_ONCE when the driver restarts a
> request_queue with queued up IOs.
>
> This is against linux-block/for-next.

Hi Keith,

Tested with qemu-nvme. Hotplug seems to work, but has some issues.

On guest: run fio for a while
root@block:~# fio --name=global --filename=/dev/nvme0n1 --direct=1 --bs=4k \
     --rw=randrw --ioengine=libaio --iodepth=128 --name=foobar
foobar: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=128
2.0.8
Starting 1 process
Jobs: 1 (f=1): [m] [10.3% done] [3746K/3980K /s] [936 /995  iops] [eta 08m:52s]

On guest: then remove nvme device
root@block:~# time echo 1 > /sys/devices/pci0000:00/0000:00:04.0/remove
real 1m0.040s
user 0m0.000s
sys 0m0.012

It works, but took a long time (1 minute) to return.
During this time, the whole qemu system was unresponsive.
Ping from host to guest also failed.

mlin@minggr:~$ ping 192.168.122.89
PING 192.168.122.89 (192.168.122.89) 56(84) bytes of data.
From 192.168.122.1 icmp_seq=15 Destination Host Unreachable
From 192.168.122.1 icmp_seq=16 Destination Host Unreachable

Thanks,
Ming


* [PATCHv2 05/10] blk-mq: Allow requests to never expire
  2015-01-07  2:57 ` [PATCHv2 05/10] blk-mq: Allow requests to never expire Keith Busch
@ 2015-01-07 16:16   ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2015-01-07 16:16 UTC (permalink / raw)


On 01/06/2015 07:57 PM, Keith Busch wrote:
> Some requests may be started but have no guarantee they'll ever
> complete. This defines a special timeout value that a driver can use so
> the request will never be timed out.

What if the timeout just happens to be UINT_MAX for the regular case? I 
think it'd be a lot safer to add a specific no-timeout flag to the 
request instead.
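
To illustrate the concern, here is a minimal userspace sketch, not kernel
code, of why a reserved timeout value can collide with a legitimate one
while a dedicated flag cannot. The struct, the flag name
SKETCH_REQ_NO_TIMEOUT, and both helper functions are hypothetical and
exist only for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag name; not an existing kernel request flag. */
#define SKETCH_REQ_NO_TIMEOUT (1u << 0)

/* Simplified stand-in for a request; not the kernel's struct request. */
struct sketch_request {
	unsigned int timeout;   /* timeout in ticks, chosen by the submitter */
	unsigned int flags;     /* bit flags, see SKETCH_REQ_NO_TIMEOUT */
	unsigned long started;  /* tick at which the request was started */
};

/*
 * Sentinel approach: timeout == UINT_MAX is reserved to mean "never
 * expires". A submitter that legitimately asks for UINT_MAX ticks is
 * silently treated as "never".
 */
static bool expired_sentinel(const struct sketch_request *rq, unsigned long now)
{
	assert(rq != NULL);
	if (rq->timeout == UINT_MAX)
		return false;
	return now - rq->started >= rq->timeout;
}

/*
 * Flag approach: "never expires" is carried out of band, so every
 * timeout value keeps its ordinary meaning.
 */
static bool expired_flag(const struct sketch_request *rq, unsigned long now)
{
	assert(rq != NULL);
	if (rq->flags & SKETCH_REQ_NO_TIMEOUT)
		return false;
	return now - rq->started >= rq->timeout;
}
```

With the flag variant, a request whose timeout is UINT_MAX still expires
once that many ticks have elapsed, unless the submitter explicitly set the
flag.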

-- 
Jens Axboe


* [PATCHv2 01/10] blk-mq: Wake tasks entering queue on dying
  2015-01-07  2:57 ` [PATCHv2 01/10] blk-mq: Wake tasks entering queue on dying Keith Busch
@ 2015-01-07 16:18   ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2015-01-07 16:18 UTC (permalink / raw)


On 01/06/2015 07:57 PM, Keith Busch wrote:
> When the queue is set to dying, wake up tasks that are waiting on frozen
> queue so they realize it is dying and abandon their request.
>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>   block/blk-mq.c |    1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 1a41d7a..6d83ee6 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -160,6 +160,7 @@ void blk_mq_wake_waiters(struct request_queue *q)
>   	queue_for_each_hw_ctx(q, hctx, i)
>   		if (blk_mq_hw_queue_mapped(hctx))
>   			blk_mq_tag_wakeup_all(hctx->tags, true);
> +	wake_up_all(&q->mq_freeze_wq);
>   }
>
>   bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)

That looks sane, thanks.
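
For readers less familiar with the pattern, here is a rough userspace
analogy using POSIX threads, not the blk-mq code itself: a task waiting on
a frozen queue re-checks a "dying" condition each time it is woken, so
marking the queue dying must also wake every waiter. All names
(sketch_queue, sketch_queue_enter, sketch_queue_set_dying) and the -1
return value are assumptions made for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request queue; not struct request_queue. */
struct sketch_queue {
	pthread_mutex_t lock;
	pthread_cond_t freeze_wq;  /* plays the role of a freeze wait queue */
	bool frozen;
	bool dying;
};

static void sketch_queue_init(struct sketch_queue *q)
{
	assert(q != NULL);
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->freeze_wq, NULL);
	q->frozen = false;
	q->dying = false;
}

/*
 * Wait until the queue is unfrozen or dying.
 * Returns 0 if the caller may proceed, -1 if the queue is dying.
 */
static int sketch_queue_enter(struct sketch_queue *q)
{
	int ret;

	pthread_mutex_lock(&q->lock);
	while (q->frozen && !q->dying)
		pthread_cond_wait(&q->freeze_wq, &q->lock);
	ret = q->dying ? -1 : 0;
	pthread_mutex_unlock(&q->lock);
	return ret;
}

/*
 * Mark the queue dying and wake every waiter. Without the broadcast,
 * tasks blocked in sketch_queue_enter() on a frozen queue would never
 * re-check the condition and would sleep forever.
 */
static void sketch_queue_set_dying(struct sketch_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->dying = true;
	pthread_cond_broadcast(&q->freeze_wq);
	pthread_mutex_unlock(&q->lock);
}
```

The broadcast in sketch_queue_set_dying() corresponds, loosely, to the
wake_up_all() call added by the patch.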

-- 
Jens Axboe


* [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes
  2015-01-07  7:42 ` [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Ming Lin
@ 2015-01-07 16:20   ` Keith Busch
  2015-01-07 19:36     ` Ming Lin
  0 siblings, 1 reply; 20+ messages in thread
From: Keith Busch @ 2015-01-07 16:20 UTC (permalink / raw)

On Tue, 6 Jan 2015, Ming Lin wrote:
> Hi Keith,
>
> Tested with qemu-nvme. Hotplug seems to work, but has some issues.
>
> On guest: run fio for a while
> root@block:~# fio --name=global --filename=/dev/nvme0n1 --direct=1 --bs=4k \
>     --rw=randrw --ioengine=libaio --iodepth=128 --name=foobar
> foobar: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=128
> 2.0.8
> Starting 1 process
> Jobs: 1 (f=1): [m] [10.3% done] [3746K/3980K /s] [936 /995  iops] [eta 08m:52s]
>
> On guest: then remove nvme device
> root@block:~# time echo 1 > /sys/devices/pci0000:00/0000:00:04.0/remove
> real 1m0.040s
> user 0m0.000s
> sys 0m0.012s
>
> It works, but it took a long time (1 minute) to return.
> During that time, the whole qemu system was unresponsive.
> Pings from the host to the guest also failed.

Thanks for the feedback. Is there anything special about your emulated
device? I haven't been able to duplicate these results. When running
the same fio job, I get ~0.4s on qemu, and ~2.2s on real h/w.

A 1 minute removal sounds like you're hitting the admin timeout on
queue deletion.

On the other hand, your namespace backing storage appears to be really
slow, so maybe it's stuck in blk_flush on shutdown. You say the guest
was unresponsive during removal, so that sounds plausible since qemu's
nvme shutdown is done synchronously with the CC register MMIO write.

What if you try this using a ramdisk or tmpfs file for the namespace
storage instead of what you're currently using?

Or add some prints in both the driver's and emulated target's shutdown
path and see where you're getting stuck?


* [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes
  2015-01-07  2:57 [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Keith Busch
                   ` (10 preceding siblings ...)
  2015-01-07  7:42 ` [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Ming Lin
@ 2015-01-07 16:22 ` Jens Axboe
  2015-01-07 16:31   ` Keith Busch
  11 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2015-01-07 16:22 UTC (permalink / raw)


On 01/06/2015 07:57 PM, Keith Busch wrote:
> Second try, this time tested against many more scenarios than before
> with error injection and surprise hot-removal and intermittent resets.
>
> I'm adding a lot of stuff outside the driver, but I didn't find a
> cleaner way to do a lot of these things. This makes me a little nervous,
> so please let me know if anything seems amiss here. I don't think any
> of the blk-mq changes could possibly be harmful to anyone else since
> nvme is the only driver that uses most of the additions.
>
> The only issue remaining I found is unfreezing queues might trigger the
> percpu_ref_reinit WARN_ON_ONCE when the driver restarts a request_queue
> with queued up IO's.
>
> This is against linux-block/for-next.

Series looks sane to me, apart from the timeout change I replied to 
separately. Could you turn that into a request flag and respin the 
series on top of that?

> Jens,
> I believe fourth one ("abort requeue list") is from you, but I didn't find
> the patch.

Yeah it is, sent to you on 12/23 as part of the "nvme-blkmq fixes" thread.

-- 
Jens Axboe


* [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes
  2015-01-07 16:22 ` Jens Axboe
@ 2015-01-07 16:31   ` Keith Busch
  2015-01-07 16:34     ` Jens Axboe
  0 siblings, 1 reply; 20+ messages in thread
From: Keith Busch @ 2015-01-07 16:31 UTC (permalink / raw)


On Wed, 7 Jan 2015, Jens Axboe wrote:
> On 01/06/2015 07:57 PM, Keith Busch wrote:
>> Second try, this time tested against many more scenarios than before
>> with error injection and surprise hot-removal and intermittent resets.
>> 
>> I'm adding a lot of stuff outside the driver, but I didn't find a
>> cleaner way to do a lot of these things. This makes me a little nervous,
>> so please let me know if anything seems amiss here. I don't think any
>> of the blk-mq changes could possibly be harmful to anyone else since
>> nvme is the only driver that uses most of the additions.
>> 
>> The only issue remaining I found is unfreezing queues might trigger the
>> percpu_ref_reinit WARN_ON_ONCE when the driver restarts a request_queue
>> with queued up IO's.
>> 
>> This is against linux-block/for-next.
>
> Series looks sane to me, apart from the timeout change I replied to 
> separately. Could you turn that into a request flag and respin the series on 
> top of that?

Thanks a bunch. I'll respin the series with the no-timeout request flag,
correct the irq spin locking issue I found in [8/10], and fix the
attribution in [4/10].

>> Jens,
>> I believe fourth one ("abort requeue list") is from you, but I didn't find
>> the patch.
>
> Yeah it is, sent to you on 12/23 as part of the "nvme-blkmq fixes" thread.


* [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes
  2015-01-07 16:31   ` Keith Busch
@ 2015-01-07 16:34     ` Jens Axboe
  0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2015-01-07 16:34 UTC (permalink / raw)


On 01/07/2015 09:31 AM, Keith Busch wrote:
> On Wed, 7 Jan 2015, Jens Axboe wrote:
>> On 01/06/2015 07:57 PM, Keith Busch wrote:
>>> Second try, this time tested against many more scenarios than before
>>> with error injection and surprise hot-removal and intermittent resets.
>>>
>>> I'm adding a lot of stuff outside the driver, but I didn't find a
>>> cleaner way to do a lot of these things. This makes me a little nervous,
>>> so please let me know if anything seems amiss here. I don't think any
>>> of the blk-mq changes could possibly be harmful to anyone else since
>>> nvme is the only driver that uses most of the additions.
>>>
>>> The only issue remaining I found is unfreezing queues might trigger the
>>> percpu_ref_reinit WARN_ON_ONCE when the driver restarts a request_queue
>>> with queued up IO's.
>>>
>>> This is against linux-block/for-next.
>>
>> Series looks sane to me, apart from the timeout change I replied to
>> separately. Could you turn that into a request flag and respin the
>> series on top of that?
>
> Thanks a bunch. I'll respin the series with the no-timeout request flag,
> correct the irq spin locking issue I found in [8/10], and fix the
> attribution in [4/10].

Awesome, thanks Keith! And thanks for taking the time to get this sorted.

-- 
Jens Axboe


* [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes
  2015-01-07 16:20   ` Keith Busch
@ 2015-01-07 19:36     ` Ming Lin
  0 siblings, 0 replies; 20+ messages in thread
From: Ming Lin @ 2015-01-07 19:36 UTC (permalink / raw)


On Wed, Jan 7, 2015 at 8:20 AM, Keith Busch <keith.busch@intel.com> wrote:
> On Tue, 6 Jan 2015, Ming Lin wrote:
>>
>> Hi Keith,
>>
>> Tested with qemu-nvme. Hotplug seems to work, but has some issues.
>>
>> On guest: run fio for a while
>> root@block:~# fio --name=global --filename=/dev/nvme0n1 --direct=1 --bs=4k \
>>     --rw=randrw --ioengine=libaio --iodepth=128 --name=foobar
>> foobar: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=128
>> 2.0.8
>> Starting 1 process
>> Jobs: 1 (f=1): [m] [10.3% done] [3746K/3980K /s] [936 /995  iops] [eta 08m:52s]
>>
>> On guest: then remove nvme device
>> root@block:~# time echo 1 > /sys/devices/pci0000:00/0000:00:04.0/remove
>> real 1m0.040s
>> user 0m0.000s
>> sys 0m0.012s
>>
>> It works, but it took a long time (1 minute) to return.
>> During that time, the whole qemu system was unresponsive.
>> Pings from the host to the guest also failed.
>
>
> Thanks for the feedback. Is there anything special about your emulated
> device? I haven't been able to duplicate these results. When running
> the same fio job, I get ~0.4s on qemu, and ~2.2s on real h/w.
>
> A 1 minute removal sounds like you're hitting the admin timeout on
> queue deletion.
>
> On the other hand, your namespace backing storage appears to be really
> slow, so maybe it's stuck in blk_flush on shutdown. You say the guest
> was unresponsive during removal, so that sounds plausible since qemu's
> nvme shutdown is done synchronously with the CC register MMIO write.
>
> What if you try this using a ramdisk or tmpfs file for the namespace
> storage instead of what you're currently using?

It's good with a tmpfs file.

root@block:~# time echo 1 > /sys/devices/pci0000:00/0000:00:04.0/remove
real 0m0.403s
user 0m0.000s
sys 0m0.008s

Thanks.

>
> Or add some prints in both the driver's and emulated target's shutdown
> path and see where you're getting stuck?


Thread overview: 20+ messages
2015-01-07  2:57 [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Keith Busch
2015-01-07  2:57 ` [PATCHv2 01/10] blk-mq: Wake tasks entering queue on dying Keith Busch
2015-01-07 16:18   ` Jens Axboe
2015-01-07  2:57 ` [PATCHv2 02/10] blk-mq: Export test for started requests Keith Busch
2015-01-07  2:57 ` [PATCHv2 03/10] blk-mq: Let drivers cancel requeue_work Keith Busch
2015-01-07  2:57 ` [PATCHv2 04/10] blk-mq: Export abort requeue list Keith Busch
2015-01-07  2:57 ` [PATCHv2 05/10] blk-mq: Allow requests to never expire Keith Busch
2015-01-07 16:16   ` Jens Axboe
2015-01-07  2:58 ` [PATCHv2 06/10] blk-mq: End unstarted requests on a dying queue Keith Busch
2015-01-07  2:58 ` [PATCHv2 07/10] NVMe: Start driver allocated requests Keith Busch
2015-01-07  2:58 ` [PATCHv2 08/10] NVMe: Start and stop h/w queues on reset Keith Busch
2015-01-07  3:08   ` Keith Busch
2015-01-07  2:58 ` [PATCHv2 09/10] NVMe: Admin queue error handling Keith Busch
2015-01-07  2:58 ` [PATCHv2 10/10] NVMe: Command abort handling fixes Keith Busch
2015-01-07  7:42 ` [PATCHv2 00/10] Second attempt at blk-mq + nvme hotplug fixes Ming Lin
2015-01-07 16:20   ` Keith Busch
2015-01-07 19:36     ` Ming Lin
2015-01-07 16:22 ` Jens Axboe
2015-01-07 16:31   ` Keith Busch
2015-01-07 16:34     ` Jens Axboe
