* [PATCH v5 01/14] blk-mq: Do not invoke .queue_rq() for a stopped queue
From: Bart Van Assche @ 2016-10-29 0:18 UTC (permalink / raw)
The meaning of the BLK_MQ_S_STOPPED flag is "do not call
.queue_rq()". Hence modify blk_mq_make_request() such that requests
are queued instead of issued if a queue has been stopped.
Reported-by: Ming Lei <tom.leiming at gmail.com>
Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Ming Lei <tom.leiming at gmail.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
Cc: <stable at vger.kernel.org>
---
block/blk-mq.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f3d27a6..ad459e4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1332,9 +1332,9 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
blk_mq_put_ctx(data.ctx);
if (!old_rq)
goto done;
- if (!blk_mq_direct_issue_request(old_rq, &cookie))
- goto done;
- blk_mq_insert_request(old_rq, false, true, true);
+ if (test_bit(BLK_MQ_S_STOPPED, &data.hctx->state) ||
+ blk_mq_direct_issue_request(old_rq, &cookie) != 0)
+ blk_mq_insert_request(old_rq, false, true, true);
goto done;
}
--
2.10.1
* [PATCH v5 02/14] blk-mq: Introduce blk_mq_hctx_stopped()
From: Bart Van Assche @ 2016-10-29 0:19 UTC (permalink / raw)
Multiple functions test the BLK_MQ_S_STOPPED bit, so introduce
a helper function that performs this test.
Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
Reviewed-by: Ming Lei <tom.leiming at gmail.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
Reviewed-by: Christoph Hellwig <hch at lst.de>
---
block/blk-mq.c | 12 ++++++------
block/blk-mq.h | 5 +++++
2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ad459e4..bc1f462 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -787,7 +787,7 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
struct list_head *dptr;
int queued;
- if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
+ if (unlikely(blk_mq_hctx_stopped(hctx)))
return;
WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
@@ -912,8 +912,8 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
{
- if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state) ||
- !blk_mq_hw_queue_mapped(hctx)))
+ if (unlikely(blk_mq_hctx_stopped(hctx) ||
+ !blk_mq_hw_queue_mapped(hctx)))
return;
if (!async && !(hctx->flags & BLK_MQ_F_BLOCKING)) {
@@ -938,7 +938,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
queue_for_each_hw_ctx(q, hctx, i) {
if ((!blk_mq_hctx_has_pending(hctx) &&
list_empty_careful(&hctx->dispatch)) ||
- test_bit(BLK_MQ_S_STOPPED, &hctx->state))
+ blk_mq_hctx_stopped(hctx))
continue;
blk_mq_run_hw_queue(hctx, async);
@@ -988,7 +988,7 @@ void blk_mq_start_stopped_hw_queues(struct request_queue *q, bool async)
int i;
queue_for_each_hw_ctx(q, hctx, i) {
- if (!test_bit(BLK_MQ_S_STOPPED, &hctx->state))
+ if (!blk_mq_hctx_stopped(hctx))
continue;
clear_bit(BLK_MQ_S_STOPPED, &hctx->state);
@@ -1332,7 +1332,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
blk_mq_put_ctx(data.ctx);
if (!old_rq)
goto done;
- if (test_bit(BLK_MQ_S_STOPPED, &data.hctx->state) ||
+ if (blk_mq_hctx_stopped(data.hctx) ||
blk_mq_direct_issue_request(old_rq, &cookie) != 0)
blk_mq_insert_request(old_rq, false, true, true);
goto done;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index e5d2524..ac772da 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -100,6 +100,11 @@ static inline void blk_mq_set_alloc_data(struct blk_mq_alloc_data *data,
data->hctx = hctx;
}
+static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
+{
+ return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
+}
+
static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
{
return hctx->nr_ctx && hctx->tags;
--
2.10.1
* [PATCH v5 03/14] blk-mq: Introduce blk_mq_queue_stopped()
From: Bart Van Assche @ 2016-10-29 0:19 UTC (permalink / raw)
The function blk_queue_stopped() makes it possible to test whether
or not a traditional request queue has been stopped. Introduce a
helper function that allows block drivers to query easily whether or
not one or more hardware contexts of a blk-mq queue have been stopped.
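As an illustration (editor's sketch, not part of this patch), a driver
could use the new helper to avoid kicking the requeue list while a
queue is stopped; patch 14 of this series applies exactly this pattern
in nvme_requeue_req(). The example_kick_requeue_list() name is
hypothetical:

        /* Sketch: only kick the requeue list if no hardware queue is stopped. */
        static void example_kick_requeue_list(struct request_queue *q)
        {
                if (!blk_mq_queue_stopped(q))
                        blk_mq_kick_requeue_list(q);
        }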
Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
Reviewed-by: Christoph Hellwig <hch at lst.de>
---
block/blk-mq.c | 20 ++++++++++++++++++++
include/linux/blk-mq.h | 1 +
2 files changed, 21 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index bc1f462..283e0eb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -946,6 +946,26 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
}
EXPORT_SYMBOL(blk_mq_run_hw_queues);
+/**
+ * blk_mq_queue_stopped() - check whether one or more hctxs have been stopped
+ * @q: request queue.
+ *
+ * The caller is responsible for serializing this function against
+ * blk_mq_{start,stop}_hw_queue().
+ */
+bool blk_mq_queue_stopped(struct request_queue *q)
+{
+ struct blk_mq_hw_ctx *hctx;
+ int i;
+
+ queue_for_each_hw_ctx(q, hctx, i)
+ if (blk_mq_hctx_stopped(hctx))
+ return true;
+
+ return false;
+}
+EXPORT_SYMBOL(blk_mq_queue_stopped);
+
void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
{
cancel_work(&hctx->run_work);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 535ab2e..aa93000 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -223,6 +223,7 @@ void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs
void blk_mq_abort_requeue_list(struct request_queue *q);
void blk_mq_complete_request(struct request *rq, int error);
+bool blk_mq_queue_stopped(struct request_queue *q);
void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx);
void blk_mq_start_hw_queue(struct blk_mq_hw_ctx *hctx);
void blk_mq_stop_hw_queues(struct request_queue *q);
--
2.10.1
* [PATCH v5 04/14] blk-mq: Move more code into blk_mq_direct_issue_request()
From: Bart Van Assche @ 2016-10-29 0:20 UTC (permalink / raw)
Move the "hctx stopped" test and the insert request calls into
blk_mq_direct_issue_request(). Rename that function into
blk_mq_try_issue_directly() to reflect its new semantics. Pass
the hctx pointer to that function instead of looking it up a
second time. These changes avoid code duplication in the next patch.
Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
Reviewed-by: Christoph Hellwig <hch at lst.de>
---
block/blk-mq.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 283e0eb..c8ae970 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1243,11 +1243,11 @@ static struct request *blk_mq_map_request(struct request_queue *q,
return rq;
}
-static int blk_mq_direct_issue_request(struct request *rq, blk_qc_t *cookie)
+static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
+ struct request *rq, blk_qc_t *cookie)
{
int ret;
struct request_queue *q = rq->q;
- struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
struct blk_mq_queue_data bd = {
.rq = rq,
.list = NULL,
@@ -1255,6 +1255,9 @@ static int blk_mq_direct_issue_request(struct request *rq, blk_qc_t *cookie)
};
blk_qc_t new_cookie = blk_tag_to_qc_t(rq->tag, hctx->queue_num);
+ if (blk_mq_hctx_stopped(hctx))
+ goto insert;
+
/*
* For OK queue, we are done. For error, kill it. Any other
* error (busy), just add it to our list as we previously
@@ -1263,7 +1266,7 @@ static int blk_mq_direct_issue_request(struct request *rq, blk_qc_t *cookie)
ret = q->mq_ops->queue_rq(hctx, &bd);
if (ret == BLK_MQ_RQ_QUEUE_OK) {
*cookie = new_cookie;
- return 0;
+ return;
}
__blk_mq_requeue_request(rq);
@@ -1272,10 +1275,11 @@ static int blk_mq_direct_issue_request(struct request *rq, blk_qc_t *cookie)
*cookie = BLK_QC_T_NONE;
rq->errors = -EIO;
blk_mq_end_request(rq, rq->errors);
- return 0;
+ return;
}
- return -1;
+insert:
+ blk_mq_insert_request(rq, false, true, true);
}
/*
@@ -1352,9 +1356,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
blk_mq_put_ctx(data.ctx);
if (!old_rq)
goto done;
- if (blk_mq_hctx_stopped(data.hctx) ||
- blk_mq_direct_issue_request(old_rq, &cookie) != 0)
- blk_mq_insert_request(old_rq, false, true, true);
+ blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
goto done;
}
--
2.10.1
* [PATCH v5 05/14] blk-mq: Avoid that requeueing starts stopped queues
From: Bart Van Assche @ 2016-10-29 0:20 UTC (permalink / raw)
Since blk_mq_requeue_work() starts stopped queues, and since
execution of this function can be scheduled after a queue has
been stopped, it is not possible to stop queues without using
an additional state variable to track whether or not the queue
has been stopped. Hence modify blk_mq_requeue_work() such that it
does not start stopped queues. My conclusion after a review of
the blk_mq_stop_hw_queues() and blk_mq_{delay_,}kick_requeue_list()
callers is as follows:
* In the dm driver starting and stopping queues should only happen
if __dm_suspend() or __dm_resume() is called and not if the
requeue list is processed.
* In the SCSI core queue stopping and starting should only be
performed by the scsi_internal_device_block() and
scsi_internal_device_unblock() functions but not by any other
function. Although the blk_mq_stop_hw_queue() call in
scsi_queue_rq() may help to reduce CPU load if a LLD queue is
full, figuring out whether or not a queue should be restarted
when requeueing a command would require introducing additional
locking in scsi_mq_requeue_cmd() to avoid a race with
scsi_internal_device_block(). Avoid this complexity by removing
the blk_mq_stop_hw_queue() call from scsi_queue_rq().
* In the NVMe core only the functions that call
blk_mq_start_stopped_hw_queues() explicitly should start stopped
queues.
* A blk_mq_start_stopped_hw_queues() call must be added in the
xen-blkfront driver in its blkif_recover() function; a sketch of
the resulting driver-side pattern follows this list.
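Editor's illustration of the resulting driver-side pattern (assumed,
not taken verbatim from this series): a driver that stopped its
hardware queues must now restart them explicitly before kicking the
requeue list, because the requeue work no longer does so implicitly.
The example_restart_queues() name is hypothetical; compare the
xen-blkfront and dm hunks in this series.

        /* Sketch: restart stopped queues, then process the requeue list. */
        static void example_restart_queues(struct request_queue *q)
        {
                blk_mq_start_stopped_hw_queues(q, true);
                blk_mq_kick_requeue_list(q);
        }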
Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
Cc: Roger Pau Monné <roger.pau at citrix.com>
Cc: Mike Snitzer <snitzer at redhat.com>
Cc: James Bottomley <jejb at linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen at oracle.com>
---
block/blk-mq.c | 6 +-----
drivers/block/xen-blkfront.c | 1 +
drivers/md/dm-rq.c | 7 +------
drivers/scsi/scsi_lib.c | 1 -
4 files changed, 3 insertions(+), 12 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c8ae970..fe367b5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -503,11 +503,7 @@ static void blk_mq_requeue_work(struct work_struct *work)
blk_mq_insert_request(rq, false, false, false);
}
- /*
- * Use the start variant of queue running here, so that running
- * the requeue work will kick stopped queues.
- */
- blk_mq_start_hw_queues(q);
+ blk_mq_run_hw_queues(q, false);
}
void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 9908597..60fff99 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2045,6 +2045,7 @@ static int blkif_recover(struct blkfront_info *info)
BUG_ON(req->nr_phys_segments > segs);
blk_mq_requeue_request(req);
}
+ blk_mq_start_stopped_hwqueues(info->rq);
blk_mq_kick_requeue_list(info->rq);
while ((bio = bio_list_pop(&info->bio_list)) != NULL) {
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 1d0d2ad..1794de5 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -338,12 +338,7 @@ static void dm_old_requeue_request(struct request *rq)
static void __dm_mq_kick_requeue_list(struct request_queue *q, unsigned long msecs)
{
- unsigned long flags;
-
- spin_lock_irqsave(q->queue_lock, flags);
- if (!blk_queue_stopped(q))
- blk_mq_delay_kick_requeue_list(q, msecs);
- spin_unlock_irqrestore(q->queue_lock, flags);
+ blk_mq_delay_kick_requeue_list(q, msecs);
}
void dm_mq_kick_requeue_list(struct mapped_device *md)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2cca9cf..126a784 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1941,7 +1941,6 @@ static int scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
out:
switch (ret) {
case BLK_MQ_RQ_QUEUE_BUSY:
- blk_mq_stop_hw_queue(hctx);
if (atomic_read(&sdev->device_busy) == 0 &&
!scsi_device_blocked(sdev))
blk_mq_delay_queue(hctx, SCSI_QUEUE_DELAY);
--
2.10.1
* [PATCH v5 06/14] blk-mq: Remove blk_mq_cancel_requeue_work()
From: Bart Van Assche @ 2016-10-29 0:20 UTC (permalink / raw)
Since blk_mq_requeue_work() no longer restarts stopped queues,
canceling requeue work is no longer needed to prevent a
stopped queue from being restarted. Hence remove this function.
Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
Cc: Mike Snitzer <snitzer at redhat.com>
Cc: Keith Busch <keith.busch at intel.com>
Cc: Christoph Hellwig <hch at lst.de>
Cc: Hannes Reinecke <hare at suse.com>
Cc: Sagi Grimberg <sagi at grimberg.me>
Cc: Johannes Thumshirn <jthumshirn at suse.de>
---
block/blk-mq.c | 6 ------
drivers/md/dm-rq.c | 2 --
drivers/nvme/host/core.c | 1 -
include/linux/blk-mq.h | 1 -
4 files changed, 10 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index fe367b5..534128a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -528,12 +528,6 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
}
EXPORT_SYMBOL(blk_mq_add_to_requeue_list);
-void blk_mq_cancel_requeue_work(struct request_queue *q)
-{
- cancel_delayed_work_sync(&q->requeue_work);
-}
-EXPORT_SYMBOL_GPL(blk_mq_cancel_requeue_work);
-
void blk_mq_kick_requeue_list(struct request_queue *q)
{
kblockd_schedule_delayed_work(&q->requeue_work, 0);
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 1794de5..2b82496 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -116,8 +116,6 @@ static void dm_mq_stop_queue(struct request_queue *q)
queue_flag_set(QUEUE_FLAG_STOPPED, q);
spin_unlock_irqrestore(q->queue_lock, flags);
- /* Avoid that requeuing could restart the queue. */
- blk_mq_cancel_requeue_work(q);
blk_mq_stop_hw_queues(q);
}
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 79e679d..ab5f59e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2083,7 +2083,6 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
spin_unlock_irq(ns->queue->queue_lock);
- blk_mq_cancel_requeue_work(ns->queue);
blk_mq_stop_hw_queues(ns->queue);
}
mutex_unlock(&ctrl->namespaces_mutex);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index aa93000..a85a20f 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -217,7 +217,6 @@ void __blk_mq_end_request(struct request *rq, int error);
void blk_mq_requeue_request(struct request *rq);
void blk_mq_add_to_requeue_list(struct request *rq, bool at_head);
-void blk_mq_cancel_requeue_work(struct request_queue *q);
void blk_mq_kick_requeue_list(struct request_queue *q);
void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
void blk_mq_abort_requeue_list(struct request_queue *q);
--
2.10.1
* [PATCH v5 07/14] blk-mq: Introduce blk_mq_quiesce_queue()
From: Bart Van Assche @ 2016-10-29 0:21 UTC (permalink / raw)
blk_mq_quiesce_queue() waits until ongoing .queue_rq() invocations
have finished. This function does *not* wait until all outstanding
requests have finished (i.e., it does not wait until request.end_io()
has been invoked).
The algorithm used by blk_mq_quiesce_queue() is as follows:
* Hold either an RCU read lock or an SRCU read lock around
.queue_rq() calls. The former is used if .queue_rq() does not
block and the latter if .queue_rq() may block.
* blk_mq_quiesce_queue() first calls blk_mq_stop_hw_queues()
followed by synchronize_srcu() or synchronize_rcu(). The latter
call waits for .queue_rq() invocations that started before
blk_mq_quiesce_queue() was called.
* The blk_mq_hctx_stopped() calls that control whether or not
.queue_rq() will be called are called with the (S)RCU read lock
held. This is necessary to avoid race conditions against
blk_mq_quiesce_queue().
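To make the pairing concrete, here is a simplified sketch (derived
from the diff below) of the non-blocking case, i.e. assuming
.queue_rq() does not sleep; the BLK_MQ_F_BLOCKING case is identical
with srcu_read_lock()/synchronize_srcu():

        /* Dispatch side: test the stopped flag inside the read-side section. */
        rcu_read_lock();
        if (!blk_mq_hctx_stopped(hctx))
                ret = q->mq_ops->queue_rq(hctx, &bd);
        rcu_read_unlock();

        /* Quiesce side: stop the queues, then wait for readers to drain. */
        blk_mq_stop_hw_queues(q);       /* sets BLK_MQ_S_STOPPED on every hctx */
        synchronize_rcu();              /* waits for in-flight .queue_rq() calls */

Because the stopped test happens inside the read-side section, every
.queue_rq() call either observes BLK_MQ_S_STOPPED or finishes before
synchronize_rcu() returns.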
Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
Cc: Christoph Hellwig <hch at lst.de>
Cc: Ming Lei <tom.leiming at gmail.com>
Cc: Hannes Reinecke <hare at suse.com>
Cc: Johannes Thumshirn <jthumshirn at suse.de>
---
block/Kconfig | 1 +
block/blk-mq.c | 71 +++++++++++++++++++++++++++++++++++++++++++++-----
include/linux/blk-mq.h | 3 +++
include/linux/blkdev.h | 1 +
4 files changed, 69 insertions(+), 7 deletions(-)
diff --git a/block/Kconfig b/block/Kconfig
index 1d4d624..0562ef9 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -5,6 +5,7 @@ menuconfig BLOCK
bool "Enable the block layer" if EXPERT
default y
select SBITMAP
+ select SRCU
help
Provide block layer support for the kernel.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 534128a..96015a9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -115,6 +115,33 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
}
EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
+/**
+ * blk_mq_quiesce_queue() - wait until all ongoing queue_rq calls have finished
+ * @q: request queue.
+ *
+ * Note: this function does not prevent that the struct request end_io()
+ * callback function is invoked. Additionally, it is not prevented that
+ * new queue_rq() calls occur unless the queue has been stopped first.
+ */
+void blk_mq_quiesce_queue(struct request_queue *q)
+{
+ struct blk_mq_hw_ctx *hctx;
+ unsigned int i;
+ bool rcu = false;
+
+ blk_mq_stop_hw_queues(q);
+
+ queue_for_each_hw_ctx(q, hctx, i) {
+ if (hctx->flags & BLK_MQ_F_BLOCKING)
+ synchronize_srcu(&hctx->queue_rq_srcu);
+ else
+ rcu = true;
+ }
+ if (rcu)
+ synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
+
void blk_mq_wake_waiters(struct request_queue *q)
{
struct blk_mq_hw_ctx *hctx;
@@ -768,7 +795,7 @@ static inline unsigned int queued_to_index(unsigned int queued)
* of IO. In particular, we'd like FIFO behaviour on handling existing
* items on the hctx->dispatch list. Ignore that for now.
*/
-static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+static void blk_mq_process_rq_list(struct blk_mq_hw_ctx *hctx)
{
struct request_queue *q = hctx->queue;
struct request *rq;
@@ -780,9 +807,6 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
if (unlikely(blk_mq_hctx_stopped(hctx)))
return;
- WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
- cpu_online(hctx->next_cpu));
-
hctx->run++;
/*
@@ -873,6 +897,24 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
}
}
+static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
+{
+ int srcu_idx;
+
+ WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
+ cpu_online(hctx->next_cpu));
+
+ if (!(hctx->flags & BLK_MQ_F_BLOCKING)) {
+ rcu_read_lock();
+ blk_mq_process_rq_list(hctx);
+ rcu_read_unlock();
+ } else {
+ srcu_idx = srcu_read_lock(&hctx->queue_rq_srcu);
+ blk_mq_process_rq_list(hctx);
+ srcu_read_unlock(&hctx->queue_rq_srcu, srcu_idx);
+ }
+}
+
/*
* It'd be great if the workqueue API had a way to pass
* in a mask and had some smarts for more clever placement.
@@ -1283,7 +1325,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
const int is_flush_fua = bio->bi_opf & (REQ_PREFLUSH | REQ_FUA);
struct blk_map_ctx data;
struct request *rq;
- unsigned int request_count = 0;
+ unsigned int request_count = 0, srcu_idx;
struct blk_plug *plug;
struct request *same_queue_rq = NULL;
blk_qc_t cookie;
@@ -1326,7 +1368,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
blk_mq_bio_to_request(rq, bio);
/*
- * We do limited pluging. If the bio can be merged, do that.
+ * We do limited plugging. If the bio can be merged, do that.
* Otherwise the existing request in the plug list will be
* issued. So the plug list will have one request at most
*/
@@ -1346,7 +1388,16 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
blk_mq_put_ctx(data.ctx);
if (!old_rq)
goto done;
- blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
+
+ if (!(data.hctx->flags & BLK_MQ_F_BLOCKING)) {
+ rcu_read_lock();
+ blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
+ rcu_read_unlock();
+ } else {
+ srcu_idx = srcu_read_lock(&data.hctx->queue_rq_srcu);
+ blk_mq_try_issue_directly(data.hctx, old_rq, &cookie);
+ srcu_read_unlock(&data.hctx->queue_rq_srcu, srcu_idx);
+ }
goto done;
}
@@ -1625,6 +1676,9 @@ static void blk_mq_exit_hctx(struct request_queue *q,
if (set->ops->exit_hctx)
set->ops->exit_hctx(hctx, hctx_idx);
+ if (hctx->flags & BLK_MQ_F_BLOCKING)
+ cleanup_srcu_struct(&hctx->queue_rq_srcu);
+
blk_mq_remove_cpuhp(hctx);
blk_free_flush_queue(hctx->fq);
sbitmap_free(&hctx->ctx_map);
@@ -1705,6 +1759,9 @@ static int blk_mq_init_hctx(struct request_queue *q,
flush_start_tag + hctx_idx, node))
goto free_fq;
+ if (hctx->flags & BLK_MQ_F_BLOCKING)
+ init_srcu_struct(&hctx->queue_rq_srcu);
+
return 0;
free_fq:
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index a85a20f..ed20ac7 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -3,6 +3,7 @@
#include <linux/blkdev.h>
#include <linux/sbitmap.h>
+#include <linux/srcu.h>
struct blk_mq_tags;
struct blk_flush_queue;
@@ -35,6 +36,8 @@ struct blk_mq_hw_ctx {
struct blk_mq_tags *tags;
+ struct srcu_struct queue_rq_srcu;
+
unsigned long queued;
unsigned long run;
#define BLK_MQ_MAX_DISPATCH_ORDER 7
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..8259d87 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -824,6 +824,7 @@ extern void __blk_run_queue(struct request_queue *q);
extern void __blk_run_queue_uncond(struct request_queue *q);
extern void blk_run_queue(struct request_queue *);
extern void blk_run_queue_async(struct request_queue *q);
+extern void blk_mq_quiesce_queue(struct request_queue *q);
extern int blk_rq_map_user(struct request_queue *, struct request *,
struct rq_map_data *, void __user *, unsigned long,
gfp_t);
--
2.10.1
* [PATCH v5 08/14] blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request()
From: Bart Van Assche @ 2016-10-29 0:21 UTC (permalink / raw)
Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls
are followed by kicking the requeue list. Hence add an argument to
these two functions that allows the caller to kick the requeue list.
This was proposed by Christoph Hellwig.
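For example (editor's sketch; rq1, rq2 and q are hypothetical names),
a caller that requeues several requests can now suppress the
per-request kick and kick the list once at the end; the xen-blkfront
hunk below uses this pattern:

        blk_mq_requeue_request(rq1, false);     /* defer the kick */
        blk_mq_requeue_request(rq2, false);     /* defer the kick */
        blk_mq_kick_requeue_list(q);            /* one kick for the whole batch */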
Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Cc: Hannes Reinecke <hare at suse.com>
Cc: Sagi Grimberg <sagi at grimberg.me>
---
block/blk-flush.c | 5 +----
block/blk-mq.c | 10 +++++++---
drivers/block/xen-blkfront.c | 2 +-
drivers/md/dm-rq.c | 2 +-
drivers/nvme/host/core.c | 2 +-
drivers/scsi/scsi_lib.c | 4 +---
include/linux/blk-mq.h | 5 +++--
7 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 3c882cb..7e9950b 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -134,10 +134,7 @@ static void blk_flush_restore_request(struct request *rq)
static bool blk_flush_queue_rq(struct request *rq, bool add_front)
{
if (rq->q->mq_ops) {
- struct request_queue *q = rq->q;
-
- blk_mq_add_to_requeue_list(rq, add_front);
- blk_mq_kick_requeue_list(q);
+ blk_mq_add_to_requeue_list(rq, add_front, true);
return false;
} else {
if (add_front)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 96015a9..b7753ae 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -494,12 +494,12 @@ static void __blk_mq_requeue_request(struct request *rq)
}
}
-void blk_mq_requeue_request(struct request *rq)
+void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
{
__blk_mq_requeue_request(rq);
BUG_ON(blk_queued_rq(rq));
- blk_mq_add_to_requeue_list(rq, true);
+ blk_mq_add_to_requeue_list(rq, true, kick_requeue_list);
}
EXPORT_SYMBOL(blk_mq_requeue_request);
@@ -533,7 +533,8 @@ static void blk_mq_requeue_work(struct work_struct *work)
blk_mq_run_hw_queues(q, false);
}
-void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
+void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
+ bool kick_requeue_list)
{
struct request_queue *q = rq->q;
unsigned long flags;
@@ -552,6 +553,9 @@ void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
list_add_tail(&rq->queuelist, &q->requeue_list);
}
spin_unlock_irqrestore(&q->requeue_lock, flags);
+
+ if (kick_requeue_list)
+ blk_mq_kick_requeue_list(q);
}
EXPORT_SYMBOL(blk_mq_add_to_requeue_list);
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 60fff99..a3e1727 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2043,7 +2043,7 @@ static int blkif_recover(struct blkfront_info *info)
/* Requeue pending requests (flush or discard) */
list_del_init(&req->queuelist);
BUG_ON(req->nr_phys_segments > segs);
- blk_mq_requeue_request(req);
+ blk_mq_requeue_request(req, false);
}
blk_mq_start_stopped_hwqueues(info->rq);
blk_mq_kick_requeue_list(info->rq);
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 2b82496..4b62e74 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -347,7 +347,7 @@ EXPORT_SYMBOL(dm_mq_kick_requeue_list);
static void dm_mq_delay_requeue_request(struct request *rq, unsigned long msecs)
{
- blk_mq_requeue_request(rq);
+ blk_mq_requeue_request(rq, false);
__dm_mq_kick_requeue_list(rq->q, msecs);
}
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ab5f59e..8403996 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -203,7 +203,7 @@ void nvme_requeue_req(struct request *req)
{
unsigned long flags;
- blk_mq_requeue_request(req);
+ blk_mq_requeue_request(req, false);
spin_lock_irqsave(req->q->queue_lock, flags);
if (!blk_queue_stopped(req->q))
blk_mq_kick_requeue_list(req->q);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 126a784..b4f682c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -86,10 +86,8 @@ scsi_set_blocked(struct scsi_cmnd *cmd, int reason)
static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd)
{
struct scsi_device *sdev = cmd->device;
- struct request_queue *q = cmd->request->q;
- blk_mq_requeue_request(cmd->request);
- blk_mq_kick_requeue_list(q);
+ blk_mq_requeue_request(cmd->request, true);
put_device(&sdev->sdev_gendev);
}
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index ed20ac7..35a0af5 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -218,8 +218,9 @@ void blk_mq_start_request(struct request *rq);
void blk_mq_end_request(struct request *rq, int error);
void __blk_mq_end_request(struct request *rq, int error);
-void blk_mq_requeue_request(struct request *rq);
-void blk_mq_add_to_requeue_list(struct request *rq, bool at_head);
+void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list);
+void blk_mq_add_to_requeue_list(struct request *rq, bool at_head,
+ bool kick_requeue_list);
void blk_mq_kick_requeue_list(struct request_queue *q);
void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
void blk_mq_abort_requeue_list(struct request_queue *q);
--
2.10.1
* [PATCH v5 09/14] dm: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Bart Van Assche @ 2016-10-29 0:22 UTC (permalink / raw)
Instead of manipulating both QUEUE_FLAG_STOPPED and BLK_MQ_S_STOPPED
in the dm start and stop queue functions, only manipulate the latter
flag. Change blk_queue_stopped() tests into blk_mq_queue_stopped().
Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
Acked-by: Mike Snitzer <snitzer at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Hannes Reinecke <hare at suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
---
drivers/md/dm-rq.c | 16 +---------------
1 file changed, 1 insertion(+), 15 deletions(-)
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 4b62e74..0103031 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -75,12 +75,6 @@ static void dm_old_start_queue(struct request_queue *q)
static void dm_mq_start_queue(struct request_queue *q)
{
- unsigned long flags;
-
- spin_lock_irqsave(q->queue_lock, flags);
- queue_flag_clear(QUEUE_FLAG_STOPPED, q);
- spin_unlock_irqrestore(q->queue_lock, flags);
-
blk_mq_start_stopped_hw_queues(q, true);
blk_mq_kick_requeue_list(q);
}
@@ -105,16 +99,8 @@ static void dm_old_stop_queue(struct request_queue *q)
static void dm_mq_stop_queue(struct request_queue *q)
{
- unsigned long flags;
-
- spin_lock_irqsave(q->queue_lock, flags);
- if (blk_queue_stopped(q)) {
- spin_unlock_irqrestore(q->queue_lock, flags);
+ if (blk_mq_queue_stopped(q))
return;
- }
-
- queue_flag_set(QUEUE_FLAG_STOPPED, q);
- spin_unlock_irqrestore(q->queue_lock, flags);
blk_mq_stop_hw_queues(q);
}
--
2.10.1
* [PATCH v5 10/14] dm: Fix a race condition related to stopping and starting queues
From: Bart Van Assche @ 2016-10-29 0:22 UTC (permalink / raw)
Ensure that all ongoing dm_mq_queue_rq() and dm_mq_requeue_request()
calls have stopped before setting the "queue stopped" flag. This
makes it possible to remove the "queue stopped" test from
dm_mq_queue_rq() and dm_mq_requeue_request(). This patch fixes a
race condition: dm_mq_queue_rq() is called without holding the queue
lock, hence BLK_MQ_S_STOPPED can be set at any time while
dm_mq_queue_rq() is in progress. This patch prevents the following
hang from occurring sporadically when using dm-mq:
INFO: task systemd-udevd:10111 blocked for more than 480 seconds.
Call Trace:
[<ffffffff8161f397>] schedule+0x37/0x90
[<ffffffff816239ef>] schedule_timeout+0x27f/0x470
[<ffffffff8161e76f>] io_schedule_timeout+0x9f/0x110
[<ffffffff8161fb36>] bit_wait_io+0x16/0x60
[<ffffffff8161f929>] __wait_on_bit_lock+0x49/0xa0
[<ffffffff8114fe69>] __lock_page+0xb9/0xc0
[<ffffffff81165d90>] truncate_inode_pages_range+0x3e0/0x760
[<ffffffff81166120>] truncate_inode_pages+0x10/0x20
[<ffffffff81212a20>] kill_bdev+0x30/0x40
[<ffffffff81213d41>] __blkdev_put+0x71/0x360
[<ffffffff81214079>] blkdev_put+0x49/0x170
[<ffffffff812141c0>] blkdev_close+0x20/0x30
[<ffffffff811d48e8>] __fput+0xe8/0x1f0
[<ffffffff811d4a29>] ____fput+0x9/0x10
[<ffffffff810842d3>] task_work_run+0x83/0xb0
[<ffffffff8106606e>] do_exit+0x3ee/0xc40
[<ffffffff8106694b>] do_group_exit+0x4b/0xc0
[<ffffffff81073d9a>] get_signal+0x2ca/0x940
[<ffffffff8101bf43>] do_signal+0x23/0x660
[<ffffffff810022b3>] exit_to_usermode_loop+0x73/0xb0
[<ffffffff81002cb0>] syscall_return_slowpath+0xb0/0xc0
[<ffffffff81624e33>] entry_SYSCALL_64_fastpath+0xa6/0xa8
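The race window being closed can be sketched as follows (illustrative
timeline, not from the original posting):

        CPU 0 (dm_mq_queue_rq)                  CPU 1 (dm_mq_stop_queue)
        test_bit(BLK_MQ_S_STOPPED) == 0
                                                blk_mq_stop_hw_queues(q)
                                                /* BLK_MQ_S_STOPPED now set */
        /* continues and issues the request
           against a stopped queue */

blk_mq_quiesce_queue() closes this window because it waits, via
(S)RCU, until in-flight dm_mq_queue_rq() calls have finished before
dm_mq_stop_queue() returns.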
Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
Acked-by: Mike Snitzer <snitzer at redhat.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
Reviewed-by: Christoph Hellwig <hch at lst.de>
---
drivers/md/dm-rq.c | 13 +------------
1 file changed, 1 insertion(+), 12 deletions(-)
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 0103031..f9f37ad 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -102,7 +102,7 @@ static void dm_mq_stop_queue(struct request_queue *q)
if (blk_mq_queue_stopped(q))
return;
- blk_mq_stop_hw_queues(q);
+ blk_mq_quiesce_queue(q);
}
void dm_stop_queue(struct request_queue *q)
@@ -883,17 +883,6 @@ static int dm_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
dm_put_live_table(md, srcu_idx);
}
- /*
- * On suspend dm_stop_queue() handles stopping the blk-mq
- * request_queue BUT: even though the hw_queues are marked
- * BLK_MQ_S_STOPPED at that point there is still a race that
- * is allowing block/blk-mq.c to call ->queue_rq against a
- * hctx that it really shouldn't. The following check guards
- * against this rarity (albeit _not_ race-free).
- */
- if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
- return BLK_MQ_RQ_QUEUE_BUSY;
-
if (ti->type->busy && ti->type->busy(ti))
return BLK_MQ_RQ_QUEUE_BUSY;
--
2.10.1
* [PATCH v5 11/14] SRP transport: Move queuecommand() wait code to SCSI core
From: Bart Van Assche @ 2016-10-29 0:22 UTC (permalink / raw)
Move the code that waits for ongoing shost->hostt->queuecommand()
calls from the SRP transport driver into the SCSI core. Additionally,
rename srp_wait_for_queuecommand() into scsi_wait_for_queuecommand()
and add a comment about the queuecommand() call from scsi_send_eh_cmnd().
Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
Cc: James Bottomley <jejb at linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen at oracle.com>
Cc: Christoph Hellwig <hch at lst.de>
Cc: Sagi Grimberg <sagi at grimberg.me>
Cc: Doug Ledford <dledford at redhat.com>
---
drivers/scsi/scsi_lib.c | 38 ++++++++++++++++++++++++++++++++++++
drivers/scsi/scsi_transport_srp.c | 41 ++++++---------------------------------
2 files changed, 44 insertions(+), 35 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b4f682c..3ab9c87 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2721,6 +2721,39 @@ void sdev_evt_send_simple(struct scsi_device *sdev,
EXPORT_SYMBOL_GPL(sdev_evt_send_simple);
/**
+ * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
+ * @sdev: SCSI device to count the number of scsi_request_fn() callers for.
+ */
+static int scsi_request_fn_active(struct scsi_device *sdev)
+{
+ struct request_queue *q = sdev->request_queue;
+ int request_fn_active;
+
+ WARN_ON_ONCE(sdev->host->use_blk_mq);
+
+ spin_lock_irq(q->queue_lock);
+ request_fn_active = q->request_fn_active;
+ spin_unlock_irq(q->queue_lock);
+
+ return request_fn_active;
+}
+
+/**
+ * scsi_wait_for_queuecommand() - wait for ongoing queuecommand() calls
+ * @sdev: SCSI device pointer.
+ *
+ * Wait until the ongoing shost->hostt->queuecommand() calls that are
+ * invoked from scsi_request_fn() have finished.
+ */
+static void scsi_wait_for_queuecommand(struct scsi_device *sdev)
+{
+ WARN_ON_ONCE(sdev->host->use_blk_mq);
+
+ while (scsi_request_fn_active(sdev))
+ msleep(20);
+}
+
+/**
* scsi_device_quiesce - Block user issued commands.
* @sdev: scsi device to quiesce.
*
@@ -2814,6 +2847,10 @@ EXPORT_SYMBOL(scsi_target_resume);
* (which must be a legal transition). When the device is in this
* state, all commands are deferred until the scsi lld reenables
* the device with scsi_device_unblock or device_block_tmo fires.
+ *
+ * To do: avoid that scsi_send_eh_cmnd() calls queuecommand() after
+ * scsi_internal_device_block() has blocked a SCSI device and also
+ * remove the rport mutex lock and unlock calls from srp_queuecommand().
*/
int
scsi_internal_device_block(struct scsi_device *sdev)
@@ -2841,6 +2878,7 @@ scsi_internal_device_block(struct scsi_device *sdev)
spin_lock_irqsave(q->queue_lock, flags);
blk_stop_queue(q);
spin_unlock_irqrestore(q->queue_lock, flags);
+ scsi_wait_for_queuecommand(sdev);
}
return 0;
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index e3cd3ec..b48328a 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -24,7 +24,6 @@
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/string.h>
-#include <linux/delay.h>
#include <scsi/scsi.h>
#include <scsi/scsi_cmnd.h>
@@ -402,36 +401,6 @@ static void srp_reconnect_work(struct work_struct *work)
}
}
-/**
- * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
- * @shost: SCSI host for which to count the number of scsi_request_fn() callers.
- *
- * To do: add support for scsi-mq in this function.
- */
-static int scsi_request_fn_active(struct Scsi_Host *shost)
-{
- struct scsi_device *sdev;
- struct request_queue *q;
- int request_fn_active = 0;
-
- shost_for_each_device(sdev, shost) {
- q = sdev->request_queue;
-
- spin_lock_irq(q->queue_lock);
- request_fn_active += q->request_fn_active;
- spin_unlock_irq(q->queue_lock);
- }
-
- return request_fn_active;
-}
-
-/* Wait until ongoing shost->hostt->queuecommand() calls have finished. */
-static void srp_wait_for_queuecommand(struct Scsi_Host *shost)
-{
- while (scsi_request_fn_active(shost))
- msleep(20);
-}
-
static void __rport_fail_io_fast(struct srp_rport *rport)
{
struct Scsi_Host *shost = rport_to_shost(rport);
@@ -441,14 +410,17 @@ static void __rport_fail_io_fast(struct srp_rport *rport)
if (srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST))
return;
+ /*
+ * Call scsi_target_block() to wait for ongoing shost->queuecommand()
+ * calls before invoking i->f->terminate_rport_io().
+ */
+ scsi_target_block(rport->dev.parent);
scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
/* Involve the LLD if possible to terminate all I/O on the rport. */
i = to_srp_internal(shost->transportt);
- if (i->f->terminate_rport_io) {
- srp_wait_for_queuecommand(shost);
+ if (i->f->terminate_rport_io)
i->f->terminate_rport_io(rport);
- }
}
/**
@@ -576,7 +548,6 @@ int srp_reconnect_rport(struct srp_rport *rport)
if (res)
goto out;
scsi_target_block(&shost->shost_gendev);
- srp_wait_for_queuecommand(shost);
res = rport->state != SRP_RPORT_LOST ? i->f->reconnect(rport) : -ENODEV;
pr_debug("%s (state %d): transport.reconnect() returned %d\n",
dev_name(&shost->shost_gendev), rport->state, res);
--
2.10.1
* [PATCH v5 12/14] SRP transport, scsi-mq: Wait for .queue_rq() if necessary
From: Bart Van Assche @ 2016-10-29 0:23 UTC (permalink / raw)
Ensure that, if scsi-mq is enabled, scsi_internal_device_block()
waits until ongoing shost->hostt->queuecommand() calls have finished.
Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
Cc: James Bottomley <jejb at linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen at oracle.com>
Cc: Christoph Hellwig <hch at lst.de>
Cc: Sagi Grimberg <sagi at grimberg.me>
Cc: Doug Ledford <dledford at redhat.com>
---
drivers/scsi/scsi_lib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3ab9c87..1febc52 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2873,7 +2873,7 @@ scsi_internal_device_block(struct scsi_device *sdev)
* request queue.
*/
if (q->mq_ops) {
- blk_mq_stop_hw_queues(q);
+ blk_mq_quiesce_queue(q);
} else {
spin_lock_irqsave(q->queue_lock, flags);
blk_stop_queue(q);
--
2.10.1
* [PATCH v5 13/14] nvme: Fix a race condition related to stopping queues
From: Bart Van Assche @ 2016-10-29 0:23 UTC (permalink / raw)
Avoid that nvme_queue_rq() is still running when nvme_stop_queues()
returns.
Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
Cc: Keith Busch <keith.busch at intel.com>
Cc: Sagi Grimberg <sagi at grimberg.me>
Cc: Christoph Hellwig <hch at lst.de>
---
drivers/nvme/host/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8403996..fe15d94 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2083,7 +2083,7 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
spin_unlock_irq(ns->queue->queue_lock);
- blk_mq_stop_hw_queues(ns->queue);
+ blk_mq_quiesce_queue(ns->queue);
}
mutex_unlock(&ctrl->namespaces_mutex);
}
--
2.10.1
* [PATCH v5 14/14] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Bart Van Assche @ 2016-10-29 0:23 UTC (permalink / raw)
Make nvme_requeue_req() check BLK_MQ_S_STOPPED instead of
QUEUE_FLAG_STOPPED. Remove the QUEUE_FLAG_STOPPED manipulations
that became superfluous because of this change. Change
blk_queue_stopped() tests into blk_mq_queue_stopped().
This patch fixes a race condition: using queue_flag_clear_unlocked()
is not safe if any other function that manipulates the queue flags
can be called concurrently, e.g. blk_cleanup_queue().
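To illustrate the race (editor's sketch, assuming the non-atomic
read-modify-write of q->queue_flags that queue_flag_clear_unlocked()
performs in this kernel version), a concurrent flag update can be lost:

        CPU 0 (queue_flag_clear_unlocked)       CPU 1 (blk_cleanup_queue)
        old = q->queue_flags;
                                                queue_flag_set(QUEUE_FLAG_DYING, q);
        q->queue_flags = old & ~(1UL << QUEUE_FLAG_STOPPED);
                                                /* the DYING update is lost */

Testing BLK_MQ_S_STOPPED via blk_mq_hctx_stopped() and
blk_mq_queue_stopped() avoids touching q->queue_flags altogether.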
Signed-off-by: Bart Van Assche <bart.vanassche at sandisk.com>
Cc: Keith Busch <keith.busch at intel.com>
Cc: Christoph Hellwig <hch at lst.de>
Cc: Sagi Grimberg <sagi at grimberg.me>
---
drivers/nvme/host/core.c | 16 ++--------------
1 file changed, 2 insertions(+), 14 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fe15d94..45dd237 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -201,13 +201,7 @@ static struct nvme_ns *nvme_get_ns_from_disk(struct gendisk *disk)
void nvme_requeue_req(struct request *req)
{
- unsigned long flags;
-
- blk_mq_requeue_request(req, false);
- spin_lock_irqsave(req->q->queue_lock, flags);
- if (!blk_queue_stopped(req->q))
- blk_mq_kick_requeue_list(req->q);
- spin_unlock_irqrestore(req->q->queue_lock, flags);
+ blk_mq_requeue_request(req, !blk_mq_queue_stopped(req->q));
}
EXPORT_SYMBOL_GPL(nvme_requeue_req);
@@ -2078,13 +2072,8 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
struct nvme_ns *ns;
mutex_lock(&ctrl->namespaces_mutex);
- list_for_each_entry(ns, &ctrl->namespaces, list) {
- spin_lock_irq(ns->queue->queue_lock);
- queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
- spin_unlock_irq(ns->queue->queue_lock);
-
+ list_for_each_entry(ns, &ctrl->namespaces, list)
blk_mq_quiesce_queue(ns->queue);
- }
mutex_unlock(&ctrl->namespaces_mutex);
}
EXPORT_SYMBOL_GPL(nvme_stop_queues);
@@ -2095,7 +2084,6 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
mutex_lock(&ctrl->namespaces_mutex);
list_for_each_entry(ns, &ctrl->namespaces, list) {
- queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
blk_mq_start_stopped_hw_queues(ns->queue, true);
blk_mq_kick_requeue_list(ns->queue);
}
--
2.10.1
* [PATCH v5 14/14] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Laurence Oberman @ 2016-10-31 13:53 UTC (permalink / raw)
----- Original Message -----
> From: "Bart Van Assche" <bart.vanassche at sandisk.com>
> To: "Jens Axboe" <axboe at fb.com>
> Sent: Friday, October 28, 2016 8:23:40 PM
> Subject: [PATCH v5 14/14] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
>
> [...]
Hello Bart
Thanks for all this work.
Applied all 14 patches, and also corrected the xen-blkfront.c blkif_recover() part of patch v5 5/14.
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 9908597..60fff99 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2045,6 +2045,7 @@ static int blkif_recover(struct blkfront_info *info)
BUG_ON(req->nr_phys_segments > segs);
blk_mq_requeue_request(req);
}
+ blk_mq_start_stopped_hw_queues(info->rq, true); *** Corrected
blk_mq_kick_requeue_list(info->rq);
while ((bio = bio_list_pop(&info->bio_list)) != NULL) {
Ran multiple read/write buffered and directio tests via RDMA/SRP and mlx5 (100Gbit) with max_sectors_kb set to 1024, 2048, 4096 and 8196
Ran multiple read/write buffered and directio tests via RDMA/SRP and mlx4 (56Gbit) with max_sectors_kb set to 1024, 2048, 4096 and 8196
Reset the SRP hosts multiple times with multipath set to no_path_retry queue
Ran basic NVME read/write testing with no hot plug disconnects on multiple block sizes
All tests passed.
For the series:
Tested-by: Laurence Oberman <loberman at redhat.com>
* [PATCH v5 14/14] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Bart Van Assche @ 2016-10-31 13:59 UTC (permalink / raw)
On 10/31/2016 06:53 AM, Laurence Oberman wrote:
> Ran multiple read/write buffered and directio tests via RDMA/SRP and mlx5 (100Gbit) with max_sectors_kb set to 1024, 2048, 4096 and 8196
> Ran multiple read/write buffered and directio tests via RDMA/SRP and mlx4 (56Gbit) with max_sectors_kb set to 1024, 2048, 4096 and 8196
> Reset the SRP hosts multiple times with multipath set to no_path_retry queue
> Ran basic NVME read/write testing with no hot plug disconnects on multiple block sizes
>
> All tests passed.
>
> For the series:
> Tested-by: Laurence Oberman <loberman at redhat.com>
Hello Laurence,
Thanks for having tested this version of this patch series again so quickly!
Bart.
* [PATCH v5 14/14] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Bart Van Assche @ 2016-10-31 15:10 UTC (permalink / raw)
On Mon, 2016-10-31 at 09:53 -0400, Laurence Oberman wrote:
> Applied all 14 patches, and also corrected the xen-blkfront.c
> blkif_recover() part of patch v5 5/14.
>
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 9908597..60fff99 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -2045,6 +2045,7 @@ static int blkif_recover(struct blkfront_info *info)
>                 BUG_ON(req->nr_phys_segments > segs);
>                 blk_mq_requeue_request(req);
>         }
> +       blk_mq_start_stopped_hw_queues(info->rq, true);    *** Corrected
>         blk_mq_kick_requeue_list(info->rq);
>
>         while ((bio = bio_list_pop(&info->bio_list)) != NULL) {
Hello Laurence,
Sorry for the build failure. The way you changed xen-blkfront is indeed
what I intended. Apparently I forgot to enable Xen in my kernel config
...
Bart.
* [PATCH v5 0/14] Fix race conditions related to stopping block layer queues
From: Christoph Hellwig @ 2016-11-02 15:17 UTC (permalink / raw)
Hi Bart,
this whole series looks great to me:
Reviewed-by: Christoph Hellwig <hch at lst.de>
* [PATCH v5 0/14] Fix race conditions related to stopping block layer queues
From: Jens Axboe @ 2016-11-02 18:52 UTC (permalink / raw)
On 10/28/2016 06:18 PM, Bart Van Assche wrote:
> Hello Jens,
>
> Multiple block drivers need the functionality to stop a request queue
> and to wait until all ongoing request_fn() / queue_rq() calls have
> finished without waiting until all outstanding requests have finished.
> Hence this patch series that introduces the blk_mq_quiesce_queue()
> function. The dm-mq, SRP and NVMe patches in this patch series are three
> examples of where these functions are useful. These patches have been
> tested on top of kernel v4.9-rc2. The following tests have been run to
> verify this patch series:
> - Mike's mptest suite that stress-tests dm-multipath.
> - My own srp-test suite that stress-tests SRP on top of dm-multipath.
> - fio on top of the NVMeOF host driver that was connected to the NVMeOF
> target driver on the same host.
> - Laurence verified the previous version (v3) of this patch series by
> running it through the Red Hat SRP and NVMe test suites.
>
> The changes compared to the third version of this patch series are:
> - Added a blk_mq_stop_hw_queues() call in blk_mq_quiesce_queue() as
> requested by Ming Lei.
> - Modified scsi_unblock_target() such that it waits until
> .queuecommand() finished. Unexported scsi_wait_for_queuecommand().
> - Reordered the two NVMe patches.
> - Added a patch that avoids that blk_mq_requeue_work() restarts stopped
> queues.
> - Added a patch that removes blk_mq_cancel_requeue_work().
Bart, I have applied the series, except patches 11+12. I don't mind
taking them through the block tree, but they can go through the SCSI
tree as well. Let me know what folks think.
--
Jens Axboe
* [PATCH v5 0/14] Fix race conditions related to stopping block layer queues
From: Bart Van Assche @ 2016-11-02 19:35 UTC (permalink / raw)
On 11/02/2016 01:08 PM, Jens Axboe wrote:
> Bart, I have applied the series, except patches 11+12. I don't mind
> taking them through the block tree, but they can go through the SCSI
> tree as well. Let me know what folks think.
Hello Jens,
Thanks for having applied this patch series. I will resend the SCSI
patches to the SCSI maintainers. Since I am not aware of any bug report
for which these patches are the proper fix, I don't think there is any
urgency to apply the SCSI patches.
Bart.