* [PATCH 0/9] block/scsi: safe SCSI quiescing
@ 2017-08-31 17:27 Ming Lei
2017-08-31 17:27 ` [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead() Ming Lei
` (9 more replies)
0 siblings, 10 replies; 34+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley
Cc: Oleksandr Natalenko, Ming Lei
The current SCSI quiesce implementation isn't safe and makes it easy to trigger I/O deadlock.
Once a SCSI device is put into QUIESCE, no new request except for RQF_PREEMPT
can be dispatched to SCSI successfully, and scsi_device_quiesce() simply
waits for completion of the I/Os already dispatched to the SCSI stack. That isn't
enough at all, because new requests can still be allocated while none of
them can be dispatched, so the request pool is easily exhausted.
At that point a request with RQF_PREEMPT can't be allocated either, and the
system may hang forever, for example during system suspend or SCSI domain
validation. I/O hangs in both system suspend [1] and SCSI domain validation
have been reported before.
This patchset solves the issue by freezing the block queue during
SCSI quiescing, and allowing requests with RQF_PREEMPT to be allocated
while the queue is frozen.
Both SCSI and SCSI_MQ have this I/O deadlock issue; the patchset fixes
them both by unifying blk_freeze_queue() and blk_unfreeze_queue().
[1] https://marc.info/?t=150340250100013&r=3&w=2
Ming Lei (9):
percpu-refcount: introduce percpu_ref_is_dead()
blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue
blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue
blk-mq: only run hw queues for blk-mq
block: introduce blk_drain_queue()
blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
block: tracking request allocation with q_usage_counter
block: allow to allocate req with RQF_PREEMPT when queue is frozen
SCSI: freeze block queue when SCSI device is put into quiesce
block/bfq-iosched.c | 2 +-
block/blk-cgroup.c | 8 +++----
block/blk-core.c | 49 +++++++++++++++++++++++++++++++++--------
block/blk-mq.c | 49 +++++++++++++++++++++--------------------
block/blk-mq.h | 1 -
block/blk.h | 5 +++++
block/elevator.c | 4 ++--
drivers/block/loop.c | 16 +++++++-------
drivers/block/rbd.c | 2 +-
drivers/ide/ide-pm.c | 3 ++-
drivers/nvme/host/core.c | 8 +++----
drivers/scsi/scsi_lib.c | 23 +++++++++++++++++--
include/linux/blk-mq.h | 13 ++++++-----
include/linux/blkdev.h | 18 +++++++++++++--
include/linux/percpu-refcount.h | 17 ++++++++++++++
15 files changed, 153 insertions(+), 65 deletions(-)
--
2.9.5
* [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead()
2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
2017-08-31 23:07 ` Bart Van Assche
2017-09-01 13:59 ` Tejun Heo
2017-08-31 17:27 ` [PATCH 2/9] blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue Ming Lei
` (8 subsequent siblings)
9 siblings, 2 replies; 34+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley
Cc: Oleksandr Natalenko, Ming Lei, Tejun Heo
Inside the block layer we need to support allocating requests with
RQF_PREEMPT even though the queue is frozen.
So introduce this helper for checking whether a queue is frozen.
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
include/linux/percpu-refcount.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index c13dceb87b60..08549bbf4080 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -324,4 +324,21 @@ static inline bool percpu_ref_is_zero(struct percpu_ref *ref)
return !atomic_long_read(&ref->count);
}
+/**
+ * percpu_ref_is_dead - test whether a percpu refcount is killed
+ * @ref: percpu_ref to test
+ *
+ * Returns %true if @ref is dead
+ *
+ * This function is safe to call as long as @ref is between init and exit.
+ */
+static inline bool percpu_ref_is_dead(struct percpu_ref *ref)
+{
+ unsigned long __percpu *percpu_count;
+
+ if (__ref_is_percpu(ref, &percpu_count))
+ return false;
+ return ref->percpu_count_ptr & __PERCPU_REF_DEAD;
+}
+
#endif
--
2.9.5
* [PATCH 2/9] blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue
2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
2017-08-31 17:27 ` [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead() Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
2017-08-31 17:27 ` [PATCH 3/9] blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue Ming Lei
` (7 subsequent siblings)
9 siblings, 0 replies; 34+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley
Cc: Oleksandr Natalenko, Ming Lei
We will support freezing the queue on the legacy block path too.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-cgroup.c | 4 ++--
block/blk-mq.c | 10 +++++-----
block/elevator.c | 2 +-
drivers/block/loop.c | 8 ++++----
drivers/nvme/host/core.c | 4 ++--
include/linux/blk-mq.h | 2 +-
6 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 0480892e97e5..02e8a47ac77c 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1337,7 +1337,7 @@ int blkcg_activate_policy(struct request_queue *q,
spin_unlock_irq(q->queue_lock);
out_bypass_end:
if (q->mq_ops)
- blk_mq_unfreeze_queue(q);
+ blk_unfreeze_queue(q);
else
blk_queue_bypass_end(q);
if (pd_prealloc)
@@ -1388,7 +1388,7 @@ void blkcg_deactivate_policy(struct request_queue *q,
spin_unlock_irq(q->queue_lock);
if (q->mq_ops)
- blk_mq_unfreeze_queue(q);
+ blk_unfreeze_queue(q);
else
blk_queue_bypass_end(q);
}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d935f15c54da..82136e83951d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -172,7 +172,7 @@ void blk_mq_freeze_queue(struct request_queue *q)
}
EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
-void blk_mq_unfreeze_queue(struct request_queue *q)
+void blk_unfreeze_queue(struct request_queue *q)
{
int freeze_depth;
@@ -183,7 +183,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
wake_up_all(&q->mq_freeze_wq);
}
}
-EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
+EXPORT_SYMBOL_GPL(blk_unfreeze_queue);
/*
* FIXME: replace the scsi_internal_device_*block_nowait() calls in the
@@ -2250,7 +2250,7 @@ static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set,
list_for_each_entry(q, &set->tag_list, tag_set_list) {
blk_mq_freeze_queue(q);
queue_set_hctx_shared(q, shared);
- blk_mq_unfreeze_queue(q);
+ blk_unfreeze_queue(q);
}
}
@@ -2708,7 +2708,7 @@ static int __blk_mq_update_nr_requests(struct request_queue *q,
if (!ret)
q->nr_requests = nr;
- blk_mq_unfreeze_queue(q);
+ blk_unfreeze_queue(q);
return ret;
}
@@ -2757,7 +2757,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
}
list_for_each_entry(q, &set->tag_list, tag_set_list)
- blk_mq_unfreeze_queue(q);
+ blk_unfreeze_queue(q);
}
void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
diff --git a/block/elevator.c b/block/elevator.c
index 0e465809d3f3..371c8165c9e8 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -994,7 +994,7 @@ static int elevator_switch_mq(struct request_queue *q,
blk_add_trace_msg(q, "elv switch: none");
out:
- blk_mq_unfreeze_queue(q);
+ blk_unfreeze_queue(q);
return ret;
}
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 2fbd4089c20e..5c11ea44d470 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -217,7 +217,7 @@ static void __loop_update_dio(struct loop_device *lo, bool dio)
lo->lo_flags |= LO_FLAGS_DIRECT_IO;
else
lo->lo_flags &= ~LO_FLAGS_DIRECT_IO;
- blk_mq_unfreeze_queue(lo->lo_queue);
+ blk_unfreeze_queue(lo->lo_queue);
}
static int
@@ -605,7 +605,7 @@ static int loop_switch(struct loop_device *lo, struct file *file)
do_loop_switch(lo, &w);
/* unfreeze */
- blk_mq_unfreeze_queue(lo->lo_queue);
+ blk_unfreeze_queue(lo->lo_queue);
return 0;
}
@@ -1079,7 +1079,7 @@ static int loop_clr_fd(struct loop_device *lo)
lo->lo_state = Lo_unbound;
/* This is safe: open() is still holding a reference. */
module_put(THIS_MODULE);
- blk_mq_unfreeze_queue(lo->lo_queue);
+ blk_unfreeze_queue(lo->lo_queue);
if (lo->lo_flags & LO_FLAGS_PARTSCAN && bdev)
loop_reread_partitions(lo, bdev);
@@ -1191,7 +1191,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
__loop_update_dio(lo, lo->use_dio);
exit:
- blk_mq_unfreeze_queue(lo->lo_queue);
+ blk_unfreeze_queue(lo->lo_queue);
if (!err && (info->lo_flags & LO_FLAGS_PARTSCAN) &&
!(lo->lo_flags & LO_FLAGS_PARTSCAN)) {
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 37046ac2c441..5c76b0a96be2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1226,7 +1226,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
if (ctrl->oncs & NVME_CTRL_ONCS_DSM)
nvme_config_discard(ns);
- blk_mq_unfreeze_queue(disk->queue);
+ blk_unfreeze_queue(disk->queue);
}
static int nvme_revalidate_disk(struct gendisk *disk)
@@ -2753,7 +2753,7 @@ void nvme_unfreeze(struct nvme_ctrl *ctrl)
mutex_lock(&ctrl->namespaces_mutex);
list_for_each_entry(ns, &ctrl->namespaces, list)
- blk_mq_unfreeze_queue(ns->queue);
+ blk_unfreeze_queue(ns->queue);
mutex_unlock(&ctrl->namespaces_mutex);
}
EXPORT_SYMBOL_GPL(nvme_unfreeze);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 13f6c25fa461..2572e5641568 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -257,7 +257,7 @@ void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
busy_tag_iter_fn *fn, void *priv);
void blk_mq_freeze_queue(struct request_queue *q);
-void blk_mq_unfreeze_queue(struct request_queue *q);
+void blk_unfreeze_queue(struct request_queue *q);
void blk_freeze_queue_start(struct request_queue *q);
void blk_mq_freeze_queue_wait(struct request_queue *q);
int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
--
2.9.5
* [PATCH 3/9] blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue
2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
2017-08-31 17:27 ` [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead() Ming Lei
2017-08-31 17:27 ` [PATCH 2/9] blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
2017-08-31 22:35 ` Bart Van Assche
2017-08-31 17:27 ` [PATCH 4/9] blk-mq: only run hw queues for blk-mq Ming Lei
` (6 subsequent siblings)
9 siblings, 1 reply; 34+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley
Cc: Oleksandr Natalenko, Ming Lei
This API will be used by the legacy path too.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/bfq-iosched.c | 2 +-
block/blk-cgroup.c | 4 ++--
block/blk-mq.c | 17 ++++-------------
block/blk-mq.h | 1 -
block/elevator.c | 2 +-
drivers/block/loop.c | 8 ++++----
drivers/block/rbd.c | 2 +-
drivers/nvme/host/core.c | 2 +-
include/linux/blk-mq.h | 2 +-
9 files changed, 15 insertions(+), 25 deletions(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 509f39998011..ce2b00e897e2 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4757,7 +4757,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
* The invocation of the next bfq_create_group_hierarchy
* function is the head of a chain of function calls
* (bfq_create_group_hierarchy->blkcg_activate_policy->
- * blk_mq_freeze_queue) that may lead to the invocation of the
+ * blk_freeze_queue) that may lead to the invocation of the
* has_work hook function. For this reason,
* bfq_create_group_hierarchy is invoked only after all
* scheduler data has been initialized, apart from the fields
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 02e8a47ac77c..87c15f3947d5 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1296,7 +1296,7 @@ int blkcg_activate_policy(struct request_queue *q,
return 0;
if (q->mq_ops)
- blk_mq_freeze_queue(q);
+ blk_freeze_queue(q);
else
blk_queue_bypass_start(q);
pd_prealloc:
@@ -1363,7 +1363,7 @@ void blkcg_deactivate_policy(struct request_queue *q,
return;
if (q->mq_ops)
- blk_mq_freeze_queue(q);
+ blk_freeze_queue(q);
else
blk_queue_bypass_start(q);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 82136e83951d..8cf1f7cbef2b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -161,16 +161,7 @@ void blk_freeze_queue(struct request_queue *q)
blk_freeze_queue_start(q);
blk_mq_freeze_queue_wait(q);
}
-
-void blk_mq_freeze_queue(struct request_queue *q)
-{
- /*
- * ...just an alias to keep freeze and unfreeze actions balanced
- * in the blk_mq_* namespace
- */
- blk_freeze_queue(q);
-}
-EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
+EXPORT_SYMBOL_GPL(blk_freeze_queue);
void blk_unfreeze_queue(struct request_queue *q)
{
@@ -2248,7 +2239,7 @@ static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set,
lockdep_assert_held(&set->tag_list_lock);
list_for_each_entry(q, &set->tag_list, tag_set_list) {
- blk_mq_freeze_queue(q);
+ blk_freeze_queue(q);
queue_set_hctx_shared(q, shared);
blk_unfreeze_queue(q);
}
@@ -2683,7 +2674,7 @@ static int __blk_mq_update_nr_requests(struct request_queue *q,
if (!set)
return -EINVAL;
- blk_mq_freeze_queue(q);
+ blk_freeze_queue(q);
ret = 0;
queue_for_each_hw_ctx(q, hctx, i) {
@@ -2747,7 +2738,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
return;
list_for_each_entry(q, &set->tag_list, tag_set_list)
- blk_mq_freeze_queue(q);
+ blk_freeze_queue(q);
set->nr_hw_queues = nr_hw_queues;
blk_mq_update_queue_map(set);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 1b9742eb7399..7ce29ef1e6f3 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -30,7 +30,6 @@ struct blk_mq_ctx {
} ____cacheline_aligned_in_smp;
void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
-void blk_mq_freeze_queue(struct request_queue *q);
void blk_mq_free_queue(struct request_queue *q);
int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
void blk_mq_wake_waiters(struct request_queue *q);
diff --git a/block/elevator.c b/block/elevator.c
index 371c8165c9e8..1164c8a3720f 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -967,7 +967,7 @@ static int elevator_switch_mq(struct request_queue *q,
{
int ret;
- blk_mq_freeze_queue(q);
+ blk_freeze_queue(q);
if (q->elevator) {
if (q->elevator->registered)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 5c11ea44d470..b2e708b7e1e6 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -211,7 +211,7 @@ static void __loop_update_dio(struct loop_device *lo, bool dio)
* LO_FLAGS_READ_ONLY, both are set from kernel, and losetup
* will get updated by ioctl(LOOP_GET_STATUS)
*/
- blk_mq_freeze_queue(lo->lo_queue);
+ blk_freeze_queue(lo->lo_queue);
lo->use_dio = use_dio;
if (use_dio)
lo->lo_flags |= LO_FLAGS_DIRECT_IO;
@@ -599,7 +599,7 @@ static int loop_switch(struct loop_device *lo, struct file *file)
w.file = file;
/* freeze queue and wait for completion of scheduled requests */
- blk_mq_freeze_queue(lo->lo_queue);
+ blk_freeze_queue(lo->lo_queue);
/* do the switch action */
do_loop_switch(lo, &w);
@@ -1046,7 +1046,7 @@ static int loop_clr_fd(struct loop_device *lo)
return -EINVAL;
/* freeze request queue during the transition */
- blk_mq_freeze_queue(lo->lo_queue);
+ blk_freeze_queue(lo->lo_queue);
spin_lock_irq(&lo->lo_lock);
lo->lo_state = Lo_rundown;
@@ -1116,7 +1116,7 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
return -EINVAL;
/* I/O need to be drained during transfer transition */
- blk_mq_freeze_queue(lo->lo_queue);
+ blk_freeze_queue(lo->lo_queue);
err = loop_release_xfer(lo);
if (err)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index b008b6a98098..3a97ffcb3a81 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -6347,7 +6347,7 @@ static ssize_t do_rbd_remove(struct bus_type *bus,
* Prevent new IO from being queued and wait for existing
* IO to complete/fail.
*/
- blk_mq_freeze_queue(rbd_dev->disk->queue);
+ blk_freeze_queue(rbd_dev->disk->queue);
blk_set_queue_dying(rbd_dev->disk->queue);
}
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5c76b0a96be2..986f2b4f9760 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1210,7 +1210,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
bs = 1 << ns->lba_shift;
ns->noiob = le16_to_cpu(id->noiob);
- blk_mq_freeze_queue(disk->queue);
+ blk_freeze_queue(disk->queue);
if (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)
nvme_prep_integrity(disk, id, bs);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2572e5641568..8ae77e088c01 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -256,7 +256,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async);
void blk_mq_delay_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
busy_tag_iter_fn *fn, void *priv);
-void blk_mq_freeze_queue(struct request_queue *q);
+void blk_freeze_queue(struct request_queue *q);
void blk_unfreeze_queue(struct request_queue *q);
void blk_freeze_queue_start(struct request_queue *q);
void blk_mq_freeze_queue_wait(struct request_queue *q);
--
2.9.5
* [PATCH 4/9] blk-mq: only run hw queues for blk-mq
2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
` (2 preceding siblings ...)
2017-08-31 17:27 ` [PATCH 3/9] blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
2017-09-01 8:16 ` Johannes Thumshirn
2017-08-31 17:27 ` [PATCH 5/9] block: introduce blk_drain_queue() Ming Lei
` (5 subsequent siblings)
9 siblings, 1 reply; 34+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley
Cc: Oleksandr Natalenko, Ming Lei
Hardware queues only exist for blk-mq, so only run them when q->mq_ops
is set. This patch just makes that explicit.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-mq.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8cf1f7cbef2b..4c532d8612e1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -125,7 +125,8 @@ void blk_freeze_queue_start(struct request_queue *q)
freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
if (freeze_depth == 1) {
percpu_ref_kill(&q->q_usage_counter);
- blk_mq_run_hw_queues(q, false);
+ if (q->mq_ops)
+ blk_mq_run_hw_queues(q, false);
}
}
EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
--
2.9.5
* [PATCH 5/9] block: introduce blk_drain_queue()
2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
` (3 preceding siblings ...)
2017-08-31 17:27 ` [PATCH 4/9] blk-mq: only run hw queues for blk-mq Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
2017-08-31 22:34 ` Bart Van Assche
2017-08-31 17:27 ` [PATCH 6/9] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait Ming Lei
` (4 subsequent siblings)
9 siblings, 1 reply; 34+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley
Cc: Oleksandr Natalenko, Ming Lei
Introduce blk_drain_queue() so that we can support the legacy version
of queue freezing, which is required for safe SCSI quiescing.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-core.c | 16 ++++++++++++++++
include/linux/blkdev.h | 1 +
2 files changed, 17 insertions(+)
diff --git a/block/blk-core.c b/block/blk-core.c
index d579501f24ba..636452f151ea 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -530,6 +530,22 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
}
/**
+ * blk_drain_queue - drain requests from request_queue
+ * @q: queue to drain
+ *
+ * Drain requests from @q. All pending requests are drained.
+ * The caller is responsible for ensuring that no new requests
+ * which need to be drained are queued.
+ */
+void blk_drain_queue(struct request_queue *q)
+{
+ spin_lock_irq(q->queue_lock);
+ __blk_drain_queue(q, true);
+ spin_unlock_irq(q->queue_lock);
+}
+EXPORT_SYMBOL(blk_drain_queue);
+
+/**
* blk_queue_bypass_start - enter queue bypass mode
* @q: queue of interest
*
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f45f157b2910..02959ca03b66 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1146,6 +1146,7 @@ extern struct request_queue *blk_init_queue_node(request_fn_proc *rfn,
extern struct request_queue *blk_init_queue(request_fn_proc *, spinlock_t *);
extern int blk_init_allocated_queue(struct request_queue *);
extern void blk_cleanup_queue(struct request_queue *);
+extern void blk_drain_queue(struct request_queue *);
extern void blk_queue_make_request(struct request_queue *, make_request_fn *);
extern void blk_queue_bounce_limit(struct request_queue *, u64);
extern void blk_queue_max_hw_sectors(struct request_queue *, unsigned int);
--
2.9.5
* [PATCH 6/9] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
` (4 preceding siblings ...)
2017-08-31 17:27 ` [PATCH 5/9] block: introduce blk_drain_queue() Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
2017-08-31 22:37 ` Bart Van Assche
2017-08-31 17:27 ` [PATCH 7/9] block: tracking request allocation with q_usage_counter Ming Lei
` (3 subsequent siblings)
9 siblings, 1 reply; 34+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley
Cc: Oleksandr Natalenko, Ming Lei
The only change on the legacy path is that blk_drain_queue() is now run
from blk_freeze_queue(), which is called in blk_cleanup_queue().
So this patch removes the explicit __blk_drain_queue() call in
blk_cleanup_queue().
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-core.c | 2 --
block/blk-mq.c | 8 +++++---
drivers/nvme/host/core.c | 2 +-
include/linux/blk-mq.h | 2 +-
4 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 636452f151ea..7a9f93889669 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -669,8 +669,6 @@ void blk_cleanup_queue(struct request_queue *q)
*/
blk_freeze_queue(q);
spin_lock_irq(lock);
- if (!q->mq_ops)
- __blk_drain_queue(q, true);
queue_flag_set(QUEUE_FLAG_DEAD, q);
spin_unlock_irq(lock);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4c532d8612e1..24de78afbe9a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -131,11 +131,13 @@ void blk_freeze_queue_start(struct request_queue *q)
}
EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
-void blk_mq_freeze_queue_wait(struct request_queue *q)
+void blk_freeze_queue_wait(struct request_queue *q)
{
+ if (!q->mq_ops)
+ blk_drain_queue(q);
wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
}
-EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
+EXPORT_SYMBOL_GPL(blk_freeze_queue_wait);
int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
unsigned long timeout)
@@ -160,7 +162,7 @@ void blk_freeze_queue(struct request_queue *q)
* exported to drivers as the only user for unfreeze is blk_mq.
*/
blk_freeze_queue_start(q);
- blk_mq_freeze_queue_wait(q);
+ blk_freeze_queue_wait(q);
}
EXPORT_SYMBOL_GPL(blk_freeze_queue);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 986f2b4f9760..d34a9ffaa940 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2778,7 +2778,7 @@ void nvme_wait_freeze(struct nvme_ctrl *ctrl)
mutex_lock(&ctrl->namespaces_mutex);
list_for_each_entry(ns, &ctrl->namespaces, list)
- blk_mq_freeze_queue_wait(ns->queue);
+ blk_freeze_queue_wait(ns->queue);
mutex_unlock(&ctrl->namespaces_mutex);
}
EXPORT_SYMBOL_GPL(nvme_wait_freeze);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8ae77e088c01..f90d78eb85df 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -259,7 +259,7 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
void blk_freeze_queue(struct request_queue *q);
void blk_unfreeze_queue(struct request_queue *q);
void blk_freeze_queue_start(struct request_queue *q);
-void blk_mq_freeze_queue_wait(struct request_queue *q);
+void blk_freeze_queue_wait(struct request_queue *q);
int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
unsigned long timeout);
int blk_mq_reinit_tagset(struct blk_mq_tag_set *set,
--
2.9.5
* [PATCH 7/9] block: tracking request allocation with q_usage_counter
2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
` (5 preceding siblings ...)
2017-08-31 17:27 ` [PATCH 6/9] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
2017-08-31 17:27 ` [PATCH 8/9] block: allow to allocate req with RQF_PREEMPT when queue is frozen Ming Lei
` (2 subsequent siblings)
9 siblings, 0 replies; 34+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley
Cc: Oleksandr Natalenko, Ming Lei
This usage is basically the same as blk-mq's, so that we can easily
support freezing the queue on the legacy path too.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-core.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/block/blk-core.c b/block/blk-core.c
index 7a9f93889669..a84672f4db32 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1406,16 +1406,21 @@ static struct request *blk_old_get_request(struct request_queue *q,
unsigned int op, gfp_t gfp_mask)
{
struct request *rq;
+ int ret = 0;
WARN_ON_ONCE(q->mq_ops);
/* create ioc upfront */
create_io_context(gfp_mask, q->node);
+ ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
+ if (ret)
+ return ERR_PTR(ret);
spin_lock_irq(q->queue_lock);
rq = get_request(q, op, NULL, gfp_mask);
if (IS_ERR(rq)) {
spin_unlock_irq(q->queue_lock);
+ blk_queue_exit(q);
return rq;
}
@@ -1587,6 +1592,7 @@ void __blk_put_request(struct request_queue *q, struct request *req)
blk_free_request(rl, req);
freed_request(rl, sync, rq_flags);
blk_put_rl(rl);
+ blk_queue_exit(q);
}
}
EXPORT_SYMBOL_GPL(__blk_put_request);
@@ -1868,8 +1874,10 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
* Grab a free request. This is might sleep but can not fail.
* Returns with the queue unlocked.
*/
+ blk_queue_enter_live(q);
req = get_request(q, bio->bi_opf, bio, GFP_NOIO);
if (IS_ERR(req)) {
+ blk_queue_exit(q);
__wbt_done(q->rq_wb, wb_acct);
if (PTR_ERR(req) == -ENOMEM)
bio->bi_status = BLK_STS_RESOURCE;
--
2.9.5
* [PATCH 8/9] block: allow to allocate req with RQF_PREEMPT when queue is frozen
2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
` (6 preceding siblings ...)
2017-08-31 17:27 ` [PATCH 7/9] block: tracking request allocation with q_usage_counter Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
2017-08-31 22:50 ` Bart Van Assche
2017-08-31 17:27 ` [PATCH 9/9] SCSI: freeze block queue when SCSI device is put into quiesce Ming Lei
2017-08-31 17:34 ` [PATCH 0/9] block/scsi: safe SCSI quiescing Oleksandr Natalenko
9 siblings, 1 reply; 34+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley
Cc: Oleksandr Natalenko, Ming Lei
RQF_PREEMPT is a bit special because such requests must still be
dispatched to the LLD even when the SCSI device is quiesced.
So this patch introduces __blk_get_request() to allow the block
layer to allocate a request while the queue is frozen, since the
following patch freezes the queue before quiescing the SCSI device
to support safe SCSI quiescing.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
block/blk-core.c | 25 +++++++++++++++++--------
block/blk-mq.c | 11 +++++++++--
block/blk.h | 5 +++++
include/linux/blk-mq.h | 7 ++++---
include/linux/blkdev.h | 17 +++++++++++++++--
5 files changed, 50 insertions(+), 15 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index a84672f4db32..4e048cbf82e1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1403,7 +1403,8 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
}
static struct request *blk_old_get_request(struct request_queue *q,
- unsigned int op, gfp_t gfp_mask)
+ unsigned int op, gfp_t gfp_mask,
+ unsigned int flags)
{
struct request *rq;
int ret = 0;
@@ -1413,9 +1414,17 @@ static struct request *blk_old_get_request(struct request_queue *q,
/* create ioc upfront */
create_io_context(gfp_mask, q->node);
- ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
+ /*
+ * When queue is frozen, we still need to allocate req for
+ * REQF_PREEMPT.
+ */
+ if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
+ blk_queue_enter_live(q);
+ else
+ ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
if (ret)
return ERR_PTR(ret);
+
spin_lock_irq(q->queue_lock);
rq = get_request(q, op, NULL, gfp_mask);
if (IS_ERR(rq)) {
@@ -1431,26 +1440,26 @@ static struct request *blk_old_get_request(struct request_queue *q,
return rq;
}
-struct request *blk_get_request(struct request_queue *q, unsigned int op,
- gfp_t gfp_mask)
+struct request *__blk_get_request(struct request_queue *q, unsigned int op,
+ gfp_t gfp_mask, unsigned int flags)
{
struct request *req;
if (q->mq_ops) {
req = blk_mq_alloc_request(q, op,
- (gfp_mask & __GFP_DIRECT_RECLAIM) ?
- 0 : BLK_MQ_REQ_NOWAIT);
+ flags | ((gfp_mask & __GFP_DIRECT_RECLAIM) ?
+ 0 : BLK_MQ_REQ_NOWAIT));
if (!IS_ERR(req) && q->mq_ops->initialize_rq_fn)
q->mq_ops->initialize_rq_fn(req);
} else {
- req = blk_old_get_request(q, op, gfp_mask);
+ req = blk_old_get_request(q, op, gfp_mask, flags);
if (!IS_ERR(req) && q->initialize_rq_fn)
q->initialize_rq_fn(req);
}
return req;
}
-EXPORT_SYMBOL(blk_get_request);
+EXPORT_SYMBOL(__blk_get_request);
/**
* blk_requeue_request - put a request back on queue
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 24de78afbe9a..bd0a498e04df 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -382,9 +382,16 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
{
struct blk_mq_alloc_data alloc_data = { .flags = flags };
struct request *rq;
- int ret;
+ int ret = 0;
- ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
+ /*
+ * When queue is frozen, we still need to allocate req for
+ * REQF_PREEMPT.
+ */
+ if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
+ blk_queue_enter_live(q);
+ else
+ ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
if (ret)
return ERR_PTR(ret);
diff --git a/block/blk.h b/block/blk.h
index 6847c5435cca..784e487c7bb7 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -79,6 +79,11 @@ static inline void blk_queue_enter_live(struct request_queue *q)
percpu_ref_get(&q->q_usage_counter);
}
+static inline bool blk_queue_is_frozen(struct request_queue *q)
+{
+ return percpu_ref_is_dead(&q->q_usage_counter);
+}
+
#ifdef CONFIG_BLK_DEV_INTEGRITY
void blk_flush_integrity(void);
bool __bio_integrity_endio(struct bio *);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f90d78eb85df..0ba5cb043172 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -200,9 +200,10 @@ void blk_mq_free_request(struct request *rq);
bool blk_mq_can_queue(struct blk_mq_hw_ctx *);
enum {
- BLK_MQ_REQ_NOWAIT = (1 << 0), /* return when out of requests */
- BLK_MQ_REQ_RESERVED = (1 << 1), /* allocate from reserved pool */
- BLK_MQ_REQ_INTERNAL = (1 << 2), /* allocate internal/sched tag */
+ BLK_MQ_REQ_PREEMPT = BLK_REQ_PREEMPT, /* allocate for RQF_PREEMPT */
+ BLK_MQ_REQ_NOWAIT = (1 << 8), /* return when out of requests */
+ BLK_MQ_REQ_RESERVED = (1 << 9), /* allocate from reserved pool */
+ BLK_MQ_REQ_INTERNAL = (1 << 10), /* allocate internal/sched tag */
};
struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 02959ca03b66..10369ba86102 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -858,6 +858,11 @@ enum {
BLKPREP_INVALID, /* invalid command, kill, return -EREMOTEIO */
};
+/* passed to __blk_get_request */
+enum {
+ BLK_REQ_PREEMPT = (1 << 0), /* allocate for RQF_PREEMPT */
+};
+
extern unsigned long blk_max_low_pfn, blk_max_pfn;
/*
@@ -940,8 +945,9 @@ extern void blk_rq_init(struct request_queue *q, struct request *rq);
extern void blk_init_request_from_bio(struct request *req, struct bio *bio);
extern void blk_put_request(struct request *);
extern void __blk_put_request(struct request_queue *, struct request *);
-extern struct request *blk_get_request(struct request_queue *, unsigned int op,
- gfp_t gfp_mask);
+extern struct request *__blk_get_request(struct request_queue *,
+ unsigned int op, gfp_t gfp_mask,
+ unsigned int flags);
extern void blk_requeue_request(struct request_queue *, struct request *);
extern int blk_lld_busy(struct request_queue *q);
extern int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
@@ -992,6 +998,13 @@ blk_status_t errno_to_blk_status(int errno);
bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie);
+static inline struct request *blk_get_request(struct request_queue *q,
+ unsigned int op,
+ gfp_t gfp_mask)
+{
+ return __blk_get_request(q, op, gfp_mask, 0);
+}
+
static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
{
return bdev->bd_disk->queue; /* this is never NULL */
--
2.9.5
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 9/9] SCSI: freeze block queue when SCSI device is put into quiesce
2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
` (7 preceding siblings ...)
2017-08-31 17:27 ` [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen Ming Lei
@ 2017-08-31 17:27 ` Ming Lei
2017-08-31 17:34 ` [PATCH 0/9] block/scsi: safe SCSI quiescing Oleksandr Natalenko
9 siblings, 0 replies; 34+ messages in thread
From: Ming Lei @ 2017-08-31 17:27 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley
Cc: Oleksandr Natalenko, Ming Lei
Simply quiescing the SCSI device and waiting for completion of I/O
dispatched to the SCSI queue isn't safe: it is easy to use up
requests, because none of the allocated requests can be dispatched
while the device is in QUIESCE. Then no request can be allocated
for RQF_PREEMPT, and the system may hang somewhere, such as
when sending sync_cache or start_stop commands during the
system suspend path.
Before quiescing the SCSI device, this patch freezes the block queue
first, so no new request can enter the queue any more, and all pending
requests are drained once blk_freeze_queue() returns.
This patch also uses __blk_get_request() to allocate requests
with RQF_PREEMPT, so that the allocation can succeed even though
the block queue is frozen.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/ide/ide-pm.c | 3 ++-
drivers/scsi/scsi_lib.c | 23 +++++++++++++++++++++--
2 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/drivers/ide/ide-pm.c b/drivers/ide/ide-pm.c
index 544f02d673ca..4970e3561b13 100644
--- a/drivers/ide/ide-pm.c
+++ b/drivers/ide/ide-pm.c
@@ -89,7 +89,8 @@ int generic_ide_resume(struct device *dev)
}
memset(&rqpm, 0, sizeof(rqpm));
- rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
+ rq = __blk_get_request(drive->queue, REQ_OP_DRV_IN,
+ __GFP_RECLAIM, BLK_REQ_PREEMPT);
ide_req(rq)->type = ATA_PRIV_PM_RESUME;
rq->rq_flags |= RQF_PREEMPT;
rq->special = &rqpm;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f6097b89d5d3..b9ad8aeb456f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -244,9 +244,10 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
struct scsi_request *rq;
int ret = DRIVER_ERROR << 24;
- req = blk_get_request(sdev->request_queue,
+ req = __blk_get_request(sdev->request_queue,
data_direction == DMA_TO_DEVICE ?
- REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM);
+ REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM,
+ BLK_REQ_PREEMPT);
if (IS_ERR(req))
return ret;
rq = scsi_req(req);
@@ -2890,6 +2891,19 @@ scsi_device_quiesce(struct scsi_device *sdev)
{
int err;
+ /*
+ * Simply quiescing the SCSI device isn't safe: it is easy
+ * to use up requests because none of the allocated requests
+ * can be dispatched while the device is in QUIESCE.
+ * Then no request can be allocated and we may hang
+ * somewhere, such as during system suspend/resume.
+ *
+ * So freeze the block queue first; no new request can
+ * enter the queue any more, and pending requests are
+ * drained once blk_freeze_queue() returns.
+ */
+ blk_freeze_queue(sdev->request_queue);
+
mutex_lock(&sdev->state_mutex);
err = scsi_device_set_state(sdev, SDEV_QUIESCE);
mutex_unlock(&sdev->state_mutex);
@@ -2917,6 +2931,11 @@ EXPORT_SYMBOL(scsi_device_quiesce);
*/
void scsi_device_resume(struct scsi_device *sdev)
{
+ /* wait for completion of IO issued during SCSI quiesce */
+ blk_freeze_queue_wait(sdev->request_queue);
+
+ blk_unfreeze_queue(sdev->request_queue);
+
/* check if the device state was mutated prior to resume, and if
* so assume the state is being managed elsewhere (for example
* device deleted during suspend)
--
2.9.5
* Re: [PATCH 0/9] block/scsi: safe SCSI quiescing
2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
` (8 preceding siblings ...)
2017-08-31 17:27 ` [PATCH 9/9] SCSI: freeze block queue when SCSI device is put into quiesce Ming Lei
@ 2017-08-31 17:34 ` Oleksandr Natalenko
2017-08-31 17:38 ` Ming Lei
9 siblings, 1 reply; 34+ messages in thread
From: Oleksandr Natalenko @ 2017-08-31 17:34 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley
Since I'm in CC, does this series aim to replace 2 patches I've tested before:
blk-mq: add requests in the tail of hctx->dispatch
blk-mq: align to legacy's implementation of blk_execute_rq
?
On čtvrtek 31. srpna 2017 19:27:19 CEST Ming Lei wrote:
> The current SCSI quiesce isn't safe and easy to trigger I/O deadlock.
>
> Once SCSI device is put into QUIESCE, no new request except for RQF_PREEMPT
> can be dispatched to SCSI successfully, and scsi_device_quiesce() just
> simply waits for completion of I/Os dispatched to SCSI stack. It isn't
> enough at all.
>
> Because new request still can be allocated, but all the allocated
> requests can't be dispatched successfully, so request pool can be
> consumed up easily.
>
> Then request with RQF_PREEMPT can't be allocated, and system may
> hang forever, such as during system suspend or SCSI domain validation.
>
> Both IO hang inside system suspend[1] or SCSI domain validation
> were reported before.
>
> This patch tries to solve the issue by freezing block queue during
> SCSI quiescing, and allowing to allocate request of RQF_PREEMPT
> when queue is frozen.
>
> Both SCSI and SCSI_MQ have this IO deadlock issue, this patch fixes
> them all by unifying blk_freeze_queue() and blk_unfreeze_queue().
>
>
> [1] https://marc.info/?t=150340250100013&r=3&w=2
>
> Ming Lei (9):
> percpu-refcount: introduce percpu_ref_is_dead()
> blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue
> blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue
> blk-mq: only run hw queues for blk-mq
> block: introduce blk_drain_queue()
> blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
> block: tracking request allocation with q_usage_counter
> block: allow to allocate req with REQF_PREEMPT when queue is frozen
> SCSI: freeze block queue when SCSI device is put into quiesce
>
> block/bfq-iosched.c | 2 +-
> block/blk-cgroup.c | 8 +++----
> block/blk-core.c | 49
> +++++++++++++++++++++++++++++++++-------- block/blk-mq.c |
> 49 +++++++++++++++++++++-------------------- block/blk-mq.h
> | 1 -
> block/blk.h | 5 +++++
> block/elevator.c | 4 ++--
> drivers/block/loop.c | 16 +++++++-------
> drivers/block/rbd.c | 2 +-
> drivers/ide/ide-pm.c | 3 ++-
> drivers/nvme/host/core.c | 8 +++----
> drivers/scsi/scsi_lib.c | 23 +++++++++++++++++--
> include/linux/blk-mq.h | 13 ++++++-----
> include/linux/blkdev.h | 18 +++++++++++++--
> include/linux/percpu-refcount.h | 17 ++++++++++++++
> 15 files changed, 153 insertions(+), 65 deletions(-)
* Re: [PATCH 0/9] block/scsi: safe SCSI quiescing
2017-08-31 17:34 ` [PATCH 0/9] block/scsi: safe SCSI quiescing Oleksandr Natalenko
@ 2017-08-31 17:38 ` Ming Lei
2017-08-31 18:31 ` Oleksandr Natalenko
0 siblings, 1 reply; 34+ messages in thread
From: Ming Lei @ 2017-08-31 17:38 UTC (permalink / raw)
To: Oleksandr Natalenko
Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley
On Thu, Aug 31, 2017 at 07:34:06PM +0200, Oleksandr Natalenko wrote:
> Since I'm in CC, does this series aim to replace 2 patches I've tested before:
>
> blk-mq: add requests in the tail of hctx->dispatch
> blk-mq: align to legacy's implementation of blk_execute_rq
>
> ?
Yeah, this solution is more generic; the older approach in the above
two patches may inevitably run into the same deadlock.
Oleksandr, could you test this patchset and provide the feedback?
BTW, it fixes the I/O hang in my RAID10 test, though I only write
'devices' to pm_test.
Thanks!
--
Ming
* Re: [PATCH 0/9] block/scsi: safe SCSI quiescing
2017-08-31 17:38 ` Ming Lei
@ 2017-08-31 18:31 ` Oleksandr Natalenko
2017-09-01 3:45 ` Ming Lei
0 siblings, 1 reply; 34+ messages in thread
From: Oleksandr Natalenko @ 2017-08-31 18:31 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley
Tested against v4.13-rc7. With this patchset it looks like I/O doesn't hang,
but once (just once, not each time) I've got the following stacktrace on
resume:
===
[ 55.577173] ata1.00: Security Log not supported
[ 55.580690] ata1.00: configured for UDMA/100
[ 55.582257] ------------[ cut here ]------------
[ 55.583924] usb 1-1: reset high-speed USB device number 2 using xhci_hcd
[ 55.587489] WARNING: CPU: 3 PID: 646 at lib/percpu-refcount.c:361
percpu_ref_reinit+0x21/0x80
[ 55.590073] Modules linked in: nls_iso8859_1 nls_cp437 vfat fat iTCO_wdt
kvm_intel bochs_drm ppdev kvm ttm iTCO_vendor_support drm_kms_helper irqbypass
8139too input_leds drm evdev psmouse led_class pcspkr syscopyarea joydev
sysfillrect lpc_ich 8139cp parport_pc sysimgblt mousedev intel_agp i2c_i801
fb_sys_fops mii mac_hid intel_gtt parport qemu_fw_cfg button sch_fq_codel
ip_tables x_tables xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio
libcrc32c crc32c_generic algif_skcipher af_alg dm_crypt dm_mod dax raid10
md_mod sr_mod cdrom sd_mod hid_generic usbhid hid uhci_hcd crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_rng ahci xhci_pci
serio_raw pcbc ehci_pci xhci_hcd rng_core atkbd libps2 libahci ehci_hcd libata
aesni_intel aes_x86_64 crypto_simd glue_helper cryptd
[ 55.611580] usbcore virtio_pci scsi_mod usb_common virtio_ring virtio
i8042 serio
[ 55.614305] CPU: 3 PID: 646 Comm: kworker/u8:23 Not tainted 4.13.0-pf1 #1
[ 55.616611] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0
02/06/2015
[ 55.619903] Workqueue: events_unbound async_run_entry_fn
[ 55.621888] task: ffff88001b271e00 task.stack: ffffc90000a2c000
[ 55.623674] RIP: 0010:percpu_ref_reinit+0x21/0x80
[ 55.625751] RSP: 0000:ffffc90000a2fdb0 EFLAGS: 00010002
[ 55.628687] RAX: 0000000000000002 RBX: ffff88001dd80768 RCX:
ffff88001dd80758
[ 55.631475] RDX: 0000000000000001 RSI: 0000000000000212 RDI:
ffffffff81f3e2f0
[ 55.633694] RBP: ffffc90000a2fdc0 R08: 0000000cc61e7800 R09:
ffff88001f9929c0
[ 55.637144] R10: ffffffffffec3296 R11: 7fffffffffffffff R12:
0000000000000246
[ 55.642456] R13: ffff88001f410800 R14: ffff88001f414300 R15:
0000000000000000
[ 55.644832] FS: 0000000000000000(0000) GS:ffff88001f980000(0000) knlGS:
0000000000000000
[ 55.647388] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.649608] CR2: 00000000ffffffff CR3: 000000001aa50000 CR4:
00000000001406e0
[ 55.652688] Call Trace:
[ 55.654597] blk_unfreeze_queue+0x2f/0x50
[ 55.656794] scsi_device_resume+0x28/0x70 [scsi_mod]
[ 55.659059] scsi_dev_type_resume+0x38/0x90 [scsi_mod]
[ 55.660875] async_sdev_resume+0x15/0x20 [scsi_mod]
[ 55.662564] async_run_entry_fn+0x36/0x150
[ 55.664241] process_one_work+0x1de/0x430
[ 55.666018] worker_thread+0x47/0x3f0
[ 55.667387] kthread+0x125/0x140
[ 55.672740] ? process_one_work+0x430/0x430
[ 55.674971] ? kthread_create_on_node+0x70/0x70
[ 55.677110] ret_from_fork+0x25/0x30
[ 55.679098] Code: 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89
fb 48 c7 c7 f0 e2 f3 81 e8 0a de 32 00 49 89 c4 48 8b 43 08 a8 03 75 42 <0f>
ff 48 83 63 08 fd 65 ff 05 31 7d cc 7e 48 8b 53 08 f6 c2 03
[ 55.684822] ---[ end trace dbbf5aed3cf35331 ]---
[ 55.714306] PM: resume of devices complete after 500.175 msecs
[ 55.717299] OOM killer enabled.
===
Here:
===
355 void percpu_ref_reinit(struct percpu_ref *ref)
356 {
357 unsigned long flags;
358
359 spin_lock_irqsave(&percpu_ref_switch_lock, flags);
360
361 WARN_ON_ONCE(!percpu_ref_is_zero(ref)); // <--
362
363 ref->percpu_count_ptr &= ~__PERCPU_REF_DEAD;
364 percpu_ref_get(ref);
365 __percpu_ref_switch_mode(ref, NULL);
366
367 spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
368 }
===
On čtvrtek 31. srpna 2017 19:38:34 CEST Ming Lei wrote:
> On Thu, Aug 31, 2017 at 07:34:06PM +0200, Oleksandr Natalenko wrote:
> > Since I'm in CC, does this series aim to replace 2 patches I've tested
> > before:
> >
> > blk-mq: add requests in the tail of hctx->dispatch
> > blk-mq: align to legacy's implementation of blk_execute_rq
> >
> > ?
>
> Yeah, this solution is more generic, and the old one in above
> two patches may run into the same deadlock inevitably.
>
> Oleksandr, could you test this patchset and provide the feedback?
>
> BTW, it fixes the I/O hang in my raid10 test, but I just write
> 'devices' to pm_test.
>
> Thanks!
* Re: [PATCH 5/9] block: introduce blk_drain_queue()
2017-08-31 17:27 ` [PATCH 5/9] block: introduce blk_drain_queue() Ming Lei
@ 2017-08-31 22:34 ` Bart Van Assche
2017-09-01 3:52 ` Ming Lei
0 siblings, 1 reply; 34+ messages in thread
From: Bart Van Assche @ 2017-08-31 22:34 UTC (permalink / raw)
To: linux-block@vger.kernel.org, hch@infradead.org, Bart Van Assche,
martin.petersen@oracle.com, ming.lei@redhat.com,
linux-scsi@vger.kernel.org, axboe@fb.com, jejb@linux.vnet.ibm.com
Cc: oleksandr@natalenko.name
On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> So that we can support legacy version of freezing queue,
> which is required by safe SCSI quiescing.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> block/blk-core.c | 16 ++++++++++++++++
> include/linux/blkdev.h | 1 +
> 2 files changed, 17 insertions(+)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index d579501f24ba..636452f151ea 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -530,6 +530,22 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
> }
>
> /**
> + * blk_drain_queue - drain requests from request_queue
> + * @q: queue to drain
> + *
> + * Drain requests from @q. All pending requests are drained.
> + * The caller is responsible for ensuring that no new requests
> + * which need to be drained are queued.
> + */
> +void blk_drain_queue(struct request_queue *q)
> +{
> + spin_lock_irq(q->queue_lock);
> + __blk_drain_queue(q, true);
> + spin_unlock_irq(q->queue_lock);
> +}
> +EXPORT_SYMBOL(blk_drain_queue);
Hello Ming,
Please drop this patch. As far as I can see this patch series introduces only
one call to blk_drain_queue(), namely in blk_freeze_queue_wait() in patch 6.
My proposal is to inline the body of blk_drain_queue() in blk_freeze_queue_wait()
instead of reintroducing blk_drain_queue() (see also commit 807592a4fafb;
"block: Let blk_drain_queue() caller obtain the queue lock").
Bart.
* Re: [PATCH 3/9] blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue
2017-08-31 17:27 ` [PATCH 3/9] blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue Ming Lei
@ 2017-08-31 22:35 ` Bart Van Assche
2017-09-01 3:51 ` Ming Lei
0 siblings, 1 reply; 34+ messages in thread
From: Bart Van Assche @ 2017-08-31 22:35 UTC (permalink / raw)
To: linux-block@vger.kernel.org, hch@infradead.org, Bart Van Assche,
martin.petersen@oracle.com, ming.lei@redhat.com,
linux-scsi@vger.kernel.org, axboe@fb.com, jejb@linux.vnet.ibm.com
Cc: oleksandr@natalenko.name
On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> This API will be used by the legacy path too.
Hello Ming,
Had you already considered to combine patches 2/9 and 3/9 into a single patch?
Thanks,
Bart.
* Re: [PATCH 6/9] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
2017-08-31 17:27 ` [PATCH 6/9] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait Ming Lei
@ 2017-08-31 22:37 ` Bart Van Assche
2017-09-01 3:53 ` Ming Lei
0 siblings, 1 reply; 34+ messages in thread
From: Bart Van Assche @ 2017-08-31 22:37 UTC (permalink / raw)
To: linux-block@vger.kernel.org, hch@infradead.org, Bart Van Assche,
martin.petersen@oracle.com, ming.lei@redhat.com,
linux-scsi@vger.kernel.org, axboe@fb.com, jejb@linux.vnet.ibm.com
Cc: oleksandr@natalenko.name
On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> The only change on legacy is that blk_drain_queue() is run
> from blk_freeze_queue(), which is called in blk_cleanup_queue().
>
> So this patch removes the explicit __blk_drain_queue() call in
> blk_cleanup_queue().
Hello Ming,
Shouldn't the rename and the move of the __blk_drain_queue() be
separate patches?
Thanks,
Bart.
* Re: [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen
2017-08-31 17:27 ` [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen Ming Lei
@ 2017-08-31 22:50 ` Bart Van Assche
2017-09-01 3:55 ` Ming Lei
0 siblings, 1 reply; 34+ messages in thread
From: Bart Van Assche @ 2017-08-31 22:50 UTC (permalink / raw)
To: linux-block@vger.kernel.org, hch@infradead.org, Bart Van Assche,
martin.petersen@oracle.com, ming.lei@redhat.com,
linux-scsi@vger.kernel.org, axboe@fb.com, jejb@linux.vnet.ibm.com
Cc: oleksandr@natalenko.name
On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> @@ -1413,9 +1414,17 @@ static struct request *blk_old_get_request(struct request_queue *q,
> /* create ioc upfront */
> create_io_context(gfp_mask, q->node);
>
> - ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> + /*
> + * When queue is frozen, we still need to allocate req for
> + * REQF_PREEMPT.
> + */
> + if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
> + blk_queue_enter_live(q);
> + else
> + ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
Hello Ming,
Sorry but I doubt that calling blk_queue_enter_live() from inside
blk_old_get_request() is safe. Calling blk_queue_enter_live() is only safe
before a request queue has been marked dead. What prevents a kernel thread
that holds a reference on a request queue and that is running concurrently
with blk_cleanup_queue() to call blk_old_get_request() after a queue has
been marked dead?
Thanks,
Bart.
* Re: [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead()
2017-08-31 17:27 ` [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead() Ming Lei
@ 2017-08-31 23:07 ` Bart Van Assche
2017-09-01 3:49 ` Ming Lei
2017-09-01 13:59 ` Tejun Heo
1 sibling, 1 reply; 34+ messages in thread
From: Bart Van Assche @ 2017-08-31 23:07 UTC (permalink / raw)
To: linux-block@vger.kernel.org, hch@infradead.org, Bart Van Assche,
martin.petersen@oracle.com, ming.lei@redhat.com,
linux-scsi@vger.kernel.org, axboe@fb.com, jejb@linux.vnet.ibm.com
Cc: tj@kernel.org, oleksandr@natalenko.name
On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> Inside block layer, we need to support to allocate request with
> RQF_PREEMPT even though queue is frozen.
Hello Ming,
Can patch 8/9 be reworked such that it doesn't need the new
percpu_ref_is_dead() helper function, e.g. by checking q->mq_freeze_depth
to check whether or not request processing has been frozen? Have you noticed
that several blk-mq code paths already check q->mq_freeze_depth to check
whether or not a queue is frozen?
Thanks,
Bart.
* Re: [PATCH 0/9] block/scsi: safe SCSI quiescing
2017-08-31 18:31 ` Oleksandr Natalenko
@ 2017-09-01 3:45 ` Ming Lei
2017-09-01 6:24 ` oleksandr
0 siblings, 1 reply; 34+ messages in thread
From: Ming Lei @ 2017-09-01 3:45 UTC (permalink / raw)
To: Oleksandr Natalenko
Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley
On Thu, Aug 31, 2017 at 08:31:54PM +0200, Oleksandr Natalenko wrote:
> Tested against v4.13-rc7. With this patchset it looks like I/O doesn't hang,
> but once (just once, not each time) I've got the following stacktrace on
> resume:
That looks not bad, and thanks for your test!
Could you try the following patch against this patchset to see
if there is still the warning?
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b9ad8aeb456f..bbd85ddad18c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -243,11 +243,12 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
struct request *req;
struct scsi_request *rq;
int ret = DRIVER_ERROR << 24;
+ unsigned flag = sdev->sdev_state == SDEV_QUIESCE ? BLK_REQ_PREEMPT : 0;
req = __blk_get_request(sdev->request_queue,
data_direction == DMA_TO_DEVICE ?
REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM,
- BLK_REQ_PREEMPT);
+ flag);
if (IS_ERR(req))
return ret;
rq = scsi_req(req);
@@ -2931,11 +2932,6 @@ EXPORT_SYMBOL(scsi_device_quiesce);
*/
void scsi_device_resume(struct scsi_device *sdev)
{
- /* wait for completion of IO issued during SCSI quiesce */
- blk_freeze_queue_wait(sdev->request_queue);
-
- blk_unfreeze_queue(sdev->request_queue);
-
/* check if the device state was mutated prior to resume, and if
* so assume the state is being managed elsewhere (for example
* device deleted during suspend)
@@ -2945,6 +2941,11 @@ void scsi_device_resume(struct scsi_device *sdev)
scsi_device_set_state(sdev, SDEV_RUNNING) == 0)
scsi_run_queue(sdev->request_queue);
mutex_unlock(&sdev->state_mutex);
+
+ /* wait for completion of IO issued during SCSI quiesce */
+ blk_freeze_queue_wait(sdev->request_queue);
+
+ blk_unfreeze_queue(sdev->request_queue);
}
EXPORT_SYMBOL(scsi_device_resume);
--
Ming
* Re: [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead()
2017-08-31 23:07 ` Bart Van Assche
@ 2017-09-01 3:49 ` Ming Lei
0 siblings, 0 replies; 34+ messages in thread
From: Ming Lei @ 2017-09-01 3:49 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-block@vger.kernel.org, hch@infradead.org,
martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
axboe@fb.com, jejb@linux.vnet.ibm.com, tj@kernel.org,
oleksandr@natalenko.name
On Thu, Aug 31, 2017 at 11:07:30PM +0000, Bart Van Assche wrote:
> On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > Inside block layer, we need to support to allocate request with
> > RQF_PREEMPT even though queue is frozen.
>
> Hello Ming,
>
> Can patch 8/9 be reworked such that it doesn't need the new
> percpu_ref_is_dead() helper function, e.g. by checking q->mq_freeze_depth
> to check whether or not request processing has been frozen? Have you noticed
> that several blk-mq code paths already check q->mq_freeze_depth to check
> whether or not a queue is frozen?
Strictly speaking, that approach is wrong, or at least a lock would be
required to check the counter, because the percpu-refcount hasn't been
killed yet at the point where the atomic counter is increased by 1.
So I prefer to use this new helper.
--
Ming
* Re: [PATCH 3/9] blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue
2017-08-31 22:35 ` Bart Van Assche
@ 2017-09-01 3:51 ` Ming Lei
0 siblings, 0 replies; 34+ messages in thread
From: Ming Lei @ 2017-09-01 3:51 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-block@vger.kernel.org, hch@infradead.org,
martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
axboe@fb.com, jejb@linux.vnet.ibm.com, oleksandr@natalenko.name
On Thu, Aug 31, 2017 at 10:35:50PM +0000, Bart Van Assche wrote:
> On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > This APIs will be used by legacy path too.
>
> Hello Ming,
>
> Had you already considered to combine patches 2/9 and 3/9 into a single patch?
Either way is fine with me; splitting them makes the series easier to review.
--
Ming
* Re: [PATCH 5/9] block: introduce blk_drain_queue()
2017-08-31 22:34 ` Bart Van Assche
@ 2017-09-01 3:52 ` Ming Lei
0 siblings, 0 replies; 34+ messages in thread
From: Ming Lei @ 2017-09-01 3:52 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-block@vger.kernel.org, hch@infradead.org,
martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
axboe@fb.com, jejb@linux.vnet.ibm.com, oleksandr@natalenko.name
On Thu, Aug 31, 2017 at 10:34:49PM +0000, Bart Van Assche wrote:
> On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > So that we can support legacy version of freezing queue,
> > which is required by safe SCSI quiescing.
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > block/blk-core.c | 16 ++++++++++++++++
> > include/linux/blkdev.h | 1 +
> > 2 files changed, 17 insertions(+)
> >
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index d579501f24ba..636452f151ea 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -530,6 +530,22 @@ static void __blk_drain_queue(struct request_queue *q, bool drain_all)
> > }
> >
> > /**
> > + * blk_drain_queue - drain requests from request_queue
> > + * @q: queue to drain
> > + *
> > + * Drain requests from @q. All pending requests are drained.
> > + * The caller is responsible for ensuring that no new requests
> > + * which need to be drained are queued.
> > + */
> > +void blk_drain_queue(struct request_queue *q)
> > +{
> > + spin_lock_irq(q->queue_lock);
> > + __blk_drain_queue(q, true);
> > + spin_unlock_irq(q->queue_lock);
> > +}
> > +EXPORT_SYMBOL(blk_drain_queue);
>
> Hello Ming,
>
> Please drop this patch. As far as I can see this patch series introduces only
> one call to blk_drain_queue(), namely in blk_freeze_queue_wait() in patch 6.
> My proposal is to inline the body of blk_drain_queue() in blk_freeze_queue_wait()
> instead of reintroducing blk_drain_queue() (see also commit 807592a4fafb;
> "block: Let blk_drain_queue() caller obtain the queue lock").
Looks good, will do it in V2.
--
Ming
* Re: [PATCH 6/9] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait
2017-08-31 22:37 ` Bart Van Assche
@ 2017-09-01 3:53 ` Ming Lei
0 siblings, 0 replies; 34+ messages in thread
From: Ming Lei @ 2017-09-01 3:53 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-block@vger.kernel.org, hch@infradead.org,
martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
axboe@fb.com, jejb@linux.vnet.ibm.com, oleksandr@natalenko.name
On Thu, Aug 31, 2017 at 10:37:46PM +0000, Bart Van Assche wrote:
> On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > The only change on legacy is that blk_drain_queue() is run
> > from blk_freeze_queue(), which is called in blk_cleanup_queue().
> >
> > So this patch removes the explicite __blk_drain_queue() in
> > blk_cleanup_queue().
>
> Hello Ming,
>
> Shouldn't the rename and the move of the __blk_drain_queue() be
> separate patches?
OK.
--
Ming
* Re: [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen
2017-08-31 22:50 ` Bart Van Assche
@ 2017-09-01 3:55 ` Ming Lei
2017-09-01 15:43 ` Bart Van Assche
0 siblings, 1 reply; 34+ messages in thread
From: Ming Lei @ 2017-09-01 3:55 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-block@vger.kernel.org, hch@infradead.org,
martin.petersen@oracle.com, linux-scsi@vger.kernel.org,
axboe@fb.com, jejb@linux.vnet.ibm.com, oleksandr@natalenko.name
On Thu, Aug 31, 2017 at 10:50:25PM +0000, Bart Van Assche wrote:
> On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > @@ -1413,9 +1414,17 @@ static struct request *blk_old_get_request(struct request_queue *q,
> > /* create ioc upfront */
> > create_io_context(gfp_mask, q->node);
> >
> > - ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > + /*
> > + * When queue is frozen, we still need to allocate req for
> > + * REQF_PREEMPT.
> > + */
> > + if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
> > + blk_queue_enter_live(q);
> > + else
> > + ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
>
> Hello Ming,
>
> Sorry but I doubt that calling blk_queue_enter_live() from inside
> blk_old_get_request() is safe. Calling blk_queue_enter_live() is only safe
> before a request queue has been marked dead. What prevents a kernel thread
> that holds a reference on a request queue and that is running concurrently
> with blk_cleanup_queue() to call blk_old_get_request() after a queue has
> been marked dead?
I have sent a delta patch to the list, which will call
blk_queue_enter_live() only if the SCSI device is in QUIESCE.
--
Ming
* Re: [PATCH 0/9] block/scsi: safe SCSI quiescing
2017-09-01 3:45 ` Ming Lei
@ 2017-09-01 6:24 ` oleksandr
2017-09-01 6:30 ` Ming Lei
0 siblings, 1 reply; 34+ messages in thread
From: oleksandr @ 2017-09-01 6:24 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley
Hi.
01.09.2017 05:45, Ming Lei wrote:
> Could you try the following patch against this patchset to see
> if there is still the warning?
With this patch I wasn't able to trigger the per-CPU-related warning.
Also, for 8th patch you've written:
> I have sent one delta patch in list, which will only call
> blk_queue_enter_live() iff SCSI device is in QUIESCE.
This refers to the patch I've just tested, right?
* Re: [PATCH 0/9] block/scsi: safe SCSI quiescing
2017-09-01 6:24 ` oleksandr
@ 2017-09-01 6:30 ` Ming Lei
0 siblings, 0 replies; 34+ messages in thread
From: Ming Lei @ 2017-09-01 6:30 UTC (permalink / raw)
To: oleksandr
Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley
On Fri, Sep 01, 2017 at 08:24:46AM +0200, oleksandr@natalenko.name wrote:
> Hi.
>
> 01.09.2017 05:45, Ming Lei wrote:
> > Could you try the following patch against this patchset to see
> > if there is still the warning?
>
> With this patch I wasn't able to trigger the per-CPU-related warning.
That is great!
>
> Also, for the 8th patch you've written:
>
> > I have sent a delta patch to the list, which will only call
> > blk_queue_enter_live() if the SCSI device is in QUIESCE.
>
> This refers to the patch I've just tested, right?
Yeah, you are right, thanks again for your testing!
I will prepare V2 and send it out soon.
--
Ming
* Re: [PATCH 4/9] blk-mq: only run hw queues for blk-mq
2017-08-31 17:27 ` [PATCH 4/9] blk-mq: only run hw queues for blk-mq Ming Lei
@ 2017-09-01 8:16 ` Johannes Thumshirn
2017-09-01 12:33 ` Ming Lei
0 siblings, 1 reply; 34+ messages in thread
From: Johannes Thumshirn @ 2017-09-01 8:16 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley,
Oleksandr Natalenko
Hi Ming,
On Fri, Sep 01, 2017 at 01:27:23AM +0800, Ming Lei wrote:
> - blk_mq_run_hw_queues(q, false);
> + if (q->mq_ops)
> + blk_mq_run_hw_queues(q, false);
What speaks against putting the if (q->mq_ops) directly into
blk_mq_run_hw_queues() so we can't accidentally call it from the sq path?
Just an idea, no hard preferences here.
Byte,
Johannes
--
Johannes Thumshirn Storage
jthumshirn@suse.de +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
* Re: [PATCH 4/9] blk-mq: only run hw queues for blk-mq
2017-09-01 8:16 ` Johannes Thumshirn
@ 2017-09-01 12:33 ` Ming Lei
2017-09-01 12:44 ` Johannes Thumshirn
0 siblings, 1 reply; 34+ messages in thread
From: Ming Lei @ 2017-09-01 12:33 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley,
Oleksandr Natalenko
On Fri, Sep 01, 2017 at 10:16:34AM +0200, Johannes Thumshirn wrote:
> Hi Ming,
>
> On Fri, Sep 01, 2017 at 01:27:23AM +0800, Ming Lei wrote:
> > - blk_mq_run_hw_queues(q, false);
> > + if (q->mq_ops)
> > + blk_mq_run_hw_queues(q, false);
>
> What speaks against putting the if (q->mq_ops) directly into
> blk_mq_run_hw_queues() so we can't accidentally call it from the sq path?
>
> Just an idea, no hard preferences here.
Given this is the only place where blk_mq_run_hw_queues() can be called
on a !q->mq_ops queue now, I suggest checking q->mq_ops at the call site;
otherwise it would be a bit of overkill.
--
Ming
* Re: [PATCH 4/9] blk-mq: only run hw queues for blk-mq
2017-09-01 12:33 ` Ming Lei
@ 2017-09-01 12:44 ` Johannes Thumshirn
0 siblings, 0 replies; 34+ messages in thread
From: Johannes Thumshirn @ 2017-09-01 12:44 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley,
Oleksandr Natalenko
On Fri, Sep 01, 2017 at 08:33:40PM +0800, Ming Lei wrote:
> Given this is the only place where blk_mq_run_hw_queues() can be called
> on a !q->mq_ops queue now, I suggest checking q->mq_ops at the call site;
> otherwise it would be a bit of overkill.
Fair enough,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
--
Johannes Thumshirn Storage
jthumshirn@suse.de +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
* Re: [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead()
2017-08-31 17:27 ` [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead() Ming Lei
2017-08-31 23:07 ` Bart Van Assche
@ 2017-09-01 13:59 ` Tejun Heo
2017-09-01 15:44 ` Bart Van Assche
2017-09-01 16:05 ` Ming Lei
1 sibling, 2 replies; 34+ messages in thread
From: Tejun Heo @ 2017-09-01 13:59 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley,
Oleksandr Natalenko
Hello, Ming.
> +/**
> + * percpu_ref_is_dead - test whether a percpu refcount is killed
> + * @ref: percpu_ref to test
> + *
> + * Returns %true if @ref is dead
> + *
> + * This function is safe to call as long as @ref is between init and exit.
> + */
> +static inline bool percpu_ref_is_dead(struct percpu_ref *ref)
> +{
> + unsigned long __percpu *percpu_count;
> +
> + if (__ref_is_percpu(ref, &percpu_count))
> + return false;
> + return ref->percpu_count_ptr & __PERCPU_REF_DEAD;
> +}
Can you please explain why percpu_ref_is_dying() isn't enough for your
use case?
Thanks.
--
tejun
* Re: [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen
2017-09-01 3:55 ` Ming Lei
@ 2017-09-01 15:43 ` Bart Van Assche
2017-09-01 16:56 ` Ming Lei
0 siblings, 1 reply; 34+ messages in thread
From: Bart Van Assche @ 2017-09-01 15:43 UTC (permalink / raw)
To: ming.lei@redhat.com
Cc: linux-scsi@vger.kernel.org, hch@infradead.org,
jejb@linux.vnet.ibm.com, linux-block@vger.kernel.org,
oleksandr@natalenko.name, martin.petersen@oracle.com,
axboe@fb.com
On Fri, 2017-09-01 at 11:55 +0800, Ming Lei wrote:
> On Thu, Aug 31, 2017 at 10:50:25PM +0000, Bart Van Assche wrote:
> > On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > > @@ -1413,9 +1414,17 @@ static struct request *blk_old_get_request(struct request_queue *q,
> > > /* create ioc upfront */
> > > create_io_context(gfp_mask, q->node);
> > >
> > > - ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > > + /*
> > > + * When queue is frozen, we still need to allocate req for
> > > + * REQF_PREEMPT.
> > > + */
> > > + if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
> > > + blk_queue_enter_live(q);
> > > + else
> > > + ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> >
> > Sorry but I doubt that calling blk_queue_enter_live() from inside
> > blk_old_get_request() is safe. Calling blk_queue_enter_live() is only safe
> > before a request queue has been marked dead. What prevents a kernel thread
> > that holds a reference on a request queue and that is running concurrently
> > with blk_cleanup_queue() to call blk_old_get_request() after a queue has
> > been marked dead?
>
> I have sent a delta patch to the list, which will only call
> blk_queue_enter_live() if the SCSI device is in QUIESCE.
That wouldn't make this hack less ugly.
It is possible to trigger the running -> quiesce state transition through
/sys/class/scsi_device/*/device/state and the quiesce -> removed transition
through /sys/class/scsi_device/*/device/delete. An example:
modprobe scsi_debug delay=0 dev_size_mb=16
lsscsi | grep scsi_debug
cd /sys/class/scsi_device/8:0:0:0/device
echo quiesce > state
echo 1 > delete
So the code from your patch 8/9 can race against device removal.
I think we need a better approach than the REQ_PREEMPT hack. How about
implementing resume as a callback by adding an additional function pointer
to struct request_queue / struct blk_mq_ops instead of implementing it as
a request? For SCSI devices, races of resume against removal can, e.g., be
handled by checking the scsi_device_set_state() return value, since that
function does not allow the removing/removed to running state transition.
Bart.
* Re: [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead()
2017-09-01 13:59 ` Tejun Heo
@ 2017-09-01 15:44 ` Bart Van Assche
2017-09-01 16:05 ` Ming Lei
1 sibling, 0 replies; 34+ messages in thread
From: Bart Van Assche @ 2017-09-01 15:44 UTC (permalink / raw)
To: tj@kernel.org, ming.lei@redhat.com
Cc: linux-block@vger.kernel.org, hch@infradead.org, Bart Van Assche,
martin.petersen@oracle.com, axboe@fb.com,
linux-scsi@vger.kernel.org, oleksandr@natalenko.name,
jejb@linux.vnet.ibm.com
On Fri, 2017-09-01 at 06:59 -0700, Tejun Heo wrote:
> > +/**
> > + * percpu_ref_is_dead - test whether a percpu refcount is killed
> > + * @ref: percpu_ref to test
> > + *
> > + * Returns %true if @ref is dead
> > + *
> > + * This function is safe to call as long as @ref is between init and exit.
> > + */
> > +static inline bool percpu_ref_is_dead(struct percpu_ref *ref)
> > +{
> > + unsigned long __percpu *percpu_count;
> > +
> > + if (__ref_is_percpu(ref, &percpu_count))
> > + return false;
> > + return ref->percpu_count_ptr & __PERCPU_REF_DEAD;
> > +}
>
> Can you please explain why percpu_ref_is_dying() isn't enough for your
> use case?
Hello Tejun,
The approach of the whole series is wrong, so I think you can ignore this patch.
Bart.
* Re: [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead()
2017-09-01 13:59 ` Tejun Heo
2017-09-01 15:44 ` Bart Van Assche
@ 2017-09-01 16:05 ` Ming Lei
1 sibling, 0 replies; 34+ messages in thread
From: Ming Lei @ 2017-09-01 16:05 UTC (permalink / raw)
To: Tejun Heo
Cc: Jens Axboe, linux-block, Christoph Hellwig, Bart Van Assche,
linux-scsi, Martin K . Petersen, James E . J . Bottomley,
Oleksandr Natalenko
On Fri, Sep 01, 2017 at 06:59:15AM -0700, Tejun Heo wrote:
> Hello, Ming.
>
> > +/**
> > + * percpu_ref_is_dead - test whether a percpu refcount is killed
> > + * @ref: percpu_ref to test
> > + *
> > + * Returns %true if @ref is dead
> > + *
> > + * This function is safe to call as long as @ref is between init and exit.
> > + */
> > +static inline bool percpu_ref_is_dead(struct percpu_ref *ref)
> > +{
> > + unsigned long __percpu *percpu_count;
> > +
> > + if (__ref_is_percpu(ref, &percpu_count))
> > + return false;
> > + return ref->percpu_count_ptr & __PERCPU_REF_DEAD;
> > +}
>
> Can you please explain why percpu_ref_is_dying() isn't enough for your
> use case?
Hi Tejun,
You are right, it looks like percpu_ref_is_dying() is enough;
percpu_ref_get_many() works fine whether or not the ref is dying.
Thanks!
--
Ming
* Re: [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen
2017-09-01 15:43 ` Bart Van Assche
@ 2017-09-01 16:56 ` Ming Lei
0 siblings, 0 replies; 34+ messages in thread
From: Ming Lei @ 2017-09-01 16:56 UTC (permalink / raw)
To: Bart Van Assche
Cc: linux-scsi@vger.kernel.org, hch@infradead.org,
jejb@linux.vnet.ibm.com, linux-block@vger.kernel.org,
oleksandr@natalenko.name, martin.petersen@oracle.com,
axboe@fb.com
On Fri, Sep 01, 2017 at 03:43:39PM +0000, Bart Van Assche wrote:
> On Fri, 2017-09-01 at 11:55 +0800, Ming Lei wrote:
> > On Thu, Aug 31, 2017 at 10:50:25PM +0000, Bart Van Assche wrote:
> > > On Fri, 2017-09-01 at 01:27 +0800, Ming Lei wrote:
> > > > @@ -1413,9 +1414,17 @@ static struct request *blk_old_get_request(struct request_queue *q,
> > > > /* create ioc upfront */
> > > > create_io_context(gfp_mask, q->node);
> > > >
> > > > - ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > > > + /*
> > > > + * When queue is frozen, we still need to allocate req for
> > > > + * REQF_PREEMPT.
> > > > + */
> > > > + if ((flags & BLK_MQ_REQ_PREEMPT) && blk_queue_is_frozen(q))
> > > > + blk_queue_enter_live(q);
> > > > + else
> > > > + ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM));
> > >
> > > Sorry but I doubt that calling blk_queue_enter_live() from inside
> > > blk_old_get_request() is safe. Calling blk_queue_enter_live() is only safe
> > > before a request queue has been marked dead. What prevents a kernel thread
> > > that holds a reference on a request queue and that is running concurrently
> > > with blk_cleanup_queue() to call blk_old_get_request() after a queue has
> > > been marked dead?
> >
> > I have sent a delta patch to the list, which will only call
> > blk_queue_enter_live() if the SCSI device is in QUIESCE.
>
> That wouldn't make this hack less ugly.
>
> It is possible to trigger the running -> quiesce state transition through
> /sys/class/scsi_device/*/device/state and the quiesce -> removed transition
> through /sys/class/scsi_device/*/device/delete. An example:
>
> modprobe scsi_debug delay=0 dev_size_mb=16
> lsscsi | grep scsi_debug
> cd /sys/class/scsi_device/8:0:0:0/device
> echo quiesce > state
> echo 1 > delete
>
> So the code from your patch 8/9 can race against device removal.
>
> I think we need a better approach than the REQ_PREEMPT hack. How about
> implementing resume as a callback by adding an additional function pointer
> to struct request_queue / struct blk_mq_ops instead of implementing it as
> a request? For SCSI devices races of resume against removal can e.g. be
> handled by checking the scsi_device_set_state() return value. That function
> namely does not allow the removing/removed to running state transition.
If there is a race between resume and removal, that has nothing to do
with this patchset.
We definitely need to prevent new requests from being allocated after
the SCSI device is put into quiesce. I don't see a better way than
freezing the queue, because that is the only available way to prevent
new request allocation.
Actually there is a race between normal freezing and the preempt freezing
in this patchset; it will be fixed in V2.
--
Ming
Thread overview: 34+ messages
2017-08-31 17:27 [PATCH 0/9] block/scsi: safe SCSI quiescing Ming Lei
2017-08-31 17:27 ` [PATCH 1/9] percpu-refcount: introduce percpu_ref_is_dead() Ming Lei
2017-08-31 23:07 ` Bart Van Assche
2017-09-01 3:49 ` Ming Lei
2017-09-01 13:59 ` Tejun Heo
2017-09-01 15:44 ` Bart Van Assche
2017-09-01 16:05 ` Ming Lei
2017-08-31 17:27 ` [PATCH 2/9] blk-mq: rename blk_mq_unfreeze_queue as blk_unfreeze_queue Ming Lei
2017-08-31 17:27 ` [PATCH 3/9] blk-mq: rename blk_mq_freeze_queue as blk_freeze_queue Ming Lei
2017-08-31 22:35 ` Bart Van Assche
2017-09-01 3:51 ` Ming Lei
2017-08-31 17:27 ` [PATCH 4/9] blk-mq: only run hw queues for blk-mq Ming Lei
2017-09-01 8:16 ` Johannes Thumshirn
2017-09-01 12:33 ` Ming Lei
2017-09-01 12:44 ` Johannes Thumshirn
2017-08-31 17:27 ` [PATCH 5/9] block: introduce blk_drain_queue() Ming Lei
2017-08-31 22:34 ` Bart Van Assche
2017-09-01 3:52 ` Ming Lei
2017-08-31 17:27 ` [PATCH 6/9] blk-mq: rename blk_mq_freeze_queue_wait as blk_freeze_queue_wait Ming Lei
2017-08-31 22:37 ` Bart Van Assche
2017-09-01 3:53 ` Ming Lei
2017-08-31 17:27 ` [PATCH 7/9] block: tracking request allocation with q_usage_counter Ming Lei
2017-08-31 17:27 ` [PATCH 8/9] block: allow to allocate req with REQF_PREEMPT when queue is frozen Ming Lei
2017-08-31 22:50 ` Bart Van Assche
2017-09-01 3:55 ` Ming Lei
2017-09-01 15:43 ` Bart Van Assche
2017-09-01 16:56 ` Ming Lei
2017-08-31 17:27 ` [PATCH 9/9] SCSI: freeze block queue when SCSI device is put into quiesce Ming Lei
2017-08-31 17:34 ` [PATCH 0/9] block/scsi: safe SCSI quiescing Oleksandr Natalenko
2017-08-31 17:38 ` Ming Lei
2017-08-31 18:31 ` Oleksandr Natalenko
2017-09-01 3:45 ` Ming Lei
2017-09-01 6:24 ` oleksandr
2017-09-01 6:30 ` Ming Lei