linux-block.vger.kernel.org archive mirror
* [PATCH v8 00/10] block, scsi, md: Improve suspend and resume
@ 2017-10-10 21:03 Bart Van Assche
  2017-10-10 21:03 ` [PATCH v8 01/10] md: Rename md_notifier into md_reboot_notifier Bart Van Assche
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Bart Van Assche @ 2017-10-10 21:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche

Hello Jens,

It is known that the system sometimes hangs during the resume that follows
a hibernate, especially when using an md RAID1 array created on top of SCSI
devices. This patch series fixes that problem. These patches have been
tested on top of the block layer for-next branch. Please consider these
changes for kernel v4.15.

Thanks,

Bart.

Changes between v7 and v8:
- Fixed a (theoretical?) race identified by Ming Lei.
- Added a tenth patch that checks whether the proper type of flags has been
  passed to a range of block layer functions.

Changes between v6 and v7:
- Added support for the PREEMPT_ONLY flag in blk-mq-debugfs.c.
- Fixed kerneldoc header of blk_queue_enter().
- Added a rcu_read_lock_sched() / rcu_read_unlock_sched() pair in
  blk_set_preempt_only().
- Removed a synchronize_rcu() call from scsi_device_quiesce().
- Modified the description of patch 9/9 in this series.
- Removed scsi_run_queue() call from scsi_device_resume().

Changes between v5 and v6:
- Split an md patch into two patches to make it easier to review the changes.
- For the md patch that suspends I/O while the system is frozen, switched back
  to the freezer mechanism because this makes the code shorter and easier to
  review.
- Changed blk_set/clear_preempt_only() from EXPORT_SYMBOL() into
  EXPORT_SYMBOL_GPL().
- Made blk_set_preempt_only() behave as a test-and-set operation.
- Introduced blk_get_request_flags() and BLK_MQ_REQ_PREEMPT as requested by
  Christoph and reduced the number of arguments of blk_queue_enter() back from
  three to two.
- In scsi_device_quiesce(), moved the blk_mq_freeze_queue() call out of a
  critical section. Made the explanation of why the synchronize_rcu() call
  is necessary more detailed.

Changes between v4 and v5:
- Split blk_set_preempt_only() into two functions as requested by Christoph.
- Made blk_get_request() trigger WARN_ONCE() if an attempt is made to
  allocate a request while the system is frozen. This is a big help in
  identifying drivers that submit I/O while the system is frozen.
- Since kernel thread freezing is on its way out, reworked the approach for
  preventing the md driver from submitting I/O while the system is frozen
  such that it no longer depends on the kernel thread freeze mechanism.
- Fixed the (theoretical) deadlock in scsi_device_quiesce() that was identified
  by Ming.

Changes between v3 and v4:
- Made sure that this patch series not only works for scsi-mq but also for
  the legacy SCSI stack.
- Removed an smp_rmb()/smp_wmb() pair from the hot path and added a comment
  that explains why that is safe.
- Reordered the patches in this patch series.

Changes between v2 and v3:
- Made md kernel threads freezable.
- Changed the approach for quiescing SCSI devices again.
- Addressed Ming's review comments.

Changes compared to v1 of this patch series:
- Changed the approach and rewrote the patch series.

Bart Van Assche (9):
  md: Rename md_notifier into md_reboot_notifier
  md: Introduce md_stop_all_writes()
  md: Neither resync nor reshape while the system is frozen
  block: Introduce blk_get_request_flags()
  block: Introduce BLK_MQ_REQ_PREEMPT
  ide, scsi: Tell the block layer at request allocation time about
    preempt requests
  block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
  block, scsi: Make SCSI quiesce and resume work reliably
  block, nvme: Introduce blk_mq_req_flags_t

Ming Lei (1):
  block: Make q_usage_counter also track legacy requests

 block/blk-core.c          | 132 +++++++++++++++++++++++++++++++++++++++-------
 block/blk-mq-debugfs.c    |   1 +
 block/blk-mq.c            |  20 +++----
 block/blk-mq.h            |   2 +-
 block/blk-timeout.c       |   2 +-
 drivers/ide/ide-pm.c      |   4 +-
 drivers/md/md.c           |  45 +++++++++++++---
 drivers/nvme/host/core.c  |   5 +-
 drivers/nvme/host/nvme.h  |   5 +-
 drivers/scsi/scsi_lib.c   |  35 +++++++-----
 fs/block_dev.c            |   4 +-
 include/linux/blk-mq.h    |  16 ++++--
 include/linux/blk_types.h |   2 +
 include/linux/blkdev.h    |  11 +++-
 14 files changed, 217 insertions(+), 67 deletions(-)

-- 
2.14.2

* [PATCH v8 01/10] md: Rename md_notifier into md_reboot_notifier
  2017-10-10 21:03 [PATCH v8 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
@ 2017-10-10 21:03 ` Bart Van Assche
  2017-10-10 21:03 ` [PATCH v8 02/10] md: Introduce md_stop_all_writes() Bart Van Assche
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2017-10-10 21:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche, Shaohua Li,
	linux-raid, Hannes Reinecke

This avoids confusion with the PM notifier that will be added
by a later patch.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
---
 drivers/md/md.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5d61049e7417..8933cafc212d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8966,7 +8966,7 @@ static int md_notify_reboot(struct notifier_block *this,
 	return NOTIFY_DONE;
 }
 
-static struct notifier_block md_notifier = {
+static struct notifier_block md_reboot_notifier = {
 	.notifier_call	= md_notify_reboot,
 	.next		= NULL,
 	.priority	= INT_MAX, /* before any real devices */
@@ -9003,7 +9003,7 @@ static int __init md_init(void)
 	blk_register_region(MKDEV(mdp_major, 0), 1UL<<MINORBITS, THIS_MODULE,
 			    md_probe, NULL, NULL);
 
-	register_reboot_notifier(&md_notifier);
+	register_reboot_notifier(&md_reboot_notifier);
 	raid_table_header = register_sysctl_table(raid_root_table);
 
 	md_geninit();
@@ -9243,7 +9243,7 @@ static __exit void md_exit(void)
 
 	unregister_blkdev(MD_MAJOR,"md");
 	unregister_blkdev(mdp_major, "mdp");
-	unregister_reboot_notifier(&md_notifier);
+	unregister_reboot_notifier(&md_reboot_notifier);
 	unregister_sysctl_table(raid_table_header);
 
 	/* We cannot unload the modules while some process is
-- 
2.14.2

* [PATCH v8 02/10] md: Introduce md_stop_all_writes()
  2017-10-10 21:03 [PATCH v8 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
  2017-10-10 21:03 ` [PATCH v8 01/10] md: Rename md_notifier into md_reboot_notifier Bart Van Assche
@ 2017-10-10 21:03 ` Bart Van Assche
  2017-10-10 21:03 ` [PATCH v8 03/10] md: Neither resync nor reshape while the system is frozen Bart Van Assche
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2017-10-10 21:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche, Shaohua Li,
	linux-raid, Hannes Reinecke

Introduce md_stop_all_writes() because the next patch will add
a second caller for this function. This patch does not change
any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
---
 drivers/md/md.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8933cafc212d..b99584e5d6b1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8937,8 +8937,7 @@ int rdev_clear_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
 }
 EXPORT_SYMBOL_GPL(rdev_clear_badblocks);
 
-static int md_notify_reboot(struct notifier_block *this,
-			    unsigned long code, void *x)
+static void md_stop_all_writes(void)
 {
 	struct list_head *tmp;
 	struct mddev *mddev;
@@ -8962,6 +8961,12 @@ static int md_notify_reboot(struct notifier_block *this,
 	 */
 	if (need_delay)
 		mdelay(1000*1);
+}
+
+static int md_notify_reboot(struct notifier_block *this,
+			    unsigned long code, void *x)
+{
+	md_stop_all_writes();
 
 	return NOTIFY_DONE;
 }
-- 
2.14.2

* [PATCH v8 03/10] md: Neither resync nor reshape while the system is frozen
  2017-10-10 21:03 [PATCH v8 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
  2017-10-10 21:03 ` [PATCH v8 01/10] md: Rename md_notifier into md_reboot_notifier Bart Van Assche
  2017-10-10 21:03 ` [PATCH v8 02/10] md: Introduce md_stop_all_writes() Bart Van Assche
@ 2017-10-10 21:03 ` Bart Van Assche
  2017-10-10 22:30   ` Shaohua Li
  2017-10-10 21:03 ` [PATCH v8 04/10] block: Make q_usage_counter also track legacy requests Bart Van Assche
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2017-10-10 21:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche, Shaohua Li,
	linux-raid, Hannes Reinecke, Johannes Thumshirn

Some people use the md driver on laptops and use the suspend and
resume functionality. Since it is essential that the submission of
new I/O requests stops before a hibernation image is created,
interrupt the md resync and reshape actions if the system is
being frozen. Note: the resync and reshape will restart after
the system is resumed and a message similar to the following
will appear in the system log:

md: md0: data-check interrupted.
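
As an illustration, here is a minimal sketch of the freezer pattern that
this patch applies to md_thread(); example_wq, example_work_pending and
example_do_work() are made-up names:

#include <linux/freezer.h>
#include <linux/kthread.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(example_wq);
static bool example_work_pending;

static void example_do_work(void)
{
}

static int example_thread(void *arg)
{
	/* Kernel threads are not freezable by default; opt in. */
	set_freezable();
	while (!kthread_should_stop()) {
		/*
		 * Like wait_event_interruptible_timeout(), but this also
		 * enters the refrigerator while tasks are being frozen,
		 * so the thread cannot start new work while the system
		 * is suspending or hibernating.
		 */
		wait_event_freezable_timeout(example_wq,
				example_work_pending ||
				kthread_should_stop(), HZ);
		if (example_work_pending)
			example_do_work();
	}
	return 0;
}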

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/md/md.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index b99584e5d6b1..d712d3320c1d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -66,6 +66,8 @@
 #include <linux/raid/md_u.h>
 #include <linux/slab.h>
 #include <linux/percpu-refcount.h>
+#include <linux/freezer.h>
+#include <linux/suspend.h>
 
 #include <trace/events/block.h>
 #include "md.h"
@@ -7439,6 +7441,7 @@ static int md_thread(void *arg)
 	 */
 
 	allow_signal(SIGKILL);
+	set_freezable();
 	while (!kthread_should_stop()) {
 
 		/* We need to wait INTERRUPTIBLE so that
@@ -7449,7 +7452,7 @@ static int md_thread(void *arg)
 		if (signal_pending(current))
 			flush_signals(current);
 
-		wait_event_interruptible_timeout
+		wait_event_freezable_timeout
 			(thread->wqueue,
 			 test_bit(THREAD_WAKEUP, &thread->flags)
 			 || kthread_should_stop() || kthread_should_park(),
@@ -8963,6 +8966,29 @@ static void md_stop_all_writes(void)
 		mdelay(1000*1);
 }
 
+/*
+ * Ensure that neither resyncing nor reshaping occurs while the system is
+ * frozen.
+ */
+static int md_notify_pm(struct notifier_block *bl, unsigned long state,
+			void *unused)
+{
+	pr_debug("%s: state = %ld\n", __func__, state);
+
+	switch (state) {
+	case PM_HIBERNATION_PREPARE:
+	case PM_SUSPEND_PREPARE:
+	case PM_RESTORE_PREPARE:
+		md_stop_all_writes();
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block md_pm_notifier = {
+	.notifier_call	= md_notify_pm,
+};
+
 static int md_notify_reboot(struct notifier_block *this,
 			    unsigned long code, void *x)
 {
@@ -9009,6 +9035,7 @@ static int __init md_init(void)
 			    md_probe, NULL, NULL);
 
 	register_reboot_notifier(&md_reboot_notifier);
+	register_pm_notifier(&md_pm_notifier);
 	raid_table_header = register_sysctl_table(raid_root_table);
 
 	md_geninit();
@@ -9248,6 +9275,7 @@ static __exit void md_exit(void)
 
 	unregister_blkdev(MD_MAJOR,"md");
 	unregister_blkdev(mdp_major, "mdp");
+	unregister_pm_notifier(&md_pm_notifier);
 	unregister_reboot_notifier(&md_reboot_notifier);
 	unregister_sysctl_table(raid_table_header);
 
-- 
2.14.2

* [PATCH v8 04/10] block: Make q_usage_counter also track legacy requests
  2017-10-10 21:03 [PATCH v8 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
                   ` (2 preceding siblings ...)
  2017-10-10 21:03 ` [PATCH v8 03/10] md: Neither resync nor reshape while the system is frozen Bart Van Assche
@ 2017-10-10 21:03 ` Bart Van Assche
  2017-10-10 21:03 ` [PATCH v8 05/10] block: Introduce blk_get_request_flags() Bart Van Assche
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2017-10-10 21:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche, Hannes Reinecke

From: Ming Lei <ming.lei@redhat.com>

This patch makes it possible to pause request allocation for
the legacy block layer by calling blk_mq_freeze_queue() and
blk_mq_unfreeze_queue().
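
For illustration, a sketch of the pattern this change enables,
whether or not q uses blk-mq (hypothetical caller):

#include <linux/blk-mq.h>

static void example_pause_queue(struct request_queue *q)
{
	/*
	 * blk_mq_freeze_queue() waits until q->q_usage_counter reaches
	 * zero. With this patch, legacy requests also hold a reference
	 * on that counter between allocation and release, so the freeze
	 * now covers the legacy path as well.
	 */
	blk_mq_freeze_queue(q);
	/* ... no new requests can be allocated here ... */
	blk_mq_unfreeze_queue(q);
}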

Signed-off-by: Ming Lei <ming.lei@redhat.com>
[ bvanassche: Combined two patches into one, edited a comment and made sure
  REQ_NOWAIT is handled properly in blk_old_get_request() ]
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
---
 block/blk-core.c | 12 ++++++++++++
 block/blk-mq.c   | 10 ++--------
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 14f7674fa0b1..265a69511c47 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -610,6 +610,9 @@ void blk_set_queue_dying(struct request_queue *q)
 		}
 		spin_unlock_irq(q->queue_lock);
 	}
+
+	/* Make blk_queue_enter() reexamine the DYING flag. */
+	wake_up_all(&q->mq_freeze_wq);
 }
 EXPORT_SYMBOL_GPL(blk_set_queue_dying);
 
@@ -1395,16 +1398,22 @@ static struct request *blk_old_get_request(struct request_queue *q,
 					   unsigned int op, gfp_t gfp_mask)
 {
 	struct request *rq;
+	int ret = 0;
 
 	WARN_ON_ONCE(q->mq_ops);
 
 	/* create ioc upfront */
 	create_io_context(gfp_mask, q->node);
 
+	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM) ||
+			      (op & REQ_NOWAIT));
+	if (ret)
+		return ERR_PTR(ret);
 	spin_lock_irq(q->queue_lock);
 	rq = get_request(q, op, NULL, gfp_mask);
 	if (IS_ERR(rq)) {
 		spin_unlock_irq(q->queue_lock);
+		blk_queue_exit(q);
 		return rq;
 	}
 
@@ -1576,6 +1585,7 @@ void __blk_put_request(struct request_queue *q, struct request *req)
 		blk_free_request(rl, req);
 		freed_request(rl, sync, rq_flags);
 		blk_put_rl(rl);
+		blk_queue_exit(q);
 	}
 }
 EXPORT_SYMBOL_GPL(__blk_put_request);
@@ -1857,8 +1867,10 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
 	 * Grab a free request. This is might sleep but can not fail.
 	 * Returns with the queue unlocked.
 	 */
+	blk_queue_enter_live(q);
 	req = get_request(q, bio->bi_opf, bio, GFP_NOIO);
 	if (IS_ERR(req)) {
+		blk_queue_exit(q);
 		__wbt_done(q->rq_wb, wb_acct);
 		if (PTR_ERR(req) == -ENOMEM)
 			bio->bi_status = BLK_STS_RESOURCE;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 076cbab9c3e0..8766587ae8b8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -125,7 +125,8 @@ void blk_freeze_queue_start(struct request_queue *q)
 	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
 	if (freeze_depth == 1) {
 		percpu_ref_kill(&q->q_usage_counter);
-		blk_mq_run_hw_queues(q, false);
+		if (q->mq_ops)
+			blk_mq_run_hw_queues(q, false);
 	}
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
@@ -255,13 +256,6 @@ void blk_mq_wake_waiters(struct request_queue *q)
 	queue_for_each_hw_ctx(q, hctx, i)
 		if (blk_mq_hw_queue_mapped(hctx))
 			blk_mq_tag_wakeup_all(hctx->tags, true);
-
-	/*
-	 * If we are called because the queue has now been marked as
-	 * dying, we need to ensure that processes currently waiting on
-	 * the queue are notified as well.
-	 */
-	wake_up_all(&q->mq_freeze_wq);
 }
 
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
-- 
2.14.2

* [PATCH v8 05/10] block: Introduce blk_get_request_flags()
  2017-10-10 21:03 [PATCH v8 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
                   ` (3 preceding siblings ...)
  2017-10-10 21:03 ` [PATCH v8 04/10] block: Make q_usage_counter also track legacy requests Bart Van Assche
@ 2017-10-10 21:03 ` Bart Van Assche
  2017-10-10 21:03 ` [PATCH v8 06/10] block: Introduce BLK_MQ_REQ_PREEMPT Bart Van Assche
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2017-10-10 21:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche, Hannes Reinecke,
	Johannes Thumshirn

A side effect of this patch is that the GFP mask that is passed to
several allocation functions in the legacy block layer is changed
from GFP_KERNEL into __GFP_DIRECT_RECLAIM.
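
For illustration, a sketch of how a call site can be converted to the
new interface (hypothetical caller; error handling trimmed):

#include <linux/blkdev.h>

static struct request *example_alloc(struct request_queue *q)
{
	/* Old interface: blocking behavior is encoded in a GFP mask. */
	struct request *rq = blk_get_request(q, REQ_OP_DRV_IN, GFP_KERNEL);

	if (!IS_ERR(rq))
		blk_put_request(rq);

	/*
	 * New interface: blocking behavior is encoded in BLK_MQ_REQ_*
	 * flags. 0 means the allocation may sleep; BLK_MQ_REQ_NOWAIT
	 * means fail with an ERR_PTR() instead of sleeping.
	 */
	return blk_get_request_flags(q, REQ_OP_DRV_IN, 0);
}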

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-core.c       | 50 +++++++++++++++++++++++++++++++++++---------------
 include/linux/blkdev.h |  3 +++
 2 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 265a69511c47..03156d9ede94 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1157,7 +1157,7 @@ int blk_update_nr_requests(struct request_queue *q, unsigned int nr)
  * @rl: request list to allocate from
  * @op: operation and flags
  * @bio: bio to allocate request for (can be %NULL)
- * @gfp_mask: allocation mask
+ * @flags: BLK_MQ_REQ_* flags
  *
  * Get a free request from @q.  This function may fail under memory
  * pressure or if @q is dead.
@@ -1167,7 +1167,7 @@ int blk_update_nr_requests(struct request_queue *q, unsigned int nr)
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
 static struct request *__get_request(struct request_list *rl, unsigned int op,
-		struct bio *bio, gfp_t gfp_mask)
+				     struct bio *bio, unsigned int flags)
 {
 	struct request_queue *q = rl->q;
 	struct request *rq;
@@ -1176,6 +1176,8 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
 	struct io_cq *icq = NULL;
 	const bool is_sync = op_is_sync(op);
 	int may_queue;
+	gfp_t gfp_mask = flags & BLK_MQ_REQ_NOWAIT ? GFP_ATOMIC :
+			 __GFP_DIRECT_RECLAIM;
 	req_flags_t rq_flags = RQF_ALLOCED;
 
 	lockdep_assert_held(q->queue_lock);
@@ -1336,7 +1338,7 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
  * @q: request_queue to allocate request from
  * @op: operation and flags
  * @bio: bio to allocate request for (can be %NULL)
- * @gfp_mask: allocation mask
+ * @flags: BLK_MQ_REQ_* flags.
  *
  * Get a free request from @q.  If %__GFP_DIRECT_RECLAIM is set in @gfp_mask,
  * this function keeps retrying under memory pressure and fails iff @q is dead.
@@ -1346,7 +1348,7 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
 static struct request *get_request(struct request_queue *q, unsigned int op,
-		struct bio *bio, gfp_t gfp_mask)
+				   struct bio *bio, unsigned int flags)
 {
 	const bool is_sync = op_is_sync(op);
 	DEFINE_WAIT(wait);
@@ -1358,7 +1360,7 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
 
 	rl = blk_get_rl(q, bio);	/* transferred to @rq on success */
 retry:
-	rq = __get_request(rl, op, bio, gfp_mask);
+	rq = __get_request(rl, op, bio, flags);
 	if (!IS_ERR(rq))
 		return rq;
 
@@ -1367,7 +1369,7 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
 		return ERR_PTR(-EAGAIN);
 	}
 
-	if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) {
+	if ((flags & BLK_MQ_REQ_NOWAIT) || unlikely(blk_queue_dying(q))) {
 		blk_put_rl(rl);
 		return rq;
 	}
@@ -1394,10 +1396,13 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
 	goto retry;
 }
 
+/* flags: BLK_MQ_REQ_PREEMPT and/or BLK_MQ_REQ_NOWAIT. */
 static struct request *blk_old_get_request(struct request_queue *q,
-					   unsigned int op, gfp_t gfp_mask)
+					   unsigned int op, unsigned int flags)
 {
 	struct request *rq;
+	gfp_t gfp_mask = flags & BLK_MQ_REQ_NOWAIT ? GFP_ATOMIC :
+			 __GFP_DIRECT_RECLAIM;
 	int ret = 0;
 
 	WARN_ON_ONCE(q->mq_ops);
@@ -1410,7 +1415,7 @@ static struct request *blk_old_get_request(struct request_queue *q,
 	if (ret)
 		return ERR_PTR(ret);
 	spin_lock_irq(q->queue_lock);
-	rq = get_request(q, op, NULL, gfp_mask);
+	rq = get_request(q, op, NULL, flags);
 	if (IS_ERR(rq)) {
 		spin_unlock_irq(q->queue_lock);
 		blk_queue_exit(q);
@@ -1424,25 +1429,40 @@ static struct request *blk_old_get_request(struct request_queue *q,
 	return rq;
 }
 
-struct request *blk_get_request(struct request_queue *q, unsigned int op,
-				gfp_t gfp_mask)
+/**
+ * blk_get_request_flags - allocate a request
+ * @q: request queue to allocate a request for
+ * @op: operation (REQ_OP_*) and REQ_* flags, e.g. REQ_SYNC.
+ * @flags: BLK_MQ_REQ_* flags, e.g. BLK_MQ_REQ_NOWAIT.
+ */
+struct request *blk_get_request_flags(struct request_queue *q, unsigned int op,
+				      unsigned int flags)
 {
 	struct request *req;
 
+	WARN_ON_ONCE(op & REQ_NOWAIT);
+	WARN_ON_ONCE(flags & ~BLK_MQ_REQ_NOWAIT);
+
 	if (q->mq_ops) {
-		req = blk_mq_alloc_request(q, op,
-			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
-				0 : BLK_MQ_REQ_NOWAIT);
+		req = blk_mq_alloc_request(q, op, flags);
 		if (!IS_ERR(req) && q->mq_ops->initialize_rq_fn)
 			q->mq_ops->initialize_rq_fn(req);
 	} else {
-		req = blk_old_get_request(q, op, gfp_mask);
+		req = blk_old_get_request(q, op, flags);
 		if (!IS_ERR(req) && q->initialize_rq_fn)
 			q->initialize_rq_fn(req);
 	}
 
 	return req;
 }
+EXPORT_SYMBOL(blk_get_request_flags);
+
+struct request *blk_get_request(struct request_queue *q, unsigned int op,
+				gfp_t gfp_mask)
+{
+	return blk_get_request_flags(q, op, gfp_mask & __GFP_DIRECT_RECLAIM ?
+				     0 : BLK_MQ_REQ_NOWAIT);
+}
 EXPORT_SYMBOL(blk_get_request);
 
 /**
@@ -1868,7 +1888,7 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
 	 * Returns with the queue unlocked.
 	 */
 	blk_queue_enter_live(q);
-	req = get_request(q, bio->bi_opf, bio, GFP_NOIO);
+	req = get_request(q, bio->bi_opf, bio, 0);
 	if (IS_ERR(req)) {
 		blk_queue_exit(q);
 		__wbt_done(q->rq_wb, wb_acct);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9fb71fc7d0e8..3f7b08a9b9ae 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -935,6 +935,9 @@ extern void blk_rq_init(struct request_queue *q, struct request *rq);
 extern void blk_init_request_from_bio(struct request *req, struct bio *bio);
 extern void blk_put_request(struct request *);
 extern void __blk_put_request(struct request_queue *, struct request *);
+extern struct request *blk_get_request_flags(struct request_queue *,
+					     unsigned int op,
+					     unsigned int flags);
 extern struct request *blk_get_request(struct request_queue *, unsigned int op,
 				       gfp_t gfp_mask);
 extern void blk_requeue_request(struct request_queue *, struct request *);
-- 
2.14.2

* [PATCH v8 06/10] block: Introduce BLK_MQ_REQ_PREEMPT
  2017-10-10 21:03 [PATCH v8 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
                   ` (4 preceding siblings ...)
  2017-10-10 21:03 ` [PATCH v8 05/10] block: Introduce blk_get_request_flags() Bart Van Assche
@ 2017-10-10 21:03 ` Bart Van Assche
  2017-10-10 21:03 ` [PATCH v8 07/10] ide, scsi: Tell the block layer at request allocation time about preempt requests Bart Van Assche
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2017-10-10 21:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche, Hannes Reinecke,
	Johannes Thumshirn

Set RQF_PREEMPT if BLK_MQ_REQ_PREEMPT is passed to
blk_get_request_flags().
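
For illustration, a sketch of how a power management request can be
allocated with this flag (hypothetical caller; the next patch converts
the real IDE and SCSI call sites):

#include <linux/blkdev.h>

static int example_alloc_pm_request(struct request_queue *q)
{
	struct request *rq = blk_get_request_flags(q, REQ_OP_DRV_IN,
						   BLK_MQ_REQ_PREEMPT);

	if (IS_ERR(rq))
		return PTR_ERR(rq);
	/* The flag has been translated into RQF_PREEMPT at allocation time. */
	WARN_ON_ONCE(!(rq->rq_flags & RQF_PREEMPT));
	blk_put_request(rq);
	return 0;
}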

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-core.c       | 4 +++-
 block/blk-mq.c         | 2 ++
 include/linux/blk-mq.h | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 03156d9ede94..842e67e0b7ab 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1260,6 +1260,8 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
 	blk_rq_set_rl(rq, rl);
 	rq->cmd_flags = op;
 	rq->rq_flags = rq_flags;
+	if (flags & BLK_MQ_REQ_PREEMPT)
+		rq->rq_flags |= RQF_PREEMPT;
 
 	/* init elvpriv */
 	if (rq_flags & RQF_ELVPRIV) {
@@ -1441,7 +1443,7 @@ struct request *blk_get_request_flags(struct request_queue *q, unsigned int op,
 	struct request *req;
 
 	WARN_ON_ONCE(op & REQ_NOWAIT);
-	WARN_ON_ONCE(flags & ~BLK_MQ_REQ_NOWAIT);
+	WARN_ON_ONCE(flags & ~(BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_PREEMPT));
 
 	if (q->mq_ops) {
 		req = blk_mq_alloc_request(q, op, flags);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8766587ae8b8..bdbfe760bda0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -290,6 +290,8 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	rq->q = data->q;
 	rq->mq_ctx = data->ctx;
 	rq->cmd_flags = op;
+	if (data->flags & BLK_MQ_REQ_PREEMPT)
+		rq->rq_flags |= RQF_PREEMPT;
 	if (blk_queue_io_stat(data->q))
 		rq->rq_flags |= RQF_IO_STAT;
 	/* do not touch atomic flags, it needs atomic ops against the timer */
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 50c6485cb04f..ee9fc3fb7fca 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -200,6 +200,7 @@ enum {
 	BLK_MQ_REQ_NOWAIT	= (1 << 0), /* return when out of requests */
 	BLK_MQ_REQ_RESERVED	= (1 << 1), /* allocate from reserved pool */
 	BLK_MQ_REQ_INTERNAL	= (1 << 2), /* allocate internal/sched tag */
+	BLK_MQ_REQ_PREEMPT	= (1 << 3), /* set RQF_PREEMPT */
 };
 
 struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
-- 
2.14.2

* [PATCH v8 07/10] ide, scsi: Tell the block layer at request allocation time about preempt requests
  2017-10-10 21:03 [PATCH v8 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
                   ` (5 preceding siblings ...)
  2017-10-10 21:03 ` [PATCH v8 06/10] block: Introduce BLK_MQ_REQ_PREEMPT Bart Van Assche
@ 2017-10-10 21:03 ` Bart Van Assche
  2017-10-10 21:03 ` [PATCH v8 08/10] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag Bart Van Assche
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2017-10-10 21:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche, Hannes Reinecke,
	Johannes Thumshirn

Convert blk_get_request(q, op, __GFP_RECLAIM) into
blk_get_request_flags(q, op, BLK_MQ_REQ_PREEMPT). This patch does not
change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: David S. Miller <davem@davemloft.net> [ for IDE ]
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/ide/ide-pm.c    | 4 ++--
 drivers/scsi/scsi_lib.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/ide/ide-pm.c b/drivers/ide/ide-pm.c
index 544f02d673ca..f56d742908df 100644
--- a/drivers/ide/ide-pm.c
+++ b/drivers/ide/ide-pm.c
@@ -89,9 +89,9 @@ int generic_ide_resume(struct device *dev)
 	}
 
 	memset(&rqpm, 0, sizeof(rqpm));
-	rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
+	rq = blk_get_request_flags(drive->queue, REQ_OP_DRV_IN,
+				   BLK_MQ_REQ_PREEMPT);
 	ide_req(rq)->type = ATA_PRIV_PM_RESUME;
-	rq->rq_flags |= RQF_PREEMPT;
 	rq->special = &rqpm;
 	rqpm.pm_step = IDE_PM_START_RESUME;
 	rqpm.pm_state = PM_EVENT_ON;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9cf6a80fe297..1c16a247fae6 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -252,9 +252,9 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	struct scsi_request *rq;
 	int ret = DRIVER_ERROR << 24;
 
-	req = blk_get_request(sdev->request_queue,
+	req = blk_get_request_flags(sdev->request_queue,
 			data_direction == DMA_TO_DEVICE ?
-			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM);
+			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, BLK_MQ_REQ_PREEMPT);
 	if (IS_ERR(req))
 		return ret;
 	rq = scsi_req(req);
@@ -268,7 +268,7 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	rq->retries = retries;
 	req->timeout = timeout;
 	req->cmd_flags |= flags;
-	req->rq_flags |= rq_flags | RQF_QUIET | RQF_PREEMPT;
+	req->rq_flags |= rq_flags | RQF_QUIET;
 
 	/*
 	 * head injection *required* here otherwise quiesce won't work
-- 
2.14.2

* [PATCH v8 08/10] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
  2017-10-10 21:03 [PATCH v8 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
                   ` (6 preceding siblings ...)
  2017-10-10 21:03 ` [PATCH v8 07/10] ide, scsi: Tell the block layer at request allocation time about preempt requests Bart Van Assche
@ 2017-10-10 21:03 ` Bart Van Assche
  2017-10-10 21:03 ` [PATCH v8 09/10] block, scsi: Make SCSI quiesce and resume work reliably Bart Van Assche
  2017-10-10 21:03 ` [PATCH v8 10/10] block, nvme: Introduce blk_mq_req_flags_t Bart Van Assche
  9 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2017-10-10 21:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche, Hannes Reinecke,
	Johannes Thumshirn

This flag will be used in the next patch to let the block layer
core know whether or not a SCSI request queue has been quiesced.
That matters because a quiesced SCSI queue only processes
RQF_PREEMPT requests.
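
For illustration, a sketch of how the test-and-set semantics of
blk_set_preempt_only() allow a caller to quiesce a queue idempotently
(this is how the next patch uses it in scsi_device_quiesce()):

#include <linux/blkdev.h>

static int example_quiesce(struct request_queue *q)
{
	if (blk_set_preempt_only(q))
		return 0;	/* the queue was already in preempt-only mode */

	/* ... transition the device to the quiesced state ... */

	return 0;
}

static void example_resume(struct request_queue *q)
{
	/* Allow non-preempt requests again. */
	blk_clear_preempt_only(q);
}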

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-core.c       | 30 ++++++++++++++++++++++++++++++
 block/blk-mq-debugfs.c |  1 +
 include/linux/blkdev.h |  6 ++++++
 3 files changed, 37 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 842e67e0b7ab..ed992cbd107f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -346,6 +346,36 @@ void blk_sync_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_sync_queue);
 
+/**
+ * blk_set_preempt_only - set QUEUE_FLAG_PREEMPT_ONLY
+ * @q: request queue pointer
+ *
+ * Returns the previous value of the PREEMPT_ONLY flag - 0 if the flag was not
+ * set and 1 if the flag was already set.
+ */
+int blk_set_preempt_only(struct request_queue *q)
+{
+	unsigned long flags;
+	int res;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	res = queue_flag_test_and_set(QUEUE_FLAG_PREEMPT_ONLY, q);
+	spin_unlock_irqrestore(q->queue_lock, flags);
+
+	return res;
+}
+EXPORT_SYMBOL_GPL(blk_set_preempt_only);
+
+void blk_clear_preempt_only(struct request_queue *q)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
+	spin_unlock_irqrestore(q->queue_lock, flags);
+}
+EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
+
 /**
  * __blk_run_queue_uncond - run a queue whether or not it has been stopped
  * @q:	The queue to run
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 7f4a1ba532af..75f31535f280 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -74,6 +74,7 @@ static const char *const blk_queue_flag_name[] = {
 	QUEUE_FLAG_NAME(REGISTERED),
 	QUEUE_FLAG_NAME(SCSI_PASSTHROUGH),
 	QUEUE_FLAG_NAME(QUIESCED),
+	QUEUE_FLAG_NAME(PREEMPT_ONLY),
 };
 #undef QUEUE_FLAG_NAME
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3f7b08a9b9ae..89555eea742b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -630,6 +630,7 @@ struct request_queue {
 #define QUEUE_FLAG_REGISTERED  26	/* queue has been registered to a disk */
 #define QUEUE_FLAG_SCSI_PASSTHROUGH 27	/* queue supports SCSI commands */
 #define QUEUE_FLAG_QUIESCED    28	/* queue has been quiesced */
+#define QUEUE_FLAG_PREEMPT_ONLY	29	/* only process REQ_PREEMPT requests */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP)	|	\
@@ -730,6 +731,11 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
 			     REQ_FAILFAST_DRIVER))
 #define blk_queue_quiesced(q)	test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)
+#define blk_queue_preempt_only(q)				\
+	test_bit(QUEUE_FLAG_PREEMPT_ONLY, &(q)->queue_flags)
+
+extern int blk_set_preempt_only(struct request_queue *q);
+extern void blk_clear_preempt_only(struct request_queue *q);
 
 static inline bool blk_account_rq(struct request *rq)
 {
-- 
2.14.2

* [PATCH v8 09/10] block, scsi: Make SCSI quiesce and resume work reliably
  2017-10-10 21:03 [PATCH v8 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
                   ` (7 preceding siblings ...)
  2017-10-10 21:03 ` [PATCH v8 08/10] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag Bart Van Assche
@ 2017-10-10 21:03 ` Bart Van Assche
  2017-10-12 16:02   ` Ming Lei
  2017-10-10 21:03 ` [PATCH v8 10/10] block, nvme: Introduce blk_mq_req_flags_t Bart Van Assche
  9 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2017-10-10 21:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche, Hannes Reinecke,
	Johannes Thumshirn

The contexts from which a SCSI device can be quiesced or resumed are:
* Writing into /sys/class/scsi_device/*/device/state.
* SCSI parallel (SPI) domain validation.
* The SCSI device power management methods. See also scsi_bus_pm_ops.

It is essential during suspend and resume that neither the filesystem
state nor the filesystem metadata in RAM changes. This is why SCSI
devices are quiesced while the hibernation image is being written or
restored. The SCSI core quiesces devices through scsi_device_quiesce()
and scsi_device_resume(). In the SDEV_QUIESCE state execution of
non-preempt requests is deferred. This is realized by returning
BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI
devices. Avoid that a full queue prevents power management requests
from being submitted by deferring allocation of non-preempt requests
for devices in the quiesced state. This patch has been tested by running
the following commands and by verifying that after resume the fio job
is still running:

for d in /sys/class/block/sd*[a-z]; do
  hcil=$(readlink "$d/device")
  hcil=${hcil#../../../}
  echo 4 > "$d/queue/nr_requests"
  echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
done
bdev=$(readlink /dev/disk/by-uuid/5217d83f-213e-4b42-b86e-20013325ba6c)
bdev=${bdev#../../}
hcil=$(readlink "/sys/block/$bdev/device")
hcil=${hcil#../../../}
fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 --rw=randread \
  --ioengine=libaio --numjobs=4 --iodepth=16 --iodepth_batch=1 --thread \
  --loops=$((2**31)) &
pid=$!
sleep 1
systemctl hibernate
sleep 10
kill $pid
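
For illustration, a sketch of the gatekeeping behavior that the
blk_queue_enter() changes below implement (hypothetical caller):

#include <linux/blkdev.h>

static void example_enter(struct request_queue *q)
{
	/*
	 * While QUEUE_FLAG_PREEMPT_ONLY is set, only preempt requests
	 * may enter the queue. Other callers either sleep until the
	 * flag is cleared or, with BLK_MQ_REQ_NOWAIT, fail immediately.
	 */
	if (blk_queue_enter(q, BLK_MQ_REQ_PREEMPT) == 0) {
		/* Power management requests pass even while quiesced. */
		blk_queue_exit(q);
	}

	if (blk_queue_enter(q, 0) == 0) {
		/* Regular I/O: only reached once the device has resumed. */
		blk_queue_exit(q);
	}
}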

Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348).
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-core.c        | 42 +++++++++++++++++++++++++++++++++++-------
 block/blk-mq.c          |  4 ++--
 block/blk-timeout.c     |  2 +-
 drivers/scsi/scsi_lib.c | 29 +++++++++++++++++++----------
 fs/block_dev.c          |  4 ++--
 include/linux/blkdev.h  |  2 +-
 6 files changed, 60 insertions(+), 23 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ed992cbd107f..3847ea42e341 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -372,6 +372,7 @@ void blk_clear_preempt_only(struct request_queue *q)
 
 	spin_lock_irqsave(q->queue_lock, flags);
 	queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
+	wake_up_all(&q->mq_freeze_wq);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
 EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
@@ -793,15 +794,40 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(blk_alloc_queue);
 
-int blk_queue_enter(struct request_queue *q, bool nowait)
+/**
+ * blk_queue_enter() - try to increase q->q_usage_counter
+ * @q: request queue pointer
+ * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PREEMPT
+ */
+int blk_queue_enter(struct request_queue *q, unsigned int flags)
 {
+	const bool preempt = flags & BLK_MQ_REQ_PREEMPT;
+
 	while (true) {
+		bool success = false;
 		int ret;
 
-		if (percpu_ref_tryget_live(&q->q_usage_counter))
+		rcu_read_lock_sched();
+		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
+			/*
+			 * The code that sets the PREEMPT_ONLY flag is
+			 * responsible for ensuring that that flag is globally
+			 * visible before the queue is unfrozen.
+			 */
+			if (preempt || !blk_queue_preempt_only(q)) {
+				success = true;
+			} else {
+				percpu_ref_put(&q->q_usage_counter);
+				WARN_ONCE(1, "%s: Attempt to allocate non-preempt request in preempt-only mode.\n",
+					  kobject_name(q->kobj.parent));
+			}
+		}
+		rcu_read_unlock_sched();
+
+		if (success)
 			return 0;
 
-		if (nowait)
+		if (flags & BLK_MQ_REQ_NOWAIT)
 			return -EBUSY;
 
 		/*
@@ -814,7 +840,8 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
 		smp_rmb();
 
 		ret = wait_event_interruptible(q->mq_freeze_wq,
-				!atomic_read(&q->mq_freeze_depth) ||
+				(atomic_read(&q->mq_freeze_depth) == 0 &&
+				 (preempt || !blk_queue_preempt_only(q))) ||
 				blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
@@ -1442,8 +1469,7 @@ static struct request *blk_old_get_request(struct request_queue *q,
 	/* create ioc upfront */
 	create_io_context(gfp_mask, q->node);
 
-	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM) ||
-			      (op & REQ_NOWAIT));
+	ret = blk_queue_enter(q, flags);
 	if (ret)
 		return ERR_PTR(ret);
 	spin_lock_irq(q->queue_lock);
@@ -2264,8 +2290,10 @@ blk_qc_t generic_make_request(struct bio *bio)
 	current->bio_list = bio_list_on_stack;
 	do {
 		struct request_queue *q = bio->bi_disk->queue;
+		unsigned int flags = bio->bi_opf & REQ_NOWAIT ?
+			BLK_MQ_REQ_NOWAIT : 0;
 
-		if (likely(blk_queue_enter(q, bio->bi_opf & REQ_NOWAIT) == 0)) {
+		if (likely(blk_queue_enter(q, flags) == 0)) {
 			struct bio_list lower, same;
 
 			/* Create a fresh bio_list for all subordinate requests */
diff --git a/block/blk-mq.c b/block/blk-mq.c
index bdbfe760bda0..44a06e8541f2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -386,7 +386,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 	struct request *rq;
 	int ret;
 
-	ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
+	ret = blk_queue_enter(q, flags);
 	if (ret)
 		return ERR_PTR(ret);
 
@@ -425,7 +425,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 	if (hctx_idx >= q->nr_hw_queues)
 		return ERR_PTR(-EIO);
 
-	ret = blk_queue_enter(q, true);
+	ret = blk_queue_enter(q, flags);
 	if (ret)
 		return ERR_PTR(ret);
 
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index e3e9c9771d36..1eba71486716 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -134,7 +134,7 @@ void blk_timeout_work(struct work_struct *work)
 	struct request *rq, *tmp;
 	int next_set = 0;
 
-	if (blk_queue_enter(q, true))
+	if (blk_queue_enter(q, BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_PREEMPT))
 		return;
 	spin_lock_irqsave(q->queue_lock, flags);
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1c16a247fae6..462e3fe385db 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2926,21 +2926,30 @@ static void scsi_wait_for_queuecommand(struct scsi_device *sdev)
 int
 scsi_device_quiesce(struct scsi_device *sdev)
 {
+	struct request_queue *q = sdev->request_queue;
 	int err;
 
+	/* If the SCSI device has already been quiesced, do nothing. */
+	if (blk_set_preempt_only(q))
+		return 0;
+
+	blk_mq_freeze_queue(q);
+	/*
+	 * Ensure that the effect of blk_set_preempt_only() will be visible
+	 * for percpu_ref_tryget() callers that occur after the queue
+	 * unfreeze even if the queue was already frozen before this function
+	 * was called. See also https://lwn.net/Articles/573497/.
+	 */
+	synchronize_rcu();
+	blk_mq_unfreeze_queue(q);
+
 	mutex_lock(&sdev->state_mutex);
 	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
-	mutex_unlock(&sdev->state_mutex);
-
 	if (err)
-		return err;
+		blk_clear_preempt_only(q);
+	mutex_unlock(&sdev->state_mutex);
 
-	scsi_run_queue(sdev->request_queue);
-	while (atomic_read(&sdev->device_busy)) {
-		msleep_interruptible(200);
-		scsi_run_queue(sdev->request_queue);
-	}
-	return 0;
+	return err;
 }
 EXPORT_SYMBOL(scsi_device_quiesce);
 
@@ -2962,7 +2971,7 @@ void scsi_device_resume(struct scsi_device *sdev)
 	mutex_lock(&sdev->state_mutex);
 	if (sdev->sdev_state == SDEV_QUIESCE &&
 	    scsi_device_set_state(sdev, SDEV_RUNNING) == 0)
-		scsi_run_queue(sdev->request_queue);
+		blk_clear_preempt_only(sdev->request_queue);
 	mutex_unlock(&sdev->state_mutex);
 }
 EXPORT_SYMBOL(scsi_device_resume);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 93d088ffc05c..98cf2d7ee9d3 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -674,7 +674,7 @@ int bdev_read_page(struct block_device *bdev, sector_t sector,
 	if (!ops->rw_page || bdev_get_integrity(bdev))
 		return result;
 
-	result = blk_queue_enter(bdev->bd_queue, false);
+	result = blk_queue_enter(bdev->bd_queue, 0);
 	if (result)
 		return result;
 	result = ops->rw_page(bdev, sector + get_start_sect(bdev), page, false);
@@ -710,7 +710,7 @@ int bdev_write_page(struct block_device *bdev, sector_t sector,
 
 	if (!ops->rw_page || bdev_get_integrity(bdev))
 		return -EOPNOTSUPP;
-	result = blk_queue_enter(bdev->bd_queue, false);
+	result = blk_queue_enter(bdev->bd_queue, 0);
 	if (result)
 		return result;
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 89555eea742b..0a4638cf0687 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -967,7 +967,7 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 			 struct scsi_ioctl_command __user *);
 
-extern int blk_queue_enter(struct request_queue *q, bool nowait);
+extern int blk_queue_enter(struct request_queue *q, unsigned int flags);
 extern void blk_queue_exit(struct request_queue *q);
 extern void blk_start_queue(struct request_queue *q);
 extern void blk_start_queue_async(struct request_queue *q);
-- 
2.14.2

* [PATCH v8 10/10] block, nvme: Introduce blk_mq_req_flags_t
  2017-10-10 21:03 [PATCH v8 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
                   ` (8 preceding siblings ...)
  2017-10-10 21:03 ` [PATCH v8 09/10] block, scsi: Make SCSI quiesce and resume work reliably Bart Van Assche
@ 2017-10-10 21:03 ` Bart Van Assche
  9 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2017-10-10 21:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche, linux-nvme,
	Hannes Reinecke, Johannes Thumshirn

Several block layer and NVMe core functions accept a combination
of BLK_MQ_REQ_* flags through the 'flags' argument, but there is
no compile-time verification that the right type of block layer
flags is passed. Make it possible for sparse to verify this.
This patch does not change any functionality.
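
For illustration, the kind of misuse that the __bitwise annotation
makes visible to sparse (make C=2); this caller is hypothetical:

#include <linux/blkdev.h>

static void example_type_check(struct request_queue *q)
{
	/* Type-clean: a blk_mq_req_flags_t value. */
	if (blk_queue_enter(q, BLK_MQ_REQ_NOWAIT) == 0)
		blk_queue_exit(q);

	/* sparse warns: expected restricted blk_mq_req_flags_t, got int. */
	if (blk_queue_enter(q, 1) == 0)
		blk_queue_exit(q);
}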

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-nvme@lists.infradead.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c          | 12 ++++++------
 block/blk-mq.c            |  4 ++--
 block/blk-mq.h            |  2 +-
 drivers/nvme/host/core.c  |  5 +++--
 drivers/nvme/host/nvme.h  |  5 +++--
 include/linux/blk-mq.h    | 17 +++++++++++------
 include/linux/blk_types.h |  2 ++
 include/linux/blkdev.h    |  4 ++--
 8 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 3847ea42e341..3bd4d42cee80 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -799,7 +799,7 @@ EXPORT_SYMBOL(blk_alloc_queue);
  * @q: request queue pointer
  * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PREEMPT
  */
-int blk_queue_enter(struct request_queue *q, unsigned int flags)
+int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 {
 	const bool preempt = flags & BLK_MQ_REQ_PREEMPT;
 
@@ -1224,7 +1224,7 @@ int blk_update_nr_requests(struct request_queue *q, unsigned int nr)
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
 static struct request *__get_request(struct request_list *rl, unsigned int op,
-				     struct bio *bio, unsigned int flags)
+				     struct bio *bio, blk_mq_req_flags_t flags)
 {
 	struct request_queue *q = rl->q;
 	struct request *rq;
@@ -1407,7 +1407,7 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
 static struct request *get_request(struct request_queue *q, unsigned int op,
-				   struct bio *bio, unsigned int flags)
+				   struct bio *bio, blk_mq_req_flags_t flags)
 {
 	const bool is_sync = op_is_sync(op);
 	DEFINE_WAIT(wait);
@@ -1457,7 +1457,7 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
 
 /* flags: BLK_MQ_REQ_PREEMPT and/or BLK_MQ_REQ_NOWAIT. */
 static struct request *blk_old_get_request(struct request_queue *q,
-					   unsigned int op, unsigned int flags)
+				unsigned int op, blk_mq_req_flags_t flags)
 {
 	struct request *rq;
 	gfp_t gfp_mask = flags & BLK_MQ_REQ_NOWAIT ? GFP_ATOMIC :
@@ -1494,7 +1494,7 @@ static struct request *blk_old_get_request(struct request_queue *q,
  * @flags: BLK_MQ_REQ_* flags, e.g. BLK_MQ_REQ_NOWAIT.
  */
 struct request *blk_get_request_flags(struct request_queue *q, unsigned int op,
-				      unsigned int flags)
+				      blk_mq_req_flags_t flags)
 {
 	struct request *req;
 
@@ -2290,7 +2290,7 @@ blk_qc_t generic_make_request(struct bio *bio)
 	current->bio_list = bio_list_on_stack;
 	do {
 		struct request_queue *q = bio->bi_disk->queue;
-		unsigned int flags = bio->bi_opf & REQ_NOWAIT ?
+		blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
 			BLK_MQ_REQ_NOWAIT : 0;
 
 		if (likely(blk_queue_enter(q, flags) == 0)) {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 44a06e8541f2..8fa03e3f97ca 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -380,7 +380,7 @@ static struct request *blk_mq_get_request(struct request_queue *q,
 }
 
 struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
-		unsigned int flags)
+		blk_mq_req_flags_t flags)
 {
 	struct blk_mq_alloc_data alloc_data = { .flags = flags };
 	struct request *rq;
@@ -406,7 +406,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 EXPORT_SYMBOL(blk_mq_alloc_request);
 
 struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
-		unsigned int op, unsigned int flags, unsigned int hctx_idx)
+	unsigned int op, blk_mq_req_flags_t flags, unsigned int hctx_idx)
 {
 	struct blk_mq_alloc_data alloc_data = { .flags = flags };
 	struct request *rq;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index ef15b3414da5..1f0f7bb3c7e4 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -108,7 +108,7 @@ static inline void blk_mq_put_ctx(struct blk_mq_ctx *ctx)
 struct blk_mq_alloc_data {
 	/* input parameter */
 	struct request_queue *q;
-	unsigned int flags;
+	blk_mq_req_flags_t flags;
 	unsigned int shallow_depth;
 
 	/* input & output parameter */
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index bb2aad078637..01947cd82b5a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -292,7 +292,7 @@ static struct nvme_ns *nvme_get_ns_from_disk(struct gendisk *disk)
 }
 
 struct request *nvme_alloc_request(struct request_queue *q,
-		struct nvme_command *cmd, unsigned int flags, int qid)
+		struct nvme_command *cmd, blk_mq_req_flags_t flags, int qid)
 {
 	unsigned op = nvme_is_write(cmd) ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN;
 	struct request *req;
@@ -560,7 +560,8 @@ EXPORT_SYMBOL_GPL(nvme_setup_cmd);
  */
 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		union nvme_result *result, void *buffer, unsigned bufflen,
-		unsigned timeout, int qid, int at_head, int flags)
+		unsigned timeout, int qid, int at_head,
+		blk_mq_req_flags_t flags)
 {
 	struct request *req;
 	int ret;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index d3f3c4447515..61b25e8c222c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -314,14 +314,15 @@ void nvme_start_freeze(struct nvme_ctrl *ctrl);
 
 #define NVME_QID_ANY -1
 struct request *nvme_alloc_request(struct request_queue *q,
-		struct nvme_command *cmd, unsigned int flags, int qid);
+		struct nvme_command *cmd, blk_mq_req_flags_t flags, int qid);
 blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req,
 		struct nvme_command *cmd);
 int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		void *buf, unsigned bufflen);
 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		union nvme_result *result, void *buffer, unsigned bufflen,
-		unsigned timeout, int qid, int at_head, int flags);
+		unsigned timeout, int qid, int at_head,
+		blk_mq_req_flags_t flags);
 int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
 void nvme_start_keep_alive(struct nvme_ctrl *ctrl);
 void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index ee9fc3fb7fca..80505e5b9684 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -197,16 +197,21 @@ void blk_mq_free_request(struct request *rq);
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *);
 
 enum {
-	BLK_MQ_REQ_NOWAIT	= (1 << 0), /* return when out of requests */
-	BLK_MQ_REQ_RESERVED	= (1 << 1), /* allocate from reserved pool */
-	BLK_MQ_REQ_INTERNAL	= (1 << 2), /* allocate internal/sched tag */
-	BLK_MQ_REQ_PREEMPT	= (1 << 3), /* set RQF_PREEMPT */
+	/* return when out of requests */
+	BLK_MQ_REQ_NOWAIT	= (__force blk_mq_req_flags_t)(1 << 0),
+	/* allocate from reserved pool */
+	BLK_MQ_REQ_RESERVED	= (__force blk_mq_req_flags_t)(1 << 1),
+	/* allocate internal/sched tag */
+	BLK_MQ_REQ_INTERNAL	= (__force blk_mq_req_flags_t)(1 << 2),
+	/* set RQF_PREEMPT */
+	BLK_MQ_REQ_PREEMPT	= (__force blk_mq_req_flags_t)(1 << 3),
 };
 
 struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
-		unsigned int flags);
+		blk_mq_req_flags_t flags);
 struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
-		unsigned int op, unsigned int flags, unsigned int hctx_idx);
+		unsigned int op, blk_mq_req_flags_t flags,
+		unsigned int hctx_idx);
 struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag);
 
 enum {
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index a2d2aa709cef..03325d576cab 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -162,6 +162,8 @@ struct bio {
  */
 #define BIO_RESET_BITS	BVEC_POOL_OFFSET
 
+typedef __u32 __bitwise blk_mq_req_flags_t;
+
 /*
  * Operations and flags common to the bio and request structures.
  * We use 8 bits for encoding the operation, and the remaining 24 for flags.
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0a4638cf0687..d1b7e18ce1c6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -943,7 +943,7 @@ extern void blk_put_request(struct request *);
 extern void __blk_put_request(struct request_queue *, struct request *);
 extern struct request *blk_get_request_flags(struct request_queue *,
 					     unsigned int op,
-					     unsigned int flags);
+					     blk_mq_req_flags_t flags);
 extern struct request *blk_get_request(struct request_queue *, unsigned int op,
 				       gfp_t gfp_mask);
 extern void blk_requeue_request(struct request_queue *, struct request *);
@@ -967,7 +967,7 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 			 struct scsi_ioctl_command __user *);
 
-extern int blk_queue_enter(struct request_queue *q, unsigned int flags);
+extern int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags);
 extern void blk_queue_exit(struct request_queue *q);
 extern void blk_start_queue(struct request_queue *q);
 extern void blk_start_queue_async(struct request_queue *q);
-- 
2.14.2

* Re: [PATCH v8 03/10] md: Neither resync nor reshape while the system is frozen
  2017-10-10 21:03 ` [PATCH v8 03/10] md: Neither resync nor reshape while the system is frozen Bart Van Assche
@ 2017-10-10 22:30   ` Shaohua Li
  2017-10-10 23:33     ` Bart Van Assche
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2017-10-10 22:30 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Oleksandr Natalenko, Ming Lei, linux-raid,
	Hannes Reinecke, Johannes Thumshirn

On Tue, Oct 10, 2017 at 02:03:39PM -0700, Bart Van Assche wrote:
> Some people use the md driver on laptops and use the suspend and
> resume functionality. Since it is essential that submitting of
> new I/O requests stops before a hibernation image is created,
> interrupt the md resync and reshape actions if the system is
> being frozen. Note: the resync and reshape will restart after
> the system is resumed and a message similar to the following
> will appear in the system log:
> 
> md: md0: data-check interrupted.

Where do we restart resync and reshape?
 
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: linux-raid@vger.kernel.org
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  drivers/md/md.c | 30 +++++++++++++++++++++++++++++-
>  1 file changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index b99584e5d6b1..d712d3320c1d 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -66,6 +66,8 @@
>  #include <linux/raid/md_u.h>
>  #include <linux/slab.h>
>  #include <linux/percpu-refcount.h>
> +#include <linux/freezer.h>
> +#include <linux/suspend.h>
>  
>  #include <trace/events/block.h>
>  #include "md.h"
> @@ -7439,6 +7441,7 @@ static int md_thread(void *arg)
>  	 */
>  
>  	allow_signal(SIGKILL);
> +	set_freezable();
>  	while (!kthread_should_stop()) {
>  
>  		/* We need to wait INTERRUPTIBLE so that
> @@ -7449,7 +7452,7 @@ static int md_thread(void *arg)
>  		if (signal_pending(current))
>  			flush_signals(current);
>  
> -		wait_event_interruptible_timeout
> +		wait_event_freezable_timeout
>  			(thread->wqueue,
>  			 test_bit(THREAD_WAKEUP, &thread->flags)
>  			 || kthread_should_stop() || kthread_should_park(),
> @@ -8963,6 +8966,29 @@ static void md_stop_all_writes(void)
>  		mdelay(1000*1);
>  }
>  
> +/*
> + * Ensure that neither resyncing nor reshaping occurs while the system is
> + * frozen.
> + */
> +static int md_notify_pm(struct notifier_block *bl, unsigned long state,
> +			void *unused)
> +{
> +	pr_debug("%s: state = %ld\n", __func__, state);
> +
> +	switch (state) {
> +	case PM_HIBERNATION_PREPARE:
> +	case PM_SUSPEND_PREPARE:
> +	case PM_RESTORE_PREPARE:
> +		md_stop_all_writes();
> +		break;
> +	}
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block md_pm_notifier = {
> +	.notifier_call	= md_notify_pm,
> +};
> +
>  static int md_notify_reboot(struct notifier_block *this,
>  			    unsigned long code, void *x)
>  {
> @@ -9009,6 +9035,7 @@ static int __init md_init(void)
>  			    md_probe, NULL, NULL);
>  
>  	register_reboot_notifier(&md_reboot_notifier);
> +	register_pm_notifier(&md_pm_notifier);
>  	raid_table_header = register_sysctl_table(raid_root_table);
>  
>  	md_geninit();
> @@ -9248,6 +9275,7 @@ static __exit void md_exit(void)
>  
>  	unregister_blkdev(MD_MAJOR,"md");
>  	unregister_blkdev(mdp_major, "mdp");
> +	unregister_pm_notifier(&md_pm_notifier);
>  	unregister_reboot_notifier(&md_reboot_notifier);
>  	unregister_sysctl_table(raid_table_header);
>  
> -- 
> 2.14.2
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v8 03/10] md: Neither resync nor reshape while the system is frozen
  2017-10-10 22:30   ` Shaohua Li
@ 2017-10-10 23:33     ` Bart Van Assche
  2017-10-11  1:48       ` Shaohua Li
  0 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2017-10-10 23:33 UTC (permalink / raw)
  To: shli@kernel.org
  Cc: linux-block@vger.kernel.org, jthumshirn@suse.de,
	linux-raid@vger.kernel.org, hch@lst.de,
	martin.petersen@oracle.com, axboe@kernel.dk, ming.lei@redhat.com,
	linux-scsi@vger.kernel.org, oleksandr@natalenko.name,
	hare@suse.com

On Tue, 2017-10-10 at 15:30 -0700, Shaohua Li wrote:
> On Tue, Oct 10, 2017 at 02:03:39PM -0700, Bart Van Assche wrote:
> > Some people use the md driver on laptops and use the suspend and
> > resume functionality. Since it is essential that submitting of
> > new I/O requests stops before a hibernation image is created,
> > interrupt the md resync and reshape actions if the system is
> > being frozen. Note: the resync and reshape will restart after
> > the system is resumed and a message similar to the following
> > will appear in the system log:
> > 
> > md: md0: data-check interrupted.
> 
> Where do we restart resync and reshape?

Hello Shaohua,

My understanding of the md driver is as follows:
- Before any write occurs, md_write_start() is called. That function clears
  mddev->safemode and checks mddev->in_sync. If that variable is set, it is
  cleared and md_write_start() wakes up mddev->thread and waits until the
  superblock has been written to disk.
- All mddev->thread implementations call md_check_recovery(). That function
  updates the superblock by calling md_update_sb() if needed.
  md_check_recovery() also starts a resync thread if the array is not in
  sync. See also the comment block above md_check_recovery().
- md_do_sync() performs the resync. If it completes successfully the
  "in_sync" state is set (set_in_sync()). If it is interrupted the "in_sync"
  state is not modified.
- super_90_sync() sets the MD_SB_CLEAN flag in the on-disk superblock if the
  "in_sync" flag is set.

In other words: if the array is in sync the MD_SB_CLEAN flag is set in the
on-disk superblock. If a resync is needed that flag is not set. The
MD_SB_CLEAN flag is checked when the superblock is loaded. If that flag is
not set a resync is started.

Bart.

^ permalink raw reply	[flat|nested] 20+ messages in thread
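
The superblock logic walked through above can be condensed into a minimal
sketch (simplified pseudo-kernel code, not the actual drivers/md
implementation; start_resync() is a hypothetical stand-in for the real
recovery machinery, and mdp_super_t/MD_SB_CLEAN are from linux/raid/md_p.h):

/* On a clean state transition the on-disk superblock is marked clean: */
static void sketch_super_90_sync(struct mddev *mddev, mdp_super_t *sb)
{
	if (mddev->in_sync)
		sb->state |= 1 << MD_SB_CLEAN;
}

/* When the array is assembled again, a missing clean flag restarts the
 * resync, which is why an interrupted data-check resumes after boot: */
static void sketch_super_90_load(struct mddev *mddev, mdp_super_t *sb)
{
	if (!(sb->state & (1 << MD_SB_CLEAN)))
		start_resync(mddev);		/* hypothetical helper */
}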

* Re: [PATCH v8 03/10] md: Neither resync nor reshape while the system is frozen
  2017-10-10 23:33     ` Bart Van Assche
@ 2017-10-11  1:48       ` Shaohua Li
  2017-10-11 17:17         ` Bart Van Assche
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2017-10-11  1:48 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block@vger.kernel.org, jthumshirn@suse.de,
	linux-raid@vger.kernel.org, hch@lst.de,
	martin.petersen@oracle.com, axboe@kernel.dk, ming.lei@redhat.com,
	linux-scsi@vger.kernel.org, oleksandr@natalenko.name,
	hare@suse.com

On Tue, Oct 10, 2017 at 11:33:06PM +0000, Bart Van Assche wrote:
> On Tue, 2017-10-10 at 15:30 -0700, Shaohua Li wrote:
> > On Tue, Oct 10, 2017 at 02:03:39PM -0700, Bart Van Assche wrote:
> > > Some people use the md driver on laptops and use the suspend and
> > > resume functionality. Since it is essential that submitting of
> > > new I/O requests stops before a hibernation image is created,
> > > interrupt the md resync and reshape actions if the system is
> > > being frozen. Note: the resync and reshape will restart after
> > > the system is resumed and a message similar to the following
> > > will appear in the system log:
> > > 
> > > md: md0: data-check interrupted.
> > 
> > Where do we restart resync and reshape?
> 
> Hello Shaohua,
> 
> My understanding of the md driver is as follows:
> - Before any write occurs, md_write_start() is called. That function clears
>   mddev->safemode and checks mddev->in_sync. If that variable is set, it is
>   cleared and md_write_start() wakes up mddev->thread and waits until the
>   superblock has been written to disk.
> - All mddev->thread implementations call md_check_recovery(). That function
>   updates the superblock by calling md_update_sb() if needed.
>   md_check_recovery() also starts a resync thread if the array is not in
>   sync. See also the comment block above md_check_recovery().
> - md_do_sync() performs the resync. If it completes successfully the
>   "in_sync" state is set (set_in_sync()). If it is interrupted the "in_sync"
>   state is not modified.
> - super_90_sync() sets the MD_SB_CLEAN flag in the on-disk superblock if the
>   "in_sync" flag is set.
> 
> In other words: if the array is in sync the MD_SB_CLEAN flag is set in the
> on-disk superblock. If a resync is needed that flag is not set. The
> MD_SB_CLEAN flag is checked when the superblock is loaded. If that flag is
> not set a resync is started.

The problem is that __md_stop_writes() sets bits like MD_RECOVERY_FROZEN,
which will prevent md_check_recovery() from restarting the resync/reshape.
I think we need to preserve mddev->recovery during suspend prepare and
restore it after resume.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 20+ messages in thread
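
Shaohua's point can be restated as a short sketch (simplified, not the
actual md_check_recovery(); start_resync_thread() is a hypothetical
stand-in for the real recovery machinery):

static void md_check_recovery_sketch(struct mddev *mddev)
{
	/* Left set by __md_stop_writes(), this flag stops any restart: */
	if (test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
		return;

	if (test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
		start_resync_thread(mddev);	/* hypothetical helper */
}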

* Re: [PATCH v8 03/10] md: Neither resync nor reshape while the system is frozen
  2017-10-11  1:48       ` Shaohua Li
@ 2017-10-11 17:17         ` Bart Van Assche
  2017-10-11 19:32           ` Shaohua Li
  0 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2017-10-11 17:17 UTC (permalink / raw)
  To: shli@kernel.org
  Cc: linux-block@vger.kernel.org, jthumshirn@suse.de,
	linux-raid@vger.kernel.org, hch@lst.de,
	martin.petersen@oracle.com, axboe@kernel.dk, ming.lei@redhat.com,
	linux-scsi@vger.kernel.org, oleksandr@natalenko.name,
	hare@suse.com

On Tue, 2017-10-10 at 18:48 -0700, Shaohua Li wrote:
> The problem is that __md_stop_writes() sets bits like MD_RECOVERY_FROZEN,
> which will prevent md_check_recovery() from restarting the resync/reshape.
> I think we need to preserve mddev->recovery during suspend prepare and
> restore it after resume.

Hello Shaohua,

As far as I can see __md_stop_writes() sets MD_RECOVERY_FROZEN and can set
MD_RECOVERY_INTR. Since md_check_recovery() clears MD_RECOVERY_INTR I think
it should be sufficient to save and restore the state of the
MD_RECOVERY_FROZEN flag. However, when I ran the following test:
* echo frozen > /sys/block/md0/md/sync_action
* cat /sys/block/md0/md/sync_action
* systemctl hibernate
* (power on system again)
* cat /sys/block/md0/md/sync_action

the output that appeared was as follows:

frozen
idle

Does that mean that a hibernate or suspend followed by a resume is sufficient
to clear the MD_RECOVERY_FROZEN flag for the md drivers and hence that the
patch at the start of this e-mail thread does not need any further
modifications?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v8 03/10] md: Neither resync nor reshape while the system is frozen
  2017-10-11 17:17         ` Bart Van Assche
@ 2017-10-11 19:32           ` Shaohua Li
  2017-10-12 16:59             ` Bart Van Assche
  0 siblings, 1 reply; 20+ messages in thread
From: Shaohua Li @ 2017-10-11 19:32 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block@vger.kernel.org, jthumshirn@suse.de,
	linux-raid@vger.kernel.org, hch@lst.de,
	martin.petersen@oracle.com, axboe@kernel.dk, ming.lei@redhat.com,
	linux-scsi@vger.kernel.org, oleksandr@natalenko.name,
	hare@suse.com

On Wed, Oct 11, 2017 at 05:17:56PM +0000, Bart Van Assche wrote:
> On Tue, 2017-10-10 at 18:48 -0700, Shaohua Li wrote:
> > The problem is that __md_stop_writes() sets bits like MD_RECOVERY_FROZEN,
> > which will prevent md_check_recovery() from restarting the resync/reshape.
> > I think we need to preserve mddev->recovery during suspend prepare and
> > restore it after resume.
> 
> Hello Shaohua,
> 
> As far as I can see __md_stop_writes() sets MD_RECOVERY_FROZEN and can set
> MD_RECOVERY_INTR. Since md_check_recovery() clears MD_RECOVERY_INTR I think
> it should be sufficient to save and restore the state of the
> MD_RECOVERY_FROZEN flag. However, when I ran the following test:
> * echo frozen > /sys/block/md0/md/sync_action
> * cat /sys/block/md0/md/sync_action
> * systemctl hibernate
> * (power on system again)
> * cat /sys/block/md0/md/sync_action
> 
> the output that appeared was as follows:
> 
> frozen
> idle
> 
> Does that mean that a hibernate or suspend followed by a resume is sufficient
> to clear the MD_RECOVERY_FROZEN flag for the md drivers and hence that the
> patch at the start of this e-mail thread does not need any further
> modifications?

I have no idea why it shows 'idle'. From my understanding, after resume the
previous memory image is restored and the MD_RECOVERY_FROZEN bit should still
be set. I was trying to understand what happens.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v8 09/10] block, scsi: Make SCSI quiesce and resume work reliably
  2017-10-10 21:03 ` [PATCH v8 09/10] block, scsi: Make SCSI quiesce and resume work reliably Bart Van Assche
@ 2017-10-12 16:02   ` Ming Lei
  2017-10-12 17:18     ` Bart Van Assche
  0 siblings, 1 reply; 20+ messages in thread
From: Ming Lei @ 2017-10-12 16:02 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Oleksandr Natalenko, Hannes Reinecke,
	Johannes Thumshirn

On Tue, Oct 10, 2017 at 02:03:45PM -0700, Bart Van Assche wrote:
> The contexts from which a SCSI device can be quiesced or resumed are:
> * Writing into /sys/class/scsi_device/*/device/state.
> * SCSI parallel (SPI) domain validation.
> * The SCSI device power management methods. See also scsi_bus_pm_ops.
> 
> It is essential during suspend and resume that neither the filesystem
> state nor the filesystem metadata in RAM changes. This is why while
> the hibernation image is being written or restored that SCSI devices
> are quiesced. The SCSI core quiesces devices through scsi_device_quiesce()
> and scsi_device_resume(). In the SDEV_QUIESCE state execution of
> non-preempt requests is deferred. This is realized by returning
> BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI
> devices. Avoid that a full queue prevents power management requests
> to be submitted by deferring allocation of non-preempt requests for
> devices in the quiesced state. This patch has been tested by running
> the following commands and by verifying that after resume the fio job
> is still running:
> 
> for d in /sys/class/block/sd*[a-z]; do
>   hcil=$(readlink "$d/device")
>   hcil=${hcil#../../../}
>   echo 4 > "$d/queue/nr_requests"
>   echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
> done
> bdev=$(readlink /dev/disk/by-uuid/5217d83f-213e-4b42-b86e-20013325ba6c)
> bdev=${bdev#../../}
> hcil=$(readlink "/sys/block/$bdev/device")
> hcil=${hcil#../../../}
> fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 --rw=randread \
>   --ioengine=libaio --numjobs=4 --iodepth=16 --iodepth_batch=1 --thread \
>   --loops=$((2**31)) &
> pid=$!
> sleep 1
> systemctl hibernate
> sleep 10
> kill $pid
> 
> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348).
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Martin K. Petersen <martin.petersen@oracle.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  block/blk-core.c        | 42 +++++++++++++++++++++++++++++++++++-------
>  block/blk-mq.c          |  4 ++--
>  block/blk-timeout.c     |  2 +-
>  drivers/scsi/scsi_lib.c | 29 +++++++++++++++++++----------
>  fs/block_dev.c          |  4 ++--
>  include/linux/blkdev.h  |  2 +-
>  6 files changed, 60 insertions(+), 23 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index ed992cbd107f..3847ea42e341 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -372,6 +372,7 @@ void blk_clear_preempt_only(struct request_queue *q)
>  
>  	spin_lock_irqsave(q->queue_lock, flags);
>  	queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
> +	wake_up_all(&q->mq_freeze_wq);
>  	spin_unlock_irqrestore(q->queue_lock, flags);
>  }
>  EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
> @@ -793,15 +794,40 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
>  }
>  EXPORT_SYMBOL(blk_alloc_queue);
>  
> -int blk_queue_enter(struct request_queue *q, bool nowait)
> +/**
> + * blk_queue_enter() - try to increase q->q_usage_counter
> + * @q: request queue pointer
> + * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PREEMPT
> + */
> +int blk_queue_enter(struct request_queue *q, unsigned int flags)
>  {
> +	const bool preempt = flags & BLK_MQ_REQ_PREEMPT;
> +
>  	while (true) {
> +		bool success = false;
>  		int ret;
>  
> -		if (percpu_ref_tryget_live(&q->q_usage_counter))
> +		rcu_read_lock_sched();
> +		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
> +			/*
> +			 * The code that sets the PREEMPT_ONLY flag is
> +			 * responsible for ensuring that that flag is globally
> +			 * visible before the queue is unfrozen.
> +			 */
> +			if (preempt || !blk_queue_preempt_only(q)) {
> +				success = true;
> +			} else {
> +				percpu_ref_put(&q->q_usage_counter);
> +				WARN_ONCE(1, "%s: Attempt to allocate non-preempt request in preempt-only mode.\n",
> +					  kobject_name(q->kobj.parent));
> +			}
> +		}
> +		rcu_read_unlock_sched();
> +
> +		if (success)
>  			return 0;
>  
> -		if (nowait)
> +		if (flags & BLK_MQ_REQ_NOWAIT)
>  			return -EBUSY;
>  
>  		/*
> @@ -814,7 +840,8 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
>  		smp_rmb();
>  
>  		ret = wait_event_interruptible(q->mq_freeze_wq,
> -				!atomic_read(&q->mq_freeze_depth) ||
> +				(atomic_read(&q->mq_freeze_depth) == 0 &&
> +				 (preempt || !blk_queue_preempt_only(q))) ||
>  				blk_queue_dying(q));
>  		if (blk_queue_dying(q))
>  			return -ENODEV;
> @@ -1442,8 +1469,7 @@ static struct request *blk_old_get_request(struct request_queue *q,
>  	/* create ioc upfront */
>  	create_io_context(gfp_mask, q->node);
>  
> -	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM) ||
> -			      (op & REQ_NOWAIT));
> +	ret = blk_queue_enter(q, flags);
>  	if (ret)
>  		return ERR_PTR(ret);
>  	spin_lock_irq(q->queue_lock);
> @@ -2264,8 +2290,10 @@ blk_qc_t generic_make_request(struct bio *bio)
>  	current->bio_list = bio_list_on_stack;
>  	do {
>  		struct request_queue *q = bio->bi_disk->queue;
> +		unsigned int flags = bio->bi_opf & REQ_NOWAIT ?
> +			BLK_MQ_REQ_NOWAIT : 0;
>  
> -		if (likely(blk_queue_enter(q, bio->bi_opf & REQ_NOWAIT) == 0)) {
> +		if (likely(blk_queue_enter(q, flags) == 0)) {
>  			struct bio_list lower, same;
>  
>  			/* Create a fresh bio_list for all subordinate requests */
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index bdbfe760bda0..44a06e8541f2 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -386,7 +386,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
>  	struct request *rq;
>  	int ret;
>  
> -	ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
> +	ret = blk_queue_enter(q, flags);
>  	if (ret)
>  		return ERR_PTR(ret);
>  
> @@ -425,7 +425,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
>  	if (hctx_idx >= q->nr_hw_queues)
>  		return ERR_PTR(-EIO);
>  
> -	ret = blk_queue_enter(q, true);
> +	ret = blk_queue_enter(q, flags);
>  	if (ret)
>  		return ERR_PTR(ret);
>  
> diff --git a/block/blk-timeout.c b/block/blk-timeout.c
> index e3e9c9771d36..1eba71486716 100644
> --- a/block/blk-timeout.c
> +++ b/block/blk-timeout.c
> @@ -134,7 +134,7 @@ void blk_timeout_work(struct work_struct *work)
>  	struct request *rq, *tmp;
>  	int next_set = 0;
>  
> -	if (blk_queue_enter(q, true))
> +	if (blk_queue_enter(q, BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_PREEMPT))
>  		return;
>  	spin_lock_irqsave(q->queue_lock, flags);
>  
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 1c16a247fae6..462e3fe385db 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2926,21 +2926,30 @@ static void scsi_wait_for_queuecommand(struct scsi_device *sdev)
>  int
>  scsi_device_quiesce(struct scsi_device *sdev)
>  {
> +	struct request_queue *q = sdev->request_queue;
>  	int err;
>  
> +	/* If the SCSI device has already been quiesced, do nothing. */
> +	if (blk_set_preempt_only(q))
> +		return 0;

Last time I mentioned that this way isn't safe: SCSI domain validation
commands can be sent concurrently, and there might also be an issue wrt.
runtime PM vs. system suspend.

The issue could have been avoided simply by setting the flag and
continuing to freeze the queue.

> +
> +	blk_mq_freeze_queue(q);
> +	/*
> +	 * Ensure that the effect of blk_set_preempt_only() will be visible
> +	 * for percpu_ref_tryget() callers that occur after the queue
> +	 * unfreeze even if the queue was already frozen before this function
> +	 * was called. See also https://lwn.net/Articles/573497/.
> +	 */
> +	synchronize_rcu();

No, it is a bad idea to do two synchronize_rcu() calls, which can be very
slow, especially on a big machine; this may slow down suspend considerably.


-- 
Ming

^ permalink raw reply	[flat|nested] 20+ messages in thread
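
The interaction between quiesce and power management that this patch aims
for can be sketched as follows (a minimal illustration built on
blk_get_request_flags() from patch 05 and BLK_MQ_REQ_PREEMPT from patch 06;
the function name is made up and the command setup is elided):

#include <linux/blkdev.h>

/* While a device sits in SDEV_QUIESCE with PREEMPT_ONLY set, normal
 * allocations block in blk_queue_enter(); passing BLK_MQ_REQ_PREEMPT
 * lets a power-management command through. */
static int send_pm_command(struct request_queue *q)
{
	struct request *rq;

	rq = blk_get_request_flags(q, REQ_OP_SCSI_IN, BLK_MQ_REQ_PREEMPT);
	if (IS_ERR(rq))
		return PTR_ERR(rq);

	/* ... fill in and execute the command here ... */

	blk_put_request(rq);
	return 0;
}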

* Re: [PATCH v8 03/10] md: Neither resync nor reshape while the system is frozen
  2017-10-11 19:32           ` Shaohua Li
@ 2017-10-12 16:59             ` Bart Van Assche
  2017-10-12 17:45               ` Shaohua Li
  0 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2017-10-12 16:59 UTC (permalink / raw)
  To: shli@kernel.org
  Cc: linux-block@vger.kernel.org, jthumshirn@suse.de,
	linux-raid@vger.kernel.org, hch@lst.de,
	martin.petersen@oracle.com, axboe@kernel.dk, ming.lei@redhat.com,
	linux-scsi@vger.kernel.org, oleksandr@natalenko.name,
	hare@suse.com

On Wed, 2017-10-11 at 12:32 -0700, Shaohua Li wrote:
> On Wed, Oct 11, 2017 at 05:17:56PM +0000, Bart Van Assche wrote:
> > On Tue, 2017-10-10 at 18:48 -0700, Shaohua Li wrote:
> > > The problem is that __md_stop_writes() sets bits like MD_RECOVERY_FROZEN,
> > > which will prevent md_check_recovery() from restarting the resync/reshape.
> > > I think we need to preserve mddev->recovery during suspend prepare and
> > > restore it after resume.
> > 
> > Hello Shaohua,
> > 
> > As far as I can see __md_stop_writes() sets MD_RECOVERY_FROZEN and can set
> > MD_RECOVERY_INTR. Since md_check_recovery() clears MD_RECOVERY_INTR I think
> > it should be sufficient to save and restore the state of the
> > MD_RECOVERY_FROZEN flag. However, when I ran the following test:
> > * echo frozen > /sys/block/md0/md/sync_action
> > * cat /sys/block/md0/md/sync_action
> > * systemctl hibernate
> > * (power on system again)
> > * cat /sys/block/md0/md/sync_action
> > 
> > the output that appeared was as follows:
> > 
> > frozen
> > idle
> > 
> > Does that mean that a hibernate or suspend followed by a resume is sufficient
> > to clear the MD_RECOVERY_FROZEN flag for the md drivers and hence that the
> > patch at the start of this e-mail thread does not need any further
> > modifications?
> 
> I have no idea why it shows 'idle'. From my understanding, after resume the
> previous memory image is restored and the MD_RECOVERY_FROZEN bit should still
> be set. I was trying to understand what happens.

Hello Shaohua,

I think a user space action causes the MD_RECOVERY_FROZEN bit to be cleared.
After I had added a few debug statements in the md driver this is what I
found in the system log:

Oct 12 09:38:39 kernel: action_store: storing action check
Oct 12 09:38:39 kernel: md: data-check of RAID array md0
Oct 12 09:38:40 systemd[1]: Starting Hibernate...
Oct 12 09:38:40 kernel: md: md0: data-check interrupted.
[ powered on system manually ]
Oct 12 09:39:05 kernel: action_store: storing action check
Oct 12 09:39:05 kernel: md: data-check of RAID array md0
Oct 12 09:39:11 kernel: action_store: storing action idle
Oct 12 09:39:11 kernel: md: md0: data-check interrupted.

Anyway, how about the patch below? I have verified by adding a debug pr_info()
statement that the newly added code really clears the MD_RECOVERY_FROZEN bit.

Thanks,

Bart.


Subject: [PATCH] md: Neither resync nor reshape while the system is frozen

Some people use the md driver on laptops and use the suspend and
resume functionality. Since it is essential that submitting of
new I/O requests stops before a hibernation image is created,
interrupt the md resync and reshape actions if the system is
being frozen. Note: the resync and reshape will restart after
the system is resumed and a message similar to the following
will appear in the system log:

md: md0: data-check interrupted.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/md/md.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/md/md.h |  8 ++++++++
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index b99584e5d6b1..8b2fc750f97f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -66,6 +66,8 @@
 #include <linux/raid/md_u.h>
 #include <linux/slab.h>
 #include <linux/percpu-refcount.h>
+#include <linux/freezer.h>
+#include <linux/suspend.h>
 
 #include <trace/events/block.h>
 #include "md.h"
@@ -7439,6 +7441,7 @@ static int md_thread(void *arg)
 	 */
 
 	allow_signal(SIGKILL);
+	set_freezable();
 	while (!kthread_should_stop()) {
 
 		/* We need to wait INTERRUPTIBLE so that
@@ -7449,7 +7452,7 @@ static int md_thread(void *arg)
 		if (signal_pending(current))
 			flush_signals(current);
 
-		wait_event_interruptible_timeout
+		wait_event_freezable_timeout
 			(thread->wqueue,
 			 test_bit(THREAD_WAKEUP, &thread->flags)
 			 || kthread_should_stop() || kthread_should_park(),
@@ -8944,6 +8947,8 @@ static void md_stop_all_writes(void)
 	int need_delay = 0;
 
 	for_each_mddev(mddev, tmp) {
+		mddev->frozen_before_suspend =
+			test_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 		if (mddev_trylock(mddev)) {
 			if (mddev->pers)
 				__md_stop_writes(mddev);
@@ -8963,6 +8968,47 @@ static void md_stop_all_writes(void)
 		mdelay(1000*1);
 }
 
+static void md_restore_frozen_flag(void)
+{
+	struct list_head *tmp;
+	struct mddev *mddev;
+
+	for_each_mddev(mddev, tmp) {
+		if (mddev->frozen_before_suspend)
+			set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+		else
+			clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+	}
+}
+
+/*
+ * Ensure that neither resyncing nor reshaping occurs while the system is
+ * frozen.
+ */
+static int md_notify_pm(struct notifier_block *bl, unsigned long state,
+			void *unused)
+{
+	pr_debug("%s: state = %ld\n", __func__, state);
+
+	switch (state) {
+	case PM_HIBERNATION_PREPARE:
+	case PM_SUSPEND_PREPARE:
+	case PM_RESTORE_PREPARE:
+		md_stop_all_writes();
+		break;
+	case PM_POST_HIBERNATION:
+	case PM_POST_SUSPEND:
+	case PM_POST_RESTORE:
+		md_restore_frozen_flag();
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block md_pm_notifier = {
+	.notifier_call	= md_notify_pm,
+};
+
 static int md_notify_reboot(struct notifier_block *this,
 			    unsigned long code, void *x)
 {
@@ -9009,6 +9055,7 @@ static int __init md_init(void)
 			    md_probe, NULL, NULL);
 
 	register_reboot_notifier(&md_reboot_notifier);
+	register_pm_notifier(&md_pm_notifier);
 	raid_table_header = register_sysctl_table(raid_root_table);
 
 	md_geninit();
@@ -9248,6 +9295,7 @@ static __exit void md_exit(void)
 
 	unregister_blkdev(MD_MAJOR,"md");
 	unregister_blkdev(mdp_major, "mdp");
+	unregister_pm_notifier(&md_pm_notifier);
 	unregister_reboot_notifier(&md_reboot_notifier);
 	unregister_sysctl_table(raid_table_header);
 
diff --git a/drivers/md/md.h b/drivers/md/md.h
index d8287d3cd1bf..727b3aef4d26 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -356,6 +356,14 @@ struct mddev {
 	 */
 	int				recovery_disabled;
 
+	/* Writes are stopped before hibernation, suspend and restore by
+	 * calling md_stop_all_writes(). That function sets the
+	 * MD_RECOVERY_FROZEN flag. The value of that flag is saved before
+	 * writes are stopped such that it can be restored after hibernation,
+	 * suspend or restore have finished.
+	 */
+	bool				frozen_before_suspend;
+
 	int				in_sync;	/* know to not need resync */
 	/* 'open_mutex' avoids races between 'md_open' and 'do_md_stop', so
 	 * that we are never stopping an array while it is open.

^ permalink raw reply	[flat|nested] 20+ messages in thread
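
The user-space path identified above corresponds to md's sync_action
handler. A simplified sketch of the relevant behavior (not a verbatim copy
of action_store() in drivers/md/md.c):

/* Writing anything other than "frozen" to /sys/block/<dev>/md/sync_action
 * clears MD_RECOVERY_FROZEN, which is why the flag read back as "idle"
 * after the resume in the test above. */
static void sync_action_sketch(struct mddev *mddev, const char *page)
{
	if (strncmp(page, "frozen", 6) == 0)
		set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
	else
		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
	/* ... the real handler then interrupts or kicks the sync thread ... */
}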

* Re: [PATCH v8 09/10] block, scsi: Make SCSI quiesce and resume work reliably
  2017-10-12 16:02   ` Ming Lei
@ 2017-10-12 17:18     ` Bart Van Assche
  0 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2017-10-12 17:18 UTC (permalink / raw)
  To: ming.lei@redhat.com
  Cc: linux-block@vger.kernel.org, jthumshirn@suse.de, hch@lst.de,
	martin.petersen@oracle.com, axboe@kernel.dk,
	linux-scsi@vger.kernel.org, oleksandr@natalenko.name,
	hare@suse.com

On Fri, 2017-10-13 at 00:02 +0800, Ming Lei wrote:
> On Tue, Oct 10, 2017 at 02:03:45PM -0700, Bart Van Assche wrote:
> > [ ... ]
> > +	/* If the SCSI device has already been quiesced, do nothing. */
> > +	if (blk_set_preempt_only(q))
> > +		return 0;
> 
> Last time I mentioned that this way isn't safe: SCSI domain validation
> commands can be sent concurrently, and there might also be an issue wrt.
> runtime PM vs. system suspend.
> 
> The issue could have been avoided simply by setting the flag and
> continuing to freeze the queue.

I think the pm_autosleep_lock() / pm_autosleep_unlock() calls in the power
management code prevent the power management code from triggering
concurrent scsi_device_quiesce() calls.

> > +
> > +	blk_mq_freeze_queue(q);
> > +	/*
> > +	 * Ensure that the effect of blk_set_preempt_only() will be visible
> > +	 * for percpu_ref_tryget() callers that occur after the queue
> > +	 * unfreeze even if the queue was already frozen before this function
> > +	 * was called. See also https://lwn.net/Articles/573497/.
> > +	 */
> > +	synchronize_rcu();
> 
> No, it is a bad idea to do two synchronize_rcu() calls, which can be very
> slow, especially on a big machine; this may slow down suspend considerably.

scsi_device_quiesce() is only used by the SCSI power management code and by
the SPI DV code. The number of users of the SCSI parallel code is small. And
the SCSI power management code is used on laptops but not on big machines.
Anyway, if you really care about this optimization, that's something that can
be done easily as a follow-up patch.

Bart.

^ permalink raw reply	[flat|nested] 20+ messages in thread
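
For readers following the synchronize_rcu() argument above, here is a
minimal sketch of the publish/observe pattern under discussion (generic
RCU with plain rcu_read_lock() rather than the _sched variant the patch
uses; all names are illustrative):

#include <linux/rcupdate.h>
#include <linux/percpu-refcount.h>

static bool preempt_only;	/* stands in for QUEUE_FLAG_PREEMPT_ONLY */

/* Writer: publish the flag, then wait out all pre-existing readers. */
static void set_preempt_only_and_sync(void)
{
	WRITE_ONCE(preempt_only, true);
	synchronize_rcu();
	/* Every reader that could still have missed the flag has now left
	 * its read-side critical section; later callers observe it. */
}

/* Reader: keep a reference only when the flag is not observed. */
static bool try_enter(struct percpu_ref *ref)
{
	bool ok = false;

	rcu_read_lock();
	if (percpu_ref_tryget_live(ref)) {
		if (READ_ONCE(preempt_only))
			percpu_ref_put(ref);	/* back off */
		else
			ok = true;
	}
	rcu_read_unlock();
	return ok;
}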

* Re: [PATCH v8 03/10] md: Neither resync nor reshape while the system is frozen
  2017-10-12 16:59             ` Bart Van Assche
@ 2017-10-12 17:45               ` Shaohua Li
  0 siblings, 0 replies; 20+ messages in thread
From: Shaohua Li @ 2017-10-12 17:45 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block@vger.kernel.org, jthumshirn@suse.de,
	linux-raid@vger.kernel.org, hch@lst.de,
	martin.petersen@oracle.com, axboe@kernel.dk, ming.lei@redhat.com,
	linux-scsi@vger.kernel.org, oleksandr@natalenko.name,
	hare@suse.com

On Thu, Oct 12, 2017 at 04:59:01PM +0000, Bart Van Assche wrote:
> On Wed, 2017-10-11 at 12:32 -0700, Shaohua Li wrote:
> > On Wed, Oct 11, 2017 at 05:17:56PM +0000, Bart Van Assche wrote:
> > > On Tue, 2017-10-10 at 18:48 -0700, Shaohua Li wrote:
> > > > The problem is that __md_stop_writes() sets bits like MD_RECOVERY_FROZEN,
> > > > which will prevent md_check_recovery() from restarting the resync/reshape.
> > > > I think we need to preserve mddev->recovery during suspend prepare and
> > > > restore it after resume.
> > > 
> > > Hello Shaohua,
> > > 
> > > As far as I can see __md_stop_writes() sets MD_RECOVERY_FROZEN and can set
> > > MD_RECOVERY_INTR. Since md_check_recovery() clears MD_RECOVERY_INTR I think
> > > it should be sufficient to save and restore the state of the
> > > MD_RECOVERY_FROZEN flag. However, when I ran the following test:
> > > * echo frozen > /sys/block/md0/md/sync_action
> > > * cat /sys/block/md0/md/sync_action
> > > * systemctl hibernate
> > > * (power on system again)
> > > * cat /sys/block/md0/md/sync_action
> > > 
> > > the output that appeared was as follows:
> > > 
> > > frozen
> > > idle
> > > 
> > > Does that mean that a hibernate or suspend followed by a resume is sufficient
> > > to clear the MD_RECOVERY_FROZEN flag for the md drivers and hence that the
> > > patch at the start of this e-mail thread does not need any further
> > > modifications?
> > 
> > I have no idea why it shows 'idle'. From my understanding, after resume the
> > previous memory image is restored and the MD_RECOVERY_FROZEN bit should still
> > be set. I was trying to understand what happens.
> 
> Hello Shaohua,
> 
> I think a user space action causes the MD_RECOVERY_FROZEN bit to be cleared.
> After I had added a few debug statements in the md driver this is what I
> found in the system log:
> 
> Oct 12 09:38:39 kernel: action_store: storing action check
> Oct 12 09:38:39 kernel: md: data-check of RAID array md0
> Oct 12 09:38:40 systemd[1]: Starting Hibernate...
> Oct 12 09:38:40 kernel: md: md0: data-check interrupted.
> [ powered on system manually ]
> Oct 12 09:39:05 kernel: action_store: storing action check
> Oct 12 09:39:05 kernel: md: data-check of RAID array md0
> Oct 12 09:39:11 kernel: action_store: storing action idle
> Oct 12 09:39:11 kernel: md: md0: data-check interrupted.
> 
> Anyway, how about the patch below? I have verified by adding a debug pr_info()
> statement that the newly added code really clears the MD_RECOVERY_FROZEN bit.

OK, for patches 1-3:
Reviewed-by: Shaohua Li <shli@kernel.org>

> Subject: [PATCH] md: Neither resync nor reshape while the system is frozen
> 
> Some people use the md driver on laptops and use the suspend and
> resume functionality. Since it is essential that submitting of
> new I/O requests stops before a hibernation image is created,
> interrupt the md resync and reshape actions if the system is
> being frozen. Note: the resync and reshape will restart after
> the system is resumed and a message similar to the following
> will appear in the system log:
> 
> md: md0: data-check interrupted.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: linux-raid@vger.kernel.org
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  drivers/md/md.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
>  drivers/md/md.h |  8 ++++++++
>  2 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index b99584e5d6b1..8b2fc750f97f 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -66,6 +66,8 @@
>  #include <linux/raid/md_u.h>
>  #include <linux/slab.h>
>  #include <linux/percpu-refcount.h>
> +#include <linux/freezer.h>
> +#include <linux/suspend.h>
>  
>  #include <trace/events/block.h>
>  #include "md.h"
> @@ -7439,6 +7441,7 @@ static int md_thread(void *arg)
>  	 */
>  
>  	allow_signal(SIGKILL);
> +	set_freezable();
>  	while (!kthread_should_stop()) {
>  
>  		/* We need to wait INTERRUPTIBLE so that
> @@ -7449,7 +7452,7 @@ static int md_thread(void *arg)
>  		if (signal_pending(current))
>  			flush_signals(current);
>  
> -		wait_event_interruptible_timeout
> +		wait_event_freezable_timeout
>  			(thread->wqueue,
>  			 test_bit(THREAD_WAKEUP, &thread->flags)
>  			 || kthread_should_stop() || kthread_should_park(),
> @@ -8944,6 +8947,8 @@ static void md_stop_all_writes(void)
>  	int need_delay = 0;
>  
>  	for_each_mddev(mddev, tmp) {
> +		mddev->frozen_before_suspend =
> +			test_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
>  		if (mddev_trylock(mddev)) {
>  			if (mddev->pers)
>  				__md_stop_writes(mddev);
> @@ -8963,6 +8968,47 @@ static void md_stop_all_writes(void)
>  		mdelay(1000*1);
>  }
>  
> +static void md_restore_frozen_flag(void)
> +{
> +	struct list_head *tmp;
> +	struct mddev *mddev;
> +
> +	for_each_mddev(mddev, tmp) {
> +		if (mddev->frozen_before_suspend)
> +			set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> +		else
> +			clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
> +	}
> +}
> +
> +/*
> + * Ensure that neither resyncing nor reshaping occurs while the system is
> + * frozen.
> + */
> +static int md_notify_pm(struct notifier_block *bl, unsigned long state,
> +			void *unused)
> +{
> +	pr_debug("%s: state = %ld\n", __func__, state);
> +
> +	switch (state) {
> +	case PM_HIBERNATION_PREPARE:
> +	case PM_SUSPEND_PREPARE:
> +	case PM_RESTORE_PREPARE:
> +		md_stop_all_writes();
> +		break;
> +	case PM_POST_HIBERNATION:
> +	case PM_POST_SUSPEND:
> +	case PM_POST_RESTORE:
> +		md_restore_frozen_flag();
> +		break;
> +	}
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block md_pm_notifier = {
> +	.notifier_call	= md_notify_pm,
> +};
> +
>  static int md_notify_reboot(struct notifier_block *this,
>  			    unsigned long code, void *x)
>  {
> @@ -9009,6 +9055,7 @@ static int __init md_init(void)
>  			    md_probe, NULL, NULL);
>  
>  	register_reboot_notifier(&md_reboot_notifier);
> +	register_pm_notifier(&md_pm_notifier);
>  	raid_table_header = register_sysctl_table(raid_root_table);
>  
>  	md_geninit();
> @@ -9248,6 +9295,7 @@ static __exit void md_exit(void)
>  
>  	unregister_blkdev(MD_MAJOR,"md");
>  	unregister_blkdev(mdp_major, "mdp");
> +	unregister_pm_notifier(&md_pm_notifier);
>  	unregister_reboot_notifier(&md_reboot_notifier);
>  	unregister_sysctl_table(raid_table_header);
>  
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index d8287d3cd1bf..727b3aef4d26 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -356,6 +356,14 @@ struct mddev {
>  	 */
>  	int				recovery_disabled;
>  
> +	/* Writes are stopped before hibernation, suspend and restore by
> +	 * calling md_stop_all_writes(). That function sets the
> +	 * MD_RECOVERY_FROZEN flag. The value of that flag is saved before
> +	 * writes are stopped such that it can be restored after hibernation,
> +	 * suspend or restore have finished.
> +	 */
> +	bool				frozen_before_suspend;
> +
>  	int				in_sync;	/* know to not need resync */
>  	/* 'open_mutex' avoids races between 'md_open' and 'do_md_stop', so
>  	 * that we are never stopping an array while it is open.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread

Thread overview: 20+ messages
2017-10-10 21:03 [PATCH v8 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
2017-10-10 21:03 ` [PATCH v8 01/10] md: Rename md_notifier into md_reboot_notifier Bart Van Assche
2017-10-10 21:03 ` [PATCH v8 02/10] md: Introduce md_stop_all_writes() Bart Van Assche
2017-10-10 21:03 ` [PATCH v8 03/10] md: Neither resync nor reshape while the system is frozen Bart Van Assche
2017-10-10 22:30   ` Shaohua Li
2017-10-10 23:33     ` Bart Van Assche
2017-10-11  1:48       ` Shaohua Li
2017-10-11 17:17         ` Bart Van Assche
2017-10-11 19:32           ` Shaohua Li
2017-10-12 16:59             ` Bart Van Assche
2017-10-12 17:45               ` Shaohua Li
2017-10-10 21:03 ` [PATCH v8 04/10] block: Make q_usage_counter also track legacy requests Bart Van Assche
2017-10-10 21:03 ` [PATCH v8 05/10] block: Introduce blk_get_request_flags() Bart Van Assche
2017-10-10 21:03 ` [PATCH v8 06/10] block: Introduce BLK_MQ_REQ_PREEMPT Bart Van Assche
2017-10-10 21:03 ` [PATCH v8 07/10] ide, scsi: Tell the block layer at request allocation time about preempt requests Bart Van Assche
2017-10-10 21:03 ` [PATCH v8 08/10] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag Bart Van Assche
2017-10-10 21:03 ` [PATCH v8 09/10] block, scsi: Make SCSI quiesce and resume work reliably Bart Van Assche
2017-10-12 16:02   ` Ming Lei
2017-10-12 17:18     ` Bart Van Assche
2017-10-10 21:03 ` [PATCH v8 10/10] block, nvme: Introduce blk_mq_req_flags_t Bart Van Assche
