[PATCH v10 00/10] block, scsi, md: Improve suspend and resume

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
@ 2017-10-17 23:26 Bart Van Assche
  2017-10-17 23:26 ` [PATCH v10 01/10] md: Rename md_notifier into md_reboot_notifier Bart Van Assche
                   ` (10 more replies)
  0 siblings, 11 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-17 23:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche

Hello Jens,

It is known that during the resume following a hibernate, especially when
using an md RAID1 array created on top of SCSI devices, sometimes the system
hangs instead of coming up properly. This patch series fixes that
problem. These patches have been tested on top of the block layer for-next
branch. Please consider these changes for kernel v4.15.

Thanks,

Bart.

Changes between v9 and v10:
- Made sure that scsi_device_resume() handles SCSI devices that are in an
  error state correctly. Unlike other SCSI transport protocols, during an
  ATA device resume the ATA controller is reset. This should fix the
  blk_queue_enter WARNING reported by the kernel test robot.

Changes between v8 and v9:
- Modified the third patch such that the MD_RECOVERY_FROZEN flag is restored
  properly after a resume.
- Modified the ninth patch such that a kernel warning is reported if it is
  attempted to call scsi_device_quiesce() from multiple contexts concurrently.
- Added Reviewed-by / Tested-by tags as appropriate.
  
Changes between v7 and v8:
- Fixed a (theoretical?) race identified by Ming Lei.
- Added a tenth patch that checks whether the proper type of flags has been
  passed to a range of block layer functions.

Changes between v6 and v7:
- Added support for the PREEMPT_ONLY flag in blk-mq-debugfs.c.
- Fixed kerneldoc header of blk_queue_enter().
- Added a rcu_read_lock_sched() / rcu_read_unlock_sched() pair in
  blk_set_preempt_only().
- Removed a synchronize_rcu() call from scsi_device_quiesce().
- Modified the description of patch 9/9 in this series.
- Removed scsi_run_queue() call from scsi_device_resume().

Changes between v5 and v6:
- Split an md patch into two patches to make it easier to review the changes.
- For the md patch that suspends I/O while the system is frozen, switched back
  to the freezer mechanism because this makes the code shorter and easier to
  review.
- Changed blk_set/clear_preempt_only() from EXPORT_SYMBOL() into
  EXPORT_SYMBOL_GPL().
- Made blk_set_preempt_only() behave as a test-and-set operation.
- Introduced blk_get_request_flags() and BLK_MQ_REQ_PREEMPT as requested by
  Christoph and reduced the number of arguments of blk_queue_enter() back from
  three to two.
- In scsi_device_quiesce(), moved the blk_mq_freeze_queue() call out of a
  critical section. Made the explanation of why the synchronize_rcu() call
  is necessary more detailed.

Changes between v4 and v5:
- Split blk_set_preempt_only() into two functions as requested by Christoph.
- Made blk_get_request() trigger WARN_ONCE() if it is attempted to allocate
  a request while the system is frozen. This is a big help to identify drivers
  that submit I/O while the system is frozen.
- Since kernel thread freezing is on its way out, reworked the approach for
  avoiding that the md driver submits I/O while the system is frozen such that
  the approach no longer depends on the kernel thread freeze mechanism.
- Fixed the (theoretical) deadlock in scsi_device_quiesce() that was identified
  by Ming.

Changes between v3 and v4:
- Made sure that this patch series not only works for scsi-mq but also for
  the legacy SCSI stack.
- Removed an smp_rmb()/smp_wmb() pair from the hot path and added a comment
  that explains why that is safe.
- Reordered the patches in this patch series.

Changes between v2 and v3:
- Made md kernel threads freezable.
- Changed the approach for quiescing SCSI devices again.
- Addressed Ming's review comments.

Changes compared to v1 of this patch series:
- Changed the approach and rewrote the patch series.

Bart Van Assche (9):
  md: Rename md_notifier into md_reboot_notifier
  md: Introduce md_stop_all_writes()
  md: Neither resync nor reshape while the system is frozen
  block: Introduce blk_get_request_flags()
  block: Introduce BLK_MQ_REQ_PREEMPT
  ide, scsi: Tell the block layer at request allocation time about
    preempt requests
  block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
  block, scsi: Make SCSI quiesce and resume work reliably
  block, nvme: Introduce blk_mq_req_flags_t

Ming Lei (1):
  block: Make q_usage_counter also track legacy requests

 block/blk-core.c           | 133 ++++++++++++++++++++++++++++++++++++++-------
 block/blk-mq-debugfs.c     |   1 +
 block/blk-mq.c             |  20 +++----
 block/blk-mq.h             |   2 +-
 block/blk-timeout.c        |   2 +-
 drivers/ide/ide-pm.c       |   4 +-
 drivers/md/md.c            |  65 ++++++++++++++++++++--
 drivers/md/md.h            |   8 +++
 drivers/nvme/host/core.c   |   5 +-
 drivers/nvme/host/nvme.h   |   5 +-
 drivers/scsi/scsi_lib.c    |  48 +++++++++++-----
 fs/block_dev.c             |   4 +-
 include/linux/blk-mq.h     |  16 ++++--
 include/linux/blk_types.h  |   2 +
 include/linux/blkdev.h     |  11 +++-
 include/scsi/scsi_device.h |   1 +
 16 files changed, 258 insertions(+), 69 deletions(-)

-- 
2.14.2

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v10 01/10] md: Rename md_notifier into md_reboot_notifier
  2017-10-17 23:26 [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
@ 2017-10-17 23:26 ` Bart Van Assche
  2017-10-17 23:26 ` [PATCH v10 02/10] md: Introduce md_stop_all_writes() Bart Van Assche
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-17 23:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche, linux-raid

This avoids confusion with the pm notifier that will be added
through a later patch.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Shaohua Li <shli@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Cc: linux-raid@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5d61049e7417..8933cafc212d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8966,7 +8966,7 @@ static int md_notify_reboot(struct notifier_block *this,
 	return NOTIFY_DONE;
 }
 
-static struct notifier_block md_notifier = {
+static struct notifier_block md_reboot_notifier = {
 	.notifier_call	= md_notify_reboot,
 	.next		= NULL,
 	.priority	= INT_MAX, /* before any real devices */
@@ -9003,7 +9003,7 @@ static int __init md_init(void)
 	blk_register_region(MKDEV(mdp_major, 0), 1UL<<MINORBITS, THIS_MODULE,
 			    md_probe, NULL, NULL);
 
-	register_reboot_notifier(&md_notifier);
+	register_reboot_notifier(&md_reboot_notifier);
 	raid_table_header = register_sysctl_table(raid_root_table);
 
 	md_geninit();
@@ -9243,7 +9243,7 @@ static __exit void md_exit(void)
 
 	unregister_blkdev(MD_MAJOR,"md");
 	unregister_blkdev(mdp_major, "mdp");
-	unregister_reboot_notifier(&md_notifier);
+	unregister_reboot_notifier(&md_reboot_notifier);
 	unregister_sysctl_table(raid_table_header);
 
 	/* We cannot unload the modules while some process is
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v10 02/10] md: Introduce md_stop_all_writes()
  2017-10-17 23:26 [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
  2017-10-17 23:26 ` [PATCH v10 01/10] md: Rename md_notifier into md_reboot_notifier Bart Van Assche
@ 2017-10-17 23:26 ` Bart Van Assche
  2017-10-17 23:26 ` [PATCH v10 03/10] md: Neither resync nor reshape while the system is frozen Bart Van Assche
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-17 23:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche, linux-raid

Introduce md_stop_all_writes() because the next patch will add
a second caller for this function. This patch does not change
any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Shaohua Li <shli@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Cc: linux-raid@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8933cafc212d..b99584e5d6b1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8937,8 +8937,7 @@ int rdev_clear_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
 }
 EXPORT_SYMBOL_GPL(rdev_clear_badblocks);
 
-static int md_notify_reboot(struct notifier_block *this,
-			    unsigned long code, void *x)
+static void md_stop_all_writes(void)
 {
 	struct list_head *tmp;
 	struct mddev *mddev;
@@ -8962,6 +8961,12 @@ static int md_notify_reboot(struct notifier_block *this,
 	 */
 	if (need_delay)
 		mdelay(1000*1);
+}
+
+static int md_notify_reboot(struct notifier_block *this,
+			    unsigned long code, void *x)
+{
+	md_stop_all_writes();
 
 	return NOTIFY_DONE;
 }
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v10 03/10] md: Neither resync nor reshape while the system is frozen
  2017-10-17 23:26 [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
  2017-10-17 23:26 ` [PATCH v10 01/10] md: Rename md_notifier into md_reboot_notifier Bart Van Assche
  2017-10-17 23:26 ` [PATCH v10 02/10] md: Introduce md_stop_all_writes() Bart Van Assche
@ 2017-10-17 23:26 ` Bart Van Assche
  2017-10-17 23:26 ` [PATCH v10 04/10] block: Make q_usage_counter also track legacy requests Bart Van Assche
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-17 23:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche, linux-raid,
	Johannes Thumshirn

Some people use the md driver on laptops and use the suspend and
resume functionality. Since it is essential that submitting of
new I/O requests stops before a hibernation image is created,
interrupt the md resync and reshape actions if the system is
being frozen. Note: the resync and reshape will restart after
the system is resumed and a message similar to the following
will appear in the system log:

md: md0: data-check interrupted.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Shaohua Li <shli@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Cc: linux-raid@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/md/md.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/md/md.h |  8 ++++++++
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index b99584e5d6b1..8b2fc750f97f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -66,6 +66,8 @@
 #include <linux/raid/md_u.h>
 #include <linux/slab.h>
 #include <linux/percpu-refcount.h>
+#include <linux/freezer.h>
+#include <linux/suspend.h>
 
 #include <trace/events/block.h>
 #include "md.h"
@@ -7439,6 +7441,7 @@ static int md_thread(void *arg)
 	 */
 
 	allow_signal(SIGKILL);
+	set_freezable();
 	while (!kthread_should_stop()) {
 
 		/* We need to wait INTERRUPTIBLE so that
@@ -7449,7 +7452,7 @@ static int md_thread(void *arg)
 		if (signal_pending(current))
 			flush_signals(current);
 
-		wait_event_interruptible_timeout
+		wait_event_freezable_timeout
 			(thread->wqueue,
 			 test_bit(THREAD_WAKEUP, &thread->flags)
 			 || kthread_should_stop() || kthread_should_park(),
@@ -8944,6 +8947,8 @@ static void md_stop_all_writes(void)
 	int need_delay = 0;
 
 	for_each_mddev(mddev, tmp) {
+		mddev->frozen_before_suspend =
+			test_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 		if (mddev_trylock(mddev)) {
 			if (mddev->pers)
 				__md_stop_writes(mddev);
@@ -8963,6 +8968,47 @@ static void md_stop_all_writes(void)
 		mdelay(1000*1);
 }
 
+static void md_restore_frozen_flag(void)
+{
+	struct list_head *tmp;
+	struct mddev *mddev;
+
+	for_each_mddev(mddev, tmp) {
+		if (mddev->frozen_before_suspend)
+			set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+		else
+			clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+	}
+}
+
+/*
+ * Ensure that neither resyncing nor reshaping occurs while the system is
+ * frozen.
+ */
+static int md_notify_pm(struct notifier_block *bl, unsigned long state,
+			void *unused)
+{
+	pr_debug("%s: state = %ld\n", __func__, state);
+
+	switch (state) {
+	case PM_HIBERNATION_PREPARE:
+	case PM_SUSPEND_PREPARE:
+	case PM_RESTORE_PREPARE:
+		md_stop_all_writes();
+		break;
+	case PM_POST_HIBERNATION:
+	case PM_POST_SUSPEND:
+	case PM_POST_RESTORE:
+		md_restore_frozen_flag();
+		break;
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block md_pm_notifier = {
+	.notifier_call	= md_notify_pm,
+};
+
 static int md_notify_reboot(struct notifier_block *this,
 			    unsigned long code, void *x)
 {
@@ -9009,6 +9055,7 @@ static int __init md_init(void)
 			    md_probe, NULL, NULL);
 
 	register_reboot_notifier(&md_reboot_notifier);
+	register_pm_notifier(&md_pm_notifier);
 	raid_table_header = register_sysctl_table(raid_root_table);
 
 	md_geninit();
@@ -9248,6 +9295,7 @@ static __exit void md_exit(void)
 
 	unregister_blkdev(MD_MAJOR,"md");
 	unregister_blkdev(mdp_major, "mdp");
+	unregister_pm_notifier(&md_pm_notifier);
 	unregister_reboot_notifier(&md_reboot_notifier);
 	unregister_sysctl_table(raid_table_header);
 
diff --git a/drivers/md/md.h b/drivers/md/md.h
index d8287d3cd1bf..727b3aef4d26 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -356,6 +356,14 @@ struct mddev {
 	 */
 	int				recovery_disabled;
 
+	/* Writes are stopped before hibernation, suspend and restore by
+	 * calling md_stop_all_writes(). That function sets the
+	 * MD_RECOVERY_FROZEN flag. The value of that flag is saved before
+	 * writes are stopped such that it can be restored after hibernation,
+	 * suspend or restore have finished.
+	 */
+	bool				frozen_before_suspend;
+
 	int				in_sync;	/* know to not need resync */
 	/* 'open_mutex' avoids races between 'md_open' and 'do_md_stop', so
 	 * that we are never stopping an array while it is open.
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v10 04/10] block: Make q_usage_counter also track legacy requests
  2017-10-17 23:26 [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
                   ` (2 preceding siblings ...)
  2017-10-17 23:26 ` [PATCH v10 03/10] md: Neither resync nor reshape while the system is frozen Bart Van Assche
@ 2017-10-17 23:26 ` Bart Van Assche
  2017-10-17 23:26 ` [PATCH v10 05/10] block: Introduce blk_get_request_flags() Bart Van Assche
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-17 23:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche

From: Ming Lei <ming.lei@redhat.com>

This patch makes it possible to pause request allocation for
the legacy block layer by calling blk_mq_freeze_queue() and
blk_mq_unfreeze_queue().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
[ bvanassche: Combined two patches into one, edited a comment and made sure
  REQ_NOWAIT is handled properly in blk_old_get_request() ]
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Cc: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c | 12 ++++++++++++
 block/blk-mq.c   | 10 ++--------
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e8e149ccc86b..23a0110fc62b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -610,6 +610,9 @@ void blk_set_queue_dying(struct request_queue *q)
 		}
 		spin_unlock_irq(q->queue_lock);
 	}
+
+	/* Make blk_queue_enter() reexamine the DYING flag. */
+	wake_up_all(&q->mq_freeze_wq);
 }
 EXPORT_SYMBOL_GPL(blk_set_queue_dying);
 
@@ -1395,16 +1398,22 @@ static struct request *blk_old_get_request(struct request_queue *q,
 					   unsigned int op, gfp_t gfp_mask)
 {
 	struct request *rq;
+	int ret = 0;
 
 	WARN_ON_ONCE(q->mq_ops);
 
 	/* create ioc upfront */
 	create_io_context(gfp_mask, q->node);
 
+	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM) ||
+			      (op & REQ_NOWAIT));
+	if (ret)
+		return ERR_PTR(ret);
 	spin_lock_irq(q->queue_lock);
 	rq = get_request(q, op, NULL, gfp_mask);
 	if (IS_ERR(rq)) {
 		spin_unlock_irq(q->queue_lock);
+		blk_queue_exit(q);
 		return rq;
 	}
 
@@ -1576,6 +1585,7 @@ void __blk_put_request(struct request_queue *q, struct request *req)
 		blk_free_request(rl, req);
 		freed_request(rl, sync, rq_flags);
 		blk_put_rl(rl);
+		blk_queue_exit(q);
 	}
 }
 EXPORT_SYMBOL_GPL(__blk_put_request);
@@ -1857,8 +1867,10 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
 	 * Grab a free request. This is might sleep but can not fail.
 	 * Returns with the queue unlocked.
 	 */
+	blk_queue_enter_live(q);
 	req = get_request(q, bio->bi_opf, bio, GFP_NOIO);
 	if (IS_ERR(req)) {
+		blk_queue_exit(q);
 		__wbt_done(q->rq_wb, wb_acct);
 		if (PTR_ERR(req) == -ENOMEM)
 			bio->bi_status = BLK_STS_RESOURCE;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 097ca3ece716..59b7de6b616b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -125,7 +125,8 @@ void blk_freeze_queue_start(struct request_queue *q)
 	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
 	if (freeze_depth == 1) {
 		percpu_ref_kill(&q->q_usage_counter);
-		blk_mq_run_hw_queues(q, false);
+		if (q->mq_ops)
+			blk_mq_run_hw_queues(q, false);
 	}
 }
 EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
@@ -255,13 +256,6 @@ void blk_mq_wake_waiters(struct request_queue *q)
 	queue_for_each_hw_ctx(q, hctx, i)
 		if (blk_mq_hw_queue_mapped(hctx))
 			blk_mq_tag_wakeup_all(hctx->tags, true);
-
-	/*
-	 * If we are called because the queue has now been marked as
-	 * dying, we need to ensure that processes currently waiting on
-	 * the queue are notified as well.
-	 */
-	wake_up_all(&q->mq_freeze_wq);
 }
 
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v10 05/10] block: Introduce blk_get_request_flags()
  2017-10-17 23:26 [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
                   ` (3 preceding siblings ...)
  2017-10-17 23:26 ` [PATCH v10 04/10] block: Make q_usage_counter also track legacy requests Bart Van Assche
@ 2017-10-17 23:26 ` Bart Van Assche
  2017-10-17 23:26 ` [PATCH v10 06/10] block: Introduce BLK_MQ_REQ_PREEMPT Bart Van Assche
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-17 23:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche,
	Johannes Thumshirn

A side effect of this patch is that the GFP mask that is passed to
several allocation functions in the legacy block layer is changed
from GFP_KERNEL into __GFP_DIRECT_RECLAIM.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-core.c       | 50 +++++++++++++++++++++++++++++++++++---------------
 include/linux/blkdev.h |  3 +++
 2 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 23a0110fc62b..559818be8e02 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1157,7 +1157,7 @@ int blk_update_nr_requests(struct request_queue *q, unsigned int nr)
  * @rl: request list to allocate from
  * @op: operation and flags
  * @bio: bio to allocate request for (can be %NULL)
- * @gfp_mask: allocation mask
+ * @flags: BLQ_MQ_REQ_* flags
  *
  * Get a free request from @q.  This function may fail under memory
  * pressure or if @q is dead.
@@ -1167,7 +1167,7 @@ int blk_update_nr_requests(struct request_queue *q, unsigned int nr)
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
 static struct request *__get_request(struct request_list *rl, unsigned int op,
-		struct bio *bio, gfp_t gfp_mask)
+				     struct bio *bio, unsigned int flags)
 {
 	struct request_queue *q = rl->q;
 	struct request *rq;
@@ -1176,6 +1176,8 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
 	struct io_cq *icq = NULL;
 	const bool is_sync = op_is_sync(op);
 	int may_queue;
+	gfp_t gfp_mask = flags & BLK_MQ_REQ_NOWAIT ? GFP_ATOMIC :
+			 __GFP_DIRECT_RECLAIM;
 	req_flags_t rq_flags = RQF_ALLOCED;
 
 	lockdep_assert_held(q->queue_lock);
@@ -1336,7 +1338,7 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
  * @q: request_queue to allocate request from
  * @op: operation and flags
  * @bio: bio to allocate request for (can be %NULL)
- * @gfp_mask: allocation mask
+ * @flags: BLK_MQ_REQ_* flags.
  *
  * Get a free request from @q.  If %__GFP_DIRECT_RECLAIM is set in @gfp_mask,
  * this function keeps retrying under memory pressure and fails iff @q is dead.
@@ -1346,7 +1348,7 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
 static struct request *get_request(struct request_queue *q, unsigned int op,
-		struct bio *bio, gfp_t gfp_mask)
+				   struct bio *bio, unsigned int flags)
 {
 	const bool is_sync = op_is_sync(op);
 	DEFINE_WAIT(wait);
@@ -1358,7 +1360,7 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
 
 	rl = blk_get_rl(q, bio);	/* transferred to @rq on success */
 retry:
-	rq = __get_request(rl, op, bio, gfp_mask);
+	rq = __get_request(rl, op, bio, flags);
 	if (!IS_ERR(rq))
 		return rq;
 
@@ -1367,7 +1369,7 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
 		return ERR_PTR(-EAGAIN);
 	}
 
-	if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) {
+	if ((flags & BLK_MQ_REQ_NOWAIT) || unlikely(blk_queue_dying(q))) {
 		blk_put_rl(rl);
 		return rq;
 	}
@@ -1394,10 +1396,13 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
 	goto retry;
 }
 
+/* flags: BLK_MQ_REQ_PREEMPT and/or BLK_MQ_REQ_NOWAIT. */
 static struct request *blk_old_get_request(struct request_queue *q,
-					   unsigned int op, gfp_t gfp_mask)
+					   unsigned int op, unsigned int flags)
 {
 	struct request *rq;
+	gfp_t gfp_mask = flags & BLK_MQ_REQ_NOWAIT ? GFP_ATOMIC :
+			 __GFP_DIRECT_RECLAIM;
 	int ret = 0;
 
 	WARN_ON_ONCE(q->mq_ops);
@@ -1410,7 +1415,7 @@ static struct request *blk_old_get_request(struct request_queue *q,
 	if (ret)
 		return ERR_PTR(ret);
 	spin_lock_irq(q->queue_lock);
-	rq = get_request(q, op, NULL, gfp_mask);
+	rq = get_request(q, op, NULL, flags);
 	if (IS_ERR(rq)) {
 		spin_unlock_irq(q->queue_lock);
 		blk_queue_exit(q);
@@ -1424,25 +1429,40 @@ static struct request *blk_old_get_request(struct request_queue *q,
 	return rq;
 }
 
-struct request *blk_get_request(struct request_queue *q, unsigned int op,
-				gfp_t gfp_mask)
+/**
+ * blk_get_request_flags - allocate a request
+ * @q: request queue to allocate a request for
+ * @op: operation (REQ_OP_*) and REQ_* flags, e.g. REQ_SYNC.
+ * @flags: BLK_MQ_REQ_* flags, e.g. BLK_MQ_REQ_NOWAIT.
+ */
+struct request *blk_get_request_flags(struct request_queue *q, unsigned int op,
+				      unsigned int flags)
 {
 	struct request *req;
 
+	WARN_ON_ONCE(op & REQ_NOWAIT);
+	WARN_ON_ONCE(flags & ~BLK_MQ_REQ_NOWAIT);
+
 	if (q->mq_ops) {
-		req = blk_mq_alloc_request(q, op,
-			(gfp_mask & __GFP_DIRECT_RECLAIM) ?
-				0 : BLK_MQ_REQ_NOWAIT);
+		req = blk_mq_alloc_request(q, op, flags);
 		if (!IS_ERR(req) && q->mq_ops->initialize_rq_fn)
 			q->mq_ops->initialize_rq_fn(req);
 	} else {
-		req = blk_old_get_request(q, op, gfp_mask);
+		req = blk_old_get_request(q, op, flags);
 		if (!IS_ERR(req) && q->initialize_rq_fn)
 			q->initialize_rq_fn(req);
 	}
 
 	return req;
 }
+EXPORT_SYMBOL(blk_get_request_flags);
+
+struct request *blk_get_request(struct request_queue *q, unsigned int op,
+				gfp_t gfp_mask)
+{
+	return blk_get_request_flags(q, op, gfp_mask & __GFP_DIRECT_RECLAIM ?
+				     0 : BLK_MQ_REQ_NOWAIT);
+}
 EXPORT_SYMBOL(blk_get_request);
 
 /**
@@ -1868,7 +1888,7 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
 	 * Returns with the queue unlocked.
 	 */
 	blk_queue_enter_live(q);
-	req = get_request(q, bio->bi_opf, bio, GFP_NOIO);
+	req = get_request(q, bio->bi_opf, bio, 0);
 	if (IS_ERR(req)) {
 		blk_queue_exit(q);
 		__wbt_done(q->rq_wb, wb_acct);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 72637028f3c9..05203175eb9c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -924,6 +924,9 @@ extern void blk_rq_init(struct request_queue *q, struct request *rq);
 extern void blk_init_request_from_bio(struct request *req, struct bio *bio);
 extern void blk_put_request(struct request *);
 extern void __blk_put_request(struct request_queue *, struct request *);
+extern struct request *blk_get_request_flags(struct request_queue *,
+					     unsigned int op,
+					     unsigned int flags);
 extern struct request *blk_get_request(struct request_queue *, unsigned int op,
 				       gfp_t gfp_mask);
 extern void blk_requeue_request(struct request_queue *, struct request *);
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v10 06/10] block: Introduce BLK_MQ_REQ_PREEMPT
  2017-10-17 23:26 [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
                   ` (4 preceding siblings ...)
  2017-10-17 23:26 ` [PATCH v10 05/10] block: Introduce blk_get_request_flags() Bart Van Assche
@ 2017-10-17 23:26 ` Bart Van Assche
  2017-10-17 23:26 ` [PATCH v10 07/10] ide, scsi: Tell the block layer at request allocation time about preempt requests Bart Van Assche
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-17 23:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche,
	Johannes Thumshirn

Set RQF_PREEMPT if BLK_MQ_REQ_PREEMPT is passed to
blk_get_request_flags().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-core.c       | 4 +++-
 block/blk-mq.c         | 2 ++
 include/linux/blk-mq.h | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 559818be8e02..ab835d3c2f39 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1260,6 +1260,8 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
 	blk_rq_set_rl(rq, rl);
 	rq->cmd_flags = op;
 	rq->rq_flags = rq_flags;
+	if (flags & BLK_MQ_REQ_PREEMPT)
+		rq->rq_flags |= RQF_PREEMPT;
 
 	/* init elvpriv */
 	if (rq_flags & RQF_ELVPRIV) {
@@ -1441,7 +1443,7 @@ struct request *blk_get_request_flags(struct request_queue *q, unsigned int op,
 	struct request *req;
 
 	WARN_ON_ONCE(op & REQ_NOWAIT);
-	WARN_ON_ONCE(flags & ~BLK_MQ_REQ_NOWAIT);
+	WARN_ON_ONCE(flags & ~(BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_PREEMPT));
 
 	if (q->mq_ops) {
 		req = blk_mq_alloc_request(q, op, flags);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 59b7de6b616b..6a025b17caac 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -290,6 +290,8 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 	rq->q = data->q;
 	rq->mq_ctx = data->ctx;
 	rq->cmd_flags = op;
+	if (data->flags & BLK_MQ_REQ_PREEMPT)
+		rq->rq_flags |= RQF_PREEMPT;
 	if (blk_queue_io_stat(data->q))
 		rq->rq_flags |= RQF_IO_STAT;
 	/* do not touch atomic flags, it needs atomic ops against the timer */
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index e5e6becd57d3..22c7f36745fc 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -213,6 +213,7 @@ enum {
 	BLK_MQ_REQ_NOWAIT	= (1 << 0), /* return when out of requests */
 	BLK_MQ_REQ_RESERVED	= (1 << 1), /* allocate from reserved pool */
 	BLK_MQ_REQ_INTERNAL	= (1 << 2), /* allocate internal/sched tag */
+	BLK_MQ_REQ_PREEMPT	= (1 << 3), /* set RQF_PREEMPT */
 };
 
 struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v10 07/10] ide, scsi: Tell the block layer at request allocation time about preempt requests
  2017-10-17 23:26 [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
                   ` (5 preceding siblings ...)
  2017-10-17 23:26 ` [PATCH v10 06/10] block: Introduce BLK_MQ_REQ_PREEMPT Bart Van Assche
@ 2017-10-17 23:26 ` Bart Van Assche
  2017-10-17 23:26 ` [PATCH v10 08/10] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag Bart Van Assche
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-17 23:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche,
	Johannes Thumshirn

Convert blk_get_request(q, op, __GFP_RECLAIM) into
blk_get_request_flags(q, op, BLK_MQ_PREEMPT). This patch does not
change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Acked-by: David S. Miller <davem@davemloft.net> [ for IDE ]
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/ide/ide-pm.c    | 4 ++--
 drivers/scsi/scsi_lib.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/ide/ide-pm.c b/drivers/ide/ide-pm.c
index 544f02d673ca..f56d742908df 100644
--- a/drivers/ide/ide-pm.c
+++ b/drivers/ide/ide-pm.c
@@ -89,9 +89,9 @@ int generic_ide_resume(struct device *dev)
 	}
 
 	memset(&rqpm, 0, sizeof(rqpm));
-	rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
+	rq = blk_get_request_flags(drive->queue, REQ_OP_DRV_IN,
+				   BLK_MQ_REQ_PREEMPT);
 	ide_req(rq)->type = ATA_PRIV_PM_RESUME;
-	rq->rq_flags |= RQF_PREEMPT;
 	rq->special = &rqpm;
 	rqpm.pm_step = IDE_PM_START_RESUME;
 	rqpm.pm_state = PM_EVENT_ON;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6f10afaca25b..7c119696402c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -252,9 +252,9 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	struct scsi_request *rq;
 	int ret = DRIVER_ERROR << 24;
 
-	req = blk_get_request(sdev->request_queue,
+	req = blk_get_request_flags(sdev->request_queue,
 			data_direction == DMA_TO_DEVICE ?
-			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM);
+			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, BLK_MQ_REQ_PREEMPT);
 	if (IS_ERR(req))
 		return ret;
 	rq = scsi_req(req);
@@ -268,7 +268,7 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 	rq->retries = retries;
 	req->timeout = timeout;
 	req->cmd_flags |= flags;
-	req->rq_flags |= rq_flags | RQF_QUIET | RQF_PREEMPT;
+	req->rq_flags |= rq_flags | RQF_QUIET;
 
 	/*
 	 * head injection *required* here otherwise quiesce won't work
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v10 08/10] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
  2017-10-17 23:26 [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
                   ` (6 preceding siblings ...)
  2017-10-17 23:26 ` [PATCH v10 07/10] ide, scsi: Tell the block layer at request allocation time about preempt requests Bart Van Assche
@ 2017-10-17 23:26 ` Bart Van Assche
  2017-10-17 23:26 ` [PATCH v10 09/10] block, scsi: Make SCSI quiesce and resume work reliably Bart Van Assche
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-17 23:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche,
	Johannes Thumshirn

This flag will be used in the next patch to let the block layer
core know whether or not a SCSI request queue has been quiesced.
A quiesced SCSI queue namely only processes RQF_PREEMPT requests.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-core.c       | 30 ++++++++++++++++++++++++++++++
 block/blk-mq-debugfs.c |  1 +
 include/linux/blkdev.h |  6 ++++++
 3 files changed, 37 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index ab835d3c2f39..b73203286bf5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -346,6 +346,36 @@ void blk_sync_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_sync_queue);
 
+/**
+ * blk_set_preempt_only - set QUEUE_FLAG_PREEMPT_ONLY
+ * @q: request queue pointer
+ *
+ * Returns the previous value of the PREEMPT_ONLY flag - 0 if the flag was not
+ * set and 1 if the flag was already set.
+ */
+int blk_set_preempt_only(struct request_queue *q)
+{
+	unsigned long flags;
+	int res;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	res = queue_flag_test_and_set(QUEUE_FLAG_PREEMPT_ONLY, q);
+	spin_unlock_irqrestore(q->queue_lock, flags);
+
+	return res;
+}
+EXPORT_SYMBOL_GPL(blk_set_preempt_only);
+
+void blk_clear_preempt_only(struct request_queue *q)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
+	spin_unlock_irqrestore(q->queue_lock, flags);
+}
+EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
+
 /**
  * __blk_run_queue_uncond - run a queue whether or not it has been stopped
  * @q:	The queue to run
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 7f4a1ba532af..75f31535f280 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -74,6 +74,7 @@ static const char *const blk_queue_flag_name[] = {
 	QUEUE_FLAG_NAME(REGISTERED),
 	QUEUE_FLAG_NAME(SCSI_PASSTHROUGH),
 	QUEUE_FLAG_NAME(QUIESCED),
+	QUEUE_FLAG_NAME(PREEMPT_ONLY),
 };
 #undef QUEUE_FLAG_NAME
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 05203175eb9c..864ad2e4a58c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -630,6 +630,7 @@ struct request_queue {
 #define QUEUE_FLAG_REGISTERED  26	/* queue has been registered to a disk */
 #define QUEUE_FLAG_SCSI_PASSTHROUGH 27	/* queue supports SCSI commands */
 #define QUEUE_FLAG_QUIESCED    28	/* queue has been quiesced */
+#define QUEUE_FLAG_PREEMPT_ONLY	29	/* only process REQ_PREEMPT requests */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP)	|	\
@@ -730,6 +731,11 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
 			     REQ_FAILFAST_DRIVER))
 #define blk_queue_quiesced(q)	test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)
+#define blk_queue_preempt_only(q)				\
+	test_bit(QUEUE_FLAG_PREEMPT_ONLY, &(q)->queue_flags)
+
+extern int blk_set_preempt_only(struct request_queue *q);
+extern void blk_clear_preempt_only(struct request_queue *q);
 
 static inline bool blk_account_rq(struct request *rq)
 {
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v10 09/10] block, scsi: Make SCSI quiesce and resume work reliably
  2017-10-17 23:26 [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
                   ` (7 preceding siblings ...)
  2017-10-17 23:26 ` [PATCH v10 08/10] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag Bart Van Assche
@ 2017-10-17 23:26 ` Bart Van Assche
  2017-10-17 23:26 ` [PATCH v10 10/10] block, nvme: Introduce blk_mq_req_flags_t Bart Van Assche
  2017-10-17 23:28 ` [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Jens Axboe
  10 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-17 23:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche, Hannes Reinecke,
	Johannes Thumshirn

The contexts from which a SCSI device can be quiesced or resumed are:
* Writing into /sys/class/scsi_device/*/device/state.
* SCSI parallel (SPI) domain validation.
* The SCSI device power management methods. See also scsi_bus_pm_ops.

It is essential during suspend and resume that neither the filesystem
state nor the filesystem metadata in RAM changes. This is why while
the hibernation image is being written or restored that SCSI devices
are quiesced. The SCSI core quiesces devices through scsi_device_quiesce()
and scsi_device_resume(). In the SDEV_QUIESCE state execution of
non-preempt requests is deferred. This is realized by returning
BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI
devices. Avoid that a full queue prevents power management requests
to be submitted by deferring allocation of non-preempt requests for
devices in the quiesced state. This patch has been tested by running
the following commands and by verifying that after resume the fio job
is still running:

for d in /sys/class/block/sd*[a-z]; do
  hcil=$(readlink "$d/device")
  hcil=${hcil#../../../}
  echo 4 > "$d/queue/nr_requests"
  echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
done
bdev=$(readlink /dev/disk/by-uuid/5217d83f-213e-4b42-b86e-20013325ba6c)
bdev=${bdev#../../}
hcil=$(readlink "/sys/block/$bdev/device")
hcil=${hcil#../../../}
fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 --rw=randread \
  --ioengine=libaio --numjobs=4 --iodepth=16 --iodepth_batch=1 --thread \
  --loops=$((2**31)) &
pid=$!
sleep 1
systemctl hibernate
sleep 10
kill $pid

Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348).
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
---
 block/blk-core.c           | 43 ++++++++++++++++++++++++++++++++++++-------
 block/blk-mq.c             |  4 ++--
 block/blk-timeout.c        |  2 +-
 drivers/scsi/scsi_lib.c    | 42 ++++++++++++++++++++++++++++++------------
 fs/block_dev.c             |  4 ++--
 include/linux/blkdev.h     |  2 +-
 include/scsi/scsi_device.h |  1 +
 7 files changed, 73 insertions(+), 25 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b73203286bf5..63e88e78c1fe 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -372,6 +372,7 @@ void blk_clear_preempt_only(struct request_queue *q)
 
 	spin_lock_irqsave(q->queue_lock, flags);
 	queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
+	wake_up_all(&q->mq_freeze_wq);
 	spin_unlock_irqrestore(q->queue_lock, flags);
 }
 EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
@@ -793,15 +794,41 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(blk_alloc_queue);
 
-int blk_queue_enter(struct request_queue *q, bool nowait)
+/**
+ * blk_queue_enter() - try to increase q->q_usage_counter
+ * @q: request queue pointer
+ * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PREEMPT
+ */
+int blk_queue_enter(struct request_queue *q, unsigned int flags)
 {
+	const bool preempt = flags & BLK_MQ_REQ_PREEMPT;
+
 	while (true) {
+		bool success = false;
 		int ret;
 
-		if (percpu_ref_tryget_live(&q->q_usage_counter))
+		rcu_read_lock_sched();
+		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
+			/*
+			 * The code that sets the PREEMPT_ONLY flag is
+			 * responsible for ensuring that that flag is globally
+			 * visible before the queue is unfrozen.
+			 */
+			if (preempt || !blk_queue_preempt_only(q)) {
+				success = true;
+			} else {
+				percpu_ref_put(&q->q_usage_counter);
+				WARN_ONCE(true,
+					  "%s: Attempt to allocate non-preempt request in preempt-only mode.\n",
+					  kobject_name(q->kobj.parent));
+			}
+		}
+		rcu_read_unlock_sched();
+
+		if (success)
 			return 0;
 
-		if (nowait)
+		if (flags & BLK_MQ_REQ_NOWAIT)
 			return -EBUSY;
 
 		/*
@@ -814,7 +841,8 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
 		smp_rmb();
 
 		ret = wait_event_interruptible(q->mq_freeze_wq,
-				!atomic_read(&q->mq_freeze_depth) ||
+				(atomic_read(&q->mq_freeze_depth) == 0 &&
+				 (preempt || !blk_queue_preempt_only(q))) ||
 				blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
@@ -1442,8 +1470,7 @@ static struct request *blk_old_get_request(struct request_queue *q,
 	/* create ioc upfront */
 	create_io_context(gfp_mask, q->node);
 
-	ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM) ||
-			      (op & REQ_NOWAIT));
+	ret = blk_queue_enter(q, flags);
 	if (ret)
 		return ERR_PTR(ret);
 	spin_lock_irq(q->queue_lock);
@@ -2264,8 +2291,10 @@ blk_qc_t generic_make_request(struct bio *bio)
 	current->bio_list = bio_list_on_stack;
 	do {
 		struct request_queue *q = bio->bi_disk->queue;
+		unsigned int flags = bio->bi_opf & REQ_NOWAIT ?
+			BLK_MQ_REQ_NOWAIT : 0;
 
-		if (likely(blk_queue_enter(q, bio->bi_opf & REQ_NOWAIT) == 0)) {
+		if (likely(blk_queue_enter(q, flags) == 0)) {
 			struct bio_list lower, same;
 
 			/* Create a fresh bio_list for all subordinate requests */
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6a025b17caac..c6bff60e6b8b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -386,7 +386,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 	struct request *rq;
 	int ret;
 
-	ret = blk_queue_enter(q, flags & BLK_MQ_REQ_NOWAIT);
+	ret = blk_queue_enter(q, flags);
 	if (ret)
 		return ERR_PTR(ret);
 
@@ -425,7 +425,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 	if (hctx_idx >= q->nr_hw_queues)
 		return ERR_PTR(-EIO);
 
-	ret = blk_queue_enter(q, true);
+	ret = blk_queue_enter(q, flags);
 	if (ret)
 		return ERR_PTR(ret);
 
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index e3e9c9771d36..1eba71486716 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -134,7 +134,7 @@ void blk_timeout_work(struct work_struct *work)
 	struct request *rq, *tmp;
 	int next_set = 0;
 
-	if (blk_queue_enter(q, true))
+	if (blk_queue_enter(q, BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_PREEMPT))
 		return;
 	spin_lock_irqsave(q->queue_lock, flags);
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 7c119696402c..d85b7941b988 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2955,21 +2955,37 @@ static void scsi_wait_for_queuecommand(struct scsi_device *sdev)
 int
 scsi_device_quiesce(struct scsi_device *sdev)
 {
+	struct request_queue *q = sdev->request_queue;
 	int err;
 
+	/*
+	 * It is allowed to call scsi_device_quiesce() multiple times from
+	 * the same context but concurrent scsi_device_quiesce() calls are
+	 * not allowed.
+	 */
+	WARN_ON_ONCE(sdev->quiesced_by && sdev->quiesced_by != current);
+
+	blk_set_preempt_only(q);
+
+	blk_mq_freeze_queue(q);
+	/*
+	 * Ensure that the effect of blk_set_preempt_only() will be visible
+	 * for percpu_ref_tryget() callers that occur after the queue
+	 * unfreeze even if the queue was already frozen before this function
+	 * was called. See also https://lwn.net/Articles/573497/.
+	 */
+	synchronize_rcu();
+	blk_mq_unfreeze_queue(q);
+
 	mutex_lock(&sdev->state_mutex);
 	err = scsi_device_set_state(sdev, SDEV_QUIESCE);
+	if (err == 0)
+		sdev->quiesced_by = current;
+	else
+		blk_clear_preempt_only(q);
 	mutex_unlock(&sdev->state_mutex);
 
-	if (err)
-		return err;
-
-	scsi_run_queue(sdev->request_queue);
-	while (atomic_read(&sdev->device_busy)) {
-		msleep_interruptible(200);
-		scsi_run_queue(sdev->request_queue);
-	}
-	return 0;
+	return err;
 }
 EXPORT_SYMBOL(scsi_device_quiesce);
 
@@ -2989,9 +3005,11 @@ void scsi_device_resume(struct scsi_device *sdev)
 	 * device deleted during suspend)
 	 */
 	mutex_lock(&sdev->state_mutex);
-	if (sdev->sdev_state == SDEV_QUIESCE &&
-	    scsi_device_set_state(sdev, SDEV_RUNNING) == 0)
-		scsi_run_queue(sdev->request_queue);
+	WARN_ON_ONCE(!sdev->quiesced_by);
+	sdev->quiesced_by = NULL;
+	blk_clear_preempt_only(sdev->request_queue);
+	if (sdev->sdev_state == SDEV_QUIESCE)
+		scsi_device_set_state(sdev, SDEV_RUNNING);
 	mutex_unlock(&sdev->state_mutex);
 }
 EXPORT_SYMBOL(scsi_device_resume);
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 07ddccd17801..c5363186618b 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -662,7 +662,7 @@ int bdev_read_page(struct block_device *bdev, sector_t sector,
 	if (!ops->rw_page || bdev_get_integrity(bdev))
 		return result;
 
-	result = blk_queue_enter(bdev->bd_queue, false);
+	result = blk_queue_enter(bdev->bd_queue, 0);
 	if (result)
 		return result;
 	result = ops->rw_page(bdev, sector + get_start_sect(bdev), page, false);
@@ -698,7 +698,7 @@ int bdev_write_page(struct block_device *bdev, sector_t sector,
 
 	if (!ops->rw_page || bdev_get_integrity(bdev))
 		return -EOPNOTSUPP;
-	result = blk_queue_enter(bdev->bd_queue, false);
+	result = blk_queue_enter(bdev->bd_queue, 0);
 	if (result)
 		return result;
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 864ad2e4a58c..4f91c6462752 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -956,7 +956,7 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 			 struct scsi_ioctl_command __user *);
 
-extern int blk_queue_enter(struct request_queue *q, bool nowait);
+extern int blk_queue_enter(struct request_queue *q, unsigned int flags);
 extern void blk_queue_exit(struct request_queue *q);
 extern void blk_start_queue(struct request_queue *q);
 extern void blk_start_queue_async(struct request_queue *q);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 82e93ee94708..6f0f1e242e23 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -219,6 +219,7 @@ struct scsi_device {
 	unsigned char		access_state;
 	struct mutex		state_mutex;
 	enum scsi_device_state sdev_state;
+	struct task_struct	*quiesced_by;
 	unsigned long		sdev_data[0];
 } __attribute__((aligned(sizeof(unsigned long))));
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v10 10/10] block, nvme: Introduce blk_mq_req_flags_t
  2017-10-17 23:26 [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
                   ` (8 preceding siblings ...)
  2017-10-17 23:26 ` [PATCH v10 09/10] block, scsi: Make SCSI quiesce and resume work reliably Bart Van Assche
@ 2017-10-17 23:26 ` Bart Van Assche
  2017-10-17 23:28 ` [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Jens Axboe
  10 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-17 23:26 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei, Bart Van Assche, linux-nvme,
	Johannes Thumshirn

Several block layer and NVMe core functions accept a combination
of BLK_MQ_REQ_* flags through the 'flags' argument but there is
no verification at compile time whether the right type of block
layer flags is passed. Make it possible for sparse to verify this.
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: linux-nvme@lists.infradead.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c          | 12 ++++++------
 block/blk-mq.c            |  4 ++--
 block/blk-mq.h            |  2 +-
 drivers/nvme/host/core.c  |  5 +++--
 drivers/nvme/host/nvme.h  |  5 +++--
 include/linux/blk-mq.h    | 17 +++++++++++------
 include/linux/blk_types.h |  2 ++
 include/linux/blkdev.h    |  4 ++--
 8 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 63e88e78c1fe..4fc9d6d44edd 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -799,7 +799,7 @@ EXPORT_SYMBOL(blk_alloc_queue);
  * @q: request queue pointer
  * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PREEMPT
  */
-int blk_queue_enter(struct request_queue *q, unsigned int flags)
+int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 {
 	const bool preempt = flags & BLK_MQ_REQ_PREEMPT;
 
@@ -1225,7 +1225,7 @@ int blk_update_nr_requests(struct request_queue *q, unsigned int nr)
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
 static struct request *__get_request(struct request_list *rl, unsigned int op,
-				     struct bio *bio, unsigned int flags)
+				     struct bio *bio, blk_mq_req_flags_t flags)
 {
 	struct request_queue *q = rl->q;
 	struct request *rq;
@@ -1408,7 +1408,7 @@ static struct request *__get_request(struct request_list *rl, unsigned int op,
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
 static struct request *get_request(struct request_queue *q, unsigned int op,
-				   struct bio *bio, unsigned int flags)
+				   struct bio *bio, blk_mq_req_flags_t flags)
 {
 	const bool is_sync = op_is_sync(op);
 	DEFINE_WAIT(wait);
@@ -1458,7 +1458,7 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
 
 /* flags: BLK_MQ_REQ_PREEMPT and/or BLK_MQ_REQ_NOWAIT. */
 static struct request *blk_old_get_request(struct request_queue *q,
-					   unsigned int op, unsigned int flags)
+				unsigned int op, blk_mq_req_flags_t flags)
 {
 	struct request *rq;
 	gfp_t gfp_mask = flags & BLK_MQ_REQ_NOWAIT ? GFP_ATOMIC :
@@ -1495,7 +1495,7 @@ static struct request *blk_old_get_request(struct request_queue *q,
  * @flags: BLK_MQ_REQ_* flags, e.g. BLK_MQ_REQ_NOWAIT.
  */
 struct request *blk_get_request_flags(struct request_queue *q, unsigned int op,
-				      unsigned int flags)
+				      blk_mq_req_flags_t flags)
 {
 	struct request *req;
 
@@ -2291,7 +2291,7 @@ blk_qc_t generic_make_request(struct bio *bio)
 	current->bio_list = bio_list_on_stack;
 	do {
 		struct request_queue *q = bio->bi_disk->queue;
-		unsigned int flags = bio->bi_opf & REQ_NOWAIT ?
+		blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ?
 			BLK_MQ_REQ_NOWAIT : 0;
 
 		if (likely(blk_queue_enter(q, flags) == 0)) {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c6bff60e6b8b..c037b1ad64a7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -380,7 +380,7 @@ static struct request *blk_mq_get_request(struct request_queue *q,
 }
 
 struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
-		unsigned int flags)
+		blk_mq_req_flags_t flags)
 {
 	struct blk_mq_alloc_data alloc_data = { .flags = flags };
 	struct request *rq;
@@ -406,7 +406,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 EXPORT_SYMBOL(blk_mq_alloc_request);
 
 struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
-		unsigned int op, unsigned int flags, unsigned int hctx_idx)
+	unsigned int op, blk_mq_req_flags_t flags, unsigned int hctx_idx)
 {
 	struct blk_mq_alloc_data alloc_data = { .flags = flags };
 	struct request *rq;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 522b420dedc0..5dcfe4fa5e0d 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -110,7 +110,7 @@ static inline void blk_mq_put_ctx(struct blk_mq_ctx *ctx)
 struct blk_mq_alloc_data {
 	/* input parameter */
 	struct request_queue *q;
-	unsigned int flags;
+	blk_mq_req_flags_t flags;
 	unsigned int shallow_depth;
 
 	/* input & output parameter */
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index bb2aad078637..01947cd82b5a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -292,7 +292,7 @@ static struct nvme_ns *nvme_get_ns_from_disk(struct gendisk *disk)
 }
 
 struct request *nvme_alloc_request(struct request_queue *q,
-		struct nvme_command *cmd, unsigned int flags, int qid)
+		struct nvme_command *cmd, blk_mq_req_flags_t flags, int qid)
 {
 	unsigned op = nvme_is_write(cmd) ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN;
 	struct request *req;
@@ -560,7 +560,8 @@ EXPORT_SYMBOL_GPL(nvme_setup_cmd);
  */
 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		union nvme_result *result, void *buffer, unsigned bufflen,
-		unsigned timeout, int qid, int at_head, int flags)
+		unsigned timeout, int qid, int at_head,
+		blk_mq_req_flags_t flags)
 {
 	struct request *req;
 	int ret;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index d3f3c4447515..61b25e8c222c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -314,14 +314,15 @@ void nvme_start_freeze(struct nvme_ctrl *ctrl);
 
 #define NVME_QID_ANY -1
 struct request *nvme_alloc_request(struct request_queue *q,
-		struct nvme_command *cmd, unsigned int flags, int qid);
+		struct nvme_command *cmd, blk_mq_req_flags_t flags, int qid);
 blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req,
 		struct nvme_command *cmd);
 int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		void *buf, unsigned bufflen);
 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		union nvme_result *result, void *buffer, unsigned bufflen,
-		unsigned timeout, int qid, int at_head, int flags);
+		unsigned timeout, int qid, int at_head,
+		blk_mq_req_flags_t flags);
 int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
 void nvme_start_keep_alive(struct nvme_ctrl *ctrl);
 void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 22c7f36745fc..38ecf1340266 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -210,16 +210,21 @@ void blk_mq_free_request(struct request *rq);
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *);
 
 enum {
-	BLK_MQ_REQ_NOWAIT	= (1 << 0), /* return when out of requests */
-	BLK_MQ_REQ_RESERVED	= (1 << 1), /* allocate from reserved pool */
-	BLK_MQ_REQ_INTERNAL	= (1 << 2), /* allocate internal/sched tag */
-	BLK_MQ_REQ_PREEMPT	= (1 << 3), /* set RQF_PREEMPT */
+	/* return when out of requests */
+	BLK_MQ_REQ_NOWAIT	= (__force blk_mq_req_flags_t)(1 << 0),
+	/* allocate from reserved pool */
+	BLK_MQ_REQ_RESERVED	= (__force blk_mq_req_flags_t)(1 << 1),
+	/* allocate internal/sched tag */
+	BLK_MQ_REQ_INTERNAL	= (__force blk_mq_req_flags_t)(1 << 2),
+	/* set RQF_PREEMPT */
+	BLK_MQ_REQ_PREEMPT	= (__force blk_mq_req_flags_t)(1 << 3),
 };
 
 struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
-		unsigned int flags);
+		blk_mq_req_flags_t flags);
 struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
-		unsigned int op, unsigned int flags, unsigned int hctx_idx);
+		unsigned int op, blk_mq_req_flags_t flags,
+		unsigned int hctx_idx);
 struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag);
 
 enum {
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 3385c89f402e..cbd908478140 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -162,6 +162,8 @@ struct bio {
  */
 #define BIO_RESET_BITS	BVEC_POOL_OFFSET
 
+typedef __u32 __bitwise blk_mq_req_flags_t;
+
 /*
  * Operations and flags common to the bio and request structures.
  * We use 8 bits for encoding the operation, and the remaining 24 for flags.
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4f91c6462752..9a7d7e775cd0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -932,7 +932,7 @@ extern void blk_put_request(struct request *);
 extern void __blk_put_request(struct request_queue *, struct request *);
 extern struct request *blk_get_request_flags(struct request_queue *,
 					     unsigned int op,
-					     unsigned int flags);
+					     blk_mq_req_flags_t flags);
 extern struct request *blk_get_request(struct request_queue *, unsigned int op,
 				       gfp_t gfp_mask);
 extern void blk_requeue_request(struct request_queue *, struct request *);
@@ -956,7 +956,7 @@ extern int scsi_cmd_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 extern int sg_scsi_ioctl(struct request_queue *, struct gendisk *, fmode_t,
 			 struct scsi_ioctl_command __user *);
 
-extern int blk_queue_enter(struct request_queue *q, unsigned int flags);
+extern int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags);
 extern void blk_queue_exit(struct request_queue *q);
 extern void blk_start_queue(struct request_queue *q);
 extern void blk_start_queue_async(struct request_queue *q);
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
  2017-10-17 23:26 [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
                   ` (9 preceding siblings ...)
  2017-10-17 23:26 ` [PATCH v10 10/10] block, nvme: Introduce blk_mq_req_flags_t Bart Van Assche
@ 2017-10-17 23:28 ` Jens Axboe
  2017-10-17 23:40     ` Bart Van Assche
  2017-10-18  5:02     ` Oleksandr Natalenko
  10 siblings, 2 replies; 25+ messages in thread
From: Jens Axboe @ 2017-10-17 23:28 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-block, linux-scsi, Christoph Hellwig, Martin K . Petersen,
	Oleksandr Natalenko, Ming Lei

On 10/17/2017 05:26 PM, Bart Van Assche wrote:
> Hello Jens,
> 
> It is known that during the resume following a hibernate, especially when
> using an md RAID1 array created on top of SCSI devices, sometimes the system
> hangs instead of coming up properly. This patch series fixes that
> problem. These patches have been tested on top of the block layer for-next
> branch. Please consider these changes for kernel v4.15.

The folks that had trouble with this issue, have they tested the
patchset? I'd like to get a solution queued up for this sooner rather
than later, but I'd greatly prefer if we had some folks that have
tested and verified it.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
  2017-10-17 23:28 ` [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Jens Axboe
@ 2017-10-17 23:40     ` Bart Van Assche
  2017-10-18  5:02     ` Oleksandr Natalenko
  1 sibling, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-17 23:40 UTC (permalink / raw)
  To: axboe@kernel.dk
  Cc: linux-scsi@vger.kernel.org, hch@lst.de,
	linux-block@vger.kernel.org, oleksandr@natalenko.name,
	martin.petersen@oracle.com, ming.lei@redhat.com

T24gVHVlLCAyMDE3LTEwLTE3IGF0IDE3OjI4IC0wNjAwLCBKZW5zIEF4Ym9lIHdyb3RlOg0KPiBP
biAxMC8xNy8yMDE3IDA1OjI2IFBNLCBCYXJ0IFZhbiBBc3NjaGUgd3JvdGU6DQo+ID4gSXQgaXMg
a25vd24gdGhhdCBkdXJpbmcgdGhlIHJlc3VtZSBmb2xsb3dpbmcgYSBoaWJlcm5hdGUsIGVzcGVj
aWFsbHkgd2hlbg0KPiA+IHVzaW5nIGFuIG1kIFJBSUQxIGFycmF5IGNyZWF0ZWQgb24gdG9wIG9m
IFNDU0kgZGV2aWNlcywgc29tZXRpbWVzIHRoZSBzeXN0ZW0NCj4gPiBoYW5ncyBpbnN0ZWFkIG9m
IGNvbWluZyB1cCBwcm9wZXJseS4gVGhpcyBwYXRjaCBzZXJpZXMgZml4ZXMgdGhhdA0KPiA+IHBy
b2JsZW0uIFRoZXNlIHBhdGNoZXMgaGF2ZSBiZWVuIHRlc3RlZCBvbiB0b3Agb2YgdGhlIGJsb2Nr
IGxheWVyIGZvci1uZXh0DQo+ID4gYnJhbmNoLiBQbGVhc2UgY29uc2lkZXIgdGhlc2UgY2hhbmdl
cyBmb3Iga2VybmVsIHY0LjE1Lg0KPiANCj4gVGhlIGZvbGtzIHRoYXQgaGFkIHRyb3VibGUgd2l0
aCB0aGlzIGlzc3VlLCBoYXZlIHRoZXkgdGVzdGVkIHRoZQ0KPiBwYXRjaHNldD8gSSdkIGxpa2Ug
dG8gZ2V0IGEgc29sdXRpb24gcXVldWVkIHVwIGZvciB0aGlzIHNvb25lciByYXRoZXINCj4gdGhh
biBsYXRlciwgYnV0IEknZCBncmVhdGx5IHByZWZlciBpZiB3ZSBoYWQgc29tZSBmb2xrcyB0aGF0
IGhhdmUNCj4gdGVzdGVkIGFuZCB2ZXJpZmllZCBpdC4NCg0KSGVsbG8gSmVucywNCg0KUHJldmlv
dXMgdmVyc2lvbnMgb2YgdGhpcyBzZXJpZXMgaGF2ZSBiZWVuIHRlc3RlZCBieSB0aGUgdHdvIHJl
cG9ydGVycyBvZg0KdGhpcyBpc3N1ZSAoT2xla3NhbmRyIE5hdGFsZW5rbyBhbmQgTWFydGluIFN0
ZWlnZXJ3YWxkKS4gSSB0aGluayB0aGlzIHZlcnNpb24NCmlzIGNsb3NlIGVub3VnaCB0byB3aGF0
IGhhcyBiZWVuIHRlc3RlZCBieSBPbGVrc2FuZHIgYW5kIE1hcnRpbiBmb3IgdGhlaXINCnRlc3Qg
cmVzdWx0cyBzdGlsbCB0byBiZSB2YWxpZC4gU2VlIGFsc286DQoqIGh0dHBzOi8vd3d3Lm1haWwt
YXJjaGl2ZS5jb20vbGludXgtYmxvY2tAdmdlci5rZXJuZWwub3JnL21zZzEzOTMwLmh0bWwNCiog
aHR0cHM6Ly93d3cubWFpbC1hcmNoaXZlLmNvbS9saW51eC1ibG9ja0B2Z2VyLmtlcm5lbC5vcmcv
bXNnMTQzODkuaHRtbA0KDQpUaGFua3MsDQoNCkJhcnQuDQo=

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
@ 2017-10-17 23:40     ` Bart Van Assche
  0 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-17 23:40 UTC (permalink / raw)
  To: axboe@kernel.dk
  Cc: linux-scsi@vger.kernel.org, hch@lst.de,
	linux-block@vger.kernel.org, oleksandr@natalenko.name,
	martin.petersen@oracle.com, ming.lei@redhat.com

On Tue, 2017-10-17 at 17:28 -0600, Jens Axboe wrote:
> On 10/17/2017 05:26 PM, Bart Van Assche wrote:
> > It is known that during the resume following a hibernate, especially when
> > using an md RAID1 array created on top of SCSI devices, sometimes the system
> > hangs instead of coming up properly. This patch series fixes that
> > problem. These patches have been tested on top of the block layer for-next
> > branch. Please consider these changes for kernel v4.15.
> 
> The folks that had trouble with this issue, have they tested the
> patchset? I'd like to get a solution queued up for this sooner rather
> than later, but I'd greatly prefer if we had some folks that have
> tested and verified it.

Hello Jens,

Previous versions of this series have been tested by the two reporters of
this issue (Oleksandr Natalenko and Martin Steigerwald). I think this version
is close enough to what has been tested by Oleksandr and Martin for their
test results still to be valid. See also:
* https://www.mail-archive.com/linux-block@vger.kernel.org/msg13930.html
* https://www.mail-archive.com/linux-block@vger.kernel.org/msg14389.html

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
  2017-10-17 23:40     ` Bart Van Assche
  (?)
@ 2017-10-18  1:47     ` Jens Axboe
  -1 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2017-10-18  1:47 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-scsi@vger.kernel.org, hch@lst.de,
	linux-block@vger.kernel.org, oleksandr@natalenko.name,
	martin.petersen@oracle.com, ming.lei@redhat.com

On 10/17/2017 05:40 PM, Bart Van Assche wrote:
> On Tue, 2017-10-17 at 17:28 -0600, Jens Axboe wrote:
>> On 10/17/2017 05:26 PM, Bart Van Assche wrote:
>>> It is known that during the resume following a hibernate, especially when
>>> using an md RAID1 array created on top of SCSI devices, sometimes the system
>>> hangs instead of coming up properly. This patch series fixes that
>>> problem. These patches have been tested on top of the block layer for-next
>>> branch. Please consider these changes for kernel v4.15.
>>
>> The folks that had trouble with this issue, have they tested the
>> patchset? I'd like to get a solution queued up for this sooner rather
>> than later, but I'd greatly prefer if we had some folks that have
>> tested and verified it.
> 
> Hello Jens,
> 
> Previous versions of this series have been tested by the two reporters of
> this issue (Oleksandr Natalenko and Martin Steigerwald). I think this version
> is close enough to what has been tested by Oleksandr and Martin for their
> test results still to be valid. See also:
> * https://www.mail-archive.com/linux-block@vger.kernel.org/msg13930.html
> * https://www.mail-archive.com/linux-block@vger.kernel.org/msg14389.html

OK good. Can we add their tested-by then?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
  2017-10-17 23:28 ` [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Jens Axboe
@ 2017-10-18  5:02     ` Oleksandr Natalenko
  2017-10-18  5:02     ` Oleksandr Natalenko
  1 sibling, 0 replies; 25+ messages in thread
From: Oleksandr Natalenko @ 2017-10-18  5:02 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bart Van Assche, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Ming Lei

I've indeed tested some previous versions of this patchset, but still use=20
Ming's solution.

Can it be clarified which one (Bart's or Ming's) is a correct approach?

On st=C5=99eda 18. =C5=99=C3=ADjna 2017 1:28:14 CEST Jens Axboe wrote:
> On 10/17/2017 05:26 PM, Bart Van Assche wrote:
> > Hello Jens,
> >=20
> > It is known that during the resume following a hibernate, especially wh=
en
> > using an md RAID1 array created on top of SCSI devices, sometimes the
> > system hangs instead of coming up properly. This patch series fixes that
> > problem. These patches have been tested on top of the block layer for-n=
ext
> > branch. Please consider these changes for kernel v4.15.
>=20
> The folks that had trouble with this issue, have they tested the
> patchset? I'd like to get a solution queued up for this sooner rather
> than later, but I'd greatly prefer if we had some folks that have
> tested and verified it.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
@ 2017-10-18  5:02     ` Oleksandr Natalenko
  0 siblings, 0 replies; 25+ messages in thread
From: Oleksandr Natalenko @ 2017-10-18  5:02 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bart Van Assche, linux-block, linux-scsi, Christoph Hellwig,
	Martin K . Petersen, Ming Lei

I've indeed tested some previous versions of this patchset, but still use 
Ming's solution.

Can it be clarified which one (Bart's or Ming's) is a correct approach?

On středa 18. října 2017 1:28:14 CEST Jens Axboe wrote:
> On 10/17/2017 05:26 PM, Bart Van Assche wrote:
> > Hello Jens,
> > 
> > It is known that during the resume following a hibernate, especially when
> > using an md RAID1 array created on top of SCSI devices, sometimes the
> > system hangs instead of coming up properly. This patch series fixes that
> > problem. These patches have been tested on top of the block layer for-next
> > branch. Please consider these changes for kernel v4.15.
> 
> The folks that had trouble with this issue, have they tested the
> patchset? I'd like to get a solution queued up for this sooner rather
> than later, but I'd greatly prefer if we had some folks that have
> tested and verified it.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
  2017-10-18  5:02     ` Oleksandr Natalenko
@ 2017-10-18 15:27       ` Bart Van Assche
  -1 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-18 15:27 UTC (permalink / raw)
  To: oleksandr@natalenko.name, axboe@kernel.dk
  Cc: linux-scsi@vger.kernel.org, hch@lst.de,
	linux-block@vger.kernel.org, martin.petersen@oracle.com,
	ming.lei@redhat.com

T24gV2VkLCAyMDE3LTEwLTE4IGF0IDA3OjAyICswMjAwLCBPbGVrc2FuZHIgTmF0YWxlbmtvIHdy
b3RlOg0KPiBJJ3ZlIGluZGVlZCB0ZXN0ZWQgc29tZSBwcmV2aW91cyB2ZXJzaW9ucyBvZiB0aGlz
IHBhdGNoc2V0LCBidXQgc3RpbGwgdXNlIA0KPiBNaW5nJ3Mgc29sdXRpb24uDQo+IA0KPiBDYW4g
aXQgYmUgY2xhcmlmaWVkIHdoaWNoIG9uZSAoQmFydCdzIG9yIE1pbmcncykgaXMgYSBjb3JyZWN0
IGFwcHJvYWNoPw0KDQpIZWxsbyBPbGVrc2FuZHIsDQoNCkkgdGhpbmsgdGhpcyB2ZXJzaW9uICh2
MTApIGhhcyBzaWduaWZpY2FudCBhZHZhbnRhZ2VzIG92ZXIgdGhlIG1vc3QgcmVjZW50DQpwYXRj
aCBzZXJpZXMgcG9zdGVkIGJ5IE1pbmcgTGVpIHRvIGFkZHJlc3Mgc3VzcGVuZCwgcmVzdW1lIGFu
ZCBTUEkgZG9tYWluDQp2YWxpZGF0aW9uLiBTbyBpdCB3b3VsZCBiZSBhcHByZWNpYXRlZCBpZiB5
b3UgY291bGQgc3dpdGNoIHRvIHRoaXMgc2VyaWVzIGZvcg0KdGVzdGluZyBzdXNwZW5kIGFuZCBy
ZXN1bWUuIEEgZ2l0IGJyYW5jaCB3aXRoIHRoaXMgcGF0Y2ggc2VyaWVzIGlzIGF2YWlsYWJsZQ0K
YXQgaHR0cHM6Ly9naXRodWIuY29tL2J2YW5hc3NjaGUvbGludXgvdHJlZS9ibGstbXEtcG0tMjAx
Ny0xMC0xOC12MTAuDQoNClRoYW5rcywNCg0KQmFydC4=

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
@ 2017-10-18 15:27       ` Bart Van Assche
  0 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-18 15:27 UTC (permalink / raw)
  To: oleksandr@natalenko.name, axboe@kernel.dk
  Cc: linux-scsi@vger.kernel.org, hch@lst.de,
	linux-block@vger.kernel.org, martin.petersen@oracle.com,
	ming.lei@redhat.com

On Wed, 2017-10-18 at 07:02 +0200, Oleksandr Natalenko wrote:
> I've indeed tested some previous versions of this patchset, but still use 
> Ming's solution.
> 
> Can it be clarified which one (Bart's or Ming's) is a correct approach?

Hello Oleksandr,

I think this version (v10) has significant advantages over the most recent
patch series posted by Ming Lei to address suspend, resume and SPI domain
validation. So it would be appreciated if you could switch to this series for
testing suspend and resume. A git branch with this patch series is available
at https://github.com/bvanassche/linux/tree/blk-mq-pm-2017-10-18-v10.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
  2017-10-18 15:27       ` Bart Van Assche
@ 2017-10-21 17:21         ` Oleksandr Natalenko
  -1 siblings, 0 replies; 25+ messages in thread
From: Oleksandr Natalenko @ 2017-10-21 17:21 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe@kernel.dk, linux-scsi@vger.kernel.org, hch@lst.de,
	linux-block@vger.kernel.org, martin.petersen@oracle.com,
	ming.lei@redhat.com

Well,

I've cherry-picked this series for current upstream/master branch, and got=
=20
this while performing another suspend try:

=3D=3D=3D
[   62.415890] Freezing of tasks failed after 20.007 seconds (1 tasks refus=
ing=20
to freeze, wq_busy=3D0):
[   62.421150] xfsaild/dm-7    D    0   289      2 0x80000000
[   62.425800] Call Trace:
[   62.428902]  __schedule+0x239/0x870
[   62.431834]  schedule+0x33/0x90
[   62.434156]  _xfs_log_force+0x143/0x280 [xfs]
[   62.438767]  ? schedule_timeout+0x188/0x390
[   62.443592]  ? wake_up_q+0x80/0x80
[   62.446545]  ? xfsaild+0x18d/0x780 [xfs]
[   62.449702]  xfs_log_force+0x2c/0x90 [xfs]
[   62.453217]  xfsaild+0x18d/0x780 [xfs]
[   62.456717]  kthread+0x124/0x140
[   62.459237]  ? kthread+0x124/0x140
[   62.461818]  ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[   62.465146]  ? kthread_create_on_node+0x70/0x70
[   62.467331]  ret_from_fork+0x25/0x30
[   62.474386] Restarting kernel threads ... done.
=3D=3D=3D

After this it looks like the system tried to freeze anyway:

=3D=3D=3D
[   62.478290] OOM killer enabled.
[   62.481711] Restarting tasks ... done.
[   62.488931] PM: suspend exit
[   62.491497] PM: suspend entry (s2idle)
[   62.493445] PM: Syncing filesystems ... done.
[   63.774220] Freezing user space processes ... (elapsed 0.001 seconds) do=
ne.
[   63.782707] OOM killer disabled.
[   63.785226] Freezing remaining freezable tasks ... (elapsed 0.001 second=
s)=20
done.
[   63.861548] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[   63.868153] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[   63.868463] sd 1:0:0:0: [sdb] Stopping disk
[   63.873216] sd 0:0:0:0: [sda] Stopping disk
=3D=3D=3D

but got hung completely. After some time hung task was detected:

=3D=3D=3D
[  247.531069] INFO: task systemd-sleep:663 blocked for more than 120 secon=
ds.
[  247.535307]       Not tainted 4.14.0-pf0 #1
[  247.537820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables=
=20
this message.
[  247.541015] systemd-sleep   D    0   663      1 0x00000000
[  247.542706] Call Trace:
[  247.543386]  __schedule+0x239/0x870
[  247.544351]  schedule+0x33/0x90
[  247.545197]  suspend_devices_and_enter+0x61b/0x890
[  247.546539]  ? wait_woken+0x80/0x80
[  247.547517]  pm_suspend+0x340/0x3b0
[  247.548550]  state_store+0x5a/0x90
[  247.549646]  kobj_attr_store+0xf/0x20
[  247.550649]  sysfs_kf_write+0x37/0x40
[  247.551640]  kernfs_fop_write+0x11c/0x1a0
[  247.552708]  __vfs_write+0x37/0x150
[  247.553641]  ? SYSC_newfstat+0x44/0x70
[  247.554628]  vfs_write+0xb1/0x1a0
[  247.555509]  SyS_write+0x55/0xc0
[  247.556366]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[  247.557667] RIP: 0033:0x7f56b74ec8d4
[  247.558616] RSP: 002b:00007fff141c7738 EFLAGS: 00000246 ORIG_RAX:=20
0000000000000001
[  247.560667] RAX: ffffffffffffffda RBX: 000055dd61863290 RCX:=20
00007f56b74ec8d4
[  247.562639] RDX: 0000000000000007 RSI: 000055dd61864eb0 RDI:=20
0000000000000004
[  247.564874] RBP: 00007f56b77b3240 R08: 000055dd61863370 R09:=20
00007f56b79c88c0
[  247.566875] R10: 000000000000000a R11: 0000000000000246 R12:=20
0000000000000000
[  247.569213] R13: 000055dd61863290 R14: 000055dd61863d08 R15:=20
00000000ffffffea
=3D=3D=3D

P.S. Current Ming's series is enough for 4.13 to not experience any issues=
=20
like this.

On st=C5=99eda 18. =C5=99=C3=ADjna 2017 17:27:14 CEST Bart Van Assche wrote:
> I think this version (v10) has significant advantages over the most recent
> patch series posted by Ming Lei to address suspend, resume and SPI domain
> validation. So it would be appreciated if you could switch to this series
> for testing suspend and resume.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
@ 2017-10-21 17:21         ` Oleksandr Natalenko
  0 siblings, 0 replies; 25+ messages in thread
From: Oleksandr Natalenko @ 2017-10-21 17:21 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: axboe@kernel.dk, linux-scsi@vger.kernel.org, hch@lst.de,
	linux-block@vger.kernel.org, martin.petersen@oracle.com,
	ming.lei@redhat.com

Well,

I've cherry-picked this series for current upstream/master branch, and got 
this while performing another suspend try:

===
[   62.415890] Freezing of tasks failed after 20.007 seconds (1 tasks refusing 
to freeze, wq_busy=0):
[   62.421150] xfsaild/dm-7    D    0   289      2 0x80000000
[   62.425800] Call Trace:
[   62.428902]  __schedule+0x239/0x870
[   62.431834]  schedule+0x33/0x90
[   62.434156]  _xfs_log_force+0x143/0x280 [xfs]
[   62.438767]  ? schedule_timeout+0x188/0x390
[   62.443592]  ? wake_up_q+0x80/0x80
[   62.446545]  ? xfsaild+0x18d/0x780 [xfs]
[   62.449702]  xfs_log_force+0x2c/0x90 [xfs]
[   62.453217]  xfsaild+0x18d/0x780 [xfs]
[   62.456717]  kthread+0x124/0x140
[   62.459237]  ? kthread+0x124/0x140
[   62.461818]  ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[   62.465146]  ? kthread_create_on_node+0x70/0x70
[   62.467331]  ret_from_fork+0x25/0x30
[   62.474386] Restarting kernel threads ... done.
===

After this it looks like the system tried to freeze anyway:

===
[   62.478290] OOM killer enabled.
[   62.481711] Restarting tasks ... done.
[   62.488931] PM: suspend exit
[   62.491497] PM: suspend entry (s2idle)
[   62.493445] PM: Syncing filesystems ... done.
[   63.774220] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   63.782707] OOM killer disabled.
[   63.785226] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) 
done.
[   63.861548] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[   63.868153] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[   63.868463] sd 1:0:0:0: [sdb] Stopping disk
[   63.873216] sd 0:0:0:0: [sda] Stopping disk
===

but got hung completely. After some time hung task was detected:

===
[  247.531069] INFO: task systemd-sleep:663 blocked for more than 120 seconds.
[  247.535307]       Not tainted 4.14.0-pf0 #1
[  247.537820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[  247.541015] systemd-sleep   D    0   663      1 0x00000000
[  247.542706] Call Trace:
[  247.543386]  __schedule+0x239/0x870
[  247.544351]  schedule+0x33/0x90
[  247.545197]  suspend_devices_and_enter+0x61b/0x890
[  247.546539]  ? wait_woken+0x80/0x80
[  247.547517]  pm_suspend+0x340/0x3b0
[  247.548550]  state_store+0x5a/0x90
[  247.549646]  kobj_attr_store+0xf/0x20
[  247.550649]  sysfs_kf_write+0x37/0x40
[  247.551640]  kernfs_fop_write+0x11c/0x1a0
[  247.552708]  __vfs_write+0x37/0x150
[  247.553641]  ? SYSC_newfstat+0x44/0x70
[  247.554628]  vfs_write+0xb1/0x1a0
[  247.555509]  SyS_write+0x55/0xc0
[  247.556366]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[  247.557667] RIP: 0033:0x7f56b74ec8d4
[  247.558616] RSP: 002b:00007fff141c7738 EFLAGS: 00000246 ORIG_RAX: 
0000000000000001
[  247.560667] RAX: ffffffffffffffda RBX: 000055dd61863290 RCX: 
00007f56b74ec8d4
[  247.562639] RDX: 0000000000000007 RSI: 000055dd61864eb0 RDI: 
0000000000000004
[  247.564874] RBP: 00007f56b77b3240 R08: 000055dd61863370 R09: 
00007f56b79c88c0
[  247.566875] R10: 000000000000000a R11: 0000000000000246 R12: 
0000000000000000
[  247.569213] R13: 000055dd61863290 R14: 000055dd61863d08 R15: 
00000000ffffffea
===

P.S. Current Ming's series is enough for 4.13 to not experience any issues 
like this.

On středa 18. října 2017 17:27:14 CEST Bart Van Assche wrote:
> I think this version (v10) has significant advantages over the most recent
> patch series posted by Ming Lei to address suspend, resume and SPI domain
> validation. So it would be appreciated if you could switch to this series
> for testing suspend and resume.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
  2017-10-21 17:21         ` Oleksandr Natalenko
@ 2017-10-21 17:59           ` Bart Van Assche
  -1 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-21 17:59 UTC (permalink / raw)
  To: oleksandr@natalenko.name
  Cc: linux-scsi@vger.kernel.org, hch@lst.de,
	linux-block@vger.kernel.org, martin.petersen@oracle.com,
	axboe@kernel.dk, ming.lei@redhat.com

T24gU2F0LCAyMDE3LTEwLTIxIGF0IDE5OjIxICswMjAwLCBPbGVrc2FuZHIgTmF0YWxlbmtvIHdy
b3RlOg0KPiBJJ3ZlIGNoZXJyeS1waWNrZWQgdGhpcyBzZXJpZXMgZm9yIGN1cnJlbnQgdXBzdHJl
YW0vbWFzdGVyIGJyYW5jaCwgYW5kIGdvdA0KPiB0aGlzIHdoaWxlIHBlcmZvcm1pbmcgYW5vdGhl
ciBzdXNwZW5kIHRyeToNCj4gDQo+ID09PQ0KPiBbICAgNjIuNDE1ODkwXSBGcmVlemluZyBvZiB0
YXNrcyBmYWlsZWQgYWZ0ZXIgMjAuMDA3IHNlY29uZHMgKDEgdGFza3MgcmVmdXNpbmcgDQo+IHRv
IGZyZWV6ZSwgd3FfYnVzeT0wKToNCj4gWyAgIDYyLjQyMTE1MF0geGZzYWlsZC9kbS03ICAgIEQg
ICAgMCAgIDI4OSAgICAgIDIgMHg4MDAwMDAwMA0KPiBbICAgNjIuNDI1ODAwXSBDYWxsIFRyYWNl
Og0KPiBbICAgNjIuNDI4OTAyXSAgX19zY2hlZHVsZSsweDIzOS8weDg3MA0KPiBbICAgNjIuNDMx
ODM0XSAgc2NoZWR1bGUrMHgzMy8weDkwDQo+IFsgICA2Mi40MzQxNTZdICBfeGZzX2xvZ19mb3Jj
ZSsweDE0My8weDI4MCBbeGZzXQ0KPiBbICAgNjIuNDM4NzY3XSAgPyBzY2hlZHVsZV90aW1lb3V0
KzB4MTg4LzB4MzkwDQo+IFsgICA2Mi40NDM1OTJdICA/IHdha2VfdXBfcSsweDgwLzB4ODANCj4g
WyAgIDYyLjQ0NjU0NV0gID8geGZzYWlsZCsweDE4ZC8weDc4MCBbeGZzXQ0KPiBbICAgNjIuNDQ5
NzAyXSAgeGZzX2xvZ19mb3JjZSsweDJjLzB4OTAgW3hmc10NCj4gWyAgIDYyLjQ1MzIxN10gIHhm
c2FpbGQrMHgxOGQvMHg3ODAgW3hmc10NCj4gWyAgIDYyLjQ1NjcxN10gIGt0aHJlYWQrMHgxMjQv
MHgxNDANCj4gWyAgIDYyLjQ1OTIzN10gID8ga3RocmVhZCsweDEyNC8weDE0MA0KPiBbICAgNjIu
NDYxODE4XSAgPyB4ZnNfdHJhbnNfYWlsX2N1cnNvcl9maXJzdCsweDkwLzB4OTAgW3hmc10NCj4g
WyAgIDYyLjQ2NTE0Nl0gID8ga3RocmVhZF9jcmVhdGVfb25fbm9kZSsweDcwLzB4NzANCj4gWyAg
IDYyLjQ2NzMzMV0gIHJldF9mcm9tX2ZvcmsrMHgyNS8weDMwDQo+IFsgICA2Mi40NzQzODZdIFJl
c3RhcnRpbmcga2VybmVsIHRocmVhZHMgLi4uIGRvbmUuDQo+ID09PQ0KDQpIZWxsbyBPbGVrc2Fu
ZHIsDQoNClRoYW5rIHlvdSBmb3IgaGF2aW5nIHRlc3RlZCB2MTAgb2YgdGhpcyBwYXRjaCBzZXJp
ZXMuIFRoaXMgcGF0Y2ggc2VyaWVzIG9ubHkNCmNoYW5nZXMgdGhlIGJlaGF2aW9yIG9mIHN1c3Bl
bmRpbmcgZGV2aWNlcyBidXQgbm90IHRoZSBiZWhhdmlvciBvZiBmcmVlemluZw0KdGhlIFhGUyBr
ZXJuZWwgdGhyZWFkcy4gRHVyaW5nIHN1c3BlbmQgdGFzayBmcmVlemluZyBvY2N1cnMgKmJlZm9y
ZSoNCnN1c3BlbmRpbmcgZGV2aWNlcy4gSW4gb3RoZXIgd29yZHMsIHRoZSBsb2NrdXAgb2NjdXJy
ZWQgYmVmb3JlIHRoZSBjb2RlIHdhcw0KcmVhY2hlZCB0aGF0IGlzIG1vZGlmaWVkIGJ5IHRoaXMg
cGF0Y2ggc2VyaWVzLiBTbyBJIHRoaW5rIHlvdSByYW4gaW50byBhIGJ1Zw0KaW4gdGhlIFhGUyBj
b2RlIGFuZCBub3QgaW50byBhIGJ1ZyBpbiB0aGlzIHBhdGNoIHNlcmllcy4NCg0KQmFydC4=

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
@ 2017-10-21 17:59           ` Bart Van Assche
  0 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-21 17:59 UTC (permalink / raw)
  To: oleksandr@natalenko.name
  Cc: linux-scsi@vger.kernel.org, hch@lst.de,
	linux-block@vger.kernel.org, martin.petersen@oracle.com,
	axboe@kernel.dk, ming.lei@redhat.com

On Sat, 2017-10-21 at 19:21 +0200, Oleksandr Natalenko wrote:
> I've cherry-picked this series for current upstream/master branch, and got
> this while performing another suspend try:
> 
> ===
> [   62.415890] Freezing of tasks failed after 20.007 seconds (1 tasks refusing 
> to freeze, wq_busy=0):
> [   62.421150] xfsaild/dm-7    D    0   289      2 0x80000000
> [   62.425800] Call Trace:
> [   62.428902]  __schedule+0x239/0x870
> [   62.431834]  schedule+0x33/0x90
> [   62.434156]  _xfs_log_force+0x143/0x280 [xfs]
> [   62.438767]  ? schedule_timeout+0x188/0x390
> [   62.443592]  ? wake_up_q+0x80/0x80
> [   62.446545]  ? xfsaild+0x18d/0x780 [xfs]
> [   62.449702]  xfs_log_force+0x2c/0x90 [xfs]
> [   62.453217]  xfsaild+0x18d/0x780 [xfs]
> [   62.456717]  kthread+0x124/0x140
> [   62.459237]  ? kthread+0x124/0x140
> [   62.461818]  ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
> [   62.465146]  ? kthread_create_on_node+0x70/0x70
> [   62.467331]  ret_from_fork+0x25/0x30
> [   62.474386] Restarting kernel threads ... done.
> ===

Hello Oleksandr,

Thank you for having tested v10 of this patch series. This patch series only
changes the behavior of suspending devices but not the behavior of freezing
the XFS kernel threads. During suspend task freezing occurs *before*
suspending devices. In other words, the lockup occurred before the code was
reached that is modified by this patch series. So I think you ran into a bug
in the XFS code and not into a bug in this patch series.

Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
  2017-10-21 17:59           ` Bart Van Assche
@ 2017-10-21 18:31             ` Bart Van Assche
  -1 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-21 18:31 UTC (permalink / raw)
  To: oleksandr@natalenko.name
  Cc: linux-scsi@vger.kernel.org, hch@lst.de,
	linux-block@vger.kernel.org, martin.petersen@oracle.com,
	axboe@kernel.dk, ming.lei@redhat.com

T24gU2F0LCAyMDE3LTEwLTIxIGF0IDE3OjU5ICswMDAwLCBCYXJ0IFZhbiBBc3NjaGUgd3JvdGU6
DQo+IE9uIFNhdCwgMjAxNy0xMC0yMSBhdCAxOToyMSArMDIwMCwgT2xla3NhbmRyIE5hdGFsZW5r
byB3cm90ZToNCj4gPiBJJ3ZlIGNoZXJyeS1waWNrZWQgdGhpcyBzZXJpZXMgZm9yIGN1cnJlbnQg
dXBzdHJlYW0vbWFzdGVyIGJyYW5jaCwgYW5kIGdvdA0KPiA+IHRoaXMgd2hpbGUgcGVyZm9ybWlu
ZyBhbm90aGVyIHN1c3BlbmQgdHJ5Og0KPiA+IA0KPiA+ID09PQ0KPiA+IFsgICA2Mi40MTU4OTBd
IEZyZWV6aW5nIG9mIHRhc2tzIGZhaWxlZCBhZnRlciAyMC4wMDcgc2Vjb25kcyAoMSB0YXNrcyBy
ZWZ1c2luZyANCj4gPiB0byBmcmVlemUsIHdxX2J1c3k9MCk6DQo+ID4gWyAgIDYyLjQyMTE1MF0g
eGZzYWlsZC9kbS03ICAgIEQgICAgMCAgIDI4OSAgICAgIDIgMHg4MDAwMDAwMA0KPiA+IFsgICA2
Mi40MjU4MDBdIENhbGwgVHJhY2U6DQo+ID4gWyAgIDYyLjQyODkwMl0gIF9fc2NoZWR1bGUrMHgy
MzkvMHg4NzANCj4gPiBbICAgNjIuNDMxODM0XSAgc2NoZWR1bGUrMHgzMy8weDkwDQo+ID4gWyAg
IDYyLjQzNDE1Nl0gIF94ZnNfbG9nX2ZvcmNlKzB4MTQzLzB4MjgwIFt4ZnNdDQo+ID4gWyAgIDYy
LjQzODc2N10gID8gc2NoZWR1bGVfdGltZW91dCsweDE4OC8weDM5MA0KPiA+IFsgICA2Mi40NDM1
OTJdICA/IHdha2VfdXBfcSsweDgwLzB4ODANCj4gPiBbICAgNjIuNDQ2NTQ1XSAgPyB4ZnNhaWxk
KzB4MThkLzB4NzgwIFt4ZnNdDQo+ID4gWyAgIDYyLjQ0OTcwMl0gIHhmc19sb2dfZm9yY2UrMHgy
Yy8weDkwIFt4ZnNdDQo+ID4gWyAgIDYyLjQ1MzIxN10gIHhmc2FpbGQrMHgxOGQvMHg3ODAgW3hm
c10NCj4gPiBbICAgNjIuNDU2NzE3XSAga3RocmVhZCsweDEyNC8weDE0MA0KPiA+IFsgICA2Mi40
NTkyMzddICA/IGt0aHJlYWQrMHgxMjQvMHgxNDANCj4gPiBbICAgNjIuNDYxODE4XSAgPyB4ZnNf
dHJhbnNfYWlsX2N1cnNvcl9maXJzdCsweDkwLzB4OTAgW3hmc10NCj4gPiBbICAgNjIuNDY1MTQ2
XSAgPyBrdGhyZWFkX2NyZWF0ZV9vbl9ub2RlKzB4NzAvMHg3MA0KPiA+IFsgICA2Mi40NjczMzFd
ICByZXRfZnJvbV9mb3JrKzB4MjUvMHgzMA0KPiA+IFsgICA2Mi40NzQzODZdIFJlc3RhcnRpbmcg
a2VybmVsIHRocmVhZHMgLi4uIGRvbmUuDQo+ID4gPT09DQo+IA0KPiBUaGFuayB5b3UgZm9yIGhh
dmluZyB0ZXN0ZWQgdjEwIG9mIHRoaXMgcGF0Y2ggc2VyaWVzLiBUaGlzIHBhdGNoIHNlcmllcyBv
bmx5DQo+IGNoYW5nZXMgdGhlIGJlaGF2aW9yIG9mIHN1c3BlbmRpbmcgZGV2aWNlcyBidXQgbm90
IHRoZSBiZWhhdmlvciBvZiBmcmVlemluZw0KPiB0aGUgWEZTIGtlcm5lbCB0aHJlYWRzLiBEdXJp
bmcgc3VzcGVuZCB0YXNrIGZyZWV6aW5nIG9jY3VycyAqYmVmb3JlKg0KPiBzdXNwZW5kaW5nIGRl
dmljZXMuIEluIG90aGVyIHdvcmRzLCB0aGUgbG9ja3VwIG9jY3VycmVkIGJlZm9yZSB0aGUgY29k
ZSB3YXMNCj4gcmVhY2hlZCB0aGF0IGlzIG1vZGlmaWVkIGJ5IHRoaXMgcGF0Y2ggc2VyaWVzLiBT
byBJIHRoaW5rIHlvdSByYW4gaW50byBhIGJ1Zw0KPiBpbiB0aGUgWEZTIGNvZGUgYW5kIG5vdCBp
bnRvIGEgYnVnIGluIHRoaXMgcGF0Y2ggc2VyaWVzLg0KDQpPbiBzZWNvbmQgdGhvdWdodCwgSSB3
YW50IHRvIHRha2UgYmFjayB3aGF0IEkgd3JvdGUgYWJvdXQgWEZTLiBTaW5jZSB4ZnNhaWxkKCkN
CmNhbiBzdWJtaXQgSS9PLCB3aGF0IG1heSBoYXZlIGhhcHBlbmVkIGlzIHRoYXQgdGhlIG1kIGtl
cm5lbCB0aHJlYWQgZ290IGZyb3plbg0KYmVmb3JlIHRoZSBYRlMga2VybmVsIHRocmVhZC4gSSB3
aWxsIHNlZSB3aGF0IEkgY2FuIGRvIHRvIGF2b2lkIGZyZWV6ZSBmYWlsdXJlcw0KbGlrZSB0aGUg
b25lIHlvdSByZXBvcnRlZC4NCg0KQmFydC4=

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
@ 2017-10-21 18:31             ` Bart Van Assche
  0 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2017-10-21 18:31 UTC (permalink / raw)
  To: oleksandr@natalenko.name
  Cc: linux-scsi@vger.kernel.org, hch@lst.de,
	linux-block@vger.kernel.org, martin.petersen@oracle.com,
	axboe@kernel.dk, ming.lei@redhat.com

On Sat, 2017-10-21 at 17:59 +0000, Bart Van Assche wrote:
> On Sat, 2017-10-21 at 19:21 +0200, Oleksandr Natalenko wrote:
> > I've cherry-picked this series for current upstream/master branch, and got
> > this while performing another suspend try:
> > 
> > ===
> > [   62.415890] Freezing of tasks failed after 20.007 seconds (1 tasks refusing 
> > to freeze, wq_busy=0):
> > [   62.421150] xfsaild/dm-7    D    0   289      2 0x80000000
> > [   62.425800] Call Trace:
> > [   62.428902]  __schedule+0x239/0x870
> > [   62.431834]  schedule+0x33/0x90
> > [   62.434156]  _xfs_log_force+0x143/0x280 [xfs]
> > [   62.438767]  ? schedule_timeout+0x188/0x390
> > [   62.443592]  ? wake_up_q+0x80/0x80
> > [   62.446545]  ? xfsaild+0x18d/0x780 [xfs]
> > [   62.449702]  xfs_log_force+0x2c/0x90 [xfs]
> > [   62.453217]  xfsaild+0x18d/0x780 [xfs]
> > [   62.456717]  kthread+0x124/0x140
> > [   62.459237]  ? kthread+0x124/0x140
> > [   62.461818]  ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
> > [   62.465146]  ? kthread_create_on_node+0x70/0x70
> > [   62.467331]  ret_from_fork+0x25/0x30
> > [   62.474386] Restarting kernel threads ... done.
> > ===
> 
> Thank you for having tested v10 of this patch series. This patch series only
> changes the behavior of suspending devices but not the behavior of freezing
> the XFS kernel threads. During suspend task freezing occurs *before*
> suspending devices. In other words, the lockup occurred before the code was
> reached that is modified by this patch series. So I think you ran into a bug
> in the XFS code and not into a bug in this patch series.

On second thought, I want to take back what I wrote about XFS. Since xfsaild()
can submit I/O, what may have happened is that the md kernel thread got frozen
before the XFS kernel thread. I will see what I can do to avoid freeze failures
like the one you reported.

Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-10-21 18:31 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-17 23:26 [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 01/10] md: Rename md_notifier into md_reboot_notifier Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 02/10] md: Introduce md_stop_all_writes() Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 03/10] md: Neither resync nor reshape while the system is frozen Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 04/10] block: Make q_usage_counter also track legacy requests Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 05/10] block: Introduce blk_get_request_flags() Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 06/10] block: Introduce BLK_MQ_REQ_PREEMPT Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 07/10] ide, scsi: Tell the block layer at request allocation time about preempt requests Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 08/10] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 09/10] block, scsi: Make SCSI quiesce and resume work reliably Bart Van Assche
2017-10-17 23:26 ` [PATCH v10 10/10] block, nvme: Introduce blk_mq_req_flags_t Bart Van Assche
2017-10-17 23:28 ` [PATCH v10 00/10] block, scsi, md: Improve suspend and resume Jens Axboe
2017-10-17 23:40   ` Bart Van Assche
2017-10-17 23:40     ` Bart Van Assche
2017-10-18  1:47     ` Jens Axboe
2017-10-18  5:02   ` Oleksandr Natalenko
2017-10-18  5:02     ` Oleksandr Natalenko
2017-10-18 15:27     ` Bart Van Assche
2017-10-18 15:27       ` Bart Van Assche
2017-10-21 17:21       ` Oleksandr Natalenko
2017-10-21 17:21         ` Oleksandr Natalenko
2017-10-21 17:59         ` Bart Van Assche
2017-10-21 17:59           ` Bart Van Assche
2017-10-21 18:31           ` Bart Van Assche
2017-10-21 18:31             ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.