public inbox for linux-block@vger.kernel.org
* [PATCH v5 0/9] blk-mq: Implement runtime power management
@ 2018-08-07 22:51 Bart Van Assche
  2018-08-07 22:51 ` [PATCH v5 1/9] block: Change the preempt-only flag into a counter Bart Van Assche
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Bart Van Assche @ 2018-08-07 22:51 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Bart Van Assche

Hello Jens,

This patch series not only implements runtime power management for blk-mq but
also fixes a starvation issue in the power management code for the legacy
block layer. Please consider this patch series for kernel v4.19.

Thanks,

Bart.

Changes compared to v4:
- Dropped the patches "Give RQF_PREEMPT back its original meaning" and
  "Serialize queue freezing and blk_pre_runtime_suspend()".
- Replaced "percpu_ref_read()" with "percpu_ref_is_in_use()".
- Inserted pm_request_resume() calls in the block layer request allocation
  code such that the context that submits a request no longer has to call
  pm_runtime_get().

Changes compared to v3:
- Avoid adverse interactions between system-wide suspend/resume and runtime
  power management by changing the PREEMPT_ONLY flag into a counter.
- Give RQF_PREEMPT back its original meaning, namely that it is only set for
  ide_preempt requests.
- Remove the flag BLK_MQ_REQ_PREEMPT.
- Removed the pm_request_resume() call.

Changes compared to v2:
- Fixed the build for CONFIG_BLOCK=n.
- Added a patch that introduces percpu_ref_read() in the percpu-counter code.
- Added a patch that makes it easier to detect missing pm_runtime_get*() calls.
- Addressed Jianchao's feedback including the comment about runtime overhead
  of switching a per-cpu counter to atomic mode.

Changes compared to v1:
- Moved the runtime power management code into a separate file.
- Addressed Ming's feedback.

Bart Van Assche (9):
  block: Change the preempt-only flag into a counter
  block: Move power management code into a new source file
  block, scsi: Introduce blk_pm_runtime_exit()
  percpu-refcount: Introduce percpu_ref_is_in_use()
  block: Change the runtime power management approach (1/2)
  block: Change the runtime power management approach (2/2)
  block: Remove blk_pm_requeue_request()
  blk-mq: Insert a blk_pm_put_request() call
  blk-mq: Enable support for runtime power management

 block/Kconfig                   |   5 +
 block/Makefile                  |   1 +
 block/blk-core.c                | 277 ++++----------------------------
 block/blk-mq-debugfs.c          |  11 +-
 block/blk-mq.c                  |   8 +
 block/blk-pm.c                  | 230 ++++++++++++++++++++++++++
 block/blk-pm.h                  |  33 ++++
 block/elevator.c                |  26 +--
 drivers/scsi/scsi_lib.c         |  17 +-
 drivers/scsi/scsi_pm.c          |   1 +
 drivers/scsi/sd.c               |  10 +-
 drivers/scsi/sr.c               |   4 +-
 include/linux/blk-mq.h          |   2 +
 include/linux/blk-pm.h          |  26 +++
 include/linux/blkdev.h          |  38 ++---
 include/linux/percpu-refcount.h |   2 +
 lib/percpu-refcount.c           |  40 +++++
 17 files changed, 420 insertions(+), 311 deletions(-)
 create mode 100644 block/blk-pm.c
 create mode 100644 block/blk-pm.h
 create mode 100644 include/linux/blk-pm.h

-- 
2.18.0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v5 1/9] block: Change the preempt-only flag into a counter
  2018-08-07 22:51 [PATCH v5 0/9] blk-mq: Implement runtime power management Bart Van Assche
@ 2018-08-07 22:51 ` Bart Van Assche
  2018-08-08  8:21   ` Ming Lei
  2018-08-07 22:51 ` [PATCH v5 2/9] block: Move power management code into a new source file Bart Van Assche
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2018-08-07 22:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Jianchao Wang, Johannes Thumshirn, Alan Stern

The RQF_PREEMPT flag is used for three purposes:
- In the SCSI core, for making sure that power management requests
  are executed even if a device is in the "quiesced" state.
- For domain validation by SCSI drivers that use the parallel SCSI bus.
- In the IDE driver, for IDE preempt requests.
Rename "preempt-only" into "pm-only" because the primary purpose of
this mode is power management. Since the power management core may,
but does not have to, resume a runtime-suspended device before
performing system-wide suspend, and since a later patch will keep
"pm-only" mode set as long as a block device is runtime suspended,
make it possible to set "pm-only" mode from more than one context.
Since scsi_device_quiesce() is no longer idempotent after this
change, make that function return early if it is called for an
already quiesced queue.
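The counter semantics introduced here can be sketched as a small
user-space model (illustrative only: C11 atomics stand in for the
kernel's atomic_t, the mq_freeze_wq wait queue is elided, and the
"_model" function names are not kernel APIs):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Model of the pm_only counter that replaces QUEUE_FLAG_PREEMPT_ONLY. */
static atomic_int pm_only;

static void blk_set_pm_only_model(void)
{
	atomic_fetch_add(&pm_only, 1);
}

/* Returns true when the last clear would wake up mq_freeze_wq waiters. */
static bool blk_clear_pm_only_model(void)
{
	int v = atomic_fetch_sub(&pm_only, 1) - 1;

	assert(v >= 0);		/* models WARN_ON_ONCE(pm_only < 0) */
	return v == 0;		/* models wake_up_all(&q->mq_freeze_wq) */
}

/* Admission check from blk_queue_enter(): PM requests always pass,
 * all other requests are held off while pm_only > 0. */
static bool may_enter_queue_model(bool pm_request)
{
	return pm_request || atomic_load(&pm_only) == 0;
}
```

Because the flag is now a counter, two contexts (e.g. runtime PM and
scsi_device_quiesce()) can each set pm-only mode independently and
normal request processing only resumes after both have cleared it.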

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
---
 block/blk-core.c        | 35 ++++++++++++++++++-----------------
 block/blk-mq-debugfs.c  | 10 +++++++++-
 drivers/scsi/scsi_lib.c | 11 +++++++----
 include/linux/blkdev.h  | 14 +++++++++-----
 4 files changed, 43 insertions(+), 27 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 5e0f4bee3588..375de3658d19 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -421,24 +421,25 @@ void blk_sync_queue(struct request_queue *q)
 EXPORT_SYMBOL(blk_sync_queue);
 
 /**
- * blk_set_preempt_only - set QUEUE_FLAG_PREEMPT_ONLY
+ * blk_set_pm_only - increment pm_only counter
  * @q: request queue pointer
- *
- * Returns the previous value of the PREEMPT_ONLY flag - 0 if the flag was not
- * set and 1 if the flag was already set.
  */
-int blk_set_preempt_only(struct request_queue *q)
+void blk_set_pm_only(struct request_queue *q)
 {
-	return blk_queue_flag_test_and_set(QUEUE_FLAG_PREEMPT_ONLY, q);
+	atomic_inc(&q->pm_only);
 }
-EXPORT_SYMBOL_GPL(blk_set_preempt_only);
+EXPORT_SYMBOL_GPL(blk_set_pm_only);
 
-void blk_clear_preempt_only(struct request_queue *q)
+void blk_clear_pm_only(struct request_queue *q)
 {
-	blk_queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q);
-	wake_up_all(&q->mq_freeze_wq);
+	int pm_only;
+
+	pm_only = atomic_dec_return(&q->pm_only);
+	WARN_ON_ONCE(pm_only < 0);
+	if (pm_only == 0)
+		wake_up_all(&q->mq_freeze_wq);
 }
-EXPORT_SYMBOL_GPL(blk_clear_preempt_only);
+EXPORT_SYMBOL_GPL(blk_clear_pm_only);
 
 /**
  * __blk_run_queue_uncond - run a queue whether or not it has been stopped
@@ -911,7 +912,7 @@ EXPORT_SYMBOL(blk_alloc_queue);
  */
 int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 {
-	const bool preempt = flags & BLK_MQ_REQ_PREEMPT;
+	const bool pm = flags & BLK_MQ_REQ_PREEMPT;
 
 	while (true) {
 		bool success = false;
@@ -919,11 +920,11 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		rcu_read_lock();
 		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
 			/*
-			 * The code that sets the PREEMPT_ONLY flag is
-			 * responsible for ensuring that that flag is globally
-			 * visible before the queue is unfrozen.
+			 * The code that increments the pm_only counter is
+			 * responsible for ensuring that that counter is
+			 * globally visible before the queue is unfrozen.
 			 */
-			if (preempt || !blk_queue_preempt_only(q)) {
+			if (pm || !blk_queue_pm_only(q)) {
 				success = true;
 			} else {
 				percpu_ref_put(&q->q_usage_counter);
@@ -948,7 +949,7 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 
 		wait_event(q->mq_freeze_wq,
 			   (atomic_read(&q->mq_freeze_depth) == 0 &&
-			    (preempt || !blk_queue_preempt_only(q))) ||
+			    (pm || !blk_queue_pm_only(q))) ||
 			   blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index cb1e6cf7ac48..a5ea86835fcb 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -102,6 +102,14 @@ static int blk_flags_show(struct seq_file *m, const unsigned long flags,
 	return 0;
 }
 
+static int queue_pm_only_show(void *data, struct seq_file *m)
+{
+	struct request_queue *q = data;
+
+	seq_printf(m, "%d\n", atomic_read(&q->pm_only));
+	return 0;
+}
+
 #define QUEUE_FLAG_NAME(name) [QUEUE_FLAG_##name] = #name
 static const char *const blk_queue_flag_name[] = {
 	QUEUE_FLAG_NAME(QUEUED),
@@ -132,7 +140,6 @@ static const char *const blk_queue_flag_name[] = {
 	QUEUE_FLAG_NAME(REGISTERED),
 	QUEUE_FLAG_NAME(SCSI_PASSTHROUGH),
 	QUEUE_FLAG_NAME(QUIESCED),
-	QUEUE_FLAG_NAME(PREEMPT_ONLY),
 };
 #undef QUEUE_FLAG_NAME
 
@@ -209,6 +216,7 @@ static ssize_t queue_write_hint_store(void *data, const char __user *buf,
 static const struct blk_mq_debugfs_attr blk_mq_debugfs_queue_attrs[] = {
 	{ "poll_stat", 0400, queue_poll_stat_show },
 	{ "requeue_list", 0400, .seq_ops = &queue_requeue_list_seq_ops },
+	{ "pm_only", 0600, queue_pm_only_show, NULL },
 	{ "state", 0600, queue_state_show, queue_state_write },
 	{ "write_hints", 0600, queue_write_hint_show, queue_write_hint_store },
 	{ "zone_wlock", 0400, queue_zone_wlock_show, NULL },
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9cb9a166fa0c..0e48136329c6 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2996,11 +2996,14 @@ scsi_device_quiesce(struct scsi_device *sdev)
 	 */
 	WARN_ON_ONCE(sdev->quiesced_by && sdev->quiesced_by != current);
 
-	blk_set_preempt_only(q);
+	if (sdev->quiesced_by == current)
+		return 0;
+
+	blk_set_pm_only(q);
 
 	blk_mq_freeze_queue(q);
 	/*
-	 * Ensure that the effect of blk_set_preempt_only() will be visible
+	 * Ensure that the effect of blk_set_pm_only() will be visible
 	 * for percpu_ref_tryget() callers that occur after the queue
 	 * unfreeze even if the queue was already frozen before this function
 	 * was called. See also https://lwn.net/Articles/573497/.
@@ -3013,7 +3016,7 @@ scsi_device_quiesce(struct scsi_device *sdev)
 	if (err == 0)
 		sdev->quiesced_by = current;
 	else
-		blk_clear_preempt_only(q);
+		blk_clear_pm_only(q);
 	mutex_unlock(&sdev->state_mutex);
 
 	return err;
@@ -3038,7 +3041,7 @@ void scsi_device_resume(struct scsi_device *sdev)
 	mutex_lock(&sdev->state_mutex);
 	WARN_ON_ONCE(!sdev->quiesced_by);
 	sdev->quiesced_by = NULL;
-	blk_clear_preempt_only(sdev->request_queue);
+	blk_clear_pm_only(sdev->request_queue);
 	if (sdev->sdev_state == SDEV_QUIESCE)
 		scsi_device_set_state(sdev, SDEV_RUNNING);
 	mutex_unlock(&sdev->state_mutex);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 050d599f5ea9..1a08edbddf9f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -506,6 +506,12 @@ struct request_queue {
 	 * various queue flags, see QUEUE_* below
 	 */
 	unsigned long		queue_flags;
+	/*
+	 * Number of contexts that have called blk_set_pm_only(). If this
+	 * counter is above zero then only RQF_PM and RQF_PREEMPT requests are
+	 * processed.
+	 */
+	atomic_t		pm_only;
 
 	/*
 	 * ida allocated id for this queue.  Used to index queues from
@@ -700,7 +706,6 @@ struct request_queue {
 #define QUEUE_FLAG_REGISTERED  26	/* queue has been registered to a disk */
 #define QUEUE_FLAG_SCSI_PASSTHROUGH 27	/* queue supports SCSI commands */
 #define QUEUE_FLAG_QUIESCED    28	/* queue has been quiesced */
-#define QUEUE_FLAG_PREEMPT_ONLY	29	/* only process REQ_PREEMPT requests */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP)	|	\
@@ -738,12 +743,11 @@ bool blk_queue_flag_test_and_clear(unsigned int flag, struct request_queue *q);
 	((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \
 			     REQ_FAILFAST_DRIVER))
 #define blk_queue_quiesced(q)	test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags)
-#define blk_queue_preempt_only(q)				\
-	test_bit(QUEUE_FLAG_PREEMPT_ONLY, &(q)->queue_flags)
+#define blk_queue_pm_only(q)	atomic_read(&(q)->pm_only)
 #define blk_queue_fua(q)	test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags)
 
-extern int blk_set_preempt_only(struct request_queue *q);
-extern void blk_clear_preempt_only(struct request_queue *q);
+extern void blk_set_pm_only(struct request_queue *q);
+extern void blk_clear_pm_only(struct request_queue *q);
 
 static inline int queue_in_flight(struct request_queue *q)
 {
-- 
2.18.0


* [PATCH v5 2/9] block: Move power management code into a new source file
  2018-08-07 22:51 [PATCH v5 0/9] blk-mq: Implement runtime power management Bart Van Assche
  2018-08-07 22:51 ` [PATCH v5 1/9] block: Change the preempt-only flag into a counter Bart Van Assche
@ 2018-08-07 22:51 ` Bart Van Assche
  2018-08-07 22:51 ` [PATCH v5 3/9] block, scsi: Introduce blk_pm_runtime_exit() Bart Van Assche
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2018-08-07 22:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Jianchao Wang, Hannes Reinecke, Johannes Thumshirn, Alan Stern

Move the code for runtime power management from blk-core.c into the
new source file blk-pm.c, and move the corresponding declarations
from <linux/blkdev.h> into <linux/blk-pm.h>. For CONFIG_PM=n, leave
out the declarations of the functions that are not used in that
mode. This patch not only reduces the number of #ifdefs in the block
layer core code but also shrinks the header file <linux/blkdev.h>,
and hence should help to reduce kernel build times when CONFIG_PM is
not defined.
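The CONFIG_PM=n handling follows the usual kernel stub idiom: call
sites compile unchanged against empty (or error-returning) static
inline stubs, so no #ifdefs are needed where the functions are
called. A minimal user-space illustration of the pattern (the
MODEL_* names are hypothetical, not part of the patch):

```c
#include <assert.h>

#define MODEL_CONFIG_PM 1	/* flip to 0 to model a CONFIG_PM=n build */

#if MODEL_CONFIG_PM
/* Real implementation: suspend is refused while requests are pending. */
static inline int model_pre_runtime_suspend(int nr_pending)
{
	return nr_pending ? -16 /* -EBUSY */ : 0;
}
#else
/* Stub: keeps call sites building when the feature is compiled out,
 * mirroring the -ENOSYS stub in the <linux/blkdev.h> code being moved. */
static inline int model_pre_runtime_suspend(int nr_pending)
{
	(void)nr_pending;
	return -38;		/* -ENOSYS */
}
#endif
```

Either way the caller is written once, without any #ifdef around the
call.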

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
---
 block/Kconfig          |   5 ++
 block/Makefile         |   1 +
 block/blk-core.c       | 196 +----------------------------------------
 block/blk-pm.c         | 188 +++++++++++++++++++++++++++++++++++++++
 block/blk-pm.h         |  43 +++++++++
 block/elevator.c       |  22 +----
 drivers/scsi/scsi_pm.c |   1 +
 drivers/scsi/sd.c      |   1 +
 drivers/scsi/sr.c      |   1 +
 include/linux/blk-pm.h |  24 +++++
 include/linux/blkdev.h |  23 -----
 11 files changed, 266 insertions(+), 239 deletions(-)
 create mode 100644 block/blk-pm.c
 create mode 100644 block/blk-pm.h
 create mode 100644 include/linux/blk-pm.h

diff --git a/block/Kconfig b/block/Kconfig
index 1f2469a0123c..e213d90a5e64 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -228,4 +228,9 @@ config BLK_MQ_RDMA
 	depends on BLOCK && INFINIBAND
 	default y
 
+config BLK_PM
+	bool
+	depends on BLOCK && PM
+	default y
+
 source block/Kconfig.iosched
diff --git a/block/Makefile b/block/Makefile
index 572b33f32c07..27eac600474f 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -37,3 +37,4 @@ obj-$(CONFIG_BLK_WBT)		+= blk-wbt.o
 obj-$(CONFIG_BLK_DEBUG_FS)	+= blk-mq-debugfs.o
 obj-$(CONFIG_BLK_DEBUG_FS_ZONED)+= blk-mq-debugfs-zoned.o
 obj-$(CONFIG_BLK_SED_OPAL)	+= sed-opal.o
+obj-$(CONFIG_BLK_PM)		+= blk-pm.o
diff --git a/block/blk-core.c b/block/blk-core.c
index 375de3658d19..64697a97147a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -42,6 +42,7 @@
 #include "blk.h"
 #include "blk-mq.h"
 #include "blk-mq-sched.h"
+#include "blk-pm.h"
 #include "blk-rq-qos.h"
 
 #ifdef CONFIG_DEBUG_FS
@@ -1722,16 +1723,6 @@ void part_round_stats(struct request_queue *q, int cpu, struct hd_struct *part)
 }
 EXPORT_SYMBOL_GPL(part_round_stats);
 
-#ifdef CONFIG_PM
-static void blk_pm_put_request(struct request *rq)
-{
-	if (rq->q->dev && !(rq->rq_flags & RQF_PM) && !--rq->q->nr_pending)
-		pm_runtime_mark_last_busy(rq->q->dev);
-}
-#else
-static inline void blk_pm_put_request(struct request *rq) {}
-#endif
-
 void __blk_put_request(struct request_queue *q, struct request *req)
 {
 	req_flags_t rq_flags = req->rq_flags;
@@ -3748,191 +3739,6 @@ void blk_finish_plug(struct blk_plug *plug)
 }
 EXPORT_SYMBOL(blk_finish_plug);
 
-#ifdef CONFIG_PM
-/**
- * blk_pm_runtime_init - Block layer runtime PM initialization routine
- * @q: the queue of the device
- * @dev: the device the queue belongs to
- *
- * Description:
- *    Initialize runtime-PM-related fields for @q and start auto suspend for
- *    @dev. Drivers that want to take advantage of request-based runtime PM
- *    should call this function after @dev has been initialized, and its
- *    request queue @q has been allocated, and runtime PM for it can not happen
- *    yet(either due to disabled/forbidden or its usage_count > 0). In most
- *    cases, driver should call this function before any I/O has taken place.
- *
- *    This function takes care of setting up using auto suspend for the device,
- *    the autosuspend delay is set to -1 to make runtime suspend impossible
- *    until an updated value is either set by user or by driver. Drivers do
- *    not need to touch other autosuspend settings.
- *
- *    The block layer runtime PM is request based, so only works for drivers
- *    that use request as their IO unit instead of those directly use bio's.
- */
-void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
-{
-	/* Don't enable runtime PM for blk-mq until it is ready */
-	if (q->mq_ops) {
-		pm_runtime_disable(dev);
-		return;
-	}
-
-	q->dev = dev;
-	q->rpm_status = RPM_ACTIVE;
-	pm_runtime_set_autosuspend_delay(q->dev, -1);
-	pm_runtime_use_autosuspend(q->dev);
-}
-EXPORT_SYMBOL(blk_pm_runtime_init);
-
-/**
- * blk_pre_runtime_suspend - Pre runtime suspend check
- * @q: the queue of the device
- *
- * Description:
- *    This function will check if runtime suspend is allowed for the device
- *    by examining if there are any requests pending in the queue. If there
- *    are requests pending, the device can not be runtime suspended; otherwise,
- *    the queue's status will be updated to SUSPENDING and the driver can
- *    proceed to suspend the device.
- *
- *    For the not allowed case, we mark last busy for the device so that
- *    runtime PM core will try to autosuspend it some time later.
- *
- *    This function should be called near the start of the device's
- *    runtime_suspend callback.
- *
- * Return:
- *    0		- OK to runtime suspend the device
- *    -EBUSY	- Device should not be runtime suspended
- */
-int blk_pre_runtime_suspend(struct request_queue *q)
-{
-	int ret = 0;
-
-	if (!q->dev)
-		return ret;
-
-	spin_lock_irq(q->queue_lock);
-	if (q->nr_pending) {
-		ret = -EBUSY;
-		pm_runtime_mark_last_busy(q->dev);
-	} else {
-		q->rpm_status = RPM_SUSPENDING;
-	}
-	spin_unlock_irq(q->queue_lock);
-	return ret;
-}
-EXPORT_SYMBOL(blk_pre_runtime_suspend);
-
-/**
- * blk_post_runtime_suspend - Post runtime suspend processing
- * @q: the queue of the device
- * @err: return value of the device's runtime_suspend function
- *
- * Description:
- *    Update the queue's runtime status according to the return value of the
- *    device's runtime suspend function and mark last busy for the device so
- *    that PM core will try to auto suspend the device at a later time.
- *
- *    This function should be called near the end of the device's
- *    runtime_suspend callback.
- */
-void blk_post_runtime_suspend(struct request_queue *q, int err)
-{
-	if (!q->dev)
-		return;
-
-	spin_lock_irq(q->queue_lock);
-	if (!err) {
-		q->rpm_status = RPM_SUSPENDED;
-	} else {
-		q->rpm_status = RPM_ACTIVE;
-		pm_runtime_mark_last_busy(q->dev);
-	}
-	spin_unlock_irq(q->queue_lock);
-}
-EXPORT_SYMBOL(blk_post_runtime_suspend);
-
-/**
- * blk_pre_runtime_resume - Pre runtime resume processing
- * @q: the queue of the device
- *
- * Description:
- *    Update the queue's runtime status to RESUMING in preparation for the
- *    runtime resume of the device.
- *
- *    This function should be called near the start of the device's
- *    runtime_resume callback.
- */
-void blk_pre_runtime_resume(struct request_queue *q)
-{
-	if (!q->dev)
-		return;
-
-	spin_lock_irq(q->queue_lock);
-	q->rpm_status = RPM_RESUMING;
-	spin_unlock_irq(q->queue_lock);
-}
-EXPORT_SYMBOL(blk_pre_runtime_resume);
-
-/**
- * blk_post_runtime_resume - Post runtime resume processing
- * @q: the queue of the device
- * @err: return value of the device's runtime_resume function
- *
- * Description:
- *    Update the queue's runtime status according to the return value of the
- *    device's runtime_resume function. If it is successfully resumed, process
- *    the requests that are queued into the device's queue when it is resuming
- *    and then mark last busy and initiate autosuspend for it.
- *
- *    This function should be called near the end of the device's
- *    runtime_resume callback.
- */
-void blk_post_runtime_resume(struct request_queue *q, int err)
-{
-	if (!q->dev)
-		return;
-
-	spin_lock_irq(q->queue_lock);
-	if (!err) {
-		q->rpm_status = RPM_ACTIVE;
-		__blk_run_queue(q);
-		pm_runtime_mark_last_busy(q->dev);
-		pm_request_autosuspend(q->dev);
-	} else {
-		q->rpm_status = RPM_SUSPENDED;
-	}
-	spin_unlock_irq(q->queue_lock);
-}
-EXPORT_SYMBOL(blk_post_runtime_resume);
-
-/**
- * blk_set_runtime_active - Force runtime status of the queue to be active
- * @q: the queue of the device
- *
- * If the device is left runtime suspended during system suspend the resume
- * hook typically resumes the device and corrects runtime status
- * accordingly. However, that does not affect the queue runtime PM status
- * which is still "suspended". This prevents processing requests from the
- * queue.
- *
- * This function can be used in driver's resume hook to correct queue
- * runtime PM status and re-enable peeking requests from the queue. It
- * should be called before first request is added to the queue.
- */
-void blk_set_runtime_active(struct request_queue *q)
-{
-	spin_lock_irq(q->queue_lock);
-	q->rpm_status = RPM_ACTIVE;
-	pm_runtime_mark_last_busy(q->dev);
-	pm_request_autosuspend(q->dev);
-	spin_unlock_irq(q->queue_lock);
-}
-EXPORT_SYMBOL(blk_set_runtime_active);
-#endif
-
 int __init blk_dev_init(void)
 {
 	BUILD_BUG_ON(REQ_OP_LAST >= (1 << REQ_OP_BITS));
diff --git a/block/blk-pm.c b/block/blk-pm.c
new file mode 100644
index 000000000000..9b636960d285
--- /dev/null
+++ b/block/blk-pm.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/blk-pm.h>
+#include <linux/blkdev.h>
+#include <linux/pm_runtime.h>
+
+/**
+ * blk_pm_runtime_init - Block layer runtime PM initialization routine
+ * @q: the queue of the device
+ * @dev: the device the queue belongs to
+ *
+ * Description:
+ *    Initialize runtime-PM-related fields for @q and start auto suspend for
+ *    @dev. Drivers that want to take advantage of request-based runtime PM
+ *    should call this function after @dev has been initialized, and its
+ *    request queue @q has been allocated, and runtime PM for it can not happen
+ *    yet(either due to disabled/forbidden or its usage_count > 0). In most
+ *    cases, driver should call this function before any I/O has taken place.
+ *
+ *    This function takes care of setting up using auto suspend for the device,
+ *    the autosuspend delay is set to -1 to make runtime suspend impossible
+ *    until an updated value is either set by user or by driver. Drivers do
+ *    not need to touch other autosuspend settings.
+ *
+ *    The block layer runtime PM is request based, so only works for drivers
+ *    that use request as their IO unit instead of those directly use bio's.
+ */
+void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
+{
+	/* Don't enable runtime PM for blk-mq until it is ready */
+	if (q->mq_ops) {
+		pm_runtime_disable(dev);
+		return;
+	}
+
+	q->dev = dev;
+	q->rpm_status = RPM_ACTIVE;
+	pm_runtime_set_autosuspend_delay(q->dev, -1);
+	pm_runtime_use_autosuspend(q->dev);
+}
+EXPORT_SYMBOL(blk_pm_runtime_init);
+
+/**
+ * blk_pre_runtime_suspend - Pre runtime suspend check
+ * @q: the queue of the device
+ *
+ * Description:
+ *    This function will check if runtime suspend is allowed for the device
+ *    by examining if there are any requests pending in the queue. If there
+ *    are requests pending, the device can not be runtime suspended; otherwise,
+ *    the queue's status will be updated to SUSPENDING and the driver can
+ *    proceed to suspend the device.
+ *
+ *    For the not allowed case, we mark last busy for the device so that
+ *    runtime PM core will try to autosuspend it some time later.
+ *
+ *    This function should be called near the start of the device's
+ *    runtime_suspend callback.
+ *
+ * Return:
+ *    0		- OK to runtime suspend the device
+ *    -EBUSY	- Device should not be runtime suspended
+ */
+int blk_pre_runtime_suspend(struct request_queue *q)
+{
+	int ret = 0;
+
+	if (!q->dev)
+		return ret;
+
+	spin_lock_irq(q->queue_lock);
+	if (q->nr_pending) {
+		ret = -EBUSY;
+		pm_runtime_mark_last_busy(q->dev);
+	} else {
+		q->rpm_status = RPM_SUSPENDING;
+	}
+	spin_unlock_irq(q->queue_lock);
+	return ret;
+}
+EXPORT_SYMBOL(blk_pre_runtime_suspend);
+
+/**
+ * blk_post_runtime_suspend - Post runtime suspend processing
+ * @q: the queue of the device
+ * @err: return value of the device's runtime_suspend function
+ *
+ * Description:
+ *    Update the queue's runtime status according to the return value of the
+ *    device's runtime suspend function and mark last busy for the device so
+ *    that PM core will try to auto suspend the device at a later time.
+ *
+ *    This function should be called near the end of the device's
+ *    runtime_suspend callback.
+ */
+void blk_post_runtime_suspend(struct request_queue *q, int err)
+{
+	if (!q->dev)
+		return;
+
+	spin_lock_irq(q->queue_lock);
+	if (!err) {
+		q->rpm_status = RPM_SUSPENDED;
+	} else {
+		q->rpm_status = RPM_ACTIVE;
+		pm_runtime_mark_last_busy(q->dev);
+	}
+	spin_unlock_irq(q->queue_lock);
+}
+EXPORT_SYMBOL(blk_post_runtime_suspend);
+
+/**
+ * blk_pre_runtime_resume - Pre runtime resume processing
+ * @q: the queue of the device
+ *
+ * Description:
+ *    Update the queue's runtime status to RESUMING in preparation for the
+ *    runtime resume of the device.
+ *
+ *    This function should be called near the start of the device's
+ *    runtime_resume callback.
+ */
+void blk_pre_runtime_resume(struct request_queue *q)
+{
+	if (!q->dev)
+		return;
+
+	spin_lock_irq(q->queue_lock);
+	q->rpm_status = RPM_RESUMING;
+	spin_unlock_irq(q->queue_lock);
+}
+EXPORT_SYMBOL(blk_pre_runtime_resume);
+
+/**
+ * blk_post_runtime_resume - Post runtime resume processing
+ * @q: the queue of the device
+ * @err: return value of the device's runtime_resume function
+ *
+ * Description:
+ *    Update the queue's runtime status according to the return value of the
+ *    device's runtime_resume function. If it is successfully resumed, process
+ *    the requests that are queued into the device's queue when it is resuming
+ *    and then mark last busy and initiate autosuspend for it.
+ *
+ *    This function should be called near the end of the device's
+ *    runtime_resume callback.
+ */
+void blk_post_runtime_resume(struct request_queue *q, int err)
+{
+	if (!q->dev)
+		return;
+
+	spin_lock_irq(q->queue_lock);
+	if (!err) {
+		q->rpm_status = RPM_ACTIVE;
+		__blk_run_queue(q);
+		pm_runtime_mark_last_busy(q->dev);
+		pm_request_autosuspend(q->dev);
+	} else {
+		q->rpm_status = RPM_SUSPENDED;
+	}
+	spin_unlock_irq(q->queue_lock);
+}
+EXPORT_SYMBOL(blk_post_runtime_resume);
+
+/**
+ * blk_set_runtime_active - Force runtime status of the queue to be active
+ * @q: the queue of the device
+ *
+ * If the device is left runtime suspended during system suspend the resume
+ * hook typically resumes the device and corrects runtime status
+ * accordingly. However, that does not affect the queue runtime PM status
+ * which is still "suspended". This prevents processing requests from the
+ * queue.
+ *
+ * This function can be used in driver's resume hook to correct queue
+ * runtime PM status and re-enable peeking requests from the queue. It
+ * should be called before first request is added to the queue.
+ */
+void blk_set_runtime_active(struct request_queue *q)
+{
+	spin_lock_irq(q->queue_lock);
+	q->rpm_status = RPM_ACTIVE;
+	pm_runtime_mark_last_busy(q->dev);
+	pm_request_autosuspend(q->dev);
+	spin_unlock_irq(q->queue_lock);
+}
+EXPORT_SYMBOL(blk_set_runtime_active);
diff --git a/block/blk-pm.h b/block/blk-pm.h
new file mode 100644
index 000000000000..1ffc8ef203ec
--- /dev/null
+++ b/block/blk-pm.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _BLOCK_BLK_PM_H_
+#define _BLOCK_BLK_PM_H_
+
+#include <linux/pm_runtime.h>
+
+#ifdef CONFIG_PM
+static inline void blk_pm_requeue_request(struct request *rq)
+{
+	if (rq->q->dev && !(rq->rq_flags & RQF_PM))
+		rq->q->nr_pending--;
+}
+
+static inline void blk_pm_add_request(struct request_queue *q,
+				      struct request *rq)
+{
+	if (q->dev && !(rq->rq_flags & RQF_PM) && q->nr_pending++ == 0 &&
+	    (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING))
+		pm_request_resume(q->dev);
+}
+
+static inline void blk_pm_put_request(struct request *rq)
+{
+	if (rq->q->dev && !(rq->rq_flags & RQF_PM) && !--rq->q->nr_pending)
+		pm_runtime_mark_last_busy(rq->q->dev);
+}
+#else
+static inline void blk_pm_requeue_request(struct request *rq)
+{
+}
+
+static inline void blk_pm_add_request(struct request_queue *q,
+				      struct request *rq)
+{
+}
+
+static inline void blk_pm_put_request(struct request *rq)
+{
+}
+#endif
+
+#endif /* _BLOCK_BLK_PM_H_ */
diff --git a/block/elevator.c b/block/elevator.c
index fa828b5bfd4b..4c15f0240c99 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -41,6 +41,7 @@
 
 #include "blk.h"
 #include "blk-mq-sched.h"
+#include "blk-pm.h"
 #include "blk-wbt.h"
 
 static DEFINE_SPINLOCK(elv_list_lock);
@@ -557,27 +558,6 @@ void elv_bio_merged(struct request_queue *q, struct request *rq,
 		e->type->ops.sq.elevator_bio_merged_fn(q, rq, bio);
 }
 
-#ifdef CONFIG_PM
-static void blk_pm_requeue_request(struct request *rq)
-{
-	if (rq->q->dev && !(rq->rq_flags & RQF_PM))
-		rq->q->nr_pending--;
-}
-
-static void blk_pm_add_request(struct request_queue *q, struct request *rq)
-{
-	if (q->dev && !(rq->rq_flags & RQF_PM) && q->nr_pending++ == 0 &&
-	    (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING))
-		pm_request_resume(q->dev);
-}
-#else
-static inline void blk_pm_requeue_request(struct request *rq) {}
-static inline void blk_pm_add_request(struct request_queue *q,
-				      struct request *rq)
-{
-}
-#endif
-
 void elv_requeue_request(struct request_queue *q, struct request *rq)
 {
 	/*
diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index b44c1bb687a2..a2b4179bfdf7 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -8,6 +8,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/export.h>
 #include <linux/async.h>
+#include <linux/blk-pm.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_device.h>
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index bbebdc3769b0..69ab459abb98 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -45,6 +45,7 @@
 #include <linux/init.h>
 #include <linux/blkdev.h>
 #include <linux/blkpg.h>
+#include <linux/blk-pm.h>
 #include <linux/delay.h>
 #include <linux/mutex.h>
 #include <linux/string_helpers.h>
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 3f3cb72e0c0c..de4413e66eca 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -43,6 +43,7 @@
 #include <linux/interrupt.h>
 #include <linux/init.h>
 #include <linux/blkdev.h>
+#include <linux/blk-pm.h>
 #include <linux/mutex.h>
 #include <linux/slab.h>
 #include <linux/pm_runtime.h>
diff --git a/include/linux/blk-pm.h b/include/linux/blk-pm.h
new file mode 100644
index 000000000000..b80c65aba249
--- /dev/null
+++ b/include/linux/blk-pm.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _BLK_PM_H_
+#define _BLK_PM_H_
+
+struct device;
+struct request_queue;
+
+/*
+ * block layer runtime pm functions
+ */
+#ifdef CONFIG_PM
+extern void blk_pm_runtime_init(struct request_queue *q, struct device *dev);
+extern int blk_pre_runtime_suspend(struct request_queue *q);
+extern void blk_post_runtime_suspend(struct request_queue *q, int err);
+extern void blk_pre_runtime_resume(struct request_queue *q);
+extern void blk_post_runtime_resume(struct request_queue *q, int err);
+extern void blk_set_runtime_active(struct request_queue *q);
+#else
+static inline void blk_pm_runtime_init(struct request_queue *q,
+				       struct device *dev) {}
+#endif
+
+#endif /* _BLK_PM_H_ */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1a08edbddf9f..cc5ef316eb39 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1286,29 +1286,6 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id,
 extern void blk_put_queue(struct request_queue *);
 extern void blk_set_queue_dying(struct request_queue *);
 
-/*
- * block layer runtime pm functions
- */
-#ifdef CONFIG_PM
-extern void blk_pm_runtime_init(struct request_queue *q, struct device *dev);
-extern int blk_pre_runtime_suspend(struct request_queue *q);
-extern void blk_post_runtime_suspend(struct request_queue *q, int err);
-extern void blk_pre_runtime_resume(struct request_queue *q);
-extern void blk_post_runtime_resume(struct request_queue *q, int err);
-extern void blk_set_runtime_active(struct request_queue *q);
-#else
-static inline void blk_pm_runtime_init(struct request_queue *q,
-	struct device *dev) {}
-static inline int blk_pre_runtime_suspend(struct request_queue *q)
-{
-	return -ENOSYS;
-}
-static inline void blk_post_runtime_suspend(struct request_queue *q, int err) {}
-static inline void blk_pre_runtime_resume(struct request_queue *q) {}
-static inline void blk_post_runtime_resume(struct request_queue *q, int err) {}
-static inline void blk_set_runtime_active(struct request_queue *q) {}
-#endif
-
 /*
  * blk_plug permits building a queue of related requests by holding the I/O
  * fragments for a short period. This allows merging of sequential requests
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 3/9] block, scsi: Introduce blk_pm_runtime_exit()
  2018-08-07 22:51 [PATCH v5 0/9] blk-mq: Implement runtime power management Bart Van Assche
  2018-08-07 22:51 ` [PATCH v5 1/9] block: Change the preempt-only flag into a counter Bart Van Assche
  2018-08-07 22:51 ` [PATCH v5 2/9] block: Move power management code into a new source file Bart Van Assche
@ 2018-08-07 22:51 ` Bart Van Assche
  2018-08-07 22:51 ` [PATCH v5 4/9] percpu-refcount: Introduce percpu_ref_is_in_use() Bart Van Assche
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2018-08-07 22:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Martin K . Petersen, Ming Lei, Jianchao Wang, Hannes Reinecke,
	Johannes Thumshirn, Alan Stern

Unbinding a SCSI ULD removes the association between a request
queue and its struct device, so the q->dev pointer has to be reset
during unbind. Hence introduce a block layer function that clears
request_queue.dev.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
---
 block/blk-pm.c         | 18 ++++++++++++++++++
 drivers/scsi/sd.c      |  9 ++++-----
 drivers/scsi/sr.c      |  3 ++-
 include/linux/blk-pm.h |  2 ++
 4 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/block/blk-pm.c b/block/blk-pm.c
index 9b636960d285..bf8532da952d 100644
--- a/block/blk-pm.c
+++ b/block/blk-pm.c
@@ -40,6 +40,24 @@ void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
 }
 EXPORT_SYMBOL(blk_pm_runtime_init);
 
+/**
+ * blk_pm_runtime_exit - runtime PM exit routine
+ * @q: the queue of the device
+ *
+ * This function should be called from the device_driver.remove() callback
+ * to prevent any further runtime power management calls from occurring
+ * after removal.
+ */
+void blk_pm_runtime_exit(struct request_queue *q)
+{
+	if (!q->dev)
+		return;
+
+	pm_runtime_get_sync(q->dev);
+	q->dev = NULL;
+}
+EXPORT_SYMBOL(blk_pm_runtime_exit);
+
 /**
  * blk_pre_runtime_suspend - Pre runtime suspend check
  * @q: the queue of the device
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 69ab459abb98..5537762dfcfd 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3420,12 +3420,11 @@ static int sd_probe(struct device *dev)
  **/
 static int sd_remove(struct device *dev)
 {
-	struct scsi_disk *sdkp;
-	dev_t devt;
+	struct scsi_disk *sdkp = dev_get_drvdata(dev);
+	struct scsi_device *sdp = sdkp->device;
+	dev_t devt = disk_devt(sdkp->disk);
 
-	sdkp = dev_get_drvdata(dev);
-	devt = disk_devt(sdkp->disk);
-	scsi_autopm_get_device(sdkp->device);
+	blk_pm_runtime_exit(sdp->request_queue);
 
 	async_synchronize_full_domain(&scsi_sd_pm_domain);
 	async_synchronize_full_domain(&scsi_sd_probe_domain);
diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index de4413e66eca..476987f7ed48 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -1002,8 +1002,9 @@ static void sr_kref_release(struct kref *kref)
 static int sr_remove(struct device *dev)
 {
 	struct scsi_cd *cd = dev_get_drvdata(dev);
+	struct scsi_device *sdev = cd->device;
 
-	scsi_autopm_get_device(cd->device);
+	blk_pm_runtime_exit(sdev->request_queue);
 
 	del_gendisk(cd->disk);
 	dev_set_drvdata(dev, NULL);
diff --git a/include/linux/blk-pm.h b/include/linux/blk-pm.h
index b80c65aba249..6d654f41acbf 100644
--- a/include/linux/blk-pm.h
+++ b/include/linux/blk-pm.h
@@ -11,6 +11,7 @@ struct request_queue;
  */
 #ifdef CONFIG_PM
 extern void blk_pm_runtime_init(struct request_queue *q, struct device *dev);
+extern void blk_pm_runtime_exit(struct request_queue *q);
 extern int blk_pre_runtime_suspend(struct request_queue *q);
 extern void blk_post_runtime_suspend(struct request_queue *q, int err);
 extern void blk_pre_runtime_resume(struct request_queue *q);
@@ -19,6 +20,7 @@ extern void blk_set_runtime_active(struct request_queue *q);
 #else
 static inline void blk_pm_runtime_init(struct request_queue *q,
 				       struct device *dev) {}
+static inline void blk_pm_runtime_exit(struct request_queue *q) {}
 #endif
 
 #endif /* _BLK_PM_H_ */
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 4/9] percpu-refcount: Introduce percpu_ref_is_in_use()
  2018-08-07 22:51 [PATCH v5 0/9] blk-mq: Implement runtime power management Bart Van Assche
                   ` (2 preceding siblings ...)
  2018-08-07 22:51 ` [PATCH v5 3/9] block, scsi: Introduce blk_pm_runtime_exit() Bart Van Assche
@ 2018-08-07 22:51 ` Bart Van Assche
  2018-08-08 15:23   ` Tejun Heo
  2018-08-07 22:51 ` [PATCH v5 5/9] block: Change the runtime power management approach (1/2) Bart Van Assche
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2018-08-07 22:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Tejun Heo,
	Ming Lei, Jianchao Wang, Hannes Reinecke, Johannes Thumshirn,
	Alan Stern

Introduce a function that makes it possible to determine whether a
per-cpu refcount is in use. This function will be used in a later patch
to determine whether or not any block layer requests are being executed.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
---
 include/linux/percpu-refcount.h |  2 ++
 lib/percpu-refcount.c           | 40 +++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index 009cdf3d65b6..dd247756d634 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -331,4 +331,6 @@ static inline bool percpu_ref_is_zero(struct percpu_ref *ref)
 	return !atomic_long_read(&ref->count);
 }
 
+bool percpu_ref_is_in_use(struct percpu_ref *ref);
+
 #endif
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index 9f96fa7bc000..1dcb47e2c561 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -369,3 +369,43 @@ void percpu_ref_reinit(struct percpu_ref *ref)
 	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
 }
 EXPORT_SYMBOL_GPL(percpu_ref_reinit);
+
+/**
+ * percpu_ref_is_in_use - verify whether or not a percpu refcount is in use
+ * @ref: percpu_ref to test
+ *
+ * For a percpu refcount that is in percpu-mode, verify whether the reference
+ * count is above one. For a percpu refcount that is in atomic mode, verify
+ * whether the reference count is above zero. This function makes it possible
+ * to check whether any references are held on a percpu refcount switched
+ * between atomic and percpu mode with percpu_ref_reinit() /
+ * percpu_ref_kill().
+ *
+ * This function is safe to call as long as @ref is between init and exit. It
+ * is the responsibility of the caller to handle percpu_ref_get() and
+ * percpu_ref_put() calls that occur concurrently with this function.
+ */
+bool percpu_ref_is_in_use(struct percpu_ref *ref)
+{
+	unsigned long __percpu *percpu_count;
+	unsigned long sum = 0;
+	int cpu;
+	unsigned long flags;
+
+	/* Obtain percpu_ref_switch_lock to serialize against mode switches. */
+	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
+	rcu_read_lock_sched();
+	if (__ref_is_percpu(ref, &percpu_count)) {
+		for_each_possible_cpu(cpu)
+			sum += *per_cpu_ptr(percpu_count, cpu);
+	}
+	rcu_read_unlock_sched();
+	sum += atomic_long_read(&ref->count);
+	sum &= ~PERCPU_COUNT_BIAS;
+	sum += (ref->percpu_count_ptr & __PERCPU_REF_DEAD);
+	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
+
+	WARN_ON_ONCE(sum == 0);
+	return sum > 1;
+}
+EXPORT_SYMBOL_GPL(percpu_ref_is_in_use);
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 5/9] block: Change the runtime power management approach (1/2)
  2018-08-07 22:51 [PATCH v5 0/9] blk-mq: Implement runtime power management Bart Van Assche
                   ` (3 preceding siblings ...)
  2018-08-07 22:51 ` [PATCH v5 4/9] percpu-refcount: Introduce percpu_ref_is_in_use() Bart Van Assche
@ 2018-08-07 22:51 ` Bart Van Assche
  2018-08-08  6:11   ` jianchao.wang
  2018-08-07 22:51 ` [PATCH v5 6/9] block: Change the runtime power management approach (2/2) Bart Van Assche
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2018-08-07 22:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Jianchao Wang, Hannes Reinecke, Johannes Thumshirn, Alan Stern

Instead of scheduling runtime resume of a request queue after a
request has been queued, schedule asynchronous resume during request
allocation. The new pm_request_resume() calls occur after
blk_queue_enter() has increased the q_usage_counter request queue
member. This change is needed for a later patch that will make request
allocation block while the queue status is not RPM_ACTIVE.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
---
 block/blk-core.c        | 11 +++++++----
 block/blk-mq.c          |  5 +++++
 block/elevator.c        |  2 --
 drivers/scsi/scsi_lib.c |  6 +++++-
 include/linux/blk-mq.h  |  2 ++
 5 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 64697a97147a..179a13be0fca 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -909,11 +909,11 @@ EXPORT_SYMBOL(blk_alloc_queue);
 /**
  * blk_queue_enter() - try to increase q->q_usage_counter
  * @q: request queue pointer
- * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PREEMPT
+ * @flags: BLK_MQ_REQ_NOWAIT, BLK_MQ_REQ_PREEMPT and/or BLK_MQ_REQ_PM
  */
 int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 {
-	const bool pm = flags & BLK_MQ_REQ_PREEMPT;
+	const bool pm = flags & (BLK_MQ_REQ_PREEMPT | BLK_MQ_REQ_PM);
 
 	while (true) {
 		bool success = false;
@@ -1571,7 +1571,7 @@ static struct request *get_request(struct request_queue *q, unsigned int op,
 	goto retry;
 }
 
-/* flags: BLK_MQ_REQ_PREEMPT and/or BLK_MQ_REQ_NOWAIT. */
+/* flags: BLK_MQ_REQ_PREEMPT, BLK_MQ_REQ_PM and/or BLK_MQ_REQ_NOWAIT. */
 static struct request *blk_old_get_request(struct request_queue *q,
 				unsigned int op, blk_mq_req_flags_t flags)
 {
@@ -1595,6 +1595,8 @@ static struct request *blk_old_get_request(struct request_queue *q,
 		return rq;
 	}
 
+	blk_pm_add_request(q, rq);
+
 	/* q->queue_lock is unlocked at this point */
 	rq->__data_len = 0;
 	rq->__sector = (sector_t) -1;
@@ -1614,7 +1616,8 @@ struct request *blk_get_request(struct request_queue *q, unsigned int op,
 	struct request *req;
 
 	WARN_ON_ONCE(op & REQ_NOWAIT);
-	WARN_ON_ONCE(flags & ~(BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_PREEMPT));
+	WARN_ON_ONCE(flags & ~(BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_PREEMPT |
+			       BLK_MQ_REQ_PM));
 
 	if (q->mq_ops) {
 		req = blk_mq_alloc_request(q, op, flags);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c92ce06fd565..434515018c43 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -24,6 +24,7 @@
 #include <linux/sched/signal.h>
 #include <linux/delay.h>
 #include <linux/crash_dump.h>
+#include <linux/pm_runtime.h>
 #include <linux/prefetch.h>
 
 #include <trace/events/block.h>
@@ -33,6 +34,7 @@
 #include "blk-mq.h"
 #include "blk-mq-debugfs.h"
 #include "blk-mq-tag.h"
+#include "blk-pm.h"
 #include "blk-stat.h"
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
@@ -391,6 +393,9 @@ static struct request *blk_mq_get_request(struct request_queue *q,
 		}
 	}
 	data->hctx->queued++;
+
+	blk_pm_add_request(q, rq);
+
 	return rq;
 }
 
diff --git a/block/elevator.c b/block/elevator.c
index 4c15f0240c99..3965292b0724 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -600,8 +600,6 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
 {
 	trace_block_rq_insert(q, rq);
 
-	blk_pm_add_request(q, rq);
-
 	rq->q = q;
 
 	if (rq->rq_flags & RQF_SOFTBARRIER) {
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0e48136329c6..6843b49cc130 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -263,11 +263,15 @@ int __scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
 {
 	struct request *req;
 	struct scsi_request *rq;
+	blk_mq_req_flags_t blk_mq_req_flags;
 	int ret = DRIVER_ERROR << 24;
 
+	blk_mq_req_flags = BLK_MQ_REQ_PREEMPT;
+	if (rq_flags & RQF_PM)
+		blk_mq_req_flags |= BLK_MQ_REQ_PM;
 	req = blk_get_request(sdev->request_queue,
 			data_direction == DMA_TO_DEVICE ?
-			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, BLK_MQ_REQ_PREEMPT);
+			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, blk_mq_req_flags);
 	if (IS_ERR(req))
 		return ret;
 	rq = scsi_req(req);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 1da59c16f637..1c9fae629eec 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -223,6 +223,8 @@ enum {
 	BLK_MQ_REQ_INTERNAL	= (__force blk_mq_req_flags_t)(1 << 2),
 	/* set RQF_PREEMPT */
 	BLK_MQ_REQ_PREEMPT	= (__force blk_mq_req_flags_t)(1 << 3),
+	/* set RQF_PM */
+	BLK_MQ_REQ_PM		= (__force blk_mq_req_flags_t)(1 << 4),
 };
 
 struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 6/9] block: Change the runtime power management approach (2/2)
  2018-08-07 22:51 [PATCH v5 0/9] blk-mq: Implement runtime power management Bart Van Assche
                   ` (4 preceding siblings ...)
  2018-08-07 22:51 ` [PATCH v5 5/9] block: Change the runtime power management approach (1/2) Bart Van Assche
@ 2018-08-07 22:51 ` Bart Van Assche
  2018-08-08  8:50   ` Ming Lei
  2018-08-07 22:51 ` [PATCH v5 7/9] block: Remove blk_pm_requeue_request() Bart Van Assche
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2018-08-07 22:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Jianchao Wang, Hannes Reinecke, Johannes Thumshirn, Alan Stern

Instead of allowing requests that are not power management requests
to enter the queue while it is runtime suspended (RPM_SUSPENDED),
make the blk_get_request() caller block. This change fixes a starvation
issue: it is now guaranteed that power management requests will be
executed no matter how many blk_get_request() callers are waiting.
Instead of maintaining the q->nr_pending counter, rely on
q->q_usage_counter. Call pm_runtime_mark_last_busy() every time a
request finishes instead of only if the queue depth drops to zero.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
---
 block/blk-core.c       | 37 ++++++++-----------------------------
 block/blk-mq-debugfs.c |  1 -
 block/blk-pm.c         | 40 +++++++++++++++++++++++++++++++++++-----
 block/blk-pm.h         |  6 ++----
 include/linux/blkdev.h |  1 -
 5 files changed, 45 insertions(+), 40 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 179a13be0fca..a2ef253edfbd 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2737,30 +2737,6 @@ void blk_account_io_done(struct request *req, u64 now)
 	}
 }
 
-#ifdef CONFIG_PM
-/*
- * Don't process normal requests when queue is suspended
- * or in the process of suspending/resuming
- */
-static bool blk_pm_allow_request(struct request *rq)
-{
-	switch (rq->q->rpm_status) {
-	case RPM_RESUMING:
-	case RPM_SUSPENDING:
-		return rq->rq_flags & RQF_PM;
-	case RPM_SUSPENDED:
-		return false;
-	default:
-		return true;
-	}
-}
-#else
-static bool blk_pm_allow_request(struct request *rq)
-{
-	return true;
-}
-#endif
-
 void blk_account_io_start(struct request *rq, bool new_io)
 {
 	struct hd_struct *part;
@@ -2806,11 +2782,14 @@ static struct request *elv_next_request(struct request_queue *q)
 
 	while (1) {
 		list_for_each_entry(rq, &q->queue_head, queuelist) {
-			if (blk_pm_allow_request(rq))
-				return rq;
-
-			if (rq->rq_flags & RQF_SOFTBARRIER)
-				break;
+#ifdef CONFIG_PM
+			/*
+			 * If a request gets queued in state RPM_SUSPENDED
+			 * then that's a kernel bug.
+			 */
+			WARN_ON_ONCE(q->rpm_status == RPM_SUSPENDED);
+#endif
+			return rq;
 		}
 
 		/*
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index a5ea86835fcb..7d74d53dc098 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -332,7 +332,6 @@ static const char *const rqf_name[] = {
 	RQF_NAME(ELVPRIV),
 	RQF_NAME(IO_STAT),
 	RQF_NAME(ALLOCED),
-	RQF_NAME(PM),
 	RQF_NAME(HASHED),
 	RQF_NAME(STATS),
 	RQF_NAME(SPECIAL_PAYLOAD),
diff --git a/block/blk-pm.c b/block/blk-pm.c
index bf8532da952d..d6b65cef9764 100644
--- a/block/blk-pm.c
+++ b/block/blk-pm.c
@@ -86,14 +86,39 @@ int blk_pre_runtime_suspend(struct request_queue *q)
 	if (!q->dev)
 		return ret;
 
+	WARN_ON_ONCE(q->rpm_status != RPM_ACTIVE);
+
+	blk_set_pm_only(q);
+	/*
+	 * This function only gets called if the most recent
+	 * pm_request_resume() call occurred at least autosuspend_delay_ms
+	 * ago. Since blk_queue_enter() is called by the request allocation
+	 * code before pm_request_resume(), if q_usage_counter indicates that
+	 * no requests are in flight it is safe to suspend the device.
+	 */
+	ret = -EBUSY;
+	if (!percpu_ref_is_in_use(&q->q_usage_counter)) {
+		/*
+		 * The queue was switched to pm-only mode above. Call
+		 * synchronize_rcu() so that later blk_queue_enter() calls are
+		 * guaranteed to see the pm-only state. See also
+		 * http://lwn.net/Articles/573497/.
+		 */
+		synchronize_rcu();
+		if (!percpu_ref_is_in_use(&q->q_usage_counter))
+			ret = 0;
+	}
+
 	spin_lock_irq(q->queue_lock);
-	if (q->nr_pending) {
-		ret = -EBUSY;
+	if (ret < 0)
 		pm_runtime_mark_last_busy(q->dev);
-	} else {
+	else
 		q->rpm_status = RPM_SUSPENDING;
-	}
 	spin_unlock_irq(q->queue_lock);
+
+	if (ret)
+		blk_clear_pm_only(q);
+
 	return ret;
 }
 EXPORT_SYMBOL(blk_pre_runtime_suspend);
@@ -124,6 +149,9 @@ void blk_post_runtime_suspend(struct request_queue *q, int err)
 		pm_runtime_mark_last_busy(q->dev);
 	}
 	spin_unlock_irq(q->queue_lock);
+
+	if (err)
+		blk_clear_pm_only(q);
 }
 EXPORT_SYMBOL(blk_post_runtime_suspend);
 
@@ -171,13 +199,15 @@ void blk_post_runtime_resume(struct request_queue *q, int err)
 	spin_lock_irq(q->queue_lock);
 	if (!err) {
 		q->rpm_status = RPM_ACTIVE;
-		__blk_run_queue(q);
 		pm_runtime_mark_last_busy(q->dev);
 		pm_request_autosuspend(q->dev);
 	} else {
 		q->rpm_status = RPM_SUSPENDED;
 	}
 	spin_unlock_irq(q->queue_lock);
+
+	if (!err)
+		blk_clear_pm_only(q);
 }
 EXPORT_SYMBOL(blk_post_runtime_resume);
 
diff --git a/block/blk-pm.h b/block/blk-pm.h
index 1ffc8ef203ec..fcb507a29e99 100644
--- a/block/blk-pm.h
+++ b/block/blk-pm.h
@@ -8,21 +8,19 @@
 #ifdef CONFIG_PM
 static inline void blk_pm_requeue_request(struct request *rq)
 {
-	if (rq->q->dev && !(rq->rq_flags & RQF_PM))
-		rq->q->nr_pending--;
 }
 
 static inline void blk_pm_add_request(struct request_queue *q,
 				      struct request *rq)
 {
-	if (q->dev && !(rq->rq_flags & RQF_PM) && q->nr_pending++ == 0 &&
+	if (q->dev && !(rq->rq_flags & RQF_PM) &&
 	    (q->rpm_status == RPM_SUSPENDED || q->rpm_status == RPM_SUSPENDING))
 		pm_request_resume(q->dev);
 }
 
 static inline void blk_pm_put_request(struct request *rq)
 {
-	if (rq->q->dev && !(rq->rq_flags & RQF_PM) && !--rq->q->nr_pending)
+	if (rq->q->dev && !(rq->rq_flags & RQF_PM))
 		pm_runtime_mark_last_busy(rq->q->dev);
 }
 #else
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index cc5ef316eb39..b10787fdc69e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -549,7 +549,6 @@ struct request_queue {
 #ifdef CONFIG_PM
 	struct device		*dev;
 	int			rpm_status;
-	unsigned int		nr_pending;
 #endif
 
 	/*
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 7/9] block: Remove blk_pm_requeue_request()
  2018-08-07 22:51 [PATCH v5 0/9] blk-mq: Implement runtime power management Bart Van Assche
                   ` (5 preceding siblings ...)
  2018-08-07 22:51 ` [PATCH v5 6/9] block: Change the runtime power management approach (2/2) Bart Van Assche
@ 2018-08-07 22:51 ` Bart Van Assche
  2018-08-07 22:51 ` [PATCH v5 8/9] blk-mq: Insert a blk_pm_put_request() call Bart Van Assche
  2018-08-07 22:51 ` [PATCH v5 9/9] blk-mq: Enable support for runtime power management Bart Van Assche
  8 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2018-08-07 22:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Jianchao Wang, Hannes Reinecke, Johannes Thumshirn, Alan Stern

Since this function is now empty, remove it.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
---
 block/blk-pm.h   | 8 --------
 block/elevator.c | 2 --
 2 files changed, 10 deletions(-)

diff --git a/block/blk-pm.h b/block/blk-pm.h
index fcb507a29e99..3a903efcacaf 100644
--- a/block/blk-pm.h
+++ b/block/blk-pm.h
@@ -6,10 +6,6 @@
 #include <linux/pm_runtime.h>
 
 #ifdef CONFIG_PM
-static inline void blk_pm_requeue_request(struct request *rq)
-{
-}
-
 static inline void blk_pm_add_request(struct request_queue *q,
 				      struct request *rq)
 {
@@ -24,10 +20,6 @@ static inline void blk_pm_put_request(struct request *rq)
 		pm_runtime_mark_last_busy(rq->q->dev);
 }
 #else
-static inline void blk_pm_requeue_request(struct request *rq)
-{
-}
-
 static inline void blk_pm_add_request(struct request_queue *q,
 				      struct request *rq)
 {
diff --git a/block/elevator.c b/block/elevator.c
index 3965292b0724..9e46df35cacc 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -572,8 +572,6 @@ void elv_requeue_request(struct request_queue *q, struct request *rq)
 
 	rq->rq_flags &= ~RQF_STARTED;
 
-	blk_pm_requeue_request(rq);
-
 	__elv_add_request(q, rq, ELEVATOR_INSERT_REQUEUE);
 }
 
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 8/9] blk-mq: Insert a blk_pm_put_request() call
  2018-08-07 22:51 [PATCH v5 0/9] blk-mq: Implement runtime power management Bart Van Assche
                   ` (6 preceding siblings ...)
  2018-08-07 22:51 ` [PATCH v5 7/9] block: Remove blk_pm_requeue_request() Bart Van Assche
@ 2018-08-07 22:51 ` Bart Van Assche
  2018-08-07 22:51 ` [PATCH v5 9/9] blk-mq: Enable support for runtime power management Bart Van Assche
  8 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2018-08-07 22:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Jianchao Wang, Hannes Reinecke, Johannes Thumshirn, Alan Stern

Call blk_pm_put_request() after a request has finished.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
---
 block/blk-mq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 434515018c43..6fb91b2b001c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -478,6 +478,8 @@ static void __blk_mq_free_request(struct request *rq)
 	struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, ctx->cpu);
 	const int sched_tag = rq->internal_tag;
 
+	blk_pm_put_request(rq);
+
 	if (rq->tag != -1)
 		blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
 	if (sched_tag != -1)
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v5 9/9] blk-mq: Enable support for runtime power management
  2018-08-07 22:51 [PATCH v5 0/9] blk-mq: Implement runtime power management Bart Van Assche
                   ` (7 preceding siblings ...)
  2018-08-07 22:51 ` [PATCH v5 8/9] blk-mq: Insert a blk_pm_put_request() call Bart Van Assche
@ 2018-08-07 22:51 ` Bart Van Assche
  8 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2018-08-07 22:51 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Ming Lei,
	Jianchao Wang, Hannes Reinecke, Johannes Thumshirn, Alan Stern

Now that the blk-mq core processes power management requests
(marked with RQF_PREEMPT) in states other than RPM_ACTIVE, enable
runtime power management for blk-mq.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
---
 block/blk-pm.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/block/blk-pm.c b/block/blk-pm.c
index d6b65cef9764..eaaca1e26c88 100644
--- a/block/blk-pm.c
+++ b/block/blk-pm.c
@@ -27,12 +27,6 @@
  */
 void blk_pm_runtime_init(struct request_queue *q, struct device *dev)
 {
-	/* Don't enable runtime PM for blk-mq until it is ready */
-	if (q->mq_ops) {
-		pm_runtime_disable(dev);
-		return;
-	}
-
 	q->dev = dev;
 	q->rpm_status = RPM_ACTIVE;
 	pm_runtime_set_autosuspend_delay(q->dev, -1);
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 5/9] block: Change the runtime power management approach (1/2)
  2018-08-07 22:51 ` [PATCH v5 5/9] block: Change the runtime power management approach (1/2) Bart Van Assche
@ 2018-08-08  6:11   ` jianchao.wang
  2018-08-08  6:43     ` jianchao.wang
  0 siblings, 1 reply; 21+ messages in thread
From: jianchao.wang @ 2018-08-08  6:11 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, Alan Stern

Hi Bart

On 08/08/2018 06:51 AM, Bart Van Assche wrote:
> @@ -391,6 +393,9 @@ static struct request *blk_mq_get_request(struct request_queue *q,
>  		}
>  	}
>  	data->hctx->queued++;
> +
> +	blk_pm_add_request(q, rq);
> +
>  	return rq;
>  }

The request_queue is in pm_only mode when suspended, so who can reach this point to trigger the resume?

Thanks
Jianchao

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 5/9] block: Change the runtime power management approach (1/2)
  2018-08-08  6:11   ` jianchao.wang
@ 2018-08-08  6:43     ` jianchao.wang
  2018-08-08 17:28       ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: jianchao.wang @ 2018-08-08  6:43 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Ming Lei, Hannes Reinecke,
	Johannes Thumshirn, Alan Stern



On 08/08/2018 02:11 PM, jianchao.wang wrote:
> Hi Bart
> 
> On 08/08/2018 06:51 AM, Bart Van Assche wrote:
>> @@ -391,6 +393,9 @@ static struct request *blk_mq_get_request(struct request_queue *q,
>>  		}
>>  	}
>>  	data->hctx->queued++;
>> +
>> +	blk_pm_add_request(q, rq);
>> +
>>  	return rq;
>>  }
> 
> The request_queue is in pm_only mode when suspended, so who can reach this point to trigger the resume?

I mean, in the original blk-legacy runtime PM implementation any new I/O could trigger the resume.
After your patch set, only PM requests can pass blk_queue_enter() while the queue is suspended
and in pm-only mode. But if nothing triggers a resume, where does the PM request come from?

blk_pm_add_request() should be invoked from blk_queue_enter() instead.
It would look like the following:
   1. when a normal I/O reaches blk_queue_enter() and the queue is suspended, it invokes
      blk_pm_add_request() to trigger the resume, then waits for the pm_only mode to be cleared.
   2. the runtime PM core does the resume work and finally clears the pm_only mode.
   3. the task blocked in blk_queue_enter() is woken up and proceeds.

Thanks
Jianchao

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 1/9] block: Change the preempt-only flag into a counter
  2018-08-07 22:51 ` [PATCH v5 1/9] block: Change the preempt-only flag into a counter Bart Van Assche
@ 2018-08-08  8:21   ` Ming Lei
  2018-08-08 15:27     ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2018-08-08  8:21 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Jianchao Wang,
	Johannes Thumshirn, Alan Stern

On Tue, Aug 07, 2018 at 03:51:25PM -0700, Bart Van Assche wrote:
> The RQF_PREEMPT flag is used for three purposes:
> - In the SCSI core, for making sure that power management requests
>   are executed if a device is in the "quiesced" state.
> - For domain validation by SCSI drivers that use the parallel port.
> - In the IDE driver, for IDE preempt requests.

I think the above description may not be accurate: BLK_MQ_REQ_PREEMPT is
always set inside scsi_execute(), which means that any scsi_execute()
caller can use the RQF_PREEMPT flag; it is not limited to the above
three cases.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 6/9] block: Change the runtime power management approach (2/2)
  2018-08-07 22:51 ` [PATCH v5 6/9] block: Change the runtime power management approach (2/2) Bart Van Assche
@ 2018-08-08  8:50   ` Ming Lei
  2018-08-08 17:32     ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2018-08-08  8:50 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Jianchao Wang,
	Hannes Reinecke, Johannes Thumshirn, Alan Stern

On Tue, Aug 07, 2018 at 03:51:30PM -0700, Bart Van Assche wrote:
> Instead of allowing requests that are not power management requests
> to enter the queue in runtime suspended status (RPM_SUSPENDED), make
> the blk_get_request() caller block. This change fixes a starvation

I don't see the related change that blocks blk_get_request() in
this patchset.

BTW, blk_pm_add_request() won't block since it uses the async version
of runtime resume.

> issue: it is now guaranteed that power management requests will be
> executed no matter how many blk_get_request() callers are waiting.
> Instead of maintaining the q->nr_pending counter, rely on
> q->q_usage_counter. Call pm_runtime_mark_last_busy() every time a
> request finishes instead of only if the queue depth drops to zero.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> ---
>  block/blk-core.c       | 37 ++++++++-----------------------------
>  block/blk-mq-debugfs.c |  1 -
>  block/blk-pm.c         | 40 +++++++++++++++++++++++++++++++++++-----
>  block/blk-pm.h         |  6 ++----
>  include/linux/blkdev.h |  1 -
>  5 files changed, 45 insertions(+), 40 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 179a13be0fca..a2ef253edfbd 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2737,30 +2737,6 @@ void blk_account_io_done(struct request *req, u64 now)
>  	}
>  }
>  
> -#ifdef CONFIG_PM
> -/*
> - * Don't process normal requests when queue is suspended
> - * or in the process of suspending/resuming
> - */
> -static bool blk_pm_allow_request(struct request *rq)
> -{
> -	switch (rq->q->rpm_status) {
> -	case RPM_RESUMING:
> -	case RPM_SUSPENDING:
> -		return rq->rq_flags & RQF_PM;
> -	case RPM_SUSPENDED:
> -		return false;
> -	default:
> -		return true;
> -	}
> -}
> -#else
> -static bool blk_pm_allow_request(struct request *rq)
> -{
> -	return true;
> -}
> -#endif
> -
>  void blk_account_io_start(struct request *rq, bool new_io)
>  {
>  	struct hd_struct *part;
> @@ -2806,11 +2782,14 @@ static struct request *elv_next_request(struct request_queue *q)
>  
>  	while (1) {
>  		list_for_each_entry(rq, &q->queue_head, queuelist) {
> -			if (blk_pm_allow_request(rq))
> -				return rq;
> -
> -			if (rq->rq_flags & RQF_SOFTBARRIER)
> -				break;
> +#ifdef CONFIG_PM
> +			/*
> +			 * If a request gets queued in state RPM_SUSPENDED
> +			 * then that's a kernel bug.
> +			 */
> +			WARN_ON_ONCE(q->rpm_status == RPM_SUSPENDED);
> +#endif
> +			return rq;
>  		}
>  
>  		/*
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index a5ea86835fcb..7d74d53dc098 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -332,7 +332,6 @@ static const char *const rqf_name[] = {
>  	RQF_NAME(ELVPRIV),
>  	RQF_NAME(IO_STAT),
>  	RQF_NAME(ALLOCED),
> -	RQF_NAME(PM),
>  	RQF_NAME(HASHED),
>  	RQF_NAME(STATS),
>  	RQF_NAME(SPECIAL_PAYLOAD),
> diff --git a/block/blk-pm.c b/block/blk-pm.c
> index bf8532da952d..d6b65cef9764 100644
> --- a/block/blk-pm.c
> +++ b/block/blk-pm.c
> @@ -86,14 +86,39 @@ int blk_pre_runtime_suspend(struct request_queue *q)
>  	if (!q->dev)
>  		return ret;
>  
> +	WARN_ON_ONCE(q->rpm_status != RPM_ACTIVE);
> +
> +	blk_set_pm_only(q);
> +	/*
> +	 * This function only gets called if the most recent
> +	 * pm_request_resume() call occurred at least autosuspend_delay_ms
> +	 * ago. Since blk_queue_enter() is called by the request allocation
> +	 * code before pm_request_resume(), if q_usage_counter indicates that
> +	 * no requests are in flight it is safe to suspend the device.
> +	 */
> +	ret = -EBUSY;
> +	if (!percpu_ref_is_in_use(&q->q_usage_counter)) {
> +		/*
> +		 * Switch to preempt-only mode before calling
> +		 * synchronize_rcu() such that later blk_queue_enter() calls
> +		 * see the preempt-only state. See also
> +		 * http://lwn.net/Articles/573497/.
> +		 */
> +		synchronize_rcu();
> +		if (!percpu_ref_is_in_use(&q->q_usage_counter))
> +			ret = 0;
> +	}
> +

In blk_queue_enter(), V5 starts to allow all RQF_PREEMPT requests
to enter the queue even though pm_only is set. That means any scsi_execute()
call may issue a new command to the host just after the above percpu_ref_is_in_use()
returns 0, while the suspend is in progress.

This case is illegal given that RQF_PM is the only kind of request that can be
issued to the hardware during suspend.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 4/9] percpu-refcount: Introduce percpu_ref_is_in_use()
  2018-08-07 22:51 ` [PATCH v5 4/9] percpu-refcount: Introduce percpu_ref_is_in_use() Bart Van Assche
@ 2018-08-08 15:23   ` Tejun Heo
  2018-08-09 14:32     ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Tejun Heo @ 2018-08-08 15:23 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Ming Lei,
	Jianchao Wang, Hannes Reinecke, Johannes Thumshirn, Alan Stern

Hello,

On Tue, Aug 07, 2018 at 03:51:28PM -0700, Bart Van Assche wrote:
> Introduce a function that makes it possible to determine whether a per-cpu
> refcount is in use. This function will be used in a later patch to determine
> whether or not any block layer requests are being executed.

I thought about it a bit and am having a bit of difficulty convincing
myself this is necessary.  Switching a percpu_ref to atomic mode isn't
expensive - it's one spinlock cycle, a rcu wait and one sweep of the
percpu counters.  The most expensive part - the percpu sweep - needs
to be there even with optimization, the wait doesn't really matter as
all it'll do is slightly delaying timer based PM operation and can be
overlayed with the propagation of set_pm_only() anyway.

So, how about just doing the simple thing?  Switch it to atomic mode
and check the counter and switch back to percpu mode afterwards.  If
we see any issues with that, we can try to optimize it later but that
seems unlikely to me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 1/9] block: Change the preempt-only flag into a counter
  2018-08-08  8:21   ` Ming Lei
@ 2018-08-08 15:27     ` Bart Van Assche
  0 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2018-08-08 15:27 UTC (permalink / raw)
  To: ming.lei@redhat.com
  Cc: hch@lst.de, linux-block@vger.kernel.org, jthumshirn@suse.de,
	stern@rowland.harvard.edu, axboe@kernel.dk,
	jianchao.w.wang@oracle.com

On Wed, 2018-08-08 at 16:21 +0800, Ming Lei wrote:
> On Tue, Aug 07, 2018 at 03:51:25PM -0700, Bart Van Assche wrote:
> > The RQF_PREEMPT flag is used for three purposes:
> > - In the SCSI core, for making sure that power management requests
> >   are executed if a device is in the "quiesced" state.
> > - For domain validation by SCSI drivers that use the parallel port.
> > - In the IDE driver, for IDE preempt requests.
> 
> I think the above description may not be accurate: BLK_MQ_REQ_PREEMPT is
> always set inside scsi_execute(), which means that any scsi_execute()
> caller can use the RQF_PREEMPT flag; it is not limited to the above
> three cases.

Hello Ming,

What I described is the set of cases in which we really need the RQF_PREEMPT
flag, rather than the cases in which the RQF_PREEMPT flag is set today. I think
we should split the RQF_PREEMPT flag into three flags, one flag per case
mentioned above.

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 5/9] block: Change the runtime power management approach (1/2)
  2018-08-08  6:43     ` jianchao.wang
@ 2018-08-08 17:28       ` Bart Van Assche
  2018-08-09  2:52         ` Ming Lei
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2018-08-08 17:28 UTC (permalink / raw)
  To: jianchao.w.wang@oracle.com, axboe@kernel.dk
  Cc: hch@lst.de, jthumshirn@suse.de, linux-block@vger.kernel.org,
	hare@suse.com, stern@rowland.harvard.edu, ming.lei@redhat.com

On Wed, 2018-08-08 at 14:43 +0800, jianchao.wang wrote:
> 
> On 08/08/2018 02:11 PM, jianchao.wang wrote:
> > Hi Bart
> > 
> > On 08/08/2018 06:51 AM, Bart Van Assche wrote:
> > > @@ -391,6 +393,9 @@ static struct request *blk_mq_get_request(struct request_queue *q,
> > >  		}
> > >  	}
> > >  	data->hctx->queued++;
> > > +
> > > +	blk_pm_add_request(q, rq);
> > > +
> > >  	return rq;
> > >  }
> > 
> > The request_queue is in pm_only mode when suspended, so who can reach this point to trigger the resume?
> 
> What I mean is: in the original blk-legacy runtime PM implementation, any new I/O could trigger the resume. After your patch set, only a PM request can pass blk_queue_enter() while the queue is suspended and in pm-only mode. But if nothing triggers a resume, where does the PM request come from?
> 
> blk_pm_add_request() should be called from blk_queue_enter() instead.
> It would look like the following:
>    1. when a normal I/O reaches blk_queue_enter() while the queue is suspended, it invokes blk_pm_add_request()
>       to trigger the resume, then waits there for the pm_only mode to be cleared.
>    2. the runtime PM core does the resume work and finally clears the pm_only mode.
>    3. the task blocked in blk_queue_enter() is woken up and proceeds.

Hello Jianchao,

Some but not all blk_queue_enter() calls are related to request allocation, so
I'm not sure that that call should be added into blk_queue_enter(). The reason
my tests passed is probably the scsi_autopm_get_device() calls in the sd and
sr drivers. However, not all request submission code is protected by these
calls. I will have a closer look at how to preserve the behavior that queueing
a new request that is not protected by scsi_autopm_get_device() restores full
power mode.

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 6/9] block: Change the runtime power management approach (2/2)
  2018-08-08  8:50   ` Ming Lei
@ 2018-08-08 17:32     ` Bart Van Assche
  0 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2018-08-08 17:32 UTC (permalink / raw)
  To: ming.lei@redhat.com
  Cc: hch@lst.de, jthumshirn@suse.de, linux-block@vger.kernel.org,
	hare@suse.com, stern@rowland.harvard.edu, axboe@kernel.dk,
	jianchao.w.wang@oracle.com


On Wed, 2018-08-08 at 16:50 +0800, Ming Lei wrote:
> On Tue, Aug 07, 2018 at 03:51:30PM -0700, Bart Van Assche wrote:
> > Instead of allowing requests that are not power management requests
> > to enter the queue in runtime suspended status (RPM_SUSPENDED), make
> > the blk_get_request() caller block. This change fixes a starvation
> 
> I don't see the related change that blocks blk_get_request() in
> this patchset.

I was referring to setting pm-only mode for suspended devices. Since
blk_queue_enter() is called before a request is allocated, that is sufficient
to make request allocation block.

> > diff --git a/block/blk-pm.c b/block/blk-pm.c
> > index bf8532da952d..d6b65cef9764 100644
> > --- a/block/blk-pm.c
> > +++ b/block/blk-pm.c
> > @@ -86,14 +86,39 @@ int blk_pre_runtime_suspend(struct request_queue *q)
> >  	if (!q->dev)
> >  		return ret;
> >  
> > +	WARN_ON_ONCE(q->rpm_status != RPM_ACTIVE);
> > +
> > +	blk_set_pm_only(q);
> > +	/*
> > +	 * This function only gets called if the most recent
> > +	 * pm_request_resume() call occurred at least autosuspend_delay_ms
> > +	 * ago. Since blk_queue_enter() is called by the request allocation
> > +	 * code before pm_request_resume(), if q_usage_counter indicates that
> > +	 * no requests are in flight it is safe to suspend the device.
> > +	 */
> > +	ret = -EBUSY;
> > +	if (!percpu_ref_is_in_use(&q->q_usage_counter)) {
> > +		/*
> > +		 * Switch to preempt-only mode before calling
> > +		 * synchronize_rcu() such that later blk_queue_enter() calls
> > +		 * see the preempt-only state. See also
> > +		 * http://lwn.net/Articles/573497/.
> > +		 */
> > +		synchronize_rcu();
> > +		if (!percpu_ref_is_in_use(&q->q_usage_counter))
> > +			ret = 0;
> > +	}
> > +
> 
> In blk_queue_enter(), V5 starts to allow all RQF_PREEMPT requests
> to enter the queue even though pm_only is set. That means any scsi_execute()
> call may issue a new command to the host just after the above percpu_ref_is_in_use()
> returns 0, while the suspend is in progress.
> 
> This case is illegal given that RQF_PM is the only kind of request that can be
> issued to the hardware during suspend.

Right, that's something that needs to be addressed.

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 5/9] block: Change the runtime power management approach (1/2)
  2018-08-08 17:28       ` Bart Van Assche
@ 2018-08-09  2:52         ` Ming Lei
  2018-08-09 17:12           ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2018-08-09  2:52 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jianchao.w.wang@oracle.com, axboe@kernel.dk, hch@lst.de,
	jthumshirn@suse.de, linux-block@vger.kernel.org, hare@suse.com,
	stern@rowland.harvard.edu

On Wed, Aug 08, 2018 at 05:28:43PM +0000, Bart Van Assche wrote:
> On Wed, 2018-08-08 at 14:43 +0800, jianchao.wang wrote:
> > 
> > On 08/08/2018 02:11 PM, jianchao.wang wrote:
> > > Hi Bart
> > > 
> > > On 08/08/2018 06:51 AM, Bart Van Assche wrote:
> > > > @@ -391,6 +393,9 @@ static struct request *blk_mq_get_request(struct request_queue *q,
> > > >  		}
> > > >  	}
> > > >  	data->hctx->queued++;
> > > > +
> > > > +	blk_pm_add_request(q, rq);
> > > > +
> > > >  	return rq;
> > > >  }
> > > 
> > > The request_queue is in pm_only mode when suspended, so who can reach this point to trigger the resume?
> > 
> > What I mean is: in the original blk-legacy runtime PM implementation, any new I/O could trigger the resume. After your patch set, only a PM request can pass blk_queue_enter() while the queue is suspended and in pm-only mode. But if nothing triggers a resume, where does the PM request come from?
> > 
> > blk_pm_add_request() should be called from blk_queue_enter() instead.
> > It would look like the following:
> >    1. when a normal I/O reaches blk_queue_enter() while the queue is suspended, it invokes blk_pm_add_request()
> >       to trigger the resume, then waits there for the pm_only mode to be cleared.
> >    2. the runtime PM core does the resume work and finally clears the pm_only mode.
> >    3. the task blocked in blk_queue_enter() is woken up and proceeds.
> 
> Hello Jianchao,
> 
> Some but not all blk_queue_enter() calls are related to request allocation, so

The only one I remember is scsi_ioctl_reset(), in which scsi_autopm_get_host()
is called before allocating the request; that means it is enough to bring the
host up for handling the RESET. Also, this request shouldn't have entered the
block request queue.

Or are there other cases in which request allocation isn't related to
blk_queue_enter()?


thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 4/9] percpu-refcount: Introduce percpu_ref_is_in_use()
  2018-08-08 15:23   ` Tejun Heo
@ 2018-08-09 14:32     ` Bart Van Assche
  0 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2018-08-09 14:32 UTC (permalink / raw)
  To: tj@kernel.org
  Cc: linux-block@vger.kernel.org, jthumshirn@suse.de, hch@lst.de,
	axboe@kernel.dk, ming.lei@redhat.com, hare@suse.com,
	stern@rowland.harvard.edu, jianchao.w.wang@oracle.com

On Wed, 2018-08-08 at 08:23 -0700, Tejun Heo wrote:
> On Tue, Aug 07, 2018 at 03:51:28PM -0700, Bart Van Assche wrote:
> > Introduce a function that makes it possible to determine whether a per-cpu
> > refcount is in use. This function will be used in a later patch to determine
> > whether or not any block layer requests are being executed.
> 
> I thought about it a bit and am having a bit of difficulty convincing
> myself this is necessary.  Switching a percpu_ref to atomic mode isn't
> expensive - it's one spinlock cycle, a rcu wait and one sweep of the
> percpu counters.  The most expensive part - the percpu sweep - needs
> to be there even with optimization, the wait doesn't really matter as
> all it'll do is slightly delaying timer based PM operation and can be
> overlayed with the propagation of set_pm_only() anyway.
> 
> So, how about just doing the simple thing?  Switch it to atomic mode
> and check the counter and switch back to percpu mode afterwards.  If
> we see any issues with that, we can try to optimize it later but that
> seems unlikely to me.

Hello Tejun,

Switching to atomic mode would require me to add a serialization mechanism
against blk_freeze_queue_start() and blk_mq_unfreeze_queue() since these
functions call percpu_ref_kill() and percpu_ref_reinit(). That is easy but
requires additional code. I will see whether I can implement an alternative
approach using blk_mq_queue_tag_busy_iter().

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v5 5/9] block: Change the runtime power management approach (1/2)
  2018-08-09  2:52         ` Ming Lei
@ 2018-08-09 17:12           ` Bart Van Assche
  0 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2018-08-09 17:12 UTC (permalink / raw)
  To: ming.lei@redhat.com
  Cc: hch@lst.de, hare@suse.com, jthumshirn@suse.de,
	linux-block@vger.kernel.org, stern@rowland.harvard.edu,
	jianchao.w.wang@oracle.com, axboe@kernel.dk


On Thu, 2018-08-09 at 10:52 +0800, Ming Lei wrote:
> On Wed, Aug 08, 2018 at 05:28:43PM +0000, Bart Van Assche wrote:
> > Some but not all blk_queue_enter() calls are related to request allocation, so
> 
> The only one I remember is scsi_ioctl_reset(), in which scsi_autopm_get_host()
> is called before allocating the request; that means it is enough to bring the
> host up for handling the RESET. Also, this request shouldn't have entered the
> block request queue.
> 
> Or are there other cases in which request allocation isn't related to
> blk_queue_enter()?

What I was referring to in my e-mail is the q_usage_counter manipulations
in blk_mq_timeout_work(). However, blk_mq_timeout_work() manipulates that
counter directly, so it's only the blk_queue_exit() call in that function
that is not related to request allocation. All blk_queue_enter() calls I
know of are related to request allocation.

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2018-08-09 17:12 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-07 22:51 [PATCH v5 0/9] blk-mq: Implement runtime power management Bart Van Assche
2018-08-07 22:51 ` [PATCH v5 1/9] block: Change the preempt-only flag into a counter Bart Van Assche
2018-08-08  8:21   ` Ming Lei
2018-08-08 15:27     ` Bart Van Assche
2018-08-07 22:51 ` [PATCH v5 2/9] block: Move power management code into a new source file Bart Van Assche
2018-08-07 22:51 ` [PATCH v5 3/9] block, scsi: Introduce blk_pm_runtime_exit() Bart Van Assche
2018-08-07 22:51 ` [PATCH v5 4/9] percpu-refcount: Introduce percpu_ref_is_in_use() Bart Van Assche
2018-08-08 15:23   ` Tejun Heo
2018-08-09 14:32     ` Bart Van Assche
2018-08-07 22:51 ` [PATCH v5 5/9] block: Change the runtime power management approach (1/2) Bart Van Assche
2018-08-08  6:11   ` jianchao.wang
2018-08-08  6:43     ` jianchao.wang
2018-08-08 17:28       ` Bart Van Assche
2018-08-09  2:52         ` Ming Lei
2018-08-09 17:12           ` Bart Van Assche
2018-08-07 22:51 ` [PATCH v5 6/9] block: Change the runtime power management approach (2/2) Bart Van Assche
2018-08-08  8:50   ` Ming Lei
2018-08-08 17:32     ` Bart Van Assche
2018-08-07 22:51 ` [PATCH v5 7/9] block: Remove blk_pm_requeue_request() Bart Van Assche
2018-08-07 22:51 ` [PATCH v5 8/9] blk-mq: Insert a blk_pm_put_request() call Bart Van Assche
2018-08-07 22:51 ` [PATCH v5 9/9] blk-mq: Enable support for runtime power management Bart Van Assche

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox