* [PATCH 1/3] block/genhd: add sysfs knobs for the device frequency PM QoS
2025-09-14 11:45 [PATCH 0/3] block: device frequency PM QoS tuning Wang Jianzheng
@ 2025-09-14 11:45 ` Wang Jianzheng
2025-09-15 21:15 ` Bart Van Assche
2025-09-14 11:45 ` [PATCH 2/3] block: add support for device frequency PM QoS tuning Wang Jianzheng
` (2 subsequent siblings)
3 siblings, 1 reply; 8+ messages in thread
From: Wang Jianzheng @ 2025-09-14 11:45 UTC (permalink / raw)
To: Jens Axboe, Alim Akhtar, Avri Altman, Bart Van Assche,
James E.J. Bottomley, Martin K. Petersen, Rafael J. Wysocki,
Peter Wang, Bean Huo, Bao D. Nguyen, Adrian Hunter, linux-block,
linux-kernel, linux-scsi, linux-pm
Cc: Wang Jianzheng
Add sysfs knobs for the following parameters:
dev_freq_timeout_ms: for clearing up the device frequency limit when
latency-intensive block IO is complete
This can be used to prevent delayed responses to latency-sensitive block
I/O operations when storage device operate at low frequency. By
implementing the "dev_freq_timeout_ms", it automatically restores device
frequency constraints throug PM QoS.
Signed-off-by: Wang Jianzheng <wangjianzheng@vivo.com>
---
block/genhd.c | 23 +++++++++++++++++++++++
include/linux/blkdev.h | 2 ++
2 files changed, 25 insertions(+)
diff --git a/block/genhd.c b/block/genhd.c
index 9bbc38d12792..9462a81501a8 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1159,6 +1159,27 @@ static ssize_t partscan_show(struct device *dev,
return sysfs_emit(buf, "%u\n", disk_has_partscan(dev_to_disk(dev)));
}
+static ssize_t dev_freq_timeout_ms_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct gendisk *disk = dev_to_disk(dev);
+
+ return sprintf(buf, "%d\n", disk->dev_freq_timeout);
+}
+
+static ssize_t dev_freq_timeout_ms_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct gendisk *disk = dev_to_disk(dev);
+ int i;
+
+ if (count > 0 && !kstrtoint(buf, 10, &i))
+ disk->dev_freq_timeout = i;
+
+ return count;
+}
+
static DEVICE_ATTR(range, 0444, disk_range_show, NULL);
static DEVICE_ATTR(ext_range, 0444, disk_ext_range_show, NULL);
static DEVICE_ATTR(removable, 0444, disk_removable_show, NULL);
@@ -1173,6 +1194,7 @@ static DEVICE_ATTR(inflight, 0444, part_inflight_show, NULL);
static DEVICE_ATTR(badblocks, 0644, disk_badblocks_show, disk_badblocks_store);
static DEVICE_ATTR(diskseq, 0444, diskseq_show, NULL);
static DEVICE_ATTR(partscan, 0444, partscan_show, NULL);
+static DEVICE_ATTR_RW(dev_freq_timeout_ms);
#ifdef CONFIG_FAIL_MAKE_REQUEST
ssize_t part_fail_show(struct device *dev,
@@ -1224,6 +1246,7 @@ static struct attribute *disk_attrs[] = {
&dev_attr_events_poll_msecs.attr,
&dev_attr_diskseq.attr,
&dev_attr_partscan.attr,
+ &dev_attr_dev_freq_timeout_ms.attr,
#ifdef CONFIG_FAIL_MAKE_REQUEST
&dev_attr_fail.attr,
#endif
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index fe1797bbec42..950ad047dd81 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -213,6 +213,8 @@ struct gendisk {
u64 diskseq;
blk_mode_t open_mode;
+ int dev_freq_timeout;
+
/*
* Independent sector access ranges. This is always NULL for
* devices that do not have multiple independent access ranges.
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH 1/3] block/genhd: add sysfs knobs for the device frequency PM QoS
2025-09-14 11:45 ` [PATCH 1/3] block/genhd: add sysfs knobs for the device frequency PM QoS Wang Jianzheng
@ 2025-09-15 21:15 ` Bart Van Assche
0 siblings, 0 replies; 8+ messages in thread
From: Bart Van Assche @ 2025-09-15 21:15 UTC (permalink / raw)
To: Wang Jianzheng, Jens Axboe, Alim Akhtar, Avri Altman,
James E.J. Bottomley, Martin K. Petersen, Rafael J. Wysocki,
Peter Wang, Bean Huo, Bao D. Nguyen, Adrian Hunter, linux-block,
linux-kernel, linux-scsi, linux-pm
On 9/14/25 4:45 AM, Wang Jianzheng wrote:
> block/genhd.c | 23 +++++++++++++++++++++++
> include/linux/blkdev.h | 2 ++
If a new sysfs attribute is added then an entry must be added in
Documentation/ABI/ that explains the purpose of the new attribute and
also how to use it.
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -213,6 +213,8 @@ struct gendisk {
> u64 diskseq;
> blk_mode_t open_mode;
>
> + int dev_freq_timeout;
The name `dev_freq_timeout` is not sufficient to guess the meaning of
the attribute. Please add a comment.
Thanks,
Bart.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/3] block: add support for device frequency PM QoS tuning
2025-09-14 11:45 [PATCH 0/3] block: device frequency PM QoS tuning Wang Jianzheng
2025-09-14 11:45 ` [PATCH 1/3] block/genhd: add sysfs knobs for the device frequency PM QoS Wang Jianzheng
@ 2025-09-14 11:45 ` Wang Jianzheng
2025-09-15 21:26 ` Bart Van Assche
2025-09-14 11:45 ` [PATCH 3/3] scsi: ufs: core: Add support for " Wang Jianzheng
2025-09-15 21:13 ` [PATCH 0/3] block: device " Bart Van Assche
3 siblings, 1 reply; 8+ messages in thread
From: Wang Jianzheng @ 2025-09-14 11:45 UTC (permalink / raw)
To: Jens Axboe, Alim Akhtar, Avri Altman, Bart Van Assche,
James E.J. Bottomley, Martin K. Petersen, Rafael J. Wysocki,
Peter Wang, Bean Huo, Bao D. Nguyen, Adrian Hunter, linux-block,
linux-kernel, linux-scsi, linux-pm
Cc: Wang Jianzheng
This mechanism use the PM QoS framework to add device frequency
constraints during Block IO is running. When critical processing
I/O requests that could block latency-sensitive threads, it
dynamically applies frequency restrictions. These constraints will
be removed through a timeout-based reset mechanism.
Signed-off-by: Wang Jianzheng <wangjianzheng@vivo.com>
---
block/blk-mq.c | 58 ++++++++++++++++++++++++++++++++++++++++++
include/linux/blkdev.h | 9 +++++++
include/linux/pm_qos.h | 6 +++++
3 files changed, 73 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ba3a4b77f578..fcb4034287d3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -11,6 +11,7 @@
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/blk-integrity.h>
+#include <linux/blk-pm.h>
#include <linux/kmemleak.h>
#include <linux/mm.h>
#include <linux/init.h>
@@ -28,6 +29,7 @@
#include <linux/prefetch.h>
#include <linux/blk-crypto.h>
#include <linux/part_stat.h>
+#include <linux/pm_qos.h>
#include <linux/sched/isolation.h>
#include <trace/events/block.h>
@@ -3095,6 +3097,54 @@ static bool bio_unaligned(const struct bio *bio, struct request_queue *q)
return false;
}
+#ifdef CONFIG_PM
+static void blk_mq_dev_frequency_work(struct work_struct *work)
+{
+ struct request_queue *q =
+ container_of(work, struct request_queue, dev_freq_work.work);
+ unsigned long timeout;
+ struct dev_pm_qos_request *qos = q->dev_freq_qos;
+
+ timeout = msecs_to_jiffies(q->disk->dev_freq_timeout);
+ if (!q || IS_ERR_OR_NULL(q->dev) || IS_ERR_OR_NULL(qos))
+ return;
+
+ if (q->pm_qos_status == PM_QOS_ACTIVE) {
+ q->pm_qos_status = PM_QOS_FREQ_SET;
+ dev_pm_qos_add_request(q->dev, qos, DEV_PM_QOS_MIN_FREQUENCY,
+ FREQ_QOS_MAX_DEFAULT_VALUE);
+ } else {
+ if (time_after(jiffies, READ_ONCE(q->last_active) + timeout))
+ q->pm_qos_status = PM_QOS_FREQ_REMOV;
+ }
+
+ if (q->pm_qos_status == PM_QOS_FREQ_REMOV) {
+ dev_pm_qos_remove_request(qos);
+ q->pm_qos_status = PM_QOS_ACTIVE;
+ } else {
+ schedule_delayed_work(&q->dev_freq_work,
+ q->last_active + timeout - jiffies);
+ }
+}
+
+static void blk_pm_qos_dev_freq_update(struct request_queue *q, struct bio *bio)
+{
+ if (IS_ERR_OR_NULL(q->dev) || !q->disk->dev_freq_timeout)
+ return;
+
+ if ((bio->bi_opf & REQ_SYNC || bio_op(bio) == REQ_OP_READ)) {
+ WRITE_ONCE(q->last_active, jiffies);
+ if (q->pm_qos_status == PM_QOS_ACTIVE)
+ schedule_delayed_work(&q->dev_freq_work, 0);
+ }
+}
+#else
+static void blk_pm_qos_dev_freq_update(struct request_queue *q, struct bio *bio)
+{
+ return;
+}
+#endif
+
/**
* blk_mq_submit_bio - Create and send a request to block device.
* @bio: Bio pointer.
@@ -3161,6 +3211,8 @@ void blk_mq_submit_bio(struct bio *bio)
goto queue_exit;
}
+ blk_pm_qos_dev_freq_update(q, bio);
+
bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
if (!bio)
goto queue_exit;
@@ -4601,6 +4653,12 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
INIT_DELAYED_WORK(&q->requeue_work, blk_mq_requeue_work);
+#ifdef CONFIG_PM
+ INIT_DELAYED_WORK(&q->dev_freq_work, blk_mq_dev_frequency_work);
+ q->dev_freq_qos = kzalloc(sizeof(*q->dev_freq_qos), GFP_KERNEL);
+ if (IS_ERR_OR_NULL(q->dev_freq_qos))
+ goto err_hctxs;
+#endif
INIT_LIST_HEAD(&q->flush_list);
INIT_LIST_HEAD(&q->requeue_list);
spin_lock_init(&q->requeue_lock);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 950ad047dd81..ea6dfff6b0f5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -26,6 +26,7 @@
#include <linux/xarray.h>
#include <linux/file.h>
#include <linux/lockdep.h>
+#include <linux/pm_qos.h>
struct module;
struct request_queue;
@@ -522,6 +523,14 @@ struct request_queue {
#ifdef CONFIG_PM
struct device *dev;
enum rpm_status rpm_status;
+ enum pm_qos_status pm_qos_status;
+ unsigned long last_active;
+
+ /** @dev_freq_work: Work to handle dev frequency PM QoS limits. */
+ struct delayed_work dev_freq_work;
+
+ /** @dev_freq_qos: PM QoS frequency limits for dev. */
+ struct dev_pm_qos_request *dev_freq_qos;
#endif
/*
diff --git a/include/linux/pm_qos.h b/include/linux/pm_qos.h
index 4a69d4af3ff8..e0d77ff65942 100644
--- a/include/linux/pm_qos.h
+++ b/include/linux/pm_qos.h
@@ -95,6 +95,12 @@ struct freq_qos_request {
struct freq_constraints *qos;
};
+enum pm_qos_status {
+ PM_QOS_INVALID = -1,
+ PM_QOS_ACTIVE = 0,
+ PM_QOS_FREQ_SET,
+ PM_QOS_FREQ_REMOV,
+};
enum dev_pm_qos_req_type {
DEV_PM_QOS_RESUME_LATENCY = 1,
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH 2/3] block: add support for device frequency PM QoS tuning
2025-09-14 11:45 ` [PATCH 2/3] block: add support for device frequency PM QoS tuning Wang Jianzheng
@ 2025-09-15 21:26 ` Bart Van Assche
0 siblings, 0 replies; 8+ messages in thread
From: Bart Van Assche @ 2025-09-15 21:26 UTC (permalink / raw)
To: Wang Jianzheng, Jens Axboe, Alim Akhtar, Avri Altman,
James E.J. Bottomley, Martin K. Petersen, Rafael J. Wysocki,
Peter Wang, Bean Huo, Bao D. Nguyen, Adrian Hunter, linux-block,
linux-kernel, linux-scsi, linux-pm
On 9/14/25 4:45 AM, Wang Jianzheng wrote:
> +#ifdef CONFIG_PM
> +static void blk_mq_dev_frequency_work(struct work_struct *work)
> +{
> + struct request_queue *q =
> + container_of(work, struct request_queue, dev_freq_work.work);
> + unsigned long timeout;
> + struct dev_pm_qos_request *qos = q->dev_freq_qos;
> +
> + timeout = msecs_to_jiffies(q->disk->dev_freq_timeout);
> + if (!q || IS_ERR_OR_NULL(q->dev) || IS_ERR_OR_NULL(qos))
> + return;
> +
> + if (q->pm_qos_status == PM_QOS_ACTIVE) {
> + q->pm_qos_status = PM_QOS_FREQ_SET;
> + dev_pm_qos_add_request(q->dev, qos, DEV_PM_QOS_MIN_FREQUENCY,
> + FREQ_QOS_MAX_DEFAULT_VALUE);
> + } else {
> + if (time_after(jiffies, READ_ONCE(q->last_active) + timeout))
> + q->pm_qos_status = PM_QOS_FREQ_REMOV;
> + }
> +
> + if (q->pm_qos_status == PM_QOS_FREQ_REMOV) {
> + dev_pm_qos_remove_request(qos);
> + q->pm_qos_status = PM_QOS_ACTIVE;
> + } else {
> + schedule_delayed_work(&q->dev_freq_work,
> + q->last_active + timeout - jiffies);
> + }
> +}
The above code is similar in nature to the activity detection by the
run-time power management (RPM) code. Why a new timer mechanism instead
of adding more code in the UFS driver RPM callbacks?
> @@ -3161,6 +3211,8 @@ void blk_mq_submit_bio(struct bio *bio)
> goto queue_exit;
> }
>
> + blk_pm_qos_dev_freq_update(q, bio);
Good luck with adding power-management code in the block layer hot path
... I'm not sure anyone will be enthusiast seeing code being added in
blk_mq_submit_bio().
Thanks,
Bart.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 3/3] scsi: ufs: core: Add support for frequency PM QoS tuning
2025-09-14 11:45 [PATCH 0/3] block: device frequency PM QoS tuning Wang Jianzheng
2025-09-14 11:45 ` [PATCH 1/3] block/genhd: add sysfs knobs for the device frequency PM QoS Wang Jianzheng
2025-09-14 11:45 ` [PATCH 2/3] block: add support for device frequency PM QoS tuning Wang Jianzheng
@ 2025-09-14 11:45 ` Wang Jianzheng
2025-09-15 21:20 ` Bart Van Assche
2025-09-15 21:13 ` [PATCH 0/3] block: device " Bart Van Assche
3 siblings, 1 reply; 8+ messages in thread
From: Wang Jianzheng @ 2025-09-14 11:45 UTC (permalink / raw)
To: Jens Axboe, Alim Akhtar, Avri Altman, Bart Van Assche,
James E.J. Bottomley, Martin K. Petersen, Rafael J. Wysocki,
Peter Wang, Bean Huo, Bao D. Nguyen, Adrian Hunter, linux-block,
linux-kernel, linux-scsi, linux-pm
Cc: Wang Jianzheng
During frequency scaling operations, the host can reads the frequency
constraint configured in the PM QoS framework and updates the target
frequency accordingly.
Signed-off-by: Wang Jianzheng <wangjianzheng@vivo.com>
---
drivers/ufs/core/ufshcd.c | 44 +++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 9a43102b2b21..1a9bcbc9dab0 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -1546,6 +1546,47 @@ static void ufshcd_clk_scaling_resume_work(struct work_struct *work)
devfreq_resume_device(hba->devfreq);
}
+#ifdef CONFIG_PM
+static void ufshcd_pm_qos_freq_read_value(struct ufs_hba *hba,
+ unsigned long *freq)
+{
+ struct Scsi_Host *shost = hba->host;
+ struct scsi_device *sdev;
+ struct request_queue *q;
+ struct ufs_clk_info *clki;
+ s32 min_qos_freq;
+
+ if (!shost)
+ return;
+
+ shost_for_each_device(sdev, shost) {
+ if (!sdev)
+ continue;
+
+ q = sdev->request_queue;
+ if (IS_ERR_OR_NULL(q))
+ continue;
+
+ if (q->disk && !IS_ERR_OR_NULL(q->dev)) {
+ if (q->disk->dev_freq_timeout) {
+ min_qos_freq = dev_pm_qos_read_value(q->dev,
+ DEV_PM_QOS_MIN_FREQUENCY);
+ clki = list_first_entry(&hba->clk_list_head,
+ struct ufs_clk_info, list);
+ if (min_qos_freq > clki->max_freq)
+ *freq = clki->max_freq;
+ }
+ }
+ }
+}
+#else
+static void ufshcd_pm_qos_freq_read_value(struct ufs_hba *hba,
+ unsigned long *freq)
+{
+ return;
+}
+#endif
+
static int ufshcd_devfreq_target(struct device *dev,
unsigned long *freq, u32 flags)
{
@@ -1559,6 +1600,9 @@ static int ufshcd_devfreq_target(struct device *dev,
if (!ufshcd_is_clkscaling_supported(hba))
return -EINVAL;
+ /* Select the frequency based on the DEV_PM_QOS_MIN_FREQUENCY */
+ ufshcd_pm_qos_freq_read_value(hba, freq);
+
if (hba->use_pm_opp) {
struct dev_pm_opp *opp;
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread* Re: [PATCH 3/3] scsi: ufs: core: Add support for frequency PM QoS tuning
2025-09-14 11:45 ` [PATCH 3/3] scsi: ufs: core: Add support for " Wang Jianzheng
@ 2025-09-15 21:20 ` Bart Van Assche
0 siblings, 0 replies; 8+ messages in thread
From: Bart Van Assche @ 2025-09-15 21:20 UTC (permalink / raw)
To: Wang Jianzheng, Jens Axboe, Alim Akhtar, Avri Altman,
James E.J. Bottomley, Martin K. Petersen, Rafael J. Wysocki,
Peter Wang, Bean Huo, Bao D. Nguyen, Adrian Hunter, linux-block,
linux-kernel, linux-scsi, linux-pm
On 9/14/25 4:45 AM, Wang Jianzheng wrote:
> + if (!shost)
> + return;
Please remove the above if-statement and make sure that the
initialization order guarantees that hba->host is set before
ufshcd_pm_qos_freq_read_value() can be called.
> + shost_for_each_device(sdev, shost) {
Why a loop over all logical units? Isn't it sufficient to check the
WLUN?
> + if (!sdev)
> + continue;
> +
> + q = sdev->request_queue;
> + if (IS_ERR_OR_NULL(q))
> + continue;
> +
> + if (q->disk && !IS_ERR_OR_NULL(q->dev)) {
The above three if-statements are not necessary. Please remove these
three if-statements to improve code clarity.
Thanks,
Bart.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/3] block: device frequency PM QoS tuning
2025-09-14 11:45 [PATCH 0/3] block: device frequency PM QoS tuning Wang Jianzheng
` (2 preceding siblings ...)
2025-09-14 11:45 ` [PATCH 3/3] scsi: ufs: core: Add support for " Wang Jianzheng
@ 2025-09-15 21:13 ` Bart Van Assche
3 siblings, 0 replies; 8+ messages in thread
From: Bart Van Assche @ 2025-09-15 21:13 UTC (permalink / raw)
To: Wang Jianzheng, Jens Axboe, Alim Akhtar, Avri Altman,
James E.J. Bottomley, Martin K. Petersen, Rafael J. Wysocki,
Peter Wang, Bean Huo, Bao D. Nguyen, Adrian Hunter, linux-block,
linux-kernel, linux-scsi, linux-pm
On 9/14/25 4:45 AM, Wang Jianzheng wrote:
> Now, the patches provided here intruduce a mechanism for the block layer
> to add constraints to device frequecny through PM QoS framework, with
> configurable sysfs knobs per block device. Doing following config in my
> test system:
>
> /sys/block/sda/dev_freq_timeout_ms = 30
>
> This constraints is removed if there is no block IO for 30ms.
Why a new sysfs attribute? Is this attribute really necessary? How are
users expected to determine what value to write into that attribute to
achieve optimal results?
Thanks,
Bart.
^ permalink raw reply [flat|nested] 8+ messages in thread