Linux Power Management development
* [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time
@ 2026-05-04  1:59 Qais Yousef
From: Qais Yousef @ 2026-05-04  1:59 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Rafael J. Wysocki,
	Viresh Kumar
  Cc: Juri Lelli, Steven Rostedt, John Stultz, Dietmar Eggemann,
	Tim Chen, Chen, Yu C, Thomas Gleixner, linux-kernel, linux-pm,
	Qais Yousef

This is the long-delayed follow-up to the series sent back in August 2024 [1].
Life got in the way to some extent (I had a baby, and the late-night time
I used to spend on upstream work was stolen :). Apologies to those who replied
and to whom I didn't get a chance to respond.

The series is now rebased on top of tip/sched/core 78cde54ea5f0. I removed
a number of optimization patches that are not necessary for this initial merge
and can be treated as their own separate topics once this is hopefully
accepted.

I discussed the problem at LPC 2024 [2] and the initial cover letter contains
all the details. I hope all the key parties are up to date on the problem
details by now.

As a brief recap, there are some hardcoded constants in the kernel that
introduce a bias that frequently fails to deliver the best outcome on various
systems.  It turns out these constants seem to help somewhat against a bigger
problem: utilization invariance distorts the utilization signal, causing what
I call the black hole effect. The lower the capacity, the harder it is to
accumulate the runtime needed for the signal to rise, so low capacity acts
like a gravitational pull causing time dilation.
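
To make the time dilation concrete, here is a minimal sketch of a simplified
model. It is not code from the series. It assumes a 32 ms PELT half-life, an
always-running task starting from util 0, and a PELT clock that advances at
cap/1024 of wall-clock time. rampup_wall_ms() and PELT_Y are hypothetical
names used only for illustration.

```c
#include <assert.h>

/* Per-period decay factor, chosen so that PELT_Y^32 is roughly 0.5. */
#define PELT_Y 0.97857206

/*
 * Simplified model (an assumption, not kernel code): util accumulates in
 * PELT time, and PELT time advances at cap/1024 of wall-clock time.
 * Returns the wall-clock ms an always-running task needs to reach target.
 */
unsigned int rampup_wall_ms(unsigned int cap, unsigned int target)
{
	double util = 0.0;
	unsigned int periods = 0;

	/* util converges towards 1024, so target must stay below it. */
	assert(cap > 0 && cap <= 1024 && target < 1024);

	while (util < target) {
		util = util * PELT_Y + 1024.0 * (1.0 - PELT_Y);
		periods++;
	}

	/* Each ~1 ms PELT period costs 1024/cap ms of wall-clock time. */
	return periods * 1024 / cap;
}
```

In this model, halving the capacity doubles the wall-clock time needed to
reach the same util value, e.g. compare rampup_wall_ms(1024, 800) with
rampup_wall_ms(512, 800).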

One of the major difficulties we will face is that this distortion turns out
to be bad for performance but good for power. The fix will inevitably
rebalance the system. It does so in the right way, but also in a way that may
surprise some and leave them unhappy. sched_features were added to ensure
those unhappy folks can revert the system to the old behavior while still
allowing us to make the right progress.

That is, to retain the older behavior one must:

	echo 0 | sudo tee /proc/sys/kernel/sched_qos_default_rampup_multiplier
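	echo CONST_DVFS_HEADROOM | sudo tee /sys/kernel/debug/sched/features
	echo NO_UTIL_EST_RAMPUP_ZERO | sudo tee /sys/kernel/debug/sched/features
	echo UTIL_EST_FORCE_POST_INIT | sudo tee /sys/kernel/debug/sched/features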

Note that there is no sched feature for the migration margin, since I think
the old behavior was worse for both perf and power and there is no need to
revert to it.

The system is going to be a lot faster now by default with
sched_qos_default_rampup_multiplier=1 since it fixes the distortion issue and
provides a constant rise time regardless of DVFS latencies.

The desired behavior is for the default rampup_multiplier to be 0 and for only
interactive tasks to request a higher rampup multiplier. Preliminary
integration with schedqos is available [3] for those who want to see the full
benefit of fine-grained control to manage perf and power.

Open questions:

* The details of the QoS interface are the biggest one.
* Would debugfs be better for setting the default rampup multiplier instead of sysctl?
* Patch 13 makes updating load_avg unconditional rather than only on period boundaries.

Patches 1-3 are preparatory patches renaming a function and introducing new ones.

Patches 4-5 handle the magic margin problem by making the margins dynamic
based on actual hardware limitations.
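
As a rough illustration of the difference between a constant and a dynamic
margin, the sketch below contrasts the 1.25 factor named in patch 5 with
a headroom obtained by projecting util forward over a delay. This is not the
implementation in the patches. projected_headroom(), its delay_ms parameter
and the 32 ms half-life model are assumptions made for the example.

```c
#include <assert.h>

#define PELT_Y			0.97857206	/* PELT_Y^32 is roughly 0.5 */
#define SCHED_CAPACITY_SCALE	1024

/* Constant headroom: util * 1.25, written as util + util / 4. */
unsigned long const_headroom(unsigned long util)
{
	return util + (util >> 2);
}

/*
 * Hypothetical dynamic headroom: the util value an always-running task
 * would reach after delay_ms, e.g. the time the hardware needs before
 * the next frequency change can take effect (an assumed interpretation).
 */
unsigned long projected_headroom(unsigned long util, unsigned int delay_ms)
{
	double u = (double)util;

	assert(util <= SCHED_CAPACITY_SCALE);

	for (unsigned int i = 0; i < delay_ms; i++)
		u = u * PELT_Y + SCHED_CAPACITY_SCALE * (1.0 - PELT_Y);

	return (unsigned long)u;
}
```

With a short delay the projected value stays well below util * 1.25, and it
never exceeds SCHED_CAPACITY_SCALE, whereas the constant factor overshoots
the scale for high util values.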

Patches 6-7 fix the black hole problem and teach the scheduler how to handle
bursty and periodic tasks by extending util_est.

Patches 8-9 are where I expect most of the discussion, as I introduce a new
sched_qos interface to support the new rampup_multiplier to help manage DVFS.

Patches 10-11 introduce a couple of necessary optimizations to counter the
power impact of increased responsiveness by disabling some features that we now
know how to handle better.

Patches 12-13 fix a couple of issues causing the util_est and util_avg values
to swing for a periodic task. Patch 12 must go via stable.

My Mac mini M1 system, on which I did the earlier testing, is down and it
proved difficult to revive before sending this series. I will revive it and
repeat the testing to ensure all is okay after the rebase.

I did test it on an AMD system, but it has only 3 freqs so there are no real
perf numbers to report since it just whizzes through these 3 freqs anyway. But
I did spend enough time to verify that util_est behaves as expected under
different scenarios. More testing would still be appreciated :)

[1] https://lore.kernel.org/lkml/20240820163512.1096301-1-qyousef@layalina.io/
[2] https://lpc.events/event/18/contributions/1880/
[3] https://github.com/qais-yousef/schedqos/compare/main...schedqos

Qais Yousef (13):
  sched: cpufreq: Rename map_util_perf to sugov_apply_dvfs_headroom
  sched/pelt: Add a new function to approximate the future util_avg
    value
  sched/pelt: Add a new function to approximate runtime to reach given
    util
  sched/fair: Remove magic hardcoded margin in fits_capacity()
  sched: cpufreq: Remove magic 1.25 headroom from
    sugov_apply_dvfs_headroom()
  sched/fair: Extend util_est to improve rampup time
  sched/fair: util_est: Take into account periodic tasks
  sched/qos: Add a new sched-qos interface
  sched/qos: Add rampup multiplier QoS
  sched/fair: Disable util_est when rampup_multiplier is 0
  sched/fair: Don't mess with util_avg post init
  sched/fair: Call update_util_est() after dequeue_entities()
  sched/pelt: Always allow load updates

 Documentation/scheduler/index.rst             |   1 +
 Documentation/scheduler/sched-qos.rst         |  66 ++++++++++
 include/linux/sched.h                         |  10 ++
 include/linux/sched/cpufreq.h                 |   5 -
 include/uapi/linux/sched.h                    |  10 +-
 include/uapi/linux/sched/types.h              |  46 +++++++
 kernel/sched/core.c                           |  71 ++++++++++
 kernel/sched/cpufreq_schedutil.c              |  49 ++++++-
 kernel/sched/debug.c                          |   1 +
 kernel/sched/fair.c                           | 124 ++++++++++++++++--
 kernel/sched/features.h                       |  21 +++
 kernel/sched/pelt.c                           |  44 ++++++-
 kernel/sched/sched.h                          |  12 ++
 kernel/sched/syscalls.c                       |  61 +++++++++
 .../trace/beauty/include/uapi/linux/sched.h   |   4 +
 15 files changed, 501 insertions(+), 24 deletions(-)
 create mode 100644 Documentation/scheduler/sched-qos.rst

-- 
2.34.1



end of thread, other threads:[~2026-05-11 11:08 UTC | newest]

Thread overview: 20+ messages
2026-05-04  1:59 [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time Qais Yousef
2026-05-04  1:59 ` [PATCH v2 01/13] sched: cpufreq: Rename map_util_perf to sugov_apply_dvfs_headroom Qais Yousef
2026-05-04  1:59 ` [PATCH v2 02/13] sched/pelt: Add a new function to approximate the future util_avg value Qais Yousef
2026-05-04  1:59 ` [PATCH v2 03/13] sched/pelt: Add a new function to approximate runtime to reach given util Qais Yousef
2026-05-04  1:59 ` [PATCH v2 04/13] sched/fair: Remove magic hardcoded margin in fits_capacity() Qais Yousef
2026-05-04  1:59 ` [PATCH v2 05/13] sched: cpufreq: Remove magic 1.25 headroom from sugov_apply_dvfs_headroom() Qais Yousef
2026-05-04  1:59 ` [PATCH v2 06/13] sched/fair: Extend util_est to improve rampup time Qais Yousef
2026-05-04  1:59 ` [PATCH v2 07/13] sched/fair: util_est: Take into account periodic tasks Qais Yousef
2026-05-04  1:59 ` [PATCH v2 RFC 08/13] sched/qos: Add a new sched-qos interface Qais Yousef
2026-05-06 20:38   ` Tim Chen
2026-05-07  9:55     ` Qais Yousef
2026-05-07 14:20       ` Chen, Yu C
2026-05-09  9:39         ` Qais Yousef
2026-05-11 10:57   ` Peter Zijlstra
2026-05-04  1:59 ` [PATCH v2 09/13] sched/qos: Add rampup multiplier QoS Qais Yousef
2026-05-11 11:03   ` Peter Zijlstra
2026-05-04  2:00 ` [PATCH v2 10/13] sched/fair: Disable util_est when rampup_multiplier is 0 Qais Yousef
2026-05-04  2:00 ` [PATCH v2 11/13] sched/fair: Don't mess with util_avg post init Qais Yousef
2026-05-04  2:00 ` [PATCH v2 12/13] sched/fair: Call update_util_est() after dequeue_entities() Qais Yousef
2026-05-04  2:00 ` [PATCH v2 RFC 13/13] sched/pelt: Always allow load updates Qais Yousef
