From: Qais Yousef <qyousef@layalina.io>
To: Ingo Molnar <mingo@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Viresh Kumar <viresh.kumar@linaro.org>
Cc: Juri Lelli <juri.lelli@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
John Stultz <jstultz@google.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
"Chen, Yu C" <yu.c.chen@intel.com>,
Thomas Gleixner <tglx@kernel.org>,
linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
Qais Yousef <qyousef@layalina.io>
Subject: [PATCH v2 07/13] sched/fair: util_est: Take into account periodic tasks
Date: Mon, 4 May 2026 02:59:57 +0100
Message-ID: <20260504020003.71306-8-qyousef@layalina.io>
In-Reply-To: <20260504020003.71306-1-qyousef@layalina.io>
The new faster rampup is great for performance, but terrible for power.
We want the faster rampup to apply only to tasks that are transitioning
from one periodic/steady state to another. If a task is stably periodic,
the faster rampup makes no sense: util_avg already describes its
computational demand accurately, so we can rely on it to make accurate
decisions and preserve the power savings that come from being exact with
the resources we give the task (i.e. smaller DVFS headroom).

We detect periodic tasks by tracking util_avg across util_est_update()
calls: util_avg is snapshotted at dequeue, and if it has risen by more
than UTIL_EST_MARGIN since that snapshot, the task is going through a
transition. We rely on util_avg being stable for periodic tasks, with
only small variations around one steady point.
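
To make the check concrete, here is a minimal standalone sketch of the
predicate (the helper name is illustrative, and the UTIL_EST_MARGIN
value is assumed to match current fair.c, i.e. SCHED_CAPACITY_SCALE /
100; the in-tree change is open-coded in the diff below):

	#include <stdbool.h>

	/* Assumption: ~1% of SCHED_CAPACITY_SCALE, as in current fair.c */
	#define SCHED_CAPACITY_SCALE	1024UL
	#define UTIL_EST_MARGIN		(SCHED_CAPACITY_SCALE / 100)

	/*
	 * A task gets the faster util_est rampup only while transitioning:
	 * its current util_avg must have risen more than UTIL_EST_MARGIN
	 * above the value snapshotted at its last dequeue. A stably
	 * periodic task hovers around that snapshot, fails the check, and
	 * keeps the power-friendly behaviour of trusting util_avg.
	 */
	static inline bool task_is_transitioning(unsigned long util_avg,
						 unsigned long util_avg_dequeued)
	{
		return util_avg > util_avg_dequeued &&
		       util_avg - util_avg_dequeued > UTIL_EST_MARGIN;
	}

For example, a task that dequeued at util_avg ~400 and oscillates
within the margin fails the check and keeps the default behaviour,
while a task whose util_avg climbs from 200 at dequeue to 300 while
still running passes it and gets the faster rampup.
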
Signed-off-by: Qais Yousef <qyousef@layalina.io>
---
 include/linux/sched.h |  2 ++
 kernel/sched/core.c   |  2 ++
 kernel/sched/fair.c   | 35 ++++++++++++++++++++++++-----------
 3 files changed, 28 insertions(+), 11 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b61da16861e7..70517497e80b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -907,6 +907,8 @@ struct task_struct {
 	struct uclamp_se uclamp[UCLAMP_CNT];
 #endif
 
+	unsigned long util_avg_dequeued;
+
 	struct sched_statistics stats;
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fe14fd4a2d53..82189bdc85b7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4409,6 +4409,8 @@ static void __sched_fork(u64 clone_flags, struct task_struct *p)
 #endif
 #endif
 
+	p->util_avg_dequeued = 0;
+
 #ifdef CONFIG_SCHEDSTATS
 	/* Even if schedstat is disabled, there should not be garbage */
 	memset(&p->stats, 0, sizeof(p->stats));
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c6363ec5de9d..d9729da3901a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5041,6 +5041,11 @@ static inline unsigned long task_util(struct task_struct *p)
 	return READ_ONCE(p->se.avg.util_avg);
 }
 
+static inline unsigned long task_util_dequeued(struct task_struct *p)
+{
+	return READ_ONCE(p->util_avg_dequeued);
+}
+
 static inline unsigned long task_runnable(struct task_struct *p)
 {
 	return READ_ONCE(p->se.avg.runnable_avg);
@@ -5108,18 +5113,22 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	 * quickly to settle down to our new util_avg.
 	 */
 	if (!task_sleep) {
-		u64 delta = p->se.delta_exec;
-		unsigned int prev_ewma = ewma & ~UTIL_AVG_UNCHANGED;
+		if (task_util(p) > task_util_dequeued(p) &&
+		    task_util(p) - task_util_dequeued(p) > UTIL_EST_MARGIN) {
+			u64 delta = p->se.delta_exec;
+			unsigned int prev_ewma = ewma & ~UTIL_AVG_UNCHANGED;
 
-		do_div(delta, 1000);
-		ewma = approximate_util_avg(prev_ewma, delta);
-		/*
-		 * Keep accumulating delta_exec if it is too small to cause
-		 * a change.
-		 */
-		if (ewma != prev_ewma)
-			p->se.delta_exec = 0;
-		goto done;
+			do_div(delta, 1000);
+			ewma = approximate_util_avg(prev_ewma, delta);
+			/*
+			 * Keep accumulating delta_exec if it is too small to cause
+			 * a change.
+			 */
+			if (ewma != prev_ewma)
+				p->se.delta_exec = 0;
+			goto done_running;
+		}
+		return;
 	} else {
 		p->se.delta_exec = 0;
 	}
@@ -5134,6 +5143,9 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	/* Get utilization at dequeue */
 	dequeued = task_util(p);
 
+	if (!task_on_rq_migrating(p))
+		p->util_avg_dequeued = dequeued;
+
 	/*
 	 * Reset EWMA on utilization increases, the moving average is used only
 	 * to smooth utilization decreases.
@@ -5180,6 +5192,7 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	ewma >>= UTIL_EST_WEIGHT_SHIFT;
 done:
 	ewma |= UTIL_AVG_UNCHANGED;
+done_running:
 	WRITE_ONCE(p->se.avg.util_est, ewma);
 	trace_sched_util_est_se_tp(&p->se);
 
--
2.34.1