From: Qais Yousef <qyousef@layalina.io>
To: Xuewen Yan <xuewen.yan@unisoc.com>
Cc: daniel.lezcano@kernel.org, amit.kachhap@gmail.com,
viresh.kumar@linaro.org, lukasz.luba@arm.com, rafael@kernel.org,
rui.zhang@intel.com, linux-pm@vger.kernel.org,
linux-kernel@vger.kernel.org, ke.wang@unisoc.com,
di.shen@unisoc.com, jeson.gao@unisoc.com, xuewen.yan94@gmail.com,
Peter Zijlstra <peterz@infradead.org>,
Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: [RFC PATCH 2/2] thermal/cpufreq_cooling: Use idle_time to get cpu_load when scx_enabled
Date: Tue, 24 Mar 2026 01:41:47 +0000
Message-ID: <20260324014147.4rnhi3h37kffyrim@airbuntu>
In-Reply-To: <20260320113148.7308-2-xuewen.yan@unisoc.com>

On 03/20/26 19:31, Xuewen Yan wrote:
> From: Di Shen <di.shen@unisoc.com>
>
> Recently, while enabling sched-ext debugging, we observed abnormal behavior
> in our thermal power_allocator’s temperature control.
> Through debugging, we found that the CPU util was too low, causing
> the CPU frequency to remain unrestricted.
>
> This issue stems from the fact that in the sched_cpu_util() function,
> when scx is enabled, cpu_util_cfs becomes zero. As a result,
> the thermal subsystem perceives an extremely low CPU utilization,
> which degrades the effectiveness of the power_allocator’s control.
>
> However, the scx_cpuperf_target() reflects the targeted performance,
> not the utilisation. We couldn't use it.
>
> Until a perfect solution is found, using idle_time to get the cpu load
> might be a better approach.
>
> Co-developed-by: Xuewen Yan <xuewen.yan@unisoc.com>
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> Signed-off-by: Di Shen <di.shen@unisoc.com>
> ---
> Previous discussion:
> https://lore.kernel.org/all/5a5d565b-33ac-4d5c-b0dd-1353324a6117@arm.com/
>
> ---
> drivers/thermal/cpufreq_cooling.c | 54 ++++++++++++++++++++-----------
> 1 file changed, 35 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
> index d030dbeb2973..e8fa70a95d00 100644
> --- a/drivers/thermal/cpufreq_cooling.c
> +++ b/drivers/thermal/cpufreq_cooling.c
> @@ -24,6 +24,9 @@
> #include <linux/units.h>
>
> #include "thermal_trace.h"
> +#ifdef CONFIG_SCHED_CLASS_EXT
> +#include "../../kernel/sched/sched.h"
> +#endif

This is a terrible include. A driver should not reach into the scheduler's
private header.

>
> /*
> * Cooling state <-> CPUFreq frequency
> @@ -72,7 +75,7 @@ struct cpufreq_cooling_device {
> struct em_perf_domain *em;
> struct cpufreq_policy *policy;
> struct thermal_cooling_device_ops cooling_ops;
> -#ifndef CONFIG_SMP
> +#if !defined(CONFIG_SMP) || defined(CONFIG_SCHED_CLASS_EXT)
> struct time_in_idle *idle_time;
> #endif
> struct freq_qos_request qos_req;
> @@ -147,23 +150,9 @@ static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev,
> return freq;
> }
>
> -/**
> - * get_load() - get load for a cpu
> - * @cpufreq_cdev: struct cpufreq_cooling_device for the cpu
> - * @cpu: cpu number
> - *
> - * Return: The average load of cpu @cpu in percentage since this
> - * function was last called.
> - */
> -#ifdef CONFIG_SMP
> -static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu)
> -{
> - unsigned long util = sched_cpu_util(cpu);
> -
> - return (util * 100) / arch_scale_cpu_capacity(cpu);
> -}
> -#else /* !CONFIG_SMP */
> -static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu)
> +#if !defined(CONFIG_SMP) || defined(CONFIG_SCHED_CLASS_EXT)
> +static u32 get_load_from_idle_time(struct cpufreq_cooling_device *cpufreq_cdev,
> + int cpu)
> {
> u32 load;
> u64 now, now_idle, delta_time, delta_idle;
> @@ -183,8 +172,35 @@ static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu)
>
> return load;
> }
> -#endif /* CONFIG_SMP */
> +#endif /* !defined(CONFIG_SMP) || defined(CONFIG_SCHED_CLASS_EXT) */

More ugly ifdefs.

>
> +/**
> + * get_load() - get load for a cpu
> + * @cpufreq_cdev: struct cpufreq_cooling_device for the cpu
> + * @cpu: cpu number
> + *
> + * Return: The average load of cpu @cpu in percentage since this
> + * function was last called.
> + */
> +#ifndef CONFIG_SMP
> +static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu,
> + int cpu_idx)
> +{
> + return get_load_from_idle_time(cpufreq_cdev, cpu, cpu_idx);
> +}
> +#else /* CONFIG_SMP */
> +static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu)
> +{
> + unsigned long util;
> +
> +#ifdef CONFIG_SCHED_CLASS_EXT
> + if (scx_enabled())
> + return get_load_from_idle_time(cpufreq_cdev, cpu);
> +#endif

Instead of this scx-specific hack, wouldn't it be better to implement this as
a special operation mode? But that raises the question: do we actually need
sched_cpu_util() at all? If everything can be done based on idle time, we
could just remove the dependency on sched_cpu_util().

Ifdefing based on scx is a nasty hack. This can be done better, most likely by
decoupling from util entirely if idle time is truly enough. If it is not
enough, then I am not sure this patch solves any problem.

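To make the idle-time option concrete, here is a rough userspace sketch of the
calculation, loosely modelled on what the existing !CONFIG_SMP path appears to
do. It is only an illustration under stated assumptions: struct idle_snapshot
and load_from_idle_time() are hypothetical names, and the caller is assumed to
supply the current timestamp and cumulative idle time itself (in the kernel
these would come from the per-CPU idle accounting).

```c
#include <stdint.h>

/* Hypothetical per-CPU snapshot, loosely modelled on struct time_in_idle. */
struct idle_snapshot {
	uint64_t time;      /* timestamp of the previous sample, in us */
	uint64_t idle_time; /* cumulative idle time at that sample, in us */
};

/*
 * Return the busy percentage (0..100) of the CPU since the previous
 * sample, then store the new sample in @prev for the next call.
 *
 * @now and @now_idle are assumed to be monotonic counters provided by
 * the caller; no scheduler utilisation signal is involved.
 */
static uint32_t load_from_idle_time(struct idle_snapshot *prev,
				    uint64_t now, uint64_t now_idle)
{
	uint64_t delta_time = now - prev->time;
	uint64_t delta_idle = now_idle - prev->idle_time;
	uint32_t load;

	/* No time elapsed, or fully idle: report zero load. */
	if (delta_time == 0 || delta_idle >= delta_time)
		load = 0;
	else
		load = (uint32_t)(100 * (delta_time - delta_idle) / delta_time);

	prev->time = now;
	prev->idle_time = now_idle;

	return load;
}
```

For example, if 1000 us elapsed since the last sample and 250 us of that was
idle, this reports a load of 75. Note that this is the fraction of time the
CPU was busy, which is not the same quantity as the capacity-scaled
utilisation from sched_cpu_util(), so whether it is enough for
power_allocator is exactly the open question.
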
> + util = sched_cpu_util(cpu);
> + return (util * 100) / arch_scale_cpu_capacity(cpu);
> +}
> +#endif /* !CONFIG_SMP */
> /**
> * get_dynamic_power() - calculate the dynamic power
> * @cpufreq_cdev: &cpufreq_cooling_device for this cdev
> --
> 2.25.1
>