From: Qais Yousef <qyousef@layalina.io>
To: Xuewen Yan <xuewen.yan@unisoc.com>
Cc: daniel.lezcano@kernel.org, amit.kachhap@gmail.com,
	viresh.kumar@linaro.org, lukasz.luba@arm.com, rafael@kernel.org,
	rui.zhang@intel.com, linux-pm@vger.kernel.org,
	linux-kernel@vger.kernel.org, ke.wang@unisoc.com,
	di.shen@unisoc.com, jeson.gao@unisoc.com, xuewen.yan94@gmail.com,
	Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: [RFC PATCH 2/2] thermal/cpufreq_cooling: Use idle_time to get cpu_load when scx_enabled
Date: Tue, 24 Mar 2026 01:41:47 +0000
Message-ID: <20260324014147.4rnhi3h37kffyrim@airbuntu>
In-Reply-To: <20260320113148.7308-2-xuewen.yan@unisoc.com>

On 03/20/26 19:31, Xuewen Yan wrote:
> From: Di Shen <di.shen@unisoc.com>
> 
> Recently, while enabling sched-ext debugging, we observed abnormal behavior
> in our thermal power_allocator’s temperature control.
> Through debugging, we found that the CPU util was too low, causing
> the CPU frequency to remain unrestricted.
> 
> This issue stems from the fact that in the sched_cpu_util() function,
> when scx is enabled, cpu_util_cfs becomes zero. As a result,
> the thermal subsystem perceives an extremely low CPU utilization,
> which degrades the effectiveness of the power_allocator’s control.
> 
> However, the scx_cpuperf_target() reflects the targeted performance,
> not the utilisation. We couldn't use it.
> 
> Until a perfect solution is found, using idle_time to get the cpu load
> might be a better approach.
> 
> Co-developed-by: Xuewen Yan <xuewen.yan@unisoc.com>
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> Signed-off-by: Di Shen <di.shen@unisoc.com>
> ---
> Previous discussion:
> https://lore.kernel.org/all/5a5d565b-33ac-4d5c-b0dd-1353324a6117@arm.com/
> 
> ---
>  drivers/thermal/cpufreq_cooling.c | 54 ++++++++++++++++++++-----------
>  1 file changed, 35 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
> index d030dbeb2973..e8fa70a95d00 100644
> --- a/drivers/thermal/cpufreq_cooling.c
> +++ b/drivers/thermal/cpufreq_cooling.c
> @@ -24,6 +24,9 @@
>  #include <linux/units.h>
>  
>  #include "thermal_trace.h"
> +#ifdef CONFIG_SCHED_CLASS_EXT
> +#include "../../kernel/sched/sched.h"
> +#endif

This is a terrible include: a driver should not reach into the scheduler's
private kernel/sched/sched.h.

>  
>  /*
>   * Cooling state <-> CPUFreq frequency
> @@ -72,7 +75,7 @@ struct cpufreq_cooling_device {
>  	struct em_perf_domain *em;
>  	struct cpufreq_policy *policy;
>  	struct thermal_cooling_device_ops cooling_ops;
> -#ifndef CONFIG_SMP
> +#if !defined(CONFIG_SMP) || defined(CONFIG_SCHED_CLASS_EXT)
>  	struct time_in_idle *idle_time;
>  #endif
>  	struct freq_qos_request qos_req;
> @@ -147,23 +150,9 @@ static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev,
>  	return freq;
>  }
>  
> -/**
> - * get_load() - get load for a cpu
> - * @cpufreq_cdev: struct cpufreq_cooling_device for the cpu
> - * @cpu: cpu number
> - *
> - * Return: The average load of cpu @cpu in percentage since this
> - * function was last called.
> - */
> -#ifdef CONFIG_SMP
> -static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu)
> -{
> -	unsigned long util = sched_cpu_util(cpu);
> -
> -	return (util * 100) / arch_scale_cpu_capacity(cpu);
> -}
> -#else /* !CONFIG_SMP */
> -static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu)
> +#if !defined(CONFIG_SMP) || defined(CONFIG_SCHED_CLASS_EXT)
> +static u32 get_load_from_idle_time(struct cpufreq_cooling_device *cpufreq_cdev,
> +				   int cpu)
>  {
>  	u32 load;
>  	u64 now, now_idle, delta_time, delta_idle;
> @@ -183,8 +172,35 @@ static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu)
>  
>  	return load;
>  }
> -#endif /* CONFIG_SMP */
> +#endif /* !defined(CONFIG_SMP) || defined(CONFIG_SCHED_CLASS_EXT) */

More ugly ifdefs

>  
> +/**
> + * get_load() - get load for a cpu
> + * @cpufreq_cdev: struct cpufreq_cooling_device for the cpu
> + * @cpu: cpu number
> + *
> + * Return: The average load of cpu @cpu in percentage since this
> + * function was last called.
> + */
> +#ifndef CONFIG_SMP
> +static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu,
> +		    int cpu_idx)
> +{
> +	return get_load_from_idle_time(cpufreq_cdev, cpu, cpu_idx);
> +}
> +#else /* CONFIG_SMP */
> +static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu)
> +{
> +	unsigned long util;
> +
> +#ifdef CONFIG_SCHED_CLASS_EXT
> +	if (scx_enabled())
> +		return get_load_from_idle_time(cpufreq_cdev, cpu);
> +#endif

Instead of this scx-specific hack, wouldn't it be better to implement this as
a special operation mode? But that raises the question: do we actually need
sched_cpu_util() at all? If it can all be done based on idle time, we could
just remove the dependency on sched_cpu_util().

Ifdefing on scx is a nasty hack. This can be done better, most likely by
decoupling from util altogether if idle time truly is enough. If it is not
enough, then I am not sure this patch solves any problem.
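
To make the idle-time option concrete, here is a rough userspace sketch of
that calculation. It is only an illustration, not the driver code:
struct idle_sample and load_from_idle() are made-up names, and the caller is
assumed to pass a cumulative idle time and a timestamp in the same unit (in
the driver these would come from something like get_cpu_idle_time()).

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-CPU idle bookkeeping. */
struct idle_sample {
	uint64_t idle;      /* cumulative idle time at the previous sample */
	uint64_t timestamp; /* wall time at the previous sample */
};

/*
 * Return the busy percentage (0..100) of one CPU since the previous
 * sample, then store the new sample for the next call.
 */
static uint32_t load_from_idle(struct idle_sample *prev,
			       uint64_t now, uint64_t now_idle)
{
	uint64_t delta_time = now - prev->timestamp;
	uint64_t delta_idle = now_idle - prev->idle;
	uint32_t load;

	/* An empty or fully idle window counts as zero load. */
	if (delta_time <= delta_idle)
		load = 0;
	else
		load = (uint32_t)(100 * (delta_time - delta_idle) / delta_time);

	prev->idle = now_idle;
	prev->timestamp = now;

	return load;
}
```

Starting from a zeroed sample, a window of 1000 time units with 250 of them
idle gives a load of 75. Nothing here depends on the scheduler class, which
is the point of asking whether util is needed at all.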

> +	util = sched_cpu_util(cpu);
> +	return (util * 100) / arch_scale_cpu_capacity(cpu);
> +}
> +#endif /* !CONFIG_SMP */
>  /**
>   * get_dynamic_power() - calculate the dynamic power
>   * @cpufreq_cdev:	&cpufreq_cooling_device for this cdev
> -- 
> 2.25.1
> 

Thread overview: 18+ messages
2026-03-20 11:31 [RFC PATCH 1/2] thermal/cpufreq_cooling: remove unused cpu_idx in get_load() Xuewen Yan
2026-03-20 11:31 ` [RFC PATCH 2/2] thermal/cpufreq_cooling: Use idle_time to get cpu_load when scx_enabled Xuewen Yan
2026-03-24  1:41   ` Qais Yousef [this message]
2026-03-20 12:32 ` [RFC PATCH 1/2] thermal/cpufreq_cooling: remove unused cpu_idx in get_load() Lukasz Luba
2026-03-21  8:48   ` Xuewen Yan
2026-03-23  5:34   ` Viresh Kumar
2026-03-23  9:20     ` Lukasz Luba
2026-03-23 10:41       ` Viresh Kumar
2026-03-23 10:52         ` Lukasz Luba
2026-03-23 11:06           ` Viresh Kumar
2026-03-23 13:25             ` Lukasz Luba
2026-03-24  2:20               ` Xuewen Yan
2026-03-24 10:46                 ` Lukasz Luba
2026-03-24 12:03                   ` Xuewen Yan
2026-03-25  8:31                     ` Lukasz Luba
2026-03-26  9:05                   ` Qais Yousef
2026-03-26  9:21                     ` Lukasz Luba
2026-03-28  8:09                       ` Qais Yousef
