* [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
@ 2026-03-18 12:17 Xuewen Yan
2026-03-18 12:47 ` Peter Zijlstra
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Xuewen Yan @ 2026-03-18 12:17 UTC (permalink / raw)
To: peterz, mingo, juri.lelli, vincent.guittot, tj
Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
lukasz.luba, linux-kernel, rui.zhang, di.shen, ke.wang,
xuewen.yan94
Recently, while enabling sched-ext debugging, we observed abnormal behavior
in our thermal power_allocator’s temperature control.
Through debugging, we found that the reported CPU utilization was too low,
causing the CPU frequency to remain unrestricted.
This issue stems from the fact that in the sched_cpu_util() function,
when scx is enabled, cpu_util_cfs becomes zero. As a result,
the thermal subsystem perceives an extremely low CPU utilization,
which degrades the effectiveness of the power_allocator’s control.
To address this, we propose adding scx_cpuperf_target() in sched_cpu_util()
as a replacement for cpu_util_cfs(), ensuring that the thermal subsystem receives
accurate load information and restores proper control behavior.
Reported-by: Di Shen <di.shen@unisoc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
---
kernel/sched/fair.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf948db905ed..20adb6fede2a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
unsigned long sched_cpu_util(int cpu)
{
- return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
+ unsigned long util = scx_cpuperf_target(cpu);
+
+ if (!scx_switched_all())
+ util += cpu_util_cfs(cpu);
+
+ return effective_cpu_util(cpu, util, NULL, NULL);
}
/*
--
2.25.1
^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-18 12:17 [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util() Xuewen Yan
@ 2026-03-18 12:47 ` Peter Zijlstra
  2026-03-18 12:55   ` Vincent Guittot
    ` (2 more replies)
  2026-03-18 12:54 ` Christian Loehle
  2026-03-19  1:21 ` Tejun Heo
  2 siblings, 3 replies; 22+ messages in thread
From: Peter Zijlstra @ 2026-03-18 12:47 UTC (permalink / raw)
To: Xuewen Yan
Cc: mingo, juri.lelli, vincent.guittot, tj, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, lukasz.luba, linux-kernel,
	rui.zhang, di.shen, ke.wang, xuewen.yan94

On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bf948db905ed..20adb6fede2a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>
>  unsigned long sched_cpu_util(int cpu)
>  {
> -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> +	unsigned long util = scx_cpuperf_target(cpu);
> +
> +	if (!scx_switched_all())
> +		util += cpu_util_cfs(cpu);
> +
> +	return effective_cpu_util(cpu, util, NULL, NULL);
>  }

This puts the common case of no ext muck into the slow path of that
static_branch.

This wants to be something like:

unsigned long sched_cpu_util(int cpu)
{
	unsigned long util = cpu_util_cfs(cpu);

	if (scx_enabled()) {
		unsigned long scx_util = scx_cpuperf_target(cpu);

		if (!scx_switched_all())
			scx_util += util;

		util = scx_util;
	}

	return effective_cpu_util(cpu, util, NULL, NULL);
}

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-18 12:47 ` Peter Zijlstra
@ 2026-03-18 12:55   ` Vincent Guittot
  2026-03-18 13:44     ` Qais Yousef
  2026-03-18 13:03   ` [PATCH] sched/cpufreq: Reorder so non-SCX is common path Christian Loehle
  2026-03-19  1:08   ` [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util() Tejun Heo
  2 siblings, 1 reply; 22+ messages in thread
From: Vincent Guittot @ 2026-03-18 12:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Xuewen Yan, mingo, juri.lelli, tj, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, lukasz.luba, linux-kernel, rui.zhang,
	di.shen, ke.wang, xuewen.yan94

On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index bf948db905ed..20adb6fede2a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> >
> >  unsigned long sched_cpu_util(int cpu)
> >  {
> > -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > +	unsigned long util = scx_cpuperf_target(cpu);
> > +
> > +	if (!scx_switched_all())
> > +		util += cpu_util_cfs(cpu);
> > +
> > +	return effective_cpu_util(cpu, util, NULL, NULL);
> >  }
>
> This puts the common case of no ext muck into the slow path of that
> static_branch.

+1
I was about to say the same

>
> This wants to be something like:
>
> unsigned long sched_cpu_util(int cpu)
> {
> 	unsigned long util = cpu_util_cfs(cpu);
>
> 	if (scx_enabled()) {
> 		unsigned long scx_util = scx_cpuperf_target(cpu);

also scx_cpuperf_target() does not reflect the utilization of the CPU
but the targeted performance level

>
> 		if (!scx_switched_all())
> 			scx_util += util;
>
> 		util = scx_util;
> 	}
>
> 	return effective_cpu_util(cpu, util, NULL, NULL);
> }

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-18 12:55 ` Vincent Guittot
@ 2026-03-18 13:44   ` Qais Yousef
  2026-03-19  2:13     ` Xuewen Yan
  0 siblings, 1 reply; 22+ messages in thread
From: Qais Yousef @ 2026-03-18 13:44 UTC (permalink / raw)
To: Vincent Guittot
Cc: Peter Zijlstra, Xuewen Yan, mingo, juri.lelli, tj,
	dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	lukasz.luba, linux-kernel, rui.zhang, di.shen, ke.wang,
	xuewen.yan94

On 03/18/26 13:55, Vincent Guittot wrote:
> On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index bf948db905ed..20adb6fede2a 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > >
> > >  unsigned long sched_cpu_util(int cpu)
> > >  {
> > > -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > +	unsigned long util = scx_cpuperf_target(cpu);
> > > +
> > > +	if (!scx_switched_all())
> > > +		util += cpu_util_cfs(cpu);
> > > +
> > > +	return effective_cpu_util(cpu, util, NULL, NULL);
> > >  }
> >
> > This puts the common case of no ext muck into the slow path of that
> > static_branch.
>
> +1
> I was about to say the same
>
> >
> > This wants to be something like:
> >
> > unsigned long sched_cpu_util(int cpu)
> > {
> > 	unsigned long util = cpu_util_cfs(cpu);
> >
> > 	if (scx_enabled()) {
> > 		unsigned long scx_util = scx_cpuperf_target(cpu);
>
> also scx_cpuperf_target() does not reflect the utilization of the CPU
> but the targeted performance level

Besides that, this sort of plug-and-play is a big concern. You picked up sched
ext and changed the behavior, then you'd need to get your thermal management to
work with that. Not retrospectively sprinkle these hacks around to force things
to work again.

This is a no from me. I think we have to keep the separation clear. And
I haven't seen a single contribution back to the scheduler out of these
'experiments'. Clearly everyone is going their own way, and raising the
threshold for us to get patches merged so as not to break these out-of-tree
'experiments' is a big nuisance. I think the sugov one was already a mistake
to accept.

> >
> > 		if (!scx_switched_all())
> > 			scx_util += util;
> >
> > 		util = scx_util;
> > 	}
> >
> > 	return effective_cpu_util(cpu, util, NULL, NULL);
> > }

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-18 13:44 ` Qais Yousef
@ 2026-03-19  2:13   ` Xuewen Yan
  2026-03-19  7:09     ` Vincent Guittot
    ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Xuewen Yan @ 2026-03-19 2:13 UTC (permalink / raw)
To: Qais Yousef
Cc: Vincent Guittot, Peter Zijlstra, Xuewen Yan, mingo, juri.lelli,
	tj, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	lukasz.luba, linux-kernel, rui.zhang, di.shen, ke.wang,
	Christian Loehle

On Wed, Mar 18, 2026 at 9:44 PM Qais Yousef <qyousef@layalina.io> wrote:
>
> On 03/18/26 13:55, Vincent Guittot wrote:
> > On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index bf948db905ed..20adb6fede2a 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > > >
> > > >  unsigned long sched_cpu_util(int cpu)
> > > >  {
> > > > -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > > +	unsigned long util = scx_cpuperf_target(cpu);
> > > > +
> > > > +	if (!scx_switched_all())
> > > > +		util += cpu_util_cfs(cpu);
> > > > +
> > > > +	return effective_cpu_util(cpu, util, NULL, NULL);
> > > >  }
> > >
> > > This puts the common case of no ext muck into the slow path of that
> > > static_branch.
> >
> > +1
> > I was about to say the same
> >
> > >
> > > This wants to be something like:
> > >
> > > unsigned long sched_cpu_util(int cpu)
> > > {
> > > 	unsigned long util = cpu_util_cfs(cpu);
> > >
> > > 	if (scx_enabled()) {
> > > 		unsigned long scx_util = scx_cpuperf_target(cpu);
> >
> > also scx_cpuperf_target() does not reflect the utilization of the CPU
> > but the targeted performance level
>
> Besides that, this sort of plug-and-play is a big concern. You picked up sched
> ext and changed the behavior, then you'd need to get your thermal management to
> work with that. Not retrospectively sprinkle these hacks around to force things
> to work again.

In fact, I had considered this issue even before sending this patch.
Our initial fix was to modify the thermal subsystem specifically:
in cpufreq_cooling.c, when SCX is enabled, we stopped calling
sched_cpu_util() and instead used an idle-time-based calculation to
obtain CPU utilization.

However, we later realized that sched_cpu_util() is used not only in
cpufreq_cooling.c but also in dtpm_cpu.c.
Since I’m not familiar with dtpm_cpu.c, I was hesitant to modify it.
This led us to propose the current patch.
Although scx_cpuperf_target may not accurately reflect the true CPU
utilization, as Christian pointed out, it’s still better than having
nothing at all.

That’s exactly why I marked this patch as RFC: I wasn’t sure whether
the proper fix should be in cpufreq_cooling.c or in sched_cpu_util().
It now seems clear that modifying only sched_cpu_util() is
insufficient. Ideally, we should also update cpufreq_cooling.c, since
we cannot guarantee that all SCX BPF programs will provide an accurate
cpuperf_target.
I’ll submit a follow-up patch to modify cpufreq_cooling.c shortly.

Thanks!

>
> This is a no from me. I think we have to keep the separation clear. And
> I haven't seen a single contribution back to the scheduler out of these
> 'experiments'. Clearly everyone is going their own way, and raising the
> threshold for us to get patches merged so as not to break these out-of-tree
> 'experiments' is a big nuisance. I think the sugov one was already a mistake
> to accept.
>
> >
> > >
> > > 		if (!scx_switched_all())
> > > 			scx_util += util;
> > >
> > > 		util = scx_util;
> > > 	}
> > >
> > > 	return effective_cpu_util(cpu, util, NULL, NULL);
> > > }

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-19  2:13 ` Xuewen Yan
@ 2026-03-19  7:09   ` Vincent Guittot
  2026-03-19 10:18   ` Lukasz Luba
  2026-03-24  1:32   ` Qais Yousef
  2 siblings, 0 replies; 22+ messages in thread
From: Vincent Guittot @ 2026-03-19 7:09 UTC (permalink / raw)
To: Xuewen Yan
Cc: Qais Yousef, Peter Zijlstra, Xuewen Yan, mingo, juri.lelli, tj,
	dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	lukasz.luba, linux-kernel, rui.zhang, di.shen, ke.wang,
	Christian Loehle

On Thu, 19 Mar 2026 at 03:13, Xuewen Yan <xuewen.yan94@gmail.com> wrote:
>
> On Wed, Mar 18, 2026 at 9:44 PM Qais Yousef <qyousef@layalina.io> wrote:
> >
> > On 03/18/26 13:55, Vincent Guittot wrote:
> > > On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > > index bf948db905ed..20adb6fede2a 100644
> > > > > --- a/kernel/sched/fair.c
> > > > > +++ b/kernel/sched/fair.c
> > > > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > > > >
> > > > >  unsigned long sched_cpu_util(int cpu)
> > > > >  {
> > > > > -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > > > +	unsigned long util = scx_cpuperf_target(cpu);
> > > > > +
> > > > > +	if (!scx_switched_all())
> > > > > +		util += cpu_util_cfs(cpu);
> > > > > +
> > > > > +	return effective_cpu_util(cpu, util, NULL, NULL);
> > > > >  }
> > > >
> > > > This puts the common case of no ext muck into the slow path of that
> > > > static_branch.
> > >
> > > +1
> > > I was about to say the same
> > >
> > > >
> > > > This wants to be something like:
> > > >
> > > > unsigned long sched_cpu_util(int cpu)
> > > > {
> > > > 	unsigned long util = cpu_util_cfs(cpu);
> > > >
> > > > 	if (scx_enabled()) {
> > > > 		unsigned long scx_util = scx_cpuperf_target(cpu);
> > >
> > > also scx_cpuperf_target() does not reflect the utilization of the CPU
> > > but the targeted performance level
> >
> > Besides that, this sort of plug-and-play is a big concern. You picked up sched
> > ext and changed the behavior, then you'd need to get your thermal management to
> > work with that. Not retrospectively sprinkle these hacks around to force things
> > to work again.
>
> In fact, I had considered this issue even before sending this patch.
> Our initial fix was to modify the thermal subsystem specifically:
> in cpufreq_cooling.c, when SCX is enabled, we stopped calling
> sched_cpu_util() and instead used an idle-time-based calculation to
> obtain CPU utilization.
>
> However, we later realized that sched_cpu_util() is used not only in
> cpufreq_cooling.c but also in dtpm_cpu.c.
> Since I’m not familiar with dtpm_cpu.c, I was hesitant to modify it.
> This led us to propose the current patch.
> Although scx_cpuperf_target may not accurately reflect the true CPU
> utilization, as Christian pointed out, it’s still better than having
> nothing at all.

scx_cpuperf_target() reflects the targeted performance level, not the
utilisation. This is good for cpufreq but not here, as you can have a
high performance target with low cpu utilization. You must provide the
right value

>
> That’s exactly why I marked this patch as RFC: I wasn’t sure whether
> the proper fix should be in cpufreq_cooling.c or in sched_cpu_util().
> It now seems clear that modifying only sched_cpu_util() is
> insufficient. Ideally, we should also update cpufreq_cooling.c, since
> we cannot guarantee that all SCX BPF programs will provide an accurate
> cpuperf_target.
> I’ll submit a follow-up patch to modify cpufreq_cooling.c shortly.
>
> Thanks!
>
> >
> > This is a no from me. I think we have to keep the separation clear. And
> > I haven't seen a single contribution back to the scheduler out of these
> > 'experiments'. Clearly everyone is going their own way, and raising the
> > threshold for us to get patches merged so as not to break these out-of-tree
> > 'experiments' is a big nuisance. I think the sugov one was already a mistake
> > to accept.
> >
> > >
> > > >
> > > > 		if (!scx_switched_all())
> > > > 			scx_util += util;
> > > >
> > > > 		util = scx_util;
> > > > 	}
> > > >
> > > > 	return effective_cpu_util(cpu, util, NULL, NULL);
> > > > }

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-19  2:13 ` Xuewen Yan
  2026-03-19  7:09 ` Vincent Guittot
@ 2026-03-19 10:18 ` Lukasz Luba
  2026-03-24  1:32 ` Qais Yousef
  2 siblings, 0 replies; 22+ messages in thread
From: Lukasz Luba @ 2026-03-19 10:18 UTC (permalink / raw)
To: Xuewen Yan, Qais Yousef
Cc: Vincent Guittot, Peter Zijlstra, Xuewen Yan, mingo, juri.lelli,
	tj, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	linux-kernel, rui.zhang, di.shen, ke.wang, Christian Loehle

Hi Qais and Xuewen,

On 3/19/26 02:13, Xuewen Yan wrote:
> On Wed, Mar 18, 2026 at 9:44 PM Qais Yousef <qyousef@layalina.io> wrote:
>>
>> On 03/18/26 13:55, Vincent Guittot wrote:
>>> On Wed, 18 Mar 2026 at 13:47, Peter Zijlstra <peterz@infradead.org> wrote:
>>>>
>>>> On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index bf948db905ed..20adb6fede2a 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>>>>>
>>>>>  unsigned long sched_cpu_util(int cpu)
>>>>>  {
>>>>> -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
>>>>> +	unsigned long util = scx_cpuperf_target(cpu);
>>>>> +
>>>>> +	if (!scx_switched_all())
>>>>> +		util += cpu_util_cfs(cpu);
>>>>> +
>>>>> +	return effective_cpu_util(cpu, util, NULL, NULL);
>>>>>  }
>>>>
>>>> This puts the common case of no ext muck into the slow path of that
>>>> static_branch.
>>>
>>> +1
>>> I was about to say the same
>>>
>>>>
>>>> This wants to be something like:
>>>>
>>>> unsigned long sched_cpu_util(int cpu)
>>>> {
>>>> 	unsigned long util = cpu_util_cfs(cpu);
>>>>
>>>> 	if (scx_enabled()) {
>>>> 		unsigned long scx_util = scx_cpuperf_target(cpu);
>>>
>>> also scx_cpuperf_target() does not reflect the utilization of the CPU
>>> but the targeted performance level
>>
>> Besides that, this sort of plug-and-play is a big concern. You picked up sched
>> ext and changed the behavior, then you'd need to get your thermal management to
>> work with that. Not retrospectively sprinkle these hacks around to force things
>> to work again.
>
> In fact, I had considered this issue even before sending this patch.
> Our initial fix was to modify the thermal subsystem specifically:
> in cpufreq_cooling.c, when SCX is enabled, we stopped calling
> sched_cpu_util() and instead used an idle-time-based calculation to
> obtain CPU utilization.
>
> However, we later realized that sched_cpu_util() is used not only in
> cpufreq_cooling.c but also in dtpm_cpu.c.
> Since I’m not familiar with dtpm_cpu.c, I was hesitant to modify it.
> This led us to propose the current patch.
> Although scx_cpuperf_target may not accurately reflect the true CPU
> utilization, as Christian pointed out, it’s still better than having
> nothing at all.
>
> That’s exactly why I marked this patch as RFC: I wasn’t sure whether
> the proper fix should be in cpufreq_cooling.c or in sched_cpu_util().
> It now seems clear that modifying only sched_cpu_util() is
> insufficient. Ideally, we should also update cpufreq_cooling.c, since
> we cannot guarantee that all SCX BPF programs will provide an accurate
> cpuperf_target.
> I’ll submit a follow-up patch to modify cpufreq_cooling.c shortly.
>

If you refer to the utilization-driven power estimation - I tend to
agree. You might remember, we had a few corner cases due to that design
and I tried to address it differently [1][2].

Xuewen, please send your proposal for cpufreq_cooling.c and I will have
a look there. If there is still a need to fix it more deeply, then I can
help with some deeper thermal changes. Estimating the real CPU power
based on the information available in the kernel is really hard.
Although, let's try to fix what you observe in your devices.

Regards,
Lukasz

[1] https://lore.kernel.org/linux-pm/20210622075925.16189-1-lukasz.luba@arm.com/
[2] https://lore.kernel.org/linux-pm/20220406220809.22555-1-lukasz.luba@arm.com/

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-19  2:13 ` Xuewen Yan
  2026-03-19  7:09 ` Vincent Guittot
  2026-03-19 10:18 ` Lukasz Luba
@ 2026-03-24  1:32 ` Qais Yousef
  2 siblings, 0 replies; 22+ messages in thread
From: Qais Yousef @ 2026-03-24 1:32 UTC (permalink / raw)
To: Xuewen Yan
Cc: Vincent Guittot, Peter Zijlstra, Xuewen Yan, mingo, juri.lelli,
	tj, dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	lukasz.luba, linux-kernel, rui.zhang, di.shen, ke.wang,
	Christian Loehle

On 03/19/26 10:13, Xuewen Yan wrote:
> > Besides that, this sort of plug-and-play is a big concern. You picked up sched
> > ext and changed the behavior, then you'd need to get your thermal management to
> > work with that. Not retrospectively sprinkle these hacks around to force things
> > to work again.
>
> In fact, I had considered this issue even before sending this patch.
> Our initial fix was to modify the thermal subsystem specifically:
> in cpufreq_cooling.c, when SCX is enabled, we stopped calling
> sched_cpu_util() and instead used an idle-time-based calculation to
> obtain CPU utilization.
>
> However, we later realized that sched_cpu_util() is used not only in
> cpufreq_cooling.c but also in dtpm_cpu.c.
> Since I’m not familiar with dtpm_cpu.c, I was hesitant to modify it.
> This led us to propose the current patch.
> Although scx_cpuperf_target may not accurately reflect the true CPU
> utilization, as Christian pointed out, it’s still better than having
> nothing at all.
>
> That’s exactly why I marked this patch as RFC: I wasn’t sure whether
> the proper fix should be in cpufreq_cooling.c or in sched_cpu_util().
> It now seems clear that modifying only sched_cpu_util() is
> insufficient. Ideally, we should also update cpufreq_cooling.c, since
> we cannot guarantee that all SCX BPF programs will provide an accurate
> cpuperf_target.
> I’ll submit a follow-up patch to modify cpufreq_cooling.c shortly.

The bigger picture problem is that sched_ext is not trustworthy. For
something as critical as thermal management, you'd have to bake it in
with your out-of-tree sched_ext changes. The current system was designed
to work together under some assumptions, and by opting into sched_ext
you're opting out of these in-tree assumptions.

A user space scheduler can be used with a user space cpufreq governor
and user space thermal solutions; I don't think there are restrictions
on creating out-of-tree governors.

If you have a problem that needs to be fixed in the scheduler, perhaps
we can help with that ;-) If you just want to go and do your own thing,
perhaps you can go all the way then and redo it all.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* [PATCH] sched/cpufreq: Reorder so non-SCX is common path
  2026-03-18 12:47 ` Peter Zijlstra
  2026-03-18 12:55   ` Vincent Guittot
@ 2026-03-18 13:03   ` Christian Loehle
  2026-03-19  1:08   ` [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util() Tejun Heo
  2 siblings, 0 replies; 22+ messages in thread
From: Christian Loehle @ 2026-03-18 13:03 UTC (permalink / raw)
To: Peter Zijlstra, Xuewen Yan, Vincent Guittot
Cc: mingo, juri.lelli, vincent.guittot, tj, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, lukasz.luba, linux-kernel,
	rui.zhang, di.shen, ke.wang, xuewen.yan94

Running sugov with scx_enabled() is probably rarer than running sugov
with fair, so reorder the condition into an scx_enabled() check instead
of the !scx_switched_all() check, so that the static branch is not
checked when sched_ext isn't enabled.

No functional change intended.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
---
@Peter We probably also want the analogous reorder in sugov then....

 kernel/sched/cpufreq_schedutil.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 153232dd8276..06bd453a37eb 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -225,10 +225,16 @@ unsigned long sugov_effective_cpu_perf(int cpu, unsigned long actual,

 static void sugov_get_util(struct sugov_cpu *sg_cpu, unsigned long boost)
 {
-	unsigned long min, max, util = scx_cpuperf_target(sg_cpu->cpu);
+	unsigned long min, max, util = cpu_util_cfs_boost(sg_cpu->cpu);

-	if (!scx_switched_all())
-		util += cpu_util_cfs_boost(sg_cpu->cpu);
+	if (scx_enabled()) {
+		unsigned long scx_util = scx_cpuperf_target(sg_cpu->cpu);
+
+		if (!scx_switched_all())
+			scx_util += util;
+
+		util = scx_util;
+	}

 	util = effective_cpu_util(sg_cpu->cpu, util, &min, &max);
 	util = max(util, boost);
 	sg_cpu->bw_min = min;
--
2.34.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-18 12:47 ` Peter Zijlstra
  2026-03-18 12:55   ` Vincent Guittot
  2026-03-18 13:03   ` [PATCH] sched/cpufreq: Reorder so non-SCX is common path Christian Loehle
@ 2026-03-19  1:08   ` Tejun Heo
  2026-03-19  2:24     ` Xuewen Yan
  2026-03-19  9:02     ` Peter Zijlstra
  2 siblings, 2 replies; 22+ messages in thread
From: Tejun Heo @ 2026-03-19 1:08 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Xuewen Yan, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, lukasz.luba, linux-kernel,
	rui.zhang, di.shen, ke.wang, xuewen.yan94

On Wed, Mar 18, 2026 at 01:47:18PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index bf948db905ed..20adb6fede2a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> >
> >  unsigned long sched_cpu_util(int cpu)
> >  {
> > -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > +	unsigned long util = scx_cpuperf_target(cpu);
> > +
> > +	if (!scx_switched_all())
> > +		util += cpu_util_cfs(cpu);
> > +
> > +	return effective_cpu_util(cpu, util, NULL, NULL);
> >  }
>
> This puts the common case of no ext muck into the slow path of that
> static_branch.
>
> This wants to be something like:
>
> unsigned long sched_cpu_util(int cpu)
> {
> 	unsigned long util = cpu_util_cfs(cpu);
>
> 	if (scx_enabled()) {
> 		unsigned long scx_util = scx_cpuperf_target(cpu);
>
> 		if (!scx_switched_all())
> 			scx_util += util;
>
> 		util = scx_util;
> 	}
>
> 	return effective_cpu_util(cpu, util, NULL, NULL);
> }

scx_switched_all() is an unlikely static branch just like scx_enabled() and
scx_cpuperf_target() has scx_enabled() in it too, so the difference for the
fair path between the two versions is two noop run-throughs vs. one. Either
way is fine but it is more code for likely no discernible gain.

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-19  1:08 ` [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util() Tejun Heo
@ 2026-03-19  2:24   ` Xuewen Yan
  2026-03-19  2:38     ` Xuewen Yan
  1 sibling, 1 reply; 22+ messages in thread
From: Xuewen Yan @ 2026-03-19 2:24 UTC (permalink / raw)
To: Tejun Heo
Cc: Peter Zijlstra, Xuewen Yan, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	lukasz.luba, linux-kernel, rui.zhang, di.shen, ke.wang

On Thu, Mar 19, 2026 at 9:08 AM Tejun Heo <tj@kernel.org> wrote:
>
> On Wed, Mar 18, 2026 at 01:47:18PM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index bf948db905ed..20adb6fede2a 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > >
> > >  unsigned long sched_cpu_util(int cpu)
> > >  {
> > > -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > +	unsigned long util = scx_cpuperf_target(cpu);
> > > +
> > > +	if (!scx_switched_all())
> > > +		util += cpu_util_cfs(cpu);
> > > +
> > > +	return effective_cpu_util(cpu, util, NULL, NULL);
> > >  }
> >
> > This puts the common case of no ext muck into the slow path of that
> > static_branch.
> >
> > This wants to be something like:
> >
> > unsigned long sched_cpu_util(int cpu)
> > {
> > 	unsigned long util = cpu_util_cfs(cpu);
> >
> > 	if (scx_enabled()) {
> > 		unsigned long scx_util = scx_cpuperf_target(cpu);
> >
> > 		if (!scx_switched_all())
> > 			scx_util += util;
> >
> > 		util = scx_util;
> > 	}
> >
> > 	return effective_cpu_util(cpu, util, NULL, NULL);
> > }
>
> scx_switched_all() is an unlikely static branch just like scx_enabled() and
> scx_cpuperf_target() has scx_enabled() in it too, so the difference for the
> fair path between the two versions is two noop run-throughs vs. one. Either
> way is fine but it is more code for likely no discernible gain.

Agree, and for scx_switched_all(), there is no need to call cpu_util_cfs().
Maybe we could modify it as follows?

--->
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf948db905ed..8f5d0c7e8ea2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8198,7 +8198,18 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,

 unsigned long sched_cpu_util(int cpu)
 {
-	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
+	unsigned long util;
+
+	if (scx_enabled()) {
+		unsigned long util = scx_cpuperf_target(cpu);
+
+		if (!scx_switched_all())
+			util += cpu_util_cfs(cpu);
+	} else {
+		util = cpu_util_cfs(cpu);
+	}
+
+	return effective_cpu_util(cpu, util, NULL, NULL);
 }
---

Thanks!

^ permalink raw reply related	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-19  2:24 ` Xuewen Yan
@ 2026-03-19  2:38   ` Xuewen Yan
  0 siblings, 0 replies; 22+ messages in thread
From: Xuewen Yan @ 2026-03-19 2:38 UTC (permalink / raw)
To: Tejun Heo
Cc: Peter Zijlstra, Xuewen Yan, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	lukasz.luba, linux-kernel, rui.zhang, di.shen, ke.wang

On Thu, Mar 19, 2026 at 10:24 AM Xuewen Yan <xuewen.yan94@gmail.com> wrote:
>
> On Thu, Mar 19, 2026 at 9:08 AM Tejun Heo <tj@kernel.org> wrote:
> >
> > scx_switched_all() is an unlikely static branch just like scx_enabled() and
> > scx_cpuperf_target() has scx_enabled() in it too, so the difference for the
> > fair path between the two versions is two noop run-throughs vs. one. Either
> > way is fine but it is more code for likely no discernible gain.
>
> Agree, and for scx_switched_all(), there is no need to call cpu_util_cfs().
> Maybe we could modify it as follows?
>
> --->
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bf948db905ed..8f5d0c7e8ea2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8198,7 +8198,18 @@ unsigned long effective_cpu_util(int cpu,
> unsigned long util_cfs,
>
>  unsigned long sched_cpu_util(int cpu)
>  {
> -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> +	unsigned long util;
> +
> +	if (scx_enabled()) {
> +		unsigned long util = scx_cpuperf_target(cpu);

Sorry, that was a typo, this will be fixed in patch v2 if the proposed
fix is acceptable.

Thanks!

> +
> +		if (!scx_switched_all())
> +			util += cpu_util_cfs(cpu);
> +	} else {
> +		util = cpu_util_cfs(cpu);
> +	}
> +
> +	return effective_cpu_util(cpu, util, NULL, NULL);
>  }
> ---
>
>
> Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-19  1:08       ` [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util() Tejun Heo
  2026-03-19  2:24         ` Xuewen Yan
@ 2026-03-19  9:02         ` Peter Zijlstra
  2026-03-19 10:01           ` Uros Bizjak
  1 sibling, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2026-03-19  9:02 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Xuewen Yan, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, lukasz.luba, linux-kernel,
	rui.zhang, di.shen, ke.wang, xuewen.yan94, ubizjak, Marco Elver

On Wed, Mar 18, 2026 at 03:08:43PM -1000, Tejun Heo wrote:
> On Wed, Mar 18, 2026 at 01:47:18PM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index bf948db905ed..20adb6fede2a 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > >
> > >  unsigned long sched_cpu_util(int cpu)
> > >  {
> > > -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> > > +	unsigned long util = scx_cpuperf_target(cpu);
> > > +
> > > +	if (!scx_switched_all())
> > > +		util += cpu_util_cfs(cpu);
> > > +
> > > +	return effective_cpu_util(cpu, util, NULL, NULL);
> > >  }
> >
> > This puts the common case of no ext muck into the slow path of that
> > static_branch.
> >
> > This wants to be something like:
> >
> > unsigned long sched_cpu_util(int cpu)
> > {
> > 	unsigned long util = cpu_util_cfs(cpu);
> >
> > 	if (scx_enabled()) {
> > 		unsigned long scx_util = scx_cpuperf_target(cpu);
> >
> > 		if (!scx_switched_all())
> > 			scx_util += util;
> >
> > 		util = scx_util;
> > 	}
> >
> > 	return effective_cpu_util(cpu, util, NULL, NULL);
> > }
>
> scx_switched_all() is an unlikely static branch just like scx_enabled() and
> scx_cpuperf_target() has scx_enabled() in it too, so the difference for the
> fair path between the two versions is two noop run-throughs vs. one. Either
> way is fine but it is more code for likely no discernible gain.

(added noinline to effective_cpu_util() for clarity)

So the original patch generates this:

sched_cpu_util:
 1c5240: sched_cpu_util+0x0	endbr64
 1c5244: sched_cpu_util+0x4	call   0x1c5249 <__fentry__>
 1c5249: sched_cpu_util+0x9	push   %rbp
 1c524a: sched_cpu_util+0xa	push   %rbx
 1c524b: sched_cpu_util+0xb	mov    %edi,%ebx
 1c524d: sched_cpu_util+0xd	<jump_table.1c524d>
	  = nop2					(if DEFAULT)
	  = jmp 1c5271 <sched_cpu_util+0x31>		(if JUMP)
 1c524f: sched_cpu_util+0xf	xor    %ebp,%ebp
 1c5251: sched_cpu_util+0x11	<jump_table.1c5251>
	  = nop2					(if DEFAULT)
	  = jmp 1c5261 <sched_cpu_util+0x21>		(if JUMP)
 1c5253: sched_cpu_util+0x13	xor    %edx,%edx
 1c5255: sched_cpu_util+0x15	xor    %esi,%esi
 1c5257: sched_cpu_util+0x17	mov    %ebx,%edi
 1c5259: sched_cpu_util+0x19	call   0x1bc5b0 <cpu_util.constprop.0>
 1c525e: sched_cpu_util+0x1e	add    %rax,%rbp
 1c5261: sched_cpu_util+0x21	mov    %rbp,%rsi
 1c5264: sched_cpu_util+0x24	mov    %ebx,%edi
 1c5266: sched_cpu_util+0x26	xor    %ecx,%ecx
 1c5268: sched_cpu_util+0x28	pop    %rbx
 1c5269: sched_cpu_util+0x29	xor    %edx,%edx
 1c526b: sched_cpu_util+0x2b	pop    %rbp
 1c526c: sched_cpu_util+0x2c	jmp    0x1c5160 <effective_cpu_util>	(slowpath)
 1c5271: sched_cpu_util+0x31	movslq %edi,%rdx
 1c5274: sched_cpu_util+0x34	mov    $0x0,%rax
 1c527b: sched_cpu_util+0x3b	mov    0x0(,%rdx,8),%rdx
 1c5283: sched_cpu_util+0x43	mov    0xa34(%rdx,%rax,1),%ebp
 1c528a: sched_cpu_util+0x4a	jmp    0x1c5251 <sched_cpu_util+0x11>

While my proposal generates this:

sched_cpu_util:
 1c5240: sched_cpu_util+0x0	endbr64
 1c5244: sched_cpu_util+0x4	call   0x1c5249 <__fentry__>
 1c5249: sched_cpu_util+0x9	push   %rbx
 1c524a: sched_cpu_util+0xa	xor    %esi,%esi
 1c524c: sched_cpu_util+0xc	xor    %edx,%edx
 1c524e: sched_cpu_util+0xe	mov    %edi,%ebx
 1c5250: sched_cpu_util+0x10	call   0x1bc5b0 <cpu_util.constprop.0>
 1c5255: sched_cpu_util+0x15	mov    %rax,%rsi
 1c5258: sched_cpu_util+0x18	<jump_table.1c5258>
	  = nop2					(if DEFAULT)
	  = jmp 1c5266 <sched_cpu_util+0x26>		(if JUMP)
 1c525a: sched_cpu_util+0x1a	mov    %ebx,%edi
 1c525c: sched_cpu_util+0x1c	xor    %ecx,%ecx
 1c525e: sched_cpu_util+0x1e	xor    %edx,%edx
 1c5260: sched_cpu_util+0x20	pop    %rbx
 1c5261: sched_cpu_util+0x21	jmp    0x1c5160 <effective_cpu_util>	(slowpath)
 1c5266: sched_cpu_util+0x26	<jump_table.1c5266>
	  = nop2					(if DEFAULT)
	  = jmp 1c527b <sched_cpu_util+0x3b>		(if JUMP)
 1c5268: sched_cpu_util+0x28	xor    %eax,%eax
 1c526a: sched_cpu_util+0x2a	<jump_table.1c526a>
	  = nop2					(if DEFAULT)
	  = jmp 1c5296 <sched_cpu_util+0x56>		(if JUMP)
 1c526c: sched_cpu_util+0x2c	mov    %ebx,%edi
 1c526e: sched_cpu_util+0x2e	add    %rax,%rsi
 1c5271: sched_cpu_util+0x31	xor    %ecx,%ecx
 1c5273: sched_cpu_util+0x33	xor    %edx,%edx
 1c5275: sched_cpu_util+0x35	pop    %rbx
 1c5276: sched_cpu_util+0x36	jmp    0x1c5160 <effective_cpu_util>
 1c527b: sched_cpu_util+0x3b	movslq %ebx,%rdx
 1c527e: sched_cpu_util+0x3e	mov    $0x0,%rax
 1c5285: sched_cpu_util+0x45	mov    0x0(,%rdx,8),%rdx
 1c528d: sched_cpu_util+0x4d	mov    0xa34(%rdx,%rax,1),%eax
 1c5294: sched_cpu_util+0x54	jmp    0x1c526a <sched_cpu_util+0x2a>
 1c5296: sched_cpu_util+0x56	mov    %ebx,%edi
 1c5298: sched_cpu_util+0x58	mov    %rax,%rsi
 1c529b: sched_cpu_util+0x5b	xor    %ecx,%ecx
 1c529d: sched_cpu_util+0x5d	xor    %edx,%edx
 1c529f: sched_cpu_util+0x5f	pop    %rbx
 1c52a0: sched_cpu_util+0x60	jmp    0x1c5160 <effective_cpu_util>

That fastpath is definitely better; the slowpath is worse, but that is
in part because the compilers are stupid and cannot eliminate
static_branch().

/me goes try again ..

Yeah, the below patch does nothing :-( It will happily emit
scx_enabled() twice.

---
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 05b16299588d..47cd1a1f9784 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -32,7 +32,7 @@
 	JUMP_TABLE_ENTRY(key, label)

 #endif /* CONFIG_HAVE_JUMP_LABEL_HACK */

-static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch)
+static __always_inline __const bool arch_static_branch(struct static_key * const key, const bool branch)
 {
 	asm goto(ARCH_STATIC_BRANCH_ASM("%c0 + %c1", "%l[l_yes]")
 		: : "i" (key), "i" (branch) : : l_yes);
@@ -42,7 +42,7 @@ static __always_inline bool arch_static_branch(struct static_key * const key, co
 	return true;
 }

-static __always_inline bool arch_static_branch_jump(struct static_key * const key, const bool branch)
+static __always_inline __const bool arch_static_branch_jump(struct static_key * const key, const bool branch)
 {
 	asm goto("1:"
 		"jmp %l[l_yes]\n\t"
diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
index c16d4199bf92..553fc9f3f7eb 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -312,6 +312,7 @@
  *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-pure-function-attribute
  */
 #define __pure		__attribute__((__pure__))
+#define __const		__attribute__((__const__))

 /*
  *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-section-function-attribute

^ permalink raw reply related	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-19  9:02         ` Peter Zijlstra
@ 2026-03-19 10:01           ` Uros Bizjak
  2026-03-19 10:26             ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Uros Bizjak @ 2026-03-19 10:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Xuewen Yan, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	lukasz.luba, linux-kernel, rui.zhang, di.shen, ke.wang,
	xuewen.yan94, Marco Elver

On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:

> That fastpath is definitely better; the slowpath is worse, but that is
> in part because the compilers are stupid and cannot eliminate
> static_branch().

asm gotos are implicitly volatile because they are control flow
primitives. The compiler will *not* remove them.

Uros.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-19 10:01           ` Uros Bizjak
@ 2026-03-19 10:26             ` Peter Zijlstra
  2026-03-19 11:02               ` Uros Bizjak
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2026-03-19 10:26 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Tejun Heo, Xuewen Yan, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	lukasz.luba, linux-kernel, rui.zhang, di.shen, ke.wang,
	xuewen.yan94, Marco Elver

On Thu, Mar 19, 2026 at 11:01:03AM +0100, Uros Bizjak wrote:
> On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> > That fastpath is definitely better; the slowpath is worse, but that is
> > in part because the compilers are stupid and cannot eliminate
> > static_branch().
>
> asm gotos are implicitly volatile because they are control flow
> primitives. The compiler will *not* remove them.

Yes, but I want ponies ;-)

	if (static_branch_unlikely(&foo)) {
		if (static_branch_unlikely(&foo)) {
			/* A */
		} else {
			/* B */
		}
		/* C */
	}

Is a very common occurence. And we all know this really should be:

	if (static_branch_unlikely(&foo)) {
		/* A */
		/* C */
	}

So how can we make this happen? IMO marking those functions __const
should tell the compiler that yes, it can elimintate them.

You should not try and protect the user. If they use __const
incorrectly, they get to keep the pieces and all that.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-19 10:26             ` Peter Zijlstra
@ 2026-03-19 11:02               ` Uros Bizjak
  2026-03-19 11:12                 ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Uros Bizjak @ 2026-03-19 11:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Xuewen Yan, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	lukasz.luba, linux-kernel, rui.zhang, di.shen, ke.wang,
	xuewen.yan94, Marco Elver

On Thu, Mar 19, 2026 at 11:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Mar 19, 2026 at 11:01:03AM +0100, Uros Bizjak wrote:
> > On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > > That fastpath is definitely better; the slowpath is worse, but that is
> > > in part because the compilers are stupid and cannot eliminate
> > > static_branch().
> >
> > asm gotos are implicitly volatile because they are control flow
> > primitives. The compiler will *not* remove them.
>
> Yes, but I want ponies ;-)
>
> 	if (static_branch_unlikely(&foo)) {
> 		if (static_branch_unlikely(&foo)) {
> 			/* A */
> 		} else {
> 			/* B */
> 		}
> 		/* C */
> 	}
>
> Is a very common occurence. And we all know this really should be:
>
> 	if (static_branch_unlikely(&foo)) {
> 		/* A */
> 		/* C */
> 	}
>
> So how can we make this happen? IMO marking those functions __const
> should tell the compiler that yes, it can elimintate them.

Huh, __const promises that the function does not access global memory
and that the function does not have side effects other than returning
a value. asm volatile inside the __const function creates a side
effect, so removing function calls would be considered a
misoptimization. Probably this could lead to undefined behavior in
terms of what the compiler expects from a __const function.

> You should not try and protect the user. If they use __const
> incorrectly, they get to keep the pieces and all that.

I'm afraid that here the user wants the "__const with only partial
properties of __const" function, where the compiler does not know what
the user really wants.

Uros.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-19 11:02               ` Uros Bizjak
@ 2026-03-19 11:12                 ` Peter Zijlstra
  2026-03-19 11:19                   ` Uros Bizjak
  2026-03-19 11:22                   ` Peter Zijlstra
  1 sibling, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2026-03-19 11:12 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Tejun Heo, Xuewen Yan, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	lukasz.luba, linux-kernel, rui.zhang, di.shen, ke.wang,
	xuewen.yan94, Marco Elver

On Thu, Mar 19, 2026 at 12:02:29PM +0100, Uros Bizjak wrote:
> On Thu, Mar 19, 2026 at 11:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Mar 19, 2026 at 11:01:03AM +0100, Uros Bizjak wrote:
> > > On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > > That fastpath is definitely better; the slowpath is worse, but that is
> > > > in part because the compilers are stupid and cannot eliminate
> > > > static_branch().
> > >
> > > asm gotos are implicitly volatile because they are control flow
> > > primitives. The compiler will *not* remove them.
> >
> > Yes, but I want ponies ;-)
> >
> > 	if (static_branch_unlikely(&foo)) {
> > 		if (static_branch_unlikely(&foo)) {
> > 			/* A */
> > 		} else {
> > 			/* B */
> > 		}
> > 		/* C */
> > 	}
> >
> > Is a very common occurence. And we all know this really should be:
> >
> > 	if (static_branch_unlikely(&foo)) {
> > 		/* A */
> > 		/* C */
> > 	}
> >
> > So how can we make this happen? IMO marking those functions __const
> > should tell the compiler that yes, it can elimintate them.
>
> Huh, __const promises that the function does not access global memory
> and that the function does not have side effects other than returning
> a value. asm volatile inside the __const function creates a side
> effect, so removing function calls would be considered a
> misoptimization. Probably this could lead to undefined behavior in
> terms of what the compiler expects from a __const function.

So since the whole function reduces to a single branch or nop, it does
not in fact access memory or have side effects, right?

(and there is still __pure, for if we somehow consider the key governing
the text patching to be an 'access' in this case)

> > You should not try and protect the user. If they use __const
> > incorrectly, they get to keep the pieces and all that.
>
> I'm afraid that here the user wants the "__const with only partial
> properties of __const" function, where the compiler does not know what
> the user really wants.

Well, clearly I'm not a compiler person :-) I just want to be able to
tell the compiler that it can collapse these branches. Do you have
another option that would be less offensive to compiler folks such as
yourself?

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-19 11:12                 ` Peter Zijlstra
@ 2026-03-19 11:19                   ` Uros Bizjak
  2026-03-19 11:33                     ` Peter Zijlstra
  2026-03-19 11:22                   ` Peter Zijlstra
  1 sibling, 1 reply; 22+ messages in thread
From: Uros Bizjak @ 2026-03-19 11:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Xuewen Yan, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	lukasz.luba, linux-kernel, rui.zhang, di.shen, ke.wang,
	xuewen.yan94, Marco Elver

On Thu, Mar 19, 2026 at 12:12 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Mar 19, 2026 at 12:02:29PM +0100, Uros Bizjak wrote:
> > On Thu, Mar 19, 2026 at 11:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Mar 19, 2026 at 11:01:03AM +0100, Uros Bizjak wrote:
> > > > On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > > That fastpath is definitely better; the slowpath is worse, but that is
> > > > > in part because the compilers are stupid and cannot eliminate
> > > > > static_branch().
> > > >
> > > > asm gotos are implicitly volatile because they are control flow
> > > > primitives. The compiler will *not* remove them.
> > >
> > > Yes, but I want ponies ;-)
> > >
> > > 	if (static_branch_unlikely(&foo)) {
> > > 		if (static_branch_unlikely(&foo)) {
> > > 			/* A */
> > > 		} else {
> > > 			/* B */
> > > 		}
> > > 		/* C */
> > > 	}
> > >
> > > 	Is a very common occurence. And we all know this really should be:
> > >
> > > 	if (static_branch_unlikely(&foo)) {
> > > 		/* A */
> > > 		/* C */
> > > 	}
> > >
> > > So how can we make this happen? IMO marking those functions __const
> > > should tell the compiler that yes, it can elimintate them.
> >
> > Huh, __const promises that the function does not access global memory
> > and that the function does not have side effects other than returning
> > a value. asm volatile inside the __const function creates a side
> > effect, so removing function calls would be considered a
> > misoptimization. Probably this could lead to undefined behavior in
> > terms of what the compiler expects from a __const function.
>
> So since the whole function reduces to a single branch or nop, it does
> not in fact access memory or have side effects, right?

But there is "asm goto" inside, this tells the compiler that there is a
side effect, no matter what is written in the asm template. Even
'asm volatile ("nop");' inside function declares a side effect.

> (and there is still __pure, for if we somehow consider the key governing
> the text patching to be an 'access' in this case)

__pure function also can't have side effects.

> > > You should not try and protect the user. If they use __const
> > > incorrectly, they get to keep the pieces and all that.
> >
> > I'm afraid that here the user wants the "__const with only partial
> > properties of __const" function, where the compiler does not know what
> > the user really wants.
>
> Well, clearly I'm not a compiler person :-) I just want to be able to
> tell the compiler that it can collapse these branches. Do you have
> another option that would be less offensive to compiler folks such as
> yourself?

I don't see any in this case (mixing volatiles with __const or __pure
attribute), but please ask on gcc@ mailing list, if there is any other
option I have missed.

Uros.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-19 11:19                   ` Uros Bizjak
@ 2026-03-19 11:33                     ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2026-03-19 11:33 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Tejun Heo, Xuewen Yan, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	lukasz.luba, linux-kernel, rui.zhang, di.shen, ke.wang,
	xuewen.yan94, Marco Elver

On Thu, Mar 19, 2026 at 12:19:50PM +0100, Uros Bizjak wrote:

> I don't see any in this case (mixing volatiles with __const or __pure
> attribute), but please ask on gcc@ mailing list, if there is any other
> option I have missed.

I raised this on toolchains a while ago, but that discussions got
derailed into irrelevance :/

  https://lore.kernel.org/all/YG80wg%2F2iZjXfCDJ@hirez.programming.kicks-ass.net/

I really just want it fixed, I don't particularly care how.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-19 11:12                 ` Peter Zijlstra
  2026-03-19 11:19                   ` Uros Bizjak
@ 2026-03-19 11:22                   ` Peter Zijlstra
  1 sibling, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2026-03-19 11:22 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Tejun Heo, Xuewen Yan, mingo, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	lukasz.luba, linux-kernel, rui.zhang, di.shen, ke.wang,
	xuewen.yan94, Marco Elver

On Thu, Mar 19, 2026 at 12:12:42PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 19, 2026 at 12:02:29PM +0100, Uros Bizjak wrote:
> > On Thu, Mar 19, 2026 at 11:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Mar 19, 2026 at 11:01:03AM +0100, Uros Bizjak wrote:
> > > > On Thu, Mar 19, 2026 at 10:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > > That fastpath is definitely better; the slowpath is worse, but that is
> > > > > in part because the compilers are stupid and cannot eliminate
> > > > > static_branch().
> > > >
> > > > asm gotos are implicitly volatile because they are control flow
> > > > primitives. The compiler will *not* remove them.
> > >
> > > Yes, but I want ponies ;-)
> > >
> > > 	if (static_branch_unlikely(&foo)) {
> > > 		if (static_branch_unlikely(&foo)) {
> > > 			/* A */
> > > 		} else {
> > > 			/* B */
> > > 		}
> > > 		/* C */
> > > 	}
> > >
> > > Is a very common occurence. And we all know this really should be:
> > >
> > > 	if (static_branch_unlikely(&foo)) {
> > > 		/* A */
> > > 		/* C */
> > > 	}
> > >
> > > So how can we make this happen? IMO marking those functions __const
> > > should tell the compiler that yes, it can elimintate them.
> >
> > Huh, __const promises that the function does not access global memory
> > and that the function does not have side effects other than returning
> > a value. asm volatile inside the __const function creates a side
> > effect, so removing function calls would be considered a
> > misoptimization. Probably this could lead to undefined behavior in
> > terms of what the compiler expects from a __const function.
>
> So since the whole function reduces to a single branch or nop, it does
> not in fact access memory or have side effects, right?
>
> (and there is still __pure, for if we somehow consider the key governing
> the text patching to be an 'access' in this case)

So is it really just the problem that since the compiler doesn't know
what asm volatile ends up doing, it must assume the worse and hence
invalidates the __const (without a warning even).

So you're letting the unknown weight heavier than the explicit
instruction from the author who put on __const (and supposedly does
know).

And I feel somewhat strongly that this is wrong. Yes, if the author gets
it wrong, and there are side effects and things are elided it is a
misoptimization, but that is on the author. You can't blame the compiler
for doing what it was told to do.

Or is there really more to this?

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-18 12:17 [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util() Xuewen Yan
  2026-03-18 12:47 ` Peter Zijlstra
@ 2026-03-18 12:54 ` Christian Loehle
  2026-03-19  1:21 ` Tejun Heo
  2 siblings, 0 replies; 22+ messages in thread
From: Christian Loehle @ 2026-03-18 12:54 UTC (permalink / raw)
  To: Xuewen Yan, peterz, mingo, juri.lelli, vincent.guittot, tj
  Cc: dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	lukasz.luba, linux-kernel, rui.zhang, di.shen, ke.wang,
	xuewen.yan94

On 3/18/26 12:17, Xuewen Yan wrote:
> Recently, while enabling sched-ext debugging, we observed abnormal behavior
> in our thermal power_allocator’s temperature control.
> Through debugging, we found that the CPU util was too low, causing
> the CPU frequency to remain unrestricted.
>
> This issue stems from the fact that in the sched_cpu_util() function,
> when scx is enabled, cpu_util_cfs becomes zero. As a result,
> the thermal subsystem perceives an extremely low CPU utilization,
> which degrades the effectiveness of the power_allocator’s control.
>
> To address this, we propose adding scx_cpuperf_target in the sched_cpu_util()
> as a replacement for cpu_util_cfs, ensuring that the thermal subsystem receives
> accurate load information and restores proper control behavior.
>
> Reported-by: Di Shen <di.shen@unisoc.com>
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>

Fixes: d86adb4fc0655 ("sched_ext: Add cpuperf support")?

> ---
>  kernel/sched/fair.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bf948db905ed..20adb6fede2a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8198,7 +8198,12 @@ unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>
>  unsigned long sched_cpu_util(int cpu)
>  {
> -	return effective_cpu_util(cpu, cpu_util_cfs(cpu), NULL, NULL);
> +	unsigned long util = scx_cpuperf_target(cpu);
> +
> +	if (!scx_switched_all())
> +		util += cpu_util_cfs(cpu);
> +
> +	return effective_cpu_util(cpu, util, NULL, NULL);
>  }
>
>  /*

Ah right, I guess it's not as good as sugov as there's no guarantee it
behaves somewhat low-pass-filterish, but better than nothing.
No idea if it's worth factoring out this and sugov_get_util, maybe not
given that sugov includes ('temporary') cfs runnable boost while
sched_cpu_util() doesn't.

^ permalink raw reply	[flat|nested] 22+ messages in thread
* Re: [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util()
  2026-03-18 12:17 [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util() Xuewen Yan
  2026-03-18 12:47 ` Peter Zijlstra
  2026-03-18 12:54 ` Christian Loehle
@ 2026-03-19  1:21 ` Tejun Heo
  2 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2026-03-19  1:21 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: peterz, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, lukasz.luba, linux-kernel,
	rui.zhang, di.shen, ke.wang, xuewen.yan94

On Wed, Mar 18, 2026 at 08:17:55PM +0800, Xuewen Yan wrote:
> Recently, while enabling sched-ext debugging, we observed abnormal behavior
> in our thermal power_allocator’s temperature control.
> Through debugging, we found that the CPU util was too low, causing
> the CPU frequency to remain unrestricted.
>
> This issue stems from the fact that in the sched_cpu_util() function,
> when scx is enabled, cpu_util_cfs becomes zero. As a result,
> the thermal subsystem perceives an extremely low CPU utilization,
> which degrades the effectiveness of the power_allocator’s control.
>
> To address this, we propose adding scx_cpuperf_target in the sched_cpu_util()
> as a replacement for cpu_util_cfs, ensuring that the thermal subsystem receives
> accurate load information and restores proper control behavior.
>
> Reported-by: Di Shen <di.shen@unisoc.com>
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>

Yeah, I missed that path. In either this or Peter's suggested form:

 Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread
end of thread, other threads:[~2026-03-24  1:33 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2026-03-18 12:17 [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util() Xuewen Yan
2026-03-18 12:47 ` Peter Zijlstra
2026-03-18 12:55   ` Vincent Guittot
2026-03-18 13:44     ` Qais Yousef
2026-03-19  2:13       ` Xuewen Yan
2026-03-19  7:09     ` Vincent Guittot
2026-03-19 10:18       ` Lukasz Luba
2026-03-24  1:32         ` Qais Yousef
2026-03-18 13:03   ` [PATCH] sched/cpufreq: Reorder so non-SCX is common path Christian Loehle
2026-03-19  1:08   ` [RFC PATCH] sched: Add scx_cpuperf_target in sched_cpu_util() Tejun Heo
2026-03-19  2:24     ` Xuewen Yan
2026-03-19  2:38       ` Xuewen Yan
2026-03-19  9:02     ` Peter Zijlstra
2026-03-19 10:01       ` Uros Bizjak
2026-03-19 10:26         ` Peter Zijlstra
2026-03-19 11:02           ` Uros Bizjak
2026-03-19 11:12             ` Peter Zijlstra
2026-03-19 11:19               ` Uros Bizjak
2026-03-19 11:33                 ` Peter Zijlstra
2026-03-19 11:22               ` Peter Zijlstra
2026-03-18 12:54 ` Christian Loehle
2026-03-19  1:21 ` Tejun Heo