* [PATCH v2 0/2] cpufreq: CPPC: idle cpu perf handling
@ 2025-06-19 0:09 Prashant Malani
2025-06-19 0:09 ` [PATCH v2 1/2] sched: Expose idle_cpu() to modules Prashant Malani
2025-06-19 0:09 ` [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs Prashant Malani
0 siblings, 2 replies; 44+ messages in thread
From: Prashant Malani @ 2025-06-19 0:09 UTC (permalink / raw)
To: Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar
Cc: Prashant Malani
This is a short series to address the unreliable feedback performance
counter values that are returned by the CPPC driver when the CPU is
idle.
The first patch exposes idle_cpu() to be accessible to modules, and the
second patch does the actual change of not reading the feedback counters
when we know that the CPU is idle.
v1(single patch): https://lore.kernel.org/all/20250614003601.1600784-1-pmalani@google.com/
Prashant Malani (2):
sched: Expose idle_cpu() to modules
cpufreq: CPPC: Dont read counters for idle CPUs
drivers/cpufreq/cppc_cpufreq.c | 5 +++++
kernel/sched/syscalls.c | 1 +
2 files changed, 6 insertions(+)
--
2.50.0.rc2.701.gf1e915cc24-goog
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v2 1/2] sched: Expose idle_cpu() to modules
2025-06-19 0:09 [PATCH v2 0/2] cpufreq: CPPC: idle cpu perf handling Prashant Malani
@ 2025-06-19 0:09 ` Prashant Malani
2025-06-19 0:09 ` [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs Prashant Malani
1 sibling, 0 replies; 44+ messages in thread
From: Prashant Malani @ 2025-06-19 0:09 UTC (permalink / raw)
To: Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar
Cc: Prashant Malani
idle_cpu() can be helpful in some drivers which can utilize it to make
performance optimizations based on CPU idle state; since these drivers
can be compiled as modules, that prevents them from using idle_cpu().
So, expose the function to be available to modules.
Signed-off-by: Prashant Malani <pmalani@google.com>
---
Patch first introduced in v2.
kernel/sched/syscalls.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 547c1f05b667..0fd5e2dafcf7 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -216,6 +216,7 @@ int idle_cpu(int cpu)
return 1;
}
+EXPORT_SYMBOL_GPL(idle_cpu);
/**
* available_idle_cpu - is a given CPU idle for enqueuing work.
--
2.50.0.rc2.701.gf1e915cc24-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-06-19 0:09 [PATCH v2 0/2] cpufreq: CPPC: idle cpu perf handling Prashant Malani
2025-06-19 0:09 ` [PATCH v2 1/2] sched: Expose idle_cpu() to modules Prashant Malani
@ 2025-06-19 0:09 ` Prashant Malani
2025-06-20 3:53 ` Jie Zhan
1 sibling, 1 reply; 44+ messages in thread
From: Prashant Malani @ 2025-06-19 0:09 UTC (permalink / raw)
To: Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar
Cc: Prashant Malani
AMU performance counters tend to be inaccurate when measured on idle CPUs.
On an idle CPU which is programmed to 3.4 GHz (verified through firmware),
here is a measurement and calculation of operating frequency:
t0: ref=899127636, del=3012458473
t1: ref=899129626, del=3012466509
perf=40
For reference, when we measure the same CPU with stress-ng running, we have
a more accurate result:
t0: ref=30751756418, del=104490567689
t1: ref=30751760628, del=104490582296
perf=34
(t0 and t1 are 2 microseconds apart)
In the above, the prescribed method[1] of calculating frequency from CPPC
counters was used.
The follow-on effect is that the inaccurate frequency is stashed in the
cpufreq policy struct when the CPU is brought online. Since CPUs are mostly
idle when they are brought online, this means cpufreq has an inaccurate
view of the programmed clock rate.
Consequently, if userspace tries to actually set the frequency to the
previously erroneous rate (4 GHz in the above example), cpufreq returns
early without calling in to the CPPC driver to send the relevant PCC
command; it thinks the CPU is already at that frequency.
Update the CPPC get_rate() code to skip sampling counters if we know a CPU
is idle, and go directly to the fallback response of returning the
“desired” frequency. The code intends to do that anyway if the counters
happen to return an “idle” reading.
[1] https://docs.kernel.org/admin-guide/acpi/cppc_sysfs.html#computing-average-delivered-performance
Signed-off-by: Prashant Malani <pmalani@google.com>
---
Changes in v2:
- Add sched.h header for usage when compiled as module.
drivers/cpufreq/cppc_cpufreq.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index b7c688a5659c..5ed04774e569 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -18,6 +18,7 @@
#include <linux/cpufreq.h>
#include <linux/irq_work.h>
#include <linux/kthread.h>
+#include <linux/sched.h>
#include <linux/time.h>
#include <linux/vmalloc.h>
#include <uapi/linux/sched/types.h>
@@ -753,6 +754,10 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
cpufreq_cpu_put(policy);
+ /* Idle CPUs have unreliable counters, so skip to the end. */
+ if (idle_cpu(cpu))
+ goto out_invalid_counters;
+
ret = cppc_get_perf_ctrs_sample(cpu, &fb_ctrs_t0, &fb_ctrs_t1);
if (ret) {
if (ret == -EFAULT)
--
2.50.0.rc2.701.gf1e915cc24-goog
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-06-19 0:09 ` [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs Prashant Malani
@ 2025-06-20 3:53 ` Jie Zhan
2025-06-20 5:07 ` Prashant Malani
0 siblings, 1 reply; 44+ messages in thread
From: Jie Zhan @ 2025-06-20 3:53 UTC (permalink / raw)
To: Prashant Malani
Cc: Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar,
Ionela Voinescu, Beata Michalska, z00813676
Hi Prashant,
Thanks for spotting this.
Cc'd a few more developers.
Jie
On 19/06/2025 08:09, Prashant Malani wrote:
> AMU performance counters tend to be inaccurate when measured on idle CPUs.
> On an idle CPU which is programmed to 3.4 GHz (verified through firmware),
> here is a measurement and calculation of operating frequency:
>
> t0: ref=899127636, del=3012458473
> t1: ref=899129626, del=3012466509
> perf=40
In this case, the target cpu is mostly idle but not fully idle during the
sampling window since the counter grows a little bit.
Perhaps some interrupts happen to run on the cpu shortly.
Thus, the actual issue is the accuracy of frequency sampling becomes poor
when the delta of counters are too small to obtain a reliable accuracy.
Would it be more sensible to put a minimum threshold of the delta of
counters when sampling the frequency?
If the threshold is not met, we can go to the existing out_invalid_counters
path. Then we don't have to export idle_cpu() either, and BTW, that ABI
doesn't seem to be synchronous at all, i.e. the cpu might be busy when we
check and then become idle when sampling.
>
> For reference, when we measure the same CPU with stress-ng running, we have
> a more accurate result:
> t0: ref=30751756418, del=104490567689
> t1: ref=30751760628, del=104490582296
> perf=34
>
> (t0 and t1 are 2 microseconds apart)
>
> In the above, the prescribed method[1] of calculating frequency from CPPC
> counters was used.
>
> The follow-on effect is that the inaccurate frequency is stashed in the
> cpufreq policy struct when the CPU is brought online. Since CPUs are mostly
> idle when they are brought online, this means cpufreq has an inaccurate
> view of the programmed clock rate.
>
> Consequently, if userspace tries to actually set the frequency to the
> previously erroneous rate (4 GHz in the above example), cpufreq returns
> early without calling in to the CPPC driver to send the relevant PCC
> command; it thinks the CPU is already at that frequency.
>
> Update the CPPC get_rate() code to skip sampling counters if we know a CPU
> is idle, and go directly to the fallback response of returning the
> “desired” frequency. The code intends to do that anyway if the counters
> happen to return an “idle” reading.
>
> [1] https://docs.kernel.org/admin-guide/acpi/cppc_sysfs.html#computing-average-delivered-performance
>
> Signed-off-by: Prashant Malani <pmalani@google.com>
> ---
>
> Changes in v2:
> - Add sched.h header for usage when compiled as module.
>
> drivers/cpufreq/cppc_cpufreq.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index b7c688a5659c..5ed04774e569 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -18,6 +18,7 @@
> #include <linux/cpufreq.h>
> #include <linux/irq_work.h>
> #include <linux/kthread.h>
> +#include <linux/sched.h>
> #include <linux/time.h>
> #include <linux/vmalloc.h>
> #include <uapi/linux/sched/types.h>
> @@ -753,6 +754,10 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
>
> cpufreq_cpu_put(policy);
>
> + /* Idle CPUs have unreliable counters, so skip to the end. */
> + if (idle_cpu(cpu))
> + goto out_invalid_counters;
> +
> ret = cppc_get_perf_ctrs_sample(cpu, &fb_ctrs_t0, &fb_ctrs_t1);
> if (ret) {
> if (ret == -EFAULT)
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-06-20 3:53 ` Jie Zhan
@ 2025-06-20 5:07 ` Prashant Malani
2025-06-26 18:42 ` Prashant Malani
2025-06-27 7:54 ` Jie Zhan
0 siblings, 2 replies; 44+ messages in thread
From: Prashant Malani @ 2025-06-20 5:07 UTC (permalink / raw)
To: Jie Zhan
Cc: Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar,
Ionela Voinescu, Beata Michalska, z00813676
Hi Jie,
Thanks for taking a look at the patch.
On Thu, 19 Jun 2025 at 20:53, Jie Zhan <zhanjie9@hisilicon.com> wrote:
> On 19/06/2025 08:09, Prashant Malani wrote:
> > AMU performance counters tend to be inaccurate when measured on idle CPUs.
> > On an idle CPU which is programmed to 3.4 GHz (verified through firmware),
> > here is a measurement and calculation of operating frequency:
> >
> > t0: ref=899127636, del=3012458473
> > t1: ref=899129626, del=3012466509
> > perf=40
>
> In this case, the target cpu is mostly idle but not fully idle during the
> sampling window since the counter grows a little bit.
> Perhaps some interrupts happen to run on the cpu shortly.
>
> Thus, the actual issue is the accuracy of frequency sampling becomes poor
> when the delta of counters are too small to obtain a reliable accuracy.
>
> Would it be more sensible to put a minimum threshold of the delta of
> counters when sampling the frequency?
I'm happy to throw together a patch if there is some safe
threshold the experts here can agree on for the minimum delta for
the ref counter. I would caution that with this sort of approach we
start running into the familiar issue:
- What value is appropriate? Too large and you get false
positives (falling back to the idle invalid path when we shouldn't), and
too less and you get false negatives (we still report inaccurate
counter values).
- Is the threshold the same across platforms?
- Will it remain the same 5/10 years from now?
> BTW, that ABI
> doesn't seem to be synchronous at all, i.e. the cpu might be busy when we
> check and then become idle when sampling.
>
I don't think this is necessarily an issue. The ABI doesn't need to be
synchronous; it is merely a snapshot of the scheduler view of that CPU
at a point in time. Even the current method of perf counters sampling
is purely hueristic. The CPU might be idle for the 2 usec the
sampling is done, and servicing traffic before and after that.
This is inherent whenever you are sampling any system state.
I would imagine it is more reliable to trust the kernel scheduler's view
of whether a CPU is idle, than relying on counters and a calculation
method which are sensitive and unreliable for idle systems
(i.e stray interrupts can throw off the calculations).
That said, I'm happy to go with the approach folks on this list recommend.
Cheers,
--
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-06-20 5:07 ` Prashant Malani
@ 2025-06-26 18:42 ` Prashant Malani
2025-06-27 7:54 ` Jie Zhan
1 sibling, 0 replies; 44+ messages in thread
From: Prashant Malani @ 2025-06-26 18:42 UTC (permalink / raw)
To: Jie Zhan
Cc: Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar,
Ionela Voinescu, Beata Michalska, z00813676
(re-sending without HTML)
Hi,
Just a friendly ping on this thread/series. It would be great to get
some feedback from the maintainers/stakeholders here.
Thanks,
On Thu, 19 Jun 2025 at 22:07, Prashant Malani <pmalani@google.com> wrote:
>
> Hi Jie,
>
> Thanks for taking a look at the patch.
>
> On Thu, 19 Jun 2025 at 20:53, Jie Zhan <zhanjie9@hisilicon.com> wrote:
> > On 19/06/2025 08:09, Prashant Malani wrote:
> > > AMU performance counters tend to be inaccurate when measured on idle CPUs.
> > > On an idle CPU which is programmed to 3.4 GHz (verified through firmware),
> > > here is a measurement and calculation of operating frequency:
> > >
> > > t0: ref=899127636, del=3012458473
> > > t1: ref=899129626, del=3012466509
> > > perf=40
> >
> > In this case, the target cpu is mostly idle but not fully idle during the
> > sampling window since the counter grows a little bit.
> > Perhaps some interrupts happen to run on the cpu shortly.
> >
> > Thus, the actual issue is the accuracy of frequency sampling becomes poor
> > when the delta of counters are too small to obtain a reliable accuracy.
> >
> > Would it be more sensible to put a minimum threshold of the delta of
> > counters when sampling the frequency?
>
> I'm happy to throw together a patch if there is some safe
> threshold the experts here can agree on for the minimum delta for
> the ref counter. I would caution that with this sort of approach we
> start running into the familiar issue:
> - What value is appropriate? Too large and you get false
> positives (falling back to the idle invalid path when we shouldn't), and
> too less and you get false negatives (we still report inaccurate
> counter values).
> - Is the threshold the same across platforms?
> - Will it remain the same 5/10 years from now?
>
> > BTW, that ABI
> > doesn't seem to be synchronous at all, i.e. the cpu might be busy when we
> > check and then become idle when sampling.
> >
>
> I don't think this is necessarily an issue. The ABI doesn't need to be
> synchronous; it is merely a snapshot of the scheduler view of that CPU
> at a point in time. Even the current method of perf counters sampling
> is purely hueristic. The CPU might be idle for the 2 usec the
> sampling is done, and servicing traffic before and after that.
> This is inherent whenever you are sampling any system state.
>
> I would imagine it is more reliable to trust the kernel scheduler's view
> of whether a CPU is idle, than relying on counters and a calculation
> method which are sensitive and unreliable for idle systems
> (i.e stray interrupts can throw off the calculations).
>
> That said, I'm happy to go with the approach folks on this list recommend.
>
> Cheers,
>
> --
> -Prashant
--
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-06-20 5:07 ` Prashant Malani
2025-06-26 18:42 ` Prashant Malani
@ 2025-06-27 7:54 ` Jie Zhan
2025-06-27 17:07 ` Prashant Malani
2025-07-07 8:35 ` Beata Michalska
1 sibling, 2 replies; 44+ messages in thread
From: Jie Zhan @ 2025-06-27 7:54 UTC (permalink / raw)
To: Prashant Malani
Cc: Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar,
Ionela Voinescu, Beata Michalska, z00813676
Hi Prashant,
Sorry for a late reply as I'm busy on other stuff and this doesn't seem to
be an easy issue to solve.
I may provide some thoughts but probably need more time to go through the
history and come up with a good solution.
Actually, the inaccuracy of cppc_cpufreq_get_rate() has been reported and
discussed many times. I believe your issue is just one of the cases.
For the latest kernel, [1] provides a new 'cpuinfo_avg_freq' sysfs file to
reflect the frequency base on AMUs, which is supposed to be more stable.
Though it usually shows 'Resource temporarily unavailable' on my platform
at the moment and looks a bit buggy.
Most of the related discussions can be found in the reference links in [1].
[1] https://lore.kernel.org/linux-pm/20250131162439.3843071-1-beata.michalska@arm.com/
As reported, the current frequency sampling method may show an large error
on 1) 100% load, 2) high memory access pressure, 3) idle cpus in your case.
AFAICS, they may all come from the unstable latency accessing remote AMUs
for 4 times but delaying a fixed 2us sampling window.
Increase the sampling windows seems to help but also increase the time
overhead, so that's not favoured by people.
On 20/06/2025 13:07, Prashant Malani wrote:
> Hi Jie,
>
> Thanks for taking a look at the patch.
>
> On Thu, 19 Jun 2025 at 20:53, Jie Zhan <zhanjie9@hisilicon.com> wrote:
>> On 19/06/2025 08:09, Prashant Malani wrote:
>>> AMU performance counters tend to be inaccurate when measured on idle CPUs.
>>> On an idle CPU which is programmed to 3.4 GHz (verified through firmware),
>>> here is a measurement and calculation of operating frequency:
>>>
>>> t0: ref=899127636, del=3012458473
>>> t1: ref=899129626, del=3012466509
>>> perf=40
>>
>> In this case, the target cpu is mostly idle but not fully idle during the
>> sampling window since the counter grows a little bit.
>> Perhaps some interrupts happen to run on the cpu shortly.
Check back here again, I don't think it 'mostly idle'.
Diff of ref counters is around 2000, and I guess the ref counter freq is
1GHz on your platform? That's exactly 2us, so the target cpu is mostly
busy.
So that might be some other issue. Let's forget the minimum threshold
stuff below for now.
>>
>> Thus, the actual issue is the accuracy of frequency sampling becomes poor
>> when the delta of counters are too small to obtain a reliable accuracy.
>>
>> Would it be more sensible to put a minimum threshold of the delta of
>> counters when sampling the frequency?
>
> I'm happy to throw together a patch if there is some safe
> threshold the experts here can agree on for the minimum delta for
> the ref counter. I would caution that with this sort of approach we
> start running into the familiar issue:
> - What value is appropriate? Too large and you get false
> positives (falling back to the idle invalid path when we shouldn't), and
> too less and you get false negatives (we still report inaccurate
> counter values).
> - Is the threshold the same across platforms?
> - Will it remain the same 5/10 years from now?
>
>> BTW, that ABI
>> doesn't seem to be synchronous at all, i.e. the cpu might be busy when we
>> check and then become idle when sampling.
>>
>
> I don't think this is necessarily an issue. The ABI doesn't need to be
> synchronous; it is merely a snapshot of the scheduler view of that CPU
> at a point in time. Even the current method of perf counters sampling
> is purely hueristic. The CPU might be idle for the 2 usec the
> sampling is done, and servicing traffic before and after that.
> This is inherent whenever you are sampling any system state.
Then the issue is not totally solved, just less often?
>
> I would imagine it is more reliable to trust the kernel scheduler's view
> of whether a CPU is idle, than relying on counters and a calculation
> method which are sensitive and unreliable for idle systems
> (i.e stray interrupts can throw off the calculations).
>
> That said, I'm happy to go with the approach folks on this list recommend.
>
> Cheers,
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-06-27 7:54 ` Jie Zhan
@ 2025-06-27 17:07 ` Prashant Malani
2025-07-02 18:38 ` Prashant Malani
2025-07-07 8:35 ` Beata Michalska
1 sibling, 1 reply; 44+ messages in thread
From: Prashant Malani @ 2025-06-27 17:07 UTC (permalink / raw)
To: Jie Zhan
Cc: Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar,
Ionela Voinescu, Beata Michalska, z00813676
Hi Jie,
On Fri, 27 Jun 2025 at 00:55, Jie Zhan <zhanjie9@hisilicon.com> wrote:
>
>
> Hi Prashant,
>
> Sorry for a late reply as I'm busy on other stuff and this doesn't seem to
> be an easy issue to solve.
>
No worries, the ping was in general to all the people in the thread :)
> For the latest kernel, [1] provides a new 'cpuinfo_avg_freq' sysfs file to
> reflect the frequency base on AMUs, which is supposed to be more stable.
> Though it usually shows 'Resource temporarily unavailable' on my platform
> at the moment and looks a bit buggy.
>
> Most of the related discussions can be found in the reference links in [1].
> [1] https://lore.kernel.org/linux-pm/20250131162439.3843071-1-beata.michalska@arm.com/
>
> As reported, the current frequency sampling method may show an large error
> on 1) 100% load, 2) high memory access pressure, 3) idle cpus in your case.
>
> AFAICS, they may all come from the unstable latency accessing remote AMUs
> for 4 times but delaying a fixed 2us sampling window.
I tried applying [1] which consolidates the ref and del register reads
into 1 IPI, but that did not make a difference. The values still
fluctuate wildly.
>
> Increase the sampling windows seems to help but also increase the time
> overhead, so that's not favoured by people.
>
This experiment did not appear to help in our case. It's a point in
the direction that this method is inherently inaccurate during idle
situations.
> On 20/06/2025 13:07, Prashant Malani wrote:
> > Hi Jie,
> > On Thu, 19 Jun 2025 at 20:53, Jie Zhan <zhanjie9@hisilicon.com> wrote:
> >> On 19/06/2025 08:09, Prashant Malani wrote:
> >>> t0: ref=899127636, del=3012458473
> >>> t1: ref=899129626, del=3012466509
> >>> perf=40
> >>
> >> In this case, the target cpu is mostly idle but not fully idle during the
> >> sampling window since the counter grows a little bit.
> >> Perhaps some interrupts happen to run on the cpu shortly.
>
> Check back here again, I don't think it 'mostly idle'.
> Diff of ref counters is around 2000, and I guess the ref counter freq is
> 1GHz on your platform? That's exactly 2us, so the target cpu is mostly
> busy.
I don't think the reference counter increment means that the CPU is
"busy" or "not idle". Per [2], it just means that the "processor is
active".
idle_cpu() returning true means that the CPU is just running the idle
task, and has nothing in its runqueue.
In our experiments, this is always the case at least when the cpu is
being brought online (which kind of makes sense).
> > I don't think this is necessarily an issue. The ABI doesn't need to be
> > synchronous; it is merely a snapshot of the scheduler view of that CPU
> > at a point in time. Even the current method of perf counters sampling
> > is purely hueristic. The CPU might be idle for the 2 usec the
> > sampling is done, and servicing traffic before and after that.
> > This is inherent whenever you are sampling any system state.
>
> Then the issue is not totally solved, just less often?
>
Yes. I don't think this can be completely solved, given the inherent
inaccuracy in hardware. What this *does* do is mitigate one of the
scenarios, while not impacting sampling when the CPU is actually doing
something useful; as such I don't see much downside to including it.
Best regards,
[1] https://patchew.org/linux/20240229162520.970986-1-vanshikonda@os.amperecomputing.com/20240229162520.970986-4-vanshikonda@os.amperecomputing.com/
[2] https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html?highlight=cppc#performance-counters
--
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-06-27 17:07 ` Prashant Malani
@ 2025-07-02 18:38 ` Prashant Malani
2025-07-03 9:29 ` Beata Michalska
2025-07-07 8:32 ` Beata Michalska
0 siblings, 2 replies; 44+ messages in thread
From: Prashant Malani @ 2025-07-02 18:38 UTC (permalink / raw)
To: Jie Zhan, Ionela Voinescu, Beata Michalska
Cc: Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar, z00813676
Hi All,
Ionela, Beata, could you kindly review ?
On Fri, 27 Jun 2025 at 10:07, Prashant Malani <pmalani@google.com> wrote:
>
> Hi Jie,
>
> On Fri, 27 Jun 2025 at 00:55, Jie Zhan <zhanjie9@hisilicon.com> wrote:
> >
> >
> > Hi Prashant,
> >
> > Sorry for a late reply as I'm busy on other stuff and this doesn't seem to
> > be an easy issue to solve.
> >
>
> No worries, the ping was in general to all the people in the thread :)
>
> > For the latest kernel, [1] provides a new 'cpuinfo_avg_freq' sysfs file to
> > reflect the frequency base on AMUs, which is supposed to be more stable.
> > Though it usually shows 'Resource temporarily unavailable' on my platform
> > at the moment and looks a bit buggy.
> >
> > Most of the related discussions can be found in the reference links in [1].
> > [1] https://lore.kernel.org/linux-pm/20250131162439.3843071-1-beata.michalska@arm.com/
> >
> > As reported, the current frequency sampling method may show an large error
> > on 1) 100% load, 2) high memory access pressure, 3) idle cpus in your case.
> >
> > AFAICS, they may all come from the unstable latency accessing remote AMUs
> > for 4 times but delaying a fixed 2us sampling window.
>
> I tried applying [1] which consolidates the ref and del register reads
> into 1 IPI, but that did not make a difference. The values still
> fluctuate wildly.
>
> >
> > Increase the sampling windows seems to help but also increase the time
> > overhead, so that's not favoured by people.
> >
>
> This experiment did not appear to help in our case. It's a point in
> the direction that this method is inherently inaccurate during idle
> situations.
>
> > On 20/06/2025 13:07, Prashant Malani wrote:
> > > Hi Jie,
> > > On Thu, 19 Jun 2025 at 20:53, Jie Zhan <zhanjie9@hisilicon.com> wrote:
> > >> On 19/06/2025 08:09, Prashant Malani wrote:
> > >>> t0: ref=899127636, del=3012458473
> > >>> t1: ref=899129626, del=3012466509
> > >>> perf=40
> > >>
> > >> In this case, the target cpu is mostly idle but not fully idle during the
> > >> sampling window since the counter grows a little bit.
> > >> Perhaps some interrupts happen to run on the cpu shortly.
> >
> > Check back here again, I don't think it 'mostly idle'.
> > Diff of ref counters is around 2000, and I guess the ref counter freq is
> > 1GHz on your platform? That's exactly 2us, so the target cpu is mostly
> > busy.
I think it is pertinent to note: the actual act of reading the CPPC counters
will (at least for ACPI_ADR_SPACE_FIXED_HARDWARE counters)
wake the CPU up, so even if a CPU *was* idle, the reading of the counters
calls cpc_read_ffh() [1] which does an IPI on the target CPU [2] thus waking
it up from WFI.
And that brings us back to the original assertion made in this patch:
the counter values are quite unreliable when the CPU is in this
idle (or rather I should correct that to, waking from WFI) state.
This work around probably hits more types of implementations, but
I can't see another way to limit it to only ARM FFH. Open to suggestions!
[1] https://elixir.bootlin.com/linux/v6.15.4/source/arch/arm64/kernel/topology.c#L482
[2] https://elixir.bootlin.com/linux/v6.15.4/source/arch/arm64/kernel/topology.c#L453
Best regards,
-Prashant
--
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-02 18:38 ` Prashant Malani
@ 2025-07-03 9:29 ` Beata Michalska
2025-07-07 8:32 ` Beata Michalska
1 sibling, 0 replies; 44+ messages in thread
From: Beata Michalska @ 2025-07-03 9:29 UTC (permalink / raw)
To: Prashant Malani
Cc: Jie Zhan, Ionela Voinescu, Ben Segall, Dietmar Eggemann,
Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar, z00813676
Hi Prashant,
On Wed, Jul 02, 2025 at 11:38:11AM -0700, Prashant Malani wrote:
> Hi All,
>
> Ionela, Beata, could you kindly review ?
>
I've totally missed that - apologies for that. Will try to have a look within
next day or two.
---
BR
Beata
> On Fri, 27 Jun 2025 at 10:07, Prashant Malani <pmalani@google.com> wrote:
> >
> > Hi Jie,
> >
> > On Fri, 27 Jun 2025 at 00:55, Jie Zhan <zhanjie9@hisilicon.com> wrote:
> > >
> > >
> > > Hi Prashant,
> > >
> > > Sorry for a late reply as I'm busy on other stuff and this doesn't seem to
> > > be an easy issue to solve.
> > >
> >
> > No worries, the ping was in general to all the people in the thread :)
> >
> > > For the latest kernel, [1] provides a new 'cpuinfo_avg_freq' sysfs file to
> > > reflect the frequency base on AMUs, which is supposed to be more stable.
> > > Though it usually shows 'Resource temporarily unavailable' on my platform
> > > at the moment and looks a bit buggy.
> > >
> > > Most of the related discussions can be found in the reference links in [1].
> > > [1] https://lore.kernel.org/linux-pm/20250131162439.3843071-1-beata.michalska@arm.com/
> > >
> > > As reported, the current frequency sampling method may show an large error
> > > on 1) 100% load, 2) high memory access pressure, 3) idle cpus in your case.
> > >
> > > AFAICS, they may all come from the unstable latency accessing remote AMUs
> > > for 4 times but delaying a fixed 2us sampling window.
> >
> > I tried applying [1] which consolidates the ref and del register reads
> > into 1 IPI, but that did not make a difference. The values still
> > fluctuate wildly.
> >
> > >
> > > Increase the sampling windows seems to help but also increase the time
> > > overhead, so that's not favoured by people.
> > >
> >
> > This experiment did not appear to help in our case. It's a point in
> > the direction that this method is inherently inaccurate during idle
> > situations.
> >
> > > On 20/06/2025 13:07, Prashant Malani wrote:
> > > > Hi Jie,
> > > > On Thu, 19 Jun 2025 at 20:53, Jie Zhan <zhanjie9@hisilicon.com> wrote:
> > > >> On 19/06/2025 08:09, Prashant Malani wrote:
> > > >>> t0: ref=899127636, del=3012458473
> > > >>> t1: ref=899129626, del=3012466509
> > > >>> perf=40
> > > >>
> > > >> In this case, the target cpu is mostly idle but not fully idle during the
> > > >> sampling window since the counter grows a little bit.
> > > >> Perhaps some interrupts happen to run on the cpu shortly.
> > >
> > > Check back here again, I don't think it 'mostly idle'.
> > > Diff of ref counters is around 2000, and I guess the ref counter freq is
> > > 1GHz on your platform? That's exactly 2us, so the target cpu is mostly
> > > busy.
>
> I think it is pertinent to note: the actual act of reading the CPPC counters
> will (at least for ACPI_ADR_SPACE_FIXED_HARDWARE counters)
> wake the CPU up, so even if a CPU *was* idle, the reading of the counters
> calls cpc_read_ffh() [1] which does an IPI on the target CPU [2] thus waking
> it up from WFI.
>
> And that brings us back to the original assertion made in this patch:
> the counter values are quite unreliable when the CPU is in this
> idle (or rather I should correct that to, waking from WFI) state.
>
> This work around probably hits more types of implementations, but
> I can't see another way to limit it to only ARM FFH. Open to suggestions!
>
> [1] https://elixir.bootlin.com/linux/v6.15.4/source/arch/arm64/kernel/topology.c#L482
> [2] https://elixir.bootlin.com/linux/v6.15.4/source/arch/arm64/kernel/topology.c#L453
>
> Best regards,
>
> -Prashant
>
>
> --
> -Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-02 18:38 ` Prashant Malani
2025-07-03 9:29 ` Beata Michalska
@ 2025-07-07 8:32 ` Beata Michalska
2025-07-09 17:25 ` Prashant Malani
1 sibling, 1 reply; 44+ messages in thread
From: Beata Michalska @ 2025-07-07 8:32 UTC (permalink / raw)
To: Prashant Malani
Cc: Jie Zhan, Ionela Voinescu, Ben Segall, Dietmar Eggemann,
Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar, z00813676
Hi Prashant,
On Wed, Jul 02, 2025 at 11:38:11AM -0700, Prashant Malani wrote:
> Hi All,
>
> Ionela, Beata, could you kindly review ?
>
> On Fri, 27 Jun 2025 at 10:07, Prashant Malani <pmalani@google.com> wrote:
> >
> > Hi Jie,
> >
> > On Fri, 27 Jun 2025 at 00:55, Jie Zhan <zhanjie9@hisilicon.com> wrote:
> > >
> > >
> > > Hi Prashant,
> > >
> > > Sorry for a late reply as I'm busy on other stuff and this doesn't seem to
> > > be an easy issue to solve.
> > >
> >
> > No worries, the ping was in general to all the people in the thread :)
> >
> > > For the latest kernel, [1] provides a new 'cpuinfo_avg_freq' sysfs file to
> > > reflect the frequency base on AMUs, which is supposed to be more stable.
> > > Though it usually shows 'Resource temporarily unavailable' on my platform
> > > at the moment and looks a bit buggy.
> > >
> > > Most of the related discussions can be found in the reference links in [1].
> > > [1] https://lore.kernel.org/linux-pm/20250131162439.3843071-1-beata.michalska@arm.com/
> > >
> > > As reported, the current frequency sampling method may show an large error
> > > on 1) 100% load, 2) high memory access pressure, 3) idle cpus in your case.
> > >
> > > AFAICS, they may all come from the unstable latency accessing remote AMUs
> > > for 4 times but delaying a fixed 2us sampling window.
> >
> > I tried applying [1] which consolidates the ref and del register reads
> > into 1 IPI, but that did not make a difference. The values still
> > fluctuate wildly.
> >
> > >
> > > Increase the sampling windows seems to help but also increase the time
> > > overhead, so that's not favoured by people.
> > >
> >
> > This experiment did not appear to help in our case. It's a point in
> > the direction that this method is inherently inaccurate during idle
> > situations.
> >
> > > On 20/06/2025 13:07, Prashant Malani wrote:
> > > > Hi Jie,
> > > > On Thu, 19 Jun 2025 at 20:53, Jie Zhan <zhanjie9@hisilicon.com> wrote:
> > > >> On 19/06/2025 08:09, Prashant Malani wrote:
> > > >>> t0: ref=899127636, del=3012458473
> > > >>> t1: ref=899129626, del=3012466509
> > > >>> perf=40
> > > >>
> > > >> In this case, the target cpu is mostly idle but not fully idle during the
> > > >> sampling window since the counter grows a little bit.
> > > >> Perhaps some interrupts happen to run on the cpu shortly.
> > >
> > > Check back here again, I don't think it 'mostly idle'.
> > > Diff of ref counters is around 2000, and I guess the ref counter freq is
> > > 1GHz on your platform? That's exactly 2us, so the target cpu is mostly
> > > busy.
>
> I think it is pertinent to note: the actual act of reading the CPPC counters
> will (at least for ACPI_ADR_SPACE_FIXED_HARDWARE counters)
> wake the CPU up, so even if a CPU *was* idle, the reading of the counters
> calls cpc_read_ffh() [1] which does an IPI on the target CPU [2] thus waking
> it up from WFI.
>
> And that brings us back to the original assertion made in this patch:
> the counter values are quite unreliable when the CPU is in this
> idle (or rather I should correct that to, waking from WFI) state.
>
I'd say that's very platform specific, and as such playing with the delay makes
little sense. I'd need to do more deliberate testing, but I haven't noticed
(yet) any discrepancies in AMU counters on waking up.
Aside, you have mentioned that you've observed the frequency reported to be
above max one (4GHz vs 3.5HZ if I recall correctly) - shouldn't that be clamped
by the driver if the values are outside of supported range ?
Verifying whether the CPU is idle before poking it just to get a counters
reading to derive current frequency from those does feel rather like an
appealing idea. Narrowing the scope for ACPI_ADR_SPACE_FIXED_HARDWARE cases
could be solved by providing a query for the address space. Though I am not an
expert here so would be good to get some input from someone who is
(on both).
---
BR
Beata
> This work around probably hits more types of implementations, but
> I can't see another way to limit it to only ARM FFH. Open to suggestions!
>
> [1] https://elixir.bootlin.com/linux/v6.15.4/source/arch/arm64/kernel/topology.c#L482
> [2] https://elixir.bootlin.com/linux/v6.15.4/source/arch/arm64/kernel/topology.c#L453
>
> Best regards,
>
> -Prashant
>
>
> --
> -Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-06-27 7:54 ` Jie Zhan
2025-06-27 17:07 ` Prashant Malani
@ 2025-07-07 8:35 ` Beata Michalska
1 sibling, 0 replies; 44+ messages in thread
From: Beata Michalska @ 2025-07-07 8:35 UTC (permalink / raw)
To: Jie Zhan
Cc: Prashant Malani, Ben Segall, Dietmar Eggemann, Ingo Molnar,
Juri Lelli, open list, open list:CPU FREQUENCY SCALING FRAMEWORK,
Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar,
Ionela Voinescu, z00813676
On Fri, Jun 27, 2025 at 03:54:59PM +0800, Jie Zhan wrote:
>
> Hi Prashant,
>
> Sorry for a late reply as I'm busy on other stuff and this doesn't seem to
> be an easy issue to solve.
>
> I may provide some thoughts but probably need more time to go through the
> history and come up with a good solution.
>
> Actually, the inaccuracy of cppc_cpufreq_get_rate() has been reported and
> discussed many times. I believe your issue is just one of the cases.
>
> For the latest kernel, [1] provides a new 'cpuinfo_avg_freq' sysfs file to
> reflect the frequency base on AMUs, which is supposed to be more stable.
> Though it usually shows 'Resource temporarily unavailable' on my platform
> at the moment and looks a bit buggy.
>
I'd say that would mean the CPU for which the 'cpuinfo_avg_freq' is queried is
mostly idle. If that is not the case then I guess it is 'buggy' and should be
fixed, so more details would be appreciated.
---
BR
Beata
> Most of the related discussions can be found in the reference links in [1].
> [1] https://lore.kernel.org/linux-pm/20250131162439.3843071-1-beata.michalska@arm.com/
>
> As reported, the current frequency sampling method may show an large error
> on 1) 100% load, 2) high memory access pressure, 3) idle cpus in your case.
>
> AFAICS, they may all come from the unstable latency accessing remote AMUs
> for 4 times but delaying a fixed 2us sampling window.
>
> Increase the sampling windows seems to help but also increase the time
> overhead, so that's not favoured by people.
>
> On 20/06/2025 13:07, Prashant Malani wrote:
> > Hi Jie,
> >
> > Thanks for taking a look at the patch.
> >
> > On Thu, 19 Jun 2025 at 20:53, Jie Zhan <zhanjie9@hisilicon.com> wrote:
> >> On 19/06/2025 08:09, Prashant Malani wrote:
> >>> AMU performance counters tend to be inaccurate when measured on idle CPUs.
> >>> On an idle CPU which is programmed to 3.4 GHz (verified through firmware),
> >>> here is a measurement and calculation of operating frequency:
> >>>
> >>> t0: ref=899127636, del=3012458473
> >>> t1: ref=899129626, del=3012466509
> >>> perf=40
> >>
> >> In this case, the target cpu is mostly idle but not fully idle during the
> >> sampling window since the counter grows a little bit.
> >> Perhaps some interrupts happen to run on the cpu shortly.
>
> Check back here again, I don't think it 'mostly idle'.
> Diff of ref counters is around 2000, and I guess the ref counter freq is
> 1GHz on your platform? That's exactly 2us, so the target cpu is mostly
> busy.
>
> So that might be some other issue. Let's forget the minimum threshold
> stuff below for now.
>
> >>
> >> Thus, the actual issue is the accuracy of frequency sampling becomes poor
> >> when the delta of counters are too small to obtain a reliable accuracy.
> >>
> >> Would it be more sensible to put a minimum threshold of the delta of
> >> counters when sampling the frequency?
> >
> > I'm happy to throw together a patch if there is some safe
> > threshold the experts here can agree on for the minimum delta for
> > the ref counter. I would caution that with this sort of approach we
> > start running into the familiar issue:
> > - What value is appropriate? Too large and you get false
> > positives (falling back to the idle invalid path when we shouldn't), and
> > too less and you get false negatives (we still report inaccurate
> > counter values).
> > - Is the threshold the same across platforms?
> > - Will it remain the same 5/10 years from now?
> >
> >> BTW, that ABI
> >> doesn't seem to be synchronous at all, i.e. the cpu might be busy when we
> >> check and then become idle when sampling.
> >>
> >
> > I don't think this is necessarily an issue. The ABI doesn't need to be
> > synchronous; it is merely a snapshot of the scheduler view of that CPU
> > at a point in time. Even the current method of perf counters sampling
> > is purely hueristic. The CPU might be idle for the 2 usec the
> > sampling is done, and servicing traffic before and after that.
> > This is inherent whenever you are sampling any system state.
>
> Then the issue is not totally solved, just less often?
>
> >
> > I would imagine it is more reliable to trust the kernel scheduler's view
> > of whether a CPU is idle, than relying on counters and a calculation
> > method which are sensitive and unreliable for idle systems
> > (i.e stray interrupts can throw off the calculations).
> >
> > That said, I'm happy to go with the approach folks on this list recommend.
> >
> > Cheers,
> >
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-07 8:32 ` Beata Michalska
@ 2025-07-09 17:25 ` Prashant Malani
2025-07-09 22:49 ` Prashant Malani
0 siblings, 1 reply; 44+ messages in thread
From: Prashant Malani @ 2025-07-09 17:25 UTC (permalink / raw)
To: Beata Michalska
Cc: Jie Zhan, Ionela Voinescu, Ben Segall, Dietmar Eggemann,
Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar, z00813676
Hi Beata,
Thanks for taking a look.
On Mon, 7 Jul 2025 at 01:33, Beata Michalska <beata.michalska@arm.com> wrote:
>
> Hi Prashant,
>
> On Wed, Jul 02, 2025 at 11:38:11AM -0700, Prashant Malani wrote:
> > Hi All,
> >
> > Ionela, Beata, could you kindly review ?
> >
> > On Fri, 27 Jun 2025 at 10:07, Prashant Malani <pmalani@google.com> wrote:
> > >
> >
> > I think it is pertinent to note: the actual act of reading the CPPC counters
> > will (at least for ACPI_ADR_SPACE_FIXED_HARDWARE counters)
> > wake the CPU up, so even if a CPU *was* idle, the reading of the counters
> > calls cpc_read_ffh() [1] which does an IPI on the target CPU [2] thus waking
> > it up from WFI.
> >
> > And that brings us back to the original assertion made in this patch:
> > the counter values are quite unreliable when the CPU is in this
> > idle (or rather I should correct that to, waking from WFI) state.
> >
> I'd say that's very platform specific, and as such playing with the delay makes
> little sense. I'd need to do more deliberate testing, but I haven't noticed
> (yet) any discrepancies in AMU counters on waking up.
> Aside, you have mentioned that you've observed the frequency reported to be
> above max one (4GHz vs 3.5HZ if I recall correctly) - shouldn't that be clamped
> by the driver if the values are outside of supported range ?
>
> Verifying whether the CPU is idle before poking it just to get a counters
> reading to derive current frequency from those does feel rather like an
> appealing idea.
That's good to hear. What can we do to refine this series further?
> Narrowing the scope for ACPI_ADR_SPACE_FIXED_HARDWARE cases
> could be solved by providing a query for the address space. Though I am not an
> expert here so would be good to get some input from someone who is
> (on both).
Who would be the expert here (are they on this mailing list)?
Could you point me to an example for the query for the address space? Then
I can respin this series to use that query and narrow the scope.
BR,
--
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-09 17:25 ` Prashant Malani
@ 2025-07-09 22:49 ` Prashant Malani
2025-07-14 9:30 ` Beata Michalska
0 siblings, 1 reply; 44+ messages in thread
From: Prashant Malani @ 2025-07-09 22:49 UTC (permalink / raw)
To: Beata Michalska
Cc: Jie Zhan, Ionela Voinescu, Ben Segall, Dietmar Eggemann,
Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar, z00813676
On Wed, 9 Jul 2025 at 10:25, Prashant Malani <pmalani@google.com> wrote:
>
> Hi Beata,
>
> Thanks for taking a look.
>
> On Mon, 7 Jul 2025 at 01:33, Beata Michalska <beata.michalska@arm.com> wrote:
> >
> > Hi Prashant,
> >
> > On Wed, Jul 02, 2025 at 11:38:11AM -0700, Prashant Malani wrote:
> > > Hi All,
> > >
> > > Ionela, Beata, could you kindly review ?
> > >
> > > On Fri, 27 Jun 2025 at 10:07, Prashant Malani <pmalani@google.com> wrote:
> > > >
> > >
> > > I think it is pertinent to note: the actual act of reading the CPPC counters
> > > will (at least for ACPI_ADR_SPACE_FIXED_HARDWARE counters)
> > > wake the CPU up, so even if a CPU *was* idle, the reading of the counters
> > > calls cpc_read_ffh() [1] which does an IPI on the target CPU [2] thus waking
> > > it up from WFI.
> > >
> > > And that brings us back to the original assertion made in this patch:
> > > the counter values are quite unreliable when the CPU is in this
> > > idle (or rather I should correct that to, waking from WFI) state.
> > >
> > I'd say that's very platform specific, and as such playing with the delay makes
> > little sense. I'd need to do more deliberate testing, but I haven't noticed
> > (yet) any discrepancies in AMU counters on waking up.
> > Aside, you have mentioned that you've observed the frequency reported to be
> > above max one (4GHz vs 3.5HZ if I recall correctly) - shouldn't that be clamped
> > by the driver if the values are outside of supported range ?
> >
> > Verifying whether the CPU is idle before poking it just to get a counters
> > reading to derive current frequency from those does feel rather like an
> > appealing idea.
>
> That's good to hear. What can we do to refine this series further?
>
> > Narrowing the scope for ACPI_ADR_SPACE_FIXED_HARDWARE cases
> > could be solved by providing a query for the address space. Though I am not an
> > expert here so would be good to get some input from someone who is
> > (on both).
>
> Who would be the expert here (are they on this mailing list)?
>
> Could you point me to an example for the query for the address space? Then
> I can respin this series to use that query and narrow the scope.
Actually, if the idea of this optimization (the idle_cpu check) sounds
good to you,
I don't see why we should limit it to ACPI_ADR_SPACE_FIXED_HARDWARE.
IOW, the patch can remain in its current form.
Best regards,
--
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-09 22:49 ` Prashant Malani
@ 2025-07-14 9:30 ` Beata Michalska
2025-07-15 6:28 ` Prashant Malani
0 siblings, 1 reply; 44+ messages in thread
From: Beata Michalska @ 2025-07-14 9:30 UTC (permalink / raw)
To: Prashant Malani
Cc: Jie Zhan, Ionela Voinescu, Ben Segall, Dietmar Eggemann,
Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar, z00813676
On Wed, Jul 09, 2025 at 03:49:03PM -0700, Prashant Malani wrote:
> On Wed, 9 Jul 2025 at 10:25, Prashant Malani <pmalani@google.com> wrote:
> >
> > Hi Beata,
> >
> > Thanks for taking a look.
> >
> > On Mon, 7 Jul 2025 at 01:33, Beata Michalska <beata.michalska@arm.com> wrote:
> > >
> > > Hi Prashant,
> > >
> > > On Wed, Jul 02, 2025 at 11:38:11AM -0700, Prashant Malani wrote:
> > > > Hi All,
> > > >
> > > > Ionela, Beata, could you kindly review ?
> > > >
> > > > On Fri, 27 Jun 2025 at 10:07, Prashant Malani <pmalani@google.com> wrote:
> > > > >
> > > >
> > > > I think it is pertinent to note: the actual act of reading the CPPC counters
> > > > will (at least for ACPI_ADR_SPACE_FIXED_HARDWARE counters)
> > > > wake the CPU up, so even if a CPU *was* idle, the reading of the counters
> > > > calls cpc_read_ffh() [1] which does an IPI on the target CPU [2] thus waking
> > > > it up from WFI.
> > > >
> > > > And that brings us back to the original assertion made in this patch:
> > > > the counter values are quite unreliable when the CPU is in this
> > > > idle (or rather I should correct that to, waking from WFI) state.
> > > >
> > > I'd say that's very platform specific, and as such playing with the delay makes
> > > little sense. I'd need to do more deliberate testing, but I haven't noticed
> > > (yet) any discrepancies in AMU counters on waking up.
> > > Aside, you have mentioned that you've observed the frequency reported to be
> > > above max one (4GHz vs 3.5HZ if I recall correctly) - shouldn't that be clamped
> > > by the driver if the values are outside of supported range ?
> > >
> > > Verifying whether the CPU is idle before poking it just to get a counters
> > > reading to derive current frequency from those does feel rather like an
> > > appealing idea.
> >
> > That's good to hear. What can we do to refine this series further?
So I believe this should be handled in CPUFreq core, if at all.
Would be good to get an input/opinion from the maintainers: Viresh and Rafael.
> >
> > > Narrowing the scope for ACPI_ADR_SPACE_FIXED_HARDWARE cases
> > > could be solved by providing a query for the address space. Though I am not an
> > > expert here so would be good to get some input from someone who is
> > > (on both).
> >
> > Who would be the expert here (are they on this mailing list)?
Probably as above + Sudeep Holla
> >
> > Could you point me to an example for the query for the address space? Then
> > I can respin this series to use that query and narrow the scope.
>
Actually was suggesting adding one.
> Actually, if the idea of this optimization (the idle_cpu check) sounds
> good to you,
> I don't see why we should limit it to ACPI_ADR_SPACE_FIXED_HARDWARE.
> IOW, the patch can remain in its current form.
>
Right, that does need though verifying against all users.
In the meantime ....
It seems that the issue of getting counters on a CPU that is idle is not
in the counters themselves, but in the way how they are being read - at least
from what I can observe.
The first read experience longer delay between reading core and const counters,
and as const one is read as a second one, it misses some increments (within
calculated delta). So, what we could do within the driver is either:
- Add a way to request reading both counters in a single cpc_read (preferable
I guess, though I would have to have a closer look at that)
- Add some logic that would make sure the reads are not far apart
Would you be able to verify that on your end?
---
BR
Beata
> Best regards,
>
> --
> -Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-14 9:30 ` Beata Michalska
@ 2025-07-15 6:28 ` Prashant Malani
2025-07-21 17:00 ` Rafael J. Wysocki
0 siblings, 1 reply; 44+ messages in thread
From: Prashant Malani @ 2025-07-15 6:28 UTC (permalink / raw)
To: Beata Michalska
Cc: Jie Zhan, Ionela Voinescu, Ben Segall, Dietmar Eggemann,
Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar, z00813676,
sudeep.holla
+Sudeep.
On Mon, 14 Jul 2025 at 02:31, Beata Michalska <beata.michalska@arm.com> wrote:
> So I believe this should be handled in CPUFreq core, if at all.
> Would be good to get an input/opinion from the maintainers: Viresh and Rafael.
Viresh, Rafael, Sudeep, could you kindly chime in? The unreliability
of this frequency
measurement method in CPPC is affecting the cached frequency saved by CPUFreq,
which in turn affects future frequency set calls.
It would be great if we could solve this in CPUFreq core (maybe not
rely on the cached
frequency while setting the new one [3]?)
>
> In the meantime ....
> It seems that the issue of getting counters on a CPU that is idle is not
> in the counters themselves, but in the way how they are being read - at least
> from what I can observe.
> The first read experience longer delay between reading core and const counters,
> and as const one is read as a second one, it misses some increments (within
> calculated delta). So, what we could do within the driver is either:
> - Add a way to request reading both counters in a single cpc_read (preferable
> I guess, though I would have to have a closer look at that)
I already tried something like this; I used [1] which basically puts the
2 (constcnt, corecnt) register reads in a single CPC call;
This did not help. The values are still highly variable. I never got
merged FWIW.
> - Add some logic that would make sure the reads are not far apart
As Jie pointed out earlier, a lot of this has been discussed (see the references
within the patch link [2]), so I'm not really sure what else can done
to reduce this on
Linux; there are two registers (SYS_AMEVCNTR0_CORE_EL0 &
SYS_AMEVCNTR0_CORE_EL0), so there will always be some scheduler induced
variability between the two reads.
Best regards,
[1] https://patchew.org/linux/20240229162520.970986-1-vanshikonda@os.amperecomputing.com/20240229162520.970986-4-vanshikonda@os.amperecomputing.com/
[2] https://lore.kernel.org/linux-pm/20250131162439.3843071-1-beata.michalska@arm.com/
[3] https://elixir.bootlin.com/linux/v6.15.6/source/drivers/cpufreq/cpufreq.c#L2415
--
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-15 6:28 ` Prashant Malani
@ 2025-07-21 17:00 ` Rafael J. Wysocki
2025-07-21 19:40 ` Prashant Malani
0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2025-07-21 17:00 UTC (permalink / raw)
To: Prashant Malani
Cc: Beata Michalska, Jie Zhan, Ionela Voinescu, Ben Segall,
Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
Valentin Schneider, Vincent Guittot, Viresh Kumar, z00813676,
sudeep.holla
On Tue, Jul 15, 2025 at 8:28 AM Prashant Malani <pmalani@google.com> wrote:
>
> +Sudeep.
>
> On Mon, 14 Jul 2025 at 02:31, Beata Michalska <beata.michalska@arm.com> wrote:
> > So I believe this should be handled in CPUFreq core, if at all.
> > Would be good to get an input/opinion from the maintainers: Viresh and Rafael.
>
> Viresh, Rafael, Sudeep, could you kindly chime in? The unreliability
> of this frequency
> measurement method in CPPC is affecting the cached frequency saved by CPUFreq,
> which in turn affects future frequency set calls.
I gather that "the cached frequency saved by CPUFreq" means policy->cur.
> It would be great if we could solve this in CPUFreq core (maybe not
> rely on the cached frequency while setting the new one [3]?)
I see what you mean now.
Why don't you flag the driver as CPUFREQ_NEED_UPDATE_LIMITS?
That would kind of make sense given how the driver works overall, or
am I missing anything?
> >
> > In the meantime ....
> > It seems that the issue of getting counters on a CPU that is idle is not
> > in the counters themselves, but in the way how they are being read - at least
> > from what I can observe.
> > The first read experience longer delay between reading core and const counters,
> > and as const one is read as a second one, it misses some increments (within
> > calculated delta). So, what we could do within the driver is either:
> > - Add a way to request reading both counters in a single cpc_read (preferable
> > I guess, though I would have to have a closer look at that)
>
> I already tried something like this; I used [1] which basically puts the
> 2 (constcnt, corecnt) register reads in a single CPC call;
> This did not help. The values are still highly variable. I never got
> merged FWIW.
>
> > - Add some logic that would make sure the reads are not far apart
>
> As Jie pointed out earlier, a lot of this has been discussed (see the references
> within the patch link [2]), so I'm not really sure what else can done
> to reduce this on
> Linux; there are two registers (SYS_AMEVCNTR0_CORE_EL0 &
> SYS_AMEVCNTR0_CORE_EL0), so there will always be some scheduler induced
> variability between the two reads.
>
> Best regards,
>
> [1] https://patchew.org/linux/20240229162520.970986-1-vanshikonda@os.amperecomputing.com/20240229162520.970986-4-vanshikonda@os.amperecomputing.com/
> [2] https://lore.kernel.org/linux-pm/20250131162439.3843071-1-beata.michalska@arm.com/
> [3] https://elixir.bootlin.com/linux/v6.15.6/source/drivers/cpufreq/cpufreq.c#L2415
>
> --
> -Prashant
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-21 17:00 ` Rafael J. Wysocki
@ 2025-07-21 19:40 ` Prashant Malani
2025-07-22 3:27 ` Viresh Kumar
0 siblings, 1 reply; 44+ messages in thread
From: Prashant Malani @ 2025-07-21 19:40 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Beata Michalska, Jie Zhan, Ionela Voinescu, Ben Segall,
Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, Viresh Kumar, z00813676, sudeep.holla
Hi Rafael,
Thanks for looking into this.
On Mon, 21 Jul 2025 at 10:00, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Jul 15, 2025 at 8:28 AM Prashant Malani <pmalani@google.com> wrote:
> >
> > +Sudeep.
> >
> > On Mon, 14 Jul 2025 at 02:31, Beata Michalska <beata.michalska@arm.com> wrote:
> > > So I believe this should be handled in CPUFreq core, if at all.
> > > Would be good to get an input/opinion from the maintainers: Viresh and Rafael.
> >
> > Viresh, Rafael, Sudeep, could you kindly chime in? The unreliability
> > of this frequency
> > measurement method in CPPC is affecting the cached frequency saved by CPUFreq,
> > which in turn affects future frequency set calls.
>
> I gather that "the cached frequency saved by CPUFreq" means policy->cur.
Yes, that's right.
>
> > It would be great if we could solve this in CPUFreq core (maybe not
> > rely on the cached frequency while setting the new one [3]?)
>
> I see what you mean now.
>
> Why don't you flag the driver as CPUFREQ_NEED_UPDATE_LIMITS?
>
> That would kind of make sense given how the driver works overall, or
> am I missing anything?
Sounds fine to me (it doesn't fix the lingering accuracy issue, but at
least frequency
setting will get unblocked). I can put together a patch if there are
no objections.
Thanks,
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-21 19:40 ` Prashant Malani
@ 2025-07-22 3:27 ` Viresh Kumar
2025-07-22 6:02 ` Prashant Malani
0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2025-07-22 3:27 UTC (permalink / raw)
To: Prashant Malani
Cc: Rafael J. Wysocki, Beata Michalska, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On 21-07-25, 12:40, Prashant Malani wrote:
> On Mon, 21 Jul 2025 at 10:00, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > Why don't you flag the driver as CPUFREQ_NEED_UPDATE_LIMITS?
> >
> > That would kind of make sense given how the driver works overall, or
> > am I missing anything?
+1
> Sounds fine to me (it doesn't fix the lingering accuracy issue, but at
> least frequency
> setting will get unblocked). I can put together a patch if there are
> no objections.
--
viresh
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-22 3:27 ` Viresh Kumar
@ 2025-07-22 6:02 ` Prashant Malani
2025-07-30 7:31 ` Prashant Malani
0 siblings, 1 reply; 44+ messages in thread
From: Prashant Malani @ 2025-07-22 6:02 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael J. Wysocki, Beata Michalska, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
Hi Viresh and Rafael,
Thank you for taking the time to look at this series.
On Mon, 21 Jul 2025 at 20:27, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 21-07-25, 12:40, Prashant Malani wrote:
> > On Mon, 21 Jul 2025 at 10:00, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > Why don't you flag the driver as CPUFREQ_NEED_UPDATE_LIMITS?
> > >
> > > That would kind of make sense given how the driver works overall, or
> > > am I missing anything?
>
> +1
Thanks, I posted [1] which implements what's suggested by Rafael. PTAL
Best regards,
[1] https://lore.kernel.org/linux-pm/20250722055611.130574-2-pmalani@google.com/
--
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-22 6:02 ` Prashant Malani
@ 2025-07-30 7:31 ` Prashant Malani
2025-07-31 8:27 ` Beata Michalska
0 siblings, 1 reply; 44+ messages in thread
From: Prashant Malani @ 2025-07-30 7:31 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael J. Wysocki, Beata Michalska, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
Sorry for restarting this thread, but I think the solution submitted
solves the write path. The read path still needs addressing.
As such, given the limitations and scheduler uncertainties
around the reads of the FFH registers, I think the patch in this
thread is still a good optimization to include; namely, if we know
the CPU is idle, don't bother trying to wake up that CPU just to
calculate frequency.
On Mon, 21 Jul 2025 at 23:02, Prashant Malani <pmalani@google.com> wrote:
>
> Hi Viresh and Rafael,
>
> Thank you for taking the time to look at this series.
>
> On Mon, 21 Jul 2025 at 20:27, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 21-07-25, 12:40, Prashant Malani wrote:
> > > On Mon, 21 Jul 2025 at 10:00, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > Why don't you flag the driver as CPUFREQ_NEED_UPDATE_LIMITS?
> > > >
> > > > That would kind of make sense given how the driver works overall, or
> > > > am I missing anything?
> >
> > +1
>
> Thanks, I posted [1] which implements what's suggested by Rafael. PTAL
>
> Best regards,
>
> [1] https://lore.kernel.org/linux-pm/20250722055611.130574-2-pmalani@google.com/
>
> --
> -Prashant
--
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-30 7:31 ` Prashant Malani
@ 2025-07-31 8:27 ` Beata Michalska
2025-07-31 11:13 ` Viresh Kumar
2025-07-31 16:51 ` Prashant Malani
0 siblings, 2 replies; 44+ messages in thread
From: Beata Michalska @ 2025-07-31 8:27 UTC (permalink / raw)
To: Prashant Malani
Cc: Viresh Kumar, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On Wed, Jul 30, 2025 at 12:31:33AM -0700, Prashant Malani wrote:
> Sorry for restarting this thread, but I think the solution submitted
> solves the write path. The read path still needs addressing.
>
> As such, given the limitations and scheduler uncertainties
> around the reads of the FFH registers, I think the patch in this
> thread is still a good optimization to include; namely, if we know
> the CPU is idle, don't bother trying to wake up that CPU just to
> calculate frequency.
>
I am still wondering whether cpufreq core is not a better suited place for
checking whether the CPU is idle. We could potentially try on anther CPU within
the policy and if there is none, just provide the last known freq ?
@Viresh: What are your thoughts on that ?
In the meantime I'm still trying to figure out smht to mitigate the issues with
the numbers we get from counters after waking up the CPU.
---
BR
Beata
>
> On Mon, 21 Jul 2025 at 23:02, Prashant Malani <pmalani@google.com> wrote:
> >
> > Hi Viresh and Rafael,
> >
> > Thank you for taking the time to look at this series.
> >
> > On Mon, 21 Jul 2025 at 20:27, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 21-07-25, 12:40, Prashant Malani wrote:
> > > > On Mon, 21 Jul 2025 at 10:00, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > Why don't you flag the driver as CPUFREQ_NEED_UPDATE_LIMITS?
> > > > >
> > > > > That would kind of make sense given how the driver works overall, or
> > > > > am I missing anything?
> > >
> > > +1
> >
> > Thanks, I posted [1] which implements what's suggested by Rafael. PTAL
> >
> > Best regards,
> >
> > [1] https://lore.kernel.org/linux-pm/20250722055611.130574-2-pmalani@google.com/
> >
> > --
> > -Prashant
>
>
>
> --
> -Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-31 8:27 ` Beata Michalska
@ 2025-07-31 11:13 ` Viresh Kumar
2025-07-31 20:23 ` Beata Michalska
2025-07-31 16:51 ` Prashant Malani
1 sibling, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2025-07-31 11:13 UTC (permalink / raw)
To: Beata Michalska
Cc: Prashant Malani, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On 31-07-25, 10:27, Beata Michalska wrote:
> I am still wondering whether cpufreq core is not a better suited place for
> checking whether the CPU is idle. We could potentially try on anther CPU within
> the policy and if there is none, just provide the last known freq ?
>
> @Viresh: What are your thoughts on that ?
For other platforms (that' don't have counters to read), I think a call to
->get() to get the currently configured frequency is perfectly fine. Isn't it ?
--
viresh
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-31 8:27 ` Beata Michalska
2025-07-31 11:13 ` Viresh Kumar
@ 2025-07-31 16:51 ` Prashant Malani
2025-07-31 20:30 ` Beata Michalska
1 sibling, 1 reply; 44+ messages in thread
From: Prashant Malani @ 2025-07-31 16:51 UTC (permalink / raw)
To: Beata Michalska
Cc: Viresh Kumar, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
HI Beata,
On Thu, 31 Jul 2025 at 01:27, Beata Michalska <beata.michalska@arm.com> wrote:
> In the meantime I'm still trying to figure out smht to mitigate the issues with
> the numbers we get from counters after waking up the CPU.
My attempt at that is here: [1]
It's been proposed before, but it's increasing the delay between samples.
[1] https://lore.kernel.org/linux-pm/20250730220812.53098-1-pmalani@google.com/
BR,
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-31 11:13 ` Viresh Kumar
@ 2025-07-31 20:23 ` Beata Michalska
2025-08-01 4:43 ` Viresh Kumar
0 siblings, 1 reply; 44+ messages in thread
From: Beata Michalska @ 2025-07-31 20:23 UTC (permalink / raw)
To: Viresh Kumar
Cc: Prashant Malani, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On Thu, Jul 31, 2025 at 04:43:24PM +0530, Viresh Kumar wrote:
> On 31-07-25, 10:27, Beata Michalska wrote:
> > I am still wondering whether cpufreq core is not a better suited place for
> > checking whether the CPU is idle. We could potentially try on anther CPU within
> > the policy and if there is none, just provide the last known freq ?
> >
> > @Viresh: What are your thoughts on that ?
>
> For other platforms (that' don't have counters to read), I think a call to
> ->get() to get the currently configured frequency is perfectly fine. Isn't it ?
I assume it is. I was just thinking out loud and wasn't suggesting anything
otherwise. The reason why I mentioned that is that getting current frequency
for an idle CPU seems like smth we could potentially optimise away and save some
cycles (fixing other problems on the way, like this one).
But if that's undesired for any reason, it's perfectly fine to stay with
unconditional calls to `->get()'.
---
BR
Beata
>
> --
> viresh
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-31 16:51 ` Prashant Malani
@ 2025-07-31 20:30 ` Beata Michalska
2025-08-01 9:16 ` Prashant Malani
0 siblings, 1 reply; 44+ messages in thread
From: Beata Michalska @ 2025-07-31 20:30 UTC (permalink / raw)
To: Prashant Malani
Cc: Viresh Kumar, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On Thu, Jul 31, 2025 at 09:51:51AM -0700, Prashant Malani wrote:
> HI Beata,
>
> On Thu, 31 Jul 2025 at 01:27, Beata Michalska <beata.michalska@arm.com> wrote:
> > In the meantime I'm still trying to figure out smht to mitigate the issues with
> > the numbers we get from counters after waking up the CPU.
>
> My attempt at that is here: [1]
> It's been proposed before, but it's increasing the delay between samples.
>
> [1] https://lore.kernel.org/linux-pm/20250730220812.53098-1-pmalani@google.com/
Thank you for the info, but I'm exploring ways that will not increase the time
window between the reads.
While we are at it, would you be able to drop me some numbers from your
platform, preferably good and `bad` ones:
both counter values, and the bits that are used for mapping performance to
actual frequency (nominal freq/perf, reference perf)
---
BR
Beata
>
> BR,
>
> -Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-31 20:23 ` Beata Michalska
@ 2025-08-01 4:43 ` Viresh Kumar
2025-08-07 0:19 ` Prashant Malani
0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2025-08-01 4:43 UTC (permalink / raw)
To: Beata Michalska
Cc: Prashant Malani, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On 31-07-25, 22:23, Beata Michalska wrote:
> The reason why I mentioned that is that getting current frequency
> for an idle CPU seems like smth we could potentially optimise away and save some
> cycles (fixing other problems on the way, like this one).
I agree with that idea, just that the cpufreq core may not be the right place
for that. Doing that in the driver should be fine.
> But if that's undesired for any reason, it's perfectly fine to stay with
--
viresh
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-07-31 20:30 ` Beata Michalska
@ 2025-08-01 9:16 ` Prashant Malani
2025-08-04 20:55 ` Prashant Malani
0 siblings, 1 reply; 44+ messages in thread
From: Prashant Malani @ 2025-08-01 9:16 UTC (permalink / raw)
To: Beata Michalska
Cc: Viresh Kumar, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
HI Beata,
On Thu, 31 Jul 2025 at 13:31, Beata Michalska <beata.michalska@arm.com> wrote:
> Thank you for the info, but I'm exploring ways that will not increase the time
> window between the reads.
IMO this issue is intractable on non-RT OSes like Linux (at least,
Linux when it is not compiled for RT), since we basically need to
ensure atomicity for the reading of both ref and del registers together.
We can't disable preemption here, since some of
the code paths (like PCC regs) acquire semaphores [2].
This also explains why we don't see this issue while reading the same
registers from firmware (running an RTOS). There, the same readings
are accurate (whether the CPU is idle or loaded).
So mitigations like increasing the delay are the only practical recourse.
IAC, I hope this doesn't detract from the discussion regarding the patch
series in this thread, which is about the idle optimization
(which I think we should pursue).
> While we are at it, would you be able to drop me some numbers from your
> platform, preferably good and `bad` ones:
> both counter values, and the bits that are used for mapping performance to
> actual frequency (nominal freq/perf, reference perf)
Sure. There are two pathological scenarios.
The first is when the CPU is idle/mostly idle. Here are some counter readings
(I used ftrace for all of these, and the max possible frequency is 3.4GHz):
t0: del:2936200649, ref:972155370
t1: del:2936208538, ref:972157370
ref_perf:10
delivered_perf:39
t0: del:1733705541, ref:518550840
t1: del:1733713524, ref:518552820
ref_perf:10
delivered_perf:40
This scenario is handled by this patch series.
The second is the heavily loaded CPU case.
Here are the counter readings:
t0: del:93896505680, ref:27625620970
t1: del:93896521640, ref:27625625360
ref_perf:10
delivered_perf:36
t0: del:94258513479, ref:27795493670
t1: del:94258529230, ref:27795498090
ref_perf:10
delivered_perf:35
These aren't as bad, but still above the stated maximum.
These get addressed by [1].
And here is a "good" measurement when the CPU is loaded:
t0: del:102081104909, ref:30074917840
t1: del:102081121558, ref:30074922660
ref_perf:10
delivered_perf:34
HTH,
-Prashant
[1] https://lore.kernel.org/linux-pm/20250730220812.53098-1-pmalani@google.com/
[2] https://elixir.bootlin.com/linux/v6.16-rc7/source/drivers/acpi/cppc_acpi.c#L1509
--
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-08-01 9:16 ` Prashant Malani
@ 2025-08-04 20:55 ` Prashant Malani
2025-08-06 7:21 ` Beata Michalska
0 siblings, 1 reply; 44+ messages in thread
From: Prashant Malani @ 2025-08-04 20:55 UTC (permalink / raw)
To: Beata Michalska
Cc: Viresh Kumar, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On Fri, 1 Aug 2025 at 02:16, Prashant Malani <pmalani@google.com> wrote:
> On Thu, 31 Jul 2025 at 13:31, Beata Michalska <beata.michalska@arm.com> wrote:
> > Thank you for the info, but I'm exploring ways that will not increase the time
> > window between the reads.
>
> IMO this issue is intractable on non-RT OSes like Linux (at least,
> Linux when it is not compiled for RT), since we basically need to
> ensure atomicity for the reading of both ref and del registers together.
> We can't disable preemption here, since some of
> the code paths (like PCC regs) acquire semaphores [2].
Actually, minor correction here. The PCC path is not the issue.
It's the ffh read path[3], which requires interrupts to be enabled.
Larger point still stands.
[3] https://elixir.bootlin.com/linux/v6.16-rc7/source/arch/arm64/kernel/topology.c#L451
BR,
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-08-04 20:55 ` Prashant Malani
@ 2025-08-06 7:21 ` Beata Michalska
2025-08-07 0:01 ` Prashant Malani
0 siblings, 1 reply; 44+ messages in thread
From: Beata Michalska @ 2025-08-06 7:21 UTC (permalink / raw)
To: Prashant Malani
Cc: Viresh Kumar, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On Mon, Aug 04, 2025 at 01:55:37PM -0700, Prashant Malani wrote:
> On Fri, 1 Aug 2025 at 02:16, Prashant Malani <pmalani@google.com> wrote:
> > On Thu, 31 Jul 2025 at 13:31, Beata Michalska <beata.michalska@arm.com> wrote:
> > > Thank you for the info, but I'm exploring ways that will not increase the time
> > > window between the reads.
> >
> > IMO this issue is intractable on non-RT OSes like Linux (at least,
> > Linux when it is not compiled for RT), since we basically need to
> > ensure atomicity for the reading of both ref and del registers together.
> > We can't disable preemption here, since some of
> > the code paths (like PCC regs) acquire semaphores [2].
>
> Actually, minor correction here. The PCC path is not the issue.
> It's the ffh read path[3], which requires interrupts to be enabled.
> Larger point still stands.
>
> [3] https://elixir.bootlin.com/linux/v6.16-rc7/source/arch/arm64/kernel/topology.c#L451
>
> BR,
>
> -Prashant
Would you mind giving it a go and see whether that improves things on your end ?
Note that this is a quick and semi-dirty hack though.
---
BR
Beata
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 5d07ee85bdae..65adb78a9a87 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -479,6 +479,11 @@ bool cpc_ffh_supported(void)
return true;
}
+bool cpc_burst_read_supported(void)
+{
+ return cpc_ffh_supported();
+}
+
int cpc_read_ffh(int cpu, struct cpc_reg *reg, u64 *val)
{
int ret = -EOPNOTSUPP;
@@ -501,6 +506,61 @@ int cpc_read_ffh(int cpu, struct cpc_reg *reg, u64 *val)
return ret;
}
+
+struct cpc_burst_read {
+ struct cpc_reg_sample *samples;
+ size_t count;
+};
+
+void counters_burst_read_on_cpu(void *arg)
+{
+ struct cpc_burst_read *desc = arg;
+ u64 value;
+ int i;
+
+ for (i = 0; i < desc->count; ++i) {
+ switch ((u64)desc->samples[i].reg->address) {
+ case 0x0:
+ value = read_corecnt();
+ break;
+ case 0x1:
+ value = read_constcnt();
+ break;
+ }
+ *(u64 *)desc->samples[i].sample_value = value;
+ }
+
+ for (i = 0; i < desc->count; ++i) {
+ struct cpc_reg *reg = desc->samples[i].reg;
+
+ value = *(u64 *)desc->samples[i].sample_value;
+ value &= GENMASK_ULL(reg->bit_offset + reg->bit_width - 1,
+ reg->bit_offset);
+ value >>= reg->bit_offset;
+ *(u64 *)desc->samples[i].sample_value = value;
+ }
+}
+
+static inline bool cpc_reg_supported(struct cpc_reg *reg)
+{
+ return !((u64)reg->address != 0x0 || (u64)reg->address != 0x1);
+}
+
+int cpc_burst_read_ffh(int cpu, struct cpc_reg_sample *samples, size_t count)
+{
+ struct cpc_burst_read desc = { samples, count };
+ int ret = -EOPNOTSUPP;
+ int i;
+
+ for (i = 0; i < count; ++i) {
+ if (!cpc_reg_supported(samples[i].reg))
+ return ret;
+ }
+
+ smp_call_function_single(cpu, counters_burst_read_on_cpu, &desc, 1);
+ return 0;
+}
+
int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
{
return -EOPNOTSUPP;
diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 6b649031808f..c070627e4a1e 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -617,6 +617,11 @@ bool __weak cpc_supported_by_cpu(void)
return false;
}
+bool __weak cpc_burst_read_supported(void)
+{
+ return false;
+}
+
/**
* pcc_data_alloc() - Allocate the pcc_data memory for pcc subspace
* @pcc_ss_id: PCC Subspace index as in the PCC client ACPI package.
@@ -1077,6 +1082,21 @@ static int cpc_read(int cpu, struct cpc_register_resource *reg_res, u64 *val)
return 0;
}
+static int cpc_burst_read(int cpu, struct cpc_reg_sample *samples, size_t count)
+{
+ int i;
+
+ // Just for now - only ffh
+ if (!cpc_ffh_supported())
+ return -EINVAL;
+
+ for (i = 0; i < count; ++i)
+ if (samples[i].reg->space_id != ACPI_ADR_SPACE_FIXED_HARDWARE)
+ return -EINVAL;
+ return cpc_burst_read_ffh(cpu, samples, count);
+}
+
+
static int cpc_write(int cpu, struct cpc_register_resource *reg_res, u64 val)
{
int ret_val = 0;
@@ -1515,8 +1535,21 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
}
}
- cpc_read(cpunum, delivered_reg, &delivered);
- cpc_read(cpunum, reference_reg, &reference);
+ // Verify whether the registers can be requested in one go
+ if (delivered_reg->type != ACPI_TYPE_INTEGER &&
+ reference_reg->type != ACPI_TYPE_INTEGER &&
+ cpc_burst_read_supported()) {
+
+ struct cpc_reg_sample samples[] = {
+ { &delivered_reg->cpc_entry.reg, &delivered },
+ { &reference_reg->cpc_entry.reg, &reference }
+ };
+
+ cpc_burst_read(cpunum, samples, ARRAY_SIZE(samples));
+ } else {
+ cpc_read(cpunum, delivered_reg, &delivered);
+ cpc_read(cpunum, reference_reg, &reference);
+ }
cpc_read(cpunum, ref_perf_reg, &ref_perf);
/*
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index 325e9543e08f..f094e275834a 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -52,6 +52,12 @@ struct cpc_reg {
u64 address;
} __packed;
+
+struct cpc_reg_sample {
+ struct cpc_reg *reg;
+ void *sample_value;
+};
+
/*
* Each entry in the CPC table is either
* of type ACPI_TYPE_BUFFER or
@@ -165,6 +171,7 @@ extern unsigned int cppc_get_transition_latency(int cpu);
extern bool cpc_ffh_supported(void);
extern bool cpc_supported_by_cpu(void);
extern int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val);
+extern int cpc_burst_read_ffh(int cpunum, struct cpc_reg_sample *samples, size_t count);
extern int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val);
extern int cppc_get_epp_perf(int cpunum, u64 *epp_perf);
extern int cppc_set_epp_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls, bool enable);
@@ -229,6 +236,10 @@ static inline int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val)
{
return -EOPNOTSUPP;
}
+static inline int cpc_burst_read_ffh(int cpunum, struct cpc_reg_sample *samples, size_t count)
+{
+ return -EOPNOTSUPP;
+}
static inline int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
{
return -EOPNOTSUPP;
--
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-08-06 7:21 ` Beata Michalska
@ 2025-08-07 0:01 ` Prashant Malani
2025-08-07 10:24 ` Beata Michalska
0 siblings, 1 reply; 44+ messages in thread
From: Prashant Malani @ 2025-08-07 0:01 UTC (permalink / raw)
To: Beata Michalska
Cc: Viresh Kumar, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
Hi Beata,
On Wed, 6 Aug 2025 at 00:22, Beata Michalska <beata.michalska@arm.com> wrote:
> Would you mind giving it a go and see whether that improves things on your end ?
> Note that this is a quick and semi-dirty hack though.
>
Sure.
The provided patch doesn't appear to work as expected.
With all cores loaded (stress_ng --cpu N), it's returning the same
counter values
across samples. Here are readings from multiple CPUs:
t0: del:18446603338626579088, ref:192
t1: del:18446603338626579088, ref:192
ref_perf:10
delivered_perf:0
t0: del:18446603338627594896, ref:192
t1: del:18446603338627594896, ref:192
ref_perf:10
delivered_perf:0
t0: del:18446603338627627664, ref:192
t1: del:18446603338627627664, ref:192
ref_perf:10
delivered_perf:0
I verified separately that the "burst_read" path is being used by the platform
I am testing on.
BR,
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-08-01 4:43 ` Viresh Kumar
@ 2025-08-07 0:19 ` Prashant Malani
2025-08-11 6:05 ` Viresh Kumar
0 siblings, 1 reply; 44+ messages in thread
From: Prashant Malani @ 2025-08-07 0:19 UTC (permalink / raw)
To: Viresh Kumar
Cc: Beata Michalska, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On Thu, 31 Jul 2025 at 21:43, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 31-07-25, 22:23, Beata Michalska wrote:
> > The reason why I mentioned that is that getting current frequency
> > for an idle CPU seems like smth we could potentially optimise away and save some
> > cycles (fixing other problems on the way, like this one).
>
> I agree with that idea, just that the cpufreq core may not be the right place
> for that. Doing that in the driver should be fine.
>
> > But if that's undesired for any reason, it's perfectly fine to stay with
>
So, do we have consensus that the idle check is acceptable as proposed?
(Just want to make sure this thread doesn't get lost given another thread
has forked off in this conversation).
Best regards,
--
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-08-07 0:01 ` Prashant Malani
@ 2025-08-07 10:24 ` Beata Michalska
2025-08-08 2:14 ` Prashant Malani
0 siblings, 1 reply; 44+ messages in thread
From: Beata Michalska @ 2025-08-07 10:24 UTC (permalink / raw)
To: Prashant Malani
Cc: Viresh Kumar, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On Wed, Aug 06, 2025 at 05:01:48PM -0700, Prashant Malani wrote:
> Hi Beata,
>
> On Wed, 6 Aug 2025 at 00:22, Beata Michalska <beata.michalska@arm.com> wrote:
> > Would you mind giving it a go and see whether that improves things on your end ?
> > Note that this is a quick and semi-dirty hack though.
> >
>
> Sure.
> The provided patch doesn't appear to work as expected.
> With all cores loaded (stress_ng --cpu N), it's returning the same
> counter values
> across samples. Here are readings from multiple CPUs:
>
> t0: del:18446603338626579088, ref:192
> t1: del:18446603338626579088, ref:192
> ref_perf:10
> delivered_perf:0
>
>
> t0: del:18446603338627594896, ref:192
> t1: del:18446603338627594896, ref:192
> ref_perf:10
> delivered_perf:0
>
> t0: del:18446603338627627664, ref:192
> t1: del:18446603338627627664, ref:192
> ref_perf:10
> delivered_perf:0
>
> I verified separately that the "burst_read" path is being used by the platform
> I am testing on.
>
> BR,
>
> -Prashant
Right .... that's what happens when you are (I am) making last minute clean up.
That should fix it. Would you mind giving it another go ? Would appreciate it.
---
BR
Beata
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 65adb78a9a87..2a51e93fcd6c 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -543,7 +543,7 @@ void counters_burst_read_on_cpu(void *arg)
static inline bool cpc_reg_supported(struct cpc_reg *reg)
{
- return !((u64)reg->address != 0x0 || (u64)reg->address != 0x1);
+ return !((u64)reg->address != 0x0 && (u64)reg->address != 0x1);
}
int cpc_burst_read_ffh(int cpu, struct cpc_reg_sample *samples, size_t count)
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-08-07 10:24 ` Beata Michalska
@ 2025-08-08 2:14 ` Prashant Malani
2025-08-13 10:15 ` Beata Michalska
0 siblings, 1 reply; 44+ messages in thread
From: Prashant Malani @ 2025-08-08 2:14 UTC (permalink / raw)
To: Beata Michalska
Cc: Viresh Kumar, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
Hi Beata,
On Aug 07 12:24, Beata Michalska wrote:
> Right .... that's what happens when you are (I am) making last minute clean up.
> That should fix it. Would you mind giving it another go ? Would appreciate it.
>
> ---
> BR
> Beata
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 65adb78a9a87..2a51e93fcd6c 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -543,7 +543,7 @@ void counters_burst_read_on_cpu(void *arg)
>
> static inline bool cpc_reg_supported(struct cpc_reg *reg)
> {
> - return !((u64)reg->address != 0x0 || (u64)reg->address != 0x1);
> + return !((u64)reg->address != 0x0 && (u64)reg->address != 0x1);
> }
Here are the measurements with the fix:
The readings are less accurate. There are some which report
3.4 GHz (as earlier) but many are off:
t0: del:77500009084, ref:22804739600
t1: del:77500020316, ref:22804743100
ref_perf:10
delivered_perf:32
t0: del:77910203848, ref:22941794740
t1: del:77910215594, ref:22941798070
ref_perf:10
delivered_perf:35
t0: del:77354782419, ref:22762276000
t1: del:77354793991, ref:22762279400
ref_perf:10
delivered_perf:34
t0: del:64470686034, ref:22998377620
t1: del:64470695313, ref:22998380880
ref_perf:10
delivered_perf:28
t0: del:78019898424, ref:22957940640
t1: del:78019912872, ref:22957944590
ref_perf:10
delivered_perf:36
Best regards,
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-08-07 0:19 ` Prashant Malani
@ 2025-08-11 6:05 ` Viresh Kumar
2025-08-11 18:43 ` Prashant Malani
0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2025-08-11 6:05 UTC (permalink / raw)
To: Prashant Malani
Cc: Beata Michalska, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On 06-08-25, 17:19, Prashant Malani wrote:
> So, do we have consensus that the idle check is acceptable as proposed?
> (Just want to make sure this thread doesn't get lost given another thread
> has forked off in this conversation).
I don't have any objections to this or a better solution to this.
--
viresh
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-08-11 6:05 ` Viresh Kumar
@ 2025-08-11 18:43 ` Prashant Malani
2025-08-11 19:19 ` Rafael J. Wysocki
2025-08-13 10:12 ` Beata Michalska
0 siblings, 2 replies; 44+ messages in thread
From: Prashant Malani @ 2025-08-11 18:43 UTC (permalink / raw)
To: Viresh Kumar
Cc: Beata Michalska, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On Aug 11 11:35, Viresh Kumar wrote:
> On 06-08-25, 17:19, Prashant Malani wrote:
> > So, do we have consensus that the idle check is acceptable as proposed?
> > (Just want to make sure this thread doesn't get lost given another thread
> > has forked off in this conversation).
>
> I don't have any objections to this or a better solution to this.
Thanks Viresh! Beata, can we kindly move ahead with the idle
optimization (which is this series), while we continue discussions for
the "under load" scenarios on the other thread?
BR,
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-08-11 18:43 ` Prashant Malani
@ 2025-08-11 19:19 ` Rafael J. Wysocki
2025-08-11 20:01 ` Prashant Malani
2025-08-13 10:12 ` Beata Michalska
1 sibling, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2025-08-11 19:19 UTC (permalink / raw)
To: Prashant Malani
Cc: Viresh Kumar, Beata Michalska, Rafael J. Wysocki, Jie Zhan,
Ionela Voinescu, Ben Segall, Dietmar Eggemann, Ingo Molnar,
Juri Lelli, open list, open list:CPU FREQUENCY SCALING FRAMEWORK,
Mel Gorman, Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On Mon, Aug 11, 2025 at 8:43 PM Prashant Malani <pmalani@google.com> wrote:
>
> On Aug 11 11:35, Viresh Kumar wrote:
> > On 06-08-25, 17:19, Prashant Malani wrote:
> > > So, do we have consensus that the idle check is acceptable as proposed?
> > > (Just want to make sure this thread doesn't get lost given another thread
> > > has forked off in this conversation).
> >
> > I don't have any objections to this or a better solution to this.
>
> Thanks Viresh! Beata, can we kindly move ahead with the idle
> optimization (which is this series), while we continue discussions for
> the "under load" scenarios on the other thread?
I need some more time, please?
This problem is similar (if not analogous) to what happens on x86 and
that is not handled in the cpuidle core.
Thanks!
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-08-11 19:19 ` Rafael J. Wysocki
@ 2025-08-11 20:01 ` Prashant Malani
2025-08-14 11:48 ` Rafael J. Wysocki
0 siblings, 1 reply; 44+ messages in thread
From: Prashant Malani @ 2025-08-11 20:01 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Viresh Kumar, Beata Michalska, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On Aug 11 21:19, Rafael J. Wysocki wrote:
> On Mon, Aug 11, 2025 at 8:43 PM Prashant Malani <pmalani@google.com> wrote:
> >
> > On Aug 11 11:35, Viresh Kumar wrote:
> > > On 06-08-25, 17:19, Prashant Malani wrote:
> > > > So, do we have consensus that the idle check is acceptable as proposed?
> > > > (Just want to make sure this thread doesn't get lost given another thread
> > > > has forked off in this conversation).
> > >
> > > I don't have any objections to this or a better solution to this.
> >
> > Thanks Viresh! Beata, can we kindly move ahead with the idle
> > optimization (which is this series), while we continue discussions for
> > the "under load" scenarios on the other thread?
>
> I need some more time, please?
>
> This problem is similar (if not analogous) to what happens on x86 and
> that is not handled in the cpuidle core.
My apologies! Didn't mean to rush.
Will stand by for updates.
BR,
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-08-11 18:43 ` Prashant Malani
2025-08-11 19:19 ` Rafael J. Wysocki
@ 2025-08-13 10:12 ` Beata Michalska
1 sibling, 0 replies; 44+ messages in thread
From: Beata Michalska @ 2025-08-13 10:12 UTC (permalink / raw)
To: Prashant Malani
Cc: Viresh Kumar, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On Mon, Aug 11, 2025 at 06:43:08PM +0000, Prashant Malani wrote:
> On Aug 11 11:35, Viresh Kumar wrote:
> > On 06-08-25, 17:19, Prashant Malani wrote:
> > > So, do we have consensus that the idle check is acceptable as proposed?
> > > (Just want to make sure this thread doesn't get lost given another thread
> > > has forked off in this conversation).
> >
> > I don't have any objections to this or a better solution to this.
>
> Thanks Viresh! Beata, can we kindly move ahead with the idle
> optimization (which is this series), while we continue discussions for
> the "under load" scenarios on the other thread?
>
> BR,
I'd say yes, as long as you get a green light on exporting `idle_cpu`.
---
BR
Beata
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-08-08 2:14 ` Prashant Malani
@ 2025-08-13 10:15 ` Beata Michalska
2025-08-13 22:25 ` Prashant Malani
0 siblings, 1 reply; 44+ messages in thread
From: Beata Michalska @ 2025-08-13 10:15 UTC (permalink / raw)
To: Prashant Malani
Cc: Viresh Kumar, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On Fri, Aug 08, 2025 at 02:14:39AM +0000, Prashant Malani wrote:
> Hi Beata,
>
> On Aug 07 12:24, Beata Michalska wrote:
> > Right .... that's what happens when you are (I am) making last minute clean up.
> > That should fix it. Would you mind giving it another go ? Would appreciate it.
> >
> > ---
> > BR
> > Beata
> >
> > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > index 65adb78a9a87..2a51e93fcd6c 100644
> > --- a/arch/arm64/kernel/topology.c
> > +++ b/arch/arm64/kernel/topology.c
> > @@ -543,7 +543,7 @@ void counters_burst_read_on_cpu(void *arg)
> >
> > static inline bool cpc_reg_supported(struct cpc_reg *reg)
> > {
> > - return !((u64)reg->address != 0x0 || (u64)reg->address != 0x1);
> > + return !((u64)reg->address != 0x0 && (u64)reg->address != 0x1);
> > }
>
> Here are the measurements with the fix:
>
> The readings are less accurate. There are some which report
> 3.4 GHz (as earlier) but many are off:
>
> t0: del:77500009084, ref:22804739600
> t1: del:77500020316, ref:22804743100
> ref_perf:10
> delivered_perf:32
>
> t0: del:77910203848, ref:22941794740
> t1: del:77910215594, ref:22941798070
> ref_perf:10
> delivered_perf:35
>
> t0: del:77354782419, ref:22762276000
> t1: del:77354793991, ref:22762279400
> ref_perf:10
> delivered_perf:34
>
> t0: del:64470686034, ref:22998377620
> t1: del:64470695313, ref:22998380880
> ref_perf:10
> delivered_perf:28
>
> t0: del:78019898424, ref:22957940640
> t1: del:78019912872, ref:22957944590
> ref_perf:10
> delivered_perf:36
>
> Best regards,
>
> -Prashant
Ok, that's not really good.
Any chances on sharing which platform are you using ?
Remote debugging tends to be rather painful.
---
(Note: I will be off for couple of days so please bear with me)
BR
Beata
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-08-13 10:15 ` Beata Michalska
@ 2025-08-13 22:25 ` Prashant Malani
0 siblings, 0 replies; 44+ messages in thread
From: Prashant Malani @ 2025-08-13 22:25 UTC (permalink / raw)
To: Beata Michalska
Cc: Viresh Kumar, Rafael J. Wysocki, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
Hi Beata,
On Aug 13 12:15, Beata Michalska wrote:
> Ok, that's not really good.
> Any chances on sharing which platform are you using ?
Unfortunately I can't share that info ATM.
But it's the ARM standard counters in FFH, so it should be reproducible
on any platform that has that configuration for those AMU counters.
> Remote debugging tends to be rather painful.
Totally understand. I really appreciate your patience and help in
figuring this out!
Please LMK if there are any other experiments/patches you'd like me to
try.
BR,
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-08-11 20:01 ` Prashant Malani
@ 2025-08-14 11:48 ` Rafael J. Wysocki
2025-08-15 5:12 ` Prashant Malani
0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2025-08-14 11:48 UTC (permalink / raw)
To: Prashant Malani
Cc: Rafael J. Wysocki, Viresh Kumar, Beata Michalska, Jie Zhan,
Ionela Voinescu, Ben Segall, Dietmar Eggemann, Ingo Molnar,
Juri Lelli, open list, open list:CPU FREQUENCY SCALING FRAMEWORK,
Mel Gorman, Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On Mon, Aug 11, 2025 at 10:01 PM Prashant Malani <pmalani@google.com> wrote:
>
> On Aug 11 21:19, Rafael J. Wysocki wrote:
> > On Mon, Aug 11, 2025 at 8:43 PM Prashant Malani <pmalani@google.com> wrote:
> > >
> > > On Aug 11 11:35, Viresh Kumar wrote:
> > > > On 06-08-25, 17:19, Prashant Malani wrote:
> > > > > So, do we have consensus that the idle check is acceptable as proposed?
> > > > > (Just want to make sure this thread doesn't get lost given another thread
> > > > > has forked off in this conversation).
> > > >
> > > > I don't have any objections to this or a better solution to this.
> > >
> > > Thanks Viresh! Beata, can we kindly move ahead with the idle
> > > optimization (which is this series), while we continue discussions for
> > > the "under load" scenarios on the other thread?
> >
> > I need some more time, please?
> >
> > This problem is similar (if not analogous) to what happens on x86 and
> > that is not handled in the cpuidle core.
>
> My apologies! Didn't mean to rush.
No worries.
> Will stand by for updates.
First off, AFAICS, using idle_cpu() for reliable detection of CPU
idleness in a sysfs attribute code path would be at least
questionable, if not outright invalid. By the time you have got a
result from it, there's nothing to prevent the CPU in question from
going idle or waking up from idle. Moreover, the fact that the given
CPU is idle from the scheduler perspective doesn't actually mean that
it is in an idle state and so it has no bearing on whether or not its
performance counters can be accessed etc.
The way x86 deals with this problem is to snapshot the counters in
question periodically (actually, in scheduler ticks) and fall back to
cpu_khz if the interval between the two consecutive updates is too
large (see https://elixir.bootlin.com/linux/v6.16/source/arch/x86/kernel/cpu/aperfmperf.c#L502).
I think that this is the only reliable way to handle it, but I may be
mistaken.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-08-14 11:48 ` Rafael J. Wysocki
@ 2025-08-15 5:12 ` Prashant Malani
2025-08-16 8:25 ` Prashant Malani
0 siblings, 1 reply; 44+ messages in thread
From: Prashant Malani @ 2025-08-15 5:12 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Viresh Kumar, Beata Michalska, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
Thanks a lot for taking a look at this, Rafael.
On Aug 14 13:48, Rafael J. Wysocki wrote:
>
> First off, AFAICS, using idle_cpu() for reliable detection of CPU
> idleness in a sysfs attribute code path would be at least
> questionable, if not outright invalid. By the time you have got a
> result from it, there's nothing to prevent the CPU in question from
> going idle or waking up from idle.
This is a heuristic-based optimization. The observation is that when
the CPU is idle (or near-idle/lightly loaded, since FFH actually wakes
up an idle CPU), the AMU counters as read from the kernel are unreliable.
It is fine if the CPU wakes up from idle immediately after the check.
In that case, we'd return the desired frequency (via PCC reg read), which
is what the frequency would be anyway (if the AMU measurement was
actually taken).
In a sense, the assumption here is no worse than what is there at
present; currently the samples are taken across 2us, and (theoretically)
if the difference between them is 0, we take the fallback path. There is
nothing to prevent the CPU from waking up immediately after that 2us
sample period.
> Moreover, the fact that the given
> CPU is idle from the scheduler perspective doesn't actually mean that
> it is in an idle state and so it has no bearing on whether or not its
> performance counters can be accessed etc.
The idle check isn't meant to guard against accessing the counters.
AFAICT it is perfectly valid to access the counters even when the CPU is
actually idle.
>
> The way x86 deals with this problem is to snapshot the counters in
> question periodically (actually, in scheduler ticks) and fall back to
> cpu_khz if the interval between the two consecutive updates is too
> large (see https://elixir.bootlin.com/linux/v6.16/source/arch/x86/kernel/cpu/aperfmperf.c#L502).
> I think that this is the only reliable way to handle it, but I may be
> mistaken.
This is interesting. I think it may not work for the CPPC case, since
the registers in question are in some cases accessed through PCC reads
which require semaphores. I think it would be untenable to do that in
the tick handler (but I may be mistaken here). It's easier on x86
since those are always just MSRs.
We could probably do it for the FFH case, but then we're bifurcating
the computation method and IMO that's not worth the hassle.
Perhaps some of the ARM experts here can think of ways to do this that
I haven't considered.
Best regards,
-Prashant
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs
2025-08-15 5:12 ` Prashant Malani
@ 2025-08-16 8:25 ` Prashant Malani
0 siblings, 0 replies; 44+ messages in thread
From: Prashant Malani @ 2025-08-16 8:25 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Viresh Kumar, Beata Michalska, Jie Zhan, Ionela Voinescu,
Ben Segall, Dietmar Eggemann, Ingo Molnar, Juri Lelli, open list,
open list:CPU FREQUENCY SCALING FRAMEWORK, Mel Gorman,
Peter Zijlstra, Steven Rostedt, Valentin Schneider,
Vincent Guittot, z00813676, sudeep.holla
On Thu, 14 Aug 2025 at 22:13, Prashant Malani <pmalani@google.com> wrote:
>
> Thanks a lot for taking a look at this, Rafael.
>
> On Aug 14 13:48, Rafael J. Wysocki wrote:
> >
> > First off, AFAICS, using idle_cpu() for reliable detection of CPU
> > idleness in a sysfs attribute code path would be at least
> > questionable, if not outright invalid. By the time you have got a
> > result from it, there's nothing to prevent the CPU in question from
> > going idle or waking up from idle.
>
> This is a heuristic-based optimization. The observation is that when
> the CPU is idle (or near-idle/lightly loaded, since FFH actually wakes
> up an idle CPU), the AMU counters as read from the kernel are unreliable.
> It is fine if the CPU wakes up from idle immediately after the check.
> In that case, we'd return the desired frequency (via PCC reg read), which
> is what the frequency would be anyway (if the AMU measurement was
> actually taken).
>
> In a sense, the assumption here is no worse than what is there at
> present; currently the samples are taken across 2us, and (theoretically)
> if the difference between them is 0, we take the fallback path. There is
> nothing to prevent the CPU from waking up immediately after that 2us
> sample period.
>
> > Moreover, the fact that the given
> > CPU is idle from the scheduler perspective doesn't actually mean that
> > it is in an idle state and so it has no bearing on whether or not its
> > performance counters can be accessed etc.
>
> The idle check isn't meant to guard against accessing the counters.
> AFAICT it is perfectly valid to access the counters even when the CPU is
> actually idle.
>
> >
> > The way x86 deals with this problem is to snapshot the counters in
> > question periodically (actually, in scheduler ticks) and fall back to
> > cpu_khz if the interval between the two consecutive updates is too
> > large (see https://elixir.bootlin.com/linux/v6.16/source/arch/x86/kernel/cpu/aperfmperf.c#L502).
> > I think that this is the only reliable way to handle it, but I may be
> > mistaken.
>
> This is interesting. I think it may not work for the CPPC case, since
> the registers in question are in some cases accessed through PCC reads
> which require semaphores. I think it would be untenable to do that in
> the tick handler (but I may be mistaken here). It's easier on x86
> since those are always just MSRs.
> We could probably do it for the FFH case, but then we're bifurcating
> the computation method and IMO that's not worth the hassle.
I looked around a bit more and it turns out arm64 is already doing something
similar to what you propose.
It takes a snapshot of the counters every tick, and falls back to
arch_scale_freq_capacity() for another non-idle CPU (checked using idle_cpu())
in the same frequency domain if the interval between snapshots is too long [1].
However, this function only works some of the time; if the CPU hasn't had
a tick in a while *and* there is no other non-idle CPU in the frequency domain,
it returns an error [2].
Moreover, cpuinfo_cur_freq doesn't use this (it reads the frequency directly
from the CPPC driver).
(I'll abandon this series if you still believe this is an invalid
approach; just thought
I'd get this bit of clarification in, for my own understanding).
BR,
-Prashant
[1] https://elixir.bootlin.com/linux/v6.16-rc7/source/arch/arm64/kernel/topology.c#L285
[2] https://elixir.bootlin.com/linux/v6.16-rc7/source/arch/arm64/kernel/topology.c#L336
^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2025-08-16 8:25 UTC | newest]
Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-19 0:09 [PATCH v2 0/2] cpufreq: CPPC: idle cpu perf handling Prashant Malani
2025-06-19 0:09 ` [PATCH v2 1/2] sched: Expose idle_cpu() to modules Prashant Malani
2025-06-19 0:09 ` [PATCH v2 2/2] cpufreq: CPPC: Dont read counters for idle CPUs Prashant Malani
2025-06-20 3:53 ` Jie Zhan
2025-06-20 5:07 ` Prashant Malani
2025-06-26 18:42 ` Prashant Malani
2025-06-27 7:54 ` Jie Zhan
2025-06-27 17:07 ` Prashant Malani
2025-07-02 18:38 ` Prashant Malani
2025-07-03 9:29 ` Beata Michalska
2025-07-07 8:32 ` Beata Michalska
2025-07-09 17:25 ` Prashant Malani
2025-07-09 22:49 ` Prashant Malani
2025-07-14 9:30 ` Beata Michalska
2025-07-15 6:28 ` Prashant Malani
2025-07-21 17:00 ` Rafael J. Wysocki
2025-07-21 19:40 ` Prashant Malani
2025-07-22 3:27 ` Viresh Kumar
2025-07-22 6:02 ` Prashant Malani
2025-07-30 7:31 ` Prashant Malani
2025-07-31 8:27 ` Beata Michalska
2025-07-31 11:13 ` Viresh Kumar
2025-07-31 20:23 ` Beata Michalska
2025-08-01 4:43 ` Viresh Kumar
2025-08-07 0:19 ` Prashant Malani
2025-08-11 6:05 ` Viresh Kumar
2025-08-11 18:43 ` Prashant Malani
2025-08-11 19:19 ` Rafael J. Wysocki
2025-08-11 20:01 ` Prashant Malani
2025-08-14 11:48 ` Rafael J. Wysocki
2025-08-15 5:12 ` Prashant Malani
2025-08-16 8:25 ` Prashant Malani
2025-08-13 10:12 ` Beata Michalska
2025-07-31 16:51 ` Prashant Malani
2025-07-31 20:30 ` Beata Michalska
2025-08-01 9:16 ` Prashant Malani
2025-08-04 20:55 ` Prashant Malani
2025-08-06 7:21 ` Beata Michalska
2025-08-07 0:01 ` Prashant Malani
2025-08-07 10:24 ` Beata Michalska
2025-08-08 2:14 ` Prashant Malani
2025-08-13 10:15 ` Beata Michalska
2025-08-13 22:25 ` Prashant Malani
2025-07-07 8:35 ` Beata Michalska
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).