linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] cpufreq: CPPC: Increase delay between perf counter reads
@ 2025-07-30 22:07 Prashant Malani
  2025-08-06  0:26 ` Yang Shi
  0 siblings, 1 reply; 11+ messages in thread
From: Prashant Malani @ 2025-07-30 22:07 UTC (permalink / raw)
  To: open list, open list:CPU FREQUENCY SCALING FRAMEWORK,
	Rafael J. Wysocki, Viresh Kumar
  Cc: Prashant Malani, Yang Shi, Ionela Voinescu

On a heavily loaded CPU, performance counter reads can be erratic. This is
due to two factors:
- The method used to calculate CPPC delivered performance.
- Linux scheduler vagaries.

As an example, on a CPU which has a max frequency of 3.4 GHz, if we run
stress-ng on the CPU in the background and then read the frequency, we get
invalid readings:

./stress_ng --cpu 108 --taskset 3 -t 30s &
cat /sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_cur_freq
3600000
3500000
3600000

Per [1] CPPC performance is measured by reading the delivered and reference
counters at timestamp t0, then waiting 2us, and then repeating the
measurement at t1. So, in theory, one should end up with:
Timestamp t0: ref0, del0
Timestamp t1: ref1, del1

However, since the reference and delivered registers are individual
register reads (in the case of FFH, it even results in an IPI to the CPU in
question), what happens in practice is:
Timestamp t0: del0
Timestamp t0 + m: ref0
Timestamp t1: del1
Timestamp t1 + n: ref1

There has been prior discussion[2] about the cause of these differences;
it was broadly pegged as due to IRQs and "interconnect congestion".

Since the gap between t0 and t1 is very small (2us), differing values of m
and n mean that the measurements don't correspond to 2 discrete timestamps,
since the delivered performance delta is being measured across a
significantly different time period than the reference performance
delta. This has an influence on the perf measurement which is:
((del1 - del0) * reference perf) / (ref1 - ref0)

Previously collected data[4] shows that cppc_get_perf_ctrs() itself
takes anywhere between 4.9us and 3.6us, which further suggests that a
2us delta is too less.

If we increase the time delta to a high enough value (i.e if delay >> m,n)
then the effects of m and n get mitigated, leading to both the register
measurements (ref and del) corresponding to the same timestamp.

When this approach was previously proposed[3], there was concern about
this function being called with interrupts off but that was later found to
be not true [2]. So, waiting for a slightly longer time in between counter
samples should be acceptable.

Increase the time delay between counter reads to 200 us to reduce the
effect of timing discrepancies in reading individual performance registers.

[1] https://docs.kernel.org/admin-guide/acpi/cppc_sysfs.html#computing-average-delivered-performance
[2] https://lore.kernel.org/all/7b57e680-0ba3-0b8b-851e-7cc369050386@os.amperecomputing.com/
[3] https://lore.kernel.org/all/20230328193846.8757-1-yang@os.amperecomputing.com/
[4] https://lore.kernel.org/all/1ce09fd7-0c1d-fc46-ce12-01b25fbd4afd@os.amperecomputing.com/

Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Ionela Voinescu <Ionela.Voinescu@arm.com>
Signed-off-by: Prashant Malani <pmalani@google.com>
---
 drivers/cpufreq/cppc_cpufreq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 4a17162a392d..086c3b87bd4e 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -718,7 +718,7 @@ static int cppc_get_perf_ctrs_sample(int cpu,
 	if (ret)
 		return ret;
 
-	udelay(2); /* 2usec delay between sampling */
+	udelay(200); /* 200usec delay between sampling */
 
 	return cppc_get_perf_ctrs(cpu, fb_ctrs_t1);
 }
-- 
2.50.1.552.g942d659e1b-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-08-28 21:29 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-30 22:07 [PATCH] cpufreq: CPPC: Increase delay between perf counter reads Prashant Malani
2025-08-06  0:26 ` Yang Shi
2025-08-06  1:00   ` Prashant Malani
2025-08-19  9:28     ` Beata Michalska
2025-08-19 23:43       ` Prashant Malani
2025-08-20 20:49         ` Beata Michalska
2025-08-20 21:25           ` Prashant Malani
2025-08-25 14:52             ` Beata Michalska
2025-08-25 20:24               ` Prashant Malani
2025-08-28  7:33                 ` Beata Michalska
2025-08-28 21:29                   ` Prashant Malani

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).