linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yang Shi <yang@os.amperecomputing.com>
To: Prashant Malani <pmalani@google.com>,
	open list <linux-kernel@vger.kernel.org>,
	"open list:CPU FREQUENCY SCALING FRAMEWORK"
	<linux-pm@vger.kernel.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	beata.michalska@arm.com,
	Catalin Marinas <catalin.marinas@arm.com>
Cc: Ionela Voinescu <Ionela.Voinescu@arm.com>
Subject: Re: [PATCH] cpufreq: CPPC: Increase delay between perf counter reads
Date: Tue, 5 Aug 2025 17:26:33 -0700	[thread overview]
Message-ID: <8252b1e6-5d13-4f26-8aa3-30e841639e10@os.amperecomputing.com> (raw)
In-Reply-To: <20250730220812.53098-1-pmalani@google.com>



On 7/30/25 3:07 PM, Prashant Malani wrote:
> [You don't often get email from pmalani@google.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>
> On a heavily loaded CPU, performance counter reads can be erratic. This is
> due to two factors:
> - The method used to calculate CPPC delivered performance.
> - Linux scheduler vagaries.
>
> As an example, on a CPU which has a max frequency of 3.4 GHz, if we run
> stress-ng on the CPU in the background and then read the frequency, we get
> invalid readings:
>
> ./stress_ng --cpu 108 --taskset 3 -t 30s &
> cat /sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_cur_freq
> 3600000
> 3500000
> 3600000
>
> Per [1] CPPC performance is measured by reading the delivered and reference
> counters at timestamp t0, then waiting 2us, and then repeating the
> measurement at t1. So, in theory, one should end up with:
> Timestamp t0: ref0, del0
> Timestamp t1: ref1, del1
>
> However, since the reference and delivered registers are individual
> register reads (in the case of FFH, it even results in an IPI to the CPU in
> question), what happens in practice is:
> Timestamp t0: del0
> Timestamp t0 + m: ref0
> Timestamp t1: del1
> Timestamp t1 + n: ref1
>
> There has been prior discussion[2] about the cause of these differences;
> it was broadly pegged as due to IRQs and "interconnect congestion".
>
> Since the gap between t0 and t1 is very small (2us), differing values of m
> and n mean that the measurements don't correspond to 2 discrete timestamps,
> since the delivered performance delta is being measured across a
> significantly different time period than the reference performance
> delta. This has an influence on the perf measurement which is:
> ((del1 - del0) * reference perf) / (ref1 - ref0)
>
> Previously collected data[4] shows that cppc_get_perf_ctrs() itself
> takes anywhere between 4.9us and 3.6us, which further suggests that a
> 2us delta is too less.
>
> If we increase the time delta to a high enough value (i.e if delay >> m,n)
> then the effects of m and n get mitigated, leading to both the register
> measurements (ref and del) corresponding to the same timestamp.
>
> When this approach was previously proposed[3], there was concern about
> this function being called with interrupts off but that was later found to
> be not true [2]. So, waiting for a slightly longer time in between counter
> samples should be acceptable.
>
> Increase the time delay between counter reads to 200 us to reduce the
> effect of timing discrepancies in reading individual performance registers.
>
> [1] https://docs.kernel.org/admin-guide/acpi/cppc_sysfs.html#computing-average-delivered-performance
> [2] https://lore.kernel.org/all/7b57e680-0ba3-0b8b-851e-7cc369050386@os.amperecomputing.com/
> [3] https://lore.kernel.org/all/20230328193846.8757-1-yang@os.amperecomputing.com/
> [4] https://lore.kernel.org/all/1ce09fd7-0c1d-fc46-ce12-01b25fbd4afd@os.amperecomputing.com/
>
> Cc: Yang Shi <yang@os.amperecomputing.com>
> Cc: Ionela Voinescu <Ionela.Voinescu@arm.com>
> Signed-off-by: Prashant Malani <pmalani@google.com>

Thank you for cc'ing me the patch. I posted the similar patch ago and 
had some discussion on the mailing list. Then someone else from ARM 
pursued a different way to solve it. But I didn't follow very closely. 
If I remember correctly, a new sysfs interface, called cpuinfo_avg_freq 
was added. It should be the preferred way to get cpu frequency. Please 
see 
https://github.com/torvalds/linux/commit/fbb4a4759b541d09ebb8e391d5fa7f9a5a0cad61.

Added Beata Michalska in the loop too, who is the author of the patch. 
Please feel free to correct me, if I'm wrong.

Yang


> ---
>   drivers/cpufreq/cppc_cpufreq.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index 4a17162a392d..086c3b87bd4e 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -718,7 +718,7 @@ static int cppc_get_perf_ctrs_sample(int cpu,
>          if (ret)
>                  return ret;
>
> -       udelay(2); /* 2usec delay between sampling */
> +       udelay(200); /* 200usec delay between sampling */
>
>          return cppc_get_perf_ctrs(cpu, fb_ctrs_t1);
>   }
> --
> 2.50.1.552.g942d659e1b-goog
>


  reply	other threads:[~2025-08-06  0:26 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-30 22:07 [PATCH] cpufreq: CPPC: Increase delay between perf counter reads Prashant Malani
2025-08-06  0:26 ` Yang Shi [this message]
2025-08-06  1:00   ` Prashant Malani
2025-08-19  9:28     ` Beata Michalska
2025-08-19 23:43       ` Prashant Malani
2025-08-20 20:49         ` Beata Michalska
2025-08-20 21:25           ` Prashant Malani
2025-08-25 14:52             ` Beata Michalska
2025-08-25 20:24               ` Prashant Malani
2025-08-28  7:33                 ` Beata Michalska
2025-08-28 21:29                   ` Prashant Malani

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8252b1e6-5d13-4f26-8aa3-30e841639e10@os.amperecomputing.com \
    --to=yang@os.amperecomputing.com \
    --cc=Ionela.Voinescu@arm.com \
    --cc=beata.michalska@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=pmalani@google.com \
    --cc=rafael@kernel.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).