From: Beata Michalska <beata.michalska@arm.com>
To: Prashant Malani <pmalani@google.com>
Cc: Yang Shi <yang@os.amperecomputing.com>,
open list <linux-kernel@vger.kernel.org>,
"open list:CPU FREQUENCY SCALING FRAMEWORK"
<linux-pm@vger.kernel.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Viresh Kumar <viresh.kumar@linaro.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Ionela Voinescu <Ionela.Voinescu@arm.com>
Subject: Re: [PATCH] cpufreq: CPPC: Increase delay between perf counter reads
Date: Thu, 28 Aug 2025 09:33:22 +0200 [thread overview]
Message-ID: <aLAGQk3nOcEI0qJ2@arm.com> (raw)
In-Reply-To: <aKzGlD7ZDIS4XMsF@google.com>
On Mon, Aug 25, 2025 at 08:24:52PM +0000, Prashant Malani wrote:
> On Aug 25 16:52, Beata Michalska wrote:
> > On Wed, Aug 20, 2025 at 02:25:16PM -0700, Prashant Malani wrote:
> > > Hi Beata,
> > >
> > > On Wed, 20 Aug 2025 at 13:49, Beata Michalska <beata.michalska@arm.com> wrote:
> > > >
> > > > Kinda working on that one.
> > >
> > > OK. I'm eager to see what the solution is!
> > >
> > > > >
> > > > > Outside of that, I can't think of another mitigation beyond adding delay to make
> > > > > the time deltas not matter so much.
> > > > I'm not entirely sure what 'so much' means in this context.
> > > > How one would quantify whether the added delay is actually mitigating the issue?
> > > >
> > >
> > > I alluded to it in the commit description, but here is the my basic
> > > numerical analysis:
> > > The effective timestamps for the 4 readings right now are:
> > > Timestamp t0: del0
> > > Timestamp t0 + m: ref0
> > > (Time delay X us)
> > > Timestamp t1: del1
> > > Timestamp t1 + n: ref1
> > >
> > > Timestamp t1 = t0 + m + X
> > >
> > > The perf calculation is:
> > > Per = del1 - del0 / ref1 - ref0
> > > = Del_counter_diff_over_time(t1 - t0) /
> > > ref_counter_diff_over_time(t1 + n - (t0 + m))
> > > = Del_counter_diff_over time(t0 + m + X - t0) /
> > > ref_counter_diff_over_time((t0 + m + X + n - t0 - m)
> > > = Del_counter_diff_over_time(m + X) / ref_counter_diff_over_time(n + X)
> > >
> > > If X >> (m,n) this becomes:
> > > = Del_counter_diff_over_time(X) / ref_counter_diff_over_time(X)
> > > which is what the actual calculation is supposed to be.
> > >
> > > if X ~ (m, N) (which is what the case is right now), the calculation
> > > becomes erratic.
> > This is still bound by 'm' and 'n' values, as the difference between those will
> > determine the error factor (with given, fixed X). If m != n, one counter delta
> > is stretched more than the other, so the perf ratio no longer represents the
> > same time interval. And that will vary between platforms/workloads leading to
> > over/under-reporting.
>
> What you are saying holds when m,n ~ X. But if X >> m,n, the X component
> dominates. On most platforms, m and n are typically 1-2 us.
> If X is anything >= 100us, it dominates the m,n component, making both
> time intervals practically the same, i.e
>
> (100 + 1) / (100 + 2) = 101 / 102 = 0.9901 ~ 1.00
True but that does still influence the error - in this case that's ~1% so
negligible. But the overall error magnitude does increase when the range
between min and max of the possible values of m and n gets bigger.
Question is what's the max error that can be deemed acceptable.
And I'm pretty sure there are platforms that would require bigger X still.
>
> > >
> > > There have been other observations on this topic [1], that suggest
> > > that even 100us
> > > improves the error rate significantly from what it is with 2us.
> > >
> > > BR,
> > Which is exactly why I've mentioned this approach is not really recommended,
> > being bound to rather specific setup. There have been similar proposals in the
> > past, all with different values of the delay which should illustrate how fragile
> > solution (if any) that is.
>
> The reports/occurences point to the fact that the current value doesn't work.
Wasn't claiming it does.
>
> Another way of putting it is, why is 2us considered the "right"
> value?
Looking at the history, the argument was pretty much the same as yours: was
considered sufficient for most platforms [1]
>
> This patch was never meant to be an ideal solution, but it's better than what
> is there at present. Currently, the `policy->cur` is completely unusable on CPPC,
> and is cropping up in other locations in the cpufreq driver core [1]
> while also breaking a userfacing ABI i.e scaling_setspeed.
>
> I realize you're working on a solution, so if that is O(weeks) away, it
> makes sense to wait; otherwise it would seem logical to mitigate the
> error (it can always be reverted once the "better" solution is in
> place).
>
> Ultimately it's your call, but I'm not convinced with rationale provided
> thus far.
Actually it is not up to me, I'm simply sharing my opinion, which is:
we should fix the problem instead of hiding it.
Setting that aside though - this change seems rather harmless.
Aside:
./scripts/get_maintainer.pl --m ./drivers/cpufreq/cppc_cpufreq.c
---
[1] https://lore.kernel.org/all/37517652-9a74-83f8-1315-07fe79a78d73@codeaurora.org/
---
BR
Beata
>
> Best regards,
>
> -Prashant
>
> [1] https://lore.kernel.org/linux-pm/20250823001937.2765316-1-pmalani@google.com/T/#t
next prev parent reply other threads:[~2025-08-28 7:33 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-30 22:07 [PATCH] cpufreq: CPPC: Increase delay between perf counter reads Prashant Malani
2025-08-06 0:26 ` Yang Shi
2025-08-06 1:00 ` Prashant Malani
2025-08-19 9:28 ` Beata Michalska
2025-08-19 23:43 ` Prashant Malani
2025-08-20 20:49 ` Beata Michalska
2025-08-20 21:25 ` Prashant Malani
2025-08-25 14:52 ` Beata Michalska
2025-08-25 20:24 ` Prashant Malani
2025-08-28 7:33 ` Beata Michalska [this message]
2025-08-28 21:29 ` Prashant Malani
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aLAGQk3nOcEI0qJ2@arm.com \
--to=beata.michalska@arm.com \
--cc=Ionela.Voinescu@arm.com \
--cc=catalin.marinas@arm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=pmalani@google.com \
--cc=rafael@kernel.org \
--cc=viresh.kumar@linaro.org \
--cc=yang@os.amperecomputing.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.