public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86/aperfmperf: Fix the fallback condition in arch_freq_get_on_cpu()
@ 2023-06-26 19:36 Keyon Jie
  2023-06-28  0:57 ` Yang Jie
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Keyon Jie @ 2023-06-26 19:36 UTC (permalink / raw)
  To: Thomas Gleixner, x86, linux-kernel
  Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, H . Peter Anvin,
	Peter Zijlstra, Yair Podemsky, Keyon Jie

From the commit f3eca381bd49 on, the fallback condition about the 'the
last update was too long' have been comparing ticks and milliseconds by
mistake, which leads to that the condition is met and the fallback
method is used frequently.

The change to compare ticks here corrects that and fixes related issues
have been seen on x86 platforms since 5.18 kernel.

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217597
Fixes: f3eca381bd49 ("x86/aperfmperf: Replace arch_freq_get_on_cpu()")
CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
---
 arch/x86/kernel/cpu/aperfmperf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
index fdbb5f07448f..24e24e137226 100644
--- a/arch/x86/kernel/cpu/aperfmperf.c
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -432,7 +432,7 @@ unsigned int arch_freq_get_on_cpu(int cpu)
 	 * Bail on invalid count and when the last update was too long ago,
 	 * which covers idle and NOHZ full CPUs.
 	 */
-	if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE)
+	if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE * cpu_khz)
 		goto fallback;
 
 	return div64_u64((cpu_khz * acnt), mcnt);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86/aperfmperf: Fix the fallback condition in arch_freq_get_on_cpu()
  2023-06-26 19:36 [PATCH] x86/aperfmperf: Fix the fallback condition in arch_freq_get_on_cpu() Keyon Jie
@ 2023-06-28  0:57 ` Yang Jie
  2023-06-30 12:35 ` Thomas Gleixner
  2023-06-30 15:13 ` Dave Hansen
  2 siblings, 0 replies; 5+ messages in thread
From: Yang Jie @ 2023-06-28  0:57 UTC (permalink / raw)
  To: Thomas Gleixner, x86, linux-kernel
  Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, H . Peter Anvin,
	Peter Zijlstra, Yair Podemsky, linux-pm, Rafael J. Wysocki,
	Viresh Kumar, Doug Smythies


Sorry for top posting, I should have sent it to linux-pm and maintainers.

Doug Smythies had a good discussion with me about the related history 
and issues in the bugzilla here: 
https://bugzilla.kernel.org/show_bug.cgi?id=217597.

Basically, there are 2 issues here per my observation:
1. the cpu_khz shared for all CPU cores? In Intel's recent Hybrid CPUs, 
what does this cpu_khz read from cpuid really mean? I am seeing 
cpu_khz=3.6GHz for E-cores with Max frequecy 3GHz. We should fix that, no?
2. We don't want to wake up cores just because of the sysfs queries, so 
we introduced fallback mechanism here, what is our clear design about that?

So, before discussing those issues, we should get alignment on these first:
1. What is fallback and When should we fallback. From the comment, looks 
we wanted to use cpu_khz for Cores haven't executed any task during the 
last 20ms, this sounds reasonable, and I patch here is to address this 
issue.
2. What frequencies should we show in fallback case. This could be 
controversial, 0? min_freq? base_freq? or last calculated one? Doug has 
suggestion here but this is not touched in my patch here.

Thanks,
~Keyon

On 6/26/23 12:36, Keyon Jie wrote:
>>From the commit f3eca381bd49 on, the fallback condition about the 'the
> last update was too long' have been comparing ticks and milliseconds by
> mistake, which leads to that the condition is met and the fallback
> method is used frequently.
> 
> The change to compare ticks here corrects that and fixes related issues
> have been seen on x86 platforms since 5.18 kernel.
> 
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217597
> Fixes: f3eca381bd49 ("x86/aperfmperf: Replace arch_freq_get_on_cpu()")
> CC: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Keyon Jie <yang.jie@linux.intel.com>
> ---
>   arch/x86/kernel/cpu/aperfmperf.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
> index fdbb5f07448f..24e24e137226 100644
> --- a/arch/x86/kernel/cpu/aperfmperf.c
> +++ b/arch/x86/kernel/cpu/aperfmperf.c
> @@ -432,7 +432,7 @@ unsigned int arch_freq_get_on_cpu(int cpu)
>   	 * Bail on invalid count and when the last update was too long ago,
>   	 * which covers idle and NOHZ full CPUs.
>   	 */
> -	if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE)
> +	if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE * cpu_khz)
>   		goto fallback;
>   
>   	return div64_u64((cpu_khz * acnt), mcnt);

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86/aperfmperf: Fix the fallback condition in arch_freq_get_on_cpu()
  2023-06-26 19:36 [PATCH] x86/aperfmperf: Fix the fallback condition in arch_freq_get_on_cpu() Keyon Jie
  2023-06-28  0:57 ` Yang Jie
@ 2023-06-30 12:35 ` Thomas Gleixner
  2023-07-05 17:23   ` Keyon Jie
  2023-06-30 15:13 ` Dave Hansen
  2 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2023-06-30 12:35 UTC (permalink / raw)
  To: Keyon Jie, x86, linux-kernel
  Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, H . Peter Anvin,
	Peter Zijlstra, Yair Podemsky, Keyon Jie

On Mon, Jun 26 2023 at 12:36, Keyon Jie wrote:
> diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
> index fdbb5f07448f..24e24e137226 100644
> --- a/arch/x86/kernel/cpu/aperfmperf.c
> +++ b/arch/x86/kernel/cpu/aperfmperf.c
> @@ -432,7 +432,7 @@ unsigned int arch_freq_get_on_cpu(int cpu)
>  	 * Bail on invalid count and when the last update was too long ago,
>  	 * which covers idle and NOHZ full CPUs.
>  	 */
> -	if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE)
> +	if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE * cpu_khz)

What?

#define MAX_SAMPLE_AGE  ((unsigned long)HZ / 50)

HZ is the number of ticks (jiffies) per second. 20ms is 1/50 of a
second.

As the sample age is measured in jiffies and the maximum is defined to
be 20ms, the existing code is correct.

With your change the condition resolves to:

     delta > MAX_SAMPLE_AGE * cpu_khz

cpu_khz = Nominal CPU frequency / 1000

Ergo:

     delta > (HZ / 50) * (cpufreq / 1000)

                HZ * cpufreq
-->  delta > ------------------
                  50000

Let's describe cpufreq in GHz:

                HZ * G * 1e9
-->  delta > ------------------
                  50000

-->  delta > HZ * G * 20000

delta is calculated in jiffies, i.e. the number of ticks since the last
invocation. Because HZ is ticks per second, the resulting timeout
measured in seconds is:

         HZ * G * 20000 / HZ

-->      G * 20000 seconds

20000 seconds for a 1GHz CPU, 40000 seconds for a 4GHz CPU independent
of the actual HZ value.

jiffies are incremented once per tick, i.e. at tick frequency. The
number of ticks required to reach 20ms depends obviously on the tick
frequency, aka HZ.

HZ         ticks per second     tick period     Number of ticks which
                                                are equivalent to 20ms
 100        100                 10ms             2
 250        250                  4ms             5
1000       1000                  1ms            10

And that's what the code does correctly:

#define MAX_SAMPLE_AGE  ((unsigned long)HZ / 50)

No?

> From the commit f3eca381bd49 on, the fallback condition about the 'the
> last update was too long' have been comparing ticks and milliseconds by
> mistake, which leads to that the condition is met and the fallback
> method is used frequently.

The comparison is comparing a tick delta with a maximum number of ticks
and that's not a mistake. It's simply correct.

> The change to compare ticks here corrects that and fixes related issues
> have been seen on x86 platforms since 5.18 kernel.

I have no idea what you are trying to "fix" here, but that's moot as
there is nothing to fix.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86/aperfmperf: Fix the fallback condition in arch_freq_get_on_cpu()
  2023-06-26 19:36 [PATCH] x86/aperfmperf: Fix the fallback condition in arch_freq_get_on_cpu() Keyon Jie
  2023-06-28  0:57 ` Yang Jie
  2023-06-30 12:35 ` Thomas Gleixner
@ 2023-06-30 15:13 ` Dave Hansen
  2 siblings, 0 replies; 5+ messages in thread
From: Dave Hansen @ 2023-06-30 15:13 UTC (permalink / raw)
  To: Keyon Jie, Thomas Gleixner, x86, linux-kernel
  Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, H . Peter Anvin,
	Peter Zijlstra, Yair Podemsky

On 6/26/23 12:36, Keyon Jie wrote:
>>From the commit f3eca381bd49 on, the fallback condition about the 'the
> last update was too long' have been comparing ticks and milliseconds by
> mistake, which leads to that the condition is met and the fallback
> method is used frequently.

Could you do me a favor and walk me through the units of the variables
that you're talking about?  What is in ticks and what is in milliseconds?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86/aperfmperf: Fix the fallback condition in arch_freq_get_on_cpu()
  2023-06-30 12:35 ` Thomas Gleixner
@ 2023-07-05 17:23   ` Keyon Jie
  0 siblings, 0 replies; 5+ messages in thread
From: Keyon Jie @ 2023-07-05 17:23 UTC (permalink / raw)
  To: Thomas Gleixner, x86, linux-kernel
  Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, H . Peter Anvin,
	Peter Zijlstra, Yair Podemsky



On 6/30/23 05:35, Thomas Gleixner wrote:
> On Mon, Jun 26 2023 at 12:36, Keyon Jie wrote:
>> diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
>> index fdbb5f07448f..24e24e137226 100644
>> --- a/arch/x86/kernel/cpu/aperfmperf.c
>> +++ b/arch/x86/kernel/cpu/aperfmperf.c
>> @@ -432,7 +432,7 @@ unsigned int arch_freq_get_on_cpu(int cpu)
>>   	 * Bail on invalid count and when the last update was too long ago,
>>   	 * which covers idle and NOHZ full CPUs.
>>   	 */
>> -	if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE)
>> +	if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE * cpu_khz)
> 
> What?
> 
> #define MAX_SAMPLE_AGE  ((unsigned long)HZ / 50)
> 
> HZ is the number of ticks (jiffies) per second. 20ms is 1/50 of a
> second.
> 
> As the sample age is measured in jiffies and the maximum is defined to
> be 20ms, the existing code is correct.
> 
> With your change the condition resolves to:
> 
>       delta > MAX_SAMPLE_AGE * cpu_khz
> 
> cpu_khz = Nominal CPU frequency / 1000
> 
> Ergo:
> 
>       delta > (HZ / 50) * (cpufreq / 1000)
> 
>                  HZ * cpufreq
> -->  delta > ------------------
>                    50000
> 
> Let's describe cpufreq in GHz:
> 
>                  HZ * G * 1e9
> -->  delta > ------------------
>                    50000
> 
> -->  delta > HZ * G * 20000
> 
> delta is calculated in jiffies, i.e. the number of ticks since the last
> invocation. Because HZ is ticks per second, the resulting timeout
> measured in seconds is:
> 
>           HZ * G * 20000 / HZ
> 
> -->      G * 20000 seconds
> 
> 20000 seconds for a 1GHz CPU, 40000 seconds for a 4GHz CPU independent
> of the actual HZ value.
> 
> jiffies are incremented once per tick, i.e. at tick frequency. The
> number of ticks required to reach 20ms depends obviously on the tick
> frequency, aka HZ.
> 
> HZ         ticks per second     tick period     Number of ticks which
>                                                  are equivalent to 20ms
>   100        100                 10ms             2
>   250        250                  4ms             5
> 1000       1000                  1ms            10
> 
> And that's what the code does correctly:
> 
> #define MAX_SAMPLE_AGE  ((unsigned long)HZ / 50)
> 
> No?
> 
>>  From the commit f3eca381bd49 on, the fallback condition about the 'the
>> last update was too long' have been comparing ticks and milliseconds by
>> mistake, which leads to that the condition is met and the fallback
>> method is used frequently.
> 
> The comparison is comparing a tick delta with a maximum number of ticks
> and that's not a mistake. It's simply correct.
> 
>> The change to compare ticks here corrects that and fixes related issues
>> have been seen on x86 platforms since 5.18 kernel.
> 
> I have no idea what you are trying to "fix" here, but that's moot as
> there is nothing to fix.

Hi Thomas and Dave,

Thank you for educating on this, I think you are totally right. So the 
original issue described in the bugzilla is not caused by what I 
mentioned here. Please ignore this patch, we need to figure out another 
fix for the issue, sorry for the confusion.

Thanks,
~Keyon

> 
> Thanks,
> 
>          tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-07-05 17:23 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-26 19:36 [PATCH] x86/aperfmperf: Fix the fallback condition in arch_freq_get_on_cpu() Keyon Jie
2023-06-28  0:57 ` Yang Jie
2023-06-30 12:35 ` Thomas Gleixner
2023-07-05 17:23   ` Keyon Jie
2023-06-30 15:13 ` Dave Hansen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox