* [PATCH v9 0/5] Add support for AArch64 AMUv1-based average freq
@ 2025-01-21 8:44 Beata Michalska
2025-01-21 8:44 ` [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error Beata Michalska
` (4 more replies)
0 siblings, 5 replies; 32+ messages in thread
From: Beata Michalska @ 2025-01-21 8:44 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H . Peter Anvin, Phil Auld, x86, linux-doc
Hi All,
This series adds support for obtaining an average CPU frequency based on
a hardware provided feedback. The average frequency is being exposed via
dedicated yet optional cpufreq sysfs attribute - cpuinfo_avg_freq.
The architecture specific bits are being provided for AArch64, caching on
existing implementation for FIE and AMUv1 support: the frequency scale
factor, updated on each sched tick, serving as a base for retrieving
the frequency for a given CPU, representing an average frequency
reported between the ticks.
The changes have been rather lightly (due to some limitations) tested on
an FVP model.
Note that [PATCH 2/4] arm64: amu: Delay allocating cpumask for AMU FIE support
can be merged independently.
Additionally, this series depends on [6]
Relevant discussions:
[1] https://lore.kernel.org/all/20240229162520.970986-1-vanshikonda@os.amperecomputing.com/
[2] https://lore.kernel.org/all/7eozim2xnepacnnkzxlbx34hib4otycnbn4dqymfziqou5lw5u@5xzpv3t7sxo3/
[3] https://lore.kernel.org/all/20231212072617.14756-1-lihuisong@huawei.com/
[4] https://lore.kernel.org/lkml/ZIHpd6unkOtYVEqP@e120325.cambridge.arm.com/T/#m4e74cb5a0aaa353c60fedc6cfb95ab7a6e381e3c
[5] https://lore.kernel.org/all/20240603081331.3829278-1-beata.michalska@arm.com/
[6] https://lore.kernel.org/all/20240827154818.1195849-1-ionela.voinescu@arm.com/
v9:
- Moved changes to arch_freq_get_on_cpu to a separate patch
v8:
- Drop introducing new function and reuse arch_freq_get_on_cpu, guarding its use
in scaling_cur_freq sysfs handler with dedicated config for x86
v7:
- Dropping 'arch_topology: init capacity_freq_ref to 0' patch from the series
as this one has been sent separately as an independent change
[https://lore.kernel.org/all/20240827154818.1195849-1-ionela.voinescu@arm.com/]
- Including in the series change that introduces new sysfs entry [PATCH 1/4]
- Consequently modifying previously arch_freq_get_on_cpu to match reqs for new
sysfs attribute
- Dropping an idea of considering a CPU that has been idle for a while as a
valid source of information for obtaining an AMU-counter based frequency
- Some minor cosmetic changes
v6:
- delay allocating cpumask for AMU FIE support instead of invalidating the mask
upon failure to register cpufreq policy notifications
- drop the change to cpufreq core (for cpuinfo_cur_freq) as this one will be
sent as a separate change
v5:
- Fix invalid access to cpumask
- Reworked finding reference cpu when getting the freq
v4:
- dropping seqcount
- fixing identifying active cpu within given policy
- skipping full dynticks cpus when retrieving the freq
- bringing back plugging in arch_freq_get_on_cpu into cpuinfo_cur_freq
v3:
- dropping changes to cpufreq_verify_current_freq
- pulling in changes from Ionela initializing capacity_freq_ref to 0
(thanks for that!) and applying suggestions made by her during last review:
- switching to arch_scale_freq_capacity and arch_scale_freq_ref when
reversing freq scale factor computation
- swapping shift with multiplication
- adding time limit for considering last scale update as valid
- updating frequency scale factor upon entering idle
v2:
- Splitting the patches
- Adding comment for full dyntick mode
- Plugging arch_freq_get_on_cpu into cpufreq_verify_current_freq instead
of in show_cpuinfo_cur_freq to allow the framework to stay more in sync
with potential freq changes
CC: Jonathan Corbet <corbet@lwn.net>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Phil Auld <pauld@redhat.com>
CC: x86@kernel.org
CC: linux-doc@vger.kernel.org
*** BLURB HERE ***
Beata Michalska (5):
cpufreq: Allow arch_freq_get_on_cpu to return an error
cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
arm64: amu: Delay allocating cpumask for AMU FIE support
arm64: Provide an AMU-based version of arch_freq_get_on_cpu
arm64: Update AMU-based freq scale factor on entering idle
Documentation/admin-guide/pm/cpufreq.rst | 16 ++-
arch/arm64/kernel/topology.c | 144 +++++++++++++++++++----
arch/x86/kernel/cpu/aperfmperf.c | 2 +-
arch/x86/kernel/cpu/proc.c | 7 +-
drivers/cpufreq/Kconfig.x86 | 12 ++
drivers/cpufreq/cpufreq.c | 38 +++++-
include/linux/cpufreq.h | 2 +-
7 files changed, 189 insertions(+), 32 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error
2025-01-21 8:44 [PATCH v9 0/5] Add support for AArch64 AMUv1-based average freq Beata Michalska
@ 2025-01-21 8:44 ` Beata Michalska
2025-01-21 10:47 ` Viresh Kumar
` (2 more replies)
2025-01-21 8:44 ` [PATCH v9 2/5] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry Beata Michalska
` (3 subsequent siblings)
4 siblings, 3 replies; 32+ messages in thread
From: Beata Michalska @ 2025-01-21 8:44 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9
Allow arch_freq_get_on_cpu to return an error for cases when retrieving
current CPU frequency is not possible, whether that being due to lack of
required arch support or due to other circumstances when the current
frequency cannot be determined at given point of time.
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
arch/x86/kernel/cpu/aperfmperf.c | 2 +-
arch/x86/kernel/cpu/proc.c | 7 +++++--
drivers/cpufreq/cpufreq.c | 8 ++++----
include/linux/cpufreq.h | 2 +-
4 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
index f642de2ebdac..6cf31a1649c4 100644
--- a/arch/x86/kernel/cpu/aperfmperf.c
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -498,7 +498,7 @@ void arch_scale_freq_tick(void)
*/
#define MAX_SAMPLE_AGE ((unsigned long)HZ / 50)
-unsigned int arch_freq_get_on_cpu(int cpu)
+int arch_freq_get_on_cpu(int cpu)
{
struct aperfmperf *s = per_cpu_ptr(&cpu_samples, cpu);
unsigned int seq, freq;
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 41ed01f46bd9..d79f5845a463 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -86,9 +86,12 @@ static int show_cpuinfo(struct seq_file *m, void *v)
seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
if (cpu_has(c, X86_FEATURE_TSC)) {
- unsigned int freq = arch_freq_get_on_cpu(cpu);
+ int freq = arch_freq_get_on_cpu(cpu);
- seq_printf(m, "cpu MHz\t\t: %u.%03u\n", freq / 1000, (freq % 1000));
+ if (freq <= 0)
+ seq_puts(m, "cpu MHz\t\t: Unknown\n");
+ else
+ seq_printf(m, "cpu MHz\t\t: %u.%03u\n", freq / 1000, (freq % 1000));
}
/* Cache size */
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 418236fef172..6f45684483c4 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -728,18 +728,18 @@ show_one(cpuinfo_transition_latency, cpuinfo.transition_latency);
show_one(scaling_min_freq, min);
show_one(scaling_max_freq, max);
-__weak unsigned int arch_freq_get_on_cpu(int cpu)
+__weak int arch_freq_get_on_cpu(int cpu)
{
- return 0;
+ return -EOPNOTSUPP;
}
static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
{
ssize_t ret;
- unsigned int freq;
+ int freq;
freq = arch_freq_get_on_cpu(policy->cpu);
- if (freq)
+ if (freq > 0)
ret = sysfs_emit(buf, "%u\n", freq);
else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
ret = sysfs_emit(buf, "%u\n", cpufreq_driver->get(policy->cpu));
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 7fe0981a7e46..02fd4746231d 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -1184,7 +1184,7 @@ static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_
}
#endif
-extern unsigned int arch_freq_get_on_cpu(int cpu);
+extern int arch_freq_get_on_cpu(int cpu);
#ifndef arch_set_freq_scale
static __always_inline
--
2.25.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v9 2/5] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2025-01-21 8:44 [PATCH v9 0/5] Add support for AArch64 AMUv1-based average freq Beata Michalska
2025-01-21 8:44 ` [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error Beata Michalska
@ 2025-01-21 8:44 ` Beata Michalska
2025-01-21 10:53 ` Viresh Kumar
` (2 more replies)
2025-01-21 8:44 ` [PATCH v9 3/5] arm64: amu: Delay allocating cpumask for AMU FIE support Beata Michalska
` (2 subsequent siblings)
4 siblings, 3 replies; 32+ messages in thread
From: Beata Michalska @ 2025-01-21 8:44 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H . Peter Anvin, Phil Auld, x86, linux-doc
Currently the CPUFreq core exposes two sysfs attributes that can be used
to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
and scaling_cur_freq. Both provide slightly different view on the
subject and they do come with their own drawbacks.
cpuinfo_cur_freq provides higher precision though at a cost of being
rather expensive. Moreover, the information retrieved via this attribute
is somewhat short lived as frequency can change at any point of time
making it difficult to reason from.
scaling_cur_freq, on the other hand, tends to be less accurate but then
the actual level of precision (and source of information) varies between
architectures making it a bit ambiguous.
The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
distinct interface, exposing an average frequency of a given CPU(s), as
reported by the hardware, over a time frame spanning no more than a few
milliseconds. As it requires appropriate hardware support, this
interface is optional.
Note that under the hood, the new attribute relies on the information
provided by arch_freq_get_on_cpu, which, up to this point, has been
feeding data for scaling_cur_freq attribute, being the source of
ambiguity when it comes to interpretation. This has been amended by
restoring the intended behavior for scaling_cur_freq, with a new
dedicated config option to maintain status quo for those, who may need
it.
CC: Jonathan Corbet <corbet@lwn.net>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Phil Auld <pauld@redhat.com>
CC: x86@kernel.org
CC: linux-doc@vger.kernel.org
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
Documentation/admin-guide/pm/cpufreq.rst | 16 ++++++++++++-
drivers/cpufreq/Kconfig.x86 | 12 ++++++++++
drivers/cpufreq/cpufreq.c | 30 +++++++++++++++++++++++-
3 files changed, 56 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
index a21369eba034..e9969174026c 100644
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -248,6 +248,19 @@ are the following:
If that frequency cannot be determined, this attribute should not
be present.
+``cpuinfo_avg_freq``
+ An average frequency (in KHz) of all CPUs belonging to a given policy,
+ derived from a hardware provided feedback and reported on a time frame
+ spanning at most few milliseconds.
+
+ This is expected to be based on the frequency the hardware actually runs
+ at and, as such, might require specialised hardware support (such as AMU
+ extension on ARM). If one cannot be determined, this attribute should
+ not be present.
+
+ Note, that failed attempt to retrieve current frequency for a given
+ CPU(s) will result in an appropriate error.
+
``cpuinfo_max_freq``
Maximum possible operating frequency the CPUs belonging to this policy
can run at (in kHz).
@@ -293,7 +306,8 @@ are the following:
Some architectures (e.g. ``x86``) may attempt to provide information
more precisely reflecting the current CPU frequency through this
attribute, but that still may not be the exact current CPU frequency as
- seen by the hardware at the moment.
+ seen by the hardware at the moment. This behavior though, is only
+ available via c:macro:``CPUFREQ_ARCH_CUR_FREQ`` option.
``scaling_driver``
The scaling driver currently in use.
diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
index 97c2d4f15d76..212e1b9afe21 100644
--- a/drivers/cpufreq/Kconfig.x86
+++ b/drivers/cpufreq/Kconfig.x86
@@ -340,3 +340,15 @@ config X86_SPEEDSTEP_RELAXED_CAP_CHECK
option lets the probing code bypass some of those checks if the
parameter "relaxed_check=1" is passed to the module.
+config CPUFREQ_ARCH_CUR_FREQ
+ default y
+ bool "Current frequency derived from HW provided feedback"
+ help
+ This determines whether the scaling_cur_freq sysfs attribute returns
+ the last requested frequency or a more precise value based on hardware
+ provided feedback (as architected counters).
+ Given that a more precise frequency can now be provided via the
+ cpuinfo_avg_cur_freq attribute, by enabling this option,
+ scaling_cur_freq maintains the provision of a counter based frequency,
+ for compatibility reasons.
+
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 6f45684483c4..b2a8efa83c98 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -733,12 +733,20 @@ __weak int arch_freq_get_on_cpu(int cpu)
return -EOPNOTSUPP;
}
+static inline bool cpufreq_avg_freq_supported(struct cpufreq_policy *policy)
+{
+ return arch_freq_get_on_cpu(policy->cpu) != -EOPNOTSUPP;
+}
+
static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
{
ssize_t ret;
int freq;
- freq = arch_freq_get_on_cpu(policy->cpu);
+ freq = IS_ENABLED(CONFIG_CPUFREQ_ARCH_CUR_FREQ)
+ ? arch_freq_get_on_cpu(policy->cpu)
+ : 0;
+
if (freq > 0)
ret = sysfs_emit(buf, "%u\n", freq);
else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
@@ -783,6 +791,19 @@ static ssize_t show_cpuinfo_cur_freq(struct cpufreq_policy *policy,
return sysfs_emit(buf, "<unknown>\n");
}
+/*
+ * show_cpuinfo_avg_freq - average CPU frequency as detected by hardware
+ */
+static ssize_t show_cpuinfo_avg_freq(struct cpufreq_policy *policy,
+ char *buf)
+{
+ int avg_freq = arch_freq_get_on_cpu(policy->cpu);
+
+ if (avg_freq > 0)
+ return sysfs_emit(buf, "%u\n", avg_freq);
+ return avg_freq != 0 ? avg_freq : -EINVAL;
+}
+
/*
* show_scaling_governor - show the current policy for the specified CPU
*/
@@ -945,6 +966,7 @@ static ssize_t show_bios_limit(struct cpufreq_policy *policy, char *buf)
}
cpufreq_freq_attr_ro_perm(cpuinfo_cur_freq, 0400);
+cpufreq_freq_attr_ro(cpuinfo_avg_freq);
cpufreq_freq_attr_ro(cpuinfo_min_freq);
cpufreq_freq_attr_ro(cpuinfo_max_freq);
cpufreq_freq_attr_ro(cpuinfo_transition_latency);
@@ -1072,6 +1094,12 @@ static int cpufreq_add_dev_interface(struct cpufreq_policy *policy)
return ret;
}
+ if (cpufreq_avg_freq_supported(policy)) {
+ ret = sysfs_create_file(&policy->kobj, &cpuinfo_avg_freq.attr);
+ if (ret)
+ return ret;
+ }
+
ret = sysfs_create_file(&policy->kobj, &scaling_cur_freq.attr);
if (ret)
return ret;
--
2.25.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v9 3/5] arm64: amu: Delay allocating cpumask for AMU FIE support
2025-01-21 8:44 [PATCH v9 0/5] Add support for AArch64 AMUv1-based average freq Beata Michalska
2025-01-21 8:44 ` [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error Beata Michalska
2025-01-21 8:44 ` [PATCH v9 2/5] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry Beata Michalska
@ 2025-01-21 8:44 ` Beata Michalska
2025-01-24 4:48 ` Prasanna Kumar T S M
2025-01-29 11:17 ` Sumit Gupta
2025-01-21 8:44 ` [PATCH v9 4/5] arm64: Provide an AMU-based version of arch_freq_get_on_cpu Beata Michalska
2025-01-21 8:44 ` [PATCH v9 5/5] arm64: Update AMU-based freq scale factor on entering idle Beata Michalska
4 siblings, 2 replies; 32+ messages in thread
From: Beata Michalska @ 2025-01-21 8:44 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9
For the time being, the amu_fie_cpus cpumask is being exclusively used
by the AMU-related internals of FIE support and is guaranteed to be
valid on every access currently made. Still the mask is not being
invalidated on one of the error handling code paths, which leaves
a soft spot with theoretical risk of UAF for CPUMASK_OFFSTACK cases.
To make things sound, delay allocating said cpumask
(for CPUMASK_OFFSTACK) avoiding otherwise nasty sanitising case failing
to register the cpufreq policy notifications.
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
arch/arm64/kernel/topology.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 1a2c72f3e7f8..cb180684d10d 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -194,12 +194,19 @@ static void amu_fie_setup(const struct cpumask *cpus)
int cpu;
/* We are already set since the last insmod of cpufreq driver */
- if (unlikely(cpumask_subset(cpus, amu_fie_cpus)))
+ if (cpumask_available(amu_fie_cpus) &&
+ unlikely(cpumask_subset(cpus, amu_fie_cpus)))
return;
- for_each_cpu(cpu, cpus) {
+ for_each_cpu(cpu, cpus)
if (!freq_counters_valid(cpu))
return;
+
+ if (!cpumask_available(amu_fie_cpus) &&
+ !zalloc_cpumask_var(&amu_fie_cpus, GFP_KERNEL)) {
+ WARN_ONCE(1, "Failed to allocate FIE cpumask for CPUs[%*pbl]\n",
+ cpumask_pr_args(cpus));
+ return;
}
cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus);
@@ -237,17 +244,8 @@ static struct notifier_block init_amu_fie_notifier = {
static int __init init_amu_fie(void)
{
- int ret;
-
- if (!zalloc_cpumask_var(&amu_fie_cpus, GFP_KERNEL))
- return -ENOMEM;
-
- ret = cpufreq_register_notifier(&init_amu_fie_notifier,
+ return cpufreq_register_notifier(&init_amu_fie_notifier,
CPUFREQ_POLICY_NOTIFIER);
- if (ret)
- free_cpumask_var(amu_fie_cpus);
-
- return ret;
}
core_initcall(init_amu_fie);
--
2.25.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v9 4/5] arm64: Provide an AMU-based version of arch_freq_get_on_cpu
2025-01-21 8:44 [PATCH v9 0/5] Add support for AArch64 AMUv1-based average freq Beata Michalska
` (2 preceding siblings ...)
2025-01-21 8:44 ` [PATCH v9 3/5] arm64: amu: Delay allocating cpumask for AMU FIE support Beata Michalska
@ 2025-01-21 8:44 ` Beata Michalska
2025-01-24 4:43 ` Prasanna Kumar T S M
2025-01-29 11:15 ` Sumit Gupta
2025-01-21 8:44 ` [PATCH v9 5/5] arm64: Update AMU-based freq scale factor on entering idle Beata Michalska
4 siblings, 2 replies; 32+ messages in thread
From: Beata Michalska @ 2025-01-21 8:44 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9
With the Frequency Invariance Engine (FIE) being already wired up with
sched tick and making use of relevant (core counter and constant
counter) AMU counters, getting the average frequency for a given CPU,
can be achieved by utilizing the frequency scale factor which reflects
an average CPU frequency for the last tick period length.
The solution is partially based on APERF/MPERF implementation of
arch_freq_get_on_cpu.
Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
arch/arm64/kernel/topology.c | 109 +++++++++++++++++++++++++++++++----
1 file changed, 99 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index cb180684d10d..5f5738b174c7 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -17,6 +17,7 @@
#include <linux/cpufreq.h>
#include <linux/init.h>
#include <linux/percpu.h>
+#include <linux/sched/isolation.h>
#include <asm/cpu.h>
#include <asm/cputype.h>
@@ -88,18 +89,28 @@ int __init parse_acpi_topology(void)
* initialized.
*/
static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = 1UL << (2 * SCHED_CAPACITY_SHIFT);
-static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
-static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
static cpumask_var_t amu_fie_cpus;
+struct amu_cntr_sample {
+ u64 arch_const_cycles_prev;
+ u64 arch_core_cycles_prev;
+ unsigned long last_scale_update;
+};
+
+static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_cntr_sample, cpu_amu_samples);
+
void update_freq_counters_refs(void)
{
- this_cpu_write(arch_core_cycles_prev, read_corecnt());
- this_cpu_write(arch_const_cycles_prev, read_constcnt());
+ struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
+
+ amu_sample->arch_core_cycles_prev = read_corecnt();
+ amu_sample->arch_const_cycles_prev = read_constcnt();
}
static inline bool freq_counters_valid(int cpu)
{
+ struct amu_cntr_sample *amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
+
if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
return false;
@@ -108,8 +119,8 @@ static inline bool freq_counters_valid(int cpu)
return false;
}
- if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
- !per_cpu(arch_core_cycles_prev, cpu))) {
+ if (unlikely(!amu_sample->arch_const_cycles_prev ||
+ !amu_sample->arch_core_cycles_prev)) {
pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
return false;
}
@@ -152,17 +163,22 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
static void amu_scale_freq_tick(void)
{
+ struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
u64 prev_core_cnt, prev_const_cnt;
u64 core_cnt, const_cnt, scale;
- prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
- prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
+ prev_const_cnt = amu_sample->arch_const_cycles_prev;
+ prev_core_cnt = amu_sample->arch_core_cycles_prev;
update_freq_counters_refs();
- const_cnt = this_cpu_read(arch_const_cycles_prev);
- core_cnt = this_cpu_read(arch_core_cycles_prev);
+ const_cnt = amu_sample->arch_const_cycles_prev;
+ core_cnt = amu_sample->arch_core_cycles_prev;
+ /*
+ * This should not happen unless the AMUs have been reset and the
+ * counter values have not been restored - unlikely
+ */
if (unlikely(core_cnt <= prev_core_cnt ||
const_cnt <= prev_const_cnt))
return;
@@ -182,6 +198,8 @@ static void amu_scale_freq_tick(void)
scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
this_cpu_write(arch_freq_scale, (unsigned long)scale);
+
+ amu_sample->last_scale_update = jiffies;
}
static struct scale_freq_data amu_sfd = {
@@ -189,6 +207,77 @@ static struct scale_freq_data amu_sfd = {
.set_freq_scale = amu_scale_freq_tick,
};
+static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
+{
+ return cpumask_available(amu_fie_cpus) &&
+ cpumask_test_cpu(cpu, amu_fie_cpus);
+}
+
+#define AMU_SAMPLE_EXP_MS 20
+
+int arch_freq_get_on_cpu(int cpu)
+{
+ struct amu_cntr_sample *amu_sample;
+ unsigned int start_cpu = cpu;
+ unsigned long last_update;
+ unsigned int freq = 0;
+ u64 scale;
+
+ if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
+ return -EOPNOTSUPP;
+
+retry:
+ amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
+
+ last_update = amu_sample->last_scale_update;
+
+ /*
+ * For those CPUs that are in full dynticks mode, or those that have
+ * not seen tick for a while, try an alternative source for the counters
+ * (and thus freq scale), if available, for given policy: this boils
+ * down to identifying an active cpu within the same freq domain, if any.
+ */
+ if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
+ time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
+ struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
+ int ref_cpu = cpu;
+
+ if (!policy)
+ return -EINVAL;
+
+ if (!cpumask_intersects(policy->related_cpus,
+ housekeeping_cpumask(HK_TYPE_TICK))) {
+ cpufreq_cpu_put(policy);
+ return -EOPNOTSUPP;
+ }
+
+
+ do {
+ ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
+ start_cpu, false);
+
+ } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
+
+ cpufreq_cpu_put(policy);
+
+ if (ref_cpu >= nr_cpu_ids)
+ /* No alternative to pull info from */
+ return -EAGAIN;
+
+ cpu = ref_cpu;
+ goto retry;
+ }
+ /*
+ * Reversed computation to the one used to determine
+ * the arch_freq_scale value
+ * (see amu_scale_freq_tick for details)
+ */
+ scale = arch_scale_freq_capacity(cpu);
+ freq = scale * arch_scale_freq_ref(cpu);
+ freq >>= SCHED_CAPACITY_SHIFT;
+ return freq;
+}
+
static void amu_fie_setup(const struct cpumask *cpus)
{
int cpu;
--
2.25.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v9 5/5] arm64: Update AMU-based freq scale factor on entering idle
2025-01-21 8:44 [PATCH v9 0/5] Add support for AArch64 AMUv1-based average freq Beata Michalska
` (3 preceding siblings ...)
2025-01-21 8:44 ` [PATCH v9 4/5] arm64: Provide an AMU-based version of arch_freq_get_on_cpu Beata Michalska
@ 2025-01-21 8:44 ` Beata Michalska
2025-01-24 4:45 ` Prasanna Kumar T S M
2025-01-29 11:13 ` Sumit Gupta
4 siblings, 2 replies; 32+ messages in thread
From: Beata Michalska @ 2025-01-21 8:44 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9
Now that the frequency scale factor has been activated for retrieving
current frequency on a given CPU, trigger its update upon entering
idle. This will, to an extent, allow querying last known frequency
in a non-invasive way. It will also improve the frequency scale factor
accuracy when a CPU entering idle did not receive a tick for a while.
As a consequence, for idle cores, the reported frequency will be the
last one observed before entering the idle state.
Suggested-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
arch/arm64/kernel/topology.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 5f5738b174c7..6c43aafac77c 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -213,6 +213,19 @@ static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
cpumask_test_cpu(cpu, amu_fie_cpus);
}
+void arch_cpu_idle_enter(void)
+{
+ unsigned int cpu = smp_processor_id();
+
+ if (!amu_fie_cpu_supported(cpu))
+ return;
+
+ /* Kick in AMU update but only if one has not happened already */
+ if (housekeeping_cpu(cpu, HK_TYPE_TICK) &&
+ time_is_before_jiffies(per_cpu(cpu_amu_samples.last_scale_update, cpu)))
+ amu_scale_freq_tick();
+}
+
#define AMU_SAMPLE_EXP_MS 20
int arch_freq_get_on_cpu(int cpu)
--
2.25.1
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error
2025-01-21 8:44 ` [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error Beata Michalska
@ 2025-01-21 10:47 ` Viresh Kumar
2025-01-21 15:14 ` Beata Michalska
2025-01-21 10:47 ` Prasanna Kumar T S M
2025-01-24 4:15 ` Prasanna Kumar T S M
2 siblings, 1 reply; 32+ messages in thread
From: Viresh Kumar @ 2025-01-21 10:47 UTC (permalink / raw)
To: Beata Michalska
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, sumitg, yang,
vanshikonda, lihuisong, zhanjie9
On 21-01-25, 08:44, Beata Michalska wrote:
> Allow arch_freq_get_on_cpu to return an error for cases when retrieving
> current CPU frequency is not possible, whether that being due to lack of
> required arch support or due to other circumstances when the current
> frequency cannot be determined at given point of time.
>
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> ---
> arch/x86/kernel/cpu/aperfmperf.c | 2 +-
> arch/x86/kernel/cpu/proc.c | 7 +++++--
> drivers/cpufreq/cpufreq.c | 8 ++++----
> include/linux/cpufreq.h | 2 +-
> 4 files changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
> index f642de2ebdac..6cf31a1649c4 100644
> --- a/arch/x86/kernel/cpu/aperfmperf.c
> +++ b/arch/x86/kernel/cpu/aperfmperf.c
> @@ -498,7 +498,7 @@ void arch_scale_freq_tick(void)
> */
> #define MAX_SAMPLE_AGE ((unsigned long)HZ / 50)
>
> -unsigned int arch_freq_get_on_cpu(int cpu)
> +int arch_freq_get_on_cpu(int cpu)
> {
> struct aperfmperf *s = per_cpu_ptr(&cpu_samples, cpu);
> unsigned int seq, freq;
> diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
> index 41ed01f46bd9..d79f5845a463 100644
> --- a/arch/x86/kernel/cpu/proc.c
> +++ b/arch/x86/kernel/cpu/proc.c
> @@ -86,9 +86,12 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
>
> if (cpu_has(c, X86_FEATURE_TSC)) {
> - unsigned int freq = arch_freq_get_on_cpu(cpu);
> + int freq = arch_freq_get_on_cpu(cpu);
>
> - seq_printf(m, "cpu MHz\t\t: %u.%03u\n", freq / 1000, (freq % 1000));
> + if (freq <= 0)
> + seq_puts(m, "cpu MHz\t\t: Unknown\n");
> + else
> + seq_printf(m, "cpu MHz\t\t: %u.%03u\n", freq / 1000, (freq % 1000));
> }
>
> /* Cache size */
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 418236fef172..6f45684483c4 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -728,18 +728,18 @@ show_one(cpuinfo_transition_latency, cpuinfo.transition_latency);
> show_one(scaling_min_freq, min);
> show_one(scaling_max_freq, max);
>
> -__weak unsigned int arch_freq_get_on_cpu(int cpu)
> +__weak int arch_freq_get_on_cpu(int cpu)
> {
> - return 0;
> + return -EOPNOTSUPP;
> }
>
> static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
> {
> ssize_t ret;
> - unsigned int freq;
> + int freq;
>
> freq = arch_freq_get_on_cpu(policy->cpu);
> - if (freq)
> + if (freq > 0)
>= ?
Since we can return error now, 0 should be considered a valid
frequency value ?
> ret = sysfs_emit(buf, "%u\n", freq);
> else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
> ret = sysfs_emit(buf, "%u\n", cpufreq_driver->get(policy->cpu));
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 7fe0981a7e46..02fd4746231d 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -1184,7 +1184,7 @@ static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_
> }
> #endif
>
> -extern unsigned int arch_freq_get_on_cpu(int cpu);
> +extern int arch_freq_get_on_cpu(int cpu);
>
> #ifndef arch_set_freq_scale
> static __always_inline
> --
> 2.25.1
--
viresh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error
2025-01-21 8:44 ` [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error Beata Michalska
2025-01-21 10:47 ` Viresh Kumar
@ 2025-01-21 10:47 ` Prasanna Kumar T S M
2025-01-24 4:15 ` Prasanna Kumar T S M
2 siblings, 0 replies; 32+ messages in thread
From: Prasanna Kumar T S M @ 2025-01-21 10:47 UTC (permalink / raw)
To: Beata Michalska, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9
On 21-01-2025 14:14, Beata Michalska wrote:
> Allow arch_freq_get_on_cpu to return an error for cases when retrieving
> current CPU frequency is not possible, whether that being due to lack of
> required arch support or due to other circumstances when the current
> frequency cannot be determined at given point of time.
>
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> ---
> arch/x86/kernel/cpu/aperfmperf.c | 2 +-
> arch/x86/kernel/cpu/proc.c | 7 +++++--
> drivers/cpufreq/cpufreq.c | 8 ++++----
> include/linux/cpufreq.h | 2 +-
> 4 files changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
> index f642de2ebdac..6cf31a1649c4 100644
> --- a/arch/x86/kernel/cpu/aperfmperf.c
> +++ b/arch/x86/kernel/cpu/aperfmperf.c
> @@ -498,7 +498,7 @@ void arch_scale_freq_tick(void)
> */
> #define MAX_SAMPLE_AGE ((unsigned long)HZ / 50)
>
> -unsigned int arch_freq_get_on_cpu(int cpu)
> +int arch_freq_get_on_cpu(int cpu)
> {
> struct aperfmperf *s = per_cpu_ptr(&cpu_samples, cpu);
> unsigned int seq, freq;
> diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
> index 41ed01f46bd9..d79f5845a463 100644
> --- a/arch/x86/kernel/cpu/proc.c
> +++ b/arch/x86/kernel/cpu/proc.c
> @@ -86,9 +86,12 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
>
> if (cpu_has(c, X86_FEATURE_TSC)) {
> - unsigned int freq = arch_freq_get_on_cpu(cpu);
> + int freq = arch_freq_get_on_cpu(cpu);
>
> - seq_printf(m, "cpu MHz\t\t: %u.%03u\n", freq / 1000, (freq % 1000));
> + if (freq <= 0)
> + seq_puts(m, "cpu MHz\t\t: Unknown\n");
> + else
> + seq_printf(m, "cpu MHz\t\t: %u.%03u\n", freq / 1000, (freq % 1000));
> }
>
> /* Cache size */
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 418236fef172..6f45684483c4 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -728,18 +728,18 @@ show_one(cpuinfo_transition_latency, cpuinfo.transition_latency);
> show_one(scaling_min_freq, min);
> show_one(scaling_max_freq, max);
>
> -__weak unsigned int arch_freq_get_on_cpu(int cpu)
> +__weak int arch_freq_get_on_cpu(int cpu)
> {
> - return 0;
> + return -EOPNOTSUPP;
> }
>
> static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
> {
> ssize_t ret;
> - unsigned int freq;
> + int freq;
>
> freq = arch_freq_get_on_cpu(policy->cpu);
> - if (freq)
> + if (freq > 0)
> ret = sysfs_emit(buf, "%u\n", freq);
> else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
> ret = sysfs_emit(buf, "%u\n", cpufreq_driver->get(policy->cpu));
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 7fe0981a7e46..02fd4746231d 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -1184,7 +1184,7 @@ static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_
> }
> #endif
>
> -extern unsigned int arch_freq_get_on_cpu(int cpu);
> +extern int arch_freq_get_on_cpu(int cpu);
>
> #ifndef arch_set_freq_scale
> static __always_inline
Looks good to me.
Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 2/5] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2025-01-21 8:44 ` [PATCH v9 2/5] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry Beata Michalska
@ 2025-01-21 10:53 ` Viresh Kumar
2025-01-21 15:17 ` Beata Michalska
2025-01-28 8:43 ` Prasanna Kumar T S M
2025-01-29 11:29 ` Sumit Gupta
2 siblings, 1 reply; 32+ messages in thread
From: Viresh Kumar @ 2025-01-21 10:53 UTC (permalink / raw)
To: Beata Michalska
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, sumitg, yang,
vanshikonda, lihuisong, zhanjie9, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H . Peter Anvin, Phil Auld, x86, linux-doc
On 21-01-25, 08:44, Beata Michalska wrote:
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 6f45684483c4..b2a8efa83c98 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -733,12 +733,20 @@ __weak int arch_freq_get_on_cpu(int cpu)
> return -EOPNOTSUPP;
> }
>
> static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
> {
> ssize_t ret;
> int freq;
>
> - freq = arch_freq_get_on_cpu(policy->cpu);
> + freq = IS_ENABLED(CONFIG_CPUFREQ_ARCH_CUR_FREQ)
> + ? arch_freq_get_on_cpu(policy->cpu)
> + : 0;
> +
> if (freq > 0)
> ret = sysfs_emit(buf, "%u\n", freq);
> else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
Maybe this should be a separate commit ? And also I am not very happy
with the new kconfig option. I don't want others to use it as we want
to get rid of this for X86 too eventually. Making it a kconfig option
allows anyone to enable it and then depend on it without us knowing..
Rather just write it as "if (x86)", with a comment on what we plan to
do with it in few release cycles.
--
viresh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error
2025-01-21 10:47 ` Viresh Kumar
@ 2025-01-21 15:14 ` Beata Michalska
2025-01-21 18:40 ` Vanshidhar Konda
2025-01-22 6:12 ` Viresh Kumar
0 siblings, 2 replies; 32+ messages in thread
From: Beata Michalska @ 2025-01-21 15:14 UTC (permalink / raw)
To: Viresh Kumar
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, sumitg, yang,
vanshikonda, lihuisong, zhanjie9
On Tue, Jan 21, 2025 at 04:17:06PM +0530, Viresh Kumar wrote:
> On 21-01-25, 08:44, Beata Michalska wrote:
> > Allow arch_freq_get_on_cpu to return an error for cases when retrieving
> > current CPU frequency is not possible, whether that being due to lack of
> > required arch support or due to other circumstances when the current
> > frequency cannot be determined at given point of time.
> >
> > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> > ---
> > arch/x86/kernel/cpu/aperfmperf.c | 2 +-
> > arch/x86/kernel/cpu/proc.c | 7 +++++--
> > drivers/cpufreq/cpufreq.c | 8 ++++----
> > include/linux/cpufreq.h | 2 +-
> > 4 files changed, 11 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
> > index f642de2ebdac..6cf31a1649c4 100644
> > --- a/arch/x86/kernel/cpu/aperfmperf.c
> > +++ b/arch/x86/kernel/cpu/aperfmperf.c
> > @@ -498,7 +498,7 @@ void arch_scale_freq_tick(void)
> > */
> > #define MAX_SAMPLE_AGE ((unsigned long)HZ / 50)
> >
> > -unsigned int arch_freq_get_on_cpu(int cpu)
> > +int arch_freq_get_on_cpu(int cpu)
> > {
> > struct aperfmperf *s = per_cpu_ptr(&cpu_samples, cpu);
> > unsigned int seq, freq;
> > diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
> > index 41ed01f46bd9..d79f5845a463 100644
> > --- a/arch/x86/kernel/cpu/proc.c
> > +++ b/arch/x86/kernel/cpu/proc.c
> > @@ -86,9 +86,12 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> > seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
> >
> > if (cpu_has(c, X86_FEATURE_TSC)) {
> > - unsigned int freq = arch_freq_get_on_cpu(cpu);
> > + int freq = arch_freq_get_on_cpu(cpu);
> >
> > - seq_printf(m, "cpu MHz\t\t: %u.%03u\n", freq / 1000, (freq % 1000));
> > + if (freq <= 0)
> > + seq_puts(m, "cpu MHz\t\t: Unknown\n");
> > + else
> > + seq_printf(m, "cpu MHz\t\t: %u.%03u\n", freq / 1000, (freq % 1000));
> > }
> >
> > /* Cache size */
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 418236fef172..6f45684483c4 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -728,18 +728,18 @@ show_one(cpuinfo_transition_latency, cpuinfo.transition_latency);
> > show_one(scaling_min_freq, min);
> > show_one(scaling_max_freq, max);
> >
> > -__weak unsigned int arch_freq_get_on_cpu(int cpu)
> > +__weak int arch_freq_get_on_cpu(int cpu)
> > {
> > - return 0;
> > + return -EOPNOTSUPP;
> > }
> >
> > static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
> > {
> > ssize_t ret;
> > - unsigned int freq;
> > + int freq;
> >
> > freq = arch_freq_get_on_cpu(policy->cpu);
> > - if (freq)
> > + if (freq > 0)
>
> >= ?
>
> Since we can return error now, 0 should be considered a valid
> frequency value ?
Theoretically speaking - it should, though what would 0 actually
represent then ?
---
BR
Beata
>
> > ret = sysfs_emit(buf, "%u\n", freq);
> > else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
> > ret = sysfs_emit(buf, "%u\n", cpufreq_driver->get(policy->cpu));
> > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > index 7fe0981a7e46..02fd4746231d 100644
> > --- a/include/linux/cpufreq.h
> > +++ b/include/linux/cpufreq.h
> > @@ -1184,7 +1184,7 @@ static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_
> > }
> > #endif
> >
> > -extern unsigned int arch_freq_get_on_cpu(int cpu);
> > +extern int arch_freq_get_on_cpu(int cpu);
> >
> > #ifndef arch_set_freq_scale
> > static __always_inline
> > --
> > 2.25.1
>
> --
> viresh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 2/5] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2025-01-21 10:53 ` Viresh Kumar
@ 2025-01-21 15:17 ` Beata Michalska
2025-01-22 6:09 ` Viresh Kumar
0 siblings, 1 reply; 32+ messages in thread
From: Beata Michalska @ 2025-01-21 15:17 UTC (permalink / raw)
To: Viresh Kumar
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, sumitg, yang,
vanshikonda, lihuisong, zhanjie9, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H . Peter Anvin, Phil Auld, x86, linux-doc
On Tue, Jan 21, 2025 at 04:23:55PM +0530, Viresh Kumar wrote:
> On 21-01-25, 08:44, Beata Michalska wrote:
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 6f45684483c4..b2a8efa83c98 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -733,12 +733,20 @@ __weak int arch_freq_get_on_cpu(int cpu)
> > return -EOPNOTSUPP;
> > }
> >
> > static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
> > {
> > ssize_t ret;
> > int freq;
> >
> > - freq = arch_freq_get_on_cpu(policy->cpu);
> > + freq = IS_ENABLED(CONFIG_CPUFREQ_ARCH_CUR_FREQ)
> > + ? arch_freq_get_on_cpu(policy->cpu)
> > + : 0;
> > +
> > if (freq > 0)
> > ret = sysfs_emit(buf, "%u\n", freq);
> > else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
>
> Maybe this should be a separate commit ? And also I am not very happy
Initially it was supposed to be one, but then the rest of the series justifies
the changes so it made sense to send those in one go.
> with the new kconfig option. I don't want others to use it as we want
> to get rid of this for X86 too eventually. Making it a kconfig option
> allows anyone to enable it and then depend on it without us knowing..
>
> Rather just write it as "if (x86)", with a comment on what we plan to
> do with it in few release cycles.
Right, those changes are based on discussion in [1].
---
[1] https://lore.kernel.org/all/CAJZ5v0gCRKzaFrwkoBpLHQUxoP_+jAyhMiCkLQaBUpduk9yxtA@mail.gmail.com/
---
BR
Beata
>
> --
> viresh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error
2025-01-21 15:14 ` Beata Michalska
@ 2025-01-21 18:40 ` Vanshidhar Konda
2025-01-23 21:37 ` Beata Michalska
2025-01-22 6:12 ` Viresh Kumar
1 sibling, 1 reply; 32+ messages in thread
From: Vanshidhar Konda @ 2025-01-21 18:40 UTC (permalink / raw)
To: Beata Michalska
Cc: Viresh Kumar, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
sumitg, yang, lihuisong, zhanjie9
On Tue, Jan 21, 2025 at 04:14:32PM +0100, Beata Michalska wrote:
>On Tue, Jan 21, 2025 at 04:17:06PM +0530, Viresh Kumar wrote:
>> On 21-01-25, 08:44, Beata Michalska wrote:
>> > Allow arch_freq_get_on_cpu to return an error for cases when retrieving
>> > current CPU frequency is not possible, whether that being due to lack of
>> > required arch support or due to other circumstances when the current
>> > frequency cannot be determined at given point of time.
>> >
>> > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
>> > ---
>> > arch/x86/kernel/cpu/aperfmperf.c | 2 +-
>> > arch/x86/kernel/cpu/proc.c | 7 +++++--
>> > drivers/cpufreq/cpufreq.c | 8 ++++----
>> > include/linux/cpufreq.h | 2 +-
>> > 4 files changed, 11 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
>> > index f642de2ebdac..6cf31a1649c4 100644
>> > --- a/arch/x86/kernel/cpu/aperfmperf.c
>> > +++ b/arch/x86/kernel/cpu/aperfmperf.c
>> > @@ -498,7 +498,7 @@ void arch_scale_freq_tick(void)
>> > */
>> > #define MAX_SAMPLE_AGE ((unsigned long)HZ / 50)
>> >
>> > -unsigned int arch_freq_get_on_cpu(int cpu)
>> > +int arch_freq_get_on_cpu(int cpu)
>> > {
>> > struct aperfmperf *s = per_cpu_ptr(&cpu_samples, cpu);
>> > unsigned int seq, freq;
>> > diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
>> > index 41ed01f46bd9..d79f5845a463 100644
>> > --- a/arch/x86/kernel/cpu/proc.c
>> > +++ b/arch/x86/kernel/cpu/proc.c
>> > @@ -86,9 +86,12 @@ static int show_cpuinfo(struct seq_file *m, void *v)
>> > seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
>> >
>> > if (cpu_has(c, X86_FEATURE_TSC)) {
>> > - unsigned int freq = arch_freq_get_on_cpu(cpu);
>> > + int freq = arch_freq_get_on_cpu(cpu);
>> >
>> > - seq_printf(m, "cpu MHz\t\t: %u.%03u\n", freq / 1000, (freq % 1000));
>> > + if (freq <= 0)
>> > + seq_puts(m, "cpu MHz\t\t: Unknown\n");
>> > + else
>> > + seq_printf(m, "cpu MHz\t\t: %u.%03u\n", freq / 1000, (freq % 1000));
>> > }
>> >
>> > /* Cache size */
>> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>> > index 418236fef172..6f45684483c4 100644
>> > --- a/drivers/cpufreq/cpufreq.c
>> > +++ b/drivers/cpufreq/cpufreq.c
>> > @@ -728,18 +728,18 @@ show_one(cpuinfo_transition_latency, cpuinfo.transition_latency);
>> > show_one(scaling_min_freq, min);
>> > show_one(scaling_max_freq, max);
>> >
>> > -__weak unsigned int arch_freq_get_on_cpu(int cpu)
>> > +__weak int arch_freq_get_on_cpu(int cpu)
>> > {
>> > - return 0;
>> > + return -EOPNOTSUPP;
>> > }
>> >
>> > static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
>> > {
>> > ssize_t ret;
>> > - unsigned int freq;
>> > + int freq;
>> >
>> > freq = arch_freq_get_on_cpu(policy->cpu);
>> > - if (freq)
>> > + if (freq > 0)
>>
>> >= ?
>>
>> Since we can return error now, 0 should be considered a valid
>> frequency value ?
>Theoretically speaking - it should, though what would 0 actually
>represent then ?
I would think the value of 0 would be valid and should be interpreted in a
product/architecture specific manner? From silicon behavior, to me only negative
frequency values wouldn't make any sense.
In Patch 1 of this series we interpret 0 as "Unknown" on a x86 system though. So
for the sake of consistency should we consider 0 a valid value everywhere?
Regards,
Vanshi
>
>---
>BR
>Beata
>>
>> > ret = sysfs_emit(buf, "%u\n", freq);
>> > else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
>> > ret = sysfs_emit(buf, "%u\n", cpufreq_driver->get(policy->cpu));
>> > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
>> > index 7fe0981a7e46..02fd4746231d 100644
>> > --- a/include/linux/cpufreq.h
>> > +++ b/include/linux/cpufreq.h
>> > @@ -1184,7 +1184,7 @@ static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_
>> > }
>> > #endif
>> >
>> > -extern unsigned int arch_freq_get_on_cpu(int cpu);
>> > +extern int arch_freq_get_on_cpu(int cpu);
>> >
>> > #ifndef arch_set_freq_scale
>> > static __always_inline
>> > --
>> > 2.25.1
>>
>> --
>> viresh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 2/5] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2025-01-21 15:17 ` Beata Michalska
@ 2025-01-22 6:09 ` Viresh Kumar
2025-01-23 21:47 ` Beata Michalska
0 siblings, 1 reply; 32+ messages in thread
From: Viresh Kumar @ 2025-01-22 6:09 UTC (permalink / raw)
To: Beata Michalska
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, sumitg, yang,
vanshikonda, lihuisong, zhanjie9, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H . Peter Anvin, Phil Auld, x86, linux-doc
On 21-01-25, 16:17, Beata Michalska wrote:
> On Tue, Jan 21, 2025 at 04:23:55PM +0530, Viresh Kumar wrote:
> > On 21-01-25, 08:44, Beata Michalska wrote:
> > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > > index 6f45684483c4..b2a8efa83c98 100644
> > > --- a/drivers/cpufreq/cpufreq.c
> > > +++ b/drivers/cpufreq/cpufreq.c
> > > @@ -733,12 +733,20 @@ __weak int arch_freq_get_on_cpu(int cpu)
> > > return -EOPNOTSUPP;
> > > }
> > >
> > > static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
> > > {
> > > ssize_t ret;
> > > int freq;
> > >
> > > - freq = arch_freq_get_on_cpu(policy->cpu);
> > > + freq = IS_ENABLED(CONFIG_CPUFREQ_ARCH_CUR_FREQ)
> > > + ? arch_freq_get_on_cpu(policy->cpu)
> > > + : 0;
> > > +
> > > if (freq > 0)
> > > ret = sysfs_emit(buf, "%u\n", freq);
> > > else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
> >
> > Maybe this should be a separate commit ? And also I am not very happy
> Initially it was supposed to be one, but then the rest of the series justifies
> the changes so it made sense to send those in one go.
> > with the new kconfig option. I don't want others to use it as we want
> > to get rid of this for X86 too eventually. Making it a kconfig option
> > allows anyone to enable it and then depend on it without us knowing..
> >
> > Rather just write it as "if (x86)", with a comment on what we plan to
> > do with it in few release cycles.
> Right, those changes are based on discussion in [1].
Ahh I see.. What about making it depend on X86 for now, as we really
don't want new users to use it ?
--
viresh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error
2025-01-21 15:14 ` Beata Michalska
2025-01-21 18:40 ` Vanshidhar Konda
@ 2025-01-22 6:12 ` Viresh Kumar
2025-01-23 21:45 ` Beata Michalska
1 sibling, 1 reply; 32+ messages in thread
From: Viresh Kumar @ 2025-01-22 6:12 UTC (permalink / raw)
To: Beata Michalska
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, sumitg, yang,
vanshikonda, lihuisong, zhanjie9
On 21-01-25, 16:14, Beata Michalska wrote:
> Theoretically speaking - it should, though what would 0 actually
> represent then ?
0 won't be a failure, that's clear, since errors are represented
differently now. I am not sure what 0 frequency would mean and it can
be left as an architecture specific value, which is a corner case I am
not sure will ever occur.
--
viresh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error
2025-01-21 18:40 ` Vanshidhar Konda
@ 2025-01-23 21:37 ` Beata Michalska
0 siblings, 0 replies; 32+ messages in thread
From: Beata Michalska @ 2025-01-23 21:37 UTC (permalink / raw)
To: Vanshidhar Konda
Cc: Viresh Kumar, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
sumitg, yang, lihuisong, zhanjie9
On Tue, Jan 21, 2025 at 10:40:55AM -0800, Vanshidhar Konda wrote:
> On Tue, Jan 21, 2025 at 04:14:32PM +0100, Beata Michalska wrote:
> > On Tue, Jan 21, 2025 at 04:17:06PM +0530, Viresh Kumar wrote:
> > > On 21-01-25, 08:44, Beata Michalska wrote:
> > > > Allow arch_freq_get_on_cpu to return an error for cases when retrieving
> > > > current CPU frequency is not possible, whether that being due to lack of
> > > > required arch support or due to other circumstances when the current
> > > > frequency cannot be determined at given point of time.
> > > >
> > > > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> > > > ---
> > > > arch/x86/kernel/cpu/aperfmperf.c | 2 +-
> > > > arch/x86/kernel/cpu/proc.c | 7 +++++--
> > > > drivers/cpufreq/cpufreq.c | 8 ++++----
> > > > include/linux/cpufreq.h | 2 +-
> > > > 4 files changed, 11 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
> > > > index f642de2ebdac..6cf31a1649c4 100644
> > > > --- a/arch/x86/kernel/cpu/aperfmperf.c
> > > > +++ b/arch/x86/kernel/cpu/aperfmperf.c
> > > > @@ -498,7 +498,7 @@ void arch_scale_freq_tick(void)
> > > > */
> > > > #define MAX_SAMPLE_AGE ((unsigned long)HZ / 50)
> > > >
> > > > -unsigned int arch_freq_get_on_cpu(int cpu)
> > > > +int arch_freq_get_on_cpu(int cpu)
> > > > {
> > > > struct aperfmperf *s = per_cpu_ptr(&cpu_samples, cpu);
> > > > unsigned int seq, freq;
> > > > diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
> > > > index 41ed01f46bd9..d79f5845a463 100644
> > > > --- a/arch/x86/kernel/cpu/proc.c
> > > > +++ b/arch/x86/kernel/cpu/proc.c
> > > > @@ -86,9 +86,12 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> > > > seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
> > > >
> > > > if (cpu_has(c, X86_FEATURE_TSC)) {
> > > > - unsigned int freq = arch_freq_get_on_cpu(cpu);
> > > > + int freq = arch_freq_get_on_cpu(cpu);
> > > >
> > > > - seq_printf(m, "cpu MHz\t\t: %u.%03u\n", freq / 1000, (freq % 1000));
> > > > + if (freq <= 0)
> > > > + seq_puts(m, "cpu MHz\t\t: Unknown\n");
> > > > + else
> > > > + seq_printf(m, "cpu MHz\t\t: %u.%03u\n", freq / 1000, (freq % 1000));
> > > > }
> > > >
> > > > /* Cache size */
> > > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > > > index 418236fef172..6f45684483c4 100644
> > > > --- a/drivers/cpufreq/cpufreq.c
> > > > +++ b/drivers/cpufreq/cpufreq.c
> > > > @@ -728,18 +728,18 @@ show_one(cpuinfo_transition_latency, cpuinfo.transition_latency);
> > > > show_one(scaling_min_freq, min);
> > > > show_one(scaling_max_freq, max);
> > > >
> > > > -__weak unsigned int arch_freq_get_on_cpu(int cpu)
> > > > +__weak int arch_freq_get_on_cpu(int cpu)
> > > > {
> > > > - return 0;
> > > > + return -EOPNOTSUPP;
> > > > }
> > > >
> > > > static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
> > > > {
> > > > ssize_t ret;
> > > > - unsigned int freq;
> > > > + int freq;
> > > >
> > > > freq = arch_freq_get_on_cpu(policy->cpu);
> > > > - if (freq)
> > > > + if (freq > 0)
> > >
> > > >= ?
> > >
> > > Since we can return error now, 0 should be considered a valid
> > > frequency value ?
> > Theoretically speaking - it should, though what would 0 actually
> > represent then ?
> I would think the value of 0 would be valid and should be interpreted in a
> product/architecture specific manner? From silicon behavior, to me only negative
> frequency values wouldn't make any sense.
Still not convinced as of when '0' could represent a valid frequency and what
that one would actually imply but I do not have that strong of an opinion here.
>
> In Patch 1 of this series we interpret 0 as "Unknown" on a x86 system though. So
> for the sake of consistency should we consider 0 a valid value everywhere?
Yes, but on the other hand, showing 'Unknown' could be skipped entirely here as
the arch_freq_get_on_cpu for x86 is always providing a 'valid' frequency.
The change here was just to make things somewhat sane.
Note that the new attribute introduced in the following patch will utilise
potential error values from arch_freq_get_on_cpu though.
---
BR
Beata
>
> Regards,
> Vanshi
>
> >
> > ---
> > BR
> > Beata
> > >
> > > > ret = sysfs_emit(buf, "%u\n", freq);
> > > > else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
> > > > ret = sysfs_emit(buf, "%u\n", cpufreq_driver->get(policy->cpu));
> > > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > > > index 7fe0981a7e46..02fd4746231d 100644
> > > > --- a/include/linux/cpufreq.h
> > > > +++ b/include/linux/cpufreq.h
> > > > @@ -1184,7 +1184,7 @@ static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_
> > > > }
> > > > #endif
> > > >
> > > > -extern unsigned int arch_freq_get_on_cpu(int cpu);
> > > > +extern int arch_freq_get_on_cpu(int cpu);
> > > >
> > > > #ifndef arch_set_freq_scale
> > > > static __always_inline
> > > > --
> > > > 2.25.1
> > >
> > > --
> > > viresh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error
2025-01-22 6:12 ` Viresh Kumar
@ 2025-01-23 21:45 ` Beata Michalska
2025-01-24 3:33 ` Viresh Kumar
0 siblings, 1 reply; 32+ messages in thread
From: Beata Michalska @ 2025-01-23 21:45 UTC (permalink / raw)
To: Viresh Kumar
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, sumitg, yang,
vanshikonda, lihuisong, zhanjie9
On Wed, Jan 22, 2025 at 11:42:50AM +0530, Viresh Kumar wrote:
> On 21-01-25, 16:14, Beata Michalska wrote:
> > Theoretically speaking - it should, though what would 0 actually
> > represent then ?
>
> 0 won't be a failure, that's clear, since errors are represented
> differently now. I am not sure what 0 frequency would mean and it can
> be left as an architecture specific value, which is a corner case I am
> not sure will ever occur.
That would mean we are opting for presenting '0' value (whatever that means)
instead of trying alternative ways of getting 'current' frequency ?
This is still the scaling_cur_freq.
---
BR
Beata
>
> --
> viresh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 2/5] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2025-01-22 6:09 ` Viresh Kumar
@ 2025-01-23 21:47 ` Beata Michalska
2025-01-24 3:27 ` Viresh Kumar
0 siblings, 1 reply; 32+ messages in thread
From: Beata Michalska @ 2025-01-23 21:47 UTC (permalink / raw)
To: Viresh Kumar
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, sumitg, yang,
vanshikonda, lihuisong, zhanjie9, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H . Peter Anvin, Phil Auld, x86, linux-doc
On Wed, Jan 22, 2025 at 11:39:02AM +0530, Viresh Kumar wrote:
> On 21-01-25, 16:17, Beata Michalska wrote:
> > On Tue, Jan 21, 2025 at 04:23:55PM +0530, Viresh Kumar wrote:
> > > On 21-01-25, 08:44, Beata Michalska wrote:
> > > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > > > index 6f45684483c4..b2a8efa83c98 100644
> > > > --- a/drivers/cpufreq/cpufreq.c
> > > > +++ b/drivers/cpufreq/cpufreq.c
> > > > @@ -733,12 +733,20 @@ __weak int arch_freq_get_on_cpu(int cpu)
> > > > return -EOPNOTSUPP;
> > > > }
> > > >
> > > > static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
> > > > {
> > > > ssize_t ret;
> > > > int freq;
> > > >
> > > > - freq = arch_freq_get_on_cpu(policy->cpu);
> > > > + freq = IS_ENABLED(CONFIG_CPUFREQ_ARCH_CUR_FREQ)
> > > > + ? arch_freq_get_on_cpu(policy->cpu)
> > > > + : 0;
> > > > +
> > > > if (freq > 0)
> > > > ret = sysfs_emit(buf, "%u\n", freq);
> > > > else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
> > >
> > > Maybe this should be a separate commit ? And also I am not very happy
> > Initially it was supposed to be one, but then the rest of the series justifies
> > the changes so it made sense to send those in one go.
> > > with the new kconfig option. I don't want others to use it as we want
> > > to get rid of this for X86 too eventually. Making it a kconfig option
> > > allows anyone to enable it and then depend on it without us knowing..
> > >
> > > Rather just write it as "if (x86)", with a comment on what we plan to
> > > do with it in few release cycles.
> > Right, those changes are based on discussion in [1].
>
> Ahh I see.. What about making it depend on X86 for now, as we really
> don't want new users to use it ?
Do you mean the new config option? If so, it is in Kconfig.x86 already.
Unless you have smth else in mind ?
---
BR
Beata
>
> --
> viresh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 2/5] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2025-01-23 21:47 ` Beata Michalska
@ 2025-01-24 3:27 ` Viresh Kumar
0 siblings, 0 replies; 32+ messages in thread
From: Viresh Kumar @ 2025-01-24 3:27 UTC (permalink / raw)
To: Beata Michalska
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, sumitg, yang,
vanshikonda, lihuisong, zhanjie9, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H . Peter Anvin, Phil Auld, x86, linux-doc
On 23-01-25, 22:47, Beata Michalska wrote:
> Do you mean the new config option? If so, it is in Kconfig.x86 already.
> Unless you have smth else in mind ?
Its good then :)
--
viresh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error
2025-01-23 21:45 ` Beata Michalska
@ 2025-01-24 3:33 ` Viresh Kumar
2025-01-28 8:09 ` Beata Michalska
0 siblings, 1 reply; 32+ messages in thread
From: Viresh Kumar @ 2025-01-24 3:33 UTC (permalink / raw)
To: Beata Michalska
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, sumitg, yang,
vanshikonda, lihuisong, zhanjie9
On 23-01-25, 22:45, Beata Michalska wrote:
> That would mean we are opting for presenting '0' value (whatever that means)
> instead of trying alternative ways of getting 'current' frequency ?
> This is still the scaling_cur_freq.
A return value of 0 should typically mean something went wrong
somewhere and didn't return the right value to us.
- For the print message, I think we should just print the value
instead of UNKNOWN. Let the user / developer decide what to do with
it.
- As for trying other mechanism to find the frequency now, maybe you
are right and looking for an alternate way is the right way to go.
And that would be consistent with existing behavior too.
--
viresh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error
2025-01-21 8:44 ` [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error Beata Michalska
2025-01-21 10:47 ` Viresh Kumar
2025-01-21 10:47 ` Prasanna Kumar T S M
@ 2025-01-24 4:15 ` Prasanna Kumar T S M
2 siblings, 0 replies; 32+ messages in thread
From: Prasanna Kumar T S M @ 2025-01-24 4:15 UTC (permalink / raw)
To: Beata Michalska, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9
On 21-01-2025 14:14, Beata Michalska wrote:
> Allow arch_freq_get_on_cpu to return an error for cases when retrieving
> current CPU frequency is not possible, whether that being due to lack of
> required arch support or due to other circumstances when the current
> frequency cannot be determined at given point of time.
>
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> ---
> arch/x86/kernel/cpu/aperfmperf.c | 2 +-
> arch/x86/kernel/cpu/proc.c | 7 +++++--
> drivers/cpufreq/cpufreq.c | 8 ++++----
> include/linux/cpufreq.h | 2 +-
> 4 files changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
> index f642de2ebdac..6cf31a1649c4 100644
> --- a/arch/x86/kernel/cpu/aperfmperf.c
> +++ b/arch/x86/kernel/cpu/aperfmperf.c
> @@ -498,7 +498,7 @@ void arch_scale_freq_tick(void)
> */
> #define MAX_SAMPLE_AGE ((unsigned long)HZ / 50)
>
> -unsigned int arch_freq_get_on_cpu(int cpu)
> +int arch_freq_get_on_cpu(int cpu)
> {
> struct aperfmperf *s = per_cpu_ptr(&cpu_samples, cpu);
> unsigned int seq, freq;
> diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
> index 41ed01f46bd9..d79f5845a463 100644
> --- a/arch/x86/kernel/cpu/proc.c
> +++ b/arch/x86/kernel/cpu/proc.c
> @@ -86,9 +86,12 @@ static int show_cpuinfo(struct seq_file *m, void *v)
> seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
>
> if (cpu_has(c, X86_FEATURE_TSC)) {
> - unsigned int freq = arch_freq_get_on_cpu(cpu);
> + int freq = arch_freq_get_on_cpu(cpu);
>
> - seq_printf(m, "cpu MHz\t\t: %u.%03u\n", freq / 1000, (freq % 1000));
> + if (freq <= 0)
> + seq_puts(m, "cpu MHz\t\t: Unknown\n");
> + else
> + seq_printf(m, "cpu MHz\t\t: %u.%03u\n", freq / 1000, (freq % 1000));
> }
>
> /* Cache size */
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 418236fef172..6f45684483c4 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -728,18 +728,18 @@ show_one(cpuinfo_transition_latency, cpuinfo.transition_latency);
> show_one(scaling_min_freq, min);
> show_one(scaling_max_freq, max);
>
> -__weak unsigned int arch_freq_get_on_cpu(int cpu)
> +__weak int arch_freq_get_on_cpu(int cpu)
> {
> - return 0;
> + return -EOPNOTSUPP;
> }
>
> static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
> {
> ssize_t ret;
> - unsigned int freq;
> + int freq;
>
> freq = arch_freq_get_on_cpu(policy->cpu);
> - if (freq)
> + if (freq > 0)
> ret = sysfs_emit(buf, "%u\n", freq);
> else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
> ret = sysfs_emit(buf, "%u\n", cpufreq_driver->get(policy->cpu));
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 7fe0981a7e46..02fd4746231d 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -1184,7 +1184,7 @@ static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_
> }
> #endif
>
> -extern unsigned int arch_freq_get_on_cpu(int cpu);
> +extern int arch_freq_get_on_cpu(int cpu);
>
> #ifndef arch_set_freq_scale
> static __always_inline
Looks good to me.
Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 4/5] arm64: Provide an AMU-based version of arch_freq_get_on_cpu
2025-01-21 8:44 ` [PATCH v9 4/5] arm64: Provide an AMU-based version of arch_freq_get_on_cpu Beata Michalska
@ 2025-01-24 4:43 ` Prasanna Kumar T S M
2025-01-28 8:16 ` Beata Michalska
2025-01-29 11:15 ` Sumit Gupta
1 sibling, 1 reply; 32+ messages in thread
From: Prasanna Kumar T S M @ 2025-01-24 4:43 UTC (permalink / raw)
To: Beata Michalska, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9
On 21-01-2025 14:14, Beata Michalska wrote:
> With the Frequency Invariance Engine (FIE) being already wired up with
> sched tick and making use of relevant (core counter and constant
> counter) AMU counters, getting the average frequency for a given CPU,
> can be achieved by utilizing the frequency scale factor which reflects
> an average CPU frequency for the last tick period length.
>
> The solution is partially based on APERF/MPERF implementation of
> arch_freq_get_on_cpu.
>
> Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> ---
> arch/arm64/kernel/topology.c | 109 +++++++++++++++++++++++++++++++----
> 1 file changed, 99 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index cb180684d10d..5f5738b174c7 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -17,6 +17,7 @@
> #include <linux/cpufreq.h>
> #include <linux/init.h>
> #include <linux/percpu.h>
> +#include <linux/sched/isolation.h>
>
> #include <asm/cpu.h>
> #include <asm/cputype.h>
> @@ -88,18 +89,28 @@ int __init parse_acpi_topology(void)
> * initialized.
> */
> static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = 1UL << (2 * SCHED_CAPACITY_SHIFT);
> -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> static cpumask_var_t amu_fie_cpus;
>
> +struct amu_cntr_sample {
> + u64 arch_const_cycles_prev;
> + u64 arch_core_cycles_prev;
> + unsigned long last_scale_update;
> +};
> +
> +static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_cntr_sample, cpu_amu_samples);
> +
> void update_freq_counters_refs(void)
> {
> - this_cpu_write(arch_core_cycles_prev, read_corecnt());
> - this_cpu_write(arch_const_cycles_prev, read_constcnt());
> + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> +
> + amu_sample->arch_core_cycles_prev = read_corecnt();
> + amu_sample->arch_const_cycles_prev = read_constcnt();
> }
>
> static inline bool freq_counters_valid(int cpu)
> {
> + struct amu_cntr_sample *amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> +
> if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
> return false;
>
> @@ -108,8 +119,8 @@ static inline bool freq_counters_valid(int cpu)
> return false;
> }
>
> - if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
> - !per_cpu(arch_core_cycles_prev, cpu))) {
> + if (unlikely(!amu_sample->arch_const_cycles_prev ||
> + !amu_sample->arch_core_cycles_prev)) {
> pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
> return false;
> }
> @@ -152,17 +163,22 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
>
> static void amu_scale_freq_tick(void)
> {
> + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> u64 prev_core_cnt, prev_const_cnt;
> u64 core_cnt, const_cnt, scale;
>
> - prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
> - prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
> + prev_const_cnt = amu_sample->arch_const_cycles_prev;
> + prev_core_cnt = amu_sample->arch_core_cycles_prev;
>
> update_freq_counters_refs();
>
> - const_cnt = this_cpu_read(arch_const_cycles_prev);
> - core_cnt = this_cpu_read(arch_core_cycles_prev);
> + const_cnt = amu_sample->arch_const_cycles_prev;
> + core_cnt = amu_sample->arch_core_cycles_prev;
>
> + /*
> + * This should not happen unless the AMUs have been reset and the
> + * counter values have not been restored - unlikely
> + */
> if (unlikely(core_cnt <= prev_core_cnt ||
> const_cnt <= prev_const_cnt))
> return;
> @@ -182,6 +198,8 @@ static void amu_scale_freq_tick(void)
>
> scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
> this_cpu_write(arch_freq_scale, (unsigned long)scale);
> +
> + amu_sample->last_scale_update = jiffies;
> }
>
> static struct scale_freq_data amu_sfd = {
> @@ -189,6 +207,77 @@ static struct scale_freq_data amu_sfd = {
> .set_freq_scale = amu_scale_freq_tick,
> };
>
> +static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
> +{
> + return cpumask_available(amu_fie_cpus) &&
> + cpumask_test_cpu(cpu, amu_fie_cpus);
> +}
> +
> +#define AMU_SAMPLE_EXP_MS 20
> +
> +int arch_freq_get_on_cpu(int cpu)
> +{
> + struct amu_cntr_sample *amu_sample;
> + unsigned int start_cpu = cpu;
> + unsigned long last_update;
> + unsigned int freq = 0;
> + u64 scale;
> +
> + if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
> + return -EOPNOTSUPP;
> +
> +retry:
> + amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> +
> + last_update = amu_sample->last_scale_update;
> +
> + /*
> + * For those CPUs that are in full dynticks mode, or those that have
> + * not seen tick for a while, try an alternative source for the counters
> + * (and thus freq scale), if available, for given policy: this boils
> + * down to identifying an active cpu within the same freq domain, if any.
> + */
> + if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
> + time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
> + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> + int ref_cpu = cpu;
> +
> + if (!policy)
> + return -EINVAL;
> +
> + if (!cpumask_intersects(policy->related_cpus,
> + housekeeping_cpumask(HK_TYPE_TICK))) {
> + cpufreq_cpu_put(policy);
> + return -EOPNOTSUPP;
> + }
> +
> +
> + do {
> + ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
> + start_cpu, false);
> +
> + } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
> +
> + cpufreq_cpu_put(policy);
> +
> + if (ref_cpu >= nr_cpu_ids)
> + /* No alternative to pull info from */
> + return -EAGAIN;
> +
> + cpu = ref_cpu;
> + goto retry;
If you are going to spin a new revision, can you use while loop instead
of using goto for looping? This will help improve the readability.
> + }
> + /*
> + * Reversed computation to the one used to determine
> + * the arch_freq_scale value
> + * (see amu_scale_freq_tick for details)
> + */
> + scale = arch_scale_freq_capacity(cpu);
> + freq = scale * arch_scale_freq_ref(cpu);
> + freq >>= SCHED_CAPACITY_SHIFT;
> + return freq;
> +}
> +
> static void amu_fie_setup(const struct cpumask *cpus)
> {
> int cpu;
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 5/5] arm64: Update AMU-based freq scale factor on entering idle
2025-01-21 8:44 ` [PATCH v9 5/5] arm64: Update AMU-based freq scale factor on entering idle Beata Michalska
@ 2025-01-24 4:45 ` Prasanna Kumar T S M
2025-01-29 11:13 ` Sumit Gupta
1 sibling, 0 replies; 32+ messages in thread
From: Prasanna Kumar T S M @ 2025-01-24 4:45 UTC (permalink / raw)
To: Beata Michalska, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9
On 21-01-2025 14:14, Beata Michalska wrote:
> Now that the frequency scale factor has been activated for retrieving
> current frequency on a given CPU, trigger its update upon entering
> idle. This will, to an extent, allow querying last known frequency
> in a non-invasive way. It will also improve the frequency scale factor
> accuracy when a CPU entering idle did not receive a tick for a while.
> As a consequence, for idle cores, the reported frequency will be the
> last one observed before entering the idle state.
>
> Suggested-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> ---
> arch/arm64/kernel/topology.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 5f5738b174c7..6c43aafac77c 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -213,6 +213,19 @@ static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
> cpumask_test_cpu(cpu, amu_fie_cpus);
> }
>
> +void arch_cpu_idle_enter(void)
> +{
> + unsigned int cpu = smp_processor_id();
> +
> + if (!amu_fie_cpu_supported(cpu))
> + return;
> +
> + /* Kick in AMU update but only if one has not happened already */
> + if (housekeeping_cpu(cpu, HK_TYPE_TICK) &&
> + time_is_before_jiffies(per_cpu(cpu_amu_samples.last_scale_update, cpu)))
> + amu_scale_freq_tick();
> +}
> +
> #define AMU_SAMPLE_EXP_MS 20
>
> int arch_freq_get_on_cpu(int cpu)
Looks good to me.
Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 3/5] arm64: amu: Delay allocating cpumask for AMU FIE support
2025-01-21 8:44 ` [PATCH v9 3/5] arm64: amu: Delay allocating cpumask for AMU FIE support Beata Michalska
@ 2025-01-24 4:48 ` Prasanna Kumar T S M
2025-01-29 11:17 ` Sumit Gupta
1 sibling, 0 replies; 32+ messages in thread
From: Prasanna Kumar T S M @ 2025-01-24 4:48 UTC (permalink / raw)
To: Beata Michalska, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9
On 21-01-2025 14:14, Beata Michalska wrote:
> For the time being, the amu_fie_cpus cpumask is being exclusively used
> by the AMU-related internals of FIE support and is guaranteed to be
> valid on every access currently made. Still the mask is not being
> invalidated on one of the error handling code paths, which leaves
> a soft spot with theoretical risk of UAF for CPUMASK_OFFSTACK cases.
> To make things sound, delay allocating said cpumask
> (for CPUMASK_OFFSTACK) avoiding otherwise nasty sanitising case failing
> to register the cpufreq policy notifications.
>
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> ---
> arch/arm64/kernel/topology.c | 22 ++++++++++------------
> 1 file changed, 10 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 1a2c72f3e7f8..cb180684d10d 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -194,12 +194,19 @@ static void amu_fie_setup(const struct cpumask *cpus)
> int cpu;
>
> /* We are already set since the last insmod of cpufreq driver */
> - if (unlikely(cpumask_subset(cpus, amu_fie_cpus)))
> + if (cpumask_available(amu_fie_cpus) &&
> + unlikely(cpumask_subset(cpus, amu_fie_cpus)))
> return;
>
> - for_each_cpu(cpu, cpus) {
> + for_each_cpu(cpu, cpus)
> if (!freq_counters_valid(cpu))
> return;
> +
> + if (!cpumask_available(amu_fie_cpus) &&
> + !zalloc_cpumask_var(&amu_fie_cpus, GFP_KERNEL)) {
> + WARN_ONCE(1, "Failed to allocate FIE cpumask for CPUs[%*pbl]\n",
> + cpumask_pr_args(cpus));
> + return;
> }
>
> cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus);
> @@ -237,17 +244,8 @@ static struct notifier_block init_amu_fie_notifier = {
>
> static int __init init_amu_fie(void)
> {
> - int ret;
> -
> - if (!zalloc_cpumask_var(&amu_fie_cpus, GFP_KERNEL))
> - return -ENOMEM;
> -
> - ret = cpufreq_register_notifier(&init_amu_fie_notifier,
> + return cpufreq_register_notifier(&init_amu_fie_notifier,
> CPUFREQ_POLICY_NOTIFIER);
> - if (ret)
> - free_cpumask_var(amu_fie_cpus);
> -
> - return ret;
> }
> core_initcall(init_amu_fie);
>
This patch appears useful, irrespective of the feature introduced by
this series. Isn't it? Can you please carve out this into an individual
patch?
The change looks good to me.
Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>.
Please feel free to add my reviewed-by tag even if you are carving this
patch out.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error
2025-01-24 3:33 ` Viresh Kumar
@ 2025-01-28 8:09 ` Beata Michalska
2025-01-28 8:18 ` Viresh Kumar
0 siblings, 1 reply; 32+ messages in thread
From: Beata Michalska @ 2025-01-28 8:09 UTC (permalink / raw)
To: Viresh Kumar
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, sumitg, yang,
vanshikonda, lihuisong, zhanjie9
On Fri, Jan 24, 2025 at 09:03:33AM +0530, Viresh Kumar wrote:
> On 23-01-25, 22:45, Beata Michalska wrote:
> > That would mean we are opting for presenting '0' value (whatever that means)
> > instead of trying alternative ways of getting 'current' frequency ?
> > This is still the scaling_cur_freq.
>
> A return value of 0 should typically mean something went wrong
> somewhere and didn't return the right value to us.
If smth goes wrong, an error should be returned, shoulnd't it?
>
> - For the print message, I think we should just print the value
> instead of UNKNOWN. Let the user / developer decide what to do with
> it.
Are you refering to the x86 show_cpuinfo behaviour altered by this patch ?
>
> - As for trying other mechanism to find the frequency now, maybe you
> are right and looking for an alternate way is the right way to go.
> And that would be consistent with existing behavior too.
>
That would mean that changes to show_scaling_cur_freq are fine ?
Just trying to clarify things.
Thank you.
---
BR
Beata
> --
> viresh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 4/5] arm64: Provide an AMU-based version of arch_freq_get_on_cpu
2025-01-24 4:43 ` Prasanna Kumar T S M
@ 2025-01-28 8:16 ` Beata Michalska
2025-01-28 8:52 ` Prasanna Kumar T S M
0 siblings, 1 reply; 32+ messages in thread
From: Beata Michalska @ 2025-01-28 8:16 UTC (permalink / raw)
To: Prasanna Kumar T S M
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, viresh.kumar, sumitg,
yang, vanshikonda, lihuisong, zhanjie9
On Fri, Jan 24, 2025 at 10:13:30AM +0530, Prasanna Kumar T S M wrote:
>
> On 21-01-2025 14:14, Beata Michalska wrote:
> > With the Frequency Invariance Engine (FIE) being already wired up with
> > sched tick and making use of relevant (core counter and constant
> > counter) AMU counters, getting the average frequency for a given CPU,
> > can be achieved by utilizing the frequency scale factor which reflects
> > an average CPU frequency for the last tick period length.
> >
> > The solution is partially based on APERF/MPERF implementation of
> > arch_freq_get_on_cpu.
> >
> > Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
> > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> > ---
> > arch/arm64/kernel/topology.c | 109 +++++++++++++++++++++++++++++++----
> > 1 file changed, 99 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > index cb180684d10d..5f5738b174c7 100644
> > --- a/arch/arm64/kernel/topology.c
> > +++ b/arch/arm64/kernel/topology.c
> > @@ -17,6 +17,7 @@
> > #include <linux/cpufreq.h>
> > #include <linux/init.h>
> > #include <linux/percpu.h>
> > +#include <linux/sched/isolation.h>
> > #include <asm/cpu.h>
> > #include <asm/cputype.h>
> > @@ -88,18 +89,28 @@ int __init parse_acpi_topology(void)
> > * initialized.
> > */
> > static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = 1UL << (2 * SCHED_CAPACITY_SHIFT);
> > -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> > -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> > static cpumask_var_t amu_fie_cpus;
> > +struct amu_cntr_sample {
> > + u64 arch_const_cycles_prev;
> > + u64 arch_core_cycles_prev;
> > + unsigned long last_scale_update;
> > +};
> > +
> > +static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_cntr_sample, cpu_amu_samples);
> > +
> > void update_freq_counters_refs(void)
> > {
> > - this_cpu_write(arch_core_cycles_prev, read_corecnt());
> > - this_cpu_write(arch_const_cycles_prev, read_constcnt());
> > + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> > +
> > + amu_sample->arch_core_cycles_prev = read_corecnt();
> > + amu_sample->arch_const_cycles_prev = read_constcnt();
> > }
> > static inline bool freq_counters_valid(int cpu)
> > {
> > + struct amu_cntr_sample *amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> > +
> > if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
> > return false;
> > @@ -108,8 +119,8 @@ static inline bool freq_counters_valid(int cpu)
> > return false;
> > }
> > - if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
> > - !per_cpu(arch_core_cycles_prev, cpu))) {
> > + if (unlikely(!amu_sample->arch_const_cycles_prev ||
> > + !amu_sample->arch_core_cycles_prev)) {
> > pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
> > return false;
> > }
> > @@ -152,17 +163,22 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
> > static void amu_scale_freq_tick(void)
> > {
> > + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> > u64 prev_core_cnt, prev_const_cnt;
> > u64 core_cnt, const_cnt, scale;
> > - prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
> > - prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
> > + prev_const_cnt = amu_sample->arch_const_cycles_prev;
> > + prev_core_cnt = amu_sample->arch_core_cycles_prev;
> > update_freq_counters_refs();
> > - const_cnt = this_cpu_read(arch_const_cycles_prev);
> > - core_cnt = this_cpu_read(arch_core_cycles_prev);
> > + const_cnt = amu_sample->arch_const_cycles_prev;
> > + core_cnt = amu_sample->arch_core_cycles_prev;
> > + /*
> > + * This should not happen unless the AMUs have been reset and the
> > + * counter values have not been restored - unlikely
> > + */
> > if (unlikely(core_cnt <= prev_core_cnt ||
> > const_cnt <= prev_const_cnt))
> > return;
> > @@ -182,6 +198,8 @@ static void amu_scale_freq_tick(void)
> > scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
> > this_cpu_write(arch_freq_scale, (unsigned long)scale);
> > +
> > + amu_sample->last_scale_update = jiffies;
> > }
> > static struct scale_freq_data amu_sfd = {
> > @@ -189,6 +207,77 @@ static struct scale_freq_data amu_sfd = {
> > .set_freq_scale = amu_scale_freq_tick,
> > };
> > +static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
> > +{
> > + return cpumask_available(amu_fie_cpus) &&
> > + cpumask_test_cpu(cpu, amu_fie_cpus);
> > +}
> > +
> > +#define AMU_SAMPLE_EXP_MS 20
> > +
> > +int arch_freq_get_on_cpu(int cpu)
> > +{
> > + struct amu_cntr_sample *amu_sample;
> > + unsigned int start_cpu = cpu;
> > + unsigned long last_update;
> > + unsigned int freq = 0;
> > + u64 scale;
> > +
> > + if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
> > + return -EOPNOTSUPP;
> > +
> > +retry:
> > + amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> > +
> > + last_update = amu_sample->last_scale_update;
> > +
> > + /*
> > + * For those CPUs that are in full dynticks mode, or those that have
> > + * not seen tick for a while, try an alternative source for the counters
> > + * (and thus freq scale), if available, for given policy: this boils
> > + * down to identifying an active cpu within the same freq domain, if any.
> > + */
> > + if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
> > + time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
> > + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> > + int ref_cpu = cpu;
> > +
> > + if (!policy)
> > + return -EINVAL;
> > +
> > + if (!cpumask_intersects(policy->related_cpus,
> > + housekeeping_cpumask(HK_TYPE_TICK))) {
> > + cpufreq_cpu_put(policy);
> > + return -EOPNOTSUPP;
> > + }
> > +
> > +
> > + do {
> > + ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
> > + start_cpu, false);
> > +
> > + } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
> > +
> > + cpufreq_cpu_put(policy);
> > +
> > + if (ref_cpu >= nr_cpu_ids)
> > + /* No alternative to pull info from */
> > + return -EAGAIN;
> > +
> > + cpu = ref_cpu;
> > + goto retry;
>
> If you are going to spin a new revision, can you use while loop instead of
> using goto for looping? This will help improve the readability.
Can do, I guess, if you believe it will be more readable that way - me myself
slightly hesitating about that.
---
BR
Beata
>
> > + }
> > + /*
> > + * Reversed computation to the one used to determine
> > + * the arch_freq_scale value
> > + * (see amu_scale_freq_tick for details)
> > + */
> > + scale = arch_scale_freq_capacity(cpu);
> > + freq = scale * arch_scale_freq_ref(cpu);
> > + freq >>= SCHED_CAPACITY_SHIFT;
> > + return freq;
> > +}
> > +
> > static void amu_fie_setup(const struct cpumask *cpus)
> > {
> > int cpu;
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error
2025-01-28 8:09 ` Beata Michalska
@ 2025-01-28 8:18 ` Viresh Kumar
0 siblings, 0 replies; 32+ messages in thread
From: Viresh Kumar @ 2025-01-28 8:18 UTC (permalink / raw)
To: Beata Michalska
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, sumitg, yang,
vanshikonda, lihuisong, zhanjie9
On 28-01-25, 09:09, Beata Michalska wrote:
> On Fri, Jan 24, 2025 at 09:03:33AM +0530, Viresh Kumar wrote:
> > On 23-01-25, 22:45, Beata Michalska wrote:
> > > That would mean we are opting for presenting '0' value (whatever that means)
> > > instead of trying alternative ways of getting 'current' frequency ?
> > > This is still the scaling_cur_freq.
> >
> > A return value of 0 should typically mean something went wrong
> > somewhere and didn't return the right value to us.
> If smth goes wrong, an error should be returned, shoulnd't it?
Right, but what if no error is detected and still a value of 0 is
returned somehow ? That's what I was talking about.
> > - For the print message, I think we should just print the value
> > instead of UNKNOWN. Let the user / developer decide what to do with
> > it.
> Are you refering to the x86 show_cpuinfo behaviour altered by this patch ?
Yes
> > - As for trying other mechanism to find the frequency now, maybe you
> > are right and looking for an alternate way is the right way to go.
> > And that would be consistent with existing behavior too.
> >
> That would mean that changes to show_scaling_cur_freq are fine ?
Yes.
--
viresh
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 2/5] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2025-01-21 8:44 ` [PATCH v9 2/5] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry Beata Michalska
2025-01-21 10:53 ` Viresh Kumar
@ 2025-01-28 8:43 ` Prasanna Kumar T S M
2025-01-29 11:29 ` Sumit Gupta
2 siblings, 0 replies; 32+ messages in thread
From: Prasanna Kumar T S M @ 2025-01-28 8:43 UTC (permalink / raw)
To: Beata Michalska, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H . Peter Anvin, Phil Auld, x86, linux-doc
On 21-01-2025 14:14, Beata Michalska wrote:
> Currently the CPUFreq core exposes two sysfs attributes that can be used
> to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
> and scaling_cur_freq. Both provide slightly different view on the
> subject and they do come with their own drawbacks.
>
> cpuinfo_cur_freq provides higher precision though at a cost of being
> rather expensive. Moreover, the information retrieved via this attribute
> is somewhat short lived as frequency can change at any point of time
> making it difficult to reason from.
>
> scaling_cur_freq, on the other hand, tends to be less accurate but then
> the actual level of precision (and source of information) varies between
> architectures making it a bit ambiguous.
>
> The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
> distinct interface, exposing an average frequency of a given CPU(s), as
> reported by the hardware, over a time frame spanning no more than a few
> milliseconds. As it requires appropriate hardware support, this
> interface is optional.
>
> Note that under the hood, the new attribute relies on the information
> provided by arch_freq_get_on_cpu, which, up to this point, has been
> feeding data for scaling_cur_freq attribute, being the source of
> ambiguity when it comes to interpretation. This has been amended by
> restoring the intended behavior for scaling_cur_freq, with a new
> dedicated config option to maintain status quo for those, who may need
> it.
>
> CC: Jonathan Corbet <corbet@lwn.net>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: Borislav Petkov <bp@alien8.de>
> CC: Dave Hansen <dave.hansen@linux.intel.com>
> CC: H. Peter Anvin <hpa@zytor.com>
> CC: Phil Auld <pauld@redhat.com>
> CC: x86@kernel.org
> CC: linux-doc@vger.kernel.org
>
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> ---
> Documentation/admin-guide/pm/cpufreq.rst | 16 ++++++++++++-
> drivers/cpufreq/Kconfig.x86 | 12 ++++++++++
> drivers/cpufreq/cpufreq.c | 30 +++++++++++++++++++++++-
> 3 files changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
> index a21369eba034..e9969174026c 100644
> --- a/Documentation/admin-guide/pm/cpufreq.rst
> +++ b/Documentation/admin-guide/pm/cpufreq.rst
> @@ -248,6 +248,19 @@ are the following:
> If that frequency cannot be determined, this attribute should not
> be present.
>
> +``cpuinfo_avg_freq``
> + An average frequency (in KHz) of all CPUs belonging to a given policy,
> + derived from a hardware provided feedback and reported on a time frame
> + spanning at most few milliseconds.
> +
> + This is expected to be based on the frequency the hardware actually runs
> + at and, as such, might require specialised hardware support (such as AMU
> + extension on ARM). If one cannot be determined, this attribute should
> + not be present.
> +
> + Note, that failed attempt to retrieve current frequency for a given
> + CPU(s) will result in an appropriate error.
> +
> ``cpuinfo_max_freq``
> Maximum possible operating frequency the CPUs belonging to this policy
> can run at (in kHz).
> @@ -293,7 +306,8 @@ are the following:
> Some architectures (e.g. ``x86``) may attempt to provide information
> more precisely reflecting the current CPU frequency through this
> attribute, but that still may not be the exact current CPU frequency as
> - seen by the hardware at the moment.
> + seen by the hardware at the moment. This behavior though, is only
> + available via c:macro:``CPUFREQ_ARCH_CUR_FREQ`` option.
>
> ``scaling_driver``
> The scaling driver currently in use.
> diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
> index 97c2d4f15d76..212e1b9afe21 100644
> --- a/drivers/cpufreq/Kconfig.x86
> +++ b/drivers/cpufreq/Kconfig.x86
> @@ -340,3 +340,15 @@ config X86_SPEEDSTEP_RELAXED_CAP_CHECK
> option lets the probing code bypass some of those checks if the
> parameter "relaxed_check=1" is passed to the module.
>
> +config CPUFREQ_ARCH_CUR_FREQ
> + default y
> + bool "Current frequency derived from HW provided feedback"
> + help
> + This determines whether the scaling_cur_freq sysfs attribute returns
> + the last requested frequency or a more precise value based on hardware
> + provided feedback (as architected counters).
> + Given that a more precise frequency can now be provided via the
> + cpuinfo_avg_cur_freq attribute, by enabling this option,
> + scaling_cur_freq maintains the provision of a counter based frequency,
> + for compatibility reasons.
> +
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 6f45684483c4..b2a8efa83c98 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -733,12 +733,20 @@ __weak int arch_freq_get_on_cpu(int cpu)
> return -EOPNOTSUPP;
> }
>
> +static inline bool cpufreq_avg_freq_supported(struct cpufreq_policy *policy)
> +{
> + return arch_freq_get_on_cpu(policy->cpu) != -EOPNOTSUPP;
> +}
> +
> static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
> {
> ssize_t ret;
> int freq;
>
> - freq = arch_freq_get_on_cpu(policy->cpu);
> + freq = IS_ENABLED(CONFIG_CPUFREQ_ARCH_CUR_FREQ)
> + ? arch_freq_get_on_cpu(policy->cpu)
> + : 0;
> +
> if (freq > 0)
> ret = sysfs_emit(buf, "%u\n", freq);
> else if (cpufreq_driver->setpolicy && cpufreq_driver->get)
> @@ -783,6 +791,19 @@ static ssize_t show_cpuinfo_cur_freq(struct cpufreq_policy *policy,
> return sysfs_emit(buf, "<unknown>\n");
> }
>
> +/*
> + * show_cpuinfo_avg_freq - average CPU frequency as detected by hardware
> + */
> +static ssize_t show_cpuinfo_avg_freq(struct cpufreq_policy *policy,
> + char *buf)
> +{
> + int avg_freq = arch_freq_get_on_cpu(policy->cpu);
> +
> + if (avg_freq > 0)
> + return sysfs_emit(buf, "%u\n", avg_freq);
> + return avg_freq != 0 ? avg_freq : -EINVAL;
> +}
> +
> /*
> * show_scaling_governor - show the current policy for the specified CPU
> */
> @@ -945,6 +966,7 @@ static ssize_t show_bios_limit(struct cpufreq_policy *policy, char *buf)
> }
>
> cpufreq_freq_attr_ro_perm(cpuinfo_cur_freq, 0400);
> +cpufreq_freq_attr_ro(cpuinfo_avg_freq);
> cpufreq_freq_attr_ro(cpuinfo_min_freq);
> cpufreq_freq_attr_ro(cpuinfo_max_freq);
> cpufreq_freq_attr_ro(cpuinfo_transition_latency);
> @@ -1072,6 +1094,12 @@ static int cpufreq_add_dev_interface(struct cpufreq_policy *policy)
> return ret;
> }
>
> + if (cpufreq_avg_freq_supported(policy)) {
> + ret = sysfs_create_file(&policy->kobj, &cpuinfo_avg_freq.attr);
> + if (ret)
> + return ret;
> + }
> +
> ret = sysfs_create_file(&policy->kobj, &scaling_cur_freq.attr);
> if (ret)
> return ret;
Looks good to me.
Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 4/5] arm64: Provide an AMU-based version of arch_freq_get_on_cpu
2025-01-28 8:16 ` Beata Michalska
@ 2025-01-28 8:52 ` Prasanna Kumar T S M
0 siblings, 0 replies; 32+ messages in thread
From: Prasanna Kumar T S M @ 2025-01-28 8:52 UTC (permalink / raw)
To: Beata Michalska
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, viresh.kumar, sumitg,
yang, vanshikonda, lihuisong, zhanjie9
On 28-01-2025 13:46, Beata Michalska wrote:
> On Fri, Jan 24, 2025 at 10:13:30AM +0530, Prasanna Kumar T S M wrote:
>> On 21-01-2025 14:14, Beata Michalska wrote:
>>> With the Frequency Invariance Engine (FIE) being already wired up with
>>> sched tick and making use of relevant (core counter and constant
>>> counter) AMU counters, getting the average frequency for a given CPU,
>>> can be achieved by utilizing the frequency scale factor which reflects
>>> an average CPU frequency for the last tick period length.
>>>
>>> The solution is partially based on APERF/MPERF implementation of
>>> arch_freq_get_on_cpu.
>>>
>>> Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
>>> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
>>> ---
>>> arch/arm64/kernel/topology.c | 109 +++++++++++++++++++++++++++++++----
>>> 1 file changed, 99 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>> index cb180684d10d..5f5738b174c7 100644
>>> --- a/arch/arm64/kernel/topology.c
>>> +++ b/arch/arm64/kernel/topology.c
>>> @@ -17,6 +17,7 @@
>>> #include <linux/cpufreq.h>
>>> #include <linux/init.h>
>>> #include <linux/percpu.h>
>>> +#include <linux/sched/isolation.h>
>>> #include <asm/cpu.h>
>>> #include <asm/cputype.h>
>>> @@ -88,18 +89,28 @@ int __init parse_acpi_topology(void)
>>> * initialized.
>>> */
>>> static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = 1UL << (2 * SCHED_CAPACITY_SHIFT);
>>> -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
>>> -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
>>> static cpumask_var_t amu_fie_cpus;
>>> +struct amu_cntr_sample {
>>> + u64 arch_const_cycles_prev;
>>> + u64 arch_core_cycles_prev;
>>> + unsigned long last_scale_update;
>>> +};
>>> +
>>> +static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_cntr_sample, cpu_amu_samples);
>>> +
>>> void update_freq_counters_refs(void)
>>> {
>>> - this_cpu_write(arch_core_cycles_prev, read_corecnt());
>>> - this_cpu_write(arch_const_cycles_prev, read_constcnt());
>>> + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
>>> +
>>> + amu_sample->arch_core_cycles_prev = read_corecnt();
>>> + amu_sample->arch_const_cycles_prev = read_constcnt();
>>> }
>>> static inline bool freq_counters_valid(int cpu)
>>> {
>>> + struct amu_cntr_sample *amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
>>> +
>>> if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
>>> return false;
>>> @@ -108,8 +119,8 @@ static inline bool freq_counters_valid(int cpu)
>>> return false;
>>> }
>>> - if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
>>> - !per_cpu(arch_core_cycles_prev, cpu))) {
>>> + if (unlikely(!amu_sample->arch_const_cycles_prev ||
>>> + !amu_sample->arch_core_cycles_prev)) {
>>> pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
>>> return false;
>>> }
>>> @@ -152,17 +163,22 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
>>> static void amu_scale_freq_tick(void)
>>> {
>>> + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
>>> u64 prev_core_cnt, prev_const_cnt;
>>> u64 core_cnt, const_cnt, scale;
>>> - prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
>>> - prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
>>> + prev_const_cnt = amu_sample->arch_const_cycles_prev;
>>> + prev_core_cnt = amu_sample->arch_core_cycles_prev;
>>> update_freq_counters_refs();
>>> - const_cnt = this_cpu_read(arch_const_cycles_prev);
>>> - core_cnt = this_cpu_read(arch_core_cycles_prev);
>>> + const_cnt = amu_sample->arch_const_cycles_prev;
>>> + core_cnt = amu_sample->arch_core_cycles_prev;
>>> + /*
>>> + * This should not happen unless the AMUs have been reset and the
>>> + * counter values have not been restored - unlikely
>>> + */
>>> if (unlikely(core_cnt <= prev_core_cnt ||
>>> const_cnt <= prev_const_cnt))
>>> return;
>>> @@ -182,6 +198,8 @@ static void amu_scale_freq_tick(void)
>>> scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
>>> this_cpu_write(arch_freq_scale, (unsigned long)scale);
>>> +
>>> + amu_sample->last_scale_update = jiffies;
>>> }
>>> static struct scale_freq_data amu_sfd = {
>>> @@ -189,6 +207,77 @@ static struct scale_freq_data amu_sfd = {
>>> .set_freq_scale = amu_scale_freq_tick,
>>> };
>>> +static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
>>> +{
>>> + return cpumask_available(amu_fie_cpus) &&
>>> + cpumask_test_cpu(cpu, amu_fie_cpus);
>>> +}
>>> +
>>> +#define AMU_SAMPLE_EXP_MS 20
>>> +
>>> +int arch_freq_get_on_cpu(int cpu)
>>> +{
>>> + struct amu_cntr_sample *amu_sample;
>>> + unsigned int start_cpu = cpu;
>>> + unsigned long last_update;
>>> + unsigned int freq = 0;
>>> + u64 scale;
>>> +
>>> + if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
>>> + return -EOPNOTSUPP;
>>> +
>>> +retry:
>>> + amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
>>> +
>>> + last_update = amu_sample->last_scale_update;
>>> +
>>> + /*
>>> + * For those CPUs that are in full dynticks mode, or those that have
>>> + * not seen tick for a while, try an alternative source for the counters
>>> + * (and thus freq scale), if available, for given policy: this boils
>>> + * down to identifying an active cpu within the same freq domain, if any.
>>> + */
>>> + if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
>>> + time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
>>> + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
>>> + int ref_cpu = cpu;
>>> +
>>> + if (!policy)
>>> + return -EINVAL;
>>> +
>>> + if (!cpumask_intersects(policy->related_cpus,
>>> + housekeeping_cpumask(HK_TYPE_TICK))) {
>>> + cpufreq_cpu_put(policy);
>>> + return -EOPNOTSUPP;
>>> + }
>>> +
>>> +
>>> + do {
>>> + ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
>>> + start_cpu, false);
>>> +
>>> + } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
>>> +
>>> + cpufreq_cpu_put(policy);
>>> +
>>> + if (ref_cpu >= nr_cpu_ids)
>>> + /* No alternative to pull info from */
>>> + return -EAGAIN;
>>> +
>>> + cpu = ref_cpu;
>>> + goto retry;
>> If you are going to spin a new revision, can you use while loop instead of
>> using goto for looping? This will help improve the readability.
> Can do, I guess, if you believe it will be more readable that way - me myself
> slightly hesitating about that.
Feel free to pick whichever option you feel is best. Don't spin a new
version just to change this.
I missed adding this in my previous email.
Looks good to me.
Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com>.
>
> ---
> BR
> Beata
>>> + }
>>> + /*
>>> + * Reversed computation to the one used to determine
>>> + * the arch_freq_scale value
>>> + * (see amu_scale_freq_tick for details)
>>> + */
>>> + scale = arch_scale_freq_capacity(cpu);
>>> + freq = scale * arch_scale_freq_ref(cpu);
>>> + freq >>= SCHED_CAPACITY_SHIFT;
>>> + return freq;
>>> +}
>>> +
>>> static void amu_fie_setup(const struct cpumask *cpus)
>>> {
>>> int cpu;
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 5/5] arm64: Update AMU-based freq scale factor on entering idle
2025-01-21 8:44 ` [PATCH v9 5/5] arm64: Update AMU-based freq scale factor on entering idle Beata Michalska
2025-01-24 4:45 ` Prasanna Kumar T S M
@ 2025-01-29 11:13 ` Sumit Gupta
1 sibling, 0 replies; 32+ messages in thread
From: Sumit Gupta @ 2025-01-29 11:13 UTC (permalink / raw)
To: Beata Michalska, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
viresh.kumar
Cc: yang, vanshikonda, lihuisong, zhanjie9, linux-tegra
On 21/01/25 14:14, Beata Michalska wrote:
>
> Now that the frequency scale factor has been activated for retrieving
> current frequency on a given CPU, trigger its update upon entering
> idle. This will, to an extent, allow querying last known frequency
> in a non-invasive way. It will also improve the frequency scale factor
> accuracy when a CPU entering idle did not receive a tick for a while.
> As a consequence, for idle cores, the reported frequency will be the
> last one observed before entering the idle state.
>
> Suggested-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
Reviewed-by: Sumit Gupta <sumitg@nvidia.com>
> ---
> arch/arm64/kernel/topology.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 5f5738b174c7..6c43aafac77c 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -213,6 +213,19 @@ static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
> cpumask_test_cpu(cpu, amu_fie_cpus);
> }
>
> +void arch_cpu_idle_enter(void)
> +{
> + unsigned int cpu = smp_processor_id();
> +
> + if (!amu_fie_cpu_supported(cpu))
> + return;
> +
> + /* Kick in AMU update but only if one has not happened already */
> + if (housekeeping_cpu(cpu, HK_TYPE_TICK) &&
> + time_is_before_jiffies(per_cpu(cpu_amu_samples.last_scale_update, cpu)))
> + amu_scale_freq_tick();
> +}
> +
> #define AMU_SAMPLE_EXP_MS 20
>
> int arch_freq_get_on_cpu(int cpu)
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 4/5] arm64: Provide an AMU-based version of arch_freq_get_on_cpu
2025-01-21 8:44 ` [PATCH v9 4/5] arm64: Provide an AMU-based version of arch_freq_get_on_cpu Beata Michalska
2025-01-24 4:43 ` Prasanna Kumar T S M
@ 2025-01-29 11:15 ` Sumit Gupta
1 sibling, 0 replies; 32+ messages in thread
From: Sumit Gupta @ 2025-01-29 11:15 UTC (permalink / raw)
To: Beata Michalska, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
viresh.kumar
Cc: yang, vanshikonda, lihuisong, zhanjie9, linux-tegra
On 21/01/25 14:14, Beata Michalska wrote:
>
>
> With the Frequency Invariance Engine (FIE) being already wired up with
> sched tick and making use of relevant (core counter and constant
> counter) AMU counters, getting the average frequency for a given CPU,
> can be achieved by utilizing the frequency scale factor which reflects
> an average CPU frequency for the last tick period length.
>
> The solution is partially based on APERF/MPERF implementation of
> arch_freq_get_on_cpu.
>
> Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
Reviewed-by: Sumit Gupta <sumitg@nvidia.com>
> ---
> arch/arm64/kernel/topology.c | 109 +++++++++++++++++++++++++++++++----
> 1 file changed, 99 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index cb180684d10d..5f5738b174c7 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -17,6 +17,7 @@
> #include <linux/cpufreq.h>
> #include <linux/init.h>
> #include <linux/percpu.h>
> +#include <linux/sched/isolation.h>
>
> #include <asm/cpu.h>
> #include <asm/cputype.h>
> @@ -88,18 +89,28 @@ int __init parse_acpi_topology(void)
> * initialized.
> */
> static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = 1UL << (2 * SCHED_CAPACITY_SHIFT);
> -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> static cpumask_var_t amu_fie_cpus;
>
> +struct amu_cntr_sample {
> + u64 arch_const_cycles_prev;
> + u64 arch_core_cycles_prev;
> + unsigned long last_scale_update;
> +};
> +
> +static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_cntr_sample, cpu_amu_samples);
> +
> void update_freq_counters_refs(void)
> {
> - this_cpu_write(arch_core_cycles_prev, read_corecnt());
> - this_cpu_write(arch_const_cycles_prev, read_constcnt());
> + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> +
> + amu_sample->arch_core_cycles_prev = read_corecnt();
> + amu_sample->arch_const_cycles_prev = read_constcnt();
> }
>
> static inline bool freq_counters_valid(int cpu)
> {
> + struct amu_cntr_sample *amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> +
> if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
> return false;
>
> @@ -108,8 +119,8 @@ static inline bool freq_counters_valid(int cpu)
> return false;
> }
>
> - if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
> - !per_cpu(arch_core_cycles_prev, cpu))) {
> + if (unlikely(!amu_sample->arch_const_cycles_prev ||
> + !amu_sample->arch_core_cycles_prev)) {
> pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
> return false;
> }
> @@ -152,17 +163,22 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
>
> static void amu_scale_freq_tick(void)
> {
> + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> u64 prev_core_cnt, prev_const_cnt;
> u64 core_cnt, const_cnt, scale;
>
> - prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
> - prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
> + prev_const_cnt = amu_sample->arch_const_cycles_prev;
> + prev_core_cnt = amu_sample->arch_core_cycles_prev;
>
> update_freq_counters_refs();
>
> - const_cnt = this_cpu_read(arch_const_cycles_prev);
> - core_cnt = this_cpu_read(arch_core_cycles_prev);
> + const_cnt = amu_sample->arch_const_cycles_prev;
> + core_cnt = amu_sample->arch_core_cycles_prev;
>
> + /*
> + * This should not happen unless the AMUs have been reset and the
> + * counter values have not been restored - unlikely
> + */
> if (unlikely(core_cnt <= prev_core_cnt ||
> const_cnt <= prev_const_cnt))
> return;
> @@ -182,6 +198,8 @@ static void amu_scale_freq_tick(void)
>
> scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
> this_cpu_write(arch_freq_scale, (unsigned long)scale);
> +
> + amu_sample->last_scale_update = jiffies;
> }
>
> static struct scale_freq_data amu_sfd = {
> @@ -189,6 +207,77 @@ static struct scale_freq_data amu_sfd = {
> .set_freq_scale = amu_scale_freq_tick,
> };
>
> +static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
> +{
> + return cpumask_available(amu_fie_cpus) &&
> + cpumask_test_cpu(cpu, amu_fie_cpus);
> +}
> +
> +#define AMU_SAMPLE_EXP_MS 20
> +
> +int arch_freq_get_on_cpu(int cpu)
> +{
> + struct amu_cntr_sample *amu_sample;
> + unsigned int start_cpu = cpu;
> + unsigned long last_update;
> + unsigned int freq = 0;
> + u64 scale;
> +
> + if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
> + return -EOPNOTSUPP;
> +
> +retry:
> + amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> +
> + last_update = amu_sample->last_scale_update;
> +
> + /*
> + * For those CPUs that are in full dynticks mode, or those that have
> + * not seen tick for a while, try an alternative source for the counters
> + * (and thus freq scale), if available, for given policy: this boils
> + * down to identifying an active cpu within the same freq domain, if any.
> + */
> + if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
> + time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
> + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> + int ref_cpu = cpu;
> +
> + if (!policy)
> + return -EINVAL;
> +
> + if (!cpumask_intersects(policy->related_cpus,
> + housekeeping_cpumask(HK_TYPE_TICK))) {
> + cpufreq_cpu_put(policy);
> + return -EOPNOTSUPP;
> + }
> +
> +
> + do {
> + ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
> + start_cpu, false);
> +
> + } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
> +
> + cpufreq_cpu_put(policy);
> +
> + if (ref_cpu >= nr_cpu_ids)
> + /* No alternative to pull info from */
> + return -EAGAIN;
> +
> + cpu = ref_cpu;
> + goto retry;
> + }
> + /*
> + * Reversed computation to the one used to determine
> + * the arch_freq_scale value
> + * (see amu_scale_freq_tick for details)
> + */
> + scale = arch_scale_freq_capacity(cpu);
> + freq = scale * arch_scale_freq_ref(cpu);
> + freq >>= SCHED_CAPACITY_SHIFT;
> + return freq;
> +}
> +
> static void amu_fie_setup(const struct cpumask *cpus)
> {
> int cpu;
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 3/5] arm64: amu: Delay allocating cpumask for AMU FIE support
2025-01-21 8:44 ` [PATCH v9 3/5] arm64: amu: Delay allocating cpumask for AMU FIE support Beata Michalska
2025-01-24 4:48 ` Prasanna Kumar T S M
@ 2025-01-29 11:17 ` Sumit Gupta
1 sibling, 0 replies; 32+ messages in thread
From: Sumit Gupta @ 2025-01-29 11:17 UTC (permalink / raw)
To: Beata Michalska, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
viresh.kumar
Cc: yang, vanshikonda, lihuisong, zhanjie9, linux-tegra
On 21/01/25 14:14, Beata Michalska wrote:
>
>
> For the time being, the amu_fie_cpus cpumask is being exclusively used
> by the AMU-related internals of FIE support and is guaranteed to be
> valid on every access currently made. Still the mask is not being
> invalidated on one of the error handling code paths, which leaves
> a soft spot with theoretical risk of UAF for CPUMASK_OFFSTACK cases.
> To make things sound, delay allocating said cpumask
> (for CPUMASK_OFFSTACK) avoiding otherwise nasty sanitising case failing
> to register the cpufreq policy notifications.
>
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
Reviewed-by: Sumit Gupta <sumitg@nvidia.com>
> ---
> arch/arm64/kernel/topology.c | 22 ++++++++++------------
> 1 file changed, 10 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 1a2c72f3e7f8..cb180684d10d 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -194,12 +194,19 @@ static void amu_fie_setup(const struct cpumask *cpus)
> int cpu;
>
> /* We are already set since the last insmod of cpufreq driver */
> - if (unlikely(cpumask_subset(cpus, amu_fie_cpus)))
> + if (cpumask_available(amu_fie_cpus) &&
> + unlikely(cpumask_subset(cpus, amu_fie_cpus)))
> return;
>
> - for_each_cpu(cpu, cpus) {
> + for_each_cpu(cpu, cpus)
> if (!freq_counters_valid(cpu))
> return;
> +
> + if (!cpumask_available(amu_fie_cpus) &&
> + !zalloc_cpumask_var(&amu_fie_cpus, GFP_KERNEL)) {
> + WARN_ONCE(1, "Failed to allocate FIE cpumask for CPUs[%*pbl]\n",
> + cpumask_pr_args(cpus));
> + return;
> }
>
> cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus);
> @@ -237,17 +244,8 @@ static struct notifier_block init_amu_fie_notifier = {
>
> static int __init init_amu_fie(void)
> {
> - int ret;
> -
> - if (!zalloc_cpumask_var(&amu_fie_cpus, GFP_KERNEL))
> - return -ENOMEM;
> -
> - ret = cpufreq_register_notifier(&init_amu_fie_notifier,
> + return cpufreq_register_notifier(&init_amu_fie_notifier,
> CPUFREQ_POLICY_NOTIFIER);
> - if (ret)
> - free_cpumask_var(amu_fie_cpus);
> -
> - return ret;
> }
> core_initcall(init_amu_fie);
>
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v9 2/5] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2025-01-21 8:44 ` [PATCH v9 2/5] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry Beata Michalska
2025-01-21 10:53 ` Viresh Kumar
2025-01-28 8:43 ` Prasanna Kumar T S M
@ 2025-01-29 11:29 ` Sumit Gupta
2 siblings, 0 replies; 32+ messages in thread
From: Sumit Gupta @ 2025-01-29 11:29 UTC (permalink / raw)
To: Beata Michalska, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
viresh.kumar
Cc: yang, vanshikonda, lihuisong, zhanjie9, Jonathan Corbet,
Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H . Peter Anvin, Phil Auld, x86, linux-doc, linux-tegra
On 21/01/25 14:14, Beata Michalska wrote:
>
>
> Currently the CPUFreq core exposes two sysfs attributes that can be used
> to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
> and scaling_cur_freq. Both provide slightly different view on the
> subject and they do come with their own drawbacks.
>
> cpuinfo_cur_freq provides higher precision though at a cost of being
> rather expensive. Moreover, the information retrieved via this attribute
> is somewhat short lived as frequency can change at any point of time
> making it difficult to reason from.
>
> scaling_cur_freq, on the other hand, tends to be less accurate but then
> the actual level of precision (and source of information) varies between
> architectures making it a bit ambiguous.
>
> The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
> distinct interface, exposing an average frequency of a given CPU(s), as
> reported by the hardware, over a time frame spanning no more than a few
> milliseconds. As it requires appropriate hardware support, this
> interface is optional.
>
> Note that under the hood, the new attribute relies on the information
> provided by arch_freq_get_on_cpu, which, up to this point, has been
> feeding data for scaling_cur_freq attribute, being the source of
> ambiguity when it comes to interpretation. This has been amended by
> restoring the intended behavior for scaling_cur_freq, with a new
> dedicated config option to maintain status quo for those, who may need
> it.
>
> CC: Jonathan Corbet <corbet@lwn.net>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: Borislav Petkov <bp@alien8.de>
> CC: Dave Hansen <dave.hansen@linux.intel.com>
> CC: H. Peter Anvin <hpa@zytor.com>
> CC: Phil Auld <pauld@redhat.com>
> CC: x86@kernel.org
> CC: linux-doc@vger.kernel.org
>
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> ---
> Documentation/admin-guide/pm/cpufreq.rst | 16 ++++++++++++-
> drivers/cpufreq/Kconfig.x86 | 12 ++++++++++
> drivers/cpufreq/cpufreq.c | 30 +++++++++++++++++++++++-
> 3 files changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
> index a21369eba034..e9969174026c 100644
> --- a/Documentation/admin-guide/pm/cpufreq.rst
> +++ b/Documentation/admin-guide/pm/cpufreq.rst
> @@ -248,6 +248,19 @@ are the following:
> If that frequency cannot be determined, this attribute should not
> be present.
>
> +``cpuinfo_avg_freq``
> + An average frequency (in KHz) of all CPUs belonging to a given policy,
> + derived from a hardware provided feedback and reported on a time frame
> + spanning at most few milliseconds.
> +
> + This is expected to be based on the frequency the hardware actually runs
> + at and, as such, might require specialised hardware support (such as AMU
> + extension on ARM). If one cannot be determined, this attribute should
> + not be present.
> +
> + Note, that failed attempt to retrieve current frequency for a given
> + CPU(s) will result in an appropriate error.
> +
Minor nit:
Should we also add: Idle CPU's on ARM will return EAGAIN (Resource
temporarily unavailable) error?
> ``cpuinfo_max_freq``
> Maximum possible operating frequency the CPUs belonging to this policy
> can run at (in kHz).
> @@ -293,7 +306,8 @@ are the following:
> Some architectures (e.g. ``x86``) may attempt to provide information
> more precisely reflecting the current CPU frequency through this
> attribute, but that still may not be the exact current CPU frequency as
> - seen by the hardware at the moment.
> + seen by the hardware at the moment. This behavior though, is only
> + available via c:macro:``CPUFREQ_ARCH_CUR_FREQ`` option.
>
> ``scaling_driver``
> The scaling driver currently in use.
> diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
> index 97c2d4f15d76..212e1b9afe21 100644
> --- a/drivers/cpufreq/Kconfig.x86
> +++ b/drivers/cpufreq/Kconfig.x86
> @@ -340,3 +340,15 @@ config X86_SPEEDSTEP_RELAXED_CAP_CHECK
> option lets the probing code bypass some of those checks if the
> parameter "relaxed_check=1" is passed to the module.
>
> +config CPUFREQ_ARCH_CUR_FREQ
> + default y
> + bool "Current frequency derived from HW provided feedback"
> + help
> + This determines whether the scaling_cur_freq sysfs attribute returns
> + the last requested frequency or a more precise value based on hardware
> + provided feedback (as architected counters).
> + Given that a more precise frequency can now be provided via the
> + cpuinfo_avg_cur_freq attribute, by enabling this option,
s/cpuinfo_avg_cur_freq/cpuinfo_cur_freq/?
Overall looks good to me.
Reviewed-by: Sumit Gupta <sumitg@nvidia.com>
^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2025-01-29 11:31 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-21 8:44 [PATCH v9 0/5] Add support for AArch64 AMUv1-based average freq Beata Michalska
2025-01-21 8:44 ` [PATCH v9 1/5] cpufreq: Allow arch_freq_get_on_cpu to return an error Beata Michalska
2025-01-21 10:47 ` Viresh Kumar
2025-01-21 15:14 ` Beata Michalska
2025-01-21 18:40 ` Vanshidhar Konda
2025-01-23 21:37 ` Beata Michalska
2025-01-22 6:12 ` Viresh Kumar
2025-01-23 21:45 ` Beata Michalska
2025-01-24 3:33 ` Viresh Kumar
2025-01-28 8:09 ` Beata Michalska
2025-01-28 8:18 ` Viresh Kumar
2025-01-21 10:47 ` Prasanna Kumar T S M
2025-01-24 4:15 ` Prasanna Kumar T S M
2025-01-21 8:44 ` [PATCH v9 2/5] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry Beata Michalska
2025-01-21 10:53 ` Viresh Kumar
2025-01-21 15:17 ` Beata Michalska
2025-01-22 6:09 ` Viresh Kumar
2025-01-23 21:47 ` Beata Michalska
2025-01-24 3:27 ` Viresh Kumar
2025-01-28 8:43 ` Prasanna Kumar T S M
2025-01-29 11:29 ` Sumit Gupta
2025-01-21 8:44 ` [PATCH v9 3/5] arm64: amu: Delay allocating cpumask for AMU FIE support Beata Michalska
2025-01-24 4:48 ` Prasanna Kumar T S M
2025-01-29 11:17 ` Sumit Gupta
2025-01-21 8:44 ` [PATCH v9 4/5] arm64: Provide an AMU-based version of arch_freq_get_on_cpu Beata Michalska
2025-01-24 4:43 ` Prasanna Kumar T S M
2025-01-28 8:16 ` Beata Michalska
2025-01-28 8:52 ` Prasanna Kumar T S M
2025-01-29 11:15 ` Sumit Gupta
2025-01-21 8:44 ` [PATCH v9 5/5] arm64: Update AMU-based freq scale factor on entering idle Beata Michalska
2025-01-24 4:45 ` Prasanna Kumar T S M
2025-01-29 11:13 ` Sumit Gupta
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).