* [PATCH v7 0/4] Add support for AArch64 AMUv1-based average freq
@ 2024-09-13 13:29 Beata Michalska
2024-09-13 13:29 ` [PATCH v7 1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry Beata Michalska
` (4 more replies)
0 siblings, 5 replies; 27+ messages in thread
From: Beata Michalska @ 2024-09-13 13:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9
Hi All,
This series adds support for obtaining an average CPU frequency based on
a hardware provided feedback. The average frequency is being exposed via
dedicated yet optional cpufreq sysfs attribute - cpuinfo_avg_freq.
The architecture specific bits are being provided for AArch64, caching on
existing implementation for FIE and AMUv1 support: the frequency scale
factor, updated on each sched tick, serving as a base for retrieving
the frequency for a given CPU, representing an average frequency
reported between the ticks.
The changes have been rather lightly (due to some limitations) tested on
an FVP model. Note that some small discrepancies have been observed while
testing (on the model) and this is currently being investigated, though it
should not have any significant impact on the overall results.
Note that [PATCH 2/4] arm64: amu: Delay allocating cpumask for AMU FIE support
can be merged independently.
Relevant discussions:
[1] https://lore.kernel.org/all/20240229162520.970986-1-vanshikonda@os.amperecomputing.com/
[2] https://lore.kernel.org/all/7eozim2xnepacnnkzxlbx34hib4otycnbn4dqymfziqou5lw5u@5xzpv3t7sxo3/
[3] https://lore.kernel.org/all/20231212072617.14756-1-lihuisong@huawei.com/
[4] https://lore.kernel.org/lkml/ZIHpd6unkOtYVEqP@e120325.cambridge.arm.com/T/#m4e74cb5a0aaa353c60fedc6cfb95ab7a6e381e3c
[5] https://lore.kernel.org/all/20240603081331.3829278-1-beata.michalska@arm.com/
v7:
- Dropping 'arch_topology: init capacity_freq_ref to 0' patch from the series
as this one has been sent separately as an independent change
[https://lore.kernel.org/all/20240827154818.1195849-1-ionela.voinescu@arm.com/]
- Including in the series change that introduces new sysfs entry [PATCH 1/4]
- Consequently modifying previously arch_freq_get_on_cpu to match reqs for new
sysfs attribute
- Dropping an idea of considering a CPU that has been idle for a while as a
valid source of information for obtaining an AMU-counter based frequency
- Some minor cosmetic changes
v6:
- delay allocating cpumask for AMU FIE support instead of invalidating the mask
upon failure to register cpufreq policy notifications
- drop the change to cpufreq core (for cpuinfo_cur_freq) as this one will be
sent as a separate change
v5:
- Fix invalid access to cpumask
- Reworked finding reference cpu when getting the freq
v4:
- dropping seqcount
- fixing identifying active cpu within given policy
- skipping full dynticks cpus when retrieving the freq
- bringing back plugging in arch_freq_get_on_cpu into cpuinfo_cur_freq
v3:
- dropping changes to cpufreq_verify_current_freq
- pulling in changes from Ionela initializing capacity_freq_ref to 0
(thanks for that!) and applying suggestions made by her during last review:
- switching to arch_scale_freq_capacity and arch_scale_freq_ref when
reversing freq scale factor computation
- swapping shift with multiplication
- adding time limit for considering last scale update as valid
- updating frequency scale factor upon entering idle
v2:
- Splitting the patches
- Adding comment for full dyntick mode
- Plugging arch_freq_get_on_cpu into cpufreq_verify_current_freq instead
of in show_cpuinfo_cur_freq to allow the framework to stay more in sync
with potential freq changes
Beata Michalska (4):
cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
arm64: amu: Delay allocating cpumask for AMU FIE support
arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu
arm64: Update AMU-based freq scale factor on entering idle
Documentation/admin-guide/pm/cpufreq.rst | 10 ++
arch/arm64/kernel/topology.c | 144 +++++++++++++++++++----
drivers/cpufreq/cpufreq.c | 31 +++++
include/linux/cpufreq.h | 1 +
4 files changed, 164 insertions(+), 22 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v7 1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2024-09-13 13:29 [PATCH v7 0/4] Add support for AArch64 AMUv1-based average freq Beata Michalska
@ 2024-09-13 13:29 ` Beata Michalska
2024-09-25 8:58 ` Jie Zhan
2024-10-29 7:04 ` Viresh Kumar
2024-09-13 13:29 ` [PATCH v7 2/4] arm64: amu: Delay allocating cpumask for AMU FIE support Beata Michalska
` (3 subsequent siblings)
4 siblings, 2 replies; 27+ messages in thread
From: Beata Michalska @ 2024-09-13 13:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9
Currently the CPUFreq core exposes two sysfs attributes that can be used
to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
and scaling_cur_freq. Both provide slightly different view on the
subject and they do come with their own drawbacks.
cpuinfo_cur_freq provides higher precision though at a cost of being
rather expensive. Moreover, the information retrieved via this attribute
is somewhat short lived as frequency can change at any point of time
making it difficult to reason from.
scaling_cur_freq, on the other hand, tends to be less accurate but then
the actual level of precision (and source of information) varies between
architectures making it a bit ambiguous.
The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
distinct interface, exposing an average frequency of a given CPU(s), as
reported by the hardware, over a time frame spanning no more than a few
milliseconds. As it requires appropriate hardware support, this
interface is optional.
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
Documentation/admin-guide/pm/cpufreq.rst | 10 ++++++++
drivers/cpufreq/cpufreq.c | 31 ++++++++++++++++++++++++
include/linux/cpufreq.h | 1 +
3 files changed, 42 insertions(+)
diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
index fe1be4ad88cb..2204d6132c05 100644
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -248,6 +248,16 @@ are the following:
If that frequency cannot be determined, this attribute should not
be present.
+``cpuinfo_avg_freq``
+ An average frequency (in KHz) of all CPUs belonging to a given policy,
+ derived from a hardware provided feedback and reported on a time frame
+ spanning at most few milliseconds.
+
+ This is expected to be based on the frequency the hardware actually runs
+ at and, as such, might require specialised hardware support (such as AMU
+ extension on ARM). If one cannot be determined, this attribute should
+ not be present.
+
``cpuinfo_max_freq``
Maximum possible operating frequency the CPUs belonging to this policy
can run at (in kHz).
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 04fc786dd2c0..3493e5a9500d 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -752,6 +752,16 @@ __weak unsigned int arch_freq_get_on_cpu(int cpu)
return 0;
}
+__weak int arch_freq_avg_get_on_cpu(int cpu)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline bool cpufreq_avg_freq_supported(struct cpufreq_policy *policy)
+{
+ return arch_freq_avg_get_on_cpu(policy->cpu) >= 0;
+}
+
static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
{
ssize_t ret;
@@ -802,6 +812,20 @@ static ssize_t show_cpuinfo_cur_freq(struct cpufreq_policy *policy,
return sysfs_emit(buf, "<unknown>\n");
}
+/*
+ * show_cpuinfo_avg_freq - average CPU frequency as detected by hardware
+ */
+static ssize_t show_cpuinfo_avg_freq(struct cpufreq_policy *policy,
+ char *buf)
+{
+ int avg_freq = arch_freq_avg_get_on_cpu(policy->cpu);
+
+ if (avg_freq > 0)
+ return sysfs_emit(buf, "%u\n", avg_freq);
+
+ return sysfs_emit(buf, "<unknown>\n");
+}
+
/*
* show_scaling_governor - show the current policy for the specified CPU
*/
@@ -964,6 +988,7 @@ static ssize_t show_bios_limit(struct cpufreq_policy *policy, char *buf)
}
cpufreq_freq_attr_ro_perm(cpuinfo_cur_freq, 0400);
+cpufreq_freq_attr_ro(cpuinfo_avg_freq);
cpufreq_freq_attr_ro(cpuinfo_min_freq);
cpufreq_freq_attr_ro(cpuinfo_max_freq);
cpufreq_freq_attr_ro(cpuinfo_transition_latency);
@@ -1091,6 +1116,12 @@ static int cpufreq_add_dev_interface(struct cpufreq_policy *policy)
return ret;
}
+ if (cpufreq_avg_freq_supported(policy)) {
+ ret = sysfs_create_file(&policy->kobj, &cpuinfo_avg_freq.attr);
+ if (ret)
+ return ret;
+ }
+
ret = sysfs_create_file(&policy->kobj, &scaling_cur_freq.attr);
if (ret)
return ret;
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index d4d2f4d1d7cb..48262073707e 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -1195,6 +1195,7 @@ static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_
#endif
extern unsigned int arch_freq_get_on_cpu(int cpu);
+extern int arch_freq_avg_get_on_cpu(int cpu);
#ifndef arch_set_freq_scale
static __always_inline
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v7 2/4] arm64: amu: Delay allocating cpumask for AMU FIE support
2024-09-13 13:29 [PATCH v7 0/4] Add support for AArch64 AMUv1-based average freq Beata Michalska
2024-09-13 13:29 ` [PATCH v7 1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry Beata Michalska
@ 2024-09-13 13:29 ` Beata Michalska
2024-09-13 13:29 ` [PATCH v7 3/4] arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu Beata Michalska
` (2 subsequent siblings)
4 siblings, 0 replies; 27+ messages in thread
From: Beata Michalska @ 2024-09-13 13:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9
For the time being, the amu_fie_cpus cpumask is being exclusively used
by the AMU-related internals of FIE support and is guaranteed to be
valid on every access currently made. Still the mask is not being
invalidated on one of the error handling code paths, which leaves
a soft spot with theoretical risk of UAF for CPUMASK_OFFSTACK cases.
To make things sound, delay allocating said cpumask
(for CPUMASK_OFFSTACK) avoiding otherwise nasty sanitising case failing
to register the cpufreq policy notifications.
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
arch/arm64/kernel/topology.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 1a2c72f3e7f8..cb180684d10d 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -194,12 +194,19 @@ static void amu_fie_setup(const struct cpumask *cpus)
int cpu;
/* We are already set since the last insmod of cpufreq driver */
- if (unlikely(cpumask_subset(cpus, amu_fie_cpus)))
+ if (cpumask_available(amu_fie_cpus) &&
+ unlikely(cpumask_subset(cpus, amu_fie_cpus)))
return;
- for_each_cpu(cpu, cpus) {
+ for_each_cpu(cpu, cpus)
if (!freq_counters_valid(cpu))
return;
+
+ if (!cpumask_available(amu_fie_cpus) &&
+ !zalloc_cpumask_var(&amu_fie_cpus, GFP_KERNEL)) {
+ WARN_ONCE(1, "Failed to allocate FIE cpumask for CPUs[%*pbl]\n",
+ cpumask_pr_args(cpus));
+ return;
}
cpumask_or(amu_fie_cpus, amu_fie_cpus, cpus);
@@ -237,17 +244,8 @@ static struct notifier_block init_amu_fie_notifier = {
static int __init init_amu_fie(void)
{
- int ret;
-
- if (!zalloc_cpumask_var(&amu_fie_cpus, GFP_KERNEL))
- return -ENOMEM;
-
- ret = cpufreq_register_notifier(&init_amu_fie_notifier,
+ return cpufreq_register_notifier(&init_amu_fie_notifier,
CPUFREQ_POLICY_NOTIFIER);
- if (ret)
- free_cpumask_var(amu_fie_cpus);
-
- return ret;
}
core_initcall(init_amu_fie);
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v7 3/4] arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu
2024-09-13 13:29 [PATCH v7 0/4] Add support for AArch64 AMUv1-based average freq Beata Michalska
2024-09-13 13:29 ` [PATCH v7 1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry Beata Michalska
2024-09-13 13:29 ` [PATCH v7 2/4] arm64: amu: Delay allocating cpumask for AMU FIE support Beata Michalska
@ 2024-09-13 13:29 ` Beata Michalska
2024-09-17 12:11 ` Sumit Gupta
2024-09-13 13:29 ` [PATCH v7 4/4] arm64: Update AMU-based freq scale factor on entering idle Beata Michalska
2024-10-16 10:59 ` [PATCH v7 0/4] Add support for AArch64 AMUv1-based average freq Catalin Marinas
4 siblings, 1 reply; 27+ messages in thread
From: Beata Michalska @ 2024-09-13 13:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9
With the Frequency Invariance Engine (FIE) being already wired up with
sched tick and making use of relevant (core counter and constant
counter) AMU counters, getting the average frequency for a given CPU,
can be achieved by utilizing the frequency scale factor which reflects
an average CPU frequency for the last tick period length.
The solution is partially based on APERF/MPERF implementation of
arch_freq_get_on_cpu.
Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
arch/arm64/kernel/topology.c | 109 +++++++++++++++++++++++++++++++----
1 file changed, 99 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index cb180684d10d..22e510733336 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -17,6 +17,7 @@
#include <linux/cpufreq.h>
#include <linux/init.h>
#include <linux/percpu.h>
+#include <linux/sched/isolation.h>
#include <asm/cpu.h>
#include <asm/cputype.h>
@@ -88,18 +89,28 @@ int __init parse_acpi_topology(void)
* initialized.
*/
static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = 1UL << (2 * SCHED_CAPACITY_SHIFT);
-static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
-static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
static cpumask_var_t amu_fie_cpus;
+struct amu_cntr_sample {
+ u64 arch_const_cycles_prev;
+ u64 arch_core_cycles_prev;
+ unsigned long last_scale_update;
+};
+
+static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_cntr_sample, cpu_amu_samples);
+
void update_freq_counters_refs(void)
{
- this_cpu_write(arch_core_cycles_prev, read_corecnt());
- this_cpu_write(arch_const_cycles_prev, read_constcnt());
+ struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
+
+ amu_sample->arch_core_cycles_prev = read_corecnt();
+ amu_sample->arch_const_cycles_prev = read_constcnt();
}
static inline bool freq_counters_valid(int cpu)
{
+ struct amu_cntr_sample *amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
+
if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
return false;
@@ -108,8 +119,8 @@ static inline bool freq_counters_valid(int cpu)
return false;
}
- if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
- !per_cpu(arch_core_cycles_prev, cpu))) {
+ if (unlikely(!amu_sample->arch_const_cycles_prev ||
+ !amu_sample->arch_core_cycles_prev)) {
pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
return false;
}
@@ -152,17 +163,22 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
static void amu_scale_freq_tick(void)
{
+ struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
u64 prev_core_cnt, prev_const_cnt;
u64 core_cnt, const_cnt, scale;
- prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
- prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
+ prev_const_cnt = amu_sample->arch_const_cycles_prev;
+ prev_core_cnt = amu_sample->arch_core_cycles_prev;
update_freq_counters_refs();
- const_cnt = this_cpu_read(arch_const_cycles_prev);
- core_cnt = this_cpu_read(arch_core_cycles_prev);
+ const_cnt = amu_sample->arch_const_cycles_prev;
+ core_cnt = amu_sample->arch_core_cycles_prev;
+ /*
+ * This should not happen unless the AMUs have been reset and the
+ * counter values have not been restored - unlikely
+ */
if (unlikely(core_cnt <= prev_core_cnt ||
const_cnt <= prev_const_cnt))
return;
@@ -182,6 +198,8 @@ static void amu_scale_freq_tick(void)
scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
this_cpu_write(arch_freq_scale, (unsigned long)scale);
+
+ amu_sample->last_scale_update = jiffies;
}
static struct scale_freq_data amu_sfd = {
@@ -189,6 +207,77 @@ static struct scale_freq_data amu_sfd = {
.set_freq_scale = amu_scale_freq_tick,
};
+static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
+{
+ return cpumask_available(amu_fie_cpus) &&
+ cpumask_test_cpu(cpu, amu_fie_cpus);
+}
+
+#define AMU_SAMPLE_EXP_MS 20
+
+int arch_freq_avg_get_on_cpu(int cpu)
+{
+ struct amu_cntr_sample *amu_sample;
+ unsigned int start_cpu = cpu;
+ unsigned long last_update;
+ unsigned int freq = 0;
+ u64 scale;
+
+ if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
+ return -EOPNOTSUPP;
+
+retry:
+ amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
+
+ last_update = amu_sample->last_scale_update;
+
+ /*
+ * For those CPUs that are in full dynticks mode, and those that have
+ * not seen tick for a while, try an alternative source for the counters
+ * (and thus freq scale), if available, for given policy: this boils
+ * down to identifying an active cpu within the same freq domain, if any.
+ */
+ if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
+ time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
+ struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
+ int ref_cpu = cpu;
+
+ if (!policy)
+ return 0;
+
+ if (!cpumask_intersects(policy->related_cpus,
+ housekeeping_cpumask(HK_TYPE_TICK))) {
+ cpufreq_cpu_put(policy);
+ return -EOPNOTSUPP;
+ }
+
+
+ do {
+ ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
+ start_cpu, false);
+
+ } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
+
+ cpufreq_cpu_put(policy);
+
+ if (ref_cpu >= nr_cpu_ids)
+ /* No alternative to pull info from */
+ return 0;
+
+ cpu = ref_cpu;
+ goto retry;
+ }
+ /*
+ * Reversed computation to the one used to determine
+ * the arch_freq_scale value
+ * (see amu_scale_freq_tick for details)
+ */
+ scale = arch_scale_freq_capacity(cpu);
+ freq = scale * arch_scale_freq_ref(cpu);
+ freq >>= SCHED_CAPACITY_SHIFT;
+ return freq;
+}
+
static void amu_fie_setup(const struct cpumask *cpus)
{
int cpu;
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v7 4/4] arm64: Update AMU-based freq scale factor on entering idle
2024-09-13 13:29 [PATCH v7 0/4] Add support for AArch64 AMUv1-based average freq Beata Michalska
` (2 preceding siblings ...)
2024-09-13 13:29 ` [PATCH v7 3/4] arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu Beata Michalska
@ 2024-09-13 13:29 ` Beata Michalska
2024-10-16 10:59 ` [PATCH v7 0/4] Add support for AArch64 AMUv1-based average freq Catalin Marinas
4 siblings, 0 replies; 27+ messages in thread
From: Beata Michalska @ 2024-09-13 13:29 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong, zhanjie9
Now that the frequency scale factor has been activated for retrieving
current frequency on a given CPU, trigger its update upon entering
idle. This will, to an extent, allow querying last known frequency
in a non-invasive way. It will also improve the frequency scale factor
accuracy when a CPU entering idle did not receive a tick for a while.
As a consequence, for idle cores, the reported frequency will be the
last one observed before entering the idle state.
Suggested-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
arch/arm64/kernel/topology.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 22e510733336..dbde62fd013c 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -213,6 +213,19 @@ static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
cpumask_test_cpu(cpu, amu_fie_cpus);
}
+void arch_cpu_idle_enter(void)
+{
+ unsigned int cpu = smp_processor_id();
+
+ if (!amu_fie_cpu_supported(cpu))
+ return;
+
+ /* Kick in AMU update but only if one has not happened already */
+ if (housekeeping_cpu(cpu, HK_TYPE_TICK) &&
+ time_is_before_jiffies(per_cpu(cpu_amu_samples.last_scale_update, cpu)))
+ amu_scale_freq_tick();
+}
+
#define AMU_SAMPLE_EXP_MS 20
int arch_freq_avg_get_on_cpu(int cpu)
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v7 3/4] arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu
2024-09-13 13:29 ` [PATCH v7 3/4] arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu Beata Michalska
@ 2024-09-17 12:11 ` Sumit Gupta
2024-09-26 10:34 ` Beata Michalska
0 siblings, 1 reply; 27+ messages in thread
From: Sumit Gupta @ 2024-09-17 12:11 UTC (permalink / raw)
To: Beata Michalska, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
viresh.kumar
Cc: yang, vanshikonda, lihuisong, zhanjie9, linux-tegra, Bibek Basu
Hi Beata,
Thank you for the patches.
On 13/09/24 18:59, Beata Michalska wrote:
> External email: Use caution opening links or attachments
>
>
> With the Frequency Invariance Engine (FIE) being already wired up with
> sched tick and making use of relevant (core counter and constant
> counter) AMU counters, getting the average frequency for a given CPU,
> can be achieved by utilizing the frequency scale factor which reflects
> an average CPU frequency for the last tick period length.
>
> The solution is partially based on APERF/MPERF implementation of
> arch_freq_get_on_cpu.
>
> Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> ---
> arch/arm64/kernel/topology.c | 109 +++++++++++++++++++++++++++++++----
> 1 file changed, 99 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index cb180684d10d..22e510733336 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -17,6 +17,7 @@
> #include <linux/cpufreq.h>
> #include <linux/init.h>
> #include <linux/percpu.h>
> +#include <linux/sched/isolation.h>
>
> #include <asm/cpu.h>
> #include <asm/cputype.h>
> @@ -88,18 +89,28 @@ int __init parse_acpi_topology(void)
> * initialized.
> */
> static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = 1UL << (2 * SCHED_CAPACITY_SHIFT);
> -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> static cpumask_var_t amu_fie_cpus;
>
> +struct amu_cntr_sample {
> + u64 arch_const_cycles_prev;
> + u64 arch_core_cycles_prev;
> + unsigned long last_scale_update;
> +};
> +
> +static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_cntr_sample, cpu_amu_samples);
> +
> void update_freq_counters_refs(void)
> {
> - this_cpu_write(arch_core_cycles_prev, read_corecnt());
> - this_cpu_write(arch_const_cycles_prev, read_constcnt());
> + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> +
> + amu_sample->arch_core_cycles_prev = read_corecnt();
> + amu_sample->arch_const_cycles_prev = read_constcnt();
> }
>
> static inline bool freq_counters_valid(int cpu)
> {
> + struct amu_cntr_sample *amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> +
> if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
> return false;
>
> @@ -108,8 +119,8 @@ static inline bool freq_counters_valid(int cpu)
> return false;
> }
>
> - if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
> - !per_cpu(arch_core_cycles_prev, cpu))) {
> + if (unlikely(!amu_sample->arch_const_cycles_prev ||
> + !amu_sample->arch_core_cycles_prev)) {
> pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
> return false;
> }
> @@ -152,17 +163,22 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
>
> static void amu_scale_freq_tick(void)
> {
> + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> u64 prev_core_cnt, prev_const_cnt;
> u64 core_cnt, const_cnt, scale;
>
> - prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
> - prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
> + prev_const_cnt = amu_sample->arch_const_cycles_prev;
> + prev_core_cnt = amu_sample->arch_core_cycles_prev;
>
> update_freq_counters_refs();
>
> - const_cnt = this_cpu_read(arch_const_cycles_prev);
> - core_cnt = this_cpu_read(arch_core_cycles_prev);
> + const_cnt = amu_sample->arch_const_cycles_prev;
> + core_cnt = amu_sample->arch_core_cycles_prev;
>
> + /*
> + * This should not happen unless the AMUs have been reset and the
> + * counter values have not been restored - unlikely
> + */
> if (unlikely(core_cnt <= prev_core_cnt ||
> const_cnt <= prev_const_cnt))
> return;
> @@ -182,6 +198,8 @@ static void amu_scale_freq_tick(void)
>
> scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
> this_cpu_write(arch_freq_scale, (unsigned long)scale);
> +
> + amu_sample->last_scale_update = jiffies;
> }
>
> static struct scale_freq_data amu_sfd = {
> @@ -189,6 +207,77 @@ static struct scale_freq_data amu_sfd = {
> .set_freq_scale = amu_scale_freq_tick,
> };
>
> +static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
> +{
> + return cpumask_available(amu_fie_cpus) &&
> + cpumask_test_cpu(cpu, amu_fie_cpus);
> +}
> +
> +#define AMU_SAMPLE_EXP_MS 20
> +
> +int arch_freq_avg_get_on_cpu(int cpu)
> +{
> + struct amu_cntr_sample *amu_sample;
> + unsigned int start_cpu = cpu;
> + unsigned long last_update;
> + unsigned int freq = 0;
> + u64 scale;
> +
> + if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
> + return -EOPNOTSUPP;
> +
> +retry:
> + amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> +
> + last_update = amu_sample->last_scale_update;
> +
> + /*
> + * For those CPUs that are in full dynticks mode, and those that have
'or those' to match with if condition?
> + * not seen tick for a while, try an alternative source for the counters
> + * (and thus freq scale), if available, for given policy: this boils
> + * down to identifying an active cpu within the same freq domain, if any.
> + */
> + if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
> + time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
> + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> + int ref_cpu = cpu;
> +
> + if (!policy)
> + return 0;
> +
We can skip the rest of code if policy has a single cpu. AFAIR, one of
the previous versions had similar check.
if (!policy_is_shared(policy)) {
cpufreq_cpu_put(policy);
goto freq_comput;
}
> + if (!cpumask_intersects(policy->related_cpus,
> + housekeeping_cpumask(HK_TYPE_TICK))) {
> + cpufreq_cpu_put(policy);
> + return -EOPNOTSUPP;
> + }
> +
> +
> + do {
> + ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
> + start_cpu, false);
> +
> + } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
> +
> + cpufreq_cpu_put(policy);
> +
> + if (ref_cpu >= nr_cpu_ids)
> + /* No alternative to pull info from */
> + return 0;
> +
The 'cpuinfo_avg_freq' node gives 'unknown' value for single CPU per
policy as 'ref_cpu' increments to 'nr_cpu_ids'. We can use the same CPU
instead of returning zero if no alternative CPU.
# cat /sys/devices/system/cpu/cpu2/cpufreq/cpuinfo_avg_freq
<unknown>
----
if (ref_cpu >= nr_cpu_ids)
/* Use same CPU if no alternative to pull info from */
goto freq_comput;
..
freq_comput:
scale = arch_scale_freq_capacity(cpu);
freq = scale * arch_scale_freq_ref(cpu);
----
Thank you,
Sumit Gupta
P.S. Will be on afk for next 2 weeks with no access to email. Please
expect a delay in response.
> + cpu = ref_cpu;
> + goto retry;
> + }
> + /*
> + * Reversed computation to the one used to determine
> + * the arch_freq_scale value
> + * (see amu_scale_freq_tick for details)
> + */
> + scale = arch_scale_freq_capacity(cpu);
> + freq = scale * arch_scale_freq_ref(cpu);
> + freq >>= SCHED_CAPACITY_SHIFT;
> + return freq;
> +}
> +
> static void amu_fie_setup(const struct cpumask *cpus)
> {
> int cpu;
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2024-09-13 13:29 ` [PATCH v7 1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry Beata Michalska
@ 2024-09-25 8:58 ` Jie Zhan
2024-09-26 10:42 ` Beata Michalska
2024-10-29 7:04 ` Viresh Kumar
1 sibling, 1 reply; 27+ messages in thread
From: Jie Zhan @ 2024-09-25 8:58 UTC (permalink / raw)
To: Beata Michalska, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
viresh.kumar
Cc: sumitg, yang, vanshikonda, lihuisong
Hi Beata,
Great thanks for the update.
On 13/09/2024 21:29, Beata Michalska wrote:
> Currently the CPUFreq core exposes two sysfs attributes that can be used
> to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
> and scaling_cur_freq. Both provide slightly different view on the
> subject and they do come with their own drawbacks.
>
> cpuinfo_cur_freq provides higher precision though at a cost of being
> rather expensive. Moreover, the information retrieved via this attribute
> is somewhat short lived as frequency can change at any point of time
> making it difficult to reason from.
>
> scaling_cur_freq, on the other hand, tends to be less accurate but then
> the actual level of precision (and source of information) varies between
> architectures making it a bit ambiguous.
>
> The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
> distinct interface, exposing an average frequency of a given CPU(s), as
> reported by the hardware, over a time frame spanning no more than a few
> milliseconds. As it requires appropriate hardware support, this
> interface is optional.
>
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> ---
> Documentation/admin-guide/pm/cpufreq.rst | 10 ++++++++
> drivers/cpufreq/cpufreq.c | 31 ++++++++++++++++++++++++
> include/linux/cpufreq.h | 1 +
> 3 files changed, 42 insertions(+)
>
> diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
> index fe1be4ad88cb..2204d6132c05 100644
> --- a/Documentation/admin-guide/pm/cpufreq.rst
> +++ b/Documentation/admin-guide/pm/cpufreq.rst
> @@ -248,6 +248,16 @@ are the following:
> If that frequency cannot be determined, this attribute should not
> be present.
>
> +``cpuinfo_avg_freq``
> + An average frequency (in KHz) of all CPUs belonging to a given policy,
> + derived from a hardware provided feedback and reported on a time frame
> + spanning at most few milliseconds.
I don't think it's necessary to put the 'at most few milliseconds'
limitation on.
It's supposed to be fine for other platforms to implement the interface
with a longer time period, e.g. a few seconds, in the future. Otherwise,
this would probably force the implementation of 'cpuinfo_avg_freq' to be
binded with the 'scale freq tick' stuff.
> +
> + This is expected to be based on the frequency the hardware actually runs
> + at and, as such, might require specialised hardware support (such as AMU
> + extension on ARM). If one cannot be determined, this attribute should
> + not be present.
> +
> ``cpuinfo_max_freq``
> Maximum possible operating frequency the CPUs belonging to this policy
> can run at (in kHz).
...
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index d4d2f4d1d7cb..48262073707e 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -1195,6 +1195,7 @@ static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_
> #endif
>
> extern unsigned int arch_freq_get_on_cpu(int cpu);
> +extern int arch_freq_avg_get_on_cpu(int cpu);
It's werid to have two different functions with mostly the same behaviour,
i.e. arch_freq_get_on_cpu() and arch_freq_avg_get_on_cpu().
Appreciated that there would be some capatibility work with x86 at the
moment if merging them, e.g. return type, default implementation, impact on
some userspace tools, etc.
Anyhow, are they supposed to be merged in the near future?
Thanks,
Jie
>
> #ifndef arch_set_freq_scale
> static __always_inline
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 3/4] arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu
2024-09-17 12:11 ` Sumit Gupta
@ 2024-09-26 10:34 ` Beata Michalska
2024-09-26 23:21 ` Vanshidhar Konda
0 siblings, 1 reply; 27+ messages in thread
From: Beata Michalska @ 2024-09-26 10:34 UTC (permalink / raw)
To: Sumit Gupta
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, viresh.kumar, yang,
vanshikonda, lihuisong, zhanjie9, linux-tegra, Bibek Basu
On Tue, Sep 17, 2024 at 05:41:09PM +0530, Sumit Gupta wrote:
> Hi Beata,
Hi Sumit,
>
> Thank you for the patches.
Thank you for having a look at those.
>
> On 13/09/24 18:59, Beata Michalska wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > With the Frequency Invariance Engine (FIE) being already wired up with
> > sched tick and making use of relevant (core counter and constant
> > counter) AMU counters, getting the average frequency for a given CPU,
> > can be achieved by utilizing the frequency scale factor which reflects
> > an average CPU frequency for the last tick period length.
> >
> > The solution is partially based on APERF/MPERF implementation of
> > arch_freq_get_on_cpu.
> >
> > Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
> > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> > ---
> > arch/arm64/kernel/topology.c | 109 +++++++++++++++++++++++++++++++----
> > 1 file changed, 99 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > index cb180684d10d..22e510733336 100644
> > --- a/arch/arm64/kernel/topology.c
> > +++ b/arch/arm64/kernel/topology.c
> > @@ -17,6 +17,7 @@
> > #include <linux/cpufreq.h>
> > #include <linux/init.h>
> > #include <linux/percpu.h>
> > +#include <linux/sched/isolation.h>
> >
> > #include <asm/cpu.h>
> > #include <asm/cputype.h>
> > @@ -88,18 +89,28 @@ int __init parse_acpi_topology(void)
> > * initialized.
> > */
> > static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = 1UL << (2 * SCHED_CAPACITY_SHIFT);
> > -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> > -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> > static cpumask_var_t amu_fie_cpus;
> >
> > +struct amu_cntr_sample {
> > + u64 arch_const_cycles_prev;
> > + u64 arch_core_cycles_prev;
> > + unsigned long last_scale_update;
> > +};
> > +
> > +static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_cntr_sample, cpu_amu_samples);
> > +
> > void update_freq_counters_refs(void)
> > {
> > - this_cpu_write(arch_core_cycles_prev, read_corecnt());
> > - this_cpu_write(arch_const_cycles_prev, read_constcnt());
> > + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> > +
> > + amu_sample->arch_core_cycles_prev = read_corecnt();
> > + amu_sample->arch_const_cycles_prev = read_constcnt();
> > }
> >
> > static inline bool freq_counters_valid(int cpu)
> > {
> > + struct amu_cntr_sample *amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> > +
> > if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
> > return false;
> >
> > @@ -108,8 +119,8 @@ static inline bool freq_counters_valid(int cpu)
> > return false;
> > }
> >
> > - if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
> > - !per_cpu(arch_core_cycles_prev, cpu))) {
> > + if (unlikely(!amu_sample->arch_const_cycles_prev ||
> > + !amu_sample->arch_core_cycles_prev)) {
> > pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
> > return false;
> > }
> > @@ -152,17 +163,22 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
> >
> > static void amu_scale_freq_tick(void)
> > {
> > + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> > u64 prev_core_cnt, prev_const_cnt;
> > u64 core_cnt, const_cnt, scale;
> >
> > - prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
> > - prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
> > + prev_const_cnt = amu_sample->arch_const_cycles_prev;
> > + prev_core_cnt = amu_sample->arch_core_cycles_prev;
> >
> > update_freq_counters_refs();
> >
> > - const_cnt = this_cpu_read(arch_const_cycles_prev);
> > - core_cnt = this_cpu_read(arch_core_cycles_prev);
> > + const_cnt = amu_sample->arch_const_cycles_prev;
> > + core_cnt = amu_sample->arch_core_cycles_prev;
> >
> > + /*
> > + * This should not happen unless the AMUs have been reset and the
> > + * counter values have not been restored - unlikely
> > + */
> > if (unlikely(core_cnt <= prev_core_cnt ||
> > const_cnt <= prev_const_cnt))
> > return;
> > @@ -182,6 +198,8 @@ static void amu_scale_freq_tick(void)
> >
> > scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
> > this_cpu_write(arch_freq_scale, (unsigned long)scale);
> > +
> > + amu_sample->last_scale_update = jiffies;
> > }
> >
> > static struct scale_freq_data amu_sfd = {
> > @@ -189,6 +207,77 @@ static struct scale_freq_data amu_sfd = {
> > .set_freq_scale = amu_scale_freq_tick,
> > };
> >
> > +static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
> > +{
> > + return cpumask_available(amu_fie_cpus) &&
> > + cpumask_test_cpu(cpu, amu_fie_cpus);
> > +}
> > +
> > +#define AMU_SAMPLE_EXP_MS 20
> > +
> > +int arch_freq_avg_get_on_cpu(int cpu)
> > +{
> > + struct amu_cntr_sample *amu_sample;
> > + unsigned int start_cpu = cpu;
> > + unsigned long last_update;
> > + unsigned int freq = 0;
> > + u64 scale;
> > +
> > + if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
> > + return -EOPNOTSUPP;
> > +
> > +retry:
> > + amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> > +
> > + last_update = amu_sample->last_scale_update;
> > +
> > + /*
> > + * For those CPUs that are in full dynticks mode, and those that have
> 'or those' to match with if condition?
Yeah, might be.
>
> > + * not seen tick for a while, try an alternative source for the counters
> > + * (and thus freq scale), if available, for given policy: this boils
> > + * down to identifying an active cpu within the same freq domain, if any.
> > + */
> > + if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
> > + time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
> > + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> > + int ref_cpu = cpu;
> > +
> > + if (!policy)
> > + return 0;
> > +
>
> We can skip the rest of code if policy has a single cpu. AFAIR, one of the
> previous versions had similar check.
>
> if (!policy_is_shared(policy)) {
> cpufreq_cpu_put(policy);
> goto freq_comput;
> }
True, we could but then this case is covered by cpumask_next_wrap
which for single-cpu policies will render the ref_cpu invalid,
so policy_is_shared check seemed unnecessary.
>
> > + if (!cpumask_intersects(policy->related_cpus,
> > + housekeeping_cpumask(HK_TYPE_TICK))) {
> > + cpufreq_cpu_put(policy);
> > + return -EOPNOTSUPP;
> > + }
> > +
> > +
> > + do {
> > + ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
> > + start_cpu, false);
> > +
> > + } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
> > +
> > + cpufreq_cpu_put(policy);
> > +
> > + if (ref_cpu >= nr_cpu_ids)
> > + /* No alternative to pull info from */
> > + return 0;
> > +
>
> The 'cpuinfo_avg_freq' node gives 'unknown' value for single CPU per policy
> as 'ref_cpu' increments to 'nr_cpu_ids'. We can use the same CPU instead of
> returning zero if no alternative CPU.
>
> # cat /sys/devices/system/cpu/cpu2/cpufreq/cpuinfo_avg_freq
> <unknown>
>
> ----
> if (ref_cpu >= nr_cpu_ids)
> /* Use same CPU if no alternative to pull info from */
> goto freq_comput;
>
> ..
> freq_comput:
> scale = arch_scale_freq_capacity(cpu);
> freq = scale * arch_scale_freq_ref(cpu);
> ----
>
This boils down to the question what that function, and the information it
provides, represent really. The 'unknown' here simply says the CPU has been idle
for a while and as such the frequency data is a bit stale and it does not
represent the average freq the CPU is actually running at anymore, which is
the intention here really. Or, that the given CPU is a non-housekeeping one.
Either way I believe this is a useful information, instead of providing
stale data with no indication on whether the frequency is really the 'current'
one or not.
If that is somehow undesirable we can discuss this further, though I'd rather
avoid exposing an interface where the feedback provided is open to
interpretation at all times.
---
Best Regards
Beata
> Thank you,
> Sumit Gupta
>
> P.S. Will be on afk for next 2 weeks with no access to email. Please expect
> a delay in response.
>
> > + cpu = ref_cpu;
> > + goto retry;
> > + }
> > + /*
> > + * Reversed computation to the one used to determine
> > + * the arch_freq_scale value
> > + * (see amu_scale_freq_tick for details)
> > + */
> > + scale = arch_scale_freq_capacity(cpu);
> > + freq = scale * arch_scale_freq_ref(cpu);
> > + freq >>= SCHED_CAPACITY_SHIFT;
> > + return freq;
> > +}
> > +
>
> > static void amu_fie_setup(const struct cpumask *cpus)
> > {
> > int cpu;
> > --
> > 2.25.1
> >
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2024-09-25 8:58 ` Jie Zhan
@ 2024-09-26 10:42 ` Beata Michalska
0 siblings, 0 replies; 27+ messages in thread
From: Beata Michalska @ 2024-09-26 10:42 UTC (permalink / raw)
To: Jie Zhan
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, viresh.kumar, sumitg,
yang, vanshikonda, lihuisong
On Wed, Sep 25, 2024 at 04:58:36PM +0800, Jie Zhan wrote:
> Hi Beata,
Hi Jie
>
> Great thanks for the update.
>
> On 13/09/2024 21:29, Beata Michalska wrote:
> > Currently the CPUFreq core exposes two sysfs attributes that can be used
> > to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
> > and scaling_cur_freq. Both provide slightly different view on the
> > subject and they do come with their own drawbacks.
> >
> > cpuinfo_cur_freq provides higher precision though at a cost of being
> > rather expensive. Moreover, the information retrieved via this attribute
> > is somewhat short lived as frequency can change at any point of time
> > making it difficult to reason from.
> >
> > scaling_cur_freq, on the other hand, tends to be less accurate but then
> > the actual level of precision (and source of information) varies between
> > architectures making it a bit ambiguous.
> >
> > The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
> > distinct interface, exposing an average frequency of a given CPU(s), as
> > reported by the hardware, over a time frame spanning no more than a few
> > milliseconds. As it requires appropriate hardware support, this
> > interface is optional.
> >
> > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> > ---
> > Documentation/admin-guide/pm/cpufreq.rst | 10 ++++++++
> > drivers/cpufreq/cpufreq.c | 31 ++++++++++++++++++++++++
> > include/linux/cpufreq.h | 1 +
> > 3 files changed, 42 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
> > index fe1be4ad88cb..2204d6132c05 100644
> > --- a/Documentation/admin-guide/pm/cpufreq.rst
> > +++ b/Documentation/admin-guide/pm/cpufreq.rst
> > @@ -248,6 +248,16 @@ are the following:
> > If that frequency cannot be determined, this attribute should not
> > be present.
> >
> > +``cpuinfo_avg_freq``
> > + An average frequency (in KHz) of all CPUs belonging to a given policy,
> > + derived from a hardware provided feedback and reported on a time frame
> > + spanning at most few milliseconds.
>
> I don't think it's necessary to put the 'at most few milliseconds'
> limitation on.
>
> It's supposed to be fine for other platforms to implement the interface
> with a longer time period, e.g. a few seconds, in the future. Otherwise,
> this would probably force the implementation of 'cpuinfo_avg_freq' to be
> binded with the 'scale freq tick' stuff.
Actually the sched_tick was intentionally omitted from the description
to avoid associating one with another.
Not really sure how useful it would be to have a longer time-frames for the
average frequency though.
It is still intended to be rather accurate - thus the 'at most few
milliseconds' statement. Extending that period reduces the accuracy.
If we allow that - meaning getting average frequency over different time-frame
spans , we introduce yet again platform specific behaviour for common interface,
which might not be that desired.
>
> > +
> > + This is expected to be based on the frequency the hardware actually runs
> > + at and, as such, might require specialised hardware support (such as AMU
> > + extension on ARM). If one cannot be determined, this attribute should
> > + not be present.
> > +
> > ``cpuinfo_max_freq``
> > Maximum possible operating frequency the CPUs belonging to this policy
> > can run at (in kHz).
>
> ...
>
> > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > index d4d2f4d1d7cb..48262073707e 100644
> > --- a/include/linux/cpufreq.h
> > +++ b/include/linux/cpufreq.h
> > @@ -1195,6 +1195,7 @@ static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_
> > #endif
> >
> > extern unsigned int arch_freq_get_on_cpu(int cpu);
> > +extern int arch_freq_avg_get_on_cpu(int cpu);
>
> It's werid to have two different functions with mostly the same behaviour,
> i.e. arch_freq_get_on_cpu() and arch_freq_avg_get_on_cpu().
>
> Appreciated that there would be some capatibility work with x86 at the
> moment if merging them, e.g. return type, default implementation, impact on
> some userspace tools, etc.
The intention here was indeed to have a clean distinction between the two.
>
> Anyhow, are they supposed to be merged in the near future?
That depends on any further comments on that new sysfs attribute I guess.
---
Thanks
Beata
>
>
> Thanks,
> Jie
> >
> > #ifndef arch_set_freq_scale
> > static __always_inline
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 3/4] arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu
2024-09-26 10:34 ` Beata Michalska
@ 2024-09-26 23:21 ` Vanshidhar Konda
2024-10-03 21:39 ` Beata Michalska
0 siblings, 1 reply; 27+ messages in thread
From: Vanshidhar Konda @ 2024-09-26 23:21 UTC (permalink / raw)
To: Beata Michalska
Cc: Sumit Gupta, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
viresh.kumar, yang, lihuisong, zhanjie9, linux-tegra, Bibek Basu
On Thu, Sep 26, 2024 at 12:34:01PM GMT, Beata Michalska wrote:
>On Tue, Sep 17, 2024 at 05:41:09PM +0530, Sumit Gupta wrote:
>> Hi Beata,
>Hi Sumit,
>>
>> Thank you for the patches.
>Thank you for having a look at those.
>>
>> On 13/09/24 18:59, Beata Michalska wrote:
>> > External email: Use caution opening links or attachments
>> >
>> >
>> > With the Frequency Invariance Engine (FIE) being already wired up with
>> > sched tick and making use of relevant (core counter and constant
>> > counter) AMU counters, getting the average frequency for a given CPU,
>> > can be achieved by utilizing the frequency scale factor which reflects
>> > an average CPU frequency for the last tick period length.
>> >
>> > The solution is partially based on APERF/MPERF implementation of
>> > arch_freq_get_on_cpu.
>> >
>> > Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
>> > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
>> > ---
>> > arch/arm64/kernel/topology.c | 109 +++++++++++++++++++++++++++++++----
>> > 1 file changed, 99 insertions(+), 10 deletions(-)
>> >
>> > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> > index cb180684d10d..22e510733336 100644
>> > --- a/arch/arm64/kernel/topology.c
>> > +++ b/arch/arm64/kernel/topology.c
>> > @@ -17,6 +17,7 @@
>> > #include <linux/cpufreq.h>
>> > #include <linux/init.h>
>> > #include <linux/percpu.h>
>> > +#include <linux/sched/isolation.h>
>> >
>> > #include <asm/cpu.h>
>> > #include <asm/cputype.h>
>> > @@ -88,18 +89,28 @@ int __init parse_acpi_topology(void)
>> > * initialized.
>> > */
>> > static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = 1UL << (2 * SCHED_CAPACITY_SHIFT);
>> > -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
>> > -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
>> > static cpumask_var_t amu_fie_cpus;
>> >
>> > +struct amu_cntr_sample {
>> > + u64 arch_const_cycles_prev;
>> > + u64 arch_core_cycles_prev;
>> > + unsigned long last_scale_update;
>> > +};
>> > +
>> > +static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_cntr_sample, cpu_amu_samples);
>> > +
>> > void update_freq_counters_refs(void)
>> > {
>> > - this_cpu_write(arch_core_cycles_prev, read_corecnt());
>> > - this_cpu_write(arch_const_cycles_prev, read_constcnt());
>> > + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
>> > +
>> > + amu_sample->arch_core_cycles_prev = read_corecnt();
>> > + amu_sample->arch_const_cycles_prev = read_constcnt();
>> > }
>> >
>> > static inline bool freq_counters_valid(int cpu)
>> > {
>> > + struct amu_cntr_sample *amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
>> > +
>> > if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
>> > return false;
>> >
>> > @@ -108,8 +119,8 @@ static inline bool freq_counters_valid(int cpu)
>> > return false;
>> > }
>> >
>> > - if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
>> > - !per_cpu(arch_core_cycles_prev, cpu))) {
>> > + if (unlikely(!amu_sample->arch_const_cycles_prev ||
>> > + !amu_sample->arch_core_cycles_prev)) {
>> > pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
>> > return false;
>> > }
>> > @@ -152,17 +163,22 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
>> >
>> > static void amu_scale_freq_tick(void)
>> > {
>> > + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
>> > u64 prev_core_cnt, prev_const_cnt;
>> > u64 core_cnt, const_cnt, scale;
>> >
>> > - prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
>> > - prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
>> > + prev_const_cnt = amu_sample->arch_const_cycles_prev;
>> > + prev_core_cnt = amu_sample->arch_core_cycles_prev;
>> >
>> > update_freq_counters_refs();
>> >
>> > - const_cnt = this_cpu_read(arch_const_cycles_prev);
>> > - core_cnt = this_cpu_read(arch_core_cycles_prev);
>> > + const_cnt = amu_sample->arch_const_cycles_prev;
>> > + core_cnt = amu_sample->arch_core_cycles_prev;
>> >
>> > + /*
>> > + * This should not happen unless the AMUs have been reset and the
>> > + * counter values have not been restored - unlikely
>> > + */
>> > if (unlikely(core_cnt <= prev_core_cnt ||
>> > const_cnt <= prev_const_cnt))
>> > return;
>> > @@ -182,6 +198,8 @@ static void amu_scale_freq_tick(void)
>> >
>> > scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
>> > this_cpu_write(arch_freq_scale, (unsigned long)scale);
>> > +
>> > + amu_sample->last_scale_update = jiffies;
>> > }
>> >
>> > static struct scale_freq_data amu_sfd = {
>> > @@ -189,6 +207,77 @@ static struct scale_freq_data amu_sfd = {
>> > .set_freq_scale = amu_scale_freq_tick,
>> > };
>> >
>> > +static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
>> > +{
>> > + return cpumask_available(amu_fie_cpus) &&
>> > + cpumask_test_cpu(cpu, amu_fie_cpus);
>> > +}
>> > +
>> > +#define AMU_SAMPLE_EXP_MS 20
>> > +
>> > +int arch_freq_avg_get_on_cpu(int cpu)
>> > +{
>> > + struct amu_cntr_sample *amu_sample;
>> > + unsigned int start_cpu = cpu;
>> > + unsigned long last_update;
>> > + unsigned int freq = 0;
>> > + u64 scale;
>> > +
>> > + if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
>> > + return -EOPNOTSUPP;
>> > +
>> > +retry:
>> > + amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
>> > +
>> > + last_update = amu_sample->last_scale_update;
>> > +
>> > + /*
>> > + * For those CPUs that are in full dynticks mode, and those that have
>> 'or those' to match with if condition?
>Yeah, might be.
>>
>> > + * not seen tick for a while, try an alternative source for the counters
>> > + * (and thus freq scale), if available, for given policy: this boils
>> > + * down to identifying an active cpu within the same freq domain, if any.
>> > + */
>> > + if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
>> > + time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
>> > + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
>> > + int ref_cpu = cpu;
>> > +
>> > + if (!policy)
>> > + return 0;
>> > +
>>
>> We can skip the rest of code if policy has a single cpu. AFAIR, one of the
>> previous versions had similar check.
>>
>> if (!policy_is_shared(policy)) {
>> cpufreq_cpu_put(policy);
>> goto freq_comput;
>> }
>True, we could but then this case is covered by cpumask_next_wrap
>which for single-cpu policies will render the ref_cpu invalid,
>so policy_is_shared check seemed unnecessary.
>>
>> > + if (!cpumask_intersects(policy->related_cpus,
>> > + housekeeping_cpumask(HK_TYPE_TICK))) {
>> > + cpufreq_cpu_put(policy);
>> > + return -EOPNOTSUPP;
>> > + }
>> > +
>> > +
>> > + do {
>> > + ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
>> > + start_cpu, false);
>> > +
>> > + } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
>> > +
>> > + cpufreq_cpu_put(policy);
>> > +
>> > + if (ref_cpu >= nr_cpu_ids)
>> > + /* No alternative to pull info from */
>> > + return 0;
>> > +
>>
>> The 'cpuinfo_avg_freq' node gives 'unknown' value for single CPU per policy
>> as 'ref_cpu' increments to 'nr_cpu_ids'. We can use the same CPU instead of
>> returning zero if no alternative CPU.
>>
>> # cat /sys/devices/system/cpu/cpu2/cpufreq/cpuinfo_avg_freq
>> <unknown>
>>
>> ----
>> if (ref_cpu >= nr_cpu_ids)
>> /* Use same CPU if no alternative to pull info from */
>> goto freq_comput;
>>
>> ..
>> freq_comput:
>> scale = arch_scale_freq_capacity(cpu);
>> freq = scale * arch_scale_freq_ref(cpu);
>> ----
>>
>This boils down to the question what that function, and the information it
>provides, represent really. The 'unknown' here simply says the CPU has been idle
>for a while and as such the frequency data is a bit stale and it does not
>represent the average freq the CPU is actually running at anymore, which is
>the intention here really. Or, that the given CPU is a non-housekeeping one.
>Either way I believe this is a useful information, instead of providing
>stale data with no indication on whether the frequency is really the 'current'
>one or not.
>
>If that is somehow undesirable we can discuss this further, though I'd rather
>avoid exposing an interface where the feedback provided is open to
>interpretation at all times.
Would it make sense to identify that the frequency reporting is unknown due to
cpu being idle vs some other issue like being a non-housekeeping CPU? Would
returning a value of 0 make it easier for tools to represent that the CPU is
currently idle?
Thanks,
Vanshidhar
>
>---
>Best Regards
>Beata
>> Thank you,
>> Sumit Gupta
>>
>> P.S. Will be on afk for next 2 weeks with no access to email. Please expect
>> a delay in response.
>>
>> > + cpu = ref_cpu;
>> > + goto retry;
>> > + }
>> > + /*
>> > + * Reversed computation to the one used to determine
>> > + * the arch_freq_scale value
>> > + * (see amu_scale_freq_tick for details)
>> > + */
>> > + scale = arch_scale_freq_capacity(cpu);
>> > + freq = scale * arch_scale_freq_ref(cpu);
>> > + freq >>= SCHED_CAPACITY_SHIFT;
>> > + return freq;
>> > +}
>> > +
>>
>> > static void amu_fie_setup(const struct cpumask *cpus)
>> > {
>> > int cpu;
>> > --
>> > 2.25.1
>> >
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 3/4] arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu
2024-09-26 23:21 ` Vanshidhar Konda
@ 2024-10-03 21:39 ` Beata Michalska
2024-10-03 21:54 ` Vanshidhar Konda
0 siblings, 1 reply; 27+ messages in thread
From: Beata Michalska @ 2024-10-03 21:39 UTC (permalink / raw)
To: Vanshidhar Konda
Cc: Sumit Gupta, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
viresh.kumar, yang, lihuisong, zhanjie9, linux-tegra, Bibek Basu
On Thu, Sep 26, 2024 at 04:21:14PM -0700, Vanshidhar Konda wrote:
> On Thu, Sep 26, 2024 at 12:34:01PM GMT, Beata Michalska wrote:
> > On Tue, Sep 17, 2024 at 05:41:09PM +0530, Sumit Gupta wrote:
> > > Hi Beata,
> > Hi Sumit,
> > >
> > > Thank you for the patches.
> > Thank you for having a look at those.
> > >
> > > On 13/09/24 18:59, Beata Michalska wrote:
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > With the Frequency Invariance Engine (FIE) being already wired up with
> > > > sched tick and making use of relevant (core counter and constant
> > > > counter) AMU counters, getting the average frequency for a given CPU,
> > > > can be achieved by utilizing the frequency scale factor which reflects
> > > > an average CPU frequency for the last tick period length.
> > > >
> > > > The solution is partially based on APERF/MPERF implementation of
> > > > arch_freq_get_on_cpu.
> > > >
> > > > Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
> > > > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> > > > ---
> > > > arch/arm64/kernel/topology.c | 109 +++++++++++++++++++++++++++++++----
> > > > 1 file changed, 99 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > > > index cb180684d10d..22e510733336 100644
> > > > --- a/arch/arm64/kernel/topology.c
> > > > +++ b/arch/arm64/kernel/topology.c
> > > > @@ -17,6 +17,7 @@
> > > > #include <linux/cpufreq.h>
> > > > #include <linux/init.h>
> > > > #include <linux/percpu.h>
> > > > +#include <linux/sched/isolation.h>
> > > >
> > > > #include <asm/cpu.h>
> > > > #include <asm/cputype.h>
> > > > @@ -88,18 +89,28 @@ int __init parse_acpi_topology(void)
> > > > * initialized.
> > > > */
> > > > static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = 1UL << (2 * SCHED_CAPACITY_SHIFT);
> > > > -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> > > > -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> > > > static cpumask_var_t amu_fie_cpus;
> > > >
> > > > +struct amu_cntr_sample {
> > > > + u64 arch_const_cycles_prev;
> > > > + u64 arch_core_cycles_prev;
> > > > + unsigned long last_scale_update;
> > > > +};
> > > > +
> > > > +static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_cntr_sample, cpu_amu_samples);
> > > > +
> > > > void update_freq_counters_refs(void)
> > > > {
> > > > - this_cpu_write(arch_core_cycles_prev, read_corecnt());
> > > > - this_cpu_write(arch_const_cycles_prev, read_constcnt());
> > > > + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> > > > +
> > > > + amu_sample->arch_core_cycles_prev = read_corecnt();
> > > > + amu_sample->arch_const_cycles_prev = read_constcnt();
> > > > }
> > > >
> > > > static inline bool freq_counters_valid(int cpu)
> > > > {
> > > > + struct amu_cntr_sample *amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> > > > +
> > > > if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
> > > > return false;
> > > >
> > > > @@ -108,8 +119,8 @@ static inline bool freq_counters_valid(int cpu)
> > > > return false;
> > > > }
> > > >
> > > > - if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
> > > > - !per_cpu(arch_core_cycles_prev, cpu))) {
> > > > + if (unlikely(!amu_sample->arch_const_cycles_prev ||
> > > > + !amu_sample->arch_core_cycles_prev)) {
> > > > pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
> > > > return false;
> > > > }
> > > > @@ -152,17 +163,22 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
> > > >
> > > > static void amu_scale_freq_tick(void)
> > > > {
> > > > + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> > > > u64 prev_core_cnt, prev_const_cnt;
> > > > u64 core_cnt, const_cnt, scale;
> > > >
> > > > - prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
> > > > - prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
> > > > + prev_const_cnt = amu_sample->arch_const_cycles_prev;
> > > > + prev_core_cnt = amu_sample->arch_core_cycles_prev;
> > > >
> > > > update_freq_counters_refs();
> > > >
> > > > - const_cnt = this_cpu_read(arch_const_cycles_prev);
> > > > - core_cnt = this_cpu_read(arch_core_cycles_prev);
> > > > + const_cnt = amu_sample->arch_const_cycles_prev;
> > > > + core_cnt = amu_sample->arch_core_cycles_prev;
> > > >
> > > > + /*
> > > > + * This should not happen unless the AMUs have been reset and the
> > > > + * counter values have not been restored - unlikely
> > > > + */
> > > > if (unlikely(core_cnt <= prev_core_cnt ||
> > > > const_cnt <= prev_const_cnt))
> > > > return;
> > > > @@ -182,6 +198,8 @@ static void amu_scale_freq_tick(void)
> > > >
> > > > scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
> > > > this_cpu_write(arch_freq_scale, (unsigned long)scale);
> > > > +
> > > > + amu_sample->last_scale_update = jiffies;
> > > > }
> > > >
> > > > static struct scale_freq_data amu_sfd = {
> > > > @@ -189,6 +207,77 @@ static struct scale_freq_data amu_sfd = {
> > > > .set_freq_scale = amu_scale_freq_tick,
> > > > };
> > > >
> > > > +static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
> > > > +{
> > > > + return cpumask_available(amu_fie_cpus) &&
> > > > + cpumask_test_cpu(cpu, amu_fie_cpus);
> > > > +}
> > > > +
> > > > +#define AMU_SAMPLE_EXP_MS 20
> > > > +
> > > > +int arch_freq_avg_get_on_cpu(int cpu)
> > > > +{
> > > > + struct amu_cntr_sample *amu_sample;
> > > > + unsigned int start_cpu = cpu;
> > > > + unsigned long last_update;
> > > > + unsigned int freq = 0;
> > > > + u64 scale;
> > > > +
> > > > + if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
> > > > + return -EOPNOTSUPP;
> > > > +
> > > > +retry:
> > > > + amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> > > > +
> > > > + last_update = amu_sample->last_scale_update;
> > > > +
> > > > + /*
> > > > + * For those CPUs that are in full dynticks mode, and those that have
> > > 'or those' to match with if condition?
> > Yeah, might be.
> > >
> > > > + * not seen tick for a while, try an alternative source for the counters
> > > > + * (and thus freq scale), if available, for given policy: this boils
> > > > + * down to identifying an active cpu within the same freq domain, if any.
> > > > + */
> > > > + if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
> > > > + time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
> > > > + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> > > > + int ref_cpu = cpu;
> > > > +
> > > > + if (!policy)
> > > > + return 0;
> > > > +
> > >
> > > We can skip the rest of code if policy has a single cpu. AFAIR, one of the
> > > previous versions had similar check.
> > >
> > > if (!policy_is_shared(policy)) {
> > > cpufreq_cpu_put(policy);
> > > goto freq_comput;
> > > }
> > True, we could but then this case is covered by cpumask_next_wrap
> > which for single-cpu policies will render the ref_cpu invalid,
> > so policy_is_shared check seemed unnecessary.
> > >
> > > > + if (!cpumask_intersects(policy->related_cpus,
> > > > + housekeeping_cpumask(HK_TYPE_TICK))) {
> > > > + cpufreq_cpu_put(policy);
> > > > + return -EOPNOTSUPP;
> > > > + }
> > > > +
> > > > +
> > > > + do {
> > > > + ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
> > > > + start_cpu, false);
> > > > +
> > > > + } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
> > > > +
> > > > + cpufreq_cpu_put(policy);
> > > > +
> > > > + if (ref_cpu >= nr_cpu_ids)
> > > > + /* No alternative to pull info from */
> > > > + return 0;
> > > > +
> > >
> > > The 'cpuinfo_avg_freq' node gives 'unknown' value for single CPU per policy
> > > as 'ref_cpu' increments to 'nr_cpu_ids'. We can use the same CPU instead of
> > > returning zero if no alternative CPU.
> > >
> > > # cat /sys/devices/system/cpu/cpu2/cpufreq/cpuinfo_avg_freq
> > > <unknown>
> > >
> > > ----
> > > if (ref_cpu >= nr_cpu_ids)
> > > /* Use same CPU if no alternative to pull info from */
> > > goto freq_comput;
> > >
> > > ..
> > > freq_comput:
> > > scale = arch_scale_freq_capacity(cpu);
> > > freq = scale * arch_scale_freq_ref(cpu);
> > > ----
> > >
> > This boils down to the question what that function, and the information it
> > provides, represent really. The 'unknown' here simply says the CPU has been idle
> > for a while and as such the frequency data is a bit stale and it does not
> > represent the average freq the CPU is actually running at anymore, which is
> > the intention here really. Or, that the given CPU is a non-housekeeping one.
> > Either way I believe this is a useful information, instead of providing
> > stale data with no indication on whether the frequency is really the 'current'
> > one or not.
> >
> > If that is somehow undesirable we can discuss this further, though I'd rather
> > avoid exposing an interface where the feedback provided is open to
> > interpretation at all times.
>
> Would it make sense to identify that the frequency reporting is unknown due to
> cpu being idle vs some other issue like being a non-housekeeping CPU? Would
> returning a value of 0 make it easier for tools to represent that the CPU is
> currently idle?
That is an option.
Another one would be to return an error for those cases. This would make it
easier to distinguish between valid frequency &/| idle CPU vs tickless CPU
(EINVAL vs ENOENT) ?
---
BR
Beata
>
> Thanks,
> Vanshidhar
>
> >
> > ---
> > Best Regards
> > Beata
> > > Thank you,
> > > Sumit Gupta
> > >
> > > P.S. Will be on afk for next 2 weeks with no access to email. Please expect
> > > a delay in response.
> > >
> > > > + cpu = ref_cpu;
> > > > + goto retry;
> > > > + }
> > > > + /*
> > > > + * Reversed computation to the one used to determine
> > > > + * the arch_freq_scale value
> > > > + * (see amu_scale_freq_tick for details)
> > > > + */
> > > > + scale = arch_scale_freq_capacity(cpu);
> > > > + freq = scale * arch_scale_freq_ref(cpu);
> > > > + freq >>= SCHED_CAPACITY_SHIFT;
> > > > + return freq;
> > > > +}
> > > > +
> > >
> > > > static void amu_fie_setup(const struct cpumask *cpus)
> > > > {
> > > > int cpu;
> > > > --
> > > > 2.25.1
> > > >
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 3/4] arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu
2024-10-03 21:39 ` Beata Michalska
@ 2024-10-03 21:54 ` Vanshidhar Konda
2024-10-10 11:08 ` Beata Michalska
0 siblings, 1 reply; 27+ messages in thread
From: Vanshidhar Konda @ 2024-10-03 21:54 UTC (permalink / raw)
To: Beata Michalska
Cc: Sumit Gupta, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
viresh.kumar, yang, lihuisong, zhanjie9, linux-tegra, Bibek Basu
On Thu, Oct 03, 2024 at 11:39:54PM GMT, Beata Michalska wrote:
>On Thu, Sep 26, 2024 at 04:21:14PM -0700, Vanshidhar Konda wrote:
>> On Thu, Sep 26, 2024 at 12:34:01PM GMT, Beata Michalska wrote:
>> > On Tue, Sep 17, 2024 at 05:41:09PM +0530, Sumit Gupta wrote:
>> > > Hi Beata,
>> > Hi Sumit,
>> > >
>> > > Thank you for the patches.
>> > Thank you for having a look at those.
>> > >
>> > > On 13/09/24 18:59, Beata Michalska wrote:
>> > > > External email: Use caution opening links or attachments
>> > > >
>> > > >
>> > > > With the Frequency Invariance Engine (FIE) being already wired up with
>> > > > sched tick and making use of relevant (core counter and constant
>> > > > counter) AMU counters, getting the average frequency for a given CPU,
>> > > > can be achieved by utilizing the frequency scale factor which reflects
>> > > > an average CPU frequency for the last tick period length.
>> > > >
>> > > > The solution is partially based on APERF/MPERF implementation of
>> > > > arch_freq_get_on_cpu.
>> > > >
>> > > > Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
>> > > > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
>> > > > ---
>> > > > arch/arm64/kernel/topology.c | 109 +++++++++++++++++++++++++++++++----
>> > > > 1 file changed, 99 insertions(+), 10 deletions(-)
>> > > >
--- snip ----
>> > >
>> > > ..
>> > > freq_comput:
>> > > scale = arch_scale_freq_capacity(cpu);
>> > > freq = scale * arch_scale_freq_ref(cpu);
>> > > ----
>> > >
>> > This boils down to the question what that function, and the information it
>> > provides, represent really. The 'unknown' here simply says the CPU has been idle
>> > for a while and as such the frequency data is a bit stale and it does not
>> > represent the average freq the CPU is actually running at anymore, which is
>> > the intention here really. Or, that the given CPU is a non-housekeeping one.
>> > Either way I believe this is a useful information, instead of providing
>> > stale data with no indication on whether the frequency is really the 'current'
>> > one or not.
>> >
>> > If that is somehow undesirable we can discuss this further, though I'd rather
>> > avoid exposing an interface where the feedback provided is open to
>> > interpretation at all times.
>>
>> Would it make sense to identify that the frequency reporting is unknown due to
>> cpu being idle vs some other issue like being a non-housekeeping CPU? Would
>> returning a value of 0 make it easier for tools to represent that the CPU is
>> currently idle?
>That is an option.
>Another one would be to return an error for those cases. This would make it
>easier to distinguish between valid frequency &/| idle CPU vs tickless CPU
>(EINVAL vs ENOENT) ?
>
That seems like a good idea but I suspect it would be confusing to the end user.
If a user runs `cat /sys/devices/system/cpu/cpu2/cpuinfo_avg_freq` they would
get an error in some cases or get a number in some other iterations.
Thanks,
Vanshidhar
>---
>BR
>Beata
>>
>> Thanks,
>> Vanshidhar
>>
>> >
>> > ---
>> > Best Regards
>> > Beata
>> > > Thank you,
>> > > Sumit Gupta
>> > >
>> > > P.S. Will be on afk for next 2 weeks with no access to email. Please expect
>> > > a delay in response.
>> > >
>> > > > + cpu = ref_cpu;
>> > > > + goto retry;
>> > > > + }
>> > > > + /*
>> > > > + * Reversed computation to the one used to determine
>> > > > + * the arch_freq_scale value
>> > > > + * (see amu_scale_freq_tick for details)
>> > > > + */
>> > > > + scale = arch_scale_freq_capacity(cpu);
>> > > > + freq = scale * arch_scale_freq_ref(cpu);
>> > > > + freq >>= SCHED_CAPACITY_SHIFT;
>> > > > + return freq;
>> > > > +}
>> > > > +
>> > >
>> > > > static void amu_fie_setup(const struct cpumask *cpus)
>> > > > {
>> > > > int cpu;
>> > > > --
>> > > > 2.25.1
>> > > >
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 3/4] arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu
2024-10-03 21:54 ` Vanshidhar Konda
@ 2024-10-10 11:08 ` Beata Michalska
2024-10-11 16:29 ` Vanshidhar Konda
2024-10-29 6:53 ` Viresh Kumar
0 siblings, 2 replies; 27+ messages in thread
From: Beata Michalska @ 2024-10-10 11:08 UTC (permalink / raw)
To: Vanshidhar Konda, viresh.kumar
Cc: Sumit Gupta, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
viresh.kumar, yang, lihuisong, zhanjie9, linux-tegra, Bibek Basu
On Thu, Oct 03, 2024 at 02:54:22PM -0700, Vanshidhar Konda wrote:
> On Thu, Oct 03, 2024 at 11:39:54PM GMT, Beata Michalska wrote:
> > On Thu, Sep 26, 2024 at 04:21:14PM -0700, Vanshidhar Konda wrote:
> > > On Thu, Sep 26, 2024 at 12:34:01PM GMT, Beata Michalska wrote:
> > > > On Tue, Sep 17, 2024 at 05:41:09PM +0530, Sumit Gupta wrote:
> > > > > Hi Beata,
> > > > Hi Sumit,
> > > > >
> > > > > Thank you for the patches.
> > > > Thank you for having a look at those.
> > > > >
> > > > > On 13/09/24 18:59, Beata Michalska wrote:
> > > > > > External email: Use caution opening links or attachments
> > > > > >
> > > > > >
> > > > > > With the Frequency Invariance Engine (FIE) being already wired up with
> > > > > > sched tick and making use of relevant (core counter and constant
> > > > > > counter) AMU counters, getting the average frequency for a given CPU,
> > > > > > can be achieved by utilizing the frequency scale factor which reflects
> > > > > > an average CPU frequency for the last tick period length.
> > > > > >
> > > > > > The solution is partially based on APERF/MPERF implementation of
> > > > > > arch_freq_get_on_cpu.
> > > > > >
> > > > > > Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
> > > > > > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> > > > > > ---
> > > > > > arch/arm64/kernel/topology.c | 109 +++++++++++++++++++++++++++++++----
> > > > > > 1 file changed, 99 insertions(+), 10 deletions(-)
> > > > > >
>
> --- snip ----
>
> > > > >
> > > > > ..
> > > > > freq_comput:
> > > > > scale = arch_scale_freq_capacity(cpu);
> > > > > freq = scale * arch_scale_freq_ref(cpu);
> > > > > ----
> > > > >
> > > > This boils down to the question what that function, and the information it
> > > > provides, represent really. The 'unknown' here simply says the CPU has been idle
> > > > for a while and as such the frequency data is a bit stale and it does not
> > > > represent the average freq the CPU is actually running at anymore, which is
> > > > the intention here really. Or, that the given CPU is a non-housekeeping one.
> > > > Either way I believe this is a useful information, instead of providing
> > > > stale data with no indication on whether the frequency is really the 'current'
> > > > one or not.
> > > >
> > > > If that is somehow undesirable we can discuss this further, though I'd rather
> > > > avoid exposing an interface where the feedback provided is open to
> > > > interpretation at all times.
> > >
> > > Would it make sense to identify that the frequency reporting is unknown due to
> > > cpu being idle vs some other issue like being a non-housekeeping CPU? Would
> > > returning a value of 0 make it easier for tools to represent that the CPU is
> > > currently idle?
> > That is an option.
> > Another one would be to return an error for those cases. This would make it
> > easier to distinguish between valid frequency &/| idle CPU vs tickless CPU
> > (EINVAL vs ENOENT) ?
> >
>
> That seems like a good idea but I suspect it would be confusing to the end user.
>
> If a user runs `cat /sys/devices/system/cpu/cpu2/cpuinfo_avg_freq` they would
> get an error in some cases or get a number in some other iterations.
>
That is a fair point but I am not entirely convinced using '0' instead makes
things any more clearer as this is in no way a valid CPU frequency.
As long as we document the expected behaviour keeping the interface well
defined, both options should be fine I guess.
@Viresh: what is your opinion on that one ?
---
BR
Beata
> Thanks,
> Vanshidhar
>
> > ---
> > BR
> > Beata
> > >
> > > Thanks,
> > > Vanshidhar
> > >
> > > >
> > > > ---
> > > > Best Regards
> > > > Beata
> > > > > Thank you,
> > > > > Sumit Gupta
> > > > >
> > > > > P.S. Will be on afk for next 2 weeks with no access to email. Please expect
> > > > > a delay in response.
> > > > >
> > > > > > + cpu = ref_cpu;
> > > > > > + goto retry;
> > > > > > + }
> > > > > > + /*
> > > > > > + * Reversed computation to the one used to determine
> > > > > > + * the arch_freq_scale value
> > > > > > + * (see amu_scale_freq_tick for details)
> > > > > > + */
> > > > > > + scale = arch_scale_freq_capacity(cpu);
> > > > > > + freq = scale * arch_scale_freq_ref(cpu);
> > > > > > + freq >>= SCHED_CAPACITY_SHIFT;
> > > > > > + return freq;
> > > > > > +}
> > > > > > +
> > > > >
> > > > > > static void amu_fie_setup(const struct cpumask *cpus)
> > > > > > {
> > > > > > int cpu;
> > > > > > --
> > > > > > 2.25.1
> > > > > >
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 3/4] arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu
2024-10-10 11:08 ` Beata Michalska
@ 2024-10-11 16:29 ` Vanshidhar Konda
2024-10-14 17:46 ` Sumit Gupta
2024-10-29 6:53 ` Viresh Kumar
1 sibling, 1 reply; 27+ messages in thread
From: Vanshidhar Konda @ 2024-10-11 16:29 UTC (permalink / raw)
To: Beata Michalska
Cc: viresh.kumar, Sumit Gupta, linux-kernel, linux-arm-kernel,
linux-pm, ionela.voinescu, sudeep.holla, will, catalin.marinas,
rafael, yang, lihuisong, zhanjie9, linux-tegra, Bibek Basu
On Thu, Oct 10, 2024 at 01:08:23PM GMT, Beata Michalska wrote:
>On Thu, Oct 03, 2024 at 02:54:22PM -0700, Vanshidhar Konda wrote:
>> On Thu, Oct 03, 2024 at 11:39:54PM GMT, Beata Michalska wrote:
>> > On Thu, Sep 26, 2024 at 04:21:14PM -0700, Vanshidhar Konda wrote:
>> > > On Thu, Sep 26, 2024 at 12:34:01PM GMT, Beata Michalska wrote:
>> > > > On Tue, Sep 17, 2024 at 05:41:09PM +0530, Sumit Gupta wrote:
>> > > > > Hi Beata,
>> > > > Hi Sumit,
>> > > > >
>> > > > > Thank you for the patches.
>> > > > Thank you for having a look at those.
>> > > > >
>> > > > > On 13/09/24 18:59, Beata Michalska wrote:
>> > > > > > External email: Use caution opening links or attachments
>> > > > > >
>> > > > > >
>> > > > > > With the Frequency Invariance Engine (FIE) being already wired up with
>> > > > > > sched tick and making use of relevant (core counter and constant
>> > > > > > counter) AMU counters, getting the average frequency for a given CPU,
>> > > > > > can be achieved by utilizing the frequency scale factor which reflects
>> > > > > > an average CPU frequency for the last tick period length.
>> > > > > >
>> > > > > > The solution is partially based on APERF/MPERF implementation of
>> > > > > > arch_freq_get_on_cpu.
>> > > > > >
>> > > > > > Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
>> > > > > > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
>> > > > > > ---
>> > > > > > arch/arm64/kernel/topology.c | 109 +++++++++++++++++++++++++++++++----
>> > > > > > 1 file changed, 99 insertions(+), 10 deletions(-)
>> > > > > >
>>
>> --- snip ----
>>
>> > > > >
>> > > > > ..
>> > > > > freq_comput:
>> > > > > scale = arch_scale_freq_capacity(cpu);
>> > > > > freq = scale * arch_scale_freq_ref(cpu);
>> > > > > ----
>> > > > >
>> > > > This boils down to the question what that function, and the information it
>> > > > provides, represent really. The 'unknown' here simply says the CPU has been idle
>> > > > for a while and as such the frequency data is a bit stale and it does not
>> > > > represent the average freq the CPU is actually running at anymore, which is
>> > > > the intention here really. Or, that the given CPU is a non-housekeeping one.
>> > > > Either way I believe this is a useful information, instead of providing
>> > > > stale data with no indication on whether the frequency is really the 'current'
>> > > > one or not.
>> > > >
>> > > > If that is somehow undesirable we can discuss this further, though I'd rather
>> > > > avoid exposing an interface where the feedback provided is open to
>> > > > interpretation at all times.
>> > >
>> > > Would it make sense to identify that the frequency reporting is unknown due to
>> > > cpu being idle vs some other issue like being a non-housekeeping CPU? Would
>> > > returning a value of 0 make it easier for tools to represent that the CPU is
>> > > currently idle?
>> > That is an option.
>> > Another one would be to return an error for those cases. This would make it
>> > easier to distinguish between valid frequency &/| idle CPU vs tickless CPU
>> > (EINVAL vs ENOENT) ?
>> >
>>
>> That seems like a good idea but I suspect it would be confusing to the end user.
>>
>> If a user runs `cat /sys/devices/system/cpu/cpu2/cpuinfo_avg_freq` they would
>> get an error in some cases or get a number in some other iterations.
>>
>That is a fair point but I am not entirely convinced using '0' instead makes
>things any more clearer as this is in no way a valid CPU frequency.
>As long as we document the expected behaviour keeping the interface well
>defined, both options should be fine I guess.
>
Another option could be to list out the reason as 'idle' or 'no entry' instead of
returning EINVAL or ENOENT. These wouldn't be valid values either but cat on the
sysfs node wouldn't return an error.
Thanks,
Vanshidhar
>@Viresh: what is your opinion on that one ?
>
>---
>BR
>Beata
>> Thanks,
>> Vanshidhar
>>
>> > ---
>> > BR
>> > Beata
>> > >
>> > > Thanks,
>> > > Vanshidhar
>> > >
>> > > >
>> > > > ---
>> > > > Best Regards
>> > > > Beata
>> > > > > Thank you,
>> > > > > Sumit Gupta
>> > > > >
>> > > > > P.S. Will be on afk for next 2 weeks with no access to email. Please expect
>> > > > > a delay in response.
>> > > > >
>> > > > > > + cpu = ref_cpu;
>> > > > > > + goto retry;
>> > > > > > + }
>> > > > > > + /*
>> > > > > > + * Reversed computation to the one used to determine
>> > > > > > + * the arch_freq_scale value
>> > > > > > + * (see amu_scale_freq_tick for details)
>> > > > > > + */
>> > > > > > + scale = arch_scale_freq_capacity(cpu);
>> > > > > > + freq = scale * arch_scale_freq_ref(cpu);
>> > > > > > + freq >>= SCHED_CAPACITY_SHIFT;
>> > > > > > + return freq;
>> > > > > > +}
>> > > > > > +
>> > > > >
>> > > > > > static void amu_fie_setup(const struct cpumask *cpus)
>> > > > > > {
>> > > > > > int cpu;
>> > > > > > --
>> > > > > > 2.25.1
>> > > > > >
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 3/4] arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu
2024-10-11 16:29 ` Vanshidhar Konda
@ 2024-10-14 17:46 ` Sumit Gupta
2024-10-16 20:45 ` Beata Michalska
0 siblings, 1 reply; 27+ messages in thread
From: Sumit Gupta @ 2024-10-14 17:46 UTC (permalink / raw)
To: Vanshidhar Konda, Beata Michalska
Cc: viresh.kumar, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
yang, lihuisong, zhanjie9, linux-tegra, Bibek Basu
On 11/10/24 21:59, Vanshidhar Konda wrote:
> External email: Use caution opening links or attachments
>
>
> On Thu, Oct 10, 2024 at 01:08:23PM GMT, Beata Michalska wrote:
>> On Thu, Oct 03, 2024 at 02:54:22PM -0700, Vanshidhar Konda wrote:
>>> On Thu, Oct 03, 2024 at 11:39:54PM GMT, Beata Michalska wrote:
>>> > On Thu, Sep 26, 2024 at 04:21:14PM -0700, Vanshidhar Konda wrote:
>>> > > On Thu, Sep 26, 2024 at 12:34:01PM GMT, Beata Michalska wrote:
>>> > > > On Tue, Sep 17, 2024 at 05:41:09PM +0530, Sumit Gupta wrote:
>>> > > > > Hi Beata,
>>> > > > Hi Sumit,
>>> > > > >
>>> > > > > Thank you for the patches.
>>> > > > Thank you for having a look at those.
>>> > > > >
>>> > > > > On 13/09/24 18:59, Beata Michalska wrote:
>>> > > > > > External email: Use caution opening links or attachments
>>> > > > > >
>>> > > > > >
>>> > > > > > With the Frequency Invariance Engine (FIE) being already
>>> wired up with
>>> > > > > > sched tick and making use of relevant (core counter and
>>> constant
>>> > > > > > counter) AMU counters, getting the average frequency for a
>>> given CPU,
>>> > > > > > can be achieved by utilizing the frequency scale factor
>>> which reflects
>>> > > > > > an average CPU frequency for the last tick period length.
>>> > > > > >
>>> > > > > > The solution is partially based on APERF/MPERF
>>> implementation of
>>> > > > > > arch_freq_get_on_cpu.
>>> > > > > >
>>> > > > > > Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
>>> > > > > > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
>>> > > > > > ---
>>> > > > > > arch/arm64/kernel/topology.c | 109
>>> +++++++++++++++++++++++++++++++----
>>> > > > > > 1 file changed, 99 insertions(+), 10 deletions(-)
>>> > > > > >
>>>
>>> --- snip ----
>>>
>>> > > > >
>>> > > > > ..
>>> > > > > freq_comput:
>>> > > > > scale = arch_scale_freq_capacity(cpu);
>>> > > > > freq = scale * arch_scale_freq_ref(cpu);
>>> > > > > ----
>>> > > > >
>>> > > > This boils down to the question what that function, and the
>>> information it
>>> > > > provides, represent really. The 'unknown' here simply says the
>>> CPU has been idle
>>> > > > for a while and as such the frequency data is a bit stale and
>>> it does not
>>> > > > represent the average freq the CPU is actually running at
>>> anymore, which is
>>> > > > the intention here really. Or, that the given CPU is a
>>> non-housekeeping one.
>>> > > > Either way I believe this is a useful information, instead of
>>> providing
>>> > > > stale data with no indication on whether the frequency is
>>> really the 'current'
>>> > > > one or not.
>>> > > >
>>> > > > If that is somehow undesirable we can discuss this further,
>>> though I'd rather
>>> > > > avoid exposing an interface where the feedback provided is open to
>>> > > > interpretation at all times.
>>> > >
>>> > > Would it make sense to identify that the frequency reporting is
>>> unknown due to
>>> > > cpu being idle vs some other issue like being a non-housekeeping
>>> CPU? Would
>>> > > returning a value of 0 make it easier for tools to represent that
>>> the CPU is
>>> > > currently idle?
>>> > That is an option.
>>> > Another one would be to return an error for those cases. This would
>>> make it
>>> > easier to distinguish between valid frequency &/| idle CPU vs
>>> tickless CPU
>>> > (EINVAL vs ENOENT) ?
>>> >
>>>
>>> That seems like a good idea but I suspect it would be confusing to
>>> the end user.
>>>
>>> If a user runs `cat /sys/devices/system/cpu/cpu2/cpuinfo_avg_freq`
>>> they would
>>> get an error in some cases or get a number in some other iterations.
>>>
>> That is a fair point but I am not entirely convinced using '0' instead
>> makes
>> things any more clearer as this is in no way a valid CPU frequency.
>> As long as we document the expected behaviour keeping the interface well
>> defined, both options should be fine I guess.
>>
>
> Another option could be to list out the reason as 'idle' or 'no entry'
> instead of
> returning EINVAL or ENOENT. These wouldn't be valid values either but
> cat on the
> sysfs node wouldn't return an error.
>
> Thanks,
> Vanshidhar
>
Ya, listing the clear reason sounds better.
Thank you,
Sumit Gupta
>> @Viresh: what is your opinion on that one ?
>>
>> ---
>> BR
>> Beata
>>> Thanks,
....
>>> > > > > > + cpu = ref_cpu;
>>> > > > > > + goto retry;
>>> > > > > > + }
>>> > > > > > + /*
>>> > > > > > + * Reversed computation to the one used to determine
>>> > > > > > + * the arch_freq_scale value
>>> > > > > > + * (see amu_scale_freq_tick for details)
>>> > > > > > + */
>>> > > > > > + scale = arch_scale_freq_capacity(cpu);
>>> > > > > > + freq = scale * arch_scale_freq_ref(cpu);
>>> > > > > > + freq >>= SCHED_CAPACITY_SHIFT;
>>> > > > > > + return freq;
>>> > > > > > +}
>>> > > > > > +
>>> > > > >
>>> > > > > > static void amu_fie_setup(const struct cpumask *cpus)
>>> > > > > > {
>>> > > > > > int cpu;
>>> > > > > > --
>>> > > > > > 2.25.1
>>> > > > > >
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 0/4] Add support for AArch64 AMUv1-based average freq
2024-09-13 13:29 [PATCH v7 0/4] Add support for AArch64 AMUv1-based average freq Beata Michalska
` (3 preceding siblings ...)
2024-09-13 13:29 ` [PATCH v7 4/4] arm64: Update AMU-based freq scale factor on entering idle Beata Michalska
@ 2024-10-16 10:59 ` Catalin Marinas
2024-10-16 20:51 ` Beata Michalska
4 siblings, 1 reply; 27+ messages in thread
From: Catalin Marinas @ 2024-10-16 10:59 UTC (permalink / raw)
To: Beata Michalska
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, rafael, viresh.kumar, sumitg, yang,
vanshikonda, lihuisong, zhanjie9
Hi Beata,
On Fri, Sep 13, 2024 at 02:29:40PM +0100, Beata Michalska wrote:
> This series adds support for obtaining an average CPU frequency based on
> a hardware provided feedback. The average frequency is being exposed via
> dedicated yet optional cpufreq sysfs attribute - cpuinfo_avg_freq.
> The architecture specific bits are being provided for AArch64, caching on
> existing implementation for FIE and AMUv1 support: the frequency scale
> factor, updated on each sched tick, serving as a base for retrieving
> the frequency for a given CPU, representing an average frequency
> reported between the ticks.
>
> The changes have been rather lightly (due to some limitations) tested on
> an FVP model. Note that some small discrepancies have been observed while
> testing (on the model) and this is currently being investigated, though it
> should not have any significant impact on the overall results.
>
> Note that [PATCH 2/4] arm64: amu: Delay allocating cpumask for AMU FIE support
> can be merged independently.
What's the plan with the rest of the patches? Are you going to respin?
The first patch would need an ack from Rafael or Viresh if we are to
merge them via the arm64 tree.
Thanks.
--
Catalin
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 3/4] arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu
2024-10-14 17:46 ` Sumit Gupta
@ 2024-10-16 20:45 ` Beata Michalska
0 siblings, 0 replies; 27+ messages in thread
From: Beata Michalska @ 2024-10-16 20:45 UTC (permalink / raw)
To: Sumit Gupta
Cc: Vanshidhar Konda, viresh.kumar, linux-kernel, linux-arm-kernel,
linux-pm, ionela.voinescu, sudeep.holla, will, catalin.marinas,
rafael, yang, lihuisong, zhanjie9, linux-tegra, Bibek Basu
On Mon, Oct 14, 2024 at 11:16:36PM +0530, Sumit Gupta wrote:
>
>
> On 11/10/24 21:59, Vanshidhar Konda wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Thu, Oct 10, 2024 at 01:08:23PM GMT, Beata Michalska wrote:
> > > On Thu, Oct 03, 2024 at 02:54:22PM -0700, Vanshidhar Konda wrote:
> > > > On Thu, Oct 03, 2024 at 11:39:54PM GMT, Beata Michalska wrote:
> > > > > On Thu, Sep 26, 2024 at 04:21:14PM -0700, Vanshidhar Konda wrote:
> > > > > > On Thu, Sep 26, 2024 at 12:34:01PM GMT, Beata Michalska wrote:
> > > > > > > On Tue, Sep 17, 2024 at 05:41:09PM +0530, Sumit Gupta wrote:
> > > > > > > > Hi Beata,
> > > > > > > Hi Sumit,
> > > > > > > >
> > > > > > > > Thank you for the patches.
> > > > > > > Thank you for having a look at those.
> > > > > > > >
> > > > > > > > On 13/09/24 18:59, Beata Michalska wrote:
> > > > > > > > > External email: Use caution opening links or attachments
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > With the Frequency Invariance Engine (FIE) being
> > > > already wired up with
> > > > > > > > > sched tick and making use of relevant (core counter
> > > > and constant
> > > > > > > > > counter) AMU counters, getting the average frequency
> > > > for a given CPU,
> > > > > > > > > can be achieved by utilizing the frequency scale
> > > > factor which reflects
> > > > > > > > > an average CPU frequency for the last tick period length.
> > > > > > > > >
> > > > > > > > > The solution is partially based on APERF/MPERF
> > > > implementation of
> > > > > > > > > arch_freq_get_on_cpu.
> > > > > > > > >
> > > > > > > > > Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
> > > > > > > > > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> > > > > > > > > ---
> > > > > > > > > arch/arm64/kernel/topology.c | 109
> > > > +++++++++++++++++++++++++++++++----
> > > > > > > > > 1 file changed, 99 insertions(+), 10 deletions(-)
> > > > > > > > >
> > > >
> > > > --- snip ----
> > > >
> > > > > > > >
> > > > > > > > ..
> > > > > > > > freq_comput:
> > > > > > > > scale = arch_scale_freq_capacity(cpu);
> > > > > > > > freq = scale * arch_scale_freq_ref(cpu);
> > > > > > > > ----
> > > > > > > >
> > > > > > > This boils down to the question what that function, and
> > > > the information it
> > > > > > > provides, represent really. The 'unknown' here simply says
> > > > the CPU has been idle
> > > > > > > for a while and as such the frequency data is a bit stale
> > > > and it does not
> > > > > > > represent the average freq the CPU is actually running at
> > > > anymore, which is
> > > > > > > the intention here really. Or, that the given CPU is a
> > > > non-housekeeping one.
> > > > > > > Either way I believe this is a useful information, instead
> > > > of providing
> > > > > > > stale data with no indication on whether the frequency is
> > > > really the 'current'
> > > > > > > one or not.
> > > > > > >
> > > > > > > If that is somehow undesirable we can discuss this
> > > > further, though I'd rather
> > > > > > > avoid exposing an interface where the feedback provided is open to
> > > > > > > interpretation at all times.
> > > > > >
> > > > > > Would it make sense to identify that the frequency reporting
> > > > is unknown due to
> > > > > > cpu being idle vs some other issue like being a
> > > > non-housekeeping CPU? Would
> > > > > > returning a value of 0 make it easier for tools to represent
> > > > that the CPU is
> > > > > > currently idle?
> > > > > That is an option.
> > > > > Another one would be to return an error for those cases. This
> > > > would make it
> > > > > easier to distinguish between valid frequency &/| idle CPU vs
> > > > tickless CPU
> > > > > (EINVAL vs ENOENT) ?
> > > > >
> > > >
> > > > That seems like a good idea but I suspect it would be confusing
> > > > to the end user.
> > > >
> > > > If a user runs `cat
> > > > /sys/devices/system/cpu/cpu2/cpuinfo_avg_freq` they would
> > > > get an error in some cases or get a number in some other iterations.
> > > >
> > > That is a fair point but I am not entirely convinced using '0'
> > > instead makes
> > > things any more clearer as this is in no way a valid CPU frequency.
> > > As long as we document the expected behaviour keeping the interface well
> > > defined, both options should be fine I guess.
> > >
> >
> > Another option could be to list out the reason as 'idle' or 'no entry'
> > instead of
> > returning EINVAL or ENOENT. These wouldn't be valid values either but
> > cat on the
> > sysfs node wouldn't return an error.
> >
> > Thanks,
> > Vanshidhar
> >
>
> Ya, listing the clear reason sounds better.
>
> Thank you,
> Sumit Gupta
>
I'd still prefer returning an error as that is a clear indication on failure
upon read. Furthermore, that would also make that attribute stick to single-type
rule for sysfs, which is currently not the case and will not be if we return
'idle' or 'no entry'. That said, I am happy to make that change if that would be
the final decision and that one is not mine, as the change is ultimately the
cpufreq one.
---
BR
Beata
> > > @Viresh: what is your opinion on that one ?
> > >
> > > ---
> > > BR
> > > Beata
> > > > Thanks,
>
> ....
>
> > > > > > > > > + cpu = ref_cpu;
> > > > > > > > > + goto retry;
> > > > > > > > > + }
> > > > > > > > > + /*
> > > > > > > > > + * Reversed computation to the one used to determine
> > > > > > > > > + * the arch_freq_scale value
> > > > > > > > > + * (see amu_scale_freq_tick for details)
> > > > > > > > > + */
> > > > > > > > > + scale = arch_scale_freq_capacity(cpu);
> > > > > > > > > + freq = scale * arch_scale_freq_ref(cpu);
> > > > > > > > > + freq >>= SCHED_CAPACITY_SHIFT;
> > > > > > > > > + return freq;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > >
> > > > > > > > > static void amu_fie_setup(const struct cpumask *cpus)
> > > > > > > > > {
> > > > > > > > > int cpu;
> > > > > > > > > --
> > > > > > > > > 2.25.1
> > > > > > > > >
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 0/4] Add support for AArch64 AMUv1-based average freq
2024-10-16 10:59 ` [PATCH v7 0/4] Add support for AArch64 AMUv1-based average freq Catalin Marinas
@ 2024-10-16 20:51 ` Beata Michalska
2024-10-27 18:16 ` Beata Michalska
0 siblings, 1 reply; 27+ messages in thread
From: Beata Michalska @ 2024-10-16 20:51 UTC (permalink / raw)
To: Catalin Marinas
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, rafael, viresh.kumar, sumitg, yang,
vanshikonda, lihuisong, zhanjie9
On Wed, Oct 16, 2024 at 11:59:25AM +0100, Catalin Marinas wrote:
Hi Catalin,
> Hi Beata,
>
> On Fri, Sep 13, 2024 at 02:29:40PM +0100, Beata Michalska wrote:
> > This series adds support for obtaining an average CPU frequency based on
> > a hardware provided feedback. The average frequency is being exposed via
> > dedicated yet optional cpufreq sysfs attribute - cpuinfo_avg_freq.
> > The architecture specific bits are being provided for AArch64, caching on
> > existing implementation for FIE and AMUv1 support: the frequency scale
> > factor, updated on each sched tick, serving as a base for retrieving
> > the frequency for a given CPU, representing an average frequency
> > reported between the ticks.
> >
> > The changes have been rather lightly (due to some limitations) tested on
> > an FVP model. Note that some small discrepancies have been observed while
> > testing (on the model) and this is currently being investigated, though it
> > should not have any significant impact on the overall results.
> >
> > Note that [PATCH 2/4] arm64: amu: Delay allocating cpumask for AMU FIE support
> > can be merged independently.
>
> What's the plan with the rest of the patches? Are you going to respin?
> The first patch would need an ack from Rafael or Viresh if we are to
> merge them via the arm64 tree.
>
I am still waiting on any feedback on [PATCH 1/4] - changes to cpufreq, as that
one drives the changes in arch specific bits. There is also an ongoing discussion
on how to handle idle cpu cases - so I would say we still need to agree on few
details.
---
BR
Beata
> Thanks.
>
> --
> Catalin
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 0/4] Add support for AArch64 AMUv1-based average freq
2024-10-16 20:51 ` Beata Michalska
@ 2024-10-27 18:16 ` Beata Michalska
0 siblings, 0 replies; 27+ messages in thread
From: Beata Michalska @ 2024-10-27 18:16 UTC (permalink / raw)
To: viresh.kumar, rafael
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
catalin.marinas, sudeep.holla, will, sumitg, yang, vanshikonda,
lihuisong, zhanjie9
On Wed, Oct 16, 2024 at 10:51:57PM +0200, Beata Michalska wrote:
> On Wed, Oct 16, 2024 at 11:59:25AM +0100, Catalin Marinas wrote:
Hi Viresh,
Hi Rafael,
> Hi Catalin,
> > Hi Beata,
> >
> > On Fri, Sep 13, 2024 at 02:29:40PM +0100, Beata Michalska wrote:
> > > This series adds support for obtaining an average CPU frequency based on
> > > a hardware provided feedback. The average frequency is being exposed via
> > > dedicated yet optional cpufreq sysfs attribute - cpuinfo_avg_freq.
> > > The architecture specific bits are being provided for AArch64, caching on
> > > existing implementation for FIE and AMUv1 support: the frequency scale
> > > factor, updated on each sched tick, serving as a base for retrieving
> > > the frequency for a given CPU, representing an average frequency
> > > reported between the ticks.
> > >
> > > The changes have been rather lightly (due to some limitations) tested on
> > > an FVP model. Note that some small discrepancies have been observed while
> > > testing (on the model) and this is currently being investigated, though it
> > > should not have any significant impact on the overall results.
> > >
> > > Note that [PATCH 2/4] arm64: amu: Delay allocating cpumask for AMU FIE support
> > > can be merged independently.
> >
> > What's the plan with the rest of the patches? Are you going to respin?
> > The first patch would need an ack from Rafael or Viresh if we are to
> > merge them via the arm64 tree.
> >
> I am still waiting on any feedback on [PATCH 1/4] - changes to cpufreq, as that
> one drives the changes in arch specific bits. There is also an ongoing discussion
> on how to handle idle cpu cases - so I would say we still need to agree on few
> details.
Would really appreciate your feedback on above mentioned [PATCH 1/4] -> [1]
as well as the discussion at [2].
Thank you.
---
[1] https://lore.kernel.org/all/20240913132944.1880703-2-beata.michalska@arm.com/
[2] https://lore.kernel.org/all/ZxAl77IYcMO2SfWh@arm.com/
---
>
> ---
> BR
> Beata
> > Thanks.
> >
> > --
> > Catalin
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 3/4] arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu
2024-10-10 11:08 ` Beata Michalska
2024-10-11 16:29 ` Vanshidhar Konda
@ 2024-10-29 6:53 ` Viresh Kumar
2024-11-04 7:58 ` Beata Michalska
1 sibling, 1 reply; 27+ messages in thread
From: Viresh Kumar @ 2024-10-29 6:53 UTC (permalink / raw)
To: Beata Michalska
Cc: Vanshidhar Konda, Sumit Gupta, linux-kernel, linux-arm-kernel,
linux-pm, ionela.voinescu, sudeep.holla, will, catalin.marinas,
rafael, yang, lihuisong, zhanjie9, linux-tegra, Bibek Basu
On 10-10-24, 13:08, Beata Michalska wrote:
> That is a fair point but I am not entirely convinced using '0' instead makes
> things any more clearer as this is in no way a valid CPU frequency.
> As long as we document the expected behaviour keeping the interface well
> defined, both options should be fine I guess.
>
> @Viresh: what is your opinion on that one ?
Failing to get frequency for the CPU shouldn't be represented by 0,
even if it is confusing for the user.
--
viresh
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2024-09-13 13:29 ` [PATCH v7 1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry Beata Michalska
2024-09-25 8:58 ` Jie Zhan
@ 2024-10-29 7:04 ` Viresh Kumar
2024-10-29 11:31 ` Rafael J. Wysocki
2024-11-04 8:00 ` Beata Michalska
1 sibling, 2 replies; 27+ messages in thread
From: Viresh Kumar @ 2024-10-29 7:04 UTC (permalink / raw)
To: Beata Michalska
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, sumitg, yang,
vanshikonda, lihuisong, zhanjie9
Apologies for the delay from my side. September was mostly holidays
for me and then I was stuck with other stuff plus email backlog and
this series was always a painful point to return to :(
On 13-09-24, 14:29, Beata Michalska wrote:
> Currently the CPUFreq core exposes two sysfs attributes that can be used
> to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
> and scaling_cur_freq. Both provide slightly different view on the
> subject and they do come with their own drawbacks.
>
> cpuinfo_cur_freq provides higher precision though at a cost of being
> rather expensive. Moreover, the information retrieved via this attribute
> is somewhat short lived as frequency can change at any point of time
> making it difficult to reason from.
>
> scaling_cur_freq, on the other hand, tends to be less accurate but then
> the actual level of precision (and source of information) varies between
> architectures making it a bit ambiguous.
>
> The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
> distinct interface, exposing an average frequency of a given CPU(s), as
> reported by the hardware, over a time frame spanning no more than a few
> milliseconds. As it requires appropriate hardware support, this
> interface is optional.
From what I recall, the plan is to:
- keep cpuinfo_cur_freq as it is, not expose for x86 and call ->get()
for ARM.
- introduce cpuinfo_avg_freq() and make it return frequency from hw
counters for both ARM and Intel and others who provide the API.
- update scaling_cur_freq() to only return the requested frequency or
error in case of X86 and update documentation to reflect the same.
Right now or after some time ? How much time ?
Rafael ?
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 04fc786dd2c0..3493e5a9500d 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -752,6 +752,16 @@ __weak unsigned int arch_freq_get_on_cpu(int cpu)
> return 0;
> }
>
> +__weak int arch_freq_avg_get_on_cpu(int cpu)
> +{
> + return -EOPNOTSUPP;
> +}
> +
> +static inline bool cpufreq_avg_freq_supported(struct cpufreq_policy *policy)
> +{
> + return arch_freq_avg_get_on_cpu(policy->cpu) >= 0;
> +}
And why aren't we simply reusing arch_freq_get_on_cpu() here ?
--
viresh
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2024-10-29 7:04 ` Viresh Kumar
@ 2024-10-29 11:31 ` Rafael J. Wysocki
2024-11-04 8:01 ` Beata Michalska
2024-11-04 8:00 ` Beata Michalska
1 sibling, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2024-10-29 11:31 UTC (permalink / raw)
To: Viresh Kumar
Cc: Beata Michalska, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, rafael,
sumitg, yang, vanshikonda, lihuisong, zhanjie9
On Tue, Oct 29, 2024 at 8:04 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> Apologies for the delay from my side. September was mostly holidays
> for me and then I was stuck with other stuff plus email backlog and
> this series was always a painful point to return to :(
>
> On 13-09-24, 14:29, Beata Michalska wrote:
> > Currently the CPUFreq core exposes two sysfs attributes that can be used
> > to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
> > and scaling_cur_freq. Both provide slightly different view on the
> > subject and they do come with their own drawbacks.
> >
> > cpuinfo_cur_freq provides higher precision though at a cost of being
> > rather expensive. Moreover, the information retrieved via this attribute
> > is somewhat short lived as frequency can change at any point of time
> > making it difficult to reason from.
> >
> > scaling_cur_freq, on the other hand, tends to be less accurate but then
> > the actual level of precision (and source of information) varies between
> > architectures making it a bit ambiguous.
> >
> > The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
> > distinct interface, exposing an average frequency of a given CPU(s), as
> > reported by the hardware, over a time frame spanning no more than a few
> > milliseconds. As it requires appropriate hardware support, this
> > interface is optional.
>
> From what I recall, the plan is to:
> - keep cpuinfo_cur_freq as it is, not expose for x86 and call ->get()
> for ARM.
Yes.
> - introduce cpuinfo_avg_freq() and make it return frequency from hw
> counters for both ARM and Intel and others who provide the API.
Yes.
> - update scaling_cur_freq() to only return the requested frequency or
> error in case of X86
Yes.
Preferably, -ENOTSUPP for "setpolicy" drivers without the .get() callback.
> and update documentation to reflect the same.
> Right now or after some time ? How much time ?
After some time, I think at least two cycles, so people have the time
to switch over, but much more may be necessary if someone is stuck
with RHEL or similar user space.
Anyway, x86 will be the only one affected and there may be a Kconfig
option even to allow it to be changed at the kernel build time.
The documentation for cpuinfo_avg_freq() needs to be added along with it.
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 04fc786dd2c0..3493e5a9500d 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -752,6 +752,16 @@ __weak unsigned int arch_freq_get_on_cpu(int cpu)
> > return 0;
> > }
> >
> > +__weak int arch_freq_avg_get_on_cpu(int cpu)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > +
> > +static inline bool cpufreq_avg_freq_supported(struct cpufreq_policy *policy)
> > +{
> > + return arch_freq_avg_get_on_cpu(policy->cpu) >= 0;
> > +}
>
> And why aren't we simply reusing arch_freq_get_on_cpu() here ?
>
> --
> viresh
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 3/4] arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu
2024-10-29 6:53 ` Viresh Kumar
@ 2024-11-04 7:58 ` Beata Michalska
0 siblings, 0 replies; 27+ messages in thread
From: Beata Michalska @ 2024-11-04 7:58 UTC (permalink / raw)
To: Viresh Kumar
Cc: Vanshidhar Konda, Sumit Gupta, linux-kernel, linux-arm-kernel,
linux-pm, ionela.voinescu, sudeep.holla, will, catalin.marinas,
rafael, yang, lihuisong, zhanjie9, linux-tegra, Bibek Basu
On Tue, Oct 29, 2024 at 12:23:19PM +0530, Viresh Kumar wrote:
> On 10-10-24, 13:08, Beata Michalska wrote:
> > That is a fair point but I am not entirely convinced using '0' instead makes
> > things any more clearer as this is in no way a valid CPU frequency.
> > As long as we document the expected behaviour keeping the interface well
> > defined, both options should be fine I guess.
> >
> > @Viresh: what is your opinion on that one ?
>
> Failing to get frequency for the CPU shouldn't be represented by 0,
> even if it is confusing for the user.
We still need to decide whether provide a more descriptive way of informing
about such cases (whether it be 'unknown' or 'idle' ) or to simply return
an appropriate error and leave the userspace with dealing with that.
---
Thanks
Beata
>
> --
> viresh
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2024-10-29 7:04 ` Viresh Kumar
2024-10-29 11:31 ` Rafael J. Wysocki
@ 2024-11-04 8:00 ` Beata Michalska
2024-11-08 6:28 ` Viresh Kumar
1 sibling, 1 reply; 27+ messages in thread
From: Beata Michalska @ 2024-11-04 8:00 UTC (permalink / raw)
To: Viresh Kumar
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, sumitg, yang,
vanshikonda, lihuisong, zhanjie9
On Tue, Oct 29, 2024 at 12:34:29PM +0530, Viresh Kumar wrote:
> Apologies for the delay from my side. September was mostly holidays
> for me and then I was stuck with other stuff plus email backlog and
> this series was always a painful point to return to :(
Thanks for getting back to me on this one!
>
> On 13-09-24, 14:29, Beata Michalska wrote:
> > Currently the CPUFreq core exposes two sysfs attributes that can be used
> > to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
> > and scaling_cur_freq. Both provide slightly different view on the
> > subject and they do come with their own drawbacks.
> >
> > cpuinfo_cur_freq provides higher precision though at a cost of being
> > rather expensive. Moreover, the information retrieved via this attribute
> > is somewhat short lived as frequency can change at any point of time
> > making it difficult to reason from.
> >
> > scaling_cur_freq, on the other hand, tends to be less accurate but then
> > the actual level of precision (and source of information) varies between
> > architectures making it a bit ambiguous.
> >
> > The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
> > distinct interface, exposing an average frequency of a given CPU(s), as
> > reported by the hardware, over a time frame spanning no more than a few
> > milliseconds. As it requires appropriate hardware support, this
> > interface is optional.
>
> From what I recall, the plan is to:
> - keep cpuinfo_cur_freq as it is, not expose for x86 and call ->get()
> for ARM.
>
> - introduce cpuinfo_avg_freq() and make it return frequency from hw
> counters for both ARM and Intel and others who provide the API.
>
> - update scaling_cur_freq() to only return the requested frequency or
> error in case of X86 and update documentation to reflect the same.
> Right now or after some time ? How much time ?
>
> Rafael ?
>
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 04fc786dd2c0..3493e5a9500d 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -752,6 +752,16 @@ __weak unsigned int arch_freq_get_on_cpu(int cpu)
> > return 0;
> > }
> >
> > +__weak int arch_freq_avg_get_on_cpu(int cpu)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > +
> > +static inline bool cpufreq_avg_freq_supported(struct cpufreq_policy *policy)
> > +{
> > + return arch_freq_avg_get_on_cpu(policy->cpu) >= 0;
> > +}
>
> And why aren't we simply reusing arch_freq_get_on_cpu() here ?
We need a way to discover whether the new sysfs attrib is to be enabled or not.
I guess I could change the signature for arch_freq_get_on_cpu to return an error
if that functionality is not supported for given policy ?
---
Best Regards
Beata
>
> --
> viresh
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2024-10-29 11:31 ` Rafael J. Wysocki
@ 2024-11-04 8:01 ` Beata Michalska
2024-11-04 13:26 ` Rafael J. Wysocki
0 siblings, 1 reply; 27+ messages in thread
From: Beata Michalska @ 2024-11-04 8:01 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Viresh Kumar, linux-kernel, linux-arm-kernel, linux-pm,
ionela.voinescu, sudeep.holla, will, catalin.marinas, sumitg,
yang, vanshikonda, lihuisong, zhanjie9
On Tue, Oct 29, 2024 at 12:31:11PM +0100, Rafael J. Wysocki wrote:
> On Tue, Oct 29, 2024 at 8:04 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > Apologies for the delay from my side. September was mostly holidays
> > for me and then I was stuck with other stuff plus email backlog and
> > this series was always a painful point to return to :(
> >
> > On 13-09-24, 14:29, Beata Michalska wrote:
> > > Currently the CPUFreq core exposes two sysfs attributes that can be used
> > > to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
> > > and scaling_cur_freq. Both provide slightly different view on the
> > > subject and they do come with their own drawbacks.
> > >
> > > cpuinfo_cur_freq provides higher precision though at a cost of being
> > > rather expensive. Moreover, the information retrieved via this attribute
> > > is somewhat short lived as frequency can change at any point of time
> > > making it difficult to reason from.
> > >
> > > scaling_cur_freq, on the other hand, tends to be less accurate but then
> > > the actual level of precision (and source of information) varies between
> > > architectures making it a bit ambiguous.
> > >
> > > The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
> > > distinct interface, exposing an average frequency of a given CPU(s), as
> > > reported by the hardware, over a time frame spanning no more than a few
> > > milliseconds. As it requires appropriate hardware support, this
> > > interface is optional.
> >
> > From what I recall, the plan is to:
> > - keep cpuinfo_cur_freq as it is, not expose for x86 and call ->get()
> > for ARM.
>
> Yes.
That one indeed remains unchanged.
>
> > - introduce cpuinfo_avg_freq() and make it return frequency from hw
> > counters for both ARM and Intel and others who provide the API.
>
> Yes.
Will add changes for Intel as well.
>
> > - update scaling_cur_freq() to only return the requested frequency or
> > error in case of X86
>
> Yes.
>
> Preferably, -ENOTSUPP for "setpolicy" drivers without the .get() callback.
Right, my impression was that we want to leave that one as is.
Will add appropriate changes.
>
> > and update documentation to reflect the same.
> > Right now or after some time ? How much time ?
>
> After some time, I think at least two cycles, so people have the time
> to switch over, but much more may be necessary if someone is stuck
> with RHEL or similar user space.
>
> Anyway, x86 will be the only one affected and there may be a Kconfig
> option even to allow it to be changed at the kernel build time.
>
So just for my clarification we want a config switch to control what
scaling_cur_freq is to actually provide. It will keep the current behaviour as
default until we are ready to flip it and ultimately drop that temporary config
option ?
> The documentation for cpuinfo_avg_freq() needs to be added along with it.
That one is already provided unless you have smth else on mind ?
Like updating scaling_cur_freq to reference the new sysfs attribute ?
---
Best Regards
Beata
>
> > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > > index 04fc786dd2c0..3493e5a9500d 100644
> > > --- a/drivers/cpufreq/cpufreq.c
> > > +++ b/drivers/cpufreq/cpufreq.c
> > > @@ -752,6 +752,16 @@ __weak unsigned int arch_freq_get_on_cpu(int cpu)
> > > return 0;
> > > }
> > >
> > > +__weak int arch_freq_avg_get_on_cpu(int cpu)
> > > +{
> > > + return -EOPNOTSUPP;
> > > +}
> > > +
> > > +static inline bool cpufreq_avg_freq_supported(struct cpufreq_policy *policy)
> > > +{
> > > + return arch_freq_avg_get_on_cpu(policy->cpu) >= 0;
> > > +}
> >
> > And why aren't we simply reusing arch_freq_get_on_cpu() here ?
> >
> > --
> > viresh
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2024-11-04 8:01 ` Beata Michalska
@ 2024-11-04 13:26 ` Rafael J. Wysocki
0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2024-11-04 13:26 UTC (permalink / raw)
To: Beata Michalska
Cc: Rafael J. Wysocki, Viresh Kumar, linux-kernel, linux-arm-kernel,
linux-pm, ionela.voinescu, sudeep.holla, will, catalin.marinas,
sumitg, yang, vanshikonda, lihuisong, zhanjie9
On Mon, Nov 4, 2024 at 9:01 AM Beata Michalska <beata.michalska@arm.com> wrote:
>
> On Tue, Oct 29, 2024 at 12:31:11PM +0100, Rafael J. Wysocki wrote:
> > On Tue, Oct 29, 2024 at 8:04 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > Apologies for the delay from my side. September was mostly holidays
> > > for me and then I was stuck with other stuff plus email backlog and
> > > this series was always a painful point to return to :(
> > >
> > > On 13-09-24, 14:29, Beata Michalska wrote:
> > > > Currently the CPUFreq core exposes two sysfs attributes that can be used
> > > > to query current frequency of a given CPU(s): namely cpuinfo_cur_freq
> > > > and scaling_cur_freq. Both provide slightly different view on the
> > > > subject and they do come with their own drawbacks.
> > > >
> > > > cpuinfo_cur_freq provides higher precision though at a cost of being
> > > > rather expensive. Moreover, the information retrieved via this attribute
> > > > is somewhat short lived as frequency can change at any point of time
> > > > making it difficult to reason from.
> > > >
> > > > scaling_cur_freq, on the other hand, tends to be less accurate but then
> > > > the actual level of precision (and source of information) varies between
> > > > architectures making it a bit ambiguous.
> > > >
> > > > The new attribute, cpuinfo_avg_freq, is intended to provide more stable,
> > > > distinct interface, exposing an average frequency of a given CPU(s), as
> > > > reported by the hardware, over a time frame spanning no more than a few
> > > > milliseconds. As it requires appropriate hardware support, this
> > > > interface is optional.
> > >
> > > From what I recall, the plan is to:
> > > - keep cpuinfo_cur_freq as it is, not expose for x86 and call ->get()
> > > for ARM.
> >
> > Yes.
> That one indeed remains unchanged.
> >
> > > - introduce cpuinfo_avg_freq() and make it return frequency from hw
> > > counters for both ARM and Intel and others who provide the API.
> >
> > Yes.
> Will add changes for Intel as well.
> >
> > > - update scaling_cur_freq() to only return the requested frequency or
> > > error in case of X86
> >
> > Yes.
> >
> > Preferably, -ENOTSUPP for "setpolicy" drivers without the .get() callback.
> Right, my impression was that we want to leave that one as is.
For now, yes please.
> Will add appropriate changes.
In the future.
> > > and update documentation to reflect the same.
> > > Right now or after some time ? How much time ?
> >
> > After some time, I think at least two cycles, so people have the time
> > to switch over, but much more may be necessary if someone is stuck
> > with RHEL or similar user space.
> >
> > Anyway, x86 will be the only one affected and there may be a Kconfig
> > option even to allow it to be changed at the kernel build time.
> >
> So just for my clarification we want a config switch to control what
> scaling_cur_freq is to actually provide.
Something like:
if (IS_ENABLED(CONFIG_CPUFREQ_CUR_FREQ_FROM_ARCH))) {
freq = arch_freq_get_on_cpu(policy->cpu);
if (freq)
return sysfs_emit(buf, "%u\n", freq);
}
if (cpufreq_driver->setpolicy) {
if (cpufreq_driver->get))
return sysfs_emit(buf, "%u\n", cpufreq_driver->get(policy->cpu));
return -ENOTSUPP;
}
return sysfs_emit(buf, "%u\n", policy->cur);
And select CONFIG_CPUFREQ_CUR_FREQ_FROM_ARCH on x86 for now.
> It will keep the current behaviour as
> default until we are ready to flip it and ultimately drop that temporary config
> option ?
Yes.
> > The documentation for cpuinfo_avg_freq() needs to be added along with it.
> That one is already provided unless you have smth else on mind ?
> Like updating scaling_cur_freq to reference the new sysfs attribute ?
I haven't checked super-thoroughly, but generally all documentation
needs to be consistent.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry
2024-11-04 8:00 ` Beata Michalska
@ 2024-11-08 6:28 ` Viresh Kumar
0 siblings, 0 replies; 27+ messages in thread
From: Viresh Kumar @ 2024-11-08 6:28 UTC (permalink / raw)
To: Beata Michalska
Cc: linux-kernel, linux-arm-kernel, linux-pm, ionela.voinescu,
sudeep.holla, will, catalin.marinas, rafael, sumitg, yang,
vanshikonda, lihuisong, zhanjie9
On Mon, 4 Nov 2024 at 13:30, Beata Michalska <beata.michalska@arm.com> wrote:
> We need a way to discover whether the new sysfs attrib is to be enabled or not.
Ahh, this is too much trouble to make it optional then :)
> I guess I could change the signature for arch_freq_get_on_cpu to return an error
> if that functionality is not supported for given policy ?
That would be more controversial I guess as we may want an unsigned int return
type for frequency.
I think we can do two things here:
- Just enable the sysfs attribute all the time ?
- Make it optional based on some static cpufreq driver flag ?
I would just enable it for everyone I guess.
--
Viresh
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2024-11-08 6:30 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-13 13:29 [PATCH v7 0/4] Add support for AArch64 AMUv1-based average freq Beata Michalska
2024-09-13 13:29 ` [PATCH v7 1/4] cpufreq: Introduce an optional cpuinfo_avg_freq sysfs entry Beata Michalska
2024-09-25 8:58 ` Jie Zhan
2024-09-26 10:42 ` Beata Michalska
2024-10-29 7:04 ` Viresh Kumar
2024-10-29 11:31 ` Rafael J. Wysocki
2024-11-04 8:01 ` Beata Michalska
2024-11-04 13:26 ` Rafael J. Wysocki
2024-11-04 8:00 ` Beata Michalska
2024-11-08 6:28 ` Viresh Kumar
2024-09-13 13:29 ` [PATCH v7 2/4] arm64: amu: Delay allocating cpumask for AMU FIE support Beata Michalska
2024-09-13 13:29 ` [PATCH v7 3/4] arm64: Provide an AMU-based version of arch_freq_avg_get_on_cpu Beata Michalska
2024-09-17 12:11 ` Sumit Gupta
2024-09-26 10:34 ` Beata Michalska
2024-09-26 23:21 ` Vanshidhar Konda
2024-10-03 21:39 ` Beata Michalska
2024-10-03 21:54 ` Vanshidhar Konda
2024-10-10 11:08 ` Beata Michalska
2024-10-11 16:29 ` Vanshidhar Konda
2024-10-14 17:46 ` Sumit Gupta
2024-10-16 20:45 ` Beata Michalska
2024-10-29 6:53 ` Viresh Kumar
2024-11-04 7:58 ` Beata Michalska
2024-09-13 13:29 ` [PATCH v7 4/4] arm64: Update AMU-based freq scale factor on entering idle Beata Michalska
2024-10-16 10:59 ` [PATCH v7 0/4] Add support for AArch64 AMUv1-based average freq Catalin Marinas
2024-10-16 20:51 ` Beata Michalska
2024-10-27 18:16 ` Beata Michalska
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).