* [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu
@ 2024-06-03 8:21 Beata Michalska
2024-06-03 8:21 ` [PATCH v6 1/4] arch_topology: init capacity_freq_ref to 0 Beata Michalska
` (5 more replies)
0 siblings, 6 replies; 18+ messages in thread
From: Beata Michalska @ 2024-06-03 8:21 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, ionela.voinescu, sudeep.holla,
will, catalin.marinas, vincent.guittot, vanshikonda
Cc: sumitg, yang, lihuisong, viresh.kumar, rafael
Introducing an arm64-specific version of arch_freq_get_on_cpu, cashing in
on the existing implementation of FIE and AMUv1 support: the frequency
scale factor, updated on each sched tick, serves as a base for retrieving
the frequency of a given CPU. It represents an average frequency
between ticks, so its accuracy is limited.
The changes have been rather lightly tested (due to some limitations) on
an FVP model. Note that some small discrepancies have been observed while
testing (on the model); these are currently being investigated, though they
should not have any significant impact on the overall results.
Relevant discussions:
[1] https://lore.kernel.org/all/20240229162520.970986-1-vanshikonda@os.amperecomputing.com/
[2] https://lore.kernel.org/all/7eozim2xnepacnnkzxlbx34hib4otycnbn4dqymfziqou5lw5u@5xzpv3t7sxo3/
[3] https://lore.kernel.org/all/20231212072617.14756-1-lihuisong@huawei.com/
[4] https://lore.kernel.org/lkml/ZIHpd6unkOtYVEqP@e120325.cambridge.arm.com/T/#m4e74cb5a0aaa353c60fedc6cfb95ab7a6e381e3c
v6:
- delay allocating cpumask for AMU FIE support instead of invalidating the mask
upon failure to register cpufreq policy notifications
- drop the change to cpufreq core (for cpuinfo_cur_freq) as this one will be
sent as a separate change
v5:
- Fix invalid access to cpumask
- Reworked finding reference cpu when getting the freq
v4:
- dropping seqcount
- fixing identifying active cpu within given policy
- skipping full dynticks cpus when retrieving the freq
- bringing back plugging in arch_freq_get_on_cpu into cpuinfo_cur_freq
v3:
- dropping changes to cpufreq_verify_current_freq
- pulling in changes from Ionela initializing capacity_freq_ref to 0
(thanks for that!) and applying suggestions made by her during last review:
- switching to arch_scale_freq_capacity and arch_scale_freq_ref when
reversing freq scale factor computation
- swapping shift with multiplication
- adding time limit for considering last scale update as valid
- updating frequency scale factor upon entering idle
v2:
- Splitting the patches
- Adding comment for full dyntick mode
- Plugging arch_freq_get_on_cpu into cpufreq_verify_current_freq instead
of in show_cpuinfo_cur_freq to allow the framework to stay more in sync
with potential freq changes
Beata Michalska (3):
arm64: amu: Delay allocating cpumask for AMU FIE support
arm64: Provide an AMU-based version of arch_freq_get_on_cpu
arm64: Update AMU-based frequency scale factor on entering idle
Ionela Voinescu (1):
arch_topology: init capacity_freq_ref to 0
arch/arm64/kernel/topology.c | 145 +++++++++++++++++++++++++++++------
drivers/base/arch_topology.c | 8 +-
2 files changed, 127 insertions(+), 26 deletions(-)
--
2.25.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v6 1/4] arch_topology: init capacity_freq_ref to 0
2024-06-03 8:21 [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu Beata Michalska
@ 2024-06-03 8:21 ` Beata Michalska
2024-06-03 8:21 ` [PATCH v6 2/4] arm64: amu: Delay allocating cpumask for AMU FIE support Beata Michalska
` (4 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Beata Michalska @ 2024-06-03 8:21 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, ionela.voinescu, sudeep.holla,
will, catalin.marinas, vincent.guittot, vanshikonda
Cc: sumitg, yang, lihuisong, viresh.kumar, rafael
From: Ionela Voinescu <ionela.voinescu@arm.com>
It's useful to have capacity_freq_ref initialized to 0 for users of
arch_scale_freq_ref() to detect when capacity_freq_ref was not
yet set.
The only scenario affected by this change in the init value is when a
cpufreq driver is never loaded. As a result, the only setter of a
cpu scale factor remains the call of topology_normalize_cpu_scale()
from parse_dt_topology(). There we cannot use the value 0 of
capacity_freq_ref so we have to compensate for its uninitialized state.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
---
drivers/base/arch_topology.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index c66d070207a0..b8217ce9082c 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -27,7 +27,7 @@
static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
static struct cpumask scale_freq_counters_mask;
static bool scale_freq_invariant;
-DEFINE_PER_CPU(unsigned long, capacity_freq_ref) = 1;
+DEFINE_PER_CPU(unsigned long, capacity_freq_ref) = 0;
EXPORT_PER_CPU_SYMBOL_GPL(capacity_freq_ref);
static bool supports_scale_freq_counters(const struct cpumask *cpus)
@@ -292,13 +292,15 @@ void topology_normalize_cpu_scale(void)
capacity_scale = 1;
for_each_possible_cpu(cpu) {
- capacity = raw_capacity[cpu] * per_cpu(capacity_freq_ref, cpu);
+ capacity = raw_capacity[cpu] *
+ (per_cpu(capacity_freq_ref, cpu) ?: 1);
capacity_scale = max(capacity, capacity_scale);
}
pr_debug("cpu_capacity: capacity_scale=%llu\n", capacity_scale);
for_each_possible_cpu(cpu) {
- capacity = raw_capacity[cpu] * per_cpu(capacity_freq_ref, cpu);
+ capacity = raw_capacity[cpu] *
+ (per_cpu(capacity_freq_ref, cpu) ?: 1);
capacity = div64_u64(capacity << SCHED_CAPACITY_SHIFT,
capacity_scale);
topology_set_cpu_scale(cpu, capacity);
--
2.25.1
* [PATCH v6 2/4] arm64: amu: Delay allocating cpumask for AMU FIE support
2024-06-03 8:21 [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu Beata Michalska
2024-06-03 8:21 ` [PATCH v6 1/4] arch_topology: init capacity_freq_ref to 0 Beata Michalska
@ 2024-06-03 8:21 ` Beata Michalska
2024-06-03 8:21 ` [PATCH v6 3/4] arm64: Provide an AMU-based version of arch_freq_get_on_cpu Beata Michalska
` (3 subsequent siblings)
5 siblings, 0 replies; 18+ messages in thread
From: Beata Michalska @ 2024-06-03 8:21 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, ionela.voinescu, sudeep.holla,
will, catalin.marinas, vincent.guittot, vanshikonda
Cc: sumitg, yang, lihuisong, viresh.kumar, rafael
For the time being, the amu_fie_cpus cpumask is used exclusively
by the AMU-related internals of FIE support and is guaranteed to be
valid on every access currently made. Still, the mask is not being
invalidated on one of the error-handling code paths, which leaves
a soft spot with a theoretical risk of use-after-free for
CPUMASK_OFFSTACK cases.
To make things sound, delay allocating said cpumask
(for CPUMASK_OFFSTACK), which also avoids the otherwise nasty clean-up
case when registering the cpufreq policy notifications fails.
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
arch/arm64/kernel/topology.c | 22 +++++++++-------------
1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 1a2c72f3e7f8..e475ec2705e1 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -194,11 +194,16 @@ static void amu_fie_setup(const struct cpumask *cpus)
int cpu;
/* We are already set since the last insmod of cpufreq driver */
- if (unlikely(cpumask_subset(cpus, amu_fie_cpus)))
- return;
+ if (cpumask_available(amu_fie_cpus) &&
+ unlikely(cpumask_subset(cpus, amu_fie_cpus)))
+ return;
for_each_cpu(cpu, cpus) {
- if (!freq_counters_valid(cpu))
+ if (!cpumask_available(amu_fie_cpus) &&
+ !zalloc_cpumask_var(&amu_fie_cpus, GFP_KERNEL))
+ return;
+
+ if (!freq_counters_valid(cpu))
return;
}
@@ -237,17 +242,8 @@ static struct notifier_block init_amu_fie_notifier = {
static int __init init_amu_fie(void)
{
- int ret;
-
- if (!zalloc_cpumask_var(&amu_fie_cpus, GFP_KERNEL))
- return -ENOMEM;
-
- ret = cpufreq_register_notifier(&init_amu_fie_notifier,
+ return cpufreq_register_notifier(&init_amu_fie_notifier,
CPUFREQ_POLICY_NOTIFIER);
- if (ret)
- free_cpumask_var(amu_fie_cpus);
-
- return ret;
}
core_initcall(init_amu_fie);
--
2.25.1
* [PATCH v6 3/4] arm64: Provide an AMU-based version of arch_freq_get_on_cpu
2024-06-03 8:21 [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu Beata Michalska
2024-06-03 8:21 ` [PATCH v6 1/4] arch_topology: init capacity_freq_ref to 0 Beata Michalska
2024-06-03 8:21 ` [PATCH v6 2/4] arm64: amu: Delay allocating cpumask for AMU FIE support Beata Michalska
@ 2024-06-03 8:21 ` Beata Michalska
2024-07-10 17:44 ` Vanshidhar Konda
2024-08-14 6:46 ` Jie Zhan
2024-06-03 8:21 ` [PATCH v6 4/4] arm64: Update AMU-based freq scale factor on entering idle Beata Michalska
` (2 subsequent siblings)
5 siblings, 2 replies; 18+ messages in thread
From: Beata Michalska @ 2024-06-03 8:21 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, ionela.voinescu, sudeep.holla,
will, catalin.marinas, vincent.guittot, vanshikonda
Cc: sumitg, yang, lihuisong, viresh.kumar, rafael
With the Frequency Invariance Engine (FIE) already wired up with the
sched tick and making use of the relevant AMU counters (the core counter
and the constant counter), getting the current frequency for a given CPU
can be achieved by utilizing the frequency scale factor, which reflects
an average CPU frequency over the last tick period.
The solution is partially based on the APERF/MPERF implementation of
arch_freq_get_on_cpu.
Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
arch/arm64/kernel/topology.c | 110 +++++++++++++++++++++++++++++++----
1 file changed, 100 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index e475ec2705e1..2c002d2c3e0b 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -17,6 +17,7 @@
#include <linux/cpufreq.h>
#include <linux/init.h>
#include <linux/percpu.h>
+#include <linux/sched/isolation.h>
#include <asm/cpu.h>
#include <asm/cputype.h>
@@ -88,18 +89,28 @@ int __init parse_acpi_topology(void)
* initialized.
*/
static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = 1UL << (2 * SCHED_CAPACITY_SHIFT);
-static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
-static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
static cpumask_var_t amu_fie_cpus;
+struct amu_cntr_sample {
+ u64 arch_const_cycles_prev;
+ u64 arch_core_cycles_prev;
+ unsigned long last_update;
+};
+
+static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_cntr_sample, cpu_amu_samples);
+
void update_freq_counters_refs(void)
{
- this_cpu_write(arch_core_cycles_prev, read_corecnt());
- this_cpu_write(arch_const_cycles_prev, read_constcnt());
+ struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
+
+ amu_sample->arch_core_cycles_prev = read_corecnt();
+ amu_sample->arch_const_cycles_prev = read_constcnt();
}
static inline bool freq_counters_valid(int cpu)
{
+ struct amu_cntr_sample *amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
+
if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
return false;
@@ -108,8 +119,8 @@ static inline bool freq_counters_valid(int cpu)
return false;
}
- if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
- !per_cpu(arch_core_cycles_prev, cpu))) {
+ if (unlikely(!amu_sample->arch_const_cycles_prev ||
+ !amu_sample->arch_core_cycles_prev)) {
pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
return false;
}
@@ -152,17 +163,22 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
static void amu_scale_freq_tick(void)
{
+ struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
u64 prev_core_cnt, prev_const_cnt;
u64 core_cnt, const_cnt, scale;
- prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
- prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
+ prev_const_cnt = amu_sample->arch_const_cycles_prev;
+ prev_core_cnt = amu_sample->arch_core_cycles_prev;
update_freq_counters_refs();
- const_cnt = this_cpu_read(arch_const_cycles_prev);
- core_cnt = this_cpu_read(arch_core_cycles_prev);
+ const_cnt = amu_sample->arch_const_cycles_prev;
+ core_cnt = amu_sample->arch_core_cycles_prev;
+ /*
+ * This should not happen unless the AMUs have been reset and the
+ * counter values have not been restored - unlikely
+ */
if (unlikely(core_cnt <= prev_core_cnt ||
const_cnt <= prev_const_cnt))
return;
@@ -182,6 +198,8 @@ static void amu_scale_freq_tick(void)
scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
this_cpu_write(arch_freq_scale, (unsigned long)scale);
+
+ amu_sample->last_update = jiffies;
}
static struct scale_freq_data amu_sfd = {
@@ -189,6 +207,78 @@ static struct scale_freq_data amu_sfd = {
.set_freq_scale = amu_scale_freq_tick,
};
+static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
+{
+ return cpumask_available(amu_fie_cpus) &&
+ cpumask_test_cpu(cpu, amu_fie_cpus);
+}
+
+#define AMU_SAMPLE_EXP_MS 20
+
+unsigned int arch_freq_get_on_cpu(int cpu)
+{
+ struct amu_cntr_sample *amu_sample;
+ unsigned int start_cpu = cpu;
+ unsigned long last_update;
+ unsigned int freq = 0;
+ u64 scale;
+
+ if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
+ return 0;
+
+retry:
+ amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
+
+ last_update = amu_sample->last_update;
+
+ /*
+ * For those CPUs that are in full dynticks mode,
+ * and those that have not seen tick for a while
+ * try an alternative source for the counters (and thus freq scale),
+ * if available, for given policy:
+ * this boils down to identifying an active cpu within the same freq
+ * domain, if any.
+ */
+ if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
+ time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
+ struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
+ int ref_cpu = cpu;
+
+ if (!policy)
+ goto leave;
+
+ if (!policy_is_shared(policy)) {
+ cpufreq_cpu_put(policy);
+ goto leave;
+ }
+
+ do {
+ ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
+ start_cpu, false);
+
+ } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
+
+ cpufreq_cpu_put(policy);
+
+ if (ref_cpu >= nr_cpu_ids)
+ /* No alternative to pull info from */
+ goto leave;
+
+ cpu = ref_cpu;
+ goto retry;
+ }
+ /*
+ * Reversed computation to the one used to determine
+ * the arch_freq_scale value
+ * (see amu_scale_freq_tick for details)
+ */
+ scale = arch_scale_freq_capacity(cpu);
+ freq = scale * arch_scale_freq_ref(cpu);
+ freq >>= SCHED_CAPACITY_SHIFT;
+leave:
+ return freq;
+}
+
static void amu_fie_setup(const struct cpumask *cpus)
{
int cpu;
--
2.25.1
* [PATCH v6 4/4] arm64: Update AMU-based freq scale factor on entering idle
2024-06-03 8:21 [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu Beata Michalska
` (2 preceding siblings ...)
2024-06-03 8:21 ` [PATCH v6 3/4] arm64: Provide an AMU-based version of arch_freq_get_on_cpu Beata Michalska
@ 2024-06-03 8:21 ` Beata Michalska
2024-07-08 17:10 ` [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu Catalin Marinas
2024-07-14 0:32 ` Vanshidhar Konda
5 siblings, 0 replies; 18+ messages in thread
From: Beata Michalska @ 2024-06-03 8:21 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel, ionela.voinescu, sudeep.holla,
will, catalin.marinas, vincent.guittot, vanshikonda
Cc: sumitg, yang, lihuisong, viresh.kumar, rafael
Now that the frequency scale factor has been activated for retrieving
the current frequency on a given CPU, trigger its update upon entering
idle. This will, to an extent, allow querying the last known frequency
in a non-invasive way. It will also improve the accuracy of the frequency
scale factor when a CPU entering idle has not received a tick for a while.
As a consequence, for idle cores, the reported frequency will be the
last one observed before entering the idle state.
Suggested-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
arch/arm64/kernel/topology.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 2c002d2c3e0b..56c5b2e632b4 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -213,6 +213,19 @@ static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
cpumask_test_cpu(cpu, amu_fie_cpus);
}
+void arch_cpu_idle_enter(void)
+{
+ unsigned int cpu = smp_processor_id();
+
+ if (!amu_fie_cpu_supported(cpu))
+ return;
+
+ /* Kick in AMU update but only if one has not happened already */
+ if (housekeeping_cpu(cpu, HK_TYPE_TICK) &&
+ time_is_before_jiffies(per_cpu(cpu_amu_samples.last_update, cpu)))
+ amu_scale_freq_tick();
+}
+
#define AMU_SAMPLE_EXP_MS 20
unsigned int arch_freq_get_on_cpu(int cpu)
@@ -239,8 +252,8 @@ unsigned int arch_freq_get_on_cpu(int cpu)
* this boils down to identifying an active cpu within the same freq
* domain, if any.
*/
- if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
- time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
+ if (!housekeeping_cpu(cpu, HK_TYPE_TICK) || (!idle_cpu(cpu) &&
+ time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS)))) {
struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
int ref_cpu = cpu;
--
2.25.1
* Re: [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu
2024-06-03 8:21 [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu Beata Michalska
` (3 preceding siblings ...)
2024-06-03 8:21 ` [PATCH v6 4/4] arm64: Update AMU-based freq scale factor on entering idle Beata Michalska
@ 2024-07-08 17:10 ` Catalin Marinas
2024-07-11 13:59 ` Beata Michalska
2024-07-14 0:32 ` Vanshidhar Konda
5 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2024-07-08 17:10 UTC (permalink / raw)
To: Beata Michalska
Cc: linux-kernel, linux-arm-kernel, ionela.voinescu, sudeep.holla,
will, vincent.guittot, vanshikonda, sumitg, yang, lihuisong,
viresh.kumar, rafael
Hi Beata,
On Mon, Jun 03, 2024 at 09:21:50AM +0100, Beata Michalska wrote:
> Introducing an arm64-specific version of arch_freq_get_on_cpu, cashing in
> on the existing implementation of FIE and AMUv1 support: the frequency
> scale factor, updated on each sched tick, serves as a base for retrieving
> the frequency of a given CPU. It represents an average frequency
> between ticks, so its accuracy is limited.
>
> The changes have been rather lightly (due to some limitations) tested on
> an FVP model. Note that some small discrepancies have been observed while
> testing (on the model) and this is currently being investigated, though it
> should not have any significant impact on the overall results.
What's the plan with this series? Are you still investigating those
discrepancies or is it good to go?
--
Catalin
* Re: [PATCH v6 3/4] arm64: Provide an AMU-based version of arch_freq_get_on_cpu
2024-06-03 8:21 ` [PATCH v6 3/4] arm64: Provide an AMU-based version of arch_freq_get_on_cpu Beata Michalska
@ 2024-07-10 17:44 ` Vanshidhar Konda
2024-07-17 6:54 ` Beata Michalska
2024-08-14 6:46 ` Jie Zhan
1 sibling, 1 reply; 18+ messages in thread
From: Vanshidhar Konda @ 2024-07-10 17:44 UTC (permalink / raw)
To: Beata Michalska
Cc: linux-kernel, linux-arm-kernel, ionela.voinescu, sudeep.holla,
will, catalin.marinas, vincent.guittot, sumitg, yang, lihuisong,
viresh.kumar, rafael
I apologize for the late review. This series dropped off my radar. I will test
this on an AmpereOne system.
On Mon, Jun 03, 2024 at 09:21:53AM +0100, Beata Michalska wrote:
>With the Frequency Invariance Engine (FIE) being already wired up with
>sched tick and making use of relevant (core counter and constant
>counter) AMU counters, getting the current frequency for a given CPU,
>can be achieved by utilizing the frequency scale factor which reflects
>an average CPU frequency for the last tick period length.
>
>The solution is partially based on APERF/MPERF implementation of
>arch_freq_get_on_cpu.
>
>Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
>Signed-off-by: Beata Michalska <beata.michalska@arm.com>
>---
> arch/arm64/kernel/topology.c | 110 +++++++++++++++++++++++++++++++----
> 1 file changed, 100 insertions(+), 10 deletions(-)
>
>diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>index e475ec2705e1..2c002d2c3e0b 100644
>--- a/arch/arm64/kernel/topology.c
>+++ b/arch/arm64/kernel/topology.c
>@@ -17,6 +17,7 @@
> #include <linux/cpufreq.h>
> #include <linux/init.h>
> #include <linux/percpu.h>
>+#include <linux/sched/isolation.h>
>
> #include <asm/cpu.h>
> #include <asm/cputype.h>
>@@ -88,18 +89,28 @@ int __init parse_acpi_topology(void)
> * initialized.
> */
> static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = 1UL << (2 * SCHED_CAPACITY_SHIFT);
>-static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
>-static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> static cpumask_var_t amu_fie_cpus;
>
>+struct amu_cntr_sample {
>+ u64 arch_const_cycles_prev;
>+ u64 arch_core_cycles_prev;
>+ unsigned long last_update;
>+};
>+
>+static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_cntr_sample, cpu_amu_samples);
>+
> void update_freq_counters_refs(void)
Could this be renamed to update_amu_cntr_sample() for more clarity?
> {
>- this_cpu_write(arch_core_cycles_prev, read_corecnt());
>- this_cpu_write(arch_const_cycles_prev, read_constcnt());
>+ struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
>+
>+ amu_sample->arch_core_cycles_prev = read_corecnt();
>+ amu_sample->arch_const_cycles_prev = read_constcnt();
I think it would be better to update amu_sample->last_update here.
update_freq_counters_refs() is the only way to update the amu_sample,
so it would be safer to update the whole structure in this method.
> }
>
> static inline bool freq_counters_valid(int cpu)
> {
>+ struct amu_cntr_sample *amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
>+
> if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
> return false;
>
>@@ -108,8 +119,8 @@ static inline bool freq_counters_valid(int cpu)
> return false;
> }
>
>- if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
>- !per_cpu(arch_core_cycles_prev, cpu))) {
>+ if (unlikely(!amu_sample->arch_const_cycles_prev ||
>+ !amu_sample->arch_core_cycles_prev)) {
> pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
> return false;
> }
>@@ -152,17 +163,22 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
>
> static void amu_scale_freq_tick(void)
> {
>+ struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> u64 prev_core_cnt, prev_const_cnt;
> u64 core_cnt, const_cnt, scale;
>
>- prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
>- prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
>+ prev_const_cnt = amu_sample->arch_const_cycles_prev;
>+ prev_core_cnt = amu_sample->arch_core_cycles_prev;
>
> update_freq_counters_refs();
>
>- const_cnt = this_cpu_read(arch_const_cycles_prev);
>- core_cnt = this_cpu_read(arch_core_cycles_prev);
>+ const_cnt = amu_sample->arch_const_cycles_prev;
>+ core_cnt = amu_sample->arch_core_cycles_prev;
>
>+ /*
>+ * This should not happen unless the AMUs have been reset and the
>+ * counter values have not been restored - unlikely
>+ */
> if (unlikely(core_cnt <= prev_core_cnt ||
> const_cnt <= prev_const_cnt))
> return;
>@@ -182,6 +198,8 @@ static void amu_scale_freq_tick(void)
>
> scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
> this_cpu_write(arch_freq_scale, (unsigned long)scale);
>+
>+ amu_sample->last_update = jiffies;
Please see the comment above. I think this line could be moved to
the update_freq_counters_refs() method.
> }
>
> static struct scale_freq_data amu_sfd = {
>@@ -189,6 +207,78 @@ static struct scale_freq_data amu_sfd = {
> .set_freq_scale = amu_scale_freq_tick,
> };
>
>+static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
>+{
>+ return cpumask_available(amu_fie_cpus) &&
>+ cpumask_test_cpu(cpu, amu_fie_cpus);
>+}
>+
>+#define AMU_SAMPLE_EXP_MS 20
>+
>+unsigned int arch_freq_get_on_cpu(int cpu)
>+{
>+ struct amu_cntr_sample *amu_sample;
>+ unsigned int start_cpu = cpu;
>+ unsigned long last_update;
>+ unsigned int freq = 0;
>+ u64 scale;
>+
>+ if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
>+ return 0;
>+
>+retry:
>+ amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
>+
>+ last_update = amu_sample->last_update;
>+
>+ /*
>+ * For those CPUs that are in full dynticks mode,
>+ * and those that have not seen tick for a while
>+ * try an alternative source for the counters (and thus freq scale),
>+ * if available, for given policy:
>+ * this boils down to identifying an active cpu within the same freq
>+ * domain, if any.
>+ */
>+ if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
>+ time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
>+ struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
>+ int ref_cpu = cpu;
>+
>+ if (!policy)
>+ goto leave;
>+
>+ if (!policy_is_shared(policy)) {
>+ cpufreq_cpu_put(policy);
>+ goto leave;
>+ }
>+
>+ do {
>+ ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
>+ start_cpu, false);
>+
>+ } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
>+
>+ cpufreq_cpu_put(policy);
>+
>+ if (ref_cpu >= nr_cpu_ids)
>+ /* No alternative to pull info from */
>+ goto leave;
>+
>+ cpu = ref_cpu;
>+ goto retry;
>+ }
>+ /*
>+ * Reversed computation to the one used to determine
>+ * the arch_freq_scale value
>+ * (see amu_scale_freq_tick for details)
>+ */
>+ scale = arch_scale_freq_capacity(cpu);
>+ freq = scale * arch_scale_freq_ref(cpu);
>+ freq >>= SCHED_CAPACITY_SHIFT;
>+leave:
>+ return freq;
>+}
>+
> static void amu_fie_setup(const struct cpumask *cpus)
> {
> int cpu;
>--
>2.25.1
>
* Re: [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu
2024-07-08 17:10 ` [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu Catalin Marinas
@ 2024-07-11 13:59 ` Beata Michalska
2024-08-14 8:05 ` Jie Zhan
0 siblings, 1 reply; 18+ messages in thread
From: Beata Michalska @ 2024-07-11 13:59 UTC (permalink / raw)
To: Catalin Marinas
Cc: linux-kernel, linux-arm-kernel, ionela.voinescu, sudeep.holla,
will, vincent.guittot, vanshikonda, sumitg, yang, lihuisong,
viresh.kumar, rafael
Hi Catalin,
On Mon, Jul 08, 2024 at 06:10:02PM +0100, Catalin Marinas wrote:
> Hi Beata,
>
> On Mon, Jun 03, 2024 at 09:21:50AM +0100, Beata Michalska wrote:
> > Introducing an arm64-specific version of arch_freq_get_on_cpu, cashing in
> > on the existing implementation of FIE and AMUv1 support: the frequency
> > scale factor, updated on each sched tick, serves as a base for retrieving
> > the frequency of a given CPU. It represents an average frequency
> > between ticks, so its accuracy is limited.
> >
> > The changes have been rather lightly (due to some limitations) tested on
> > an FVP model. Note that some small discrepancies have been observed while
> > testing (on the model) and this is currently being investigated, though it
> > should not have any significant impact on the overall results.
>
> What's the plan with this series? Are you still investigating those
> discrepancies or is it good to go?
>
Overall it should be good to go, with a small caveat:
as per discussion [1], we might need to provide a new sysfs attribute exposing
an average frequency instead of plugging the new code under the existing
cpuinfo_cur_freq. This is to avoid interfering with other archs and to make
a clean distinction of which attribute provides what information.
As such, the arch_freq_get_on_cpu implementation provided within this series
([PATCH v6 3/4]) will most probably be shifted to a new function.
Hopefully I will be able to send those changes soon.
---
[1] https://lore.kernel.org/all/ZmrB_DqtmVpvG30l@arm.com/
---
BR
Beata
> --
> Catalin
* Re: [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu
2024-06-03 8:21 [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu Beata Michalska
` (4 preceding siblings ...)
2024-07-08 17:10 ` [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu Catalin Marinas
@ 2024-07-14 0:32 ` Vanshidhar Konda
2024-07-17 6:58 ` Beata Michalska
5 siblings, 1 reply; 18+ messages in thread
From: Vanshidhar Konda @ 2024-07-14 0:32 UTC (permalink / raw)
To: Beata Michalska
Cc: linux-kernel, linux-arm-kernel, ionela.voinescu, sudeep.holla,
will, catalin.marinas, vincent.guittot, sumitg, yang, lihuisong,
viresh.kumar, rafael
On Mon, Jun 03, 2024 at 09:21:50AM +0100, Beata Michalska wrote:
>Introducing an arm64-specific version of arch_freq_get_on_cpu, cashing in
>on the existing implementation of FIE and AMUv1 support: the frequency
>scale factor, updated on each sched tick, serves as a base for retrieving
>the frequency of a given CPU. It represents an average frequency
>between ticks, so its accuracy is limited.
>
>The changes have been rather lightly (due to some limitations) tested on
>an FVP model. Note that some small discrepancies have been observed while
I've tested these changes (v6) on an AmpereOne system and the results look correct.
The frequency reported by scaling_cur_freq is as expected for housekeeping CPUs
as well as for idle and isolated CPUs.
Thanks,
Vanshi
>testing (on the model) and this is currently being investigated, though it
>should not have any significant impact on the overall results.
>
>Relevant discussions:
>[1] https://lore.kernel.org/all/20240229162520.970986-1-vanshikonda@os.amperecomputing.com/
>[2] https://lore.kernel.org/all/7eozim2xnepacnnkzxlbx34hib4otycnbn4dqymfziqou5lw5u@5xzpv3t7sxo3/
>[3] https://lore.kernel.org/all/20231212072617.14756-1-lihuisong@huawei.com/
>[4] https://lore.kernel.org/lkml/ZIHpd6unkOtYVEqP@e120325.cambridge.arm.com/T/#m4e74cb5a0aaa353c60fedc6cfb95ab7a6e381e3c
>
>v6:
> - delay allocating cpumask for AMU FIE support instead of invalidating the mask
> upon failure to register cpufreq policy notifications
> - drop the change to cpufreq core (for cpuinfo_cur_freq) as this one will be
> sent as a separate change
>
>v5:
> - Fix invalid access to cpumask
> - Reworked finding reference cpu when getting the freq
>
>v4:
>- dropping seqcount
>- fixing identifying active cpu within given policy
>- skipping full dynticks cpus when retrieving the freq
>- bringing back plugging in arch_freq_get_on_cpu into cpuinfo_cur_freq
>
>v3:
>- dropping changes to cpufreq_verify_current_freq
>- pulling in changes from Ionela initializing capacity_freq_ref to 0
> (thanks for that!) and applying suggestions made by her during last review:
> - switching to arch_scale_freq_capacity and arch_scale_freq_ref when
> reversing freq scale factor computation
> - swapping shift with multiplication
>- adding time limit for considering last scale update as valid
>- updating frequency scale factor upon entering idle
>
>v2:
>- Splitting the patches
>- Adding comment for full dyntick mode
>- Plugging arch_freq_get_on_cpu into cpufreq_verify_current_freq instead
> of in show_cpuinfo_cur_freq to allow the framework to stay more in sync
> with potential freq changes
>
>
>
>Beata Michalska (3):
> arm64: amu: Delay allocating cpumask for AMU FIE support
> arm64: Provide an AMU-based version of arch_freq_get_on_cpu
> arm64: Update AMU-based frequency scale factor on entering idle
>
>Ionela Voinescu (1):
> arch_topology: init capacity_freq_ref to 0
>
> arch/arm64/kernel/topology.c | 145 +++++++++++++++++++++++++++++------
> drivers/base/arch_topology.c | 8 +-
> 2 files changed, 127 insertions(+), 26 deletions(-)
>
>--
>2.25.1
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 3/4] arm64: Provide an AMU-based version of arch_freq_get_on_cpu
2024-07-10 17:44 ` Vanshidhar Konda
@ 2024-07-17 6:54 ` Beata Michalska
0 siblings, 0 replies; 18+ messages in thread
From: Beata Michalska @ 2024-07-17 6:54 UTC (permalink / raw)
To: Vanshidhar Konda
Cc: linux-kernel, linux-arm-kernel, ionela.voinescu, sudeep.holla,
will, catalin.marinas, vincent.guittot, sumitg, yang, lihuisong,
viresh.kumar, rafael
On Wed, Jul 10, 2024 at 10:44:41AM -0700, Vanshidhar Konda wrote:
> I apologize for the late review. This series dropped off my radar. I will test
> this on an AmpereOne system.
>
> On Mon, Jun 03, 2024 at 09:21:53AM +0100, Beata Michalska wrote:
> > With the Frequency Invariance Engine (FIE) being already wired up with
> > sched tick and making use of relevant (core counter and constant
> > counter) AMU counters, getting the current frequency for a given CPU,
> > can be achieved by utilizing the frequency scale factor which reflects
> > an average CPU frequency for the last tick period length.
> >
> > The solution is partially based on APERF/MPERF implementation of
> > arch_freq_get_on_cpu.
> >
> > Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
> > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> > ---
> > arch/arm64/kernel/topology.c | 110 +++++++++++++++++++++++++++++++----
> > 1 file changed, 100 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> > index e475ec2705e1..2c002d2c3e0b 100644
> > --- a/arch/arm64/kernel/topology.c
> > +++ b/arch/arm64/kernel/topology.c
> > @@ -17,6 +17,7 @@
> > #include <linux/cpufreq.h>
> > #include <linux/init.h>
> > #include <linux/percpu.h>
> > +#include <linux/sched/isolation.h>
> >
> > #include <asm/cpu.h>
> > #include <asm/cputype.h>
> > @@ -88,18 +89,28 @@ int __init parse_acpi_topology(void)
> > * initialized.
> > */
> > static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale) = 1UL << (2 * SCHED_CAPACITY_SHIFT);
> > -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
> > -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
> > static cpumask_var_t amu_fie_cpus;
> >
> > +struct amu_cntr_sample {
> > + u64 arch_const_cycles_prev;
> > + u64 arch_core_cycles_prev;
> > + unsigned long last_update;
> > +};
> > +
> > +static DEFINE_PER_CPU_SHARED_ALIGNED(struct amu_cntr_sample, cpu_amu_samples);
> > +
> > void update_freq_counters_refs(void)
>
> Could this be renamed to update_amu_cntr_sample() for more clarity?
I guess it could, though to be fair I'd be more inclined to rename the struct
itself: this function updates the cached counter values that are being used
both as the most recent ones and as a reference when calculating the delta
between the readings, so I'd say the naming here is pretty accurate.
>
> > {
> > - this_cpu_write(arch_core_cycles_prev, read_corecnt());
> > - this_cpu_write(arch_const_cycles_prev, read_constcnt());
> > + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> > +
> > + amu_sample->arch_core_cycles_prev = read_corecnt();
> > + amu_sample->arch_const_cycles_prev = read_constcnt();
>
> I think it would be better to update amp_sample->last_update here. update_freq_counters_refs
> is the only way to update the amu_sample. So it should be safer to update the whole structure
> in this method.
Right, so amu_sample->last_update actually refers to updating the scale
factor, not the counter values per se, even though one cannot be updated
without the other; so I'd rather keep those separated: the update time is being
used to determine whether the last known freq scale is still somewhat relevant.
If that would eliminate the confusion I can rename that field.
>
> > }
> >
> > static inline bool freq_counters_valid(int cpu)
> > {
> > + struct amu_cntr_sample *amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> > +
> > if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
> > return false;
> >
> > @@ -108,8 +119,8 @@ static inline bool freq_counters_valid(int cpu)
> > return false;
> > }
> >
> > - if (unlikely(!per_cpu(arch_const_cycles_prev, cpu) ||
> > - !per_cpu(arch_core_cycles_prev, cpu))) {
> > + if (unlikely(!amu_sample->arch_const_cycles_prev ||
> > + !amu_sample->arch_core_cycles_prev)) {
> > pr_debug("CPU%d: cycle counters are not enabled.\n", cpu);
> > return false;
> > }
> > @@ -152,17 +163,22 @@ void freq_inv_set_max_ratio(int cpu, u64 max_rate)
> >
> > static void amu_scale_freq_tick(void)
> > {
> > + struct amu_cntr_sample *amu_sample = this_cpu_ptr(&cpu_amu_samples);
> > u64 prev_core_cnt, prev_const_cnt;
> > u64 core_cnt, const_cnt, scale;
> >
> > - prev_const_cnt = this_cpu_read(arch_const_cycles_prev);
> > - prev_core_cnt = this_cpu_read(arch_core_cycles_prev);
> > + prev_const_cnt = amu_sample->arch_const_cycles_prev;
> > + prev_core_cnt = amu_sample->arch_core_cycles_prev;
> >
> > update_freq_counters_refs();
> >
> > - const_cnt = this_cpu_read(arch_const_cycles_prev);
> > - core_cnt = this_cpu_read(arch_core_cycles_prev);
> > + const_cnt = amu_sample->arch_const_cycles_prev;
> > + core_cnt = amu_sample->arch_core_cycles_prev;
> >
> > + /*
> > + * This should not happen unless the AMUs have been reset and the
> > + * counter values have not been restored - unlikely
> > + */
> > if (unlikely(core_cnt <= prev_core_cnt ||
> > const_cnt <= prev_const_cnt))
> > return;
> > @@ -182,6 +198,8 @@ static void amu_scale_freq_tick(void)
> >
> > scale = min_t(unsigned long, scale, SCHED_CAPACITY_SCALE);
> > this_cpu_write(arch_freq_scale, (unsigned long)scale);
> > +
> > + amu_sample->last_update = jiffies;
>
> Please see the comment above. I think this line could be moved to
> update_freq_counters_refs method.
>
> > }
> >
> > static struct scale_freq_data amu_sfd = {
> > @@ -189,6 +207,78 @@ static struct scale_freq_data amu_sfd = {
> > .set_freq_scale = amu_scale_freq_tick,
> > };
> >
> > +static __always_inline bool amu_fie_cpu_supported(unsigned int cpu)
> > +{
> > + return cpumask_available(amu_fie_cpus) &&
> > + cpumask_test_cpu(cpu, amu_fie_cpus);
> > +}
> > +
> > +#define AMU_SAMPLE_EXP_MS 20
> > +
> > +unsigned int arch_freq_get_on_cpu(int cpu)
> > +{
> > + struct amu_cntr_sample *amu_sample;
> > + unsigned int start_cpu = cpu;
> > + unsigned long last_update;
> > + unsigned int freq = 0;
> > + u64 scale;
> > +
> > + if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
> > + return 0;
> > +
> > +retry:
> > + amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> > +
> > + last_update = amu_sample->last_update;
> > +
> > + /*
> > + * For those CPUs that are in full dynticks mode,
> > + * and those that have not seen tick for a while
> > + * try an alternative source for the counters (and thus freq scale),
> > + * if available, for given policy:
> > + * this boils down to identifying an active cpu within the same freq
> > + * domain, if any.
> > + */
> > + if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
> > + time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
> > + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> > + int ref_cpu = cpu;
> > +
> > + if (!policy)
> > + goto leave;
> > +
> > + if (!policy_is_shared(policy)) {
> > + cpufreq_cpu_put(policy);
> > + goto leave;
> > + }
> > +
> > + do {
> > + ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
> > + start_cpu, false);
> > +
> > + } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
> > +
> > + cpufreq_cpu_put(policy);
> > +
> > + if (ref_cpu >= nr_cpu_ids)
> > + /* No alternative to pull info from */
> > + goto leave;
> > +
> > + cpu = ref_cpu;
> > + goto retry;
> > + }
> > + /*
> > + * Reversed computation to the one used to determine
> > + * the arch_freq_scale value
> > + * (see amu_scale_freq_tick for details)
> > + */
> > + scale = arch_scale_freq_capacity(cpu);
> > + freq = scale * arch_scale_freq_ref(cpu);
> > + freq >>= SCHED_CAPACITY_SHIFT;
> > +leave:
> > + return freq;
> > +}
> > +
> > static void amu_fie_setup(const struct cpumask *cpus)
> > {
> > int cpu;
> > --
> > 2.25.1
> >
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu
2024-07-14 0:32 ` Vanshidhar Konda
@ 2024-07-17 6:58 ` Beata Michalska
0 siblings, 0 replies; 18+ messages in thread
From: Beata Michalska @ 2024-07-17 6:58 UTC (permalink / raw)
To: Vanshidhar Konda
Cc: linux-kernel, linux-arm-kernel, ionela.voinescu, sudeep.holla,
will, catalin.marinas, vincent.guittot, sumitg, yang, lihuisong,
viresh.kumar, rafael
On Sat, Jul 13, 2024 at 05:32:43PM -0700, Vanshidhar Konda wrote:
> On Mon, Jun 03, 2024 at 09:21:50AM +0100, Beata Michalska wrote:
> > Introducing an arm64-specific version of arch_freq_get_on_cpu, cashing in on
> > existing implementation for FIE and AMUv1 support: the frequency scale
> > factor, updated on each sched tick, serves as a base for retrieving
> > the frequency for a given CPU, representing an average frequency
> > reported between the ticks - thus its accuracy is limited.
> >
> > The changes have been rather lightly (due to some limitations) tested on
> > an FVP model. Note that some small discrepancies have been observed while
>
> I've tested these changes (v6) on AmpereOne system and the results look correct.
> The frequency reported by scaling_cur_freq is as expected for housekeeping cpus,
> idle as well as isolated cpus.
>
That's greatly appreciated - thank you for that!
This series depends on [1], so once the relevant changes are ready I will send
an update to expose the average freq through a relevant sysfs attribute.
---
[1] https://lore.kernel.org/all/ZmrB_DqtmVpvG30l@arm.com/
---
Best Regards
Beata
> Thanks,
> Vanshi
>
> > testing (on the model) and this is currently being investigated, though it
> > should not have any significant impact on the overall results.
> >
> > Relevant discussions:
> > [1] https://lore.kernel.org/all/20240229162520.970986-1-vanshikonda@os.amperecomputing.com/
> > [2] https://lore.kernel.org/all/7eozim2xnepacnnkzxlbx34hib4otycnbn4dqymfziqou5lw5u@5xzpv3t7sxo3/
> > [3] https://lore.kernel.org/all/20231212072617.14756-1-lihuisong@huawei.com/
> > [4] https://lore.kernel.org/lkml/ZIHpd6unkOtYVEqP@e120325.cambridge.arm.com/T/#m4e74cb5a0aaa353c60fedc6cfb95ab7a6e381e3c
> >
> > v6:
> > - delay allocating cpumask for AMU FIE support instead of invalidating the mask
> > upon failure to register cpufreq policy notifications
> > - drop the change to cpufreq core (for cpuinfo_cur_freq) as this one will be
> > sent as a separate change
> >
> > v5:
> > - Fix invalid access to cpumask
> > - Reworked finding reference cpu when getting the freq
> >
> > v4:
> > - dropping seqcount
> > - fixing identifying active cpu within given policy
> > - skipping full dynticks cpus when retrieving the freq
> > - bringing back plugging in arch_freq_get_on_cpu into cpuinfo_cur_freq
> >
> > v3:
> > - dropping changes to cpufreq_verify_current_freq
> > - pulling in changes from Ionela initializing capacity_freq_ref to 0
> > (thanks for that!) and applying suggestions made by her during last review:
> > - switching to arch_scale_freq_capacity and arch_scale_freq_ref when
> > reversing freq scale factor computation
> > - swapping shift with multiplication
> > - adding time limit for considering last scale update as valid
> > - updating frequency scale factor upon entering idle
> >
> > v2:
> > - Splitting the patches
> > - Adding comment for full dyntick mode
> > - Plugging arch_freq_get_on_cpu into cpufreq_verify_current_freq instead
> > of in show_cpuinfo_cur_freq to allow the framework to stay more in sync
> > with potential freq changes
> >
> >
> >
> > Beata Michalska (3):
> > arm64: amu: Delay allocating cpumask for AMU FIE support
> > arm64: Provide an AMU-based version of arch_freq_get_on_cpu
> > arm64: Update AMU-based frequency scale factor on entering idle
> >
> > Ionela Voinescu (1):
> > arch_topology: init capacity_freq_ref to 0
> >
> > arch/arm64/kernel/topology.c | 145 +++++++++++++++++++++++++++++------
> > drivers/base/arch_topology.c | 8 +-
> > 2 files changed, 127 insertions(+), 26 deletions(-)
> >
> > --
> > 2.25.1
> >
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 3/4] arm64: Provide an AMU-based version of arch_freq_get_on_cpu
2024-06-03 8:21 ` [PATCH v6 3/4] arm64: Provide an AMU-based version of arch_freq_get_on_cpu Beata Michalska
2024-07-10 17:44 ` Vanshidhar Konda
@ 2024-08-14 6:46 ` Jie Zhan
2024-08-26 7:23 ` Beata Michalska
1 sibling, 1 reply; 18+ messages in thread
From: Jie Zhan @ 2024-08-14 6:46 UTC (permalink / raw)
To: Beata Michalska, linux-kernel, linux-arm-kernel, ionela.voinescu,
sudeep.holla, will, catalin.marinas, vincent.guittot, vanshikonda
Cc: sumitg, yang, lihuisong, viresh.kumar, rafael
Hi Beata,
On 03/06/2024 16:21, Beata Michalska wrote:
> With the Frequency Invariance Engine (FIE) being already wired up with
> sched tick and making use of relevant (core counter and constant
> counter) AMU counters, getting the current frequency for a given CPU,
> can be achieved by utilizing the frequency scale factor which reflects
> an average CPU frequency for the last tick period length.
>
> The solution is partially based on APERF/MPERF implementation of
> arch_freq_get_on_cpu.
>
> Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> ---
> arch/arm64/kernel/topology.c | 110 +++++++++++++++++++++++++++++++----
> 1 file changed, 100 insertions(+), 10 deletions(-)
...
> +
> +#define AMU_SAMPLE_EXP_MS 20
> +
> +unsigned int arch_freq_get_on_cpu(int cpu)
> +{
> + struct amu_cntr_sample *amu_sample;
> + unsigned int start_cpu = cpu;
> + unsigned long last_update;
> + unsigned int freq = 0;
> + u64 scale;
> +
> + if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
> + return 0;
> +
> +retry:
> + amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> +
> + last_update = amu_sample->last_update;
> +
> + /*
> + * For those CPUs that are in full dynticks mode,
> + * and those that have not seen tick for a while
> + * try an alternative source for the counters (and thus freq scale),
> + * if available, for given policy:
> + * this boils down to identifying an active cpu within the same freq
> + * domain, if any.
> + */
> + if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
> + time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
One question here.

The 2nd condition, given the additional code in patch 4, would be:

	(!idle_cpu(cpu) && time_is_before_jiffies(last_update +
			   msecs_to_jiffies(AMU_SAMPLE_EXP_MS)))

That means trying another cpu in the same freq domain if this cpu is running
and has not seen a tick recently.

In this case, if it fails to find an alternative cpu in the following code,
can it just jump to the calculation part and return an 'old' frequency rather
than return 0? The freq here won't be older than the freq at the time the cpu
last went idle -- yet the freq before last idle is returned if the cpu were
idle (patch 4).
> + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> + int ref_cpu = cpu;
> +
> + if (!policy)
> + goto leave;
> +
> + if (!policy_is_shared(policy)) {
> + cpufreq_cpu_put(policy);
> + goto leave;
> + }
> +
> + do {
> + ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
> + start_cpu, false);
start_cpu is only used here. Looks like we can s/start_cpu/cpu/ and remove
its definition above?
> +
> + } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
> +
> + cpufreq_cpu_put(policy);
> +
> + if (ref_cpu >= nr_cpu_ids)
> + /* No alternative to pull info from */
> + goto leave;
> +
> + cpu = ref_cpu;
> + goto retry;
> + }
> + /*
> + * Reversed computation to the one used to determine
> + * the arch_freq_scale value
> + * (see amu_scale_freq_tick for details)
> + */
> + scale = arch_scale_freq_capacity(cpu);
> + freq = scale * arch_scale_freq_ref(cpu);
> + freq >>= SCHED_CAPACITY_SHIFT;
> +leave:
> + return freq;
> +}
> +
> static void amu_fie_setup(const struct cpumask *cpus)
> {
> int cpu;
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu
2024-07-11 13:59 ` Beata Michalska
@ 2024-08-14 8:05 ` Jie Zhan
2024-08-26 7:24 ` Beata Michalska
0 siblings, 1 reply; 18+ messages in thread
From: Jie Zhan @ 2024-08-14 8:05 UTC (permalink / raw)
To: Beata Michalska, Catalin Marinas
Cc: linux-kernel, linux-arm-kernel, ionela.voinescu, sudeep.holla,
will, vincent.guittot, vanshikonda, sumitg, yang, lihuisong,
viresh.kumar, rafael
On 11/07/2024 21:59, Beata Michalska wrote:
> Hi Catalin,
>
> On Mon, Jul 08, 2024 at 06:10:02PM +0100, Catalin Marinas wrote:
>> Hi Beata,
>>
>> On Mon, Jun 03, 2024 at 09:21:50AM +0100, Beata Michalska wrote:
>>> Introducing an arm64-specific version of arch_freq_get_on_cpu, cashing in on
>>> existing implementation for FIE and AMUv1 support: the frequency scale
>>> factor, updated on each sched tick, serves as a base for retrieving
>>> the frequency for a given CPU, representing an average frequency
>>> reported between the ticks - thus its accuracy is limited.
>>>
>>> The changes have been rather lightly (due to some limitations) tested on
>>> an FVP model. Note that some small discrepancies have been observed while
>>> testing (on the model) and this is currently being investigated, though it
>>> should not have any significant impact on the overall results.
>> What's the plan with this series? Are you still investigating those
>> discrepancies or is it good to go?
>>
> Overall it should be good to go with small caveat:
> as per discussion [1] we might need to provide new sysfs attribute exposing an
> average frequency instead of plugging new code under existing cpuinfo_cur_freq.
> This is to avoid messing up with other archs and make a clean distinction on
> which attribute provides what information.
> As such, the arch_freq_get_on_cpu implementation provided within this series
> [PATCH v6 3/4] will most probably be shifted to a new function.
>
> Hopefully will be able to send those changes soon.
>
> ---
> [1] https://lore.kernel.org/all/ZmrB_DqtmVpvG30l@arm.com/
> ---
> BR
> Beata
>
>> --
>> Catalin
>>
Hi Beata,
I've recently tested this patchset on a Kunpeng system.
It works as expected when reading scaling_cur_freq.
The frequency samples are much stabler than what cppc_cpufreq returns.
A few minor things.
1. I observed larger errors on idle cpus than on busy cpus, though it's just
up to 1%. Not sure if this comes from the uncertain time interval between the
last tick and entering idle. The shorter the averaging interval, the larger
the error, I suppose.

2. In the current implementation, the resolution of the reported frequency
would be:

	max_freq_khz / SCHED_CAPACITY_SCALE

This loss of resolution looks a bit unnecessary to me. It should be possible
to get a better resolution if we can do this in arch_freq_get_on_cpu():

	freq = delta_cycle_cnt * max_freq_khz / delta_const_cnt

which may require caching both the current and previous sets of counts in the
per-cpu struct amu_cntr_sample.
Kind regards,
Jie
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 3/4] arm64: Provide an AMU-based version of arch_freq_get_on_cpu
2024-08-14 6:46 ` Jie Zhan
@ 2024-08-26 7:23 ` Beata Michalska
2024-08-27 13:05 ` Jie Zhan
0 siblings, 1 reply; 18+ messages in thread
From: Beata Michalska @ 2024-08-26 7:23 UTC (permalink / raw)
To: Jie Zhan
Cc: linux-kernel, linux-arm-kernel, ionela.voinescu, sudeep.holla,
will, catalin.marinas, vincent.guittot, vanshikonda, sumitg, yang,
lihuisong, viresh.kumar, rafael
On Wed, Aug 14, 2024 at 02:46:16PM +0800, Jie Zhan wrote:
> Hi Beata,
Hi Jie,
>
> On 03/06/2024 16:21, Beata Michalska wrote:
> > With the Frequency Invariance Engine (FIE) being already wired up with
> > sched tick and making use of relevant (core counter and constant
> > counter) AMU counters, getting the current frequency for a given CPU,
> > can be achieved by utilizing the frequency scale factor which reflects
> > an average CPU frequency for the last tick period length.
> >
> > The solution is partially based on APERF/MPERF implementation of
> > arch_freq_get_on_cpu.
> >
> > Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
> > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
> > ---
> > arch/arm64/kernel/topology.c | 110 +++++++++++++++++++++++++++++++----
> > 1 file changed, 100 insertions(+), 10 deletions(-)
> ...
> > +
> > +#define AMU_SAMPLE_EXP_MS 20
> > +
> > +unsigned int arch_freq_get_on_cpu(int cpu)
> > +{
> > + struct amu_cntr_sample *amu_sample;
> > + unsigned int start_cpu = cpu;
> > + unsigned long last_update;
> > + unsigned int freq = 0;
> > + u64 scale;
> > +
> > + if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
> > + return 0;
> > +
> > +retry:
> > + amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
> > +
> > + last_update = amu_sample->last_update;
> > +
> > + /*
> > + * For those CPUs that are in full dynticks mode,
> > + * and those that have not seen tick for a while
> > + * try an alternative source for the counters (and thus freq scale),
> > + * if available, for given policy:
> > + * this boils down to identifying an active cpu within the same freq
> > + * domain, if any.
> > + */
> > + if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
> > + time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
> One question here.
>
> The 2nd condition, given the additional code in patch 4, would be:
> (!idle_cpu(cpu) && time_is_before_jiffies(last_update +
> msecs_to_jiffies(AMU_SAMPLE_EXP_MS)))
> That means trying another cpu in the same freq domain if this cpu is running
> and not having a tick recently.
>
> In this case, if it fails to find an alternative cpu in the following code,
> can it just jump to the calculation
> part and return an 'old' frequency rather than return 0?
> The freq here won't be older than the freq when the cpu went idle last time
> -- yet the freq before last idle
> is returned if the cpu were idle (patch 4).
To be fair, I will be dropping the idle_cpu check from this condition so that
we do keep the cap on the validity of the scale factor: meaning that if the cpu
remains idle for longer than the assumed 20ms, we will either look for an
alternative or return 0. I'm not entirely convinced that returning somewhat
stale information on the freq for CPUs that remain idle for a while will be
useful.
One note here: as per discussion in [1] this functionality will be moved to a
new sysfs attribute, so it will no longer be used via scaling_cur_freq.
>
> > + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
> > + int ref_cpu = cpu;
> > +
> > + if (!policy)
> > + goto leave;
> > +
> > + if (!policy_is_shared(policy)) {
> > + cpufreq_cpu_put(policy);
> > + goto leave;
> > + }
> > +
> > + do {
> > + ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
> > + start_cpu, false);
> start_cpu is only used here. looks like we can s/start_cpu/cpu/ and remove
> its definition above?
It is indeed, but it is needed by cpumask_next_wrap to know when to stop the
iteration after wrapping.
---
[1] https://lore.kernel.org/all/20240603081331.3829278-1-beata.michalska@arm.com/
---
BR
Beata
> > +
> > + } while (ref_cpu < nr_cpu_ids && idle_cpu(ref_cpu));
> > +
> > + cpufreq_cpu_put(policy);
> > +
> > + if (ref_cpu >= nr_cpu_ids)
> > + /* No alternative to pull info from */
> > + goto leave;
> > +
> > + cpu = ref_cpu;
> > + goto retry;
> > + }
> > + /*
> > + * Reversed computation to the one used to determine
> > + * the arch_freq_scale value
> > + * (see amu_scale_freq_tick for details)
> > + */
> > + scale = arch_scale_freq_capacity(cpu);
> > + freq = scale * arch_scale_freq_ref(cpu);
> > + freq >>= SCHED_CAPACITY_SHIFT;
> > +leave:
> > + return freq;
> > +}
> > +
> > static void amu_fie_setup(const struct cpumask *cpus)
> > {
> > int cpu;
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu
2024-08-14 8:05 ` Jie Zhan
@ 2024-08-26 7:24 ` Beata Michalska
2024-08-27 13:03 ` Jie Zhan
[not found] ` <8a9b4e02-a5c6-cb1b-fd32-728fc2c5e741@hisilicon.com>
0 siblings, 2 replies; 18+ messages in thread
From: Beata Michalska @ 2024-08-26 7:24 UTC (permalink / raw)
To: Jie Zhan
Cc: Catalin Marinas, linux-kernel, linux-arm-kernel, ionela.voinescu,
sudeep.holla, will, vincent.guittot, vanshikonda, sumitg, yang,
lihuisong, viresh.kumar, rafael
On Wed, Aug 14, 2024 at 04:05:24PM +0800, Jie Zhan wrote:
>
> On 11/07/2024 21:59, Beata Michalska wrote:
> > Hi Catalin,
> >
> > On Mon, Jul 08, 2024 at 06:10:02PM +0100, Catalin Marinas wrote:
> > > Hi Beata,
> > >
> > > On Mon, Jun 03, 2024 at 09:21:50AM +0100, Beata Michalska wrote:
> > > > Introducing an arm64-specific version of arch_freq_get_on_cpu, cashing in on
> > > > existing implementation for FIE and AMUv1 support: the frequency scale
> > > > factor, updated on each sched tick, serves as a base for retrieving
> > > > the frequency for a given CPU, representing an average frequency
> > > > reported between the ticks - thus its accuracy is limited.
> > > >
> > > > The changes have been rather lightly (due to some limitations) tested on
> > > > an FVP model. Note that some small discrepancies have been observed while
> > > > testing (on the model) and this is currently being investigated, though it
> > > > should not have any significant impact on the overall results.
> > > What's the plan with this series? Are you still investigating those
> > > discrepancies or is it good to go?
> > >
> > Overall it should be good to go with small caveat:
> > as per discussion [1] we might need to provide new sysfs attribute exposing an
> > average frequency instead of plugging new code under existing cpuinfo_cur_freq.
> > This is to avoid messing up with other archs and make a clean distinction on
> > which attribute provides what information.
> > As such, the arch_freq_get_on_cpu implementation provided within this series
> > [PATCH v6 3/4] will most probably be shifted to a new function.
> >
> > Hopefully will be able to send those changes soon.
> >
> > ---
> > [1] https://lore.kernel.org/all/ZmrB_DqtmVpvG30l@arm.com/
> > ---
> > BR
> > Beata
> >
> > > --
> > > Catalin
> > >
> Hi Beata,
Hi Jie,
>
> I've recently tested this patchset on a Kunpeng system.
> It works as expected when reading scaling_cur_freq.
> The frequency samples are much stabler than what cppc_cpufreq returns.
Thank you for giving it a spin.
(and apologies for late reply)
>
> A few minor things.
>
> 1. I observed larger errors on idle cpus than busy cpus, though it's just up
> to 1%.
> Not sure if this comes from the uncertain time interval between the last
> tick and entering idle.
> The shorter the averaging interval, the larger the error, I suppose.
All right - will look into it.
Just for my benefit: is that diff strictly between arch_freq_avg_get_on_cpu
and cpufreq_driver->get(policy->cpu)?
>
> 2. In the current implementation, the resolution of frequency would be:
> max_freq_khz / SCHED_CAPACITY_SCALE
> This looks a bit unnecessary to me.
>
> It's supposed to get a better resolution if we can do this in
> arch_freq_get_on_cpu():
>
> freq = delta_cycle_cnt * max_freq_khz / delta_const_cnt
>
> which may require caching both current and previous sets of counts in the
> per-cpu struct amu_cntr_sample.
>
arch_freq_get_on_cpu relies on the frequency scale factor to derive the average
frequency. The scale factor is being calculated based on the deltas you have
mentioned and arch_max_freq_scale, which uses a SCHED_CAPACITY_SCALE^2 factor
(a shift by 2 * SCHED_CAPACITY_SHIFT) to accommodate rather low reference
frequencies. arch_freq_get_on_cpu just does a somewhat reversed computation to
that.
---
BR
Beata
> Kind regards,
> Jie
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu
2024-08-26 7:24 ` Beata Michalska
@ 2024-08-27 13:03 ` Jie Zhan
[not found] ` <8a9b4e02-a5c6-cb1b-fd32-728fc2c5e741@hisilicon.com>
1 sibling, 0 replies; 18+ messages in thread
From: Jie Zhan @ 2024-08-27 13:03 UTC (permalink / raw)
To: Beata Michalska
Cc: Catalin Marinas, linux-kernel, linux-arm-kernel, ionela.voinescu,
sudeep.holla, will, vincent.guittot, vanshikonda, sumitg, yang,
lihuisong, viresh.kumar, rafael
On 26/08/2024 15:24, Beata Michalska wrote:
> ...
>> Hi Beata,
> Hi Jie,
>> I've recently tested this patchset on a Kunpeng system.
>> It works as expected when reading scaling_cur_freq.
>> The frequency samples are much stabler than what cppc_cpufreq returns.
> Thank you for giving it a spin.
> (and apologies for late reply)
>> A few minor things.
>>
>> 1. I observed larger errors on idle cpus than busy cpus, though it's just up
>> to 1%.
>> Not sure if this comes from the uncertain time interval between the last
>> tick and entering idle.
>> The shorter averaging interval, the larger error, I supposed.
> All right - will look into it.
> Just for my benefit: that diff is strictly between arch_freq_avg_get_on_cpu
> and cpufreq_driver->get(policy->cpu) ?
I can't say whether it's "strictly" between them or not, because driver->get()
shows a fluctuating value.

On my platform, cppc_cpufreq's driver->get() sometimes shows large errors on
busy cpus (as reported by recent emails), but is quite accurate on idle cpus
(<1%).
With this patch, the "error" on idle cpus as mentioned above is typically <1%,
hence better in general.
>> 2. In the current implementation, the resolution of frequency would be:
>> max_freq_khz / SCHED_CAPACITY_SCALE
>> This looks a bit unnecessary to me.
>>
>> It's supposed to get a better resolution if we can do this in
>> arch_freq_get_on_cpu():
>>
>> freq = delta_cycle_cnt * max_freq_khz / delta_const_cnt
>>
>> which may require caching both current and previous sets of counts in the
>> per-cpu struct amu_cntr_sample.
>>
> arch_freq_get_on_cpu relies on the frequency scale factor to derive the average
> frequency. The scale factor is being calculated based on the deltas you have
> mentioned and arch_max_freq_scale which uses SCHED_CAPACITY_SCALE*2 factor to
> accommodate for rather low reference frequencies. arch_freq_get_on_cpu just does
> somewhat reverse computation to that.
Yeah, I understood this.

arch_freq_get_on_cpu() now returns:

	freq = (arch_scale_freq_capacity() * arch_scale_freq_ref())
	       >> SCHED_CAPACITY_SHIFT

The frequency resolution is (arch_scale_freq_ref() >> SCHED_CAPACITY_SHIFT),
which is equivalent to max_freq_khz / SCHED_CAPACITY_SCALE.

If we can directly do

	freq = delta_cycle_cnt * ref_freq_khz / delta_const_cnt

in arch_freq_get_on_cpu(), it's supposed to have a 1KHz resolution.
(sorry for the wrong multiplier in the last reply)
Just similar to what's done in [1].

It could be more worthwhile to have a good resolution than utilising the
arch_topology framework?
[1] https://lore.kernel.org/all/20240229162520.970986-2-vanshikonda@os.amperecomputing.com/
> ---
> BR
> Beata
>> Kind regards,
>> Jie
>>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v6 3/4] arm64: Provide an AMU-based version of arch_freq_get_on_cpu
2024-08-26 7:23 ` Beata Michalska
@ 2024-08-27 13:05 ` Jie Zhan
0 siblings, 0 replies; 18+ messages in thread
From: Jie Zhan @ 2024-08-27 13:05 UTC (permalink / raw)
To: Beata Michalska
Cc: linux-kernel, linux-arm-kernel, ionela.voinescu, sudeep.holla,
will, catalin.marinas, vincent.guittot, vanshikonda, sumitg, yang,
lihuisong, viresh.kumar, rafael
On 26/08/2024 15:23, Beata Michalska wrote:
> On Wed, Aug 14, 2024 at 02:46:16PM +0800, Jie Zhan wrote:
>> Hi Beata,
> Hi Jie,
>> On 03/06/2024 16:21, Beata Michalska wrote:
>>> With the Frequency Invariance Engine (FIE) being already wired up with
>>> sched tick and making use of relevant (core counter and constant
>>> counter) AMU counters, getting the current frequency for a given CPU,
>>> can be achieved by utilizing the frequency scale factor which reflects
>>> an average CPU frequency for the last tick period length.
>>>
>>> The solution is partially based on APERF/MPERF implementation of
>>> arch_freq_get_on_cpu.
>>>
>>> Suggested-by: Ionela Voinescu <ionela.voinescu@arm.com>
>>> Signed-off-by: Beata Michalska <beata.michalska@arm.com>
>>> ---
>>> arch/arm64/kernel/topology.c | 110 +++++++++++++++++++++++++++++++----
>>> 1 file changed, 100 insertions(+), 10 deletions(-)
>> ...
>>> +
>>> +#define AMU_SAMPLE_EXP_MS 20
>>> +
>>> +unsigned int arch_freq_get_on_cpu(int cpu)
>>> +{
>>> + struct amu_cntr_sample *amu_sample;
>>> + unsigned int start_cpu = cpu;
>>> + unsigned long last_update;
>>> + unsigned int freq = 0;
>>> + u64 scale;
>>> +
>>> + if (!amu_fie_cpu_supported(cpu) || !arch_scale_freq_ref(cpu))
>>> + return 0;
>>> +
>>> +retry:
>>> + amu_sample = per_cpu_ptr(&cpu_amu_samples, cpu);
>>> +
>>> + last_update = amu_sample->last_update;
>>> +
>>> + /*
>>> + * For those CPUs that are in full dynticks mode,
>>> + * and those that have not seen tick for a while
>>> + * try an alternative source for the counters (and thus freq scale),
>>> + * if available, for given policy:
>>> + * this boils down to identifying an active cpu within the same freq
>>> + * domain, if any.
>>> + */
>>> + if (!housekeeping_cpu(cpu, HK_TYPE_TICK) ||
>>> + time_is_before_jiffies(last_update + msecs_to_jiffies(AMU_SAMPLE_EXP_MS))) {
>> One question here.
>>
>> The 2nd condition, providing the additional code in patch 4, would be:
>> (!idle_cpu(cpu) && time_is_before_jiffies(last_update +
>> msecs_to_jiffies(AMU_SAMPLE_EXP_MS)))
>> That means trying another cpu in the same freq domain if this cpu is running
>> and not having a tick recently.
>>
>> In this case, if it fails to find an alternative cpu in the following
>> code, can it just jump to the calculation part and return an 'old'
>> frequency rather than return 0?
>> The freq here won't be older than the freq when the cpu went idle last
>> time -- yet the freq before last idle is returned if the cpu were idle
>> (patch 4).
> To be fair, I will be dropping the idle_cpu check from this condition so
> that we do keep the cap on the validity of the scale factor: meaning if
> the cpu remains idle for longer than the assumed 20 ms, we will either
> look for an alternative or return 0. I'm not entirely convinced that
> returning somewhat stale frequency information for CPUs that remain idle
> for a while will be useful.
>
> One note here: as per discussion in [1] this functionality will be moved to new
> sysfs attribute so it will no longer be used via scaling_cur_freq.
>
>
>>> + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu);
>>> + int ref_cpu = cpu;
>>> +
>>> + if (!policy)
>>> + goto leave;
>>> +
>>> + if (!policy_is_shared(policy)) {
>>> + cpufreq_cpu_put(policy);
>>> + goto leave;
>>> + }
>>> +
>>> + do {
>>> + ref_cpu = cpumask_next_wrap(ref_cpu, policy->cpus,
>>> + start_cpu, false);
>> start_cpu is only used here. looks like we can s/start_cpu/cpu/ and remove
>> its definition above?
> It is indeed, but it is needed for cpumask_next_wrap() to know when to
> stop the iteration after wrapping.
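A toy user-space analogue (a hypothetical helper, not the kernel's cpumask_next_wrap() implementation) illustrates why a separate start marker is needed to terminate a wrapped iteration:

```c
#include <stdint.h>

/* Walk the set bits of 'mask' starting after bit n, wrapping around at
 * nbits; return -1 once the walk comes back to 'start'. This mimics the
 * role start_cpu plays for cpumask_next_wrap() in the patch above. */
static int next_cpu_wrap(int n, uint64_t mask, int start, int nbits)
{
	for (int i = 1; i <= nbits; i++) {
		int cpu = (n + i) % nbits;

		if (cpu == start)
			return -1;	/* full circle: no other cpu found */
		if (mask & (1ULL << cpu))
			return cpu;
	}
	return -1;
}
```

Starting from cpu 5 in a policy covering cpus {2, 5, 7} (mask 0xA4), the walk visits 7, wraps around to 2, and stops when it comes back to 5; without the start marker it would cycle over the same candidates forever.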
>
> ---
> [1] https://lore.kernel.org/all/20240603081331.3829278-1-beata.michalska@arm.com/
> ---
>
> BR
> Beata
>
Sure, thanks. Looking forward to the next version.
Jie
* Re: [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu
[not found] ` <8a9b4e02-a5c6-cb1b-fd32-728fc2c5e741@hisilicon.com>
@ 2024-09-06 9:45 ` Beata Michalska
0 siblings, 0 replies; 18+ messages in thread
From: Beata Michalska @ 2024-09-06 9:45 UTC (permalink / raw)
To: Jie Zhan
Cc: Catalin Marinas, linux-kernel, linux-arm-kernel, ionela.voinescu,
sudeep.holla, will, vincent.guittot, vanshikonda, sumitg, yang,
lihuisong, viresh.kumar, rafael
On Tue, Aug 27, 2024 at 08:56:28PM +0800, Jie Zhan wrote:
> On 26/08/2024 15:24, Beata Michalska wrote:
>
> ...
>
> > > I've recently tested this patchset on a Kunpeng system.
> > > It works as expected when reading scaling_cur_freq.
> > > The frequency samples are much stabler than what cppc_cpufreq returns.
> > Thank you for giving it a spin.
> > (and apologies for late reply)
> > > A few minor things.
> > >
> > > 1. I observed larger errors on idle cpus than busy cpus, though it's just up
> > > to 1%.
> > > Not sure if this comes from the uncertain time interval between the last
> > > tick and entering idle.
> > > The shorter averaging interval, the larger error, I supposed.
> > All right - will look into it.
> > Just for my benefit: that diff is strictly between arch_freq_avg_get_on_cpu
> > and cpufreq_driver->get(policy->cpu) ?
>
> I can't say whether it's "strictly" between them or not because driver->get()
> shows a fluctuating value.
> On my platform, cppc_cpufreq's driver->get() sometimes shows large errors on
> busy cpus (as reported by recent emails), but quite accurate on idle cpus (<1%).
>
> With this patch, the "error" on idle cpus as mentioned above is typically <1%,
> hence better in general.
Ah, that's great then - I must have misunderstood your previous comment.
Apologies for that.
>
> > > 2. In the current implementation, the resolution of frequency would be:
> > > max_freq_khz / SCHED_CAPACITY_SCALE
> > > This looks a bit unnecessary to me.
> > >
> > > It's supposed to get a better resolution if we can do this in
> > > arch_freq_get_on_cpu():
> > >
> > > freq = delta_cycle_cnt * max_freq_khz / delta_const_cnt
> > >
> > > which may require caching both current and previous sets of counts in the
> > > per-cpu struct amu_cntr_sample.
> > >
> > arch_freq_get_on_cpu relies on the frequency scale factor to derive the average
> > frequency. The scale factor is being calculated based on the deltas you have
> > mentioned and arch_max_freq_scale, which uses a SCHED_CAPACITY_SCALE^2 factor to
> > accommodate for rather low reference frequencies. arch_freq_get_on_cpu just does
> > somewhat reverse computation to that.
>
> Yeah I understood this.
>
> arch_freq_get_on_cpu() now returns:
> freq = (arch_scale_freq_capacity() * arch_scale_freq_ref()) >> SCHED_CAPACITY_SHIFT
>
> The frequency resolution is (arch_scale_freq_ref() >> SCHED_CAPACITY_SHIFT), which
> is equivalent to max_freq_khz / SCHED_CAPACITY_SCALE.
>
> If we can directly do
> freq = delta_cycle_cnt * ref_freq_khz / delta_const_cnt
> in arch_freq_get_on_cpu(), it's supposed to have a 1KHz resolution.
> (sorry for the wrong multiplier in the last reply)
>
> Just similar to what's done in [1].
>
> Could it be more worthwhile to have a better resolution than to utilise
> the arch_topology framework?
I guess the question would be whether the precision uplift justifies the
change? In both cases, the provided frequency value will carry over a
potential error due to the nature of how it is being obtained. Furthermore,
it is still an average frequency, so I am not really convinced that trading
here for a fraction of precision would bring any real value. Note that the
difference between the two will fluctuate as well. If there is indeed a real
need for that extra precision, it would be good to see some actual numbers.
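For a rough sense of scale (using the same hypothetical 3 GHz reference as above, in a user-space sketch): the quantization step of the scale-factor path is ref_freq / 1024, roughly 0.1% of the reference, which is an order of magnitude below the ~1% error on idle cpus reported earlier in the thread.

```c
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT 10

/* Worst-case resolution loss of the scale-factor path, in kHz. */
static uint64_t quant_step_khz(uint64_t ref_khz)
{
	return ref_khz >> SCHED_CAPACITY_SHIFT;
}

/* Same step expressed in hundredths of a percent of the reference. */
static uint64_t quant_step_cpct(uint64_t ref_khz)
{
	return quant_step_khz(ref_khz) * 10000 / ref_khz;
}
```

For ref_khz = 3000000 this gives a step of 2929 kHz, i.e. about 0.09% of the reference frequency.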
---
BR
Beata
>
> [1] https://lore.kernel.org/all/20240229162520.970986-2-vanshikonda@os.amperecomputing.com/
>
> >
> > ---
> > BR
> > Beata
> > > Kind regards,
> > > Jie
> > >
end of thread, other threads: [~2024-09-06 10:07 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-03 8:21 [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu Beata Michalska
2024-06-03 8:21 ` [PATCH v6 1/4] arch_topology: init capacity_freq_ref to 0 Beata Michalska
2024-06-03 8:21 ` [PATCH v6 2/4] arm64: amu: Delay allocating cpumask for AMU FIE support Beata Michalska
2024-06-03 8:21 ` [PATCH v6 3/4] arm64: Provide an AMU-based version of arch_freq_get_on_cpu Beata Michalska
2024-07-10 17:44 ` Vanshidhar Konda
2024-07-17 6:54 ` Beata Michalska
2024-08-14 6:46 ` Jie Zhan
2024-08-26 7:23 ` Beata Michalska
2024-08-27 13:05 ` Jie Zhan
2024-06-03 8:21 ` [PATCH v6 4/4] arm64: Update AMU-based freq scale factor on entering idle Beata Michalska
2024-07-08 17:10 ` [PATCH v6 0/4] Add support for AArch64 AMUv1-based arch_freq_get_on_cpu Catalin Marinas
2024-07-11 13:59 ` Beata Michalska
2024-08-14 8:05 ` Jie Zhan
2024-08-26 7:24 ` Beata Michalska
2024-08-27 13:03 ` Jie Zhan
[not found] ` <8a9b4e02-a5c6-cb1b-fd32-728fc2c5e741@hisilicon.com>
2024-09-06 9:45 ` Beata Michalska
2024-07-14 0:32 ` Vanshidhar Konda
2024-07-17 6:58 ` Beata Michalska