* [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance
@ 2021-06-21 9:19 Viresh Kumar
2021-06-21 9:19 ` [PATCH V3 4/4] cpufreq: CPPC: " Viresh Kumar
` (2 more replies)
0 siblings, 3 replies; 41+ messages in thread
From: Viresh Kumar @ 2021-06-21 9:19 UTC (permalink / raw)
To: Rafael Wysocki, Ionela Voinescu, Ben Segall,
Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman,
Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra,
Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Vincent Guittot,
Viresh Kumar, Will Deacon
Cc: linux-pm, Qian Cai, linux-acpi, linux-kernel, Paul E. McKenney,
Rafael J. Wysocki
Hello,
Changes since V2:
- We don't need the start_cpu() and stop_cpu() callbacks anymore; this can be
made to work from the policy ->init() and ->exit() callbacks alone.
- Two new cleanup patches 1/4 and 2/4.
- Improved commit log of 3/4.
- Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter
overflow (seen with Vincent's setup).
- Handle everything from the ->init()/->exit() callbacks only.
Changes since V1:
- A few of the patches migrating users to the ->exit() callback are posted separately.
- The CPPC patch was completely reverted and so the support for FIE is again
added here from scratch.
- The start_cpu() and stop_cpu() interface is reworked a little so stop_cpu() is
only ever called for a CPU if start_cpu() was called for it earlier.
- A new patch to implement RCU locking in arch_topology core to avoid some
races.
- Some cleanup, and clearly separated paths for FIE in the CPPC driver now.
-------------------------8<-------------------------
The CPPC cpufreq driver is used for ARM servers, and this patch series tries to
provide counter-based frequency invariance support for them in the absence of
architecture specific counters (like AMUs).
This was reverted earlier for the 5.13 kernel after Qian Cai reported a kernel
oops during suspend/resume.
This is based on v5.13-rc7 + a cleanup patchset:
https://lore.kernel.org/linux-pm/cover.1623825358.git.viresh.kumar@linaro.org/
All the patches are pushed here together for people to test:
https://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git/log/?h=cpufreq/cppc
This is tested on my Hikey platform (without the actual read/write to
performance counters), with this script for over an hour:
while true; do
	for i in `seq 1 7`;
	do
		echo 0 > /sys/devices/system/cpu/cpu$i/online;
	done;
	for i in `seq 1 7`;
	do
		echo 1 > /sys/devices/system/cpu/cpu$i/online;
	done;
done
The same test was run by Vincent on ThunderX2 and no issues were seen.
I would like to get this merged for 5.14, since it was only recently reverted
from 5.13, and it remains an independent change to a single driver plus
topology APIs that nothing apart from the arm64 topology code uses.
Thanks.
--
Viresh
Viresh Kumar (4):
cpufreq: cppc: Fix potential memleak in cppc_cpufreq_cpu_init
cpufreq: cppc: Pass structure instance by reference
arch_topology: Avoid use-after-free for scale_freq_data
cpufreq: CPPC: Add support for frequency invariance
drivers/base/arch_topology.c | 27 +++-
drivers/cpufreq/Kconfig.arm | 10 ++
drivers/cpufreq/cppc_cpufreq.c | 287 +++++++++++++++++++++++++++++----
include/linux/arch_topology.h | 1 +
kernel/sched/core.c | 1 +
5 files changed, 292 insertions(+), 34 deletions(-)
--
2.31.1.272.g89b43f80a514
^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance
  2021-06-21  9:19 [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance Viresh Kumar
@ 2021-06-21  9:19 ` Viresh Kumar
  2021-06-24  9:48   ` Ionela Voinescu
  2021-06-21 20:48 ` [PATCH V3 0/4] cpufreq: cppc: " Qian Cai
  2021-06-28 11:54 ` Ionela Voinescu
  2 siblings, 1 reply; 41+ messages in thread
From: Viresh Kumar @ 2021-06-21  9:19 UTC (permalink / raw)
To: Rafael Wysocki, Ionela Voinescu, Viresh Kumar, Sudeep Holla,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira
Cc: linux-pm, Qian Cai, linux-acpi, linux-kernel

The Frequency Invariance Engine (FIE) is providing a frequency scaling
correction factor that helps achieve more accurate load-tracking.

Normally, this scaling factor can be obtained directly with the help of
the cpufreq drivers as they know the exact frequency the hardware is
running at. But that isn't the case for CPPC cpufreq driver.

Another way of obtaining that is using the arch specific counter
support, which is already present in kernel, but that hardware is
optional for platforms.

This patch updates the CPPC driver to register itself with the topology
core to provide its own implementation (cppc_scale_freq_tick()) of
topology_scale_freq_tick() which gets called by the scheduler on every
tick. Note that the arch specific counters have higher priority than
CPPC counters, if available, though the CPPC driver doesn't need to have
any special handling for that.

On an invocation of cppc_scale_freq_tick(), we schedule an irq work
(since we reach here from hard-irq context), which then schedules a
normal work item and cppc_scale_freq_workfn() updates the per_cpu
arch_freq_scale variable based on the counter updates since the last
tick.
To allow platforms to disable this CPPC counter-based frequency
invariance support, this is all done under CONFIG_ACPI_CPPC_CPUFREQ_FIE,
which is enabled by default.

This also exports sched_setattr_nocheck() as the CPPC driver can be
built as a module.

Cc: linux-acpi@vger.kernel.org
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/Kconfig.arm    |  10 ++
 drivers/cpufreq/cppc_cpufreq.c | 249 +++++++++++++++++++++++++++++++--
 include/linux/arch_topology.h  |   1 +
 kernel/sched/core.c            |   1 +
 4 files changed, 247 insertions(+), 14 deletions(-)

diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm
index e65e0a43be64..a5c5f70acfc9 100644
--- a/drivers/cpufreq/Kconfig.arm
+++ b/drivers/cpufreq/Kconfig.arm
@@ -19,6 +19,16 @@ config ACPI_CPPC_CPUFREQ
 
	  If in doubt, say N.
 
+config ACPI_CPPC_CPUFREQ_FIE
+	bool "Frequency Invariance support for CPPC cpufreq driver"
+	depends on ACPI_CPPC_CPUFREQ && GENERIC_ARCH_TOPOLOGY
+	default y
+	help
+	  This extends frequency invariance support in the CPPC cpufreq driver,
+	  by using CPPC delivered and reference performance counters.
+
+	  If in doubt, say N.
+
 config ARM_ALLWINNER_SUN50I_CPUFREQ_NVMEM
	tristate "Allwinner nvmem based SUN50I CPUFreq driver"
	depends on ARCH_SUNXI
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 490175d65082..db550fc92931 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -10,14 +10,18 @@
 
 #define pr_fmt(fmt)	"CPPC Cpufreq:" fmt
 
+#include <linux/arch_topology.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/delay.h>
 #include <linux/cpu.h>
 #include <linux/cpufreq.h>
 #include <linux/dmi.h>
+#include <linux/irq_work.h>
+#include <linux/kthread.h>
 #include <linux/time.h>
 #include <linux/vmalloc.h>
+#include <uapi/linux/sched/types.h>
 
 #include <asm/unaligned.h>
 
@@ -57,6 +61,210 @@ static struct cppc_workaround_oem_info wa_info[] = {
	}
 };
 
+#ifdef CONFIG_ACPI_CPPC_CPUFREQ_FIE
+
+/* Frequency invariance support */
+struct cppc_freq_invariance {
+	int cpu;
+	struct irq_work irq_work;
+	struct kthread_work work;
+	struct cppc_perf_fb_ctrs prev_perf_fb_ctrs;
+	struct cppc_cpudata *cpu_data;
+};
+
+static DEFINE_PER_CPU(struct cppc_freq_invariance, cppc_freq_inv);
+static struct kthread_worker *kworker_fie;
+
+static struct cpufreq_driver cppc_cpufreq_driver;
+static unsigned int hisi_cppc_cpufreq_get_rate(unsigned int cpu);
+static int cppc_perf_from_fbctrs(struct cppc_cpudata *cpu_data,
+				 struct cppc_perf_fb_ctrs *fb_ctrs_t0,
+				 struct cppc_perf_fb_ctrs *fb_ctrs_t1);
+
+/**
+ * cppc_scale_freq_workfn - CPPC arch_freq_scale updater for frequency invariance
+ * @work: The work item.
+ *
+ * The CPPC driver register itself with the topology core to provide its own
+ * implementation (cppc_scale_freq_tick()) of topology_scale_freq_tick() which
+ * gets called by the scheduler on every tick.
+ *
+ * Note that the arch specific counters have higher priority than CPPC counters,
+ * if available, though the CPPC driver doesn't need to have any special
+ * handling for that.
+ *
+ * On an invocation of cppc_scale_freq_tick(), we schedule an irq work (since we
+ * reach here from hard-irq context), which then schedules a normal work item
+ * and cppc_scale_freq_workfn() updates the per_cpu arch_freq_scale variable
+ * based on the counter updates since the last tick.
+ */
+static void cppc_scale_freq_workfn(struct kthread_work *work)
+{
+	struct cppc_freq_invariance *cppc_fi;
+	struct cppc_perf_fb_ctrs fb_ctrs = {0};
+	struct cppc_cpudata *cpu_data;
+	unsigned long local_freq_scale;
+	u64 perf;
+
+	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
+	cpu_data = cppc_fi->cpu_data;
+
+	if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
+		pr_warn("%s: failed to read perf counters\n", __func__);
+		return;
+	}
+
+	perf = cppc_perf_from_fbctrs(cpu_data, &cppc_fi->prev_perf_fb_ctrs,
+				     &fb_ctrs);
+	cppc_fi->prev_perf_fb_ctrs = fb_ctrs;
+
+	perf <<= SCHED_CAPACITY_SHIFT;
+	local_freq_scale = div64_u64(perf, cpu_data->perf_caps.highest_perf);
+
+	/* This can happen due to counter's overflow */
+	if (unlikely(local_freq_scale > 1024))
+		local_freq_scale = 1024;
+
+	per_cpu(arch_freq_scale, cppc_fi->cpu) = local_freq_scale;
+}
+
+static void cppc_irq_work(struct irq_work *irq_work)
+{
+	struct cppc_freq_invariance *cppc_fi;
+
+	cppc_fi = container_of(irq_work, struct cppc_freq_invariance, irq_work);
+	kthread_queue_work(kworker_fie, &cppc_fi->work);
+}
+
+static void cppc_scale_freq_tick(void)
+{
+	struct cppc_freq_invariance *cppc_fi = &per_cpu(cppc_freq_inv, smp_processor_id());
+
+	/*
+	 * cppc_get_perf_ctrs() can potentially sleep, call that from the right
+	 * context.
+	 */
+	irq_work_queue(&cppc_fi->irq_work);
+}
+
+static struct scale_freq_data cppc_sftd = {
+	.source = SCALE_FREQ_SOURCE_CPPC,
+	.set_freq_scale = cppc_scale_freq_tick,
+};
+
+static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
+{
+	struct cppc_freq_invariance *cppc_fi;
+	int cpu, ret;
+
+	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
+		return;
+
+	for_each_cpu(cpu, policy->cpus) {
+		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
+		cppc_fi->cpu = cpu;
+		cppc_fi->cpu_data = policy->driver_data;
+		kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
+		init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
+
+		ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs);
+		if (ret) {
+			pr_warn("%s: failed to read perf counters for cpu:%d: %d\n",
+				__func__, cpu, ret);
+			return;
+		}
+	}
+
+	/* Register for freq-invariance */
+	topology_set_scale_freq_source(&cppc_sftd, policy->cpus);
+}
+
+/*
+ * We free all the resources on policy's removal and not on CPU removal as the
+ * irq-work are per-cpu and the hotplug core takes care of flushing the pending
+ * irq-works (hint: smpcfd_dying_cpu()) on CPU hotplug. Even if the kthread-work
+ * fires on another CPU after the concerned CPU is removed, it won't harm.
+ *
+ * We just need to make sure to remove them all on policy->exit().
+ */
+static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
+{
+	struct cppc_freq_invariance *cppc_fi;
+	int cpu;
+
+	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
+		return;
+
+	/* policy->cpus will be empty here, use related_cpus instead */
+	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
+
+	for_each_cpu(cpu, policy->related_cpus) {
+		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
+		irq_work_sync(&cppc_fi->irq_work);
+		kthread_cancel_work_sync(&cppc_fi->work);
+	}
+}
+
+static void __init cppc_freq_invariance_init(void)
+{
+	struct sched_attr attr = {
+		.size		= sizeof(struct sched_attr),
+		.sched_policy	= SCHED_DEADLINE,
+		.sched_nice	= 0,
+		.sched_priority	= 0,
+		/*
+		 * Fake (unused) bandwidth; workaround to "fix"
+		 * priority inheritance.
+		 */
+		.sched_runtime	= 1000000,
+		.sched_deadline = 10000000,
+		.sched_period	= 10000000,
+	};
+	int ret;
+
+	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
+		return;
+
+	kworker_fie = kthread_create_worker(0, "cppc_fie");
+	if (IS_ERR(kworker_fie))
+		return;
+
+	ret = sched_setattr_nocheck(kworker_fie->task, &attr);
+	if (ret) {
+		pr_warn("%s: failed to set SCHED_DEADLINE: %d\n", __func__,
+			ret);
+		kthread_destroy_worker(kworker_fie);
+		return;
+	}
+}
+
+static void cppc_freq_invariance_exit(void)
+{
+	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
+		return;
+
+	kthread_destroy_worker(kworker_fie);
+	kworker_fie = NULL;
+}
+
+#else
+static inline void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
+{
+}
+
+static inline void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
+{
+}
+
+static inline void cppc_freq_invariance_init(void)
+{
+}
+
+static inline void cppc_freq_invariance_exit(void)
+{
+}
+#endif /* CONFIG_ACPI_CPPC_CPUFREQ_FIE */
+
 /* Callback function used to retrieve the max frequency from DMI */
 static void cppc_find_dmi_mhz(const struct dmi_header *dm, void *private)
 {
@@ -335,8 +543,10 @@ static int
cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
	cpu_data->perf_ctrls.desired_perf =  caps->highest_perf;
 
	ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
-	if (!ret)
+	if (!ret) {
+		cppc_cpufreq_cpu_fie_init(policy);
		return 0;
+	}
 
	pr_debug("Err setting perf value:%d on CPU:%d. ret:%d\n",
		 caps->highest_perf, cpu, ret);
@@ -353,6 +563,8 @@ static int cppc_cpufreq_cpu_exit(struct cpufreq_policy *policy)
	unsigned int cpu = policy->cpu;
	int ret;
 
+	cppc_cpufreq_cpu_fie_exit(policy);
+
	cpu_data->perf_ctrls.desired_perf = caps->lowest_perf;
 
	ret = cppc_set_perf(cpu, &cpu_data->perf_ctrls);
@@ -372,12 +584,12 @@ static inline u64 get_delta(u64 t1, u64 t0)
	return (u32)t1 - (u32)t0;
 }
 
-static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
-				     struct cppc_perf_fb_ctrs *fb_ctrs_t0,
-				     struct cppc_perf_fb_ctrs *fb_ctrs_t1)
+static int cppc_perf_from_fbctrs(struct cppc_cpudata *cpu_data,
+				 struct cppc_perf_fb_ctrs *fb_ctrs_t0,
+				 struct cppc_perf_fb_ctrs *fb_ctrs_t1)
 {
	u64 delta_reference, delta_delivered;
-	u64 reference_perf, delivered_perf;
+	u64 reference_perf;
 
	reference_perf = fb_ctrs_t0->reference_perf;
 
@@ -386,14 +598,11 @@ static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu_data,
	delta_delivered = get_delta(fb_ctrs_t1->delivered,
				    fb_ctrs_t0->delivered);
 
-	/* Check to avoid divide-by zero */
-	if (delta_reference || delta_delivered)
-		delivered_perf = (reference_perf * delta_delivered) /
-				 delta_reference;
-	else
-		delivered_perf = cpu_data->perf_ctrls.desired_perf;
+	/* Check to avoid divide-by zero and invalid delivered_perf */
+	if (!delta_reference || !delta_delivered)
+		return cpu_data->perf_ctrls.desired_perf;
 
-	return cppc_cpufreq_perf_to_khz(cpu_data, delivered_perf);
+	return (reference_perf * delta_delivered) / delta_reference;
 }
 
 static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
@@ -401,6 +610,7 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
	struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
	struct
cpufreq_policy *policy = cpufreq_cpu_get(cpu);
	struct cppc_cpudata *cpu_data = policy->driver_data;
+	u64 delivered_perf;
	int ret;
 
	cpufreq_cpu_put(policy);
@@ -415,7 +625,10 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
	if (ret)
		return ret;
 
-	return cppc_get_rate_from_fbctrs(cpu_data, &fb_ctrs_t0, &fb_ctrs_t1);
+	delivered_perf = cppc_perf_from_fbctrs(cpu_data, &fb_ctrs_t0,
+					       &fb_ctrs_t1);
+
+	return cppc_cpufreq_perf_to_khz(cpu_data, delivered_perf);
 }
 
 static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
@@ -516,14 +729,21 @@ static void cppc_check_hisi_workaround(void)
 
 static int __init cppc_cpufreq_init(void)
 {
+	int ret;
+
	if ((acpi_disabled) || !acpi_cpc_valid())
		return -ENODEV;
 
	INIT_LIST_HEAD(&cpu_data_list);
 
	cppc_check_hisi_workaround();
+	cppc_freq_invariance_init();
 
-	return cpufreq_register_driver(&cppc_cpufreq_driver);
+	ret = cpufreq_register_driver(&cppc_cpufreq_driver);
+	if (ret)
+		cppc_freq_invariance_exit();
+
+	return ret;
 }
 
 static inline void free_cpu_data(void)
@@ -541,6 +761,7 @@ static inline void free_cpu_data(void)
 static void __exit cppc_cpufreq_exit(void)
 {
	cpufreq_unregister_driver(&cppc_cpufreq_driver);
+	cppc_freq_invariance_exit();
 
	free_cpu_data();
 }
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index 11e555cfaecb..f180240dc95f 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -37,6 +37,7 @@ bool topology_scale_freq_invariant(void);
 enum scale_freq_source {
	SCALE_FREQ_SOURCE_CPUFREQ = 0,
	SCALE_FREQ_SOURCE_ARCH,
+	SCALE_FREQ_SOURCE_CPPC,
 };
 
 struct scale_freq_data {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4ca80df205ce..5226cc26a095 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6389,6 +6389,7 @@ int sched_setattr_nocheck(struct task_struct *p, const struct sched_attr *attr)
 {
	return __sched_setscheduler(p, attr, false, true);
 }
+EXPORT_SYMBOL_GPL(sched_setattr_nocheck);
 
 /**
 * sched_setscheduler_nocheck
 - change the scheduling policy and/or RT priority of a thread from kernelspace.

-- 
2.31.1.272.g89b43f80a514

^ permalink raw reply related	[flat|nested] 41+ messages in thread
* Re: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance
  2021-06-21  9:19 ` [PATCH V3 4/4] cpufreq: CPPC: " Viresh Kumar
@ 2021-06-24  9:48   ` Ionela Voinescu
  2021-06-24 13:04     ` Viresh Kumar
  0 siblings, 1 reply; 41+ messages in thread
From: Ionela Voinescu @ 2021-06-24  9:48 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael Wysocki, Sudeep Holla, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-pm,
	Qian Cai, linux-acpi, linux-kernel

Hey,

On Monday 21 Jun 2021 at 14:49:37 (+0530), Viresh Kumar wrote:
> The Frequency Invariance Engine (FIE) is providing a frequency scaling
> correction factor that helps achieve more accurate load-tracking.
[..]
> +static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
> +{
> +	struct cppc_freq_invariance *cppc_fi;
> +	int cpu;
> +
> +	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> +		return;
> +
> +	/* policy->cpus will be empty here, use related_cpus instead */
> +	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
> +
> +	for_each_cpu(cpu, policy->related_cpus) {
> +		cppc_fi = &per_cpu(cppc_freq_inv, cpu);

Do you think it might be worth having here something like:

	if (!cppc_fi->cpu_data)
		continue;

This would be to protect against cases where the platform does not boot
with all CPUs or the module is loaded after some have already been
offlined. Unlikely, but..

> +		irq_work_sync(&cppc_fi->irq_work);
> +		kthread_cancel_work_sync(&cppc_fi->work);
> +	}
> +}

The rest of the code is almost the same as the original, so that is all
from me :).

Thanks,
Ionela.

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance
  2021-06-24  9:48   ` Ionela Voinescu
@ 2021-06-24 13:04     ` Viresh Kumar
  2021-06-25  8:54       ` Ionela Voinescu
  0 siblings, 1 reply; 41+ messages in thread
From: Viresh Kumar @ 2021-06-24 13:04 UTC (permalink / raw)
To: Ionela Voinescu
Cc: Rafael Wysocki, Sudeep Holla, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-pm,
	Qian Cai, linux-acpi, linux-kernel

On 24-06-21, 10:48, Ionela Voinescu wrote:
> On Monday 21 Jun 2021 at 14:49:37 (+0530), Viresh Kumar wrote:
> > The Frequency Invariance Engine (FIE) is providing a frequency scaling
> > correction factor that helps achieve more accurate load-tracking.
> [..]
> > +static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
> > +{
> > +	struct cppc_freq_invariance *cppc_fi;
> > +	int cpu;
> > +
> > +	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> > +		return;
> > +
> > +	/* policy->cpus will be empty here, use related_cpus instead */
> > +	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
> > +
> > +	for_each_cpu(cpu, policy->related_cpus) {
> > +		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> 
> Do you think it might be worth having here something like:
> 
> 	if (!cppc_fi->cpu_data)
> 		continue;
> 
> This would be to protect against cases where the platform does not boot
> with all CPUs or the module is loaded after some have already been
> offlined. Unlikely, but..

Even in that case policy->cpus will contain all offline+online CPUs (at ->init()
time), isn't it ?

> > +		irq_work_sync(&cppc_fi->irq_work);
> > +		kthread_cancel_work_sync(&cppc_fi->work);
> > +	}
> > +}
> 
> The rest of the code is almost the same as the original, so that is all
> from me :).
> 
> Thanks,
> Ionela.

-- 
viresh

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance
  2021-06-24 13:04     ` Viresh Kumar
@ 2021-06-25  8:54       ` Ionela Voinescu
  2021-06-25 16:54         ` Viresh Kumar
  0 siblings, 1 reply; 41+ messages in thread
From: Ionela Voinescu @ 2021-06-25  8:54 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael Wysocki, Sudeep Holla, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-pm,
	Qian Cai, linux-acpi, linux-kernel

Hey,

On Thursday 24 Jun 2021 at 18:34:18 (+0530), Viresh Kumar wrote:
> On 24-06-21, 10:48, Ionela Voinescu wrote:
> > On Monday 21 Jun 2021 at 14:49:37 (+0530), Viresh Kumar wrote:
> > > The Frequency Invariance Engine (FIE) is providing a frequency scaling
> > > correction factor that helps achieve more accurate load-tracking.
> > [..]
> > > +static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
> > > +{
> > > +	struct cppc_freq_invariance *cppc_fi;
> > > +	int cpu;
> > > +
> > > +	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> > > +		return;
> > > +
> > > +	/* policy->cpus will be empty here, use related_cpus instead */
> > > +	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
> > > +
> > > +	for_each_cpu(cpu, policy->related_cpus) {
> > > +		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> > 
> > Do you think it might be worth having here something like:
> > 
> > 	if (!cppc_fi->cpu_data)
> > 		continue;
> > 
> > This would be to protect against cases where the platform does not boot
> > with all CPUs or the module is loaded after some have already been
> > offlined. Unlikely, but..
> 
> Even in that case policy->cpus will contain all offline+online CPUs (at ->init()
> time), isn't it ?
> 

Right, my bad. I missed cpumask_and(policy->cpus, policy->cpus,
cpu_online_mask) being done after init(). It logically seems a bit
wrong, but drivers are in control of setting policy->cpus and acting on
it, and in this case the driver does the right thing.

Thanks,
Ionela.

> > > +		irq_work_sync(&cppc_fi->irq_work);
> > > +		kthread_cancel_work_sync(&cppc_fi->work);
> > > +	}
> > > +}
> > 
> > The rest of the code is almost the same as the original, so that is all
> > from me :).
> > 
> > Thanks,
> > Ionela.
> 
> -- 
> viresh

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance
  2021-06-25  8:54       ` Ionela Voinescu
@ 2021-06-25 16:54         ` Viresh Kumar
  2021-06-28 10:49           ` Ionela Voinescu
  0 siblings, 1 reply; 41+ messages in thread
From: Viresh Kumar @ 2021-06-25 16:54 UTC (permalink / raw)
To: Ionela Voinescu
Cc: Rafael Wysocki, Sudeep Holla, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-pm,
	Qian Cai, linux-acpi, linux-kernel

On 25-06-21, 09:54, Ionela Voinescu wrote:
> Hey,
> 
> On Thursday 24 Jun 2021 at 18:34:18 (+0530), Viresh Kumar wrote:
> > On 24-06-21, 10:48, Ionela Voinescu wrote:
> > > On Monday 21 Jun 2021 at 14:49:37 (+0530), Viresh Kumar wrote:
> > > > The Frequency Invariance Engine (FIE) is providing a frequency scaling
> > > > correction factor that helps achieve more accurate load-tracking.
> > > [..]
> > > > +static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
> > > > +{
> > > > +	struct cppc_freq_invariance *cppc_fi;
> > > > +	int cpu;
> > > > +
> > > > +	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> > > > +		return;
> > > > +
> > > > +	/* policy->cpus will be empty here, use related_cpus instead */
> > > > +	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
> > > > +
> > > > +	for_each_cpu(cpu, policy->related_cpus) {
> > > > +		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> > > 
> > > Do you think it might be worth having here something like:
> > > 
> > > 	if (!cppc_fi->cpu_data)
> > > 		continue;
> > > 
> > > This would be to protect against cases where the platform does not boot
> > > with all CPUs or the module is loaded after some have already been
> > > offlined. Unlikely, but..
> > 
> > Even in that case policy->cpus will contain all offline+online CPUs (at ->init()
> > time), isn't it ?
> > 
> 
> Right, my bad. I missed cpumask_and(policy->cpus, policy->cpus,
> cpu_online_mask) being done after init(). It logically seems a bit
> wrong, but drivers are in control of setting policy->cpus and acting on
> it, and in this case the driver does the right thing.

Do you want me to re-add your Reviewed-by here ?

-- 
viresh

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance
  2021-06-25 16:54         ` Viresh Kumar
@ 2021-06-28 10:49           ` Ionela Voinescu
  2021-06-29  4:32             ` Viresh Kumar
  0 siblings, 1 reply; 41+ messages in thread
From: Ionela Voinescu @ 2021-06-28 10:49 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael Wysocki, Sudeep Holla, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-pm,
	Qian Cai, linux-acpi, linux-kernel

On Friday 25 Jun 2021 at 22:24:18 (+0530), Viresh Kumar wrote:
> On 25-06-21, 09:54, Ionela Voinescu wrote:
> > Hey,
> > 
> > On Thursday 24 Jun 2021 at 18:34:18 (+0530), Viresh Kumar wrote:
> > > On 24-06-21, 10:48, Ionela Voinescu wrote:
> > > > On Monday 21 Jun 2021 at 14:49:37 (+0530), Viresh Kumar wrote:
> > > > > The Frequency Invariance Engine (FIE) is providing a frequency scaling
> > > > > correction factor that helps achieve more accurate load-tracking.
> > > > [..]
> > > > > +static void cppc_cpufreq_cpu_fie_exit(struct cpufreq_policy *policy)
> > > > > +{
> > > > > +	struct cppc_freq_invariance *cppc_fi;
> > > > > +	int cpu;
> > > > > +
> > > > > +	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> > > > > +		return;
> > > > > +
> > > > > +	/* policy->cpus will be empty here, use related_cpus instead */
> > > > > +	topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC, policy->related_cpus);
> > > > > +
> > > > > +	for_each_cpu(cpu, policy->related_cpus) {
> > > > > +		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> > > > 
> > > > Do you think it might be worth having here something like:
> > > > 
> > > > 	if (!cppc_fi->cpu_data)
> > > > 		continue;
> > > > 
> > > > This would be to protect against cases where the platform does not boot
> > > > with all CPUs or the module is loaded after some have already been
> > > > offlined. Unlikely, but..
> > > 
> > > Even in that case policy->cpus will contain all offline+online CPUs (at ->init()
> > > time), isn't it ?
> > 
> > Right, my bad. I missed cpumask_and(policy->cpus, policy->cpus,
> > cpu_online_mask) being done after init(). It logically seems a bit
> > wrong, but drivers are in control of setting policy->cpus and acting on
> > it, and in this case the driver does the right thing.
> 
> Do you want me to re-add your Reviewed-by here ?
> 

To be honest I would like to have more time on this before you merge the
set, to better understand Qian's results and some observations I have
for Thunder X2 (I will share in a bit).

For the code, I think it's fine. I have a single observation regarding
the following code:

> +static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy)
> +{
> +	struct cppc_freq_invariance *cppc_fi;
> +	int cpu, ret;
> +
> +	if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate)
> +		return;
> +
> +	for_each_cpu(cpu, policy->cpus) {
> +		cppc_fi = &per_cpu(cppc_freq_inv, cpu);
> +		cppc_fi->cpu = cpu;
> +		cppc_fi->cpu_data = policy->driver_data;
> +		kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn);
> +		init_irq_work(&cppc_fi->irq_work, cppc_irq_work);
> +
> +		ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs);
> +		if (ret) {
> +			pr_warn("%s: failed to read perf counters for cpu:%d: %d\n",
> +				__func__, cpu, ret);
> +			return;
> +		}

For this condition above, think about a scenario where reading counters
for offline CPUs returns an error. I'm not sure if that can happen, to
be honest. That would mean here that you will never initialise the freq
source unless all CPUs in the policy are online at policy creation.

My recommendation is to warn about the failed read of perf counters but
only return from this function if the target CPU is online as well when
reading counters fails.

This is probably a nit, so I'll let you decide if you want to do something
about this.

Thanks,
Ionela.

> +	}
> +
> +	/* Register for freq-invariance */
> +	topology_set_scale_freq_source(&cppc_sftd, policy->cpus);
> +}
> -- 
> viresh

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance 2021-06-28 10:49 ` Ionela Voinescu @ 2021-06-29 4:32 ` Viresh Kumar 2021-06-29 8:47 ` Ionela Voinescu 0 siblings, 1 reply; 41+ messages in thread From: Viresh Kumar @ 2021-06-29 4:32 UTC (permalink / raw) To: Ionela Voinescu Cc: Rafael Wysocki, Sudeep Holla, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-pm, Qian Cai, linux-acpi, linux-kernel On 28-06-21, 11:49, Ionela Voinescu wrote: > To be honest I would like to have more time on this before you merge the > set, to better understand Qian's results and some observations I have > for Thunder X2 (I will share in a bit). Ideally, this code was already merged in 5.13 and would have required us to fix any problems as we encounter them. I did revert it because it caused a kernel crash and I wasn't sure if there was a sane/easy way of fixing that so late in the release cycle. That was the right thing to do then. All those issues are gone now, we may have an issue around rounding of counters or some hardware specific issues, it isn't clear yet. But the stuff works fine otherwise, doesn't make the kernel crash and it is controlled with a CONFIG_ option, so those who don't want to use it can still disable it. The merge window is here now, if we don't merge it now, it gets delayed by a full cycle (roughly two months) and if we merge it now and are able to narrow down the rounding issues, if there are any, we will have full two months to make a fix for that and still push it in 5.14 itself. And so I would like to get it merged in this merge window itself, it also makes sure more people would get to test it, like Qian was able to figure out a problem here for us. > For the code, I think it's fine. 
I have a single observation regarding > the following code: > > > +static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy) > > +{ > > + struct cppc_freq_invariance *cppc_fi; > > + int cpu, ret; > > + > > + if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate) > > + return; > > + > > + for_each_cpu(cpu, policy->cpus) { > > + cppc_fi = &per_cpu(cppc_freq_inv, cpu); > > + cppc_fi->cpu = cpu; > > + cppc_fi->cpu_data = policy->driver_data; > > + kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn); > > + init_irq_work(&cppc_fi->irq_work, cppc_irq_work); > > + > > + ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs); > > + if (ret) { > > + pr_warn("%s: failed to read perf counters for cpu:%d: %d\n", > > + __func__, cpu, ret); > > + return; > > + } > > For this condition above, think about a scenario where reading counters > for offline CPUs returns an error. I'm not sure if that can happen, to > be honest. That would mean here that you will never initialise the freq > source unless all CPUs in the policy are online at policy creation. > > My recommendation is to warn about the failed read of perf counters but > only return from this function if the target CPU is online as well when > reading counters fails. > > This is probably a nit, so I'll let you decide if you want to do something > about this. That is a very good observation actually. Thanks for that. This is how I fixed it. diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c index d688877e8fbe..f6540068d0fe 100644 --- a/drivers/cpufreq/cppc_cpufreq.c +++ b/drivers/cpufreq/cppc_cpufreq.c @@ -171,7 +171,13 @@ static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy) if (ret) { pr_warn("%s: failed to read perf counters for cpu:%d: %d\n", __func__, cpu, ret); - return; + + /* + * Don't abort if the CPU was offline while the driver + * was getting registered. 
+ */ + if (cpu_online(cpu)) + return; } } -- viresh ^ permalink raw reply related [flat|nested] 41+ messages in thread
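For readers following the thread, the scale factor that this frequency-invariance support feeds to the scheduler can be sketched as below. This is an illustrative reconstruction, not the patch's exact code: the function name is made up, and the clamp reflects the cover letter's note that counter overlap can transiently push the computed value above 1024 (the reason the earlier WARN_ON(local_freq_scale > 1024) was dropped).

```c
#include <stdint.h>

#define SCHED_CAPACITY_SCALE 1024ULL

/*
 * Frequency-invariance scale factor from one delivered-performance
 * sample: delivered perf relative to highest perf, expressed in units
 * of SCHED_CAPACITY_SCALE. Clamped because counter overlap can
 * transiently make the computed delivered perf exceed highest_perf.
 */
static uint64_t cppc_freq_scale(uint64_t delivered_perf, uint64_t highest_perf)
{
	uint64_t scale = (delivered_perf << 10) / highest_perf;

	return scale > SCHED_CAPACITY_SCALE ? SCHED_CAPACITY_SCALE : scale;
}
```

For example, a delivered perf of 140 against a highest perf of 280 yields a scale of 512, i.e. half capacity.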
* Re: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance 2021-06-29 4:32 ` Viresh Kumar @ 2021-06-29 8:47 ` Ionela Voinescu 2021-06-29 8:53 ` Viresh Kumar 0 siblings, 1 reply; 41+ messages in thread From: Ionela Voinescu @ 2021-06-29 8:47 UTC (permalink / raw) To: Viresh Kumar Cc: Rafael Wysocki, Sudeep Holla, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-pm, Qian Cai, linux-acpi, linux-kernel Hey, On Tuesday 29 Jun 2021 at 10:02:44 (+0530), Viresh Kumar wrote: > On 28-06-21, 11:49, Ionela Voinescu wrote: > > To be honest I would like to have more time on this before you merge the > > set, to better understand Qian's results and some observations I have > > for Thunder X2 (I will share in a bit). > > Ideally, this code was already merged in 5.13 and would have required > us to fix any problems as we encounter them. I did revert it because > it caused a kernel crash and I wasn't sure if there was a sane/easy > way of fixing that so late in the release cycle. That was the right > thing to do then. > > All those issues are gone now, we may have an issue around rounding of > counters or some hardware specific issues, it isn't clear yet. > > But the stuff works fine otherwise, doesn't make the kernel crash and > it is controlled with a CONFIG_ option, so those who don't want to use > it can still disable it. > > The merge window is here now, if we don't merge it now, it gets > delayed by a full cycle (roughly two months) and if we merge it now > and are able to narrow down the rounding issues, if there are any, we > will have full two months to make a fix for that and still push it in > 5.14 itself. > > And so I would like to get it merged in this merge window itself, it > also makes sure more people would get to test it, like Qian was able > to figure out a problem here for us. > Okay, makes sense. 
I have not seen this code actually do anything wrong so far, and the issues I see on ThunderX2 point more to misbehaving counters for this purpose. This being said, I would have probably preferred for this feature to be disabled by default, until we've tested more, but that won't give the chance to anyone else to test. > > For the code, I think it's fine. I have a single observation regarding > > the following code: > > > > > +static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy) > > > +{ > > > + struct cppc_freq_invariance *cppc_fi; > > > + int cpu, ret; > > > + > > > + if (cppc_cpufreq_driver.get == hisi_cppc_cpufreq_get_rate) > > > + return; > > > + > > > + for_each_cpu(cpu, policy->cpus) { > > > + cppc_fi = &per_cpu(cppc_freq_inv, cpu); > > > + cppc_fi->cpu = cpu; > > > + cppc_fi->cpu_data = policy->driver_data; > > > + kthread_init_work(&cppc_fi->work, cppc_scale_freq_workfn); > > > + init_irq_work(&cppc_fi->irq_work, cppc_irq_work); > > > + > > > + ret = cppc_get_perf_ctrs(cpu, &cppc_fi->prev_perf_fb_ctrs); > > > + if (ret) { > > > + pr_warn("%s: failed to read perf counters for cpu:%d: %d\n", > > > + __func__, cpu, ret); > > > + return; > > > + } > > > > For this condition above, think about a scenario where reading counters > > for offline CPUs returns an error. I'm not sure if that can happen, to > > be honest. That would mean here that you will never initialise the freq > > source unless all CPUs in the policy are online at policy creation. > > > > My recommendation is to warn about the failed read of perf counters but > > only return from this function if the target CPU is online as well when > > reading counters fails. > > > > This is probably a nit, so I'll let you decide if you want to do something > > about this. > > That is a very good observation actually. Thanks for that. This is how > I fixed it. 
> > diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c > index d688877e8fbe..f6540068d0fe 100644 > --- a/drivers/cpufreq/cppc_cpufreq.c > +++ b/drivers/cpufreq/cppc_cpufreq.c > @@ -171,7 +171,13 @@ static void cppc_cpufreq_cpu_fie_init(struct cpufreq_policy *policy) > if (ret) { > pr_warn("%s: failed to read perf counters for cpu:%d: %d\n", > __func__, cpu, ret); > - return; > + > + /* > + * Don't abort if the CPU was offline while the driver > + * was getting registered. > + */ > + if (cpu_online(cpu)) > + return; > } > } > > -- Thanks! Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Ionela. > viresh ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 4/4] cpufreq: CPPC: Add support for frequency invariance 2021-06-29 8:47 ` Ionela Voinescu @ 2021-06-29 8:53 ` Viresh Kumar 0 siblings, 0 replies; 41+ messages in thread From: Viresh Kumar @ 2021-06-29 8:53 UTC (permalink / raw) To: Ionela Voinescu Cc: Rafael Wysocki, Sudeep Holla, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-pm, Qian Cai, linux-acpi, linux-kernel On 29-06-21, 09:47, Ionela Voinescu wrote: > Okay, makes sense. I have not seen this code actually do anything wrong > so far, and the issues I see on ThunderX2 point more to misbehaving > counters for this purpose. This being said, I would have probably > preferred for this feature to be disabled by default, until we've tested > more, but that won't give the chance to anyone else to test. > > Thanks! > > Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com> Thanks for understanding Ionela. -- viresh ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-21 9:19 [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance Viresh Kumar 2021-06-21 9:19 ` [PATCH V3 4/4] cpufreq: CPPC: " Viresh Kumar @ 2021-06-21 20:48 ` Qian Cai 2021-06-22 6:52 ` Viresh Kumar 2021-06-23 4:16 ` Viresh Kumar 2021-06-28 11:54 ` Ionela Voinescu 2 siblings, 2 replies; 41+ messages in thread From: Qian Cai @ 2021-06-21 20:48 UTC (permalink / raw) To: Viresh Kumar, Rafael Wysocki, Ionela Voinescu, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Vincent Guittot, Will Deacon Cc: linux-pm, linux-acpi, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On 6/21/2021 5:19 AM, Viresh Kumar wrote: > CPPC cpufreq driver is used for ARM servers and this patch series tries to > provide counter-based frequency invariance support for them in the absence for > architecture specific counters (like AMUs). Viresh, this series works fine on my quick tests so far. BTW, I noticed some strange things even with the series applied mentioned below when reading acpi_cppc vs cpufreq sysfs. Do you happen to know are those just hardware/firmware issues because Linux just faithfully exported the register values? == Arm64 server Foo == CPU max MHz: 3000.0000 CPU min MHz: 1000.0000 /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf 300 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq 1000 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf 200 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf 100 /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq <--- should be 3000? 2800 /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf <--- should be 300? 280 /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf 100 == Arm64 server Bar == CPU max MHz: 3000.0000 CPU min MHz: 375.0000 /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf <--- should be 3000? 
There is no cpufreq boost. 3300 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq <--- don't understand why 0. 0 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf 375 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf 375 /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq <--- ditto 0 /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf 3000 /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf 100 ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-21 20:48 ` [PATCH V3 0/4] cpufreq: cppc: " Qian Cai @ 2021-06-22 6:52 ` Viresh Kumar 2021-06-23 4:16 ` Viresh Kumar 1 sibling, 0 replies; 41+ messages in thread From: Viresh Kumar @ 2021-06-22 6:52 UTC (permalink / raw) To: Qian Cai Cc: Rafael Wysocki, Ionela Voinescu, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Vincent Guittot, Will Deacon, linux-pm, linux-acpi, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On 21-06-21, 16:48, Qian Cai wrote: > Viresh, this series works fine on my quick tests so far. Thanks for testing. > BTW, I > noticed some strange things even with the series applied mentioned > below when reading acpi_cppc vs cpufreq sysfs. Do you happen to know > are those just hardware/firmware issues because Linux just > faithfully exported the register values? The values are exported by drivers/acpi/cppc_acpi.c I believe and they look to be based on simple register reads. > == Arm64 server Foo == > CPU max MHz: 3000.0000 > CPU min MHz: 1000.0000 > > /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf > 300 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq > 1000 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf > 200 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf > 100 > /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq <--- should be 3000? > 2800 > /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf <--- should be 300? > 280 nominal-perf is max perf, and highest-perf is boost-perf. Same goes with nominal-freq (i.e. policy->max). So 280 and 2800 look to be the correct values, 300 and 3000 come with boost enabled. Look at the first entry, highest_perf. 
> /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf > 100 > > == Arm64 server Bar == > CPU max MHz: 3000.0000 > CPU min MHz: 375.0000 > > /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf <--- should be 3000? There is no cpufreq boost. > 3300 This isn't exported by cpufreq driver but acpi, and it just exports hardware values of highest_perf (with boost i.e.). cpufreq may or may not use this to support boost. > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq <--- don't understand why 0. > 0 Because corresponding hardware registers aren't implemented for your platform, this is the function that reads these registers: int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps) { ... /* Read optional lowest and nominal frequencies if present */ if (CPC_SUPPORTED(low_freq_reg)) cpc_read(cpunum, low_freq_reg, &low_f); if (CPC_SUPPORTED(nom_freq_reg)) cpc_read(cpunum, nom_freq_reg, &nom_f); perf_caps->lowest_freq = low_f; perf_caps->nominal_freq = nom_f; } > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf > 375 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf > 375 > /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq <--- ditto > 0 > /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf > 3000 > /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf > 100 -- viresh ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-21 20:48 ` [PATCH V3 0/4] cpufreq: cppc: " Qian Cai 2021-06-22 6:52 ` Viresh Kumar @ 2021-06-23 4:16 ` Viresh Kumar 2021-06-23 12:57 ` Qian Cai 1 sibling, 1 reply; 41+ messages in thread From: Viresh Kumar @ 2021-06-23 4:16 UTC (permalink / raw) To: Qian Cai Cc: Rafael Wysocki, Ionela Voinescu, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Vincent Guittot, Will Deacon, linux-pm, linux-acpi, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On 21-06-21, 16:48, Qian Cai wrote: > > > On 6/21/2021 5:19 AM, Viresh Kumar wrote: > > CPPC cpufreq driver is used for ARM servers and this patch series tries to > > provide counter-based frequency invariance support for them in the absence for > > architecture specific counters (like AMUs). > > Viresh, this series works fine on my quick tests so far. Do you want me to add your Tested-by for the series ? -- viresh ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-23 4:16 ` Viresh Kumar @ 2021-06-23 12:57 ` Qian Cai 2021-06-24 2:54 ` Viresh Kumar 0 siblings, 1 reply; 41+ messages in thread From: Qian Cai @ 2021-06-23 12:57 UTC (permalink / raw) To: Viresh Kumar Cc: Rafael Wysocki, Ionela Voinescu, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Vincent Guittot, Will Deacon, linux-pm, linux-acpi, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On 6/23/2021 12:16 AM, Viresh Kumar wrote: > On 21-06-21, 16:48, Qian Cai wrote: >> >> >> On 6/21/2021 5:19 AM, Viresh Kumar wrote: >>> CPPC cpufreq driver is used for ARM servers and this patch series tries to >>> provide counter-based frequency invariance support for them in the absence for >>> architecture specific counters (like AMUs). >> >> Viresh, this series works fine on my quick tests so far. > > Do you want me to add your Tested-by for the series ? Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in development, and will provide an update once ready. Also, I noticed the delivered perf is even smaller than lowest_perf (100). # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs ref:103377547901 del:54540736873 # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs ref:103379170101 del:54541599117 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53 My understanding is that the delivered perf should fall into the range between lowest_perf and highest_perf. Is that assumption correct? This happens on a 5.4-based kernel, so I am in the process of running your series on that system to see if there are any differences. In any case, if it is a bug it is pre-existing, but I'd like to understand that front a bit better first. ^ permalink raw reply [flat|nested] 41+ messages in thread
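The arithmetic in the message above can be reproduced directly from two feedback_ctrs snapshots; the 100 multiplier is reference_perf, as Ionela clarifies later in the thread. A minimal sketch (the struct and function names are illustrative, not kernel code):

```c
#include <stdint.h>

/* One reading of /sys/devices/system/cpu/cpuN/acpi_cppc/feedback_ctrs */
struct fb_snapshot {
	uint64_t ref;	/* reference counter, ticks at reference_perf */
	uint64_t del;	/* delivered counter, ticks at the actual core speed */
};

/*
 * Average delivered performance between two snapshots:
 * reference_perf * (del1 - del0) / (ref1 - ref0).
 */
static uint64_t delivered_perf(uint64_t reference_perf,
			       struct fb_snapshot t0, struct fb_snapshot t1)
{
	uint64_t dref = t1.ref - t0.ref;
	uint64_t ddel = t1.del - t0.del;

	/* No reference ticks means the counters were stopped, e.g. in idle. */
	return dref ? reference_perf * ddel / dref : 0;
}
```

With the two snapshots quoted above this gives 53, i.e. below lowest_perf (100), which is exactly the anomaly being discussed.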
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-23 12:57 ` Qian Cai @ 2021-06-24 2:54 ` Viresh Kumar 2021-06-24 9:49 ` Vincent Guittot 0 siblings, 1 reply; 41+ messages in thread From: Viresh Kumar @ 2021-06-24 2:54 UTC (permalink / raw) To: Qian Cai Cc: Rafael Wysocki, Ionela Voinescu, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Vincent Guittot, Will Deacon, linux-pm, linux-acpi, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On 23-06-21, 08:57, Qian Cai wrote: > Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in > development, and will provide an update once ready. Oh sure, np. > Also, I noticed the delivered perf is even smaller than lowest_perf (100). > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > ref:103377547901 del:54540736873 > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > ref:103379170101 del:54541599117 > > 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53 > > My understanding is that the delivered perf should fail into the range between > lowest_perf and highest_perf. Is that assumption correct? This happens on > 5.4-based kernel, so I am in process running your series on that system to see > if there is any differences. In any case, if it is a bug it is pre-existing, > but I'd like to understand a bit better in that front first. Vincent: Can that happen because of CPU idle ? -- viresh ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-24 2:54 ` Viresh Kumar @ 2021-06-24 9:49 ` Vincent Guittot 2021-06-24 10:48 ` Ionela Voinescu 0 siblings, 1 reply; 41+ messages in thread From: Vincent Guittot @ 2021-06-24 9:49 UTC (permalink / raw) To: Viresh Kumar Cc: Qian Cai, Rafael Wysocki, Ionela Voinescu, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On Thu, 24 Jun 2021 at 04:54, Viresh Kumar <viresh.kumar@linaro.org> wrote: > > On 23-06-21, 08:57, Qian Cai wrote: > > Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in > > development, and will provide an update once ready. > > Oh sure, np. > > > Also, I noticed the delivered perf is even smaller than lowest_perf (100). > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > > ref:103377547901 del:54540736873 > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > > ref:103379170101 del:54541599117 > > > > 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53 I'm not sure that I understand your point. The formula above says that cpu8 run @ 53% of nominal performance > > > > My understanding is that the delivered perf should fail into the range between > > lowest_perf and highest_perf. Is that assumption correct? This happens on > > 5.4-based kernel, so I am in process running your series on that system to see > > if there is any differences. In any case, if it is a bug it is pre-existing, > > but I'd like to understand a bit better in that front first. > > Vincent: > > Can that happen because of CPU idle ? > > -- > viresh ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-24 9:49 ` Vincent Guittot @ 2021-06-24 10:48 ` Ionela Voinescu 2021-06-24 11:15 ` Vincent Guittot ` (2 more replies) 0 siblings, 3 replies; 41+ messages in thread From: Ionela Voinescu @ 2021-06-24 10:48 UTC (permalink / raw) To: Vincent Guittot Cc: Viresh Kumar, Qian Cai, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki Hi guys, On Thursday 24 Jun 2021 at 11:49:53 (+0200), Vincent Guittot wrote: > On Thu, 24 Jun 2021 at 04:54, Viresh Kumar <viresh.kumar@linaro.org> wrote: > > > > On 23-06-21, 08:57, Qian Cai wrote: > > > Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in > > > development, and will provide an update once ready. > > > > Oh sure, np. > > > > > Also, I noticed the delivered perf is even smaller than lowest_perf (100). > > > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > > > ref:103377547901 del:54540736873 > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > > > ref:103379170101 del:54541599117 > > > > > > 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53 > > I'm not sure that I understand your point. The formula above says that > cpu8 run @ 53% of nominal performance > I think this is based on a previous example Qian had where: /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf 300 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq 1000 /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf 100 /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf 100 ..so the 100 is not from obtaining percentage, is the reference performance. 
The logic of the formula is to obtain the delivered performance when knowing the number of ticks for each counter, so: So if one gets (103379170101 - 103377547901) ticks for the counter at running at 1GHz(perf 100), what is the frequency of the core, if its counter ticked (54541599117 - 54540736873) times in the same interval of time? The answer is 530MHz(perf 53), which is lower than the lowest frequency at 1GHz(perf 100). > > > > > > My understanding is that the delivered perf should fail into the range between > > > lowest_perf and highest_perf. Is that assumption correct? This happens on > > > 5.4-based kernel, so I am in process running your series on that system to see > > > if there is any differences. In any case, if it is a bug it is pre-existing, > > > but I'd like to understand a bit better in that front first. > > > > Vincent: > > > > Can that happen because of CPU idle ? > > Not if the counters are implemented properly. The kernel considers that both reference and delivered performance counters should stop or reset during idle. The kernel would not account for idle itself. If the reference performance counter does not stop during idle, while the core performance counter (delivered) does stop, the behavior above should be seen very often. Qian, do you see these small delivered performance values often or seldom? Thanks, Ionela. > > -- > > viresh ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-24 10:48 ` Ionela Voinescu @ 2021-06-24 11:15 ` Vincent Guittot 2021-06-24 11:23 ` Ionela Voinescu 2021-06-24 15:17 ` Qian Cai 2021-06-24 20:44 ` Qian Cai 2 siblings, 1 reply; 41+ messages in thread From: Vincent Guittot @ 2021-06-24 11:15 UTC (permalink / raw) To: Ionela Voinescu Cc: Viresh Kumar, Qian Cai, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On Thu, 24 Jun 2021 at 12:48, Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > Hi guys, > > On Thursday 24 Jun 2021 at 11:49:53 (+0200), Vincent Guittot wrote: > > On Thu, 24 Jun 2021 at 04:54, Viresh Kumar <viresh.kumar@linaro.org> wrote: > > > > > > On 23-06-21, 08:57, Qian Cai wrote: > > > > Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in > > > > development, and will provide an update once ready. > > > > > > Oh sure, np. > > > > > > > Also, I noticed the delivered perf is even smaller than lowest_perf (100). > > > > > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > > > > ref:103377547901 del:54540736873 > > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > > > > ref:103379170101 del:54541599117 > > > > > > > > 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53 > > > > I'm not sure that I understand your point. 
The formula above says that > > cpu8 run @ 53% of nominal performance > > > > I think this is based on a previous example Qian had where: > > /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf > 300 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq > 1000 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf > 100 > /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf > 100 > > ..so the 100 is not from obtaining percentage, is the reference > performance. > > The logic of the formula is to obtain the delivered performance when > knowing the number of ticks for each counter, so: > > So if one gets (103379170101 - 103377547901) ticks for the counter at > running at 1GHz(perf 100), what is the frequency of the core, if its > counter ticked (54541599117 - 54540736873) times in the same interval > of time? > > The answer is 530MHz(perf 53), which is lower than the lowest frequency > at 1GHz(perf 100). But the nominal_perf is 280 and not 100 if i'm not wrong so the perf value is 148 > lowest_perf in this case > > > > > > > > > > My understanding is that the delivered perf should fail into the range between > > > > lowest_perf and highest_perf. Is that assumption correct? This happens on > > > > 5.4-based kernel, so I am in process running your series on that system to see > > > > if there is any differences. In any case, if it is a bug it is pre-existing, > > > > but I'd like to understand a bit better in that front first. > > > > > > Vincent: > > > > > > Can that happen because of CPU idle ? > > > > > Not if the counters are implemented properly. The kernel considers that > both reference and delivered performance counters should stop or reset > during idle. The kernel would not account for idle itself. > > If the reference performance counter does not stop during idle, while > the core performance counter (delivered) does stop, the behavior above > should be seen very often. > > Qian, do you see these small delivered performance values often or > seldom? 
> > Thanks, > Ionela. > > > > -- > > > viresh ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-24 11:15 ` Vincent Guittot @ 2021-06-24 11:23 ` Ionela Voinescu 2021-06-24 11:59 ` Vincent Guittot 0 siblings, 1 reply; 41+ messages in thread From: Ionela Voinescu @ 2021-06-24 11:23 UTC (permalink / raw) To: Vincent Guittot Cc: Viresh Kumar, Qian Cai, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On Thursday 24 Jun 2021 at 13:15:04 (+0200), Vincent Guittot wrote: > On Thu, 24 Jun 2021 at 12:48, Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > > > Hi guys, > > > > On Thursday 24 Jun 2021 at 11:49:53 (+0200), Vincent Guittot wrote: > > > On Thu, 24 Jun 2021 at 04:54, Viresh Kumar <viresh.kumar@linaro.org> wrote: > > > > > > > > On 23-06-21, 08:57, Qian Cai wrote: > > > > > Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in > > > > > development, and will provide an update once ready. > > > > > > > > Oh sure, np. > > > > > > > > > Also, I noticed the delivered perf is even smaller than lowest_perf (100). > > > > > > > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > > > > > ref:103377547901 del:54540736873 > > > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > > > > > ref:103379170101 del:54541599117 > > > > > > > > > > 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53 > > > > > > I'm not sure that I understand your point. 
The formula above says that > > > cpu8 run @ 53% of nominal performance > > > > > > > I think this is based on a previous example Qian had where: > > > > /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf > > 300 > > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq > > 1000 > > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf > > 100 > > /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf > > 100 > > > > ..so the 100 is not from obtaining percentage, is the reference > > performance. > > > > The logic of the formula is to obtain the delivered performance when > > knowing the number of ticks for each counter, so: > > > > So if one gets (103379170101 - 103377547901) ticks for the counter at > > running at 1GHz(perf 100), what is the frequency of the core, if its > > counter ticked (54541599117 - 54540736873) times in the same interval > > of time? > > > > The answer is 530MHz(perf 53), which is lower than the lowest frequency > > at 1GHz(perf 100). > > But the nominal_perf is 280 and not 100 if i'm not wrong so the perf > value is 148 > lowest_perf in this case > Nominal performance has no meaning here. The reference counter ticks with the frequency equivalent to reference performance. Nominal performance is the maximum performance when !boost. Highest performance is the maximum performance available including boost frequencies. So nominal performance has no impact in these translations from counter values to delivered performance. Hope it helps, Ionela. > > > > > > > > > > > > > > > My understanding is that the delivered perf should fail into the range between > > > > > lowest_perf and highest_perf. Is that assumption correct? This happens on > > > > > 5.4-based kernel, so I am in process running your series on that system to see > > > > > if there is any differences. In any case, if it is a bug it is pre-existing, > > > > > but I'd like to understand a bit better in that front first. 
> > > > > > > > Vincent: > > > > > > > > Can that happen because of CPU idle ? > > > > > > > > Not if the counters are implemented properly. The kernel considers that > > both reference and delivered performance counters should stop or reset > > during idle. The kernel would not account for idle itself. > > > > If the reference performance counter does not stop during idle, while > > the core performance counter (delivered) does stop, the behavior above > > should be seen very often. > > > > Qian, do you see these small delivered performance values often or > > seldom? > > > > Thanks, > > Ionela. > > > > > > -- > > > > viresh ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-24 11:23 ` Ionela Voinescu @ 2021-06-24 11:59 ` Vincent Guittot 0 siblings, 0 replies; 41+ messages in thread From: Vincent Guittot @ 2021-06-24 11:59 UTC (permalink / raw) To: Ionela Voinescu Cc: Viresh Kumar, Qian Cai, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On Thu, 24 Jun 2021 at 13:23, Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > On Thursday 24 Jun 2021 at 13:15:04 (+0200), Vincent Guittot wrote: > > On Thu, 24 Jun 2021 at 12:48, Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > > > > > Hi guys, > > > > > > On Thursday 24 Jun 2021 at 11:49:53 (+0200), Vincent Guittot wrote: > > > > On Thu, 24 Jun 2021 at 04:54, Viresh Kumar <viresh.kumar@linaro.org> wrote: > > > > > > > > > > On 23-06-21, 08:57, Qian Cai wrote: > > > > > > Viresh, I am afraid I don't feel comfortable yet. I have a few new tests in > > > > > > development, and will provide an update once ready. > > > > > > > > > > Oh sure, np. > > > > > > > > > > > Also, I noticed the delivered perf is even smaller than lowest_perf (100). > > > > > > > > > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > > > > > > ref:103377547901 del:54540736873 > > > > > > # cat /sys/devices/system/cpu/cpu8/acpi_cppc/feedback_ctrs > > > > > > ref:103379170101 del:54541599117 > > > > > > > > > > > > 100 * (54541599117 - 54540736873) / (103379170101 - 103377547901) = 53 > > > > > > > > I'm not sure that I understand your point. 
The formula above says that
> > > > cpu8 runs at 53% of nominal performance
> > >
> > > I think this is based on a previous example Qian had where:
> > >
> > > /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf
> > > 300
> > > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq
> > > 1000
> > > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf
> > > 100
> > > /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf
> > > 100
> > >
> > > ..so the 100 is not a percentage, it is the reference performance.
> > >
> > > The logic of the formula is to obtain the delivered performance when
> > > knowing the number of ticks for each counter, so:
> > >
> > > If one gets (103379170101 - 103377547901) ticks for the counter
> > > running at 1GHz (perf 100), what is the frequency of the core, if its
> > > counter ticked (54541599117 - 54540736873) times in the same interval
> > > of time?
> > >
> > > The answer is 530MHz (perf 53), which is lower than the lowest frequency
> > > at 1GHz (perf 100).
> >
> > But the nominal_perf is 280 and not 100 if I'm not wrong, so the perf
> > value is 148 > lowest_perf in this case
>
> Nominal performance has no meaning here. The reference counter ticks
> with the frequency equivalent to reference performance.
>
> Nominal performance is the maximum performance when !boost. Highest
> performance is the maximum performance available including boost
> frequencies. So nominal performance has no impact in these translations
> from counter values to delivered performance.

My bad, nominal_perf == reference_perf on the systems that I have locally.

> Hope it helps,
> Ionela.
>
> > > > > > My understanding is that the delivered perf should fall into the range
> > > > > > between lowest_perf and highest_perf. Is that assumption correct? This
> > > > > > happens on a 5.4-based kernel, so I am in the process of running your
> > > > > > series on that system to see if there are any differences.
> > > > > > In any case, if it is a bug it is pre-existing,
> > > > > > but I'd like to understand a bit better on that front first.
> > > > >
> > > > > Vincent:
> > > > >
> > > > > Can that happen because of CPU idle?
> > >
> > > Not if the counters are implemented properly. The kernel considers that
> > > both reference and delivered performance counters should stop or reset
> > > during idle. The kernel would not account for idle itself.
> > >
> > > If the reference performance counter does not stop during idle, while
> > > the core performance counter (delivered) does stop, the behavior above
> > > should be seen very often.
> > >
> > > Qian, do you see these small delivered performance values often or
> > > seldom?
> > >
> > > Thanks,
> > > Ionela.
> > >
> > > > > --
> > > > > viresh

^ permalink raw reply	[flat|nested] 41+ messages in thread
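The counter arithmetic being discussed in the exchange above can be sketched as follows (illustrative Python, not the kernel source; the sample values are the feedback_ctrs readings quoted in the thread):

```python
def delivered_perf(ref_perf, ref0, delv0, ref1, delv1):
    """Estimate delivered performance from two feedback_ctrs samples:
    delivered = reference_perf * delta(delivered) / delta(reference)."""
    delta_ref = ref1 - ref0
    delta_del = delv1 - delv0
    if delta_ref == 0:
        return None  # no reference ticks elapsed, nothing to estimate
    return ref_perf * delta_del // delta_ref

# feedback_ctrs values quoted above (reference_perf == 100):
perf = delivered_perf(100, 103377547901, 54540736873,
                      103379170101, 54541599117)
print(perf)  # 53, i.e. ~530 MHz, below lowest_perf (100)
```

As the discussion notes, only reference_perf enters this translation; nominal_perf and highest_perf play no role in converting counter deltas to delivered performance.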
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-24 10:48 ` Ionela Voinescu 2021-06-24 11:15 ` Vincent Guittot @ 2021-06-24 15:17 ` Qian Cai 2021-06-25 10:21 ` Ionela Voinescu 2021-06-24 20:44 ` Qian Cai 2 siblings, 1 reply; 41+ messages in thread
From: Qian Cai @ 2021-06-24 15:17 UTC (permalink / raw)
To: Ionela Voinescu, Vincent Guittot
Cc: Viresh Kumar, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki

On 6/24/2021 6:48 AM, Ionela Voinescu wrote:
> Not if the counters are implemented properly. The kernel considers that
> both reference and delivered performance counters should stop or reset
> during idle. The kernel would not account for idle itself.
>
> If the reference performance counter does not stop during idle, while
> the core performance counter (delivered) does stop, the behavior above
> should be seen very often.
>
> Qian, do you see these small delivered performance values often or
> seldom?

Ionela, so I managed to upgrade the kernel on the system to today's linux-next, which is supposed to include this series. The delivered perf is now 280. However, scaling_min_freq (200 MHz) is not equal to lowest_perf (100).

scaling_driver: acpi_cppc
scaling_governor: schedutil

Is that normal because lowest_nonlinear_perf is 200?

Also, on this pretty idle system, 158 of 160 CPUs are always running at max freq (280 MHz). The other 2 are running at 243 and 213 MHz according to scaling_cur_freq. Apparently, "schedutil" does not work properly on this system. I am going to try other governors to narrow down the issue a bit.
FYI, here are the acpi_cppc register readings:

/sys/devices/system/cpu/cpu0/acpi_cppc/feedback_ctrs
ref:160705801 del:449594095
/sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf
300
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq
1000
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf
200
/sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf
100
/sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq
2800
/sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf
280
/sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf
100
/sys/devices/system/cpu/cpu0/acpi_cppc/wraparound_time
18446744073709551615

^ permalink raw reply	[flat|nested] 41+ messages in thread
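For context, the cppc-cpufreq driver derives frequencies from these perf levels by linear interpolation between the (lowest_perf, lowest_freq) and (nominal_perf, nominal_freq) points. A sketch of that mapping (illustrative Python mirroring, but not copied from, the driver's perf-to-kHz conversion) shows why lowest_nonlinear_perf = 200 appears as a 2 GHz scaling_min_freq:

```python
def cppc_perf_to_khz(perf, lowest_perf, nominal_perf,
                     lowest_freq_mhz, nominal_freq_mhz):
    """Map a CPPC performance level to kHz by linear interpolation
    between (lowest_perf, lowest_freq) and (nominal_perf, nominal_freq)."""
    mul = (nominal_freq_mhz - lowest_freq_mhz) * 1000  # frequency span in kHz
    div = nominal_perf - lowest_perf                   # perf span
    offset = nominal_freq_mhz * 1000 - nominal_perf * mul // div
    return offset + perf * mul // div

# With the _CPC values above: perf 100 -> 1 GHz and perf 280 -> 2.8 GHz,
# so lowest_nonlinear_perf (200) lands at 2 GHz.
print(cppc_perf_to_khz(200, 100, 280, 1000, 2800))  # 2000000 (kHz)
```

The driver uses lowest_nonlinear_perf, not lowest_perf, as the policy minimum, which is why scaling_min_freq shows 2 GHz rather than 1 GHz here.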
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-24 15:17 ` Qian Cai @ 2021-06-25 10:21 ` Ionela Voinescu 2021-06-25 13:31 ` Qian Cai 0 siblings, 1 reply; 41+ messages in thread From: Ionela Voinescu @ 2021-06-25 10:21 UTC (permalink / raw) To: Qian Cai Cc: Vincent Guittot, Viresh Kumar, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki Hi Qian, On Thursday 24 Jun 2021 at 11:17:55 (-0400), Qian Cai wrote: > > > On 6/24/2021 6:48 AM, Ionela Voinescu wrote: > > Not if the counters are implemented properly. The kernel considers that > > both reference and delivered performance counters should stop or reset > > during idle. The kernel would not account for idle itself. > > > > If the reference performance counter does not stop during idle, while > > the core performance counter (delivered) does stop, the behavior above > > should be seen very often. > > > > Qian, do you see these small delivered performance values often or > > seldom? > > Ionela, so I managed to upgrade the kernel on the system to today's > linux-next which suppose to include this series. The delivered perf > is now 280. However, scaling_min_freq (200 MHz) is not equal to > lowest_perf (100). > > scaling_driver: acpi_cppc ^^^^^^^^^ I suppose you mean "cppc-cpufreq"? "acpi_cppc" is not a scaling driver option. > scaling_governor: schedutil > > Is that normal because lowest_nonlinear_perf is 200? > Yes, that is right : [1] > Also, on this pretty idle system, 158 of 160 CPUs are always running > in max freq (280 MHz). The other 2 are running in 243 and 213 MHz > according to scaling_cur_freq. Apparently, "schedutil" does not work > proper on this system. I am going to try other governors to narrow > down the issue a bit. 
So your CPUs run at frequencies between 200MHz and 280MHz? Based on your acpi_cppc information below I would have assumed 2GHz as lowest nonlinear and 2.8GHz as nominal. The reason for this is that according to the ACPI spec the frequency values in the _CPC objects are supposed to be in MHz, so 2800 MHz for nominal frequency would be 2.8GHz. When you try more governors, make sure to check out the difference between scaling_cur_freq and cpuinfo_cur_freq at [2]. The first gives you the frequency that the governor (schedutil) is asking for, while the second is giving you the current frequency obtained from the counters. So to check the actual frequency the cores are running at, please check cpuinfo_cur_freq. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/cpufreq/cppc_cpufreq.c?h=v5.13-rc7#n296 [2] https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt Thanks, Ionela. > > FYI, here is the acpi_cppc registers reading: > > /sys/devices/system/cpu/cpu0/acpi_cppc/feedback_ctrs > ref:160705801 del:449594095 > /sys/devices/system/cpu/cpu0/acpi_cppc/highest_perf > 300 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq > 1000 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_nonlinear_perf > 200 > /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_perf > 100 > /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq > 2800 > /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_perf > 280 > /sys/devices/system/cpu/cpu0/acpi_cppc/reference_perf > 100 > /sys/devices/system/cpu/cpu0/acpi_cppc/wraparound_time > 18446744073709551615 ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-25 10:21 ` Ionela Voinescu @ 2021-06-25 13:31 ` Qian Cai 2021-06-25 14:37 ` Ionela Voinescu 2021-06-29 4:45 ` Viresh Kumar 0 siblings, 2 replies; 41+ messages in thread From: Qian Cai @ 2021-06-25 13:31 UTC (permalink / raw) To: Ionela Voinescu Cc: Vincent Guittot, Viresh Kumar, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On 6/25/2021 6:21 AM, Ionela Voinescu wrote: >> scaling_driver: acpi_cppc > ^^^^^^^^^ > I suppose you mean "cppc-cpufreq"? > > "acpi_cppc" is not a scaling driver option. Ionela, yes. Sorry about that. > So your CPUs run at frequencies between 200MHz and 280MHz? 2000 to 2800 MHz. > Based on your acpi_cppc information below I would have assumed 2GHz as > lowest nonlinear and 2.8GHz as nominal. The reason for this is that > according to the ACPI spec the frequency values in the _CPC objects are > supposed to be in MHz, so 2800 MHz for nominal frequency would be > 2.8GHz. > > When you try more governors, make sure to check out the difference > between scaling_cur_freq and cpuinfo_cur_freq at [2]. The first gives > you the frequency that the governor (schedutil) is asking for, while the > second is giving you the current frequency obtained from the counters. > > So to check the actual frequency the cores are running at, please check > cpuinfo_cur_freq. The problem is that all CPUs are never scaling down. "cpuinfo_cur_freq" and "scaling_cur_freq" are always the 2800 MHz on all CPUs on this idle system. This looks like a regression somewhere as in 5.4-based kernel, I can see "cpuinfo_cur_freq" can go down to 2000 MHz in the same scenario. I'll bisect a bit unless you have better ideas? 
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-25 13:31 ` Qian Cai @ 2021-06-25 14:37 ` Ionela Voinescu 2021-06-25 16:56 ` Qian Cai 2021-06-26 2:29 ` Qian Cai 2021-06-29 4:45 ` Viresh Kumar 1 sibling, 2 replies; 41+ messages in thread From: Ionela Voinescu @ 2021-06-25 14:37 UTC (permalink / raw) To: Qian Cai Cc: Vincent Guittot, Viresh Kumar, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki Hey, On Friday 25 Jun 2021 at 09:31:58 (-0400), Qian Cai wrote: > > > On 6/25/2021 6:21 AM, Ionela Voinescu wrote: > >> scaling_driver: acpi_cppc > > ^^^^^^^^^ > > I suppose you mean "cppc-cpufreq"? > > > > "acpi_cppc" is not a scaling driver option. > > Ionela, yes. Sorry about that. > > > So your CPUs run at frequencies between 200MHz and 280MHz? > > 2000 to 2800 MHz. > Thank you for the clarification. > > Based on your acpi_cppc information below I would have assumed 2GHz as > > lowest nonlinear and 2.8GHz as nominal. The reason for this is that > > according to the ACPI spec the frequency values in the _CPC objects are > > supposed to be in MHz, so 2800 MHz for nominal frequency would be > > 2.8GHz. > > > > When you try more governors, make sure to check out the difference > > between scaling_cur_freq and cpuinfo_cur_freq at [2]. The first gives > > you the frequency that the governor (schedutil) is asking for, while the > > second is giving you the current frequency obtained from the counters. > > > > So to check the actual frequency the cores are running at, please check > > cpuinfo_cur_freq. > > The problem is that all CPUs are never scaling down. "cpuinfo_cur_freq" > and "scaling_cur_freq" are always the 2800 MHz on all CPUs on this idle > system. 
This looks like a regression somewhere as in 5.4-based kernel, > I can see "cpuinfo_cur_freq" can go down to 2000 MHz in the same > scenario. I'll bisect a bit unless you have better ideas? Quick questions for you: 1. When you say you tried a 5.4 kernel, did you try it with these patches backported? They also have some dependencies with the recent changes in the arch topology driver and cpufreq so they would not be straight forward to backport. If the 5.4 kernel you tried did not have these patches, it might be best to try next/master that has these patches, but with CONFIG_ACPI_CPPC_CPUFREQ_FIE=n, just to eliminate the possibility that an incorrect frequency scale factor here would affect utilization that would then affect the schedutil frequency selection. I would not expect this behavior even if the scale factor was wrong, but it would be good to rule out. 2. Is your platform booting with all CPUs? Are any hotplug operations done in your scenario? Thanks, Ionela. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-25 14:37 ` Ionela Voinescu @ 2021-06-25 16:56 ` Qian Cai 2021-06-26 2:29 ` Qian Cai 1 sibling, 0 replies; 41+ messages in thread
From: Qian Cai @ 2021-06-25 16:56 UTC (permalink / raw)
To: Ionela Voinescu
Cc: Vincent Guittot, Viresh Kumar, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki

On 6/25/2021 10:37 AM, Ionela Voinescu wrote:
> Quick questions for you:
>
> 1. When you say you tried a 5.4 kernel, did you try it with these
> patches backported? They also have some dependencies with the recent
> changes in the arch topology driver and cpufreq so they would not be
> straight forward to backport.

No. It turned out that this 5.4-based kernel uses the "ondemand" governor by default, which works fine and can even scale down to lowest_perf (1000 MHz). Once the governor was switched to "schedutil", it could keep the freq at the lowest. On the latest kernel, it also works fine when using "ondemand" first and then switching to "schedutil", even though it can only scale down to lowest_nonlinear_perf (2000 MHz). The issue is more that using "schedutil" by default does not work. Also, on the latest kernel, even the "userspace" governor only allows scaling down to 2000 MHz.

> If the 5.4 kernel you tried did not have these patches, it might be best
> to try next/master that has these patches, but with
> CONFIG_ACPI_CPPC_CPUFREQ_FIE=n, just to eliminate the possibility that
> an incorrect frequency scale factor here would affect utilization that
> would then affect the schedutil frequency selection. I would not expect
> this behavior even if the scale factor was wrong, but it would be good
> to rule out.

I'll try that, at least to see if CONFIG_ACPI_CPPC_CPUFREQ_FIE=n lets the latest kernel scale down to 1000 MHz.

> 2. Is your platform booting with all CPUs? Are any hotplug operations
> done in your scenario?

Yes, booting with all CPUs. No additional hotplug there.

^ permalink raw reply	[flat|nested] 41+ messages in thread
I'll try that at least see if CONFIG_ACPI_CPPC_CPUFREQ_FIE=n would make the latest kernel to be able to scale down to 1000 MHz. > 2. Is your platform booting with all CPUs? Are any hotplug operations > done in your scenario? Yes, booting with all CPUs. No additional hotplug there. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-25 14:37 ` Ionela Voinescu 2021-06-25 16:56 ` Qian Cai @ 2021-06-26 2:29 ` Qian Cai 2021-06-26 13:41 ` Qian Cai ` (2 more replies) 1 sibling, 3 replies; 41+ messages in thread From: Qian Cai @ 2021-06-26 2:29 UTC (permalink / raw) To: Ionela Voinescu Cc: Vincent Guittot, Viresh Kumar, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On 6/25/2021 10:37 AM, Ionela Voinescu wrote: > Quick questions for you: > > 1. When you say you tried a 5.4 kernel, did you try it with these > patches backported? They also have some dependencies with the recent > changes in the arch topology driver and cpufreq so they would not be > straight forward to backport. > > If the 5.4 kernel you tried did not have these patches, it might be best > to try next/master that has these patches, but with > CONFIG_ACPI_CPPC_CPUFREQ_FIE=n, just to eliminate the possibility that > an incorrect frequency scale factor here would affect utilization that > would then affect the schedutil frequency selection. I would not expect > this behavior even if the scale factor was wrong, but it would be good > to rule out. > > 2. Is your platform booting with all CPUs? Are any hotplug operations > done in your scenario? Ionela, I found that set ACPI_PROCESSOR=y instead of ACPI_PROCESSOR=m will fix the previous mentioned issues here (any explanations of that?) even though the scaling down is not perfect. 
Now, we have the following on this idle system:

# cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
     79 1000000
      1 1160000
     73 1400000
      1 2000000
      4 2010000
      1 2800000
      1 860000

Even if I rerun it a few times, there can still be a few CPUs running lower than lowest_perf (1GHz). Also, even though I set all CPUs to use the "userspace" governor and set the freq to the lowest, a few CPUs keep changing at will.

# cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
    156 1000000
      3 2000000
      1 760000

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-26 2:29 ` Qian Cai @ 2021-06-26 13:41 ` Qian Cai 2021-06-29 4:55 ` Viresh Kumar 0 siblings, 1 reply; 41+ messages in thread
From: Qian Cai @ 2021-06-26 13:41 UTC (permalink / raw)
To: Ionela Voinescu
Cc: Vincent Guittot, Viresh Kumar, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki

On 6/25/2021 10:29 PM, Qian Cai wrote:
> Ionela, I found that set ACPI_PROCESSOR=y instead of ACPI_PROCESSOR=m will fix the previous mentioned issues here (any explanations of that?) even though the scaling down is not perfect. Now, we have the following on this idle system:
>
> # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
> 79 1000000
> 1 1160000
> 73 1400000
> 1 2000000
> 4 2010000
> 1 2800000
> 1 860000
>
> Even if I rerun a few times, there could still have a few CPUs running lower than lowest_perf (1GHz). Also, even though I set all CPUs to use "userspace" governor and set freq to the lowest. A few CPUs keep changing at will.
>
> # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
> 156 1000000
> 3 2000000
> 1 760000

Another data point is that setting ACPI_CPPC_CPUFREQ_FIE=n fixed the issue that any CPU could run below the lowest freq.

schedutil:
# cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
     80 1000000
     78 1400000
      1 2010000
      1 2800000

userspace:
# cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
    158 1000000
      2 2000000

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-26 13:41 ` Qian Cai @ 2021-06-29 4:55 ` Viresh Kumar 0 siblings, 0 replies; 41+ messages in thread From: Viresh Kumar @ 2021-06-29 4:55 UTC (permalink / raw) To: Qian Cai Cc: Ionela Voinescu, Vincent Guittot, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On 26-06-21, 09:41, Qian Cai wrote: > Another date point is that set ACPI_CPPC_CPUFREQ_FIE=n fixed the issue that any CPU could run below the lowest freq. > > schedutil: > # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c > 80 1000000 > 78 1400000 > 1 2010000 > 1 2800000 > > userspace: > # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c > 158 1000000 > 2 2000000 ACPI_CPPC_CPUFREQ_FIE can play a role with schedutil, but not with userspace governor. Userspace doesn't use the values being updated by ACPI_CPPC_CPUFREQ_FIE. So I think the CPUs may not have been idle, just for some reason. Also, now that you are able run on latest kernel (linux-next), it would be better if we can talk in terms of that only going forward. 5.4 adds more to the already unstable results :) -- viresh ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-26 2:29 ` Qian Cai 2021-06-26 13:41 ` Qian Cai @ 2021-06-29 4:52 ` Viresh Kumar 2021-06-29 9:06 ` Ionela Voinescu 2 siblings, 0 replies; 41+ messages in thread From: Viresh Kumar @ 2021-06-29 4:52 UTC (permalink / raw) To: Qian Cai Cc: Ionela Voinescu, Vincent Guittot, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On 25-06-21, 22:29, Qian Cai wrote: > Ionela, I found that set ACPI_PROCESSOR=y instead of > ACPI_PROCESSOR=m will fix the previous mentioned issues here (any > explanations of that?) even though the scaling down is not perfect. Not sure how this affects it. > Now, we have the following on this idle system: > > # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c > 79 1000000 > 1 1160000 > 73 1400000 > 1 2000000 > 4 2010000 > 1 2800000 > 1 860000 > > Even if I rerun a few times, there could still have a few CPUs > running lower than lowest_perf (1GHz). (Please wrap your lines at 80 columns, it makes it harder to read otherwise). I think only the counters stopping on idle can get us that. > Also, even though I set all CPUs to use "userspace" governor and set > freq to the lowest. A few CPUs keep changing at will. > > # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c > 156 1000000 > 3 2000000 > 1 760000 I think this is expected since the hardware is in control of frequency here. The software can only request it to run at X frequency, the hardware may choose to do something else nevertheless. -- viresh ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-26 2:29 ` Qian Cai 2021-06-26 13:41 ` Qian Cai 2021-06-29 4:52 ` Viresh Kumar @ 2021-06-29 9:06 ` Ionela Voinescu 2021-06-29 13:38 ` Qian Cai 2 siblings, 1 reply; 41+ messages in thread From: Ionela Voinescu @ 2021-06-29 9:06 UTC (permalink / raw) To: Qian Cai Cc: Vincent Guittot, Viresh Kumar, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki Hi Qian, Sorry for the delay. I was trying to run tests on ThunderX2 as well, as it can give me more control over testing, to get more insight on this. On Friday 25 Jun 2021 at 22:29:26 (-0400), Qian Cai wrote: > > > On 6/25/2021 10:37 AM, Ionela Voinescu wrote: > > Quick questions for you: > > > > 1. When you say you tried a 5.4 kernel, did you try it with these > > patches backported? They also have some dependencies with the recent > > changes in the arch topology driver and cpufreq so they would not be > > straight forward to backport. > > > > If the 5.4 kernel you tried did not have these patches, it might be best > > to try next/master that has these patches, but with > > CONFIG_ACPI_CPPC_CPUFREQ_FIE=n, just to eliminate the possibility that > > an incorrect frequency scale factor here would affect utilization that > > would then affect the schedutil frequency selection. I would not expect > > this behavior even if the scale factor was wrong, but it would be good > > to rule out. > > > > 2. Is your platform booting with all CPUs? Are any hotplug operations > > done in your scenario? > > Ionela, I found that set ACPI_PROCESSOR=y instead of ACPI_PROCESSOR=m > will fix the previous mentioned issues here (any explanations of that?) > even though the scaling down is not perfect. 
> Now, we have the following
> on this idle system:

I don't see how this would have played a role. The cppc cpufreq driver
depends on functionality gated by CONFIG_ACPI_CPPC_LIB which in turn
needs CONFIG_ACPI_PROCESSOR to trigger the parsing of the _CPC objects.
If CONFIG_ACPI_PROCESSOR functionality is not built in or loaded, the
cppc cpufreq driver could not be used at all.

> # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
> 79 1000000
> 1 1160000
> 73 1400000
> 1 2000000
> 4 2010000
> 1 2800000
> 1 860000
>
> Even if I rerun a few times, there could still have a few CPUs running
> lower than lowest_perf (1GHz). Also, even though I set all CPUs to use
> "userspace" governor and set freq to the lowest. A few CPUs keep changing
> at will.

I do not believe getting values lower than lowest is worrying as long as
they are not much, much lower and they don't happen very often. First of
all, firmware has control over CPU frequencies and it can decide to
select a lower frequency if it wishes. Looking at the fact that it
happens only rarely in your tests, it's also possible that this is
introduced by the fact that the CPU spends only a few cycles in the
active state. That would reduce the resolution of the computed current
performance level, which results in a lower value when converted to
frequency.

Hope it helps,
Ionela.

> # cat /sys/devices/system/cpu/*/cpufreq/cpuinfo_cur_freq | sort | uniq -c
> 156 1000000
> 3 2000000
> 1 760000

^ permalink raw reply	[flat|nested] 41+ messages in thread
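The resolution point above can be illustrated numerically (a sketch with made-up counter deltas, not kernel code): with integer arithmetic, a very short activity window leaves only a handful of counter ticks, and truncation can push the computed performance below what the core actually delivered.

```python
def delivered_perf(ref_perf, delta_ref, delta_del):
    # Integer version of: delivered = ref_perf * delta_del / delta_ref
    return ref_perf * delta_del // delta_ref

# Long window: plenty of ticks, the ratio is essentially exact.
print(delivered_perf(100, 1622200, 3244400))  # 200 (i.e. 2 GHz)

# Tiny window: only a handful of ticks, so truncation skews the estimate
# low, and a core running near 2x the reference rate reports perf < 200.
print(delivered_perf(100, 7, 13))  # 185
```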
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-29 9:06 ` Ionela Voinescu @ 2021-06-29 13:38 ` Qian Cai 0 siblings, 0 replies; 41+ messages in thread
From: Qian Cai @ 2021-06-29 13:38 UTC (permalink / raw)
To: Ionela Voinescu
Cc: Vincent Guittot, Viresh Kumar, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki

On 6/29/2021 5:06 AM, Ionela Voinescu wrote:
>> Ionela, I found that set ACPI_PROCESSOR=y instead of ACPI_PROCESSOR=m
>> will fix the previous mentioned issues here (any explanations of that?)
>> even though the scaling down is not perfect. Now, we have the following
>> on this idle system:
>
> I don't see how this would have played a role. The cppc cpufreq driver
> depends on functionality gated by CONFIG_ACPI_CPPC_LIB which in turn
> needs CONFIG_ACPI_PROCESSOR to trigger the parsing of the _CPC objects.
> If CONFIG_ACPI_PROCESSOR functionality is not built in or loaded, the
> cppc cpufreq driver could not be used at all.

Ionela, that is fine. I can live with setting ACPI_PROCESSOR=y as a workaround. I was just curious about why manually loading processor.ko would set the lowest to 2GHz instead of 1GHz. Anyway, if running below the lowest is not a concern, then I have concluded my testing here. Feel free to include:

Tested-by: Qian Cai <quic_qiancai@quicinc.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-25 13:31 ` Qian Cai 2021-06-25 14:37 ` Ionela Voinescu @ 2021-06-29 4:45 ` Viresh Kumar 1 sibling, 0 replies; 41+ messages in thread From: Viresh Kumar @ 2021-06-29 4:45 UTC (permalink / raw) To: Qian Cai Cc: Ionela Voinescu, Vincent Guittot, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On 25-06-21, 09:31, Qian Cai wrote: > The problem is that all CPUs are never scaling down. > "cpuinfo_cur_freq" and "scaling_cur_freq" are always the 2800 MHz on > all CPUs on this idle system. This looks like a regression somewhere > as in 5.4-based kernel, I can see "cpuinfo_cur_freq" can go down to > 2000 MHz in the same scenario. I'll bisect a bit unless you have > better ideas? Few things which may let us understand the readings properly. - cpuinfo_cur_freq: eventually makes a call to cppc_cpufreq_get_rate() and returns the *actual* frequency hardware is running at (based on counter diff around 2 us delay). - scaling_cur_freq: is the frequency the cpufreq core thinks the hardware is running at, it would more in sync with what schedutil (or other governors) wants the CPU to run at. This can be different from what the hardware is running at, i.e. given by cpuinfo_cur_freq. -- viresh ^ permalink raw reply [flat|nested] 41+ messages in thread
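The cpuinfo_cur_freq path described above can be sketched roughly as follows (illustrative Python, not the kernel C code; read_ctrs and perf_to_khz are hypothetical stand-ins for the driver's counter accessors and its perf-to-kHz conversion):

```python
import time

def cppc_get_rate_khz(read_ctrs, ref_perf, perf_to_khz, delay_us=2):
    """Sample the feedback counters twice around a short delay and derive
    the frequency the hardware actually delivered in that window."""
    ref0, del0 = read_ctrs()
    time.sleep(delay_us / 1e6)  # the driver uses a ~2 us delay here
    ref1, del1 = read_ctrs()
    perf = ref_perf * (del1 - del0) // (ref1 - ref0)
    return perf_to_khz(perf)

# Simulated counter reads for a core delivering twice the reference rate:
samples = iter([(1000, 2000), (2000, 4000)])
khz = cppc_get_rate_khz(lambda: next(samples), 100, lambda p: p * 10000)
print(khz)  # 2000000
```

This is why cpuinfo_cur_freq reflects what the hardware actually did, while scaling_cur_freq only reflects what the governor asked for.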
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-24 10:48 ` Ionela Voinescu 2021-06-24 11:15 ` Vincent Guittot 2021-06-24 15:17 ` Qian Cai @ 2021-06-24 20:44 ` Qian Cai 2 siblings, 0 replies; 41+ messages in thread
From: Qian Cai @ 2021-06-24 20:44 UTC (permalink / raw)
To: Ionela Voinescu, Vincent Guittot
Cc: Viresh Kumar, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki

On 6/24/2021 6:48 AM, Ionela Voinescu wrote:
> Not if the counters are implemented properly. The kernel considers that
> both reference and delivered performance counters should stop or reset
> during idle. The kernel would not account for idle itself.
>
> If the reference performance counter does not stop during idle, while
> the core performance counter (delivered) does stop, the behavior above
> should be seen very often.
>
> Qian, do you see these small delivered performance values often or
> seldom?

FYI, the latest data point is that on the new kernel, the delivered performance does match cpuinfo_cur_freq. IOW, feedback_ctrs works fine. Also, the "powersave" governor could bring down scaling_cur_freq to scaling_min_freq. Right now, it looks like the puzzles on this particular system, as mentioned in the previous post, are:

1) lowest_freq/lowest_perf != scaling_min_freq
2) CPPC + schedutil is not able to scale down CPUs.

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-21 9:19 [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance Viresh Kumar 2021-06-21 9:19 ` [PATCH V3 4/4] cpufreq: CPPC: " Viresh Kumar 2021-06-21 20:48 ` [PATCH V3 0/4] cpufreq: cppc: " Qian Cai @ 2021-06-28 11:54 ` Ionela Voinescu 2021-06-28 12:14 ` Vincent Guittot 2021-06-29 5:20 ` Viresh Kumar 2 siblings, 2 replies; 41+ messages in thread From: Ionela Voinescu @ 2021-06-28 11:54 UTC (permalink / raw) To: Viresh Kumar Cc: Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Vincent Guittot, Will Deacon, linux-pm, Qian Cai, linux-acpi, linux-kernel, Paul E. McKenney, Rafael J. Wysocki Hi guys, On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote: > Hello, > > Changes since V2: > > - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it > work using policy ->init() and exit() alone. > > - Two new cleanup patches 1/4 and 2/4. > > - Improved commit log of 3/4. > > - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's > overlap (seen with Vincent's setup). > If you happen to have the data around, I would like to know more about your observations on ThunderX2. I tried ThunderX2 as well, with the following observations: Booting with userspace governor and all CPUs online, the CPPC frequency scale factor was all over the place (even much larger than 1024). My initial assumptions: - Counters do not behave properly in light of SMT - Firmware does not do a good job to keep the reference and core counters monotonic: save and restore at core off. So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of a single core (part of policy0). 
With this all works very well: root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed root@target:/sys/devices/system/cpu/cpufreq/policy0# [ 1863.095370] CPU96: cppc scale: 697. [ 1863.175370] CPU0: cppc scale: 492. [ 1863.215367] CPU64: cppc scale: 492. [ 1863.235366] CPU96: cppc scale: 492. [ 1863.485368] CPU32: cppc scale: 492. root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed root@target:/sys/devices/system/cpu/cpufreq/policy0# [ 1891.395363] CPU96: cppc scale: 558. [ 1891.415362] CPU0: cppc scale: 595. [ 1891.435362] CPU32: cppc scale: 615. [ 1891.465363] CPU96: cppc scale: 635. [ 1891.495361] CPU0: cppc scale: 673. [ 1891.515360] CPU32: cppc scale: 703. [ 1891.545360] CPU96: cppc scale: 738. [ 1891.575360] CPU0: cppc scale: 779. [ 1891.605360] CPU96: cppc scale: 829. [ 1891.635360] CPU0: cppc scale: 879. root@target:/sys/devices/system/cpu/cpufreq/policy0# root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed root@target:/sys/devices/system/cpu/cpufreq/policy0# [ 1896.585363] CPU32: cppc scale: 1004. [ 1896.675359] CPU64: cppc scale: 973. [ 1896.715359] CPU0: cppc scale: 1024. I'm doing a rate limited printk only for increase/decrease values over 64 in the scale factor value. This showed me that SMT is handled properly. Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor stops being even close to correct, for example: [238394.770328] CPU96: cppc scale: 22328. [238395.628846] CPU96: cppc scale: 245. [238516.087115] CPU96: cppc scale: 930. [238523.385009] CPU96: cppc scale: 245. [238538.767473] CPU96: cppc scale: 936. [238538.867546] CPU96: cppc scale: 245. [238599.367932] CPU97: cppc scale: 2728. [238599.859865] CPU97: cppc scale: 452. [238647.786284] CPU96: cppc scale: 1438. [238669.604684] CPU96: cppc scale: 27306. [238676.805049] CPU96: cppc scale: 245. [238737.642902] CPU97: cppc scale: 2035. [238737.664995] CPU97: cppc scale: 452. 
[238788.066193] CPU96: cppc scale: 2749. [238788.110192] CPU96: cppc scale: 245. [238817.231659] CPU96: cppc scale: 2698. [238818.083687] CPU96: cppc scale: 245. [238845.466850] CPU97: cppc scale: 2990. [238847.477805] CPU97: cppc scale: 452. [238936.984107] CPU97: cppc scale: 1590. [238937.029079] CPU97: cppc scale: 452. [238979.052464] CPU97: cppc scale: 911. [238980.900668] CPU97: cppc scale: 452. [239149.587889] CPU96: cppc scale: 803. [239151.085516] CPU96: cppc scale: 245. [239303.871373] CPU64: cppc scale: 956. [239303.906837] CPU64: cppc scale: 245. [239308.666786] CPU96: cppc scale: 821. [239319.440634] CPU96: cppc scale: 245. [239389.978395] CPU97: cppc scale: 4229. [239391.969562] CPU97: cppc scale: 452. [239415.894738] CPU96: cppc scale: 630. [239417.875326] CPU96: cppc scale: 245. The counter values shown by feedback_ctrs do not seem monotonic even when only core 0 threads are online. ref:2812420736 del:166051103 ref:3683620736 del:641578595 ref:1049653440 del:1548202980 ref:2099053440 del:2120997459 ref:3185853440 del:2714205997 ref:712486144 del:3708490753 ref:3658438336 del:3401357212 ref:1570998080 del:2279728438 For now I was just wondering if you have seen the same and whether you have an opinion on this. > This is tested on my Hikey platform (without the actual read/write to > performance counters), with this script for over an hour: > > while true; do > for i in `seq 1 7`; > do > echo 0 > /sys/devices/system/cpu/cpu$i/online; > done; > > for i in `seq 1 7`; > do > echo 1 > /sys/devices/system/cpu/cpu$i/online; > done; > done > > > The same is done by Vincent on ThunderX2 and no issues were seen. Hotplug worked fine for me as well on both platforms I tested (Juno R2 and ThunderX2). Thanks, Ionela. ^ permalink raw reply [flat|nested] 41+ messages in thread
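[Editorial aside: the "cppc scale" numbers in the logs above are per-1024 capacity values derived from the deltas of the two feedback counters. As a rough, hedged sketch of that computation — the function and parameter names here are illustrative assumptions, not the kernel's exact code:]

```c
#include <assert.h>
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT 10	/* a scale of 1024 == running at highest perf */

/*
 * Hypothetical sketch: delivered performance is estimated from the
 * ratio of the delivered and reference counter deltas, then normalized
 * against the highest performance to yield a nominal 0..1024 scale.
 */
static uint64_t cppc_scale_from_deltas(uint64_t reference_perf,
				       uint64_t highest_perf,
				       uint64_t delivered_delta,
				       uint64_t reference_delta)
{
	uint64_t perf = reference_perf * delivered_delta / reference_delta;

	return (perf << SCHED_CAPACITY_SHIFT) / highest_perf;
}
```

A corrupted delta (e.g. from counter overlap) inflates the ratio directly, which is consistent with the out-of-range values like 22328 or 27306 seen above, and with why the WARN_ON(local_freq_scale > 1024) was dropped in V3.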
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-28 11:54 ` Ionela Voinescu @ 2021-06-28 12:14 ` Vincent Guittot 2021-06-28 12:17 ` Vincent Guittot 2021-06-28 13:08 ` Ionela Voinescu 2021-06-29 5:20 ` Viresh Kumar 1 sibling, 2 replies; 41+ messages in thread From: Vincent Guittot @ 2021-06-28 12:14 UTC (permalink / raw) To: Ionela Voinescu Cc: Viresh Kumar, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, Qian Cai, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On Mon, 28 Jun 2021 at 13:54, Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > Hi guys, > > On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote: > > Hello, > > > > Changes since V2: > > > > - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it > > work using policy ->init() and exit() alone. > > > > - Two new cleanup patches 1/4 and 2/4. > > > > - Improved commit log of 3/4. > > > > - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's > > overlap (seen with Vincent's setup). > > > > If you happen to have the data around, I would like to know more about > your observations on ThunderX2. > > > I tried ThunderX2 as well, with the following observations: > > Booting with userspace governor and all CPUs online, the CPPC frequency > scale factor was all over the place (even much larger than 1024). > > My initial assumptions: > - Counters do not behave properly in light of SMT > - Firmware does not do a good job to keep the reference and core > counters monotonic: save and restore at core off. > > So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of > a single core (part of policy0). 
With this all works very well: > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed > root@target:/sys/devices/system/cpu/cpufreq/policy0# > [ 1863.095370] CPU96: cppc scale: 697. > [ 1863.175370] CPU0: cppc scale: 492. > [ 1863.215367] CPU64: cppc scale: 492. > [ 1863.235366] CPU96: cppc scale: 492. > [ 1863.485368] CPU32: cppc scale: 492. > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed > root@target:/sys/devices/system/cpu/cpufreq/policy0# > [ 1891.395363] CPU96: cppc scale: 558. > [ 1891.415362] CPU0: cppc scale: 595. > [ 1891.435362] CPU32: cppc scale: 615. > [ 1891.465363] CPU96: cppc scale: 635. > [ 1891.495361] CPU0: cppc scale: 673. > [ 1891.515360] CPU32: cppc scale: 703. > [ 1891.545360] CPU96: cppc scale: 738. > [ 1891.575360] CPU0: cppc scale: 779. > [ 1891.605360] CPU96: cppc scale: 829. > [ 1891.635360] CPU0: cppc scale: 879. > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed > root@target:/sys/devices/system/cpu/cpufreq/policy0# > [ 1896.585363] CPU32: cppc scale: 1004. > [ 1896.675359] CPU64: cppc scale: 973. > [ 1896.715359] CPU0: cppc scale: 1024. > > I'm doing a rate limited printk only for increase/decrease values over > 64 in the scale factor value. > > This showed me that SMT is handled properly. > > Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor > stops being even close to correct, for example: > > [238394.770328] CPU96: cppc scale: 22328. > [238395.628846] CPU96: cppc scale: 245. > [238516.087115] CPU96: cppc scale: 930. > [238523.385009] CPU96: cppc scale: 245. > [238538.767473] CPU96: cppc scale: 936. > [238538.867546] CPU96: cppc scale: 245. > [238599.367932] CPU97: cppc scale: 2728. > [238599.859865] CPU97: cppc scale: 452. > [238647.786284] CPU96: cppc scale: 1438. > [238669.604684] CPU96: cppc scale: 27306. 
> [238676.805049] CPU96: cppc scale: 245.
> [238737.642902] CPU97: cppc scale: 2035.
> [238737.664995] CPU97: cppc scale: 452.
> [238788.066193] CPU96: cppc scale: 2749.
> [238788.110192] CPU96: cppc scale: 245.
> [238817.231659] CPU96: cppc scale: 2698.
> [238818.083687] CPU96: cppc scale: 245.
> [238845.466850] CPU97: cppc scale: 2990.
> [238847.477805] CPU97: cppc scale: 452.
> [238936.984107] CPU97: cppc scale: 1590.
> [238937.029079] CPU97: cppc scale: 452.
> [238979.052464] CPU97: cppc scale: 911.
> [238980.900668] CPU97: cppc scale: 452.
> [239149.587889] CPU96: cppc scale: 803.
> [239151.085516] CPU96: cppc scale: 245.
> [239303.871373] CPU64: cppc scale: 956.
> [239303.906837] CPU64: cppc scale: 245.
> [239308.666786] CPU96: cppc scale: 821.
> [239319.440634] CPU96: cppc scale: 245.
> [239389.978395] CPU97: cppc scale: 4229.
> [239391.969562] CPU97: cppc scale: 452.
> [239415.894738] CPU96: cppc scale: 630.
> [239417.875326] CPU96: cppc scale: 245.
>

With the counters being 32 bits and the frequency scaling being updated
at the tick, you can easily get an overflow on an idle system. I can
easily imagine that when you unplug CPUs there is enough activity on the
remaining CPUs to update the counters regularly, whereas with all CPUs
online the idle time is longer than the counter overflow period.

> The counter values shown by feedback_ctrs do not seem monotonic even
> when only core 0 threads are online.
>
> ref:2812420736 del:166051103
> ref:3683620736 del:641578595
> ref:1049653440 del:1548202980
> ref:2099053440 del:2120997459
> ref:3185853440 del:2714205997
> ref:712486144 del:3708490753
> ref:3658438336 del:3401357212
> ref:1570998080 del:2279728438
>
> For now I was just wondering if you have seen the same and whether you
> have an opinion on this.
> > > This is tested on my Hikey platform (without the actual read/write to > > performance counters), with this script for over an hour: > > > > while true; do > > for i in `seq 1 7`; > > do > > echo 0 > /sys/devices/system/cpu/cpu$i/online; > > done; > > > > for i in `seq 1 7`; > > do > > echo 1 > /sys/devices/system/cpu/cpu$i/online; > > done; > > done > > > > > > The same is done by Vincent on ThunderX2 and no issues were seen. > > Hotplug worked fine for me as well on both platforms I tested (Juno R2 > and ThunderX2). > > Thanks, > Ionela. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-28 12:14 ` Vincent Guittot @ 2021-06-28 12:17 ` Vincent Guittot 2021-06-28 13:08 ` Ionela Voinescu 1 sibling, 0 replies; 41+ messages in thread From: Vincent Guittot @ 2021-06-28 12:17 UTC (permalink / raw) To: Ionela Voinescu Cc: Viresh Kumar, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, Qian Cai, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On Mon, 28 Jun 2021 at 14:14, Vincent Guittot <vincent.guittot@linaro.org> wrote: > > On Mon, 28 Jun 2021 at 13:54, Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > > > Hi guys, > > > > On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote: > > > Hello, > > > > > > Changes since V2: > > > > > > - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it > > > work using policy ->init() and exit() alone. > > > > > > - Two new cleanup patches 1/4 and 2/4. > > > > > > - Improved commit log of 3/4. > > > > > > - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's > > > overlap (seen with Vincent's setup). > > > > > > > If you happen to have the data around, I would like to know more about > > your observations on ThunderX2. > > > > > > I tried ThunderX2 as well, with the following observations: > > > > Booting with userspace governor and all CPUs online, the CPPC frequency > > scale factor was all over the place (even much larger than 1024). > > > > My initial assumptions: > > - Counters do not behave properly in light of SMT > > - Firmware does not do a good job to keep the reference and core > > counters monotonic: save and restore at core off. > > > > So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of > > a single core (part of policy0). 
With this all works very well: > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > [ 1863.095370] CPU96: cppc scale: 697. > > [ 1863.175370] CPU0: cppc scale: 492. > > [ 1863.215367] CPU64: cppc scale: 492. > > [ 1863.235366] CPU96: cppc scale: 492. > > [ 1863.485368] CPU32: cppc scale: 492. > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > [ 1891.395363] CPU96: cppc scale: 558. > > [ 1891.415362] CPU0: cppc scale: 595. > > [ 1891.435362] CPU32: cppc scale: 615. > > [ 1891.465363] CPU96: cppc scale: 635. > > [ 1891.495361] CPU0: cppc scale: 673. > > [ 1891.515360] CPU32: cppc scale: 703. > > [ 1891.545360] CPU96: cppc scale: 738. > > [ 1891.575360] CPU0: cppc scale: 779. > > [ 1891.605360] CPU96: cppc scale: 829. > > [ 1891.635360] CPU0: cppc scale: 879. > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > [ 1896.585363] CPU32: cppc scale: 1004. > > [ 1896.675359] CPU64: cppc scale: 973. > > [ 1896.715359] CPU0: cppc scale: 1024. > > > > I'm doing a rate limited printk only for increase/decrease values over > > 64 in the scale factor value. > > > > This showed me that SMT is handled properly. > > > > Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor > > stops being even close to correct, for example: > > > > [238394.770328] CPU96: cppc scale: 22328. > > [238395.628846] CPU96: cppc scale: 245. > > [238516.087115] CPU96: cppc scale: 930. > > [238523.385009] CPU96: cppc scale: 245. > > [238538.767473] CPU96: cppc scale: 936. > > [238538.867546] CPU96: cppc scale: 245. > > [238599.367932] CPU97: cppc scale: 2728. > > [238599.859865] CPU97: cppc scale: 452. 
> > [238647.786284] CPU96: cppc scale: 1438. > > [238669.604684] CPU96: cppc scale: 27306. > > [238676.805049] CPU96: cppc scale: 245. > > [238737.642902] CPU97: cppc scale: 2035. > > [238737.664995] CPU97: cppc scale: 452. > > [238788.066193] CPU96: cppc scale: 2749. > > [238788.110192] CPU96: cppc scale: 245. > > [238817.231659] CPU96: cppc scale: 2698. > > [238818.083687] CPU96: cppc scale: 245. > > [238845.466850] CPU97: cppc scale: 2990. > > [238847.477805] CPU97: cppc scale: 452. > > [238936.984107] CPU97: cppc scale: 1590. > > [238937.029079] CPU97: cppc scale: 452. > > [238979.052464] CPU97: cppc scale: 911. > > [238980.900668] CPU97: cppc scale: 452. > > [239149.587889] CPU96: cppc scale: 803. > > [239151.085516] CPU96: cppc scale: 245. > > [239303.871373] CPU64: cppc scale: 956. > > [239303.906837] CPU64: cppc scale: 245. > > [239308.666786] CPU96: cppc scale: 821. > > [239319.440634] CPU96: cppc scale: 245. > > [239389.978395] CPU97: cppc scale: 4229. > > [239391.969562] CPU97: cppc scale: 452. > > [239415.894738] CPU96: cppc scale: 630. > > [239417.875326] CPU96: cppc scale: 245. > > > > With the counter being 32bits and the freq scaling being update at > tick, you can easily get a overflow on it in idle system. I can easily > imagine that when you unplug CPUs there is enough activity on the CPU > to update it regularly whereas with all CPUs, the idle time is longer > that the counter overflow > > > The counter values shown by feedback_ctrs do not seem monotonic even > > when only core 0 threads are online. 
> >
> > ref:2812420736 del:166051103
> > ref:3683620736 del:641578595
> > ref:1049653440 del:1548202980
> > ref:2099053440 del:2120997459
> > ref:3185853440 del:2714205997
> > ref:712486144 del:3708490753
> > ref:3658438336 del:3401357212
> > ref:1570998080 del:2279728438

They are 32 bits, and the overflow needs to be handled by the
cppc_cpufreq driver.

> >
> > For now I was just wondering if you have seen the same and whether you
> > have an opinion on this.
> >
> > > This is tested on my Hikey platform (without the actual read/write to
> > > performance counters), with this script for over an hour:
> > >
> > > while true; do
> > > for i in `seq 1 7`;
> > > do
> > > echo 0 > /sys/devices/system/cpu/cpu$i/online;
> > > done;
> > >
> > > for i in `seq 1 7`;
> > > do
> > > echo 1 > /sys/devices/system/cpu/cpu$i/online;
> > > done;
> > > done
> > >
> > >
> > > The same is done by Vincent on ThunderX2 and no issues were seen.
> >
> > Hotplug worked fine for me as well on both platforms I tested (Juno R2
> > and ThunderX2).
> >
> > Thanks,
> > Ionela.

^ permalink raw reply [flat|nested] 41+ messages in thread
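[Editorial aside: the overflow handling Vincent refers to reduces to plain modular arithmetic. A minimal sketch — the helper name is made up, not the driver's actual code — of why a single wraparound between two reads of a 32-bit counter is harmless when the delta is computed with unsigned 32-bit subtraction:]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative helper: unsigned subtraction is defined modulo 2^32,
 * so as long as at most one wraparound happens between two reads,
 * cur - prev is still the true number of counter ticks elapsed.
 */
static uint32_t fb_ctr_delta(uint32_t prev, uint32_t cur)
{
	return cur - prev;
}
```

If more than one wraparound can occur between two samples (e.g. during a very long idle period, as suspected above), the delta becomes ambiguous and no driver-side arithmetic can recover it.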
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-28 12:14 ` Vincent Guittot 2021-06-28 12:17 ` Vincent Guittot @ 2021-06-28 13:08 ` Ionela Voinescu 2021-06-28 21:37 ` Ionela Voinescu 1 sibling, 1 reply; 41+ messages in thread From: Ionela Voinescu @ 2021-06-28 13:08 UTC (permalink / raw) To: Vincent Guittot Cc: Viresh Kumar, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, Qian Cai, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On Monday 28 Jun 2021 at 14:14:14 (+0200), Vincent Guittot wrote: > On Mon, 28 Jun 2021 at 13:54, Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > > > Hi guys, > > > > On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote: > > > Hello, > > > > > > Changes since V2: > > > > > > - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it > > > work using policy ->init() and exit() alone. > > > > > > - Two new cleanup patches 1/4 and 2/4. > > > > > > - Improved commit log of 3/4. > > > > > > - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's > > > overlap (seen with Vincent's setup). > > > > > > > If you happen to have the data around, I would like to know more about > > your observations on ThunderX2. > > > > > > I tried ThunderX2 as well, with the following observations: > > > > Booting with userspace governor and all CPUs online, the CPPC frequency > > scale factor was all over the place (even much larger than 1024). > > > > My initial assumptions: > > - Counters do not behave properly in light of SMT > > - Firmware does not do a good job to keep the reference and core > > counters monotonic: save and restore at core off. > > > > So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of > > a single core (part of policy0). 
With this all works very well: > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > [ 1863.095370] CPU96: cppc scale: 697. > > [ 1863.175370] CPU0: cppc scale: 492. > > [ 1863.215367] CPU64: cppc scale: 492. > > [ 1863.235366] CPU96: cppc scale: 492. > > [ 1863.485368] CPU32: cppc scale: 492. > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > [ 1891.395363] CPU96: cppc scale: 558. > > [ 1891.415362] CPU0: cppc scale: 595. > > [ 1891.435362] CPU32: cppc scale: 615. > > [ 1891.465363] CPU96: cppc scale: 635. > > [ 1891.495361] CPU0: cppc scale: 673. > > [ 1891.515360] CPU32: cppc scale: 703. > > [ 1891.545360] CPU96: cppc scale: 738. > > [ 1891.575360] CPU0: cppc scale: 779. > > [ 1891.605360] CPU96: cppc scale: 829. > > [ 1891.635360] CPU0: cppc scale: 879. > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > [ 1896.585363] CPU32: cppc scale: 1004. > > [ 1896.675359] CPU64: cppc scale: 973. > > [ 1896.715359] CPU0: cppc scale: 1024. > > > > I'm doing a rate limited printk only for increase/decrease values over > > 64 in the scale factor value. > > > > This showed me that SMT is handled properly. > > > > Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor > > stops being even close to correct, for example: > > > > [238394.770328] CPU96: cppc scale: 22328. > > [238395.628846] CPU96: cppc scale: 245. > > [238516.087115] CPU96: cppc scale: 930. > > [238523.385009] CPU96: cppc scale: 245. > > [238538.767473] CPU96: cppc scale: 936. > > [238538.867546] CPU96: cppc scale: 245. > > [238599.367932] CPU97: cppc scale: 2728. > > [238599.859865] CPU97: cppc scale: 452. 
> > [238647.786284] CPU96: cppc scale: 1438. > > [238669.604684] CPU96: cppc scale: 27306. > > [238676.805049] CPU96: cppc scale: 245. > > [238737.642902] CPU97: cppc scale: 2035. > > [238737.664995] CPU97: cppc scale: 452. > > [238788.066193] CPU96: cppc scale: 2749. > > [238788.110192] CPU96: cppc scale: 245. > > [238817.231659] CPU96: cppc scale: 2698. > > [238818.083687] CPU96: cppc scale: 245. > > [238845.466850] CPU97: cppc scale: 2990. > > [238847.477805] CPU97: cppc scale: 452. > > [238936.984107] CPU97: cppc scale: 1590. > > [238937.029079] CPU97: cppc scale: 452. > > [238979.052464] CPU97: cppc scale: 911. > > [238980.900668] CPU97: cppc scale: 452. > > [239149.587889] CPU96: cppc scale: 803. > > [239151.085516] CPU96: cppc scale: 245. > > [239303.871373] CPU64: cppc scale: 956. > > [239303.906837] CPU64: cppc scale: 245. > > [239308.666786] CPU96: cppc scale: 821. > > [239319.440634] CPU96: cppc scale: 245. > > [239389.978395] CPU97: cppc scale: 4229. > > [239391.969562] CPU97: cppc scale: 452. > > [239415.894738] CPU96: cppc scale: 630. > > [239417.875326] CPU96: cppc scale: 245. > > > > With the counter being 32bits and the freq scaling being update at > tick, you can easily get a overflow on it in idle system. I can easily > imagine that when you unplug CPUs there is enough activity on the CPU > to update it regularly whereas with all CPUs, the idle time is longer > that the counter overflow > Thanks! Yes, given the high wraparound time I thought they were 64 bit. All variables in software are 64 bit, but looking at bit width in the _CPC entries, the platform counters are 32 bit counters. > There are 32bits and the overflow need to be handled by cppc_cpufreq > driver I'm wondering if this would be best handled in the function that reads the counters or in the cppc_cpufreq driver that uses them. Probably the latter, as you say, as the read function should only return the raw values, but it does complicate things. Thanks, Ionela. 
> > The counter values shown by feedback_ctrs do not seem monotonic even > > when only core 0 threads are online. > > > > ref:2812420736 del:166051103 > > ref:3683620736 del:641578595 > > ref:1049653440 del:1548202980 > > ref:2099053440 del:2120997459 > > ref:3185853440 del:2714205997 > > ref:712486144 del:3708490753 > > ref:3658438336 del:3401357212 > > ref:1570998080 del:2279728438 > > > > For now I was just wondering if you have seen the same and whether you > > have an opinion on this. > > > > > This is tested on my Hikey platform (without the actual read/write to > > > performance counters), with this script for over an hour: > > > > > > while true; do > > > for i in `seq 1 7`; > > > do > > > echo 0 > /sys/devices/system/cpu/cpu$i/online; > > > done; > > > > > > for i in `seq 1 7`; > > > do > > > echo 1 > /sys/devices/system/cpu/cpu$i/online; > > > done; > > > done > > > > > > > > > The same is done by Vincent on ThunderX2 and no issues were seen. > > > > Hotplug worked fine for me as well on both platforms I tested (Juno R2 > > and ThunderX2). > > > > Thanks, > > Ionela. ^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance 2021-06-28 13:08 ` Ionela Voinescu @ 2021-06-28 21:37 ` Ionela Voinescu 2021-06-29 8:45 ` Vincent Guittot 0 siblings, 1 reply; 41+ messages in thread From: Ionela Voinescu @ 2021-06-28 21:37 UTC (permalink / raw) To: Vincent Guittot Cc: Viresh Kumar, Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon, open list:THERMAL, Qian Cai, ACPI Devel Maling List, linux-kernel, Paul E. McKenney, Rafael J. Wysocki On Monday 28 Jun 2021 at 14:08:13 (+0100), Ionela Voinescu wrote: > On Monday 28 Jun 2021 at 14:14:14 (+0200), Vincent Guittot wrote: > > On Mon, 28 Jun 2021 at 13:54, Ionela Voinescu <ionela.voinescu@arm.com> wrote: > > > > > > Hi guys, > > > > > > On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote: > > > > Hello, > > > > > > > > Changes since V2: > > > > > > > > - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it > > > > work using policy ->init() and exit() alone. > > > > > > > > - Two new cleanup patches 1/4 and 2/4. > > > > > > > > - Improved commit log of 3/4. > > > > > > > > - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's > > > > overlap (seen with Vincent's setup). > > > > > > > > > > If you happen to have the data around, I would like to know more about > > > your observations on ThunderX2. > > > > > > > > > I tried ThunderX2 as well, with the following observations: > > > > > > Booting with userspace governor and all CPUs online, the CPPC frequency > > > scale factor was all over the place (even much larger than 1024). > > > > > > My initial assumptions: > > > - Counters do not behave properly in light of SMT > > > - Firmware does not do a good job to keep the reference and core > > > counters monotonic: save and restore at core off. 
> > > > > > So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of > > > a single core (part of policy0). With this all works very well: > > > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > > [ 1863.095370] CPU96: cppc scale: 697. > > > [ 1863.175370] CPU0: cppc scale: 492. > > > [ 1863.215367] CPU64: cppc scale: 492. > > > [ 1863.235366] CPU96: cppc scale: 492. > > > [ 1863.485368] CPU32: cppc scale: 492. > > > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > > [ 1891.395363] CPU96: cppc scale: 558. > > > [ 1891.415362] CPU0: cppc scale: 595. > > > [ 1891.435362] CPU32: cppc scale: 615. > > > [ 1891.465363] CPU96: cppc scale: 635. > > > [ 1891.495361] CPU0: cppc scale: 673. > > > [ 1891.515360] CPU32: cppc scale: 703. > > > [ 1891.545360] CPU96: cppc scale: 738. > > > [ 1891.575360] CPU0: cppc scale: 779. > > > [ 1891.605360] CPU96: cppc scale: 829. > > > [ 1891.635360] CPU0: cppc scale: 879. > > > > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# > > > [ 1896.585363] CPU32: cppc scale: 1004. > > > [ 1896.675359] CPU64: cppc scale: 973. > > > [ 1896.715359] CPU0: cppc scale: 1024. > > > > > > I'm doing a rate limited printk only for increase/decrease values over > > > 64 in the scale factor value. > > > > > > This showed me that SMT is handled properly. > > > > > > Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor > > > stops being even close to correct, for example: > > > > > > [238394.770328] CPU96: cppc scale: 22328. > > > [238395.628846] CPU96: cppc scale: 245. > > > [238516.087115] CPU96: cppc scale: 930. > > > [238523.385009] CPU96: cppc scale: 245. 
> > > [238538.767473] CPU96: cppc scale: 936. > > > [238538.867546] CPU96: cppc scale: 245. > > > [238599.367932] CPU97: cppc scale: 2728. > > > [238599.859865] CPU97: cppc scale: 452. > > > [238647.786284] CPU96: cppc scale: 1438. > > > [238669.604684] CPU96: cppc scale: 27306. > > > [238676.805049] CPU96: cppc scale: 245. > > > [238737.642902] CPU97: cppc scale: 2035. > > > [238737.664995] CPU97: cppc scale: 452. > > > [238788.066193] CPU96: cppc scale: 2749. > > > [238788.110192] CPU96: cppc scale: 245. > > > [238817.231659] CPU96: cppc scale: 2698. > > > [238818.083687] CPU96: cppc scale: 245. > > > [238845.466850] CPU97: cppc scale: 2990. > > > [238847.477805] CPU97: cppc scale: 452. > > > [238936.984107] CPU97: cppc scale: 1590. > > > [238937.029079] CPU97: cppc scale: 452. > > > [238979.052464] CPU97: cppc scale: 911. > > > [238980.900668] CPU97: cppc scale: 452. > > > [239149.587889] CPU96: cppc scale: 803. > > > [239151.085516] CPU96: cppc scale: 245. > > > [239303.871373] CPU64: cppc scale: 956. > > > [239303.906837] CPU64: cppc scale: 245. > > > [239308.666786] CPU96: cppc scale: 821. > > > [239319.440634] CPU96: cppc scale: 245. > > > [239389.978395] CPU97: cppc scale: 4229. > > > [239391.969562] CPU97: cppc scale: 452. > > > [239415.894738] CPU96: cppc scale: 630. > > > [239417.875326] CPU96: cppc scale: 245. > > > > > > > With the counter being 32bits and the freq scaling being update at > > tick, you can easily get a overflow on it in idle system. I can easily > > imagine that when you unplug CPUs there is enough activity on the CPU > > to update it regularly whereas with all CPUs, the idle time is longer > > that the counter overflow For sane counters, how long the CPU is idle should not matter (please see below my definition of sane counters). > > > > Thanks! Yes, given the high wraparound time I thought they were 64 bit. 
> All variables in software are 64 bit, but looking at bit width in the
> _CPC entries, the platform counters are 32 bit counters.
>

I've looked a bit more at the code, and considering this particular
system (32-bit counters, maximum CPU frequency = 2.2 GHz), I believe the
wraparound is accounted for and should not cause these strange values in
the scale factor.

I consider the counters sane if both stop during idle - either they stop
when the CPU is clock gated, or some firmware does save/restore at core
off. Therefore, in all idle cases they seem to have stopped, from the
point of view of the OS. The ACPI spec mentions that both count "any
time the processor is active".

After the cores return from idle, the counters will wrap around after a
minimum of 1.95s. So with a tick every 4ms, at most one wraparound can
have happened, which allows the getDelta() function in the cppc_cpufreq
driver to compute the proper difference in values.

Let me know what you think.

Thanks,
Ionela.

> > There are 32bits and the overflow need to be handled by cppc_cpufreq
> > driver
>
> I'm wondering if this would be best handled in the function that reads
> the counters or in the cppc_cpufreq driver that uses them. Probably the
> latter, as you say, as the read function should only return the raw
> values, but it does complicate things.
>
> > > The counter values shown by feedback_ctrs do not seem monotonic even
> > > when only core 0 threads are online.
> > > > > > > This is tested on my Hikey platform (without the actual read/write to > > > > performance counters), with this script for over an hour: > > > > > > > > while true; do > > > > for i in `seq 1 7`; > > > > do > > > > echo 0 > /sys/devices/system/cpu/cpu$i/online; > > > > done; > > > > > > > > for i in `seq 1 7`; > > > > do > > > > echo 1 > /sys/devices/system/cpu/cpu$i/online; > > > > done; > > > > done > > > > > > > > > > > > The same is done by Vincent on ThunderX2 and no issues were seen. > > > > > > Hotplug worked fine for me as well on both platforms I tested (Juno R2 > > > and ThunderX2). > > > > > > Thanks, > > > Ionela. ^ permalink raw reply [flat|nested] 41+ messages in thread
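[Editorial aside: the ~1.95 s wraparound figure above is easy to sanity-check. A small sketch of the arithmetic, taking the 4 ms tick period from the message above:]

```c
#include <assert.h>

/*
 * Time for a 32-bit counter to wrap when ticking at counter_hz.
 * At 2.2 GHz: 2^32 / 2.2e9 ~= 1.95 s, far longer than one 4 ms tick,
 * so at most one wraparound can occur between two tick-driven reads.
 */
static double wrap_seconds(double counter_hz)
{
	return 4294967296.0 / counter_hz;	/* 2^32 ticks */
}
```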
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance
  2021-06-28 21:37             ` Ionela Voinescu
@ 2021-06-29  8:45               ` Vincent Guittot
  0 siblings, 0 replies; 41+ messages in thread
From: Vincent Guittot @ 2021-06-29  8:45 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: Viresh Kumar, Rafael Wysocki, Ben Segall,
	Daniel Bristot de Oliveira, Dietmar Eggemann, Greg Kroah-Hartman,
	Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra,
	Rafael J. Wysocki, Steven Rostedt, Sudeep Holla, Will Deacon,
	open list:THERMAL, Qian Cai, ACPI Devel Maling List, linux-kernel,
	Paul E. McKenney, Rafael J. Wysocki

On Mon, 28 Jun 2021 at 23:37, Ionela Voinescu <ionela.voinescu@arm.com> wrote:
>
> On Monday 28 Jun 2021 at 14:08:13 (+0100), Ionela Voinescu wrote:
> > On Monday 28 Jun 2021 at 14:14:14 (+0200), Vincent Guittot wrote:
> > > On Mon, 28 Jun 2021 at 13:54, Ionela Voinescu <ionela.voinescu@arm.com> wrote:
> > > >
> > > > Hi guys,
> > > >
> > > > On Monday 21 Jun 2021 at 14:49:33 (+0530), Viresh Kumar wrote:
> > > > > Hello,
> > > > >
> > > > > Changes since V2:
> > > > >
> > > > > - We don't need start_cpu() and stop_cpu() callbacks anymore, we can make it
> > > > >   work using policy ->init() and exit() alone.
> > > > >
> > > > > - Two new cleanup patches 1/4 and 2/4.
> > > > >
> > > > > - Improved commit log of 3/4.
> > > > >
> > > > > - Dropped WARN_ON(local_freq_scale > 1024), since this can occur on counter's
> > > > >   overlap (seen with Vincent's setup).
> > > > >
> > > >
> > > > If you happen to have the data around, I would like to know more about
> > > > your observations on ThunderX2.
> > > >
> > > > I tried ThunderX2 as well, with the following observations:
> > > >
> > > > Booting with userspace governor and all CPUs online, the CPPC frequency
> > > > scale factor was all over the place (even much larger than 1024).
> > > > My initial assumptions:
> > > > - Counters do not behave properly in light of SMT
> > > > - Firmware does not do a good job to keep the reference and core
> > > >   counters monotonic: save and restore at core off.
> > > >
> > > > So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of
> > > > a single core (part of policy0). With this all works very well:
> > > >
> > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1056000 > scaling_setspeed
> > > > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > > > [ 1863.095370] CPU96: cppc scale: 697.
> > > > [ 1863.175370] CPU0: cppc scale: 492.
> > > > [ 1863.215367] CPU64: cppc scale: 492.
> > > > [ 1863.235366] CPU96: cppc scale: 492.
> > > > [ 1863.485368] CPU32: cppc scale: 492.
> > > >
> > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 1936000 > scaling_setspeed
> > > > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > > > [ 1891.395363] CPU96: cppc scale: 558.
> > > > [ 1891.415362] CPU0: cppc scale: 595.
> > > > [ 1891.435362] CPU32: cppc scale: 615.
> > > > [ 1891.465363] CPU96: cppc scale: 635.
> > > > [ 1891.495361] CPU0: cppc scale: 673.
> > > > [ 1891.515360] CPU32: cppc scale: 703.
> > > > [ 1891.545360] CPU96: cppc scale: 738.
> > > > [ 1891.575360] CPU0: cppc scale: 779.
> > > > [ 1891.605360] CPU96: cppc scale: 829.
> > > > [ 1891.635360] CPU0: cppc scale: 879.
> > > >
> > > > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > > > root@target:/sys/devices/system/cpu/cpufreq/policy0# echo 2200000 > scaling_setspeed
> > > > root@target:/sys/devices/system/cpu/cpufreq/policy0#
> > > > [ 1896.585363] CPU32: cppc scale: 1004.
> > > > [ 1896.675359] CPU64: cppc scale: 973.
> > > > [ 1896.715359] CPU0: cppc scale: 1024.
> > > >
> > > > I'm doing a rate limited printk only for increase/decrease values over
> > > > 64 in the scale factor value.
> > > >
> > > > This showed me that SMT is handled properly.
> > > >
> > > > Then, as soon as I start onlining CPUs 1, 33, 65, 97, the scale factor
> > > > stops being even close to correct, for example:
> > > >
> > > > [238394.770328] CPU96: cppc scale: 22328.
> > > > [238395.628846] CPU96: cppc scale: 245.
> > > > [238516.087115] CPU96: cppc scale: 930.
> > > > [238523.385009] CPU96: cppc scale: 245.
> > > > [238538.767473] CPU96: cppc scale: 936.
> > > > [238538.867546] CPU96: cppc scale: 245.
> > > > [238599.367932] CPU97: cppc scale: 2728.
> > > > [238599.859865] CPU97: cppc scale: 452.
> > > > [238647.786284] CPU96: cppc scale: 1438.
> > > > [238669.604684] CPU96: cppc scale: 27306.
> > > > [238676.805049] CPU96: cppc scale: 245.
> > > > [238737.642902] CPU97: cppc scale: 2035.
> > > > [238737.664995] CPU97: cppc scale: 452.
> > > > [238788.066193] CPU96: cppc scale: 2749.
> > > > [238788.110192] CPU96: cppc scale: 245.
> > > > [238817.231659] CPU96: cppc scale: 2698.
> > > > [238818.083687] CPU96: cppc scale: 245.
> > > > [238845.466850] CPU97: cppc scale: 2990.
> > > > [238847.477805] CPU97: cppc scale: 452.
> > > > [238936.984107] CPU97: cppc scale: 1590.
> > > > [238937.029079] CPU97: cppc scale: 452.
> > > > [238979.052464] CPU97: cppc scale: 911.
> > > > [238980.900668] CPU97: cppc scale: 452.
> > > > [239149.587889] CPU96: cppc scale: 803.
> > > > [239151.085516] CPU96: cppc scale: 245.
> > > > [239303.871373] CPU64: cppc scale: 956.
> > > > [239303.906837] CPU64: cppc scale: 245.
> > > > [239308.666786] CPU96: cppc scale: 821.
> > > > [239319.440634] CPU96: cppc scale: 245.
> > > > [239389.978395] CPU97: cppc scale: 4229.
> > > > [239391.969562] CPU97: cppc scale: 452.
> > > > [239415.894738] CPU96: cppc scale: 630.
> > > > [239417.875326] CPU96: cppc scale: 245.
> > > >
> > >
> > > With the counters being 32 bits and the frequency scaling being updated
> > > at the tick, you can easily get an overflow in an idle system. I can
> > > easily imagine that when you unplug CPUs there is enough activity on
> > > the remaining CPUs to update the counters regularly, whereas with all
> > > CPUs online the idle time is longer than the counter overflow period.
>
> For sane counters, how long the CPU is idle should not matter (please
> see below my definition of sane counters).
>
> > Thanks! Yes, given the high wraparound time I thought they were 64 bit.
> > All variables in software are 64 bit, but looking at bit width in the
> > _CPC entries, the platform counters are 32 bit counters.
>
> I've looked a bit more over the code, and considering this particular
> system (32 bit counters, maximum frequency of CPUs = 2.2GHz), I believe
> the wraparound is considered, and this should not cause these strange
> values in the scale factor.
>
> I consider the counters sane if both stop during idle - either they stop
> when the CPU is clock gated, or some firmware does save/restore at core
> off. Therefore, in all idle cases they seem to have stopped, from the
> point of view of the OS. The ACPI spec mentions that both count "any
> time the processor is active".
>
> After the cores return from idle, the counters will wrap around after a
> minimum of 1.95s. So with a tick every 4ms at most one wraparound would
> have happened, which allows the getDelta() function in the cppc_cpufreq
> driver to get the proper difference in values.
>
> Let me know what you think.

This is not what I can see on my THX2. Counters are always increasing at
the same pace even if the system is idle (nothing running except
background activity). As long as you read the counters often enough, the
result is coherent with the frequency that has been set by the governor.

> Thanks,
> Ionela.
>
> > > There are 32bits and the overflow need to be handled by cppc_cpufreq
> > > driver
> >
> > I'm wondering if this would be best handled in the function that reads
> > the counters or in the cppc_cpufreq driver that uses them. Probably the
> > latter, as you say, as the read function should only return the raw
> > values, but it does complicate things.
> >
> > [...]
> >
> > > > Hotplug worked fine for me as well on both platforms I tested (Juno R2
> > > > and ThunderX2).
> > > >
> > > > Thanks,
> > > > Ionela.
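The numbers exchanged above can be sanity-checked with a small sketch (illustrative names, not the driver's exact code): at a maximum frequency of 2.2 GHz a 32-bit counter wraps after about 1.95 s, and the per-tick scale factor is the delivered/reference cycle ratio expressed in units of 1024.

```c
#include <stdint.h>

#define SCHED_CAPACITY_SCALE 1024ULL

/*
 * Illustrative sketch (names are assumptions, not the driver's API):
 * the frequency scale factor is the ratio of delivered to reference
 * cycles over the sampling window, in units of SCHED_CAPACITY_SCALE.
 */
static uint64_t scale_factor(uint32_t del_delta, uint32_t ref_delta)
{
	if (!ref_delta)
		return SCHED_CAPACITY_SCALE;	/* avoid division by zero */
	return (del_delta * SCHED_CAPACITY_SCALE) / ref_delta;
}

/* Minimum wraparound period of a 32-bit counter ticking at 'hz'. */
static double wrap_seconds(double hz)
{
	return 4294967296.0 / hz;	/* 2^32 / hz */
}
```

With a reference counter running at the 2.2 GHz maximum, `wrap_seconds(2.2e9)` gives roughly 1.95, matching the figure quoted in the discussion; a 4 ms tick therefore sees at most one wrap per window on sane counters.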
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance
  2021-06-28 11:54         ` Ionela Voinescu
  2021-06-28 12:14           ` Vincent Guittot
@ 2021-06-29  5:20           ` Viresh Kumar
  2021-06-29  8:46             ` Ionela Voinescu
  1 sibling, 1 reply; 41+ messages in thread
From: Viresh Kumar @ 2021-06-29  5:20 UTC (permalink / raw)
  To: Ionela Voinescu
  Cc: Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli,
	Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
	Sudeep Holla, Vincent Guittot, Will Deacon, linux-pm, Qian Cai,
	linux-acpi, linux-kernel, Paul E. McKenney, Rafael J. Wysocki

On 28-06-21, 12:54, Ionela Voinescu wrote:
> If you happen to have the data around, I would like to know more about
> your observations on ThunderX2.
>
> I tried ThunderX2 as well, with the following observations:
>
> Booting with userspace governor and all CPUs online, the CPPC frequency
> scale factor was all over the place (even much larger than 1024).
>
> My initial assumptions:
> - Counters do not behave properly in light of SMT
> - Firmware does not do a good job to keep the reference and core
>   counters monotonic: save and restore at core off.
>
> So I offlined all CPUs with the exception of 0, 32, 64, 96 - threads of
> a single core (part of policy0). With this all works very well:

Interesting.

> [...]
>
> The counter values shown by feedback_ctrs do not seem monotonic even
> when only core 0 threads are online.
>
> ref:2812420736 del:166051103
> ref:3683620736 del:641578595
> ref:1049653440 del:1548202980
> ref:2099053440 del:2120997459
> ref:3185853440 del:2714205997
> ref:712486144  del:3708490753
> ref:3658438336 del:3401357212
> ref:1570998080 del:2279728438
>
> For now I was just wondering if you have seen the same and whether you
> have an opinion on this.

I think we also saw numbers like this, which didn't explain a lot on
ThunderX2. We thought they may be due to rounding issues, but the
offlining stuff adds an interesting factor to that.

--
viresh
* Re: [PATCH V3 0/4] cpufreq: cppc: Add support for frequency invariance
  2021-06-29  5:20           ` Viresh Kumar
@ 2021-06-29  8:46             ` Ionela Voinescu
  0 siblings, 0 replies; 41+ messages in thread
From: Ionela Voinescu @ 2021-06-29  8:46 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Ben Segall, Daniel Bristot de Oliveira,
	Dietmar Eggemann, Greg Kroah-Hartman, Ingo Molnar, Juri Lelli,
	Mel Gorman, Peter Zijlstra, Rafael J. Wysocki, Steven Rostedt,
	Sudeep Holla, Vincent Guittot, Will Deacon, linux-pm, Qian Cai,
	linux-acpi, linux-kernel, Paul E. McKenney, Rafael J. Wysocki

Hi,

On Tuesday 29 Jun 2021 at 10:50:28 (+0530), Viresh Kumar wrote:
> On 28-06-21, 12:54, Ionela Voinescu wrote:
> > If you happen to have the data around, I would like to know more about
> > your observations on ThunderX2.
> >
> > [...]
> >
> > For now I was just wondering if you have seen the same and whether you
> > have an opinion on this.
>
> I think we also saw numbers like this, which didn't explain a lot on
> ThunderX2. We thought they may be due to rounding issues, but the
> offlining stuff adds an interesting factor to that.

More testing last night showed that having 1 core (with all 4 threads)
online per socket works fine, while having 2 cores online per socket
gives these bad values. My assumption is that the counters reset to 0
at core off, which introduces this behavior. When there is a single core
per socket, that core cannot be turned off.

I tried to boot with fewer CPUs and cpuidle.off=1, but it did not make a
difference. I expect much of the idle control to be in
hardware/firmware, so possibly cores were still turned off.

I'll do more research on idle state management for SMT and debugging to
see if it explains more, but it will take longer as I've ignored a few
other responsibilities in the meantime.

Thanks,
Ionela.

> --
> viresh
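The reset-at-core-off hypothesis above would produce exactly the signature seen in the logs. A sketch of the failure mode (the helper name is hypothetical): a counter that resets to 0 makes the next modulo-2^32 delta look like an almost-full wrap, so one sampling window gets an enormous delta and the scale factor shoots far past 1024.

```c
#include <stdint.h>

/*
 * Illustrative sketch of the suspected failure mode (hypothetical
 * name, not the driver's code): modulo-2^32 deltas are only correct
 * for a free-running counter. If firmware resets the counter to 0 at
 * core off, the next delta reads as an almost-full wrap, inflating
 * the computed scale factor well beyond 1024.
 */
static uint32_t raw_delta(uint32_t prev, uint32_t cur)
{
	return cur - prev;	/* assumes the counter never resets */
}
```

For instance, if the last read before core off was 3000000000 and the first read after a reset is 5000, the delta comes out as about 1.29 billion cycles instead of a small number, which is consistent with spikes like the 22328 scale value in the logs above.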