* [PATCH v3] perf/arm_pmu: Skip PMCCNTR_EL0 on NVIDIA Olympus
@ 2026-04-29 21:56 Besar Wicaksono
2026-05-01 0:27 ` Matt Ochs
2026-05-01 14:01 ` James Clark
0 siblings, 2 replies; 4+ messages in thread
From: Besar Wicaksono @ 2026-04-29 21:56 UTC (permalink / raw)
To: will, mark.rutland, james.clark, yangyccccc
Cc: linux-arm-kernel, linux-kernel, linux-tegra, treding, jonathanh,
vsethi, rwiley, sdonthineni, mochs, nirmoyd, skelley,
Besar Wicaksono
PMCCNTR_EL0 may continue to increment on NVIDIA Olympus CPUs while the
PE is in WFI/WFE. That does not necessarily match the CPU_CYCLES event
counted by a programmable counter, so using PMCCNTR_EL0 for cycles can
give results that differ from the programmable counter path.
Extend the existing PMCCNTR avoidance decision from the SMT case to
also cover Olympus. Store the result in the common arm_pmu state at
registration time, so arm_pmuv3 can keep using a single flag when
deciding whether CPU_CYCLES may use PMCCNTR_EL0.
Use the cached MIDR from cpu_data to identify Olympus parts and avoid
reading MIDR_EL1 in the event path.
Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
Changes from v1:
* add CONFIG_ARM64 check to fix build error found by kernel test robot
* add explicit include of <asm/cputype.h>
v1: https://lore.kernel.org/linux-arm-kernel/20260406232034.2566133-1-bwicaksono@nvidia.com/
Changes from v2:
* Move the Olympus PMCCNTR avoidance check from arm_pmuv3.c to the
common arm_pmu registration path.
* Replace the PMUv3-only has_smt flag with avoid_pmccntr, covering both
the existing SMT restriction and the Olympus MIDR restriction.
* Use the cached per-CPU MIDR from cpu_data instead of calling
is_midr_in_range_list() from armv8pmu_can_use_pmccntr().
* Add the required asm/cpu.h include for cpu_data.
* Drop the use_pmccntr override patch from this revision.
v2: https://lore.kernel.org/linux-arm-kernel/20260421203856.3539186-1-bwicaksono@nvidia.com/#t
---
drivers/perf/arm_pmu.c | 78 +++++++++++++++++++++++++++++++++---
drivers/perf/arm_pmuv3.c | 8 +---
include/linux/perf/arm_pmu.h | 2 +-
3 files changed, 75 insertions(+), 13 deletions(-)
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 939bcbd433aa..7df185ee7b74 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -24,6 +24,8 @@
#include <linux/irq.h>
#include <linux/irqdesc.h>
+#include <asm/cpu.h>
+#include <asm/cputype.h>
#include <asm/irq_regs.h>
static int armpmu_count_irq_users(const struct cpumask *affinity,
@@ -920,6 +922,76 @@ void armpmu_free(struct arm_pmu *pmu)
kfree(pmu);
}
+#ifdef CONFIG_ARM64
+/*
+ * List of CPUs that should avoid using PMCCNTR_EL0.
+ */
+static struct midr_range armpmu_avoid_pmccntr_cpus[] = {
+ /*
+ * PMCCNTR_EL0 on Olympus CPUs may still increment while the PE is in WFI/WFE state.
+ * This is implementation-specific behavior and not an erratum.
+ *
+ * From ARM DDI0487 D14.4:
+ * It is IMPLEMENTATION SPECIFIC whether CPU_CYCLES and PMCCNTR count
+ * when the PE is in WFI or WFE state, even if the clocks are not stopped.
+ *
+ * From ARM DDI0487 D24.5.2:
+ * All counters are subject to any changes in clock frequency, including
+ * clock stopping caused by the WFI and WFE instructions.
+ * This means that it is CONSTRAINED UNPREDICTABLE whether or not
+ * PMCCNTR_EL0 continues to increment when clocks are stopped by WFI and
+ * WFE instructions.
+ */
+ MIDR_ALL_VERSIONS(MIDR_NVIDIA_OLYMPUS),
+ {}
+};
+
+static bool armpmu_is_in_avoid_pmccntr_cpus(int cpu)
+{
+ struct midr_range const *r = armpmu_avoid_pmccntr_cpus;
+ u32 midr = (u32)per_cpu(cpu_data, cpu).reg_midr;
+
+ while (r->model) {
+ if (midr_is_cpu_model_range(midr, r->model, r->rv_min, r->rv_max))
+ return true;
+ r++;
+ }
+
+ return false;
+}
+#else
+static bool armpmu_is_in_avoid_pmccntr_cpus(int cpu)
+{
+ return false;
+}
+#endif
+
+static bool armpmu_avoid_pmccntr(struct arm_pmu *pmu)
+{
+ int cpu = cpumask_first(&pmu->supported_cpus);
+
+ /*
+ * By this stage the supported CPUs are known on both DT and ACPI
+ * platforms, so the SMT implementation can be detected.
+ * On SMT CPUs, PMCCNTR_EL0 increments from the processor clock rather
+ * than the PE clock (ARM DDI0487 L.b D13.1.3), which means it will
+ * keep counting on a PE in WFI while one of its SMT siblings is not
+ * idle. So don't use PMCCNTR_EL0 on SMT cores.
+ */
+ if (topology_core_has_smt(cpu))
+ return true;
+
+ /*
+ * On some CPUs, PMCCNTR_EL0 does not match the behavior of the
+ * CPU_CYCLES event on a programmable counter, so avoid routing cycles
+ * through PMCCNTR_EL0 to keep the two paths consistent.
+ */
+ if (armpmu_is_in_avoid_pmccntr_cpus(cpu))
+ return true;
+
+ return false;
+}
+
int armpmu_register(struct arm_pmu *pmu)
{
int ret;
@@ -928,11 +1000,7 @@ int armpmu_register(struct arm_pmu *pmu)
if (ret)
return ret;
- /*
- * By this stage we know our supported CPUs on either DT/ACPI platforms,
- * detect the SMT implementation.
- */
- pmu->has_smt = topology_core_has_smt(cpumask_first(&pmu->supported_cpus));
+ pmu->avoid_pmccntr = armpmu_avoid_pmccntr(pmu);
if (!pmu->set_event_filter)
pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 8014ff766cff..60f159a51992 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -1002,13 +1002,7 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
if (has_branch_stack(event))
return false;
- /*
- * The PMCCNTR_EL0 increments from the processor clock rather than
- * the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
- * counting on a WFI PE if one of its SMT sibling is not idle on a
- * multi-threaded implementation. So don't use it on SMT cores.
- */
- if (cpu_pmu->has_smt)
+ if (cpu_pmu->avoid_pmccntr)
return false;
return true;
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 52b37f7bdbf9..02d2c7f45b52 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -119,7 +119,7 @@ struct arm_pmu {
/* PMUv3 only */
int pmuver;
- bool has_smt;
+ bool avoid_pmccntr;
u64 reg_pmmir;
u64 reg_brbidr;
#define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
--
2.43.0
* Re: [PATCH v3] perf/arm_pmu: Skip PMCCNTR_EL0 on NVIDIA Olympus
2026-04-29 21:56 [PATCH v3] perf/arm_pmu: Skip PMCCNTR_EL0 on NVIDIA Olympus Besar Wicaksono
@ 2026-05-01 0:27 ` Matt Ochs
2026-05-01 14:01 ` James Clark
1 sibling, 0 replies; 4+ messages in thread
From: Matt Ochs @ 2026-05-01 0:27 UTC (permalink / raw)
To: Besar Wicaksono
Cc: will@kernel.org, mark.rutland@arm.com, james.clark@linaro.org,
yangyccccc@gmail.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-tegra@vger.kernel.org,
Thierry Reding, Jon Hunter, Vikram Sethi, Rich Wiley,
Shanker Donthineni, Nirmoy Das, Sean Kelley
> On Apr 29, 2026, at 16:56, Besar Wicaksono <bwicaksono@nvidia.com> wrote:
>
> PMCCNTR_EL0 may continue to increment on NVIDIA Olympus CPUs while the
> PE is in WFI/WFE. That does not necessarily match the CPU_CYCLES event
> counted by a programmable counter, so using PMCCNTR_EL0 for cycles can
> give results that differ from the programmable counter path.
>
> Extend the existing PMCCNTR avoidance decision from the SMT case to
> also cover Olympus. Store the result in the common arm_pmu state at
> registration time, so arm_pmuv3 can keep using a single flag when
> deciding whether CPU_CYCLES may use PMCCNTR_EL0.
>
> Use the cached MIDR from cpu_data to identify Olympus parts and avoid
> reading MIDR_EL1 in the event path.
>
> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
Verified on NVIDIA Vera (Olympus CPUs) with UEFI SMT disabled. Confirmed
that grouped cpu_cycles events show ~1x ratio (both on programmable
counters) with the patch vs ~15x inflation without it.
Tested-by: Matthew R. Ochs <mochs@nvidia.com>
* Re: [PATCH v3] perf/arm_pmu: Skip PMCCNTR_EL0 on NVIDIA Olympus
2026-04-29 21:56 [PATCH v3] perf/arm_pmu: Skip PMCCNTR_EL0 on NVIDIA Olympus Besar Wicaksono
2026-05-01 0:27 ` Matt Ochs
@ 2026-05-01 14:01 ` James Clark
2026-05-01 21:25 ` Besar Wicaksono
1 sibling, 1 reply; 4+ messages in thread
From: James Clark @ 2026-05-01 14:01 UTC (permalink / raw)
To: Besar Wicaksono
Cc: linux-arm-kernel, linux-kernel, linux-tegra, treding, jonathanh,
vsethi, rwiley, sdonthineni, mochs, nirmoyd, skelley, will,
mark.rutland, yangyccccc
On 29/04/2026 10:56 pm, Besar Wicaksono wrote:
> PMCCNTR_EL0 may continue to increment on NVIDIA Olympus CPUs while the
> PE is in WFI/WFE. That does not necessarily match the CPU_CYCLES event
> counted by a programmable counter, so using PMCCNTR_EL0 for cycles can
> give results that differ from the programmable counter path.
>
> Extend the existing PMCCNTR avoidance decision from the SMT case to
> also cover Olympus. Store the result in the common arm_pmu state at
> registration time, so arm_pmuv3 can keep using a single flag when
> deciding whether CPU_CYCLES may use PMCCNTR_EL0.
>
> Use the cached MIDR from cpu_data to identify Olympus parts and avoid
> reading MIDR_EL1 in the event path.
>
> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
> ---
>
> Changes from v1:
> * add CONFIG_ARM64 check to fix build error found by kernel test robot
> * add explicit include of <asm/cputype.h>
> v1: https://lore.kernel.org/linux-arm-kernel/20260406232034.2566133-1-bwicaksono@nvidia.com/
>
> Changes from v2:
> * Move the Olympus PMCCNTR avoidance check from arm_pmuv3.c to the
> common arm_pmu registration path.
> * Replace the PMUv3-only has_smt flag with avoid_pmccntr, covering both
> the existing SMT restriction and the Olympus MIDR restriction.
> * Use the cached per-CPU MIDR from cpu_data instead of calling
> is_midr_in_range_list() from armv8pmu_can_use_pmccntr().
> * Add the required asm/cpu.h include for cpu_data.
> * Drop the use_pmccntr override patch from this revision.
> v2: https://lore.kernel.org/linux-arm-kernel/20260421203856.3539186-1-bwicaksono@nvidia.com/#t
>
> ---
> drivers/perf/arm_pmu.c | 78 +++++++++++++++++++++++++++++++++---
> drivers/perf/arm_pmuv3.c | 8 +---
> include/linux/perf/arm_pmu.h | 2 +-
> 3 files changed, 75 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 939bcbd433aa..7df185ee7b74 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -24,6 +24,8 @@
> #include <linux/irq.h>
> #include <linux/irqdesc.h>
>
> +#include <asm/cpu.h>
> +#include <asm/cputype.h>
> #include <asm/irq_regs.h>
>
> static int armpmu_count_irq_users(const struct cpumask *affinity,
> @@ -920,6 +922,76 @@ void armpmu_free(struct arm_pmu *pmu)
> kfree(pmu);
> }
>
> +#ifdef CONFIG_ARM64
> +/*
> + * List of CPUs that should avoid using PMCCNTR_EL0.
> + */
> +static struct midr_range armpmu_avoid_pmccntr_cpus[] = {
> + /*
> + * PMCCNTR_EL0 on Olympus CPUs may still increment while the PE is in WFI/WFE state.
> + * This is implementation-specific behavior and not an erratum.
> + *
> + * From ARM DDI0487 D14.4:
> + * It is IMPLEMENTATION SPECIFIC whether CPU_CYCLES and PMCCNTR count
> + * when the PE is in WFI or WFE state, even if the clocks are not stopped.
> + *
> + * From ARM DDI0487 D24.5.2:
> + * All counters are subject to any changes in clock frequency, including
> + * clock stopping caused by the WFI and WFE instructions.
> + * This means that it is CONSTRAINED UNPREDICTABLE whether or not
> + * PMCCNTR_EL0 continues to increment when clocks are stopped by WFI and
> + * WFE instructions.
> + */
> + MIDR_ALL_VERSIONS(MIDR_NVIDIA_OLYMPUS),
> + {}
> +};
> +
> +static bool armpmu_is_in_avoid_pmccntr_cpus(int cpu)
> +{
> + struct midr_range const *r = armpmu_avoid_pmccntr_cpus;
> + u32 midr = (u32)per_cpu(cpu_data, cpu).reg_midr;
Hi Besar,
This is still fragile in the way I mentioned on v2: some of the CPUs
may not be online, and then cpu_data isn't initialized for those CPUs.
Sashiko suggests using cpumask_any_and(&pmu->supported_cpus,
cpu_online_mask); the Arm PMUs currently require at least one CPU to
be online, so that's probably fine, although it could become fragile
if we added deferred probing in the future.
The other alternative is to put this in __armv8pmu_probe_pmu(), although
then you end up with both arm_pmuv3 and arm_pmu initializing
cpu_pmu->has_smt, but I'm sure there is a way to make it fit somehow.
James
* RE: [PATCH v3] perf/arm_pmu: Skip PMCCNTR_EL0 on NVIDIA Olympus
2026-05-01 14:01 ` James Clark
@ 2026-05-01 21:25 ` Besar Wicaksono
0 siblings, 0 replies; 4+ messages in thread
From: Besar Wicaksono @ 2026-05-01 21:25 UTC (permalink / raw)
To: James Clark
Cc: linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-tegra@vger.kernel.org,
Thierry Reding, Jon Hunter, Vikram Sethi, Rich Wiley,
Shanker Donthineni, Matt Ochs, Nirmoy Das, Sean Kelley,
will@kernel.org, mark.rutland@arm.com, yangyccccc@gmail.com
> -----Original Message-----
> From: James Clark <james.clark@linaro.org>
> Sent: Friday, May 1, 2026 9:02 AM
> To: Besar Wicaksono <bwicaksono@nvidia.com>
> Cc: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> tegra@vger.kernel.org; Thierry Reding <treding@nvidia.com>; Jon Hunter
> <jonathanh@nvidia.com>; Vikram Sethi <vsethi@nvidia.com>; Rich Wiley
> <rwiley@nvidia.com>; Shanker Donthineni <sdonthineni@nvidia.com>; Matt
> Ochs <mochs@nvidia.com>; Nirmoy Das <nirmoyd@nvidia.com>; Sean Kelley
> <skelley@nvidia.com>; will@kernel.org; mark.rutland@arm.com;
> yangyccccc@gmail.com
> Subject: Re: [PATCH v3] perf/arm_pmu: Skip PMCCNTR_EL0 on NVIDIA
> Olympus
>
> On 29/04/2026 10:56 pm, Besar Wicaksono wrote:
> > +
> > +static bool armpmu_is_in_avoid_pmccntr_cpus(int cpu)
> > +{
> > + struct midr_range const *r = armpmu_avoid_pmccntr_cpus;
> > + u32 midr = (u32)per_cpu(cpu_data, cpu).reg_midr;
>
> Hi Besar,
>
> This is still fragile in the way I mentioned on v2: some of the CPUs
> may not be online, and then cpu_data isn't initialized for those CPUs.
>
> Sashiko suggests using cpumask_any_and(&pmu->supported_cpus,
> cpu_online_mask); the Arm PMUs currently require at least one CPU to
> be online, so that's probably fine, although it could become fragile
> if we added deferred probing in the future.
>
> The other alternative is to put this in __armv8pmu_probe_pmu(), although
> then you end up with both arm_pmuv3 and arm_pmu initializing
> cpu_pmu->has_smt, but I'm sure there is a way to make it fit somehow.
>
Thanks for the pointers, James and Sashiko. I will try this alternative approach
and add the check in __armv8pmu_probe_pmu(). I would still rename
has_smt to avoid_pmccntr and keep the SMT check in arm_pmu.c.
Regards,
Besar