* [PATCH v3 1/1] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
@ 2025-09-22 3:30 Yicong Yang
2025-11-03 14:57 ` Will Deacon
2025-11-06 14:33 ` Mark Brown
0 siblings, 2 replies; 5+ messages in thread
From: Yicong Yang @ 2025-09-22 3:30 UTC (permalink / raw)
To: will, mark.rutland, linux-arm-kernel
Cc: sudeep.holla, james.clark, robh, anshuman.khandual,
jonathan.cameron, hejunhao3, linuxarm, prime.zeng, xuwei5,
wangyushan12, yangyicong, yangyccccc
From: Yicong Yang <yangyicong@hisilicon.com>
CPU_CYCLES is expected to count the logical CPU (PE) clock. Currently it's
preferred to use PMCCNTR_EL0 for counting CPU_CYCLES, but it'll count
processor clock rather than the PE clock (ARM DDI0487 L.b D13.1.3) if
one of the SMT siblings is not idle on a multi-threaded implementation.
So don't use it on SMT cores.
Introduce topology_core_has_smt() for knowing the SMT implementation and
cached it in arm_pmu::has_smt during allocation.
When counting cycles on SMT CPU 2-3 and CPU 3 is idle, without this
patch we'll get:
[root@client1 tmp]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
--taskset 2 --timeout 1
[...]
Performance counter stats for 'CPU(s) 2-3':
CPU2 2880457316 cycles
CPU3 2880459810 cycles
1.254688470 seconds time elapsed
With this patch the idle state of CPU3 is observed as expected:
[root@client1 ~]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
--taskset 2 --timeout 1
[...]
Performance counter stats for 'CPU(s) 2-3':
CPU2 2558580492 cycles
CPU3 305749 cycles
1.113626410 seconds time elapsed
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
Change since v3:
- typo and code style fixes per Will
- move SMT detection from armpmu_alloc() to armpmu_register() since
armpmu_alloc() initialize the SMT information based on all possible CPUs,
may include CPUs not supported by this PMU
Link: https://lore.kernel.org/linux-arm-kernel/20250820084534.28037-3-yangyicong@huawei.com/
drivers/perf/arm_pmu.c | 6 ++++++
drivers/perf/arm_pmuv3.c | 10 ++++++++++
include/linux/arch_topology.h | 11 +++++++++++
include/linux/perf/arm_pmu.h | 1 +
4 files changed, 28 insertions(+)
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 5c310e803dd7..ae437791b5f8 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -925,6 +925,12 @@ int armpmu_register(struct arm_pmu *pmu)
if (ret)
return ret;
+ /*
+ * By this stage we know our supported CPUs on either DT/ACPI platforms,
+ * detect the SMT implementation.
+ */
+ pmu->has_smt = topology_core_has_smt(cpumask_first(&pmu->supported_cpus));
+
if (!pmu->set_event_filter)
pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 69c5cc8f5606..d1d6000517b2 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -981,6 +981,7 @@ static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
struct perf_event *event)
{
+ struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
struct hw_perf_event *hwc = &event->hw;
unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
@@ -1001,6 +1002,15 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
if (has_branch_stack(event))
return false;
+ /*
+ * The PMCCNTR_EL0 increments from the processor clock rather than
+ * the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
+ * counting on a WFI PE if one of its SMT sibling is not idle on a
+ * multi-threaded implementation. So don't use it on SMT cores.
+ */
+ if (cpu_pmu->has_smt)
+ return false;
+
return true;
}
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index d72d6e5aa200..daa1af2e8204 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -89,6 +89,17 @@ void remove_cpu_topology(unsigned int cpuid);
void reset_cpu_topology(void);
int parse_acpi_topology(void);
void freq_inv_set_max_ratio(int cpu, u64 max_rate);
+
+/*
+ * Architectures like ARM64 don't have reliable architectural way to get SMT
+ * information and depend on the firmware (ACPI/OF) report. Non-SMT core won't
+ * initialize thread_id so we can use this to detect the SMT implementation.
+ */
+static inline bool topology_core_has_smt(int cpu)
+{
+ return cpu_topology[cpu].thread_id != -1;
+}
+
#endif
#endif /* _LINUX_ARCH_TOPOLOGY_H_ */
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 93c9a26492fc..2d39322c40c4 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -119,6 +119,7 @@ struct arm_pmu {
/* PMUv3 only */
int pmuver;
+ bool has_smt;
u64 reg_pmmir;
u64 reg_brbidr;
#define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40
--
2.24.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v3 1/1] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
2025-09-22 3:30 [PATCH v3 1/1] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores Yicong Yang
@ 2025-11-03 14:57 ` Will Deacon
2025-11-06 14:33 ` Mark Brown
1 sibling, 0 replies; 5+ messages in thread
From: Will Deacon @ 2025-11-03 14:57 UTC (permalink / raw)
To: mark.rutland, linux-arm-kernel, Yicong Yang
Cc: catalin.marinas, kernel-team, Will Deacon, sudeep.holla,
james.clark, robh, anshuman.khandual, jonathan.cameron, hejunhao3,
linuxarm, prime.zeng, xuwei5, wangyushan12, yangyicong,
yangyccccc
On Mon, 22 Sep 2025 11:30:10 +0800, Yicong Yang wrote:
> CPU_CYCLES is expected to count the logical CPU (PE) clock. Currently it's
> preferred to use PMCCNTR_EL0 for counting CPU_CYCLES, but it'll count
> processor clock rather than the PE clock (ARM DDI0487 L.b D13.1.3) if
> one of the SMT siblings is not idle on a multi-threaded implementation.
> So don't use it on SMT cores.
>
> Introduce topology_core_has_smt() for knowing the SMT implementation and
> cached it in arm_pmu::has_smt during allocation.
>
> [...]
Applied to will (for-next/perf), thanks!
[1/1] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
https://git.kernel.org/will/c/c3d78c34ad00
Cheers,
--
Will
https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3 1/1] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
2025-09-22 3:30 [PATCH v3 1/1] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores Yicong Yang
2025-11-03 14:57 ` Will Deacon
@ 2025-11-06 14:33 ` Mark Brown
2025-11-06 15:16 ` Yicong Yang
1 sibling, 1 reply; 5+ messages in thread
From: Mark Brown @ 2025-11-06 14:33 UTC (permalink / raw)
To: Yicong Yang
Cc: will, mark.rutland, linux-arm-kernel, sudeep.holla, james.clark,
robh, anshuman.khandual, jonathan.cameron, hejunhao3, linuxarm,
prime.zeng, xuwei5, wangyushan12, yangyicong, yangyccccc
[-- Attachment #1: Type: text/plain, Size: 1575 bytes --]
On Mon, Sep 22, 2025 at 11:30:10AM +0800, Yicong Yang wrote:
> index d72d6e5aa200..daa1af2e8204 100644
> --- a/include/linux/arch_topology.h
> +++ b/include/linux/arch_topology.h
> @@ -89,6 +89,17 @@ void remove_cpu_topology(unsigned int cpuid);
> void reset_cpu_topology(void);
> int parse_acpi_topology(void);
> void freq_inv_set_max_ratio(int cpu, u64 max_rate);
> +
> +/*
> + * Architectures like ARM64 don't have reliable architectural way to get SMT
> + * information and depend on the firmware (ACPI/OF) report. Non-SMT core won't
> + * initialize thread_id so we can use this to detect the SMT implementation.
> + */
> +static inline bool topology_core_has_smt(int cpu)
> +{
> + return cpu_topology[cpu].thread_id != -1;
> +}
I'm seeing build failures in -next on arm v5 and v6:
/build/stage/linux/drivers/perf/arm_pmu.c: In function ‘armpmu_register’:
/build/stage/linux/drivers/perf/arm_pmu.c:935:24: error: implicit declaration of
function ‘topology_core_has_smt’; did you mean ‘topology_core_cpumask’? [-Wimpl
icit-function-declaration]
935 | pmu->has_smt = topology_core_has_smt(cpumask_first(&pmu->support
ed_cpus));
| ^~~~~~~~~~~~~~~~~~~~~
| topology_core_cpumask
make[5]: *** [/build/stage/linux/scripts/Makefile.build:287: drivers/perf/arm_pm
u.o] Error 1
The above function is inside a CONFIG_GENERIC_ARCH_TOPOLOGY guard, we
need a stub definition for architectures which don't have topology
support or for the users to have guards.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3 1/1] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
2025-11-06 14:33 ` Mark Brown
@ 2025-11-06 15:16 ` Yicong Yang
2025-11-06 15:25 ` Mark Brown
0 siblings, 1 reply; 5+ messages in thread
From: Yicong Yang @ 2025-11-06 15:16 UTC (permalink / raw)
To: Mark Brown
Cc: yangyccccc, will, mark.rutland, linux-arm-kernel, sudeep.holla,
james.clark, robh, anshuman.khandual, jonathan.cameron, hejunhao3,
linuxarm, prime.zeng, xuwei5, wangyushan12
On 2025/11/6 22:33, Mark Brown wrote:
> On Mon, Sep 22, 2025 at 11:30:10AM +0800, Yicong Yang wrote:
>
>> index d72d6e5aa200..daa1af2e8204 100644
>> --- a/include/linux/arch_topology.h
>> +++ b/include/linux/arch_topology.h
>> @@ -89,6 +89,17 @@ void remove_cpu_topology(unsigned int cpuid);
>> void reset_cpu_topology(void);
>> int parse_acpi_topology(void);
>> void freq_inv_set_max_ratio(int cpu, u64 max_rate);
>> +
>> +/*
>> + * Architectures like ARM64 don't have reliable architectural way to get SMT
>> + * information and depend on the firmware (ACPI/OF) report. Non-SMT core won't
>> + * initialize thread_id so we can use this to detect the SMT implementation.
>> + */
>> +static inline bool topology_core_has_smt(int cpu)
>> +{
>> + return cpu_topology[cpu].thread_id != -1;
>> +}
> I'm seeing build failures in -next on arm v5 and v6:
>
> /build/stage/linux/drivers/perf/arm_pmu.c: In function ‘armpmu_register’:
> /build/stage/linux/drivers/perf/arm_pmu.c:935:24: error: implicit declaration of
> function ‘topology_core_has_smt’; did you mean ‘topology_core_cpumask’? [-Wimpl
> icit-function-declaration]
> 935 | pmu->has_smt = topology_core_has_smt(cpumask_first(&pmu->support
> ed_cpus));
> | ^~~~~~~~~~~~~~~~~~~~~
> | topology_core_cpumask
> make[5]: *** [/build/stage/linux/scripts/Makefile.build:287: drivers/perf/arm_pm
> u.o] Error 1
>
> The above function is inside a CONFIG_GENERIC_ARCH_TOPOLOGY guard, we
> need a stub definition for architectures which don't have topology
> support or for the users to have guards.
already sent a fix out [1]. sorry for didn't notice this :(
[1] https://lore.kernel.org/all/20251105103849.4093-1-yangyccccc@gmail.com/
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v3 1/1] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
2025-11-06 15:16 ` Yicong Yang
@ 2025-11-06 15:25 ` Mark Brown
0 siblings, 0 replies; 5+ messages in thread
From: Mark Brown @ 2025-11-06 15:25 UTC (permalink / raw)
To: Yicong Yang
Cc: will, mark.rutland, linux-arm-kernel, sudeep.holla, james.clark,
robh, anshuman.khandual, jonathan.cameron, hejunhao3, linuxarm,
prime.zeng, xuwei5, wangyushan12
[-- Attachment #1: Type: text/plain, Size: 455 bytes --]
On Thu, Nov 06, 2025 at 11:16:52PM +0800, Yicong Yang wrote:
> On 2025/11/6 22:33, Mark Brown wrote:
> > The above function is inside a CONFIG_GENERIC_ARCH_TOPOLOGY guard, we
> > need a stub definition for architectures which don't have topology
> > support or for the users to have guards.
> already sent a fix out [1]. sorry for didn't notice this :(
> [1] https://lore.kernel.org/all/20251105103849.4093-1-yangyccccc@gmail.com/
Ah, great - thanks!
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-11-06 15:26 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-22 3:30 [PATCH v3 1/1] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores Yicong Yang
2025-11-03 14:57 ` Will Deacon
2025-11-06 14:33 ` Mark Brown
2025-11-06 15:16 ` Yicong Yang
2025-11-06 15:25 ` Mark Brown
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox