linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
@ 2025-08-12  8:08 Yicong Yang
  2025-08-12  8:08 ` [PATCH 1/2] perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions Yicong Yang
  2025-08-12  8:08 ` [PATCH 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores Yicong Yang
  0 siblings, 2 replies; 13+ messages in thread
From: Yicong Yang @ 2025-08-12  8:08 UTC (permalink / raw)
  To: will, mark.rutland, linux-arm-kernel
  Cc: james.clark, robh, anshuman.khandual, jonathan.cameron, hejunhao3,
	linuxarm, prime.zeng, xuwei5, wangyushan12, yangyicong

From: Yicong Yang <yangyicong@hisilicon.com>

This patchset fixed CPU_CYCLES counting on SMT system. CPU_CYCLES can use
PMCCNTR_EL0 which will count the hardware processor clock rather than the
PE clock (ARM DDI0487 L.b D13.1.3) on SMT cores which fails the users
expectation as CPU_CYCLES (0x0011) is defined to count on each PE cycles.
Fix this by avoid using PMCCNTR_EL0 on SMT cores when counting CPU_CYCLES.

Yicong Yang (2):
  perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions
  perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores

 drivers/perf/arm_pmuv3.c | 39 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 37 insertions(+), 2 deletions(-)

-- 
2.24.0



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions
  2025-08-12  8:08 [PATCH 0/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores Yicong Yang
@ 2025-08-12  8:08 ` Yicong Yang
  2025-08-12 10:02   ` James Clark
  2025-08-12 10:25   ` Mark Rutland
  2025-08-12  8:08 ` [PATCH 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores Yicong Yang
  1 sibling, 2 replies; 13+ messages in thread
From: Yicong Yang @ 2025-08-12  8:08 UTC (permalink / raw)
  To: will, mark.rutland, linux-arm-kernel
  Cc: james.clark, robh, anshuman.khandual, jonathan.cameron, hejunhao3,
	linuxarm, prime.zeng, xuwei5, wangyushan12, yangyicong

From: Yicong Yang <yangyicong@hisilicon.com>

PMCCNTR_EL0 is preferred for counting CPU_CYCLES under certain
conditions. Factor out the condition check to a separate function
for further extension. Add documents for better understanding.
No functional changes intended.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
 drivers/perf/arm_pmuv3.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index f6d7bab5d555..95c899d07df5 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -978,6 +978,33 @@ static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
 	return -EAGAIN;
 }
 
+static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
+				     struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
+
+	/* PMCCNTR_EL0 can only be used for CPU_CYCLES event */
+	if (evtype != ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
+		return false;
+
+	/*
+	 * A CPU_CYCLES event with threshold counting cannot use PMCCNTR_EL0
+	 * since it lacks threshold support.
+	 */
+	if (armv8pmu_event_get_threshold(&event->attr))
+		return false;
+
+	/*
+	 * PMCCNTR_EL0 is not affected by BRBE controls like BRBCR_ELx.FZP.
+	 * So don't use it for branch events.
+	 */
+	if (has_branch_stack(event))
+		return false;
+
+	return true;
+}
+
 static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 				  struct perf_event *event)
 {
@@ -986,8 +1013,7 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 	unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
 
 	/* Always prefer to place a cycle counter into the cycle counter. */
-	if ((evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) &&
-	    !armv8pmu_event_get_threshold(&event->attr) && !has_branch_stack(event)) {
+	if (armv8pmu_can_use_pmccntr(cpuc, event)) {
 		if (!test_and_set_bit(ARMV8_PMU_CYCLE_IDX, cpuc->used_mask))
 			return ARMV8_PMU_CYCLE_IDX;
 		else if (armv8pmu_event_is_64bit(event) &&
-- 
2.24.0



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
  2025-08-12  8:08 [PATCH 0/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores Yicong Yang
  2025-08-12  8:08 ` [PATCH 1/2] perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions Yicong Yang
@ 2025-08-12  8:08 ` Yicong Yang
  2025-08-12 10:00   ` James Clark
  2025-08-12 10:33   ` Mark Rutland
  1 sibling, 2 replies; 13+ messages in thread
From: Yicong Yang @ 2025-08-12  8:08 UTC (permalink / raw)
  To: will, mark.rutland, linux-arm-kernel
  Cc: james.clark, robh, anshuman.khandual, jonathan.cameron, hejunhao3,
	linuxarm, prime.zeng, xuwei5, wangyushan12, yangyicong

From: Yicong Yang <yangyicong@hisilicon.com>

CPU_CYCLES is expected to count the logical CPU (PE) clock. Currently it's
preferred to use PMCCNTR_EL0 for counting CPU_CYCLES, but it'll count
processor clock rather than the PE clock (ARM DDI0487 L.b D13.1.3) if
one of the SMT siblings is not idle on a multi-threaded implementation.
So don't use it on SMT cores.

When counting cycles on SMT CPU 2-3 and CPU 3 is idle, without this
patch we'll get:
[root@client1 tmp]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
--taskset 2 --timeout 1
[...]
 Performance counter stats for 'CPU(s) 2-3':

CPU2           2880457316      cycles
CPU3           2880459810      cycles
       1.254688470 seconds time elapsed

With this patch the idle state of CPU3 is observed as expected:
[root@client1 ~]#  perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
--taskset 2 --timeout 1
[...]
 Performance counter stats for 'CPU(s) 2-3':

CPU2           2558580492      cycles
CPU3               305749      cycles
       1.113626410 seconds time elapsed

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
---
 drivers/perf/arm_pmuv3.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 95c899d07df5..ed3149632b71 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -1002,6 +1002,15 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
 	if (has_branch_stack(event))
 		return false;
 
+	/*
+	 * The PMCCNTR_EL0 increments from the processor clock rather than
+	 * the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
+	 * counting on a WFI PE if one of its SMT silbing is not idle on a
+	 * multi-threaded implementation. So don't use it on SMT cores.
+	 */
+	if (cpumask_weight(topology_sibling_cpumask(smp_processor_id())) > 1)
+		return false;
+
 	return true;
 }
 
-- 
2.24.0



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
  2025-08-12  8:08 ` [PATCH 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores Yicong Yang
@ 2025-08-12 10:00   ` James Clark
  2025-08-12 10:14     ` Yicong Yang
  2025-08-12 10:33   ` Mark Rutland
  1 sibling, 1 reply; 13+ messages in thread
From: James Clark @ 2025-08-12 10:00 UTC (permalink / raw)
  To: Yicong Yang, will, mark.rutland, linux-arm-kernel
  Cc: robh, anshuman.khandual, jonathan.cameron, hejunhao3, linuxarm,
	prime.zeng, xuwei5, wangyushan12, yangyicong



On 12/08/2025 9:08 am, Yicong Yang wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
> 
> CPU_CYCLES is expected to count the logical CPU (PE) clock. Currently it's
> preferred to use PMCCNTR_EL0 for counting CPU_CYCLES, but it'll count
> processor clock rather than the PE clock (ARM DDI0487 L.b D13.1.3) if
> one of the SMT siblings is not idle on a multi-threaded implementation.
> So don't use it on SMT cores.
> 
> When counting cycles on SMT CPU 2-3 and CPU 3 is idle, without this
> patch we'll get:
> [root@client1 tmp]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
> --taskset 2 --timeout 1
> [...]
>   Performance counter stats for 'CPU(s) 2-3':
> 
> CPU2           2880457316      cycles
> CPU3           2880459810      cycles
>         1.254688470 seconds time elapsed
> 
> With this patch the idle state of CPU3 is observed as expected:
> [root@client1 ~]#  perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
> --taskset 2 --timeout 1
> [...]
>   Performance counter stats for 'CPU(s) 2-3':
> 
> CPU2           2558580492      cycles
> CPU3               305749      cycles
>         1.113626410 seconds time elapsed
> 
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
>   drivers/perf/arm_pmuv3.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index 95c899d07df5..ed3149632b71 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -1002,6 +1002,15 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
>   	if (has_branch_stack(event))
>   		return false;
>   
> +	/*
> +	 * The PMCCNTR_EL0 increments from the processor clock rather than
> +	 * the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
> +	 * counting on a WFI PE if one of its SMT silbing is not idle on a
> +	 * multi-threaded implementation. So don't use it on SMT cores.
> +	 */
> +	if (cpumask_weight(topology_sibling_cpumask(smp_processor_id())) > 1)
> +		return false;
> +

Isn't this something that's static to the PMU? If all CPUs in each PMU 
are always the same then this doesn't need to be probed every time and 
can be set once.

Also you can't call smp_processor_id() from here because this is also 
called in armpmu_event_init() -> __hw_perf_event_init() -> 
validate_group() before the event is actually scheduled on a CPU. With 
CONFIG_DEBUG_PREEMPT you'd see the error.

>   	return true;
>   }
>   



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions
  2025-08-12  8:08 ` [PATCH 1/2] perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions Yicong Yang
@ 2025-08-12 10:02   ` James Clark
  2025-08-12 10:25   ` Mark Rutland
  1 sibling, 0 replies; 13+ messages in thread
From: James Clark @ 2025-08-12 10:02 UTC (permalink / raw)
  To: Yicong Yang, will, mark.rutland, linux-arm-kernel
  Cc: robh, anshuman.khandual, jonathan.cameron, hejunhao3, linuxarm,
	prime.zeng, xuwei5, wangyushan12, yangyicong



On 12/08/2025 9:08 am, Yicong Yang wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
> 
> PMCCNTR_EL0 is preferred for counting CPU_CYCLES under certain
> conditions. Factor out the condition check to a separate function
> for further extension. Add documents for better understanding.
> No functional changes intended.
> 
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
>   drivers/perf/arm_pmuv3.c | 30 ++++++++++++++++++++++++++++--
>   1 file changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index f6d7bab5d555..95c899d07df5 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -978,6 +978,33 @@ static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
>   	return -EAGAIN;
>   }
>   
> +static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
> +				     struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
> +
> +	/* PMCCNTR_EL0 can only be used for CPU_CYCLES event */
> +	if (evtype != ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
> +		return false;
> +
> +	/*
> +	 * A CPU_CYCLES event with threshold counting cannot use PMCCNTR_EL0
> +	 * since it lacks threshold support.
> +	 */
> +	if (armv8pmu_event_get_threshold(&event->attr))
> +		return false;
> +
> +	/*
> +	 * PMCCNTR_EL0 is not affected by BRBE controls like BRBCR_ELx.FZP.
> +	 * So don't use it for branch events.
> +	 */
> +	if (has_branch_stack(event))
> +		return false;
> +
> +	return true;
> +}
> +
>   static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>   				  struct perf_event *event)
>   {
> @@ -986,8 +1013,7 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>   	unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
>   
>   	/* Always prefer to place a cycle counter into the cycle counter. */
> -	if ((evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) &&
> -	    !armv8pmu_event_get_threshold(&event->attr) && !has_branch_stack(event)) {
> +	if (armv8pmu_can_use_pmccntr(cpuc, event)) {
>   		if (!test_and_set_bit(ARMV8_PMU_CYCLE_IDX, cpuc->used_mask))
>   			return ARMV8_PMU_CYCLE_IDX;
>   		else if (armv8pmu_event_is_64bit(event) &&

Reviewed-by: James Clark <james.clark@linaro.org>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
  2025-08-12 10:00   ` James Clark
@ 2025-08-12 10:14     ` Yicong Yang
  2025-08-12 10:22       ` Mark Rutland
  2025-08-12 10:31       ` James Clark
  0 siblings, 2 replies; 13+ messages in thread
From: Yicong Yang @ 2025-08-12 10:14 UTC (permalink / raw)
  To: James Clark, will, mark.rutland, linux-arm-kernel
  Cc: yangyicong, robh, anshuman.khandual, jonathan.cameron, hejunhao3,
	linuxarm, prime.zeng, xuwei5, wangyushan12

On 2025/8/12 18:00, James Clark wrote:
> 
> 
> On 12/08/2025 9:08 am, Yicong Yang wrote:
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> CPU_CYCLES is expected to count the logical CPU (PE) clock. Currently it's
>> preferred to use PMCCNTR_EL0 for counting CPU_CYCLES, but it'll count
>> processor clock rather than the PE clock (ARM DDI0487 L.b D13.1.3) if
>> one of the SMT siblings is not idle on a multi-threaded implementation.
>> So don't use it on SMT cores.
>>
>> When counting cycles on SMT CPU 2-3 and CPU 3 is idle, without this
>> patch we'll get:
>> [root@client1 tmp]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
>> --taskset 2 --timeout 1
>> [...]
>>   Performance counter stats for 'CPU(s) 2-3':
>>
>> CPU2           2880457316      cycles
>> CPU3           2880459810      cycles
>>         1.254688470 seconds time elapsed
>>
>> With this patch the idle state of CPU3 is observed as expected:
>> [root@client1 ~]#  perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
>> --taskset 2 --timeout 1
>> [...]
>>   Performance counter stats for 'CPU(s) 2-3':
>>
>> CPU2           2558580492      cycles
>> CPU3               305749      cycles
>>         1.113626410 seconds time elapsed
>>
>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>> ---
>>   drivers/perf/arm_pmuv3.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
>> index 95c899d07df5..ed3149632b71 100644
>> --- a/drivers/perf/arm_pmuv3.c
>> +++ b/drivers/perf/arm_pmuv3.c
>> @@ -1002,6 +1002,15 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
>>       if (has_branch_stack(event))
>>           return false;
>>   +    /*
>> +     * The PMCCNTR_EL0 increments from the processor clock rather than
>> +     * the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
>> +     * counting on a WFI PE if one of its SMT silbing is not idle on a
>> +     * multi-threaded implementation. So don't use it on SMT cores.
>> +     */
>> +    if (cpumask_weight(topology_sibling_cpumask(smp_processor_id())) > 1)
>> +        return false;
>> +
> 
> Isn't this something that's static to the PMU? If all CPUs in each PMU are always the same then this doesn't need to be probed every time and can be set once.
> 
we can make use of PMCCNTR_EL0 if the SMT is runtime disabled, e.g. by /sys/devices/system/cpu/smt/control
if set this at probe time then we permanently lose the chance to use PMCCNTR_EL0.


> Also you can't call smp_processor_id() from here because this is also called in armpmu_event_init() -> __hw_perf_event_init() -> validate_group() before the event is actually scheduled on a CPU. With CONFIG_DEBUG_PREEMPT you'd see the error.

ok, will use raw_smp_processor_id() instead. it won't affect the validation checking in pmu::event_init().
in pmu::add() the cpu id is always stable so it'll also be fine.

Thanks.




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
  2025-08-12 10:14     ` Yicong Yang
@ 2025-08-12 10:22       ` Mark Rutland
  2025-08-13  8:17         ` Yicong Yang
  2025-08-12 10:31       ` James Clark
  1 sibling, 1 reply; 13+ messages in thread
From: Mark Rutland @ 2025-08-12 10:22 UTC (permalink / raw)
  To: Yicong Yang
  Cc: James Clark, will, linux-arm-kernel, yangyicong, robh,
	anshuman.khandual, jonathan.cameron, hejunhao3, linuxarm,
	prime.zeng, xuwei5, wangyushan12

On Tue, Aug 12, 2025 at 06:14:33PM +0800, Yicong Yang wrote:
> On 2025/8/12 18:00, James Clark wrote:
> > On 12/08/2025 9:08 am, Yicong Yang wrote:
> >> @@ -1002,6 +1002,15 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
> >>       if (has_branch_stack(event))
> >>           return false;
> >>   +    /*
> >> +     * The PMCCNTR_EL0 increments from the processor clock rather than
> >> +     * the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
> >> +     * counting on a WFI PE if one of its SMT silbing is not idle on a
> >> +     * multi-threaded implementation. So don't use it on SMT cores.
> >> +     */
> >> +    if (cpumask_weight(topology_sibling_cpumask(smp_processor_id())) > 1)
> >> +        return false;
> >> +
> > 
> > Isn't this something that's static to the PMU? If all CPUs in each PMU are always the same then this doesn't need to be probed every time and can be set once.
> > 
> we can make use of PMCCNTR_EL0 if the SMT is runtime disabled, e.g. by /sys/devices/system/cpu/smt/control
> if set this at probe time then we permanently lose the chance to use PMCCNTR_EL0.

Can it be runtime enabled too?

If so, then we can't use PMCCNTR_EL0 in case we later dynamically go
from disabled to enabled.

I do not think this should be handled dynamically.

> > Also you can't call smp_processor_id() from here because this is
> > also called in armpmu_event_init() -> __hw_perf_event_init() ->
> > validate_group() before the event is actually scheduled on a CPU.
> > With CONFIG_DEBUG_PREEMPT you'd see the error.
> 
> ok, will use raw_smp_processor_id() instead. it won't affect the validation checking in pmu::event_init().
> in pmu::add() the cpu id is always stable so it'll also be fine.

NAK to this.

It *will* affect validation since it affects the number of events that
can be placed into a single group (by virtue of allowing or forbidding
an additional cycles events). That would be non-deterministic, which is
horrible to debug.

Mark.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions
  2025-08-12  8:08 ` [PATCH 1/2] perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions Yicong Yang
  2025-08-12 10:02   ` James Clark
@ 2025-08-12 10:25   ` Mark Rutland
  1 sibling, 0 replies; 13+ messages in thread
From: Mark Rutland @ 2025-08-12 10:25 UTC (permalink / raw)
  To: Yicong Yang
  Cc: will, linux-arm-kernel, james.clark, robh, anshuman.khandual,
	jonathan.cameron, hejunhao3, linuxarm, prime.zeng, xuwei5,
	wangyushan12, yangyicong

On Tue, Aug 12, 2025 at 04:08:29PM +0800, Yicong Yang wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
> 
> PMCCNTR_EL0 is preferred for counting CPU_CYCLES under certain
> conditions. Factor out the condition check to a separate function
> for further extension. Add documents for better understanding.
> No functional changes intended.
> 
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>

FWIW, splitting this oudt looks fine to me (with one nit below), so:

Acked-by: Mark Rutland <mark.rutland@arm.com>

> ---
>  drivers/perf/arm_pmuv3.c | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index f6d7bab5d555..95c899d07df5 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -978,6 +978,33 @@ static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc,
>  	return -EAGAIN;
>  }
>  
> +static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
> +				     struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
> +
> +	/* PMCCNTR_EL0 can only be used for CPU_CYCLES event */
> +	if (evtype != ARMV8_PMUV3_PERFCTR_CPU_CYCLES)
> +		return false;

Nit: I don't think this comment is useful, and it could be deleted.

Mark.

> +
> +	/*
> +	 * A CPU_CYCLES event with threshold counting cannot use PMCCNTR_EL0
> +	 * since it lacks threshold support.
> +	 */
> +	if (armv8pmu_event_get_threshold(&event->attr))
> +		return false;
> +
> +	/*
> +	 * PMCCNTR_EL0 is not affected by BRBE controls like BRBCR_ELx.FZP.
> +	 * So don't use it for branch events.
> +	 */
> +	if (has_branch_stack(event))
> +		return false;
> +
> +	return true;
> +}
> +
>  static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>  				  struct perf_event *event)
>  {
> @@ -986,8 +1013,7 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
>  	unsigned long evtype = hwc->config_base & ARMV8_PMU_EVTYPE_EVENT;
>  
>  	/* Always prefer to place a cycle counter into the cycle counter. */
> -	if ((evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) &&
> -	    !armv8pmu_event_get_threshold(&event->attr) && !has_branch_stack(event)) {
> +	if (armv8pmu_can_use_pmccntr(cpuc, event)) {
>  		if (!test_and_set_bit(ARMV8_PMU_CYCLE_IDX, cpuc->used_mask))
>  			return ARMV8_PMU_CYCLE_IDX;
>  		else if (armv8pmu_event_is_64bit(event) &&
> -- 
> 2.24.0
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
  2025-08-12 10:14     ` Yicong Yang
  2025-08-12 10:22       ` Mark Rutland
@ 2025-08-12 10:31       ` James Clark
  2025-08-13  8:32         ` Yicong Yang
  1 sibling, 1 reply; 13+ messages in thread
From: James Clark @ 2025-08-12 10:31 UTC (permalink / raw)
  To: Yicong Yang, will, mark.rutland, linux-arm-kernel
  Cc: yangyicong, robh, anshuman.khandual, jonathan.cameron, hejunhao3,
	linuxarm, prime.zeng, xuwei5, wangyushan12



On 12/08/2025 11:14 am, Yicong Yang wrote:
> On 2025/8/12 18:00, James Clark wrote:
>>
>>
>> On 12/08/2025 9:08 am, Yicong Yang wrote:
>>> From: Yicong Yang <yangyicong@hisilicon.com>
>>>
>>> CPU_CYCLES is expected to count the logical CPU (PE) clock. Currently it's
>>> preferred to use PMCCNTR_EL0 for counting CPU_CYCLES, but it'll count
>>> processor clock rather than the PE clock (ARM DDI0487 L.b D13.1.3) if
>>> one of the SMT siblings is not idle on a multi-threaded implementation.
>>> So don't use it on SMT cores.
>>>
>>> When counting cycles on SMT CPU 2-3 and CPU 3 is idle, without this
>>> patch we'll get:
>>> [root@client1 tmp]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
>>> --taskset 2 --timeout 1
>>> [...]
>>>    Performance counter stats for 'CPU(s) 2-3':
>>>
>>> CPU2           2880457316      cycles
>>> CPU3           2880459810      cycles
>>>          1.254688470 seconds time elapsed
>>>
>>> With this patch the idle state of CPU3 is observed as expected:
>>> [root@client1 ~]#  perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
>>> --taskset 2 --timeout 1
>>> [...]
>>>    Performance counter stats for 'CPU(s) 2-3':
>>>
>>> CPU2           2558580492      cycles
>>> CPU3               305749      cycles
>>>          1.113626410 seconds time elapsed
>>>
>>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>>> ---
>>>    drivers/perf/arm_pmuv3.c | 9 +++++++++
>>>    1 file changed, 9 insertions(+)
>>>
>>> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
>>> index 95c899d07df5..ed3149632b71 100644
>>> --- a/drivers/perf/arm_pmuv3.c
>>> +++ b/drivers/perf/arm_pmuv3.c
>>> @@ -1002,6 +1002,15 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
>>>        if (has_branch_stack(event))
>>>            return false;
>>>    +    /*
>>> +     * The PMCCNTR_EL0 increments from the processor clock rather than
>>> +     * the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
>>> +     * counting on a WFI PE if one of its SMT silbing is not idle on a
>>> +     * multi-threaded implementation. So don't use it on SMT cores.
>>> +     */
>>> +    if (cpumask_weight(topology_sibling_cpumask(smp_processor_id())) > 1)
>>> +        return false;
>>> +
>>
>> Isn't this something that's static to the PMU? If all CPUs in each PMU are always the same then this doesn't need to be probed every time and can be set once.
>>
> we can make use of PMCCNTR_EL0 if the SMT is runtime disabled, e.g. by /sys/devices/system/cpu/smt/control
> if set this at probe time then we permanently lose the chance to use PMCCNTR_EL0.
> 

Is that a valuable usecase though? I don't actually know the answer to 
this. How common is disabling SMT on SMT cores and then also using PMU 
events in a way that you would miss having the extra cycles counter, 
despite not minding that you didn't have it when SMT was enabled?

And would it correctly handle enabling and disabling SMT after the event 
has already started? Feels like it wouldn't if you start the event with 
it disabled and it puts it onto PMCCNTR_EL0 then you enable it and the 
counts become wrong again.

> 
>> Also you can't call smp_processor_id() from here because this is also called in armpmu_event_init() -> __hw_perf_event_init() -> validate_group() before the event is actually scheduled on a CPU. With CONFIG_DEBUG_PREEMPT you'd see the error.
> 
> ok, will use raw_smp_processor_id() instead. it won't affect the validation checking in pmu::event_init().
> in pmu::add() the cpu id is always stable so it'll also be fine.
> 
> Thanks.
> 
> 

That feels a bit wrong but I suppose it will work. Maybe the real 
problem is that validation is doing too much. I know it's to re-use 
code, but then we're doing things as part of the validation that don't 
make sense. That can confuse the reader or it's just wasted effort. Also 
using raw removes the safety check which might mean it gets refactored 
into somewhere were it isn't valid to call it in the future.





^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
  2025-08-12  8:08 ` [PATCH 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores Yicong Yang
  2025-08-12 10:00   ` James Clark
@ 2025-08-12 10:33   ` Mark Rutland
  2025-08-13  8:03     ` Yicong Yang
  1 sibling, 1 reply; 13+ messages in thread
From: Mark Rutland @ 2025-08-12 10:33 UTC (permalink / raw)
  To: Yicong Yang
  Cc: will, linux-arm-kernel, james.clark, robh, anshuman.khandual,
	jonathan.cameron, hejunhao3, linuxarm, prime.zeng, xuwei5,
	wangyushan12, yangyicong

On Tue, Aug 12, 2025 at 04:08:30PM +0800, Yicong Yang wrote:
> From: Yicong Yang <yangyicong@hisilicon.com>
> 
> CPU_CYCLES is expected to count the logical CPU (PE) clock. Currently it's
> preferred to use PMCCNTR_EL0 for counting CPU_CYCLES, but it'll count
> processor clock rather than the PE clock (ARM DDI0487 L.b D13.1.3) if
> one of the SMT siblings is not idle on a multi-threaded implementation.
>
> So don't use it on SMT cores.

This is rather unfortunate.

When does this actually matter?

Per ARM DDI 0487 L.b, page D14-6918:

| If FEAT_PMUv3p9 is implemented, then CPU_CYCLES does not increment
| when the clocks are stopped by WFI and WFE instructions. Otherwise, it
| is CONSTRAINED UNPREDICTABLE whether or not CPU_CYCLES continues to
| increment when the clocks are stopped by WFI and WFE instructions.

... so prior to FEAT_PMUv3p9, no-one could rely on the difference
anyway.

> When counting cycles on SMT CPU 2-3 and CPU 3 is idle, without this
> patch we'll get:
> [root@client1 tmp]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
> --taskset 2 --timeout 1
> [...]
>  Performance counter stats for 'CPU(s) 2-3':
> 
> CPU2           2880457316      cycles
> CPU3           2880459810      cycles
>        1.254688470 seconds time elapsed
> 
> With this patch the idle state of CPU3 is observed as expected:
> [root@client1 ~]#  perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
> --taskset 2 --timeout 1
> [...]
>  Performance counter stats for 'CPU(s) 2-3':
> 
> CPU2           2558580492      cycles
> CPU3               305749      cycles
>        1.113626410 seconds time elapsed
> 
> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
> ---
>  drivers/perf/arm_pmuv3.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index 95c899d07df5..ed3149632b71 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -1002,6 +1002,15 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
>  	if (has_branch_stack(event))
>  		return false;
>  
> +	/*
> +	 * The PMCCNTR_EL0 increments from the processor clock rather than
> +	 * the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
> +	 * counting on a WFI PE if one of its SMT silbing is not idle on a
> +	 * multi-threaded implementation. So don't use it on SMT cores.
> +	 */
> +	if (cpumask_weight(topology_sibling_cpumask(smp_processor_id())) > 1)
> +		return false;

This effectively forbids use of PMCCNTR_EL0 for any events.

Is there any existing event that it is useful for?

Mark.

> +
>  	return true;
>  }
>  
> -- 
> 2.24.0
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
  2025-08-12 10:33   ` Mark Rutland
@ 2025-08-13  8:03     ` Yicong Yang
  0 siblings, 0 replies; 13+ messages in thread
From: Yicong Yang @ 2025-08-13  8:03 UTC (permalink / raw)
  To: Mark Rutland
  Cc: yangyicong, will, linux-arm-kernel, james.clark, robh,
	anshuman.khandual, jonathan.cameron, hejunhao3, linuxarm,
	prime.zeng, xuwei5, wangyushan12

On 2025/8/12 18:33, Mark Rutland wrote:
> On Tue, Aug 12, 2025 at 04:08:30PM +0800, Yicong Yang wrote:
>> From: Yicong Yang <yangyicong@hisilicon.com>
>>
>> CPU_CYCLES is expected to count the logical CPU (PE) clock. Currently it's
>> preferred to use PMCCNTR_EL0 for counting CPU_CYCLES, but it'll count
>> processor clock rather than the PE clock (ARM DDI0487 L.b D13.1.3) if
>> one of the SMT siblings is not idle on a multi-threaded implementation.
>>
>> So don't use it on SMT cores.
> 
> This is rather unfortunate.
> 
> When does this actually matter?
> 

the event metrics use cycles will be affected, like IPC. also the cycles profiling
for code's hotspot. the result won't be precise, e.g. if the thread 0 is running at
half speed while it's sibling thread 1 is running at full speed. also sometimes
we'll use cycles to detect the currently running frequency. the senarios maybe
non-exhaustive.

> Per ARM DDI 0487 L.b, page D14-6918:
> 
> | If FEAT_PMUv3p9 is implemented, then CPU_CYCLES does not increment
> | when the clocks are stopped by WFI and WFE instructions. Otherwise, it
> | is CONSTRAINED UNPREDICTABLE whether or not CPU_CYCLES continues to
> | increment when the clocks are stopped by WFI and WFE instructions.
> 
> ... so prior to FEAT_PMUv3p9, no-one could rely on the difference
> anyway.
> 
>> When counting cycles on SMT CPU 2-3 and CPU 3 is idle, without this
>> patch we'll get:
>> [root@client1 tmp]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
>> --taskset 2 --timeout 1
>> [...]
>>  Performance counter stats for 'CPU(s) 2-3':
>>
>> CPU2           2880457316      cycles
>> CPU3           2880459810      cycles
>>        1.254688470 seconds time elapsed
>>
>> With this patch the idle state of CPU3 is observed as expected:
>> [root@client1 ~]#  perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
>> --taskset 2 --timeout 1
>> [...]
>>  Performance counter stats for 'CPU(s) 2-3':
>>
>> CPU2           2558580492      cycles
>> CPU3               305749      cycles
>>        1.113626410 seconds time elapsed
>>
>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>> ---
>>  drivers/perf/arm_pmuv3.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
>> index 95c899d07df5..ed3149632b71 100644
>> --- a/drivers/perf/arm_pmuv3.c
>> +++ b/drivers/perf/arm_pmuv3.c
>> @@ -1002,6 +1002,15 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
>>  	if (has_branch_stack(event))
>>  		return false;
>>  
>> +	/*
>> +	 * The PMCCNTR_EL0 increments from the processor clock rather than
>> +	 * the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
>> +	 * counting on a WFI PE if one of its SMT silbing is not idle on a
>> +	 * multi-threaded implementation. So don't use it on SMT cores.
>> +	 */
>> +	if (cpumask_weight(topology_sibling_cpumask(smp_processor_id())) > 1)
>> +		return false;
> 
> This effectively forbids use of PMCCNTR_EL0 for any events.
> 
> Is there any existing event that it is useful for?
> 

I think no. we're using the core pmu's events in a per-cpu (PE) manner. users don't
know whether this CPU_CYCLES events is scheduled on the PMCCNTR_EL0 or a common
counter so they also have no way to use PMCCNTR_EL0.

thanks.




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
  2025-08-12 10:22       ` Mark Rutland
@ 2025-08-13  8:17         ` Yicong Yang
  0 siblings, 0 replies; 13+ messages in thread
From: Yicong Yang @ 2025-08-13  8:17 UTC (permalink / raw)
  To: Mark Rutland
  Cc: yangyicong, James Clark, will, linux-arm-kernel, robh,
	anshuman.khandual, jonathan.cameron, hejunhao3, linuxarm,
	prime.zeng, xuwei5, wangyushan12

On 2025/8/12 18:22, Mark Rutland wrote:
> On Tue, Aug 12, 2025 at 06:14:33PM +0800, Yicong Yang wrote:
>> On 2025/8/12 18:00, James Clark wrote:
>>> On 12/08/2025 9:08 am, Yicong Yang wrote:
>>>> @@ -1002,6 +1002,15 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
>>>>       if (has_branch_stack(event))
>>>>           return false;
>>>>   +    /*
>>>> +     * The PMCCNTR_EL0 increments from the processor clock rather than
>>>> +     * the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
>>>> +     * counting on a WFI PE if one of its SMT silbing is not idle on a
>>>> +     * multi-threaded implementation. So don't use it on SMT cores.
>>>> +     */
>>>> +    if (cpumask_weight(topology_sibling_cpumask(smp_processor_id())) > 1)
>>>> +        return false;
>>>> +
>>>
>>> Isn't this something that's static to the PMU? If all CPUs in each PMU are always the same then this doesn't need to be probed every time and can be set once.
>>>
>> we can make use of PMCCNTR_EL0 if the SMT is runtime disabled, e.g. by /sys/devices/system/cpu/smt/control
>> if set this at probe time then we permanently lose the chance to use PMCCNTR_EL0.
> 
> Can it be runtime enabled too?
> 

yes.

> If so, then we can't use PMCCNTR_EL0 in case we later dynamically go
> from disabled to enabled.
> 

ok, this will be a problem.

> I do not think this should be handled dynamically.
> 
>>> Also you can't call smp_processor_id() from here because this is
>>> also called in armpmu_event_init() -> __hw_perf_event_init() ->
>>> validate_group() before the event is actually scheduled on a CPU.
>>> With CONFIG_DEBUG_PREEMPT you'd see the error.
>>
>> ok, will use raw_smp_processor_id() instead. it won't affect the validation checking in pmu::event_init().
>> in pmu::add() the cpu id is always stable so it'll also be fine.
> 
> NAK to this.
> 
> It *will* affect validation since it affects the number of events that
> can be placed into a single group (by virtue of allowing or forbidding
> an additional cycles events). That would be non-deterministic, which is
> horrible to debug.
> 

ok.

if we want to do it at probed time, we have no general way for knowing the CPU
is in a multi-threaded implementation - the ACPI and OF detect this in different
ways. the topology_sibling_cpumask() only contains the online CPUs so also have the
problem mentioned above. can we simply rely on the mpidr_el1.mt (maybe not, spec
mentions it doesn't indicate the multi-threaded implementations) or any suggestion?

thanks.




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores
  2025-08-12 10:31       ` James Clark
@ 2025-08-13  8:32         ` Yicong Yang
  0 siblings, 0 replies; 13+ messages in thread
From: Yicong Yang @ 2025-08-13  8:32 UTC (permalink / raw)
  To: James Clark, will, mark.rutland, linux-arm-kernel
  Cc: yangyicong, robh, anshuman.khandual, jonathan.cameron, hejunhao3,
	linuxarm, prime.zeng, xuwei5, wangyushan12

On 2025/8/12 18:31, James Clark wrote:
> 
> 
> On 12/08/2025 11:14 am, Yicong Yang wrote:
>> On 2025/8/12 18:00, James Clark wrote:
>>>
>>>
>>> On 12/08/2025 9:08 am, Yicong Yang wrote:
>>>> From: Yicong Yang <yangyicong@hisilicon.com>
>>>>
>>>> CPU_CYCLES is expected to count the logical CPU (PE) clock. Currently it's
>>>> preferred to use PMCCNTR_EL0 for counting CPU_CYCLES, but it'll count
>>>> processor clock rather than the PE clock (ARM DDI0487 L.b D13.1.3) if
>>>> one of the SMT siblings is not idle on a multi-threaded implementation.
>>>> So don't use it on SMT cores.
>>>>
>>>> When counting cycles on SMT CPU 2-3 and CPU 3 is idle, without this
>>>> patch we'll get:
>>>> [root@client1 tmp]# perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
>>>> --taskset 2 --timeout 1
>>>> [...]
>>>>    Performance counter stats for 'CPU(s) 2-3':
>>>>
>>>> CPU2           2880457316      cycles
>>>> CPU3           2880459810      cycles
>>>>          1.254688470 seconds time elapsed
>>>>
>>>> With this patch the idle state of CPU3 is observed as expected:
>>>> [root@client1 ~]#  perf stat -e cycles -A -C 2-3 -- stress-ng -c 1
>>>> --taskset 2 --timeout 1
>>>> [...]
>>>>    Performance counter stats for 'CPU(s) 2-3':
>>>>
>>>> CPU2           2558580492      cycles
>>>> CPU3               305749      cycles
>>>>          1.113626410 seconds time elapsed
>>>>
>>>> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
>>>> ---
>>>>    drivers/perf/arm_pmuv3.c | 9 +++++++++
>>>>    1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
>>>> index 95c899d07df5..ed3149632b71 100644
>>>> --- a/drivers/perf/arm_pmuv3.c
>>>> +++ b/drivers/perf/arm_pmuv3.c
>>>> @@ -1002,6 +1002,15 @@ static bool armv8pmu_can_use_pmccntr(struct pmu_hw_events *cpuc,
>>>>        if (has_branch_stack(event))
>>>>            return false;
>>>>    +    /*
>>>> +     * The PMCCNTR_EL0 increments from the processor clock rather than
>>>> +     * the PE clock (ARM DDI0487 L.b D13.1.3) which means it'll continue
>>>> +     * counting on a WFI PE if one of its SMT silbing is not idle on a
>>>> +     * multi-threaded implementation. So don't use it on SMT cores.
>>>> +     */
>>>> +    if (cpumask_weight(topology_sibling_cpumask(smp_processor_id())) > 1)
>>>> +        return false;
>>>> +
>>>
>>> Isn't this something that's static to the PMU? If all CPUs in each PMU are always the same then this doesn't need to be probed every time and can be set once.
>>>
>> we can make use of PMCCNTR_EL0 if the SMT is runtime disabled, e.g. by /sys/devices/system/cpu/smt/control
>> if set this at probe time then we permanently lose the chance to use PMCCNTR_EL0.
>>
> 
> Is that a valuable usecase though? I don't actually know the answer to this. How common is disabling SMT on SMT cores and then also using PMU events in a way that you would miss having the extra cycles counter, despite not minding that you didn't have it when SMT was enabled?
> 

I suppose it won't be frequent. just an example to show that the SMT can be runtime disabled and in such case we can
make use of PMCCNTR_EL0. but...

> And would it correctly handle enabling and disabling SMT after the event has already started? Feels like it wouldn't if you start the event with it disabled and it puts it onto PMCCNTR_EL0 then you enable it and the counts become wrong again.
> 

...as you mentioned it will be a problem if we use PMCCNTR_EL0 but the SMT is later enabled.

>>
>>> Also you can't call smp_processor_id() from here because this is also called in armpmu_event_init() -> __hw_perf_event_init() -> validate_group() before the event is actually scheduled on a CPU. With CONFIG_DEBUG_PREEMPT you'd see the error.
>>
>> ok, will use raw_smp_processor_id() instead. it won't affect the validation checking in pmu::event_init().
>> in pmu::add() the cpu id is always stable so it'll also be fine.
>>
>> Thanks.
>>
>>
> 
> That feels a bit wrong but I suppose it will work. Maybe the real problem is that validation is doing too much. I know it's to re-use code, but then we're doing things as part of the validation that don't make sense. That can confuse the reader or it's just wasted effort. Also using raw removes the safety check which might mean it gets refactored into somewhere were it isn't valid to call it in the future.
> 

ok. as Mark mentioned in this way the group validation will be affected.

thanks.




^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-08-13  8:35 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-12  8:08 [PATCH 0/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores Yicong Yang
2025-08-12  8:08 ` [PATCH 1/2] perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions Yicong Yang
2025-08-12 10:02   ` James Clark
2025-08-12 10:25   ` Mark Rutland
2025-08-12  8:08 ` [PATCH 2/2] perf: arm_pmuv3: Don't use PMCCNTR_EL0 on SMT cores Yicong Yang
2025-08-12 10:00   ` James Clark
2025-08-12 10:14     ` Yicong Yang
2025-08-12 10:22       ` Mark Rutland
2025-08-13  8:17         ` Yicong Yang
2025-08-12 10:31       ` James Clark
2025-08-13  8:32         ` Yicong Yang
2025-08-12 10:33   ` Mark Rutland
2025-08-13  8:03     ` Yicong Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).