* [PATCH 2/2] perf/x86/intel: Fix wrong index calculation in intel_pmu_config_acr()
From: Dapeng Mi @ 2025-05-29 8:02 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Dapeng Mi
To calculate the fixed counter MSR address, INTEL_PMC_IDX_FIXED is
subtracted from the HW counter index "idx" in place. Since "idx" is then
reused to index the acr_cfg_b[] array, the ACR mask value of a fixed
counter is incorrectly saved at a GP counter's position in acr_cfg_b[].
E.g., for fixed counter 0, its ACR counter mask should be saved to
acr_cfg_b[32], but it's incorrectly saved to acr_cfg_b[0].

Fix this by computing the MSR address offset in a separate variable
instead of modifying "idx".
Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
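(Not part of the patch: a minimal stand-alone sketch of the indexing
mix-up, for illustration only. It assumes INTEL_PMC_IDX_FIXED == 32 and
stands in a raw index for the real x86_pmu.addr_offset() calculation;
the point is that acr_cfg_b[] must be indexed with the global counter
index, while only the MSR offset needs the fixed-relative index.)

#include <stdio.h>

#define INTEL_PMC_IDX_FIXED 32

static unsigned long long acr_cfg_b[64];

static void config_acr(int idx, unsigned long long mask)
{
	int msr_offset;

	if (idx < INTEL_PMC_IDX_FIXED)
		msr_offset = idx;				/* GP counter */
	else
		msr_offset = idx - INTEL_PMC_IDX_FIXED;		/* fixed counter */

	/* The buggy version did "idx -= INTEL_PMC_IDX_FIXED" above, so a
	 * fixed counter's mask landed in a GP counter's slot here. */
	acr_cfg_b[idx] = mask;
	printf("mask saved at acr_cfg_b[%d], MSR offset %d\n", idx, msr_offset);
}

int main(void)
{
	config_acr(INTEL_PMC_IDX_FIXED + 0, 0x1);	/* fixed counter 0 -> slot 32 */
	return 0;
}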
arch/x86/events/intel/core.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 8d046b1a237e..b0fee684ec8c 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2899,6 +2899,7 @@ static void intel_pmu_config_acr(int idx, u64 mask, u32 reload)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	int msr_b, msr_c;
+	int msr_offset;
 
 	if (!mask && !cpuc->acr_cfg_b[idx])
 		return;
@@ -2906,19 +2907,20 @@ static void intel_pmu_config_acr(int idx, u64 mask, u32 reload)
 	if (idx < INTEL_PMC_IDX_FIXED) {
 		msr_b = MSR_IA32_PMC_V6_GP0_CFG_B;
 		msr_c = MSR_IA32_PMC_V6_GP0_CFG_C;
+		msr_offset = x86_pmu.addr_offset(idx, false);
 	} else {
 		msr_b = MSR_IA32_PMC_V6_FX0_CFG_B;
 		msr_c = MSR_IA32_PMC_V6_FX0_CFG_C;
-		idx -= INTEL_PMC_IDX_FIXED;
+		msr_offset = x86_pmu.addr_offset(idx - INTEL_PMC_IDX_FIXED, false);
 	}
 
 	if (cpuc->acr_cfg_b[idx] != mask) {
-		wrmsrl(msr_b + x86_pmu.addr_offset(idx, false), mask);
+		wrmsrl(msr_b + msr_offset, mask);
 		cpuc->acr_cfg_b[idx] = mask;
 	}
 	/* Only need to update the reload value when there is a valid config value. */
 	if (mask && cpuc->acr_cfg_c[idx] != reload) {
-		wrmsrl(msr_c + x86_pmu.addr_offset(idx, false), reload);
+		wrmsrl(msr_c + msr_offset, reload);
 		cpuc->acr_cfg_c[idx] = reload;
 	}
 }
--
2.43.0
* Re: [PATCH 2/2] perf/x86/intel: Fix wrong index calculation in intel_pmu_config_acr()
From: Liang, Kan @ 2025-05-29 13:55 UTC (permalink / raw)
To: Dapeng Mi, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi
On 2025-05-29 4:02 a.m., Dapeng Mi wrote:
> To calculate the fixed counter MSR address, INTEL_PMC_IDX_FIXED is
> subtracted from the HW counter index "idx" in place. Since "idx" is then
> reused to index the acr_cfg_b[] array, the ACR mask value of a fixed
> counter is incorrectly saved at a GP counter's position in acr_cfg_b[].
> E.g., for fixed counter 0, its ACR counter mask should be saved to
> acr_cfg_b[32], but it's incorrectly saved to acr_cfg_b[0].
>
> Fix this by computing the MSR address offset in a separate variable
> instead of modifying "idx".
>
> Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Thanks,
Kan
> ---
> arch/x86/events/intel/core.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 8d046b1a237e..b0fee684ec8c 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2899,6 +2899,7 @@ static void intel_pmu_config_acr(int idx, u64 mask, u32 reload)
>  {
>  	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>  	int msr_b, msr_c;
> +	int msr_offset;
>  
>  	if (!mask && !cpuc->acr_cfg_b[idx])
>  		return;
> @@ -2906,19 +2907,20 @@ static void intel_pmu_config_acr(int idx, u64 mask, u32 reload)
>  	if (idx < INTEL_PMC_IDX_FIXED) {
>  		msr_b = MSR_IA32_PMC_V6_GP0_CFG_B;
>  		msr_c = MSR_IA32_PMC_V6_GP0_CFG_C;
> +		msr_offset = x86_pmu.addr_offset(idx, false);
>  	} else {
>  		msr_b = MSR_IA32_PMC_V6_FX0_CFG_B;
>  		msr_c = MSR_IA32_PMC_V6_FX0_CFG_C;
> -		idx -= INTEL_PMC_IDX_FIXED;
> +		msr_offset = x86_pmu.addr_offset(idx - INTEL_PMC_IDX_FIXED, false);
>  	}
>  
>  	if (cpuc->acr_cfg_b[idx] != mask) {
> -		wrmsrl(msr_b + x86_pmu.addr_offset(idx, false), mask);
> +		wrmsrl(msr_b + msr_offset, mask);
>  		cpuc->acr_cfg_b[idx] = mask;
>  	}
>  	/* Only need to update the reload value when there is a valid config value. */
>  	if (mask && cpuc->acr_cfg_c[idx] != reload) {
> -		wrmsrl(msr_c + x86_pmu.addr_offset(idx, false), reload);
> +		wrmsrl(msr_c + msr_offset, reload);
>  		cpuc->acr_cfg_c[idx] = reload;
>  	}
>  }
* Re: [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
From: Liang, Kan @ 2025-05-29 13:54 UTC (permalink / raw)
To: Dapeng Mi, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi
On 2025-05-29 4:02 a.m., Dapeng Mi wrote:
> When running perf_fuzzer on PTL, sometimes the below "unchecked MSR
> access error" is seen when accessing IA32_PMC_x_CFG_B MSRs.
>
> [ 55.611268] unchecked MSR access error: WRMSR to 0x1986 (tried to write 0x0000000200000001) at rIP: 0xffffffffac564b28 (native_write_msr+0x8/0x30)
> [ 55.611280] Call Trace:
> [ 55.611282] <TASK>
> [ 55.611284] ? intel_pmu_config_acr+0x87/0x160
> [ 55.611289] intel_pmu_enable_acr+0x6d/0x80
> [ 55.611291] intel_pmu_enable_event+0xce/0x460
> [ 55.611293] x86_pmu_start+0x78/0xb0
> [ 55.611297] x86_pmu_enable+0x218/0x3a0
> [ 55.611300] ? x86_pmu_enable+0x121/0x3a0
> [ 55.611302] perf_pmu_enable+0x40/0x50
> [ 55.611307] ctx_resched+0x19d/0x220
> [ 55.611309] __perf_install_in_context+0x284/0x2f0
> [ 55.611311] ? __pfx_remote_function+0x10/0x10
> [ 55.611314] remote_function+0x52/0x70
> [ 55.611317] ? __pfx_remote_function+0x10/0x10
> [ 55.611319] generic_exec_single+0x84/0x150
> [ 55.611323] smp_call_function_single+0xc5/0x1a0
> [ 55.611326] ? __pfx_remote_function+0x10/0x10
> [ 55.611329] perf_install_in_context+0xd1/0x1e0
> [ 55.611331] ? __pfx___perf_install_in_context+0x10/0x10
> [ 55.611333] __do_sys_perf_event_open+0xa76/0x1040
> [ 55.611336] __x64_sys_perf_event_open+0x26/0x30
> [ 55.611337] x64_sys_call+0x1d8e/0x20c0
> [ 55.611339] do_syscall_64+0x4f/0x120
> [ 55.611343] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> On PTL, GP counters 0 and 1 don't support the auto counter reload
> feature, thus a #GP is triggered when writing 1 to bit 0 of the CFG_B
> MSR, since that would enable auto counter reload on GP counter 0.
>
> The root cause of this issue is that the check of the auto counter
> reload (ACR) counter mask from user space in the
> intel_pmu_acr_late_setup() helper is incorrect. It allows an invalid
> ACR counter mask from user space to be set into hw.config1 and then
> written into the CFG_B MSRs, triggering the MSR access warning.
>
> E.g., a user may create a perf event with an ACR counter mask
> (config2=0xcb) while only 1 event is created, so "cpuc->n_events" is 1.
>
> The correct check condition should be "i + idx >= cpuc->n_events"
> instead of "i + idx > cpuc->n_events" (it looks like a typo).
> Otherwise, the loop walks one entry past the event list and the
> invalid "cpuc->assign[1]" value (bit 0) is set into hw.config1,
> causing the MSR access error.
>
> Besides, also check whether the events corresponding to the bits set
> in the ACR counter mask are ACR events; if not, filter those bits out
> of the counter mask. An event which is not an ACR event could be
> scheduled on a HW counter that doesn't support ACR, so it's invalid
> to include its counter index in the ACR counter mask.
>
> Furthermore, remove the WARN_ON_ONCE() since it's easily triggered:
> users can set arbitrary invalid ACR counter masks, and the warning
> message could mislead users.
>
> Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Thanks for the fix.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Thanks,
Kan
> ---
> arch/x86/events/intel/core.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 3a319cf6d364..8d046b1a237e 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2994,7 +2994,8 @@ static void intel_pmu_acr_late_setup(struct cpu_hw_events *cpuc)
>  			if (event->group_leader != leader->group_leader)
>  				break;
>  			for_each_set_bit(idx, (unsigned long *)&event->attr.config2, X86_PMC_IDX_MAX) {
> -				if (WARN_ON_ONCE(i + idx > cpuc->n_events))
> +				if (i + idx >= cpuc->n_events ||
> +				    !is_acr_event_group(cpuc->event_list[i + idx]))
>  					return;
>  				__set_bit(cpuc->assign[i + idx], (unsigned long *)&event->hw.config1);
>  			}
>
> base-commit: e7d952cc39fca34386ec9f15f68cb2eaac01b5ae
* Re: [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
From: Ingo Molnar @ 2025-05-31 8:01 UTC (permalink / raw)
To: Dapeng Mi
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane, linux-kernel,
linux-perf-users, Dapeng Mi
* Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
> When running perf_fuzzer on PTL, sometimes the below "unchecked MSR
> access error" is seen when accessing IA32_PMC_x_CFG_B MSRs.
>
> [ 55.611268] unchecked MSR access error: WRMSR to 0x1986 (tried to write 0x0000000200000001) at rIP: 0xffffffffac564b28 (native_write_msr+0x8/0x30)
> [ 55.611280] Call Trace:
> [ 55.611282] <TASK>
> [ 55.611284] ? intel_pmu_config_acr+0x87/0x160
> [ 55.611289] intel_pmu_enable_acr+0x6d/0x80
> [ 55.611291] intel_pmu_enable_event+0xce/0x460
> [ 55.611293] x86_pmu_start+0x78/0xb0
> [ 55.611297] x86_pmu_enable+0x218/0x3a0
> [ 55.611300] ? x86_pmu_enable+0x121/0x3a0
> [ 55.611302] perf_pmu_enable+0x40/0x50
> [ 55.611307] ctx_resched+0x19d/0x220
> [ 55.611309] __perf_install_in_context+0x284/0x2f0
> [ 55.611311] ? __pfx_remote_function+0x10/0x10
> [ 55.611314] remote_function+0x52/0x70
> [ 55.611317] ? __pfx_remote_function+0x10/0x10
> [ 55.611319] generic_exec_single+0x84/0x150
> [ 55.611323] smp_call_function_single+0xc5/0x1a0
> [ 55.611326] ? __pfx_remote_function+0x10/0x10
> [ 55.611329] perf_install_in_context+0xd1/0x1e0
> [ 55.611331] ? __pfx___perf_install_in_context+0x10/0x10
> [ 55.611333] __do_sys_perf_event_open+0xa76/0x1040
> [ 55.611336] __x64_sys_perf_event_open+0x26/0x30
> [ 55.611337] x64_sys_call+0x1d8e/0x20c0
> [ 55.611339] do_syscall_64+0x4f/0x120
> [ 55.611343] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> On PTL, GP counters 0 and 1 don't support the auto counter reload
> feature, thus a #GP is triggered when writing 1 to bit 0 of the CFG_B
> MSR, since that would enable auto counter reload on GP counter 0.
>
> The root cause of this issue is that the check of the auto counter
> reload (ACR) counter mask from user space in the
> intel_pmu_acr_late_setup() helper is incorrect. It allows an invalid
> ACR counter mask from user space to be set into hw.config1 and then
> written into the CFG_B MSRs, triggering the MSR access warning.
>
> E.g., a user may create a perf event with an ACR counter mask
> (config2=0xcb) while only 1 event is created, so "cpuc->n_events" is 1.
>
> The correct check condition should be "i + idx >= cpuc->n_events"
> instead of "i + idx > cpuc->n_events" (it looks like a typo).
> Otherwise, the loop walks one entry past the event list and the
> invalid "cpuc->assign[1]" value (bit 0) is set into hw.config1,
> causing the MSR access error.
>
> Besides, also check whether the events corresponding to the bits set
> in the ACR counter mask are ACR events; if not, filter those bits out
> of the counter mask. An event which is not an ACR event could be
> scheduled on a HW counter that doesn't support ACR, so it's invalid
> to include its counter index in the ACR counter mask.
>
> Furthermore, remove the WARN_ON_ONCE() since it's easily triggered:
> users can set arbitrary invalid ACR counter masks, and the warning
> message could mislead users.
>
> Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
> arch/x86/events/intel/core.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 3a319cf6d364..8d046b1a237e 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2994,7 +2994,8 @@ static void intel_pmu_acr_late_setup(struct cpu_hw_events *cpuc)
>  			if (event->group_leader != leader->group_leader)
>  				break;
>  			for_each_set_bit(idx, (unsigned long *)&event->attr.config2, X86_PMC_IDX_MAX) {
> -				if (WARN_ON_ONCE(i + idx > cpuc->n_events))
> +				if (i + idx >= cpuc->n_events ||
> +				    !is_acr_event_group(cpuc->event_list[i + idx]))
>  					return;
Is this a normal condition?
- If it's normal then the 'return' is destructive, isn't it? Shouldn't
it be a 'break', if this condition is legitimate?
- If it's not normal, then the WARN_ON_ONCE() was justified, right?
Thanks,
Ingo
* Re: [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
From: Mi, Dapeng @ 2025-06-03 2:14 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane, linux-kernel,
linux-perf-users, Dapeng Mi
On 5/31/2025 4:01 PM, Ingo Molnar wrote:
> * Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>
>> When running perf_fuzzer on PTL, sometimes the below "unchecked MSR
>> access error" is seen when accessing IA32_PMC_x_CFG_B MSRs.
>>
>> [ 55.611268] unchecked MSR access error: WRMSR to 0x1986 (tried to write 0x0000000200000001) at rIP: 0xffffffffac564b28 (native_write_msr+0x8/0x30)
>> [ 55.611280] Call Trace:
>> [ 55.611282] <TASK>
>> [ 55.611284] ? intel_pmu_config_acr+0x87/0x160
>> [ 55.611289] intel_pmu_enable_acr+0x6d/0x80
>> [ 55.611291] intel_pmu_enable_event+0xce/0x460
>> [ 55.611293] x86_pmu_start+0x78/0xb0
>> [ 55.611297] x86_pmu_enable+0x218/0x3a0
>> [ 55.611300] ? x86_pmu_enable+0x121/0x3a0
>> [ 55.611302] perf_pmu_enable+0x40/0x50
>> [ 55.611307] ctx_resched+0x19d/0x220
>> [ 55.611309] __perf_install_in_context+0x284/0x2f0
>> [ 55.611311] ? __pfx_remote_function+0x10/0x10
>> [ 55.611314] remote_function+0x52/0x70
>> [ 55.611317] ? __pfx_remote_function+0x10/0x10
>> [ 55.611319] generic_exec_single+0x84/0x150
>> [ 55.611323] smp_call_function_single+0xc5/0x1a0
>> [ 55.611326] ? __pfx_remote_function+0x10/0x10
>> [ 55.611329] perf_install_in_context+0xd1/0x1e0
>> [ 55.611331] ? __pfx___perf_install_in_context+0x10/0x10
>> [ 55.611333] __do_sys_perf_event_open+0xa76/0x1040
>> [ 55.611336] __x64_sys_perf_event_open+0x26/0x30
>> [ 55.611337] x64_sys_call+0x1d8e/0x20c0
>> [ 55.611339] do_syscall_64+0x4f/0x120
>> [ 55.611343] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> On PTL, GP counters 0 and 1 don't support the auto counter reload
>> feature, thus a #GP is triggered when writing 1 to bit 0 of the CFG_B
>> MSR, since that would enable auto counter reload on GP counter 0.
>>
>> The root cause of this issue is that the check of the auto counter
>> reload (ACR) counter mask from user space in the
>> intel_pmu_acr_late_setup() helper is incorrect. It allows an invalid
>> ACR counter mask from user space to be set into hw.config1 and then
>> written into the CFG_B MSRs, triggering the MSR access warning.
>>
>> E.g., a user may create a perf event with an ACR counter mask
>> (config2=0xcb) while only 1 event is created, so "cpuc->n_events" is 1.
>>
>> The correct check condition should be "i + idx >= cpuc->n_events"
>> instead of "i + idx > cpuc->n_events" (it looks like a typo).
>> Otherwise, the loop walks one entry past the event list and the
>> invalid "cpuc->assign[1]" value (bit 0) is set into hw.config1,
>> causing the MSR access error.
>>
>> Besides, also check whether the events corresponding to the bits set
>> in the ACR counter mask are ACR events; if not, filter those bits out
>> of the counter mask. An event which is not an ACR event could be
>> scheduled on a HW counter that doesn't support ACR, so it's invalid
>> to include its counter index in the ACR counter mask.
>>
>> Furthermore, remove the WARN_ON_ONCE() since it's easily triggered:
>> users can set arbitrary invalid ACR counter masks, and the warning
>> message could mislead users.
>>
>> Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> ---
>> arch/x86/events/intel/core.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 3a319cf6d364..8d046b1a237e 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -2994,7 +2994,8 @@ static void intel_pmu_acr_late_setup(struct cpu_hw_events *cpuc)
>>  			if (event->group_leader != leader->group_leader)
>>  				break;
>>  			for_each_set_bit(idx, (unsigned long *)&event->attr.config2, X86_PMC_IDX_MAX) {
>> -				if (WARN_ON_ONCE(i + idx > cpuc->n_events))
>> +				if (i + idx >= cpuc->n_events ||
>> +				    !is_acr_event_group(cpuc->event_list[i + idx]))
>>  					return;
> Is this a normal condition?
>
> - If it's normal then the 'return' is destructive, isn't it? Shouldn't
> it be a 'break', if this condition is legitimate?
>
> - If it's not normal, then the WARN_ON_ONCE() was justified, right?
It's not normal. Strictly speaking, it's an invalid user configuration,
and it doesn't look reasonable to trigger a kernel-space warning for an
invalid user-space configuration. It would mislead users into thinking
there is something wrong in the kernel. That's why the WARN_ON_ONCE() is
removed. Thanks.
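
(For illustration only: a stand-alone sketch of the off-by-one, run
outside the kernel, with the hypothetical values from the commit
message, i.e. cpuc->n_events == 1 and config2 == 0xcb.)

#include <stdio.h>

int main(void)
{
	int n_events = 1;		/* only one event in the group */
	unsigned long config2 = 0xcb;	/* user-supplied ACR counter mask */
	int i = 0, idx;

	for (idx = 0; idx < 64; idx++) {
		if (!(config2 & (1UL << idx)))
			continue;
		/* The old check "i + idx > n_events" let idx == 1 through,
		 * so the stale assign[1] entry was consumed. */
		if (i + idx >= n_events) {
			printf("idx %d: beyond the event list, stop\n", idx);
			break;
		}
		printf("idx %d: valid, uses assign[%d]\n", idx, i + idx);
	}
	return 0;
}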
>
> Thanks,
>
> Ingo
* Re: [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
From: Mi, Dapeng @ 2025-06-19 1:11 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane, linux-kernel,
linux-perf-users, Dapeng Mi
Kindly ping... Thanks.
On 6/3/2025 10:14 AM, Mi, Dapeng wrote:
> On 5/31/2025 4:01 PM, Ingo Molnar wrote:
>> * Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>>
>>> When running perf_fuzzer on PTL, sometimes the below "unchecked MSR
>>> access error" is seen when accessing IA32_PMC_x_CFG_B MSRs.
>>>
>>> [ 55.611268] unchecked MSR access error: WRMSR to 0x1986 (tried to write 0x0000000200000001) at rIP: 0xffffffffac564b28 (native_write_msr+0x8/0x30)
>>> [ 55.611280] Call Trace:
>>> [ 55.611282] <TASK>
>>> [ 55.611284] ? intel_pmu_config_acr+0x87/0x160
>>> [ 55.611289] intel_pmu_enable_acr+0x6d/0x80
>>> [ 55.611291] intel_pmu_enable_event+0xce/0x460
>>> [ 55.611293] x86_pmu_start+0x78/0xb0
>>> [ 55.611297] x86_pmu_enable+0x218/0x3a0
>>> [ 55.611300] ? x86_pmu_enable+0x121/0x3a0
>>> [ 55.611302] perf_pmu_enable+0x40/0x50
>>> [ 55.611307] ctx_resched+0x19d/0x220
>>> [ 55.611309] __perf_install_in_context+0x284/0x2f0
>>> [ 55.611311] ? __pfx_remote_function+0x10/0x10
>>> [ 55.611314] remote_function+0x52/0x70
>>> [ 55.611317] ? __pfx_remote_function+0x10/0x10
>>> [ 55.611319] generic_exec_single+0x84/0x150
>>> [ 55.611323] smp_call_function_single+0xc5/0x1a0
>>> [ 55.611326] ? __pfx_remote_function+0x10/0x10
>>> [ 55.611329] perf_install_in_context+0xd1/0x1e0
>>> [ 55.611331] ? __pfx___perf_install_in_context+0x10/0x10
>>> [ 55.611333] __do_sys_perf_event_open+0xa76/0x1040
>>> [ 55.611336] __x64_sys_perf_event_open+0x26/0x30
>>> [ 55.611337] x64_sys_call+0x1d8e/0x20c0
>>> [ 55.611339] do_syscall_64+0x4f/0x120
>>> [ 55.611343] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>
>>> On PTL, GP counters 0 and 1 don't support the auto counter reload
>>> feature, thus a #GP is triggered when writing 1 to bit 0 of the CFG_B
>>> MSR, since that would enable auto counter reload on GP counter 0.
>>>
>>> The root cause of this issue is that the check of the auto counter
>>> reload (ACR) counter mask from user space in the
>>> intel_pmu_acr_late_setup() helper is incorrect. It allows an invalid
>>> ACR counter mask from user space to be set into hw.config1 and then
>>> written into the CFG_B MSRs, triggering the MSR access warning.
>>>
>>> E.g., a user may create a perf event with an ACR counter mask
>>> (config2=0xcb) while only 1 event is created, so "cpuc->n_events" is 1.
>>>
>>> The correct check condition should be "i + idx >= cpuc->n_events"
>>> instead of "i + idx > cpuc->n_events" (it looks like a typo).
>>> Otherwise, the loop walks one entry past the event list and the
>>> invalid "cpuc->assign[1]" value (bit 0) is set into hw.config1,
>>> causing the MSR access error.
>>>
>>> Besides, also check whether the events corresponding to the bits set
>>> in the ACR counter mask are ACR events; if not, filter those bits out
>>> of the counter mask. An event which is not an ACR event could be
>>> scheduled on a HW counter that doesn't support ACR, so it's invalid
>>> to include its counter index in the ACR counter mask.
>>>
>>> Furthermore, remove the WARN_ON_ONCE() since it's easily triggered:
>>> users can set arbitrary invalid ACR counter masks, and the warning
>>> message could mislead users.
>>>
>>> Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
>>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>> ---
>>> arch/x86/events/intel/core.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>>> index 3a319cf6d364..8d046b1a237e 100644
>>> --- a/arch/x86/events/intel/core.c
>>> +++ b/arch/x86/events/intel/core.c
>>> @@ -2994,7 +2994,8 @@ static void intel_pmu_acr_late_setup(struct cpu_hw_events *cpuc)
>>>  			if (event->group_leader != leader->group_leader)
>>>  				break;
>>>  			for_each_set_bit(idx, (unsigned long *)&event->attr.config2, X86_PMC_IDX_MAX) {
>>> -				if (WARN_ON_ONCE(i + idx > cpuc->n_events))
>>> +				if (i + idx >= cpuc->n_events ||
>>> +				    !is_acr_event_group(cpuc->event_list[i + idx]))
>>>  					return;
>> Is this a normal condition?
>>
>> - If it's normal then the 'return' is destructive, isn't it? Shouldn't
>> it be a 'break', if this condition is legitimate?
>>
>> - If it's not normal, then the WARN_ON_ONCE() was justified, right?
> It's not normal. Strictly speaking, it's an invalid user configuration,
> and it doesn't look reasonable to trigger a kernel-space warning for an
> invalid user-space configuration. It would mislead users into thinking
> there is something wrong in the kernel. That's why the WARN_ON_ONCE() is
> removed. Thanks.
>
>
>> Thanks,
>>
>> Ingo
* Re: [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
From: Mi, Dapeng @ 2025-07-11 2:40 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane, linux-kernel,
linux-perf-users, Dapeng Mi
Kindly ping.
@Ingo @Peter, not sure if you have bandwidth to review this change. Thanks.
On 6/3/2025 10:14 AM, Mi, Dapeng wrote:
> On 5/31/2025 4:01 PM, Ingo Molnar wrote:
>> * Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>>
>>> When running perf_fuzzer on PTL, sometimes the below "unchecked MSR
>>> access error" is seen when accessing IA32_PMC_x_CFG_B MSRs.
>>>
>>> [ 55.611268] unchecked MSR access error: WRMSR to 0x1986 (tried to write 0x0000000200000001) at rIP: 0xffffffffac564b28 (native_write_msr+0x8/0x30)
>>> [ 55.611280] Call Trace:
>>> [ 55.611282] <TASK>
>>> [ 55.611284] ? intel_pmu_config_acr+0x87/0x160
>>> [ 55.611289] intel_pmu_enable_acr+0x6d/0x80
>>> [ 55.611291] intel_pmu_enable_event+0xce/0x460
>>> [ 55.611293] x86_pmu_start+0x78/0xb0
>>> [ 55.611297] x86_pmu_enable+0x218/0x3a0
>>> [ 55.611300] ? x86_pmu_enable+0x121/0x3a0
>>> [ 55.611302] perf_pmu_enable+0x40/0x50
>>> [ 55.611307] ctx_resched+0x19d/0x220
>>> [ 55.611309] __perf_install_in_context+0x284/0x2f0
>>> [ 55.611311] ? __pfx_remote_function+0x10/0x10
>>> [ 55.611314] remote_function+0x52/0x70
>>> [ 55.611317] ? __pfx_remote_function+0x10/0x10
>>> [ 55.611319] generic_exec_single+0x84/0x150
>>> [ 55.611323] smp_call_function_single+0xc5/0x1a0
>>> [ 55.611326] ? __pfx_remote_function+0x10/0x10
>>> [ 55.611329] perf_install_in_context+0xd1/0x1e0
>>> [ 55.611331] ? __pfx___perf_install_in_context+0x10/0x10
>>> [ 55.611333] __do_sys_perf_event_open+0xa76/0x1040
>>> [ 55.611336] __x64_sys_perf_event_open+0x26/0x30
>>> [ 55.611337] x64_sys_call+0x1d8e/0x20c0
>>> [ 55.611339] do_syscall_64+0x4f/0x120
>>> [ 55.611343] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>
>>> On PTL, GP counters 0 and 1 don't support the auto counter reload
>>> feature, thus a #GP is triggered when writing 1 to bit 0 of the CFG_B
>>> MSR, since that would enable auto counter reload on GP counter 0.
>>>
>>> The root cause of this issue is that the check of the auto counter
>>> reload (ACR) counter mask from user space in the
>>> intel_pmu_acr_late_setup() helper is incorrect. It allows an invalid
>>> ACR counter mask from user space to be set into hw.config1 and then
>>> written into the CFG_B MSRs, triggering the MSR access warning.
>>>
>>> E.g., a user may create a perf event with an ACR counter mask
>>> (config2=0xcb) while only 1 event is created, so "cpuc->n_events" is 1.
>>>
>>> The correct check condition should be "i + idx >= cpuc->n_events"
>>> instead of "i + idx > cpuc->n_events" (it looks like a typo).
>>> Otherwise, the loop walks one entry past the event list and the
>>> invalid "cpuc->assign[1]" value (bit 0) is set into hw.config1,
>>> causing the MSR access error.
>>>
>>> Besides, also check whether the events corresponding to the bits set
>>> in the ACR counter mask are ACR events; if not, filter those bits out
>>> of the counter mask. An event which is not an ACR event could be
>>> scheduled on a HW counter that doesn't support ACR, so it's invalid
>>> to include its counter index in the ACR counter mask.
>>>
>>> Furthermore, remove the WARN_ON_ONCE() since it's easily triggered:
>>> users can set arbitrary invalid ACR counter masks, and the warning
>>> message could mislead users.
>>>
>>> Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
>>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>> ---
>>> arch/x86/events/intel/core.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>>> index 3a319cf6d364..8d046b1a237e 100644
>>> --- a/arch/x86/events/intel/core.c
>>> +++ b/arch/x86/events/intel/core.c
>>> @@ -2994,7 +2994,8 @@ static void intel_pmu_acr_late_setup(struct cpu_hw_events *cpuc)
>>>  			if (event->group_leader != leader->group_leader)
>>>  				break;
>>>  			for_each_set_bit(idx, (unsigned long *)&event->attr.config2, X86_PMC_IDX_MAX) {
>>> -				if (WARN_ON_ONCE(i + idx > cpuc->n_events))
>>> +				if (i + idx >= cpuc->n_events ||
>>> +				    !is_acr_event_group(cpuc->event_list[i + idx]))
>>>  					return;
>> Is this a normal condition?
>>
>> - If it's normal then the 'return' is destructive, isn't it? Shouldn't
>> it be a 'break', if this condition is legitimate?
>>
>> - If it's not normal, then the WARN_ON_ONCE() was justified, right?
> It's not normal. Strictly speaking, it's an invalid user configuration,
> and it doesn't look reasonable to trigger a kernel-space warning for an
> invalid user-space configuration. It would mislead users into thinking
> there is something wrong in the kernel. That's why the WARN_ON_ONCE() is
> removed. Thanks.
>
>
>> Thanks,
>>
>> Ingo