* [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
@ 2025-05-29 8:02 Dapeng Mi
2025-05-29 8:02 ` [PATCH 2/2] perf/x86/intel: Fix wrong index calculation in intel_pmu_config_acr() Dapeng Mi
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Dapeng Mi @ 2025-05-29 8:02 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Dapeng Mi
When running perf_fuzzer on PTL, sometimes the below "unchecked MSR
access error" is seen when accessing IA32_PMC_x_CFG_B MSRs.
[ 55.611268] unchecked MSR access error: WRMSR to 0x1986 (tried to write 0x0000000200000001) at rIP: 0xffffffffac564b28 (native_write_msr+0x8/0x30)
[ 55.611280] Call Trace:
[ 55.611282] <TASK>
[ 55.611284] ? intel_pmu_config_acr+0x87/0x160
[ 55.611289] intel_pmu_enable_acr+0x6d/0x80
[ 55.611291] intel_pmu_enable_event+0xce/0x460
[ 55.611293] x86_pmu_start+0x78/0xb0
[ 55.611297] x86_pmu_enable+0x218/0x3a0
[ 55.611300] ? x86_pmu_enable+0x121/0x3a0
[ 55.611302] perf_pmu_enable+0x40/0x50
[ 55.611307] ctx_resched+0x19d/0x220
[ 55.611309] __perf_install_in_context+0x284/0x2f0
[ 55.611311] ? __pfx_remote_function+0x10/0x10
[ 55.611314] remote_function+0x52/0x70
[ 55.611317] ? __pfx_remote_function+0x10/0x10
[ 55.611319] generic_exec_single+0x84/0x150
[ 55.611323] smp_call_function_single+0xc5/0x1a0
[ 55.611326] ? __pfx_remote_function+0x10/0x10
[ 55.611329] perf_install_in_context+0xd1/0x1e0
[ 55.611331] ? __pfx___perf_install_in_context+0x10/0x10
[ 55.611333] __do_sys_perf_event_open+0xa76/0x1040
[ 55.611336] __x64_sys_perf_event_open+0x26/0x30
[ 55.611337] x64_sys_call+0x1d8e/0x20c0
[ 55.611339] do_syscall_64+0x4f/0x120
[ 55.611343] entry_SYSCALL_64_after_hwframe+0x76/0x7e
On PTL, GP counters 0 and 1 don't support the auto counter reload
feature, so a #GP is triggered when trying to write 1 to bit 0 of the
CFG_B MSR, which requests auto counter reload on GP counter 0.

The root cause of this issue is an incorrect check of the auto counter
reload (ACR) counter mask from user space in the
intel_pmu_acr_late_setup() helper. It allows an invalid ACR counter
mask from user space to be set into hw.config1 and then written into
the CFG_B MSRs, triggering the MSR access warning.

For example, a user may create a perf event with an ACR counter mask
(config2=0xcb) while only 1 event is created, so "cpuc->n_events" is 1.

The correct check condition should be "i + idx >= cpuc->n_events"
instead of "i + idx > cpuc->n_events" (it looks like a typo). Otherwise,
the counter mask loop would iterate one extra time and the invalid
"cpuc->assign[1]" value (bit 0) would be set into hw.config1, causing
the MSR access error.

Besides, also check whether the events corresponding to the ACR counter
mask are ACR events; if not, filter out those counter mask bits. If an
event is not an ACR event, it could be scheduled to a HW counter which
doesn't support ACR, and it's invalid to include its counter index in
the ACR counter mask.

Furthermore, remove the WARN_ON_ONCE() since it's easily triggered: a
user could set any invalid ACR counter mask, and the warning message
could mislead users.
Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
arch/x86/events/intel/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 3a319cf6d364..8d046b1a237e 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2994,7 +2994,8 @@ static void intel_pmu_acr_late_setup(struct cpu_hw_events *cpuc)
if (event->group_leader != leader->group_leader)
break;
for_each_set_bit(idx, (unsigned long *)&event->attr.config2, X86_PMC_IDX_MAX) {
- if (WARN_ON_ONCE(i + idx > cpuc->n_events))
+ if (i + idx >= cpuc->n_events ||
+ !is_acr_event_group(cpuc->event_list[i + idx]))
return;
__set_bit(cpuc->assign[i + idx], (unsigned long *)&event->hw.config1);
}
base-commit: e7d952cc39fca34386ec9f15f68cb2eaac01b5ae
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] perf/x86/intel: Fix wrong index calculation in intel_pmu_config_acr()
2025-05-29 8:02 [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error Dapeng Mi
@ 2025-05-29 8:02 ` Dapeng Mi
2025-05-29 13:55 ` Liang, Kan
2025-05-29 13:54 ` [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error Liang, Kan
2025-05-31 8:01 ` Ingo Molnar
2 siblings, 1 reply; 8+ messages in thread
From: Dapeng Mi @ 2025-05-29 8:02 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi, Dapeng Mi
To calculate the fixed counter MSR address, INTEL_PMC_IDX_FIXED is
subtracted from the HW counter index "idx". This causes the ACR mask
values of fixed counters to be incorrectly saved to the GP counter
positions in acr_cfg_b[]. For example, the ACR counter mask of fixed
counter 0 should be saved to acr_cfg_b[32], but it's incorrectly saved
to acr_cfg_b[0].

Fix this issue by calculating the MSR address offset separately instead
of modifying "idx".
Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
arch/x86/events/intel/core.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 8d046b1a237e..b0fee684ec8c 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2899,6 +2899,7 @@ static void intel_pmu_config_acr(int idx, u64 mask, u32 reload)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
int msr_b, msr_c;
+ int msr_offset;
if (!mask && !cpuc->acr_cfg_b[idx])
return;
@@ -2906,19 +2907,20 @@ static void intel_pmu_config_acr(int idx, u64 mask, u32 reload)
if (idx < INTEL_PMC_IDX_FIXED) {
msr_b = MSR_IA32_PMC_V6_GP0_CFG_B;
msr_c = MSR_IA32_PMC_V6_GP0_CFG_C;
+ msr_offset = x86_pmu.addr_offset(idx, false);
} else {
msr_b = MSR_IA32_PMC_V6_FX0_CFG_B;
msr_c = MSR_IA32_PMC_V6_FX0_CFG_C;
- idx -= INTEL_PMC_IDX_FIXED;
+ msr_offset = x86_pmu.addr_offset(idx - INTEL_PMC_IDX_FIXED, false);
}
if (cpuc->acr_cfg_b[idx] != mask) {
- wrmsrl(msr_b + x86_pmu.addr_offset(idx, false), mask);
+ wrmsrl(msr_b + msr_offset, mask);
cpuc->acr_cfg_b[idx] = mask;
}
/* Only need to update the reload value when there is a valid config value. */
if (mask && cpuc->acr_cfg_c[idx] != reload) {
- wrmsrl(msr_c + x86_pmu.addr_offset(idx, false), reload);
+ wrmsrl(msr_c + msr_offset, reload);
cpuc->acr_cfg_c[idx] = reload;
}
}
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
2025-05-29 8:02 [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error Dapeng Mi
2025-05-29 8:02 ` [PATCH 2/2] perf/x86/intel: Fix wrong index calculation in intel_pmu_config_acr() Dapeng Mi
@ 2025-05-29 13:54 ` Liang, Kan
2025-05-31 8:01 ` Ingo Molnar
2 siblings, 0 replies; 8+ messages in thread
From: Liang, Kan @ 2025-05-29 13:54 UTC (permalink / raw)
To: Dapeng Mi, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi
On 2025-05-29 4:02 a.m., Dapeng Mi wrote:
> When running perf_fuzzer on PTL, sometimes the below "unchecked MSR
> access error" is seen when accessing IA32_PMC_x_CFG_B MSRs.
>
> [ 55.611268] unchecked MSR access error: WRMSR to 0x1986 (tried to write 0x0000000200000001) at rIP: 0xffffffffac564b28 (native_write_msr+0x8/0x30)
> [ 55.611280] Call Trace:
> [ 55.611282] <TASK>
> [ 55.611284] ? intel_pmu_config_acr+0x87/0x160
> [ 55.611289] intel_pmu_enable_acr+0x6d/0x80
> [ 55.611291] intel_pmu_enable_event+0xce/0x460
> [ 55.611293] x86_pmu_start+0x78/0xb0
> [ 55.611297] x86_pmu_enable+0x218/0x3a0
> [ 55.611300] ? x86_pmu_enable+0x121/0x3a0
> [ 55.611302] perf_pmu_enable+0x40/0x50
> [ 55.611307] ctx_resched+0x19d/0x220
> [ 55.611309] __perf_install_in_context+0x284/0x2f0
> [ 55.611311] ? __pfx_remote_function+0x10/0x10
> [ 55.611314] remote_function+0x52/0x70
> [ 55.611317] ? __pfx_remote_function+0x10/0x10
> [ 55.611319] generic_exec_single+0x84/0x150
> [ 55.611323] smp_call_function_single+0xc5/0x1a0
> [ 55.611326] ? __pfx_remote_function+0x10/0x10
> [ 55.611329] perf_install_in_context+0xd1/0x1e0
> [ 55.611331] ? __pfx___perf_install_in_context+0x10/0x10
> [ 55.611333] __do_sys_perf_event_open+0xa76/0x1040
> [ 55.611336] __x64_sys_perf_event_open+0x26/0x30
> [ 55.611337] x64_sys_call+0x1d8e/0x20c0
> [ 55.611339] do_syscall_64+0x4f/0x120
> [ 55.611343] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> On PTL, GP counters 0 and 1 don't support the auto counter reload
> feature, so a #GP is triggered when trying to write 1 to bit 0 of the
> CFG_B MSR, which requests auto counter reload on GP counter 0.
>
> The root cause of this issue is an incorrect check of the auto counter
> reload (ACR) counter mask from user space in the
> intel_pmu_acr_late_setup() helper. It allows an invalid ACR counter
> mask from user space to be set into hw.config1 and then written into
> the CFG_B MSRs, triggering the MSR access warning.
>
> For example, a user may create a perf event with an ACR counter mask
> (config2=0xcb) while only 1 event is created, so "cpuc->n_events" is 1.
>
> The correct check condition should be "i + idx >= cpuc->n_events"
> instead of "i + idx > cpuc->n_events" (it looks like a typo). Otherwise,
> the counter mask loop would iterate one extra time and the invalid
> "cpuc->assign[1]" value (bit 0) would be set into hw.config1, causing
> the MSR access error.
>
> Besides, also check whether the events corresponding to the ACR counter
> mask are ACR events; if not, filter out those counter mask bits. If an
> event is not an ACR event, it could be scheduled to a HW counter which
> doesn't support ACR, and it's invalid to include its counter index in
> the ACR counter mask.
>
> Furthermore, remove the WARN_ON_ONCE() since it's easily triggered: a
> user could set any invalid ACR counter mask, and the warning message
> could mislead users.
>
> Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Thanks for the fix.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Thanks,
Kan
> ---
> arch/x86/events/intel/core.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 3a319cf6d364..8d046b1a237e 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2994,7 +2994,8 @@ static void intel_pmu_acr_late_setup(struct cpu_hw_events *cpuc)
> if (event->group_leader != leader->group_leader)
> break;
> for_each_set_bit(idx, (unsigned long *)&event->attr.config2, X86_PMC_IDX_MAX) {
> - if (WARN_ON_ONCE(i + idx > cpuc->n_events))
> + if (i + idx >= cpuc->n_events ||
> + !is_acr_event_group(cpuc->event_list[i + idx]))
> return;
> __set_bit(cpuc->assign[i + idx], (unsigned long *)&event->hw.config1);
> }
>
> base-commit: e7d952cc39fca34386ec9f15f68cb2eaac01b5ae
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] perf/x86/intel: Fix wrong index calculation in intel_pmu_config_acr()
2025-05-29 8:02 ` [PATCH 2/2] perf/x86/intel: Fix wrong index calculation in intel_pmu_config_acr() Dapeng Mi
@ 2025-05-29 13:55 ` Liang, Kan
0 siblings, 0 replies; 8+ messages in thread
From: Liang, Kan @ 2025-05-29 13:55 UTC (permalink / raw)
To: Dapeng Mi, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Andi Kleen, Eranian Stephane
Cc: linux-kernel, linux-perf-users, Dapeng Mi
On 2025-05-29 4:02 a.m., Dapeng Mi wrote:
> To calculate the fixed counter MSR address, INTEL_PMC_IDX_FIXED is
> subtracted from the HW counter index "idx". This causes the ACR mask
> values of fixed counters to be incorrectly saved to the GP counter
> positions in acr_cfg_b[]. For example, the ACR counter mask of fixed
> counter 0 should be saved to acr_cfg_b[32], but it's incorrectly saved
> to acr_cfg_b[0].
>
> Fix this issue by calculating the MSR address offset separately instead
> of modifying "idx".
>
> Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Thanks,
Kan
> ---
> arch/x86/events/intel/core.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 8d046b1a237e..b0fee684ec8c 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2899,6 +2899,7 @@ static void intel_pmu_config_acr(int idx, u64 mask, u32 reload)
> {
> struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> int msr_b, msr_c;
> + int msr_offset;
>
> if (!mask && !cpuc->acr_cfg_b[idx])
> return;
> @@ -2906,19 +2907,20 @@ static void intel_pmu_config_acr(int idx, u64 mask, u32 reload)
> if (idx < INTEL_PMC_IDX_FIXED) {
> msr_b = MSR_IA32_PMC_V6_GP0_CFG_B;
> msr_c = MSR_IA32_PMC_V6_GP0_CFG_C;
> + msr_offset = x86_pmu.addr_offset(idx, false);
> } else {
> msr_b = MSR_IA32_PMC_V6_FX0_CFG_B;
> msr_c = MSR_IA32_PMC_V6_FX0_CFG_C;
> - idx -= INTEL_PMC_IDX_FIXED;
> + msr_offset = x86_pmu.addr_offset(idx - INTEL_PMC_IDX_FIXED, false);
> }
>
> if (cpuc->acr_cfg_b[idx] != mask) {
> - wrmsrl(msr_b + x86_pmu.addr_offset(idx, false), mask);
> + wrmsrl(msr_b + msr_offset, mask);
> cpuc->acr_cfg_b[idx] = mask;
> }
> /* Only need to update the reload value when there is a valid config value. */
> if (mask && cpuc->acr_cfg_c[idx] != reload) {
> - wrmsrl(msr_c + x86_pmu.addr_offset(idx, false), reload);
> + wrmsrl(msr_c + msr_offset, reload);
> cpuc->acr_cfg_c[idx] = reload;
> }
> }
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
2025-05-29 8:02 [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error Dapeng Mi
2025-05-29 8:02 ` [PATCH 2/2] perf/x86/intel: Fix wrong index calculation in intel_pmu_config_acr() Dapeng Mi
2025-05-29 13:54 ` [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error Liang, Kan
@ 2025-05-31 8:01 ` Ingo Molnar
2025-06-03 2:14 ` Mi, Dapeng
2 siblings, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2025-05-31 8:01 UTC (permalink / raw)
To: Dapeng Mi
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane, linux-kernel,
linux-perf-users, Dapeng Mi
* Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
> When running perf_fuzzer on PTL, sometimes the below "unchecked MSR
> access error" is seen when accessing IA32_PMC_x_CFG_B MSRs.
>
> [ 55.611268] unchecked MSR access error: WRMSR to 0x1986 (tried to write 0x0000000200000001) at rIP: 0xffffffffac564b28 (native_write_msr+0x8/0x30)
> [ 55.611280] Call Trace:
> [ 55.611282] <TASK>
> [ 55.611284] ? intel_pmu_config_acr+0x87/0x160
> [ 55.611289] intel_pmu_enable_acr+0x6d/0x80
> [ 55.611291] intel_pmu_enable_event+0xce/0x460
> [ 55.611293] x86_pmu_start+0x78/0xb0
> [ 55.611297] x86_pmu_enable+0x218/0x3a0
> [ 55.611300] ? x86_pmu_enable+0x121/0x3a0
> [ 55.611302] perf_pmu_enable+0x40/0x50
> [ 55.611307] ctx_resched+0x19d/0x220
> [ 55.611309] __perf_install_in_context+0x284/0x2f0
> [ 55.611311] ? __pfx_remote_function+0x10/0x10
> [ 55.611314] remote_function+0x52/0x70
> [ 55.611317] ? __pfx_remote_function+0x10/0x10
> [ 55.611319] generic_exec_single+0x84/0x150
> [ 55.611323] smp_call_function_single+0xc5/0x1a0
> [ 55.611326] ? __pfx_remote_function+0x10/0x10
> [ 55.611329] perf_install_in_context+0xd1/0x1e0
> [ 55.611331] ? __pfx___perf_install_in_context+0x10/0x10
> [ 55.611333] __do_sys_perf_event_open+0xa76/0x1040
> [ 55.611336] __x64_sys_perf_event_open+0x26/0x30
> [ 55.611337] x64_sys_call+0x1d8e/0x20c0
> [ 55.611339] do_syscall_64+0x4f/0x120
> [ 55.611343] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> On PTL, GP counters 0 and 1 don't support the auto counter reload
> feature, so a #GP is triggered when trying to write 1 to bit 0 of the
> CFG_B MSR, which requests auto counter reload on GP counter 0.
>
> The root cause of this issue is an incorrect check of the auto counter
> reload (ACR) counter mask from user space in the
> intel_pmu_acr_late_setup() helper. It allows an invalid ACR counter
> mask from user space to be set into hw.config1 and then written into
> the CFG_B MSRs, triggering the MSR access warning.
>
> For example, a user may create a perf event with an ACR counter mask
> (config2=0xcb) while only 1 event is created, so "cpuc->n_events" is 1.
>
> The correct check condition should be "i + idx >= cpuc->n_events"
> instead of "i + idx > cpuc->n_events" (it looks like a typo). Otherwise,
> the counter mask loop would iterate one extra time and the invalid
> "cpuc->assign[1]" value (bit 0) would be set into hw.config1, causing
> the MSR access error.
>
> Besides, also check whether the events corresponding to the ACR counter
> mask are ACR events; if not, filter out those counter mask bits. If an
> event is not an ACR event, it could be scheduled to a HW counter which
> doesn't support ACR, and it's invalid to include its counter index in
> the ACR counter mask.
>
> Furthermore, remove the WARN_ON_ONCE() since it's easily triggered: a
> user could set any invalid ACR counter mask, and the warning message
> could mislead users.
>
> Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
> arch/x86/events/intel/core.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 3a319cf6d364..8d046b1a237e 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2994,7 +2994,8 @@ static void intel_pmu_acr_late_setup(struct cpu_hw_events *cpuc)
> if (event->group_leader != leader->group_leader)
> break;
> for_each_set_bit(idx, (unsigned long *)&event->attr.config2, X86_PMC_IDX_MAX) {
> - if (WARN_ON_ONCE(i + idx > cpuc->n_events))
> + if (i + idx >= cpuc->n_events ||
> + !is_acr_event_group(cpuc->event_list[i + idx]))
> return;
Is this a normal condition?
- If it's normal then the 'return' is destructive, isn't it? Shouldn't
it be a 'break', if this condition is legitimate?
- If it's not normal, then the WARN_ON_ONCE() was justified, right?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
2025-05-31 8:01 ` Ingo Molnar
@ 2025-06-03 2:14 ` Mi, Dapeng
2025-06-19 1:11 ` Mi, Dapeng
2025-07-11 2:40 ` Mi, Dapeng
0 siblings, 2 replies; 8+ messages in thread
From: Mi, Dapeng @ 2025-06-03 2:14 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane, linux-kernel,
linux-perf-users, Dapeng Mi
On 5/31/2025 4:01 PM, Ingo Molnar wrote:
> * Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>
>> When running perf_fuzzer on PTL, sometimes the below "unchecked MSR
>> access error" is seen when accessing IA32_PMC_x_CFG_B MSRs.
>>
>> [ 55.611268] unchecked MSR access error: WRMSR to 0x1986 (tried to write 0x0000000200000001) at rIP: 0xffffffffac564b28 (native_write_msr+0x8/0x30)
>> [ 55.611280] Call Trace:
>> [ 55.611282] <TASK>
>> [ 55.611284] ? intel_pmu_config_acr+0x87/0x160
>> [ 55.611289] intel_pmu_enable_acr+0x6d/0x80
>> [ 55.611291] intel_pmu_enable_event+0xce/0x460
>> [ 55.611293] x86_pmu_start+0x78/0xb0
>> [ 55.611297] x86_pmu_enable+0x218/0x3a0
>> [ 55.611300] ? x86_pmu_enable+0x121/0x3a0
>> [ 55.611302] perf_pmu_enable+0x40/0x50
>> [ 55.611307] ctx_resched+0x19d/0x220
>> [ 55.611309] __perf_install_in_context+0x284/0x2f0
>> [ 55.611311] ? __pfx_remote_function+0x10/0x10
>> [ 55.611314] remote_function+0x52/0x70
>> [ 55.611317] ? __pfx_remote_function+0x10/0x10
>> [ 55.611319] generic_exec_single+0x84/0x150
>> [ 55.611323] smp_call_function_single+0xc5/0x1a0
>> [ 55.611326] ? __pfx_remote_function+0x10/0x10
>> [ 55.611329] perf_install_in_context+0xd1/0x1e0
>> [ 55.611331] ? __pfx___perf_install_in_context+0x10/0x10
>> [ 55.611333] __do_sys_perf_event_open+0xa76/0x1040
>> [ 55.611336] __x64_sys_perf_event_open+0x26/0x30
>> [ 55.611337] x64_sys_call+0x1d8e/0x20c0
>> [ 55.611339] do_syscall_64+0x4f/0x120
>> [ 55.611343] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> On PTL, GP counters 0 and 1 don't support the auto counter reload
>> feature, so a #GP is triggered when trying to write 1 to bit 0 of the
>> CFG_B MSR, which requests auto counter reload on GP counter 0.
>>
>> The root cause of this issue is an incorrect check of the auto counter
>> reload (ACR) counter mask from user space in the
>> intel_pmu_acr_late_setup() helper. It allows an invalid ACR counter
>> mask from user space to be set into hw.config1 and then written into
>> the CFG_B MSRs, triggering the MSR access warning.
>>
>> For example, a user may create a perf event with an ACR counter mask
>> (config2=0xcb) while only 1 event is created, so "cpuc->n_events" is 1.
>>
>> The correct check condition should be "i + idx >= cpuc->n_events"
>> instead of "i + idx > cpuc->n_events" (it looks like a typo). Otherwise,
>> the counter mask loop would iterate one extra time and the invalid
>> "cpuc->assign[1]" value (bit 0) would be set into hw.config1, causing
>> the MSR access error.
>>
>> Besides, also check whether the events corresponding to the ACR counter
>> mask are ACR events; if not, filter out those counter mask bits. If an
>> event is not an ACR event, it could be scheduled to a HW counter which
>> doesn't support ACR, and it's invalid to include its counter index in
>> the ACR counter mask.
>>
>> Furthermore, remove the WARN_ON_ONCE() since it's easily triggered: a
>> user could set any invalid ACR counter mask, and the warning message
>> could mislead users.
>>
>> Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> ---
>> arch/x86/events/intel/core.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 3a319cf6d364..8d046b1a237e 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -2994,7 +2994,8 @@ static void intel_pmu_acr_late_setup(struct cpu_hw_events *cpuc)
>> if (event->group_leader != leader->group_leader)
>> break;
>> for_each_set_bit(idx, (unsigned long *)&event->attr.config2, X86_PMC_IDX_MAX) {
>> - if (WARN_ON_ONCE(i + idx > cpuc->n_events))
>> + if (i + idx >= cpuc->n_events ||
>> + !is_acr_event_group(cpuc->event_list[i + idx]))
>> return;
> Is this a normal condition?
>
> - If it's normal then the 'return' is destructive, isn't it? Shouldn't
> it be a 'break', if this condition is legitimate?
>
> - If it's not normal, then the WARN_ON_ONCE() was justified, right?
It's not normal. Strictly speaking, it's an invalid user configuration.
It doesn't seem reasonable to trigger a kernel-space warning for an
invalid user-space configuration; it would mislead users into thinking
something is wrong in the kernel. That's why the WARN_ON_ONCE() is
removed. Thanks.
>
> Thanks,
>
> Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
2025-06-03 2:14 ` Mi, Dapeng
@ 2025-06-19 1:11 ` Mi, Dapeng
2025-07-11 2:40 ` Mi, Dapeng
1 sibling, 0 replies; 8+ messages in thread
From: Mi, Dapeng @ 2025-06-19 1:11 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane, linux-kernel,
linux-perf-users, Dapeng Mi
Kindly ping... Thanks.
On 6/3/2025 10:14 AM, Mi, Dapeng wrote:
> On 5/31/2025 4:01 PM, Ingo Molnar wrote:
>> * Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>>
>>> When running perf_fuzzer on PTL, sometimes the below "unchecked MSR
>>> access error" is seen when accessing IA32_PMC_x_CFG_B MSRs.
>>>
>>> [ 55.611268] unchecked MSR access error: WRMSR to 0x1986 (tried to write 0x0000000200000001) at rIP: 0xffffffffac564b28 (native_write_msr+0x8/0x30)
>>> [ 55.611280] Call Trace:
>>> [ 55.611282] <TASK>
>>> [ 55.611284] ? intel_pmu_config_acr+0x87/0x160
>>> [ 55.611289] intel_pmu_enable_acr+0x6d/0x80
>>> [ 55.611291] intel_pmu_enable_event+0xce/0x460
>>> [ 55.611293] x86_pmu_start+0x78/0xb0
>>> [ 55.611297] x86_pmu_enable+0x218/0x3a0
>>> [ 55.611300] ? x86_pmu_enable+0x121/0x3a0
>>> [ 55.611302] perf_pmu_enable+0x40/0x50
>>> [ 55.611307] ctx_resched+0x19d/0x220
>>> [ 55.611309] __perf_install_in_context+0x284/0x2f0
>>> [ 55.611311] ? __pfx_remote_function+0x10/0x10
>>> [ 55.611314] remote_function+0x52/0x70
>>> [ 55.611317] ? __pfx_remote_function+0x10/0x10
>>> [ 55.611319] generic_exec_single+0x84/0x150
>>> [ 55.611323] smp_call_function_single+0xc5/0x1a0
>>> [ 55.611326] ? __pfx_remote_function+0x10/0x10
>>> [ 55.611329] perf_install_in_context+0xd1/0x1e0
>>> [ 55.611331] ? __pfx___perf_install_in_context+0x10/0x10
>>> [ 55.611333] __do_sys_perf_event_open+0xa76/0x1040
>>> [ 55.611336] __x64_sys_perf_event_open+0x26/0x30
>>> [ 55.611337] x64_sys_call+0x1d8e/0x20c0
>>> [ 55.611339] do_syscall_64+0x4f/0x120
>>> [ 55.611343] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>
>>> On PTL, GP counters 0 and 1 don't support the auto counter reload
>>> feature, so a #GP is triggered when trying to write 1 to bit 0 of the
>>> CFG_B MSR, which requests auto counter reload on GP counter 0.
>>>
>>> The root cause of this issue is an incorrect check of the auto counter
>>> reload (ACR) counter mask from user space in the
>>> intel_pmu_acr_late_setup() helper. It allows an invalid ACR counter
>>> mask from user space to be set into hw.config1 and then written into
>>> the CFG_B MSRs, triggering the MSR access warning.
>>>
>>> For example, a user may create a perf event with an ACR counter mask
>>> (config2=0xcb) while only 1 event is created, so "cpuc->n_events" is 1.
>>>
>>> The correct check condition should be "i + idx >= cpuc->n_events"
>>> instead of "i + idx > cpuc->n_events" (it looks like a typo). Otherwise,
>>> the counter mask loop would iterate one extra time and the invalid
>>> "cpuc->assign[1]" value (bit 0) would be set into hw.config1, causing
>>> the MSR access error.
>>>
>>> Besides, also check whether the events corresponding to the ACR counter
>>> mask are ACR events; if not, filter out those counter mask bits. If an
>>> event is not an ACR event, it could be scheduled to a HW counter which
>>> doesn't support ACR, and it's invalid to include its counter index in
>>> the ACR counter mask.
>>>
>>> Furthermore, remove the WARN_ON_ONCE() since it's easily triggered: a
>>> user could set any invalid ACR counter mask, and the warning message
>>> could mislead users.
>>>
>>> Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
>>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>> ---
>>> arch/x86/events/intel/core.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>>> index 3a319cf6d364..8d046b1a237e 100644
>>> --- a/arch/x86/events/intel/core.c
>>> +++ b/arch/x86/events/intel/core.c
>>> @@ -2994,7 +2994,8 @@ static void intel_pmu_acr_late_setup(struct cpu_hw_events *cpuc)
>>> if (event->group_leader != leader->group_leader)
>>> break;
>>> for_each_set_bit(idx, (unsigned long *)&event->attr.config2, X86_PMC_IDX_MAX) {
>>> - if (WARN_ON_ONCE(i + idx > cpuc->n_events))
>>> + if (i + idx >= cpuc->n_events ||
>>> + !is_acr_event_group(cpuc->event_list[i + idx]))
>>> return;
>> Is this a normal condition?
>>
>> - If it's normal then the 'return' is destructive, isn't it? Shouldn't
>> it be a 'break', if this condition is legitimate?
>>
>> - If it's not normal, then the WARN_ON_ONCE() was justified, right?
> It's not normal. Strictly speaking, it's an invalid user configuration.
> It doesn't seem reasonable to trigger a kernel-space warning for an
> invalid user-space configuration; it would mislead users into thinking
> something is wrong in the kernel. That's why the WARN_ON_ONCE() is
> removed. Thanks.
>
>
>> Thanks,
>>
>> Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error
2025-06-03 2:14 ` Mi, Dapeng
2025-06-19 1:11 ` Mi, Dapeng
@ 2025-07-11 2:40 ` Mi, Dapeng
1 sibling, 0 replies; 8+ messages in thread
From: Mi, Dapeng @ 2025-07-11 2:40 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang, Andi Kleen, Eranian Stephane, linux-kernel,
linux-perf-users, Dapeng Mi
Kindly ping.
@Ingo @Peter, not sure if you have bandwidth to review this change? Thanks.
On 6/3/2025 10:14 AM, Mi, Dapeng wrote:
> On 5/31/2025 4:01 PM, Ingo Molnar wrote:
>> * Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>>
>>> When running perf_fuzzer on PTL, sometimes the below "unchecked MSR
>>> access error" is seen when accessing IA32_PMC_x_CFG_B MSRs.
>>>
>>> [ 55.611268] unchecked MSR access error: WRMSR to 0x1986 (tried to write 0x0000000200000001) at rIP: 0xffffffffac564b28 (native_write_msr+0x8/0x30)
>>> [ 55.611280] Call Trace:
>>> [ 55.611282] <TASK>
>>> [ 55.611284] ? intel_pmu_config_acr+0x87/0x160
>>> [ 55.611289] intel_pmu_enable_acr+0x6d/0x80
>>> [ 55.611291] intel_pmu_enable_event+0xce/0x460
>>> [ 55.611293] x86_pmu_start+0x78/0xb0
>>> [ 55.611297] x86_pmu_enable+0x218/0x3a0
>>> [ 55.611300] ? x86_pmu_enable+0x121/0x3a0
>>> [ 55.611302] perf_pmu_enable+0x40/0x50
>>> [ 55.611307] ctx_resched+0x19d/0x220
>>> [ 55.611309] __perf_install_in_context+0x284/0x2f0
>>> [ 55.611311] ? __pfx_remote_function+0x10/0x10
>>> [ 55.611314] remote_function+0x52/0x70
>>> [ 55.611317] ? __pfx_remote_function+0x10/0x10
>>> [ 55.611319] generic_exec_single+0x84/0x150
>>> [ 55.611323] smp_call_function_single+0xc5/0x1a0
>>> [ 55.611326] ? __pfx_remote_function+0x10/0x10
>>> [ 55.611329] perf_install_in_context+0xd1/0x1e0
>>> [ 55.611331] ? __pfx___perf_install_in_context+0x10/0x10
>>> [ 55.611333] __do_sys_perf_event_open+0xa76/0x1040
>>> [ 55.611336] __x64_sys_perf_event_open+0x26/0x30
>>> [ 55.611337] x64_sys_call+0x1d8e/0x20c0
>>> [ 55.611339] do_syscall_64+0x4f/0x120
>>> [ 55.611343] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>
>>> On PTL, GP counters 0 and 1 don't support the auto counter reload
>>> feature, so a #GP is triggered when trying to write 1 to bit 0 of the
>>> CFG_B MSR, which requests auto counter reload on GP counter 0.
>>>
>>> The root cause of this issue is an incorrect check of the auto counter
>>> reload (ACR) counter mask from user space in the
>>> intel_pmu_acr_late_setup() helper. It allows an invalid ACR counter
>>> mask from user space to be set into hw.config1 and then written into
>>> the CFG_B MSRs, triggering the MSR access warning.
>>>
>>> For example, a user may create a perf event with an ACR counter mask
>>> (config2=0xcb) while only 1 event is created, so "cpuc->n_events" is 1.
>>>
>>> The correct check condition should be "i + idx >= cpuc->n_events"
>>> instead of "i + idx > cpuc->n_events" (it looks like a typo). Otherwise,
>>> the counter mask loop would iterate one extra time and the invalid
>>> "cpuc->assign[1]" value (bit 0) would be set into hw.config1, causing
>>> the MSR access error.
>>>
>>> Besides, also check whether the events corresponding to the ACR counter
>>> mask are ACR events; if not, filter out those counter mask bits. If an
>>> event is not an ACR event, it could be scheduled to a HW counter which
>>> doesn't support ACR, and it's invalid to include its counter index in
>>> the ACR counter mask.
>>>
>>> Furthermore, remove the WARN_ON_ONCE() since it's easily triggered: a
>>> user could set any invalid ACR counter mask, and the warning message
>>> could mislead users.
>>>
>>> Fixes: ec980e4facef ("perf/x86/intel: Support auto counter reload")
>>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>> ---
>>> arch/x86/events/intel/core.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>>> index 3a319cf6d364..8d046b1a237e 100644
>>> --- a/arch/x86/events/intel/core.c
>>> +++ b/arch/x86/events/intel/core.c
>>> @@ -2994,7 +2994,8 @@ static void intel_pmu_acr_late_setup(struct cpu_hw_events *cpuc)
>>> if (event->group_leader != leader->group_leader)
>>> break;
>>> for_each_set_bit(idx, (unsigned long *)&event->attr.config2, X86_PMC_IDX_MAX) {
>>> - if (WARN_ON_ONCE(i + idx > cpuc->n_events))
>>> + if (i + idx >= cpuc->n_events ||
>>> + !is_acr_event_group(cpuc->event_list[i + idx]))
>>> return;
>> Is this a normal condition?
>>
>> - If it's normal then the 'return' is destructive, isn't it? Shouldn't
>> it be a 'break', if this condition is legitimate?
>>
>> - If it's not normal, then the WARN_ON_ONCE() was justified, right?
> It's not normal. Strictly speaking, it's an invalid user configuration.
> It doesn't seem reasonable to trigger a kernel-space warning for an
> invalid user-space configuration; it would mislead users into thinking
> something is wrong in the kernel. That's why the WARN_ON_ONCE() is
> removed. Thanks.
>
>
>> Thanks,
>>
>> Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-07-11 2:40 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-29 8:02 [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error Dapeng Mi
2025-05-29 8:02 ` [PATCH 2/2] perf/x86/intel: Fix wrong index calculation in intel_pmu_config_acr() Dapeng Mi
2025-05-29 13:55 ` Liang, Kan
2025-05-29 13:54 ` [PATCH 1/2] perf/x86/intel: Fix IA32_PMC_x_CFG_B MSRs access error Liang, Kan
2025-05-31 8:01 ` Ingo Molnar
2025-06-03 2:14 ` Mi, Dapeng
2025-06-19 1:11 ` Mi, Dapeng
2025-07-11 2:40 ` Mi, Dapeng
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).