[perf] unchecked MSR access error: WRMSR to 0x3f1

linux-perf-users.vger.kernel.org archive mirror

* [perf] unchecked MSR access error: WRMSR to 0x3f1
@ 2025-06-17 15:39 Vince Weaver
  2025-06-17 15:50 ` Abhigyan ghosh
  2025-06-17 19:47 ` Liang, Kan
  0 siblings, 2 replies; 11+ messages in thread
From: Vince Weaver @ 2025-06-17 15:39 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users
  Cc: Liang, Kan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter

Hello

When running the perf_fuzzer on a Raptor Lake machine I get an
	unchecked MSR access error: WRMSR to 0x3f1
error (see below).

A similar message happened before back in 2021 and was fixed in
commit 2dc0572f2cef87425147658698dce2600b799bd3 so not sure if this is the 
same problem or something new.

Vince Weaver
vincent.weaver@maine.edu

[12646.001692] unchecked MSR access error: WRMSR to 0x3f1 (tried to write 0x0001000000000001) at rIP: 0xffffffffa98932af (native_write_msr+0xf/0x20)
[12646.001698] Call Trace:
[12646.001700]  <TASK>
[12646.001700]  intel_pmu_pebs_enable_all+0x2c/0x40
[12646.001703]  intel_pmu_enable_all+0xe/0x20
[12646.001705]  ctx_resched+0x227/0x280
[12646.001708]  event_function+0x8f/0xd0
[12646.001710]  ? __pfx___perf_event_enable+0x10/0x10
[12646.001711]  remote_function+0x42/0x50
[12646.001713]  ? __pfx_remote_function+0x10/0x10
[12646.001714]  generic_exec_single+0x6d/0x130
[12646.001715]  smp_call_function_single+0xee/0x140
[12646.001716]  ? __pfx_remote_function+0x10/0x10
[12646.001717]  event_function_call+0x9f/0x1c0
[12646.001718]  ? __pfx___perf_event_enable+0x10/0x10
[12646.001720]  ? __pfx_event_function+0x10/0x10
[12646.001721]  perf_event_task_enable+0x7b/0x100
[12646.001723]  __do_sys_prctl+0x56f/0xca0
[12646.001725]  do_syscall_64+0x84/0x2f0
[12646.001727]  ? exit_to_user_mode_loop+0xcd/0x120
[12646.001729]  ? do_syscall_64+0x1ef/0x2f0
[12646.001730]  ? try_to_wake_up+0x7e/0x640
[12646.001732]  ? complete_signal+0x2e8/0x350
[12646.001734]  ? __send_signal_locked+0x2e3/0x450
[12646.001735]  ? send_signal_locked+0xb6/0x120
[12646.001736]  ? do_send_sig_info+0x6e/0xc0
[12646.001737]  ? kill_pid_info_type+0xa6/0xc0
[12646.001738]  ? kill_something_info+0x167/0x1a0
[12646.001739]  ? syscall_exit_work+0x132/0x140
[12646.001740]  ? do_syscall_64+0xbc/0x2f0
[12646.001741]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[12646.001743] RIP: 0033:0x7efe86afd40d
[12646.001744] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 18 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 9d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 1b 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00
[12646.001745] RSP: 002b:00007ffcd6444cf0 EFLAGS: 00000246 ORIG_RAX: 000000000000009d
[12646.001746] RAX: ffffffffffffffda RBX: 000000000000000e RCX: 00007efe86afd40d
[12646.001747] RDX: 0000000000000001 RSI: 00007ffcd6444d24 RDI: 0000000000000020
[12646.001747] RBP: 00007ffcd6444d60 R08: 00007efe86bc625c R09: 00007efe86bc6260
[12646.001748] R10: 00007efe86bc6250 R11: 0000000000000246 R12: 0000000000000000
[12646.001748] R13: 00007ffcd64471b8 R14: 0000559eb2a2edd8 R15: 00007efe86c30020
[12646.001749]  </TASK>



* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
  2025-06-17 15:39 [perf] unchecked MSR access error: WRMSR to 0x3f1 Vince Weaver
@ 2025-06-17 15:50 ` Abhigyan ghosh
  2025-06-17 19:47 ` Liang, Kan
  1 sibling, 0 replies; 11+ messages in thread
From: Abhigyan ghosh @ 2025-06-17 15:50 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-kernel, linux-perf-users

Hi Vince,

Thanks for sharing the report.

The WRMSR to 0x3f1 stood out; it seems similar to the one handled in commit 2dc0572f2cef back in 2021. Has 0x3f1 popped up before, or could this be a new MSR usage pattern tied to recent PEBS changes?

Also, do you think a quirk-based mask or trap filter around this could be a cleaner way to handle this in the fuzzer context, especially for newer Intel platforms?

Let me know your thoughts.

Best,  
Abhigyan Ghosh


* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
  2025-06-17 15:39 [perf] unchecked MSR access error: WRMSR to 0x3f1 Vince Weaver
  2025-06-17 15:50 ` Abhigyan ghosh
@ 2025-06-17 19:47 ` Liang, Kan
  2025-06-18  3:49   ` Vince Weaver
  1 sibling, 1 reply; 11+ messages in thread
From: Liang, Kan @ 2025-06-17 19:47 UTC (permalink / raw)
  To: Vince Weaver, linux-kernel, linux-perf-users
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter



On 2025-06-17 11:39 a.m., Vince Weaver wrote:
> Hello
> 
> When running the perf_fuzzer on a Raptor Lake machine I get an
> 	unchecked MSR access error: WRMSR to 0x3f1
> error (see below).
> 
> A similar message happened before back in 2021 and was fixed in
> commit 2dc0572f2cef87425147658698dce2600b799bd3 so not sure if this is the 
> same problem or something new.

The commit 2dc0572f2cef was triggered by the fake event VLBR_EVENT.
But this error should be triggered by the Topdown perf metrics event,
INTEL_TD_METRIC_RETIRING, which uses the idx 48 internally.

Perf metrics events have never been supported in sampling mode, and PEBS
cannot be enabled in counting mode. So it's odd that cpuc->pebs_enabled
has the idx 48 bit set.

The recent change I did for the PEBS is commit e02e9b0374c3
"perf/x86/intel: Support PEBS counters snapshotting". But it should not
impact the above.

Could you please help with the questions below?
- It only happens on the p-core, right?
- Which kernel base do you use? Is it 6.16-rc2?
- Can this be easily reproduced?
  Is it possible to bisect the error commit? (Maybe start from the
commit e02e9b0374c3?)

Thanks,
Kan

* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
  2025-06-17 19:47 ` Liang, Kan
@ 2025-06-18  3:49   ` Vince Weaver
  2025-06-18 11:02     ` Liang, Kan
  0 siblings, 1 reply; 11+ messages in thread
From: Vince Weaver @ 2025-06-18  3:49 UTC (permalink / raw)
  To: Liang, Kan
  Cc: linux-kernel, linux-perf-users, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter

On Tue, 17 Jun 2025, Liang, Kan wrote:

> The commit 2dc0572f2cef was triggered by the fake event VLBR_EVENT.
> But this error should be triggered by the Topdown perf metrics event,
> INTEL_TD_METRIC_RETIRING, which uses the idx 48 internally.
> 
> We never support perf metrics events in sampling mode. The PEBS cannot
> be enabled in counting mode. So it's weird the cpuc->pebs_enabled has
> the idx 48 set.
> 
> The recent change I did for the PEBS is commit e02e9b0374c3
> "perf/x86/intel: Support PEBS counters snapshotting". But it should not
> impact the above.
> 
> Could you please help on the below questions?
> - It only happens on the p-core, right?

how would I tell?  I don't think the error message says what CPU it 
happens on?

> - Which kernel base do you use? Is it 6.16-rc2?

I was running just before -rc1.  I've updated to current git but didn't 
realize the throttle fix hadn't made it upstream yet so managed to lock up 
the machine and not sure when I'll be able to get over to reboot it.

> - Can this be easily reproduced?

probably.  It's another thing that's a pain to check because it's a 
WARN_ONCE I think so I have to reboot in order to see.  Even if it's not 
reproducible the fuzzer usually hits it within a few hours.

>   Is it possible to bisect the error commit? (Maybe start from the
> commit e02e9b0374c3?)

Maybe but I'd only like to do that as a last resort as it's a pain to 
build and reboot kernels on this machine (for secureboot and other 
reasons).  Also I suppose I'd have to manually apply the throttle patch 
while bisecting.

Vince Weaver
vincent.weaver@maine.edu


* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
  2025-06-18  3:49   ` Vince Weaver
@ 2025-06-18 11:02     ` Liang, Kan
  2025-06-18 18:26       ` Vince Weaver
  0 siblings, 1 reply; 11+ messages in thread
From: Liang, Kan @ 2025-06-18 11:02 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, linux-perf-users, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter



On 2025-06-17 11:49 p.m., Vince Weaver wrote:
> On Tue, 17 Jun 2025, Liang, Kan wrote:
> 
>> The commit 2dc0572f2cef was triggered by the fake event VLBR_EVENT.
>> But this error should be triggered by the Topdown perf metrics event,
>> INTEL_TD_METRIC_RETIRING, which uses the idx 48 internally.
>>
>> We never support perf metrics events in sampling mode. The PEBS cannot
>> be enabled in counting mode. So it's weird the cpuc->pebs_enabled has
>> the idx 48 set.
>>
>> The recent change I did for the PEBS is commit e02e9b0374c3
>> "perf/x86/intel: Support PEBS counters snapshotting". But it should not
>> impact the above.
>>
>> Could you please help on the below questions?
>> - It only happens on the p-core, right?
> 
> how would I tell?  I don't think the error message says what CPU it 
> happens on?

No, the error message doesn't say it. I just wanted to check whether you
have extra information, because Topdown perf metrics are only supported on
p-cores. I want to understand whether the code gets confused by the e-cores.

> 
>> - Which kernel base do you use? Is it 6.16-rc2?
> 
> I was running just before -rc1.  I've updated to current git but didn't 
> realize the throttle fix hadn't made it upstream yet so managed to lock up 
> the machine and not sure when I'll be able to get over to reboot it.
>

The throttle fix is not in rc2 either. I guess it should be included in rc3.


>> - Can this be easily reproduced?
> 
> probably.  It's another thing that's a pain to check because it's a 
> WARN_ONCE I think so I have to reboot in order to see.  Even if it's not 
> reproducible the fuzzer usually hits it within a few hours.

OK. I will try to reproduce it locally.

> 
>>   Is it possible to bisect the error commit? (Maybe start from the
>> commit e02e9b0374c3?)
> 
> Maybe but I'd only like to do that as a last resort as it's a pain to 
> build and reboot kernels on this machine (for secureboot and other 
> reasons).  


Sure.

Thanks,
Kan


* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
  2025-06-18 11:02     ` Liang, Kan
@ 2025-06-18 18:26       ` Vince Weaver
  2025-06-19 15:17         ` Vince Weaver
  0 siblings, 1 reply; 11+ messages in thread
From: Vince Weaver @ 2025-06-18 18:26 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Vince Weaver, linux-kernel, linux-perf-users, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter

On Wed, 18 Jun 2025, Liang, Kan wrote:

> No, the error message doesn't say it. Just want to check if you have
> extra information. Because the Topdown perf metrics is only supported on
> p-core. I want to understand whether the code messes up with e-core.

I can't easily tell from the fuzzer as it intentionally switches cores 
often.  I guess I could patch the kernel to report CPU when the WRMSR 
error triggers.

> > I was running just before -rc1.  I've updated to current git but didn't 
> > realize the throttle fix hadn't made it upstream yet so managed to lock up 
> > the machine and not sure when I'll be able to get over to reboot it.
> >
> 
> They are not in rc2 as well. I guess it should be included in rc3.

OK I am running rc2 now (Well, whatever current git is) with the throttle 
fix applied.  The throttle crash is something else, it crashes my test 
machine so hard that even the power button doesn't work, I have to 
physically unplug the machine to reboot it.

> >> - Can this be easily reproduced?
> > 
> > probably.  It's another thing that's a pain to check because it's a 
> > WARN_ONCE I think so I have to reboot in order to see.  Even if it's not 
> > reproducible the fuzzer usually hits it within a few hours.

I am able to reproduce the error on -rc2 using a specific fuzzer random 
seed.  I can possibly try to create a simpler test case but that would be 
a bit of effort.

Vince Weaver
vincent.weaver@maine.edu


* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
  2025-06-18 18:26       ` Vince Weaver
@ 2025-06-19 15:17         ` Vince Weaver
  2025-06-19 16:06           ` Liang, Kan
  0 siblings, 1 reply; 11+ messages in thread
From: Vince Weaver @ 2025-06-19 15:17 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Liang, Kan, linux-kernel, linux-perf-users, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter

On Wed, 18 Jun 2025, Vince Weaver wrote:

> On Wed, 18 Jun 2025, Liang, Kan wrote:
> 
> > No, the error message doesn't say it. Just want to check if you have
> > extra information. Because the Topdown perf metrics is only supported on
> > p-core. I want to understand whether the code messes up with e-core.
> 
> I can't easily tell from the fuzzer as it intentionally switches cores 
> often.  I guess I could patch the kernel to report CPU when the WRMSR 
> error triggers.

I've patched the kernel to get rid of the warn_once() and added a printk
for smp_processor_id()  (is that what I want to print?)  In any case that 
reports the warning is happening on CPU1 which is actually a P core, not 
an atom core.

Vince Weaver
vincent.weaver@maine.edu


* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
  2025-06-19 15:17         ` Vince Weaver
@ 2025-06-19 16:06           ` Liang, Kan
  2025-06-19 20:10             ` Vince Weaver
  0 siblings, 1 reply; 11+ messages in thread
From: Liang, Kan @ 2025-06-19 16:06 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, linux-perf-users, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter



On 2025-06-19 11:17 a.m., Vince Weaver wrote:
> On Wed, 18 Jun 2025, Vince Weaver wrote:
> 
>> On Wed, 18 Jun 2025, Liang, Kan wrote:
>>
>>> No, the error message doesn't say it. Just want to check if you have
>>> extra information. Because the Topdown perf metrics is only supported on
>>> p-core. I want to understand whether the code messes up with e-core.
>>
>> I can't easily tell from the fuzzer as it intentionally switches cores 
>> often.  I guess I could patch the kernel to report CPU when the WRMSR 
>> error triggers.
> 
> I've patched the kernel to get rid of the warn_once() and added a printk
> for smp_processor_id()  (is that what I want to print?)  In any case that 
> reports the warning is happening on CPU1 which is actually a P core, not 
> an atom core.

Thanks for the confirmation.
I've tried the fuzzer on some newer machines (later than Raptor Lake), but I
haven't reproduced it yet. I will try to find a Raptor Lake machine for more
tests.

Thanks,
Kan



* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
  2025-06-19 16:06           ` Liang, Kan
@ 2025-06-19 20:10             ` Vince Weaver
  2025-06-20 11:07               ` Liang, Kan
  0 siblings, 1 reply; 11+ messages in thread
From: Vince Weaver @ 2025-06-19 20:10 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Vince Weaver, linux-kernel, linux-perf-users, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter

On Thu, 19 Jun 2025, Liang, Kan wrote:

> 
> 
> On 2025-06-19 11:17 a.m., Vince Weaver wrote:
> > On Wed, 18 Jun 2025, Vince Weaver wrote:
> > 
> >> On Wed, 18 Jun 2025, Liang, Kan wrote:
> >>
> >>> No, the error message doesn't say it. Just want to check if you have
> >>> extra information. Because the Topdown perf metrics is only supported on
> >>> p-core. I want to understand whether the code messes up with e-core.
> >>
> >> I can't easily tell from the fuzzer as it intentionally switches cores 
> >> often.  I guess I could patch the kernel to report CPU when the WRMSR 
> >> error triggers.
> > 
> > I've patched the kernel to get rid of the warn_once() and added a printk
> > for smp_processor_id()  (is that what I want to print?)  In any case that 
> > reports the warning is happening on CPU1 which is actually a P core, not 
> > an atom core.
> 
> Thanks for the confirmation.
> I've tried fuzzer in some newer machines (later than raptor-lake), but I
> haven't reproduce it yet. I will try to find a raptor-lake for more tests.

I've managed to use the perf_fuzzer tools to create a small reproducible 
test case that can trigger the bug.  It's included below.

Vince

---


/* WRMSR top-down reproducer */
/* by Vince Weaver <vincent.weaver _at_ maine.edu> */

#define _GNU_SOURCE 1
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <sys/prctl.h>
#include <sys/wait.h>
#include <poll.h>
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <sched.h>

static int fd[1024];
static struct perf_event_attr pe[1024];

FILE *fff;
static int result;

int perf_event_open(struct perf_event_attr *hw_event_uptr,
	pid_t pid, int cpu, int group_fd, unsigned long flags) {

	return syscall(__NR_perf_event_open,hw_event_uptr, pid, cpu,
		group_fd, flags);
}

int main(int argc, char **argv) {

	int i;
	for(i=0;i<1024;i++) fd[i]=-1;

/* 1 */
/* fd = 72 */

	memset(&pe[72],0,sizeof(struct perf_event_attr));
	pe[72].type=PERF_TYPE_RAW;
	pe[72].config=0xffff880000008000ULL;
	pe[72].sample_freq=0x49ULL;
	pe[72].sample_type=PERF_SAMPLE_TID|PERF_SAMPLE_ADDR|PERF_SAMPLE_READ|PERF_SAMPLE_CPU; /* 9a */
	pe[72].read_format=PERF_FORMAT_ID|PERF_FORMAT_GROUP|0x10ULL; /* 1c */
	pe[72].exclude_user=1;
	pe[72].exclude_kernel=1;
	pe[72].mmap=1;
	pe[72].comm=1;
	pe[72].freq=1;
	pe[72].enable_on_exec=1;
	pe[72].watermark=1;
	pe[72].precise_ip=1; /* constant skid */
	pe[72].sample_id_all=1;
	pe[72].exclude_callchain_user=1;
	pe[72].comm_exec=1;
	pe[72].wakeup_watermark=-1970634752;
	pe[72].bp_type=HW_BREAKPOINT_R|HW_BREAKPOINT_W; /*3*/
	pe[72].bp_addr=0x0ULL;
	pe[72].bp_len=0x2ULL;
	pe[72].branch_sample_type=PERF_SAMPLE_BRANCH_HV|PERF_SAMPLE_BRANCH_ANY|PERF_SAMPLE_BRANCH_ANY_CALL|PERF_SAMPLE_BRANCH_ANY_RETURN|PERF_SAMPLE_BRANCH_IND_JUMP|PERF_SAMPLE_BRANCH_ABORT_TX|PERF_SAMPLE_BRANCH_COND|0xbcbcbca800ULL;
	pe[72].sample_regs_user=4294967253ULL;
	pe[72].sample_stack_user=0x23008000;

	fd[72]=perf_event_open(&pe[72],
				0, /* current thread */
				1, /* Only cpu 1 */
				fd[114], /* 114 is group leader */
				PERF_FLAG_FD_NO_GROUP /*1*/ );


/* 2 */
	prctl(PR_TASK_PERF_EVENTS_DISABLE);
/* 3 */
// a 0 1 1
// which=0,num=1,cpi=1

#define MAX_CPUS 1024

	pid_t pid=0;    /* current thread */
        static cpu_set_t *cpu_mask;
        int max_cpus=MAX_CPUS;
        size_t set_size;

	cpu_mask=CPU_ALLOC(max_cpus);
	set_size=CPU_ALLOC_SIZE(max_cpus);


	CPU_ZERO_S(set_size,cpu_mask);
	CPU_SET_S(1,set_size,cpu_mask);

	result=sched_setaffinity(pid,max_cpus,cpu_mask);

/* 4 */
	prctl(PR_TASK_PERF_EVENTS_ENABLE);
/* 5 */
/* fd = 38 */

	memset(&pe[38],0,sizeof(struct perf_event_attr));
	pe[38].type=PERF_TYPE_HARDWARE;
	pe[38].size=112;
	pe[38].config=PERF_COUNT_HW_BRANCH_MISSES;
	pe[38].sample_type=0; /* 0 */
	pe[38].read_format=PERF_FORMAT_ID|PERF_FORMAT_GROUP|0x10ULL; /* 1c */
	pe[38].disabled=1;
	pe[38].precise_ip=0; /* arbitrary skid */
	pe[38].wakeup_events=0;
	pe[38].bp_type=HW_BREAKPOINT_EMPTY;

	fd[38]=perf_event_open(&pe[38],
				getpid(), /* current thread */
				22, /* Only cpu 22 */
				-1, /* New Group Leader */
				PERF_FLAG_FD_NO_GROUP /*1*/ );




	/* Replayed 4 syscalls */
	return 0;
}


* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
  2025-06-19 20:10             ` Vince Weaver
@ 2025-06-20 11:07               ` Liang, Kan
  2025-06-20 16:12                 ` Vince Weaver
  0 siblings, 1 reply; 11+ messages in thread
From: Liang, Kan @ 2025-06-20 11:07 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, linux-perf-users, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter

Hi Vince,

On 2025-06-19 4:10 p.m., Vince Weaver wrote:
> On Thu, 19 Jun 2025, Liang, Kan wrote:
> 
>>
>>
>> On 2025-06-19 11:17 a.m., Vince Weaver wrote:
>>> On Wed, 18 Jun 2025, Vince Weaver wrote:
>>>
>>>> On Wed, 18 Jun 2025, Liang, Kan wrote:
>>>>
>>>>> No, the error message doesn't say it. Just want to check if you have
>>>>> extra information. Because the Topdown perf metrics is only supported on
>>>>> p-core. I want to understand whether the code messes up with e-core.
>>>>
>>>> I can't easily tell from the fuzzer as it intentionally switches cores 
>>>> often.  I guess I could patch the kernel to report CPU when the WRMSR 
>>>> error triggers.
>>>
>>> I've patched the kernel to get rid of the warn_once() and added a printk
>>> for smp_processor_id()  (is that what I want to print?)  In any case that 
>>> reports the warning is happening on CPU1 which is actually a P core, not 
>>> an atom core.
>>
>> Thanks for the confirmation.
>> I've tried fuzzer in some newer machines (later than raptor-lake), but I
>> haven't reproduce it yet. I will try to find a raptor-lake for more tests.
> 
> I've managed to use the perf_fuzzer tools to create a small reproducible 
> test case that can trigger the bug.  It's included below.

Thanks very much for the reproducer! The issue has been root-caused.
I've sent a patch to fix it. Please give it a try.
https://lore.kernel.org/lkml/20250620110406.3782402-1-kan.liang@linux.intel.com/

Thanks,
Kan

* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
  2025-06-20 11:07               ` Liang, Kan
@ 2025-06-20 16:12                 ` Vince Weaver
  0 siblings, 0 replies; 11+ messages in thread
From: Vince Weaver @ 2025-06-20 16:12 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Vince Weaver, linux-kernel, linux-perf-users, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter

On Fri, 20 Jun 2025, Liang, Kan wrote:

> Thanks very much for the reproducer! The issue has been root-caused.
> I've sent a patch to fix it. Please give it a try.
> https://lore.kernel.org/lkml/20250620110406.3782402-1-kan.liang@linux.intel.com/

I've applied the patch and can no longer generate the issue with my tests.

Thanks!

Vince

Tested-by: Vince Weaver <vincent.weaver@maine.edu>
