* [perf] unchecked MSR access error: WRMSR to 0x3f1
From: Vince Weaver @ 2025-06-17 15:39 UTC
To: linux-kernel, linux-perf-users
Cc: Liang, Kan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter

Hello

When running the perf_fuzzer on a raptor-lake machine I get an
  unchecked MSR access error: WRMSR to 0x3f1
error (see below).

A similar message happened before back in 2021 and was fixed in
commit 2dc0572f2cef87425147658698dce2600b799bd3 so not sure if this is the
same problem or something new.

Vince Weaver
vincent.weaver@maine.edu

[12646.001692] unchecked MSR access error: WRMSR to 0x3f1 (tried to write 0x0001000000000001) at rIP: 0xffffffffa98932af (native_write_msr+0xf/0x20)
[12646.001698] Call Trace:
[12646.001700] <TASK>
[12646.001700] intel_pmu_pebs_enable_all+0x2c/0x40
[12646.001703] intel_pmu_enable_all+0xe/0x20
[12646.001705] ctx_resched+0x227/0x280
[12646.001708] event_function+0x8f/0xd0
[12646.001710] ? __pfx___perf_event_enable+0x10/0x10
[12646.001711] remote_function+0x42/0x50
[12646.001713] ? __pfx_remote_function+0x10/0x10
[12646.001714] generic_exec_single+0x6d/0x130
[12646.001715] smp_call_function_single+0xee/0x140
[12646.001716] ? __pfx_remote_function+0x10/0x10
[12646.001717] event_function_call+0x9f/0x1c0
[12646.001718] ? __pfx___perf_event_enable+0x10/0x10
[12646.001720] ? __pfx_event_function+0x10/0x10
[12646.001721] perf_event_task_enable+0x7b/0x100
[12646.001723] __do_sys_prctl+0x56f/0xca0
[12646.001725] do_syscall_64+0x84/0x2f0
[12646.001727] ? exit_to_user_mode_loop+0xcd/0x120
[12646.001729] ? do_syscall_64+0x1ef/0x2f0
[12646.001730] ? try_to_wake_up+0x7e/0x640
[12646.001732] ? complete_signal+0x2e8/0x350
[12646.001734] ? __send_signal_locked+0x2e3/0x450
[12646.001735] ? send_signal_locked+0xb6/0x120
[12646.001736] ? do_send_sig_info+0x6e/0xc0
[12646.001737] ? kill_pid_info_type+0xa6/0xc0
[12646.001738] ? kill_something_info+0x167/0x1a0
[12646.001739] ? syscall_exit_work+0x132/0x140
[12646.001740] ? do_syscall_64+0xbc/0x2f0
[12646.001741] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[12646.001743] RIP: 0033:0x7efe86afd40d
[12646.001744] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 18 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 9d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 1b 48 8b 54 24 18 64 48 2b 14 25 28 00 00 00
[12646.001745] RSP: 002b:00007ffcd6444cf0 EFLAGS: 00000246 ORIG_RAX: 000000000000009d
[12646.001746] RAX: ffffffffffffffda RBX: 000000000000000e RCX: 00007efe86afd40d
[12646.001747] RDX: 0000000000000001 RSI: 00007ffcd6444d24 RDI: 0000000000000020
[12646.001747] RBP: 00007ffcd6444d60 R08: 00007efe86bc625c R09: 00007efe86bc6260
[12646.001748] R10: 00007efe86bc6250 R11: 0000000000000246 R12: 0000000000000000
[12646.001748] R13: 00007ffcd64471b8 R14: 0000559eb2a2edd8 R15: 00007efe86c30020
[12646.001749] </TASK>

* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
From: Abhigyan ghosh @ 2025-06-17 15:50 UTC
To: Vince Weaver
Cc: linux-kernel, linux-perf-users

Hi Vince,

Thanks for sharing the report. The WRMSR to 0x3f1 stood out; it seems similar to the one handled in commit 2dc0572f2cef back in 2021. Curious if 0x3f1 has popped up before or if this could be a new MSR usage pattern tied to recent PEBS changes?

Also, do you think a quirk-based mask or trap filter around this could be a cleaner way to handle this in the fuzzer context, especially for newer Intel platforms?

Let me know your thoughts.

Best,
Abhigyan Ghosh

On 17 June 2025 9:09:36 pm IST, Vince Weaver <vincent.weaver@maine.edu> wrote:
>Hello
>
>When running the perf_fuzzer on a raptor-lake machine I get an
> unchecked MSR access error: WRMSR to 0x3f1
>error (see below).
>
>A similar message happened before back in 2021 and was fixed in
>commit 2dc0572f2cef87425147658698dce2600b799bd3 so not sure if this is the
>same problem or something new.
>
>Vince Weaver
>vincent.weaver@maine.edu
>
>[...]

aghosh

* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
From: Liang, Kan @ 2025-06-17 19:47 UTC
To: Vince Weaver, linux-kernel, linux-perf-users
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter

On 2025-06-17 11:39 a.m., Vince Weaver wrote:
> Hello
>
> When running the perf_fuzzer on a raptor-lake machine I get an
>   unchecked MSR access error: WRMSR to 0x3f1
> error (see below).
>
> A similar message happened before back in 2021 and was fixed in
> commit 2dc0572f2cef87425147658698dce2600b799bd3 so not sure if this is the
> same problem or something new.

The commit 2dc0572f2cef was triggered by the fake event VLBR_EVENT.
But this error should be triggered by the Topdown perf metrics event,
INTEL_TD_METRIC_RETIRING, which uses the idx 48 internally.

We never support perf metrics events in sampling mode. The PEBS cannot
be enabled in counting mode. So it's weird the cpuc->pebs_enabled has
the idx 48 set.

The recent change I did for the PEBS is commit e02e9b0374c3
"perf/x86/intel: Support PEBS counters snapshotting". But it should not
impact the above.

Could you please help on the below questions?
- It only happens on the p-core, right?
- Which kernel base do you use? Is it 6.16-rc2?
- Can this be easily reproduced?
  Is it possible to bisect the error commit? (Maybe start from the
  commit e02e9b0374c3?)

Thanks,
Kan

> Vince Weaver
> vincent.weaver@maine.edu
>
> [...]

* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
From: Vince Weaver @ 2025-06-18 3:49 UTC
To: Liang, Kan
Cc: linux-kernel, linux-perf-users, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter

On Tue, 17 Jun 2025, Liang, Kan wrote:

> The commit 2dc0572f2cef was triggered by the fake event VLBR_EVENT.
> But this error should be triggered by the Topdown perf metrics event,
> INTEL_TD_METRIC_RETIRING, which uses the idx 48 internally.
>
> We never support perf metrics events in sampling mode. The PEBS cannot
> be enabled in counting mode. So it's weird the cpuc->pebs_enabled has
> the idx 48 set.
>
> The recent change I did for the PEBS is commit e02e9b0374c3
> "perf/x86/intel: Support PEBS counters snapshotting". But it should not
> impact the above.
>
> Could you please help on the below questions?
> - It only happens on the p-core, right?

how would I tell? I don't think the error message says what CPU it
happens on?

> - Which kernel base do you use? Is it 6.16-rc2?

I was running just before -rc1. I've updated to current git but didn't
realize the throttle fix hadn't made it upstream yet so managed to lock up
the machine and not sure when I'll be able to get over to reboot it.

> - Can this be easily reproduced?

probably. It's another thing that's a pain to check because it's a
WARN_ONCE I think so I have to reboot in order to see. Even if it's not
reproducible the fuzzer usually hits it within a few hours.

> Is it possible to bisect the error commit? (Maybe start from the
> commit e02e9b0374c3?)

Maybe but I'd only like to do that as a last resort as it's a pain to
build and reboot kernels on this machine (for secureboot and other
reasons). Also I suppose I'd have to manually apply the throttle patch
while bisecting.

Vince Weaver
vincent.weaver@maine.edu

* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
From: Liang, Kan @ 2025-06-18 11:02 UTC
To: Vince Weaver
Cc: linux-kernel, linux-perf-users, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter

On 2025-06-17 11:49 p.m., Vince Weaver wrote:
> On Tue, 17 Jun 2025, Liang, Kan wrote:
>
>> The commit 2dc0572f2cef was triggered by the fake event VLBR_EVENT.
>> But this error should be triggered by the Topdown perf metrics event,
>> INTEL_TD_METRIC_RETIRING, which uses the idx 48 internally.
>>
>> We never support perf metrics events in sampling mode. The PEBS cannot
>> be enabled in counting mode. So it's weird the cpuc->pebs_enabled has
>> the idx 48 set.
>>
>> The recent change I did for the PEBS is commit e02e9b0374c3
>> "perf/x86/intel: Support PEBS counters snapshotting". But it should not
>> impact the above.
>>
>> Could you please help on the below questions?
>> - It only happens on the p-core, right?
>
> how would I tell? I don't think the error message says what CPU it
> happens on?

No, the error message doesn't say it. Just want to check if you have
extra information. Because the Topdown perf metrics is only supported on
p-core. I want to understand whether the code messes up with e-core.

>
>> - Which kernel base do you use? Is it 6.16-rc2?
>
> I was running just before -rc1. I've updated to current git but didn't
> realize the throttle fix hadn't made it upstream yet so managed to lock up
> the machine and not sure when I'll be able to get over to reboot it.
>

They are not in rc2 either. I guess it should be included in rc3.

>> - Can this be easily reproduced?
>
> probably. It's another thing that's a pain to check because it's a
> WARN_ONCE I think so I have to reboot in order to see. Even if it's not
> reproducible the fuzzer usually hits it within a few hours.

OK. I will try to reproduce it locally.

>
>> Is it possible to bisect the error commit? (Maybe start from the
>> commit e02e9b0374c3?)
>
> Maybe but I'd only like to do that as a last resort as it's a pain to
> build and reboot kernels on this machine (for secureboot and other
> reasons).

Sure.

Thanks,
Kan

> Also I suppose I'd have to manually apply the throttle patch
> while bisecting.
>
> Vince Weaver
> vincent.weaver@maine.edu
>

* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
From: Vince Weaver @ 2025-06-18 18:26 UTC
To: Liang, Kan
Cc: Vince Weaver, linux-kernel, linux-perf-users, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter

On Wed, 18 Jun 2025, Liang, Kan wrote:

> No, the error message doesn't say it. Just want to check if you have
> extra information. Because the Topdown perf metrics is only supported on
> p-core. I want to understand whether the code messes up with e-core.

I can't easily tell from the fuzzer as it intentionally switches cores
often. I guess I could patch the kernel to report CPU when the WRMSR
error triggers.

> > I was running just before -rc1. I've updated to current git but didn't
> > realize the throttle fix hadn't made it upstream yet so managed to lock up
> > the machine and not sure when I'll be able to get over to reboot it.
> >
>
> They are not in rc2 either. I guess it should be included in rc3.

OK I am running rc2 now (Well, whatever current git is) with the throttle
fix applied. The throttle crash is something else, it crashes my test
machine so hard that even the power button doesn't work, I have to
physically unplug the machine to reboot it.

> >> - Can this be easily reproduced?
> >
> > probably. It's another thing that's a pain to check because it's a
> > WARN_ONCE I think so I have to reboot in order to see. Even if it's not
> > reproducible the fuzzer usually hits it within a few hours.

I am able to reproduce the error on -rc2 using a specific fuzzer random
seed. I can possibly try to create a simpler test case but that would be
a bit of effort.

Vince Weaver
vincent.weaver@maine.edu

* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
From: Vince Weaver @ 2025-06-19 15:17 UTC
To: Vince Weaver
Cc: Liang, Kan, linux-kernel, linux-perf-users, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter

On Wed, 18 Jun 2025, Vince Weaver wrote:

> On Wed, 18 Jun 2025, Liang, Kan wrote:
>
> > No, the error message doesn't say it. Just want to check if you have
> > extra information. Because the Topdown perf metrics is only supported on
> > p-core. I want to understand whether the code messes up with e-core.
>
> I can't easily tell from the fuzzer as it intentionally switches cores
> often. I guess I could patch the kernel to report CPU when the WRMSR
> error triggers.

I've patched the kernel to get rid of the warn_once() and added a printk
for smp_processor_id() (is that what I want to print?) In any case that
reports the warning is happening on CPU1 which is actually a P core, not
an atom core.

Vince Weaver
vincent.weaver@maine.edu

* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
From: Liang, Kan @ 2025-06-19 16:06 UTC
To: Vince Weaver
Cc: linux-kernel, linux-perf-users, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter

On 2025-06-19 11:17 a.m., Vince Weaver wrote:
> On Wed, 18 Jun 2025, Vince Weaver wrote:
>
>> On Wed, 18 Jun 2025, Liang, Kan wrote:
>>
>>> No, the error message doesn't say it. Just want to check if you have
>>> extra information. Because the Topdown perf metrics is only supported on
>>> p-core. I want to understand whether the code messes up with e-core.
>>
>> I can't easily tell from the fuzzer as it intentionally switches cores
>> often. I guess I could patch the kernel to report CPU when the WRMSR
>> error triggers.
>
> I've patched the kernel to get rid of the warn_once() and added a printk
> for smp_processor_id() (is that what I want to print?) In any case that
> reports the warning is happening on CPU1 which is actually a P core, not
> an atom core.

Thanks for the confirmation.
I've tried the fuzzer on some newer machines (later than raptor-lake), but I
haven't reproduced it yet. I will try to find a raptor-lake for more tests.

Thanks,
Kan

* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
From: Vince Weaver @ 2025-06-19 20:10 UTC
To: Liang, Kan
Cc: Vince Weaver, linux-kernel, linux-perf-users, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter

On Thu, 19 Jun 2025, Liang, Kan wrote:

>
>
> On 2025-06-19 11:17 a.m., Vince Weaver wrote:
> > On Wed, 18 Jun 2025, Vince Weaver wrote:
> >
> >> On Wed, 18 Jun 2025, Liang, Kan wrote:
> >>
> >>> No, the error message doesn't say it. Just want to check if you have
> >>> extra information. Because the Topdown perf metrics is only supported on
> >>> p-core. I want to understand whether the code messes up with e-core.
> >>
> >> I can't easily tell from the fuzzer as it intentionally switches cores
> >> often. I guess I could patch the kernel to report CPU when the WRMSR
> >> error triggers.
> >
> > I've patched the kernel to get rid of the warn_once() and added a printk
> > for smp_processor_id() (is that what I want to print?) In any case that
> > reports the warning is happening on CPU1 which is actually a P core, not
> > an atom core.
>
> Thanks for the confirmation.
> I've tried the fuzzer on some newer machines (later than raptor-lake), but I
> haven't reproduced it yet. I will try to find a raptor-lake for more tests.

I've managed to use the perf_fuzzer tools to create a small reproducible
test case that can trigger the bug. It's included below.

Vince

---

/* WRMSR top-down reproducer */
/* by Vince Weaver <vincent.weaver _at_ maine.edu> */

#define _GNU_SOURCE 1
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <sys/prctl.h>
#include <sys/wait.h>
#include <poll.h>
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <sched.h>

static int fd[1024];
static struct perf_event_attr pe[1024];

FILE *fff;
static int result;

int perf_event_open(struct perf_event_attr *hw_event_uptr,
	pid_t pid, int cpu, int group_fd, unsigned long flags) {

	return syscall(__NR_perf_event_open,hw_event_uptr, pid, cpu,
		group_fd, flags);
}

int main(int argc, char **argv) {

	int i;
	for(i=0;i<1024;i++) fd[i]=-1;

/* 1 */
/* fd = 72 */

	memset(&pe[72],0,sizeof(struct perf_event_attr));
	pe[72].type=PERF_TYPE_RAW;
	pe[72].config=0xffff880000008000ULL;
	pe[72].sample_freq=0x49ULL;
	pe[72].sample_type=PERF_SAMPLE_TID|PERF_SAMPLE_ADDR|PERF_SAMPLE_READ|PERF_SAMPLE_CPU; /* 9a */
	pe[72].read_format=PERF_FORMAT_ID|PERF_FORMAT_GROUP|0x10ULL; /* 1c */
	pe[72].exclude_user=1;
	pe[72].exclude_kernel=1;
	pe[72].mmap=1;
	pe[72].comm=1;
	pe[72].freq=1;
	pe[72].enable_on_exec=1;
	pe[72].watermark=1;
	pe[72].precise_ip=1; /* constant skid */
	pe[72].sample_id_all=1;
	pe[72].exclude_callchain_user=1;
	pe[72].comm_exec=1;
	pe[72].wakeup_watermark=-1970634752;
	pe[72].bp_type=HW_BREAKPOINT_R|HW_BREAKPOINT_W; /*3*/
	pe[72].bp_addr=0x0ULL;
	pe[72].bp_len=0x2ULL;
	pe[72].branch_sample_type=PERF_SAMPLE_BRANCH_HV|PERF_SAMPLE_BRANCH_ANY|PERF_SAMPLE_BRANCH_ANY_CALL|PERF_SAMPLE_BRANCH_ANY_RETURN|PERF_SAMPLE_BRANCH_IND_JUMP|PERF_SAMPLE_BRANCH_ABORT_TX|PERF_SAMPLE_BRANCH_COND|0xbcbcbca800ULL;
	pe[72].sample_regs_user=4294967253ULL;
	pe[72].sample_stack_user=0x23008000;

	fd[72]=perf_event_open(&pe[72],
		0, /* current thread */
		1, /* Only cpu 1 */
		fd[114], /* 114 is group leader */
		PERF_FLAG_FD_NO_GROUP /*1*/ );


/* 2 */
	prctl(PR_TASK_PERF_EVENTS_DISABLE);
/* 3 */
// a 0 1 1
// which=0,num=1,cpi=1

#define MAX_CPUS 1024

	pid_t pid=0; /* current thread */
	static cpu_set_t *cpu_mask;
	int max_cpus=MAX_CPUS;
	size_t set_size;

	cpu_mask=CPU_ALLOC(max_cpus);
	set_size=CPU_ALLOC_SIZE(max_cpus);


	CPU_ZERO_S(set_size,cpu_mask);
	CPU_SET_S(1,set_size,cpu_mask);

	result=sched_setaffinity(pid,max_cpus,cpu_mask);

/* 4 */
	prctl(PR_TASK_PERF_EVENTS_ENABLE);
/* 5 */
/* fd = 38 */

	memset(&pe[38],0,sizeof(struct perf_event_attr));
	pe[38].type=PERF_TYPE_HARDWARE;
	pe[38].size=112;
	pe[38].config=PERF_COUNT_HW_BRANCH_MISSES;
	pe[38].sample_type=0; /* 0 */
	pe[38].read_format=PERF_FORMAT_ID|PERF_FORMAT_GROUP|0x10ULL; /* 1c */
	pe[38].disabled=1;
	pe[38].precise_ip=0; /* arbitrary skid */
	pe[38].wakeup_events=0;
	pe[38].bp_type=HW_BREAKPOINT_EMPTY;

	fd[38]=perf_event_open(&pe[38],
		getpid(), /* current thread */
		22, /* Only cpu 22 */
		-1, /* New Group Leader */
		PERF_FLAG_FD_NO_GROUP /*1*/ );




	/* Replayed 4 syscalls */
	return 0;
}

* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
From: Liang, Kan @ 2025-06-20 11:07 UTC
To: Vince Weaver
Cc: linux-kernel, linux-perf-users, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter

Hi Vince,

On 2025-06-19 4:10 p.m., Vince Weaver wrote:
> On Thu, 19 Jun 2025, Liang, Kan wrote:
>
>>
>>
>> On 2025-06-19 11:17 a.m., Vince Weaver wrote:
>>> On Wed, 18 Jun 2025, Vince Weaver wrote:
>>>
>>>> On Wed, 18 Jun 2025, Liang, Kan wrote:
>>>>
>>>>> No, the error message doesn't say it. Just want to check if you have
>>>>> extra information. Because the Topdown perf metrics is only supported on
>>>>> p-core. I want to understand whether the code messes up with e-core.
>>>>
>>>> I can't easily tell from the fuzzer as it intentionally switches cores
>>>> often. I guess I could patch the kernel to report CPU when the WRMSR
>>>> error triggers.
>>>
>>> I've patched the kernel to get rid of the warn_once() and added a printk
>>> for smp_processor_id() (is that what I want to print?) In any case that
>>> reports the warning is happening on CPU1 which is actually a P core, not
>>> an atom core.
>>
>> Thanks for the confirmation.
>> I've tried the fuzzer on some newer machines (later than raptor-lake), but I
>> haven't reproduced it yet. I will try to find a raptor-lake for more tests.
>
> I've managed to use the perf_fuzzer tools to create a small reproducible
> test case that can trigger the bug. It's included below.

Thanks very much for the reproducer! The issue has been root-caused.
I've sent a patch to fix it. Please give it a try.
https://lore.kernel.org/lkml/20250620110406.3782402-1-kan.liang@linux.intel.com/

Thanks,
Kan

>
> Vince
>
> ---
>
> [...]

* Re: [perf] unchecked MSR access error: WRMSR to 0x3f1
From: Vince Weaver @ 2025-06-20 16:12 UTC
To: Liang, Kan
Cc: Vince Weaver, linux-kernel, linux-perf-users, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter

On Fri, 20 Jun 2025, Liang, Kan wrote:

> Thanks very much for the reproducer! The issue has been root-caused.
> I've sent a patch to fix it. Please give it a try.
> https://lore.kernel.org/lkml/20250620110406.3782402-1-kan.liang@linux.intel.com/

I've applied the patch and can no longer generate the issue with my tests.

Thanks!

Vince

Tested-by: Vince Weaver <vincent.weaver@maine.edu>