[bug report] perf top generates kernel "unchecked MSR access error: WRMSR"

linux-perf-users.vger.kernel.org archive mirror

* [bug report] perf top generates kernel "unchecked MSR access error: WRMSR"
From: John Garry @ 2024-10-22 14:55 UTC
  To: linux-perf-users, x86

Hi all,

On my VM, "perf top" gives this stackframe on v6.12-rc4:

[  930.527581] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000510076) at rIP: 0xffffffff94ead548 (native_write_msr+0x8/0x30)
[  930.531135] Call Trace:
[  930.531456]  <IRQ>
[  930.531749]  ? ex_handler_msr+0x138/0x150
[  930.532285]  ? search_extable+0x26/0x30
[  930.532780]  ? fixup_exception+0x9c/0x310
[  930.533405]  ? exc_general_protection+0x10c/0x490
[  930.534081]  ? asm_exc_general_protection+0x26/0x30
[  930.534768]  ? native_write_msr+0x8/0x30
[  930.535357]  ? srso_alias_return_thunk+0x5/0xfbef5
[  930.535998]  x86_pmu_enable_event+0xa5/0xd0
[  930.536641]  amd_pmu_enable_all+0x4e/0x80
[  930.537211]  ctx_resched+0x13b/0x1d0
[  930.537735]  __perf_install_in_context+0x2a2/0x390
[  930.538439]  remote_function+0x49/0x60
[  930.538931]  __flush_smp_call_function_queue+0xdc/0x700
[  930.539694]  ? __pfx_remote_function+0x10/0x10
[  930.540480]  __sysvec_call_function_single+0x38/0x140
[  930.541134]  sysvec_call_function_single+0x6c/0x90
[  930.541970]  </IRQ>
[  930.542269]  <TASK>
[  930.542766]  asm_sysvec_call_function_single+0x1a/0x20
[  930.543493] RIP: 0010:pv_native_safe_halt+0xf/0x20
[  930.544195] Code: 22 d7 e9 ff b5 13 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d d3 e3 25 00 fb f4 <e9> d7 b5 13 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
[  930.546841] RSP: 0018:ffffffff96a03e68 EFLAGS: 00000206
[  930.547563] RAX: 0000000000000006 RBX: ffffffff96a269c0 RCX: 0000000000000000
[  930.548579] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff94f53f31
[  930.549568] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[  930.550529] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff970608e0
[  930.551582] R13: ffffffff96a269c0 R14: 0000000000000000 R15: 0000000000000000
[  930.552683]  ? do_idle+0x1d1/0x2a0
[  930.553182]  default_idle+0x9/0x20
[  930.553670]  default_idle_call+0x7d/0xc0
[  930.554226]  do_idle+0x1d1/0x2a0
[  930.554696]  cpu_startup_entry+0x29/0x30
[  930.555154]  rest_init+0x12e/0x1d0
[  930.555621]  start_kernel+0x60f/0x6d0
[  930.556064]  x86_64_start_reservations+0x21/0x40
[  930.556633]  x86_64_start_kernel+0x91/0xa0
[  930.557107]  common_startup_64+0x13e/0x141
[  930.558038]  </TASK>
[  930.738880] perf: interrupt took too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[  930.772912] perf: interrupt took too long (3414 > 3138), lowering kernel.perf_event_max_sample_rate to 58000
[  930.797764] perf: interrupt took too long (4275 > 4267), lowering kernel.perf_event_max_sample_rate to 46000
[  931.117733] perf: interrupt took too long (5345 > 5343), lowering kernel.perf_event_max_sample_rate to 37000
[  933.862829] perf: interrupt took too long (6765 > 6681), lowering kernel.perf_event_max_sample_rate to 29000
[opc@jgarry-atomic-write-exp-e4-8-instance-20231214-1221 ~]$ ^C
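
The "perf: interrupt took too long" lines above follow a regular pattern that can be reproduced with a little arithmetic. The sketch below is an illustrative model, not kernel source. It assumes that the allowed interrupt length is raised to 125% of the measured average, that the CPU-time budget is the default 25% (kernel.perf_cpu_time_max_percent), and that this kernel runs with HZ=1000. The function name next_limits is hypothetical.

```python
# Illustrative model of the perf sample-rate throttling seen in the log.
# Assumptions (not taken from the report): HZ=1000 and a 25% CPU-time budget.
HZ = 1000
TICK_NSEC = 1_000_000_000 // HZ      # nanoseconds per scheduler tick
CPU_TIME_MAX_PERCENT = 25            # default kernel.perf_cpu_time_max_percent


def next_limits(avg_len_ns):
    """Return (new allowed ns per interrupt, new max sample rate)."""
    # Give the handler 25% headroom over what was just measured.
    allowed = avg_len_ns + avg_len_ns // 4
    # Nanoseconds per tick that sampling interrupts may consume in total.
    budget = (TICK_NSEC // 100) * CPU_TIME_MAX_PERCENT
    # How many interrupts of that length fit into the budget per tick.
    per_tick = budget // allowed if allowed < budget else 1
    return allowed, per_tick * HZ


for measured in (2511, 3414, 4275, 5345, 6765):
    allowed, rate = next_limits(measured)
    print(f"took {measured} ns -> allowed {allowed} ns, max_sample_rate {rate}")
```

Under these assumptions the model yields the same thresholds (3138, 4267, 5343, 6681) and the same sample rates (79000 down to 29000) as the log, which suggests the messages reflect ordinary throttling of a slow PMU interrupt path in the guest rather than a second fault.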

a known issue?

more /proc/cpuinfo gives:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 25
model           : 1
model name      : AMD EPYC 7J13 64-Core Processor
stepping        : 1
microcode       : 0x1000065
cpu MHz         : 2445.322
cache size      : 512 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 16
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl xtopology cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt nrip_save umip pku ospke vaes vpclmulqdq rdpid arch_capabilities
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass srso ibpb_no_ret
bogomips        : 4890.64
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

Thanks,
John

* Re: [bug report] perf top generates kernel "unchecked MSR access error: WRMSR"
From: Namhyung Kim @ 2024-10-23 22:59 UTC
  To: John Garry; +Cc: linux-perf-users, x86, ravi.bangoria, sandipan.das

Adding Ravi and Sandipan to CC.

On Tue, Oct 22, 2024 at 03:55:05PM +0100, John Garry wrote:
> Hi all,
> 
> On my VM, "perf top" gives this stackframe on v6.12-rc4:
> 
> [  930.527581] unchecked MSR access error: WRMSR to 0xc0010200 (tried to
> write 0x0000020000510076) at rIP: 0xffffffff94ead548
> (native_write_msr+0x8/0x30)
> [  930.531135] Call Trace:
> [  930.531456]  <IRQ>
> [  930.531749]  ? ex_handler_msr+0x138/0x150
> [  930.532285]  ? search_extable+0x26/0x30
> [  930.532780]  ? fixup_exception+0x9c/0x310
> [  930.533405]  ? exc_general_protection+0x10c/0x490
> [  930.534081]  ? asm_exc_general_protection+0x26/0x30
> [  930.534768]  ? native_write_msr+0x8/0x30
> [  930.535357]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  930.535998]  x86_pmu_enable_event+0xa5/0xd0
> [  930.536641]  amd_pmu_enable_all+0x4e/0x80
> [  930.537211]  ctx_resched+0x13b/0x1d0
> [  930.537735]  __perf_install_in_context+0x2a2/0x390
> [  930.538439]  remote_function+0x49/0x60
> [  930.538931]  __flush_smp_call_function_queue+0xdc/0x700
> [  930.539694]  ? __pfx_remote_function+0x10/0x10
> [  930.540480]  __sysvec_call_function_single+0x38/0x140
> [  930.541134]  sysvec_call_function_single+0x6c/0x90
> [  930.541970]  </IRQ>
> [  930.542269]  <TASK>
> [  930.542766]  asm_sysvec_call_function_single+0x1a/0x20
> [  930.543493] RIP: 0010:pv_native_safe_halt+0xf/0x20
> [  930.544195] Code: 22 d7 e9 ff b5 13 00 0f 1f 40 00 90 90 90 90 90 90 90
> 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d d3 e3 25 00 fb f4 <e9>
> d7 b5 13 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
> [  930.546841] RSP: 0018:ffffffff96a03e68 EFLAGS: 00000206
> [  930.547563] RAX: 0000000000000006 RBX: ffffffff96a269c0 RCX:
> 0000000000000000
> [  930.548579] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> ffffffff94f53f31
> [  930.549568] RBP: 0000000000000000 R08: 0000000000000001 R09:
> 0000000000000000
> [  930.550529] R10: 0000000000000001 R11: 0000000000000000 R12:
> ffffffff970608e0
> [  930.551582] R13: ffffffff96a269c0 R14: 0000000000000000 R15:
> 0000000000000000
> [  930.552683]  ? do_idle+0x1d1/0x2a0
> [  930.553182]  default_idle+0x9/0x20
> [  930.553670]  default_idle_call+0x7d/0xc0
> [  930.554226]  do_idle+0x1d1/0x2a0
> [  930.554696]  cpu_startup_entry+0x29/0x30
> [  930.555154]  rest_init+0x12e/0x1d0
> [  930.555621]  start_kernel+0x60f/0x6d0
> [  930.556064]  x86_64_start_reservations+0x21/0x40
> [  930.556633]  x86_64_start_kernel+0x91/0xa0
> [  930.557107]  common_startup_64+0x13e/0x141
> [  930.558038]  </TASK>
> [  930.738880] perf: interrupt took too long (2511 > 2500), lowering
> kernel.perf_event_max_sample_rate to 79000
> [  930.772912] perf: interrupt took too long (3414 > 3138), lowering
> kernel.perf_event_max_sample_rate to 58000
> [  930.797764] perf: interrupt took too long (4275 > 4267), lowering
> kernel.perf_event_max_sample_rate to 46000
> [  931.117733] perf: interrupt took too long (5345 > 5343), lowering
> kernel.perf_event_max_sample_rate to 37000
> [  933.862829] perf: interrupt took too long (6765 > 6681), lowering
> kernel.perf_event_max_sample_rate to 29000
> [opc@jgarry-atomic-write-exp-e4-8-instance-20231214-1221 ~]$ ^C
> 
> a known issue?
> 
> more /proc/cpuinfo gives:
> 
> processor       : 0
> vendor_id       : AuthenticAMD
> cpu family      : 25
> model           : 1
> model name      : AMD EPYC 7J13 64-Core Processor
> stepping        : 1
> microcode       : 0x1000065
> cpu MHz         : 2445.322
> cache size      : 512 KB
> physical id     : 0
> siblings        : 16
> core id         : 0
> cpu cores       : 8
> apicid          : 0
> initial apicid  : 0
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 16
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
> pdpe1gb rdtscp lm rep_good nopl xtopology cpuid extd_apicid tsc_known_freq
> pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
> tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy
> svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext
> perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2
> smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt
> xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt nrip_save
> umip pku ospke vaes vpclmulqdq rdpid arch_capabilities
> bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2
> spec_store_bypass srso ibpb_no_ret
> bogomips        : 4890.64
> TLB size        : 1024 4K pages
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 40 bits physical, 48 bits virtual
> power management:
> 
> Thanks,
> John

* Re: [bug report] perf top generates kernel "unchecked MSR access error: WRMSR"
From: Sandipan Das @ 2024-10-24  6:21 UTC
  To: John Garry; +Cc: linux-perf-users, x86, ravi.bangoria, Namhyung Kim

On 10/24/2024 4:29 AM, Namhyung Kim wrote:
> Adding Ravi and Sandipan to CC.
> 

Thanks for bringing this to our attention.

> On Tue, Oct 22, 2024 at 03:55:05PM +0100, John Garry wrote:
>> Hi all,
>>
>> On my VM, "perf top" gives this stackframe on v6.12-rc4:
>>
>> [  930.527581] unchecked MSR access error: WRMSR to 0xc0010200 (tried to
>> write 0x0000020000510076) at rIP: 0xffffffff94ead548
>> (native_write_msr+0x8/0x30)
>> [  930.531135] Call Trace:
>> [  930.531456]  <IRQ>
>> [  930.531749]  ? ex_handler_msr+0x138/0x150
>> [  930.532285]  ? search_extable+0x26/0x30
>> [  930.532780]  ? fixup_exception+0x9c/0x310
>> [  930.533405]  ? exc_general_protection+0x10c/0x490
>> [  930.534081]  ? asm_exc_general_protection+0x26/0x30
>> [  930.534768]  ? native_write_msr+0x8/0x30
>> [  930.535357]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [  930.535998]  x86_pmu_enable_event+0xa5/0xd0
>> [  930.536641]  amd_pmu_enable_all+0x4e/0x80
>> [  930.537211]  ctx_resched+0x13b/0x1d0
>> [  930.537735]  __perf_install_in_context+0x2a2/0x390
>> [  930.538439]  remote_function+0x49/0x60
>> [  930.538931]  __flush_smp_call_function_queue+0xdc/0x700
>> [  930.539694]  ? __pfx_remote_function+0x10/0x10
>> [  930.540480]  __sysvec_call_function_single+0x38/0x140
>> [  930.541134]  sysvec_call_function_single+0x6c/0x90
>> [  930.541970]  </IRQ>
>> [  930.542269]  <TASK>
>> [  930.542766]  asm_sysvec_call_function_single+0x1a/0x20
>> [  930.543493] RIP: 0010:pv_native_safe_halt+0xf/0x20
>> [  930.544195] Code: 22 d7 e9 ff b5 13 00 0f 1f 40 00 90 90 90 90 90 90 90
>> 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d d3 e3 25 00 fb f4 <e9>
>> d7 b5 13 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
>> [  930.546841] RSP: 0018:ffffffff96a03e68 EFLAGS: 00000206
>> [  930.547563] RAX: 0000000000000006 RBX: ffffffff96a269c0 RCX:
>> 0000000000000000
>> [  930.548579] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
>> ffffffff94f53f31
>> [  930.549568] RBP: 0000000000000000 R08: 0000000000000001 R09:
>> 0000000000000000
>> [  930.550529] R10: 0000000000000001 R11: 0000000000000000 R12:
>> ffffffff970608e0
>> [  930.551582] R13: ffffffff96a269c0 R14: 0000000000000000 R15:
>> 0000000000000000
>> [  930.552683]  ? do_idle+0x1d1/0x2a0
>> [  930.553182]  default_idle+0x9/0x20
>> [  930.553670]  default_idle_call+0x7d/0xc0
>> [  930.554226]  do_idle+0x1d1/0x2a0
>> [  930.554696]  cpu_startup_entry+0x29/0x30
>> [  930.555154]  rest_init+0x12e/0x1d0
>> [  930.555621]  start_kernel+0x60f/0x6d0
>> [  930.556064]  x86_64_start_reservations+0x21/0x40
>> [  930.556633]  x86_64_start_kernel+0x91/0xa0
>> [  930.557107]  common_startup_64+0x13e/0x141
>> [  930.558038]  </TASK>
>> [  930.738880] perf: interrupt took too long (2511 > 2500), lowering
>> kernel.perf_event_max_sample_rate to 79000
>> [  930.772912] perf: interrupt took too long (3414 > 3138), lowering
>> kernel.perf_event_max_sample_rate to 58000
>> [  930.797764] perf: interrupt took too long (4275 > 4267), lowering
>> kernel.perf_event_max_sample_rate to 46000
>> [  931.117733] perf: interrupt took too long (5345 > 5343), lowering
>> kernel.perf_event_max_sample_rate to 37000
>> [  933.862829] perf: interrupt took too long (6765 > 6681), lowering
>> kernel.perf_event_max_sample_rate to 29000
>> [opc@jgarry-atomic-write-exp-e4-8-instance-20231214-1221 ~]$ ^C
>>
>> a known issue?

I am unable to replicate this with KVM guests. MSR 0xc0010200 is the
first PERF_CTL (event selector) and generally, unchecked MSR accesses
happen when the hypervisor restricts what guests can access.
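
To see what the rejected write was asking for, the value 0x0000020000510076 can be split into event-selector fields. The sketch below is only an illustration: the bit positions are assumed from the AMD64 core performance event-select layout (EventSelect[7:0] in bits 7:0, UnitMask in 15:8, USR 16, OS 17, Edge 18, INT 20, EN 22, INV 23, CntMask in 31:24, EventSelect[11:8] in 35:32, GuestOnly 40, HostOnly 41), and the helper name decode_perf_ctl is hypothetical.

```python
# Hypothetical helper: split an AMD PERF_CTL (event selector) value into
# named fields. Bit positions are an assumption based on the AMD64 core
# performance event-select register layout.

def decode_perf_ctl(value):
    def bit(n):
        return (value >> n) & 1

    event_lo = value & 0xFF          # EventSelect[7:0]
    event_hi = (value >> 32) & 0xF   # EventSelect[11:8]
    return {
        "event_select": (event_hi << 8) | event_lo,
        "unit_mask": (value >> 8) & 0xFF,
        "usr": bit(16),         # count in user mode
        "os": bit(17),          # count in kernel mode
        "edge": bit(18),
        "int": bit(20),         # raise an interrupt on overflow
        "en": bit(22),          # counter enable
        "inv": bit(23),
        "cnt_mask": (value >> 24) & 0xFF,
        "guest_only": bit(40),
        "host_only": bit(41),
    }


fields = decode_perf_ctl(0x0000020000510076)
for name, val in fields.items():
    print(f"{name:12} = {val:#x}")
```

Under that layout the value decodes to event select 0x76 (the CPU cycles event on AMD) with USR, INT and EN set, unit mask 0, and the HostOnly bit (41) set. Whether the hypervisor objected to that particular bit is not established in this thread; the decode only shows which fields were being programmed.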

Can you share details about the hypervisor?
If it's just KVM, can you share the host kernel version as well?

>>
>> more /proc/cpuinfo gives:
>>
>> processor       : 0
>> vendor_id       : AuthenticAMD
>> cpu family      : 25
>> model           : 1
>> model name      : AMD EPYC 7J13 64-Core Processor
>> stepping        : 1
>> microcode       : 0x1000065
>> cpu MHz         : 2445.322
>> cache size      : 512 KB
>> physical id     : 0
>> siblings        : 16
>> core id         : 0
>> cpu cores       : 8
>> apicid          : 0
>> initial apicid  : 0
>> fpu             : yes
>> fpu_exception   : yes
>> cpuid level     : 16
>> wp              : yes
>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
>> pdpe1gb rdtscp lm rep_good nopl xtopology cpuid extd_apicid tsc_known_freq
>> pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
>> tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy
>> svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext
>> perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2
>> smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt
>> xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt nrip_save
>> umip pku ospke vaes vpclmulqdq rdpid arch_capabilities
>> bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2
>> spec_store_bypass srso ibpb_no_ret
>> bogomips        : 4890.64
>> TLB size        : 1024 4K pages
>> clflush size    : 64
>> cache_alignment : 64
>> address sizes   : 40 bits physical, 48 bits virtual
>> power management:
>>

I tried replicating this on systems with an EPYC 7713 (very similar to the
one above) and an EPYC 9654 but had no luck.

- Sandipan

* Re: [bug report] perf top generates kernel "unchecked MSR access error: WRMSR"
From: John Garry @ 2024-10-24 16:20 UTC
  To: Sandipan Das; +Cc: linux-perf-users, x86, ravi.bangoria, Namhyung Kim

On 24/10/2024 07:21, Sandipan Das wrote:
> Thanks for bringing this to our attention.
> 
>> On Tue, Oct 22, 2024 at 03:55:05PM +0100, John Garry wrote:
>>> Hi all,
>>>
>>> On my VM, "perf top" gives this stackframe on v6.12-rc4:
>>>
>>> [  930.527581] unchecked MSR access error: WRMSR to 0xc0010200 (tried to
>>> write 0x0000020000510076) at rIP: 0xffffffff94ead548
>>> (native_write_msr+0x8/0x30)
>>> [  930.531135] Call Trace:
>>> [  930.531456]  <IRQ>
>>> [  930.531749]  ? ex_handler_msr+0x138/0x150
>>> [  930.532285]  ? search_extable+0x26/0x30
>>> [  930.532780]  ? fixup_exception+0x9c/0x310
>>> [  930.533405]  ? exc_general_protection+0x10c/0x490
>>> [  930.534081]  ? asm_exc_general_protection+0x26/0x30
>>> [  930.534768]  ? native_write_msr+0x8/0x30
>>> [  930.535357]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> [  930.535998]  x86_pmu_enable_event+0xa5/0xd0
>>> [  930.536641]  amd_pmu_enable_all+0x4e/0x80
>>> [  930.537211]  ctx_resched+0x13b/0x1d0
>>> [  930.537735]  __perf_install_in_context+0x2a2/0x390
>>> [  930.538439]  remote_function+0x49/0x60
>>> [  930.538931]  __flush_smp_call_function_queue+0xdc/0x700
>>> [  930.539694]  ? __pfx_remote_function+0x10/0x10
>>> [  930.540480]  __sysvec_call_function_single+0x38/0x140
>>> [  930.541134]  sysvec_call_function_single+0x6c/0x90
>>> [  930.541970]  </IRQ>
>>> [  930.542269]  <TASK>
>>> [  930.542766]  asm_sysvec_call_function_single+0x1a/0x20
>>> [  930.543493] RIP: 0010:pv_native_safe_halt+0xf/0x20
>>> [  930.544195] Code: 22 d7 e9 ff b5 13 00 0f 1f 40 00 90 90 90 90 90 90 90
>>> 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d d3 e3 25 00 fb f4 <e9>
>>> d7 b5 13 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
>>> [  930.546841] RSP: 0018:ffffffff96a03e68 EFLAGS: 00000206
>>> [  930.547563] RAX: 0000000000000006 RBX: ffffffff96a269c0 RCX:
>>> 0000000000000000
>>> [  930.548579] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
>>> ffffffff94f53f31
>>> [  930.549568] RBP: 0000000000000000 R08: 0000000000000001 R09:
>>> 0000000000000000
>>> [  930.550529] R10: 0000000000000001 R11: 0000000000000000 R12:
>>> ffffffff970608e0
>>> [  930.551582] R13: ffffffff96a269c0 R14: 0000000000000000 R15:
>>> 0000000000000000
>>> [  930.552683]  ? do_idle+0x1d1/0x2a0
>>> [  930.553182]  default_idle+0x9/0x20
>>> [  930.553670]  default_idle_call+0x7d/0xc0
>>> [  930.554226]  do_idle+0x1d1/0x2a0
>>> [  930.554696]  cpu_startup_entry+0x29/0x30
>>> [  930.555154]  rest_init+0x12e/0x1d0
>>> [  930.555621]  start_kernel+0x60f/0x6d0
>>> [  930.556064]  x86_64_start_reservations+0x21/0x40
>>> [  930.556633]  x86_64_start_kernel+0x91/0xa0
>>> [  930.557107]  common_startup_64+0x13e/0x141
>>> [  930.558038]  </TASK>
>>> [  930.738880] perf: interrupt took too long (2511 > 2500), lowering
>>> kernel.perf_event_max_sample_rate to 79000
>>> [  930.772912] perf: interrupt took too long (3414 > 3138), lowering
>>> kernel.perf_event_max_sample_rate to 58000
>>> [  930.797764] perf: interrupt took too long (4275 > 4267), lowering
>>> kernel.perf_event_max_sample_rate to 46000
>>> [  931.117733] perf: interrupt took too long (5345 > 5343), lowering
>>> kernel.perf_event_max_sample_rate to 37000
>>> [  933.862829] perf: interrupt took too long (6765 > 6681), lowering
>>> kernel.perf_event_max_sample_rate to 29000
>>> [opc@jgarry-atomic-write-exp-e4-8-instance-20231214-1221 ~]$ ^C
>>>
>>> a known issue?
> I am unable to replicate this with KVM guests. MSR 0xc0010200 is the
> first PERF_CTL (event selector) and generally, unchecked MSR accesses
> happen when the hypervisor restricts what guests can access.
> 
> Can you share details about the hypervisor?
> If it's just KVM, can you share the host kernel version as well?

It's KVM, but I don't know the host version - I don't think it's easy 
info to get. Here's some KVM prints:

[opc@jgarry-atomic-write-exp-e4-8-instance-20231214-1221 ~]$ sudo dmesg | grep -i kvm
[    0.000000] Hypervisor detected: KVM
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: using sched offset of 394705327075 cycles
[    0.000002] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.010289] kvm-guest: APIC: eoi() replaced with kvm_guest_apic_eoi_write()
[    0.010297] kvm-guest: KVM setup pv remote TLB flush
[    0.010300] kvm-guest: setup PV sched yield
[    0.010324] Booting paravirtualized kernel on KVM
[    0.016465] kvm-guest: PV spinlocks enabled
[    0.058487] kvm-guest: APIC: send_IPI_mask() replaced with kvm_send_ipi_mask()
[    0.058491] kvm-guest: APIC: send_IPI_mask_allbutself() replaced with kvm_send_ipi_mask_allbutself()
[    0.058492] kvm-guest: setup PV IPIs
[    0.280302] clocksource: Switched to clocksource kvm-clock
[    1.302630] systemd[1]: Detected virtualization kvm.
[    1.312201] systemd[1]: Initializing machine ID from KVM UUID.
[   13.771695] systemd[1]: Detected virtualization kvm.
[   15.072121] kvm_amd: Nested Virtualization enabled
[   15.072124] kvm_amd: Nested Paging enabled
[opc@jgarry-atomic-write-exp-e4-8-instance-20231214-1221 ~]$


> 
>>> more /proc/cpuinfo gives:
>>>
>>> processor       : 0
>>> vendor_id       : AuthenticAMD
>>> cpu family      : 25
>>> model           : 1
>>> model name      : AMD EPYC 7J13 64-Core Processor
>>> stepping        : 1
>>> microcode       : 0x1000065
>>> cpu MHz         : 2445.322
>>> cache size      : 512 KB
>>> physical id     : 0
>>> siblings        : 16
>>> core id         : 0
>>> cpu cores       : 8
>>> apicid          : 0
>>> initial apicid  : 0
>>> fpu             : yes
>>> fpu_exception   : yes
>>> cpuid level     : 16
>>> wp              : yes
>>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>>> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
>>> pdpe1gb rdtscp lm rep_good nopl xtopology cpuid extd_apicid tsc_known_freq
>>> pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
>>> tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy
>>> svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext
>>> perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2
>>> smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt
>>> xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt nrip_save
>>> umip pku ospke vaes vpclmulqdq rdpid arch_capabilities
>>> bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2
>>> spec_store_bypass srso ibpb_no_ret
>>> bogomips        : 4890.64
>>> TLB size        : 1024 4K pages
>>> clflush size    : 64
>>> cache_alignment : 64
>>> address sizes   : 40 bits physical, 48 bits virtual
>>> power management:
>>>
> I tried replicating this on systems with an EPYC 7713 (very similar to the
> one above) and an EPYC 9654 but had no luck.

Thanks for checking.

But wouldn't you know it, it does not occur now. I guess that it will
reappear; I'll let you know if it does.

Thanks
John

