Re: [bug report] perf top generates kernel "unchecked MSR access error: WRMSR"

linux-perf-users.vger.kernel.org archive mirror

From: John Garry <john.g.garry@oracle.com>
To: Sandipan Das <sandipan.das@amd.com>
Cc: linux-perf-users@vger.kernel.org, x86@kernel.org,
	ravi.bangoria@amd.com, Namhyung Kim <namhyung@kernel.org>
Subject: Re: [bug report] perf top generates kernel "unchecked MSR access error: WRMSR"
Date: Thu, 24 Oct 2024 17:20:24 +0100
Message-ID: <8be4ca4e-ce69-468f-a846-d532a0e7393c@oracle.com>
In-Reply-To: <368573d6-fd3d-43c3-8c15-d01ef0c35026@amd.com>

On 24/10/2024 07:21, Sandipan Das wrote:
> Thanks for bringing this to our attention.
> 
>> On Tue, Oct 22, 2024 at 03:55:05PM +0100, John Garry wrote:
>>> Hi all,
>>>
>>> On my VM, "perf top" gives this stackframe on v6.12-rc4:
>>>
>>> [  930.527581] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000510076) at rIP: 0xffffffff94ead548 (native_write_msr+0x8/0x30)
>>> [  930.531135] Call Trace:
>>> [  930.531456]  <IRQ>
>>> [  930.531749]  ? ex_handler_msr+0x138/0x150
>>> [  930.532285]  ? search_extable+0x26/0x30
>>> [  930.532780]  ? fixup_exception+0x9c/0x310
>>> [  930.533405]  ? exc_general_protection+0x10c/0x490
>>> [  930.534081]  ? asm_exc_general_protection+0x26/0x30
>>> [  930.534768]  ? native_write_msr+0x8/0x30
>>> [  930.535357]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> [  930.535998]  x86_pmu_enable_event+0xa5/0xd0
>>> [  930.536641]  amd_pmu_enable_all+0x4e/0x80
>>> [  930.537211]  ctx_resched+0x13b/0x1d0
>>> [  930.537735]  __perf_install_in_context+0x2a2/0x390
>>> [  930.538439]  remote_function+0x49/0x60
>>> [  930.538931]  __flush_smp_call_function_queue+0xdc/0x700
>>> [  930.539694]  ? __pfx_remote_function+0x10/0x10
>>> [  930.540480]  __sysvec_call_function_single+0x38/0x140
>>> [  930.541134]  sysvec_call_function_single+0x6c/0x90
>>> [  930.541970]  </IRQ>
>>> [  930.542269]  <TASK>
>>> [  930.542766]  asm_sysvec_call_function_single+0x1a/0x20
>>> [  930.543493] RIP: 0010:pv_native_safe_halt+0xf/0x20
>>> [  930.544195] Code: 22 d7 e9 ff b5 13 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d d3 e3 25 00 fb f4 <e9> d7 b5 13 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90
>>> [  930.546841] RSP: 0018:ffffffff96a03e68 EFLAGS: 00000206
>>> [  930.547563] RAX: 0000000000000006 RBX: ffffffff96a269c0 RCX: 0000000000000000
>>> [  930.548579] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff94f53f31
>>> [  930.549568] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
>>> [  930.550529] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff970608e0
>>> [  930.551582] R13: ffffffff96a269c0 R14: 0000000000000000 R15: 0000000000000000
>>> [  930.552683]  ? do_idle+0x1d1/0x2a0
>>> [  930.553182]  default_idle+0x9/0x20
>>> [  930.553670]  default_idle_call+0x7d/0xc0
>>> [  930.554226]  do_idle+0x1d1/0x2a0
>>> [  930.554696]  cpu_startup_entry+0x29/0x30
>>> [  930.555154]  rest_init+0x12e/0x1d0
>>> [  930.555621]  start_kernel+0x60f/0x6d0
>>> [  930.556064]  x86_64_start_reservations+0x21/0x40
>>> [  930.556633]  x86_64_start_kernel+0x91/0xa0
>>> [  930.557107]  common_startup_64+0x13e/0x141
>>> [  930.558038]  </TASK>
>>> [  930.738880] perf: interrupt took too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
>>> [  930.772912] perf: interrupt took too long (3414 > 3138), lowering kernel.perf_event_max_sample_rate to 58000
>>> [  930.797764] perf: interrupt took too long (4275 > 4267), lowering kernel.perf_event_max_sample_rate to 46000
>>> [  931.117733] perf: interrupt took too long (5345 > 5343), lowering kernel.perf_event_max_sample_rate to 37000
>>> [  933.862829] perf: interrupt took too long (6765 > 6681), lowering kernel.perf_event_max_sample_rate to 29000
>>> [opc@jgarry-atomic-write-exp-e4-8-instance-20231214-1221 ~]$ ^C
>>>
>>> Is this a known issue?
> I am unable to replicate this with KVM guests. MSR 0xc0010200 is the
> first PERF_CTL (event selector) and generally, unchecked MSR accesses
> happen when the hypervisor restricts what guests can access.
> 
> Can you share details about the hypervisor?
> If it's just KVM, can you share the host kernel version as well?
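
To make the failed write easier to read, the value from the WRMSR message can be split into the PERF_CTL (PerfEvtSel) fields. The Python below is a minimal sketch, assuming the PerfEvtSel bit layout described in the AMD64 Architecture Programmer's Manual, Vol. 2 (EventSelect in bits 7:0 and 35:32, UnitMask 15:8, USR 16, OS 17, Edge 18, INT 20, EN 22, INV 23, CntMask 31:24, GuestOnly 40, HostOnly 41). The helper names `bits` and `decode_perf_ctl` are hypothetical, and the bit positions should be checked against the PPR for the specific processor.

```python
# Hypothetical decoder for an AMD PERF_CTL (PerfEvtSel) MSR value.
# Bit positions are an assumption taken from the AMD64 APM Vol. 2 layout.


def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive) of value, shifted down to bit 0."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_perf_ctl(value: int) -> dict:
    """Split a 64-bit PERF_CTL value into its named fields."""
    return {
        # EventSelect is split: bits 7:0 low byte, bits 35:32 high nibble.
        "event": (bits(value, 32, 35) << 8) | bits(value, 0, 7),
        "umask": bits(value, 8, 15),
        "usr": bits(value, 16, 16),
        "os": bits(value, 17, 17),
        "edge": bits(value, 18, 18),
        "int": bits(value, 20, 20),
        "en": bits(value, 22, 22),
        "inv": bits(value, 23, 23),
        "cnt_mask": bits(value, 24, 31),
        "guest_only": bits(value, 40, 40),
        "host_only": bits(value, 41, 41),
    }


if __name__ == "__main__":
    # Value copied from the "tried to write" part of the WRMSR message.
    value = 0x0000020000510076
    for name, field in decode_perf_ctl(value).items():
        print(f"{name:>10}: {field:#x}")
```

For 0x0000020000510076 this reports event 0x76 with the USR, INT, EN and HostOnly bits set, and OS, GuestOnly and CntMask clear.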

It's KVM, but I don't know the host version; I don't think that information
is easy to get. Here are some KVM-related boot messages:

[opc@jgarry-atomic-write-exp-e4-8-instance-20231214-1221 ~]$ sudo dmesg | grep -i kvm
[    0.000000] Hypervisor detected: KVM
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: using sched offset of 394705327075 cycles
[    0.000002] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.010289] kvm-guest: APIC: eoi() replaced with kvm_guest_apic_eoi_write()
[    0.010297] kvm-guest: KVM setup pv remote TLB flush
[    0.010300] kvm-guest: setup PV sched yield
[    0.010324] Booting paravirtualized kernel on KVM
[    0.016465] kvm-guest: PV spinlocks enabled
[    0.058487] kvm-guest: APIC: send_IPI_mask() replaced with kvm_send_ipi_mask()
[    0.058491] kvm-guest: APIC: send_IPI_mask_allbutself() replaced with kvm_send_ipi_mask_allbutself()
[    0.058492] kvm-guest: setup PV IPIs
[    0.280302] clocksource: Switched to clocksource kvm-clock
[    1.302630] systemd[1]: Detected virtualization kvm.
[    1.312201] systemd[1]: Initializing machine ID from KVM UUID.
[   13.771695] systemd[1]: Detected virtualization kvm.
[   15.072121] kvm_amd: Nested Virtualization enabled
[   15.072124] kvm_amd: Nested Paging enabled
[opc@jgarry-atomic-write-exp-e4-8-instance-20231214-1221 ~]$


> 
>>> more /proc/cpuinfo gives:
>>>
>>> processor       : 0
>>> vendor_id       : AuthenticAMD
>>> cpu family      : 25
>>> model           : 1
>>> model name      : AMD EPYC 7J13 64-Core Processor
>>> stepping        : 1
>>> microcode       : 0x1000065
>>> cpu MHz         : 2445.322
>>> cache size      : 512 KB
>>> physical id     : 0
>>> siblings        : 16
>>> core id         : 0
>>> cpu cores       : 8
>>> apicid          : 0
>>> initial apicid  : 0
>>> fpu             : yes
>>> fpu_exception   : yes
>>> cpuid level     : 16
>>> wp              : yes
>>> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>>> cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt
>>> pdpe1gb rdtscp lm rep_good nopl xtopology cpuid extd_apicid tsc_known_freq
>>> pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
>>> tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy
>>> svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext
>>> perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2
>>> smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt
>>> xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt nrip_save
>>> umip pku ospke vaes vpclmulqdq rdpid arch_capabilities
>>> bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2
>>> spec_store_bypass srso ibpb_no_ret
>>> bogomips        : 4890.64
>>> TLB size        : 1024 4K pages
>>> clflush size    : 64
>>> cache_alignment : 64
>>> address sizes   : 40 bits physical, 48 bits virtual
>>> power management:
>>>
> I tried replicating this on systems with an EPYC 7713 (very similar to the
> one above) and an EPYC 9654 but had no luck.

Thanks for checking.

But wouldn't you know it, it no longer occurs. I expect it will
reappear at some point; I'll let you know when it does.

Thanks
John

Thread overview: 4+ messages
2024-10-22 14:55 [bug report] perf top generates kernel "unchecked MSR access error: WRMSR" John Garry
2024-10-23 22:59 ` Namhyung Kim
2024-10-24  6:21   ` Sandipan Das
2024-10-24 16:20     ` John Garry [this message]