From: Dongli Zhang <dongli.zhang@oracle.com>
To: qemu-devel@nongnu.org, kvm@vger.kernel.org
Cc: pbonzini@redhat.com, zhao1.liu@intel.com, mtosatti@redhat.com,
sandipan.das@amd.com, babu.moger@amd.com, likexu@tencent.com,
like.xu.linux@gmail.com, groug@kaod.org, khorenko@virtuozzo.com,
alexander.ivanov@virtuozzo.com, den@virtuozzo.com,
davydov-max@yandex-team.ru, xiaoyao.li@intel.com,
dapeng1.mi@linux.intel.com, joe.jin@oracle.com,
ewanhai-oc@zhaoxin.com, ewanhai@zhaoxin.com
Subject: [PATCH v6 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup
Date: Tue, 24 Jun 2025 00:43:19 -0700
Message-ID: <20250624074421.40429-1-dongli.zhang@oracle.com>
This patchset addresses four bugs related to AMD PMU virtualization.
1. PerfMonV2 is still available when PERFCORE is disabled via
"-cpu host,-perfctr-core".
2. The 'cpuid' command in the VM still reports PERFCORE even though "-pmu" is
configured.
3. The third issue is that "-cpu host,-pmu" does not disable AMD PMU
virtualization. With either "-cpu EPYC" or "-cpu host,-pmu", AMD PMU
virtualization remains enabled, and the Linux guest may still print:
[ 0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
instead of:
[ 0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured.
4. The fourth issue is that unreclaimed performance events (after a QEMU
system_reset) in KVM may cause random, unwanted, or unknown NMIs to be
injected into the VM.
The AMD PMU registers are not reset during QEMU system_reset.
(1) If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.
(2) Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.
(3) The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.
(4) After a reboot, the VM kernel may report the following error:
[ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
(5) In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:
[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
To resolve these issues, we propose resetting the AMD PMU registers during
the VM reset process.
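As a rough illustration of the reset idea, the sketch below builds the list
of AMD PMU MSRs that would be written with zero on reset. It is not the code
from this series: struct msr_write and build_amd_pmu_reset_list() are
hypothetical names, the counter counts (4 legacy, 6 with PERFCORE) are
assumptions, and PerfMonV2 with its variable counter count is not covered.
The MSR bases follow Linux's msr-index.h; 0xc0010200 is the MSR named in the
"[Firmware Bug]" message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* AMD PMU MSR bases (values as defined in Linux's msr-index.h). */
#define MSR_K7_EVNTSEL0     0xc0010000u  /* legacy event selects, stride 1 */
#define MSR_K7_PERFCTR0     0xc0010004u  /* legacy counters, stride 1 */
#define MSR_F15H_PERF_CTL0  0xc0010200u  /* PERFCORE event selects, stride 2 */
#define MSR_F15H_PERF_CTR0  0xc0010201u  /* PERFCORE counters, stride 2 */

/* Assumed counter counts for this sketch. */
#define AMD_LEGACY_COUNTERS 4
#define AMD_CORE_COUNTERS   6

/* Hypothetical stand-in for one (index, value) MSR write request. */
struct msr_write {
    uint32_t index;
    uint64_t value;
};

/*
 * Fill 'out' with zero-writes for every event-select/counter pair and
 * return the number of entries. Zeroing the event select clears its
 * enable bit, so no stale event stays armed across the reset.
 */
static size_t build_amd_pmu_reset_list(bool has_perfcore,
                                       struct msr_write *out, size_t cap)
{
    uint32_t sel_base = has_perfcore ? MSR_F15H_PERF_CTL0 : MSR_K7_EVNTSEL0;
    uint32_t ctr_base = has_perfcore ? MSR_F15H_PERF_CTR0 : MSR_K7_PERFCTR0;
    uint32_t stride = has_perfcore ? 2 : 1;
    size_t counters = has_perfcore ? AMD_CORE_COUNTERS : AMD_LEGACY_COUNTERS;
    size_t n = 0;

    assert(out != NULL && cap >= 2 * counters);

    for (size_t i = 0; i < counters; i++) {
        out[n++] = (struct msr_write){ sel_base + (uint32_t)i * stride, 0 };
        out[n++] = (struct msr_write){ ctr_base + (uint32_t)i * stride, 0 };
    }
    return n;
}
```

In a real VMM the resulting list would be handed to KVM in a single
KVM_SET_MSRS call as part of the vCPU reset path.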
Changed since v1:
- Use feature_dependencies for CPUID_EXT3_PERFCORE and
CPUID_8000_0022_EAX_PERFMON_V2.
- Remove CPUID_EXT3_PERFCORE when !cpu->enable_pmu.
- Pick kvm_arch_pre_create_vcpu() patch from Xiaoyao Li.
- Use "-pmu" instead of a global "pmu-cap-disabled" for KVM_PMU_CAP_DISABLE.
- Also use sysfs kvm.enable_pmu=N to determine if PMU is supported.
- Some changes to PMU register limit calculation.
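The two CPUID rules (PERFCORE is removed when the PMU is disabled, and
PerfMonV2 depends on PERFCORE) can be sketched as plain bit masking. This is
only an illustration: QEMU expresses the dependency through its
feature_dependencies table, while struct guest_pmu_features and
apply_pmu_feature_rules() below are hypothetical. The bit positions are
CPUID 0x80000001 ECX bit 23 for PERFCORE and CPUID 0x80000022 EAX bit 0 for
PerfMonV2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID_EXT3_PERFCORE             (1u << 23) /* 0x80000001 ECX */
#define CPUID_8000_0022_EAX_PERFMON_V2  (1u << 0)  /* 0x80000022 EAX */

/* Hypothetical container for the two CPUID words involved. */
struct guest_pmu_features {
    uint32_t ext3_ecx;           /* CPUID 0x80000001 ECX */
    uint32_t leaf_80000022_eax;  /* CPUID 0x80000022 EAX */
};

/*
 * Apply both rules in order: first drop PERFCORE when the PMU is off,
 * then drop PerfMonV2 whenever PERFCORE is absent for any reason.
 */
static void apply_pmu_feature_rules(struct guest_pmu_features *f,
                                    bool enable_pmu)
{
    assert(f != NULL);

    if (!enable_pmu) {
        f->ext3_ecx &= ~CPUID_EXT3_PERFCORE;
    }
    if (!(f->ext3_ecx & CPUID_EXT3_PERFCORE)) {
        f->leaf_80000022_eax &= ~CPUID_8000_0022_EAX_PERFMON_V2;
    }
}
```

Ordering matters here: running the dependency check second means that
"-pmu" also clears PerfMonV2 without a separate rule.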
Changed since v2:
- Change has_pmu_cap to pmu_cap.
- Use cpuid_find_entry() instead of cpu_x86_cpuid().
- Rework the code flow of PATCH 07 related to kvm.enable_pmu=N following
Zhao's suggestion.
- Use object_property_get_int() to get CPU family.
- Add support for Zhaoxin.
Changed since v3:
- Re-base on top of Zhao's queued patch.
- Use host_cpu_vendor_fms() from Zhao's patch.
- Pick new version of kvm_arch_pre_create_vcpu() patch from Xiaoyao.
- Re-split the cases into enable_pmu and !enable_pmu, following Zhao's
suggestion.
- Check AMD directly, which makes the "compat" rule clear.
- Some changes to commit messages and comments.
- Bring back global static variable 'kvm_pmu_disabled' read from
/sys/module/kvm/parameters/enable_pmu.
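For illustration, reading the module parameter could look roughly like the
sketch below. It assumes the kernel prints boolean module parameters as "Y"
or "N" and treats an unreadable file as "not disabled"; the function names
are hypothetical and this is not the code from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* True only if the parameter text starts with 'N' (PMU disabled). */
static bool param_text_means_disabled(const char *text)
{
    assert(text != NULL);
    return text[0] == 'N';
}

/*
 * Read a boolean module parameter file such as
 * /sys/module/kvm/parameters/enable_pmu. A missing or unreadable file
 * is treated as "not disabled" so that older kernels keep working.
 */
static bool kvm_pmu_disabled_by_param(const char *path)
{
    char buf[8] = "";
    FILE *fp = fopen(path, "r");

    if (fp == NULL) {
        return false;
    }
    if (fgets(buf, sizeof(buf), fp) == NULL) {
        buf[0] = '\0';  /* empty file: nothing says "disabled" */
    }
    fclose(fp);
    return param_text_means_disabled(buf);
}
```

A caller would evaluate this once at startup, for example
kvm_pmu_disabled_by_param("/sys/module/kvm/parameters/enable_pmu"), and
cache the result.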
Changed since v4:
- Re-base on top of most recent mainline QEMU.
- Add more Reviewed-by.
- All patches are reviewed.
Changed since v5:
- Re-base on top of most recent mainline QEMU.
- Remove patch "kvm: Introduce kvm_arch_pre_create_vcpu()" as it is
already merged.
- To resolve conflicts in the new [PATCH v6 3/9], move the PMU-related code
before the call site of is_tdx_vm().
There is a regression in mainline QEMU when "vendor=" is used on the QEMU
command line. I reverted the offending change when testing with "vendor=".
https://lore.kernel.org/all/d429b6f5-b59c-4884-b18f-8db71cb8dc7b@oracle.com/
Dongli Zhang (9):
target/i386: disable PerfMonV2 when PERFCORE unavailable
target/i386: disable PERFCORE when "-pmu" is configured
target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
target/i386/kvm: rename architectural PMU variables
target/i386/kvm: query kvm.enable_pmu parameter
target/i386/kvm: reset AMD PMU registers during VM reset
target/i386/kvm: support perfmon-v2 for reset
target/i386/kvm: don't stop Intel PMU counters
target/i386/cpu.c | 8 +
target/i386/cpu.h | 16 ++
target/i386/kvm/kvm.c | 355 +++++++++++++++++++++++++++++++++++++++------
3 files changed, 332 insertions(+), 47 deletions(-)
base-commit: 43ba160cb4bbb193560eb0d2d7decc4b5fc599fe
Thank you very much!
Dongli Zhang