qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Dongli Zhang <dongli.zhang@oracle.com>
To: qemu-devel@nongnu.org, kvm@vger.kernel.org, qemu-arm@nongnu.org,
	qemu-ppc@nongnu.org, qemu-riscv@nongnu.org,
	qemu-s390x@nongnu.org
Cc: pbonzini@redhat.com, zhao1.liu@intel.com, mtosatti@redhat.com,
	sandipan.das@amd.com, babu.moger@amd.com, likexu@tencent.com,
	like.xu.linux@gmail.com, groug@kaod.org, khorenko@virtuozzo.com,
	alexander.ivanov@virtuozzo.com, den@virtuozzo.com,
	davydov-max@yandex-team.ru, xiaoyao.li@intel.com,
	dapeng1.mi@linux.intel.com, joe.jin@oracle.com,
	peter.maydell@linaro.org, gaosong@loongson.cn,
	chenhuacai@kernel.org, philmd@linaro.org, aurelien@aurel32.net,
	jiaxun.yang@flygoat.com, arikalo@gmail.com, npiggin@gmail.com,
	danielhb413@gmail.com, palmer@dabbelt.com,
	alistair.francis@wdc.com, liwei1518@gmail.com,
	zhiwei_liu@linux.alibaba.com, pasic@linux.ibm.com,
	borntraeger@linux.ibm.com, richard.henderson@linaro.org,
	david@redhat.com, iii@linux.ibm.com, thuth@redhat.com,
	flavra@baylibre.com, ewanhai-oc@zhaoxin.com, ewanhai@zhaoxin.com,
	cobechen@zhaoxin.com, louisqi@zhaoxin.com, liamni@zhaoxin.com,
	frankzhu@zhaoxin.com, silviazhao@zhaoxin.com, kraxel@redhat.com,
	berrange@redhat.com
Subject: [PATCH v4 00/11] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup
Date: Wed, 16 Apr 2025 14:52:25 -0700	[thread overview]
Message-ID: <20250416215306.32426-1-dongli.zhang@oracle.com> (raw)

This patchset addresses four bugs related to AMD PMU virtualization.

1. The PerfMonV2 is still available if PERCORE if disabled via
"-cpu host,-perfctr-core".

2. The VM 'cpuid' command still returns PERFCORE although "-pmu" is
configured.

3. The third issue is that using "-cpu host,-pmu" does not disable AMD PMU
virtualization. When using "-cpu EPYC" or "-cpu host,-pmu", AMD PMU
virtualization remains enabled. On the VM's Linux side, you might still
see:

[    0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.

instead of:

[    0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured.

4. The fourth issue is that unreclaimed performance events (after a QEMU
system_reset) in KVM may cause random, unwanted, or unknown NMIs to be
injected into the VM.

The AMD PMU registers are not reset during QEMU system_reset.

(1) If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.

(2) Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.

(3) The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.

(4) After a reboot, the VM kernel may report the following error:

[    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)

(5) In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.

To resolve these issues, we propose resetting AMD PMU registers during the
VM reset process


Changed since v1:
  - Use feature_dependencies for CPUID_EXT3_PERFCORE and
    CPUID_8000_0022_EAX_PERFMON_V2.
  - Remove CPUID_EXT3_PERFCORE when !cpu->enable_pmu.
  - Pick kvm_arch_pre_create_vcpu() patch from Xiaoyao Li.
  - Use "-pmu" but not a global "pmu-cap-disabled" for KVM_PMU_CAP_DISABLE.
  - Also use sysfs kvm.enable_pmu=N to determine if PMU is supported.
  - Some changes to PMU register limit calculation.
Changed since v2:
  - Change has_pmu_cap to pmu_cap.
  - Use cpuid_find_entry() instead of cpu_x86_cpuid().
  - Rework the code flow of PATCH 07 related to kvm.enable_pmu=N following
    Zhao's suggestion.
  - Use object_property_get_int() to get CPU family.
  - Add support to Zhaoxin.
Changed since v3:
  - Re-base on top of Zhao's queued patch.
  - Use host_cpu_vendor_fms() from Zhao's patch.
  - Pick new version of kvm_arch_pre_create_vcpu() patch from Xiaoyao.
  - Re-split the cases into enable_pmu and !enable_pmu, following Zhao's
    suggestion.
  - Check AMD directly makes the "compat" rule clear.
  - Some changes on commit message and comment.
  - Bring back global static variable 'kvm_pmu_disabled' read from
    /sys/module/kvm/parameters/enable_pmu.


Zhao Liu (1):
  i386/cpu: Consolidate the helper to get Host's vendor [Don't merge]

Xiaoyao Li (1):
  kvm: Introduce kvm_arch_pre_create_vcpu()

Dongli Zhang (9):
  target/i386: disable PerfMonV2 when PERFCORE unavailable
  target/i386: disable PERFCORE when "-pmu" is configured
  target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
  target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
  target/i386/kvm: rename architectural PMU variables
  target/i386/kvm: query kvm.enable_pmu parameter
  target/i386/kvm: reset AMD PMU registers during VM reset
  target/i386/kvm: support perfmon-v2 for reset
  target/i386/kvm: don't stop Intel PMU counters

 accel/kvm/kvm-all.c           |   5 +
 include/system/kvm.h          |   1 +
 target/arm/kvm.c              |   5 +
 target/i386/cpu.c             |   8 +
 target/i386/cpu.h             |  16 ++
 target/i386/host-cpu.c        |  10 +-
 target/i386/kvm/kvm.c         | 360 ++++++++++++++++++++++++++++++++-----
 target/i386/kvm/vmsr_energy.c |   3 +-
 target/loongarch/kvm/kvm.c    |   4 +
 target/mips/kvm.c             |   5 +
 target/ppc/kvm.c              |   5 +
 target/riscv/kvm/kvm-cpu.c    |   5 +
 target/s390x/kvm/kvm.c        |   5 +
 13 files changed, 379 insertions(+), 53 deletions(-)

base-commit: a9cd5bc6399a80fcf233ed0fffe6067b731227d8

Thank you very much!

Dongli Zhang



             reply	other threads:[~2025-04-16 21:59 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-16 21:52 Dongli Zhang [this message]
2025-04-16 21:52 ` [PATCH v4 01/11] [DO NOT MERGE] i386/cpu: Consolidate the helper to get Host's vendor Dongli Zhang
2025-04-25  8:28   ` Zhao Liu
2025-04-25 15:45     ` Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 02/11] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 03/11] target/i386: disable PERFCORE when "-pmu" is configured Dongli Zhang
2025-04-25 10:11   ` Sandipan Das
2025-04-16 21:52 ` [PATCH v4 04/11] kvm: Introduce kvm_arch_pre_create_vcpu() Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 05/11] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 06/11] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 07/11] target/i386/kvm: rename architectural PMU variables Dongli Zhang
2025-04-16 21:52 ` [PATCH v4 08/11] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
2025-04-25  8:56   ` Zhao Liu
2025-04-16 21:52 ` [PATCH v4 09/11] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
2025-04-25  9:18   ` Zhao Liu
2025-04-25 10:14   ` Sandipan Das
2025-04-16 21:52 ` [PATCH v4 10/11] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
2025-04-25 10:12   ` Sandipan Das
2025-04-16 21:52 ` [PATCH v4 11/11] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250416215306.32426-1-dongli.zhang@oracle.com \
    --to=dongli.zhang@oracle.com \
    --cc=alexander.ivanov@virtuozzo.com \
    --cc=alistair.francis@wdc.com \
    --cc=arikalo@gmail.com \
    --cc=aurelien@aurel32.net \
    --cc=babu.moger@amd.com \
    --cc=berrange@redhat.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=chenhuacai@kernel.org \
    --cc=cobechen@zhaoxin.com \
    --cc=danielhb413@gmail.com \
    --cc=dapeng1.mi@linux.intel.com \
    --cc=david@redhat.com \
    --cc=davydov-max@yandex-team.ru \
    --cc=den@virtuozzo.com \
    --cc=ewanhai-oc@zhaoxin.com \
    --cc=ewanhai@zhaoxin.com \
    --cc=flavra@baylibre.com \
    --cc=frankzhu@zhaoxin.com \
    --cc=gaosong@loongson.cn \
    --cc=groug@kaod.org \
    --cc=iii@linux.ibm.com \
    --cc=jiaxun.yang@flygoat.com \
    --cc=joe.jin@oracle.com \
    --cc=khorenko@virtuozzo.com \
    --cc=kraxel@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=liamni@zhaoxin.com \
    --cc=like.xu.linux@gmail.com \
    --cc=likexu@tencent.com \
    --cc=liwei1518@gmail.com \
    --cc=louisqi@zhaoxin.com \
    --cc=mtosatti@redhat.com \
    --cc=npiggin@gmail.com \
    --cc=palmer@dabbelt.com \
    --cc=pasic@linux.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=philmd@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=qemu-riscv@nongnu.org \
    --cc=qemu-s390x@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=sandipan.das@amd.com \
    --cc=silviazhao@zhaoxin.com \
    --cc=thuth@redhat.com \
    --cc=xiaoyao.li@intel.com \
    --cc=zhao1.liu@intel.com \
    --cc=zhiwei_liu@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).