* [PATCH v6 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup
@ 2025-06-24 7:43 Dongli Zhang
2025-06-24 7:43 ` [PATCH v6 1/9] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
` (8 more replies)
0 siblings, 9 replies; 16+ messages in thread
From: Dongli Zhang @ 2025-06-24 7:43 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
This patchset addresses four bugs related to AMD PMU virtualization.
1. PerfMonV2 is still advertised if PERFCORE is disabled via
"-cpu host,-perfctr-core".
2. The VM 'cpuid' command still reports PERFCORE even though "-pmu" is
configured.
3. The third issue is that using "-cpu host,-pmu" does not disable AMD PMU
virtualization. When using "-cpu EPYC" or "-cpu host,-pmu", AMD PMU
virtualization remains enabled. On the VM's Linux side, you might still
see:
[ 0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
instead of:
[ 0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured.
4. The fourth issue is that unreclaimed performance events (after a QEMU
system_reset) in KVM may cause random, unwanted, or unknown NMIs to be
injected into the VM.
The AMD PMU registers are not reset during QEMU system_reset.
(1) If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.
(2) Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.
(3) The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.
(4) After a reboot, the VM kernel may report the following error:
[ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
(5) In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:
[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
To resolve these issues, we propose resetting AMD PMU registers during the
VM reset process.
Changed since v1:
- Use feature_dependencies for CPUID_EXT3_PERFCORE and
CPUID_8000_0022_EAX_PERFMON_V2.
- Remove CPUID_EXT3_PERFCORE when !cpu->enable_pmu.
- Pick kvm_arch_pre_create_vcpu() patch from Xiaoyao Li.
- Use "-pmu" but not a global "pmu-cap-disabled" for KVM_PMU_CAP_DISABLE.
- Also use sysfs kvm.enable_pmu=N to determine if PMU is supported.
- Some changes to PMU register limit calculation.
Changed since v2:
- Change has_pmu_cap to pmu_cap.
- Use cpuid_find_entry() instead of cpu_x86_cpuid().
- Rework the code flow of PATCH 07 related to kvm.enable_pmu=N following
Zhao's suggestion.
- Use object_property_get_int() to get CPU family.
- Add support to Zhaoxin.
Changed since v3:
- Re-base on top of Zhao's queued patch.
- Use host_cpu_vendor_fms() from Zhao's patch.
- Pick new version of kvm_arch_pre_create_vcpu() patch from Xiaoyao.
- Re-split the cases into enable_pmu and !enable_pmu, following Zhao's
suggestion.
- Check AMD directly makes the "compat" rule clear.
- Some changes on commit message and comment.
- Bring back global static variable 'kvm_pmu_disabled' read from
/sys/module/kvm/parameters/enable_pmu.
Changed since v4:
- Re-base on top of most recent mainline QEMU.
- Add more Reviewed-by.
- All patches are reviewed.
Changed since v5:
- Re-base on top of most recent mainline QEMU.
- Remove patch "kvm: Introduce kvm_arch_pre_create_vcpu()" as it is
already merged.
- To resolve conflicts in the new [PATCH v6 3/9], move the PMU related code
before the call site of is_tdx_vm().
There is a regression in mainline QEMU when "vendor=" is used on the QEMU
command line. I have reverted it when testing with "vendor=".
https://lore.kernel.org/all/d429b6f5-b59c-4884-b18f-8db71cb8dc7b@oracle.com/
Dongli Zhang (9):
target/i386: disable PerfMonV2 when PERFCORE unavailable
target/i386: disable PERFCORE when "-pmu" is configured
target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
target/i386/kvm: rename architectural PMU variables
target/i386/kvm: query kvm.enable_pmu parameter
target/i386/kvm: reset AMD PMU registers during VM reset
target/i386/kvm: support perfmon-v2 for reset
target/i386/kvm: don't stop Intel PMU counters
target/i386/cpu.c | 8 +
target/i386/cpu.h | 16 ++
target/i386/kvm/kvm.c | 355 +++++++++++++++++++++++++++++++++++++++------
3 files changed, 332 insertions(+), 47 deletions(-)
base-commit: 43ba160cb4bbb193560eb0d2d7decc4b5fc599fe
Thank you very much!
Dongli Zhang
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v6 1/9] target/i386: disable PerfMonV2 when PERFCORE unavailable
2025-06-24 7:43 [PATCH v6 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
@ 2025-06-24 7:43 ` Dongli Zhang
2025-06-24 7:43 ` [PATCH v6 2/9] target/i386: disable PERFCORE when "-pmu" is configured Dongli Zhang
` (7 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Dongli Zhang @ 2025-06-24 7:43 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
When PERFCORE is disabled with "-cpu host,-perfctr-core", it is
reflected in the guest dmesg:
[ 0.285136] Performance Events: AMD PMU driver.
However, the guest CPUID indicates that PerfMonV2 is still available.
CPU:
Extended Performance Monitoring and Debugging (0x80000022):
AMD performance monitoring V2 = true
AMD LBR V2 = false
AMD LBR stack & PMC freezing = false
number of core perf ctrs = 0x6 (6)
number of LBR stack entries = 0x0 (0)
number of avail Northbridge perf ctrs = 0x0 (0)
number of available UMC PMCs = 0x0 (0)
active UMCs bitmask = 0x0
Disable PerfMonV2 in CPUID when PERFCORE is disabled.
Suggested-by: Zhao Liu <zhao1.liu@intel.com>
Fixes: 209b0ac12074 ("target/i386: Add PerfMonV2 feature bit")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
---
Changed since v1:
- Use feature_dependencies (suggested by Zhao Liu).
Changed since v2:
- Nothing. Zhao and Xiaoyao may move it to x86_cpu_expand_features()
later.
target/i386/cpu.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 0d35e95430..21494816d4 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1855,6 +1855,10 @@ static FeatureDep feature_dependencies[] = {
.from = { FEAT_7_1_EDX, CPUID_7_1_EDX_AVX10 },
.to = { FEAT_24_0_EBX, ~0ull },
},
+ {
+ .from = { FEAT_8000_0001_ECX, CPUID_EXT3_PERFCORE },
+ .to = { FEAT_8000_0022_EAX, CPUID_8000_0022_EAX_PERFMON_V2 },
+ },
};
typedef struct X86RegisterInfo32 {
--
2.43.5
* [PATCH v6 2/9] target/i386: disable PERFCORE when "-pmu" is configured
2025-06-24 7:43 [PATCH v6 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
2025-06-24 7:43 ` [PATCH v6 1/9] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
@ 2025-06-24 7:43 ` Dongli Zhang
2025-06-24 7:43 ` [PATCH v6 3/9] target/i386/kvm: set KVM_PMU_CAP_DISABLE if " Dongli Zhang
` (6 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Dongli Zhang @ 2025-06-24 7:43 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
Currently, AMD PMU support is not determined based on CPUID; that is, the
"-pmu" option does not fully disable KVM AMD PMU virtualization.
To minimize AMD PMU features, remove PERFCORE when "-pmu" is configured.
Completely disabling AMD PMU virtualization will be implemented via
KVM_CAP_PMU_CAPABILITY in upcoming patches.
As a reminder, neither CPUID_EXT3_PERFCORE nor
CPUID_8000_0022_EAX_PERFMON_V2 is removed from env->features[] when "-pmu"
is configured. In future patches, developers should query whether they are
supported via cpu_x86_cpuid() rather than relying on env->features[].
Suggested-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
---
Changed since v2:
- No need to check "kvm_enabled() && IS_AMD_CPU(env)".
Changed since v4:
- Add Reviewed-by from Sandipan.
target/i386/cpu.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 21494816d4..50757123eb 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7768,6 +7768,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
!(env->hflags & HF_LMA_MASK)) {
*edx &= ~CPUID_EXT2_SYSCALL;
}
+
+ if (!cpu->enable_pmu) {
+ *ecx &= ~CPUID_EXT3_PERFCORE;
+ }
break;
case 0x80000002:
case 0x80000003:
--
2.43.5
* [PATCH v6 3/9] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
2025-06-24 7:43 [PATCH v6 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
2025-06-24 7:43 ` [PATCH v6 1/9] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
2025-06-24 7:43 ` [PATCH v6 2/9] target/i386: disable PERFCORE when "-pmu" is configured Dongli Zhang
@ 2025-06-24 7:43 ` Dongli Zhang
2025-07-02 3:47 ` Mi, Dapeng
2025-06-24 7:43 ` [PATCH v6 4/9] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
` (5 subsequent siblings)
8 siblings, 1 reply; 16+ messages in thread
From: Dongli Zhang @ 2025-06-24 7:43 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured,
there is no way to fully disable KVM AMD PMU virtualization. Neither
"-cpu host,-pmu" nor "-cpu EPYC" achieves this.
As a result, the following message still appears in the VM dmesg:
[ 0.263615] Performance Events: AMD PMU driver.
However, the expected output should be:
[ 0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
This occurs because AMD does not use any CPUID bit to indicate PMU
availability.
To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v1:
- Switch back to the initial implementation with "-pmu".
https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com
- Mention that "KVM_PMU_CAP_DISABLE doesn't change the PMU behavior on
Intel platform because current "pmu" property works as expected."
Changed since v2:
- Change has_pmu_cap to pmu_cap.
- Use (pmu_cap & KVM_PMU_CAP_DISABLE) instead of only pmu_cap in if
statement.
- Add Reviewed-by from Xiaoyao and Zhao as the change is minor.
Changed since v5:
- Re-base on top of most recent mainline QEMU.
- To resolve conflicts, move the PMU related code before the
call site of is_tdx_vm().
target/i386/kvm/kvm.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 234878c613..15155b79b5 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -178,6 +178,8 @@ static int has_triple_fault_event;
static bool has_msr_mcg_ext_ctl;
+static int pmu_cap;
+
static struct kvm_cpuid2 *cpuid_cache;
static struct kvm_cpuid2 *hv_cpuid_cache;
static struct kvm_msr_list *kvm_feature_msrs;
@@ -2062,6 +2064,33 @@ full:
int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
{
+ static bool first = true;
+ int ret;
+
+ if (first) {
+ first = false;
+
+ /*
+ * Since Linux v5.18, KVM provides a VM-level capability to easily
+ * disable PMUs; however, QEMU has been providing PMU property per
+ * CPU since v1.6. In order to accommodate both, have to configure
+ * the VM-level capability here.
+ *
+ * KVM_PMU_CAP_DISABLE doesn't change the PMU
+ * behavior on Intel platform because current "pmu" property works
+ * as expected.
+ */
+ if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
+ ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+ KVM_PMU_CAP_DISABLE);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret,
+ "Failed to set KVM_PMU_CAP_DISABLE");
+ return ret;
+ }
+ }
+ }
+
if (is_tdx_vm()) {
return tdx_pre_create_vcpu(cpu, errp);
}
@@ -3363,6 +3392,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
}
}
+ pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
+
return 0;
}
--
2.43.5
* [PATCH v6 4/9] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
2025-06-24 7:43 [PATCH v6 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (2 preceding siblings ...)
2025-06-24 7:43 ` [PATCH v6 3/9] target/i386/kvm: set KVM_PMU_CAP_DISABLE if " Dongli Zhang
@ 2025-06-24 7:43 ` Dongli Zhang
2025-07-02 3:52 ` Mi, Dapeng
2025-06-24 7:43 ` [PATCH v6 5/9] target/i386/kvm: rename architectural PMU variables Dongli Zhang
` (4 subsequent siblings)
8 siblings, 1 reply; 16+ messages in thread
From: Dongli Zhang @ 2025-06-24 7:43 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
The initialization of 'has_architectural_pmu_version',
'num_architectural_pmu_gp_counters', and
'num_architectural_pmu_fixed_counters' is unrelated to the process of
building the CPUID.
Extract them out of kvm_x86_build_cpuid().
In addition, use cpuid_find_entry() instead of cpu_x86_cpuid(), because
CPUID has already been filled at this stage.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v1:
- Still extract the code, but call them for all CPUs.
Changed since v2:
- Use cpuid_find_entry() instead of cpu_x86_cpuid().
- Didn't add Reviewed-by from Dapeng as the change isn't minor.
target/i386/kvm/kvm.c | 62 ++++++++++++++++++++++++-------------------
1 file changed, 35 insertions(+), 27 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 15155b79b5..4baaa069b8 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1968,33 +1968,6 @@ uint32_t kvm_x86_build_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
}
}
- if (limit >= 0x0a) {
- uint32_t eax, edx;
-
- cpu_x86_cpuid(env, 0x0a, 0, &eax, &unused, &unused, &edx);
-
- has_architectural_pmu_version = eax & 0xff;
- if (has_architectural_pmu_version > 0) {
- num_architectural_pmu_gp_counters = (eax & 0xff00) >> 8;
-
- /* Shouldn't be more than 32, since that's the number of bits
- * available in EBX to tell us _which_ counters are available.
- * Play it safe.
- */
- if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
- num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
- }
-
- if (has_architectural_pmu_version > 1) {
- num_architectural_pmu_fixed_counters = edx & 0x1f;
-
- if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
- num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
- }
- }
- }
- }
-
cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
for (i = 0x80000000; i <= limit; i++) {
@@ -2098,6 +2071,39 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
return 0;
}
+static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
+{
+ struct kvm_cpuid_entry2 *c;
+
+ c = cpuid_find_entry(cpuid, 0xa, 0);
+
+ if (!c) {
+ return;
+ }
+
+ has_architectural_pmu_version = c->eax & 0xff;
+ if (has_architectural_pmu_version > 0) {
+ num_architectural_pmu_gp_counters = (c->eax & 0xff00) >> 8;
+
+ /*
+ * Shouldn't be more than 32, since that's the number of bits
+ * available in EBX to tell us _which_ counters are available.
+ * Play it safe.
+ */
+ if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
+ num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+ }
+
+ if (has_architectural_pmu_version > 1) {
+ num_architectural_pmu_fixed_counters = c->edx & 0x1f;
+
+ if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+ num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+ }
+ }
+ }
+}
+
int kvm_arch_init_vcpu(CPUState *cs)
{
struct {
@@ -2288,6 +2294,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
cpuid_data.cpuid.nent = cpuid_i;
+ kvm_init_pmu_info(&cpuid_data.cpuid);
+
if (((env->cpuid_version >> 8)&0xF) >= 6
&& (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
(CPUID_MCE | CPUID_MCA)) {
--
2.43.5
* [PATCH v6 5/9] target/i386/kvm: rename architectural PMU variables
2025-06-24 7:43 [PATCH v6 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (3 preceding siblings ...)
2025-06-24 7:43 ` [PATCH v6 4/9] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
@ 2025-06-24 7:43 ` Dongli Zhang
2025-08-13 9:18 ` Sandipan Das
2025-06-24 7:43 ` [PATCH v6 6/9] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
` (3 subsequent siblings)
8 siblings, 1 reply; 16+ messages in thread
From: Dongli Zhang @ 2025-06-24 7:43 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
AMD does not have what is commonly referred to as an architectural PMU.
Therefore, we need to rename the following variables to be applicable for
both Intel and AMD:
- has_architectural_pmu_version
- num_architectural_pmu_gp_counters
- num_architectural_pmu_fixed_counters
For Intel processors, the meaning of pmu_version remains unchanged.
For AMD processors:
pmu_version == 1 corresponds to versions before AMD PerfMonV2.
pmu_version == 2 corresponds to AMD PerfMonV2.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v2:
- Change has_pmu_version to pmu_version.
- Add Reviewed-by since the change is minor.
- As a reminder, there are some contextual change due to PATCH 05,
i.e., c->edx vs. edx.
target/i386/kvm/kvm.c | 49 ++++++++++++++++++++++++-------------------
1 file changed, 28 insertions(+), 21 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4baaa069b8..824148688d 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -166,9 +166,16 @@ static bool has_msr_perf_capabs;
static bool has_msr_pkrs;
static bool has_msr_hwcr;
-static uint32_t has_architectural_pmu_version;
-static uint32_t num_architectural_pmu_gp_counters;
-static uint32_t num_architectural_pmu_fixed_counters;
+/*
+ * For Intel processors, the meaning is the architectural PMU version
+ * number.
+ *
+ * For AMD processors: 1 corresponds to the prior versions, and 2
+ * corresponds to AMD PerfMonV2.
+ */
+static uint32_t pmu_version;
+static uint32_t num_pmu_gp_counters;
+static uint32_t num_pmu_fixed_counters;
static int has_xsave2;
static int has_xcrs;
@@ -2081,24 +2088,24 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
return;
}
- has_architectural_pmu_version = c->eax & 0xff;
- if (has_architectural_pmu_version > 0) {
- num_architectural_pmu_gp_counters = (c->eax & 0xff00) >> 8;
+ pmu_version = c->eax & 0xff;
+ if (pmu_version > 0) {
+ num_pmu_gp_counters = (c->eax & 0xff00) >> 8;
/*
* Shouldn't be more than 32, since that's the number of bits
* available in EBX to tell us _which_ counters are available.
* Play it safe.
*/
- if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
- num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+ if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
+ num_pmu_gp_counters = MAX_GP_COUNTERS;
}
- if (has_architectural_pmu_version > 1) {
- num_architectural_pmu_fixed_counters = c->edx & 0x1f;
+ if (pmu_version > 1) {
+ num_pmu_fixed_counters = c->edx & 0x1f;
- if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
- num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+ if (num_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+ num_pmu_fixed_counters = MAX_FIXED_COUNTERS;
}
}
}
@@ -4051,25 +4058,25 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
}
- if (has_architectural_pmu_version > 0) {
- if (has_architectural_pmu_version > 1) {
+ if (pmu_version > 0) {
+ if (pmu_version > 1) {
/* Stop the counter. */
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
}
/* Set the counter values. */
- for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
+ for (i = 0; i < num_pmu_fixed_counters; i++) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
env->msr_fixed_counters[i]);
}
- for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+ for (i = 0; i < num_pmu_gp_counters; i++) {
kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i,
env->msr_gp_counters[i]);
kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i,
env->msr_gp_evtsel[i]);
}
- if (has_architectural_pmu_version > 1) {
+ if (pmu_version > 1) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS,
env->msr_global_status);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
@@ -4529,17 +4536,17 @@ static int kvm_get_msrs(X86CPU *cpu)
if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
}
- if (has_architectural_pmu_version > 0) {
- if (has_architectural_pmu_version > 1) {
+ if (pmu_version > 0) {
+ if (pmu_version > 1) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0);
}
- for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
+ for (i = 0; i < num_pmu_fixed_counters; i++) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i, 0);
}
- for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+ for (i = 0; i < num_pmu_gp_counters; i++) {
kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i, 0);
kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i, 0);
}
--
2.43.5
* [PATCH v6 6/9] target/i386/kvm: query kvm.enable_pmu parameter
2025-06-24 7:43 [PATCH v6 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (4 preceding siblings ...)
2025-06-24 7:43 ` [PATCH v6 5/9] target/i386/kvm: rename architectural PMU variables Dongli Zhang
@ 2025-06-24 7:43 ` Dongli Zhang
2025-07-02 5:10 ` Mi, Dapeng
2025-06-24 7:43 ` [PATCH v6 7/9] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
` (2 subsequent siblings)
8 siblings, 1 reply; 16+ messages in thread
From: Dongli Zhang @ 2025-06-24 7:43 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
When PMU is enabled in QEMU, there is a chance that PMU virtualization is
completely disabled by the KVM module parameter kvm.enable_pmu=N.
The kvm.enable_pmu parameter was introduced in Linux v5.17.
Its permission is 0444, and its value does not change until the KVM
module is reloaded.
Read the kvm.enable_pmu value from the module sysfs so that QEMU can
provide more information about vPMU enablement.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v2:
- Rework the code flow following Zhao's suggestion.
- Return error when:
(*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu)
Changed since v3:
- Re-split the cases into enable_pmu and !enable_pmu, following Zhao's
suggestion.
- Rework the commit messages.
- Bring back global static variable 'kvm_pmu_disabled' from v2.
Changed since v4:
- Add Reviewed-by from Zhao.
Changed since v5:
- Rebase on top of most recent QEMU.
target/i386/kvm/kvm.c | 61 +++++++++++++++++++++++++++++++------------
1 file changed, 44 insertions(+), 17 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 824148688d..d191dd1da3 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -186,6 +186,10 @@ static int has_triple_fault_event;
static bool has_msr_mcg_ext_ctl;
static int pmu_cap;
+/*
+ * Read from /sys/module/kvm/parameters/enable_pmu.
+ */
+static bool kvm_pmu_disabled;
static struct kvm_cpuid2 *cpuid_cache;
static struct kvm_cpuid2 *hv_cpuid_cache;
@@ -2050,23 +2054,30 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
if (first) {
first = false;
- /*
- * Since Linux v5.18, KVM provides a VM-level capability to easily
- * disable PMUs; however, QEMU has been providing PMU property per
- * CPU since v1.6. In order to accommodate both, have to configure
- * the VM-level capability here.
- *
- * KVM_PMU_CAP_DISABLE doesn't change the PMU
- * behavior on Intel platform because current "pmu" property works
- * as expected.
- */
- if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
- ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
- KVM_PMU_CAP_DISABLE);
- if (ret < 0) {
- error_setg_errno(errp, -ret,
- "Failed to set KVM_PMU_CAP_DISABLE");
- return ret;
+ if (X86_CPU(cpu)->enable_pmu) {
+ if (kvm_pmu_disabled) {
+ warn_report("Failed to enable PMU since "
+ "KVM's enable_pmu parameter is disabled");
+ }
+ } else {
+ /*
+ * Since Linux v5.18, KVM provides a VM-level capability to easily
+ * disable PMUs; however, QEMU has been providing PMU property per
+ * CPU since v1.6. In order to accommodate both, have to configure
+ * the VM-level capability here.
+ *
+ * KVM_PMU_CAP_DISABLE doesn't change the PMU
+ * behavior on Intel platform because current "pmu" property works
+ * as expected.
+ */
+ if (pmu_cap & KVM_PMU_CAP_DISABLE) {
+ ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+ KVM_PMU_CAP_DISABLE);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret,
+ "Failed to set KVM_PMU_CAP_DISABLE");
+ return ret;
+ }
}
}
}
@@ -3273,6 +3284,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
int ret;
struct utsname utsname;
Error *local_err = NULL;
+ g_autofree char *kvm_enable_pmu;
/*
* Initialize confidential guest (SEV/TDX) context, if required
@@ -3409,6 +3421,21 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
+ /*
+ * The enable_pmu parameter is introduced since Linux v5.17,
+ * give a chance to provide more information about vPMU
+ * enablement.
+ *
+ * The kvm.enable_pmu's permission is 0444. It does not change
+ * until a reload of the KVM module.
+ */
+ if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
+ &kvm_enable_pmu, NULL, NULL)) {
+ if (*kvm_enable_pmu == 'N') {
+ kvm_pmu_disabled = true;
+ }
+ }
+
return 0;
}
--
2.43.5
* [PATCH v6 7/9] target/i386/kvm: reset AMD PMU registers during VM reset
2025-06-24 7:43 [PATCH v6 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (5 preceding siblings ...)
2025-06-24 7:43 ` [PATCH v6 6/9] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
@ 2025-06-24 7:43 ` Dongli Zhang
2025-07-02 5:38 ` Mi, Dapeng
2025-06-24 7:43 ` [PATCH v6 8/9] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
2025-06-24 7:43 ` [PATCH v6 9/9] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang
8 siblings, 1 reply; 16+ messages in thread
From: Dongli Zhang @ 2025-06-24 7:43 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM
and kvm_put_msrs() to restore them to KVM. However, there is no support for
AMD PMU registers. Currently, pmu_version and num_pmu_gp_counters are
initialized based on cpuid(0xa), which does not apply to AMD processors.
For AMD CPUs, prior to PerfMonV2, the number of general-purpose counters
is determined based on the CPU version.
To address this issue, we need to add support for AMD PMU registers.
Without this support, the following problems can arise:
1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.
2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.
3. The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.
4. After a reboot, the VM kernel may report the following error:
[ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
5. In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:
[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
To resolve these issues, we propose resetting AMD PMU registers during the
VM reset process.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
---
Changed since v1:
- Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
AMD64_NUM_COUNTERS (suggested by Sandipan Das).
- Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 + 0xb".
(suggested by Sandipan Das).
- Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
- Don't initialize PMU info if kvm.enable_pmu=N.
Changed since v2:
- Remove 'static' from host_cpuid_vendorX.
- Change has_pmu_version to pmu_version.
- Use object_property_get_int() to get CPU family.
- Use cpuid_find_entry() instead of cpu_x86_cpuid().
- Send error log when host and guest are from different vendors.
- Move "if (!cpu->enable_pmu)" to begin of function. Add comments to
reminder developers.
- Add support to Zhaoxin. Change is_same_vendor() to
is_host_compat_vendor().
- Didn't add Reviewed-by from Sandipan because the change isn't minor.
Changed since v3:
- Use host_cpu_vendor_fms() from Zhao's patch.
- Check AMD directly makes the "compat" rule clear.
- Add comment to MAX_GP_COUNTERS.
- Skip PMU info initialization if !kvm_pmu_disabled.
Changed since v4:
- Add Reviewed-by from Zhao and Sandipan.
target/i386/cpu.h | 12 +++
target/i386/kvm/kvm.c | 175 +++++++++++++++++++++++++++++++++++++++++-
2 files changed, 183 insertions(+), 4 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 51e10139df..3fe6263ecf 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -486,6 +486,14 @@ typedef enum X86Seg {
#define MSR_CORE_PERF_GLOBAL_CTRL 0x38f
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x390
+#define MSR_K7_EVNTSEL0 0xc0010000
+#define MSR_K7_PERFCTR0 0xc0010004
+#define MSR_F15H_PERF_CTL0 0xc0010200
+#define MSR_F15H_PERF_CTR0 0xc0010201
+
+#define AMD64_NUM_COUNTERS 4
+#define AMD64_NUM_COUNTERS_CORE 6
+
#define MSR_MC0_CTL 0x400
#define MSR_MC0_STATUS 0x401
#define MSR_MC0_ADDR 0x402
@@ -1636,6 +1644,10 @@ typedef struct {
#endif
#define MAX_FIXED_COUNTERS 3
+/*
+ * This formula is based on Intel's MSR. The current size also meets AMD's
+ * needs.
+ */
#define MAX_GP_COUNTERS (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
#define NB_OPMASK_REGS 8
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index d191dd1da3..ff9be6a06f 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2089,7 +2089,7 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
return 0;
}
-static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
+static void kvm_init_pmu_info_intel(struct kvm_cpuid2 *cpuid)
{
struct kvm_cpuid_entry2 *c;
@@ -2122,6 +2122,96 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
}
}
+static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
+{
+ struct kvm_cpuid_entry2 *c;
+ int64_t family;
+
+ family = object_property_get_int(OBJECT(cpu), "family", NULL);
+ if (family < 0) {
+ return;
+ }
+
+ if (family < 6) {
+ error_report("AMD performance-monitoring is supported from "
+ "K7 and later");
+ return;
+ }
+
+ pmu_version = 1;
+ num_pmu_gp_counters = AMD64_NUM_COUNTERS;
+
+ c = cpuid_find_entry(cpuid, 0x80000001, 0);
+ if (!c) {
+ return;
+ }
+
+ if (!(c->ecx & CPUID_EXT3_PERFCORE)) {
+ return;
+ }
+
+ num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+}
+
+static bool is_host_compat_vendor(CPUX86State *env)
+{
+ char host_vendor[CPUID_VENDOR_SZ + 1];
+
+ host_cpu_vendor_fms(host_vendor, NULL, NULL, NULL);
+
+ /*
+ * Intel and Zhaoxin are compatible.
+ */
+ if ((g_str_equal(host_vendor, CPUID_VENDOR_INTEL) ||
+ g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN1) ||
+ g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN2)) &&
+ (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env))) {
+ return true;
+ }
+
+ return g_str_equal(host_vendor, CPUID_VENDOR_AMD) &&
+ IS_AMD_CPU(env);
+}
+
+static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
+{
+ CPUX86State *env = &cpu->env;
+
+ /*
+ * The PMU virtualization is disabled by kvm.enable_pmu=N.
+ */
+ if (kvm_pmu_disabled) {
+ return;
+ }
+
+ /*
+ * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
+ * disable the AMD PMU virtualization.
+ *
+ * Assume the user is aware of this when !cpu->enable_pmu. AMD PMU
+ * registers are not going to be reset, even though they are still
+ * available to the guest VM.
+ */
+ if (!cpu->enable_pmu) {
+ return;
+ }
+
+ /*
+ * It is not supported to virtualize AMD PMU registers on Intel
+ * processors, nor to virtualize Intel PMU registers on AMD processors.
+ */
+ if (!is_host_compat_vendor(env)) {
+ error_report("host doesn't support requested feature: vPMU");
+ return;
+ }
+
+ if (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) {
+ kvm_init_pmu_info_intel(cpuid);
+ } else if (IS_AMD_CPU(env)) {
+ kvm_init_pmu_info_amd(cpuid, cpu);
+ }
+}
+
int kvm_arch_init_vcpu(CPUState *cs)
{
struct {
@@ -2312,7 +2402,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
cpuid_data.cpuid.nent = cpuid_i;
- kvm_init_pmu_info(&cpuid_data.cpuid);
+ kvm_init_pmu_info(&cpuid_data.cpuid, cpu);
if (((env->cpuid_version >> 8)&0xF) >= 6
&& (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
@@ -4085,7 +4175,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
}
- if (pmu_version > 0) {
+ if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
if (pmu_version > 1) {
/* Stop the counter. */
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
@@ -4116,6 +4206,38 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
env->msr_global_ctrl);
}
}
+
+ if (IS_AMD_CPU(env) && pmu_version > 0) {
+ uint32_t sel_base = MSR_K7_EVNTSEL0;
+ uint32_t ctr_base = MSR_K7_PERFCTR0;
+ /*
+ * The address of the next selector or counter register is
+ * obtained by incrementing the address of the current selector
+ * or counter register by one.
+ */
+ uint32_t step = 1;
+
+ /*
+ * When PERFCORE is enabled, AMD PMU uses a separate set of
+ * addresses for the selector and counter registers.
+ * Additionally, the address of the next selector or counter
+ * register is determined by incrementing the address of the
+ * current register by two.
+ */
+ if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+ sel_base = MSR_F15H_PERF_CTL0;
+ ctr_base = MSR_F15H_PERF_CTR0;
+ step = 2;
+ }
+
+ for (i = 0; i < num_pmu_gp_counters; i++) {
+ kvm_msr_entry_add(cpu, ctr_base + i * step,
+ env->msr_gp_counters[i]);
+ kvm_msr_entry_add(cpu, sel_base + i * step,
+ env->msr_gp_evtsel[i]);
+ }
+ }
+
/*
* Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-add,
* only sync them to KVM on the first cpu
@@ -4563,7 +4685,8 @@ static int kvm_get_msrs(X86CPU *cpu)
if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
}
- if (pmu_version > 0) {
+
+ if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
if (pmu_version > 1) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
@@ -4579,6 +4702,35 @@ static int kvm_get_msrs(X86CPU *cpu)
}
}
+ if (IS_AMD_CPU(env) && pmu_version > 0) {
+ uint32_t sel_base = MSR_K7_EVNTSEL0;
+ uint32_t ctr_base = MSR_K7_PERFCTR0;
+ /*
+ * The address of the next selector or counter register is
+ * obtained by incrementing the address of the current selector
+ * or counter register by one.
+ */
+ uint32_t step = 1;
+
+ /*
+ * When PERFCORE is enabled, AMD PMU uses a separate set of
+ * addresses for the selector and counter registers.
+ * Additionally, the address of the next selector or counter
+ * register is determined by incrementing the address of the
+ * current register by two.
+ */
+ if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+ sel_base = MSR_F15H_PERF_CTL0;
+ ctr_base = MSR_F15H_PERF_CTR0;
+ step = 2;
+ }
+
+ for (i = 0; i < num_pmu_gp_counters; i++) {
+ kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
+ kvm_msr_entry_add(cpu, sel_base + i * step, 0);
+ }
+ }
+
if (env->mcg_cap) {
kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
@@ -4890,6 +5042,21 @@ static int kvm_get_msrs(X86CPU *cpu)
case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
break;
+ case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + AMD64_NUM_COUNTERS - 1:
+ env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
+ break;
+ case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + AMD64_NUM_COUNTERS - 1:
+ env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
+ break;
+ case MSR_F15H_PERF_CTL0 ...
+ MSR_F15H_PERF_CTL0 + AMD64_NUM_COUNTERS_CORE * 2 - 1:
+ index = index - MSR_F15H_PERF_CTL0;
+ if (index & 0x1) {
+ env->msr_gp_counters[index] = msrs[i].data;
+ } else {
+ env->msr_gp_evtsel[index] = msrs[i].data;
+ }
+ break;
case HV_X64_MSR_HYPERCALL:
env->msr_hv_hypercall = msrs[i].data;
break;
--
2.43.5
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v6 8/9] target/i386/kvm: support perfmon-v2 for reset
2025-06-24 7:43 [PATCH v6 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (6 preceding siblings ...)
2025-06-24 7:43 ` [PATCH v6 7/9] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
@ 2025-06-24 7:43 ` Dongli Zhang
2025-06-24 7:43 ` [PATCH v6 9/9] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang
8 siblings, 0 replies; 16+ messages in thread
From: Dongli Zhang @ 2025-06-24 7:43 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
Since PerfMonV2, the AMD PMU supports additional registers. This update
adds get/put functionality for these extra registers.
Similar to the implementation in KVM:
- MSR_CORE_PERF_GLOBAL_STATUS and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS both
use env->msr_global_status.
- MSR_CORE_PERF_GLOBAL_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_CTL both use
env->msr_global_ctrl.
- MSR_CORE_PERF_GLOBAL_OVF_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
both use env->msr_global_ovf_ctrl.
No changes are needed for vmstate_msr_architectural_pmu or
pmu_enable_needed().
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
---
Changed since v1:
- Use "has_pmu_version > 1", not "has_pmu_version == 2".
Changed since v2:
- Use cpuid_find_entry() instead of cpu_x86_cpuid().
- Change has_pmu_version to pmu_version.
- Cap num_pmu_gp_counters with MAX_GP_COUNTERS.
Changed since v4:
- Add Reviewed-by from Sandipan.
target/i386/cpu.h | 4 ++++
target/i386/kvm/kvm.c | 48 +++++++++++++++++++++++++++++++++++--------
2 files changed, 43 insertions(+), 9 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 3fe6263ecf..e6480a6871 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -486,6 +486,10 @@ typedef enum X86Seg {
#define MSR_CORE_PERF_GLOBAL_CTRL 0x38f
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x390
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS 0xc0000300
+#define MSR_AMD64_PERF_CNTR_GLOBAL_CTL 0xc0000301
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR 0xc0000302
+
#define MSR_K7_EVNTSEL0 0xc0010000
#define MSR_K7_PERFCTR0 0xc0010004
#define MSR_F15H_PERF_CTL0 0xc0010200
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index ff9be6a06f..4bbdf996ef 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2151,6 +2151,16 @@ static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
}
num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+
+ c = cpuid_find_entry(cpuid, 0x80000022, 0);
+ if (c && (c->eax & CPUID_8000_0022_EAX_PERFMON_V2)) {
+ pmu_version = 2;
+ num_pmu_gp_counters = c->ebx & 0xf;
+
+ if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
+ num_pmu_gp_counters = MAX_GP_COUNTERS;
+ }
+ }
}
static bool is_host_compat_vendor(CPUX86State *env)
@@ -4218,13 +4228,14 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
uint32_t step = 1;
/*
- * When PERFCORE is enabled, AMD PMU uses a separate set of
- * addresses for the selector and counter registers.
- * Additionally, the address of the next selector or counter
- * register is determined by incrementing the address of the
- * current register by two.
+ * When PERFCORE or PerfMonV2 is enabled, AMD PMU uses a
+ * separate set of addresses for the selector and counter
+ * registers. Additionally, the address of the next selector or
+ * counter register is determined by incrementing the address
+ * of the current register by two.
*/
- if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+ if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE ||
+ pmu_version > 1) {
sel_base = MSR_F15H_PERF_CTL0;
ctr_base = MSR_F15H_PERF_CTR0;
step = 2;
@@ -4236,6 +4247,15 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
kvm_msr_entry_add(cpu, sel_base + i * step,
env->msr_gp_evtsel[i]);
}
+
+ if (pmu_version > 1) {
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+ env->msr_global_status);
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+ env->msr_global_ovf_ctrl);
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+ env->msr_global_ctrl);
+ }
}
/*
@@ -4713,13 +4733,14 @@ static int kvm_get_msrs(X86CPU *cpu)
uint32_t step = 1;
/*
- * When PERFCORE is enabled, AMD PMU uses a separate set of
- * addresses for the selector and counter registers.
+ * When PERFCORE or PerfMonV2 is enabled, AMD PMU uses a separate
+ * set of addresses for the selector and counter registers.
* Additionally, the address of the next selector or counter
* register is determined by incrementing the address of the
* current register by two.
*/
- if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+ if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE ||
+ pmu_version > 1) {
sel_base = MSR_F15H_PERF_CTL0;
ctr_base = MSR_F15H_PERF_CTR0;
step = 2;
@@ -4729,6 +4750,12 @@ static int kvm_get_msrs(X86CPU *cpu)
kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
kvm_msr_entry_add(cpu, sel_base + i * step, 0);
}
+
+ if (pmu_version > 1) {
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, 0);
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, 0);
+ }
}
if (env->mcg_cap) {
@@ -5025,12 +5052,15 @@ static int kvm_get_msrs(X86CPU *cpu)
env->msr_fixed_ctr_ctrl = msrs[i].data;
break;
case MSR_CORE_PERF_GLOBAL_CTRL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
env->msr_global_ctrl = msrs[i].data;
break;
case MSR_CORE_PERF_GLOBAL_STATUS:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
env->msr_global_status = msrs[i].data;
break;
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
env->msr_global_ovf_ctrl = msrs[i].data;
break;
case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR0 + MAX_FIXED_COUNTERS - 1:
--
2.43.5
* [PATCH v6 9/9] target/i386/kvm: don't stop Intel PMU counters
2025-06-24 7:43 [PATCH v6 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (7 preceding siblings ...)
2025-06-24 7:43 ` [PATCH v6 8/9] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
@ 2025-06-24 7:43 ` Dongli Zhang
2025-07-02 5:42 ` Mi, Dapeng
8 siblings, 1 reply; 16+ messages in thread
From: Dongli Zhang @ 2025-06-24 7:43 UTC (permalink / raw)
To: qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
PMU MSRs are set by QEMU only at levels >= KVM_PUT_RESET_STATE,
excluding runtime. Therefore, updating these MSRs without stopping events
should be acceptable.
In addition, KVM creates kernel perf events with host mode excluded
(exclude_host = 1). While the events remain active, they don't increment
the counter during QEMU vCPU userspace mode.
Finally, kvm_put_msrs() sets the MSRs using KVM_SET_MSRS. The x86 KVM
code processes these MSRs one by one in a loop, only saving the config and
triggering the KVM_REQ_PMU request. This approach does not immediately stop
the event before updating the PMC, and has been the behavior since Linux
kernel commit 68fb4757e867 ("KVM: x86/pmu: Defer reprogram_counter() to
kvm_pmu_handle_event"), that is, since v6.2.
No Fixes tag is added for commit 0d89436786b0 ("kvm: migrate vPMU state"),
because this isn't a bugfix.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v3:
- Re-order reasons in commit messages.
- Mention KVM's commit 68fb4757e867 (v6.2).
- Keep Zhao's review as there isn't code change.
target/i386/kvm/kvm.c | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4bbdf996ef..207de21404 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4186,13 +4186,6 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
}
if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
- if (pmu_version > 1) {
- /* Stop the counter. */
- kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
- kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
- }
-
- /* Set the counter values. */
for (i = 0; i < num_pmu_fixed_counters; i++) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
env->msr_fixed_counters[i]);
@@ -4208,8 +4201,6 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
env->msr_global_status);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
env->msr_global_ovf_ctrl);
-
- /* Now start the PMU. */
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL,
env->msr_fixed_ctr_ctrl);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL,
--
2.43.5
* Re: [PATCH v6 3/9] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
2025-06-24 7:43 ` [PATCH v6 3/9] target/i386/kvm: set KVM_PMU_CAP_DISABLE if " Dongli Zhang
@ 2025-07-02 3:47 ` Mi, Dapeng
0 siblings, 0 replies; 16+ messages in thread
From: Mi, Dapeng @ 2025-07-02 3:47 UTC (permalink / raw)
To: Dongli Zhang, qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, joe.jin, ewanhai-oc, ewanhai
On 6/24/2025 3:43 PM, Dongli Zhang wrote:
> Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured,
> there is no way to fully disable KVM AMD PMU virtualization. Neither
> "-cpu host,-pmu" nor "-cpu EPYC" achieves this.
>
> As a result, the following message still appears in the VM dmesg:
>
> [ 0.263615] Performance Events: AMD PMU driver.
>
> However, the expected output should be:
>
> [ 0.596381] Performance Events: PMU not available due to virtualization, using software events only.
> [ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
>
> This occurs because AMD does not use any CPUID bit to indicate PMU
> availability.
>
> To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
> when "-pmu" is configured.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> ---
> Changed since v1:
> - Switch back to the initial implementation with "-pmu".
> https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com
> - Mention that "KVM_PMU_CAP_DISABLE doesn't change the PMU behavior on
> Intel platform because current "pmu" property works as expected."
> Changed since v2:
> - Change has_pmu_cap to pmu_cap.
> - Use (pmu_cap & KVM_PMU_CAP_DISABLE) instead of only pmu_cap in if
> statement.
> - Add Reviewed-by from Xiaoyao and Zhao as the change is minor.
> Changed since v5:
> - Re-base on top of most recent mainline QEMU.
> - To resolve conflicts, move the PMU related code before the
> call site of is_tdx_vm().
>
> target/i386/kvm/kvm.c | 31 +++++++++++++++++++++++++++++++
> 1 file changed, 31 insertions(+)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 234878c613..15155b79b5 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -178,6 +178,8 @@ static int has_triple_fault_event;
>
> static bool has_msr_mcg_ext_ctl;
>
> +static int pmu_cap;
> +
> static struct kvm_cpuid2 *cpuid_cache;
> static struct kvm_cpuid2 *hv_cpuid_cache;
> static struct kvm_msr_list *kvm_feature_msrs;
> @@ -2062,6 +2064,33 @@ full:
>
> int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
> {
> + static bool first = true;
> + int ret;
> +
> + if (first) {
> + first = false;
> +
> + /*
> + * Since Linux v5.18, KVM provides a VM-level capability to easily
> + * disable PMUs; however, QEMU has been providing PMU property per
> + * CPU since v1.6. In order to accommodate both, have to configure
> + * the VM-level capability here.
> + *
> + * KVM_PMU_CAP_DISABLE doesn't change the PMU
> + * behavior on Intel platform because current "pmu" property works
> + * as expected.
> + */
> + if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
> + ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
> + KVM_PMU_CAP_DISABLE);
> + if (ret < 0) {
> + error_setg_errno(errp, -ret,
> + "Failed to set KVM_PMU_CAP_DISABLE");
> + return ret;
> + }
> + }
> + }
> +
> if (is_tdx_vm()) {
> return tdx_pre_create_vcpu(cpu, errp);
> }
> @@ -3363,6 +3392,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
> }
> }
>
> + pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
> +
> return 0;
> }
>
LGTM.
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
* Re: [PATCH v6 4/9] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
2025-06-24 7:43 ` [PATCH v6 4/9] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
@ 2025-07-02 3:52 ` Mi, Dapeng
0 siblings, 0 replies; 16+ messages in thread
From: Mi, Dapeng @ 2025-07-02 3:52 UTC (permalink / raw)
To: Dongli Zhang, qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, joe.jin, ewanhai-oc, ewanhai
On 6/24/2025 3:43 PM, Dongli Zhang wrote:
> The initialization of 'has_architectural_pmu_version',
> 'num_architectural_pmu_gp_counters', and
> 'num_architectural_pmu_fixed_counters' is unrelated to the process of
> building the CPUID.
>
> Extract them out of kvm_x86_build_cpuid().
>
> In addition, use cpuid_find_entry() instead of cpu_x86_cpuid(), because
> CPUID has already been filled at this stage.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> ---
> Changed since v1:
> - Still extract the code, but call them for all CPUs.
> Changed since v2:
> - Use cpuid_find_entry() instead of cpu_x86_cpuid().
> - Didn't add Reviewed-by from Dapeng as the change isn't minor.
>
> target/i386/kvm/kvm.c | 62 ++++++++++++++++++++++++-------------------
> 1 file changed, 35 insertions(+), 27 deletions(-)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 15155b79b5..4baaa069b8 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -1968,33 +1968,6 @@ uint32_t kvm_x86_build_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
> }
> }
>
> - if (limit >= 0x0a) {
> - uint32_t eax, edx;
> -
> - cpu_x86_cpuid(env, 0x0a, 0, &eax, &unused, &unused, &edx);
> -
> - has_architectural_pmu_version = eax & 0xff;
> - if (has_architectural_pmu_version > 0) {
> - num_architectural_pmu_gp_counters = (eax & 0xff00) >> 8;
> -
> - /* Shouldn't be more than 32, since that's the number of bits
> - * available in EBX to tell us _which_ counters are available.
> - * Play it safe.
> - */
> - if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
> - num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
> - }
> -
> - if (has_architectural_pmu_version > 1) {
> - num_architectural_pmu_fixed_counters = edx & 0x1f;
> -
> - if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
> - num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
> - }
> - }
> - }
> - }
> -
> cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
>
> for (i = 0x80000000; i <= limit; i++) {
> @@ -2098,6 +2071,39 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
> return 0;
> }
>
> +static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
> +{
> + struct kvm_cpuid_entry2 *c;
> +
> + c = cpuid_find_entry(cpuid, 0xa, 0);
> +
> + if (!c) {
> + return;
> + }
> +
> + has_architectural_pmu_version = c->eax & 0xff;
> + if (has_architectural_pmu_version > 0) {
> + num_architectural_pmu_gp_counters = (c->eax & 0xff00) >> 8;
> +
> + /*
> + * Shouldn't be more than 32, since that's the number of bits
> + * available in EBX to tell us _which_ counters are available.
> + * Play it safe.
> + */
> + if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
> + num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
> + }
> +
> + if (has_architectural_pmu_version > 1) {
> + num_architectural_pmu_fixed_counters = c->edx & 0x1f;
> +
> + if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
> + num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
> + }
> + }
> + }
> +}
> +
> int kvm_arch_init_vcpu(CPUState *cs)
> {
> struct {
> @@ -2288,6 +2294,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
> cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
> cpuid_data.cpuid.nent = cpuid_i;
>
> + kvm_init_pmu_info(&cpuid_data.cpuid);
> +
> if (((env->cpuid_version >> 8)&0xF) >= 6
> && (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
> (CPUID_MCE | CPUID_MCA)) {
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
* Re: [PATCH v6 6/9] target/i386/kvm: query kvm.enable_pmu parameter
2025-06-24 7:43 ` [PATCH v6 6/9] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
@ 2025-07-02 5:10 ` Mi, Dapeng
0 siblings, 0 replies; 16+ messages in thread
From: Mi, Dapeng @ 2025-07-02 5:10 UTC (permalink / raw)
To: Dongli Zhang, qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, joe.jin, ewanhai-oc, ewanhai
On 6/24/2025 3:43 PM, Dongli Zhang wrote:
> When PMU is enabled in QEMU, there is a chance that PMU virtualization is
> completely disabled by the KVM module parameter kvm.enable_pmu=N.
>
> The kvm.enable_pmu parameter is introduced since Linux v5.17.
> Its permission is 0444. It does not change until a reload of the KVM
> module.
>
> Read the kvm.enable_pmu value from the module sysfs to give a chance to
> provide more information about vPMU enablement.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> ---
> Changed since v2:
> - Rework the code flow following Zhao's suggestion.
> - Return error when:
> (*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu)
> Changed since v3:
> - Re-split the cases into enable_pmu and !enable_pmu, following Zhao's
> suggestion.
> - Rework the commit messages.
> - Bring back global static variable 'kvm_pmu_disabled' from v2.
> Changed since v4:
> - Add Reviewed-by from Zhao.
> Changed since v5:
> - Rebase on top of most recent QEMU.
>
> target/i386/kvm/kvm.c | 61 +++++++++++++++++++++++++++++++------------
> 1 file changed, 44 insertions(+), 17 deletions(-)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 824148688d..d191dd1da3 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -186,6 +186,10 @@ static int has_triple_fault_event;
> static bool has_msr_mcg_ext_ctl;
>
> static int pmu_cap;
> +/*
> + * Read from /sys/module/kvm/parameters/enable_pmu.
> + */
> +static bool kvm_pmu_disabled;
>
> static struct kvm_cpuid2 *cpuid_cache;
> static struct kvm_cpuid2 *hv_cpuid_cache;
> @@ -2050,23 +2054,30 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
> if (first) {
> first = false;
>
> - /*
> - * Since Linux v5.18, KVM provides a VM-level capability to easily
> - * disable PMUs; however, QEMU has been providing PMU property per
> - * CPU since v1.6. In order to accommodate both, have to configure
> - * the VM-level capability here.
> - *
> - * KVM_PMU_CAP_DISABLE doesn't change the PMU
> - * behavior on Intel platform because current "pmu" property works
> - * as expected.
> - */
> - if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
> - ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
> - KVM_PMU_CAP_DISABLE);
> - if (ret < 0) {
> - error_setg_errno(errp, -ret,
> - "Failed to set KVM_PMU_CAP_DISABLE");
> - return ret;
> + if (X86_CPU(cpu)->enable_pmu) {
> + if (kvm_pmu_disabled) {
> + warn_report("Failed to enable PMU since "
> + "KVM's enable_pmu parameter is disabled");
> + }
> + } else {
> + /*
> + * Since Linux v5.18, KVM provides a VM-level capability to easily
> + * disable PMUs; however, QEMU has been providing PMU property per
> + * CPU since v1.6. In order to accommodate both, have to configure
> + * the VM-level capability here.
> + *
> + * KVM_PMU_CAP_DISABLE doesn't change the PMU
> + * behavior on Intel platform because current "pmu" property works
> + * as expected.
> + */
> + if (pmu_cap & KVM_PMU_CAP_DISABLE) {
> + ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
> + KVM_PMU_CAP_DISABLE);
> + if (ret < 0) {
> + error_setg_errno(errp, -ret,
> + "Failed to set KVM_PMU_CAP_DISABLE");
> + return ret;
> + }
> }
> }
> }
> @@ -3273,6 +3284,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
> int ret;
> struct utsname utsname;
> Error *local_err = NULL;
> + g_autofree char *kvm_enable_pmu;
>
> /*
> * Initialize confidential guest (SEV/TDX) context, if required
> @@ -3409,6 +3421,21 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>
> pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
>
> + /*
> + * The enable_pmu parameter is introduced since Linux v5.17,
> + * give a chance to provide more information about vPMU
> + * enablement.
> + *
> + * The kvm.enable_pmu's permission is 0444. It does not change
> + * until a reload of the KVM module.
> + */
> + if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
> + &kvm_enable_pmu, NULL, NULL)) {
> + if (*kvm_enable_pmu == 'N') {
> + kvm_pmu_disabled = true;
> + }
> + }
> +
> return 0;
> }
>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
* Re: [PATCH v6 7/9] target/i386/kvm: reset AMD PMU registers during VM reset
2025-06-24 7:43 ` [PATCH v6 7/9] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
@ 2025-07-02 5:38 ` Mi, Dapeng
0 siblings, 0 replies; 16+ messages in thread
From: Mi, Dapeng @ 2025-07-02 5:38 UTC (permalink / raw)
To: Dongli Zhang, qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, joe.jin, ewanhai-oc, ewanhai
On 6/24/2025 3:43 PM, Dongli Zhang wrote:
> + uint32_t sel_base = MSR_K7_EVNTSEL0;
> + uint32_t ctr_base = MSR_K7_PERFCTR0;
> + /*
> + * The address of the next selector or counter register is
> + * obtained by incrementing the address of the current selector
> + * or counter register by one.
> + */
> + uint32_t step = 1;
> +
> + /*
> + * When PERFCORE is enabled, AMD PMU uses a separate set of
> + * addresses for the selector and counter registers.
> + * Additionally, the address of the next selector or counter
> + * register is determined by incrementing the address of the
> + * current register by two.
> + */
> + if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
> + sel_base = MSR_F15H_PERF_CTL0;
> + ctr_base = MSR_F15H_PERF_CTR0;
> + step = 2;
> + }
This part of the code duplicates the earlier code in kvm_put_msrs(); we'd
better add a new helper that returns the PMU counter MSR base and stride
for all vendors. (This can be done as an independent patch if there is no
new version of this patchset.)
All others look good to me.
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
* Re: [PATCH v6 9/9] target/i386/kvm: don't stop Intel PMU counters
2025-06-24 7:43 ` [PATCH v6 9/9] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang
@ 2025-07-02 5:42 ` Mi, Dapeng
0 siblings, 0 replies; 16+ messages in thread
From: Mi, Dapeng @ 2025-07-02 5:42 UTC (permalink / raw)
To: Dongli Zhang, qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, joe.jin, ewanhai-oc, ewanhai
On 6/24/2025 3:43 PM, Dongli Zhang wrote:
> PMU MSRs are set by QEMU only at levels >= KVM_PUT_RESET_STATE,
> excluding runtime. Therefore, updating these MSRs without stopping events
> should be acceptable.
>
> In addition, KVM creates kernel perf events with host mode excluded
> (exclude_host = 1). While the events remain active, they don't increment
> the counter during QEMU vCPU userspace mode.
>
> Finally, kvm_put_msrs() sets the MSRs using KVM_SET_MSRS. The x86 KVM
> code processes these MSRs one by one in a loop, only saving the config and
> raising the KVM_REQ_PMU request; it does not immediately stop the event
> before updating the PMC. This has been the behavior since Linux kernel
> commit 68fb4757e867 ("KVM: x86/pmu: Defer reprogram_counter() to
> kvm_pmu_handle_event"), i.e., since v6.2.
>
> No Fixes tag is added for commit 0d89436786b0 ("kvm: migrate vPMU
> state"), because this isn't a bugfix.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> ---
> Changed since v3:
> - Re-order reasons in commit messages.
> - Mention KVM's commit 68fb4757e867 (v6.2).
> - Keep Zhao's review as there isn't code change.
>
> target/i386/kvm/kvm.c | 9 ---------
> 1 file changed, 9 deletions(-)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 4bbdf996ef..207de21404 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -4186,13 +4186,6 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
> }
>
> if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
> - if (pmu_version > 1) {
> - /* Stop the counter. */
> - kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> - kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
> - }
> -
> - /* Set the counter values. */
> for (i = 0; i < num_pmu_fixed_counters; i++) {
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
> env->msr_fixed_counters[i]);
> @@ -4208,8 +4201,6 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
> env->msr_global_status);
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
> env->msr_global_ovf_ctrl);
> -
> - /* Now start the PMU. */
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL,
> env->msr_fixed_ctr_ctrl);
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL,
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
* Re: [PATCH v6 5/9] target/i386/kvm: rename architectural PMU variables
2025-06-24 7:43 ` [PATCH v6 5/9] target/i386/kvm: rename architectural PMU variables Dongli Zhang
@ 2025-08-13 9:18 ` Sandipan Das
0 siblings, 0 replies; 16+ messages in thread
From: Sandipan Das @ 2025-08-13 9:18 UTC (permalink / raw)
To: Dongli Zhang, qemu-devel, kvm
Cc: pbonzini, zhao1.liu, mtosatti, babu.moger, likexu, like.xu.linux,
groug, khorenko, alexander.ivanov, den, davydov-max, xiaoyao.li,
dapeng1.mi, joe.jin, ewanhai-oc, ewanhai
On 24-06-2025 13:13, Dongli Zhang wrote:
> AMD does not have what is commonly referred to as an architectural PMU.
> Therefore, rename the following variables so that they are applicable to
> both Intel and AMD:
>
> - has_architectural_pmu_version
> - num_architectural_pmu_gp_counters
> - num_architectural_pmu_fixed_counters
>
> For Intel processors, the meaning of pmu_version remains unchanged.
>
> For AMD processors:
>
> pmu_version == 1 corresponds to versions before AMD PerfMonV2.
> pmu_version == 2 corresponds to AMD PerfMonV2.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
> ---
> Changed since v2:
> - Change has_pmu_version to pmu_version.
> - Add Reviewed-by since the change is minor.
> - As a reminder, there are some contextual changes due to PATCH 05,
>   i.e., c->edx vs. edx.
>
> target/i386/kvm/kvm.c | 49 ++++++++++++++++++++++++-------------------
> 1 file changed, 28 insertions(+), 21 deletions(-)
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 4baaa069b8..824148688d 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -166,9 +166,16 @@ static bool has_msr_perf_capabs;
> static bool has_msr_pkrs;
> static bool has_msr_hwcr;
>
> -static uint32_t has_architectural_pmu_version;
> -static uint32_t num_architectural_pmu_gp_counters;
> -static uint32_t num_architectural_pmu_fixed_counters;
> +/*
> + * For Intel processors, the meaning is the architectural PMU version
> + * number.
> + *
> + * For AMD processors: 1 corresponds to the prior versions, and 2
> + * corresponds to AMD PerfMonV2.
> + */
> +static uint32_t pmu_version;
> +static uint32_t num_pmu_gp_counters;
> +static uint32_t num_pmu_fixed_counters;
>
> static int has_xsave2;
> static int has_xcrs;
> @@ -2081,24 +2088,24 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
> return;
> }
>
> - has_architectural_pmu_version = c->eax & 0xff;
> - if (has_architectural_pmu_version > 0) {
> - num_architectural_pmu_gp_counters = (c->eax & 0xff00) >> 8;
> + pmu_version = c->eax & 0xff;
> + if (pmu_version > 0) {
> + num_pmu_gp_counters = (c->eax & 0xff00) >> 8;
>
> /*
> * Shouldn't be more than 32, since that's the number of bits
> * available in EBX to tell us _which_ counters are available.
> * Play it safe.
> */
> - if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
> - num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
> + if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
> + num_pmu_gp_counters = MAX_GP_COUNTERS;
> }
>
> - if (has_architectural_pmu_version > 1) {
> - num_architectural_pmu_fixed_counters = c->edx & 0x1f;
> + if (pmu_version > 1) {
> + num_pmu_fixed_counters = c->edx & 0x1f;
>
> - if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
> - num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
> + if (num_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
> + num_pmu_fixed_counters = MAX_FIXED_COUNTERS;
> }
> }
> }
> @@ -4051,25 +4058,25 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
> kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
> }
>
> - if (has_architectural_pmu_version > 0) {
> - if (has_architectural_pmu_version > 1) {
> + if (pmu_version > 0) {
> + if (pmu_version > 1) {
> /* Stop the counter. */
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
> }
>
> /* Set the counter values. */
> - for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
> + for (i = 0; i < num_pmu_fixed_counters; i++) {
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
> env->msr_fixed_counters[i]);
> }
> - for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
> + for (i = 0; i < num_pmu_gp_counters; i++) {
> kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i,
> env->msr_gp_counters[i]);
> kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i,
> env->msr_gp_evtsel[i]);
> }
> - if (has_architectural_pmu_version > 1) {
> + if (pmu_version > 1) {
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS,
> env->msr_global_status);
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
> @@ -4529,17 +4536,17 @@ static int kvm_get_msrs(X86CPU *cpu)
> if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
> kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
> }
> - if (has_architectural_pmu_version > 0) {
> - if (has_architectural_pmu_version > 1) {
> + if (pmu_version > 0) {
> + if (pmu_version > 1) {
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS, 0);
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0);
> }
> - for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
> + for (i = 0; i < num_pmu_fixed_counters; i++) {
> kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i, 0);
> }
> - for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
> + for (i = 0; i < num_pmu_gp_counters; i++) {
> kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i, 0);
> kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i, 0);
> }
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
Thread overview: 16+ messages (newest: 2025-08-13 9:19 UTC)
2025-06-24 7:43 [PATCH v6 0/9] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
2025-06-24 7:43 ` [PATCH v6 1/9] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
2025-06-24 7:43 ` [PATCH v6 2/9] target/i386: disable PERFCORE when "-pmu" is configured Dongli Zhang
2025-06-24 7:43 ` [PATCH v6 3/9] target/i386/kvm: set KVM_PMU_CAP_DISABLE if " Dongli Zhang
2025-07-02 3:47 ` Mi, Dapeng
2025-06-24 7:43 ` [PATCH v6 4/9] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
2025-07-02 3:52 ` Mi, Dapeng
2025-06-24 7:43 ` [PATCH v6 5/9] target/i386/kvm: rename architectural PMU variables Dongli Zhang
2025-08-13 9:18 ` Sandipan Das
2025-06-24 7:43 ` [PATCH v6 6/9] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
2025-07-02 5:10 ` Mi, Dapeng
2025-06-24 7:43 ` [PATCH v6 7/9] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
2025-07-02 5:38 ` Mi, Dapeng
2025-06-24 7:43 ` [PATCH v6 8/9] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
2025-06-24 7:43 ` [PATCH v6 9/9] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang
2025-07-02 5:42 ` Mi, Dapeng