* [PATCH v3 00/10] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup
@ 2025-03-31 1:32 Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 01/10] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
` (9 more replies)
0 siblings, 10 replies; 21+ messages in thread
From: Dongli Zhang @ 2025-03-31 1:32 UTC (permalink / raw)
To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
This patchset addresses four bugs related to AMD PMU virtualization.
1. PerfMonV2 is still available if PERFCORE is disabled via
"-cpu host,-perfctr-core".
2. Running the 'cpuid' command in the VM still reports PERFCORE although
"-pmu" is configured.
3. The third issue is that using "-cpu host,-pmu" does not disable AMD PMU
virtualization: with either "-cpu EPYC" or "-cpu host,-pmu", AMD PMU
virtualization remains enabled. On the VM's Linux side, you might still
see:
[ 0.510611] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
instead of:
[ 0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured.
4. The fourth issue is that unreclaimed performance events (after a QEMU
system_reset) in KVM may cause random, unwanted, or unknown NMIs to be
injected into the VM.
The AMD PMU registers are not reset during QEMU system_reset.
(1) If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.
(2) Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.
(3) The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.
(4) After a reboot, the VM kernel may report the following error:
[ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
(5) In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:
[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
To resolve these issues, we propose resetting AMD PMU registers during the
VM reset process.
Changed since v1:
- Use feature_dependencies for CPUID_EXT3_PERFCORE and
CPUID_8000_0022_EAX_PERFMON_V2.
- Remove CPUID_EXT3_PERFCORE when !cpu->enable_pmu.
- Pick kvm_arch_pre_create_vcpu() patch from Xiaoyao Li.
- Use "-pmu" but not a global "pmu-cap-disabled" for KVM_PMU_CAP_DISABLE.
- Also use sysfs kvm.enable_pmu=N to determine if PMU is supported.
- Some changes to PMU register limit calculation.
Changed since v2:
- Change has_pmu_cap to pmu_cap.
- Use cpuid_find_entry() instead of cpu_x86_cpuid().
- Rework the code flow of PATCH 07 related to kvm.enable_pmu=N following
Zhao's suggestion.
- Use object_property_get_int() to get CPU family.
- Add support to Zhaoxin.
Xiaoyao Li (1):
kvm: Introduce kvm_arch_pre_create_vcpu()
Dongli Zhang (9):
target/i386: disable PerfMonV2 when PERFCORE unavailable
target/i386: disable PERFCORE when "-pmu" is configured
target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
target/i386/kvm: rename architectural PMU variables
target/i386/kvm: query kvm.enable_pmu parameter
target/i386/kvm: reset AMD PMU registers during VM reset
target/i386/kvm: support perfmon-v2 for reset
target/i386/kvm: don't stop Intel PMU counters
accel/kvm/kvm-all.c | 5 +
include/system/kvm.h | 1 +
target/arm/kvm.c | 5 +
target/i386/cpu.c | 8 +
target/i386/cpu.h | 12 ++
target/i386/kvm/kvm.c | 356 ++++++++++++++++++++++++++++++++++------
target/loongarch/kvm/kvm.c | 5 +
target/mips/kvm.c | 5 +
target/ppc/kvm.c | 5 +
target/riscv/kvm/kvm-cpu.c | 5 +
target/s390x/kvm/kvm.c | 5 +
11 files changed, 365 insertions(+), 47 deletions(-)
base-commit: 0f15892acaf3f50ecc20c6dad4b3ebdd701aa93e
Thank you very much!
Dongli Zhang
* [PATCH v3 01/10] target/i386: disable PerfMonV2 when PERFCORE unavailable
2025-03-31 1:32 [PATCH v3 00/10] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
@ 2025-03-31 1:32 ` Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 02/10] target/i386: disable PERFCORE when "-pmu" is configured Dongli Zhang
` (8 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Dongli Zhang @ 2025-03-31 1:32 UTC (permalink / raw)
To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
When PERFCORE is disabled with "-cpu host,-perfctr-core", this is
reflected in the guest dmesg.
[ 0.285136] Performance Events: AMD PMU driver.
However, the guest CPUID indicates that PerfMonV2 is still available.
CPU:
Extended Performance Monitoring and Debugging (0x80000022):
AMD performance monitoring V2 = true
AMD LBR V2 = false
AMD LBR stack & PMC freezing = false
number of core perf ctrs = 0x6 (6)
number of LBR stack entries = 0x0 (0)
number of avail Northbridge perf ctrs = 0x0 (0)
number of available UMC PMCs = 0x0 (0)
active UMCs bitmask = 0x0
Disable PerfMonV2 in CPUID when PERFCORE is disabled.
Suggested-by: Zhao Liu <zhao1.liu@intel.com>
Fixes: 209b0ac12074 ("target/i386: Add PerfMonV2 feature bit")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Sandipan Das <sandipan.das@amd.com>
---
Changed since v1:
- Use feature_dependencies (suggested by Zhao Liu).
Changed since v2:
- Nothing. Zhao and Xiaoyao may move it to x86_cpu_expand_features()
later.
target/i386/cpu.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1b64ceaaba..2b87331be5 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1808,6 +1808,10 @@ static FeatureDep feature_dependencies[] = {
.from = { FEAT_7_1_EDX, CPUID_7_1_EDX_AVX10 },
.to = { FEAT_24_0_EBX, ~0ull },
},
+ {
+ .from = { FEAT_8000_0001_ECX, CPUID_EXT3_PERFCORE },
+ .to = { FEAT_8000_0022_EAX, CPUID_8000_0022_EAX_PERFMON_V2 },
+ },
};
typedef struct X86RegisterInfo32 {
--
2.39.3
* [PATCH v3 02/10] target/i386: disable PERFCORE when "-pmu" is configured
2025-03-31 1:32 [PATCH v3 00/10] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 01/10] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
@ 2025-03-31 1:32 ` Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 03/10] kvm: Introduce kvm_arch_pre_create_vcpu() Dongli Zhang
` (7 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Dongli Zhang @ 2025-03-31 1:32 UTC (permalink / raw)
To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
Currently, AMD PMU support isn't determined based on CPUID; that is, the
"-pmu" option does not fully disable KVM AMD PMU virtualization.
To minimize the exposed AMD PMU features, remove PERFCORE when "-pmu" is
configured.
Completely disabling AMD PMU virtualization will be implemented via
KVM_CAP_PMU_CAPABILITY in upcoming patches.
As a reminder, neither CPUID_EXT3_PERFCORE nor
CPUID_8000_0022_EAX_PERFMON_V2 is removed from env->features[] when "-pmu"
is configured. In future patches, developers should query whether they are
exposed via cpu_x86_cpuid() rather than relying on env->features[].
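For illustration only (the helper below is hypothetical and not part of
this patch), a later change that needs to know whether PERFCORE is visible
to the guest could go through cpu_x86_cpuid(), which applies the filtering
added here:
    /* Hypothetical helper, for illustration only. */
    static bool guest_sees_perfcore(CPUX86State *env)
    {
        uint32_t eax, ebx, ecx, edx;
        /* cpu_x86_cpuid() clears PERFCORE when !cpu->enable_pmu. */
        cpu_x86_cpuid(env, 0x80000001, 0, &eax, &ebx, &ecx, &edx);
        return !!(ecx & CPUID_EXT3_PERFCORE);
    }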
Suggested-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v2:
- No need to check "kvm_enabled() && IS_AMD_CPU(env)".
target/i386/cpu.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 2b87331be5..acbd627f7e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7242,6 +7242,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
!(env->hflags & HF_LMA_MASK)) {
*edx &= ~CPUID_EXT2_SYSCALL;
}
+
+ if (!cpu->enable_pmu) {
+ *ecx &= ~CPUID_EXT3_PERFCORE;
+ }
break;
case 0x80000002:
case 0x80000003:
--
2.39.3
* [PATCH v3 03/10] kvm: Introduce kvm_arch_pre_create_vcpu()
2025-03-31 1:32 [PATCH v3 00/10] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 01/10] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 02/10] target/i386: disable PERFCORE when "-pmu" is configured Dongli Zhang
@ 2025-03-31 1:32 ` Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 04/10] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
` (6 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Dongli Zhang @ 2025-03-31 1:32 UTC (permalink / raw)
To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
From: Xiaoyao Li <xiaoyao.li@intel.com>
Introduce kvm_arch_pre_create_vcpu() to perform arch-dependent work prior
to creating any vCPU. This is needed by i386 TDX, which must call
TDX_INIT_VM before creating any vCPU.
The i386-specific implementation will be added in a future patch.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v2:
- Add my Signed-off-by.
accel/kvm/kvm-all.c | 5 +++++
include/system/kvm.h | 1 +
target/arm/kvm.c | 5 +++++
target/i386/kvm/kvm.c | 5 +++++
target/loongarch/kvm/kvm.c | 5 +++++
target/mips/kvm.c | 5 +++++
target/ppc/kvm.c | 5 +++++
target/riscv/kvm/kvm-cpu.c | 5 +++++
target/s390x/kvm/kvm.c | 5 +++++
9 files changed, 41 insertions(+)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f89568bfa3..df9840e53a 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -540,6 +540,11 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+ ret = kvm_arch_pre_create_vcpu(cpu, errp);
+ if (ret < 0) {
+ goto err;
+ }
+
ret = kvm_create_vcpu(cpu);
if (ret < 0) {
error_setg_errno(errp, -ret,
diff --git a/include/system/kvm.h b/include/system/kvm.h
index ab17c09a55..d7dfa25493 100644
--- a/include/system/kvm.h
+++ b/include/system/kvm.h
@@ -374,6 +374,7 @@ int kvm_arch_get_default_type(MachineState *ms);
int kvm_arch_init(MachineState *ms, KVMState *s);
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp);
int kvm_arch_init_vcpu(CPUState *cpu);
int kvm_arch_destroy_vcpu(CPUState *cpu);
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index da30bdbb23..93f1a7245b 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -1874,6 +1874,11 @@ static int kvm_arm_sve_set_vls(ARMCPU *cpu)
#define ARM_CPU_ID_MPIDR 3, 0, 0, 0, 5
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+ return 0;
+}
+
int kvm_arch_init_vcpu(CPUState *cs)
{
int ret;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 6c749d4ee8..f41e190fb8 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2051,6 +2051,11 @@ full:
abort();
}
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+ return 0;
+}
+
int kvm_arch_init_vcpu(CPUState *cs)
{
struct {
diff --git a/target/loongarch/kvm/kvm.c b/target/loongarch/kvm/kvm.c
index 7f63e7c8fe..ed0ddf1cbf 100644
--- a/target/loongarch/kvm/kvm.c
+++ b/target/loongarch/kvm/kvm.c
@@ -1075,6 +1075,11 @@ static int kvm_cpu_check_pv_features(CPUState *cs, Error **errp)
return 0;
}
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+ return 0;
+}
+
int kvm_arch_init_vcpu(CPUState *cs)
{
uint64_t val;
diff --git a/target/mips/kvm.c b/target/mips/kvm.c
index d67b7c1a8e..ec53acb51a 100644
--- a/target/mips/kvm.c
+++ b/target/mips/kvm.c
@@ -61,6 +61,11 @@ int kvm_arch_irqchip_create(KVMState *s)
return 0;
}
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+ return 0;
+}
+
int kvm_arch_init_vcpu(CPUState *cs)
{
CPUMIPSState *env = cpu_env(cs);
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 992356cb75..20fabccecd 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -479,6 +479,11 @@ static void kvmppc_hw_debug_points_init(CPUPPCState *cenv)
}
}
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+ return 0;
+}
+
int kvm_arch_init_vcpu(CPUState *cs)
{
PowerPCCPU *cpu = POWERPC_CPU(cs);
diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
index 4ffeeaa1c9..451c00f17c 100644
--- a/target/riscv/kvm/kvm-cpu.c
+++ b/target/riscv/kvm/kvm-cpu.c
@@ -1389,6 +1389,11 @@ static int kvm_vcpu_enable_sbi_dbcn(RISCVCPU *cpu, CPUState *cs)
return kvm_set_one_reg(cs, kvm_sbi_dbcn.kvm_reg_id, &reg);
}
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+ return 0;
+}
+
int kvm_arch_init_vcpu(CPUState *cs)
{
int ret = 0;
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 4d56e653dd..1f592733f4 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -404,6 +404,11 @@ unsigned long kvm_arch_vcpu_id(CPUState *cpu)
return cpu->cpu_index;
}
+int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
+{
+ return 0;
+}
+
int kvm_arch_init_vcpu(CPUState *cs)
{
unsigned int max_cpus = MACHINE(qdev_get_machine())->smp.max_cpus;
--
2.39.3
* [PATCH v3 04/10] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured
2025-03-31 1:32 [PATCH v3 00/10] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (2 preceding siblings ...)
2025-03-31 1:32 ` [PATCH v3 03/10] kvm: Introduce kvm_arch_pre_create_vcpu() Dongli Zhang
@ 2025-03-31 1:32 ` Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 05/10] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
` (5 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Dongli Zhang @ 2025-03-31 1:32 UTC (permalink / raw)
To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is configured,
there is no way to fully disable KVM AMD PMU virtualization. Neither
"-cpu host,-pmu" nor "-cpu EPYC" achieves this.
As a result, the following message still appears in the VM dmesg:
[ 0.263615] Performance Events: AMD PMU driver.
However, the expected output should be:
[ 0.596381] Performance Events: PMU not available due to virtualization, using software events only.
[ 0.600972] NMI watchdog: Perf NMI watchdog permanently disabled
This occurs because AMD does not use any CPUID bit to indicate PMU
availability.
To address this, KVM_CAP_PMU_CAPABILITY is used to set KVM_PMU_CAP_DISABLE
when "-pmu" is configured.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v1:
- Switch back to the initial implementation with "-pmu".
https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com
- Mention that "KVM_PMU_CAP_DISABLE doesn't change the PMU behavior on
Intel platform because current "pmu" property works as expected."
Changed since v2:
- Change has_pmu_cap to pmu_cap.
- Use (pmu_cap & KVM_PMU_CAP_DISABLE) instead of only pmu_cap in if
statement.
- Add Reviewed-by from Xiaoyao and Zhao as the change is minor.
target/i386/kvm/kvm.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f41e190fb8..579c0f7e0b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -176,6 +176,8 @@ static int has_triple_fault_event;
static bool has_msr_mcg_ext_ctl;
+static int pmu_cap;
+
static struct kvm_cpuid2 *cpuid_cache;
static struct kvm_cpuid2 *hv_cpuid_cache;
static struct kvm_msr_list *kvm_feature_msrs;
@@ -2053,6 +2055,33 @@ full:
int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
{
+ static bool first = true;
+ int ret;
+
+ if (first) {
+ first = false;
+
+ /*
+ * Since Linux v5.18, KVM provides a VM-level capability to easily
+ * disable PMUs; however, QEMU has been providing PMU property per
+ * CPU since v1.6. In order to accommodate both, have to configure
+ * the VM-level capability here.
+ *
+ * KVM_PMU_CAP_DISABLE doesn't change the PMU
+ * behavior on Intel platform because current "pmu" property works
+ * as expected.
+ */
+ if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
+ ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+ KVM_PMU_CAP_DISABLE);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret,
+ "Failed to set KVM_PMU_CAP_DISABLE");
+ return ret;
+ }
+ }
+ }
+
return 0;
}
@@ -3351,6 +3380,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
}
}
+ pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
+
return 0;
}
--
2.39.3
* [PATCH v3 05/10] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
2025-03-31 1:32 [PATCH v3 00/10] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (3 preceding siblings ...)
2025-03-31 1:32 ` [PATCH v3 04/10] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
@ 2025-03-31 1:32 ` Dongli Zhang
2025-04-10 2:41 ` Zhao Liu
2025-03-31 1:32 ` [PATCH v3 06/10] target/i386/kvm: rename architectural PMU variables Dongli Zhang
` (4 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Dongli Zhang @ 2025-03-31 1:32 UTC (permalink / raw)
To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
The initialization of 'has_architectural_pmu_version',
'num_architectural_pmu_gp_counters', and
'num_architectural_pmu_fixed_counters' is unrelated to the process of
building the CPUID.
Extract them out of kvm_x86_build_cpuid().
In addition, use cpuid_find_entry() instead of cpu_x86_cpuid(), because
CPUID has already been filled at this stage.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v1:
- Still extract the code, but call them for all CPUs.
Changed since v2:
- Use cpuid_find_entry() instead of cpu_x86_cpuid().
- Didn't add Reviewed-by from Dapeng as the change isn't minor.
target/i386/kvm/kvm.c | 62 ++++++++++++++++++++++++-------------------
1 file changed, 35 insertions(+), 27 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 579c0f7e0b..4d86c08c6c 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1959,33 +1959,6 @@ static uint32_t kvm_x86_build_cpuid(CPUX86State *env,
}
}
- if (limit >= 0x0a) {
- uint32_t eax, edx;
-
- cpu_x86_cpuid(env, 0x0a, 0, &eax, &unused, &unused, &edx);
-
- has_architectural_pmu_version = eax & 0xff;
- if (has_architectural_pmu_version > 0) {
- num_architectural_pmu_gp_counters = (eax & 0xff00) >> 8;
-
- /* Shouldn't be more than 32, since that's the number of bits
- * available in EBX to tell us _which_ counters are available.
- * Play it safe.
- */
- if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
- num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
- }
-
- if (has_architectural_pmu_version > 1) {
- num_architectural_pmu_fixed_counters = edx & 0x1f;
-
- if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
- num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
- }
- }
- }
- }
-
cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
for (i = 0x80000000; i <= limit; i++) {
@@ -2085,6 +2058,39 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
return 0;
}
+static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
+{
+ struct kvm_cpuid_entry2 *c;
+
+ c = cpuid_find_entry(cpuid, 0xa, 0);
+
+ if (!c) {
+ return;
+ }
+
+ has_architectural_pmu_version = c->eax & 0xff;
+ if (has_architectural_pmu_version > 0) {
+ num_architectural_pmu_gp_counters = (c->eax & 0xff00) >> 8;
+
+ /*
+ * Shouldn't be more than 32, since that's the number of bits
+ * available in EBX to tell us _which_ counters are available.
+ * Play it safe.
+ */
+ if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
+ num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+ }
+
+ if (has_architectural_pmu_version > 1) {
+ num_architectural_pmu_fixed_counters = c->edx & 0x1f;
+
+ if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+ num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+ }
+ }
+ }
+}
+
int kvm_arch_init_vcpu(CPUState *cs)
{
struct {
@@ -2267,6 +2273,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
cpuid_data.cpuid.nent = cpuid_i;
+ kvm_init_pmu_info(&cpuid_data.cpuid);
+
if (((env->cpuid_version >> 8)&0xF) >= 6
&& (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
(CPUID_MCE | CPUID_MCA)) {
--
2.39.3
* [PATCH v3 06/10] target/i386/kvm: rename architectural PMU variables
2025-03-31 1:32 [PATCH v3 00/10] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (4 preceding siblings ...)
2025-03-31 1:32 ` [PATCH v3 05/10] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
@ 2025-03-31 1:32 ` Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 07/10] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
` (3 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Dongli Zhang @ 2025-03-31 1:32 UTC (permalink / raw)
To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
AMD does not have what is commonly referred to as an architectural PMU.
Therefore, we need to rename the following variables so that they are
applicable to both Intel and AMD:
- has_architectural_pmu_version
- num_architectural_pmu_gp_counters
- num_architectural_pmu_fixed_counters
For Intel processors, the meaning of pmu_version remains unchanged.
For AMD processors:
pmu_version == 1 corresponds to versions before AMD PerfMonV2.
pmu_version == 2 corresponds to AMD PerfMonV2.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
---
Changed since v2:
- Change has_pmu_version to pmu_version.
- Add Reviewed-by since the change is minor.
- As a reminder, there are some contextual changes due to PATCH 05,
i.e., c->edx vs. edx.
target/i386/kvm/kvm.c | 49 ++++++++++++++++++++++++-------------------
1 file changed, 28 insertions(+), 21 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4d86c08c6c..6b49549f1b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -164,9 +164,16 @@ static bool has_msr_perf_capabs;
static bool has_msr_pkrs;
static bool has_msr_hwcr;
-static uint32_t has_architectural_pmu_version;
-static uint32_t num_architectural_pmu_gp_counters;
-static uint32_t num_architectural_pmu_fixed_counters;
+/*
+ * For Intel processors, the meaning is the architectural PMU version
+ * number.
+ *
+ * For AMD processors: 1 corresponds to the prior versions, and 2
+ * corresponds to AMD PerfMonV2.
+ */
+static uint32_t pmu_version;
+static uint32_t num_pmu_gp_counters;
+static uint32_t num_pmu_fixed_counters;
static int has_xsave2;
static int has_xcrs;
@@ -2068,24 +2075,24 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
return;
}
- has_architectural_pmu_version = c->eax & 0xff;
- if (has_architectural_pmu_version > 0) {
- num_architectural_pmu_gp_counters = (c->eax & 0xff00) >> 8;
+ pmu_version = c->eax & 0xff;
+ if (pmu_version > 0) {
+ num_pmu_gp_counters = (c->eax & 0xff00) >> 8;
/*
* Shouldn't be more than 32, since that's the number of bits
* available in EBX to tell us _which_ counters are available.
* Play it safe.
*/
- if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
- num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+ if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
+ num_pmu_gp_counters = MAX_GP_COUNTERS;
}
- if (has_architectural_pmu_version > 1) {
- num_architectural_pmu_fixed_counters = c->edx & 0x1f;
+ if (pmu_version > 1) {
+ num_pmu_fixed_counters = c->edx & 0x1f;
- if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
- num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+ if (num_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+ num_pmu_fixed_counters = MAX_FIXED_COUNTERS;
}
}
}
@@ -4037,25 +4044,25 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
}
- if (has_architectural_pmu_version > 0) {
- if (has_architectural_pmu_version > 1) {
+ if (pmu_version > 0) {
+ if (pmu_version > 1) {
/* Stop the counter. */
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
}
/* Set the counter values. */
- for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
+ for (i = 0; i < num_pmu_fixed_counters; i++) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
env->msr_fixed_counters[i]);
}
- for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+ for (i = 0; i < num_pmu_gp_counters; i++) {
kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i,
env->msr_gp_counters[i]);
kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i,
env->msr_gp_evtsel[i]);
}
- if (has_architectural_pmu_version > 1) {
+ if (pmu_version > 1) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS,
env->msr_global_status);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
@@ -4515,17 +4522,17 @@ static int kvm_get_msrs(X86CPU *cpu)
if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
}
- if (has_architectural_pmu_version > 0) {
- if (has_architectural_pmu_version > 1) {
+ if (pmu_version > 0) {
+ if (pmu_version > 1) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0);
}
- for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
+ for (i = 0; i < num_pmu_fixed_counters; i++) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i, 0);
}
- for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+ for (i = 0; i < num_pmu_gp_counters; i++) {
kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i, 0);
kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i, 0);
}
--
2.39.3
* [PATCH v3 07/10] target/i386/kvm: query kvm.enable_pmu parameter
2025-03-31 1:32 [PATCH v3 00/10] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (5 preceding siblings ...)
2025-03-31 1:32 ` [PATCH v3 06/10] target/i386/kvm: rename architectural PMU variables Dongli Zhang
@ 2025-03-31 1:32 ` Dongli Zhang
2025-04-10 5:05 ` Zhao Liu
2025-03-31 1:32 ` [PATCH v3 08/10] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
` (2 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Dongli Zhang @ 2025-03-31 1:32 UTC (permalink / raw)
To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
There is no way to distinguish between the following scenarios:
(1) KVM_CAP_PMU_CAPABILITY is not supported.
(2) KVM_CAP_PMU_CAPABILITY is supported but disabled via the module
parameter kvm.enable_pmu=N.
In scenario (1), there is no way to fully disable AMD PMU virtualization.
In scenario (2), PMU virtualization is completely disabled by the KVM
module.
To help determine the scenario, read the kvm.enable_pmu value from the
sysfs module parameter.
There is no need to initialize 'pmu_version', 'num_pmu_gp_counters' or
'num_pmu_fixed_counters' if kvm.enable_pmu=N.
In addition, return an error when kvm.enable_pmu=N but the user wants to
enable vPMU.
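For example (hypothetical host setup): on a kernel that provides
kvm.enable_pmu but not KVM_CAP_PMU_CAPABILITY, and where the module was
loaded with enable_pmu=N, /sys/module/kvm/parameters/enable_pmu reads "N";
with this patch, a guest that explicitly enables the "pmu" property now
fails to start with the new error instead of silently running without a
working vPMU.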
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v2:
- Rework the code flow following Zhao's suggestion.
- Return error when:
(*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu)
target/i386/kvm/kvm.c | 36 +++++++++++++++++++++++++++++-------
1 file changed, 29 insertions(+), 7 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 6b49549f1b..f68d5a0578 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2051,13 +2051,35 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
* behavior on Intel platform because current "pmu" property works
* as expected.
*/
- if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
- ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
- KVM_PMU_CAP_DISABLE);
- if (ret < 0) {
- error_setg_errno(errp, -ret,
- "Failed to set KVM_PMU_CAP_DISABLE");
- return ret;
+ if (pmu_cap) {
+ if ((pmu_cap & KVM_PMU_CAP_DISABLE) &&
+ !X86_CPU(cpu)->enable_pmu) {
+ ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+ KVM_PMU_CAP_DISABLE);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret,
+ "Failed to set KVM_PMU_CAP_DISABLE");
+ return ret;
+ }
+ }
+ } else {
+ /*
+ * KVM_CAP_PMU_CAPABILITY is introduced in Linux v5.18. For old
+ * linux, we have to check enable_pmu parameter for vPMU support.
+ */
+ g_autofree char *kvm_enable_pmu;
+
+ /*
+ * The kvm.enable_pmu's permission is 0444. It does not change until
+ * a reload of the KVM module.
+ */
+ if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
+ &kvm_enable_pmu, NULL, NULL)) {
+ if (*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu) {
+ error_setg(errp, "Failed to enable PMU since "
+ "KVM's enable_pmu parameter is disabled");
+ return -EPERM;
+ }
}
}
}
--
2.39.3
* [PATCH v3 08/10] target/i386/kvm: reset AMD PMU registers during VM reset
2025-03-31 1:32 [PATCH v3 00/10] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (6 preceding siblings ...)
2025-03-31 1:32 ` [PATCH v3 07/10] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
@ 2025-03-31 1:32 ` Dongli Zhang
2025-04-10 7:43 ` Zhao Liu
2025-03-31 1:32 ` [PATCH v3 09/10] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 10/10] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang
9 siblings, 1 reply; 21+ messages in thread
From: Dongli Zhang @ 2025-03-31 1:32 UTC (permalink / raw)
To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
QEMU uses the kvm_get_msrs() function to save Intel PMU registers from KVM
and kvm_put_msrs() to restore them to KVM. However, there is no support for
AMD PMU registers. Currently, pmu_version and num_pmu_gp_counters are
initialized based on cpuid(0xa), which does not apply to AMD processors.
For AMD CPUs, prior to PerfMonV2, the number of general-purpose registers
is determined based on the CPU version.
To address this issue, we need to add support for AMD PMU registers.
Without this support, the following problems can arise:
1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec) while
running "perf top", the PMU registers are not disabled properly.
2. Despite x86_cpu_reset() resetting many registers to zero, kvm_put_msrs()
does not handle AMD PMU registers, causing some PMU events to remain
enabled in KVM.
3. The KVM kvm_pmc_speculative_in_use() function consistently returns true,
preventing the reclamation of these events. Consequently, the
kvm_pmc->perf_event remains active.
4. After a reboot, the VM kernel may report the following error:
[ 0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS detected, complain to your hardware vendor.
[ 0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 530076)
5. In the worst case, the active kvm_pmc->perf_event may inject unknown
NMIs randomly into the VM kernel:
[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.
To resolve these issues, we propose resetting AMD PMU registers during the
VM reset process.
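As a sketch of the register layout handled below (the helper itself is
hypothetical and for illustration only; the MSR constants are the ones
added to target/i386/cpu.h in this patch):
    /* Addresses of the i-th event-select/counter pair, per layout. */
    static void amd_pmu_msr_pair(bool perfcore, int i,
                                 uint32_t *sel, uint32_t *ctr)
    {
        if (perfcore) {
            /* PERFCORE: interleaved CTLx/CTRx pairs, stride of 2. */
            *sel = MSR_F15H_PERF_CTL0 + i * 2; /* 0xc0010200, 0xc0010202, ... */
            *ctr = MSR_F15H_PERF_CTR0 + i * 2; /* 0xc0010201, 0xc0010203, ... */
        } else {
            /* Legacy K7 layout: two contiguous banks, stride of 1. */
            *sel = MSR_K7_EVNTSEL0 + i;        /* 0xc0010000 .. 0xc0010003 */
            *ctr = MSR_K7_PERFCTR0 + i;        /* 0xc0010004 .. 0xc0010007 */
        }
    }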
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v1:
- Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
AMD64_NUM_COUNTERS (suggested by Sandipan Das).
- Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 + 0xb".
(suggested by Sandipan Das).
- Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
- Don't initialize PMU info if kvm.enable_pmu=N.
Changed since v2:
- Remove 'static' from host_cpuid_vendorX.
- Change has_pmu_version to pmu_version.
- Use object_property_get_int() to get CPU family.
- Use cpuid_find_entry() instead of cpu_x86_cpuid().
- Send error log when host and guest are from different vendors.
- Move "if (!cpu->enable_pmu)" to begin of function. Add comments to
reminder developers.
- Add support to Zhaoxin. Change is_same_vendor() to
is_host_compat_vendor().
- Didn't add Reviewed-by from Sandipan because the change isn't minor.
TODO:
- This patch adds is_host_compat_vendor(), while there is already something
like is_host_cpu_intel() in target/i386/kvm/vmsr_energy.c. A rework
may help move those helpers to target/i386/cpu*.
target/i386/cpu.h | 8 ++
target/i386/kvm/kvm.c | 176 +++++++++++++++++++++++++++++++++++++++++-
2 files changed, 180 insertions(+), 4 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 76f24446a5..84e497f5d3 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -490,6 +490,14 @@ typedef enum X86Seg {
#define MSR_CORE_PERF_GLOBAL_CTRL 0x38f
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x390
+#define MSR_K7_EVNTSEL0 0xc0010000
+#define MSR_K7_PERFCTR0 0xc0010004
+#define MSR_F15H_PERF_CTL0 0xc0010200
+#define MSR_F15H_PERF_CTR0 0xc0010201
+
+#define AMD64_NUM_COUNTERS 4
+#define AMD64_NUM_COUNTERS_CORE 6
+
#define MSR_MC0_CTL 0x400
#define MSR_MC0_STATUS 0x401
#define MSR_MC0_ADDR 0x402
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f68d5a0578..3a35fd741d 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2087,7 +2087,7 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
return 0;
}
-static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
+static void kvm_init_pmu_info_intel(struct kvm_cpuid2 *cpuid)
{
struct kvm_cpuid_entry2 *c;
@@ -2120,6 +2120,97 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
}
}
+static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
+{
+ struct kvm_cpuid_entry2 *c;
+ int64_t family;
+
+ family = object_property_get_int(OBJECT(cpu), "family", NULL);
+ if (family < 0) {
+ return;
+ }
+
+ if (family < 6) {
+ error_report("AMD performance-monitoring is supported from "
+ "K7 and later");
+ return;
+ }
+
+ pmu_version = 1;
+ num_pmu_gp_counters = AMD64_NUM_COUNTERS;
+
+ c = cpuid_find_entry(cpuid, 0x80000001, 0);
+ if (!c) {
+ return;
+ }
+
+ if (!(c->ecx & CPUID_EXT3_PERFCORE)) {
+ return;
+ }
+
+ num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+}
+
+static bool is_host_compat_vendor(CPUX86State *env)
+{
+ char host_vendor[CPUID_VENDOR_SZ + 1];
+ uint32_t host_cpuid_vendor1;
+ uint32_t host_cpuid_vendor2;
+ uint32_t host_cpuid_vendor3;
+
+ host_cpuid(0x0, 0, NULL, &host_cpuid_vendor1, &host_cpuid_vendor3,
+ &host_cpuid_vendor2);
+
+ x86_cpu_vendor_words2str(host_vendor, host_cpuid_vendor1,
+ host_cpuid_vendor2, host_cpuid_vendor3);
+
+ /*
+ * Intel and Zhaoxin are compatible.
+ */
+ if ((g_str_equal(host_vendor, CPUID_VENDOR_INTEL) ||
+ g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN1) ||
+ g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN2)) &&
+ (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env))) {
+ return true;
+ }
+
+ return env->cpuid_vendor1 == host_cpuid_vendor1 &&
+ env->cpuid_vendor2 == host_cpuid_vendor2 &&
+ env->cpuid_vendor3 == host_cpuid_vendor3;
+}
+
+static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
+{
+ CPUX86State *env = &cpu->env;
+
+ /*
+ * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
+ * disable the AMD PMU virtualization.
+ *
+ * Assume the user is aware of this when !cpu->enable_pmu. AMD PMU
+ * registers are not going to reset, even they are still available to
+ * guest VM.
+ */
+ if (!cpu->enable_pmu) {
+ return;
+ }
+
+ /*
+ * It is not supported to virtualize AMD PMU registers on Intel
+ * processors, nor to virtualize Intel PMU registers on AMD processors.
+ */
+ if (!is_host_compat_vendor(env)) {
+ error_report("host doesn't support requested feature: vPMU");
+ return;
+ }
+
+ if (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) {
+ kvm_init_pmu_info_intel(cpuid);
+ } else if (IS_AMD_CPU(env)) {
+ kvm_init_pmu_info_amd(cpuid, cpu);
+ }
+}
+
int kvm_arch_init_vcpu(CPUState *cs)
{
struct {
@@ -2302,7 +2393,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
cpuid_data.cpuid.nent = cpuid_i;
- kvm_init_pmu_info(&cpuid_data.cpuid);
+ kvm_init_pmu_info(&cpuid_data.cpuid, cpu);
if (((env->cpuid_version >> 8)&0xF) >= 6
&& (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
@@ -4066,7 +4157,7 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
}
- if (pmu_version > 0) {
+ if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
if (pmu_version > 1) {
/* Stop the counter. */
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
@@ -4097,6 +4188,38 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
env->msr_global_ctrl);
}
}
+
+ if (IS_AMD_CPU(env) && pmu_version > 0) {
+ uint32_t sel_base = MSR_K7_EVNTSEL0;
+ uint32_t ctr_base = MSR_K7_PERFCTR0;
+ /*
+ * The address of the next selector or counter register is
+ * obtained by incrementing the address of the current selector
+ * or counter register by one.
+ */
+ uint32_t step = 1;
+
+ /*
+ * When PERFCORE is enabled, AMD PMU uses a separate set of
+ * addresses for the selector and counter registers.
+ * Additionally, the address of the next selector or counter
+ * register is determined by incrementing the address of the
+ * current register by two.
+ */
+ if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+ sel_base = MSR_F15H_PERF_CTL0;
+ ctr_base = MSR_F15H_PERF_CTR0;
+ step = 2;
+ }
+
+ for (i = 0; i < num_pmu_gp_counters; i++) {
+ kvm_msr_entry_add(cpu, ctr_base + i * step,
+ env->msr_gp_counters[i]);
+ kvm_msr_entry_add(cpu, sel_base + i * step,
+ env->msr_gp_evtsel[i]);
+ }
+ }
+
/*
* Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-add,
* only sync them to KVM on the first cpu
@@ -4544,7 +4667,8 @@ static int kvm_get_msrs(X86CPU *cpu)
if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
}
- if (pmu_version > 0) {
+
+ if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
if (pmu_version > 1) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
@@ -4560,6 +4684,35 @@ static int kvm_get_msrs(X86CPU *cpu)
}
}
+ if (IS_AMD_CPU(env) && pmu_version > 0) {
+ uint32_t sel_base = MSR_K7_EVNTSEL0;
+ uint32_t ctr_base = MSR_K7_PERFCTR0;
+ /*
+ * The address of the next selector or counter register is
+ * obtained by incrementing the address of the current selector
+ * or counter register by one.
+ */
+ uint32_t step = 1;
+
+ /*
+ * When PERFCORE is enabled, AMD PMU uses a separate set of
+ * addresses for the selector and counter registers.
+ * Additionally, the address of the next selector or counter
+ * register is determined by incrementing the address of the
+ * current register by two.
+ */
+ if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+ sel_base = MSR_F15H_PERF_CTL0;
+ ctr_base = MSR_F15H_PERF_CTR0;
+ step = 2;
+ }
+
+ for (i = 0; i < num_pmu_gp_counters; i++) {
+ kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
+ kvm_msr_entry_add(cpu, sel_base + i * step, 0);
+ }
+ }
+
if (env->mcg_cap) {
kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
@@ -4871,6 +5024,21 @@ static int kvm_get_msrs(X86CPU *cpu)
case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
break;
+ case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + AMD64_NUM_COUNTERS - 1:
+ env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
+ break;
+ case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + AMD64_NUM_COUNTERS - 1:
+ env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
+ break;
+ case MSR_F15H_PERF_CTL0 ...
+ MSR_F15H_PERF_CTL0 + AMD64_NUM_COUNTERS_CORE * 2 - 1:
+ index = index - MSR_F15H_PERF_CTL0;
+ if (index & 0x1) {
+ env->msr_gp_counters[index] = msrs[i].data;
+ } else {
+ env->msr_gp_evtsel[index] = msrs[i].data;
+ }
+ break;
case HV_X64_MSR_HYPERCALL:
env->msr_hv_hypercall = msrs[i].data;
break;
--
2.39.3
* [PATCH v3 09/10] target/i386/kvm: support perfmon-v2 for reset
2025-03-31 1:32 [PATCH v3 00/10] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (7 preceding siblings ...)
2025-03-31 1:32 ` [PATCH v3 08/10] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
@ 2025-03-31 1:32 ` Dongli Zhang
2025-04-10 8:21 ` Zhao Liu
2025-03-31 1:32 ` [PATCH v3 10/10] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang
9 siblings, 1 reply; 21+ messages in thread
From: Dongli Zhang @ 2025-03-31 1:32 UTC (permalink / raw)
To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
Since perfmon-v2, the AMD PMU supports additional registers. This update
includes get/put functionality for these extra registers.
Similar to the implementation in KVM:
- MSR_CORE_PERF_GLOBAL_STATUS and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS both
use env->msr_global_status.
- MSR_CORE_PERF_GLOBAL_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_CTL both use
env->msr_global_ctrl.
- MSR_CORE_PERF_GLOBAL_OVF_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
both use env->msr_global_ovf_ctrl.
No changes are needed for vmstate_msr_architectural_pmu or
pmu_enable_needed().
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
Changed since v1:
- Use "has_pmu_version > 1", not "has_pmu_version == 2".
Changed since v2:
- Use cpuid_find_entry() instead of cpu_x86_cpuid().
- Change has_pmu_version to pmu_version.
- Cap num_pmu_gp_counters with MAX_GP_COUNTERS.
target/i386/cpu.h | 4 ++++
target/i386/kvm/kvm.c | 48 +++++++++++++++++++++++++++++++++++--------
2 files changed, 43 insertions(+), 9 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 84e497f5d3..ab952ac5ad 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -490,6 +490,10 @@ typedef enum X86Seg {
#define MSR_CORE_PERF_GLOBAL_CTRL 0x38f
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x390
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS 0xc0000300
+#define MSR_AMD64_PERF_CNTR_GLOBAL_CTL 0xc0000301
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR 0xc0000302
+
#define MSR_K7_EVNTSEL0 0xc0010000
#define MSR_K7_PERFCTR0 0xc0010004
#define MSR_F15H_PERF_CTL0 0xc0010200
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 3a35fd741d..f4532e6f2a 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2149,6 +2149,16 @@ static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
}
num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+
+ c = cpuid_find_entry(cpuid, 0x80000022, 0);
+ if (c && (c->eax & CPUID_8000_0022_EAX_PERFMON_V2)) {
+ pmu_version = 2;
+ num_pmu_gp_counters = c->ebx & 0xf;
+
+ if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
+ num_pmu_gp_counters = MAX_GP_COUNTERS;
+ }
+ }
}
static bool is_host_compat_vendor(CPUX86State *env)
@@ -4200,13 +4210,14 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
uint32_t step = 1;
/*
- * When PERFCORE is enabled, AMD PMU uses a separate set of
- * addresses for the selector and counter registers.
- * Additionally, the address of the next selector or counter
- * register is determined by incrementing the address of the
- * current register by two.
+ * When PERFCORE or PerfMonV2 is enabled, AMD PMU uses a
+ * separate set of addresses for the selector and counter
+ * registers. Additionally, the address of the next selector or
+ * counter register is determined by incrementing the address
+ * of the current register by two.
*/
- if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+ if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE ||
+ pmu_version > 1) {
sel_base = MSR_F15H_PERF_CTL0;
ctr_base = MSR_F15H_PERF_CTR0;
step = 2;
@@ -4218,6 +4229,15 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
kvm_msr_entry_add(cpu, sel_base + i * step,
env->msr_gp_evtsel[i]);
}
+
+ if (pmu_version > 1) {
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+ env->msr_global_status);
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+ env->msr_global_ovf_ctrl);
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+ env->msr_global_ctrl);
+ }
}
/*
@@ -4695,13 +4715,14 @@ static int kvm_get_msrs(X86CPU *cpu)
uint32_t step = 1;
/*
- * When PERFCORE is enabled, AMD PMU uses a separate set of
- * addresses for the selector and counter registers.
+ * When PERFCORE or PerfMonV2 is enabled, AMD PMU uses a separate
+ * set of addresses for the selector and counter registers.
* Additionally, the address of the next selector or counter
* register is determined by incrementing the address of the
* current register by two.
*/
- if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+ if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE ||
+ pmu_version > 1) {
sel_base = MSR_F15H_PERF_CTL0;
ctr_base = MSR_F15H_PERF_CTR0;
step = 2;
@@ -4711,6 +4732,12 @@ static int kvm_get_msrs(X86CPU *cpu)
kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
kvm_msr_entry_add(cpu, sel_base + i * step, 0);
}
+
+ if (pmu_version > 1) {
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, 0);
+ kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, 0);
+ }
}
if (env->mcg_cap) {
@@ -5007,12 +5034,15 @@ static int kvm_get_msrs(X86CPU *cpu)
env->msr_fixed_ctr_ctrl = msrs[i].data;
break;
case MSR_CORE_PERF_GLOBAL_CTRL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
env->msr_global_ctrl = msrs[i].data;
break;
case MSR_CORE_PERF_GLOBAL_STATUS:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
env->msr_global_status = msrs[i].data;
break;
case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+ case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
env->msr_global_ovf_ctrl = msrs[i].data;
break;
case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR0 + MAX_FIXED_COUNTERS - 1:
--
2.39.3
* [PATCH v3 10/10] target/i386/kvm: don't stop Intel PMU counters
2025-03-31 1:32 [PATCH v3 00/10] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
` (8 preceding siblings ...)
2025-03-31 1:32 ` [PATCH v3 09/10] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
@ 2025-03-31 1:32 ` Dongli Zhang
2025-04-10 9:45 ` Zhao Liu
9 siblings, 1 reply; 21+ messages in thread
From: Dongli Zhang @ 2025-03-31 1:32 UTC (permalink / raw)
To: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x
Cc: pbonzini, zhao1.liu, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
kvm_put_msrs() sets the MSRs using KVM_SET_MSRS. The x86 KVM code processes
these MSRs one by one in a loop, only saving the config and triggering the
KVM_REQ_PMU request. This approach does not immediately stop the event
before updating the PMC.
In addition, PMU MSRs are set only at levels >= KVM_PUT_RESET_STATE,
excluding runtime. Therefore, updating these MSRs without stopping events
should be acceptable.
Finally, KVM creates kernel perf events with host mode excluded
(exclude_host = 1). While the events remain active, they don't increment
the counter during QEMU vCPU userspace mode.
No Fixes tag is added for commit 0d89436786b0 ("kvm: migrate vPMU state"),
because this isn't a bugfix.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
target/i386/kvm/kvm.c | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 4c3908e09e..d9c6c9905e 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4170,13 +4170,6 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
}
if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
- if (pmu_version > 1) {
- /* Stop the counter. */
- kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
- kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
- }
-
- /* Set the counter values. */
for (i = 0; i < num_pmu_fixed_counters; i++) {
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
env->msr_fixed_counters[i]);
@@ -4192,8 +4185,6 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
env->msr_global_status);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
env->msr_global_ovf_ctrl);
-
- /* Now start the PMU. */
kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL,
env->msr_fixed_ctr_ctrl);
kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL,
--
2.39.3
* Re: [PATCH v3 05/10] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid()
2025-03-31 1:32 ` [PATCH v3 05/10] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
@ 2025-04-10 2:41 ` Zhao Liu
0 siblings, 0 replies; 21+ messages in thread
From: Zhao Liu @ 2025-04-10 2:41 UTC (permalink / raw)
To: Dongli Zhang
Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
On Sun, Mar 30, 2025 at 06:32:24PM -0700, Dongli Zhang wrote:
> Date: Sun, 30 Mar 2025 18:32:24 -0700
> From: Dongli Zhang <dongli.zhang@oracle.com>
> Subject: [PATCH v3 05/10] target/i386/kvm: extract unrelated code out of
> kvm_x86_build_cpuid()
> X-Mailer: git-send-email 2.43.5
>
> The initialization of 'has_architectural_pmu_version',
> 'num_architectural_pmu_gp_counters', and
> 'num_architectural_pmu_fixed_counters' is unrelated to the process of
> building the CPUID.
>
> Extract them out of kvm_x86_build_cpuid().
>
> In addition, use cpuid_find_entry() instead of cpu_x86_cpuid(), because
> CPUID has already been filled at this stage.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> Changed since v1:
> - Still extract the code, but call them for all CPUs.
> Changed since v2:
> - Use cpuid_find_entry() instead of cpu_x86_cpuid().
> - Didn't add Reviewed-by from Dapeng as the change isn't minor.
>
> target/i386/kvm/kvm.c | 62 ++++++++++++++++++++++++-------------------
> 1 file changed, 35 insertions(+), 27 deletions(-)
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
* Re: [PATCH v3 07/10] target/i386/kvm: query kvm.enable_pmu parameter
2025-03-31 1:32 ` [PATCH v3 07/10] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
@ 2025-04-10 5:05 ` Zhao Liu
2025-04-10 20:16 ` Dongli Zhang
2025-04-16 7:48 ` Dongli Zhang
0 siblings, 2 replies; 21+ messages in thread
From: Zhao Liu @ 2025-04-10 5:05 UTC (permalink / raw)
To: Dongli Zhang
Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
Hi Dongli,
The logic is fine for me :-) And thank you for taking my previous
suggestion. Revisiting this after a few weeks, I have some thoughts:
> + if (pmu_cap) {
> + if ((pmu_cap & KVM_PMU_CAP_DISABLE) &&
> + !X86_CPU(cpu)->enable_pmu) {
> + ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
> + KVM_PMU_CAP_DISABLE);
> + if (ret < 0) {
> + error_setg_errno(errp, -ret,
> + "Failed to set KVM_PMU_CAP_DISABLE");
> + return ret;
> + }
> + }
This case enhances vPMU disablement.
> + } else {
> + /*
> + * KVM_CAP_PMU_CAPABILITY is introduced in Linux v5.18. For old
> + * linux, we have to check enable_pmu parameter for vPMU support.
> + */
> + g_autofree char *kvm_enable_pmu;
> +
> + /*
> + * The kvm.enable_pmu's permission is 0444. It does not change until
> + * a reload of the KVM module.
> + */
> + if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
> + &kvm_enable_pmu, NULL, NULL)) {
> + if (*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu) {
> + error_setg(errp, "Failed to enable PMU since "
> + "KVM's enable_pmu parameter is disabled");
> + return -EPERM;
> + }
And this case checks whether vPMU can be enabled.
> }
> }
> }
So I feel it's not ideal to branch on pmu_cap; we can
re-split it into these two cases: enable_pmu and !enable_pmu. That
makes the code paths clearer!
Just like:
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index f68d5a057882..d728fb5eaec6 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2041,44 +2041,42 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
if (first) {
first = false;
- /*
- * Since Linux v5.18, KVM provides a VM-level capability to easily
- * disable PMUs; however, QEMU has been providing PMU property per
- * CPU since v1.6. In order to accommodate both, have to configure
- * the VM-level capability here.
- *
- * KVM_PMU_CAP_DISABLE doesn't change the PMU
- * behavior on Intel platform because current "pmu" property works
- * as expected.
- */
- if (pmu_cap) {
- if ((pmu_cap & KVM_PMU_CAP_DISABLE) &&
- !X86_CPU(cpu)->enable_pmu) {
- ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
- KVM_PMU_CAP_DISABLE);
- if (ret < 0) {
- error_setg_errno(errp, -ret,
- "Failed to set KVM_PMU_CAP_DISABLE");
- return ret;
- }
- }
- } else {
- /*
- * KVM_CAP_PMU_CAPABILITY is introduced in Linux v5.18. For old
- * linux, we have to check enable_pmu parameter for vPMU support.
- */
+ if (X86_CPU(cpu)->enable_pmu) {
g_autofree char *kvm_enable_pmu;
/*
- * The kvm.enable_pmu's permission is 0444. It does not change until
- * a reload of the KVM module.
+ * The enable_pmu parameter is introduced since Linux v5.17,
+ * give a chance to provide more information about vPMU
+ * enablement.
+ *
+ * The kvm.enable_pmu's permission is 0444. It does not change
+ * until a reload of the KVM module.
*/
if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
&kvm_enable_pmu, NULL, NULL)) {
- if (*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu) {
- error_setg(errp, "Failed to enable PMU since "
+ if (*kvm_enable_pmu == 'N') {
+ warn_report("Failed to enable PMU since "
"KVM's enable_pmu parameter is disabled");
- return -EPERM;
+ }
+ }
+ } else {
+ /*
+ * Since Linux v5.18, KVM provides a VM-level capability to easily
+ * disable PMUs; however, QEMU has been providing PMU property per
+ * CPU since v1.6. In order to accommodate both, have to configure
+ * the VM-level capability here.
+ *
+ * KVM_PMU_CAP_DISABLE doesn't change the PMU
+ * behavior on Intel platform because current "pmu" property works
+ * as expected.
+ */
+ if ((pmu_cap & KVM_PMU_CAP_DISABLE)) {
+ ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+ KVM_PMU_CAP_DISABLE);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret,
+ "Failed to set KVM_PMU_CAP_DISABLE");
+ return ret;
}
}
}
* Re: [PATCH v3 08/10] target/i386/kvm: reset AMD PMU registers during VM reset
2025-03-31 1:32 ` [PATCH v3 08/10] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
@ 2025-04-10 7:43 ` Zhao Liu
2025-04-10 21:17 ` Dongli Zhang
0 siblings, 1 reply; 21+ messages in thread
From: Zhao Liu @ 2025-04-10 7:43 UTC (permalink / raw)
To: Dongli Zhang
Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
...
> TODO:
> - This patch adds is_host_compat_vendor(), while there are something
> like is_host_cpu_intel() from target/i386/kvm/vmsr_energy.c. A rework
> may help move those helpers to target/i386/cpu*.
vmsr_energy emulates RAPL in user space...but RAPL is not architectural
(no CPUID), so this case doesn't need to consider "compat" vendor.
> target/i386/cpu.h | 8 ++
> target/i386/kvm/kvm.c | 176 +++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 180 insertions(+), 4 deletions(-)
...
> +static bool is_host_compat_vendor(CPUX86State *env)
> +{
> + char host_vendor[CPUID_VENDOR_SZ + 1];
> + uint32_t host_cpuid_vendor1;
> + uint32_t host_cpuid_vendor2;
> + uint32_t host_cpuid_vendor3;
>
> + host_cpuid(0x0, 0, NULL, &host_cpuid_vendor1, &host_cpuid_vendor3,
> + &host_cpuid_vendor2);
> +
> + x86_cpu_vendor_words2str(host_vendor, host_cpuid_vendor1,
> + host_cpuid_vendor2, host_cpuid_vendor3);
We can use host_cpu_vendor_fms() (with a little change). If you like
this idea, pls feel free to pick my cleanup patch into your series.
> + /*
> + * Intel and Zhaoxin are compatible.
> + */
> + if ((g_str_equal(host_vendor, CPUID_VENDOR_INTEL) ||
> + g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN1) ||
> + g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN2)) &&
> + (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env))) {
> + return true;
> + }
> +
> + return env->cpuid_vendor1 == host_cpuid_vendor1 &&
> + env->cpuid_vendor2 == host_cpuid_vendor2 &&
> + env->cpuid_vendor3 == host_cpuid_vendor3;
Checking AMD directly makes the "compat" rule clear:
return g_str_equal(host_vendor, CPUID_VENDOR_AMD) &&
IS_AMD_CPU(env);
> +}
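Putting the two suggestions together, a rough sketch of how the helper could
end up (assuming host_cpu_vendor_fms() from target/i386/host-cpu.h gets the
"little change" to tolerate NULL family/model/stepping arguments; names and
wording are illustrative, not the final code):

static bool is_host_compat_vendor(CPUX86State *env)
{
    char host_vendor[CPUID_VENDOR_SZ + 1] = { 0 };

    host_cpu_vendor_fms(host_vendor, NULL, NULL, NULL);

    /* Intel and Zhaoxin are treated as compatible with each other. */
    if ((g_str_equal(host_vendor, CPUID_VENDOR_INTEL) ||
         g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN1) ||
         g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN2)) &&
        (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env))) {
        return true;
    }

    /* Otherwise require AMD on both the host and the guest side. */
    return g_str_equal(host_vendor, CPUID_VENDOR_AMD) && IS_AMD_CPU(env);
}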
...
> if (env->mcg_cap) {
> kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
> kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
> @@ -4871,6 +5024,21 @@ static int kvm_get_msrs(X86CPU *cpu)
> case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
> env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
> break;
> + case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + AMD64_NUM_COUNTERS - 1:
> + env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
> + break;
> + case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + AMD64_NUM_COUNTERS - 1:
> + env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
> + break;
> + case MSR_F15H_PERF_CTL0 ...
> + MSR_F15H_PERF_CTL0 + AMD64_NUM_COUNTERS_CORE * 2 - 1:
> + index = index - MSR_F15H_PERF_CTL0;
> + if (index & 0x1) {
> + env->msr_gp_counters[index] = msrs[i].data;
> + } else {
> + env->msr_gp_evtsel[index] = msrs[i].data;
This msr_gp_evtsel[] array's size is 18:
#define MAX_GP_COUNTERS (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
This formula is based on Intel's MSRs; it's best to add a note that the
current size also meets AMD's needs. (No need to adjust the size, as
that would affect migration.)
> + }
> + break;
> case HV_X64_MSR_HYPERCALL:
> env->msr_hv_hypercall = msrs[i].data;
> break;
Others LGTM!
Thanks,
Zhao
* Re: [PATCH v3 09/10] target/i386/kvm: support perfmon-v2 for reset
2025-03-31 1:32 ` [PATCH v3 09/10] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
@ 2025-04-10 8:21 ` Zhao Liu
2025-04-10 21:19 ` Dongli Zhang
0 siblings, 1 reply; 21+ messages in thread
From: Zhao Liu @ 2025-04-10 8:21 UTC (permalink / raw)
To: Dongli Zhang
Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
On Sun, Mar 30, 2025 at 06:32:28PM -0700, Dongli Zhang wrote:
> Date: Sun, 30 Mar 2025 18:32:28 -0700
> From: Dongli Zhang <dongli.zhang@oracle.com>
> Subject: [PATCH v3 09/10] target/i386/kvm: support perfmon-v2 for reset
> X-Mailer: git-send-email 2.43.5
>
> Since perfmon-v2, the AMD PMU supports additional registers. This update
> includes get/put functionality for these extra registers.
>
> Similar to the implementation in KVM:
>
> - MSR_CORE_PERF_GLOBAL_STATUS and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS both
> use env->msr_global_status.
> - MSR_CORE_PERF_GLOBAL_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_CTL both use
> env->msr_global_ctrl.
> - MSR_CORE_PERF_GLOBAL_OVF_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
> both use env->msr_global_ovf_ctrl.
>
> No changes are needed for vmstate_msr_architectural_pmu or
> pmu_enable_needed().
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
...
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 3a35fd741d..f4532e6f2a 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -2149,6 +2149,16 @@ static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
> }
>
> num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
> +
> + c = cpuid_find_entry(cpuid, 0x80000022, 0);
> + if (c && (c->eax & CPUID_8000_0022_EAX_PERFMON_V2)) {
> + pmu_version = 2;
> + num_pmu_gp_counters = c->ebx & 0xf;
> +
> + if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
> + num_pmu_gp_counters = MAX_GP_COUNTERS;
OK! KVM now supports 6 GP counters (KVM_MAX_NR_AMD_GP_COUNTERS).
> + }
> + }
> }
Fine for me,
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
* Re: [PATCH v3 10/10] target/i386/kvm: don't stop Intel PMU counters
2025-03-31 1:32 ` [PATCH v3 10/10] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang
@ 2025-04-10 9:45 ` Zhao Liu
2025-04-10 22:25 ` Dongli Zhang
0 siblings, 1 reply; 21+ messages in thread
From: Zhao Liu @ 2025-04-10 9:45 UTC (permalink / raw)
To: Dongli Zhang
Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
On Sun, Mar 30, 2025 at 06:32:29PM -0700, Dongli Zhang wrote:
> Date: Sun, 30 Mar 2025 18:32:29 -0700
> From: Dongli Zhang <dongli.zhang@oracle.com>
> Subject: [PATCH v3 10/10] target/i386/kvm: don't stop Intel PMU counters
> X-Mailer: git-send-email 2.43.5
>
> The kvm_put_msrs() sets the MSRs using KVM_SET_MSRS. The x86 KVM processes
> these MSRs one by one in a loop, only saving the config and triggering the
> KVM_REQ_PMU request. This approach does not immediately stop the event
> before updating PMC.
This is true after KVM commit 68fb4757e867 (v6.2). QEMU even supports kernels
as old as v4.5 (docs/system/target-i386.rst)... I'm not sure whether that is
outdated, but it's better to mention the Linux version.
> In addition, PMU MSRs are set only at levels >= KVM_PUT_RESET_STATE,
> excluding runtime. Therefore, updating these MSRs without stopping events
> should be acceptable.
I agree.
> Finally, KVM creates kernel perf events with host mode excluded
> (exclude_host = 1). While the events remain active, they don't increment
> the counter during QEMU vCPU userspace mode.
>
> No Fixes tag is going to be added for commit 0d89436786b0 ("kvm:
> migrate vPMU state"), because this isn't a bugfix.
>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
> target/i386/kvm/kvm.c | 9 ---------
> 1 file changed, 9 deletions(-)
Fine for me,
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
* Re: [PATCH v3 07/10] target/i386/kvm: query kvm.enable_pmu parameter
2025-04-10 5:05 ` Zhao Liu
@ 2025-04-10 20:16 ` Dongli Zhang
2025-04-16 7:48 ` Dongli Zhang
1 sibling, 0 replies; 21+ messages in thread
From: Dongli Zhang @ 2025-04-10 20:16 UTC (permalink / raw)
To: Zhao Liu
Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
Hi Zhao,
On 4/9/25 10:05 PM, Zhao Liu wrote:
> Hi Dongli,
>
> The logic looks fine to me :-) And thank you for taking my previous
> suggestion. Revisiting this after a few weeks, I have some
> thoughts:
>
>> + if (pmu_cap) {
>> + if ((pmu_cap & KVM_PMU_CAP_DISABLE) &&
>> + !X86_CPU(cpu)->enable_pmu) {
>> + ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
>> + KVM_PMU_CAP_DISABLE);
>> + if (ret < 0) {
>> + error_setg_errno(errp, -ret,
>> + "Failed to set KVM_PMU_CAP_DISABLE");
>> + return ret;
>> + }
>> + }
>
> This case enhances vPMU disablement.
>
>> + } else {
>> + /*
>> + * KVM_CAP_PMU_CAPABILITY is introduced in Linux v5.18. For old
>> + * linux, we have to check enable_pmu parameter for vPMU support.
>> + */
>> + g_autofree char *kvm_enable_pmu;
>> +
>> + /*
>> + * The kvm.enable_pmu's permission is 0444. It does not change until
>> + * a reload of the KVM module.
>> + */
>> + if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
>> + &kvm_enable_pmu, NULL, NULL)) {
>> + if (*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu) {
>> + error_setg(errp, "Failed to enable PMU since "
>> + "KVM's enable_pmu parameter is disabled");
>> + return -EPERM;
>> + }
>
> And this case checks whether vPMU can be enabled.
>
>> }
>> }
>> }
>
> So I feel it's not ideal to branch on pmu_cap; we can
> re-split it into these two cases: enable_pmu and !enable_pmu. That
> makes the code paths clearer!
>
> Just like:
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index f68d5a057882..d728fb5eaec6 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -2041,44 +2041,42 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
> if (first) {
> first = false;
>
> - /*
> - * Since Linux v5.18, KVM provides a VM-level capability to easily
> - * disable PMUs; however, QEMU has been providing PMU property per
> - * CPU since v1.6. In order to accommodate both, have to configure
> - * the VM-level capability here.
> - *
> - * KVM_PMU_CAP_DISABLE doesn't change the PMU
> - * behavior on Intel platform because current "pmu" property works
> - * as expected.
> - */
> - if (pmu_cap) {
> - if ((pmu_cap & KVM_PMU_CAP_DISABLE) &&
> - !X86_CPU(cpu)->enable_pmu) {
> - ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
> - KVM_PMU_CAP_DISABLE);
> - if (ret < 0) {
> - error_setg_errno(errp, -ret,
> - "Failed to set KVM_PMU_CAP_DISABLE");
> - return ret;
> - }
> - }
> - } else {
> - /*
> - * KVM_CAP_PMU_CAPABILITY is introduced in Linux v5.18. For old
> - * linux, we have to check enable_pmu parameter for vPMU support.
> - */
> + if (X86_CPU(cpu)->enable_pmu) {
> g_autofree char *kvm_enable_pmu;
>
> /*
> - * The kvm.enable_pmu's permission is 0444. It does not change until
> - * a reload of the KVM module.
> + * The enable_pmu parameter is introduced since Linux v5.17,
> + * give a chance to provide more information about vPMU
> + * enablement.
> + *
> + * The kvm.enable_pmu's permission is 0444. It does not change
> + * until a reload of the KVM module.
> */
> if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
> &kvm_enable_pmu, NULL, NULL)) {
> - if (*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu) {
> - error_setg(errp, "Failed to enable PMU since "
> + if (*kvm_enable_pmu == 'N') {
> + warn_report("Failed to enable PMU since "
> "KVM's enable_pmu parameter is disabled");
> - return -EPERM;
> + }
> + }
> + } else {
> + /*
> + * Since Linux v5.18, KVM provides a VM-level capability to easily
> + * disable PMUs; however, QEMU has been providing PMU property per
> + * CPU since v1.6. In order to accommodate both, have to configure
> + * the VM-level capability here.
> + *
> + * KVM_PMU_CAP_DISABLE doesn't change the PMU
> + * behavior on Intel platform because current "pmu" property works
> + * as expected.
> + */
> + if ((pmu_cap & KVM_PMU_CAP_DISABLE)) {
> + ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
> + KVM_PMU_CAP_DISABLE);
> + if (ret < 0) {
> + error_setg_errno(errp, -ret,
> + "Failed to set KVM_PMU_CAP_DISABLE");
> + return ret;
> }
> }
> }
>
Thank you very much! I will split based on (enable_pmu) and (!enable_pmu)
following your suggestion.
Dongli Zhang
* Re: [PATCH v3 08/10] target/i386/kvm: reset AMD PMU registers during VM reset
2025-04-10 7:43 ` Zhao Liu
@ 2025-04-10 21:17 ` Dongli Zhang
0 siblings, 0 replies; 21+ messages in thread
From: Dongli Zhang @ 2025-04-10 21:17 UTC (permalink / raw)
To: Zhao Liu
Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
Hi Zhao,
On 4/10/25 12:43 AM, Zhao Liu wrote:
> ...
>
>> TODO:
>> - This patch adds is_host_compat_vendor(), while there are something
>> like is_host_cpu_intel() from target/i386/kvm/vmsr_energy.c. A rework
>> may help move those helpers to target/i386/cpu*.
>
> vmsr_energy emulates RAPL in user space...but RAPL is not architectural
> (no CPUID), so this case doesn't need to consider "compat" vendor.
>
>> target/i386/cpu.h | 8 ++
>> target/i386/kvm/kvm.c | 176 +++++++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 180 insertions(+), 4 deletions(-)
>
> ...
>
>> +static bool is_host_compat_vendor(CPUX86State *env)
>> +{
>> + char host_vendor[CPUID_VENDOR_SZ + 1];
>> + uint32_t host_cpuid_vendor1;
>> + uint32_t host_cpuid_vendor2;
>> + uint32_t host_cpuid_vendor3;
>>
>> + host_cpuid(0x0, 0, NULL, &host_cpuid_vendor1, &host_cpuid_vendor3,
>> + &host_cpuid_vendor2);
>> +
>> + x86_cpu_vendor_words2str(host_vendor, host_cpuid_vendor1,
>> + host_cpuid_vendor2, host_cpuid_vendor3);
>
> We can use host_cpu_vendor_fms() (with a little change). If you like
> this idea, pls feel free to pick my cleanup patch into your series.
Sure. I will try to use host_cpu_vendor_fms().
>
>> + /*
>> + * Intel and Zhaoxin are compatible.
>> + */
>> + if ((g_str_equal(host_vendor, CPUID_VENDOR_INTEL) ||
>> + g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN1) ||
>> + g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN2)) &&
>> + (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env))) {
>> + return true;
>> + }
>> +
>> + return env->cpuid_vendor1 == host_cpuid_vendor1 &&
>> + env->cpuid_vendor2 == host_cpuid_vendor2 &&
>> + env->cpuid_vendor3 == host_cpuid_vendor3;
>
> Checking AMD directly makes the "compat" rule clear:
>
> return g_str_equal(host_vendor, CPUID_VENDOR_AMD) &&
> IS_AMD_CPU(env);
Sure.
>
>> +}
>
> ...
>
>> if (env->mcg_cap) {
>> kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
>> kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
>> @@ -4871,6 +5024,21 @@ static int kvm_get_msrs(X86CPU *cpu)
>> case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
>> env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
>> break;
>> + case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + AMD64_NUM_COUNTERS - 1:
>> + env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
>> + break;
>> + case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + AMD64_NUM_COUNTERS - 1:
>> + env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
>> + break;
>> + case MSR_F15H_PERF_CTL0 ...
>> + MSR_F15H_PERF_CTL0 + AMD64_NUM_COUNTERS_CORE * 2 - 1:
>> + index = index - MSR_F15H_PERF_CTL0;
>> + if (index & 0x1) {
>> + env->msr_gp_counters[index] = msrs[i].data;
>> + } else {
>> + env->msr_gp_evtsel[index] = msrs[i].data;
>
> This msr_gp_evtsel[] array's size is 18:
>
> #define MAX_GP_COUNTERS (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
>
> This formula is based on Intel's MSRs; it's best to add a note that the
> current size also meets AMD's needs. (No need to adjust the size, as
> that would affect migration.)
I will add a comment to target/i386/cpu.h, above the definition of MAX_GP_COUNTERS.
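For example, the note could look something like this (my rough wording, not
the final comment; the definition itself stays unchanged):

/*
 * Sized from Intel's MSR layout (18 slots). This is also large enough
 * for AMD: it covers the legacy/core counters and the PerfMonV2
 * counters currently exposed by KVM. Do not shrink it, since the
 * array sizes derived from it are carried in the migration stream.
 */
#define MAX_GP_COUNTERS    (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)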
Thank you very much!
Dongli Zhang
* Re: [PATCH v3 09/10] target/i386/kvm: support perfmon-v2 for reset
2025-04-10 8:21 ` Zhao Liu
@ 2025-04-10 21:19 ` Dongli Zhang
0 siblings, 0 replies; 21+ messages in thread
From: Dongli Zhang @ 2025-04-10 21:19 UTC (permalink / raw)
To: Zhao Liu
Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
Hi Zhao,
On 4/10/25 1:21 AM, Zhao Liu wrote:
> On Sun, Mar 30, 2025 at 06:32:28PM -0700, Dongli Zhang wrote:
>> Date: Sun, 30 Mar 2025 18:32:28 -0700
>> From: Dongli Zhang <dongli.zhang@oracle.com>
>> Subject: [PATCH v3 09/10] target/i386/kvm: support perfmon-v2 for reset
>> X-Mailer: git-send-email 2.43.5
>>
>> Since perfmon-v2, the AMD PMU supports additional registers. This update
>> includes get/put functionality for these extra registers.
>>
>> Similar to the implementation in KVM:
>>
>> - MSR_CORE_PERF_GLOBAL_STATUS and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS both
>> use env->msr_global_status.
>> - MSR_CORE_PERF_GLOBAL_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_CTL both use
>> env->msr_global_ctrl.
>> - MSR_CORE_PERF_GLOBAL_OVF_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR
>> both use env->msr_global_ovf_ctrl.
>>
>> No changes are needed for vmstate_msr_architectural_pmu or
>> pmu_enable_needed().
>>
>> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
>> ---
>
> ...
>
>> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
>> index 3a35fd741d..f4532e6f2a 100644
>> --- a/target/i386/kvm/kvm.c
>> +++ b/target/i386/kvm/kvm.c
>> @@ -2149,6 +2149,16 @@ static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
>> }
>>
>> num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
>> +
>> + c = cpuid_find_entry(cpuid, 0x80000022, 0);
>> + if (c && (c->eax & CPUID_8000_0022_EAX_PERFMON_V2)) {
>> + pmu_version = 2;
>> + num_pmu_gp_counters = c->ebx & 0xf;
>> +
>> + if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
>> + num_pmu_gp_counters = MAX_GP_COUNTERS;
>
> OK! KVM now supports 6 GP counters (KVM_MAX_NR_AMD_GP_COUNTERS).
Thank you very much for the Reviewed-by. I assume MAX_GP_COUNTERS is still
acceptable to you here in the patch; it is just an upper-bound check.
Dongli Zhang
>
>> + }
>> + }
>> }
>
> Fine for me,
>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
>
* Re: [PATCH v3 10/10] target/i386/kvm: don't stop Intel PMU counters
2025-04-10 9:45 ` Zhao Liu
@ 2025-04-10 22:25 ` Dongli Zhang
0 siblings, 0 replies; 21+ messages in thread
From: Dongli Zhang @ 2025-04-10 22:25 UTC (permalink / raw)
To: Zhao Liu
Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
Hi Zhao,
On 4/10/25 2:45 AM, Zhao Liu wrote:
> On Sun, Mar 30, 2025 at 06:32:29PM -0700, Dongli Zhang wrote:
>> Date: Sun, 30 Mar 2025 18:32:29 -0700
>> From: Dongli Zhang <dongli.zhang@oracle.com>
>> Subject: [PATCH v3 10/10] target/i386/kvm: don't stop Intel PMU counters
>> X-Mailer: git-send-email 2.43.5
>>
>> The kvm_put_msrs() sets the MSRs using KVM_SET_MSRS. The x86 KVM processes
>> these MSRs one by one in a loop, only saving the config and triggering the
>> KVM_REQ_PMU request. This approach does not immediately stop the event
>> before updating PMC.
>
> This is true after KVM commit 68fb4757e867 (v6.2). QEMU even supports kernels
> as old as v4.5 (docs/system/target-i386.rst)... I'm not sure whether that is
> outdated, but it's better to mention the Linux version.
Thank you very much for the reminder.
I will:
1. Reorder the reasons and put the above at the end, because now "levels >=
KVM_PUT_RESET_STATE" and "exclude_host = 1" (Intel uses atomic MSR autoload,
while AMD appears to support a special guest mode) are more convincing.
2. Add the commit ID that you suggested to the last reason.
3. Add your Reviewed-by. Thank you very much!
Dongli Zhang
>
>> In addition, PMU MSRs are set only at levels >= KVM_PUT_RESET_STATE,
>> excluding runtime. Therefore, updating these MSRs without stopping events
>> should be acceptable.
>
> I agree.
>
>> Finally, KVM creates kernel perf events with host mode excluded
>> (exclude_host = 1). While the events remain active, they don't increment
>> the counter during QEMU vCPU userspace mode.
>>
>> No Fixes tag is going to be added for commit 0d89436786b0 ("kvm:
>> migrate vPMU state"), because this isn't a bugfix.
>>
>> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
>> ---
>> target/i386/kvm/kvm.c | 9 ---------
>> 1 file changed, 9 deletions(-)
>
> Fine for me,
>
> Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
>
* Re: [PATCH v3 07/10] target/i386/kvm: query kvm.enable_pmu parameter
2025-04-10 5:05 ` Zhao Liu
2025-04-10 20:16 ` Dongli Zhang
@ 2025-04-16 7:48 ` Dongli Zhang
1 sibling, 0 replies; 21+ messages in thread
From: Dongli Zhang @ 2025-04-16 7:48 UTC (permalink / raw)
To: Zhao Liu
Cc: qemu-devel, kvm, qemu-arm, qemu-ppc, qemu-riscv, qemu-s390x,
pbonzini, mtosatti, sandipan.das, babu.moger, likexu,
like.xu.linux, groug, khorenko, alexander.ivanov, den,
davydov-max, xiaoyao.li, dapeng1.mi, joe.jin, peter.maydell,
gaosong, chenhuacai, philmd, aurelien, jiaxun.yang, arikalo,
npiggin, danielhb413, palmer, alistair.francis, liwei1518,
zhiwei_liu, pasic, borntraeger, richard.henderson, david, iii,
thuth, flavra, ewanhai-oc, ewanhai, cobechen, louisqi, liamni,
frankzhu, silviazhao
Hi Zhao,
On 4/9/25 10:05 PM, Zhao Liu wrote:
> Hi Dongli,
>
> The logic looks fine to me :-) And thank you for taking my previous
> suggestion. Revisiting this after a few weeks, I have some
> thoughts:
>
>> + if (pmu_cap) {
>> + if ((pmu_cap & KVM_PMU_CAP_DISABLE) &&
>> + !X86_CPU(cpu)->enable_pmu) {
>> + ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
>> + KVM_PMU_CAP_DISABLE);
>> + if (ret < 0) {
>> + error_setg_errno(errp, -ret,
>> + "Failed to set KVM_PMU_CAP_DISABLE");
>> + return ret;
>> + }
>> + }
>
> This case enhances vPMU disablement.
>
>> + } else {
>> + /*
>> + * KVM_CAP_PMU_CAPABILITY is introduced in Linux v5.18. For old
>> + * linux, we have to check enable_pmu parameter for vPMU support.
>> + */
>> + g_autofree char *kvm_enable_pmu;
>> +
>> + /*
>> + * The kvm.enable_pmu's permission is 0444. It does not change until
>> + * a reload of the KVM module.
>> + */
>> + if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
>> + &kvm_enable_pmu, NULL, NULL)) {
>> + if (*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu) {
>> + error_setg(errp, "Failed to enable PMU since "
>> + "KVM's enable_pmu parameter is disabled");
>> + return -EPERM;
>> + }
>
> And this case checks whether vPMU can be enabled.
>
>> }
>> }
>> }
>
> So I feel it's not ideal to branch on pmu_cap; we can
> re-split it into these two cases: enable_pmu and !enable_pmu. That
> makes the code paths clearer!
>
> Just like:
>
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index f68d5a057882..d728fb5eaec6 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -2041,44 +2041,42 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
> if (first) {
> first = false;
>
> - /*
> - * Since Linux v5.18, KVM provides a VM-level capability to easily
> - * disable PMUs; however, QEMU has been providing PMU property per
> - * CPU since v1.6. In order to accommodate both, have to configure
> - * the VM-level capability here.
> - *
> - * KVM_PMU_CAP_DISABLE doesn't change the PMU
> - * behavior on Intel platform because current "pmu" property works
> - * as expected.
> - */
> - if (pmu_cap) {
> - if ((pmu_cap & KVM_PMU_CAP_DISABLE) &&
> - !X86_CPU(cpu)->enable_pmu) {
> - ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
> - KVM_PMU_CAP_DISABLE);
> - if (ret < 0) {
> - error_setg_errno(errp, -ret,
> - "Failed to set KVM_PMU_CAP_DISABLE");
> - return ret;
> - }
> - }
> - } else {
> - /*
> - * KVM_CAP_PMU_CAPABILITY is introduced in Linux v5.18. For old
> - * linux, we have to check enable_pmu parameter for vPMU support.
> - */
> + if (X86_CPU(cpu)->enable_pmu) {
> g_autofree char *kvm_enable_pmu;
>
> /*
> - * The kvm.enable_pmu's permission is 0444. It does not change until
> - * a reload of the KVM module.
> + * The enable_pmu parameter is introduced since Linux v5.17,
> + * give a chance to provide more information about vPMU
> + * enablement.
> + *
> + * The kvm.enable_pmu's permission is 0444. It does not change
> + * until a reload of the KVM module.
> */
> if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
> &kvm_enable_pmu, NULL, NULL)) {
> - if (*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu) {
> - error_setg(errp, "Failed to enable PMU since "
> + if (*kvm_enable_pmu == 'N') {
> + warn_report("Failed to enable PMU since "
> "KVM's enable_pmu parameter is disabled");
> - return -EPERM;
Based on the Q&A for the v4 patchset, since we are not going to exit with an
error (-EPERM), I will need to bring back the global variable from v2:
kvm_pmu_disabled.
https://lore.kernel.org/all/20250302220112.17653-8-dongli.zhang@oracle.com/
Because we don't exit with an error, I need kvm_pmu_disabled in PATCH 08 to
determine whether to skip the PMU info initialization, i.e. when:
- +pmu
- enable_pmu=N
In this case, we don't need to initialize pmu_version or num_pmu_gp_counters.
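A rough sketch of that direction (only the kvm_pmu_disabled name is taken from
the v2 posting; the helper name and placement below are illustrative):

static bool kvm_pmu_disabled;

/* Called from kvm_arch_pre_create_vcpu() when "+pmu" is requested. */
static void kvm_query_enable_pmu_param(void)
{
    g_autofree char *kvm_enable_pmu = NULL;

    if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
                            &kvm_enable_pmu, NULL, NULL) &&
        *kvm_enable_pmu == 'N') {
        /* Warn instead of failing, but remember the decision. */
        kvm_pmu_disabled = true;
        warn_report("Failed to enable PMU since "
                    "KVM's enable_pmu parameter is disabled");
    }
}

Then the PMU info setup in PATCH 08 can simply bail out early:

    if (!X86_CPU(cpu)->enable_pmu || kvm_pmu_disabled) {
        /* Leave pmu_version and num_pmu_gp_counters at 0. */
        return;
    }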
Thank you very much!
Dongli Zhang
Thread overview: 21+ messages
2025-03-31 1:32 [PATCH v3 00/10] target/i386/kvm/pmu: PMU Enhancement, Bugfix and Cleanup Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 01/10] target/i386: disable PerfMonV2 when PERFCORE unavailable Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 02/10] target/i386: disable PERFCORE when "-pmu" is configured Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 03/10] kvm: Introduce kvm_arch_pre_create_vcpu() Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 04/10] target/i386/kvm: set KVM_PMU_CAP_DISABLE if "-pmu" is configured Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 05/10] target/i386/kvm: extract unrelated code out of kvm_x86_build_cpuid() Dongli Zhang
2025-04-10 2:41 ` Zhao Liu
2025-03-31 1:32 ` [PATCH v3 06/10] target/i386/kvm: rename architectural PMU variables Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 07/10] target/i386/kvm: query kvm.enable_pmu parameter Dongli Zhang
2025-04-10 5:05 ` Zhao Liu
2025-04-10 20:16 ` Dongli Zhang
2025-04-16 7:48 ` Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 08/10] target/i386/kvm: reset AMD PMU registers during VM reset Dongli Zhang
2025-04-10 7:43 ` Zhao Liu
2025-04-10 21:17 ` Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 09/10] target/i386/kvm: support perfmon-v2 for reset Dongli Zhang
2025-04-10 8:21 ` Zhao Liu
2025-04-10 21:19 ` Dongli Zhang
2025-03-31 1:32 ` [PATCH v3 10/10] target/i386/kvm: don't stop Intel PMU counters Dongli Zhang
2025-04-10 9:45 ` Zhao Liu
2025-04-10 22:25 ` Dongli Zhang