public inbox for qemu-devel@nongnu.org
* [PATCH 0/7] target/i386: Misc PMU, PEBS, and MSR fixes and improvements
@ 2026-01-17  1:10 Zide Chen
  2026-01-17  1:10 ` [PATCH 1/7] target/i386: Disable unsupported BTS for guest Zide Chen
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Zide Chen @ 2026-01-17  1:10 UTC (permalink / raw)
  To: qemu-devel, kvm, Paolo Bonzini, Zhao Liu, Peter Xu, Fabiano Rosas
  Cc: xiaoyao.li, Dongli Zhang, Dapeng Mi, Zide Chen

This series contains a set of mostly independent fixes and small
improvements in target/i386 related to PMU, PEBS, and MSR handling.

The patches are grouped into a single series for review convenience;
they are not tightly coupled and can be applied individually.

Dapeng Mi (3):
  target/i386: Don't save/restore PERF_GLOBAL_OVF_CTRL MSR
  target/i386: Support full-width writes for perf counters
  target/i386: Save/Restore DS based PEBS specific MSRs

Zide Chen (4):
  target/i386: Disable unsupported BTS for guest
  target/i386: Gate enable_pmu on kvm_enabled()
  target/i386: Make some PEBS features user-visible
  target/i386: Increase MSR_BUF_SIZE and split KVM_[GET/SET]_MSRS calls

 target/i386/cpu.c     |  15 ++--
 target/i386/cpu.h     |  20 +++++-
 target/i386/kvm/kvm.c | 162 +++++++++++++++++++++++++++++++++++-------
 target/i386/machine.c |  35 +++++++--
 4 files changed, 191 insertions(+), 41 deletions(-)

-- 
2.52.0




* [PATCH 1/7] target/i386: Disable unsupported BTS for guest
  2026-01-17  1:10 [PATCH 0/7] target/i386: Misc PMU, PEBS, and MSR fixes and improvements Zide Chen
@ 2026-01-17  1:10 ` Zide Chen
  2026-01-19  1:47   ` Mi, Dapeng
  2026-01-17  1:10 ` [PATCH 2/7] target/i386: Don't save/restore PERF_GLOBAL_OVF_CTRL MSR Zide Chen
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Zide Chen @ 2026-01-17  1:10 UTC (permalink / raw)
  To: qemu-devel, kvm, Paolo Bonzini, Zhao Liu, Peter Xu, Fabiano Rosas
  Cc: xiaoyao.li, Dongli Zhang, Dapeng Mi, Zide Chen

BTS (Branch Trace Store), enumerated by IA32_MISC_ENABLE.BTS_UNAVAILABLE
(bit 11), is deprecated and has been superseded by LBR and Intel PT.

KVM has yielded control of this bit to userspace since KVM commit
9fc222967a39 ("KVM: x86: Give host userspace full control of
MSR_IA32_MISC_ENABLES").

However, QEMU does not set this bit, which allows guests to write the
BTS and BTINT bits in IA32_DEBUGCTL.  Since KVM doesn't support BTS,
this may lead to unexpected MSR access errors.
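
As a rough illustration of the resulting default value, here is a
standalone sketch (not QEMU code).  The macro names mirror the ones this
patch adds to cpu.h with the MSR_IA32_ prefix dropped, and the helper
bts_unavailable() is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Names mirror the macros added to target/i386/cpu.h by this patch. */
#define MISC_ENABLE_FASTSTRING   (1ULL << 0)
#define MISC_ENABLE_BTS_UNAVAIL  (1ULL << 11)
#define MISC_ENABLE_DEFAULT      (MISC_ENABLE_FASTSTRING | \
                                  MISC_ENABLE_BTS_UNAVAIL)

/* Bit 0 plus bit 11 gives 0x801 as the new reset value. */
static_assert(MISC_ENABLE_DEFAULT == 0x801, "unexpected default value");

/* Hypothetical helper: true when the MSR value reports BTS as unavailable. */
static bool bts_unavailable(uint64_t misc_enable)
{
    return (misc_enable & MISC_ENABLE_BTS_UNAVAIL) != 0;
}
```

With the old default of 1, bts_unavailable() would return false, which
is the case the paragraph above describes.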

Setting this bit does not introduce migration compatibility issues, so
the VMState version_id is not bumped.

Signed-off-by: Zide Chen <zide.chen@intel.com>
---
 target/i386/cpu.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 2bbc977d9088..f2b79a8bf1dc 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -474,7 +474,10 @@ typedef enum X86Seg {
 
 #define MSR_IA32_MISC_ENABLE            0x1a0
 /* Indicates good rep/movs microcode on some processors: */
-#define MSR_IA32_MISC_ENABLE_DEFAULT    1
+#define MSR_IA32_MISC_ENABLE_FASTSTRING    1
+#define MSR_IA32_MISC_ENABLE_BTS_UNAVAIL   (1ULL << 11)
+#define MSR_IA32_MISC_ENABLE_DEFAULT       (MSR_IA32_MISC_ENABLE_FASTSTRING     |\
+                                            MSR_IA32_MISC_ENABLE_BTS_UNAVAIL)
 #define MSR_IA32_MISC_ENABLE_MWAIT      (1ULL << 18)
 
 #define MSR_MTRRphysBase(reg)           (0x200 + 2 * (reg))
-- 
2.52.0




* [PATCH 2/7] target/i386: Don't save/restore PERF_GLOBAL_OVF_CTRL MSR
  2026-01-17  1:10 [PATCH 0/7] target/i386: Misc PMU, PEBS, and MSR fixes and improvements Zide Chen
  2026-01-17  1:10 ` [PATCH 1/7] target/i386: Disable unsupported BTS for guest Zide Chen
@ 2026-01-17  1:10 ` Zide Chen
  2026-01-17  1:10 ` [PATCH 3/7] target/i386: Gate enable_pmu on kvm_enabled() Zide Chen
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Zide Chen @ 2026-01-17  1:10 UTC (permalink / raw)
  To: qemu-devel, kvm, Paolo Bonzini, Zhao Liu, Peter Xu, Fabiano Rosas
  Cc: xiaoyao.li, Dongli Zhang, Dapeng Mi, Zide Chen

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

MSR_CORE_PERF_GLOBAL_OVF_CTRL is a write-only MSR and reads always
return zero.

Saving and restoring this MSR is therefore unnecessary.  Replace
VMSTATE_UINT64 with VMSTATE_UNUSED in the VMStateDescription to ignore
env.msr_global_ovf_ctrl during migration.  This avoids the need to bump
version_id and does not introduce any migration incompatibility.
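
The idea of keeping the slot while ignoring its contents can be sketched
outside QEMU as follows.  This is a simplified model under stated
assumptions: struct pmu_state, get_u64() and load_pmu_state() are
hypothetical, the stream is read in host byte order, and only one field
after the skipped slot is shown:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the PMU subsection payload. */
struct pmu_state {
    uint64_t fixed_ctr_ctrl;
    uint64_t global_ctrl;
    uint64_t global_status;
    uint64_t first_fixed_counter;
};

/* Read one 64-bit field from the stream and advance the offset. */
static uint64_t get_u64(const uint8_t *stream, size_t *off)
{
    uint64_t v;

    memcpy(&v, stream + *off, sizeof(v));
    *off += sizeof(v);
    return v;
}

/* Load a stream that still carries the 8-byte slot of the dropped field. */
static size_t load_pmu_state(const uint8_t *stream, struct pmu_state *s)
{
    size_t off = 0;

    assert(stream != NULL && s != NULL);
    s->fixed_ctr_ctrl = get_u64(stream, &off);
    s->global_ctrl = get_u64(stream, &off);
    s->global_status = get_u64(stream, &off);
    off += sizeof(uint64_t);    /* skipped slot, like VMSTATE_UNUSED(8) */
    s->first_fixed_counter = get_u64(stream, &off);
    return off;
}
```

Because the reader still consumes the same number of bytes, the fields
that follow the skipped slot stay at their original offsets.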

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
 target/i386/cpu.h     | 1 -
 target/i386/kvm/kvm.c | 6 ------
 target/i386/machine.c | 4 ++--
 3 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f2b79a8bf1dc..0b480c631ed0 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2086,7 +2086,6 @@ typedef struct CPUArchState {
     uint64_t msr_fixed_ctr_ctrl;
     uint64_t msr_global_ctrl;
     uint64_t msr_global_status;
-    uint64_t msr_global_ovf_ctrl;
     uint64_t msr_fixed_counters[MAX_FIXED_COUNTERS];
     uint64_t msr_gp_counters[MAX_GP_COUNTERS];
     uint64_t msr_gp_evtsel[MAX_GP_COUNTERS];
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 7b9b740a8e5a..cffbc90d1c50 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4069,8 +4069,6 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
             if (has_architectural_pmu_version > 1) {
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS,
                                   env->msr_global_status);
-                kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
-                                  env->msr_global_ovf_ctrl);
 
                 /* Now start the PMU.  */
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL,
@@ -4588,7 +4586,6 @@ static int kvm_get_msrs(X86CPU *cpu)
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS, 0);
-            kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0);
         }
         for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i, 0);
@@ -4917,9 +4914,6 @@ static int kvm_get_msrs(X86CPU *cpu)
         case MSR_CORE_PERF_GLOBAL_STATUS:
             env->msr_global_status = msrs[i].data;
             break;
-        case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
-            env->msr_global_ovf_ctrl = msrs[i].data;
-            break;
         case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR0 + MAX_FIXED_COUNTERS - 1:
             env->msr_fixed_counters[index - MSR_CORE_PERF_FIXED_CTR0] = msrs[i].data;
             break;
diff --git a/target/i386/machine.c b/target/i386/machine.c
index c9139612813b..1125c8a64ec5 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -666,7 +666,7 @@ static bool pmu_enable_needed(void *opaque)
     int i;
 
     if (env->msr_fixed_ctr_ctrl || env->msr_global_ctrl ||
-        env->msr_global_status || env->msr_global_ovf_ctrl) {
+        env->msr_global_status) {
         return true;
     }
     for (i = 0; i < MAX_FIXED_COUNTERS; i++) {
@@ -692,7 +692,7 @@ static const VMStateDescription vmstate_msr_architectural_pmu = {
         VMSTATE_UINT64(env.msr_fixed_ctr_ctrl, X86CPU),
         VMSTATE_UINT64(env.msr_global_ctrl, X86CPU),
         VMSTATE_UINT64(env.msr_global_status, X86CPU),
-        VMSTATE_UINT64(env.msr_global_ovf_ctrl, X86CPU),
+        VMSTATE_UNUSED(sizeof(uint64_t)),
         VMSTATE_UINT64_ARRAY(env.msr_fixed_counters, X86CPU, MAX_FIXED_COUNTERS),
         VMSTATE_UINT64_ARRAY(env.msr_gp_counters, X86CPU, MAX_GP_COUNTERS),
         VMSTATE_UINT64_ARRAY(env.msr_gp_evtsel, X86CPU, MAX_GP_COUNTERS),
-- 
2.52.0




* [PATCH 3/7] target/i386: Gate enable_pmu on kvm_enabled()
  2026-01-17  1:10 [PATCH 0/7] target/i386: Misc PMU, PEBS, and MSR fixes and improvements Zide Chen
  2026-01-17  1:10 ` [PATCH 1/7] target/i386: Disable unsupported BTS for guest Zide Chen
  2026-01-17  1:10 ` [PATCH 2/7] target/i386: Don't save/restore PERF_GLOBAL_OVF_CTRL MSR Zide Chen
@ 2026-01-17  1:10 ` Zide Chen
  2026-01-19  2:02   ` Mi, Dapeng
  2026-01-17  1:10 ` [PATCH 4/7] target/i386: Support full-width writes for perf counters Zide Chen
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Zide Chen @ 2026-01-17  1:10 UTC (permalink / raw)
  To: qemu-devel, kvm, Paolo Bonzini, Zhao Liu, Peter Xu, Fabiano Rosas
  Cc: xiaoyao.li, Dongli Zhang, Dapeng Mi, Zide Chen

Guest PMU support requires KVM.  Clear cpu->enable_pmu when KVM is not
enabled, so PMU-related code can rely solely on cpu->enable_pmu.

This reduces duplication and avoids bugs where one of the checks is
missed.  For example, cpu_x86_cpuid() enables CPUID.0AH whenever
cpu->enable_pmu is set, without checking kvm_enabled().  This patch
fixes that case implicitly, with no change to the check itself:

if (cpu->enable_pmu) {
	x86_cpu_get_supported_cpuid(0xA, count, eax, ebx, ecx, edx);
}

Also fix two places that check kvm_enabled() but not cpu->enable_pmu.

Signed-off-by: Zide Chen <zide.chen@intel.com>
---
 target/i386/cpu.c     | 9 ++++++---
 target/i386/kvm/kvm.c | 4 ++--
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 37803cd72490..f1ac98970d3e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -8671,7 +8671,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         *ecx = 0;
         *edx = 0;
         if (!(env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) ||
-            !kvm_enabled()) {
+            !cpu->enable_pmu) {
             break;
         }
 
@@ -9018,7 +9018,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
     case 0x80000022:
         *eax = *ebx = *ecx = *edx = 0;
         /* AMD Extended Performance Monitoring and Debug */
-        if (kvm_enabled() && cpu->enable_pmu &&
+        if (cpu->enable_pmu &&
             (env->features[FEAT_8000_0022_EAX] & CPUID_8000_0022_EAX_PERFMON_V2)) {
             *eax |= CPUID_8000_0022_EAX_PERFMON_V2;
             *ebx |= kvm_arch_get_supported_cpuid(cs->kvm_state, index, count,
@@ -9642,7 +9642,7 @@ static bool x86_cpu_filter_features(X86CPU *cpu, bool verbose)
      * are advertised by cpu_x86_cpuid().  Keep these two in sync.
      */
     if ((env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) &&
-        kvm_enabled()) {
+        cpu->enable_pmu) {
         x86_cpu_get_supported_cpuid(0x14, 0,
                                     &eax_0, &ebx_0, &ecx_0, &edx_0);
         x86_cpu_get_supported_cpuid(0x14, 1,
@@ -9790,6 +9790,9 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
     Error *local_err = NULL;
     unsigned requested_lbr_fmt;
 
+    if (!kvm_enabled()) {
+        cpu->enable_pmu = false;
+    }
 #if defined(CONFIG_TCG) && !defined(CONFIG_USER_ONLY)
     /* Use pc-relative instructions in system-mode */
     tcg_cflags_set(cs, CF_PCREL);
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index cffbc90d1c50..e81fa46ed66c 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4222,7 +4222,7 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
                               env->msr_xfd_err);
         }
 
-        if (kvm_enabled() && cpu->enable_pmu &&
+        if (cpu->enable_pmu &&
             (env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_ARCH_LBR)) {
             uint64_t depth;
             int ret;
@@ -4698,7 +4698,7 @@ static int kvm_get_msrs(X86CPU *cpu)
         kvm_msr_entry_add(cpu, MSR_IA32_XFD_ERR, 0);
     }
 
-    if (kvm_enabled() && cpu->enable_pmu &&
+    if (cpu->enable_pmu &&
         (env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_ARCH_LBR)) {
         uint64_t depth;
 
-- 
2.52.0




* [PATCH 4/7] target/i386: Support full-width writes for perf counters
  2026-01-17  1:10 [PATCH 0/7] target/i386: Misc PMU, PEBS, and MSR fixes and improvements Zide Chen
                   ` (2 preceding siblings ...)
  2026-01-17  1:10 ` [PATCH 3/7] target/i386: Gate enable_pmu on kvm_enabled() Zide Chen
@ 2026-01-17  1:10 ` Zide Chen
  2026-01-19  3:11   ` Mi, Dapeng
  2026-01-17  1:10 ` [PATCH 5/7] target/i386: Save/Restore DS based PEBS specific MSRs Zide Chen
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Zide Chen @ 2026-01-17  1:10 UTC (permalink / raw)
  To: qemu-devel, kvm, Paolo Bonzini, Zhao Liu, Peter Xu, Fabiano Rosas
  Cc: xiaoyao.li, Dongli Zhang, Dapeng Mi, Zide Chen

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

If IA32_PERF_CAPABILITIES.FW_WRITE (bit 13) is set, each general-purpose
counter IA32_PMCi (starting at 0xc1) is accompanied by a corresponding
alias MSR, starting at 0x4c1 (IA32_A_PMC0).  These alias MSRs are
64 bits wide.

The legacy IA32_PMCi MSRs are not full-width and their effective width
is determined by CPUID.0AH:EAX[23:16].
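
For reference, the CPUID.0AH:EAX fields mentioned here can be decoded as
in the sketch below.  The helper names are hypothetical and the code is
standalone rather than taken from QEMU; only the bit positions come from
the architectural definition of CPUID leaf 0AH:

```c
#include <assert.h>
#include <stdint.h>

/* CPUID.0AH:EAX[15:8]: number of general-purpose counters per logical CPU. */
static unsigned gp_counter_count(uint32_t cpuid_0a_eax)
{
    return (cpuid_0a_eax >> 8) & 0xff;
}

/* CPUID.0AH:EAX[23:16]: bit width of the general-purpose counters. */
static unsigned gp_counter_width(uint32_t cpuid_0a_eax)
{
    return (cpuid_0a_eax >> 16) & 0xff;
}

/* Mask covering the valid counter bits for a given width (1..64). */
static uint64_t gp_counter_mask(unsigned width)
{
    assert(width >= 1 && width <= 64);
    return width == 64 ? UINT64_MAX : (1ULL << width) - 1;
}
```

For an EAX value of 0x07300804, this decodes to 8 counters that are
48 bits wide.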

Since these two sets of MSRs are aliases, when IA32_A_PMCi is supported
it is safe to use it for save/restore instead of the legacy MSRs,
regardless of whether the hypervisor uses the legacy or the 64-bit
counterpart.

Full-width write is a user-visible feature and can be disabled
individually.

Reduce MAX_GP_COUNTERS from 18 to 15 to avoid conflicts between the
full-width MSR range and MSR_MCG_EXT_CTL.  Current CPUs support at most
10 general-purpose counters, so 15 is sufficient for now and leaves room
for future expansion.
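
The overlap can be checked with a little arithmetic.  In the sketch
below, the values of MSR_P6_EVNTSEL0, MSR_IA32_PERF_STATUS and
MSR_MCG_EXT_CTL are assumptions taken from QEMU's cpu.h and the Intel
MSR list rather than from this patch, and last_alias_msr() is a
hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

#define MSR_P6_EVNTSEL0      0x186   /* assumed value */
#define MSR_IA32_PERF_STATUS 0x198   /* assumed value */
#define MSR_IA32_PMC0        0x4c1   /* alias base added by this patch */
#define MSR_MCG_EXT_CTL      0x4d0   /* assumed value */

#define OLD_MAX_GP_COUNTERS  (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
#define NEW_MAX_GP_COUNTERS  15

/* Last MSR index covered by a full-width range of n counters. */
static uint32_t last_alias_msr(unsigned n)
{
    assert(n > 0);
    return MSR_IA32_PMC0 + n - 1;
}
```

With the old limit of 18 the alias range would end at 0x4d2 and cover
0x4d0; with 15 it ends at 0x4cf, just below MSR_MCG_EXT_CTL.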

Bump minimum_version_id to avoid migration from older QEMU, as this may
otherwise cause VMState overflow. This also requires bumping version_id,
which prevents migration to older QEMU as well.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
 target/i386/cpu.h     |  5 ++++-
 target/i386/kvm/kvm.c | 19 +++++++++++++++++--
 target/i386/machine.c |  4 ++--
 3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 0b480c631ed0..e7cf4a7bd594 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -421,6 +421,7 @@ typedef enum X86Seg {
 
 #define MSR_IA32_PERF_CAPABILITIES      0x345
 #define PERF_CAP_LBR_FMT                0x3f
+#define PERF_CAP_FULL_WRITE             (1U << 13)
 
 #define MSR_IA32_TSX_CTRL		0x122
 #define MSR_IA32_TSCDEADLINE            0x6e0
@@ -448,6 +449,8 @@ typedef enum X86Seg {
 #define MSR_IA32_SGXLEPUBKEYHASH3       0x8f
 
 #define MSR_P6_PERFCTR0                 0xc1
+/* Alternative perfctr range with full access. */
+#define MSR_IA32_PMC0                   0x4c1
 
 #define MSR_IA32_SMBASE                 0x9e
 #define MSR_SMI_COUNT                   0x34
@@ -1740,7 +1743,7 @@ typedef struct {
 #endif
 
 #define MAX_FIXED_COUNTERS 3
-#define MAX_GP_COUNTERS    (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
+#define MAX_GP_COUNTERS    15
 
 #define NB_OPMASK_REGS 8
 
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index e81fa46ed66c..530f50e4b218 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4049,6 +4049,12 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
         }
 
         if (has_architectural_pmu_version > 0) {
+            uint32_t perf_cntr_base = MSR_P6_PERFCTR0;
+
+            if (env->features[FEAT_PERF_CAPABILITIES] & PERF_CAP_FULL_WRITE) {
+                perf_cntr_base = MSR_IA32_PMC0;
+            }
+
             if (has_architectural_pmu_version > 1) {
                 /* Stop the counter.  */
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
@@ -4061,7 +4067,7 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
                                   env->msr_fixed_counters[i]);
             }
             for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
-                kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i,
+                kvm_msr_entry_add(cpu, perf_cntr_base + i,
                                   env->msr_gp_counters[i]);
                 kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i,
                                   env->msr_gp_evtsel[i]);
@@ -4582,6 +4588,12 @@ static int kvm_get_msrs(X86CPU *cpu)
         kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
     }
     if (has_architectural_pmu_version > 0) {
+        uint32_t perf_cntr_base = MSR_P6_PERFCTR0;
+
+        if (env->features[FEAT_PERF_CAPABILITIES] & PERF_CAP_FULL_WRITE) {
+            perf_cntr_base = MSR_IA32_PMC0;
+        }
+
         if (has_architectural_pmu_version > 1) {
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
@@ -4591,7 +4603,7 @@ static int kvm_get_msrs(X86CPU *cpu)
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i, 0);
         }
         for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
-            kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i, 0);
+            kvm_msr_entry_add(cpu, perf_cntr_base + i, 0);
             kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i, 0);
         }
     }
@@ -4920,6 +4932,9 @@ static int kvm_get_msrs(X86CPU *cpu)
         case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR0 + MAX_GP_COUNTERS - 1:
             env->msr_gp_counters[index - MSR_P6_PERFCTR0] = msrs[i].data;
             break;
+        case MSR_IA32_PMC0 ... MSR_IA32_PMC0 + MAX_GP_COUNTERS - 1:
+            env->msr_gp_counters[index - MSR_IA32_PMC0] = msrs[i].data;
+            break;
         case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
             env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
             break;
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 1125c8a64ec5..7d08a05835fc 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -685,8 +685,8 @@ static bool pmu_enable_needed(void *opaque)
 
 static const VMStateDescription vmstate_msr_architectural_pmu = {
     .name = "cpu/msr_architectural_pmu",
-    .version_id = 1,
-    .minimum_version_id = 1,
+    .version_id = 2,
+    .minimum_version_id = 2,
     .needed = pmu_enable_needed,
     .fields = (const VMStateField[]) {
         VMSTATE_UINT64(env.msr_fixed_ctr_ctrl, X86CPU),
-- 
2.52.0




* [PATCH 5/7] target/i386: Save/Restore DS based PEBS specific MSRs
  2026-01-17  1:10 [PATCH 0/7] target/i386: Misc PMU, PEBS, and MSR fixes and improvements Zide Chen
                   ` (3 preceding siblings ...)
  2026-01-17  1:10 ` [PATCH 4/7] target/i386: Support full-width writes for perf counters Zide Chen
@ 2026-01-17  1:10 ` Zide Chen
  2026-01-17  1:10 ` [PATCH 6/7] target/i386: Make some PEBS features user-visible Zide Chen
  2026-01-17  1:10 ` [PATCH 7/7] target/i386: Increase MSR_BUF_SIZE and split KVM_[GET/SET]_MSRS calls Zide Chen
  6 siblings, 0 replies; 17+ messages in thread
From: Zide Chen @ 2026-01-17  1:10 UTC (permalink / raw)
  To: qemu-devel, kvm, Paolo Bonzini, Zhao Liu, Peter Xu, Fabiano Rosas
  Cc: xiaoyao.li, Dongli Zhang, Dapeng Mi, Zide Chen

From: Dapeng Mi <dapeng1.mi@linux.intel.com>

DS-based PEBS introduces three MSRs: MSR_IA32_DS_AREA, MSR_PEBS_DATA_CFG,
and MSR_IA32_PEBS_ENABLE. Save and restore these MSRs when legacy DS
PEBS is enabled.

Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
 target/i386/cpu.h     |  9 +++++++++
 target/i386/kvm/kvm.c | 25 +++++++++++++++++++++++++
 target/i386/machine.c | 27 ++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index e7cf4a7bd594..dc5b477be283 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -422,6 +422,7 @@ typedef enum X86Seg {
 #define MSR_IA32_PERF_CAPABILITIES      0x345
 #define PERF_CAP_LBR_FMT                0x3f
 #define PERF_CAP_FULL_WRITE             (1U << 13)
+#define PERF_CAP_PEBS_BASELINE          (1U << 14)
 
 #define MSR_IA32_TSX_CTRL		0x122
 #define MSR_IA32_TSCDEADLINE            0x6e0
@@ -512,6 +513,11 @@ typedef enum X86Seg {
 #define MSR_CORE_PERF_GLOBAL_CTRL       0x38f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
 
+/* Legacy DS based PEBS MSRs */
+#define MSR_IA32_PEBS_ENABLE            0x3f1
+#define MSR_PEBS_DATA_CFG               0x3f2
+#define MSR_IA32_DS_AREA                0x600
+
 #define MSR_MC0_CTL                     0x400
 #define MSR_MC0_STATUS                  0x401
 #define MSR_MC0_ADDR                    0x402
@@ -2089,6 +2095,9 @@ typedef struct CPUArchState {
     uint64_t msr_fixed_ctr_ctrl;
     uint64_t msr_global_ctrl;
     uint64_t msr_global_status;
+    uint64_t msr_ds_area;
+    uint64_t msr_pebs_data_cfg;
+    uint64_t msr_pebs_enable;
     uint64_t msr_fixed_counters[MAX_FIXED_COUNTERS];
     uint64_t msr_gp_counters[MAX_GP_COUNTERS];
     uint64_t msr_gp_evtsel[MAX_GP_COUNTERS];
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 530f50e4b218..80974114a173 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4061,6 +4061,15 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
             }
 
+            if (env->features[FEAT_1_EDX] & CPUID_DTS) {
+                kvm_msr_entry_add(cpu, MSR_IA32_DS_AREA, env->msr_ds_area);
+            }
+
+            if (env->features[FEAT_PERF_CAPABILITIES] & PERF_CAP_PEBS_BASELINE) {
+                kvm_msr_entry_add(cpu, MSR_IA32_PEBS_ENABLE, env->msr_pebs_enable);
+                kvm_msr_entry_add(cpu, MSR_PEBS_DATA_CFG, env->msr_pebs_data_cfg);
+            }
+
             /* Set the counter values.  */
             for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
                 kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
@@ -4606,6 +4615,13 @@ static int kvm_get_msrs(X86CPU *cpu)
             kvm_msr_entry_add(cpu, perf_cntr_base + i, 0);
             kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i, 0);
         }
+        if (env->features[FEAT_1_EDX] & CPUID_DTS) {
+            kvm_msr_entry_add(cpu, MSR_IA32_DS_AREA, 0);
+        }
+        if (env->features[FEAT_PERF_CAPABILITIES] & PERF_CAP_PEBS_BASELINE) {
+            kvm_msr_entry_add(cpu, MSR_IA32_PEBS_ENABLE, 0);
+            kvm_msr_entry_add(cpu, MSR_PEBS_DATA_CFG, 0);
+        }
     }
 
     if (env->mcg_cap) {
@@ -4938,6 +4954,15 @@ static int kvm_get_msrs(X86CPU *cpu)
         case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
             env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
             break;
+        case MSR_IA32_DS_AREA:
+            env->msr_ds_area = msrs[i].data;
+            break;
+        case MSR_PEBS_DATA_CFG:
+            env->msr_pebs_data_cfg = msrs[i].data;
+            break;
+        case MSR_IA32_PEBS_ENABLE:
+            env->msr_pebs_enable = msrs[i].data;
+            break;
         case HV_X64_MSR_HYPERCALL:
             env->msr_hv_hypercall = msrs[i].data;
             break;
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 7d08a05835fc..7f45db1247b1 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -659,6 +659,27 @@ static const VMStateDescription vmstate_msr_ia32_feature_control = {
     }
 };
 
+static bool ds_pebs_enabled(void *opaque)
+{
+    X86CPU *cpu = opaque;
+    CPUX86State *env = &cpu->env;
+
+    return (env->msr_ds_area || env->msr_pebs_enable ||
+            env->msr_pebs_data_cfg);
+}
+
+static const VMStateDescription vmstate_msr_ds_pebs = {
+    .name = "cpu/msr_ds_pebs",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = ds_pebs_enabled,
+    .fields = (const VMStateField[]){
+        VMSTATE_UINT64(env.msr_ds_area, X86CPU),
+        VMSTATE_UINT64(env.msr_pebs_data_cfg, X86CPU),
+        VMSTATE_UINT64(env.msr_pebs_enable, X86CPU),
+        VMSTATE_END_OF_LIST()}
+};
+
 static bool pmu_enable_needed(void *opaque)
 {
     X86CPU *cpu = opaque;
@@ -697,7 +718,11 @@ static const VMStateDescription vmstate_msr_architectural_pmu = {
         VMSTATE_UINT64_ARRAY(env.msr_gp_counters, X86CPU, MAX_GP_COUNTERS),
         VMSTATE_UINT64_ARRAY(env.msr_gp_evtsel, X86CPU, MAX_GP_COUNTERS),
         VMSTATE_END_OF_LIST()
-    }
+    },
+    .subsections = (const VMStateDescription * const []) {
+        &vmstate_msr_ds_pebs,
+        NULL,
+    },
 };
 
 static bool mpx_needed(void *opaque)
-- 
2.52.0




* [PATCH 6/7] target/i386: Make some PEBS features user-visible
  2026-01-17  1:10 [PATCH 0/7] target/i386: Misc PMU, PEBS, and MSR fixes and improvements Zide Chen
                   ` (4 preceding siblings ...)
  2026-01-17  1:10 ` [PATCH 5/7] target/i386: Save/Restore DS based PEBS specific MSRs Zide Chen
@ 2026-01-17  1:10 ` Zide Chen
  2026-01-19  3:30   ` Mi, Dapeng
  2026-01-17  1:10 ` [PATCH 7/7] target/i386: Increase MSR_BUF_SIZE and split KVM_[GET/SET]_MSRS calls Zide Chen
  6 siblings, 1 reply; 17+ messages in thread
From: Zide Chen @ 2026-01-17  1:10 UTC (permalink / raw)
  To: qemu-devel, kvm, Paolo Bonzini, Zhao Liu, Peter Xu, Fabiano Rosas
  Cc: xiaoyao.li, Dongli Zhang, Dapeng Mi, Zide Chen

Populate selected PEBS feature names in FEAT_PERF_CAPABILITIES to make
the corresponding bits user-visible CPU feature knobs, allowing them to
be explicitly enabled or disabled via -cpu +/-<feature>.

Once named, these bits become part of the guest CPU configuration
contract.  If a VM is configured with such a feature enabled, migration
to a destination that does not support the feature may fail, as the
destination cannot honor the guest-visible CPU model.

The PEBS_FMT bits are intentionally not exposed. They are not meaningful
as user-visible features, and QEMU registers CPU features as boolean
QOM properties, which makes them unsuitable for representing and
checking numeric capabilities.
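
The mapping between the named features and their bit positions can be
summarized as in the sketch below.  The bit indices follow the
feat_names table in this patch; the array perf_cap_names and the lookup
helper perf_cap_bit() are hypothetical and are not how QEMU resolves
feature names internally:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit positions follow the feat_names table in this patch. */
static const char *const perf_cap_names[32] = {
    [6]  = "pebs-trap",
    [7]  = "pebs-arch-reg",
    [13] = "full-width-write",
    [14] = "pebs-baseline",
    [17] = "pebs-timing-info",
};

/* Hypothetical lookup: return the bit index for a name, or -1 if unnamed. */
static int perf_cap_bit(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < 32; i++) {
        if (perf_cap_names[i] && strcmp(perf_cap_names[i], name) == 0) {
            return i;
        }
    }
    return -1;
}
```

Unnamed bits, such as the PEBS_FMT field, stay NULL in the table and
therefore cannot be toggled by name.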

Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
 target/i386/cpu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index f1ac98970d3e..fc6a64287415 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1618,10 +1618,10 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
         .type = MSR_FEATURE_WORD,
         .feat_names = {
             NULL, NULL, NULL, NULL,
+            NULL, NULL, "pebs-trap", "pebs-arch-reg",
             NULL, NULL, NULL, NULL,
-            NULL, NULL, NULL, NULL,
-            NULL, "full-width-write", NULL, NULL,
-            NULL, NULL, NULL, NULL,
+            NULL, "full-width-write", "pebs-baseline", NULL,
+            NULL, "pebs-timing-info", NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
             NULL, NULL, NULL, NULL,
-- 
2.52.0




* [PATCH 7/7] target/i386: Increase MSR_BUF_SIZE and split KVM_[GET/SET]_MSRS calls
  2026-01-17  1:10 [PATCH 0/7] target/i386: Misc PMU, PEBS, and MSR fixes and improvements Zide Chen
                   ` (5 preceding siblings ...)
  2026-01-17  1:10 ` [PATCH 6/7] target/i386: Make some PEBS features user-visible Zide Chen
@ 2026-01-17  1:10 ` Zide Chen
  6 siblings, 0 replies; 17+ messages in thread
From: Zide Chen @ 2026-01-17  1:10 UTC (permalink / raw)
  To: qemu-devel, kvm, Paolo Bonzini, Zhao Liu, Peter Xu, Fabiano Rosas
  Cc: xiaoyao.li, Dongli Zhang, Dapeng Mi, Zide Chen

Newer Intel server CPUs support a large number of PMU MSRs.  Currently,
QEMU allocates cpu->kvm_msr_buf as a single-page buffer, which is not
sufficient to hold all possible MSRs.

Increase MSR_BUF_SIZE to 8192 bytes, providing space for up to 511 MSRs.
This is sufficient even for the theoretical worst case, such as
architectural LBR with a depth of 64.
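
The capacity figures follow from the struct sizes.  The sketch below
uses local stand-ins that are assumed to mirror the layout of struct
kvm_msrs (8-byte header) and struct kvm_msr_entry (16 bytes); the names
msr_list, msr_entry and msr_buf_capacity() are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins mirroring the layout of the KVM UAPI structs (assumed). */
struct msr_entry {
    uint32_t index;
    uint32_t reserved;
    uint64_t data;
};

struct msr_list {
    uint32_t nmsrs;
    uint32_t pad;
    struct msr_entry entries[];
};

/* Number of entries that fit in a buffer of buf_size bytes. */
static size_t msr_buf_capacity(size_t buf_size)
{
    assert(buf_size >= sizeof(struct msr_list));
    return (buf_size - sizeof(struct msr_list)) / sizeof(struct msr_entry);
}
```

A 4096-byte buffer yields (4096 - 8) / 16 = 255 entries, and an
8192-byte buffer yields (8192 - 8) / 16 = 511 entries.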

KVM_[GET/SET]_MSRS is limited to 255 MSRs per call.  Raising this limit
to 511 would require changes in KVM and would introduce backward
compatibility issues.  Instead, split requests into multiple
KVM_[GET/SET]_MSRS calls when the number of MSRs exceeds the API limit.
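
The splitting logic can be sketched independently of KVM as below.  The
callback type msr_batch_fn stands in for one KVM_[GET/SET]_MSRS ioctl,
and for_each_msr_batch() and count_entries() are hypothetical names used
only for illustration; the actual patch copies entries through a
temporary buffer instead of using a callback:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MSRS_PER_CALL 255   /* per-call limit stated in this patch */

/* Hypothetical callback standing in for one KVM_[GET/SET]_MSRS ioctl. */
typedef int (*msr_batch_fn)(size_t first, size_t count, void *opaque);

/*
 * Issue the callback once per batch of at most MAX_MSRS_PER_CALL entries.
 * Returns the number of calls made, or the first negative error.
 */
static int for_each_msr_batch(size_t total, msr_batch_fn fn, void *opaque)
{
    size_t done = 0;
    int calls = 0;

    assert(fn != NULL);
    while (done < total) {
        size_t left = total - done;
        size_t count = left > MAX_MSRS_PER_CALL ? MAX_MSRS_PER_CALL : left;
        int ret = fn(done, count, opaque);

        if (ret < 0) {
            return ret;
        }
        done += count;
        calls++;
    }
    return calls;
}

/* Example callback: accumulates how many entries were processed. */
static int count_entries(size_t first, size_t count, void *opaque)
{
    (void)first;
    *(size_t *)opaque += count;
    return 0;
}
```

A full 511-entry buffer is then handled in three calls of 255, 255 and
1 entries.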

Signed-off-by: Zide Chen <zide.chen@intel.com>
---
 target/i386/kvm/kvm.c | 109 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 92 insertions(+), 17 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 80974114a173..a72e4d60dfa2 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -98,9 +98,12 @@
 #define KVM_APIC_BUS_CYCLE_NS       1
 #define KVM_APIC_BUS_FREQUENCY      (1000000000ULL / KVM_APIC_BUS_CYCLE_NS)
 
-/* A 4096-byte buffer can hold the 8-byte kvm_msrs header, plus
- * 255 kvm_msr_entry structs */
-#define MSR_BUF_SIZE 4096
+/* An 8192-byte buffer can hold the 8-byte kvm_msrs header, plus
+ * 511 kvm_msr_entry structs */
+#define MSR_BUF_SIZE      8192
+
+/* Maximum number of MSRs in one single KVM_[GET/SET]_MSRS call. */
+#define KVM_MAX_IO_MSRS   255
 
 typedef bool QEMURDMSRHandler(X86CPU *cpu, uint32_t msr, uint64_t *val);
 typedef bool QEMUWRMSRHandler(X86CPU *cpu, uint32_t msr, uint64_t val);
@@ -3878,23 +3881,102 @@ static void kvm_msr_entry_add_perf(X86CPU *cpu, FeatureWordArray f)
     }
 }
 
-static int kvm_buf_set_msrs(X86CPU *cpu)
+static int __kvm_buf_set_msrs(X86CPU *cpu, struct kvm_msrs *msrs)
 {
-    int ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, cpu->kvm_msr_buf);
+    int ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, msrs);
     if (ret < 0) {
         return ret;
     }
 
-    if (ret < cpu->kvm_msr_buf->nmsrs) {
-        struct kvm_msr_entry *e = &cpu->kvm_msr_buf->entries[ret];
+    if (ret < msrs->nmsrs) {
+        struct kvm_msr_entry *e = &msrs->entries[ret];
         error_report("error: failed to set MSR 0x%" PRIx32 " to 0x%" PRIx64,
                      (uint32_t)e->index, (uint64_t)e->data);
     }
 
-    assert(ret == cpu->kvm_msr_buf->nmsrs);
+    assert(ret == msrs->nmsrs);
+    return ret;
+}
+
+static int __kvm_buf_get_msrs(X86CPU *cpu, struct kvm_msrs *msrs)
+{
+    int ret;
+
+    ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MSRS, msrs);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (ret < msrs->nmsrs) {
+        struct kvm_msr_entry *e = &msrs->entries[ret];
+        error_report("error: failed to get MSR 0x%" PRIx32,
+                     (uint32_t)e->index);
+    }
+
+    assert(ret == msrs->nmsrs);
+    return ret;
+}
+
+static int kvm_buf_set_or_get_msrs(X86CPU *cpu, bool is_write)
+{
+    struct kvm_msr_entry *entries = cpu->kvm_msr_buf->entries;
+    struct kvm_msrs *buf = NULL;
+    int current, remaining, ret = 0;
+    size_t buf_size;
+
+    buf_size = KVM_MAX_IO_MSRS * sizeof(struct kvm_msr_entry) +
+               sizeof(struct kvm_msrs);
+    buf = g_malloc(buf_size);
+
+    remaining = cpu->kvm_msr_buf->nmsrs;
+    current = 0;
+    while (remaining) {
+        size_t size;
+
+        memset(buf, 0, buf_size);
+
+        if (remaining > KVM_MAX_IO_MSRS) {
+            buf->nmsrs = KVM_MAX_IO_MSRS;
+        } else {
+            buf->nmsrs = remaining;
+        }
+
+        size = buf->nmsrs * sizeof(entries[0]);
+        memcpy(buf->entries, &entries[current], size);
+
+        if (is_write) {
+            ret = __kvm_buf_set_msrs(cpu, buf);
+        } else {
+            ret = __kvm_buf_get_msrs(cpu, buf);
+        }
+
+        if (ret < 0) {
+            goto out;
+        }
+
+        if (!is_write) {
+            memcpy(&entries[current], buf->entries, size);
+        }
+        current += buf->nmsrs;
+        remaining -= buf->nmsrs;
+    }
+
+out:
+    g_free(buf);
+    return ret < 0 ? ret : cpu->kvm_msr_buf->nmsrs;
+}
+
+static int kvm_buf_set_msrs(X86CPU *cpu)
+{
+    kvm_buf_set_or_get_msrs(cpu, true);
     return 0;
 }
 
+static int kvm_buf_get_msrs(X86CPU *cpu)
+{
+    return kvm_buf_set_or_get_msrs(cpu, false);
+}
+
 static void kvm_init_msrs(X86CPU *cpu)
 {
     CPUX86State *env = &cpu->env;
@@ -3928,7 +4010,7 @@ static void kvm_init_msrs(X86CPU *cpu)
     if (has_msr_ucode_rev) {
         kvm_msr_entry_add(cpu, MSR_IA32_UCODE_REV, cpu->ucode_rev);
     }
-    assert(kvm_buf_set_msrs(cpu) == 0);
+    kvm_buf_set_msrs(cpu);
 }
 
 static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
@@ -4762,18 +4844,11 @@ static int kvm_get_msrs(X86CPU *cpu)
         }
     }
 
-    ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MSRS, cpu->kvm_msr_buf);
+    ret = kvm_buf_get_msrs(cpu);
     if (ret < 0) {
         return ret;
     }
 
-    if (ret < cpu->kvm_msr_buf->nmsrs) {
-        struct kvm_msr_entry *e = &cpu->kvm_msr_buf->entries[ret];
-        error_report("error: failed to get MSR 0x%" PRIx32,
-                     (uint32_t)e->index);
-    }
-
-    assert(ret == cpu->kvm_msr_buf->nmsrs);
     /*
      * MTRR masks: Each mask consists of 5 parts
      * a  10..0: must be zero
-- 
2.52.0
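
The chunking done by kvm_buf_set_or_get_msrs() in the patch above can be looked at in isolation. What follows is a minimal standalone sketch, not QEMU code: `MAX_BATCH`, `struct entry`, `process_batch()` and `process_in_batches()` are hypothetical stand-ins for KVM_MAX_IO_MSRS, struct kvm_msr_entry, the KVM_GET_MSRS ioctl and the helper itself. It only illustrates slicing a long entry list into bounded batches and copying the results back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BATCH 4   /* hypothetical stand-in for KVM_MAX_IO_MSRS */

struct entry {        /* hypothetical stand-in for struct kvm_msr_entry */
    uint32_t index;
    uint64_t data;
};

/*
 * Hypothetical backend: "reads" each entry by storing index * 2 in data.
 * Returns the number of entries processed, or a negative error code.
 */
static int process_batch(struct entry *batch, int n)
{
    assert(n > 0 && n <= MAX_BATCH);
    for (int i = 0; i < n; i++) {
        batch[i].data = (uint64_t)batch[i].index * 2;
    }
    return n;
}

/*
 * Walk entries[0..total) in slices of at most MAX_BATCH, copying each
 * slice into a bounce buffer and the results back out.  Returns total
 * on success or the first negative error; *calls counts backend calls.
 */
static int process_in_batches(struct entry *entries, int total, int *calls)
{
    struct entry batch[MAX_BATCH];
    int current = 0;

    *calls = 0;
    while (current < total) {
        int n = total - current;
        int ret;

        if (n > MAX_BATCH) {
            n = MAX_BATCH;
        }
        memcpy(batch, &entries[current], (size_t)n * sizeof(batch[0]));

        ret = process_batch(batch, n);
        (*calls)++;
        if (ret < 0) {
            return ret;
        }

        memcpy(&entries[current], batch, (size_t)n * sizeof(batch[0]));
        current += n;
    }
    return total;
}
```

With ten entries and a batch limit of four, the loop issues three backend calls (4 + 4 + 2). The caller sees the same result as one large call would give.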




* Re: [PATCH 1/7] target/i386: Disable unsupported BTS for guest
  2026-01-17  1:10 ` [PATCH 1/7] target/i386: Disable unsupported BTS for guest Zide Chen
@ 2026-01-19  1:47   ` Mi, Dapeng
  2026-01-20 18:09     ` Chen, Zide
  0 siblings, 1 reply; 17+ messages in thread
From: Mi, Dapeng @ 2026-01-19  1:47 UTC (permalink / raw)
  To: Zide Chen, qemu-devel, kvm, Paolo Bonzini, Zhao Liu, Peter Xu,
	Fabiano Rosas
  Cc: xiaoyao.li, Dongli Zhang


On 1/17/2026 9:10 AM, Zide Chen wrote:
> BTS (Branch Trace Store), enumerated by IA32_MISC_ENABLE.BTS_UNAVAILABLE
> (bit 11), is deprecated and has been superseded by LBR and Intel PT.
>
> KVM yields control of the above mentioned bit to userspace since KVM
> commit 9fc222967a39 ("KVM: x86: Give host userspace full control of
> MSR_IA32_MISC_ENABLES").
>
> However, QEMU does not set this bit, which allows guests to write the
> BTS and BTINT bits in IA32_DEBUGCTL.  Since KVM doesn't support BTS,
> this may lead to unexpected MSR access errors.
>
> Setting this bit does not introduce migration compatibility issues, so
> the VMState version_id is not bumped.
>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
>  target/i386/cpu.h | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 2bbc977d9088..f2b79a8bf1dc 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -474,7 +474,10 @@ typedef enum X86Seg {
>  
>  #define MSR_IA32_MISC_ENABLE            0x1a0
>  /* Indicates good rep/movs microcode on some processors: */
> -#define MSR_IA32_MISC_ENABLE_DEFAULT    1
> +#define MSR_IA32_MISC_ENABLE_FASTSTRING    1

To match the surrounding code style and make it obvious that the macro is a
bitmask, it would be better to define MSR_IA32_MISC_ENABLE_FASTSTRING as below.

#define MSR_IA32_MISC_ENABLE_FASTSTRING    (1ULL << 0)


> +#define MSR_IA32_MISC_ENABLE_BTS_UNAVAIL   (1ULL << 11)
> +#define MSR_IA32_MISC_ENABLE_DEFAULT       (MSR_IA32_MISC_ENABLE_FASTSTRING     |\
> +                                            MSR_IA32_MISC_ENABLE_BTS_UNAVAIL)

It would be better to move the "MSR_IA32_MISC_ENABLE_DEFAULT" macro after
"MSR_IA32_MISC_ENABLE_MWAIT".


>  #define MSR_IA32_MISC_ENABLE_MWAIT      (1ULL << 18)
>  
>  #define MSR_MTRRphysBase(reg)           (0x200 + 2 * (reg))
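
The bitmask style suggested above can be sketched as follows. This is only an illustration: the `EX_`-prefixed names are local copies modelled on the macros in the patch, and `bts_available()` is a hypothetical helper, not a QEMU or KVM function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies named after the macros in the patch; bit positions as quoted. */
#define EX_MISC_ENABLE_FASTSTRING   (1ULL << 0)
#define EX_MISC_ENABLE_BTS_UNAVAIL  (1ULL << 11)
#define EX_MISC_ENABLE_MWAIT        (1ULL << 18)

/* Defined after all single-bit masks, as proposed in the review. */
#define EX_MISC_ENABLE_DEFAULT      (EX_MISC_ENABLE_FASTSTRING | \
                                     EX_MISC_ENABLE_BTS_UNAVAIL)

/* Hypothetical helper: BTS is usable only if BTS_UNAVAILABLE is clear. */
static bool bts_available(uint64_t misc_enable)
{
    return !(misc_enable & EX_MISC_ENABLE_BTS_UNAVAIL);
}
```

With every bit written as a shifted mask, the default value reads as a plain OR of named features, and the old default of 1 is visibly the FASTSTRING bit alone.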



* Re: [PATCH 3/7] target/i386: Gate enable_pmu on kvm_enabled()
  2026-01-17  1:10 ` [PATCH 3/7] target/i386: Gate enable_pmu on kvm_enabled() Zide Chen
@ 2026-01-19  2:02   ` Mi, Dapeng
  0 siblings, 0 replies; 17+ messages in thread
From: Mi, Dapeng @ 2026-01-19  2:02 UTC (permalink / raw)
  To: Zide Chen, qemu-devel, kvm, Paolo Bonzini, Zhao Liu, Peter Xu,
	Fabiano Rosas
  Cc: xiaoyao.li, Dongli Zhang


On 1/17/2026 9:10 AM, Zide Chen wrote:
> Guest PMU support requires KVM.  Clear cpu->enable_pmu when KVM is not
> enabled, so PMU-related code can rely solely on cpu->enable_pmu.
>
> This reduces duplication and avoids bugs where one of the checks is
> missed.  For example, cpu_x86_cpuid() enables CPUID.0AH when
> cpu->enable_pmu is set but does not check kvm_enabled(). This is
> implicitly fixed by this patch:
>
> if (cpu->enable_pmu) {
> 	x86_cpu_get_supported_cpuid(0xA, count, eax, ebx, ecx, edx);
> }
>
> Also fix two places that check kvm_enabled() but not cpu->enable_pmu.
>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
>  target/i386/cpu.c     | 9 ++++++---
>  target/i386/kvm/kvm.c | 4 ++--
>  2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 37803cd72490..f1ac98970d3e 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -8671,7 +8671,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>          *ecx = 0;
>          *edx = 0;
>          if (!(env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) ||
> -            !kvm_enabled()) {
> +            !cpu->enable_pmu) {
>              break;
>          }
>  
> @@ -9018,7 +9018,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>      case 0x80000022:
>          *eax = *ebx = *ecx = *edx = 0;
>          /* AMD Extended Performance Monitoring and Debug */
> -        if (kvm_enabled() && cpu->enable_pmu &&
> +        if (cpu->enable_pmu &&
>              (env->features[FEAT_8000_0022_EAX] & CPUID_8000_0022_EAX_PERFMON_V2)) {
>              *eax |= CPUID_8000_0022_EAX_PERFMON_V2;
>              *ebx |= kvm_arch_get_supported_cpuid(cs->kvm_state, index, count,
> @@ -9642,7 +9642,7 @@ static bool x86_cpu_filter_features(X86CPU *cpu, bool verbose)
>       * are advertised by cpu_x86_cpuid().  Keep these two in sync.
>       */
>      if ((env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) &&
> -        kvm_enabled()) {
> +        cpu->enable_pmu) {
>          x86_cpu_get_supported_cpuid(0x14, 0,
>                                      &eax_0, &ebx_0, &ecx_0, &edx_0);
>          x86_cpu_get_supported_cpuid(0x14, 1,
> @@ -9790,6 +9790,9 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>      Error *local_err = NULL;
>      unsigned requested_lbr_fmt;
>  
> +    if (!kvm_enabled())
> +	    cpu->enable_pmu = false;
> +
>  #if defined(CONFIG_TCG) && !defined(CONFIG_USER_ONLY)
>      /* Use pc-relative instructions in system-mode */
>      tcg_cflags_set(cs, CF_PCREL);
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index cffbc90d1c50..e81fa46ed66c 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -4222,7 +4222,7 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
>                                env->msr_xfd_err);
>          }
>  
> -        if (kvm_enabled() && cpu->enable_pmu &&
> +        if (cpu->enable_pmu &&
>              (env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_ARCH_LBR)) {
>              uint64_t depth;
>              int ret;
> @@ -4698,7 +4698,7 @@ static int kvm_get_msrs(X86CPU *cpu)
>          kvm_msr_entry_add(cpu, MSR_IA32_XFD_ERR, 0);
>      }
>  
> -    if (kvm_enabled() && cpu->enable_pmu &&
> +    if (cpu->enable_pmu &&
>          (env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_ARCH_LBR)) {
>          uint64_t depth;
>  

LGTM.

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
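
The idea in this patch is to normalise the flag once at realize time so that later code tests a single condition. A minimal sketch, assuming a simplified CPU object: `struct ex_cpu`, `ex_realize()` and `ex_expose_pmu_leaf()` are hypothetical names and do not exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for X86CPU. */
struct ex_cpu {
    bool enable_pmu;
};

/* Clear the user-requested flag once if the accelerator cannot honor it. */
static void ex_realize(struct ex_cpu *cpu, bool accel_is_kvm)
{
    if (!accel_is_kvm) {
        cpu->enable_pmu = false;
    }
}

/* Later checks rely on the flag alone; no second accelerator test needed. */
static bool ex_expose_pmu_leaf(const struct ex_cpu *cpu)
{
    return cpu->enable_pmu;
}
```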





* Re: [PATCH 4/7] target/i386: Support full-width writes for perf counters
  2026-01-17  1:10 ` [PATCH 4/7] target/i386: Support full-width writes for perf counters Zide Chen
@ 2026-01-19  3:11   ` Mi, Dapeng
  0 siblings, 0 replies; 17+ messages in thread
From: Mi, Dapeng @ 2026-01-19  3:11 UTC (permalink / raw)
  To: Zide Chen, qemu-devel, kvm, Paolo Bonzini, Zhao Liu, Peter Xu,
	Fabiano Rosas
  Cc: xiaoyao.li, Dongli Zhang


On 1/17/2026 9:10 AM, Zide Chen wrote:
> From: Dapeng Mi <dapeng1.mi@linux.intel.com>
>
> If IA32_PERF_CAPABILITIES.FW_WRITE (bit 13) is set, each general-
> purpose counter IA32_PMCi (starting at 0xc1) is accompanied by a
> corresponding alias MSR starting at 0x4c1 (IA32_A_PMC0), which are
> 64-bit wide.
>
> The legacy IA32_PMCi MSRs are not full-width and their effective width
> is determined by CPUID.0AH:EAX[23:16].
>
> Since these two sets of MSRs are aliases, when IA32_A_PMCi is supported
> it is safe to use it for save/restore instead of the legacy MSRs,
> regardless of whether the hypervisor uses the legacy or the 64-bit
> counterpart.
>
> Full-width write is a user-visible feature and can be disabled
> individually.
>
> Reduce MAX_GP_COUNTERS from 18 to 15 to avoid conflicts between the
> full-width MSR range and MSR_MCG_EXT_CTL.  Current CPUs support at most
> 10 general-purpose counters, so 15 is sufficient for now and leaves room
> for future expansion.
>
> Bump minimum_version_id to avoid migration from older QEMU, as this may
> otherwise cause VMState overflow. This also requires bumping version_id,
> which prevents migration to older QEMU as well.
>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
>  target/i386/cpu.h     |  5 ++++-
>  target/i386/kvm/kvm.c | 19 +++++++++++++++++--
>  target/i386/machine.c |  4 ++--
>  3 files changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 0b480c631ed0..e7cf4a7bd594 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -421,6 +421,7 @@ typedef enum X86Seg {
>  
>  #define MSR_IA32_PERF_CAPABILITIES      0x345
>  #define PERF_CAP_LBR_FMT                0x3f
> +#define PERF_CAP_FULL_WRITE             (1U << 13)
>  
>  #define MSR_IA32_TSX_CTRL		0x122
>  #define MSR_IA32_TSCDEADLINE            0x6e0
> @@ -448,6 +449,8 @@ typedef enum X86Seg {
>  #define MSR_IA32_SGXLEPUBKEYHASH3       0x8f
>  
>  #define MSR_P6_PERFCTR0                 0xc1
> +/* Alternative perfctr range with full access. */
> +#define MSR_IA32_PMC0                   0x4c1
>  
>  #define MSR_IA32_SMBASE                 0x9e
>  #define MSR_SMI_COUNT                   0x34
> @@ -1740,7 +1743,7 @@ typedef struct {
>  #endif
>  
>  #define MAX_FIXED_COUNTERS 3
> -#define MAX_GP_COUNTERS    (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
> +#define MAX_GP_COUNTERS    15

I suppose this is good enough for AMD CPUs, but we need the AMD folks to
confirm. Thanks.


>  
>  #define NB_OPMASK_REGS 8
>  
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index e81fa46ed66c..530f50e4b218 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -4049,6 +4049,12 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
>          }
>  
>          if (has_architectural_pmu_version > 0) {
> +            uint32_t perf_cntr_base = MSR_P6_PERFCTR0;
> +
> +            if (env->features[FEAT_PERF_CAPABILITIES] & PERF_CAP_FULL_WRITE) {
> +                perf_cntr_base = MSR_IA32_PMC0;
> +            }
> +
>              if (has_architectural_pmu_version > 1) {
>                  /* Stop the counter.  */
>                  kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
> @@ -4061,7 +4067,7 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
>                                    env->msr_fixed_counters[i]);
>              }
>              for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
> -                kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i,
> +                kvm_msr_entry_add(cpu, perf_cntr_base + i,
>                                    env->msr_gp_counters[i]);
>                  kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i,
>                                    env->msr_gp_evtsel[i]);
> @@ -4582,6 +4588,12 @@ static int kvm_get_msrs(X86CPU *cpu)
>          kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
>      }
>      if (has_architectural_pmu_version > 0) {
> +        uint32_t perf_cntr_base = MSR_P6_PERFCTR0;
> +
> +        if (env->features[FEAT_PERF_CAPABILITIES] & PERF_CAP_FULL_WRITE) {
> +            perf_cntr_base = MSR_IA32_PMC0;
> +        }
> +
>          if (has_architectural_pmu_version > 1) {
>              kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
>              kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
> @@ -4591,7 +4603,7 @@ static int kvm_get_msrs(X86CPU *cpu)
>              kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i, 0);
>          }
>          for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
> -            kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i, 0);
> +            kvm_msr_entry_add(cpu, perf_cntr_base + i, 0);
>              kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i, 0);
>          }
>      }
> @@ -4920,6 +4932,9 @@ static int kvm_get_msrs(X86CPU *cpu)
>          case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR0 + MAX_GP_COUNTERS - 1:
>              env->msr_gp_counters[index - MSR_P6_PERFCTR0] = msrs[i].data;
>              break;
> +        case MSR_IA32_PMC0 ... MSR_IA32_PMC0 + MAX_GP_COUNTERS - 1:
> +            env->msr_gp_counters[index - MSR_IA32_PMC0] = msrs[i].data;
> +            break;
>          case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
>              env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
>              break;
> diff --git a/target/i386/machine.c b/target/i386/machine.c
> index 1125c8a64ec5..7d08a05835fc 100644
> --- a/target/i386/machine.c
> +++ b/target/i386/machine.c
> @@ -685,8 +685,8 @@ static bool pmu_enable_needed(void *opaque)
>  
>  static const VMStateDescription vmstate_msr_architectural_pmu = {
>      .name = "cpu/msr_architectural_pmu",
> -    .version_id = 1,
> -    .minimum_version_id = 1,
> +    .version_id = 2,
> +    .minimum_version_id = 2,
>      .needed = pmu_enable_needed,
>      .fields = (const VMStateField[]) {
>          VMSTATE_UINT64(env.msr_fixed_ctr_ctrl, X86CPU),
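
The MAX_GP_COUNTERS arithmetic in the commit message can be checked with a short sketch. The `EX_`-prefixed constants are local copies; 0xc1, 0x4c1 and bit 13 come from the patch, while 0x186, 0x198 and especially 0x4d0 for MSR_MCG_EXT_CTL are values assumed here from the architectural MSR list, so verify them against cpu.h. The helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EX_MSR_P6_PERFCTR0       0xc1
#define EX_MSR_P6_EVNTSEL0       0x186   /* assumed value */
#define EX_MSR_IA32_PERF_STATUS  0x198   /* assumed value */
#define EX_MSR_IA32_PMC0         0x4c1
#define EX_MSR_MCG_EXT_CTL       0x4d0   /* assumed value */
#define EX_PERF_CAP_FULL_WRITE   (1U << 13)

/* Last MSR index covered by the full-width alias range. */
static uint32_t ex_last_alias_msr(uint32_t max_gp_counters)
{
    return EX_MSR_IA32_PMC0 + max_gp_counters - 1;
}

/* Pick the counter base the way the patch does in kvm_put/get_msrs(). */
static uint32_t ex_perf_cntr_base(uint64_t perf_caps)
{
    return (perf_caps & EX_PERF_CAP_FULL_WRITE) ? EX_MSR_IA32_PMC0
                                                : EX_MSR_P6_PERFCTR0;
}

/* Map an MSR index from either range to a counter slot, or -1. */
static int ex_counter_slot(uint32_t index, uint32_t max_gp_counters)
{
    if (index >= EX_MSR_P6_PERFCTR0 &&
        index < EX_MSR_P6_PERFCTR0 + max_gp_counters) {
        return (int)(index - EX_MSR_P6_PERFCTR0);
    }
    if (index >= EX_MSR_IA32_PMC0 &&
        index < EX_MSR_IA32_PMC0 + max_gp_counters) {
        return (int)(index - EX_MSR_IA32_PMC0);
    }
    return -1;
}
```

Under these assumed values the old limit of 18 makes the alias range end at 0x4d2, which overlaps 0x4d0. A limit of 15 ends the range at 0x4cf, just below it.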



* Re: [PATCH 6/7] target/i386: Make some PEBS features user-visible
  2026-01-17  1:10 ` [PATCH 6/7] target/i386: Make some PEBS features user-visible Zide Chen
@ 2026-01-19  3:30   ` Mi, Dapeng
  2026-01-20 21:58     ` Chen, Zide
  0 siblings, 1 reply; 17+ messages in thread
From: Mi, Dapeng @ 2026-01-19  3:30 UTC (permalink / raw)
  To: Zide Chen, qemu-devel, kvm, Paolo Bonzini, Zhao Liu, Peter Xu,
	Fabiano Rosas
  Cc: xiaoyao.li, Dongli Zhang


On 1/17/2026 9:10 AM, Zide Chen wrote:
> Populate selected PEBS feature names in FEAT_PERF_CAPABILITIES to make
> the corresponding bits user-visible CPU feature knobs, allowing them to
> be explicitly enabled or disabled via -cpu +/-<feature>.
>
> Once named, these bits become part of the guest CPU configuration
> contract.  If a VM is configured with such a feature enabled, migration
> to a destination that does not support the feature may fail, as the
> destination cannot honor the guest-visible CPU model.
>
> The PEBS_FMT bits are intentionally not exposed. They are not meaningful
> as user-visible features, and QEMU registers CPU features as boolean
> QOM properties, which makes them unsuitable for representing and
> checking numeric capabilities.

Currently KVM supports user space setting PEBS_FMT (see vmx_set_msr()), but
requires the guest PEBS_FMT to be identical to the host PEBS_FMT.

IIRC, many places in KVM judge whether guest PEBS is enabled by checking
the guest PEBS_FMT. If we don't expose PEBS_FMT to user space, how does KVM
get the guest PEBS_FMT?


>
> Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
>  target/i386/cpu.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index f1ac98970d3e..fc6a64287415 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -1618,10 +1618,10 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>          .type = MSR_FEATURE_WORD,
>          .feat_names = {
>              NULL, NULL, NULL, NULL,
> +            NULL, NULL, "pebs-trap", "pebs-arch-reg",
>              NULL, NULL, NULL, NULL,
> -            NULL, NULL, NULL, NULL,
> -            NULL, "full-width-write", NULL, NULL,
> -            NULL, NULL, NULL, NULL,
> +            NULL, "full-width-write", "pebs-baseline", NULL,
> +            NULL, "pebs-timing-info", NULL, NULL,
>              NULL, NULL, NULL, NULL,
>              NULL, NULL, NULL, NULL,
>              NULL, NULL, NULL, NULL,



* Re: [PATCH 1/7] target/i386: Disable unsupported BTS for guest
  2026-01-19  1:47   ` Mi, Dapeng
@ 2026-01-20 18:09     ` Chen, Zide
  0 siblings, 0 replies; 17+ messages in thread
From: Chen, Zide @ 2026-01-20 18:09 UTC (permalink / raw)
  To: Mi, Dapeng, qemu-devel, kvm, Paolo Bonzini, Zhao Liu, Peter Xu,
	Fabiano Rosas
  Cc: xiaoyao.li, Dongli Zhang



On 1/18/2026 5:47 PM, Mi, Dapeng wrote:
> 
> On 1/17/2026 9:10 AM, Zide Chen wrote:
>> BTS (Branch Trace Store), enumerated by IA32_MISC_ENABLE.BTS_UNAVAILABLE
>> (bit 11), is deprecated and has been superseded by LBR and Intel PT.
>>
>> KVM yields control of the above mentioned bit to userspace since KVM
>> commit 9fc222967a39 ("KVM: x86: Give host userspace full control of
>> MSR_IA32_MISC_ENABLES").
>>
>> However, QEMU does not set this bit, which allows guests to write the
>> BTS and BTINT bits in IA32_DEBUGCTL.  Since KVM doesn't support BTS,
>> this may lead to unexpected MSR access errors.
>>
>> Setting this bit does not introduce migration compatibility issues, so
>> the VMState version_id is not bumped.
>>
>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>> ---
>>  target/i386/cpu.h | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
>> index 2bbc977d9088..f2b79a8bf1dc 100644
>> --- a/target/i386/cpu.h
>> +++ b/target/i386/cpu.h
>> @@ -474,7 +474,10 @@ typedef enum X86Seg {
>>  
>>  #define MSR_IA32_MISC_ENABLE            0x1a0
>>  /* Indicates good rep/movs microcode on some processors: */
>> -#define MSR_IA32_MISC_ENABLE_DEFAULT    1
>> +#define MSR_IA32_MISC_ENABLE_FASTSTRING    1
> 
> To keep the same code style and make users clearly know the macro is a
> bitmask, better define MSR_IA32_MISC_ENABLE_FASTSTRING like below.
> 
> #define MSR_IA32_MISC_ENABLE_FASTSTRING    (1ULL << 0)

Yes. Thanks.

> 
>> +#define MSR_IA32_MISC_ENABLE_BTS_UNAVAIL   (1ULL << 11)
>> +#define MSR_IA32_MISC_ENABLE_DEFAULT       (MSR_IA32_MISC_ENABLE_FASTSTRING     |\
>> +                                            MSR_IA32_MISC_ENABLE_BTS_UNAVAIL)
> 
> Better move the macro "MSR_IA32_MISC_ENABLE_DEFAULT" after
> "MSR_IA32_MISC_ENABLE_MWAIT".
> 

Thanks. Will do.

>>  #define MSR_IA32_MISC_ENABLE_MWAIT      (1ULL << 18)
>>  
>>  #define MSR_MTRRphysBase(reg)           (0x200 + 2 * (reg))




* Re: [PATCH 6/7] target/i386: Make some PEBS features user-visible
  2026-01-19  3:30   ` Mi, Dapeng
@ 2026-01-20 21:58     ` Chen, Zide
  2026-01-21  5:19       ` Mi, Dapeng
  2026-01-25  8:38       ` Zhao Liu
  0 siblings, 2 replies; 17+ messages in thread
From: Chen, Zide @ 2026-01-20 21:58 UTC (permalink / raw)
  To: Mi, Dapeng, qemu-devel, kvm, Paolo Bonzini, Zhao Liu, Peter Xu,
	Fabiano Rosas
  Cc: xiaoyao.li, Dongli Zhang



On 1/18/2026 7:30 PM, Mi, Dapeng wrote:
> 
> On 1/17/2026 9:10 AM, Zide Chen wrote:
>> Populate selected PEBS feature names in FEAT_PERF_CAPABILITIES to make
>> the corresponding bits user-visible CPU feature knobs, allowing them to
>> be explicitly enabled or disabled via -cpu +/-<feature>.
>>
>> Once named, these bits become part of the guest CPU configuration
>> contract.  If a VM is configured with such a feature enabled, migration
>> to a destination that does not support the feature may fail, as the
>> destination cannot honor the guest-visible CPU model.
>>
>> The PEBS_FMT bits are intentionally not exposed. They are not meaningful
>> as user-visible features, and QEMU registers CPU features as boolean
>> QOM properties, which makes them unsuitable for representing and
>> checking numeric capabilities.
> 
> Currently KVM supports user space sets PEBS_FMT (see vmx_set_msr()), but
> just requires the guest PEBS_FMT is identical with host PEBS_FMT.

My mistake — this is indeed problematic.

There are four possible ways to expose pebs_fmt to the guest when
cpu->migratable = true:

1. Add a pebs_fmt option similar to lbr_fmt.
This may work, but is not user-friendly and adds unnecessary complexity.

2. Set feat_names[8] = feat_names[9] = ... = "pebs-fmt".
This violates the implicit rule that feat_names[] entries should be
unique, and target/i386 does not support numeric features.

3. Use feat_names[8..11] = "pebs-fmt[1/2/3/4]".
This has two issues:
- It exposes pebs-fmt[1/2/3/4] as independent features, which is
semantically incorrect.
- Migration may fail incorrectly; e.g., migrating from pebs_fmt=2 to a
more capable host (pebs_fmt=4) would be reported as missing pebs-fmt2.

Given this, I propose the changes below. This may allow migration to a
less capable destination, which is not ideal, but it avoids a false
“missing feature” failure and preserves the expectation that ensuring
destination compatibility is the responsibility of the management
application or the user.

BTW, I am not proposing a generic “x86 CPU numeric feature” mechanism at
this time, as it is unclear whether lbr_fmt and pebs_fmt alone justify
such a change.

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 015ba3fc9c7b..b6c95d5ceb31 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1629,6 +1629,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
         .msr = {
             .index = MSR_IA32_PERF_CAPABILITIES,
         },
+       .migratable_flags = PERF_CAP_PEBS_FMT,
     },

     [FEAT_VMX_PROCBASED_CTLS] = {
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 1666eff65300..de4074d6baa7 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -421,6 +421,7 @@ typedef enum X86Seg {

 #define MSR_IA32_PERF_CAPABILITIES      0x345
 #define PERF_CAP_LBR_FMT                0x3f
+#define PERF_CAP_PEBS_FMT               0xf00
 #define PERF_CAP_FULL_WRITE             (1U << 13)
 #define PERF_CAP_PEBS_BASELINE          (1U << 14)
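
To illustrate why a numeric field such as PEBS_FMT does not fit per-bit boolean feature checks (the problem described for option 3 above), here is a minimal sketch. The masks match the values in the diff; `EX_PEBS_FMT_SHIFT`, `ex_pebs_fmt()` and `ex_bits_subset()` are hypothetical names and only approximate how a bitwise feature comparison behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_PERF_CAP_LBR_FMT   0x3fULL
#define EX_PERF_CAP_PEBS_FMT  0xf00ULL
#define EX_PEBS_FMT_SHIFT     8       /* lowest set bit of the 0xf00 mask */

/* Extract the numeric PEBS format from a PERF_CAPABILITIES value. */
static unsigned ex_pebs_fmt(uint64_t perf_caps)
{
    return (unsigned)((perf_caps & EX_PERF_CAP_PEBS_FMT) >> EX_PEBS_FMT_SHIFT);
}

/* Boolean-feature style check: every bit set in src must be set in dst. */
static bool ex_bits_subset(uint64_t src, uint64_t dst)
{
    return (src & ~dst) == 0;
}
```

A source with format 2 sets bit 9 and a destination with format 4 sets bit 10, so the bitwise check reports a missing bit even though the destination's format number is higher.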


> IIRC, many places in KVM judges whether guest PEBS is enabled by checking
> the guest PEBS_FMT. If we don't expose PEBS_FMT to user space, how does KVM
> get the guest PEBS_FMT?
> 
> 
>>
>> Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>> ---
>>  target/i386/cpu.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index f1ac98970d3e..fc6a64287415 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -1618,10 +1618,10 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>>          .type = MSR_FEATURE_WORD,
>>          .feat_names = {
>>              NULL, NULL, NULL, NULL,
>> +            NULL, NULL, "pebs-trap", "pebs-arch-reg",
>>              NULL, NULL, NULL, NULL,
>> -            NULL, NULL, NULL, NULL,
>> -            NULL, "full-width-write", NULL, NULL,
>> -            NULL, NULL, NULL, NULL,
>> +            NULL, "full-width-write", "pebs-baseline", NULL,
>> +            NULL, "pebs-timing-info", NULL, NULL,
>>              NULL, NULL, NULL, NULL,
>>              NULL, NULL, NULL, NULL,
>>              NULL, NULL, NULL, NULL,




* Re: [PATCH 6/7] target/i386: Make some PEBS features user-visible
  2026-01-20 21:58     ` Chen, Zide
@ 2026-01-21  5:19       ` Mi, Dapeng
  2026-01-25  8:38       ` Zhao Liu
  1 sibling, 0 replies; 17+ messages in thread
From: Mi, Dapeng @ 2026-01-21  5:19 UTC (permalink / raw)
  To: Chen, Zide, qemu-devel, kvm, Paolo Bonzini, Zhao Liu, Peter Xu,
	Fabiano Rosas
  Cc: xiaoyao.li, Dongli Zhang


On 1/21/2026 5:58 AM, Chen, Zide wrote:
>
> On 1/18/2026 7:30 PM, Mi, Dapeng wrote:
>> On 1/17/2026 9:10 AM, Zide Chen wrote:
>>> Populate selected PEBS feature names in FEAT_PERF_CAPABILITIES to make
>>> the corresponding bits user-visible CPU feature knobs, allowing them to
>>> be explicitly enabled or disabled via -cpu +/-<feature>.
>>>
>>> Once named, these bits become part of the guest CPU configuration
>>> contract.  If a VM is configured with such a feature enabled, migration
>>> to a destination that does not support the feature may fail, as the
>>> destination cannot honor the guest-visible CPU model.
>>>
>>> The PEBS_FMT bits are intentionally not exposed. They are not meaningful
>>> as user-visible features, and QEMU registers CPU features as boolean
>>> QOM properties, which makes them unsuitable for representing and
>>> checking numeric capabilities.
>> Currently KVM supports user space sets PEBS_FMT (see vmx_set_msr()), but
>> just requires the guest PEBS_FMT is identical with host PEBS_FMT.
> My mistake — this is indeed problematic.
>
> There are four possible ways to expose pebs_fmt to the guest when
> cpu->migratable = true:
>
> 1. Add a pebs_fmt option similar to lbr_fmt.
> This may work, but is not user-friendly and adds unnecessary complexity.
>
> 2. Set feat_names[8] = feat_names[9] = ... = "pebs-fmt".
> This violates the implicit rule that feat_names[] entries should be
> unique, and target/i386 does not support numeric features.
>
> 3. Use feat_names[8..11] = "pebs-fmt[1/2/3/4]".
> This has two issues:
> - It exposes pebs-fmt[1/2/3/4] as independent features, which is
> semantically incorrect.
> - Migration may fail incorrectly; e.g., migrating from pebs_fmt=2 to a
> more capable host (pebs_fmt=4) would be reported as missing pebs-fmt2.
>
> Given this, I propose the below changes. This may allow migration to a
> less capable destination, which is not ideal, but it avoids false
> “missing feature” bug and preserves the expectation that ensuring
> destination compatibility is the responsibility of the management
> application or the user.
>
> BTW, I am not proposing a generic “x86 CPU numeric feature” mechanism at
> this time, as it is unclear whether lbr_fmt and pebs_fmt alone justify
> such a change.
>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 015ba3fc9c7b..b6c95d5ceb31 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -1629,6 +1629,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>          .msr = {
>              .index = MSR_IA32_PERF_CAPABILITIES,
>          },
> +       .migratable_flags = PERF_CAP_PEBS_FMT,
>      },
>
>      [FEAT_VMX_PROCBASED_CTLS] = {
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 1666eff65300..de4074d6baa7 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -421,6 +421,7 @@ typedef enum X86Seg {
>
>  #define MSR_IA32_PERF_CAPABILITIES      0x345
>  #define PERF_CAP_LBR_FMT                0x3f
> +#define PERF_CAP_PEBS_FMT               0xf00
>  #define PERF_CAP_FULL_WRITE             (1U << 13)
>  #define PERF_CAP_PEBS_BASELINE          (1U << 14)

I can't say if this is the best way. Maybe @Zhao can give some comments.
Thanks.


>
>
>> IIRC, many places in KVM judges whether guest PEBS is enabled by checking
>> the guest PEBS_FMT. If we don't expose PEBS_FMT to user space, how does KVM
>> get the guest PEBS_FMT?
>>
>>
>>> Co-developed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>>> ---
>>>  target/i386/cpu.c | 6 +++---
>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>> index f1ac98970d3e..fc6a64287415 100644
>>> --- a/target/i386/cpu.c
>>> +++ b/target/i386/cpu.c
>>> @@ -1618,10 +1618,10 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>>>          .type = MSR_FEATURE_WORD,
>>>          .feat_names = {
>>>              NULL, NULL, NULL, NULL,
>>> +            NULL, NULL, "pebs-trap", "pebs-arch-reg",
>>>              NULL, NULL, NULL, NULL,
>>> -            NULL, NULL, NULL, NULL,
>>> -            NULL, "full-width-write", NULL, NULL,
>>> -            NULL, NULL, NULL, NULL,
>>> +            NULL, "full-width-write", "pebs-baseline", NULL,
>>> +            NULL, "pebs-timing-info", NULL, NULL,
>>>              NULL, NULL, NULL, NULL,
>>>              NULL, NULL, NULL, NULL,
>>>              NULL, NULL, NULL, NULL,
>



* Re: [PATCH 6/7] target/i386: Make some PEBS features user-visible
  2026-01-20 21:58     ` Chen, Zide
  2026-01-21  5:19       ` Mi, Dapeng
@ 2026-01-25  8:38       ` Zhao Liu
  2026-01-27  0:51         ` Chen, Zide
  1 sibling, 1 reply; 17+ messages in thread
From: Zhao Liu @ 2026-01-25  8:38 UTC (permalink / raw)
  To: Chen, Zide, Mi, Dapeng
  Cc: qemu-devel, kvm, Paolo Bonzini, Peter Xu, Fabiano Rosas,
	xiaoyao.li, Dongli Zhang

Hi Zide & Dapeng,

> On 1/18/2026 7:30 PM, Mi, Dapeng wrote:
> > 
> > On 1/17/2026 9:10 AM, Zide Chen wrote:
> >> Populate selected PEBS feature names in FEAT_PERF_CAPABILITIES to make
> >> the corresponding bits user-visible CPU feature knobs, allowing them to
> >> be explicitly enabled or disabled via -cpu +/-<feature>.
> >>
> >> Once named, these bits become part of the guest CPU configuration
> >> contract.  If a VM is configured with such a feature enabled, migration
> >> to a destination that does not support the feature may fail, as the
> >> destination cannot honor the guest-visible CPU model.
> >>
> >> The PEBS_FMT bits are intentionally not exposed. They are not meaningful
> >> as user-visible features, and QEMU registers CPU features as boolean
> >> QOM properties, which makes them unsuitable for representing and
> >> checking numeric capabilities.
> > 
> > Currently KVM supports user space sets PEBS_FMT (see vmx_set_msr()), but
> > just requires the guest PEBS_FMT is identical with host PEBS_FMT.
> 
> My mistake — this is indeed problematic.
> 
> There are four possible ways to expose pebs_fmt to the guest when
> cpu->migratable = true:
> 
> 1. Add a pebs_fmt option similar to lbr_fmt.
> This may work, but is not user-friendly and adds unnecessary complexity.
> 
> 2. Set feat_names[8] = feat_names[9] = ... = "pebs-fmt".
> This violates the implicit rule that feat_names[] entries should be
> unique, and target/i386 does not support numeric features.
> 
> 3. Use feat_names[8..11] = "pebs-fmt[1/2/3/4]".
> This has two issues:
> - It exposes pebs-fmt[1/2/3/4] as independent features, which is
> semantically incorrect.
> - Migration may fail incorrectly; e.g., migrating from pebs_fmt=2 to a
> more capable host (pebs_fmt=4) would be reported as missing pebs-fmt2.

For 2) & 3), I think if it's necessary, maybe it's time to re-consider
the previous multi-bits property:

https://lore.kernel.org/qemu-devel/20230106083826.5384-4-lei4.wang@intel.com/

But for now, I think 1) is also okay. Since lbr-fmt is very similar to
pebs-fmt, it's best to handle both fmt fields in a similar manner;
otherwise code maintenance becomes troublesome.

> Given this, I propose the below changes. This may allow migration to a
> less capable destination, which is not ideal, but it avoids false
> “missing feature” bug and preserves the expectation that ensuring
> destination compatibility is the responsibility of the management
> application or the user.
> 
> BTW, I am not proposing a generic “x86 CPU numeric feature” mechanism at
> this time, as it is unclear whether lbr_fmt and pebs_fmt alone justify
> such a change.
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 015ba3fc9c7b..b6c95d5ceb31 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -1629,6 +1629,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>          .msr = {
>              .index = MSR_IA32_PERF_CAPABILITIES,
>          },
> +       .migratable_flags = PERF_CAP_PEBS_FMT,

About the migration issue, I wonder whether it's necessary to migrate
pebs-fmt. IIUC, it doesn't seem necessary: the guest's pebs-fmt depends
on the host's pebs-fmt, but I'm not sure what will happen when the guest
migrates to a machine with a different pebs-fmt.

Note that lbr-fmt does not seem to be migrated.

Thanks,
Zhao




* Re: [PATCH 6/7] target/i386: Make some PEBS features user-visible
  2026-01-25  8:38       ` Zhao Liu
@ 2026-01-27  0:51         ` Chen, Zide
  0 siblings, 0 replies; 17+ messages in thread
From: Chen, Zide @ 2026-01-27  0:51 UTC (permalink / raw)
  To: Zhao Liu, Mi, Dapeng
  Cc: qemu-devel, kvm, Paolo Bonzini, Peter Xu, Fabiano Rosas,
	xiaoyao.li, Dongli Zhang



On 1/25/2026 12:38 AM, Zhao Liu wrote:
> Hi Zide & Dapeng,
> 
>> On 1/18/2026 7:30 PM, Mi, Dapeng wrote:
>>>
>>> On 1/17/2026 9:10 AM, Zide Chen wrote:
>>>> Populate selected PEBS feature names in FEAT_PERF_CAPABILITIES to make
>>>> the corresponding bits user-visible CPU feature knobs, allowing them to
>>>> be explicitly enabled or disabled via -cpu +/-<feature>.
>>>>
>>>> Once named, these bits become part of the guest CPU configuration
>>>> contract.  If a VM is configured with such a feature enabled, migration
>>>> to a destination that does not support the feature may fail, as the
>>>> destination cannot honor the guest-visible CPU model.
>>>>
>>>> The PEBS_FMT bits are intentionally not exposed. They are not meaningful
>>>> as user-visible features, and QEMU registers CPU features as boolean
>>>> QOM properties, which makes them unsuitable for representing and
>>>> checking numeric capabilities.
>>>
>>> Currently KVM allows user space to set PEBS_FMT (see vmx_set_msr()), but
>>> only requires that the guest PEBS_FMT be identical to the host PEBS_FMT.
>>
>> My mistake — this is indeed problematic.
>>
>> There are four possible ways to expose pebs_fmt to the guest when
>> cpu->migratable = true:
>>
>> 1. Add a pebs_fmt option similar to lbr_fmt.
>> This may work, but is not user-friendly and adds unnecessary complexity.
>>
>> 2. Set feat_names[8] = feat_names[9] = ... = "pebs-fmt".
>> This violates the implicit rule that feat_names[] entries should be
>> unique, and target/i386 does not support numeric features.
>>
>> 3. Use feat_names[8..11] = "pebs-fmt[1/2/3/4]".
>> This has two issues:
>> - It exposes pebs-fmt[1/2/3/4] as independent features, which is
>> semantically incorrect.
>> - Migration may fail incorrectly; e.g., migrating from pebs_fmt=2 to a
>> more capable host (pebs_fmt=4) would be reported as missing pebs-fmt2.
> 
> For 2) & 3), I think if it's necessary, maybe it's time to re-consider
> the previous multi-bits property:
> 
> https://lore.kernel.org/qemu-devel/20230106083826.5384-4-lei4.wang@intel.com/

I think the multi-bit property may be a better approach (which I
previously referred to as “numeric features”).

As Igor pointed out, this would involve an infrastructure change, so I
am hesitant to pursue it at this time.
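
To illustrate the false "missing feature" case from 3), here is a
minimal sketch. It is not QEMU code: the macro and helper names are
made up for illustration, and the field position (bits 11:8 of
IA32_PERF_CAPABILITIES) is assumed from the feat_names[8..11] indices
above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: PEBS_FMT occupies IA32_PERF_CAPABILITIES bits 11:8. */
#define PEBS_FMT_SHIFT 8
#define PEBS_FMT_MASK  (0xfULL << PEBS_FMT_SHIFT)

/* Hypothetical helper: extract the PEBS format as a number. */
static inline uint64_t pebs_fmt_field(uint64_t perf_cap)
{
    return (perf_cap & PEBS_FMT_MASK) >> PEBS_FMT_SHIFT;
}

/*
 * Per-bit check, in the style used for boolean features: returns the
 * guest bits within the PEBS_FMT field that the host does not have set.
 */
static inline uint64_t pebs_fmt_missing_bits(uint64_t guest, uint64_t host)
{
    return guest & ~host & PEBS_FMT_MASK;
}

/* Field-level check: compare the whole format value instead of bits. */
static inline bool pebs_fmt_matches(uint64_t guest, uint64_t host)
{
    return pebs_fmt_field(guest) == pebs_fmt_field(host);
}
```

With guest = 2 << 8 and host = 4 << 8, pebs_fmt_missing_bits() returns
0x200, so a per-bit check would report "pebs-fmt2" as missing even
though the field is a number, not a set of independent capabilities.
A multi-bit property would compare the field as a whole; whether that
comparison should be strict equality (what KVM requires today, per
Dapeng's comment) or something looser is left open in this sketch.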


> But as for now, I think 1) is also okay. Since lbr-fmt seems very
> similar to pebs-fmt, it's best to have them handle these fmt things in a
> similar manner, otherwise it can make code maintenance troublesome.
Yes, it works. Will post V2 with this approach.

>> Given this, I propose the below changes. This may allow migration to a
>> less capable destination, which is not ideal, but it avoids a false
>> “missing feature” bug and preserves the expectation that ensuring
>> destination compatibility is the responsibility of the management
>> application or the user.
>>
>> BTW, I am not proposing a generic “x86 CPU numeric feature” mechanism at
>> this time, as it is unclear whether lbr_fmt and pebs_fmt alone justify
>> such a change.
>>
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index 015ba3fc9c7b..b6c95d5ceb31 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -1629,6 +1629,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>>          .msr = {
>>              .index = MSR_IA32_PERF_CAPABILITIES,
>>          },
>> +       .migratable_flags = PERF_CAP_PEBS_FMT,
> 
> About the migration issue, I wonder whether it's necessary to migrate
> pebs-fmt. IIUC, it doesn't seem necessary: the guest's pebs-fmt depends
> on the host's pebs-fmt, but I'm not sure what will happen when the guest
> migrates to a machine with a different pebs-fmt.
> 
> Note that lbr-fmt does not seem to be migrated.
For a migratable feature, its related state should be fully captured in
the VM migration stream.  Both the LBR and PEBS states are serialized
via vmstate, so it may not be wrong to call lbr/pebs_fmt a migratable
feature.

However, QEMU must provide means—either to external management
applications via QOM properties or to users via command-line options—to
ensure that the destination machine supports all features enabled on the
source. The current migratable_flags proposal does not address this.

My original intention was to treat this as a workaround.


> Thanks,
> Zhao
> 




