* [PATCH 0/9] Upgrade vPMU version to 5
@ 2023-09-01 7:28 Xiong Zhang
2023-09-01 7:28 ` [PATCH 1/9] KVM: x86/PMU: Don't release vLBR caused by PMI Xiong Zhang
` (8 more replies)
0 siblings, 9 replies; 27+ messages in thread
From: Xiong Zhang @ 2023-09-01 7:28 UTC (permalink / raw)
To: kvm
Cc: seanjc, like.xu.linux, zhiyuan.lv, zhenyu.z.wang, kan.liang,
dapeng1.mi, Xiong Zhang
Recent Intel processors support Architectural Performance Monitoring
version 5, while the KVM vPMU is still on version 2. In order to use
the new PMU features introduced in versions 4 and 5, this patchset
upgrades the vPMU version to 5.
Going through the PMU features from version 3 to 5, the following are
supported by this patchset:
1. Streamlined Freeze LBR on PMI (version 4). This feature adds a new
bit, IA32_PERF_GLOBAL_STATUS.LBR_FRZ[bit 58], which is set when a PMI
happens and the LBR stack is frozen. This bit also serves as a control
to re-enable the LBR stack: SW should clear it at the end of the PMI
handler.
2. IA32_PERF_GLOBAL_STATUS_RESET MSR (version 4). Its address is
inherited from the IA32_PERF_GLOBAL_OVF_CTRL MSR, and both have the
same function: clearing individual bits in the IA32_PERF_GLOBAL_STATUS
MSR.
3. IA32_PERF_GLOBAL_STATUS_SET MSR (version 4). It allows software to
set individual bits in the IA32_PERF_GLOBAL_STATUS MSR.
4. IA32_PERF_GLOBAL_INUSE MSR (version 4). It provides an "InUse" bit
for each programmable performance counter and fixed counter in the
processor. Additionally, it includes an indicator of whether the PMI
mechanism has been used.
5. Fixed Counter Enumeration (version 5). CPUID.0AH.ECX provides a bit
mask which enumerates the Fixed Counters supported by a processor.
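As a hedged illustration (not KVM code; the helper name is made up), the
v5 rule for deciding whether fixed counter i is supported from
CPUID.0AH.ECX and EDX[4:0] can be sketched as:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative sketch of the SDM v5 rule:
 *   FxCtr[i]_is_supported := CPUID.0AH.ECX[i] || (CPUID.0AH.EDX[4:0] > i)
 * The helper name and shape are invented for illustration only.
 */
static inline int fxctr_is_supported(uint32_t ecx, uint32_t edx, unsigned int i)
{
	return ((ecx >> i) & 1) || ((edx & 0x1f) > i);
}
```

With ECX = 0 a processor falls back to the legacy contiguous EDX[4:0]
enumeration; ECX bits can additionally expose non-contiguous counters.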
For each added feature, the KVM emulation is straightforward and
reflects the vPMU state; please see each feature's emulation code in
the following commits. A kvm-unit-tests case or kernel selftest is
added to verify each feature's emulation. The kernel doesn't use
features 3 and 4, so the dedicated kvm-unit-tests cases are the only
verification method for them. I'm afraid I may have missed something
for these features, especially their use cases, so any suggestions and
usage examples are welcome.
While the following features are not supported:
1. AnyThread counting: it was added in v3 and deprecated in v5, so
this feature isn't supported.
2. Streamlined Freeze_PerfMon_On_PMI (version 4). Since the legacy
Freeze_PerfMon_On_PMI from version 2 isn't supported and the community
considers this feature problematic on native hardware [1], its
emulation isn't supported either.
3. IA32_PERF_GLOBAL_STATUS.ASCI[bit 60] (version 4). This new bit
relates to SGX and will be emulated by the SGX developers.
4. Domain Separation (version 5). When the INV flag in
IA32_PERFEVTSELx is used, a counter stops counting when the logical
processor exits the C0 ACPI C-state. First, the guest INV flag isn't
supported; second, guest ACPI C-states are vague. So this feature's
emulation isn't supported.
Reference:
[1]: perf/intel: Remove Perfmon-v4 counter_freezing support
https://lore.kernel.org/all/20201110153721.GQ2651@hirez.programming.kicks-ass.net/
Like Xu (1):
KVM: x86/pmu: Add Intel PMU supported fixed counters mask
Xiong Zhang (8):
KVM: x86/PMU: Don't release vLBR caused by PMI
KVM: x86/pmu: Add Streamlined FREEZE_LBR_ON_PMI for vPMU v4
KVM: x86/pmu: Add PERF_GLOBAL_STATUS_SET MSR emulation
KVM: x86/pmu: Add MSR_PERF_GLOBAL_INUSE emulation
KVM: x86/pmu: Check CPUID.0AH.ECX consistency
KVM: x86/pmu: Add fixed counter enumeration for pmu v5
KVM: x86/pmu: Upgrade pmu version to 5 on intel processor
KVM: selftests: Add fixed counters enumeration test case
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/include/asm/msr-index.h | 6 +
arch/x86/kvm/cpuid.c | 32 ++-
arch/x86/kvm/pmu.c | 8 -
arch/x86/kvm/pmu.h | 17 +-
arch/x86/kvm/svm/pmu.c | 1 -
arch/x86/kvm/vmx/pmu_intel.c | 188 +++++++++++++++---
arch/x86/kvm/vmx/vmx.c | 15 +-
arch/x86/kvm/vmx/vmx.h | 3 +
.../selftests/kvm/x86_64/vmx_pmu_caps_test.c | 84 ++++++++
10 files changed, 312 insertions(+), 43 deletions(-)
base-commit: fff2e47e6c3b8050ca26656693caa857e3a8b740
--
2.34.1
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 1/9] KVM: x86/PMU: Don't release vLBR caused by PMI
2023-09-01 7:28 [PATCH 0/9] Upgrade vPMU version to 5 Xiong Zhang
@ 2023-09-01 7:28 ` Xiong Zhang
2023-09-06 9:47 ` Mi, Dapeng
2023-09-12 11:54 ` Like Xu
2023-09-01 7:28 ` [PATCH 2/9] KVM: x86/pmu: Add Streamlined FREEZE_LBR_ON_PMI for vPMU v4 Xiong Zhang
` (7 subsequent siblings)
8 siblings, 2 replies; 27+ messages in thread
From: Xiong Zhang @ 2023-09-01 7:28 UTC (permalink / raw)
To: kvm
Cc: seanjc, like.xu.linux, zhiyuan.lv, zhenyu.z.wang, kan.liang,
dapeng1.mi, Xiong Zhang
The vLBR event is released at vCPU sched-in time if the LBR_EN bit is
not set in the GUEST_IA32_DEBUGCTL VMCS field. This bit is cleared in
two cases:
1. the guest disables LBR through WRMSR;
2. KVM disables LBR at PMI injection to emulate guest
FREEZE_LBR_ON_PMI.
In the first case guest LBR won't be used anymore and the vLBR event
can be released, but in the second case guest LBR is still in use and
the vLBR event must not be released.
Consider this sequence:
1. A vPMC overflows, KVM injects a vPMI and clears guest LBR_EN.
2. The guest handles the PMI and reads LBR records.
3. The vCPU is scheduled out and later scheduled in; the vLBR event is
released.
4. The guest continues reading LBR records, so KVM creates the vLBR
event again. The vLBR event is now the only LBR user on the host, so
the host PMU driver resets the HW LBR facility at vLBR creation.
5. The guest gets the remaining LBR records in their reset state.
This conflicts with the meaning of FREEZE_LBR_ON_PMI, so the vLBR
event must not be released on PMI.
This commit adds a freeze_on_pmi flag; it is set at PMI injection and
cleared when the guest writes the guest DEBUGCTL MSR. While this flag
is true, the vLBR event will not be released.
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
---
arch/x86/kvm/vmx/pmu_intel.c | 5 ++++-
arch/x86/kvm/vmx/vmx.c | 12 +++++++++---
arch/x86/kvm/vmx/vmx.h | 3 +++
3 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index f2efa0bf7ae8..3a36a91638c6 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -628,6 +628,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
lbr_desc->records.nr = 0;
lbr_desc->event = NULL;
lbr_desc->msr_passthrough = false;
+ lbr_desc->freeze_on_pmi = false;
}
static void intel_pmu_reset(struct kvm_vcpu *vcpu)
@@ -670,6 +671,7 @@ static void intel_pmu_legacy_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu)
if (data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) {
data &= ~DEBUGCTLMSR_LBR;
vmcs_write64(GUEST_IA32_DEBUGCTL, data);
+ vcpu_to_lbr_desc(vcpu)->freeze_on_pmi = true;
}
}
@@ -761,7 +763,8 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
{
- if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
+ if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR) &&
+ !vcpu_to_lbr_desc(vcpu)->freeze_on_pmi)
intel_pmu_release_guest_lbr_event(vcpu);
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e6849f780dba..199d0da1dbee 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2223,9 +2223,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
get_vmcs12(vcpu)->guest_ia32_debugctl = data;
vmcs_write64(GUEST_IA32_DEBUGCTL, data);
- if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
- (data & DEBUGCTLMSR_LBR))
- intel_pmu_create_guest_lbr_event(vcpu);
+
+ if (intel_pmu_lbr_is_enabled(vcpu)) {
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ lbr_desc->freeze_on_pmi = false;
+ if (!lbr_desc->event && (data & DEBUGCTLMSR_LBR))
+ intel_pmu_create_guest_lbr_event(vcpu);
+ }
+
return 0;
}
case MSR_IA32_BNDCFGS:
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c2130d2c8e24..9729ccfa75ae 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -107,6 +107,9 @@ struct lbr_desc {
/* True if LBRs are marked as not intercepted in the MSR bitmap */
bool msr_passthrough;
+
+ /* True if LBR is frozen on PMI */
+ bool freeze_on_pmi;
};
/*
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 2/9] KVM: x86/pmu: Add Streamlined FREEZE_LBR_ON_PMI for vPMU v4
2023-09-01 7:28 [PATCH 0/9] Upgrade vPMU version to 5 Xiong Zhang
2023-09-01 7:28 ` [PATCH 1/9] KVM: x86/PMU: Don't release vLBR caused by PMI Xiong Zhang
@ 2023-09-01 7:28 ` Xiong Zhang
2023-09-06 9:49 ` Mi, Dapeng
2023-09-01 7:28 ` [PATCH 3/9] KVM: x86/pmu: Add PERF_GLOBAL_STATUS_SET MSR emulation Xiong Zhang
` (6 subsequent siblings)
8 siblings, 1 reply; 27+ messages in thread
From: Xiong Zhang @ 2023-09-01 7:28 UTC (permalink / raw)
To: kvm
Cc: seanjc, like.xu.linux, zhiyuan.lv, zhenyu.z.wang, kan.liang,
dapeng1.mi, Xiong Zhang
Arch PMU version 4 adds a streamlined FREEZE_LBR_ON_PMI feature. This
feature adds LBR_FRZ[bit 58] to IA32_PERF_GLOBAL_STATUS; the bit is
set when the following conditions hold:
-- IA32_DEBUGCTL.FREEZE_LBR_ON_PMI has been set
-- A performance counter, configured to generate a PMI, has overflowed
to signal a PMI. Consequently the LBR stack is frozen.
Effectively, this bit also serves as a control to enable capturing
data in the LBR stack. While this bit is set, the LBR stack is frozen
and no new LBR records are filled.
The sequence of streamlined freeze LBR is:
1. The profiling agent sets IA32_DEBUGCTL.FREEZE_LBR_ON_PMI and
enables a performance counter to generate a PMI on overflow.
2. The processor generates a PMI and sets
IA32_PERF_GLOBAL_STATUS.LBR_FRZ; the LBR stack is then frozen.
3. The profiling agent's PMI handler handles the overflow and clears
IA32_PERF_GLOBAL_STATUS.
4. When IA32_PERF_GLOBAL_STATUS.LBR_FRZ is cleared in step 3, the
processor resumes the LBR stack and new LBR records can be filled
again.
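The handshake above can be modeled with a small toy state machine
(purely illustrative; the struct and function names are made up and
this is neither KVM nor SDM code):

```c
#include <assert.h>
#include <stdint.h>

#define GLOBAL_STATUS_LBR_FRZ (1ULL << 58)

/* Toy model of the streamlined LBR freeze handshake. */
struct pmu_model {
	uint64_t global_status;
	int lbr_frozen;		/* 1: LBR stack captures no new records */
};

/* Step 2: an overflow PMI sets LBR_FRZ and freezes the LBR stack. */
static void model_pmi(struct pmu_model *p)
{
	p->global_status |= GLOBAL_STATUS_LBR_FRZ;
	p->lbr_frozen = 1;
}

/*
 * Steps 3-4: the PMI handler clears GLOBAL_STATUS (via OVF_CTRL, a.k.a.
 * STATUS_RESET); clearing LBR_FRZ resumes the LBR stack.
 */
static void model_status_reset(struct pmu_model *p, uint64_t clear_mask)
{
	p->global_status &= ~clear_mask;
	if (clear_mask & GLOBAL_STATUS_LBR_FRZ)
		p->lbr_frozen = 0;
}
```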
In order to emulate this behavior, the LBR stack must be frozen on
PMI. KVM has two choices to do this:
1. KVM stops the vLBR event through perf_event_pause() and puts the
vLBR event into the off state; the vLBR event then loses the LBR HW
resource and the guest can't read LBR records in its PMI handler. This
choice can't be used.
2. KVM clears the guest DEBUGCTLMSR_LBR bit in the VMCS on PMI, so
while the guest is running the LBR HW stack is disabled, but the vLBR
event is still active and owns the LBR HW, so the guest can still read
LBR records in its PMI handler. However, the streamlined freeze LBR
sequence doesn't clear the DEBUGCTLMSR_LBR bit, so when the guest
reads the guest DEBUGCTL MSR, KVM must return a value with the
DEBUGCTLMSR_LBR bit set while LBR is frozen. Once the guest clears
IA32_PERF_GLOBAL_STATUS.LBR_FRZ in step 4, KVM re-enables guest LBR by
setting the guest DEBUGCTL_LBR bit in the VMCS.
Since KVM re-enables guest LBR when the guest clears the global
status, the handling of the GLOBAL_OVF_CTRL MSR is moved from the
common pmu.c into vmx/pmu_intel.c.
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kvm/pmu.c | 8 ------
arch/x86/kvm/vmx/pmu_intel.c | 44 ++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmx.c | 3 +++
4 files changed, 48 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3aedae61af4f..4fce37ae5a90 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1041,6 +1041,7 @@
/* PERF_GLOBAL_OVF_CTL bits */
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT 55
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI (1ULL << MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT)
+#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_LBR_FREEZE BIT_ULL(58)
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_OVF_BUF_BIT 62
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_OVF_BUF (1ULL << MSR_CORE_PERF_GLOBAL_OVF_CTRL_OVF_BUF_BIT)
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD_BIT 63
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index edb89b51b383..4b6a508f3f0b 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -640,14 +640,6 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
reprogram_counters(pmu, diff);
}
break;
- case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
- /*
- * GLOBAL_OVF_CTRL, a.k.a. GLOBAL STATUS_RESET, clears bits in
- * GLOBAL_STATUS, and so the set of reserved bits is the same.
- */
- if (data & pmu->global_status_mask)
- return 1;
- fallthrough;
case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
if (!msr_info->host_initiated)
pmu->global_status &= ~data;
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 3a36a91638c6..ba7695a64ff1 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -426,6 +426,29 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
pmu->pebs_data_cfg = data;
break;
+ case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+ /*
+ * GLOBAL_OVF_CTRL, a.k.a. GLOBAL STATUS_RESET, clears bits in
+ * GLOBAL_STATUS, and so the set of reserved bits is the same.
+ */
+ if (data & pmu->global_status_mask)
+ return 1;
+ if (pmu->version >= 4 && !msr_info->host_initiated &&
+ (data & MSR_CORE_PERF_GLOBAL_OVF_CTRL_LBR_FREEZE)) {
+ u64 debug_ctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
+ struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+ if (!(debug_ctl & DEBUGCTLMSR_LBR) &&
+ lbr_desc->freeze_on_pmi) {
+ debug_ctl |= DEBUGCTLMSR_LBR;
+ vmcs_write64(GUEST_IA32_DEBUGCTL, debug_ctl);
+ lbr_desc->freeze_on_pmi = false;
+ }
+ }
+
+ if (!msr_info->host_initiated)
+ pmu->global_status &= ~data;
+ break;
default:
if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
(pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -565,6 +588,9 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
if (vmx_pt_mode_is_host_guest())
pmu->global_status_mask &=
~MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI;
+ if (pmu->version >= 4)
+ pmu->global_status_mask &=
+ ~MSR_CORE_PERF_GLOBAL_OVF_CTRL_LBR_FREEZE;
entry = kvm_find_cpuid_entry_index(vcpu, 7, 0);
if (entry &&
@@ -675,6 +701,22 @@ static void intel_pmu_legacy_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu)
}
}
+static void intel_pmu_streamlined_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu)
+{
+ u64 data = vmcs_read64(GUEST_IA32_DEBUGCTL);
+ struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+ /*
+ * Unlike legacy LBR freezing, streamlined LBR freezing doesn't clear
+ * LBR_EN. The legacy freezing helper is reused here to freeze the
+ * LBR HW while the guest runs, but the guest will still see a
+ * virtual DEBUGCTL MSR with the LBR_EN bit set.
+ */
+ intel_pmu_legacy_freezing_lbrs_on_pmi(vcpu);
+ if ((data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) && (data & DEBUGCTLMSR_LBR))
+ pmu->global_status |= MSR_CORE_PERF_GLOBAL_OVF_CTRL_LBR_FREEZE;
+}
+
static void intel_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
{
u8 version = vcpu_to_pmu(vcpu)->version;
@@ -684,6 +726,8 @@ static void intel_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
if (version > 1 && version < 4)
intel_pmu_legacy_freezing_lbrs_on_pmi(vcpu);
+ else if (version >= 4)
+ intel_pmu_streamlined_freezing_lbrs_on_pmi(vcpu);
}
static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 199d0da1dbee..3bd64879aab3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2098,6 +2098,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
break;
case MSR_IA32_DEBUGCTLMSR:
msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
+ if (vcpu_to_lbr_desc(vcpu)->freeze_on_pmi &&
+ vcpu_to_pmu(vcpu)->version >= 4)
+ msr_info->data |= DEBUGCTLMSR_LBR;
break;
default:
find_uret_msr:
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 3/9] KVM: x86/pmu: Add PERF_GLOBAL_STATUS_SET MSR emulation
2023-09-01 7:28 [PATCH 0/9] Upgrade vPMU version to 5 Xiong Zhang
2023-09-01 7:28 ` [PATCH 1/9] KVM: x86/PMU: Don't release vLBR caused by PMI Xiong Zhang
2023-09-01 7:28 ` [PATCH 2/9] KVM: x86/pmu: Add Streamlined FREEZE_LBR_ON_PMI for vPMU v4 Xiong Zhang
@ 2023-09-01 7:28 ` Xiong Zhang
2023-09-01 7:28 ` [PATCH 4/9] KVM: x86/pmu: Add MSR_PERF_GLOBAL_INUSE emulation Xiong Zhang
` (5 subsequent siblings)
8 siblings, 0 replies; 27+ messages in thread
From: Xiong Zhang @ 2023-09-01 7:28 UTC (permalink / raw)
To: kvm
Cc: seanjc, like.xu.linux, zhiyuan.lv, zhenyu.z.wang, kan.liang,
dapeng1.mi, Xiong Zhang
The IA32_PERF_GLOBAL_STATUS_SET MSR is introduced with arch PMU
version 4. It allows software to set individual bits in the
IA32_PERF_GLOBAL_STATUS MSR. It can be used by a VMM to virtualize the
state of IA32_PERF_GLOBAL_STATUS across VMs.
If the running VM owns the whole PMU, different VMs will have
different perf global status values, and the VMM needs to restore the
IA32_PERF_GLOBAL_STATUS MSR at VM switch. But IA32_PERF_GLOBAL_STATUS
is read-only, so the VMM can use the IA32_PERF_GLOBAL_STATUS_SET MSR
to restore the VM's PERF_GLOBAL_STATUS value.
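The VM-switch flow described above could look roughly like the
following sketch (the WRMSR stub and function names are invented for
illustration; a real VMM would execute the WRMSR instruction directly):

```c
#include <assert.h>
#include <stdint.h>

#define MSR_CORE_PERF_GLOBAL_OVF_CTRL	0x390
#define MSR_CORE_PERF_GLOBAL_STATUS_SET	0x391

/* Hypothetical WRMSR shim so the sketch is self-contained. */
static uint64_t msr_file[0x1000];
static void wrmsr_stub(uint32_t msr, uint64_t val)
{
	msr_file[msr] = val;
}

/*
 * GLOBAL_STATUS itself is read-only, so on a VM switch the VMM clears
 * the outgoing VM's status bits via OVF_CTRL (a.k.a. STATUS_RESET) and
 * replays the incoming VM's saved status via GLOBAL_STATUS_SET.
 */
static void switch_vm_global_status(uint64_t old_status, uint64_t new_status)
{
	wrmsr_stub(MSR_CORE_PERF_GLOBAL_OVF_CTRL, old_status);
	wrmsr_stub(MSR_CORE_PERF_GLOBAL_STATUS_SET, new_status);
}
```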
This commit adds emulation for this MSR so that an L1 VMM can use it.
Since the MSR is mainly used by a VMM to restore a VM's
PERF_GLOBAL_STATUS at VM switch, there is no need to inject a PMI or
freeze LBRs when the VMM writes it.
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
---
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kvm/vmx/pmu_intel.c | 16 ++++++++++++++++
2 files changed, 17 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 4fce37ae5a90..7c8cf6b53a76 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1035,6 +1035,7 @@
#define MSR_CORE_PERF_GLOBAL_STATUS 0x0000038e
#define MSR_CORE_PERF_GLOBAL_CTRL 0x0000038f
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x00000390
+#define MSR_CORE_PERF_GLOBAL_STATUS_SET 0x00000391
#define MSR_PERF_METRICS 0x00000329
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index ba7695a64ff1..b25df421cd75 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -206,6 +206,8 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
switch (msr) {
case MSR_CORE_PERF_FIXED_CTR_CTRL:
return kvm_pmu_has_perf_global_ctrl(pmu);
+ case MSR_CORE_PERF_GLOBAL_STATUS_SET:
+ return vcpu_to_pmu(vcpu)->version >= 4;
case MSR_IA32_PEBS_ENABLE:
ret = vcpu_get_perf_capabilities(vcpu) & PERF_CAP_PEBS_FORMAT;
break;
@@ -355,6 +357,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_CORE_PERF_FIXED_CTR_CTRL:
msr_info->data = pmu->fixed_ctr_ctrl;
break;
+ case MSR_CORE_PERF_GLOBAL_STATUS_SET:
+ msr_info->data = 0;
+ break;
case MSR_IA32_PEBS_ENABLE:
msr_info->data = pmu->pebs_enable;
break;
@@ -449,6 +454,17 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (!msr_info->host_initiated)
pmu->global_status &= ~data;
break;
+ case MSR_CORE_PERF_GLOBAL_STATUS_SET:
+ /*
+ * GLOBAL_STATUS_SET sets bits in GLOBAL_STATUS, so the
+ * set of reserved bits is the same.
+ */
+ if (data & pmu->global_status_mask)
+ return 1;
+
+ if (!msr_info->host_initiated)
+ pmu->global_status |= data;
+ break;
default:
if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
(pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 4/9] KVM: x86/pmu: Add MSR_PERF_GLOBAL_INUSE emulation
2023-09-01 7:28 [PATCH 0/9] Upgrade vPMU version to 5 Xiong Zhang
` (2 preceding siblings ...)
2023-09-01 7:28 ` [PATCH 3/9] KVM: x86/pmu: Add PERF_GLOBAL_STATUS_SET MSR emulation Xiong Zhang
@ 2023-09-01 7:28 ` Xiong Zhang
2023-09-12 11:41 ` Like Xu
2023-09-01 7:28 ` [PATCH 5/9] KVM: x86/pmu: Check CPUID.0AH.ECX consistency Xiong Zhang
` (4 subsequent siblings)
8 siblings, 1 reply; 27+ messages in thread
From: Xiong Zhang @ 2023-09-01 7:28 UTC (permalink / raw)
To: kvm
Cc: seanjc, like.xu.linux, zhiyuan.lv, zhenyu.z.wang, kan.liang,
dapeng1.mi, Xiong Zhang
Arch PMU v4 introduces a new MSR, IA32_PERF_GLOBAL_INUSE. It provides
an "InUse" bit for each GP counter and fixed counter in the processor.
Additionally, PMI InUse[bit 63] indicates whether the PMI mechanism
has been configured.
Each bit's definition references the Architectural Performance
Monitoring Version 4 section of the SDM.
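As a sketch of how the GP-counter "InUse" bits and the PMI bit derive
from the event-select MSRs (the function name is invented; fixed
counters and PEBS are omitted for brevity):

```c
#include <assert.h>
#include <stdint.h>

#define EVENTSEL_EVENT	0xffULL		/* IA32_PERFEVTSELn[7:0] */
#define EVENTSEL_INT	(1ULL << 20)	/* IA32_PERFEVTSELn.INT */
#define INUSE_PMI	(1ULL << 63)

/* Derive the GP-counter part of PERF_GLOBAL_INUSE from event selects. */
static uint64_t gp_inuse(const uint64_t *eventsel, unsigned int nr_gp)
{
	uint64_t data = 0;
	unsigned int i;

	for (i = 0; i < nr_gp; i++) {
		if (eventsel[i] & EVENTSEL_EVENT)
			data |= 1ULL << i;	/* PERFEVTSELn_InUse */
		if (eventsel[i] & EVENTSEL_INT)
			data |= INUSE_PMI;	/* PMI mechanism in use */
	}
	return data;
}
```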
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
---
arch/x86/include/asm/msr-index.h | 4 +++
arch/x86/kvm/vmx/pmu_intel.c | 58 ++++++++++++++++++++++++++++++++
2 files changed, 62 insertions(+)
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 7c8cf6b53a76..31bb425899fb 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1036,6 +1036,7 @@
#define MSR_CORE_PERF_GLOBAL_CTRL 0x0000038f
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x00000390
#define MSR_CORE_PERF_GLOBAL_STATUS_SET 0x00000391
+#define MSR_CORE_PERF_GLOBAL_INUSE 0x00000392
#define MSR_PERF_METRICS 0x00000329
@@ -1048,6 +1049,9 @@
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD_BIT 63
#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD (1ULL << MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD_BIT)
+/* PERF_GLOBAL_INUSE bits */
+#define MSR_CORE_PERF_GLOBAL_INUSE_PMI BIT_ULL(63)
+
/* Geode defined MSRs */
#define MSR_GEODE_BUSCONT_CONF0 0x00001900
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index b25df421cd75..46363ac82a79 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -207,6 +207,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
case MSR_CORE_PERF_FIXED_CTR_CTRL:
return kvm_pmu_has_perf_global_ctrl(pmu);
case MSR_CORE_PERF_GLOBAL_STATUS_SET:
+ case MSR_CORE_PERF_GLOBAL_INUSE:
return vcpu_to_pmu(vcpu)->version >= 4;
case MSR_IA32_PEBS_ENABLE:
ret = vcpu_get_perf_capabilities(vcpu) & PERF_CAP_PEBS_FORMAT;
@@ -347,6 +348,58 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
return true;
}
+static u64 intel_pmu_global_inuse_emulation(struct kvm_pmu *pmu)
+{
+ u64 data = 0;
+ int i;
+
+ for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
+ struct kvm_pmc *pmc = &pmu->gp_counters[i];
+
+ /*
+ * IA32_PERF_GLOBAL_INUSE.PERFEVTSELn_InUse[bit n]: This bit
+ * reflects the logical state of (IA32_PERFEVTSELn[7:0]),
+ * n < CPUID.0AH.EAX[15:8].
+ */
+ if (pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT)
+ data |= 1 << i;
+ /*
+ * IA32_PERF_GLOBAL_INUSE.PMI_InUse[bit 63]: This bit is set if
+ * IA32_PERFEVTSELn.INT[bit 20], n < CPUID.0AH.EAX[15:8] is set.
+ */
+ if (pmc->eventsel & ARCH_PERFMON_EVENTSEL_INT)
+ data |= MSR_CORE_PERF_GLOBAL_INUSE_PMI;
+ }
+
+ for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
+ /*
+ * IA32_PERF_GLOBAL_INUSE.FCi_InUse[bit (i + 32)]: This bit
+ * reflects the logical state of
+ * IA32_FIXED_CTR_CTRL[i * 4 + 1, i * 4] != 0
+ */
+ if (pmu->fixed_ctr_ctrl &
+ intel_fixed_bits_by_idx(i, INTEL_FIXED_0_KERNEL | INTEL_FIXED_0_USER))
+ data |= 1ULL << (i + INTEL_PMC_IDX_FIXED);
+ /*
+ * IA32_PERF_GLOBAL_INUSE.PMI_InUse[bit 63]: This bit is set if
+ * IA32_FIXED_CTR_CTRL.ENi_PMI, i = 0, 1, 2 is set.
+ */
+ if (pmu->fixed_ctr_ctrl &
+ intel_fixed_bits_by_idx(i, INTEL_FIXED_0_ENABLE_PMI))
+ data |= MSR_CORE_PERF_GLOBAL_INUSE_PMI;
+ }
+
+ /*
+ * IA32_PERF_GLOBAL_INUSE.PMI_InUse[bit 63]: This bit is set if
+ * any IA32_PEBS_ENABLES bit is set, which enables PEBS for a GP or
+ * fixed counter.
+ */
+ if (pmu->pebs_enable)
+ data |= MSR_CORE_PERF_GLOBAL_INUSE_PMI;
+
+ return data;
+}
+
static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -360,6 +413,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_CORE_PERF_GLOBAL_STATUS_SET:
msr_info->data = 0;
break;
+ case MSR_CORE_PERF_GLOBAL_INUSE:
+ msr_info->data = intel_pmu_global_inuse_emulation(pmu);
+ break;
case MSR_IA32_PEBS_ENABLE:
msr_info->data = pmu->pebs_enable;
break;
@@ -409,6 +465,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (pmu->fixed_ctr_ctrl != data)
reprogram_fixed_counters(pmu, data);
break;
+ case MSR_CORE_PERF_GLOBAL_INUSE:
+ return 1; /* RO MSR */
case MSR_IA32_PEBS_ENABLE:
if (data & pmu->pebs_enable_mask)
return 1;
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 5/9] KVM: x86/pmu: Check CPUID.0AH.ECX consistency
2023-09-01 7:28 [PATCH 0/9] Upgrade vPMU version to 5 Xiong Zhang
` (3 preceding siblings ...)
2023-09-01 7:28 ` [PATCH 4/9] KVM: x86/pmu: Add MSR_PERF_GLOBAL_INUSE emulation Xiong Zhang
@ 2023-09-01 7:28 ` Xiong Zhang
2023-09-06 9:44 ` Mi, Dapeng
2023-09-12 11:31 ` Like Xu
2023-09-01 7:28 ` [PATCH 6/9] KVM: x86/pmu: Add Intel PMU supported fixed counters mask Xiong Zhang
` (3 subsequent siblings)
8 siblings, 2 replies; 27+ messages in thread
From: Xiong Zhang @ 2023-09-01 7:28 UTC (permalink / raw)
To: kvm
Cc: seanjc, like.xu.linux, zhiyuan.lv, zhenyu.z.wang, kan.liang,
dapeng1.mi, Xiong Zhang
With Arch PMU V5, CPUID.0AH.ECX indicates the Fixed Counter
enumeration. It is a bit mask which enumerates the supported Fixed
Counters:
FxCtr[i]_is_supported := ECX[i] || (EDX[4:0] > i)
where EDX[4:0] is the number of contiguous fixed-function performance
counters starting from 0 (if version ID > 1).
Here ECX and EDX[4:0] should satisfy the following consistency rules:
1. if 1 < pmu_version < 5, ECX == 0;
2. if pmu_version == 5 && edx[4:0] == 0, ECX[bit 0] == 0;
3. if pmu_version == 5 && edx[4:0] > 0,
ecx & ((1 << edx[4:0]) - 1) == (1 << edx[4:0]) - 1.
Otherwise it is ambiguous whether a fixed counter is supported or not.
E.g. with pmu_version = 5, edx[4:0] = 3, ecx = 0x10, it is hard to
decide whether fixed counters 0 ~ 2 are supported or not.
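The three rules above can be collapsed into one illustrative validity
check (this is not the KVM code; the function name is made up):

```c
#include <assert.h>
#include <stdint.h>

/* Return 0 when CPUID.0AH ECX/EDX are consistent, -1 otherwise. */
static int check_cpuid_0ah_ecx(unsigned int version, uint32_t ecx, uint32_t edx)
{
	unsigned int fixed_count = edx & 0x1f;	/* EDX[4:0] */
	uint32_t low_mask = (1u << fixed_count) - 1;

	if (version > 1 && version < 5)		/* rule 1 */
		return ecx == 0 ? 0 : -1;
	if (version >= 5) {
		if (fixed_count == 0)		/* rule 2 */
			return (ecx & 0x1) ? -1 : 0;
		/* rule 3: the low fixed_count bits of ECX must all be set */
		return (ecx & low_mask) == low_mask ? 0 : -1;
	}
	return 0;
}
```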
User space can call the SET_CPUID2 ioctl to set guest CPUID.0AH; this
commit adds a check to guarantee that the ECX and EDX values specified
by user space are consistent. When user space specifies inconsistent
values, KVM could either return an error and drop the setting, or
correct the inconsistency and accept the corrected data; this commit
chooses to return an error.
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
---
arch/x86/kvm/cpuid.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index e961e9a05847..95dc5e8847e0 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -150,6 +150,33 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu,
return -EINVAL;
}
+ best = cpuid_entry2_find(entries, nent, 0xa,
+ KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+ if (best && vcpu->kvm->arch.enable_pmu) {
+ union cpuid10_eax eax;
+ union cpuid10_edx edx;
+
+ eax.full = best->eax;
+ edx.full = best->edx;
+
+ if (eax.split.version_id > 1 &&
+ eax.split.version_id < 5 &&
+ best->ecx != 0) {
+ return -EINVAL;
+ } else if (eax.split.version_id >= 5) {
+ int fixed_count = edx.split.num_counters_fixed;
+
+ if (fixed_count == 0 && (best->ecx & 0x1)) {
+ return -EINVAL;
+ } else if (fixed_count > 0) {
+ int low_fixed_mask = (1 << fixed_count) - 1;
+
+ if ((best->ecx & low_fixed_mask) != low_fixed_mask)
+ return -EINVAL;
+ }
+ }
+ }
+
/*
* Exposing dynamic xfeatures to the guest requires additional
* enabling in the FPU, e.g. to expand the guest XSAVE state size.
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 6/9] KVM: x86/pmu: Add Intel PMU supported fixed counters mask
2023-09-01 7:28 [PATCH 0/9] Upgrade vPMU version to 5 Xiong Zhang
` (4 preceding siblings ...)
2023-09-01 7:28 ` [PATCH 5/9] KVM: x86/pmu: Check CPUID.0AH.ECX consistency Xiong Zhang
@ 2023-09-01 7:28 ` Xiong Zhang
2023-09-06 10:08 ` Mi, Dapeng
2023-09-01 7:28 ` [PATCH 7/9] KVM: x86/pmu: Add fixed counter enumeration for pmu v5 Xiong Zhang
` (2 subsequent siblings)
8 siblings, 1 reply; 27+ messages in thread
From: Xiong Zhang @ 2023-09-01 7:28 UTC (permalink / raw)
To: kvm
Cc: seanjc, like.xu.linux, zhiyuan.lv, zhenyu.z.wang, kan.liang,
dapeng1.mi, Like Xu, Xiong Zhang
From: Like Xu <likexu@tencent.com>
Per the Intel SDM, fixed-function performance counter 'i' is
supported if:
FxCtr[i]_is_supported := ECX[i] || (EDX[4:0] > i);
If pmu.version >= 5, ECX is the supported fixed counters bit mask.
If 1 < pmu.version < 5, EDX[4:0] is the number of contiguous
fixed-function performance counters starting from 0.
This means that KVM user space can use EDX to limit the number of
fixed counters starting from 0 and, at the same time, use ECX to
enable some of the other KVM-supported fixed counters. E.g.:
pmu.version = 5, ECX = 0x5, EDX[4:0] = 1: FxCtr[0] and FxCtr[2] are
supported while FxCtr[1] isn't.
Add the fixed counter bit mask into all_valid_pmc_idx and use it to
perform the semantic checks.
Since fixed counters may be non-contiguous, nr_arch_fixed_counters can
no longer be used to enumerate them; for_each_set_bit_from() is used
to enumerate fixed counters instead, and nr_arch_fixed_counters is
deleted.
Signed-off-by: Like Xu <likexu@tencent.com>
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/pmu.h | 12 +++++-
arch/x86/kvm/svm/pmu.c | 1 -
arch/x86/kvm/vmx/pmu_intel.c | 69 ++++++++++++++++++++-------------
4 files changed, 52 insertions(+), 31 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9f57aa33798b..ceba4f89dec5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -515,7 +515,6 @@ struct kvm_pmc {
struct kvm_pmu {
u8 version;
unsigned nr_arch_gp_counters;
- unsigned nr_arch_fixed_counters;
unsigned available_event_types;
u64 fixed_ctr_ctrl;
u64 fixed_ctr_ctrl_mask;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 7d9ba301c090..4bab4819ea6c 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -125,14 +125,22 @@ static inline struct kvm_pmc *get_gp_pmc(struct kvm_pmu *pmu, u32 msr,
return NULL;
}
+static inline bool fixed_ctr_is_supported(struct kvm_pmu *pmu, unsigned int idx)
+{
+ return test_bit(INTEL_PMC_IDX_FIXED + idx, pmu->all_valid_pmc_idx);
+}
+
/* returns fixed PMC with the specified MSR */
static inline struct kvm_pmc *get_fixed_pmc(struct kvm_pmu *pmu, u32 msr)
{
int base = MSR_CORE_PERF_FIXED_CTR0;
- if (msr >= base && msr < base + pmu->nr_arch_fixed_counters) {
+ if (msr >= base && msr < base + KVM_PMC_MAX_FIXED) {
u32 index = array_index_nospec(msr - base,
- pmu->nr_arch_fixed_counters);
+ KVM_PMC_MAX_FIXED);
+
+ if (!fixed_ctr_is_supported(pmu, index))
+ return NULL;
return &pmu->fixed_counters[index];
}
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index cef5a3d0abd0..d0a12e739989 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -213,7 +213,6 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->raw_event_mask = AMD64_RAW_EVENT_MASK;
/* not applicable to AMD; but clean them to prevent any fall out */
pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
- pmu->nr_arch_fixed_counters = 0;
bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
}
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 46363ac82a79..ce6d06ec562c 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -72,10 +72,12 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
{
struct kvm_pmc *pmc;
u8 old_fixed_ctr_ctrl = pmu->fixed_ctr_ctrl;
- int i;
+ int s = INTEL_PMC_IDX_FIXED;
pmu->fixed_ctr_ctrl = data;
- for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
+ for_each_set_bit_from(s, pmu->all_valid_pmc_idx,
+ INTEL_PMC_IDX_FIXED + INTEL_PMC_MAX_FIXED) {
+ int i = s - INTEL_PMC_IDX_FIXED;
u8 new_ctrl = fixed_ctrl_field(data, i);
u8 old_ctrl = fixed_ctrl_field(old_fixed_ctr_ctrl, i);
@@ -132,7 +134,7 @@ static bool intel_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
idx &= ~(3u << 30);
- return fixed ? idx < pmu->nr_arch_fixed_counters
+ return fixed ? fixed_ctr_is_supported(pmu, idx)
: idx < pmu->nr_arch_gp_counters;
}
@@ -144,16 +146,17 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
struct kvm_pmc *counters;
unsigned int num_counters;
+ if (!intel_is_valid_rdpmc_ecx(vcpu, idx))
+ return NULL;
+
idx &= ~(3u << 30);
if (fixed) {
counters = pmu->fixed_counters;
- num_counters = pmu->nr_arch_fixed_counters;
+ num_counters = KVM_PMC_MAX_FIXED;
} else {
counters = pmu->gp_counters;
num_counters = pmu->nr_arch_gp_counters;
}
- if (idx >= num_counters)
- return NULL;
*mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
return &counters[array_index_nospec(idx, num_counters)];
}
@@ -352,6 +355,7 @@ static u64 intel_pmu_global_inuse_emulation(struct kvm_pmu *pmu)
{
u64 data = 0;
int i;
+ int s = INTEL_PMC_IDX_FIXED;
for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
struct kvm_pmc *pmc = &pmu->gp_counters[i];
@@ -371,7 +375,10 @@ static u64 intel_pmu_global_inuse_emulation(struct kvm_pmu *pmu)
data |= MSR_CORE_PERF_GLOBAL_INUSE_PMI;
}
- for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
+ for_each_set_bit_from(s, pmu->all_valid_pmc_idx,
+ INTEL_PMC_IDX_FIXED + INTEL_PMC_MAX_FIXED) {
+ i = s - INTEL_PMC_IDX_FIXED;
+
/*
* IA32_PERF_GLOBAL_INUSE.FCi_InUse[bit (i + 32)]: This bit
* reflects the logical state of
@@ -379,7 +386,7 @@ static u64 intel_pmu_global_inuse_emulation(struct kvm_pmu *pmu)
*/
if (pmu->fixed_ctr_ctrl &
intel_fixed_bits_by_idx(i, INTEL_FIXED_0_KERNEL | INTEL_FIXED_0_USER))
- data |= 1ULL << (i + INTEL_PMC_IDX_FIXED);
+ data |= 1ULL << s;
/*
* IA32_PERF_GLOBAL_INUSE.PMI_InUse[bit 63]: This bit is set if
* IA32_FIXED_CTR_CTRL.ENi_PMI, i = 0, 1, 2 is set.
@@ -565,12 +572,14 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
static void setup_fixed_pmc_eventsel(struct kvm_pmu *pmu)
{
- int i;
+ int s = INTEL_PMC_IDX_FIXED;
BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_events) != KVM_PMC_MAX_FIXED);
- for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
- int index = array_index_nospec(i, KVM_PMC_MAX_FIXED);
+ for_each_set_bit_from(s, pmu->all_valid_pmc_idx,
+ INTEL_PMC_IDX_FIXED + INTEL_PMC_MAX_FIXED) {
+ int index = array_index_nospec(s - INTEL_PMC_IDX_FIXED,
+ KVM_PMC_MAX_FIXED);
struct kvm_pmc *pmc = &pmu->fixed_counters[index];
u32 event = fixed_pmc_events[index];
@@ -591,7 +600,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
int i;
pmu->nr_arch_gp_counters = 0;
- pmu->nr_arch_fixed_counters = 0;
pmu->counter_bitmask[KVM_PMC_GP] = 0;
pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
pmu->version = 0;
@@ -633,11 +641,22 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->available_event_types = ~entry->ebx &
((1ull << eax.split.mask_length) - 1);
- if (pmu->version == 1) {
- pmu->nr_arch_fixed_counters = 0;
- } else {
- pmu->nr_arch_fixed_counters = min_t(int, edx.split.num_counters_fixed,
- kvm_pmu_cap.num_counters_fixed);
+ counter_mask = ~(BIT_ULL(pmu->nr_arch_gp_counters) - 1);
+ bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
+
+ if (pmu->version > 1) {
+ for (i = 0; i < kvm_pmu_cap.num_counters_fixed; i++) {
+ /*
+ * FxCtr[i]_is_supported :=
+ * CPUID.0xA.ECX[i] || EDX[4:0] > i
+ */
+ if (!(entry->ecx & BIT_ULL(i) || edx.split.num_counters_fixed > i))
+ continue;
+
+ set_bit(INTEL_PMC_IDX_FIXED + i, pmu->all_valid_pmc_idx);
+ pmu->fixed_ctr_ctrl_mask &= ~(0xbull << (i * 4));
+ counter_mask &= ~BIT_ULL(INTEL_PMC_IDX_FIXED + i);
+ }
edx.split.bit_width_fixed = min_t(int, edx.split.bit_width_fixed,
kvm_pmu_cap.bit_width_fixed);
pmu->counter_bitmask[KVM_PMC_FIXED] =
@@ -645,10 +664,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
setup_fixed_pmc_eventsel(pmu);
}
- for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
- pmu->fixed_ctr_ctrl_mask &= ~(0xbull << (i * 4));
- counter_mask = ~(((1ull << pmu->nr_arch_gp_counters) - 1) |
- (((1ull << pmu->nr_arch_fixed_counters) - 1) << INTEL_PMC_IDX_FIXED));
pmu->global_ctrl_mask = counter_mask;
/*
@@ -674,11 +689,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->raw_event_mask |= (HSW_IN_TX|HSW_IN_TX_CHECKPOINTED);
}
- bitmap_set(pmu->all_valid_pmc_idx,
- 0, pmu->nr_arch_gp_counters);
- bitmap_set(pmu->all_valid_pmc_idx,
- INTEL_PMC_MAX_GENERIC, pmu->nr_arch_fixed_counters);
-
perf_capabilities = vcpu_get_perf_capabilities(vcpu);
if (cpuid_model_is_consistent(vcpu) &&
(perf_capabilities & PMU_CAP_LBR_FMT))
@@ -691,9 +701,14 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
if (perf_capabilities & PERF_CAP_PEBS_FORMAT) {
if (perf_capabilities & PERF_CAP_PEBS_BASELINE) {
+ int s = INTEL_PMC_IDX_FIXED;
+ int e = INTEL_PMC_IDX_FIXED + INTEL_PMC_MAX_FIXED;
+
pmu->pebs_enable_mask = counter_mask;
pmu->reserved_bits &= ~ICL_EVENTSEL_ADAPTIVE;
- for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
+
+ for_each_set_bit_from(s, pmu->all_valid_pmc_idx, e) {
+ i = s - INTEL_PMC_IDX_FIXED;
pmu->fixed_ctr_ctrl_mask &=
~(1ULL << (INTEL_PMC_IDX_FIXED + i * 4));
}
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 7/9] KVM: x86/pmu: Add fixed counter enumeration for pmu v5
2023-09-01 7:28 [PATCH 0/9] Upgrade vPMU version to 5 Xiong Zhang
` (5 preceding siblings ...)
2023-09-01 7:28 ` [PATCH 6/9] KVM: x86/pmu: Add Intel PMU supported fixed counters mask Xiong Zhang
@ 2023-09-01 7:28 ` Xiong Zhang
2023-09-12 11:24 ` Like Xu
2023-09-01 7:28 ` [PATCH 8/9] KVM: x86/pmu: Upgrade pmu version to 5 on intel processor Xiong Zhang
2023-09-01 7:28 ` [PATCH 9/9] KVM: selftests: Add fixed counters enumeration test case Xiong Zhang
8 siblings, 1 reply; 27+ messages in thread
From: Xiong Zhang @ 2023-09-01 7:28 UTC (permalink / raw)
To: kvm
Cc: seanjc, like.xu.linux, zhiyuan.lv, zhenyu.z.wang, kan.liang,
dapeng1.mi, Xiong Zhang
With Arch PMU v5, CPUID.0AH.ECX is a bit mask which enumerates the
supported Fixed Counters. If bit 'i' is set, it implies that Fixed
Counter 'i' is supported.
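CPUID.0AH.ECX works together with EDX[4:0] (the contiguous fixed-counter
count); per the SDM rule quoted elsewhere in this series, FxCtr[i] is
supported iff ECX[i] || EDX[4:0] > i. A minimal sketch of that rule (a
hypothetical helper for illustration, not KVM code):

```c
#include <stdint.h>

/*
 * Illustrative sketch of the v5 enumeration rule:
 *   FxCtr[i]_is_supported := CPUID.0AH.ECX[i] || (CPUID.0AH.EDX[4:0] > i)
 * The name and shape are made up for this example.
 */
static uint32_t supported_fixed_ctr_mask(uint32_t ecx, uint32_t edx)
{
	uint32_t contiguous = edx & 0x1f;	/* EDX[4:0] */
	uint32_t mask = 0;

	for (uint32_t i = 0; i < 32; i++)
		if ((ecx & (1u << i)) || contiguous > i)
			mask |= 1u << i;
	return mask;
}
```

With ECX = 0x5 and EDX[4:0] = 1 this yields 0x5: fixed counters 0 and 2
are supported, counter 1 is not.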
This commit adds CPUID.0AH.ECX emulation for vPMU version 5. By default,
KVM enumerates the supported Fixed Counters contiguously starting from 0;
user space can modify the mask through the SET_CPUID2 ioctl.
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
---
arch/x86/kvm/cpuid.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 95dc5e8847e0..2bffed010c9e 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1028,7 +1028,10 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->eax = eax.full;
entry->ebx = kvm_pmu_cap.events_mask;
- entry->ecx = 0;
+ if (kvm_pmu_cap.version < 5)
+ entry->ecx = 0;
+ else
+ entry->ecx = (1ULL << kvm_pmu_cap.num_counters_fixed) - 1;
entry->edx = edx.full;
break;
}
--
2.34.1
* [PATCH 8/9] KVM: x86/pmu: Upgrade pmu version to 5 on intel processor
2023-09-01 7:28 [PATCH 0/9] Upgrade vPMU version to 5 Xiong Zhang
` (6 preceding siblings ...)
2023-09-01 7:28 ` [PATCH 7/9] KVM: x86/pmu: Add fixed counter enumeration for pmu v5 Xiong Zhang
@ 2023-09-01 7:28 ` Xiong Zhang
2023-09-12 11:19 ` Like Xu
2023-09-01 7:28 ` [PATCH 9/9] KVM: selftests: Add fixed counters enumeration test case Xiong Zhang
8 siblings, 1 reply; 27+ messages in thread
From: Xiong Zhang @ 2023-09-01 7:28 UTC (permalink / raw)
To: kvm
Cc: seanjc, like.xu.linux, zhiyuan.lv, zhenyu.z.wang, kan.liang,
dapeng1.mi, Xiong Zhang
Modern Intel processors support Architectural Performance Monitoring
version 5. This commit upgrades an Intel vCPU's vPMU version from 2 to 5.
Going through the PMU features from version 3 to 5, the following
features are not supported:
1. AnyThread counting: it is added in v3, and deprecated in v5.
2. Streamlined Freeze_PerfMon_On_PMI in v4: since the legacy
Freeze_PerfMon_On_PMI isn't supported, the new one won't be supported
either.
3. IA32_PERF_GLOBAL_STATUS.ASCI[bit 60]: related to SGX; it will be
emulated by the SGX developers later.
4. Domain Separation in v5: when the INV flag in IA32_PERFEVTSELx is
used, a counter stops counting when the logical processor exits the C0
ACPI C-state. First, the guest INV flag isn't supported; second, the
guest's ACPI C-state is ill-defined.
When a guest enables an unsupported feature through WRMSR, KVM injects
a #GP into the guest.
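The #GP behavior described above amounts to a reserved-bit check on
WRMSR; a minimal sketch under hypothetical names (not KVM's actual code):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: a WRMSR to a PMU MSR is rejected (and the
 * hypervisor injects #GP) when it sets any bit the vPMU does not
 * support, i.e. any bit in the reserved mask.
 */
static bool pmu_wrmsr_ok(uint64_t data, uint64_t reserved_mask)
{
	return (data & reserved_mask) == 0;
}
```

In KVM, returning non-zero from the MSR-write path is what ultimately
causes the #GP injection into the guest.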
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
---
arch/x86/kvm/pmu.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 4bab4819ea6c..8e6bc9b1a747 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -215,7 +215,10 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
return;
}
- kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2);
+ if (is_intel)
+ kvm_pmu_cap.version = min(kvm_pmu_cap.version, 5);
+ else
+ kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2);
kvm_pmu_cap.num_counters_gp = min(kvm_pmu_cap.num_counters_gp,
pmu_ops->MAX_NR_GP_COUNTERS);
kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed,
--
2.34.1
* [PATCH 9/9] KVM: selftests: Add fixed counters enumeration test case
2023-09-01 7:28 [PATCH 0/9] Upgrade vPMU version to 5 Xiong Zhang
` (7 preceding siblings ...)
2023-09-01 7:28 ` [PATCH 8/9] KVM: x86/pmu: Upgrade pmu version to 5 on intel processor Xiong Zhang
@ 2023-09-01 7:28 ` Xiong Zhang
2023-09-11 3:03 ` Mi, Dapeng
8 siblings, 1 reply; 27+ messages in thread
From: Xiong Zhang @ 2023-09-01 7:28 UTC (permalink / raw)
To: kvm
Cc: seanjc, like.xu.linux, zhiyuan.lv, zhenyu.z.wang, kan.liang,
dapeng1.mi, Xiong Zhang
vPMU v5 adds fixed counter enumeration, which allows user space to
specify which fixed counters are supported through emulated
CPUID.0Ah.ECX.
This commit adds a test case which marks only the max fixed counter as
supported, so the guest can access only that counter; a #GP exception is
raised when the guest accesses any other fixed counter.
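The CPUID manipulation the test performs can be sketched as follows
(a hypothetical helper for illustration; the real test below edits the
vCPU's CPUID entry fields directly):

```c
#include <stdint.h>

/*
 * Sketch: given n fixed counters, enumerate only the highest one via
 * the ECX bitmap and clear the contiguous count in EDX[4:0], so every
 * other fixed counter becomes unsupported.
 */
static void keep_only_max_fixed_counter(uint32_t *ecx, uint32_t *edx,
					uint32_t n)
{
	*ecx = 1u << (n - 1);	/* only fixed counter n-1 in the bitmap */
	*edx &= ~0x1fu;		/* EDX[4:0] = 0: no contiguous counters */
}
```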
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
---
.../selftests/kvm/x86_64/vmx_pmu_caps_test.c | 84 +++++++++++++++++++
1 file changed, 84 insertions(+)
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c b/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c
index ebbcb0a3f743..e37dc39164fe 100644
--- a/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c
+++ b/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c
@@ -18,6 +18,8 @@
#include "kvm_util.h"
#include "vmx.h"
+uint8_t fixed_counter_num;
+
union perf_capabilities {
struct {
u64 lbr_format:6;
@@ -233,6 +235,86 @@ static void test_lbr_perf_capabilities(union perf_capabilities host_cap)
kvm_vm_free(vm);
}
+static void guest_v5_code(void)
+{
+ uint8_t vector, i;
+ uint64_t val;
+
+ for (i = 0; i < fixed_counter_num; i++) {
+ vector = rdmsr_safe(MSR_CORE_PERF_FIXED_CTR0 + i, &val);
+
+ /*
+ * Only the max fixed counter is supported, #GP will be generated
+ * when guest access other fixed counters.
+ */
+ if (i == fixed_counter_num - 1)
+ __GUEST_ASSERT(vector != GP_VECTOR,
+ "Max Fixed counter is accessible, but get #GP");
+ else
+ __GUEST_ASSERT(vector == GP_VECTOR,
+ "Fixed counter isn't accessible, but access is ok");
+ }
+
+ GUEST_DONE();
+}
+
+#define PMU_NR_FIXED_COUNTERS_MASK 0x1f
+
+static void test_fixed_counter_enumeration(void)
+{
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+ int r;
+ struct kvm_cpuid_entry2 *ent;
+ struct ucall uc;
+ uint32_t fixed_counter_bit_mask;
+
+ if (kvm_cpu_property(X86_PROPERTY_PMU_VERSION) != 5)
+ return;
+
+ vm = vm_create_with_one_vcpu(&vcpu, guest_v5_code);
+ vm_init_descriptor_tables(vm);
+ vcpu_init_descriptor_tables(vcpu);
+
+ ent = vcpu_get_cpuid_entry(vcpu, 0xa);
+ fixed_counter_num = ent->edx & PMU_NR_FIXED_COUNTERS_MASK;
+ TEST_ASSERT(fixed_counter_num > 0, "fixed counter isn't supported");
+ fixed_counter_bit_mask = (1ul << fixed_counter_num) - 1;
+ TEST_ASSERT(ent->ecx == fixed_counter_bit_mask,
+ "cpuid.0xa.ecx != %x", fixed_counter_bit_mask);
+
+ /* Fixed counter 0 isn't in ecx, but in edx, set_cpuid should be error. */
+ ent->ecx &= ~0x1;
+ r = __vcpu_set_cpuid(vcpu);
+ TEST_ASSERT(r, "Setting in-consistency cpuid.0xa.ecx and edx success");
+
+ if (fixed_counter_num == 1) {
+ kvm_vm_free(vm);
+ return;
+ }
+
+ /* Support the max Fixed Counter only */
+ ent->ecx = 1UL << (fixed_counter_num - 1);
+ ent->edx &= ~(u32)PMU_NR_FIXED_COUNTERS_MASK;
+
+ r = __vcpu_set_cpuid(vcpu);
+ TEST_ASSERT(!r, "Setting modified cpuid.0xa.ecx and edx failed");
+
+ vcpu_run(vcpu);
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ break;
+ case UCALL_DONE:
+ break;
+ default:
+ TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
+ }
+
+ kvm_vm_free(vm);
+}
+
int main(int argc, char *argv[])
{
union perf_capabilities host_cap;
@@ -253,4 +335,6 @@ int main(int argc, char *argv[])
test_immutable_perf_capabilities(host_cap);
test_guest_wrmsr_perf_capabilities(host_cap);
test_lbr_perf_capabilities(host_cap);
+
+ test_fixed_counter_enumeration();
}
--
2.34.1
* Re: [PATCH 5/9] KVM: x86/pmu: Check CPUID.0AH.ECX consistency
2023-09-01 7:28 ` [PATCH 5/9] KVM: x86/pmu: Check CPUID.0AH.ECX consistency Xiong Zhang
@ 2023-09-06 9:44 ` Mi, Dapeng
2023-09-12 0:45 ` Zhang, Xiong Y
2023-09-12 11:31 ` Like Xu
1 sibling, 1 reply; 27+ messages in thread
From: Mi, Dapeng @ 2023-09-06 9:44 UTC (permalink / raw)
To: Xiong Zhang, kvm
Cc: seanjc, like.xu.linux, zhiyuan.lv, zhenyu.z.wang, kan.liang
On 9/1/2023 3:28 PM, Xiong Zhang wrote:
> With Arch PMU V5, register CPUID.0AH.ECX indicates Fixed Counter
> enumeration. It is a bit mask which enumerates the supported Fixed
> counters.
> FxCtrl[i]_is_supported := ECX[i] || (EDX[4:0] > i)
> where EDX[4:0] is Number of continuous fixed-function performance
> counters starting from 0 (if version ID > 1).
>
> Here ECX and EDX[4:0] should satisfy the following consistency:
> 1. if 1 < pmu_version < 5, ECX == 0;
> 2. if pmu_version == 5 && edx[4:0] == 0, ECX[bit 0] == 0
> 3. if pmu_version == 5 && edx[4:0] > 0,
> ecx & ((1 << edx[4:0]) - 1) == (1 << edx[4:0]) -1
>
> Otherwise it is a mess to decide whether a fixed counter is supported
> or not. e.g. with pmu_version = 5, edx[4:0] = 3, ecx = 0x10, it is hard
> to decide whether fixed counters 0 ~ 2 are supported or not.
>
> User space can call the SET_CPUID2 ioctl to set guest CPUID.0AH; this
> commit adds a check to guarantee the consistency of the ecx and edx
> values specified by the user.
>
> Once the user specifies an inconsistent value, KVM can either return
> an error to the user and drop the setting, or correct the inconsistent
> data and accept the corrected value; this commit chooses to return an
> error to the user.
>
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
> arch/x86/kvm/cpuid.c | 27 +++++++++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index e961e9a05847..95dc5e8847e0 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -150,6 +150,33 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu,
> return -EINVAL;
> }
>
> + best = cpuid_entry2_find(entries, nent, 0xa,
> + KVM_CPUID_INDEX_NOT_SIGNIFICANT);
> + if (best && vcpu->kvm->arch.enable_pmu) {
> + union cpuid10_eax eax;
> + union cpuid10_edx edx;
Remove the redundant space before edx.
> +
> + eax.full = best->eax;
> + edx.full = best->edx;
> +
We may add SDM quotes as comments here. That would make it easier for
readers to understand the logic.
> + if (eax.split.version_id > 1 &&
> + eax.split.version_id < 5 &&
> + best->ecx != 0) {
> + return -EINVAL;
> + } else if (eax.split.version_id >= 5) {
> + int fixed_count = edx.split.num_counters_fixed;
> +
> + if (fixed_count == 0 && (best->ecx & 0x1)) {
> + return -EINVAL;
> + } else if (fixed_count > 0) {
> + int low_fixed_mask = (1 << fixed_count) - 1;
> +
> + if ((best->ecx & low_fixed_mask) != low_fixed_mask)
> + return -EINVAL;
> + }
> + }
> + }
> +
> /*
> * Exposing dynamic xfeatures to the guest requires additional
> * enabling in the FPU, e.g. to expand the guest XSAVE state size.
* Re: [PATCH 1/9] KVM: x86/PMU: Don't release vLBR caused by PMI
2023-09-01 7:28 ` [PATCH 1/9] KVM: x86/PMU: Don't release vLBR caused by PMI Xiong Zhang
@ 2023-09-06 9:47 ` Mi, Dapeng
2023-09-12 11:54 ` Like Xu
1 sibling, 0 replies; 27+ messages in thread
From: Mi, Dapeng @ 2023-09-06 9:47 UTC (permalink / raw)
To: Xiong Zhang, kvm
Cc: seanjc, like.xu.linux, zhiyuan.lv, zhenyu.z.wang, kan.liang
On 9/1/2023 3:28 PM, Xiong Zhang wrote:
> The vLBR event will be released at vCPU sched-in time if the LBR_EN bit
> is not set in the GUEST_IA32_DEBUGCTL VMCS field. This bit is cleared
> in two cases:
> 1. the guest disables LBR through WRMSR
> 2. KVM disables LBR at PMI injection to emulate guest FREEZE_LBR_ON_PMI.
>
> In the first case guest LBR won't be used anymore, so the vLBR event
> can be released; but guest LBR is still in use in the second case, so
> the vLBR event can not be released.
>
> Consider this sequence:
> 1. A vPMC overflows; KVM injects a vPMI and clears guest LBR_EN.
> 2. The guest handles the PMI and reads LBR records.
> 3. The vCPU is sched-out, later sched-in; the vLBR event is released.
> 4. The guest continues reading LBR records; KVM creates the vLBR event
> again. The vLBR event is now the only LBR user on the host, so the host
> PMU driver resets the HW LBR facility at vLBR creation.
> 5. The guest gets the remaining LBR records in a reset state.
> This conflicts with the meaning of FREEZE_LBR_ON_PMI, so the vLBR event
> can not be released on PMI.
>
> This commit adds a freeze_on_pmi flag; it is set at PMI injection and
> cleared when the guest writes the guest DEBUGCTL MSR. If this flag is
> true, the vLBR event will not be released.
>
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
> arch/x86/kvm/vmx/pmu_intel.c | 5 ++++-
> arch/x86/kvm/vmx/vmx.c | 12 +++++++++---
> arch/x86/kvm/vmx/vmx.h | 3 +++
> 3 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index f2efa0bf7ae8..3a36a91638c6 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -628,6 +628,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
> lbr_desc->records.nr = 0;
> lbr_desc->event = NULL;
> lbr_desc->msr_passthrough = false;
> + lbr_desc->freeze_on_pmi = false;
> }
>
> static void intel_pmu_reset(struct kvm_vcpu *vcpu)
> @@ -670,6 +671,7 @@ static void intel_pmu_legacy_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu)
> if (data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) {
> data &= ~DEBUGCTLMSR_LBR;
> vmcs_write64(GUEST_IA32_DEBUGCTL, data);
> + vcpu_to_lbr_desc(vcpu)->freeze_on_pmi = true;
> }
> }
>
> @@ -761,7 +763,8 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
>
> static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
> {
> - if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
> + if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR) &&
> + !vcpu_to_lbr_desc(vcpu)->freeze_on_pmi)
> intel_pmu_release_guest_lbr_event(vcpu);
> }
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e6849f780dba..199d0da1dbee 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2223,9 +2223,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> get_vmcs12(vcpu)->guest_ia32_debugctl = data;
>
> vmcs_write64(GUEST_IA32_DEBUGCTL, data);
> - if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
> - (data & DEBUGCTLMSR_LBR))
> - intel_pmu_create_guest_lbr_event(vcpu);
> +
> + if (intel_pmu_lbr_is_enabled(vcpu)) {
> + struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
> +
> + lbr_desc->freeze_on_pmi = false;
> + if (!lbr_desc->event && (data & DEBUGCTLMSR_LBR))
> + intel_pmu_create_guest_lbr_event(vcpu);
> + }
> +
> return 0;
> }
> case MSR_IA32_BNDCFGS:
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index c2130d2c8e24..9729ccfa75ae 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -107,6 +107,9 @@ struct lbr_desc {
>
> /* True if LBRs are marked as not intercepted in the MSR bitmap */
> bool msr_passthrough;
> +
> + /* True if LBR is frozen on PMI */
> + bool freeze_on_pmi;
> };
>
> /*
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
* Re: [PATCH 2/9] KVM: x86/pmu: Add Streamlined FREEZE_LBR_ON_PMI for vPMU v4
2023-09-01 7:28 ` [PATCH 2/9] KVM: x86/pmu: Add Streamlined FREEZE_LBR_ON_PMI for vPMU v4 Xiong Zhang
@ 2023-09-06 9:49 ` Mi, Dapeng
0 siblings, 0 replies; 27+ messages in thread
From: Mi, Dapeng @ 2023-09-06 9:49 UTC (permalink / raw)
To: Xiong Zhang, kvm
Cc: seanjc, like.xu.linux, zhiyuan.lv, zhenyu.z.wang, kan.liang
On 9/1/2023 3:28 PM, Xiong Zhang wrote:
> Arch PMU version 4 adds a streamlined FREEZE_LBR_ON_PMI feature. This
> feature adds LBR_FRZ[bit 58] to IA32_PERF_GLOBAL_STATUS; this bit is
> set when the following conditions hold:
> -- IA32_DEBUGCTL.FREEZE_LBR_ON_PMI has been set
> -- A performance counter, configured to generate PMI, has overflowed to
> signal a PMI. Consequently the LBR stack is frozen.
> Effectively, this bit also serves as a control to enable capturing
> data in the LBR stack. When this bit is set, the LBR stack is frozen,
> and new LBR records won't be filled.
>
> The sequence of streamlined freeze LBR is:
> 1. Profiling agent set IA32_DEBUGCTL.FREEZE_LBR_ON_PMI, and enable
> a performance counter to generate PMI on overflow.
> 2. The processor generates a PMI and sets
> IA32_PERF_GLOBAL_STATUS.LBR_FRZ; the LBR stack is then frozen.
> 3. Profiling agent PMI handler handles overflow, and clears
> IA32_PERF_GLOBAL_STATUS.
> 4. When IA32_PERF_GLOBAL_STATUS.LBR_FRZ is cleared in step 3, the
> processor resumes the LBR stack, and new LBR records can be filled
> again.
>
> In order to emulate this behavior, the LBR stack must be frozen on PMI.
> KVM has two choices to do this:
> 1. KVM stops the vLBR event through perf_event_pause() and puts the
> vLBR event into the off state; the vLBR event then loses the LBR HW
> resource, so the guest couldn't read LBR records in its PMI handler.
> This choice can't be used.
> 2. KVM clears the guest DEBUGCTLMSR_LBR bit in the VMCS on PMI, so
> while the guest is running the LBR HW stack is disabled, but the vLBR
> event is still active and owns the LBR HW, so the guest can still read
> LBR records in its PMI handler. However, the streamlined freeze-LBR
> sequence doesn't clear the DEBUGCTLMSR_LBR bit, so when the guest reads
> the guest DEBUGCTL MSR during LBR freezing, KVM will return a value
> with the DEBUGCTLMSR_LBR bit set. Once the guest clears
> IA32_PERF_GLOBAL_STATUS.LBR_FRZ in step 4, KVM will re-enable guest LBR
> by setting the guest DEBUGCTL_LBR bit in the VMCS.
>
> As KVM will re-enable guest LBR when guest clears global status, the
> handling of GLOBAL_OVF_CTRL MSR is moved from common pmu.c into
> vmx/pmu_intel.c.
>
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
> arch/x86/include/asm/msr-index.h | 1 +
> arch/x86/kvm/pmu.c | 8 ------
> arch/x86/kvm/vmx/pmu_intel.c | 44 ++++++++++++++++++++++++++++++++
> arch/x86/kvm/vmx/vmx.c | 3 +++
> 4 files changed, 48 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 3aedae61af4f..4fce37ae5a90 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1041,6 +1041,7 @@
> /* PERF_GLOBAL_OVF_CTL bits */
> #define MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT 55
> #define MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI (1ULL << MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT)
> +#define MSR_CORE_PERF_GLOBAL_OVF_CTRL_LBR_FREEZE BIT_ULL(58)
> #define MSR_CORE_PERF_GLOBAL_OVF_CTRL_OVF_BUF_BIT 62
> #define MSR_CORE_PERF_GLOBAL_OVF_CTRL_OVF_BUF (1ULL << MSR_CORE_PERF_GLOBAL_OVF_CTRL_OVF_BUF_BIT)
> #define MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD_BIT 63
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index edb89b51b383..4b6a508f3f0b 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -640,14 +640,6 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> reprogram_counters(pmu, diff);
> }
> break;
> - case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> - /*
> - * GLOBAL_OVF_CTRL, a.k.a. GLOBAL STATUS_RESET, clears bits in
> - * GLOBAL_STATUS, and so the set of reserved bits is the same.
> - */
> - if (data & pmu->global_status_mask)
> - return 1;
> - fallthrough;
> case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
> if (!msr_info->host_initiated)
> pmu->global_status &= ~data;
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 3a36a91638c6..ba7695a64ff1 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -426,6 +426,29 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>
> pmu->pebs_data_cfg = data;
> break;
> + case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
> + /*
> + * GLOBAL_OVF_CTRL, a.k.a. GLOBAL STATUS_RESET, clears bits in
> + * GLOBAL_STATUS, and so the set of reserved bits is the same.
> + */
> + if (data & pmu->global_status_mask)
> + return 1;
> + if (pmu->version >= 4 && !msr_info->host_initiated &&
> + (data & MSR_CORE_PERF_GLOBAL_OVF_CTRL_LBR_FREEZE)) {
> + u64 debug_ctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
> + struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
> +
> + if (!(debug_ctl & DEBUGCTLMSR_LBR) &&
> + lbr_desc->freeze_on_pmi) {
> + debug_ctl |= DEBUGCTLMSR_LBR;
> + vmcs_write64(GUEST_IA32_DEBUGCTL, debug_ctl);
> + lbr_desc->freeze_on_pmi = false;
> + }
> + }
> +
> + if (!msr_info->host_initiated)
> + pmu->global_status &= ~data;
> + break;
> default:
> if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
> (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
> @@ -565,6 +588,9 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
> if (vmx_pt_mode_is_host_guest())
> pmu->global_status_mask &=
> ~MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI;
> + if (pmu->version >= 4)
> + pmu->global_status_mask &=
> + ~MSR_CORE_PERF_GLOBAL_OVF_CTRL_LBR_FREEZE;
>
> entry = kvm_find_cpuid_entry_index(vcpu, 7, 0);
> if (entry &&
> @@ -675,6 +701,22 @@ static void intel_pmu_legacy_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu)
> }
> }
>
> +static void intel_pmu_streamlined_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu)
> +{
> + u64 data = vmcs_read64(GUEST_IA32_DEBUGCTL);
> + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> +
> + /*
> + * Even if streamlined freezing LBR won't clear LBR_EN like legacy
> + * freezing LBR, here legacy freezing LBR is called to freeze LBR HW
> + * for streamlined freezing LBR when guest run. But guest VM will
> + * see a fake guest DEBUGCTL MSR with LBR_EN bit set.
> + */
> + intel_pmu_legacy_freezing_lbrs_on_pmi(vcpu);
> + if ((data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) && (data & DEBUGCTLMSR_LBR))
> + pmu->global_status |= MSR_CORE_PERF_GLOBAL_OVF_CTRL_LBR_FREEZE;
> +}
> +
> static void intel_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
> {
> u8 version = vcpu_to_pmu(vcpu)->version;
> @@ -684,6 +726,8 @@ static void intel_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
>
> if (version > 1 && version < 4)
> intel_pmu_legacy_freezing_lbrs_on_pmi(vcpu);
> + else if (version >= 4)
> + intel_pmu_streamlined_freezing_lbrs_on_pmi(vcpu);
> }
>
> static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 199d0da1dbee..3bd64879aab3 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2098,6 +2098,9 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> break;
> case MSR_IA32_DEBUGCTLMSR:
> msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
> + if (vcpu_to_lbr_desc(vcpu)->freeze_on_pmi &&
> + vcpu_to_pmu(vcpu)->version >= 4)
> + msr_info->data |= DEBUGCTLMSR_LBR;
> break;
> default:
> find_uret_msr:
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
* Re: [PATCH 6/9] KVM: x86/pmu: Add Intel PMU supported fixed counters mask
2023-09-01 7:28 ` [PATCH 6/9] KVM: x86/pmu: Add Intel PMU supported fixed counters mask Xiong Zhang
@ 2023-09-06 10:08 ` Mi, Dapeng
0 siblings, 0 replies; 27+ messages in thread
From: Mi, Dapeng @ 2023-09-06 10:08 UTC (permalink / raw)
To: Xiong Zhang, kvm
Cc: seanjc, like.xu.linux, zhiyuan.lv, zhenyu.z.wang, kan.liang,
Like Xu
On 9/1/2023 3:28 PM, Xiong Zhang wrote:
> From: Like Xu <likexu@tencent.com>
>
> Per Intel SDM, fixed-function performance counter 'i' is supported:
>
> FxCtr[i]_is_supported := ECX[i] || (EDX[4:0] > i);
> If pmu.version >= 5, ECX is the supported fixed counters bit mask;
> if 1 < pmu.version < 5, EDX[4:0] is the number of contiguous
> fixed-function performance counters starting from 0.
>
> which means that KVM user space can use EDX to limit the number of
> fixed counters starting from 0 and, at the same time, use ECX to enable
> some of the other KVM-supported fixed counters. e.g. with
> pmu.version = 5, ECX = 0x5, EDX[4:0] = 1, FxCtrl[2, 0] are supported
> while FxCtrl[1] isn't.
>
> Add Fixed counter bit mask into all_valid_pmc_idx, and use it to perform
> the semantic checks.
>
> Since fixed counters may be non-contiguous, nr_arch_fixed_counters can
> no longer be used to enumerate them; for_each_set_bit_from() is used
> instead, and nr_arch_fixed_counters is deleted.
>
> Signed-off-by: Like Xu <likexu@tencent.com>
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
> arch/x86/include/asm/kvm_host.h | 1 -
> arch/x86/kvm/pmu.h | 12 +++++-
> arch/x86/kvm/svm/pmu.c | 1 -
> arch/x86/kvm/vmx/pmu_intel.c | 69 ++++++++++++++++++++-------------
> 4 files changed, 52 insertions(+), 31 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9f57aa33798b..ceba4f89dec5 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -515,7 +515,6 @@ struct kvm_pmc {
> struct kvm_pmu {
> u8 version;
> unsigned nr_arch_gp_counters;
> - unsigned nr_arch_fixed_counters;
> unsigned available_event_types;
> u64 fixed_ctr_ctrl;
> u64 fixed_ctr_ctrl_mask;
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index 7d9ba301c090..4bab4819ea6c 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -125,14 +125,22 @@ static inline struct kvm_pmc *get_gp_pmc(struct kvm_pmu *pmu, u32 msr,
> return NULL;
> }
>
> +static inline bool fixed_ctr_is_supported(struct kvm_pmu *pmu, unsigned int idx)
> +{
> + return test_bit(INTEL_PMC_IDX_FIXED + idx, pmu->all_valid_pmc_idx);
> +}
> +
> /* returns fixed PMC with the specified MSR */
> static inline struct kvm_pmc *get_fixed_pmc(struct kvm_pmu *pmu, u32 msr)
> {
> int base = MSR_CORE_PERF_FIXED_CTR0;
>
> - if (msr >= base && msr < base + pmu->nr_arch_fixed_counters) {
> + if (msr >= base && msr < base + KVM_PMC_MAX_FIXED) {
> u32 index = array_index_nospec(msr - base,
> - pmu->nr_arch_fixed_counters);
> + KVM_PMC_MAX_FIXED);
> +
> + if (!fixed_ctr_is_supported(pmu, index))
> + return NULL;
>
> return &pmu->fixed_counters[index];
> }
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index cef5a3d0abd0..d0a12e739989 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -213,7 +213,6 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
> pmu->raw_event_mask = AMD64_RAW_EVENT_MASK;
> /* not applicable to AMD; but clean them to prevent any fall out */
> pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
> - pmu->nr_arch_fixed_counters = 0;
> bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
> }
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 46363ac82a79..ce6d06ec562c 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -72,10 +72,12 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
> {
> struct kvm_pmc *pmc;
> u8 old_fixed_ctr_ctrl = pmu->fixed_ctr_ctrl;
> - int i;
> + int s = INTEL_PMC_IDX_FIXED;
>
> pmu->fixed_ctr_ctrl = data;
> - for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> + for_each_set_bit_from(s, pmu->all_valid_pmc_idx,
> + INTEL_PMC_IDX_FIXED + INTEL_PMC_MAX_FIXED) {
> + int i = s - INTEL_PMC_IDX_FIXED;
> u8 new_ctrl = fixed_ctrl_field(data, i);
> u8 old_ctrl = fixed_ctrl_field(old_fixed_ctr_ctrl, i);
>
> @@ -132,7 +134,7 @@ static bool intel_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
>
> idx &= ~(3u << 30);
>
> - return fixed ? idx < pmu->nr_arch_fixed_counters
> + return fixed ? fixed_ctr_is_supported(pmu, idx)
> : idx < pmu->nr_arch_gp_counters;
> }
>
> @@ -144,16 +146,17 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
> struct kvm_pmc *counters;
> unsigned int num_counters;
>
> + if (!intel_is_valid_rdpmc_ecx(vcpu, idx))
> + return NULL;
> +
> idx &= ~(3u << 30);
> if (fixed) {
> counters = pmu->fixed_counters;
> - num_counters = pmu->nr_arch_fixed_counters;
> + num_counters = KVM_PMC_MAX_FIXED;
> } else {
> counters = pmu->gp_counters;
> num_counters = pmu->nr_arch_gp_counters;
> }
> - if (idx >= num_counters)
> - return NULL;
> *mask &= pmu->counter_bitmask[fixed ? KVM_PMC_FIXED : KVM_PMC_GP];
> return &counters[array_index_nospec(idx, num_counters)];
> }
> @@ -352,6 +355,7 @@ static u64 intel_pmu_global_inuse_emulation(struct kvm_pmu *pmu)
> {
> u64 data = 0;
> int i;
> + int s = INTEL_PMC_IDX_FIXED;
>
> for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
> struct kvm_pmc *pmc = &pmu->gp_counters[i];
> @@ -371,7 +375,10 @@ static u64 intel_pmu_global_inuse_emulation(struct kvm_pmu *pmu)
> data |= MSR_CORE_PERF_GLOBAL_INUSE_PMI;
> }
>
> - for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> + for_each_set_bit_from(s, pmu->all_valid_pmc_idx,
> + INTEL_PMC_IDX_FIXED + INTEL_PMC_MAX_FIXED) {
> + i = s - INTEL_PMC_IDX_FIXED;
> +
> /*
> * IA32_PERF_GLOBAL_INUSE.FCi_InUse[bit (i + 32)]: This bit
> * reflects the logical state of
> @@ -379,7 +386,7 @@ static u64 intel_pmu_global_inuse_emulation(struct kvm_pmu *pmu)
> */
> if (pmu->fixed_ctr_ctrl &
> intel_fixed_bits_by_idx(i, INTEL_FIXED_0_KERNEL | INTEL_FIXED_0_USER))
> - data |= 1ULL << (i + INTEL_PMC_IDX_FIXED);
> + data |= 1ULL << s;
> /*
> * IA32_PERF_GLOBAL_INUSE.PMI_InUse[bit 63]: This bit is set if
> * IA32_FIXED_CTR_CTRL.ENi_PMI, i = 0, 1, 2 is set.
> @@ -565,12 +572,14 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>
> static void setup_fixed_pmc_eventsel(struct kvm_pmu *pmu)
> {
> - int i;
> + int s = INTEL_PMC_IDX_FIXED;
>
> BUILD_BUG_ON(ARRAY_SIZE(fixed_pmc_events) != KVM_PMC_MAX_FIXED);
>
> - for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> - int index = array_index_nospec(i, KVM_PMC_MAX_FIXED);
> + for_each_set_bit_from(s, pmu->all_valid_pmc_idx,
> + INTEL_PMC_IDX_FIXED + INTEL_PMC_MAX_FIXED) {
> + int index = array_index_nospec(s - INTEL_PMC_IDX_FIXED,
> + KVM_PMC_MAX_FIXED);
> struct kvm_pmc *pmc = &pmu->fixed_counters[index];
> u32 event = fixed_pmc_events[index];
>
> @@ -591,7 +600,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
> int i;
>
> pmu->nr_arch_gp_counters = 0;
> - pmu->nr_arch_fixed_counters = 0;
> pmu->counter_bitmask[KVM_PMC_GP] = 0;
> pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
> pmu->version = 0;
> @@ -633,11 +641,22 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
> pmu->available_event_types = ~entry->ebx &
> ((1ull << eax.split.mask_length) - 1);
>
> - if (pmu->version == 1) {
> - pmu->nr_arch_fixed_counters = 0;
> - } else {
> - pmu->nr_arch_fixed_counters = min_t(int, edx.split.num_counters_fixed,
> - kvm_pmu_cap.num_counters_fixed);
> + counter_mask = ~(BIT_ULL(pmu->nr_arch_gp_counters) - 1);
> + bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
> +
> + if (pmu->version > 1) {
> + for (i = 0; i < kvm_pmu_cap.num_counters_fixed; i++) {
> + /*
> + * FxCtr[i]_is_supported :=
> + * CPUID.0xA.ECX[i] || EDX[4:0] > i
> + */
> + if (!(entry->ecx & BIT_ULL(i) || edx.split.num_counters_fixed > i))
> + continue;
> +
> + set_bit(INTEL_PMC_IDX_FIXED + i, pmu->all_valid_pmc_idx);
> + pmu->fixed_ctr_ctrl_mask &= ~(0xbull << (i * 4));
> + counter_mask &= ~BIT_ULL(INTEL_PMC_IDX_FIXED + i);
> + }
> edx.split.bit_width_fixed = min_t(int, edx.split.bit_width_fixed,
> kvm_pmu_cap.bit_width_fixed);
> pmu->counter_bitmask[KVM_PMC_FIXED] =
> @@ -645,10 +664,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
> setup_fixed_pmc_eventsel(pmu);
> }
>
> - for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
> - pmu->fixed_ctr_ctrl_mask &= ~(0xbull << (i * 4));
> - counter_mask = ~(((1ull << pmu->nr_arch_gp_counters) - 1) |
> - (((1ull << pmu->nr_arch_fixed_counters) - 1) << INTEL_PMC_IDX_FIXED));
> pmu->global_ctrl_mask = counter_mask;
>
> /*
> @@ -674,11 +689,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
> pmu->raw_event_mask |= (HSW_IN_TX|HSW_IN_TX_CHECKPOINTED);
> }
>
> - bitmap_set(pmu->all_valid_pmc_idx,
> - 0, pmu->nr_arch_gp_counters);
> - bitmap_set(pmu->all_valid_pmc_idx,
> - INTEL_PMC_MAX_GENERIC, pmu->nr_arch_fixed_counters);
> -
> perf_capabilities = vcpu_get_perf_capabilities(vcpu);
> if (cpuid_model_is_consistent(vcpu) &&
> (perf_capabilities & PMU_CAP_LBR_FMT))
> @@ -691,9 +701,14 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>
> if (perf_capabilities & PERF_CAP_PEBS_FORMAT) {
> if (perf_capabilities & PERF_CAP_PEBS_BASELINE) {
> + int s = INTEL_PMC_IDX_FIXED;
> + int e = INTEL_PMC_IDX_FIXED + INTEL_PMC_MAX_FIXED;
> +
> pmu->pebs_enable_mask = counter_mask;
> pmu->reserved_bits &= ~ICL_EVENTSEL_ADAPTIVE;
> - for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> +
> + for_each_set_bit_from(s, pmu->all_valid_pmc_idx, e) {
> + i = s - INTEL_PMC_IDX_FIXED;
> pmu->fixed_ctr_ctrl_mask &=
> ~(1ULL << (INTEL_PMC_IDX_FIXED + i * 4));
> }
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 9/9] KVM: selftests: Add fixed counters enumeration test case
2023-09-01 7:28 ` [PATCH 9/9] KVM: selftests: Add fixed counters enumeration test case Xiong Zhang
@ 2023-09-11 3:03 ` Mi, Dapeng
2023-09-12 0:35 ` Zhang, Xiong Y
0 siblings, 1 reply; 27+ messages in thread
From: Mi, Dapeng @ 2023-09-11 3:03 UTC (permalink / raw)
To: Xiong Zhang, kvm
Cc: seanjc, like.xu.linux, zhiyuan.lv, zhenyu.z.wang, kan.liang
On 9/1/2023 3:28 PM, Xiong Zhang wrote:
> vPMU v5 adds fixed counter enumeration, which allows user space to
> specify which fixed counters are supported through emulated
> CPUID.0Ah.ECX.
>
> This commit adds a test case which advertises support for only the max
> fixed counter, so the guest can access only that counter; a #GP
> exception is raised once the guest accesses any other fixed counter.
>
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
> .../selftests/kvm/x86_64/vmx_pmu_caps_test.c | 84 +++++++++++++++++++
> 1 file changed, 84 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c b/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c
> index ebbcb0a3f743..e37dc39164fe 100644
> --- a/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c
Since we added new test cases to this file, it no longer tests just
'perf_capabilities'; we may want to change the file name
vmx_pmu_caps_test.c to a more generic name like "vmx_pmu_test.c" or
something else.
> @@ -18,6 +18,8 @@
> #include "kvm_util.h"
> #include "vmx.h"
>
> +uint8_t fixed_counter_num;
> +
> union perf_capabilities {
> struct {
> u64 lbr_format:6;
> @@ -233,6 +235,86 @@ static void test_lbr_perf_capabilities(union perf_capabilities host_cap)
> kvm_vm_free(vm);
> }
>
> +static void guest_v5_code(void)
> +{
> + uint8_t vector, i;
> + uint64_t val;
> +
> + for (i = 0; i < fixed_counter_num; i++) {
> + vector = rdmsr_safe(MSR_CORE_PERF_FIXED_CTR0 + i, &val);
> +
> + /*
> + * Only the max fixed counter is supported, #GP will be generated
> + * when the guest accesses other fixed counters.
> + */
> + if (i == fixed_counter_num - 1)
> + __GUEST_ASSERT(vector != GP_VECTOR,
> + "Max Fixed counter is accessible, but get #GP");
> + else
> + __GUEST_ASSERT(vector == GP_VECTOR,
> + "Fixed counter isn't accessible, but access is ok");
> + }
> +
> + GUEST_DONE();
> +}
> +
> +#define PMU_NR_FIXED_COUNTERS_MASK 0x1f
> +
> +static void test_fixed_counter_enumeration(void)
> +{
> + struct kvm_vcpu *vcpu;
> + struct kvm_vm *vm;
> + int r;
> + struct kvm_cpuid_entry2 *ent;
> + struct ucall uc;
> + uint32_t fixed_counter_bit_mask;
> +
> + if (kvm_cpu_property(X86_PROPERTY_PMU_VERSION) != 5)
> + return;
We'd better check whether the version is less than 5 here, since there
might be versions higher than 5 in the future.
> +
> + vm = vm_create_with_one_vcpu(&vcpu, guest_v5_code);
> + vm_init_descriptor_tables(vm);
> + vcpu_init_descriptor_tables(vcpu);
> +
> + ent = vcpu_get_cpuid_entry(vcpu, 0xa);
> + fixed_counter_num = ent->edx & PMU_NR_FIXED_COUNTERS_MASK;
> + TEST_ASSERT(fixed_counter_num > 0, "fixed counter isn't supported");
> + fixed_counter_bit_mask = (1ul << fixed_counter_num) - 1;
> + TEST_ASSERT(ent->ecx == fixed_counter_bit_mask,
> + "cpuid.0xa.ecx != %x", fixed_counter_bit_mask);
> +
> + /* Fixed counter 0 isn't in ecx, but in edx, set_cpuid should be error. */
> + ent->ecx &= ~0x1;
> + r = __vcpu_set_cpuid(vcpu);
> + TEST_ASSERT(r, "Setting in-consistency cpuid.0xa.ecx and edx success");
> +
> + if (fixed_counter_num == 1) {
> + kvm_vm_free(vm);
> + return;
> + }
> +
> + /* Support the max Fixed Counter only */
> + ent->ecx = 1UL << (fixed_counter_num - 1);
> + ent->edx &= ~(u32)PMU_NR_FIXED_COUNTERS_MASK;
> +
> + r = __vcpu_set_cpuid(vcpu);
> + TEST_ASSERT(!r, "Setting modified cpuid.0xa.ecx and edx failed");
> +
> + vcpu_run(vcpu);
> +
> + switch (get_ucall(vcpu, &uc)) {
> + case UCALL_ABORT:
> + REPORT_GUEST_ASSERT(uc);
> + break;
> + case UCALL_DONE:
> + break;
> + default:
> + TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
> + }
> +
> + kvm_vm_free(vm);
> +}
> +
> int main(int argc, char *argv[])
> {
> union perf_capabilities host_cap;
> @@ -253,4 +335,6 @@ int main(int argc, char *argv[])
> test_immutable_perf_capabilities(host_cap);
> test_guest_wrmsr_perf_capabilities(host_cap);
> test_lbr_perf_capabilities(host_cap);
> +
> + test_fixed_counter_enumeration();
> }
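The expectation exercised by guest_v5_code() above can be condensed into a small predicate: once user space advertises only the max fixed counter in CPUID.0AH.ECX, every other fixed counter MSR read should #GP. This is a hedged sketch for illustration only; the helper name is invented and it is not part of the selftest:

```c
#include <stdbool.h>

/*
 * Hypothetical predicate mirroring the test's expectation: after user
 * space advertises only the max fixed counter in CPUID.0AH.ECX, reading
 * IA32_FIXED_CTRi should #GP for every i except the highest one.
 */
static bool expect_gp_on_fixed_ctr_read(unsigned int i, unsigned int nr_fixed)
{
	return i != nr_fixed - 1;	/* only the max counter stays readable */
}
```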
* RE: [PATCH 9/9] KVM: selftests: Add fixed counters enumeration test case
2023-09-11 3:03 ` Mi, Dapeng
@ 2023-09-12 0:35 ` Zhang, Xiong Y
0 siblings, 0 replies; 27+ messages in thread
From: Zhang, Xiong Y @ 2023-09-12 0:35 UTC (permalink / raw)
To: Mi, Dapeng, kvm@vger.kernel.org
Cc: Christopherson,, Sean, like.xu.linux@gmail.com, Lv, Zhiyuan,
Wang, Zhenyu Z, Liang, Kan
> On 9/1/2023 3:28 PM, Xiong Zhang wrote:
> > vPMU v5 adds fixed counter enumeration, which allows user space to
> > specify which fixed counters are supported through emulated
> > CPUID.0Ah.ECX.
> >
> > This commit adds a test case which advertises support for only the max
> > fixed counter, so the guest can access only that counter; a #GP
> > exception is raised once the guest accesses any other fixed counter.
> >
> > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > ---
> > .../selftests/kvm/x86_64/vmx_pmu_caps_test.c | 84 +++++++++++++++++++
> > 1 file changed, 84 insertions(+)
> >
> > diff --git a/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c b/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c
> > index ebbcb0a3f743..e37dc39164fe 100644
> > --- a/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c
> > +++ b/tools/testing/selftests/kvm/x86_64/vmx_pmu_caps_test.c
>
>
> Since we added new test cases to this file, it no longer tests just
> 'perf_capabilities'; we may want to change the file name vmx_pmu_caps_test.c
> to a more generic name like "vmx_pmu_test.c" or something else.
Yes, I will move it into vmx_counters_test.c after this series https://lore.kernel.org/lkml/ZN6sopXa8aw8DG3w@google.com/T/ is merged.
>
> > @@ -18,6 +18,8 @@
> > #include "kvm_util.h"
> > #include "vmx.h"
> >
> > +uint8_t fixed_counter_num;
> > +
> > union perf_capabilities {
> > struct {
> > u64 lbr_format:6;
> > @@ -233,6 +235,86 @@ static void test_lbr_perf_capabilities(union perf_capabilities host_cap)
> > kvm_vm_free(vm);
> > }
> >
> > +static void guest_v5_code(void)
> > +{
> > + uint8_t vector, i;
> > + uint64_t val;
> > +
> > + for (i = 0; i < fixed_counter_num; i++) {
> > + vector = rdmsr_safe(MSR_CORE_PERF_FIXED_CTR0 + i, &val);
> > +
> > + /*
> > + * Only the max fixed counter is supported, #GP will be generated
> > + * when the guest accesses other fixed counters.
> > + */
> > + if (i == fixed_counter_num - 1)
> > + __GUEST_ASSERT(vector != GP_VECTOR,
> > + "Max Fixed counter is accessible, but get #GP");
> > + else
> > + __GUEST_ASSERT(vector == GP_VECTOR,
> > + "Fixed counter isn't accessible, but access is ok");
> > + }
> > +
> > + GUEST_DONE();
> > +}
> > +
> > +#define PMU_NR_FIXED_COUNTERS_MASK 0x1f
> > +
> > +static void test_fixed_counter_enumeration(void)
> > +{
> > + struct kvm_vcpu *vcpu;
> > + struct kvm_vm *vm;
> > + int r;
> > + struct kvm_cpuid_entry2 *ent;
> > + struct ucall uc;
> > + uint32_t fixed_counter_bit_mask;
> > +
> > + if (kvm_cpu_property(X86_PROPERTY_PMU_VERSION) != 5)
> > + return;
>
>
> We'd better check whether the version is less than 5 here, since there
> might be versions higher than 5 in the future.
Sure, change it in next version.
>
>
> > +
> > + vm = vm_create_with_one_vcpu(&vcpu, guest_v5_code);
> > + vm_init_descriptor_tables(vm);
> > + vcpu_init_descriptor_tables(vcpu);
> > +
> > + ent = vcpu_get_cpuid_entry(vcpu, 0xa);
> > + fixed_counter_num = ent->edx & PMU_NR_FIXED_COUNTERS_MASK;
> > + TEST_ASSERT(fixed_counter_num > 0, "fixed counter isn't supported");
> > + fixed_counter_bit_mask = (1ul << fixed_counter_num) - 1;
> > + TEST_ASSERT(ent->ecx == fixed_counter_bit_mask,
> > + "cpuid.0xa.ecx != %x", fixed_counter_bit_mask);
> > +
> > + /* Fixed counter 0 isn't in ecx, but in edx, set_cpuid should be error. */
> > + ent->ecx &= ~0x1;
> > + r = __vcpu_set_cpuid(vcpu);
> > + TEST_ASSERT(r, "Setting in-consistency cpuid.0xa.ecx and edx success");
> > +
> > + if (fixed_counter_num == 1) {
> > + kvm_vm_free(vm);
> > + return;
> > + }
> > +
> > + /* Support the max Fixed Counter only */
> > + ent->ecx = 1UL << (fixed_counter_num - 1);
> > + ent->edx &= ~(u32)PMU_NR_FIXED_COUNTERS_MASK;
> > +
> > + r = __vcpu_set_cpuid(vcpu);
> > + TEST_ASSERT(!r, "Setting modified cpuid.0xa.ecx and edx failed");
> > +
> > + vcpu_run(vcpu);
> > +
> > + switch (get_ucall(vcpu, &uc)) {
> > + case UCALL_ABORT:
> > + REPORT_GUEST_ASSERT(uc);
> > + break;
> > + case UCALL_DONE:
> > + break;
> > + default:
> > + TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
> > + }
> > +
> > + kvm_vm_free(vm);
> > +}
> > +
> > int main(int argc, char *argv[])
> > {
> > union perf_capabilities host_cap;
> > @@ -253,4 +335,6 @@ int main(int argc, char *argv[])
> > test_immutable_perf_capabilities(host_cap);
> > test_guest_wrmsr_perf_capabilities(host_cap);
> > test_lbr_perf_capabilities(host_cap);
> > +
> > + test_fixed_counter_enumeration();
> > }
* RE: [PATCH 5/9] KVM: x86/pmu: Check CPUID.0AH.ECX consistency
2023-09-06 9:44 ` Mi, Dapeng
@ 2023-09-12 0:45 ` Zhang, Xiong Y
0 siblings, 0 replies; 27+ messages in thread
From: Zhang, Xiong Y @ 2023-09-12 0:45 UTC (permalink / raw)
To: Mi, Dapeng, kvm@vger.kernel.org
Cc: Christopherson,, Sean, like.xu.linux@gmail.com, Lv, Zhiyuan,
Wang, Zhenyu Z, Liang, Kan
> On 9/1/2023 3:28 PM, Xiong Zhang wrote:
> > With Arch PMU V5, register CPUID.0AH.ECX indicates Fixed Counter
> > enumeration. It is a bit mask which enumerates the supported Fixed
> > counters.
> > FxCtr[i]_is_supported := ECX[i] || (EDX[4:0] > i), where EDX[4:0] is
> > the number of contiguous fixed-function performance counters starting
> > from 0 (if version ID > 1).
> >
> > Here ECX and EDX[4:0] should satisfy the following consistency rules:
> > 1. if 1 < pmu_version < 5, ECX == 0;
> > 2. if pmu_version == 5 && edx[4:0] == 0, ECX[bit 0] == 0;
> > 3. if pmu_version == 5 && edx[4:0] > 0,
> >    ecx & ((1 << edx[4:0]) - 1) == (1 << edx[4:0]) - 1
> >
> > Otherwise it is a mess to decide whether a fixed counter is supported
> > or not, e.g. with pmu_version = 5, edx[4:0] = 3 and ecx = 0x10, it is
> > hard to decide whether fixed counters 0 ~ 2 are supported or not.
> >
> > User space can call the SET_CPUID2 ioctl to set guest CPUID.0AH; this
> > commit adds a check to guarantee the consistency of the user-specified
> > ecx and edx.
> >
> > Once user space specifies an inconsistent value, KVM can either return
> > an error and drop the user setting, or correct the inconsistent data
> > and accept the corrected data; this commit chooses to return an error
> > to user space.
> >
> > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > ---
> > arch/x86/kvm/cpuid.c | 27 +++++++++++++++++++++++++++
> > 1 file changed, 27 insertions(+)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index e961e9a05847..95dc5e8847e0 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -150,6 +150,33 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu,
> > return -EINVAL;
> > }
> >
> > + best = cpuid_entry2_find(entries, nent, 0xa,
> > + KVM_CPUID_INDEX_NOT_SIGNIFICANT);
> > + if (best && vcpu->kvm->arch.enable_pmu) {
> > + union cpuid10_eax eax;
> > + union cpuid10_edx edx;
>
>
> Remove the redundant space before edx.
>
>
> > +
> > + eax.full = best->eax;
> > + edx.full = best->edx;
> > +
>
> We may add SDM quotes as comments here. That makes it easier for readers
> to understand the logic.
Ok, do it in next version.
thanks
>
>
> > + if (eax.split.version_id > 1 &&
> > + eax.split.version_id < 5 &&
> > + best->ecx != 0) {
> > + return -EINVAL;
> > + } else if (eax.split.version_id >= 5) {
> > + int fixed_count = edx.split.num_counters_fixed;
> > +
> > + if (fixed_count == 0 && (best->ecx & 0x1)) {
> > + return -EINVAL;
> > + } else if (fixed_count > 0) {
> > + int low_fixed_mask = (1 << fixed_count) - 1;
> > +
> > + if ((best->ecx & low_fixed_mask) != low_fixed_mask)
> > + return -EINVAL;
> > + }
> > + }
> > + }
> > +
> > /*
> > * Exposing dynamic xfeatures to the guest requires additional
> > * enabling in the FPU, e.g. to expand the guest XSAVE state size.
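The ECX/EDX consistency rules discussed in this thread can be condensed into a standalone predicate. This is a hedged sketch for illustration only; the function name is invented and it is not the actual kvm_check_cpuid() code:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative check of CPUID.0AH ECX (fixed-counter bitmap) against
 * EDX[4:0] (number of contiguous fixed counters), per the rules in the
 * commit message:
 *   1. 1 < version < 5            -> ECX must be 0
 *   2. version >= 5, EDX[4:0] = 0 -> ECX bit 0 must be clear
 *   3. version >= 5, EDX[4:0] = n -> low n bits of ECX must all be set
 */
static bool cpuid_0a_fixed_ctr_consistent(unsigned int version,
					  uint32_t ecx, unsigned int num_fixed)
{
	if (version > 1 && version < 5)
		return ecx == 0;

	if (version >= 5) {
		if (num_fixed == 0)
			return !(ecx & 0x1);

		uint32_t low_mask = (1u << num_fixed) - 1;

		return (ecx & low_mask) == low_mask;
	}

	return true;	/* version <= 1 has no fixed counter enumeration */
}
```

With this predicate, the ambiguous example from the commit message (version 5, edx[4:0] = 3, ecx = 0x10) is rejected, since fixed counters 0-2 are claimed by EDX but absent from ECX.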
* Re: [PATCH 8/9] KVM: x86/pmu: Upgrade pmu version to 5 on intel processor
2023-09-01 7:28 ` [PATCH 8/9] KVM: x86/pmu: Upgrade pmu version to 5 on intel processor Xiong Zhang
@ 2023-09-12 11:19 ` Like Xu
2023-09-13 3:34 ` Zhang, Xiong Y
0 siblings, 1 reply; 27+ messages in thread
From: Like Xu @ 2023-09-12 11:19 UTC (permalink / raw)
To: Xiong Zhang; +Cc: seanjc, zhiyuan.lv, zhenyu.z.wang, kan.liang, dapeng1.mi, kvm
On 1/9/2023 3:28 pm, Xiong Zhang wrote:
> Modern Intel processors support Architectural Performance
> Monitoring version 5; this commit upgrades the Intel vCPU's vPMU
> version from 2 to 5.
>
> Going through the PMU features from version 3 to 5, the following
> features are not supported:
> 1. AnyThread counting: added in v3 and deprecated in v5.
> 2. Streamlined Freeze_PerfMon_On_PMI in v4: since the legacy
> Freeze_PerfMon_On_PMI isn't supported, the new one won't be supported
> either.
> 3. IA32_PERF_GLOBAL_STATUS.ASCI[bit 60]: related to SGX, and will be
> emulated by SGX developers later.
> 4. Domain Separation in v5: when the INV flag in IA32_PERFEVTSELx is
> used, a counter stops counting when the logical processor exits the C0
> ACPI C-state. First, the guest INV flag isn't supported; second, the
> guest ACPI C-state is vague.
>
> When a guest enables unsupported features through WRMSR, KVM will
> inject a #GP into the guest.
>
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
> arch/x86/kvm/pmu.h | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
> index 4bab4819ea6c..8e6bc9b1a747 100644
> --- a/arch/x86/kvm/pmu.h
> +++ b/arch/x86/kvm/pmu.h
> @@ -215,7 +215,10 @@ static inline void kvm_init_pmu_capability(const struct kvm_pmu_ops *pmu_ops)
> return;
> }
>
> - kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2);
For AMD as of now, the kvm_pmu_cap.version will not exceed 2.
Thus there's no need to differentiate between Intel and AMD.
> + if (is_intel)
> + kvm_pmu_cap.version = min(kvm_pmu_cap.version, 5);
> + else
> + kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2);
> kvm_pmu_cap.num_counters_gp = min(kvm_pmu_cap.num_counters_gp,
> pmu_ops->MAX_NR_GP_COUNTERS);
> kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed,
* Re: [PATCH 7/9] KVM: x86/pmu: Add fixed counter enumeration for pmu v5
2023-09-01 7:28 ` [PATCH 7/9] KVM: x86/pmu: Add fixed counter enumeration for pmu v5 Xiong Zhang
@ 2023-09-12 11:24 ` Like Xu
2023-09-13 4:11 ` Zhang, Xiong Y
0 siblings, 1 reply; 27+ messages in thread
From: Like Xu @ 2023-09-12 11:24 UTC (permalink / raw)
To: Xiong Zhang; +Cc: seanjc, zhiyuan.lv, zhenyu.z.wang, kan.liang, dapeng1.mi, kvm
On 1/9/2023 3:28 pm, Xiong Zhang wrote:
> With Arch PMU v5, CPUID.0AH.ECX is a bit mask which enumerates the
> supported Fixed Counters. If bit 'i' is set, it implies that Fixed
> Counter 'i' is supported.
>
> This commit adds CPUID.0AH.ECX emulation for vPMU version 5. By
> default, KVM supports fixed counter enumeration starting from 0;
> user space can modify it through the SET_CPUID2 ioctl.
>
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
> arch/x86/kvm/cpuid.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 95dc5e8847e0..2bffed010c9e 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1028,7 +1028,10 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
>
> entry->eax = eax.full;
> entry->ebx = kvm_pmu_cap.events_mask;
> - entry->ecx = 0;
> + if (kvm_pmu_cap.version < 5)
> + entry->ecx = 0;
> + else
> + entry->ecx = (1ULL << kvm_pmu_cap.num_counters_fixed) - 1;
If some fixed counters on the host (e.g. an L1 host for an L2 VM) are
filtered out,
L1 KVM should not expose the unsupported fixed counters in this way.
> entry->edx = edx.full;
> break;
> }
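The enumeration semantics under discussion — a fixed counter is supported if its ECX bit is set or it falls below the contiguous count in EDX[4:0] — can be sketched as follows. This is a hedged illustration with invented names, not the KVM implementation:

```c
#include <stdbool.h>
#include <stdint.h>

/* FxCtr[i]_is_supported := CPUID.0AH.ECX[i] || (EDX[4:0] > i) */
static bool fixed_ctr_supported(uint32_t ecx, unsigned int num_fixed,
				unsigned int i)
{
	return (ecx & (1u << i)) || num_fixed > i;
}

/*
 * Default ECX emulated by the patch above: a contiguous mask covering
 * num_counters_fixed counters.
 */
static uint32_t default_fixed_ctr_ecx(unsigned int num_counters_fixed)
{
	return (1u << num_counters_fixed) - 1;
}
```

For example, with ecx = 0x10 and EDX[4:0] = 3, counters 0-2 are supported via the contiguous count and counter 4 via its ECX bit, while counter 3 is not supported.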
* Re: [PATCH 5/9] KVM: x86/pmu: Check CPUID.0AH.ECX consistency
2023-09-01 7:28 ` [PATCH 5/9] KVM: x86/pmu: Check CPUID.0AH.ECX consistency Xiong Zhang
2023-09-06 9:44 ` Mi, Dapeng
@ 2023-09-12 11:31 ` Like Xu
2023-09-13 4:25 ` Zhang, Xiong Y
1 sibling, 1 reply; 27+ messages in thread
From: Like Xu @ 2023-09-12 11:31 UTC (permalink / raw)
To: Xiong Zhang
Cc: seanjc, zhiyuan.lv, zhenyu.z.wang, kan.liang, dapeng1.mi,
kvm list
On 1/9/2023 3:28 pm, Xiong Zhang wrote:
> Once user space specifies an inconsistent value, KVM can either return
> an error and drop the user setting, or correct the inconsistent data
> and accept the corrected data; this commit chooses to return an error
> to user space.
Doing so is inconsistent with other vPMU CPUID configurations.
This issue is generic and not just for this PMU CPUID leaf.
Make sure this part of the design is covered in the vPMU documentation.
* Re: [PATCH 4/9] KVM: x86/pmu: Add MSR_PERF_GLOBAL_INUSE emulation
2023-09-01 7:28 ` [PATCH 4/9] KVM: x86/pmu: Add MSR_PERF_GLOBAL_INUSE emulation Xiong Zhang
@ 2023-09-12 11:41 ` Like Xu
2023-09-13 5:11 ` Zhang, Xiong Y
0 siblings, 1 reply; 27+ messages in thread
From: Like Xu @ 2023-09-12 11:41 UTC (permalink / raw)
To: Xiong Zhang; +Cc: seanjc, zhiyuan.lv, zhenyu.z.wang, kan.liang, dapeng1.mi, kvm
On 1/9/2023 3:28 pm, Xiong Zhang wrote:
> Arch PMU v4 introduces a new MSR, IA32_PERF_GLOBAL_INUSE. It provides
> an "InUse" bit for each GP counter and fixed counter in the processor.
> Additionally, PMI_InUse[bit 63] indicates whether the PMI mechanism has
> been configured.
>
> Each bit's definition references the Architectural Performance
> Monitoring Version 4 section of the SDM.
>
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
> arch/x86/include/asm/msr-index.h | 4 +++
> arch/x86/kvm/vmx/pmu_intel.c | 58 ++++++++++++++++++++++++++++++++
> 2 files changed, 62 insertions(+)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 7c8cf6b53a76..31bb425899fb 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -1036,6 +1036,7 @@
> #define MSR_CORE_PERF_GLOBAL_CTRL 0x0000038f
> #define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x00000390
> #define MSR_CORE_PERF_GLOBAL_STATUS_SET 0x00000391
> +#define MSR_CORE_PERF_GLOBAL_INUSE 0x00000392
>
> #define MSR_PERF_METRICS 0x00000329
>
> @@ -1048,6 +1049,9 @@
> #define MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD_BIT 63
> #define MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD (1ULL << MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD_BIT)
>
> +/* PERF_GLOBAL_INUSE bits */
> +#define MSR_CORE_PERF_GLOBAL_INUSE_PMI BIT_ULL(63)
> +
> /* Geode defined MSRs */
> #define MSR_GEODE_BUSCONT_CONF0 0x00001900
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index b25df421cd75..46363ac82a79 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -207,6 +207,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
> case MSR_CORE_PERF_FIXED_CTR_CTRL:
> return kvm_pmu_has_perf_global_ctrl(pmu);
> case MSR_CORE_PERF_GLOBAL_STATUS_SET:
> + case MSR_CORE_PERF_GLOBAL_INUSE:
> return vcpu_to_pmu(vcpu)->version >= 4;
> case MSR_IA32_PEBS_ENABLE:
> ret = vcpu_get_perf_capabilities(vcpu) & PERF_CAP_PEBS_FORMAT;
> @@ -347,6 +348,58 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
> return true;
> }
>
> +static u64 intel_pmu_global_inuse_emulation(struct kvm_pmu *pmu)
> +{
> + u64 data = 0;
> + int i;
> +
> + for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
> + struct kvm_pmc *pmc = &pmu->gp_counters[i];
> +
> + /*
> + * IA32_PERF_GLOBAL_INUSE.PERFEVTSELn_InUse[bit n]: This bit
> + * reflects the logical state of (IA32_PERFEVTSELn[7:0]),
> + * n < CPUID.0AH.EAX[15:8].
> + */
> + if (pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT)
> + data |= 1 << i;
> + /*
> + * IA32_PERF_GLOBAL_INUSE.PMI_InUse[bit 63]: This bit is set if
> + * IA32_PERFEVTSELn.INT[bit 20], n < CPUID.0AH.EAX[15:8] is set.
> + */
> + if (pmc->eventsel & ARCH_PERFMON_EVENTSEL_INT)
> + data |= MSR_CORE_PERF_GLOBAL_INUSE_PMI;
If this bit is already set, there is no need to set it again; skipping the redundant write avoids wasting cycles.
> + }
> +
> + for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> + /*
> + * IA32_PERF_GLOBAL_INUSE.FCi_InUse[bit (i + 32)]: This bit
> + * reflects the logical state of
> + * IA32_FIXED_CTR_CTRL[i * 4 + 1, i * 4] != 0
> + */
> + if (pmu->fixed_ctr_ctrl &
> + intel_fixed_bits_by_idx(i, INTEL_FIXED_0_KERNEL | INTEL_FIXED_0_USER))
> + data |= 1ULL << (i + INTEL_PMC_IDX_FIXED);
> + /*
> + * IA32_PERF_GLOBAL_INUSE.PMI_InUse[bit 63]: This bit is set if
> + * IA32_FIXED_CTR_CTRL.ENi_PMI, i = 0, 1, 2 is set.
> + */
> + if (pmu->fixed_ctr_ctrl &
> + intel_fixed_bits_by_idx(i, INTEL_FIXED_0_ENABLE_PMI))
> + data |= MSR_CORE_PERF_GLOBAL_INUSE_PMI;
> + }
> +
> + /*
> + * IA32_PERF_GLOBAL_INUSE.PMI_InUse[bit 63]: This bit is set if
> + * any IA32_PEBS_ENABLES bit is set, which enables PEBS for a GP or
> + * fixed counter.
> + */
> + if (pmu->pebs_enable)
> + data |= MSR_CORE_PERF_GLOBAL_INUSE_PMI;
> +
> + return data;
> +}
> +
> static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> {
> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> @@ -360,6 +413,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_CORE_PERF_GLOBAL_STATUS_SET:
> msr_info->data = 0;
> break;
> + case MSR_CORE_PERF_GLOBAL_INUSE:
> + msr_info->data = intel_pmu_global_inuse_emulation(pmu);
> + break;
> case MSR_IA32_PEBS_ENABLE:
> msr_info->data = pmu->pebs_enable;
> break;
> @@ -409,6 +465,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> if (pmu->fixed_ctr_ctrl != data)
> reprogram_fixed_counters(pmu, data);
> break;
> + case MSR_CORE_PERF_GLOBAL_INUSE:
> + return 1; /* RO MSR */
Is msrs_to_save_pmu[] updated?
> case MSR_IA32_PEBS_ENABLE:
> if (data & pmu->pebs_enable_mask)
> return 1;
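The IA32_PERF_GLOBAL_INUSE layout emulated above — GP counters in the low bits, fixed counters starting at bit 32, PMI in bit 63 — can be exercised with a toy model. This is a hedged sketch, deliberately simplified from the quoted intel_pmu_global_inuse_emulation(); the names are invented:

```c
#include <stdbool.h>
#include <stdint.h>

#define INUSE_PMI_BIT	(1ULL << 63)
#define INUSE_FIXED_BASE	32

/*
 * Toy model: compose the IA32_PERF_GLOBAL_INUSE value from a GP counter
 * in-use bitmap, a fixed counter in-use bitmap, and a flag saying
 * whether any PMI delivery is armed.
 */
static uint64_t toy_global_inuse(uint32_t gp_in_use, uint32_t fixed_in_use,
				 bool pmi_armed)
{
	uint64_t data = gp_in_use;	/* PERFEVTSELn_InUse, bits [n-1:0] */

	data |= (uint64_t)fixed_in_use << INUSE_FIXED_BASE;	/* FCi_InUse */
	if (pmi_armed)
		data |= INUSE_PMI_BIT;	/* PMI_InUse */
	return data;
}
```

For example, two GP counters, fixed counter 0 and an armed PMI yield bits 0, 1, 32 and 63.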
* Re: [PATCH 1/9] KVM: x86/PMU: Don't release vLBR caused by PMI
2023-09-01 7:28 ` [PATCH 1/9] KVM: x86/PMU: Don't release vLBR caused by PMI Xiong Zhang
2023-09-06 9:47 ` Mi, Dapeng
@ 2023-09-12 11:54 ` Like Xu
2023-09-13 6:00 ` Zhang, Xiong Y
1 sibling, 1 reply; 27+ messages in thread
From: Like Xu @ 2023-09-12 11:54 UTC (permalink / raw)
To: Xiong Zhang; +Cc: seanjc, zhiyuan.lv, zhenyu.z.wang, kan.liang, dapeng1.mi, kvm
On 1/9/2023 3:28 pm, Xiong Zhang wrote:
> The vLBR event will be released at vCPU sched-in time if the LBR_EN bit
> is not set in the GUEST_IA32_DEBUGCTL VMCS field; this bit is cleared in
> two cases:
> 1. guest disable LBR through WRMSR
> 2. KVM disable LBR at PMI injection to emulate guest FREEZE_LBR_ON_PMI.
>
> In the first case the guest LBR won't be used anymore, so the vLBR
> event can be released; but the guest LBR is still in use in the second
> case, so the vLBR event cannot be released.
>
> Consider this sequence:
> 1. vPMC overflow, KVM injects vPMI and clears guest LBR_EN
> 2. guest handles PMI, and reads LBR records.
> 3. vCPU is sched-out, later sched-in, vLBR event is released.
This has nothing to do with vPMI. If guest LBR is disabled and the guest
LBR driver doesn't read it before the KVM vLBR event is released (typically
after two sched slices), that part of the LBR records is lost by
design. What is needed here is a generic KVM mechanism to close this gap.
> 4. Guest continues reading LBR records, KVM creates the vLBR event
> again; the vLBR event is now the only LBR user on the host, so the
> host PMU driver resets the HW LBR facility at vLBR creation.
> 5. Guest gets the remaining LBR records in reset state.
> This conflicts with the meaning of FREEZE_LBR_ON_PMI, so the vLBR
> event cannot be released on PMI.
>
> This commit adds a freeze_on_pmi flag; it is set at PMI injection and
> cleared when the guest writes the guest DEBUGCTL MSR. If this flag is
> true, the vLBR event will not be released.
>
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
> arch/x86/kvm/vmx/pmu_intel.c | 5 ++++-
> arch/x86/kvm/vmx/vmx.c | 12 +++++++++---
> arch/x86/kvm/vmx/vmx.h | 3 +++
> 3 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index f2efa0bf7ae8..3a36a91638c6 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -628,6 +628,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
> lbr_desc->records.nr = 0;
> lbr_desc->event = NULL;
> lbr_desc->msr_passthrough = false;
> + lbr_desc->freeze_on_pmi = false;
> }
>
> static void intel_pmu_reset(struct kvm_vcpu *vcpu)
> @@ -670,6 +671,7 @@ static void intel_pmu_legacy_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu)
> if (data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) {
> data &= ~DEBUGCTLMSR_LBR;
> vmcs_write64(GUEST_IA32_DEBUGCTL, data);
> + vcpu_to_lbr_desc(vcpu)->freeze_on_pmi = true;
> }
> }
>
> @@ -761,7 +763,8 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
>
> static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
> {
> - if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
> + if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR) &&
> + !vcpu_to_lbr_desc(vcpu)->freeze_on_pmi)
> intel_pmu_release_guest_lbr_event(vcpu);
> }
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e6849f780dba..199d0da1dbee 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2223,9 +2223,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> get_vmcs12(vcpu)->guest_ia32_debugctl = data;
>
> vmcs_write64(GUEST_IA32_DEBUGCTL, data);
> - if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
> - (data & DEBUGCTLMSR_LBR))
> - intel_pmu_create_guest_lbr_event(vcpu);
> +
> + if (intel_pmu_lbr_is_enabled(vcpu)) {
> + struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
> +
> + lbr_desc->freeze_on_pmi = false;
> + if (!lbr_desc->event && (data & DEBUGCTLMSR_LBR))
> + intel_pmu_create_guest_lbr_event(vcpu);
> + }
> +
> return 0;
> }
> case MSR_IA32_BNDCFGS:
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index c2130d2c8e24..9729ccfa75ae 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -107,6 +107,9 @@ struct lbr_desc {
>
> /* True if LBRs are marked as not intercepted in the MSR bitmap */
> bool msr_passthrough;
> +
> + /* True if LBR is frozen on PMI */
> + bool freeze_on_pmi;
> };
>
> /*
^ permalink raw reply [flat|nested] 27+ messages in thread
* RE: [PATCH 8/9] KVM: x86/pmu: Upgrade pmu version to 5 on intel processor
2023-09-12 11:19 ` Like Xu
@ 2023-09-13 3:34 ` Zhang, Xiong Y
0 siblings, 0 replies; 27+ messages in thread
From: Zhang, Xiong Y @ 2023-09-13 3:34 UTC (permalink / raw)
To: Like Xu
Cc: Christopherson,, Sean, Lv, Zhiyuan, Wang, Zhenyu Z, Liang, Kan,
dapeng1.mi@linux.intel.com, kvm@vger.kernel.org
> On 1/9/2023 3:28 pm, Xiong Zhang wrote:
> > Modern Intel processors support Architectural Performance
> > Monitoring Version 5; this commit upgrades an Intel vCPU's vPMU version
> > from 2 to 5.
> >
> > Going through the PMU features from version 3 to 5, the following
> > features are not supported:
> > 1. AnyThread counting: added in v3, and deprecated in v5.
> > 2. Streamlined Freeze_PerfMon_On_PMI in v4: since the legacy
> > Freeze_PerfMon_On_PMI isn't supported, the new one won't be supported
> either.
> > 3. IA32_PERF_GLOBAL_STATUS.ASCI[bit 60]: related to SGX, and will be
> > emulated by the SGX developers later.
> > 4. Domain Separation in v5. When the INV flag in IA32_PERFEVTSELx is used,
> > a counter stops counting when the logical processor exits the C0 ACPI C-state.
> > First, the guest INV flag isn't supported; second, the guest ACPI C-state is vague.
> >
> > When a guest enables unsupported features through WRMSR, KVM will
> > inject a #GP into the guest.
> >
> > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > ---
> > arch/x86/kvm/pmu.h | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h index
> > 4bab4819ea6c..8e6bc9b1a747 100644
> > --- a/arch/x86/kvm/pmu.h
> > +++ b/arch/x86/kvm/pmu.h
> > @@ -215,7 +215,10 @@ static inline void kvm_init_pmu_capability(const
> struct kvm_pmu_ops *pmu_ops)
> > return;
> > }
> >
> > - kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2);
>
> For AMD as of now, the kvm_pmu_cap.version will not exceed 2.
> Thus there's no need to differentiate between Intel and AMD.
>
Get it.
thanks
> > + if (is_intel)
> > + kvm_pmu_cap.version = min(kvm_pmu_cap.version, 5);
> > + else
> > + kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2);
> > kvm_pmu_cap.num_counters_gp =
> min(kvm_pmu_cap.num_counters_gp,
> > pmu_ops->MAX_NR_GP_COUNTERS);
> > kvm_pmu_cap.num_counters_fixed =
> > min(kvm_pmu_cap.num_counters_fixed,
* RE: [PATCH 7/9] KVM: x86/pmu: Add fixed counter enumeration for pmu v5
2023-09-12 11:24 ` Like Xu
@ 2023-09-13 4:11 ` Zhang, Xiong Y
0 siblings, 0 replies; 27+ messages in thread
From: Zhang, Xiong Y @ 2023-09-13 4:11 UTC (permalink / raw)
To: Like Xu
Cc: Christopherson,, Sean, Lv, Zhiyuan, Wang, Zhenyu Z, Liang, Kan,
dapeng1.mi@linux.intel.com, kvm@vger.kernel.org
> On 1/9/2023 3:28 pm, Xiong Zhang wrote:
> > With Arch PMU v5, CPUID.0AH.ECX is a bit mask which enumerates the
> > supported Fixed Counters. If bit 'i' is set, it implies that Fixed
> > Counter 'i' is supported.
> >
> > This commit adds CPUID.0AH.ECX emulation for vPMU version 5. KVM
> > enumerates Fixed Counters starting from 0 by default; the user can
> > modify this through the SET_CPUID2 ioctl.
> >
> > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > ---
> > arch/x86/kvm/cpuid.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index
> > 95dc5e8847e0..2bffed010c9e 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -1028,7 +1028,10 @@ static inline int __do_cpuid_func(struct
> > kvm_cpuid_array *array, u32 function)
> >
> > entry->eax = eax.full;
> > entry->ebx = kvm_pmu_cap.events_mask;
> > - entry->ecx = 0;
> > + if (kvm_pmu_cap.version < 5)
> > + entry->ecx = 0;
> > + else
> > + entry->ecx = (1ULL <<
> kvm_pmu_cap.num_counters_fixed) - 1;
>
> If some of the host's fixed counters are filtered out (e.g. on an L1 host
> for an L2 VM), L1 KVM should not expose the unsupported fixed counters
> this way.
If a vPMC index doesn't exist on the host:
- for a basic counter, this doesn't matter, as KVM still gets a host counter for it;
- for PEBS, this will disable guest PEBS.
Is this right? Are there any other reasons?
So here we'd better get entry->ecx from the host, so that guest and host have the same fixed counter bitmap; this means we will extend perf_get_x86_pmu_capability(&kvm_pmu_cap).
>
> > entry->edx = edx.full;
> > break;
> > }
* RE: [PATCH 5/9] KVM: x86/pmu: Check CPUID.0AH.ECX consistency
2023-09-12 11:31 ` Like Xu
@ 2023-09-13 4:25 ` Zhang, Xiong Y
0 siblings, 0 replies; 27+ messages in thread
From: Zhang, Xiong Y @ 2023-09-13 4:25 UTC (permalink / raw)
To: Like Xu
Cc: Christopherson,, Sean, Lv, Zhiyuan, Wang, Zhenyu Z, Liang, Kan,
dapeng1.mi@linux.intel.com, kvm list
> On 1/9/2023 3:28 pm, Xiong Zhang wrote:
> > Once the user specifies an inconsistent value, KVM can either return an
> > error to the user and drop the user's setting, or correct the
> > inconsistent data and accept it; this commit chooses to return an error
> > to the user.
>
> Doing so is inconsistent with other vPMU CPUID configurations.
>
> This issue is generic and not just for this PMU CPUID leaf.
> Make sure this part of the design is covered in the vPMU documentation.
So I will drop this commit.
thanks
* RE: [PATCH 4/9] KVM: x86/pmu: Add MSR_PERF_GLOBAL_INUSE emulation
2023-09-12 11:41 ` Like Xu
@ 2023-09-13 5:11 ` Zhang, Xiong Y
0 siblings, 0 replies; 27+ messages in thread
From: Zhang, Xiong Y @ 2023-09-13 5:11 UTC (permalink / raw)
To: Like Xu
Cc: Christopherson,, Sean, Lv, Zhiyuan, Wang, Zhenyu Z, Liang, Kan,
dapeng1.mi@linux.intel.com, kvm@vger.kernel.org
> On 1/9/2023 3:28 pm, Xiong Zhang wrote:
> > Arch PMU v4 introduces a new MSR, IA32_PERF_GLOBAL_INUSE. It provides
> > an "InUse" bit for each GP counter and fixed counter in the processor.
> > Additionally, PMI_InUse[bit 63] indicates whether the PMI mechanism has
> > been configured.
> >
> > Each bit's definition references the Architectural Performance
> > Monitoring Version 4 section of the SDM.
> >
> > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > ---
> > arch/x86/include/asm/msr-index.h | 4 +++
> > arch/x86/kvm/vmx/pmu_intel.c | 58
> ++++++++++++++++++++++++++++++++
> > 2 files changed, 62 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/msr-index.h
> > b/arch/x86/include/asm/msr-index.h
> > index 7c8cf6b53a76..31bb425899fb 100644
> > --- a/arch/x86/include/asm/msr-index.h
> > +++ b/arch/x86/include/asm/msr-index.h
> > @@ -1036,6 +1036,7 @@
> > #define MSR_CORE_PERF_GLOBAL_CTRL 0x0000038f
> > #define MSR_CORE_PERF_GLOBAL_OVF_CTRL 0x00000390
> > #define MSR_CORE_PERF_GLOBAL_STATUS_SET 0x00000391
> > +#define MSR_CORE_PERF_GLOBAL_INUSE 0x00000392
> >
> > #define MSR_PERF_METRICS 0x00000329
> >
> > @@ -1048,6 +1049,9 @@
> > #define MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD_BIT
> 63
> > #define MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD
> (1ULL << MSR_CORE_PERF_GLOBAL_OVF_CTRL_COND_CHGD_BIT)
> >
> > +/* PERF_GLOBAL_INUSE bits */
> > +#define MSR_CORE_PERF_GLOBAL_INUSE_PMI
> BIT_ULL(63)
> > +
> > /* Geode defined MSRs */
> > #define MSR_GEODE_BUSCONT_CONF0 0x00001900
> >
> > diff --git a/arch/x86/kvm/vmx/pmu_intel.c
> > b/arch/x86/kvm/vmx/pmu_intel.c index b25df421cd75..46363ac82a79 100644
> > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > @@ -207,6 +207,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu,
> u32 msr)
> > case MSR_CORE_PERF_FIXED_CTR_CTRL:
> > return kvm_pmu_has_perf_global_ctrl(pmu);
> > case MSR_CORE_PERF_GLOBAL_STATUS_SET:
> > + case MSR_CORE_PERF_GLOBAL_INUSE:
> > return vcpu_to_pmu(vcpu)->version >= 4;
> > case MSR_IA32_PEBS_ENABLE:
> > ret = vcpu_get_perf_capabilities(vcpu) &
> PERF_CAP_PEBS_FORMAT; @@
> > -347,6 +348,58 @@ static bool intel_pmu_handle_lbr_msrs_access(struct
> kvm_vcpu *vcpu,
> > return true;
> > }
> >
> > +static u64 intel_pmu_global_inuse_emulation(struct kvm_pmu *pmu) {
> > + u64 data = 0;
> > + int i;
> > +
> > + for (i = 0; i < pmu->nr_arch_gp_counters; i++) {
> > + struct kvm_pmc *pmc = &pmu->gp_counters[i];
> > +
> > + /*
> > + * IA32_PERF_GLOBAL_INUSE.PERFEVTSELn_InUse[bit n]: This
> bit
> > + * reflects the logical state of (IA32_PERFEVTSELn[7:0]),
> > + * n < CPUID.0AH.EAX[15:8].
> > + */
> > + if (pmc->eventsel & ARCH_PERFMON_EVENTSEL_EVENT)
> > + data |= 1 << i;
> > + /*
> > + * IA32_PERF_GLOBAL_INUSE.PMI_InUse[bit 63]: This bit is set
> if
> > + * IA32_PERFEVTSELn.INT[bit 20], n < CPUID.0AH.EAX[15:8] is
> set.
> > + */
> > + if (pmc->eventsel & ARCH_PERFMON_EVENTSEL_INT)
> > + data |= MSR_CORE_PERF_GLOBAL_INUSE_PMI;
>
> If this bit is already set, there is no need to set it again; skipping the redundant write avoids wasting cycles.
Get it.
>
> > + }
> > +
> > + for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
> > + /*
> > + * IA32_PERF_GLOBAL_INUSE.FCi_InUse[bit (i + 32)]: This bit
> > + * reflects the logical state of
> > + * IA32_FIXED_CTR_CTRL[i * 4 + 1, i * 4] != 0
> > + */
> > + if (pmu->fixed_ctr_ctrl &
> > + intel_fixed_bits_by_idx(i, INTEL_FIXED_0_KERNEL |
> INTEL_FIXED_0_USER))
> > + data |= 1ULL << (i + INTEL_PMC_IDX_FIXED);
> > + /*
> > + * IA32_PERF_GLOBAL_INUSE.PMI_InUse[bit 63]: This bit is set
> if
> > + * IA32_FIXED_CTR_CTRL.ENi_PMI, i = 0, 1, 2 is set.
> > + */
> > + if (pmu->fixed_ctr_ctrl &
> > + intel_fixed_bits_by_idx(i, INTEL_FIXED_0_ENABLE_PMI))
> > + data |= MSR_CORE_PERF_GLOBAL_INUSE_PMI;
> > + }
> > +
> > + /*
> > + * IA32_PERF_GLOBAL_INUSE.PMI_InUse[bit 63]: This bit is set if
> > + * any IA32_PEBS_ENABLES bit is set, which enables PEBS for a GP or
> > + * fixed counter.
> > + */
> > + if (pmu->pebs_enable)
> > + data |= MSR_CORE_PERF_GLOBAL_INUSE_PMI;
> > +
> > + return data;
> > +}
> > +
> > static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data
> *msr_info)
> > {
> > struct kvm_pmu *pmu = vcpu_to_pmu(vcpu); @@ -360,6 +413,9 @@
> static
> > int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > case MSR_CORE_PERF_GLOBAL_STATUS_SET:
> > msr_info->data = 0;
> > break;
> > + case MSR_CORE_PERF_GLOBAL_INUSE:
> > + msr_info->data = intel_pmu_global_inuse_emulation(pmu);
> > + break;
> > case MSR_IA32_PEBS_ENABLE:
> > msr_info->data = pmu->pebs_enable;
> > break;
> > @@ -409,6 +465,8 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu,
> struct msr_data *msr_info)
> > if (pmu->fixed_ctr_ctrl != data)
> > reprogram_fixed_counters(pmu, data);
> > break;
> > + case MSR_CORE_PERF_GLOBAL_INUSE:
> > + return 1; /* RO MSR */
>
> Is msrs_to_save_pmu[] updated?
I will add it and GLOBAL_STATUS_SET in next version.
thanks
>
> > case MSR_IA32_PEBS_ENABLE:
> > if (data & pmu->pebs_enable_mask)
> > return 1;
* RE: [PATCH 1/9] KVM: x86/PMU: Don't release vLBR caused by PMI
2023-09-12 11:54 ` Like Xu
@ 2023-09-13 6:00 ` Zhang, Xiong Y
0 siblings, 0 replies; 27+ messages in thread
From: Zhang, Xiong Y @ 2023-09-13 6:00 UTC (permalink / raw)
To: Like Xu
Cc: Christopherson,, Sean, Lv, Zhiyuan, Wang, Zhenyu Z, Liang, Kan,
dapeng1.mi@linux.intel.com, kvm@vger.kernel.org
> On 1/9/2023 3:28 pm, Xiong Zhang wrote:
> > The vLBR event is released at vcpu sched-in time if the LBR_EN bit is
> > not set in the GUEST_IA32_DEBUGCTL VMCS field; this bit is cleared in
> > two cases:
> > 1. the guest disables LBR through WRMSR;
> > 2. KVM disables LBR at PMI injection to emulate guest FREEZE_LBR_ON_PMI.
> >
> > In the first case the guest LBR won't be used anymore and the vLBR event
> > can be released, but in the second case the guest LBR is still in use,
> > so the vLBR event must not be released.
> >
> > Consider this sequence:
> > 1. A vPMC overflows; KVM injects a vPMI and clears guest LBR_EN.
> > 2. The guest handles the PMI and reads the LBR records.
> > 3. The vCPU is sched-out and later sched-in; the vLBR event is released.
>
> This has nothing to do with vPMI. If guest LBR is disabled and the guest
> LBR driver doesn't read the records before the KVM vLBR event is released
> (typically after two sched slices), that part of the LBR records is lost
> by design. What is needed here is a generic KVM mechanism to close this
> gap.
The PMI handler reads the LBR records, so this issue is easier to hit with vPMI.
Actually I found this issue with this test: https://lore.kernel.org/kvm/20230901074052.640296-3-xiong.y.zhang@intel.com/T/#u.
Indeed it can be extended to the generic case as you said. In order to close this gap, my rough idea is:
1. Find a point to save an LBR snapshot, but where? Doing it when the guest disables LBR is too heavy, and vcpu sched-out is too late.
2. Before the guest enables LBR, KVM returns the LBR snapshot when the guest reads LBR records.
3. When the guest enables LBR, KVM creates the vLBR event and restores the snapshot (maybe this restore isn't needed).
As the number of LBR MSRs is large, this would make the vLBR overhead bigger.
Anyway, as vPMI may be frequent, this commit still reduces the number of vLBR releases and creations, so it is still needed.
>
> > 4. The guest continues reading LBR records, and KVM creates the vLBR
> > event again. The vLBR event is now the only LBR user on the host, so
> > the host PMU driver resets the HW LBR facility at vLBR creation.
> > 5. The guest gets the remaining LBR records in their reset state.
> > This conflicts with the meaning of FREEZE_LBR_ON_PMI, so the vLBR event
> > must not be released on PMI.
> >
> > This commit adds a freeze_on_pmi flag; it is set at PMI injection and
> > cleared when the guest writes the guest DEBUGCTL MSR. If this flag is
> > true, the vLBR event will not be released.
> >
> > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > ---
> > arch/x86/kvm/vmx/pmu_intel.c | 5 ++++-
> > arch/x86/kvm/vmx/vmx.c | 12 +++++++++---
> > arch/x86/kvm/vmx/vmx.h | 3 +++
> > 3 files changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/pmu_intel.c
> > b/arch/x86/kvm/vmx/pmu_intel.c index f2efa0bf7ae8..3a36a91638c6 100644
> > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > @@ -628,6 +628,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
> > lbr_desc->records.nr = 0;
> > lbr_desc->event = NULL;
> > lbr_desc->msr_passthrough = false;
> > + lbr_desc->freeze_on_pmi = false;
> > }
> >
> > static void intel_pmu_reset(struct kvm_vcpu *vcpu) @@ -670,6 +671,7
> > @@ static void intel_pmu_legacy_freezing_lbrs_on_pmi(struct kvm_vcpu
> *vcpu)
> > if (data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) {
> > data &= ~DEBUGCTLMSR_LBR;
> > vmcs_write64(GUEST_IA32_DEBUGCTL, data);
> > + vcpu_to_lbr_desc(vcpu)->freeze_on_pmi = true;
> > }
> > }
> >
> > @@ -761,7 +763,8 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu
> > *vcpu)
> >
> > static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
> > {
> > - if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
> > + if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR) &&
> > + !vcpu_to_lbr_desc(vcpu)->freeze_on_pmi)
> > intel_pmu_release_guest_lbr_event(vcpu);
> > }
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > e6849f780dba..199d0da1dbee 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -2223,9 +2223,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu,
> struct msr_data *msr_info)
> > get_vmcs12(vcpu)->guest_ia32_debugctl = data;
> >
> > vmcs_write64(GUEST_IA32_DEBUGCTL, data);
> > - if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)-
> >lbr_desc.event &&
> > - (data & DEBUGCTLMSR_LBR))
> > - intel_pmu_create_guest_lbr_event(vcpu);
> > +
> > + if (intel_pmu_lbr_is_enabled(vcpu)) {
> > + struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
> > +
> > + lbr_desc->freeze_on_pmi = false;
> > + if (!lbr_desc->event && (data & DEBUGCTLMSR_LBR))
> > + intel_pmu_create_guest_lbr_event(vcpu);
> > + }
> > +
> > return 0;
> > }
> > case MSR_IA32_BNDCFGS:
> > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index
> > c2130d2c8e24..9729ccfa75ae 100644
> > --- a/arch/x86/kvm/vmx/vmx.h
> > +++ b/arch/x86/kvm/vmx/vmx.h
> > @@ -107,6 +107,9 @@ struct lbr_desc {
> >
> > /* True if LBRs are marked as not intercepted in the MSR bitmap */
> > bool msr_passthrough;
> > +
> > + /* True if LBR is frozen on PMI */
> > + bool freeze_on_pmi;
> > };
> >
> > /*
end of thread, other threads: [~2023-09-13 6:00 UTC | newest]
Thread overview: 27+ messages
2023-09-01 7:28 [PATCH 0/9] Upgrade vPMU version to 5 Xiong Zhang
2023-09-01 7:28 ` [PATCH 1/9] KVM: x86/PMU: Don't release vLBR caused by PMI Xiong Zhang
2023-09-06 9:47 ` Mi, Dapeng
2023-09-12 11:54 ` Like Xu
2023-09-13 6:00 ` Zhang, Xiong Y
2023-09-01 7:28 ` [PATCH 2/9] KVM: x85/pmu: Add Streamlined FREEZE_LBR_ON_PMI for vPMU v4 Xiong Zhang
2023-09-06 9:49 ` Mi, Dapeng
2023-09-01 7:28 ` [PATCH 3/9] KVM: x86/pmu: Add PERF_GLOBAL_STATUS_SET MSR emulation Xiong Zhang
2023-09-01 7:28 ` [PATCH 4/9] KVM: x86/pmu: Add MSR_PERF_GLOBAL_INUSE emulation Xiong Zhang
2023-09-12 11:41 ` Like Xu
2023-09-13 5:11 ` Zhang, Xiong Y
2023-09-01 7:28 ` [PATCH 5/9] KVM: x86/pmu: Check CPUID.0AH.ECX consistency Xiong Zhang
2023-09-06 9:44 ` Mi, Dapeng
2023-09-12 0:45 ` Zhang, Xiong Y
2023-09-12 11:31 ` Like Xu
2023-09-13 4:25 ` Zhang, Xiong Y
2023-09-01 7:28 ` [PATCH 6/9] KVM: x86/pmu: Add Intel PMU supported fixed counters mask Xiong Zhang
2023-09-06 10:08 ` Mi, Dapeng
2023-09-01 7:28 ` [PATCH 7/9] KVM: x86/pmu: Add fixed counter enumeration for pmu v5 Xiong Zhang
2023-09-12 11:24 ` Like Xu
2023-09-13 4:11 ` Zhang, Xiong Y
2023-09-01 7:28 ` [PATCH 8/9] KVM: x86/pmu: Upgrade pmu version to 5 on intel processor Xiong Zhang
2023-09-12 11:19 ` Like Xu
2023-09-13 3:34 ` Zhang, Xiong Y
2023-09-01 7:28 ` [PATCH 9/9] KVM: selftests: Add fixed counters enumeration test case Xiong Zhang
2023-09-11 3:03 ` Mi, Dapeng
2023-09-12 0:35 ` Zhang, Xiong Y