* [PATCH 0/4] Part of fix for host and guest LBR event coexist
@ 2023-06-16 11:33 Xiong Zhang
  2023-06-16 11:33 ` [PATCH 1/4] perf/x86/intel: Get shared reg constraints first for vLBR Xiong Zhang
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Xiong Zhang @ 2023-06-16 11:33 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: seanjc, pbonzini, peterz, like.xu.linux, kan.liang, zhenyuw,
	zhiyuan.lv, Xiong Zhang

Perf has four types of events: per cpu pinned events, per process pinned
events, per cpu events and per process events, in decreasing order of
priority. A higher priority event can preempt a lower priority event and
take over the hardware resource. The perf scheduler activates an event on a
specific cpu by sending an IPI to the target cpu.
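
For reference, a rough sketch of how these four event types map onto
perf_event_open(2) arguments (illustrative only; the pid/cpu arguments
select the event's scope and attr.pinned selects pinned scheduling):

  #define _GNU_SOURCE
  #include <linux/perf_event.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  /* Open an event; pid/cpu follow the perf_event_open(2) conventions. */
  static int open_event(struct perf_event_attr *attr, pid_t pid, int cpu)
  {
  	return syscall(__NR_perf_event_open, attr, pid, cpu, -1, 0);
  }

  /*
   * attr->pinned = 1, pid = -1, cpu = N  -> per cpu pinned event (highest)
   * attr->pinned = 1, pid = P,  cpu = -1 -> per process pinned event
   * attr->pinned = 0, pid = -1, cpu = N  -> per cpu event
   * attr->pinned = 0, pid = P,  cpu = -1 -> per process event (lowest)
   */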

When the guest accesses an LBR msr for the first time, kvm creates a per
process pinned vLBR event which takes part in perf scheduling. While the
vLBR event is active, the LBR is owned by the guest and the guest can
access the LBR msrs. While the vLBR event is inactive, the LBR is owned by
the host and the guest cannot access the LBR msrs.

But currently the vLBR event is always active, even when the LBR is owned
by a higher priority host per cpu pinned LBR event; this violates the perf
scheduler's rules. A vLBR event is an ordinary perf event with nothing
special about it from the perf scheduler's point of view, so it should
follow the scheduler's rules.

This patchset tries to fix this violation, so that the vLBR event does not
break the host, and expects the following results when host and guest LBR
events coexist:
1. If a host per cpu pinned LBR event is active when the vm starts, the
guest vLBR event cannot preempt the LBR, so the guest cannot use the LBR.
2. If other host LBR events are active when the vm starts, the guest vLBR
event can preempt the LBR, so the guest can use the LBR.
3. If a host per cpu pinned LBR event becomes active while the guest vLBR
event is active, the guest vLBR event loses the LBR and the guest cannot
use the LBR anymore.
4. If other host LBR events become active while the guest vLBR event is
active, the guest vLBR event keeps the LBR and the guest can still use it.
5. If the host per cpu pinned LBR event becomes inactive while the guest
vLBR event is inactive, the guest vLBR event can become active and own the
LBR, so the guest can use the LBR.

Each of the first three commits fixes one issue in host and guest LBR
coexistence; the fourth commit adds a kernel selftest covering the cases
above.

Even with this patchset, the coexistence of host and guest perf LBR events
still has gaps. These gaps actually exist in the vPMU architecture whenever
host and guest perf events coexist: a kvm guest perf event can be inactive
in two cases:
1. The counter or other hw resource is already fully used when the kvm
guest perf event is created.
2. A higher priority host event preempts the kvm guest perf event in the
vm exit handler.
But currently the guest gets no notification of these failures, assumes its
PMU still works, and then reads wrong data. Some PV interface may be
needed.

Perf command to create per cpu pinned LBR event:
perf record -b -a -e instructions:D

Xiong Zhang (4):
  perf/x86/intel: Get shared reg constraints first for vLBR
  KVM: VMX/pmu: Save host debugctlmsr just before vm entry
  KVM: vmx/pmu: Enable inactive vLBR event in guest LBR MSR emulation
  KVM:X86:selftests: Add test case for guest and host LBR preemption

 arch/x86/events/intel/core.c                  |   6 +-
 arch/x86/kvm/vmx/pmu_intel.c                  |   9 +-
 arch/x86/kvm/vmx/vmx.c                        |   5 +-
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/ucall_common.h      |  17 ++
 .../kvm/x86_64/pmu_event_filter_test.c        |  16 --
 .../kvm/x86_64/vmx_pmu_lbr_contend.c          | 171 ++++++++++++++++++
 7 files changed, 201 insertions(+), 24 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c

-- 
2.25.1



* [PATCH 1/4] perf/x86/intel: Get shared reg constraints first for vLBR
  2023-06-16 11:33 [PATCH 0/4] Part of fix for host and guest LBR event coexist Xiong Zhang
@ 2023-06-16 11:33 ` Xiong Zhang
  2023-06-28  4:25   ` Like Xu
  2023-06-16 11:33 ` [PATCH 2/4] KVM: VMX/pmu: Save host debugctlmsr just before vm entry Xiong Zhang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Xiong Zhang @ 2023-06-16 11:33 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: seanjc, pbonzini, peterz, like.xu.linux, kan.liang, zhenyuw,
	zhiyuan.lv, Xiong Zhang

When the host has a per cpu pinned LBR event and the guest also uses the
LBR, the host cannot get correct LBR data, because the physical LBR is
preempted by the guest.

The rule for multiple events sharing the LBR is defined in
__intel_shared_reg_get_constraints(), but the guest vLBR event skips this
function. So even when the host has a per cpu pinned LBR event, the guest
vLBR event gets its constraint successfully and makes vlbr_exclude_host
return true, and in the end the host cannot enable the LBR in
intel_pmu_lbr_enable_all().

This commit moves intel_vlbr_constraints() after
intel_shared_regs_constraints(): the guest vLBR event uses the LBR as well,
so it should obtain the LBR resource through
intel_shared_regs_constraints() like every other LBR user.

Fixes: 097e4311cda9 ("perf/x86: Add constraint to create guest LBR event without hw counter")
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
---
 arch/x86/events/intel/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 6fd3cd97a6ac..2e27a69e9725 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3347,15 +3347,15 @@ __intel_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 {
 	struct event_constraint *c;
 
-	c = intel_vlbr_constraints(event);
+	c = intel_bts_constraints(event);
 	if (c)
 		return c;
 
-	c = intel_bts_constraints(event);
+	c = intel_shared_regs_constraints(cpuc, event);
 	if (c)
 		return c;
 
-	c = intel_shared_regs_constraints(cpuc, event);
+	c = intel_vlbr_constraints(event);
 	if (c)
 		return c;
 
-- 
2.25.1



* [PATCH 2/4] KVM: VMX/pmu: Save host debugctlmsr just before vm entry
  2023-06-16 11:33 [PATCH 0/4] Part of fix for host and guest LBR event coexist Xiong Zhang
  2023-06-16 11:33 ` [PATCH 1/4] perf/x86/intel: Get shared reg constraints first for vLBR Xiong Zhang
@ 2023-06-16 11:33 ` Xiong Zhang
  2023-06-23 20:15   ` Sean Christopherson
  2023-06-16 11:33 ` [PATCH 3/4] KVM: VMX/pmu: Enable inactive vLBR event in guest LBR MSR emulation Xiong Zhang
  2023-06-16 11:33 ` [PATCH 4/4] KVM: selftests: Add test case for guest and host LBR preemption Xiong Zhang
  3 siblings, 1 reply; 22+ messages in thread
From: Xiong Zhang @ 2023-06-16 11:33 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: seanjc, pbonzini, peterz, like.xu.linux, kan.liang, zhenyuw,
	zhiyuan.lv, Xiong Zhang

Perf defines four types of perf events: per cpu pinned events, per process
pinned events, per cpu events and per process events, in decreasing order
of priority. The vLBR event is a per process pinned event. So during the
vm exit handler, if the vLBR event preempts a lower priority host LBR
event, perf disables the LBR and lets the guest control it; if the vLBR
event is preempted by a higher priority host LBR event, perf enables the
LBR. In short, the LBR status may change during the vm exit handler.

MSR_IA32_DEBUGCTLMSR[0] controls LBR enabling. kvm saves its value into
vmx->host_debugctlmsr in vcpu_load() and restores that value immediately
after vm exit. Since MSR_IA32_DEBUGCTLMSR[0] may change during the vm exit
handler, the saved vmx->host_debugctlmsr value may be stale. So this commit
saves MSR_IA32_DEBUGCTLMSR into vmx->host_debugctlmsr just before vm entry,
so that it reflects the real hardware value.

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44fb619803b8..5ca61a26d0d7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1459,13 +1459,9 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
  */
 static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
-	struct vcpu_vmx *vmx = to_vmx(vcpu);
-
 	vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
 
 	vmx_vcpu_pi_load(vcpu, cpu);
-
-	vmx->host_debugctlmsr = get_debugctlmsr();
 }
 
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
@@ -7273,6 +7269,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	atomic_switch_perf_msrs(vmx);
 	if (intel_pmu_lbr_is_enabled(vcpu))
 		vmx_passthrough_lbr_msrs(vcpu);
+	vmx->host_debugctlmsr = get_debugctlmsr();
 
 	if (enable_preemption_timer)
 		vmx_update_hv_timer(vcpu);
-- 
2.25.1



* [PATCH 3/4] KVM: VMX/pmu: Enable inactive vLBR event in guest LBR MSR emulation
  2023-06-16 11:33 [PATCH 0/4] Part of fix for host and guest LBR event coexist Xiong Zhang
  2023-06-16 11:33 ` [PATCH 1/4] perf/x86/intel: Get shared reg constraints first for vLBR Xiong Zhang
  2023-06-16 11:33 ` [PATCH 2/4] KVM: VMX/pmu: Save host debugctlmsr just before vm entry Xiong Zhang
@ 2023-06-16 11:33 ` Xiong Zhang
  2023-06-23 20:38   ` Sean Christopherson
  2023-06-28  5:50   ` Like Xu
  2023-06-16 11:33 ` [PATCH 4/4] KVM: selftests: Add test case for guest and host LBR preemption Xiong Zhang
  3 siblings, 2 replies; 22+ messages in thread
From: Xiong Zhang @ 2023-06-16 11:33 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: seanjc, pbonzini, peterz, like.xu.linux, kan.liang, zhenyuw,
	zhiyuan.lv, Xiong Zhang

The vLBR event can be inactive in two cases:
a. a host per cpu pinned LBR event occupies the LBR when the vLBR event is
created;
b. the vLBR event is preempted by a host per cpu pinned LBR event during
the vm exit handler.
When the vLBR event is inactive, the guest cannot access the LBR msrs: the
event is forced into the error state and excluded from scheduling by the
perf scheduler. The vLBR event therefore cannot become active through the
perf scheduler even after the host per cpu pinned LBR event has released
the LBR. Instead kvm can enable the vLBR event proactively; the vLBR event
may then become active and the LBR msrs can be passed through to the
guest.

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
---
 arch/x86/kvm/vmx/pmu_intel.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 741efe2c497b..5a3ab8c8711b 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -314,7 +314,16 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
 	if (!intel_pmu_is_valid_lbr_msr(vcpu, index))
 		return false;
 
-	if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
+	/* vLBR event may be inactive, but physical LBR may be free now.
+	 * but vLBR event is pinned event, once it is inactive state, perf
+	 * will force it to error state in merge_sched_in() and exclude it from
+	 * perf schedule, so even if LBR is free now, vLBR event couldn't be active
+	 * through perf scheduler and vLBR event could be active through
+	 * perf_event_enable().
+	 */
+	if (lbr_desc->event && (lbr_desc->event->state == PERF_EVENT_STATE_ERROR))
+		perf_event_enable(lbr_desc->event);
+	else if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
 		goto dummy;
 
 	/*
-- 
2.25.1



* [PATCH 4/4] KVM: selftests: Add test case for guest and host LBR preemption
  2023-06-16 11:33 [PATCH 0/4] Part of fix for host and guest LBR event coexist Xiong Zhang
                   ` (2 preceding siblings ...)
  2023-06-16 11:33 ` [PATCH 3/4] KVM: VMX/pmu: Enable inactive vLBR event in guest LBR MSR emulation Xiong Zhang
@ 2023-06-16 11:33 ` Xiong Zhang
  2023-06-28  6:27   ` Like Xu
  2023-06-28  9:27   ` Yang, Weijiang
  3 siblings, 2 replies; 22+ messages in thread
From: Xiong Zhang @ 2023-06-16 11:33 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: seanjc, pbonzini, peterz, like.xu.linux, kan.liang, zhenyuw,
	zhiyuan.lv, Xiong Zhang

When the guest accesses an LBR msr for the first time, kvm creates a vLBR
event; the vLBR event takes part in perf scheduling and occupies the
physical LBR for guest usage. Once the vLBR event is active and owns the
LBR, the guest can access the LBR msrs.

But the vLBR event is a per process pinned event. Perf has a higher
priority event type, the per cpu pinned LBR event, and lower priority
types, the per cpu LBR event and the per process LBR event.
So as long as the host has no higher priority per cpu pinned LBR event, the
vLBR event can hold the physical LBR indefinitely. But once a per cpu
pinned LBR event becomes active, the vLBR event cannot be active anymore,
and the guest cannot access the LBR msrs.

This commit adds a test case to cover guest and host LBR contention.

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
---
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/ucall_common.h      |  17 ++
 .../kvm/x86_64/pmu_event_filter_test.c        |  16 --
 .../kvm/x86_64/vmx_pmu_lbr_contend.c          | 171 ++++++++++++++++++
 4 files changed, 189 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 4761b768b773..422bbc16ba2a 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -100,6 +100,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_dirty_log_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_exception_with_invalid_guest_state
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_msrs_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_invalid_nested_guest_state
+TEST_GEN_PROGS_x86_64 += x86_64/vmx_pmu_lbr_contend
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
 TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
diff --git a/tools/testing/selftests/kvm/include/ucall_common.h b/tools/testing/selftests/kvm/include/ucall_common.h
index 1a6aaef5ccae..c1bb0cacf390 100644
--- a/tools/testing/selftests/kvm/include/ucall_common.h
+++ b/tools/testing/selftests/kvm/include/ucall_common.h
@@ -35,6 +35,23 @@ void ucall(uint64_t cmd, int nargs, ...);
 uint64_t get_ucall(struct kvm_vcpu *vcpu, struct ucall *uc);
 void ucall_init(struct kvm_vm *vm, vm_paddr_t mmio_gpa);
 
+/*
+ * Run the VM to the next GUEST_SYNC(value), and return the value passed
+ * to the sync. Any other exit from the guest is fatal.
+ */
+static inline uint64_t run_vcpu_to_sync(struct kvm_vcpu *vcpu)
+{
+	struct ucall uc;
+
+	vcpu_run(vcpu);
+	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+	get_ucall(vcpu, &uc);
+	TEST_ASSERT(uc.cmd == UCALL_SYNC,
+		    "Received ucall other than UCALL_SYNC: %lu", uc.cmd);
+
+	return uc.args[1];
+}
+
 /*
  * Perform userspace call without any associated data.  This bare call avoids
  * allocating a ucall struct, which can be useful if the atomic operations in
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
index 40507ed9fe8a..8c68029cfb4b 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
@@ -177,22 +177,6 @@ static void amd_guest_code(void)
 	}
 }
 
-/*
- * Run the VM to the next GUEST_SYNC(value), and return the value passed
- * to the sync. Any other exit from the guest is fatal.
- */
-static uint64_t run_vcpu_to_sync(struct kvm_vcpu *vcpu)
-{
-	struct ucall uc;
-
-	vcpu_run(vcpu);
-	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
-	get_ucall(vcpu, &uc);
-	TEST_ASSERT(uc.cmd == UCALL_SYNC,
-		    "Received ucall other than UCALL_SYNC: %lu", uc.cmd);
-	return uc.args[1];
-}
-
 static void run_vcpu_and_sync_pmc_results(struct kvm_vcpu *vcpu)
 {
 	uint64_t r;
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c b/tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c
new file mode 100644
index 000000000000..a6a793f08515
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c
@@ -0,0 +1,171 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test for host and guest LBR preemption
+ *
+ * Copyright (C) 2021 Intel Corporation
+ *
+ */
+
+#define _GNU_SOURCE
+
+#include <linux/perf_event.h>
+#include <sys/syscall.h>
+#include <sys/sysinfo.h>
+#include <unistd.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+static void create_perf_events(int *fds, int cpu_num, bool pinned)
+{
+	struct perf_event_attr attr = {
+		.type = PERF_TYPE_HARDWARE,
+		.size = sizeof(attr),
+		.config = PERF_COUNT_HW_CPU_CYCLES,
+		.sample_type = PERF_SAMPLE_BRANCH_STACK,
+		.sample_period = 1000,
+		.pinned = pinned,
+		.branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
+				      PERF_SAMPLE_BRANCH_USER |
+				      PERF_SAMPLE_BRANCH_KERNEL,
+	};
+	int i;
+
+	for (i = 0; i < cpu_num; i++) {
+		fds[i] = syscall(__NR_perf_event_open, &attr, -1, i, -1, PERF_FLAG_FD_CLOEXEC);
+		TEST_ASSERT(fds[i] != -1, "Failed to create lbr event on cpu%d", i);
+	}
+}
+
+static void release_perf_events(int *fds, int cpu_num)
+{
+	int i;
+
+	for (i = 0; i < cpu_num; i++)
+		close(fds[i]);
+}
+
+#define PERF_CAP_LBR_FMT_MASK  0x1F
+
+#define LBR_NOT_SUPPORTED  0xFFFE
+#define LBR_MSR_WRITE_ERROR 0xFFFD
+
+#define LBR_MODE_CHECK_PASS 0x0
+#define LBR_MSR_WRITE_SUCC  0x1
+
+static bool check_lbr_msr(void)
+{
+	uint64_t v, old_val;
+
+	old_val = rdmsr(MSR_LBR_TOS);
+
+	v  = old_val ^ 0x3UL;
+
+	wrmsr(MSR_LBR_TOS, v);
+	if (rdmsr(MSR_LBR_TOS) != v)
+		return false;
+
+	wrmsr(MSR_LBR_TOS, old_val);
+	if (rdmsr(MSR_LBR_TOS) != old_val)
+		return false;
+
+	return true;
+}
+
+static void guest_code(void)
+{
+	uint64_t v;
+
+	v = rdmsr(MSR_IA32_PERF_CAPABILITIES);
+	if ((v & PERF_CAP_LBR_FMT_MASK) == 0)
+		GUEST_SYNC(LBR_NOT_SUPPORTED);
+
+	GUEST_SYNC(LBR_MODE_CHECK_PASS);
+
+	while (1) {
+		if (!check_lbr_msr()) {
+			GUEST_SYNC(LBR_MSR_WRITE_ERROR);
+			continue;
+		}
+
+		/* Enable LBR to avoid KVM recycling the LBR. */
+		v = rdmsr(MSR_IA32_DEBUGCTLMSR);
+		v |= DEBUGCTLMSR_LBR;
+		wrmsr(MSR_IA32_DEBUGCTLMSR, v);
+
+		GUEST_SYNC(LBR_MSR_WRITE_SUCC);
+	}
+}
+
+int main(int argc, char *argv[])
+{
+	int *fds, ncpus;
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	uint64_t r;
+
+	TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
+	TEST_REQUIRE(host_cpu_is_intel);
+	TEST_REQUIRE(kvm_cpu_property(X86_PROPERTY_PMU_VERSION));
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+	r = run_vcpu_to_sync(vcpu);
+	TEST_ASSERT(r == LBR_MODE_CHECK_PASS,
+		    "LBR format in guest PERF_CAP msr isn't correct");
+
+	ncpus = get_nprocs();
+	fds = malloc(sizeof(int) * ncpus);
+	TEST_ASSERT(fds != NULL, "Failed to create fds for all cpus");
+
+	/* Create per cpu pinned LBR event, then it will own LBR. */
+	create_perf_events(fds, ncpus, true);
+
+	/* Since LBR is owned by per cpu pinned LBR event, guest couldn't get it,
+	 * so guest couldn't access LBR_TOS msr.
+	 */
+	r = run_vcpu_to_sync(vcpu);
+	TEST_ASSERT(r == LBR_MSR_WRITE_ERROR,
+		    "1. Read/write of guest LBR_TOS msr unexpectedly succeeded");
+
+	release_perf_events(fds, ncpus);
+
+	/* Since per cpu pinned event is closed and LBR is free, guest could get it,
+	 * so guest could access LBR_TOS msr.
+	 */
+	r = run_vcpu_to_sync(vcpu);
+	TEST_ASSERT(r == LBR_MSR_WRITE_SUCC,
+		    "2. Failed to read/write guest LBR_TOS msr");
+
+	/* Create per cpu LBR event, its priority is lower than vLBR event, and it
+	 * couldn't get LBR back from vLBR.
+	 */
+	create_perf_events(fds, ncpus, false);
+
+	/* LBR is still owned by guest, so guest could access LBR_TOS successfully. */
+	r = run_vcpu_to_sync(vcpu);
+	TEST_ASSERT(r == LBR_MSR_WRITE_SUCC,
+		    "3. Failed to read/write guest LBR_TOS msr");
+
+	release_perf_events(fds, ncpus);
+
+	/* Create per cpu pinned LBR event, its priority is higher than vLBR event,
+	 * so it will get LBR back from vLBR.
+	 */
+	create_perf_events(fds, ncpus, true);
+
+	/* LBR is preempted by per cpu pinned LBR event, guest couldn't access
+	 * LBR_TOS msr.
+	 */
+	r = run_vcpu_to_sync(vcpu);
+	TEST_ASSERT(r == LBR_MSR_WRITE_ERROR,
+		    "4. Read/write of guest LBR_TOS msr unexpectedly succeeded");
+
+	release_perf_events(fds, ncpus);
+
+	kvm_vm_free(vm);
+
+	free(fds);
+
+	return 0;
+}
-- 
2.25.1



* Re: [PATCH 2/4] KVM: VMX/pmu: Save host debugctlmsr just before vm entry
  2023-06-16 11:33 ` [PATCH 2/4] KVM: VMX/pmu: Save host debugctlmsr just before vm entry Xiong Zhang
@ 2023-06-23 20:15   ` Sean Christopherson
  2023-06-25  4:03     ` Zhang, Xiong Y
  0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2023-06-23 20:15 UTC (permalink / raw)
  To: Xiong Zhang
  Cc: kvm, linux-kernel, pbonzini, peterz, like.xu.linux, kan.liang,
	zhenyuw, zhiyuan.lv

On Fri, Jun 16, 2023, Xiong Zhang wrote:
> Perf defines four types of perf event: per cpu pinned event, per process
> pinned event, per cpu event, per process event, their priority are from
> high to low. vLBR event is per process pinned event. So during vm exit
> handler, if vLBR event preempts perf low priority LBR event, perf will
> disable LBR and let guest control LBR, or if vLBR event is preempted by
> perf high priority LBR event, perf will enable LBR. In a word LBR status
> may be changed during vm exit handler.
> 
> MSR_IA32_DEBUGCTLMSR[0] controls LBR enabling, kvm saves its value into
> vmx->host_debugctlmsr in vcpu_load(), and kvm restores its value from
> vmx->host_debugctlmsr after vm exit immediately. Since
> MSR_IA32_DEBUGCTLMSR[0] could be changed during vm exit handler, the
> saved value vmx->host_debugctlmsr could be wrong. So this commit saves
> MSR_IA32_DEBUGCTLMSR into vmx->host_debugctlmsr just before vm entry to
> reflect the real hardware value.
> 
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 44fb619803b8..5ca61a26d0d7 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1459,13 +1459,9 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
>   */
>  static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
> -
>  	vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
>  
>  	vmx_vcpu_pi_load(vcpu, cpu);
> -
> -	vmx->host_debugctlmsr = get_debugctlmsr();
>  }
>  
>  static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
> @@ -7273,6 +7269,7 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	atomic_switch_perf_msrs(vmx);
>  	if (intel_pmu_lbr_is_enabled(vcpu))
>  		vmx_passthrough_lbr_msrs(vcpu);
> +	vmx->host_debugctlmsr = get_debugctlmsr();

Reading DEBUG_CTL on every VM-Entry is either unnecessary or insufficient.  If
the DEBUG_CTL value is being changed synchronously, then just fix whatever KVM
path leads to a change in the host value.  If DEBUG_CTL is being changed
asynchronously, then I'm guessing the change is coming from NMI context, which
means that KVM is buggy no matter how close we put this to VM-Enter.


* Re: [PATCH 3/4] KVM: VMX/pmu: Enable inactive vLBR event in guest LBR MSR emulation
  2023-06-16 11:33 ` [PATCH 3/4] KVM: VMX/pmu: Enable inactive vLBR event in guest LBR MSR emulation Xiong Zhang
@ 2023-06-23 20:38   ` Sean Christopherson
  2023-06-25  6:54     ` Zhang, Xiong Y
  2023-06-28  5:50   ` Like Xu
  1 sibling, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2023-06-23 20:38 UTC (permalink / raw)
  To: Xiong Zhang
  Cc: kvm, linux-kernel, pbonzini, peterz, like.xu.linux, kan.liang,
	zhenyuw, zhiyuan.lv

On Fri, Jun 16, 2023, Xiong Zhang wrote:
> vLBR event could be inactive in two cases:
> a. host per cpu pinned LBR event occupy LBR when vLBR event is created
> b. vLBR event is preempted by host per cpu pinned LBR event during vm
> exit handler.
> When vLBR event is inactive, guest couldn't access LBR msr, and it is
> forced into error state and is excluded from schedule by perf scheduler.
> So vLBR event couldn't be active through perf scheduler even if host per
> cpu pinned LBR event has released LBR, kvm could enable vLBR event
> proactively, then vLBR event may be active and LBR msr could be passthrough
> into guest.
> 
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
>  arch/x86/kvm/vmx/pmu_intel.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 741efe2c497b..5a3ab8c8711b 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -314,7 +314,16 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
>  	if (!intel_pmu_is_valid_lbr_msr(vcpu, index))
>  		return false;
>  
> -	if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
> +	/* vLBR event may be inactive, but physical LBR may be free now.

	/*
	 * This is the preferred block comment style.
	 */

> +	 * but vLBR event is pinned event, once it is inactive state, perf
> +	 * will force it to error state in merge_sched_in() and exclude it from
> +	 * perf schedule, so even if LBR is free now, vLBR event couldn't be active
> +	 * through perf scheduler and vLBR event could be active through
> +	 * perf_event_enable().
> +	 */

Trimming that down, is this what you mean?

	/*
	 * Attempt to re-enable the vLBR event if it was disabled due to
	 * contention with host LBR usage, i.e. was put into an error state.
	 * Perf doesn't notify KVM if the host stops using LBRs, i.e. KVM needs
	 * to manually re-enable the event.
	 */

Which begs the question, why can't there be a notification of some form that the
LBRs are once again available?

Assuming that's too difficult for whatever reason, why wait until the guest tries
to read LBRs?  E.g. why not be more aggressive and try to re-enable vLBRs on every
VM-Exit.

And if we do wait until the guest touches relevant MSRs, shouldn't writes to
DEBUG_CTL that set DEBUGCTLMSR_LBR also try to re-enable the event?

Lastly, what guarantees that the MSRs hold guest data?  I assume perf purges the
MSRs at some point, but it would be helpful to call that out in the changelog.

> +	if (lbr_desc->event && (lbr_desc->event->state == PERF_EVENT_STATE_ERROR))
> +		perf_event_enable(lbr_desc->event);
> +	else if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
>  		goto dummy;
>  
>  	/*
> -- 
> 2.25.1
> 


* RE: [PATCH 2/4] KVM: VMX/pmu: Save host debugctlmsr just before vm entry
  2023-06-23 20:15   ` Sean Christopherson
@ 2023-06-25  4:03     ` Zhang, Xiong Y
  2023-06-28  5:37       ` Like Xu
  0 siblings, 1 reply; 22+ messages in thread
From: Zhang, Xiong Y @ 2023-06-25  4:03 UTC (permalink / raw)
  To: Christopherson,, Sean
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	pbonzini@redhat.com, peterz@infradead.org,
	like.xu.linux@gmail.com, kan.liang@linux.intel.com,
	zhenyuw@linux.intel.com, Lv, Zhiyuan

> On Fri, Jun 16, 2023, Xiong Zhang wrote:
> > Perf defines four types of perf event: per cpu pinned event, per
> > process pinned event, per cpu event, per process event, their
> > prioirity are from high to low. vLBR event is per process pinned
> > event. So durng vm exit handler, if vLBR event preempts perf low
> > priority LBR event, perf will disable LBR and let guest control LBR,
> > or if vLBR event is preempted by perf high priority LBR event, perf
> > will enable LBR. In a word LBR status may be changed during vm exit handler.
> >
> > MSR_IA32_DEBUGCTLMSR[0] controls LBR enabling, kvm saves its value
> > into
> > vmx->host_debugctlmsr in vcpu_load(), and kvm restores its value from
> > vmx->host_debugctlmsr after vm exit immediately. Since
> > MSR_IA32_DEBUGCTLMSR[0] could be changed during vm exit handler, the
> > saved value vmx->host_debugctlmsr could be wrong. So this commit saves
> > MSR_IA32_DEBUGCTLMSR into vmx->host_debugctlmsr just before vm entry
> > to reflect the real hardware value.
> >
> > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
> > 44fb619803b8..5ca61a26d0d7 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1459,13 +1459,9 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu,
> int cpu,
> >   */
> >  static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)  {
> > -	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > -
> >  	vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
> >
> >  	vmx_vcpu_pi_load(vcpu, cpu);
> > -
> > -	vmx->host_debugctlmsr = get_debugctlmsr();
> >  }
> >
> >  static void vmx_vcpu_put(struct kvm_vcpu *vcpu) @@ -7273,6 +7269,7 @@
> > static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
> >  	atomic_switch_perf_msrs(vmx);
> >  	if (intel_pmu_lbr_is_enabled(vcpu))
> >  		vmx_passthrough_lbr_msrs(vcpu);
> > +	vmx->host_debugctlmsr = get_debugctlmsr();
> 
> Reading DEBUG_CTL on every VM-Entry is either unnecessary or insufficient.  If
> the DEBUG_CTL value is being changed synchronously, then just fix whatever
> KVM path leads to a change in the host value.  If DEBUG_CTL is being changed
> asynchronously, then I'm guessing the change is coming from NMI context,
> which means that KVM is buggy no matter how close we put this to VM-Enter.
When a perf event reschedule is needed on a physical cpu, the perf scheduler sends an IPI to the target cpu, and the LBR is enabled or disabled in the IPI handler according to the active event's attributes.
If a vLBR event is active, the LBR is disabled in the IPI handler.
If a host LBR event is active, the LBR is enabled in the IPI handler; this can happen when a host LBR event preempts the vLBR event during the vm exit handler.
So DEBUG_CTL[0] changes asynchronously, in the perf IPI handler. Host irqs are disabled near VM-Enter, so no IPI can arrive, and host DEBUG_CTL[0] cannot change before kvm re-enables irqs.
A perf counter overflow (PMI) is an NMI, but the NMI handler does not change the LBR status, so the saved host_debugctlmsr is still correct after a PMI handler runs.

thanks 


* RE: [PATCH 3/4] KVM: VMX/pmu: Enable inactive vLBR event in guest LBR MSR emulation
  2023-06-23 20:38   ` Sean Christopherson
@ 2023-06-25  6:54     ` Zhang, Xiong Y
  2023-06-26 17:00       ` Sean Christopherson
  0 siblings, 1 reply; 22+ messages in thread
From: Zhang, Xiong Y @ 2023-06-25  6:54 UTC (permalink / raw)
  To: Christopherson,, Sean
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	pbonzini@redhat.com, peterz@infradead.org,
	like.xu.linux@gmail.com, kan.liang@linux.intel.com,
	zhenyuw@linux.intel.com, Lv, Zhiyuan

> > On Fri, Jun 16, 2023, Xiong Zhang wrote:
> > vLBR event could be inactive in two cases:
> > a. host per cpu pinned LBR event occupy LBR when vLBR event is created
> > b. vLBR event is preempted by host per cpu pinned LBR event during vm
> > exit handler.
> > When vLBR event is inactive, guest couldn't access LBR msr, and it is
> > forced into error state and is excluded from schedule by perf scheduler.
> > So vLBR event couldn't be active through perf scheduler even if host
> > per cpu pinned LBR event has released LBR, kvm could enable vLBR event
> > proactively, then vLBR event may be active and LBR msr could be
> > passthrough into guest.
> >
> > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > ---
> >  arch/x86/kvm/vmx/pmu_intel.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/pmu_intel.c
> > b/arch/x86/kvm/vmx/pmu_intel.c index 741efe2c497b..5a3ab8c8711b 100644
> > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > @@ -314,7 +314,16 @@ static bool intel_pmu_handle_lbr_msrs_access(struct
> kvm_vcpu *vcpu,
> >  	if (!intel_pmu_is_valid_lbr_msr(vcpu, index))
> >  		return false;
> >
> > -	if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
> > +	/* vLBR event may be inactive, but physical LBR may be free now.
> 
> 	/*
> 	 * This is the preferred block comment style.
> 	 */
> 
> > +	 * but vLBR event is pinned event, once it is inactive state, perf
> > +	 * will force it to error state in merge_sched_in() and exclude it from
> > +	 * perf schedule, so even if LBR is free now, vLBR event couldn't be
> active
> > +	 * through perf scheduler and vLBR event could be active through
> > +	 * perf_event_enable().
> > +	 */
> 
> Trimming that down, is this what you mean?
Yes, thanks a lot.
> 
> 	/*
> 	 * Attempt to re-enable the vLBR event if it was disabled due to
> 	 * contention with host LBR usage, i.e. was put into an error state.
> 	 * Perf doesn't notify KVM if the host stops using LBRs, i.e. KVM needs
> 	 * to manually re-enable the event.
> 	 */
> 
> Which begs the question, why can't there be a notification of some form that
> the LBRs are once again available?
This is the perf scheduler's rule. If a pinned event cannot get its resource due to resource limitations, perf puts it into the error state and excludes it from scheduling; even if the resource becomes available later, perf will not schedule the event again because it is in the error state, and the only way to reschedule it is to enable it again.
If a non-pinned event cannot get its resource due to resource limitations, perf puts it into the inactive state and reschedules it automatically once the resource is available.
The vLBR event is a per process pinned event.
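
For illustration only, a minimal userspace sketch of this contract, based on the pinned-event semantics documented in perf_event_open(2): a pinned event that loses the PMU is put into an error state where read() returns 0 (EOF), and only an explicit re-enable (the ioctl counterpart of the in-kernel perf_event_enable() used in patch 3) brings it back:

  #define _GNU_SOURCE
  #include <linux/perf_event.h>
  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  int main(void)
  {
  	struct perf_event_attr attr = {
  		.type = PERF_TYPE_HARDWARE,
  		.size = sizeof(attr),
  		.config = PERF_COUNT_HW_CPU_CYCLES,
  		.pinned = 1,	/* per process pinned, like a vLBR event */
  	};
  	uint64_t count;
  	int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);

  	if (fd < 0)
  		return 1;

  	/* read() == 0 means the pinned event lost the PMU and was parked
  	 * in the error state; perf never reschedules it by itself. */
  	if (read(fd, &count, sizeof(count)) == 0)
  		ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);	/* manual re-enable */

  	close(fd);
  	return 0;
  }
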
> 
> Assuming that's too difficult for whatever reason, why wait until the guest tries
> to read LBRs?  E.g. why not be more aggressive and try to re-enable vLBRs on
> every VM-Exit.
Yes, that is a good suggestion. vmx_passthrough_lbr_msrs() is already called on every
VM-exit and already checks the vLBR event state, so I could re-enable the vLBR event in that function.
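
Something like the following hypothetical helper (a sketch on top of the lbr_desc->event tracking from patch 3, not code from this series) could then be called from vmx_passthrough_lbr_msrs() on every VM-exit:

  static void vmx_try_reenable_vlbr(struct kvm_vcpu *vcpu)
  {
  	struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);

  	/* Re-enable a vLBR event that perf parked in the error state. */
  	if (lbr_desc->event &&
  	    lbr_desc->event->state == PERF_EVENT_STATE_ERROR)
  		perf_event_enable(lbr_desc->event);
  }
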
> 
> And if we do wait until the guest touches relevant MSRs, shouldn't writes to
> DEBUG_CTL that set DEBUGCTLMSR_LBR also try to re-enable the event?
Only perf knows whether the vLBR event can be active at a given moment. If the vLBR event is active, KVM could in theory read/write DEBUG_CTL[0] inside an irq disable/enable pair, but it is better that kvm does not touch perf hw resources directly: the vLBR event is just one host LBR event, the host may have other LBR events, and perf schedules them all according to its rules. If the vLBR event is inactive, KVM must not touch the DEBUG_CTL msr at all.
> 
> Lastly, what guarantees that the MSRs hold guest data?  I assume perf purges
> the MSRs at some point, but it would be helpful to call that out in the changelog.
For the DEBUG_CTL msr, the VMCS has two fields for this:
1. "Save debug controls" in the VM-Exit controls
2. "Load debug controls" in the VM-Entry controls
For the LBR record msrs, perf saves them when the process is scheduled out and loads them when the process is scheduled in.
Sure, I will add this to the changelog.
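
For reference, the two controls mentioned above, as defined in arch/x86/include/asm/vmx.h (both happen to be bit 2 of their respective control fields):

  #define VM_EXIT_SAVE_DEBUG_CONTROLS	0x00000004
  #define VM_ENTRY_LOAD_DEBUG_CONTROLS	0x00000004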

> > +	if (lbr_desc->event && (lbr_desc->event->state ==
> PERF_EVENT_STATE_ERROR))
> > +		perf_event_enable(lbr_desc->event);
> > +	else if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu)
> > +< 0)
> >  		goto dummy;
> >
> >  	/*
> > --
> > 2.25.1
> >


* Re: [PATCH 3/4] KVM: VMX/pmu: Enable inactive vLBR event in guest LBR MSR emulation
  2023-06-25  6:54     ` Zhang, Xiong Y
@ 2023-06-26 17:00       ` Sean Christopherson
  2023-06-27  3:29         ` Zhang, Xiong Y
  0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2023-06-26 17:00 UTC (permalink / raw)
  To: Xiong Y Zhang
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	pbonzini@redhat.com, peterz@infradead.org,
	like.xu.linux@gmail.com, kan.liang@linux.intel.com,
	zhenyuw@linux.intel.com, Zhiyuan Lv, Yang Weijiang

+Weijiang

On Sun, Jun 25, 2023, Xiong Y Zhang wrote:
> > > On Fri, Jun 16, 2023, Xiong Zhang wrote:
> > > vLBR event could be inactive in two cases:
> > > a. host per cpu pinned LBR event occupy LBR when vLBR event is created
> > > b. vLBR event is preempted by host per cpu pinned LBR event during vm
> > > exit handler.
> > > When vLBR event is inactive, guest couldn't access LBR msr, and it is
> > > forced into error state and is excluded from schedule by perf scheduler.
> > > So vLBR event couldn't be active through perf scheduler even if host
> > > per cpu pinned LBR event has released LBR, kvm could enable vLBR event
> > > proactively, then vLBR event may be active and LBR msr could be
> > > passthrough into guest.
> > >
> > > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > > ---
> > >  arch/x86/kvm/vmx/pmu_intel.c | 11 ++++++++++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/vmx/pmu_intel.c
> > > b/arch/x86/kvm/vmx/pmu_intel.c index 741efe2c497b..5a3ab8c8711b 100644
> > > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > > @@ -314,7 +314,16 @@ static bool intel_pmu_handle_lbr_msrs_access(struct
> > kvm_vcpu *vcpu,
> > >  	if (!intel_pmu_is_valid_lbr_msr(vcpu, index))
> > >  		return false;
> > >
> > > -	if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
> > > +	/* vLBR event may be inactive, but physical LBR may be free now.
> > 
> > 	/*
> > 	 * This is the preferred block comment style.
> > 	 */
> > 
> > > +	 * but vLBR event is pinned event, once it is inactive state, perf
> > > +	 * will force it to error state in merge_sched_in() and exclude it from
> > > +	 * perf schedule, so even if LBR is free now, vLBR event couldn't be
> > active
> > > +	 * through perf scheduler and vLBR event could be active through
> > > +	 * perf_event_enable().
> > > +	 */
> > 
> > Trimming that down, is this what you mean?
> Yes, thanks a lot.
> > 
> > 	/*
> > 	 * Attempt to re-enable the vLBR event if it was disabled due to
> > 	 * contention with host LBR usage, i.e. was put into an error state.
> > 	 * Perf doesn't notify KVM if the host stops using LBRs, i.e. KVM needs
> > 	 * to manually re-enable the event.
> > 	 */
> > 
> > Which begs the question, why can't there be a notification of some form that
> > the LBRs are once again available?
> This is perf scheduler rule. If pinned event couldn't get resource as
> resource limitation, perf will put it into error state and exclude it from
> perf scheduler, even if resource available later, perf won't schedule it
> again as it is in error state, the only way to reschedule it is to enable it
> again.  If non-pinned event couldn't get resource as resource limitation,
> perf will put it into inactive state, perf will reschedule it automatically
> once resource is available.  vLBR event is per process pinned event.

That doesn't answer my question.  I get that all of this is subject to perf
scheduling, I'm asking why perf doesn't communicate directly with KVM to coordinate
access to LBRs instead of pulling the rug out from under KVM.

> > Assuming that's too difficult for whatever reason, why wait until the guest tries
> > to read LBRs?  E.g. why not be more aggressive and try to re-enable vLBRs on
> > every VM-Exit.
> Yes, it is a good suggestion. Actually vmx_passthrough_lbr_msrs() is called on every
> VM-exit, it also check vLBR event state, so I could re-enable vLBR in this function.
> > 
> > And if we do wait until the guest touches relevant MSRs, shouldn't writes to
> > DEBUG_CTL that set DEBUGCTLMSR_LBR also try to re-enable the event?
> Only perf know whether vLBR event could be active or not at this moment, if
> vLBR is active, KVM could read/write DEBUG_CTL[0] with irq disable/enable
> pair in theory, but it is better that kvm don't touch perf hw resource
> directly, as vLBR event is one host LBR event, host may have other LBR event,
> perf will schedule them according to perf scheduler rule.  If vLBR is
> inactive, KVM shouldn't touch DEBUG_CTL MSR totally.

Again, this doesn't answer my question.  I didn't suggest KVM write anything
directly, I asked why KVM doesn't try to re-enable the perf LBR event when emulating
a guest write to DEBUG_CTL.

> > Lastly, what guarantees that the MSRs hold guest data?  I assume perf purges
> > the MSRs at some point, but it would be helpful to call that out in the changelog.
> For DEBUG_CTL msr, VMCS has two fields for this:
> 1. "Save debug controls" in VM-Exit controls
> 2. "Load debug controls" in VM-Entry controls
> For LBR records MSRs, perf will save them at process schedule out and load them at process schedule in.

Once again, this doesn't answer my question.  I want to know *exactly* when perf
can take control of the LBRs.  The fact that intel_pmu_handle_lbr_msrs_access()
disables IRQs when checking lbr_desc->event->state strongly suggests that the
answer isn't "when a task is scheduled in".

Your other response[1] mostly answered that question, but I want explicit documentation
on the contract between perf and KVM with respect to LBRs.  In short, please work
with Weijiang to fulfill my request/demand[*] that someone document KVM's LBR support,
and justify the "design".  I am simply not willing to take KVM LBR patches until that
documentation is provided.

Copy+pasting my request/demand for convenience:

  : First and foremost, the existing LBR support needs to be documented.  Someone,
  : I don't care who, needs to provide a detailed writeup of the contract between KVM
  : and perf.  Specifically, I want to know:
  : 
  :   1. When exactly is perf allowed to take control of LBR MSRs.  Task switch?  IRQ?
  :      NMI?
  : 
  :   2. What is the expected behavior when perf is using LBRs?  Is the guest supposed
  :      to be traced?
  : 
  :   3. Why does KVM snapshot DEBUGCTL with IRQs enabled, but disables IRQs when
  :      accessing LBR MSRs?
  : 
  : It doesn't have to be polished, e.g. I'll happily wordsmith things into proper
  : documentation, but I want to have a very clear understanding of how LBR support
  : is _intended_ to function and how it all _actually_ functions without having to
  : make guesses.
  : 
  : And depending on the answers, I want to revisit KVM's LBR implementation before
  : tackling arch LBRs.  Letting perf usurp LBRs while KVM has the vCPU loaded is
  : frankly ridiculous.  Just have perf set a flag telling KVM that it needs to take
  : control of LBRs and have KVM service the flag as a request or something.  Stealing
  : the LBRs back in IRQ context adds a stupid amount of complexity without much value,
  : e.g. waiting a few branches for KVM to get to a safe place isn't going to meaningfully
  : change the traces.  If that can't actually happen, then why on earth does KVM need
  : to disable IRQs to read MSRs?
  : 
  : And AFAICT, since KVM unconditionally loads the guest's DEBUGCTL, whether or not
  : guest branches show up in the LBRs when the host is tracing is completely up to
  : the whims of the guest.  If that's correct, then again, what's the point of the
  : dance between KVM and perf?

[1] https://lkml.kernel.org/r/MW4PR11MB5824480492ED55C63769FF11BB21A%40MW4PR11MB5824.namprd11.prod.outlook.com
[2] https://lore.kernel.org/all/Y9RUOvJ5dkCU9J8C@google.com


* RE: [PATCH 3/4] KVM: VMX/pmu: Enable inactive vLBR event in guest LBR MSR emulation
  2023-06-26 17:00       ` Sean Christopherson
@ 2023-06-27  3:29         ` Zhang, Xiong Y
  2023-06-27 15:07           ` Sean Christopherson
  0 siblings, 1 reply; 22+ messages in thread
From: Zhang, Xiong Y @ 2023-06-27  3:29 UTC (permalink / raw)
  To: Christopherson,, Sean
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	pbonzini@redhat.com, peterz@infradead.org,
	like.xu.linux@gmail.com, kan.liang@linux.intel.com,
	zhenyuw@linux.intel.com, Lv, Zhiyuan, Yang, Weijiang

> On Sun, Jun 25, 2023, Xiong Y Zhang wrote:
> > > > On Fri, Jun 16, 2023, Xiong Zhang wrote:
> > > > vLBR event could be inactive in two cases:
> > > > a. host per cpu pinned LBR event occupy LBR when vLBR event is
> > > > created b. vLBR event is preempted by host per cpu pinned LBR
> > > > event during vm exit handler.
> > > > When vLBR event is inactive, guest couldn't access LBR msr, and it
> > > > is forced into error state and is excluded from schedule by perf scheduler.
> > > > So vLBR event couldn't be active through perf scheduler even if
> > > > host per cpu pinned LBR event has released LBR, kvm could enable
> > > > vLBR event proactively, then vLBR event may be active and LBR msr
> > > > could be passthrough into guest.
> > > >
> > > > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > > > ---
> > > >  arch/x86/kvm/vmx/pmu_intel.c | 11 ++++++++++-
> > > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/kvm/vmx/pmu_intel.c
> > > > b/arch/x86/kvm/vmx/pmu_intel.c index 741efe2c497b..5a3ab8c8711b
> > > > 100644
> > > > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > > > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > > > @@ -314,7 +314,16 @@ static bool
> > > > intel_pmu_handle_lbr_msrs_access(struct
> > > kvm_vcpu *vcpu,
> > > >  	if (!intel_pmu_is_valid_lbr_msr(vcpu, index))
> > > >  		return false;
> > > >
> > > > -	if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
> > > > +	/* vLBR event may be inactive, but physical LBR may be free now.
> > >
> > > 	/*
> > > 	 * This is the preferred block comment style.
> > > 	 */
> > >
> > > > +	 * but vLBR event is pinned event, once it is inactive state, perf
> > > > +	 * will force it to error state in merge_sched_in() and exclude it from
> > > > +	 * perf schedule, so even if LBR is free now, vLBR event
> > > > +couldn't be
> > > active
> > > > +	 * through perf scheduler and vLBR event could be active through
> > > > +	 * perf_event_enable().
> > > > +	 */
> > >
> > > Trimming that down, is this what you mean?
> > Yes, thanks a lot.
> > >
> > > 	/*
> > > 	 * Attempt to re-enable the vLBR event if it was disabled due to
> > > 	 * contention with host LBR usage, i.e. was put into an error state.
> > > 	 * Perf doesn't notify KVM if the host stops using LBRs, i.e. KVM needs
> > > 	 * to manually re-enable the event.
> > > 	 */
> > >
> > > Which begs the question, why can't there be a notification of some
> > > form that the LBRs are once again available?
> > This is perf scheduler rule. If pinned event couldn't get resource as
> > resource limitation, perf will put it into error state and exclude it
> > from perf scheduler, even if resource available later, perf won't
> > schedule it again as it is in error state, the only way to reschedule
> > it is to enable it again.  If non-pinned event couldn't get resource
> > as resource limitation, perf will put it into inactive state, perf
> > will reschedule it automatically once resource is available.  vLBR event is per
> process pinned event.
> 
> That doesn't answer my question.  I get that all of this is subject to perf
> scheduling, I'm asking why perf doesn't communicate directly with KVM to
> coordinate access to LBRs instead of pulling the rug out from under KVM.
Perf doesn't need such a notification interface currently: a non-pinned event becomes active automatically once the resource is available, and only a pinned event stays inactive even when the resource is available. Perf may refuse to add such an interface for KVM's usage only, and may instead suggest changing the vLBR event from pinned to non-pinned.
> 
> > > Assuming that's too difficult for whatever reason, why wait until
> > > the guest tries to read LBRs?  E.g. why not be more aggressive and
> > > try to re-enable vLBRs on every VM-Exit.
> > Yes, it is a good suggestion. Actually vmx_passthrough_lbr_msrs() is
> > called on every VM-exit, it also check vLBR event state, so I could re-enable
> vLBR in this function.
> > >
> > > And if we do wait until the guest touches relevant MSRs, shouldn't
> > > writes to DEBUG_CTL that set DEBUGCTLMSR_LBR also try to re-enable the
> event?
> > Only perf know whether vLBR event could be active or not at this
> > moment, if vLBR is active, KVM could read/write DEBUG_CTL[0] with irq
> > disable/enable pair in theory, but it is better that kvm don't touch
> > perf hw resource directly, as vLBR event is one host LBR event, host
> > may have other LBR event, perf will schedule them according to perf
> > scheduler rule.  If vLBR is inactive, KVM shouldn't touch DEBUG_CTL MSR
> totally.
> 
> Again, this doesn't answer my question.  I didn't suggest KVM write anything
> directly, I asked why KVM doesn't try to re-enable the perf LBR event when
> emulating a guest write to DEBUG_CTL.
Yes, kvm should re-enable the vLBR event when emulating a guest write to DEBUG_CTL. Since the inactive vLBR event will be re-enabled on every VM-Exit per the above suggestion, no extra code is needed here.
> 
> > > Lastly, what guarantees that the MSRs hold guest data?  I assume
> > > perf purges the MSRs at some point, but it would be helpful to call that out in
> the changelog.
> > For DEBUG_CTL msr, VMCS has two fields for this:
> > 1. "Save debug controls" in VM-Exit controls 2. "Load debug controls"
> > in VM-Entry controls For LBR records MSRs, perf will save them at
> > process schedule out and load them at process schedule in.
> 
> Once again, this doesn't answer my question.  I want to know *exactly* when
> perf can take control of the LBRs.  The fact that
> intel_pmu_handle_lbr_msrs_access()
> disables IRQs when checking lbr_desc->event->state strongly suggests that the
> answer isn't "when a task is scheduled in".
> 
> Your other response[1] mostly answered that question, but I want explicit
> documentation on the contract between perf and KVM with respect to LBRs.  In
> short, please work with Weijiang to fulfill my request/demand[*] that someone
> document KVM's LBR support, and justify the "design".  I am simply not willing to
> take KVM LBR patches until that documentation is provided.
Sure, I will work with Weijiang to supply such documentation. Should this document be put in Documentation/virt/kvm/x86/?
> 
> Copy+pasting my request/demand for convenience:
> 
>   : First and foremost, the existing LBR support needs to be documented.
> Someone,
>   : I don't care who, needs to provide a detailed writeup of the contract between
> KVM
>   : and perf.  Specifically, I want to know:
>   :
>   :   1. When exactly is perf allowed to take control of LBR MSRs.  Task switch?
> IRQ?
>   :      NMI?
>   :
>   :   2. What is the expected behavior when perf is using LBRs?  Is the guest
> supposed
>   :      to be traced?
>   :
>   :   3. Why does KVM snapshot DEBUGCTL with IRQs enabled, but disables IRQs
> when
>   :      accessing LBR MSRs?
>   :
>   : It doesn't have to be polished, e.g. I'll happily wordsmith things into proper
>   : documentation, but I want to have a very clear understanding of how LBR
> support
>   : is _intended_ to function and how it all _actually_ functions without having to
>   : make guesses.
>   :
>   : And depending on the answers, I want to revisit KVM's LBR implementation
> before
>   : tackling arch LBRs.  Letting perf usurp LBRs while KVM has the vCPU loaded is
>   : frankly ridiculous.  Just have perf set a flag telling KVM that it needs to take
>   : control of LBRs and have KVM service the flag as a request or something.
> Stealing
>   : the LBRs back in IRQ context adds a stupid amount of complexity without
> much value,
>   : e.g. waiting a few branches for KVM to get to a safe place isn't going to
> meaningfully
>   : change the traces.  If that can't actually happen, then why on earth does KVM
> need
>   : to disable IRQs to read MSRs?
>   :
>   : And AFAICT, since KVM unconditionally loads the guest's DEBUGCTL, whether
> or not
>   : guest branches show up in the LBRs when the host is tracing is completely up
> to
>   : the whims of the guest.  If that's correct, then again, what's the point of the
>   : dance between KVM and perf?
> 
> [1]
> https://lkml.kernel.org/r/MW4PR11MB5824480492ED55C63769FF11BB21A%40
> MW4PR11MB5824.namprd11.prod.outlook.com
> [2] https://lore.kernel.org/all/Y9RUOvJ5dkCU9J8C@google.com


* Re: [PATCH 3/4] KVM: VMX/pmu: Enable inactive vLBR event in guest LBR MSR emulation
  2023-06-27  3:29         ` Zhang, Xiong Y
@ 2023-06-27 15:07           ` Sean Christopherson
  2023-06-28  6:07             ` Like Xu
  0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2023-06-27 15:07 UTC (permalink / raw)
  To: Xiong Y Zhang
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	pbonzini@redhat.com, peterz@infradead.org,
	like.xu.linux@gmail.com, kan.liang@linux.intel.com,
	zhenyuw@linux.intel.com, Zhiyuan Lv, Weijiang Yang

On Tue, Jun 27, 2023, Xiong Y Zhang wrote:
> > On Sun, Jun 25, 2023, Xiong Y Zhang wrote:
> > > > > On Fri, Jun 16, 2023, Xiong Zhang wrote:
> > > > 	/*
> > > > 	 * Attempt to re-enable the vLBR event if it was disabled due to
> > > > 	 * contention with host LBR usage, i.e. was put into an error state.
> > > > 	 * Perf doesn't notify KVM if the host stops using LBRs, i.e. KVM needs
> > > > 	 * to manually re-enable the event.
> > > > 	 */
> > > >
> > > > Which begs the question, why can't there be a notification of some
> > > > form that the LBRs are once again available?
> > > This is perf scheduler rule. If pinned event couldn't get resource as
> > > resource limitation, perf will put it into error state and exclude it
> > > from perf scheduler, even if resource available later, perf won't
> > > schedule it again as it is in error state, the only way to reschedule
> > > it is to enable it again.  If non-pinned event couldn't get resource
> > > as resource limitation, perf will put it into inactive state, perf
> > > will reschedule it automatically once resource is available.  vLBR event is per
> > process pinned event.
> > 
> > That doesn't answer my question.  I get that all of this is subject to perf
> > scheduling, I'm asking why perf doesn't communicate directly with KVM to
> > coordinate access to LBRs instead of pulling the rug out from under KVM.
> Perf doesn't need such notification interface currently, as non-pinned event
> will be active automatically once resource available, only pinned event is
> still in inactive even if resource available, perf may refuse to add such
> interface for KVM usage only.

Or maybe perf will be overjoyed that someone is finally proposing a coherent
interface.  Until we actually try/ask, we'll never know.

> > Your other response[1] mostly answered that question, but I want explicit
> > documentation on the contract between perf and KVM with respect to LBRs.  In
> > short, please work with Weijiang to fulfill my request/demand[*] that someone
> > document KVM's LBR support, and justify the "design".  I am simply not willing to
> > take KVM LBR patches until that documentation is provided.
> Sure, I will work with Weijiang to supply such documentation. Will this
> document be put in Documentation/virt/kvm/x86/ ?

Ya, Documentation/virt/kvm/x86/pmu.rst please.


* Re: [PATCH 1/4] perf/x86/intel: Get shared reg constraints first for vLBR
  2023-06-16 11:33 ` [PATCH 1/4] perf/x86/intel: Get shared reg constraints first for vLBR Xiong Zhang
@ 2023-06-28  4:25   ` Like Xu
  2023-06-29  2:11     ` Zhang, Xiong Y
  0 siblings, 1 reply; 22+ messages in thread
From: Like Xu @ 2023-06-28  4:25 UTC (permalink / raw)
  To: Xiong Zhang, Sean Christopherson
  Cc: pbonzini, peterz, kan.liang, zhenyuw, zhiyuan.lv, kvm list,
	linux-kernel@vger.kernel.org

On 16/6/2023 7:33 pm, Xiong Zhang wrote:
> When host has per cpu pinned LBR event and guest use LBR also, host
> couldn't get correct LBR data, as the physical LBR is preempted by
> guest.
> 
> The rule for multi events sharing LBR is defined in
> __intel_shared_reg_get_constraints(), but guest vLBR event skips this
> function, so even if host has per cpu pinned LBR event, guest vLBR event
> could get constraints successfully and make vlbr_exclude_host returns true,
> finally host couldn't enable LBR in intel_pmu_lbr_enable_all().

Although it goes against the "per cpu pinned LBR event" priority expectation,
the order is intentionally specified, for two reasons:

- vlbr uses the fake event mechanism in its implementation, much like the
   BTS event; thus the question here is whether we can get a per cpu pinned
   BTS event to work as expected;

- this change should not be applied before KVM does a good job
   of making guest lbr and other lbr events coexist correctly;

If we treat the vlbr event as an ordinary perf_event behind a guest counter,
one expected to comply equally with the scheduling rules of host perf, the
first thing we need to address is how the guest counter should continue to
function while its backing event is preempted by a higher priority one.

> 
> This commit moves intel_vlbr_constraints() behind
> intel_shared_regs_constraints(); since the guest vLBR event also uses LBR,
> it should get the LBR resource through intel_shared_regs_constraints().
> 
> Fixes: 097e4311cda9 ("perf/x86: Add constraint to create guest LBR event without hw counter")
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
>   arch/x86/events/intel/core.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 6fd3cd97a6ac..2e27a69e9725 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3347,15 +3347,15 @@ __intel_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
>   {
>   	struct event_constraint *c;
>   
> -	c = intel_vlbr_constraints(event);
> +	c = intel_bts_constraints(event);
>   	if (c)
>   		return c;
>   
> -	c = intel_bts_constraints(event);
> +	c = intel_shared_regs_constraints(cpuc, event);
>   	if (c)
>   		return c;
>   
> -	c = intel_shared_regs_constraints(cpuc, event);
> +	c = intel_vlbr_constraints(event);
>   	if (c)
>   		return c;
>   

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] KVM: VMX/pmu: Save host debugctlmsr just before vm entry
  2023-06-25  4:03     ` Zhang, Xiong Y
@ 2023-06-28  5:37       ` Like Xu
  0 siblings, 0 replies; 22+ messages in thread
From: Like Xu @ 2023-06-28  5:37 UTC (permalink / raw)
  To: Zhang, Xiong Y, Christopherson,, Sean
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	pbonzini@redhat.com, peterz@infradead.org,
	kan.liang@linux.intel.com, zhenyuw@linux.intel.com, Lv, Zhiyuan

On 25/6/2023 12:03 pm, Zhang, Xiong Y wrote:
>> On Fri, Jun 16, 2023, Xiong Zhang wrote:
>>> Perf defines four types of perf event: per cpu pinned event, per
>>> process pinned event, per cpu event, per process event; their
>>> priority goes from high to low. The vLBR event is a per process
>>> pinned event. So during the vm exit handler, if the vLBR event
>>> preempts a lower priority LBR event, perf will disable LBR and let
>>> the guest control it, and if the vLBR event is preempted by a higher
>>> priority LBR event, perf will enable LBR. In short, the LBR status
>>> may change during the vm exit handler.
>>>
>>> MSR_IA32_DEBUGCTLMSR[0] controls LBR enabling; kvm saves its value
>>> into vmx->host_debugctlmsr in vcpu_load(), and restores it from
>>> vmx->host_debugctlmsr immediately after vm exit. Since
>>> MSR_IA32_DEBUGCTLMSR[0] can change during the vm exit handler, the
>>> saved value in vmx->host_debugctlmsr can be stale. So this commit saves
>>> MSR_IA32_DEBUGCTLMSR into vmx->host_debugctlmsr just before vm entry
>>> to reflect the real hardware value.
>>>
>>> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
>>> ---
>>>   arch/x86/kvm/vmx/vmx.c | 5 +----
>>>   1 file changed, 1 insertion(+), 4 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index
>>> 44fb619803b8..5ca61a26d0d7 100644
>>> --- a/arch/x86/kvm/vmx/vmx.c
>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>> @@ -1459,13 +1459,9 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu,
>> int cpu,
>>>    */
>>>   static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)  {
>>> -	struct vcpu_vmx *vmx = to_vmx(vcpu);
>>> -
>>>   	vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
>>>
>>>   	vmx_vcpu_pi_load(vcpu, cpu);
>>> -
>>> -	vmx->host_debugctlmsr = get_debugctlmsr();
>>>   }
>>>
>>>   static void vmx_vcpu_put(struct kvm_vcpu *vcpu) @@ -7273,6 +7269,7 @@
>>> static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>>>   	atomic_switch_perf_msrs(vmx);
>>>   	if (intel_pmu_lbr_is_enabled(vcpu))
>>>   		vmx_passthrough_lbr_msrs(vcpu);
>>> +	vmx->host_debugctlmsr = get_debugctlmsr();
>>
>> Reading DEBUG_CTL on every VM-Entry is either unnecessary or insufficient.  If
>> the DEBUG_CTL value is being changed synchronously, then just fix whatever
>> KVM path leads to a change in the host value.  If DEBUG_CTL is being changed
>> asynchronously, then I'm guessing the change is coming from NMI context,
>> which means that KVM is buggy no matter how close we put this to VM-Enter.
> When a perf event reschedule is needed on a physical cpu, the perf
> scheduler sends an IPI to the target cpu, and LBR is enabled or disabled
> in the IPI handler according to the active event's attributes:
> If the vLBR event is active, LBR is disabled in the IPI handler.
> If a host LBR event is active, LBR is enabled in the IPI handler; this can
> happen when a host LBR event preempts the vLBR event during the vm exit
> handler.
> So DEBUG_CTL[0] changes asynchronously in the perf IPI handler, but host
> irqs are disabled near VM-Enter, so the IPI can't be delivered and host
> DEBUG_CTL[0] can't change before kvm re-enables host irqs.
> A perf counter overflow (PMI) is an NMI, but that NMI handler doesn't
> change the LBR status, so the saved host_debugctlmsr is still correct
> after the PMI handler.
> 
> thanks

This is not true. One example is Freezing LBRs on PMI (bit 11) in the host NMI ctx.

For "Legacy Freeze_LBR_on_PMI" feature on a host, "the LBR is frozen on the
overflowed condition of the buffer area, the processor clears the LBR bit
(bit 0) in IA32_DEBUGCTL."

Not to mention that the commit message says nothing about the effect of this
change on the other features controlled by DEBUG_CTL.

I couldn't agree with Sean more here. I think the first step is to make sure
that debugctl's functionality is not broken in both root mode and non-root
mode, followed closely by deciding what policy should be set and how any user
is notified if host/kvm are not in a position to support either side.
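To make the hazard concrete, a hedged sketch of the window being discussed;
get_debugctlmsr() is the helper the patch already uses, and the comment
describes the legacy Freeze_LBR_on_PMI behavior cited above:

	u64 debugctl = get_debugctlmsr();	/* snapshot IA32_DEBUGCTL */
	/*
	 * An NMI (PMI) delivered here can clear DEBUGCTLMSR_LBR in
	 * hardware via legacy Freeze_LBR_on_PMI, so the snapshot can be
	 * stale no matter how close to VM-Enter it was taken.
	 */
	vmx->host_debugctlmsr = debugctl;	/* may not match hardware */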

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] KVM: VMX/pmu: Enable inactive vLBR event in guest LBR MSR emulation
  2023-06-16 11:33 ` [PATCH 3/4] KVM: VMX/pmu: Enable inactive vLBR event in guest LBR MSR emulation Xiong Zhang
  2023-06-23 20:38   ` Sean Christopherson
@ 2023-06-28  5:50   ` Like Xu
  1 sibling, 0 replies; 22+ messages in thread
From: Like Xu @ 2023-06-28  5:50 UTC (permalink / raw)
  To: Xiong Zhang
  Cc: seanjc, pbonzini, peterz, kan.liang, zhenyuw, zhiyuan.lv,
	kvm list, linux-kernel

On 16/6/2023 7:33 pm, Xiong Zhang wrote:
> A vLBR event could be inactive in two cases:
> a. a host per cpu pinned LBR event occupies LBR when the vLBR event is
> created;
> b. the vLBR event is preempted by a host per cpu pinned LBR event during
> the vm exit handler.
> When the vLBR event is inactive, the guest can't access LBR msrs; the
> event is forced into the error state and excluded from scheduling by the
> perf scheduler, so it can't become active through the perf scheduler even
> after the host per cpu pinned LBR event releases LBR. kvm can enable the
> vLBR event proactively; the vLBR event may then become active and the LBR
> msrs can be passed through to the guest.
> 
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
>   arch/x86/kvm/vmx/pmu_intel.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 741efe2c497b..5a3ab8c8711b 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -314,7 +314,16 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
>   	if (!intel_pmu_is_valid_lbr_msr(vcpu, index))
>   		return false;
>   
> -	if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
> +	/* The vLBR event may be inactive while the physical LBR is free.
> +	 * The vLBR event is pinned: once it goes inactive, perf forces it
> +	 * into the error state in merge_sched_in() and excludes it from
> +	 * scheduling, so even if LBR is free now the vLBR event can't be
> +	 * activated through the perf scheduler; it can only be activated
> +	 * via perf_event_enable().
> +	 */
> +	if (lbr_desc->event && (lbr_desc->event->state == PERF_EVENT_STATE_ERROR))
> +		perf_event_enable(lbr_desc->event);

After allowing LBR host/guest coexistence, calling perf_event_enable() here
does not always succeed, so this is not a good call point.

As expected, any errored perf_event should be released and reprogrammed in
kvm_pmu_handle_event(), and the perf status of enabled features is checked
near atomic_switch_perf_msrs().
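A hedged sketch of that alternative — intel_pmu_release_guest_lbr_event()
and KVM_REQ_PMU exist in the tree, but whether this is the right recovery
policy is exactly what is under discussion:

	/*
	 * Illustrative only: instead of poking perf from the MSR handler,
	 * drop the errored event and let the next KVM_REQ_PMU pass
	 * recreate it through the existing reprogramming path.
	 */
	if (lbr_desc->event &&
	    lbr_desc->event->state == PERF_EVENT_STATE_ERROR) {
		intel_pmu_release_guest_lbr_event(vcpu);
		kvm_make_request(KVM_REQ_PMU, vcpu);
	}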

> +	else if (!lbr_desc->event && intel_pmu_create_guest_lbr_event(vcpu) < 0)
>   		goto dummy;
>   
>   	/*

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] KVM: VMX/pmu: Enable inactive vLBR event in guest LBR MSR emulation
  2023-06-27 15:07           ` Sean Christopherson
@ 2023-06-28  6:07             ` Like Xu
  0 siblings, 0 replies; 22+ messages in thread
From: Like Xu @ 2023-06-28  6:07 UTC (permalink / raw)
  To: Xiong Y Zhang, Sean Christopherson
  Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	pbonzini@redhat.com, peterz@infradead.org,
	kan.liang@linux.intel.com, zhenyuw@linux.intel.com, Zhiyuan Lv,
	Weijiang Yang

On 27/6/2023 11:07 pm, Sean Christopherson wrote:
> On Tue, Jun 27, 2023, Xiong Y Zhang wrote:
>>> On Sun, Jun 25, 2023, Xiong Y Zhang wrote:
>>>>>> On Fri, Jun 16, 2023, Xiong Zhang wrote:
>>>>> 	/*
>>>>> 	 * Attempt to re-enable the vLBR event if it was disabled due to
>>>>> 	 * contention with host LBR usage, i.e. was put into an error state.
>>>>> 	 * Perf doesn't notify KVM if the host stops using LBRs, i.e. KVM needs
>>>>> 	 * to manually re-enable the event.
>>>>> 	 */
>>>>>
>>>>> Which begs the question, why can't there be a notification of some
>>>>> form that the LBRs are once again available?
>>>> This is the perf scheduler rule. If a pinned event can't get a
>>>> resource due to resource limitations, perf puts it into the error
>>>> state and excludes it from scheduling; even if the resource becomes
>>>> available later, perf won't schedule the event again while it is in
>>>> the error state, and the only way to reschedule it is to enable it
>>>> again.  If a non-pinned event can't get a resource, perf puts it into
>>>> the inactive state and reschedules it automatically once the resource
>>>> is available.  The vLBR event is a per process pinned event.
>>>
>>> That doesn't answer my question.  I get that all of this is subject to perf
>>> scheduling, I'm asking why perf doesn't communicate directly with KVM to
>>> coordinate access to LBRs instead of pulling the rug out from under KVM.
>> Perf doesn't need such a notification interface currently: a non-pinned
>> event becomes active automatically once the resource is available, and
>> only a pinned event stays inactive even when the resource is free. Perf
>> may refuse to add such an interface for KVM usage alone.
> 
> Or maybe perf will be overjoyed that someone is finally proposing a coherent
> interface.  Until we actually try/ask, we'll never know.

For the perf subsystem, KVM or any other in-kernel perf_event user is just a
user, and any external logic that deeply interferes with perf's management of
host PMU hardware resources will be rejected without question.

> 
>>> Your other response[1] mostly answered that question, but I want explicit
>>> documentation on the contract between perf and KVM with respect to LBRs.  In
>>> short, please work with Weijiang to fulfill my request/demand[*] that someone
>>> document KVM's LBR support, and justify the "design".  I am simply not willing to
>>> take KVM LBR patches until that documentation is provided.
>> Sure, I will work with Weijiang to supply such documentation. Will this
>> document be put in Documentation/virt/kvm/x86/ ?
> 
> Ya, Documentation/virt/kvm/x86/pmu.rst please.

Please pay extra attention to the current status and expectations of co-existence
in the documentation. I'm really looking forward to this document written from
the perspective of a new vPMU developer. Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] KVM: selftests: Add test case for guest and host LBR preemption
  2023-06-16 11:33 ` [PATCH 4/4] KVM: selftests: Add test case for guest and host LBR preemption Xiong Zhang
@ 2023-06-28  6:27   ` Like Xu
  2023-06-29  2:39     ` Zhang, Xiong Y
  2023-06-28  9:27   ` Yang, Weijiang
  1 sibling, 1 reply; 22+ messages in thread
From: Like Xu @ 2023-06-28  6:27 UTC (permalink / raw)
  To: Xiong Zhang
  Cc: pbonzini, peterz, kan.liang, zhenyuw, zhiyuan.lv, kvm list,
	linux-kernel@vger.kernel.org, Sean Christopherson

This direction could be a vPMU "coexistence" feature, please feel free to test it.
But the first step might be to test the behavior of a vPMC when its event is
preempted.
Then expand to Guest LBR, Guest PEBS, etc.

On 16/6/2023 7:33 pm, Xiong Zhang wrote:
> When the guest accesses an LBR msr for the first time, kvm creates a vLBR
> event; the vLBR event joins the perf scheduler and occupies the physical
> LBR for guest usage. Once the vLBR event is active and owns LBR, the guest
> can access LBR msrs.
> 
> But the vLBR event is a per process pinned event; perf has a higher
> priority event type, the per cpu pinned LBR event, and lower priority
> types as well: the per cpu LBR event and the per process LBR event.
> So if the host doesn't have a higher priority per cpu pinned LBR event,
> the vLBR event can occupy the physical LBR indefinitely. But once a per
> cpu pinned LBR event becomes active, the vLBR event can't be active
> anymore, and the guest can't access LBR msrs.
> 
> This commit adds a test case to cover guest and host LBR contention.
> 
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
>   tools/testing/selftests/kvm/Makefile          |   1 +
>   .../selftests/kvm/include/ucall_common.h      |  17 ++
>   .../kvm/x86_64/pmu_event_filter_test.c        |  16 --
>   .../kvm/x86_64/vmx_pmu_lbr_contend.c          | 171 ++++++++++++++++++
>   4 files changed, 189 insertions(+), 16 deletions(-)
>   create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c
> 
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 4761b768b773..422bbc16ba2a 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -100,6 +100,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_dirty_log_test
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_exception_with_invalid_guest_state
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_msrs_test
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_invalid_nested_guest_state
> +TEST_GEN_PROGS_x86_64 += x86_64/vmx_pmu_lbr_contend

x86_64/vmx_pmu_coexistence ?

>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
> diff --git a/tools/testing/selftests/kvm/include/ucall_common.h b/tools/testing/selftests/kvm/include/ucall_common.h
> index 1a6aaef5ccae..c1bb0cacf390 100644
> --- a/tools/testing/selftests/kvm/include/ucall_common.h
> +++ b/tools/testing/selftests/kvm/include/ucall_common.h
> @@ -35,6 +35,23 @@ void ucall(uint64_t cmd, int nargs, ...);
>   uint64_t get_ucall(struct kvm_vcpu *vcpu, struct ucall *uc);
>   void ucall_init(struct kvm_vm *vm, vm_paddr_t mmio_gpa);
>   
> +/*
> + * Run the VM to the next GUEST_SYNC(value), and return the value passed
> + * to the sync. Any other exit from the guest is fatal.
> + */
> +static inline uint64_t run_vcpu_to_sync(struct kvm_vcpu *vcpu)
> +{
> +	struct ucall uc;
> +
> +	vcpu_run(vcpu);
> +	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
> +	get_ucall(vcpu, &uc);
> +	TEST_ASSERT(uc.cmd == UCALL_SYNC,
> +		    "Received ucall other than UCALL_SYNC: %lu", uc.cmd);
> +
> +	return uc.args[1];
> +}
> +
>   /*
>    * Perform userspace call without any associated data.  This bare call avoids
>    * allocating a ucall struct, which can be useful if the atomic operations in
> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> index 40507ed9fe8a..8c68029cfb4b 100644
> --- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> @@ -177,22 +177,6 @@ static void amd_guest_code(void)
>   	}
>   }
>   
> -/*
> - * Run the VM to the next GUEST_SYNC(value), and return the value passed
> - * to the sync. Any other exit from the guest is fatal.
> - */
> -static uint64_t run_vcpu_to_sync(struct kvm_vcpu *vcpu)
> -{
> -	struct ucall uc;
> -
> -	vcpu_run(vcpu);
> -	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
> -	get_ucall(vcpu, &uc);
> -	TEST_ASSERT(uc.cmd == UCALL_SYNC,
> -		    "Received ucall other than UCALL_SYNC: %lu", uc.cmd);
> -	return uc.args[1];
> -}

Can this part be a separate patch ?

> -
>   static void run_vcpu_and_sync_pmc_results(struct kvm_vcpu *vcpu)
>   {
>   	uint64_t r;
> diff --git a/tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c b/tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c
> new file mode 100644
> index 000000000000..a6a793f08515
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c
> @@ -0,0 +1,171 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Test for host and guest LBR preemption
> + *
> + * Copyright (C) 2021 Intel Corporation
> + *
> + */
> +
> +#define _GNU_SOURCE
> +
> +#include <linux/perf_event.h>
> +#include <sys/syscall.h>
> +#include <sys/sysinfo.h>
> +#include <unistd.h>
> +
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "processor.h"
> +
> +static void create_perf_events(int *fds, int cpu_num, bool pinned)
> +{
> +	struct perf_event_attr attr = {
> +		.type = PERF_TYPE_HARDWARE,
> +		.size = sizeof(attr),
> +		.config = PERF_COUNT_HW_CPU_CYCLES,
> +		.sample_type = PERF_SAMPLE_BRANCH_STACK,
> +		.sample_period = 1000,
> +		.pinned = pinned,
> +		.branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
> +				      PERF_SAMPLE_BRANCH_USER |
> +				      PERF_SAMPLE_BRANCH_KERNEL,
> +	};
> +	int i;
> +
> +	for (i = 0; i < cpu_num; i++) {
> +		fds[i] = syscall(__NR_perf_event_open, &attr, -1, i, -1, PERF_FLAG_FD_CLOEXEC);

Which fields make this a "per cpu pinned" event?
More comments are required.

> +		TEST_ASSERT(fds[i] != -1, "Failed to create lbr event on cpu%d", i);
> +	}
> +}
> +
> +static void release_perf_events(int *fds, int cpu_num)
> +{
> +	int i;
> +
> +	for (i = 0; i < cpu_num; i++)
> +		close(fds[i]);
> +}
> +
> +#define PERF_CAP_LBR_FMT_MASK  0x1F
> +
> +#define LBR_NOT_SUPPORTED  0xFFFE
> +#define LBR_MSR_WRITE_ERROR 0xFFFD
> +
> +#define LBR_MODE_CHECK_PASS 0x0
> +#define LBR_MSR_WRITE_SUCC  0x1
> +
> +static bool check_lbr_msr(void)
> +{
> +	uint64_t v, old_val;
> +
> +	old_val = rdmsr(MSR_LBR_TOS);

Why focus only on MSR_LBR_TOS ?

> +
> +	v  = old_val ^ 0x3UL;
> +
> +	wrmsr(MSR_LBR_TOS, v);
> +	if (rdmsr(MSR_LBR_TOS) != v)
> +		return false;
> +
> +	wrmsr(MSR_LBR_TOS, old_val);
> +	if (rdmsr(MSR_LBR_TOS) != old_val)
> +		return false;
> +
> +	return true;
> +}
> +
> +static void guest_code(void)
> +{
> +	uint64_t v;
> +
> +	v = rdmsr(MSR_IA32_PERF_CAPABILITIES);
> +	if ((v & PERF_CAP_LBR_FMT_MASK) == 0)
> +		GUEST_SYNC(LBR_NOT_SUPPORTED);
> +
> +	GUEST_SYNC(LBR_MODE_CHECK_PASS);
> +
> +	while (1) {
> +		if (!check_lbr_msr()) {
> +			GUEST_SYNC(LBR_MSR_WRITE_ERROR);
> +			continue;
> +		}
> +
> +		/* Enable LBR to avoid KVM recycling LBR. */
> +		 v = rdmsr(MSR_IA32_DEBUGCTLMSR);
> +		 v |= DEBUGCTLMSR_LBR;
> +		 wrmsr(MSR_IA32_DEBUGCTLMSR, v);
> +
> +		GUEST_SYNC(LBR_MSR_WRITE_SUCC);
> +	}
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	int *fds, ncpus;
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_vm *vm;
> +	uint64_t r;
> +
> +	TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
> +	TEST_REQUIRE(host_cpu_is_intel);
> +	TEST_REQUIRE(kvm_cpu_property(X86_PROPERTY_PMU_VERSION));
> +
> +	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> +	r = run_vcpu_to_sync(vcpu);
> +	TEST_ASSERT(r == LBR_MODE_CHECK_PASS,
> +		    "LBR format in guest PERF_CAP msr isn't correct");
> +
> +	ncpus = get_nprocs();

Could we limit the test to a specific cpu, since it will affect the load on 
other cpus?

> +	fds = malloc(sizeof(int) * ncpus);
> +	TEST_ASSERT(fds != NULL, "Failed to create fds for all cpus");
> +
> +	/* Create per cpu pinned LBR event, then it will own LBR. */
> +	create_perf_events(fds, ncpus, true);
> +
> +	/* Since LBR is owned by per cpu pinned LBR event, guest couldn't get it,
> +	 * so guest couldn't access LBR_TOS msr.
> +	 */
> +	r = run_vcpu_to_sync(vcpu);
> +	TEST_ASSERT(r == LBR_MSR_WRITE_ERROR,
> +		    "1. Unexpected successfully read/write guest LBR_TO msr");
> +
> +	release_perf_events(fds, ncpus);

Obviously there are duplicate calls to release_perf_events() that can be omitted.

> +
> +	/* Since per cpu pinned event is closed and LBR is free, guest could get it,
> +	 * so guest could access LBR_TOS msr.
> +	 */
> +	r = run_vcpu_to_sync(vcpu);
> +	TEST_ASSERT(r == LBR_MSR_WRITE_SUCC,
> +		    "2. Failed to read/write guest LBR_TO msr");
> +
> +	/* Create per cpu LBR event, its priority is lower than vLBR event, and it
> +	 *  couldn't get LBR back from vLBR
> +	 */
> +	create_perf_events(fds, ncpus, false);
> +
> +	/* LBR is still owned by guest, So guest could access LBR_TOS successfully. */
> +	r = run_vcpu_to_sync(vcpu);
> +	TEST_ASSERT(r == LBR_MSR_WRITE_SUCC,
> +		    "3. Failed read/write guest LBR_TO msr");
> +
> +	release_perf_events(fds, ncpus);
> +
> +	/* Create per cpu pinned LBR event, its priority is higher than vLBR event,
> +	 * so it will get LBR back from vLBR.
> +	 */
> +	create_perf_events(fds, ncpus, true);
> +
> +	/* LBR is preempted by per cpu pinned LBR event, guest couldn't access
> +	 * LBR_TOS msr.
> +	 */
> +	r = run_vcpu_to_sync(vcpu);
> +	TEST_ASSERT(r == LBR_MSR_WRITE_ERROR,
> +		    "4. Unexpected successfully read/write guest LBR_TO msr");
> +
> +	release_perf_events(fds, ncpus);

  Why not add more tests to cover all possibilities ?

	per cpu pinned event
	per process pinned event
	per cpu event
	per process event

> +
> +	kvm_vm_free(vm);
> +
> +	free(fds);
> +
> +	return 0;
> +}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] KVM: selftests: Add test case for guest and host LBR preemption
  2023-06-16 11:33 ` [PATCH 4/4] KVM: selftests: Add test case for guest and host LBR preemption Xiong Zhang
  2023-06-28  6:27   ` Like Xu
@ 2023-06-28  9:27   ` Yang, Weijiang
  2023-06-29  2:52     ` Zhang, Xiong Y
  1 sibling, 1 reply; 22+ messages in thread
From: Yang, Weijiang @ 2023-06-28  9:27 UTC (permalink / raw)
  To: Xiong Zhang
  Cc: Christopherson,, Sean, pbonzini@redhat.com, peterz@infradead.org,
	like.xu.linux@gmail.com, kan.liang@linux.intel.com,
	zhenyuw@linux.intel.com, Lv, Zhiyuan, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org

What kind of issues do you expect this selftest to find?

IMO, it verifies the generic perf scheduling rules but cannot find
specific issues,

e.g., whether the LBR data is corrupted in some cases. If the selftest
could verify that guest/host data is maintained correctly in a
high-contention environment, that would better serve the purpose of a
selftest.


On 6/16/2023 7:33 PM, Xiong Zhang wrote:
> When the guest accesses an LBR msr for the first time, kvm creates a vLBR
> event; the vLBR event joins the perf scheduler and occupies the physical
> LBR for guest usage. Once the vLBR event is active and owns LBR, the guest
> can access LBR msrs.
>
> But the vLBR event is a per process pinned event; perf has a higher
> priority event type, the per cpu pinned LBR event, and lower priority
> types as well: the per cpu LBR event and the per process LBR event.
> So if the host doesn't have a higher priority per cpu pinned LBR event,
> the vLBR event can occupy the physical LBR indefinitely. But once a per
> cpu pinned LBR event becomes active, the vLBR event can't be active
> anymore, and the guest can't access LBR msrs.
>
> This commit adds a test case to cover guest and host LBR contention.
>
> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> ---
>   tools/testing/selftests/kvm/Makefile          |   1 +
>   .../selftests/kvm/include/ucall_common.h      |  17 ++
>   .../kvm/x86_64/pmu_event_filter_test.c        |  16 --
>   .../kvm/x86_64/vmx_pmu_lbr_contend.c          | 171 ++++++++++++++++++
>   4 files changed, 189 insertions(+), 16 deletions(-)
>   create mode 100644 tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c
>
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 4761b768b773..422bbc16ba2a 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -100,6 +100,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/vmx_dirty_log_test
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_exception_with_invalid_guest_state
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_msrs_test
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_invalid_nested_guest_state
> +TEST_GEN_PROGS_x86_64 += x86_64/vmx_pmu_lbr_contend
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
>   TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
> diff --git a/tools/testing/selftests/kvm/include/ucall_common.h b/tools/testing/selftests/kvm/include/ucall_common.h
> index 1a6aaef5ccae..c1bb0cacf390 100644
> --- a/tools/testing/selftests/kvm/include/ucall_common.h
> +++ b/tools/testing/selftests/kvm/include/ucall_common.h
> @@ -35,6 +35,23 @@ void ucall(uint64_t cmd, int nargs, ...);
>   uint64_t get_ucall(struct kvm_vcpu *vcpu, struct ucall *uc);
>   void ucall_init(struct kvm_vm *vm, vm_paddr_t mmio_gpa);
>   
> +/*
> + * Run the VM to the next GUEST_SYNC(value), and return the value passed
> + * to the sync. Any other exit from the guest is fatal.
> + */
> +static inline uint64_t run_vcpu_to_sync(struct kvm_vcpu *vcpu)
> +{
> +	struct ucall uc;
> +
> +	vcpu_run(vcpu);
> +	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
> +	get_ucall(vcpu, &uc);
> +	TEST_ASSERT(uc.cmd == UCALL_SYNC,
> +		    "Received ucall other than UCALL_SYNC: %lu", uc.cmd);
> +
> +	return uc.args[1];
> +}
> +
>   /*
>    * Perform userspace call without any associated data.  This bare call avoids
>    * allocating a ucall struct, which can be useful if the atomic operations in
> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> index 40507ed9fe8a..8c68029cfb4b 100644
> --- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> @@ -177,22 +177,6 @@ static void amd_guest_code(void)
>   	}
>   }
>   
> -/*
> - * Run the VM to the next GUEST_SYNC(value), and return the value passed
> - * to the sync. Any other exit from the guest is fatal.
> - */
> -static uint64_t run_vcpu_to_sync(struct kvm_vcpu *vcpu)
> -{
> -	struct ucall uc;
> -
> -	vcpu_run(vcpu);
> -	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
> -	get_ucall(vcpu, &uc);
> -	TEST_ASSERT(uc.cmd == UCALL_SYNC,
> -		    "Received ucall other than UCALL_SYNC: %lu", uc.cmd);
> -	return uc.args[1];
> -}
> -
>   static void run_vcpu_and_sync_pmc_results(struct kvm_vcpu *vcpu)
>   {
>   	uint64_t r;
> diff --git a/tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c b/tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c
> new file mode 100644
> index 000000000000..a6a793f08515
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c
> @@ -0,0 +1,171 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Test for host and guest LBR preemption
> + *
> + * Copyright (C) 2021 Intel Corporation
> + *
> + */
> +
> +#define _GNU_SOURCE
> +
> +#include <linux/perf_event.h>
> +#include <sys/syscall.h>
> +#include <sys/sysinfo.h>
> +#include <unistd.h>
> +
> +#include "test_util.h"
> +#include "kvm_util.h"
> +#include "processor.h"
> +
> +static void create_perf_events(int *fds, int cpu_num, bool pinned)
> +{
> +	struct perf_event_attr attr = {
> +		.type = PERF_TYPE_HARDWARE,
> +		.size = sizeof(attr),
> +		.config = PERF_COUNT_HW_CPU_CYCLES,
> +		.sample_type = PERF_SAMPLE_BRANCH_STACK,
> +		.sample_period = 1000,
> +		.pinned = pinned,
> +		.branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
> +				      PERF_SAMPLE_BRANCH_USER |
> +				      PERF_SAMPLE_BRANCH_KERNEL,
> +	};
> +	int i;
> +
> +	for (i = 0; i < cpu_num; i++) {
> +		fds[i] = syscall(__NR_perf_event_open, &attr, -1, i, -1, PERF_FLAG_FD_CLOEXEC);
> +		TEST_ASSERT(fds[i] != -1, "Failed to create lbr event on cpu%d", i);
> +	}
> +}
> +
> +static void release_perf_events(int *fds, int cpu_num)
> +{
> +	int i;
> +
> +	for (i = 0; i < cpu_num; i++)
> +		close(fds[i]);
> +}
> +
> +#define PERF_CAP_LBR_FMT_MASK  0x1F
> +
> +#define LBR_NOT_SUPPORTED  0xFFFE
> +#define LBR_MSR_WRITE_ERROR 0xFFFD
> +
> +#define LBR_MODE_CHECK_PASS 0x0
> +#define LBR_MSR_WRITE_SUCC  0x1
> +
> +static bool check_lbr_msr(void)
> +{
> +	uint64_t v, old_val;
> +
> +	old_val = rdmsr(MSR_LBR_TOS);
> +
> +	v  = old_val ^ 0x3UL;
> +
> +	wrmsr(MSR_LBR_TOS, v);
> +	if (rdmsr(MSR_LBR_TOS) != v)
> +		return false;
> +
> +	wrmsr(MSR_LBR_TOS, old_val);
> +	if (rdmsr(MSR_LBR_TOS) != old_val)
> +		return false;
> +
> +	return true;
> +}
> +
> +static void guest_code(void)
> +{
> +	uint64_t v;
> +
> +	v = rdmsr(MSR_IA32_PERF_CAPABILITIES);
> +	if ((v & PERF_CAP_LBR_FMT_MASK) == 0)
> +		GUEST_SYNC(LBR_NOT_SUPPORTED);
> +
> +	GUEST_SYNC(LBR_MODE_CHECK_PASS);
> +
> +	while (1) {
> +		if (!check_lbr_msr()) {
> +			GUEST_SYNC(LBR_MSR_WRITE_ERROR);
> +			continue;
> +		}
> +
> +		/* Enable LBR to avoid KVM recycling LBR. */
> +		 v = rdmsr(MSR_IA32_DEBUGCTLMSR);
> +		 v |= DEBUGCTLMSR_LBR;
> +		 wrmsr(MSR_IA32_DEBUGCTLMSR, v);
> +
> +		GUEST_SYNC(LBR_MSR_WRITE_SUCC);
> +	}
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	int *fds, ncpus;
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_vm *vm;
> +	uint64_t r;
> +
> +	TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
> +	TEST_REQUIRE(host_cpu_is_intel);
> +	TEST_REQUIRE(kvm_cpu_property(X86_PROPERTY_PMU_VERSION));
> +
> +	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> +	r = run_vcpu_to_sync(vcpu);
> +	TEST_ASSERT(r == LBR_MODE_CHECK_PASS,
> +		    "LBR format in guest PERF_CAP msr isn't correct");
> +
> +	ncpus = get_nprocs();
> +	fds = malloc(sizeof(int) * ncpus);
> +	TEST_ASSERT(fds != NULL, "Failed to create fds for all cpus");
> +
> +	/* Create per cpu pinned LBR event, then it will own LBR. */
> +	create_perf_events(fds, ncpus, true);
> +
> +	/* Since LBR is owned by per cpu pinned LBR event, guest couldn't get it,
> +	 * so guest couldn't access LBR_TOS msr.
> +	 */
> +	r = run_vcpu_to_sync(vcpu);
> +	TEST_ASSERT(r == LBR_MSR_WRITE_ERROR,
> +		    "1. Unexpected successfully read/write guest LBR_TO msr");
> +
> +	release_perf_events(fds, ncpus);
> +
> +	/* Since per cpu pinned event is closed and LBR is free, guest could get it,
> +	 * so guest could access LBR_TOS msr.
> +	 */
> +	r = run_vcpu_to_sync(vcpu);
> +	TEST_ASSERT(r == LBR_MSR_WRITE_SUCC,
> +		    "2. Failed to read/write guest LBR_TO msr");
> +
> +	/* Create per cpu LBR event, its priority is lower than vLBR event, and it
> +	 *  couldn't get LBR back from vLBR
> +	 */
> +	create_perf_events(fds, ncpus, false);
> +
> +	/* LBR is still owned by guest, So guest could access LBR_TOS successfully. */
> +	r = run_vcpu_to_sync(vcpu);
> +	TEST_ASSERT(r == LBR_MSR_WRITE_SUCC,
> +		    "3. Failed read/write guest LBR_TO msr");
> +
> +	release_perf_events(fds, ncpus);
> +
> +	/* Create per cpu pinned LBR event, its priority is higher than vLBR event,
> +	 * so it will get LBR back from vLBR.
> +	 */
> +	create_perf_events(fds, ncpus, true);
> +
> +	/* LBR is preempted by per cpu pinned LBR event, guest couldn't access
> +	 * LBR_TOS msr.
> +	 */
> +	r = run_vcpu_to_sync(vcpu);
> +	TEST_ASSERT(r == LBR_MSR_WRITE_ERROR,
> +		    "4. Unexpected successfully read/write guest LBR_TO msr");
> +
> +	release_perf_events(fds, ncpus);
> +
> +	kvm_vm_free(vm);
> +
> +	free(fds);
> +
> +	return 0;
> +}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 1/4] perf/x86/intel: Get shared reg constraints first for vLBR
  2023-06-28  4:25   ` Like Xu
@ 2023-06-29  2:11     ` Zhang, Xiong Y
  0 siblings, 0 replies; 22+ messages in thread
From: Zhang, Xiong Y @ 2023-06-29  2:11 UTC (permalink / raw)
  To: Like Xu, Christopherson,, Sean
  Cc: pbonzini@redhat.com, peterz@infradead.org,
	kan.liang@linux.intel.com, zhenyuw@linux.intel.com, Lv, Zhiyuan,
	kvm list, linux-kernel@vger.kernel.org

> On 16/6/2023 7:33 pm, Xiong Zhang wrote:
> > When the host has a per cpu pinned LBR event and the guest also uses
> > LBR, the host can't get correct LBR data, as the physical LBR is
> > preempted by the guest.
> >
> > The rule for multiple events sharing LBR is defined in
> > __intel_shared_reg_get_constraints(), but the guest vLBR event skips
> > this function, so even if the host has a per cpu pinned LBR event, the
> > guest vLBR event can get constraints successfully and make
> > vlbr_exclude_host return true; in the end the host can't enable LBR in
> > intel_pmu_lbr_enable_all().
> 
> Although it goes against the "per cpu pinned LBR event" priority
> expectation, the order is intentional, for two reasons:
> 
> - vlbr uses the fake event mechanism in its implementation, much like the
>    BTS event, so the question here is whether we can get a per cpu pinned
>    BTS event to work as expected;
> 
> - this change should not be applied before KVM has done a good job
>    of making guest lbr and other lbr events coexist correctly;
> 
> If we treat the vlbr event as an ordinary perf_event behind a guest
> counter, one expected to comply equally with the scheduling rules of host
> perf, the first thing we need to address is how the guest counter should
> continue to function while its backing event is preempted by a higher
> priority one.
Both the arch vPMU and vLBR have this issue; it is a vPMU arch gap when host
and guest perf events coexist, and you discussed it in the past:
https://lore.kernel.org/lkml/810c3148-1791-de57-27c0-d1ac5ed35fb8@gmail.com/
Closing it needs perf and kvm to coordinate.
1) To let the guest get correct data, guest perf events could be given
higher priority so the host can't preempt them, or some counters could be
passed through/reserved for the guest. This is similar to "guest-only LBR
use" in the SDM, but it compromises host perf.
2) The guest perf event stays an ordinary perf_event. As with event
preemption on the host, a preempted event is stopped and perf stops the
event's total_time_running; perf then calculates the event count as it does
for a multiplexed event, so the count is an estimated value. But perf can't
tell the guest that the backing counter has stopped, so guest perf never
notices that its event stopped. Two new interfaces are needed here: perf
notifies kvm that the guest perf event has stopped, and kvm notifies the
guest that the counter has stopped.
Any suggestions are welcome.

thanks
> 
> >
> > This commit moves intel_vlbr_constraints() behind
> > intel_shared_regs_constraints(); since the guest vLBR event also uses
> > LBR, it should get the LBR resource through intel_shared_regs_constraints().
> >
> > Fixes: 097e4311cda9 ("perf/x86: Add constraint to create guest LBR
> > event without hw counter")
> > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > ---
> >   arch/x86/events/intel/core.c | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/events/intel/core.c
> > b/arch/x86/events/intel/core.c index 6fd3cd97a6ac..2e27a69e9725 100644
> > --- a/arch/x86/events/intel/core.c
> > +++ b/arch/x86/events/intel/core.c
> > @@ -3347,15 +3347,15 @@ __intel_get_event_constraints(struct
> cpu_hw_events *cpuc, int idx,
> >   {
> >   	struct event_constraint *c;
> >
> > -	c = intel_vlbr_constraints(event);
> > +	c = intel_bts_constraints(event);
> >   	if (c)
> >   		return c;
> >
> > -	c = intel_bts_constraints(event);
> > +	c = intel_shared_regs_constraints(cpuc, event);
> >   	if (c)
> >   		return c;
> >
> > -	c = intel_shared_regs_constraints(cpuc, event);
> > +	c = intel_vlbr_constraints(event);
> >   	if (c)
> >   		return c;
> >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 4/4] KVM: selftests: Add test case for guest and host LBR preemption
  2023-06-28  6:27   ` Like Xu
@ 2023-06-29  2:39     ` Zhang, Xiong Y
  0 siblings, 0 replies; 22+ messages in thread
From: Zhang, Xiong Y @ 2023-06-29  2:39 UTC (permalink / raw)
  To: Like Xu
  Cc: pbonzini@redhat.com, peterz@infradead.org,
	kan.liang@linux.intel.com, zhenyuw@linux.intel.com, Lv, Zhiyuan,
	kvm list, linux-kernel@vger.kernel.org, Christopherson,, Sean

> This direction could be a vPMU "coexistence" feature, please feel free to test it.
> But the first step might be to test the behavior of a vPMC when its event
> is preempted.
It is harder to construct vPMC preemption, as several counters exist in the processor, while only one LBR and one PEBS facility exist.
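A hedged sketch, not tested code: vPMC preemption could still be forced by
saturating the host GP counters with per cpu pinned counting events, one per
GP counter. Here nr_gp_counters is assumed to come from CPUID.0AH:EAX[15:8]
on the host, and attr is a plain cycles attribute without a branch stack:

	for (i = 0; i < nr_gp_counters; i++) {
		attr.pinned = true;
		fds[i] = syscall(__NR_perf_event_open, &attr, -1, cpu, -1,
				 PERF_FLAG_FD_CLOEXEC);
		TEST_ASSERT(fds[i] != -1,
			    "Failed to open pinned event %d on cpu%d", i, cpu);
	}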
> Then expand to Guest LBR, Guest PEBS, etc.
> 
> On 16/6/2023 7:33 pm, Xiong Zhang wrote:
> > When guest access LBR msr at the first time, kvm will create a vLBR
> > event, vLBR event joins perf scheduler and occupy physical LBR for guest usage.
> > Once vLBR event is active and own LBR, guest could access LBR msr.
> >
> > But vLBR event is per process pinned event, perf has higher priority event:
> > per cpu pinned LBR event, perf has lower priority events also: per cpu
> > LBR event and per process LBR event.
> > So if host doesn't have higher priority per cpu pinned LBR event, vLBR
> > event could occupy physical LBR always. But once per cpu pinned LBR
> > event is active, vLBR event couldn't be active anymore, then guest
> > couldn't access LBR msr.
> >
> > This commit adds test case to cover guest and host lbr contend.
> >
> > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > ---
> >   tools/testing/selftests/kvm/Makefile          |   1 +
> >   .../selftests/kvm/include/ucall_common.h      |  17 ++
> >   .../kvm/x86_64/pmu_event_filter_test.c        |  16 --
> >   .../kvm/x86_64/vmx_pmu_lbr_contend.c          | 171 ++++++++++++++++++
> >   4 files changed, 189 insertions(+), 16 deletions(-)
> >   create mode 100644
> > tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c
> >
> > diff --git a/tools/testing/selftests/kvm/Makefile
> > b/tools/testing/selftests/kvm/Makefile
> > index 4761b768b773..422bbc16ba2a 100644
> > --- a/tools/testing/selftests/kvm/Makefile
> > +++ b/tools/testing/selftests/kvm/Makefile
> > @@ -100,6 +100,7 @@ TEST_GEN_PROGS_x86_64 +=
> x86_64/vmx_dirty_log_test
> >   TEST_GEN_PROGS_x86_64 +=
> x86_64/vmx_exception_with_invalid_guest_state
> >   TEST_GEN_PROGS_x86_64 += x86_64/vmx_msrs_test
> >   TEST_GEN_PROGS_x86_64 += x86_64/vmx_invalid_nested_guest_state
> > +TEST_GEN_PROGS_x86_64 += x86_64/vmx_pmu_lbr_contend
> 
> x86_64/vmx_pmu_coexistence ?
Yes, once this expands to vPMC and PEBS.
> 
> >   TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
> >   TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
> >   TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
> > diff --git a/tools/testing/selftests/kvm/include/ucall_common.h
> > b/tools/testing/selftests/kvm/include/ucall_common.h
> > index 1a6aaef5ccae..c1bb0cacf390 100644
> > --- a/tools/testing/selftests/kvm/include/ucall_common.h
> > +++ b/tools/testing/selftests/kvm/include/ucall_common.h
> > @@ -35,6 +35,23 @@ void ucall(uint64_t cmd, int nargs, ...);
> >   uint64_t get_ucall(struct kvm_vcpu *vcpu, struct ucall *uc);
> >   void ucall_init(struct kvm_vm *vm, vm_paddr_t mmio_gpa);
> >
> > +/*
> > + * Run the VM to the next GUEST_SYNC(value), and return the value
> > +passed
> > + * to the sync. Any other exit from the guest is fatal.
> > + */
> > +static inline uint64_t run_vcpu_to_sync(struct kvm_vcpu *vcpu) {
> > +	struct ucall uc;
> > +
> > +	vcpu_run(vcpu);
> > +	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
> > +	get_ucall(vcpu, &uc);
> > +	TEST_ASSERT(uc.cmd == UCALL_SYNC,
> > +		    "Received ucall other than UCALL_SYNC: %lu", uc.cmd);
> > +
> > +	return uc.args[1];
> > +}
> > +
> >   /*
> >    * Perform userspace call without any associated data.  This bare call avoids
> >    * allocating a ucall struct, which can be useful if the atomic
> > operations in diff --git
> > a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> > b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> > index 40507ed9fe8a..8c68029cfb4b 100644
> > --- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> > +++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> > @@ -177,22 +177,6 @@ static void amd_guest_code(void)
> >   	}
> >   }
> >
> > -/*
> > - * Run the VM to the next GUEST_SYNC(value), and return the value
> > passed
> > - * to the sync. Any other exit from the guest is fatal.
> > - */
> > -static uint64_t run_vcpu_to_sync(struct kvm_vcpu *vcpu) -{
> > -	struct ucall uc;
> > -
> > -	vcpu_run(vcpu);
> > -	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
> > -	get_ucall(vcpu, &uc);
> > -	TEST_ASSERT(uc.cmd == UCALL_SYNC,
> > -		    "Received ucall other than UCALL_SYNC: %lu", uc.cmd);
> > -	return uc.args[1];
> > -}
> 
> Can this part be a separate patch ?
OK.
> 
> > -
> >   static void run_vcpu_and_sync_pmc_results(struct kvm_vcpu *vcpu)
> >   {
> >   	uint64_t r;
> > diff --git a/tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c
> > b/tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c
> > new file mode 100644
> > index 000000000000..a6a793f08515
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c
> > @@ -0,0 +1,171 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Test for host and guest LBR preemption
> > + *
> > + * Copyright (C) 2021 Intel Corporation
> > + *
> > + */
> > +
> > +#define _GNU_SOURCE
> > +
> > +#include <linux/perf_event.h>
> > +#include <sys/syscall.h>
> > +#include <sys/sysinfo.h>
> > +#include <unistd.h>
> > +
> > +#include "test_util.h"
> > +#include "kvm_util.h"
> > +#include "processor.h"
> > +
> > +static void create_perf_events(int *fds, int cpu_num, bool pinned) {
> > +	struct perf_event_attr attr = {
> > +		.type = PERF_TYPE_HARDWARE,
> > +		.size = sizeof(attr),
> > +		.config = PERF_COUNT_HW_CPU_CYCLES,
> > +		.sample_type = PERF_SAMPLE_BRANCH_STACK,
> > +		.sample_period = 1000,
> > +		.pinned = pinned,
> > +		.branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
> > +				      PERF_SAMPLE_BRANCH_USER |
> > +				      PERF_SAMPLE_BRANCH_KERNEL,
> > +	};
> > +	int i;
> > +
> > +	for (i = 0; i < cpu_num; i++) {
> > +		fds[i] = syscall(__NR_perf_event_open, &attr, -1, i, -1,
> > +PERF_FLAG_FD_CLOEXEC);
> 
> Which fields make this a "per cpu pinned" event?
> More comments are required.
Yes, I should add more comments. This function creates either pinned or flexible events, controlled by the input parameter "bool pinned".
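For reference, a hedged illustration of how the perf_event_open() arguments
select the event class; the call mirrors the one in create_perf_events(),
and the comments are the point:

	attr.pinned = true;	/* pinned class, highest within its scope */
	fds[i] = syscall(__NR_perf_event_open, &attr,
			 -1,	/* pid == -1: count all tasks ...  */
			 i,	/* ... on cpu i => a per cpu event */
			 -1,	/* no group leader                 */
			 PERF_FLAG_FD_CLOEXEC);

A per process event would instead pass a real pid with cpu == -1.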
> 
> > +		TEST_ASSERT(fds[i] != -1, "Failed to create lbr event on cpu%d",
> i);
> > +	}
> > +}
> > +
> > +static void release_perf_events(int *fds, int cpu_num) {
> > +	int i;
> > +
> > +	for (i = 0; i < cpu_num; i++)
> > +		close(fds[i]);
> > +}
> > +
> > +#define PERF_CAP_LBR_FMT_MASK  0x1F
> > +
> > +#define LBR_NOT_SUPPORTED  0xFFFE
> > +#define LBR_MSR_WRITE_ERROR 0xFFFD
> > +
> > +#define LBR_MODE_CHECK_PASS 0x0
> > +#define LBR_MSR_WRITE_SUCC  0x1
> > +
> > +static bool check_lbr_msr(void)
> > +{
> > +	uint64_t v, old_val;
> > +
> > +	old_val = rdmsr(MSR_LBR_TOS);
> 
> Why focus only on MSR_LBR_TOS ?
MSR_LBR_FROMx/TOx could be used as well; I chose TOS for no special reason.
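A hedged sketch of a broader probe; 0x680/0x6c0 are assumed to be the first
MSR_LASTBRANCH_x_FROM_IP/_TO_IP registers (true on Nehalem and later parts),
since the selftest headers may not define named constants for them:

	#define MSR_LBR_FROM_0	0x680
	#define MSR_LBR_TO_0	0x6c0

	static bool check_one_lbr_msr(uint32_t msr)
	{
		uint64_t old = rdmsr(msr);
		uint64_t v = old ^ 0x3UL;	/* flip two low bits */

		wrmsr(msr, v);
		if (rdmsr(msr) != v)
			return false;

		wrmsr(msr, old);		/* restore */
		return rdmsr(msr) == old;
	}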
> 
> > +
> > +	v  = old_val ^ 0x3UL;
> > +
> > +	wrmsr(MSR_LBR_TOS, v);
> > +	if (rdmsr(MSR_LBR_TOS) != v)
> > +		return false;
> > +
> > +	wrmsr(MSR_LBR_TOS, old_val);
> > +	if (rdmsr(MSR_LBR_TOS) != old_val)
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> > +static void guest_code(void)
> > +{
> > +	uint64_t v;
> > +
> > +	v = rdmsr(MSR_IA32_PERF_CAPABILITIES);
> > +	if ((v & PERF_CAP_LBR_FMT_MASK) == 0)
> > +		GUEST_SYNC(LBR_NOT_SUPPORTED);
> > +
> > +	GUEST_SYNC(LBR_MODE_CHECK_PASS);
> > +
> > +	while (1) {
> > +		if (!check_lbr_msr()) {
> > +			GUEST_SYNC(LBR_MSR_WRITE_ERROR);
> > +			continue;
> > +		}
> > +
> > +		/* Enable LBR to avoid KVM recycling LBR. */
> > +		 v = rdmsr(MSR_IA32_DEBUGCTLMSR);
> > +		 v |= DEBUGCTLMSR_LBR;
> > +		 wrmsr(MSR_IA32_DEBUGCTLMSR, v);
> > +
> > +		GUEST_SYNC(LBR_MSR_WRITE_SUCC);
> > +	}
> > +}
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +	int *fds, ncpus;
> > +	struct kvm_vcpu *vcpu;
> > +	struct kvm_vm *vm;
> > +	uint64_t r;
> > +
> > +	TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
> > +	TEST_REQUIRE(host_cpu_is_intel);
> > +	TEST_REQUIRE(kvm_cpu_property(X86_PROPERTY_PMU_VERSION));
> > +
> > +	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> > +	r = run_vcpu_to_sync(vcpu);
> > +	TEST_ASSERT(r == LBR_MODE_CHECK_PASS,
> > +		    "LBR format in guest PERF_CAP msr isn't correct");
> > +
> > +	ncpus = get_nprocs();
> 
> Could we limit the test to a specific cpu, since it will affect the load on other
> cpus?
Yes. If the selftest or the vcpu thread could be bound to a specific cpu, only one perf event would need to be created, on the target cpu. But I don't know how to specify the cpu.
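A hedged sketch of one way to do that: sched_setaffinity() binds the test,
and therefore the vcpu thread it creates, to a single cpu; TEST_ASSERT is
the existing selftest helper:

	#define _GNU_SOURCE
	#include <sched.h>

	static void bind_to_cpu0(void)
	{
		cpu_set_t set;

		CPU_ZERO(&set);
		CPU_SET(0, &set);
		TEST_ASSERT(!sched_setaffinity(0, sizeof(set), &set),
			    "Failed to bind the test to cpu0");
	}

With that in place, create_perf_events() would only need to open one event,
on cpu 0.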
> 
> > +	fds = malloc(sizeof(int) * ncpus);
> > +	TEST_ASSERT(fds != NULL, "Failed to create fds for all cpus");
> > +
> > +	/* Create per cpu pinned LBR event, then it will own LBR. */
> > +	create_perf_events(fds, ncpus, true);
> > +
> > +	/* Since LBR is owned by per cpu pinned LBR event, guest couldn't get it,
> > +	 * so guest couldn't access LBR_TOS msr.
> > +	 */
> > +	r = run_vcpu_to_sync(vcpu);
> > +	TEST_ASSERT(r == LBR_MSR_WRITE_ERROR,
> > +		    "1. Unexpected successfully read/write guest LBR_TO msr");
> > +
> > +	release_perf_events(fds, ncpus);
> 
> Obviously there are duplicate calls to release_perf_events() that can be omitted.
> 
> > +
> > +	/* Since per cpu pinned event is closed and LBR is free, guest could get it,
> > +	 * so guest could access LBR_TOS msr.
> > +	 */
> > +	r = run_vcpu_to_sync(vcpu);
> > +	TEST_ASSERT(r == LBR_MSR_WRITE_SUCC,
> > +		    "2. Failed to read/write guest LBR_TO msr");
> > +
> > +	/* Create per cpu LBR event, its priority is lower than vLBR event, and it
> > +	 *  couldn't get LBR back from vLBR
> > +	 */
> > +	create_perf_events(fds, ncpus, false);
> > +
> > +	/* LBR is still owned by guest, So guest could access LBR_TOS
> successfully. */
> > +	r = run_vcpu_to_sync(vcpu);
> > +	TEST_ASSERT(r == LBR_MSR_WRITE_SUCC,
> > +		    "3. Failed read/write guest LBR_TO msr");
> > +
> > +	release_perf_events(fds, ncpus);
> > +
> > +	/* Create per cpu pinned LBR event, its priority is higher than vLBR event,
> > +	 * so it will get LBR back from vLBR.
> > +	 */
> > +	create_perf_events(fds, ncpus, true);
> > +
> > +	/* LBR is preempted by per cpu pinned LBR event, guest couldn't access
> > +	 * LBR_TOS msr.
> > +	 */
> > +	r = run_vcpu_to_sync(vcpu);
> > +	TEST_ASSERT(r == LBR_MSR_WRITE_ERROR,
> > +		    "4. Unexpected successfully read/write guest LBR_TO msr");
> > +
> > +	release_perf_events(fds, ncpus);
> 
>   Why not add more tests to cover all possibilities ?
> 
> 	per cpu pinned event
> 	per process pinned event
> 	per cpu event
> 	per process event
Per cpu pinned/flexible events have been covered here.
Per process, I think, means attaching perf onto qemu's process; I will add it.
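A hedged sketch of the per process variant, attaching to the test's own pid
(which also hosts the vcpu thread), with attr as in create_perf_events():

	int fd = syscall(__NR_perf_event_open, &attr,
			 getpid(),	/* this process ...              */
			 -1,		/* ... on any cpu => per process */
			 -1, PERF_FLAG_FD_CLOEXEC);
	TEST_ASSERT(fd != -1, "Failed to create a per process lbr event");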

Anyway, without the previous commits vLBR has the highest priority, and these test results would change a lot.
> 
> > +
> > +	kvm_vm_free(vm);
> > +
> > +	free(fds);
> > +
> > +	return 0;
> > +}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 4/4] KVM: selftests: Add test case for guest and host LBR preemption
  2023-06-28  9:27   ` Yang, Weijiang
@ 2023-06-29  2:52     ` Zhang, Xiong Y
  2023-06-30  2:05       ` Yang, Weijiang
  0 siblings, 1 reply; 22+ messages in thread
From: Zhang, Xiong Y @ 2023-06-29  2:52 UTC (permalink / raw)
  To: Yang, Weijiang
  Cc: Christopherson,, Sean, pbonzini@redhat.com, peterz@infradead.org,
	like.xu.linux@gmail.com, kan.liang@linux.intel.com,
	zhenyuw@linux.intel.com, Lv, Zhiyuan, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org


> 
> What kind of issues do you expect this selftest to find?
> 
> IMO, it verifies the generic perf scheduling rules but cannot find specific issues.
The current vLBR event breaks the generic perf scheduling rules, so I wrote the fix commits and this selftest to keep them from being broken again in the future.
> 
> e.g., whether the LBR data is corrupted in some cases. If the selftest
> could verify that guest/host data is maintained correctly in a
> high-contention environment, that would better serve the purpose of a
> selftest.
Once the vLBR event is preempted, I see a design gap and the guest will get
wrong data; it is out of this commit's scope to fix that gap and to verify
the result.
I should add this to the commit message.

thanks
> 
> 
> On 6/16/2023 7:33 PM, Xiong Zhang wrote:
> > When the guest accesses an LBR msr for the first time, kvm creates a
> > vLBR event; the vLBR event joins the perf scheduler and occupies the
> > physical LBR for guest usage. Once the vLBR event is active and owns
> > LBR, the guest can access LBR msrs.
> >
> > But the vLBR event is a per process pinned event; perf has a higher
> > priority event type, the per cpu pinned LBR event, and lower priority
> > types as well: the per cpu LBR event and the per process LBR event.
> > So if the host doesn't have a higher priority per cpu pinned LBR event,
> > the vLBR event can occupy the physical LBR indefinitely. But once a per
> > cpu pinned LBR event becomes active, the vLBR event can't be active
> > anymore, and the guest can't access LBR msrs.
> >
> > This commit adds a test case to cover guest and host LBR contention.
> >
> > Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
> > ---
> >   tools/testing/selftests/kvm/Makefile          |   1 +
> >   .../selftests/kvm/include/ucall_common.h      |  17 ++
> >   .../kvm/x86_64/pmu_event_filter_test.c        |  16 --
> >   .../kvm/x86_64/vmx_pmu_lbr_contend.c          | 171 ++++++++++++++++++
> >   4 files changed, 189 insertions(+), 16 deletions(-)
> >   create mode 100644
> > tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c
> >
> > diff --git a/tools/testing/selftests/kvm/Makefile
> > b/tools/testing/selftests/kvm/Makefile
> > index 4761b768b773..422bbc16ba2a 100644
> > --- a/tools/testing/selftests/kvm/Makefile
> > +++ b/tools/testing/selftests/kvm/Makefile
> > @@ -100,6 +100,7 @@ TEST_GEN_PROGS_x86_64 +=
> x86_64/vmx_dirty_log_test
> >   TEST_GEN_PROGS_x86_64 +=
> x86_64/vmx_exception_with_invalid_guest_state
> >   TEST_GEN_PROGS_x86_64 += x86_64/vmx_msrs_test
> >   TEST_GEN_PROGS_x86_64 += x86_64/vmx_invalid_nested_guest_state
> > +TEST_GEN_PROGS_x86_64 += x86_64/vmx_pmu_lbr_contend
> >   TEST_GEN_PROGS_x86_64 += x86_64/vmx_set_nested_state_test
> >   TEST_GEN_PROGS_x86_64 += x86_64/vmx_tsc_adjust_test
> >   TEST_GEN_PROGS_x86_64 += x86_64/vmx_nested_tsc_scaling_test
> > diff --git a/tools/testing/selftests/kvm/include/ucall_common.h
> > b/tools/testing/selftests/kvm/include/ucall_common.h
> > index 1a6aaef5ccae..c1bb0cacf390 100644
> > --- a/tools/testing/selftests/kvm/include/ucall_common.h
> > +++ b/tools/testing/selftests/kvm/include/ucall_common.h
> > @@ -35,6 +35,23 @@ void ucall(uint64_t cmd, int nargs, ...);
> >   uint64_t get_ucall(struct kvm_vcpu *vcpu, struct ucall *uc);
> >   void ucall_init(struct kvm_vm *vm, vm_paddr_t mmio_gpa);
> >
> > +/*
> > + * Run the VM to the next GUEST_SYNC(value), and return the value
> > +passed
> > + * to the sync. Any other exit from the guest is fatal.
> > + */
> > +static inline uint64_t run_vcpu_to_sync(struct kvm_vcpu *vcpu) {
> > +	struct ucall uc;
> > +
> > +	vcpu_run(vcpu);
> > +	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
> > +	get_ucall(vcpu, &uc);
> > +	TEST_ASSERT(uc.cmd == UCALL_SYNC,
> > +		    "Received ucall other than UCALL_SYNC: %lu", uc.cmd);
> > +
> > +	return uc.args[1];
> > +}
> > +
> >   /*
> >    * Perform userspace call without any associated data.  This bare call avoids
> >    * allocating a ucall struct, which can be useful if the atomic
> > operations in diff --git
> > a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> > b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> > index 40507ed9fe8a..8c68029cfb4b 100644
> > --- a/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> > +++ b/tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
> > @@ -177,22 +177,6 @@ static void amd_guest_code(void)
> >   	}
> >   }
> >
> > -/*
> > - * Run the VM to the next GUEST_SYNC(value), and return the value
> > passed
> > - * to the sync. Any other exit from the guest is fatal.
> > - */
> > -static uint64_t run_vcpu_to_sync(struct kvm_vcpu *vcpu) -{
> > -	struct ucall uc;
> > -
> > -	vcpu_run(vcpu);
> > -	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
> > -	get_ucall(vcpu, &uc);
> > -	TEST_ASSERT(uc.cmd == UCALL_SYNC,
> > -		    "Received ucall other than UCALL_SYNC: %lu", uc.cmd);
> > -	return uc.args[1];
> > -}
> > -
> >   static void run_vcpu_and_sync_pmc_results(struct kvm_vcpu *vcpu)
> >   {
> >   	uint64_t r;
> > diff --git a/tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c
> > b/tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c
> > new file mode 100644
> > index 000000000000..a6a793f08515
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/x86_64/vmx_pmu_lbr_contend.c
> > @@ -0,0 +1,171 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Test for host and guest LBR preemption
> > + *
> > + * Copyright (C) 2021 Intel Corporation
> > + *
> > + */
> > +
> > +#define _GNU_SOURCE
> > +
> > +#include <linux/perf_event.h>
> > +#include <sys/syscall.h>
> > +#include <sys/sysinfo.h>
> > +#include <unistd.h>
> > +
> > +#include "test_util.h"
> > +#include "kvm_util.h"
> > +#include "processor.h"
> > +
> > +static void create_perf_events(int *fds, int cpu_num, bool pinned) {
> > +	struct perf_event_attr attr = {
> > +		.type = PERF_TYPE_HARDWARE,
> > +		.size = sizeof(attr),
> > +		.config = PERF_COUNT_HW_CPU_CYCLES,
> > +		.sample_type = PERF_SAMPLE_BRANCH_STACK,
> > +		.sample_period = 1000,
> > +		.pinned = pinned,
> > +		.branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
> > +				      PERF_SAMPLE_BRANCH_USER |
> > +				      PERF_SAMPLE_BRANCH_KERNEL,
> > +	};
> > +	int i;
> > +
> > +	for (i = 0; i < cpu_num; i++) {
> > +		fds[i] = syscall(__NR_perf_event_open, &attr, -1, i, -1,
> PERF_FLAG_FD_CLOEXEC);
> > +		TEST_ASSERT(fds[i] != -1, "Failed to create lbr event on cpu%d",
> i);
> > +	}
> > +}
> > +
> > +static void release_perf_events(int *fds, int cpu_num) {
> > +	int i;
> > +
> > +	for (i = 0; i < cpu_num; i++)
> > +		close(fds[i]);
> > +}
> > +
> > +#define PERF_CAP_LBR_FMT_MASK  0x1F
> > +
> > +#define LBR_NOT_SUPPORTED  0xFFFE
> > +#define LBR_MSR_WRITE_ERROR 0xFFFD
> > +
> > +#define LBR_MODE_CHECK_PASS 0x0
> > +#define LBR_MSR_WRITE_SUCC  0x1
> > +
> > +static bool check_lbr_msr(void)
> > +{
> > +	uint64_t v, old_val;
> > +
> > +	old_val = rdmsr(MSR_LBR_TOS);
> > +
> > +	v  = old_val ^ 0x3UL;
> > +
> > +	wrmsr(MSR_LBR_TOS, v);
> > +	if (rdmsr(MSR_LBR_TOS) != v)
> > +		return false;
> > +
> > +	wrmsr(MSR_LBR_TOS, old_val);
> > +	if (rdmsr(MSR_LBR_TOS) != old_val)
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> > +static void guest_code(void)
> > +{
> > +	uint64_t v;
> > +
> > +	v = rdmsr(MSR_IA32_PERF_CAPABILITIES);
> > +	if ((v & PERF_CAP_LBR_FMT_MASK) == 0)
> > +		GUEST_SYNC(LBR_NOT_SUPPORTED);
> > +
> > +	GUEST_SYNC(LBR_MODE_CHECK_PASS);
> > +
> > +	while (1) {
> > +		if (!check_lbr_msr()) {
> > +			GUEST_SYNC(LBR_MSR_WRITE_ERROR);
> > +			continue;
> > +		}
> > +
> > +		/* Enable LBR to avoid KVM recycling LBR. */
> > +		 v = rdmsr(MSR_IA32_DEBUGCTLMSR);
> > +		 v |= DEBUGCTLMSR_LBR;
> > +		 wrmsr(MSR_IA32_DEBUGCTLMSR, v);
> > +
> > +		GUEST_SYNC(LBR_MSR_WRITE_SUCC);
> > +	}
> > +}
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +	int *fds, ncpus;
> > +	struct kvm_vcpu *vcpu;
> > +	struct kvm_vm *vm;
> > +	uint64_t r;
> > +
> > +	TEST_REQUIRE(get_kvm_param_bool("enable_pmu"));
> > +	TEST_REQUIRE(host_cpu_is_intel);
> > +	TEST_REQUIRE(kvm_cpu_property(X86_PROPERTY_PMU_VERSION));
> > +
> > +	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> > +	r = run_vcpu_to_sync(vcpu);
> > +	TEST_ASSERT(r == LBR_MODE_CHECK_PASS,
> > +		    "LBR format in guest PERF_CAP msr isn't correct");
> > +
> > +	ncpus = get_nprocs();
> > +	fds = malloc(sizeof(int) * ncpus);
> > +	TEST_ASSERT(fds != NULL, "Failed to create fds for all cpus");
> > +
> > +	/* Create per cpu pinned LBR event, then it will own LBR. */
> > +	create_perf_events(fds, ncpus, true);
> > +
> > +	/* Since LBR is owned by per cpu pinned LBR event, guest couldn't get it,
> > +	 * so guest couldn't access LBR_TOS msr.
> > +	 */
> > +	r = run_vcpu_to_sync(vcpu);
> > +	TEST_ASSERT(r == LBR_MSR_WRITE_ERROR,
> > +		    "1. Unexpected successfully read/write guest LBR_TO msr");
> > +
> > +	release_perf_events(fds, ncpus);
> > +
> > +	/*
> > +	 * Since the per-CPU pinned events are closed and LBR is free, the
> > +	 * guest can take it, so the guest can access the LBR_TOS MSR.
> > +	 */
> > +	r = run_vcpu_to_sync(vcpu);
> > +	TEST_ASSERT(r == LBR_MSR_WRITE_SUCC,
> > +		    "2. Failed to read/write guest LBR_TOS MSR");
> > +
> > +	/*
> > +	 * Create per-CPU LBR events; their priority is lower than the vLBR
> > +	 * event's, so they can't take LBR back from the vLBR event.
> > +	 */
> > +	create_perf_events(fds, ncpus, false);
> > +
> > +	/*
> > +	 * LBR is still owned by the guest, so the guest can access the
> > +	 * LBR_TOS MSR successfully.
> > +	 */
> > +	r = run_vcpu_to_sync(vcpu);
> > +	TEST_ASSERT(r == LBR_MSR_WRITE_SUCC,
> > +		    "3. Failed to read/write guest LBR_TOS MSR");
> > +
> > +	release_perf_events(fds, ncpus);
> > +
> > +	/*
> > +	 * Create per-CPU pinned LBR events; their priority is higher than
> > +	 * the vLBR event's, so they take LBR back from the vLBR event.
> > +	 */
> > +	create_perf_events(fds, ncpus, true);
> > +
> > +	/*
> > +	 * LBR is preempted by the per-CPU pinned LBR events, so the guest
> > +	 * can't access the LBR_TOS MSR.
> > +	 */
> > +	r = run_vcpu_to_sync(vcpu);
> > +	TEST_ASSERT(r == LBR_MSR_WRITE_ERROR,
> > +		    "4. Guest unexpectedly read/wrote LBR_TOS MSR");
> > +
> > +	release_perf_events(fds, ncpus);
> > +
> > +	kvm_vm_free(vm);
> > +
> > +	free(fds);
> > +
> > +	return 0;
> > +}

* Re: [PATCH 4/4] KVM: selftests: Add test case for guest and host LBR preemption
  2023-06-29  2:52     ` Zhang, Xiong Y
@ 2023-06-30  2:05       ` Yang, Weijiang
  0 siblings, 0 replies; 22+ messages in thread
From: Yang, Weijiang @ 2023-06-30  2:05 UTC (permalink / raw)
  To: Zhang, Xiong Y
  Cc: Christopherson,, Sean, pbonzini@redhat.com, peterz@infradead.org,
	like.xu.linux@gmail.com, kan.liang@linux.intel.com,
	zhenyuw@linux.intel.com, Lv, Zhiyuan, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org


On 6/29/2023 10:52 AM, Zhang, Xiong Y wrote:
>> What kind of issues do you expect this selftest to find?
>>
>> IMO, it verifies the generic perf schedule rules but cannot find specific issues.
> The current vLBR event breaks the generic perf schedule rule, so I wrote the fix commits and this selftest to keep it from being broken again in the future.

OK, I think you need to refine the assert failure messages and give users
straightforward messages showing that something wrong is happening, and stop
the following tests since something is broken; there is no need to trigger
more errors. E.g.,

+	TEST_ASSERT(r == LBR_MSR_WRITE_ERROR,
+		    "1. Guest unexpectedly read/wrote LBR_TOS MSR");


"1. Unexpected successfully read/write guest LBR_TO msr"
=> "Host LBR ON: Detected unexpected results when write guest vLBR MSRs. Stop testing."

Then, at the end of the tests, you can print a success message showing that
perf/LBR is working as expected. This way, testers get a clear indication of
the test result.
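
For instance, a minimal sketch (the exact wording and the final pr_info() are
only suggestions; TEST_ASSERT() already aborts the app on failure, which
stops the remaining tests):

	r = run_vcpu_to_sync(vcpu);
	TEST_ASSERT(r == LBR_MSR_WRITE_ERROR,
		    "Host LBR ON: unexpected result when guest writes vLBR MSRs. Stop testing.");
	...
	pr_info("Host and guest LBR preemption behaves as expected\n");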

>> e.g., whether the LBR data is corrupted in some cases. If the selftest can
>> verify whether guest/host data is maintained correctly in a high contention
>> env., that can better serve the purpose of a selftest.
> Once the vLBR event is preempted, I see a design gap and the guest could get wrong data; it is out of this commit's scope to fix the gap and to verify the result.
> I should add this to the commit message.

Yes, please refine the change log of this patch to state the clear purpose of
this app so that users know why it is needed.

>
> thanks
>>
>> On 6/16/2023 7:33 PM, Xiong Zhang wrote:
>>> When the guest accesses an LBR MSR for the first time, kvm creates a vLBR
>>> event; the vLBR event joins the perf scheduler and occupies the physical
>>> LBR for guest usage. Once the vLBR event is active and owns LBR, the
>>> guest can access the LBR MSRs.
>>>
>>> But the vLBR event is a per-process pinned event. Perf has one higher
>>> priority event type, the per-CPU pinned LBR event, and lower priority
>>> types as well: per-CPU LBR events and per-process LBR events.
>>> So if the host has no higher priority per-CPU pinned LBR event, the vLBR
>>> event can occupy the physical LBR indefinitely. But once a per-CPU pinned
>>> LBR event becomes active, the vLBR event can't be active anymore, and the
>>> guest can't access the LBR MSRs.
>>>
>>> This commit adds a test case to cover guest and host LBR contention.
>>>
>>> Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
>>> ---
>>>    tools/testing/selftests/kvm/Makefile          |   1 +
>>>    .../selftests/kvm/include/ucall_common.h      |  17 ++
[...]

end of thread, other threads:[~2023-06-30  2:07 UTC | newest]

Thread overview: 22+ messages
2023-06-16 11:33 [PATCH 0/4] Part of fix for host and guest LBR event coexist Xiong Zhang
2023-06-16 11:33 ` [PATCH 1/4] perf/x86/intel: Get shared reg constraints first for vLBR Xiong Zhang
2023-06-28  4:25   ` Like Xu
2023-06-29  2:11     ` Zhang, Xiong Y
2023-06-16 11:33 ` [PATCH 2/4] KVM: VMX/pmu: Save host debugctlmsr just before vm entry Xiong Zhang
2023-06-23 20:15   ` Sean Christopherson
2023-06-25  4:03     ` Zhang, Xiong Y
2023-06-28  5:37       ` Like Xu
2023-06-16 11:33 ` [PATCH 3/4] KVM: VMX/pmu: Enable inactive vLBR event in guest LBR MSR emulation Xiong Zhang
2023-06-23 20:38   ` Sean Christopherson
2023-06-25  6:54     ` Zhang, Xiong Y
2023-06-26 17:00       ` Sean Christopherson
2023-06-27  3:29         ` Zhang, Xiong Y
2023-06-27 15:07           ` Sean Christopherson
2023-06-28  6:07             ` Like Xu
2023-06-28  5:50   ` Like Xu
2023-06-16 11:33 ` [PATCH 4/4] KVM: selftests: Add test case for guest and host LBR preemption Xiong Zhang
2023-06-28  6:27   ` Like Xu
2023-06-29  2:39     ` Zhang, Xiong Y
2023-06-28  9:27   ` Yang, Weijiang
2023-06-29  2:52     ` Zhang, Xiong Y
2023-06-30  2:05       ` Yang, Weijiang
