* [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs
@ 2024-09-18 20:53 Colton Lewis
2024-09-18 20:53 ` [PATCH v2 1/6] KVM: x86: selftests: Fix typos in macro variable use Colton Lewis
` (7 more replies)
0 siblings, 8 replies; 21+ messages in thread
From: Colton Lewis @ 2024-09-18 20:53 UTC (permalink / raw)
To: kvm
Cc: Mingwei Zhang, Jinrong Liang, Jim Mattson, Aaron Lewis,
Sean Christopherson, Paolo Bonzini, Shuah Khan, linux-kselftest,
linux-kernel, Colton Lewis
Extend pmu_counters_test to AMD CPUs.
As the AMD PMU is quite different from Intel's, with different events and
feature sets, this series introduces a new code path to test it,
specifically focusing on the core counters, including the
PerfCtrExtCore and PerfMonV2 features. Northbridge counters and cache
counters exist, but are not as important and can be deferred to a
later series.
The first patch is a bug fix that could be submitted separately.
The series has been tested on both Intel and AMD machines, but I have
not found an AMD machine old enough to lack PerfCtrExtCore. I have
made efforts to ensure that no part of the code depends on its
presence.
I am aware of similar work in this direction done by Jinrong Liang
[1]. He told me he is not currently working on it and that I am not
intruding by making my own submission.
[1] https://lore.kernel.org/kvm/20231121115457.76269-1-cloudliang@tencent.com/
v2:
* Test all combinations of VM setup rather than only the maximum
allowed by hardware
* Add fixes tag to bug fix in patch 1
* Refine some names
v1:
https://lore.kernel.org/kvm/20240813164244.751597-1-coltonlewis@google.com/
Colton Lewis (6):
KVM: x86: selftests: Fix typos in macro variable use
KVM: x86: selftests: Define AMD PMU CPUID leaves
KVM: x86: selftests: Set up AMD VM in pmu_counters_test
KVM: x86: selftests: Test read/write core counters
KVM: x86: selftests: Test core events
KVM: x86: selftests: Test PerfMonV2
.../selftests/kvm/include/x86_64/processor.h | 7 +
.../selftests/kvm/x86_64/pmu_counters_test.c | 304 ++++++++++++++++--
2 files changed, 277 insertions(+), 34 deletions(-)
base-commit: da3ea35007d0af457a0afc87e84fddaebc4e0b63
--
2.46.0.662.g92d0881bb0-goog
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v2 1/6] KVM: x86: selftests: Fix typos in macro variable use
2024-09-18 20:53 [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs Colton Lewis
@ 2024-09-18 20:53 ` Colton Lewis
2024-09-18 20:53 ` [PATCH v2 2/6] KVM: x86: selftests: Define AMD PMU CPUID leaves Colton Lewis
` (6 subsequent siblings)
7 siblings, 0 replies; 21+ messages in thread
From: Colton Lewis @ 2024-09-18 20:53 UTC (permalink / raw)
To: kvm
Cc: Mingwei Zhang, Jinrong Liang, Jim Mattson, Aaron Lewis,
Sean Christopherson, Paolo Bonzini, Shuah Khan, linux-kselftest,
linux-kernel, Colton Lewis
Without the leading underscore, these names reference variables in the
calling scope instead of the macro parameters. The code only worked
before by accident because every calling scope happened to have a
variable with the right name.
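For illustration, a minimal standalone sketch of the same macro-hygiene
pitfall (hypothetical names, not the selftest code):

#include <stdio.h>

/*
 * BUG: the body uses "msr", not the parameter "_msr", so it silently
 * captures whatever variable named "msr" exists in the calling scope.
 */
#define WRITE_BUGGY(_msr, _val) do { printf("wr %#x <- %d\n", msr, _val); } while (0)

/* FIXED: referencing the parameter makes the macro self-contained. */
#define WRITE_FIXED(_msr, _val) do { printf("wr %#x <- %d\n", _msr, _val); } while (0)

int main(void)
{
	unsigned int msr = 0xc1;	/* the caller happens to have "msr"... */

	WRITE_BUGGY(0xc2, 1);		/* ...so this prints 0xc1, not 0xc2 */
	WRITE_FIXED(0xc2, 1);		/* prints 0xc2 as intended */
	return 0;
}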
Fixes: cd34fd8c758e ("KVM: selftests: Test PMC virtualization with forced emulation")
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
tools/testing/selftests/kvm/x86_64/pmu_counters_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index 698cb36989db..0e305e43a93b 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -174,7 +174,7 @@ do { \
#define GUEST_TEST_EVENT(_idx, _event, _pmc, _pmc_msr, _ctrl_msr, _value, FEP) \
do { \
- wrmsr(pmc_msr, 0); \
+ wrmsr(_pmc_msr, 0); \
\
if (this_cpu_has(X86_FEATURE_CLFLUSHOPT)) \
GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt .", FEP); \
@@ -331,9 +331,9 @@ __GUEST_ASSERT(expect_gp ? vector == GP_VECTOR : !vector, \
expect_gp ? "#GP" : "no fault", msr, vector) \
#define GUEST_ASSERT_PMC_VALUE(insn, msr, val, expected) \
- __GUEST_ASSERT(val == expected_val, \
+ __GUEST_ASSERT(val == expected, \
"Expected " #insn "(0x%x) to yield 0x%lx, got 0x%lx", \
- msr, expected_val, val);
+ msr, expected, val);
static void guest_test_rdpmc(uint32_t rdpmc_idx, bool expect_success,
uint64_t expected_val)
--
2.46.0.662.g92d0881bb0-goog
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v2 2/6] KVM: x86: selftests: Define AMD PMU CPUID leaves
2024-09-18 20:53 [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs Colton Lewis
2024-09-18 20:53 ` [PATCH v2 1/6] KVM: x86: selftests: Fix typos in macro variable use Colton Lewis
@ 2024-09-18 20:53 ` Colton Lewis
2025-01-08 17:58 ` Sean Christopherson
2024-09-18 20:53 ` [PATCH v2 3/6] KVM: x86: selftests: Set up AMD VM in pmu_counters_test Colton Lewis
` (5 subsequent siblings)
7 siblings, 1 reply; 21+ messages in thread
From: Colton Lewis @ 2024-09-18 20:53 UTC (permalink / raw)
To: kvm
Cc: Mingwei Zhang, Jinrong Liang, Jim Mattson, Aaron Lewis,
Sean Christopherson, Paolo Bonzini, Shuah Khan, linux-kselftest,
linux-kernel, Colton Lewis
This defined the CPUID calls to determine what extensions and
properties are available. AMD reference manual names listed below.
* PerfCtrExtCore (six core counters instead of four)
* PerfCtrExtNB (four counters for northbridge events)
* PerfCtrExtL2I (four counters for L2 cache events)
* PerfMonV2 (support for registers to control multiple
counters with a single register write)
* LbrAndPmcFreeze (support for freezing last branch recorded stack on
performance counter overflow)
* NumPerfCtrCore (number of core counters)
* NumPerfCtrNB (number of northbridge counters)
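As a sketch of how these definitions map to raw CPUID (using the
__get_cpuid_count() compiler builtin rather than the selftest helpers;
bit positions per the defines in the diff below):

#include <cpuid.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t eax, ebx, ecx, edx;

	/* CPUID 0x80000022: AMD performance monitoring v2 leaf. */
	if (!__get_cpuid_count(0x80000022, 0, &eax, &ebx, &ecx, &edx))
		return 1;

	/* EAX bit 0 is PerfMonV2, EBX bits 3:0 are NumPerfCtrCore. */
	printf("PerfMonV2: %u, core counters: %u\n", eax & 1, ebx & 0xf);
	return 0;
}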
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
tools/testing/selftests/kvm/include/x86_64/processor.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index a0c1440017bb..44ddfc4c1673 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -183,6 +183,9 @@ struct kvm_x86_cpu_feature {
#define X86_FEATURE_GBPAGES KVM_X86_CPU_FEATURE(0x80000001, 0, EDX, 26)
#define X86_FEATURE_RDTSCP KVM_X86_CPU_FEATURE(0x80000001, 0, EDX, 27)
#define X86_FEATURE_LM KVM_X86_CPU_FEATURE(0x80000001, 0, EDX, 29)
+#define X86_FEATURE_PERF_CTR_EXT_CORE KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 23)
+#define X86_FEATURE_PERF_CTR_EXT_NB KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 24)
+#define X86_FEATURE_PERF_CTR_EXT_L2I KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 28)
#define X86_FEATURE_INVTSC KVM_X86_CPU_FEATURE(0x80000007, 0, EDX, 8)
#define X86_FEATURE_RDPRU KVM_X86_CPU_FEATURE(0x80000008, 0, EBX, 4)
#define X86_FEATURE_AMD_IBPB KVM_X86_CPU_FEATURE(0x80000008, 0, EBX, 12)
@@ -195,6 +198,8 @@ struct kvm_x86_cpu_feature {
#define X86_FEATURE_VGIF KVM_X86_CPU_FEATURE(0x8000000A, 0, EDX, 16)
#define X86_FEATURE_SEV KVM_X86_CPU_FEATURE(0x8000001F, 0, EAX, 1)
#define X86_FEATURE_SEV_ES KVM_X86_CPU_FEATURE(0x8000001F, 0, EAX, 3)
+#define X86_FEATURE_PERFMON_V2 KVM_X86_CPU_FEATURE(0x80000022, 0, EAX, 0)
+#define X86_FEATURE_PERF_LBR_PMC_FREEZE KVM_X86_CPU_FEATURE(0x80000022, 0, EAX, 2)
/*
* KVM defined paravirt features.
@@ -281,6 +286,8 @@ struct kvm_x86_cpu_property {
#define X86_PROPERTY_GUEST_MAX_PHY_ADDR KVM_X86_CPU_PROPERTY(0x80000008, 0, EAX, 16, 23)
#define X86_PROPERTY_SEV_C_BIT KVM_X86_CPU_PROPERTY(0x8000001F, 0, EBX, 0, 5)
#define X86_PROPERTY_PHYS_ADDR_REDUCTION KVM_X86_CPU_PROPERTY(0x8000001F, 0, EBX, 6, 11)
+#define X86_PROPERTY_NUM_PERF_CTR_CORE KVM_X86_CPU_PROPERTY(0x80000022, 0, EBX, 0, 3)
+#define X86_PROPERTY_NUM_PERF_CTR_NB KVM_X86_CPU_PROPERTY(0x80000022, 0, EBX, 10, 15)
#define X86_PROPERTY_MAX_CENTAUR_LEAF KVM_X86_CPU_PROPERTY(0xC0000000, 0, EAX, 0, 31)
--
2.46.0.662.g92d0881bb0-goog
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v2 3/6] KVM: x86: selftests: Set up AMD VM in pmu_counters_test
2024-09-18 20:53 [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs Colton Lewis
2024-09-18 20:53 ` [PATCH v2 1/6] KVM: x86: selftests: Fix typos in macro variable use Colton Lewis
2024-09-18 20:53 ` [PATCH v2 2/6] KVM: x86: selftests: Define AMD PMU CPUID leaves Colton Lewis
@ 2024-09-18 20:53 ` Colton Lewis
2025-01-08 18:35 ` Sean Christopherson
2024-09-18 20:53 ` [PATCH v2 4/6] KVM: x86: selftests: Test read/write core counters Colton Lewis
` (4 subsequent siblings)
7 siblings, 1 reply; 21+ messages in thread
From: Colton Lewis @ 2024-09-18 20:53 UTC (permalink / raw)
To: kvm
Cc: Mingwei Zhang, Jinrong Liang, Jim Mattson, Aaron Lewis,
Sean Christopherson, Paolo Bonzini, Shuah Khan, linux-kselftest,
linux-kernel, Colton Lewis
Branch in main() depending on whether the CPU is Intel or AMD. The two
are subject to vastly different requirements because the AMD PMU lacks
many properties defined by the Intel PMU, including the entire CPUID
0xA leaf where Intel stores all the PMU properties. AMD also lacks
Intel's consistent notion of PMU versions; every feature is a separate
flag, and the features don't correspond to Intel's.
Set up a VM for testing core AMD counters and ensure proper CPUID
features are set.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 104 ++++++++++++++----
1 file changed, 83 insertions(+), 21 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index 0e305e43a93b..5b240585edc5 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -30,10 +30,21 @@
#define NUM_INSNS_RETIRED (NUM_LOOPS * NUM_INSNS_PER_LOOP + NUM_EXTRA_INSNS)
+/*
+ * Limit testing to MSRs that are actually defined by Intel (in the SDM). MSRs
+ * that aren't defined counter MSRs *probably* don't exist, but there's no
+ * guarantee that currently undefined MSR indices won't be used for something
+ * other than PMCs in the future.
+ */
+#define MAX_NR_GP_COUNTERS 8
+#define MAX_NR_FIXED_COUNTERS 3
+#define AMD_NR_CORE_COUNTERS 4
+#define AMD_NR_CORE_EXT_COUNTERS 6
+
static uint8_t kvm_pmu_version;
static bool kvm_has_perf_caps;
-static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
+static struct kvm_vm *intel_pmu_vm_create(struct kvm_vcpu **vcpu,
void *guest_code,
uint8_t pmu_version,
uint64_t perf_capabilities)
@@ -303,7 +314,7 @@ static void test_arch_events(uint8_t pmu_version, uint64_t perf_capabilities,
if (!pmu_version)
return;
- vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_test_arch_events,
+ vm = intel_pmu_vm_create(&vcpu, guest_test_arch_events,
pmu_version, perf_capabilities);
vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH,
@@ -316,15 +327,6 @@ static void test_arch_events(uint8_t pmu_version, uint64_t perf_capabilities,
kvm_vm_free(vm);
}
-/*
- * Limit testing to MSRs that are actually defined by Intel (in the SDM). MSRs
- * that aren't defined counter MSRs *probably* don't exist, but there's no
- * guarantee that currently undefined MSR indices won't be used for something
- * other than PMCs in the future.
- */
-#define MAX_NR_GP_COUNTERS 8
-#define MAX_NR_FIXED_COUNTERS 3
-
#define GUEST_ASSERT_PMC_MSR_ACCESS(insn, msr, expect_gp, vector) \
__GUEST_ASSERT(expect_gp ? vector == GP_VECTOR : !vector, \
"Expected %s on " #insn "(0x%x), got vector %u", \
@@ -463,7 +465,7 @@ static void test_gp_counters(uint8_t pmu_version, uint64_t perf_capabilities,
struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
- vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_test_gp_counters,
+ vm = intel_pmu_vm_create(&vcpu, guest_test_gp_counters,
pmu_version, perf_capabilities);
vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_NR_GP_COUNTERS,
@@ -530,7 +532,7 @@ static void test_fixed_counters(uint8_t pmu_version, uint64_t perf_capabilities,
struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
- vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_test_fixed_counters,
+ vm = intel_pmu_vm_create(&vcpu, guest_test_fixed_counters,
pmu_version, perf_capabilities);
vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_FIXED_COUNTERS_BITMASK,
@@ -627,18 +629,78 @@ static void test_intel_counters(void)
}
}
-int main(int argc, char *argv[])
+static uint8_t nr_core_counters(void)
{
- TEST_REQUIRE(kvm_is_pmu_enabled());
+ uint8_t nr_counters = kvm_cpu_property(X86_PROPERTY_NUM_PERF_CTR_CORE);
+ bool core_ext = kvm_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
+
+ if (nr_counters != 0)
+ return nr_counters;
+
+ if (core_ext)
+ return AMD_NR_CORE_EXT_COUNTERS;
- TEST_REQUIRE(host_cpu_is_intel);
- TEST_REQUIRE(kvm_cpu_has_p(X86_PROPERTY_PMU_VERSION));
- TEST_REQUIRE(kvm_cpu_property(X86_PROPERTY_PMU_VERSION) > 0);
+ return AMD_NR_CORE_COUNTERS;
- kvm_pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
- kvm_has_perf_caps = kvm_cpu_has(X86_FEATURE_PDCM);
+}
+
+static void guest_test_core_counters(void)
+{
+ GUEST_DONE();
+}
+
+static void test_core_counters(void)
+{
+ uint8_t nr_counters = nr_core_counters();
+ bool core_ext = kvm_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
+ bool perfmon_v2 = kvm_cpu_has(X86_FEATURE_PERFMON_V2);
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+
+ for (uint8_t ce = 0; ce <= core_ext; ce++) {
+ for (uint8_t pm = 0; pm <= perfmon_v2; pm++) {
+ for (uint8_t nc = 0; nc <= nr_counters; nc++) {
+ vm = vm_create_with_one_vcpu(&vcpu, guest_test_core_counters);
+
+ if (nc)
+ vcpu_set_cpuid_property(
+ vcpu, X86_PROPERTY_NUM_PERF_CTR_CORE, nc);
+ if (ce)
+ vcpu_set_cpuid_feature(
+ vcpu, X86_FEATURE_PERF_CTR_EXT_CORE);
+ if (pm)
+ vcpu_set_cpuid_feature(
+ vcpu, X86_FEATURE_PERFMON_V2);
+
+ pr_info("Testing core counters: CoreExt = %u, PerfMonV2 = %u, NumCounters = %u\n",
+ ce, pm, nc);
+ run_vcpu(vcpu);
+
+ kvm_vm_free(vm);
+ }
+ }
+ }
+}
+
+static void test_amd_counters(void)
+{
+ test_core_counters();
+}
- test_intel_counters();
+int main(int argc, char *argv[])
+{
+ TEST_REQUIRE(kvm_is_pmu_enabled());
+
+ if (host_cpu_is_intel) {
+ TEST_REQUIRE(kvm_cpu_has_p(X86_PROPERTY_PMU_VERSION));
+ TEST_REQUIRE(kvm_cpu_property(X86_PROPERTY_PMU_VERSION) > 0);
+ kvm_pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
+ kvm_has_perf_caps = kvm_cpu_has(X86_FEATURE_PDCM);
+ test_intel_counters();
+ } else if (host_cpu_is_amd) {
+ /* AMD CPUs don't have the same properties to look at. */
+ test_amd_counters();
+ }
return 0;
}
--
2.46.0.662.g92d0881bb0-goog
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v2 4/6] KVM: x86: selftests: Test read/write core counters
2024-09-18 20:53 [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs Colton Lewis
` (2 preceding siblings ...)
2024-09-18 20:53 ` [PATCH v2 3/6] KVM: x86: selftests: Set up AMD VM in pmu_counters_test Colton Lewis
@ 2024-09-18 20:53 ` Colton Lewis
2025-01-08 18:54 ` Sean Christopherson
2024-09-18 20:53 ` [PATCH v2 5/6] KVM: x86: selftests: Test core events Colton Lewis
` (3 subsequent siblings)
7 siblings, 1 reply; 21+ messages in thread
From: Colton Lewis @ 2024-09-18 20:53 UTC (permalink / raw)
To: kvm
Cc: Mingwei Zhang, Jinrong Liang, Jim Mattson, Aaron Lewis,
Sean Christopherson, Paolo Bonzini, Shuah Khan, linux-kselftest,
linux-kernel, Colton Lewis
Run a basic test to ensure we can write an arbitrary value to the core
counters and read it back.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 54 +++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index 5b240585edc5..79ca7d608e00 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -641,11 +641,65 @@ static uint8_t nr_core_counters(void)
return AMD_NR_CORE_EXT_COUNTERS;
return AMD_NR_CORE_COUNTERS;
+}
+
+static uint8_t guest_nr_core_counters(void)
+{
+ uint8_t nr_counters = this_cpu_property(X86_PROPERTY_NUM_PERF_CTR_CORE);
+ bool core_ext = this_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
+
+ if (nr_counters != 0)
+ return nr_counters;
+
+ if (core_ext)
+ return AMD_NR_CORE_EXT_COUNTERS;
+
+ return AMD_NR_CORE_COUNTERS;
+
+}
+static void guest_test_rdwr_core_counters(void)
+{
+ bool core_ext = this_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
+ uint8_t nr_counters = guest_nr_core_counters();
+ uint8_t i;
+ uint32_t esel_msr_base = core_ext ? MSR_F15H_PERF_CTL : MSR_K7_EVNTSEL0;
+ uint32_t cnt_msr_base = core_ext ? MSR_F15H_PERF_CTR : MSR_K7_PERFCTR0;
+ uint32_t msr_step = core_ext ? 2 : 1;
+
+ for (i = 0; i < AMD_NR_CORE_EXT_COUNTERS; i++) {
+ uint64_t test_val = 0xffff;
+ uint32_t esel_msr = esel_msr_base + msr_step * i;
+ uint32_t cnt_msr = cnt_msr_base + msr_step * i;
+ bool expect_gp = !(i < nr_counters);
+ uint8_t vector;
+ uint64_t val;
+
+ /* Test event selection register. */
+ vector = wrmsr_safe(esel_msr, test_val);
+ GUEST_ASSERT_PMC_MSR_ACCESS(WRMSR, esel_msr, expect_gp, vector);
+
+ vector = rdmsr_safe(esel_msr, &val);
+ GUEST_ASSERT_PMC_MSR_ACCESS(RDMSR, esel_msr, expect_gp, vector);
+
+ if (!expect_gp)
+ GUEST_ASSERT_PMC_VALUE(RDMSR, esel_msr, val, test_val);
+
+ /* Test counter register. */
+ vector = wrmsr_safe(cnt_msr, test_val);
+ GUEST_ASSERT_PMC_MSR_ACCESS(WRMSR, cnt_msr, expect_gp, vector);
+
+ vector = rdmsr_safe(cnt_msr, &val);
+ GUEST_ASSERT_PMC_MSR_ACCESS(RDMSR, cnt_msr, expect_gp, vector);
+
+ if (!expect_gp)
+ GUEST_ASSERT_PMC_VALUE(RDMSR, cnt_msr, val, test_val);
+ }
}
static void guest_test_core_counters(void)
{
+ guest_test_rdwr_core_counters();
GUEST_DONE();
}
--
2.46.0.662.g92d0881bb0-goog
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v2 5/6] KVM: x86: selftests: Test core events
2024-09-18 20:53 [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs Colton Lewis
` (3 preceding siblings ...)
2024-09-18 20:53 ` [PATCH v2 4/6] KVM: x86: selftests: Test read/write core counters Colton Lewis
@ 2024-09-18 20:53 ` Colton Lewis
2025-01-08 19:31 ` Sean Christopherson
2024-09-18 20:53 ` [PATCH v2 6/6] KVM: x86: selftests: Test PerfMonV2 Colton Lewis
` (2 subsequent siblings)
7 siblings, 1 reply; 21+ messages in thread
From: Colton Lewis @ 2024-09-18 20:53 UTC (permalink / raw)
To: kvm
Cc: Mingwei Zhang, Jinrong Liang, Jim Mattson, Aaron Lewis,
Sean Christopherson, Paolo Bonzini, Shuah Khan, linux-kselftest,
linux-kernel, Colton Lewis
Test events on core counters by iterating through every combination of
events in amd_pmu_zen_events with every core counter.
For each combination, calculate the appropriate register addresses for
the event selection/control register and the counter register. The
base addresses and layout schemes change depending on whether we have
the CoreExt feature.
To do the testing, reuse GUEST_TEST_EVENT to run a standard known
workload. Decouple it from guest_assert_event_count (now
guest_assert_intel_event_count) to generalize to AMD.
Then assert the most specific detail that can be reasonably known
about the counter result. Exact count is defined and known for some
events and for other events merely asserted to be nonzero.
Note on exact counts: AMD counts one more branch than Intel for the
same workload. Though I can't confirm a reason, the only thing it
could be is the boundary of the loop instruction being counted
differently. Presumably, when the counter reaches 0 and execution
continues to the next instruction, AMD counts this as a branch and
Intel doesn't.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 87 ++++++++++++++++---
1 file changed, 77 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index 79ca7d608e00..cf2941cc7c4c 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -29,6 +29,9 @@
/* Total number of instructions retired within the measured section. */
#define NUM_INSNS_RETIRED (NUM_LOOPS * NUM_INSNS_PER_LOOP + NUM_EXTRA_INSNS)
+/* AMD counting one extra branch. Probably at loop boundary condition. */
+#define NUM_BRANCH_INSNS_RETIRED_AMD (NUM_LOOPS+1)
+#define NUM_INSNS_RETIRED_AMD (NUM_INSNS_RETIRED+1)
/*
* Limit testing to MSRs that are actually defined by Intel (in the SDM). MSRs
@@ -109,7 +112,7 @@ static uint8_t guest_get_pmu_version(void)
* Sanity check that in all cases, the event doesn't count when it's disabled,
* and that KVM correctly emulates the write of an arbitrary value.
*/
-static void guest_assert_event_count(uint8_t idx,
+static void guest_assert_intel_event_count(uint8_t idx,
struct kvm_x86_pmu_feature event,
uint32_t pmc, uint32_t pmc_msr)
{
@@ -151,6 +154,33 @@ static void guest_assert_event_count(uint8_t idx,
GUEST_ASSERT_EQ(_rdpmc(pmc), 0xdead);
}
+static void guest_assert_amd_event_count(uint8_t evt_idx, uint8_t cnt_idx, uint32_t pmc_msr)
+{
+ uint64_t count;
+ uint64_t count_pmc;
+
+ count = rdmsr(pmc_msr);
+ count_pmc = _rdpmc(cnt_idx);
+ GUEST_ASSERT_EQ(count, count_pmc);
+
+ switch (evt_idx) {
+ case AMD_ZEN_CORE_CYCLES_INDEX:
+ GUEST_ASSERT_NE(count, 0);
+ break;
+ case AMD_ZEN_INSTRUCTIONS_INDEX:
+ GUEST_ASSERT_EQ(count, NUM_INSNS_RETIRED_AMD);
+ break;
+ case AMD_ZEN_BRANCHES_INDEX:
+ GUEST_ASSERT_EQ(count, NUM_BRANCH_INSNS_RETIRED_AMD);
+ break;
+ case AMD_ZEN_BRANCH_MISSES_INDEX:
+ GUEST_ASSERT_NE(count, 0);
+ break;
+ default:
+ break;
+ }
+
+}
/*
* Enable and disable the PMC in a monolithic asm blob to ensure that the
* compiler can't insert _any_ code into the measured sequence. Note, ECX
@@ -183,28 +213,29 @@ do { \
); \
} while (0)
-#define GUEST_TEST_EVENT(_idx, _event, _pmc, _pmc_msr, _ctrl_msr, _value, FEP) \
+#define GUEST_TEST_EVENT(_pmc_msr, _ctrl_msr, _ctrl_value, FEP) \
do { \
wrmsr(_pmc_msr, 0); \
\
if (this_cpu_has(X86_FEATURE_CLFLUSHOPT)) \
- GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt .", FEP); \
+ GUEST_MEASURE_EVENT(_ctrl_msr, _ctrl_value, "clflushopt .", FEP); \
else if (this_cpu_has(X86_FEATURE_CLFLUSH)) \
- GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflush .", FEP); \
+ GUEST_MEASURE_EVENT(_ctrl_msr, _ctrl_value, "clflush .", FEP); \
else \
- GUEST_MEASURE_EVENT(_ctrl_msr, _value, "nop", FEP); \
- \
- guest_assert_event_count(_idx, _event, _pmc, _pmc_msr); \
+ GUEST_MEASURE_EVENT(_ctrl_msr, _ctrl_value, "nop", FEP); \
} while (0)
static void __guest_test_arch_event(uint8_t idx, struct kvm_x86_pmu_feature event,
uint32_t pmc, uint32_t pmc_msr,
uint32_t ctrl_msr, uint64_t ctrl_msr_value)
{
- GUEST_TEST_EVENT(idx, event, pmc, pmc_msr, ctrl_msr, ctrl_msr_value, "");
+ GUEST_TEST_EVENT(pmc_msr, ctrl_msr, ctrl_msr_value, "");
+ guest_assert_intel_event_count(idx, event, pmc, pmc_msr);
- if (is_forced_emulation_enabled)
- GUEST_TEST_EVENT(idx, event, pmc, pmc_msr, ctrl_msr, ctrl_msr_value, KVM_FEP);
+ if (is_forced_emulation_enabled) {
+ GUEST_TEST_EVENT(pmc_msr, ctrl_msr, ctrl_msr_value, KVM_FEP);
+ guest_assert_intel_event_count(idx, event, pmc, pmc_msr);
+ }
}
#define X86_PMU_FEATURE_NULL \
@@ -697,9 +728,45 @@ static void guest_test_rdwr_core_counters(void)
}
}
+static void __guest_test_core_event(uint8_t event_idx, uint8_t counter_idx)
+{
+ /* One fortunate area of actual compatibility! This register
+ * layout is the same for both AMD and Intel.
+ */
+ uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
+ ARCH_PERFMON_EVENTSEL_ENABLE |
+ amd_pmu_zen_events[event_idx];
+ bool core_ext = this_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
+ uint64_t esel_msr_base = core_ext ? MSR_F15H_PERF_CTL : MSR_K7_EVNTSEL0;
+ uint64_t cnt_msr_base = core_ext ? MSR_F15H_PERF_CTR : MSR_K7_PERFCTR0;
+ uint64_t msr_step = core_ext ? 2 : 1;
+ uint64_t esel_msr = esel_msr_base + msr_step * counter_idx;
+ uint64_t cnt_msr = cnt_msr_base + msr_step * counter_idx;
+
+ GUEST_TEST_EVENT(cnt_msr, esel_msr, eventsel, "");
+ guest_assert_amd_event_count(event_idx, counter_idx, cnt_msr);
+
+ if (is_forced_emulation_enabled) {
+ GUEST_TEST_EVENT(cnt_msr, esel_msr, eventsel, KVM_FEP);
+ guest_assert_amd_event_count(event_idx, counter_idx, cnt_msr);
+ }
+
+}
+
+static void guest_test_core_events(void)
+{
+ uint8_t nr_counters = guest_nr_core_counters();
+
+ for (uint8_t i = 0; i < NR_AMD_ZEN_EVENTS; i++) {
+ for (uint8_t j = 0; j < nr_counters; j++)
+ __guest_test_core_event(i, j);
+ }
+}
+
static void guest_test_core_counters(void)
{
guest_test_rdwr_core_counters();
+ guest_test_core_events();
GUEST_DONE();
}
--
2.46.0.662.g92d0881bb0-goog
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v2 6/6] KVM: x86: selftests: Test PerfMonV2
2024-09-18 20:53 [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs Colton Lewis
` (4 preceding siblings ...)
2024-09-18 20:53 ` [PATCH v2 5/6] KVM: x86: selftests: Test core events Colton Lewis
@ 2024-09-18 20:53 ` Colton Lewis
2025-01-08 19:42 ` Sean Christopherson
2024-10-31 20:09 ` [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs Colton Lewis
2025-01-09 19:47 ` Sean Christopherson
7 siblings, 1 reply; 21+ messages in thread
From: Colton Lewis @ 2024-09-18 20:53 UTC (permalink / raw)
To: kvm
Cc: Mingwei Zhang, Jinrong Liang, Jim Mattson, Aaron Lewis,
Sean Christopherson, Paolo Bonzini, Shuah Khan, linux-kselftest,
linux-kernel, Colton Lewis
Test PerfMonV2, which defines global registers to enable multiple
performance counters with a single MSR write, in its own function.
If the feature is available, ensure the global control register has
the ability to start and stop the performance counters and the global
status register correctly flags an overflow by the associated counter.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
.../selftests/kvm/x86_64/pmu_counters_test.c | 53 +++++++++++++++++++
1 file changed, 53 insertions(+)
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index cf2941cc7c4c..a90df8b67a19 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -763,10 +763,63 @@ static void guest_test_core_events(void)
}
}
+static void guest_test_perfmon_v2(void)
+{
+ uint64_t i;
+ uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
+ ARCH_PERFMON_EVENTSEL_ENABLE |
+ AMD_ZEN_CORE_CYCLES;
+ bool core_ext = this_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
+ uint64_t sel_msr_base = core_ext ? MSR_F15H_PERF_CTL : MSR_K7_EVNTSEL0;
+ uint64_t cnt_msr_base = core_ext ? MSR_F15H_PERF_CTR : MSR_K7_PERFCTR0;
+ uint64_t msr_step = core_ext ? 2 : 1;
+ uint8_t nr_counters = guest_nr_core_counters();
+ bool perfmon_v2 = this_cpu_has(X86_FEATURE_PERFMON_V2);
+ uint64_t sel_msr;
+ uint64_t cnt_msr;
+
+ if (!perfmon_v2)
+ return;
+
+ for (i = 0; i < nr_counters; i++) {
+ sel_msr = sel_msr_base + msr_step * i;
+ cnt_msr = cnt_msr_base + msr_step * i;
+
+ /* Ensure count stays 0 when global register disables counter. */
+ wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
+ wrmsr(sel_msr, eventsel);
+ wrmsr(cnt_msr, 0);
+ __asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
+ GUEST_ASSERT(!_rdpmc(i));
+
+ /* Ensure counter is >0 when global register enables counter. */
+ wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, BIT_ULL(i));
+ __asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
+ wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
+ GUEST_ASSERT(_rdpmc(i));
+
+ /* Ensure global status register flags a counter overflow. */
+ wrmsr(cnt_msr, -1);
+ wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, 0xff);
+ wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, BIT_ULL(i));
+ __asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
+ wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
+ GUEST_ASSERT(rdmsr(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS) &
+ BIT_ULL(i));
+
+ /* Ensure global status register flag is cleared correctly. */
+ wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, BIT_ULL(i));
+ GUEST_ASSERT(!(rdmsr(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS) &
+ BIT_ULL(i)));
+ }
+}
+
+
static void guest_test_core_counters(void)
{
guest_test_rdwr_core_counters();
guest_test_core_events();
+ guest_test_perfmon_v2();
GUEST_DONE();
}
--
2.46.0.662.g92d0881bb0-goog
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs
2024-09-18 20:53 [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs Colton Lewis
` (5 preceding siblings ...)
2024-09-18 20:53 ` [PATCH v2 6/6] KVM: x86: selftests: Test PerfMonV2 Colton Lewis
@ 2024-10-31 20:09 ` Colton Lewis
2025-01-09 19:47 ` Sean Christopherson
7 siblings, 0 replies; 21+ messages in thread
From: Colton Lewis @ 2024-10-31 20:09 UTC (permalink / raw)
To: Colton Lewis
Cc: kvm, mizhang, ljr.kernel, jmattson, aaronlewis, seanjc, pbonzini,
shuah, linux-kselftest, linux-kernel
Bumping this for Mingwei
Colton Lewis <coltonlewis@google.com> writes:
> Extend pmu_counters_test to AMD CPUs.
> As the AMD PMU is quite different from Intel's, with different events and
> feature sets, this series introduces a new code path to test it,
> specifically focusing on the core counters, including the
> PerfCtrExtCore and PerfMonV2 features. Northbridge counters and cache
> counters exist, but are not as important and can be deferred to a
> later series.
> The first patch is a bug fix that could be submitted separately.
> The series has been tested on both Intel and AMD machines, but I have
> not found an AMD machine old enough to lack PerfCtrExtCore. I have
> made efforts to ensure that no part of the code depends on its
> presence.
> I am aware of similar work in this direction done by Jinrong Liang
> [1]. He told me he is not currently working on it and that I am not
> intruding by making my own submission.
> [1]
> https://lore.kernel.org/kvm/20231121115457.76269-1-cloudliang@tencent.com/
> v2:
> * Test all combinations of VM setup rather than only the maximum
> allowed by hardware
> * Add fixes tag to bug fix in patch 1
> * Refine some names
> v1:
> https://lore.kernel.org/kvm/20240813164244.751597-1-coltonlewis@google.com/
> Colton Lewis (6):
> KVM: x86: selftests: Fix typos in macro variable use
> KVM: x86: selftests: Define AMD PMU CPUID leaves
> KVM: x86: selftests: Set up AMD VM in pmu_counters_test
> KVM: x86: selftests: Test read/write core counters
> KVM: x86: selftests: Test core events
> KVM: x86: selftests: Test PerfMonV2
> .../selftests/kvm/include/x86_64/processor.h | 7 +
> .../selftests/kvm/x86_64/pmu_counters_test.c | 304 ++++++++++++++++--
> 2 files changed, 277 insertions(+), 34 deletions(-)
> base-commit: da3ea35007d0af457a0afc87e84fddaebc4e0b63
> --
> 2.46.0.662.g92d0881bb0-goog
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 2/6] KVM: x86: selftests: Define AMD PMU CPUID leaves
2024-09-18 20:53 ` [PATCH v2 2/6] KVM: x86: selftests: Define AMD PMU CPUID leaves Colton Lewis
@ 2025-01-08 17:58 ` Sean Christopherson
2025-01-20 17:06 ` Colton Lewis
0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2025-01-08 17:58 UTC (permalink / raw)
To: Colton Lewis
Cc: kvm, Mingwei Zhang, Jinrong Liang, Jim Mattson, Aaron Lewis,
Paolo Bonzini, Shuah Khan, linux-kselftest, linux-kernel
The shortlog is misleading. It's not the *leaves* that are being defined, it's
the features and properties.
On Wed, Sep 18, 2024, Colton Lewis wrote:
> This defined the CPUID calls to determine what extensions and
> properties are available.
This is not a coherent changelog.
> AMD reference manual names listed below.
>
> * PerfCtrExtCore (six core counters instead of four)
> * PerfCtrExtNB (four counters for northbridge events)
> * PerfCtrExtL2I (four counters for L2 cache events)
> * PerfMonV2 (support for registers to control multiple
> counters with a single register write)
> * LbrAndPmcFreeze (support for freezing last branch recorded stack on
> performance counter overflow)
> * NumPerfCtrCore (number of core counters)
> * NumPerfCtrNB (number of northbridge counters)
>
> Signed-off-by: Colton Lewis <coltonlewis@google.com>
> ---
> tools/testing/selftests/kvm/include/x86_64/processor.h | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index a0c1440017bb..44ddfc4c1673 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -183,6 +183,9 @@ struct kvm_x86_cpu_feature {
> #define X86_FEATURE_GBPAGES KVM_X86_CPU_FEATURE(0x80000001, 0, EDX, 26)
> #define X86_FEATURE_RDTSCP KVM_X86_CPU_FEATURE(0x80000001, 0, EDX, 27)
> #define X86_FEATURE_LM KVM_X86_CPU_FEATURE(0x80000001, 0, EDX, 29)
> +#define X86_FEATURE_PERF_CTR_EXT_CORE KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 23)
This ordering is "broken", and confused me for quite some time. These new features
are in ECX, but they're landed after features for EDX. To make matters worse,
there's an existing feature, SVM, defined for ECX.
TL;DR: please be more careful about the ordering, don't just drop stuff at the
end.
> +#define X86_FEATURE_PERF_CTR_EXT_NB KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 24)
> +#define X86_FEATURE_PERF_CTR_EXT_L2I KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 28)
To make life easier for developers, I think it makes sense to use the kernel's
names (when the kernel also defines a feature), and adjust the property names to
follow suit.
If there are no objections, I'll apply this as:
--
Author: Colton Lewis <coltonlewis@google.com>
AuthorDate: Wed Sep 18 20:53:15 2024 +0000
Commit: Sean Christopherson <seanjc@google.com>
CommitDate: Wed Jan 8 09:55:57 2025 -0800
KVM: selftests: Add defines for AMD PMU CPUID features and properties
Add macros for AMD's PMU related CPUID features. To make it easier to
cross reference selftest code with KVM/kernel code, use the same macro
names as the kernel for the features.
For reference, the AMD APM defines the features/properties as:
* PerfCtrExtCore (six core counters instead of four)
* PerfCtrExtNB (four counters for northbridge events)
* PerfCtrExtL2I (four counters for L2 cache events)
* PerfMonV2 (support for registers to control multiple
counters with a single register write)
* LbrAndPmcFreeze (support for freezing last branch recorded stack on
performance counter overflow)
* NumPerfCtrCore (number of core counters)
* NumPerfCtrNB (number of northbridge counters)
Signed-off-by: Colton Lewis <coltonlewis@google.com>
Link: https://lore.kernel.org/r/20240918205319.3517569-3-coltonlewis@google.com
[sean: massage changelog, use same names as the kernel]
Signed-off-by: Sean Christopherson <seanjc@google.com>
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 9ec984cf8674..8de7cace1fbf 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -181,6 +181,9 @@ struct kvm_x86_cpu_feature {
* Extended Leafs, a.k.a. AMD defined
*/
#define X86_FEATURE_SVM KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 2)
+#define X86_FEATURE_PERFCTR_CORE KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 23)
+#define X86_FEATURE_PERFCTR_NB KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 24)
+#define X86_FEATURE_PERFCTR_LLC KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 28)
#define X86_FEATURE_NX KVM_X86_CPU_FEATURE(0x80000001, 0, EDX, 20)
#define X86_FEATURE_GBPAGES KVM_X86_CPU_FEATURE(0x80000001, 0, EDX, 26)
#define X86_FEATURE_RDTSCP KVM_X86_CPU_FEATURE(0x80000001, 0, EDX, 27)
@@ -197,6 +200,8 @@ struct kvm_x86_cpu_feature {
#define X86_FEATURE_VGIF KVM_X86_CPU_FEATURE(0x8000000A, 0, EDX, 16)
#define X86_FEATURE_SEV KVM_X86_CPU_FEATURE(0x8000001F, 0, EAX, 1)
#define X86_FEATURE_SEV_ES KVM_X86_CPU_FEATURE(0x8000001F, 0, EAX, 3)
+#define X86_FEATURE_PERFMON_V2 KVM_X86_CPU_FEATURE(0x80000022, 0, EAX, 0)
+#define X86_FEATURE_LBR_PMC_FREEZE KVM_X86_CPU_FEATURE(0x80000022, 0, EAX, 2)
/*
* KVM defined paravirt features.
@@ -283,6 +288,8 @@ struct kvm_x86_cpu_property {
#define X86_PROPERTY_GUEST_MAX_PHY_ADDR KVM_X86_CPU_PROPERTY(0x80000008, 0, EAX, 16, 23)
#define X86_PROPERTY_SEV_C_BIT KVM_X86_CPU_PROPERTY(0x8000001F, 0, EBX, 0, 5)
#define X86_PROPERTY_PHYS_ADDR_REDUCTION KVM_X86_CPU_PROPERTY(0x8000001F, 0, EBX, 6, 11)
+#define X86_PROPERTY_NR_PERFCTR_CORE KVM_X86_CPU_PROPERTY(0x80000022, 0, EBX, 0, 3)
+#define X86_PROPERTY_NR_PERFCTR_NB KVM_X86_CPU_PROPERTY(0x80000022, 0, EBX, 10, 15)
#define X86_PROPERTY_MAX_CENTAUR_LEAF KVM_X86_CPU_PROPERTY(0xC0000000, 0, EAX, 0, 31)
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2 3/6] KVM: x86: selftests: Set up AMD VM in pmu_counters_test
2024-09-18 20:53 ` [PATCH v2 3/6] KVM: x86: selftests: Set up AMD VM in pmu_counters_test Colton Lewis
@ 2025-01-08 18:35 ` Sean Christopherson
2025-01-20 18:03 ` Colton Lewis
0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2025-01-08 18:35 UTC (permalink / raw)
To: Colton Lewis
Cc: kvm, Mingwei Zhang, Jinrong Liang, Jim Mattson, Aaron Lewis,
Paolo Bonzini, Shuah Khan, linux-kselftest, linux-kernel
On Wed, Sep 18, 2024, Colton Lewis wrote:
> Branch in main() depending on whether the CPU is Intel or AMD. The two
> are subject to vastly different requirements because the AMD PMU lacks
> many properties defined by the Intel PMU, including the entire CPUID
> 0xA leaf where Intel stores all the PMU properties. AMD also lacks
> Intel's consistent notion of PMU versions; every feature is a separate
> flag, and the features don't correspond to Intel's.
>
> Set up a VM for testing core AMD counters and ensure proper CPUID
> features are set.
>
> Signed-off-by: Colton Lewis <coltonlewis@google.com>
> ---
> .../selftests/kvm/x86_64/pmu_counters_test.c | 104 ++++++++++++++----
> 1 file changed, 83 insertions(+), 21 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> index 0e305e43a93b..5b240585edc5 100644
> --- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> @@ -30,10 +30,21 @@
> #define NUM_INSNS_RETIRED (NUM_LOOPS * NUM_INSNS_PER_LOOP + NUM_EXTRA_INSNS)
>
>
> +/*
> + * Limit testing to MSRs that are actually defined by Intel (in the SDM). MSRs
> + * that aren't defined counter MSRs *probably* don't exist, but there's no
> + * guarantee that currently undefined MSR indices won't be used for something
> + * other than PMCs in the future.
> + */
> +#define MAX_NR_GP_COUNTERS 8
> +#define MAX_NR_FIXED_COUNTERS 3
> +#define AMD_NR_CORE_COUNTERS 4
> +#define AMD_NR_CORE_EXT_COUNTERS 6
> +
> static uint8_t kvm_pmu_version;
> static bool kvm_has_perf_caps;
>
> -static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
> +static struct kvm_vm *intel_pmu_vm_create(struct kvm_vcpu **vcpu,
> void *guest_code,
When renaming things, please fixup the alignment as needed. Yes, it's more
churn, but one-time pain is preferable to living indefinitely with funky formatting.
I also don't like renaming just one symbol. E.g. the above MAX_NR_GP_COUNTERS
and MAX_NR_FIXED_COUNTERS #defines are Intel specific, but that's not at all
clear from the code. Ditto for guest_rd_wr_counters() vs.
guest_test_rdwr_core_counters().
Given how little code is actually shared between Intel and AMD, I think it makes
sense to have the bulk of the code live in separate .c files. Since
tools/testing/selftests/kvm/lib/x86/pmu.c is already a thing, the best option is
probably to rename pmu_counters_test.c to intel_pmu_counters_test.c, and then
extract the common bits to lib/x86/pmu.c (or include/x86/pmu.h as appropriate).
> uint8_t pmu_version,
> uint64_t perf_capabilities)
> +static void test_core_counters(void)
> +{
> + uint8_t nr_counters = nr_core_counters();
> + bool core_ext = kvm_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
> + bool perfmon_v2 = kvm_cpu_has(X86_FEATURE_PERFMON_V2);
> + struct kvm_vcpu *vcpu;
> + struct kvm_vm *vm;
> +
> + for (uint8_t ce = 0; ce <= core_ext; ce++) {
Kernel style is to not declare variables inside for-loops.
> + for (uint8_t pm = 0; pm <= perfmon_v2; pm++) {
Iterating over booleans is decidedly odd, the indentation levels are painful and
will only get worse as more features are added, and the "ce" and "pm" variables
aren't all that intuitive. More below.
> + for (uint8_t nc = 0; nc <= nr_counters; nc++) {
I also find "nc" to be unintuitive. Either use a fully descriptive name, or
make it obvious that the variable is an iterator. E.g. either
uint8_t max_nr_counters = nr_core_counters();
...
for (nr_counters = 0; nr_counters < max_nr_counters; nr_counters++) {
or
for (j = 0; j < nr_counters; j++) {
'j' is obviously not descriptive, but when reading the usage, it's more obvious
that it's a loop iterator (if you choose the latter).
> + vm = vm_create_with_one_vcpu(&vcpu, guest_test_core_counters);
> +
> + if (nc)
Is '0' not a legal number of counters?
> + vcpu_set_cpuid_property(
Google3! (Never, ever wrap immediately after the opening parenthesis).
> + vcpu, X86_PROPERTY_NUM_PERF_CTR_CORE, nc);
> + if (ce)
> + vcpu_set_cpuid_feature(
> + vcpu, X86_FEATURE_PERF_CTR_EXT_CORE);
This likely doesn't do what you want. By default, vm_arch_vcpu_add() initializes
CPUID to KVM's supported CPUID. So only _setting_ the feature means that
the test is likely only ever running with the full set of supported features.
Jumping back to my complaints with the for-loops, if the features to iterate on
are collected in an array, then the test can generate a mask of all possible
combinations and iterate over that (plus the array itself). That keeps the
indentation bounded and eliminates the copy+paste needed to add a new feature.
The only downside is that the test is limited to 64 features, but we'll run into
run time issues long before that limit is reached.
	const struct kvm_x86_cpu_feature pmu_features[] = {
		X86_FEATURE_PERF_CTR_EXT_CORE,
		X86_FEATURE_PERFMON_V2,
	};
	const u64 pmu_features_mask = BIT_ULL(ARRAY_SIZE(pmu_features)) - 1;

	for (mask = 0; mask <= pmu_features_mask; mask++) {
		for (nr_counters = 0; nr_counters < max_nr_counters; nr_counters++) {
			vm = vm_create_with_one_vcpu(&vcpu, guest_test_core_counters);

			vcpu_set_cpuid_property(vcpu, X86_PROPERTY_NUM_PERF_CTR_CORE,
						nr_counters);

			/* Comment goes here */
			for (i = 0; i < ARRAY_SIZE(pmu_features); i++)
				vcpu_set_or_clear_cpuid_feature(vcpu, pmu_features[i],
								mask & BIT_ULL(i));
			...
		}
> + if (pm)
> + vcpu_set_cpuid_feature(
> + vcpu, X86_FEATURE_PERFMON_V2);
> +
> + pr_info("Testing core counters: CoreExt = %u, PerfMonV2 = %u, NumCounters = %u\n",
> + ce, pm, nc);
> + run_vcpu(vcpu);
> +
> + kvm_vm_free(vm);
> + }
> + }
> + }
> +}
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 4/6] KVM: x86: selftests: Test read/write core counters
2024-09-18 20:53 ` [PATCH v2 4/6] KVM: x86: selftests: Test read/write core counters Colton Lewis
@ 2025-01-08 18:54 ` Sean Christopherson
2025-01-20 19:54 ` Colton Lewis
0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2025-01-08 18:54 UTC (permalink / raw)
To: Colton Lewis
Cc: kvm, Mingwei Zhang, Jinrong Liang, Jim Mattson, Aaron Lewis,
Paolo Bonzini, Shuah Khan, linux-kselftest, linux-kernel
On Wed, Sep 18, 2024, Colton Lewis wrote:
> Run a basic test to ensure we can write an arbitrary value to the core
> counters and read it back.
>
> Signed-off-by: Colton Lewis <coltonlewis@google.com>
> ---
> .../selftests/kvm/x86_64/pmu_counters_test.c | 54 +++++++++++++++++++
> 1 file changed, 54 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> index 5b240585edc5..79ca7d608e00 100644
> --- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> @@ -641,11 +641,65 @@ static uint8_t nr_core_counters(void)
> return AMD_NR_CORE_EXT_COUNTERS;
>
> return AMD_NR_CORE_COUNTERS;
> +}
> +
> +static uint8_t guest_nr_core_counters(void)
> +{
> + uint8_t nr_counters = this_cpu_property(X86_PROPERTY_NUM_PERF_CTR_CORE);
> + bool core_ext = this_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
For both this and nr_core_counters(), there's no need to read PERF_CTR_EXT_CORE
if nr_counters is non-zero, and then no need to capture it in a local variable.
> +
> + if (nr_counters != 0)
> + return nr_counters;
> +
> + if (core_ext)
> + return AMD_NR_CORE_EXT_COUNTERS;
> +
> + return AMD_NR_CORE_COUNTERS;
This is *painfully* similar to nr_core_counters(). It actually took me almost
a minute of staring to see the difference. One option would be to add a helper
to dedup the if-statements, but while somewhat gross, I actually think a macro
is the way to go.
#define nr_core_counters(scope) \
({ \
	uint8_t nr_counters = scope##_cpu_property(X86_PROPERTY_NR_PERFCTR_CORE); \
\
	if (!nr_counters) { \
		if (scope##_cpu_has(X86_FEATURE_PERFCTR_CORE)) \
			nr_counters = AMD_NR_CORE_EXT_COUNTERS; \
		else \
			nr_counters = AMD_NR_CORE_COUNTERS; \
	} \
	nr_counters; \
})

static uint8_t kvm_nr_core_counters(void)
{
	return nr_core_counters(kvm);
}

static uint8_t guest_nr_core_counters(void)
{
	return nr_core_counters(this);
}
> +
Unnecessary newline.
> +}
>
> +static void guest_test_rdwr_core_counters(void)
> +{
> + bool core_ext = this_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
> + uint8_t nr_counters = guest_nr_core_counters();
> + uint8_t i;
> + uint32_t esel_msr_base = core_ext ? MSR_F15H_PERF_CTL : MSR_K7_EVNTSEL0;
Please don't concoct new abbreviations. "esel" isn't used anywhere in KVM, and
AFAICT it's not used in perf either.
I would also prefer to have consistent naming between the Intel and AMD tests
(the Intel test uses base_<name>_msr).
base_eventsel_msr is all of four characters more.
> + uint32_t cnt_msr_base = core_ext ? MSR_F15H_PERF_CTR : MSR_K7_PERFCTR0;
For better or worse, the Intel version uses "base_pmc_msr". I see no reason to
diverge from that.
> + uint32_t msr_step = core_ext ? 2 : 1;
> +
> + for (i = 0; i < AMD_NR_CORE_EXT_COUNTERS; i++) {
> + uint64_t test_val = 0xffff;
> + uint32_t esel_msr = esel_msr_base + msr_step * i;
> + uint32_t cnt_msr = cnt_msr_base + msr_step * i;
And then
uint32_t eventsel_msr = ...;
uint32_t pmc_msr = ...;
> + bool expect_gp = !(i < nr_counters);
Uh, isn't that just a weird way of writing:
bool expect_gp = i >= nr_counters;
> + uint8_t vector;
> + uint64_t val;
> +
> + /* Test event selection register. */
This is pretty obvious if the MSR is named eventsel_msr.
> + vector = wrmsr_safe(esel_msr, test_val);
> + GUEST_ASSERT_PMC_MSR_ACCESS(WRMSR, esel_msr, expect_gp, vector);
> +
> + vector = rdmsr_safe(esel_msr, &val);
> + GUEST_ASSERT_PMC_MSR_ACCESS(RDMSR, esel_msr, expect_gp, vector);
> +
> + if (!expect_gp)
> + GUEST_ASSERT_PMC_VALUE(RDMSR, esel_msr, val, test_val);
> +
> + /* Test counter register. */
Same thing here. If there is novel information/behavior, then by all means add
a comment.
> + vector = wrmsr_safe(cnt_msr, test_val);
> + GUEST_ASSERT_PMC_MSR_ACCESS(WRMSR, cnt_msr, expect_gp, vector);
> +
> + vector = rdmsr_safe(cnt_msr, &val);
> + GUEST_ASSERT_PMC_MSR_ACCESS(RDMSR, cnt_msr, expect_gp, vector);
> +
> + if (!expect_gp)
> + GUEST_ASSERT_PMC_VALUE(RDMSR, cnt_msr, val, test_val);
> + }
> }
>
> static void guest_test_core_counters(void)
> {
> + guest_test_rdwr_core_counters();
> GUEST_DONE();
> }
>
> --
> 2.46.0.662.g92d0881bb0-goog
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 5/6] KVM: x86: selftests: Test core events
2024-09-18 20:53 ` [PATCH v2 5/6] KVM: x86: selftests: Test core events Colton Lewis
@ 2025-01-08 19:31 ` Sean Christopherson
2025-01-20 20:01 ` Colton Lewis
0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2025-01-08 19:31 UTC (permalink / raw)
To: Colton Lewis
Cc: kvm, Mingwei Zhang, Jinrong Liang, Jim Mattson, Aaron Lewis,
Paolo Bonzini, Shuah Khan, linux-kselftest, linux-kernel
On Wed, Sep 18, 2024, Colton Lewis wrote:
> Test events on core counters by iterating through every combination of
> events in amd_pmu_zen_events with every core counter.
>
> For each combination, calculate the appropriate register addresses for
> the event selection/control register and the counter register. The
> base addresses and layout schemes change depending on whether we have
> the CoreExt feature.
>
> To do the testing, reuse GUEST_TEST_EVENT to run a standard known
> workload. Decouple it from guest_assert_event_count (now
> guest_assert_intel_event_count) to generalize to AMD.
>
> Then assert the most specific detail that can be reasonably known
> about the counter result. Exact count is defined and known for some
> events and for other events merely asserted to be nonzero.
>
> Note on exact counts: AMD counts one more branch than Intel for the
> same workload. Though I can't confirm a reason, the only thing it
> could be is the boundary of the loop instruction being counted
> differently. Presumably, when the counter reaches 0 and execution
> continues to the next instruction, AMD counts this as a branch and
> Intel doesn't
IIRC, VMRUN is counted as a branch instruction for the guest. Assuming my memory
is correct, that means this test is going to be flaky as an asynchronous exit,
e.g. due to a host IRQ, during the measurement loop will inflate the count. I'm
not entirely sure what to do about that :-(
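One rough sketch of a mitigation, if exact counts can't be guaranteed, would
be to bound the count rather than assert exact equality (the slack value here
is a guess, and the helper name is hypothetical):

/*
 * Tolerate host-induced extra branches (e.g. from IRQs arriving during
 * the measurement loop) by asserting a bounded range instead of
 * exact equality.
 */
#define HOST_BRANCH_SLACK	4

static void guest_assert_branch_count(uint64_t count)
{
	GUEST_ASSERT(count >= NUM_BRANCH_INSNS_RETIRED_AMD);
	GUEST_ASSERT(count <= NUM_BRANCH_INSNS_RETIRED_AMD + HOST_BRANCH_SLACK);
}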
> +static void __guest_test_core_event(uint8_t event_idx, uint8_t counter_idx)
> +{
> + /* One fortunate area of actual compatibility! This register
/*
 * This is the proper format for multi-line comments. We are not the
 * crazy net/ folks.
 */
> + * layout is the same for both AMD and Intel.
It's not, actually. There are differences in the layout, it just so happens that
the differences don't throw a wrench in things.
The comments in tools/testing/selftests/kvm/include/x86_64/pmu.h document this
fairly well, I don't see any reason to have a comment here.
> + */
> + uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
> + ARCH_PERFMON_EVENTSEL_ENABLE |
> + amd_pmu_zen_events[event_idx];
Align the indentation.
	uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
			    ARCH_PERFMON_EVENTSEL_ENABLE |
			    amd_pmu_zen_events[event_idx];
> + bool core_ext = this_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
> + uint64_t esel_msr_base = core_ext ? MSR_F15H_PERF_CTL : MSR_K7_EVNTSEL0;
> + uint64_t cnt_msr_base = core_ext ? MSR_F15H_PERF_CTR : MSR_K7_PERFCTR0;
> + uint64_t msr_step = core_ext ? 2 : 1;
> + uint64_t esel_msr = esel_msr_base + msr_step * counter_idx;
> + uint64_t cnt_msr = cnt_msr_base + msr_step * counter_idx;
This pattern of code is copy+pasted in three functions. Please add macros and/or
helpers to consolidate things. These should also be uint32_t, not 64.
It's a bit evil, but one approach would be to add a macro to iterate over all
PMU counters. Eating the VM-Exit for the CPUID to get X86_FEATURE_PERF_CTR_EXT_CORE
each time is unfortunate, but I doubt/hope it's not problematic in practice. If
the cost is meaningful, we could figure out a way to cache the info, e.g. something
awful like this might work:
	/* Note, this relies on guest state being recreated between each test. */
	static int has_perfctr_core = -1;

	if (has_perfctr_core == -1)
		has_perfctr_core = this_cpu_has(X86_FEATURE_PERFCTR_CORE);

	if (has_perfctr_core) {

static bool get_pmu_counter_msrs(int idx, uint32_t *eventsel, uint32_t *pmc)
{
	if (this_cpu_has(X86_FEATURE_PERFCTR_CORE)) {
		*eventsel = MSR_F15H_PERF_CTL + idx * 2;
		*pmc = MSR_F15H_PERF_CTR + idx * 2;
	} else {
		*eventsel = MSR_K7_EVNTSEL0 + idx;
		*pmc = MSR_K7_PERFCTR0 + idx;
	}
	return true;
}

#define for_each_pmu_counter(_i, _nr_counters, _eventsel, _pmc) \
	for (_i = 0; _i < _nr_counters; _i++) \
		if (get_pmu_counter_msrs(_i, &_eventsel, &_pmc)) \
static void guest_test_core_events(void)
{
	uint8_t nr_counters = guest_nr_core_counters();
	uint32_t eventsel_msr, pmc_msr;
	int i, j;

	for (i = 0; i < NR_AMD_ZEN_EVENTS; i++) {
		for_each_pmu_counter(j, nr_counters, eventsel_msr, pmc_msr) {
			uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
					    ARCH_PERFMON_EVENTSEL_ENABLE |
					    amd_pmu_zen_events[i];

			GUEST_TEST_EVENT(pmc_msr, eventsel_msr, eventsel, "");
			guest_assert_amd_event_count(i, j, pmc_msr);

			if (!is_forced_emulation_enabled)
				continue;

			GUEST_TEST_EVENT(pmc_msr, eventsel_msr, eventsel, KVM_FEP);
			guest_assert_amd_event_count(i, j, pmc_msr);
		}
	}
}
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 6/6] KVM: x86: selftests: Test PerfMonV2
2024-09-18 20:53 ` [PATCH v2 6/6] KVM: x86: selftests: Test PerfMonV2 Colton Lewis
@ 2025-01-08 19:42 ` Sean Christopherson
2025-01-20 20:07 ` Colton Lewis
0 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2025-01-08 19:42 UTC (permalink / raw)
To: Colton Lewis
Cc: kvm, Mingwei Zhang, Jinrong Liang, Jim Mattson, Aaron Lewis,
Paolo Bonzini, Shuah Khan, linux-kselftest, linux-kernel
On Wed, Sep 18, 2024, Colton Lewis wrote:
> Test PerfMonV2, which defines global registers to enable multiple
> performance counters with a single MSR write, in its own function.
>
> If the feature is available, ensure the global control register has
> the ability to start and stop the performance counters and the global
> status register correctly flags an overflow by the associated counter.
>
> Signed-off-by: Colton Lewis <coltonlewis@google.com>
> ---
> .../selftests/kvm/x86_64/pmu_counters_test.c | 53 +++++++++++++++++++
> 1 file changed, 53 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> index cf2941cc7c4c..a90df8b67a19 100644
> --- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> @@ -763,10 +763,63 @@ static void guest_test_core_events(void)
> }
> }
>
> +static void guest_test_perfmon_v2(void)
> +{
> + uint64_t i;
> + uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
> + ARCH_PERFMON_EVENTSEL_ENABLE |
> + AMD_ZEN_CORE_CYCLES;
Hmm. I like the extra coverage, but I think the guts of this test belong in
common code, because the core logic is the same across Intel and AMD (I think),
only the MSRs are different.
Maybe a library helper that takes in the MSRs as parameters? Not sure.
I suspect it'll take some back and forth to figure out how best to validate these
more "advanced" behaviors, so maybe skip this patch for the next version? I.e.
land basic AMD coverage and then we can figure out how to test global control and
status.
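A rough sketch of what such a shared helper might look like (the struct and
function names are hypothetical, adapted from the guest code in this patch):

struct pmu_global_msrs {
	uint32_t ctl;
	uint32_t status;
	uint32_t status_clr;
};

static void guest_test_global_ctl(const struct pmu_global_msrs *global,
				  uint32_t eventsel_msr, uint32_t pmc_msr,
				  uint64_t eventsel, int pmc_idx)
{
	/* The counter must stay 0 while the global enable bit is clear. */
	wrmsr(global->ctl, 0);
	wrmsr(eventsel_msr, eventsel);
	wrmsr(pmc_msr, 0);
	__asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
	GUEST_ASSERT(!_rdpmc(pmc_idx));

	/* The counter must advance once the global enable bit is set. */
	wrmsr(global->ctl, BIT_ULL(pmc_idx));
	__asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
	wrmsr(global->ctl, 0);
	GUEST_ASSERT(_rdpmc(pmc_idx));
}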
> + bool core_ext = this_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
> + uint64_t sel_msr_base = core_ext ? MSR_F15H_PERF_CTL : MSR_K7_EVNTSEL0;
> + uint64_t cnt_msr_base = core_ext ? MSR_F15H_PERF_CTR : MSR_K7_PERFCTR0;
> + uint64_t msr_step = core_ext ? 2 : 1;
> + uint8_t nr_counters = guest_nr_core_counters();
> + bool perfmon_v2 = this_cpu_has(X86_FEATURE_PERFMON_V2);
Zero reason to capture this in a local variable.
> + uint64_t sel_msr;
> + uint64_t cnt_msr;
> +
> + if (!perfmon_v2)
> + return;
> +
> + for (i = 0; i < nr_counters; i++) {
> + sel_msr = sel_msr_base + msr_step * i;
> + cnt_msr = cnt_msr_base + msr_step * i;
> +
> + /* Ensure count stays 0 when global register disables counter. */
> + wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
> + wrmsr(sel_msr, eventsel);
> + wrmsr(cnt_msr, 0);
> + __asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
> + GUEST_ASSERT(!_rdpmc(i));
> +
> + /* Ensure counter is >0 when global register enables counter. */
> + wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, BIT_ULL(i));
> + __asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
> + wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
> + GUEST_ASSERT(_rdpmc(i));
> +
> + /* Ensure global status register flags a counter overflow. */
> + wrmsr(cnt_msr, -1);
> + wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, 0xff);
> + wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, BIT_ULL(i));
> + __asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
> + wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
> + GUEST_ASSERT(rdmsr(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS) &
> + BIT_ULL(i));
> +
> + /* Ensure global status register flag is cleared correctly. */
> + wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, BIT_ULL(i));
> + GUEST_ASSERT(!(rdmsr(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS) &
> + BIT_ULL(i)));
> + }
> +}
> +
> +
> static void guest_test_core_counters(void)
> {
> guest_test_rdwr_core_counters();
> guest_test_core_events();
> + guest_test_perfmon_v2();
> GUEST_DONE();
> }
>
> --
> 2.46.0.662.g92d0881bb0-goog
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs
2024-09-18 20:53 [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs Colton Lewis
` (6 preceding siblings ...)
2024-10-31 20:09 ` [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs Colton Lewis
@ 2025-01-09 19:47 ` Sean Christopherson
2025-01-20 20:11 ` Colton Lewis
7 siblings, 1 reply; 21+ messages in thread
From: Sean Christopherson @ 2025-01-09 19:47 UTC (permalink / raw)
To: Sean Christopherson, kvm, Colton Lewis
Cc: Mingwei Zhang, Jinrong Liang, Jim Mattson, Aaron Lewis,
Paolo Bonzini, Shuah Khan, linux-kselftest, linux-kernel
On Wed, 18 Sep 2024 20:53:13 +0000, Colton Lewis wrote:
> Extend pmu_counters_test to AMD CPUs.
>
> As the AMD PMU is quite different from Intel with different events and
> feature sets, this series introduces a new code path to test it,
> specifically focusing on the core counters including the
> PerfCtrExtCore and PerfMonV2 features. Northbridge counters and cache
> counters exist, but are not as important and can be deferred to a
> later series.
>
> [...]
Applied 1 and a modified version of 2 to kvm-x86 selftests, thanks!
[1/6] KVM: x86: selftests: Fix typos in macro variable use
https://github.com/kvm-x86/linux/commit/97d0d1655ea8
[2/6] KVM: x86: selftests: Define AMD PMU CPUID leaves
https://github.com/kvm-x86/linux/commit/c76a92382805
--
https://github.com/kvm-x86/linux/tree/next
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 2/6] KVM: x86: selftests: Define AMD PMU CPUID leaves
2025-01-08 17:58 ` Sean Christopherson
@ 2025-01-20 17:06 ` Colton Lewis
0 siblings, 0 replies; 21+ messages in thread
From: Colton Lewis @ 2025-01-20 17:06 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, mizhang, ljr.kernel, jmattson, aaronlewis, pbonzini, shuah,
linux-kselftest, linux-kernel
Noted about the changelog and the ordering. No objections to how you
want to apply it.
Sean Christopherson <seanjc@google.com> writes:
> The shortlog is misleading. It's not the *leaves* that are being defined,
> it's the features and properties.
> On Wed, Sep 18, 2024, Colton Lewis wrote:
>> This defined the CPUID calls to determine what extensions and
>> properties are available.
> This is not a coherent changelog.
>> AMD reference manual names listed below.
>> * PerfCtrExtCore (six core counters instead of four)
>> * PerfCtrExtNB (four counters for northbridge events)
>> * PerfCtrExtL2I (four counters for L2 cache events)
>> * PerfMonV2 (support for registers to control multiple
>> counters with a single register write)
>> * LbrAndPmcFreeze (support for freezing last branch recorded stack on
>> performance counter overflow)
>> * NumPerfCtrCore (number of core counters)
>> * NumPerfCtrNB (number of northbridge counters)
>> Signed-off-by: Colton Lewis <coltonlewis@google.com>
>> ---
>> tools/testing/selftests/kvm/include/x86_64/processor.h | 7 +++++++
>> 1 file changed, 7 insertions(+)
>> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
>> index a0c1440017bb..44ddfc4c1673 100644
>> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
>> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
>> @@ -183,6 +183,9 @@ struct kvm_x86_cpu_feature {
>> #define X86_FEATURE_GBPAGES KVM_X86_CPU_FEATURE(0x80000001, 0, EDX, 26)
>> #define X86_FEATURE_RDTSCP KVM_X86_CPU_FEATURE(0x80000001, 0, EDX, 27)
>> #define X86_FEATURE_LM KVM_X86_CPU_FEATURE(0x80000001, 0, EDX, 29)
>> +#define X86_FEATURE_PERF_CTR_EXT_CORE KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 23)
> This ordering is "broken", and confused me for quite some time. These new
> features are in ECX, but they landed after features for EDX. To make matters
> worse, there's an existing feature, SVM, defined for ECX.
> TL;DR: please be more careful about the ordering, don't just drop stuff at
> the end.
>> +#define X86_FEATURE_PERF_CTR_EXT_NB KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 24)
>> +#define X86_FEATURE_PERF_CTR_EXT_L2I KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 28)
> To make life easier for developers, I think it makes sense to use the
> kernel's names (when the kernel also defines a feature), and adjust the
> property names to follow suit.
> If there are no objections, I'll apply this as:
> --
> Author: Colton Lewis <coltonlewis@google.com>
> AuthorDate: Wed Sep 18 20:53:15 2024 +0000
> Commit: Sean Christopherson <seanjc@google.com>
> CommitDate: Wed Jan 8 09:55:57 2025 -0800
> KVM: selftests: Add defines for AMD PMU CPUID features and properties
> Add macros for AMD's PMU related CPUID features. To make it easier to
> cross reference selftest code with KVM/kernel code, use the same macro
> names as the kernel for the features.
> For reference, the AMD APM defines the features/properties as:
> * PerfCtrExtCore (six core counters instead of four)
> * PerfCtrExtNB (four counters for northbridge events)
> * PerfCtrExtL2I (four counters for L2 cache events)
> * PerfMonV2 (support for registers to control multiple
> counters with a single register write)
> * LbrAndPmcFreeze (support for freezing last branch recorded stack on
> performance counter overflow)
> * NumPerfCtrCore (number of core counters)
> * NumPerfCtrNB (number of northbridge counters)
> Signed-off-by: Colton Lewis <coltonlewis@google.com>
> Link: https://lore.kernel.org/r/20240918205319.3517569-3-coltonlewis@google.com
> [sean: massage changelog, use same names as the kernel]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
> index 9ec984cf8674..8de7cace1fbf 100644
> --- a/tools/testing/selftests/kvm/include/x86/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86/processor.h
> @@ -181,6 +181,9 @@ struct kvm_x86_cpu_feature {
>   * Extended Leafs, a.k.a. AMD defined
>   */
>  #define X86_FEATURE_SVM			KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 2)
> +#define X86_FEATURE_PERFCTR_CORE	KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 23)
> +#define X86_FEATURE_PERFCTR_NB		KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 24)
> +#define X86_FEATURE_PERFCTR_LLC		KVM_X86_CPU_FEATURE(0x80000001, 0, ECX, 28)
>  #define X86_FEATURE_NX			KVM_X86_CPU_FEATURE(0x80000001, 0, EDX, 20)
>  #define X86_FEATURE_GBPAGES		KVM_X86_CPU_FEATURE(0x80000001, 0, EDX, 26)
>  #define X86_FEATURE_RDTSCP		KVM_X86_CPU_FEATURE(0x80000001, 0, EDX, 27)
> @@ -197,6 +200,8 @@ struct kvm_x86_cpu_feature {
>  #define X86_FEATURE_VGIF		KVM_X86_CPU_FEATURE(0x8000000A, 0, EDX, 16)
>  #define X86_FEATURE_SEV			KVM_X86_CPU_FEATURE(0x8000001F, 0, EAX, 1)
>  #define X86_FEATURE_SEV_ES		KVM_X86_CPU_FEATURE(0x8000001F, 0, EAX, 3)
> +#define X86_FEATURE_PERFMON_V2		KVM_X86_CPU_FEATURE(0x80000022, 0, EAX, 0)
> +#define X86_FEATURE_LBR_PMC_FREEZE	KVM_X86_CPU_FEATURE(0x80000022, 0, EAX, 2)
>  /*
>   * KVM defined paravirt features.
> @@ -283,6 +288,8 @@ struct kvm_x86_cpu_property {
>  #define X86_PROPERTY_GUEST_MAX_PHY_ADDR	KVM_X86_CPU_PROPERTY(0x80000008, 0, EAX, 16, 23)
>  #define X86_PROPERTY_SEV_C_BIT		KVM_X86_CPU_PROPERTY(0x8000001F, 0, EBX, 0, 5)
>  #define X86_PROPERTY_PHYS_ADDR_REDUCTION	KVM_X86_CPU_PROPERTY(0x8000001F, 0, EBX, 6, 11)
> +#define X86_PROPERTY_NR_PERFCTR_CORE	KVM_X86_CPU_PROPERTY(0x80000022, 0, EBX, 0, 3)
> +#define X86_PROPERTY_NR_PERFCTR_NB	KVM_X86_CPU_PROPERTY(0x80000022, 0, EBX, 10, 15)
>  #define X86_PROPERTY_MAX_CENTAUR_LEAF	KVM_X86_CPU_PROPERTY(0xC0000000, 0, EAX, 0, 31)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 3/6] KVM: x86: selftests: Set up AMD VM in pmu_counters_test
2025-01-08 18:35 ` Sean Christopherson
@ 2025-01-20 18:03 ` Colton Lewis
2025-01-21 20:56 ` Sean Christopherson
0 siblings, 1 reply; 21+ messages in thread
From: Colton Lewis @ 2025-01-20 18:03 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, mizhang, ljr.kernel, jmattson, aaronlewis, pbonzini, shuah,
linux-kselftest, linux-kernel
Hey Sean,
Thanks for the review.
Sean Christopherson <seanjc@google.com> writes:
> On Wed, Sep 18, 2024, Colton Lewis wrote:
>> Branch in main() depending on whether the CPU is Intel or AMD. They are
>> subject to vastly different requirements because the AMD PMU lacks
>> many properties defined by the Intel PMU, including the entire CPUID
>> 0xa function where Intel stores all the PMU properties. AMD also lacks
>> any consistent notion of PMU versions as Intel has. Every
>> feature is a separate flag, and they aren't the same features as Intel's.
>> Set up a VM for testing core AMD counters and ensure proper CPUID
>> features are set.
>> Signed-off-by: Colton Lewis <coltonlewis@google.com>
>> ---
>> .../selftests/kvm/x86_64/pmu_counters_test.c | 104 ++++++++++++++----
>> 1 file changed, 83 insertions(+), 21 deletions(-)
>> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
>> index 0e305e43a93b..5b240585edc5 100644
>> --- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
>> +++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
>> @@ -30,10 +30,21 @@
>> #define NUM_INSNS_RETIRED (NUM_LOOPS * NUM_INSNS_PER_LOOP + NUM_EXTRA_INSNS)
>> +/*
>> + * Limit testing to MSRs that are actually defined by Intel (in the SDM). MSRs
>> + * that aren't defined counter MSRs *probably* don't exist, but there's no
>> + * guarantee that currently undefined MSR indices won't be used for something
>> + * other than PMCs in the future.
>> + */
>> +#define MAX_NR_GP_COUNTERS 8
>> +#define MAX_NR_FIXED_COUNTERS 3
>> +#define AMD_NR_CORE_COUNTERS 4
>> +#define AMD_NR_CORE_EXT_COUNTERS 6
>> +
>> static uint8_t kvm_pmu_version;
>> static bool kvm_has_perf_caps;
>> -static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
>> +static struct kvm_vm *intel_pmu_vm_create(struct kvm_vcpu **vcpu,
>> void *guest_code,
> When renaming things, please fixup the alignment as needed. Yes, it's more
> churn, but one-time pain is preferable to living indefinitely with funky
> formatting.
Understood
> I also don't like renaming just one symbol. E.g. the above MAX_NR_GP_COUNTERS
> and MAX_NR_FIXED_COUNTERS #defines are Intel specific, but that's not at all
> clear from the code. Ditto for guest_rd_wr_counters() vs.
> guest_test_rdwr_core_counters().
> Given how little code is actually shared between Intel and AMD, I think it
> makes sense to have the bulk of the code live in separate .c files. Since
> tools/testing/selftests/kvm/lib/x86/pmu.c is already a thing, the best option
> is probably to rename pmu_counters_test.c to intel_pmu_counters_test.c, and
> then extract the common bits to lib/x86/pmu.c (or include/x86/pmu.h as
> appropriate).
Yeah I see what you mean about making processor-specific functions more
obvious and think separating to different files would help a lot with
that.
>> uint8_t pmu_version,
>> uint64_t perf_capabilities)
>> +static void test_core_counters(void)
>> +{
>> + uint8_t nr_counters = nr_core_counters();
>> + bool core_ext = kvm_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
>> + bool perfmon_v2 = kvm_cpu_has(X86_FEATURE_PERFMON_V2);
>> + struct kvm_vcpu *vcpu;
>> + struct kvm_vm *vm;
>> +
>> + for (uint8_t ce = 0; ce <= core_ext; ce++) {
> Kernel style is to not declare variables inside for-loops.
I ran it through checkpatch and it didn't complain.
>> + for (uint8_t pm = 0; pm <= perfmon_v2; pm++) {
> Iterating over booleans is decidedly odd, the indentation levels are painful
> and will only get worse as more features are added, and the "ce" and "pm"
> variables aren't all that intuitive. More below.
>> + for (uint8_t nc = 0; nc <= nr_counters; nc++) {
> I also find "nc" to be unintuitive. Either use a fully descriptive name,
> or
> make it obvious that the variables is an iterator. E.g. either
> uint8_t max_nr_counters = nr_core_counters();
> ...
> for (nr_counters = 0; nr_counters < max_nr_counters; nr_counters++) {
> or
> for (j = 0; j < nr_counters; j++) {
> 'j' is obviously not descriptive, but when reading the usage, it's more
> obvious
> that it's a loop iterator (if you choose
Ok I'll go with the generic loop iterator name.
>> + vm = vm_create_with_one_vcpu(&vcpu, guest_test_core_counters);
>> +
>> + if (nc)
> Is '0' not a legal number of counters?
AMD64 Architecture Programmer's Manual Volume 2, Section 13.2.2 states
that only nonzero values for that property are meaningful.
With 0, you are supposed to assume either 4 or 6 counters depending on
the CoreExt feature.
I could make 0 mean 0 counters inside the guest, but that would diverge
from what hardware does.
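Concretely, the guest-side derivation looks roughly like this (just a
sketch reusing the names from this series, not final code):

	/* APM vol. 2, 13.2.2: NumPerfCtrCore == 0 means use the legacy defaults. */
	uint8_t nr = this_cpu_property(X86_PROPERTY_NUM_PERF_CTR_CORE);

	if (!nr)
		nr = this_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE) ?
		     AMD_NR_CORE_EXT_COUNTERS : AMD_NR_CORE_COUNTERS;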
>> + vcpu_set_cpuid_property(
> Google3! (Never, ever wrap immediately after the opening parenthesis).
Checkpatch didn't complain.
>> + vcpu, X86_PROPERTY_NUM_PERF_CTR_CORE, nc);
>> + if (ce)
>> + vcpu_set_cpuid_feature(
>> + vcpu, X86_FEATURE_PERF_CTR_EXT_CORE);
> This likely doesn't do what you want. By default, vm_arch_vcpu_add()
> initializes CPUID to KVM's supported CPUID. So only _setting_ the feature
> means that the test is likely only ever running with the full set of
> supported features.
Ok, so I have to explicitly unset the feature if I don't want it.
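I.e. each iteration becomes something like this (sketch using the existing
vcpu_set_or_clear_cpuid_feature() helper, with the ce/pm loop variables from
the patch above):

	/* The vCPU starts with KVM's full supported CPUID, so clear as well as set. */
	vcpu_set_or_clear_cpuid_feature(vcpu, X86_FEATURE_PERF_CTR_EXT_CORE, ce);
	vcpu_set_or_clear_cpuid_feature(vcpu, X86_FEATURE_PERFMON_V2, pm);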
> Jumping back to my complaints with the for-loops, if the features to iterate
> on are collected in an array, then the test can generate a mask of all
> possible combinations and iterate over that (plus the array itself). That
> keeps the indentation bounded and eliminates the copy+paste needed to add a
> new feature.
> The only downside is that the test is limited to 64 features, but we'll run
> into run time issues long before that limit is reached.
> 	const struct kvm_x86_cpu_feature pmu_features[] = {
> 		X86_FEATURE_PERF_CTR_EXT_CORE,
> 		X86_FEATURE_PERFMON_V2,
> 	};
>
> 	const u64 pmu_features_mask = BIT_ULL(ARRAY_SIZE(pmu_features)) - 1;
>
> 	for (mask = 0; mask <= pmu_features_mask; mask++) {
> 		for (nr_counters = 0; nr_counters < max_nr_counters; nr_counters++) {
> 			vm = vm_create_with_one_vcpu(&vcpu, guest_test_core_counters);
>
> 			vcpu_set_cpuid_property(vcpu, X86_PROPERTY_NUM_PERF_CTR_CORE,
> 						nr_counters);
>
> 			/* Comment goes here */
> 			for (i = 0; i < ARRAY_SIZE(pmu_features); i++)
> 				vcpu_set_or_clear_cpuid_feature(vcpu, pmu_features[i],
> 								mask & BIT_ULL(i));
> 			...
> 		}
I thought of putting the features in a bitmask but worried it obscured
the intent too much. Listing features in an array and constructing the
bitmask seems clear enough to me so I will use that technique now.
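One knock-on effect: the pr_info() below prints the individual ce/pm flags,
which won't scale with the array approach. Something like this instead
(sketch; reuses the mask and nr_counters variables from your example):

	pr_info("Testing core counters: feature mask = 0x%lx, NumCounters = %u\n",
		mask, nr_counters);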
>> + if (pm)
>> + vcpu_set_cpuid_feature(
>> + vcpu, X86_FEATURE_PERFMON_V2);
>> +
>> + pr_info("Testing core counters: CoreExt = %u, PerfMonV2 = %u,
>> NumCounters = %u\n",
>> + ce, pm, nc);
>> + run_vcpu(vcpu);
>> +
>> + kvm_vm_free(vm);
>> + }
>> + }
>> + }
>> +}
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 4/6] KVM: x86: selftests: Test read/write core counters
2025-01-08 18:54 ` Sean Christopherson
@ 2025-01-20 19:54 ` Colton Lewis
0 siblings, 0 replies; 21+ messages in thread
From: Colton Lewis @ 2025-01-20 19:54 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, mizhang, ljr.kernel, jmattson, aaronlewis, pbonzini, shuah,
linux-kselftest, linux-kernel
Sean Christopherson <seanjc@google.com> writes:
> On Wed, Sep 18, 2024, Colton Lewis wrote:
>> Run a basic test to ensure we can write an arbitrary value to the core
>> counters and read it back.
>> Signed-off-by: Colton Lewis <coltonlewis@google.com>
>> ---
>> .../selftests/kvm/x86_64/pmu_counters_test.c | 54 +++++++++++++++++++
>> 1 file changed, 54 insertions(+)
>> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
>> index 5b240585edc5..79ca7d608e00 100644
>> --- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
>> +++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
>> @@ -641,11 +641,65 @@ static uint8_t nr_core_counters(void)
>> return AMD_NR_CORE_EXT_COUNTERS;
>> return AMD_NR_CORE_COUNTERS;
>> +}
>> +
>> +static uint8_t guest_nr_core_counters(void)
>> +{
>> + uint8_t nr_counters = this_cpu_property(X86_PROPERTY_NUM_PERF_CTR_CORE);
>> + bool core_ext = this_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
> For both this and nr_core_counters(), there's no need to read
> PERF_CTR_EXT_CORE if nr_counters is non-zero, and then no need to capture it
> in a local variable.
Sure, but since I might need it and don't see why the performance cost
matters for a test that only calls it a few times, I thought the code
looked nicer just reading it up front where I declare the variable.
I can change it.
>> +
>> + if (nr_counters != 0)
>> + return nr_counters;
>> +
>> + if (core_ext)
>> + return AMD_NR_CORE_EXT_COUNTERS;
>> +
>> + return AMD_NR_CORE_COUNTERS;
> This is *painfully* similar to nr_core_counters(). It actually took me
> almost a minute of staring to see the difference. One option would be to add
> a helper to dedup the if-statements, but while somewhat gross, I actually
> think a macro is the way to go.
> #define nr_core_counters(scope)					\
> ({									\
> 	uint8_t nr_counters = scope##_cpu_property(X86_PROPERTY_NR_PERFCTR_CORE); \
> 									\
> 	if (!nr_counters) {						\
> 		if (scope##_cpu_has(X86_FEATURE_PERFCTR_CORE))		\
> 			nr_counters = AMD_NR_CORE_EXT_COUNTERS;		\
> 		else							\
> 			nr_counters = AMD_NR_CORE_COUNTERS;		\
> 	}								\
> 	nr_counters;							\
> })
> static uint8_t kvm_nr_core_counters(void)
> {
> 	return nr_core_counters(kvm);
> }
>
> static uint8_t guest_nr_core_counters(void)
> {
> 	return nr_core_counters(this);
> }
Point taken. I'll go with the macro.
>> +
> Unnecessary newline.
Will delete
>> +}
>> +static void guest_test_rdwr_core_counters(void)
>> +{
>> + bool core_ext = this_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
>> + uint8_t nr_counters = guest_nr_core_counters();
>> + uint8_t i;
>> + uint32_t esel_msr_base = core_ext ? MSR_F15H_PERF_CTL : MSR_K7_EVNTSEL0;
> Please don't concoct new abbreviations. "esel" isn't used anywhere in KVM,
> and AFAICT it's not used in perf either.
I'll avoid that in the future
> I would also prefer to have consistent naming between the Intel and AMD
> tests (the Intel test uses base_<name>_msr).
Done
> base_eventsel_msr is all of four characters more.
>> + uint32_t cnt_msr_base = core_ext ? MSR_F15H_PERF_CTR : MSR_K7_PERFCTR0;
> For better or worse, the Intel version uses "base_pmc_msr". I see no reason
> to diverge from that.
Done
>> + uint32_t msr_step = core_ext ? 2 : 1;
>> +
>> + for (i = 0; i < AMD_NR_CORE_EXT_COUNTERS; i++) {
>> + uint64_t test_val = 0xffff;
>> + uint32_t esel_msr = esel_msr_base + msr_step * i;
>> + uint32_t cnt_msr = cnt_msr_base + msr_step * i;
> And then
> uint32_t eventsel_msr = ...;
> uint32_t pmc_msr = ...;
>> + bool expect_gp = !(i < nr_counters);
> Uh, isn't that just a weird way of writing:
> bool expect_gp = i >= nr_counters;
Yes, they are logically equivalent. I thought it was clearer by
emphasizing it was the negation of "i is a valid counter" (i < nr_counters).
But I'll change it.
>> + uint8_t vector;
>> + uint64_t val;
>> +
>> + /* Test event selection register. */
> This is pretty obvious if the MSR is named eventsel_msr.
Will delete
>> + vector = wrmsr_safe(esel_msr, test_val);
>> + GUEST_ASSERT_PMC_MSR_ACCESS(WRMSR, esel_msr, expect_gp, vector);
>> +
>> + vector = rdmsr_safe(esel_msr, &val);
>> + GUEST_ASSERT_PMC_MSR_ACCESS(RDMSR, esel_msr, expect_gp, vector);
>> +
>> + if (!expect_gp)
>> + GUEST_ASSERT_PMC_VALUE(RDMSR, esel_msr, val, test_val);
>> +
>> + /* Test counter register. */
> Same thing here. If there is novel information/behavior, then by all means
> add a comment.
Will delete
>> + vector = wrmsr_safe(cnt_msr, test_val);
>> + GUEST_ASSERT_PMC_MSR_ACCESS(WRMSR, cnt_msr, expect_gp, vector);
>> +
>> + vector = rdmsr_safe(cnt_msr, &val);
>> + GUEST_ASSERT_PMC_MSR_ACCESS(RDMSR, cnt_msr, expect_gp, vector);
>> +
>> + if (!expect_gp)
>> + GUEST_ASSERT_PMC_VALUE(RDMSR, cnt_msr, val, test_val);
>> + }
>> }
>> static void guest_test_core_counters(void)
>> {
>> + guest_test_rdwr_core_counters();
>> GUEST_DONE();
>> }
>> --
>> 2.46.0.662.g92d0881bb0-goog
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 5/6] KVM: x86: selftests: Test core events
2025-01-08 19:31 ` Sean Christopherson
@ 2025-01-20 20:01 ` Colton Lewis
0 siblings, 0 replies; 21+ messages in thread
From: Colton Lewis @ 2025-01-20 20:01 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, mizhang, ljr.kernel, jmattson, aaronlewis, pbonzini, shuah,
linux-kselftest, linux-kernel
Sean Christopherson <seanjc@google.com> writes:
> On Wed, Sep 18, 2024, Colton Lewis wrote:
>> Test events on core counters by iterating through every combination of
>> events in amd_pmu_zen_events with every core counter.
>> For each combination, calculate the appropriate register addresses for
>> the event selection/control register and the counter register. The
>> base addresses and layout schemes change depending on whether we have
>> the CoreExt feature.
>> To do the testing, reuse GUEST_TEST_EVENT to run a standard known
>> workload. Decouple it from guest_assert_event_count (now
>> guest_assert_intel_event_count) to generalize to AMD.
>> Then assert the most specific detail that can be reasonably known
>> about the counter result. Exact count is defined and known for some
>> events and for other events merely asserted to be nonzero.
>> Note on exact counts: AMD counts one more branch than Intel for the
>> same workload. Though I can't confirm a reason, the only thing it
>> could be is the boundary of the loop instruction being counted
>> differently. Presumably, when the counter reaches 0 and execution
>> continues to the next instruction, AMD counts this as a branch and
>> Intel doesn't.
> IIRC, VMRUN is counted as a branch instruction for the guest. Assuming my
> memory is correct, that means this test is going to be flaky, as an
> asynchronous exit, e.g. due to a host IRQ, during the measurement loop will
> inflate the count. I'm not entirely sure what to do about that :-(
:-( but thanks for the explanation
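If it does prove flaky in practice, one hypothetical mitigation (an
assumption on my part, nothing settled in this thread) would be asserting a
small range instead of an exact value for counts that host IRQs can inflate:

	/* Hypothetical slack for host-IRQ-induced VMRUNs; the constant is made up. */
	#define AMD_BRANCH_COUNT_SLACK 2

	GUEST_ASSERT(count >= NUM_BRANCH_INSNS_RETIRED &&
		     count <= NUM_BRANCH_INSNS_RETIRED + AMD_BRANCH_COUNT_SLACK);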
>> +static void __guest_test_core_event(uint8_t event_idx, uint8_t counter_idx)
>> +{
>> + /* One fortunate area of actual compatibility! This register
> 	/*
> 	 * This is the proper format for multi-line comments. We are not the
> 	 * crazy net/ folks.
> 	 */
Will do. As with some other formatting comments, checkpatch didn't
complain.
>> + * layout is the same for both AMD and Intel.
> It's not, actually. There are differences in the layout, it just so happens
> that the differences don't throw a wrench in things.
> The comments in tools/testing/selftests/kvm/include/x86_64/pmu.h document
> this fairly well, I don't see any reason to have a comment here.
Will delete the comment
>> + */
>> + uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
>> + ARCH_PERFMON_EVENTSEL_ENABLE |
>> + amd_pmu_zen_events[event_idx];
> Align the indentation.
> 	uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
> 			    ARCH_PERFMON_EVENTSEL_ENABLE |
> 			    amd_pmu_zen_events[event_idx];
Will do
>> + bool core_ext = this_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
>> + uint64_t esel_msr_base = core_ext ? MSR_F15H_PERF_CTL : MSR_K7_EVNTSEL0;
>> + uint64_t cnt_msr_base = core_ext ? MSR_F15H_PERF_CTR : MSR_K7_PERFCTR0;
>> + uint64_t msr_step = core_ext ? 2 : 1;
>> + uint64_t esel_msr = esel_msr_base + msr_step * counter_idx;
>> + uint64_t cnt_msr = cnt_msr_base + msr_step * counter_idx;
> This pattern of code is copy+pasted in three functions. Please add macros
> and/or helpers to consolidate things. These should also be uint32_t, not 64.
Will do
> It's a bit evil, but one approach would be to add a macro to iterate over
> all PMU counters. Eating the VM-Exit for the CPUID to get
> X86_FEATURE_PERF_CTR_EXT_CORE each time is unfortunate, but I doubt/hope it's
> not problematic in practice. If the cost is meaningful, we could figure out a
> way to cache the info, e.g. something awful like this might work:
> 	/* Note, this relies on guest state being recreated between each test. */
> 	static int has_perfctr_core = -1;
>
> 	if (has_perfctr_core == -1)
> 		has_perfctr_core = this_cpu_has(X86_FEATURE_PERFCTR_CORE);
>
> 	if (has_perfctr_core) {
> static bool get_pmu_counter_msrs(int idx, uint32_t *eventsel, uint32_t *pmc)
> {
> 	if (this_cpu_has(X86_FEATURE_PERFCTR_CORE)) {
> 		*eventsel = MSR_F15H_PERF_CTL + idx * 2;
> 		*pmc = MSR_F15H_PERF_CTR + idx * 2;
> 	} else {
> 		*eventsel = MSR_K7_EVNTSEL0 + idx;
> 		*pmc = MSR_K7_PERFCTR0 + idx;
> 	}
> 	return true;
> }
> #define for_each_pmu_counter(_i, _nr_counters, _eventsel, _pmc)	\
> 	for (_i = 0; _i < _nr_counters; _i++)				\
> 		if (get_pmu_counter_msrs(_i, &_eventsel, &_pmc))	\
> static void guest_test_core_events(void)
> {
> 	uint8_t nr_counters = guest_nr_core_counters();
> 	uint32_t eventsel_msr, pmc_msr;
> 	int i, j;
>
> 	for (i = 0; i < NR_AMD_ZEN_EVENTS; i++) {
> 		for_each_pmu_counter(j, nr_counters, eventsel_msr, pmc_msr) {
> 			uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
> 					    ARCH_PERFMON_EVENTSEL_ENABLE |
> 					    amd_pmu_zen_events[i];
>
> 			GUEST_TEST_EVENT(pmc_msr, eventsel_msr, eventsel, "");
> 			guest_assert_amd_event_count(i, j, pmc_msr);
>
> 			if (!is_forced_emulation_enabled)
> 				continue;
>
> 			GUEST_TEST_EVENT(pmc_msr, eventsel_msr, eventsel, KVM_FEP);
> 			guest_assert_amd_event_count(i, j, pmc_msr);
> 		}
> 	}
> }
I'll experiment with this
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 6/6] KVM: x86: selftests: Test PerfMonV2
2025-01-08 19:42 ` Sean Christopherson
@ 2025-01-20 20:07 ` Colton Lewis
0 siblings, 0 replies; 21+ messages in thread
From: Colton Lewis @ 2025-01-20 20:07 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, mizhang, ljr.kernel, jmattson, aaronlewis, pbonzini, shuah,
linux-kselftest, linux-kernel
Sean Christopherson <seanjc@google.com> writes:
> On Wed, Sep 18, 2024, Colton Lewis wrote:
>> Test PerfMonV2, which defines global registers to enable multiple
>> performance counters with a single MSR write, in its own function.
>> If the feature is available, ensure the global control register has
>> the ability to start and stop the performance counters and the global
>> status register correctly flags an overflow by the associated counter.
>> Signed-off-by: Colton Lewis <coltonlewis@google.com>
>> ---
>> .../selftests/kvm/x86_64/pmu_counters_test.c | 53 +++++++++++++++++++
>> 1 file changed, 53 insertions(+)
>> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
>> index cf2941cc7c4c..a90df8b67a19 100644
>> --- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
>> +++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
>> @@ -763,10 +763,63 @@ static void guest_test_core_events(void)
>> }
>> }
>> +static void guest_test_perfmon_v2(void)
>> +{
>> + uint64_t i;
>> + uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
>> + ARCH_PERFMON_EVENTSEL_ENABLE |
>> + AMD_ZEN_CORE_CYCLES;
> Hmm. I like the extra coverage, but I think the guts of this test belong in
> common code, because the core logic is the same across Intel and AMD (I
> think), only the MSRs are different.
> Maybe a library helper that takes in the MSRs as parameters? Not sure.
I'll explore some other options
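For the archives, a rough sketch of what such a parameterized helper could
look like (the name and signature are made up, and the overflow/status legs
would need the same treatment):

	static void guest_test_global_ctl(uint32_t global_ctl_msr, uint32_t pmc_msr,
					  int pmc_idx)
	{
		/* Assumes the caller already programmed the event selector. */
		wrmsr(global_ctl_msr, 0);
		wrmsr(pmc_msr, 0);
		__asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
		/* The count must stay zero while the global enable bit is clear... */
		GUEST_ASSERT(!_rdpmc(pmc_idx));

		wrmsr(global_ctl_msr, BIT_ULL(pmc_idx));
		__asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
		wrmsr(global_ctl_msr, 0);
		/* ...and must advance once it is set. */
		GUEST_ASSERT(_rdpmc(pmc_idx));
	}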
> I suspect it'll take some back and forth to figure out how best to validate
> these more "advanced" behaviors, so maybe skip this patch for the next
> version? I.e. land basic AMD coverage and then we can figure out how to test
> global control and status.
Sure
>> + bool core_ext = this_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
>> + uint64_t sel_msr_base = core_ext ? MSR_F15H_PERF_CTL : MSR_K7_EVNTSEL0;
>> + uint64_t cnt_msr_base = core_ext ? MSR_F15H_PERF_CTR : MSR_K7_PERFCTR0;
>> + uint64_t msr_step = core_ext ? 2 : 1;
>> + uint8_t nr_counters = guest_nr_core_counters();
>> + bool perfmon_v2 = this_cpu_has(X86_FEATURE_PERFMON_V2);
> Zero reason to capture this in a local variable.
Will delete
>> + uint64_t sel_msr;
>> + uint64_t cnt_msr;
>> +
>> + if (!perfmon_v2)
>> + return;
>> +
>> + for (i = 0; i < nr_counters; i++) {
>> + sel_msr = sel_msr_base + msr_step * i;
>> + cnt_msr = cnt_msr_base + msr_step * i;
>> +
>> + /* Ensure count stays 0 when global register disables counter. */
>> + wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
>> + wrmsr(sel_msr, eventsel);
>> + wrmsr(cnt_msr, 0);
>> + __asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
>> + GUEST_ASSERT(!_rdpmc(i));
>> +
>> + /* Ensure counter is >0 when global register enables counter. */
>> + wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, BIT_ULL(i));
>> + __asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
>> + wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
>> + GUEST_ASSERT(_rdpmc(i));
>> +
>> + /* Ensure global status register flags a counter overflow. */
>> + wrmsr(cnt_msr, -1);
>> + wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, 0xff);
>> + wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, BIT_ULL(i));
>> + __asm__ __volatile__("loop ." : "+c"((int){NUM_LOOPS}));
>> + wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
>> + GUEST_ASSERT(rdmsr(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS) &
>> + BIT_ULL(i));
>> +
>> + /* Ensure global status register flag is cleared correctly. */
>> + wrmsr(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, BIT_ULL(i));
>> + GUEST_ASSERT(!(rdmsr(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS) &
>> + BIT_ULL(i)));
>> + }
>> +}
>> +
>> +
>> static void guest_test_core_counters(void)
>> {
>> guest_test_rdwr_core_counters();
>> guest_test_core_events();
>> + guest_test_perfmon_v2();
>> GUEST_DONE();
>> }
>> --
>> 2.46.0.662.g92d0881bb0-goog
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs
2025-01-09 19:47 ` Sean Christopherson
@ 2025-01-20 20:11 ` Colton Lewis
0 siblings, 0 replies; 21+ messages in thread
From: Colton Lewis @ 2025-01-20 20:11 UTC (permalink / raw)
To: Sean Christopherson
Cc: seanjc, kvm, mizhang, ljr.kernel, jmattson, aaronlewis, pbonzini,
shuah, linux-kselftest, linux-kernel
Sean Christopherson <seanjc@google.com> writes:
> On Wed, 18 Sep 2024 20:53:13 +0000, Colton Lewis wrote:
>> Extend pmu_counters_test to AMD CPUs.
>> As the AMD PMU is quite different from Intel with different events and
>> feature sets, this series introduces a new code path to test it,
>> specifically focusing on the core counters including the
>> PerfCtrExtCore and PerfMonV2 features. Northbridge counters and cache
>> counters exist, but are not as important and can be deferred to a
>> later series.
>> [...]
> Applied 1 and a modified version of 2 to kvm-x86 selftests, thanks!
> [1/6] KVM: x86: selftests: Fix typos in macro variable use
> https://github.com/kvm-x86/linux/commit/97d0d1655ea8
> [2/6] KVM: x86: selftests: Define AMD PMU CPUID leaves
> https://github.com/kvm-x86/linux/commit/c76a92382805
> --
> https://github.com/kvm-x86/linux/tree/next
Thanks Sean! I'll get to the rest. Sorry for the delay in responding to
your comments. I was head down in something else and didn't have time
until now to remember what I was doing here.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 3/6] KVM: x86: selftests: Set up AMD VM in pmu_counters_test
2025-01-20 18:03 ` Colton Lewis
@ 2025-01-21 20:56 ` Sean Christopherson
0 siblings, 0 replies; 21+ messages in thread
From: Sean Christopherson @ 2025-01-21 20:56 UTC (permalink / raw)
To: Colton Lewis
Cc: kvm, mizhang, ljr.kernel, jmattson, aaronlewis, pbonzini, shuah,
linux-kselftest, linux-kernel
On Mon, Jan 20, 2025, Colton Lewis wrote:
> > > +static void test_core_counters(void)
> > > +{
> > > + uint8_t nr_counters = nr_core_counters();
> > > + bool core_ext = kvm_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
> > > + bool perfmon_v2 = kvm_cpu_has(X86_FEATURE_PERFMON_V2);
> > > + struct kvm_vcpu *vcpu;
> > > + struct kvm_vm *vm;
> > > +
> > > + for (uint8_t ce = 0; ce <= core_ext; ce++) {
>
> > Kernel style is to not declare variables inside for-loops.
>
> I ran it through checkpatch and it didn't complain.
...
> > > + vcpu_set_cpuid_property(
>
> > Google3! (Never, ever wrap immediately after the opening parenthesis).
>
> Checkpatch didn't complain.
Checkpatch is a perl script, not sentient AI. It's nothing more than a tool to
help detect common goofs, typos, egregious flaws, etc. The absence of checkpatch
warnings/errors does not mean a patch has no issues. Coding style in particular
is quite subjective and prone to "exceptions to the rule", which makes it especially
hard to "enforce" via checkpatch.
As explained in Documentation/process/4.Coding.rst, what matters most is consistency:
A code base as large as the kernel requires some uniformity of code to make it
possible for developers to quickly understand any part of it. So there is no
longer room for strangely-formatted code.
https://www.kernel.org/doc/html/v5.0/process/4.Coding.html#coding-style
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2025-01-21 20:56 UTC | newest]
Thread overview: 21+ messages
-- links below jump to the message on this page --
2024-09-18 20:53 [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs Colton Lewis
2024-09-18 20:53 ` [PATCH v2 1/6] KVM: x86: selftests: Fix typos in macro variable use Colton Lewis
2024-09-18 20:53 ` [PATCH v2 2/6] KVM: x86: selftests: Define AMD PMU CPUID leaves Colton Lewis
2025-01-08 17:58 ` Sean Christopherson
2025-01-20 17:06 ` Colton Lewis
2024-09-18 20:53 ` [PATCH v2 3/6] KVM: x86: selftests: Set up AMD VM in pmu_counters_test Colton Lewis
2025-01-08 18:35 ` Sean Christopherson
2025-01-20 18:03 ` Colton Lewis
2025-01-21 20:56 ` Sean Christopherson
2024-09-18 20:53 ` [PATCH v2 4/6] KVM: x86: selftests: Test read/write core counters Colton Lewis
2025-01-08 18:54 ` Sean Christopherson
2025-01-20 19:54 ` Colton Lewis
2024-09-18 20:53 ` [PATCH v2 5/6] KVM: x86: selftests: Test core events Colton Lewis
2025-01-08 19:31 ` Sean Christopherson
2025-01-20 20:01 ` Colton Lewis
2024-09-18 20:53 ` [PATCH v2 6/6] KVM: x86: selftests: Test PerfMonV2 Colton Lewis
2025-01-08 19:42 ` Sean Christopherson
2025-01-20 20:07 ` Colton Lewis
2024-10-31 20:09 ` [PATCH v2 0/6] Extend pmu_counters_test to AMD CPUs Colton Lewis
2025-01-09 19:47 ` Sean Christopherson
2025-01-20 20:11 ` Colton Lewis