* [linux-next:master] [KVM] 7803339fa9: kernel-selftests.kvm.pmu_counters_test.fail
@ 2025-01-14 2:40 kernel test robot
2025-01-14 19:47 ` Sean Christopherson
0 siblings, 1 reply; 8+ messages in thread
From: kernel test robot @ 2025-01-14 2:40 UTC (permalink / raw)
To: Sean Christopherson
Cc: oe-lkp, lkp, Maxim Levitsky, kvm, xudong.hao, oliver.sang
Hello,
we fould the test failed on a Cooper Lake, not sure if this is expected.
below full report FYI.
kernel test robot noticed "kernel-selftests.kvm.pmu_counters_test.fail" on:
commit: 7803339fa929387bbc66479532afbaf8cbebb41b ("KVM: selftests: Use data load to trigger LLC references/misses in Intel PMU")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
[test failed on linux-next/master 37136bf5c3a6f6b686d74f41837a6406bec6b7bc]
in testcase: kernel-selftests
version: kernel-selftests-x86_64-7503345ac5f5-1_20241208
with following parameters:
group: kvm
config: x86_64-rhel-9.4-kselftests
compiler: gcc-12
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202501141009.30c629b4-lkp@intel.com
# timeout set to 120
# selftests: kvm: pmu_counters_test
# Random seed: 0x6b8b4567
# Testing arch events, PMU version 0, perf_caps = 0
# Testing GP counters, PMU version 0, perf_caps = 0
# Testing fixed counters, PMU version 0, perf_caps = 0
# Testing arch events, PMU version 0, perf_caps = 2000
# Testing GP counters, PMU version 0, perf_caps = 2000
# Testing fixed counters, PMU version 0, perf_caps = 2000
# Testing arch events, PMU version 1, perf_caps = 0
# ==== Test Assertion Failure ====
# x86/pmu_counters_test.c:129: count >= (10 * 4 + 5)
# pid=6278 tid=6278 errno=4 - Interrupted system call
# 1 0x0000000000411281: assert_on_unhandled_exception at processor.c:625
# 2 0x00000000004075d4: _vcpu_run at kvm_util.c:1652
# 3 (inlined by) vcpu_run at kvm_util.c:1663
# 4 0x0000000000402c5e: run_vcpu at pmu_counters_test.c:62
# 5 0x0000000000402e4d: test_arch_events at pmu_counters_test.c:315
# 6 0x0000000000402663: test_arch_events at pmu_counters_test.c:304
# 7 (inlined by) test_intel_counters at pmu_counters_test.c:609
# 8 (inlined by) main at pmu_counters_test.c:642
# 9 0x00007f3b134f9249: ?? ??:0
# 10 0x00007f3b134f9304: ?? ??:0
# 11 0x0000000000402900: _start at ??:?
# count >= NUM_INSNS_RETIRED
not ok 21 selftests: kvm: pmu_counters_test # exit=254
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250114/202501141009.30c629b4-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linux-next:master] [KVM] 7803339fa9: kernel-selftests.kvm.pmu_counters_test.fail
2025-01-14 2:40 [linux-next:master] [KVM] 7803339fa9: kernel-selftests.kvm.pmu_counters_test.fail kernel test robot
@ 2025-01-14 19:47 ` Sean Christopherson
2025-01-15 2:44 ` Mi, Dapeng
0 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2025-01-14 19:47 UTC (permalink / raw)
To: kernel test robot, g
Cc: oe-lkp, lkp, Maxim Levitsky, kvm, xudong.hao, Dapeng Mi
+Dapeng
On Tue, Jan 14, 2025, kernel test robot wrote:
> we fould the test failed on a Cooper Lake, not sure if this is expected.
> below full report FYI.
>
>
> kernel test robot noticed "kernel-selftests.kvm.pmu_counters_test.fail" on:
>
> commit: 7803339fa929387bbc66479532afbaf8cbebb41b ("KVM: selftests: Use data load to trigger LLC references/misses in Intel PMU")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> [test failed on linux-next/master 37136bf5c3a6f6b686d74f41837a6406bec6b7bc]
>
> in testcase: kernel-selftests
> version: kernel-selftests-x86_64-7503345ac5f5-1_20241208
> with following parameters:
>
> group: kvm
>
> config: x86_64-rhel-9.4-kselftests
> compiler: gcc-12
> test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
*sigh*
This fails on our Skylake and Cascade Lake systems, but I only tested an Emerald
Rapids.
> # Testing fixed counters, PMU version 0, perf_caps = 2000
> # Testing arch events, PMU version 1, perf_caps = 0
> # ==== Test Assertion Failure ====
> # x86/pmu_counters_test.c:129: count >= (10 * 4 + 5)
> # pid=6278 tid=6278 errno=4 - Interrupted system call
> # 1 0x0000000000411281: assert_on_unhandled_exception at processor.c:625
> # 2 0x00000000004075d4: _vcpu_run at kvm_util.c:1652
> # 3 (inlined by) vcpu_run at kvm_util.c:1663
> # 4 0x0000000000402c5e: run_vcpu at pmu_counters_test.c:62
> # 5 0x0000000000402e4d: test_arch_events at pmu_counters_test.c:315
> # 6 0x0000000000402663: test_arch_events at pmu_counters_test.c:304
> # 7 (inlined by) test_intel_counters at pmu_counters_test.c:609
> # 8 (inlined by) main at pmu_counters_test.c:642
> # 9 0x00007f3b134f9249: ?? ??:0
> # 10 0x00007f3b134f9304: ?? ??:0
> # 11 0x0000000000402900: _start at ??:?
> # count >= NUM_INSNS_RETIRED
The failure is on top-down slots. I modified the assert to actually print the
count (I'll make sure to post a patch regardless of where this goes), and based
on the count for failing vs. passing, I'm pretty sure the issue is not the extra
instruction, but instead is due to changing the target of the CLFUSH from the
address of the code to the address of kvm_pmu_version.
However, I think the blame lies with the assertion itself, i.e. with commit
4a447b135e45 ("KVM: selftests: Test top-down slots event in x86's pmu_counters_test").
Either that or top-down slots is broken on the Lakes.
By my rudimentary measurements, tying the number of available slots to the number
of instructions *retired* is fundamentally flawed. E.g. on the Lakes (SKX is more
or less identical to CLX), omitting the CLFLUSHOPT entirely results in *more*
slots being available throughout the lifetime of the measured section.
My best guess is that flushing the cache line use for the data load causes the
backend to saturate its slots with prefetching data, and as a result the number
of slots that are available goes down.
CLFLUSHOPT . | CLFLUSHOPT [%m] | NOP
CLX 350-100 | 20-60[*] | 135-150
SPR 49000-57000 | 32500-41000 | 6760-6830
[*] CLX had a few outliers in the 200-400 range, but the majority of runs were
in the 20-60 range.
Reading through more (and more and more) of the TMA documentation, I don't think
we can assume anything about the number of available slots, beyond a very basic
assertion that it's practically impossible for there to never be an available
slot. IIUC, retiring an instruction does NOT require an available slot, rather
it requires the opposite: an occupied slot for the uop(s).
I'm mildly curious as to why the counts for SPR are orders of magnitude higher
that CLX (simple accounting differences?), but I don't think it changes anything
in the test itself.
Unless someone has a better idea, my plan is to post a patch to assert that the
top-down slots count is non-zero, not that it's >= instructions retired. E.g.
diff --git a/tools/testing/selftests/kvm/x86/pmu_counters_test.c b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
index accd7ecd3e5f..21acedcd46cd 100644
--- a/tools/testing/selftests/kvm/x86/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
@@ -123,10 +123,8 @@ static void guest_assert_event_count(uint8_t idx,
fallthrough;
case INTEL_ARCH_CPU_CYCLES_INDEX:
case INTEL_ARCH_REFERENCE_CYCLES_INDEX:
- GUEST_ASSERT_NE(count, 0);
- break;
case INTEL_ARCH_TOPDOWN_SLOTS_INDEX:
- GUEST_ASSERT(count >= NUM_INSNS_RETIRED);
+ GUEST_ASSERT_NE(count, 0);
break;
default:
break;
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [linux-next:master] [KVM] 7803339fa9: kernel-selftests.kvm.pmu_counters_test.fail
2025-01-14 19:47 ` Sean Christopherson
@ 2025-01-15 2:44 ` Mi, Dapeng
2025-01-17 3:04 ` Mi, Dapeng
0 siblings, 1 reply; 8+ messages in thread
From: Mi, Dapeng @ 2025-01-15 2:44 UTC (permalink / raw)
To: Sean Christopherson, kernel test robot, g
Cc: oe-lkp, lkp, Maxim Levitsky, kvm, xudong.hao
On 1/15/2025 3:47 AM, Sean Christopherson wrote:
> +Dapeng
>
> On Tue, Jan 14, 2025, kernel test robot wrote:
>> we fould the test failed on a Cooper Lake, not sure if this is expected.
>> below full report FYI.
>>
>>
>> kernel test robot noticed "kernel-selftests.kvm.pmu_counters_test.fail" on:
>>
>> commit: 7803339fa929387bbc66479532afbaf8cbebb41b ("KVM: selftests: Use data load to trigger LLC references/misses in Intel PMU")
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>
>> [test failed on linux-next/master 37136bf5c3a6f6b686d74f41837a6406bec6b7bc]
>>
>> in testcase: kernel-selftests
>> version: kernel-selftests-x86_64-7503345ac5f5-1_20241208
>> with following parameters:
>>
>> group: kvm
>>
>> config: x86_64-rhel-9.4-kselftests
>> compiler: gcc-12
>> test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> *sigh*
>
> This fails on our Skylake and Cascade Lake systems, but I only tested an Emerald
> Rapids.
>
>> # Testing fixed counters, PMU version 0, perf_caps = 2000
>> # Testing arch events, PMU version 1, perf_caps = 0
>> # ==== Test Assertion Failure ====
>> # x86/pmu_counters_test.c:129: count >= (10 * 4 + 5)
>> # pid=6278 tid=6278 errno=4 - Interrupted system call
>> # 1 0x0000000000411281: assert_on_unhandled_exception at processor.c:625
>> # 2 0x00000000004075d4: _vcpu_run at kvm_util.c:1652
>> # 3 (inlined by) vcpu_run at kvm_util.c:1663
>> # 4 0x0000000000402c5e: run_vcpu at pmu_counters_test.c:62
>> # 5 0x0000000000402e4d: test_arch_events at pmu_counters_test.c:315
>> # 6 0x0000000000402663: test_arch_events at pmu_counters_test.c:304
>> # 7 (inlined by) test_intel_counters at pmu_counters_test.c:609
>> # 8 (inlined by) main at pmu_counters_test.c:642
>> # 9 0x00007f3b134f9249: ?? ??:0
>> # 10 0x00007f3b134f9304: ?? ??:0
>> # 11 0x0000000000402900: _start at ??:?
>> # count >= NUM_INSNS_RETIRED
> The failure is on top-down slots. I modified the assert to actually print the
> count (I'll make sure to post a patch regardless of where this goes), and based
> on the count for failing vs. passing, I'm pretty sure the issue is not the extra
> instruction, but instead is due to changing the target of the CLFUSH from the
> address of the code to the address of kvm_pmu_version.
>
> However, I think the blame lies with the assertion itself, i.e. with commit
> 4a447b135e45 ("KVM: selftests: Test top-down slots event in x86's pmu_counters_test").
> Either that or top-down slots is broken on the Lakes.
>
> By my rudimentary measurements, tying the number of available slots to the number
> of instructions *retired* is fundamentally flawed. E.g. on the Lakes (SKX is more
> or less identical to CLX), omitting the CLFLUSHOPT entirely results in *more*
> slots being available throughout the lifetime of the measured section.
>
> My best guess is that flushing the cache line use for the data load causes the
> backend to saturate its slots with prefetching data, and as a result the number
> of slots that are available goes down.
>
> CLFLUSHOPT . | CLFLUSHOPT [%m] | NOP
> CLX 350-100 | 20-60[*] | 135-150
> SPR 49000-57000 | 32500-41000 | 6760-6830
>
> [*] CLX had a few outliers in the 200-400 range, but the majority of runs were
> in the 20-60 range.
>
> Reading through more (and more and more) of the TMA documentation, I don't think
> we can assume anything about the number of available slots, beyond a very basic
> assertion that it's practically impossible for there to never be an available
> slot. IIUC, retiring an instruction does NOT require an available slot, rather
> it requires the opposite: an occupied slot for the uop(s).
I'm not quite sure about this. IIUC, retiring an instruction may not need a
cycle, but it needs a slot at least except the instruction is macro-fused.
Anyway, let me double check with our micro-architecture and perf experts.
>
> I'm mildly curious as to why the counts for SPR are orders of magnitude higher
> that CLX (simple accounting differences?), but I don't think it changes anything
> in the test itself.
>
> Unless someone has a better idea, my plan is to post a patch to assert that the
> top-down slots count is non-zero, not that it's >= instructions retired. E.g.
>
> diff --git a/tools/testing/selftests/kvm/x86/pmu_counters_test.c b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> index accd7ecd3e5f..21acedcd46cd 100644
> --- a/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> +++ b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> @@ -123,10 +123,8 @@ static void guest_assert_event_count(uint8_t idx,
> fallthrough;
> case INTEL_ARCH_CPU_CYCLES_INDEX:
> case INTEL_ARCH_REFERENCE_CYCLES_INDEX:
> - GUEST_ASSERT_NE(count, 0);
> - break;
> case INTEL_ARCH_TOPDOWN_SLOTS_INDEX:
> - GUEST_ASSERT(count >= NUM_INSNS_RETIRED);
> + GUEST_ASSERT_NE(count, 0);
> break;
> default:
> break;
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linux-next:master] [KVM] 7803339fa9: kernel-selftests.kvm.pmu_counters_test.fail
2025-01-15 2:44 ` Mi, Dapeng
@ 2025-01-17 3:04 ` Mi, Dapeng
2025-01-17 17:11 ` Sean Christopherson
0 siblings, 1 reply; 8+ messages in thread
From: Mi, Dapeng @ 2025-01-17 3:04 UTC (permalink / raw)
To: Sean Christopherson, kernel test robot, g
Cc: oe-lkp, lkp, Maxim Levitsky, kvm, xudong.hao
On 1/15/2025 10:44 AM, Mi, Dapeng wrote:
> On 1/15/2025 3:47 AM, Sean Christopherson wrote:
>> +Dapeng
>>
>> On Tue, Jan 14, 2025, kernel test robot wrote:
>>> we fould the test failed on a Cooper Lake, not sure if this is expected.
>>> below full report FYI.
>>>
>>>
>>> kernel test robot noticed "kernel-selftests.kvm.pmu_counters_test.fail" on:
>>>
>>> commit: 7803339fa929387bbc66479532afbaf8cbebb41b ("KVM: selftests: Use data load to trigger LLC references/misses in Intel PMU")
>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>
>>> [test failed on linux-next/master 37136bf5c3a6f6b686d74f41837a6406bec6b7bc]
>>>
>>> in testcase: kernel-selftests
>>> version: kernel-selftests-x86_64-7503345ac5f5-1_20241208
>>> with following parameters:
>>>
>>> group: kvm
>>>
>>> config: x86_64-rhel-9.4-kselftests
>>> compiler: gcc-12
>>> test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
>> *sigh*
>>
>> This fails on our Skylake and Cascade Lake systems, but I only tested an Emerald
>> Rapids.
>>
>>> # Testing fixed counters, PMU version 0, perf_caps = 2000
>>> # Testing arch events, PMU version 1, perf_caps = 0
>>> # ==== Test Assertion Failure ====
>>> # x86/pmu_counters_test.c:129: count >= (10 * 4 + 5)
>>> # pid=6278 tid=6278 errno=4 - Interrupted system call
>>> # 1 0x0000000000411281: assert_on_unhandled_exception at processor.c:625
>>> # 2 0x00000000004075d4: _vcpu_run at kvm_util.c:1652
>>> # 3 (inlined by) vcpu_run at kvm_util.c:1663
>>> # 4 0x0000000000402c5e: run_vcpu at pmu_counters_test.c:62
>>> # 5 0x0000000000402e4d: test_arch_events at pmu_counters_test.c:315
>>> # 6 0x0000000000402663: test_arch_events at pmu_counters_test.c:304
>>> # 7 (inlined by) test_intel_counters at pmu_counters_test.c:609
>>> # 8 (inlined by) main at pmu_counters_test.c:642
>>> # 9 0x00007f3b134f9249: ?? ??:0
>>> # 10 0x00007f3b134f9304: ?? ??:0
>>> # 11 0x0000000000402900: _start at ??:?
>>> # count >= NUM_INSNS_RETIRED
>> The failure is on top-down slots. I modified the assert to actually print the
>> count (I'll make sure to post a patch regardless of where this goes), and based
>> on the count for failing vs. passing, I'm pretty sure the issue is not the extra
>> instruction, but instead is due to changing the target of the CLFUSH from the
>> address of the code to the address of kvm_pmu_version.
>>
>> However, I think the blame lies with the assertion itself, i.e. with commit
>> 4a447b135e45 ("KVM: selftests: Test top-down slots event in x86's pmu_counters_test").
>> Either that or top-down slots is broken on the Lakes.
>>
>> By my rudimentary measurements, tying the number of available slots to the number
>> of instructions *retired* is fundamentally flawed. E.g. on the Lakes (SKX is more
>> or less identical to CLX), omitting the CLFLUSHOPT entirely results in *more*
>> slots being available throughout the lifetime of the measured section.
>>
>> My best guess is that flushing the cache line use for the data load causes the
>> backend to saturate its slots with prefetching data, and as a result the number
>> of slots that are available goes down.
>>
>> CLFLUSHOPT . | CLFLUSHOPT [%m] | NOP
>> CLX 350-100 | 20-60[*] | 135-150
>> SPR 49000-57000 | 32500-41000 | 6760-6830
>>
>> [*] CLX had a few outliers in the 200-400 range, but the majority of runs were
>> in the 20-60 range.
>>
>> Reading through more (and more and more) of the TMA documentation, I don't think
>> we can assume anything about the number of available slots, beyond a very basic
>> assertion that it's practically impossible for there to never be an available
>> slot. IIUC, retiring an instruction does NOT require an available slot, rather
>> it requires the opposite: an occupied slot for the uop(s).
> I'm not quite sure about this. IIUC, retiring an instruction may not need a
> cycle, but it needs a slot at least except the instruction is macro-fused.
> Anyway, let me double check with our micro-architecture and perf experts.
Hi Sean,
Just double check with our perf experts, the understanding that "retiring
an instruction needs a slot at least except the instruction is macro-fused"
is correct. The reason of this error is that the architectural topdown
slots event is just introduced from Ice lake platform and it's not
supported on skylake and cascade lake platforms. On these earlier platforms
the event 0x01a4 is another event which counts different thing instead of
topdown slots. On these earlier platforms, the slots event is derived from
0x3c event.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/core.c#n466
I don't understand why current pmu_counters_test code would validate an
architectural event which KVM (HW) doesn't support, it's not reasonable and
could cause misleading.
/*
* Force iterating over known arch events regardless of whether or not
* KVM/hardware supports a given event.
*/
nr_arch_events = max_t(typeof(nr_arch_events), nr_arch_events,
NR_INTEL_ARCH_EVENTS);
I would provide a patch to fix this.
BTW, currently KVM doesn't check if user space sets a valid pmu version in
kvm_check_cpuid(). The user space could set KVM a PMU version which is
larger than KVM supported maximum PMU version, just like currently
pmu_counters_test does. This is not correct. I originally intent to fix
this with the mediated vPMU patch series, but It looks we can send the
patches just with this fix together, then the issue can be fixed earlier.
Thanks,
Dapeng Mi
>
>
>> I'm mildly curious as to why the counts for SPR are orders of magnitude higher
>> that CLX (simple accounting differences?), but I don't think it changes anything
>> in the test itself.
>>
>> Unless someone has a better idea, my plan is to post a patch to assert that the
>> top-down slots count is non-zero, not that it's >= instructions retired. E.g.
>>
>> diff --git a/tools/testing/selftests/kvm/x86/pmu_counters_test.c b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
>> index accd7ecd3e5f..21acedcd46cd 100644
>> --- a/tools/testing/selftests/kvm/x86/pmu_counters_test.c
>> +++ b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
>> @@ -123,10 +123,8 @@ static void guest_assert_event_count(uint8_t idx,
>> fallthrough;
>> case INTEL_ARCH_CPU_CYCLES_INDEX:
>> case INTEL_ARCH_REFERENCE_CYCLES_INDEX:
>> - GUEST_ASSERT_NE(count, 0);
>> - break;
>> case INTEL_ARCH_TOPDOWN_SLOTS_INDEX:
>> - GUEST_ASSERT(count >= NUM_INSNS_RETIRED);
>> + GUEST_ASSERT_NE(count, 0);
>> break;
>> default:
>> break;
>>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linux-next:master] [KVM] 7803339fa9: kernel-selftests.kvm.pmu_counters_test.fail
2025-01-17 3:04 ` Mi, Dapeng
@ 2025-01-17 17:11 ` Sean Christopherson
2025-01-20 2:02 ` Mi, Dapeng
0 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2025-01-17 17:11 UTC (permalink / raw)
To: Dapeng Mi
Cc: kernel test robot, g, oe-lkp, lkp, Maxim Levitsky, kvm,
xudong.hao
On Fri, Jan 17, 2025, Dapeng Mi wrote:
> On 1/15/2025 10:44 AM, Mi, Dapeng wrote:
> > On 1/15/2025 3:47 AM, Sean Christopherson wrote:
> >>> # Testing fixed counters, PMU version 0, perf_caps = 2000
> >>> # Testing arch events, PMU version 1, perf_caps = 0
> >>> # ==== Test Assertion Failure ====
> >>> # x86/pmu_counters_test.c:129: count >= (10 * 4 + 5)
> >>> # pid=6278 tid=6278 errno=4 - Interrupted system call
> >>> # 1 0x0000000000411281: assert_on_unhandled_exception at processor.c:625
> >>> # 2 0x00000000004075d4: _vcpu_run at kvm_util.c:1652
> >>> # 3 (inlined by) vcpu_run at kvm_util.c:1663
> >>> # 4 0x0000000000402c5e: run_vcpu at pmu_counters_test.c:62
> >>> # 5 0x0000000000402e4d: test_arch_events at pmu_counters_test.c:315
> >>> # 6 0x0000000000402663: test_arch_events at pmu_counters_test.c:304
> >>> # 7 (inlined by) test_intel_counters at pmu_counters_test.c:609
> >>> # 8 (inlined by) main at pmu_counters_test.c:642
> >>> # 9 0x00007f3b134f9249: ?? ??:0
> >>> # 10 0x00007f3b134f9304: ?? ??:0
> >>> # 11 0x0000000000402900: _start at ??:?
> >>> # count >= NUM_INSNS_RETIRED
> >> The failure is on top-down slots. I modified the assert to actually print the
> >> count (I'll make sure to post a patch regardless of where this goes), and based
> >> on the count for failing vs. passing, I'm pretty sure the issue is not the extra
> >> instruction, but instead is due to changing the target of the CLFUSH from the
> >> address of the code to the address of kvm_pmu_version.
> >>
> >> However, I think the blame lies with the assertion itself, i.e. with commit
> >> 4a447b135e45 ("KVM: selftests: Test top-down slots event in x86's pmu_counters_test").
> >> Either that or top-down slots is broken on the Lakes.
> >>
> >> By my rudimentary measurements, tying the number of available slots to the number
> >> of instructions *retired* is fundamentally flawed. E.g. on the Lakes (SKX is more
> >> or less identical to CLX), omitting the CLFLUSHOPT entirely results in *more*
> >> slots being available throughout the lifetime of the measured section.
> >>
> >> My best guess is that flushing the cache line use for the data load causes the
> >> backend to saturate its slots with prefetching data, and as a result the number
> >> of slots that are available goes down.
> >>
> >> CLFLUSHOPT . | CLFLUSHOPT [%m] | NOP
> >> CLX 350-100 | 20-60[*] | 135-150
> >> SPR 49000-57000 | 32500-41000 | 6760-6830
> >>
> >> [*] CLX had a few outliers in the 200-400 range, but the majority of runs were
> >> in the 20-60 range.
> >>
> >> Reading through more (and more and more) of the TMA documentation, I don't think
> >> we can assume anything about the number of available slots, beyond a very basic
> >> assertion that it's practically impossible for there to never be an available
> >> slot. IIUC, retiring an instruction does NOT require an available slot, rather
> >> it requires the opposite: an occupied slot for the uop(s).
> > I'm not quite sure about this. IIUC, retiring an instruction may not need a
> > cycle, but it needs a slot at least except the instruction is macro-fused.
> > Anyway, let me double check with our micro-architecture and perf experts.
>
> Hi Sean,
>
> Just double check with our perf experts, the understanding that "retiring
> an instruction needs a slot at least except the instruction is macro-fused"
> is correct. The reason of this error is that the architectural topdown
> slots event is just introduced from Ice lake platform and it's not
> supported on skylake and cascade lake platforms. On these earlier platforms
> the event 0x01a4 is another event which counts different thing instead of
> topdown slots. On these earlier platforms, the slots event is derived from
> 0x3c event.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/core.c#n466
>
>
> I don't understand why current pmu_counters_test code would validate an
> architectural event which KVM (HW) doesn't support, it's not reasonable and
> could cause misleading.
>
> /*
> * Force iterating over known arch events regardless of whether or not
> * KVM/hardware supports a given event.
> */
> nr_arch_events = max_t(typeof(nr_arch_events), nr_arch_events,
> NR_INTEL_ARCH_EVENTS);
/facepalm
That's hilariously obvious in hindsight.
> I would provide a patch to fix this.
Testing "unsupproted" arch events is intentional. The idea is to validate that
KVM programs the requested event selector irrespective of whether or not the
architectural event is *enumerated* to the guest (old KVM incorrectly "filtered"
such events).
The flaw in the test is that it doesn't check if the architectural event is
supported in *hardware* when validating the count. But it's still desirable to
program the event selector in that case, i.e. only the validation of the count
should be skipped. Diff at the bottom to address this (needs to be spread
over multiple patches).
> BTW, currently KVM doesn't check if user space sets a valid pmu version in
> kvm_check_cpuid(). The user space could set KVM a PMU version which is
> larger than KVM supported maximum PMU version, just like currently
> pmu_counters_test does. This is not correct.
It's "correct" in the sense that KVM typically doesn't restrict what userspace
enumerates to the guest through CPUID. KVM needs to protect the host against
doing bad things based on a funky guest CPUID, e.g. KVM needs t
> I originally intent to fix this with the mediated vPMU patch series, but It
> looks we can send the patches just with this fix together, then the issue can
> be fixed earlier.
I suspect that if there's a flaw in KVM, it only affects the mediated PMU. Because
perf manages MSRs/hardware with the current PMU, advertising a bogus PMU version
to the guest is "fine" because even if KVM thinks it's legal for the *guest* to
write MSRs that don't exist in hardware, KVM/perf will never try to propagate the
guest values to non-existent hardware.
diff --git a/tools/testing/selftests/kvm/x86/pmu_counters_test.c b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
index accd7ecd3e5f..124051ea50be 100644
--- a/tools/testing/selftests/kvm/x86/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
@@ -29,10 +29,60 @@
/* Total number of instructions retired within the measured section. */
#define NUM_INSNS_RETIRED (NUM_LOOPS * NUM_INSNS_PER_LOOP + NUM_EXTRA_INSNS)
+/* Track which architectural events are supported by hardware. */
+static uint32_t hardware_pmu_arch_events;
static uint8_t kvm_pmu_version;
static bool kvm_has_perf_caps;
+
+#define X86_PMU_FEATURE_NULL \
+({ \
+ struct kvm_x86_pmu_feature feature = {}; \
+ \
+ feature; \
+})
+
+static bool pmu_is_null_feature(struct kvm_x86_pmu_feature event)
+{
+ return !(*(u64 *)&event);
+}
+
+struct kvm_intel_pmu_event {
+ struct kvm_x86_pmu_feature gp_event;
+ struct kvm_x86_pmu_feature fixed_event;
+};
+
+/*
+ * Wrap the array to appease the compiler, as the macros used to construct each
+ * kvm_x86_pmu_feature use syntax that's only valid in function scope, and the
+ * compiler often thinks the feature definitions aren't compile-time constants.
+ */
+static struct kvm_intel_pmu_event intel_event_to_feature(uint8_t idx)
+{
+ const struct kvm_intel_pmu_event __intel_event_to_feature[] = {
+ [INTEL_ARCH_CPU_CYCLES_INDEX] = { X86_PMU_FEATURE_CPU_CYCLES, X86_PMU_FEATURE_CPU_CYCLES_FIXED },
+ [INTEL_ARCH_INSTRUCTIONS_RETIRED_INDEX] = { X86_PMU_FEATURE_INSNS_RETIRED, X86_PMU_FEATURE_INSNS_RETIRED_FIXED },
+ /*
+ * Note, the fixed counter for reference cycles is NOT the same as the
+ * general purpose architectural event. The fixed counter explicitly
+ * counts at the same frequency as the TSC, whereas the GP event counts
+ * at a fixed, but uarch specific, frequency. Bundle them here for
+ * simplicity.
+ */
+ [INTEL_ARCH_REFERENCE_CYCLES_INDEX] = { X86_PMU_FEATURE_REFERENCE_CYCLES, X86_PMU_FEATURE_REFERENCE_TSC_CYCLES_FIXED },
+ [INTEL_ARCH_LLC_REFERENCES_INDEX] = { X86_PMU_FEATURE_LLC_REFERENCES, X86_PMU_FEATURE_NULL },
+ [INTEL_ARCH_LLC_MISSES_INDEX] = { X86_PMU_FEATURE_LLC_MISSES, X86_PMU_FEATURE_NULL },
+ [INTEL_ARCH_BRANCHES_RETIRED_INDEX] = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED, X86_PMU_FEATURE_NULL },
+ [INTEL_ARCH_BRANCHES_MISPREDICTED_INDEX] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED, X86_PMU_FEATURE_NULL },
+ [INTEL_ARCH_TOPDOWN_SLOTS_INDEX] = { X86_PMU_FEATURE_TOPDOWN_SLOTS, X86_PMU_FEATURE_TOPDOWN_SLOTS_FIXED },
+ };
+
+ kvm_static_assert(ARRAY_SIZE(__intel_event_to_feature) == NR_INTEL_ARCH_EVENTS);
+
+ return __intel_event_to_feature[idx];
+}
+
static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
void *guest_code,
uint8_t pmu_version,
@@ -42,6 +92,7 @@ static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
vm = vm_create_with_one_vcpu(vcpu, guest_code);
sync_global_to_guest(vm, kvm_pmu_version);
+ sync_global_to_guest(vm, hardware_pmu_arch_events);
/*
* Set PERF_CAPABILITIES before PMU version as KVM disallows enabling
@@ -98,14 +149,12 @@ static uint8_t guest_get_pmu_version(void)
* Sanity check that in all cases, the event doesn't count when it's disabled,
* and that KVM correctly emulates the write of an arbitrary value.
*/
-static void guest_assert_event_count(uint8_t idx,
- struct kvm_x86_pmu_feature event,
- uint32_t pmc, uint32_t pmc_msr)
+static void guest_assert_event_count(uint8_t idx, uint32_t pmc, uint32_t pmc_msr)
{
uint64_t count;
count = _rdpmc(pmc);
- if (!this_pmu_has(event))
+ if (!(hardware_pmu_arch_events & BIT(idx)))
goto sanity_checks;
switch (idx) {
@@ -126,7 +175,9 @@ static void guest_assert_event_count(uint8_t idx,
GUEST_ASSERT_NE(count, 0);
break;
case INTEL_ARCH_TOPDOWN_SLOTS_INDEX:
- GUEST_ASSERT(count >= NUM_INSNS_RETIRED);
+ __GUEST_ASSERT(count < NUM_INSNS_RETIRED,
+ "Expected top-down slots >= %u, got count = %lu",
+ NUM_INSNS_RETIRED, count);
break;
default:
break;
@@ -173,7 +224,7 @@ do { \
); \
} while (0)
-#define GUEST_TEST_EVENT(_idx, _event, _pmc, _pmc_msr, _ctrl_msr, _value, FEP) \
+#define GUEST_TEST_EVENT(_idx, _pmc, _pmc_msr, _ctrl_msr, _value, FEP) \
do { \
wrmsr(_pmc_msr, 0); \
\
@@ -184,54 +235,20 @@ do { \
else \
GUEST_MEASURE_EVENT(_ctrl_msr, _value, "nop", FEP); \
\
- guest_assert_event_count(_idx, _event, _pmc, _pmc_msr); \
+ guest_assert_event_count(_idx, _pmc, _pmc_msr); \
} while (0)
-static void __guest_test_arch_event(uint8_t idx, struct kvm_x86_pmu_feature event,
- uint32_t pmc, uint32_t pmc_msr,
+static void __guest_test_arch_event(uint8_t idx, uint32_t pmc, uint32_t pmc_msr,
uint32_t ctrl_msr, uint64_t ctrl_msr_value)
{
- GUEST_TEST_EVENT(idx, event, pmc, pmc_msr, ctrl_msr, ctrl_msr_value, "");
+ GUEST_TEST_EVENT(idx, pmc, pmc_msr, ctrl_msr, ctrl_msr_value, "");
if (is_forced_emulation_enabled)
- GUEST_TEST_EVENT(idx, event, pmc, pmc_msr, ctrl_msr, ctrl_msr_value, KVM_FEP);
-}
-
-#define X86_PMU_FEATURE_NULL \
-({ \
- struct kvm_x86_pmu_feature feature = {}; \
- \
- feature; \
-})
-
-static bool pmu_is_null_feature(struct kvm_x86_pmu_feature event)
-{
- return !(*(u64 *)&event);
+ GUEST_TEST_EVENT(idx, pmc, pmc_msr, ctrl_msr, ctrl_msr_value, KVM_FEP);
}
static void guest_test_arch_event(uint8_t idx)
{
- const struct {
- struct kvm_x86_pmu_feature gp_event;
- struct kvm_x86_pmu_feature fixed_event;
- } intel_event_to_feature[] = {
- [INTEL_ARCH_CPU_CYCLES_INDEX] = { X86_PMU_FEATURE_CPU_CYCLES, X86_PMU_FEATURE_CPU_CYCLES_FIXED },
- [INTEL_ARCH_INSTRUCTIONS_RETIRED_INDEX] = { X86_PMU_FEATURE_INSNS_RETIRED, X86_PMU_FEATURE_INSNS_RETIRED_FIXED },
- /*
- * Note, the fixed counter for reference cycles is NOT the same
- * as the general purpose architectural event. The fixed counter
- * explicitly counts at the same frequency as the TSC, whereas
- * the GP event counts at a fixed, but uarch specific, frequency.
- * Bundle them here for simplicity.
- */
- [INTEL_ARCH_REFERENCE_CYCLES_INDEX] = { X86_PMU_FEATURE_REFERENCE_CYCLES, X86_PMU_FEATURE_REFERENCE_TSC_CYCLES_FIXED },
- [INTEL_ARCH_LLC_REFERENCES_INDEX] = { X86_PMU_FEATURE_LLC_REFERENCES, X86_PMU_FEATURE_NULL },
- [INTEL_ARCH_LLC_MISSES_INDEX] = { X86_PMU_FEATURE_LLC_MISSES, X86_PMU_FEATURE_NULL },
- [INTEL_ARCH_BRANCHES_RETIRED_INDEX] = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED, X86_PMU_FEATURE_NULL },
- [INTEL_ARCH_BRANCHES_MISPREDICTED_INDEX] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED, X86_PMU_FEATURE_NULL },
- [INTEL_ARCH_TOPDOWN_SLOTS_INDEX] = { X86_PMU_FEATURE_TOPDOWN_SLOTS, X86_PMU_FEATURE_TOPDOWN_SLOTS_FIXED },
- };
-
uint32_t nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
uint32_t pmu_version = guest_get_pmu_version();
/* PERF_GLOBAL_CTRL exists only for Architectural PMU Version 2+. */
@@ -249,7 +266,7 @@ static void guest_test_arch_event(uint8_t idx)
else
base_pmc_msr = MSR_IA32_PERFCTR0;
- gp_event = intel_event_to_feature[idx].gp_event;
+ gp_event = intel_event_to_feature(idx).gp_event;
GUEST_ASSERT_EQ(idx, gp_event.f.bit);
GUEST_ASSERT(nr_gp_counters);
@@ -263,14 +280,14 @@ static void guest_test_arch_event(uint8_t idx)
if (guest_has_perf_global_ctrl)
wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, BIT_ULL(i));
- __guest_test_arch_event(idx, gp_event, i, base_pmc_msr + i,
+ __guest_test_arch_event(idx, i, base_pmc_msr + i,
MSR_P6_EVNTSEL0 + i, eventsel);
}
if (!guest_has_perf_global_ctrl)
return;
- fixed_event = intel_event_to_feature[idx].fixed_event;
+ fixed_event = intel_event_to_feature(idx).fixed_event;
if (pmu_is_null_feature(fixed_event) || !this_pmu_has(fixed_event))
return;
@@ -278,7 +295,7 @@ static void guest_test_arch_event(uint8_t idx)
wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, FIXED_PMC_CTRL(i, FIXED_PMC_KERNEL));
- __guest_test_arch_event(idx, fixed_event, i | INTEL_RDPMC_FIXED,
+ __guest_test_arch_event(idx, i | INTEL_RDPMC_FIXED,
MSR_CORE_PERF_FIXED_CTR0 + i,
MSR_CORE_PERF_GLOBAL_CTRL,
FIXED_PMC_GLOBAL_CTRL_ENABLE(i));
@@ -546,7 +563,6 @@ static void test_fixed_counters(uint8_t pmu_version, uint64_t perf_capabilities,
static void test_intel_counters(void)
{
- uint8_t nr_arch_events = kvm_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
uint8_t nr_fixed_counters = kvm_cpu_property(X86_PROPERTY_PMU_NR_FIXED_COUNTERS);
uint8_t nr_gp_counters = kvm_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
uint8_t pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
@@ -568,18 +584,26 @@ static void test_intel_counters(void)
/*
* Detect the existence of events that aren't supported by selftests.
- * This will (obviously) fail any time the kernel adds support for a
- * new event, but it's worth paying that price to keep the test fresh.
+ * This will (obviously) fail any time hardware adds support for a new
+ * event, but it's worth paying that price to keep the test fresh.
*/
- TEST_ASSERT(nr_arch_events <= NR_INTEL_ARCH_EVENTS,
+ TEST_ASSERT(this_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH) <= NR_INTEL_ARCH_EVENTS,
"New architectural event(s) detected; please update this test (length = %u, mask = %x)",
- nr_arch_events, kvm_cpu_property(X86_PROPERTY_PMU_EVENTS_MASK));
+ this_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH),
+ this_cpu_property(X86_PROPERTY_PMU_EVENTS_MASK));
/*
- * Force iterating over known arch events regardless of whether or not
- * KVM/hardware supports a given event.
+ * Iterate over known arch events irrespective of KVM/hardware support
+ * to verify that KVM doesn't reject programming of events just because
+ * the *architectural* encoding is unsupported. Track which events are
+ * supported in hardware; the guest side will validate supported events
+ * count correctly, even if *enumeration* of the event is unsupported
+ * by KVM and/or isn't exposed to the guest.
*/
- nr_arch_events = max_t(typeof(nr_arch_events), nr_arch_events, NR_INTEL_ARCH_EVENTS);
+ for (i = 0; i < NR_INTEL_ARCH_EVENTS; i++) {
+ if (this_pmu_has(intel_event_to_feature(i).gp_event))
+ hardware_pmu_arch_events |= BIT(i);
+ }
for (v = 0; v <= max_pmu_version; v++) {
for (i = 0; i < ARRAY_SIZE(perf_caps); i++) {
@@ -595,8 +619,8 @@ static void test_intel_counters(void)
* vector length.
*/
if (v == pmu_version) {
- for (k = 1; k < (BIT(nr_arch_events) - 1); k++)
- test_arch_events(v, perf_caps[i], nr_arch_events, k);
+ for (k = 1; k < (BIT(NR_INTEL_ARCH_EVENTS) - 1); k++)
+ test_arch_events(v, perf_caps[i], NR_INTEL_ARCH_EVENTS, k);
}
/*
* Test single bits for all PMU version and lengths up
@@ -605,11 +629,11 @@ static void test_intel_counters(void)
* host length). Explicitly test a mask of '0' and all
* ones i.e. all events being available and unavailable.
*/
- for (j = 0; j <= nr_arch_events + 1; j++) {
+ for (j = 0; j <= NR_INTEL_ARCH_EVENTS + 1; j++) {
test_arch_events(v, perf_caps[i], j, 0);
test_arch_events(v, perf_caps[i], j, 0xff);
- for (k = 0; k < nr_arch_events; k++)
+ for (k = 0; k < NR_INTEL_ARCH_EVENTS; k++)
test_arch_events(v, perf_caps[i], j, BIT(k));
}
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [linux-next:master] [KVM] 7803339fa9: kernel-selftests.kvm.pmu_counters_test.fail
2025-01-17 17:11 ` Sean Christopherson
@ 2025-01-20 2:02 ` Mi, Dapeng
2025-01-21 16:13 ` Sean Christopherson
0 siblings, 1 reply; 8+ messages in thread
From: Mi, Dapeng @ 2025-01-20 2:02 UTC (permalink / raw)
To: Sean Christopherson
Cc: kernel test robot, g, oe-lkp, lkp, Maxim Levitsky, kvm,
xudong.hao
On 1/18/2025 1:11 AM, Sean Christopherson wrote:
> On Fri, Jan 17, 2025, Dapeng Mi wrote:
>> On 1/15/2025 10:44 AM, Mi, Dapeng wrote:
>>> On 1/15/2025 3:47 AM, Sean Christopherson wrote:
>>>>> # Testing fixed counters, PMU version 0, perf_caps = 2000
>>>>> # Testing arch events, PMU version 1, perf_caps = 0
>>>>> # ==== Test Assertion Failure ====
>>>>> # x86/pmu_counters_test.c:129: count >= (10 * 4 + 5)
>>>>> # pid=6278 tid=6278 errno=4 - Interrupted system call
>>>>> # 1 0x0000000000411281: assert_on_unhandled_exception at processor.c:625
>>>>> # 2 0x00000000004075d4: _vcpu_run at kvm_util.c:1652
>>>>> # 3 (inlined by) vcpu_run at kvm_util.c:1663
>>>>> # 4 0x0000000000402c5e: run_vcpu at pmu_counters_test.c:62
>>>>> # 5 0x0000000000402e4d: test_arch_events at pmu_counters_test.c:315
>>>>> # 6 0x0000000000402663: test_arch_events at pmu_counters_test.c:304
>>>>> # 7 (inlined by) test_intel_counters at pmu_counters_test.c:609
>>>>> # 8 (inlined by) main at pmu_counters_test.c:642
>>>>> # 9 0x00007f3b134f9249: ?? ??:0
>>>>> # 10 0x00007f3b134f9304: ?? ??:0
>>>>> # 11 0x0000000000402900: _start at ??:?
>>>>> # count >= NUM_INSNS_RETIRED
>>>> The failure is on top-down slots. I modified the assert to actually print the
>>>> count (I'll make sure to post a patch regardless of where this goes), and based
>>>> on the count for failing vs. passing, I'm pretty sure the issue is not the extra
>>>> instruction, but instead is due to changing the target of the CLFUSH from the
>>>> address of the code to the address of kvm_pmu_version.
>>>>
>>>> However, I think the blame lies with the assertion itself, i.e. with commit
>>>> 4a447b135e45 ("KVM: selftests: Test top-down slots event in x86's pmu_counters_test").
>>>> Either that or top-down slots is broken on the Lakes.
>>>>
>>>> By my rudimentary measurements, tying the number of available slots to the number
>>>> of instructions *retired* is fundamentally flawed. E.g. on the Lakes (SKX is more
>>>> or less identical to CLX), omitting the CLFLUSHOPT entirely results in *more*
>>>> slots being available throughout the lifetime of the measured section.
>>>>
>>>> My best guess is that flushing the cache line use for the data load causes the
>>>> backend to saturate its slots with prefetching data, and as a result the number
>>>> of slots that are available goes down.
>>>>
>>>> CLFLUSHOPT . | CLFLUSHOPT [%m] | NOP
>>>> CLX 350-100 | 20-60[*] | 135-150
>>>> SPR 49000-57000 | 32500-41000 | 6760-6830
>>>>
>>>> [*] CLX had a few outliers in the 200-400 range, but the majority of runs were
>>>> in the 20-60 range.
>>>>
>>>> Reading through more (and more and more) of the TMA documentation, I don't think
>>>> we can assume anything about the number of available slots, beyond a very basic
>>>> assertion that it's practically impossible for there to never be an available
>>>> slot. IIUC, retiring an instruction does NOT require an available slot, rather
>>>> it requires the opposite: an occupied slot for the uop(s).
>>> I'm not quite sure about this. IIUC, retiring an instruction may not need a
>>> cycle, but it needs a slot at least except the instruction is macro-fused.
>>> Anyway, let me double check with our micro-architecture and perf experts.
>> Hi Sean,
>>
>> Just double check with our perf experts, the understanding that "retiring
>> an instruction needs a slot at least except the instruction is macro-fused"
>> is correct. The reason of this error is that the architectural topdown
>> slots event is just introduced from Ice lake platform and it's not
>> supported on skylake and cascade lake platforms. On these earlier platforms
>> the event 0x01a4 is another event which counts different thing instead of
>> topdown slots. On these earlier platforms, the slots event is derived from
>> 0x3c event.
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/events/intel/core.c#n466
>>
>>
>> I don't understand why current pmu_counters_test code would validate an
>> architectural event which KVM (HW) doesn't support, it's not reasonable and
>> could cause misleading.
>>
>> /*
>> * Force iterating over known arch events regardless of whether or not
>> * KVM/hardware supports a given event.
>> */
>> nr_arch_events = max_t(typeof(nr_arch_events), nr_arch_events,
>> NR_INTEL_ARCH_EVENTS);
> /facepalm
>
> That's hilariously obvious in hindsight.
>
>> I would provide a patch to fix this.
> Testing "unsupproted" arch events is intentional. The idea is to validate that
> KVM programs the requested event selector irrespective of whether or not the
> architectural event is *enumerated* to the guest (old KVM incorrectly "filtered"
> such events).
>
> The flaw in the test is that it doesn't check if the architectural event is
> supported in *hardware* when validating the count. But it's still desirable to
> program the event selector in that case, i.e. only the validation of the count
> should be skipped. Diff at the bottom to address this (needs to be spread
> over multiple patches).
I see. Thanks for explaining the history.
>
>> BTW, currently KVM doesn't check if user space sets a valid pmu version in
>> kvm_check_cpuid(). The user space could set KVM a PMU version which is
>> larger than KVM supported maximum PMU version, just like currently
>> pmu_counters_test does. This is not correct.
> It's "correct" in the sense that KVM typically doesn't restrict what userspace
> enumerates to the guest through CPUID. KVM needs to protect the host against
> doing bad things based on a funky guest CPUID, e.g. KVM needs t
>
>> I originally intent to fix this with the mediated vPMU patch series, but It
>> looks we can send the patches just with this fix together, then the issue can
>> be fixed earlier.
> I suspect that if there's a flaw in KVM, it only affects the mediated PMU. Because
> perf manages MSRs/hardware with the current PMU, advertising a bogus PMU version
> to the guest is "fine" because even if KVM thinks it's legal for the *guest* to
> write MSRs that don't exist in hardware, KVM/perf will never try to propagate the
> guest values to non-existent hardware.
Hmm, yeah, I think you're right. In theory, it should only impact mediated
vPMU since perf-based vPMU doesn't manipulate HW directly. Thanks.
>
> diff --git a/tools/testing/selftests/kvm/x86/pmu_counters_test.c b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> index accd7ecd3e5f..124051ea50be 100644
> --- a/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> +++ b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> @@ -29,10 +29,60 @@
> /* Total number of instructions retired within the measured section. */
> #define NUM_INSNS_RETIRED (NUM_LOOPS * NUM_INSNS_PER_LOOP + NUM_EXTRA_INSNS)
>
> +/* Track which architectural events are supported by hardware. */
> +static uint32_t hardware_pmu_arch_events;
>
> static uint8_t kvm_pmu_version;
> static bool kvm_has_perf_caps;
>
> +
> +#define X86_PMU_FEATURE_NULL \
> +({ \
> + struct kvm_x86_pmu_feature feature = {}; \
> + \
> + feature; \
> +})
> +
> +static bool pmu_is_null_feature(struct kvm_x86_pmu_feature event)
> +{
> + return !(*(u64 *)&event);
> +}
> +
> +struct kvm_intel_pmu_event {
> + struct kvm_x86_pmu_feature gp_event;
> + struct kvm_x86_pmu_feature fixed_event;
> +};
> +
> +/*
> + * Wrap the array to appease the compiler, as the macros used to construct each
> + * kvm_x86_pmu_feature use syntax that's only valid in function scope, and the
> + * compiler often thinks the feature definitions aren't compile-time constants.
> + */
> +static struct kvm_intel_pmu_event intel_event_to_feature(uint8_t idx)
> +{
> + const struct kvm_intel_pmu_event __intel_event_to_feature[] = {
> + [INTEL_ARCH_CPU_CYCLES_INDEX] = { X86_PMU_FEATURE_CPU_CYCLES, X86_PMU_FEATURE_CPU_CYCLES_FIXED },
> + [INTEL_ARCH_INSTRUCTIONS_RETIRED_INDEX] = { X86_PMU_FEATURE_INSNS_RETIRED, X86_PMU_FEATURE_INSNS_RETIRED_FIXED },
> + /*
> + * Note, the fixed counter for reference cycles is NOT the same as the
> + * general purpose architectural event. The fixed counter explicitly
> + * counts at the same frequency as the TSC, whereas the GP event counts
> + * at a fixed, but uarch specific, frequency. Bundle them here for
> + * simplicity.
> + */
> + [INTEL_ARCH_REFERENCE_CYCLES_INDEX] = { X86_PMU_FEATURE_REFERENCE_CYCLES, X86_PMU_FEATURE_REFERENCE_TSC_CYCLES_FIXED },
> + [INTEL_ARCH_LLC_REFERENCES_INDEX] = { X86_PMU_FEATURE_LLC_REFERENCES, X86_PMU_FEATURE_NULL },
> + [INTEL_ARCH_LLC_MISSES_INDEX] = { X86_PMU_FEATURE_LLC_MISSES, X86_PMU_FEATURE_NULL },
> + [INTEL_ARCH_BRANCHES_RETIRED_INDEX] = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED, X86_PMU_FEATURE_NULL },
> + [INTEL_ARCH_BRANCHES_MISPREDICTED_INDEX] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED, X86_PMU_FEATURE_NULL },
> + [INTEL_ARCH_TOPDOWN_SLOTS_INDEX] = { X86_PMU_FEATURE_TOPDOWN_SLOTS, X86_PMU_FEATURE_TOPDOWN_SLOTS_FIXED },
> + };
> +
> + kvm_static_assert(ARRAY_SIZE(__intel_event_to_feature) == NR_INTEL_ARCH_EVENTS);
> +
> + return __intel_event_to_feature[idx];
> +}
> +
> static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
> void *guest_code,
> uint8_t pmu_version,
> @@ -42,6 +92,7 @@ static struct kvm_vm *pmu_vm_create_with_one_vcpu(struct kvm_vcpu **vcpu,
>
> vm = vm_create_with_one_vcpu(vcpu, guest_code);
> sync_global_to_guest(vm, kvm_pmu_version);
> + sync_global_to_guest(vm, hardware_pmu_arch_events);
>
> /*
> * Set PERF_CAPABILITIES before PMU version as KVM disallows enabling
> @@ -98,14 +149,12 @@ static uint8_t guest_get_pmu_version(void)
> * Sanity check that in all cases, the event doesn't count when it's disabled,
> * and that KVM correctly emulates the write of an arbitrary value.
> */
> -static void guest_assert_event_count(uint8_t idx,
> - struct kvm_x86_pmu_feature event,
> - uint32_t pmc, uint32_t pmc_msr)
> +static void guest_assert_event_count(uint8_t idx, uint32_t pmc, uint32_t pmc_msr)
> {
> uint64_t count;
>
> count = _rdpmc(pmc);
> - if (!this_pmu_has(event))
> + if (!(hardware_pmu_arch_events & BIT(idx)))
> goto sanity_checks;
>
> switch (idx) {
> @@ -126,7 +175,9 @@ static void guest_assert_event_count(uint8_t idx,
> GUEST_ASSERT_NE(count, 0);
> break;
> case INTEL_ARCH_TOPDOWN_SLOTS_INDEX:
> - GUEST_ASSERT(count >= NUM_INSNS_RETIRED);
> + __GUEST_ASSERT(count < NUM_INSNS_RETIRED,
shouldn't be "__GUEST_ASSERT(count >= NUM_INSNS_RETIRED," ?
> + "Expected top-down slots >= %u, got count = %lu",
> + NUM_INSNS_RETIRED, count);
> break;
> default:
> break;
> @@ -173,7 +224,7 @@ do { \
> ); \
> } while (0)
>
> -#define GUEST_TEST_EVENT(_idx, _event, _pmc, _pmc_msr, _ctrl_msr, _value, FEP) \
> +#define GUEST_TEST_EVENT(_idx, _pmc, _pmc_msr, _ctrl_msr, _value, FEP) \
> do { \
> wrmsr(_pmc_msr, 0); \
> \
> @@ -184,54 +235,20 @@ do { \
> else \
> GUEST_MEASURE_EVENT(_ctrl_msr, _value, "nop", FEP); \
> \
> - guest_assert_event_count(_idx, _event, _pmc, _pmc_msr); \
> + guest_assert_event_count(_idx, _pmc, _pmc_msr); \
> } while (0)
>
> -static void __guest_test_arch_event(uint8_t idx, struct kvm_x86_pmu_feature event,
> - uint32_t pmc, uint32_t pmc_msr,
> +static void __guest_test_arch_event(uint8_t idx, uint32_t pmc, uint32_t pmc_msr,
> uint32_t ctrl_msr, uint64_t ctrl_msr_value)
> {
> - GUEST_TEST_EVENT(idx, event, pmc, pmc_msr, ctrl_msr, ctrl_msr_value, "");
> + GUEST_TEST_EVENT(idx, pmc, pmc_msr, ctrl_msr, ctrl_msr_value, "");
>
> if (is_forced_emulation_enabled)
> - GUEST_TEST_EVENT(idx, event, pmc, pmc_msr, ctrl_msr, ctrl_msr_value, KVM_FEP);
> -}
> -
> -#define X86_PMU_FEATURE_NULL \
> -({ \
> - struct kvm_x86_pmu_feature feature = {}; \
> - \
> - feature; \
> -})
> -
> -static bool pmu_is_null_feature(struct kvm_x86_pmu_feature event)
> -{
> - return !(*(u64 *)&event);
> + GUEST_TEST_EVENT(idx, pmc, pmc_msr, ctrl_msr, ctrl_msr_value, KVM_FEP);
> }
>
> static void guest_test_arch_event(uint8_t idx)
> {
> - const struct {
> - struct kvm_x86_pmu_feature gp_event;
> - struct kvm_x86_pmu_feature fixed_event;
> - } intel_event_to_feature[] = {
> - [INTEL_ARCH_CPU_CYCLES_INDEX] = { X86_PMU_FEATURE_CPU_CYCLES, X86_PMU_FEATURE_CPU_CYCLES_FIXED },
> - [INTEL_ARCH_INSTRUCTIONS_RETIRED_INDEX] = { X86_PMU_FEATURE_INSNS_RETIRED, X86_PMU_FEATURE_INSNS_RETIRED_FIXED },
> - /*
> - * Note, the fixed counter for reference cycles is NOT the same
> - * as the general purpose architectural event. The fixed counter
> - * explicitly counts at the same frequency as the TSC, whereas
> - * the GP event counts at a fixed, but uarch specific, frequency.
> - * Bundle them here for simplicity.
> - */
> - [INTEL_ARCH_REFERENCE_CYCLES_INDEX] = { X86_PMU_FEATURE_REFERENCE_CYCLES, X86_PMU_FEATURE_REFERENCE_TSC_CYCLES_FIXED },
> - [INTEL_ARCH_LLC_REFERENCES_INDEX] = { X86_PMU_FEATURE_LLC_REFERENCES, X86_PMU_FEATURE_NULL },
> - [INTEL_ARCH_LLC_MISSES_INDEX] = { X86_PMU_FEATURE_LLC_MISSES, X86_PMU_FEATURE_NULL },
> - [INTEL_ARCH_BRANCHES_RETIRED_INDEX] = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED, X86_PMU_FEATURE_NULL },
> - [INTEL_ARCH_BRANCHES_MISPREDICTED_INDEX] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED, X86_PMU_FEATURE_NULL },
> - [INTEL_ARCH_TOPDOWN_SLOTS_INDEX] = { X86_PMU_FEATURE_TOPDOWN_SLOTS, X86_PMU_FEATURE_TOPDOWN_SLOTS_FIXED },
> - };
> -
> uint32_t nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
> uint32_t pmu_version = guest_get_pmu_version();
> /* PERF_GLOBAL_CTRL exists only for Architectural PMU Version 2+. */
> @@ -249,7 +266,7 @@ static void guest_test_arch_event(uint8_t idx)
> else
> base_pmc_msr = MSR_IA32_PERFCTR0;
>
> - gp_event = intel_event_to_feature[idx].gp_event;
> + gp_event = intel_event_to_feature(idx).gp_event;
> GUEST_ASSERT_EQ(idx, gp_event.f.bit);
>
> GUEST_ASSERT(nr_gp_counters);
> @@ -263,14 +280,14 @@ static void guest_test_arch_event(uint8_t idx)
> if (guest_has_perf_global_ctrl)
> wrmsr(MSR_CORE_PERF_GLOBAL_CTRL, BIT_ULL(i));
>
> - __guest_test_arch_event(idx, gp_event, i, base_pmc_msr + i,
> + __guest_test_arch_event(idx, i, base_pmc_msr + i,
> MSR_P6_EVNTSEL0 + i, eventsel);
> }
>
> if (!guest_has_perf_global_ctrl)
> return;
>
> - fixed_event = intel_event_to_feature[idx].fixed_event;
> + fixed_event = intel_event_to_feature(idx).fixed_event;
> if (pmu_is_null_feature(fixed_event) || !this_pmu_has(fixed_event))
> return;
>
> @@ -278,7 +295,7 @@ static void guest_test_arch_event(uint8_t idx)
>
> wrmsr(MSR_CORE_PERF_FIXED_CTR_CTRL, FIXED_PMC_CTRL(i, FIXED_PMC_KERNEL));
>
> - __guest_test_arch_event(idx, fixed_event, i | INTEL_RDPMC_FIXED,
> + __guest_test_arch_event(idx, i | INTEL_RDPMC_FIXED,
> MSR_CORE_PERF_FIXED_CTR0 + i,
> MSR_CORE_PERF_GLOBAL_CTRL,
> FIXED_PMC_GLOBAL_CTRL_ENABLE(i));
> @@ -546,7 +563,6 @@ static void test_fixed_counters(uint8_t pmu_version, uint64_t perf_capabilities,
>
> static void test_intel_counters(void)
> {
> - uint8_t nr_arch_events = kvm_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH);
> uint8_t nr_fixed_counters = kvm_cpu_property(X86_PROPERTY_PMU_NR_FIXED_COUNTERS);
> uint8_t nr_gp_counters = kvm_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
> uint8_t pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
> @@ -568,18 +584,26 @@ static void test_intel_counters(void)
>
> /*
> * Detect the existence of events that aren't supported by selftests.
> - * This will (obviously) fail any time the kernel adds support for a
> - * new event, but it's worth paying that price to keep the test fresh.
> + * This will (obviously) fail any time hardware adds support for a new
> + * event, but it's worth paying that price to keep the test fresh.
> */
> - TEST_ASSERT(nr_arch_events <= NR_INTEL_ARCH_EVENTS,
> + TEST_ASSERT(this_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH) <= NR_INTEL_ARCH_EVENTS,
> "New architectural event(s) detected; please update this test (length = %u, mask = %x)",
> - nr_arch_events, kvm_cpu_property(X86_PROPERTY_PMU_EVENTS_MASK));
> + this_cpu_property(X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH),
> + this_cpu_property(X86_PROPERTY_PMU_EVENTS_MASK));
>
> /*
> - * Force iterating over known arch events regardless of whether or not
> - * KVM/hardware supports a given event.
> + * Iterate over known arch events irrespective of KVM/hardware support
> + * to verify that KVM doesn't reject programming of events just because
> + * the *architectural* encoding is unsupported. Track which events are
> + * supported in hardware; the guest side will validate supported events
> + * count correctly, even if *enumeration* of the event is unsupported
> + * by KVM and/or isn't exposed to the guest.
> */
> - nr_arch_events = max_t(typeof(nr_arch_events), nr_arch_events, NR_INTEL_ARCH_EVENTS);
> + for (i = 0; i < NR_INTEL_ARCH_EVENTS; i++) {
> + if (this_pmu_has(intel_event_to_feature(i).gp_event))
> + hardware_pmu_arch_events |= BIT(i);
> + }
>
> for (v = 0; v <= max_pmu_version; v++) {
> for (i = 0; i < ARRAY_SIZE(perf_caps); i++) {
> @@ -595,8 +619,8 @@ static void test_intel_counters(void)
> * vector length.
> */
> if (v == pmu_version) {
> - for (k = 1; k < (BIT(nr_arch_events) - 1); k++)
> - test_arch_events(v, perf_caps[i], nr_arch_events, k);
> + for (k = 1; k < (BIT(NR_INTEL_ARCH_EVENTS) - 1); k++)
> + test_arch_events(v, perf_caps[i], NR_INTEL_ARCH_EVENTS, k);
> }
> /*
> * Test single bits for all PMU version and lengths up
> @@ -605,11 +629,11 @@ static void test_intel_counters(void)
> * host length). Explicitly test a mask of '0' and all
> * ones i.e. all events being available and unavailable.
> */
> - for (j = 0; j <= nr_arch_events + 1; j++) {
> + for (j = 0; j <= NR_INTEL_ARCH_EVENTS + 1; j++) {
> test_arch_events(v, perf_caps[i], j, 0);
> test_arch_events(v, perf_caps[i], j, 0xff);
>
> - for (k = 0; k < nr_arch_events; k++)
> + for (k = 0; k < NR_INTEL_ARCH_EVENTS; k++)
> test_arch_events(v, perf_caps[i], j, BIT(k));
> }
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linux-next:master] [KVM] 7803339fa9: kernel-selftests.kvm.pmu_counters_test.fail
2025-01-20 2:02 ` Mi, Dapeng
@ 2025-01-21 16:13 ` Sean Christopherson
2025-01-22 1:26 ` Mi, Dapeng
0 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2025-01-21 16:13 UTC (permalink / raw)
To: Dapeng Mi
Cc: kernel test robot, g, oe-lkp, lkp, Maxim Levitsky, kvm,
xudong.hao
On Mon, Jan 20, 2025, Dapeng Mi wrote:
> On 1/18/2025 1:11 AM, Sean Christopherson wrote:
> > @@ -98,14 +149,12 @@ static uint8_t guest_get_pmu_version(void)
> > * Sanity check that in all cases, the event doesn't count when it's disabled,
> > * and that KVM correctly emulates the write of an arbitrary value.
> > */
> > -static void guest_assert_event_count(uint8_t idx,
> > - struct kvm_x86_pmu_feature event,
> > - uint32_t pmc, uint32_t pmc_msr)
> > +static void guest_assert_event_count(uint8_t idx, uint32_t pmc, uint32_t pmc_msr)
> > {
> > uint64_t count;
> >
> > count = _rdpmc(pmc);
> > - if (!this_pmu_has(event))
> > + if (!(hardware_pmu_arch_events & BIT(idx)))
> > goto sanity_checks;
> >
> > switch (idx) {
> > @@ -126,7 +175,9 @@ static void guest_assert_event_count(uint8_t idx,
> > GUEST_ASSERT_NE(count, 0);
> > break;
> > case INTEL_ARCH_TOPDOWN_SLOTS_INDEX:
> > - GUEST_ASSERT(count >= NUM_INSNS_RETIRED);
> > + __GUEST_ASSERT(count < NUM_INSNS_RETIRED,
>
> shouldn't be "__GUEST_ASSERT(count >= NUM_INSNS_RETIRED," ?
Yes. I had intentionally inverted the check to verify the assert message and
forgot to flip it back before hitting "send". Thankfully, I didn't forget before
posting formally[*]. Ugh, but I did forget to Cc you on that series, sorry :-/
[*] https://lore.kernel.org/all/20250117234204.2600624-6-seanjc@google.com
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linux-next:master] [KVM] 7803339fa9: kernel-selftests.kvm.pmu_counters_test.fail
2025-01-21 16:13 ` Sean Christopherson
@ 2025-01-22 1:26 ` Mi, Dapeng
0 siblings, 0 replies; 8+ messages in thread
From: Mi, Dapeng @ 2025-01-22 1:26 UTC (permalink / raw)
To: Sean Christopherson
Cc: kernel test robot, g, oe-lkp, lkp, Maxim Levitsky, kvm,
xudong.hao
On 1/22/2025 12:13 AM, Sean Christopherson wrote:
> On Mon, Jan 20, 2025, Dapeng Mi wrote:
>> On 1/18/2025 1:11 AM, Sean Christopherson wrote:
>>> @@ -98,14 +149,12 @@ static uint8_t guest_get_pmu_version(void)
>>> * Sanity check that in all cases, the event doesn't count when it's disabled,
>>> * and that KVM correctly emulates the write of an arbitrary value.
>>> */
>>> -static void guest_assert_event_count(uint8_t idx,
>>> - struct kvm_x86_pmu_feature event,
>>> - uint32_t pmc, uint32_t pmc_msr)
>>> +static void guest_assert_event_count(uint8_t idx, uint32_t pmc, uint32_t pmc_msr)
>>> {
>>> uint64_t count;
>>>
>>> count = _rdpmc(pmc);
>>> - if (!this_pmu_has(event))
>>> + if (!(hardware_pmu_arch_events & BIT(idx)))
>>> goto sanity_checks;
>>>
>>> switch (idx) {
>>> @@ -126,7 +175,9 @@ static void guest_assert_event_count(uint8_t idx,
>>> GUEST_ASSERT_NE(count, 0);
>>> break;
>>> case INTEL_ARCH_TOPDOWN_SLOTS_INDEX:
>>> - GUEST_ASSERT(count >= NUM_INSNS_RETIRED);
>>> + __GUEST_ASSERT(count < NUM_INSNS_RETIRED,
>> shouldn't be "__GUEST_ASSERT(count >= NUM_INSNS_RETIRED," ?
> Yes. I had intentionally inverted the check to verify the assert message and
> forgot to flip it back before hitting "send". Thankfully, I didn't forget before
> posting formally[*]. Ugh, but I did forget to Cc you on that series, sorry :-/
>
> [*] https://lore.kernel.org/all/20250117234204.2600624-6-seanjc@google.com
Good to know. Thanks.
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-01-22 1:26 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-14 2:40 [linux-next:master] [KVM] 7803339fa9: kernel-selftests.kvm.pmu_counters_test.fail kernel test robot
2025-01-14 19:47 ` Sean Christopherson
2025-01-15 2:44 ` Mi, Dapeng
2025-01-17 3:04 ` Mi, Dapeng
2025-01-17 17:11 ` Sean Christopherson
2025-01-20 2:02 ` Mi, Dapeng
2025-01-21 16:13 ` Sean Christopherson
2025-01-22 1:26 ` Mi, Dapeng
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox