* [Bug 218739] New: pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
@ 2024-04-17 14:29 bugzilla-daemon
2024-04-23 0:21 ` [Bug 218739] " bugzilla-daemon
` (6 more replies)
0 siblings, 7 replies; 9+ messages in thread
From: bugzilla-daemon @ 2024-04-17 14:29 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218739
Bug ID: 218739
Summary: pmu_counters_test kvm-selftest fails with (count !=
NUM_INSNS_RETIRED)
Product: Virtualization
Version: unspecified
Hardware: Intel
OS: Linux
Status: NEW
Severity: normal
Priority: P3
Component: kvm
Assignee: virtualization_kvm@kernel-bugs.osdl.org
Reporter: jarichte@redhat.com
Regression: No
Environment:
CPU Architecture: x86_64, Intel(R) Atom(TM) CPU C2750 @ 2.40GHz
Host OS: Fedora Rawhide
Host kernel: Linux Kernel 6.9.0-rc3
gcc: gcc (GCC) 14.0.1
Host kernel source: https://git.kernel.org/pub/scm/virt/kvm/kvm.git
Branch: master
Commit: 1c3bed8006691f485156153778192864c9d8e14f
Bug Detailed Description:
Assertion failure executing kvm selftest pmu_counters_test.
Reproducing Steps:
git clone https://git.kernel.org/pub/scm/virt/kvm/kvm.git
cd kvm && make headers_install
cd tools/testing/selftests/kvm && make
cd x86_64 && ./pmu_counters_test
Actual Result:
Testing arch events, PMU version 0, perf_caps = 0
Testing GP counters, PMU version 0, perf_caps = 0
Testing fixed counters, PMU version 0, perf_caps = 0
Testing arch events, PMU version 0, perf_caps = 2000
Testing GP counters, PMU version 0, perf_caps = 2000
Testing fixed counters, PMU version 0, perf_caps = 2000
Testing arch events, PMU version 1, perf_caps = 0
==== Test Assertion Failure ====
x86_64/pmu_counters_test.c:107: count == NUM_INSNS_RETIRED
pid=51128 tid=51128 errno=4 - Interrupted system call
1 0x0000000000402c7d: run_vcpu at pmu_counters_test.c:61
2 0x0000000000402ead: test_arch_events at pmu_counters_test.c:307
3 0x0000000000402674: test_arch_events at pmu_counters_test.c:296
4 (inlined by) test_intel_counters at pmu_counters_test.c:601
5 (inlined by) main at pmu_counters_test.c:635
6 0x00007f78bd1981c7: ?? ??:0
7 0x00007f78bd19828a: ?? ??:0
8 0x0000000000402924: _start at ??:?
0x12 != 0x11 (count != NUM_INSNS_RETIRED)
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
* [Bug 218739] pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
2024-04-17 14:29 [Bug 218739] New: pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED) bugzilla-daemon
@ 2024-04-23 0:21 ` bugzilla-daemon
2024-05-27 18:19 ` bugzilla-daemon
` (5 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2024-04-23 0:21 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218739
Dongli Zhang (dongli.zhang@oracle.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |dongli.zhang@oracle.com
--- Comment #1 from Dongli Zhang (dongli.zhang@oracle.com) ---
Perhaps pmu_counters_test could print more information in the future,
e.g., msr, msr_ctrl, their values, clflush, and whether forced emulation is used?
From the output alone, we can only tell that the number of instructions counted
by GUEST_MEASURE_EVENT() (0x12 = 18) does not match NUM_INSNS_RETIRED=17.
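As an illustration of the kind of extra output suggested above, here is a
minimal sketch of a formatter for a richer mismatch message. The helper name
format_count_mismatch and its parameters are hypothetical and are not part of
the selftest; the real test would have to supply the MSR index and the flags
from its own state.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not in pmu_counters_test.c): format one diagnostic
 * line describing a counter mismatch. Returns the snprintf() result, i.e.
 * the number of characters the full message needs.
 */
static int format_count_mismatch(char *buf, size_t len, uint32_t msr,
                                 uint64_t count, uint64_t expected,
                                 int used_clflush, int forced_emulation)
{
    assert(buf != NULL && len > 0);

    return snprintf(buf, len,
                    "msr=0x%" PRIx32 " count=0x%" PRIx64
                    " expected=0x%" PRIx64 " delta=%" PRId64
                    " clflush=%d forced_emulation=%d",
                    msr, count, expected, (int64_t)(count - expected),
                    used_clflush, forced_emulation);
}
```

For the failure in the original report, count=0x12 and expected=0x11 would be
reported with delta=1, which makes it obvious that exactly one extra
instruction was counted.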
------------------------
I tried on an Icelake server and could not reproduce the failure most of the
time; I hit the failure below only once.
# ./pmu_counters_test
Testing arch events, PMU version 0, perf_caps = 0
Testing GP counters, PMU version 0, perf_caps = 0
Testing fixed counters, PMU version 0, perf_caps = 0
Testing arch events, PMU version 0, perf_caps = 2000
Testing GP counters, PMU version 0, perf_caps = 2000
Testing fixed counters, PMU version 0, perf_caps = 2000
Testing arch events, PMU version 1, perf_caps = 0
Testing GP counters, PMU version 1, perf_caps = 0
Testing fixed counters, PMU version 1, perf_caps = 0
Testing arch events, PMU version 1, perf_caps = 2000
Testing GP counters, PMU version 1, perf_caps = 2000
Testing fixed counters, PMU version 1, perf_caps = 2000
Testing arch events, PMU version 2, perf_caps = 0
Testing GP counters, PMU version 2, perf_caps = 0
Testing fixed counters, PMU version 2, perf_caps = 0
Testing arch events, PMU version 2, perf_caps = 2000
Testing GP counters, PMU version 2, perf_caps = 2000
Testing fixed counters, PMU version 2, perf_caps = 2000
Testing arch events, PMU version 3, perf_caps = 0
Testing GP counters, PMU version 3, perf_caps = 0
Testing fixed counters, PMU version 3, perf_caps = 0
Testing arch events, PMU version 3, perf_caps = 2000
Testing GP counters, PMU version 3, perf_caps = 2000
Testing fixed counters, PMU version 3, perf_caps = 2000
Testing arch events, PMU version 4, perf_caps = 0
==== Test Assertion Failure ====
x86_64/pmu_counters_test.c:120: count != 0
pid=39696 tid=39696 errno=4 - Interrupted system call
1 0x0000000000402baf: run_vcpu at pmu_counters_test.c:61
2 0x0000000000402ddd: test_arch_events at pmu_counters_test.c:307
3 0x0000000000402683: test_arch_events at pmu_counters_test.c:605
4 (inlined by) test_intel_counters at pmu_counters_test.c:605
5 (inlined by) main at pmu_counters_test.c:635
6 0x00007fcfeb43ae44: ?? ??:0
7 0x000000000040288d: _start at ??:?
0x0 == 0x0 (count == 0)
# cat /sys/module/kvm/parameters/enable_pmu
Y
# cat /sys/module/kvm/parameters/force_emulation_prefix
0
# cpuid -l 0xa -1
CPU:
Architecture Performance Monitoring Features (0xa):
version ID = 0x5 (5)
number of counters per logical processor = 0x8 (8)
bit width of counter = 0x30 (48)
length of EBX bit vector = 0x8 (8)
core cycle event = available
instruction retired event = available
reference cycles event = available
last-level cache ref event = available
last-level cache miss event = available
branch inst retired event = available
branch mispred retired event = available
top-down slots event = available
fixed counter 0 supported = true
fixed counter 1 supported = true
fixed counter 2 supported = true
fixed counter 3 supported = true
fixed counter 4 supported = false
fixed counter 5 supported = false
fixed counter 6 supported = false
fixed counter 7 supported = false
fixed counter 8 supported = false
fixed counter 9 supported = false
fixed counter 10 supported = false
fixed counter 11 supported = false
fixed counter 12 supported = false
fixed counter 13 supported = false
fixed counter 14 supported = false
fixed counter 15 supported = false
fixed counter 16 supported = false
fixed counter 17 supported = false
fixed counter 18 supported = false
fixed counter 19 supported = false
fixed counter 20 supported = false
fixed counter 21 supported = false
fixed counter 22 supported = false
fixed counter 23 supported = false
fixed counter 24 supported = false
fixed counter 25 supported = false
fixed counter 26 supported = false
fixed counter 27 supported = false
fixed counter 28 supported = false
fixed counter 29 supported = false
fixed counter 30 supported = false
fixed counter 31 supported = false
number of contiguous fixed counters = 0x4 (4)
bit width of fixed counters = 0x30 (48)
anythread deprecation = true
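For readers who want to map the cpuid output above back to raw register
values, here is a small sketch that decodes CPUID leaf 0xA using the field
layout from the Intel SDM. The struct and function names are made up for this
example; note that in EBX a set bit means the event is NOT available.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of CPUID.0xA (names are illustrative, not from KVM). */
struct pmu_leaf {
    uint8_t version;              /* EAX[7:0]   */
    uint8_t nr_gp_counters;       /* EAX[15:8]  */
    uint8_t gp_counter_width;     /* EAX[23:16] */
    uint8_t ebx_vector_len;       /* EAX[31:24] */
    uint32_t unavailable_events;  /* EBX, bit set = event NOT available */
    uint8_t nr_fixed_counters;    /* EDX[4:0]   */
    uint8_t fixed_counter_width;  /* EDX[12:5]  */
    bool anythread_deprecated;    /* EDX[15]    */
};

static struct pmu_leaf decode_pmu_leaf(uint32_t eax, uint32_t ebx, uint32_t edx)
{
    struct pmu_leaf leaf = {
        .version = eax & 0xff,
        .nr_gp_counters = (eax >> 8) & 0xff,
        .gp_counter_width = (eax >> 16) & 0xff,
        .ebx_vector_len = (eax >> 24) & 0xff,
        .unavailable_events = ebx,
        .nr_fixed_counters = edx & 0x1f,
        .fixed_counter_width = (edx >> 5) & 0xff,
        .anythread_deprecated = (edx >> 15) & 1,
    };

    return leaf;
}

/* Event bit 4 is "last-level cache miss", bit 7 is "top-down slots". */
static bool pmu_event_available(const struct pmu_leaf *leaf, unsigned int bit)
{
    if (bit >= 32 || bit >= leaf->ebx_vector_len)
        return false;

    return !(leaf->unavailable_events & (1u << bit));
}
```

With the values printed above (version 5, 8 counters, 48-bit width, 8-bit EBX
vector, all events available), EAX would be 0x08300805 and EBX would be 0.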
-------------------------------------------
I also ran the test on a nested L1 hypervisor (more legacy hardware). Most runs
pass, except for one.
# ./pmu_counters_test
Testing arch events, PMU version 0, perf_caps = 0
Testing GP counters, PMU version 0, perf_caps = 0
Testing fixed counters, PMU version 0, perf_caps = 0
Testing arch events, PMU version 0, perf_caps = 2000
Testing GP counters, PMU version 0, perf_caps = 2000
Testing fixed counters, PMU version 0, perf_caps = 2000
Testing arch events, PMU version 1, perf_caps = 0
Testing GP counters, PMU version 1, perf_caps = 0
Testing fixed counters, PMU version 1, perf_caps = 0
Testing arch events, PMU version 1, perf_caps = 2000
Testing GP counters, PMU version 1, perf_caps = 2000
Testing fixed counters, PMU version 1, perf_caps = 2000
Testing arch events, PMU version 2, perf_caps = 0
Testing GP counters, PMU version 2, perf_caps = 0
Testing fixed counters, PMU version 2, perf_caps = 0
Testing arch events, PMU version 2, perf_caps = 2000
Testing GP counters, PMU version 2, perf_caps = 2000
Testing fixed counters, PMU version 2, perf_caps = 2000
Testing arch events, PMU version 3, perf_caps = 0
==== Test Assertion Failure ====
x86_64/pmu_counters_test.c:120: count != 0
pid=9301 tid=9301 errno=4 - Interrupted system call
1 0x0000000000402bdf: run_vcpu at pmu_counters_test.c:61
2 0x0000000000402dfd: test_arch_events at pmu_counters_test.c:307
3 0x00000000004026a3: test_arch_events at pmu_counters_test.c:605
4 (inlined by) test_intel_counters at pmu_counters_test.c:605
5 (inlined by) main at pmu_counters_test.c:635
6 0x00007f05e2f60d8f: ?? ??:0
7 0x00007f05e2f60e3f: ?? ??:0
8 0x00000000004028b4: _start at ??:?
0x0 == 0x0 (count == 0)
* [Bug 218739] pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
2024-04-17 14:29 [Bug 218739] New: pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED) bugzilla-daemon
2024-04-23 0:21 ` [Bug 218739] " bugzilla-daemon
@ 2024-05-27 18:19 ` bugzilla-daemon
2024-05-28 17:20 ` Sean Christopherson
2024-05-28 17:20 ` bugzilla-daemon
` (4 subsequent siblings)
6 siblings, 1 reply; 9+ messages in thread
From: bugzilla-daemon @ 2024-05-27 18:19 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218739
mlevitsk@redhat.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |mlevitsk@redhat.com
--- Comment #2 from mlevitsk@redhat.com ---
Adding my 2 cents:
I also see this test fail sometimes (about once per hour of continuous running),
and in my case it fails on the 'count != 0' assert for the
INTEL_ARCH_LLC_MISSES_INDEX event, and only for this event.
The reason, IMHO, is that it is possible to have 0 LLC misses if the cache
is large enough and the code has run for enough iterations.
I wasn't able to make the test fail for other reasons (I have only tested the
non-nested case so far; nested, this test also fails sometimes).
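Because the failure shows up only about once per hour, reproducing it needs a
loop around the test binary. A minimal sketch of such a loop follows; the
helper name first_failing_run is hypothetical, and it assumes that a failed
run exits with a non-zero status.

```c
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: run `cmd` up to `max_runs` times via system(3).
 * Returns 0 if every run exited with status 0, otherwise the 1-based
 * index of the first run that failed.
 */
static unsigned int first_failing_run(const char *cmd, unsigned int max_runs)
{
    for (unsigned int run = 1; run <= max_runs; run++) {
        int status = system(cmd);

        /* Treat abnormal termination and non-zero exit codes alike. */
        if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            return run;
    }

    return 0;
}
```

A call such as first_failing_run("./pmu_counters_test", 1000) would then report
which run, if any, hit the assertion.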
Best regards,
Maxim Levitsky
* Re: [Bug 218739] pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
2024-05-27 18:19 ` bugzilla-daemon
@ 2024-05-28 17:20 ` Sean Christopherson
0 siblings, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2024-05-28 17:20 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: kvm
On Mon, May 27, 2024, bugzilla-daemon@kernel.org wrote:
> I also see this test fail sometimes (once per hour or so of continuous running)
> and in my case it fails because 'count != 0' assert on
> INTEL_ARCH_LLC_MISSES_INDEX event and only for this event.
>
> The reason is IMHO, is that it is possible to have 0 LLC misses if the cache
> is large enough and code was run for enough iterations.
The test does CLFLUSH{,OPT} on its future code sequence after enabling the counter.
In theory, that should guarantee an LLC miss.
Hmm, but this SDM blurb about speculative loads makes me think past me was wrong:
(that is, data can be speculatively loaded into a cache line just before, during,
or after the execution of a CLFLUSH instruction that references the cache line).
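To make the "flush once, then expect a miss" idea concrete, here is a
simplified data-side sketch. It is an assumption-laden illustration, not the
selftest's code: the test flushes its own upcoming code sequence, while this
example flushes a data line (probe_line and the function name are invented
here) and uses the SSE2 intrinsics _mm_clflush and _mm_mfence.

```c
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence (SSE2, baseline on x86-64) */
#include <stdint.h>

/* One cache line of data, used only by this illustration. */
static volatile uint64_t probe_line[8] __attribute__((aligned(64)));

/*
 * Flush the line once, then read it `iterations` times. The first read is
 * the only one that is expected to miss, and per the SDM text quoted above
 * even that is not guaranteed, because the line may be speculatively
 * reloaded around the CLFLUSH.
 */
static uint64_t flush_once_then_read(unsigned int iterations)
{
    uint64_t sum = 0;

    probe_line[0] = 1;
    _mm_clflush((const void *)&probe_line[0]);
    _mm_mfence();

    for (unsigned int i = 0; i < iterations; i++)
        sum += probe_line[0];

    return sum;
}
```

The return value only proves that the loads happened; whether an LLC miss was
counted can only be observed through the PMU, which is exactly what makes the
assertion fragile.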
* [Bug 218739] pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
2024-04-17 14:29 [Bug 218739] New: pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED) bugzilla-daemon
2024-04-23 0:21 ` [Bug 218739] " bugzilla-daemon
2024-05-27 18:19 ` bugzilla-daemon
@ 2024-05-28 17:20 ` bugzilla-daemon
2024-06-10 19:22 ` bugzilla-daemon
` (3 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2024-05-28 17:20 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218739
--- Comment #3 from Sean Christopherson (seanjc@google.com) ---
On Mon, May 27, 2024, bugzilla-daemon@kernel.org wrote:
> I also see this test fail sometimes (once per hour or so of continuous
> running)
> and in my case it fails because 'count != 0' assert on
> INTEL_ARCH_LLC_MISSES_INDEX event and only for this event.
>
> The reason is IMHO, is that it is possible to have 0 LLC misses if the cache
> is large enough and code was run for enough iterations.
The test does CLFLUSH{,OPT} on its future code sequence after enabling the
counter.
In theory, that should guarantee an LLC miss.
Hmm, but this SDM blurb about speculative loads makes me think past me was
wrong:
(that is, data can be speculatively loaded into a cache line just before,
during, or after the execution of a CLFLUSH instruction that references the
cache line).
* [Bug 218739] pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
2024-04-17 14:29 [Bug 218739] New: pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED) bugzilla-daemon
` (2 preceding siblings ...)
2024-05-28 17:20 ` bugzilla-daemon
@ 2024-06-10 19:22 ` bugzilla-daemon
2024-06-20 21:28 ` bugzilla-daemon
` (2 subsequent siblings)
6 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2024-06-10 19:22 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218739
--- Comment #4 from mlevitsk@redhat.com ---
I did some more testing:
1. I double-checked that INTEL_ARCH_LLC_MISSES_INDEX is the only event that
fails: the test survived a whole night with only that event commented out.
2. Using CLFLUSH instead of CLFLUSHOPT doesn't help.
Should we disable this event for now to avoid the failure until we figure out
how to use this event in a reliable way?
Best regards,
Maxim Levitsky
PS: I also did initial testing of running this test nested; it fails with
invalid guest state in L1, but only sometimes.
I'll investigate that further soon.
* [Bug 218739] pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
2024-04-17 14:29 [Bug 218739] New: pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED) bugzilla-daemon
` (3 preceding siblings ...)
2024-06-10 19:22 ` bugzilla-daemon
@ 2024-06-20 21:28 ` bugzilla-daemon
2024-06-20 21:35 ` bugzilla-daemon
2024-06-21 15:14 ` bugzilla-daemon
6 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2024-06-20 21:28 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218739
--- Comment #5 from mlevitsk@redhat.com ---
I tested several approaches to eliminate the issue, but none of them seems to be
very robust.
In particular:
- I tried to clflush a global memory location outside of the loop, then access
it. 0 LLC misses still happen once in a while.
- I also tried to access a location on the stack.
Here the test started failing on INTEL_ARCH_TOPDOWN_SLOTS_INDEX sometimes,
and I am not sure why. I used push/pop, so maybe the ucode is smart enough to
elide this?
I have now found a new, more or less robust solution, which is to clflush on
each loop iteration.
That both increases the chance of at least one clflush working and should
also confuse the speculation logic enough.
It survived about 4 hours of testing.
I attached a draft patch with this solution; if you think that it is
reasonable, I'll send it to LKML.
Note that I dropped the mfence instruction, thinking that it doesn't help much:
it orders memory loads/stores, while we clflush memory that is fetched for
code execution.
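The per-iteration approach described above can be sketched roughly as follows.
This is a data-side illustration under my own assumptions, not the attached
patch: probe_line and the function name are invented, and the real test
operates on its measured code sequence rather than on a data line.

```c
#include <emmintrin.h>  /* _mm_clflush (SSE2, baseline on x86-64) */
#include <stdint.h>

/* One cache line of data, used only by this illustration. */
static volatile uint64_t probe_line[8] __attribute__((aligned(64)));

/*
 * Flush the line on every iteration before touching it. Each iteration
 * gets its own chance to miss, so a single speculative reload no longer
 * decides whether the miss count ends up at zero. No mfence is issued,
 * mirroring the choice described in the comment above.
 */
static uint64_t flush_each_iteration(unsigned int iterations)
{
    uint64_t sum = 0;

    probe_line[0] = 1;

    for (unsigned int i = 0; i < iterations; i++) {
        _mm_clflush((const void *)&probe_line[0]);
        sum += probe_line[0];
    }

    return sum;
}
```

The trade-off is that the loop now executes one extra instruction per
iteration, so any expected instruction or branch counts in a real test would
have to account for it.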
* [Bug 218739] pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
2024-04-17 14:29 [Bug 218739] New: pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED) bugzilla-daemon
` (4 preceding siblings ...)
2024-06-20 21:28 ` bugzilla-daemon
@ 2024-06-20 21:35 ` bugzilla-daemon
2024-06-21 15:14 ` bugzilla-daemon
6 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2024-06-20 21:35 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218739
--- Comment #6 from mlevitsk@redhat.com ---
Created attachment 306480
--> https://bugzilla.kernel.org/attachment.cgi?id=306480&action=edit
Patch to do CLFLUSH on each iteration of the loop
* [Bug 218739] pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
2024-04-17 14:29 [Bug 218739] New: pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED) bugzilla-daemon
` (5 preceding siblings ...)
2024-06-20 21:35 ` bugzilla-daemon
@ 2024-06-21 15:14 ` bugzilla-daemon
6 siblings, 0 replies; 9+ messages in thread
From: bugzilla-daemon @ 2024-06-21 15:14 UTC (permalink / raw)
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218739
--- Comment #7 from mlevitsk@redhat.com ---
I ran the test overnight, not a single failure.