* [Bug 218739] New: pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
From: bugzilla-daemon @ 2024-04-17 14:29 UTC
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218739
Bug ID: 218739
Summary: pmu_counters_test kvm-selftest fails with (count !=
NUM_INSNS_RETIRED)
Product: Virtualization
Version: unspecified
Hardware: Intel
OS: Linux
Status: NEW
Severity: normal
Priority: P3
Component: kvm
Assignee: virtualization_kvm@kernel-bugs.osdl.org
Reporter: jarichte@redhat.com
Regression: No
Environment:
CPU Architecture: x86_64, Intel(R) Atom(TM) CPU C2750 @ 2.40GHz
Host OS: Fedora Rawhide
Host kernel: Linux Kernel 6.9.0-rc3
gcc: gcc (GCC) 14.0.1
Host kernel source: https://git.kernel.org/pub/scm/virt/kvm/kvm.git
Branch: master
Commit: 1c3bed8006691f485156153778192864c9d8e14f
Bug Detailed Description:
Assertion failure executing kvm selftest pmu_counters_test.
Reproducing Steps:
git clone https://git.kernel.org/pub/scm/virt/kvm/kvm.git
cd kvm && make headers_install
cd tools/testing/selftests/kvm && make
cd x86_64 && ./pmu_counters_test
Actual Result:
Testing arch events, PMU version 0, perf_caps = 0
Testing GP counters, PMU version 0, perf_caps = 0
Testing fixed counters, PMU version 0, perf_caps = 0
Testing arch events, PMU version 0, perf_caps = 2000
Testing GP counters, PMU version 0, perf_caps = 2000
Testing fixed counters, PMU version 0, perf_caps = 2000
Testing arch events, PMU version 1, perf_caps = 0
==== Test Assertion Failure ====
x86_64/pmu_counters_test.c:107: count == NUM_INSNS_RETIRED
pid=51128 tid=51128 errno=4 - Interrupted system call
1 0x0000000000402c7d: run_vcpu at pmu_counters_test.c:61
2 0x0000000000402ead: test_arch_events at pmu_counters_test.c:307
3 0x0000000000402674: test_arch_events at pmu_counters_test.c:296
4 (inlined by) test_intel_counters at pmu_counters_test.c:601
5 (inlined by) main at pmu_counters_test.c:635
6 0x00007f78bd1981c7: ?? ??:0
7 0x00007f78bd19828a: ?? ??:0
8 0x0000000000402924: _start at ??:?
0x12 != 0x11 (count != NUM_INSNS_RETIRED)
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
* [Bug 218739] pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
From: bugzilla-daemon @ 2024-04-23 0:21 UTC
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218739
Dongli Zhang (dongli.zhang@oracle.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dongli.zhang@oracle.com
--- Comment #1 from Dongli Zhang (dongli.zhang@oracle.com) ---
Perhaps pmu_counters_test could print more information in the future, e.g., msr and msr_ctrl, their values, whether clflush was used, and whether forced emulation was enabled?
Just from the output, the number of instructions counted by GUEST_MEASURE_EVENT() (0x12 = 18) does not match NUM_INSNS_RETIRED (0x11 = 17).
------------------------
I have tried on an Icelake server and could not reproduce the failure most of the time; it failed only once, as shown below.
# ./pmu_counters_test
Testing arch events, PMU version 0, perf_caps = 0
Testing GP counters, PMU version 0, perf_caps = 0
Testing fixed counters, PMU version 0, perf_caps = 0
Testing arch events, PMU version 0, perf_caps = 2000
Testing GP counters, PMU version 0, perf_caps = 2000
Testing fixed counters, PMU version 0, perf_caps = 2000
Testing arch events, PMU version 1, perf_caps = 0
Testing GP counters, PMU version 1, perf_caps = 0
Testing fixed counters, PMU version 1, perf_caps = 0
Testing arch events, PMU version 1, perf_caps = 2000
Testing GP counters, PMU version 1, perf_caps = 2000
Testing fixed counters, PMU version 1, perf_caps = 2000
Testing arch events, PMU version 2, perf_caps = 0
Testing GP counters, PMU version 2, perf_caps = 0
Testing fixed counters, PMU version 2, perf_caps = 0
Testing arch events, PMU version 2, perf_caps = 2000
Testing GP counters, PMU version 2, perf_caps = 2000
Testing fixed counters, PMU version 2, perf_caps = 2000
Testing arch events, PMU version 3, perf_caps = 0
Testing GP counters, PMU version 3, perf_caps = 0
Testing fixed counters, PMU version 3, perf_caps = 0
Testing arch events, PMU version 3, perf_caps = 2000
Testing GP counters, PMU version 3, perf_caps = 2000
Testing fixed counters, PMU version 3, perf_caps = 2000
Testing arch events, PMU version 4, perf_caps = 0
==== Test Assertion Failure ====
x86_64/pmu_counters_test.c:120: count != 0
pid=39696 tid=39696 errno=4 - Interrupted system call
1 0x0000000000402baf: run_vcpu at pmu_counters_test.c:61
2 0x0000000000402ddd: test_arch_events at pmu_counters_test.c:307
3 0x0000000000402683: test_arch_events at pmu_counters_test.c:605
4 (inlined by) test_intel_counters at pmu_counters_test.c:605
5 (inlined by) main at pmu_counters_test.c:635
6 0x00007fcfeb43ae44: ?? ??:0
7 0x000000000040288d: _start at ??:?
0x0 == 0x0 (count == 0)
# cat /sys/module/kvm/parameters/enable_pmu
Y
# cat /sys/module/kvm/parameters/force_emulation_prefix
0
# cpuid -l 0xa -1
CPU:
   Architecture Performance Monitoring Features (0xa):
      version ID = 0x5 (5)
      number of counters per logical processor = 0x8 (8)
      bit width of counter = 0x30 (48)
      length of EBX bit vector = 0x8 (8)
      core cycle event = available
      instruction retired event = available
      reference cycles event = available
      last-level cache ref event = available
      last-level cache miss event = available
      branch inst retired event = available
      branch mispred retired event = available
      top-down slots event = available
      fixed counter 0 supported = true
      fixed counter 1 supported = true
      fixed counter 2 supported = true
      fixed counter 3 supported = true
      fixed counter 4 supported = false
      fixed counter 5 supported = false
      fixed counter 6 supported = false
      fixed counter 7 supported = false
      fixed counter 8 supported = false
      fixed counter 9 supported = false
      fixed counter 10 supported = false
      fixed counter 11 supported = false
      fixed counter 12 supported = false
      fixed counter 13 supported = false
      fixed counter 14 supported = false
      fixed counter 15 supported = false
      fixed counter 16 supported = false
      fixed counter 17 supported = false
      fixed counter 18 supported = false
      fixed counter 19 supported = false
      fixed counter 20 supported = false
      fixed counter 21 supported = false
      fixed counter 22 supported = false
      fixed counter 23 supported = false
      fixed counter 24 supported = false
      fixed counter 25 supported = false
      fixed counter 26 supported = false
      fixed counter 27 supported = false
      fixed counter 28 supported = false
      fixed counter 29 supported = false
      fixed counter 30 supported = false
      fixed counter 31 supported = false
      number of contiguous fixed counters = 0x4 (4)
      bit width of fixed counters = 0x30 (48)
      anythread deprecation = true
-------------------------------------------
I also ran the test in a nested L1 hypervisor (on older hardware).
Most of the time it passes, except once:
# ./pmu_counters_test
Testing arch events, PMU version 0, perf_caps = 0
Testing GP counters, PMU version 0, perf_caps = 0
Testing fixed counters, PMU version 0, perf_caps = 0
Testing arch events, PMU version 0, perf_caps = 2000
Testing GP counters, PMU version 0, perf_caps = 2000
Testing fixed counters, PMU version 0, perf_caps = 2000
Testing arch events, PMU version 1, perf_caps = 0
Testing GP counters, PMU version 1, perf_caps = 0
Testing fixed counters, PMU version 1, perf_caps = 0
Testing arch events, PMU version 1, perf_caps = 2000
Testing GP counters, PMU version 1, perf_caps = 2000
Testing fixed counters, PMU version 1, perf_caps = 2000
Testing arch events, PMU version 2, perf_caps = 0
Testing GP counters, PMU version 2, perf_caps = 0
Testing fixed counters, PMU version 2, perf_caps = 0
Testing arch events, PMU version 2, perf_caps = 2000
Testing GP counters, PMU version 2, perf_caps = 2000
Testing fixed counters, PMU version 2, perf_caps = 2000
Testing arch events, PMU version 3, perf_caps = 0
==== Test Assertion Failure ====
x86_64/pmu_counters_test.c:120: count != 0
pid=9301 tid=9301 errno=4 - Interrupted system call
1 0x0000000000402bdf: run_vcpu at pmu_counters_test.c:61
2 0x0000000000402dfd: test_arch_events at pmu_counters_test.c:307
3 0x00000000004026a3: test_arch_events at pmu_counters_test.c:605
4 (inlined by) test_intel_counters at pmu_counters_test.c:605
5 (inlined by) main at pmu_counters_test.c:635
6 0x00007f05e2f60d8f: ?? ??:0
7 0x00007f05e2f60e3f: ?? ??:0
8 0x00000000004028b4: _start at ??:?
0x0 == 0x0 (count == 0)
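To make the raw `cpuid -l 0xa` dump in Comment #1 easier to compare across machines, here is a minimal C sketch that decodes CPUID leaf 0xA register values. It assumes the field layout documented in the Intel SDM (EAX: version, GP counter count and width, EBX vector length; EBX: a set bit means the event is NOT available; ECX/EDX: fixed-counter enumeration). The struct and function names are hypothetical and not part of pmu_counters_test. The example register values below are reconstructed from the printed fields, not captured from the machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of CPUID leaf 0xA (Architectural Performance Monitoring). */
struct pmu_caps {
	unsigned int version;      /* EAX[7:0]   */
	unsigned int nr_gp;        /* EAX[15:8]  GP counters per logical CPU */
	unsigned int gp_width;     /* EAX[23:16] GP counter bit width */
	unsigned int ebx_len;      /* EAX[31:24] valid bits in the EBX vector */
	unsigned int nr_fixed;     /* EDX[4:0]   contiguous fixed counters */
	unsigned int fixed_width;  /* EDX[12:5]  fixed counter bit width */
	bool anythread_deprecated; /* EDX[15] */
};

static struct pmu_caps decode_leaf_0xa(uint32_t eax, uint32_t edx)
{
	struct pmu_caps c = {
		.version = eax & 0xff,
		.nr_gp = (eax >> 8) & 0xff,
		.gp_width = (eax >> 16) & 0xff,
		.ebx_len = (eax >> 24) & 0xff,
		.nr_fixed = edx & 0x1f,
		.fixed_width = (edx >> 5) & 0xff,
		.anythread_deprecated = (edx >> 15) & 1,
	};
	return c;
}

/*
 * EBX enumerates *unavailable* events: a set bit means "not available".
 * Bits at or beyond ebx_len are not enumerated at all.
 */
static bool arch_event_available(uint32_t ebx, unsigned int ebx_len,
				 unsigned int bit)
{
	return bit < ebx_len && !(ebx & (1u << bit));
}

/* Fixed counter i is supported if ECX[i] is set or i < EDX[4:0]. */
static bool fixed_counter_supported(uint32_t ecx, unsigned int nr_fixed,
				    unsigned int i)
{
	return (i < 32 && (ecx & (1u << i))) || i < nr_fixed;
}
```

For example, `decode_leaf_0xa(0x08300805, 0x00008604)` yields version 5, 8 GP counters of 48 bits, an EBX length of 8, 4 fixed counters of 48 bits and AnyThread deprecated, matching the Icelake output above. On real hardware the registers could be read with GCC's `__get_cpuid(0xa, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`.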
* [Bug 218739] pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
From: bugzilla-daemon @ 2024-05-27 18:19 UTC
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218739
mlevitsk@redhat.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mlevitsk@redhat.com
--- Comment #2 from mlevitsk@redhat.com ---
Adding my 2 cents:
I also see this test fail sometimes (about once per hour of continuous running). In my case it fails on the 'count != 0' assert for the INTEL_ARCH_LLC_MISSES_INDEX event, and only for this event.
The reason, IMHO, is that it is possible to have 0 LLC misses if the cache is large enough and the code has run for enough iterations.
I wasn't able to make the test fail for any other reason (so far I have only tested the non-nested case; nested, this test also fails sometimes).
Best regards,
Maxim Levitsky
* [Bug 218739] pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
From: bugzilla-daemon @ 2024-05-28 17:20 UTC
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218739
--- Comment #3 from Sean Christopherson (seanjc@google.com) ---
On Mon, May 27, 2024, bugzilla-daemon@kernel.org wrote:
> I also see this test fail sometimes (once per hour or so of continuous
> running)
> and in my case it fails because 'count != 0' assert on
> INTEL_ARCH_LLC_MISSES_INDEX event and only for this event.
>
> The reason is IMHO, is that it is possible to have 0 LLC misses if the cache
> is large enough and code was run for enough iterations.
The test does CLFLUSH{,OPT} on its future code sequence after enabling the counter. In theory, that should guarantee an LLC miss. Hmm, but this SDM blurb about speculative loads makes me think past me was wrong:
  (that is, data can be speculatively loaded into a cache line just before, during, or after the execution of a CLFLUSH instruction that references the cache line).
* [Bug 218739] pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
From: bugzilla-daemon @ 2024-06-10 19:22 UTC
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218739
--- Comment #4 from mlevitsk@redhat.com ---
I did some more testing:
1. I double-checked that INTEL_ARCH_LLC_MISSES_INDEX is the only event that fails: the test survived a whole night with only this event commented out.
2. Using CLFLUSH instead of CLFLUSHOPT doesn't help.
Should we disable this event for now to avoid the failure, until we figure out how to use it reliably?
Best regards,
Maxim Levitsky
PS: I also did initial testing of running this test nested: it fails with invalid guest state in L1, but only sometimes. I'll investigate that further soon.
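Comment #4 asks whether to disable the LLC-misses event until it can be measured reliably. One hypothetical way to express that, without dropping the event from the run entirely, is to keep programming it but waive only its "count != 0" check, as sketched below. The enum names and the mask are invented for illustration (ordered like the CPUID.0AH:EBX event bits listed in Comment #1); they are not the selftest's identifiers, and the sketch covers only the non-zero check, not the exact instructions-retired comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event indices, in CPUID.0AH:EBX bit order. */
enum arch_event {
	EV_CORE_CYCLES,       /* bit 0 */
	EV_INSNS_RETIRED,     /* bit 1 */
	EV_REF_CYCLES,        /* bit 2 */
	EV_LLC_REFS,          /* bit 3 */
	EV_LLC_MISSES,        /* bit 4 */
	EV_BRANCHES_RETIRED,  /* bit 5 */
	EV_BRANCHES_MISPRED,  /* bit 6 */
	EV_TOPDOWN_SLOTS,     /* bit 7 */
};

/* Events whose count may legitimately be zero (see Comments #2 and #3). */
static const uint32_t may_be_zero_mask = 1u << EV_LLC_MISSES;

/* Returns true if a measured count passes the non-zero check. */
static bool count_ok(enum arch_event ev, uint64_t count)
{
	if (may_be_zero_mask & (1u << ev))
		return true;	/* Any value, including 0, is accepted. */
	return count != 0;
}
```

The design choice here is to narrow the waiver to one event via a mask, so that a zero count for any other event (for example the top-down slots failure mentioned in Comment #5) would still be reported.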
* [Bug 218739] pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
From: bugzilla-daemon @ 2024-06-20 21:28 UTC
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218739
--- Comment #5 from mlevitsk@redhat.com ---
I tested several approaches to eliminate the issue, but none of them seems very robust. In particular:
- I tried to CLFLUSH a global memory location outside of the loop and then access it. 0 LLC misses still happen once in a while.
- I also tried to access a location on the stack. Here the test sometimes started failing on INTEL_ARCH_TOPDOWN_SLOTS_INDEX instead; I am not sure why. I did a push/pop; maybe the ucode is smart enough to elide this?
I have now found a new, more or less robust solution: CLFLUSH on each loop iteration. That both increases the chance of at least one CLFLUSH working and should also confuse the speculation logic enough. It survived about 4 hours of testing.
I attached a draft patch with this solution; if you think it is reasonable, I'll send it to LKML.
Note that I dropped the MFENCE instruction, reasoning that it doesn't help much: it orders memory loads/stores, while we CLFLUSH memory that is fetched for code execution.
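The difference between flushing once before the measured loop (the approach Comment #3 describes) and flushing on every iteration (Comment #5) can be sketched in user space as below. This is a hypothetical analogue, not the selftest's guest code or the attached patch: it assumes an x86-64 build with SSE2 (`_mm_clflush` from `<emmintrin.h>`) and flushes the cache line holding each function's own entry point. It does not read an LLC-miss counter, which would require a PMU.

```c
#include <assert.h>
#include <emmintrin.h>	/* _mm_clflush (SSE2) */
#include <stdint.h>

/* Variant 1: a single flush before the loop, so one chance to miss. */
static uint64_t __attribute__((noinline)) flush_once(uint64_t iters)
{
	volatile uint64_t done = 0;	/* volatile keeps the loop from folding */

	/* Evict the line holding this function's first instruction bytes. */
	_mm_clflush((const void *)(uintptr_t)&flush_once);
	for (uint64_t i = 0; i < iters; i++)
		done++;
	return done;
}

/*
 * Variant 2: flush on every iteration. Per Comment #5, repeating the
 * flush gives many chances for at least one flush to produce a miss,
 * even if a speculative refill undoes some of them.
 */
static uint64_t __attribute__((noinline)) flush_every_iteration(uint64_t iters)
{
	volatile uint64_t done = 0;

	for (uint64_t i = 0; i < iters; i++) {
		_mm_clflush((const void *)(uintptr_t)&flush_every_iteration);
		done++;
	}
	return done;
}
```

Both functions simply return `iters`; the interesting difference would only show up in an LLC-miss counter sampled around the loop, which this sketch deliberately leaves out.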
* [Bug 218739] pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
From: bugzilla-daemon @ 2024-06-20 21:35 UTC
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218739
--- Comment #6 from mlevitsk@redhat.com ---
Created attachment 306480
  --> https://bugzilla.kernel.org/attachment.cgi?id=306480&action=edit
Patch to do CLFLUSH on each iteration of the loop
* [Bug 218739] pmu_counters_test kvm-selftest fails with (count != NUM_INSNS_RETIRED)
From: bugzilla-daemon @ 2024-06-21 15:14 UTC
To: kvm
https://bugzilla.kernel.org/show_bug.cgi?id=218739
--- Comment #7 from mlevitsk@redhat.com ---
I ran the test overnight: not a single failure.