* [PATCH 0/1] KVM: selftests: pmu_counters_test: increase robustness of LLC cache misses
@ 2024-06-21 20:43 Maxim Levitsky
2024-06-21 20:43 ` [PATCH 1/1] " Maxim Levitsky
0 siblings, 1 reply; 5+ messages in thread
From: Maxim Levitsky @ 2024-06-21 20:43 UTC (permalink / raw)
To: kvm
Cc: Paolo Bonzini, linux-kselftest, Shuah Khan, linux-kernel,
Sean Christopherson, Maxim Levitsky
Recent CI failures drew my attention to the fact that pmu_counters_test
sometimes fails because it doesn't observe any LLC cache misses.

This apparently happens because the single CLFLUSH can race with speculative
execution, so the flushed cache line may already be back in the cache by the
time the measured loop runs.
To address this, implement more aggressive cache flushing: the cache line is
now flushed on each iteration of the measured loop, which should reduce the
chance of this happening by at least an order of magnitude.
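
For illustration, here is a minimal sketch of what the measured loop looks
like with the per-iteration flush (simplified from the GUEST_MEASURE_EVENT
macro in patch 1; the WRMSRs that enable and disable the counter are omitted,
and RIP-relative addressing is used here for readability, whereas the test
itself references the flushed line with "clflush ."):

static inline void measured_loop(void)
{
	unsigned int count = 10;	/* NUM_BRANCHES in the real test */

	/* Flush the cache line holding the loop itself on every iteration. */
	__asm__ __volatile__("1: clflush 1b(%%rip)\n\t"
			     "loop 1b\n\t"
			     : "+c" (count)
			     :
			     : "memory");
}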
With this patch applied, the test survived more than a day of running in a
loop on a Comet Lake machine, where it used to fail after about 10-20 minutes.
Best regards,
Maxim Levitsky
Maxim Levitsky (1):
KVM: selftests: pmu_counters_test: increase robustness of LLC cache
misses
.../selftests/kvm/x86_64/pmu_counters_test.c | 20 +++++++++----------
1 file changed, 9 insertions(+), 11 deletions(-)
--
2.26.3

* [PATCH 1/1] KVM: selftests: pmu_counters_test: increase robustness of LLC cache misses
  2024-06-21 20:43 [PATCH 0/1] KVM: selftests: pmu_counters_test: increase robustness of LLC cache misses Maxim Levitsky
@ 2024-06-21 20:43 ` Maxim Levitsky
  2024-06-26 16:08   ` Maxim Levitsky
  2024-06-27 17:42   ` Sean Christopherson
  0 siblings, 2 replies; 5+ messages in thread
From: Maxim Levitsky @ 2024-06-21 20:43 UTC (permalink / raw)
To: kvm
Cc: Paolo Bonzini, linux-kselftest, Shuah Khan, linux-kernel,
	Sean Christopherson, Maxim Levitsky

Currently this test does a single CLFLUSH on its memory location
but due to speculative execution this might not cause LLC misses.

Instead, do a cache flush on each loop iteration to confuse the prediction
and make sure that cache misses always occur.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 .../selftests/kvm/x86_64/pmu_counters_test.c | 20 +++++++++----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index 96446134c00b7..ddc0b7e4a888e 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -14,8 +14,8 @@
  * instructions that are needed to set up the loop and then disabled the
  * counter. 1 CLFLUSH/CLFLUSHOPT/NOP, 1 MFENCE, 2 MOV, 2 XOR, 1 WRMSR.
  */
-#define NUM_EXTRA_INSNS	7
-#define NUM_INSNS_RETIRED	(NUM_BRANCHES + NUM_EXTRA_INSNS)
+#define NUM_EXTRA_INSNS	5
+#define NUM_INSNS_RETIRED	(NUM_BRANCHES * 2 + NUM_EXTRA_INSNS)
 
 static uint8_t kvm_pmu_version;
 static bool kvm_has_perf_caps;
@@ -133,9 +133,8 @@ static void guest_assert_event_count(uint8_t idx,
  * doesn't need to be clobbered as the input value, @pmc_msr, is restored
  * before the end of the sequence.
  *
- * If CLFUSH{,OPT} is supported, flush the cacheline containing (at least) the
- * start of the loop to force LLC references and misses, i.e. to allow testing
- * that those events actually count.
+ * If CLFUSH{,OPT} is supported, flush the cacheline containing the CLFUSH{,OPT}
+ * instruction on each loop iteration to ensure that LLC cache misses happen.
  *
  * If forced emulation is enabled (and specified), force emulation on a subset
  * of the measured code to verify that KVM correctly emulates instructions and
@@ -145,10 +144,9 @@ static void guest_assert_event_count(uint8_t idx,
 #define GUEST_MEASURE_EVENT(_msr, _value, clflush, FEP)	\
 do {	\
 	__asm__ __volatile__("wrmsr\n\t"	\
-			     clflush "\n\t"	\
-			     "mfence\n\t"	\
-			     "1: mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t"	\
-			     FEP "loop .\n\t"	\
+			     " mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t"	\
+			     "1: " clflush "\n\t"	\
+			     FEP "loop 1b\n\t"	\
 			     FEP "mov %%edi, %%ecx\n\t"	\
 			     FEP "xor %%eax, %%eax\n\t"	\
 			     FEP "xor %%edx, %%edx\n\t"	\
@@ -163,9 +161,9 @@ do {	\
 	wrmsr(pmc_msr, 0);	\
 	\
 	if (this_cpu_has(X86_FEATURE_CLFLUSHOPT))	\
-		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt 1f", FEP);	\
+		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt .", FEP);	\
 	else if (this_cpu_has(X86_FEATURE_CLFLUSH))	\
-		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflush 1f", FEP);	\
+		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflush .", FEP);	\
 	else	\
 		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "nop", FEP);	\
 	\
-- 
2.26.3
* Re: [PATCH 1/1] KVM: selftests: pmu_counters_test: increase robustness of LLC cache misses
  2024-06-21 20:43 ` [PATCH 1/1] " Maxim Levitsky
@ 2024-06-26 16:08   ` Maxim Levitsky
  0 siblings, 0 replies; 5+ messages in thread
From: Maxim Levitsky @ 2024-06-26 16:08 UTC (permalink / raw)
To: kvm
Cc: Paolo Bonzini, linux-kselftest, Shuah Khan, linux-kernel,
	Sean Christopherson

On Fri, 2024-06-21 at 16:43 -0400, Maxim Levitsky wrote:
> Currently this test does a single CLFLUSH on its memory location
> but due to speculative execution this might not cause LLC misses.
> 
> Instead, do a cache flush on each loop iteration to confuse the prediction
> and make sure that cache misses always occur.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  .../selftests/kvm/x86_64/pmu_counters_test.c | 20 +++++++++----------
>  1 file changed, 9 insertions(+), 11 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> index 96446134c00b7..ddc0b7e4a888e 100644
> --- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> @@ -14,8 +14,8 @@
>   * instructions that are needed to set up the loop and then disabled the
>   * counter. 1 CLFLUSH/CLFLUSHOPT/NOP, 1 MFENCE, 2 MOV, 2 XOR, 1 WRMSR.
>   */
> -#define NUM_EXTRA_INSNS	7
> -#define NUM_INSNS_RETIRED	(NUM_BRANCHES + NUM_EXTRA_INSNS)
> +#define NUM_EXTRA_INSNS	5
> +#define NUM_INSNS_RETIRED	(NUM_BRANCHES * 2 + NUM_EXTRA_INSNS)
>  
>  static uint8_t kvm_pmu_version;
>  static bool kvm_has_perf_caps;
> @@ -133,9 +133,8 @@ static void guest_assert_event_count(uint8_t idx,
>   * doesn't need to be clobbered as the input value, @pmc_msr, is restored
>   * before the end of the sequence.
>   *
> - * If CLFUSH{,OPT} is supported, flush the cacheline containing (at least) the
> - * start of the loop to force LLC references and misses, i.e. to allow testing
> - * that those events actually count.
> + * If CLFUSH{,OPT} is supported, flush the cacheline containing the CLFUSH{,OPT}
> + * instruction on each loop iteration to ensure that LLC cache misses happen.
>   *
>   * If forced emulation is enabled (and specified), force emulation on a subset
>   * of the measured code to verify that KVM correctly emulates instructions and
> @@ -145,10 +144,9 @@ static void guest_assert_event_count(uint8_t idx,
>  #define GUEST_MEASURE_EVENT(_msr, _value, clflush, FEP)	\
>  do {	\
>  	__asm__ __volatile__("wrmsr\n\t"	\
> -			     clflush "\n\t"	\
> -			     "mfence\n\t"	\
> -			     "1: mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t"	\
> -			     FEP "loop .\n\t"	\
> +			     " mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t"	\
> +			     "1: " clflush "\n\t"	\
> +			     FEP "loop 1b\n\t"	\
>  			     FEP "mov %%edi, %%ecx\n\t"	\
>  			     FEP "xor %%eax, %%eax\n\t"	\
>  			     FEP "xor %%edx, %%edx\n\t"	\
> @@ -163,9 +161,9 @@ do {	\
>  	wrmsr(pmc_msr, 0);	\
>  	\
>  	if (this_cpu_has(X86_FEATURE_CLFLUSHOPT))	\
> -		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt 1f", FEP);	\
> +		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt .", FEP);	\
>  	else if (this_cpu_has(X86_FEATURE_CLFLUSH))	\
> -		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflush 1f", FEP);	\
> +		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflush .", FEP);	\
>  	else	\
>  		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "nop", FEP);	\
>  	\

Any update? The test patched with this patch survived about 3 days of
running in a loop.

Best regards,
	Maxim Levitsky
* Re: [PATCH 1/1] KVM: selftests: pmu_counters_test: increase robustness of LLC cache misses
  2024-06-21 20:43 ` [PATCH 1/1] " Maxim Levitsky
  2024-06-26 16:08   ` Maxim Levitsky
@ 2024-06-27 17:42   ` Sean Christopherson
  2024-07-05  2:48     ` Maxim Levitsky
  1 sibling, 1 reply; 5+ messages in thread
From: Sean Christopherson @ 2024-06-27 17:42 UTC (permalink / raw)
To: Maxim Levitsky
Cc: kvm, Paolo Bonzini, linux-kselftest, Shuah Khan, linux-kernel

On Fri, Jun 21, 2024, Maxim Levitsky wrote:
> Currently this test does a single CLFLUSH on its memory location
> but due to speculative execution this might not cause LLC misses.
> 
> Instead, do a cache flush on each loop iteration to confuse the prediction
> and make sure that cache misses always occur.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  .../selftests/kvm/x86_64/pmu_counters_test.c | 20 +++++++++----------
>  1 file changed, 9 insertions(+), 11 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> index 96446134c00b7..ddc0b7e4a888e 100644
> --- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> @@ -14,8 +14,8 @@
>   * instructions that are needed to set up the loop and then disabled the
>   * counter. 1 CLFLUSH/CLFLUSHOPT/NOP, 1 MFENCE, 2 MOV, 2 XOR, 1 WRMSR.
>   */
> -#define NUM_EXTRA_INSNS	7
> -#define NUM_INSNS_RETIRED	(NUM_BRANCHES + NUM_EXTRA_INSNS)
> +#define NUM_EXTRA_INSNS	5
> +#define NUM_INSNS_RETIRED	(NUM_BRANCHES * 2 + NUM_EXTRA_INSNS)

The comment above is stale.  I also think it's worth adding a macro to capture
that the '2' comes from having two instructions in the loop body (three, if we
keep the MFENCE).

>  static uint8_t kvm_pmu_version;
>  static bool kvm_has_perf_caps;
> @@ -133,9 +133,8 @@ static void guest_assert_event_count(uint8_t idx,
>   * doesn't need to be clobbered as the input value, @pmc_msr, is restored
>   * before the end of the sequence.
>   *
> - * If CLFUSH{,OPT} is supported, flush the cacheline containing (at least) the
> - * start of the loop to force LLC references and misses, i.e. to allow testing
> - * that those events actually count.
> + * If CLFUSH{,OPT} is supported, flush the cacheline containing the CLFUSH{,OPT}
> + * instruction on each loop iteration to ensure that LLC cache misses happen.
>   *
>   * If forced emulation is enabled (and specified), force emulation on a subset
>   * of the measured code to verify that KVM correctly emulates instructions and
> @@ -145,10 +144,9 @@ static void guest_assert_event_count(uint8_t idx,
>  #define GUEST_MEASURE_EVENT(_msr, _value, clflush, FEP)	\
>  do {	\
>  	__asm__ __volatile__("wrmsr\n\t"	\
> -			     clflush "\n\t"	\
> -			     "mfence\n\t"	\

Based on your testing, it's probably ok to drop the mfence, but I don't see any
reason to do so.  It's not like that mfence meaningfully affects the runtime,
and anything easy/free we can do to avoid flaky tests is worth doing.

I'll post and apply a v2, with a prep patch to add a NUM_INSNS_PER_LOOP macro
and keep the MFENCE (I'll be offline all of next week, and don't want to push
anything to -next tomorrow, even though the risk of breaking anything is
minimal).

> -			     "1: mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t"	\
> -			     FEP "loop .\n\t"	\
> +			     " mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t"	\
> +			     "1: " clflush "\n\t"	\
> +			     FEP "loop 1b\n\t"	\
>  			     FEP "mov %%edi, %%ecx\n\t"	\
>  			     FEP "xor %%eax, %%eax\n\t"	\
>  			     FEP "xor %%edx, %%edx\n\t"	\
> @@ -163,9 +161,9 @@ do {	\
>  	wrmsr(pmc_msr, 0);	\
>  	\
>  	if (this_cpu_has(X86_FEATURE_CLFLUSHOPT))	\
> -		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt 1f", FEP);	\
> +		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt .", FEP);	\
>  	else if (this_cpu_has(X86_FEATURE_CLFLUSH))	\
> -		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflush 1f", FEP);	\
> +		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflush .", FEP);	\
>  	else	\
>  		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "nop", FEP);	\
>  	\
> -- 
> 2.26.3
> 
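
For reference, one possible shape of the NUM_INSNS_PER_LOOP cleanup described
above (a sketch only, not the actual v2; it assumes the MFENCE is kept inside
the measured loop):

/* CLFLUSH/CLFLUSHOPT/NOP, MFENCE and LOOP are executed on every iteration. */
#define NUM_INSNS_PER_LOOP	3

/* 2 MOV, 2 XOR and 1 WRMSR are executed outside of the loop. */
#define NUM_EXTRA_INSNS		5

#define NUM_INSNS_RETIRED	(NUM_BRANCHES * NUM_INSNS_PER_LOOP + NUM_EXTRA_INSNS)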
* Re: [PATCH 1/1] KVM: selftests: pmu_counters_test: increase robustness of LLC cache misses
  2024-06-27 17:42   ` Sean Christopherson
@ 2024-07-05  2:48     ` Maxim Levitsky
  0 siblings, 0 replies; 5+ messages in thread
From: Maxim Levitsky @ 2024-07-05 2:48 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, Paolo Bonzini, linux-kselftest, Shuah Khan, linux-kernel

On Thu, 2024-06-27 at 10:42 -0700, Sean Christopherson wrote:
> On Fri, Jun 21, 2024, Maxim Levitsky wrote:
> > Currently this test does a single CLFLUSH on its memory location
> > but due to speculative execution this might not cause LLC misses.
> > 
> > Instead, do a cache flush on each loop iteration to confuse the prediction
> > and make sure that cache misses always occur.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  .../selftests/kvm/x86_64/pmu_counters_test.c | 20 +++++++++----------
> >  1 file changed, 9 insertions(+), 11 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> > index 96446134c00b7..ddc0b7e4a888e 100644
> > --- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> > +++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> > @@ -14,8 +14,8 @@
> >   * instructions that are needed to set up the loop and then disabled the
> >   * counter. 1 CLFLUSH/CLFLUSHOPT/NOP, 1 MFENCE, 2 MOV, 2 XOR, 1 WRMSR.
> >   */
> > -#define NUM_EXTRA_INSNS	7
> > -#define NUM_INSNS_RETIRED	(NUM_BRANCHES + NUM_EXTRA_INSNS)
> > +#define NUM_EXTRA_INSNS	5
> > +#define NUM_INSNS_RETIRED	(NUM_BRANCHES * 2 + NUM_EXTRA_INSNS)
> 
> The comment above is stale.  I also think it's worth adding a macro to capture
> that the '2' comes from having two instructions in the loop body (three, if we
> keep the MFENCE).

True, my mistake.

> 
> >  static uint8_t kvm_pmu_version;
> >  static bool kvm_has_perf_caps;
> > @@ -133,9 +133,8 @@ static void guest_assert_event_count(uint8_t idx,
> >   * doesn't need to be clobbered as the input value, @pmc_msr, is restored
> >   * before the end of the sequence.
> >   *
> > - * If CLFUSH{,OPT} is supported, flush the cacheline containing (at least) the
> > - * start of the loop to force LLC references and misses, i.e. to allow testing
> > - * that those events actually count.
> > + * If CLFUSH{,OPT} is supported, flush the cacheline containing the CLFUSH{,OPT}
> > + * instruction on each loop iteration to ensure that LLC cache misses happen.
> >   *
> >   * If forced emulation is enabled (and specified), force emulation on a subset
> >   * of the measured code to verify that KVM correctly emulates instructions and
> > @@ -145,10 +144,9 @@ static void guest_assert_event_count(uint8_t idx,
> >  #define GUEST_MEASURE_EVENT(_msr, _value, clflush, FEP)	\
> >  do {	\
> >  	__asm__ __volatile__("wrmsr\n\t"	\
> > -			     clflush "\n\t"	\
> > -			     "mfence\n\t"	\
> 
> Based on your testing, it's probably ok to drop the mfence, but I don't see any
> reason to do so.  It's not like that mfence meaningfully affects the runtime,
> and anything easy/free we can do to avoid flaky tests is worth doing.

Hi,

I just didn't want to add another instruction to the loop, since in theory
that will slow the test down.

From the PRM:

"Executions of the CLFLUSH instruction are ordered with respect to each other
and with respect to writes, locked read-modify-write instructions, and fence
instructions. 1 They are not ordered with respect to executions of CLFLUSHOPT
and CLWB. Software can use the SFENCE instruction to order an execution of
CLFLUSH relative to one of those operations."

Plus there is a note that:

"Earlier versions of this manual specified that executions of the CLFLUSH
instruction were ordered only by the MFENCE instruction. All processors
implementing the CLFLUSH instruction also order it relative to the other
operations enumerated above."

Here we have an instruction fetch and a cache flush, and it is not clear
whether MFENCE orders these two operations. Thus it is not clear if MFENCE
helps or not.

I honestly would have preferred a cache flush on data memory, followed by a
read from it, except that this also sometimes doesn't work (maybe I made some
mistake, maybe it is possible to make it work, I don't know).

But overall I don't object to keeping it.

> 
> I'll post and apply a v2, with a prep patch to add a NUM_INSNS_PER_LOOP macro
> and keep the MFENCE (I'll be offline all of next week, and don't want to push
> anything to -next tomorrow, even though the risk of breaking anything is
> minimal).

Sounds good.

Best regards,
	Maxim Levitsky

> 
> > -			     "1: mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t"	\
> > -			     FEP "loop .\n\t"	\
> > +			     " mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t"	\
> > +			     "1: " clflush "\n\t"	\
> > +			     FEP "loop 1b\n\t"	\
> >  			     FEP "mov %%edi, %%ecx\n\t"	\
> >  			     FEP "xor %%eax, %%eax\n\t"	\
> >  			     FEP "xor %%edx, %%edx\n\t"	\
> > @@ -163,9 +161,9 @@ do {	\
> >  	wrmsr(pmc_msr, 0);	\
> >  	\
> >  	if (this_cpu_has(X86_FEATURE_CLFLUSHOPT))	\
> > -		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt 1f", FEP);	\
> > +		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflushopt .", FEP);	\
> >  	else if (this_cpu_has(X86_FEATURE_CLFLUSH))	\
> > -		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflush 1f", FEP);	\
> > +		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "clflush .", FEP);	\
> >  	else	\
> >  		GUEST_MEASURE_EVENT(_ctrl_msr, _value, "nop", FEP);	\
> >  	\
> > -- 
> > 2.26.3
> > 
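
For illustration, a rough sketch of the "flush data memory and read it back"
alternative mentioned above (hypothetical code, not part of this patch; the
buffer name is made up, and the extra MFENCE and MOVB per iteration would
also require adjusting NUM_INSNS_RETIRED):

static char llc_victim[64] __attribute__((aligned(64)));

static inline void measured_loop_data_flush(void)
{
	unsigned int count = 10;	/* NUM_BRANCHES in the real test */

	__asm__ __volatile__("1: clflush %[line]\n\t"
			     "mfence\n\t"		/* CLFLUSH is ordered by MFENCE */
			     "movb %[line], %%al\n\t"	/* reload the just-flushed line */
			     "loop 1b\n\t"
			     : "+c" (count)
			     : [line] "m" (llc_victim[0])
			     : "rax", "memory");
}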