Re: KVM Unit Test Suite Regression on AMD EPYC Turin (Zen 5)

public inbox for kvm@vger.kernel.org

From: Sean Christopherson <seanjc@google.com>
To: Jim Mattson <jmattson@google.com>
Cc: Srikanth Aithal <sraithal@amd.com>, KVM <kvm@vger.kernel.org>
Subject: Re: KVM Unit Test Suite Regression on AMD EPYC Turin (Zen 5)
Date: Tue, 18 Nov 2025 14:38:34 -0800
Message-ID: <aRz1aiQl3TedzVvm@google.com>
In-Reply-To: <aRzzWrghCDzdKGKD@google.com>

On Tue, Nov 18, 2025, Sean Christopherson wrote:
> On Wed, Jul 23, 2025, Jim Mattson wrote:
> > On Tue, Jul 8, 2025 at 1:58 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Tue, Jul 08, 2025, Srikanth Aithal wrote:
> > > > Hello all,
> > > > The KVM unit test suite for SVM has been regressing on the AMD EPYC Turin
> > > > platform (Zen 5) for a while now, even on the latest linux-next
> > > > [https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tag/?h=next-20250704].
> > > > The same seems to work fine with linux-next tag next-20250505.
> > > > The TSC delay test fails intermittently (approximately once in three runs)
> > > > with an unexpected result (expected: 50, actual: 49). This test passed
> > > > consistently on earlier tags (e.g., next-20250505) and on non-Turin
> > > > platforms.
> > >
> > > Stating the obvious to some extent, I suspect it's something to do with Turin,
> > > not a KVM issue.  This fails on our Turin hosts as far back as v6.12, i.e. long
> > > before next-20250505 (I haven't bothered checking earlier builds), and AFAICT
> > > the KUT test isn't doing anything to actually stress KVM itself.  I.e. I would
> > > expect KVM bugs to manifest as blatant, 100% reproducible failures, not random
> > > TSC slop.
> > 
> > I think the final test case is broken, actually.
> > 
> > The test case is:
> > 
> >     svm_tsc_scale_run_testcase(50, 0.0001, rdrand());
> > 
> > So, guest_tsc_delay_value is (u64)((50 << 24) * 0.0001), which is
> > 83886. Note that this is 83886.080000000002 truncated.
> > 
> > If L2 exits after 83886 scaled TSC cycles, the "duration" spent in L2
> > will be (u64)(83886 / 0.0001) >> 24, which is 49. To get up to 50, we
> > have to accumulate an additional (0.080000000002 / 0.0001 =
> > 800.0000000199999) cycles between the two rdtsc() operations
> > > bracketing the svm_vmrun() in L1.
> > 
> > The test probably passes on other CPUs because emulated VMRUN and
> > #VMEXIT add those 800 cycles.
> > 
> > Instead of truncating ((50 << 24) * 0.0001), I think we should
> > calculate guest_tsc_delay_value as ceil((50 << 24) * 0.0001).
> > Something like this:
> > 
> > diff --git a/x86/svm_tests.c b/x86/svm_tests.c
> > index 9358c1f0383a..1bfe11045bd1 100644
> > --- a/x86/svm_tests.c
> > +++ b/x86/svm_tests.c
> > @@ -891,6 +891,8 @@ static void svm_tsc_scale_run_testcase(u64 duration,
> >         u64 start_tsc, actual_duration;
> > 
> >         guest_tsc_delay_value = (duration << TSC_SHIFT) * tsc_scale;
> > +       if (guest_tsc_delay_value < (duration << TSC_SHIFT) * tsc_scale)
> > +               guest_tsc_delay_value++;
> > 
> >         test_set_guest(svm_tsc_scale_guest);
> >         vmcb->control.tsc_offset = tsc_offset;
> > 
> > Even then, equality of duration and actual_duration is only guaranteed
> > if there are no significant delays during the measurement.
> 
> Wrote a changelog and applied this to kvm-x86 next.  Thanks Jim!
> 
> [1/1] x86/svm: Account for numerical rounding errors in TSC scaling test
>       https://github.com/kvm-x86/linux/commit/5465145a

Gah, my alias is hardcoded to point at linux, the actual commit is:

  https://github.com/kvm-x86/kvm-unit-tests/commit/5465145a
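
For readers who want to check the rounding arithmetic quoted above, here is a
minimal standalone C sketch. It is not code from kvm-unit-tests: the helper
names are hypothetical, TSC_SHIFT is assumed to be 24 (per the "50 << 24" in
the analysis), and observed_duration() only models the idealized case where L2
exits after exactly the programmed number of scaled cycles, with no
VMRUN/#VMEXIT overhead.

```c
#include <stdint.h>

/* Assumption: matches the "50 << 24" used in the quoted analysis. */
#define TSC_SHIFT 24

/* Scaled delay with plain double -> u64 truncation (pre-fix behavior). */
static uint64_t delay_truncated(uint64_t duration, double tsc_scale)
{
	return (uint64_t)((duration << TSC_SHIFT) * tsc_scale);
}

/* Scaled delay with the round-up step from the quoted diff. */
static uint64_t delay_rounded_up(uint64_t duration, double tsc_scale)
{
	uint64_t delay = (uint64_t)((duration << TSC_SHIFT) * tsc_scale);

	/* Bump by one if truncation dropped a fractional part. */
	if (delay < (duration << TSC_SHIFT) * tsc_scale)
		delay++;
	return delay;
}

/*
 * Hypothetical model: duration L1 would compute if L2 ran for exactly
 * `delay` scaled cycles and nothing else added host cycles.
 */
static uint64_t observed_duration(uint64_t delay, double tsc_scale)
{
	return (uint64_t)(delay / tsc_scale) >> TSC_SHIFT;
}
```

With duration = 50 and tsc_scale = 0.0001, delay_truncated() gives 83886 and
observed_duration() maps that back to 49, while delay_rounded_up() gives 83887,
which maps back to 50.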

Thread overview: 5+ messages
2025-07-08  5:01 KVM Unit Test Suite Regression on AMD EPYC Turin (Zen 5) Aithal, Srikanth
2025-07-08 20:57 ` Sean Christopherson
2025-07-24  3:59   ` Jim Mattson
2025-11-18 22:29     ` Sean Christopherson
2025-11-18 22:38       ` Sean Christopherson [this message]