Re: KVM Unit Test Suite Regression on AMD EPYC Turin (Zen 5)

From: Sean Christopherson <seanjc@google.com>
To: Jim Mattson <jmattson@google.com>
Cc: Srikanth Aithal <sraithal@amd.com>, KVM <kvm@vger.kernel.org>
Subject: Re: KVM Unit Test Suite Regression on AMD EPYC Turin (Zen 5)
Date: Tue, 18 Nov 2025 14:38:34 -0800
Message-ID: <aRz1aiQl3TedzVvm@google.com>
In-Reply-To: <aRzzWrghCDzdKGKD@google.com>

On Tue, Nov 18, 2025, Sean Christopherson wrote:
> On Wed, Jul 23, 2025, Jim Mattson wrote:
> > On Tue, Jul 8, 2025 at 1:58 PM Sean Christopherson <seanjc@google.com> wrote:
> > >
> > > On Tue, Jul 08, 2025, Srikanth Aithal wrote:
> > > > Hello all,
> > > > The KVM unit test suite for SVM has been regressing on the AMD EPYC Turin
> > > > platform (Zen 5) for a while now, even on the latest linux-next
> > > > [https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tag/?h=next-20250704].
> > > > The same seems to work fine with linux-next tag next-20250505.
> > > > The TSC delay test fails intermittently (approximately once in three runs)
> > > > with an unexpected result (expected: 50, actual: 49). This test passed
> > > > consistently on earlier tags (e.g., next-20250505) and on non-Turin
> > > > platforms.
> > >
> > > Stating the obvious to some extent, I suspect it's something to do with Turin,
> > > not a KVM issue.  This fails on our Turin hosts as far back as v6.12, i.e. long
> > > before next-20250505 (I haven't bothered checking earlier builds), and AFAICT
> > > the KUT test isn't doing anything to actually stress KVM itself.  I.e. I would
> > > expect KVM bugs to manifest as blatant, 100% reproducible failures, not random
> > > TSC slop.
> > 
> > I think the final test case is broken, actually.
> > 
> > The test case is:
> > 
> >     svm_tsc_scale_run_testcase(50, 0.0001, rdrand());
> > 
> > So, guest_tsc_delay_value is (u64)((50 << 24) * 0.0001), which is
> > 83886. Note that this is 83886.080000000002 truncated.
> > 
> > If L2 exits after 83886 scaled TSC cycles, the "duration" spent in L2
> > will be (u64)(83886 / 0.0001) >> 24, which is 49. To get up to 50, we
> > have to accumulate an additional (0.080000000002 / 0.0001 =
> > 800.0000000199999) cycles between the two rdtsc() operations
> > > bracketing the svm_vmrun() in L1.
> > 
> > The test probably passes on other CPUs because emulated VMRUN and
> > #VMEXIT add those 800 cycles.
> > 
> > Instead of truncating ((50 << 24) * 0.0001), I think we should
> > calculate guest_tsc_delay_value as ceil((50 << 24) * 0.0001).
> > Something like this:
> > 
> > diff --git a/x86/svm_tests.c b/x86/svm_tests.c
> > index 9358c1f0383a..1bfe11045bd1 100644
> > --- a/x86/svm_tests.c
> > +++ b/x86/svm_tests.c
> > @@ -891,6 +891,8 @@ static void svm_tsc_scale_run_testcase(u64 duration,
> >         u64 start_tsc, actual_duration;
> > 
> >         guest_tsc_delay_value = (duration << TSC_SHIFT) * tsc_scale;
> > +       if (guest_tsc_delay_value < (duration << TSC_SHIFT) * tsc_scale)
> > +               guest_tsc_delay_value++;
> > 
> >         test_set_guest(svm_tsc_scale_guest);
> >         vmcb->control.tsc_offset = tsc_offset;
> > 
> > Even then, equality of duration and actual_duration is only guaranteed
> > if there are no significant delays during the measurement.
> 
> Wrote a changelog and applied this to kvm-x86 next.  Thanks Jim!
> 
> [1/1] x86/svm: Account for numerical rounding errors in TSC scaling test
>       https://github.com/kvm-x86/linux/commit/5465145a

Gah, my alias is hardcoded to point at linux, the actual commit is:

  https://github.com/kvm-x86/kvm-unit-tests/commit/5465145a
