public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Mingwei Zhang <mizhang@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH 2/5] KVM: selftests: Only validate counts for hardware-supported arch events
Date: Fri, 24 Jan 2025 07:57:35 -0800	[thread overview]
Message-ID: <Z5O4b3c2gfdr9inN@google.com> (raw)
In-Reply-To: <Z5B5TJz9vn43AFcT@google.com>

On Wed, Jan 22, 2025, Mingwei Zhang wrote:
> On Fri, Jan 17, 2025, Sean Christopherson wrote:
> > On Sat, Jan 18, 2025, Mingwei Zhang wrote:
> > > On Fri, Jan 17, 2025, Sean Christopherson wrote:
> > > > @@ -582,18 +585,26 @@ static void test_intel_counters(void)
> > > >  
> > > >  	/*
> > > >  	 * Detect the existence of events that aren't supported by selftests.
> > > > -	 * This will (obviously) fail any time the kernel adds support for a
> > > > -	 * new event, but it's worth paying that price to keep the test fresh.
> > > > +	 * This will (obviously) fail any time hardware adds support for a new
> > > > +	 * event, but it's worth paying that price to keep the test fresh.
> > > >  	 */
> > > >  	TEST_ASSERT(nr_arch_events <= NR_INTEL_ARCH_EVENTS,
> > > >  		    "New architectural event(s) detected; please update this test (length = %u, mask = %x)",
> > > > -		    nr_arch_events, kvm_cpu_property(X86_PROPERTY_PMU_EVENTS_MASK));
> > > > +		    nr_arch_events, this_cpu_property(X86_PROPERTY_PMU_EVENTS_MASK));
> > > 
> > > This is where it would make troubles for us (all companies that might be
> > > using the selftest in upstream kernel and having a new hardware). In
> > > this case when we get new hardware, the test will fail in the downstream
> > > kernel. We will have to wait until the fix is ready, and backport it
> > > downstream, re-test it.... It takes lots of extra work.
> > 
> > If Intel can't upstream what should be a *very* simple patch to enumerate the
> > new encoding and its expected count in advance of hardware being shipped to
> > partners, then we have bigger problems.  I don't know what level of pre-silicon
> > and FPGA-based emulation testing Intel does these days, but I wouldn't be at all
> > surprised if KVM tests are being run well before silicon arrives.
> 
> Right, Intel folks will be the 1st one that is impacted. But it should
> be easy to fix on their end. But upstreaming the change may come very
> late.

And I'm saying we do what we can to motivate upstreaming the "fix" sooner than
later.

> > I am not at all convinced that this will ever affect anyone besides the Intel
> > engineers doing early enablement, and I am definitely not convinced it will ever
> > take significant effort above beyond what would be required irrespective of what
> > approach we take.  E.g. figuring out what the expected count is might be time
> > consuming, but I don't expect updating the test to be difficult.
> 
> It will affect the downstream kernels, I think? Like Paolo mentioned,
> old distro kernel may run on new hardware. In usual cases, Intel HW has
> already come out for a while, and the upstream software update is still
> there under review.

This isn't a usual case.  Or at least, it shouldn't be.  The effort required
should be more on par with adding Family/Model/Stepping information, and Intel
is quite capable of landing those types of changes well in advance of general
availability.

E.g. commit 7beade0dd41d ("x86/cpu: Add several Intel server CPU model numbers")
added Sierra Forest and Granite Rapids two years before they launched.  I don't
know when third parties first got silicion, but I would be surprised if it was
much, if at all, before that commit.

> Fixing the problem is never difficult. But we need a minor fix for each new
> generation of HW that adds a new architecture event. I can imagine fixing
> that needs a simple patch, but each of them has to cc stable tree

Yes, but sending patches to LTS kernels isn't inherently bad.  If the changes end
up conflicting regularly, then we can certainly reconsider the cost vs. benefit
of the assert.

> and with "Fixes" tag.

Nit, it doesn't need a Fixes, just Cc: stable@.

> > > Perhaps we can just putting nr_arch_events = NR_INTEL_ARCH_EVENTS
> > > if the former is larger than or equal to the latter? So that the "test"
> > > only test what it knows. It does not test what it does not know, i.e.,
> > > it does not "assume" it knows everything. We can always a warning or
> > > info log at the moment. Then expanding the capability of the test should
> > > be added smoothly later by either maintainers of SWEs from CPU vendors
> > > without causing failures.
> > 
> > If we just "warn", we're effectively relying on a future Intel engineer to run
> > the test *and* check the logs *and* actually act on the warning.  Given that tests
> > are rarely run manually with a human pouring over the output, I highly doubt that
> > will pan out.
> 
> In reality, we may not even need to warn. If the test only covers what
> it knows, then there is no alarm to report.

The assertion/warn isn't about test correctness, it's about ensuring test coverage
and distributing maintenance burden.  I don't want to have to chase down someone
at Intel that can provide the gory details on whatever architectural event comes
along next.

  reply	other threads:[~2025-01-24 15:57 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-17 23:41 [PATCH 0/5] KVM: selftests: Fix PMC checks in PMU counters test Sean Christopherson
2025-01-17 23:41 ` [PATCH 1/5] KVM: selftests: Make Intel arch events globally available " Sean Christopherson
2025-01-17 23:42 ` [PATCH 2/5] KVM: selftests: Only validate counts for hardware-supported arch events Sean Christopherson
2025-01-18  0:06   ` Mingwei Zhang
2025-01-18  0:39     ` Sean Christopherson
2025-01-20 16:12       ` Paolo Bonzini
2025-01-22  4:51       ` Mingwei Zhang
2025-01-24 15:57         ` Sean Christopherson [this message]
2025-01-17 23:42 ` [PATCH 3/5] KVM: selftests: Remove dead code in Intel PMU counters test Sean Christopherson
2025-01-17 23:42 ` [PATCH 4/5] KVM: selftests: Drop the "feature event" param from guest test helpers Sean Christopherson
2025-01-17 23:42 ` [PATCH 5/5] KVM: selftests: Print out the actual Top-Down Slots count on failure Sean Christopherson
2025-01-20 16:17 ` [PATCH 0/5] KVM: selftests: Fix PMC checks in PMU counters test Paolo Bonzini
2025-02-10 23:38   ` Sean Christopherson
2025-02-15  0:50 ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z5O4b3c2gfdr9inN@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mizhang@google.com \
    --cc=oliver.sang@intel.com \
    --cc=pbonzini@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox