public inbox for kvm@vger.kernel.org
From: Sean Christopherson <seanjc@google.com>
To: Naveen N Rao <naveen@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Subject: Re: [PATCH 1/3] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports
Date: Fri, 10 Apr 2026 12:20:44 -0700
Message-ID: <adlNjCRRCur8mjCA@google.com>
In-Reply-To: <adkl4wFdk4JrM3T-@blrnaveerao1>

On Fri, Apr 10, 2026, Naveen N Rao wrote:
> On Thu, Apr 09, 2026 at 03:24:47PM -0700, Sean Christopherson wrote:
> > Fix multiple (classes of) bugs with one stone by using KVM's mask of
> > readable local APIC registers to determine which x2APIC MSRs to pass
> > through (or not) when toggling x2AVIC on/off.  The existing hand-coded
> > list of MSRs is wrong on multiple fronts:
> > 
> >  - ARBPRI, DFR, and ICR2 aren't supported by x2APIC; disabling
> >    interception is nonsensical and suboptimal (the access generates a
> >    #VMEXIT that requires decoding the instruction).
> > 
> >  - RRR is completely unsupported.
> > 
> >  - AVIC currently fails to pass through the "range of vectors" registers,
> >    IRR, ISR, and TMR, as e.g. X2APIC_MSR(APIC_IRR) only affects IRR0, and
> >    thus only disables intercept for vectors 31:0 (which are the *least*
> >    interesting registers).
> 
> :facepalm:
> 
> We seriously need better selftests for these.

+1000.  In general, we need a way to validate "this should exit, this should not".

> Also on my list has been to cook up something for your other fix where AVIC
> gets inhibited for non-zero vCPU IDs (with x2AVIC disabled):
> http://lore.kernel.org/r/20260112232805.1512361-1-seanjc@google.com
> 
> I started looking at Alejandro's series adding AVIC-related binary 
> stats, but had to switch to other things. Last I looked, I felt that 
> your suggestion to add an "exits" array accounting individual #VMEXITs 
> would in particular be helpful:
> https://lore.kernel.org/kvm/ZmMjHwavCLk0lRd7@google.com/

We have per-exit stats (and more!) internally, and it's ugly.  For simplicity,
and/or perhaps to reduce out-of-tree maintenance costs, we did the easy thing of
tracking hardware exit reasons as is.  Which is *extremely* useful, but also
quite wasteful with respect to memory consumption, as a large percentage of exit
counts are uninteresting and/or completely unused, especially on SVM due to SVM
having high granularity exit reasons.

> Though I'm not sure what standardizing this across VMX and SVM would look
> like, and/or if it will be truly helpful

Probably even uglier :-)   The one case I'm confident would benefit from some
amount of standardization is exceptions.  If we simply regurgitate hardware exits,
Intel/VMX will only capture "did an exception occur", whereas AMD/SVM will
capture exactly which exception occurred.  If we *strictly* regurgitated exits,
it would be even worse than that, because NMIs get lumped in with exceptions on
VMX.

There are more cases like that, where VMX and SVM have slightly different exit
granularity.  We'd want to standardize on what gets presented to userspace so
that userspace doesn't have to effectively do the standardization (or worse, can
only implement certain checks for one vendor or the other).

> -- we may be interested in specific exits, such as AVIC-related exits for
> some of the tests...
> Thoughts?

I don't think we should plan on leveraging per-exit stats for testing purposes,
as they may never land upstream, and if they do, it will take some time.  E.g.
even though we find them valuable, I'm far from convinced that what we've got
implemented internally is suitable for upstream.

However!  Idea.  For many testcases, we don't actually care about *what* exit
occurred, only if *an* exit occurred (or not).  I.e. tests really just need a
binary yes/no signal.  Which we _almost_ have today, except stat.exits is polluted
with exits that are host-induced/owned (e.g. IRQs and NMIs) and/or asynchronous
in nature, e.g. the VMX preemption timer (which is arguably host-owned as well).

If KVM provides a stat that increments only on exits that are the direct result
of a guest action, then we can use that in selftests to detect unexpected (or
missed) exits, without having to worry about false failures due to noise from
host IRQs/NMIs/SMIs, etc.
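
To make the "binary yes/no signal" concrete, here is a minimal userspace sketch of how a test might snapshot such a counter around a guest action and check only whether it moved.  Everything prefixed with toy_ is hypothetical and exists purely for illustration; only the field names exits and guest_induced_exits come from this thread, and nothing here is an existing KVM or selftest API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-vCPU stats (not the kernel's struct). */
struct toy_vcpu_stat {
	uint64_t exits;               /* every exit, host-owned ones included */
	uint64_t guest_induced_exits; /* proposed: direct result of a guest action */
};

/* Hypothetical accounting step: every exit bumps exits, only some bump both. */
static void toy_account_exit(struct toy_vcpu_stat *stat, bool guest_induced)
{
	stat->exits++;
	if (guest_induced)
		stat->guest_induced_exits++;
}

/*
 * The yes/no signal: did the counter move between two snapshots?  The test
 * does not care which exit occurred, only whether one occurred.
 */
static bool toy_guest_exited(uint64_t before, uint64_t after)
{
	assert(after >= before);
	return after != before;
}
```

With this shape, a burst of host IRQs between the two snapshots changes exits but leaves guest_induced_exits alone, so a "this should not exit" check would not fail spuriously.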

The hardest part is probably figuring out a name :-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3f3290d5a0a6..a8530a3d0545 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4256,6 +4256,9 @@ static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
        struct vcpu_svm *svm = to_svm(vcpu);
        struct vmcb_control_area *control = &svm->vmcb->control;
 
+       if (is_guest_induced_exit(control->exit_code))
+               ++vcpu->stat.guest_induced_exits;
+
        /*
         * Next RIP must be provided as IRQs are disabled, and accessing guest
         * memory to decode the instruction might fault, i.e. might sleep.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 967b58a8ab9d..cd7769c412ab 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7524,6 +7524,9 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
 static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu,
                                             bool force_immediate_exit)
 {
+       if (is_guest_induced_exit(vmx_get_exit_reason(vcpu).basic))
+               ++vcpu->stat.guest_induced_exits;
+
        /*
         * If L2 is active, some VMX preemption timer exits can be handled in
         * the fastpath even, all other exits must use the slow path.
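
The hunks above call is_guest_induced_exit() without defining it.  One possible shape, sketched in standalone C under loud assumptions: the toy_ exit codes below are made up for illustration and are not real SVM or VMX exit reasons, and treating everything outside a small deny-list as guest-induced is just one design choice, not something this thread settles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, vendor-neutral exit codes; NOT real hardware values. */
enum toy_exit_code {
	TOY_EXIT_HOST_IRQ,
	TOY_EXIT_HOST_NMI,
	TOY_EXIT_HOST_SMI,
	TOY_EXIT_PREEMPT_TIMER,
	TOY_EXIT_MSR_READ,
	TOY_EXIT_CPUID,
};

/* Assumption: host-owned and asynchronous exits are deny-listed. */
static bool toy_is_guest_induced_exit(uint32_t exit_code)
{
	switch (exit_code) {
	case TOY_EXIT_HOST_IRQ:
	case TOY_EXIT_HOST_NMI:
	case TOY_EXIT_HOST_SMI:
	case TOY_EXIT_PREEMPT_TIMER:
		return false;
	default:
		return true;
	}
}

/* Mirrors the shape of the hunks above: bump the counter only when it applies. */
static void toy_count_exit(uint64_t *guest_induced_exits, uint32_t exit_code)
{
	assert(guest_induced_exits);
	if (toy_is_guest_induced_exit(exit_code))
		++*guest_induced_exits;
}
```

A deny-list keeps the helper short, at the cost of counting any newly added exit reason as guest-induced until someone classifies it; an allow-list would err in the opposite direction.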

> > @@ -162,9 +165,15 @@ static void avic_set_x2apic_msr_interception(struct vcpu_svm *svm,
> >  	if (!x2avic_enabled)
> >  		return;
> >  
> > +	x2apic_readable_mask = kvm_lapic_readable_reg_mask(vcpu->arch.apic);
> > +
> > +	for (i = 0; i < BITS_PER_TYPE(typeof(x2apic_readable_mask)); i++)
> > +		svm_set_intercept_for_msr(vcpu, APIC_BASE_MSR + i,
> > +					  MSR_TYPE_R, intercept);
> > +
> 
> Yet to test this series (will get to it next week in more detail), but I 
> suppose you meant to use `for_each_set_bit()` or such?

Heh, my turn for a /facepalm.  I'm glad I admitted I didn't test this very well :-)
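
For readers following along, a standalone sketch of why the loop as posted is wrong.  It walks every bit position of the mask, so all 64 MSR indices are touched regardless of which bits are set.  The toy_ names are hypothetical; the bit-to-MSR mapping (register offset >> 4, added to base 0x800) is assumed to match the kernel's X2APIC_MSR() and APIC_BASE_MSR, and 0x200/0x210 are the architectural offsets of the first two IRR registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_APIC_BASE_MSR  0x800u                        /* assumed == APIC_BASE_MSR */
#define TOY_REG_MASK(reg)  (UINT64_C(1) << ((reg) >> 4)) /* one bit per 16-byte register */

/*
 * Collect the MSRs a loop would touch.  With only_set_bits == false this
 * mimics the posted loop (every index below 64); with true it mimics a
 * for_each_set_bit()-style walk.  msrs must have room for 64 entries.
 */
static unsigned int toy_collect_msrs(uint64_t readable_mask, bool only_set_bits,
				     uint32_t msrs[64])
{
	unsigned int i, n = 0;

	assert(msrs);
	for (i = 0; i < 64; i++) {
		if (only_set_bits && !(readable_mask & (UINT64_C(1) << i)))
			continue;
		msrs[n++] = TOY_APIC_BASE_MSR + i;
	}
	return n;
}
```

For a mask of TOY_REG_MASK(0x200) | TOY_REG_MASK(0x210), the set-bit walk yields only MSRs 0x820 and 0x821, whereas the posted loop yields all 64 indices starting at 0x800.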


Thread overview: 8+ messages
2026-04-09 22:24 [PATCH 0/3] KVM: SVM: Fix x2AVIC MSR interception mess Sean Christopherson
2026-04-09 22:24 ` [PATCH 1/3] KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports Sean Christopherson
2026-04-10 16:45   ` Naveen N Rao
2026-04-10 19:20     ` Sean Christopherson [this message]
2026-04-09 22:24 ` [PATCH 2/3] KVM: SVM: Always intercept RDMSR for TMCCT (current APIC timer count) Sean Christopherson
2026-04-09 22:24 ` [PATCH 3/3] KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated Sean Christopherson
2026-04-10 16:53   ` Naveen N Rao
2026-04-10 17:19     ` Sean Christopherson
