Re: [PATCH 0/7] KVM: x86: APX reg prep work

From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>,
	Kiryl Shutsemau <kas@kernel.org>,
	kvm@vger.kernel.org,  x86@kernel.org, linux-coco@lists.linux.dev,
	linux-kernel@vger.kernel.org,
	 Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH 0/7] KVM: x86: APX reg prep work
Date: Mon, 6 Apr 2026 08:28:03 -0700
Message-ID: <adPRA4ZhnvbaXSn0@google.com>
In-Reply-To: <CABgObfY05NU8DS82jkwpF89_p1nR7VJ30HBq_xaMg_u+-j=0Cw@mail.gmail.com>

+Andrew

On Sat, Apr 04, 2026, Paolo Bonzini wrote:
> On Sat, Apr 4, 2026 at 12:05 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
> >
> > On 4/3/2026 9:03 AM, Paolo Bonzini wrote:
> > >
> > > But until the kernel starts using APX, I would do the save/restore near
> > > kvm_load_xfeatures(), because __vmx_vcpu_run()/__svm_vcpu_run() would
> > > have to check whether xcr0.apx is set or not.
> > Right, I'd much prefer this. Then, it requires to audit whether any
> > fast-path handler could access EGPRs.
> >
> > But there are cases with the new {RD|WR}MSR (MSR_IMM) instructions that
> > appear to access GPRs. Because of this, the EGPR saving/restoring needs
> > to happen earlier.
> 
> You're right about fast paths...

Ya, potential fastpath usage is why I wanted to just context switch around
entry/exit.

> so something like the attached patch.
> It is not too bad to translate into assembly, where it could use
> alternatives (in the same way as
> RESTORE_GUEST_SPEC_CTRL/RESTORE_GUEST_SPEC_CTRL_BODY) in place of
> static_cpu_has(). Maybe it's best to bite the bullet and do it
> already...

My strong vote is to context switch in assembly, but _conditionally_ context
switch R16-R31.  All of this started from Andrew's comment:

 : You can't unconditionally use PUSH2/POP2 in the VMExit, because at that
 : point in time it's the guest's XCR0 in context.  If the guest has APX
 : disabled, PUSH2 in the VMExit path will #UD.
 : 
 : You either need two VMExit handlers, one APX and one non-APX and choose
 : based on the guest XCR0 value, or you need a branch prior to regaining
 : speculative safety, or you need to save/restore XCR0 as the first
 : action.  It's horrible any way you look at it.

But that second paragraph isn't quite correct, at least not for KVM.  Specifically,
"need a branch prior to regaining speculative safety" isn't correct, as that holds
true if and only if "regaining speculative safety" requires executing code that
might access R16-R31.  If we massage __vmx_vcpu_run() to restore SPEC_CTRL in
assembly, same as __svm_vcpu_run(), then __{svm,vmx}_vcpu_run() can simply context
switch R16-R31 if and only if APX is enabled in XCR0.

KVM always intercepts XCR0 writes (when XCR0 isn't context switched by "hardware",
i.e. ignoring SEV-ES+ and TDX guests), and IIUC all access to R16-R31 is gated on
XCR0.APX=1.  So unless I'm missing something (or hardware is flawed and lets the
guest speculatively consume R16-R31, which would be sad), it's perfectly safe to
run the guest with host state in R16-R31.

That would avoid pointlessly context switching 16 registers when APX is not being
used by the guest, and would avoid having to write XCR0 in the fastpath.
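
As a rough userspace illustration of that conditional (not kernel code), the
decision reduces to testing one bit of the guest's XCR0.  The sketch below
assumes APX state is XCR0 bit 19, and the SKETCH_/sketch_ names are
hypothetical, chosen only for this example:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: APX state is enabled via XCR0 bit 19. */
#define SKETCH_XFEATURE_MASK_APX (UINT64_C(1) << 19)

/* Hypothetical helper: can the guest architecturally touch R16-R31? */
static bool sketch_guest_uses_egprs(uint64_t guest_xcr0)
{
	return (guest_xcr0 & SKETCH_XFEATURE_MASK_APX) != 0;
}

/*
 * Hypothetical helper: how many registers would be swapped around
 * VM-Enter/VM-Exit.  With XCR0.APX=0 in the guest, host values can stay
 * in R16-R31 and nothing is swapped.
 */
static unsigned int sketch_nr_egprs_to_switch(uint64_t guest_xcr0)
{
	return sketch_guest_uses_egprs(guest_xcr0) ? 16 : 0;
}
```

The point of the sketch is only that the check depends on guest XCR0 alone, so
no XCR0 write is needed to make the decision.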

> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 959fcc01ee0f..9a1766037b6f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -887,6 +887,7 @@ struct kvm_vcpu_arch {
>  	struct fpu_guest guest_fpu;
>  
>  	u64 xcr0;
> +	u64 early_xcr0;

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0757b93e528d..69abfdd946dd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1220,9 +1220,13 @@ static void kvm_load_xfeatures(struct kvm_vcpu *vcpu, bool load_guest)
>  	if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE))
>  		return;
>  
> -	if (vcpu->arch.xcr0 != kvm_host.xcr0)
> +	/*
> +	 * Do not load the definitive XCR0 yet; vcpu->arch.early_xcr0 keeps
> +	 * APX enabled so that the kernel can move to and from r16...r31.
> +	 */
> +	if (vcpu->arch.early_xcr0 != kvm_host.xcr0)
>  		xsetbv(XCR_XFEATURE_ENABLED_MASK,
> -		       load_guest ? vcpu->arch.xcr0 : kvm_host.xcr0);
> +		       load_guest ? vcpu->arch.early_xcr0 : kvm_host.xcr0);

Even _if_ we want to play XCR0 games, tracking early_xcr0 is unnecessary.  This
can be:

	/*
	 * XCR0 is context switched around VM-Enter/VM-Exit if APX is enabled
	 * in the host but not in the guest.
	 */
	if (vcpu->arch.xcr0 != kvm_host.xcr0 &&
	    (!cpu_feature_enabled(X86_FEATURE_APX) ||
	     vcpu->arch.xcr0 & XFEATURE_MASK_APX))
		xsetbv(XCR_XFEATURE_ENABLED_MASK,
		       load_guest ? vcpu->arch.xcr0 : kvm_host.xcr0);

And then __kvm_load_guest_apx()

	<context switch R16-R31>

	if (cpu_feature_enabled(X86_FEATURE_APX) &&
	    !(vcpu->arch.xcr0 & XFEATURE_MASK_APX))
		xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);

And __kvm_save_guest_apx() would reverse the order of __kvm_load_guest_apx().
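
To sanity check how the two conditions above split the work, here is a small
userspace model (a sketch, not kernel code).  It assumes APX is XCR0 bit 19 and
replaces cpu_feature_enabled(X86_FEATURE_APX) with a plain bool; all sketch_
names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: APX state is enabled via XCR0 bit 19. */
#define SKETCH_XFEATURE_MASK_APX (UINT64_C(1) << 19)

/* Models the condition in kvm_load_xfeatures(): write XCR0 "early". */
static bool sketch_xcr0_switched_early(uint64_t guest_xcr0, uint64_t host_xcr0,
				       bool host_has_apx)
{
	return guest_xcr0 != host_xcr0 &&
	       (!host_has_apx || (guest_xcr0 & SKETCH_XFEATURE_MASK_APX));
}

/* Models the condition in __kvm_load_guest_apx(): write XCR0 "late". */
static bool sketch_xcr0_switched_late(uint64_t guest_xcr0, bool host_has_apx)
{
	return host_has_apx && !(guest_xcr0 & SKETCH_XFEATURE_MASK_APX);
}
```

In this model, when host APX is enabled (so host XCR0 has the APX bit set) and
guest XCR0 differs from host XCR0, exactly one of the two helpers returns true;
when the values are equal, neither does.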

> @@ -11056,6 +11061,49 @@ static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
>  	kvm_x86_call(set_apic_access_page_addr)(vcpu);
>  }
>  
> +/*
> + * Assuming the kernel does not use APX for now.  When
> + * the kernel starts using APX this needs to move into
> + * assembly, and KVM_GET/SET_XSAVE needs to fill in
> + * EGPRs from vcpu->arch.regs.
> + */
> +void __kvm_load_guest_apx(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.early_xcr0 != vcpu->arch.xcr0)
> +		xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);

This is wrong.  The "real" xcr0 needs to be loaded *after* accessing R16+.
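
A toy userspace model of that ordering constraint (a sketch, not kernel code).
It assumes the R16+ moves are done unconditionally, as in the
<context switch R16-R31> placeholder above, that APX is XCR0 bit 19, and it
models #UD as a flag; all sketch_ names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: APX state is enabled via XCR0 bit 19. */
#define SKETCH_XFEATURE_MASK_APX (UINT64_C(1) << 19)

struct sketch_cpu {
	uint64_t xcr0;
	uint64_t egpr[16];	/* stand-in for R16-R31 */
	bool faulted;		/* set if R16+ was touched with XCR0.APX=0 */
};

/* Touching R16+ with XCR0.APX=0 is modeled as a fault (#UD on hardware). */
static void sketch_write_egprs(struct sketch_cpu *cpu, const uint64_t *src)
{
	if (!(cpu->xcr0 & SKETCH_XFEATURE_MASK_APX)) {
		cpu->faulted = true;
		return;
	}
	for (int i = 0; i < 16; i++)
		cpu->egpr[i] = src[i];
}

/* Guest XCR0 first, then R16+: faults if the guest has APX disabled. */
static bool sketch_load_xcr0_first(struct sketch_cpu *cpu, uint64_t guest_xcr0,
				   const uint64_t *regs)
{
	cpu->xcr0 = guest_xcr0;
	sketch_write_egprs(cpu, regs);
	return !cpu->faulted;
}

/* R16+ first while the APX-enabled XCR0 is still live, then guest XCR0. */
static bool sketch_load_regs_first(struct sketch_cpu *cpu, uint64_t guest_xcr0,
				   const uint64_t *regs)
{
	sketch_write_egprs(cpu, regs);
	cpu->xcr0 = guest_xcr0;
	return !cpu->faulted;
}
```
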

> +	if (!(vcpu->arch.xcr0 & XFEATURE_MASK_APX))
> +		return;
> +
> +	WARN_ON_ONCE(!irqs_disabled());
> +
> +	asm("mov %[r16], %%r16\n"
> +	    "mov %[r17], %%r17\n" // ...
> +	    : : [r16] "m" (vcpu->arch.regs[16]),
> +	        [r17] "m" (vcpu->arch.regs[17]));
> +}
