public inbox for linux-kernel@vger.kernel.org
From: Sean Christopherson <seanjc@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>,
	Kiryl Shutsemau <kas@kernel.org>,
	kvm@vger.kernel.org,  x86@kernel.org, linux-coco@lists.linux.dev,
	linux-kernel@vger.kernel.org,
	 Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH 0/7] KVM: x86: APX reg prep work
Date: Mon, 6 Apr 2026 08:28:03 -0700
Message-ID: <adPRA4ZhnvbaXSn0@google.com>
In-Reply-To: <CABgObfY05NU8DS82jkwpF89_p1nR7VJ30HBq_xaMg_u+-j=0Cw@mail.gmail.com>

+Andrew

On Sat, Apr 04, 2026, Paolo Bonzini wrote:
> On Sat, Apr 4, 2026 at 12:05 AM Chang S. Bae <chang.seok.bae@intel.com> wrote:
> >
> > On 4/3/2026 9:03 AM, Paolo Bonzini wrote:
> > >
> > > But until the kernel starts using APX, I would do the save/restore near
> > > kvm_load_xfeatures(), because __vmx_vcpu_run()/__svm_vcpu_run() would
> > > have to check whether xcr0.apx is set or not.
> > Right, I'd much prefer this. Then, it requires to audit whether any
> > fast-path handler could access EGPRs.
> >
> > But there are cases with the new {RD|WR}MSR (MSR_IMM) instructions that
> > appear to access GPRs. Because of this, the EGPR saving/restoring needs
> > to happen earlier.
> 
> You're right about fast paths...

Ya, potential fastpath usage is why I wanted to just context switch around
entry/exit.

> so something like the attached patch.
> It is not too bad to translate into assembly, where it could use
> alternatives (in the same way as
> RESTORE_GUEST_SPEC_CTRL/RESTORE_GUEST_SPEC_CTRL_BODY) in place of
> static_cpu_has(). Maybe it's best to bite the bullet and do it
> already...

My strong vote is to context switch in assembly, but _conditionally_ context
switch R16-R31.  All of this started from Andrew's comment:

 : You can't unconditionally use PUSH2/POP2 in the VMExit, because at that
 : point in time it's the guest's XCR0 in context.  If the guest has APX
 : disabled, PUSH2 in the VMExit path will #UD.
 : 
 : You either need two VMExit handlers, one APX and one non-APX and choose
 : based on the guest XCR0 value, or you need a branch prior to regaining
 : speculative safety, or you need to save/restore XCR0 as the first
 : action.  It's horrible any way you look at it.

But that second paragraph isn't quite correct, at least not for KVM.  Specifically,
"need a branch prior to regaining speculative safety" isn't correct, as that holds
true if and only if "regaining speculative safety" requires executing code that
might access R16-R31.  If we massage __vmx_vcpu_run() to restore SPEC_CTRL in
assembly, same as __svm_vcpu_run(), then __{svm,vmx}_vcpu_run() can simply context
switch R16-R31 if and only if APX is enabled in XCR0.

KVM always intercepts XCR0 writes (when XCR0 isn't context switched by "hardware",
i.e. ignoring SEV-ES+ and TDX guests), and IIUC all access to R16-R31 is gated on
XCR0.APX=1.  So unless I'm missing something (or hardware is flawed and lets the
guest speculatively consume R16-R31, which would be sad), it's perfectly safe to
run the guest with host state in R16-R31.

That would avoid pointlessly context switching 16 registers when APX is not being
used by the guest, and would avoid having to write XCR0 in the fastpath.
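
To make the conditional switch concrete, here is a rough userspace C model, not kernel code.  The names model_cpu, model_vcpu and model_vcpu_run(), and the bit position used for XFEATURE_MASK_APX, are assumptions made for this sketch; the real switch would live in the __{svm,vmx}_vcpu_run() assembly.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_EGPRS 16                     /* R16-R31 */
#define XFEATURE_MASK_APX (1ULL << 19)  /* assumed bit position, illustration only */

/* Hypothetical stand-in for the physical register file. */
struct model_cpu {
	uint64_t egpr[NR_EGPRS];        /* live contents of R16-R31 */
};

/* Hypothetical stand-in for the vCPU state that KVM tracks. */
struct model_vcpu {
	uint64_t xcr0;                  /* guest XCR0, known via interception */
	uint64_t egpr[NR_EGPRS];        /* saved guest R16-R31 */
};

/*
 * Model one VM-Enter/VM-Exit round trip.  R16-R31 are switched if and only
 * if the guest has APX enabled in XCR0; otherwise the "guest" runs with host
 * values in R16-R31, which it cannot access.  Returns the number of register
 * moves performed.
 */
static int model_vcpu_run(struct model_cpu *cpu, struct model_vcpu *vcpu)
{
	bool guest_apx = (vcpu->xcr0 & XFEATURE_MASK_APX) != 0;
	uint64_t host_egpr[NR_EGPRS];
	int moved = 0;
	int i;

	if (guest_apx) {
		/* Save host R16-R31, load guest R16-R31. */
		memcpy(host_egpr, cpu->egpr, sizeof(host_egpr));
		memcpy(cpu->egpr, vcpu->egpr, sizeof(cpu->egpr));
		moved += NR_EGPRS;
	}

	/* "Guest runs": only a guest with XCR0.APX=1 can modify R16-R31. */
	if (guest_apx)
		for (i = 0; i < NR_EGPRS; i++)
			cpu->egpr[i] += 1;

	if (guest_apx) {
		/* Save guest R16-R31, restore host R16-R31. */
		memcpy(vcpu->egpr, cpu->egpr, sizeof(vcpu->egpr));
		memcpy(cpu->egpr, host_egpr, sizeof(cpu->egpr));
		moved += NR_EGPRS;
	}

	return moved;
}
```

In this model a guest without XCR0.APX costs zero register moves per round trip, and a guest with APX costs 32 (16 on entry, 16 on exit).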

> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 959fcc01ee0f..9a1766037b6f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -887,6 +887,7 @@ struct kvm_vcpu_arch {
>  	struct fpu_guest guest_fpu;
>  
>  	u64 xcr0;
> +	u64 early_xcr0;

...

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0757b93e528d..69abfdd946dd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1220,9 +1220,13 @@ static void kvm_load_xfeatures(struct kvm_vcpu *vcpu, bool load_guest)
>  	if (!kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE))
>  		return;
>  
> -	if (vcpu->arch.xcr0 != kvm_host.xcr0)
> +	/*
> +	 * Do not load the definitive XCR0 yet; vcpu->arch.early_xcr0 keeps
> +	 * APX enabled so that the kernel can move to and from r16...r31.
> +	 */
> +	if (vcpu->arch.early_xcr0 != kvm_host.xcr0)
>  		xsetbv(XCR_XFEATURE_ENABLED_MASK,
> -		       load_guest ? vcpu->arch.xcr0 : kvm_host.xcr0);
> +		       load_guest ? vcpu->arch.early_xcr0 : kvm_host.xcr0);

Even _if_ we want to play XCR0 games, tracking early_xcr0 is unnecessary.  This
can be:

	/*
	 * XCR0 is context switched around VM-Enter/VM-Exit if APX is enabled
	 * in the host but not in the guest.
	 */
	if (vcpu->arch.xcr0 != kvm_host.xcr0 &&
	    (!cpu_feature_enabled(X86_FEATURE_APX) ||
	     vcpu->arch.xcr0 & XFEATURE_MASK_APX))
		xsetbv(XCR_XFEATURE_ENABLED_MASK,
		       load_guest ? vcpu->arch.xcr0 : kvm_host.xcr0);

And then __kvm_load_guest_apx()

	<context switch R16-R31>

	if (cpu_feature_enabled(X86_FEATURE_APX) &&
	    !(vcpu->arch.xcr0 & XFEATURE_MASK_APX))
		xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);

And __kvm_save_guest_apx() would reverse the order of __kvm_load_guest_apx().
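
The ordering can be sanity checked with a small userspace C model, not kernel code.  Everything prefixed with model_ is hypothetical, as is the bit position used for XFEATURE_MASK_APX.  An access to R16-R31 while the loaded XCR0 has APX clear is counted as a #UD, and gating the register switch on host APX support is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_MASK_APX (1ULL << 19)  /* assumed bit position, illustration only */

struct model_state {
	uint64_t hw_xcr0;      /* XCR0 currently loaded in "hardware" */
	uint64_t host_xcr0;    /* models kvm_host.xcr0 */
	uint64_t guest_xcr0;   /* models vcpu->arch.xcr0 */
	bool host_has_apx;     /* models cpu_feature_enabled(X86_FEATURE_APX) */
	int xcr0_writes;       /* number of modeled XSETBV executions */
	int ud_faults;         /* R16-R31 accesses made with XCR0.APX=0 */
};

static void model_xsetbv(struct model_state *s, uint64_t val)
{
	s->hw_xcr0 = val;
	s->xcr0_writes++;
}

/* Touching R16-R31 faults unless APX is enabled in the loaded XCR0. */
static void model_access_egprs(struct model_state *s)
{
	if (!(s->hw_xcr0 & XFEATURE_MASK_APX))
		s->ud_faults++;
}

/* True if the XCR0 write is deferred to the APX load/save helpers. */
static bool model_xcr0_deferred(const struct model_state *s)
{
	return s->host_has_apx && !(s->guest_xcr0 & XFEATURE_MASK_APX);
}

/* Same predicate as the kvm_load_xfeatures() snippet above. */
static void model_load_xfeatures(struct model_state *s, bool load_guest)
{
	if (s->guest_xcr0 != s->host_xcr0 && !model_xcr0_deferred(s))
		model_xsetbv(s, load_guest ? s->guest_xcr0 : s->host_xcr0);
}

/* Switch R16-R31 first, then drop APX from XCR0 if the guest lacks it. */
static void model_load_guest_apx(struct model_state *s)
{
	if (s->host_has_apx)
		model_access_egprs(s);
	if (model_xcr0_deferred(s))
		model_xsetbv(s, s->guest_xcr0);
}

/* Reverse order: restore host XCR0 first, then switch R16-R31. */
static void model_save_guest_apx(struct model_state *s)
{
	if (model_xcr0_deferred(s))
		model_xsetbv(s, s->host_xcr0);
	if (s->host_has_apx)
		model_access_egprs(s);
}

/* The ordering flagged as wrong: guest XCR0 is loaded before R16+ is touched. */
static void model_load_guest_apx_wrong_order(struct model_state *s)
{
	if (model_xcr0_deferred(s))
		model_xsetbv(s, s->guest_xcr0);
	if (s->host_has_apx)
		model_access_egprs(s);
}
```

With APX enabled in the host but not in the guest, the model performs exactly two XCR0 writes per round trip, both in the APX helpers, and records no faults.  Swapping in the wrong-order variant records a fault on the R16-R31 access.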

> @@ -11056,6 +11061,49 @@ static void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
>  	kvm_x86_call(set_apic_access_page_addr)(vcpu);
>  }
>  
> +/*
> + * Assuming the kernel does not use APX for now.  When
> + * the kernel starts using APX this needs to move into
> + * assembly, and KVM_GET/SET_XSAVE needs to fill in
> + * EGPRs from vcpu->arch.regs.
> + */
> +void __kvm_load_guest_apx(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->arch.early_xcr0 != vcpu->arch.xcr0)
> +		xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);

This is wrong.  The "real" xcr0 needs to be loaded *after* accessing R16+.

> +	if (!(vcpu->arch.xcr0 & XFEATURE_MASK_APX))
> +		return;
> +
> +	WARN_ON_ONCE(!irqs_disabled());
> +
> +	asm("mov %[r16], %%r16\n"
> +	    "mov %[r17], %%r17\n" // ...
> +	    : : [r16] "m" (vcpu->arch.regs[16]),
> +	        [r17] "m" (vcpu->arch.regs[17]));
> +}


Thread overview: 32+ messages
2026-03-11  0:33 [PATCH 0/7] KVM: x86: APX reg prep work Sean Christopherson
2026-03-11  0:33 ` [PATCH 1/7] KVM: x86: Add dedicated storage for guest RIP Sean Christopherson
2026-03-11  0:33 ` [PATCH 2/7] KVM: x86: Drop the "EX" part of "EXREG" to avoid collision with APX Sean Christopherson
2026-03-11 18:46   ` Paolo Bonzini
2026-03-11  0:33 ` [PATCH 3/7] KVM: nVMX: Do a bitwise-AND of regs_avail when switching active VMCS Sean Christopherson
2026-03-11  0:33 ` [PATCH 4/7] KVM: x86: Add wrapper APIs to reset dirty/available register masks Sean Christopherson
2026-03-11  2:03   ` Yosry Ahmed
2026-03-11 13:31     ` Sean Christopherson
2026-03-11 18:28       ` Yosry Ahmed
2026-03-11 18:50       ` Paolo Bonzini
2026-03-13  0:38         ` Sean Christopherson
2026-03-11  0:33 ` [PATCH 5/7] KVM: x86: Track available/dirty register masks as "unsigned long" values Sean Christopherson
2026-03-11  0:33 ` [PATCH 6/7] KVM: x86: Use a proper bitmap for tracking available/dirty registers Sean Christopherson
2026-03-11  0:33 ` [PATCH 7/7] *** DO NOT MERGE *** KVM: x86: Pretend that APX is supported on 64-bit kernels Sean Christopherson
2026-03-11 19:01 ` [PATCH 0/7] KVM: x86: APX reg prep work Paolo Bonzini
2026-03-12 16:34   ` Chang S. Bae
2026-03-12 17:47     ` Sean Christopherson
2026-03-12 18:11       ` Andrew Cooper
2026-03-12 18:29         ` Sean Christopherson
2026-03-12 18:33           ` Andrew Cooper
2026-03-25 18:28       ` Chang S. Bae
2026-04-02 23:07         ` Sean Christopherson
2026-04-03  0:05           ` Chang S. Bae
2026-04-02 23:19   ` Sean Christopherson
2026-04-03 16:03     ` Paolo Bonzini
2026-04-03 22:05       ` Chang S. Bae
2026-04-04  5:16         ` Paolo Bonzini
2026-04-06 15:28           ` Sean Christopherson [this message]
2026-04-06 21:41             ` Paolo Bonzini
2026-04-06 22:00               ` Sean Christopherson
2026-04-03 16:07     ` Dave Hansen
2026-04-06 15:40       ` Sean Christopherson
