Re: [PATCH] KVM: x86: Drop "cache" from user return MSR setter that skips WRMSR

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Yan Zhao <yan.y.zhao@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	 Xiaoyao Li <xiaoyao.li@intel.com>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>
Subject: Re: [PATCH] KVM: x86: Drop "cache" from user return MSR setter that skips WRMSR
Date: Fri, 3 Oct 2025 09:53:28 -0700	[thread overview]
Message-ID: <aN__iPxo5P1bFCNk@google.com> (raw)
In-Reply-To: <aN/VaVklfXO5rId4@yzhao56-desk.sh.intel.com>

On Fri, Oct 03, 2025, Yan Zhao wrote:
> Sorry for the slow response due to the PRC holiday.
> 
> On Tue, Sep 30, 2025 at 09:29:00AM -0700, Sean Christopherson wrote:
> > On Tue, Sep 30, 2025, Yan Zhao wrote:
> > > On Tue, Sep 30, 2025 at 08:22:41PM +0800, Yan Zhao wrote:
> > > > On Fri, Sep 19, 2025 at 02:42:59PM -0700, Sean Christopherson wrote:
> > > > > Rename kvm_user_return_msr_update_cache() to __kvm_set_user_return_msr()
> > > > > and use the helper kvm_set_user_return_msr() to make it obvious that the
> > > > > double-underscores version is doing a subset of the work of the "full"
> > > > > setter.
> > > > > 
> > > > > While the function does indeed update a cache, the nomenclature becomes
> > > > > slightly misleading when adding a getter[1], as the current value isn't
> > > > > _just_ the cached value, it's also the value that's currently loaded in
> > > > > hardware.
> > > > Nit:
> > > > 
> > > > For TDX, "it's also the value that's currently loaded in hardware" is not true.
> > > since tdx module invokes wrmsr()s before each exit to VMM, while KVM only
> > > invokes __kvm_set_user_return_msr() in tdx_vcpu_put().
> > 
> > No?  kvm_user_return_msr_update_cache() is passed the value that's currently
> > loaded in hardware, by way of the TDX-Module zeroing some MSRs on TD-Exit.
> > 
> > Ah, I suspect you're calling out that the cache can be stale.  Maybe this?
> Right. But not just that the cache can be stale. My previous reply was quite
> misleading.
> 
> As with below tables, where
> CURR: msrs->values[slot].curr.
> REAL: value that's currently loaded in hardware
> 
> For TDs,
>                             CURR          REAL
>    -----------------------------------------------------------------------
> 1. enable virtualization    host value    host value
> 
> 2. TDH.VP.ENTER             host value    guest value (updated by tdx module)
> 3. TDH.VP.ENTER return      host value    defval (updated by tdx module)
> 4. tdx_vcpu_put             defval        defval
> 5. exit to user mode        host value    host value
> 
> 
> For normal VMs,
>                             CURR                 REAL
>    -----------------------------------------------------------------------
> 1. enable virtualization    host value           host value
> 2. before vcpu_run          shadow guest value   shadow guest value
> 3. after vcpu_run           shadow guest value   shadow guest value
> 4. exit to user mode        host value           host value
> 
> 
> Unlike normal VMs, where msrs->values[slot].curr always matches the the value
> that's currently loaded in hardware. 

That isn't actually true, see the bottom.

> For TDs, msrs->values[slot].curr does not contain the value that's currently
> loaded in hardware in stages 2-3.
> 
> >   While the function does indeed update a cache, the nomenclature becomes
> >   slightly misleading when adding a getter[1], as the current value isn't
> >   _just_ the cached value, it's also the value that's currently loaded in
> >   hardware (ignoring that the cache holds stale data until the vCPU is put,
> So, "stale data" is not accurate.
> It just can't hold the current hardware loaded value when guest is running in
> TD.

Eh, that's still "stale data" as far as KVM is concerned.  Though I'm splitting
hairs, I totally agree that as written the changelog is misleading.

> it's also the value that's currently loaded in hardware.

I just need to append "when KVM is actively running" (or probably something more
verbose).

> >   i.e. until KVM prepares to switch back to the host).
> > 
> > Actually, that's a bug waiting to happen when the getter comes along.  Rather
> > than document the potential pitfall, what about adding a prep patch to mimize
> > the window?  Then _this_ patch shouldn't need the caveat about the cache being
> > stale.
> With below patch,
> 
> For TDs,
>                             CURR          REAL
>    -----------------------------------------------------------------------
> 1. enable virtualization    host value    host value
> 2. before TDH.VP.ENTER      defval        host value or defval
> 3. TDH.VP.ENTER             defval        guest value (updated by tdx module)
> 4. TDH.VP.ENTER return      defval        defval (updated by tdx module)
> 5. exit to user mode        host value    host value
> 
> msrs->values[slot].curr is still not the current value loaded in hardware.

Right, this where it becomes stale from my perspective.

> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index ff41d3d00380..326fa81cb35f 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -789,6 +789,14 @@ void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
> >                 vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
> >  
> >         vt->guest_state_loaded = true;
> > +
> > +       /*
> > +        * Several of KVM's user-return MSRs are clobbered by the TDX-Module if
> Hmm, my previous mail didn't mention that besides saving guest value + clobber
> hardware value before exit to VMM, the TDX module also loads saved guest value
> to hardware on TDH.VP.ENTER.

That's not actually unique to TDX.  EFER is setup as a user return MSR, but is
context switched on VM-Enter/VM-Exit except when running on ancient hardware
without VM_{ENTRY,EXIT}_LOAD_IA32_EFER (and even then, only when KVM doesn't
need to atomically switch to avoid toggling EFER.NX while in the host).

I.e. msrs->values[<EFER slot>].curr won't match hardware either while running
the guest.  But because EFER is atomically loaded on VM-Exit in those cases, the
curr value can't be stale while KVM is running.

next prev parent reply	other threads:[~2025-10-03 16:53 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-19 21:42 [PATCH] KVM: x86: Drop "cache" from user return MSR setter that skips WRMSR Sean Christopherson
2025-09-23  3:20 ` Xiaoyao Li
2025-09-30 12:22 ` Yan Zhao
2025-09-30 12:58   ` Yan Zhao
2025-09-30 16:29     ` Sean Christopherson
2025-09-30 16:34       ` Sean Christopherson
2025-10-15  0:55         ` Yan Zhao
2025-10-15 16:19           ` Sean Christopherson
2025-10-16  8:58             ` Yan Zhao
2025-10-16 13:27               ` Hou Wenlong
2025-10-16 13:41                 ` Sean Christopherson
2025-10-20  7:53                   ` Yan Zhao
2025-10-16 13:05             ` Hou Wenlong
2025-10-03 13:53       ` Yan Zhao
2025-10-03 16:53         ` Sean Christopherson [this message]
2025-10-13 13:47           ` Yan Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aN__iPxo5P1bFCNk@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=xiaoyao.li@intel.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox