kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chao Gao <chao.gao@intel.com>,
	pbonzini@redhat.com, kvm@vger.kernel.org,
	 dave.hansen@linux.intel.com, rick.p.edgecombe@intel.com,
	kai.huang@intel.com,  reinette.chatre@intel.com,
	xiaoyao.li@intel.com,  tony.lindgren@linux.intel.com,
	binbin.wu@linux.intel.com, dmatlack@google.com,
	 isaku.yamahata@intel.com, nik.borisov@suse.com,
	linux-kernel@vger.kernel.org,  x86@kernel.org,
	yan.y.zhao@intel.com, weijiang.yang@intel.com
Subject: Re: [PATCH 4/7] KVM: TDX: restore host xsave state when exit from the guest TD
Date: Fri, 10 Jan 2025 09:30:16 -0800	[thread overview]
Message-ID: <Z4FZKOzXIdhLOlU8@google.com> (raw)
In-Reply-To: <3a7d93aa-781b-445e-a67a-25b0ffea0dff@intel.com>

On Fri, Jan 10, 2025, Adrian Hunter wrote:
> On 9/01/25 21:11, Sean Christopherson wrote:
> > On Fri, Jan 03, 2025, Adrian Hunter wrote:
> >> +static u64 tdx_guest_cr4(struct kvm_vcpu *vcpu)
> >> +{
> >> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> >> +	u64 cr4;
> >> +
> >> +	rdmsrl(MSR_IA32_VMX_CR4_FIXED1, cr4);
> > 
> > This won't be accurate long-term.  E.g. run KVM on hardware with CR4 bits that
> > neither KVM nor TDX know about, and vcpu->arch.cr4 will end up with bits set that
> > KVM think are illegal, which will cause it's own problems.
> 
> Currently validation of CR4 is only done when user space changes it,
> which should not be allowed for TDX.  For that it looks like TDX
> would need:
> 
> 	kvm->arch.has_protected_state = true;
> 
> Not sure why it doesn't already?

Sorry, I didn't follow any of that.

> > For CR0 and CR4, we should be able to start with KVM's set of allowed bits, not
> > the CPU's.  That will mean there will likely be missing bits, in vcpu->arch.cr{0,4},
> > but if KVM doesn't know about a bit, the fact that it's missing should be a complete
> > non-issue.
> 
> What about adding:
> 
> 	cr4 &= ~cr4_reserved_bits;
> 
> and
> 
> 	cr0 &= ~CR0_RESERVED_BITS

I was thinking a much more explicit:

	vcpu->arch.cr4 = ~vcpu->arch.cr4_guest_rsvd_bits;

which if it's done in tdx_vcpu_init(), in conjunction with freezing the vCPU
model (see below), should be solid.

> >> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >> index d2ea7db896ba..f2b1980f830d 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -1240,6 +1240,11 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
> >>  	u64 old_xcr0 = vcpu->arch.xcr0;
> >>  	u64 valid_bits;
> >>  
> >> +	if (vcpu->arch.guest_state_protected) {
> > 
> > This should be a WARN_ON_ONCE() + return 1, no?
> 
> With kvm->arch.has_protected_state = true, KVM_SET_XCRS
> would fail, which would probably be fine except for KVM selftests:
> 
> Currently the KVM selftests expect to be able to set XCR0:
> 
>     td_vcpu_add()
> 	vm_vcpu_add()
> 	    vm_arch_vcpu_add()
> 		vcpu_init_xcrs()
> 		    vcpu_xcrs_set()
> 			vcpu_ioctl(KVM_SET_XCRS)
> 			    __TEST_ASSERT_VM_VCPU_IOCTL(!ret)
> 
> Seems like vm->arch.has_protected_state is needed for KVM selftests?

I doubt it's truly needed, my guess (without looking at the code) is that selftests
are fudging around the fact that KVM doesn't stuff arch.xcr0.

> >> +		kvm_update_cpuid_runtime(vcpu);
> 
> And kvm_update_cpuid_runtime() never gets called otherwise.
> Not sure where would be a good place to call it.

I think we should call it in tdx_vcpu_init(), and then also freeze the vCPU model
at that time.  KVM currently "freezes" the model based on last_vmentry_cpu, but
that's a bit of a hack and might even be flawed, e.g. I wouldn't be surprised if
it's possible to lead KVM astray by trying to get a signal to race with KVM_RUN
so that last_vmentry_cpu isn't set despite getting quite far into KVM_RUN.

I'll test and post a patch to add vcpu_model_is_frozen (the below will conflict
mightily with the CPUID rework that's queued for v6.14), as I think it's a good
change even if we don't end up freezing the model at tdx_vcpu_init() (though I
can't think of any reason to allow CPUID updates after that point).

---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/cpuid.c            | 2 +-
 arch/x86/kvm/mmu/mmu.c          | 4 ++--
 arch/x86/kvm/pmu.c              | 2 +-
 arch/x86/kvm/vmx/tdx.c          | 7 +++++++
 arch/x86/kvm/x86.c              | 9 ++++++++-
 arch/x86/kvm/x86.h              | 5 -----
 7 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0787855ab006..41c31a69924d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -779,6 +779,7 @@ struct kvm_vcpu_arch {
 	u64 ia32_misc_enable_msr;
 	u64 smbase;
 	u64 smi_count;
+	bool vcpu_model_is_frozen;
 	bool at_instruction_boundary;
 	bool tpr_access_reporting;
 	bool xfd_no_write_intercept;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 6c7ab125f582..678518ec1c72 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -465,7 +465,7 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 	 * KVM_SET_CPUID{,2} again. To support this legacy behavior, check
 	 * whether the supplied CPUID data is equal to what's already set.
 	 */
-	if (kvm_vcpu_has_run(vcpu)) {
+	if (vcpu->arch.vcpu_model_is_frozen) {
 		r = kvm_cpuid_check_equal(vcpu, e2, nent);
 		if (r)
 			return r;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 713ca857f2c2..75350a5c6c54 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5716,9 +5716,9 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	/*
 	 * Changing guest CPUID after KVM_RUN is forbidden, see the comment in
-	 * kvm_arch_vcpu_ioctl().
+	 * kvm_arch_vcpu_ioctl_run().
 	 */
-	KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm);
+	KVM_BUG_ON(vcpu->arch.vcpu_model_is_frozen, vcpu->kvm);
 }
 
 void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 47a46283c866..4f487a980eae 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -752,7 +752,7 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 
-	if (KVM_BUG_ON(kvm_vcpu_has_run(vcpu), vcpu->kvm))
+	if (KVM_BUG_ON(vcpu->arch.vcpu_model_is_frozen, vcpu->kvm))
 		return;
 
 	/*
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index a587f59167a7..997d14506a1f 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2822,6 +2822,13 @@ static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
 	td_vmcs_write64(tdx, POSTED_INTR_DESC_ADDR, __pa(&tdx->pi_desc));
 	td_vmcs_setbit32(tdx, PIN_BASED_VM_EXEC_CONTROL, PIN_BASED_POSTED_INTR);
 
+	/*
+	 * Freeze the vCPU model, as KVM relies on guest CPUID and capabilities
+	 * to be consistent with the TDX Module's view from here on out.
+	 */
+	vcpu->arch.vcpu_model_is_frozen = true;
+	kvm_update_cpuid_runtime(vcpu);
+
 	tdx->state = VCPU_TD_STATE_INITIALIZED;
 	return 0;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d2ea7db896ba..3db935737b59 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2218,7 +2218,8 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 	 * writes of the same value, e.g. to allow userspace to blindly stuff
 	 * all MSRs when emulating RESET.
 	 */
-	if (kvm_vcpu_has_run(vcpu) && kvm_is_immutable_feature_msr(index) &&
+	if (vcpu->arch.vcpu_model_is_frozen &&
+	    kvm_is_immutable_feature_msr(index) &&
 	    (do_get_msr(vcpu, index, &val) || *data != val))
 		return -EINVAL;
 
@@ -11469,6 +11470,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 	struct kvm_run *kvm_run = vcpu->run;
 	int r;
 
+	/*
+	 * Freeze the vCPU model, i.e. disallow changing CPUID, feature MSRs,
+	 * etc.  KVM doesn't support changing the model once the vCPU has run.
+	 */
+	vcpu->arch.vcpu_model_is_frozen = true;
+
 	vcpu_load(vcpu);
 	kvm_sigset_activate(vcpu);
 	kvm_run->flags = 0;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 514ffd7513f3..6ed074d03616 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -118,11 +118,6 @@ static inline void kvm_leave_nested(struct kvm_vcpu *vcpu)
 	kvm_x86_ops.nested_ops->leave_nested(vcpu);
 }
 
-static inline bool kvm_vcpu_has_run(struct kvm_vcpu *vcpu)
-{
-	return vcpu->arch.last_vmentry_cpu != -1;
-}
-
 static inline bool kvm_is_exception_pending(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.exception.pending ||

base-commit: d12b37e67b767a9e89b221067d48b257708d3044
-- 

  reply	other threads:[~2025-01-10 17:30 UTC|newest]

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-21 20:14 [PATCH 0/7] KVM: TDX: TD vcpu enter/exit Adrian Hunter
2024-11-21 20:14 ` [PATCH RFC 1/7] x86/virt/tdx: Add SEAMCALL wrapper to enter/exit TDX guest Adrian Hunter
2024-11-22 11:10   ` Adrian Hunter
2024-11-22 16:33     ` Dave Hansen
2024-11-25 13:40       ` Adrian Hunter
2024-11-28 11:13         ` Adrian Hunter
2024-12-04 15:58           ` Adrian Hunter
2024-12-11 18:43             ` Adrian Hunter
2024-12-13 15:45               ` Adrian Hunter
2024-12-13 16:16               ` Dave Hansen
2024-12-13 16:30                 ` Adrian Hunter
2024-12-13 16:44                   ` Dave Hansen
2024-11-22 16:26   ` Dave Hansen
2024-11-22 17:29     ` Edgecombe, Rick P
2024-11-25 13:43       ` Adrian Hunter
2024-11-21 20:14 ` [PATCH 2/7] KVM: TDX: Implement TDX vcpu enter/exit path Adrian Hunter
2024-11-22  5:23   ` Xiaoyao Li
2024-11-22  5:56     ` Binbin Wu
2024-11-22 14:33       ` Adrian Hunter
2024-11-28  5:56         ` Yan Zhao
2024-11-28  6:26           ` Adrian Hunter
2024-11-21 20:14 ` [PATCH 3/7] KVM: TDX: vcpu_run: save/restore host state(host kernel gs) Adrian Hunter
2024-11-25 14:12   ` Nikolay Borisov
2024-11-26 16:15     ` Adrian Hunter
2024-11-21 20:14 ` [PATCH 4/7] KVM: TDX: restore host xsave state when exit from the guest TD Adrian Hunter
2024-11-22  5:49   ` Chao Gao
2024-11-25 11:10     ` Adrian Hunter
2024-11-26  2:20       ` Chao Gao
2024-11-28  6:50         ` Adrian Hunter
2024-12-02  2:52           ` Chao Gao
2024-12-02  6:36             ` Adrian Hunter
2024-12-17 16:09       ` Sean Christopherson
2024-12-20 15:22         ` Adrian Hunter
2024-12-20 16:22           ` Sean Christopherson
2024-12-20 21:24             ` PKEY syscall number for selftest? (was: [PATCH 4/7] KVM: TDX: restore host xsave state when exit from the guest TD) Sean Christopherson
2025-01-27 17:09               ` Sean Christopherson
2025-01-03 18:16             ` [PATCH 4/7] KVM: TDX: restore host xsave state when exit from the guest TD Adrian Hunter
2025-01-09 19:11               ` Sean Christopherson
2025-01-10 14:50                 ` Adrian Hunter
2025-01-10 17:30                   ` Sean Christopherson [this message]
2025-01-14 20:04                     ` Adrian Hunter
2025-01-15  2:28                       ` Sean Christopherson
2025-01-13 19:28                 ` Adrian Hunter
2025-01-13 23:47                   ` Sean Christopherson
2024-11-25 11:34     ` Adrian Hunter
2024-11-21 20:14 ` [PATCH 5/7] KVM: x86: Allow to update cached values in kvm_user_return_msrs w/o wrmsr Adrian Hunter
2024-11-21 20:14 ` [PATCH 6/7] KVM: TDX: restore user ret MSRs Adrian Hunter
2024-11-21 20:14 ` [PATCH 7/7] KVM: TDX: Add TSX_CTRL msr into uret_msrs list Adrian Hunter
2024-11-22  3:27   ` Chao Gao
2024-11-27 14:00     ` Sean Christopherson
2024-11-29 11:39       ` Adrian Hunter
2024-12-02 19:07         ` Sean Christopherson
2024-12-02 19:24           ` Edgecombe, Rick P
2024-12-03  0:34             ` Sean Christopherson
2024-12-03 17:34               ` Edgecombe, Rick P
2024-12-03 19:17                 ` Adrian Hunter
2024-12-04  1:25                   ` Chao Gao
2024-12-04  6:18                     ` Adrian Hunter
2024-12-04  6:37                       ` Chao Gao
2024-12-04  6:57                         ` Adrian Hunter
2024-12-04 11:13                           ` Chao Gao
2024-12-04 11:55                             ` Adrian Hunter
2024-12-04 15:33                               ` Xiaoyao Li
2024-12-04 23:51                                 ` Edgecombe, Rick P
2024-12-05 17:31                                 ` Adrian Hunter
2024-12-06  3:37                                   ` Xiaoyao Li
2024-12-06 14:40                                     ` Adrian Hunter
2024-12-09  2:46                                       ` Xiaoyao Li
2024-12-09  7:08                                         ` Adrian Hunter
2024-12-10  2:45                                           ` Xiaoyao Li
2024-12-04 23:40                               ` Edgecombe, Rick P
2024-11-25  1:25 ` [PATCH 0/7] KVM: TDX: TD vcpu enter/exit Binbin Wu
2024-11-25 15:19   ` Sean Christopherson
2024-11-25 19:50     ` Huang, Kai
2024-11-25 22:51       ` Sean Christopherson
2024-11-26  1:43         ` Huang, Kai
2024-11-26  1:44         ` Binbin Wu
2024-11-26  3:52           ` Huang, Kai
2024-11-26  5:29             ` Binbin Wu
2024-11-26  5:37               ` Huang, Kai
2024-11-26 21:41               ` Sean Christopherson
2024-12-10 18:23 ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z4FZKOzXIdhLOlU8@google.com \
    --to=seanjc@google.com \
    --cc=adrian.hunter@intel.com \
    --cc=binbin.wu@linux.intel.com \
    --cc=chao.gao@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dmatlack@google.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kai.huang@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nik.borisov@suse.com \
    --cc=pbonzini@redhat.com \
    --cc=reinette.chatre@intel.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=tony.lindgren@linux.intel.com \
    --cc=weijiang.yang@intel.com \
    --cc=x86@kernel.org \
    --cc=xiaoyao.li@intel.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).