From: Sean Christopherson <seanjc@google.com>
To: Adrian Hunter <adrian.hunter@intel.com>
Cc: Rick P Edgecombe <rick.p.edgecombe@intel.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"kas@kernel.org" <kas@kernel.org>,
Xiaoyao Li <xiaoyao.li@intel.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-coco@lists.linux.dev" <linux-coco@lists.linux.dev>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Yan Y Zhao <yan.y.zhao@intel.com>,
"x86@kernel.org" <x86@kernel.org>,
wenlong hou <houwenlong.hwl@antgroup.com>
Subject: Re: [PATCH v4 1/4] KVM: TDX: Synchronize user-return MSRs immediately after VP.ENTER
Date: Tue, 21 Oct 2025 08:06:20 -0700
Message-ID: <aPehbDzbMHZTEtMa@google.com>
In-Reply-To: <d1628f0e-bbe9-48b0-8881-ad451d4ce9c5@intel.com>
On Tue, Oct 21, 2025, Adrian Hunter wrote:
> On 21/10/2025 01:55, Edgecombe, Rick P wrote:
> >> + * Several of KVM's user-return MSRs are clobbered by the TDX-Module if
> >> + * VP.ENTER succeeds, i.e. on TD-Exit. Mark those MSRs as needing an
> >> + * update to synchronize the "current" value in KVM's cache with the
> >> + * value in hardware (loaded by the TDX-Module).
> >> + */
> >
> > I think we should be synchronizing only after a successful VP.ENTER with a real
> > TD exit, but today instead we synchronize after any attempt to VP.ENTER.
Well, this is all completely @#($*#. Looking at the TDX-Module source, if the
TDX-Module synthesizes an exit, e.g. because it suspects a zero-step attack, it
will signal a "normal" exit but will not "restore" VMM state.
> If the MSRs do not get clobbered, does it matter whether or not they get
> restored?
It matters because KVM needs to know the actual value in hardware. If KVM thinks
an MSR is 'X', but it's actually 'Y', then KVM could fail to write the correct
value into hardware when returning to userspace and/or when running a different
vCPU.
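To make the hazard concrete, the write path looks roughly like this (a
paraphrased sketch of kvm_set_user_return_msr(), not the verbatim upstream
code):

  int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
  {
  	struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
  	int err;

  	/* Merge the new bits with the preserved host bits. */
  	value = (value & mask) | (msrs->values[slot].host & ~mask);

  	/*
  	 * Elide the WRMSR if the cached value already matches.  If "curr"
  	 * is stale, i.e. doesn't reflect what's actually in hardware, the
  	 * MSR is silently left holding the wrong value.
  	 */
  	if (value == msrs->values[slot].curr)
  		return 0;

  	err = wrmsrl_safe(msrs->values[slot].msr, value);
  	if (err)
  		return 1;

  	msrs->values[slot].curr = value;
  	kvm_user_return_register_notifier(msrs);
  	return 0;
  }

I.e. with a stale cache, the "optimization" guarantees the bad value stays in
hardware.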
Taking a step back, the entire approach of updating the "cache" after the fact is
ridiculous. TDX entry/exit is anything but fast; avoiding _at most_ 4x WRMSRs at
the start of the run loop is a very, very premature optimization. Preemptively
load hardware with the value that the TDX-Module _might_ set and call it good.
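(Back-of-the-envelope, not measured: a WRMSR is on the order of ~100 cycles,
while a SEAMCALL round trip runs to thousands of cycles, so four unconditional
WRMSRs when (re)loading guest state are lost in the noise.)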
I'll replace patches 1 and 4 with this, tagged for stable@.
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/vmx/tdx.c | 52 +++++++++++++++------------------
arch/x86/kvm/vmx/tdx.h | 1 -
arch/x86/kvm/x86.c | 9 ------
4 files changed, 23 insertions(+), 40 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 48598d017d6f..d158dfd1842e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2378,7 +2378,6 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
int kvm_add_user_return_msr(u32 msr);
int kvm_find_user_return_msr(u32 msr);
int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
-void kvm_user_return_msr_update_cache(unsigned int index, u64 val);
u64 kvm_get_user_return_msr(unsigned int slot);
static inline bool kvm_is_supported_user_return_msr(u32 msr)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 326db9b9c567..63abfa251243 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -763,25 +763,6 @@ static bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
return tdx_vcpu_state_details_intr_pending(vcpu_state_details);
}
-/*
- * Compared to vmx_prepare_switch_to_guest(), there is not much to do
- * as SEAMCALL/SEAMRET calls take care of most of save and restore.
- */
-void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
-{
- struct vcpu_vt *vt = to_vt(vcpu);
-
- if (vt->guest_state_loaded)
- return;
-
- if (likely(is_64bit_mm(current->mm)))
- vt->msr_host_kernel_gs_base = current->thread.gsbase;
- else
- vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
-
- vt->guest_state_loaded = true;
-}
-
struct tdx_uret_msr {
u32 msr;
unsigned int slot;
@@ -795,19 +776,38 @@ static struct tdx_uret_msr tdx_uret_msrs[] = {
{.msr = MSR_TSC_AUX,},
};
-static void tdx_user_return_msr_update_cache(void)
+void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
{
+ struct vcpu_vt *vt = to_vt(vcpu);
int i;
+ if (vt->guest_state_loaded)
+ return;
+
+ if (likely(is_64bit_mm(current->mm)))
+ vt->msr_host_kernel_gs_base = current->thread.gsbase;
+ else
+ vt->msr_host_kernel_gs_base = read_msr(MSR_KERNEL_GS_BASE);
+
+ vt->guest_state_loaded = true;
+
+ /*
+ * Explicitly set user-return MSRs that are clobbered by the TDX-Module
+ * if VP.ENTER succeeds, i.e. on TD-Exit, with the values that would be
+ * written by the TDX-Module. Don't rely on the TDX-Module to actually
+ * clobber the MSRs, as the contract is poorly defined and not upheld.
+ * E.g. the TDX-Module will synthesize an EPT Violation without doing
+ * VM-Enter if it suspects a zero-step attack, and never "restore" VMM
+ * state.
+ */
for (i = 0; i < ARRAY_SIZE(tdx_uret_msrs); i++)
- kvm_user_return_msr_update_cache(tdx_uret_msrs[i].slot,
- tdx_uret_msrs[i].defval);
+		kvm_set_user_return_msr(tdx_uret_msrs[i].slot,
+					tdx_uret_msrs[i].defval, -1ull);
}
static void tdx_prepare_switch_to_host(struct kvm_vcpu *vcpu)
{
struct vcpu_vt *vt = to_vt(vcpu);
- struct vcpu_tdx *tdx = to_tdx(vcpu);
if (!vt->guest_state_loaded)
return;
@@ -815,11 +815,6 @@ static void tdx_prepare_switch_to_host(struct kvm_vcpu *vcpu)
++vcpu->stat.host_state_reload;
wrmsrl(MSR_KERNEL_GS_BASE, vt->msr_host_kernel_gs_base);
- if (tdx->guest_entered) {
- tdx_user_return_msr_update_cache();
- tdx->guest_entered = false;
- }
-
vt->guest_state_loaded = false;
}
@@ -1059,7 +1054,6 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
update_debugctlmsr(vcpu->arch.host_debugctl);
tdx_load_host_xsave_state(vcpu);
- tdx->guest_entered = true;
vcpu->arch.regs_avail &= TDX_REGS_AVAIL_SET;
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index ca39a9391db1..7f258870dc41 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -67,7 +67,6 @@ struct vcpu_tdx {
u64 vp_enter_ret;
enum vcpu_tdx_state state;
- bool guest_entered;
u64 map_gpa_next;
u64 map_gpa_end;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b4b5d2d09634..639589af7cbe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -681,15 +681,6 @@ int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_set_user_return_msr);
-void kvm_user_return_msr_update_cache(unsigned int slot, u64 value)
-{
- struct kvm_user_return_msrs *msrs = this_cpu_ptr(user_return_msrs);
-
- msrs->values[slot].curr = value;
- kvm_user_return_register_notifier(msrs);
-}
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_user_return_msr_update_cache);
-
u64 kvm_get_user_return_msr(unsigned int slot)
{
return this_cpu_ptr(user_return_msrs)->values[slot].curr;
base-commit: f222788458c8a7753d43befef2769cd282dc008e
--