From: Binbin Wu <binbin.wu@linux.intel.com>
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com,
adrian.hunter@intel.com, reinette.chatre@intel.com,
xiaoyao.li@intel.com, tony.lindgren@intel.com,
isaku.yamahata@intel.com, yan.y.zhao@intel.com,
chao.gao@intel.com, linux-kernel@vger.kernel.org,
binbin.wu@linux.intel.com
Subject: [PATCH v2 04/20] KVM: TDX: Kick off vCPUs when SEAMCALL is busy during TD page removal
Date: Thu, 27 Feb 2025 09:20:05 +0800 [thread overview]
Message-ID: <20250227012021.1778144-5-binbin.wu@linux.intel.com> (raw)
In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com>
From: Yan Zhao <yan.y.zhao@intel.com>
Kick off all vCPUs and prevent tdh_vp_enter() from executing whenever
tdh_mem_range_block()/tdh_mem_track()/tdh_mem_page_remove() encounters
contention, since the page removal path does not expect error and is less
sensitive to the performance penalty caused by kicking off vCPUs.
Although KVM has protected SEPT zap-related SEAMCALLs with kvm->mmu_lock,
KVM may still encounter TDX_OPERAND_BUSY due to the contention in the TDX
module.
- tdh_mem_track() may contend with tdh_vp_enter().
- tdh_mem_range_block()/tdh_mem_page_remove() may contend with
tdh_vp_enter() and TDCALLs.
Resources SHARED users EXCLUSIVE users
------------------------------------------------------------
TDCS epoch tdh_vp_enter tdh_mem_track
------------------------------------------------------------
SEPT tree tdh_mem_page_remove tdh_vp_enter (0-step mitigation)
tdh_mem_range_block
------------------------------------------------------------
SEPT entry tdh_mem_range_block (Host lock)
tdh_mem_page_remove (Host lock)
tdg_mem_page_accept (Guest lock)
tdg_mem_page_attr_rd (Guest lock)
tdg_mem_page_attr_wr (Guest lock)
Use a TDX specific per-VM flag wait_for_sept_zap along with
KVM_REQ_OUTSIDE_GUEST_MODE to kick off vCPUs and prevent them from entering
TD, thereby avoiding the potential contention. Apply the kick-off and no
vCPU entering only after each SEAMCALL busy error to minimize the window of
no TD entry, as the contention due to 0-step mitigation or TDCALLs is
expected to be rare.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
Rebased to use helper tdx_operand_busy().
---
arch/x86/kvm/vmx/tdx.c | 63 ++++++++++++++++++++++++++++++++++++------
arch/x86/kvm/vmx/tdx.h | 7 +++++
2 files changed, 61 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 6fa6f7e13e15..25ae2c2826d0 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -294,6 +294,26 @@ static void tdx_clear_page(struct page *page)
__mb();
}
+static void tdx_no_vcpus_enter_start(struct kvm *kvm)
+{
+ struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+
+ lockdep_assert_held_write(&kvm->mmu_lock);
+
+ WRITE_ONCE(kvm_tdx->wait_for_sept_zap, true);
+
+ kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE);
+}
+
+static void tdx_no_vcpus_enter_stop(struct kvm *kvm)
+{
+ struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+
+ lockdep_assert_held_write(&kvm->mmu_lock);
+
+ WRITE_ONCE(kvm_tdx->wait_for_sept_zap, false);
+}
+
/* TDH.PHYMEM.PAGE.RECLAIM is allowed only when destroying the TD. */
static int __tdx_reclaim_page(struct page *page)
{
@@ -953,6 +973,14 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
return EXIT_FASTPATH_NONE;
}
+ /*
+ * Wait until retry of SEPT-zap-related SEAMCALL completes before
+ * allowing vCPU entry to avoid contention with tdh_vp_enter() and
+ * TDCALLs.
+ */
+ if (unlikely(READ_ONCE(to_kvm_tdx(vcpu->kvm)->wait_for_sept_zap)))
+ return EXIT_FASTPATH_EXIT_HANDLED;
+
trace_kvm_entry(vcpu, force_immediate_exit);
if (pi_test_on(&vt->pi_desc)) {
@@ -1467,15 +1495,24 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
if (KVM_BUG_ON(!is_hkid_assigned(kvm_tdx), kvm))
return -EINVAL;
- do {
+ /*
+ * When zapping private page, write lock is held. So no race condition
+ * with other vcpu sept operation.
+ * Race with TDH.VP.ENTER due to (0-step mitigation) and Guest TDCALLs.
+ */
+ err = tdh_mem_page_remove(&kvm_tdx->td, gpa, tdx_level, &entry,
+ &level_state);
+
+ if (unlikely(tdx_operand_busy(err))) {
/*
- * When zapping private page, write lock is held. So no race
- * condition with other vcpu sept operation. Race only with
- * TDH.VP.ENTER.
+ * The second retry is expected to succeed after kicking off all
+ * other vCPUs and prevent them from invoking TDH.VP.ENTER.
*/
+ tdx_no_vcpus_enter_start(kvm);
err = tdh_mem_page_remove(&kvm_tdx->td, gpa, tdx_level, &entry,
&level_state);
- } while (unlikely(tdx_operand_busy(err)));
+ tdx_no_vcpus_enter_stop(kvm);
+ }
if (KVM_BUG_ON(err, kvm)) {
pr_tdx_error_2(TDH_MEM_PAGE_REMOVE, err, entry, level_state);
@@ -1559,9 +1596,13 @@ static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn,
WARN_ON_ONCE(level != PG_LEVEL_4K);
err = tdh_mem_range_block(&kvm_tdx->td, gpa, tdx_level, &entry, &level_state);
- if (unlikely(tdx_operand_busy(err)))
- return -EBUSY;
+ if (unlikely(tdx_operand_busy(err))) {
+ /* After no vCPUs enter, the second retry is expected to succeed */
+ tdx_no_vcpus_enter_start(kvm);
+ err = tdh_mem_range_block(&kvm_tdx->td, gpa, tdx_level, &entry, &level_state);
+ tdx_no_vcpus_enter_stop(kvm);
+ }
if (tdx_is_sept_zap_err_due_to_premap(kvm_tdx, err, entry, level) &&
!KVM_BUG_ON(!atomic64_read(&kvm_tdx->nr_premapped), kvm)) {
atomic64_dec(&kvm_tdx->nr_premapped);
@@ -1611,9 +1652,13 @@ static void tdx_track(struct kvm *kvm)
lockdep_assert_held_write(&kvm->mmu_lock);
- do {
+ err = tdh_mem_track(&kvm_tdx->td);
+ if (unlikely(tdx_operand_busy(err))) {
+ /* After no vCPUs enter, the second retry is expected to succeed */
+ tdx_no_vcpus_enter_start(kvm);
err = tdh_mem_track(&kvm_tdx->td);
- } while (unlikely(tdx_operand_busy(err)));
+ tdx_no_vcpus_enter_stop(kvm);
+ }
if (KVM_BUG_ON(err, kvm))
pr_tdx_error(TDH_MEM_TRACK, err);
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 4e1e71d5d8e8..c6bafac31d4d 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -36,6 +36,13 @@ struct kvm_tdx {
/* For KVM_TDX_INIT_MEM_REGION. */
atomic64_t nr_premapped;
+
+ /*
+ * Prevent vCPUs from TD entry to ensure SEPT zap related SEAMCALLs do
+ * not contend with tdh_vp_enter() and TDCALLs.
+ * Set/unset is protected with kvm->mmu_lock.
+ */
+ bool wait_for_sept_zap;
};
/* TDX module vCPU states */
--
2.46.0
next prev parent reply other threads:[~2025-02-27 1:19 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-27 1:20 [PATCH v2 00/20] KVM: TDX: TDX "the rest" part Binbin Wu
2025-02-27 1:20 ` [PATCH v2 01/20] KVM: TDX: Handle EPT violation/misconfig exit Binbin Wu
2025-02-27 1:20 ` [PATCH v2 02/20] KVM: TDX: Detect unexpected SEPT violations due to pending SPTEs Binbin Wu
2025-02-27 1:20 ` [PATCH v2 03/20] KVM: TDX: Retry locally in TDX EPT violation handler on RET_PF_RETRY Binbin Wu
2025-02-27 1:20 ` Binbin Wu [this message]
2025-02-27 1:20 ` [PATCH v2 05/20] KVM: TDX: Handle TDX PV CPUID hypercall Binbin Wu
2025-02-27 1:20 ` [PATCH v2 06/20] KVM: TDX: Handle TDX PV HLT hypercall Binbin Wu
2025-02-27 1:20 ` [PATCH v2 07/20] KVM: x86: Move KVM_MAX_MCE_BANKS to header file Binbin Wu
2025-02-27 1:20 ` [PATCH v2 08/20] KVM: TDX: Implement callbacks for MSR operations Binbin Wu
2025-02-27 1:20 ` [PATCH v2 09/20] KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall Binbin Wu
2025-02-27 1:20 ` [PATCH v2 10/20] KVM: TDX: Enable guest access to LMCE related MSRs Binbin Wu
2025-02-27 1:20 ` [PATCH v2 11/20] KVM: TDX: Handle TDG.VP.VMCALL<GetTdVmCallInfo> hypercall Binbin Wu
2025-02-27 1:20 ` [PATCH v2 12/20] KVM: TDX: Add methods to ignore accesses to CPU state Binbin Wu
2025-02-27 1:20 ` [PATCH v2 13/20] KVM: TDX: Add method to ignore guest instruction emulation Binbin Wu
2025-02-27 1:20 ` [PATCH v2 14/20] KVM: TDX: Add methods to ignore VMX preemption timer Binbin Wu
2025-02-27 1:20 ` [PATCH v2 15/20] KVM: TDX: Add methods to ignore accesses to TSC Binbin Wu
2025-02-27 1:20 ` [PATCH v2 16/20] KVM: TDX: Ignore setting up mce Binbin Wu
2025-02-27 1:20 ` [PATCH v2 17/20] KVM: TDX: Add a method to ignore hypercall patching Binbin Wu
2025-02-27 1:20 ` [PATCH v2 18/20] KVM: TDX: Enable guest access to MTRR MSRs Binbin Wu
2025-02-27 1:20 ` [PATCH v2 19/20] KVM: TDX: Make TDX VM type supported Binbin Wu
2025-02-27 1:20 ` [PATCH v2 20/20] Documentation/virt/kvm: Document on Trust Domain Extensions (TDX) Binbin Wu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250227012021.1778144-5-binbin.wu@linux.intel.com \
--to=binbin.wu@linux.intel.com \
--cc=adrian.hunter@intel.com \
--cc=chao.gao@intel.com \
--cc=isaku.yamahata@intel.com \
--cc=kai.huang@intel.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=reinette.chatre@intel.com \
--cc=rick.p.edgecombe@intel.com \
--cc=seanjc@google.com \
--cc=tony.lindgren@intel.com \
--cc=xiaoyao.li@intel.com \
--cc=yan.y.zhao@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox