From: Binbin Wu <binbin.wu@linux.intel.com>
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com,
	adrian.hunter@intel.com, reinette.chatre@intel.com,
	xiaoyao.li@intel.com, tony.lindgren@intel.com,
	isaku.yamahata@intel.com, yan.y.zhao@intel.com,
	chao.gao@intel.com, linux-kernel@vger.kernel.org,
	binbin.wu@linux.intel.com
Subject: [PATCH v2 03/20] KVM: TDX: Retry locally in TDX EPT violation handler on RET_PF_RETRY
Date: Thu, 27 Feb 2025 09:20:04 +0800
Message-ID: <20250227012021.1778144-4-binbin.wu@linux.intel.com>
In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com>

From: Yan Zhao <yan.y.zhao@intel.com>

Retry locally in the TDX EPT violation handler for private memory to reduce
the chances for tdh_mem_sept_add()/tdh_mem_page_aug() to contend with
tdh_vp_enter().

The TDX EPT violation handler installs private pages via
tdh_mem_sept_add() and tdh_mem_page_aug(). Both may contend with
tdh_vp_enter() or TDCALLs.

Resources   SHARED users        EXCLUSIVE users
------------------------------------------------------------
SEPT tree   tdh_mem_sept_add    tdh_vp_enter (0-step mitigation)
            tdh_mem_page_aug
------------------------------------------------------------
SEPT entry                      tdh_mem_sept_add (Host lock)
                                tdh_mem_page_aug (Host lock)
                                tdg_mem_page_accept (Guest lock)
                                tdg_mem_page_attr_rd (Guest lock)
                                tdg_mem_page_attr_wr (Guest lock)

Though the contention between tdh_mem_sept_add()/tdh_mem_page_aug() and
TDCALLs may be removed in a future TDX module, their contention with
tdh_vp_enter() due to 0-step mitigation will persist.

The TDX module may trigger 0-step mitigation in SEAMCALL TDH.VP.ENTER,
which works as follows:
0. Each TDH.VP.ENTER records the guest RIP on TD entry.
1. When the TDX module encounters a VM exit with reason EPT_VIOLATION, it
   checks if the guest RIP is the same as the last guest RIP on TD entry.
   -if yes, it means the EPT violation is caused by the same instruction
            that caused the last VM exit.
            Then, the TDX module increases the guest RIP no-progress count.
            When the count increases from 0 to the threshold (currently 6),
            the TDX module records the faulting GPA into a
            last_epf_gpa_list.
   -if no,  it means the guest RIP has made progress.
            So, the TDX module resets the RIP no-progress count and the
            last_epf_gpa_list.
2. On the next TDH.VP.ENTER, the TDX module (after saving the guest RIP on
   TD entry) checks if the last_epf_gpa_list is empty.
   -if yes, TD entry continues without acquiring the lock on the SEPT tree.
   -if no,  it triggers the 0-step mitigation by acquiring the exclusive
            lock on the SEPT tree and walking the EPT tree to check if all
            page faults caused by the GPAs in the last_epf_gpa_list have
            been resolved before continuing TD entry.
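
The steps above can be modelled in a few lines of C. This is only an
illustrative sketch, not TDX module code: the struct, the function names
and the list capacity are hypothetical, and it assumes one reading of
step 1, namely that a faulting GPA is recorded only once the no-progress
count has reached the threshold.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ZERO_STEP_THRESHOLD 6 /* "currently 6" per the description above */
#define EPF_LIST_MAX        8 /* hypothetical capacity, for illustration */

/* Hypothetical per-vCPU state; the names are not taken from the TDX module. */
struct vp_model {
	uint64_t entry_rip;             /* guest RIP recorded on TD entry */
	unsigned int no_progress;       /* guest RIP no-progress count */
	uint64_t epf_gpa[EPF_LIST_MAX]; /* stands in for last_epf_gpa_list */
	size_t epf_count;
};

/*
 * Steps 0 and 2: record the guest RIP, then report whether this TD entry
 * would take the exclusive SEPT tree lock (i.e. the list is not empty).
 */
static bool model_td_enter(struct vp_model *vp, uint64_t guest_rip)
{
	vp->entry_rip = guest_rip;
	return vp->epf_count != 0;
}

/* Step 1: account for a VM exit with reason EPT_VIOLATION. */
static void model_ept_violation_exit(struct vp_model *vp, uint64_t guest_rip,
				     uint64_t gpa)
{
	if (guest_rip != vp->entry_rip) {
		/* The guest RIP made progress: reset the count and the list. */
		vp->no_progress = 0;
		vp->epf_count = 0;
		return;
	}

	if (vp->no_progress < ZERO_STEP_THRESHOLD)
		vp->no_progress++;

	/* Assumption: GPAs are recorded only once the threshold is reached. */
	if (vp->no_progress >= ZERO_STEP_THRESHOLD &&
	    vp->epf_count < EPF_LIST_MAX)
		vp->epf_gpa[vp->epf_count++] = gpa;
}
```

In this model, six consecutive EPT violations at the same RIP make the
following model_td_enter() report that the exclusive lock is needed, and a
single exit at a different RIP clears that state again.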

Since the KVM TDP MMU usually re-enters the guest whenever it exits to
userspace (e.g. for KVM_EXIT_MEMORY_FAULT) or encounters a BUSY, it is
possible for tdh_vp_enter() to be called more times than the threshold
count before a page fault is addressed, triggering contention when
tdh_vp_enter() attempts to acquire the exclusive lock on the SEPT tree.

Retry locally in the TDX EPT violation handler to reduce the number of
tdh_vp_enter() invocations, hence reducing the possibility of its
contention with tdh_mem_sept_add()/tdh_mem_page_aug(). However, the 0-step
mitigation and the contention are still not eliminated due to
KVM_EXIT_MEMORY_FAULT, signals/interrupts, and cases when one instruction
faults on more GFNs than the threshold count.
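
The effect of the local retry on the number of TD entries can be sketched
as follows. This is a simplified model, not the KVM code: the enum values,
struct and functions are hypothetical, and the sketch deliberately ignores
the break-outs for pending events, signals and a dead VM. It only counts
how often the guest would be re-entered while the fault is still
unresolved.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for RET_PF_RETRY and a resolved fault. */
enum model_pf { MODEL_PF_RETRY, MODEL_PF_FIXED };

/* A fault whose handler reports "retry" a fixed number of times first. */
struct fault_model {
	unsigned int retries_left;
};

static enum model_pf model_handle_fault(struct fault_model *f)
{
	if (f->retries_left) {
		f->retries_left--;
		return MODEL_PF_RETRY;
	}
	return MODEL_PF_FIXED;
}

/*
 * Count the guest re-entries that happen while the fault is unresolved.
 * Each such re-entry is a TD entry at the same guest RIP, so it counts
 * toward the no-progress threshold.
 */
static unsigned int model_reentries(struct fault_model *f, bool local_retry)
{
	unsigned int reentries = 0;

	for (;;) {
		enum model_pf ret = model_handle_fault(f);

		/* With local retry, stay in the handler instead of re-entering. */
		while (local_retry && ret == MODEL_PF_RETRY)
			ret = model_handle_fault(f);

		if (ret == MODEL_PF_FIXED)
			return reentries;

		reentries++; /* re-enter the guest; it faults on the same RIP */
	}
}
```

For a fault that needs seven retries, the model re-enters the guest seven
times without local retry, which is above the threshold of 6, and zero
times with it.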

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
Changes from "KVM: TDX SEPT SEAMCALL retry" series [1]
- Use kvm_vcpu_has_events() to replace "pi_has_pending_interrupt(vcpu) ||
  kvm_test_request(KVM_REQ_NMI, vcpu) || vcpu->arch.nmi_pending)" [2].
  (Sean)
- Update the code comment to explain why breaking out on a false positive
  of interrupt injection is fine (Sean).

[1]: https://lore.kernel.org/all/20250113021218.18922-1-yan.y.zhao@intel.com
[2]: https://lore.kernel.org/all/Z4rIGv4E7Jdmhl8P@google.com
---
 arch/x86/kvm/vmx/tdx.c | 57 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b8701e343e80..6fa6f7e13e15 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1698,6 +1698,8 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qual;
 	gpa_t gpa = to_tdx(vcpu)->exit_gpa;
+	bool local_retry = false;
+	int ret;
 
 	if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) {
 		if (tdx_is_sept_violation_unexpected_pending(vcpu)) {
@@ -1716,6 +1718,9 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
 		 * due to aliasing a single HPA to multiple GPAs.
 		 */
 		exit_qual = EPT_VIOLATION_ACC_WRITE;
+
+		/* Only private GPA triggers zero-step mitigation */
+		local_retry = true;
 	} else {
 		exit_qual = vmx_get_exit_qual(vcpu);
 		/*
@@ -1728,7 +1733,57 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
 	}
 
 	trace_kvm_page_fault(vcpu, gpa, exit_qual);
-	return __vmx_handle_ept_violation(vcpu, gpa, exit_qual);
+
+	/*
+	 * To minimize TDH.VP.ENTER invocations, retry locally for private GPA
+	 * mapping in TDX.
+	 *
+	 * KVM may return RET_PF_RETRY for private GPA due to
+	 * - contentions when atomically updating SPTEs of the mirror page table
+	 * - in-progress GFN invalidation or memslot removal.
+	 * - TDX_OPERAND_BUSY error from TDH.MEM.PAGE.AUG or TDH.MEM.SEPT.ADD,
+	 *   caused by contentions with TDH.VP.ENTER (with zero-step mitigation)
+	 *   or certain TDCALLs.
+	 *
+	 * If TDH.VP.ENTER is invoked more times than the threshold set by the
+	 * TDX module before KVM resolves the private GPA mapping, the TDX
+	 * module will activate zero-step mitigation during TDH.VP.ENTER. This
+	 * process acquires an SEPT tree lock in the TDX module, leading to
+	 * further contentions with TDH.MEM.PAGE.AUG or TDH.MEM.SEPT.ADD
+	 * operations on other vCPUs.
+	 *
+	 * Breaking out of local retries for kvm_vcpu_has_events() is for
+	 * interrupt injection. kvm_vcpu_has_events() should not see pending
+	 * events for TDX. Since KVM can't determine if IRQs (or NMIs) are
+	 * blocked by TDs, false positives are inevitable i.e., KVM may re-enter
+	 * the guest even if the IRQ/NMI can't be delivered.
+	 *
+	 * Note: even without breaking out of local retries, zero-step
+	 * mitigation may still occur due to
+	 * - invoking of TDH.VP.ENTER after KVM_EXIT_MEMORY_FAULT,
+	 * - a single RIP causing EPT violations for more GFNs than the
+	 *   threshold count.
+	 * This is safe, as triggering zero-step mitigation only introduces
+	 * contentions to page installation SEAMCALLs on other vCPUs, which will
+	 * handle retries locally in their EPT violation handlers.
+	 */
+	while (1) {
+		ret = __vmx_handle_ept_violation(vcpu, gpa, exit_qual);
+
+		if (ret != RET_PF_RETRY || !local_retry)
+			break;
+
+		if (kvm_vcpu_has_events(vcpu) || signal_pending(current))
+			break;
+
+		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
+			ret = -EIO;
+			break;
+		}
+
+		cond_resched();
+	}
+	return ret;
 }
 
 int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
-- 
2.46.0


