public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Binbin Wu <binbin.wu@linux.intel.com>
To: pbonzini@redhat.com, seanjc@google.com, kvm@vger.kernel.org
Cc: rick.p.edgecombe@intel.com, kai.huang@intel.com,
	adrian.hunter@intel.com, reinette.chatre@intel.com,
	xiaoyao.li@intel.com, tony.lindgren@intel.com,
	isaku.yamahata@intel.com, yan.y.zhao@intel.com,
	chao.gao@intel.com, linux-kernel@vger.kernel.org,
	binbin.wu@linux.intel.com
Subject: [PATCH v2 01/20] KVM: TDX: Handle EPT violation/misconfig exit
Date: Thu, 27 Feb 2025 09:20:02 +0800	[thread overview]
Message-ID: <20250227012021.1778144-2-binbin.wu@linux.intel.com> (raw)
In-Reply-To: <20250227012021.1778144-1-binbin.wu@linux.intel.com>

From: Isaku Yamahata <isaku.yamahata@intel.com>

For TDX, on EPT violation, call common __vmx_handle_ept_violation() to
trigger x86 MMU code; on EPT misconfiguration, bug the VM since it
shouldn't happen.

EPT violation due to instruction fetch should never be triggered from
shared memory in TDX guest.  If such EPT violation occurs, treat it as
broken hardware.

EPT misconfiguration shouldn't happen on neither shared nor secure EPT for
TDX guests.
- TDX module guarantees no EPT misconfiguration on secure EPT.  Per TDX
  module v1.5 spec section 9.4 "Secure EPT Induced TD Exits":
  "By design, since secure EPT is fully controlled by the TDX module, an
  EPT misconfiguration on a private GPA indicates a TDX module bug and is
  handled as a fatal error."
- For shared EPT, the MMIO caching optimization, which is the only case
  where current KVM configures EPT entries to generate EPT
  misconfiguration, is implemented in a different way for TDX guests.  KVM
  configures EPT entries to non-present value without suppressing #VE bit.
  It causes #VE in the TDX guest and the guest will call TDG.VP.VMCALL to
  request MMIO emulation.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
[binbin: rework changelog]
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
TDX "the rest" v2:
- KVM_BUG_ON() and return -EIO for real EXIT_REASON_EPT_MISCONFIG. (Sean)
- Defer the handling for real EXIT_REASON_EPT_MISCONFIG until
  tdx_handle_exit() because tdx_to_vmx_exit_reason() is called by
  non-instrumentable code with interrupt disabled.
- Rebased after adding tdcall_to_vmx_exit_reason().

TDX "the rest" v1:
- Renamed from "KVM: TDX: Handle ept violation/misconfig exit" to
  "KVM: TDX: Handle EPT violation/misconfig exit" (Reinette)
- Removed WARN_ON_ONCE(1) in tdx_handle_ept_misconfig(). (Rick)
- Add comment above EPT_VIOLATION_ACC_INSTR check. (Chao)
  https://lore.kernel.org/lkml/Zgoz0sizgEZhnQ98@chao-email/
  https://lore.kernel.org/lkml/ZjiE+O9fct5zI4Sf@chao-email/
- Remove unnecessary define of TDX_SEPT_VIOLATION_EXIT_QUAL. (Sean)
- Replace pr_warn() and KVM_EXIT_EXCEPTION with KVM_BUG_ON(). (Sean)
- KVM_BUG_ON() for EPT misconfig. (Sean)
- Rework changelog.

v14 -> v15:
- use PFERR_GUEST_ENC_MASK to tell the fault is private
---
 arch/x86/kvm/vmx/tdx.c | 47 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index f9eccee02a69..0e2c734070d6 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -860,6 +860,12 @@ static __always_inline u32 tdx_to_vmx_exit_reason(struct kvm_vcpu *vcpu)
 			return EXIT_REASON_VMCALL;
 
 		return tdcall_to_vmx_exit_reason(vcpu);
+	case EXIT_REASON_EPT_MISCONFIG:
+		/*
+		 * Defer KVM_BUG_ON() until tdx_handle_exit() because this is in
+		 * non-instrumentable code with interrupts disabled.
+		 */
+		return -1u;
 	default:
 		break;
 	}
@@ -968,6 +974,9 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 
 	vcpu->arch.regs_avail &= ~TDX_REGS_UNSUPPORTED_SET;
 
+	if (unlikely(tdx->vp_enter_ret == EXIT_REASON_EPT_MISCONFIG))
+		return EXIT_FASTPATH_NONE;
+
 	if (unlikely((tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR))
 		return EXIT_FASTPATH_NONE;
 
@@ -1674,6 +1683,37 @@ void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 	trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode, trig_mode, vector);
 }
 
+static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
+{
+	unsigned long exit_qual;
+	gpa_t gpa = to_tdx(vcpu)->exit_gpa;
+
+	if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) {
+		/*
+		 * Always treat SEPT violations as write faults.  Ignore the
+		 * EXIT_QUALIFICATION reported by TDX-SEAM for SEPT violations.
+		 * TD private pages are always RWX in the SEPT tables,
+		 * i.e. they're always mapped writable.  Just as importantly,
+		 * treating SEPT violations as write faults is necessary to
+		 * avoid COW allocations, which will cause TDAUGPAGE failures
+		 * due to aliasing a single HPA to multiple GPAs.
+		 */
+		exit_qual = EPT_VIOLATION_ACC_WRITE;
+	} else {
+		exit_qual = vmx_get_exit_qual(vcpu);
+		/*
+		 * EPT violation due to instruction fetch should never be
+		 * triggered from shared memory in TDX guest.  If such EPT
+		 * violation occurs, treat it as broken hardware.
+		 */
+		if (KVM_BUG_ON(exit_qual & EPT_VIOLATION_ACC_INSTR, vcpu->kvm))
+			return -EIO;
+	}
+
+	trace_kvm_page_fault(vcpu, gpa, exit_qual);
+	return __vmx_handle_ept_violation(vcpu, gpa, exit_qual);
+}
+
 int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 {
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
@@ -1683,6 +1723,11 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 	if (fastpath != EXIT_FASTPATH_NONE)
 		return 1;
 
+	if (unlikely(vp_enter_ret == EXIT_REASON_EPT_MISCONFIG)) {
+		KVM_BUG_ON(1, vcpu->kvm);
+		return -EIO;
+	}
+
 	/*
 	 * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and
 	 * TDX_SEAMCALL_VMFAILINVALID.
@@ -1732,6 +1777,8 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 		return tdx_emulate_io(vcpu);
 	case EXIT_REASON_EPT_MISCONFIG:
 		return tdx_emulate_mmio(vcpu);
+	case EXIT_REASON_EPT_VIOLATION:
+		return tdx_handle_ept_violation(vcpu);
 	case EXIT_REASON_OTHER_SMI:
 		/*
 		 * Unlike VMX, SMI in SEAM non-root mode (i.e. when
-- 
2.46.0


  reply	other threads:[~2025-02-27  1:18 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-27  1:20 [PATCH v2 00/20] KVM: TDX: TDX "the rest" part Binbin Wu
2025-02-27  1:20 ` Binbin Wu [this message]
2025-02-27  1:20 ` [PATCH v2 02/20] KVM: TDX: Detect unexpected SEPT violations due to pending SPTEs Binbin Wu
2025-02-27  1:20 ` [PATCH v2 03/20] KVM: TDX: Retry locally in TDX EPT violation handler on RET_PF_RETRY Binbin Wu
2025-02-27  1:20 ` [PATCH v2 04/20] KVM: TDX: Kick off vCPUs when SEAMCALL is busy during TD page removal Binbin Wu
2025-02-27  1:20 ` [PATCH v2 05/20] KVM: TDX: Handle TDX PV CPUID hypercall Binbin Wu
2025-02-27  1:20 ` [PATCH v2 06/20] KVM: TDX: Handle TDX PV HLT hypercall Binbin Wu
2025-02-27  1:20 ` [PATCH v2 07/20] KVM: x86: Move KVM_MAX_MCE_BANKS to header file Binbin Wu
2025-02-27  1:20 ` [PATCH v2 08/20] KVM: TDX: Implement callbacks for MSR operations Binbin Wu
2025-02-27  1:20 ` [PATCH v2 09/20] KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall Binbin Wu
2025-02-27  1:20 ` [PATCH v2 10/20] KVM: TDX: Enable guest access to LMCE related MSRs Binbin Wu
2025-02-27  1:20 ` [PATCH v2 11/20] KVM: TDX: Handle TDG.VP.VMCALL<GetTdVmCallInfo> hypercall Binbin Wu
2025-02-27  1:20 ` [PATCH v2 12/20] KVM: TDX: Add methods to ignore accesses to CPU state Binbin Wu
2025-02-27  1:20 ` [PATCH v2 13/20] KVM: TDX: Add method to ignore guest instruction emulation Binbin Wu
2025-02-27  1:20 ` [PATCH v2 14/20] KVM: TDX: Add methods to ignore VMX preemption timer Binbin Wu
2025-02-27  1:20 ` [PATCH v2 15/20] KVM: TDX: Add methods to ignore accesses to TSC Binbin Wu
2025-02-27  1:20 ` [PATCH v2 16/20] KVM: TDX: Ignore setting up mce Binbin Wu
2025-02-27  1:20 ` [PATCH v2 17/20] KVM: TDX: Add a method to ignore hypercall patching Binbin Wu
2025-02-27  1:20 ` [PATCH v2 18/20] KVM: TDX: Enable guest access to MTRR MSRs Binbin Wu
2025-02-27  1:20 ` [PATCH v2 19/20] KVM: TDX: Make TDX VM type supported Binbin Wu
2025-02-27  1:20 ` [PATCH v2 20/20] Documentation/virt/kvm: Document on Trust Domain Extensions (TDX) Binbin Wu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250227012021.1778144-2-binbin.wu@linux.intel.com \
    --to=binbin.wu@linux.intel.com \
    --cc=adrian.hunter@intel.com \
    --cc=chao.gao@intel.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kai.huang@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=reinette.chatre@intel.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=seanjc@google.com \
    --cc=tony.lindgren@intel.com \
    --cc=xiaoyao.li@intel.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox