From: Yan Zhao <yan.y.zhao@intel.com>
To: pbonzini@redhat.com, seanjc@google.com
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
x86@kernel.org, rick.p.edgecombe@intel.com,
dave.hansen@intel.com, kas@kernel.org, tabba@google.com,
ackerleytng@google.com, quic_eberman@quicinc.com,
michael.roth@amd.com, david@redhat.com, vannapurve@google.com,
vbabka@suse.cz, thomas.lendacky@amd.com, pgonda@google.com,
zhiquan1.li@intel.com, fan.du@intel.com, jun.miao@intel.com,
ira.weiny@intel.com, isaku.yamahata@intel.com,
xiaoyao.li@intel.com, binbin.wu@linux.intel.com,
chao.p.peng@intel.com, yan.y.zhao@intel.com
Subject: [RFC PATCH v2 14/23] KVM: TDX: Split and inhibit huge mappings if a VMExit carries level info
Date: Thu, 7 Aug 2025 17:44:23 +0800
Message-ID: <20250807094423.4644-1-yan.y.zhao@intel.com>
In-Reply-To: <20250807093950.4395-1-yan.y.zhao@intel.com>
TDX requires guests to accept S-EPT mappings created by the host KVM. Due
to the current implementation of the TDX module, if a guest accepts a GFN
at a lower level after KVM maps it at a higher level, the TDX module will
emulate an EPT violation VMExit to KVM instead of returning a size-mismatch
error to the guest. If KVM fails to split the page in the VMExit handler,
the guest's accept operation is triggered again upon re-entering the
guest, causing repeated EPT violation VMExits.
The TDX module therefore allows the EPT violation VMExit to carry the
guest's accept level when the VMExit is caused by the guest's accept
operation. So, in TDX's EPT violation handler:
(1) Set the guest inhibit bit in the lpage info to prevent the KVM MMU core
    from mapping at a higher level than the guest's accept level.
(2) Split any existing huge mapping at the fault GFN, since TDX does not
    support splitting under the shared mmu_lock.
Use the write mmu_lock to protect (1) and (2) for now. If future KVM can
perform the actual splitting under the shared mmu_lock with enhanced TDX
modules, (1) could be done under the shared mmu_lock, and (2) would become
unnecessary.
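For reference, the decode and inhibit logic implemented by
tdx_check_accept_level() below boils down to the following (a condensed,
comment-annotated excerpt of the patch's own code; in KVM, PG_LEVEL_4K,
PG_LEVEL_2M, and PG_LEVEL_1G are 1, 2, and 3):

  /* Only ACCEPT-induced EPT violations carry the extra level info. */
  eeq_type = tdx->ext_exit_qualification & TDX_EXT_EXIT_QUAL_TYPE_MASK;
  if (eeq_type != TDX_EXT_EXIT_QUAL_TYPE_ACCEPT)
          return 0;

  /*
   * Bits 2:0 of the INFO field (bits 63:32 of the extended exit
   * qualification) hold the guest's accept level, 0-based (0 == 4K),
   * hence the "+ 1" to convert to a 1-based PG_LEVEL_* value.
   */
  eeq_info = (tdx->ext_exit_qualification & TDX_EXT_EXIT_QUAL_INFO_MASK) >>
             TDX_EXT_EXIT_QUAL_INFO_SHIFT;
  level = (eeq_info & GENMASK(2, 0)) + 1;

  /*
   * Inhibit every level above the accept level:
   *   accept at 4K (level 1): inhibit 2M (level + 1) and 1G (level + 2)
   *   accept at 2M (level 2): inhibit 1G (level + 1)
   */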
As an optimization, this patch calls hugepage_test_guest_inhibit() without
holding the mmu_lock to reduce the frequency of acquiring the write
mmu_lock; the write mmu_lock is thus only acquired if the guest inhibit bit
is not already set. This is safe because the guest inhibit bit is set in a
one-way manner (it is never cleared), and the splitting under the write
mmu_lock completes before the guest inhibit bit is set.
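In effect this is double-checked locking against a monotonic bit.
Condensed from the code below (error paths trimmed):

  if (!hugepage_test_guest_inhibit(slot, gfn, level + 1)) {  /* unlocked check */
          scoped_guard(write_lock, &kvm->mmu_lock) {
                  int ret;

                  /* Split first, then publish the one-way inhibit bit. */
                  ret = kvm_split_cross_boundary_leafs(kvm, &gfn_range, false);
                  if (ret)
                          return ret;
                  hugepage_set_guest_inhibit(slot, gfn, level + 1);
          }
  }

A vCPU that observes the bit set is guaranteed the split has already
completed; two vCPUs racing on a clear bit simply both take the write
lock, and the second split finds nothing left to do.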
Link: https://lore.kernel.org/all/a6ffe23fb97e64109f512fa43e9f6405236ed40a.camel@intel.com
Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
RFC v2
- Rename tdx_get_accept_level() to tdx_check_accept_level().
- Invoke kvm_split_cross_boundary_leafs() and hugepage_set_guest_inhibit()
  to adjust the KVM mapping level globally according to the guest's accept
  level. (Rick, Sean)
RFC v1:
- Introduce tdx_get_accept_level() to get the guest accept level.
- Use tdx->violation_request_level and tdx->violation_gfn* to pass the
  guest accept level to tdx_gmem_private_max_mapping_level() to determine
  the KVM mapping level.
---
arch/x86/kvm/vmx/tdx.c | 50 +++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/tdx_arch.h | 3 +++
2 files changed, 53 insertions(+)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 035d81275be4..71115058e5e6 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2019,6 +2019,53 @@ static inline bool tdx_is_sept_violation_unexpected_pending(struct kvm_vcpu *vcp
 	return !(eq & EPT_VIOLATION_PROT_MASK) && !(eq & EPT_VIOLATION_EXEC_FOR_RING3_LIN);
 }
 
+static inline int tdx_check_accept_level(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	struct kvm_memory_slot *slot = gfn_to_memslot(vcpu->kvm, gfn);
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+	struct kvm *kvm = vcpu->kvm;
+	u64 eeq_type, eeq_info;
+	int level = -1;
+
+	if (!slot)
+		return 0;
+
+	eeq_type = tdx->ext_exit_qualification & TDX_EXT_EXIT_QUAL_TYPE_MASK;
+	if (eeq_type != TDX_EXT_EXIT_QUAL_TYPE_ACCEPT)
+		return 0;
+
+	eeq_info = (tdx->ext_exit_qualification & TDX_EXT_EXIT_QUAL_INFO_MASK) >>
+		   TDX_EXT_EXIT_QUAL_INFO_SHIFT;
+
+	level = (eeq_info & GENMASK(2, 0)) + 1;
+
+	if (level == PG_LEVEL_4K || level == PG_LEVEL_2M) {
+		if (!hugepage_test_guest_inhibit(slot, gfn, level + 1)) {
+			gfn_t base_gfn = gfn_round_for_level(gfn, level);
+			struct kvm_gfn_range gfn_range = {
+				.start = base_gfn,
+				.end = base_gfn + KVM_PAGES_PER_HPAGE(level),
+				.slot = slot,
+				.may_block = true,
+				.attr_filter = KVM_FILTER_PRIVATE,
+			};
+
+			scoped_guard(write_lock, &kvm->mmu_lock) {
+				int ret;
+
+				ret = kvm_split_cross_boundary_leafs(kvm, &gfn_range, false);
+				if (ret)
+					return ret;
+
+				hugepage_set_guest_inhibit(slot, gfn, level + 1);
+				if (level == PG_LEVEL_4K)
+					hugepage_set_guest_inhibit(slot, gfn, level + 2);
+			}
+		}
+	}
+	return 0;
+}
+
 static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qual;
@@ -2044,6 +2091,9 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
 		 */
 		exit_qual = EPT_VIOLATION_ACC_WRITE;
 
+		if (tdx_check_accept_level(vcpu, gpa_to_gfn(gpa)))
+			return RET_PF_RETRY;
+
 		/* Only private GPA triggers zero-step mitigation */
 		local_retry = true;
 	} else {
diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
index a30e880849e3..af006a73ee05 100644
--- a/arch/x86/kvm/vmx/tdx_arch.h
+++ b/arch/x86/kvm/vmx/tdx_arch.h
@@ -82,7 +82,10 @@ struct tdx_cpuid_value {
 #define TDX_TD_ATTR_PERFMON		BIT_ULL(63)
 
 #define TDX_EXT_EXIT_QUAL_TYPE_MASK	GENMASK(3, 0)
+#define TDX_EXT_EXIT_QUAL_TYPE_ACCEPT	1
 #define TDX_EXT_EXIT_QUAL_TYPE_PENDING_EPT_VIOLATION	6
+#define TDX_EXT_EXIT_QUAL_INFO_MASK	GENMASK(63, 32)
+#define TDX_EXT_EXIT_QUAL_INFO_SHIFT	32
 /*
  * TD_PARAMS is provided as an input to TDH_MNG_INIT, the size of which is 1024B.
  */
--
2.43.2