From: Yan Zhao <yan.y.zhao@intel.com>
To: pbonzini@redhat.com, seanjc@google.com
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
x86@kernel.org, rick.p.edgecombe@intel.com,
dave.hansen@intel.com, kas@kernel.org, tabba@google.com,
ackerleytng@google.com, quic_eberman@quicinc.com,
michael.roth@amd.com, david@redhat.com, vannapurve@google.com,
vbabka@suse.cz, thomas.lendacky@amd.com, pgonda@google.com,
zhiquan1.li@intel.com, fan.du@intel.com, jun.miao@intel.com,
ira.weiny@intel.com, isaku.yamahata@intel.com,
xiaoyao.li@intel.com, binbin.wu@linux.intel.com,
chao.p.peng@intel.com, yan.y.zhao@intel.com
Subject: [RFC PATCH v2 13/23] KVM: x86: Introduce hugepage_set_guest_inhibit()
Date: Thu, 7 Aug 2025 17:44:10 +0800 [thread overview]
Message-ID: <20250807094410.4621-1-yan.y.zhao@intel.com> (raw)
In-Reply-To: <20250807093950.4395-1-yan.y.zhao@intel.com>
TDX requires guests to accept S-EPT mappings created by the host KVM. Due
to the current implementation of the TDX module, if a guest accepts a GFN
at a lower level after KVM maps it at a higher level, the TDX module will
emulate an EPT violation VMExit to KVM instead of returning a size mismatch
error to the guest. If KVM fails to perform page splitting in the VMExit
handler, the guest's accept operation will be triggered again upon
re-entering the guest, causing a repeated EPT violation VMExit.
To facilitate passing the guest's accept level information to the KVM MMU
core and to prevent the repeated mapping of a GFN at different levels due
to different accept levels specified by different vCPUs, introduce the
interface hugepage_set_guest_inhibit(). This interface specifies across
vCPUs that mapping at a certain level is inhibited from the guest.
The KVM_LPAGE_GUEST_INHIBIT_FLAG bit is currently modified in one
direction (set), so no clear interface is provided.
Link: https://lore.kernel.org/all/a6ffe23fb97e64109f512fa43e9f6405236ed40a.camel@intel.com/ [1]
Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
RFC v2:
- new in RFC v2
---
arch/x86/kvm/mmu.h | 3 +++
arch/x86/kvm/mmu/mmu.c | 21 ++++++++++++++++++---
2 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index b122255c7d4e..c2d8819f3438 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -326,4 +326,7 @@ static inline bool kvm_is_gfn_alias(struct kvm *kvm, gfn_t gfn)
{
return gfn & kvm_gfn_direct_bits(kvm);
}
+
+void hugepage_set_guest_inhibit(struct kvm_memory_slot *slot, gfn_t gfn, int level);
+bool hugepage_test_guest_inhibit(struct kvm_memory_slot *slot, gfn_t gfn, int level);
#endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 13910ae05f76..1c639286aac2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -721,12 +721,14 @@ static struct kvm_lpage_info *lpage_info_slot(gfn_t gfn,
}
/*
- * The most significant bit in disallow_lpage tracks whether or not memory
- * attributes are mixed, i.e. not identical for all gfns at the current level.
+ * The most 2 significant bits in disallow_lpage tracks whether or not memory
+ * attributes are mixed, i.e. not identical for all gfns at the current level,
+ * or whether or not guest inhibits the current level of hugepage at the gfn.
* The lower order bits are used to refcount other cases where a hugepage is
* disallowed, e.g. if KVM has shadow a page table at the gfn.
*/
#define KVM_LPAGE_MIXED_FLAG BIT(31)
+#define KVM_LPAGE_GUEST_INHIBIT_FLAG BIT(30)
static void update_gfn_disallow_lpage_count(const struct kvm_memory_slot *slot,
gfn_t gfn, int count)
@@ -739,7 +741,8 @@ static void update_gfn_disallow_lpage_count(const struct kvm_memory_slot *slot,
old = linfo->disallow_lpage;
linfo->disallow_lpage += count;
- WARN_ON_ONCE((old ^ linfo->disallow_lpage) & KVM_LPAGE_MIXED_FLAG);
+ WARN_ON_ONCE((old ^ linfo->disallow_lpage) &
+ (KVM_LPAGE_MIXED_FLAG | KVM_LPAGE_GUEST_INHIBIT_FLAG));
}
}
@@ -1647,6 +1650,18 @@ static bool __kvm_rmap_zap_gfn_range(struct kvm *kvm,
start, end - 1, can_yield, true, flush);
}
+bool hugepage_test_guest_inhibit(struct kvm_memory_slot *slot, gfn_t gfn, int level)
+{
+ return lpage_info_slot(gfn, slot, level)->disallow_lpage & KVM_LPAGE_GUEST_INHIBIT_FLAG;
+}
+EXPORT_SYMBOL_GPL(hugepage_test_guest_inhibit);
+
+void hugepage_set_guest_inhibit(struct kvm_memory_slot *slot, gfn_t gfn, int level)
+{
+ lpage_info_slot(gfn, slot, level)->disallow_lpage |= KVM_LPAGE_GUEST_INHIBIT_FLAG;
+}
+EXPORT_SYMBOL_GPL(hugepage_set_guest_inhibit);
+
/*
* Split large leafs crossing the boundary of the specified range
*
--
2.43.2
next prev parent reply other threads:[~2025-08-07 9:44 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-07 9:39 [RFC PATCH v2 00/23] KVM: TDX huge page support for private memory Yan Zhao
2025-08-07 9:41 ` [RFC PATCH v2 01/23] x86/tdx: Enhance tdh_mem_page_aug() to support huge pages Yan Zhao
2025-08-07 9:41 ` [RFC PATCH v2 02/23] x86/virt/tdx: Add SEAMCALL wrapper tdh_mem_page_demote() Yan Zhao
2025-09-01 8:55 ` Binbin Wu
2025-09-01 9:08 ` Yan Zhao
2025-09-02 16:56 ` Edgecombe, Rick P
2025-09-02 17:37 ` Sean Christopherson
2025-09-02 17:45 ` Edgecombe, Rick P
2025-09-04 9:31 ` Yan Zhao
2025-08-07 9:42 ` [RFC PATCH v2 03/23] x86/tdx: Enhance tdh_phymem_page_wbinvd_hkid() to invalidate huge pages Yan Zhao
2025-08-07 9:42 ` [RFC PATCH v2 04/23] KVM: TDX: Introduce tdx_clear_folio() to clear " Yan Zhao
2025-09-02 2:56 ` Binbin Wu
2025-09-03 9:51 ` Yan Zhao
2025-09-03 11:19 ` Binbin Wu
2025-09-04 2:53 ` Yan Zhao
2025-08-07 9:42 ` [RFC PATCH v2 05/23] x86/tdx: Enhance tdh_phymem_page_reclaim() to support " Yan Zhao
2025-08-07 9:42 ` [RFC PATCH v2 06/23] KVM: TDX: Do not hold page refcount on private guest pages Yan Zhao
2025-08-07 9:42 ` [RFC PATCH v2 07/23] KVM: x86/mmu: Disallow page merging (huge page adjustment) for mirror root Yan Zhao
2025-08-07 9:43 ` [RFC PATCH v2 08/23] KVM: x86/tdp_mmu: Alloc external_spt page for mirror page table splitting Yan Zhao
2025-08-07 9:43 ` [RFC PATCH v2 09/23] KVM: x86/tdp_mmu: Add split_external_spt hook called during write mmu_lock Yan Zhao
2025-08-07 9:43 ` [RFC PATCH v2 10/23] KVM: TDX: Enable huge page splitting under write kvm->mmu_lock Yan Zhao
2025-08-07 9:43 ` [RFC PATCH v2 11/23] KVM: x86: Reject splitting huge pages under shared mmu_lock for mirror root Yan Zhao
2025-09-03 3:30 ` Binbin Wu
2025-08-07 9:43 ` [RFC PATCH v2 12/23] KVM: x86/mmu: Introduce kvm_split_cross_boundary_leafs() Yan Zhao
2025-09-03 6:57 ` Binbin Wu
2025-09-03 9:44 ` Yan Zhao
2025-08-07 9:44 ` Yan Zhao [this message]
2025-08-07 9:44 ` [RFC PATCH v2 14/23] KVM: TDX: Split and inhibit huge mappings if a VMExit carries level info Yan Zhao
2025-09-03 7:36 ` Binbin Wu
2025-09-03 9:37 ` Yan Zhao
2025-08-07 9:44 ` [RFC PATCH v2 15/23] KVM: Change the return type of gfn_handler_t() from bool to int Yan Zhao
2025-08-07 9:44 ` [RFC PATCH v2 16/23] KVM: x86: Split cross-boundary mirror leafs for KVM_SET_MEMORY_ATTRIBUTES Yan Zhao
2025-08-07 9:45 ` [RFC PATCH v2 17/23] KVM: guest_memfd: Split for punch hole and private-to-shared conversion Yan Zhao
2025-09-04 7:58 ` Binbin Wu
2025-09-04 9:48 ` Yan Zhao
2025-08-07 9:45 ` [RFC PATCH v2 18/23] x86/virt/tdx: Do not perform cache flushes unless CLFLUSH_BEFORE_ALLOC is set Yan Zhao
2025-08-11 21:10 ` Sagi Shahar
2025-08-12 6:37 ` Yan Zhao
2025-09-04 8:16 ` Binbin Wu
2025-09-04 9:50 ` Yan Zhao
2025-08-07 9:45 ` [RFC PATCH v2 19/23] KVM: TDX: Pass down pfn to split_external_spt() Yan Zhao
2025-09-04 8:30 ` Binbin Wu
2025-08-07 9:45 ` [RFC PATCH v2 20/23] KVM: TDX: Handle Dynamic PAMT in tdh_mem_page_demote() Yan Zhao
2025-08-07 9:46 ` [RFC PATCH v2 21/23] KVM: TDX: Preallocate PAMT pages to be used in split path Yan Zhao
2025-09-04 9:17 ` Binbin Wu
2025-09-04 9:58 ` Yan Zhao
2025-08-07 9:46 ` [RFC PATCH v2 22/23] KVM: TDX: Handle Dynamic PAMT on page split Yan Zhao
2025-08-14 5:31 ` Vishal Annapurve
2025-08-14 18:29 ` Vishal Annapurve
2025-08-18 4:19 ` Yan Zhao
2025-08-07 9:46 ` [RFC PATCH v2 23/23] KVM: TDX: Turn on PG_LEVEL_2M after TD is RUNNABLE Yan Zhao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250807094410.4621-1-yan.y.zhao@intel.com \
--to=yan.y.zhao@intel.com \
--cc=ackerleytng@google.com \
--cc=binbin.wu@linux.intel.com \
--cc=chao.p.peng@intel.com \
--cc=dave.hansen@intel.com \
--cc=david@redhat.com \
--cc=fan.du@intel.com \
--cc=ira.weiny@intel.com \
--cc=isaku.yamahata@intel.com \
--cc=jun.miao@intel.com \
--cc=kas@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=michael.roth@amd.com \
--cc=pbonzini@redhat.com \
--cc=pgonda@google.com \
--cc=quic_eberman@quicinc.com \
--cc=rick.p.edgecombe@intel.com \
--cc=seanjc@google.com \
--cc=tabba@google.com \
--cc=thomas.lendacky@amd.com \
--cc=vannapurve@google.com \
--cc=vbabka@suse.cz \
--cc=x86@kernel.org \
--cc=xiaoyao.li@intel.com \
--cc=zhiquan1.li@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).