* [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support
@ 2026-03-21 0:09 Paolo Bonzini
2026-03-21 0:09 ` [PATCH 01/22] KVM: TDX/VMX: rework EPT_VIOLATION_EXEC_FOR_RING3_LIN into PROT_MASK Paolo Bonzini
` (22 more replies)
0 siblings, 23 replies; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
This series introduces support for two related features that Hyper-V uses
in its implementation of Virtual Secure Mode; these are Intel Mode-Based
Execute Control and AMD Guest Mode Execution Trap. Both of them allow
more granular control over execute permissions, with different levels of
separation between supervisor and user mode.
MBEC provides support for separate supervisor and user-mode bits in the
PTEs; GMET instead lacks supervisor-mode-only execution (with NX=0,
"both" is represented by U=0 and user-mode-only by U=1). GMET was
clearly inspired by SMEP, though with some differences and annoyances.
The series was developed starting from Jon Kohler's earlier version
https://lore.kernel.org/kvm/20251223054806.1611168-1-jon@nutanix.com/.
The differences lie almost entirely in the MMU parts, where I took
the opportunity to do two things:
- clean up the implementation of nVMX exec-only, by properly adding
read permissions to the ACC_* constants and to the permission bitmask
machinery
- allow KVM to run with MBEC/GMET enabled even in non-nested mode.
This simplifies testing of page table manipulation, covering almost
all that is needed (on the MMU side at least) for shadow EPT/NPT.
These core MMU changes actually make the implementation of the features
pretty simple. In fact the number of new lines overall is a little smaller
than in Jon's patches, despite supporting twice the instruction sets. :)
On the Intel side, the main idea of the implementation is to split
shadow_x_mask in two and to repurpose ACC_USER_MASK (which is not used
by EPT) as ACC_USER_EXEC_MASK. When MBEC is active, ACC_EXEC_MASK covers
kernel-mode execution only, just as the X bit of the PTEs morphs into XS.
update_permission_bitmask() then precomputes all the necessary conditions.
While the MMU code is all new, most of the Intel-specific code here is
from Jon---so thanks for the early posting, which undoubtedly saved time.
On the AMD side, the U bit maps to ACC_USER_MASK but nNPT adjusts the
permission bitmask to ignore it for reads and writes when GMET is active.
Here there was a bit more work to do than I expected, because the page
tables have to be created with U=0. For now I chose to have it in all
levels of the page tables; I'll probably change the code soon to
clear the U bit only in leaf SPTEs, but I'm leaving it this way because
it makes patch 16 easier to understand (it's a fix for a latent bug of
sorts and I'd like to include it anyway).
In both cases the former "smep_andnot_wp" bit of cpu_role.base,
now named "cr4_smep", is repurposed to indicate that the feature
is on. The minor pessimization for shadow page tables (toggling CR4.SMEP
now always forces a rebuild of the shadow page tables, even though
that is only necessary if CR4.WP=0) is not really worth fretting
about; in practice, guests are not going to flip CR4.SMEP in a way
that would prevent efficient reuse of shadow page tables.
Patches 1-9 are general cleanups, mostly for MMU code.
Patches 10-15 are for Intel MBEC, with the first three covering
non-nested use.
Patches 16-22 are for AMD GMET, with 16/17/18/20 covering non-nested
use and the others covering nested virtualization.
Jon Kohler tested the nVMX parts on Windows, whereas I tested nSVM GMET
with new additions to kvm-unit-tests that I'll send out next week.
Paolo
Jon Kohler (5):
KVM: TDX/VMX: rework EPT_VIOLATION_EXEC_FOR_RING3_LIN into PROT_MASK
KVM: x86/mmu: remove SPTE_PERM_MASK
KVM: x86/mmu: adjust MMIO generation bit allocation and allowed mask
KVM: nVMX: advertise MBEC to nested guests
KVM: nVMX: allow MBEC with EVMCS
Paolo Bonzini (17):
KVM: x86/mmu: shuffle high bits of SPTEs in preparation for MBEC
KVM: x86/mmu: remove SPTE_EPT_*
KVM: x86/mmu: merge make_spte_{non,}executable
KVM: x86/mmu: rename and clarify BYTE_MASK
KVM: x86/mmu: introduce ACC_READ_MASK
KVM: x86/mmu: separate more EPT/non-EPT permission_fault()
KVM: x86/mmu: split XS/XU bits for MBEC
KVM: x86/mmu: move cr4_smep to base role
KVM: VMX: enable use of MBEC
KVM: x86/mmu: add support for nested MBEC
KVM: x86/tdp_mmu: propagate access mask from kvm_mmu_page to PTE
KVM: x86/mmu: introduce cpu_role bit for availability of PFEC.I/D
KVM: SVM: add GMET bit definitions
KVM: x86/mmu: add support for NPT GMET
KVM: SVM: enable GMET and set it in MMU role
KVM: SVM: work around errata 1218
KVM: nSVM: enable GMET for guests
Documentation/virt/kvm/x86/mmu.rst | 10 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 44 ++++++---
arch/x86/include/asm/svm.h | 1 +
arch/x86/include/asm/vmx.h | 9 +-
arch/x86/kvm/mmu.h | 12 ++-
arch/x86/kvm/mmu/mmu.c | 153 ++++++++++++++++++++---------
arch/x86/kvm/mmu/mmutrace.h | 9 +-
arch/x86/kvm/mmu/paging_tmpl.h | 48 +++++----
arch/x86/kvm/mmu/spte.c | 76 +++++++-------
arch/x86/kvm/mmu/spte.h | 58 ++++++-----
arch/x86/kvm/mmu/tdp_mmu.c | 4 +-
arch/x86/kvm/svm/nested.c | 8 +-
arch/x86/kvm/svm/svm.c | 30 ++++++
arch/x86/kvm/vmx/capabilities.h | 11 ++-
arch/x86/kvm/vmx/common.h | 21 ++--
arch/x86/kvm/vmx/hyperv_evmcs.h | 1 +
arch/x86/kvm/vmx/main.c | 11 ++-
arch/x86/kvm/vmx/nested.c | 10 ++
arch/x86/kvm/vmx/tdx.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 17 +++-
arch/x86/kvm/vmx/vmx.h | 1 +
arch/x86/kvm/vmx/x86_ops.h | 1 +
24 files changed, 373 insertions(+), 166 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH 01/22] KVM: TDX/VMX: rework EPT_VIOLATION_EXEC_FOR_RING3_LIN into PROT_MASK
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-23 14:49 ` Jon Kohler
2026-03-25 4:29 ` Huang, Kai
2026-03-21 0:09 ` [PATCH 02/22] KVM: x86/mmu: remove SPTE_PERM_MASK Paolo Bonzini
` (21 subsequent siblings)
22 siblings, 2 replies; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
From: Jon Kohler <jon@nutanix.com>
EPT exit qualification bit 6 is used when mode-based execute control
is enabled, and reflects user-executable addresses. Rename it to
reflect the intention and add it to EPT_VIOLATION_PROT_MASK, which
allows simplifying the return evaluation in
tdx_is_sept_violation_unexpected_pending() a pinch.
Rework handling in __vmx_handle_ept_violation to unconditionally clear
EPT_VIOLATION_PROT_USER_EXEC until MBEC is implemented, as suggested by
Sean [1].
Note: Intel SDM Table 29-7 defines bit 6 as:
If the “mode-based execute control” VM-execution control is 0, the
value of this bit is undefined. If that control is 1, this bit is the
logical-AND of bit 10 in the EPT paging-structure entries used to
translate the guest-physical address of the access causing the EPT
violation. In this case, it indicates whether the guest-physical
address was executable for user-mode linear addresses.
[1] https://lore.kernel.org/all/aCJDzU1p_SFNRIJd@google.com/
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jon Kohler <jon@nutanix.com>
Message-ID: <20251223054806.1611168-2-jon@nutanix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/include/asm/vmx.h | 5 +++--
arch/x86/kvm/vmx/common.h | 9 +++++++--
arch/x86/kvm/vmx/tdx.c | 2 +-
3 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index cca7d6641287..4a0804cc7c82 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -587,10 +587,11 @@ enum vm_entry_failure_code {
#define EPT_VIOLATION_PROT_READ BIT(3)
#define EPT_VIOLATION_PROT_WRITE BIT(4)
#define EPT_VIOLATION_PROT_EXEC BIT(5)
-#define EPT_VIOLATION_EXEC_FOR_RING3_LIN BIT(6)
+#define EPT_VIOLATION_PROT_USER_EXEC BIT(6)
#define EPT_VIOLATION_PROT_MASK (EPT_VIOLATION_PROT_READ | \
EPT_VIOLATION_PROT_WRITE | \
- EPT_VIOLATION_PROT_EXEC)
+ EPT_VIOLATION_PROT_EXEC | \
+ EPT_VIOLATION_PROT_USER_EXEC)
#define EPT_VIOLATION_GVA_IS_VALID BIT(7)
#define EPT_VIOLATION_GVA_TRANSLATED BIT(8)
diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
index 412d0829d7a2..adf925500b9e 100644
--- a/arch/x86/kvm/vmx/common.h
+++ b/arch/x86/kvm/vmx/common.h
@@ -94,8 +94,13 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
/* Is it a fetch fault? */
error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
? PFERR_FETCH_MASK : 0;
- /* ept page table entry is present? */
- error_code |= (exit_qualification & EPT_VIOLATION_PROT_MASK)
+ /*
+ * ept page table entry is present?
+ * note: unconditionally clear USER_EXEC until mode-based
+ * execute control is implemented
+ */
+ error_code |= (exit_qualification &
+ (EPT_VIOLATION_PROT_MASK & ~EPT_VIOLATION_PROT_USER_EXEC))
? PFERR_PRESENT_MASK : 0;
if (exit_qualification & EPT_VIOLATION_GVA_IS_VALID)
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index c308aedd8613..bf9fe76d974d 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1921,7 +1921,7 @@ static inline bool tdx_is_sept_violation_unexpected_pending(struct kvm_vcpu *vcp
if (eeq_type != TDX_EXT_EXIT_QUAL_TYPE_PENDING_EPT_VIOLATION)
return false;
- return !(eq & EPT_VIOLATION_PROT_MASK) && !(eq & EPT_VIOLATION_EXEC_FOR_RING3_LIN);
+ return !(eq & EPT_VIOLATION_PROT_MASK);
}
static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
--
2.52.0
* [PATCH 02/22] KVM: x86/mmu: remove SPTE_PERM_MASK
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
2026-03-21 0:09 ` [PATCH 01/22] KVM: TDX/VMX: rework EPT_VIOLATION_EXEC_FOR_RING3_LIN into PROT_MASK Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-25 4:29 ` Huang, Kai
2026-03-21 0:09 ` [PATCH 03/22] KVM: x86/mmu: adjust MMIO generation bit allocation and allowed mask Paolo Bonzini
` (20 subsequent siblings)
22 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
From: Jon Kohler <jon@nutanix.com>
SPTE_PERM_MASK is no longer referenced by anything in the kernel.
Signed-off-by: Jon Kohler <jon@nutanix.com>
Message-ID: <20251223054806.1611168-3-jon@nutanix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu/spte.h | 3 ---
1 file changed, 3 deletions(-)
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 3133f066927e..0fc83c9064c5 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -42,9 +42,6 @@ static_assert(SPTE_TDP_AD_ENABLED == 0);
#define SPTE_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
#endif
-#define SPTE_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \
- | shadow_x_mask | shadow_nx_mask | shadow_me_mask)
-
#define ACC_EXEC_MASK 1
#define ACC_WRITE_MASK PT_WRITABLE_MASK
#define ACC_USER_MASK PT_USER_MASK
--
2.52.0
* [PATCH 03/22] KVM: x86/mmu: adjust MMIO generation bit allocation and allowed mask
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
2026-03-21 0:09 ` [PATCH 01/22] KVM: TDX/VMX: rework EPT_VIOLATION_EXEC_FOR_RING3_LIN into PROT_MASK Paolo Bonzini
2026-03-21 0:09 ` [PATCH 02/22] KVM: x86/mmu: remove SPTE_PERM_MASK Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-24 3:48 ` Huang, Kai
2026-03-21 0:09 ` [PATCH 04/22] KVM: x86/mmu: shuffle high bits of SPTEs in preparation for MBEC Paolo Bonzini
` (19 subsequent siblings)
22 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson, Kai Huang
From: Jon Kohler <jon@nutanix.com>
Update SPTE_MMIO_ALLOWED_MASK to allow the EPT user-executable bit
(bit 10) to be treated like the EPT RWX bits 2:0: when mode-based
execute control is enabled, bit 10 can act like a "present" bit.
No functional changes intended.
Cc: Kai Huang <kai.huang@intel.com>
Signed-off-by: Jon Kohler <jon@nutanix.com>
Message-ID: <20251223054806.1611168-4-jon@nutanix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu/spte.h | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 0fc83c9064c5..b60666778f61 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -96,11 +96,11 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
#undef SHADOW_ACC_TRACK_SAVED_MASK
/*
- * Due to limited space in PTEs, the MMIO generation is a 19 bit subset of
+ * Due to limited space in PTEs, the MMIO generation is an 18 bit subset of
* the memslots generation and is derived as follows:
*
- * Bits 0-7 of the MMIO generation are propagated to spte bits 3-10
- * Bits 8-18 of the MMIO generation are propagated to spte bits 52-62
+ * Bits 0-6 of the MMIO generation are propagated to spte bits 3-9
+ * Bits 7-17 of the MMIO generation are propagated to spte bits 52-62
*
* The KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS flag is intentionally not included in
* the MMIO generation number, as doing so would require stealing a bit from
@@ -111,7 +111,7 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
*/
#define MMIO_SPTE_GEN_LOW_START 3
-#define MMIO_SPTE_GEN_LOW_END 10
+#define MMIO_SPTE_GEN_LOW_END 9
#define MMIO_SPTE_GEN_HIGH_START 52
#define MMIO_SPTE_GEN_HIGH_END 62
@@ -133,7 +133,8 @@ static_assert(!(SPTE_MMU_PRESENT_MASK &
* and so they're off-limits for generation; additional checks ensure the mask
* doesn't overlap legal PA bits), and bit 63 (carved out for future usage).
*/
-#define SPTE_MMIO_ALLOWED_MASK (BIT_ULL(63) | GENMASK_ULL(51, 12) | GENMASK_ULL(2, 0))
+#define SPTE_MMIO_ALLOWED_MASK (BIT_ULL(63) | GENMASK_ULL(51, 12) | \
+ BIT_ULL(10) | GENMASK_ULL(2, 0))
static_assert(!(SPTE_MMIO_ALLOWED_MASK &
(SPTE_MMU_PRESENT_MASK | MMIO_SPTE_GEN_LOW_MASK | MMIO_SPTE_GEN_HIGH_MASK)));
@@ -141,7 +142,7 @@ static_assert(!(SPTE_MMIO_ALLOWED_MASK &
#define MMIO_SPTE_GEN_HIGH_BITS (MMIO_SPTE_GEN_HIGH_END - MMIO_SPTE_GEN_HIGH_START + 1)
/* remember to adjust the comment above as well if you change these */
-static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 11);
+static_assert(MMIO_SPTE_GEN_LOW_BITS == 7 && MMIO_SPTE_GEN_HIGH_BITS == 11);
#define MMIO_SPTE_GEN_LOW_SHIFT (MMIO_SPTE_GEN_LOW_START - 0)
#define MMIO_SPTE_GEN_HIGH_SHIFT (MMIO_SPTE_GEN_HIGH_START - MMIO_SPTE_GEN_LOW_BITS)
--
2.52.0
* [PATCH 04/22] KVM: x86/mmu: shuffle high bits of SPTEs in preparation for MBEC
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
` (2 preceding siblings ...)
2026-03-21 0:09 ` [PATCH 03/22] KVM: x86/mmu: adjust MMIO generation bit allocation and allowed mask Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-25 4:35 ` Huang, Kai
2026-03-21 0:09 ` [PATCH 05/22] KVM: x86/mmu: remove SPTE_EPT_* Paolo Bonzini
` (18 subsequent siblings)
22 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
Access tracking will need to save bit 10 when MBEC is enabled.
Right now it is simply shifting the R and X bits into bits 54 and 56,
but bit 10 would not fit with the same scheme. Reorganize the
high bits so that access tracking will use bits 52, 54 and 62.
As a side effect, the free bits are compacted slightly, with
56-59 still unused.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu/spte.h | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index b60666778f61..7223a61b1260 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -17,10 +17,20 @@
*/
#define SPTE_MMU_PRESENT_MASK BIT_ULL(11)
+/*
+ * The ignored high bits are allocated as follows:
+ * - bits 52, 54: saved X-R bits for access tracking when EPT does not have A/D
+ * - bit 53 (EPT only): host writable
+ * - bit 55 (EPT only): MMU-writable
+ * - bits 56-59: unused
+ * - bits 60-61: type of A/D tracking
+ * - bit 62: unused
+ */
+
/*
* TDP SPTES (more specifically, EPT SPTEs) may not have A/D bits, and may also
* be restricted to using write-protection (for L2 when CPU dirty logging, i.e.
- * PML, is enabled). Use bits 52 and 53 to hold the type of A/D tracking that
+ * PML, is enabled). Use bits 60 and 61 to hold the type of A/D tracking that
* is must be employed for a given TDP SPTE.
*
* Note, the "enabled" mask must be '0', as bits 62:52 are _reserved_ for PAE
@@ -29,7 +39,7 @@
* TDP with CPU dirty logging (PML). If NPT ever gains PML-like support, it
* must be restricted to 64-bit KVM.
*/
-#define SPTE_TDP_AD_SHIFT 52
+#define SPTE_TDP_AD_SHIFT 60
#define SPTE_TDP_AD_MASK (3ULL << SPTE_TDP_AD_SHIFT)
#define SPTE_TDP_AD_ENABLED (0ULL << SPTE_TDP_AD_SHIFT)
#define SPTE_TDP_AD_DISABLED (1ULL << SPTE_TDP_AD_SHIFT)
@@ -65,7 +75,7 @@ static_assert(SPTE_TDP_AD_ENABLED == 0);
*/
#define SHADOW_ACC_TRACK_SAVED_BITS_MASK (SPTE_EPT_READABLE_MASK | \
SPTE_EPT_EXECUTABLE_MASK)
-#define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT 54
+#define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT 52
#define SHADOW_ACC_TRACK_SAVED_MASK (SHADOW_ACC_TRACK_SAVED_BITS_MASK << \
SHADOW_ACC_TRACK_SAVED_BITS_SHIFT)
static_assert(!(SPTE_TDP_AD_MASK & SHADOW_ACC_TRACK_SAVED_MASK));
@@ -84,8 +94,8 @@ static_assert(!(SPTE_TDP_AD_MASK & SHADOW_ACC_TRACK_SAVED_MASK));
* to not overlap the A/D type mask or the saved access bits of access-tracked
* SPTEs when A/D bits are disabled.
*/
-#define EPT_SPTE_HOST_WRITABLE BIT_ULL(57)
-#define EPT_SPTE_MMU_WRITABLE BIT_ULL(58)
+#define EPT_SPTE_HOST_WRITABLE BIT_ULL(53)
+#define EPT_SPTE_MMU_WRITABLE BIT_ULL(55)
static_assert(!(EPT_SPTE_HOST_WRITABLE & SPTE_TDP_AD_MASK));
static_assert(!(EPT_SPTE_MMU_WRITABLE & SPTE_TDP_AD_MASK));
--
2.52.0
* [PATCH 05/22] KVM: x86/mmu: remove SPTE_EPT_*
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
` (3 preceding siblings ...)
2026-03-21 0:09 ` [PATCH 04/22] KVM: x86/mmu: shuffle high bits of SPTEs in preparation for MBEC Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-25 4:36 ` Huang, Kai
2026-03-21 0:09 ` [PATCH 06/22] KVM: x86/mmu: merge make_spte_{non,}executable Paolo Bonzini
` (17 subsequent siblings)
22 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
spte.h already includes vmx.h; use the constants that it defines.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu/spte.h | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 7223a61b1260..3d77755b6b10 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -57,10 +57,6 @@ static_assert(SPTE_TDP_AD_ENABLED == 0);
#define ACC_USER_MASK PT_USER_MASK
#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
-/* The mask for the R/X bits in EPT PTEs */
-#define SPTE_EPT_READABLE_MASK 0x1ull
-#define SPTE_EPT_EXECUTABLE_MASK 0x4ull
-
#define SPTE_LEVEL_BITS 9
#define SPTE_LEVEL_SHIFT(level) __PT_LEVEL_SHIFT(level, SPTE_LEVEL_BITS)
#define SPTE_INDEX(address, level) __PT_INDEX(address, level, SPTE_LEVEL_BITS)
@@ -73,8 +69,8 @@ static_assert(SPTE_TDP_AD_ENABLED == 0);
* restored only when a write is attempted to the page. This mask obviously
* must not overlap the A/D type mask.
*/
-#define SHADOW_ACC_TRACK_SAVED_BITS_MASK (SPTE_EPT_READABLE_MASK | \
- SPTE_EPT_EXECUTABLE_MASK)
+#define SHADOW_ACC_TRACK_SAVED_BITS_MASK (VMX_EPT_READABLE_MASK | \
+ VMX_EPT_EXECUTABLE_MASK)
#define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT 52
#define SHADOW_ACC_TRACK_SAVED_MASK (SHADOW_ACC_TRACK_SAVED_BITS_MASK << \
SHADOW_ACC_TRACK_SAVED_BITS_SHIFT)
--
2.52.0
* [PATCH 06/22] KVM: x86/mmu: merge make_spte_{non,}executable
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
` (4 preceding siblings ...)
2026-03-21 0:09 ` [PATCH 05/22] KVM: x86/mmu: remove SPTE_EPT_* Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-23 14:49 ` Jon Kohler
2026-03-21 0:09 ` [PATCH 07/22] KVM: x86/mmu: rename and clarify BYTE_MASK Paolo Bonzini
` (16 subsequent siblings)
22 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
As the logic will become more complicated with the introduction
of MBEC, at least write it only once.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu/spte.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index df31039b5d63..e2acd9ed9dba 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -317,14 +317,15 @@ static u64 modify_spte_protections(u64 spte, u64 set, u64 clear)
return spte;
}
-static u64 make_spte_executable(u64 spte)
+static u64 make_spte_executable(u64 spte, u8 access)
{
- return modify_spte_protections(spte, shadow_x_mask, shadow_nx_mask);
-}
-
-static u64 make_spte_nonexecutable(u64 spte)
-{
- return modify_spte_protections(spte, shadow_nx_mask, shadow_x_mask);
+ u64 set, clear;
+ if (access & ACC_EXEC_MASK)
+ set = shadow_x_mask;
+ else
+ set = shadow_nx_mask;
+ clear = set ^ (shadow_nx_mask | shadow_x_mask);
+ return modify_spte_protections(spte, set, clear);
}
/*
@@ -356,8 +357,8 @@ u64 make_small_spte(struct kvm *kvm, u64 huge_spte,
* the page executable as the NX hugepage mitigation no longer
* applies.
*/
- if ((role.access & ACC_EXEC_MASK) && is_nx_huge_page_enabled(kvm))
- child_spte = make_spte_executable(child_spte);
+ if (is_nx_huge_page_enabled(kvm))
+ child_spte = make_spte_executable(child_spte, role.access);
}
return child_spte;
@@ -379,7 +380,7 @@ u64 make_huge_spte(struct kvm *kvm, u64 small_spte, int level)
huge_spte &= KVM_HPAGE_MASK(level) | ~PAGE_MASK;
if (is_nx_huge_page_enabled(kvm))
- huge_spte = make_spte_nonexecutable(huge_spte);
+ huge_spte = make_spte_executable(huge_spte, 0);
return huge_spte;
}
--
2.52.0
* [PATCH 07/22] KVM: x86/mmu: rename and clarify BYTE_MASK
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
` (5 preceding siblings ...)
2026-03-21 0:09 ` [PATCH 06/22] KVM: x86/mmu: merge make_spte_{non,}executable Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-21 0:09 ` [PATCH 08/22] KVM: x86/mmu: introduce ACC_READ_MASK Paolo Bonzini
` (15 subsequent siblings)
22 siblings, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
The BYTE_MASK macro is the central point of the black magic
in update_permission_bitmask(). Rename it to something
that relates to how it is used, and add a comment explaining
how it works.
Using shifts instead of powers of two was actually suggested by
David Hildenbrand back in 2017 for clarity[1], but I evidently
forgot his suggestion when applying the patch to kvm.git.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu/mmu.c | 55 ++++++++++++++++++++++++++++++------------
1 file changed, 39 insertions(+), 16 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0349e26baa2d..84351df8a9cb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5416,29 +5416,53 @@ reset_ept_shadow_zero_bits_mask(struct kvm_mmu *context, bool execonly)
max_huge_page_level);
}
-#define BYTE_MASK(access) \
- ((1 & (access) ? 2 : 0) | \
- (2 & (access) ? 4 : 0) | \
- (3 & (access) ? 8 : 0) | \
- (4 & (access) ? 16 : 0) | \
- (5 & (access) ? 32 : 0) | \
- (6 & (access) ? 64 : 0) | \
- (7 & (access) ? 128 : 0))
-
+/*
+ * Build a mask with all combinations of PTE access rights that
+ * include the given access bit. The mask can be queried with
+ * "mask & (1 << access)", where access is a combination of
+ * ACC_* bits.
+ *
+ * By mixing and matching multiple masks returned by ACC_BITS_MASK,
+ * update_permission_bitmask() builds what is effectively a
+ * two-dimensional array of bools. The second dimension is
+ * provided by individual bits of permissions[pfec >> 1], and
+ * logical &, | and ~ operations operate on all the 8 possible
+ * combinations of ACC_* bits.
+ */
+#define ACC_BITS_MASK(access) \
+ ((1 & (access) ? 1 << 1 : 0) | \
+ (2 & (access) ? 1 << 2 : 0) | \
+ (3 & (access) ? 1 << 3 : 0) | \
+ (4 & (access) ? 1 << 4 : 0) | \
+ (5 & (access) ? 1 << 5 : 0) | \
+ (6 & (access) ? 1 << 6 : 0) | \
+ (7 & (access) ? 1 << 7 : 0))
static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
{
unsigned byte;
- const u8 x = BYTE_MASK(ACC_EXEC_MASK);
- const u8 w = BYTE_MASK(ACC_WRITE_MASK);
- const u8 u = BYTE_MASK(ACC_USER_MASK);
+ const u8 x = ACC_BITS_MASK(ACC_EXEC_MASK);
+ const u8 w = ACC_BITS_MASK(ACC_WRITE_MASK);
+ const u8 u = ACC_BITS_MASK(ACC_USER_MASK);
bool cr4_smep = is_cr4_smep(mmu);
bool cr4_smap = is_cr4_smap(mmu);
bool cr0_wp = is_cr0_wp(mmu);
bool efer_nx = is_efer_nx(mmu);
+ /*
+ * In hardware, page fault error codes are generated (as the name
+ * suggests) on any kind of page fault. permission_fault() and
+ * paging_tmpl.h already use the same bits after a successful page
+ * table walk, to indicate the kind of access being performed.
+ *
+ * However, PFERR_PRESENT_MASK and PFERR_RSVD_MASK are never set here,
+ * exactly because the page walk is successful. PFERR_PRESENT_MASK is
+ * removed by the shift, while PFERR_RSVD_MASK is repurposed in
+ * permission_fault() to indicate accesses that are *not* subject to
+ * SMAP restrictions.
+ */
for (byte = 0; byte < ARRAY_SIZE(mmu->permissions); ++byte) {
unsigned pfec = byte << 1;
@@ -5485,10 +5509,9 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
* - The access is supervisor mode
* - If implicit supervisor access or X86_EFLAGS_AC is clear
*
- * Here, we cover the first four conditions.
- * The fifth is computed dynamically in permission_fault();
- * PFERR_RSVD_MASK bit will be set in PFEC if the access is
- * *not* subject to SMAP restrictions.
+ * Here, we cover the first four conditions. The fifth
+ * is computed dynamically in permission_fault() and
+ * communicated by setting PFERR_RSVD_MASK.
*/
if (cr4_smap)
smapf = (pfec & (PFERR_RSVD_MASK|PFERR_FETCH_MASK)) ? 0 : kf;
--
2.52.0
* [PATCH 08/22] KVM: x86/mmu: introduce ACC_READ_MASK
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
` (6 preceding siblings ...)
2026-03-21 0:09 ` [PATCH 07/22] KVM: x86/mmu: rename and clarify BYTE_MASK Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-23 14:49 ` Jon Kohler
2026-03-23 14:49 ` Jon Kohler
2026-03-21 0:09 ` [PATCH 09/22] KVM: x86/mmu: separate more EPT/non-EPT permission_fault() Paolo Bonzini
` (14 subsequent siblings)
22 siblings, 2 replies; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
Read permissions so far were only needed for EPT, which does not need
ACC_USER_MASK. Therefore, for EPT page tables ACC_USER_MASK was repurposed
as a read permission bit.
In order to implement nested MBEC, EPT will genuinely have four kinds of
accesses, and there will be no room for such hacks; bite the bullet at
last, enlarging ACC_ALL to four bits and permissions[] to 2^4 bits (u16).
The new code does not enforce that the XWR bits on non-execonly processors
have their R bit set, even when running nested: none of the shadow_*_mask
values have bit 0 set, and make_spte() genuinely relies on ACC_READ_MASK
being requested! This works becase, if execonly is not supported by the
processor, shadow EPT will generate an EPT misconfig vmexit if the XWR
bits represent a non-readable page, and therefore the pte_access argument
to make_spte() will also always have ACC_READ_MASK set.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 12 +++++-----
arch/x86/kvm/mmu.h | 2 +-
arch/x86/kvm/mmu/mmu.c | 39 +++++++++++++++++++++------------
arch/x86/kvm/mmu/mmutrace.h | 3 ++-
arch/x86/kvm/mmu/paging_tmpl.h | 21 +++++++++---------
arch/x86/kvm/mmu/spte.c | 18 ++++++---------
arch/x86/kvm/mmu/spte.h | 5 +++--
arch/x86/kvm/vmx/capabilities.h | 5 -----
arch/x86/kvm/vmx/common.h | 5 +----
arch/x86/kvm/vmx/vmx.c | 3 +--
10 files changed, 56 insertions(+), 57 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 871c7ff4fb29..3efb238c683c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -317,11 +317,11 @@ struct kvm_kernel_irq_routing_entry;
* the number of unique SPs that can theoretically be created is 2^n, where n
* is the number of bits that are used to compute the role.
*
- * But, even though there are 20 bits in the mask below, not all combinations
+ * But, even though there are 21 bits in the mask below, not all combinations
* of modes and flags are possible:
*
* - invalid shadow pages are not accounted, mirror pages are not shadowed,
- * so the bits are effectively 18.
+ * so the bits are effectively 19.
*
* - quadrant will only be used if has_4_byte_gpte=1 (non-PAE paging);
* execonly and ad_disabled are only used for nested EPT which has
@@ -336,7 +336,7 @@ struct kvm_kernel_irq_routing_entry;
* cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
*
* Therefore, the maximum number of possible upper-level shadow pages for a
- * single gfn is a bit less than 2^13.
+ * single gfn is a bit less than 2^14.
*/
union kvm_mmu_page_role {
u32 word;
@@ -345,7 +345,7 @@ union kvm_mmu_page_role {
unsigned has_4_byte_gpte:1;
unsigned quadrant:2;
unsigned direct:1;
- unsigned access:3;
+ unsigned access:4;
unsigned invalid:1;
unsigned efer_nx:1;
unsigned cr0_wp:1;
@@ -355,7 +355,7 @@ union kvm_mmu_page_role {
unsigned guest_mode:1;
unsigned passthrough:1;
unsigned is_mirror:1;
- unsigned :4;
+ unsigned :3;
/*
* This is left at the top of the word so that
@@ -481,7 +481,7 @@ struct kvm_mmu {
* Byte index: page fault error code [4:1]
* Bit index: pte permissions in ACC_* format
*/
- u8 permissions[16];
+ u16 permissions[16];
u64 *pae_root;
u64 *pml4_root;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index b4b6860ab971..f5d35f66750b 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -81,7 +81,7 @@ u8 kvm_mmu_get_max_tdp_level(void);
void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value);
void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
-void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
+void kvm_mmu_set_ept_masks(bool has_ad_bits);
void kvm_init_mmu(struct kvm_vcpu *vcpu);
void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 84351df8a9cb..b87dbf9e42b9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2029,7 +2029,7 @@ static bool kvm_sync_page_check(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
*/
const union kvm_mmu_page_role sync_role_ign = {
.level = 0xf,
- .access = 0x7,
+ .access = ACC_ALL,
.quadrant = 0x3,
.passthrough = 0x1,
};
@@ -5426,7 +5426,7 @@ reset_ept_shadow_zero_bits_mask(struct kvm_mmu *context, bool execonly)
* update_permission_bitmask() builds what is effectively a
* two-dimensional array of bools. The second dimension is
* provided by individual bits of permissions[pfec >> 1], and
- * logical &, | and ~ operations operate on all the 8 possible
+ * logical &, | and ~ operations operate on all the 16 possible
* combinations of ACC_* bits.
*/
#define ACC_BITS_MASK(access) \
@@ -5436,15 +5436,24 @@ reset_ept_shadow_zero_bits_mask(struct kvm_mmu *context, bool execonly)
(4 & (access) ? 1 << 4 : 0) | \
(5 & (access) ? 1 << 5 : 0) | \
(6 & (access) ? 1 << 6 : 0) | \
- (7 & (access) ? 1 << 7 : 0))
+ (7 & (access) ? 1 << 7 : 0) | \
+ (8 & (access) ? 1 << 8 : 0) | \
+ (9 & (access) ? 1 << 9 : 0) | \
+ (10 & (access) ? 1 << 10 : 0) | \
+ (11 & (access) ? 1 << 11 : 0) | \
+ (12 & (access) ? 1 << 12 : 0) | \
+ (13 & (access) ? 1 << 13 : 0) | \
+ (14 & (access) ? 1 << 14 : 0) | \
+ (15 & (access) ? 1 << 15 : 0))
static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
{
unsigned byte;
- const u8 x = ACC_BITS_MASK(ACC_EXEC_MASK);
- const u8 w = ACC_BITS_MASK(ACC_WRITE_MASK);
- const u8 u = ACC_BITS_MASK(ACC_USER_MASK);
+ const u16 x = ACC_BITS_MASK(ACC_EXEC_MASK);
+ const u16 w = ACC_BITS_MASK(ACC_WRITE_MASK);
+ const u16 u = ACC_BITS_MASK(ACC_USER_MASK);
+ const u16 r = ACC_BITS_MASK(ACC_READ_MASK);
bool cr4_smep = is_cr4_smep(mmu);
bool cr4_smap = is_cr4_smap(mmu);
@@ -5467,24 +5476,26 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
unsigned pfec = byte << 1;
/*
- * Each "*f" variable has a 1 bit for each UWX value
+ * Each "*f" variable has a 1 bit for each ACC_* combo
* that causes a fault with the given PFEC.
*/
+ /* Faults from reads to non-readable pages */
+ u16 rf = (pfec & (PFERR_WRITE_MASK|PFERR_FETCH_MASK)) ? 0 : (u16)~r;
/* Faults from writes to non-writable pages */
- u8 wf = (pfec & PFERR_WRITE_MASK) ? (u8)~w : 0;
+ u16 wf = (pfec & PFERR_WRITE_MASK) ? (u16)~w : 0;
/* Faults from user mode accesses to supervisor pages */
- u8 uf = (pfec & PFERR_USER_MASK) ? (u8)~u : 0;
+ u16 uf = (pfec & PFERR_USER_MASK) ? (u16)~u : 0;
/* Faults from fetches of non-executable pages*/
- u8 ff = (pfec & PFERR_FETCH_MASK) ? (u8)~x : 0;
+ u16 ff = (pfec & PFERR_FETCH_MASK) ? (u16)~x : 0;
/* Faults from kernel mode fetches of user pages */
- u8 smepf = 0;
+ u16 smepf = 0;
/* Faults from kernel mode accesses of user pages */
- u8 smapf = 0;
+ u16 smapf = 0;
if (!ept) {
/* Faults from kernel mode accesses to user pages */
- u8 kf = (pfec & PFERR_USER_MASK) ? 0 : u;
+ u16 kf = (pfec & PFERR_USER_MASK) ? 0 : u;
/* Not really needed: !nx will cause pte.nx to fault */
if (!efer_nx)
@@ -5517,7 +5528,7 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
smapf = (pfec & (PFERR_RSVD_MASK|PFERR_FETCH_MASK)) ? 0 : kf;
}
- mmu->permissions[byte] = ff | uf | wf | smepf | smapf;
+ mmu->permissions[byte] = ff | uf | wf | rf | smepf | smapf;
}
}
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index f35a830ce469..44545f6f860a 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -25,7 +25,8 @@
#define KVM_MMU_PAGE_PRINTK() ({ \
const char *saved_ptr = trace_seq_buffer_ptr(p); \
static const char *access_str[] = { \
- "---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux" \
+ "----", "r---", "-w--", "rw--", "--u-", "r-u-", "-wu-", "rwu-", \
+ "---x", "r--x", "-w-x", "rw-x", "--ux", "r-ux", "-wux", "rwux" \
}; \
union kvm_mmu_page_role role; \
\
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index ed762bb4b007..bbdbf4ae2d65 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -170,25 +170,24 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
return true;
}
-/*
- * For PTTYPE_EPT, a page table can be executable but not readable
- * on supported processors. Therefore, set_spte does not automatically
- * set bit 0 if execute only is supported. Here, we repurpose ACC_USER_MASK
- * to signify readability since it isn't used in the EPT case
- */
static inline unsigned FNAME(gpte_access)(u64 gpte)
{
unsigned access;
#if PTTYPE == PTTYPE_EPT
access = ((gpte & VMX_EPT_WRITABLE_MASK) ? ACC_WRITE_MASK : 0) |
((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) |
- ((gpte & VMX_EPT_READABLE_MASK) ? ACC_USER_MASK : 0);
+ ((gpte & VMX_EPT_READABLE_MASK) ? ACC_READ_MASK : 0);
#else
- BUILD_BUG_ON(ACC_EXEC_MASK != PT_PRESENT_MASK);
- BUILD_BUG_ON(ACC_EXEC_MASK != 1);
+ /*
+ * P is set here, so the page is always readable and W/U/!NX represent
+ * allowed accesses.
+ */
+ BUILD_BUG_ON(ACC_READ_MASK != PT_PRESENT_MASK);
+ BUILD_BUG_ON(ACC_WRITE_MASK != PT_WRITABLE_MASK);
+ BUILD_BUG_ON(ACC_USER_MASK != PT_USER_MASK);
+ BUILD_BUG_ON(ACC_EXEC_MASK & (PT_WRITABLE_MASK | PT_USER_MASK | PT_PRESENT_MASK));
access = gpte & (PT_WRITABLE_MASK | PT_USER_MASK | PT_PRESENT_MASK);
- /* Combine NX with P (which is set here) to get ACC_EXEC_MASK. */
- access ^= (gpte >> PT64_NX_SHIFT);
+ access |= gpte & PT64_NX_MASK ? 0 : ACC_EXEC_MASK;
#endif
return access;
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index e2acd9ed9dba..0b09124b0d54 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -194,12 +194,6 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
int is_host_mmio = -1;
bool wrprot = false;
- /*
- * For the EPT case, shadow_present_mask has no RWX bits set if
- * exec-only page table entries are supported. In that case,
- * ACC_USER_MASK and shadow_user_mask are used to represent
- * read access. See FNAME(gpte_access) in paging_tmpl.h.
- */
WARN_ON_ONCE((pte_access | shadow_present_mask) == SHADOW_NONPRESENT_VALUE);
if (sp->role.ad_disabled)
@@ -228,6 +222,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
pte_access &= ~ACC_EXEC_MASK;
}
+ if (pte_access & ACC_READ_MASK)
+ spte |= PT_PRESENT_MASK; /* or VMX_EPT_READABLE_MASK */
+
if (pte_access & ACC_EXEC_MASK)
spte |= shadow_x_mask;
else
@@ -390,6 +387,7 @@ u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
u64 spte = SPTE_MMU_PRESENT_MASK;
spte |= __pa(child_pt) | shadow_present_mask | PT_WRITABLE_MASK |
+ PT_PRESENT_MASK /* or VMX_EPT_READABLE_MASK */ |
shadow_user_mask | shadow_x_mask | shadow_me_value;
if (ad_disabled)
@@ -490,18 +488,16 @@ void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask)
}
EXPORT_SYMBOL_GPL(kvm_mmu_set_me_spte_mask);
-void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only)
+void kvm_mmu_set_ept_masks(bool has_ad_bits)
{
kvm_ad_enabled = has_ad_bits;
- shadow_user_mask = VMX_EPT_READABLE_MASK;
+ shadow_user_mask = 0;
shadow_accessed_mask = VMX_EPT_ACCESS_BIT;
shadow_dirty_mask = VMX_EPT_DIRTY_BIT;
shadow_nx_mask = 0ull;
shadow_x_mask = VMX_EPT_EXECUTABLE_MASK;
- /* VMX_EPT_SUPPRESS_VE_BIT is needed for W or X violation. */
- shadow_present_mask =
- (has_exec_only ? 0ull : VMX_EPT_READABLE_MASK) | VMX_EPT_SUPPRESS_VE_BIT;
+ shadow_present_mask = VMX_EPT_SUPPRESS_VE_BIT;
shadow_acc_track_mask = VMX_EPT_RWX_MASK;
shadow_host_writable_mask = EPT_SPTE_HOST_WRITABLE;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 3d77755b6b10..0c305f2f4ba0 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -52,10 +52,11 @@ static_assert(SPTE_TDP_AD_ENABLED == 0);
#define SPTE_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
#endif
-#define ACC_EXEC_MASK 1
+#define ACC_READ_MASK PT_PRESENT_MASK
#define ACC_WRITE_MASK PT_WRITABLE_MASK
#define ACC_USER_MASK PT_USER_MASK
-#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
+#define ACC_EXEC_MASK 8
+#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK | ACC_READ_MASK)
#define SPTE_LEVEL_BITS 9
#define SPTE_LEVEL_SHIFT(level) __PT_LEVEL_SHIFT(level, SPTE_LEVEL_BITS)
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 5316c27f6099..3bda6a621d8a 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -288,11 +288,6 @@ static inline bool cpu_has_vmx_flexpriority(void)
cpu_has_vmx_virtualize_apic_accesses();
}
-static inline bool cpu_has_vmx_ept_execute_only(void)
-{
- return vmx_capability.ept & VMX_EPT_EXECUTE_ONLY_BIT;
-}
-
static inline bool cpu_has_vmx_ept_4levels(void)
{
return vmx_capability.ept & VMX_EPT_PAGE_WALK_4_BIT;
diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
index adf925500b9e..1afbf272efae 100644
--- a/arch/x86/kvm/vmx/common.h
+++ b/arch/x86/kvm/vmx/common.h
@@ -85,11 +85,8 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
{
u64 error_code;
- /* Is it a read fault? */
- error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
- ? PFERR_USER_MASK : 0;
/* Is it a write fault? */
- error_code |= (exit_qualification & EPT_VIOLATION_ACC_WRITE)
+ error_code = (exit_qualification & EPT_VIOLATION_ACC_WRITE)
? PFERR_WRITE_MASK : 0;
/* Is it a fetch fault? */
error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2e687761aeaf..98801c408b8c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8425,8 +8425,7 @@ __init int vmx_hardware_setup(void)
set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
if (enable_ept)
- kvm_mmu_set_ept_masks(enable_ept_ad_bits,
- cpu_has_vmx_ept_execute_only());
+ kvm_mmu_set_ept_masks(enable_ept_ad_bits);
else
vt_x86_ops.get_mt_mask = NULL;
--
2.52.0
* [PATCH 09/22] KVM: x86/mmu: separate more EPT/non-EPT permission_fault()
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
Now that EPT no longer abuses ACC_USER_MASK, move its
handling entirely into the !ept branch. Merge smepf and ff
into a single variable, because EPT's "SMEP" (actually
MBEC) is defined differently.
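For illustration only (not part of the patch): a minimal userspace model of
the permission-bitmask machinery this function builds, assuming the ACC_*
values used by this series (ACC_READ_MASK=1, ACC_WRITE_MASK=2,
ACC_USER_MASK=4, ACC_EXEC_MASK=8). Each table entry carries one bit per
4-bit ACC_* combination that faults for the given PFEC:

```c
#include <assert.h>
#include <stdint.h>

/* ACC_* values as defined by this series (spte.h). */
#define ACC_READ_MASK  1u  /* PT_PRESENT_MASK */
#define ACC_WRITE_MASK 2u  /* PT_WRITABLE_MASK */
#define ACC_USER_MASK  4u  /* PT_USER_MASK */
#define ACC_EXEC_MASK  8u

/* One bit per 4-bit ACC_* combination that includes 'access'. */
static uint16_t acc_bits_mask(unsigned int access)
{
	uint16_t m = 0;
	unsigned int i;

	for (i = 0; i < 16; i++)
		if (i & access)
			m |= (uint16_t)(1u << i);
	return m;
}

/* Toy model: does a write access fault against a PTE granting pte_access? */
static int write_faults(unsigned int pte_access)
{
	/* Combinations lacking ACC_WRITE_MASK fault on write. */
	uint16_t wf = (uint16_t)~acc_bits_mask(ACC_WRITE_MASK);

	return (wf >> pte_access) & 1;
}
```

The fault check is then a single shift-and-mask of permissions[pfec >> 1]
by the PTE's ACC_* bits, mirroring what permission_fault() does with the
real table.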
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu/mmu.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b87dbf9e42b9..b7366e416baa 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5452,7 +5452,6 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
const u16 x = ACC_BITS_MASK(ACC_EXEC_MASK);
const u16 w = ACC_BITS_MASK(ACC_WRITE_MASK);
- const u16 u = ACC_BITS_MASK(ACC_USER_MASK);
const u16 r = ACC_BITS_MASK(ACC_READ_MASK);
bool cr4_smep = is_cr4_smep(mmu);
@@ -5485,21 +5484,24 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
/* Faults from writes to non-writable pages */
u16 wf = (pfec & PFERR_WRITE_MASK) ? (u16)~w : 0;
/* Faults from user mode accesses to supervisor pages */
- u16 uf = (pfec & PFERR_USER_MASK) ? (u16)~u : 0;
- /* Faults from fetches of non-executable pages*/
- u16 ff = (pfec & PFERR_FETCH_MASK) ? (u16)~x : 0;
- /* Faults from kernel mode fetches of user pages */
- u16 smepf = 0;
+ u16 uf = 0;
+ /* Faults from fetches of non-executable pages */
+ u16 ff = 0;
/* Faults from kernel mode accesses of user pages */
u16 smapf = 0;
- if (!ept) {
+ if (ept) {
+ ff = (pfec & PFERR_FETCH_MASK) ? (u16)~x : 0;
+ } else {
+ const u16 u = ACC_BITS_MASK(ACC_USER_MASK);
+
/* Faults from kernel mode accesses to user pages */
u16 kf = (pfec & PFERR_USER_MASK) ? 0 : u;
- /* Not really needed: !nx will cause pte.nx to fault */
- if (!efer_nx)
- ff = 0;
+ uf = (pfec & PFERR_USER_MASK) ? (u16)~u : 0;
+
+ if (efer_nx)
+ ff = (pfec & PFERR_FETCH_MASK) ? (u16)~x : 0;
/* Allow supervisor writes if !cr0.wp */
if (!cr0_wp)
@@ -5507,7 +5509,7 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
/* Disallow supervisor fetches of user code if cr4.smep */
if (cr4_smep)
- smepf = (pfec & PFERR_FETCH_MASK) ? kf : 0;
+ ff |= (pfec & PFERR_FETCH_MASK) ? kf : 0;
/*
* SMAP:kernel-mode data accesses from user-mode
@@ -5528,7 +5530,7 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
smapf = (pfec & (PFERR_RSVD_MASK|PFERR_FETCH_MASK)) ? 0 : kf;
}
- mmu->permissions[byte] = ff | uf | wf | rf | smepf | smapf;
+ mmu->permissions[byte] = ff | uf | wf | rf | smapf;
}
}
--
2.52.0
* [PATCH 10/22] KVM: x86/mmu: split XS/XU bits for MBEC
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
When EPT is in use, replace ACC_USER_MASK with ACC_USER_EXEC_MASK,
so that supervisor and user-mode execution can be controlled
independently (ACC_USER_MASK alone could not express a setting
such as XU=0 XS=1 W=1 R=1).
Replace shadow_x_mask with shadow_xs_mask/shadow_xu_mask, to allow
setting XS and XU bits separately in EPT entries.
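As an illustration of the new encoding (a sketch, not the patch's actual
SPTE construction; it assumes MBEC's XU bit at EPT bit 10 and the ACC_*
values from spte.h after this patch):

```c
#include <assert.h>
#include <stdint.h>

/* ACC_* values from spte.h after this patch. */
#define ACC_READ_MASK      1u
#define ACC_WRITE_MASK     2u
#define ACC_USER_EXEC_MASK 4u  /* repurposed ACC_USER_MASK, EPT only */
#define ACC_EXEC_MASK      8u

/* EPT permission bits. */
#define EPT_R  (1ull << 0)
#define EPT_W  (1ull << 1)
#define EPT_XS (1ull << 2)   /* plain X when MBEC is off */
#define EPT_XU (1ull << 10)  /* user-execute, only meaningful with MBEC */

/* Sketch of the leaf permission encoding with MBEC enabled. */
static uint64_t acc_to_ept_mbec(unsigned int access)
{
	uint64_t e = 0;

	if (access & ACC_READ_MASK)
		e |= EPT_R;
	if (access & ACC_WRITE_MASK)
		e |= EPT_W;
	if (access & ACC_EXEC_MASK)
		e |= EPT_XS;
	if (access & ACC_USER_EXEC_MASK)
		e |= EPT_XU;
	return e;
}
```

With the old three-bit ACC_* encoding, "supervisor-executable but not
user-executable" had no representation; here it is simply
ACC_READ_MASK|ACC_WRITE_MASK|ACC_EXEC_MASK.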
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/include/asm/vmx.h | 1 +
arch/x86/kvm/mmu/mmu.c | 15 ++++++++---
arch/x86/kvm/mmu/mmutrace.h | 6 ++---
arch/x86/kvm/mmu/paging_tmpl.h | 4 +++
arch/x86/kvm/mmu/spte.c | 47 ++++++++++++++++++++++------------
arch/x86/kvm/mmu/spte.h | 8 +++---
6 files changed, 55 insertions(+), 26 deletions(-)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 4a0804cc7c82..0041f8a77447 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -538,6 +538,7 @@ enum vmcs_field {
#define VMX_EPT_IPAT_BIT (1ull << 6)
#define VMX_EPT_ACCESS_BIT (1ull << 8)
#define VMX_EPT_DIRTY_BIT (1ull << 9)
+#define VMX_EPT_USER_EXECUTABLE_MASK (1ull << 10)
#define VMX_EPT_SUPPRESS_VE_BIT (1ull << 63)
#define VMX_EPT_RWX_MASK (VMX_EPT_READABLE_MASK | \
VMX_EPT_WRITABLE_MASK | \
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b7366e416baa..254d69c4b9f3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5371,7 +5371,7 @@ static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
static inline bool boot_cpu_is_amd(void)
{
WARN_ON_ONCE(!tdp_enabled);
- return shadow_x_mask == 0;
+ return shadow_xs_mask == 0;
}
/*
@@ -5450,7 +5450,6 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
{
unsigned byte;
- const u16 x = ACC_BITS_MASK(ACC_EXEC_MASK);
const u16 w = ACC_BITS_MASK(ACC_WRITE_MASK);
const u16 r = ACC_BITS_MASK(ACC_READ_MASK);
@@ -5491,8 +5490,18 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
u16 smapf = 0;
if (ept) {
- ff = (pfec & PFERR_FETCH_MASK) ? (u16)~x : 0;
+ const u16 xs = ACC_BITS_MASK(ACC_EXEC_MASK);
+ const u16 xu = ACC_BITS_MASK(ACC_USER_EXEC_MASK);
+
+ if (pfec & PFERR_FETCH_MASK) {
+ /* Ignore XU unless MBEC is enabled. */
+ if (cr4_smep)
+ ff = pfec & PFERR_USER_MASK ? (u16)~xu : (u16)~xs;
+ else
+ ff = (u16)~xs;
+ }
} else {
+ const u16 x = ACC_BITS_MASK(ACC_EXEC_MASK);
const u16 u = ACC_BITS_MASK(ACC_USER_MASK);
/* Faults from kernel mode accesses to user pages */
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 44545f6f860a..e22588d3e145 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -354,8 +354,8 @@ TRACE_EVENT(
__entry->sptep = virt_to_phys(sptep);
__entry->level = level;
__entry->r = shadow_present_mask || (__entry->spte & PT_PRESENT_MASK);
- __entry->x = is_executable_pte(__entry->spte);
- __entry->u = shadow_user_mask ? !!(__entry->spte & shadow_user_mask) : -1;
+ __entry->x = (__entry->spte & (shadow_xs_mask | shadow_nx_mask)) == shadow_xs_mask;
+ __entry->u = !!(__entry->spte & (shadow_xu_mask | shadow_user_mask));
),
TP_printk("gfn %llx spte %llx (%s%s%s%s) level %d at %llx",
@@ -363,7 +363,7 @@ TRACE_EVENT(
__entry->r ? "r" : "-",
__entry->spte & PT_WRITABLE_MASK ? "w" : "-",
__entry->x ? "x" : "-",
- __entry->u == -1 ? "" : (__entry->u ? "u" : "-"),
+ __entry->u ? "u" : "-",
__entry->level, __entry->sptep
)
);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index bbdbf4ae2d65..c657ea90bb33 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -174,6 +174,10 @@ static inline unsigned FNAME(gpte_access)(u64 gpte)
{
unsigned access;
#if PTTYPE == PTTYPE_EPT
+ /*
+ * For now nested MBEC is not supported and permission_fault() ignores
+ * ACC_USER_EXEC_MASK.
+ */
access = ((gpte & VMX_EPT_WRITABLE_MASK) ? ACC_WRITE_MASK : 0) |
((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) |
((gpte & VMX_EPT_READABLE_MASK) ? ACC_READ_MASK : 0);
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 0b09124b0d54..0b3e2b97afbf 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -29,8 +29,9 @@ bool __read_mostly kvm_ad_enabled;
u64 __read_mostly shadow_host_writable_mask;
u64 __read_mostly shadow_mmu_writable_mask;
u64 __read_mostly shadow_nx_mask;
-u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
u64 __read_mostly shadow_user_mask;
+u64 __read_mostly shadow_xs_mask; /* mutual exclusive with nx_mask and user_mask */
+u64 __read_mostly shadow_xu_mask; /* mutual exclusive with nx_mask and user_mask */
u64 __read_mostly shadow_accessed_mask;
u64 __read_mostly shadow_dirty_mask;
u64 __read_mostly shadow_mmio_value;
@@ -216,22 +217,30 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
* when CR0.PG is toggled, but leveraging that to ignore the mitigation
* would tie make_spte() further to vCPU/MMU state, and add complexity
* just to optimize a mode that is anything but performance critical.
+ *
+ * Use ACC_USER_EXEC_MASK here assuming only Intel processors (EPT)
+ * are affected by the NX huge page erratum.
*/
- if (level > PG_LEVEL_4K && (pte_access & ACC_EXEC_MASK) &&
+ if (level > PG_LEVEL_4K &&
+ (pte_access & (ACC_EXEC_MASK | ACC_USER_EXEC_MASK)) &&
is_nx_huge_page_enabled(vcpu->kvm)) {
- pte_access &= ~ACC_EXEC_MASK;
+ pte_access &= ~(ACC_EXEC_MASK | ACC_USER_EXEC_MASK);
}
if (pte_access & ACC_READ_MASK)
spte |= PT_PRESENT_MASK; /* or VMX_EPT_READABLE_MASK */
- if (pte_access & ACC_EXEC_MASK)
- spte |= shadow_x_mask;
- else
- spte |= shadow_nx_mask;
-
- if (pte_access & ACC_USER_MASK)
- spte |= shadow_user_mask;
+ if (shadow_nx_mask) {
+ if (!(pte_access & ACC_EXEC_MASK))
+ spte |= shadow_nx_mask;
+ if (pte_access & ACC_USER_MASK)
+ spte |= shadow_user_mask;
+ } else {
+ if (pte_access & ACC_EXEC_MASK)
+ spte |= shadow_xs_mask;
+ if (pte_access & ACC_USER_EXEC_MASK)
+ spte |= shadow_xu_mask;
+ }
if (level > PG_LEVEL_4K)
spte |= PT_PAGE_SIZE_MASK;
@@ -317,11 +326,13 @@ static u64 modify_spte_protections(u64 spte, u64 set, u64 clear)
static u64 make_spte_executable(u64 spte, u8 access)
{
u64 set, clear;
- if (access & ACC_EXEC_MASK)
- set = shadow_x_mask;
+ if (shadow_nx_mask)
+ set = (access & ACC_EXEC_MASK) ? 0 : shadow_nx_mask;
else
- set = shadow_nx_mask;
- clear = set ^ (shadow_nx_mask | shadow_x_mask);
+ set =
+ (access & ACC_EXEC_MASK ? shadow_xs_mask : 0) |
+ (access & ACC_USER_EXEC_MASK ? shadow_xu_mask : 0);
+ clear = set ^ (shadow_nx_mask | shadow_xs_mask | shadow_xu_mask);
return modify_spte_protections(spte, set, clear);
}
@@ -388,7 +399,7 @@ u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
spte |= __pa(child_pt) | shadow_present_mask | PT_WRITABLE_MASK |
PT_PRESENT_MASK /* or VMX_EPT_READABLE_MASK */ |
- shadow_user_mask | shadow_x_mask | shadow_me_value;
+ shadow_user_mask | shadow_xs_mask | shadow_xu_mask | shadow_me_value;
if (ad_disabled)
spte |= SPTE_TDP_AD_DISABLED;
@@ -496,7 +507,8 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits)
shadow_accessed_mask = VMX_EPT_ACCESS_BIT;
shadow_dirty_mask = VMX_EPT_DIRTY_BIT;
shadow_nx_mask = 0ull;
- shadow_x_mask = VMX_EPT_EXECUTABLE_MASK;
+ shadow_xs_mask = VMX_EPT_EXECUTABLE_MASK;
+ shadow_xu_mask = VMX_EPT_EXECUTABLE_MASK;
shadow_present_mask = VMX_EPT_SUPPRESS_VE_BIT;
shadow_acc_track_mask = VMX_EPT_RWX_MASK;
@@ -547,7 +559,8 @@ void kvm_mmu_reset_all_pte_masks(void)
shadow_accessed_mask = PT_ACCESSED_MASK;
shadow_dirty_mask = PT_DIRTY_MASK;
shadow_nx_mask = PT64_NX_MASK;
- shadow_x_mask = 0;
+ shadow_xs_mask = 0;
+ shadow_xu_mask = 0;
shadow_present_mask = PT_PRESENT_MASK;
shadow_acc_track_mask = 0;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 0c305f2f4ba0..7323ff19056b 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -54,7 +54,8 @@ static_assert(SPTE_TDP_AD_ENABLED == 0);
#define ACC_READ_MASK PT_PRESENT_MASK
#define ACC_WRITE_MASK PT_WRITABLE_MASK
-#define ACC_USER_MASK PT_USER_MASK
+#define ACC_USER_MASK PT_USER_MASK /* non EPT */
+#define ACC_USER_EXEC_MASK ACC_USER_MASK /* EPT only */
#define ACC_EXEC_MASK 8
#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK | ACC_READ_MASK)
@@ -184,8 +185,9 @@ extern bool __read_mostly kvm_ad_enabled;
extern u64 __read_mostly shadow_host_writable_mask;
extern u64 __read_mostly shadow_mmu_writable_mask;
extern u64 __read_mostly shadow_nx_mask;
-extern u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
extern u64 __read_mostly shadow_user_mask;
+extern u64 __read_mostly shadow_xs_mask; /* mutual exclusive with nx_mask and user_mask */
+extern u64 __read_mostly shadow_xu_mask; /* mutual exclusive with nx_mask and user_mask */
extern u64 __read_mostly shadow_accessed_mask;
extern u64 __read_mostly shadow_dirty_mask;
extern u64 __read_mostly shadow_mmio_value;
@@ -352,7 +354,7 @@ static inline bool is_last_spte(u64 pte, int level)
static inline bool is_executable_pte(u64 spte)
{
- return (spte & (shadow_x_mask | shadow_nx_mask)) == shadow_x_mask;
+ return (spte & (shadow_xs_mask | shadow_xu_mask | shadow_nx_mask)) != shadow_nx_mask;
}
static inline kvm_pfn_t spte_to_pfn(u64 pte)
--
2.52.0
* [PATCH 11/22] KVM: x86/mmu: move cr4_smep to base role
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
Guest page tables can be reused independently of the value of CR4.SMEP
(at least if CR0.WP=1). However, this is not true of EPT MBEC pages,
because presence of EPT entries is signaled by bits 0-2 when MBEC
is off, and by bits 0-2 plus bit 10 when MBEC is on.
This makes the smep_andnot_wp bit redundant, so remove it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
Documentation/virt/kvm/x86/mmu.rst | 10 ++++------
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 23 +++++++++++++++--------
arch/x86/kvm/mmu/mmu.c | 6 +++---
4 files changed, 23 insertions(+), 17 deletions(-)
diff --git a/Documentation/virt/kvm/x86/mmu.rst b/Documentation/virt/kvm/x86/mmu.rst
index 2b3b6d442302..666aa179601a 100644
--- a/Documentation/virt/kvm/x86/mmu.rst
+++ b/Documentation/virt/kvm/x86/mmu.rst
@@ -184,10 +184,8 @@ Shadow pages contain the following information:
Contains the value of efer.nx for which the page is valid.
role.cr0_wp:
Contains the value of cr0.wp for which the page is valid.
- role.smep_andnot_wp:
- Contains the value of cr4.smep && !cr0.wp for which the page is valid
- (pages for which this is true are different from other pages; see the
- treatment of cr0.wp=0 below).
+ role.cr4_smep:
+ Contains the value of cr4.smep for which the page is valid.
role.smap_andnot_wp:
Contains the value of cr4.smap && !cr0.wp for which the page is valid
(pages for which this is true are different from other pages; see the
@@ -435,8 +433,8 @@ from being written by the kernel after cr0.wp has changed to 1, we make
the value of cr0.wp part of the page role. This means that an spte created
with one value of cr0.wp cannot be used when cr0.wp has a different value -
it will simply be missed by the shadow page lookup code. A similar issue
-exists when an spte created with cr0.wp=0 and cr4.smep=0 is used after
-changing cr4.smep to 1. To avoid this, the value of !cr0.wp && cr4.smep
+exists when an spte created with cr0.wp=0 and cr4.smap=0 is used after
+changing cr4.smap to 1. To avoid this, the value of !cr0.wp && cr4.smap
is also made a part of the page role.
Large pages
diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 18a5c3119e1a..2ac25b418b26 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -93,6 +93,7 @@ KVM_X86_OP_OPTIONAL(sync_pir_to_irr)
KVM_X86_OP_OPTIONAL_RET0(set_tss_addr)
KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
+KVM_X86_OP_OPTIONAL_RET0(tdp_has_smep)
KVM_X86_OP(load_mmu_pgd)
KVM_X86_OP_OPTIONAL(link_external_spt)
KVM_X86_OP_OPTIONAL(set_external_spte)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3efb238c683c..0d6d20ab48dd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -332,8 +332,8 @@ struct kvm_kernel_irq_routing_entry;
* paging has exactly one upper level, making level completely redundant
* when has_4_byte_gpte=1.
*
- * - on top of this, smep_andnot_wp and smap_andnot_wp are only set if
- * cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
+ * - on top of this, smap_andnot_wp is only set if cr0_wp=0,
+ * therefore these two bits only give rise to 3 possibilities.
*
* Therefore, the maximum number of possible upper-level shadow pages for a
* single gfn is a bit less than 2^14.
@@ -349,12 +349,19 @@ union kvm_mmu_page_role {
unsigned invalid:1;
unsigned efer_nx:1;
unsigned cr0_wp:1;
- unsigned smep_andnot_wp:1;
unsigned smap_andnot_wp:1;
unsigned ad_disabled:1;
unsigned guest_mode:1;
unsigned passthrough:1;
unsigned is_mirror:1;
+
+ /*
+ * cr4_smep is also set for EPT MBEC. Because it affects
+ * which pages are considered non-present (bit 10 additionally
+ * must be zero if MBEC is on) it has to be in the base role.
+ */
+ unsigned cr4_smep:1;
+
unsigned :3;
/*
@@ -381,10 +388,10 @@ union kvm_mmu_page_role {
* tables (because KVM doesn't support Protection Keys with shadow paging), and
* CR0.PG, CR4.PAE, and CR4.PSE are indirectly reflected in role.level.
*
- * Note, SMEP and SMAP are not redundant with sm*p_andnot_wp in the page role.
- * If CR0.WP=1, KVM can reuse shadow pages for the guest regardless of SMEP and
- * SMAP, but the MMU's permission checks for software walks need to be SMEP and
- * SMAP aware regardless of CR0.WP.
+ * Note, SMAP is not redundant with smap_andnot_wp in the page role. If
+ * CR0.WP=1, KVM can reuse shadow pages for the guest regardless of SMAP,
+ * but the MMU's permission checks for software walks need to be SMAP
+ * aware regardless of CR0.WP.
*/
union kvm_mmu_extended_role {
u32 word;
@@ -394,7 +401,6 @@ union kvm_mmu_extended_role {
unsigned int cr4_pse:1;
unsigned int cr4_pke:1;
unsigned int cr4_smap:1;
- unsigned int cr4_smep:1;
unsigned int cr4_la57:1;
unsigned int efer_lma:1;
};
@@ -1813,6 +1819,7 @@ struct kvm_x86_ops {
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*set_identity_map_addr)(struct kvm *kvm, u64 ident_addr);
u8 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
+ bool (*tdp_has_smep)(struct kvm *kvm);
void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
int root_level);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 254d69c4b9f3..a0b4774e405a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -227,7 +227,7 @@ static inline bool __maybe_unused is_##reg##_##name(struct kvm_mmu *mmu) \
}
BUILD_MMU_ROLE_ACCESSOR(base, cr0, wp);
BUILD_MMU_ROLE_ACCESSOR(ext, cr4, pse);
-BUILD_MMU_ROLE_ACCESSOR(ext, cr4, smep);
+BUILD_MMU_ROLE_ACCESSOR(base, cr4, smep);
BUILD_MMU_ROLE_ACCESSOR(ext, cr4, smap);
BUILD_MMU_ROLE_ACCESSOR(ext, cr4, pke);
BUILD_MMU_ROLE_ACCESSOR(ext, cr4, la57);
@@ -5653,7 +5653,7 @@ static union kvm_cpu_role kvm_calc_cpu_role(struct kvm_vcpu *vcpu,
role.base.efer_nx = ____is_efer_nx(regs);
role.base.cr0_wp = ____is_cr0_wp(regs);
- role.base.smep_andnot_wp = ____is_cr4_smep(regs) && !____is_cr0_wp(regs);
+ role.base.cr4_smep = ____is_cr4_smep(regs);
role.base.smap_andnot_wp = ____is_cr4_smap(regs) && !____is_cr0_wp(regs);
role.base.has_4_byte_gpte = !____is_cr4_pae(regs);
@@ -5665,7 +5665,6 @@ static union kvm_cpu_role kvm_calc_cpu_role(struct kvm_vcpu *vcpu,
else
role.base.level = PT32_ROOT_LEVEL;
- role.ext.cr4_smep = ____is_cr4_smep(regs);
role.ext.cr4_smap = ____is_cr4_smap(regs);
role.ext.cr4_pse = ____is_cr4_pse(regs);
@@ -5724,6 +5723,7 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
role.access = ACC_ALL;
role.cr0_wp = true;
+ role.cr4_smep = kvm_x86_call(tdp_has_smep)(vcpu->kvm);
role.efer_nx = true;
role.smm = cpu_role.base.smm;
role.guest_mode = cpu_role.base.guest_mode;
--
2.52.0
* [PATCH 12/22] KVM: VMX: enable use of MBEC
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
Add SECONDARY_EXEC_MODE_BASED_EPT_EXEC as an optional secondary execution
control bit. If enabled, configure XS and XU separately (even though they
are always used together).
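Side note on the EPT_VIOLATION_*_TO_PROT macros this patch touches: the
net bit shuffling, with the EPT_VIOLATION_PROT_* positions inferred from
the static_asserts in vmx.h, can be sketched as:

```c
#include <assert.h>
#include <stdint.h>

#define VMX_EPT_RWX_MASK             0x7ull        /* R/W/X(S), bits 0-2 */
#define VMX_EPT_USER_EXECUTABLE_MASK (1ull << 10)  /* XU */

/*
 * EPT_VIOLATION_PROT_{READ,WRITE,EXEC,USER_EXEC} land in bits 3-6 of the
 * exit qualification: RWX shifts left by 3, XU (bit 10) shifts right by 4.
 */
static uint64_t epte_to_violation_prot(uint64_t epte)
{
	return ((epte & VMX_EPT_RWX_MASK) << 3) |
	       ((epte & VMX_EPT_USER_EXECUTABLE_MASK) >> 4);
}
```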
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/include/asm/vmx.h | 3 +++
arch/x86/kvm/mmu.h | 7 ++++++-
arch/x86/kvm/mmu/spte.c | 4 ++--
arch/x86/kvm/mmu/spte.h | 5 +++--
arch/x86/kvm/vmx/capabilities.h | 6 ++++++
arch/x86/kvm/vmx/common.h | 17 ++++++++++++-----
arch/x86/kvm/vmx/main.c | 11 ++++++++++-
arch/x86/kvm/vmx/vmx.c | 16 +++++++++++++++-
arch/x86/kvm/vmx/vmx.h | 1 +
arch/x86/kvm/vmx/x86_ops.h | 1 +
10 files changed, 59 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 0041f8a77447..5fef7a531cb7 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -597,9 +597,12 @@ enum vm_entry_failure_code {
#define EPT_VIOLATION_GVA_TRANSLATED BIT(8)
#define EPT_VIOLATION_RWX_TO_PROT(__epte) (((__epte) & VMX_EPT_RWX_MASK) << 3)
+#define EPT_VIOLATION_USER_EXEC_TO_PROT(__epte) (((__epte) & VMX_EPT_USER_EXECUTABLE_MASK) >> 4)
static_assert(EPT_VIOLATION_RWX_TO_PROT(VMX_EPT_RWX_MASK) ==
(EPT_VIOLATION_PROT_READ | EPT_VIOLATION_PROT_WRITE | EPT_VIOLATION_PROT_EXEC));
+static_assert(EPT_VIOLATION_USER_EXEC_TO_PROT(VMX_EPT_USER_EXECUTABLE_MASK) ==
+ (EPT_VIOLATION_PROT_USER_EXEC));
/*
* Exit Qualifications for NOTIFY VM EXIT
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index f5d35f66750b..2a6caac39d40 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -76,12 +76,17 @@ static inline gfn_t kvm_mmu_max_gfn(void)
return (1ULL << (max_gpa_bits - PAGE_SHIFT)) - 1;
}
+static inline bool mmu_has_mbec(struct kvm_mmu *mmu)
+{
+ return mmu->root_role.cr4_smep;
+}
+
u8 kvm_mmu_get_max_tdp_level(void);
void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value);
void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
-void kvm_mmu_set_ept_masks(bool has_ad_bits);
+void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_mbec);
void kvm_init_mmu(struct kvm_vcpu *vcpu);
void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 0b3e2b97afbf..f51e74e7202d 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -499,7 +499,7 @@ void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask)
}
EXPORT_SYMBOL_GPL(kvm_mmu_set_me_spte_mask);
-void kvm_mmu_set_ept_masks(bool has_ad_bits)
+void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_mbec)
{
kvm_ad_enabled = has_ad_bits;
@@ -508,7 +508,7 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits)
shadow_dirty_mask = VMX_EPT_DIRTY_BIT;
shadow_nx_mask = 0ull;
shadow_xs_mask = VMX_EPT_EXECUTABLE_MASK;
- shadow_xu_mask = VMX_EPT_EXECUTABLE_MASK;
+ shadow_xu_mask = has_mbec ? VMX_EPT_USER_EXECUTABLE_MASK : VMX_EPT_EXECUTABLE_MASK;
shadow_present_mask = VMX_EPT_SUPPRESS_VE_BIT;
shadow_acc_track_mask = VMX_EPT_RWX_MASK;
diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 7323ff19056b..61414f8deaa2 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -24,7 +24,7 @@
* - bits 55 (EPT only): MMU-writable
* - bits 56-59: unused
* - bits 60-61: type of A/D tracking
- * - bits 62: unused
+ * - bits 62 (EPT only): saved XU bit for disabled AD
*/
/*
@@ -72,7 +72,8 @@ static_assert(SPTE_TDP_AD_ENABLED == 0);
* must not overlap the A/D type mask.
*/
#define SHADOW_ACC_TRACK_SAVED_BITS_MASK (VMX_EPT_READABLE_MASK | \
- VMX_EPT_EXECUTABLE_MASK)
+ VMX_EPT_EXECUTABLE_MASK | \
+ VMX_EPT_USER_EXECUTABLE_MASK)
#define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT 52
#define SHADOW_ACC_TRACK_SAVED_MASK (SHADOW_ACC_TRACK_SAVED_BITS_MASK << \
SHADOW_ACC_TRACK_SAVED_BITS_SHIFT)
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 3bda6a621d8a..02037e559410 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -393,4 +393,10 @@ static inline bool cpu_has_notify_vmexit(void)
SECONDARY_EXEC_NOTIFY_VM_EXITING;
}
+static inline bool cpu_has_ept_mbec(void)
+{
+ return vmcs_config.cpu_based_2nd_exec_ctrl &
+ SECONDARY_EXEC_MODE_BASED_EPT_EXEC;
+}
+
#endif /* __KVM_X86_VMX_CAPS_H */
diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
index 1afbf272efae..eff0b51bfda5 100644
--- a/arch/x86/kvm/vmx/common.h
+++ b/arch/x86/kvm/vmx/common.h
@@ -74,6 +74,8 @@ static __always_inline bool is_td_vcpu(struct kvm_vcpu *vcpu) { return false; }
#endif
+extern int vt_get_cpl(struct kvm_vcpu *vcpu);
+
static inline bool vt_is_tdx_private_gpa(struct kvm *kvm, gpa_t gpa)
{
/* For TDX the direct mask is the shared mask. */
@@ -91,15 +93,20 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
/* Is it a fetch fault? */
error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
? PFERR_FETCH_MASK : 0;
- /*
- * ept page table entry is present?
- * note: unconditionally clear USER_EXEC until mode-based
- * execute control is implemented
- */
+ /* ept page table entry is present? */
error_code |= (exit_qualification &
(EPT_VIOLATION_PROT_MASK & ~EPT_VIOLATION_PROT_USER_EXEC))
? PFERR_PRESENT_MASK : 0;
+ if (mmu_has_mbec(vcpu->arch.mmu)) {
+ error_code |= vt_get_cpl(vcpu) > 0 ? PFERR_USER_MASK : 0;
+ error_code |= (exit_qualification & EPT_VIOLATION_PROT_USER_EXEC)
+ ? PFERR_PRESENT_MASK : 0;
+ }
+
if (exit_qualification & EPT_VIOLATION_GVA_IS_VALID)
error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ?
PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index dbab1c15b0cd..601d1b7437a8 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -354,7 +354,7 @@ static void vt_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var,
vmx_set_segment(vcpu, var, seg);
}
-static int vt_get_cpl(struct kvm_vcpu *vcpu)
+int vt_get_cpl(struct kvm_vcpu *vcpu)
{
if (is_td_vcpu(vcpu))
return 0;
@@ -750,6 +750,14 @@ static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr)
return vmx_set_identity_map_addr(kvm, ident_addr);
}
+static bool vt_tdp_has_smep(struct kvm *kvm)
+{
+ if (is_td(kvm))
+ return false;
+
+ return vmx_tdp_has_smep(kvm);
+}
+
static u64 vt_get_l2_tsc_offset(struct kvm_vcpu *vcpu)
{
/* TDX doesn't support L2 guest at the moment. */
@@ -952,6 +960,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.set_tss_addr = vt_op(set_tss_addr),
.set_identity_map_addr = vt_op(set_identity_map_addr),
.get_mt_mask = vmx_get_mt_mask,
+ .tdp_has_smep = vt_op(tdp_has_smep),
.get_exit_info = vt_op(get_exit_info),
.get_entry_info = vt_op(get_entry_info),
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 98801c408b8c..350d26f792c4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -112,6 +112,9 @@ module_param(emulate_invalid_guest_state, bool, 0444);
static bool __read_mostly fasteoi = 1;
module_param(fasteoi, bool, 0444);
+static bool __read_mostly enable_mbec = 1;
+module_param_named(mbec, enable_mbec, bool, 0444);
+
module_param(enable_apicv, bool, 0444);
module_param(enable_ipiv, bool, 0444);
@@ -2625,6 +2628,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
return -EIO;
vmx_cap->ept = 0;
+ _cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC;
_cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_EPT_VIOLATION_VE;
}
if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_VPID) &&
@@ -4520,6 +4524,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
*/
exec_control &= ~SECONDARY_EXEC_ENABLE_VMFUNC;
+ if (!enable_mbec)
+ exec_control &= ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC;
+
/* SECONDARY_EXEC_DESC is enabled/disabled on writes to CR4.UMIP,
* in vmx_set_cr4. */
exec_control &= ~SECONDARY_EXEC_DESC;
@@ -7580,6 +7587,11 @@ u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT);
}
+bool vmx_tdp_has_smep(struct kvm *kvm)
+{
+ return enable_mbec;
+}
+
static void vmcs_set_secondary_exec_control(struct vcpu_vmx *vmx, u32 new_ctl)
{
/*
@@ -8406,6 +8418,8 @@ __init int vmx_hardware_setup(void)
ple_window_shrink = 0;
}
+ if (!cpu_has_ept_mbec())
+ enable_mbec = 0;
if (!cpu_has_vmx_apicv())
enable_apicv = 0;
if (!enable_apicv)
@@ -8425,7 +8439,7 @@ __init int vmx_hardware_setup(void)
set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
if (enable_ept)
- kvm_mmu_set_ept_masks(enable_ept_ad_bits);
+ kvm_mmu_set_ept_masks(enable_ept_ad_bits, enable_mbec);
else
vt_x86_ops.get_mt_mask = NULL;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index d3389baf3ab3..743fa33b349e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -576,6 +576,7 @@ static inline u8 vmx_get_rvi(void)
SECONDARY_EXEC_ENABLE_VMFUNC | \
SECONDARY_EXEC_BUS_LOCK_DETECTION | \
SECONDARY_EXEC_NOTIFY_VM_EXITING | \
+ SECONDARY_EXEC_MODE_BASED_EPT_EXEC | \
SECONDARY_EXEC_ENCLS_EXITING | \
SECONDARY_EXEC_EPT_VIOLATION_VE)
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 2b3424f638db..1fb1128b1eb7 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -104,6 +104,7 @@ void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr);
int vmx_set_identity_map_addr(struct kvm *kvm, u64 ident_addr);
u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
+bool vmx_tdp_has_smep(struct kvm *kvm);
void vmx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code);
--
2.52.0
* [PATCH 13/22] KVM: x86/mmu: add support for nested MBEC
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
` (11 preceding siblings ...)
2026-03-21 0:09 ` [PATCH 12/22] KVM: VMX: enable use of MBEC Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-23 14:49 ` Jon Kohler
2026-03-21 0:09 ` [PATCH 14/22] KVM: nVMX: advertise MBEC to nested guests Paolo Bonzini
` (9 subsequent siblings)
22 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu/paging_tmpl.h | 29 ++++++++++++++++++++---------
1 file changed, 20 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index c657ea90bb33..d50085308506 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -124,12 +124,17 @@ static inline void FNAME(protect_clean_gpte)(struct kvm_mmu *mmu, unsigned *acce
*access &= mask;
}
-static inline int FNAME(is_present_gpte)(unsigned long pte)
+static inline int FNAME(is_present_gpte)(struct kvm_mmu *mmu,
+ unsigned long pte)
{
#if PTTYPE != PTTYPE_EPT
return pte & PT_PRESENT_MASK;
#else
- return pte & 7;
+ /*
+ * For EPT, an entry is present if any of bits 2:0 are set.
+ * With mode-based execute control, bit 10 also indicates presence.
+ */
+ return pte & (7 | (mmu_has_mbec(mmu) ? VMX_EPT_USER_EXECUTABLE_MASK : 0));
#endif
}
@@ -152,7 +157,7 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp, u64 *spte,
u64 gpte)
{
- if (!FNAME(is_present_gpte)(gpte))
+ if (!FNAME(is_present_gpte)(vcpu->arch.mmu, gpte))
goto no_present;
/* Prefetch only accessed entries (unless A/D bits are disabled). */
@@ -173,14 +178,17 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
static inline unsigned FNAME(gpte_access)(u64 gpte)
{
unsigned access;
-#if PTTYPE == PTTYPE_EPT
/*
- * For now nested MBEC is not supported and permission_fault() ignores
- * ACC_USER_EXEC_MASK.
+ * Set bits in ACC_*_MASK even if they might not be used in the
+ * actual checks. For example, if EFER.NX is clear permission_fault()
+ * will ignore ACC_EXEC_MASK, and if MBEC is disabled it will
+ * ignore ACC_USER_EXEC_MASK.
*/
+#if PTTYPE == PTTYPE_EPT
access = ((gpte & VMX_EPT_WRITABLE_MASK) ? ACC_WRITE_MASK : 0) |
((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) |
- ((gpte & VMX_EPT_READABLE_MASK) ? ACC_READ_MASK : 0);
+ ((gpte & VMX_EPT_READABLE_MASK) ? ACC_READ_MASK : 0) |
+ ((gpte & VMX_EPT_USER_EXECUTABLE_MASK) ? ACC_USER_EXEC_MASK : 0);
#else
/*
* P is set here, so the page is always readable and W/U/!NX represent
@@ -335,7 +343,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
if (walker->level == PT32E_ROOT_LEVEL) {
pte = mmu->get_pdptr(vcpu, (addr >> 30) & 3);
trace_kvm_mmu_paging_element(pte, walker->level);
- if (!FNAME(is_present_gpte)(pte))
+ if (!FNAME(is_present_gpte)(mmu, pte))
goto error;
--walker->level;
}
@@ -417,7 +425,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
*/
pte_access = pt_access & (pte ^ walk_nx_mask);
- if (unlikely(!FNAME(is_present_gpte)(pte)))
+ if (unlikely(!FNAME(is_present_gpte)(mmu, pte)))
goto error;
if (unlikely(FNAME(is_rsvd_bits_set)(mmu, pte, walker->level))) {
@@ -514,6 +522,9 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
* ACC_*_MASK flags!
*/
walker->fault.exit_qualification |= EPT_VIOLATION_RWX_TO_PROT(pte_access);
+ if (mmu_has_mbec(mmu))
+ walker->fault.exit_qualification |=
+ EPT_VIOLATION_USER_EXEC_TO_PROT(pte_access);
}
#endif
walker->fault.address = addr;
--
2.52.0
* [PATCH 14/22] KVM: nVMX: advertise MBEC to nested guests
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
` (12 preceding siblings ...)
2026-03-21 0:09 ` [PATCH 13/22] KVM: x86/mmu: add support for nested MBEC Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-23 14:49 ` Jon Kohler
2026-03-21 0:09 ` [PATCH 15/22] KVM: nVMX: allow MBEC with EVMCS Paolo Bonzini
` (8 subsequent siblings)
22 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
From: Jon Kohler <jon@nutanix.com>
Advertise SECONDARY_EXEC_MODE_BASED_EPT_EXEC (MBEC) to userspace, which
can then expose the feature to the guest.
When MBEC is enabled by the guest, it is passed to the MMU via cr4_smep
and to the processor by merging vmcs12->secondary_vm_exec_control into
vmcs02's secondary VM-execution controls.
Signed-off-by: Jon Kohler <jon@nutanix.com>
Message-ID: <20251223054806.1611168-9-jon@nutanix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu.h | 2 +-
arch/x86/kvm/mmu/mmu.c | 7 ++++---
arch/x86/kvm/vmx/nested.c | 10 ++++++++++
3 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 2a6caac39d40..035244ccbb5e 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -93,7 +93,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
unsigned long cr4, u64 efer, gpa_t nested_cr3);
void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
int huge_page_level, bool accessed_dirty,
- gpa_t new_eptp);
+ bool mbec, gpa_t new_eptp);
bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
u64 fault_address, char *insn, int insn_len);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a0b4774e405a..647dffb69d85 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5839,7 +5839,7 @@ EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu);
static union kvm_cpu_role
kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
- bool execonly, u8 level)
+ bool execonly, u8 level, bool mbec)
{
union kvm_cpu_role role = {0};
@@ -5849,6 +5849,7 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
*/
WARN_ON_ONCE(is_smm(vcpu));
role.base.level = level;
+ role.base.cr4_smep = mbec;
role.base.has_4_byte_gpte = false;
role.base.direct = false;
role.base.ad_disabled = !accessed_dirty;
@@ -5864,13 +5865,13 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
int huge_page_level, bool accessed_dirty,
- gpa_t new_eptp)
+ bool mbec, gpa_t new_eptp)
{
struct kvm_mmu *context = &vcpu->arch.guest_mmu;
u8 level = vmx_eptp_page_walk_level(new_eptp);
union kvm_cpu_role new_mode =
kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
- execonly, level);
+ execonly, level, mbec);
if (new_mode.as_u64 != context->cpu_role.as_u64) {
/* EPT, and thus nested EPT, does not consume CR0, CR4, nor EFER. */
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7c55551a2680..7b0861d02166 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -460,6 +460,12 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
vmcs12->guest_physical_address = fault->address;
}
+static inline bool nested_ept_mbec_enabled(struct kvm_vcpu *vcpu)
+{
+ struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+ return nested_cpu_has2(vmcs12, SECONDARY_EXEC_MODE_BASED_EPT_EXEC);
+}
+
static void nested_ept_new_eptp(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -468,6 +474,7 @@ static void nested_ept_new_eptp(struct kvm_vcpu *vcpu)
kvm_init_shadow_ept_mmu(vcpu, execonly, ept_lpage_level,
nested_ept_ad_enabled(vcpu),
+ nested_ept_mbec_enabled(vcpu),
nested_ept_get_eptp(vcpu));
}
@@ -7145,6 +7152,9 @@ static void nested_vmx_setup_secondary_ctls(u32 ept_caps,
msrs->ept_caps |= VMX_EPT_AD_BIT;
}
+ if (cpu_has_ept_mbec())
+ msrs->secondary_ctls_high |=
+ SECONDARY_EXEC_MODE_BASED_EPT_EXEC;
/*
* Advertise EPTP switching irrespective of hardware support,
* KVM emulates it in software so long as VMFUNC is supported.
--
2.52.0
* [PATCH 15/22] KVM: nVMX: allow MBEC with EVMCS
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
` (13 preceding siblings ...)
2026-03-21 0:09 ` [PATCH 14/22] KVM: nVMX: advertise MBEC to nested guests Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-21 0:09 ` [PATCH 16/22] KVM: x86/tdp_mmu: propagate access mask from kvm_mmu_page to PTE Paolo Bonzini
` (7 subsequent siblings)
22 siblings, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
From: Jon Kohler <jon@nutanix.com>
Extend EVMCS1_SUPPORTED_2NDEXEC to allow MBEC and eVMCS to coexist.
Without this, enabling eVMCS causes KVM to filter out MBEC and not
present it as a supported control to the guest, preventing the
performance gains from MBEC when Windows HVCI is enabled.
The guest may choose not to use MBEC (e.g., if the admin does not enable
Windows HVCI / Memory Integrity), but if they use traditional nested
virt (Hyper-V, WSL2, etc.), having EVMCS exposed is important for
improving nested guest performance. IOW allowing MBEC and EVMCS to
coexist provides maximum optionality to Windows users without
overcomplicating VM administration.
Signed-off-by: Jon Kohler <jon@nutanix.com>
Message-ID: <20251223054806.1611168-8-jon@nutanix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/vmx/hyperv_evmcs.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/vmx/hyperv_evmcs.h b/arch/x86/kvm/vmx/hyperv_evmcs.h
index 6536290f4274..0568f76aafc1 100644
--- a/arch/x86/kvm/vmx/hyperv_evmcs.h
+++ b/arch/x86/kvm/vmx/hyperv_evmcs.h
@@ -87,6 +87,7 @@
SECONDARY_EXEC_PT_CONCEAL_VMX | \
SECONDARY_EXEC_BUS_LOCK_DETECTION | \
SECONDARY_EXEC_NOTIFY_VM_EXITING | \
+ SECONDARY_EXEC_MODE_BASED_EPT_EXEC | \
SECONDARY_EXEC_ENCLS_EXITING)
#define EVMCS1_SUPPORTED_3RDEXEC (0ULL)
--
2.52.0
* [PATCH 16/22] KVM: x86/tdp_mmu: propagate access mask from kvm_mmu_page to PTE
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
` (14 preceding siblings ...)
2026-03-21 0:09 ` [PATCH 15/22] KVM: nVMX: allow MBEC with EVMCS Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-21 0:09 ` [PATCH 17/22] KVM: x86/mmu: introduce cpu_role bit for availability of PFEC.I/D Paolo Bonzini
` (6 subsequent siblings)
22 siblings, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
Until now, all SPTEs have had all kinds of access allowed; however,
for GMET to be enabled, all pages have to have ACC_USER_MASK cleared.
Marking them as supervisor pages makes the processor allow execution
from either user or supervisor mode (and, unlike with normal paging,
NPT ignores the U bit for reads and writes).
This means that the root page's role has ACC_USER_MASK cleared, and
the cleared bit has to be propagated down through the TDP MMU pages.
Do that in tdp_mmu_map_handle_target_level.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu/tdp_mmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7f3d7229b2c1..f0e7528435cf 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1161,9 +1161,9 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
}
if (unlikely(!fault->slot))
- new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
+ new_spte = make_mmio_spte(vcpu, iter->gfn, sp->role.access);
else
- wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
+ wrprot = make_spte(vcpu, sp, fault->slot, sp->role.access, iter->gfn,
fault->pfn, iter->old_spte, fault->prefetch,
false, fault->map_writable, &new_spte);
--
2.52.0
* [PATCH 17/22] KVM: x86/mmu: introduce cpu_role bit for availability of PFEC.I/D
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
` (15 preceding siblings ...)
2026-03-21 0:09 ` [PATCH 16/22] KVM: x86/tdp_mmu: propagate access mask from kvm_mmu_page to PTE Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-21 0:09 ` [PATCH 18/22] KVM: SVM: add GMET bit definitions Paolo Bonzini
` (5 subsequent siblings)
22 siblings, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
While GMET looks a lot like SMEP, it has several annoying differences.
The main one is that the availability of the I/D bit in the page fault
error code still depends on the host CR4.SMEP and EFER.NXE bits. If the
base.cr4_smep bit of the cpu_role is (ab)used to enable GMET, there needs
to be another place to hold the host CR4.SMEP value; just merge it
with EFER.NXE into a new cpu_role bit that tells paging_tmpl.h whether
to set the I/D bit at all.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 7 +++++++
arch/x86/kvm/mmu/mmu.c | 8 ++++++++
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
3 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0d6d20ab48dd..3162414186f0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -403,6 +403,13 @@ union kvm_mmu_extended_role {
unsigned int cr4_smap:1;
unsigned int cr4_la57:1;
unsigned int efer_lma:1;
+
+ /*
+ * True if either CR4.SMEP or EFER.NXE is set. For AMD NPT
+ * this is the "real" host CR4.SMEP whereas cr4_smep is
+ * actually GMET.
+ */
+ unsigned int has_pferr_fetch:1;
};
};
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 647dffb69d85..1788620e6dfc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -234,6 +234,11 @@ BUILD_MMU_ROLE_ACCESSOR(ext, cr4, la57);
BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);
BUILD_MMU_ROLE_ACCESSOR(ext, efer, lma);
+static inline bool has_pferr_fetch(struct kvm_mmu *mmu)
+{
+ return mmu->cpu_role.ext.has_pferr_fetch;
+}
+
static inline bool is_cr0_pg(struct kvm_mmu *mmu)
{
return mmu->cpu_role.base.level > 0;
@@ -5672,6 +5677,8 @@ static union kvm_cpu_role kvm_calc_cpu_role(struct kvm_vcpu *vcpu,
role.ext.cr4_pke = ____is_efer_lma(regs) && ____is_cr4_pke(regs);
role.ext.cr4_la57 = ____is_efer_lma(regs) && ____is_cr4_la57(regs);
role.ext.efer_lma = ____is_efer_lma(regs);
+
+ role.ext.has_pferr_fetch = role.base.efer_nx | role.base.cr4_smep;
return role;
}
@@ -5825,6 +5832,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
/* NPT requires CR0.PG=1. */
WARN_ON_ONCE(cpu_role.base.direct || !cpu_role.base.guest_mode);
+ cpu_role.base.cr4_smep = false;
root_role = cpu_role.base;
root_role.level = kvm_mmu_get_tdp_level(vcpu);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index d50085308506..bc6b0a1a1c8a 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -486,7 +486,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
error:
errcode |= write_fault | user_fault;
- if (fetch_fault && (is_efer_nx(mmu) || is_cr4_smep(mmu)))
+ if (fetch_fault && has_pferr_fetch(mmu))
errcode |= PFERR_FETCH_MASK;
walker->fault.vector = PF_VECTOR;
--
2.52.0
* [PATCH 18/22] KVM: SVM: add GMET bit definitions
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
` (16 preceding siblings ...)
2026-03-21 0:09 ` [PATCH 17/22] KVM: x86/mmu: introduce cpu_role bit for availability of PFEC.I/D Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-21 11:58 ` Borislav Petkov
2026-03-23 12:26 ` Borislav Petkov
2026-03-21 0:09 ` [PATCH 19/22] KVM: x86/mmu: add support for NPT GMET Paolo Bonzini
` (4 subsequent siblings)
22 siblings, 2 replies; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson, Borislav Petkov (AMD)
GMET (Guest Mode Execute Trap) is an AMD virtualization feature,
essentially the nested paging version of SMEP. Hyper-V uses it;
add it in preparation for making it available to hypervisors
running under KVM.
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/svm.h | 1 +
2 files changed, 2 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 123c023fe42c..95469c7d357f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -382,6 +382,7 @@
#define X86_FEATURE_AVIC (15*32+13) /* "avic" Virtual Interrupt Controller */
#define X86_FEATURE_V_VMSAVE_VMLOAD (15*32+15) /* "v_vmsave_vmload" Virtual VMSAVE VMLOAD */
#define X86_FEATURE_VGIF (15*32+16) /* "vgif" Virtual GIF */
+#define X86_FEATURE_GMET (15*32+17) /* "gmet" Guest Mode Execution Trap */
#define X86_FEATURE_X2AVIC (15*32+18) /* "x2avic" Virtual x2apic */
#define X86_FEATURE_V_SPEC_CTRL (15*32+20) /* "v_spec_ctrl" Virtual SPEC_CTRL */
#define X86_FEATURE_VNMI (15*32+25) /* "vnmi" Virtual NMI */
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index e2a29e1144a7..47353bef947c 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -239,6 +239,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define SVM_NESTED_CTL_NP_ENABLE BIT(0)
#define SVM_NESTED_CTL_SEV_ENABLE BIT(1)
#define SVM_NESTED_CTL_SEV_ES_ENABLE BIT(2)
+#define SVM_NESTED_CTL_GMET_ENABLE BIT(3)
#define SVM_TSC_RATIO_RSVD 0xffffff0000000000ULL
--
2.52.0
* [PATCH 19/22] KVM: x86/mmu: add support for NPT GMET
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
` (17 preceding siblings ...)
2026-03-21 0:09 ` [PATCH 18/22] KVM: SVM: add GMET bit definitions Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-21 0:09 ` [PATCH 20/22] KVM: SVM: enable GMET and set it in MMU role Paolo Bonzini
` (3 subsequent siblings)
22 siblings, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
GMET allows NPT page table entries to be created with U=0.
However, when GMET=1, U=0 only affects execution, not reads or
writes. Ignore user faults on non-fetch accesses for NPT GMET.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/mmu.h | 3 ++-
arch/x86/kvm/mmu/mmu.c | 19 +++++++++++++------
arch/x86/kvm/svm/nested.c | 3 ++-
4 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3162414186f0..5016a4569746 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -359,6 +359,8 @@ union kvm_mmu_page_role {
* cr4_smep is also set for EPT MBEC. Because it affects
* which pages are considered non-present (bit 10 additionally
* must be zero if MBEC is on) it has to be in the base role.
+ * It also has to be in the base role for AMD GMET because
+ * kernel-executable pages need to have U=0 with GMET enabled.
*/
unsigned cr4_smep:1;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 035244ccbb5e..b03a5f4d9f04 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -90,7 +90,8 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_mbec);
void kvm_init_mmu(struct kvm_vcpu *vcpu);
void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
- unsigned long cr4, u64 efer, gpa_t nested_cr3);
+ unsigned long cr4, u64 efer, gpa_t nested_cr3,
+ u64 nested_ctl);
void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
int huge_page_level, bool accessed_dirty,
bool mbec, gpa_t new_eptp);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1788620e6dfc..eeb8667a283f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -55,6 +55,7 @@
#include <asm/io.h>
#include <asm/set_memory.h>
#include <asm/spec-ctrl.h>
+#include <asm/svm.h>
#include <asm/vmx.h>
#include "trace.h"
@@ -5451,7 +5452,7 @@ reset_ept_shadow_zero_bits_mask(struct kvm_mmu *context, bool execonly)
(14 & (access) ? 1 << 14 : 0) | \
(15 & (access) ? 1 << 15 : 0))
-static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
+static void update_permission_bitmask(struct kvm_mmu *mmu, bool tdp, bool ept)
{
unsigned byte;
@@ -5512,7 +5513,12 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
/* Faults from kernel mode accesses to user pages */
u16 kf = (pfec & PFERR_USER_MASK) ? 0 : u;
- uf = (pfec & PFERR_USER_MASK) ? (u16)~u : 0;
+ /*
+ * For NPT GMET, U=0 does not affect reads and writes. Fetches
+ * are handled below via cr4_smep.
+ */
+ if (!(tdp && cr4_smep))
+ uf = (pfec & PFERR_USER_MASK) ? (u16)~u : 0;
if (efer_nx)
ff = (pfec & PFERR_FETCH_MASK) ? (u16)~x : 0;
@@ -5623,7 +5629,7 @@ static void reset_guest_paging_metadata(struct kvm_vcpu *vcpu,
return;
reset_guest_rsvds_bits_mask(vcpu, mmu);
- update_permission_bitmask(mmu, false);
+ update_permission_bitmask(mmu, mmu == &vcpu->arch.guest_mmu, false);
update_pkru_bitmask(mmu);
}
@@ -5819,7 +5825,8 @@ static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu,
}
void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
- unsigned long cr4, u64 efer, gpa_t nested_cr3)
+ unsigned long cr4, u64 efer, gpa_t nested_cr3,
+ u64 nested_ctl)
{
struct kvm_mmu *context = &vcpu->arch.guest_mmu;
struct kvm_mmu_role_regs regs = {
@@ -5832,7 +5839,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
/* NPT requires CR0.PG=1. */
WARN_ON_ONCE(cpu_role.base.direct || !cpu_role.base.guest_mode);
- cpu_role.base.cr4_smep = false;
+ cpu_role.base.cr4_smep = (nested_ctl & SVM_NESTED_CTL_GMET_ENABLE) != 0;
root_role = cpu_role.base;
root_role.level = kvm_mmu_get_tdp_level(vcpu);
@@ -5890,7 +5897,7 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
context->gva_to_gpa = ept_gva_to_gpa;
context->sync_spte = ept_sync_spte;
- update_permission_bitmask(context, true);
+ update_permission_bitmask(context, true, true);
context->pkru_mask = 0;
reset_rsvds_bits_mask_ept(vcpu, context, execonly, huge_page_level);
reset_ept_shadow_zero_bits_mask(context, execonly);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b7fd2e869998..617052c98365 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -96,7 +96,8 @@ static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
*/
kvm_init_shadow_npt_mmu(vcpu, X86_CR0_PG, svm->vmcb01.ptr->save.cr4,
svm->vmcb01.ptr->save.efer,
- svm->nested.ctl.nested_cr3);
+ svm->nested.ctl.nested_cr3,
+ svm->nested.ctl.nested_ctl);
vcpu->arch.mmu->get_guest_pgd = nested_svm_get_tdp_cr3;
vcpu->arch.mmu->get_pdptr = nested_svm_get_tdp_pdptr;
vcpu->arch.mmu->inject_page_fault = nested_svm_inject_npf_exit;
--
2.52.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
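The uf (user-fault) handling in the patch above is the core of NPT GMET support in update_permission_bitmask(): with GMET active on TDP, U=0 no longer faults supervisor reads and writes, so the uf term is skipped and fetches are policed through the SMEP-style path instead. A minimal user-space model of just that term — simplified names and constants, not KVM's actual code:

```c
#include <assert.h>
#include <stdint.h>

#define PFERR_USER_MASK  (1u << 2)

/*
 * Simplified model of the uf term in update_permission_bitmask().
 * 'u' is the bitmask of ACC_* combinations that include the user bit;
 * for NPT with GMET active (tdp && gmet), U=0 does not restrict
 * supervisor reads/writes, so no uf term is generated at all and only
 * the SMEP-style fetch check (not modeled here) applies.
 */
static uint16_t compute_uf(uint16_t u, unsigned pfec, int tdp, int gmet)
{
	if (tdp && gmet)
		return 0;	/* fetches handled via the cr4_smep path */
	return (pfec & PFERR_USER_MASK) ? (uint16_t)~u : 0;
}
```

The interesting case is a supervisor access to a U=0 page: without GMET it faults (bit set in ~u), with NPT GMET it does not.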
* [PATCH 20/22] KVM: SVM: enable GMET and set it in MMU role
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
` (18 preceding siblings ...)
2026-03-21 0:09 ` [PATCH 19/22] KVM: x86/mmu: add support for NPT GMET Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-25 9:25 ` Nikunj A. Dadhania
2026-03-21 0:09 ` [PATCH 21/22] KVM: SVM: work around errata 1218 Paolo Bonzini
` (2 subsequent siblings)
22 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
Set the GMET bit in the nested control field. This has effectively
no impact as long as the NPT page tables are created with U=0.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/mmu/mmu.c | 6 +++++-
arch/x86/kvm/svm/nested.c | 2 ++
arch/x86/kvm/svm/svm.c | 16 ++++++++++++++++
3 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index eeb8667a283f..06289b2d4f96 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5734,7 +5734,6 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
{
union kvm_mmu_page_role role = {0};
- role.access = ACC_ALL;
role.cr0_wp = true;
role.cr4_smep = kvm_x86_call(tdp_has_smep)(vcpu->kvm);
role.efer_nx = true;
@@ -5745,6 +5744,11 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu,
role.direct = true;
role.has_4_byte_gpte = false;
+ /* All TDP pages are supervisor-executable */
+ role.access = ACC_ALL;
+ if (role.cr4_smep && shadow_user_mask)
+ role.access &= ~ACC_USER_MASK;
+
return role;
}
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 617052c98365..d69bcf52f948 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -773,6 +773,8 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
else
vmcb02->control.bus_lock_counter = 0;
+ vmcb02->control.nested_ctl &= ~SVM_NESTED_CTL_GMET_ENABLE;
+
/* Done at vmrun: asid. */
/* Also overwritten later if necessary. */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 23cb4beea886..4a4f663b2bd2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -134,6 +134,9 @@ module_param(pause_filter_count_max, ushort, 0444);
bool npt_enabled = true;
module_param_named(npt, npt_enabled, bool, 0444);
+bool gmet_enabled = true;
+module_param_named(gmet, gmet_enabled, bool, 0444);
+
/* allow nested virtualization in KVM/SVM */
static int nested = true;
module_param(nested, int, 0444);
@@ -1184,6 +1187,10 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
save->g_pat = vcpu->arch.pat;
save->cr3 = 0;
}
+
+ if (gmet_enabled)
+ control->nested_ctl |= SVM_NESTED_CTL_GMET_ENABLE;
+
svm->current_vmcb->asid_generation = 0;
svm->asid = 0;
@@ -4423,6 +4430,11 @@ svm_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
hypercall[2] = 0xd9;
}
+static bool svm_tdp_has_smep(struct kvm *kvm)
+{
+ return gmet_enabled;
+}
+
/*
* The kvm parameter can be NULL (module initialization, or invocation before
* VM creation). Be sure to check the kvm parameter before using it.
@@ -5147,6 +5159,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.write_tsc_multiplier = svm_write_tsc_multiplier,
.load_mmu_pgd = svm_load_mmu_pgd,
+ .tdp_has_smep = svm_tdp_has_smep,
.check_intercept = svm_check_intercept,
.handle_exit_irqoff = svm_handle_exit_irqoff,
@@ -5377,6 +5390,9 @@ static __init int svm_hardware_setup(void)
if (!boot_cpu_has(X86_FEATURE_NPT))
npt_enabled = false;
+ if (!npt_enabled || !boot_cpu_has(X86_FEATURE_GMET))
+ gmet_enabled = false;
+
/* Force VM NPT level equal to the host's paging level */
kvm_configure_mmu(npt_enabled, get_npt_level(),
get_npt_level(), PG_LEVEL_1G);
--
2.52.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
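The role.access change in the patch above can be modeled in isolation. This is a hedged sketch: the ACC_* values are assumed to mirror KVM's mmu.h after the earlier ACC_READ_MASK patch, and `have_user_bit` stands in for `shadow_user_mask != 0`; it only illustrates the masking, not the full role computation:

```c
#include <assert.h>

/* ACC_* values assumed from arch/x86/kvm/mmu.h after patch 08 widened
 * the access field to four bits. */
#define ACC_EXEC_MASK  1u
#define ACC_WRITE_MASK 2u
#define ACC_USER_MASK  4u
#define ACC_READ_MASK  8u
#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK | ACC_READ_MASK)

/* Sketch of kvm_calc_tdp_mmu_root_page_role()'s access computation:
 * all TDP pages are supervisor-executable, and with GMET active the
 * leaf SPTEs must be created with U=0 so that guest supervisor-mode
 * fetches are not trapped. */
static unsigned tdp_root_access(int gmet_active, int have_user_bit)
{
	unsigned access = ACC_ALL;

	if (gmet_active && have_user_bit)
		access &= ~ACC_USER_MASK;
	return access;
}
```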
* [PATCH 21/22] KVM: SVM: work around errata 1218
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
` (19 preceding siblings ...)
2026-03-21 0:09 ` [PATCH 20/22] KVM: SVM: enable GMET and set it in MMU role Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-21 0:09 ` [PATCH 22/22] KVM: nSVM: enable GMET for guests Paolo Bonzini
2026-03-21 13:54 ` [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
22 siblings, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
According to AMD, the hypervisor may not be able to determine whether a
fault was a GMET fault or an NX fault based on EXITINFO1, and software
"must read the relevant VMCB to determine whether a fault was a GMET
fault or an NX fault". The APM further details that they meant the
CPL field.
KVM uses the page fault error code to distinguish the causes of a
nested page fault, so recalculate the PFERR_USER_MASK bit of the
vmexit information. Only do it for fetches and only if GMET is in
use, because KVM does not differentiate based on PFERR_USER_MASK
for other nested NPT page faults.
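The workaround boils down to rederiving the USER bit from the guest CPL for fetch faults. A minimal standalone sketch of that recalculation (`fixup_error_code` is a hypothetical helper name; in the patch the logic lives inline in npf_interception() and the CPL comes from the VMCB):

```c
#include <assert.h>
#include <stdint.h>

#define PFERR_USER_MASK  (1u << 2)
#define PFERR_FETCH_MASK (1u << 4)

/*
 * Per errata 1218, EXITINFO1[2] (the USER bit) may be incorrectly set
 * for fetch faults when GMET is enabled, so recompute it from the
 * guest CPL: CPL 0 means a supervisor fetch, anything else user.
 * Non-fetch faults are left untouched.
 */
static uint64_t fixup_error_code(uint64_t error_code, int cpl, int gmet)
{
	if (gmet && (error_code & PFERR_FETCH_MASK)) {
		if (cpl == 0)
			error_code &= ~PFERR_USER_MASK;	/* supervisor fetch */
		else
			error_code |= PFERR_USER_MASK;	/* user fetch */
	}
	return error_code;
}
```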
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/svm/svm.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4a4f663b2bd2..d3b69eb3242b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1955,6 +1955,17 @@ static int npf_interception(struct kvm_vcpu *vcpu)
if (WARN_ON_ONCE(error_code & PFERR_SYNTHETIC_MASK))
error_code &= ~PFERR_SYNTHETIC_MASK;
+ if ((svm->vmcb->control.nested_ctl & SVM_NESTED_CTL_GMET_ENABLE) &&
+ (error_code & PFERR_FETCH_MASK)) {
+ /*
+ * Work around errata 1218: EXITINFO1[2] May Be Incorrectly Set
+ * When GMET (Guest Mode Execute Trap extension) is Enabled
+ */
+ error_code |= PFERR_USER_MASK;
+ if (svm_get_cpl(vcpu) == 0)
+ error_code &= ~PFERR_USER_MASK;
+ }
+
if (sev_snp_guest(vcpu->kvm) && (error_code & PFERR_GUEST_ENC_MASK))
error_code |= PFERR_PRIVATE_ACCESS;
--
2.52.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH 22/22] KVM: nSVM: enable GMET for guests
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
` (20 preceding siblings ...)
2026-03-21 0:09 ` [PATCH 21/22] KVM: SVM: work around errata 1218 Paolo Bonzini
@ 2026-03-21 0:09 ` Paolo Bonzini
2026-03-24 19:57 ` Jon Kohler
2026-03-21 13:54 ` [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
22 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 0:09 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
All that needs to be done is moving the GMET bit from vmcb12 to
vmcb02.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/svm/nested.c | 3 +++
arch/x86/kvm/svm/svm.c | 3 +++
2 files changed, 6 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index d69bcf52f948..397e9afecb78 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -774,6 +774,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
vmcb02->control.bus_lock_counter = 0;
vmcb02->control.nested_ctl &= ~SVM_NESTED_CTL_GMET_ENABLE;
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_GMET))
+ vmcb02->control.nested_ctl |=
+ (svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_GMET_ENABLE);
/* Done at vmrun: asid. */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d3b69eb3242b..4a0d97e70dc2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5294,6 +5294,9 @@ static __init void svm_set_cpu_caps(void)
if (boot_cpu_has(X86_FEATURE_PFTHRESHOLD))
kvm_cpu_cap_set(X86_FEATURE_PFTHRESHOLD);
+ if (boot_cpu_has(X86_FEATURE_GMET))
+ kvm_cpu_cap_set(X86_FEATURE_GMET);
+
if (vgif)
kvm_cpu_cap_set(X86_FEATURE_VGIF);
--
2.52.0
^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH 18/22] KVM: SVM: add GMET bit definitions
2026-03-21 0:09 ` [PATCH 18/22] KVM: SVM: add GMET bit definitions Paolo Bonzini
@ 2026-03-21 11:58 ` Borislav Petkov
2026-03-21 13:51 ` Paolo Bonzini
2026-03-23 12:26 ` Borislav Petkov
1 sibling, 1 reply; 56+ messages in thread
From: Borislav Petkov @ 2026-03-21 11:58 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel, kvm, Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania,
Amit Shah, Sean Christopherson
On Sat, Mar 21, 2026 at 01:09:27AM +0100, Paolo Bonzini wrote:
> GMET (Guest Mode Execute Trap) is an AMD virtualization feature,
> essentially the nested paging version of SMEP. Hyper-V uses it;
> add it in preparation for making it available to hypervisors
> running under KVM.
>
> Cc: Borislav Petkov (AMD) <bp@alien8.de>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/svm.h | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 123c023fe42c..95469c7d357f 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -382,6 +382,7 @@
> #define X86_FEATURE_AVIC (15*32+13) /* "avic" Virtual Interrupt Controller */
> #define X86_FEATURE_V_VMSAVE_VMLOAD (15*32+15) /* "v_vmsave_vmload" Virtual VMSAVE VMLOAD */
> #define X86_FEATURE_VGIF (15*32+16) /* "vgif" Virtual GIF */
> +#define X86_FEATURE_GMET (15*32+17) /* "gmet" Guest Mode Execution Trap */
^^^^^^
Why do you need to show it in /proc/cpuinfo?
> #define X86_FEATURE_X2AVIC (15*32+18) /* "x2avic" Virtual x2apic */
> #define X86_FEATURE_V_SPEC_CTRL (15*32+20) /* "v_spec_ctrl" Virtual SPEC_CTRL */
> #define X86_FEATURE_VNMI (15*32+25) /* "vnmi" Virtual NMI */
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index e2a29e1144a7..47353bef947c 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -239,6 +239,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
> #define SVM_NESTED_CTL_NP_ENABLE BIT(0)
> #define SVM_NESTED_CTL_SEV_ENABLE BIT(1)
> #define SVM_NESTED_CTL_SEV_ES_ENABLE BIT(2)
> +#define SVM_NESTED_CTL_GMET_ENABLE BIT(3)
>
>
> #define SVM_TSC_RATIO_RSVD 0xffffff0000000000ULL
> --
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 18/22] KVM: SVM: add GMET bit definitions
2026-03-21 11:58 ` Borislav Petkov
@ 2026-03-21 13:51 ` Paolo Bonzini
2026-03-21 15:42 ` Borislav Petkov
0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 13:51 UTC (permalink / raw)
To: Borislav Petkov
Cc: linux-kernel, kvm, Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania,
Amit Shah, Sean Christopherson
On Sat, Mar 21, 2026 at 12:58 PM Borislav Petkov <bp@alien8.de> wrote:
> > @@ -382,6 +382,7 @@
> > #define X86_FEATURE_AVIC (15*32+13) /* "avic" Virtual Interrupt Controller */
> > #define X86_FEATURE_V_VMSAVE_VMLOAD (15*32+15) /* "v_vmsave_vmload" Virtual VMSAVE VMLOAD */
> > #define X86_FEATURE_VGIF (15*32+16) /* "vgif" Virtual GIF */
> > +#define X86_FEATURE_GMET (15*32+17) /* "gmet" Guest Mode Execution Trap */
> ^^^^^^
>
> Why do you need to show it in /proc/cpuinfo?
I prefer to be consistent with all the other ones in word 15.
Paolo
> > #define X86_FEATURE_X2AVIC (15*32+18) /* "x2avic" Virtual x2apic */
> > #define X86_FEATURE_V_SPEC_CTRL (15*32+20) /* "v_spec_ctrl" Virtual SPEC_CTRL */
> > #define X86_FEATURE_VNMI (15*32+25) /* "vnmi" Virtual NMI */
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
` (21 preceding siblings ...)
2026-03-21 0:09 ` [PATCH 22/22] KVM: nSVM: enable GMET for guests Paolo Bonzini
@ 2026-03-21 13:54 ` Paolo Bonzini
22 siblings, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-21 13:54 UTC (permalink / raw)
To: linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
On Sat, Mar 21, 2026 at 1:09 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> On the AMD side [...] the page tables have to be created with U=0.
> For now I chose to have it in all levels of the page tables; I'll
> probably change the code soon to clear the U bit only in leaf SPTEs,
> but I'm leaving it this way because it makes patch 16 easier to
> understand (it's a fix for a latent bug of sorts and I'd like to
> include it anyway).
I added this as the last thing before sending and it's wrong - the
*role* needs to have ACC_USER_MASK cleared at all levels, which is
what patch 16 does (propagating down from the root role). The SPTEs
however only have U=0 in the leaf; make_nonleaf_spte() leaves the bit
set which is the right thing to do.
Paolo
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 18/22] KVM: SVM: add GMET bit definitions
2026-03-21 13:51 ` Paolo Bonzini
@ 2026-03-21 15:42 ` Borislav Petkov
2026-03-23 7:53 ` Paolo Bonzini
0 siblings, 1 reply; 56+ messages in thread
From: Borislav Petkov @ 2026-03-21 15:42 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel, kvm, Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania,
Amit Shah, Sean Christopherson
On Sat, Mar 21, 2026 at 02:51:47PM +0100, Paolo Bonzini wrote:
> I prefer to be consistent with all the other ones in word 15.
Sorry, no, /proc/cpuinfo is an ABI.
See Documentation/arch/x86/cpuinfo.rst
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 18/22] KVM: SVM: add GMET bit definitions
2026-03-21 15:42 ` Borislav Petkov
@ 2026-03-23 7:53 ` Paolo Bonzini
2026-03-23 12:17 ` Borislav Petkov
2026-03-23 12:19 ` Borislav Petkov
0 siblings, 2 replies; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-23 7:53 UTC (permalink / raw)
To: Borislav Petkov
Cc: Kernel Mailing List, Linux, kvm, Jon Kohler, Marcelo Tosatti,
Nikunj A Dadhania, Amit Shah, Sean Christopherson
Il sab 21 mar 2026, 16:42 Borislav Petkov <bp@alien8.de> ha scritto:
>
> On Sat, Mar 21, 2026 at 02:51:47PM +0100, Paolo Bonzini wrote:
> > I prefer to be consistent with all the other ones in word 15.
>
> Sorry, no, /proc/cpuinfo is an ABI.
I am not sure what you mean by ABI in this context, maybe "there are
rules for whether to add stuff to cpuinfo.rst"? If so, GMET matches
what cpuinfo.rst says:
* the kernel knows about the feature enough to have an X86_FEATURE bit
* the kernel supports it and is currently making it available to userspace
* the hardware supports it.
The bullet that applies here is the second from the above list: 1)
this series makes the feature available for userspace and guests to
use it, 2) that requires explicit support, it's not just a bunch of
new instructions.
I agree that not all word 15 features should be added to
/proc/cpuinfo, if that's what you meant by referencing cpuinfo.rst.
For example these days something like AVIC would not be added to
/proc/cpuinfo because it's transparent to userspace and guests.
Paolo
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 18/22] KVM: SVM: add GMET bit definitions
2026-03-23 7:53 ` Paolo Bonzini
@ 2026-03-23 12:17 ` Borislav Petkov
2026-03-23 12:22 ` Paolo Bonzini
2026-03-23 12:19 ` Borislav Petkov
1 sibling, 1 reply; 56+ messages in thread
From: Borislav Petkov @ 2026-03-23 12:17 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Kernel Mailing List, Linux, kvm, Jon Kohler, Marcelo Tosatti,
Nikunj A Dadhania, Amit Shah, Sean Christopherson
On Mon, Mar 23, 2026 at 08:53:32AM +0100, Paolo Bonzini wrote:
> I am not sure what you mean by ABI in this context, maybe "there are
It means that once we show it in /proc/cpuinfo, we're stuck with it forever
because something in userspace might depend on it.
So we try to avoid that if it can be solved in a different way.
> rules for whether to add stuff to cpuinfo.rst"? If so, GMET matches
> what cpuinfo.rst says:
>
> * the kernel knows about the feature enough to have an X86_FEATURE bit
> * the kernel supports it and is currently making it available to userspace
> * the hardware supports it.
>
> The bullet that applies here is the second from the above list: 1)
> this series makes the feature available for userspace and guests to
> use it,
This is the key question: what in userspace is going to parse /proc/cpuinfo
and look for "gmet"?
Because parsing /proc/cpuinfo is the silliest option available, but glibc does
it, and for some things, like "did the kernel *actually* enable this?", it is
the only way.
If the above applies, can this be solved in a different way? CPUID,
guest_cpu_cap_has(), whatever...
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 18/22] KVM: SVM: add GMET bit definitions
2026-03-23 7:53 ` Paolo Bonzini
2026-03-23 12:17 ` Borislav Petkov
@ 2026-03-23 12:19 ` Borislav Petkov
1 sibling, 0 replies; 56+ messages in thread
From: Borislav Petkov @ 2026-03-23 12:19 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Kernel Mailing List, Linux, kvm, Jon Kohler, Marcelo Tosatti,
Nikunj A Dadhania, Amit Shah, Sean Christopherson
On Mon, Mar 23, 2026 at 08:53:32AM +0100, Paolo Bonzini wrote:
> The bullet that applies here is the second from the above list: 1)
> this series makes the feature available for userspace and guests to
> use it,
Also, if you're going to do this, then this has to be dependent on Kconfig and
be disabled when the kernel support for gmet is not build-enabled.
This explains the issue more:
https://lore.kernel.org/r/cover.1774008873.git.m.wieczorretman@pm.me
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 18/22] KVM: SVM: add GMET bit definitions
2026-03-23 12:17 ` Borislav Petkov
@ 2026-03-23 12:22 ` Paolo Bonzini
2026-03-23 12:26 ` Borislav Petkov
0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-23 12:22 UTC (permalink / raw)
To: Borislav Petkov
Cc: Kernel Mailing List, Linux, kvm, Jon Kohler, Marcelo Tosatti,
Nikunj A Dadhania, Amit Shah, Sean Christopherson
On Mon, Mar 23, 2026 at 1:17 PM Borislav Petkov <bp@alien8.de> wrote:
> This is the key question: what in userspace is going to parse /proc/cpuinfo
> and look for "gmet"?
To be honest probably not. They can open /dev/kvm and call
KVM_GET_SUPPORTED_CPUID to get the same information.
The difference is that /proc/cpuinfo will always have gmet even if the
module parameter is zero, while /dev/kvm won't. Based on your other
message, it seems like you want this to be a bug rather than a
feature, so let's shelve this; can you ack it with "" instead?
Paolo
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 18/22] KVM: SVM: add GMET bit definitions
2026-03-23 12:22 ` Paolo Bonzini
@ 2026-03-23 12:26 ` Borislav Petkov
0 siblings, 0 replies; 56+ messages in thread
From: Borislav Petkov @ 2026-03-23 12:26 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Kernel Mailing List, Linux, kvm, Jon Kohler, Marcelo Tosatti,
Nikunj A Dadhania, Amit Shah, Sean Christopherson
On Mon, Mar 23, 2026 at 01:22:10PM +0100, Paolo Bonzini wrote:
> On Mon, Mar 23, 2026 at 1:17 PM Borislav Petkov <bp@alien8.de> wrote:
> > This is the key question: what in userspace is going to parse /proc/cpuinfo
> > and look for "gmet"?
>
> To be honest probably not. They can open /dev/kvm and call
> KVM_GET_SUPPORTED_CPUID to get the same information.
>
> The difference is that /proc/cpuinfo will always have gmet even if the
> module parameter is zero, while /dev/kvm won't. Based on your other
> message, it seems like you want this to be a bug rather than a
> feature, so let's shelve this, can you ack it with "" instead?
Right, you don't need to add any "" anymore - just a normal comment and that's
it. Lemme reply to the patch.
And *if* we ever decide that we need to show it to userspace, we can always do
that later but then we'll actually have a proper justification for doing so.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 18/22] KVM: SVM: add GMET bit definitions
2026-03-21 0:09 ` [PATCH 18/22] KVM: SVM: add GMET bit definitions Paolo Bonzini
2026-03-21 11:58 ` Borislav Petkov
@ 2026-03-23 12:26 ` Borislav Petkov
1 sibling, 0 replies; 56+ messages in thread
From: Borislav Petkov @ 2026-03-23 12:26 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel, kvm, Jon Kohler, Marcelo Tosatti, Nikunj A Dadhania,
Amit Shah, Sean Christopherson
On Sat, Mar 21, 2026 at 01:09:27AM +0100, Paolo Bonzini wrote:
> GMET (Guest Mode Execute Trap) is an AMD virtualization feature,
> essentially the nested paging version of SMEP. Hyper-V uses it;
> add it in preparation for making it available to hypervisors
> running under KVM.
>
> Cc: Borislav Petkov (AMD) <bp@alien8.de>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/svm.h | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 123c023fe42c..95469c7d357f 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -382,6 +382,7 @@
> #define X86_FEATURE_AVIC (15*32+13) /* "avic" Virtual Interrupt Controller */
> #define X86_FEATURE_V_VMSAVE_VMLOAD (15*32+15) /* "v_vmsave_vmload" Virtual VMSAVE VMLOAD */
> #define X86_FEATURE_VGIF (15*32+16) /* "vgif" Virtual GIF */
> +#define X86_FEATURE_GMET (15*32+17) /* "gmet" Guest Mode Execution Trap */
s!"gmet"!!
With that:
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
> #define X86_FEATURE_X2AVIC (15*32+18) /* "x2avic" Virtual x2apic */
> #define X86_FEATURE_V_SPEC_CTRL (15*32+20) /* "v_spec_ctrl" Virtual SPEC_CTRL */
> #define X86_FEATURE_VNMI (15*32+25) /* "vnmi" Virtual NMI */
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index e2a29e1144a7..47353bef947c 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -239,6 +239,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
> #define SVM_NESTED_CTL_NP_ENABLE BIT(0)
> #define SVM_NESTED_CTL_SEV_ENABLE BIT(1)
> #define SVM_NESTED_CTL_SEV_ES_ENABLE BIT(2)
> +#define SVM_NESTED_CTL_GMET_ENABLE BIT(3)
>
>
> #define SVM_TSC_RATIO_RSVD 0xffffff0000000000ULL
> --
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 14/22] KVM: nVMX: advertise MBEC to nested guests
2026-03-21 0:09 ` [PATCH 14/22] KVM: nVMX: advertise MBEC to nested guests Paolo Bonzini
@ 2026-03-23 14:49 ` Jon Kohler
0 siblings, 0 replies; 56+ messages in thread
From: Jon Kohler @ 2026-03-23 14:49 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
> On Mar 20, 2026, at 8:09 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> From: Jon Kohler <jon@nutanix.com>
>
> Advertise SECONDARY_EXEC_MODE_BASED_EPT_EXEC (MBEC) to userspace, which
> allows userspace to expose and advertise the feature to the guest.
> When MBEC is enabled by the guest, it is passed to the MMU via cr4_smep
> and to the processor by the merging of vmcs12->secondary_vm_exec_control
> into the VMCS02's secondary VM execution controls.
>
> Signed-off-by: Jon Kohler <jon@nutanix.com>
> Message-ID: <20251223054806.1611168-9-jon@nutanix.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/mmu.h | 2 +-
> arch/x86/kvm/mmu/mmu.c | 7 ++++---
> arch/x86/kvm/vmx/nested.c | 10 ++++++++++
> 3 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 2a6caac39d40..035244ccbb5e 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -93,7 +93,7 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
> unsigned long cr4, u64 efer, gpa_t nested_cr3);
> void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
> int huge_page_level, bool accessed_dirty,
> - gpa_t new_eptp);
> + bool mbec, gpa_t new_eptp);
> bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
> int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
> u64 fault_address, char *insn, int insn_len);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a0b4774e405a..647dffb69d85 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5839,7 +5839,7 @@ EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu);
>
> static union kvm_cpu_role
> kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
> - bool execonly, u8 level)
> + bool execonly, u8 level, bool mbec)
> {
> union kvm_cpu_role role = {0};
>
> @@ -5849,6 +5849,7 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
> */
> WARN_ON_ONCE(is_smm(vcpu));
> role.base.level = level;
> + role.base.cr4_smep = mbec;
> role.base.has_4_byte_gpte = false;
> role.base.direct = false;
> role.base.ad_disabled = !accessed_dirty;
> @@ -5864,13 +5865,13 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
>
> void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
> int huge_page_level, bool accessed_dirty,
> - gpa_t new_eptp)
> + bool mbec, gpa_t new_eptp)
> {
> struct kvm_mmu *context = &vcpu->arch.guest_mmu;
> u8 level = vmx_eptp_page_walk_level(new_eptp);
> union kvm_cpu_role new_mode =
> kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty,
> - execonly, level);
> + execonly, level, mbec);
>
> if (new_mode.as_u64 != context->cpu_role.as_u64) {
> /* EPT, and thus nested EPT, does not consume CR0, CR4, nor EFER. */
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 7c55551a2680..7b0861d02166 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -460,6 +460,12 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
> vmcs12->guest_physical_address = fault->address;
> }
>
> +static inline bool nested_ept_mbec_enabled(struct kvm_vcpu *vcpu)
> +{
> + struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> + return nested_cpu_has2(vmcs12, SECONDARY_EXEC_MODE_BASED_EPT_EXEC);
> +}
> +
checkpatch.pl complaint:
WARNING: Missing a blank line after declarations
#83: FILE: arch/x86/kvm/vmx/nested.c:468:
+ struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+ return nested_cpu_has2(vmcs12, SECONDARY_EXEC_MODE_BASED_EPT_EXEC);
> static void nested_ept_new_eptp(struct kvm_vcpu *vcpu)
> {
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -468,6 +474,7 @@ static void nested_ept_new_eptp(struct kvm_vcpu *vcpu)
>
> kvm_init_shadow_ept_mmu(vcpu, execonly, ept_lpage_level,
> nested_ept_ad_enabled(vcpu),
> + nested_ept_mbec_enabled(vcpu),
> nested_ept_get_eptp(vcpu));
> }
>
> @@ -7145,6 +7152,9 @@ static void nested_vmx_setup_secondary_ctls(u32 ept_caps,
> msrs->ept_caps |= VMX_EPT_AD_BIT;
> }
>
> + if (cpu_has_ept_mbec())
> + msrs->secondary_ctls_high |=
> + SECONDARY_EXEC_MODE_BASED_EPT_EXEC;
> /*
> * Advertise EPTP switching irrespective of hardware support,
> * KVM emulates it in software so long as VMFUNC is supported.
> --
> 2.52.0
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 08/22] KVM: x86/mmu: introduce ACC_READ_MASK
2026-03-21 0:09 ` [PATCH 08/22] KVM: x86/mmu: introduce ACC_READ_MASK Paolo Bonzini
@ 2026-03-23 14:49 ` Jon Kohler
2026-03-23 14:49 ` Jon Kohler
1 sibling, 0 replies; 56+ messages in thread
From: Jon Kohler @ 2026-03-23 14:49 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
> On Mar 20, 2026, at 8:09 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> Read permissions so far were only needed for EPT, which does not need
> ACC_USER_MASK. Therefore, for EPT page tables ACC_USER_MASK was repurposed
> as a read permission bit.
>
> In order to implement nested MBEC, EPT will genuinely have four kinds of
> accesses, and there will be no room for such hacks; bite the bullet at
> last, enlarging ACC_ALL to four bits and permissions[] to 2^4 bits (u16).
>
> The new code does not enforce that the XWR bits on non-execonly processors
> have their R bit set, even when running nested: none of the shadow_*_mask
> values have bit 0 set, and make_spte() genuinely relies on ACC_READ_MASK
> being requested! This works because, if execonly is not supported by the
> processor, shadow EPT will generate an EPT misconfig vmexit if the XWR
> bits represent a non-readable page, and therefore the pte_access argument
> to make_spte() will also always have ACC_READ_MASK set.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/include/asm/kvm_host.h | 12 +++++-----
> arch/x86/kvm/mmu.h | 2 +-
> arch/x86/kvm/mmu/mmu.c | 39 +++++++++++++++++++++------------
> arch/x86/kvm/mmu/mmutrace.h | 3 ++-
> arch/x86/kvm/mmu/paging_tmpl.h | 21 +++++++++---------
> arch/x86/kvm/mmu/spte.c | 18 ++++++---------
> arch/x86/kvm/mmu/spte.h | 5 +++--
> arch/x86/kvm/vmx/capabilities.h | 5 -----
> arch/x86/kvm/vmx/common.h | 5 +----
> arch/x86/kvm/vmx/vmx.c | 3 +--
> 10 files changed, 56 insertions(+), 57 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 871c7ff4fb29..3efb238c683c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -317,11 +317,11 @@ struct kvm_kernel_irq_routing_entry;
> * the number of unique SPs that can theoretically be created is 2^n, where n
> * is the number of bits that are used to compute the role.
> *
> - * But, even though there are 20 bits in the mask below, not all combinations
> + * But, even though there are 21 bits in the mask below, not all combinations
> * of modes and flags are possible:
> *
> * - invalid shadow pages are not accounted, mirror pages are not shadowed,
> - * so the bits are effectively 18.
> + * so the bits are effectively 19.
> *
> * - quadrant will only be used if has_4_byte_gpte=1 (non-PAE paging);
> * execonly and ad_disabled are only used for nested EPT which has
> @@ -336,7 +336,7 @@ struct kvm_kernel_irq_routing_entry;
> * cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
> *
> * Therefore, the maximum number of possible upper-level shadow pages for a
> - * single gfn is a bit less than 2^13.
> + * single gfn is a bit less than 2^14.
> */
> union kvm_mmu_page_role {
> u32 word;
> @@ -345,7 +345,7 @@ union kvm_mmu_page_role {
> unsigned has_4_byte_gpte:1;
> unsigned quadrant:2;
> unsigned direct:1;
> - unsigned access:3;
> + unsigned access:4;
> unsigned invalid:1;
> unsigned efer_nx:1;
> unsigned cr0_wp:1;
> @@ -355,7 +355,7 @@ union kvm_mmu_page_role {
> unsigned guest_mode:1;
> unsigned passthrough:1;
> unsigned is_mirror:1;
> - unsigned :4;
> + unsigned :3;
>
> /*
> * This is left at the top of the word so that
> @@ -481,7 +481,7 @@ struct kvm_mmu {
> * Byte index: page fault error code [4:1]
> * Bit index: pte permissions in ACC_* format
> */
> - u8 permissions[16];
> + u16 permissions[16];
>
> u64 *pae_root;
> u64 *pml4_root;
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index b4b6860ab971..f5d35f66750b 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -81,7 +81,7 @@ u8 kvm_mmu_get_max_tdp_level(void);
> void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
> void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value);
> void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
> -void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
> +void kvm_mmu_set_ept_masks(bool has_ad_bits);
>
> void kvm_init_mmu(struct kvm_vcpu *vcpu);
> void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 84351df8a9cb..b87dbf9e42b9 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2029,7 +2029,7 @@ static bool kvm_sync_page_check(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> */
> const union kvm_mmu_page_role sync_role_ign = {
> .level = 0xf,
> - .access = 0x7,
> + .access = ACC_ALL,
> .quadrant = 0x3,
> .passthrough = 0x1,
> };
> @@ -5426,7 +5426,7 @@ reset_ept_shadow_zero_bits_mask(struct kvm_mmu *context, bool execonly)
> * update_permission_bitmask() builds what is effectively a
> * two-dimensional array of bools. The second dimension is
> * provided by individual bits of permissions[pfec >> 1], and
> - * logical &, | and ~ operations operate on all the 8 possible
> + * logical &, | and ~ operations operate on all the 16 possible
> * combinations of ACC_* bits.
> */
> #define ACC_BITS_MASK(access) \
> @@ -5436,15 +5436,24 @@ reset_ept_shadow_zero_bits_mask(struct kvm_mmu *context, bool execonly)
> (4 & (access) ? 1 << 4 : 0) | \
> (5 & (access) ? 1 << 5 : 0) | \
> (6 & (access) ? 1 << 6 : 0) | \
> - (7 & (access) ? 1 << 7 : 0))
> + (7 & (access) ? 1 << 7 : 0) | \
> + (8 & (access) ? 1 << 8 : 0) | \
> + (9 & (access) ? 1 << 9 : 0) | \
> + (10 & (access) ? 1 << 10 : 0) | \
> + (11 & (access) ? 1 << 11 : 0) | \
> + (12 & (access) ? 1 << 12 : 0) | \
> + (13 & (access) ? 1 << 13 : 0) | \
> + (14 & (access) ? 1 << 14 : 0) | \
> + (15 & (access) ? 1 << 15 : 0))
>
> static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
> {
> unsigned byte;
>
> - const u8 x = ACC_BITS_MASK(ACC_EXEC_MASK);
> - const u8 w = ACC_BITS_MASK(ACC_WRITE_MASK);
> - const u8 u = ACC_BITS_MASK(ACC_USER_MASK);
> + const u16 x = ACC_BITS_MASK(ACC_EXEC_MASK);
> + const u16 w = ACC_BITS_MASK(ACC_WRITE_MASK);
> + const u16 u = ACC_BITS_MASK(ACC_USER_MASK);
> + const u16 r = ACC_BITS_MASK(ACC_READ_MASK);
>
> bool cr4_smep = is_cr4_smep(mmu);
> bool cr4_smap = is_cr4_smap(mmu);
> @@ -5467,24 +5476,26 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
> unsigned pfec = byte << 1;
>
> /*
> - * Each "*f" variable has a 1 bit for each UWX value
> + * Each "*f" variable has a 1 bit for each ACC_* combo
> * that causes a fault with the given PFEC.
> */
>
> + /* Faults from reads to non-readable pages */
> + u16 rf = (pfec & (PFERR_WRITE_MASK|PFERR_FETCH_MASK)) ? 0 : (u16)~r;
> /* Faults from writes to non-writable pages */
> - u8 wf = (pfec & PFERR_WRITE_MASK) ? (u8)~w : 0;
> + u16 wf = (pfec & PFERR_WRITE_MASK) ? (u16)~w : 0;
> /* Faults from user mode accesses to supervisor pages */
> - u8 uf = (pfec & PFERR_USER_MASK) ? (u8)~u : 0;
> + u16 uf = (pfec & PFERR_USER_MASK) ? (u16)~u : 0;
> /* Faults from fetches of non-executable pages*/
> - u8 ff = (pfec & PFERR_FETCH_MASK) ? (u8)~x : 0;
> + u16 ff = (pfec & PFERR_FETCH_MASK) ? (u16)~x : 0;
> /* Faults from kernel mode fetches of user pages */
> - u8 smepf = 0;
> + u16 smepf = 0;
> /* Faults from kernel mode accesses of user pages */
> - u8 smapf = 0;
> + u16 smapf = 0;
>
> if (!ept) {
> /* Faults from kernel mode accesses to user pages */
> - u8 kf = (pfec & PFERR_USER_MASK) ? 0 : u;
> + u16 kf = (pfec & PFERR_USER_MASK) ? 0 : u;
>
> /* Not really needed: !nx will cause pte.nx to fault */
> if (!efer_nx)
> @@ -5517,7 +5528,7 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
> smapf = (pfec & (PFERR_RSVD_MASK|PFERR_FETCH_MASK)) ? 0 : kf;
> }
>
> - mmu->permissions[byte] = ff | uf | wf | smepf | smapf;
> + mmu->permissions[byte] = ff | uf | wf | rf | smepf | smapf;
> }
> }
>
> diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
> index f35a830ce469..44545f6f860a 100644
> --- a/arch/x86/kvm/mmu/mmutrace.h
> +++ b/arch/x86/kvm/mmu/mmutrace.h
> @@ -25,7 +25,8 @@
> #define KVM_MMU_PAGE_PRINTK() ({ \
> const char *saved_ptr = trace_seq_buffer_ptr(p); \
> static const char *access_str[] = { \
> - "---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux" \
> + "----", "r---", "-w--", "rw--", "--u-", "r-u-", "-wu-", "rwu-", \
> + "---x", "r--x", "-w-x", "rw-x", "--ux", "r-ux", "-wux", "rwux" \
> }; \
> union kvm_mmu_page_role role; \
> \
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index ed762bb4b007..bbdbf4ae2d65 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -170,25 +170,24 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
> return true;
> }
>
> -/*
> - * For PTTYPE_EPT, a page table can be executable but not readable
> - * on supported processors. Therefore, set_spte does not automatically
> - * set bit 0 if execute only is supported. Here, we repurpose ACC_USER_MASK
> - * to signify readability since it isn't used in the EPT case
> - */
> static inline unsigned FNAME(gpte_access)(u64 gpte)
> {
> unsigned access;
> #if PTTYPE == PTTYPE_EPT
> access = ((gpte & VMX_EPT_WRITABLE_MASK) ? ACC_WRITE_MASK : 0) |
> ((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) |
> - ((gpte & VMX_EPT_READABLE_MASK) ? ACC_USER_MASK : 0);
> + ((gpte & VMX_EPT_READABLE_MASK) ? ACC_READ_MASK : 0);
> #else
> - BUILD_BUG_ON(ACC_EXEC_MASK != PT_PRESENT_MASK);
> - BUILD_BUG_ON(ACC_EXEC_MASK != 1);
> + /*
> + * P is set here, so the page is always readable and W/U/!NX represent
> + * allowed accesses.
> + */
> + BUILD_BUG_ON(ACC_READ_MASK != PT_PRESENT_MASK);
> + BUILD_BUG_ON(ACC_WRITE_MASK != PT_WRITABLE_MASK);
> + BUILD_BUG_ON(ACC_USER_MASK != PT_USER_MASK);
> + BUILD_BUG_ON(ACC_EXEC_MASK & (PT_WRITABLE_MASK | PT_USER_MASK | PT_PRESENT_MASK));
> access = gpte & (PT_WRITABLE_MASK | PT_USER_MASK | PT_PRESENT_MASK);
> - /* Combine NX with P (which is set here) to get ACC_EXEC_MASK. */
> - access ^= (gpte >> PT64_NX_SHIFT);
> + access |= gpte & PT64_NX_MASK ? 0 : ACC_EXEC_MASK;
> #endif
>
> return access;
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index e2acd9ed9dba..0b09124b0d54 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -194,12 +194,6 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> int is_host_mmio = -1;
> bool wrprot = false;
>
> - /*
> - * For the EPT case, shadow_present_mask has no RWX bits set if
> - * exec-only page table entries are supported. In that case,
> - * ACC_USER_MASK and shadow_user_mask are used to represent
> - * read access. See FNAME(gpte_access) in paging_tmpl.h.
> - */
> WARN_ON_ONCE((pte_access | shadow_present_mask) == SHADOW_NONPRESENT_VALUE);
>
> if (sp->role.ad_disabled)
> @@ -228,6 +222,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> pte_access &= ~ACC_EXEC_MASK;
> }
>
> + if (pte_access & ACC_READ_MASK)
> + spte |= PT_PRESENT_MASK; /* or VMX_EPT_READABLE_MASK */
> +
> if (pte_access & ACC_EXEC_MASK)
> spte |= shadow_x_mask;
> else
> @@ -390,6 +387,7 @@ u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
> u64 spte = SPTE_MMU_PRESENT_MASK;
>
> spte |= __pa(child_pt) | shadow_present_mask | PT_WRITABLE_MASK |
> + PT_PRESENT_MASK /* or VMX_EPT_READABLE_MASK */ |
> shadow_user_mask | shadow_x_mask | shadow_me_value;
>
> if (ad_disabled)
> @@ -490,18 +488,16 @@ void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask)
> }
> EXPORT_SYMBOL_GPL(kvm_mmu_set_me_spte_mask);
What kernel version were you doing this against?
git am is giving me grief: it fails to apply because this line should be
(In other words: the context line in the patch doesn't match what's in the tree.)
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_set_me_spte_mask);
This was there since 6.18: https://github.com/torvalds/linux/commit/6b36119b94d0b2bb8cea9d512017efafd461d6ac
>
> -void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only)
> +void kvm_mmu_set_ept_masks(bool has_ad_bits)
> {
> kvm_ad_enabled = has_ad_bits;
>
> - shadow_user_mask = VMX_EPT_READABLE_MASK;
> + shadow_user_mask = 0;
> shadow_accessed_mask = VMX_EPT_ACCESS_BIT;
> shadow_dirty_mask = VMX_EPT_DIRTY_BIT;
> shadow_nx_mask = 0ull;
> shadow_x_mask = VMX_EPT_EXECUTABLE_MASK;
> - /* VMX_EPT_SUPPRESS_VE_BIT is needed for W or X violation. */
> - shadow_present_mask =
> - (has_exec_only ? 0ull : VMX_EPT_READABLE_MASK) | VMX_EPT_SUPPRESS_VE_BIT;
> + shadow_present_mask = VMX_EPT_SUPPRESS_VE_BIT;
>
> shadow_acc_track_mask = VMX_EPT_RWX_MASK;
> shadow_host_writable_mask = EPT_SPTE_HOST_WRITABLE;
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index 3d77755b6b10..0c305f2f4ba0 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -52,10 +52,11 @@ static_assert(SPTE_TDP_AD_ENABLED == 0);
> #define SPTE_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
> #endif
>
> -#define ACC_EXEC_MASK 1
> +#define ACC_READ_MASK PT_PRESENT_MASK
> #define ACC_WRITE_MASK PT_WRITABLE_MASK
> #define ACC_USER_MASK PT_USER_MASK
> -#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
> +#define ACC_EXEC_MASK 8
> +#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK | ACC_READ_MASK)
>
> #define SPTE_LEVEL_BITS 9
> #define SPTE_LEVEL_SHIFT(level) __PT_LEVEL_SHIFT(level, SPTE_LEVEL_BITS)
> diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
> index 5316c27f6099..3bda6a621d8a 100644
> --- a/arch/x86/kvm/vmx/capabilities.h
> +++ b/arch/x86/kvm/vmx/capabilities.h
> @@ -288,11 +288,6 @@ static inline bool cpu_has_vmx_flexpriority(void)
> cpu_has_vmx_virtualize_apic_accesses();
> }
>
> -static inline bool cpu_has_vmx_ept_execute_only(void)
> -{
> - return vmx_capability.ept & VMX_EPT_EXECUTE_ONLY_BIT;
> -}
> -
> static inline bool cpu_has_vmx_ept_4levels(void)
> {
> return vmx_capability.ept & VMX_EPT_PAGE_WALK_4_BIT;
> diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
> index adf925500b9e..1afbf272efae 100644
> --- a/arch/x86/kvm/vmx/common.h
> +++ b/arch/x86/kvm/vmx/common.h
> @@ -85,11 +85,8 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
> {
> u64 error_code;
>
> - /* Is it a read fault? */
> - error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
> - ? PFERR_USER_MASK : 0;
> /* Is it a write fault? */
> - error_code |= (exit_qualification & EPT_VIOLATION_ACC_WRITE)
> + error_code = (exit_qualification & EPT_VIOLATION_ACC_WRITE)
> ? PFERR_WRITE_MASK : 0;
> /* Is it a fetch fault? */
> error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 2e687761aeaf..98801c408b8c 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8425,8 +8425,7 @@ __init int vmx_hardware_setup(void)
> set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
>
> if (enable_ept)
> - kvm_mmu_set_ept_masks(enable_ept_ad_bits,
> - cpu_has_vmx_ept_execute_only());
> + kvm_mmu_set_ept_masks(enable_ept_ad_bits);
> else
> vt_x86_ops.get_mt_mask = NULL;
>
> --
> 2.52.0
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 01/22] KVM: TDX/VMX: rework EPT_VIOLATION_EXEC_FOR_RING3_LIN into PROT_MASK
2026-03-21 0:09 ` [PATCH 01/22] KVM: TDX/VMX: rework EPT_VIOLATION_EXEC_FOR_RING3_LIN into PROT_MASK Paolo Bonzini
@ 2026-03-23 14:49 ` Jon Kohler
2026-03-25 4:29 ` Huang, Kai
1 sibling, 0 replies; 56+ messages in thread
From: Jon Kohler @ 2026-03-23 14:49 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
> On Mar 20, 2026, at 8:09 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> From: Jon Kohler <jon@nutanix.com>
>
> EPT exit qualification bit 6 is used when mode-based execute control
> is enabled, and reflects user executable addresses. Rework name to
> reflect the intention and add to EPT_VIOLATION_PROT_MASK, which allows
> simplifying the return evaluation in
> tdx_is_sept_violation_unexpected_pending a pinch.
>
> Rework handling in __vmx_handle_ept_violation to unconditionally clear
> EPT_VIOLATION_PROT_USER_EXEC until MBEC is implemented, as suggested by
> Sean [1].
>
> Note: Intel SDM Table 29-7 defines bit 6 as:
> If the “mode-based execute control�? VM-execution control is 0, the
These quote marks should be plain ASCII double quotes (") - UTF-8 silliness afoot!
> value of this bit is undefined. If that control is 1, this bit is the
> logical-AND of bit 10 in the EPT paging-structure entries used to
> translate the guest-physical address of the access causing the EPT
> violation. In this case, it indicates whether the guest-physical
> address was executable for user-mode linear addresses.
>
> [1] https://lore.kernel.org/all/aCJDzU1p_SFNRIJd@google.com/
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Jon Kohler <jon@nutanix.com>
> Message-ID: <20251223054806.1611168-2-jon@nutanix.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/include/asm/vmx.h | 5 +++--
> arch/x86/kvm/vmx/common.h | 9 +++++++--
> arch/x86/kvm/vmx/tdx.c | 2 +-
> 3 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index cca7d6641287..4a0804cc7c82 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -587,10 +587,11 @@ enum vm_entry_failure_code {
> #define EPT_VIOLATION_PROT_READ BIT(3)
> #define EPT_VIOLATION_PROT_WRITE BIT(4)
> #define EPT_VIOLATION_PROT_EXEC BIT(5)
> -#define EPT_VIOLATION_EXEC_FOR_RING3_LIN BIT(6)
> +#define EPT_VIOLATION_PROT_USER_EXEC BIT(6)
> #define EPT_VIOLATION_PROT_MASK (EPT_VIOLATION_PROT_READ | \
> EPT_VIOLATION_PROT_WRITE | \
> - EPT_VIOLATION_PROT_EXEC)
> + EPT_VIOLATION_PROT_EXEC | \
> + EPT_VIOLATION_PROT_USER_EXEC)
> #define EPT_VIOLATION_GVA_IS_VALID BIT(7)
> #define EPT_VIOLATION_GVA_TRANSLATED BIT(8)
>
> diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
> index 412d0829d7a2..adf925500b9e 100644
> --- a/arch/x86/kvm/vmx/common.h
> +++ b/arch/x86/kvm/vmx/common.h
> @@ -94,8 +94,13 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
> /* Is it a fetch fault? */
> error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
> ? PFERR_FETCH_MASK : 0;
> - /* ept page table entry is present? */
> - error_code |= (exit_qualification & EPT_VIOLATION_PROT_MASK)
> + /*
> + * ept page table entry is present?
> + * note: unconditionally clear USER_EXEC until mode-based
> + * execute control is implemented
> + */
> + error_code |= (exit_qualification &
> + (EPT_VIOLATION_PROT_MASK & ~EPT_VIOLATION_PROT_USER_EXEC))
> ? PFERR_PRESENT_MASK : 0;
>
> if (exit_qualification & EPT_VIOLATION_GVA_IS_VALID)
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index c308aedd8613..bf9fe76d974d 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1921,7 +1921,7 @@ static inline bool tdx_is_sept_violation_unexpected_pending(struct kvm_vcpu *vcp
> if (eeq_type != TDX_EXT_EXIT_QUAL_TYPE_PENDING_EPT_VIOLATION)
> return false;
>
> - return !(eq & EPT_VIOLATION_PROT_MASK) && !(eq & EPT_VIOLATION_EXEC_FOR_RING3_LIN);
> + return !(eq & EPT_VIOLATION_PROT_MASK);
> }
>
> static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
> --
> 2.52.0
>
* Re: [PATCH 06/22] KVM: x86/mmu: merge make_spte_{non,}executable
2026-03-21 0:09 ` [PATCH 06/22] KVM: x86/mmu: merge make_spte_{non,}executable Paolo Bonzini
@ 2026-03-23 14:49 ` Jon Kohler
0 siblings, 0 replies; 56+ messages in thread
From: Jon Kohler @ 2026-03-23 14:49 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
> On Mar 20, 2026, at 8:09 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> As the logic will become more complicated with the introduction
> of MBEC, at least write it only once.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/mmu/spte.c | 21 +++++++++++----------
> 1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index df31039b5d63..e2acd9ed9dba 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -317,14 +317,15 @@ static u64 modify_spte_protections(u64 spte, u64 set, u64 clear)
> return spte;
> }
>
> -static u64 make_spte_executable(u64 spte)
> +static u64 make_spte_executable(u64 spte, u8 access)
> {
> - return modify_spte_protections(spte, shadow_x_mask, shadow_nx_mask);
> -}
> -
> -static u64 make_spte_nonexecutable(u64 spte)
> -{
> - return modify_spte_protections(spte, shadow_nx_mask, shadow_x_mask);
> + u64 set, clear;
> + if (access & ACC_EXEC_MASK)
checkpatch.pl complaint:
WARNING: Missing a blank line after declarations
#33: FILE: arch/x86/kvm/mmu/spte.c:323:
+ u64 set, clear;
+ if (access & ACC_EXEC_MASK)
> + set = shadow_x_mask;
> + else
> + set = shadow_nx_mask;
> + clear = set ^ (shadow_nx_mask | shadow_x_mask);
> + return modify_spte_protections(spte, set, clear);
> }
>
> /*
> @@ -356,8 +357,8 @@ u64 make_small_spte(struct kvm *kvm, u64 huge_spte,
> * the page executable as the NX hugepage mitigation no longer
> * applies.
> */
> - if ((role.access & ACC_EXEC_MASK) && is_nx_huge_page_enabled(kvm))
> - child_spte = make_spte_executable(child_spte);
> + if (is_nx_huge_page_enabled(kvm))
> + child_spte = make_spte_executable(child_spte, role.access);
> }
>
> return child_spte;
> @@ -379,7 +380,7 @@ u64 make_huge_spte(struct kvm *kvm, u64 small_spte, int level)
> huge_spte &= KVM_HPAGE_MASK(level) | ~PAGE_MASK;
>
> if (is_nx_huge_page_enabled(kvm))
> - huge_spte = make_spte_nonexecutable(huge_spte);
> + huge_spte = make_spte_executable(huge_spte, 0);
>
> return huge_spte;
> }
> --
> 2.52.0
>
* Re: [PATCH 13/22] KVM: x86/mmu: add support for nested MBEC
2026-03-21 0:09 ` [PATCH 13/22] KVM: x86/mmu: add support for nested MBEC Paolo Bonzini
@ 2026-03-23 14:49 ` Jon Kohler
0 siblings, 0 replies; 56+ messages in thread
From: Jon Kohler @ 2026-03-23 14:49 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
> On Mar 20, 2026, at 8:09 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
Missing the body of the commit msg?
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/mmu/paging_tmpl.h | 29 ++++++++++++++++++++---------
> 1 file changed, 20 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index c657ea90bb33..d50085308506 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -124,12 +124,17 @@ static inline void FNAME(protect_clean_gpte)(struct kvm_mmu *mmu, unsigned *acce
> *access &= mask;
> }
>
> -static inline int FNAME(is_present_gpte)(unsigned long pte)
> +static inline int FNAME(is_present_gpte)(struct kvm_mmu *mmu,
> + unsigned long pte)
> {
> #if PTTYPE != PTTYPE_EPT
> return pte & PT_PRESENT_MASK;
> #else
> - return pte & 7;
> + /*
> + * For EPT, an entry is present if any of bits 2:0 are set.
> + * With mode-based execute control, bit 10 also indicates presence.
> + */
> + return pte & (7 | (mmu_has_mbec(mmu) ? VMX_EPT_USER_EXECUTABLE_MASK : 0));
> #endif
> }
>
> @@ -152,7 +157,7 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
> struct kvm_mmu_page *sp, u64 *spte,
> u64 gpte)
> {
> - if (!FNAME(is_present_gpte)(gpte))
> + if (!FNAME(is_present_gpte)(vcpu->arch.mmu, gpte))
> goto no_present;
>
> /* Prefetch only accessed entries (unless A/D bits are disabled). */
> @@ -173,14 +178,17 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
> static inline unsigned FNAME(gpte_access)(u64 gpte)
> {
> unsigned access;
> -#if PTTYPE == PTTYPE_EPT
> /*
> - * For now nested MBEC is not supported and permission_fault() ignores
> - * ACC_USER_EXEC_MASK.
> + * Set bits in ACC_*_MASK even if they might not be used in the
> + * actual checks. For example, if EFER.NX is clear permission_fault()
> + * will ignore ACC_EXEC_MASK, and if MBEC is disabled it will
> + * ignore ACC_USER_EXEC_MASK.
> */
> +#if PTTYPE == PTTYPE_EPT
> access = ((gpte & VMX_EPT_WRITABLE_MASK) ? ACC_WRITE_MASK : 0) |
> ((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) |
> - ((gpte & VMX_EPT_READABLE_MASK) ? ACC_READ_MASK : 0);
> + ((gpte & VMX_EPT_READABLE_MASK) ? ACC_READ_MASK : 0) |
> + ((gpte & VMX_EPT_USER_EXECUTABLE_MASK) ? ACC_USER_EXEC_MASK : 0);
> #else
> /*
> * P is set here, so the page is always readable and W/U/!NX represent
> @@ -335,7 +343,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
> if (walker->level == PT32E_ROOT_LEVEL) {
> pte = mmu->get_pdptr(vcpu, (addr >> 30) & 3);
> trace_kvm_mmu_paging_element(pte, walker->level);
> - if (!FNAME(is_present_gpte)(pte))
> + if (!FNAME(is_present_gpte)(mmu, pte))
> goto error;
> --walker->level;
> }
> @@ -417,7 +425,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
> */
> pte_access = pt_access & (pte ^ walk_nx_mask);
>
> - if (unlikely(!FNAME(is_present_gpte)(pte)))
> + if (unlikely(!FNAME(is_present_gpte)(mmu, pte)))
> goto error;
>
> if (unlikely(FNAME(is_rsvd_bits_set)(mmu, pte, walker->level))) {
> @@ -514,6 +522,9 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
> * ACC_*_MASK flags!
> */
> walker->fault.exit_qualification |= EPT_VIOLATION_RWX_TO_PROT(pte_access);
> + if (mmu_has_mbec(mmu))
> + walker->fault.exit_qualification |=
> + EPT_VIOLATION_USER_EXEC_TO_PROT(pte_access);
> }
> #endif
> walker->fault.address = addr;
> --
> 2.52.0
>
* Re: [PATCH 08/22] KVM: x86/mmu: introduce ACC_READ_MASK
2026-03-21 0:09 ` [PATCH 08/22] KVM: x86/mmu: introduce ACC_READ_MASK Paolo Bonzini
2026-03-23 14:49 ` Jon Kohler
@ 2026-03-23 14:49 ` Jon Kohler
1 sibling, 0 replies; 56+ messages in thread
From: Jon Kohler @ 2026-03-23 14:49 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
> On Mar 20, 2026, at 8:09 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> Read permissions so far were only needed for EPT, which does not need
> ACC_USER_MASK. Therefore, for EPT page tables ACC_USER_MASK was repurposed
> as a read permission bit.
>
> In order to implement nested MBEC, EPT will genuinely have four kinds of
> accesses, and there will be no room for such hacks; bite the bullet at
> last, enlarging ACC_ALL to four bits and permissions[] to 2^4 bits (u16).
>
> The new code does not enforce that the XWR bits on non-execonly processors
> have their R bit set, even when running nested: none of the shadow_*_mask
> values have bit 0 set, and make_spte() genuinely relies on ACC_READ_MASK
> being requested! This works becase, if execonly is not supported by the
> processor, shadow EPT will generate an EPT misconfig vmexit if the XWR
> bits represent a non-readable page, and therefore the pte_access argument
> to make_spte() will also always have ACC_READ_MASK set.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/include/asm/kvm_host.h | 12 +++++-----
> arch/x86/kvm/mmu.h | 2 +-
> arch/x86/kvm/mmu/mmu.c | 39 +++++++++++++++++++++------------
> arch/x86/kvm/mmu/mmutrace.h | 3 ++-
> arch/x86/kvm/mmu/paging_tmpl.h | 21 +++++++++---------
> arch/x86/kvm/mmu/spte.c | 18 ++++++---------
> arch/x86/kvm/mmu/spte.h | 5 +++--
> arch/x86/kvm/vmx/capabilities.h | 5 -----
> arch/x86/kvm/vmx/common.h | 5 +----
> arch/x86/kvm/vmx/vmx.c | 3 +--
> 10 files changed, 56 insertions(+), 57 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 871c7ff4fb29..3efb238c683c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -317,11 +317,11 @@ struct kvm_kernel_irq_routing_entry;
> * the number of unique SPs that can theoretically be created is 2^n, where n
> * is the number of bits that are used to compute the role.
> *
> - * But, even though there are 20 bits in the mask below, not all combinations
> + * But, even though there are 21 bits in the mask below, not all combinations
> * of modes and flags are possible:
> *
> * - invalid shadow pages are not accounted, mirror pages are not shadowed,
> - * so the bits are effectively 18.
> + * so the bits are effectively 19.
> *
> * - quadrant will only be used if has_4_byte_gpte=1 (non-PAE paging);
> * execonly and ad_disabled are only used for nested EPT which has
> @@ -336,7 +336,7 @@ struct kvm_kernel_irq_routing_entry;
> * cr0_wp=0, therefore these three bits only give rise to 5 possibilities.
> *
> * Therefore, the maximum number of possible upper-level shadow pages for a
> - * single gfn is a bit less than 2^13.
> + * single gfn is a bit less than 2^14.
> */
> union kvm_mmu_page_role {
> u32 word;
> @@ -345,7 +345,7 @@ union kvm_mmu_page_role {
> unsigned has_4_byte_gpte:1;
> unsigned quadrant:2;
> unsigned direct:1;
> - unsigned access:3;
> + unsigned access:4;
> unsigned invalid:1;
> unsigned efer_nx:1;
> unsigned cr0_wp:1;
> @@ -355,7 +355,7 @@ union kvm_mmu_page_role {
> unsigned guest_mode:1;
> unsigned passthrough:1;
> unsigned is_mirror:1;
> - unsigned :4;
> + unsigned :3;
checkpatch.pl complaint:
ERROR: space prohibited before that ':' (ctx:WxV)
#78: FILE: arch/x86/include/asm/kvm_host.h:360:
+ unsigned :3;
^
>
> /*
> * This is left at the top of the word so that
> @@ -481,7 +481,7 @@ struct kvm_mmu {
> * Byte index: page fault error code [4:1]
> * Bit index: pte permissions in ACC_* format
> */
> - u8 permissions[16];
> + u16 permissions[16];
>
> u64 *pae_root;
> u64 *pml4_root;
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index b4b6860ab971..f5d35f66750b 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -81,7 +81,7 @@ u8 kvm_mmu_get_max_tdp_level(void);
> void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
> void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value);
> void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
> -void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only);
> +void kvm_mmu_set_ept_masks(bool has_ad_bits);
>
> void kvm_init_mmu(struct kvm_vcpu *vcpu);
> void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 84351df8a9cb..b87dbf9e42b9 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2029,7 +2029,7 @@ static bool kvm_sync_page_check(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> */
> const union kvm_mmu_page_role sync_role_ign = {
> .level = 0xf,
> - .access = 0x7,
> + .access = ACC_ALL,
> .quadrant = 0x3,
> .passthrough = 0x1,
> };
> @@ -5426,7 +5426,7 @@ reset_ept_shadow_zero_bits_mask(struct kvm_mmu *context, bool execonly)
> * update_permission_bitmask() builds what is effectively a
> * two-dimensional array of bools. The second dimension is
> * provided by individual bits of permissions[pfec >> 1], and
> - * logical &, | and ~ operations operate on all the 8 possible
> + * logical &, | and ~ operations operate on all the 16 possible
> * combinations of ACC_* bits.
> */
> #define ACC_BITS_MASK(access) \
> @@ -5436,15 +5436,24 @@ reset_ept_shadow_zero_bits_mask(struct kvm_mmu *context, bool execonly)
> (4 & (access) ? 1 << 4 : 0) | \
> (5 & (access) ? 1 << 5 : 0) | \
> (6 & (access) ? 1 << 6 : 0) | \
> - (7 & (access) ? 1 << 7 : 0))
> + (7 & (access) ? 1 << 7 : 0) | \
> + (8 & (access) ? 1 << 8 : 0) | \
> + (9 & (access) ? 1 << 9 : 0) | \
> + (10 & (access) ? 1 << 10 : 0) | \
> + (11 & (access) ? 1 << 11 : 0) | \
> + (12 & (access) ? 1 << 12 : 0) | \
> + (13 & (access) ? 1 << 13 : 0) | \
> + (14 & (access) ? 1 << 14 : 0) | \
> + (15 & (access) ? 1 << 15 : 0))
>
> static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
> {
> unsigned byte;
>
> - const u8 x = ACC_BITS_MASK(ACC_EXEC_MASK);
> - const u8 w = ACC_BITS_MASK(ACC_WRITE_MASK);
> - const u8 u = ACC_BITS_MASK(ACC_USER_MASK);
> + const u16 x = ACC_BITS_MASK(ACC_EXEC_MASK);
> + const u16 w = ACC_BITS_MASK(ACC_WRITE_MASK);
> + const u16 u = ACC_BITS_MASK(ACC_USER_MASK);
> + const u16 r = ACC_BITS_MASK(ACC_READ_MASK);
>
> bool cr4_smep = is_cr4_smep(mmu);
> bool cr4_smap = is_cr4_smap(mmu);
> @@ -5467,24 +5476,26 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
> unsigned pfec = byte << 1;
>
> /*
> - * Each "*f" variable has a 1 bit for each UWX value
> + * Each "*f" variable has a 1 bit for each ACC_* combo
> * that causes a fault with the given PFEC.
> */
>
> + /* Faults from reads to non-readable pages */
> + u16 rf = (pfec & (PFERR_WRITE_MASK|PFERR_FETCH_MASK)) ? 0 : (u16)~r;
> /* Faults from writes to non-writable pages */
> - u8 wf = (pfec & PFERR_WRITE_MASK) ? (u8)~w : 0;
> + u16 wf = (pfec & PFERR_WRITE_MASK) ? (u16)~w : 0;
> /* Faults from user mode accesses to supervisor pages */
> - u8 uf = (pfec & PFERR_USER_MASK) ? (u8)~u : 0;
> + u16 uf = (pfec & PFERR_USER_MASK) ? (u16)~u : 0;
> /* Faults from fetches of non-executable pages*/
> - u8 ff = (pfec & PFERR_FETCH_MASK) ? (u8)~x : 0;
> + u16 ff = (pfec & PFERR_FETCH_MASK) ? (u16)~x : 0;
> /* Faults from kernel mode fetches of user pages */
> - u8 smepf = 0;
> + u16 smepf = 0;
> /* Faults from kernel mode accesses of user pages */
> - u8 smapf = 0;
> + u16 smapf = 0;
>
> if (!ept) {
> /* Faults from kernel mode accesses to user pages */
> - u8 kf = (pfec & PFERR_USER_MASK) ? 0 : u;
> + u16 kf = (pfec & PFERR_USER_MASK) ? 0 : u;
>
> /* Not really needed: !nx will cause pte.nx to fault */
> if (!efer_nx)
> @@ -5517,7 +5528,7 @@ static void update_permission_bitmask(struct kvm_mmu *mmu, bool ept)
> smapf = (pfec & (PFERR_RSVD_MASK|PFERR_FETCH_MASK)) ? 0 : kf;
> }
>
> - mmu->permissions[byte] = ff | uf | wf | smepf | smapf;
> + mmu->permissions[byte] = ff | uf | wf | rf | smepf | smapf;
> }
> }
>
> diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
> index f35a830ce469..44545f6f860a 100644
> --- a/arch/x86/kvm/mmu/mmutrace.h
> +++ b/arch/x86/kvm/mmu/mmutrace.h
> @@ -25,7 +25,8 @@
> #define KVM_MMU_PAGE_PRINTK() ({ \
> const char *saved_ptr = trace_seq_buffer_ptr(p); \
> static const char *access_str[] = { \
> - "---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux" \
> + "----", "r---", "-w--", "rw--", "--u-", "r-u-", "-wu-", "rwu-", \
> + "---x", "r--x", "-w-x", "rw-x", "--ux", "r-ux", "-wux", "rwux" \
> }; \
> union kvm_mmu_page_role role; \
> \
> diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
> index ed762bb4b007..bbdbf4ae2d65 100644
> --- a/arch/x86/kvm/mmu/paging_tmpl.h
> +++ b/arch/x86/kvm/mmu/paging_tmpl.h
> @@ -170,25 +170,24 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,
> return true;
> }
>
> -/*
> - * For PTTYPE_EPT, a page table can be executable but not readable
> - * on supported processors. Therefore, set_spte does not automatically
> - * set bit 0 if execute only is supported. Here, we repurpose ACC_USER_MASK
> - * to signify readability since it isn't used in the EPT case
> - */
> static inline unsigned FNAME(gpte_access)(u64 gpte)
> {
> unsigned access;
> #if PTTYPE == PTTYPE_EPT
> access = ((gpte & VMX_EPT_WRITABLE_MASK) ? ACC_WRITE_MASK : 0) |
> ((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) |
> - ((gpte & VMX_EPT_READABLE_MASK) ? ACC_USER_MASK : 0);
> + ((gpte & VMX_EPT_READABLE_MASK) ? ACC_READ_MASK : 0);
> #else
> - BUILD_BUG_ON(ACC_EXEC_MASK != PT_PRESENT_MASK);
> - BUILD_BUG_ON(ACC_EXEC_MASK != 1);
> + /*
> + * P is set here, so the page is always readable and W/U/!NX represent
> + * allowed accesses.
> + */
> + BUILD_BUG_ON(ACC_READ_MASK != PT_PRESENT_MASK);
> + BUILD_BUG_ON(ACC_WRITE_MASK != PT_WRITABLE_MASK);
> + BUILD_BUG_ON(ACC_USER_MASK != PT_USER_MASK);
> + BUILD_BUG_ON(ACC_EXEC_MASK & (PT_WRITABLE_MASK | PT_USER_MASK | PT_PRESENT_MASK));
> access = gpte & (PT_WRITABLE_MASK | PT_USER_MASK | PT_PRESENT_MASK);
> - /* Combine NX with P (which is set here) to get ACC_EXEC_MASK. */
> - access ^= (gpte >> PT64_NX_SHIFT);
> + access |= gpte & PT64_NX_MASK ? 0 : ACC_EXEC_MASK;
> #endif
>
> return access;
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index e2acd9ed9dba..0b09124b0d54 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -194,12 +194,6 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> int is_host_mmio = -1;
> bool wrprot = false;
>
> - /*
> - * For the EPT case, shadow_present_mask has no RWX bits set if
> - * exec-only page table entries are supported. In that case,
> - * ACC_USER_MASK and shadow_user_mask are used to represent
> - * read access. See FNAME(gpte_access) in paging_tmpl.h.
> - */
> WARN_ON_ONCE((pte_access | shadow_present_mask) == SHADOW_NONPRESENT_VALUE);
>
> if (sp->role.ad_disabled)
> @@ -228,6 +222,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> pte_access &= ~ACC_EXEC_MASK;
> }
>
> + if (pte_access & ACC_READ_MASK)
> + spte |= PT_PRESENT_MASK; /* or VMX_EPT_READABLE_MASK */
> +
> if (pte_access & ACC_EXEC_MASK)
> spte |= shadow_x_mask;
> else
> @@ -390,6 +387,7 @@ u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
> u64 spte = SPTE_MMU_PRESENT_MASK;
>
> spte |= __pa(child_pt) | shadow_present_mask | PT_WRITABLE_MASK |
> + PT_PRESENT_MASK /* or VMX_EPT_READABLE_MASK */ |
> shadow_user_mask | shadow_x_mask | shadow_me_value;
>
> if (ad_disabled)
> @@ -490,18 +488,16 @@ void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask)
> }
> EXPORT_SYMBOL_GPL(kvm_mmu_set_me_spte_mask);
>
> -void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only)
> +void kvm_mmu_set_ept_masks(bool has_ad_bits)
> {
> kvm_ad_enabled = has_ad_bits;
>
> - shadow_user_mask = VMX_EPT_READABLE_MASK;
> + shadow_user_mask = 0;
> shadow_accessed_mask = VMX_EPT_ACCESS_BIT;
> shadow_dirty_mask = VMX_EPT_DIRTY_BIT;
> shadow_nx_mask = 0ull;
> shadow_x_mask = VMX_EPT_EXECUTABLE_MASK;
> - /* VMX_EPT_SUPPRESS_VE_BIT is needed for W or X violation. */
> - shadow_present_mask =
> - (has_exec_only ? 0ull : VMX_EPT_READABLE_MASK) | VMX_EPT_SUPPRESS_VE_BIT;
> + shadow_present_mask = VMX_EPT_SUPPRESS_VE_BIT;
>
> shadow_acc_track_mask = VMX_EPT_RWX_MASK;
> shadow_host_writable_mask = EPT_SPTE_HOST_WRITABLE;
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index 3d77755b6b10..0c305f2f4ba0 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -52,10 +52,11 @@ static_assert(SPTE_TDP_AD_ENABLED == 0);
> #define SPTE_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
> #endif
>
> -#define ACC_EXEC_MASK 1
> +#define ACC_READ_MASK PT_PRESENT_MASK
> #define ACC_WRITE_MASK PT_WRITABLE_MASK
> #define ACC_USER_MASK PT_USER_MASK
> -#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK)
> +#define ACC_EXEC_MASK 8
> +#define ACC_ALL (ACC_EXEC_MASK | ACC_WRITE_MASK | ACC_USER_MASK | ACC_READ_MASK)
>
> #define SPTE_LEVEL_BITS 9
> #define SPTE_LEVEL_SHIFT(level) __PT_LEVEL_SHIFT(level, SPTE_LEVEL_BITS)
> diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
> index 5316c27f6099..3bda6a621d8a 100644
> --- a/arch/x86/kvm/vmx/capabilities.h
> +++ b/arch/x86/kvm/vmx/capabilities.h
> @@ -288,11 +288,6 @@ static inline bool cpu_has_vmx_flexpriority(void)
> cpu_has_vmx_virtualize_apic_accesses();
> }
>
> -static inline bool cpu_has_vmx_ept_execute_only(void)
> -{
> - return vmx_capability.ept & VMX_EPT_EXECUTE_ONLY_BIT;
> -}
> -
> static inline bool cpu_has_vmx_ept_4levels(void)
> {
> return vmx_capability.ept & VMX_EPT_PAGE_WALK_4_BIT;
> diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
> index adf925500b9e..1afbf272efae 100644
> --- a/arch/x86/kvm/vmx/common.h
> +++ b/arch/x86/kvm/vmx/common.h
> @@ -85,11 +85,8 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
> {
> u64 error_code;
>
> - /* Is it a read fault? */
> - error_code = (exit_qualification & EPT_VIOLATION_ACC_READ)
> - ? PFERR_USER_MASK : 0;
> /* Is it a write fault? */
> - error_code |= (exit_qualification & EPT_VIOLATION_ACC_WRITE)
> + error_code = (exit_qualification & EPT_VIOLATION_ACC_WRITE)
> ? PFERR_WRITE_MASK : 0;
> /* Is it a fetch fault? */
> error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 2e687761aeaf..98801c408b8c 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8425,8 +8425,7 @@ __init int vmx_hardware_setup(void)
> set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
>
> if (enable_ept)
> - kvm_mmu_set_ept_masks(enable_ept_ad_bits,
> - cpu_has_vmx_ept_execute_only());
> + kvm_mmu_set_ept_masks(enable_ept_ad_bits);
> else
> vt_x86_ops.get_mt_mask = NULL;
>
> --
> 2.52.0
>
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 12/22] KVM: VMX: enable use of MBEC
2026-03-21 0:09 ` [PATCH 12/22] KVM: VMX: enable use of MBEC Paolo Bonzini
@ 2026-03-23 14:49 ` Jon Kohler
0 siblings, 0 replies; 56+ messages in thread
From: Jon Kohler @ 2026-03-23 14:49 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
> On Mar 20, 2026, at 8:09 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> Add SECONDARY_EXEC_MODE_BASED_EPT_EXEC as optional secondary execution
> control bit. If enabled, configure XS and XU separately (even if they
> are always used together).
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/include/asm/vmx.h | 3 +++
> arch/x86/kvm/mmu.h | 7 ++++++-
> arch/x86/kvm/mmu/spte.c | 4 ++--
> arch/x86/kvm/mmu/spte.h | 5 +++--
> arch/x86/kvm/vmx/capabilities.h | 6 ++++++
> arch/x86/kvm/vmx/common.h | 17 ++++++++++++-----
> arch/x86/kvm/vmx/main.c | 11 ++++++++++-
> arch/x86/kvm/vmx/vmx.c | 16 +++++++++++++++-
> arch/x86/kvm/vmx/vmx.h | 1 +
> arch/x86/kvm/vmx/x86_ops.h | 1 +
> 10 files changed, 59 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 0041f8a77447..5fef7a531cb7 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -597,9 +597,12 @@ enum vm_entry_failure_code {
> #define EPT_VIOLATION_GVA_TRANSLATED BIT(8)
>
> #define EPT_VIOLATION_RWX_TO_PROT(__epte) (((__epte) & VMX_EPT_RWX_MASK) << 3)
> +#define EPT_VIOLATION_USER_EXEC_TO_PROT(__epte) (((__epte) & VMX_EPT_USER_EXECUTABLE_MASK) >> 4)
>
> static_assert(EPT_VIOLATION_RWX_TO_PROT(VMX_EPT_RWX_MASK) ==
> (EPT_VIOLATION_PROT_READ | EPT_VIOLATION_PROT_WRITE | EPT_VIOLATION_PROT_EXEC));
> +static_assert(EPT_VIOLATION_USER_EXEC_TO_PROT(VMX_EPT_USER_EXECUTABLE_MASK) ==
> + (EPT_VIOLATION_PROT_USER_EXEC));
>
> /*
> * Exit Qualifications for NOTIFY VM EXIT
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index f5d35f66750b..2a6caac39d40 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -76,12 +76,17 @@ static inline gfn_t kvm_mmu_max_gfn(void)
> return (1ULL << (max_gpa_bits - PAGE_SHIFT)) - 1;
> }
>
> +static inline bool mmu_has_mbec(struct kvm_mmu *mmu)
> +{
> + return mmu->root_role.cr4_smep;
> +}
> +
> u8 kvm_mmu_get_max_tdp_level(void);
>
> void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
> void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value);
> void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
> -void kvm_mmu_set_ept_masks(bool has_ad_bits);
> +void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_mbec);
>
> void kvm_init_mmu(struct kvm_vcpu *vcpu);
> void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
> diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> index 0b3e2b97afbf..f51e74e7202d 100644
> --- a/arch/x86/kvm/mmu/spte.c
> +++ b/arch/x86/kvm/mmu/spte.c
> @@ -499,7 +499,7 @@ void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask)
> }
> EXPORT_SYMBOL_GPL(kvm_mmu_set_me_spte_mask);
Same issue on this patch as on patch 8.
What kernel version were you doing this against?
git am is giving me grief: it fails to apply because this line should be
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_set_me_spte_mask);
This has been there since 6.18: https://github.com/torvalds/linux/commit/6b36119b94d0b2bb8cea9d512017efafd461d6ac
>
> -void kvm_mmu_set_ept_masks(bool has_ad_bits)
> +void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_mbec)
> {
> kvm_ad_enabled = has_ad_bits;
>
> @@ -508,7 +508,7 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits)
> shadow_dirty_mask = VMX_EPT_DIRTY_BIT;
> shadow_nx_mask = 0ull;
> shadow_xs_mask = VMX_EPT_EXECUTABLE_MASK;
> - shadow_xu_mask = VMX_EPT_EXECUTABLE_MASK;
> + shadow_xu_mask = has_mbec ? VMX_EPT_USER_EXECUTABLE_MASK : VMX_EPT_EXECUTABLE_MASK;
> shadow_present_mask = VMX_EPT_SUPPRESS_VE_BIT;
>
> shadow_acc_track_mask = VMX_EPT_RWX_MASK;
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index 7323ff19056b..61414f8deaa2 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -24,7 +24,7 @@
> * - bits 55 (EPT only): MMU-writable
> * - bits 56-59: unused
> * - bits 60-61: type of A/D tracking
> - * - bits 62: unused
> + * - bits 62 (EPT only): saved XU bit for disabled AD
> */
>
> /*
> @@ -72,7 +72,8 @@ static_assert(SPTE_TDP_AD_ENABLED == 0);
> * must not overlap the A/D type mask.
> */
> #define SHADOW_ACC_TRACK_SAVED_BITS_MASK (VMX_EPT_READABLE_MASK | \
> - VMX_EPT_EXECUTABLE_MASK)
> + VMX_EPT_EXECUTABLE_MASK | \
> + VMX_EPT_USER_EXECUTABLE_MASK)
> #define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT 52
> #define SHADOW_ACC_TRACK_SAVED_MASK (SHADOW_ACC_TRACK_SAVED_BITS_MASK << \
> SHADOW_ACC_TRACK_SAVED_BITS_SHIFT)
> diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
> index 3bda6a621d8a..02037e559410 100644
> --- a/arch/x86/kvm/vmx/capabilities.h
> +++ b/arch/x86/kvm/vmx/capabilities.h
> @@ -393,4 +393,10 @@ static inline bool cpu_has_notify_vmexit(void)
> SECONDARY_EXEC_NOTIFY_VM_EXITING;
> }
>
> +static inline bool cpu_has_ept_mbec(void)
> +{
> + return vmcs_config.cpu_based_2nd_exec_ctrl &
> + SECONDARY_EXEC_MODE_BASED_EPT_EXEC;
> +}
> +
> #endif /* __KVM_X86_VMX_CAPS_H */
> diff --git a/arch/x86/kvm/vmx/common.h b/arch/x86/kvm/vmx/common.h
> index 1afbf272efae..eff0b51bfda5 100644
> --- a/arch/x86/kvm/vmx/common.h
> +++ b/arch/x86/kvm/vmx/common.h
> @@ -74,6 +74,8 @@ static __always_inline bool is_td_vcpu(struct kvm_vcpu *vcpu) { return false; }
>
> #endif
>
> +extern int vt_get_cpl(struct kvm_vcpu *vcpu);
> +
> static inline bool vt_is_tdx_private_gpa(struct kvm *kvm, gpa_t gpa)
> {
> /* For TDX the direct mask is the shared mask. */
> @@ -91,15 +93,20 @@ static inline int __vmx_handle_ept_violation(struct kvm_vcpu *vcpu, gpa_t gpa,
> /* Is it a fetch fault? */
> error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
> ? PFERR_FETCH_MASK : 0;
> - /*
> - * ept page table entry is present?
> - * note: unconditionally clear USER_EXEC until mode-based
> - * execute control is implemented
> - */
> + /* ept page table entry is present? */
> error_code |= (exit_qualification &
> (EPT_VIOLATION_PROT_MASK & ~EPT_VIOLATION_PROT_USER_EXEC))
> ? PFERR_PRESENT_MASK : 0;
>
> + if (mmu_has_mbec(vcpu->arch.mmu)) {
> + error_code |= vt_get_cpl(vcpu) > 0 ? PFERR_USER_MASK : 0;
> + error_code |= (exit_qualification & EPT_VIOLATION_PROT_USER_EXEC)
> + ? PFERR_PRESENT_MASK : 0;
> + }
> +
checkpatch.pl complaint:
ERROR: code indent should use tabs where possible
#158: FILE: arch/x86/kvm/vmx/common.h:107:
+^I ? PFERR_PRESENT_MASK : 0;$
> if (exit_qualification & EPT_VIOLATION_GVA_IS_VALID)
> error_code |= (exit_qualification & EPT_VIOLATION_GVA_TRANSLATED) ?
> PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index dbab1c15b0cd..601d1b7437a8 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -354,7 +354,7 @@ static void vt_set_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var,
> vmx_set_segment(vcpu, var, seg);
> }
>
> -static int vt_get_cpl(struct kvm_vcpu *vcpu)
> +int vt_get_cpl(struct kvm_vcpu *vcpu)
> {
> if (is_td_vcpu(vcpu))
> return 0;
> @@ -750,6 +750,14 @@ static int vt_set_identity_map_addr(struct kvm *kvm, u64 ident_addr)
> return vmx_set_identity_map_addr(kvm, ident_addr);
> }
>
> +static bool vt_tdp_has_smep(struct kvm *kvm)
> +{
> + if (is_td(kvm))
> + return false;
> +
> + return vmx_tdp_has_smep(kvm);
> +}
> +
> static u64 vt_get_l2_tsc_offset(struct kvm_vcpu *vcpu)
> {
> /* TDX doesn't support L2 guest at the moment. */
> @@ -952,6 +960,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
> .set_tss_addr = vt_op(set_tss_addr),
> .set_identity_map_addr = vt_op(set_identity_map_addr),
> .get_mt_mask = vmx_get_mt_mask,
> + .tdp_has_smep = vt_op(tdp_has_smep),
>
> .get_exit_info = vt_op(get_exit_info),
> .get_entry_info = vt_op(get_entry_info),
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 98801c408b8c..350d26f792c4 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -112,6 +112,9 @@ module_param(emulate_invalid_guest_state, bool, 0444);
> static bool __read_mostly fasteoi = 1;
> module_param(fasteoi, bool, 0444);
>
> +static bool __read_mostly enable_mbec = 1;
> +module_param_named(mbec, enable_mbec, bool, 0444);
> +
> module_param(enable_apicv, bool, 0444);
> module_param(enable_ipiv, bool, 0444);
>
> @@ -2625,6 +2628,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> return -EIO;
>
> vmx_cap->ept = 0;
> + _cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC;
> _cpu_based_2nd_exec_control &= ~SECONDARY_EXEC_EPT_VIOLATION_VE;
> }
> if (!(_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_VPID) &&
> @@ -4520,6 +4524,9 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
> */
> exec_control &= ~SECONDARY_EXEC_ENABLE_VMFUNC;
>
> + if (!enable_mbec)
> + exec_control &= ~SECONDARY_EXEC_MODE_BASED_EPT_EXEC;
> +
> /* SECONDARY_EXEC_DESC is enabled/disabled on writes to CR4.UMIP,
> * in vmx_set_cr4. */
> exec_control &= ~SECONDARY_EXEC_DESC;
> @@ -7580,6 +7587,11 @@ u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT);
> }
>
> +bool vmx_tdp_has_smep(struct kvm *kvm)
> +{
> + return enable_mbec;
> +}
> +
> static void vmcs_set_secondary_exec_control(struct vcpu_vmx *vmx, u32 new_ctl)
> {
> /*
> @@ -8406,6 +8418,8 @@ __init int vmx_hardware_setup(void)
> ple_window_shrink = 0;
> }
>
> + if (!cpu_has_ept_mbec())
> + enable_mbec = 0;
> if (!cpu_has_vmx_apicv())
> enable_apicv = 0;
> if (!enable_apicv)
> @@ -8425,7 +8439,7 @@ __init int vmx_hardware_setup(void)
> set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
>
> if (enable_ept)
> - kvm_mmu_set_ept_masks(enable_ept_ad_bits);
> + kvm_mmu_set_ept_masks(enable_ept_ad_bits, enable_mbec);
> else
> vt_x86_ops.get_mt_mask = NULL;
>
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index d3389baf3ab3..743fa33b349e 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -576,6 +576,7 @@ static inline u8 vmx_get_rvi(void)
> SECONDARY_EXEC_ENABLE_VMFUNC | \
> SECONDARY_EXEC_BUS_LOCK_DETECTION | \
> SECONDARY_EXEC_NOTIFY_VM_EXITING | \
> + SECONDARY_EXEC_MODE_BASED_EPT_EXEC | \
> SECONDARY_EXEC_ENCLS_EXITING | \
> SECONDARY_EXEC_EPT_VIOLATION_VE)
>
> diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
> index 2b3424f638db..1fb1128b1eb7 100644
> --- a/arch/x86/kvm/vmx/x86_ops.h
> +++ b/arch/x86/kvm/vmx/x86_ops.h
> @@ -104,6 +104,7 @@ void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
> int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr);
> int vmx_set_identity_map_addr(struct kvm *kvm, u64 ident_addr);
> u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio);
> +bool vmx_tdp_has_smep(struct kvm *kvm);
>
> void vmx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason,
> u64 *info1, u64 *info2, u32 *intr_info, u32 *error_code);
> --
> 2.52.0
>
* Re: [PATCH 03/22] KVM: x86/mmu: adjust MMIO generation bit allocation and allowed mask
2026-03-21 0:09 ` [PATCH 03/22] KVM: x86/mmu: adjust MMIO generation bit allocation and allowed mask Paolo Bonzini
@ 2026-03-24 3:48 ` Huang, Kai
2026-03-24 9:11 ` Paolo Bonzini
0 siblings, 1 reply; 56+ messages in thread
From: Huang, Kai @ 2026-03-24 3:48 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com,
linux-kernel@vger.kernel.org
Cc: amit.shah@amd.com, Kohler, Jon, seanjc@google.com,
mtosatti@redhat.com, nikunj@amd.com
> /*
> - * Due to limited space in PTEs, the MMIO generation is a 19 bit subset of
> + * Due to limited space in PTEs, the MMIO generation is an 18 bit subset of
> * the memslots generation and is derived as follows:
Is "a -> an" unintentional change?
> *
> - * Bits 0-7 of the MMIO generation are propagated to spte bits 3-10
> - * Bits 8-18 of the MMIO generation are propagated to spte bits 52-62
> + * Bits 0-6 of the MMIO generation are propagated to spte bits 3-9
> + * Bits 7-17 of the MMIO generation are propagated to spte bits 52-62
> *
> * The KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS flag is intentionally not included in
> * the MMIO generation number, as doing so would require stealing a bit from
> @@ -111,7 +111,7 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
> */
>
> #define MMIO_SPTE_GEN_LOW_START 3
> -#define MMIO_SPTE_GEN_LOW_END 10
> +#define MMIO_SPTE_GEN_LOW_END 9
>
> #define MMIO_SPTE_GEN_HIGH_START 52
> #define MMIO_SPTE_GEN_HIGH_END 62
> @@ -133,7 +133,8 @@ static_assert(!(SPTE_MMU_PRESENT_MASK &
> * and so they're off-limits for generation; additional checks ensure the mask
> * doesn't overlap legal PA bits), and bit 63 (carved out for future usage).
> */
> -#define SPTE_MMIO_ALLOWED_MASK (BIT_ULL(63) | GENMASK_ULL(51, 12) | GENMASK_ULL(2, 0))
> +#define SPTE_MMIO_ALLOWED_MASK (BIT_ULL(63) | GENMASK_ULL(51, 12) | \
> + BIT_ULL(10) | GENMASK_ULL(2, 0))
> static_assert(!(SPTE_MMIO_ALLOWED_MASK &
> (SPTE_MMU_PRESENT_MASK | MMIO_SPTE_GEN_LOW_MASK | MMIO_SPTE_GEN_HIGH_MASK)));
>
> @@ -141,7 +142,7 @@ static_assert(!(SPTE_MMIO_ALLOWED_MASK &
> #define MMIO_SPTE_GEN_HIGH_BITS (MMIO_SPTE_GEN_HIGH_END - MMIO_SPTE_GEN_HIGH_START + 1)
>
> /* remember to adjust the comment above as well if you change these */
> -static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 11);
> +static_assert(MMIO_SPTE_GEN_LOW_BITS == 7 && MMIO_SPTE_GEN_HIGH_BITS == 11);
>
> #define MMIO_SPTE_GEN_LOW_SHIFT (MMIO_SPTE_GEN_LOW_START - 0)
> #define MMIO_SPTE_GEN_HIGH_SHIFT (MMIO_SPTE_GEN_HIGH_START - MMIO_SPTE_GEN_LOW_BITS)
Besides the changes to MMIO_GEN, the FROZEN_SPTE seems to have bit 10 set:
#define FROZEN_SPTE (SHADOW_NONPRESENT_VALUE | 0x5a0ULL)
When MBEC is enabled, IIUC such SPTE will be treated as present by hardware
if CPU supports execution-only SPTE.
Also, when MBEC is enabled, per SDM if CPU doesn't support execution-only,
an SPTE with bit 0 clear but with bit 10 set will trigger EPT
misconfiguration, rather than EPT violation.
So seems we should exclude bit 10 from FROZEN_SPTE.
* Re: [PATCH 03/22] KVM: x86/mmu: adjust MMIO generation bit allocation and allowed mask
2026-03-24 3:48 ` Huang, Kai
@ 2026-03-24 9:11 ` Paolo Bonzini
0 siblings, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-24 9:11 UTC (permalink / raw)
To: Huang, Kai, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: amit.shah@amd.com, Kohler, Jon, seanjc@google.com,
mtosatti@redhat.com, nikunj@amd.com
On 3/24/26 04:48, Huang, Kai wrote:
>
>> /*
>> - * Due to limited space in PTEs, the MMIO generation is a 19 bit subset of
>> + * Due to limited space in PTEs, the MMIO generation is an 18 bit subset of
>> * the memslots generation and is derived as follows:
>
> Is "a -> an" unintentional change?
No, "a nineteen-bit" -> "an eighteen-bit". :)
> Besides the changes to MMIO_GEN, the FROZEN_SPTE seems to have bit 10 set:
>
> #define FROZEN_SPTE (SHADOW_NONPRESENT_VALUE | 0x5a0ULL)
>
> When MBEC is enabled, IIUC such SPTE will be treated as present by hardware
> if CPU supports execution-only SPTE.
>
> Also, when MBEC is enabled, per SDM if CPU doesn't support execution-only,
> an SPTE with bit 0 clear but with bit 10 set will trigger EPT
> misconfiguration, rather than EPT violation.
>
> So seems we should exclude bit 10 from FROZEN_SPTE.
True, good catch (so 0x5a0 should become 0x1a0).
Paolo
* Re: [PATCH 10/22] KVM: x86/mmu: split XS/XU bits for MBEC
2026-03-21 0:09 ` [PATCH 10/22] KVM: x86/mmu: split XS/XU bits for MBEC Paolo Bonzini
@ 2026-03-24 10:45 ` Huang, Kai
2026-03-24 11:24 ` Paolo Bonzini
0 siblings, 1 reply; 56+ messages in thread
From: Huang, Kai @ 2026-03-24 10:45 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com,
linux-kernel@vger.kernel.org
Cc: amit.shah@amd.com, Kohler, Jon, seanjc@google.com,
mtosatti@redhat.com, nikunj@amd.com
On Sat, 2026-03-21 at 01:09 +0100, Paolo Bonzini wrote:
> When EPT is in use, replace ACC_USER_MASK with ACC_USER_EXEC_MASK,
> so that supervisor and user-mode execution can be controlled
> independently (ACC_USER_MASK would not allow a setting similar to
> XU=0 XS=1 W=1 R=1).
>
> Replace shadow_x_mask with shadow_xs_mask/shadow_xu_mask, to allow
> setting XS and XU bits separately in EPT entries.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/include/asm/vmx.h | 1 +
> arch/x86/kvm/mmu/mmu.c | 15 ++++++++---
> arch/x86/kvm/mmu/mmutrace.h | 6 ++---
> arch/x86/kvm/mmu/paging_tmpl.h | 4 +++
> arch/x86/kvm/mmu/spte.c | 47 ++++++++++++++++++++++------------
> arch/x86/kvm/mmu/spte.h | 8 +++---
> 6 files changed, 55 insertions(+), 26 deletions(-)
>
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 4a0804cc7c82..0041f8a77447 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -538,6 +538,7 @@ enum vmcs_field {
> #define VMX_EPT_IPAT_BIT (1ull << 6)
> #define VMX_EPT_ACCESS_BIT (1ull << 8)
> #define VMX_EPT_DIRTY_BIT (1ull << 9)
> +#define VMX_EPT_USER_EXECUTABLE_MASK (1ull << 10)
> #define VMX_EPT_SUPPRESS_VE_BIT (1ull << 63)
> #define VMX_EPT_RWX_MASK (VMX_EPT_READABLE_MASK | \
> VMX_EPT_WRITABLE_MASK | \
Should we include VMX_EPT_USER_EXECUTABLE_MASK in VMX_EPT_RWX_MASK?
[...]
> @@ -496,7 +507,8 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits)
> shadow_accessed_mask = VMX_EPT_ACCESS_BIT;
> shadow_dirty_mask = VMX_EPT_DIRTY_BIT;
> shadow_nx_mask = 0ull;
> - shadow_x_mask = VMX_EPT_EXECUTABLE_MASK;
> + shadow_xs_mask = VMX_EPT_EXECUTABLE_MASK;
> + shadow_xu_mask = VMX_EPT_EXECUTABLE_MASK;
Shouldn't 'shadow_xu_mask' be VMX_EPT_USER_EXECUTABLE_MASK?
Btw, with MBEC it's a bit weird to me that we continue to just use
110 (R=0,W=1,X=1) to trigger EPT misconfig for MMIO caching:
/*
* EPT Misconfigurations are generated if the value of bits 2:0
* of an EPT paging-structure entry is 110b (write/execute).
*/
kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE,
VMX_EPT_RWX_MASK | VMX_EPT_SUPPRESS_VE_BIT,
0);
Per SDM, R=0 and W=1 is always guaranteed to trigger EPT misconfig (see
30.3.3.1 EPT Misconfigurations). Maybe we can just use that for MMIO
caching?
We can then remove both X and XU bit from mmio_mask too.
* Re: [PATCH 10/22] KVM: x86/mmu: split XS/XU bits for MBEC
2026-03-24 10:45 ` Huang, Kai
@ 2026-03-24 11:24 ` Paolo Bonzini
2026-03-25 4:28 ` Huang, Kai
0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-24 11:24 UTC (permalink / raw)
To: Huang, Kai, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: amit.shah@amd.com, Kohler, Jon, seanjc@google.com,
mtosatti@redhat.com, nikunj@amd.com
On 3/24/26 11:45, Huang, Kai wrote:
> On Sat, 2026-03-21 at 01:09 +0100, Paolo Bonzini wrote:
>> When EPT is in use, replace ACC_USER_MASK with ACC_USER_EXEC_MASK,
>> so that supervisor and user-mode execution can be controlled
>> independently (ACC_USER_MASK would not allow a setting similar to
>> XU=0 XS=1 W=1 R=1).
>>
>> Replace shadow_x_mask with shadow_xs_mask/shadow_xu_mask, to allow
>> setting XS and XU bits separately in EPT entries.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>> arch/x86/include/asm/vmx.h | 1 +
>> arch/x86/kvm/mmu/mmu.c | 15 ++++++++---
>> arch/x86/kvm/mmu/mmutrace.h | 6 ++---
>> arch/x86/kvm/mmu/paging_tmpl.h | 4 +++
>> arch/x86/kvm/mmu/spte.c | 47 ++++++++++++++++++++++------------
>> arch/x86/kvm/mmu/spte.h | 8 +++---
>> 6 files changed, 55 insertions(+), 26 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
>> index 4a0804cc7c82..0041f8a77447 100644
>> --- a/arch/x86/include/asm/vmx.h
>> +++ b/arch/x86/include/asm/vmx.h
>> @@ -538,6 +538,7 @@ enum vmcs_field {
>> #define VMX_EPT_IPAT_BIT (1ull << 6)
>> #define VMX_EPT_ACCESS_BIT (1ull << 8)
>> #define VMX_EPT_DIRTY_BIT (1ull << 9)
>> +#define VMX_EPT_USER_EXECUTABLE_MASK (1ull << 10)
>> #define VMX_EPT_SUPPRESS_VE_BIT (1ull << 63)
>> #define VMX_EPT_RWX_MASK (VMX_EPT_READABLE_MASK | \
>> VMX_EPT_WRITABLE_MASK | \
>
> Should we include VMX_EPT_USER_EXECUTABLE_MASK in VMX_EPT_RWX_MASK?
No, because it is used in many places to refer to bits 0-2, for example:
#define EPT_VIOLATION_RWX_TO_PROT(__epte)
(((__epte) & VMX_EPT_RWX_MASK) << 3)
Bit 10 is handled separately because it's not contiguous and has a
different mapping to the exit qualification (to bit 6 instead of bit 13).
(However, there is a bug later in the series where shadow_acc_track_mask
needs to have VMX_EPT_USER_EXECUTABLE_MASK in it).
>
> [...]
>
>> @@ -496,7 +507,8 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits)
>> shadow_accessed_mask = VMX_EPT_ACCESS_BIT;
>> shadow_dirty_mask = VMX_EPT_DIRTY_BIT;
>> shadow_nx_mask = 0ull;
>> - shadow_x_mask = VMX_EPT_EXECUTABLE_MASK;
>> + shadow_xs_mask = VMX_EPT_EXECUTABLE_MASK;
>> + shadow_xu_mask = VMX_EPT_EXECUTABLE_MASK;
>
> Shouldn't 'shadow_xu_mask' be VMX_EPT_USER_EXECUTABLE_MASK?
Not yet, because shadow_xu_mask is used to set executable permissions as
well. I suppose you could make it 0 when MBEC is disabled instead of
VMX_EPT_EXECUTABLE_MASK, but it can only be VMX_EPT_USER_EXECUTABLE_MASK
when MBEC is enabled.
>
>
> Btw, with MBEC it's a bit weird to me that we continue to just use
> 110 (R=0,W=1,X=1) to trigger EPT misconfig for MMIO caching:
>
> /*
> * EPT Misconfigurations are generated if the value of bits 2:0
> * of an EPT paging-structure entry is 110b (write/execute).
> */
> kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE,
> VMX_EPT_RWX_MASK | VMX_EPT_SUPPRESS_VE_BIT,
> 0);
>
> Per SDM, R=0 and W=1 is always guaranteed to trigger EPT misconfig (see
> 30.3.3.1 EPT Misconfigurations). Maybe we can just use that for MMIO
> caching?
>
> We can then remove both X and XU bit from mmio_mask too.
Maybe, but is it worth it? (Based on this we could keep bit 10 in
MMIO_SPTE_GEN_LOW_END, after all, because W=1 R=0 would give a
misconfiguration independent of the value of XU; but again I'm not sure
it's worth it).
Paolo
* Re: [PATCH 22/22] KVM: nSVM: enable GMET for guests
2026-03-21 0:09 ` [PATCH 22/22] KVM: nSVM: enable GMET for guests Paolo Bonzini
@ 2026-03-24 19:57 ` Jon Kohler
2026-03-25 5:22 ` Nikunj A. Dadhania
2026-03-25 12:55 ` Paolo Bonzini
0 siblings, 2 replies; 56+ messages in thread
From: Jon Kohler @ 2026-03-24 19:57 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
> On Mar 20, 2026, at 8:09 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> All that needs to be done is moving the GMET bit from vmcs12 to
> vmcs02.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/svm/nested.c | 3 +++
> arch/x86/kvm/svm/svm.c | 3 +++
> 2 files changed, 6 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index d69bcf52f948..397e9afecb78 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -774,6 +774,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> vmcb02->control.bus_lock_counter = 0;
>
> vmcb02->control.nested_ctl &= ~SVM_NESTED_CTL_GMET_ENABLE;
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_GMET))
> + vmcb02->control.nested_ctl |=
> + (svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_GMET_ENABLE);
>
> /* Done at vmrun: asid. */
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d3b69eb3242b..4a0d97e70dc2 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5294,6 +5294,9 @@ static __init void svm_set_cpu_caps(void)
> if (boot_cpu_has(X86_FEATURE_PFTHRESHOLD))
> kvm_cpu_cap_set(X86_FEATURE_PFTHRESHOLD);
>
> + if (boot_cpu_has(X86_FEATURE_GMET))
> + kvm_cpu_cap_set(X86_FEATURE_GMET);
> +
> if (vgif)
> kvm_cpu_cap_set(X86_FEATURE_VGIF);
>
> --
> 2.52.0
>
When I enable gmet on the guest and try to boot with memory integrity
enabled in a Windows 11 25H2 guest, the machine does not boot, but rather
gets stuck in an NPF loop and does not make any progress.
I added a snippet of the tracing statements I see when enabling tracing
as well as a snippet of the VM config I’m using, nothing too fancy.
Scratching my head a bit, dropping this here in case you’ve already seen
this issue.
18:43:27.211372
Analyze events for pid(s) 32072, all VCPUs:
VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
npf 35947 99.98% 100.00% 0.31us 463.93us 6.34us ( +- 4.17% )
interrupt 5 0.01% 0.00% 0.92us 3.77us 1.91us ( +- 31.48% )
npf 1 0.00% 0.00% 0.63us 0.63us 0.63us ( +- 0.00% )
Total Samples:35953, Total events handled time:227798.77us.
CPU 0/KVM-32296 [112] ..... 3170.471586: kvm_nested_vmexit: vcpu 0 reason npf rip 0xfffff814ff394160 info1 0x0000000200000007 info2 0x000001000225af80 intr_info 0x00000000 error_code 0x00000000 requests 0x0000000000000000
CPU 0/KVM-32296 [112] ..... 3170.471586: kvm_page_fault: vcpu 0 rip 0xfffff814ff394160 address 0x000001000225af80 error_code 0x200000007
CPU 0/KVM-32296 [112] d.... 3170.471586: kvm_entry: vcpu 0, rip 0xfffff814ff394160 intr_info 0x00000000 error_code 0x00000000
CPU 0/KVM-32296 [112] d.... 3170.471586: kvm_exit: vcpu 0 reason npf rip 0xfffff814ff394160 info1 0x0000000200000007 info2 0x000001000225af80 intr_info 0x00000000 error_code 0x00000000 requests 0x0000000000000000
CPU 0/KVM-32296 [112] ..... 3170.471586: kvm_nested_vmexit: vcpu 0 reason npf rip 0xfffff814ff394160 info1 0x0000000200000007 info2 0x000001000225af80 intr_info 0x00000000 error_code 0x00000000 requests 0x0000000000000000
CPU 0/KVM-32296 [112] ..... 3170.471586: kvm_page_fault: vcpu 0 rip 0xfffff814ff394160 address 0x000001000225af80 error_code 0x200000007
CPU 0/KVM-32296 [112] d.... 3170.471587: kvm_entry: vcpu 0, rip 0xfffff814ff394160 intr_info 0x00000000 error_code 0x00000000
CPU 0/KVM-32296 [112] d.... 3170.471587: kvm_exit: vcpu 0 reason npf rip 0xfffff814ff394160 info1 0x0000000200000007 info2 0x000001000225af80 intr_info 0x00000000 error_code 0x00000000 requests 0x0000000000000000
CPU 0/KVM-32296 [112] ..... 3170.471587: kvm_nested_vmexit: vcpu 0 reason npf rip 0xfffff814ff394160 info1 0x0000000200000007 info2 0x000001000225af80 intr_info 0x00000000 error_code 0x00000000 requests 0x0000000000000000
CPU 0/KVM-32296 [112] ..... 3170.471587: kvm_page_fault: vcpu 0 rip 0xfffff814ff394160 address 0x000001000225af80 error_code 0x200000007
CPU 0/KVM-32296 [112] d.... 3170.471587: kvm_entry: vcpu 0, rip 0xfffff814ff394160 intr_info 0x00000000 error_code 0x00000000
CPU 0/KVM-32296 [112] d.... 3170.471588: kvm_exit: vcpu 0 reason npf rip 0xfffff814ff394160 info1 0x0000000200000007 info2 0x000001000225af80 intr_info 0x00000000 error_code 0x00000000 requests 0x0000000000000000
CPU 0/KVM-32296 [112] ..... 3170.471588: kvm_nested_vmexit: vcpu 0 reason npf rip 0xfffff814ff394160 info1 0x0000000200000007 info2 0x000001000225af80 intr_info 0x00000000 error_code 0x00000000 requests 0x0000000000000000
CPU 0/KVM-32296 [112] ..... 3170.471588: kvm_page_fault: vcpu 0 rip 0xfffff814ff394160 address 0x000001000225af80 error_code 0x200000007
CPU 0/KVM-32296 [112] d.... 3170.471588: kvm_entry: vcpu 0, rip 0xfffff814ff394160 intr_info 0x00000000 error_code 0x00000000
CPU 0/KVM-32296 [112] d.... 3170.471589: kvm_exit: vcpu 0 reason npf rip 0xfffff814ff394160 info1 0x0000000200000007 info2 0x000001000225af80 intr_info 0x00000000 error_code 0x00000000 requests 0x0000000000000000
-machine pc-q35-rhel9.6.0,usb=off,smm=on,kernel_irqchip=split,dump-guest-core=off,mem-merge=off,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-storage,acpi=on -accel kvm -cpu EPYC-Genoa-v2,enforce,invtsc=on,svm=on,svme-addr-chk=on,gmet=on,pku=off,hv-time=on,tsc-frequency=2400000000,kvm-pv-eoi=on,hv-relaxed=on,hv-vapic=on,hv-spinlocks=0x2000,hv-vpindex=on,hv-runtime=on,hv-synic=on,hv-stimer=on,hv-tlbflush=on,hv-ipi=on,hv-avic=on,l3-cache=on -smp 4,maxcpus=240,sockets=60,dies=1,clusters=1,cores=4,threads=1
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 10/22] KVM: x86/mmu: split XS/XU bits for MBEC
2026-03-24 11:24 ` Paolo Bonzini
@ 2026-03-25 4:28 ` Huang, Kai
0 siblings, 0 replies; 56+ messages in thread
From: Huang, Kai @ 2026-03-25 4:28 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com,
linux-kernel@vger.kernel.org
Cc: amit.shah@amd.com, Kohler, Jon, seanjc@google.com,
mtosatti@redhat.com, nikunj@amd.com
On Tue, 2026-03-24 at 12:24 +0100, Paolo Bonzini wrote:
> On 3/24/26 11:45, Huang, Kai wrote:
> > On Sat, 2026-03-21 at 01:09 +0100, Paolo Bonzini wrote:
> > > When EPT is in use, replace ACC_USER_MASK with ACC_USER_EXEC_MASK,
> > > so that supervisor and user-mode execution can be controlled
> > > independently (ACC_USER_MASK would not allow a setting similar to
> > > XU=0 XS=1 W=1 R=1).
> > >
> > > Replace shadow_x_mask with shadow_xs_mask/shadow_xu_mask, to allow
> > > setting XS and XU bits separately in EPT entries.
> > >
> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > ---
> > > arch/x86/include/asm/vmx.h | 1 +
> > > arch/x86/kvm/mmu/mmu.c | 15 ++++++++---
> > > arch/x86/kvm/mmu/mmutrace.h | 6 ++---
> > > arch/x86/kvm/mmu/paging_tmpl.h | 4 +++
> > > arch/x86/kvm/mmu/spte.c | 47 ++++++++++++++++++++++------------
> > > arch/x86/kvm/mmu/spte.h | 8 +++---
> > > 6 files changed, 55 insertions(+), 26 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> > > index 4a0804cc7c82..0041f8a77447 100644
> > > --- a/arch/x86/include/asm/vmx.h
> > > +++ b/arch/x86/include/asm/vmx.h
> > > @@ -538,6 +538,7 @@ enum vmcs_field {
> > > #define VMX_EPT_IPAT_BIT (1ull << 6)
> > > #define VMX_EPT_ACCESS_BIT (1ull << 8)
> > > #define VMX_EPT_DIRTY_BIT (1ull << 9)
> > > +#define VMX_EPT_USER_EXECUTABLE_MASK (1ull << 10)
> > > #define VMX_EPT_SUPPRESS_VE_BIT (1ull << 63)
> > > #define VMX_EPT_RWX_MASK (VMX_EPT_READABLE_MASK | \
> > > VMX_EPT_WRITABLE_MASK | \
> >
> > Should we include VMX_EPT_USER_EXECUTABLE_MASK to VMX_EPT_RWX_MASK?
>
> No, because it is used for many cases to refer to bits 0-2, for example:
>
> #define EPT_VIOLATION_RWX_TO_PROT(__epte)
> (((__epte) & VMX_EPT_RWX_MASK) << 3)
>
> Bit 10 is handled separately because it's not contiguous and has a
> different mapping to the exit qualification (to bit 6 instead of bit 13).
OK. It's a bit unfortunate but we can always explicitly get
EPT_VIOLATION_PROT_USER_EXEC from the VMX_EPT_USER_EXECUTABLE_MASK.
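The non-contiguous mapping Paolo describes can be sketched as follows. This is a minimal model, not the actual KVM code: the helper name `epte_to_prot` is made up, while the constants follow the patch. Bits 0-2 of the EPT entry shift up by 3 into the exit qualification's permission bits 3-5, but the XU bit (10) maps to exit-qualification bit 6 and so needs separate handling.

```c
#include <assert.h>
#include <stdint.h>

#define VMX_EPT_RWX_MASK              0x7ull        /* R/W/X, bits 0-2 */
#define VMX_EPT_USER_EXECUTABLE_MASK  (1ull << 10)  /* XU, bit 10      */
#define EPT_VIOLATION_PROT_USER_EXEC  (1ull << 6)   /* exit-qual bit 6 */

/* Hypothetical helper, for illustration only. */
static inline uint64_t epte_to_prot(uint64_t epte)
{
	/* bits 0-2 shift contiguously into bits 3-5... */
	uint64_t prot = (epte & VMX_EPT_RWX_MASK) << 3;

	/* ...but bit 10 has to be moved to bit 6 by hand */
	if (epte & VMX_EPT_USER_EXECUTABLE_MASK)
		prot |= EPT_VIOLATION_PROT_USER_EXEC;
	return prot;
}
```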
>
> (However, there is a bug later in the series where shadow_acc_track_mask
> needs to have VMX_EPT_USER_EXECUTABLE_MASK in it).
Right, we need to track the XU bit too.
>
> >
> > [...]
> >
> > > @@ -496,7 +507,8 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits)
> > > shadow_accessed_mask = VMX_EPT_ACCESS_BIT;
> > > shadow_dirty_mask = VMX_EPT_DIRTY_BIT;
> > > shadow_nx_mask = 0ull;
> > > - shadow_x_mask = VMX_EPT_EXECUTABLE_MASK;
> > > + shadow_xs_mask = VMX_EPT_EXECUTABLE_MASK;
> > > + shadow_xu_mask = VMX_EPT_EXECUTABLE_MASK;
> >
> > Shouldn't 'shadow_xu_mask' be VMX_EPT_USER_EXECUTABLE_MASK?
>
> Not yet, because shadow_xu_mask is used to set executable permissions as
> well. I suppose you could make it 0 when MBEC is disabled instead of
> VMX_EPT_EXECUTABLE_MASK, but it can only be VMX_EPT_USER_EXECUTABLE_MASK
> when MBEC is enabled.
I see. It's changed to the right value in a later patch which actually
turns on MBEC.
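The end state of the exchange above can be summarized in a quick sketch (the helper `set_ept_exec_masks` is invented for illustration; the real series spreads this across two patches): without MBEC there is only a single X bit, so both masks alias bit 2, and only once MBEC is actually enabled does shadow_xu_mask move to bit 10.

```c
#include <assert.h>
#include <stdint.h>

#define VMX_EPT_EXECUTABLE_MASK       (1ull << 2)
#define VMX_EPT_USER_EXECUTABLE_MASK  (1ull << 10)

static uint64_t shadow_xs_mask, shadow_xu_mask;

/* Illustrative helper, not KVM's actual kvm_mmu_set_ept_masks(). */
static void set_ept_exec_masks(int has_mbec)
{
	shadow_xs_mask = VMX_EPT_EXECUTABLE_MASK;
	/*
	 * Without MBEC, XU aliases the single X bit so that granting
	 * user-exec still sets executable permission; with MBEC it
	 * controls bit 10 independently.
	 */
	shadow_xu_mask = has_mbec ? VMX_EPT_USER_EXECUTABLE_MASK
				  : VMX_EPT_EXECUTABLE_MASK;
}
```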
>
> >
> >
> > Btw, with MBEC it's a bit weird to me that we continue to just use
> > 110 (R=0,W=1,X=1) to trigger EPT misconfig for MMIO caching:
> >
> > /*
> > * EPT Misconfigurations are generated if the value of bits 2:0
> > * of an EPT paging-structure entry is 110b (write/execute).
> > */
> > kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE,
> > VMX_EPT_RWX_MASK | VMX_EPT_SUPPRESS_VE_BIT,
> > 0);
> >
> > Per SDM, R=0 and W=1 is always guaranteed to trigger EPT misconfig (see
> > 30.3.3.1 EPT Misconfigurations). Maybe we can just use that for MMIO
> > caching?
> >
> > We can then remove both X and XU bit from mmio_mask too.
>
> Maybe but is it worth it? (Based on this we could keep bit 10 in
> MMIO_SPTE_GEN_LOW_END, after all, because W=1 R=0 would give a
> misconfiguration independent of the value of XU; but again I'm not sure
> it's worth it).
It looks promising to me since we get slightly clearer code (IMHO)
and one more bit for the MMIO generation. But no strong opinion :-)
* Re: [PATCH 01/22] KVM: TDX/VMX: rework EPT_VIOLATION_EXEC_FOR_RING3_LIN into PROT_MASK
2026-03-21 0:09 ` [PATCH 01/22] KVM: TDX/VMX: rework EPT_VIOLATION_EXEC_FOR_RING3_LIN into PROT_MASK Paolo Bonzini
2026-03-23 14:49 ` Jon Kohler
@ 2026-03-25 4:29 ` Huang, Kai
1 sibling, 0 replies; 56+ messages in thread
From: Huang, Kai @ 2026-03-25 4:29 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com,
linux-kernel@vger.kernel.org
Cc: amit.shah@amd.com, Kohler, Jon, seanjc@google.com,
mtosatti@redhat.com, nikunj@amd.com
On Sat, 2026-03-21 at 01:09 +0100, Paolo Bonzini wrote:
> From: Jon Kohler <jon@nutanix.com>
>
> EPT exit qualification bit 6 is used when mode-based execute control
> is enabled, and reflects user executable addresses. Rework name to
> reflect the intention and add to EPT_VIOLATION_PROT_MASK, which allows
> simplifying the return evaluation in
> tdx_is_sept_violation_unexpected_pending a pinch.
>
> Rework handling in __vmx_handle_ept_violation to unconditionally clear
> EPT_VIOLATION_PROT_USER_EXEC until MBEC is implemented, as suggested by
> Sean [1].
>
> Note: Intel SDM Table 29-7 defines bit 6 as:
> If the “mode-based execute control” VM-execution control is 0, the
> value of this bit is undefined. If that control is 1, this bit is the
> logical-AND of bit 10 in the EPT paging-structure entries used to
> translate the guest-physical address of the access causing the EPT
> violation. In this case, it indicates whether the guest-physical
> address was executable for user-mode linear addresses.
>
> [1] https://lore.kernel.org/all/aCJDzU1p_SFNRIJd@google.com/
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Jon Kohler <jon@nutanix.com>
> Message-ID: <20251223054806.1611168-2-jon@nutanix.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>
Acked-by: Kai Huang <kai.huang@intel.com>
* Re: [PATCH 02/22] KVM: x86/mmu: remove SPTE_PERM_MASK
2026-03-21 0:09 ` [PATCH 02/22] KVM: x86/mmu: remove SPTE_PERM_MASK Paolo Bonzini
@ 2026-03-25 4:29 ` Huang, Kai
0 siblings, 0 replies; 56+ messages in thread
From: Huang, Kai @ 2026-03-25 4:29 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com,
linux-kernel@vger.kernel.org
Cc: amit.shah@amd.com, Kohler, Jon, seanjc@google.com,
mtosatti@redhat.com, nikunj@amd.com
On Sat, 2026-03-21 at 01:09 +0100, Paolo Bonzini wrote:
> From: Jon Kohler <jon@nutanix.com>
>
> SPTE_PERM_MASK is no longer referenced by anything in the kernel.
>
> Signed-off-by: Jon Kohler <jon@nutanix.com>
> Message-ID: <20251223054806.1611168-3-jon@nutanix.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>
Reviewed-by: Kai Huang <kai.huang@intel.com>
* Re: [PATCH 04/22] KVM: x86/mmu: shuffle high bits of SPTEs in preparation for MBEC
2026-03-21 0:09 ` [PATCH 04/22] KVM: x86/mmu: shuffle high bits of SPTEs in preparation for MBEC Paolo Bonzini
@ 2026-03-25 4:35 ` Huang, Kai
0 siblings, 0 replies; 56+ messages in thread
From: Huang, Kai @ 2026-03-25 4:35 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com,
linux-kernel@vger.kernel.org
Cc: amit.shah@amd.com, Kohler, Jon, seanjc@google.com,
mtosatti@redhat.com, nikunj@amd.com
On Sat, 2026-03-21 at 01:09 +0100, Paolo Bonzini wrote:
> Access tracking will need to save bit 10 when MBEC is enabled.
> Right now it is simply shifting the R and X bits into bits 54 and 56,
> but bit 10 would not fit with the same scheme. Reorganize the
> high bits so that access tracking will use bits 52, 54 and 62.
So that we can continue to do a simple:
spte |= (spte & SHADOW_ACC_TRACK_SAVED_BITS_MASK) <<
SHADOW_ACC_TRACK_SAVED_BITS_SHIFT;
?
> As a side effect, the free bits are compacted slightly, with
> 56-59 still unused.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/mmu/spte.h | 20 +++++++++++++++-----
> 1 file changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index b60666778f61..7223a61b1260 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -17,10 +17,20 @@
> */
> #define SPTE_MMU_PRESENT_MASK BIT_ULL(11)
>
> +/*
> + * The ignored high bits are allocated as follows:
> + * - bits 52, 54: saved X-R bits for access tracking when EPT does not have A/D
> + * - bits 53 (EPT only): host writable
> + * - bits 55 (EPT only): MMU-writable
> + * - bits 56-59: unused
> + * - bits 60-61: type of A/D tracking
> + * - bits 62: unused
> + */
> +
> /*
> * TDP SPTES (more specifically, EPT SPTEs) may not have A/D bits, and may also
> * be restricted to using write-protection (for L2 when CPU dirty logging, i.e.
> - * PML, is enabled). Use bits 52 and 53 to hold the type of A/D tracking that
> + * PML, is enabled). Use bits 60 and 61 to hold the type of A/D tracking that
> * is must be employed for a given TDP SPTE.
> *
> * Note, the "enabled" mask must be '0', as bits 62:52 are _reserved_ for PAE
> @@ -29,7 +39,7 @@
> * TDP with CPU dirty logging (PML). If NPT ever gains PML-like support, it
> * must be restricted to 64-bit KVM.
> */
> -#define SPTE_TDP_AD_SHIFT 52
> +#define SPTE_TDP_AD_SHIFT 60
> #define SPTE_TDP_AD_MASK (3ULL << SPTE_TDP_AD_SHIFT)
> #define SPTE_TDP_AD_ENABLED (0ULL << SPTE_TDP_AD_SHIFT)
> #define SPTE_TDP_AD_DISABLED (1ULL << SPTE_TDP_AD_SHIFT)
> @@ -65,7 +75,7 @@ static_assert(SPTE_TDP_AD_ENABLED == 0);
> */
> #define SHADOW_ACC_TRACK_SAVED_BITS_MASK (SPTE_EPT_READABLE_MASK | \
> SPTE_EPT_EXECUTABLE_MASK)
> -#define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT 54
> +#define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT 52
> #define SHADOW_ACC_TRACK_SAVED_MASK (SHADOW_ACC_TRACK_SAVED_BITS_MASK << \
> SHADOW_ACC_TRACK_SAVED_BITS_SHIFT)
> static_assert(!(SPTE_TDP_AD_MASK & SHADOW_ACC_TRACK_SAVED_MASK));
> @@ -84,8 +94,8 @@ static_assert(!(SPTE_TDP_AD_MASK & SHADOW_ACC_TRACK_SAVED_MASK));
> * to not overlap the A/D type mask or the saved access bits of access-tracked
> * SPTEs when A/D bits are disabled.
> */
> -#define EPT_SPTE_HOST_WRITABLE BIT_ULL(57)
> -#define EPT_SPTE_MMU_WRITABLE BIT_ULL(58)
> +#define EPT_SPTE_HOST_WRITABLE BIT_ULL(53)
> +#define EPT_SPTE_MMU_WRITABLE BIT_ULL(55)
It's a bit dangerous to put the HOST_WRITABLE bit between the R and X bits,
but we don't keep the W bit of the to-be-tracked SPTE and we are going to do
a simple shift to preserve the R, X and XU bits anyway, so it should be fine.
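The layout being discussed can be double-checked with a small sketch (the `save_acc_track_bits` helper is mine; note the XU bit only joins the saved mask in a later patch of the series): with a shift of 52, the R (0), X (2) and XU (10) bits land on bits 52, 54 and 62, stepping over bit 53 (host writable) and bit 55 (MMU-writable) in between.

```c
#include <assert.h>
#include <stdint.h>

#define SPTE_EPT_READABLE_MASK        (1ull << 0)
#define SPTE_EPT_EXECUTABLE_MASK      (1ull << 2)
#define VMX_EPT_USER_EXECUTABLE_MASK  (1ull << 10)

#define SHADOW_ACC_TRACK_SAVED_BITS_MASK  (SPTE_EPT_READABLE_MASK |   \
					   SPTE_EPT_EXECUTABLE_MASK | \
					   VMX_EPT_USER_EXECUTABLE_MASK)
#define SHADOW_ACC_TRACK_SAVED_BITS_SHIFT 52

/* Illustrative helper: copy the saved bits into their high-bit slots. */
static inline uint64_t save_acc_track_bits(uint64_t spte)
{
	return spte | ((spte & SHADOW_ACC_TRACK_SAVED_BITS_MASK) <<
		       SHADOW_ACC_TRACK_SAVED_BITS_SHIFT);
}
```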
Acked-by: Kai Huang <kai.huang@intel.com>
* Re: [PATCH 05/22] KVM: x86/mmu: remove SPTE_EPT_*
2026-03-21 0:09 ` [PATCH 05/22] KVM: x86/mmu: remove SPTE_EPT_* Paolo Bonzini
@ 2026-03-25 4:36 ` Huang, Kai
0 siblings, 0 replies; 56+ messages in thread
From: Huang, Kai @ 2026-03-25 4:36 UTC (permalink / raw)
To: kvm@vger.kernel.org, pbonzini@redhat.com,
linux-kernel@vger.kernel.org
Cc: amit.shah@amd.com, Kohler, Jon, seanjc@google.com,
mtosatti@redhat.com, nikunj@amd.com
On Sat, 2026-03-21 at 01:09 +0100, Paolo Bonzini wrote:
> spte.h is already including vmx.h, use the constants it defines.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
* Re: [PATCH 22/22] KVM: nSVM: enable GMET for guests
2026-03-24 19:57 ` Jon Kohler
@ 2026-03-25 5:22 ` Nikunj A. Dadhania
2026-03-25 12:55 ` Paolo Bonzini
1 sibling, 0 replies; 56+ messages in thread
From: Nikunj A. Dadhania @ 2026-03-25 5:22 UTC (permalink / raw)
To: Jon Kohler, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Marcelo Tosatti, Amit Shah, Sean Christopherson
On 3/25/2026 1:27 AM, Jon Kohler wrote:
>
>
>> On Mar 20, 2026, at 8:09 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> All that needs to be done is moving the GMET bit from vmcs12 to
>> vmcs02.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>> arch/x86/kvm/svm/nested.c | 3 +++
>> arch/x86/kvm/svm/svm.c | 3 +++
>> 2 files changed, 6 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
>> index d69bcf52f948..397e9afecb78 100644
>> --- a/arch/x86/kvm/svm/nested.c
>> +++ b/arch/x86/kvm/svm/nested.c
>> @@ -774,6 +774,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>> vmcb02->control.bus_lock_counter = 0;
>>
>> vmcb02->control.nested_ctl &= ~SVM_NESTED_CTL_GMET_ENABLE;
>> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_GMET))
>> + vmcb02->control.nested_ctl |=
>> + (svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_GMET_ENABLE);
>>
>> /* Done at vmrun: asid. */
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index d3b69eb3242b..4a0d97e70dc2 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -5294,6 +5294,9 @@ static __init void svm_set_cpu_caps(void)
>> if (boot_cpu_has(X86_FEATURE_PFTHRESHOLD))
>> kvm_cpu_cap_set(X86_FEATURE_PFTHRESHOLD);
>>
>> + if (boot_cpu_has(X86_FEATURE_GMET))
>> + kvm_cpu_cap_set(X86_FEATURE_GMET);
>> +
>> if (vgif)
>> kvm_cpu_cap_set(X86_FEATURE_VGIF);
>>
>> --
>> 2.52.0
>>
>
> When I enable gmet on the guest, and try to boot with memory integrity
> enabled in Windows 11 25H2 guest, the machine does not boot, but rather
> gets stuck in an NPF loop and does not make any progress.
Same here as well, trying to debug
kvm_nested_vmenter: rip: 0xfffff8148793c2ea vmcb: 0x00000001087fd000 nested_rip: 0xfffff81487993d90 int_ctl: 0x00000000 event_inj: 0x00000000 nested_npt=n guest_cr3: 0x00000001087ce000
kvm_nested_vmexit: vcpu 0 reason npf rip 0xfffff81487993d90 info1 0x0000000200000006 info2 0x00000001087cef80 intr_info 0x00000000 error_code 0x00000000 requests 0x0000000000000000
This seems to be the first entry into the nested guest, and the nested page fault is on the guest_cr3 page
kvm_mmu_set_spte: gfn 108600 spte 39ac00ee3 (rwx-) level 2 at 3012b1218
A 2M page is provisioned in the NPT
kvm_nested_vmexit: vcpu 0 reason npf rip 0xfffff81487993d90 info1 0x0000000200000007 info2 0x00000001087cef80 intr_info 0x00000000 error_code 0x00000000 requests 0x0000000000000000
But the fault keeps hitting.
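The two EXITINFO1 values in the trace decode differently, which is the telltale. A tiny sketch (flag names are mine; bit 33 is, per my reading of the AMD APM, set when the NPF happened while walking the guest's page tables, matching the fault on the guest_cr3 page): the first exit has P=0 (SPTE not present), while the repeating one has P=1, i.e. the 2M mapping is installed but the access is still denied — a permission loop, not a missing mapping.

```c
#include <assert.h>
#include <stdint.h>

/* Low bits are the usual page-fault error code. */
#define NPF_P         (1ull << 0)   /* fault on a present entry   */
#define NPF_W         (1ull << 1)   /* write access               */
#define NPF_U         (1ull << 2)   /* user-mode access           */
#define NPF_GPT_WALK  (1ull << 33)  /* during guest PT walk (per my
				       reading of the APM)        */

static inline int npf_present(uint64_t info1)
{
	return !!(info1 & NPF_P);
}
```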
* Re: [PATCH 20/22] KVM: SVM: enable GMET and set it in MMU role
2026-03-21 0:09 ` [PATCH 20/22] KVM: SVM: enable GMET and set it in MMU role Paolo Bonzini
@ 2026-03-25 9:25 ` Nikunj A. Dadhania
2026-03-25 9:29 ` Paolo Bonzini
0 siblings, 1 reply; 56+ messages in thread
From: Nikunj A. Dadhania @ 2026-03-25 9:25 UTC (permalink / raw)
To: Paolo Bonzini, linux-kernel, kvm
Cc: Jon Kohler, Marcelo Tosatti, Amit Shah, Sean Christopherson
On 3/21/2026 5:39 AM, Paolo Bonzini wrote:
> @@ -1184,6 +1187,10 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
> save->g_pat = vcpu->arch.pat;
> save->cr3 = 0;
> }
> +
> + if (gmet_enabled)
if (gmet_enabled && guest_cpu_cap_has(vcpu, X86_FEATURE_GMET))
??
> + control->nested_ctl |= SVM_NESTED_CTL_GMET_ENABLE;
> +
* Re: [PATCH 20/22] KVM: SVM: enable GMET and set it in MMU role
2026-03-25 9:25 ` Nikunj A. Dadhania
@ 2026-03-25 9:29 ` Paolo Bonzini
2026-03-25 9:39 ` Nikunj A. Dadhania
0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-25 9:29 UTC (permalink / raw)
To: Nikunj A. Dadhania
Cc: linux-kernel, kvm, Jon Kohler, Marcelo Tosatti, Amit Shah,
Sean Christopherson
On Wed, Mar 25, 2026 at 10:26 AM Nikunj A. Dadhania <nikunj@amd.com> wrote:
>
>
>
> On 3/21/2026 5:39 AM, Paolo Bonzini wrote:
> > @@ -1184,6 +1187,10 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
> > save->g_pat = vcpu->arch.pat;
> > save->cr3 = 0;
> > }
> > +
> > + if (gmet_enabled)
>
> if (gmet_enabled && guest_cpu_cap_has(vcpu, X86_FEATURE_GMET))
No, this is the non-nested case (vmcb01) and I'm enabling GMET on
purpose for easier testing.
guest_cpu_cap_has(GMET) only matters for nested guests and is added by patch 22:
vmcb02->control.nested_ctl &= ~SVM_NESTED_CTL_GMET_ENABLE;
if (guest_cpu_cap_has(vcpu, X86_FEATURE_GMET))
vmcb02->control.nested_ctl |=
(svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_GMET_ENABLE);
Paolo
* Re: [PATCH 20/22] KVM: SVM: enable GMET and set it in MMU role
2026-03-25 9:29 ` Paolo Bonzini
@ 2026-03-25 9:39 ` Nikunj A. Dadhania
2026-03-25 10:08 ` Paolo Bonzini
0 siblings, 1 reply; 56+ messages in thread
From: Nikunj A. Dadhania @ 2026-03-25 9:39 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel, kvm, Jon Kohler, Marcelo Tosatti, Amit Shah,
Sean Christopherson
On 3/25/2026 2:59 PM, Paolo Bonzini wrote:
> On Wed, Mar 25, 2026 at 10:26 AM Nikunj A. Dadhania <nikunj@amd.com> wrote:
>>
>>
>>
>> On 3/21/2026 5:39 AM, Paolo Bonzini wrote:
>>> @@ -1184,6 +1187,10 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>>> save->g_pat = vcpu->arch.pat;
>>> save->cr3 = 0;
>>> }
>>> +
>>> + if (gmet_enabled)
>>
>> if (gmet_enabled && guest_cpu_cap_has(vcpu, X86_FEATURE_GMET))
>
> No, this is the non-nested case (vmcb01) and I'm enabling GMET on
> purpose for easier testing.
Win11 guest with memory isolation stops booting after this patch. Boots if I pass gmet=0.
>
> guest_cpu_cap_has(GMET) only matters for nested guests and is added by patch 22:
>
> vmcb02->control.nested_ctl &= ~SVM_NESTED_CTL_GMET_ENABLE;
> if (guest_cpu_cap_has(vcpu, X86_FEATURE_GMET))
> vmcb02->control.nested_ctl |=
> (svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_GMET_ENABLE);
Right, I noticed that.
Regards
Nikunj
* Re: [PATCH 20/22] KVM: SVM: enable GMET and set it in MMU role
2026-03-25 9:39 ` Nikunj A. Dadhania
@ 2026-03-25 10:08 ` Paolo Bonzini
0 siblings, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-25 10:08 UTC (permalink / raw)
To: Nikunj A. Dadhania
Cc: linux-kernel, kvm, Jon Kohler, Marcelo Tosatti, Amit Shah,
Sean Christopherson
On Wed, Mar 25, 2026 at 10:40 AM Nikunj A. Dadhania <nikunj@amd.com> wrote:
>
>
>
> On 3/25/2026 2:59 PM, Paolo Bonzini wrote:
> > On Wed, Mar 25, 2026 at 10:26 AM Nikunj A. Dadhania <nikunj@amd.com> wrote:
> >>
> >>
> >>
> >> On 3/21/2026 5:39 AM, Paolo Bonzini wrote:
> >>> @@ -1184,6 +1187,10 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
> >>> save->g_pat = vcpu->arch.pat;
> >>> save->cr3 = 0;
> >>> }
> >>> +
> >>> + if (gmet_enabled)
> >>
> >> if (gmet_enabled && guest_cpu_cap_has(vcpu, X86_FEATURE_GMET))
> >
> > No, this is the non-nested case (vmcb01) and I'm enabling GMET on
> > purpose for easier testing.
>
> Win11 guest with memory isolation stops booting after this patch. Boots if I pass gmet=0.
That would also disable GMET in the Hyper-V nested guest. I'm looking at it.
Paolo
* Re: [PATCH 22/22] KVM: nSVM: enable GMET for guests
2026-03-24 19:57 ` Jon Kohler
2026-03-25 5:22 ` Nikunj A. Dadhania
@ 2026-03-25 12:55 ` Paolo Bonzini
1 sibling, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2026-03-25 12:55 UTC (permalink / raw)
To: Jon Kohler
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
Marcelo Tosatti, Nikunj A Dadhania, Amit Shah,
Sean Christopherson
On Tue, Mar 24, 2026 at 8:57 PM Jon Kohler <jon@nutanix.com> wrote:
> On Mar 20, 2026, at 8:09 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > vmcb02->control.nested_ctl &= ~SVM_NESTED_CTL_GMET_ENABLE;
> > if (guest_cpu_cap_has(vcpu, X86_FEATURE_GMET))
> > vmcb02->control.nested_ctl |=
> > (svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_GMET_ENABLE);
The issue is with nNPT disabled; these four lines of code have to be
moved inside the "if (nested_npt_enabled(svm))".
(The giveaway is the kvmmmu:fast_page_fault event in the trace, which
never appears with shadow paging).
I have fixed the issues Kai reported and the EPT page tests, and will post
the next version after doing some more testing.
Paolo
end of thread, other threads: [~2026-03-25 12:56 UTC | newest]
Thread overview: 56+ messages
2026-03-21 0:09 [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini
2026-03-21 0:09 ` [PATCH 01/22] KVM: TDX/VMX: rework EPT_VIOLATION_EXEC_FOR_RING3_LIN into PROT_MASK Paolo Bonzini
2026-03-23 14:49 ` Jon Kohler
2026-03-25 4:29 ` Huang, Kai
2026-03-21 0:09 ` [PATCH 02/22] KVM: x86/mmu: remove SPTE_PERM_MASK Paolo Bonzini
2026-03-25 4:29 ` Huang, Kai
2026-03-21 0:09 ` [PATCH 03/22] KVM: x86/mmu: adjust MMIO generation bit allocation and allowed mask Paolo Bonzini
2026-03-24 3:48 ` Huang, Kai
2026-03-24 9:11 ` Paolo Bonzini
2026-03-21 0:09 ` [PATCH 04/22] KVM: x86/mmu: shuffle high bits of SPTEs in preparation for MBEC Paolo Bonzini
2026-03-25 4:35 ` Huang, Kai
2026-03-21 0:09 ` [PATCH 05/22] KVM: x86/mmu: remove SPTE_EPT_* Paolo Bonzini
2026-03-25 4:36 ` Huang, Kai
2026-03-21 0:09 ` [PATCH 06/22] KVM: x86/mmu: merge make_spte_{non,}executable Paolo Bonzini
2026-03-23 14:49 ` Jon Kohler
2026-03-21 0:09 ` [PATCH 07/22] KVM: x86/mmu: rename and clarify BYTE_MASK Paolo Bonzini
2026-03-21 0:09 ` [PATCH 08/22] KVM: x86/mmu: introduce ACC_READ_MASK Paolo Bonzini
2026-03-23 14:49 ` Jon Kohler
2026-03-23 14:49 ` Jon Kohler
2026-03-21 0:09 ` [PATCH 09/22] KVM: x86/mmu: separate more EPT/non-EPT permission_fault() Paolo Bonzini
2026-03-21 0:09 ` [PATCH 10/22] KVM: x86/mmu: split XS/XU bits for MBEC Paolo Bonzini
2026-03-24 10:45 ` Huang, Kai
2026-03-24 11:24 ` Paolo Bonzini
2026-03-25 4:28 ` Huang, Kai
2026-03-21 0:09 ` [PATCH 11/22] KVM: x86/mmu: move cr4_smep to base role Paolo Bonzini
2026-03-21 0:09 ` [PATCH 12/22] KVM: VMX: enable use of MBEC Paolo Bonzini
2026-03-23 14:49 ` Jon Kohler
2026-03-21 0:09 ` [PATCH 13/22] KVM: x86/mmu: add support for nested MBEC Paolo Bonzini
2026-03-23 14:49 ` Jon Kohler
2026-03-21 0:09 ` [PATCH 14/22] KVM: nVMX: advertise MBEC to nested guests Paolo Bonzini
2026-03-23 14:49 ` Jon Kohler
2026-03-21 0:09 ` [PATCH 15/22] KVM: nVMX: allow MBEC with EVMCS Paolo Bonzini
2026-03-21 0:09 ` [PATCH 16/22] KVM: x86/tdp_mmu: propagate access mask from kvm_mmu_page to PTE Paolo Bonzini
2026-03-21 0:09 ` [PATCH 17/22] KVM: x86/mmu: introduce cpu_role bit for availability of PFEC.I/D Paolo Bonzini
2026-03-21 0:09 ` [PATCH 18/22] KVM: SVM: add GMET bit definitions Paolo Bonzini
2026-03-21 11:58 ` Borislav Petkov
2026-03-21 13:51 ` Paolo Bonzini
2026-03-21 15:42 ` Borislav Petkov
2026-03-23 7:53 ` Paolo Bonzini
2026-03-23 12:17 ` Borislav Petkov
2026-03-23 12:22 ` Paolo Bonzini
2026-03-23 12:26 ` Borislav Petkov
2026-03-23 12:19 ` Borislav Petkov
2026-03-23 12:26 ` Borislav Petkov
2026-03-21 0:09 ` [PATCH 19/22] KVM: x86/mmu: add support for NPT GMET Paolo Bonzini
2026-03-21 0:09 ` [PATCH 20/22] KVM: SVM: enable GMET and set it in MMU role Paolo Bonzini
2026-03-25 9:25 ` Nikunj A. Dadhania
2026-03-25 9:29 ` Paolo Bonzini
2026-03-25 9:39 ` Nikunj A. Dadhania
2026-03-25 10:08 ` Paolo Bonzini
2026-03-21 0:09 ` [PATCH 21/22] KVM: SVM: work around errata 1218 Paolo Bonzini
2026-03-21 0:09 ` [PATCH 22/22] KVM: nSVM: enable GMET for guests Paolo Bonzini
2026-03-24 19:57 ` Jon Kohler
2026-03-25 5:22 ` Nikunj A. Dadhania
2026-03-25 12:55 ` Paolo Bonzini
2026-03-21 13:54 ` [RFC PATCH 00/22] KVM: combined patchset for MBEC/GMET support Paolo Bonzini