From: Paolo Bonzini <pbonzini@redhat.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: d.riley@proxmox.com, jon@nutanix.com
Subject: [PATCH v6 00/28] KVM: combined patchset for MBEC/GMET support
Date: Tue, 5 May 2026 21:51:58 +0200 [thread overview]
Message-ID: <20260505195226.563317-1-pbonzini@redhat.com> (raw)
This version can also be found in the "queue" branch of kvm.git.
Since it should be final I'm including again for reference the full
description.
Both MBEC and GMET allow more granular control over execute permissions,
with different levels of separation between supervisor and user mode.
MBEC provides support for separate supervisor and user-mode bits in the
PTEs; GMET instead lacks supervisor-mode only execution (with NX=0,
"both" is represented by U=0 and user-mode only by U=1). GMET was
clearly inspired by SMEP though with some differences and annoyances.
The implementation starts from two changes to core MMU code, both
of which help making the actual feature almost trivial to implement:
- first, I'm cleaning up the implementation of nVMX exec-only, by
properly adding read permissions to the ACC_* constant and to the
permission bitmask machinery. Jon also had to add a fourth ACC_*
bit, but used it only in the special case of nested MBEC; here
instead ACC_READ_MASK is the normality, which simplifies testing
a lot and removes gratuitous complexity.
- second, I'm enforcing that KVM runs with MBEC/GMET enabled even in
non-nested mode, if it wants to provide the feature to nested
hypervisors. This makes the creation of SPTEs looks exactly the
same for L1 and L2 guests, despite only the latter using MBEC/GMET
fully; the difference lies only in the input access permissions.
This strategy adds a limited amount of complexity to the core is limited,
while providing for an almost entirely seamless support of nested
hypervisors.
Later patches have to use slightly different meanings for ACC_* in Intel
and AMD. On the Intel side, some work is needed in order to split
shadow_x_mask and ACC_EXEC_MASK in two; now that there is an actual
ACC_READ_MASK to be used for exec-only pages, ACC_USER_MASK is unused
and can be reused as ACC_USER_EXEC_MASK. However, unlike the older
ACC_USER_MASK hack these differences are backed by concrete concepts
of the page table format, and there is always a 1:1 mapping from ACC_*
bits to PT_*_MASK or shadow_*_mask:
Intel AMD
-------------------- ------------------- -------------------
ACC_READ_MASK PT_PRESENT_MASK PT_PRESENT_MASK
ACC_WRITE_MASK PT_WRITABLE_MASK PT_WRITABLE_MASK
ACC_EXEC_MASK shadow_xs_mask shadow_nx_mask
ACC_USER_MASK --- shadow_user_mask
ACC_USER_EXEC_MASK shadow_xu_mask ---
On Intel, ACC_EXEC_MASK is used for kernel-mode execution and is tied to
shadow_xs_mask (when MBEC is disabled, ACC_USER_EXEC_MASK and the XU bit
are computed but ineffective). update_permission_bitmask() precomputes
all the necessary conditions. On the AMD side, the U bit maps to
ACC_USER_MASK but nNPT adjusts the permission bitmask to ignore it for
reads and writes when GMET is active. Despite the smaller scale of the
changes compared to MBEC, there are some changes to make to use GMET
for L1 guests, because the page tables have to be created with U=0.
This means that the root page has role.access != ACC_ALL and its
permissions have to be propagated down.
Note that with MBEC the user/supervisor distinction depends on the U
bit of the page tables rather than the CPL. Processors provide this
information to the hypervisor through the "advanced EPT violation
vmexit info" feature, which is a requirement for KVM to use MBEC,
and kvm-intel.ko passes it to the MMU in PFERR_USER_MASK (unlike
kvm-amd.ko which computes it from the CPL). This needs a small change
to pass the effective XWU permissions of the page tables down to
translate_nested_gpa().
The former "smep_andnot_wp" bit of cpu_role.base, now named "cr4_smep",
is repurposed for nested TDP to indicate that MBEC/GMET is on. The minor
pessimization for shadow page tables (toggling CR4.SMEP now always forces
building a separate version of the shadow page tables, even though that's
technically unnecessary if CR4.WP=1) is not really worth fretting about;
in practice, guests are not going to flip CR4.SMEP in a way that would
prevent efficient reuse of shadow page tables.
Paolo
v5->v6:
- rename make_spte_executable to change_spte_executable
- rename byte index in update_permission_bitmask to index
- use (u8) casts before "KVM: x86/mmu: introduce ACC_READ_MASK"
- make commit message for "KVM: x86/mmu: split XS/XU bits for EPT" more accurate
- add XU to shadow_acc_track_mask already in "KVM: x86/mmu: split XS/XU bits for EPT"
- fix compilation error
- use alternative code for __vmx_handle_ept_violation suggested by Sean
Jon Kohler (5):
KVM: TDX/VMX: rework EPT_VIOLATION_EXEC_FOR_RING3_LIN into PROT_MASK
KVM: x86/mmu: remove SPTE_PERM_MASK
KVM: x86/mmu: free up bit 10 of PTEs in preparation for MBEC
KVM: nVMX: advertise MBEC to nested guests
KVM: nVMX: allow MBEC with EVMCS
Paolo Bonzini (23):
KVM: x86/mmu: shuffle high bits of SPTEs in preparation for MBEC
KVM: x86/mmu: remove SPTE_EPT_*
KVM: x86/mmu: merge make_spte_{non,}executable
KVM: x86/mmu: rename and clarify BYTE_MASK
KVM: x86/mmu: separate more EPT/non-EPT permission_fault()
KVM: x86/mmu: introduce ACC_READ_MASK
KVM: x86/mmu: pass PFERR_GUEST_PAGE/FINAL_MASK to kvm_translate_gpa
KVM: x86/mmu: pass pte_access for final nGPA->GPA walk
KVM: x86: make translate_nested_gpa vendor-specific
KVM: x86/mmu: split XS/XU bits for EPT
KVM: x86/mmu: move cr4_smep to base role
KVM: VMX: enable use of MBEC
KVM: nVMX: pass advanced EPT violation vmexit info to guest
KVM: nVMX: pass PFERR_USER_MASK to MMU on EPT violations
KVM: x86/mmu: add support for MBEC to EPT page table walks
KVM: x86/mmu: propagate access mask from root pages down
KVM: x86/mmu: introduce cpu_role bit for availability of PFEC.I/D
KVM: SVM: add GMET bit definitions
KVM: x86/mmu: hard code more bits in kvm_init_shadow_npt_mmu
KVM: x86/mmu: add support for GMET to NPT page table walks
KVM: SVM: enable GMET and set it in MMU role
KVM: SVM: work around errata 1218
KVM: nSVM: enable GMET for guests
Documentation/virt/kvm/x86/mmu.rst | 10 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 48 +++++---
arch/x86/include/asm/svm.h | 1 +
arch/x86/include/asm/vmx.h | 14 ++-
arch/x86/kvm/hyperv.c | 4 +-
arch/x86/kvm/mmu.h | 30 +++--
arch/x86/kvm/mmu/mmu.c | 182 ++++++++++++++++++++---------
arch/x86/kvm/mmu/mmutrace.h | 19 +--
arch/x86/kvm/mmu/paging_tmpl.h | 73 ++++++++----
arch/x86/kvm/mmu/spte.c | 92 +++++++++------
arch/x86/kvm/mmu/spte.h | 70 ++++++-----
arch/x86/kvm/mmu/tdp_mmu.c | 6 +-
arch/x86/kvm/svm/nested.c | 38 +++++-
arch/x86/kvm/svm/svm.c | 31 +++++
arch/x86/kvm/svm/svm.h | 1 +
arch/x86/kvm/vmx/capabilities.h | 12 +-
arch/x86/kvm/vmx/common.h | 26 +++--
arch/x86/kvm/vmx/hyperv_evmcs.h | 1 +
arch/x86/kvm/vmx/main.c | 9 ++
arch/x86/kvm/vmx/nested.c | 46 +++++++-
arch/x86/kvm/vmx/tdx.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 27 ++++-
arch/x86/kvm/vmx/vmx.h | 1 +
arch/x86/kvm/vmx/x86_ops.h | 1 +
arch/x86/kvm/x86.c | 18 +--
27 files changed, 536 insertions(+), 228 deletions(-)
--
2.54.0
next reply other threads:[~2026-05-05 19:52 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-05 19:51 Paolo Bonzini [this message]
2026-05-05 19:51 ` [PATCH 01/28] KVM: TDX/VMX: rework EPT_VIOLATION_EXEC_FOR_RING3_LIN into PROT_MASK Paolo Bonzini
2026-05-05 19:52 ` [PATCH 02/28] KVM: x86/mmu: remove SPTE_PERM_MASK Paolo Bonzini
2026-05-05 19:52 ` [PATCH 03/28] KVM: x86/mmu: free up bit 10 of PTEs in preparation for MBEC Paolo Bonzini
2026-05-05 19:52 ` [PATCH 04/28] KVM: x86/mmu: shuffle high bits of SPTEs " Paolo Bonzini
2026-05-05 19:52 ` [PATCH 05/28] KVM: x86/mmu: remove SPTE_EPT_* Paolo Bonzini
2026-05-05 19:52 ` [PATCH 06/28] KVM: x86/mmu: merge make_spte_{non,}executable Paolo Bonzini
2026-05-05 19:52 ` [PATCH 07/28] KVM: x86/mmu: rename and clarify BYTE_MASK Paolo Bonzini
2026-05-05 19:52 ` [PATCH 08/28] KVM: x86/mmu: separate more EPT/non-EPT permission_fault() Paolo Bonzini
2026-05-05 19:52 ` [PATCH 09/28] KVM: x86/mmu: introduce ACC_READ_MASK Paolo Bonzini
2026-05-05 19:52 ` [PATCH 10/28] KVM: x86/mmu: pass PFERR_GUEST_PAGE/FINAL_MASK to kvm_translate_gpa Paolo Bonzini
2026-05-05 19:52 ` [PATCH 11/28] KVM: x86/mmu: pass pte_access for final nGPA->GPA walk Paolo Bonzini
2026-05-05 19:52 ` [PATCH 12/28] KVM: x86: make translate_nested_gpa vendor-specific Paolo Bonzini
2026-05-05 19:52 ` [PATCH 13/28] KVM: x86/mmu: split XS/XU bits for EPT Paolo Bonzini
2026-05-05 19:52 ` [PATCH 14/28] KVM: x86/mmu: move cr4_smep to base role Paolo Bonzini
2026-05-05 19:52 ` [PATCH 15/28] KVM: VMX: enable use of MBEC Paolo Bonzini
2026-05-05 19:52 ` [PATCH 16/28] KVM: nVMX: pass advanced EPT violation vmexit info to guest Paolo Bonzini
2026-05-05 19:52 ` [PATCH 17/28] KVM: nVMX: pass PFERR_USER_MASK to MMU on EPT violations Paolo Bonzini
2026-05-05 19:52 ` [PATCH 18/28] KVM: x86/mmu: add support for MBEC to EPT page table walks Paolo Bonzini
2026-05-05 19:52 ` [PATCH 19/28] KVM: nVMX: advertise MBEC to nested guests Paolo Bonzini
2026-05-05 19:52 ` [PATCH 20/28] KVM: nVMX: allow MBEC with EVMCS Paolo Bonzini
2026-05-05 19:52 ` [PATCH 21/28] KVM: x86/mmu: propagate access mask from root pages down Paolo Bonzini
2026-05-05 19:52 ` [PATCH 22/28] KVM: x86/mmu: introduce cpu_role bit for availability of PFEC.I/D Paolo Bonzini
2026-05-05 19:52 ` [PATCH 23/28] KVM: SVM: add GMET bit definitions Paolo Bonzini
2026-05-05 19:52 ` [PATCH 24/28] KVM: x86/mmu: hard code more bits in kvm_init_shadow_npt_mmu Paolo Bonzini
2026-05-05 19:52 ` [PATCH 25/28] KVM: x86/mmu: add support for GMET to NPT page table walks Paolo Bonzini
2026-05-05 19:52 ` [PATCH 26/28] KVM: SVM: enable GMET and set it in MMU role Paolo Bonzini
2026-05-05 19:52 ` [PATCH 27/28] KVM: SVM: work around errata 1218 Paolo Bonzini
2026-05-05 19:52 ` [PATCH 28/28] KVM: nSVM: enable GMET for guests Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260505195226.563317-1-pbonzini@redhat.com \
--to=pbonzini@redhat.com \
--cc=d.riley@proxmox.com \
--cc=jon@nutanix.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox