* [PATCH v3 00/16] Add Nested NPT support in selftests
@ 2025-11-27 1:34 Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 01/16] KVM: selftests: Make __vm_get_page_table_entry() static Yosry Ahmed
` (16 more replies)
0 siblings, 17 replies; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
This series adds support for nested NPT, then extends vmx_dirty_log_test
and dirty_log_perf_test (with -n, using memstress) to cover nested SVM.
Patches 1-5 are cleanups and prep.
Patches 6-7 introduce a new data structure, kvm_mmu, that is used by virt
mapping functions for guest page tables.
Patches 8-11 add a nested kvm_mmu for nested EPTs, which are now shared
between all vCPUs, and reuse the virt mapping functions for EPTs.
Patches 12-14 add support for nested NPTs, which becomes simple after all
the above prep work.
Patches 15-16 extend the existing selftests exercising nested EPTs to
also cover nested NPTs.
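For reference, this is roughly the shape of the kvm_mmu structure as it
ends up after patches 6-7 (a sketch; see those patches for the exact
definitions):

	struct pte_masks {
		uint64_t present;
		uint64_t writable;
		uint64_t user;
		uint64_t accessed;
		uint64_t dirty;
		uint64_t huge;
		uint64_t nx;
		uint64_t c;
		uint64_t s;
	};

	struct kvm_mmu {
		uint64_t root_gpa;
		int pgtable_levels;
		struct pte_masks pte_masks;
	};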
v2 -> v3:
- Dropped the patches that landed in kvm-x86.
- Reshuffled some patches and cleanups.
- Introduced kvm_mmu data structures to hold the root, page table
levels, and page table masks (Sean).
- Extended memstress as well to cover nested SVM.
v2: https://lore.kernel.org/kvm/20251021074736.1324328-1-yosry.ahmed@linux.dev/
Yosry Ahmed (16):
KVM: selftests: Make __vm_get_page_table_entry() static
KVM: selftests: Stop passing a memslot to nested_map_memslot()
KVM: selftests: Rename nested TDP mapping functions
KVM: selftests: Kill eptPageTablePointer
KVM: selftests: Stop setting AD bits on nested EPTs on creation
KVM: selftests: Introduce struct kvm_mmu
KVM: selftests: Move PTE bitmasks to kvm_mmu
KVM: selftests: Use a nested MMU to share nested EPTs between vCPUs
KVM: selftests: Stop passing VMX metadata to TDP mapping functions
KVM: selftests: Reuse virt mapping functions for nested EPTs
KVM: selftests: Move TDP mapping functions outside of vmx.c
KVM: selftests: Allow kvm_cpu_has_ept() to be called on AMD CPUs
KVM: selftests: Add support for nested NPTs
KVM: selftests: Set the user bit on nested NPT PTEs
KVM: selftests: Extend vmx_dirty_log_test to cover SVM
KVM: selftests: Extend memstress to run on nested SVM
tools/testing/selftests/kvm/Makefile.kvm | 2 +-
.../selftests/kvm/include/x86/kvm_util_arch.h | 7 +
.../selftests/kvm/include/x86/processor.h | 68 ++++-
.../selftests/kvm/include/x86/svm_util.h | 9 +
tools/testing/selftests/kvm/include/x86/vmx.h | 16 +-
.../testing/selftests/kvm/lib/x86/memstress.c | 68 +++--
.../testing/selftests/kvm/lib/x86/processor.c | 217 ++++++++++++---
tools/testing/selftests/kvm/lib/x86/svm.c | 25 ++
tools/testing/selftests/kvm/lib/x86/vmx.c | 256 ++++--------------
...rty_log_test.c => nested_dirty_log_test.c} | 87 ++++--
10 files changed, 428 insertions(+), 327 deletions(-)
rename tools/testing/selftests/kvm/x86/{vmx_dirty_log_test.c => nested_dirty_log_test.c} (67%)
base-commit: 115d5de2eef32ac5cd488404b44b38789362dbe6
--
2.52.0.158.g65b55ccf14-goog
* [PATCH v3 01/16] KVM: selftests: Make __vm_get_page_table_entry() static
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
@ 2025-11-27 1:34 ` Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 02/16] KVM: selftests: Stop passing a memslot to nested_map_memslot() Yosry Ahmed
` (15 subsequent siblings)
16 siblings, 0 replies; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
The function is only used in processor.c; drop the declaration in
processor.h and make it static.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
tools/testing/selftests/kvm/include/x86/processor.h | 2 --
tools/testing/selftests/kvm/lib/x86/processor.c | 4 ++--
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 57d62a425109..c00c0fbe62cd 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1367,8 +1367,6 @@ static inline bool kvm_is_ignore_msrs(void)
return get_kvm_param_bool("ignore_msrs");
}
-uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
- int *level);
uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr);
uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index 36104d27f3d9..c14bf2b5f28f 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -306,8 +306,8 @@ static bool vm_is_target_pte(uint64_t *pte, int *level, int current_level)
return *level == current_level;
}
-uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
- int *level)
+static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
+ int *level)
{
int va_width = 12 + (vm->pgtable_levels) * 9;
uint64_t *pte = &vm->pgd;
--
2.52.0.158.g65b55ccf14-goog
* [PATCH v3 02/16] KVM: selftests: Stop passing a memslot to nested_map_memslot()
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 01/16] KVM: selftests: Make __vm_get_page_table_entry() static Yosry Ahmed
@ 2025-11-27 1:34 ` Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 03/16] KVM: selftests: Rename nested TDP mapping functions Yosry Ahmed
` (14 subsequent siblings)
16 siblings, 0 replies; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
On x86, KVM selftests use memslot 0 for all the default regions used by
the test infrastructure. This is an implementation detail.
nested_map_memslot() is currently used to map the default regions by
explicitly passing slot 0, which leaks the library implementation into
the caller.
Rename the function to a very verbose
nested_identity_map_default_memslots() to reflect what it actually does.
Add an assertion that only memslot 0 is being used so that the
implementation does not change from under us.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
tools/testing/selftests/kvm/include/x86/vmx.h | 4 ++--
tools/testing/selftests/kvm/lib/x86/vmx.c | 12 ++++++++----
tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c | 2 +-
3 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
index 96e2b4c630a9..91916b8aa94b 100644
--- a/tools/testing/selftests/kvm/include/x86/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86/vmx.h
@@ -563,8 +563,8 @@ void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
uint64_t nested_paddr, uint64_t paddr);
void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
uint64_t nested_paddr, uint64_t paddr, uint64_t size);
-void nested_map_memslot(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint32_t memslot);
+void nested_identity_map_default_memslots(struct vmx_pages *vmx,
+ struct kvm_vm *vm);
void nested_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
uint64_t addr, uint64_t size);
bool kvm_cpu_has_ept(void);
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index 29b082a58daa..eec33ec63811 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -494,12 +494,16 @@ void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
/* Prepare an identity extended page table that maps all the
* physical pages in VM.
*/
-void nested_map_memslot(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint32_t memslot)
+void nested_identity_map_default_memslots(struct vmx_pages *vmx,
+ struct kvm_vm *vm)
{
+ uint32_t s, memslot = 0;
sparsebit_idx_t i, last;
- struct userspace_mem_region *region =
- memslot2region(vm, memslot);
+ struct userspace_mem_region *region = memslot2region(vm, memslot);
+
+ /* Only memslot 0 is mapped here, ensure it's the only one being used */
+ for (s = 0; s < NR_MEM_REGIONS; s++)
+ TEST_ASSERT_EQ(vm->memslots[s], 0);
i = (region->region.guest_phys_addr >> vm->page_shift) - 1;
last = i + (region->region.memory_size >> vm->page_shift);
diff --git a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
index 98cb6bdab3e6..aab7333aaef0 100644
--- a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
@@ -121,7 +121,7 @@ static void test_vmx_dirty_log(bool enable_ept)
*/
if (enable_ept) {
prepare_eptp(vmx, vm);
- nested_map_memslot(vmx, vm, 0);
+ nested_identity_map_default_memslots(vmx, vm);
nested_map(vmx, vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
nested_map(vmx, vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
}
--
2.52.0.158.g65b55ccf14-goog
* [PATCH v3 03/16] KVM: selftests: Rename nested TDP mapping functions
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 01/16] KVM: selftests: Make __vm_get_page_table_entry() static Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 02/16] KVM: selftests: Stop passing a memslot to nested_map_memslot() Yosry Ahmed
@ 2025-11-27 1:34 ` Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 04/16] KVM: selftests: Kill eptPageTablePointer Yosry Ahmed
` (13 subsequent siblings)
16 siblings, 0 replies; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Rename the functions from nested_* to tdp_* to make their purpose
clearer.
No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
tools/testing/selftests/kvm/include/x86/vmx.h | 16 +++---
.../testing/selftests/kvm/lib/x86/memstress.c | 4 +-
tools/testing/selftests/kvm/lib/x86/vmx.c | 50 +++++++++----------
.../selftests/kvm/x86/vmx_dirty_log_test.c | 6 +--
4 files changed, 37 insertions(+), 39 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
index 91916b8aa94b..04b8231d032a 100644
--- a/tools/testing/selftests/kvm/include/x86/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86/vmx.h
@@ -559,14 +559,14 @@ bool load_vmcs(struct vmx_pages *vmx);
bool ept_1g_pages_supported(void);
-void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr);
-void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr, uint64_t size);
-void nested_identity_map_default_memslots(struct vmx_pages *vmx,
- struct kvm_vm *vm);
-void nested_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t addr, uint64_t size);
+void tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm, uint64_t nested_paddr,
+ uint64_t paddr);
+void tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm, uint64_t nested_paddr,
+ uint64_t paddr, uint64_t size);
+void tdp_identity_map_default_memslots(struct vmx_pages *vmx,
+ struct kvm_vm *vm);
+void tdp_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint64_t addr, uint64_t size);
bool kvm_cpu_has_ept(void);
void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm);
void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm);
diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
index 0b1f288ad556..1928b00bde51 100644
--- a/tools/testing/selftests/kvm/lib/x86/memstress.c
+++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
@@ -70,11 +70,11 @@ void memstress_setup_ept(struct vmx_pages *vmx, struct kvm_vm *vm)
* KVM can shadow the EPT12 with the maximum huge page size supported
* by the backing source.
*/
- nested_identity_map_1g(vmx, vm, 0, 0x100000000ULL);
+ tdp_identity_map_1g(vmx, vm, 0, 0x100000000ULL);
start = align_down(memstress_args.gpa, PG_SIZE_1G);
end = align_up(memstress_args.gpa + memstress_args.size, PG_SIZE_1G);
- nested_identity_map_1g(vmx, vm, start, end - start);
+ tdp_identity_map_1g(vmx, vm, start, end - start);
}
void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vcpus[])
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index eec33ec63811..1954ccdfc353 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -362,12 +362,12 @@ void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip, void *guest_rsp)
init_vmcs_guest_state(guest_rip, guest_rsp);
}
-static void nested_create_pte(struct kvm_vm *vm,
- struct eptPageTableEntry *pte,
- uint64_t nested_paddr,
- uint64_t paddr,
- int current_level,
- int target_level)
+static void tdp_create_pte(struct kvm_vm *vm,
+ struct eptPageTableEntry *pte,
+ uint64_t nested_paddr,
+ uint64_t paddr,
+ int current_level,
+ int target_level)
{
if (!pte->readable) {
pte->writable = true;
@@ -394,8 +394,8 @@ static void nested_create_pte(struct kvm_vm *vm,
}
-void __nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr, int target_level)
+void __tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint64_t nested_paddr, uint64_t paddr, int target_level)
{
const uint64_t page_size = PG_LEVEL_SIZE(target_level);
struct eptPageTableEntry *pt = vmx->eptp_hva, *pte;
@@ -428,7 +428,7 @@ void __nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
index = (nested_paddr >> PG_LEVEL_SHIFT(level)) & 0x1ffu;
pte = &pt[index];
- nested_create_pte(vm, pte, nested_paddr, paddr, level, target_level);
+ tdp_create_pte(vm, pte, nested_paddr, paddr, level, target_level);
if (pte->page_size)
break;
@@ -445,10 +445,10 @@ void __nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
}
-void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr)
+void tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint64_t nested_paddr, uint64_t paddr)
{
- __nested_pg_map(vmx, vm, nested_paddr, paddr, PG_LEVEL_4K);
+ __tdp_pg_map(vmx, vm, nested_paddr, paddr, PG_LEVEL_4K);
}
/*
@@ -468,8 +468,8 @@ void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
* Within the VM given by vm, creates a nested guest translation for the
* page range starting at nested_paddr to the page range starting at paddr.
*/
-void __nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr, uint64_t size,
+void __tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint64_t nested_paddr, uint64_t paddr, uint64_t size,
int level)
{
size_t page_size = PG_LEVEL_SIZE(level);
@@ -479,23 +479,23 @@ void __nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
TEST_ASSERT(paddr + size > paddr, "Paddr overflow");
while (npages--) {
- __nested_pg_map(vmx, vm, nested_paddr, paddr, level);
+ __tdp_pg_map(vmx, vm, nested_paddr, paddr, level);
nested_paddr += page_size;
paddr += page_size;
}
}
-void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr, uint64_t size)
+void tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint64_t nested_paddr, uint64_t paddr, uint64_t size)
{
- __nested_map(vmx, vm, nested_paddr, paddr, size, PG_LEVEL_4K);
+ __tdp_map(vmx, vm, nested_paddr, paddr, size, PG_LEVEL_4K);
}
/* Prepare an identity extended page table that maps all the
* physical pages in VM.
*/
-void nested_identity_map_default_memslots(struct vmx_pages *vmx,
- struct kvm_vm *vm)
+void tdp_identity_map_default_memslots(struct vmx_pages *vmx,
+ struct kvm_vm *vm)
{
uint32_t s, memslot = 0;
sparsebit_idx_t i, last;
@@ -512,18 +512,16 @@ void nested_identity_map_default_memslots(struct vmx_pages *vmx,
if (i > last)
break;
- nested_map(vmx, vm,
- (uint64_t)i << vm->page_shift,
- (uint64_t)i << vm->page_shift,
- 1 << vm->page_shift);
+ tdp_map(vmx, vm, (uint64_t)i << vm->page_shift,
+ (uint64_t)i << vm->page_shift, 1 << vm->page_shift);
}
}
/* Identity map a region with 1GiB Pages. */
-void nested_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
+void tdp_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
uint64_t addr, uint64_t size)
{
- __nested_map(vmx, vm, addr, addr, size, PG_LEVEL_1G);
+ __tdp_map(vmx, vm, addr, addr, size, PG_LEVEL_1G);
}
bool kvm_cpu_has_ept(void)
diff --git a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
index aab7333aaef0..e7d0c08ba29d 100644
--- a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
@@ -121,9 +121,9 @@ static void test_vmx_dirty_log(bool enable_ept)
*/
if (enable_ept) {
prepare_eptp(vmx, vm);
- nested_identity_map_default_memslots(vmx, vm);
- nested_map(vmx, vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
- nested_map(vmx, vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
+ tdp_identity_map_default_memslots(vmx, vm);
+ tdp_map(vmx, vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
+ tdp_map(vmx, vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
}
bmap = bitmap_zalloc(TEST_MEM_PAGES);
--
2.52.0.158.g65b55ccf14-goog
* [PATCH v3 04/16] KVM: selftests: Kill eptPageTablePointer
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
` (2 preceding siblings ...)
2025-11-27 1:34 ` [PATCH v3 03/16] KVM: selftests: Rename nested TDP mapping functions Yosry Ahmed
@ 2025-11-27 1:34 ` Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 05/16] KVM: selftests: Stop setting AD bits on nested EPTs on creation Yosry Ahmed
` (12 subsequent siblings)
16 siblings, 0 replies; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Replace the struct overlay with explicit bitmasks, which are clearer and
less error-prone. See commit f18b4aebe107 ("kvm: selftests: do not use
bitfields larger than 32-bits for PTEs") for an example of why bitfields
are not preferable.
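For reference, the EPTP layout that the new masks encode (a sketch based
on the definitions added below; bit positions per the Intel SDM):

	/*
	 * EPTP[2:0]   - memory type (WB here)
	 * EPTP[5:3]   - page walk length minus 1 (3 for a 4-level walk)
	 * EPTP[6]     - enable accessed/dirty flags (conditional in the
	 *               code below on VMX_EPT_VPID_CAP_AD_BITS)
	 * EPTP[51:12] - physical address of the EPT PML4 table
	 */
	uint64_t eptp = vmx->eptp_gpa | EPTP_WB | EPTP_PWL_4 | EPTP_AD_ENABLED;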
Remove the unused PAGE_SHIFT_4K definition while at it.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
tools/testing/selftests/kvm/lib/x86/vmx.c | 37 +++++++++++------------
1 file changed, 17 insertions(+), 20 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index 1954ccdfc353..85043bb1ec4d 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -10,10 +10,16 @@
#include "processor.h"
#include "vmx.h"
-#define PAGE_SHIFT_4K 12
-
#define KVM_EPT_PAGE_TABLE_MIN_PADDR 0x1c0000
+#define EPTP_MT_SHIFT 0 /* EPTP memtype bits 2:0 */
+#define EPTP_PWL_SHIFT 3 /* EPTP page walk length bits 5:3 */
+#define EPTP_AD_ENABLED_SHIFT 6 /* EPTP AD enabled bit 6 */
+
+#define EPTP_WB (X86_MEMTYPE_WB << EPTP_MT_SHIFT)
+#define EPTP_PWL_4 (3ULL << EPTP_PWL_SHIFT) /* PWL is (levels - 1) */
+#define EPTP_AD_ENABLED (1ULL << EPTP_AD_ENABLED_SHIFT)
+
bool enable_evmcs;
struct hv_enlightened_vmcs *current_evmcs;
@@ -34,14 +40,6 @@ struct eptPageTableEntry {
uint64_t suppress_ve:1;
};
-struct eptPageTablePointer {
- uint64_t memory_type:3;
- uint64_t page_walk_length:3;
- uint64_t ad_enabled:1;
- uint64_t reserved_11_07:5;
- uint64_t address:40;
- uint64_t reserved_63_52:12;
-};
int vcpu_enable_evmcs(struct kvm_vcpu *vcpu)
{
uint16_t evmcs_ver;
@@ -196,16 +194,15 @@ static inline void init_vmcs_control_fields(struct vmx_pages *vmx)
vmwrite(PIN_BASED_VM_EXEC_CONTROL, rdmsr(MSR_IA32_VMX_TRUE_PINBASED_CTLS));
if (vmx->eptp_gpa) {
- uint64_t ept_paddr;
- struct eptPageTablePointer eptp = {
- .memory_type = X86_MEMTYPE_WB,
- .page_walk_length = 3, /* + 1 */
- .ad_enabled = ept_vpid_cap_supported(VMX_EPT_VPID_CAP_AD_BITS),
- .address = vmx->eptp_gpa >> PAGE_SHIFT_4K,
- };
-
- memcpy(&ept_paddr, &eptp, sizeof(ept_paddr));
- vmwrite(EPT_POINTER, ept_paddr);
+ uint64_t eptp = vmx->eptp_gpa | EPTP_WB | EPTP_PWL_4;
+
+ TEST_ASSERT((vmx->eptp_gpa & ~PHYSICAL_PAGE_MASK) == 0,
+ "Illegal bits set in vmx->eptp_gpa");
+
+ if (ept_vpid_cap_supported(VMX_EPT_VPID_CAP_AD_BITS))
+ eptp |= EPTP_AD_ENABLED;
+
+ vmwrite(EPT_POINTER, eptp);
sec_exec_ctl |= SECONDARY_EXEC_ENABLE_EPT;
}
--
2.52.0.158.g65b55ccf14-goog
* [PATCH v3 05/16] KVM: selftests: Stop setting AD bits on nested EPTs on creation
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
` (3 preceding siblings ...)
2025-11-27 1:34 ` [PATCH v3 04/16] KVM: selftests: Kill eptPageTablePointer Yosry Ahmed
@ 2025-11-27 1:34 ` Yosry Ahmed
2025-12-23 22:26 ` Sean Christopherson
2025-11-27 1:34 ` [PATCH v3 06/16] KVM: selftests: Introduce struct kvm_mmu Yosry Ahmed
` (11 subsequent siblings)
16 siblings, 1 reply; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
When new nested EPTs are created, the AD bits are set. This was
introduced by commit 094444204570 ("selftests: kvm: add test for dirty
logging inside nested guests"), which introduced vmx_dirty_log_test.
It's unclear why that was needed at the time, but regardless, the test
passes without them, so they are probably no longer needed.
dirty_log_perf_test (with -n to run in L2) also passes, and these are
the only tests currently using nested EPT mappings.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
tools/testing/selftests/kvm/lib/x86/vmx.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index 85043bb1ec4d..a3e2eae981da 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -432,14 +432,6 @@ void __tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
pt = addr_gpa2hva(vm, pte->address * vm->page_size);
}
-
- /*
- * For now mark these as accessed and dirty because the only
- * testcase we have needs that. Can be reconsidered later.
- */
- pte->accessed = true;
- pte->dirty = true;
-
}
void tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
--
2.52.0.158.g65b55ccf14-goog
* [PATCH v3 06/16] KVM: selftests: Introduce struct kvm_mmu
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
` (4 preceding siblings ...)
2025-11-27 1:34 ` [PATCH v3 05/16] KVM: selftests: Stop setting AD bits on nested EPTs on creation Yosry Ahmed
@ 2025-11-27 1:34 ` Yosry Ahmed
2025-12-23 22:29 ` Sean Christopherson
2025-11-27 1:34 ` [PATCH v3 07/16] KVM: selftests: Move PTE bitmasks to kvm_mmu Yosry Ahmed
` (10 subsequent siblings)
16 siblings, 1 reply; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
In preparation for generalizing the virt mapping functions to work with
TDP page tables, introduce struct kvm_mmu. This struct currently only
holds the root GPA and number of page table levels. Parameterize virt
mapping functions by the kvm_mmu, and use the root GPA and page table
levels instead of hardcoding vm->pgd and vm->pgtable_levels.
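For illustration, the resulting call pattern, as a minimal sketch (the
real signatures are in the diff below):

	struct kvm_mmu *mmu = vm->arch.mmu;

	/* Walk/allocate page tables rooted at mmu->root_gpa. */
	__virt_pg_map(vm, mmu, vaddr, paddr, PG_LEVEL_4K);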
There's a subtle change here: instead of checking that the parent pointer
is the address of vm->pgd, check whether the value pointed to by the
parent pointer is the root GPA (i.e. the value of vm->pgd in this case).
No change in behavior is expected.
Opportunistically, switch the ordering of the checks in the assertion in
virt_get_pte(), as it makes more sense to check whether the parent PTE is
the root (in which case it is not a PTE) before checking the present flag.
vm->arch.mmu is dynamically allocated to avoid the circular dependency
chain that would arise if kvm_util_arch.h included processor.h for the
struct definition:
kvm_util_arch.h -> processor.h -> kvm_util.h -> kvm_util_arch.h
No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
.../selftests/kvm/include/x86/kvm_util_arch.h | 4 ++
.../selftests/kvm/include/x86/processor.h | 8 ++-
.../testing/selftests/kvm/lib/x86/processor.c | 61 +++++++++++++------
3 files changed, 53 insertions(+), 20 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
index 972bb1c4ab4c..d8808fa33faa 100644
--- a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
+++ b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
@@ -10,6 +10,8 @@
extern bool is_forced_emulation_enabled;
+struct kvm_mmu;
+
struct kvm_vm_arch {
vm_vaddr_t gdt;
vm_vaddr_t tss;
@@ -19,6 +21,8 @@ struct kvm_vm_arch {
uint64_t s_bit;
int sev_fd;
bool is_pt_protected;
+
+ struct kvm_mmu *mmu;
};
static inline bool __vm_arch_has_protected_memory(struct kvm_vm_arch *arch)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index c00c0fbe62cd..0c295097c714 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1449,7 +1449,13 @@ enum pg_level {
#define PG_SIZE_2M PG_LEVEL_SIZE(PG_LEVEL_2M)
#define PG_SIZE_1G PG_LEVEL_SIZE(PG_LEVEL_1G)
-void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level);
+struct kvm_mmu {
+ uint64_t root_gpa;
+ int pgtable_levels;
+};
+
+void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
+ uint64_t paddr, int level);
void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
uint64_t nr_bytes, int level);
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index c14bf2b5f28f..871de49c35ee 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -156,6 +156,23 @@ bool kvm_is_tdp_enabled(void)
return get_kvm_amd_param_bool("npt");
}
+static struct kvm_mmu *mmu_create(struct kvm_vm *vm,
+ int pgtable_levels)
+{
+ struct kvm_mmu *mmu = calloc(1, sizeof(*mmu));
+
+ TEST_ASSERT(mmu, "-ENOMEM when allocating MMU");
+ mmu->root_gpa = vm_alloc_page_table(vm);
+ mmu->pgtable_levels = pgtable_levels;
+ return mmu;
+}
+
+static void mmu_init(struct kvm_vm *vm)
+{
+ vm->arch.mmu = mmu_create(vm, vm->pgtable_levels);
+ vm->pgd = vm->arch.mmu->root_gpa;
+}
+
void virt_arch_pgd_alloc(struct kvm_vm *vm)
{
TEST_ASSERT(vm->mode == VM_MODE_PXXVYY_4K,
@@ -163,19 +180,19 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
/* If needed, create the top-level page table. */
if (!vm->pgd_created) {
- vm->pgd = vm_alloc_page_table(vm);
+ mmu_init(vm);
vm->pgd_created = true;
}
}
-static void *virt_get_pte(struct kvm_vm *vm, uint64_t *parent_pte,
- uint64_t vaddr, int level)
+static void *virt_get_pte(struct kvm_vm *vm, struct kvm_mmu *mmu,
+ uint64_t *parent_pte, uint64_t vaddr, int level)
{
uint64_t pt_gpa = PTE_GET_PA(*parent_pte);
uint64_t *page_table = addr_gpa2hva(vm, pt_gpa);
int index = (vaddr >> PG_LEVEL_SHIFT(level)) & 0x1ffu;
- TEST_ASSERT((*parent_pte & PTE_PRESENT_MASK) || parent_pte == &vm->pgd,
+ TEST_ASSERT((*parent_pte == mmu->root_gpa) || (*parent_pte & PTE_PRESENT_MASK),
"Parent PTE (level %d) not PRESENT for gva: 0x%08lx",
level + 1, vaddr);
@@ -183,13 +200,14 @@ static void *virt_get_pte(struct kvm_vm *vm, uint64_t *parent_pte,
}
static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
+ struct kvm_mmu *mmu,
uint64_t *parent_pte,
uint64_t vaddr,
uint64_t paddr,
int current_level,
int target_level)
{
- uint64_t *pte = virt_get_pte(vm, parent_pte, vaddr, current_level);
+ uint64_t *pte = virt_get_pte(vm, mmu, parent_pte, vaddr, current_level);
paddr = vm_untag_gpa(vm, paddr);
@@ -215,10 +233,11 @@ static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
return pte;
}
-void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
+void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
+ uint64_t paddr, int level)
{
const uint64_t pg_size = PG_LEVEL_SIZE(level);
- uint64_t *pte = &vm->pgd;
+ uint64_t *pte = &mmu->root_gpa;
int current_level;
TEST_ASSERT(vm->mode == VM_MODE_PXXVYY_4K,
@@ -243,17 +262,17 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
* Allocate upper level page tables, if not already present. Return
* early if a hugepage was created.
*/
- for (current_level = vm->pgtable_levels;
+ for (current_level = mmu->pgtable_levels;
current_level > PG_LEVEL_4K;
current_level--) {
- pte = virt_create_upper_pte(vm, pte, vaddr, paddr,
+ pte = virt_create_upper_pte(vm, mmu, pte, vaddr, paddr,
current_level, level);
if (*pte & PTE_LARGE_MASK)
return;
}
/* Fill in page table entry. */
- pte = virt_get_pte(vm, pte, vaddr, PG_LEVEL_4K);
+ pte = virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
TEST_ASSERT(!(*pte & PTE_PRESENT_MASK),
"PTE already present for 4k page at vaddr: 0x%lx", vaddr);
*pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK | (paddr & PHYSICAL_PAGE_MASK);
@@ -270,7 +289,7 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
{
- __virt_pg_map(vm, vaddr, paddr, PG_LEVEL_4K);
+ __virt_pg_map(vm, vm->arch.mmu, vaddr, paddr, PG_LEVEL_4K);
}
void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
@@ -285,7 +304,7 @@ void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
nr_bytes, pg_size);
for (i = 0; i < nr_pages; i++) {
- __virt_pg_map(vm, vaddr, paddr, level);
+ __virt_pg_map(vm, vm->arch.mmu, vaddr, paddr, level);
sparsebit_set_num(vm->vpages_mapped, vaddr >> vm->page_shift,
nr_bytes / PAGE_SIZE);
@@ -294,7 +313,8 @@ void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
}
}
-static bool vm_is_target_pte(uint64_t *pte, int *level, int current_level)
+static bool vm_is_target_pte(struct kvm_mmu *mmu, uint64_t *pte,
+ int *level, int current_level)
{
if (*pte & PTE_LARGE_MASK) {
TEST_ASSERT(*level == PG_LEVEL_NONE ||
@@ -306,7 +326,9 @@ static bool vm_is_target_pte(uint64_t *pte, int *level, int current_level)
return *level == current_level;
}
-static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
+static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm,
+ struct kvm_mmu *mmu,
+ uint64_t vaddr,
int *level)
{
int va_width = 12 + (vm->pgtable_levels) * 9;
@@ -335,19 +357,19 @@ static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
for (current_level = vm->pgtable_levels;
current_level > PG_LEVEL_4K;
current_level--) {
- pte = virt_get_pte(vm, pte, vaddr, current_level);
- if (vm_is_target_pte(pte, level, current_level))
+ pte = virt_get_pte(vm, mmu, pte, vaddr, current_level);
+ if (vm_is_target_pte(mmu, pte, level, current_level))
return pte;
}
- return virt_get_pte(vm, pte, vaddr, PG_LEVEL_4K);
+ return virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
}
uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr)
{
int level = PG_LEVEL_4K;
- return __vm_get_page_table_entry(vm, vaddr, &level);
+ return __vm_get_page_table_entry(vm, vm->arch.mmu, vaddr, &level);
}
void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
@@ -497,7 +519,8 @@ static void kvm_seg_set_kernel_data_64bit(struct kvm_segment *segp)
vm_paddr_t addr_arch_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva)
{
int level = PG_LEVEL_NONE;
- uint64_t *pte = __vm_get_page_table_entry(vm, gva, &level);
+ struct kvm_mmu *mmu = vm->arch.mmu;
+ uint64_t *pte = __vm_get_page_table_entry(vm, mmu, gva, &level);
TEST_ASSERT(*pte & PTE_PRESENT_MASK,
"Leaf PTE not PRESENT for gva: 0x%08lx", gva);
--
2.52.0.158.g65b55ccf14-goog
* [PATCH v3 07/16] KVM: selftests: Move PTE bitmasks to kvm_mmu
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
` (5 preceding siblings ...)
2025-11-27 1:34 ` [PATCH v3 06/16] KVM: selftests: Introduce struct kvm_mmu Yosry Ahmed
@ 2025-11-27 1:34 ` Yosry Ahmed
2025-12-23 22:31 ` Sean Christopherson
2025-11-27 1:34 ` [PATCH v3 08/16] KVM: selftests: Use a nested MMU to share nested EPTs between vCPUs Yosry Ahmed
` (9 subsequent siblings)
16 siblings, 1 reply; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Move the PTE bitmasks into kvm_mmu to parameterize them for virt mapping
functions. Introduce helpers to read/write different PTE bits given a
kvm_mmu.
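For illustration, a before/after of a typical PTE check (the helpers are
defined in the diff below):

	/* Before: a single global mask. */
	if (!(*pte & PTE_PRESENT_MASK))
		continue;

	/* After: the mask comes from the MMU that owns the page tables. */
	if (!pte_present(mmu, pte))
		continue;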
Drop the 'global' bit definition as it's currently unused, but leave the
'user' bit as it will be used in coming changes. Opportunistically rename
'large' to 'huge' for consistency with the kernel naming.
Leave PHYSICAL_PAGE_MASK alone; it's fixed in all page table formats and
a lot of other macros depend on it. It's tempting to move all the other
macros to be per-struct as well, but it would be too much noise for little
benefit.
Keep c_bit and s_bit in vm->arch as they are used before the MMU is
initialized, through __vm_create() -> vm_userspace_mem_region_add() ->
vm_mem_add() -> vm_arch_has_protected_memory().
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
.../selftests/kvm/include/x86/processor.h | 43 +++++++++---
.../testing/selftests/kvm/lib/x86/processor.c | 68 +++++++++++--------
2 files changed, 74 insertions(+), 37 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 0c295097c714..3a1a82fd42b2 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -362,16 +362,6 @@ static inline unsigned int x86_model(unsigned int eax)
return ((eax >> 12) & 0xf0) | ((eax >> 4) & 0x0f);
}
-/* Page table bitfield declarations */
-#define PTE_PRESENT_MASK BIT_ULL(0)
-#define PTE_WRITABLE_MASK BIT_ULL(1)
-#define PTE_USER_MASK BIT_ULL(2)
-#define PTE_ACCESSED_MASK BIT_ULL(5)
-#define PTE_DIRTY_MASK BIT_ULL(6)
-#define PTE_LARGE_MASK BIT_ULL(7)
-#define PTE_GLOBAL_MASK BIT_ULL(8)
-#define PTE_NX_MASK BIT_ULL(63)
-
#define PHYSICAL_PAGE_MASK GENMASK_ULL(51, 12)
#define PAGE_SHIFT 12
@@ -1449,11 +1439,44 @@ enum pg_level {
#define PG_SIZE_2M PG_LEVEL_SIZE(PG_LEVEL_2M)
#define PG_SIZE_1G PG_LEVEL_SIZE(PG_LEVEL_1G)
+struct pte_masks {
+ uint64_t present;
+ uint64_t writable;
+ uint64_t user;
+ uint64_t accessed;
+ uint64_t dirty;
+ uint64_t huge;
+ uint64_t nx;
+ uint64_t c;
+ uint64_t s;
+};
+
struct kvm_mmu {
uint64_t root_gpa;
int pgtable_levels;
+ struct pte_masks pte_masks;
};
+#define PTE_PRESENT_MASK(mmu) ((mmu)->pte_masks.present)
+#define PTE_WRITABLE_MASK(mmu) ((mmu)->pte_masks.writable)
+#define PTE_USER_MASK(mmu) ((mmu)->pte_masks.user)
+#define PTE_ACCESSED_MASK(mmu) ((mmu)->pte_masks.accessed)
+#define PTE_DIRTY_MASK(mmu) ((mmu)->pte_masks.dirty)
+#define PTE_HUGE_MASK(mmu) ((mmu)->pte_masks.huge)
+#define PTE_NX_MASK(mmu) ((mmu)->pte_masks.nx)
+#define PTE_C_MASK(mmu) ((mmu)->pte_masks.c)
+#define PTE_S_MASK(mmu) ((mmu)->pte_masks.s)
+
+#define pte_present(mmu, pte) (!!(*(pte) & PTE_PRESENT_MASK(mmu)))
+#define pte_writable(mmu, pte) (!!(*(pte) & PTE_WRITABLE_MASK(mmu)))
+#define pte_user(mmu, pte) (!!(*(pte) & PTE_USER_MASK(mmu)))
+#define pte_accessed(mmu, pte) (!!(*(pte) & PTE_ACCESSED_MASK(mmu)))
+#define pte_dirty(mmu, pte) (!!(*(pte) & PTE_DIRTY_MASK(mmu)))
+#define pte_huge(mmu, pte) (!!(*(pte) & PTE_HUGE_MASK(mmu)))
+#define pte_nx(mmu, pte) (!!(*(pte) & PTE_NX_MASK(mmu)))
+#define pte_c(mmu, pte) (!!(*(pte) & PTE_C_MASK(mmu)))
+#define pte_s(mmu, pte) (!!(*(pte) & PTE_S_MASK(mmu)))
+
void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
uint64_t paddr, int level);
void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index 871de49c35ee..dc568d70f9d6 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -157,11 +157,13 @@ bool kvm_is_tdp_enabled(void)
}
static struct kvm_mmu *mmu_create(struct kvm_vm *vm,
- int pgtable_levels)
+ int pgtable_levels,
+ struct pte_masks *pte_masks)
{
struct kvm_mmu *mmu = calloc(1, sizeof(*mmu));
TEST_ASSERT(mmu, "-ENOMEM when allocating MMU");
+ mmu->pte_masks = *pte_masks;
mmu->root_gpa = vm_alloc_page_table(vm);
mmu->pgtable_levels = pgtable_levels;
return mmu;
@@ -169,7 +171,19 @@ static struct kvm_mmu *mmu_create(struct kvm_vm *vm,
static void mmu_init(struct kvm_vm *vm)
{
- vm->arch.mmu = mmu_create(vm, vm->pgtable_levels);
+ struct pte_masks pte_masks = (struct pte_masks){
+ .present = BIT_ULL(0),
+ .writable = BIT_ULL(1),
+ .user = BIT_ULL(2),
+ .accessed = BIT_ULL(5),
+ .dirty = BIT_ULL(6),
+ .huge = BIT_ULL(7),
+ .nx = BIT_ULL(63),
+ .c = vm->arch.c_bit,
+ .s = vm->arch.s_bit,
+ };
+
+ vm->arch.mmu = mmu_create(vm, vm->pgtable_levels, &pte_masks);
vm->pgd = vm->arch.mmu->root_gpa;
}
@@ -177,7 +191,6 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
{
TEST_ASSERT(vm->mode == VM_MODE_PXXVYY_4K,
"Unknown or unsupported guest mode: 0x%x", vm->mode);
-
/* If needed, create the top-level page table. */
if (!vm->pgd_created) {
mmu_init(vm);
@@ -192,7 +205,7 @@ static void *virt_get_pte(struct kvm_vm *vm, struct kvm_mmu *mmu,
uint64_t *page_table = addr_gpa2hva(vm, pt_gpa);
int index = (vaddr >> PG_LEVEL_SHIFT(level)) & 0x1ffu;
- TEST_ASSERT((*parent_pte == mmu->root_gpa) || (*parent_pte & PTE_PRESENT_MASK),
+ TEST_ASSERT((*parent_pte == mmu->root_gpa) || pte_present(mmu, parent_pte),
"Parent PTE (level %d) not PRESENT for gva: 0x%08lx",
level + 1, vaddr);
@@ -211,10 +224,10 @@ static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
paddr = vm_untag_gpa(vm, paddr);
- if (!(*pte & PTE_PRESENT_MASK)) {
- *pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK;
+ if (!pte_present(mmu, pte)) {
+ *pte = PTE_PRESENT_MASK(mmu) | PTE_WRITABLE_MASK(mmu);
if (current_level == target_level)
- *pte |= PTE_LARGE_MASK | (paddr & PHYSICAL_PAGE_MASK);
+ *pte |= PTE_HUGE_MASK(mmu) | (paddr & PHYSICAL_PAGE_MASK);
else
*pte |= vm_alloc_page_table(vm) & PHYSICAL_PAGE_MASK;
} else {
@@ -226,7 +239,7 @@ static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
TEST_ASSERT(current_level != target_level,
"Cannot create hugepage at level: %u, vaddr: 0x%lx",
current_level, vaddr);
- TEST_ASSERT(!(*pte & PTE_LARGE_MASK),
+ TEST_ASSERT(!pte_huge(mmu, pte),
"Cannot create page table at level: %u, vaddr: 0x%lx",
current_level, vaddr);
}
@@ -267,24 +280,24 @@ void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
current_level--) {
pte = virt_create_upper_pte(vm, mmu, pte, vaddr, paddr,
current_level, level);
- if (*pte & PTE_LARGE_MASK)
+ if (pte_huge(mmu, pte))
return;
}
/* Fill in page table entry. */
pte = virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
- TEST_ASSERT(!(*pte & PTE_PRESENT_MASK),
+ TEST_ASSERT(!pte_present(mmu, pte),
"PTE already present for 4k page at vaddr: 0x%lx", vaddr);
- *pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK | (paddr & PHYSICAL_PAGE_MASK);
+ *pte = PTE_PRESENT_MASK(mmu) | PTE_WRITABLE_MASK(mmu) | (paddr & PHYSICAL_PAGE_MASK);
/*
* Neither SEV nor TDX supports shared page tables, so only the final
* leaf PTE needs manually set the C/S-bit.
*/
if (vm_is_gpa_protected(vm, paddr))
- *pte |= vm->arch.c_bit;
+ *pte |= PTE_C_MASK(mmu);
else
- *pte |= vm->arch.s_bit;
+ *pte |= PTE_S_MASK(mmu);
}
void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
@@ -316,7 +329,7 @@ void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
static bool vm_is_target_pte(struct kvm_mmu *mmu, uint64_t *pte,
int *level, int current_level)
{
- if (*pte & PTE_LARGE_MASK) {
+ if (pte_huge(mmu, pte)) {
TEST_ASSERT(*level == PG_LEVEL_NONE ||
*level == current_level,
"Unexpected hugepage at level %d", current_level);
@@ -374,6 +387,7 @@ uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr)
void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
{
+ struct kvm_mmu *mmu = vm->arch.mmu;
uint64_t *pml4e, *pml4e_start;
uint64_t *pdpe, *pdpe_start;
uint64_t *pde, *pde_start;
@@ -390,44 +404,44 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
pml4e_start = (uint64_t *) addr_gpa2hva(vm, vm->pgd);
for (uint16_t n1 = 0; n1 <= 0x1ffu; n1++) {
pml4e = &pml4e_start[n1];
- if (!(*pml4e & PTE_PRESENT_MASK))
+ if (!pte_present(mmu, pml4e))
continue;
fprintf(stream, "%*spml4e 0x%-3zx %p 0x%-12lx 0x%-10llx %u "
" %u\n",
indent, "",
pml4e - pml4e_start, pml4e,
addr_hva2gpa(vm, pml4e), PTE_GET_PFN(*pml4e),
- !!(*pml4e & PTE_WRITABLE_MASK), !!(*pml4e & PTE_NX_MASK));
+ pte_writable(mmu, pml4e), pte_nx(mmu, pml4e));
pdpe_start = addr_gpa2hva(vm, *pml4e & PHYSICAL_PAGE_MASK);
for (uint16_t n2 = 0; n2 <= 0x1ffu; n2++) {
pdpe = &pdpe_start[n2];
- if (!(*pdpe & PTE_PRESENT_MASK))
+ if (!pte_present(mmu, pdpe))
continue;
fprintf(stream, "%*spdpe 0x%-3zx %p 0x%-12lx 0x%-10llx "
"%u %u\n",
indent, "",
pdpe - pdpe_start, pdpe,
addr_hva2gpa(vm, pdpe),
- PTE_GET_PFN(*pdpe), !!(*pdpe & PTE_WRITABLE_MASK),
- !!(*pdpe & PTE_NX_MASK));
+ PTE_GET_PFN(*pdpe), pte_writable(mmu, pdpe),
+ pte_nx(mmu, pdpe));
pde_start = addr_gpa2hva(vm, *pdpe & PHYSICAL_PAGE_MASK);
for (uint16_t n3 = 0; n3 <= 0x1ffu; n3++) {
pde = &pde_start[n3];
- if (!(*pde & PTE_PRESENT_MASK))
+ if (!pte_present(mmu, pde))
continue;
fprintf(stream, "%*spde 0x%-3zx %p "
"0x%-12lx 0x%-10llx %u %u\n",
indent, "", pde - pde_start, pde,
addr_hva2gpa(vm, pde),
- PTE_GET_PFN(*pde), !!(*pde & PTE_WRITABLE_MASK),
- !!(*pde & PTE_NX_MASK));
+ PTE_GET_PFN(*pde), pte_writable(mmu, pde),
+ pte_nx(mmu, pde));
pte_start = addr_gpa2hva(vm, *pde & PHYSICAL_PAGE_MASK);
for (uint16_t n4 = 0; n4 <= 0x1ffu; n4++) {
pte = &pte_start[n4];
- if (!(*pte & PTE_PRESENT_MASK))
+ if (!pte_present(mmu, pte))
continue;
fprintf(stream, "%*spte 0x%-3zx %p "
"0x%-12lx 0x%-10llx %u %u "
@@ -436,9 +450,9 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
pte - pte_start, pte,
addr_hva2gpa(vm, pte),
PTE_GET_PFN(*pte),
- !!(*pte & PTE_WRITABLE_MASK),
- !!(*pte & PTE_NX_MASK),
- !!(*pte & PTE_DIRTY_MASK),
+ pte_writable(mmu, pte),
+ pte_nx(mmu, pte),
+ pte_dirty(mmu, pte),
((uint64_t) n1 << 27)
| ((uint64_t) n2 << 18)
| ((uint64_t) n3 << 9)
@@ -522,7 +536,7 @@ vm_paddr_t addr_arch_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva)
struct kvm_mmu *mmu = vm->arch.mmu;
uint64_t *pte = __vm_get_page_table_entry(vm, mmu, gva, &level);
- TEST_ASSERT(*pte & PTE_PRESENT_MASK,
+ TEST_ASSERT(pte_present(mmu, pte),
"Leaf PTE not PRESENT for gva: 0x%08lx", gva);
/*
--
2.52.0.158.g65b55ccf14-goog
* [PATCH v3 08/16] KVM: selftests: Use a nested MMU to share nested EPTs between vCPUs
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
` (6 preceding siblings ...)
2025-11-27 1:34 ` [PATCH v3 07/16] KVM: selftests: Move PTE bitmasks to kvm_mmu Yosry Ahmed
@ 2025-11-27 1:34 ` Yosry Ahmed
2025-12-23 23:16 ` Sean Christopherson
2025-11-27 1:34 ` [PATCH v3 09/16] KVM: selftests: Stop passing VMX metadata to TDP mapping functions Yosry Ahmed
` (8 subsequent siblings)
16 siblings, 1 reply; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
prepare_eptp() currently allocates new EPTs for each vCPU, and memstress
has its own hack to share the EPTs between vCPUs. There is no reason to
have separate EPTs per vCPU, and the complexity is significant; the only
reason it doesn't matter today is that memstress is the only user with
multiple vCPUs.
Replace prepare_eptp() with vm_enable_ept(), which enables EPT for an
entire VM. It allocates a new nested MMU and uses it to keep track of
the common EPT root (the PTE masks are currently unused, but will be used
by following changes).
Drop 'eptp' and 'eptp_hva' from struct vmx_pages. Only keep 'eptp_gpa'
and initialize it from the nested MMU root GPA. 'eptp' is unused, and
addr_gpa2hva() is used to get the HVA in __tdp_pg_map() instead of
using 'eptp_hva'.
Remove the workaround in memstress to copy the EPT root between vCPUs
since it's handled by vm_enable_ept() now.
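The resulting setup flow in memstress, sketched from the diff below:

	vm_enable_ept(vm);	/* allocates the shared nested MMU (EPT root) */

	for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
		/* vcpu_alloc_vmx() now copies eptp_gpa from the nested MMU */
		vmx = vcpu_alloc_vmx(vm, &vmx_gva);
		...
	}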
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
.../selftests/kvm/include/x86/kvm_util_arch.h | 3 +++
.../selftests/kvm/include/x86/processor.h | 2 ++
tools/testing/selftests/kvm/include/x86/vmx.h | 8 +++---
.../testing/selftests/kvm/lib/x86/memstress.c | 19 +++++---------
.../testing/selftests/kvm/lib/x86/processor.c | 8 +++---
tools/testing/selftests/kvm/lib/x86/vmx.c | 25 +++++++++++--------
.../selftests/kvm/x86/vmx_dirty_log_test.c | 7 +++---
7 files changed, 36 insertions(+), 36 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
index d8808fa33faa..1f5308f30566 100644
--- a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
+++ b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
@@ -23,6 +23,9 @@ struct kvm_vm_arch {
bool is_pt_protected;
struct kvm_mmu *mmu;
+ struct {
+ struct kvm_mmu *mmu;
+ } nested;
};
static inline bool __vm_arch_has_protected_memory(struct kvm_vm_arch *arch)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 3a1a82fd42b2..fb2b2e53d453 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1477,6 +1477,8 @@ struct kvm_mmu {
#define pte_c(mmu, pte) (!!(*(pte) & PTE_C_MASK(mmu)))
#define pte_s(mmu, pte) (!!(*(pte) & PTE_S_MASK(mmu)))
+struct kvm_mmu *mmu_create(struct kvm_vm *vm, int pgtable_levels,
+ struct pte_masks *pte_masks);
void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
uint64_t paddr, int level);
void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
index 04b8231d032a..1fd83c23529a 100644
--- a/tools/testing/selftests/kvm/include/x86/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86/vmx.h
@@ -520,13 +520,11 @@ struct vmx_pages {
uint64_t vmwrite_gpa;
void *vmwrite;
- void *eptp_hva;
- uint64_t eptp_gpa;
- void *eptp;
-
void *apic_access_hva;
uint64_t apic_access_gpa;
void *apic_access;
+
+ uint64_t eptp_gpa;
};
union vmx_basic {
@@ -568,7 +566,7 @@ void tdp_identity_map_default_memslots(struct vmx_pages *vmx,
void tdp_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
uint64_t addr, uint64_t size);
bool kvm_cpu_has_ept(void);
-void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm);
+void vm_enable_ept(struct kvm_vm *vm);
void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm);
#endif /* SELFTEST_KVM_VMX_H */
diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
index 1928b00bde51..00f7f11e5f0e 100644
--- a/tools/testing/selftests/kvm/lib/x86/memstress.c
+++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
@@ -59,12 +59,10 @@ uint64_t memstress_nested_pages(int nr_vcpus)
return 513 + 10 * nr_vcpus;
}
-void memstress_setup_ept(struct vmx_pages *vmx, struct kvm_vm *vm)
+static void memstress_setup_ept_mappings(struct vmx_pages *vmx, struct kvm_vm *vm)
{
uint64_t start, end;
- prepare_eptp(vmx, vm);
-
/*
* Identity map the first 4G and the test region with 1G pages so that
* KVM can shadow the EPT12 with the maximum huge page size supported
@@ -79,7 +77,7 @@ void memstress_setup_ept(struct vmx_pages *vmx, struct kvm_vm *vm)
void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vcpus[])
{
- struct vmx_pages *vmx, *vmx0 = NULL;
+ struct vmx_pages *vmx;
struct kvm_regs regs;
vm_vaddr_t vmx_gva;
int vcpu_id;
@@ -87,18 +85,13 @@ void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vc
TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
TEST_REQUIRE(kvm_cpu_has_ept());
+ vm_enable_ept(vm);
for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
vmx = vcpu_alloc_vmx(vm, &vmx_gva);
- if (vcpu_id == 0) {
- memstress_setup_ept(vmx, vm);
- vmx0 = vmx;
- } else {
- /* Share the same EPT table across all vCPUs. */
- vmx->eptp = vmx0->eptp;
- vmx->eptp_hva = vmx0->eptp_hva;
- vmx->eptp_gpa = vmx0->eptp_gpa;
- }
+ /* The EPTs are shared across vCPUs, setup the mappings once */
+ if (vcpu_id == 0)
+ memstress_setup_ept_mappings(vmx, vm);
/*
* Override the vCPU to run memstress_l1_guest_code() which will
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index dc568d70f9d6..bff75ff05364 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -156,14 +156,14 @@ bool kvm_is_tdp_enabled(void)
return get_kvm_amd_param_bool("npt");
}
-static struct kvm_mmu *mmu_create(struct kvm_vm *vm,
- int pgtable_levels,
- struct pte_masks *pte_masks)
+struct kvm_mmu *mmu_create(struct kvm_vm *vm, int pgtable_levels,
+ struct pte_masks *pte_masks)
{
struct kvm_mmu *mmu = calloc(1, sizeof(*mmu));
TEST_ASSERT(mmu, "-ENOMEM when allocating MMU");
- mmu->pte_masks = *pte_masks;
+ if (pte_masks)
+ mmu->pte_masks = *pte_masks;
mmu->root_gpa = vm_alloc_page_table(vm);
mmu->pgtable_levels = pgtable_levels;
return mmu;
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index a3e2eae981da..5d799ec5f7c6 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -56,6 +56,16 @@ int vcpu_enable_evmcs(struct kvm_vcpu *vcpu)
return evmcs_ver;
}
+void vm_enable_ept(struct kvm_vm *vm)
+{
+ TEST_ASSERT(kvm_cpu_has_ept(), "KVM doesn't support nested EPT");
+ if (vm->arch.nested.mmu)
+ return;
+
+ /* EPTP_PWL_4 is always used */
+ vm->arch.nested.mmu = mmu_create(vm, 4, NULL);
+}
+
/* Allocate memory regions for nested VMX tests.
*
* Input Args:
@@ -105,6 +115,9 @@ vcpu_alloc_vmx(struct kvm_vm *vm, vm_vaddr_t *p_vmx_gva)
vmx->vmwrite_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->vmwrite);
memset(vmx->vmwrite_hva, 0, getpagesize());
+ if (vm->arch.nested.mmu)
+ vmx->eptp_gpa = vm->arch.nested.mmu->root_gpa;
+
*p_vmx_gva = vmx_gva;
return vmx;
}
@@ -395,7 +408,8 @@ void __tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
uint64_t nested_paddr, uint64_t paddr, int target_level)
{
const uint64_t page_size = PG_LEVEL_SIZE(target_level);
- struct eptPageTableEntry *pt = vmx->eptp_hva, *pte;
+ void *eptp_hva = addr_gpa2hva(vm, vm->arch.nested.mmu->root_gpa);
+ struct eptPageTableEntry *pt = eptp_hva, *pte;
uint16_t index;
TEST_ASSERT(vm->mode == VM_MODE_PXXVYY_4K,
@@ -525,15 +539,6 @@ bool kvm_cpu_has_ept(void)
return ctrl & SECONDARY_EXEC_ENABLE_EPT;
}
-void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm)
-{
- TEST_ASSERT(kvm_cpu_has_ept(), "KVM doesn't support nested EPT");
-
- vmx->eptp = (void *)vm_vaddr_alloc_page(vm);
- vmx->eptp_hva = addr_gva2hva(vm, (uintptr_t)vmx->eptp);
- vmx->eptp_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->eptp);
-}
-
void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm)
{
vmx->apic_access = (void *)vm_vaddr_alloc_page(vm);
diff --git a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
index e7d0c08ba29d..5c8cf8ac42a2 100644
--- a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
@@ -93,6 +93,9 @@ static void test_vmx_dirty_log(bool enable_ept)
/* Create VM */
vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
+ if (enable_ept)
+ vm_enable_ept(vm);
+
vmx = vcpu_alloc_vmx(vm, &vmx_pages_gva);
vcpu_args_set(vcpu, 1, vmx_pages_gva);
@@ -113,14 +116,10 @@ static void test_vmx_dirty_log(bool enable_ept)
* ... pages in the L2 GPA range [0xc0001000, 0xc0003000) will map to
* 0xc0000000.
*
- * Note that prepare_eptp should be called only L1's GPA map is done,
- * meaning after the last call to virt_map.
- *
* When EPT is disabled, the L2 guest code will still access the same L1
* GPAs as the EPT enabled case.
*/
if (enable_ept) {
- prepare_eptp(vmx, vm);
tdp_identity_map_default_memslots(vmx, vm);
tdp_map(vmx, vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
tdp_map(vmx, vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
--
2.52.0.158.g65b55ccf14-goog
* [PATCH v3 09/16] KVM: selftests: Stop passing VMX metadata to TDP mapping functions
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
` (7 preceding siblings ...)
2025-11-27 1:34 ` [PATCH v3 08/16] KVM: selftests: Use a nested MMU to share nested EPTs between vCPUs Yosry Ahmed
@ 2025-11-27 1:34 ` Yosry Ahmed
2025-12-15 18:38 ` Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 10/16] KVM: selftests: Reuse virt mapping functions for nested EPTs Yosry Ahmed
` (7 subsequent siblings)
16 siblings, 1 reply; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
The root GPA can now be retrieved from the nested MMU, so stop passing
VMX metadata. This is in preparation for making these functions work for
NPTs as well.
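For illustration, callers now pass only the VM, e.g. (from the diffs
below):

	tdp_identity_map_1g(vm, 0, 0x100000000ULL);
	tdp_map(vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);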
Opportunistically drop tdp_pg_map() since it's unused.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
tools/testing/selftests/kvm/include/x86/vmx.h | 11 ++-----
.../testing/selftests/kvm/lib/x86/memstress.c | 11 +++----
tools/testing/selftests/kvm/lib/x86/vmx.c | 33 +++++++------------
.../selftests/kvm/x86/vmx_dirty_log_test.c | 9 +++--
4 files changed, 24 insertions(+), 40 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
index 1fd83c23529a..4dd4c2094ee6 100644
--- a/tools/testing/selftests/kvm/include/x86/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86/vmx.h
@@ -557,14 +557,9 @@ bool load_vmcs(struct vmx_pages *vmx);
bool ept_1g_pages_supported(void);
-void tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm, uint64_t nested_paddr,
- uint64_t paddr);
-void tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm, uint64_t nested_paddr,
- uint64_t paddr, uint64_t size);
-void tdp_identity_map_default_memslots(struct vmx_pages *vmx,
- struct kvm_vm *vm);
-void tdp_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t addr, uint64_t size);
+void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
+void tdp_identity_map_default_memslots(struct kvm_vm *vm);
+void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size);
bool kvm_cpu_has_ept(void);
void vm_enable_ept(struct kvm_vm *vm);
void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm);
diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
index 00f7f11e5f0e..3319cb57a78d 100644
--- a/tools/testing/selftests/kvm/lib/x86/memstress.c
+++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
@@ -59,7 +59,7 @@ uint64_t memstress_nested_pages(int nr_vcpus)
return 513 + 10 * nr_vcpus;
}
-static void memstress_setup_ept_mappings(struct vmx_pages *vmx, struct kvm_vm *vm)
+static void memstress_setup_ept_mappings(struct kvm_vm *vm)
{
uint64_t start, end;
@@ -68,16 +68,15 @@ static void memstress_setup_ept_mappings(struct vmx_pages *vmx, struct kvm_vm *v
* KVM can shadow the EPT12 with the maximum huge page size supported
* by the backing source.
*/
- tdp_identity_map_1g(vmx, vm, 0, 0x100000000ULL);
+ tdp_identity_map_1g(vm, 0, 0x100000000ULL);
start = align_down(memstress_args.gpa, PG_SIZE_1G);
end = align_up(memstress_args.gpa + memstress_args.size, PG_SIZE_1G);
- tdp_identity_map_1g(vmx, vm, start, end - start);
+ tdp_identity_map_1g(vm, start, end - start);
}
void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vcpus[])
{
- struct vmx_pages *vmx;
struct kvm_regs regs;
vm_vaddr_t vmx_gva;
int vcpu_id;
@@ -87,11 +86,11 @@ void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vc
vm_enable_ept(vm);
for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
- vmx = vcpu_alloc_vmx(vm, &vmx_gva);
+ vcpu_alloc_vmx(vm, &vmx_gva);
/* The EPTs are shared across vCPUs, setup the mappings once */
if (vcpu_id == 0)
- memstress_setup_ept_mappings(vmx, vm);
+ memstress_setup_ept_mappings(vm);
/*
* Override the vCPU to run memstress_l1_guest_code() which will
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index 5d799ec5f7c6..a909fad57fd5 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -404,8 +404,8 @@ static void tdp_create_pte(struct kvm_vm *vm,
}
-void __tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr, int target_level)
+void __tdp_pg_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
+ int target_level)
{
const uint64_t page_size = PG_LEVEL_SIZE(target_level);
void *eptp_hva = addr_gpa2hva(vm, vm->arch.nested.mmu->root_gpa);
@@ -448,12 +448,6 @@ void __tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
}
}
-void tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr)
-{
- __tdp_pg_map(vmx, vm, nested_paddr, paddr, PG_LEVEL_4K);
-}
-
/*
* Map a range of EPT guest physical addresses to the VM's physical address
*
@@ -471,9 +465,8 @@ void tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
* Within the VM given by vm, creates a nested guest translation for the
* page range starting at nested_paddr to the page range starting at paddr.
*/
-void __tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr, uint64_t size,
- int level)
+void __tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
+ uint64_t size, int level)
{
size_t page_size = PG_LEVEL_SIZE(level);
size_t npages = size / page_size;
@@ -482,23 +475,22 @@ void __tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm,
TEST_ASSERT(paddr + size > paddr, "Paddr overflow");
while (npages--) {
- __tdp_pg_map(vmx, vm, nested_paddr, paddr, level);
+ __tdp_pg_map(vm, nested_paddr, paddr, level);
nested_paddr += page_size;
paddr += page_size;
}
}
-void tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr, uint64_t size)
+void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
+ uint64_t size)
{
- __tdp_map(vmx, vm, nested_paddr, paddr, size, PG_LEVEL_4K);
+ __tdp_map(vm, nested_paddr, paddr, size, PG_LEVEL_4K);
}
/* Prepare an identity extended page table that maps all the
* physical pages in VM.
*/
-void tdp_identity_map_default_memslots(struct vmx_pages *vmx,
- struct kvm_vm *vm)
+void tdp_identity_map_default_memslots(struct kvm_vm *vm)
{
uint32_t s, memslot = 0;
sparsebit_idx_t i, last;
@@ -515,16 +507,15 @@ void tdp_identity_map_default_memslots(struct vmx_pages *vmx,
if (i > last)
break;
- tdp_map(vmx, vm, (uint64_t)i << vm->page_shift,
+ tdp_map(vm, (uint64_t)i << vm->page_shift,
(uint64_t)i << vm->page_shift, 1 << vm->page_shift);
}
}
/* Identity map a region with 1GiB Pages. */
-void tdp_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t addr, uint64_t size)
+void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size)
{
- __tdp_map(vmx, vm, addr, addr, size, PG_LEVEL_1G);
+ __tdp_map(vm, addr, addr, size, PG_LEVEL_1G);
}
bool kvm_cpu_has_ept(void)
diff --git a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
index 5c8cf8ac42a2..370f8d3117c2 100644
--- a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
@@ -80,7 +80,6 @@ void l1_guest_code(struct vmx_pages *vmx)
static void test_vmx_dirty_log(bool enable_ept)
{
vm_vaddr_t vmx_pages_gva = 0;
- struct vmx_pages *vmx;
unsigned long *bmap;
uint64_t *host_test_mem;
@@ -96,7 +95,7 @@ static void test_vmx_dirty_log(bool enable_ept)
if (enable_ept)
vm_enable_ept(vm);
- vmx = vcpu_alloc_vmx(vm, &vmx_pages_gva);
+ vcpu_alloc_vmx(vm, &vmx_pages_gva);
vcpu_args_set(vcpu, 1, vmx_pages_gva);
/* Add an extra memory slot for testing dirty logging */
@@ -120,9 +119,9 @@ static void test_vmx_dirty_log(bool enable_ept)
* GPAs as the EPT enabled case.
*/
if (enable_ept) {
- tdp_identity_map_default_memslots(vmx, vm);
- tdp_map(vmx, vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
- tdp_map(vmx, vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
+ tdp_identity_map_default_memslots(vm);
+ tdp_map(vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
+ tdp_map(vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
}
bmap = bitmap_zalloc(TEST_MEM_PAGES);
--
2.52.0.158.g65b55ccf14-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v3 10/16] KVM: selftests: Reuse virt mapping functions for nested EPTs
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
` (8 preceding siblings ...)
2025-11-27 1:34 ` [PATCH v3 09/16] KVM: selftests: Stop passing VMX metadata to TDP mapping functions Yosry Ahmed
@ 2025-11-27 1:34 ` Yosry Ahmed
2025-12-23 23:12 ` Sean Christopherson
2025-12-23 23:14 ` Sean Christopherson
2025-11-27 1:34 ` [PATCH v3 11/16] KVM: selftests: Move TDP mapping functions outside of vmx.c Yosry Ahmed
` (6 subsequent siblings)
16 siblings, 2 replies; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
__tdp_pg_map() bears a lot of resemblance to __virt_pg_map(). The
main differences are:
- It uses the EPT struct overlay instead of the PTE masks.
- It always assumes 4-level EPTs.
To reuse __virt_pg_map(), initialize the PTE masks in the nested MMU with
EPT PTE masks. EPTs have no 'present' or 'user' bits, so use the
'readable' bit instead like shadow_{present/user}_mask, ignoring the
fact that entries can be present and not readable if the CPU has
VMX_EPT_EXECUTE_ONLY_BIT. This is simple and sufficient for testing.
Add an executable bitmask and update __virt_pg_map() and friends to set
the bit on newly created entries to match the EPT behavior. It's a nop
for x86 page tables.
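For illustration only (not part of the diff), with the EPT masks
introduced below, a 4K leaf installed by __virt_pg_map() expands to:
	/* readable | writable | executable, i.e. an RWX EPT entry */
	*pte = BIT_ULL(0) | BIT_ULL(1) | BIT_ULL(2) | (paddr & PHYSICAL_PAGE_MASK);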
Another benefit of reusing the code is having separate handling for
upper-level PTEs vs 4K PTEs, which avoids some quirks like setting the
large bit on a 4K PTE in the EPTs.
No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
.../selftests/kvm/include/x86/processor.h | 3 +
.../testing/selftests/kvm/lib/x86/processor.c | 12 +-
tools/testing/selftests/kvm/lib/x86/vmx.c | 115 ++++--------------
3 files changed, 33 insertions(+), 97 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index fb2b2e53d453..62e10b296719 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1447,6 +1447,7 @@ struct pte_masks {
uint64_t dirty;
uint64_t huge;
uint64_t nx;
+ uint64_t x;
uint64_t c;
uint64_t s;
};
@@ -1464,6 +1465,7 @@ struct kvm_mmu {
#define PTE_DIRTY_MASK(mmu) ((mmu)->pte_masks.dirty)
#define PTE_HUGE_MASK(mmu) ((mmu)->pte_masks.huge)
#define PTE_NX_MASK(mmu) ((mmu)->pte_masks.nx)
+#define PTE_X_MASK(mmu) ((mmu)->pte_masks.x)
#define PTE_C_MASK(mmu) ((mmu)->pte_masks.c)
#define PTE_S_MASK(mmu) ((mmu)->pte_masks.s)
@@ -1474,6 +1476,7 @@ struct kvm_mmu {
#define pte_dirty(mmu, pte) (!!(*(pte) & PTE_DIRTY_MASK(mmu)))
#define pte_huge(mmu, pte) (!!(*(pte) & PTE_HUGE_MASK(mmu)))
#define pte_nx(mmu, pte) (!!(*(pte) & PTE_NX_MASK(mmu)))
+#define pte_x(mmu, pte) (!!(*(pte) & PTE_X_MASK(mmu)))
#define pte_c(mmu, pte) (!!(*(pte) & PTE_C_MASK(mmu)))
#define pte_s(mmu, pte) (!!(*(pte) & PTE_S_MASK(mmu)))
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index bff75ff05364..8b0e17f8ca37 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -162,8 +162,7 @@ struct kvm_mmu *mmu_create(struct kvm_vm *vm, int pgtable_levels,
struct kvm_mmu *mmu = calloc(1, sizeof(*mmu));
TEST_ASSERT(mmu, "-ENOMEM when allocating MMU");
- if (pte_masks)
- mmu->pte_masks = *pte_masks;
+ mmu->pte_masks = *pte_masks;
mmu->root_gpa = vm_alloc_page_table(vm);
mmu->pgtable_levels = pgtable_levels;
return mmu;
@@ -179,6 +178,7 @@ static void mmu_init(struct kvm_vm *vm)
.dirty = BIT_ULL(6),
.huge = BIT_ULL(7),
.nx = BIT_ULL(63),
+ .x = 0,
.c = vm->arch.c_bit,
.s = vm->arch.s_bit,
};
@@ -225,7 +225,7 @@ static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
paddr = vm_untag_gpa(vm, paddr);
if (!pte_present(mmu, pte)) {
- *pte = PTE_PRESENT_MASK(mmu) | PTE_WRITABLE_MASK(mmu);
+ *pte = PTE_PRESENT_MASK(mmu) | PTE_WRITABLE_MASK(mmu) | PTE_X_MASK(mmu);
if (current_level == target_level)
*pte |= PTE_HUGE_MASK(mmu) | (paddr & PHYSICAL_PAGE_MASK);
else
@@ -271,6 +271,9 @@ void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
TEST_ASSERT(vm_untag_gpa(vm, paddr) == paddr,
"Unexpected bits in paddr: %lx", paddr);
+ TEST_ASSERT(!PTE_X_MASK(mmu) || !PTE_NX_MASK(mmu),
+ "X and NX bit masks cannot be used simultaneously");
+
/*
* Allocate upper level page tables, if not already present. Return
* early if a hugepage was created.
@@ -288,7 +291,8 @@ void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
pte = virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
TEST_ASSERT(!pte_present(mmu, pte),
"PTE already present for 4k page at vaddr: 0x%lx", vaddr);
- *pte = PTE_PRESENT_MASK(mmu) | PTE_WRITABLE_MASK(mmu) | (paddr & PHYSICAL_PAGE_MASK);
+ *pte = PTE_PRESENT_MASK(mmu) | PTE_WRITABLE_MASK(mmu) | PTE_X_MASK(mmu)
+ | (paddr & PHYSICAL_PAGE_MASK);
/*
* Neither SEV nor TDX supports shared page tables, so only the final
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index a909fad57fd5..0cba31cae896 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -25,21 +25,6 @@ bool enable_evmcs;
struct hv_enlightened_vmcs *current_evmcs;
struct hv_vp_assist_page *current_vp_assist;
-struct eptPageTableEntry {
- uint64_t readable:1;
- uint64_t writable:1;
- uint64_t executable:1;
- uint64_t memory_type:3;
- uint64_t ignore_pat:1;
- uint64_t page_size:1;
- uint64_t accessed:1;
- uint64_t dirty:1;
- uint64_t ignored_11_10:2;
- uint64_t address:40;
- uint64_t ignored_62_52:11;
- uint64_t suppress_ve:1;
-};
-
int vcpu_enable_evmcs(struct kvm_vcpu *vcpu)
{
uint16_t evmcs_ver;
@@ -58,12 +43,31 @@ int vcpu_enable_evmcs(struct kvm_vcpu *vcpu)
void vm_enable_ept(struct kvm_vm *vm)
{
+ struct pte_masks pte_masks;
+
TEST_ASSERT(kvm_cpu_has_ept(), "KVM doesn't support nested EPT");
if (vm->arch.nested.mmu)
return;
+ /*
+	 * EPTs do not have 'present' or 'user' bits; instead, bit 0 is the
+ * 'readable' bit. In some cases, EPTs can be execute-only and an entry
+ * is present but not readable. However, for the purposes of testing we
+ * assume 'present' == 'user' == 'readable' for simplicity.
+ */
+ pte_masks = (struct pte_masks){
+ .present = BIT_ULL(0),
+ .user = BIT_ULL(0),
+ .writable = BIT_ULL(1),
+ .x = BIT_ULL(2),
+ .accessed = BIT_ULL(5),
+ .dirty = BIT_ULL(6),
+ .huge = BIT_ULL(7),
+ .nx = 0,
+ };
+
/* EPTP_PWL_4 is always used */
- vm->arch.nested.mmu = mmu_create(vm, 4, NULL);
+ vm->arch.nested.mmu = mmu_create(vm, 4, &pte_masks);
}
/* Allocate memory regions for nested VMX tests.
@@ -372,82 +376,6 @@ void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip, void *guest_rsp)
init_vmcs_guest_state(guest_rip, guest_rsp);
}
-static void tdp_create_pte(struct kvm_vm *vm,
- struct eptPageTableEntry *pte,
- uint64_t nested_paddr,
- uint64_t paddr,
- int current_level,
- int target_level)
-{
- if (!pte->readable) {
- pte->writable = true;
- pte->readable = true;
- pte->executable = true;
- pte->page_size = (current_level == target_level);
- if (pte->page_size)
- pte->address = paddr >> vm->page_shift;
- else
- pte->address = vm_alloc_page_table(vm) >> vm->page_shift;
- } else {
- /*
- * Entry already present. Assert that the caller doesn't want
- * a hugepage at this level, and that there isn't a hugepage at
- * this level.
- */
- TEST_ASSERT(current_level != target_level,
- "Cannot create hugepage at level: %u, nested_paddr: 0x%lx",
- current_level, nested_paddr);
- TEST_ASSERT(!pte->page_size,
- "Cannot create page table at level: %u, nested_paddr: 0x%lx",
- current_level, nested_paddr);
- }
-}
-
-
-void __tdp_pg_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
- int target_level)
-{
- const uint64_t page_size = PG_LEVEL_SIZE(target_level);
- void *eptp_hva = addr_gpa2hva(vm, vm->arch.nested.mmu->root_gpa);
- struct eptPageTableEntry *pt = eptp_hva, *pte;
- uint16_t index;
-
- TEST_ASSERT(vm->mode == VM_MODE_PXXVYY_4K,
- "Unknown or unsupported guest mode: 0x%x", vm->mode);
-
- TEST_ASSERT((nested_paddr >> 48) == 0,
- "Nested physical address 0x%lx is > 48-bits and requires 5-level EPT",
- nested_paddr);
- TEST_ASSERT((nested_paddr % page_size) == 0,
- "Nested physical address not on page boundary,\n"
- " nested_paddr: 0x%lx page_size: 0x%lx",
- nested_paddr, page_size);
- TEST_ASSERT((nested_paddr >> vm->page_shift) <= vm->max_gfn,
- "Physical address beyond beyond maximum supported,\n"
- " nested_paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
- paddr, vm->max_gfn, vm->page_size);
- TEST_ASSERT((paddr % page_size) == 0,
- "Physical address not on page boundary,\n"
- " paddr: 0x%lx page_size: 0x%lx",
- paddr, page_size);
- TEST_ASSERT((paddr >> vm->page_shift) <= vm->max_gfn,
- "Physical address beyond beyond maximum supported,\n"
- " paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
- paddr, vm->max_gfn, vm->page_size);
-
- for (int level = PG_LEVEL_512G; level >= PG_LEVEL_4K; level--) {
- index = (nested_paddr >> PG_LEVEL_SHIFT(level)) & 0x1ffu;
- pte = &pt[index];
-
- tdp_create_pte(vm, pte, nested_paddr, paddr, level, target_level);
-
- if (pte->page_size)
- break;
-
- pt = addr_gpa2hva(vm, pte->address * vm->page_size);
- }
-}
-
/*
* Map a range of EPT guest physical addresses to the VM's physical address
*
@@ -468,6 +396,7 @@ void __tdp_pg_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
void __tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
uint64_t size, int level)
{
+ struct kvm_mmu *mmu = vm->arch.nested.mmu;
size_t page_size = PG_LEVEL_SIZE(level);
size_t npages = size / page_size;
@@ -475,7 +404,7 @@ void __tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
TEST_ASSERT(paddr + size > paddr, "Paddr overflow");
while (npages--) {
- __tdp_pg_map(vm, nested_paddr, paddr, level);
+ __virt_pg_map(vm, mmu, nested_paddr, paddr, level);
nested_paddr += page_size;
paddr += page_size;
}
--
2.52.0.158.g65b55ccf14-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v3 11/16] KVM: selftests: Move TDP mapping functions outside of vmx.c
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
` (9 preceding siblings ...)
2025-11-27 1:34 ` [PATCH v3 10/16] KVM: selftests: Reuse virt mapping functions for nested EPTs Yosry Ahmed
@ 2025-11-27 1:34 ` Yosry Ahmed
2025-12-23 23:13 ` Sean Christopherson
2025-11-27 1:34 ` [PATCH v3 12/16] KVM: selftests: Allow kvm_cpu_has_ept() to be called on AMD CPUs Yosry Ahmed
` (5 subsequent siblings)
16 siblings, 1 reply; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Now that the functions are no longer VMX-specific, move them to
processor.c. Do a minor comment tweak replacing 'EPT' with 'TDP'.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
.../selftests/kvm/include/x86/processor.h | 4 ++
tools/testing/selftests/kvm/include/x86/vmx.h | 3 -
.../testing/selftests/kvm/lib/x86/processor.c | 71 +++++++++++++++++++
tools/testing/selftests/kvm/lib/x86/vmx.c | 71 -------------------
4 files changed, 75 insertions(+), 74 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 62e10b296719..95216b513379 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1487,6 +1487,10 @@ void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
uint64_t nr_bytes, int level);
+void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
+void tdp_identity_map_default_memslots(struct kvm_vm *vm);
+void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size);
+
/*
* Basic CPU control in CR0
*/
diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
index 4dd4c2094ee6..92b918700d24 100644
--- a/tools/testing/selftests/kvm/include/x86/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86/vmx.h
@@ -557,9 +557,6 @@ bool load_vmcs(struct vmx_pages *vmx);
bool ept_1g_pages_supported(void);
-void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
-void tdp_identity_map_default_memslots(struct kvm_vm *vm);
-void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size);
bool kvm_cpu_has_ept(void);
void vm_enable_ept(struct kvm_vm *vm);
void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm);
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index 8b0e17f8ca37..517a8185eade 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -467,6 +467,77 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
}
}
+/*
+ * Map a range of TDP guest physical addresses to the VM's physical address
+ *
+ * Input Args:
+ * vm - Virtual Machine
+ * nested_paddr - Nested guest physical address to map
+ * paddr - VM Physical Address
+ * size - The size of the range to map
+ * level - The level at which to map the range
+ *
+ * Output Args: None
+ *
+ * Return: None
+ *
+ * Within the VM given by vm, creates a nested guest translation for the
+ * page range starting at nested_paddr to the page range starting at paddr.
+ */
+void __tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
+ uint64_t size, int level)
+{
+ struct kvm_mmu *mmu = vm->arch.nested.mmu;
+ size_t page_size = PG_LEVEL_SIZE(level);
+ size_t npages = size / page_size;
+
+ TEST_ASSERT(nested_paddr + size > nested_paddr, "Vaddr overflow");
+ TEST_ASSERT(paddr + size > paddr, "Paddr overflow");
+
+ while (npages--) {
+ __virt_pg_map(vm, mmu, nested_paddr, paddr, level);
+ nested_paddr += page_size;
+ paddr += page_size;
+ }
+}
+
+void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
+ uint64_t size)
+{
+ __tdp_map(vm, nested_paddr, paddr, size, PG_LEVEL_4K);
+}
+
+/* Prepare an identity extended page table that maps all the
+ * physical pages in VM.
+ */
+void tdp_identity_map_default_memslots(struct kvm_vm *vm)
+{
+ uint32_t s, memslot = 0;
+ sparsebit_idx_t i, last;
+ struct userspace_mem_region *region = memslot2region(vm, memslot);
+
+ /* Only memslot 0 is mapped here, ensure it's the only one being used */
+ for (s = 0; s < NR_MEM_REGIONS; s++)
+ TEST_ASSERT_EQ(vm->memslots[s], 0);
+
+ i = (region->region.guest_phys_addr >> vm->page_shift) - 1;
+ last = i + (region->region.memory_size >> vm->page_shift);
+ for (;;) {
+ i = sparsebit_next_clear(region->unused_phy_pages, i);
+ if (i > last)
+ break;
+
+ tdp_map(vm, (uint64_t)i << vm->page_shift,
+ (uint64_t)i << vm->page_shift, 1 << vm->page_shift);
+ }
+}
+
+/* Identity map a region with 1GiB Pages. */
+void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size)
+{
+ __tdp_map(vm, addr, addr, size, PG_LEVEL_1G);
+}
+
/*
* Set Unusable Segment
*
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index 0cba31cae896..4289c9eb61ff 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -376,77 +376,6 @@ void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip, void *guest_rsp)
init_vmcs_guest_state(guest_rip, guest_rsp);
}
-/*
- * Map a range of EPT guest physical addresses to the VM's physical address
- *
- * Input Args:
- * vm - Virtual Machine
- * nested_paddr - Nested guest physical address to map
- * paddr - VM Physical Address
- * size - The size of the range to map
- * level - The level at which to map the range
- *
- * Output Args: None
- *
- * Return: None
- *
- * Within the VM given by vm, creates a nested guest translation for the
- * page range starting at nested_paddr to the page range starting at paddr.
- */
-void __tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
- uint64_t size, int level)
-{
- struct kvm_mmu *mmu = vm->arch.nested.mmu;
- size_t page_size = PG_LEVEL_SIZE(level);
- size_t npages = size / page_size;
-
- TEST_ASSERT(nested_paddr + size > nested_paddr, "Vaddr overflow");
- TEST_ASSERT(paddr + size > paddr, "Paddr overflow");
-
- while (npages--) {
- __virt_pg_map(vm, mmu, nested_paddr, paddr, level);
- nested_paddr += page_size;
- paddr += page_size;
- }
-}
-
-void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
- uint64_t size)
-{
- __tdp_map(vm, nested_paddr, paddr, size, PG_LEVEL_4K);
-}
-
-/* Prepare an identity extended page table that maps all the
- * physical pages in VM.
- */
-void tdp_identity_map_default_memslots(struct kvm_vm *vm)
-{
- uint32_t s, memslot = 0;
- sparsebit_idx_t i, last;
- struct userspace_mem_region *region = memslot2region(vm, memslot);
-
- /* Only memslot 0 is mapped here, ensure it's the only one being used */
- for (s = 0; s < NR_MEM_REGIONS; s++)
- TEST_ASSERT_EQ(vm->memslots[s], 0);
-
- i = (region->region.guest_phys_addr >> vm->page_shift) - 1;
- last = i + (region->region.memory_size >> vm->page_shift);
- for (;;) {
- i = sparsebit_next_clear(region->unused_phy_pages, i);
- if (i > last)
- break;
-
- tdp_map(vm, (uint64_t)i << vm->page_shift,
- (uint64_t)i << vm->page_shift, 1 << vm->page_shift);
- }
-}
-
-/* Identity map a region with 1GiB Pages. */
-void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size)
-{
- __tdp_map(vm, addr, addr, size, PG_LEVEL_1G);
-}
-
bool kvm_cpu_has_ept(void)
{
uint64_t ctrl;
--
2.52.0.158.g65b55ccf14-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v3 12/16] KVM: selftests: Allow kvm_cpu_has_ept() to be called on AMD CPUs
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
` (10 preceding siblings ...)
2025-11-27 1:34 ` [PATCH v3 11/16] KVM: selftests: Move TDP mapping functions outside of vmx.c Yosry Ahmed
@ 2025-11-27 1:34 ` Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 13/16] KVM: selftests: Add support for nested NPTs Yosry Ahmed
` (4 subsequent siblings)
16 siblings, 0 replies; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
In preparation for generalizing the nested dirty logging test, it will be
necessary to check whether either EPT or NPT is supported. To avoid having to gate
the kvm_cpu_has_ept() call by the CPU type, make sure the function
returns false if VMX is not available instead of trying to read VMX-only
MSRs.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
tools/testing/selftests/kvm/lib/x86/vmx.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index 4289c9eb61ff..f70d0d591d5a 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -380,6 +380,9 @@ bool kvm_cpu_has_ept(void)
{
uint64_t ctrl;
+ if (!kvm_cpu_has(X86_FEATURE_VMX))
+ return false;
+
ctrl = kvm_get_feature_msr(MSR_IA32_VMX_TRUE_PROCBASED_CTLS) >> 32;
if (!(ctrl & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS))
return false;
--
2.52.0.158.g65b55ccf14-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v3 13/16] KVM: selftests: Add support for nested NPTs
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
` (11 preceding siblings ...)
2025-11-27 1:34 ` [PATCH v3 12/16] KVM: selftests: Allow kvm_cpu_has_ept() to be called on AMD CPUs Yosry Ahmed
@ 2025-11-27 1:34 ` Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 14/16] KVM: selftests: Set the user bit on nested NPT PTEs Yosry Ahmed
` (3 subsequent siblings)
16 siblings, 0 replies; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Implement nCR3 and NPT initialization functions, similar to the EPT
equivalents, and create common TDP helpers for enablement checking and
initialization. Enable NPT for nested guests by default if the TDP MMU
was initialized, similar to VMX.
Reuse the PTE masks from the main MMU in the NPT MMU, except for the C
and S bits related to confidential VMs.
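For reference, a sketch of the intended usage (mirroring the hunks below):
	TEST_REQUIRE(kvm_cpu_has_tdp());
	vm_enable_tdp(vm);	/* EPT on Intel, NPT on AMD */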
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
.../selftests/kvm/include/x86/processor.h | 2 ++
.../selftests/kvm/include/x86/svm_util.h | 9 ++++++++
.../testing/selftests/kvm/lib/x86/memstress.c | 4 ++--
.../testing/selftests/kvm/lib/x86/processor.c | 15 +++++++++++++
tools/testing/selftests/kvm/lib/x86/svm.c | 22 +++++++++++++++++++
.../selftests/kvm/x86/vmx_dirty_log_test.c | 4 ++--
6 files changed, 52 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 95216b513379..920abd14f3a6 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1487,6 +1487,8 @@ void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
uint64_t nr_bytes, int level);
+void vm_enable_tdp(struct kvm_vm *vm);
+bool kvm_cpu_has_tdp(void);
void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
void tdp_identity_map_default_memslots(struct kvm_vm *vm);
void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size);
diff --git a/tools/testing/selftests/kvm/include/x86/svm_util.h b/tools/testing/selftests/kvm/include/x86/svm_util.h
index b74c6dcddcbd..5d7c42534bc4 100644
--- a/tools/testing/selftests/kvm/include/x86/svm_util.h
+++ b/tools/testing/selftests/kvm/include/x86/svm_util.h
@@ -27,6 +27,9 @@ struct svm_test_data {
void *msr; /* gva */
void *msr_hva;
uint64_t msr_gpa;
+
+ /* NPT */
+ uint64_t ncr3_gpa;
};
static inline void vmmcall(void)
@@ -57,6 +60,12 @@ struct svm_test_data *vcpu_alloc_svm(struct kvm_vm *vm, vm_vaddr_t *p_svm_gva);
void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_rsp);
void run_guest(struct vmcb *vmcb, uint64_t vmcb_gpa);
+static inline bool kvm_cpu_has_npt(void)
+{
+ return kvm_cpu_has(X86_FEATURE_NPT);
+}
+void vm_enable_npt(struct kvm_vm *vm);
+
int open_sev_dev_path_or_exit(void);
#endif /* SELFTEST_KVM_SVM_UTILS_H */
diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
index 3319cb57a78d..407abfc34909 100644
--- a/tools/testing/selftests/kvm/lib/x86/memstress.c
+++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
@@ -82,9 +82,9 @@ void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vc
int vcpu_id;
TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
- TEST_REQUIRE(kvm_cpu_has_ept());
+ TEST_REQUIRE(kvm_cpu_has_tdp());
- vm_enable_ept(vm);
+ vm_enable_tdp(vm);
for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
vcpu_alloc_vmx(vm, &vmx_gva);
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index 517a8185eade..b22c8c1bfdc3 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -8,7 +8,9 @@
#include "kvm_util.h"
#include "pmu.h"
#include "processor.h"
+#include "svm_util.h"
#include "sev.h"
+#include "vmx.h"
#ifndef NUM_INTERRUPTS
#define NUM_INTERRUPTS 256
@@ -467,6 +469,19 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
}
}
+void vm_enable_tdp(struct kvm_vm *vm)
+{
+ if (kvm_cpu_has(X86_FEATURE_VMX))
+ vm_enable_ept(vm);
+ else
+ vm_enable_npt(vm);
+}
+
+bool kvm_cpu_has_tdp(void)
+{
+ return kvm_cpu_has_ept() || kvm_cpu_has_npt();
+}
+
/*
* Map a range of TDP guest physical addresses to the VM's physical address
*
diff --git a/tools/testing/selftests/kvm/lib/x86/svm.c b/tools/testing/selftests/kvm/lib/x86/svm.c
index d239c2097391..cf3b98802164 100644
--- a/tools/testing/selftests/kvm/lib/x86/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86/svm.c
@@ -59,6 +59,23 @@ static void vmcb_set_seg(struct vmcb_seg *seg, u16 selector,
seg->base = base;
}
+void vm_enable_npt(struct kvm_vm *vm)
+{
+ struct pte_masks pte_masks;
+
+	TEST_ASSERT(kvm_cpu_has_npt(), "KVM doesn't support nested NPT");
+
+ if (vm->arch.nested.mmu)
+ return;
+
+ /* NPTs use the same PTE format, except for C/S bits */
+ pte_masks = vm->arch.mmu->pte_masks;
+ pte_masks.c = 0;
+ pte_masks.s = 0;
+
+ vm->arch.nested.mmu = mmu_create(vm, vm->pgtable_levels, &pte_masks);
+}
+
void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_rsp)
{
struct vmcb *vmcb = svm->vmcb;
@@ -102,6 +119,11 @@ void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_r
vmcb->save.rip = (u64)guest_rip;
vmcb->save.rsp = (u64)guest_rsp;
guest_regs.rdi = (u64)svm;
+
+ if (svm->ncr3_gpa) {
+ ctrl->nested_ctl |= SVM_NESTED_CTL_NP_ENABLE;
+ ctrl->nested_cr3 = svm->ncr3_gpa;
+ }
}
/*
diff --git a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
index 370f8d3117c2..032ab8bf60a4 100644
--- a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
@@ -93,7 +93,7 @@ static void test_vmx_dirty_log(bool enable_ept)
/* Create VM */
vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
if (enable_ept)
- vm_enable_ept(vm);
+ vm_enable_tdp(vm);
vcpu_alloc_vmx(vm, &vmx_pages_gva);
vcpu_args_set(vcpu, 1, vmx_pages_gva);
@@ -170,7 +170,7 @@ int main(int argc, char *argv[])
test_vmx_dirty_log(/*enable_ept=*/false);
- if (kvm_cpu_has_ept())
+ if (kvm_cpu_has_tdp())
test_vmx_dirty_log(/*enable_ept=*/true);
return 0;
--
2.52.0.158.g65b55ccf14-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v3 14/16] KVM: selftests: Set the user bit on nested NPT PTEs
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
` (12 preceding siblings ...)
2025-11-27 1:34 ` [PATCH v3 13/16] KVM: selftests: Add support for nested NPTs Yosry Ahmed
@ 2025-11-27 1:34 ` Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 15/16] KVM: selftests: Extend vmx_dirty_log_test to cover SVM Yosry Ahmed
` (2 subsequent siblings)
16 siblings, 0 replies; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
According to the APM, NPT walks are treated as user accesses. In
preparation for supporting NPT mappings, set the 'user' bit on NPT PTEs
by adding a mask of bits to kvm_mmu that is always set on newly created
PTEs.
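As an illustration only (assuming the standard x86 PTE layout set up by
mmu_init(), i.e. present = bit 0, writable = bit 1, user = bit 2), a 4K
NPT leaf created by __virt_pg_map() then ends up as:
	/* present | writable | user (via always_set) */
	*pte = BIT_ULL(0) | BIT_ULL(1) | BIT_ULL(2) | (paddr & PHYSICAL_PAGE_MASK);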
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
tools/testing/selftests/kvm/include/x86/processor.h | 4 ++++
tools/testing/selftests/kvm/lib/x86/processor.c | 6 ++++--
tools/testing/selftests/kvm/lib/x86/svm.c | 3 +++
3 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 920abd14f3a6..d41245e2521b 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1450,6 +1450,8 @@ struct pte_masks {
uint64_t x;
uint64_t c;
uint64_t s;
+
+ uint64_t always_set;
};
struct kvm_mmu {
@@ -1469,6 +1471,8 @@ struct kvm_mmu {
#define PTE_C_MASK(mmu) ((mmu)->pte_masks.c)
#define PTE_S_MASK(mmu) ((mmu)->pte_masks.s)
+#define PTE_ALWAYS_SET_MASK(mmu) ((mmu)->pte_masks.always_set)
+
#define pte_present(mmu, pte) (!!(*(pte) & PTE_PRESENT_MASK(mmu)))
#define pte_writable(mmu, pte) (!!(*(pte) & PTE_WRITABLE_MASK(mmu)))
#define pte_user(mmu, pte) (!!(*(pte) & PTE_USER_MASK(mmu)))
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index b22c8c1bfdc3..749ae7522473 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -227,7 +227,8 @@ static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
paddr = vm_untag_gpa(vm, paddr);
if (!pte_present(mmu, pte)) {
- *pte = PTE_PRESENT_MASK(mmu) | PTE_WRITABLE_MASK(mmu) | PTE_X_MASK(mmu);
+ *pte = PTE_PRESENT_MASK(mmu) | PTE_WRITABLE_MASK(mmu)
+ | PTE_X_MASK(mmu) | PTE_ALWAYS_SET_MASK(mmu);
if (current_level == target_level)
*pte |= PTE_HUGE_MASK(mmu) | (paddr & PHYSICAL_PAGE_MASK);
else
@@ -293,7 +294,8 @@ void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
pte = virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
TEST_ASSERT(!pte_present(mmu, pte),
"PTE already present for 4k page at vaddr: 0x%lx", vaddr);
- *pte = PTE_PRESENT_MASK(mmu) | PTE_WRITABLE_MASK(mmu) | PTE_X_MASK(mmu)
+ *pte = PTE_PRESENT_MASK(mmu) | PTE_WRITABLE_MASK(mmu)
+ | PTE_X_MASK(mmu) | PTE_ALWAYS_SET_MASK(mmu)
| (paddr & PHYSICAL_PAGE_MASK);
/*
diff --git a/tools/testing/selftests/kvm/lib/x86/svm.c b/tools/testing/selftests/kvm/lib/x86/svm.c
index cf3b98802164..838f218545af 100644
--- a/tools/testing/selftests/kvm/lib/x86/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86/svm.c
@@ -73,6 +73,9 @@ void vm_enable_npt(struct kvm_vm *vm)
pte_masks.c = 0;
pte_masks.s = 0;
+ /* NPT walks are treated as user accesses, so set the 'user' bit */
+ pte_masks.always_set = pte_masks.user;
+
vm->arch.nested.mmu = mmu_create(vm, vm->pgtable_levels, &pte_masks);
}
--
2.52.0.158.g65b55ccf14-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v3 15/16] KVM: selftests: Extend vmx_dirty_log_test to cover SVM
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
` (13 preceding siblings ...)
2025-11-27 1:34 ` [PATCH v3 14/16] KVM: selftests: Set the user bit on nested NPT PTEs Yosry Ahmed
@ 2025-11-27 1:34 ` Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 16/16] KVM: selftests: Extend memstress to run on nested SVM Yosry Ahmed
2025-12-23 22:01 ` [PATCH v3 00/16] Add Nested NPT support in selftests Sean Christopherson
16 siblings, 0 replies; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Generalize the code in vmx_dirty_log_test.c by adding SVM-specific L1
code, doing some renaming (e.g. EPT -> TDP), and having setup code for
both SVM and VMX in test_dirty_log().
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
tools/testing/selftests/kvm/Makefile.kvm | 2 +-
...rty_log_test.c => nested_dirty_log_test.c} | 73 ++++++++++++++-----
2 files changed, 54 insertions(+), 21 deletions(-)
rename tools/testing/selftests/kvm/x86/{vmx_dirty_log_test.c => nested_dirty_log_test.c} (71%)
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 7ebf30a87a2b..13fe403f5d82 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -89,6 +89,7 @@ TEST_GEN_PROGS_x86 += x86/kvm_buslock_test
TEST_GEN_PROGS_x86 += x86/monitor_mwait_test
TEST_GEN_PROGS_x86 += x86/msrs_test
TEST_GEN_PROGS_x86 += x86/nested_close_kvm_test
+TEST_GEN_PROGS_x86 += x86/nested_dirty_log_test
TEST_GEN_PROGS_x86 += x86/nested_emulation_test
TEST_GEN_PROGS_x86 += x86/nested_exceptions_test
TEST_GEN_PROGS_x86 += x86/nested_invalid_cr3_test
@@ -115,7 +116,6 @@ TEST_GEN_PROGS_x86 += x86/ucna_injection_test
TEST_GEN_PROGS_x86 += x86/userspace_io_test
TEST_GEN_PROGS_x86 += x86/userspace_msr_exit_test
TEST_GEN_PROGS_x86 += x86/vmx_apic_access_test
-TEST_GEN_PROGS_x86 += x86/vmx_dirty_log_test
TEST_GEN_PROGS_x86 += x86/vmx_exception_with_invalid_guest_state
TEST_GEN_PROGS_x86 += x86/vmx_msrs_test
TEST_GEN_PROGS_x86 += x86/vmx_invalid_nested_guest_state
diff --git a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
similarity index 71%
rename from tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
rename to tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
index 032ab8bf60a4..89d2e86a0db9 100644
--- a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
@@ -12,6 +12,7 @@
#include "test_util.h"
#include "kvm_util.h"
#include "processor.h"
+#include "svm_util.h"
#include "vmx.h"
/* The memory slot index to track dirty pages */
@@ -25,6 +26,8 @@
#define NESTED_TEST_MEM1 0xc0001000
#define NESTED_TEST_MEM2 0xc0002000
+#define L2_GUEST_STACK_SIZE 64
+
static void l2_guest_code(u64 *a, u64 *b)
{
READ_ONCE(*a);
@@ -42,20 +45,19 @@ static void l2_guest_code(u64 *a, u64 *b)
vmcall();
}
-static void l2_guest_code_ept_enabled(void)
+static void l2_guest_code_tdp_enabled(void)
{
l2_guest_code((u64 *)NESTED_TEST_MEM1, (u64 *)NESTED_TEST_MEM2);
}
-static void l2_guest_code_ept_disabled(void)
+static void l2_guest_code_tdp_disabled(void)
{
- /* Access the same L1 GPAs as l2_guest_code_ept_enabled() */
+ /* Access the same L1 GPAs as l2_guest_code_tdp_enabled() */
l2_guest_code((u64 *)GUEST_TEST_MEM, (u64 *)GUEST_TEST_MEM);
}
-void l1_guest_code(struct vmx_pages *vmx)
+void l1_vmx_code(struct vmx_pages *vmx)
{
-#define L2_GUEST_STACK_SIZE 64
unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
void *l2_rip;
@@ -64,22 +66,49 @@ void l1_guest_code(struct vmx_pages *vmx)
GUEST_ASSERT(load_vmcs(vmx));
if (vmx->eptp_gpa)
- l2_rip = l2_guest_code_ept_enabled;
+ l2_rip = l2_guest_code_tdp_enabled;
else
- l2_rip = l2_guest_code_ept_disabled;
+ l2_rip = l2_guest_code_tdp_disabled;
prepare_vmcs(vmx, l2_rip, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
GUEST_SYNC(false);
GUEST_ASSERT(!vmlaunch());
GUEST_SYNC(false);
- GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
+ GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_VMCALL);
GUEST_DONE();
}
-static void test_vmx_dirty_log(bool enable_ept)
+static void l1_svm_code(struct svm_test_data *svm)
{
- vm_vaddr_t vmx_pages_gva = 0;
+ unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+ void *l2_rip;
+
+ if (svm->ncr3_gpa)
+ l2_rip = l2_guest_code_tdp_enabled;
+ else
+ l2_rip = l2_guest_code_tdp_disabled;
+
+ generic_svm_setup(svm, l2_rip, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+ GUEST_SYNC(false);
+ run_guest(svm->vmcb, svm->vmcb_gpa);
+ GUEST_SYNC(false);
+ GUEST_ASSERT_EQ(svm->vmcb->control.exit_code, SVM_EXIT_VMMCALL);
+ GUEST_DONE();
+}
+
+static void l1_guest_code(void *data)
+{
+ if (this_cpu_has(X86_FEATURE_VMX))
+ l1_vmx_code(data);
+ else
+ l1_svm_code(data);
+}
+
+static void test_dirty_log(bool nested_tdp)
+{
+ vm_vaddr_t nested_gva = 0;
unsigned long *bmap;
uint64_t *host_test_mem;
@@ -88,15 +117,19 @@ static void test_vmx_dirty_log(bool enable_ept)
struct ucall uc;
bool done = false;
- pr_info("Nested EPT: %s\n", enable_ept ? "enabled" : "disabled");
+ pr_info("Nested TDP: %s\n", nested_tdp ? "enabled" : "disabled");
/* Create VM */
vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
- if (enable_ept)
+ if (nested_tdp)
vm_enable_tdp(vm);
- vcpu_alloc_vmx(vm, &vmx_pages_gva);
- vcpu_args_set(vcpu, 1, vmx_pages_gva);
+ if (kvm_cpu_has(X86_FEATURE_VMX))
+ vcpu_alloc_vmx(vm, &nested_gva);
+ else
+ vcpu_alloc_svm(vm, &nested_gva);
+
+ vcpu_args_set(vcpu, 1, nested_gva);
/* Add an extra memory slot for testing dirty logging */
vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
@@ -115,10 +148,10 @@ static void test_vmx_dirty_log(bool enable_ept)
* ... pages in the L2 GPA range [0xc0001000, 0xc0003000) will map to
* 0xc0000000.
*
- * When EPT is disabled, the L2 guest code will still access the same L1
- * GPAs as the EPT enabled case.
+ * When TDP is disabled, the L2 guest code will still access the same L1
+ * GPAs as the TDP enabled case.
*/
- if (enable_ept) {
+ if (nested_tdp) {
tdp_identity_map_default_memslots(vm);
tdp_map(vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
tdp_map(vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
@@ -166,12 +199,12 @@ static void test_vmx_dirty_log(bool enable_ept)
int main(int argc, char *argv[])
{
- TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
+ TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX) || kvm_cpu_has(X86_FEATURE_SVM));
- test_vmx_dirty_log(/*enable_ept=*/false);
+ test_dirty_log(/*nested_tdp=*/false);
if (kvm_cpu_has_tdp())
- test_vmx_dirty_log(/*enable_ept=*/true);
+ test_dirty_log(/*nested_tdp=*/true);
return 0;
}
--
2.52.0.158.g65b55ccf14-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v3 16/16] KVM: selftests: Extend memstress to run on nested SVM
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
` (14 preceding siblings ...)
2025-11-27 1:34 ` [PATCH v3 15/16] KVM: selftests: Extend vmx_dirty_log_test to cover SVM Yosry Ahmed
@ 2025-11-27 1:34 ` Yosry Ahmed
2025-12-23 22:01 ` [PATCH v3 00/16] Add Nested NPT support in selftests Sean Christopherson
16 siblings, 0 replies; 36+ messages in thread
From: Yosry Ahmed @ 2025-11-27 1:34 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Yosry Ahmed
Add L1 SVM code and generalize the setup code to work for both VMX and
SVM. This allows running 'dirty_log_perf_test -n' on AMD CPUs.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
.../testing/selftests/kvm/lib/x86/memstress.c | 42 +++++++++++++++----
1 file changed, 35 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
index 407abfc34909..86f4c5e4c430 100644
--- a/tools/testing/selftests/kvm/lib/x86/memstress.c
+++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
@@ -13,6 +13,7 @@
#include "kvm_util.h"
#include "memstress.h"
#include "processor.h"
+#include "svm_util.h"
#include "vmx.h"
void memstress_l2_guest_code(uint64_t vcpu_id)
@@ -29,9 +30,10 @@ __asm__(
" ud2;"
);
-static void memstress_l1_guest_code(struct vmx_pages *vmx, uint64_t vcpu_id)
-{
#define L2_GUEST_STACK_SIZE 64
+
+static void l1_vmx_code(struct vmx_pages *vmx, uint64_t vcpu_id)
+{
unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
unsigned long *rsp;
@@ -45,10 +47,34 @@ static void memstress_l1_guest_code(struct vmx_pages *vmx, uint64_t vcpu_id)
prepare_vmcs(vmx, memstress_l2_guest_entry, rsp);
GUEST_ASSERT(!vmlaunch());
- GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
+ GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_VMCALL);
+ GUEST_DONE();
+}
+
+static void l1_svm_code(struct svm_test_data *svm, uint64_t vcpu_id)
+{
+ unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+ unsigned long *rsp;
+
+ rsp = &l2_guest_stack[L2_GUEST_STACK_SIZE - 1];
+ *rsp = vcpu_id;
+ generic_svm_setup(svm, memstress_l2_guest_entry, rsp);
+
+ run_guest(svm->vmcb, svm->vmcb_gpa);
+ GUEST_ASSERT_EQ(svm->vmcb->control.exit_code, SVM_EXIT_VMMCALL);
GUEST_DONE();
}
+
+static void memstress_l1_guest_code(void *data, uint64_t vcpu_id)
+{
+ if (this_cpu_has(X86_FEATURE_VMX))
+ l1_vmx_code(data, vcpu_id);
+ else
+ l1_svm_code(data, vcpu_id);
+}
+
uint64_t memstress_nested_pages(int nr_vcpus)
{
/*
@@ -78,15 +104,17 @@ static void memstress_setup_ept_mappings(struct kvm_vm *vm)
void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vcpus[])
{
struct kvm_regs regs;
- vm_vaddr_t vmx_gva;
+ vm_vaddr_t nested_gva;
int vcpu_id;
- TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
TEST_REQUIRE(kvm_cpu_has_tdp());
vm_enable_tdp(vm);
for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
- vcpu_alloc_vmx(vm, &vmx_gva);
+ if (kvm_cpu_has(X86_FEATURE_VMX))
+ vcpu_alloc_vmx(vm, &nested_gva);
+ else
+ vcpu_alloc_svm(vm, &nested_gva);
/* The EPTs are shared across vCPUs, setup the mappings once */
if (vcpu_id == 0)
@@ -99,6 +127,6 @@ void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vc
vcpu_regs_get(vcpus[vcpu_id], &regs);
regs.rip = (unsigned long) memstress_l1_guest_code;
vcpu_regs_set(vcpus[vcpu_id], &regs);
- vcpu_args_set(vcpus[vcpu_id], 2, vmx_gva, vcpu_id);
+ vcpu_args_set(vcpus[vcpu_id], 2, nested_gva, vcpu_id);
}
}
--
2.52.0.158.g65b55ccf14-goog
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v3 09/16] KVM: selftests: Stop passing VMX metadata to TDP mapping functions
2025-11-27 1:34 ` [PATCH v3 09/16] KVM: selftests: Stop passing VMX metadata to TDP mapping functions Yosry Ahmed
@ 2025-12-15 18:38 ` Yosry Ahmed
0 siblings, 0 replies; 36+ messages in thread
From: Yosry Ahmed @ 2025-12-15 18:38 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Thu, Nov 27, 2025 at 01:34:33AM +0000, Yosry Ahmed wrote:
> @@ -87,11 +86,11 @@ void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vc
>
> vm_enable_ept(vm);
> for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
> - vmx = vcpu_alloc_vmx(vm, &vmx_gva);
> + vcpu_alloc_vmx(vm, &vmx_gva);
>
> /* The EPTs are shared across vCPUs, setup the mappings once */
> if (vcpu_id == 0)
> - memstress_setup_ept_mappings(vmx, vm);
> + memstress_setup_ept_mappings(vm);
I think at this point we can actually move
memstress_setup_ept_mappings() before the loop, similar to
vm_enable_ept().
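Something like this (untested sketch):
	vm_enable_ept(vm);
	memstress_setup_ept_mappings(vm);
	for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
		vcpu_alloc_vmx(vm, &vmx_gva);
		...
	}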
I can send a patch on top or a new version; let me know if you prefer
either, or feel free to fix it up while applying.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 00/16] Add Nested NPT support in selftests
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
` (15 preceding siblings ...)
2025-11-27 1:34 ` [PATCH v3 16/16] KVM: selftests: Extend memstress to run on nested SVM Yosry Ahmed
@ 2025-12-23 22:01 ` Sean Christopherson
2025-12-23 23:48 ` Yosry Ahmed
16 siblings, 1 reply; 36+ messages in thread
From: Sean Christopherson @ 2025-12-23 22:01 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel
On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> Yosry Ahmed (16):
> KVM: selftests: Make __vm_get_page_table_entry() static
> KVM: selftests: Stop passing a memslot to nested_map_memslot()
> KVM: selftests: Rename nested TDP mapping functions
> KVM: selftests: Kill eptPageTablePointer
> KVM: selftests: Stop setting AD bits on nested EPTs on creation
> KVM: selftests: Introduce struct kvm_mmu
> KVM: selftests: Move PTE bitmasks to kvm_mmu
> KVM: selftests: Use a nested MMU to share nested EPTs between vCPUs
> KVM: selftests: Stop passing VMX metadata to TDP mapping functions
> KVM: selftests: Reuse virt mapping functions for nested EPTs
> KVM: selftests: Move TDP mapping functions outside of vmx.c
> KVM: selftests: Allow kvm_cpu_has_ept() to be called on AMD CPUs
> KVM: selftests: Add support for nested NPTs
> KVM: selftests: Set the user bit on nested NPT PTEs
> KVM: selftests: Extend vmx_dirty_log_test to cover SVM
> KVM: selftests: Extend memstress to run on nested SVM
Lots of feedback incoming, but no need for you to do anything unless you
disagree with something. I have all the "requested" changes in a local branch,
and will post v4 (probably next week).
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 05/16] KVM: selftests: Stop setting AD bits on nested EPTs on creation
2025-11-27 1:34 ` [PATCH v3 05/16] KVM: selftests: Stop setting AD bits on nested EPTs on creation Yosry Ahmed
@ 2025-12-23 22:26 ` Sean Christopherson
2025-12-23 23:35 ` Yosry Ahmed
0 siblings, 1 reply; 36+ messages in thread
From: Sean Christopherson @ 2025-12-23 22:26 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel
On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> When new nested EPTs are created, the AD bits are set. This was
> introduced by commit 094444204570 ("selftests: kvm: add test for dirty
> logging inside nested guests"), which introduced vmx_dirty_log_test.
>
> It's unclear why that was needed at the time, but regardless, the test
> seems to pass without them so probably no longer needed.
> dirty_log_perf_test (with -n to run in L2) also passes, and these are
> the only tests currently using nested EPT mappings.
Please, please don't take the approach of "beat on it until it passes". Yes,
Paolo's changelog and comment from 094444204570 are awful, and yes, figuring out
what Paolo _likely_ intended requires a lot of guesswork and esoteric shadow MMU
knowledge, but _at best_, modifying code without having at least a plausible
explanation only adds to the confusion, and at worst is actively dangerous.
As you've discovered a few times now, there is plenty of code in KVM and its
tests that elicits a WTF?!!? response from pretty much everyone, i.e. odds are
good you'll run into something along these lines again, sooner or later. If/when
that happens, and you can't unravel the mystery, please send a mail with a question
instead of sending a patch that papers over the issue. E.g. casting "raise dead"
on the original thread is totally acceptable (and probably even encouraged).
That _might_ be a slower and/or more painful approach, but it's generally safer,
e.g. it's all too easy for a maintainer to speed read and apply a seemingly
uninteresting patch like this.
In this case, I strongly suspect that what Paolo was _trying_ to do was coerce
KVM into creating a writable SPTE in response to the initial READ. I.e. in the
vmx_dirty_log_test's L2 code, setting the Dirty bit in the guest stage-2 PTE is
necessary to get KVM to install a writable SPTE on the READ_ONCE():
static void l2_guest_code(u64 *a, u64 *b)
{
READ_ONCE(*a);
WRITE_ONCE(*a, 1);
GUEST_SYNC(true);
GUEST_SYNC(false);
...
}
When handling a read fault in FNAME(walk_addr_generic)(), KVM adjusts the guest
PTE access protections via FNAME(protect_clean_gpte):
if (!write_fault)
FNAME(protect_clean_gpte)(mmu, &walker->pte_access, pte);
else
/*
* On a write fault, fold the dirty bit into accessed_dirty.
* For modes without A/D bits support accessed_dirty will be
* always clear.
*/
accessed_dirty &= pte >>
(PT_GUEST_DIRTY_SHIFT - PT_GUEST_ACCESSED_SHIFT);
where FNAME(protect_clean_gpte) is:
unsigned mask;
/* dirty bit is not supported, so no need to track it */
if (!PT_HAVE_ACCESSED_DIRTY(mmu))
return;
BUILD_BUG_ON(PT_WRITABLE_MASK != ACC_WRITE_MASK);
mask = (unsigned)~ACC_WRITE_MASK;
/* Allow write access to dirty gptes */
mask |= (gpte >> (PT_GUEST_DIRTY_SHIFT - PT_WRITABLE_SHIFT)) &
PT_WRITABLE_MASK;
*access &= mask;
The idea is that KVM can elide a VM-Exit on a READ=>WRITE sequence by making the
SPTE writable on the READ fault if the guest PTEs allow the page to be written.
But KVM can only do that if the guest Dirty bit is '1'; if the Dirty bit is '0',
then KVM needs to intercept the next write in order to emulate the Dirty assist.
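To make that concrete (a walk-through, assuming 64-bit paging where
PT_GUEST_DIRTY_SHIFT is 6 and PT_WRITABLE_SHIFT is 1):
	/* Write access is stripped by default... */
	mask = ~ACC_WRITE_MASK;
	/*
	 * ...and restored iff the gPTE's Dirty bit (bit 6) is set: the
	 * right shift by 5 lands the Dirty bit on the writable bit
	 * (bit 1), so Dirty=1 preserves write access and READ=>WRITE
	 * takes a single fault, while Dirty=0 strips it and the next
	 * write faults so KVM can emulate the Dirty-bit update.
	 */
	mask |= (gpte >> 5) & PT_WRITABLE_MASK;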
I emphasized "trying" above because, in the context of the dirty logging test,
simply establishing a writable SPTE on the READ_ONCE() doesn't meaningfully improve
coverage without checking KVM's behavior after the READ_ONCE(). E.g. KVM marks
GFNs as dirty (in the dirty bitmap) when creating writable SPTEs, and so doing
READ+WRITE and _then_ checking the dirty bitmap would hide the KVM PML bug that
the test was written to find.
The second part of the L2 guest code:
WRITE_ONCE(*b, 1);
GUEST_SYNC(true);
WRITE_ONCE(*b, 1);
GUEST_SYNC(true);
GUEST_SYNC(false);
_does_ trigger and detect the KVM bug, but with a WRITE=>CHECK=>WRITE=>CHECK
sequence instead of READ=>CHECK=>WRITE=>CHECK. I.e. even if there was a false
pass in the first phase, as written the second phase will detect the bug,
assuming the bug affects WRITE=>WRITE and READ=>WRITE equally. Which isn't a
great assumption, but it was the case for the PML bug.
All that said, for this patch, I think it makes sense to drop the A/D code from
the common APIs, because (a) it will be trivially easy to have the test set them
as needed once the common APIs are used for TDP mappings, and (b) temporarily
dropping the code doesn't affect the test coverage in practice.
--
Stop setting Accessed/Dirty bits when creating EPT entries for L2 so that
the stage-1 and stage-2 (a.k.a. TDP) page table APIs can use common code
without bleeding the EPT hack into the common APIs.
While commit 094444204570 ("selftests: kvm: add test for dirty logging
inside nested guests") is _very_ light on details, the most likely
explanation is that vmx_dirty_log_test was attempting to avoid taking an
EPT Violation on the first _write_ from L2.
static void l2_guest_code(u64 *a, u64 *b)
{
READ_ONCE(*a);
WRITE_ONCE(*a, 1); <===
GUEST_SYNC(true);
...
}
When handling read faults in the shadow MMU, KVM opportunistically creates
a writable SPTE if the mapping can be writable *and* the gPTE is dirty (or
doesn't support the Dirty bit), i.e. if KVM doesn't need to intercept
writes in order to emulate Dirty-bit updates. By setting A/D bits in the
test's EPT entries, the above READ+WRITE will fault only on the read, and
in theory expose the bug fixed by KVM commit 1f4e5fc83a42 ("KVM: x86: fix
nested guest live migration with PML"). If the Dirty bit is NOT set, the
test will get a false pass; though again, only in theory.
However, the test is flawed (and always was, at least in the versions
posted publicly), as KVM (correctly) marks the corresponding L1 GFN as
dirty (in the dirty bitmap) when creating the writable SPTE. I.e. without
a check on the dirty bitmap after the READ_ONCE(), the check after the
first WRITE_ONCE() will get a false pass due to the dirty bitmap/log having
been updated by the read fault, not by PML.
Furthermore, the subsequent behavior in the test's l2_guest_code()
effectively hides the flawed test behavior, as the straight writes to a
new L2 GPA also trigger the KVM bug, and so the test will still
detect the failure due to lack of isolation between the two testcases
(Read=>Write vs. Write=>Write).
WRITE_ONCE(*b, 1);
GUEST_SYNC(true);
WRITE_ONCE(*b, 1);
GUEST_SYNC(true);
GUEST_SYNC(false);
Punt on fixing vmx_dirty_log_test for the moment as it will be easier to
properly fix the test once the TDP code uses the common MMU APIs, at which
point it will be trivially easy for the test to retrieve the EPT PTE and
set the Dirty bit as needed.
--
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 06/16] KVM: selftests: Introduce struct kvm_mmu
2025-11-27 1:34 ` [PATCH v3 06/16] KVM: selftests: Introduce struct kvm_mmu Yosry Ahmed
@ 2025-12-23 22:29 ` Sean Christopherson
2025-12-23 23:38 ` Yosry Ahmed
0 siblings, 1 reply; 36+ messages in thread
From: Sean Christopherson @ 2025-12-23 22:29 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel
On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> In preparation for generalizing the virt mapping functions to work with
> TDP page tables, introduce struct kvm_mmu. This struct currently only
> holds the root GPA and number of page table levels. Parameterize virt
> mapping functions by the kvm_mmu, and use the root GPA and page table
> levels instead of hardcoding vm->pgd and vm->pgtable_levels.
>
> There's a subtle change here, instead of checking that the parent
> pointer is the address of the vm->pgd, check if the value pointed at by
> the parent pointer is the root GPA (i.e. the value of vm->pgd in this
> case). No change in behavior expected.
>
> Opportunistically, switch the ordering of the checks in the assertion in
> virt_get_pte(), as it makes more sense to check if the parent PTE is the
> root (in which case, not a PTE) before checking the present flag.
>
> vm->arch.mmu is dynamically allocated to avoid a circular dependency
> chain if kvm_util_arch.h includes processor.h for the struct definition:
> kvm_util_arch.h -> processor.h -> kvm_util.h -> kvm_util_arch.h
>
> No functional change intended.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
> .../selftests/kvm/include/x86/kvm_util_arch.h | 4 ++
> .../selftests/kvm/include/x86/processor.h | 8 ++-
> .../testing/selftests/kvm/lib/x86/processor.c | 61 +++++++++++++------
> 3 files changed, 53 insertions(+), 20 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
> index 972bb1c4ab4c..d8808fa33faa 100644
> --- a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
> +++ b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
> @@ -10,6 +10,8 @@
>
> extern bool is_forced_emulation_enabled;
>
> +struct kvm_mmu;
> +
> struct kvm_vm_arch {
> vm_vaddr_t gdt;
> vm_vaddr_t tss;
> @@ -19,6 +21,8 @@ struct kvm_vm_arch {
> uint64_t s_bit;
> int sev_fd;
> bool is_pt_protected;
> +
> + struct kvm_mmu *mmu;
No, put kvm_mmu in common code and create kvm_vm.mmu. This makes the "mmu" object
a weird copy of state that's already in kvm_vm (pgd, pgd_created, and pgtable_levels),
and more importantly makes it _way_ too easy to botch the x86 MMU code (speaking
from first hand experience), e.g. due to grabbing vm->pgtable_levels instead of
the mmu's version. I don't see an easy way to _completely_ guard against goofs
like that, but it's easy-ish to audit the code for instances of "vm->mmu.",
and adding a common kvm_mmu avoids the weird duplicate code.
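Rough sketch of what I have in mind (completely untested):
	/*
	 * Sketch only: lives in common code, with a "struct kvm_mmu mmu;"
	 * member embedded in struct kvm_vm, replacing vm->pgd,
	 * vm->pgd_created, and vm->pgtable_levels.
	 */
	struct kvm_mmu {
		vm_paddr_t pgd;
		bool pgd_created;
		int pgtable_levels;
	};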
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 07/16] KVM: selftests: Move PTE bitmasks to kvm_mmu
2025-11-27 1:34 ` [PATCH v3 07/16] KVM: selftests: Move PTE bitmasks to kvm_mmu Yosry Ahmed
@ 2025-12-23 22:31 ` Sean Christopherson
2025-12-23 23:40 ` Yosry Ahmed
0 siblings, 1 reply; 36+ messages in thread
From: Sean Christopherson @ 2025-12-23 22:31 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel
On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> @@ -1449,11 +1439,44 @@ enum pg_level {
> #define PG_SIZE_2M PG_LEVEL_SIZE(PG_LEVEL_2M)
> #define PG_SIZE_1G PG_LEVEL_SIZE(PG_LEVEL_1G)
>
> +struct pte_masks {
> + uint64_t present;
> + uint64_t writable;
> + uint64_t user;
> + uint64_t accessed;
> + uint64_t dirty;
> + uint64_t huge;
> + uint64_t nx;
> + uint64_t c;
> + uint64_t s;
> +};
> +
> struct kvm_mmu {
> uint64_t root_gpa;
> int pgtable_levels;
> + struct pte_masks pte_masks;
And then introduce kvm_mmu_arch so that x86 can shove pte_masks into the mmu.
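E.g. something like this (sketch only, untested):
	/* x86's kvm_util_arch.h */
	struct kvm_mmu_arch {
		struct pte_masks pte_masks;
	};
	/* common code */
	struct kvm_mmu {
		uint64_t root_gpa;
		int pgtable_levels;
		struct kvm_mmu_arch arch;
	};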
> };
>
> +#define PTE_PRESENT_MASK(mmu) ((mmu)->pte_masks.present)
> +#define PTE_WRITABLE_MASK(mmu) ((mmu)->pte_masks.writable)
> +#define PTE_USER_MASK(mmu) ((mmu)->pte_masks.user)
> +#define PTE_ACCESSED_MASK(mmu) ((mmu)->pte_masks.accessed)
> +#define PTE_DIRTY_MASK(mmu) ((mmu)->pte_masks.dirty)
> +#define PTE_HUGE_MASK(mmu) ((mmu)->pte_masks.huge)
> +#define PTE_NX_MASK(mmu) ((mmu)->pte_masks.nx)
> +#define PTE_C_MASK(mmu) ((mmu)->pte_masks.c)
> +#define PTE_S_MASK(mmu) ((mmu)->pte_masks.s)
> +
> +#define pte_present(mmu, pte) (!!(*(pte) & PTE_PRESENT_MASK(mmu)))
I very, very strongly prefer is_present_pte(), is_huge_pte(), etc.
> +#define pte_writable(mmu, pte) (!!(*(pte) & PTE_WRITABLE_MASK(mmu)))
> +#define pte_user(mmu, pte) (!!(*(pte) & PTE_USER_MASK(mmu)))
> +#define pte_accessed(mmu, pte) (!!(*(pte) & PTE_ACCESSED_MASK(mmu)))
> +#define pte_dirty(mmu, pte) (!!(*(pte) & PTE_DIRTY_MASK(mmu)))
> +#define pte_huge(mmu, pte) (!!(*(pte) & PTE_HUGE_MASK(mmu)))
> +#define pte_nx(mmu, pte) (!!(*(pte) & PTE_NX_MASK(mmu)))
> +#define pte_c(mmu, pte) (!!(*(pte) & PTE_C_MASK(mmu)))
> +#define pte_s(mmu, pte) (!!(*(pte) & PTE_S_MASK(mmu)))
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 10/16] KVM: selftests: Reuse virt mapping functions for nested EPTs
2025-11-27 1:34 ` [PATCH v3 10/16] KVM: selftests: Reuse virt mapping functions for nested EPTs Yosry Ahmed
@ 2025-12-23 23:12 ` Sean Christopherson
2025-12-23 23:45 ` Yosry Ahmed
2025-12-23 23:14 ` Sean Christopherson
1 sibling, 1 reply; 36+ messages in thread
From: Sean Christopherson @ 2025-12-23 23:12 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel
On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> __tdp_pg_map() bears a lot of resemblance to __virt_pg_map(). The
> main differences are:
> - It uses the EPT struct overlay instead of the PTE masks.
> - It always assumes 4-level EPTs.
>
> To reuse __virt_pg_map(), initialize the PTE masks in nested MMU with
> EPT PTE masks. EPTs have no 'present' or 'user' bits, so use the
> 'readable' bit instead like shadow_{present/user}_mask, ignoring the
> fact that entries can be present and not readable if the CPU has
> VMX_EPT_EXECUTE_ONLY_BIT. This is simple and sufficient for testing.
Ugh, no. I am strongly against playing the same insane games KVM itself plays
with overloading protection/access bits. There's no reason for selftests to do
the same, e.g. selftests aren't shadowing guest PTEs and doing permission checks
in hot paths and so don't need to multiplex a bunch of things into an inscrutable
(but performant!) system.
> Add an executable bitmask and update __virt_pg_map() and friends to set
> the bit on newly created entries to match the EPT behavior. It's a nop
> for x86 page tables.
>
> Another benefit of reusing the code is having separate handling for
> upper-level PTEs vs 4K PTEs, which avoids some quirks like setting the
> large bit on a 4K PTE in the EPTs.
>
> No functional change intended.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
> .../selftests/kvm/include/x86/processor.h | 3 +
> .../testing/selftests/kvm/lib/x86/processor.c | 12 +-
> tools/testing/selftests/kvm/lib/x86/vmx.c | 115 ++++--------------
> 3 files changed, 33 insertions(+), 97 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
> index fb2b2e53d453..62e10b296719 100644
> --- a/tools/testing/selftests/kvm/include/x86/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86/processor.h
> @@ -1447,6 +1447,7 @@ struct pte_masks {
> uint64_t dirty;
> uint64_t huge;
> uint64_t nx;
> + uint64_t x;
To be consistent with e.g. writable, call this executable.
> uint64_t c;
> uint64_t s;
> };
> @@ -1464,6 +1465,7 @@ struct kvm_mmu {
> #define PTE_DIRTY_MASK(mmu) ((mmu)->pte_masks.dirty)
> #define PTE_HUGE_MASK(mmu) ((mmu)->pte_masks.huge)
> #define PTE_NX_MASK(mmu) ((mmu)->pte_masks.nx)
> +#define PTE_X_MASK(mmu) ((mmu)->pte_masks.x)
> #define PTE_C_MASK(mmu) ((mmu)->pte_masks.c)
> #define PTE_S_MASK(mmu) ((mmu)->pte_masks.s)
>
> @@ -1474,6 +1476,7 @@ struct kvm_mmu {
> #define pte_dirty(mmu, pte) (!!(*(pte) & PTE_DIRTY_MASK(mmu)))
> #define pte_huge(mmu, pte) (!!(*(pte) & PTE_HUGE_MASK(mmu)))
> #define pte_nx(mmu, pte) (!!(*(pte) & PTE_NX_MASK(mmu)))
> +#define pte_x(mmu, pte) (!!(*(pte) & PTE_X_MASK(mmu)))
And then here to not assume PRESENT == READABLE, just check if the MMU even has
a PRESENT bit. We may still need changes, e.g. the page table builders actually
need to verify a PTE is _writable_, not just present, but that's largely an
orthogonal issue.
#define is_present_pte(mmu, pte) \
(PTE_PRESENT_MASK(mmu) ? \
!!(*(pte) & PTE_PRESENT_MASK(mmu)) : \
!!(*(pte) & (PTE_READABLE_MASK(mmu) | PTE_EXECUTABLE_MASK(mmu))))
And to properly capture the relationship between NX and EXECUTABLE:
#define is_executable_pte(mmu, pte) \
((*(pte) & (PTE_EXECUTABLE_MASK(mmu) | PTE_NX_MASK(mmu))) == PTE_EXECUTABLE_MASK(mmu))
#define is_nx_pte(mmu, pte) (!is_executable_pte(mmu, pte))
> #define pte_c(mmu, pte) (!!(*(pte) & PTE_C_MASK(mmu)))
> #define pte_s(mmu, pte) (!!(*(pte) & PTE_S_MASK(mmu)))
>
> diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
> index bff75ff05364..8b0e17f8ca37 100644
> --- a/tools/testing/selftests/kvm/lib/x86/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86/processor.c
> @@ -162,8 +162,7 @@ struct kvm_mmu *mmu_create(struct kvm_vm *vm, int pgtable_levels,
> struct kvm_mmu *mmu = calloc(1, sizeof(*mmu));
>
> TEST_ASSERT(mmu, "-ENOMEM when allocating MMU");
> - if (pte_masks)
> - mmu->pte_masks = *pte_masks;
> + mmu->pte_masks = *pte_masks;
Rather than pass NULL (and allow NULL here) in the previous patch, pass an
empty pte_masks. That avoids churning the MMU initialization code, and allows
for a better TODO in the previous patch.
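E.g. something like (untested):
	vm->arch.nested.mmu = mmu_create(vm, 4, &(struct pte_masks){});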
> + /*
> + * EPTs do not have 'present' or 'user' bits, instead bit 0 is the
> + * 'readable' bit. In some cases, EPTs can be execute-only and an entry
> + * is present but not readable. However, for the purposes of testing we
> + * assume 'present' == 'user' == 'readable' for simplicity.
> + */
> + pte_masks = (struct pte_masks){
> + .present = BIT_ULL(0),
> + .user = BIT_ULL(0),
> + .writable = BIT_ULL(1),
> + .x = BIT_ULL(2),
> + .accessed = BIT_ULL(5),
> + .dirty = BIT_ULL(6),
> + .huge = BIT_ULL(7),
> + .nx = 0,
> + };
> +
> /* EPTP_PWL_4 is always used */
Make this a TODO, e.g.
/* TODO: Add support for 5-level paging. */
so that it's clear this is a shortcoming, not some fundamental property of
selftests.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 11/16] KVM: selftests: Move TDP mapping functions outside of vmx.c
2025-11-27 1:34 ` [PATCH v3 11/16] KVM: selftests: Move TDP mapping functions outside of vmx.c Yosry Ahmed
@ 2025-12-23 23:13 ` Sean Christopherson
0 siblings, 0 replies; 36+ messages in thread
From: Sean Christopherson @ 2025-12-23 23:13 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel
On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
> index 8b0e17f8ca37..517a8185eade 100644
> --- a/tools/testing/selftests/kvm/lib/x86/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86/processor.c
> @@ -467,6 +467,77 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
> }
> }
>
> +/*
> + * Map a range of TDP guest physical addresses to the VM's physical address
> + *
> + * Input Args:
> + * vm - Virtual Machine
> + * nested_paddr - Nested guest physical address to map
> + * paddr - VM Physical Address
> + * size - The size of the range to map
> + * level - The level at which to map the range
> + *
> + * Output Args: None
> + *
> + * Return: None
> + *
> + * Within the VM given by vm, creates a nested guest translation for the
> + * page range starting at nested_paddr to the page range starting at paddr.
> + */
Eh, opportunistically drop this function comment. If the reader can't figure out
what tdp_map() is doing, a generic function comment isn't likely to help.
> +void __tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
> + uint64_t size, int level)
> +{
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 10/16] KVM: selftests: Reuse virt mapping functions for nested EPTs
2025-11-27 1:34 ` [PATCH v3 10/16] KVM: selftests: Reuse virt mapping functions for nested EPTs Yosry Ahmed
2025-12-23 23:12 ` Sean Christopherson
@ 2025-12-23 23:14 ` Sean Christopherson
2025-12-23 23:47 ` Yosry Ahmed
1 sibling, 1 reply; 36+ messages in thread
From: Sean Christopherson @ 2025-12-23 23:14 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel
On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> + /*
> + * EPTs do not have 'present' or 'user' bits, instead bit 0 is the
> + * 'readable' bit. In some cases, EPTs can be execute-only and an entry
> + * is present but not readable. However, for the purposes of testing we
> + * assume 'present' == 'user' == 'readable' for simplicity.
> + */
> + pte_masks = (struct pte_masks){
> + .present = BIT_ULL(0),
> + .user = BIT_ULL(0),
> + .writable = BIT_ULL(1),
> + .x = BIT_ULL(2),
> + .accessed = BIT_ULL(5),
> + .dirty = BIT_ULL(6),
Almost forgot, the Accessed and Dirty bits are wrong. They are bits 8 and 9
respectively, not 5 and 6. Amusingly (well, it's amusing *now*, it wasn't so
amusing at the time), I found that out when I couldn't get KVM to create a writable
SPTE on a read fault in the nested dirty log test :-)
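I.e. per the EPT PTE format, the initialization wants to be:
	.accessed = BIT_ULL(8),
	.dirty = BIT_ULL(9),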
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 08/16] KVM: selftests: Use a nested MMU to share nested EPTs between vCPUs
2025-11-27 1:34 ` [PATCH v3 08/16] KVM: selftests: Use a nested MMU to share nested EPTs between vCPUs Yosry Ahmed
@ 2025-12-23 23:16 ` Sean Christopherson
0 siblings, 0 replies; 36+ messages in thread
From: Sean Christopherson @ 2025-12-23 23:16 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel
On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
> index a3e2eae981da..5d799ec5f7c6 100644
> --- a/tools/testing/selftests/kvm/lib/x86/vmx.c
> +++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
> @@ -56,6 +56,16 @@ int vcpu_enable_evmcs(struct kvm_vcpu *vcpu)
> return evmcs_ver;
> }
>
> +void vm_enable_ept(struct kvm_vm *vm)
> +{
> + TEST_ASSERT(kvm_cpu_has_ept(), "KVM doesn't support nested EPT");
> + if (vm->arch.nested.mmu)
Make this an assert, attempting to enable EPT multiple times for a single VM is
a test bug. That way we don't have to worry about silly situations in the future,
e.g. like trying to change from 4-level to 5-level after the MMU has already been
created.
Ditto for the NPT code.
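E.g. (sketch):
	TEST_ASSERT(!vm->arch.nested.mmu,
		    "Nested EPT already enabled for this VM");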
> + return;
> +
> + /* EPTP_PWL_4 is always used */
> + vm->arch.nested.mmu = mmu_create(vm, 4, NULL);
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 05/16] KVM: selftests: Stop setting AD bits on nested EPTs on creation
2025-12-23 22:26 ` Sean Christopherson
@ 2025-12-23 23:35 ` Yosry Ahmed
0 siblings, 0 replies; 36+ messages in thread
From: Yosry Ahmed @ 2025-12-23 23:35 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Tue, Dec 23, 2025 at 02:26:20PM -0800, Sean Christopherson wrote:
> On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> > When new nested EPTs are created, the AD bits are set. This was
> > introduced by commit 094444204570 ("selftests: kvm: add test for dirty
> > logging inside nested guests"), which introduced vmx_dirty_log_test.
> >
> > It's unclear why that was needed at the time, but regardless, the test
> > seems to pass without them so probably no longer needed.
> > dirty_log_perf_test (with -n to run in L2) also passes, and these are
> > the only tests currently using nested EPT mappings.
>
> Please, please don't take the approach of "beat on it until it passes". Yes,
> Paolo's changelog and comment from 094444204570 are awful, and yes, figuring out
> what Paolo _likely_ intended requires a lot of guesswork and esoteric shadow MMU
> knowledge, but _at best_, modifying code without having at least a plausible
> explanation only adds to the confusion, and at worst is actively dangerous.
>
> As you've discovered a few times now, there is plenty of code in KVM and its
> tests that elicit a WTF?!!? response from pretty much everyone, i.e. odds are
> good you'll run into something along these lines again, sooner or later. If/when
> that happens, and you can't unravel the mystery, please send a mail with a question
> instead of sending a patch that papers over the issue. E.g. casting "raise dead"
> on the original thread is totally acceptable (and probably even encouraged).
> That _might_ be a slower and/or more painful approach, but it's generally safer,
> e.g. it's all too easy for a maintainer to speed read and apply a seemingly
> uninteresting patch like this.
That's fair. I am used to sending a patch that potentially does the
wrong thing rather than a question, because people tend to be more
responsive to wrong patches :)
That being said, I should have made this much more obvious, like add RFC
or DO NOT MERGE to the patch title, rather than subtly burying it in the
commit log.
>
> In this case, I strongly suspect that what Paolo was _trying_ to do was coerce
> KVM into creating a writable SPTE in response to the initial READ. I.e. in the
> vmx_dirty_log_test's L2 code, setting the Dirty bit in the guest stage-2 PTE is
> necessary to get KVM to install a writable SPTE on the READ_ONCE():
>
> static void l2_guest_code(u64 *a, u64 *b)
> {
> READ_ONCE(*a);
> WRITE_ONCE(*a, 1);
> GUEST_SYNC(true);
> GUEST_SYNC(false);
>
> ...
> }
>
> When handling a read fault in FNAME(walk_addr_generic)(), KVM adjusts the guest
> PTE access protections via FNAME(protect_clean_gpte):
>
> if (!write_fault)
> FNAME(protect_clean_gpte)(mmu, &walker->pte_access, pte);
> else
> /*
> * On a write fault, fold the dirty bit into accessed_dirty.
> * For modes without A/D bits support accessed_dirty will be
> * always clear.
> */
> accessed_dirty &= pte >>
> (PT_GUEST_DIRTY_SHIFT - PT_GUEST_ACCESSED_SHIFT);
>
>
> where FNAME(protect_clean_gpte) is:
>
> unsigned mask;
>
> /* dirty bit is not supported, so no need to track it */
> if (!PT_HAVE_ACCESSED_DIRTY(mmu))
> return;
>
> BUILD_BUG_ON(PT_WRITABLE_MASK != ACC_WRITE_MASK);
>
> mask = (unsigned)~ACC_WRITE_MASK;
> /* Allow write access to dirty gptes */
> mask |= (gpte >> (PT_GUEST_DIRTY_SHIFT - PT_WRITABLE_SHIFT)) &
> PT_WRITABLE_MASK;
> *access &= mask;
>
> The idea is that KVM can elide a VM-Exit on a READ=>WRITE sequence by making the
> SPTE writable on the READ fault if the guest PTEs allow the page to be written.
> But KVM can only do that if the guest Dirty bit is '1'; if the Dirty bit is '0',
> then KVM needs to intercept the next write in order to emulate the Dirty assist.
>
> I emphasized "trying" above because, in the context of the dirty logging test,
> simply establishing a writable SPTE on the READ_ONCE() doesn't meaningfully improve
> coverage without checking KVM's behavior after the READ_ONCE(). E.g. KVM marks
> GFNs as dirty (in the dirty bitmap) when creating writable SPTEs, and so doing
> READ+WRITE and _then_ checking the dirty bitmap would hide the KVM PML bug that
> the test was written to find.
>
> The second part of the L2 guest code:
>
> WRITE_ONCE(*b, 1);
> GUEST_SYNC(true);
> WRITE_ONCE(*b, 1);
> GUEST_SYNC(true);
> GUEST_SYNC(false);
>
> _does_ trigger and detect the KVM bug, but with a WRITE=>CHECK=>WRITE=>CHECK
> sequence instead of READ=>CHECK=>WRITE=>CHECK. I.e. even if there was a false
> pass in the first phase, as written the second phase will detect the bug,
> assuming the bug affects WRITE=>WRITE and READ=>WRITE equally. Which isn't a
> great assumption, but it was the case for the PML bug.
>
> All that said, for this patch, I think it makes sense to drop the A/D code from
> the common APIs, because (a) it will be trivially easy to have the test set them
> as needed once the common APIs are used for TDP mappings, and (b) temporarily
> dropping the code doesn't affect the test coverage in practice.
Thanks a lot for digging and finding all of this. I agree that it's
better to add the coverage properly on top of the series, and have
a separate test case that initializes the PTEs with the Dirty bit so
KVM creates a writable PTE on READ.
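Roughly what I have in mind for the host side, once the common APIs
land (pure sketch, tdp_get_pte() is a made-up helper name):
	/* hypothetical helper to walk the nested MMU to the leaf PTE */
	uint64_t *pte = tdp_get_pte(vm, nested_paddr);
	*pte |= PTE_ACCESSED_MASK(vm->arch.nested.mmu) |
		PTE_DIRTY_MASK(vm->arch.nested.mmu);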
>
> --
> Stop setting Accessed/Dirty bits when creating EPT entries for L2 so that
> the stage-1 and stage-2 (a.k.a. TDP) page table APIs can use common code
> without bleeding the EPT hack into the common APIs.
>
> While commit 094444204570 ("selftests: kvm: add test for dirty logging
> inside nested guests") is _very_ light on details, the most likely
> explanation is that vmx_dirty_log_test was attempting to avoid taking an
> EPT Violation on the first _write_ from L2.
>
> static void l2_guest_code(u64 *a, u64 *b)
> {
> READ_ONCE(*a);
> WRITE_ONCE(*a, 1); <===
> GUEST_SYNC(true);
>
> ...
> }
>
> When handling read faults in the shadow MMU, KVM opportunistically creates
> a writable SPTE if the mapping can be writable *and* the gPTE is dirty (or
> doesn't support the Dirty bit), i.e. if KVM doesn't need to intercept
> writes in order to emulate Dirty-bit updates. By setting A/D bits in the
> test's EPT entries, the above READ+WRITE will fault only on the read, and
> in theory expose the bug fixed by KVM commit 1f4e5fc83a42 ("KVM: x86: fix
> nested guest live migration with PML"). If the Dirty bit is NOT set, the
> test will get a false pass; though again, in theory.
>
> However, the test is flawed (and always was, at least in the versions
> posted publicly), as KVM (correctly) marks the corresponding L1 GFN as
> dirty (in the dirty bitmap) when creating the writable SPTE. I.e. without
> a check on the dirty bitmap after the READ_ONCE(), the check after the
> first WRITE_ONCE() will get a false pass due to the dirty bitmap/log having
> been updated by the read fault, not by PML.
>
> Furthermore, the subsequent behavior in the test's l2_guest_code()
> effectively hides the flawed test behavior, as the straight writes to a
> new L2 GPA also fault and trigger the KVM bug, and so the test will still
> detect the failure due to lack of isolation between the two testcases
> (Read=>Write vs. Write=>Write).
>
> WRITE_ONCE(*b, 1);
> GUEST_SYNC(true);
> WRITE_ONCE(*b, 1);
> GUEST_SYNC(true);
> GUEST_SYNC(false);
>
> Punt on fixing vmx_dirty_log_test for the moment as it will be easier to
> properly fix the test once the TDP code uses the common MMU APIs, at which
> point it will be trivially easy for the test to retrieve the EPT PTE and
> set the Dirty bit as needed.
> --
Looks good to me, a significant improvement over what I had :')
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 06/16] KVM: selftests: Introduce struct kvm_mmu
2025-12-23 22:29 ` Sean Christopherson
@ 2025-12-23 23:38 ` Yosry Ahmed
2025-12-29 15:24 ` Sean Christopherson
0 siblings, 1 reply; 36+ messages in thread
From: Yosry Ahmed @ 2025-12-23 23:38 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Tue, Dec 23, 2025 at 02:29:23PM -0800, Sean Christopherson wrote:
> On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> > In preparation for generalizing the virt mapping functions to work with
> > TDP page tables, introduce struct kvm_mmu. This struct currently only
> > holds the root GPA and number of page table levels. Parameterize virt
> > mapping functions by the kvm_mmu, and use the root GPA and page table
> > levels instead of hardcoding vm->pgd and vm->pgtable_levels.
> >
> > There's a subtle change here, instead of checking that the parent
> > pointer is the address of the vm->pgd, check if the value pointed at by
> > the parent pointer is the root GPA (i.e. the value of vm->pgd in this
> > case). No change in behavior expected.
> >
> > Opportunistically, switch the ordering of the checks in the assertion in
> > virt_get_pte(), as it makes more sense to check if the parent PTE is the
> > root (in which case, not a PTE) before checking the present flag.
> >
> > vm->arch.mmu is dynamically allocated to avoid a circular dependency
> > chain if kvm_util_arch.h includes processor.h for the struct definition:
> > kvm_util_arch.h -> processor.h -> kvm_util.h -> kvm_util_arch.h
> >
> > No functional change intended.
> >
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > ---
> > .../selftests/kvm/include/x86/kvm_util_arch.h | 4 ++
> > .../selftests/kvm/include/x86/processor.h | 8 ++-
> > .../testing/selftests/kvm/lib/x86/processor.c | 61 +++++++++++++------
> > 3 files changed, 53 insertions(+), 20 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
> > index 972bb1c4ab4c..d8808fa33faa 100644
> > --- a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
> > +++ b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
> > @@ -10,6 +10,8 @@
> >
> > extern bool is_forced_emulation_enabled;
> >
> > +struct kvm_mmu;
> > +
> > struct kvm_vm_arch {
> > vm_vaddr_t gdt;
> > vm_vaddr_t tss;
> > @@ -19,6 +21,8 @@ struct kvm_vm_arch {
> > uint64_t s_bit;
> > int sev_fd;
> > bool is_pt_protected;
> > +
> > + struct kvm_mmu *mmu;
>
> No, put kvm_mmu in common code and create kvm_vm.mmu. This makes the "mmu" object
> a weird copy of state that's already in kvm_vm (pgd, pgd_created, and pgtable_levels),
> and more importantly makes it _way_ too easy to botch the x86 MMU code (speaking
> from first hand experience), e.g. due to grabbing vm->pgtable_levels instead of
> the mmu's version. I don't see an easy way to _completely_ guard against goofs
> like that, but it's easy-ish to audit the code for instances of "vm->mmu.",
> and adding a common kvm_mmu avoids the weird duplicate code.
Do you mean move pgd, pgd_created, and pgtable_levels into kvm_mmu? If
yes, that makes sense to me and is obviously an improvement over what's
in this patch.
I didn't immediately make the connection, but in hindsight it's obvious
that having some of the state in kvm_vm_arch and some in kvm_vm is
fragile.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 07/16] KVM: selftests: Move PTE bitmasks to kvm_mmu
2025-12-23 22:31 ` Sean Christopherson
@ 2025-12-23 23:40 ` Yosry Ahmed
0 siblings, 0 replies; 36+ messages in thread
From: Yosry Ahmed @ 2025-12-23 23:40 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Tue, Dec 23, 2025 at 02:31:38PM -0800, Sean Christopherson wrote:
> On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> > @@ -1449,11 +1439,44 @@ enum pg_level {
> > #define PG_SIZE_2M PG_LEVEL_SIZE(PG_LEVEL_2M)
> > #define PG_SIZE_1G PG_LEVEL_SIZE(PG_LEVEL_1G)
> >
> > +struct pte_masks {
> > + uint64_t present;
> > + uint64_t writable;
> > + uint64_t user;
> > + uint64_t accessed;
> > + uint64_t dirty;
> > + uint64_t huge;
> > + uint64_t nx;
> > + uint64_t c;
> > + uint64_t s;
> > +};
> > +
> > struct kvm_mmu {
> > uint64_t root_gpa;
> > int pgtable_levels;
> > + struct pte_masks pte_masks;
>
> And then introduce kvm_mmu_arch so that x86 can shove pte_masks into the mmu.
Makes sense.
>
> > };
> >
> > +#define PTE_PRESENT_MASK(mmu) ((mmu)->pte_masks.present)
> > +#define PTE_WRITABLE_MASK(mmu) ((mmu)->pte_masks.writable)
> > +#define PTE_USER_MASK(mmu) ((mmu)->pte_masks.user)
> > +#define PTE_ACCESSED_MASK(mmu) ((mmu)->pte_masks.accessed)
> > +#define PTE_DIRTY_MASK(mmu) ((mmu)->pte_masks.dirty)
> > +#define PTE_HUGE_MASK(mmu) ((mmu)->pte_masks.huge)
> > +#define PTE_NX_MASK(mmu) ((mmu)->pte_masks.nx)
> > +#define PTE_C_MASK(mmu) ((mmu)->pte_masks.c)
> > +#define PTE_S_MASK(mmu) ((mmu)->pte_masks.s)
> > +
> > +#define pte_present(mmu, pte) (!!(*(pte) & PTE_PRESENT_MASK(mmu)))
>
> I very, very strongly prefer is_present_pte(), is_huge_pte(), etc.
These were modeled after the kernel accessors (e.g. in
arch/x86/include/asm/pgtable.h). I think it's clearer if we use the same
naming here, but I don't feel as strongly as you -- so fine either way.
>
> > +#define pte_writable(mmu, pte) (!!(*(pte) & PTE_WRITABLE_MASK(mmu)))
> > +#define pte_user(mmu, pte) (!!(*(pte) & PTE_USER_MASK(mmu)))
> > +#define pte_accessed(mmu, pte) (!!(*(pte) & PTE_ACCESSED_MASK(mmu)))
> > +#define pte_dirty(mmu, pte) (!!(*(pte) & PTE_DIRTY_MASK(mmu)))
> > +#define pte_huge(mmu, pte) (!!(*(pte) & PTE_HUGE_MASK(mmu)))
> > +#define pte_nx(mmu, pte) (!!(*(pte) & PTE_NX_MASK(mmu)))
> > +#define pte_c(mmu, pte) (!!(*(pte) & PTE_C_MASK(mmu)))
> > +#define pte_s(mmu, pte) (!!(*(pte) & PTE_S_MASK(mmu)))
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 10/16] KVM: selftests: Reuse virt mapping functions for nested EPTs
2025-12-23 23:12 ` Sean Christopherson
@ 2025-12-23 23:45 ` Yosry Ahmed
2025-12-30 0:08 ` Sean Christopherson
0 siblings, 1 reply; 36+ messages in thread
From: Yosry Ahmed @ 2025-12-23 23:45 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Tue, Dec 23, 2025 at 03:12:09PM -0800, Sean Christopherson wrote:
> On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> > __tdp_pg_map() bears a lot of resemblance to __virt_pg_map(). The
> > main differences are:
> > - It uses the EPT struct overlay instead of the PTE masks.
> > - It always assumes 4-level EPTs.
> >
> > To reuse __virt_pg_map(), initialize the PTE masks in nested MMU with
> > EPT PTE masks. EPTs have no 'present' or 'user' bits, so use the
> > 'readable' bit instead like shadow_{present/user}_mask, ignoring the
> > fact that entries can be present and not readable if the CPU has
> > VMX_EPT_EXECUTE_ONLY_BIT. This is simple and sufficient for testing.
>
> Ugh, no. I am strongly against playing the same insane games KVM itself plays
> with overloading protection/access bits. There's no reason for selftests to do
> the same, e.g. selftests aren't shadowing guest PTEs and doing permission checks
> in hot paths and so don't need to multiplex a bunch of things into an inscrutable
> (but performant!) system.
I was trying to stay consistent with the KVM code (rather than care
about performance), but if you'd rather simplify things here then I am
fine with that too.
>
> > Add an executable bitmask and update __virt_pg_map() and friends to set
> > the bit on newly created entries to match the EPT behavior. It's a nop
> > for x86 page tables.
> >
> > Another benefit of reusing the code is having separate handling for
> > upper-level PTEs vs 4K PTEs, which avoids some quirks like setting the
> > large bit on a 4K PTE in the EPTs.
> >
> > No functional change intended.
> >
> > Suggested-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > ---
> > .../selftests/kvm/include/x86/processor.h | 3 +
> > .../testing/selftests/kvm/lib/x86/processor.c | 12 +-
> > tools/testing/selftests/kvm/lib/x86/vmx.c | 115 ++++--------------
> > 3 files changed, 33 insertions(+), 97 deletions(-)
> >
> > diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
> > index fb2b2e53d453..62e10b296719 100644
> > --- a/tools/testing/selftests/kvm/include/x86/processor.h
> > +++ b/tools/testing/selftests/kvm/include/x86/processor.h
> > @@ -1447,6 +1447,7 @@ struct pte_masks {
> > uint64_t dirty;
> > uint64_t huge;
> > uint64_t nx;
> > + uint64_t x;
>
> To be consistent with e.g. writable, call this executable.
Was trying to be consistent with 'nx' :)
>
> > uint64_t c;
> > uint64_t s;
> > };
> > @@ -1464,6 +1465,7 @@ struct kvm_mmu {
> > #define PTE_DIRTY_MASK(mmu) ((mmu)->pte_masks.dirty)
> > #define PTE_HUGE_MASK(mmu) ((mmu)->pte_masks.huge)
> > #define PTE_NX_MASK(mmu) ((mmu)->pte_masks.nx)
> > +#define PTE_X_MASK(mmu) ((mmu)->pte_masks.x)
> > #define PTE_C_MASK(mmu) ((mmu)->pte_masks.c)
> > #define PTE_S_MASK(mmu) ((mmu)->pte_masks.s)
> >
> > @@ -1474,6 +1476,7 @@ struct kvm_mmu {
> > #define pte_dirty(mmu, pte) (!!(*(pte) & PTE_DIRTY_MASK(mmu)))
> > #define pte_huge(mmu, pte) (!!(*(pte) & PTE_HUGE_MASK(mmu)))
> > #define pte_nx(mmu, pte) (!!(*(pte) & PTE_NX_MASK(mmu)))
> > +#define pte_x(mmu, pte) (!!(*(pte) & PTE_X_MASK(mmu)))
>
> And then here to not assume PRESENT == READABLE, just check if the MMU even has
> a PRESENT bit. We may still need changes, e.g. the page table builders actually
> need to verify a PTE is _writable_, not just present, but that's largely an
> orthogonal issue.
Not sure what you mean? How is the PTE being writable relevant to
assuming PRESENT == READABLE?
>
> #define is_present_pte(mmu, pte) \
> (PTE_PRESENT_MASK(mmu) ? \
> !!(*(pte) & PTE_PRESENT_MASK(mmu)) : \
> !!(*(pte) & (PTE_READABLE_MASK(mmu) | PTE_EXECUTABLE_MASK(mmu))))
and then Intel will introduce VMX_EPT_WRITE_ONLY_BIT :P
>
> And to properly capture the relationship between NX and EXECUTABLE:
>
> #define is_executable_pte(mmu, pte) \
> ((*(pte) & (PTE_EXECUTABLE_MASK(mmu) | PTE_NX_MASK(mmu))) == PTE_EXECUTABLE_MASK(mmu))
Yeah that's a lot better.
>
> #define is_nx_pte(mmu, pte) (!is_executable_pte(mmu, pte))
>
> > #define pte_c(mmu, pte) (!!(*(pte) & PTE_C_MASK(mmu)))
> > #define pte_s(mmu, pte) (!!(*(pte) & PTE_S_MASK(mmu)))
> >
> > diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
> > index bff75ff05364..8b0e17f8ca37 100644
> > --- a/tools/testing/selftests/kvm/lib/x86/processor.c
> > +++ b/tools/testing/selftests/kvm/lib/x86/processor.c
> > @@ -162,8 +162,7 @@ struct kvm_mmu *mmu_create(struct kvm_vm *vm, int pgtable_levels,
> > struct kvm_mmu *mmu = calloc(1, sizeof(*mmu));
> >
> > TEST_ASSERT(mmu, "-ENOMEM when allocating MMU");
> > - if (pte_masks)
> > - mmu->pte_masks = *pte_masks;
> > + mmu->pte_masks = *pte_masks;
>
> Rather than pass NULL (and allow NULL here) in the previous patch, pass an
> empty pte_masks. That avoids churning the MMU initialization code, and allows
> for a better TODO in the previous patch.
Makes sense.
>
> > + /*
> > + * EPTs do not have 'present' or 'user' bits, instead bit 0 is the
> > + * 'readable' bit. In some cases, EPTs can be execute-only and an entry
> > + * is present but not readable. However, for the purposes of testing we
> > + * assume 'present' == 'user' == 'readable' for simplicity.
> > + */
> > + pte_masks = (struct pte_masks){
> > + .present = BIT_ULL(0),
> > + .user = BIT_ULL(0),
> > + .writable = BIT_ULL(1),
> > + .x = BIT_ULL(2),
> > + .accessed = BIT_ULL(5),
> > + .dirty = BIT_ULL(6),
> > + .huge = BIT_ULL(7),
> > + .nx = 0,
> > + };
> > +
> > /* EPTP_PWL_4 is always used */
>
> Make this a TODO, e.g.
>
> /* TODO: Add support for 5-level paging. */
>
> so that it's clear this is a shortcoming, not some fundamental property of
> selftests.
Makes sense.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 10/16] KVM: selftests: Reuse virt mapping functions for nested EPTs
2025-12-23 23:14 ` Sean Christopherson
@ 2025-12-23 23:47 ` Yosry Ahmed
0 siblings, 0 replies; 36+ messages in thread
From: Yosry Ahmed @ 2025-12-23 23:47 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Tue, Dec 23, 2025 at 03:14:28PM -0800, Sean Christopherson wrote:
> On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> > + /*
> > + * EPTs do not have 'present' or 'user' bits, instead bit 0 is the
> > + * 'readable' bit. In some cases, EPTs can be execute-only and an entry
> > + * is present but not readable. However, for the purposes of testing we
> > + * assume 'present' == 'user' == 'readable' for simplicity.
> > + */
> > + pte_masks = (struct pte_masks){
> > + .present = BIT_ULL(0),
> > + .user = BIT_ULL(0),
> > + .writable = BIT_ULL(1),
> > + .x = BIT_ULL(2),
> > + .accessed = BIT_ULL(5),
> > + .dirty = BIT_ULL(6),
>
> Almost forgot, the Accessed and Dirty bits are wrong. They are bits 8 and 9
> respectively, not 5 and 6. Amusingly (well, it's amusing *now*, it wasn't so
> amusing at the time), I found that out when I couldn't get KVM to create a writable
> SPTE on a read fault in the nested dirty log test :-)
Instead of being a reasonable person and owning up to my mistake, I will
blame Intel for putting the bits there to begin with :P
(But seriously, sorry for such a dumb mistake)
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 00/16] Add Nested NPT support in selftests
2025-12-23 22:01 ` [PATCH v3 00/16] Add Nested NPT support in selftests Sean Christopherson
@ 2025-12-23 23:48 ` Yosry Ahmed
0 siblings, 0 replies; 36+ messages in thread
From: Yosry Ahmed @ 2025-12-23 23:48 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
On Tue, Dec 23, 2025 at 02:01:36PM -0800, Sean Christopherson wrote:
> On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> > Yosry Ahmed (16):
> > KVM: selftests: Make __vm_get_page_table_entry() static
> > KVM: selftests: Stop passing a memslot to nested_map_memslot()
> > KVM: selftests: Rename nested TDP mapping functions
> > KVM: selftests: Kill eptPageTablePointer
> > KVM: selftests: Stop setting AD bits on nested EPTs on creation
> > KVM: selftests: Introduce struct kvm_mmu
> > KVM: selftests: Move PTE bitmasks to kvm_mmu
> > KVM: selftests: Use a nested MMU to share nested EPTs between vCPUs
> > KVM: selftests: Stop passing VMX metadata to TDP mapping functions
> > KVM: selftests: Reuse virt mapping functions for nested EPTs
> > KVM: selftests: Move TDP mapping functions outside of vmx.c
> > KVM: selftests: Allow kvm_cpu_has_ept() to be called on AMD CPUs
> > KVM: selftests: Add support for nested NPTs
> > KVM: selftests: Set the user bit on nested NPT PTEs
> > KVM: selftests: Extend vmx_dirty_log_test to cover SVM
> > KVM: selftests: Extend memstress to run on nested SVM
>
> Lots of feedback incoming, but no need for you to do anything unless you
> disagree with something. I have all the "requested" changes in a local branch,
> and will post v4 (probably next week).
Thanks a lot for taking care of this. No disagreements, just a couple of
comments/questions in the replies.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 06/16] KVM: selftests: Introduce struct kvm_mmu
2025-12-23 23:38 ` Yosry Ahmed
@ 2025-12-29 15:24 ` Sean Christopherson
0 siblings, 0 replies; 36+ messages in thread
From: Sean Christopherson @ 2025-12-29 15:24 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel
On Tue, Dec 23, 2025, Yosry Ahmed wrote:
> On Tue, Dec 23, 2025 at 02:29:23PM -0800, Sean Christopherson wrote:
> > On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> > > diff --git a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
> > > index 972bb1c4ab4c..d8808fa33faa 100644
> > > --- a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
> > > +++ b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
> > > @@ -10,6 +10,8 @@
> > >
> > > extern bool is_forced_emulation_enabled;
> > >
> > > +struct kvm_mmu;
> > > +
> > > struct kvm_vm_arch {
> > > vm_vaddr_t gdt;
> > > vm_vaddr_t tss;
> > > @@ -19,6 +21,8 @@ struct kvm_vm_arch {
> > > uint64_t s_bit;
> > > int sev_fd;
> > > bool is_pt_protected;
> > > +
> > > + struct kvm_mmu *mmu;
> >
> > No, put kvm_mmu in common code and create kvm_vm.mmu. This makes the "mmu" object
> > a weird copy of state that's already in kvm_vm (pgd, pgd_created, and pgtable_levels),
> > and more importantly makes it _way_ too easy to botch the x86 MMU code (speaking
> > from first hand experience), e.g. due to grabbing vm->pgtable_levels instead of
> > the mmu's version. I don't see an easy way to _completely_ guard against goofs
> > like that, but it's easy-ish to audit the code for instances of "vm->mmu.",
> > and adding a common kvm_mmu avoids the weird duplicate code.
>
> Do you mean move pgd, pgd_created, and pgtable_levels into kvm_mmu?
Yep, exactly.
> If yes, that makes sense to me and is obviously an improvement over what's
> in this patch.
>
> I didn't immediately make the connection, but in hindsight it's obvious
> that having some of the state in kvm_vm_arch and some in kvm_vm is
> fragile.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 10/16] KVM: selftests: Reuse virt mapping functions for nested EPTs
2025-12-23 23:45 ` Yosry Ahmed
@ 2025-12-30 0:08 ` Sean Christopherson
2025-12-30 4:03 ` Yosry Ahmed
0 siblings, 1 reply; 36+ messages in thread
From: Sean Christopherson @ 2025-12-30 0:08 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel
On Tue, Dec 23, 2025, Yosry Ahmed wrote:
> On Tue, Dec 23, 2025 at 03:12:09PM -0800, Sean Christopherson wrote:
> > On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> > > diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
> > > index fb2b2e53d453..62e10b296719 100644
> > > --- a/tools/testing/selftests/kvm/include/x86/processor.h
> > > +++ b/tools/testing/selftests/kvm/include/x86/processor.h
> > > @@ -1447,6 +1447,7 @@ struct pte_masks {
> > > uint64_t dirty;
> > > uint64_t huge;
> > > uint64_t nx;
> > > + uint64_t x;
> >
> > To be consistent with e.g. writable, call this executable.
>
> Was trying to be consistent with 'nx' :)
>
> >
> > > uint64_t c;
> > > uint64_t s;
> > > };
> > > @@ -1464,6 +1465,7 @@ struct kvm_mmu {
> > > #define PTE_DIRTY_MASK(mmu) ((mmu)->pte_masks.dirty)
> > > #define PTE_HUGE_MASK(mmu) ((mmu)->pte_masks.huge)
> > > #define PTE_NX_MASK(mmu) ((mmu)->pte_masks.nx)
> > > +#define PTE_X_MASK(mmu) ((mmu)->pte_masks.x)
> > > #define PTE_C_MASK(mmu) ((mmu)->pte_masks.c)
> > > #define PTE_S_MASK(mmu) ((mmu)->pte_masks.s)
> > >
> > > @@ -1474,6 +1476,7 @@ struct kvm_mmu {
> > > #define pte_dirty(mmu, pte) (!!(*(pte) & PTE_DIRTY_MASK(mmu)))
> > > #define pte_huge(mmu, pte) (!!(*(pte) & PTE_HUGE_MASK(mmu)))
> > > #define pte_nx(mmu, pte) (!!(*(pte) & PTE_NX_MASK(mmu)))
> > > +#define pte_x(mmu, pte) (!!(*(pte) & PTE_X_MASK(mmu)))
> >
> > And then here to not assume PRESENT == READABLE, just check if the MMU even has
> > a PRESENT bit. We may still need changes, e.g. the page table builders actually
> > need to verify a PTE is _writable_, not just present, but that's largely an
> > orthogonal issue.
>
> Not sure what you mean? How is the PTE being writable relevant to
> assuming PRESENT == READABLE?
Only tangentially, I was trying to say that if we ever get to a point where selftests
support read-only mappings, then the below check won't suffice because walking
page tables would get false positives on whether or not an entry is usable, e.g.
if a test wants to create a writable mapping and ends up re-using a read-only
mapping.
The PRESENT == READABLE thing is much more about execute-only mappings (which
selftests also don't support, but as you allude to below, don't require new
hardware functionality).
> > #define is_present_pte(mmu, pte) \
> > (PTE_PRESENT_MASK(mmu) ? \
> > !!(*(pte) & PTE_PRESENT_MASK(mmu)) : \
> > !!(*(pte) & (PTE_READABLE_MASK(mmu) | PTE_EXECUTABLE_MASK(mmu))))
>
> and then Intel will introduce VMX_EPT_WRITE_ONLY_BIT :P
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 10/16] KVM: selftests: Reuse virt mapping functions for nested EPTs
2025-12-30 0:08 ` Sean Christopherson
@ 2025-12-30 4:03 ` Yosry Ahmed
2025-12-30 15:43 ` Sean Christopherson
0 siblings, 1 reply; 36+ messages in thread
From: Yosry Ahmed @ 2025-12-30 4:03 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel
December 29, 2025 at 4:08 PM, "Sean Christopherson" <seanjc@google.com> wrote:
>
> On Tue, Dec 23, 2025, Yosry Ahmed wrote:
>
> >
> > On Tue, Dec 23, 2025 at 03:12:09PM -0800, Sean Christopherson wrote:
> > On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> > > diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
> > > index fb2b2e53d453..62e10b296719 100644
> > > --- a/tools/testing/selftests/kvm/include/x86/processor.h
> > > +++ b/tools/testing/selftests/kvm/include/x86/processor.h
> > > @@ -1447,6 +1447,7 @@ struct pte_masks {
> > > uint64_t dirty;
> > > uint64_t huge;
> > > uint64_t nx;
> > > + uint64_t x;
> >
> > To be consistent with e.g. writable, call this executable.
> >
> > Was trying to be consistent with 'nx' :)
> >
> >
> > > uint64_t c;
> > > uint64_t s;
> > > };
> > > @@ -1464,6 +1465,7 @@ struct kvm_mmu {
> > > #define PTE_DIRTY_MASK(mmu) ((mmu)->pte_masks.dirty)
> > > #define PTE_HUGE_MASK(mmu) ((mmu)->pte_masks.huge)
> > > #define PTE_NX_MASK(mmu) ((mmu)->pte_masks.nx)
> > > +#define PTE_X_MASK(mmu) ((mmu)->pte_masks.x)
> > > #define PTE_C_MASK(mmu) ((mmu)->pte_masks.c)
> > > #define PTE_S_MASK(mmu) ((mmu)->pte_masks.s)
> > >
> > > @@ -1474,6 +1476,7 @@ struct kvm_mmu {
> > > #define pte_dirty(mmu, pte) (!!(*(pte) & PTE_DIRTY_MASK(mmu)))
> > > #define pte_huge(mmu, pte) (!!(*(pte) & PTE_HUGE_MASK(mmu)))
> > > #define pte_nx(mmu, pte) (!!(*(pte) & PTE_NX_MASK(mmu)))
> > > +#define pte_x(mmu, pte) (!!(*(pte) & PTE_X_MASK(mmu)))
> >
> > And then here to not assume PRESENT == READABLE, just check if the MMU even has
> > a PRESENT bit. We may still need changes, e.g. the page table builders actually
> > need to verify a PTE is _writable_, not just present, but that's largely an
> > orthogonal issue.
> >
> > Not sure what you mean? How is the PTE being writable relevant to
> > assuming PRESENT == READABLE?
> >
> Only tangentially, I was trying to say that if we ever get to a point where selftests
> support read-only mappings, then the below check won't suffice because walking
> page tables would get false positives on whether or not an entry is usable, e.g.
> if a test wants to create a writable mapping and ends up re-using a read-only
> mapping.
>
> The PRESENT == READABLE thing is much more about execute-only mappings (which
> selftests also don't support, but as you allude to below, don't require new
> hardware functionality).
Oh okay, thanks for clarifying. Yeah that makes sense, if/when read-only
mappings are ever supported the page table builders will need to be
updated accordingly.
Although now that you point this out, I think it would be easy to miss.
If new helpers are introduced that just modify existing page tables to
remove the write bit, then we'll probably miss updating the page table
builders to check for writable mappings. Then again, we'll probably only
update the leaf PTEs to be read-only, and the page table builders
already do not re-use leaf entries.
We could be paranoid and add some TEST_ASSERT() calls to guard against
that (e.g. in virt_create_upper_pte()), but probably not worth it.
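For completeness, the guard I had in mind was something along these
lines, using the is_*_pte() naming discussed earlier (hypothetical,
untested):
	/* hypothetical check in virt_create_upper_pte() */
	TEST_ASSERT(!is_present_pte(mmu, pte) || is_writable_pte(mmu, pte),
		    "Refusing to reuse a read-only PTE for a writable mapping");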
>
> >
> > #define is_present_pte(mmu, pte) \
> > (PTE_PRESENT_MASK(mmu) ? \
> > !!(*(pte) & PTE_PRESENT_MASK(mmu)) : \
> > !!(*(pte) & (PTE_READABLE_MASK(mmu) | PTE_EXECUTABLE_MASK(mmu))))
> >
> > and then Intel will introduce VMX_EPT_WRITE_ONLY_BIT :P
> >
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v3 10/16] KVM: selftests: Reuse virt mapping functions for nested EPTs
2025-12-30 4:03 ` Yosry Ahmed
@ 2025-12-30 15:43 ` Sean Christopherson
0 siblings, 0 replies; 36+ messages in thread
From: Sean Christopherson @ 2025-12-30 15:43 UTC (permalink / raw)
To: Yosry Ahmed; +Cc: Paolo Bonzini, kvm, linux-kernel
On Tue, Dec 30, 2025, Yosry Ahmed wrote:
> December 29, 2025 at 4:08 PM, "Sean Christopherson" <seanjc@google.com mailto:seanjc@google.com?to=%22Sean%20Christopherson%22%20%3Cseanjc%40google.com%3E > wrote:
> >
> > On Tue, Dec 23, 2025, Yosry Ahmed wrote:
> >
> > > On Tue, Dec 23, 2025 at 03:12:09PM -0800, Sean Christopherson wrote:
> > > On Thu, Nov 27, 2025, Yosry Ahmed wrote:
> > > > diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
> > > > index fb2b2e53d453..62e10b296719 100644
> > > > --- a/tools/testing/selftests/kvm/include/x86/processor.h
> > > > +++ b/tools/testing/selftests/kvm/include/x86/processor.h
> > > > @@ -1447,6 +1447,7 @@ struct pte_masks {
> > > > uint64_t dirty;
> > > > uint64_t huge;
> > > > uint64_t nx;
> > > > + uint64_t x;
> > >
> > > To be consistent with e.g. writable, call this executable.
> > >
> > > Was trying to be consistent with 'nx' :)
> > >
> > >
> > > > uint64_t c;
> > > > uint64_t s;
> > > > };
> > > > @@ -1464,6 +1465,7 @@ struct kvm_mmu {
> > > > #define PTE_DIRTY_MASK(mmu) ((mmu)->pte_masks.dirty)
> > > > #define PTE_HUGE_MASK(mmu) ((mmu)->pte_masks.huge)
> > > > #define PTE_NX_MASK(mmu) ((mmu)->pte_masks.nx)
> > > > +#define PTE_X_MASK(mmu) ((mmu)->pte_masks.x)
> > > > #define PTE_C_MASK(mmu) ((mmu)->pte_masks.c)
> > > > #define PTE_S_MASK(mmu) ((mmu)->pte_masks.s)
> > > >
> > > > @@ -1474,6 +1476,7 @@ struct kvm_mmu {
> > > > #define pte_dirty(mmu, pte) (!!(*(pte) & PTE_DIRTY_MASK(mmu)))
> > > > #define pte_huge(mmu, pte) (!!(*(pte) & PTE_HUGE_MASK(mmu)))
> > > > #define pte_nx(mmu, pte) (!!(*(pte) & PTE_NX_MASK(mmu)))
> > > > +#define pte_x(mmu, pte) (!!(*(pte) & PTE_X_MASK(mmu)))
> > >
> > > And then here to not assume PRESENT == READABLE, just check if the MMU even has
> > > a PRESENT bit. We may still need changes, e.g. the page table builders actually
> > > need to verify a PTE is _writable_, not just present, but that's largely an
> > > orthogonal issue.
> > >
> > > Not sure what you mean? How is the PTE being writable relevant to
> > > assuming PRESENT == READABLE?
> > >
> > Only tangentially, I was trying to say that if we ever get to a point where selftests
> > support read-only mappings, then the below check won't suffice because walking
> > page tables would get false positives on whether or not an entry is usable, e.g.
> > if a test wants to create a writable mapping and ends up re-using a read-only
> > mapping.
> >
> > The PRESENT == READABLE thing is much more about execute-only mappings (which
> > selftests also don't support, but as you allude to below, don't require new
> > hardware functionality).
>
> Oh okay, thanks for clarifying. Yeah that makes sense, if/when read-only
> mappings are ever supported the page table builders will need to be updated
> accordingly.
>
> Although now that you point this out, I think it would be easy to miss. If
> new helpers are introduced that just modify existing page tables to remove
> the write bit, then we'll probably miss updating the page table builders to
> check for writable mappings. Then again, we'll probably only update the leaf
> PTEs to be read-only, and the page table builders already do not re-use leaf
> entries.
Yep, unless we rewrite the KUT access test :-)
> We could be paranoid and add some TEST_ASSERT() calls to guard against that
> (e.g. in virt_create_upper_pte()), but probably not worth it.
Agreed.
^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2025-12-30 15:43 UTC | newest]
Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-27 1:34 [PATCH v3 00/16] Add Nested NPT support in selftests Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 01/16] KVM: selftests: Make __vm_get_page_table_entry() static Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 02/16] KVM: selftests: Stop passing a memslot to nested_map_memslot() Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 03/16] KVM: selftests: Rename nested TDP mapping functions Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 04/16] KVM: selftests: Kill eptPageTablePointer Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 05/16] KVM: selftests: Stop setting AD bits on nested EPTs on creation Yosry Ahmed
2025-12-23 22:26 ` Sean Christopherson
2025-12-23 23:35 ` Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 06/16] KVM: selftests: Introduce struct kvm_mmu Yosry Ahmed
2025-12-23 22:29 ` Sean Christopherson
2025-12-23 23:38 ` Yosry Ahmed
2025-12-29 15:24 ` Sean Christopherson
2025-11-27 1:34 ` [PATCH v3 07/16] KVM: selftests: Move PTE bitmasks to kvm_mmu Yosry Ahmed
2025-12-23 22:31 ` Sean Christopherson
2025-12-23 23:40 ` Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 08/16] KVM: selftests: Use a nested MMU to share nested EPTs between vCPUs Yosry Ahmed
2025-12-23 23:16 ` Sean Christopherson
2025-11-27 1:34 ` [PATCH v3 09/16] KVM: selftests: Stop passing VMX metadata to TDP mapping functions Yosry Ahmed
2025-12-15 18:38 ` Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 10/16] KVM: selftests: Reuse virt mapping functions for nested EPTs Yosry Ahmed
2025-12-23 23:12 ` Sean Christopherson
2025-12-23 23:45 ` Yosry Ahmed
2025-12-30 0:08 ` Sean Christopherson
2025-12-30 4:03 ` Yosry Ahmed
2025-12-30 15:43 ` Sean Christopherson
2025-12-23 23:14 ` Sean Christopherson
2025-12-23 23:47 ` Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 11/16] KVM: selftests: Move TDP mapping functions outside of vmx.c Yosry Ahmed
2025-12-23 23:13 ` Sean Christopherson
2025-11-27 1:34 ` [PATCH v3 12/16] KVM: selftests: Allow kvm_cpu_has_ept() to be called on AMD CPUs Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 13/16] KVM: selftests: Add support for nested NPTs Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 14/16] KVM: selftests: Set the user bit on nested NPT PTEs Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 15/16] KVM: selftests: Extend vmx_dirty_log_test to cover SVM Yosry Ahmed
2025-11-27 1:34 ` [PATCH v3 16/16] KVM: selftests: Extend memstress to run on nested SVM Yosry Ahmed
2025-12-23 22:01 ` [PATCH v3 00/16] Add Nested NPT support in selftests Sean Christopherson
2025-12-23 23:48 ` Yosry Ahmed
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).