* [PATCH v4 00/21] KVM: selftests: Add Nested NPT support
@ 2025-12-30 23:01 Sean Christopherson
2025-12-30 23:01 ` [PATCH v4 01/21] KVM: selftests: Make __vm_get_page_table_entry() static Sean Christopherson
` (21 more replies)
0 siblings, 22 replies; 39+ messages in thread
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
This is Yosry's series adding support for nested NPT; it also extends
vmx_dirty_log_test and kvm_dirty_log_test (with -n, using memstress) to
cover nested SVM.
Note, I'm mildly concerned the last patch, which extends nested_dirty_log_test
to validate KVM's handling of READ faults, could be flaky, e.g. if the test is
run under heavy memory pressure and the to-be-accessed page is swapped out
between the write-from-host and the read-from-guest. But unless someone
knows/shows it'll be flaky, I'm inclined to apply it and hope for the best.
v4:
- Document the likely reason for setting A/D bits.
- Put "mmu" structure in common "struct kvm".
- Make it clear PTE accessors are predicates.
- Assert that the stage-2 MMU is initialized at most once.
- Make READABLE a standalone bit (don't overload USER).
- Don't alias PRESENT => READABLE for EPT.
- Fix the A/D bit definitions for EPT.
- Drop the function comment for __tdp_map().
- Add a patch to extend the nested_dirty_log_test to verify that KVM creates
writable SPTEs when the gPTEs are writable and dirty.
v3:
- https://lore.kernel.org/all/20251127013440.3324671-1-yosry.ahmed@linux.dev
- Dropped the patches that landed in kvm-x86.
- Reshuffled some patches and cleanups.
- Introduced kvm_mmu data structures to hold the root, page table
levels, and page table masks (Sean).
- Extended memstress as well to cover nested SVM.
v2: https://lore.kernel.org/kvm/20251021074736.1324328-1-yosry.ahmed@linux.dev
Sean Christopherson (7):
KVM: selftests: Add "struct kvm_mmu" to track a given MMU instance
KVM: selftests: Plumb "struct kvm_mmu" into x86's MMU APIs
KVM: selftests: Add a "struct kvm_mmu_arch arch" member to kvm_mmu
KVM: selftests: Add a stage-2 MMU instance to kvm_vm
KVM: selftests: Move TDP mapping functions outside of vmx.c
KVM: selftests: Rename vm_get_page_table_entry() to vm_get_pte()
KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU
Yosry Ahmed (14):
KVM: selftests: Make __vm_get_page_table_entry() static
KVM: selftests: Stop passing a memslot to nested_map_memslot()
KVM: selftests: Rename nested TDP mapping functions
KVM: selftests: Kill eptPageTablePointer
KVM: selftests: Stop setting A/D bits when creating EPT PTEs
KVM: selftests: Move PTE bitmasks to kvm_mmu
KVM: selftests: Use a TDP MMU to share EPT page tables between vCPUs
KVM: selftests: Stop passing VMX metadata to TDP mapping functions
KVM: selftests: Reuse virt mapping functions for nested EPTs
KVM: selftests: Allow kvm_cpu_has_ept() to be called on AMD CPUs
KVM: selftests: Add support for nested NPTs
KVM: selftests: Set the user bit on nested NPT PTEs
KVM: selftests: Extend vmx_dirty_log_test to cover SVM
KVM: selftests: Extend memstress to run on nested SVM
tools/testing/selftests/kvm/Makefile.kvm | 2 +-
.../kvm/include/arm64/kvm_util_arch.h | 2 +
.../testing/selftests/kvm/include/kvm_util.h | 18 +-
.../kvm/include/loongarch/kvm_util_arch.h | 1 +
.../kvm/include/riscv/kvm_util_arch.h | 1 +
.../kvm/include/s390/kvm_util_arch.h | 1 +
.../selftests/kvm/include/x86/kvm_util_arch.h | 22 ++
.../selftests/kvm/include/x86/processor.h | 58 +++-
.../selftests/kvm/include/x86/svm_util.h | 9 +
tools/testing/selftests/kvm/include/x86/vmx.h | 16 +-
.../selftests/kvm/lib/arm64/processor.c | 38 +--
tools/testing/selftests/kvm/lib/kvm_util.c | 28 +-
.../selftests/kvm/lib/loongarch/processor.c | 28 +-
.../selftests/kvm/lib/riscv/processor.c | 31 ++-
.../selftests/kvm/lib/s390/processor.c | 16 +-
.../testing/selftests/kvm/lib/x86/memstress.c | 66 +++--
.../testing/selftests/kvm/lib/x86/processor.c | 237 ++++++++++++----
tools/testing/selftests/kvm/lib/x86/svm.c | 24 ++
tools/testing/selftests/kvm/lib/x86/vmx.c | 251 ++++-------------
.../selftests/kvm/x86/hyperv_tlb_flush.c | 2 +-
.../selftests/kvm/x86/nested_dirty_log_test.c | 259 ++++++++++++++++++
.../x86/smaller_maxphyaddr_emulation_test.c | 4 +-
.../selftests/kvm/x86/vmx_dirty_log_test.c | 179 ------------
.../kvm/x86/vmx_nested_la57_state_test.c | 2 +-
24 files changed, 726 insertions(+), 569 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
delete mode 100644 tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
base-commit: 9448598b22c50c8a5bb77a9103e2d49f134c9578
--
2.52.0.351.gbe84eed79e-goog
^ permalink raw reply [flat|nested] 39+ messages in thread
* [PATCH v4 01/21] KVM: selftests: Make __vm_get_page_table_entry() static
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
The function is only used in processor.c; drop the declaration in
processor.h and make it static.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/include/x86/processor.h | 2 --
tools/testing/selftests/kvm/lib/x86/processor.c | 4 ++--
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 57d62a425109..c00c0fbe62cd 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1367,8 +1367,6 @@ static inline bool kvm_is_ignore_msrs(void)
return get_kvm_param_bool("ignore_msrs");
}
-uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
- int *level);
uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr);
uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index 36104d27f3d9..c14bf2b5f28f 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -306,8 +306,8 @@ static bool vm_is_target_pte(uint64_t *pte, int *level, int current_level)
return *level == current_level;
}
-uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
- int *level)
+static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
+ int *level)
{
int va_width = 12 + (vm->pgtable_levels) * 9;
uint64_t *pte = &vm->pgd;
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 02/21] KVM: selftests: Stop passing a memslot to nested_map_memslot()
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
On x86, KVM selftests use memslot 0 for all the default regions used by
the test infrastructure. This is an implementation detail.
nested_map_memslot() is currently used to map the default regions by
explicitly passing slot 0, which leaks the library implementation into
the caller.
Rename the function to a very verbose
nested_identity_map_default_memslots() to reflect what it actually does.
Add an assertion that only memslot 0 is being used so that the
implementation does not change out from under us.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/include/x86/vmx.h | 4 ++--
tools/testing/selftests/kvm/lib/x86/vmx.c | 12 ++++++++----
tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c | 2 +-
3 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
index 96e2b4c630a9..91916b8aa94b 100644
--- a/tools/testing/selftests/kvm/include/x86/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86/vmx.h
@@ -563,8 +563,8 @@ void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
uint64_t nested_paddr, uint64_t paddr);
void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
uint64_t nested_paddr, uint64_t paddr, uint64_t size);
-void nested_map_memslot(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint32_t memslot);
+void nested_identity_map_default_memslots(struct vmx_pages *vmx,
+ struct kvm_vm *vm);
void nested_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
uint64_t addr, uint64_t size);
bool kvm_cpu_has_ept(void);
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index 29b082a58daa..eec33ec63811 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -494,12 +494,16 @@ void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
/* Prepare an identity extended page table that maps all the
* physical pages in VM.
*/
-void nested_map_memslot(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint32_t memslot)
+void nested_identity_map_default_memslots(struct vmx_pages *vmx,
+ struct kvm_vm *vm)
{
+ uint32_t s, memslot = 0;
sparsebit_idx_t i, last;
- struct userspace_mem_region *region =
- memslot2region(vm, memslot);
+ struct userspace_mem_region *region = memslot2region(vm, memslot);
+
+ /* Only memslot 0 is mapped here, ensure it's the only one being used */
+ for (s = 0; s < NR_MEM_REGIONS; s++)
+ TEST_ASSERT_EQ(vm->memslots[s], 0);
i = (region->region.guest_phys_addr >> vm->page_shift) - 1;
last = i + (region->region.memory_size >> vm->page_shift);
diff --git a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
index 98cb6bdab3e6..aab7333aaef0 100644
--- a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
@@ -121,7 +121,7 @@ static void test_vmx_dirty_log(bool enable_ept)
*/
if (enable_ept) {
prepare_eptp(vmx, vm);
- nested_map_memslot(vmx, vm, 0);
+ nested_identity_map_default_memslots(vmx, vm);
nested_map(vmx, vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
nested_map(vmx, vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
}
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 03/21] KVM: selftests: Rename nested TDP mapping functions
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
Rename the functions from nested_* to tdp_* to make their purpose
clearer.
No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/include/x86/vmx.h | 16 +++---
.../testing/selftests/kvm/lib/x86/memstress.c | 4 +-
tools/testing/selftests/kvm/lib/x86/vmx.c | 50 +++++++++----------
.../selftests/kvm/x86/vmx_dirty_log_test.c | 6 +--
4 files changed, 37 insertions(+), 39 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
index 91916b8aa94b..04b8231d032a 100644
--- a/tools/testing/selftests/kvm/include/x86/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86/vmx.h
@@ -559,14 +559,14 @@ bool load_vmcs(struct vmx_pages *vmx);
bool ept_1g_pages_supported(void);
-void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr);
-void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr, uint64_t size);
-void nested_identity_map_default_memslots(struct vmx_pages *vmx,
- struct kvm_vm *vm);
-void nested_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t addr, uint64_t size);
+void tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm, uint64_t nested_paddr,
+ uint64_t paddr);
+void tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm, uint64_t nested_paddr,
+ uint64_t paddr, uint64_t size);
+void tdp_identity_map_default_memslots(struct vmx_pages *vmx,
+ struct kvm_vm *vm);
+void tdp_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint64_t addr, uint64_t size);
bool kvm_cpu_has_ept(void);
void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm);
void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm);
diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
index 0b1f288ad556..1928b00bde51 100644
--- a/tools/testing/selftests/kvm/lib/x86/memstress.c
+++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
@@ -70,11 +70,11 @@ void memstress_setup_ept(struct vmx_pages *vmx, struct kvm_vm *vm)
* KVM can shadow the EPT12 with the maximum huge page size supported
* by the backing source.
*/
- nested_identity_map_1g(vmx, vm, 0, 0x100000000ULL);
+ tdp_identity_map_1g(vmx, vm, 0, 0x100000000ULL);
start = align_down(memstress_args.gpa, PG_SIZE_1G);
end = align_up(memstress_args.gpa + memstress_args.size, PG_SIZE_1G);
- nested_identity_map_1g(vmx, vm, start, end - start);
+ tdp_identity_map_1g(vmx, vm, start, end - start);
}
void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vcpus[])
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index eec33ec63811..1954ccdfc353 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -362,12 +362,12 @@ void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip, void *guest_rsp)
init_vmcs_guest_state(guest_rip, guest_rsp);
}
-static void nested_create_pte(struct kvm_vm *vm,
- struct eptPageTableEntry *pte,
- uint64_t nested_paddr,
- uint64_t paddr,
- int current_level,
- int target_level)
+static void tdp_create_pte(struct kvm_vm *vm,
+ struct eptPageTableEntry *pte,
+ uint64_t nested_paddr,
+ uint64_t paddr,
+ int current_level,
+ int target_level)
{
if (!pte->readable) {
pte->writable = true;
@@ -394,8 +394,8 @@ static void nested_create_pte(struct kvm_vm *vm,
}
-void __nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr, int target_level)
+void __tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint64_t nested_paddr, uint64_t paddr, int target_level)
{
const uint64_t page_size = PG_LEVEL_SIZE(target_level);
struct eptPageTableEntry *pt = vmx->eptp_hva, *pte;
@@ -428,7 +428,7 @@ void __nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
index = (nested_paddr >> PG_LEVEL_SHIFT(level)) & 0x1ffu;
pte = &pt[index];
- nested_create_pte(vm, pte, nested_paddr, paddr, level, target_level);
+ tdp_create_pte(vm, pte, nested_paddr, paddr, level, target_level);
if (pte->page_size)
break;
@@ -445,10 +445,10 @@ void __nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
}
-void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr)
+void tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint64_t nested_paddr, uint64_t paddr)
{
- __nested_pg_map(vmx, vm, nested_paddr, paddr, PG_LEVEL_4K);
+ __tdp_pg_map(vmx, vm, nested_paddr, paddr, PG_LEVEL_4K);
}
/*
@@ -468,8 +468,8 @@ void nested_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
* Within the VM given by vm, creates a nested guest translation for the
* page range starting at nested_paddr to the page range starting at paddr.
*/
-void __nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr, uint64_t size,
+void __tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint64_t nested_paddr, uint64_t paddr, uint64_t size,
int level)
{
size_t page_size = PG_LEVEL_SIZE(level);
@@ -479,23 +479,23 @@ void __nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
TEST_ASSERT(paddr + size > paddr, "Paddr overflow");
while (npages--) {
- __nested_pg_map(vmx, vm, nested_paddr, paddr, level);
+ __tdp_pg_map(vmx, vm, nested_paddr, paddr, level);
nested_paddr += page_size;
paddr += page_size;
}
}
-void nested_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr, uint64_t size)
+void tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm,
+ uint64_t nested_paddr, uint64_t paddr, uint64_t size)
{
- __nested_map(vmx, vm, nested_paddr, paddr, size, PG_LEVEL_4K);
+ __tdp_map(vmx, vm, nested_paddr, paddr, size, PG_LEVEL_4K);
}
/* Prepare an identity extended page table that maps all the
* physical pages in VM.
*/
-void nested_identity_map_default_memslots(struct vmx_pages *vmx,
- struct kvm_vm *vm)
+void tdp_identity_map_default_memslots(struct vmx_pages *vmx,
+ struct kvm_vm *vm)
{
uint32_t s, memslot = 0;
sparsebit_idx_t i, last;
@@ -512,18 +512,16 @@ void nested_identity_map_default_memslots(struct vmx_pages *vmx,
if (i > last)
break;
- nested_map(vmx, vm,
- (uint64_t)i << vm->page_shift,
- (uint64_t)i << vm->page_shift,
- 1 << vm->page_shift);
+ tdp_map(vmx, vm, (uint64_t)i << vm->page_shift,
+ (uint64_t)i << vm->page_shift, 1 << vm->page_shift);
}
}
/* Identity map a region with 1GiB Pages. */
-void nested_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
+void tdp_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
uint64_t addr, uint64_t size)
{
- __nested_map(vmx, vm, addr, addr, size, PG_LEVEL_1G);
+ __tdp_map(vmx, vm, addr, addr, size, PG_LEVEL_1G);
}
bool kvm_cpu_has_ept(void)
diff --git a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
index aab7333aaef0..e7d0c08ba29d 100644
--- a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
@@ -121,9 +121,9 @@ static void test_vmx_dirty_log(bool enable_ept)
*/
if (enable_ept) {
prepare_eptp(vmx, vm);
- nested_identity_map_default_memslots(vmx, vm);
- nested_map(vmx, vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
- nested_map(vmx, vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
+ tdp_identity_map_default_memslots(vmx, vm);
+ tdp_map(vmx, vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
+ tdp_map(vmx, vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
}
bmap = bitmap_zalloc(TEST_MEM_PAGES);
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 04/21] KVM: selftests: Kill eptPageTablePointer
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
Replace the struct overlay with explicit bitmasks, which is clearer and
less error-prone. See commit f18b4aebe107 ("kvm: selftests: do not use
bitfields larger than 32-bits for PTEs") for an example of why bitfields
are not preferable.
Remove the unused PAGE_SHIFT_4K definition while at it.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/lib/x86/vmx.c | 35 +++++++++++------------
1 file changed, 16 insertions(+), 19 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index 1954ccdfc353..85043bb1ec4d 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -10,10 +10,16 @@
#include "processor.h"
#include "vmx.h"
-#define PAGE_SHIFT_4K 12
-
#define KVM_EPT_PAGE_TABLE_MIN_PADDR 0x1c0000
+#define EPTP_MT_SHIFT 0 /* EPTP memtype bits 2:0 */
+#define EPTP_PWL_SHIFT 3 /* EPTP page walk length bits 5:3 */
+#define EPTP_AD_ENABLED_SHIFT 6 /* EPTP AD enabled bit 6 */
+
+#define EPTP_WB (X86_MEMTYPE_WB << EPTP_MT_SHIFT)
+#define EPTP_PWL_4 (3ULL << EPTP_PWL_SHIFT) /* PWL is (levels - 1) */
+#define EPTP_AD_ENABLED (1ULL << EPTP_AD_ENABLED_SHIFT)
+
bool enable_evmcs;
struct hv_enlightened_vmcs *current_evmcs;
@@ -34,14 +40,6 @@ struct eptPageTableEntry {
uint64_t suppress_ve:1;
};
-struct eptPageTablePointer {
- uint64_t memory_type:3;
- uint64_t page_walk_length:3;
- uint64_t ad_enabled:1;
- uint64_t reserved_11_07:5;
- uint64_t address:40;
- uint64_t reserved_63_52:12;
-};
int vcpu_enable_evmcs(struct kvm_vcpu *vcpu)
{
uint16_t evmcs_ver;
@@ -196,16 +194,15 @@ static inline void init_vmcs_control_fields(struct vmx_pages *vmx)
vmwrite(PIN_BASED_VM_EXEC_CONTROL, rdmsr(MSR_IA32_VMX_TRUE_PINBASED_CTLS));
if (vmx->eptp_gpa) {
- uint64_t ept_paddr;
- struct eptPageTablePointer eptp = {
- .memory_type = X86_MEMTYPE_WB,
- .page_walk_length = 3, /* + 1 */
- .ad_enabled = ept_vpid_cap_supported(VMX_EPT_VPID_CAP_AD_BITS),
- .address = vmx->eptp_gpa >> PAGE_SHIFT_4K,
- };
+ uint64_t eptp = vmx->eptp_gpa | EPTP_WB | EPTP_PWL_4;
- memcpy(&ept_paddr, &eptp, sizeof(ept_paddr));
- vmwrite(EPT_POINTER, ept_paddr);
+ TEST_ASSERT((vmx->eptp_gpa & ~PHYSICAL_PAGE_MASK) == 0,
+ "Illegal bits set in vmx->eptp_gpa");
+
+ if (ept_vpid_cap_supported(VMX_EPT_VPID_CAP_AD_BITS))
+ eptp |= EPTP_AD_ENABLED;
+
+ vmwrite(EPT_POINTER, eptp);
sec_exec_ctl |= SECONDARY_EXEC_ENABLE_EPT;
}
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 05/21] KVM: selftests: Stop setting A/D bits when creating EPT PTEs
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
Stop setting Accessed/Dirty bits when creating EPT entries for L2 so that
the stage-1 and stage-2 (a.k.a. TDP) page table APIs can use common code
without bleeding the EPT hack into the common APIs.
While commit 094444204570 ("selftests: kvm: add test for dirty logging
inside nested guests") is _very_ light on details, the most likely
explanation is that vmx_dirty_log_test was attempting to avoid taking an
EPT Violation on the first _write_ from L2.
static void l2_guest_code(u64 *a, u64 *b)
{
READ_ONCE(*a);
WRITE_ONCE(*a, 1); <===
GUEST_SYNC(true);
...
}
When handling read faults in the shadow MMU, KVM opportunistically creates
a writable SPTE if the mapping can be writable *and* the gPTE is dirty (or
doesn't support the Dirty bit), i.e. if KVM doesn't need to intercept
writes in order to emulate Dirty-bit updates. By setting A/D bits in the
test's EPT entries, the above READ+WRITE will fault only on the read, and
in theory expose the bug fixed by KVM commit 1f4e5fc83a42 ("KVM: x86: fix
nested guest live migration with PML"). If the Dirty bit is NOT set, the
test will get a false pass, though again, only in theory.
However, the test is flawed (and always was, at least in the versions
posted publicly), as KVM (correctly) marks the corresponding L1 GFN as
dirty (in the dirty bitmap) when creating the writable SPTE. I.e. without
a check on the dirty bitmap after the READ_ONCE(), the check after the
first WRITE_ONCE() will get a false pass due to the dirty bitmap/log having
been updated by the read fault, not by PML.
Furthermore, the subsequent behavior in the test's l2_guest_code()
effectively hides the flawed test behavior, as the straight writes to a
new L2 GPA also trigger the KVM bug, and so the test will still
detect the failure due to lack of isolation between the two testcases
(Read=>Write vs. Write=>Write).
WRITE_ONCE(*b, 1);
GUEST_SYNC(true);
WRITE_ONCE(*b, 1);
GUEST_SYNC(true);
GUEST_SYNC(false);
Punt on fixing vmx_dirty_log_test for the moment as it will be easier to
properly fix the test once the TDP code uses the common MMU APIs, at which
point it will be trivially easy for the test to retrieve the EPT PTE and
set the Dirty bit as needed.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
[sean: rewrite changelog to explain the situation]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/lib/x86/vmx.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index 85043bb1ec4d..a3e2eae981da 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -432,14 +432,6 @@ void __tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
pt = addr_gpa2hva(vm, pte->address * vm->page_size);
}
-
- /*
- * For now mark these as accessed and dirty because the only
- * testcase we have needs that. Can be reconsidered later.
- */
- pte->accessed = true;
- pte->dirty = true;
-
}
void tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 06/21] KVM: selftests: Add "struct kvm_mmu" to track a given MMU instance
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
Add a "struct kvm_mmu" to track a given MMU instance, e.g. a VM's stage-1
MMU versus a VM's stage-2 MMU, so that x86 can share MMU functionality for
both stage-1 and stage-2 MMUs, without creating the potential for subtle
bugs, e.g. due to consuming vm->pgtable_levels when operating on a stage-2
MMU.
Encapsulate the existing de facto MMU in "struct kvm_vm", e.g. instead of
burying the MMU details in "struct kvm_vm_arch", to avoid more #ifdefs in
____vm_create(), and in the hopes that other architectures can utilize the
formalized MMU structure if/when they too support stage-2 page tables.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
.../testing/selftests/kvm/include/kvm_util.h | 11 ++++--
.../selftests/kvm/lib/arm64/processor.c | 38 +++++++++----------
tools/testing/selftests/kvm/lib/kvm_util.c | 28 +++++++-------
.../selftests/kvm/lib/loongarch/processor.c | 28 +++++++-------
.../selftests/kvm/lib/riscv/processor.c | 31 +++++++--------
.../selftests/kvm/lib/s390/processor.c | 16 ++++----
.../testing/selftests/kvm/lib/x86/processor.c | 28 +++++++-------
.../kvm/x86/vmx_nested_la57_state_test.c | 2 +-
8 files changed, 94 insertions(+), 88 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 81f4355ff28a..39558c05c0bf 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -88,12 +88,17 @@ enum kvm_mem_region_type {
NR_MEM_REGIONS,
};
+struct kvm_mmu {
+ bool pgd_created;
+ uint64_t pgd;
+ int pgtable_levels;
+};
+
struct kvm_vm {
int mode;
unsigned long type;
int kvm_fd;
int fd;
- unsigned int pgtable_levels;
unsigned int page_size;
unsigned int page_shift;
unsigned int pa_bits;
@@ -104,13 +109,13 @@ struct kvm_vm {
struct sparsebit *vpages_valid;
struct sparsebit *vpages_mapped;
bool has_irqchip;
- bool pgd_created;
vm_paddr_t ucall_mmio_addr;
- vm_paddr_t pgd;
vm_vaddr_t handlers;
uint32_t dirty_ring_size;
uint64_t gpa_tag_mask;
+ struct kvm_mmu mmu;
+
struct kvm_vm_arch arch;
struct kvm_binary_stats stats;
diff --git a/tools/testing/selftests/kvm/lib/arm64/processor.c b/tools/testing/selftests/kvm/lib/arm64/processor.c
index d46e4b13b92c..c40f59d48311 100644
--- a/tools/testing/selftests/kvm/lib/arm64/processor.c
+++ b/tools/testing/selftests/kvm/lib/arm64/processor.c
@@ -28,7 +28,7 @@ static uint64_t page_align(struct kvm_vm *vm, uint64_t v)
static uint64_t pgd_index(struct kvm_vm *vm, vm_vaddr_t gva)
{
- unsigned int shift = (vm->pgtable_levels - 1) * (vm->page_shift - 3) + vm->page_shift;
+ unsigned int shift = (vm->mmu.pgtable_levels - 1) * (vm->page_shift - 3) + vm->page_shift;
uint64_t mask = (1UL << (vm->va_bits - shift)) - 1;
return (gva >> shift) & mask;
@@ -39,7 +39,7 @@ static uint64_t pud_index(struct kvm_vm *vm, vm_vaddr_t gva)
unsigned int shift = 2 * (vm->page_shift - 3) + vm->page_shift;
uint64_t mask = (1UL << (vm->page_shift - 3)) - 1;
- TEST_ASSERT(vm->pgtable_levels == 4,
+ TEST_ASSERT(vm->mmu.pgtable_levels == 4,
"Mode %d does not have 4 page table levels", vm->mode);
return (gva >> shift) & mask;
@@ -50,7 +50,7 @@ static uint64_t pmd_index(struct kvm_vm *vm, vm_vaddr_t gva)
unsigned int shift = (vm->page_shift - 3) + vm->page_shift;
uint64_t mask = (1UL << (vm->page_shift - 3)) - 1;
- TEST_ASSERT(vm->pgtable_levels >= 3,
+ TEST_ASSERT(vm->mmu.pgtable_levels >= 3,
"Mode %d does not have >= 3 page table levels", vm->mode);
return (gva >> shift) & mask;
@@ -104,7 +104,7 @@ static uint64_t pte_addr(struct kvm_vm *vm, uint64_t pte)
static uint64_t ptrs_per_pgd(struct kvm_vm *vm)
{
- unsigned int shift = (vm->pgtable_levels - 1) * (vm->page_shift - 3) + vm->page_shift;
+ unsigned int shift = (vm->mmu.pgtable_levels - 1) * (vm->page_shift - 3) + vm->page_shift;
return 1 << (vm->va_bits - shift);
}
@@ -117,13 +117,13 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
{
size_t nr_pages = page_align(vm, ptrs_per_pgd(vm) * 8) / vm->page_size;
- if (vm->pgd_created)
+ if (vm->mmu.pgd_created)
return;
- vm->pgd = vm_phy_pages_alloc(vm, nr_pages,
- KVM_GUEST_PAGE_TABLE_MIN_PADDR,
- vm->memslots[MEM_REGION_PT]);
- vm->pgd_created = true;
+ vm->mmu.pgd = vm_phy_pages_alloc(vm, nr_pages,
+ KVM_GUEST_PAGE_TABLE_MIN_PADDR,
+ vm->memslots[MEM_REGION_PT]);
+ vm->mmu.pgd_created = true;
}
static void _virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
@@ -147,12 +147,12 @@ static void _virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
" paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
paddr, vm->max_gfn, vm->page_size);
- ptep = addr_gpa2hva(vm, vm->pgd) + pgd_index(vm, vaddr) * 8;
+ ptep = addr_gpa2hva(vm, vm->mmu.pgd) + pgd_index(vm, vaddr) * 8;
if (!*ptep)
*ptep = addr_pte(vm, vm_alloc_page_table(vm),
PGD_TYPE_TABLE | PTE_VALID);
- switch (vm->pgtable_levels) {
+ switch (vm->mmu.pgtable_levels) {
case 4:
ptep = addr_gpa2hva(vm, pte_addr(vm, *ptep)) + pud_index(vm, vaddr) * 8;
if (!*ptep)
@@ -190,16 +190,16 @@ uint64_t *virt_get_pte_hva_at_level(struct kvm_vm *vm, vm_vaddr_t gva, int level
{
uint64_t *ptep;
- if (!vm->pgd_created)
+ if (!vm->mmu.pgd_created)
goto unmapped_gva;
- ptep = addr_gpa2hva(vm, vm->pgd) + pgd_index(vm, gva) * 8;
+ ptep = addr_gpa2hva(vm, vm->mmu.pgd) + pgd_index(vm, gva) * 8;
if (!ptep)
goto unmapped_gva;
if (level == 0)
return ptep;
- switch (vm->pgtable_levels) {
+ switch (vm->mmu.pgtable_levels) {
case 4:
ptep = addr_gpa2hva(vm, pte_addr(vm, *ptep)) + pud_index(vm, gva) * 8;
if (!ptep)
@@ -263,13 +263,13 @@ static void pte_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent, uint64_t p
void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
{
- int level = 4 - (vm->pgtable_levels - 1);
+ int level = 4 - (vm->mmu.pgtable_levels - 1);
uint64_t pgd, *ptep;
- if (!vm->pgd_created)
+ if (!vm->mmu.pgd_created)
return;
- for (pgd = vm->pgd; pgd < vm->pgd + ptrs_per_pgd(vm) * 8; pgd += 8) {
+ for (pgd = vm->mmu.pgd; pgd < vm->mmu.pgd + ptrs_per_pgd(vm) * 8; pgd += 8) {
ptep = addr_gpa2hva(vm, pgd);
if (!*ptep)
continue;
@@ -350,7 +350,7 @@ void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
TEST_FAIL("Unknown guest mode, mode: 0x%x", vm->mode);
}
- ttbr0_el1 = vm->pgd & GENMASK(47, vm->page_shift);
+ ttbr0_el1 = vm->mmu.pgd & GENMASK(47, vm->page_shift);
/* Configure output size */
switch (vm->mode) {
@@ -358,7 +358,7 @@ void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
case VM_MODE_P52V48_16K:
case VM_MODE_P52V48_64K:
tcr_el1 |= TCR_IPS_52_BITS;
- ttbr0_el1 |= FIELD_GET(GENMASK(51, 48), vm->pgd) << 2;
+ ttbr0_el1 |= FIELD_GET(GENMASK(51, 48), vm->mmu.pgd) << 2;
break;
case VM_MODE_P48V48_4K:
case VM_MODE_P48V48_16K:
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 8279b6ced8d2..65752daeed90 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -281,34 +281,34 @@ struct kvm_vm *____vm_create(struct vm_shape shape)
/* Setup mode specific traits. */
switch (vm->mode) {
case VM_MODE_P52V48_4K:
- vm->pgtable_levels = 4;
+ vm->mmu.pgtable_levels = 4;
break;
case VM_MODE_P52V48_64K:
- vm->pgtable_levels = 3;
+ vm->mmu.pgtable_levels = 3;
break;
case VM_MODE_P48V48_4K:
- vm->pgtable_levels = 4;
+ vm->mmu.pgtable_levels = 4;
break;
case VM_MODE_P48V48_64K:
- vm->pgtable_levels = 3;
+ vm->mmu.pgtable_levels = 3;
break;
case VM_MODE_P40V48_4K:
case VM_MODE_P36V48_4K:
- vm->pgtable_levels = 4;
+ vm->mmu.pgtable_levels = 4;
break;
case VM_MODE_P40V48_64K:
case VM_MODE_P36V48_64K:
- vm->pgtable_levels = 3;
+ vm->mmu.pgtable_levels = 3;
break;
case VM_MODE_P52V48_16K:
case VM_MODE_P48V48_16K:
case VM_MODE_P40V48_16K:
case VM_MODE_P36V48_16K:
- vm->pgtable_levels = 4;
+ vm->mmu.pgtable_levels = 4;
break;
case VM_MODE_P47V47_16K:
case VM_MODE_P36V47_16K:
- vm->pgtable_levels = 3;
+ vm->mmu.pgtable_levels = 3;
break;
case VM_MODE_PXXVYY_4K:
#ifdef __x86_64__
@@ -321,22 +321,22 @@ struct kvm_vm *____vm_create(struct vm_shape shape)
vm->va_bits);
if (vm->va_bits == 57) {
- vm->pgtable_levels = 5;
+ vm->mmu.pgtable_levels = 5;
} else {
TEST_ASSERT(vm->va_bits == 48,
"Unexpected guest virtual address width: %d",
vm->va_bits);
- vm->pgtable_levels = 4;
+ vm->mmu.pgtable_levels = 4;
}
#else
TEST_FAIL("VM_MODE_PXXVYY_4K not supported on non-x86 platforms");
#endif
break;
case VM_MODE_P47V64_4K:
- vm->pgtable_levels = 5;
+ vm->mmu.pgtable_levels = 5;
break;
case VM_MODE_P44V64_4K:
- vm->pgtable_levels = 5;
+ vm->mmu.pgtable_levels = 5;
break;
default:
TEST_FAIL("Unknown guest mode: 0x%x", vm->mode);
@@ -1956,8 +1956,8 @@ void vm_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
fprintf(stream, "%*sMapped Virtual Pages:\n", indent, "");
sparsebit_dump(stream, vm->vpages_mapped, indent + 2);
fprintf(stream, "%*spgd_created: %u\n", indent, "",
- vm->pgd_created);
- if (vm->pgd_created) {
+ vm->mmu.pgd_created);
+ if (vm->mmu.pgd_created) {
fprintf(stream, "%*sVirtual Translation Tables:\n",
indent + 2, "");
virt_dump(stream, vm, indent + 4);
diff --git a/tools/testing/selftests/kvm/lib/loongarch/processor.c b/tools/testing/selftests/kvm/lib/loongarch/processor.c
index 07c103369ddb..17aa55a2047a 100644
--- a/tools/testing/selftests/kvm/lib/loongarch/processor.c
+++ b/tools/testing/selftests/kvm/lib/loongarch/processor.c
@@ -50,11 +50,11 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
int i;
vm_paddr_t child, table;
- if (vm->pgd_created)
+ if (vm->mmu.pgd_created)
return;
child = table = 0;
- for (i = 0; i < vm->pgtable_levels; i++) {
+ for (i = 0; i < vm->mmu.pgtable_levels; i++) {
invalid_pgtable[i] = child;
table = vm_phy_page_alloc(vm, LOONGARCH_PAGE_TABLE_PHYS_MIN,
vm->memslots[MEM_REGION_PT]);
@@ -62,8 +62,8 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
virt_set_pgtable(vm, table, child);
child = table;
}
- vm->pgd = table;
- vm->pgd_created = true;
+ vm->mmu.pgd = table;
+ vm->mmu.pgd_created = true;
}
static int virt_pte_none(uint64_t *ptep, int level)
@@ -77,11 +77,11 @@ static uint64_t *virt_populate_pte(struct kvm_vm *vm, vm_vaddr_t gva, int alloc)
uint64_t *ptep;
vm_paddr_t child;
- if (!vm->pgd_created)
+ if (!vm->mmu.pgd_created)
goto unmapped_gva;
- child = vm->pgd;
- level = vm->pgtable_levels - 1;
+ child = vm->mmu.pgd;
+ level = vm->mmu.pgtable_levels - 1;
while (level > 0) {
ptep = addr_gpa2hva(vm, child) + virt_pte_index(vm, gva, level) * 8;
if (virt_pte_none(ptep, level)) {
@@ -161,11 +161,11 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
{
int level;
- if (!vm->pgd_created)
+ if (!vm->mmu.pgd_created)
return;
- level = vm->pgtable_levels - 1;
- pte_dump(stream, vm, indent, vm->pgd, level);
+ level = vm->mmu.pgtable_levels - 1;
+ pte_dump(stream, vm, indent, vm->mmu.pgd, level);
}
void vcpu_arch_dump(FILE *stream, struct kvm_vcpu *vcpu, uint8_t indent)
@@ -297,7 +297,7 @@ static void loongarch_vcpu_setup(struct kvm_vcpu *vcpu)
width = vm->page_shift - 3;
- switch (vm->pgtable_levels) {
+ switch (vm->mmu.pgtable_levels) {
case 4:
/* pud page shift and width */
val = (vm->page_shift + width * 2) << 20 | (width << 25);
@@ -309,15 +309,15 @@ static void loongarch_vcpu_setup(struct kvm_vcpu *vcpu)
val |= vm->page_shift | width << 5;
break;
default:
- TEST_FAIL("Got %u page table levels, expected 3 or 4", vm->pgtable_levels);
+ TEST_FAIL("Got %u page table levels, expected 3 or 4", vm->mmu.pgtable_levels);
}
loongarch_set_csr(vcpu, LOONGARCH_CSR_PWCTL0, val);
/* PGD page shift and width */
- val = (vm->page_shift + width * (vm->pgtable_levels - 1)) | width << 6;
+ val = (vm->page_shift + width * (vm->mmu.pgtable_levels - 1)) | width << 6;
loongarch_set_csr(vcpu, LOONGARCH_CSR_PWCTL1, val);
- loongarch_set_csr(vcpu, LOONGARCH_CSR_PGDL, vm->pgd);
+ loongarch_set_csr(vcpu, LOONGARCH_CSR_PGDL, vm->mmu.pgd);
/*
* Refill exception runs on real mode
diff --git a/tools/testing/selftests/kvm/lib/riscv/processor.c b/tools/testing/selftests/kvm/lib/riscv/processor.c
index 2eac7d4b59e9..e6ec7c224fc3 100644
--- a/tools/testing/selftests/kvm/lib/riscv/processor.c
+++ b/tools/testing/selftests/kvm/lib/riscv/processor.c
@@ -60,7 +60,7 @@ static uint64_t pte_index(struct kvm_vm *vm, vm_vaddr_t gva, int level)
{
TEST_ASSERT(level > -1,
"Negative page table level (%d) not possible", level);
- TEST_ASSERT(level < vm->pgtable_levels,
+ TEST_ASSERT(level < vm->mmu.pgtable_levels,
"Invalid page table level (%d)", level);
return (gva & pte_index_mask[level]) >> pte_index_shift[level];
@@ -70,19 +70,19 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
{
size_t nr_pages = page_align(vm, ptrs_per_pte(vm) * 8) / vm->page_size;
- if (vm->pgd_created)
+ if (vm->mmu.pgd_created)
return;
- vm->pgd = vm_phy_pages_alloc(vm, nr_pages,
- KVM_GUEST_PAGE_TABLE_MIN_PADDR,
- vm->memslots[MEM_REGION_PT]);
- vm->pgd_created = true;
+ vm->mmu.pgd = vm_phy_pages_alloc(vm, nr_pages,
+ KVM_GUEST_PAGE_TABLE_MIN_PADDR,
+ vm->memslots[MEM_REGION_PT]);
+ vm->mmu.pgd_created = true;
}
void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
{
uint64_t *ptep, next_ppn;
- int level = vm->pgtable_levels - 1;
+ int level = vm->mmu.pgtable_levels - 1;
TEST_ASSERT((vaddr % vm->page_size) == 0,
"Virtual address not on page boundary,\n"
@@ -98,7 +98,7 @@ void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
" paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
paddr, vm->max_gfn, vm->page_size);
- ptep = addr_gpa2hva(vm, vm->pgd) + pte_index(vm, vaddr, level) * 8;
+ ptep = addr_gpa2hva(vm, vm->mmu.pgd) + pte_index(vm, vaddr, level) * 8;
if (!*ptep) {
next_ppn = vm_alloc_page_table(vm) >> PGTBL_PAGE_SIZE_SHIFT;
*ptep = (next_ppn << PGTBL_PTE_ADDR_SHIFT) |
@@ -126,12 +126,12 @@ void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
vm_paddr_t addr_arch_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva)
{
uint64_t *ptep;
- int level = vm->pgtable_levels - 1;
+ int level = vm->mmu.pgtable_levels - 1;
- if (!vm->pgd_created)
+ if (!vm->mmu.pgd_created)
goto unmapped_gva;
- ptep = addr_gpa2hva(vm, vm->pgd) + pte_index(vm, gva, level) * 8;
+ ptep = addr_gpa2hva(vm, vm->mmu.pgd) + pte_index(vm, gva, level) * 8;
if (!ptep)
goto unmapped_gva;
level--;
@@ -176,13 +176,14 @@ static void pte_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent,
void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
{
- int level = vm->pgtable_levels - 1;
+ struct kvm_mmu *mmu = &vm->mmu;
+ int level = mmu->pgtable_levels - 1;
uint64_t pgd, *ptep;
- if (!vm->pgd_created)
+ if (!mmu->pgd_created)
return;
- for (pgd = vm->pgd; pgd < vm->pgd + ptrs_per_pte(vm) * 8; pgd += 8) {
+ for (pgd = mmu->pgd; pgd < mmu->pgd + ptrs_per_pte(vm) * 8; pgd += 8) {
ptep = addr_gpa2hva(vm, pgd);
if (!*ptep)
continue;
@@ -211,7 +212,7 @@ void riscv_vcpu_mmu_setup(struct kvm_vcpu *vcpu)
TEST_FAIL("Unknown guest mode, mode: 0x%x", vm->mode);
}
- satp = (vm->pgd >> PGTBL_PAGE_SIZE_SHIFT) & SATP_PPN;
+ satp = (vm->mmu.pgd >> PGTBL_PAGE_SIZE_SHIFT) & SATP_PPN;
satp |= SATP_MODE_48;
vcpu_set_reg(vcpu, RISCV_GENERAL_CSR_REG(satp), satp);
diff --git a/tools/testing/selftests/kvm/lib/s390/processor.c b/tools/testing/selftests/kvm/lib/s390/processor.c
index 8ceeb17c819a..6a9a660413a7 100644
--- a/tools/testing/selftests/kvm/lib/s390/processor.c
+++ b/tools/testing/selftests/kvm/lib/s390/processor.c
@@ -17,7 +17,7 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
TEST_ASSERT(vm->page_size == PAGE_SIZE, "Unsupported page size: 0x%x",
vm->page_size);
- if (vm->pgd_created)
+ if (vm->mmu.pgd_created)
return;
paddr = vm_phy_pages_alloc(vm, PAGES_PER_REGION,
@@ -25,8 +25,8 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
vm->memslots[MEM_REGION_PT]);
memset(addr_gpa2hva(vm, paddr), 0xff, PAGES_PER_REGION * vm->page_size);
- vm->pgd = paddr;
- vm->pgd_created = true;
+ vm->mmu.pgd = paddr;
+ vm->mmu.pgd_created = true;
}
/*
@@ -70,7 +70,7 @@ void virt_arch_pg_map(struct kvm_vm *vm, uint64_t gva, uint64_t gpa)
gva, vm->max_gfn, vm->page_size);
/* Walk through region and segment tables */
- entry = addr_gpa2hva(vm, vm->pgd);
+ entry = addr_gpa2hva(vm, vm->mmu.pgd);
for (ri = 1; ri <= 4; ri++) {
idx = (gva >> (64 - 11 * ri)) & 0x7ffu;
if (entry[idx] & REGION_ENTRY_INVALID)
@@ -94,7 +94,7 @@ vm_paddr_t addr_arch_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva)
TEST_ASSERT(vm->page_size == PAGE_SIZE, "Unsupported page size: 0x%x",
vm->page_size);
- entry = addr_gpa2hva(vm, vm->pgd);
+ entry = addr_gpa2hva(vm, vm->mmu.pgd);
for (ri = 1; ri <= 4; ri++) {
idx = (gva >> (64 - 11 * ri)) & 0x7ffu;
TEST_ASSERT(!(entry[idx] & REGION_ENTRY_INVALID),
@@ -149,10 +149,10 @@ static void virt_dump_region(FILE *stream, struct kvm_vm *vm, uint8_t indent,
void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
{
- if (!vm->pgd_created)
+ if (!vm->mmu.pgd_created)
return;
- virt_dump_region(stream, vm, indent, vm->pgd);
+ virt_dump_region(stream, vm, indent, vm->mmu.pgd);
}
void vcpu_arch_set_entry_point(struct kvm_vcpu *vcpu, void *guest_code)
@@ -184,7 +184,7 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id)
vcpu_sregs_get(vcpu, &sregs);
sregs.crs[0] |= 0x00040000; /* Enable floating point regs */
- sregs.crs[1] = vm->pgd | 0xf; /* Primary region table */
+ sregs.crs[1] = vm->mmu.pgd | 0xf; /* Primary region table */
vcpu_sregs_set(vcpu, &sregs);
vcpu->run->psw_mask = 0x0400000180000000ULL; /* DAT enabled + 64 bit mode */
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index c14bf2b5f28f..f027f86d1535 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -162,9 +162,9 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
"Unknown or unsupported guest mode: 0x%x", vm->mode);
/* If needed, create the top-level page table. */
- if (!vm->pgd_created) {
- vm->pgd = vm_alloc_page_table(vm);
- vm->pgd_created = true;
+ if (!vm->mmu.pgd_created) {
+ vm->mmu.pgd = vm_alloc_page_table(vm);
+ vm->mmu.pgd_created = true;
}
}
@@ -175,7 +175,7 @@ static void *virt_get_pte(struct kvm_vm *vm, uint64_t *parent_pte,
uint64_t *page_table = addr_gpa2hva(vm, pt_gpa);
int index = (vaddr >> PG_LEVEL_SHIFT(level)) & 0x1ffu;
- TEST_ASSERT((*parent_pte & PTE_PRESENT_MASK) || parent_pte == &vm->pgd,
+ TEST_ASSERT((*parent_pte & PTE_PRESENT_MASK) || parent_pte == &vm->mmu.pgd,
"Parent PTE (level %d) not PRESENT for gva: 0x%08lx",
level + 1, vaddr);
@@ -218,7 +218,7 @@ static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
{
const uint64_t pg_size = PG_LEVEL_SIZE(level);
- uint64_t *pte = &vm->pgd;
+ uint64_t *pte = &vm->mmu.pgd;
int current_level;
TEST_ASSERT(vm->mode == VM_MODE_PXXVYY_4K,
@@ -243,7 +243,7 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
* Allocate upper level page tables, if not already present. Return
* early if a hugepage was created.
*/
- for (current_level = vm->pgtable_levels;
+ for (current_level = vm->mmu.pgtable_levels;
current_level > PG_LEVEL_4K;
current_level--) {
pte = virt_create_upper_pte(vm, pte, vaddr, paddr,
@@ -309,14 +309,14 @@ static bool vm_is_target_pte(uint64_t *pte, int *level, int current_level)
static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
int *level)
{
- int va_width = 12 + (vm->pgtable_levels) * 9;
- uint64_t *pte = &vm->pgd;
+ int va_width = 12 + (vm->mmu.pgtable_levels) * 9;
+ uint64_t *pte = &vm->mmu.pgd;
int current_level;
TEST_ASSERT(!vm->arch.is_pt_protected,
"Walking page tables of protected guests is impossible");
- TEST_ASSERT(*level >= PG_LEVEL_NONE && *level <= vm->pgtable_levels,
+ TEST_ASSERT(*level >= PG_LEVEL_NONE && *level <= vm->mmu.pgtable_levels,
"Invalid PG_LEVEL_* '%d'", *level);
TEST_ASSERT(vm->mode == VM_MODE_PXXVYY_4K,
@@ -332,7 +332,7 @@ static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
(((int64_t)vaddr << (64 - va_width) >> (64 - va_width))),
"Canonical check failed. The virtual address is invalid.");
- for (current_level = vm->pgtable_levels;
+ for (current_level = vm->mmu.pgtable_levels;
current_level > PG_LEVEL_4K;
current_level--) {
pte = virt_get_pte(vm, pte, vaddr, current_level);
@@ -357,7 +357,7 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
uint64_t *pde, *pde_start;
uint64_t *pte, *pte_start;
- if (!vm->pgd_created)
+ if (!vm->mmu.pgd_created)
return;
fprintf(stream, "%*s "
@@ -365,7 +365,7 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
fprintf(stream, "%*s index hvaddr gpaddr "
"addr w exec dirty\n",
indent, "");
- pml4e_start = (uint64_t *) addr_gpa2hva(vm, vm->pgd);
+ pml4e_start = (uint64_t *) addr_gpa2hva(vm, vm->mmu.pgd);
for (uint16_t n1 = 0; n1 <= 0x1ffu; n1++) {
pml4e = &pml4e_start[n1];
if (!(*pml4e & PTE_PRESENT_MASK))
@@ -538,7 +538,7 @@ static void vcpu_init_sregs(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
sregs.cr4 |= X86_CR4_PAE | X86_CR4_OSFXSR;
if (kvm_cpu_has(X86_FEATURE_XSAVE))
sregs.cr4 |= X86_CR4_OSXSAVE;
- if (vm->pgtable_levels == 5)
+ if (vm->mmu.pgtable_levels == 5)
sregs.cr4 |= X86_CR4_LA57;
sregs.efer |= (EFER_LME | EFER_LMA | EFER_NX);
@@ -549,7 +549,7 @@ static void vcpu_init_sregs(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
kvm_seg_set_kernel_data_64bit(&sregs.gs);
kvm_seg_set_tss_64bit(vm->arch.tss, &sregs.tr);
- sregs.cr3 = vm->pgd;
+ sregs.cr3 = vm->mmu.pgd;
vcpu_sregs_set(vcpu, &sregs);
}
diff --git a/tools/testing/selftests/kvm/x86/vmx_nested_la57_state_test.c b/tools/testing/selftests/kvm/x86/vmx_nested_la57_state_test.c
index cf1d2d1f2a8f..915c42001dba 100644
--- a/tools/testing/selftests/kvm/x86/vmx_nested_la57_state_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_nested_la57_state_test.c
@@ -90,7 +90,7 @@ int main(int argc, char *argv[])
* L1 needs to read its own PML5 table to set up L2. Identity map
* the PML5 table to facilitate this.
*/
- virt_map(vm, vm->pgd, vm->pgd, 1);
+ virt_map(vm, vm->mmu.pgd, vm->mmu.pgd, 1);
vcpu_alloc_vmx(vm, &vmx_pages_gva);
vcpu_args_set(vcpu, 1, vmx_pages_gva);
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 07/21] KVM: selftests: Plumb "struct kvm_mmu" into x86's MMU APIs
2025-12-30 23:01 [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
` (5 preceding siblings ...)
2025-12-30 23:01 ` [PATCH v4 06/21] KVM: selftests: Add "struct kvm_mmu" to track a given MMU instance Sean Christopherson
@ 2025-12-30 23:01 ` Sean Christopherson
2025-12-30 23:01 ` [PATCH v4 08/21] KVM: selftests: Add a "struct kvm_mmu_arch arch" member to kvm_mmu Sean Christopherson
` (14 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
In preparation for generalizing the x86 virt mapping APIs to work with
TDP (stage-2) page tables, plumb "struct kvm_mmu" into all of the helper
functions instead of operating on vm->mmu directly.
Opportunistically swap the order of the checks in virt_get_pte() to verify
that the parent entry is the PGD/root (i.e. not a real PTE) before asserting
that the PTE is PRESENT, as checking for the root case first is the more
logical ordering.
No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
[sean: rebase on common kvm_mmu structure, rewrite changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
.../selftests/kvm/include/x86/processor.h | 3 +-
.../testing/selftests/kvm/lib/x86/processor.c | 68 +++++++++++--------
2 files changed, 41 insertions(+), 30 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index c00c0fbe62cd..cbac9de29074 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1449,7 +1449,8 @@ enum pg_level {
#define PG_SIZE_2M PG_LEVEL_SIZE(PG_LEVEL_2M)
#define PG_SIZE_1G PG_LEVEL_SIZE(PG_LEVEL_1G)
-void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level);
+void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
+ uint64_t paddr, int level);
void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
uint64_t nr_bytes, int level);
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index f027f86d1535..f25742a804b0 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -156,26 +156,31 @@ bool kvm_is_tdp_enabled(void)
return get_kvm_amd_param_bool("npt");
}
-void virt_arch_pgd_alloc(struct kvm_vm *vm)
+static void virt_mmu_init(struct kvm_vm *vm, struct kvm_mmu *mmu)
{
- TEST_ASSERT(vm->mode == VM_MODE_PXXVYY_4K,
- "Unknown or unsupported guest mode: 0x%x", vm->mode);
-
/* If needed, create the top-level page table. */
- if (!vm->mmu.pgd_created) {
- vm->mmu.pgd = vm_alloc_page_table(vm);
- vm->mmu.pgd_created = true;
+ if (!mmu->pgd_created) {
+ mmu->pgd = vm_alloc_page_table(vm);
+ mmu->pgd_created = true;
}
}
-static void *virt_get_pte(struct kvm_vm *vm, uint64_t *parent_pte,
- uint64_t vaddr, int level)
+void virt_arch_pgd_alloc(struct kvm_vm *vm)
+{
+ TEST_ASSERT(vm->mode == VM_MODE_PXXVYY_4K,
+ "Unknown or unsupported guest mode: 0x%x", vm->mode);
+
+ virt_mmu_init(vm, &vm->mmu);
+}
+
+static void *virt_get_pte(struct kvm_vm *vm, struct kvm_mmu *mmu,
+ uint64_t *parent_pte, uint64_t vaddr, int level)
{
uint64_t pt_gpa = PTE_GET_PA(*parent_pte);
uint64_t *page_table = addr_gpa2hva(vm, pt_gpa);
int index = (vaddr >> PG_LEVEL_SHIFT(level)) & 0x1ffu;
- TEST_ASSERT((*parent_pte & PTE_PRESENT_MASK) || parent_pte == &vm->mmu.pgd,
+ TEST_ASSERT((*parent_pte == mmu->pgd) || (*parent_pte & PTE_PRESENT_MASK),
"Parent PTE (level %d) not PRESENT for gva: 0x%08lx",
level + 1, vaddr);
@@ -183,13 +188,14 @@ static void *virt_get_pte(struct kvm_vm *vm, uint64_t *parent_pte,
}
static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
+ struct kvm_mmu *mmu,
uint64_t *parent_pte,
uint64_t vaddr,
uint64_t paddr,
int current_level,
int target_level)
{
- uint64_t *pte = virt_get_pte(vm, parent_pte, vaddr, current_level);
+ uint64_t *pte = virt_get_pte(vm, mmu, parent_pte, vaddr, current_level);
paddr = vm_untag_gpa(vm, paddr);
@@ -215,10 +221,11 @@ static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
return pte;
}
-void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
+void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
+ uint64_t paddr, int level)
{
const uint64_t pg_size = PG_LEVEL_SIZE(level);
- uint64_t *pte = &vm->mmu.pgd;
+ uint64_t *pte = &mmu->pgd;
int current_level;
TEST_ASSERT(vm->mode == VM_MODE_PXXVYY_4K,
@@ -243,17 +250,17 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
* Allocate upper level page tables, if not already present. Return
* early if a hugepage was created.
*/
- for (current_level = vm->mmu.pgtable_levels;
+ for (current_level = mmu->pgtable_levels;
current_level > PG_LEVEL_4K;
current_level--) {
- pte = virt_create_upper_pte(vm, pte, vaddr, paddr,
+ pte = virt_create_upper_pte(vm, mmu, pte, vaddr, paddr,
current_level, level);
if (*pte & PTE_LARGE_MASK)
return;
}
/* Fill in page table entry. */
- pte = virt_get_pte(vm, pte, vaddr, PG_LEVEL_4K);
+ pte = virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
TEST_ASSERT(!(*pte & PTE_PRESENT_MASK),
"PTE already present for 4k page at vaddr: 0x%lx", vaddr);
*pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK | (paddr & PHYSICAL_PAGE_MASK);
@@ -270,7 +277,7 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
{
- __virt_pg_map(vm, vaddr, paddr, PG_LEVEL_4K);
+ __virt_pg_map(vm, &vm->mmu, vaddr, paddr, PG_LEVEL_4K);
}
void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
@@ -285,7 +292,7 @@ void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
nr_bytes, pg_size);
for (i = 0; i < nr_pages; i++) {
- __virt_pg_map(vm, vaddr, paddr, level);
+ __virt_pg_map(vm, &vm->mmu, vaddr, paddr, level);
sparsebit_set_num(vm->vpages_mapped, vaddr >> vm->page_shift,
nr_bytes / PAGE_SIZE);
@@ -294,7 +301,8 @@ void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
}
}
-static bool vm_is_target_pte(uint64_t *pte, int *level, int current_level)
+static bool vm_is_target_pte(struct kvm_mmu *mmu, uint64_t *pte,
+ int *level, int current_level)
{
if (*pte & PTE_LARGE_MASK) {
TEST_ASSERT(*level == PG_LEVEL_NONE ||
@@ -306,17 +314,19 @@ static bool vm_is_target_pte(uint64_t *pte, int *level, int current_level)
return *level == current_level;
}
-static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
+static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm,
+ struct kvm_mmu *mmu,
+ uint64_t vaddr,
int *level)
{
- int va_width = 12 + (vm->mmu.pgtable_levels) * 9;
- uint64_t *pte = &vm->mmu.pgd;
+ int va_width = 12 + (mmu->pgtable_levels) * 9;
+ uint64_t *pte = &mmu->pgd;
int current_level;
TEST_ASSERT(!vm->arch.is_pt_protected,
"Walking page tables of protected guests is impossible");
- TEST_ASSERT(*level >= PG_LEVEL_NONE && *level <= vm->mmu.pgtable_levels,
+ TEST_ASSERT(*level >= PG_LEVEL_NONE && *level <= mmu->pgtable_levels,
"Invalid PG_LEVEL_* '%d'", *level);
TEST_ASSERT(vm->mode == VM_MODE_PXXVYY_4K,
@@ -332,22 +342,22 @@ static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
(((int64_t)vaddr << (64 - va_width) >> (64 - va_width))),
"Canonical check failed. The virtual address is invalid.");
- for (current_level = vm->mmu.pgtable_levels;
+ for (current_level = mmu->pgtable_levels;
current_level > PG_LEVEL_4K;
current_level--) {
- pte = virt_get_pte(vm, pte, vaddr, current_level);
- if (vm_is_target_pte(pte, level, current_level))
+ pte = virt_get_pte(vm, mmu, pte, vaddr, current_level);
+ if (vm_is_target_pte(mmu, pte, level, current_level))
return pte;
}
- return virt_get_pte(vm, pte, vaddr, PG_LEVEL_4K);
+ return virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
}
uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr)
{
int level = PG_LEVEL_4K;
- return __vm_get_page_table_entry(vm, vaddr, &level);
+ return __vm_get_page_table_entry(vm, &vm->mmu, vaddr, &level);
}
void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
@@ -497,7 +507,7 @@ static void kvm_seg_set_kernel_data_64bit(struct kvm_segment *segp)
vm_paddr_t addr_arch_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva)
{
int level = PG_LEVEL_NONE;
- uint64_t *pte = __vm_get_page_table_entry(vm, gva, &level);
+ uint64_t *pte = __vm_get_page_table_entry(vm, &vm->mmu, gva, &level);
TEST_ASSERT(*pte & PTE_PRESENT_MASK,
"Leaf PTE not PRESENT for gva: 0x%08lx", gva);
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 08/21] KVM: selftests: Add a "struct kvm_mmu_arch arch" member to kvm_mmu
2025-12-30 23:01 [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
` (6 preceding siblings ...)
2025-12-30 23:01 ` [PATCH v4 07/21] KVM: selftests: Plumb "struct kvm_mmu" into x86's MMU APIs Sean Christopherson
@ 2025-12-30 23:01 ` Sean Christopherson
2026-01-02 16:53 ` Yosry Ahmed
2026-01-02 17:02 ` Yosry Ahmed
2025-12-30 23:01 ` [PATCH v4 09/21] KVM: selftests: Move PTE bitmasks " Sean Christopherson
` (13 subsequent siblings)
21 siblings, 2 replies; 39+ messages in thread
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
Add an arch-specific structure and embed it as a field in "struct kvm_mmu"
so that each architecture can track arch-specific state for a given MMU.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/include/arm64/kvm_util_arch.h | 2 ++
tools/testing/selftests/kvm/include/kvm_util.h | 2 ++
tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h | 1 +
tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h | 1 +
tools/testing/selftests/kvm/include/s390/kvm_util_arch.h | 1 +
tools/testing/selftests/kvm/include/x86/kvm_util_arch.h | 2 ++
6 files changed, 9 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/arm64/kvm_util_arch.h b/tools/testing/selftests/kvm/include/arm64/kvm_util_arch.h
index b973bb2c64a6..4a2033708227 100644
--- a/tools/testing/selftests/kvm/include/arm64/kvm_util_arch.h
+++ b/tools/testing/selftests/kvm/include/arm64/kvm_util_arch.h
@@ -2,6 +2,8 @@
#ifndef SELFTEST_KVM_UTIL_ARCH_H
#define SELFTEST_KVM_UTIL_ARCH_H
+struct kvm_mmu_arch {};
+
struct kvm_vm_arch {
bool has_gic;
int gic_fd;
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 39558c05c0bf..c1497515fa6a 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -92,6 +92,8 @@ struct kvm_mmu {
bool pgd_created;
uint64_t pgd;
int pgtable_levels;
+
+ struct kvm_mmu_arch arch;
};
struct kvm_vm {
diff --git a/tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h b/tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h
index e43a57d99b56..d5095900e442 100644
--- a/tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h
+++ b/tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h
@@ -2,6 +2,7 @@
#ifndef SELFTEST_KVM_UTIL_ARCH_H
#define SELFTEST_KVM_UTIL_ARCH_H
+struct kvm_mmu_arch {};
struct kvm_vm_arch {};
#endif // SELFTEST_KVM_UTIL_ARCH_H
diff --git a/tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h b/tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h
index e43a57d99b56..d5095900e442 100644
--- a/tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h
+++ b/tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h
@@ -2,6 +2,7 @@
#ifndef SELFTEST_KVM_UTIL_ARCH_H
#define SELFTEST_KVM_UTIL_ARCH_H
+struct kvm_mmu_arch {};
struct kvm_vm_arch {};
#endif // SELFTEST_KVM_UTIL_ARCH_H
diff --git a/tools/testing/selftests/kvm/include/s390/kvm_util_arch.h b/tools/testing/selftests/kvm/include/s390/kvm_util_arch.h
index e43a57d99b56..d5095900e442 100644
--- a/tools/testing/selftests/kvm/include/s390/kvm_util_arch.h
+++ b/tools/testing/selftests/kvm/include/s390/kvm_util_arch.h
@@ -2,6 +2,7 @@
#ifndef SELFTEST_KVM_UTIL_ARCH_H
#define SELFTEST_KVM_UTIL_ARCH_H
+struct kvm_mmu_arch {};
struct kvm_vm_arch {};
#endif // SELFTEST_KVM_UTIL_ARCH_H
diff --git a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
index 972bb1c4ab4c..456e5ca170df 100644
--- a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
+++ b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
@@ -10,6 +10,8 @@
extern bool is_forced_emulation_enabled;
+struct kvm_mmu_arch {};
+
struct kvm_vm_arch {
vm_vaddr_t gdt;
vm_vaddr_t tss;
--
2.52.0.351.gbe84eed79e-goog
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v4 09/21] KVM: selftests: Move PTE bitmasks to kvm_mmu
2025-12-30 23:01 [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
` (7 preceding siblings ...)
2025-12-30 23:01 ` [PATCH v4 08/21] KVM: selftests: Add a "struct kvm_mmu_arch arch" member to kvm_mmu Sean Christopherson
@ 2025-12-30 23:01 ` Sean Christopherson
2025-12-30 23:01 ` [PATCH v4 10/21] KVM: selftests: Use a TDP MMU to share EPT page tables between vCPUs Sean Christopherson
` (12 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
Move the PTE bitmasks into kvm_mmu to parameterize them for virt mapping
functions. Introduce helpers to read/write different PTE bits given a
kvm_mmu.
Drop the 'global' bit definition as it's currently unused, but leave the
'user' bit as it will be used in coming changes. Opportunistically
rename 'large' to 'huge' as it's more consistent with the kernel naming.
Leave PHYSICAL_PAGE_MASK alone; it's fixed in all page table formats and
a lot of other macros depend on it. It's tempting to move all the other
macros to be per-struct instead, but it would be too much noise for
little benefit.
Keep c_bit and s_bit in vm->arch as they are used before the MMU is
initialized, through __vm_create() -> vm_userspace_mem_region_add() ->
vm_mem_add() -> vm_arch_has_protected_memory().
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
[sean: rename accessors to is_<adjective>_pte()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
.../selftests/kvm/include/x86/kvm_util_arch.h | 16 ++++-
.../selftests/kvm/include/x86/processor.h | 28 +++++---
.../testing/selftests/kvm/lib/x86/processor.c | 71 +++++++++++--------
3 files changed, 76 insertions(+), 39 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
index 456e5ca170df..bad381d63b6a 100644
--- a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
+++ b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
@@ -10,7 +10,21 @@
extern bool is_forced_emulation_enabled;
-struct kvm_mmu_arch {};
+struct pte_masks {
+ uint64_t present;
+ uint64_t writable;
+ uint64_t user;
+ uint64_t accessed;
+ uint64_t dirty;
+ uint64_t huge;
+ uint64_t nx;
+ uint64_t c;
+ uint64_t s;
+};
+
+struct kvm_mmu_arch {
+ struct pte_masks pte_masks;
+};
struct kvm_vm_arch {
vm_vaddr_t gdt;
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index cbac9de29074..b2084434dd8b 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -362,16 +362,6 @@ static inline unsigned int x86_model(unsigned int eax)
return ((eax >> 12) & 0xf0) | ((eax >> 4) & 0x0f);
}
-/* Page table bitfield declarations */
-#define PTE_PRESENT_MASK BIT_ULL(0)
-#define PTE_WRITABLE_MASK BIT_ULL(1)
-#define PTE_USER_MASK BIT_ULL(2)
-#define PTE_ACCESSED_MASK BIT_ULL(5)
-#define PTE_DIRTY_MASK BIT_ULL(6)
-#define PTE_LARGE_MASK BIT_ULL(7)
-#define PTE_GLOBAL_MASK BIT_ULL(8)
-#define PTE_NX_MASK BIT_ULL(63)
-
#define PHYSICAL_PAGE_MASK GENMASK_ULL(51, 12)
#define PAGE_SHIFT 12
@@ -1449,6 +1439,24 @@ enum pg_level {
#define PG_SIZE_2M PG_LEVEL_SIZE(PG_LEVEL_2M)
#define PG_SIZE_1G PG_LEVEL_SIZE(PG_LEVEL_1G)
+#define PTE_PRESENT_MASK(mmu) ((mmu)->arch.pte_masks.present)
+#define PTE_WRITABLE_MASK(mmu) ((mmu)->arch.pte_masks.writable)
+#define PTE_USER_MASK(mmu) ((mmu)->arch.pte_masks.user)
+#define PTE_ACCESSED_MASK(mmu) ((mmu)->arch.pte_masks.accessed)
+#define PTE_DIRTY_MASK(mmu) ((mmu)->arch.pte_masks.dirty)
+#define PTE_HUGE_MASK(mmu) ((mmu)->arch.pte_masks.huge)
+#define PTE_NX_MASK(mmu) ((mmu)->arch.pte_masks.nx)
+#define PTE_C_BIT_MASK(mmu) ((mmu)->arch.pte_masks.c)
+#define PTE_S_BIT_MASK(mmu) ((mmu)->arch.pte_masks.s)
+
+#define is_present_pte(mmu, pte) (!!(*(pte) & PTE_PRESENT_MASK(mmu)))
+#define is_writable_pte(mmu, pte) (!!(*(pte) & PTE_WRITABLE_MASK(mmu)))
+#define is_user_pte(mmu, pte) (!!(*(pte) & PTE_USER_MASK(mmu)))
+#define is_accessed_pte(mmu, pte) (!!(*(pte) & PTE_ACCESSED_MASK(mmu)))
+#define is_dirty_pte(mmu, pte) (!!(*(pte) & PTE_DIRTY_MASK(mmu)))
+#define is_huge_pte(mmu, pte) (!!(*(pte) & PTE_HUGE_MASK(mmu)))
+#define is_nx_pte(mmu, pte) (!!(*(pte) & PTE_NX_MASK(mmu)))
+
void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
uint64_t paddr, int level);
void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index f25742a804b0..3800f4ff6770 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -156,12 +156,14 @@ bool kvm_is_tdp_enabled(void)
return get_kvm_amd_param_bool("npt");
}
-static void virt_mmu_init(struct kvm_vm *vm, struct kvm_mmu *mmu)
+static void virt_mmu_init(struct kvm_vm *vm, struct kvm_mmu *mmu,
+ struct pte_masks *pte_masks)
{
/* If needed, create the top-level page table. */
if (!mmu->pgd_created) {
mmu->pgd = vm_alloc_page_table(vm);
mmu->pgd_created = true;
+ mmu->arch.pte_masks = *pte_masks;
}
}
@@ -170,7 +172,19 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
TEST_ASSERT(vm->mode == VM_MODE_PXXVYY_4K,
"Unknown or unsupported guest mode: 0x%x", vm->mode);
- virt_mmu_init(vm, &vm->mmu);
+ struct pte_masks pte_masks = (struct pte_masks){
+ .present = BIT_ULL(0),
+ .writable = BIT_ULL(1),
+ .user = BIT_ULL(2),
+ .accessed = BIT_ULL(5),
+ .dirty = BIT_ULL(6),
+ .huge = BIT_ULL(7),
+ .nx = BIT_ULL(63),
+ .c = vm->arch.c_bit,
+ .s = vm->arch.s_bit,
+ };
+
+ virt_mmu_init(vm, &vm->mmu, &pte_masks);
}
static void *virt_get_pte(struct kvm_vm *vm, struct kvm_mmu *mmu,
@@ -180,7 +194,7 @@ static void *virt_get_pte(struct kvm_vm *vm, struct kvm_mmu *mmu,
uint64_t *page_table = addr_gpa2hva(vm, pt_gpa);
int index = (vaddr >> PG_LEVEL_SHIFT(level)) & 0x1ffu;
- TEST_ASSERT((*parent_pte == mmu->pgd) || (*parent_pte & PTE_PRESENT_MASK),
+ TEST_ASSERT((*parent_pte == mmu->pgd) || is_present_pte(mmu, parent_pte),
"Parent PTE (level %d) not PRESENT for gva: 0x%08lx",
level + 1, vaddr);
@@ -199,10 +213,10 @@ static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
paddr = vm_untag_gpa(vm, paddr);
- if (!(*pte & PTE_PRESENT_MASK)) {
- *pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK;
+ if (!is_present_pte(mmu, pte)) {
+ *pte = PTE_PRESENT_MASK(mmu) | PTE_WRITABLE_MASK(mmu);
if (current_level == target_level)
- *pte |= PTE_LARGE_MASK | (paddr & PHYSICAL_PAGE_MASK);
+ *pte |= PTE_HUGE_MASK(mmu) | (paddr & PHYSICAL_PAGE_MASK);
else
*pte |= vm_alloc_page_table(vm) & PHYSICAL_PAGE_MASK;
} else {
@@ -214,7 +228,7 @@ static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
TEST_ASSERT(current_level != target_level,
"Cannot create hugepage at level: %u, vaddr: 0x%lx",
current_level, vaddr);
- TEST_ASSERT(!(*pte & PTE_LARGE_MASK),
+ TEST_ASSERT(!is_huge_pte(mmu, pte),
"Cannot create page table at level: %u, vaddr: 0x%lx",
current_level, vaddr);
}
@@ -255,24 +269,24 @@ void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
current_level--) {
pte = virt_create_upper_pte(vm, mmu, pte, vaddr, paddr,
current_level, level);
- if (*pte & PTE_LARGE_MASK)
+ if (is_huge_pte(mmu, pte))
return;
}
/* Fill in page table entry. */
pte = virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
- TEST_ASSERT(!(*pte & PTE_PRESENT_MASK),
+ TEST_ASSERT(!is_present_pte(mmu, pte),
"PTE already present for 4k page at vaddr: 0x%lx", vaddr);
- *pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK | (paddr & PHYSICAL_PAGE_MASK);
+ *pte = PTE_PRESENT_MASK(mmu) | PTE_WRITABLE_MASK(mmu) | (paddr & PHYSICAL_PAGE_MASK);
/*
* Neither SEV nor TDX supports shared page tables, so only the final
* leaf PTE needs manually set the C/S-bit.
*/
if (vm_is_gpa_protected(vm, paddr))
- *pte |= vm->arch.c_bit;
+ *pte |= PTE_C_BIT_MASK(mmu);
else
- *pte |= vm->arch.s_bit;
+ *pte |= PTE_S_BIT_MASK(mmu);
}
void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
@@ -304,7 +318,7 @@ void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
static bool vm_is_target_pte(struct kvm_mmu *mmu, uint64_t *pte,
int *level, int current_level)
{
- if (*pte & PTE_LARGE_MASK) {
+ if (is_huge_pte(mmu, pte)) {
TEST_ASSERT(*level == PG_LEVEL_NONE ||
*level == current_level,
"Unexpected hugepage at level %d", current_level);
@@ -362,12 +376,13 @@ uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr)
void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
{
+ struct kvm_mmu *mmu = &vm->mmu;
uint64_t *pml4e, *pml4e_start;
uint64_t *pdpe, *pdpe_start;
uint64_t *pde, *pde_start;
uint64_t *pte, *pte_start;
- if (!vm->mmu.pgd_created)
+ if (!mmu->pgd_created)
return;
fprintf(stream, "%*s "
@@ -375,47 +390,47 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
fprintf(stream, "%*s index hvaddr gpaddr "
"addr w exec dirty\n",
indent, "");
- pml4e_start = (uint64_t *) addr_gpa2hva(vm, vm->mmu.pgd);
+ pml4e_start = (uint64_t *) addr_gpa2hva(vm, mmu->pgd);
for (uint16_t n1 = 0; n1 <= 0x1ffu; n1++) {
pml4e = &pml4e_start[n1];
- if (!(*pml4e & PTE_PRESENT_MASK))
+ if (!is_present_pte(mmu, pml4e))
continue;
fprintf(stream, "%*spml4e 0x%-3zx %p 0x%-12lx 0x%-10llx %u "
" %u\n",
indent, "",
pml4e - pml4e_start, pml4e,
addr_hva2gpa(vm, pml4e), PTE_GET_PFN(*pml4e),
- !!(*pml4e & PTE_WRITABLE_MASK), !!(*pml4e & PTE_NX_MASK));
+ is_writable_pte(mmu, pml4e), is_nx_pte(mmu, pml4e));
pdpe_start = addr_gpa2hva(vm, *pml4e & PHYSICAL_PAGE_MASK);
for (uint16_t n2 = 0; n2 <= 0x1ffu; n2++) {
pdpe = &pdpe_start[n2];
- if (!(*pdpe & PTE_PRESENT_MASK))
+ if (!is_present_pte(mmu, pdpe))
continue;
fprintf(stream, "%*spdpe 0x%-3zx %p 0x%-12lx 0x%-10llx "
"%u %u\n",
indent, "",
pdpe - pdpe_start, pdpe,
addr_hva2gpa(vm, pdpe),
- PTE_GET_PFN(*pdpe), !!(*pdpe & PTE_WRITABLE_MASK),
- !!(*pdpe & PTE_NX_MASK));
+ PTE_GET_PFN(*pdpe), is_writable_pte(mmu, pdpe),
+ is_nx_pte(mmu, pdpe));
pde_start = addr_gpa2hva(vm, *pdpe & PHYSICAL_PAGE_MASK);
for (uint16_t n3 = 0; n3 <= 0x1ffu; n3++) {
pde = &pde_start[n3];
- if (!(*pde & PTE_PRESENT_MASK))
+ if (!is_present_pte(mmu, pde))
continue;
fprintf(stream, "%*spde 0x%-3zx %p "
"0x%-12lx 0x%-10llx %u %u\n",
indent, "", pde - pde_start, pde,
addr_hva2gpa(vm, pde),
- PTE_GET_PFN(*pde), !!(*pde & PTE_WRITABLE_MASK),
- !!(*pde & PTE_NX_MASK));
+ PTE_GET_PFN(*pde), is_writable_pte(mmu, pde),
+ is_nx_pte(mmu, pde));
pte_start = addr_gpa2hva(vm, *pde & PHYSICAL_PAGE_MASK);
for (uint16_t n4 = 0; n4 <= 0x1ffu; n4++) {
pte = &pte_start[n4];
- if (!(*pte & PTE_PRESENT_MASK))
+ if (!is_present_pte(mmu, pte))
continue;
fprintf(stream, "%*spte 0x%-3zx %p "
"0x%-12lx 0x%-10llx %u %u "
@@ -424,9 +439,9 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
pte - pte_start, pte,
addr_hva2gpa(vm, pte),
PTE_GET_PFN(*pte),
- !!(*pte & PTE_WRITABLE_MASK),
- !!(*pte & PTE_NX_MASK),
- !!(*pte & PTE_DIRTY_MASK),
+ is_writable_pte(mmu, pte),
+ is_nx_pte(mmu, pte),
+ is_dirty_pte(mmu, pte),
((uint64_t) n1 << 27)
| ((uint64_t) n2 << 18)
| ((uint64_t) n3 << 9)
@@ -509,7 +524,7 @@ vm_paddr_t addr_arch_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva)
int level = PG_LEVEL_NONE;
uint64_t *pte = __vm_get_page_table_entry(vm, &vm->mmu, gva, &level);
- TEST_ASSERT(*pte & PTE_PRESENT_MASK,
+ TEST_ASSERT(is_present_pte(&vm->mmu, pte),
"Leaf PTE not PRESENT for gva: 0x%08lx", gva);
/*
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 10/21] KVM: selftests: Use a TDP MMU to share EPT page tables between vCPUs
2025-12-30 23:01 [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
` (8 preceding siblings ...)
2025-12-30 23:01 ` [PATCH v4 09/21] KVM: selftests: Move PTE bitmasks " Sean Christopherson
@ 2025-12-30 23:01 ` Sean Christopherson
2025-12-30 23:01 ` [PATCH v4 11/21] KVM: selftests: Stop passing VMX metadata to TDP mapping functions Sean Christopherson
` (11 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
prepare_eptp() currently allocates new EPTs for each vCPU. memstress has
its own hack to share the EPTs between vCPUs. Currently, there is no
reason to have separate EPTs for each vCPU, and the complexity is
significant. The only reason it doesn't cause problems today is that
memstress is the only user with multiple vCPUs.
Add vm_enable_ept() to allocate EPT page tables for an entire VM, and use
it everywhere to replace prepare_eptp(). Drop 'eptp' and 'eptp_hva' from
'struct vmx_pages' as they serve no purpose (e.g. the EPTP can be built
from the PGD), but keep 'eptp_gpa' so that the MMU structure doesn't need
to be passed in along with vmx_pages. Dynamically allocate the TDP MMU
structure to avoid a cyclical dependency between kvm_util_arch.h and
kvm_util.h.
Remove the workaround in memstress to copy the EPT root between vCPUs
since that's now the default behavior.
Name the MMU tdp_mmu instead of e.g. nested_mmu or nested.mmu to avoid
recreating the same mess that KVM has with respect to "nested" MMUs, e.g.
does nested refer to the stage-2 page tables created by L1, or the stage-1
page tables created by L2?
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
.../selftests/kvm/include/x86/kvm_util_arch.h | 4 +++
.../selftests/kvm/include/x86/processor.h | 3 ++
tools/testing/selftests/kvm/include/x86/vmx.h | 8 ++---
.../testing/selftests/kvm/lib/x86/memstress.c | 19 ++++--------
.../testing/selftests/kvm/lib/x86/processor.c | 9 ++++++
tools/testing/selftests/kvm/lib/x86/vmx.c | 30 ++++++++++++-------
.../selftests/kvm/x86/vmx_dirty_log_test.c | 7 ++---
7 files changed, 48 insertions(+), 32 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
index bad381d63b6a..05a1fc1780f2 100644
--- a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
+++ b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
@@ -26,6 +26,8 @@ struct kvm_mmu_arch {
struct pte_masks pte_masks;
};
+struct kvm_mmu;
+
struct kvm_vm_arch {
vm_vaddr_t gdt;
vm_vaddr_t tss;
@@ -35,6 +37,8 @@ struct kvm_vm_arch {
uint64_t s_bit;
int sev_fd;
bool is_pt_protected;
+
+ struct kvm_mmu *tdp_mmu;
};
static inline bool __vm_arch_has_protected_memory(struct kvm_vm_arch *arch)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index b2084434dd8b..973f2069cd3b 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1457,6 +1457,9 @@ enum pg_level {
#define is_huge_pte(mmu, pte) (!!(*(pte) & PTE_HUGE_MASK(mmu)))
#define is_nx_pte(mmu, pte) (!!(*(pte) & PTE_NX_MASK(mmu)))
+void tdp_mmu_init(struct kvm_vm *vm, int pgtable_levels,
+ struct pte_masks *pte_masks);
+
void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
uint64_t paddr, int level);
void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
index 04b8231d032a..1fd83c23529a 100644
--- a/tools/testing/selftests/kvm/include/x86/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86/vmx.h
@@ -520,13 +520,11 @@ struct vmx_pages {
uint64_t vmwrite_gpa;
void *vmwrite;
- void *eptp_hva;
- uint64_t eptp_gpa;
- void *eptp;
-
void *apic_access_hva;
uint64_t apic_access_gpa;
void *apic_access;
+
+ uint64_t eptp_gpa;
};
union vmx_basic {
@@ -568,7 +566,7 @@ void tdp_identity_map_default_memslots(struct vmx_pages *vmx,
void tdp_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
uint64_t addr, uint64_t size);
bool kvm_cpu_has_ept(void);
-void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm);
+void vm_enable_ept(struct kvm_vm *vm);
void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm);
#endif /* SELFTEST_KVM_VMX_H */
diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
index 1928b00bde51..00f7f11e5f0e 100644
--- a/tools/testing/selftests/kvm/lib/x86/memstress.c
+++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
@@ -59,12 +59,10 @@ uint64_t memstress_nested_pages(int nr_vcpus)
return 513 + 10 * nr_vcpus;
}
-void memstress_setup_ept(struct vmx_pages *vmx, struct kvm_vm *vm)
+static void memstress_setup_ept_mappings(struct vmx_pages *vmx, struct kvm_vm *vm)
{
uint64_t start, end;
- prepare_eptp(vmx, vm);
-
/*
* Identity map the first 4G and the test region with 1G pages so that
* KVM can shadow the EPT12 with the maximum huge page size supported
@@ -79,7 +77,7 @@ void memstress_setup_ept(struct vmx_pages *vmx, struct kvm_vm *vm)
void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vcpus[])
{
- struct vmx_pages *vmx, *vmx0 = NULL;
+ struct vmx_pages *vmx;
struct kvm_regs regs;
vm_vaddr_t vmx_gva;
int vcpu_id;
@@ -87,18 +85,13 @@ void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vc
TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
TEST_REQUIRE(kvm_cpu_has_ept());
+ vm_enable_ept(vm);
for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
vmx = vcpu_alloc_vmx(vm, &vmx_gva);
- if (vcpu_id == 0) {
- memstress_setup_ept(vmx, vm);
- vmx0 = vmx;
- } else {
- /* Share the same EPT table across all vCPUs. */
- vmx->eptp = vmx0->eptp;
- vmx->eptp_hva = vmx0->eptp_hva;
- vmx->eptp_gpa = vmx0->eptp_gpa;
- }
+ /* The EPTs are shared across vCPUs, setup the mappings once */
+ if (vcpu_id == 0)
+ memstress_setup_ept_mappings(vmx, vm);
/*
* Override the vCPU to run memstress_l1_guest_code() which will
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index 3800f4ff6770..8a9298a72897 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -187,6 +187,15 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
virt_mmu_init(vm, &vm->mmu, &pte_masks);
}
+void tdp_mmu_init(struct kvm_vm *vm, int pgtable_levels,
+ struct pte_masks *pte_masks)
+{
+ TEST_ASSERT(!vm->arch.tdp_mmu, "TDP MMU already initialized");
+
+ vm->arch.tdp_mmu = calloc(1, sizeof(*vm->arch.tdp_mmu));
+ virt_mmu_init(vm, vm->arch.tdp_mmu, pte_masks);
+}
+
static void *virt_get_pte(struct kvm_vm *vm, struct kvm_mmu *mmu,
uint64_t *parent_pte, uint64_t vaddr, int level)
{
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index a3e2eae981da..9d4e391fdf2c 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -56,6 +56,21 @@ int vcpu_enable_evmcs(struct kvm_vcpu *vcpu)
return evmcs_ver;
}
+void vm_enable_ept(struct kvm_vm *vm)
+{
+ TEST_ASSERT(kvm_cpu_has_ept(), "KVM doesn't support nested EPT");
+ if (vm->arch.tdp_mmu)
+ return;
+
+ /* TODO: Drop eptPageTableEntry in favor of PTE masks. */
+ struct pte_masks pte_masks = (struct pte_masks) {
+
+ };
+
+ /* TODO: Add support for 5-level EPT. */
+ tdp_mmu_init(vm, 4, &pte_masks);
+}
+
/* Allocate memory regions for nested VMX tests.
*
* Input Args:
@@ -105,6 +120,9 @@ vcpu_alloc_vmx(struct kvm_vm *vm, vm_vaddr_t *p_vmx_gva)
vmx->vmwrite_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->vmwrite);
memset(vmx->vmwrite_hva, 0, getpagesize());
+ if (vm->arch.tdp_mmu)
+ vmx->eptp_gpa = vm->arch.tdp_mmu->pgd;
+
*p_vmx_gva = vmx_gva;
return vmx;
}
@@ -395,7 +413,8 @@ void __tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
uint64_t nested_paddr, uint64_t paddr, int target_level)
{
const uint64_t page_size = PG_LEVEL_SIZE(target_level);
- struct eptPageTableEntry *pt = vmx->eptp_hva, *pte;
+ void *eptp_hva = addr_gpa2hva(vm, vm->arch.tdp_mmu->pgd);
+ struct eptPageTableEntry *pt = eptp_hva, *pte;
uint16_t index;
TEST_ASSERT(vm->mode == VM_MODE_PXXVYY_4K,
@@ -525,15 +544,6 @@ bool kvm_cpu_has_ept(void)
return ctrl & SECONDARY_EXEC_ENABLE_EPT;
}
-void prepare_eptp(struct vmx_pages *vmx, struct kvm_vm *vm)
-{
- TEST_ASSERT(kvm_cpu_has_ept(), "KVM doesn't support nested EPT");
-
- vmx->eptp = (void *)vm_vaddr_alloc_page(vm);
- vmx->eptp_hva = addr_gva2hva(vm, (uintptr_t)vmx->eptp);
- vmx->eptp_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->eptp);
-}
-
void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm)
{
vmx->apic_access = (void *)vm_vaddr_alloc_page(vm);
diff --git a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
index e7d0c08ba29d..5c8cf8ac42a2 100644
--- a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
@@ -93,6 +93,9 @@ static void test_vmx_dirty_log(bool enable_ept)
/* Create VM */
vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
+ if (enable_ept)
+ vm_enable_ept(vm);
+
vmx = vcpu_alloc_vmx(vm, &vmx_pages_gva);
vcpu_args_set(vcpu, 1, vmx_pages_gva);
@@ -113,14 +116,10 @@ static void test_vmx_dirty_log(bool enable_ept)
* ... pages in the L2 GPA range [0xc0001000, 0xc0003000) will map to
* 0xc0000000.
*
- * Note that prepare_eptp should be called only L1's GPA map is done,
- * meaning after the last call to virt_map.
- *
* When EPT is disabled, the L2 guest code will still access the same L1
* GPAs as the EPT enabled case.
*/
if (enable_ept) {
- prepare_eptp(vmx, vm);
tdp_identity_map_default_memslots(vmx, vm);
tdp_map(vmx, vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
tdp_map(vmx, vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 11/21] KVM: selftests: Stop passing VMX metadata to TDP mapping functions
2025-12-30 23:01 [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
` (9 preceding siblings ...)
2025-12-30 23:01 ` [PATCH v4 10/21] KVM: selftests: Use a TDP MMU to share EPT page tables between vCPUs Sean Christopherson
@ 2025-12-30 23:01 ` Sean Christopherson
2026-01-02 16:58 ` Yosry Ahmed
2026-01-02 17:12 ` Yosry Ahmed
2025-12-30 23:01 ` [PATCH v4 12/21] KVM: selftests: Add a stage-2 MMU instance to kvm_vm Sean Christopherson
` (10 subsequent siblings)
21 siblings, 2 replies; 39+ messages in thread
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
The root GPA can now be retrieved from the nested MMU; stop passing VMX
metadata. This is in preparation for making these functions work for
NPTs as well.
Opportunistically drop tdp_pg_map() since it's unused.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/include/x86/vmx.h | 11 ++-----
.../testing/selftests/kvm/lib/x86/memstress.c | 11 +++----
tools/testing/selftests/kvm/lib/x86/vmx.c | 33 +++++++------------
.../selftests/kvm/x86/vmx_dirty_log_test.c | 9 +++--
4 files changed, 24 insertions(+), 40 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
index 1fd83c23529a..4dd4c2094ee6 100644
--- a/tools/testing/selftests/kvm/include/x86/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86/vmx.h
@@ -557,14 +557,9 @@ bool load_vmcs(struct vmx_pages *vmx);
bool ept_1g_pages_supported(void);
-void tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm, uint64_t nested_paddr,
- uint64_t paddr);
-void tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm, uint64_t nested_paddr,
- uint64_t paddr, uint64_t size);
-void tdp_identity_map_default_memslots(struct vmx_pages *vmx,
- struct kvm_vm *vm);
-void tdp_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t addr, uint64_t size);
+void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
+void tdp_identity_map_default_memslots(struct kvm_vm *vm);
+void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size);
bool kvm_cpu_has_ept(void);
void vm_enable_ept(struct kvm_vm *vm);
void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm);
diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
index 00f7f11e5f0e..3319cb57a78d 100644
--- a/tools/testing/selftests/kvm/lib/x86/memstress.c
+++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
@@ -59,7 +59,7 @@ uint64_t memstress_nested_pages(int nr_vcpus)
return 513 + 10 * nr_vcpus;
}
-static void memstress_setup_ept_mappings(struct vmx_pages *vmx, struct kvm_vm *vm)
+static void memstress_setup_ept_mappings(struct kvm_vm *vm)
{
uint64_t start, end;
@@ -68,16 +68,15 @@ static void memstress_setup_ept_mappings(struct vmx_pages *vmx, struct kvm_vm *v
* KVM can shadow the EPT12 with the maximum huge page size supported
* by the backing source.
*/
- tdp_identity_map_1g(vmx, vm, 0, 0x100000000ULL);
+ tdp_identity_map_1g(vm, 0, 0x100000000ULL);
start = align_down(memstress_args.gpa, PG_SIZE_1G);
end = align_up(memstress_args.gpa + memstress_args.size, PG_SIZE_1G);
- tdp_identity_map_1g(vmx, vm, start, end - start);
+ tdp_identity_map_1g(vm, start, end - start);
}
void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vcpus[])
{
- struct vmx_pages *vmx;
struct kvm_regs regs;
vm_vaddr_t vmx_gva;
int vcpu_id;
@@ -87,11 +86,11 @@ void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vc
vm_enable_ept(vm);
for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
- vmx = vcpu_alloc_vmx(vm, &vmx_gva);
+ vcpu_alloc_vmx(vm, &vmx_gva);
/* The EPTs are shared across vCPUs, setup the mappings once */
if (vcpu_id == 0)
- memstress_setup_ept_mappings(vmx, vm);
+ memstress_setup_ept_mappings(vm);
/*
* Override the vCPU to run memstress_l1_guest_code() which will
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index 9d4e391fdf2c..ea1c09f9e8ab 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -409,8 +409,8 @@ static void tdp_create_pte(struct kvm_vm *vm,
}
-void __tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr, int target_level)
+void __tdp_pg_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
+ int target_level)
{
const uint64_t page_size = PG_LEVEL_SIZE(target_level);
void *eptp_hva = addr_gpa2hva(vm, vm->arch.tdp_mmu->pgd);
@@ -453,12 +453,6 @@ void __tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
}
}
-void tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr)
-{
- __tdp_pg_map(vmx, vm, nested_paddr, paddr, PG_LEVEL_4K);
-}
-
/*
* Map a range of EPT guest physical addresses to the VM's physical address
*
@@ -476,9 +470,8 @@ void tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
* Within the VM given by vm, creates a nested guest translation for the
* page range starting at nested_paddr to the page range starting at paddr.
*/
-void __tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr, uint64_t size,
- int level)
+void __tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
+ uint64_t size, int level)
{
size_t page_size = PG_LEVEL_SIZE(level);
size_t npages = size / page_size;
@@ -487,23 +480,22 @@ void __tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm,
TEST_ASSERT(paddr + size > paddr, "Paddr overflow");
while (npages--) {
- __tdp_pg_map(vmx, vm, nested_paddr, paddr, level);
+ __tdp_pg_map(vm, nested_paddr, paddr, level);
nested_paddr += page_size;
paddr += page_size;
}
}
-void tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t nested_paddr, uint64_t paddr, uint64_t size)
+void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
+ uint64_t size)
{
- __tdp_map(vmx, vm, nested_paddr, paddr, size, PG_LEVEL_4K);
+ __tdp_map(vm, nested_paddr, paddr, size, PG_LEVEL_4K);
}
/* Prepare an identity extended page table that maps all the
* physical pages in VM.
*/
-void tdp_identity_map_default_memslots(struct vmx_pages *vmx,
- struct kvm_vm *vm)
+void tdp_identity_map_default_memslots(struct kvm_vm *vm)
{
uint32_t s, memslot = 0;
sparsebit_idx_t i, last;
@@ -520,16 +512,15 @@ void tdp_identity_map_default_memslots(struct vmx_pages *vmx,
if (i > last)
break;
- tdp_map(vmx, vm, (uint64_t)i << vm->page_shift,
+ tdp_map(vm, (uint64_t)i << vm->page_shift,
(uint64_t)i << vm->page_shift, 1 << vm->page_shift);
}
}
/* Identity map a region with 1GiB Pages. */
-void tdp_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
- uint64_t addr, uint64_t size)
+void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size)
{
- __tdp_map(vmx, vm, addr, addr, size, PG_LEVEL_1G);
+ __tdp_map(vm, addr, addr, size, PG_LEVEL_1G);
}
bool kvm_cpu_has_ept(void)
diff --git a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
index 5c8cf8ac42a2..370f8d3117c2 100644
--- a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
@@ -80,7 +80,6 @@ void l1_guest_code(struct vmx_pages *vmx)
static void test_vmx_dirty_log(bool enable_ept)
{
vm_vaddr_t vmx_pages_gva = 0;
- struct vmx_pages *vmx;
unsigned long *bmap;
uint64_t *host_test_mem;
@@ -96,7 +95,7 @@ static void test_vmx_dirty_log(bool enable_ept)
if (enable_ept)
vm_enable_ept(vm);
- vmx = vcpu_alloc_vmx(vm, &vmx_pages_gva);
+ vcpu_alloc_vmx(vm, &vmx_pages_gva);
vcpu_args_set(vcpu, 1, vmx_pages_gva);
/* Add an extra memory slot for testing dirty logging */
@@ -120,9 +119,9 @@ static void test_vmx_dirty_log(bool enable_ept)
* GPAs as the EPT enabled case.
*/
if (enable_ept) {
- tdp_identity_map_default_memslots(vmx, vm);
- tdp_map(vmx, vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
- tdp_map(vmx, vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
+ tdp_identity_map_default_memslots(vm);
+ tdp_map(vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
+ tdp_map(vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
}
bmap = bitmap_zalloc(TEST_MEM_PAGES);
--
2.52.0.351.gbe84eed79e-goog
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v4 12/21] KVM: selftests: Add a stage-2 MMU instance to kvm_vm
2025-12-30 23:01 [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
` (10 preceding siblings ...)
2025-12-30 23:01 ` [PATCH v4 11/21] KVM: selftests: Stop passing VMX metadata to TDP mapping functions Sean Christopherson
@ 2025-12-30 23:01 ` Sean Christopherson
2026-01-02 17:03 ` Yosry Ahmed
2025-12-30 23:01 ` [PATCH v4 13/21] KVM: selftests: Reuse virt mapping functions for nested EPTs Sean Christopherson
` (9 subsequent siblings)
21 siblings, 1 reply; 39+ messages in thread
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
Add a stage-2 MMU instance so that architectures that support nested
virtualization (more specifically, nested stage-2 page tables) can create
and track stage-2 page tables for running L2 guests. Plumb the structure
into common code to avoid cyclical dependencies, and to provide some line
of sight to having common APIs for creating stage-2 mappings.
As a bonus, putting the member in common code justifies using stage2_mmu
instead of tdp_mmu for x86.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/include/kvm_util.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index c1497515fa6a..371d55e0366e 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -116,7 +116,12 @@ struct kvm_vm {
uint32_t dirty_ring_size;
uint64_t gpa_tag_mask;
+ /*
+ * "mmu" is the guest's stage-1, with a short name because the vast
+ * majority of tests only care about the stage-1 MMU.
+ */
struct kvm_mmu mmu;
+ struct kvm_mmu stage2_mmu;
struct kvm_vm_arch arch;
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 13/21] KVM: selftests: Reuse virt mapping functions for nested EPTs
2025-12-30 23:01 [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
` (11 preceding siblings ...)
2025-12-30 23:01 ` [PATCH v4 12/21] KVM: selftests: Add a stage-2 MMU instance to kvm_vm Sean Christopherson
@ 2025-12-30 23:01 ` Sean Christopherson
2025-12-30 23:01 ` [PATCH v4 14/21] KVM: selftests: Move TDP mapping functions outside of vmx.c Sean Christopherson
` (8 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
Rework tdp_map() and friends to use __virt_pg_map() and drop the custom
EPT code in __tdp_pg_map() and tdp_create_pte(). The EPT code and
__virt_pg_map() are practically identical; the main differences are:
- EPT uses the EPT struct overlay instead of the PTE masks.
- EPT always assumes 4-level EPTs.
To reuse __virt_pg_map(), extend the PTE masks to work with EPT's RWX and
X-only capabilities, and provide a tdp_mmu_init() API so that EPT can pass
in the EPT PTE masks along with the root page level (which is currently
hardcoded to '4').
Don't reuse KVM's insane overloading of the USER bit for EPT_R as there's
no reason to multiplex bits in the selftests, e.g. selftests aren't trying
to shadow guest PTEs and thus don't care about funnelling protections into
a common permissions check.
Another benefit of reusing the code is having separate handling for
upper-level PTEs vs 4K PTEs, which avoids some quirks like setting the
large bit on a 4K PTE in the EPTs.
For all intents and purposes, no functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
.../selftests/kvm/include/x86/kvm_util_arch.h | 4 +-
.../selftests/kvm/include/x86/processor.h | 16 ++-
.../testing/selftests/kvm/lib/x86/processor.c | 21 +++-
tools/testing/selftests/kvm/lib/x86/vmx.c | 119 +++---------------
4 files changed, 52 insertions(+), 108 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
index 05a1fc1780f2..1cf84b8212c6 100644
--- a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
+++ b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
@@ -14,6 +14,8 @@ struct pte_masks {
uint64_t present;
uint64_t writable;
uint64_t user;
+ uint64_t readable;
+ uint64_t executable;
uint64_t accessed;
uint64_t dirty;
uint64_t huge;
@@ -37,8 +39,6 @@ struct kvm_vm_arch {
uint64_t s_bit;
int sev_fd;
bool is_pt_protected;
-
- struct kvm_mmu *tdp_mmu;
};
static inline bool __vm_arch_has_protected_memory(struct kvm_vm_arch *arch)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 973f2069cd3b..4c0d2fc83c1c 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1442,6 +1442,8 @@ enum pg_level {
#define PTE_PRESENT_MASK(mmu) ((mmu)->arch.pte_masks.present)
#define PTE_WRITABLE_MASK(mmu) ((mmu)->arch.pte_masks.writable)
#define PTE_USER_MASK(mmu) ((mmu)->arch.pte_masks.user)
+#define PTE_READABLE_MASK(mmu) ((mmu)->arch.pte_masks.readable)
+#define PTE_EXECUTABLE_MASK(mmu) ((mmu)->arch.pte_masks.executable)
#define PTE_ACCESSED_MASK(mmu) ((mmu)->arch.pte_masks.accessed)
#define PTE_DIRTY_MASK(mmu) ((mmu)->arch.pte_masks.dirty)
#define PTE_HUGE_MASK(mmu) ((mmu)->arch.pte_masks.huge)
@@ -1449,13 +1451,23 @@ enum pg_level {
#define PTE_C_BIT_MASK(mmu) ((mmu)->arch.pte_masks.c)
#define PTE_S_BIT_MASK(mmu) ((mmu)->arch.pte_masks.s)
-#define is_present_pte(mmu, pte) (!!(*(pte) & PTE_PRESENT_MASK(mmu)))
+/*
+ * For PTEs without a PRESENT bit (i.e. EPT entries), treat the PTE as present
+ * if it's executable or readable, as EPT supports execute-only PTEs, but not
+ * write-only PTEs.
+ */
+#define is_present_pte(mmu, pte) \
+ (PTE_PRESENT_MASK(mmu) ? \
+ !!(*(pte) & PTE_PRESENT_MASK(mmu)) : \
+ !!(*(pte) & (PTE_READABLE_MASK(mmu) | PTE_EXECUTABLE_MASK(mmu))))
+#define is_executable_pte(mmu, pte) \
+ ((*(pte) & (PTE_EXECUTABLE_MASK(mmu) | PTE_NX_MASK(mmu))) == PTE_EXECUTABLE_MASK(mmu))
#define is_writable_pte(mmu, pte) (!!(*(pte) & PTE_WRITABLE_MASK(mmu)))
#define is_user_pte(mmu, pte) (!!(*(pte) & PTE_USER_MASK(mmu)))
#define is_accessed_pte(mmu, pte) (!!(*(pte) & PTE_ACCESSED_MASK(mmu)))
#define is_dirty_pte(mmu, pte) (!!(*(pte) & PTE_DIRTY_MASK(mmu)))
#define is_huge_pte(mmu, pte) (!!(*(pte) & PTE_HUGE_MASK(mmu)))
-#define is_nx_pte(mmu, pte) (!!(*(pte) & PTE_NX_MASK(mmu)))
+#define is_nx_pte(mmu, pte) (!is_executable_pte(mmu, pte))
void tdp_mmu_init(struct kvm_vm *vm, int pgtable_levels,
struct pte_masks *pte_masks);
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index 8a9298a72897..41316cac94e0 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -165,6 +165,10 @@ static void virt_mmu_init(struct kvm_vm *vm, struct kvm_mmu *mmu,
mmu->pgd_created = true;
mmu->arch.pte_masks = *pte_masks;
}
+
+ TEST_ASSERT(mmu->pgtable_levels == 4 || mmu->pgtable_levels == 5,
+ "Selftests MMU only supports 4-level and 5-level paging, not %u-level paging",
+ mmu->pgtable_levels);
}
void virt_arch_pgd_alloc(struct kvm_vm *vm)
@@ -180,6 +184,7 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
.dirty = BIT_ULL(6),
.huge = BIT_ULL(7),
.nx = BIT_ULL(63),
+ .executable = 0,
.c = vm->arch.c_bit,
.s = vm->arch.s_bit,
};
@@ -190,10 +195,10 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
void tdp_mmu_init(struct kvm_vm *vm, int pgtable_levels,
struct pte_masks *pte_masks)
{
- TEST_ASSERT(!vm->arch.tdp_mmu, "TDP MMU already initialized");
+ TEST_ASSERT(!vm->stage2_mmu.pgtable_levels, "TDP MMU already initialized");
- vm->arch.tdp_mmu = calloc(1, sizeof(*vm->arch.tdp_mmu));
- virt_mmu_init(vm, vm->arch.tdp_mmu, pte_masks);
+ vm->stage2_mmu.pgtable_levels = pgtable_levels;
+ virt_mmu_init(vm, &vm->stage2_mmu, pte_masks);
}
static void *virt_get_pte(struct kvm_vm *vm, struct kvm_mmu *mmu,
@@ -223,7 +228,8 @@ static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
paddr = vm_untag_gpa(vm, paddr);
if (!is_present_pte(mmu, pte)) {
- *pte = PTE_PRESENT_MASK(mmu) | PTE_WRITABLE_MASK(mmu);
+ *pte = PTE_PRESENT_MASK(mmu) | PTE_READABLE_MASK(mmu) |
+ PTE_WRITABLE_MASK(mmu) | PTE_EXECUTABLE_MASK(mmu);
if (current_level == target_level)
*pte |= PTE_HUGE_MASK(mmu) | (paddr & PHYSICAL_PAGE_MASK);
else
@@ -269,6 +275,9 @@ void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
TEST_ASSERT(vm_untag_gpa(vm, paddr) == paddr,
"Unexpected bits in paddr: %lx", paddr);
+ TEST_ASSERT(!PTE_EXECUTABLE_MASK(mmu) || !PTE_NX_MASK(mmu),
+ "X and NX bit masks cannot be used simultaneously");
+
/*
* Allocate upper level page tables, if not already present. Return
* early if a hugepage was created.
@@ -286,7 +295,9 @@ void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
pte = virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
TEST_ASSERT(!is_present_pte(mmu, pte),
"PTE already present for 4k page at vaddr: 0x%lx", vaddr);
- *pte = PTE_PRESENT_MASK(mmu) | PTE_WRITABLE_MASK(mmu) | (paddr & PHYSICAL_PAGE_MASK);
+ *pte = PTE_PRESENT_MASK(mmu) | PTE_READABLE_MASK(mmu) |
+ PTE_WRITABLE_MASK(mmu) | PTE_EXECUTABLE_MASK(mmu) |
+ (paddr & PHYSICAL_PAGE_MASK);
/*
* Neither SEV nor TDX supports shared page tables, so only the final
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index ea1c09f9e8ab..e3737b3d9120 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -25,21 +25,6 @@ bool enable_evmcs;
struct hv_enlightened_vmcs *current_evmcs;
struct hv_vp_assist_page *current_vp_assist;
-struct eptPageTableEntry {
- uint64_t readable:1;
- uint64_t writable:1;
- uint64_t executable:1;
- uint64_t memory_type:3;
- uint64_t ignore_pat:1;
- uint64_t page_size:1;
- uint64_t accessed:1;
- uint64_t dirty:1;
- uint64_t ignored_11_10:2;
- uint64_t address:40;
- uint64_t ignored_62_52:11;
- uint64_t suppress_ve:1;
-};
-
int vcpu_enable_evmcs(struct kvm_vcpu *vcpu)
{
uint16_t evmcs_ver;
@@ -58,13 +43,24 @@ int vcpu_enable_evmcs(struct kvm_vcpu *vcpu)
void vm_enable_ept(struct kvm_vm *vm)
{
+ struct pte_masks pte_masks;
+
TEST_ASSERT(kvm_cpu_has_ept(), "KVM doesn't support nested EPT");
- if (vm->arch.tdp_mmu)
- return;
-
- /* TODO: Drop eptPageTableEntry in favor of PTE masks. */
- struct pte_masks pte_masks = (struct pte_masks) {
+ /*
+ * EPTs do not have 'present' or 'user' bits; instead, bit 0 is the
+ * 'readable' bit.
+ */
+ pte_masks = (struct pte_masks) {
+ .present = 0,
+ .user = 0,
+ .readable = BIT_ULL(0),
+ .writable = BIT_ULL(1),
+ .executable = BIT_ULL(2),
+ .huge = BIT_ULL(7),
+ .accessed = BIT_ULL(8),
+ .dirty = BIT_ULL(9),
+ .nx = 0,
};
/* TODO: Add support for 5-level EPT. */
@@ -120,8 +116,8 @@ vcpu_alloc_vmx(struct kvm_vm *vm, vm_vaddr_t *p_vmx_gva)
vmx->vmwrite_gpa = addr_gva2gpa(vm, (uintptr_t)vmx->vmwrite);
memset(vmx->vmwrite_hva, 0, getpagesize());
- if (vm->arch.tdp_mmu)
- vmx->eptp_gpa = vm->arch.tdp_mmu->pgd;
+ if (vm->stage2_mmu.pgd_created)
+ vmx->eptp_gpa = vm->stage2_mmu.pgd;
*p_vmx_gva = vmx_gva;
return vmx;
@@ -377,82 +373,6 @@ void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip, void *guest_rsp)
init_vmcs_guest_state(guest_rip, guest_rsp);
}
-static void tdp_create_pte(struct kvm_vm *vm,
- struct eptPageTableEntry *pte,
- uint64_t nested_paddr,
- uint64_t paddr,
- int current_level,
- int target_level)
-{
- if (!pte->readable) {
- pte->writable = true;
- pte->readable = true;
- pte->executable = true;
- pte->page_size = (current_level == target_level);
- if (pte->page_size)
- pte->address = paddr >> vm->page_shift;
- else
- pte->address = vm_alloc_page_table(vm) >> vm->page_shift;
- } else {
- /*
- * Entry already present. Assert that the caller doesn't want
- * a hugepage at this level, and that there isn't a hugepage at
- * this level.
- */
- TEST_ASSERT(current_level != target_level,
- "Cannot create hugepage at level: %u, nested_paddr: 0x%lx",
- current_level, nested_paddr);
- TEST_ASSERT(!pte->page_size,
- "Cannot create page table at level: %u, nested_paddr: 0x%lx",
- current_level, nested_paddr);
- }
-}
-
-
-void __tdp_pg_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
- int target_level)
-{
- const uint64_t page_size = PG_LEVEL_SIZE(target_level);
- void *eptp_hva = addr_gpa2hva(vm, vm->arch.tdp_mmu->pgd);
- struct eptPageTableEntry *pt = eptp_hva, *pte;
- uint16_t index;
-
- TEST_ASSERT(vm->mode == VM_MODE_PXXVYY_4K,
- "Unknown or unsupported guest mode: 0x%x", vm->mode);
-
- TEST_ASSERT((nested_paddr >> 48) == 0,
- "Nested physical address 0x%lx is > 48-bits and requires 5-level EPT",
- nested_paddr);
- TEST_ASSERT((nested_paddr % page_size) == 0,
- "Nested physical address not on page boundary,\n"
- " nested_paddr: 0x%lx page_size: 0x%lx",
- nested_paddr, page_size);
- TEST_ASSERT((nested_paddr >> vm->page_shift) <= vm->max_gfn,
- "Physical address beyond beyond maximum supported,\n"
- " nested_paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
- paddr, vm->max_gfn, vm->page_size);
- TEST_ASSERT((paddr % page_size) == 0,
- "Physical address not on page boundary,\n"
- " paddr: 0x%lx page_size: 0x%lx",
- paddr, page_size);
- TEST_ASSERT((paddr >> vm->page_shift) <= vm->max_gfn,
- "Physical address beyond beyond maximum supported,\n"
- " paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
- paddr, vm->max_gfn, vm->page_size);
-
- for (int level = PG_LEVEL_512G; level >= PG_LEVEL_4K; level--) {
- index = (nested_paddr >> PG_LEVEL_SHIFT(level)) & 0x1ffu;
- pte = &pt[index];
-
- tdp_create_pte(vm, pte, nested_paddr, paddr, level, target_level);
-
- if (pte->page_size)
- break;
-
- pt = addr_gpa2hva(vm, pte->address * vm->page_size);
- }
-}
-
/*
* Map a range of EPT guest physical addresses to the VM's physical address
*
@@ -473,6 +393,7 @@ void __tdp_pg_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
void __tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
uint64_t size, int level)
{
+ struct kvm_mmu *mmu = &vm->stage2_mmu;
size_t page_size = PG_LEVEL_SIZE(level);
size_t npages = size / page_size;
@@ -480,7 +401,7 @@ void __tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
TEST_ASSERT(paddr + size > paddr, "Paddr overflow");
while (npages--) {
- __tdp_pg_map(vm, nested_paddr, paddr, level);
+ __virt_pg_map(vm, mmu, nested_paddr, paddr, level);
nested_paddr += page_size;
paddr += page_size;
}
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 14/21] KVM: selftests: Move TDP mapping functions outside of vmx.c
2025-12-30 23:01 [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
` (12 preceding siblings ...)
2025-12-30 23:01 ` [PATCH v4 13/21] KVM: selftests: Reuse virt mapping functions for nested EPTs Sean Christopherson
@ 2025-12-30 23:01 ` Sean Christopherson
2025-12-30 23:01 ` [PATCH v4 15/21] KVM: selftests: Allow kvm_cpu_has_ept() to be called on AMD CPUs Sean Christopherson
` (7 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
Now that the functions are no longer VMX-specific, move them to
processor.c. Do a minor comment tweak replacing 'EPT' with 'TDP'.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
.../selftests/kvm/include/x86/processor.h | 4 ++
tools/testing/selftests/kvm/include/x86/vmx.h | 3 -
.../testing/selftests/kvm/lib/x86/processor.c | 53 ++++++++++++++
tools/testing/selftests/kvm/lib/x86/vmx.c | 71 -------------------
4 files changed, 57 insertions(+), 74 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 4c0d2fc83c1c..d134c886f280 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1477,6 +1477,10 @@ void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
uint64_t nr_bytes, int level);
+void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
+void tdp_identity_map_default_memslots(struct kvm_vm *vm);
+void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size);
+
/*
* Basic CPU control in CR0
*/
diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
index 4dd4c2094ee6..92b918700d24 100644
--- a/tools/testing/selftests/kvm/include/x86/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86/vmx.h
@@ -557,9 +557,6 @@ bool load_vmcs(struct vmx_pages *vmx);
bool ept_1g_pages_supported(void);
-void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
-void tdp_identity_map_default_memslots(struct kvm_vm *vm);
-void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size);
bool kvm_cpu_has_ept(void);
void vm_enable_ept(struct kvm_vm *vm);
void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm);
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index 41316cac94e0..29e7d172f945 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -472,6 +472,59 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
}
}
+void __tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
+ uint64_t size, int level)
+{
+ size_t page_size = PG_LEVEL_SIZE(level);
+ size_t npages = size / page_size;
+
+ TEST_ASSERT(nested_paddr + size > nested_paddr, "Vaddr overflow");
+ TEST_ASSERT(paddr + size > paddr, "Paddr overflow");
+
+ while (npages--) {
+ __virt_pg_map(vm, &vm->stage2_mmu, nested_paddr, paddr, level);
+ nested_paddr += page_size;
+ paddr += page_size;
+ }
+}
+
+void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
+ uint64_t size)
+{
+ __tdp_map(vm, nested_paddr, paddr, size, PG_LEVEL_4K);
+}
+
+/* Prepare an identity extended page table that maps all the
+ * physical pages in VM.
+ */
+void tdp_identity_map_default_memslots(struct kvm_vm *vm)
+{
+ uint32_t s, memslot = 0;
+ sparsebit_idx_t i, last;
+ struct userspace_mem_region *region = memslot2region(vm, memslot);
+
+ /* Only memslot 0 is mapped here, ensure it's the only one being used */
+ for (s = 0; s < NR_MEM_REGIONS; s++)
+ TEST_ASSERT_EQ(vm->memslots[s], 0);
+
+ i = (region->region.guest_phys_addr >> vm->page_shift) - 1;
+ last = i + (region->region.memory_size >> vm->page_shift);
+ for (;;) {
+ i = sparsebit_next_clear(region->unused_phy_pages, i);
+ if (i > last)
+ break;
+
+ tdp_map(vm, (uint64_t)i << vm->page_shift,
+ (uint64_t)i << vm->page_shift, 1 << vm->page_shift);
+ }
+}
+
+/* Identity map a region with 1GiB Pages. */
+void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size)
+{
+ __tdp_map(vm, addr, addr, size, PG_LEVEL_1G);
+}
+
/*
* Set Unusable Segment
*
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index e3737b3d9120..448a63457467 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -373,77 +373,6 @@ void prepare_vmcs(struct vmx_pages *vmx, void *guest_rip, void *guest_rsp)
init_vmcs_guest_state(guest_rip, guest_rsp);
}
-/*
- * Map a range of EPT guest physical addresses to the VM's physical address
- *
- * Input Args:
- * vm - Virtual Machine
- * nested_paddr - Nested guest physical address to map
- * paddr - VM Physical Address
- * size - The size of the range to map
- * level - The level at which to map the range
- *
- * Output Args: None
- *
- * Return: None
- *
- * Within the VM given by vm, creates a nested guest translation for the
- * page range starting at nested_paddr to the page range starting at paddr.
- */
-void __tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
- uint64_t size, int level)
-{
- struct kvm_mmu *mmu = &vm->stage2_mmu;
- size_t page_size = PG_LEVEL_SIZE(level);
- size_t npages = size / page_size;
-
- TEST_ASSERT(nested_paddr + size > nested_paddr, "Vaddr overflow");
- TEST_ASSERT(paddr + size > paddr, "Paddr overflow");
-
- while (npages--) {
- __virt_pg_map(vm, mmu, nested_paddr, paddr, level);
- nested_paddr += page_size;
- paddr += page_size;
- }
-}
-
-void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
- uint64_t size)
-{
- __tdp_map(vm, nested_paddr, paddr, size, PG_LEVEL_4K);
-}
-
-/* Prepare an identity extended page table that maps all the
- * physical pages in VM.
- */
-void tdp_identity_map_default_memslots(struct kvm_vm *vm)
-{
- uint32_t s, memslot = 0;
- sparsebit_idx_t i, last;
- struct userspace_mem_region *region = memslot2region(vm, memslot);
-
- /* Only memslot 0 is mapped here, ensure it's the only one being used */
- for (s = 0; s < NR_MEM_REGIONS; s++)
- TEST_ASSERT_EQ(vm->memslots[s], 0);
-
- i = (region->region.guest_phys_addr >> vm->page_shift) - 1;
- last = i + (region->region.memory_size >> vm->page_shift);
- for (;;) {
- i = sparsebit_next_clear(region->unused_phy_pages, i);
- if (i > last)
- break;
-
- tdp_map(vm, (uint64_t)i << vm->page_shift,
- (uint64_t)i << vm->page_shift, 1 << vm->page_shift);
- }
-}
-
-/* Identity map a region with 1GiB Pages. */
-void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size)
-{
- __tdp_map(vm, addr, addr, size, PG_LEVEL_1G);
-}
-
bool kvm_cpu_has_ept(void)
{
uint64_t ctrl;
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 15/21] KVM: selftests: Allow kvm_cpu_has_ept() to be called on AMD CPUs
2025-12-30 23:01 [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
` (13 preceding siblings ...)
2025-12-30 23:01 ` [PATCH v4 14/21] KVM: selftests: Move TDP mapping functions outside of vmx.c Sean Christopherson
@ 2025-12-30 23:01 ` Sean Christopherson
2025-12-30 23:01 ` [PATCH v4 16/21] KVM: selftests: Add support for nested NPTs Sean Christopherson
` (6 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
In preparation for generalizing the nested dirty logging test, it will be
necessary to check whether either EPT or NPT is enabled. To avoid gating
the kvm_cpu_has_ept() call on the CPU type, make sure the function
returns false if VMX is not available instead of trying to read VMX-only
MSRs.
No functional change intended.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/lib/x86/vmx.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
index 448a63457467..c87b340362a9 100644
--- a/tools/testing/selftests/kvm/lib/x86/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
@@ -377,6 +377,9 @@ bool kvm_cpu_has_ept(void)
{
uint64_t ctrl;
+ if (!kvm_cpu_has(X86_FEATURE_VMX))
+ return false;
+
ctrl = kvm_get_feature_msr(MSR_IA32_VMX_TRUE_PROCBASED_CTLS) >> 32;
if (!(ctrl & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS))
return false;
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 16/21] KVM: selftests: Add support for nested NPTs
2025-12-30 23:01 [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
` (14 preceding siblings ...)
2025-12-30 23:01 ` [PATCH v4 15/21] KVM: selftests: Allow kvm_cpu_has_ept() to be called on AMD CPUs Sean Christopherson
@ 2025-12-30 23:01 ` Sean Christopherson
2026-01-07 23:12 ` Yosry Ahmed
2025-12-30 23:01 ` [PATCH v4 17/21] KVM: selftests: Set the user bit on nested NPT PTEs Sean Christopherson
` (5 subsequent siblings)
21 siblings, 1 reply; 39+ messages in thread
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
Implement nCR3 and NPT initialization functions, similar to the EPT
equivalents, and create common TDP helpers for enablement checking and
initialization. Enable NPT for nested guests by default if the TDP MMU
was initialized, similar to VMX.
Reuse the PTE masks from the main MMU in the NPT MMU, except for the C
and S bits related to confidential VMs.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
.../selftests/kvm/include/x86/processor.h | 2 ++
.../selftests/kvm/include/x86/svm_util.h | 9 ++++++++
.../testing/selftests/kvm/lib/x86/memstress.c | 4 ++--
.../testing/selftests/kvm/lib/x86/processor.c | 15 +++++++++++++
tools/testing/selftests/kvm/lib/x86/svm.c | 21 +++++++++++++++++++
.../selftests/kvm/x86/vmx_dirty_log_test.c | 4 ++--
6 files changed, 51 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index d134c886f280..deb471fb9b51 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1477,6 +1477,8 @@ void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
uint64_t nr_bytes, int level);
+void vm_enable_tdp(struct kvm_vm *vm);
+bool kvm_cpu_has_tdp(void);
void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
void tdp_identity_map_default_memslots(struct kvm_vm *vm);
void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size);
diff --git a/tools/testing/selftests/kvm/include/x86/svm_util.h b/tools/testing/selftests/kvm/include/x86/svm_util.h
index b74c6dcddcbd..5d7c42534bc4 100644
--- a/tools/testing/selftests/kvm/include/x86/svm_util.h
+++ b/tools/testing/selftests/kvm/include/x86/svm_util.h
@@ -27,6 +27,9 @@ struct svm_test_data {
void *msr; /* gva */
void *msr_hva;
uint64_t msr_gpa;
+
+ /* NPT */
+ uint64_t ncr3_gpa;
};
static inline void vmmcall(void)
@@ -57,6 +60,12 @@ struct svm_test_data *vcpu_alloc_svm(struct kvm_vm *vm, vm_vaddr_t *p_svm_gva);
void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_rsp);
void run_guest(struct vmcb *vmcb, uint64_t vmcb_gpa);
+static inline bool kvm_cpu_has_npt(void)
+{
+ return kvm_cpu_has(X86_FEATURE_NPT);
+}
+void vm_enable_npt(struct kvm_vm *vm);
+
int open_sev_dev_path_or_exit(void);
#endif /* SELFTEST_KVM_SVM_UTILS_H */
diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
index 3319cb57a78d..407abfc34909 100644
--- a/tools/testing/selftests/kvm/lib/x86/memstress.c
+++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
@@ -82,9 +82,9 @@ void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vc
int vcpu_id;
TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
- TEST_REQUIRE(kvm_cpu_has_ept());
+ TEST_REQUIRE(kvm_cpu_has_tdp());
- vm_enable_ept(vm);
+ vm_enable_tdp(vm);
for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
vcpu_alloc_vmx(vm, &vmx_gva);
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index 29e7d172f945..a3a4c9a4cbcb 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -8,7 +8,9 @@
#include "kvm_util.h"
#include "pmu.h"
#include "processor.h"
+#include "svm_util.h"
#include "sev.h"
+#include "vmx.h"
#ifndef NUM_INTERRUPTS
#define NUM_INTERRUPTS 256
@@ -472,6 +474,19 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
}
}
+void vm_enable_tdp(struct kvm_vm *vm)
+{
+ if (kvm_cpu_has(X86_FEATURE_VMX))
+ vm_enable_ept(vm);
+ else
+ vm_enable_npt(vm);
+}
+
+bool kvm_cpu_has_tdp(void)
+{
+ return kvm_cpu_has_ept() || kvm_cpu_has_npt();
+}
+
void __tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
uint64_t size, int level)
{
diff --git a/tools/testing/selftests/kvm/lib/x86/svm.c b/tools/testing/selftests/kvm/lib/x86/svm.c
index d239c2097391..8e4795225595 100644
--- a/tools/testing/selftests/kvm/lib/x86/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86/svm.c
@@ -59,6 +59,22 @@ static void vmcb_set_seg(struct vmcb_seg *seg, u16 selector,
seg->base = base;
}
+void vm_enable_npt(struct kvm_vm *vm)
+{
+ struct pte_masks pte_masks;
+
+ TEST_ASSERT(kvm_cpu_has_npt(), "KVM doesn't support nested NPT");
+
+ /*
+ * NPTs use the same PTE format, but deliberately drop the C-bit as the
+ * per-VM shared vs. private information is only meant for stage-1.
+ */
+ pte_masks = vm->mmu.arch.pte_masks;
+ pte_masks.c = 0;
+
+ tdp_mmu_init(vm, vm->mmu.pgtable_levels, &pte_masks);
+}
+
void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_rsp)
{
struct vmcb *vmcb = svm->vmcb;
@@ -102,6 +118,11 @@ void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_r
vmcb->save.rip = (u64)guest_rip;
vmcb->save.rsp = (u64)guest_rsp;
guest_regs.rdi = (u64)svm;
+
+ if (svm->ncr3_gpa) {
+ ctrl->nested_ctl |= SVM_NESTED_CTL_NP_ENABLE;
+ ctrl->nested_cr3 = svm->ncr3_gpa;
+ }
}
/*
diff --git a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
index 370f8d3117c2..032ab8bf60a4 100644
--- a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
@@ -93,7 +93,7 @@ static void test_vmx_dirty_log(bool enable_ept)
/* Create VM */
vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
if (enable_ept)
- vm_enable_ept(vm);
+ vm_enable_tdp(vm);
vcpu_alloc_vmx(vm, &vmx_pages_gva);
vcpu_args_set(vcpu, 1, vmx_pages_gva);
@@ -170,7 +170,7 @@ int main(int argc, char *argv[])
test_vmx_dirty_log(/*enable_ept=*/false);
- if (kvm_cpu_has_ept())
+ if (kvm_cpu_has_tdp())
test_vmx_dirty_log(/*enable_ept=*/true);
return 0;
--
2.52.0.351.gbe84eed79e-goog
^ permalink raw reply related [flat|nested] 39+ messages in thread
* [PATCH v4 17/21] KVM: selftests: Set the user bit on nested NPT PTEs
2025-12-30 23:01 [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
` (15 preceding siblings ...)
2025-12-30 23:01 ` [PATCH v4 16/21] KVM: selftests: Add support for nested NPTs Sean Christopherson
@ 2025-12-30 23:01 ` Sean Christopherson
2025-12-30 23:01 ` [PATCH v4 18/21] KVM: selftests: Extend vmx_dirty_log_test to cover SVM Sean Christopherson
` (4 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
According to the APM, NPT walks are treated as user accesses. In
preparation for supporting NPT mappings, set the 'user' bit on nested
NPT PTEs by adding a mask of bits that are always set on PTEs to kvm_mmu.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/include/x86/kvm_util_arch.h | 2 ++
tools/testing/selftests/kvm/include/x86/processor.h | 1 +
tools/testing/selftests/kvm/lib/x86/processor.c | 5 +++--
tools/testing/selftests/kvm/lib/x86/svm.c | 3 +++
4 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
index 1cf84b8212c6..be35d26bb320 100644
--- a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
+++ b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
@@ -22,6 +22,8 @@ struct pte_masks {
uint64_t nx;
uint64_t c;
uint64_t s;
+
+ uint64_t always_set;
};
struct kvm_mmu_arch {
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index deb471fb9b51..7b7d962244d6 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1450,6 +1450,7 @@ enum pg_level {
#define PTE_NX_MASK(mmu) ((mmu)->arch.pte_masks.nx)
#define PTE_C_BIT_MASK(mmu) ((mmu)->arch.pte_masks.c)
#define PTE_S_BIT_MASK(mmu) ((mmu)->arch.pte_masks.s)
+#define PTE_ALWAYS_SET_MASK(mmu) ((mmu)->arch.pte_masks.always_set)
/*
* For PTEs without a PRESENT bit (i.e. EPT entries), treat the PTE as present
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index a3a4c9a4cbcb..5a3385d48902 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -231,7 +231,8 @@ static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
if (!is_present_pte(mmu, pte)) {
*pte = PTE_PRESENT_MASK(mmu) | PTE_READABLE_MASK(mmu) |
- PTE_WRITABLE_MASK(mmu) | PTE_EXECUTABLE_MASK(mmu);
+ PTE_WRITABLE_MASK(mmu) | PTE_EXECUTABLE_MASK(mmu) |
+ PTE_ALWAYS_SET_MASK(mmu);
if (current_level == target_level)
*pte |= PTE_HUGE_MASK(mmu) | (paddr & PHYSICAL_PAGE_MASK);
else
@@ -299,7 +300,7 @@ void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
"PTE already present for 4k page at vaddr: 0x%lx", vaddr);
*pte = PTE_PRESENT_MASK(mmu) | PTE_READABLE_MASK(mmu) |
PTE_WRITABLE_MASK(mmu) | PTE_EXECUTABLE_MASK(mmu) |
- (paddr & PHYSICAL_PAGE_MASK);
+ PTE_ALWAYS_SET_MASK(mmu) | (paddr & PHYSICAL_PAGE_MASK);
/*
* Neither SEV nor TDX supports shared page tables, so only the final
diff --git a/tools/testing/selftests/kvm/lib/x86/svm.c b/tools/testing/selftests/kvm/lib/x86/svm.c
index 8e4795225595..18e9e9089643 100644
--- a/tools/testing/selftests/kvm/lib/x86/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86/svm.c
@@ -72,6 +72,9 @@ void vm_enable_npt(struct kvm_vm *vm)
pte_masks = vm->mmu.arch.pte_masks;
pte_masks.c = 0;
+ /* NPT walks are treated as user accesses, so set the 'user' bit. */
+ pte_masks.always_set = pte_masks.user;
+
tdp_mmu_init(vm, vm->mmu.pgtable_levels, &pte_masks);
}
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 18/21] KVM: selftests: Extend vmx_dirty_log_test to cover SVM
2025-12-30 23:01 [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
` (16 preceding siblings ...)
2025-12-30 23:01 ` [PATCH v4 17/21] KVM: selftests: Set the user bit on nested NPT PTEs Sean Christopherson
@ 2025-12-30 23:01 ` Sean Christopherson
2025-12-30 23:01 ` [PATCH v4 19/21] KVM: selftests: Extend memstress to run on nested SVM Sean Christopherson
` (3 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
Generalize the code in vmx_dirty_log_test.c by adding SVM-specific L1
code, doing some renaming (e.g. EPT -> TDP), and adding setup code for
both SVM and VMX to test_dirty_log().
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/Makefile.kvm | 2 +-
...rty_log_test.c => nested_dirty_log_test.c} | 73 ++++++++++++++-----
2 files changed, 54 insertions(+), 21 deletions(-)
rename tools/testing/selftests/kvm/x86/{vmx_dirty_log_test.c => nested_dirty_log_test.c} (71%)
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index ba5c2b643efa..8f14213ddef1 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -89,6 +89,7 @@ TEST_GEN_PROGS_x86 += x86/kvm_buslock_test
TEST_GEN_PROGS_x86 += x86/monitor_mwait_test
TEST_GEN_PROGS_x86 += x86/msrs_test
TEST_GEN_PROGS_x86 += x86/nested_close_kvm_test
+TEST_GEN_PROGS_x86 += x86/nested_dirty_log_test
TEST_GEN_PROGS_x86 += x86/nested_emulation_test
TEST_GEN_PROGS_x86 += x86/nested_exceptions_test
TEST_GEN_PROGS_x86 += x86/nested_invalid_cr3_test
@@ -115,7 +116,6 @@ TEST_GEN_PROGS_x86 += x86/ucna_injection_test
TEST_GEN_PROGS_x86 += x86/userspace_io_test
TEST_GEN_PROGS_x86 += x86/userspace_msr_exit_test
TEST_GEN_PROGS_x86 += x86/vmx_apic_access_test
-TEST_GEN_PROGS_x86 += x86/vmx_dirty_log_test
TEST_GEN_PROGS_x86 += x86/vmx_exception_with_invalid_guest_state
TEST_GEN_PROGS_x86 += x86/vmx_msrs_test
TEST_GEN_PROGS_x86 += x86/vmx_invalid_nested_guest_state
diff --git a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
similarity index 71%
rename from tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
rename to tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
index 032ab8bf60a4..89d2e86a0db9 100644
--- a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
@@ -12,6 +12,7 @@
#include "test_util.h"
#include "kvm_util.h"
#include "processor.h"
+#include "svm_util.h"
#include "vmx.h"
/* The memory slot index to track dirty pages */
@@ -25,6 +26,8 @@
#define NESTED_TEST_MEM1 0xc0001000
#define NESTED_TEST_MEM2 0xc0002000
+#define L2_GUEST_STACK_SIZE 64
+
static void l2_guest_code(u64 *a, u64 *b)
{
READ_ONCE(*a);
@@ -42,20 +45,19 @@ static void l2_guest_code(u64 *a, u64 *b)
vmcall();
}
-static void l2_guest_code_ept_enabled(void)
+static void l2_guest_code_tdp_enabled(void)
{
l2_guest_code((u64 *)NESTED_TEST_MEM1, (u64 *)NESTED_TEST_MEM2);
}
-static void l2_guest_code_ept_disabled(void)
+static void l2_guest_code_tdp_disabled(void)
{
- /* Access the same L1 GPAs as l2_guest_code_ept_enabled() */
+ /* Access the same L1 GPAs as l2_guest_code_tdp_enabled() */
l2_guest_code((u64 *)GUEST_TEST_MEM, (u64 *)GUEST_TEST_MEM);
}
-void l1_guest_code(struct vmx_pages *vmx)
+void l1_vmx_code(struct vmx_pages *vmx)
{
-#define L2_GUEST_STACK_SIZE 64
unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
void *l2_rip;
@@ -64,22 +66,49 @@ void l1_guest_code(struct vmx_pages *vmx)
GUEST_ASSERT(load_vmcs(vmx));
if (vmx->eptp_gpa)
- l2_rip = l2_guest_code_ept_enabled;
+ l2_rip = l2_guest_code_tdp_enabled;
else
- l2_rip = l2_guest_code_ept_disabled;
+ l2_rip = l2_guest_code_tdp_disabled;
prepare_vmcs(vmx, l2_rip, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
GUEST_SYNC(false);
GUEST_ASSERT(!vmlaunch());
GUEST_SYNC(false);
- GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
+ GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_VMCALL);
GUEST_DONE();
}
-static void test_vmx_dirty_log(bool enable_ept)
+static void l1_svm_code(struct svm_test_data *svm)
{
- vm_vaddr_t vmx_pages_gva = 0;
+ unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+ void *l2_rip;
+
+ if (svm->ncr3_gpa)
+ l2_rip = l2_guest_code_tdp_enabled;
+ else
+ l2_rip = l2_guest_code_tdp_disabled;
+
+ generic_svm_setup(svm, l2_rip, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
+
+ GUEST_SYNC(false);
+ run_guest(svm->vmcb, svm->vmcb_gpa);
+ GUEST_SYNC(false);
+ GUEST_ASSERT_EQ(svm->vmcb->control.exit_code, SVM_EXIT_VMMCALL);
+ GUEST_DONE();
+}
+
+static void l1_guest_code(void *data)
+{
+ if (this_cpu_has(X86_FEATURE_VMX))
+ l1_vmx_code(data);
+ else
+ l1_svm_code(data);
+}
+
+static void test_dirty_log(bool nested_tdp)
+{
+ vm_vaddr_t nested_gva = 0;
unsigned long *bmap;
uint64_t *host_test_mem;
@@ -88,15 +117,19 @@ static void test_vmx_dirty_log(bool enable_ept)
struct ucall uc;
bool done = false;
- pr_info("Nested EPT: %s\n", enable_ept ? "enabled" : "disabled");
+ pr_info("Nested TDP: %s\n", nested_tdp ? "enabled" : "disabled");
/* Create VM */
vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
- if (enable_ept)
+ if (nested_tdp)
vm_enable_tdp(vm);
- vcpu_alloc_vmx(vm, &vmx_pages_gva);
- vcpu_args_set(vcpu, 1, vmx_pages_gva);
+ if (kvm_cpu_has(X86_FEATURE_VMX))
+ vcpu_alloc_vmx(vm, &nested_gva);
+ else
+ vcpu_alloc_svm(vm, &nested_gva);
+
+ vcpu_args_set(vcpu, 1, nested_gva);
/* Add an extra memory slot for testing dirty logging */
vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
@@ -115,10 +148,10 @@ static void test_vmx_dirty_log(bool enable_ept)
* ... pages in the L2 GPA range [0xc0001000, 0xc0003000) will map to
* 0xc0000000.
*
- * When EPT is disabled, the L2 guest code will still access the same L1
- * GPAs as the EPT enabled case.
+ * When TDP is disabled, the L2 guest code will still access the same L1
+ * GPAs as the TDP enabled case.
*/
- if (enable_ept) {
+ if (nested_tdp) {
tdp_identity_map_default_memslots(vm);
tdp_map(vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
tdp_map(vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
@@ -166,12 +199,12 @@ static void test_vmx_dirty_log(bool enable_ept)
int main(int argc, char *argv[])
{
- TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
+ TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX) || kvm_cpu_has(X86_FEATURE_SVM));
- test_vmx_dirty_log(/*enable_ept=*/false);
+ test_dirty_log(/*nested_tdp=*/false);
if (kvm_cpu_has_tdp())
- test_vmx_dirty_log(/*enable_ept=*/true);
+ test_dirty_log(/*nested_tdp=*/true);
return 0;
}
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 19/21] KVM: selftests: Extend memstress to run on nested SVM
2025-12-30 23:01 [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
` (17 preceding siblings ...)
2025-12-30 23:01 ` [PATCH v4 18/21] KVM: selftests: Extend vmx_dirty_log_test to cover SVM Sean Christopherson
@ 2025-12-30 23:01 ` Sean Christopherson
2025-12-30 23:01 ` [PATCH v4 20/21] KVM: selftests: Rename vm_get_page_table_entry() to vm_get_pte() Sean Christopherson
` (2 subsequent siblings)
21 siblings, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
From: Yosry Ahmed <yosry.ahmed@linux.dev>
Add L1 SVM code and generalize the setup code to work for both VMX and
SVM. This allows running 'dirty_log_perf_test -n' on AMD CPUs.
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
.../testing/selftests/kvm/lib/x86/memstress.c | 42 +++++++++++++++----
1 file changed, 35 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
index 407abfc34909..86f4c5e4c430 100644
--- a/tools/testing/selftests/kvm/lib/x86/memstress.c
+++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
@@ -13,6 +13,7 @@
#include "kvm_util.h"
#include "memstress.h"
#include "processor.h"
+#include "svm_util.h"
#include "vmx.h"
void memstress_l2_guest_code(uint64_t vcpu_id)
@@ -29,9 +30,10 @@ __asm__(
" ud2;"
);
-static void memstress_l1_guest_code(struct vmx_pages *vmx, uint64_t vcpu_id)
-{
#define L2_GUEST_STACK_SIZE 64
+
+static void l1_vmx_code(struct vmx_pages *vmx, uint64_t vcpu_id)
+{
unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
unsigned long *rsp;
@@ -45,10 +47,34 @@ static void memstress_l1_guest_code(struct vmx_pages *vmx, uint64_t vcpu_id)
prepare_vmcs(vmx, memstress_l2_guest_entry, rsp);
GUEST_ASSERT(!vmlaunch());
- GUEST_ASSERT(vmreadz(VM_EXIT_REASON) == EXIT_REASON_VMCALL);
+ GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_VMCALL);
GUEST_DONE();
}
+static void l1_svm_code(struct svm_test_data *svm, uint64_t vcpu_id)
+{
+ unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
+ unsigned long *rsp;
+
+
+ rsp = &l2_guest_stack[L2_GUEST_STACK_SIZE - 1];
+ *rsp = vcpu_id;
+ generic_svm_setup(svm, memstress_l2_guest_entry, rsp);
+
+ run_guest(svm->vmcb, svm->vmcb_gpa);
+ GUEST_ASSERT_EQ(svm->vmcb->control.exit_code, SVM_EXIT_VMMCALL);
+ GUEST_DONE();
+}
+
+
+static void memstress_l1_guest_code(void *data, uint64_t vcpu_id)
+{
+ if (this_cpu_has(X86_FEATURE_VMX))
+ l1_vmx_code(data, vcpu_id);
+ else
+ l1_svm_code(data, vcpu_id);
+}
+
uint64_t memstress_nested_pages(int nr_vcpus)
{
/*
@@ -78,15 +104,17 @@ static void memstress_setup_ept_mappings(struct kvm_vm *vm)
void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vcpus[])
{
struct kvm_regs regs;
- vm_vaddr_t vmx_gva;
+ vm_vaddr_t nested_gva;
int vcpu_id;
- TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
TEST_REQUIRE(kvm_cpu_has_tdp());
vm_enable_tdp(vm);
for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
- vcpu_alloc_vmx(vm, &vmx_gva);
+ if (kvm_cpu_has(X86_FEATURE_VMX))
+ vcpu_alloc_vmx(vm, &nested_gva);
+ else
+ vcpu_alloc_svm(vm, &nested_gva);
/* The EPTs are shared across vCPUs, setup the mappings once */
if (vcpu_id == 0)
@@ -99,6 +127,6 @@ void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vc
vcpu_regs_get(vcpus[vcpu_id], ®s);
regs.rip = (unsigned long) memstress_l1_guest_code;
vcpu_regs_set(vcpus[vcpu_id], ®s);
- vcpu_args_set(vcpus[vcpu_id], 2, vmx_gva, vcpu_id);
+ vcpu_args_set(vcpus[vcpu_id], 2, nested_gva, vcpu_id);
}
}
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 20/21] KVM: selftests: Rename vm_get_page_table_entry() to vm_get_pte()
2025-12-30 23:01 [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
` (18 preceding siblings ...)
2025-12-30 23:01 ` [PATCH v4 19/21] KVM: selftests: Extend memstress to run on nested SVM Sean Christopherson
@ 2025-12-30 23:01 ` Sean Christopherson
2026-01-02 17:10 ` Yosry Ahmed
2025-12-30 23:01 ` [PATCH v4 21/21] KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU Sean Christopherson
2026-01-12 17:38 ` [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
21 siblings, 1 reply; 39+ messages in thread
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
Shorten the API for getting a PTE, as the "PTE" acronym is ubiquitous,
and spelling out "page table entry" makes it unnecessarily difficult to
quickly understand what callers are doing.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/include/x86/processor.h | 2 +-
tools/testing/selftests/kvm/lib/x86/processor.c | 2 +-
tools/testing/selftests/kvm/x86/hyperv_tlb_flush.c | 2 +-
.../selftests/kvm/x86/smaller_maxphyaddr_emulation_test.c | 4 +---
4 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 7b7d962244d6..ab29b1c7ed2d 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1357,7 +1357,7 @@ static inline bool kvm_is_ignore_msrs(void)
return get_kvm_param_bool("ignore_msrs");
}
-uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr);
+uint64_t *vm_get_pte(struct kvm_vm *vm, uint64_t vaddr);
uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,
uint64_t a3);
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index 5a3385d48902..ab869a98bbdc 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -390,7 +390,7 @@ static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm,
return virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
}
-uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr)
+uint64_t *vm_get_pte(struct kvm_vm *vm, uint64_t vaddr)
{
int level = PG_LEVEL_4K;
diff --git a/tools/testing/selftests/kvm/x86/hyperv_tlb_flush.c b/tools/testing/selftests/kvm/x86/hyperv_tlb_flush.c
index a3b7ce155981..c542cc4762b1 100644
--- a/tools/testing/selftests/kvm/x86/hyperv_tlb_flush.c
+++ b/tools/testing/selftests/kvm/x86/hyperv_tlb_flush.c
@@ -619,7 +619,7 @@ int main(int argc, char *argv[])
*/
gva = vm_vaddr_unused_gap(vm, NTEST_PAGES * PAGE_SIZE, KVM_UTIL_MIN_VADDR);
for (i = 0; i < NTEST_PAGES; i++) {
- pte = vm_get_page_table_entry(vm, data->test_pages + i * PAGE_SIZE);
+ pte = vm_get_pte(vm, data->test_pages + i * PAGE_SIZE);
gpa = addr_hva2gpa(vm, pte);
virt_pg_map(vm, gva + PAGE_SIZE * i, gpa & PAGE_MASK);
data->test_pages_pte[i] = gva + (gpa & ~PAGE_MASK);
diff --git a/tools/testing/selftests/kvm/x86/smaller_maxphyaddr_emulation_test.c b/tools/testing/selftests/kvm/x86/smaller_maxphyaddr_emulation_test.c
index fabeeaddfb3a..0e8aec568010 100644
--- a/tools/testing/selftests/kvm/x86/smaller_maxphyaddr_emulation_test.c
+++ b/tools/testing/selftests/kvm/x86/smaller_maxphyaddr_emulation_test.c
@@ -47,7 +47,6 @@ int main(int argc, char *argv[])
struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
struct ucall uc;
- uint64_t *pte;
uint64_t *hva;
uint64_t gpa;
int rc;
@@ -73,8 +72,7 @@ int main(int argc, char *argv[])
hva = addr_gpa2hva(vm, MEM_REGION_GPA);
memset(hva, 0, PAGE_SIZE);
- pte = vm_get_page_table_entry(vm, MEM_REGION_GVA);
- *pte |= BIT_ULL(MAXPHYADDR);
+ *vm_get_pte(vm, MEM_REGION_GVA) |= BIT_ULL(MAXPHYADDR);
vcpu_run(vcpu);
--
2.52.0.351.gbe84eed79e-goog
* [PATCH v4 21/21] KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU
2025-12-30 23:01 [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
` (19 preceding siblings ...)
2025-12-30 23:01 ` [PATCH v4 20/21] KVM: selftests: Rename vm_get_page_table_entry() to vm_get_pte() Sean Christopherson
@ 2025-12-30 23:01 ` Sean Christopherson
2026-01-02 17:36 ` Yosry Ahmed
2026-01-12 17:38 ` [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
21 siblings, 1 reply; 39+ messages in thread
From: Sean Christopherson @ 2025-12-30 23:01 UTC (permalink / raw)
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda,
Sean Christopherson
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
Update the nested dirty log test to validate KVM's handling of READ faults
when dirty logging is enabled. Specifically, set the Dirty bit in the
guest PTEs used to map L2 GPAs, so that KVM will create writable SPTEs
when handling L2 read faults. When handling read faults in the shadow MMU,
KVM opportunistically creates a writable SPTE if the mapping can be
writable *and* the gPTE is dirty (or doesn't support the Dirty bit), i.e.
if KVM doesn't need to intercept writes in order to emulate Dirty-bit
updates.
To actually test the L2 READ=>WRITE sequence, e.g. without masking a false
pass by other test activity, route the READ=>WRITE and WRITE=>WRITE
sequences to separate L1 pages, and differentiate between "marked dirty
due to a WRITE access/fault" and "marked dirty due to creating a writable
SPTE for a READ access/fault". The updated sequence exposes the bug fixed
by KVM commit 1f4e5fc83a42 ("KVM: x86: fix nested guest live migration
with PML") when the guest performs a READ=>WRITE sequence.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
.../selftests/kvm/include/x86/processor.h | 1 +
.../testing/selftests/kvm/lib/x86/processor.c | 7 ++
.../selftests/kvm/x86/nested_dirty_log_test.c | 115 +++++++++++++-----
3 files changed, 90 insertions(+), 33 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index ab29b1c7ed2d..8945c9eea704 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1483,6 +1483,7 @@ bool kvm_cpu_has_tdp(void);
void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
void tdp_identity_map_default_memslots(struct kvm_vm *vm);
void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size);
+uint64_t *tdp_get_pte(struct kvm_vm *vm, uint64_t l2_gpa);
/*
* Basic CPU control in CR0
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index ab869a98bbdc..fab18e9be66c 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -390,6 +390,13 @@ static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm,
return virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
}
+uint64_t *tdp_get_pte(struct kvm_vm *vm, uint64_t l2_gpa)
+{
+ int level = PG_LEVEL_4K;
+
+ return __vm_get_page_table_entry(vm, &vm->stage2_mmu, l2_gpa, &level);
+}
+
uint64_t *vm_get_pte(struct kvm_vm *vm, uint64_t vaddr)
{
int level = PG_LEVEL_4K;
diff --git a/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c b/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
index 89d2e86a0db9..1e7c1ed917e1 100644
--- a/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
@@ -17,29 +17,39 @@
/* The memory slot index to track dirty pages */
#define TEST_MEM_SLOT_INDEX 1
-#define TEST_MEM_PAGES 3
+#define TEST_MEM_PAGES 4
/* L1 guest test virtual memory offset */
-#define GUEST_TEST_MEM 0xc0000000
+#define GUEST_TEST_MEM1 0xc0000000
+#define GUEST_TEST_MEM2 0xc0002000
/* L2 guest test virtual memory offset */
#define NESTED_TEST_MEM1 0xc0001000
-#define NESTED_TEST_MEM2 0xc0002000
+#define NESTED_TEST_MEM2 0xc0003000
#define L2_GUEST_STACK_SIZE 64
+#define TEST_SYNC_PAGE_MASK 0xfull
+#define TEST_SYNC_READ_FAULT BIT(4)
+#define TEST_SYNC_WRITE_FAULT BIT(5)
+#define TEST_SYNC_NO_FAULT BIT(6)
+
static void l2_guest_code(u64 *a, u64 *b)
{
READ_ONCE(*a);
+ GUEST_SYNC(0 | TEST_SYNC_READ_FAULT);
WRITE_ONCE(*a, 1);
- GUEST_SYNC(true);
- GUEST_SYNC(false);
+ GUEST_SYNC(0 | TEST_SYNC_WRITE_FAULT);
+ READ_ONCE(*a);
+ GUEST_SYNC(0 | TEST_SYNC_NO_FAULT);
WRITE_ONCE(*b, 1);
- GUEST_SYNC(true);
+ GUEST_SYNC(2 | TEST_SYNC_WRITE_FAULT);
WRITE_ONCE(*b, 1);
- GUEST_SYNC(true);
- GUEST_SYNC(false);
+ GUEST_SYNC(2 | TEST_SYNC_WRITE_FAULT);
+ READ_ONCE(*b);
+ GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
+ GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
/* Exit to L1 and never come back. */
vmcall();
@@ -53,7 +63,7 @@ static void l2_guest_code_tdp_enabled(void)
static void l2_guest_code_tdp_disabled(void)
{
/* Access the same L1 GPAs as l2_guest_code_tdp_enabled() */
- l2_guest_code((u64 *)GUEST_TEST_MEM, (u64 *)GUEST_TEST_MEM);
+ l2_guest_code((u64 *)GUEST_TEST_MEM1, (u64 *)GUEST_TEST_MEM2);
}
void l1_vmx_code(struct vmx_pages *vmx)
@@ -72,9 +82,11 @@ void l1_vmx_code(struct vmx_pages *vmx)
prepare_vmcs(vmx, l2_rip, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
- GUEST_SYNC(false);
+ GUEST_SYNC(0 | TEST_SYNC_NO_FAULT);
+ GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
GUEST_ASSERT(!vmlaunch());
- GUEST_SYNC(false);
+ GUEST_SYNC(0 | TEST_SYNC_NO_FAULT);
+ GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_VMCALL);
GUEST_DONE();
}
@@ -91,9 +103,11 @@ static void l1_svm_code(struct svm_test_data *svm)
generic_svm_setup(svm, l2_rip, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
- GUEST_SYNC(false);
+ GUEST_SYNC(0 | TEST_SYNC_NO_FAULT);
+ GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
run_guest(svm->vmcb, svm->vmcb_gpa);
- GUEST_SYNC(false);
+ GUEST_SYNC(0 | TEST_SYNC_NO_FAULT);
+ GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
GUEST_ASSERT_EQ(svm->vmcb->control.exit_code, SVM_EXIT_VMMCALL);
GUEST_DONE();
}
@@ -106,6 +120,11 @@ static void l1_guest_code(void *data)
l1_svm_code(data);
}
+static uint64_t test_read_host_page(uint64_t *host_test_mem, int page_nr)
+{
+ return host_test_mem[PAGE_SIZE * page_nr / sizeof(*host_test_mem)];
+}
+
static void test_dirty_log(bool nested_tdp)
{
vm_vaddr_t nested_gva = 0;
@@ -133,32 +152,45 @@ static void test_dirty_log(bool nested_tdp)
/* Add an extra memory slot for testing dirty logging */
vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
- GUEST_TEST_MEM,
+ GUEST_TEST_MEM1,
TEST_MEM_SLOT_INDEX,
TEST_MEM_PAGES,
KVM_MEM_LOG_DIRTY_PAGES);
/*
- * Add an identity map for GVA range [0xc0000000, 0xc0002000). This
+ * Add an identity map for GVA range [0xc0000000, 0xc0004000). This
* affects both L1 and L2. However...
*/
- virt_map(vm, GUEST_TEST_MEM, GUEST_TEST_MEM, TEST_MEM_PAGES);
+ virt_map(vm, GUEST_TEST_MEM1, GUEST_TEST_MEM1, TEST_MEM_PAGES);
/*
- * ... pages in the L2 GPA range [0xc0001000, 0xc0003000) will map to
- * 0xc0000000.
+ * ... pages in the L2 GPA ranges [0xc0001000, 0xc0002000) and
+ * [0xc0003000, 0xc0004000) will map to 0xc0000000 and 0xc0001000
+ * respectively.
*
* When TDP is disabled, the L2 guest code will still access the same L1
* GPAs as the TDP enabled case.
+ *
+ * Set the Dirty bit in the PTEs used by L2 so that KVM will create
+ * writable SPTEs when handling read faults (if the Dirty bit isn't
+ * set, KVM must intercept the next write to emulate the Dirty bit
+ * update).
*/
if (nested_tdp) {
tdp_identity_map_default_memslots(vm);
- tdp_map(vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
- tdp_map(vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
+ tdp_map(vm, NESTED_TEST_MEM1, GUEST_TEST_MEM1, PAGE_SIZE);
+ tdp_map(vm, NESTED_TEST_MEM2, GUEST_TEST_MEM2, PAGE_SIZE);
+
+
+ *tdp_get_pte(vm, NESTED_TEST_MEM1) |= PTE_DIRTY_MASK(&vm->stage2_mmu);
+ *tdp_get_pte(vm, NESTED_TEST_MEM2) |= PTE_DIRTY_MASK(&vm->stage2_mmu);
+ } else {
+ *vm_get_pte(vm, GUEST_TEST_MEM1) |= PTE_DIRTY_MASK(&vm->mmu);
+ *vm_get_pte(vm, GUEST_TEST_MEM2) |= PTE_DIRTY_MASK(&vm->mmu);
}
bmap = bitmap_zalloc(TEST_MEM_PAGES);
- host_test_mem = addr_gpa2hva(vm, GUEST_TEST_MEM);
+ host_test_mem = addr_gpa2hva(vm, GUEST_TEST_MEM1);
while (!done) {
memset(host_test_mem, 0xaa, TEST_MEM_PAGES * PAGE_SIZE);
@@ -169,25 +201,42 @@ static void test_dirty_log(bool nested_tdp)
case UCALL_ABORT:
REPORT_GUEST_ASSERT(uc);
/* NOT REACHED */
- case UCALL_SYNC:
+ case UCALL_SYNC: {
+ int page_nr = uc.args[1] & TEST_SYNC_PAGE_MASK;
+ int i;
+
/*
* The nested guest wrote at offset 0x1000 in the memslot, but the
* dirty bitmap must be filled in according to L1 GPA, not L2.
*/
kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap);
- if (uc.args[1]) {
- TEST_ASSERT(test_bit(0, bmap), "Page 0 incorrectly reported clean");
- TEST_ASSERT(host_test_mem[0] == 1, "Page 0 not written by guest");
- } else {
- TEST_ASSERT(!test_bit(0, bmap), "Page 0 incorrectly reported dirty");
- TEST_ASSERT(host_test_mem[0] == 0xaaaaaaaaaaaaaaaaULL, "Page 0 written by guest");
+
+ /*
+ * If a fault is expected, the page should be dirty
+ * as the Dirty bit is set in the gPTE. KVM should
+ * create a writable SPTE even on a read fault, *and*
+ * KVM must mark the GFN as dirty when doing so.
+ */
+ TEST_ASSERT(test_bit(page_nr, bmap) == !(uc.args[1] & TEST_SYNC_NO_FAULT),
+ "Page %u incorrectly reported %s on %s fault", page_nr,
+ test_bit(page_nr, bmap) ? "dirty" : "clean",
+ uc.args[1] & TEST_SYNC_NO_FAULT ? "no" :
+ uc.args[1] & TEST_SYNC_READ_FAULT ? "read" : "write");
+
+ for (i = 0; i < TEST_MEM_PAGES; i++) {
+ if (i == page_nr && uc.args[1] & TEST_SYNC_WRITE_FAULT)
+ TEST_ASSERT(test_read_host_page(host_test_mem, i) == 1,
+ "Page %u not written by guest", i);
+ else
+ TEST_ASSERT(test_read_host_page(host_test_mem, i) == 0xaaaaaaaaaaaaaaaaULL,
+ "Page %u written by guest", i);
+
+ if (i != page_nr)
+ TEST_ASSERT(!test_bit(i, bmap),
+ "Page %u incorrectly reported dirty", i);
}
-
- TEST_ASSERT(!test_bit(1, bmap), "Page 1 incorrectly reported dirty");
- TEST_ASSERT(host_test_mem[PAGE_SIZE / 8] == 0xaaaaaaaaaaaaaaaaULL, "Page 1 written by guest");
- TEST_ASSERT(!test_bit(2, bmap), "Page 2 incorrectly reported dirty");
- TEST_ASSERT(host_test_mem[PAGE_SIZE*2 / 8] == 0xaaaaaaaaaaaaaaaaULL, "Page 2 written by guest");
break;
+ }
case UCALL_DONE:
done = true;
break;
--
2.52.0.351.gbe84eed79e-goog
* Re: [PATCH v4 06/21] KVM: selftests: Add "struct kvm_mmu" to track a given MMU instance
2025-12-30 23:01 ` [PATCH v4 06/21] KVM: selftests: Add "struct kvm_mmu" to track a given MMU instance Sean Christopherson
@ 2026-01-02 16:50 ` Yosry Ahmed
0 siblings, 0 replies; 39+ messages in thread
From: Yosry Ahmed @ 2026-01-02 16:50 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel
On Tue, Dec 30, 2025 at 03:01:35PM -0800, Sean Christopherson wrote:
> Add a "struct kvm_mmu" to track a given MMU instance, e.g. a VM's stage-1
> MMU versus a VM's stage-2 MMU, so that x86 can share MMU functionality for
> both stage-1 and stage-2 MMUs, without creating the potential for subtle
> bugs, e.g. due to consuming vm->pgtable_levels when operating on a stage-2
> MMU.
>
> Encapsulate the existing de facto MMU in "struct kvm_vm", e.g. instead of
> burying the MMU details in "struct kvm_vm_arch", to avoid more #ifdefs in
> ____vm_create(), and in the hopes that other architectures can utilize the
> formalized MMU structure if/when they too support stage-2 page tables.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
> .../testing/selftests/kvm/include/kvm_util.h | 11 ++++--
> .../selftests/kvm/lib/arm64/processor.c | 38 +++++++++----------
> tools/testing/selftests/kvm/lib/kvm_util.c | 28 +++++++-------
> .../selftests/kvm/lib/loongarch/processor.c | 28 +++++++-------
> .../selftests/kvm/lib/riscv/processor.c | 31 +++++++--------
> .../selftests/kvm/lib/s390/processor.c | 16 ++++----
> .../testing/selftests/kvm/lib/x86/processor.c | 28 +++++++-------
> .../kvm/x86/vmx_nested_la57_state_test.c | 2 +-
> 8 files changed, 94 insertions(+), 88 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index 81f4355ff28a..39558c05c0bf 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -88,12 +88,17 @@ enum kvm_mem_region_type {
> NR_MEM_REGIONS,
> };
>
> +struct kvm_mmu {
> + bool pgd_created;
> + uint64_t pgd;
> + int pgtable_levels;
> +};
> +
> struct kvm_vm {
> int mode;
> unsigned long type;
> int kvm_fd;
> int fd;
> - unsigned int pgtable_levels;
> unsigned int page_size;
> unsigned int page_shift;
> unsigned int pa_bits;
> @@ -104,13 +109,13 @@ struct kvm_vm {
> struct sparsebit *vpages_valid;
> struct sparsebit *vpages_mapped;
> bool has_irqchip;
> - bool pgd_created;
> vm_paddr_t ucall_mmio_addr;
> - vm_paddr_t pgd;
> vm_vaddr_t handlers;
> uint32_t dirty_ring_size;
> uint64_t gpa_tag_mask;
>
> + struct kvm_mmu mmu;
> +
> struct kvm_vm_arch arch;
>
> struct kvm_binary_stats stats;
> diff --git a/tools/testing/selftests/kvm/lib/arm64/processor.c b/tools/testing/selftests/kvm/lib/arm64/processor.c
> index d46e4b13b92c..c40f59d48311 100644
> --- a/tools/testing/selftests/kvm/lib/arm64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/arm64/processor.c
> @@ -28,7 +28,7 @@ static uint64_t page_align(struct kvm_vm *vm, uint64_t v)
>
> static uint64_t pgd_index(struct kvm_vm *vm, vm_vaddr_t gva)
> {
> - unsigned int shift = (vm->pgtable_levels - 1) * (vm->page_shift - 3) + vm->page_shift;
> + unsigned int shift = (vm->mmu.pgtable_levels - 1) * (vm->page_shift - 3) + vm->page_shift;
> uint64_t mask = (1UL << (vm->va_bits - shift)) - 1;
>
> return (gva >> shift) & mask;
> @@ -39,7 +39,7 @@ static uint64_t pud_index(struct kvm_vm *vm, vm_vaddr_t gva)
> unsigned int shift = 2 * (vm->page_shift - 3) + vm->page_shift;
> uint64_t mask = (1UL << (vm->page_shift - 3)) - 1;
>
> - TEST_ASSERT(vm->pgtable_levels == 4,
> + TEST_ASSERT(vm->mmu.pgtable_levels == 4,
> "Mode %d does not have 4 page table levels", vm->mode);
>
> return (gva >> shift) & mask;
> @@ -50,7 +50,7 @@ static uint64_t pmd_index(struct kvm_vm *vm, vm_vaddr_t gva)
> unsigned int shift = (vm->page_shift - 3) + vm->page_shift;
> uint64_t mask = (1UL << (vm->page_shift - 3)) - 1;
>
> - TEST_ASSERT(vm->pgtable_levels >= 3,
> + TEST_ASSERT(vm->mmu.pgtable_levels >= 3,
> "Mode %d does not have >= 3 page table levels", vm->mode);
>
> return (gva >> shift) & mask;
> @@ -104,7 +104,7 @@ static uint64_t pte_addr(struct kvm_vm *vm, uint64_t pte)
>
> static uint64_t ptrs_per_pgd(struct kvm_vm *vm)
> {
> - unsigned int shift = (vm->pgtable_levels - 1) * (vm->page_shift - 3) + vm->page_shift;
> + unsigned int shift = (vm->mmu.pgtable_levels - 1) * (vm->page_shift - 3) + vm->page_shift;
> return 1 << (vm->va_bits - shift);
> }
>
> @@ -117,13 +117,13 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
> {
> size_t nr_pages = page_align(vm, ptrs_per_pgd(vm) * 8) / vm->page_size;
>
> - if (vm->pgd_created)
> + if (vm->mmu.pgd_created)
> return;
>
> - vm->pgd = vm_phy_pages_alloc(vm, nr_pages,
> - KVM_GUEST_PAGE_TABLE_MIN_PADDR,
> - vm->memslots[MEM_REGION_PT]);
> - vm->pgd_created = true;
> + vm->mmu.pgd = vm_phy_pages_alloc(vm, nr_pages,
> + KVM_GUEST_PAGE_TABLE_MIN_PADDR,
> + vm->memslots[MEM_REGION_PT]);
> + vm->mmu.pgd_created = true;
> }
>
> static void _virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> @@ -147,12 +147,12 @@ static void _virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> " paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
> paddr, vm->max_gfn, vm->page_size);
>
> - ptep = addr_gpa2hva(vm, vm->pgd) + pgd_index(vm, vaddr) * 8;
> + ptep = addr_gpa2hva(vm, vm->mmu.pgd) + pgd_index(vm, vaddr) * 8;
> if (!*ptep)
> *ptep = addr_pte(vm, vm_alloc_page_table(vm),
> PGD_TYPE_TABLE | PTE_VALID);
>
> - switch (vm->pgtable_levels) {
> + switch (vm->mmu.pgtable_levels) {
> case 4:
> ptep = addr_gpa2hva(vm, pte_addr(vm, *ptep)) + pud_index(vm, vaddr) * 8;
> if (!*ptep)
> @@ -190,16 +190,16 @@ uint64_t *virt_get_pte_hva_at_level(struct kvm_vm *vm, vm_vaddr_t gva, int level
> {
> uint64_t *ptep;
>
> - if (!vm->pgd_created)
> + if (!vm->mmu.pgd_created)
> goto unmapped_gva;
>
> - ptep = addr_gpa2hva(vm, vm->pgd) + pgd_index(vm, gva) * 8;
> + ptep = addr_gpa2hva(vm, vm->mmu.pgd) + pgd_index(vm, gva) * 8;
> if (!ptep)
> goto unmapped_gva;
> if (level == 0)
> return ptep;
>
> - switch (vm->pgtable_levels) {
> + switch (vm->mmu.pgtable_levels) {
> case 4:
> ptep = addr_gpa2hva(vm, pte_addr(vm, *ptep)) + pud_index(vm, gva) * 8;
> if (!ptep)
> @@ -263,13 +263,13 @@ static void pte_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent, uint64_t p
>
> void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
> {
> - int level = 4 - (vm->pgtable_levels - 1);
> + int level = 4 - (vm->mmu.pgtable_levels - 1);
> uint64_t pgd, *ptep;
>
> - if (!vm->pgd_created)
> + if (!vm->mmu.pgd_created)
> return;
>
> - for (pgd = vm->pgd; pgd < vm->pgd + ptrs_per_pgd(vm) * 8; pgd += 8) {
> + for (pgd = vm->mmu.pgd; pgd < vm->mmu.pgd + ptrs_per_pgd(vm) * 8; pgd += 8) {
> ptep = addr_gpa2hva(vm, pgd);
> if (!*ptep)
> continue;
> @@ -350,7 +350,7 @@ void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
> TEST_FAIL("Unknown guest mode, mode: 0x%x", vm->mode);
> }
>
> - ttbr0_el1 = vm->pgd & GENMASK(47, vm->page_shift);
> + ttbr0_el1 = vm->mmu.pgd & GENMASK(47, vm->page_shift);
>
> /* Configure output size */
> switch (vm->mode) {
> @@ -358,7 +358,7 @@ void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
> case VM_MODE_P52V48_16K:
> case VM_MODE_P52V48_64K:
> tcr_el1 |= TCR_IPS_52_BITS;
> - ttbr0_el1 |= FIELD_GET(GENMASK(51, 48), vm->pgd) << 2;
> + ttbr0_el1 |= FIELD_GET(GENMASK(51, 48), vm->mmu.pgd) << 2;
> break;
> case VM_MODE_P48V48_4K:
> case VM_MODE_P48V48_16K:
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 8279b6ced8d2..65752daeed90 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -281,34 +281,34 @@ struct kvm_vm *____vm_create(struct vm_shape shape)
> /* Setup mode specific traits. */
> switch (vm->mode) {
> case VM_MODE_P52V48_4K:
> - vm->pgtable_levels = 4;
> + vm->mmu.pgtable_levels = 4;
> break;
> case VM_MODE_P52V48_64K:
> - vm->pgtable_levels = 3;
> + vm->mmu.pgtable_levels = 3;
> break;
> case VM_MODE_P48V48_4K:
> - vm->pgtable_levels = 4;
> + vm->mmu.pgtable_levels = 4;
> break;
> case VM_MODE_P48V48_64K:
> - vm->pgtable_levels = 3;
> + vm->mmu.pgtable_levels = 3;
> break;
> case VM_MODE_P40V48_4K:
> case VM_MODE_P36V48_4K:
> - vm->pgtable_levels = 4;
> + vm->mmu.pgtable_levels = 4;
> break;
> case VM_MODE_P40V48_64K:
> case VM_MODE_P36V48_64K:
> - vm->pgtable_levels = 3;
> + vm->mmu.pgtable_levels = 3;
> break;
> case VM_MODE_P52V48_16K:
> case VM_MODE_P48V48_16K:
> case VM_MODE_P40V48_16K:
> case VM_MODE_P36V48_16K:
> - vm->pgtable_levels = 4;
> + vm->mmu.pgtable_levels = 4;
> break;
> case VM_MODE_P47V47_16K:
> case VM_MODE_P36V47_16K:
> - vm->pgtable_levels = 3;
> + vm->mmu.pgtable_levels = 3;
> break;
> case VM_MODE_PXXVYY_4K:
> #ifdef __x86_64__
> @@ -321,22 +321,22 @@ struct kvm_vm *____vm_create(struct vm_shape shape)
> vm->va_bits);
>
> if (vm->va_bits == 57) {
> - vm->pgtable_levels = 5;
> + vm->mmu.pgtable_levels = 5;
> } else {
> TEST_ASSERT(vm->va_bits == 48,
> "Unexpected guest virtual address width: %d",
> vm->va_bits);
> - vm->pgtable_levels = 4;
> + vm->mmu.pgtable_levels = 4;
> }
> #else
> TEST_FAIL("VM_MODE_PXXVYY_4K not supported on non-x86 platforms");
> #endif
> break;
> case VM_MODE_P47V64_4K:
> - vm->pgtable_levels = 5;
> + vm->mmu.pgtable_levels = 5;
> break;
> case VM_MODE_P44V64_4K:
> - vm->pgtable_levels = 5;
> + vm->mmu.pgtable_levels = 5;
> break;
> default:
> TEST_FAIL("Unknown guest mode: 0x%x", vm->mode);
> @@ -1956,8 +1956,8 @@ void vm_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
> fprintf(stream, "%*sMapped Virtual Pages:\n", indent, "");
> sparsebit_dump(stream, vm->vpages_mapped, indent + 2);
> fprintf(stream, "%*spgd_created: %u\n", indent, "",
> - vm->pgd_created);
> - if (vm->pgd_created) {
> + vm->mmu.pgd_created);
> + if (vm->mmu.pgd_created) {
> fprintf(stream, "%*sVirtual Translation Tables:\n",
> indent + 2, "");
> virt_dump(stream, vm, indent + 4);
> diff --git a/tools/testing/selftests/kvm/lib/loongarch/processor.c b/tools/testing/selftests/kvm/lib/loongarch/processor.c
> index 07c103369ddb..17aa55a2047a 100644
> --- a/tools/testing/selftests/kvm/lib/loongarch/processor.c
> +++ b/tools/testing/selftests/kvm/lib/loongarch/processor.c
> @@ -50,11 +50,11 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
> int i;
> vm_paddr_t child, table;
>
> - if (vm->pgd_created)
> + if (vm->mmu.pgd_created)
> return;
>
> child = table = 0;
> - for (i = 0; i < vm->pgtable_levels; i++) {
> + for (i = 0; i < vm->mmu.pgtable_levels; i++) {
> invalid_pgtable[i] = child;
> table = vm_phy_page_alloc(vm, LOONGARCH_PAGE_TABLE_PHYS_MIN,
> vm->memslots[MEM_REGION_PT]);
> @@ -62,8 +62,8 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
> virt_set_pgtable(vm, table, child);
> child = table;
> }
> - vm->pgd = table;
> - vm->pgd_created = true;
> + vm->mmu.pgd = table;
> + vm->mmu.pgd_created = true;
> }
>
> static int virt_pte_none(uint64_t *ptep, int level)
> @@ -77,11 +77,11 @@ static uint64_t *virt_populate_pte(struct kvm_vm *vm, vm_vaddr_t gva, int alloc)
> uint64_t *ptep;
> vm_paddr_t child;
>
> - if (!vm->pgd_created)
> + if (!vm->mmu.pgd_created)
> goto unmapped_gva;
>
> - child = vm->pgd;
> - level = vm->pgtable_levels - 1;
> + child = vm->mmu.pgd;
> + level = vm->mmu.pgtable_levels - 1;
> while (level > 0) {
> ptep = addr_gpa2hva(vm, child) + virt_pte_index(vm, gva, level) * 8;
> if (virt_pte_none(ptep, level)) {
> @@ -161,11 +161,11 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
> {
> int level;
>
> - if (!vm->pgd_created)
> + if (!vm->mmu.pgd_created)
> return;
>
> - level = vm->pgtable_levels - 1;
> - pte_dump(stream, vm, indent, vm->pgd, level);
> + level = vm->mmu.pgtable_levels - 1;
> + pte_dump(stream, vm, indent, vm->mmu.pgd, level);
> }
>
> void vcpu_arch_dump(FILE *stream, struct kvm_vcpu *vcpu, uint8_t indent)
> @@ -297,7 +297,7 @@ static void loongarch_vcpu_setup(struct kvm_vcpu *vcpu)
>
> width = vm->page_shift - 3;
>
> - switch (vm->pgtable_levels) {
> + switch (vm->mmu.pgtable_levels) {
> case 4:
> /* pud page shift and width */
> val = (vm->page_shift + width * 2) << 20 | (width << 25);
> @@ -309,15 +309,15 @@ static void loongarch_vcpu_setup(struct kvm_vcpu *vcpu)
> val |= vm->page_shift | width << 5;
> break;
> default:
> - TEST_FAIL("Got %u page table levels, expected 3 or 4", vm->pgtable_levels);
> + TEST_FAIL("Got %u page table levels, expected 3 or 4", vm->mmu.pgtable_levels);
> }
>
> loongarch_set_csr(vcpu, LOONGARCH_CSR_PWCTL0, val);
>
> /* PGD page shift and width */
> - val = (vm->page_shift + width * (vm->pgtable_levels - 1)) | width << 6;
> + val = (vm->page_shift + width * (vm->mmu.pgtable_levels - 1)) | width << 6;
> loongarch_set_csr(vcpu, LOONGARCH_CSR_PWCTL1, val);
> - loongarch_set_csr(vcpu, LOONGARCH_CSR_PGDL, vm->pgd);
> + loongarch_set_csr(vcpu, LOONGARCH_CSR_PGDL, vm->mmu.pgd);
>
> /*
> * Refill exception runs on real mode
> diff --git a/tools/testing/selftests/kvm/lib/riscv/processor.c b/tools/testing/selftests/kvm/lib/riscv/processor.c
> index 2eac7d4b59e9..e6ec7c224fc3 100644
> --- a/tools/testing/selftests/kvm/lib/riscv/processor.c
> +++ b/tools/testing/selftests/kvm/lib/riscv/processor.c
> @@ -60,7 +60,7 @@ static uint64_t pte_index(struct kvm_vm *vm, vm_vaddr_t gva, int level)
> {
> TEST_ASSERT(level > -1,
> "Negative page table level (%d) not possible", level);
> - TEST_ASSERT(level < vm->pgtable_levels,
> + TEST_ASSERT(level < vm->mmu.pgtable_levels,
> "Invalid page table level (%d)", level);
>
> return (gva & pte_index_mask[level]) >> pte_index_shift[level];
> @@ -70,19 +70,19 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
> {
> size_t nr_pages = page_align(vm, ptrs_per_pte(vm) * 8) / vm->page_size;
>
> - if (vm->pgd_created)
> + if (vm->mmu.pgd_created)
> return;
>
> - vm->pgd = vm_phy_pages_alloc(vm, nr_pages,
> - KVM_GUEST_PAGE_TABLE_MIN_PADDR,
> - vm->memslots[MEM_REGION_PT]);
> - vm->pgd_created = true;
> + vm->mmu.pgd = vm_phy_pages_alloc(vm, nr_pages,
> + KVM_GUEST_PAGE_TABLE_MIN_PADDR,
> + vm->memslots[MEM_REGION_PT]);
> + vm->mmu.pgd_created = true;
> }
>
> void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
> {
> uint64_t *ptep, next_ppn;
> - int level = vm->pgtable_levels - 1;
> + int level = vm->mmu.pgtable_levels - 1;
>
> TEST_ASSERT((vaddr % vm->page_size) == 0,
> "Virtual address not on page boundary,\n"
> @@ -98,7 +98,7 @@ void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
> " paddr: 0x%lx vm->max_gfn: 0x%lx vm->page_size: 0x%x",
> paddr, vm->max_gfn, vm->page_size);
>
> - ptep = addr_gpa2hva(vm, vm->pgd) + pte_index(vm, vaddr, level) * 8;
> + ptep = addr_gpa2hva(vm, vm->mmu.pgd) + pte_index(vm, vaddr, level) * 8;
> if (!*ptep) {
> next_ppn = vm_alloc_page_table(vm) >> PGTBL_PAGE_SIZE_SHIFT;
> *ptep = (next_ppn << PGTBL_PTE_ADDR_SHIFT) |
> @@ -126,12 +126,12 @@ void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
> vm_paddr_t addr_arch_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva)
> {
> uint64_t *ptep;
> - int level = vm->pgtable_levels - 1;
> + int level = vm->mmu.pgtable_levels - 1;
>
> - if (!vm->pgd_created)
> + if (!vm->mmu.pgd_created)
> goto unmapped_gva;
>
> - ptep = addr_gpa2hva(vm, vm->pgd) + pte_index(vm, gva, level) * 8;
> + ptep = addr_gpa2hva(vm, vm->mmu.pgd) + pte_index(vm, gva, level) * 8;
> if (!ptep)
> goto unmapped_gva;
> level--;
> @@ -176,13 +176,14 @@ static void pte_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent,
>
> void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
> {
> - int level = vm->pgtable_levels - 1;
> + struct kvm_mmu *mmu = &vm->mmu;
> + int level = mmu->pgtable_levels - 1;
> uint64_t pgd, *ptep;
>
> - if (!vm->pgd_created)
> + if (!mmu->pgd_created)
> return;
>
> - for (pgd = vm->pgd; pgd < vm->pgd + ptrs_per_pte(vm) * 8; pgd += 8) {
> + for (pgd = mmu->pgd; pgd < mmu->pgd + ptrs_per_pte(vm) * 8; pgd += 8) {
> ptep = addr_gpa2hva(vm, pgd);
> if (!*ptep)
> continue;
> @@ -211,7 +212,7 @@ void riscv_vcpu_mmu_setup(struct kvm_vcpu *vcpu)
> TEST_FAIL("Unknown guest mode, mode: 0x%x", vm->mode);
> }
>
> - satp = (vm->pgd >> PGTBL_PAGE_SIZE_SHIFT) & SATP_PPN;
> + satp = (vm->mmu.pgd >> PGTBL_PAGE_SIZE_SHIFT) & SATP_PPN;
> satp |= SATP_MODE_48;
>
> vcpu_set_reg(vcpu, RISCV_GENERAL_CSR_REG(satp), satp);
> diff --git a/tools/testing/selftests/kvm/lib/s390/processor.c b/tools/testing/selftests/kvm/lib/s390/processor.c
> index 8ceeb17c819a..6a9a660413a7 100644
> --- a/tools/testing/selftests/kvm/lib/s390/processor.c
> +++ b/tools/testing/selftests/kvm/lib/s390/processor.c
> @@ -17,7 +17,7 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
> TEST_ASSERT(vm->page_size == PAGE_SIZE, "Unsupported page size: 0x%x",
> vm->page_size);
>
> - if (vm->pgd_created)
> + if (vm->mmu.pgd_created)
> return;
>
> paddr = vm_phy_pages_alloc(vm, PAGES_PER_REGION,
> @@ -25,8 +25,8 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
> vm->memslots[MEM_REGION_PT]);
> memset(addr_gpa2hva(vm, paddr), 0xff, PAGES_PER_REGION * vm->page_size);
>
> - vm->pgd = paddr;
> - vm->pgd_created = true;
> + vm->mmu.pgd = paddr;
> + vm->mmu.pgd_created = true;
> }
>
> /*
> @@ -70,7 +70,7 @@ void virt_arch_pg_map(struct kvm_vm *vm, uint64_t gva, uint64_t gpa)
> gva, vm->max_gfn, vm->page_size);
>
> /* Walk through region and segment tables */
> - entry = addr_gpa2hva(vm, vm->pgd);
> + entry = addr_gpa2hva(vm, vm->mmu.pgd);
> for (ri = 1; ri <= 4; ri++) {
> idx = (gva >> (64 - 11 * ri)) & 0x7ffu;
> if (entry[idx] & REGION_ENTRY_INVALID)
> @@ -94,7 +94,7 @@ vm_paddr_t addr_arch_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva)
> TEST_ASSERT(vm->page_size == PAGE_SIZE, "Unsupported page size: 0x%x",
> vm->page_size);
>
> - entry = addr_gpa2hva(vm, vm->pgd);
> + entry = addr_gpa2hva(vm, vm->mmu.pgd);
> for (ri = 1; ri <= 4; ri++) {
> idx = (gva >> (64 - 11 * ri)) & 0x7ffu;
> TEST_ASSERT(!(entry[idx] & REGION_ENTRY_INVALID),
> @@ -149,10 +149,10 @@ static void virt_dump_region(FILE *stream, struct kvm_vm *vm, uint8_t indent,
>
> void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
> {
> - if (!vm->pgd_created)
> + if (!vm->mmu.pgd_created)
> return;
>
> - virt_dump_region(stream, vm, indent, vm->pgd);
> + virt_dump_region(stream, vm, indent, vm->mmu.pgd);
> }
>
> void vcpu_arch_set_entry_point(struct kvm_vcpu *vcpu, void *guest_code)
> @@ -184,7 +184,7 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id)
>
> vcpu_sregs_get(vcpu, &sregs);
> sregs.crs[0] |= 0x00040000; /* Enable floating point regs */
> - sregs.crs[1] = vm->pgd | 0xf; /* Primary region table */
> + sregs.crs[1] = vm->mmu.pgd | 0xf; /* Primary region table */
> vcpu_sregs_set(vcpu, &sregs);
>
> vcpu->run->psw_mask = 0x0400000180000000ULL; /* DAT enabled + 64 bit mode */
> diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
> index c14bf2b5f28f..f027f86d1535 100644
> --- a/tools/testing/selftests/kvm/lib/x86/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86/processor.c
> @@ -162,9 +162,9 @@ void virt_arch_pgd_alloc(struct kvm_vm *vm)
> "Unknown or unsupported guest mode: 0x%x", vm->mode);
>
> /* If needed, create the top-level page table. */
> - if (!vm->pgd_created) {
> - vm->pgd = vm_alloc_page_table(vm);
> - vm->pgd_created = true;
> + if (!vm->mmu.pgd_created) {
> + vm->mmu.pgd = vm_alloc_page_table(vm);
> + vm->mmu.pgd_created = true;
> }
> }
>
> @@ -175,7 +175,7 @@ static void *virt_get_pte(struct kvm_vm *vm, uint64_t *parent_pte,
> uint64_t *page_table = addr_gpa2hva(vm, pt_gpa);
> int index = (vaddr >> PG_LEVEL_SHIFT(level)) & 0x1ffu;
>
> - TEST_ASSERT((*parent_pte & PTE_PRESENT_MASK) || parent_pte == &vm->pgd,
> + TEST_ASSERT((*parent_pte & PTE_PRESENT_MASK) || parent_pte == &vm->mmu.pgd,
> "Parent PTE (level %d) not PRESENT for gva: 0x%08lx",
> level + 1, vaddr);
>
> @@ -218,7 +218,7 @@ static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
> void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
> {
> const uint64_t pg_size = PG_LEVEL_SIZE(level);
> - uint64_t *pte = &vm->pgd;
> + uint64_t *pte = &vm->mmu.pgd;
> int current_level;
>
> TEST_ASSERT(vm->mode == VM_MODE_PXXVYY_4K,
> @@ -243,7 +243,7 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
> * Allocate upper level page tables, if not already present. Return
> * early if a hugepage was created.
> */
> - for (current_level = vm->pgtable_levels;
> + for (current_level = vm->mmu.pgtable_levels;
> current_level > PG_LEVEL_4K;
> current_level--) {
> pte = virt_create_upper_pte(vm, pte, vaddr, paddr,
> @@ -309,14 +309,14 @@ static bool vm_is_target_pte(uint64_t *pte, int *level, int current_level)
> static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
> int *level)
> {
> - int va_width = 12 + (vm->pgtable_levels) * 9;
> - uint64_t *pte = &vm->pgd;
> + int va_width = 12 + (vm->mmu.pgtable_levels) * 9;
> + uint64_t *pte = &vm->mmu.pgd;
> int current_level;
>
> TEST_ASSERT(!vm->arch.is_pt_protected,
> "Walking page tables of protected guests is impossible");
>
> - TEST_ASSERT(*level >= PG_LEVEL_NONE && *level <= vm->pgtable_levels,
> + TEST_ASSERT(*level >= PG_LEVEL_NONE && *level <= vm->mmu.pgtable_levels,
> "Invalid PG_LEVEL_* '%d'", *level);
>
> TEST_ASSERT(vm->mode == VM_MODE_PXXVYY_4K,
> @@ -332,7 +332,7 @@ static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
> (((int64_t)vaddr << (64 - va_width) >> (64 - va_width))),
> "Canonical check failed. The virtual address is invalid.");
>
> - for (current_level = vm->pgtable_levels;
> + for (current_level = vm->mmu.pgtable_levels;
> current_level > PG_LEVEL_4K;
> current_level--) {
> pte = virt_get_pte(vm, pte, vaddr, current_level);
> @@ -357,7 +357,7 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
> uint64_t *pde, *pde_start;
> uint64_t *pte, *pte_start;
>
> - if (!vm->pgd_created)
> + if (!vm->mmu.pgd_created)
> return;
>
> fprintf(stream, "%*s "
> @@ -365,7 +365,7 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
> fprintf(stream, "%*s index hvaddr gpaddr "
> "addr w exec dirty\n",
> indent, "");
> - pml4e_start = (uint64_t *) addr_gpa2hva(vm, vm->pgd);
> + pml4e_start = (uint64_t *) addr_gpa2hva(vm, vm->mmu.pgd);
> for (uint16_t n1 = 0; n1 <= 0x1ffu; n1++) {
> pml4e = &pml4e_start[n1];
> if (!(*pml4e & PTE_PRESENT_MASK))
> @@ -538,7 +538,7 @@ static void vcpu_init_sregs(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
> sregs.cr4 |= X86_CR4_PAE | X86_CR4_OSFXSR;
> if (kvm_cpu_has(X86_FEATURE_XSAVE))
> sregs.cr4 |= X86_CR4_OSXSAVE;
> - if (vm->pgtable_levels == 5)
> + if (vm->mmu.pgtable_levels == 5)
> sregs.cr4 |= X86_CR4_LA57;
> sregs.efer |= (EFER_LME | EFER_LMA | EFER_NX);
>
> @@ -549,7 +549,7 @@ static void vcpu_init_sregs(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
> kvm_seg_set_kernel_data_64bit(&sregs.gs);
> kvm_seg_set_tss_64bit(vm->arch.tss, &sregs.tr);
>
> - sregs.cr3 = vm->pgd;
> + sregs.cr3 = vm->mmu.pgd;
> vcpu_sregs_set(vcpu, &sregs);
> }
>
> diff --git a/tools/testing/selftests/kvm/x86/vmx_nested_la57_state_test.c b/tools/testing/selftests/kvm/x86/vmx_nested_la57_state_test.c
> index cf1d2d1f2a8f..915c42001dba 100644
> --- a/tools/testing/selftests/kvm/x86/vmx_nested_la57_state_test.c
> +++ b/tools/testing/selftests/kvm/x86/vmx_nested_la57_state_test.c
> @@ -90,7 +90,7 @@ int main(int argc, char *argv[])
> * L1 needs to read its own PML5 table to set up L2. Identity map
> * the PML5 table to facilitate this.
> */
> - virt_map(vm, vm->pgd, vm->pgd, 1);
> + virt_map(vm, vm->mmu.pgd, vm->mmu.pgd, 1);
>
> vcpu_alloc_vmx(vm, &vmx_pages_gva);
> vcpu_args_set(vcpu, 1, vmx_pages_gva);
> --
> 2.52.0.351.gbe84eed79e-goog
>
* Re: [PATCH v4 08/21] KVM: selftests: Add a "struct kvm_mmu_arch arch" member to kvm_mmu
2025-12-30 23:01 ` [PATCH v4 08/21] KVM: selftests: Add a "struct kvm_mmu_arch arch" member to kvm_mmu Sean Christopherson
@ 2026-01-02 16:53 ` Yosry Ahmed
2026-01-02 17:02 ` Yosry Ahmed
1 sibling, 0 replies; 39+ messages in thread
From: Yosry Ahmed @ 2026-01-02 16:53 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel
On Tue, Dec 30, 2025 at 03:01:37PM -0800, Sean Christopherson wrote:
> Add an arch structure+field in "struct kvm_mmu" so that architectures can
> track arch-specific information for a given MMU.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
> tools/testing/selftests/kvm/include/arm64/kvm_util_arch.h | 2 ++
> tools/testing/selftests/kvm/include/kvm_util.h | 2 ++
> tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h | 1 +
> tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h | 1 +
> tools/testing/selftests/kvm/include/s390/kvm_util_arch.h | 1 +
> tools/testing/selftests/kvm/include/x86/kvm_util_arch.h | 2 ++
> 6 files changed, 9 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/arm64/kvm_util_arch.h b/tools/testing/selftests/kvm/include/arm64/kvm_util_arch.h
> index b973bb2c64a6..4a2033708227 100644
> --- a/tools/testing/selftests/kvm/include/arm64/kvm_util_arch.h
> +++ b/tools/testing/selftests/kvm/include/arm64/kvm_util_arch.h
> @@ -2,6 +2,8 @@
> #ifndef SELFTEST_KVM_UTIL_ARCH_H
> #define SELFTEST_KVM_UTIL_ARCH_H
>
> +struct kvm_mmu_arch {};
> +
> struct kvm_vm_arch {
> bool has_gic;
> int gic_fd;
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index 39558c05c0bf..c1497515fa6a 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -92,6 +92,8 @@ struct kvm_mmu {
> bool pgd_created;
> uint64_t pgd;
> int pgtable_levels;
> +
> + struct kvm_mmu_arch arch;
> };
>
> struct kvm_vm {
> diff --git a/tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h b/tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h
> index e43a57d99b56..d5095900e442 100644
> --- a/tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h
> +++ b/tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h
> @@ -2,6 +2,7 @@
> #ifndef SELFTEST_KVM_UTIL_ARCH_H
> #define SELFTEST_KVM_UTIL_ARCH_H
>
> +struct kvm_mmu_arch {};
> struct kvm_vm_arch {};
>
> #endif // SELFTEST_KVM_UTIL_ARCH_H
> diff --git a/tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h b/tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h
> index e43a57d99b56..d5095900e442 100644
> --- a/tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h
> +++ b/tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h
> @@ -2,6 +2,7 @@
> #ifndef SELFTEST_KVM_UTIL_ARCH_H
> #define SELFTEST_KVM_UTIL_ARCH_H
>
> +struct kvm_mmu_arch {};
> struct kvm_vm_arch {};
>
> #endif // SELFTEST_KVM_UTIL_ARCH_H
> diff --git a/tools/testing/selftests/kvm/include/s390/kvm_util_arch.h b/tools/testing/selftests/kvm/include/s390/kvm_util_arch.h
> index e43a57d99b56..d5095900e442 100644
> --- a/tools/testing/selftests/kvm/include/s390/kvm_util_arch.h
> +++ b/tools/testing/selftests/kvm/include/s390/kvm_util_arch.h
> @@ -2,6 +2,7 @@
> #ifndef SELFTEST_KVM_UTIL_ARCH_H
> #define SELFTEST_KVM_UTIL_ARCH_H
>
> +struct kvm_mmu_arch {};
> struct kvm_vm_arch {};
>
> #endif // SELFTEST_KVM_UTIL_ARCH_H
> diff --git a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
> index 972bb1c4ab4c..456e5ca170df 100644
> --- a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
> +++ b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
> @@ -10,6 +10,8 @@
>
> extern bool is_forced_emulation_enabled;
>
> +struct kvm_mmu_arch {};
> +
> struct kvm_vm_arch {
> vm_vaddr_t gdt;
> vm_vaddr_t tss;
> --
> 2.52.0.351.gbe84eed79e-goog
>
* Re: [PATCH v4 11/21] KVM: selftests: Stop passing VMX metadata to TDP mapping functions
2025-12-30 23:01 ` [PATCH v4 11/21] KVM: selftests: Stop passing VMX metadata to TDP mapping functions Sean Christopherson
@ 2026-01-02 16:58 ` Yosry Ahmed
2026-01-02 17:12 ` Yosry Ahmed
1 sibling, 0 replies; 39+ messages in thread
From: Yosry Ahmed @ 2026-01-02 16:58 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel
On Tue, Dec 30, 2025 at 03:01:40PM -0800, Sean Christopherson wrote:
> From: Yosry Ahmed <yosry.ahmed@linux.dev>
>
> The root GPA can now be retrieved from the nested MMU, stop passing VMX
> metadata. This is in preparation for making these functions work for
> NPTs as well.
Super nit: I think at this point the root GPA is already being retrieved
from the nested MMU, so maybe s/can now be/is?
Also, maybe call it TDP MMU or stage2 MMU since it was renamed.
>
> Opportunistically drop tdp_pg_map() since it's unused.
>
> No functional change intended.
>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> tools/testing/selftests/kvm/include/x86/vmx.h | 11 ++-----
> .../testing/selftests/kvm/lib/x86/memstress.c | 11 +++----
> tools/testing/selftests/kvm/lib/x86/vmx.c | 33 +++++++------------
> .../selftests/kvm/x86/vmx_dirty_log_test.c | 9 +++--
> 4 files changed, 24 insertions(+), 40 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
> index 1fd83c23529a..4dd4c2094ee6 100644
> --- a/tools/testing/selftests/kvm/include/x86/vmx.h
> +++ b/tools/testing/selftests/kvm/include/x86/vmx.h
> @@ -557,14 +557,9 @@ bool load_vmcs(struct vmx_pages *vmx);
>
> bool ept_1g_pages_supported(void);
>
> -void tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm, uint64_t nested_paddr,
> - uint64_t paddr);
> -void tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm, uint64_t nested_paddr,
> - uint64_t paddr, uint64_t size);
> -void tdp_identity_map_default_memslots(struct vmx_pages *vmx,
> - struct kvm_vm *vm);
> -void tdp_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
> - uint64_t addr, uint64_t size);
> +void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
> +void tdp_identity_map_default_memslots(struct kvm_vm *vm);
> +void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size);
> bool kvm_cpu_has_ept(void);
> void vm_enable_ept(struct kvm_vm *vm);
> void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm);
> diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
> index 00f7f11e5f0e..3319cb57a78d 100644
> --- a/tools/testing/selftests/kvm/lib/x86/memstress.c
> +++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
> @@ -59,7 +59,7 @@ uint64_t memstress_nested_pages(int nr_vcpus)
> return 513 + 10 * nr_vcpus;
> }
>
> -static void memstress_setup_ept_mappings(struct vmx_pages *vmx, struct kvm_vm *vm)
> +static void memstress_setup_ept_mappings(struct kvm_vm *vm)
> {
> uint64_t start, end;
>
> @@ -68,16 +68,15 @@ static void memstress_setup_ept_mappings(struct vmx_pages *vmx, struct kvm_vm *v
> * KVM can shadow the EPT12 with the maximum huge page size supported
> * by the backing source.
> */
> - tdp_identity_map_1g(vmx, vm, 0, 0x100000000ULL);
> + tdp_identity_map_1g(vm, 0, 0x100000000ULL);
>
> start = align_down(memstress_args.gpa, PG_SIZE_1G);
> end = align_up(memstress_args.gpa + memstress_args.size, PG_SIZE_1G);
> - tdp_identity_map_1g(vmx, vm, start, end - start);
> + tdp_identity_map_1g(vm, start, end - start);
> }
>
> void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vcpus[])
> {
> - struct vmx_pages *vmx;
> struct kvm_regs regs;
> vm_vaddr_t vmx_gva;
> int vcpu_id;
> @@ -87,11 +86,11 @@ void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vc
>
> vm_enable_ept(vm);
> for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
> - vmx = vcpu_alloc_vmx(vm, &vmx_gva);
> + vcpu_alloc_vmx(vm, &vmx_gva);
>
> /* The EPTs are shared across vCPUs, setup the mappings once */
> if (vcpu_id == 0)
> - memstress_setup_ept_mappings(vmx, vm);
> + memstress_setup_ept_mappings(vm);
>
> /*
> * Override the vCPU to run memstress_l1_guest_code() which will
> diff --git a/tools/testing/selftests/kvm/lib/x86/vmx.c b/tools/testing/selftests/kvm/lib/x86/vmx.c
> index 9d4e391fdf2c..ea1c09f9e8ab 100644
> --- a/tools/testing/selftests/kvm/lib/x86/vmx.c
> +++ b/tools/testing/selftests/kvm/lib/x86/vmx.c
> @@ -409,8 +409,8 @@ static void tdp_create_pte(struct kvm_vm *vm,
> }
>
>
> -void __tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
> - uint64_t nested_paddr, uint64_t paddr, int target_level)
> +void __tdp_pg_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
> + int target_level)
> {
> const uint64_t page_size = PG_LEVEL_SIZE(target_level);
> void *eptp_hva = addr_gpa2hva(vm, vm->arch.tdp_mmu->pgd);
> @@ -453,12 +453,6 @@ void __tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
> }
> }
>
> -void tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
> - uint64_t nested_paddr, uint64_t paddr)
> -{
> - __tdp_pg_map(vmx, vm, nested_paddr, paddr, PG_LEVEL_4K);
> -}
> -
> /*
> * Map a range of EPT guest physical addresses to the VM's physical address
> *
> @@ -476,9 +470,8 @@ void tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm,
> * Within the VM given by vm, creates a nested guest translation for the
> * page range starting at nested_paddr to the page range starting at paddr.
> */
> -void __tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm,
> - uint64_t nested_paddr, uint64_t paddr, uint64_t size,
> - int level)
> +void __tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
> + uint64_t size, int level)
> {
> size_t page_size = PG_LEVEL_SIZE(level);
> size_t npages = size / page_size;
> @@ -487,23 +480,22 @@ void __tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm,
> TEST_ASSERT(paddr + size > paddr, "Paddr overflow");
>
> while (npages--) {
> - __tdp_pg_map(vmx, vm, nested_paddr, paddr, level);
> + __tdp_pg_map(vm, nested_paddr, paddr, level);
> nested_paddr += page_size;
> paddr += page_size;
> }
> }
>
> -void tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm,
> - uint64_t nested_paddr, uint64_t paddr, uint64_t size)
> +void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
> + uint64_t size)
> {
> - __tdp_map(vmx, vm, nested_paddr, paddr, size, PG_LEVEL_4K);
> + __tdp_map(vm, nested_paddr, paddr, size, PG_LEVEL_4K);
> }
>
> /* Prepare an identity extended page table that maps all the
> * physical pages in VM.
> */
> -void tdp_identity_map_default_memslots(struct vmx_pages *vmx,
> - struct kvm_vm *vm)
> +void tdp_identity_map_default_memslots(struct kvm_vm *vm)
> {
> uint32_t s, memslot = 0;
> sparsebit_idx_t i, last;
> @@ -520,16 +512,15 @@ void tdp_identity_map_default_memslots(struct vmx_pages *vmx,
> if (i > last)
> break;
>
> - tdp_map(vmx, vm, (uint64_t)i << vm->page_shift,
> + tdp_map(vm, (uint64_t)i << vm->page_shift,
> (uint64_t)i << vm->page_shift, 1 << vm->page_shift);
> }
> }
>
> /* Identity map a region with 1GiB Pages. */
> -void tdp_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
> - uint64_t addr, uint64_t size)
> +void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size)
> {
> - __tdp_map(vmx, vm, addr, addr, size, PG_LEVEL_1G);
> + __tdp_map(vm, addr, addr, size, PG_LEVEL_1G);
> }
>
> bool kvm_cpu_has_ept(void)
> diff --git a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
> index 5c8cf8ac42a2..370f8d3117c2 100644
> --- a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
> +++ b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
> @@ -80,7 +80,6 @@ void l1_guest_code(struct vmx_pages *vmx)
> static void test_vmx_dirty_log(bool enable_ept)
> {
> vm_vaddr_t vmx_pages_gva = 0;
> - struct vmx_pages *vmx;
> unsigned long *bmap;
> uint64_t *host_test_mem;
>
> @@ -96,7 +95,7 @@ static void test_vmx_dirty_log(bool enable_ept)
> if (enable_ept)
> vm_enable_ept(vm);
>
> - vmx = vcpu_alloc_vmx(vm, &vmx_pages_gva);
> + vcpu_alloc_vmx(vm, &vmx_pages_gva);
> vcpu_args_set(vcpu, 1, vmx_pages_gva);
>
> /* Add an extra memory slot for testing dirty logging */
> @@ -120,9 +119,9 @@ static void test_vmx_dirty_log(bool enable_ept)
> * GPAs as the EPT enabled case.
> */
> if (enable_ept) {
> - tdp_identity_map_default_memslots(vmx, vm);
> - tdp_map(vmx, vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
> - tdp_map(vmx, vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
> + tdp_identity_map_default_memslots(vm);
> + tdp_map(vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
> + tdp_map(vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
> }
>
> bmap = bitmap_zalloc(TEST_MEM_PAGES);
> --
> 2.52.0.351.gbe84eed79e-goog
>
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v4 08/21] KVM: selftests: Add a "struct kvm_mmu_arch arch" member to kvm_mmu
2025-12-30 23:01 ` [PATCH v4 08/21] KVM: selftests: Add a "struct kvm_mmu_arch arch" member to kvm_mmu Sean Christopherson
2026-01-02 16:53 ` Yosry Ahmed
@ 2026-01-02 17:02 ` Yosry Ahmed
1 sibling, 0 replies; 39+ messages in thread
From: Yosry Ahmed @ 2026-01-02 17:02 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel
On Tue, Dec 30, 2025 at 03:01:37PM -0800, Sean Christopherson wrote:
> Add an arch structure+field in "struct kvm_mmu" so that architectures can
> track arch-specific information for a given MMU.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
> tools/testing/selftests/kvm/include/arm64/kvm_util_arch.h | 2 ++
> tools/testing/selftests/kvm/include/kvm_util.h | 2 ++
> tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h | 1 +
> tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h | 1 +
> tools/testing/selftests/kvm/include/s390/kvm_util_arch.h | 1 +
> tools/testing/selftests/kvm/include/x86/kvm_util_arch.h | 2 ++
> 6 files changed, 9 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/arm64/kvm_util_arch.h b/tools/testing/selftests/kvm/include/arm64/kvm_util_arch.h
> index b973bb2c64a6..4a2033708227 100644
> --- a/tools/testing/selftests/kvm/include/arm64/kvm_util_arch.h
> +++ b/tools/testing/selftests/kvm/include/arm64/kvm_util_arch.h
> @@ -2,6 +2,8 @@
> #ifndef SELFTEST_KVM_UTIL_ARCH_H
> #define SELFTEST_KVM_UTIL_ARCH_H
>
> +struct kvm_mmu_arch {};
> +
> struct kvm_vm_arch {
> bool has_gic;
> int gic_fd;
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index 39558c05c0bf..c1497515fa6a 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -92,6 +92,8 @@ struct kvm_mmu {
> bool pgd_created;
> uint64_t pgd;
> int pgtable_levels;
> +
> + struct kvm_mmu_arch arch;
> };
>
> struct kvm_vm {
> diff --git a/tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h b/tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h
> index e43a57d99b56..d5095900e442 100644
> --- a/tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h
> +++ b/tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h
> @@ -2,6 +2,7 @@
> #ifndef SELFTEST_KVM_UTIL_ARCH_H
> #define SELFTEST_KVM_UTIL_ARCH_H
>
> +struct kvm_mmu_arch {};
> struct kvm_vm_arch {};
>
> #endif // SELFTEST_KVM_UTIL_ARCH_H
> diff --git a/tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h b/tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h
> index e43a57d99b56..d5095900e442 100644
> --- a/tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h
> +++ b/tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h
> @@ -2,6 +2,7 @@
> #ifndef SELFTEST_KVM_UTIL_ARCH_H
> #define SELFTEST_KVM_UTIL_ARCH_H
>
> +struct kvm_mmu_arch {};
> struct kvm_vm_arch {};
>
> #endif // SELFTEST_KVM_UTIL_ARCH_H
> diff --git a/tools/testing/selftests/kvm/include/s390/kvm_util_arch.h b/tools/testing/selftests/kvm/include/s390/kvm_util_arch.h
> index e43a57d99b56..d5095900e442 100644
> --- a/tools/testing/selftests/kvm/include/s390/kvm_util_arch.h
> +++ b/tools/testing/selftests/kvm/include/s390/kvm_util_arch.h
> @@ -2,6 +2,7 @@
> #ifndef SELFTEST_KVM_UTIL_ARCH_H
> #define SELFTEST_KVM_UTIL_ARCH_H
>
> +struct kvm_mmu_arch {};
> struct kvm_vm_arch {};
>
> #endif // SELFTEST_KVM_UTIL_ARCH_H
> diff --git a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
> index 972bb1c4ab4c..456e5ca170df 100644
> --- a/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
> +++ b/tools/testing/selftests/kvm/include/x86/kvm_util_arch.h
> @@ -10,6 +10,8 @@
>
> extern bool is_forced_emulation_enabled;
>
> +struct kvm_mmu_arch {};
> +
> struct kvm_vm_arch {
> vm_vaddr_t gdt;
> vm_vaddr_t tss;
> --
> 2.52.0.351.gbe84eed79e-goog
>
* Re: [PATCH v4 12/21] KVM: selftests: Add a stage-2 MMU instance to kvm_vm
2025-12-30 23:01 ` [PATCH v4 12/21] KVM: selftests: Add a stage-2 MMU instance to kvm_vm Sean Christopherson
@ 2026-01-02 17:03 ` Yosry Ahmed
0 siblings, 0 replies; 39+ messages in thread
From: Yosry Ahmed @ 2026-01-02 17:03 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel
On Tue, Dec 30, 2025 at 03:01:41PM -0800, Sean Christopherson wrote:
> Add a stage-2 MMU instance so that architectures that support nested
> virtualization (more specifically, nested stage-2 page tables) can create
> and track stage-2 page tables for running L2 guests. Plumb the structure
> into common code to avoid cyclical dependencies, and to provide some line
> of sight to having common APIs for creating stage-2 mappings.
>
> As a bonus, putting the member in common code justifies using stage2_mmu
> instead of tdp_mmu for x86.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
> tools/testing/selftests/kvm/include/kvm_util.h | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index c1497515fa6a..371d55e0366e 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -116,7 +116,12 @@ struct kvm_vm {
> uint32_t dirty_ring_size;
> uint64_t gpa_tag_mask;
>
> + /*
> + * "mmu" is the guest's stage-1, with a short name because the vast
> + * majority of tests only care about the stage-1 MMU.
> + */
> struct kvm_mmu mmu;
> + struct kvm_mmu stage2_mmu;
>
> struct kvm_vm_arch arch;
>
> --
> 2.52.0.351.gbe84eed79e-goog
>
* Re: [PATCH v4 20/21] KVM: selftests: Rename vm_get_page_table_entry() to vm_get_pte()
2025-12-30 23:01 ` [PATCH v4 20/21] KVM: selftests: Rename vm_get_page_table_entry() to vm_get_pte() Sean Christopherson
@ 2026-01-02 17:10 ` Yosry Ahmed
0 siblings, 0 replies; 39+ messages in thread
From: Yosry Ahmed @ 2026-01-02 17:10 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel
On Tue, Dec 30, 2025 at 03:01:49PM -0800, Sean Christopherson wrote:
> Shorten the API to get a PTE as the "PTE" acronym is ubiquitous, and the
> "page table entry" makes it unnecessarily difficult to quickly understand
> what callers are doing.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
> tools/testing/selftests/kvm/include/x86/processor.h | 2 +-
> tools/testing/selftests/kvm/lib/x86/processor.c | 2 +-
> tools/testing/selftests/kvm/x86/hyperv_tlb_flush.c | 2 +-
> .../selftests/kvm/x86/smaller_maxphyaddr_emulation_test.c | 4 +---
> 4 files changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
> index 7b7d962244d6..ab29b1c7ed2d 100644
> --- a/tools/testing/selftests/kvm/include/x86/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86/processor.h
> @@ -1357,7 +1357,7 @@ static inline bool kvm_is_ignore_msrs(void)
> return get_kvm_param_bool("ignore_msrs");
> }
>
> -uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr);
> +uint64_t *vm_get_pte(struct kvm_vm *vm, uint64_t vaddr);
>
> uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1, uint64_t a2,
> uint64_t a3);
> diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
> index 5a3385d48902..ab869a98bbdc 100644
> --- a/tools/testing/selftests/kvm/lib/x86/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86/processor.c
> @@ -390,7 +390,7 @@ static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm,
> return virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
> }
>
> -uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr)
> +uint64_t *vm_get_pte(struct kvm_vm *vm, uint64_t vaddr)
> {
> int level = PG_LEVEL_4K;
>
> diff --git a/tools/testing/selftests/kvm/x86/hyperv_tlb_flush.c b/tools/testing/selftests/kvm/x86/hyperv_tlb_flush.c
> index a3b7ce155981..c542cc4762b1 100644
> --- a/tools/testing/selftests/kvm/x86/hyperv_tlb_flush.c
> +++ b/tools/testing/selftests/kvm/x86/hyperv_tlb_flush.c
> @@ -619,7 +619,7 @@ int main(int argc, char *argv[])
> */
> gva = vm_vaddr_unused_gap(vm, NTEST_PAGES * PAGE_SIZE, KVM_UTIL_MIN_VADDR);
> for (i = 0; i < NTEST_PAGES; i++) {
> - pte = vm_get_page_table_entry(vm, data->test_pages + i * PAGE_SIZE);
> + pte = vm_get_pte(vm, data->test_pages + i * PAGE_SIZE);
> gpa = addr_hva2gpa(vm, pte);
> virt_pg_map(vm, gva + PAGE_SIZE * i, gpa & PAGE_MASK);
> data->test_pages_pte[i] = gva + (gpa & ~PAGE_MASK);
> diff --git a/tools/testing/selftests/kvm/x86/smaller_maxphyaddr_emulation_test.c b/tools/testing/selftests/kvm/x86/smaller_maxphyaddr_emulation_test.c
> index fabeeaddfb3a..0e8aec568010 100644
> --- a/tools/testing/selftests/kvm/x86/smaller_maxphyaddr_emulation_test.c
> +++ b/tools/testing/selftests/kvm/x86/smaller_maxphyaddr_emulation_test.c
> @@ -47,7 +47,6 @@ int main(int argc, char *argv[])
> struct kvm_vcpu *vcpu;
> struct kvm_vm *vm;
> struct ucall uc;
> - uint64_t *pte;
> uint64_t *hva;
> uint64_t gpa;
> int rc;
> @@ -73,8 +72,7 @@ int main(int argc, char *argv[])
> hva = addr_gpa2hva(vm, MEM_REGION_GPA);
> memset(hva, 0, PAGE_SIZE);
>
> - pte = vm_get_page_table_entry(vm, MEM_REGION_GVA);
> - *pte |= BIT_ULL(MAXPHYADDR);
> + *vm_get_pte(vm, MEM_REGION_GVA) |= BIT_ULL(MAXPHYADDR);
>
> vcpu_run(vcpu);
>
> --
> 2.52.0.351.gbe84eed79e-goog
>
* Re: [PATCH v4 11/21] KVM: selftests: Stop passing VMX metadata to TDP mapping functions
2025-12-30 23:01 ` [PATCH v4 11/21] KVM: selftests: Stop passing VMX metadata to TDP mapping functions Sean Christopherson
2026-01-02 16:58 ` Yosry Ahmed
@ 2026-01-02 17:12 ` Yosry Ahmed
1 sibling, 0 replies; 39+ messages in thread
From: Yosry Ahmed @ 2026-01-02 17:12 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel
On Tue, Dec 30, 2025 at 03:01:40PM -0800, Sean Christopherson wrote:
> From: Yosry Ahmed <yosry.ahmed@linux.dev>
>
> The root GPA can now be retrieved from the nested MMU, stop passing VMX
> metadata. This is in preparation for making these functions work for
> NPTs as well.
>
> Opportunistically drop tdp_pg_map() since it's unused.
>
> No functional change intended.
>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> tools/testing/selftests/kvm/include/x86/vmx.h | 11 ++-----
> .../testing/selftests/kvm/lib/x86/memstress.c | 11 +++----
> tools/testing/selftests/kvm/lib/x86/vmx.c | 33 +++++++------------
> .../selftests/kvm/x86/vmx_dirty_log_test.c | 9 +++--
> 4 files changed, 24 insertions(+), 40 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/x86/vmx.h b/tools/testing/selftests/kvm/include/x86/vmx.h
> index 1fd83c23529a..4dd4c2094ee6 100644
> --- a/tools/testing/selftests/kvm/include/x86/vmx.h
> +++ b/tools/testing/selftests/kvm/include/x86/vmx.h
> @@ -557,14 +557,9 @@ bool load_vmcs(struct vmx_pages *vmx);
>
> bool ept_1g_pages_supported(void);
>
> -void tdp_pg_map(struct vmx_pages *vmx, struct kvm_vm *vm, uint64_t nested_paddr,
> - uint64_t paddr);
> -void tdp_map(struct vmx_pages *vmx, struct kvm_vm *vm, uint64_t nested_paddr,
> - uint64_t paddr, uint64_t size);
> -void tdp_identity_map_default_memslots(struct vmx_pages *vmx,
> - struct kvm_vm *vm);
> -void tdp_identity_map_1g(struct vmx_pages *vmx, struct kvm_vm *vm,
> - uint64_t addr, uint64_t size);
> +void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
> +void tdp_identity_map_default_memslots(struct kvm_vm *vm);
> +void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size);
> bool kvm_cpu_has_ept(void);
> void vm_enable_ept(struct kvm_vm *vm);
> void prepare_virtualize_apic_accesses(struct vmx_pages *vmx, struct kvm_vm *vm);
> diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
> index 00f7f11e5f0e..3319cb57a78d 100644
> --- a/tools/testing/selftests/kvm/lib/x86/memstress.c
> +++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
> @@ -59,7 +59,7 @@ uint64_t memstress_nested_pages(int nr_vcpus)
> return 513 + 10 * nr_vcpus;
> }
>
> -static void memstress_setup_ept_mappings(struct vmx_pages *vmx, struct kvm_vm *vm)
> +static void memstress_setup_ept_mappings(struct kvm_vm *vm)
> {
> uint64_t start, end;
>
> @@ -68,16 +68,15 @@ static void memstress_setup_ept_mappings(struct vmx_pages *vmx, struct kvm_vm *v
> * KVM can shadow the EPT12 with the maximum huge page size supported
> * by the backing source.
> */
> - tdp_identity_map_1g(vmx, vm, 0, 0x100000000ULL);
> + tdp_identity_map_1g(vm, 0, 0x100000000ULL);
>
> start = align_down(memstress_args.gpa, PG_SIZE_1G);
> end = align_up(memstress_args.gpa + memstress_args.size, PG_SIZE_1G);
> - tdp_identity_map_1g(vmx, vm, start, end - start);
> + tdp_identity_map_1g(vm, start, end - start);
> }
>
> void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vcpus[])
> {
> - struct vmx_pages *vmx;
> struct kvm_regs regs;
> vm_vaddr_t vmx_gva;
> int vcpu_id;
> @@ -87,11 +86,11 @@ void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vc
>
> vm_enable_ept(vm);
> for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
> - vmx = vcpu_alloc_vmx(vm, &vmx_gva);
> + vcpu_alloc_vmx(vm, &vmx_gva);
>
> /* The EPTs are shared across vCPUs, setup the mappings once */
> if (vcpu_id == 0)
> - memstress_setup_ept_mappings(vmx, vm);
> + memstress_setup_ept_mappings(vm);
Oh and if you're feeling nitpicky while applying, I think this call can
actually be moved outside of the loop now, right after vm_enable_ept(),
dropping the whole vcpu_id == 0 special case.
>
> /*
> * Override the vCPU to run memstress_l1_guest_code() which will
* Re: [PATCH v4 21/21] KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU
2025-12-30 23:01 ` [PATCH v4 21/21] KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU Sean Christopherson
@ 2026-01-02 17:36 ` Yosry Ahmed
2026-01-08 16:32 ` Sean Christopherson
0 siblings, 1 reply; 39+ messages in thread
From: Yosry Ahmed @ 2026-01-02 17:36 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel
On Tue, Dec 30, 2025 at 03:01:50PM -0800, Sean Christopherson wrote:
> Update the nested dirty log test to validate KVM's handling of READ faults
> when dirty logging is enabled. Specifically, set the Dirty bit in the
> guest PTEs used to map L2 GPAs, so that KVM will create writable SPTEs
> when handling L2 read faults. When handling read faults in the shadow MMU,
> KVM opportunistically creates a writable SPTE if the mapping can be
> writable *and* the gPTE is dirty (or doesn't support the Dirty bit), i.e.
> if KVM doesn't need to intercept writes in order to emulate Dirty-bit
> updates.
>
> To actually test the L2 READ=>WRITE sequence, e.g. without masking a false
> pass by other test activity, route the READ=>WRITE and WRITE=>WRITE
> sequences to separate L1 pages, and differentiate between "marked dirty
> due to a WRITE access/fault" and "marked dirty due to creating a writable
> SPTE for a READ access/fault". The updated sequence exposes the bug fixed
> by KVM commit 1f4e5fc83a42 ("KVM: x86: fix nested guest live migration
> with PML") when the guest performs a READ=>WRITE sequence.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> .../selftests/kvm/include/x86/processor.h | 1 +
> .../testing/selftests/kvm/lib/x86/processor.c | 7 ++
> .../selftests/kvm/x86/nested_dirty_log_test.c | 115 +++++++++++++-----
> 3 files changed, 90 insertions(+), 33 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
> index ab29b1c7ed2d..8945c9eea704 100644
> --- a/tools/testing/selftests/kvm/include/x86/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86/processor.h
> @@ -1483,6 +1483,7 @@ bool kvm_cpu_has_tdp(void);
> void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
> void tdp_identity_map_default_memslots(struct kvm_vm *vm);
> void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size);
> +uint64_t *tdp_get_pte(struct kvm_vm *vm, uint64_t l2_gpa);
>
> /*
> * Basic CPU control in CR0
> diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
> index ab869a98bbdc..fab18e9be66c 100644
> --- a/tools/testing/selftests/kvm/lib/x86/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86/processor.c
> @@ -390,6 +390,13 @@ static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm,
> return virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
> }
>
> +uint64_t *tdp_get_pte(struct kvm_vm *vm, uint64_t l2_gpa)
> +{
> + int level = PG_LEVEL_4K;
> +
> + return __vm_get_page_table_entry(vm, &vm->stage2_mmu, l2_gpa, &level);
> +}
> +
> uint64_t *vm_get_pte(struct kvm_vm *vm, uint64_t vaddr)
> {
> int level = PG_LEVEL_4K;
> diff --git a/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c b/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
> index 89d2e86a0db9..1e7c1ed917e1 100644
> --- a/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
> +++ b/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
> @@ -17,29 +17,39 @@
>
> /* The memory slot index to track dirty pages */
> #define TEST_MEM_SLOT_INDEX 1
> -#define TEST_MEM_PAGES 3
> +#define TEST_MEM_PAGES 4
>
> /* L1 guest test virtual memory offset */
> -#define GUEST_TEST_MEM 0xc0000000
> +#define GUEST_TEST_MEM1 0xc0000000
> +#define GUEST_TEST_MEM2 0xc0002000
>
> /* L2 guest test virtual memory offset */
> #define NESTED_TEST_MEM1 0xc0001000
> -#define NESTED_TEST_MEM2 0xc0002000
> +#define NESTED_TEST_MEM2 0xc0003000
>
> #define L2_GUEST_STACK_SIZE 64
>
> +#define TEST_SYNC_PAGE_MASK 0xfull
> +#define TEST_SYNC_READ_FAULT BIT(4)
> +#define TEST_SYNC_WRITE_FAULT BIT(5)
> +#define TEST_SYNC_NO_FAULT BIT(6)
> +
> static void l2_guest_code(u64 *a, u64 *b)
> {
> READ_ONCE(*a);
> + GUEST_SYNC(0 | TEST_SYNC_READ_FAULT);
> WRITE_ONCE(*a, 1);
> - GUEST_SYNC(true);
> - GUEST_SYNC(false);
> + GUEST_SYNC(0 | TEST_SYNC_WRITE_FAULT);
> + READ_ONCE(*a);
> + GUEST_SYNC(0 | TEST_SYNC_NO_FAULT);
>
> WRITE_ONCE(*b, 1);
> - GUEST_SYNC(true);
> + GUEST_SYNC(2 | TEST_SYNC_WRITE_FAULT);
> WRITE_ONCE(*b, 1);
> - GUEST_SYNC(true);
> - GUEST_SYNC(false);
> + GUEST_SYNC(2 | TEST_SYNC_WRITE_FAULT);
> + READ_ONCE(*b);
> + GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
> + GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
Instead of hardcoding 0 and 2 here, which IIUC correspond to the
physical addresses 0xc0000000 and 0xc0002000, as well as indices in
host_test_mem, can we make the overall definitions a bit more intuitive?
For example:
#define GUEST_GPA_START 0xc0000000
#define GUEST_PAGE1_IDX 0
#define GUEST_PAGE2_IDX 1
#define GUEST_GPA_PAGE1 (GUEST_GPA_START + GUEST_PAGE1_IDX * PAGE_SIZE)
#define GUEST_GPA_PAGE2 (GUEST_GPA_START + GUEST_PAGE2_IDX * PAGE_SIZE)
/* Mapped to GUEST_GPA_PAGE1 and GUEST_GPA_PAGE2 */
#define GUEST_GVA_PAGE1 0xd0000000
#define GUEST_GVA_PAGE2 0xd0002000
/* Mapped to GUEST_GPA_PAGE1 and GUEST_GPA_PAGE2 using TDP in L1 */
#define GUEST_GVA_NESTED_PAGE1 0xd0001000
#define GUEST_GVA_NESTED_PAGE2 0xd0003000
Then in L2 code, we can explicitly take in the GVA of page1 and page2
and use the definitions above in the GUEST_SYNC() calls, for example:
static void l2_guest_code(u64 *page1_gva, u64 *page2_gva)
{
READ_ONCE(*page1_gva);
GUEST_SYNC(GUEST_PAGE1_IDX | TEST_SYNC_READ_FAULT);
WRITE_ONCE(*page1_gva, 1);
GUEST_SYNC(GUEST_PAGE1_IDX | TEST_SYNC_WRITE_FAULT);
...
}
and we can explicitly read page1 and page2 from the host (instead of
using host_test_mem).
Alternatively, we can pass in the guest GVA directly into GUEST_SYNC(),
and use the lower bits for TEST_SYNC_READ_FAULT, TEST_SYNC_WRITE_FAULT,
and TEST_SYNC_NO_FAULT.
WDYT?
>
> /* Exit to L1 and never come back. */
> vmcall();
> @@ -53,7 +63,7 @@ static void l2_guest_code_tdp_enabled(void)
> static void l2_guest_code_tdp_disabled(void)
> {
> /* Access the same L1 GPAs as l2_guest_code_tdp_enabled() */
> - l2_guest_code((u64 *)GUEST_TEST_MEM, (u64 *)GUEST_TEST_MEM);
> + l2_guest_code((u64 *)GUEST_TEST_MEM1, (u64 *)GUEST_TEST_MEM2);
> }
>
> void l1_vmx_code(struct vmx_pages *vmx)
> @@ -72,9 +82,11 @@ void l1_vmx_code(struct vmx_pages *vmx)
>
> prepare_vmcs(vmx, l2_rip, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
>
> - GUEST_SYNC(false);
> + GUEST_SYNC(0 | TEST_SYNC_NO_FAULT);
> + GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
> GUEST_ASSERT(!vmlaunch());
> - GUEST_SYNC(false);
> + GUEST_SYNC(0 | TEST_SYNC_NO_FAULT);
> + GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
> GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_VMCALL);
> GUEST_DONE();
> }
> @@ -91,9 +103,11 @@ static void l1_svm_code(struct svm_test_data *svm)
>
> generic_svm_setup(svm, l2_rip, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
>
> - GUEST_SYNC(false);
> + GUEST_SYNC(0 | TEST_SYNC_NO_FAULT);
> + GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
> run_guest(svm->vmcb, svm->vmcb_gpa);
> - GUEST_SYNC(false);
> + GUEST_SYNC(0 | TEST_SYNC_NO_FAULT);
> + GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
> GUEST_ASSERT_EQ(svm->vmcb->control.exit_code, SVM_EXIT_VMMCALL);
> GUEST_DONE();
> }
> @@ -106,6 +120,11 @@ static void l1_guest_code(void *data)
> l1_svm_code(data);
> }
>
> +static uint64_t test_read_host_page(uint64_t *host_test_mem, int page_nr)
> +{
> + return host_test_mem[PAGE_SIZE * page_nr / sizeof(*host_test_mem)];
> +}
> +
> static void test_dirty_log(bool nested_tdp)
> {
> vm_vaddr_t nested_gva = 0;
> @@ -133,32 +152,45 @@ static void test_dirty_log(bool nested_tdp)
>
> /* Add an extra memory slot for testing dirty logging */
> vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
> - GUEST_TEST_MEM,
> + GUEST_TEST_MEM1,
> TEST_MEM_SLOT_INDEX,
> TEST_MEM_PAGES,
> KVM_MEM_LOG_DIRTY_PAGES);
>
> /*
> - * Add an identity map for GVA range [0xc0000000, 0xc0002000). This
> + * Add an identity map for GVA range [0xc0000000, 0xc0004000). This
> * affects both L1 and L2. However...
> */
> - virt_map(vm, GUEST_TEST_MEM, GUEST_TEST_MEM, TEST_MEM_PAGES);
> + virt_map(vm, GUEST_TEST_MEM1, GUEST_TEST_MEM1, TEST_MEM_PAGES);
>
> /*
> - * ... pages in the L2 GPA range [0xc0001000, 0xc0003000) will map to
> - * 0xc0000000.
> + * ... pages in the L2 GPA ranges [0xc0001000, 0xc0002000) and
> + * [0xc0003000, 0xc0004000) will map to 0xc0000000 and 0xc0001000
> + * respectively.
> *
> * When TDP is disabled, the L2 guest code will still access the same L1
> * GPAs as the TDP enabled case.
> + *
> + * Set the Dirty bit in the PTEs used by L2 so that KVM will create
> + * writable SPTEs when handling read faults (if the Dirty bit isn't
> + * set, KVM must intercept the next write to emulate the Dirty bit
> + * update).
> */
> if (nested_tdp) {
> tdp_identity_map_default_memslots(vm);
> - tdp_map(vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
> - tdp_map(vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
> + tdp_map(vm, NESTED_TEST_MEM1, GUEST_TEST_MEM1, PAGE_SIZE);
> + tdp_map(vm, NESTED_TEST_MEM2, GUEST_TEST_MEM2, PAGE_SIZE);
> +
> +
> + *tdp_get_pte(vm, NESTED_TEST_MEM1) |= PTE_DIRTY_MASK(&vm->stage2_mmu);
> + *tdp_get_pte(vm, NESTED_TEST_MEM2) |= PTE_DIRTY_MASK(&vm->stage2_mmu);
> + } else {
> + *vm_get_pte(vm, GUEST_TEST_MEM1) |= PTE_DIRTY_MASK(&vm->mmu);
> + *vm_get_pte(vm, GUEST_TEST_MEM2) |= PTE_DIRTY_MASK(&vm->mmu);
> }
>
> bmap = bitmap_zalloc(TEST_MEM_PAGES);
> - host_test_mem = addr_gpa2hva(vm, GUEST_TEST_MEM);
> + host_test_mem = addr_gpa2hva(vm, GUEST_TEST_MEM1);
>
> while (!done) {
> memset(host_test_mem, 0xaa, TEST_MEM_PAGES * PAGE_SIZE);
> @@ -169,25 +201,42 @@ static void test_dirty_log(bool nested_tdp)
> case UCALL_ABORT:
> REPORT_GUEST_ASSERT(uc);
> /* NOT REACHED */
> - case UCALL_SYNC:
> + case UCALL_SYNC: {
> + int page_nr = uc.args[1] & TEST_SYNC_PAGE_MASK;
> + int i;
> +
> /*
> * The nested guest wrote at offset 0x1000 in the memslot, but the
> * dirty bitmap must be filled in according to L1 GPA, not L2.
> */
> kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap);
> - if (uc.args[1]) {
> - TEST_ASSERT(test_bit(0, bmap), "Page 0 incorrectly reported clean");
> - TEST_ASSERT(host_test_mem[0] == 1, "Page 0 not written by guest");
> - } else {
> - TEST_ASSERT(!test_bit(0, bmap), "Page 0 incorrectly reported dirty");
> - TEST_ASSERT(host_test_mem[0] == 0xaaaaaaaaaaaaaaaaULL, "Page 0 written by guest");
> +
> + /*
> + * If a fault is expected, the page should be dirty
> + * as the Dirty bit is set in the gPTE. KVM should
> + * create a writable SPTE even on a read fault, *and*
> + * KVM must mark the GFN as dirty when doing so.
> + */
> + TEST_ASSERT(test_bit(page_nr, bmap) == !(uc.args[1] & TEST_SYNC_NO_FAULT),
> + "Page %u incorrectly reported %s on %s fault", page_nr,
> + test_bit(page_nr, bmap) ? "dirty" : "clean",
> + uc.args[1] & TEST_SYNC_NO_FAULT ? "no" :
> + uc.args[1] & TEST_SYNC_READ_FAULT ? "read" : "write");
> +
> + for (i = 0; i < TEST_MEM_PAGES; i++) {
> + if (i == page_nr && uc.args[1] & TEST_SYNC_WRITE_FAULT)
> + TEST_ASSERT(test_read_host_page(host_test_mem, i) == 1,
> + "Page %u not written by guest", i);
> + else
> + TEST_ASSERT(test_read_host_page(host_test_mem, i) == 0xaaaaaaaaaaaaaaaaULL,
> + "Page %u written by guest", i);
> +
> + if (i != page_nr)
> + TEST_ASSERT(!test_bit(i, bmap),
> + "Page %u incorrectly reported dirty", i);
> }
> -
> - TEST_ASSERT(!test_bit(1, bmap), "Page 1 incorrectly reported dirty");
> - TEST_ASSERT(host_test_mem[PAGE_SIZE / 8] == 0xaaaaaaaaaaaaaaaaULL, "Page 1 written by guest");
> - TEST_ASSERT(!test_bit(2, bmap), "Page 2 incorrectly reported dirty");
> - TEST_ASSERT(host_test_mem[PAGE_SIZE*2 / 8] == 0xaaaaaaaaaaaaaaaaULL, "Page 2 written by guest");
> break;
> + }
> case UCALL_DONE:
> done = true;
> break;
> --
> 2.52.0.351.gbe84eed79e-goog
>
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v4 16/21] KVM: selftests: Add support for nested NPTs
2025-12-30 23:01 ` [PATCH v4 16/21] KVM: selftests: Add support for nested NPTs Sean Christopherson
@ 2026-01-07 23:12 ` Yosry Ahmed
0 siblings, 0 replies; 39+ messages in thread
From: Yosry Ahmed @ 2026-01-07 23:12 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel
On Tue, Dec 30, 2025 at 03:01:45PM -0800, Sean Christopherson wrote:
> From: Yosry Ahmed <yosry.ahmed@linux.dev>
>
> Implement nCR3 and NPT initialization functions, similar to the EPT
> equivalents, and create common TDP helpers for enablement checking and
> initialization. Enable NPT for nested guests by default if the TDP MMU
> was initialized, similar to VMX.
>
> Reuse the PTE masks from the main MMU in the NPT MMU, except for the C
> and S bits related to confidential VMs.
>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Funny story, I missed a teeny tiny part here..
diff --git a/tools/testing/selftests/kvm/lib/x86/svm.c b/tools/testing/selftests/kvm/lib/x86/svm.c
index 18e9e9089643..2e5c480c9afd 100644
--- a/tools/testing/selftests/kvm/lib/x86/svm.c
+++ b/tools/testing/selftests/kvm/lib/x86/svm.c
@@ -46,6 +46,9 @@ vcpu_alloc_svm(struct kvm_vm *vm, vm_vaddr_t *p_svm_gva)
svm->msr_gpa = addr_gva2gpa(vm, (uintptr_t)svm->msr);
memset(svm->msr_hva, 0, getpagesize());
+ if (vm->stage2_mmu.pgd_created)
+ svm->ncr3_gpa = vm->stage2_mmu.pgd;
+
*p_svm_gva = svm_gva;
return svm;
}
---
The good news is that the test still passes after we start ACTUALLY
USING the nested NPT :)
> ---
> .../selftests/kvm/include/x86/processor.h | 2 ++
> .../selftests/kvm/include/x86/svm_util.h | 9 ++++++++
> .../testing/selftests/kvm/lib/x86/memstress.c | 4 ++--
> .../testing/selftests/kvm/lib/x86/processor.c | 15 +++++++++++++
> tools/testing/selftests/kvm/lib/x86/svm.c | 21 +++++++++++++++++++
> .../selftests/kvm/x86/vmx_dirty_log_test.c | 4 ++--
> 6 files changed, 51 insertions(+), 4 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
> index d134c886f280..deb471fb9b51 100644
> --- a/tools/testing/selftests/kvm/include/x86/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86/processor.h
> @@ -1477,6 +1477,8 @@ void __virt_pg_map(struct kvm_vm *vm, struct kvm_mmu *mmu, uint64_t vaddr,
> void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> uint64_t nr_bytes, int level);
>
> +void vm_enable_tdp(struct kvm_vm *vm);
> +bool kvm_cpu_has_tdp(void);
> void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
> void tdp_identity_map_default_memslots(struct kvm_vm *vm);
> void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size);
> diff --git a/tools/testing/selftests/kvm/include/x86/svm_util.h b/tools/testing/selftests/kvm/include/x86/svm_util.h
> index b74c6dcddcbd..5d7c42534bc4 100644
> --- a/tools/testing/selftests/kvm/include/x86/svm_util.h
> +++ b/tools/testing/selftests/kvm/include/x86/svm_util.h
> @@ -27,6 +27,9 @@ struct svm_test_data {
> void *msr; /* gva */
> void *msr_hva;
> uint64_t msr_gpa;
> +
> + /* NPT */
> + uint64_t ncr3_gpa;
> };
>
> static inline void vmmcall(void)
> @@ -57,6 +60,12 @@ struct svm_test_data *vcpu_alloc_svm(struct kvm_vm *vm, vm_vaddr_t *p_svm_gva);
> void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_rsp);
> void run_guest(struct vmcb *vmcb, uint64_t vmcb_gpa);
>
> +static inline bool kvm_cpu_has_npt(void)
> +{
> + return kvm_cpu_has(X86_FEATURE_NPT);
> +}
> +void vm_enable_npt(struct kvm_vm *vm);
> +
> int open_sev_dev_path_or_exit(void);
>
> #endif /* SELFTEST_KVM_SVM_UTILS_H */
> diff --git a/tools/testing/selftests/kvm/lib/x86/memstress.c b/tools/testing/selftests/kvm/lib/x86/memstress.c
> index 3319cb57a78d..407abfc34909 100644
> --- a/tools/testing/selftests/kvm/lib/x86/memstress.c
> +++ b/tools/testing/selftests/kvm/lib/x86/memstress.c
> @@ -82,9 +82,9 @@ void memstress_setup_nested(struct kvm_vm *vm, int nr_vcpus, struct kvm_vcpu *vc
> int vcpu_id;
>
> TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
> - TEST_REQUIRE(kvm_cpu_has_ept());
> + TEST_REQUIRE(kvm_cpu_has_tdp());
>
> - vm_enable_ept(vm);
> + vm_enable_tdp(vm);
> for (vcpu_id = 0; vcpu_id < nr_vcpus; vcpu_id++) {
> vcpu_alloc_vmx(vm, &vmx_gva);
>
> diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
> index 29e7d172f945..a3a4c9a4cbcb 100644
> --- a/tools/testing/selftests/kvm/lib/x86/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86/processor.c
> @@ -8,7 +8,9 @@
> #include "kvm_util.h"
> #include "pmu.h"
> #include "processor.h"
> +#include "svm_util.h"
> #include "sev.h"
> +#include "vmx.h"
>
> #ifndef NUM_INTERRUPTS
> #define NUM_INTERRUPTS 256
> @@ -472,6 +474,19 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
> }
> }
>
> +void vm_enable_tdp(struct kvm_vm *vm)
> +{
> + if (kvm_cpu_has(X86_FEATURE_VMX))
> + vm_enable_ept(vm);
> + else
> + vm_enable_npt(vm);
> +}
> +
> +bool kvm_cpu_has_tdp(void)
> +{
> + return kvm_cpu_has_ept() || kvm_cpu_has_npt();
> +}
> +
> void __tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr,
> uint64_t size, int level)
> {
> diff --git a/tools/testing/selftests/kvm/lib/x86/svm.c b/tools/testing/selftests/kvm/lib/x86/svm.c
> index d239c2097391..8e4795225595 100644
> --- a/tools/testing/selftests/kvm/lib/x86/svm.c
> +++ b/tools/testing/selftests/kvm/lib/x86/svm.c
> @@ -59,6 +59,22 @@ static void vmcb_set_seg(struct vmcb_seg *seg, u16 selector,
> seg->base = base;
> }
>
> +void vm_enable_npt(struct kvm_vm *vm)
> +{
> + struct pte_masks pte_masks;
> +
> +	TEST_ASSERT(kvm_cpu_has_npt(), "KVM doesn't support nested NPT");
> +
> + /*
> + * NPTs use the same PTE format, but deliberately drop the C-bit as the
> + * per-VM shared vs. private information is only meant for stage-1.
> + */
> + pte_masks = vm->mmu.arch.pte_masks;
> + pte_masks.c = 0;
> +
> + tdp_mmu_init(vm, vm->mmu.pgtable_levels, &pte_masks);
> +}
> +
> void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_rsp)
> {
> struct vmcb *vmcb = svm->vmcb;
> @@ -102,6 +118,11 @@ void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_r
> vmcb->save.rip = (u64)guest_rip;
> vmcb->save.rsp = (u64)guest_rsp;
> guest_regs.rdi = (u64)svm;
> +
> + if (svm->ncr3_gpa) {
> + ctrl->nested_ctl |= SVM_NESTED_CTL_NP_ENABLE;
> + ctrl->nested_cr3 = svm->ncr3_gpa;
> + }
> }
>
> /*
> diff --git a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
> index 370f8d3117c2..032ab8bf60a4 100644
> --- a/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
> +++ b/tools/testing/selftests/kvm/x86/vmx_dirty_log_test.c
> @@ -93,7 +93,7 @@ static void test_vmx_dirty_log(bool enable_ept)
> /* Create VM */
> vm = vm_create_with_one_vcpu(&vcpu, l1_guest_code);
> if (enable_ept)
> - vm_enable_ept(vm);
> + vm_enable_tdp(vm);
>
> vcpu_alloc_vmx(vm, &vmx_pages_gva);
> vcpu_args_set(vcpu, 1, vmx_pages_gva);
> @@ -170,7 +170,7 @@ int main(int argc, char *argv[])
>
> test_vmx_dirty_log(/*enable_ept=*/false);
>
> - if (kvm_cpu_has_ept())
> + if (kvm_cpu_has_tdp())
> test_vmx_dirty_log(/*enable_ept=*/true);
>
> return 0;
> --
> 2.52.0.351.gbe84eed79e-goog
>
* Re: [PATCH v4 21/21] KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU
2026-01-02 17:36 ` Yosry Ahmed
@ 2026-01-08 16:32 ` Sean Christopherson
2026-01-08 18:01 ` Yosry Ahmed
2026-01-08 20:26 ` Yosry Ahmed
0 siblings, 2 replies; 39+ messages in thread
From: Sean Christopherson @ 2026-01-08 16:32 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel
On Fri, Jan 02, 2026, Yosry Ahmed wrote:
> On Tue, Dec 30, 2025 at 03:01:50PM -0800, Sean Christopherson wrote:
> > WRITE_ONCE(*b, 1);
> > - GUEST_SYNC(true);
> > + GUEST_SYNC(2 | TEST_SYNC_WRITE_FAULT);
> > WRITE_ONCE(*b, 1);
> > - GUEST_SYNC(true);
> > - GUEST_SYNC(false);
> > + GUEST_SYNC(2 | TEST_SYNC_WRITE_FAULT);
> > + READ_ONCE(*b);
> > + GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
> > + GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
>
> Instead of hardcoding 0 and 2 here, which IIUC correspond to the
> physical addresses 0xc0000000 and 0xc0002000, as well as indices in
> host_test_mem, can we make the overall definitions a bit more intuitive?
>
> For example:
>
> #define GUEST_GPA_START 0xc0000000
> #define GUEST_PAGE1_IDX 0
> #define GUEST_PAGE2_IDX 1
> #define GUEST_GPA_PAGE1 (GUEST_GPA_START + GUEST_PAGE1_IDX * PAGE_SIZE)
> #define GUEST_GPA_PAGE2 (GUEST_GPA_START + GUEST_PAGE2_IDX * PAGE_SIZE)
>
> /* Mapped to GUEST_GPA_PAGE1 and GUEST_GPA_PAGE2 */
> #define GUEST_GVA_PAGE1 0xd0000000
> #define GUEST_GVA_PAGE2 0xd0002000
>
> /* Mapped to GUEST_GPA_PAGE1 and GUEST_GPA_PAGE2 using TDP in L1 */
> #define GUEST_GVA_NESTED_PAGE1 0xd0001000
> #define GUEST_GVA_NESTED_PAGE2 0xd0003000
>
> Then in L2 code, we can explicitly take in the GVA of page1 and page2
> and use the definitions above in the GUEST_SYNC() calls, for example:
>
> static void l2_guest_code(u64 *page1_gva, u64 *page2_gva)
> {
> READ_ONCE(*page1_gva);
> GUEST_SYNC(GUEST_PAGE1_IDX | TEST_SYNC_READ_FAULT);
> WRITE_ONCE(*page1_gva, 1);
> GUEST_SYNC(GUEST_PAGE1_IDX | TEST_SYNC_WRITE_FAULT);
> ...
> }
>
> and we can explicitly read page1 and page2 from the host (instead of
> using host_test_mem).
>
> Alternatively, we can pass in the guest GVA directly into GUEST_SYNC(),
> and use the lower bits for TEST_SYNC_READ_FAULT, TEST_SYNC_WRITE_FAULT,
> and TEST_SYNC_NO_FAULT.
>
> WDYT?
I fiddled with this a bunch and came up with the below. It's more or less what
you're suggesting, but instead of interleaving the aliases, it simply puts them
at a higher base. That makes pulling the page frame number out of the GVA much
cleaner, as it's simply arithmetic instead of weird masking and shifting magic.
--
From: Sean Christopherson <seanjc@google.com>
Date: Wed, 7 Jan 2026 14:38:32 -0800
Subject: [PATCH] KVM: selftests: Test READ=>WRITE dirty logging behavior for
shadow MMU
Update the nested dirty log test to validate KVM's handling of READ faults
when dirty logging is enabled. Specifically, set the Dirty bit in the
guest PTEs used to map L2 GPAs, so that KVM will create writable SPTEs
when handling L2 read faults. When handling read faults in the shadow MMU,
KVM opportunistically creates a writable SPTE if the mapping can be
writable *and* the gPTE is dirty (or doesn't support the Dirty bit), i.e.
if KVM doesn't need to intercept writes in order to emulate Dirty-bit
updates.
To actually test the L2 READ=>WRITE sequence, e.g. without masking a false
pass by other test activity, route the READ=>WRITE and WRITE=>WRITE
sequences to separate L1 pages, and differentiate between "marked dirty
due to a WRITE access/fault" and "marked dirty due to creating a writable
SPTE for a READ access/fault". The updated sequence exposes the bug fixed
by KVM commit 1f4e5fc83a42 ("KVM: x86: fix nested guest live migration
with PML") when the guest performs a READ=>WRITE sequence with dirty guest
PTEs.
Opportunistically tweak and rename the address macros, and add comments,
to make it more obvious what the test is doing. E.g. NESTED_TEST_MEM1
vs. GUEST_TEST_MEM doesn't make it all that obvious that the test is
creating aliases in both the L2 GPA and GVA address spaces, but only when
L1 is using TDP to run L2.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
.../selftests/kvm/include/x86/processor.h | 1 +
.../testing/selftests/kvm/lib/x86/processor.c | 7 +
.../selftests/kvm/x86/nested_dirty_log_test.c | 188 +++++++++++++-----
3 files changed, 145 insertions(+), 51 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index ab29b1c7ed2d..8945c9eea704 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1483,6 +1483,7 @@ bool kvm_cpu_has_tdp(void);
void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
void tdp_identity_map_default_memslots(struct kvm_vm *vm);
void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size);
+uint64_t *tdp_get_pte(struct kvm_vm *vm, uint64_t l2_gpa);
/*
* Basic CPU control in CR0
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index ab869a98bbdc..fab18e9be66c 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -390,6 +390,13 @@ static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm,
return virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
}
+uint64_t *tdp_get_pte(struct kvm_vm *vm, uint64_t l2_gpa)
+{
+ int level = PG_LEVEL_4K;
+
+ return __vm_get_page_table_entry(vm, &vm->stage2_mmu, l2_gpa, &level);
+}
+
uint64_t *vm_get_pte(struct kvm_vm *vm, uint64_t vaddr)
{
int level = PG_LEVEL_4K;
diff --git a/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c b/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
index 89d2e86a0db9..6f4f7a8209be 100644
--- a/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
+++ b/tools/testing/selftests/kvm/x86/nested_dirty_log_test.c
@@ -17,29 +17,53 @@
/* The memory slot index to track dirty pages */
#define TEST_MEM_SLOT_INDEX 1
-#define TEST_MEM_PAGES 3
-/* L1 guest test virtual memory offset */
-#define GUEST_TEST_MEM 0xc0000000
+/*
> + * Allocate four pages total.  Two pages are used to verify that KVM marks
> + * the accessed page/GFN as dirty, but not the "other" page.  Times two
+ * so that each "normal" page can be accessed from L2 via an aliased L2 GVA+GPA
+ * (when TDP is enabled), to verify KVM marks _L1's_ page/GFN as dirty (to
+ * detect failures, L2 => L1 GPAs can't be identity mapped in the TDP page
+ * tables, as marking L2's GPA dirty would get a false pass if L1 == L2).
+ */
+#define TEST_MEM_PAGES 4
-/* L2 guest test virtual memory offset */
-#define NESTED_TEST_MEM1 0xc0001000
-#define NESTED_TEST_MEM2 0xc0002000
+#define TEST_MEM_BASE 0xc0000000
+#define TEST_MEM_ALIAS_BASE 0xc0002000
+
+#define TEST_GUEST_ADDR(base, idx) ((base) + (idx) * PAGE_SIZE)
+
+#define TEST_GVA(idx) TEST_GUEST_ADDR(TEST_MEM_BASE, idx)
+#define TEST_GPA(idx) TEST_GUEST_ADDR(TEST_MEM_BASE, idx)
+
+#define TEST_HVA(vm, idx) addr_gpa2hva(vm, TEST_GPA(idx))
#define L2_GUEST_STACK_SIZE 64
-static void l2_guest_code(u64 *a, u64 *b)
+/* Use the page offset bits to communicate the access+fault type. */
+#define TEST_SYNC_READ_FAULT BIT(0)
+#define TEST_SYNC_WRITE_FAULT BIT(1)
+#define TEST_SYNC_NO_FAULT BIT(2)
+
+static void l2_guest_code(vm_vaddr_t base)
{
- READ_ONCE(*a);
- WRITE_ONCE(*a, 1);
- GUEST_SYNC(true);
- GUEST_SYNC(false);
+ vm_vaddr_t page0 = TEST_GUEST_ADDR(base, 0);
+ vm_vaddr_t page1 = TEST_GUEST_ADDR(base, 1);
- WRITE_ONCE(*b, 1);
- GUEST_SYNC(true);
- WRITE_ONCE(*b, 1);
- GUEST_SYNC(true);
- GUEST_SYNC(false);
+ READ_ONCE(*(u64 *)page0);
+ GUEST_SYNC(page0 | TEST_SYNC_READ_FAULT);
+ WRITE_ONCE(*(u64 *)page0, 1);
+ GUEST_SYNC(page0 | TEST_SYNC_WRITE_FAULT);
+ READ_ONCE(*(u64 *)page0);
+ GUEST_SYNC(page0 | TEST_SYNC_NO_FAULT);
+
+ WRITE_ONCE(*(u64 *)page1, 1);
+ GUEST_SYNC(page1 | TEST_SYNC_WRITE_FAULT);
+ WRITE_ONCE(*(u64 *)page1, 1);
+ GUEST_SYNC(page1 | TEST_SYNC_WRITE_FAULT);
+ READ_ONCE(*(u64 *)page1);
+ GUEST_SYNC(page1 | TEST_SYNC_NO_FAULT);
+ GUEST_SYNC(page1 | TEST_SYNC_NO_FAULT);
/* Exit to L1 and never come back. */
vmcall();
@@ -47,13 +71,22 @@ static void l2_guest_code(u64 *a, u64 *b)
static void l2_guest_code_tdp_enabled(void)
{
- l2_guest_code((u64 *)NESTED_TEST_MEM1, (u64 *)NESTED_TEST_MEM2);
+ /*
+ * Use the aliased virtual addresses when running with TDP to verify
+ * that KVM correctly handles the case where a page is dirtied via a
+ * different GPA than would be used by L1.
+ */
+ l2_guest_code(TEST_MEM_ALIAS_BASE);
}
static void l2_guest_code_tdp_disabled(void)
{
- /* Access the same L1 GPAs as l2_guest_code_tdp_enabled() */
- l2_guest_code((u64 *)GUEST_TEST_MEM, (u64 *)GUEST_TEST_MEM);
+ /*
+ * Use the "normal" virtual addresses when running without TDP enabled,
+ * in which case L2 will use the same page tables as L1, and thus needs
+ * to use the same virtual addresses that are mapped into L1.
+ */
+ l2_guest_code(TEST_MEM_BASE);
}
void l1_vmx_code(struct vmx_pages *vmx)
@@ -72,9 +105,9 @@ void l1_vmx_code(struct vmx_pages *vmx)
prepare_vmcs(vmx, l2_rip, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
- GUEST_SYNC(false);
+ GUEST_SYNC(TEST_SYNC_NO_FAULT);
GUEST_ASSERT(!vmlaunch());
- GUEST_SYNC(false);
+ GUEST_SYNC(TEST_SYNC_NO_FAULT);
GUEST_ASSERT_EQ(vmreadz(VM_EXIT_REASON), EXIT_REASON_VMCALL);
GUEST_DONE();
}
@@ -91,9 +124,9 @@ static void l1_svm_code(struct svm_test_data *svm)
generic_svm_setup(svm, l2_rip, &l2_guest_stack[L2_GUEST_STACK_SIZE]);
- GUEST_SYNC(false);
+ GUEST_SYNC(TEST_SYNC_NO_FAULT);
run_guest(svm->vmcb, svm->vmcb_gpa);
- GUEST_SYNC(false);
+ GUEST_SYNC(TEST_SYNC_NO_FAULT);
GUEST_ASSERT_EQ(svm->vmcb->control.exit_code, SVM_EXIT_VMMCALL);
GUEST_DONE();
}
@@ -106,12 +139,66 @@ static void l1_guest_code(void *data)
l1_svm_code(data);
}
+static void test_handle_ucall_sync(struct kvm_vm *vm, u64 arg,
+ unsigned long *bmap)
+{
+ vm_vaddr_t gva = arg & ~(PAGE_SIZE - 1);
+ int page_nr, i;
+
+ /*
> +	 * Extract the page number of the underlying physical page, which is also
+ * the _L1_ page number. The dirty bitmap _must_ be updated based on
+ * the L1 GPA, not L2 GPA, i.e. whether or not L2 used an aliased GPA
+ * (i.e. if TDP enabled for L2) is irrelevant with respect to the dirty
+ * bitmap and which underlying physical page is accessed.
+ *
+ * Note, gva will be '0' if there was no access, i.e. if the purpose of
+ * the sync is to verify all pages are clean.
+ */
+ if (!gva)
+ page_nr = 0;
+ else if (gva >= TEST_MEM_ALIAS_BASE)
+ page_nr = (gva - TEST_MEM_ALIAS_BASE) >> PAGE_SHIFT;
+ else
+ page_nr = (gva - TEST_MEM_BASE) >> PAGE_SHIFT;
+ TEST_ASSERT(page_nr == 0 || page_nr == 1,
+ "Test bug, unexpected frame number '%u' for arg = %lx", page_nr, arg);
+ TEST_ASSERT(gva || (arg & TEST_SYNC_NO_FAULT),
+ "Test bug, gva must be valid if a fault is expected");
+
+ kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap);
+
+ /*
+ * Check all pages to verify the correct physical page was modified (or
+ * not), and that all pages are clean/dirty as expected.
+ *
+ * If a fault of any kind is expected, the target page should be dirty
+ * as the Dirty bit is set in the gPTE. KVM should create a writable
+ * SPTE even on a read fault, *and* KVM must mark the GFN as dirty
+ * when doing so.
+ */
+ for (i = 0; i < TEST_MEM_PAGES; i++) {
+ if (i == page_nr && arg & TEST_SYNC_WRITE_FAULT)
+ TEST_ASSERT(*(u64 *)TEST_HVA(vm, i) == 1,
+ "Page %u incorrectly not written by guest", i);
+ else
+ TEST_ASSERT(*(u64 *)TEST_HVA(vm, i) == 0xaaaaaaaaaaaaaaaaULL,
+ "Page %u incorrectly written by guest", i);
+
+ if (i == page_nr && !(arg & TEST_SYNC_NO_FAULT))
+ TEST_ASSERT(test_bit(i, bmap),
+ "Page %u incorrectly reported clean on %s fault",
+ i, arg & TEST_SYNC_READ_FAULT ? "read" : "write");
+ else
+ TEST_ASSERT(!test_bit(i, bmap),
+ "Page %u incorrectly reported dirty", i);
+ }
+}
+
static void test_dirty_log(bool nested_tdp)
{
vm_vaddr_t nested_gva = 0;
unsigned long *bmap;
- uint64_t *host_test_mem;
-
struct kvm_vcpu *vcpu;
struct kvm_vm *vm;
struct ucall uc;
@@ -133,35 +220,50 @@ static void test_dirty_log(bool nested_tdp)
/* Add an extra memory slot for testing dirty logging */
vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
- GUEST_TEST_MEM,
+ TEST_MEM_BASE,
TEST_MEM_SLOT_INDEX,
TEST_MEM_PAGES,
KVM_MEM_LOG_DIRTY_PAGES);
/*
- * Add an identity map for GVA range [0xc0000000, 0xc0002000). This
+ * Add an identity map for GVA range [0xc0000000, 0xc0004000). This
* affects both L1 and L2. However...
*/
- virt_map(vm, GUEST_TEST_MEM, GUEST_TEST_MEM, TEST_MEM_PAGES);
+ virt_map(vm, TEST_MEM_BASE, TEST_MEM_BASE, TEST_MEM_PAGES);
/*
- * ... pages in the L2 GPA range [0xc0001000, 0xc0003000) will map to
- * 0xc0000000.
+ * ... pages in the L2 GPA ranges [0xc0001000, 0xc0002000) and
+ * [0xc0003000, 0xc0004000) will map to 0xc0000000 and 0xc0001000
+ * respectively.
*
* When TDP is disabled, the L2 guest code will still access the same L1
* GPAs as the TDP enabled case.
+ *
+ * Set the Dirty bit in the PTEs used by L2 so that KVM will create
+ * writable SPTEs when handling read faults (if the Dirty bit isn't
+ * set, KVM must intercept the next write to emulate the Dirty bit
+ * update).
*/
if (nested_tdp) {
+ vm_vaddr_t gva0 = TEST_GUEST_ADDR(TEST_MEM_ALIAS_BASE, 0);
+ vm_vaddr_t gva1 = TEST_GUEST_ADDR(TEST_MEM_ALIAS_BASE, 1);
+
tdp_identity_map_default_memslots(vm);
- tdp_map(vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
- tdp_map(vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
+ tdp_map(vm, gva0, TEST_GPA(0), PAGE_SIZE);
+ tdp_map(vm, gva1, TEST_GPA(1), PAGE_SIZE);
+
+ *tdp_get_pte(vm, gva0) |= PTE_DIRTY_MASK(&vm->stage2_mmu);
+ *tdp_get_pte(vm, gva1) |= PTE_DIRTY_MASK(&vm->stage2_mmu);
+ } else {
+ *vm_get_pte(vm, TEST_GVA(0)) |= PTE_DIRTY_MASK(&vm->mmu);
+ *vm_get_pte(vm, TEST_GVA(1)) |= PTE_DIRTY_MASK(&vm->mmu);
}
bmap = bitmap_zalloc(TEST_MEM_PAGES);
- host_test_mem = addr_gpa2hva(vm, GUEST_TEST_MEM);
while (!done) {
- memset(host_test_mem, 0xaa, TEST_MEM_PAGES * PAGE_SIZE);
+ memset(TEST_HVA(vm, 0), 0xaa, TEST_MEM_PAGES * PAGE_SIZE);
+
vcpu_run(vcpu);
TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
@@ -170,23 +272,7 @@ static void test_dirty_log(bool nested_tdp)
REPORT_GUEST_ASSERT(uc);
/* NOT REACHED */
case UCALL_SYNC:
- /*
- * The nested guest wrote at offset 0x1000 in the memslot, but the
- * dirty bitmap must be filled in according to L1 GPA, not L2.
- */
- kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap);
- if (uc.args[1]) {
- TEST_ASSERT(test_bit(0, bmap), "Page 0 incorrectly reported clean");
- TEST_ASSERT(host_test_mem[0] == 1, "Page 0 not written by guest");
- } else {
- TEST_ASSERT(!test_bit(0, bmap), "Page 0 incorrectly reported dirty");
- TEST_ASSERT(host_test_mem[0] == 0xaaaaaaaaaaaaaaaaULL, "Page 0 written by guest");
- }
-
- TEST_ASSERT(!test_bit(1, bmap), "Page 1 incorrectly reported dirty");
- TEST_ASSERT(host_test_mem[PAGE_SIZE / 8] == 0xaaaaaaaaaaaaaaaaULL, "Page 1 written by guest");
- TEST_ASSERT(!test_bit(2, bmap), "Page 2 incorrectly reported dirty");
- TEST_ASSERT(host_test_mem[PAGE_SIZE*2 / 8] == 0xaaaaaaaaaaaaaaaaULL, "Page 2 written by guest");
+ test_handle_ucall_sync(vm, uc.args[1], bmap);
break;
case UCALL_DONE:
done = true;
base-commit: 3cd487701a911d0e317bf31e79fe07bba5fa9995
--
* Re: [PATCH v4 21/21] KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU
2026-01-08 16:32 ` Sean Christopherson
@ 2026-01-08 18:01 ` Yosry Ahmed
2026-01-08 18:31 ` Sean Christopherson
2026-01-08 20:26 ` Yosry Ahmed
1 sibling, 1 reply; 39+ messages in thread
From: Yosry Ahmed @ 2026-01-08 18:01 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel
On Thu, Jan 08, 2026 at 08:32:44AM -0800, Sean Christopherson wrote:
> On Fri, Jan 02, 2026, Yosry Ahmed wrote:
> > On Tue, Dec 30, 2025 at 03:01:50PM -0800, Sean Christopherson wrote:
> > > WRITE_ONCE(*b, 1);
> > > - GUEST_SYNC(true);
> > > + GUEST_SYNC(2 | TEST_SYNC_WRITE_FAULT);
> > > WRITE_ONCE(*b, 1);
> > > - GUEST_SYNC(true);
> > > - GUEST_SYNC(false);
> > > + GUEST_SYNC(2 | TEST_SYNC_WRITE_FAULT);
> > > + READ_ONCE(*b);
> > > + GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
> > > + GUEST_SYNC(2 | TEST_SYNC_NO_FAULT);
> >
> > Instead of hardcoding 0 and 2 here, which IIUC correspond to the
> > physical addresses 0xc0000000 and 0xc0002000, as well as indices in
> > host_test_mem, can we make the overall definitions a bit more intuitive?
> >
> > For example:
> >
> > #define GUEST_GPA_START 0xc0000000
> > #define GUEST_PAGE1_IDX 0
> > #define GUEST_PAGE2_IDX 1
> > #define GUEST_GPA_PAGE1 (GUEST_GPA_START + GUEST_PAGE1_IDX * PAGE_SIZE)
> > #define GUEST_GPA_PAGE2 (GUEST_GPA_START + GUEST_PAGE2_IDX * PAGE_SIZE)
> >
> > /* Mapped to GUEST_GPA_PAGE1 and GUEST_GPA_PAGE2 */
> > #define GUEST_GVA_PAGE1 0xd0000000
> > #define GUEST_GVA_PAGE2 0xd0002000
> >
> > /* Mapped to GUEST_GPA_PAGE1 and GUEST_GPA_PAGE2 using TDP in L1 */
> > #define GUEST_GVA_NESTED_PAGE1 0xd0001000
> > #define GUEST_GVA_NESTED_PAGE2 0xd0003000
> >
> > Then in L2 code, we can explicitly take in the GVA of page1 and page2
> > and use the definitions above in the GUEST_SYNC() calls, for example:
> >
> > static void l2_guest_code(u64 *page1_gva, u64 *page2_gva)
> > {
> > READ_ONCE(*page1_gva);
> > GUEST_SYNC(GUEST_PAGE1_IDX | TEST_SYNC_READ_FAULT);
> > WRITE_ONCE(*page1_gva, 1);
> > GUEST_SYNC(GUEST_PAGE1_IDX | TEST_SYNC_WRITE_FAULT);
> > ...
> > }
> >
> > and we can explicitly read page1 and page2 from the host (instead of
> > using host_test_mem).
> >
> > Alternatively, we can pass in the guest GVA directly into GUEST_SYNC(),
> > and use the lower bits for TEST_SYNC_READ_FAULT, TEST_SYNC_WRITE_FAULT,
> > and TEST_SYNC_NO_FAULT.
> >
> > WDYT?
>
> I fiddled with this a bunch and came up with the below. It's more or less what
> you're suggesting, but instead of interleaving the aliases, it simply puts them
> at a higher base. That makes pulling the page frame number out of the GVA much
> cleaner, as it's simply arithmetic instead of weird masking and shifting magic.
>
> --
> From: Sean Christopherson <seanjc@google.com>
> Date: Wed, 7 Jan 2026 14:38:32 -0800
> Subject: [PATCH] KVM: selftests: Test READ=>WRITE dirty logging behavior for
> shadow MMU
>
> Update the nested dirty log test to validate KVM's handling of READ faults
> when dirty logging is enabled. Specifically, set the Dirty bit in the
> guest PTEs used to map L2 GPAs, so that KVM will create writable SPTEs
> when handling L2 read faults. When handling read faults in the shadow MMU,
> KVM opportunistically creates a writable SPTE if the mapping can be
> writable *and* the gPTE is dirty (or doesn't support the Dirty bit), i.e.
> if KVM doesn't need to intercept writes in order to emulate Dirty-bit
> updates.
>
> To actually test the L2 READ=>WRITE sequence, e.g. without masking a false
> pass by other test activity, route the READ=>WRITE and WRITE=>WRITE
> sequences to separate L1 pages, and differentiate between "marked dirty
> due to a WRITE access/fault" and "marked dirty due to creating a writable
> SPTE for a READ access/fault". The updated sequence exposes the bug fixed
> by KVM commit 1f4e5fc83a42 ("KVM: x86: fix nested guest live migration
> with PML") when the guest performs a READ=>WRITE sequence with dirty guest
> PTEs.
>
> Opportunistically tweak and rename the address macros, and add comments,
> to make it more obvious what the test is doing. E.g. NESTED_TEST_MEM1
> vs. GUEST_TEST_MEM doesn't make it all that obvious that the test is
> creating aliases in both the L2 GPA and GVA address spaces, but only when
> L1 is using TDP to run L2.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> .../selftests/kvm/include/x86/processor.h | 1 +
> .../testing/selftests/kvm/lib/x86/processor.c | 7 +
> .../selftests/kvm/x86/nested_dirty_log_test.c | 188 +++++++++++++-----
> 3 files changed, 145 insertions(+), 51 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
> index ab29b1c7ed2d..8945c9eea704 100644
> --- a/tools/testing/selftests/kvm/include/x86/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86/processor.h
> @@ -1483,6 +1483,7 @@ bool kvm_cpu_has_tdp(void);
> void tdp_map(struct kvm_vm *vm, uint64_t nested_paddr, uint64_t paddr, uint64_t size);
> void tdp_identity_map_default_memslots(struct kvm_vm *vm);
> void tdp_identity_map_1g(struct kvm_vm *vm, uint64_t addr, uint64_t size);
> +uint64_t *tdp_get_pte(struct kvm_vm *vm, uint64_t l2_gpa);
>
> /*
> * Basic CPU control in CR0
> diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
> index ab869a98bbdc..fab18e9be66c 100644
> --- a/tools/testing/selftests/kvm/lib/x86/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86/processor.c
> @@ -390,6 +390,13 @@ static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm,
> return virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
> }
>
> +uint64_t *tdp_get_pte(struct kvm_vm *vm, uint64_t l2_gpa)
nested_paddr is the name used by tdp_map(), maybe use that here as well
(and in the header)?
> +{
> + int level = PG_LEVEL_4K;
> +
> + return __vm_get_page_table_entry(vm, &vm->stage2_mmu, l2_gpa, &level);
> +}
> +
> uint64_t *vm_get_pte(struct kvm_vm *vm, uint64_t vaddr)
> {
> int level = PG_LEVEL_4K;
[..]
> @@ -133,35 +220,50 @@ static void test_dirty_log(bool nested_tdp)
>
> /* Add an extra memory slot for testing dirty logging */
> vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
> - GUEST_TEST_MEM,
> + TEST_MEM_BASE,
> TEST_MEM_SLOT_INDEX,
> TEST_MEM_PAGES,
> KVM_MEM_LOG_DIRTY_PAGES);
>
> /*
> - * Add an identity map for GVA range [0xc0000000, 0xc0002000). This
> + * Add an identity map for GVA range [0xc0000000, 0xc0004000). This
> * affects both L1 and L2. However...
> */
> - virt_map(vm, GUEST_TEST_MEM, GUEST_TEST_MEM, TEST_MEM_PAGES);
> + virt_map(vm, TEST_MEM_BASE, TEST_MEM_BASE, TEST_MEM_PAGES);
>
> /*
> - * ... pages in the L2 GPA range [0xc0001000, 0xc0003000) will map to
> - * 0xc0000000.
> + * ... pages in the L2 GPA ranges [0xc0001000, 0xc0002000) and
> + * [0xc0003000, 0xc0004000) will map to 0xc0000000 and 0xc0001000
> + * respectively.
Are these ranges correct? I thought L2 GPA range [0xc0002000,
0xc0004000) will map to [0xc0000000, 0xc0002000).
Also, perhaps it's better to express those in terms of the macros?
L2 GPA range [TEST_MEM_ALIAS_BASE, TEST_MEM_ALIAS_BASE + 2*PAGE_SIZE)
will map to [TEST_MEM_BASE, TEST_MEM_BASE + 2*PAGE_SIZE)?
> *
> * When TDP is disabled, the L2 guest code will still access the same L1
> * GPAs as the TDP enabled case.
> + *
> + * Set the Dirty bit in the PTEs used by L2 so that KVM will create
> + * writable SPTEs when handling read faults (if the Dirty bit isn't
> + * set, KVM must intercept the next write to emulate the Dirty bit
> + * update).
> */
> if (nested_tdp) {
> + vm_vaddr_t gva0 = TEST_GUEST_ADDR(TEST_MEM_ALIAS_BASE, 0);
> + vm_vaddr_t gva1 = TEST_GUEST_ADDR(TEST_MEM_ALIAS_BASE, 1);
Why are these gvas? Should these be L2 GPAs?
Maybe 'uint64_t l2_gpa0' or 'uint64_t nested_paddr0'?
Also maybe add TEST_ALIAS_GPA() macro to keep things consistent?
> +
> tdp_identity_map_default_memslots(vm);
> - tdp_map(vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, PAGE_SIZE);
> - tdp_map(vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, PAGE_SIZE);
> + tdp_map(vm, gva0, TEST_GPA(0), PAGE_SIZE);
> + tdp_map(vm, gva1, TEST_GPA(1), PAGE_SIZE);
> +
> + *tdp_get_pte(vm, gva0) |= PTE_DIRTY_MASK(&vm->stage2_mmu);
> + *tdp_get_pte(vm, gva1) |= PTE_DIRTY_MASK(&vm->stage2_mmu);
> + } else {
> + *vm_get_pte(vm, TEST_GVA(0)) |= PTE_DIRTY_MASK(&vm->mmu);
> + *vm_get_pte(vm, TEST_GVA(1)) |= PTE_DIRTY_MASK(&vm->mmu);
> }
>
> bmap = bitmap_zalloc(TEST_MEM_PAGES);
> - host_test_mem = addr_gpa2hva(vm, GUEST_TEST_MEM);
>
> while (!done) {
> - memset(host_test_mem, 0xaa, TEST_MEM_PAGES * PAGE_SIZE);
> + memset(TEST_HVA(vm, 0), 0xaa, TEST_MEM_PAGES * PAGE_SIZE);
> +
> vcpu_run(vcpu);
> TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
>
[..]
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [PATCH v4 21/21] KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU
2026-01-08 18:01 ` Yosry Ahmed
@ 2026-01-08 18:31 ` Sean Christopherson
2026-01-08 20:24 ` Yosry Ahmed
0 siblings, 1 reply; 39+ messages in thread
From: Sean Christopherson @ 2026-01-08 18:31 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel
On Thu, Jan 08, 2026, Yosry Ahmed wrote:
> On Thu, Jan 08, 2026 at 08:32:44AM -0800, Sean Christopherson wrote:
> > On Fri, Jan 02, 2026, Yosry Ahmed wrote:
> > diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
> > index ab869a98bbdc..fab18e9be66c 100644
> > --- a/tools/testing/selftests/kvm/lib/x86/processor.c
> > +++ b/tools/testing/selftests/kvm/lib/x86/processor.c
> > @@ -390,6 +390,13 @@ static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm,
> > return virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
> > }
> >
> > +uint64_t *tdp_get_pte(struct kvm_vm *vm, uint64_t l2_gpa)
>
> nested_paddr is the name used by tdp_map(), maybe use that here as well
> (and in the header)?
Oh hell no :-) nested_paddr is a terrible name (I was *very* tempted to change
it on the fly, but restrained myself). "nested" is far too ambiguous, e.g. without
nested virtualization, "nested_paddr" arguably refers to _L1_ physical addresses
(SVM called 'em Nested Page Tables after all).
> > + int level = PG_LEVEL_4K;
> > +
> > + return __vm_get_page_table_entry(vm, &vm->stage2_mmu, l2_gpa, &level);
> > +}
> > +
> > uint64_t *vm_get_pte(struct kvm_vm *vm, uint64_t vaddr)
> > {
> > int level = PG_LEVEL_4K;
> [..]
> > @@ -133,35 +220,50 @@ static void test_dirty_log(bool nested_tdp)
> >
> > /* Add an extra memory slot for testing dirty logging */
> > vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
> > - GUEST_TEST_MEM,
> > + TEST_MEM_BASE,
> > TEST_MEM_SLOT_INDEX,
> > TEST_MEM_PAGES,
> > KVM_MEM_LOG_DIRTY_PAGES);
> >
> > /*
> > - * Add an identity map for GVA range [0xc0000000, 0xc0002000). This
> > + * Add an identity map for GVA range [0xc0000000, 0xc0004000). This
> > * affects both L1 and L2. However...
> > */
> > - virt_map(vm, GUEST_TEST_MEM, GUEST_TEST_MEM, TEST_MEM_PAGES);
> > + virt_map(vm, TEST_MEM_BASE, TEST_MEM_BASE, TEST_MEM_PAGES);
> >
> > /*
> > - * ... pages in the L2 GPA range [0xc0001000, 0xc0003000) will map to
> > - * 0xc0000000.
> > + * ... pages in the L2 GPA ranges [0xc0001000, 0xc0002000) and
> > + * [0xc0003000, 0xc0004000) will map to 0xc0000000 and 0xc0001000
> > + * respectively.
>
> Are these ranges correct? I thought L2 GPA range [0xc0002000,
> 0xc0004000) will map to [0xc0000000, 0xc0002000).
Gah, no. I looked at the comments after changing things around, but my eyes had
glazed over by that point.
> Also, perhaps it's better to express those in terms of the macros?
>
> L2 GPA range [TEST_MEM_ALIAS_BASE, TEST_MEM_ALIAS_BASE + 2*PAGE_SIZE)
> will map to [TEST_MEM_BASE, TEST_MEM_BASE + 2*PAGE_SIZE)?
Hmm, no, at some point we need to concretely state the addresses, so that people
debugging this know what to expect, i.e. don't have to manually compute the
addresses from the macros in order to debug.
> > *
> > * When TDP is disabled, the L2 guest code will still access the same L1
> > * GPAs as the TDP enabled case.
> > + *
> > + * Set the Dirty bit in the PTEs used by L2 so that KVM will create
> > + * writable SPTEs when handling read faults (if the Dirty bit isn't
> > + * set, KVM must intercept the next write to emulate the Dirty bit
> > + * update).
> > */
> > if (nested_tdp) {
> > + vm_vaddr_t gva0 = TEST_GUEST_ADDR(TEST_MEM_ALIAS_BASE, 0);
> > + vm_vaddr_t gva1 = TEST_GUEST_ADDR(TEST_MEM_ALIAS_BASE, 1);
>
> Why are these gvas? Should these be L2 GPAs?
Pure oversight.
> Maybe 'uint64_t l2_gpa0' or 'uint64_t nested_paddr0'?
For better or worse, vm_paddr_t is the typedef in selftests. Hmm, if/when we go
with David M's proposal to switch to u64 (from e.g. uint64_t), it'd probably be
a good time to switch to KVM's gva_t and gpa_t as well.
> Also maybe add TEST_ALIAS_GPA() macro to keep things consistent?
Ya, then the line lengths are short enough to omit the local variables. How's
this look?
/*
* ... pages in the L2 GPA address range [0xc0002000, 0xc0004000) will
* map to [0xc0000000, 0xc0002000) when TDP is enabled (for L2).
*
* When TDP is disabled, the L2 guest code will still access the same L1
* GPAs as the TDP enabled case.
*
* Set the Dirty bit in the PTEs used by L2 so that KVM will create
* writable SPTEs when handling read faults (if the Dirty bit isn't
* set, KVM must intercept the next write to emulate the Dirty bit
* update).
*/
if (nested_tdp) {
tdp_identity_map_default_memslots(vm);
tdp_map(vm, TEST_ALIAS_GPA(0), TEST_GPA(0), PAGE_SIZE);
tdp_map(vm, TEST_ALIAS_GPA(1), TEST_GPA(1), PAGE_SIZE);
*tdp_get_pte(vm, TEST_ALIAS_GPA(0)) |= PTE_DIRTY_MASK(&vm->stage2_mmu);
*tdp_get_pte(vm, TEST_ALIAS_GPA(1)) |= PTE_DIRTY_MASK(&vm->stage2_mmu);
} else {
*vm_get_pte(vm, TEST_GVA(0)) |= PTE_DIRTY_MASK(&vm->mmu);
*vm_get_pte(vm, TEST_GVA(1)) |= PTE_DIRTY_MASK(&vm->mmu);
}
* Re: [PATCH v4 21/21] KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU
2026-01-08 18:31 ` Sean Christopherson
@ 2026-01-08 20:24 ` Yosry Ahmed
2026-01-08 20:29 ` Sean Christopherson
0 siblings, 1 reply; 39+ messages in thread
From: Yosry Ahmed @ 2026-01-08 20:24 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel
On Thu, Jan 08, 2026 at 10:31:22AM -0800, Sean Christopherson wrote:
> On Thu, Jan 08, 2026, Yosry Ahmed wrote:
> > On Thu, Jan 08, 2026 at 08:32:44AM -0800, Sean Christopherson wrote:
> > > On Fri, Jan 02, 2026, Yosry Ahmed wrote:
> > > diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
> > > index ab869a98bbdc..fab18e9be66c 100644
> > > --- a/tools/testing/selftests/kvm/lib/x86/processor.c
> > > +++ b/tools/testing/selftests/kvm/lib/x86/processor.c
> > > @@ -390,6 +390,13 @@ static uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm,
> > > return virt_get_pte(vm, mmu, pte, vaddr, PG_LEVEL_4K);
> > > }
> > >
> > > +uint64_t *tdp_get_pte(struct kvm_vm *vm, uint64_t l2_gpa)
> >
> > nested_paddr is the name used by tdp_map(), maybe use that here as well
> > (and in the header)?
>
> Oh hell no :-) nested_paddr is a terrible name (I was *very* tempted to change
> it on the fly, but restrained myself). "nested" is far too ambiguous, e.g. without
> nested virtualization, "nested_paddr" arguably refers to _L1_ physical addresses
> (SVM called 'em Nested Page Tables after all).
That's fair, I generally like consistency to a fault :)
>
> > > + int level = PG_LEVEL_4K;
> > > +
> > > + return __vm_get_page_table_entry(vm, &vm->stage2_mmu, l2_gpa, &level);
> > > +}
> > > +
> > > uint64_t *vm_get_pte(struct kvm_vm *vm, uint64_t vaddr)
> > > {
> > > int level = PG_LEVEL_4K;
> > [..]
> > > @@ -133,35 +220,50 @@ static void test_dirty_log(bool nested_tdp)
> > >
> > > /* Add an extra memory slot for testing dirty logging */
> > > vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
> > > - GUEST_TEST_MEM,
> > > + TEST_MEM_BASE,
> > > TEST_MEM_SLOT_INDEX,
> > > TEST_MEM_PAGES,
> > > KVM_MEM_LOG_DIRTY_PAGES);
> > >
> > > /*
> > > - * Add an identity map for GVA range [0xc0000000, 0xc0002000). This
> > > + * Add an identity map for GVA range [0xc0000000, 0xc0004000). This
> > > * affects both L1 and L2. However...
> > > */
> > > - virt_map(vm, GUEST_TEST_MEM, GUEST_TEST_MEM, TEST_MEM_PAGES);
> > > + virt_map(vm, TEST_MEM_BASE, TEST_MEM_BASE, TEST_MEM_PAGES);
> > >
> > > /*
> > > - * ... pages in the L2 GPA range [0xc0001000, 0xc0003000) will map to
> > > - * 0xc0000000.
> > > + * ... pages in the L2 GPA ranges [0xc0001000, 0xc0002000) and
> > > + * [0xc0003000, 0xc0004000) will map to 0xc0000000 and 0xc0001000
> > > + * respectively.
> >
> > Are these ranges correct? I thought L2 GPA range [0xc0002000,
> > 0xc0004000) will map to [0xc0000000, 0xc0002000).
>
> Gah, no. I looked at the comments after changing things around, but my eyes had
> glazed over by that point.
>
> > Also, perhaps it's better to express those in terms of the macros?
> >
> > L2 GPA range [TEST_MEM_ALIAS_BASE, TEST_MEM_ALIAS_BASE + 2*PAGE_SIZE)
> > will map to [TEST_MEM_BASE, TEST_MEM_BASE + 2*PAGE_SIZE)?
>
> Hmm, no, at some point we need to concretely state the addresses, so that people
> debugging this know what to expect, i.e. don't have to manually compute the
> addresses from the macros in order to debug.
I was trying to avoid a situation where the comment gets out of sync
with the macros in a way that gets confusing. Maybe reference both if
it's not too verbose?
/*
* ... pages in the L2 GPA range [0xc0002000, 0xc0004000) at
* TEST_MEM_ALIAS_BASE will map to [0xc0000000, 0xc0002000) at
* TEST_MEM_BASE.
*/
>
> > > *
> > > * When TDP is disabled, the L2 guest code will still access the same L1
> > > * GPAs as the TDP enabled case.
> > > + *
> > > + * Set the Dirty bit in the PTEs used by L2 so that KVM will create
> > > + * writable SPTEs when handling read faults (if the Dirty bit isn't
> > > + * set, KVM must intercept the next write to emulate the Dirty bit
> > > + * update).
> > > */
> > > if (nested_tdp) {
> > > + vm_vaddr_t gva0 = TEST_GUEST_ADDR(TEST_MEM_ALIAS_BASE, 0);
> > > + vm_vaddr_t gva1 = TEST_GUEST_ADDR(TEST_MEM_ALIAS_BASE, 1);
> >
> > Why are these gvas? Should these be L2 GPAs?
>
> Pure oversight.
>
> > Maybe 'uint64_t l2_gpa0' or 'uint64_t nested_paddr0'?
>
> For better or worse, vm_paddr_t is the typedef in selftests. Hmm, if/when we go
> with David M's proposal to switch to u64 (from e.g. uint64_t), it'd probably be
> a good time to switch to KVM's gva_t and gpa_t as well.
vm_paddr_t is fine too, I am just against using vm_vaddr_t. tdp_map()
takes in uint64_t for the GPAs, which is why I suggested uint64_t here.
>
> > Also maybe add TEST_ALIAS_GPA() macro to keep things consistent?
>
> Ya, then the line lengths are short enough to omit the local variables. How's
> this look?
Looks good, thanks!
>
> /*
> * ... pages in the L2 GPA address range [0xc0002000, 0xc0004000) will
> * map to [0xc0000000, 0xc0002000) when TDP is enabled (for L2).
> *
> * When TDP is disabled, the L2 guest code will still access the same L1
> * GPAs as the TDP enabled case.
> *
> * Set the Dirty bit in the PTEs used by L2 so that KVM will create
> * writable SPTEs when handling read faults (if the Dirty bit isn't
> * set, KVM must intercept the next write to emulate the Dirty bit
> * update).
> */
> if (nested_tdp) {
> tdp_identity_map_default_memslots(vm);
> tdp_map(vm, TEST_ALIAS_GPA(0), TEST_GPA(0), PAGE_SIZE);
> tdp_map(vm, TEST_ALIAS_GPA(1), TEST_GPA(1), PAGE_SIZE);
>
> *tdp_get_pte(vm, TEST_ALIAS_GPA(0)) |= PTE_DIRTY_MASK(&vm->stage2_mmu);
> *tdp_get_pte(vm, TEST_ALIAS_GPA(1)) |= PTE_DIRTY_MASK(&vm->stage2_mmu);
> } else {
> *vm_get_pte(vm, TEST_GVA(0)) |= PTE_DIRTY_MASK(&vm->mmu);
> *vm_get_pte(vm, TEST_GVA(1)) |= PTE_DIRTY_MASK(&vm->mmu);
> }
* Re: [PATCH v4 21/21] KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU
2026-01-08 16:32 ` Sean Christopherson
2026-01-08 18:01 ` Yosry Ahmed
@ 2026-01-08 20:26 ` Yosry Ahmed
1 sibling, 0 replies; 39+ messages in thread
From: Yosry Ahmed @ 2026-01-08 20:26 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel
On Thu, Jan 08, 2026 at 08:32:44AM -0800, Sean Christopherson wrote:
[..]
> @@ -106,12 +139,66 @@ static void l1_guest_code(void *data)
> l1_svm_code(data);
> }
>
> +static void test_handle_ucall_sync(struct kvm_vm *vm, u64 arg,
> + unsigned long *bmap)
> +{
> + vm_vaddr_t gva = arg & ~(PAGE_SIZE - 1);
> + int page_nr, i;
> +
> + /*
> + * Extract the page number of underlying physical page, which is also
> + * the _L1_ page number. The dirty bitmap _must_ be updated based on
> + * the L1 GPA, not L2 GPA, i.e. whether or not L2 used an aliased GPA
> + * (i.e. if TDP enabled for L2) is irrelevant with respect to the dirty
> + * bitmap and which underlying physical page is accessed.
> + *
> + * Note, gva will be '0' if there was no access, i.e. if the purpose of
> + * the sync is to verify all pages are clean.
> + */
> + if (!gva)
> + page_nr = 0;
> + else if (gva >= TEST_MEM_ALIAS_BASE)
> + page_nr = (gva - TEST_MEM_ALIAS_BASE) >> PAGE_SHIFT;
> + else
> + page_nr = (gva - TEST_MEM_BASE) >> PAGE_SHIFT;
> + TEST_ASSERT(page_nr == 0 || page_nr == 1,
> + "Test bug, unexpected frame number '%u' for arg = %lx", page_nr, arg);
> + TEST_ASSERT(gva || (arg & TEST_SYNC_NO_FAULT),
> + "Test bug, gva must be valid if a fault is expected");
> +
> + kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap);
> +
> + /*
> + * Check all pages to verify the correct physical page was modified (or
> + * not), and that all pages are clean/dirty as expected.
> + *
> + * If a fault of any kind is expected, the target page should be dirty
> + * as the Dirty bit is set in the gPTE. KVM should create a writable
> + * SPTE even on a read fault, *and* KVM must mark the GFN as dirty
> + * when doing so.
> + */
> + for (i = 0; i < TEST_MEM_PAGES; i++) {
> + if (i == page_nr && arg & TEST_SYNC_WRITE_FAULT)
Micro nit: I think this is slightly clearer:
if (i == page_nr && (arg & TEST_SYNC_WRITE_FAULT))
* Re: [PATCH v4 21/21] KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU
2026-01-08 20:24 ` Yosry Ahmed
@ 2026-01-08 20:29 ` Sean Christopherson
2026-01-08 20:33 ` Yosry Ahmed
0 siblings, 1 reply; 39+ messages in thread
From: Sean Christopherson @ 2026-01-08 20:29 UTC (permalink / raw)
To: Yosry Ahmed
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel
On Thu, Jan 08, 2026, Yosry Ahmed wrote:
> On Thu, Jan 08, 2026 at 10:31:22AM -0800, Sean Christopherson wrote:
> > On Thu, Jan 08, 2026, Yosry Ahmed wrote:
> > > > /*
> > > > - * Add an identity map for GVA range [0xc0000000, 0xc0002000). This
> > > > + * Add an identity map for GVA range [0xc0000000, 0xc0004000). This
> > > > * affects both L1 and L2. However...
> > > > */
> > > > - virt_map(vm, GUEST_TEST_MEM, GUEST_TEST_MEM, TEST_MEM_PAGES);
> > > > + virt_map(vm, TEST_MEM_BASE, TEST_MEM_BASE, TEST_MEM_PAGES);
> > > >
> > > > /*
> > > > - * ... pages in the L2 GPA range [0xc0001000, 0xc0003000) will map to
> > > > - * 0xc0000000.
> > > > + * ... pages in the L2 GPA ranges [0xc0001000, 0xc0002000) and
> > > > + * [0xc0003000, 0xc0004000) will map to 0xc0000000 and 0xc0001000
> > > > + * respectively.
> > >
> > > Are these ranges correct? I thought L2 GPA range [0xc0002000,
> > > 0xc0004000) will map to [0xc0000000, 0xc0002000).
> >
> > Gah, no. I looked at the comments after changing things around, but my eyes had
> > glazed over by that point.
> >
> > > Also, perhaps it's better to express those in terms of the macros?
> > >
> > > L2 GPA range [TEST_MEM_ALIAS_BASE, TEST_MEM_ALIAS_BASE + 2*PAGE_SIZE)
> > > will map to [TEST_MEM_BASE, TEST_MEM_BASE + 2*PAGE_SIZE)?
> >
> > Hmm, no, at some point we need to concretely state the addresses, so that people
> > debugging this know what to expect, i.e. don't have to manually compute the
> > addresses from the macros in order to debug.
>
> I was trying to avoid a situation where the comment gets out of sync
> with the macros in a way that gets confusing. Maybe reference both if
> it's not too verbose?
>
> /*
> * ... pages in the L2 GPA range [0xc0002000, 0xc0004000) at
> * TEST_MEM_ALIAS_BASE will map to [[0xc0000000, 0xc0002000) at
> * TEST_MEM_BASE.
> */
Heh, your solution to mitigate a comment getting out of sync is to add more
things to the comment that can get out of sync :-D
Unless you feel very strongly about having the names of the macros in the comments,
I'd prefer to keep just the raw addresses.
* Re: [PATCH v4 21/21] KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU
2026-01-08 20:29 ` Sean Christopherson
@ 2026-01-08 20:33 ` Yosry Ahmed
0 siblings, 0 replies; 39+ messages in thread
From: Yosry Ahmed @ 2026-01-08 20:33 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Tianrui Zhao, Bibo Mao,
Huacai Chen, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
Christian Borntraeger, Janosch Frank, Claudio Imbrenda, kvm,
linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel
On Thu, Jan 08, 2026 at 12:29:09PM -0800, Sean Christopherson wrote:
> On Thu, Jan 08, 2026, Yosry Ahmed wrote:
> > On Thu, Jan 08, 2026 at 10:31:22AM -0800, Sean Christopherson wrote:
> > > On Thu, Jan 08, 2026, Yosry Ahmed wrote:
> > > > > /*
> > > > > - * Add an identity map for GVA range [0xc0000000, 0xc0002000). This
> > > > > + * Add an identity map for GVA range [0xc0000000, 0xc0004000). This
> > > > > * affects both L1 and L2. However...
> > > > > */
> > > > > - virt_map(vm, GUEST_TEST_MEM, GUEST_TEST_MEM, TEST_MEM_PAGES);
> > > > > + virt_map(vm, TEST_MEM_BASE, TEST_MEM_BASE, TEST_MEM_PAGES);
> > > > >
> > > > > /*
> > > > > - * ... pages in the L2 GPA range [0xc0001000, 0xc0003000) will map to
> > > > > - * 0xc0000000.
> > > > > + * ... pages in the L2 GPA ranges [0xc0001000, 0xc0002000) and
> > > > > + * [0xc0003000, 0xc0004000) will map to 0xc0000000 and 0xc0001000
> > > > > + * respectively.
> > > >
> > > > Are these ranges correct? I thought L2 GPA range [0xc0002000,
> > > > 0xc0004000) will map to [0xc0000000, 0xc0002000).
> > >
> > > Gah, no. I looked at the comments after changing things around, but my eyes had
> > > glazed over by that point.
> > >
> > > > Also, perhaps it's better to express those in terms of the macros?
> > > >
> > > > L2 GPA range [TEST_MEM_ALIAS_BASE, TEST_MEM_ALIAS_BASE + 2*PAGE_SIZE)
> > > > will map to [TEST_MEM_BASE, TEST_MEM_BASE + 2*PAGE_SIZE)?
> > >
> > > Hmm, no, at some point we need to concretely state the addresses, so that people
> > > debugging this know what to expect, i.e. don't have to manually compute the
> > > addresses from the macros in order to debug.
> >
> > I was trying to avoid a situation where the comment gets out of sync
> > with the macros in a way that gets confusing. Maybe reference both if
> > it's not too verbose?
> >
> > /*
> > * ... pages in the L2 GPA range [0xc0002000, 0xc0004000) at
> > * TEST_MEM_ALIAS_BASE will map to [0xc0000000, 0xc0002000) at
> > * TEST_MEM_BASE.
> > */
>
> Heh, your solution to mitigate a comment getting out of sync is to add more
> things to the comment that can get out of sync :-D
>
> Unless you feel very strongly about having the names of the macros in the comments,
> I'd prefer to keep just the raw addresses.
I don't feel strongly :)
* Re: [PATCH v4 00/21] KVM: selftests: Add Nested NPT support
2025-12-30 23:01 [PATCH v4 00/21] KVM: selftests: Add Nested NPT support Sean Christopherson
` (20 preceding siblings ...)
2025-12-30 23:01 ` [PATCH v4 21/21] KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU Sean Christopherson
@ 2026-01-12 17:38 ` Sean Christopherson
21 siblings, 0 replies; 39+ messages in thread
From: Sean Christopherson @ 2026-01-12 17:38 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Marc Zyngier, Oliver Upton,
Tianrui Zhao, Bibo Mao, Huacai Chen, Anup Patel, Paul Walmsley,
Palmer Dabbelt, Albert Ou, Christian Borntraeger, Janosch Frank,
Claudio Imbrenda
Cc: kvm, linux-arm-kernel, kvmarm, loongarch, kvm-riscv, linux-riscv,
linux-kernel, Yosry Ahmed
On Tue, 30 Dec 2025 15:01:29 -0800, Sean Christopherson wrote:
> Yosry's series to add support for nested NPT, and extends vmx_dirty_log_test
> and kvm_dirty_log_test (with -n, using memstress) to cover nested SVM.
>
> Note, I'm mildly concerned the last patch to extend nested_dirty_log_test to
> validate KVM's handling of READ faults could be flaky, e.g. maybe if someone
> is running the test under heavy memory pressure and the to-be-accessed page is
> swapped between the write-from-host and read-from-guest? But unless someone
> knows/shows it'll be flaky, I'm inclined to apply it and hope for the best.
>
> [...]
Applied everything except the last patch to kvm-x86 selftests. I'll post a new
version separately since we've had a lot of back-and-forth on that one.
Oh, and I fixed up the ncr3_gpa goof in "Add support for nested NPTs".
[01/21] KVM: selftests: Make __vm_get_page_table_entry() static
https://github.com/kvm-x86/linux/commit/69e81ed5e6a5
[02/21] KVM: selftests: Stop passing a memslot to nested_map_memslot()
https://github.com/kvm-x86/linux/commit/97dfbdfea405
[03/21] KVM: selftests: Rename nested TDP mapping functions
https://github.com/kvm-x86/linux/commit/60de423781ad
[04/21] KVM: selftests: Kill eptPageTablePointer
https://github.com/kvm-x86/linux/commit/b320c03d6857
[05/21] KVM: selftests: Stop setting A/D bits when creating EPT PTEs
https://github.com/kvm-x86/linux/commit/3cd5002807be
[06/21] KVM: selftests: Add "struct kvm_mmu" to track a given MMU instance
https://github.com/kvm-x86/linux/commit/9f073ac25b4c
[07/21] KVM: selftests: Plumb "struct kvm_mmu" into x86's MMU APIs
https://github.com/kvm-x86/linux/commit/11825209f549
[08/21] KVM: selftests: Add a "struct kvm_mmu_arch arch" member to kvm_mmu
https://github.com/kvm-x86/linux/commit/3d0e7595e810
[09/21] KVM: selftests: Move PTE bitmasks to kvm_mmu
https://github.com/kvm-x86/linux/commit/6dd70757213f
[10/21] KVM: selftests: Use a TDP MMU to share EPT page tables between vCPUs
https://github.com/kvm-x86/linux/commit/f00f519cebcd
[11/21] KVM: selftests: Stop passing VMX metadata to TDP mapping functions
https://github.com/kvm-x86/linux/commit/e40e72fec0de
[12/21] KVM: selftests: Add a stage-2 MMU instance to kvm_vm
https://github.com/kvm-x86/linux/commit/8296b16c0a2b
[13/21] KVM: selftests: Reuse virt mapping functions for nested EPTs
https://github.com/kvm-x86/linux/commit/508d1cc3ca0a
[14/21] KVM: selftests: Move TDP mapping functions outside of vmx.c
https://github.com/kvm-x86/linux/commit/07676c04bd75
[15/21] KVM: selftests: Allow kvm_cpu_has_ept() to be called on AMD CPUs
https://github.com/kvm-x86/linux/commit/9cb1944f6bf0
[16/21] KVM: selftests: Add support for nested NPTs
https://github.com/kvm-x86/linux/commit/753c0d5a507b
[17/21] KVM: selftests: Set the user bit on nested NPT PTEs
https://github.com/kvm-x86/linux/commit/251e4849a79b
[18/21] KVM: selftests: Extend vmx_dirty_log_test to cover SVM
https://github.com/kvm-x86/linux/commit/6794d916f87e
[19/21] KVM: selftests: Extend memstress to run on nested SVM
https://github.com/kvm-x86/linux/commit/59eef1a47b8c
[20/21] KVM: selftests: Rename vm_get_page_table_entry() to vm_get_pte()
https://github.com/kvm-x86/linux/commit/e353850499c7
[21/21] KVM: selftests: Test READ=>WRITE dirty logging behavior for shadow MMU
*** NOT APPLIED ***
--
https://github.com/kvm-x86/linux/tree/next