* [PATCH v2 0/4] KVM ARM64 pre_fault_memory
@ 2025-10-13 15:14 Jack Thomson
From: Jack Thomson @ 2025-10-13 15:14 UTC
To: maz, oliver.upton, pbonzini
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
shuah, linux-arm-kernel, kvmarm, linux-kernel, linux-kselftest,
isaku.yamahata, roypat, kalyazin, jackabt
From: Jack Thomson <jackabt@amazon.com>
This patch series adds arm64 support for the KVM_PRE_FAULT_MEMORY
feature, which was previously only available on x86 [1]. Pre-faulting
the stage-2 page tables reduces the number of stage-2 faults taken
during guest execution. This benefits post-copy migration scenarios,
particularly for memory-intensive applications, where stage-2 faults
cause high latencies.
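For reference, userspace drives pre-faulting via the KVM_PRE_FAULT_MEMORY
vCPU ioctl. Below is a minimal sketch of the expected usage, based on the
existing x86 documentation [1] (error handling trimmed; the vCPU fd and
the range are assumed to be set up by the caller):

  #include <errno.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Pre-fault [gpa, gpa + size) on vcpu_fd before resuming the guest. */
  static int pre_fault(int vcpu_fd, __u64 gpa, __u64 size)
  {
          struct kvm_pre_fault_memory range = {
                  .gpa = gpa,
                  .size = size,
          };

          /*
           * KVM updates range.gpa/range.size as it makes progress, so
           * simply re-invoke the ioctl until the whole range is consumed.
           */
          while (range.size) {
                  if (ioctl(vcpu_fd, KVM_PRE_FAULT_MEMORY, &range) &&
                      errno != EINTR && errno != EAGAIN)
                          return -errno;
          }
          return 0;
  }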
Patch Overview:
- The first patch adds support for the KVM_PRE_FAULT_MEMORY ioctl
on arm64.
- The second patch fixes an issue with unaligned mmap allocations
in the selftests.
- The third patch updates the pre_fault_memory_test to support
arm64.
- The last patch extends the pre_fault_memory_test to cover
different vm memory backings.
=== Changes Since v1 [2] ===
Addressing feedback from Oliver:
- No pre-fault flag is passed to user_mem_abort() or gmem_abort() now
that aborts are synthesized.
- Removed the retry loop from kvm_arch_vcpu_pre_fault_memory()
[1]: https://lore.kernel.org/kvm/20240710174031.312055-1-pbonzini@redhat.com
[2]: https://lore.kernel.org/all/20250911134648.58945-1-jackabt.amazon@gmail.com
Jack Thomson (4):
KVM: arm64: Add pre_fault_memory implementation
KVM: selftests: Fix unaligned mmap allocations
KVM: selftests: Enable pre_fault_memory_test for arm64
KVM: selftests: Add option for different backing in pre-fault tests
Documentation/virt/kvm/api.rst | 3 +-
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/mmu.c | 73 +++++++++++-
tools/testing/selftests/kvm/Makefile.kvm | 1 +
tools/testing/selftests/kvm/lib/kvm_util.c | 12 +-
.../selftests/kvm/pre_fault_memory_test.c | 110 +++++++++++++-----
7 files changed, 163 insertions(+), 38 deletions(-)
base-commit: 42188667be387867d2bf763d028654cbad046f7b
--
2.43.0
* [PATCH v2 1/4] KVM: arm64: Add pre_fault_memory implementation
From: Jack Thomson @ 2025-10-13 15:14 UTC
To: maz, oliver.upton, pbonzini
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
shuah, linux-arm-kernel, kvmarm, linux-kernel, linux-kselftest,
isaku.yamahata, roypat, kalyazin, jackabt
From: Jack Thomson <jackabt@amazon.com>
Add kvm_arch_vcpu_pre_fault_memory() for arm64. The implementation hands
off the stage-2 faulting logic to either gmem_abort() or
user_mem_abort().
Add an optional page_size output parameter to user_mem_abort() to
return the VMA page size, which is needed when pre-faulting.
Update the documentation to clarify x86-specific behaviour.
Signed-off-by: Jack Thomson <jackabt@amazon.com>
---
Documentation/virt/kvm/api.rst | 3 +-
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/mmu.c | 73 ++++++++++++++++++++++++++++++++--
4 files changed, 73 insertions(+), 5 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index c17a87a0a5ac..9e8cc4eb505d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6461,7 +6461,8 @@ Errors:
KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory
for the current vCPU state. KVM maps memory as if the vCPU generated a
stage-2 read page fault, e.g. faults in memory as needed, but doesn't break
-CoW. However, KVM does not mark any newly created stage-2 PTE as Accessed.
+CoW. However, on x86, KVM does not mark any newly created stage-2 PTE as
+Accessed.
In the case of confidential VM types where there is an initial set up of
private guest memory before the guest is 'finalized'/measured, this ioctl
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index bff62e75d681..1ac0605f86cb 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -25,6 +25,7 @@ menuconfig KVM
select HAVE_KVM_CPU_RELAX_INTERCEPT
select KVM_MMIO
select KVM_GENERIC_DIRTYLOG_READ_PROTECT
+ select KVM_GENERIC_PRE_FAULT_MEMORY
select KVM_XFER_TO_GUEST_WORK
select KVM_VFIO
select HAVE_KVM_DIRTY_RING_ACQ_REL
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 888f7c7abf54..65654a742864 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -322,6 +322,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_IRQFD_RESAMPLE:
case KVM_CAP_COUNTER_OFFSET:
case KVM_CAP_ARM_WRITABLE_IMP_ID_REGS:
+ case KVM_CAP_PRE_FAULT_MEMORY:
r = 1;
break;
case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index a36426ccd9b5..82f122e4b08c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1597,8 +1597,8 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_s2_trans *nested,
- struct kvm_memory_slot *memslot, unsigned long hva,
- bool fault_is_perm)
+ struct kvm_memory_slot *memslot, long *page_size,
+ unsigned long hva, bool fault_is_perm)
{
int ret = 0;
bool topup_memcache;
@@ -1871,6 +1871,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
kvm_release_faultin_page(kvm, page, !!ret, writable);
kvm_fault_unlock(kvm);
+ if (page_size)
+ *page_size = vma_pagesize;
+
/* Mark the page dirty only if the fault is handled successfully */
if (writable && !ret)
mark_page_dirty_in_slot(kvm, memslot, gfn);
@@ -2069,8 +2072,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
esr_fsc_is_permission_fault(esr));
else
- ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
- esr_fsc_is_permission_fault(esr));
+ ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, NULL,
+ hva, esr_fsc_is_permission_fault(esr));
if (ret == 0)
ret = 1;
out:
@@ -2446,3 +2449,65 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
}
+
+long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
+ struct kvm_pre_fault_memory *range)
+{
+ int ret, idx;
+ hva_t hva;
+ phys_addr_t end;
+ struct kvm_memory_slot *memslot;
+ struct kvm_vcpu_fault_info stored_fault, *fault_info;
+
+ long page_size = PAGE_SIZE;
+ phys_addr_t ipa = range->gpa;
+ gfn_t gfn = gpa_to_gfn(range->gpa);
+
+ idx = srcu_read_lock(&vcpu->kvm->srcu);
+
+ if (ipa >= kvm_phys_size(vcpu->arch.hw_mmu)) {
+ ret = -ENOENT;
+ goto out_unlock;
+ }
+
+ memslot = gfn_to_memslot(vcpu->kvm, gfn);
+ if (!memslot) {
+ ret = -ENOENT;
+ goto out_unlock;
+ }
+
+ fault_info = &vcpu->arch.fault;
+ stored_fault = *fault_info;
+
+ /* Generate a synthetic abort for the pre-fault address */
+ fault_info->esr_el2 = FIELD_PREP(ESR_ELx_EC_MASK, ESR_ELx_EC_DABT_CUR);
+ fault_info->esr_el2 &= ~ESR_ELx_ISV;
+ fault_info->esr_el2 |= ESR_ELx_FSC_FAULT_L(KVM_PGTABLE_LAST_LEVEL);
+
+ fault_info->hpfar_el2 = HPFAR_EL2_NS |
+ FIELD_PREP(HPFAR_EL2_FIPA, ipa >> 12);
+
+ if (kvm_slot_has_gmem(memslot)) {
+ ret = gmem_abort(vcpu, ipa, NULL, memslot, false);
+ } else {
+ hva = gfn_to_hva_memslot_prot(memslot, gfn, NULL);
+ if (kvm_is_error_hva(hva)) {
+ ret = -EFAULT;
+ goto out;
+ }
+ ret = user_mem_abort(vcpu, ipa, NULL, memslot, &page_size, hva,
+ false);
+ }
+
+ if (ret < 0)
+ goto out;
+
+ end = (range->gpa & ~(page_size - 1)) + page_size;
+ ret = min(range->size, end - range->gpa);
+
+out:
+ *fault_info = stored_fault;
+out_unlock:
+ srcu_read_unlock(&vcpu->kvm->srcu, idx);
+ return ret;
+}
--
2.43.0
* [PATCH v2 2/4] KVM: selftests: Fix unaligned mmap allocations
From: Jack Thomson @ 2025-10-13 15:14 UTC
To: maz, oliver.upton, pbonzini
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
shuah, linux-arm-kernel, kvmarm, linux-kernel, linux-kselftest,
isaku.yamahata, roypat, kalyazin, jackabt
From: Jack Thomson <jackabt@amazon.com>
When creating a VM using mmap with huge pages, the requested memory size
may not be aligned to the underlying page size. The stored mmap_size
value does not account for the fact that mmap() automatically rounds the
length up to a multiple of the underlying page size. During test
teardown, this unaligned length is passed to munmap(); however, for
hugetlb mappings munmap() requires the length to be a multiple of the
underlying huge page size, so the call fails.
Update vm_mem_add() to ensure mmap_size is aligned to the underlying
page size.
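A minimal standalone illustration of the failure mode (hypothetical
snippet, not part of the selftests; assumes 2M huge pages are available
on the host):

  #include <stdio.h>
  #include <sys/mman.h>

  int main(void)
  {
          size_t len = 0x200000 + 0x1000; /* 2M + 4K: not hugepage aligned */

          /* The kernel rounds the mapping length up to 4M internally... */
          void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
          if (mem == MAP_FAILED) {
                  perror("mmap");
                  return 1;
          }

          /*
           * ...but munmap() of a hugetlb mapping requires the length to
           * be a multiple of the huge page size, so this fails.
           */
          if (munmap(mem, len))
                  perror("munmap"); /* prints: munmap: Invalid argument */
          return 0;
  }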
Signed-off-by: Jack Thomson <jackabt@amazon.com>
---
tools/testing/selftests/kvm/lib/kvm_util.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index c3f5142b0a54..b106fbed999c 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1051,7 +1051,6 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
/* Allocate and initialize new mem region structure. */
region = calloc(1, sizeof(*region));
TEST_ASSERT(region != NULL, "Insufficient Memory");
- region->mmap_size = mem_size;
#ifdef __s390x__
/* On s390x, the host address must be aligned to 1M (due to PGSTEs) */
@@ -1060,6 +1059,11 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
alignment = 1;
#endif
+ alignment = max(backing_src_pagesz, alignment);
+ region->mmap_size = align_up(mem_size, alignment);
+
+ TEST_ASSERT_EQ(guest_paddr, align_up(guest_paddr, backing_src_pagesz));
+
/*
* When using THP mmap is not guaranteed to returned a hugepage aligned
* address so we have to pad the mmap. Padding is not needed for HugeTLB
@@ -1067,12 +1071,6 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
* page size.
*/
if (src_type == VM_MEM_SRC_ANONYMOUS_THP)
- alignment = max(backing_src_pagesz, alignment);
-
- TEST_ASSERT_EQ(guest_paddr, align_up(guest_paddr, backing_src_pagesz));
-
- /* Add enough memory to align up if necessary */
- if (alignment > 1)
region->mmap_size += alignment;
region->fd = -1;
--
2.43.0
* [PATCH v2 3/4] KVM: selftests: Enable pre_fault_memory_test for arm64
From: Jack Thomson @ 2025-10-13 15:15 UTC
To: maz, oliver.upton, pbonzini
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
shuah, linux-arm-kernel, kvmarm, linux-kernel, linux-kselftest,
isaku.yamahata, roypat, kalyazin, jackabt
From: Jack Thomson <jackabt@amazon.com>
Enable the pre_fault_memory_test to run on arm64 by making it work with
different guest page sizes and testing multiple guest configurations.
Update the TEST_ASSERT to compare against UCALL_EXIT_REASON for
portability, as ucalls exit with KVM_EXIT_MMIO on arm64 while x86 uses
KVM_EXIT_IO.
Signed-off-by: Jack Thomson <jackabt@amazon.com>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/pre_fault_memory_test.c | 79 ++++++++++++++-----
2 files changed, 59 insertions(+), 21 deletions(-)
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 90f03f00cb04..4db1737fad04 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -180,6 +180,7 @@ TEST_GEN_PROGS_arm64 += memslot_perf_test
TEST_GEN_PROGS_arm64 += mmu_stress_test
TEST_GEN_PROGS_arm64 += rseq_test
TEST_GEN_PROGS_arm64 += steal_time
+TEST_GEN_PROGS_arm64 += pre_fault_memory_test
TEST_GEN_PROGS_s390 = $(TEST_GEN_PROGS_COMMON)
TEST_GEN_PROGS_s390 += s390/memop
diff --git a/tools/testing/selftests/kvm/pre_fault_memory_test.c b/tools/testing/selftests/kvm/pre_fault_memory_test.c
index 0350a8896a2f..ed9848a8af60 100644
--- a/tools/testing/selftests/kvm/pre_fault_memory_test.c
+++ b/tools/testing/selftests/kvm/pre_fault_memory_test.c
@@ -10,19 +10,29 @@
#include <test_util.h>
#include <kvm_util.h>
#include <processor.h>
+#include <guest_modes.h>
/* Arbitrarily chosen values */
-#define TEST_SIZE (SZ_2M + PAGE_SIZE)
-#define TEST_NPAGES (TEST_SIZE / PAGE_SIZE)
+#define TEST_BASE_SIZE SZ_2M
#define TEST_SLOT 10
+/* Storage of test info to share with guest code */
+struct test_config {
+ int page_size;
+ uint64_t test_size;
+ uint64_t test_num_pages;
+};
+
+struct test_config test_config;
+
static void guest_code(uint64_t base_gpa)
{
volatile uint64_t val __used;
+ struct test_config *config = &test_config;
int i;
- for (i = 0; i < TEST_NPAGES; i++) {
- uint64_t *src = (uint64_t *)(base_gpa + i * PAGE_SIZE);
+ for (i = 0; i < config->test_num_pages; i++) {
+ uint64_t *src = (uint64_t *)(base_gpa + i * config->page_size);
val = *src;
}
@@ -63,11 +73,17 @@ static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
"KVM_PRE_FAULT_MEMORY", ret, vcpu->vm);
}
-static void __test_pre_fault_memory(unsigned long vm_type, bool private)
+struct test_params {
+ unsigned long vm_type;
+ bool private;
+};
+
+static void __test_pre_fault_memory(enum vm_guest_mode guest_mode, void *arg)
{
+ struct test_params *p = arg;
const struct vm_shape shape = {
- .mode = VM_MODE_DEFAULT,
- .type = vm_type,
+ .mode = guest_mode,
+ .type = p->vm_type,
};
struct kvm_vcpu *vcpu;
struct kvm_run *run;
@@ -78,10 +94,17 @@ static void __test_pre_fault_memory(unsigned long vm_type, bool private)
uint64_t guest_test_virt_mem;
uint64_t alignment, guest_page_size;
+ pr_info("Testing guest mode: %s\n", vm_guest_mode_string(guest_mode));
+
vm = vm_create_shape_with_one_vcpu(shape, &vcpu, guest_code);
- alignment = guest_page_size = vm_guest_mode_params[VM_MODE_DEFAULT].page_size;
- guest_test_phys_mem = (vm->max_gfn - TEST_NPAGES) * guest_page_size;
+ guest_page_size = vm_guest_mode_params[guest_mode].page_size;
+
+ test_config.page_size = guest_page_size;
+ test_config.test_size = TEST_BASE_SIZE + test_config.page_size;
+ test_config.test_num_pages = vm_calc_num_guest_pages(vm->mode, test_config.test_size);
+
+ guest_test_phys_mem = (vm->max_gfn - test_config.test_num_pages) * test_config.page_size;
#ifdef __s390x__
alignment = max(0x100000UL, guest_page_size);
#else
@@ -91,22 +114,31 @@ static void __test_pre_fault_memory(unsigned long vm_type, bool private)
guest_test_virt_mem = guest_test_phys_mem & ((1ULL << (vm->va_bits - 1)) - 1);
vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
- guest_test_phys_mem, TEST_SLOT, TEST_NPAGES,
- private ? KVM_MEM_GUEST_MEMFD : 0);
- virt_map(vm, guest_test_virt_mem, guest_test_phys_mem, TEST_NPAGES);
-
- if (private)
- vm_mem_set_private(vm, guest_test_phys_mem, TEST_SIZE);
- pre_fault_memory(vcpu, guest_test_phys_mem, SZ_2M, 0);
- pre_fault_memory(vcpu, guest_test_phys_mem + SZ_2M, PAGE_SIZE * 2, PAGE_SIZE);
- pre_fault_memory(vcpu, guest_test_phys_mem + TEST_SIZE, PAGE_SIZE, PAGE_SIZE);
+ guest_test_phys_mem, TEST_SLOT, test_config.test_num_pages,
+ p->private ? KVM_MEM_GUEST_MEMFD : 0);
+ virt_map(vm, guest_test_virt_mem, guest_test_phys_mem, test_config.test_num_pages);
+
+ if (p->private)
+ vm_mem_set_private(vm, guest_test_phys_mem, test_config.test_size);
+ pre_fault_memory(vcpu, guest_test_phys_mem, TEST_BASE_SIZE, 0);
+ /* Test pre-faulting over an already faulted range */
+ pre_fault_memory(vcpu, guest_test_phys_mem, TEST_BASE_SIZE, 0);
+ pre_fault_memory(vcpu, guest_test_phys_mem + TEST_BASE_SIZE,
+ test_config.page_size * 2, test_config.page_size);
+ pre_fault_memory(vcpu, guest_test_phys_mem + test_config.test_size,
+ test_config.page_size, test_config.page_size);
vcpu_args_set(vcpu, 1, guest_test_virt_mem);
+
+ /* Export the shared variables to the guest. */
+ sync_global_to_guest(vm, test_config);
+
vcpu_run(vcpu);
run = vcpu->run;
- TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
- "Wanted KVM_EXIT_IO, got exit reason: %u (%s)",
+ TEST_ASSERT(run->exit_reason == UCALL_EXIT_REASON,
+ "Wanted %s, got exit reason: %u (%s)",
+ exit_reason_str(UCALL_EXIT_REASON),
run->exit_reason, exit_reason_str(run->exit_reason));
switch (get_ucall(vcpu, &uc)) {
@@ -130,7 +162,12 @@ static void test_pre_fault_memory(unsigned long vm_type, bool private)
return;
}
- __test_pre_fault_memory(vm_type, private);
+ struct test_params p = {
+ .vm_type = vm_type,
+ .private = private,
+ };
+
+ for_each_guest_mode(__test_pre_fault_memory, &p);
}
int main(int argc, char *argv[])
--
2.43.0
* [PATCH v2 4/4] KVM: selftests: Add option for different backing in pre-fault tests
From: Jack Thomson @ 2025-10-13 15:15 UTC
To: maz, oliver.upton, pbonzini
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
shuah, linux-arm-kernel, kvmarm, linux-kernel, linux-kselftest,
isaku.yamahata, roypat, kalyazin, jackabt
From: Jack Thomson <jackabt@amazon.com>
Add a -m option to specify different memory backing types for the
pre-fault tests (e.g., anonymous, hugetlb), allowing testing of the
pre-fault functionality across different memory configurations.
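For example, to run the tests against 2M hugetlb-backed memory (backing
source names are taken from the selftests library; availability depends
on the host configuration):

  ./pre_fault_memory_test -m anonymous_hugetlb_2mb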
Signed-off-by: Jack Thomson <jackabt@amazon.com>
---
.../selftests/kvm/pre_fault_memory_test.c | 31 ++++++++++++++++---
1 file changed, 26 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/kvm/pre_fault_memory_test.c b/tools/testing/selftests/kvm/pre_fault_memory_test.c
index ed9848a8af60..22e2e53945d9 100644
--- a/tools/testing/selftests/kvm/pre_fault_memory_test.c
+++ b/tools/testing/selftests/kvm/pre_fault_memory_test.c
@@ -76,6 +76,7 @@ static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
struct test_params {
unsigned long vm_type;
bool private;
+ enum vm_mem_backing_src_type mem_backing_src;
};
static void __test_pre_fault_memory(enum vm_guest_mode guest_mode, void *arg)
@@ -94,7 +95,11 @@ static void __test_pre_fault_memory(enum vm_guest_mode guest_mode, void *arg)
uint64_t guest_test_virt_mem;
uint64_t alignment, guest_page_size;
+ size_t backing_src_pagesz = get_backing_src_pagesz(p->mem_backing_src);
+
pr_info("Testing guest mode: %s\n", vm_guest_mode_string(guest_mode));
+ pr_info("Testing memory backing src type: %s\n",
+ vm_mem_backing_src_alias(p->mem_backing_src)->name);
vm = vm_create_shape_with_one_vcpu(shape, &vcpu, guest_code);
@@ -110,10 +115,11 @@ static void __test_pre_fault_memory(enum vm_guest_mode guest_mode, void *arg)
#else
alignment = SZ_2M;
#endif
+ alignment = max(alignment, backing_src_pagesz);
guest_test_phys_mem = align_down(guest_test_phys_mem, alignment);
guest_test_virt_mem = guest_test_phys_mem & ((1ULL << (vm->va_bits - 1)) - 1);
- vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+ vm_userspace_mem_region_add(vm, p->mem_backing_src,
guest_test_phys_mem, TEST_SLOT, test_config.test_num_pages,
p->private ? KVM_MEM_GUEST_MEMFD : 0);
virt_map(vm, guest_test_virt_mem, guest_test_phys_mem, test_config.test_num_pages);
@@ -155,7 +161,8 @@ static void __test_pre_fault_memory(enum vm_guest_mode guest_mode, void *arg)
kvm_vm_free(vm);
}
-static void test_pre_fault_memory(unsigned long vm_type, bool private)
+static void test_pre_fault_memory(unsigned long vm_type, enum vm_mem_backing_src_type backing_src,
+ bool private)
{
if (vm_type && !(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type))) {
pr_info("Skipping tests for vm_type 0x%lx\n", vm_type);
@@ -165,6 +172,7 @@ static void test_pre_fault_memory(unsigned long vm_type, bool private)
struct test_params p = {
.vm_type = vm_type,
.private = private,
+ .mem_backing_src = backing_src,
};
for_each_guest_mode(__test_pre_fault_memory, &p);
@@ -174,10 +182,23 @@ int main(int argc, char *argv[])
{
TEST_REQUIRE(kvm_check_cap(KVM_CAP_PRE_FAULT_MEMORY));
- test_pre_fault_memory(0, false);
+ int opt;
+ enum vm_mem_backing_src_type backing = VM_MEM_SRC_ANONYMOUS;
+
+ while ((opt = getopt(argc, argv, "m:")) != -1) {
+ switch (opt) {
+ case 'm':
+ backing = parse_backing_src_type(optarg);
+ break;
+ default:
+ break;
+ }
+ }
+
+ test_pre_fault_memory(0, backing, false);
#ifdef __x86_64__
- test_pre_fault_memory(KVM_X86_SW_PROTECTED_VM, false);
- test_pre_fault_memory(KVM_X86_SW_PROTECTED_VM, true);
+ test_pre_fault_memory(KVM_X86_SW_PROTECTED_VM, backing, false);
+ test_pre_fault_memory(KVM_X86_SW_PROTECTED_VM, backing, true);
#endif
return 0;
}
--
2.43.0
* Re: [PATCH v2 1/4] KVM: arm64: Add pre_fault_memory implementation
From: Suzuki K Poulose @ 2025-10-16 14:01 UTC
To: Jack Thomson, maz, oliver.upton, pbonzini
Cc: joey.gouly, yuzenghui, catalin.marinas, will, shuah,
linux-arm-kernel, kvmarm, linux-kernel, linux-kselftest,
isaku.yamahata, roypat, kalyazin, jackabt
Hi
On 13/10/2025 16:14, Jack Thomson wrote:
> From: Jack Thomson <jackabt@amazon.com>
>
> Add kvm_arch_vcpu_pre_fault_memory() for arm64. The implementation hands
> off the stage-2 faulting logic to either gmem_abort() or
> user_mem_abort().
>
> Add an optional page_size output parameter to user_mem_abort() to
> return the VMA page size, which is needed when pre-faulting.
>
> Update the documentation to clarify x86 specific behaviour.
Thanks for the patch! Do we care about faulting beyond the requested
range? I understand this doesn't happen for anything that is not backed
by gmem (which might change with hugetlbfs support) or for normal VMs.
But for CoCo VMs this might affect the measurement, or even cause
"pre-faulting" to fail because of the extra security checks (e.g.,
trying to fault in a range twice because it is backed by, say, a 1G
page).
Of course, these could be addressed via a separate patch when this
becomes a real requirement.
One way to solve this would be to pass the page size as an input
parameter, which could force the backend to limit the vma_pagesize
that gets used for the stage-2 mapping.
>
> Signed-off-by: Jack Thomson <jackabt@amazon.com>
> ---
> Documentation/virt/kvm/api.rst | 3 +-
> arch/arm64/kvm/Kconfig | 1 +
> arch/arm64/kvm/arm.c | 1 +
> arch/arm64/kvm/mmu.c | 73 ++++++++++++++++++++++++++++++++--
> 4 files changed, 73 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index c17a87a0a5ac..9e8cc4eb505d 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6461,7 +6461,8 @@ Errors:
> KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory
> for the current vCPU state. KVM maps memory as if the vCPU generated a
> stage-2 read page fault, e.g. faults in memory as needed, but doesn't break
> -CoW. However, KVM does not mark any newly created stage-2 PTE as Accessed.
> +CoW. However, on x86, KVM does not mark any newly created stage-2 PTE as
> +Accessed.
>
> In the case of confidential VM types where there is an initial set up of
> private guest memory before the guest is 'finalized'/measured, this ioctl
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index bff62e75d681..1ac0605f86cb 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -25,6 +25,7 @@ menuconfig KVM
> select HAVE_KVM_CPU_RELAX_INTERCEPT
> select KVM_MMIO
> select KVM_GENERIC_DIRTYLOG_READ_PROTECT
> + select KVM_GENERIC_PRE_FAULT_MEMORY
> select KVM_XFER_TO_GUEST_WORK
> select KVM_VFIO
> select HAVE_KVM_DIRTY_RING_ACQ_REL
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 888f7c7abf54..65654a742864 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -322,6 +322,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_IRQFD_RESAMPLE:
> case KVM_CAP_COUNTER_OFFSET:
> case KVM_CAP_ARM_WRITABLE_IMP_ID_REGS:
> + case KVM_CAP_PRE_FAULT_MEMORY:
> r = 1;
> break;
> case KVM_CAP_SET_GUEST_DEBUG2:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index a36426ccd9b5..82f122e4b08c 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1597,8 +1597,8 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> struct kvm_s2_trans *nested,
> - struct kvm_memory_slot *memslot, unsigned long hva,
> - bool fault_is_perm)
> + struct kvm_memory_slot *memslot, long *page_size,
> + unsigned long hva, bool fault_is_perm)
> {
> int ret = 0;
> bool topup_memcache;
> @@ -1871,6 +1871,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> kvm_release_faultin_page(kvm, page, !!ret, writable);
> kvm_fault_unlock(kvm);
>
> + if (page_size)
> + *page_size = vma_pagesize;
> +
> /* Mark the page dirty only if the fault is handled successfully */
> if (writable && !ret)
> mark_page_dirty_in_slot(kvm, memslot, gfn);
> @@ -2069,8 +2072,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
> esr_fsc_is_permission_fault(esr));
> else
> - ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> - esr_fsc_is_permission_fault(esr));
> + ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, NULL,
> + hva, esr_fsc_is_permission_fault(esr));
> if (ret == 0)
> ret = 1;
> out:
> @@ -2446,3 +2449,65 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
>
> trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
> }
> +
> +long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> + struct kvm_pre_fault_memory *range)
> +{
> + int ret, idx;
> + hva_t hva;
> + phys_addr_t end;
> + struct kvm_memory_slot *memslot;
> + struct kvm_vcpu_fault_info stored_fault, *fault_info;
> +
> + long page_size = PAGE_SIZE;
> + phys_addr_t ipa = range->gpa;
> + gfn_t gfn = gpa_to_gfn(range->gpa);
> +
> + idx = srcu_read_lock(&vcpu->kvm->srcu);
> +
> + if (ipa >= kvm_phys_size(vcpu->arch.hw_mmu)) {
> + ret = -ENOENT;
> + goto out_unlock;
> + }
> +
> + memslot = gfn_to_memslot(vcpu->kvm, gfn);
> + if (!memslot) {
> + ret = -ENOENT;
> + goto out_unlock;
> + }
> +
> + fault_info = &vcpu->arch.fault;
> + stored_fault = *fault_info;
> +
> + /* Generate a synthetic abort for the pre-fault address */
> + fault_info->esr_el2 = FIELD_PREP(ESR_ELx_EC_MASK, ESR_ELx_EC_DABT_CUR);
minor nit: Any reason why we don't use ESR_ELx_EC_DABT_LOW? We always
get that for a data abort from the guest. Otherwise, this looks good
to me.
Suzuki
> + fault_info->esr_el2 &= ~ESR_ELx_ISV;
> + fault_info->esr_el2 |= ESR_ELx_FSC_FAULT_L(KVM_PGTABLE_LAST_LEVEL);
> +
> + fault_info->hpfar_el2 = HPFAR_EL2_NS |
> + FIELD_PREP(HPFAR_EL2_FIPA, ipa >> 12);
> +
> + if (kvm_slot_has_gmem(memslot)) {
> + ret = gmem_abort(vcpu, ipa, NULL, memslot, false);
> + } else {
> + hva = gfn_to_hva_memslot_prot(memslot, gfn, NULL);
> + if (kvm_is_error_hva(hva)) {
> + ret = -EFAULT;
> + goto out;
> + }
> + ret = user_mem_abort(vcpu, ipa, NULL, memslot, &page_size, hva,
> + false);
> + }
> +
> + if (ret < 0)
> + goto out;
> +
> + end = (range->gpa & ~(page_size - 1)) + page_size;
> + ret = min(range->size, end - range->gpa);
> +
> +out:
> + *fault_info = stored_fault;
> +out_unlock:
> + srcu_read_unlock(&vcpu->kvm->srcu, idx);
> + return ret;
> +}
* Re: [PATCH v2 2/4] KVM: selftests: Fix unaligned mmap allocations
From: Sean Christopherson @ 2025-10-23 17:16 UTC
To: Jack Thomson
Cc: maz, oliver.upton, pbonzini, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, will, shuah, linux-arm-kernel, kvmarm,
linux-kernel, linux-kselftest, isaku.yamahata, roypat, kalyazin,
jackabt
On Mon, Oct 13, 2025, Jack Thomson wrote:
> From: Jack Thomson <jackabt@amazon.com>
>
> When creating a VM using mmap with huge pages, and the memory amount does
> not align with the underlying page size. The stored mmap_size value does
> not account for the fact that mmap will automatically align the length
> to a multiple of the underlying page size. During the teardown of the
> test, munmap is used. However, munmap requires the length to be a
> multiple of the underlying page size.
What happens when selftests use the wrong map_size? E.g. is munmap() silently
failing? If so, then I should probably take this particular patch through
kvm-x86/gmem, otherwise it means we'll start getting asserts due to:
3223560c93eb ("KVM: selftests: Define wrappers for common syscalls to assert success")
If munmap() isn't failing, then that begs the question of what this patch is
actually doing :-)
> Update the vm_mem_add method to ensure the mmap_size is aligned to the
> underlying page size.
>
> Signed-off-by: Jack Thomson <jackabt@amazon.com>
> ---
> tools/testing/selftests/kvm/lib/kvm_util.c | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index c3f5142b0a54..b106fbed999c 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -1051,7 +1051,6 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
> /* Allocate and initialize new mem region structure. */
> region = calloc(1, sizeof(*region));
> TEST_ASSERT(region != NULL, "Insufficient Memory");
> - region->mmap_size = mem_size;
>
> #ifdef __s390x__
> /* On s390x, the host address must be aligned to 1M (due to PGSTEs) */
> @@ -1060,6 +1059,11 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
> alignment = 1;
> #endif
>
> + alignment = max(backing_src_pagesz, alignment);
> + region->mmap_size = align_up(mem_size, alignment);
> +
> + TEST_ASSERT_EQ(guest_paddr, align_up(guest_paddr, backing_src_pagesz));
> +
> /*
> * When using THP mmap is not guaranteed to returned a hugepage aligned
> * address so we have to pad the mmap. Padding is not needed for HugeTLB
> @@ -1067,12 +1071,6 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
> * page size.
> */
> if (src_type == VM_MEM_SRC_ANONYMOUS_THP)
> - alignment = max(backing_src_pagesz, alignment);
> -
> - TEST_ASSERT_EQ(guest_paddr, align_up(guest_paddr, backing_src_pagesz));
> -
> - /* Add enough memory to align up if necessary */
> - if (alignment > 1)
> region->mmap_size += alignment;
>
> region->fd = -1;
> --
> 2.43.0
>
* Re: [PATCH v2 2/4] KVM: selftests: Fix unaligned mmap allocations
From: Thomson, Jack @ 2025-10-28 11:44 UTC
To: Sean Christopherson
Cc: maz, oliver.upton, pbonzini, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, will, shuah, linux-arm-kernel, kvmarm,
linux-kernel, linux-kselftest, isaku.yamahata, roypat, kalyazin,
jackabt
On 23/10/2025 6:16 pm, Sean Christopherson wrote:
> On Mon, Oct 13, 2025, Jack Thomson wrote:
>> From: Jack Thomson <jackabt@amazon.com>
>>
>> When creating a VM using mmap with huge pages, and the memory amount does
>> not align with the underlying page size. The stored mmap_size value does
>> not account for the fact that mmap will automatically align the length
>> to a multiple of the underlying page size. During the teardown of the
>> test, munmap is used. However, munmap requires the length to be a
>> multiple of the underlying page size.
>
> What happens when selftests use the wrong map_size? E.g. is munmap() silently
> failing? If so, then I should probably take this particular patch through
> kvm-x86/gmem, otherwise it means we'll start getting asserts due to:
>
> 3223560c93eb ("KVM: selftests: Define wrappers for common syscalls to assert success")
>
> If munmap() isn't failing, then that begs the question of what this patch is
> actually doing :-)
>
Hi Sean, sorry I completely missed your reply.
Yes, currently a misaligned mmap_size causes munmap() to fail; I
noticed this when testing with different backings.
>> Update the vm_mem_add method to ensure the mmap_size is aligned to the
>> underlying page size.
>>
>> Signed-off-by: Jack Thomson <jackabt@amazon.com>
>> ---
>> tools/testing/selftests/kvm/lib/kvm_util.c | 12 +++++-------
>> 1 file changed, 5 insertions(+), 7 deletions(-)
>>
>> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
>> index c3f5142b0a54..b106fbed999c 100644
>> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
>> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
>> @@ -1051,7 +1051,6 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
>> /* Allocate and initialize new mem region structure. */
>> region = calloc(1, sizeof(*region));
>> TEST_ASSERT(region != NULL, "Insufficient Memory");
>> - region->mmap_size = mem_size;
>>
>> #ifdef __s390x__
>> /* On s390x, the host address must be aligned to 1M (due to PGSTEs) */
>> @@ -1060,6 +1059,11 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
>> alignment = 1;
>> #endif
>>
>> + alignment = max(backing_src_pagesz, alignment);
>> + region->mmap_size = align_up(mem_size, alignment);
>> +
>> + TEST_ASSERT_EQ(guest_paddr, align_up(guest_paddr, backing_src_pagesz));
>> +
>> /*
>> * When using THP mmap is not guaranteed to returned a hugepage aligned
>> * address so we have to pad the mmap. Padding is not needed for HugeTLB
>> @@ -1067,12 +1071,6 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
>> * page size.
>> */
>> if (src_type == VM_MEM_SRC_ANONYMOUS_THP)
>> - alignment = max(backing_src_pagesz, alignment);
>> -
>> - TEST_ASSERT_EQ(guest_paddr, align_up(guest_paddr, backing_src_pagesz));
>> -
>> - /* Add enough memory to align up if necessary */
>> - if (alignment > 1)
>> region->mmap_size += alignment;
>>
>> region->fd = -1;
>> --
>> 2.43.0
>>
--
Thanks,
Jack
* Re: [PATCH v2 2/4] KVM: selftests: Fix unaligned mmap allocations
From: Sean Christopherson @ 2025-11-03 21:08 UTC
To: Jack Thomson
Cc: maz, oliver.upton, pbonzini, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, will, shuah, linux-arm-kernel, kvmarm,
linux-kernel, linux-kselftest, isaku.yamahata, roypat, kalyazin,
jackabt
On Tue, Oct 28, 2025, Jack Thomson wrote:
>
>
> On 23/10/2025 6:16 pm, Sean Christopherson wrote:
> > On Mon, Oct 13, 2025, Jack Thomson wrote:
> > > From: Jack Thomson <jackabt@amazon.com>
> > >
> > > When creating a VM using mmap with huge pages, and the memory amount does
> > > not align with the underlying page size. The stored mmap_size value does
> > > not account for the fact that mmap will automatically align the length
> > > to a multiple of the underlying page size. During the teardown of the
> > > test, munmap is used. However, munmap requires the length to be a
> > > multiple of the underlying page size.
> >
> > What happens when selftests use the wrong map_size? E.g. is munmap() silently
> > failing? If so, then I should probably take this particular patch through
> > kvm-x86/gmem, otherwise it means we'll start getting asserts due to:
> >
> > 3223560c93eb ("KVM: selftests: Define wrappers for common syscalls to assert success")
> >
> > If munmap() isn't failing, then that begs the question of what this patch is
> > actually doing :-)
> >
>
> Hi Sean, sorry I completely missed your reply.
>
> Yeah currently with a misaligned map_size it causes munmap() to fail, I
> noticed when tested with different backings.
Exactly which tests fail? I ask because I'm not sure we want to fix this by
having vm_mem_add() paper over test issues (I vaguely recall looking at this in
the past, but I can't find or recall the details).
* Re: [PATCH v2 2/4] KVM: selftests: Fix unaligned mmap allocations
From: Thomson, Jack @ 2025-11-04 11:40 UTC
To: Sean Christopherson
Cc: maz, oliver.upton, pbonzini, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, will, shuah, linux-arm-kernel, kvmarm,
linux-kernel, linux-kselftest, isaku.yamahata, roypat, kalyazin,
jackabt
On 03/11/2025 9:08 pm, Sean Christopherson wrote:
> On Tue, Oct 28, 2025, Jack Thomson wrote:
>>
>>
>> On 23/10/2025 6:16 pm, Sean Christopherson wrote:
>>> On Mon, Oct 13, 2025, Jack Thomson wrote:
>>>> From: Jack Thomson <jackabt@amazon.com>
>>>>
>>>> When creating a VM using mmap with huge pages, and the memory amount does
>>>> not align with the underlying page size. The stored mmap_size value does
>>>> not account for the fact that mmap will automatically align the length
>>>> to a multiple of the underlying page size. During the teardown of the
>>>> test, munmap is used. However, munmap requires the length to be a
>>>> multiple of the underlying page size.
>>>
>>> What happens when selftests use the wrong map_size? E.g. is munmap() silently
>>> failing? If so, then I should probably take this particular patch through
>>> kvm-x86/gmem, otherwise it means we'll start getting asserts due to:
>>>
>>> 3223560c93eb ("KVM: selftests: Define wrappers for common syscalls to assert success")
>>>
>>> If munmap() isn't failing, then that begs the question of what this patch is
>>> actually doing :-)
>>>
>>
>> Hi Sean, sorry I completely missed your reply.
>>
>> Yeah currently with a misaligned map_size it causes munmap() to fail, I
>> noticed when tested with different backings.
>
> Exactly which tests fail? I ask because I'm not sure we want to fix this by
> having vm_mem_add() paper over test issues (I vaguely recall looking at this in
> the past, but I can't find or recall the details).
The test failures happened in the pre-fault tests after adding the
option to change the backing page size [1]. If you'd prefer to have
the test handle this, I'll update it there instead.
[1]
https://lore.kernel.org/all/20251013151502.6679-5-jackabt.amazon@gmail.com
--
Thanks,
Jack
* Re: [PATCH v2 2/4] KVM: selftests: Fix unaligned mmap allocations
From: Sean Christopherson @ 2025-11-04 20:19 UTC
To: Jack Thomson
Cc: maz, oliver.upton, pbonzini, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, will, shuah, linux-arm-kernel, kvmarm,
linux-kernel, linux-kselftest, isaku.yamahata, roypat, kalyazin,
jackabt
On Tue, Nov 04, 2025, Jack Thomson wrote:
> On 03/11/2025 9:08 pm, Sean Christopherson wrote:
> > On Tue, Oct 28, 2025, Jack Thomson wrote:
> > >
> > >
> > > On 23/10/2025 6:16 pm, Sean Christopherson wrote:
> > > > On Mon, Oct 13, 2025, Jack Thomson wrote:
> > > > > From: Jack Thomson <jackabt@amazon.com>
> > > > >
> > > > > When creating a VM using mmap with huge pages, and the memory amount does
> > > > > not align with the underlying page size. The stored mmap_size value does
> > > > > not account for the fact that mmap will automatically align the length
> > > > > to a multiple of the underlying page size. During the teardown of the
> > > > > test, munmap is used. However, munmap requires the length to be a
> > > > > multiple of the underlying page size.
> > > >
> > > > What happens when selftests use the wrong map_size? E.g. is munmap() silently
> > > > failing? If so, then I should probably take this particular patch through
> > > > kvm-x86/gmem, otherwise it means we'll start getting asserts due to:
> > > >
> > > > 3223560c93eb ("KVM: selftests: Define wrappers for common syscalls to assert success")
> > > >
> > > > If munmap() isn't failing, then that begs the question of what this patch is
> > > > actually doing :-)
> > > >
> > >
> > > Hi Sean, sorry I completely missed your reply.
> > >
> > > Yeah currently with a misaligned map_size it causes munmap() to fail, I
> > > noticed when tested with different backings.
> >
> > Exactly which tests fail? I ask because I'm not sure we want to fix this by
> > having vm_mem_add() paper over test issues (I vaguely recall looking at this in
> > the past, but I can't find or recall the details).
>
> The test failures happened with pre_faulting tests after adding the
> option to change the backing page size [1]. If you'd prefer to
> have the test handle with this I'll update there instead.
Ah, yeah, that's a test bug introduced by your patch. I can't find the
thread, but the issue of hugepage alignment in vm_mem_add() has come up
in the past, and IIRC the conclusion was that tests need to handle the
size+alignment themselves, because having the library force the
alignment risks papering over test bugs/flaws. And I think there may
have even been cases where it introduced failures, as some tests
deliberately wanted to do weird things?
E.g. not updating the pre-faulting test to use the "correct" size+alignment means
the test is missing easy coverage for hugepages, since KVM won't create huge
mappings in stage-2 due to the memslot not being sized+aligned.