* [PATCH v3 0/3] KVM ARM64 pre_fault_memory
From: Jack Thomson @ 2025-11-19 15:49 UTC
To: maz, oliver.upton, pbonzini
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
shuah, linux-arm-kernel, kvmarm, linux-kernel, linux-kselftest,
isaku.yamahata, xmarcalx, kalyazin, jackabt
From: Jack Thomson <jackabt@amazon.com>
This patch series adds ARM64 support for the KVM_PRE_FAULT_MEMORY
feature, which was previously only available on x86 [1]. Pre-faulting
reduces the number of stage-2 faults taken during execution, which is
of benefit in post-copy migration scenarios, particularly for
memory-intensive applications where stage-2 fault handling causes high
latencies.
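For context, userspace drives this per-vCPU ioctl in a retry loop. A
minimal sketch (vcpu_fd, gpa and size are hypothetical, and
<linux/kvm.h>, <sys/ioctl.h> and <errno.h> are assumed to be included):

	struct kvm_pre_fault_memory range = {
		.gpa  = gpa,	/* guest-physical base of the region */
		.size = size,	/* bytes to populate */
	};
	int ret;

	do {
		ret = ioctl(vcpu_fd, KVM_PRE_FAULT_MEMORY, &range);
		/*
		 * KVM advances .gpa and shrinks .size as pages are mapped;
		 * a return of 0 with a non-zero .size means partial
		 * progress, so retry until the range is fully populated.
		 */
	} while (range.size && (ret == 0 || errno == EINTR || errno == EAGAIN));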
Patch Overview:
- The first patch adds support for the KVM_PRE_FAULT_MEMORY ioctl
on arm64.
- The second patch updates the pre_fault_memory_test to support
arm64.
- The last patch extends the pre_fault_memory_test to cover
different VM memory backings.
=== Changes Since v2 [2] ===
- Update the synthesized fault info value. Thanks Suzuki
- Remove change to selftests for unaligned mmap allocations. Thanks
Sean
[1]: https://lore.kernel.org/kvm/20240710174031.312055-1-pbonzini@redhat.com
[2]: https://lore.kernel.org/linux-arm-kernel/20251013151502.6679-1-jackabt.amazon@gmail.com
Jack Thomson (3):
KVM: arm64: Add pre_fault_memory implementation
KVM: selftests: Enable pre_fault_memory_test for arm64
KVM: selftests: Add option for different backing in pre-fault tests
Documentation/virt/kvm/api.rst | 3 +-
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/mmu.c | 73 +++++++++++-
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/pre_fault_memory_test.c | 110 ++++++++++++++----
6 files changed, 159 insertions(+), 30 deletions(-)
base-commit: 8a4821412cf2c1429fffa07c012dd150f2edf78c
--
2.43.0
* [PATCH v3 1/3] KVM: arm64: Add pre_fault_memory implementation
From: Jack Thomson @ 2025-11-19 15:49 UTC
To: maz, oliver.upton, pbonzini
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
shuah, linux-arm-kernel, kvmarm, linux-kernel, linux-kselftest,
isaku.yamahata, xmarcalx, kalyazin, jackabt
From: Jack Thomson <jackabt@amazon.com>
Add kvm_arch_vcpu_pre_fault_memory() for arm64. The implementation hands
off the stage-2 faulting logic to either gmem_abort() or
user_mem_abort().
Add an optional page_size output parameter to user_mem_abort() to
return the VMA page size, which is needed when pre-faulting.
Update the documentation to clarify x86 specific behaviour.
Signed-off-by: Jack Thomson <jackabt@amazon.com>
---
Documentation/virt/kvm/api.rst | 3 +-
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/mmu.c | 73 ++++++++++++++++++++++++++++++++--
4 files changed, 73 insertions(+), 5 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 57061fa29e6a..30872d080511 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6493,7 +6493,8 @@ Errors:
KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory
for the current vCPU state. KVM maps memory as if the vCPU generated a
stage-2 read page fault, e.g. faults in memory as needed, but doesn't break
-CoW. However, KVM does not mark any newly created stage-2 PTE as Accessed.
+CoW. However, on x86, KVM does not mark any newly created stage-2 PTE as
+Accessed.
In the case of confidential VM types where there is an initial set up of
private guest memory before the guest is 'finalized'/measured, this ioctl
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 4f803fd1c99a..6872aaabe16c 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -25,6 +25,7 @@ menuconfig KVM
select HAVE_KVM_CPU_RELAX_INTERCEPT
select KVM_MMIO
select KVM_GENERIC_DIRTYLOG_READ_PROTECT
+ select KVM_GENERIC_PRE_FAULT_MEMORY
select VIRT_XFER_TO_GUEST_WORK
select KVM_VFIO
select HAVE_KVM_DIRTY_RING_ACQ_REL
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 870953b4a8a7..88c5dc2b4ee8 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -327,6 +327,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_IRQFD_RESAMPLE:
case KVM_CAP_COUNTER_OFFSET:
case KVM_CAP_ARM_WRITABLE_IMP_ID_REGS:
+ case KVM_CAP_PRE_FAULT_MEMORY:
r = 1;
break;
case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 7cc964af8d30..cba09168fc6d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1599,8 +1599,8 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_s2_trans *nested,
- struct kvm_memory_slot *memslot, unsigned long hva,
- bool fault_is_perm)
+ struct kvm_memory_slot *memslot, long *page_size,
+ unsigned long hva, bool fault_is_perm)
{
int ret = 0;
bool topup_memcache;
@@ -1878,6 +1878,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
kvm_release_faultin_page(kvm, page, !!ret, writable);
kvm_fault_unlock(kvm);
+ if (page_size)
+ *page_size = vma_pagesize;
+
/* Mark the page dirty only if the fault is handled successfully */
if (writable && !ret)
mark_page_dirty_in_slot(kvm, memslot, gfn);
@@ -2080,8 +2083,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
esr_fsc_is_permission_fault(esr));
else
- ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
- esr_fsc_is_permission_fault(esr));
+ ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, NULL,
+ hva, esr_fsc_is_permission_fault(esr));
if (ret == 0)
ret = 1;
out:
@@ -2457,3 +2460,65 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
}
+
+long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
+ struct kvm_pre_fault_memory *range)
+{
+ int ret, idx;
+ hva_t hva;
+ phys_addr_t end;
+ struct kvm_memory_slot *memslot;
+ struct kvm_vcpu_fault_info stored_fault, *fault_info;
+
+ long page_size = PAGE_SIZE;
+ phys_addr_t ipa = range->gpa;
+ gfn_t gfn = gpa_to_gfn(range->gpa);
+
+ idx = srcu_read_lock(&vcpu->kvm->srcu);
+
+ if (ipa >= kvm_phys_size(vcpu->arch.hw_mmu)) {
+ ret = -ENOENT;
+ goto out_unlock;
+ }
+
+ memslot = gfn_to_memslot(vcpu->kvm, gfn);
+ if (!memslot) {
+ ret = -ENOENT;
+ goto out_unlock;
+ }
+
+ fault_info = &vcpu->arch.fault;
+ stored_fault = *fault_info;
+
+ /* Generate a synthetic abort for the pre-fault address */
+ fault_info->esr_el2 = ESR_ELx_EC_DABT_LOW;
+ fault_info->esr_el2 &= ~ESR_ELx_ISV;
+ fault_info->esr_el2 |= ESR_ELx_FSC_FAULT_L(KVM_PGTABLE_LAST_LEVEL);
+
+ fault_info->hpfar_el2 = HPFAR_EL2_NS |
+ FIELD_PREP(HPFAR_EL2_FIPA, ipa >> 12);
+
+ if (kvm_slot_has_gmem(memslot)) {
+ ret = gmem_abort(vcpu, ipa, NULL, memslot, false);
+ } else {
+ hva = gfn_to_hva_memslot_prot(memslot, gfn, NULL);
+ if (kvm_is_error_hva(hva)) {
+ ret = -EFAULT;
+ goto out;
+ }
+ ret = user_mem_abort(vcpu, ipa, NULL, memslot, &page_size, hva,
+ false);
+ }
+
+ if (ret < 0)
+ goto out;
+
+ end = (range->gpa & ~(page_size - 1)) + page_size;
+ ret = min(range->size, end - range->gpa);
+
+out:
+ *fault_info = stored_fault;
+out_unlock:
+ srcu_read_unlock(&vcpu->kvm->srcu, idx);
+ return ret;
+}
--
2.43.0
* [PATCH v3 2/3] KVM: selftests: Enable pre_fault_memory_test for arm64
From: Jack Thomson @ 2025-11-19 15:49 UTC
To: maz, oliver.upton, pbonzini
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
shuah, linux-arm-kernel, kvmarm, linux-kernel, linux-kselftest,
isaku.yamahata, xmarcalx, kalyazin, jackabt
From: Jack Thomson <jackabt@amazon.com>
Enable the pre_fault_memory_test to run on arm64 by making it work with
different guest page sizes and testing multiple guest configurations.
Update the test_assert to compare against the UCALL_EXIT_REASON, for
portability, as arm64 exits with KVM_EXIT_MMIO while x86 uses
KVM_EXIT_IO.
Signed-off-by: Jack Thomson <jackabt@amazon.com>
---
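Note: UCALL_EXIT_REASON is the per-arch ucall exit reason from the
selftest headers; roughly (paths approximate):

	/* tools/testing/selftests/kvm/include/x86/ucall.h */
	#define UCALL_EXIT_REASON	KVM_EXIT_IO

	/* tools/testing/selftests/kvm/include/arm64/ucall.h */
	#define UCALL_EXIT_REASON	KVM_EXIT_MMIO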
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/pre_fault_memory_test.c | 78 ++++++++++++++-----
2 files changed, 58 insertions(+), 21 deletions(-)
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 148d427ff24b..0ddd8db60197 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -183,6 +183,7 @@ TEST_GEN_PROGS_arm64 += memslot_perf_test
TEST_GEN_PROGS_arm64 += mmu_stress_test
TEST_GEN_PROGS_arm64 += rseq_test
TEST_GEN_PROGS_arm64 += steal_time
+TEST_GEN_PROGS_arm64 += pre_fault_memory_test
TEST_GEN_PROGS_s390 = $(TEST_GEN_PROGS_COMMON)
TEST_GEN_PROGS_s390 += s390/memop
diff --git a/tools/testing/selftests/kvm/pre_fault_memory_test.c b/tools/testing/selftests/kvm/pre_fault_memory_test.c
index f04768c1d2e4..674931e7bb3a 100644
--- a/tools/testing/selftests/kvm/pre_fault_memory_test.c
+++ b/tools/testing/selftests/kvm/pre_fault_memory_test.c
@@ -11,19 +11,29 @@
#include <kvm_util.h>
#include <processor.h>
#include <pthread.h>
+#include <guest_modes.h>
/* Arbitrarily chosen values */
-#define TEST_SIZE (SZ_2M + PAGE_SIZE)
-#define TEST_NPAGES (TEST_SIZE / PAGE_SIZE)
+#define TEST_BASE_SIZE SZ_2M
#define TEST_SLOT 10
+/* Storage of test info to share with guest code */
+struct test_config {
+ int page_size;
+ uint64_t test_size;
+ uint64_t test_num_pages;
+};
+
+struct test_config test_config;
+
static void guest_code(uint64_t base_gpa)
{
volatile uint64_t val __used;
+ struct test_config *config = &test_config;
int i;
- for (i = 0; i < TEST_NPAGES; i++) {
- uint64_t *src = (uint64_t *)(base_gpa + i * PAGE_SIZE);
+ for (i = 0; i < config->test_num_pages; i++) {
+ uint64_t *src = (uint64_t *)(base_gpa + i * config->page_size);
val = *src;
}
@@ -159,11 +169,17 @@ static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 base_gpa, u64 offset,
KVM_PRE_FAULT_MEMORY, ret, vcpu->vm);
}
-static void __test_pre_fault_memory(unsigned long vm_type, bool private)
+struct test_params {
+ unsigned long vm_type;
+ bool private;
+};
+
+static void __test_pre_fault_memory(enum vm_guest_mode guest_mode, void *arg)
{
+ struct test_params *p = arg;
const struct vm_shape shape = {
- .mode = VM_MODE_DEFAULT,
- .type = vm_type,
+ .mode = guest_mode,
+ .type = p->vm_type,
};
struct kvm_vcpu *vcpu;
struct kvm_run *run;
@@ -174,10 +190,17 @@ static void __test_pre_fault_memory(unsigned long vm_type, bool private)
uint64_t guest_test_virt_mem;
uint64_t alignment, guest_page_size;
+ pr_info("Testing guest mode: %s\n", vm_guest_mode_string(guest_mode));
+
vm = vm_create_shape_with_one_vcpu(shape, &vcpu, guest_code);
- alignment = guest_page_size = vm_guest_mode_params[VM_MODE_DEFAULT].page_size;
- guest_test_phys_mem = (vm->max_gfn - TEST_NPAGES) * guest_page_size;
+ guest_page_size = vm_guest_mode_params[guest_mode].page_size;
+
+ test_config.page_size = guest_page_size;
+ test_config.test_size = TEST_BASE_SIZE + test_config.page_size;
+ test_config.test_num_pages = vm_calc_num_guest_pages(vm->mode, test_config.test_size);
+
+ guest_test_phys_mem = (vm->max_gfn - test_config.test_num_pages) * test_config.page_size;
#ifdef __s390x__
alignment = max(0x100000UL, guest_page_size);
#else
@@ -187,23 +210,31 @@ static void __test_pre_fault_memory(unsigned long vm_type, bool private)
guest_test_virt_mem = guest_test_phys_mem & ((1ULL << (vm->va_bits - 1)) - 1);
vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
- guest_test_phys_mem, TEST_SLOT, TEST_NPAGES,
- private ? KVM_MEM_GUEST_MEMFD : 0);
- virt_map(vm, guest_test_virt_mem, guest_test_phys_mem, TEST_NPAGES);
+ guest_test_phys_mem, TEST_SLOT, test_config.test_num_pages,
+ p->private ? KVM_MEM_GUEST_MEMFD : 0);
+ virt_map(vm, guest_test_virt_mem, guest_test_phys_mem, test_config.test_num_pages);
+
+ if (p->private)
+ vm_mem_set_private(vm, guest_test_phys_mem, test_config.test_size);
+ pre_fault_memory(vcpu, guest_test_phys_mem, TEST_BASE_SIZE, 0, p->private);
+ /* Test pre-faulting over an already faulted range */
+ pre_fault_memory(vcpu, guest_test_phys_mem, TEST_BASE_SIZE, 0, p->private);
+ pre_fault_memory(vcpu, guest_test_phys_mem + TEST_BASE_SIZE,
+ test_config.page_size * 2, test_config.page_size, p->private);
+ pre_fault_memory(vcpu, guest_test_phys_mem + test_config.test_size,
+ test_config.page_size, test_config.page_size, p->private);
- if (private)
- vm_mem_set_private(vm, guest_test_phys_mem, TEST_SIZE);
+ vcpu_args_set(vcpu, 1, guest_test_virt_mem);
- pre_fault_memory(vcpu, guest_test_phys_mem, 0, SZ_2M, 0, private);
- pre_fault_memory(vcpu, guest_test_phys_mem, SZ_2M, PAGE_SIZE * 2, PAGE_SIZE, private);
- pre_fault_memory(vcpu, guest_test_phys_mem, TEST_SIZE, PAGE_SIZE, PAGE_SIZE, private);
+ /* Export the shared variables to the guest. */
+ sync_global_to_guest(vm, test_config);
- vcpu_args_set(vcpu, 1, guest_test_virt_mem);
vcpu_run(vcpu);
run = vcpu->run;
- TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
- "Wanted KVM_EXIT_IO, got exit reason: %u (%s)",
+ TEST_ASSERT(run->exit_reason == UCALL_EXIT_REASON,
+ "Wanted %s, got exit reason: %u (%s)",
+ exit_reason_str(UCALL_EXIT_REASON),
run->exit_reason, exit_reason_str(run->exit_reason));
switch (get_ucall(vcpu, &uc)) {
@@ -227,7 +258,12 @@ static void test_pre_fault_memory(unsigned long vm_type, bool private)
return;
}
- __test_pre_fault_memory(vm_type, private);
+ struct test_params p = {
+ .vm_type = vm_type,
+ .private = private,
+ };
+
+ for_each_guest_mode(__test_pre_fault_memory, &p);
}
int main(int argc, char *argv[])
--
2.43.0
* [PATCH v3 3/3] KVM: selftests: Add option for different backing in pre-fault tests
From: Jack Thomson @ 2025-11-19 15:49 UTC
To: maz, oliver.upton, pbonzini
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
shuah, linux-arm-kernel, kvmarm, linux-kernel, linux-kselftest,
isaku.yamahata, xmarcalx, kalyazin, jackabt
From: Jack Thomson <jackabt@amazon.com>
Add a -m option to specify different memory backing types for the
pre-fault tests (e.g., anonymous, hugetlb), allowing testing of the
pre-fault functionality across different memory configurations.
Signed-off-by: Jack Thomson <jackabt@amazon.com>
---
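Usage sketch (backing type names come from the selftests' backing src
alias table; the hugetlb variants assume hugepages are reserved on the
host):

	./pre_fault_memory_test			# anonymous (default)
	./pre_fault_memory_test -m anonymous_thp
	./pre_fault_memory_test -m anonymous_hugetlb_2mb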
.../selftests/kvm/pre_fault_memory_test.c | 42 +++++++++++++++----
1 file changed, 33 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/kvm/pre_fault_memory_test.c b/tools/testing/selftests/kvm/pre_fault_memory_test.c
index 674931e7bb3a..e1111c4df748 100644
--- a/tools/testing/selftests/kvm/pre_fault_memory_test.c
+++ b/tools/testing/selftests/kvm/pre_fault_memory_test.c
@@ -172,6 +172,7 @@ static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 base_gpa, u64 offset,
struct test_params {
unsigned long vm_type;
bool private;
+ enum vm_mem_backing_src_type mem_backing_src;
};
static void __test_pre_fault_memory(enum vm_guest_mode guest_mode, void *arg)
@@ -190,14 +191,19 @@ static void __test_pre_fault_memory(enum vm_guest_mode guest_mode, void *arg)
uint64_t guest_test_virt_mem;
uint64_t alignment, guest_page_size;
+ size_t backing_src_pagesz = get_backing_src_pagesz(p->mem_backing_src);
+
pr_info("Testing guest mode: %s\n", vm_guest_mode_string(guest_mode));
+ pr_info("Testing memory backing src type: %s\n",
+ vm_mem_backing_src_alias(p->mem_backing_src)->name);
vm = vm_create_shape_with_one_vcpu(shape, &vcpu, guest_code);
guest_page_size = vm_guest_mode_params[guest_mode].page_size;
test_config.page_size = guest_page_size;
- test_config.test_size = TEST_BASE_SIZE + test_config.page_size;
+ test_config.test_size = align_up(TEST_BASE_SIZE + test_config.page_size,
+ backing_src_pagesz);
test_config.test_num_pages = vm_calc_num_guest_pages(vm->mode, test_config.test_size);
guest_test_phys_mem = (vm->max_gfn - test_config.test_num_pages) * test_config.page_size;
@@ -206,20 +212,23 @@ static void __test_pre_fault_memory(enum vm_guest_mode guest_mode, void *arg)
#else
alignment = SZ_2M;
#endif
+ alignment = max(alignment, backing_src_pagesz);
guest_test_phys_mem = align_down(guest_test_phys_mem, alignment);
guest_test_virt_mem = guest_test_phys_mem & ((1ULL << (vm->va_bits - 1)) - 1);
- vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+ vm_userspace_mem_region_add(vm, p->mem_backing_src,
guest_test_phys_mem, TEST_SLOT, test_config.test_num_pages,
p->private ? KVM_MEM_GUEST_MEMFD : 0);
virt_map(vm, guest_test_virt_mem, guest_test_phys_mem, test_config.test_num_pages);
if (p->private)
vm_mem_set_private(vm, guest_test_phys_mem, test_config.test_size);
- pre_fault_memory(vcpu, guest_test_phys_mem, TEST_BASE_SIZE, 0, p->private);
+
+ pre_fault_memory(vcpu, guest_test_phys_mem, test_config.test_size, 0, p->private);
/* Test pre-faulting over an already faulted range */
- pre_fault_memory(vcpu, guest_test_phys_mem, TEST_BASE_SIZE, 0, p->private);
- pre_fault_memory(vcpu, guest_test_phys_mem + TEST_BASE_SIZE,
+ pre_fault_memory(vcpu, guest_test_phys_mem, test_config.test_size, 0, p->private);
+ pre_fault_memory(vcpu, guest_test_phys_mem +
+ test_config.test_size - test_config.page_size,
test_config.page_size * 2, test_config.page_size, p->private);
pre_fault_memory(vcpu, guest_test_phys_mem + test_config.test_size,
test_config.page_size, test_config.page_size, p->private);
@@ -251,7 +260,8 @@ static void __test_pre_fault_memory(enum vm_guest_mode guest_mode, void *arg)
kvm_vm_free(vm);
}
-static void test_pre_fault_memory(unsigned long vm_type, bool private)
+static void test_pre_fault_memory(unsigned long vm_type, enum vm_mem_backing_src_type backing_src,
+ bool private)
{
if (vm_type && !(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type))) {
pr_info("Skipping tests for vm_type 0x%lx\n", vm_type);
@@ -261,6 +271,7 @@ static void test_pre_fault_memory(unsigned long vm_type, bool private)
struct test_params p = {
.vm_type = vm_type,
.private = private,
+ .mem_backing_src = backing_src,
};
for_each_guest_mode(__test_pre_fault_memory, &p);
@@ -270,10 +281,23 @@ int main(int argc, char *argv[])
{
TEST_REQUIRE(kvm_check_cap(KVM_CAP_PRE_FAULT_MEMORY));
- test_pre_fault_memory(0, false);
+ int opt;
+ enum vm_mem_backing_src_type backing = VM_MEM_SRC_ANONYMOUS;
+
+ while ((opt = getopt(argc, argv, "m:")) != -1) {
+ switch (opt) {
+ case 'm':
+ backing = parse_backing_src_type(optarg);
+ break;
+ default:
+ break;
+ }
+ }
+
+ test_pre_fault_memory(0, backing, false);
#ifdef __x86_64__
- test_pre_fault_memory(KVM_X86_SW_PROTECTED_VM, false);
- test_pre_fault_memory(KVM_X86_SW_PROTECTED_VM, true);
+ test_pre_fault_memory(KVM_X86_SW_PROTECTED_VM, backing, false);
+ test_pre_fault_memory(KVM_X86_SW_PROTECTED_VM, backing, true);
#endif
return 0;
}
--
2.43.0
* Re: [PATCH v3 1/3] KVM: arm64: Add pre_fault_memory implementation
From: Vladimir Murzin @ 2025-11-24 10:38 UTC
To: Jack Thomson, maz, oliver.upton, pbonzini
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
shuah, linux-arm-kernel, kvmarm, linux-kernel, linux-kselftest,
isaku.yamahata, xmarcalx, kalyazin, jackabt
Hi Jack,
On 11/19/25 15:49, Jack Thomson wrote:
> From: Jack Thomson <jackabt@amazon.com>
>
> Add kvm_arch_vcpu_pre_fault_memory() for arm64. The implementation hands
> off the stage-2 faulting logic to either gmem_abort() or
> user_mem_abort().
>
> Add an optional page_size output parameter to user_mem_abort() to
> return the VMA page size, which is needed when pre-faulting.
>
> Update the documentation to clarify x86 specific behaviour.
>
> Signed-off-by: Jack Thomson <jackabt@amazon.com>
It works quite well for a few cases I have, so FWIW
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Cheers
Vladimir
* Re: [PATCH v3 1/3] KVM: arm64: Add pre_fault_memory implementation
From: Marc Zyngier @ 2025-11-24 11:34 UTC
To: Jack Thomson
Cc: oliver.upton, pbonzini, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, shuah, linux-arm-kernel, kvmarm,
linux-kernel, linux-kselftest, isaku.yamahata, xmarcalx, kalyazin,
jackabt
On Wed, 19 Nov 2025 15:49:08 +0000,
Jack Thomson <jackabt.amazon@gmail.com> wrote:
>
> From: Jack Thomson <jackabt@amazon.com>
>
> Add kvm_arch_vcpu_pre_fault_memory() for arm64. The implementation hands
> off the stage-2 faulting logic to either gmem_abort() or
> user_mem_abort().
>
> Add an optional page_size output parameter to user_mem_abort() to
> return the VMA page size, which is needed when pre-faulting.
>
> Update the documentation to clarify x86 specific behaviour.
>
> Signed-off-by: Jack Thomson <jackabt@amazon.com>
> ---
> Documentation/virt/kvm/api.rst | 3 +-
> arch/arm64/kvm/Kconfig | 1 +
> arch/arm64/kvm/arm.c | 1 +
> arch/arm64/kvm/mmu.c | 73 ++++++++++++++++++++++++++++++++--
> 4 files changed, 73 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 57061fa29e6a..30872d080511 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6493,7 +6493,8 @@ Errors:
> KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory
> for the current vCPU state. KVM maps memory as if the vCPU generated a
> stage-2 read page fault, e.g. faults in memory as needed, but doesn't break
> -CoW. However, KVM does not mark any newly created stage-2 PTE as Accessed.
> +CoW. However, on x86, KVM does not mark any newly created stage-2 PTE as
> +Accessed.
>
> In the case of confidential VM types where there is an initial set up of
> private guest memory before the guest is 'finalized'/measured, this ioctl
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 4f803fd1c99a..6872aaabe16c 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -25,6 +25,7 @@ menuconfig KVM
> select HAVE_KVM_CPU_RELAX_INTERCEPT
> select KVM_MMIO
> select KVM_GENERIC_DIRTYLOG_READ_PROTECT
> + select KVM_GENERIC_PRE_FAULT_MEMORY
> select VIRT_XFER_TO_GUEST_WORK
> select KVM_VFIO
> select HAVE_KVM_DIRTY_RING_ACQ_REL
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 870953b4a8a7..88c5dc2b4ee8 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -327,6 +327,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_IRQFD_RESAMPLE:
> case KVM_CAP_COUNTER_OFFSET:
> case KVM_CAP_ARM_WRITABLE_IMP_ID_REGS:
> + case KVM_CAP_PRE_FAULT_MEMORY:
> r = 1;
How does this work with pKVM, where the host is not in charge of
dealing with stage-2?
> break;
> case KVM_CAP_SET_GUEST_DEBUG2:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 7cc964af8d30..cba09168fc6d 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1599,8 +1599,8 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> struct kvm_s2_trans *nested,
> - struct kvm_memory_slot *memslot, unsigned long hva,
> - bool fault_is_perm)
> + struct kvm_memory_slot *memslot, long *page_size,
Why is page_size a signed type? A page size is never negative.
> + unsigned long hva, bool fault_is_perm)
I really wish we'd stop adding parameters to this function, as it has
long stopped being readable. It would make a lot more sense if we
passed a descriptor for the fault, containing the ipa, hva, memslot
and fault type.
> {
> int ret = 0;
> bool topup_memcache;
> @@ -1878,6 +1878,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> kvm_release_faultin_page(kvm, page, !!ret, writable);
> kvm_fault_unlock(kvm);
>
> + if (page_size)
> + *page_size = vma_pagesize;
> +
> /* Mark the page dirty only if the fault is handled successfully */
> if (writable && !ret)
> mark_page_dirty_in_slot(kvm, memslot, gfn);
> @@ -2080,8 +2083,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
> esr_fsc_is_permission_fault(esr));
> else
> - ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> - esr_fsc_is_permission_fault(esr));
> + ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, NULL,
> + hva, esr_fsc_is_permission_fault(esr));
> if (ret == 0)
> ret = 1;
> out:
> @@ -2457,3 +2460,65 @@ void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled)
>
> trace_kvm_toggle_cache(*vcpu_pc(vcpu), was_enabled, now_enabled);
> }
> +
> +long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> + struct kvm_pre_fault_memory *range)
> +{
> + int ret, idx;
> + hva_t hva;
> + phys_addr_t end;
> + struct kvm_memory_slot *memslot;
> + struct kvm_vcpu_fault_info stored_fault, *fault_info;
> +
> + long page_size = PAGE_SIZE;
> + phys_addr_t ipa = range->gpa;
> + gfn_t gfn = gpa_to_gfn(range->gpa);
nit: Please order this in a more readable way, preferably with the
longest line first.
> +
> + idx = srcu_read_lock(&vcpu->kvm->srcu);
??? Aren't we already guaranteed to be under the SRCU read lock?
> +
> + if (ipa >= kvm_phys_size(vcpu->arch.hw_mmu)) {
> + ret = -ENOENT;
> + goto out_unlock;
> + }
> +
> + memslot = gfn_to_memslot(vcpu->kvm, gfn);
> + if (!memslot) {
> + ret = -ENOENT;
> + goto out_unlock;
> + }
> +
> + fault_info = &vcpu->arch.fault;
> + stored_fault = *fault_info;
If this is a vcpu ioctl, can the fault information actually be valid
while userspace is issuing an ioctl? Wouldn't that mean that we are
exiting to userspace in the middle of handling an exception?
> +
> + /* Generate a synthetic abort for the pre-fault address */
> + fault_info->esr_el2 = ESR_ELx_EC_DABT_LOW;
> + fault_info->esr_el2 &= ~ESR_ELx_ISV;
You are constructing this from scratch. How can ISV be set?
> + fault_info->esr_el2 |= ESR_ELx_FSC_FAULT_L(KVM_PGTABLE_LAST_LEVEL);
> +
> + fault_info->hpfar_el2 = HPFAR_EL2_NS |
> + FIELD_PREP(HPFAR_EL2_FIPA, ipa >> 12);
> +
> + if (kvm_slot_has_gmem(memslot)) {
> + ret = gmem_abort(vcpu, ipa, NULL, memslot, false);
> + } else {
> + hva = gfn_to_hva_memslot_prot(memslot, gfn, NULL);
> + if (kvm_is_error_hva(hva)) {
> + ret = -EFAULT;
> + goto out;
> + }
> + ret = user_mem_abort(vcpu, ipa, NULL, memslot, &page_size, hva,
> + false);
> + }
> +
> + if (ret < 0)
> + goto out;
> +
> + end = (range->gpa & ~(page_size - 1)) + page_size;
This suspiciously looks like one of the __ALIGN_KERNEL* macros.
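(A sketch of the equivalent with the stock helper, assuming page_size
is a power of two:

	end = ALIGN_DOWN(range->gpa, page_size) + page_size;
)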
> + ret = min(range->size, end - range->gpa);
> +
> +out:
> + *fault_info = stored_fault;
> +out_unlock:
> + srcu_read_unlock(&vcpu->kvm->srcu, idx);
> + return ret;
> +}
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH v3 1/3] KVM: arm64: Add pre_fault_memory implementation
From: Marc Zyngier @ 2025-11-24 12:54 UTC
To: Jack Thomson
Cc: oliver.upton, pbonzini, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, shuah, linux-arm-kernel, kvmarm,
linux-kernel, linux-kselftest, isaku.yamahata, xmarcalx, kalyazin,
jackabt
On Mon, 24 Nov 2025 11:34:38 +0000,
Marc Zyngier <maz@kernel.org> wrote:
>
> On Wed, 19 Nov 2025 15:49:08 +0000,
> Jack Thomson <jackabt.amazon@gmail.com> wrote:
> >
[...]
> > + fault_info->hpfar_el2 = HPFAR_EL2_NS |
> > + FIELD_PREP(HPFAR_EL2_FIPA, ipa >> 12);
> > +
> > + if (kvm_slot_has_gmem(memslot)) {
> > + ret = gmem_abort(vcpu, ipa, NULL, memslot, false);
> > + } else {
> > + hva = gfn_to_hva_memslot_prot(memslot, gfn, NULL);
> > + if (kvm_is_error_hva(hva)) {
> > + ret = -EFAULT;
> > + goto out;
> > + }
> > + ret = user_mem_abort(vcpu, ipa, NULL, memslot, &page_size, hva,
> > + false);
> > + }
And thinking of it a bit more, this is completely broken. What happens
if the vcpu is in a nested context? You just populate random pages in
an IPA space that is not relevant at all, corrupting the guest state.
You must correctly handle the context the vcpu is in, instead of
assuming that this is the canonical context. This means going via the
*guest's* S2 translation, just like kvm_handle_guest_abort() does.
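(For illustration, a rough sketch of the shape of that handling as it
appears in kvm_handle_guest_abort(), details elided:

	struct kvm_s2_trans nested_trans, *nested = NULL;

	if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu) {
		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
		/* on failure, forward the fault to the guest hypervisor */
		fault_ipa = kvm_s2_trans_output(&nested_trans);
		nested = &nested_trans;
	}
)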
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH v3 1/3] KVM: arm64: Add pre_fault_memory implementation
From: Thomson, Jack @ 2025-11-26 17:14 UTC
To: Marc Zyngier
Cc: oliver.upton, pbonzini, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, shuah, linux-arm-kernel, kvmarm,
linux-kernel, linux-kselftest, isaku.yamahata, xmarcalx, kalyazin,
jackabt
On 24/11/2025 12:54 pm, Marc Zyngier wrote:
> On Mon, 24 Nov 2025 11:34:38 +0000,
> Marc Zyngier <maz@kernel.org> wrote:
>>
>> On Wed, 19 Nov 2025 15:49:08 +0000,
>> Jack Thomson <jackabt.amazon@gmail.com> wrote:
>>>
>
> [...]
>
>>> + fault_info->hpfar_el2 = HPFAR_EL2_NS |
>>> + FIELD_PREP(HPFAR_EL2_FIPA, ipa >> 12);
>>> +
>>> + if (kvm_slot_has_gmem(memslot)) {
>>> + ret = gmem_abort(vcpu, ipa, NULL, memslot, false);
>>> + } else {
>>> + hva = gfn_to_hva_memslot_prot(memslot, gfn, NULL);
>>> + if (kvm_is_error_hva(hva)) {
>>> + ret = -EFAULT;
>>> + goto out;
>>> + }
>>> + ret = user_mem_abort(vcpu, ipa, NULL, memslot, &page_size, hva,
>>> + false);
>>> + }
>
> And thinking of it a bit more, this is completely broken. What happens
> if the vcpu is in a nested context? You just populate random pages in
> an IPA space that is not relevant at all, corrupting the guest state.
>
> You must correctly handle the context the vcpu is in, instead of
> assuming that this is the canonical context. This means going via the
> *guest's* S2 translation, just like kvm_handle_guest_abort() does.
>
> M.
>
Thanks Marc for taking a look. I'll update to fix these issues and
address the other comments.
--
Thanks,
Jack
* Re: [PATCH v3 2/3] KVM: selftests: Enable pre_fault_memory_test for arm64
From: Vincent Donnefort @ 2025-12-05 17:33 UTC
To: Jack Thomson
Cc: maz, oliver.upton, pbonzini, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, will, shuah, linux-arm-kernel, kvmarm,
linux-kernel, linux-kselftest, isaku.yamahata, xmarcalx, kalyazin,
jackabt
On Wed, Nov 19, 2025 at 03:49:09PM +0000, Jack Thomson wrote:
> From: Jack Thomson <jackabt@amazon.com>
>
> Enable the pre_fault_memory_test to run on arm64 by making it work with
> different guest page sizes and testing multiple guest configurations.
>
> Update the test_assert to compare against the UCALL_EXIT_REASON, for
> portability, as arm64 exits with KVM_EXIT_MMIO while x86 uses
> KVM_EXIT_IO.
>
> Signed-off-by: Jack Thomson <jackabt@amazon.com>
> ---
> tools/testing/selftests/kvm/Makefile.kvm | 1 +
> .../selftests/kvm/pre_fault_memory_test.c | 78 ++++++++++++++-----
> 2 files changed, 58 insertions(+), 21 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
> index 148d427ff24b..0ddd8db60197 100644
> --- a/tools/testing/selftests/kvm/Makefile.kvm
> +++ b/tools/testing/selftests/kvm/Makefile.kvm
> @@ -183,6 +183,7 @@ TEST_GEN_PROGS_arm64 += memslot_perf_test
> TEST_GEN_PROGS_arm64 += mmu_stress_test
> TEST_GEN_PROGS_arm64 += rseq_test
> TEST_GEN_PROGS_arm64 += steal_time
> +TEST_GEN_PROGS_arm64 += pre_fault_memory_test
>
> TEST_GEN_PROGS_s390 = $(TEST_GEN_PROGS_COMMON)
> TEST_GEN_PROGS_s390 += s390/memop
> diff --git a/tools/testing/selftests/kvm/pre_fault_memory_test.c b/tools/testing/selftests/kvm/pre_fault_memory_test.c
> index f04768c1d2e4..674931e7bb3a 100644
> --- a/tools/testing/selftests/kvm/pre_fault_memory_test.c
> +++ b/tools/testing/selftests/kvm/pre_fault_memory_test.c
> @@ -11,19 +11,29 @@
> #include <kvm_util.h>
> #include <processor.h>
> #include <pthread.h>
> +#include <guest_modes.h>
>
> /* Arbitrarily chosen values */
> -#define TEST_SIZE (SZ_2M + PAGE_SIZE)
> -#define TEST_NPAGES (TEST_SIZE / PAGE_SIZE)
After applying on top of the base commit
8a4821412cf2c1429fffa07c012dd150f2edf78c, it does not build:
TEST_NPAGES is still used in delete_slot_worker(). I believe that's
because of
"KVM: selftests: Test prefault memory during concurrent memslot removal",
which also causes related issues with the pre_fault_memory() prototype.
Is that the correct base-commit or shall I use something else?
> +#define TEST_BASE_SIZE SZ_2M
> #define TEST_SLOT 10
>
[...]