linux-arm-kernel.lists.infradead.org archive mirror
* [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault"
@ 2025-08-21 21:00 Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 01/16] KVM: arm64: Drop nested "esr" to eliminate variable shadowing Sean Christopherson
                   ` (16 more replies)
  0 siblings, 17 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Add an arm64 version of "struct kvm_page_fault" to (hopefully) tidy up
the abort path, and to pave the way for things like KVM Userfault[*] that
want to consume kvm_page_fault in arch-neutral code.

This is essentially one giant nop of code shuffling.
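
For reference, the end state of the structure being built up across the
series looks roughly like the sketch below (paraphrased from patches 5, 9
and 12; patches 13 and 14 additionally stash mmu_seq and the "forced"
state, see the individual changelogs for the exact definitions):

  struct kvm_page_fault {
  	const u64 esr;
  	const bool exec;
  	const bool write;
  	const bool is_perm;
  	const u64 granule;

  	phys_addr_t fault_ipa;	/* The address we faulted on */
  	phys_addr_t ipa;	/* Always the IPA in the L1 guest phys space */

  	struct kvm_s2_trans *nested;

  	gfn_t gfn;
  	struct kvm_memory_slot *slot;
  	unsigned long hva;
  	kvm_pfn_t pfn;
  	struct page *page;

  	struct {
  		vm_flags_t vm_flags;
  		short pageshift;
  		bool is_cacheable;
  	} vma;

  	long pagesize;
  };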

RFC as this is only compile-tested.  I didn't want to spend time testing
until getting feedback on whether y'all are amenable to the general idea.

[*] https://lore.kernel.org/all/20250618042424.330664-1-jthoughton@google.com

Sean Christopherson (16):
  KVM: arm64: Drop nested "esr" to eliminate variable shadowing
  KVM: arm64: Get iabt status on-demand
  KVM: arm64: Move SRCU-protected region of kvm_handle_guest_abort() to
    helper
  KVM: arm64: Use guard(srcu) in kvm_handle_guest_abort()
  KVM: arm64: Introduce "struct kvm_page_fault" for tracking abort state
  KVM: arm64: Pass kvm_page_fault pointer to
    transparent_hugepage_adjust()
  KVM: arm64: Pass @fault to fault_supports_stage2_huge_mapping()
  KVM: arm64: Add helper to get permission fault granule from ESR
  KVM: arm64: Track perm fault granule in "struct kvm_page_fault"
  KVM: arm64: Drop local vfio_allow_any_uc, use vm_flags snapshot
  KVM: arm64: Drop local mte_allowed, use vm_flags snapshot
  KVM: arm64: Move VMA-related information into "struct kvm_page_fault"
  KVM: arm64: Stash "mmu_seq" in "struct kvm_page_fault"
  KVM: arm64: Track "forced" information in "struct kvm_page_fault"
  KVM: arm64: Extract mmap_lock-protected code to helper for user mem
    aborts
  KVM: arm64: Don't bother nullifying "vma" in mem abort path

 arch/arm64/include/asm/esr.h         |   6 +
 arch/arm64/include/asm/kvm_emulate.h |   9 -
 arch/arm64/include/asm/kvm_host.h    |  32 ++
 arch/arm64/kvm/mmu.c                 | 514 +++++++++++++--------------
 4 files changed, 282 insertions(+), 279 deletions(-)


base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
-- 
2.51.0.261.g7ce5a0a67e-goog




* [RFC PATCH 01/16] KVM: arm64: Drop nested "esr" to eliminate variable shadowing
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
@ 2025-08-21 21:00 ` Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 02/16] KVM: arm64: Get iabt status on-demand Sean Christopherson
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Drop the local "esr" variable in kvm_handle_guest_abort() that's used as
a very temporary scratch value when injecting nested stage-2 faults, so
that it no longer shadows the function's top-level "esr", which holds the
original state provided by hardware.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/kvm/mmu.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1c78864767c5..dc3aa58e2ea5 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1904,19 +1904,15 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	 */
 	if (kvm_is_nested_s2_mmu(vcpu->kvm,vcpu->arch.hw_mmu) &&
 	    vcpu->arch.hw_mmu->nested_stage2_enabled) {
-		u32 esr;
-
 		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
 		if (ret) {
-			esr = kvm_s2_trans_esr(&nested_trans);
-			kvm_inject_s2_fault(vcpu, esr);
+			kvm_inject_s2_fault(vcpu, kvm_s2_trans_esr(&nested_trans));
 			goto out_unlock;
 		}
 
 		ret = kvm_s2_handle_perm_fault(vcpu, &nested_trans);
 		if (ret) {
-			esr = kvm_s2_trans_esr(&nested_trans);
-			kvm_inject_s2_fault(vcpu, esr);
+			kvm_inject_s2_fault(vcpu, kvm_s2_trans_esr(&nested_trans));
 			goto out_unlock;
 		}
 
-- 
2.51.0.261.g7ce5a0a67e-goog




* [RFC PATCH 02/16] KVM: arm64: Get iabt status on-demand
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 01/16] KVM: arm64: Drop nested "esr" to eliminate variable shadowing Sean Christopherson
@ 2025-08-21 21:00 ` Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 03/16] KVM: arm64: Move SRCU-protected region of kvm_handle_guest_abort() to helper Sean Christopherson
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Query whether an abort was due to instruction execution only when that
information is actually needed, in anticipation of factoring out the
SRCU-protected portion of abort handling to a separate helper.  The happy
path doesn't need to check for an instruction abort, and eliminating the
local variable avoids having to pass a large pile of booleans to the
inner helper.
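
For context, querying the abort type on-demand is cheap: the helper boils
down to an exception class check on the already-cached ESR, roughly along
the lines of the sketch below (paraphrased from kvm_emulate.h; the exact
definition may differ slightly):

  static __always_inline bool kvm_vcpu_trap_is_iabt(const struct kvm_vcpu *vcpu)
  {
  	return kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_IABT_LOW;
  }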

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/kvm/mmu.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index dc3aa58e2ea5..1e3ac283c519 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1830,7 +1830,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
 	struct kvm_memory_slot *memslot;
 	unsigned long hva;
-	bool is_iabt, write_fault, writable;
+	bool write_fault, writable;
 	gfn_t gfn;
 	int ret, idx;
 
@@ -1856,8 +1856,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	if (KVM_BUG_ON(ipa == INVALID_GPA, vcpu->kvm))
 		return -EFAULT;
 
-	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
-
 	if (esr_fsc_is_translation_fault(esr)) {
 		/* Beyond sanitised PARange (which is the IPA limit) */
 		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
@@ -1869,7 +1867,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		if (fault_ipa >= BIT_ULL(VTCR_EL2_IPA(vcpu->arch.hw_mmu->vtcr))) {
 			fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
 
-			return kvm_inject_sea(vcpu, is_iabt, fault_ipa);
+			return kvm_inject_sea(vcpu, kvm_vcpu_trap_is_iabt(vcpu),
+					      fault_ipa);
 		}
 	}
 
@@ -1931,7 +1930,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		 * anything about this (there's no syndrome for a start), so
 		 * re-inject the abort back into the guest.
 		 */
-		if (is_iabt) {
+		if (kvm_vcpu_trap_is_iabt(vcpu)) {
 			ret = -ENOEXEC;
 			goto out;
 		}
-- 
2.51.0.261.g7ce5a0a67e-goog




* [RFC PATCH 03/16] KVM: arm64: Move SRCU-protected region of kvm_handle_guest_abort() to helper
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 01/16] KVM: arm64: Drop nested "esr" to eliminate variable shadowing Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 02/16] KVM: arm64: Get iabt status on-demand Sean Christopherson
@ 2025-08-21 21:00 ` Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 04/16] KVM: arm64: Use guard(srcu) in kvm_handle_guest_abort() Sean Christopherson
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Move the SRCU-protected portion of the abort handler to a separate helper
in anticipation of adding "struct kvm_page_fault" to track state related
to resolving the fault.  Using a separate helper will allow making several
fields in kvm_page_fault "const", without having to do something funky
like creating a temporary copy in the middle of kvm_handle_guest_abort().

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/kvm/mmu.c | 172 ++++++++++++++++++++++---------------------
 1 file changed, 88 insertions(+), 84 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1e3ac283c519..de028471b9eb 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1811,82 +1811,16 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 	read_unlock(&vcpu->kvm->mmu_lock);
 }
 
-/**
- * kvm_handle_guest_abort - handles all 2nd stage aborts
- * @vcpu:	the VCPU pointer
- *
- * Any abort that gets to the host is almost guaranteed to be caused by a
- * missing second stage translation table entry, which can mean that either the
- * guest simply needs more memory and we must allocate an appropriate page or it
- * can mean that the guest tried to access I/O memory, which is emulated by user
- * space. The distinction is based on the IPA causing the fault and whether this
- * memory region has been registered as standard RAM by user space.
- */
-int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
+static int __kvm_handle_guest_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+				    unsigned long esr)
 {
 	struct kvm_s2_trans nested_trans, *nested = NULL;
-	unsigned long esr;
-	phys_addr_t fault_ipa; /* The address we faulted on */
-	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
 	struct kvm_memory_slot *memslot;
-	unsigned long hva;
 	bool write_fault, writable;
+	unsigned long hva;
+	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
 	gfn_t gfn;
-	int ret, idx;
-
-	/* Synchronous External Abort? */
-	if (kvm_vcpu_abt_issea(vcpu)) {
-		/*
-		 * For RAS the host kernel may handle this abort.
-		 * There is no need to pass the error into the guest.
-		 */
-		if (kvm_handle_guest_sea())
-			return kvm_inject_serror(vcpu);
-
-		return 1;
-	}
-
-	esr = kvm_vcpu_get_esr(vcpu);
-
-	/*
-	 * The fault IPA should be reliable at this point as we're not dealing
-	 * with an SEA.
-	 */
-	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
-	if (KVM_BUG_ON(ipa == INVALID_GPA, vcpu->kvm))
-		return -EFAULT;
-
-	if (esr_fsc_is_translation_fault(esr)) {
-		/* Beyond sanitised PARange (which is the IPA limit) */
-		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
-			kvm_inject_size_fault(vcpu);
-			return 1;
-		}
-
-		/* Falls between the IPA range and the PARange? */
-		if (fault_ipa >= BIT_ULL(VTCR_EL2_IPA(vcpu->arch.hw_mmu->vtcr))) {
-			fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
-
-			return kvm_inject_sea(vcpu, kvm_vcpu_trap_is_iabt(vcpu),
-					      fault_ipa);
-		}
-	}
-
-	trace_kvm_guest_fault(*vcpu_pc(vcpu), kvm_vcpu_get_esr(vcpu),
-			      kvm_vcpu_get_hfar(vcpu), fault_ipa);
-
-	/* Check the stage-2 fault is trans. fault or write fault */
-	if (!esr_fsc_is_translation_fault(esr) &&
-	    !esr_fsc_is_permission_fault(esr) &&
-	    !esr_fsc_is_access_flag_fault(esr)) {
-		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
-			kvm_vcpu_trap_get_class(vcpu),
-			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
-			(unsigned long)kvm_vcpu_get_esr(vcpu));
-		return -EFAULT;
-	}
-
-	idx = srcu_read_lock(&vcpu->kvm->srcu);
+	int ret;
 
 	/*
 	 * We may have faulted on a shadow stage 2 page table if we are
@@ -1906,13 +1840,13 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
 		if (ret) {
 			kvm_inject_s2_fault(vcpu, kvm_s2_trans_esr(&nested_trans));
-			goto out_unlock;
+			return ret;
 		}
 
 		ret = kvm_s2_handle_perm_fault(vcpu, &nested_trans);
 		if (ret) {
 			kvm_inject_s2_fault(vcpu, kvm_s2_trans_esr(&nested_trans));
-			goto out_unlock;
+			return ret;
 		}
 
 		ipa = kvm_s2_trans_output(&nested_trans);
@@ -1935,10 +1869,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 			goto out;
 		}
 
-		if (kvm_vcpu_abt_iss1tw(vcpu)) {
-			ret = kvm_inject_sea_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
-			goto out_unlock;
-		}
+		if (kvm_vcpu_abt_iss1tw(vcpu))
+			return kvm_inject_sea_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
 
 		/*
 		 * Check for a cache maintenance operation. Since we
@@ -1952,8 +1884,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		 */
 		if (kvm_is_error_hva(hva) && kvm_vcpu_dabt_is_cm(vcpu)) {
 			kvm_incr_pc(vcpu);
-			ret = 1;
-			goto out_unlock;
+			return 1;
 		}
 
 		/*
@@ -1963,8 +1894,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		 * of the page size.
 		 */
 		ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
-		ret = io_mem_abort(vcpu, ipa);
-		goto out_unlock;
+		return io_mem_abort(vcpu, ipa);
 	}
 
 	/* Userspace should not be able to register out-of-bounds IPAs */
@@ -1972,8 +1902,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 
 	if (esr_fsc_is_access_flag_fault(esr)) {
 		handle_access_fault(vcpu, fault_ipa);
-		ret = 1;
-		goto out_unlock;
+		return 1;
 	}
 
 	ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
@@ -1983,7 +1912,82 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 out:
 	if (ret == -ENOEXEC)
 		ret = kvm_inject_sea_iabt(vcpu, kvm_vcpu_get_hfar(vcpu));
-out_unlock:
+	return ret;
+}
+
+/**
+ * kvm_handle_guest_abort - handles all 2nd stage aborts
+ * @vcpu:	the VCPU pointer
+ *
+ * Any abort that gets to the host is almost guaranteed to be caused by a
+ * missing second stage translation table entry, which can mean that either the
+ * guest simply needs more memory and we must allocate an appropriate page or it
+ * can mean that the guest tried to access I/O memory, which is emulated by user
+ * space. The distinction is based on the IPA causing the fault and whether this
+ * memory region has been registered as standard RAM by user space.
+ */
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
+{
+	unsigned long esr;
+	phys_addr_t fault_ipa; /* The address we faulted on */
+	int ret, idx;
+
+	/* Synchronous External Abort? */
+	if (kvm_vcpu_abt_issea(vcpu)) {
+		/*
+		 * For RAS the host kernel may handle this abort.
+		 * There is no need to pass the error into the guest.
+		 */
+		if (kvm_handle_guest_sea())
+			return kvm_inject_serror(vcpu);
+
+		return 1;
+	}
+
+	esr = kvm_vcpu_get_esr(vcpu);
+
+	/*
+	 * The fault IPA should be reliable at this point as we're not dealing
+	 * with an SEA.
+	 */
+	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+	if (KVM_BUG_ON(fault_ipa == INVALID_GPA, vcpu->kvm))
+		return -EFAULT;
+
+	if (esr_fsc_is_translation_fault(esr)) {
+		/* Beyond sanitised PARange (which is the IPA limit) */
+		if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
+			kvm_inject_size_fault(vcpu);
+			return 1;
+		}
+
+		/* Falls between the IPA range and the PARange? */
+		if (fault_ipa >= BIT_ULL(VTCR_EL2_IPA(vcpu->arch.hw_mmu->vtcr))) {
+			fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
+
+			return kvm_inject_sea(vcpu, kvm_vcpu_trap_is_iabt(vcpu),
+					      fault_ipa);
+		}
+	}
+
+	trace_kvm_guest_fault(*vcpu_pc(vcpu), kvm_vcpu_get_esr(vcpu),
+			      kvm_vcpu_get_hfar(vcpu), fault_ipa);
+
+	/* Check the stage-2 fault is trans. fault or write fault */
+	if (!esr_fsc_is_translation_fault(esr) &&
+	    !esr_fsc_is_permission_fault(esr) &&
+	    !esr_fsc_is_access_flag_fault(esr)) {
+		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
+			kvm_vcpu_trap_get_class(vcpu),
+			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
+			(unsigned long)kvm_vcpu_get_esr(vcpu));
+		return -EFAULT;
+	}
+
+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+
+	ret = __kvm_handle_guest_abort(vcpu, fault_ipa, esr);
+
 	srcu_read_unlock(&vcpu->kvm->srcu, idx);
 	return ret;
 }
-- 
2.51.0.261.g7ce5a0a67e-goog




* [RFC PATCH 04/16] KVM: arm64: Use guard(srcu) in kvm_handle_guest_abort()
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
                   ` (2 preceding siblings ...)
  2025-08-21 21:00 ` [RFC PATCH 03/16] KVM: arm64: Move SRCU-protected region of kvm_handle_guest_abort() to helper Sean Christopherson
@ 2025-08-21 21:00 ` Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 05/16] KVM: arm64: Introduce "struct kvm_page_fault" for tracking abort state Sean Christopherson
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Use a guard() to acquire/release SRCU when handling guest aborts to
simplify the code a bit.
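
For reference, guard(srcu)(&vcpu->kvm->srcu) is the scope-based form of
the open-coded pair, roughly equivalent to the sketch below (assuming the
srcu lock guard from include/linux/srcu.h; the exact mechanics may differ):

  int idx = srcu_read_lock(&vcpu->kvm->srcu);
  ...
  srcu_read_unlock(&vcpu->kvm->srcu, idx);	/* runs automatically at scope exit */

i.e. kvm_handle_guest_abort() can return the helper's result directly
without manually tracking the SRCU index.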

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/kvm/mmu.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index de028471b9eb..49ce6bf623f7 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1930,7 +1930,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 {
 	unsigned long esr;
 	phys_addr_t fault_ipa; /* The address we faulted on */
-	int ret, idx;
 
 	/* Synchronous External Abort? */
 	if (kvm_vcpu_abt_issea(vcpu)) {
@@ -1984,12 +1983,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		return -EFAULT;
 	}
 
-	idx = srcu_read_lock(&vcpu->kvm->srcu);
+	guard(srcu)(&vcpu->kvm->srcu);
 
-	ret = __kvm_handle_guest_abort(vcpu, fault_ipa, esr);
-
-	srcu_read_unlock(&vcpu->kvm->srcu, idx);
-	return ret;
+	return __kvm_handle_guest_abort(vcpu, fault_ipa, esr);
 }
 
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
-- 
2.51.0.261.g7ce5a0a67e-goog




* [RFC PATCH 05/16] KVM: arm64: Introduce "struct kvm_page_fault" for tracking abort state
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
                   ` (3 preceding siblings ...)
  2025-08-21 21:00 ` [RFC PATCH 04/16] KVM: arm64: Use guard(srcu) in kvm_handle_guest_abort() Sean Christopherson
@ 2025-08-21 21:00 ` Sean Christopherson
  2025-08-21 22:31   ` Oliver Upton
  2025-08-21 21:00 ` [RFC PATCH 06/16] KVM: arm64: Pass kvm_page_fault pointer to transparent_hugepage_adjust() Sean Christopherson
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Add and use a kvm_page_fault structure to track state when handling a
guest abort.  Collecting everything in a single structure will enable a
variety of cleanups (reduce the number of params passed to helpers), and
will pave the way toward using "struct kvm_page_fault" in arch-neutral KVM
code, e.g. to consolidate logic for KVM_EXIT_MEMORY_FAULT.
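
Note, the "const" members work because C allows them to be set when the
object is defined via a designated initializer; any later assignment is
then rejected at compile time, e.g. the hypothetical snippet below would
fail to build:

  struct kvm_page_fault fault = { .esr = esr };	/* OK: initialization */

  fault.esr = 0;	/* build error: "esr" is const */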

No functional change intended.

Cc: James Houghton <jthoughton@google.com>
Link: https://lore.kernel.org/all/20250618042424.330664-1-jthoughton@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/include/asm/kvm_host.h |  18 ++++
 arch/arm64/kvm/mmu.c              | 143 ++++++++++++++----------------
 2 files changed, 87 insertions(+), 74 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2f2394cce24e..4623cbc1edf4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -413,6 +413,24 @@ struct kvm_vcpu_fault_info {
 	u64 disr_el1;		/* Deferred [SError] Status Register */
 };
 
+struct kvm_page_fault {
+	const u64 esr;
+	const bool exec;
+	const bool write;
+	const bool is_perm;
+
+	phys_addr_t fault_ipa; /* The address we faulted on */
+	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
+
+	struct kvm_s2_trans *nested;
+
+	gfn_t gfn;
+	struct kvm_memory_slot *slot;
+	unsigned long hva;
+	kvm_pfn_t pfn;
+	struct page *page;
+};
+
 /*
  * VNCR() just places the VNCR_capable registers in the enum after
  * __VNCR_START__, and the value (after correction) to be an 8-byte offset
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 49ce6bf623f7..ca98778989b2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1477,38 +1477,29 @@ static bool kvm_vma_is_cacheable(struct vm_area_struct *vma)
 	}
 }
 
-static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
-			  struct kvm_s2_trans *nested,
-			  struct kvm_memory_slot *memslot, unsigned long hva,
-			  bool fault_is_perm)
+static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	int ret = 0;
-	bool write_fault, writable, force_pte = false;
-	bool exec_fault, mte_allowed, is_vma_cacheable;
+	bool writable, force_pte = false;
+	bool mte_allowed, is_vma_cacheable;
 	bool s2_force_noncacheable = false, vfio_allow_any_uc = false;
 	unsigned long mmu_seq;
-	phys_addr_t ipa = fault_ipa;
 	struct kvm *kvm = vcpu->kvm;
 	struct vm_area_struct *vma;
 	short vma_shift;
 	void *memcache;
-	gfn_t gfn;
-	kvm_pfn_t pfn;
-	bool logging_active = memslot_is_logging(memslot);
+	bool logging_active = memslot_is_logging(fault->slot);
 	long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
-	struct page *page;
 	vm_flags_t vm_flags;
 	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
 
-	if (fault_is_perm)
+	if (fault->is_perm)
 		fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
-	write_fault = kvm_is_write_fault(vcpu);
-	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
-	VM_BUG_ON(write_fault && exec_fault);
+	VM_BUG_ON(fault->write && fault->exec);
 
-	if (fault_is_perm && !write_fault && !exec_fault) {
+	if (fault->is_perm && !fault->write && !fault->exec) {
 		kvm_err("Unexpected L2 read permission error\n");
 		return -EFAULT;
 	}
@@ -1524,7 +1515,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * only exception to this is when dirty logging is enabled at runtime
 	 * and a write fault needs to collapse a block entry into a table.
 	 */
-	if (!fault_is_perm || (logging_active && write_fault)) {
+	if (!fault->is_perm || (logging_active && fault->write)) {
 		int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
 
 		if (!is_protected_kvm_enabled())
@@ -1541,9 +1532,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * get block mapping for device MMIO region.
 	 */
 	mmap_read_lock(current->mm);
-	vma = vma_lookup(current->mm, hva);
+	vma = vma_lookup(current->mm, fault->hva);
 	if (unlikely(!vma)) {
-		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
+		kvm_err("Failed to find VMA for hva 0x%lx\n", fault->hva);
 		mmap_read_unlock(current->mm);
 		return -EFAULT;
 	}
@@ -1556,13 +1547,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		force_pte = true;
 		vma_shift = PAGE_SHIFT;
 	} else {
-		vma_shift = get_vma_page_shift(vma, hva);
+		vma_shift = get_vma_page_shift(vma, fault->hva);
 	}
 
 	switch (vma_shift) {
 #ifndef __PAGETABLE_PMD_FOLDED
 	case PUD_SHIFT:
-		if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
+		if (fault_supports_stage2_huge_mapping(fault->slot, fault->hva, PUD_SIZE))
 			break;
 		fallthrough;
 #endif
@@ -1570,7 +1561,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		vma_shift = PMD_SHIFT;
 		fallthrough;
 	case PMD_SHIFT:
-		if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
+		if (fault_supports_stage2_huge_mapping(fault->slot, fault->hva, PMD_SIZE))
 			break;
 		fallthrough;
 	case CONT_PTE_SHIFT:
@@ -1585,19 +1576,19 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	vma_pagesize = 1UL << vma_shift;
 
-	if (nested) {
+	if (fault->nested) {
 		unsigned long max_map_size;
 
 		max_map_size = force_pte ? PAGE_SIZE : PUD_SIZE;
 
-		ipa = kvm_s2_trans_output(nested);
+		WARN_ON_ONCE(fault->ipa != kvm_s2_trans_output(fault->nested));
 
 		/*
 		 * If we're about to create a shadow stage 2 entry, then we
 		 * can only create a block mapping if the guest stage 2 page
 		 * table uses at least as big a mapping.
 		 */
-		max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
+		max_map_size = min(kvm_s2_trans_size(fault->nested), max_map_size);
 
 		/*
 		 * Be careful that if the mapping size falls between
@@ -1618,11 +1609,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * place.
 	 */
 	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) {
-		fault_ipa &= ~(vma_pagesize - 1);
-		ipa &= ~(vma_pagesize - 1);
+		fault->fault_ipa &= ~(vma_pagesize - 1);
+		fault->ipa &= ~(vma_pagesize - 1);
 	}
 
-	gfn = ipa >> PAGE_SHIFT;
+	fault->gfn = fault->ipa >> PAGE_SHIFT;
 	mte_allowed = kvm_vma_mte_allowed(vma);
 
 	vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
@@ -1645,20 +1636,21 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
 	mmap_read_unlock(current->mm);
 
-	pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
-				&writable, &page);
-	if (pfn == KVM_PFN_ERR_HWPOISON) {
-		kvm_send_hwpoison_signal(hva, vma_shift);
+	fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn,
+				       fault->write ? FOLL_WRITE : 0,
+				       &writable, &fault->page);
+	if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
+		kvm_send_hwpoison_signal(fault->hva, vma_shift);
 		return 0;
 	}
-	if (is_error_noslot_pfn(pfn))
+	if (is_error_noslot_pfn(fault->pfn))
 		return -EFAULT;
 
 	/*
 	 * Check if this is non-struct page memory PFN, and cannot support
 	 * CMOs. It could potentially be unsafe to access as cachable.
 	 */
-	if (vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(pfn)) {
+	if (vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(fault->pfn)) {
 		if (is_vma_cacheable) {
 			/*
 			 * Whilst the VMA owner expects cacheable mapping to this
@@ -1687,7 +1679,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			 */
 			s2_force_noncacheable = true;
 		}
-	} else if (logging_active && !write_fault) {
+	} else if (logging_active && !fault->write) {
 		/*
 		 * Only actually map the page as writable if this was a write
 		 * fault.
@@ -1695,7 +1687,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		writable = false;
 	}
 
-	if (exec_fault && s2_force_noncacheable)
+	if (fault->exec && s2_force_noncacheable)
 		return -ENOEXEC;
 
 	/*
@@ -1709,12 +1701,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * used to limit the invalidation scope if a TTL hint or a range
 	 * isn't provided.
 	 */
-	if (nested) {
-		writable &= kvm_s2_trans_writable(nested);
-		if (!kvm_s2_trans_readable(nested))
+	if (fault->nested) {
+		writable &= kvm_s2_trans_writable(fault->nested);
+		if (!kvm_s2_trans_readable(fault->nested))
 			prot &= ~KVM_PGTABLE_PROT_R;
 
-		prot |= kvm_encode_nested_level(nested);
+		prot |= kvm_encode_nested_level(fault->nested);
 	}
 
 	kvm_fault_lock(kvm);
@@ -1729,12 +1721,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * backed by a THP and thus use block mapping if possible.
 	 */
 	if (vma_pagesize == PAGE_SIZE && !(force_pte || s2_force_noncacheable)) {
-		if (fault_is_perm && fault_granule > PAGE_SIZE)
+		if (fault->is_perm && fault_granule > PAGE_SIZE)
 			vma_pagesize = fault_granule;
 		else
-			vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
-								   hva, &pfn,
-								   &fault_ipa);
+			vma_pagesize = transparent_hugepage_adjust(kvm, fault->slot,
+								   fault->hva, &fault->pfn,
+								   &fault->fault_ipa);
 
 		if (vma_pagesize < 0) {
 			ret = vma_pagesize;
@@ -1742,10 +1734,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		}
 	}
 
-	if (!fault_is_perm && !s2_force_noncacheable && kvm_has_mte(kvm)) {
+	if (!fault->is_perm && !s2_force_noncacheable && kvm_has_mte(kvm)) {
 		/* Check the VMM hasn't introduced a new disallowed VMA */
 		if (mte_allowed) {
-			sanitise_mte_tags(kvm, pfn, vma_pagesize);
+			sanitise_mte_tags(kvm, fault->pfn, vma_pagesize);
 		} else {
 			ret = -EFAULT;
 			goto out_unlock;
@@ -1755,7 +1747,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (writable)
 		prot |= KVM_PGTABLE_PROT_W;
 
-	if (exec_fault)
+	if (fault->exec)
 		prot |= KVM_PGTABLE_PROT_X;
 
 	if (s2_force_noncacheable) {
@@ -1764,7 +1756,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		else
 			prot |= KVM_PGTABLE_PROT_DEVICE;
 	} else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
-		   (!nested || kvm_s2_trans_executable(nested))) {
+		   (!fault->nested || kvm_s2_trans_executable(fault->nested))) {
 		prot |= KVM_PGTABLE_PROT_X;
 	}
 
@@ -1773,26 +1765,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * permissions only if vma_pagesize equals fault_granule. Otherwise,
 	 * kvm_pgtable_stage2_map() should be called to change block size.
 	 */
-	if (fault_is_perm && vma_pagesize == fault_granule) {
+	if (fault->is_perm && vma_pagesize == fault_granule) {
 		/*
 		 * Drop the SW bits in favour of those stored in the
 		 * PTE, which will be preserved.
 		 */
 		prot &= ~KVM_NV_GUEST_MAP_SZ;
-		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault_ipa, prot, flags);
+		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault->fault_ipa, prot, flags);
 	} else {
-		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, vma_pagesize,
-					     __pfn_to_phys(pfn), prot,
+		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault->fault_ipa, vma_pagesize,
+					     __pfn_to_phys(fault->pfn), prot,
 					     memcache, flags);
 	}
 
 out_unlock:
-	kvm_release_faultin_page(kvm, page, !!ret, writable);
+	kvm_release_faultin_page(kvm, fault->page, !!ret, writable);
 	kvm_fault_unlock(kvm);
 
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret)
-		mark_page_dirty_in_slot(kvm, memslot, gfn);
+		mark_page_dirty_in_slot(kvm, fault->slot, fault->gfn);
 
 	return ret != -EAGAIN ? ret : 0;
 }
@@ -1814,12 +1806,17 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 static int __kvm_handle_guest_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 				    unsigned long esr)
 {
-	struct kvm_s2_trans nested_trans, *nested = NULL;
-	struct kvm_memory_slot *memslot;
-	bool write_fault, writable;
-	unsigned long hva;
-	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
-	gfn_t gfn;
+	struct kvm_page_fault fault = {
+		.fault_ipa = fault_ipa,
+		.esr = esr,
+		.ipa = fault_ipa,
+
+		.write = kvm_is_write_fault(vcpu),
+		.exec  = kvm_vcpu_trap_is_exec_fault(vcpu),
+		.is_perm = esr_fsc_is_permission_fault(esr),
+	};
+	struct kvm_s2_trans nested_trans;
+	bool writable;
 	int ret;
 
 	/*
@@ -1849,15 +1846,14 @@ static int __kvm_handle_guest_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa
 			return ret;
 		}
 
-		ipa = kvm_s2_trans_output(&nested_trans);
-		nested = &nested_trans;
+		fault.ipa = kvm_s2_trans_output(&nested_trans);
+		fault.nested = &nested_trans;
 	}
 
-	gfn = ipa >> PAGE_SHIFT;
-	memslot = gfn_to_memslot(vcpu->kvm, gfn);
-	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
-	write_fault = kvm_is_write_fault(vcpu);
-	if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
+	fault.gfn = fault.ipa >> PAGE_SHIFT;
+	fault.slot = gfn_to_memslot(vcpu->kvm, fault.gfn);
+	fault.hva = gfn_to_hva_memslot_prot(fault.slot, fault.gfn, &writable);
+	if (kvm_is_error_hva(fault.hva) || (fault.write && !writable)) {
 		/*
 		 * The guest has put either its instructions or its page-tables
 		 * somewhere it shouldn't have. Userspace won't be able to do
@@ -1882,7 +1878,7 @@ static int __kvm_handle_guest_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa
 		 * So let's assume that the guest is just being
 		 * cautious, and skip the instruction.
 		 */
-		if (kvm_is_error_hva(hva) && kvm_vcpu_dabt_is_cm(vcpu)) {
+		if (kvm_is_error_hva(fault.hva) && kvm_vcpu_dabt_is_cm(vcpu)) {
 			kvm_incr_pc(vcpu);
 			return 1;
 		}
@@ -1893,20 +1889,19 @@ static int __kvm_handle_guest_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa
 		 * faulting VA. This is always 12 bits, irrespective
 		 * of the page size.
 		 */
-		ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
-		return io_mem_abort(vcpu, ipa);
+		fault.ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
+		return io_mem_abort(vcpu, fault.ipa);
 	}
 
 	/* Userspace should not be able to register out-of-bounds IPAs */
-	VM_BUG_ON(ipa >= kvm_phys_size(vcpu->arch.hw_mmu));
+	VM_BUG_ON(fault.ipa >= kvm_phys_size(vcpu->arch.hw_mmu));
 
 	if (esr_fsc_is_access_flag_fault(esr)) {
-		handle_access_fault(vcpu, fault_ipa);
+		handle_access_fault(vcpu, fault.fault_ipa);
 		return 1;
 	}
 
-	ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
-			     esr_fsc_is_permission_fault(esr));
+	ret = user_mem_abort(vcpu, &fault);
 	if (ret == 0)
 		ret = 1;
 out:
-- 
2.51.0.261.g7ce5a0a67e-goog




* [RFC PATCH 06/16] KVM: arm64: Pass kvm_page_fault pointer to transparent_hugepage_adjust()
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
                   ` (4 preceding siblings ...)
  2025-08-21 21:00 ` [RFC PATCH 05/16] KVM: arm64: Introduce "struct kvm_page_fault" for tracking abort state Sean Christopherson
@ 2025-08-21 21:00 ` Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 07/16] KVM: arm64: Pass @fault to fault_supports_stage2_huge_mapping() Sean Christopherson
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Use the local kvm_page_fault structure to adjust for transparent hugepages
when resolving guest aborts, to reduce the number of parameters from 5=>2,
and to eliminate the less-than-pleasant pointer dereferences.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/kvm/mmu.c | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ca98778989b2..047aba00388c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1361,19 +1361,15 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
  * Returns the size of the mapping.
  */
 static long
-transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
-			    unsigned long hva, kvm_pfn_t *pfnp,
-			    phys_addr_t *ipap)
+transparent_hugepage_adjust(struct kvm *kvm, struct kvm_page_fault *fault)
 {
-	kvm_pfn_t pfn = *pfnp;
-
 	/*
 	 * Make sure the adjustment is done only for THP pages. Also make
 	 * sure that the HVA and IPA are sufficiently aligned and that the
 	 * block map is contained within the memslot.
 	 */
-	if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
-		int sz = get_user_mapping_size(kvm, hva);
+	if (fault_supports_stage2_huge_mapping(fault->slot, fault->hva, PMD_SIZE)) {
+		int sz = get_user_mapping_size(kvm, fault->hva);
 
 		if (sz < 0)
 			return sz;
@@ -1381,10 +1377,8 @@ transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		if (sz < PMD_SIZE)
 			return PAGE_SIZE;
 
-		*ipap &= PMD_MASK;
-		pfn &= ~(PTRS_PER_PMD - 1);
-		*pfnp = pfn;
-
+		fault->ipa &= PMD_MASK;
+		fault->pfn &= ~(PTRS_PER_PMD - 1);
 		return PMD_SIZE;
 	}
 
@@ -1724,9 +1718,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		if (fault->is_perm && fault_granule > PAGE_SIZE)
 			vma_pagesize = fault_granule;
 		else
-			vma_pagesize = transparent_hugepage_adjust(kvm, fault->slot,
-								   fault->hva, &fault->pfn,
-								   &fault->fault_ipa);
+			vma_pagesize = transparent_hugepage_adjust(kvm, fault);
 
 		if (vma_pagesize < 0) {
 			ret = vma_pagesize;
-- 
2.51.0.261.g7ce5a0a67e-goog




* [RFC PATCH 07/16] KVM: arm64: Pass @fault to fault_supports_stage2_huge_mapping()
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
                   ` (5 preceding siblings ...)
  2025-08-21 21:00 ` [RFC PATCH 06/16] KVM: arm64: Pass kvm_page_fault pointer to transparent_hugepage_adjust() Sean Christopherson
@ 2025-08-21 21:00 ` Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 08/16] KVM: arm64: Add helper to get permission fault granule from ESR Sean Christopherson
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Pass the full kvm_page_fault object when querying if a fault supports a
hugepage mapping instead of passing the separate slot+address pair.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/kvm/mmu.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 047aba00388c..c6aadd8baa18 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1287,10 +1287,10 @@ static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
 	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
 }
 
-static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
-					       unsigned long hva,
+static bool fault_supports_stage2_huge_mapping(struct kvm_page_fault *fault,
 					       unsigned long map_size)
 {
+	struct kvm_memory_slot *memslot = fault->slot;
 	gpa_t gpa_start;
 	hva_t uaddr_start, uaddr_end;
 	size_t size;
@@ -1348,8 +1348,8 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
 	 * userspace_addr or the base_gfn, as both are equally aligned (per
 	 * the check above) and equally sized.
 	 */
-	return (hva & ~(map_size - 1)) >= uaddr_start &&
-	       (hva & ~(map_size - 1)) + map_size <= uaddr_end;
+	return (fault->hva & ~(map_size - 1)) >= uaddr_start &&
+	       (fault->hva & ~(map_size - 1)) + map_size <= uaddr_end;
 }
 
 /*
@@ -1368,7 +1368,7 @@ transparent_hugepage_adjust(struct kvm *kvm, struct kvm_page_fault *fault)
 	 * sure that the HVA and IPA are sufficiently aligned and that the
 	 * block map is contained within the memslot.
 	 */
-	if (fault_supports_stage2_huge_mapping(fault->slot, fault->hva, PMD_SIZE)) {
+	if (fault_supports_stage2_huge_mapping(fault, PMD_SIZE)) {
 		int sz = get_user_mapping_size(kvm, fault->hva);
 
 		if (sz < 0)
@@ -1547,7 +1547,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	switch (vma_shift) {
 #ifndef __PAGETABLE_PMD_FOLDED
 	case PUD_SHIFT:
-		if (fault_supports_stage2_huge_mapping(fault->slot, fault->hva, PUD_SIZE))
+		if (fault_supports_stage2_huge_mapping(fault, PUD_SIZE))
 			break;
 		fallthrough;
 #endif
@@ -1555,7 +1555,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		vma_shift = PMD_SHIFT;
 		fallthrough;
 	case PMD_SHIFT:
-		if (fault_supports_stage2_huge_mapping(fault->slot, fault->hva, PMD_SIZE))
+		if (fault_supports_stage2_huge_mapping(fault, PMD_SIZE))
 			break;
 		fallthrough;
 	case CONT_PTE_SHIFT:
-- 
2.51.0.261.g7ce5a0a67e-goog




* [RFC PATCH 08/16] KVM: arm64: Add helper to get permission fault granule from ESR
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
                   ` (6 preceding siblings ...)
  2025-08-21 21:00 ` [RFC PATCH 07/16] KVM: arm64: Pass @fault to fault_supports_stage2_huge_mapping() Sean Christopherson
@ 2025-08-21 21:00 ` Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 09/16] KVM: arm64: Track perm fault granule in "struct kvm_page_fault" Sean Christopherson
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Extract KVM's code for getting the granule for a permission fault into a
standalone API that takes in a raw ESR, so that KVM can get the granule
from a local copy of the ESR instead of re-retrieving the value from the
vCPU.
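
As a sanity check on the math, assuming the usual definition of
ARM64_HW_PGTABLE_LEVEL_SHIFT() from pgtable-hwdef.h, i.e.
(PAGE_SHIFT - 3) * (4 - level) + 3, the helper yields the expected block
sizes for the level encoded in ESR_ELx.FSC with 4K pages (PAGE_SHIFT = 12):

  level 3: BIT(12) = 4KiB
  level 2: BIT(21) = 2MiB
  level 1: BIT(30) = 1GiB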

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/include/asm/esr.h         | 6 ++++++
 arch/arm64/include/asm/kvm_emulate.h | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index e1deed824464..5bb99cfd184a 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -8,6 +8,7 @@
 #define __ASM_ESR_H
 
 #include <asm/memory.h>
+#include <asm/pgtable-hwdef.h>
 #include <asm/sysreg.h>
 
 #define ESR_ELx_EC_UNKNOWN	UL(0x00)
@@ -478,6 +479,11 @@ static inline bool esr_fsc_is_permission_fault(unsigned long esr)
 	       (esr == ESR_ELx_FSC_PERM_L(0));
 }
 
+static inline u64 esr_fsc_perm_fault_granule(unsigned long esr)
+{
+	return BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(esr & ESR_ELx_FSC_LEVEL));
+}
+
 static inline bool esr_fsc_is_access_flag_fault(unsigned long esr)
 {
 	esr = esr & ESR_ELx_FSC;
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index fa8a08a1ccd5..8065f54927cb 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -455,7 +455,7 @@ u64 kvm_vcpu_trap_get_perm_fault_granule(const struct kvm_vcpu *vcpu)
 	unsigned long esr = kvm_vcpu_get_esr(vcpu);
 
 	BUG_ON(!esr_fsc_is_permission_fault(esr));
-	return BIT(ARM64_HW_PGTABLE_LEVEL_SHIFT(esr & ESR_ELx_FSC_LEVEL));
+	return esr_fsc_perm_fault_granule(esr);
 }
 
 static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
-- 
2.51.0.261.g7ce5a0a67e-goog




* [RFC PATCH 09/16] KVM: arm64: Track perm fault granule in "struct kvm_page_fault"
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
                   ` (7 preceding siblings ...)
  2025-08-21 21:00 ` [RFC PATCH 08/16] KVM: arm64: Add helper to get permission fault granule from ESR Sean Christopherson
@ 2025-08-21 21:00 ` Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 10/16] KVM: arm64: Drop local vfio_allow_any_uc, use vm_flags snapshot Sean Christopherson
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Add permission fault granule information to "struct kvm_page_fault", to
help capture that the granule is a property of the fault, and to make the
information readily available, e.g. without needing to be explicitly
passed if it's needed by a helper.

Opportunistically drop kvm_vcpu_trap_get_perm_fault_granule() and simply
grab the granule from the passed-in ESR.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/include/asm/kvm_emulate.h |  9 ---------
 arch/arm64/include/asm/kvm_host.h    |  1 +
 arch/arm64/kvm/mmu.c                 | 13 ++++++-------
 3 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 8065f54927cb..93e7a0bad0fb 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -449,15 +449,6 @@ bool kvm_vcpu_trap_is_translation_fault(const struct kvm_vcpu *vcpu)
 	return esr_fsc_is_translation_fault(kvm_vcpu_get_esr(vcpu));
 }
 
-static inline
-u64 kvm_vcpu_trap_get_perm_fault_granule(const struct kvm_vcpu *vcpu)
-{
-	unsigned long esr = kvm_vcpu_get_esr(vcpu);
-
-	BUG_ON(!esr_fsc_is_permission_fault(esr));
-	return esr_fsc_perm_fault_granule(esr);
-}
-
 static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
 {
 	switch (kvm_vcpu_trap_get_fault(vcpu)) {
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4623cbc1edf4..ec6473007fb9 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -418,6 +418,7 @@ struct kvm_page_fault {
 	const bool exec;
 	const bool write;
 	const bool is_perm;
+	const u64  granule;
 
 	phys_addr_t fault_ipa; /* The address we faulted on */
 	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c6aadd8baa18..10c73494d505 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1483,14 +1483,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	short vma_shift;
 	void *memcache;
 	bool logging_active = memslot_is_logging(fault->slot);
-	long vma_pagesize, fault_granule;
+	long vma_pagesize;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
 	vm_flags_t vm_flags;
 	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
 
-	if (fault->is_perm)
-		fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
 	VM_BUG_ON(fault->write && fault->exec);
 
 	if (fault->is_perm && !fault->write && !fault->exec) {
@@ -1715,8 +1713,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	 * backed by a THP and thus use block mapping if possible.
 	 */
 	if (vma_pagesize == PAGE_SIZE && !(force_pte || s2_force_noncacheable)) {
-		if (fault->is_perm && fault_granule > PAGE_SIZE)
-			vma_pagesize = fault_granule;
+		if (fault->is_perm && fault->granule > PAGE_SIZE)
+			vma_pagesize = fault->granule;
 		else
 			vma_pagesize = transparent_hugepage_adjust(kvm, fault);
 
@@ -1754,10 +1752,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 
 	/*
 	 * Under the premise of getting a FSC_PERM fault, we just need to relax
-	 * permissions only if vma_pagesize equals fault_granule. Otherwise,
+	 * permissions only if vma_pagesize equals fault->granule. Otherwise,
 	 * kvm_pgtable_stage2_map() should be called to change block size.
 	 */
-	if (fault->is_perm && vma_pagesize == fault_granule) {
+	if (fault->is_perm && vma_pagesize == fault->granule) {
 		/*
 		 * Drop the SW bits in favour of those stored in the
 		 * PTE, which will be preserved.
@@ -1806,6 +1804,7 @@ static int __kvm_handle_guest_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa
 		.write = kvm_is_write_fault(vcpu),
 		.exec  = kvm_vcpu_trap_is_exec_fault(vcpu),
 		.is_perm = esr_fsc_is_permission_fault(esr),
+		.granule = esr_fsc_is_permission_fault(esr) ? esr_fsc_perm_fault_granule(esr) : 0,
 	};
 	struct kvm_s2_trans nested_trans;
 	bool writable;
-- 
2.51.0.261.g7ce5a0a67e-goog




* [RFC PATCH 10/16] KVM: arm64: Drop local vfio_allow_any_uc, use vm_flags snapshot
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
                   ` (8 preceding siblings ...)
  2025-08-21 21:00 ` [RFC PATCH 09/16] KVM: arm64: Track perm fault granule in "struct kvm_page_fault" Sean Christopherson
@ 2025-08-21 21:00 ` Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 11/16] KVM: arm64: Drop local mte_allowed, " Sean Christopherson
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Drop user_mem_abort()'s local vfio_allow_any_uc variable and instead use
the vm_flags snapshot.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/kvm/mmu.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 10c73494d505..e1375296940b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1476,7 +1476,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	int ret = 0;
 	bool writable, force_pte = false;
 	bool mte_allowed, is_vma_cacheable;
-	bool s2_force_noncacheable = false, vfio_allow_any_uc = false;
+	bool s2_force_noncacheable = false;
 	unsigned long mmu_seq;
 	struct kvm *kvm = vcpu->kvm;
 	struct vm_area_struct *vma;
@@ -1608,8 +1608,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	fault->gfn = fault->ipa >> PAGE_SHIFT;
 	mte_allowed = kvm_vma_mte_allowed(vma);
 
-	vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
-
 	vm_flags = vma->vm_flags;
 
 	is_vma_cacheable = kvm_vma_is_cacheable(vma);
@@ -1741,7 +1739,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		prot |= KVM_PGTABLE_PROT_X;
 
 	if (s2_force_noncacheable) {
-		if (vfio_allow_any_uc)
+		if (vm_flags & VM_ALLOW_ANY_UNCACHED)
 			prot |= KVM_PGTABLE_PROT_NORMAL_NC;
 		else
 			prot |= KVM_PGTABLE_PROT_DEVICE;
-- 
2.51.0.261.g7ce5a0a67e-goog




* [RFC PATCH 11/16] KVM: arm64: Drop local mte_allowed, use vm_flags snapshot
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
                   ` (9 preceding siblings ...)
  2025-08-21 21:00 ` [RFC PATCH 10/16] KVM: arm64: Drop local vfio_allow_any_uc, use vm_flags snapshot Sean Christopherson
@ 2025-08-21 21:00 ` Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 12/16] KVM: arm64: Move VMA-related information into "struct kvm_page_fault" Sean Christopherson
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Drop user_mem_abort()'s local mte_allowed and instead use the vm_flags
snapshot.  The redundant variables aren't problematic per se, but will be
quite awkward when a future change moves the vm_flags snapshot into
"struct kvm_page_fault".

Opportunistically drop kvm_vma_mte_allowed() and open code the vm_flags
check in the memslot preparation code, as there's little value in hiding
VM_MTE_ALLOWED (arguably negative "value"), and the fault path can't use
the VMA-based helper (because looking at the VMA outside of mmap_lock is
unsafe).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/kvm/mmu.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index e1375296940b..b85968019dd4 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1454,11 +1454,6 @@ static void sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
 	}
 }
 
-static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
-{
-	return vma->vm_flags & VM_MTE_ALLOWED;
-}
-
 static bool kvm_vma_is_cacheable(struct vm_area_struct *vma)
 {
 	switch (FIELD_GET(PTE_ATTRINDX_MASK, pgprot_val(vma->vm_page_prot))) {
@@ -1475,7 +1470,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	int ret = 0;
 	bool writable, force_pte = false;
-	bool mte_allowed, is_vma_cacheable;
+	bool is_vma_cacheable;
 	bool s2_force_noncacheable = false;
 	unsigned long mmu_seq;
 	struct kvm *kvm = vcpu->kvm;
@@ -1606,7 +1601,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	}
 
 	fault->gfn = fault->ipa >> PAGE_SHIFT;
-	mte_allowed = kvm_vma_mte_allowed(vma);
 
 	vm_flags = vma->vm_flags;
 
@@ -1724,7 +1718,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 
 	if (!fault->is_perm && !s2_force_noncacheable && kvm_has_mte(kvm)) {
 		/* Check the VMM hasn't introduced a new disallowed VMA */
-		if (mte_allowed) {
+		if (vm_flags & VM_MTE_ALLOWED) {
 			sanitise_mte_tags(kvm, fault->pfn, vma_pagesize);
 		} else {
 			ret = -EFAULT;
@@ -2215,7 +2209,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 		if (!vma)
 			break;
 
-		if (kvm_has_mte(kvm) && !kvm_vma_mte_allowed(vma)) {
+		if (kvm_has_mte(kvm) && !(vma->vm_flags & VM_MTE_ALLOWED)) {
 			ret = -EINVAL;
 			break;
 		}
-- 
2.51.0.261.g7ce5a0a67e-goog




* [RFC PATCH 12/16] KVM: arm64: Move VMA-related information into "struct kvm_page_fault"
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
                   ` (10 preceding siblings ...)
  2025-08-21 21:00 ` [RFC PATCH 11/16] KVM: arm64: Drop local mte_allowed, " Sean Christopherson
@ 2025-08-21 21:00 ` Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 13/16] KVM: arm64: Stash "mmu_seq" in " Sean Christopherson
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Stash the "outputs" from parsing the VMA associated with an abort in
kvm_page_fault.  This will allow moving the mmap_lock-protected section to
a separate helper without needing a pile of out-parameters.

Deliberately place "pagesize" outside of the "vma" sub-structure, as KVM
manipulates (restricts) the pagesize based on other state, i.e. it's not a
strict representation of the VMA.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/include/asm/kvm_host.h |  9 +++++
 arch/arm64/kvm/mmu.c              | 67 +++++++++++++++----------------
 2 files changed, 41 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ec6473007fb9..4d131be08d8d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -430,6 +430,15 @@ struct kvm_page_fault {
 	unsigned long hva;
 	kvm_pfn_t pfn;
 	struct page *page;
+
+	struct {
+		vm_flags_t vm_flags;
+		short pageshift;
+
+		bool is_cacheable;
+	} vma;
+
+	long pagesize;
 };
 
 /*
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index b85968019dd4..aa6ee72bef51 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1470,18 +1470,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	int ret = 0;
 	bool writable, force_pte = false;
-	bool is_vma_cacheable;
 	bool s2_force_noncacheable = false;
 	unsigned long mmu_seq;
 	struct kvm *kvm = vcpu->kvm;
 	struct vm_area_struct *vma;
-	short vma_shift;
 	void *memcache;
 	bool logging_active = memslot_is_logging(fault->slot);
-	long vma_pagesize;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
-	vm_flags_t vm_flags;
 	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
 
 	VM_BUG_ON(fault->write && fault->exec);
@@ -1532,12 +1528,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	 */
 	if (logging_active) {
 		force_pte = true;
-		vma_shift = PAGE_SHIFT;
+		fault->vma.pageshift = PAGE_SHIFT;
 	} else {
-		vma_shift = get_vma_page_shift(vma, fault->hva);
+		fault->vma.pageshift = get_vma_page_shift(vma, fault->hva);
 	}
 
-	switch (vma_shift) {
+	switch (fault->vma.pageshift) {
 #ifndef __PAGETABLE_PMD_FOLDED
 	case PUD_SHIFT:
 		if (fault_supports_stage2_huge_mapping(fault, PUD_SIZE))
@@ -1545,23 +1541,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		fallthrough;
 #endif
 	case CONT_PMD_SHIFT:
-		vma_shift = PMD_SHIFT;
+		fault->vma.pageshift = PMD_SHIFT;
 		fallthrough;
 	case PMD_SHIFT:
 		if (fault_supports_stage2_huge_mapping(fault, PMD_SIZE))
 			break;
 		fallthrough;
 	case CONT_PTE_SHIFT:
-		vma_shift = PAGE_SHIFT;
+		fault->vma.pageshift = PAGE_SHIFT;
 		force_pte = true;
 		fallthrough;
 	case PAGE_SHIFT:
 		break;
 	default:
-		WARN_ONCE(1, "Unknown vma_shift %d", vma_shift);
+		WARN_ONCE(1, "Unknown VMA page shift %d", fault->vma.pageshift);
 	}
 
-	vma_pagesize = 1UL << vma_shift;
+	fault->pagesize = 1UL << fault->vma.pageshift;
 
 	if (fault->nested) {
 		unsigned long max_map_size;
@@ -1587,7 +1583,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			max_map_size = PAGE_SIZE;
 
 		force_pte = (max_map_size == PAGE_SIZE);
-		vma_pagesize = min(vma_pagesize, (long)max_map_size);
+		fault->pagesize = min(fault->pagesize, (long)max_map_size);
 	}
 
 	/*
@@ -1595,16 +1591,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	 * ensure we find the right PFN and lay down the mapping in the right
 	 * place.
 	 */
-	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) {
-		fault->fault_ipa &= ~(vma_pagesize - 1);
-		fault->ipa &= ~(vma_pagesize - 1);
+	if (fault->pagesize == PMD_SIZE || fault->pagesize == PUD_SIZE) {
+		fault->fault_ipa &= ~(fault->pagesize - 1);
+		fault->ipa &= ~(fault->pagesize - 1);
 	}
 
 	fault->gfn = fault->ipa >> PAGE_SHIFT;
 
-	vm_flags = vma->vm_flags;
-
-	is_vma_cacheable = kvm_vma_is_cacheable(vma);
+	fault->vma.vm_flags = vma->vm_flags;
+	fault->vma.is_cacheable = kvm_vma_is_cacheable(vma);
 
 	/* Don't use the VMA after the unlock -- it may have vanished */
 	vma = NULL;
@@ -1624,7 +1619,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 				       fault->write ? FOLL_WRITE : 0,
 				       &writable, &fault->page);
 	if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
-		kvm_send_hwpoison_signal(fault->hva, vma_shift);
+		kvm_send_hwpoison_signal(fault->hva, fault->vma.pageshift);
 		return 0;
 	}
 	if (is_error_noslot_pfn(fault->pfn))
@@ -1634,8 +1629,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	 * Check if this is non-struct page memory PFN, and cannot support
 	 * CMOs. It could potentially be unsafe to access as cachable.
 	 */
-	if (vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(fault->pfn)) {
-		if (is_vma_cacheable) {
+	if (fault->vma.vm_flags & (VM_PFNMAP | VM_MIXEDMAP) &&
+	    !pfn_is_map_memory(fault->pfn)) {
+		if (fault->vma.is_cacheable) {
 			/*
 			 * Whilst the VMA owner expects cacheable mapping to this
 			 * PFN, hardware also has to support the FWB and CACHE DIC
@@ -1653,9 +1649,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		} else {
 			/*
 			 * If the page was identified as device early by looking at
-			 * the VMA flags, vma_pagesize is already representing the
+			 * the VMA flags, fault->pagesize is already representing the
 			 * largest quantity we can map.  If instead it was mapped
-			 * via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
+			 * via __kvm_faultin_pfn(), fault->pagesize is set to PAGE_SIZE
 			 * and must not be upgraded.
 			 *
 			 * In both cases, we don't let transparent_hugepage_adjust()
@@ -1704,22 +1700,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	 * If we are not forced to use page mapping, check if we are
 	 * backed by a THP and thus use block mapping if possible.
 	 */
-	if (vma_pagesize == PAGE_SIZE && !(force_pte || s2_force_noncacheable)) {
+	if (fault->pagesize == PAGE_SIZE && !(force_pte || s2_force_noncacheable)) {
 		if (fault->is_perm && fault->granule > PAGE_SIZE)
-			vma_pagesize = fault->granule;
+			fault->pagesize = fault->granule;
 		else
-			vma_pagesize = transparent_hugepage_adjust(kvm, fault);
+			fault->pagesize = transparent_hugepage_adjust(kvm, fault);
 
-		if (vma_pagesize < 0) {
-			ret = vma_pagesize;
+		if (fault->pagesize < 0) {
+			ret = fault->pagesize;
 			goto out_unlock;
 		}
 	}
 
 	if (!fault->is_perm && !s2_force_noncacheable && kvm_has_mte(kvm)) {
 		/* Check the VMM hasn't introduced a new disallowed VMA */
-		if (vm_flags & VM_MTE_ALLOWED) {
-			sanitise_mte_tags(kvm, fault->pfn, vma_pagesize);
+		if (fault->vma.vm_flags & VM_MTE_ALLOWED) {
+			sanitise_mte_tags(kvm, fault->pfn, fault->pagesize);
 		} else {
 			ret = -EFAULT;
 			goto out_unlock;
@@ -1733,7 +1729,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		prot |= KVM_PGTABLE_PROT_X;
 
 	if (s2_force_noncacheable) {
-		if (vm_flags & VM_ALLOW_ANY_UNCACHED)
+		if (fault->vma.vm_flags & VM_ALLOW_ANY_UNCACHED)
 			prot |= KVM_PGTABLE_PROT_NORMAL_NC;
 		else
 			prot |= KVM_PGTABLE_PROT_DEVICE;
@@ -1747,7 +1743,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	 * permissions only if vma_pagesize equals fault->granule. Otherwise,
 	 * kvm_pgtable_stage2_map() should be called to change block size.
 	 */
-	if (fault->is_perm && vma_pagesize == fault->granule) {
+	if (fault->is_perm && fault->pagesize == fault->granule) {
 		/*
 		 * Drop the SW bits in favour of those stored in the
 		 * PTE, which will be preserved.
@@ -1755,9 +1751,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		prot &= ~KVM_NV_GUEST_MAP_SZ;
 		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault->fault_ipa, prot, flags);
 	} else {
-		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault->fault_ipa, vma_pagesize,
-					     __pfn_to_phys(fault->pfn), prot,
-					     memcache, flags);
+		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault->fault_ipa,
+							 fault->pagesize,
+							 __pfn_to_phys(fault->pfn),
+							 prot, memcache, flags);
 	}
 
 out_unlock:
-- 
2.51.0.261.g7ce5a0a67e-goog



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 13/16] KVM: arm64: Stash "mmu_seq" in "struct kvm_page_fault"
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
                   ` (11 preceding siblings ...)
  2025-08-21 21:00 ` [RFC PATCH 12/16] KVM: arm64: Move VMA-related information into "struct kvm_page_fault" Sean Christopherson
@ 2025-08-21 21:00 ` Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 14/16] KVM: arm64: Track "forced" information " Sean Christopherson
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Track the MMU notifier sequence count snapshot in "struct kvm_page_fault"
in anticipation of moving the mmap_lock-protected code to a separate
helper.  Attaching mmu_seq to the fault could also prove useful in the
future, e.g. for additional refactorings.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/include/asm/kvm_host.h | 1 +
 arch/arm64/kvm/mmu.c              | 5 ++---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4d131be08d8d..6a99f7fa065d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -430,6 +430,7 @@ struct kvm_page_fault {
 	unsigned long hva;
 	kvm_pfn_t pfn;
 	struct page *page;
+	unsigned long mmu_seq;
 
 	struct {
 		vm_flags_t vm_flags;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index aa6ee72bef51..575a4f9f2583 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1471,7 +1471,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	int ret = 0;
 	bool writable, force_pte = false;
 	bool s2_force_noncacheable = false;
-	unsigned long mmu_seq;
 	struct kvm *kvm = vcpu->kvm;
 	struct vm_area_struct *vma;
 	void *memcache;
@@ -1612,7 +1611,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
 	 * with the smp_wmb() in kvm_mmu_invalidate_end().
 	 */
-	mmu_seq = vcpu->kvm->mmu_invalidate_seq;
+	fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
 	mmap_read_unlock(current->mm);
 
 	fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn,
@@ -1691,7 +1690,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 
 	kvm_fault_lock(kvm);
 	pgt = vcpu->arch.hw_mmu->pgt;
-	if (mmu_invalidate_retry(kvm, mmu_seq)) {
+	if (mmu_invalidate_retry(kvm, fault->mmu_seq)) {
 		ret = -EAGAIN;
 		goto out_unlock;
 	}
-- 
2.51.0.261.g7ce5a0a67e-goog



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 14/16] KVM: arm64: Track "forced" information in "struct kvm_page_fault"
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
                   ` (12 preceding siblings ...)
  2025-08-21 21:00 ` [RFC PATCH 13/16] KVM: arm64: Stash "mmu_seq" in " Sean Christopherson
@ 2025-08-21 21:00 ` Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 15/16] KVM: arm64: Extract mmap_lock-protected code to helper for user mem aborts Sean Christopherson
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Move the abort handler's local "force_pte" and "s2_force_noncacheable"
variables into "struct kvm_page_fault" in anticipation of moving the
mmap_lock-protected code to a separate helper.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/include/asm/kvm_host.h |  3 +++
 arch/arm64/kvm/mmu.c              | 22 +++++++++++-----------
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 6a99f7fa065d..fa52546bf870 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -440,6 +440,9 @@ struct kvm_page_fault {
 	} vma;
 
 	long pagesize;
+
+	bool force_pte;
+	bool s2_force_noncacheable;
 };
 
 /*
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 575a4f9f2583..fec3a6aeabd0 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1469,8 +1469,7 @@ static bool kvm_vma_is_cacheable(struct vm_area_struct *vma)
 static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	int ret = 0;
-	bool writable, force_pte = false;
-	bool s2_force_noncacheable = false;
+	bool writable;
 	struct kvm *kvm = vcpu->kvm;
 	struct vm_area_struct *vma;
 	void *memcache;
@@ -1526,7 +1525,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	 * memslots.
 	 */
 	if (logging_active) {
-		force_pte = true;
+		fault->force_pte = true;
 		fault->vma.pageshift = PAGE_SHIFT;
 	} else {
 		fault->vma.pageshift = get_vma_page_shift(vma, fault->hva);
@@ -1548,7 +1547,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		fallthrough;
 	case CONT_PTE_SHIFT:
 		fault->vma.pageshift = PAGE_SHIFT;
-		force_pte = true;
+		fault->force_pte = true;
 		fallthrough;
 	case PAGE_SHIFT:
 		break;
@@ -1561,7 +1560,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	if (fault->nested) {
 		unsigned long max_map_size;
 
-		max_map_size = force_pte ? PAGE_SIZE : PUD_SIZE;
+		max_map_size = fault->force_pte ? PAGE_SIZE : PUD_SIZE;
 
 		WARN_ON_ONCE(fault->ipa != kvm_s2_trans_output(fault->nested));
 
@@ -1581,7 +1580,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		else if (max_map_size >= PAGE_SIZE && max_map_size < PMD_SIZE)
 			max_map_size = PAGE_SIZE;
 
-		force_pte = (max_map_size == PAGE_SIZE);
+		fault->force_pte = (max_map_size == PAGE_SIZE);
 		fault->pagesize = min(fault->pagesize, (long)max_map_size);
 	}
 
@@ -1656,7 +1655,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			 * In both cases, we don't let transparent_hugepage_adjust()
 			 * change things at the last minute.
 			 */
-			s2_force_noncacheable = true;
+			fault->s2_force_noncacheable = true;
 		}
 	} else if (logging_active && !fault->write) {
 		/*
@@ -1666,7 +1665,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		writable = false;
 	}
 
-	if (fault->exec && s2_force_noncacheable)
+	if (fault->exec && fault->s2_force_noncacheable)
 		return -ENOEXEC;
 
 	/*
@@ -1699,7 +1698,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	 * If we are not forced to use page mapping, check if we are
 	 * backed by a THP and thus use block mapping if possible.
 	 */
-	if (fault->pagesize == PAGE_SIZE && !(force_pte || s2_force_noncacheable)) {
+	if (fault->pagesize == PAGE_SIZE &&
+	    !(fault->force_pte || fault->s2_force_noncacheable)) {
 		if (fault->is_perm && fault->granule > PAGE_SIZE)
 			fault->pagesize = fault->granule;
 		else
@@ -1711,7 +1711,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		}
 	}
 
-	if (!fault->is_perm && !s2_force_noncacheable && kvm_has_mte(kvm)) {
+	if (!fault->is_perm && !fault->s2_force_noncacheable && kvm_has_mte(kvm)) {
 		/* Check the VMM hasn't introduced a new disallowed VMA */
 		if (fault->vma.vm_flags & VM_MTE_ALLOWED) {
 			sanitise_mte_tags(kvm, fault->pfn, fault->pagesize);
@@ -1727,7 +1727,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	if (fault->exec)
 		prot |= KVM_PGTABLE_PROT_X;
 
-	if (s2_force_noncacheable) {
+	if (fault->s2_force_noncacheable) {
 		if (fault->vma.vm_flags & VM_ALLOW_ANY_UNCACHED)
 			prot |= KVM_PGTABLE_PROT_NORMAL_NC;
 		else
-- 
2.51.0.261.g7ce5a0a67e-goog



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 15/16] KVM: arm64: Extract mmap_lock-protected code to helper for user mem aborts
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
                   ` (13 preceding siblings ...)
  2025-08-21 21:00 ` [RFC PATCH 14/16] KVM: arm64: Track "forced" information " Sean Christopherson
@ 2025-08-21 21:00 ` Sean Christopherson
  2025-08-21 21:00 ` [RFC PATCH 16/16] KVM: arm64: Don't bother nullifying "vma" in mem abort path Sean Christopherson
  2025-08-21 22:39 ` [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Oliver Upton
  16 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Factor out the mmap_lock-protected portion of user_mem_abort() to a new
helper, partly to make user_mem_abort() easier to follow, but mostly so
that the scope of the mmap_lock-protected code is more explicitly bounded.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/kvm/mmu.c | 96 ++++++++++++++++++++++++--------------------
 1 file changed, 52 insertions(+), 44 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index fec3a6aeabd0..ea326d66f027 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1466,47 +1466,10 @@ static bool kvm_vma_is_cacheable(struct vm_area_struct *vma)
 	}
 }
 
-static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+static int kvm_gather_fault_vma_info(struct kvm_vcpu *vcpu,
+				     struct kvm_page_fault *fault)
 {
-	int ret = 0;
-	bool writable;
-	struct kvm *kvm = vcpu->kvm;
 	struct vm_area_struct *vma;
-	void *memcache;
-	bool logging_active = memslot_is_logging(fault->slot);
-	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
-	struct kvm_pgtable *pgt;
-	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
-
-	VM_BUG_ON(fault->write && fault->exec);
-
-	if (fault->is_perm && !fault->write && !fault->exec) {
-		kvm_err("Unexpected L2 read permission error\n");
-		return -EFAULT;
-	}
-
-	if (!is_protected_kvm_enabled())
-		memcache = &vcpu->arch.mmu_page_cache;
-	else
-		memcache = &vcpu->arch.pkvm_memcache;
-
-	/*
-	 * Permission faults just need to update the existing leaf entry,
-	 * and so normally don't require allocations from the memcache. The
-	 * only exception to this is when dirty logging is enabled at runtime
-	 * and a write fault needs to collapse a block entry into a table.
-	 */
-	if (!fault->is_perm || (logging_active && fault->write)) {
-		int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
-
-		if (!is_protected_kvm_enabled())
-			ret = kvm_mmu_topup_memory_cache(memcache, min_pages);
-		else
-			ret = topup_hyp_memcache(memcache, min_pages);
-
-		if (ret)
-			return ret;
-	}
 
 	/*
 	 * Let's check if we will get back a huge page backed by hugetlbfs, or
@@ -1520,11 +1483,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		return -EFAULT;
 	}
 
-	/*
-	 * logging_active is guaranteed to never be true for VM_PFNMAP
-	 * memslots.
-	 */
-	if (logging_active) {
+	/* Logging is guaranteed to never be active for VM_PFNMAP memslots. */
+	if (memslot_is_logging(fault->slot)) {
 		fault->force_pte = true;
 		fault->vma.pageshift = PAGE_SHIFT;
 	} else {
@@ -1613,6 +1573,54 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
 	mmap_read_unlock(current->mm);
 
+	return 0;
+}
+
+static int user_mem_abort(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
+{
+	int ret = 0;
+	bool writable;
+	struct kvm *kvm = vcpu->kvm;
+	void *memcache;
+	bool logging_active = memslot_is_logging(fault->slot);
+	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
+	struct kvm_pgtable *pgt;
+	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
+
+	VM_BUG_ON(fault->write && fault->exec);
+
+	if (fault->is_perm && !fault->write && !fault->exec) {
+		kvm_err("Unexpected L2 read permission error\n");
+		return -EFAULT;
+	}
+
+	if (!is_protected_kvm_enabled())
+		memcache = &vcpu->arch.mmu_page_cache;
+	else
+		memcache = &vcpu->arch.pkvm_memcache;
+
+	/*
+	 * Permission faults just need to update the existing leaf entry,
+	 * and so normally don't require allocations from the memcache. The
+	 * only exception to this is when dirty logging is enabled at runtime
+	 * and a write fault needs to collapse a block entry into a table.
+	 */
+	if (!fault->is_perm || (logging_active && fault->write)) {
+		int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
+
+		if (!is_protected_kvm_enabled())
+			ret = kvm_mmu_topup_memory_cache(memcache, min_pages);
+		else
+			ret = topup_hyp_memcache(memcache, min_pages);
+
+		if (ret)
+			return ret;
+	}
+
+	ret = kvm_gather_fault_vma_info(vcpu, fault);
+	if (ret)
+		return ret;
+
 	fault->pfn = __kvm_faultin_pfn(fault->slot, fault->gfn,
 				       fault->write ? FOLL_WRITE : 0,
 				       &writable, &fault->page);
-- 
2.51.0.261.g7ce5a0a67e-goog



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 16/16] KVM: arm64: Don't bother nullifying "vma" in mem abort path
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
                   ` (14 preceding siblings ...)
  2025-08-21 21:00 ` [RFC PATCH 15/16] KVM: arm64: Extract mmap_lock-protected code to helper for user mem aborts Sean Christopherson
@ 2025-08-21 21:00 ` Sean Christopherson
  2025-08-21 22:39 ` [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Oliver Upton
  16 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-21 21:00 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: linux-arm-kernel, kvmarm, linux-kernel, Sean Christopherson,
	James Houghton

Now that the local "vma" in kvm_gather_fault_vma_info() will naturally go
out of scope when mmap_lock is dropped, don't bother nullifying the
variable.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm64/kvm/mmu.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ea326d66f027..435582e997ce 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1559,9 +1559,6 @@ static int kvm_gather_fault_vma_info(struct kvm_vcpu *vcpu,
 	fault->vma.vm_flags = vma->vm_flags;
 	fault->vma.is_cacheable = kvm_vma_is_cacheable(vma);
 
-	/* Don't use the VMA after the unlock -- it may have vanished */
-	vma = NULL;
-
 	/*
 	 * Read mmu_invalidate_seq so that KVM can detect if the results of
 	 * vma_lookup() or __kvm_faultin_pfn() become stale prior to
-- 
2.51.0.261.g7ce5a0a67e-goog



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 05/16] KVM: arm64: Introduce "struct kvm_page_fault" for tracking abort state
  2025-08-21 21:00 ` [RFC PATCH 05/16] KVM: arm64: Introduce "struct kvm_page_fault" for tracking abort state Sean Christopherson
@ 2025-08-21 22:31   ` Oliver Upton
  2025-08-26 18:58     ` Sean Christopherson
  0 siblings, 1 reply; 22+ messages in thread
From: Oliver Upton @ 2025-08-21 22:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Marc Zyngier, linux-arm-kernel, kvmarm, linux-kernel,
	James Houghton

Hey Sean,

On Thu, Aug 21, 2025 at 02:00:31PM -0700, Sean Christopherson wrote:
> Add and use a kvm_page_fault structure to track state when handling a
> guest abort.  Collecting everything in a single structure will enable a
> variety of cleanups (reduce the number of params passed to helpers), and
> will pave the way toward using "struct kvm_page_fault" in arch-neutral KVM
> code, e.g. to consolidate logic for KVM_EXIT_MEMORY_FAULT.
> 
> No functional change intended.
> 
> Cc: James Houghton <jthoughton@google.com>
> Link: https://lore.kernel.org/all/20250618042424.330664-1-jthoughton@google.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  18 ++++
>  arch/arm64/kvm/mmu.c              | 143 ++++++++++++++----------------
>  2 files changed, 87 insertions(+), 74 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 2f2394cce24e..4623cbc1edf4 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -413,6 +413,24 @@ struct kvm_vcpu_fault_info {
>  	u64 disr_el1;		/* Deferred [SError] Status Register */
>  };
>  
> +struct kvm_page_fault {
> +	const u64 esr;
> +	const bool exec;
> +	const bool write;
> +	const bool is_perm;

Hmm... these might be better represented as predicates that take a
pointer to this struct and we just compute it based on ESR. That'd have
the benefit in the arch-neutral code where 'struct kvm_page_fault' is an
opaque type and we don't need to align field names/types.
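
Roughly what I have in mind, as a completely untested sketch (the
helper names are invented, only ESR_ELx_WNR is real):

  /* arch-neutral code only ever sees the opaque type... */
  struct kvm_page_fault;

  bool kvm_fault_is_write(struct kvm_page_fault *fault);
  bool kvm_fault_is_exec(struct kvm_page_fault *fault);

  /* ...and arm64 derives the answer from ESR on demand */
  bool kvm_fault_is_write(struct kvm_page_fault *fault)
  {
	return fault->esr & ESR_ELx_WNR;	/* modulo S1PTW, CM, etc. */
  }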

> +	phys_addr_t fault_ipa; /* The address we faulted on */
> +	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */

NYC, but this also seems like a good opportunity to rename + retype
these guys. Specifically:

	fault_ipa => ipa
	ipa => canonical_ipa

would clarify these and align with the verbiage we currently use to talk
about nested.
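
I.e. the struct would end up with something like (purely illustrative,
not a tested change):

  	phys_addr_t ipa;		/* the IPA that faulted at the current stage-2 */
  	phys_addr_t canonical_ipa;	/* always the IPA in the L1 guest's space */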

Thanks,
Oliver


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault"
  2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
                   ` (15 preceding siblings ...)
  2025-08-21 21:00 ` [RFC PATCH 16/16] KVM: arm64: Don't bother nullifying "vma" in mem abort path Sean Christopherson
@ 2025-08-21 22:39 ` Oliver Upton
  16 siblings, 0 replies; 22+ messages in thread
From: Oliver Upton @ 2025-08-21 22:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Marc Zyngier, linux-arm-kernel, kvmarm, linux-kernel,
	James Houghton

On Thu, Aug 21, 2025 at 02:00:26PM -0700, Sean Christopherson wrote:
> Add an arm64 version of "struct kvm_page_fault" to (hopefully) tidy up
> the abort path, and to pave the way for things like KVM Userfault[*] that
> want to consume kvm_page_fault in arch-neutral code.
> 
> This is essentially one giant nop of code shuffling.
> 
> RFC as this is only compile-tested.  I didn't want to spend time testing
> until I got feedback on whether or not y'all are amenable to the general idea.

I appreciate the improved scoping around things like the mmap lock, so
this seems like a net win in terms of readability. I just want to
clarify how this gets consumed from arch-neutral code, and to actually
read the series in detail.

Thanks,
Oliver


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 05/16] KVM: arm64: Introduce "struct kvm_page_fault" for tracking abort state
  2025-08-21 22:31   ` Oliver Upton
@ 2025-08-26 18:58     ` Sean Christopherson
  2025-08-26 19:29       ` Oliver Upton
  0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-08-26 18:58 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, linux-arm-kernel, kvmarm, linux-kernel,
	James Houghton

On Thu, Aug 21, 2025, Oliver Upton wrote:
> Hey Sean,
> 
> On Thu, Aug 21, 2025 at 02:00:31PM -0700, Sean Christopherson wrote:
> > Add and use a kvm_page_fault structure to track state when handling a
> > guest abort.  Collecting everything in a single structure will enable a
> > variety of cleanups (reduce the number of params passed to helpers), and
> > will pave the way toward using "struct kvm_page_fault" in arch-neutral KVM
> > code, e.g. to consolidate logic for KVM_EXIT_MEMORY_FAULT.
> > 
> > No functional change intended.
> > 
> > Cc: James Houghton <jthoughton@google.com>
> > Link: https://lore.kernel.org/all/20250618042424.330664-1-jthoughton@google.com
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  18 ++++
> >  arch/arm64/kvm/mmu.c              | 143 ++++++++++++++----------------
> >  2 files changed, 87 insertions(+), 74 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 2f2394cce24e..4623cbc1edf4 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -413,6 +413,24 @@ struct kvm_vcpu_fault_info {
> >  	u64 disr_el1;		/* Deferred [SError] Status Register */
> >  };
> >  
> > +struct kvm_page_fault {
> > +	const u64 esr;
> > +	const bool exec;
> > +	const bool write;
> > +	const bool is_perm;
> 
> Hmm... these might be better represented as predicates that take a
> pointer to this struct and we just compute it based on ESR. That'd have
> the benefit in the arch-neutral code where 'struct kvm_page_fault' is an
> opaque type and we don't need to align field names/types.

We'd need to align function names/types though, so to some extent it's six of one,
half dozen of the other.  My slight preference would be to require kvm_page_fault
to have certain fields, but I'm ok with making kvm_page_fault opaque to generic
code and instead adding arch APIs.  Having a handful of wrappers in x86 isn't the
end of the world, and it would be more familiar for pretty much everyone.
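
E.g. the x86 side would likely just be trivial wrappers over the
existing fields, something like (untested, names TBD):

  static inline bool kvm_fault_is_write(struct kvm_page_fault *fault)
  {
	return fault->write;
  }

  static inline bool kvm_fault_is_exec(struct kvm_page_fault *fault)
  {
	return fault->exec;
  }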

> > +	phys_addr_t fault_ipa; /* The address we faulted on */
> > +	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
> 
> NYC, but this also seems like a good opportunity to rename + retype
> these guys. Specifically:
> 
> 	fault_ipa => ipa
> 	ipa => canonical_ipa
> 
> would clarify these and align with the verbiage we currently use to talk
> about nested.

Heh, I'm so screwed.  x86's use of "canonical" is wildly different.  I can add
a patch to do those renames (I think doing an "opportunistic" rename would be a
bit much).


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 05/16] KVM: arm64: Introduce "struct kvm_page_fault" for tracking abort state
  2025-08-26 18:58     ` Sean Christopherson
@ 2025-08-26 19:29       ` Oliver Upton
  2025-08-26 21:29         ` Sean Christopherson
  0 siblings, 1 reply; 22+ messages in thread
From: Oliver Upton @ 2025-08-26 19:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Marc Zyngier, linux-arm-kernel, kvmarm, linux-kernel,
	James Houghton

On Tue, Aug 26, 2025 at 11:58:10AM -0700, Sean Christopherson wrote:
> On Thu, Aug 21, 2025, Oliver Upton wrote:
> > Hey Sean,
> > 
> > On Thu, Aug 21, 2025 at 02:00:31PM -0700, Sean Christopherson wrote:
> > > Add and use a kvm_page_fault structure to track state when handling a
> > > guest abort.  Collecting everything in a single structure will enable a
> > > variety of cleanups (reduce the number of params passed to helpers), and
> > > will pave the way toward using "struct kvm_page_fault" in arch-neutral KVM
> > > code, e.g. to consolidate logic for KVM_EXIT_MEMORY_FAULT.
> > > 
> > > No functional change intended.
> > > 
> > > Cc: James Houghton <jthoughton@google.com>
> > > Link: https://lore.kernel.org/all/20250618042424.330664-1-jthoughton@google.com
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > >  arch/arm64/include/asm/kvm_host.h |  18 ++++
> > >  arch/arm64/kvm/mmu.c              | 143 ++++++++++++++----------------
> > >  2 files changed, 87 insertions(+), 74 deletions(-)
> > > 
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index 2f2394cce24e..4623cbc1edf4 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -413,6 +413,24 @@ struct kvm_vcpu_fault_info {
> > >  	u64 disr_el1;		/* Deferred [SError] Status Register */
> > >  };
> > >  
> > > +struct kvm_page_fault {
> > > +	const u64 esr;
> > > +	const bool exec;
> > > +	const bool write;
> > > +	const bool is_perm;
> > 
> > Hmm... these might be better represented as predicates that take a
> > pointer to this struct and we just compute it based on ESR. That'd have
> > the benefit in the arch-neutral code where 'struct kvm_page_fault' is an
> > opaque type and we don't need to align field names/types.
> 
> We'd need to align function names/types though, so to some extent it's six of one,
> half dozen of the other.  My slight preference would be to require kvm_page_fault
> to have certain fields, but I'm ok with making kvm_page_fault opaque to generic
> code and instead adding arch APIs.  Having a handful of wrappers in x86 isn't the
> end of the world, and it would be more familiar for pretty much everyone.

To clarify my earlier point, my actual interest is in using ESR as the
source of truth from the arch POV; the interface to the arch-neutral
code isn't that big of a deal either way.

Thanks,
Oliver


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH 05/16] KVM: arm64: Introduce "struct kvm_page_fault" for tracking abort state
  2025-08-26 19:29       ` Oliver Upton
@ 2025-08-26 21:29         ` Sean Christopherson
  0 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-08-26 21:29 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, linux-arm-kernel, kvmarm, linux-kernel,
	James Houghton

On Tue, Aug 26, 2025, Oliver Upton wrote:
> On Tue, Aug 26, 2025 at 11:58:10AM -0700, Sean Christopherson wrote:
> > On Thu, Aug 21, 2025, Oliver Upton wrote:
> > > > +struct kvm_page_fault {
> > > > +	const u64 esr;
> > > > +	const bool exec;
> > > > +	const bool write;
> > > > +	const bool is_perm;
> > > 
> > > Hmm... these might be better represented as predicates that take a
> > > pointer to this struct and we just compute it based on ESR. That'd have
> > > the benefit in the arch-neutral code where 'struct kvm_page_fault' is an
> > > opaque type and we don't need to align field names/types.
> > 
> > We'd need to align function names/types though, so to some extent it's six of one,
> > half dozen of the other.  My slight preference would be to require kvm_page_fault
> > to have certain fields, but I'm ok with making kvm_page_fault opaque to generic
> > code and instead adding arch APIs.  Having a handful of wrappers in x86 isn't the
> > end of the world, and it would be more familiar for pretty much everyone.
> 
> To clarify my earlier point, my actual interest is in using ESR as the
> source of truth from the arch POV; the interface to the arch-neutral
> code isn't that big of a deal either way.

Ya, but that would mean having something like

  static bool kvm_is_exec_fault(struct kvm_page_fault *fault)
  {
	return esr_trap_is_iabt(fault->esr) && !esr_abt_iss1tw(fault->esr);
  }

and

  if (kvm_is_exec_fault(fault))

in arm64 code and then

  if (fault->exec)

in arch-neutral code, which, eww.

I like the idea of having a single source of truth, but that's going to be a
massive amount of work to do it "right", e.g. O(weeks) if not O(months).  E.g. to
replace fault->exec with kvm_is_exec_fault(), AFAICT it would require duplicating
all of kvm_is_write_fault().  Rinse and repeat for 20+ APIs in kvm_emulate.h that
take a vCPU and pull ESR from vcpu->arch.fault.esr_el2.
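
I.e. (waving hands, not compile tested) every existing

  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)

would need an ESR-based twin along the lines of

  static inline bool esr_is_write_fault(u64 esr)

with the vCPU-based variant temporarily reduced to a wrapper that
passes in kvm_vcpu_get_esr(vcpu), until every caller is converted and
the vCPU variant can be dropped.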

As an intermediate state, having that many duplicate APIs is tolerable, but I
wouldn't want to leave that as the "end" state for any kernel release, and ideally
not for any given series.  That means adding a pile of esr-based APIs, converting
_all_ users, then dropping the vcpu-based APIs.  That's a lot of code and patches.

E.g. even if we convert all of kvm_handle_guest_abort(), which itself is a big task,
there will still be usage of many of the APIs in at least kvm_translate_vncr(),
io_mem_abort(), and kvm_handle_mmio_return().  Converting all of those is totally
doable, e.g. through a combination of using kvm_page_fault and local snapshots of
esr, but it will be a lot of work and churn.

The work+churn itself doesn't bother me, but I would prefer not to block arch-neutral
usage of kvm_page_fault for months on end, nor do I want to leave KVM arm64 in
a half-baked state, i.e. I wouldn't feel comfortable converting just
__kvm_handle_guest_abort() and walking away.

What if we keep the exec, write, and is_perm fields for now, but add proper APIs
to access kvm_page_fault from common code?  The APIs would be largely duplicate
code between x86 and arm64 (though I think kvm_get_fault_gpa() would be different,
so yay), but that's not a big deal.  That way common KVM can start building out
functionality based on kvm_page_fault, and arm64 can independently convert to
making fault->esr the single source of truth, without having to worry about
perturbing common code.
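
E.g. as a strawman (completely untested, names not thought through),
the one accessor that I suspect diverges:

  /* x86 */
  static inline gpa_t kvm_get_fault_gpa(struct kvm_page_fault *fault)
  {
	return fault->addr;
  }

  /* arm64 */
  static inline gpa_t kvm_get_fault_gpa(struct kvm_page_fault *fault)
  {
	return fault->ipa;
  }

with the exec/write/is_perm accessors being trivial field wrappers on
both architectures, at least for now.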


^ permalink raw reply	[flat|nested] 22+ messages in thread

Thread overview: 22+ messages
2025-08-21 21:00 [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Sean Christopherson
2025-08-21 21:00 ` [RFC PATCH 01/16] KVM: arm64: Drop nested "esr" to eliminate variable shadowing Sean Christopherson
2025-08-21 21:00 ` [RFC PATCH 02/16] KVM: arm64: Get iabt status on-demand Sean Christopherson
2025-08-21 21:00 ` [RFC PATCH 03/16] KVM: arm64: Move SRCU-protected region of kvm_handle_guest_abort() to helper Sean Christopherson
2025-08-21 21:00 ` [RFC PATCH 04/16] KVM: arm64: Use guard(srcu) in kvm_handle_guest_abort() Sean Christopherson
2025-08-21 21:00 ` [RFC PATCH 05/16] KVM: arm64: Introduce "struct kvm_page_fault" for tracking abort state Sean Christopherson
2025-08-21 22:31   ` Oliver Upton
2025-08-26 18:58     ` Sean Christopherson
2025-08-26 19:29       ` Oliver Upton
2025-08-26 21:29         ` Sean Christopherson
2025-08-21 21:00 ` [RFC PATCH 06/16] KVM: arm64: Pass kvm_page_fault pointer to transparent_hugepage_adjust() Sean Christopherson
2025-08-21 21:00 ` [RFC PATCH 07/16] KVM: arm64: Pass @fault to fault_supports_stage2_huge_mapping() Sean Christopherson
2025-08-21 21:00 ` [RFC PATCH 08/16] KVM: arm64: Add helper to get permission fault granule from ESR Sean Christopherson
2025-08-21 21:00 ` [RFC PATCH 09/16] KVM: arm64: Track perm fault granule in "struct kvm_page_fault" Sean Christopherson
2025-08-21 21:00 ` [RFC PATCH 10/16] KVM: arm64: Drop local vfio_allow_any_uc, use vm_flags snapshot Sean Christopherson
2025-08-21 21:00 ` [RFC PATCH 11/16] KVM: arm64: Drop local mte_allowed, " Sean Christopherson
2025-08-21 21:00 ` [RFC PATCH 12/16] KVM: arm64: Move VMA-related information into "struct kvm_page_fault" Sean Christopherson
2025-08-21 21:00 ` [RFC PATCH 13/16] KVM: arm64: Stash "mmu_seq" in " Sean Christopherson
2025-08-21 21:00 ` [RFC PATCH 14/16] KVM: arm64: Track "forced" information " Sean Christopherson
2025-08-21 21:00 ` [RFC PATCH 15/16] KVM: arm64: Extract mmap_lock-protected code to helper for user mem aborts Sean Christopherson
2025-08-21 21:00 ` [RFC PATCH 16/16] KVM: arm64: Don't bother nullifying "vma" in mem abort path Sean Christopherson
2025-08-21 22:39 ` [RFC PATCH 00/16] KVM: arm64: Add "struct kvm_page_fault" Oliver Upton
