public inbox for linux-arm-kernel@lists.infradead.org
* [PATCH 00/17] KVM: arm64: More user_mem_abort() rework
@ 2026-03-16 17:54 Marc Zyngier
  2026-03-16 17:54 ` [PATCH 01/17] KVM: arm64: Kill fault->ipa Marc Zyngier
                   ` (19 more replies)
  0 siblings, 20 replies; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

Piqued by Fuad's initial set of patches[1] splitting user_mem_abort()
into more "edible" functions, I've added my own take on top of it with
a few goals in mind:

- contextualise the state by splitting kvm_s2_fault into more granular
  structures

- reduce the amount of state that is visible and/or mutable by any
  single function

- reduce the number of variables that simply cache state that is
  already implicitly available (and often only a helper away)

I find the result reasonably attractive, and throwing it at a couple
of machines didn't result in anything out of the ordinary.

For those interested, I have stashed a branch at [2], and I'd
appreciate some feedback on the outcome.

[1] https://lore.kernel.org/all/20260306140232.2193802-1-tabba@google.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=hack/user_mem_abort-rework

Marc Zyngier (17):
  KVM: arm64: Kill fault->ipa
  KVM: arm64: Make fault_ipa immutable
  KVM: arm64: Move fault context to const structure
  KVM: arm64: Replace fault_is_perm with a helper
  KVM: arm64: Constrain fault_granule to kvm_s2_fault_map()
  KVM: arm64: Kill write_fault from kvm_s2_fault
  KVM: arm64: Kill exec_fault from kvm_s2_fault
  KVM: arm64: Kill topup_memcache from kvm_s2_fault
  KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info
  KVM: arm64: Kill logging_active from kvm_s2_fault
  KVM: arm64: Restrict the scope of the 'writable' attribute
  KVM: arm64: Move kvm_s2_fault.{pfn,page} to kvm_s2_vma_info
  KVM: arm64: Replace force_pte with a max_map_size attribute
  KVM: arm64: Move device mapping management into kvm_s2_fault_pin_pfn()
  KVM: arm64: Directly expose mapping prot and kill kvm_s2_fault
  KVM: arm64: Simplify integration of adjust_nested_*_perms()
  KVM: arm64: Convert gmem_abort() to struct kvm_s2_fault_desc

 arch/arm64/kvm/mmu.c | 428 ++++++++++++++++++++++---------------------
 1 file changed, 223 insertions(+), 205 deletions(-)

-- 
2.47.3



^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 01/17] KVM: arm64: Kill fault->ipa
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17  9:22   ` Fuad Tabba
  2026-03-16 17:54 ` [PATCH 02/17] KVM: arm64: Make fault_ipa immutable Marc Zyngier
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

fault->ipa, in a nested context, represents the output of the guest's
S2 translation for the fault->fault_ipa input, and is equal to
fault->fault_ipa otherwise.

Given that this is readily available from kvm_s2_trans_output(),
drop fault->ipa and directly compute fault->gfn instead, which
is really what we want.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 5542a50dc8a65..fe8f8057cf412 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1643,7 +1643,7 @@ static short kvm_s2_resolve_vma_size(struct vm_area_struct *vma,
 				     unsigned long hva,
 				     struct kvm_memory_slot *memslot,
 				     struct kvm_s2_trans *nested,
-				     bool *force_pte, phys_addr_t *ipa)
+				     bool *force_pte)
 {
 	short vma_shift;
 
@@ -1681,8 +1681,6 @@ static short kvm_s2_resolve_vma_size(struct vm_area_struct *vma,
 
 		max_map_size = *force_pte ? PAGE_SIZE : PUD_SIZE;
 
-		*ipa = kvm_s2_trans_output(nested);
-
 		/*
 		 * If we're about to create a shadow stage 2 entry, then we
 		 * can only create a block mapping if the guest stage 2 page
@@ -1722,7 +1720,6 @@ struct kvm_s2_fault {
 	bool is_vma_cacheable;
 	bool s2_force_noncacheable;
 	unsigned long mmu_seq;
-	phys_addr_t ipa;
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active;
@@ -1738,6 +1735,7 @@ static int kvm_s2_fault_get_vma_info(struct kvm_s2_fault *fault)
 {
 	struct vm_area_struct *vma;
 	struct kvm *kvm = fault->vcpu->kvm;
+	phys_addr_t ipa;
 
 	mmap_read_lock(current->mm);
 	vma = vma_lookup(current->mm, fault->hva);
@@ -1748,8 +1746,7 @@ static int kvm_s2_fault_get_vma_info(struct kvm_s2_fault *fault)
 	}
 
 	fault->vma_pagesize = 1UL << kvm_s2_resolve_vma_size(vma, fault->hva, fault->memslot,
-							     fault->nested, &fault->force_pte,
-							     &fault->ipa);
+							     fault->nested, &fault->force_pte);
 
 	/*
 	 * Both the canonical IPA and fault IPA must be aligned to the
@@ -1757,9 +1754,9 @@ static int kvm_s2_fault_get_vma_info(struct kvm_s2_fault *fault)
 	 * mapping in the right place.
 	 */
 	fault->fault_ipa = ALIGN_DOWN(fault->fault_ipa, fault->vma_pagesize);
-	fault->ipa = ALIGN_DOWN(fault->ipa, fault->vma_pagesize);
+	ipa = fault->nested ? kvm_s2_trans_output(fault->nested) : fault->fault_ipa;
+	fault->gfn = ALIGN_DOWN(ipa, fault->vma_pagesize) >> PAGE_SHIFT;
 
-	fault->gfn = fault->ipa >> PAGE_SHIFT;
 	fault->mte_allowed = kvm_vma_mte_allowed(vma);
 
 	fault->vm_flags = vma->vm_flags;
@@ -1970,7 +1967,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		.memslot = memslot,
 		.hva = hva,
 		.fault_is_perm = fault_is_perm,
-		.ipa = fault_ipa,
 		.logging_active = logging_active,
 		.force_pte = logging_active,
 		.prot = KVM_PGTABLE_PROT_R,
-- 
2.47.3




* [PATCH 02/17] KVM: arm64: Make fault_ipa immutable
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
  2026-03-16 17:54 ` [PATCH 01/17] KVM: arm64: Kill fault->ipa Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17  9:38   ` Fuad Tabba
  2026-03-16 17:54 ` [PATCH 03/17] KVM: arm64: Move fault context to const structure Marc Zyngier
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

Updating fault_ipa is conceptually annoying, as it changes something
that is a property of the fault itself.

Stop doing so and instead use fault->gfn as the sole piece of state
that can be used to represent the faulting IPA.

At the same time, introduce get_canonical_gfn() for the couple of cases
where we are concerned with the memslot-related IPA rather than the
faulting one.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 38 ++++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index fe8f8057cf412..ab8a269d4366d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1400,10 +1400,10 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
  */
 static long
 transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
-			    unsigned long hva, kvm_pfn_t *pfnp,
-			    phys_addr_t *ipap)
+			    unsigned long hva, kvm_pfn_t *pfnp, gfn_t *gfnp)
 {
 	kvm_pfn_t pfn = *pfnp;
+	gfn_t gfn = *gfnp;
 
 	/*
 	 * Make sure the adjustment is done only for THP pages. Also make
@@ -1419,7 +1419,8 @@ transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
 		if (sz < PMD_SIZE)
 			return PAGE_SIZE;
 
-		*ipap &= PMD_MASK;
+		gfn &= ~(PTRS_PER_PMD - 1);
+		*gfnp = gfn;
 		pfn &= ~(PTRS_PER_PMD - 1);
 		*pfnp = pfn;
 
@@ -1735,7 +1736,6 @@ static int kvm_s2_fault_get_vma_info(struct kvm_s2_fault *fault)
 {
 	struct vm_area_struct *vma;
 	struct kvm *kvm = fault->vcpu->kvm;
-	phys_addr_t ipa;
 
 	mmap_read_lock(current->mm);
 	vma = vma_lookup(current->mm, fault->hva);
@@ -1753,9 +1753,7 @@ static int kvm_s2_fault_get_vma_info(struct kvm_s2_fault *fault)
 	 * mapping size to ensure we find the right PFN and lay down the
 	 * mapping in the right place.
 	 */
-	fault->fault_ipa = ALIGN_DOWN(fault->fault_ipa, fault->vma_pagesize);
-	ipa = fault->nested ? kvm_s2_trans_output(fault->nested) : fault->fault_ipa;
-	fault->gfn = ALIGN_DOWN(ipa, fault->vma_pagesize) >> PAGE_SHIFT;
+	fault->gfn = ALIGN_DOWN(fault->fault_ipa, fault->vma_pagesize) >> PAGE_SHIFT;
 
 	fault->mte_allowed = kvm_vma_mte_allowed(vma);
 
@@ -1777,6 +1775,17 @@ static int kvm_s2_fault_get_vma_info(struct kvm_s2_fault *fault)
 	return 0;
 }
 
+static gfn_t get_canonical_gfn(struct kvm_s2_fault *fault)
+{
+	phys_addr_t ipa;
+
+	if (!fault->nested)
+		return fault->gfn;
+
+	ipa = kvm_s2_trans_output(fault->nested);
+	return ALIGN_DOWN(ipa, fault->vma_pagesize) >> PAGE_SHIFT;
+}
+
 static int kvm_s2_fault_pin_pfn(struct kvm_s2_fault *fault)
 {
 	int ret;
@@ -1785,7 +1794,7 @@ static int kvm_s2_fault_pin_pfn(struct kvm_s2_fault *fault)
 	if (ret)
 		return ret;
 
-	fault->pfn = __kvm_faultin_pfn(fault->memslot, fault->gfn,
+	fault->pfn = __kvm_faultin_pfn(fault->memslot, get_canonical_gfn(fault),
 				       fault->write_fault ? FOLL_WRITE : 0,
 				       &fault->writable, &fault->page);
 	if (unlikely(is_error_noslot_pfn(fault->pfn))) {
@@ -1885,6 +1894,11 @@ static int kvm_s2_fault_compute_prot(struct kvm_s2_fault *fault)
 	return 0;
 }
 
+static phys_addr_t get_ipa(const struct kvm_s2_fault *fault)
+{
+	return gfn_to_gpa(fault->gfn);
+}
+
 static int kvm_s2_fault_map(struct kvm_s2_fault *fault, void *memcache)
 {
 	struct kvm *kvm = fault->vcpu->kvm;
@@ -1909,7 +1923,7 @@ static int kvm_s2_fault_map(struct kvm_s2_fault *fault, void *memcache)
 		} else {
 			fault->vma_pagesize = transparent_hugepage_adjust(kvm, fault->memslot,
 									  fault->hva, &fault->pfn,
-									  &fault->fault_ipa);
+									  &fault->gfn);
 
 			if (fault->vma_pagesize < 0) {
 				ret = fault->vma_pagesize;
@@ -1932,10 +1946,10 @@ static int kvm_s2_fault_map(struct kvm_s2_fault *fault, void *memcache)
 		 * PTE, which will be preserved.
 		 */
 		fault->prot &= ~KVM_NV_GUEST_MAP_SZ;
-		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault->fault_ipa,
+		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, get_ipa(fault),
 								 fault->prot, flags);
 	} else {
-		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault->fault_ipa, fault->vma_pagesize,
+		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, get_ipa(fault), fault->vma_pagesize,
 							 __pfn_to_phys(fault->pfn), fault->prot,
 							 memcache, flags);
 	}
@@ -1946,7 +1960,7 @@ static int kvm_s2_fault_map(struct kvm_s2_fault *fault, void *memcache)
 
 	/* Mark the fault->page dirty only if the fault is handled successfully */
 	if (fault->writable && !ret)
-		mark_page_dirty_in_slot(kvm, fault->memslot, fault->gfn);
+		mark_page_dirty_in_slot(kvm, fault->memslot, get_canonical_gfn(fault));
 
 	if (ret != -EAGAIN)
 		return ret;
-- 
2.47.3




* [PATCH 03/17] KVM: arm64: Move fault context to const structure
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
  2026-03-16 17:54 ` [PATCH 01/17] KVM: arm64: Kill fault->ipa Marc Zyngier
  2026-03-16 17:54 ` [PATCH 02/17] KVM: arm64: Make fault_ipa immutable Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17 10:26   ` Fuad Tabba
  2026-03-16 17:54 ` [PATCH 04/17] KVM: arm64: Replace fault_is_perm with a helper Marc Zyngier
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

In order to make it clearer what gets updated or not during fault
handling, move the set of information that loosely represents the
fault context into its own structure.

This gets populated early, from handle_mem_abort(), and gets passed
along as a const pointer. user_mem_abort()'s signature is much improved
in the process, and kvm_s2_fault loses a bunch of fields.

gmem_abort() will get a similar treatment down the line.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 133 ++++++++++++++++++++++---------------------
 1 file changed, 69 insertions(+), 64 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ab8a269d4366d..2a7128b8dd14f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1640,23 +1640,28 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	return ret != -EAGAIN ? ret : 0;
 }
 
-static short kvm_s2_resolve_vma_size(struct vm_area_struct *vma,
-				     unsigned long hva,
-				     struct kvm_memory_slot *memslot,
-				     struct kvm_s2_trans *nested,
-				     bool *force_pte)
+struct kvm_s2_fault_desc {
+	struct kvm_vcpu		*vcpu;
+	phys_addr_t		fault_ipa;
+	struct kvm_s2_trans	*nested;
+	struct kvm_memory_slot	*memslot;
+	unsigned long		hva;
+};
+
+static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
+				     struct vm_area_struct *vma, bool *force_pte)
 {
 	short vma_shift;
 
 	if (*force_pte)
 		vma_shift = PAGE_SHIFT;
 	else
-		vma_shift = get_vma_page_shift(vma, hva);
+		vma_shift = get_vma_page_shift(vma, s2fd->hva);
 
 	switch (vma_shift) {
 #ifndef __PAGETABLE_PMD_FOLDED
 	case PUD_SHIFT:
-		if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
+		if (fault_supports_stage2_huge_mapping(s2fd->memslot, s2fd->hva, PUD_SIZE))
 			break;
 		fallthrough;
 #endif
@@ -1664,7 +1669,7 @@ static short kvm_s2_resolve_vma_size(struct vm_area_struct *vma,
 		vma_shift = PMD_SHIFT;
 		fallthrough;
 	case PMD_SHIFT:
-		if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
+		if (fault_supports_stage2_huge_mapping(s2fd->memslot, s2fd->hva, PMD_SIZE))
 			break;
 		fallthrough;
 	case CONT_PTE_SHIFT:
@@ -1677,7 +1682,7 @@ static short kvm_s2_resolve_vma_size(struct vm_area_struct *vma,
 		WARN_ONCE(1, "Unknown vma_shift %d", vma_shift);
 	}
 
-	if (nested) {
+	if (s2fd->nested) {
 		unsigned long max_map_size;
 
 		max_map_size = *force_pte ? PAGE_SIZE : PUD_SIZE;
@@ -1687,7 +1692,7 @@ static short kvm_s2_resolve_vma_size(struct vm_area_struct *vma,
 		 * can only create a block mapping if the guest stage 2 page
 		 * table uses at least as big a mapping.
 		 */
-		max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
+		max_map_size = min(kvm_s2_trans_size(s2fd->nested), max_map_size);
 
 		/*
 		 * Be careful that if the mapping size falls between
@@ -1706,11 +1711,6 @@ static short kvm_s2_resolve_vma_size(struct vm_area_struct *vma,
 }
 
 struct kvm_s2_fault {
-	struct kvm_vcpu *vcpu;
-	phys_addr_t fault_ipa;
-	struct kvm_s2_trans *nested;
-	struct kvm_memory_slot *memslot;
-	unsigned long hva;
 	bool fault_is_perm;
 
 	bool write_fault;
@@ -1732,28 +1732,28 @@ struct kvm_s2_fault {
 	vm_flags_t vm_flags;
 };
 
-static int kvm_s2_fault_get_vma_info(struct kvm_s2_fault *fault)
+static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
+				     struct kvm_s2_fault *fault)
 {
 	struct vm_area_struct *vma;
-	struct kvm *kvm = fault->vcpu->kvm;
+	struct kvm *kvm = s2fd->vcpu->kvm;
 
 	mmap_read_lock(current->mm);
-	vma = vma_lookup(current->mm, fault->hva);
+	vma = vma_lookup(current->mm, s2fd->hva);
 	if (unlikely(!vma)) {
-		kvm_err("Failed to find VMA for fault->hva 0x%lx\n", fault->hva);
+		kvm_err("Failed to find VMA for hva 0x%lx\n", s2fd->hva);
 		mmap_read_unlock(current->mm);
 		return -EFAULT;
 	}
 
-	fault->vma_pagesize = 1UL << kvm_s2_resolve_vma_size(vma, fault->hva, fault->memslot,
-							     fault->nested, &fault->force_pte);
+	fault->vma_pagesize = BIT(kvm_s2_resolve_vma_size(s2fd, vma, &fault->force_pte));
 
 	/*
 	 * Both the canonical IPA and fault IPA must be aligned to the
 	 * mapping size to ensure we find the right PFN and lay down the
 	 * mapping in the right place.
 	 */
-	fault->gfn = ALIGN_DOWN(fault->fault_ipa, fault->vma_pagesize) >> PAGE_SHIFT;
+	fault->gfn = ALIGN_DOWN(s2fd->fault_ipa, fault->vma_pagesize) >> PAGE_SHIFT;
 
 	fault->mte_allowed = kvm_vma_mte_allowed(vma);
 
@@ -1775,31 +1775,33 @@ static int kvm_s2_fault_get_vma_info(struct kvm_s2_fault *fault)
 	return 0;
 }
 
-static gfn_t get_canonical_gfn(struct kvm_s2_fault *fault)
+static gfn_t get_canonical_gfn(const struct kvm_s2_fault_desc *s2fd,
+			       const struct kvm_s2_fault *fault)
 {
 	phys_addr_t ipa;
 
-	if (!fault->nested)
+	if (!s2fd->nested)
 		return fault->gfn;
 
-	ipa = kvm_s2_trans_output(fault->nested);
+	ipa = kvm_s2_trans_output(s2fd->nested);
 	return ALIGN_DOWN(ipa, fault->vma_pagesize) >> PAGE_SHIFT;
 }
 
-static int kvm_s2_fault_pin_pfn(struct kvm_s2_fault *fault)
+static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
+				struct kvm_s2_fault *fault)
 {
 	int ret;
 
-	ret = kvm_s2_fault_get_vma_info(fault);
+	ret = kvm_s2_fault_get_vma_info(s2fd, fault);
 	if (ret)
 		return ret;
 
-	fault->pfn = __kvm_faultin_pfn(fault->memslot, get_canonical_gfn(fault),
+	fault->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, fault),
 				       fault->write_fault ? FOLL_WRITE : 0,
 				       &fault->writable, &fault->page);
 	if (unlikely(is_error_noslot_pfn(fault->pfn))) {
 		if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
-			kvm_send_hwpoison_signal(fault->hva, __ffs(fault->vma_pagesize));
+			kvm_send_hwpoison_signal(s2fd->hva, __ffs(fault->vma_pagesize));
 			return 0;
 		}
 		return -EFAULT;
@@ -1808,9 +1810,10 @@ static int kvm_s2_fault_pin_pfn(struct kvm_s2_fault *fault)
 	return 1;
 }
 
-static int kvm_s2_fault_compute_prot(struct kvm_s2_fault *fault)
+static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
+				     struct kvm_s2_fault *fault)
 {
-	struct kvm *kvm = fault->vcpu->kvm;
+	struct kvm *kvm = s2fd->vcpu->kvm;
 
 	/*
 	 * Check if this is non-struct page memory PFN, and cannot support
@@ -1862,13 +1865,13 @@ static int kvm_s2_fault_compute_prot(struct kvm_s2_fault *fault)
 	 * and trigger the exception here. Since the memslot is valid, inject
 	 * the fault back to the guest.
 	 */
-	if (esr_fsc_is_excl_atomic_fault(kvm_vcpu_get_esr(fault->vcpu))) {
-		kvm_inject_dabt_excl_atomic(fault->vcpu, kvm_vcpu_get_hfar(fault->vcpu));
+	if (esr_fsc_is_excl_atomic_fault(kvm_vcpu_get_esr(s2fd->vcpu))) {
+		kvm_inject_dabt_excl_atomic(s2fd->vcpu, kvm_vcpu_get_hfar(s2fd->vcpu));
 		return 1;
 	}
 
-	if (fault->nested)
-		adjust_nested_fault_perms(fault->nested, &fault->prot, &fault->writable);
+	if (s2fd->nested)
+		adjust_nested_fault_perms(s2fd->nested, &fault->prot, &fault->writable);
 
 	if (fault->writable)
 		fault->prot |= KVM_PGTABLE_PROT_W;
@@ -1882,8 +1885,8 @@ static int kvm_s2_fault_compute_prot(struct kvm_s2_fault *fault)
 	else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
 		fault->prot |= KVM_PGTABLE_PROT_X;
 
-	if (fault->nested)
-		adjust_nested_exec_perms(kvm, fault->nested, &fault->prot);
+	if (s2fd->nested)
+		adjust_nested_exec_perms(kvm, s2fd->nested, &fault->prot);
 
 	if (!fault->fault_is_perm && !fault->s2_force_noncacheable && kvm_has_mte(kvm)) {
 		/* Check the VMM hasn't introduced a new disallowed VMA */
@@ -1899,15 +1902,16 @@ static phys_addr_t get_ipa(const struct kvm_s2_fault *fault)
 	return gfn_to_gpa(fault->gfn);
 }
 
-static int kvm_s2_fault_map(struct kvm_s2_fault *fault, void *memcache)
+static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
+			    struct kvm_s2_fault *fault, void *memcache)
 {
-	struct kvm *kvm = fault->vcpu->kvm;
+	struct kvm *kvm = s2fd->vcpu->kvm;
 	struct kvm_pgtable *pgt;
 	int ret;
 	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
 
 	kvm_fault_lock(kvm);
-	pgt = fault->vcpu->arch.hw_mmu->pgt;
+	pgt = s2fd->vcpu->arch.hw_mmu->pgt;
 	ret = -EAGAIN;
 	if (mmu_invalidate_retry(kvm, fault->mmu_seq))
 		goto out_unlock;
@@ -1921,8 +1925,8 @@ static int kvm_s2_fault_map(struct kvm_s2_fault *fault, void *memcache)
 		if (fault->fault_is_perm && fault->fault_granule > PAGE_SIZE) {
 			fault->vma_pagesize = fault->fault_granule;
 		} else {
-			fault->vma_pagesize = transparent_hugepage_adjust(kvm, fault->memslot,
-									  fault->hva, &fault->pfn,
+			fault->vma_pagesize = transparent_hugepage_adjust(kvm, s2fd->memslot,
+									  s2fd->hva, &fault->pfn,
 									  &fault->gfn);
 
 			if (fault->vma_pagesize < 0) {
@@ -1960,34 +1964,27 @@ static int kvm_s2_fault_map(struct kvm_s2_fault *fault, void *memcache)
 
 	/* Mark the fault->page dirty only if the fault is handled successfully */
 	if (fault->writable && !ret)
-		mark_page_dirty_in_slot(kvm, fault->memslot, get_canonical_gfn(fault));
+		mark_page_dirty_in_slot(kvm, s2fd->memslot, get_canonical_gfn(s2fd, fault));
 
 	if (ret != -EAGAIN)
 		return ret;
 	return 0;
 }
 
-static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
-			  struct kvm_s2_trans *nested,
-			  struct kvm_memory_slot *memslot, unsigned long hva,
-			  bool fault_is_perm)
+static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
 {
-	bool write_fault = kvm_is_write_fault(vcpu);
-	bool logging_active = memslot_is_logging(memslot);
+	bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
+	bool write_fault = kvm_is_write_fault(s2fd->vcpu);
+	bool logging_active = memslot_is_logging(s2fd->memslot);
 	struct kvm_s2_fault fault = {
-		.vcpu = vcpu,
-		.fault_ipa = fault_ipa,
-		.nested = nested,
-		.memslot = memslot,
-		.hva = hva,
-		.fault_is_perm = fault_is_perm,
+		.fault_is_perm = perm_fault,
 		.logging_active = logging_active,
 		.force_pte = logging_active,
 		.prot = KVM_PGTABLE_PROT_R,
-		.fault_granule = fault_is_perm ? kvm_vcpu_trap_get_perm_fault_granule(vcpu) : 0,
+		.fault_granule = perm_fault ? kvm_vcpu_trap_get_perm_fault_granule(s2fd->vcpu) : 0,
 		.write_fault = write_fault,
-		.exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu),
-		.topup_memcache = !fault_is_perm || (logging_active && write_fault),
+		.exec_fault = kvm_vcpu_trap_is_exec_fault(s2fd->vcpu),
+		.topup_memcache = !perm_fault || (logging_active && write_fault),
 	};
 	void *memcache;
 	int ret;
@@ -2000,7 +1997,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * only exception to this is when dirty logging is enabled at runtime
 	 * and a write fault needs to collapse a block entry into a table.
 	 */
-	ret = prepare_mmu_memcache(vcpu, fault.topup_memcache, &memcache);
+	ret = prepare_mmu_memcache(s2fd->vcpu, fault.topup_memcache, &memcache);
 	if (ret)
 		return ret;
 
@@ -2008,17 +2005,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * Let's check if we will get back a huge fault->page backed by hugetlbfs, or
 	 * get block mapping for device MMIO region.
 	 */
-	ret = kvm_s2_fault_pin_pfn(&fault);
+	ret = kvm_s2_fault_pin_pfn(s2fd, &fault);
 	if (ret != 1)
 		return ret;
 
-	ret = kvm_s2_fault_compute_prot(&fault);
+	ret = kvm_s2_fault_compute_prot(s2fd, &fault);
 	if (ret) {
 		kvm_release_page_unused(fault.page);
 		return ret;
 	}
 
-	return kvm_s2_fault_map(&fault, memcache);
+	return kvm_s2_fault_map(s2fd, &fault, memcache);
 }
 
 /* Resolve the access fault by making the page young again. */
@@ -2284,12 +2281,20 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
 			!write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
 
+	const struct kvm_s2_fault_desc s2fd = {
+		.vcpu		= vcpu,
+		.fault_ipa	= fault_ipa,
+		.nested		= nested,
+		.memslot	= memslot,
+		.hva		= hva,
+	};
+
 	if (kvm_slot_has_gmem(memslot))
 		ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
 				 esr_fsc_is_permission_fault(esr));
 	else
-		ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
-				     esr_fsc_is_permission_fault(esr));
+		ret = user_mem_abort(&s2fd);
+
 	if (ret == 0)
 		ret = 1;
 out:
-- 
2.47.3




* [PATCH 04/17] KVM: arm64: Replace fault_is_perm with a helper
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (2 preceding siblings ...)
  2026-03-16 17:54 ` [PATCH 03/17] KVM: arm64: Move fault context to const structure Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17 10:49   ` Fuad Tabba
  2026-03-18 13:43   ` Joey Gouly
  2026-03-16 17:54 ` [PATCH 05/17] KVM: arm64: Constrain fault_granule to kvm_s2_fault_map() Marc Zyngier
                   ` (15 subsequent siblings)
  19 siblings, 2 replies; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

Carrying a boolean to indicate that a given fault is a permission fault
is slightly odd, as this is a property of the fault itself, and we'd
better avoid duplicating state.

For this purpose, introduce a kvm_s2_fault_is_perm() predicate that
can take a fault descriptor as a parameter. fault_is_perm is therefore
dropped from kvm_s2_fault.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 2a7128b8dd14f..1b32f2e6c3e61 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1711,8 +1711,6 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
 }
 
 struct kvm_s2_fault {
-	bool fault_is_perm;
-
 	bool write_fault;
 	bool exec_fault;
 	bool writable;
@@ -1732,6 +1730,11 @@ struct kvm_s2_fault {
 	vm_flags_t vm_flags;
 };
 
+static bool kvm_s2_fault_is_perm(const struct kvm_s2_fault_desc *s2fd)
+{
+	return kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
+}
+
 static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
 				     struct kvm_s2_fault *fault)
 {
@@ -1888,7 +1891,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 	if (s2fd->nested)
 		adjust_nested_exec_perms(kvm, s2fd->nested, &fault->prot);
 
-	if (!fault->fault_is_perm && !fault->s2_force_noncacheable && kvm_has_mte(kvm)) {
+	if (!kvm_s2_fault_is_perm(s2fd) && !fault->s2_force_noncacheable && kvm_has_mte(kvm)) {
 		/* Check the VMM hasn't introduced a new disallowed VMA */
 		if (!fault->mte_allowed)
 			return -EFAULT;
@@ -1905,6 +1908,7 @@ static phys_addr_t get_ipa(const struct kvm_s2_fault *fault)
 static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 			    struct kvm_s2_fault *fault, void *memcache)
 {
+	bool fault_is_perm = kvm_s2_fault_is_perm(s2fd);
 	struct kvm *kvm = s2fd->vcpu->kvm;
 	struct kvm_pgtable *pgt;
 	int ret;
@@ -1922,7 +1926,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 	 */
 	if (fault->vma_pagesize == PAGE_SIZE &&
 	    !(fault->force_pte || fault->s2_force_noncacheable)) {
-		if (fault->fault_is_perm && fault->fault_granule > PAGE_SIZE) {
+		if (fault_is_perm && fault->fault_granule > PAGE_SIZE) {
 			fault->vma_pagesize = fault->fault_granule;
 		} else {
 			fault->vma_pagesize = transparent_hugepage_adjust(kvm, s2fd->memslot,
@@ -1936,7 +1940,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 		}
 	}
 
-	if (!fault->fault_is_perm && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
+	if (!fault_is_perm && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
 		sanitise_mte_tags(kvm, fault->pfn, fault->vma_pagesize);
 
 	/*
@@ -1944,7 +1948,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 	 * permissions only if fault->vma_pagesize equals fault->fault_granule. Otherwise,
 	 * kvm_pgtable_stage2_map() should be called to change block size.
 	 */
-	if (fault->fault_is_perm && fault->vma_pagesize == fault->fault_granule) {
+	if (fault_is_perm && fault->vma_pagesize == fault->fault_granule) {
 		/*
 		 * Drop the SW bits in favour of those stored in the
 		 * PTE, which will be preserved.
@@ -1977,7 +1981,6 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
 	bool write_fault = kvm_is_write_fault(s2fd->vcpu);
 	bool logging_active = memslot_is_logging(s2fd->memslot);
 	struct kvm_s2_fault fault = {
-		.fault_is_perm = perm_fault,
 		.logging_active = logging_active,
 		.force_pte = logging_active,
 		.prot = KVM_PGTABLE_PROT_R,
-- 
2.47.3




* [PATCH 05/17] KVM: arm64: Constrain fault_granule to kvm_s2_fault_map()
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (3 preceding siblings ...)
  2026-03-16 17:54 ` [PATCH 04/17] KVM: arm64: Replace fault_is_perm with a helper Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17 11:04   ` Fuad Tabba
  2026-03-16 17:54 ` [PATCH 06/17] KVM: arm64: Kill write_fault from kvm_s2_fault Marc Zyngier
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

The notion of fault_granule is specific to kvm_s2_fault_map(), and
is unused anywhere else.

Move this variable locally, removing it from kvm_s2_fault.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1b32f2e6c3e61..12c2f0aeaae4c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1724,7 +1724,6 @@ struct kvm_s2_fault {
 	bool logging_active;
 	bool force_pte;
 	long vma_pagesize;
-	long fault_granule;
 	enum kvm_pgtable_prot prot;
 	struct page *page;
 	vm_flags_t vm_flags;
@@ -1908,9 +1907,9 @@ static phys_addr_t get_ipa(const struct kvm_s2_fault *fault)
 static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 			    struct kvm_s2_fault *fault, void *memcache)
 {
-	bool fault_is_perm = kvm_s2_fault_is_perm(s2fd);
 	struct kvm *kvm = s2fd->vcpu->kvm;
 	struct kvm_pgtable *pgt;
+	long perm_fault_granule;
 	int ret;
 	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
 
@@ -1920,14 +1919,17 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 	if (mmu_invalidate_retry(kvm, fault->mmu_seq))
 		goto out_unlock;
 
+	perm_fault_granule = (kvm_s2_fault_is_perm(s2fd) ?
+			      kvm_vcpu_trap_get_perm_fault_granule(s2fd->vcpu) : 0);
+
 	/*
 	 * If we are not forced to use fault->page mapping, check if we are
 	 * backed by a THP and thus use block mapping if possible.
 	 */
 	if (fault->vma_pagesize == PAGE_SIZE &&
 	    !(fault->force_pte || fault->s2_force_noncacheable)) {
-		if (fault_is_perm && fault->fault_granule > PAGE_SIZE) {
-			fault->vma_pagesize = fault->fault_granule;
+		if (perm_fault_granule > PAGE_SIZE) {
+			fault->vma_pagesize = perm_fault_granule;
 		} else {
 			fault->vma_pagesize = transparent_hugepage_adjust(kvm, s2fd->memslot,
 									  s2fd->hva, &fault->pfn,
@@ -1940,15 +1942,15 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 		}
 	}
 
-	if (!fault_is_perm && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
+	if (!perm_fault_granule && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
 		sanitise_mte_tags(kvm, fault->pfn, fault->vma_pagesize);
 
 	/*
 	 * Under the premise of getting a FSC_PERM fault, we just need to relax
-	 * permissions only if fault->vma_pagesize equals fault->fault_granule. Otherwise,
+	 * permissions only if vma_pagesize equals perm_fault_granule. Otherwise,
 	 * kvm_pgtable_stage2_map() should be called to change block size.
 	 */
-	if (fault_is_perm && fault->vma_pagesize == fault->fault_granule) {
+	if (fault->vma_pagesize == perm_fault_granule) {
 		/*
 		 * Drop the SW bits in favour of those stored in the
 		 * PTE, which will be preserved.
@@ -1984,7 +1986,6 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
 		.logging_active = logging_active,
 		.force_pte = logging_active,
 		.prot = KVM_PGTABLE_PROT_R,
-		.fault_granule = perm_fault ? kvm_vcpu_trap_get_perm_fault_granule(s2fd->vcpu) : 0,
 		.write_fault = write_fault,
 		.exec_fault = kvm_vcpu_trap_is_exec_fault(s2fd->vcpu),
 		.topup_memcache = !perm_fault || (logging_active && write_fault),
-- 
2.47.3



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 06/17] KVM: arm64: Kill write_fault from kvm_s2_fault
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (4 preceding siblings ...)
  2026-03-16 17:54 ` [PATCH 05/17] KVM: arm64: Constrain fault_granule to kvm_s2_fault_map() Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17 11:20   ` Fuad Tabba
  2026-03-16 17:54 ` [PATCH 07/17] KVM: arm64: Kill exec_fault " Marc Zyngier
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

We already have kvm_is_write_fault() as a predicate indicating
an S2 fault on a write, and we're better off just using that instead
of duplicating the state.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 12c2f0aeaae4c..86950acbd7e6b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1711,7 +1711,6 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
 }
 
 struct kvm_s2_fault {
-	bool write_fault;
 	bool exec_fault;
 	bool writable;
 	bool topup_memcache;
@@ -1799,7 +1798,7 @@ static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
 		return ret;
 
 	fault->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, fault),
-				       fault->write_fault ? FOLL_WRITE : 0,
+				       kvm_is_write_fault(s2fd->vcpu) ? FOLL_WRITE : 0,
 				       &fault->writable, &fault->page);
 	if (unlikely(is_error_noslot_pfn(fault->pfn))) {
 		if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
@@ -1850,7 +1849,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 			 */
 			fault->s2_force_noncacheable = true;
 		}
-	} else if (fault->logging_active && !fault->write_fault) {
+	} else if (fault->logging_active && !kvm_is_write_fault(s2fd->vcpu)) {
 		/*
 		 * Only actually map the page as writable if this was a write
 		 * fault.
@@ -1980,21 +1979,17 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
 {
 	bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
-	bool write_fault = kvm_is_write_fault(s2fd->vcpu);
 	bool logging_active = memslot_is_logging(s2fd->memslot);
 	struct kvm_s2_fault fault = {
 		.logging_active = logging_active,
 		.force_pte = logging_active,
 		.prot = KVM_PGTABLE_PROT_R,
-		.write_fault = write_fault,
 		.exec_fault = kvm_vcpu_trap_is_exec_fault(s2fd->vcpu),
-		.topup_memcache = !perm_fault || (logging_active && write_fault),
+		.topup_memcache = !perm_fault || (logging_active && kvm_is_write_fault(s2fd->vcpu)),
 	};
 	void *memcache;
 	int ret;
 
-	VM_WARN_ON_ONCE(fault.write_fault && fault.exec_fault);
-
 	/*
 	 * Permission faults just need to update the existing leaf entry,
 	 * and so normally don't require allocations from the memcache. The
-- 
2.47.3




* [PATCH 07/17] KVM: arm64: Kill exec_fault from kvm_s2_fault
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (5 preceding siblings ...)
  2026-03-16 17:54 ` [PATCH 06/17] KVM: arm64: Kill write_fault from kvm_s2_fault Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17 11:44   ` Fuad Tabba
  2026-03-16 17:54 ` [PATCH 08/17] KVM: arm64: Kill topup_memcache " Marc Zyngier
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

Similarly to write_fault, exec_fault can be advantageously replaced
by the kvm_vcpu_trap_is_exec_fault() predicate where needed.

Another one bites the dust...

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 86950acbd7e6b..11820e39ad8e1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1711,7 +1711,6 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
 }
 
 struct kvm_s2_fault {
-	bool exec_fault;
 	bool writable;
 	bool topup_memcache;
 	bool mte_allowed;
@@ -1857,7 +1856,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 		fault->writable = false;
 	}
 
-	if (fault->exec_fault && fault->s2_force_noncacheable)
+	if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu) && fault->s2_force_noncacheable)
 		return -ENOEXEC;
 
 	/*
@@ -1877,7 +1876,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 	if (fault->writable)
 		fault->prot |= KVM_PGTABLE_PROT_W;
 
-	if (fault->exec_fault)
+	if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu))
 		fault->prot |= KVM_PGTABLE_PROT_X;
 
 	if (fault->s2_force_noncacheable)
@@ -1984,7 +1983,6 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
 		.logging_active = logging_active,
 		.force_pte = logging_active,
 		.prot = KVM_PGTABLE_PROT_R,
-		.exec_fault = kvm_vcpu_trap_is_exec_fault(s2fd->vcpu),
 		.topup_memcache = !perm_fault || (logging_active && kvm_is_write_fault(s2fd->vcpu)),
 	};
 	void *memcache;
-- 
2.47.3




* [PATCH 08/17] KVM: arm64: Kill topup_memcache from kvm_s2_fault
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (6 preceding siblings ...)
  2026-03-16 17:54 ` [PATCH 07/17] KVM: arm64: Kill exec_fault " Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17 12:12   ` Fuad Tabba
  2026-03-16 17:54 ` [PATCH 09/17] KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info Marc Zyngier
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

The topup_memcache field can be easily replaced by the equivalent
conditions, and the resulting code is not much worse.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 11820e39ad8e1..abe239752c696 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1712,7 +1712,6 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
 
 struct kvm_s2_fault {
 	bool writable;
-	bool topup_memcache;
 	bool mte_allowed;
 	bool is_vma_cacheable;
 	bool s2_force_noncacheable;
@@ -1983,7 +1982,6 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
 		.logging_active = logging_active,
 		.force_pte = logging_active,
 		.prot = KVM_PGTABLE_PROT_R,
-		.topup_memcache = !perm_fault || (logging_active && kvm_is_write_fault(s2fd->vcpu)),
 	};
 	void *memcache;
 	int ret;
@@ -1994,9 +1992,11 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
 	 * only exception to this is when dirty logging is enabled at runtime
 	 * and a write fault needs to collapse a block entry into a table.
 	 */
-	ret = prepare_mmu_memcache(s2fd->vcpu, fault.topup_memcache, &memcache);
-	if (ret)
-		return ret;
+	if (!perm_fault || (logging_active && kvm_is_write_fault(s2fd->vcpu))) {
+		ret = prepare_mmu_memcache(s2fd->vcpu, true, &memcache);
+		if (ret)
+			return ret;
+	}
 
 	/*
 	 * Let's check if we will get back a huge fault->page backed by hugetlbfs, or
-- 
2.47.3




* [PATCH 09/17] KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (7 preceding siblings ...)
  2026-03-16 17:54 ` [PATCH 08/17] KVM: arm64: Kill topup_memcache " Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17 12:51   ` Fuad Tabba
  2026-03-18 14:22   ` Joey Gouly
  2026-03-16 17:54 ` [PATCH 10/17] KVM: arm64: Kill logging_active from kvm_s2_fault Marc Zyngier
                   ` (10 subsequent siblings)
  19 siblings, 2 replies; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

Mechanically extract a bunch of VMA-related fields from kvm_s2_fault
and move them to a new kvm_s2_fault_vma_info structure.

This is not much, but it already allows us to define which functions
can update this structure, and which ones are pure consumers of the
data. Those in the latter camp are updated to take a const pointer
to that structure.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 113 +++++++++++++++++++++++--------------------
 1 file changed, 61 insertions(+), 52 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index abe239752c696..a5b0dd41560f6 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1710,20 +1710,23 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
 	return vma_shift;
 }
 
+struct kvm_s2_fault_vma_info {
+	unsigned long	mmu_seq;
+	long		vma_pagesize;
+	vm_flags_t	vm_flags;
+	gfn_t		gfn;
+	bool		mte_allowed;
+	bool		is_vma_cacheable;
+};
+
 struct kvm_s2_fault {
 	bool writable;
-	bool mte_allowed;
-	bool is_vma_cacheable;
 	bool s2_force_noncacheable;
-	unsigned long mmu_seq;
-	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active;
 	bool force_pte;
-	long vma_pagesize;
 	enum kvm_pgtable_prot prot;
 	struct page *page;
-	vm_flags_t vm_flags;
 };
 
 static bool kvm_s2_fault_is_perm(const struct kvm_s2_fault_desc *s2fd)
@@ -1732,7 +1735,8 @@ static bool kvm_s2_fault_is_perm(const struct kvm_s2_fault_desc *s2fd)
 }
 
 static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
-				     struct kvm_s2_fault *fault)
+				     struct kvm_s2_fault *fault,
+				     struct kvm_s2_fault_vma_info *s2vi)
 {
 	struct vm_area_struct *vma;
 	struct kvm *kvm = s2fd->vcpu->kvm;
@@ -1745,20 +1749,20 @@ static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
 		return -EFAULT;
 	}
 
-	fault->vma_pagesize = BIT(kvm_s2_resolve_vma_size(s2fd, vma, &fault->force_pte));
+	s2vi->vma_pagesize = BIT(kvm_s2_resolve_vma_size(s2fd, vma, &fault->force_pte));
 
 	/*
 	 * Both the canonical IPA and fault IPA must be aligned to the
 	 * mapping size to ensure we find the right PFN and lay down the
 	 * mapping in the right place.
 	 */
-	fault->gfn = ALIGN_DOWN(s2fd->fault_ipa, fault->vma_pagesize) >> PAGE_SHIFT;
+	s2vi->gfn = ALIGN_DOWN(s2fd->fault_ipa, s2vi->vma_pagesize) >> PAGE_SHIFT;
 
-	fault->mte_allowed = kvm_vma_mte_allowed(vma);
+	s2vi->mte_allowed = kvm_vma_mte_allowed(vma);
 
-	fault->vm_flags = vma->vm_flags;
+	s2vi->vm_flags = vma->vm_flags;
 
-	fault->is_vma_cacheable = kvm_vma_is_cacheable(vma);
+	s2vi->is_vma_cacheable = kvm_vma_is_cacheable(vma);
 
 	/*
 	 * Read mmu_invalidate_seq so that KVM can detect if the results of
@@ -1768,39 +1772,40 @@ static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
 	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
 	 * with the smp_wmb() in kvm_mmu_invalidate_end().
 	 */
-	fault->mmu_seq = kvm->mmu_invalidate_seq;
+	s2vi->mmu_seq = kvm->mmu_invalidate_seq;
 	mmap_read_unlock(current->mm);
 
 	return 0;
 }
 
 static gfn_t get_canonical_gfn(const struct kvm_s2_fault_desc *s2fd,
-			       const struct kvm_s2_fault *fault)
+			       const struct kvm_s2_fault_vma_info *s2vi)
 {
 	phys_addr_t ipa;
 
 	if (!s2fd->nested)
-		return fault->gfn;
+		return s2vi->gfn;
 
 	ipa = kvm_s2_trans_output(s2fd->nested);
-	return ALIGN_DOWN(ipa, fault->vma_pagesize) >> PAGE_SHIFT;
+	return ALIGN_DOWN(ipa, s2vi->vma_pagesize) >> PAGE_SHIFT;
 }
 
 static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
-				struct kvm_s2_fault *fault)
+				struct kvm_s2_fault *fault,
+				struct kvm_s2_fault_vma_info *s2vi)
 {
 	int ret;
 
-	ret = kvm_s2_fault_get_vma_info(s2fd, fault);
+	ret = kvm_s2_fault_get_vma_info(s2fd, fault, s2vi);
 	if (ret)
 		return ret;
 
-	fault->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, fault),
+	fault->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, s2vi),
 				       kvm_is_write_fault(s2fd->vcpu) ? FOLL_WRITE : 0,
 				       &fault->writable, &fault->page);
 	if (unlikely(is_error_noslot_pfn(fault->pfn))) {
 		if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
-			kvm_send_hwpoison_signal(s2fd->hva, __ffs(fault->vma_pagesize));
+			kvm_send_hwpoison_signal(s2fd->hva, __ffs(s2vi->vma_pagesize));
 			return 0;
 		}
 		return -EFAULT;
@@ -1810,7 +1815,8 @@ static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
 }
 
 static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
-				     struct kvm_s2_fault *fault)
+				     struct kvm_s2_fault *fault,
+				     const struct kvm_s2_fault_vma_info *s2vi)
 {
 	struct kvm *kvm = s2fd->vcpu->kvm;
 
@@ -1818,8 +1824,8 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 	 * Check if this is non-struct page memory PFN, and cannot support
 	 * CMOs. It could potentially be unsafe to access as cacheable.
 	 */
-	if (fault->vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(fault->pfn)) {
-		if (fault->is_vma_cacheable) {
+	if (s2vi->vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(fault->pfn)) {
+		if (s2vi->is_vma_cacheable) {
 			/*
 			 * Whilst the VMA owner expects cacheable mapping to this
 			 * PFN, hardware also has to support the FWB and CACHE DIC
@@ -1879,7 +1885,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 		fault->prot |= KVM_PGTABLE_PROT_X;
 
 	if (fault->s2_force_noncacheable)
-		fault->prot |= (fault->vm_flags & VM_ALLOW_ANY_UNCACHED) ?
+		fault->prot |= (s2vi->vm_flags & VM_ALLOW_ANY_UNCACHED) ?
 			       KVM_PGTABLE_PROT_NORMAL_NC : KVM_PGTABLE_PROT_DEVICE;
 	else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
 		fault->prot |= KVM_PGTABLE_PROT_X;
@@ -1889,74 +1895,73 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 
 	if (!kvm_s2_fault_is_perm(s2fd) && !fault->s2_force_noncacheable && kvm_has_mte(kvm)) {
 		/* Check the VMM hasn't introduced a new disallowed VMA */
-		if (!fault->mte_allowed)
+		if (!s2vi->mte_allowed)
 			return -EFAULT;
 	}
 
 	return 0;
 }
 
-static phys_addr_t get_ipa(const struct kvm_s2_fault *fault)
-{
-	return gfn_to_gpa(fault->gfn);
-}
-
 static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
-			    struct kvm_s2_fault *fault, void *memcache)
+			    struct kvm_s2_fault *fault,
+			    const struct kvm_s2_fault_vma_info *s2vi, void *memcache)
 {
+	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
 	struct kvm *kvm = s2fd->vcpu->kvm;
 	struct kvm_pgtable *pgt;
 	long perm_fault_granule;
+	long mapping_size;
+	gfn_t gfn;
 	int ret;
-	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
 
 	kvm_fault_lock(kvm);
 	pgt = s2fd->vcpu->arch.hw_mmu->pgt;
 	ret = -EAGAIN;
-	if (mmu_invalidate_retry(kvm, fault->mmu_seq))
+	if (mmu_invalidate_retry(kvm, s2vi->mmu_seq))
 		goto out_unlock;
 
 	perm_fault_granule = (kvm_s2_fault_is_perm(s2fd) ?
 			      kvm_vcpu_trap_get_perm_fault_granule(s2fd->vcpu) : 0);
+	mapping_size = s2vi->vma_pagesize;
+	gfn = s2vi->gfn;
 
 	/*
 	 * If we are not forced to use fault->page mapping, check if we are
 	 * backed by a THP and thus use block mapping if possible.
 	 */
-	if (fault->vma_pagesize == PAGE_SIZE &&
+	if (mapping_size == PAGE_SIZE &&
 	    !(fault->force_pte || fault->s2_force_noncacheable)) {
 		if (perm_fault_granule > PAGE_SIZE) {
-			fault->vma_pagesize = perm_fault_granule;
+			mapping_size = perm_fault_granule;
 		} else {
-			fault->vma_pagesize = transparent_hugepage_adjust(kvm, s2fd->memslot,
-									  s2fd->hva, &fault->pfn,
-									  &fault->gfn);
-
-			if (fault->vma_pagesize < 0) {
-				ret = fault->vma_pagesize;
+			mapping_size = transparent_hugepage_adjust(kvm, s2fd->memslot,
+								   s2fd->hva, &fault->pfn,
+								   &gfn);
+			if (mapping_size < 0) {
+				ret = mapping_size;
 				goto out_unlock;
 			}
 		}
 	}
 
 	if (!perm_fault_granule && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
-		sanitise_mte_tags(kvm, fault->pfn, fault->vma_pagesize);
+		sanitise_mte_tags(kvm, fault->pfn, mapping_size);
 
 	/*
 	 * Under the premise of getting a FSC_PERM fault, we just need to relax
-	 * permissions only if vma_pagesize equals perm_fault_granule. Otherwise,
+	 * permissions only if mapping_size equals perm_fault_granule. Otherwise,
 	 * kvm_pgtable_stage2_map() should be called to change block size.
 	 */
-	if (fault->vma_pagesize == perm_fault_granule) {
+	if (mapping_size == perm_fault_granule) {
 		/*
 		 * Drop the SW bits in favour of those stored in the
 		 * PTE, which will be preserved.
 		 */
 		fault->prot &= ~KVM_NV_GUEST_MAP_SZ;
-		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, get_ipa(fault),
+		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, gfn_to_gpa(gfn),
 								 fault->prot, flags);
 	} else {
-		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, get_ipa(fault), fault->vma_pagesize,
+		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, gfn_to_gpa(gfn), mapping_size,
 							 __pfn_to_phys(fault->pfn), fault->prot,
 							 memcache, flags);
 	}
@@ -1965,9 +1970,12 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 	kvm_release_faultin_page(kvm, fault->page, !!ret, fault->writable);
 	kvm_fault_unlock(kvm);
 
-	/* Mark the fault->page dirty only if the fault is handled successfully */
-	if (fault->writable && !ret)
-		mark_page_dirty_in_slot(kvm, s2fd->memslot, get_canonical_gfn(s2fd, fault));
+	/* Mark the page dirty only if the fault is handled successfully */
+	if (fault->writable && !ret) {
+		phys_addr_t ipa = gfn_to_gpa(get_canonical_gfn(s2fd, s2vi));
+		ipa &= ~(mapping_size - 1);
+		mark_page_dirty_in_slot(kvm, s2fd->memslot, gpa_to_gfn(ipa));
+	}
 
 	if (ret != -EAGAIN)
 		return ret;
@@ -1978,6 +1986,7 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
 {
 	bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
 	bool logging_active = memslot_is_logging(s2fd->memslot);
+	struct kvm_s2_fault_vma_info s2vi = {};
 	struct kvm_s2_fault fault = {
 		.logging_active = logging_active,
 		.force_pte = logging_active,
@@ -2002,17 +2011,17 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
 	 * Let's check if we will get back a huge fault->page backed by hugetlbfs, or
 	 * get block mapping for device MMIO region.
 	 */
-	ret = kvm_s2_fault_pin_pfn(s2fd, &fault);
+	ret = kvm_s2_fault_pin_pfn(s2fd, &fault, &s2vi);
 	if (ret != 1)
 		return ret;
 
-	ret = kvm_s2_fault_compute_prot(s2fd, &fault);
+	ret = kvm_s2_fault_compute_prot(s2fd, &fault, &s2vi);
 	if (ret) {
 		kvm_release_page_unused(fault.page);
 		return ret;
 	}
 
-	return kvm_s2_fault_map(s2fd, &fault, memcache);
+	return kvm_s2_fault_map(s2fd, &fault, &s2vi, memcache);
 }
 
 /* Resolve the access fault by making the page young again. */
-- 
2.47.3




* [PATCH 10/17] KVM: arm64: Kill logging_active from kvm_s2_fault
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (8 preceding siblings ...)
  2026-03-16 17:54 ` [PATCH 09/17] KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17 13:23   ` Fuad Tabba
  2026-03-16 17:54 ` [PATCH 11/17] KVM: arm64: Restrict the scope of the 'writable' attribute Marc Zyngier
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

There are only two spots where we evaluate whether logging is
active. Replace the boolean with calls to the relevant helper.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index a5b0dd41560f6..caa5bedc79e19 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1723,7 +1723,6 @@ struct kvm_s2_fault {
 	bool writable;
 	bool s2_force_noncacheable;
 	kvm_pfn_t pfn;
-	bool logging_active;
 	bool force_pte;
 	enum kvm_pgtable_prot prot;
 	struct page *page;
@@ -1853,7 +1852,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 			 */
 			fault->s2_force_noncacheable = true;
 		}
-	} else if (fault->logging_active && !kvm_is_write_fault(s2fd->vcpu)) {
+	} else if (memslot_is_logging(s2fd->memslot) && !kvm_is_write_fault(s2fd->vcpu)) {
 		/*
 		 * Only actually map the page as writable if this was a write
 		 * fault.
@@ -1985,11 +1984,9 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
 {
 	bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
-	bool logging_active = memslot_is_logging(s2fd->memslot);
 	struct kvm_s2_fault_vma_info s2vi = {};
 	struct kvm_s2_fault fault = {
-		.logging_active = logging_active,
-		.force_pte = logging_active,
+		.force_pte = memslot_is_logging(s2fd->memslot),
 		.prot = KVM_PGTABLE_PROT_R,
 	};
 	void *memcache;
@@ -2001,7 +1998,8 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
 	 * only exception to this is when dirty logging is enabled at runtime
 	 * and a write fault needs to collapse a block entry into a table.
 	 */
-	if (!perm_fault || (logging_active && kvm_is_write_fault(s2fd->vcpu))) {
+	if (!perm_fault || (memslot_is_logging(s2fd->memslot) &&
+			    kvm_is_write_fault(s2fd->vcpu))) {
 		ret = prepare_mmu_memcache(s2fd->vcpu, true, &memcache);
 		if (ret)
 			return ret;
-- 
2.47.3




* [PATCH 11/17] KVM: arm64: Restrict the scope of the 'writable' attribute
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (9 preceding siblings ...)
  2026-03-16 17:54 ` [PATCH 10/17] KVM: arm64: Kill logging_active from kvm_s2_fault Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17 13:55   ` Fuad Tabba
  2026-03-16 17:54 ` [PATCH 12/17] KVM: arm64: Move kvm_s2_fault.{pfn,page} to kvm_s2_vma_info Marc Zyngier
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

The 'writable' field is ambiguous, and indicates multiple things:

- whether the underlying memslot is writable

- whether we are resolving the fault with writable attributes

Add a new field to kvm_s2_fault_vma_info (map_writable) to indicate
the former condition, and have local writable variables to track
the latter.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index caa5bedc79e19..3cfb8f2a6d186 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1717,10 +1717,10 @@ struct kvm_s2_fault_vma_info {
 	gfn_t		gfn;
 	bool		mte_allowed;
 	bool		is_vma_cacheable;
+	bool		map_writable;
 };
 
 struct kvm_s2_fault {
-	bool writable;
 	bool s2_force_noncacheable;
 	kvm_pfn_t pfn;
 	bool force_pte;
@@ -1801,7 +1801,7 @@ static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
 
 	fault->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, s2vi),
 				       kvm_is_write_fault(s2fd->vcpu) ? FOLL_WRITE : 0,
-				       &fault->writable, &fault->page);
+				       &s2vi->map_writable, &fault->page);
 	if (unlikely(is_error_noslot_pfn(fault->pfn))) {
 		if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
 			kvm_send_hwpoison_signal(s2fd->hva, __ffs(s2vi->vma_pagesize));
@@ -1818,6 +1818,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 				     const struct kvm_s2_fault_vma_info *s2vi)
 {
 	struct kvm *kvm = s2fd->vcpu->kvm;
+	bool writable = s2vi->map_writable;
 
 	/*
 	 * Check if this is non-struct page memory PFN, and cannot support
@@ -1857,7 +1858,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 		 * Only actually map the page as writable if this was a write
 		 * fault.
 		 */
-		fault->writable = false;
+		writable = false;
 	}
 
 	if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu) && fault->s2_force_noncacheable)
@@ -1875,9 +1876,9 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 	}
 
 	if (s2fd->nested)
-		adjust_nested_fault_perms(s2fd->nested, &fault->prot, &fault->writable);
+		adjust_nested_fault_perms(s2fd->nested, &fault->prot, &writable);
 
-	if (fault->writable)
+	if (writable)
 		fault->prot |= KVM_PGTABLE_PROT_W;
 
 	if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu))
@@ -1906,6 +1907,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 			    const struct kvm_s2_fault_vma_info *s2vi, void *memcache)
 {
 	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
+	bool writable = fault->prot & KVM_PGTABLE_PROT_W;
 	struct kvm *kvm = s2fd->vcpu->kvm;
 	struct kvm_pgtable *pgt;
 	long perm_fault_granule;
@@ -1966,11 +1968,11 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 	}
 
 out_unlock:
-	kvm_release_faultin_page(kvm, fault->page, !!ret, fault->writable);
+	kvm_release_faultin_page(kvm, fault->page, !!ret, writable);
 	kvm_fault_unlock(kvm);
 
 	/* Mark the page dirty only if the fault is handled successfully */
-	if (fault->writable && !ret) {
+	if (writable && !ret) {
 		phys_addr_t ipa = gfn_to_gpa(get_canonical_gfn(s2fd, s2vi));
 		ipa &= ~(mapping_size - 1);
 		mark_page_dirty_in_slot(kvm, s2fd->memslot, gpa_to_gfn(ipa));
-- 
2.47.3




* [PATCH 12/17] KVM: arm64: Move kvm_s2_fault.{pfn,page} to kvm_s2_vma_info
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (10 preceding siblings ...)
  2026-03-16 17:54 ` [PATCH 11/17] KVM: arm64: Restrict the scope of the 'writable' attribute Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17 14:24   ` Fuad Tabba
  2026-03-16 17:54 ` [PATCH 13/17] KVM: arm64: Replace force_pte with a max_map_size attribute Marc Zyngier
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

Continue restricting the visibility/mutability of some attributes
by moving kvm_s2_fault.{pfn,page} to kvm_s2_vma_info.

This is a pretty mechanical change.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 3cfb8f2a6d186..ccdc9398e4ce2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1714,6 +1714,8 @@ struct kvm_s2_fault_vma_info {
 	unsigned long	mmu_seq;
 	long		vma_pagesize;
 	vm_flags_t	vm_flags;
+	struct page	*page;
+	kvm_pfn_t	pfn;
 	gfn_t		gfn;
 	bool		mte_allowed;
 	bool		is_vma_cacheable;
@@ -1722,10 +1724,8 @@ struct kvm_s2_fault_vma_info {
 
 struct kvm_s2_fault {
 	bool s2_force_noncacheable;
-	kvm_pfn_t pfn;
 	bool force_pte;
 	enum kvm_pgtable_prot prot;
-	struct page *page;
 };
 
 static bool kvm_s2_fault_is_perm(const struct kvm_s2_fault_desc *s2fd)
@@ -1799,11 +1799,11 @@ static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
 	if (ret)
 		return ret;
 
-	fault->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, s2vi),
-				       kvm_is_write_fault(s2fd->vcpu) ? FOLL_WRITE : 0,
-				       &s2vi->map_writable, &fault->page);
-	if (unlikely(is_error_noslot_pfn(fault->pfn))) {
-		if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
+	s2vi->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, s2vi),
+				      kvm_is_write_fault(s2fd->vcpu) ? FOLL_WRITE : 0,
+				      &s2vi->map_writable, &s2vi->page);
+	if (unlikely(is_error_noslot_pfn(s2vi->pfn))) {
+		if (s2vi->pfn == KVM_PFN_ERR_HWPOISON) {
 			kvm_send_hwpoison_signal(s2fd->hva, __ffs(s2vi->vma_pagesize));
 			return 0;
 		}
@@ -1824,7 +1824,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 	 * Check if this is non-struct page memory PFN, and cannot support
 	 * CMOs. It could potentially be unsafe to access as cacheable.
 	 */
-	if (s2vi->vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(fault->pfn)) {
+	if (s2vi->vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(s2vi->pfn)) {
 		if (s2vi->is_vma_cacheable) {
 			/*
 			 * Whilst the VMA owner expects cacheable mapping to this
@@ -1912,6 +1912,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 	struct kvm_pgtable *pgt;
 	long perm_fault_granule;
 	long mapping_size;
+	kvm_pfn_t pfn;
 	gfn_t gfn;
 	int ret;
 
@@ -1924,10 +1925,11 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 	perm_fault_granule = (kvm_s2_fault_is_perm(s2fd) ?
 			      kvm_vcpu_trap_get_perm_fault_granule(s2fd->vcpu) : 0);
 	mapping_size = s2vi->vma_pagesize;
+	pfn = s2vi->pfn;
 	gfn = s2vi->gfn;
 
 	/*
-	 * If we are not forced to use fault->page mapping, check if we are
+	 * If we are not forced to use page mapping, check if we are
 	 * backed by a THP and thus use block mapping if possible.
 	 */
 	if (mapping_size == PAGE_SIZE &&
@@ -1936,7 +1938,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 			mapping_size = perm_fault_granule;
 		} else {
 			mapping_size = transparent_hugepage_adjust(kvm, s2fd->memslot,
-								   s2fd->hva, &fault->pfn,
+								   s2fd->hva, &pfn,
 								   &gfn);
 			if (mapping_size < 0) {
 				ret = mapping_size;
@@ -1946,7 +1948,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 	}
 
 	if (!perm_fault_granule && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
-		sanitise_mte_tags(kvm, fault->pfn, mapping_size);
+		sanitise_mte_tags(kvm, pfn, mapping_size);
 
 	/*
 	 * Under the premise of getting a FSC_PERM fault, we just need to relax
@@ -1963,12 +1965,12 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 								 fault->prot, flags);
 	} else {
 		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, gfn_to_gpa(gfn), mapping_size,
-							 __pfn_to_phys(fault->pfn), fault->prot,
+							 __pfn_to_phys(pfn), fault->prot,
 							 memcache, flags);
 	}
 
 out_unlock:
-	kvm_release_faultin_page(kvm, fault->page, !!ret, writable);
+	kvm_release_faultin_page(kvm, s2vi->page, !!ret, writable);
 	kvm_fault_unlock(kvm);
 
 	/* Mark the page dirty only if the fault is handled successfully */
@@ -2017,7 +2019,7 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
 
 	ret = kvm_s2_fault_compute_prot(s2fd, &fault, &s2vi);
 	if (ret) {
-		kvm_release_page_unused(fault.page);
+		kvm_release_page_unused(s2vi.page);
 		return ret;
 	}
 
-- 
2.47.3



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 13/17] KVM: arm64: Replace force_pte with a max_map_size attribute
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (11 preceding siblings ...)
  2026-03-16 17:54 ` [PATCH 12/17] KVM: arm64: Move kvm_s2_fault.{pfn,page} to kvm_s2_vma_info Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17 15:08   ` Fuad Tabba
  2026-03-16 17:54 ` [PATCH 14/17] KVM: arm64: Move device mapping management into kvm_s2_fault_pin_pfn() Marc Zyngier
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

force_pte is annoyingly limited in what it expresses, and we'd
be better off with a more generic primitive. Introduce max_map_size
instead, which does the trick and can be moved into the vma_info
structure. This further allows reducing the scopes in which
it is mutable.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 47 +++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ccdc9398e4ce2..ac4bfcc33aeb1 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1648,15 +1648,32 @@ struct kvm_s2_fault_desc {
 	unsigned long		hva;
 };
 
+struct kvm_s2_fault_vma_info {
+	unsigned long	mmu_seq;
+	long		vma_pagesize;
+	vm_flags_t	vm_flags;
+	unsigned long	max_map_size;
+	struct page	*page;
+	kvm_pfn_t	pfn;
+	gfn_t		gfn;
+	bool		mte_allowed;
+	bool		is_vma_cacheable;
+	bool		map_writable;
+};
+
 static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
-				     struct vm_area_struct *vma, bool *force_pte)
+				     struct kvm_s2_fault_vma_info *s2vi,
+				     struct vm_area_struct *vma)
 {
 	short vma_shift;
 
-	if (*force_pte)
+	if (memslot_is_logging(s2fd->memslot)) {
+		s2vi->max_map_size = PAGE_SIZE;
 		vma_shift = PAGE_SHIFT;
-	else
+	} else {
+		s2vi->max_map_size = PUD_SIZE;
 		vma_shift = get_vma_page_shift(vma, s2fd->hva);
+	}
 
 	switch (vma_shift) {
 #ifndef __PAGETABLE_PMD_FOLDED
@@ -1674,7 +1691,7 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
 		fallthrough;
 	case CONT_PTE_SHIFT:
 		vma_shift = PAGE_SHIFT;
-		*force_pte = true;
+		s2vi->max_map_size = PAGE_SIZE;
 		fallthrough;
 	case PAGE_SHIFT:
 		break;
@@ -1685,7 +1702,7 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
 	if (s2fd->nested) {
 		unsigned long max_map_size;
 
-		max_map_size = *force_pte ? PAGE_SIZE : PUD_SIZE;
+		max_map_size = min(s2vi->max_map_size, PUD_SIZE);
 
 		/*
 		 * If we're about to create a shadow stage 2 entry, then we
@@ -1703,28 +1720,15 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
 		else if (max_map_size >= PAGE_SIZE && max_map_size < PMD_SIZE)
 			max_map_size = PAGE_SIZE;
 
-		*force_pte = (max_map_size == PAGE_SIZE);
+		s2vi->max_map_size = max_map_size;
 		vma_shift = min_t(short, vma_shift, __ffs(max_map_size));
 	}
 
 	return vma_shift;
 }
 
-struct kvm_s2_fault_vma_info {
-	unsigned long	mmu_seq;
-	long		vma_pagesize;
-	vm_flags_t	vm_flags;
-	struct page	*page;
-	kvm_pfn_t	pfn;
-	gfn_t		gfn;
-	bool		mte_allowed;
-	bool		is_vma_cacheable;
-	bool		map_writable;
-};
-
 struct kvm_s2_fault {
 	bool s2_force_noncacheable;
-	bool force_pte;
 	enum kvm_pgtable_prot prot;
 };
 
@@ -1748,7 +1752,7 @@ static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
 		return -EFAULT;
 	}
 
-	s2vi->vma_pagesize = BIT(kvm_s2_resolve_vma_size(s2fd, vma, &fault->force_pte));
+	s2vi->vma_pagesize = BIT(kvm_s2_resolve_vma_size(s2fd, s2vi, vma));
 
 	/*
 	 * Both the canonical IPA and fault IPA must be aligned to the
@@ -1933,7 +1937,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 	 * backed by a THP and thus use block mapping if possible.
 	 */
 	if (mapping_size == PAGE_SIZE &&
-	    !(fault->force_pte || fault->s2_force_noncacheable)) {
+	    !(s2vi->max_map_size == PAGE_SIZE || fault->s2_force_noncacheable)) {
 		if (perm_fault_granule > PAGE_SIZE) {
 			mapping_size = perm_fault_granule;
 		} else {
@@ -1990,7 +1994,6 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
 	bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
 	struct kvm_s2_fault_vma_info s2vi = {};
 	struct kvm_s2_fault fault = {
-		.force_pte = memslot_is_logging(s2fd->memslot),
 		.prot = KVM_PGTABLE_PROT_R,
 	};
 	void *memcache;
-- 
2.47.3



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 14/17] KVM: arm64: Move device mapping management into kvm_s2_fault_pin_pfn()
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (12 preceding siblings ...)
  2026-03-16 17:54 ` [PATCH 13/17] KVM: arm64: Replace force_pte with a max_map_size attribute Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17 15:41   ` Fuad Tabba
  2026-03-16 17:54 ` [PATCH 15/17] KVM: arm64: Directly expose mapping prot and kill kvm_s2_fault Marc Zyngier
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

Attributes for device mappings are computed very late in the fault
handling process, meaning they remain mutable for that long.

Introduce both 'device' and 'map_non_cacheable' attributes to the
vma_info structure, allowing that information to be set in stone
earlier, in kvm_s2_fault_pin_pfn().

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 52 ++++++++++++++++++++++++--------------------
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ac4bfcc33aeb1..97cb3585eba03 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1656,9 +1656,11 @@ struct kvm_s2_fault_vma_info {
 	struct page	*page;
 	kvm_pfn_t	pfn;
 	gfn_t		gfn;
+	bool		device;
 	bool		mte_allowed;
 	bool		is_vma_cacheable;
 	bool		map_writable;
+	bool		map_non_cacheable;
 };
 
 static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
@@ -1728,7 +1730,6 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
 }
 
 struct kvm_s2_fault {
-	bool s2_force_noncacheable;
 	enum kvm_pgtable_prot prot;
 };
 
@@ -1738,7 +1739,6 @@ static bool kvm_s2_fault_is_perm(const struct kvm_s2_fault_desc *s2fd)
 }
 
 static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
-				     struct kvm_s2_fault *fault,
 				     struct kvm_s2_fault_vma_info *s2vi)
 {
 	struct vm_area_struct *vma;
@@ -1794,12 +1794,11 @@ static gfn_t get_canonical_gfn(const struct kvm_s2_fault_desc *s2fd,
 }
 
 static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
-				struct kvm_s2_fault *fault,
 				struct kvm_s2_fault_vma_info *s2vi)
 {
 	int ret;
 
-	ret = kvm_s2_fault_get_vma_info(s2fd, fault, s2vi);
+	ret = kvm_s2_fault_get_vma_info(s2fd, s2vi);
 	if (ret)
 		return ret;
 
@@ -1814,16 +1813,6 @@ static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
 		return -EFAULT;
 	}
 
-	return 1;
-}
-
-static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
-				     struct kvm_s2_fault *fault,
-				     const struct kvm_s2_fault_vma_info *s2vi)
-{
-	struct kvm *kvm = s2fd->vcpu->kvm;
-	bool writable = s2vi->map_writable;
-
 	/*
 	 * Check if this is non-struct page memory PFN, and cannot support
 	 * CMOs. It could potentially be unsafe to access as cacheable.
@@ -1842,8 +1831,10 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 			 * S2FWB and CACHE DIC are mandatory to avoid the need for
 			 * cache maintenance.
 			 */
-			if (!kvm_supports_cacheable_pfnmap())
+			if (!kvm_supports_cacheable_pfnmap()) {
+				kvm_release_faultin_page(s2fd->vcpu->kvm, s2vi->page, true, false);
 				return -EFAULT;
+			}
 		} else {
 			/*
 			 * If the page was identified as device early by looking at
@@ -1855,9 +1846,24 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 			 * In both cases, we don't let transparent_hugepage_adjust()
 			 * change things at the last minute.
 			 */
-			fault->s2_force_noncacheable = true;
+			s2vi->map_non_cacheable = true;
 		}
-	} else if (memslot_is_logging(s2fd->memslot) && !kvm_is_write_fault(s2fd->vcpu)) {
+
+		s2vi->device = true;
+	}
+
+	return 1;
+}
+
+static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
+				     struct kvm_s2_fault *fault,
+				     const struct kvm_s2_fault_vma_info *s2vi)
+{
+	struct kvm *kvm = s2fd->vcpu->kvm;
+	bool writable = s2vi->map_writable;
+
+	if (!s2vi->device && memslot_is_logging(s2fd->memslot) &&
+	    !kvm_is_write_fault(s2fd->vcpu)) {
 		/*
 		 * Only actually map the page as writable if this was a write
 		 * fault.
@@ -1865,7 +1871,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 		writable = false;
 	}
 
-	if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu) && fault->s2_force_noncacheable)
+	if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu) && s2vi->map_non_cacheable)
 		return -ENOEXEC;
 
 	/*
@@ -1888,7 +1894,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 	if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu))
 		fault->prot |= KVM_PGTABLE_PROT_X;
 
-	if (fault->s2_force_noncacheable)
+	if (s2vi->map_non_cacheable)
 		fault->prot |= (s2vi->vm_flags & VM_ALLOW_ANY_UNCACHED) ?
 			       KVM_PGTABLE_PROT_NORMAL_NC : KVM_PGTABLE_PROT_DEVICE;
 	else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
@@ -1897,7 +1903,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 	if (s2fd->nested)
 		adjust_nested_exec_perms(kvm, s2fd->nested, &fault->prot);
 
-	if (!kvm_s2_fault_is_perm(s2fd) && !fault->s2_force_noncacheable && kvm_has_mte(kvm)) {
+	if (!kvm_s2_fault_is_perm(s2fd) && !s2vi->map_non_cacheable && kvm_has_mte(kvm)) {
 		/* Check the VMM hasn't introduced a new disallowed VMA */
 		if (!s2vi->mte_allowed)
 			return -EFAULT;
@@ -1937,7 +1943,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 	 * backed by a THP and thus use block mapping if possible.
 	 */
 	if (mapping_size == PAGE_SIZE &&
-	    !(s2vi->max_map_size == PAGE_SIZE || fault->s2_force_noncacheable)) {
+	    !(s2vi->max_map_size == PAGE_SIZE || s2vi->map_non_cacheable)) {
 		if (perm_fault_granule > PAGE_SIZE) {
 			mapping_size = perm_fault_granule;
 		} else {
@@ -1951,7 +1957,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 		}
 	}
 
-	if (!perm_fault_granule && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
+	if (!perm_fault_granule && !s2vi->map_non_cacheable && kvm_has_mte(kvm))
 		sanitise_mte_tags(kvm, pfn, mapping_size);
 
 	/*
@@ -2016,7 +2022,7 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
 	 * Let's check if we will get back a huge fault->page backed by hugetlbfs, or
 	 * get block mapping for device MMIO region.
 	 */
-	ret = kvm_s2_fault_pin_pfn(s2fd, &fault, &s2vi);
+	ret = kvm_s2_fault_pin_pfn(s2fd, &s2vi);
 	if (ret != 1)
 		return ret;
 
-- 
2.47.3



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 15/17] KVM: arm64: Directly expose mapping prot and kill kvm_s2_fault
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (13 preceding siblings ...)
  2026-03-16 17:54 ` [PATCH 14/17] KVM: arm64: Move device mapping management into kvm_s2_fault_pin_pfn() Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17 16:14   ` Fuad Tabba
  2026-03-16 17:54 ` [PATCH 16/17] KVM: arm64: Simplify integration of adjust_nested_*_perms() Marc Zyngier
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

The 'prot' field is the only one left in kvm_s2_fault. Expose it
directly to the functions needing it, and get rid of kvm_s2_fault.

It has served us well during this refactoring, but it is now no
longer needed.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 45 +++++++++++++++++++++-----------------------
 1 file changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 97cb3585eba03..9b5df70807875 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1729,10 +1729,6 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
 	return vma_shift;
 }
 
-struct kvm_s2_fault {
-	enum kvm_pgtable_prot prot;
-};
-
 static bool kvm_s2_fault_is_perm(const struct kvm_s2_fault_desc *s2fd)
 {
 	return kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
@@ -1856,8 +1852,8 @@ static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
 }
 
 static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
-				     struct kvm_s2_fault *fault,
-				     const struct kvm_s2_fault_vma_info *s2vi)
+				     const struct kvm_s2_fault_vma_info *s2vi,
+				     enum kvm_pgtable_prot *prot)
 {
 	struct kvm *kvm = s2fd->vcpu->kvm;
 	bool writable = s2vi->map_writable;
@@ -1885,23 +1881,25 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 		return 1;
 	}
 
+	*prot = KVM_PGTABLE_PROT_R;
+
 	if (s2fd->nested)
-		adjust_nested_fault_perms(s2fd->nested, &fault->prot, &writable);
+		adjust_nested_fault_perms(s2fd->nested, prot, &writable);
 
 	if (writable)
-		fault->prot |= KVM_PGTABLE_PROT_W;
+		*prot |= KVM_PGTABLE_PROT_W;
 
 	if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu))
-		fault->prot |= KVM_PGTABLE_PROT_X;
+		*prot |= KVM_PGTABLE_PROT_X;
 
 	if (s2vi->map_non_cacheable)
-		fault->prot |= (s2vi->vm_flags & VM_ALLOW_ANY_UNCACHED) ?
-			       KVM_PGTABLE_PROT_NORMAL_NC : KVM_PGTABLE_PROT_DEVICE;
+		*prot |= (s2vi->vm_flags & VM_ALLOW_ANY_UNCACHED) ?
+			KVM_PGTABLE_PROT_NORMAL_NC : KVM_PGTABLE_PROT_DEVICE;
 	else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
-		fault->prot |= KVM_PGTABLE_PROT_X;
+		*prot |= KVM_PGTABLE_PROT_X;
 
 	if (s2fd->nested)
-		adjust_nested_exec_perms(kvm, s2fd->nested, &fault->prot);
+		adjust_nested_exec_perms(kvm, s2fd->nested, prot);
 
 	if (!kvm_s2_fault_is_perm(s2fd) && !s2vi->map_non_cacheable && kvm_has_mte(kvm)) {
 		/* Check the VMM hasn't introduced a new disallowed VMA */
@@ -1913,11 +1911,12 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 }
 
 static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
-			    struct kvm_s2_fault *fault,
-			    const struct kvm_s2_fault_vma_info *s2vi, void *memcache)
+			    const struct kvm_s2_fault_vma_info *s2vi,
+			    enum kvm_pgtable_prot prot,
+			    void *memcache)
 {
 	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
-	bool writable = fault->prot & KVM_PGTABLE_PROT_W;
+	bool writable = prot & KVM_PGTABLE_PROT_W;
 	struct kvm *kvm = s2fd->vcpu->kvm;
 	struct kvm_pgtable *pgt;
 	long perm_fault_granule;
@@ -1970,12 +1969,12 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
 		 * Drop the SW bits in favour of those stored in the
 		 * PTE, which will be preserved.
 		 */
-		fault->prot &= ~KVM_NV_GUEST_MAP_SZ;
+		prot &= ~KVM_NV_GUEST_MAP_SZ;
 		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, gfn_to_gpa(gfn),
-								 fault->prot, flags);
+								 prot, flags);
 	} else {
 		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, gfn_to_gpa(gfn), mapping_size,
-							 __pfn_to_phys(pfn), fault->prot,
+							 __pfn_to_phys(pfn), prot,
 							 memcache, flags);
 	}
 
@@ -1999,9 +1998,7 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
 {
 	bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
 	struct kvm_s2_fault_vma_info s2vi = {};
-	struct kvm_s2_fault fault = {
-		.prot = KVM_PGTABLE_PROT_R,
-	};
+	enum kvm_pgtable_prot prot;
 	void *memcache;
 	int ret;
 
@@ -2026,13 +2023,13 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
 	if (ret != 1)
 		return ret;
 
-	ret = kvm_s2_fault_compute_prot(s2fd, &fault, &s2vi);
+	ret = kvm_s2_fault_compute_prot(s2fd, &s2vi, &prot);
 	if (ret) {
 		kvm_release_page_unused(s2vi.page);
 		return ret;
 	}
 
-	return kvm_s2_fault_map(s2fd, &fault, &s2vi, memcache);
+	return kvm_s2_fault_map(s2fd, &s2vi, prot, memcache);
 }
 
 /* Resolve the access fault by making the page young again. */
-- 
2.47.3



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 16/17] KVM: arm64: Simplify integration of adjust_nested_*_perms()
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (14 preceding siblings ...)
  2026-03-16 17:54 ` [PATCH 15/17] KVM: arm64: Directly expose mapping prot and kill kvm_s2_fault Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17 16:45   ` Fuad Tabba
  2026-03-16 17:54 ` [PATCH 17/17] KVM: arm64: Convert gmem_abort() to struct kvm_s2_fault_desc Marc Zyngier
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

Instead of passing pointers to adjust_nested_*_perms(), allow
them to return a new set of permissions.

With some careful moving around so that the canonical permissions
are computed before the nested ones are applied, we end up with
a bit less code, and something a bit more readable.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 62 +++++++++++++++++++-------------------------
 1 file changed, 27 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 9b5df70807875..18cf7e6ba786d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1544,32 +1544,34 @@ static int prepare_mmu_memcache(struct kvm_vcpu *vcpu, bool topup_memcache,
  * TLB invalidation from the guest and used to limit the invalidation scope if a
  * TTL hint or a range isn't provided.
  */
-static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
-				      enum kvm_pgtable_prot *prot,
-				      bool *writable)
+static enum kvm_pgtable_prot adjust_nested_fault_perms(struct kvm_s2_trans *nested,
+						       enum kvm_pgtable_prot prot)
 {
-	*writable &= kvm_s2_trans_writable(nested);
+	if (!kvm_s2_trans_writable(nested))
+		prot &= ~KVM_PGTABLE_PROT_W;
 	if (!kvm_s2_trans_readable(nested))
-		*prot &= ~KVM_PGTABLE_PROT_R;
+		prot &= ~KVM_PGTABLE_PROT_R;
 
-	*prot |= kvm_encode_nested_level(nested);
+	return prot | kvm_encode_nested_level(nested);
 }
 
-static void adjust_nested_exec_perms(struct kvm *kvm,
-				     struct kvm_s2_trans *nested,
-				     enum kvm_pgtable_prot *prot)
+static enum kvm_pgtable_prot adjust_nested_exec_perms(struct kvm *kvm,
+						      struct kvm_s2_trans *nested,
+						      enum kvm_pgtable_prot prot)
 {
 	if (!kvm_s2_trans_exec_el0(kvm, nested))
-		*prot &= ~KVM_PGTABLE_PROT_UX;
+		prot &= ~KVM_PGTABLE_PROT_UX;
 	if (!kvm_s2_trans_exec_el1(kvm, nested))
-		*prot &= ~KVM_PGTABLE_PROT_PX;
+		prot &= ~KVM_PGTABLE_PROT_PX;
+
+	return prot;
 }
 
 static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		      struct kvm_s2_trans *nested,
 		      struct kvm_memory_slot *memslot, bool is_perm)
 {
-	bool write_fault, exec_fault, writable;
+	bool write_fault, exec_fault;
 	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
@@ -1606,19 +1608,17 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return ret;
 	}
 
-	writable = !(memslot->flags & KVM_MEM_READONLY);
+	if (!(memslot->flags & KVM_MEM_READONLY))
+		prot |= KVM_PGTABLE_PROT_W;
 
 	if (nested)
-		adjust_nested_fault_perms(nested, &prot, &writable);
-
-	if (writable)
-		prot |= KVM_PGTABLE_PROT_W;
+		prot = adjust_nested_fault_perms(nested, prot);
 
 	if (exec_fault || cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
 		prot |= KVM_PGTABLE_PROT_X;
 
 	if (nested)
-		adjust_nested_exec_perms(kvm, nested, &prot);
+		prot = adjust_nested_exec_perms(kvm, nested, prot);
 
 	kvm_fault_lock(kvm);
 	if (mmu_invalidate_retry(kvm, mmu_seq)) {
@@ -1631,10 +1631,10 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 						 memcache, flags);
 
 out_unlock:
-	kvm_release_faultin_page(kvm, page, !!ret, writable);
+	kvm_release_faultin_page(kvm, page, !!ret, prot & KVM_PGTABLE_PROT_W);
 	kvm_fault_unlock(kvm);
 
-	if (writable && !ret)
+	if ((prot & KVM_PGTABLE_PROT_W) && !ret)
 		mark_page_dirty_in_slot(kvm, memslot, gfn);
 
 	return ret != -EAGAIN ? ret : 0;
@@ -1856,16 +1856,6 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 				     enum kvm_pgtable_prot *prot)
 {
 	struct kvm *kvm = s2fd->vcpu->kvm;
-	bool writable = s2vi->map_writable;
-
-	if (!s2vi->device && memslot_is_logging(s2fd->memslot) &&
-	    !kvm_is_write_fault(s2fd->vcpu)) {
-		/*
-		 * Only actually map the page as writable if this was a write
-		 * fault.
-		 */
-		writable = false;
-	}
 
 	if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu) && s2vi->map_non_cacheable)
 		return -ENOEXEC;
@@ -1883,12 +1873,14 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 
 	*prot = KVM_PGTABLE_PROT_R;
 
-	if (s2fd->nested)
-		adjust_nested_fault_perms(s2fd->nested, prot, &writable);
-
-	if (writable)
+	if (s2vi->map_writable && (s2vi->device ||
+				   !memslot_is_logging(s2fd->memslot) ||
+				   kvm_is_write_fault(s2fd->vcpu)))
 		*prot |= KVM_PGTABLE_PROT_W;
 
+	if (s2fd->nested)
+		*prot = adjust_nested_fault_perms(s2fd->nested, *prot);
+
 	if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu))
 		*prot |= KVM_PGTABLE_PROT_X;
 
@@ -1899,7 +1891,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
 		*prot |= KVM_PGTABLE_PROT_X;
 
 	if (s2fd->nested)
-		adjust_nested_exec_perms(kvm, s2fd->nested, prot);
+		*prot = adjust_nested_exec_perms(kvm, s2fd->nested, *prot);
 
 	if (!kvm_s2_fault_is_perm(s2fd) && !s2vi->map_non_cacheable && kvm_has_mte(kvm)) {
 		/* Check the VMM hasn't introduced a new disallowed VMA */
-- 
2.47.3



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 17/17] KVM: arm64: Convert gmem_abort() to struct kvm_s2_fault_desc
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (15 preceding siblings ...)
  2026-03-16 17:54 ` [PATCH 16/17] KVM: arm64: Simplify integration of adjust_nested_*_perms() Marc Zyngier
@ 2026-03-16 17:54 ` Marc Zyngier
  2026-03-17 17:58   ` Fuad Tabba
  2026-03-16 19:45 ` [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Fuad Tabba
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2026-03-16 17:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Fuad Tabba, Will Deacon, Quentin Perret

Having fully converted user_mem_abort() to kvm_s2_fault_desc and
co, convert gmem_abort() to it as well. The change is obviously
much simpler.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/mmu.c | 57 +++++++++++++++++++++-----------------------
 1 file changed, 27 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 18cf7e6ba786d..e14b8b7287192 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1567,33 +1567,39 @@ static enum kvm_pgtable_prot adjust_nested_exec_perms(struct kvm *kvm,
 	return prot;
 }
 
-static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
-		      struct kvm_s2_trans *nested,
-		      struct kvm_memory_slot *memslot, bool is_perm)
+struct kvm_s2_fault_desc {
+	struct kvm_vcpu		*vcpu;
+	phys_addr_t		fault_ipa;
+	struct kvm_s2_trans	*nested;
+	struct kvm_memory_slot	*memslot;
+	unsigned long		hva;
+};
+
+static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
 {
 	bool write_fault, exec_fault;
 	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
-	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
+	struct kvm_pgtable *pgt = s2fd->vcpu->arch.hw_mmu->pgt;
 	unsigned long mmu_seq;
 	struct page *page;
-	struct kvm *kvm = vcpu->kvm;
+	struct kvm *kvm = s2fd->vcpu->kvm;
 	void *memcache;
 	kvm_pfn_t pfn;
 	gfn_t gfn;
 	int ret;
 
-	ret = prepare_mmu_memcache(vcpu, true, &memcache);
+	ret = prepare_mmu_memcache(s2fd->vcpu, true, &memcache);
 	if (ret)
 		return ret;
 
-	if (nested)
-		gfn = kvm_s2_trans_output(nested) >> PAGE_SHIFT;
+	if (s2fd->nested)
+		gfn = kvm_s2_trans_output(s2fd->nested) >> PAGE_SHIFT;
 	else
-		gfn = fault_ipa >> PAGE_SHIFT;
+		gfn = s2fd->fault_ipa >> PAGE_SHIFT;
 
-	write_fault = kvm_is_write_fault(vcpu);
-	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
+	write_fault = kvm_is_write_fault(s2fd->vcpu);
+	exec_fault = kvm_vcpu_trap_is_exec_fault(s2fd->vcpu);
 
 	VM_WARN_ON_ONCE(write_fault && exec_fault);
 
@@ -1601,24 +1607,24 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	/* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */
 	smp_rmb();
 
-	ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
+	ret = kvm_gmem_get_pfn(kvm, s2fd->memslot, gfn, &pfn, &page, NULL);
 	if (ret) {
-		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
+		kvm_prepare_memory_fault_exit(s2fd->vcpu, s2fd->fault_ipa, PAGE_SIZE,
 					      write_fault, exec_fault, false);
 		return ret;
 	}
 
-	if (!(memslot->flags & KVM_MEM_READONLY))
+	if (!(s2fd->memslot->flags & KVM_MEM_READONLY))
 		prot |= KVM_PGTABLE_PROT_W;
 
-	if (nested)
-		prot = adjust_nested_fault_perms(nested, prot);
+	if (s2fd->nested)
+		prot = adjust_nested_fault_perms(s2fd->nested, prot);
 
 	if (exec_fault || cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
 		prot |= KVM_PGTABLE_PROT_X;
 
-	if (nested)
-		prot = adjust_nested_exec_perms(kvm, nested, prot);
+	if (s2fd->nested)
+		prot = adjust_nested_exec_perms(kvm, s2fd->nested, prot);
 
 	kvm_fault_lock(kvm);
 	if (mmu_invalidate_retry(kvm, mmu_seq)) {
@@ -1626,7 +1632,7 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		goto out_unlock;
 	}
 
-	ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, PAGE_SIZE,
+	ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, s2fd->fault_ipa, PAGE_SIZE,
 						 __pfn_to_phys(pfn), prot,
 						 memcache, flags);
 
@@ -1635,19 +1641,11 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	kvm_fault_unlock(kvm);
 
 	if ((prot & KVM_PGTABLE_PROT_W) && !ret)
-		mark_page_dirty_in_slot(kvm, memslot, gfn);
+		mark_page_dirty_in_slot(kvm, s2fd->memslot, gfn);
 
 	return ret != -EAGAIN ? ret : 0;
 }
 
-struct kvm_s2_fault_desc {
-	struct kvm_vcpu		*vcpu;
-	phys_addr_t		fault_ipa;
-	struct kvm_s2_trans	*nested;
-	struct kvm_memory_slot	*memslot;
-	unsigned long		hva;
-};
-
 struct kvm_s2_fault_vma_info {
 	unsigned long	mmu_seq;
 	long		vma_pagesize;
@@ -2296,8 +2294,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	};
 
 	if (kvm_slot_has_gmem(memslot))
-		ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
-				 esr_fsc_is_permission_fault(esr));
+		ret = gmem_abort(&s2fd);
 	else
 		ret = user_mem_abort(&s2fd);
 
-- 
2.47.3



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 00/17] KVM: arm64: More user_mem_abort() rework
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (16 preceding siblings ...)
  2026-03-16 17:54 ` [PATCH 17/17] KVM: arm64: Convert gmem_abort() to struct kvm_s2_fault_desc Marc Zyngier
@ 2026-03-16 19:45 ` Fuad Tabba
  2026-03-16 20:26 ` Fuad Tabba
  2026-03-17 17:03 ` Suzuki K Poulose
  19 siblings, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-16 19:45 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

Hi Marc,

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> Piqued by Fuad's initial set of patches[1] splitting user_mem_abort()
> into more "edible" functions, I've added my on take on top of it with
> a few goals in mind:

I'm glad I piqued your interest with these patches. I had a look at
the final result, and it looks good. I will start reviewing this
series in detail soon!

Thanks,
/fuad

> - contextualise the state by splitting kvm_s2_fault into more granular
>   structures
>
> - reduce the amount of state that is visible and/or mutable by any
>   single function
>
> - reduce the number of variable that simply cache state that is
>   already implicitly available (and often only a helper away)
>
> I find the result reasonably attractive, and throwing it at a couple
> of machines didn't result in anything out of the ordinary.
>
> For those interested, I have stashed a branch at [2], and I'd
> appreciate some feedback on the outcome.
>
> [1] https://lore.kernel.org/all/20260306140232.2193802-1-tabba@google.com/
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=hack/user_mem_abort-rework
>
> Marc Zyngier (17):
>   KVM: arm64: Kill fault->ipa
>   KVM: arm64: Make fault_ipa immutable
>   KVM: arm64: Move fault context to const structure
>   KVM: arm64: Replace fault_is_perm with a helper
>   KVM: arm64: Constrain fault_granule to kvm_s2_fault_map()
>   KVM: arm64: Kill write_fault from kvm_s2_fault
>   KVM: arm64: Kill exec_fault from kvm_s2_fault
>   KVM: arm64: Kill topup_memcache from kvm_s2_fault
>   KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info
>   KVM: arm64: Kill logging_active from kvm_s2_fault
>   KVM: arm64: Restrict the scope of the 'writable' attribute
>   KVM: arm64: Move kvm_s2_fault.{pfn,page} to kvm_s2_vma_info
>   KVM: arm64: Replace force_pte with a max_map_size attribute
>   KVM: arm64: Move device mapping management into kvm_s2_fault_pin_pfn()
>   KVM: arm64: Directly expose mapping prot and kill kvm_s2_fault
>   KVM: arm64: Simplify integration of adjust_nested_*_perms()
>   KVM: arm64: Convert gmem_abort() to struct kvm_s2_fault_desc
>
>  arch/arm64/kvm/mmu.c | 428 ++++++++++++++++++++++---------------------
>  1 file changed, 223 insertions(+), 205 deletions(-)
>
> --
> 2.47.3
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 00/17] KVM: arm64: More user_mem_abort() rework
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (17 preceding siblings ...)
  2026-03-16 19:45 ` [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Fuad Tabba
@ 2026-03-16 20:26 ` Fuad Tabba
  2026-03-16 20:33   ` Fuad Tabba
  2026-03-17 17:03 ` Suzuki K Poulose
  19 siblings, 1 reply; 47+ messages in thread
From: Fuad Tabba @ 2026-03-16 20:26 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

Hi Marc,

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> Piqued by Fuad's initial set of patches[1] splitting user_mem_abort()
into more "edible" functions, I've added my own take on top of it with
> a few goals in mind:
>
> - contextualise the state by splitting kvm_s2_fault into more granular
>   structures
>
> - reduce the amount of state that is visible and/or mutable by any
>   single function
>
> - reduce the number of variables that simply cache state that is
>   already implicitly available (and often only a helper away)
>
> I find the result reasonably attractive, and throwing it at a couple
> of machines didn't result in anything out of the ordinary.
>
> For those interested, I have stashed a branch at [2], and I'd
> appreciate some feedback on the outcome.
>
> [1] https://lore.kernel.org/all/20260306140232.2193802-1-tabba@google.com/
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=hack/user_mem_abort-rework

The series in hack/user_mem_abort-rework is different from this one.
Here are the first few patches:

7243471061be KVM: arm64: Extract VMA size resolution in user_mem_abort()
3a8557ce6025 KVM: arm64: Introduce struct kvm_s2_fault to user_mem_abort()
9f3c0a14bbcb KVM: arm64: Extract PFN resolution in user_mem_abort()
98740dc5cf2b KVM: arm64: Isolate mmap_read_lock inside new
kvm_s2_fault_get_vma_info() helper
ea364906b626 KVM: arm64: Extract stage-2 permission logic in user_mem_abort()

The first patch here doesn't appear until much later.

Cheers,
/fuad

>
> Marc Zyngier (17):
>   KVM: arm64: Kill fault->ipa
>   KVM: arm64: Make fault_ipa immutable
>   KVM: arm64: Move fault context to const structure
>   KVM: arm64: Replace fault_is_perm with a helper
>   KVM: arm64: Constrain fault_granule to kvm_s2_fault_map()
>   KVM: arm64: Kill write_fault from kvm_s2_fault
>   KVM: arm64: Kill exec_fault from kvm_s2_fault
>   KVM: arm64: Kill topup_memcache from kvm_s2_fault
>   KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info
>   KVM: arm64: Kill logging_active from kvm_s2_fault
>   KVM: arm64: Restrict the scope of the 'writable' attribute
>   KVM: arm64: Move kvm_s2_fault.{pfn,page} to kvm_s2_vma_info
>   KVM: arm64: Replace force_pte with a max_map_size attribute
>   KVM: arm64: Move device mapping management into kvm_s2_fault_pin_pfn()
>   KVM: arm64: Directly expose mapping prot and kill kvm_s2_fault
>   KVM: arm64: Simplify integration of adjust_nested_*_perms()
>   KVM: arm64: Convert gmem_abort() to struct kvm_s2_fault_desc
>
>  arch/arm64/kvm/mmu.c | 428 ++++++++++++++++++++++---------------------
>  1 file changed, 223 insertions(+), 205 deletions(-)
>
> --
> 2.47.3
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 00/17] KVM: arm64: More user_mem_abort() rework
  2026-03-16 20:26 ` Fuad Tabba
@ 2026-03-16 20:33   ` Fuad Tabba
  2026-03-17  8:23     ` Marc Zyngier
  0 siblings, 1 reply; 47+ messages in thread
From: Fuad Tabba @ 2026-03-16 20:33 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 20:26, Fuad Tabba <tabba@google.com> wrote:
>
> Hi Marc,
>
> On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
> >
> > Piqued by Fuad's initial set of patches[1] splitting user_mem_abort()
> > into more "edible" functions, I've added my own take on top of it with
> > a few goals in mind:
> >
> > - contextualise the state by splitting kvm_s2_fault into more granular
> >   structures
> >
> > - reduce the amount of state that is visible and/or mutable by any
> >   single function
> >
> > - reduce the number of variables that simply cache state that is
> >   already implicitly available (and often only a helper away)
> >
> > I find the result reasonably attractive, and throwing it at a couple
> > of machines didn't result in anything out of the ordinary.
> >
> > For those interested, I have stashed a branch at [2], and I'd
> > appreciate some feedback on the outcome.
> >
> > [1] https://lore.kernel.org/all/20260306140232.2193802-1-tabba@google.com/
> > [2] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=hack/user_mem_abort-rework
>
> The series in hack/user_mem_abort-rework is different from this one.
> Here are the first few patches:

And I just realized it's because they're based on _my_ patches ... doh! :D

Sorry for the noise.

Cheers,
/fuad

> 7243471061be KVM: arm64: Extract VMA size resolution in user_mem_abort()
> 3a8557ce6025 KVM: arm64: Introduce struct kvm_s2_fault to user_mem_abort()
> 9f3c0a14bbcb KVM: arm64: Extract PFN resolution in user_mem_abort()
> 98740dc5cf2b KVM: arm64: Isolate mmap_read_lock inside new
> kvm_s2_fault_get_vma_info() helper
> ea364906b626 KVM: arm64: Extract stage-2 permission logic in user_mem_abort()
>
> The first patch here doesn't appear until much later.
>
> Cheers,
> /fuad
>
> >
> > Marc Zyngier (17):
> >   KVM: arm64: Kill fault->ipa
> >   KVM: arm64: Make fault_ipa immutable
> >   KVM: arm64: Move fault context to const structure
> >   KVM: arm64: Replace fault_is_perm with a helper
> >   KVM: arm64: Constrain fault_granule to kvm_s2_fault_map()
> >   KVM: arm64: Kill write_fault from kvm_s2_fault
> >   KVM: arm64: Kill exec_fault from kvm_s2_fault
> >   KVM: arm64: Kill topup_memcache from kvm_s2_fault
> >   KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info
> >   KVM: arm64: Kill logging_active from kvm_s2_fault
> >   KVM: arm64: Restrict the scope of the 'writable' attribute
> >   KVM: arm64: Move kvm_s2_fault.{pfn,page} to kvm_s2_vma_info
> >   KVM: arm64: Replace force_pte with a max_map_size attribute
> >   KVM: arm64: Move device mapping management into kvm_s2_fault_pin_pfn()
> >   KVM: arm64: Directly expose mapping prot and kill kvm_s2_fault
> >   KVM: arm64: Simplify integration of adjust_nested_*_perms()
> >   KVM: arm64: Convert gmem_abort() to struct kvm_s2_fault_desc
> >
> >  arch/arm64/kvm/mmu.c | 428 ++++++++++++++++++++++---------------------
> >  1 file changed, 223 insertions(+), 205 deletions(-)
> >
> > --
> > 2.47.3
> >


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 00/17] KVM: arm64: More user_mem_abort() rework
  2026-03-16 20:33   ` Fuad Tabba
@ 2026-03-17  8:23     ` Marc Zyngier
  2026-03-17 17:50       ` Fuad Tabba
  0 siblings, 1 reply; 47+ messages in thread
From: Marc Zyngier @ 2026-03-17  8:23 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 20:33:03 +0000,
Fuad Tabba <tabba@google.com> wrote:
> 
> On Mon, 16 Mar 2026 at 20:26, Fuad Tabba <tabba@google.com> wrote:
> >
> > Hi Marc,
> >
> > On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > Piqued by Fuad's initial set of patches[1] splitting user_mem_abort()
> > > into more "edible" functions, I've added my own take on top of it with
> > > a few goals in mind:
> > >
> > > - contextualise the state by splitting kvm_s2_fault into more granular
> > >   structures
> > >
> > > - reduce the amount of state that is visible and/or mutable by any
> > >   single function
> > >
> > > - reduce the number of variables that simply cache state that is
> > >   already implicitly available (and often only a helper away)
> > >
> > > I find the result reasonably attractive, and throwing it at a couple
> > > of machines didn't result in anything out of the ordinary.
> > >
> > > For those interested, I have stashed a branch at [2], and I'd
> > > appreciate some feedback on the outcome.
> > >
> > > [1] https://lore.kernel.org/all/20260306140232.2193802-1-tabba@google.com/
> > > [2] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=hack/user_mem_abort-rework
> >
> > The series in hack/user_mem_abort-rework is different from this one.
> > Here are the first few patches:
> 
> And I just realized it's because they're based on _my_ patches ... doh! :D

I thought I was clear when I wrote "I've added my own take on top of
it", but maybe not. In any case, I didn't feel the need to redo what
you had already done -- I don't think I'd have come up with something
better.

Either way, I'd appreciate your feedback!

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 01/17] KVM: arm64: Kill fault->ipa
  2026-03-16 17:54 ` [PATCH 01/17] KVM: arm64: Kill fault->ipa Marc Zyngier
@ 2026-03-17  9:22   ` Fuad Tabba
  0 siblings, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17  9:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> fault->ipa, in a nested context, represents the output of the guest's
> S2 translation for the fault->fault_ipa input, and is equal to
> fault->fault_ipa otherwise.
>
> Given that this is readily available from kvm_s2_trans_output(),
> drop fault->ipa and directly compute fault->gfn instead, which
> is really what we want.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad


> ---
>  arch/arm64/kvm/mmu.c | 14 +++++---------
>  1 file changed, 5 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 5542a50dc8a65..fe8f8057cf412 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1643,7 +1643,7 @@ static short kvm_s2_resolve_vma_size(struct vm_area_struct *vma,
>                                      unsigned long hva,
>                                      struct kvm_memory_slot *memslot,
>                                      struct kvm_s2_trans *nested,
> -                                    bool *force_pte, phys_addr_t *ipa)
> +                                    bool *force_pte)
>  {
>         short vma_shift;
>
> @@ -1681,8 +1681,6 @@ static short kvm_s2_resolve_vma_size(struct vm_area_struct *vma,
>
>                 max_map_size = *force_pte ? PAGE_SIZE : PUD_SIZE;
>
> -               *ipa = kvm_s2_trans_output(nested);
> -
>                 /*
>                  * If we're about to create a shadow stage 2 entry, then we
>                  * can only create a block mapping if the guest stage 2 page
> @@ -1722,7 +1720,6 @@ struct kvm_s2_fault {
>         bool is_vma_cacheable;
>         bool s2_force_noncacheable;
>         unsigned long mmu_seq;
> -       phys_addr_t ipa;
>         gfn_t gfn;
>         kvm_pfn_t pfn;
>         bool logging_active;
> @@ -1738,6 +1735,7 @@ static int kvm_s2_fault_get_vma_info(struct kvm_s2_fault *fault)
>  {
>         struct vm_area_struct *vma;
>         struct kvm *kvm = fault->vcpu->kvm;
> +       phys_addr_t ipa;
>
>         mmap_read_lock(current->mm);
>         vma = vma_lookup(current->mm, fault->hva);
> @@ -1748,8 +1746,7 @@ static int kvm_s2_fault_get_vma_info(struct kvm_s2_fault *fault)
>         }
>
>         fault->vma_pagesize = 1UL << kvm_s2_resolve_vma_size(vma, fault->hva, fault->memslot,
> -                                                            fault->nested, &fault->force_pte,
> -                                                            &fault->ipa);
> +                                                            fault->nested, &fault->force_pte);
>
>         /*
>          * Both the canonical IPA and fault IPA must be aligned to the
> @@ -1757,9 +1754,9 @@ static int kvm_s2_fault_get_vma_info(struct kvm_s2_fault *fault)
>          * mapping in the right place.
>          */
>         fault->fault_ipa = ALIGN_DOWN(fault->fault_ipa, fault->vma_pagesize);
> -       fault->ipa = ALIGN_DOWN(fault->ipa, fault->vma_pagesize);
> +       ipa = fault->nested ? kvm_s2_trans_output(fault->nested) : fault->fault_ipa;
> +       fault->gfn = ALIGN_DOWN(ipa, fault->vma_pagesize) >> PAGE_SHIFT;
>
> -       fault->gfn = fault->ipa >> PAGE_SHIFT;
>         fault->mte_allowed = kvm_vma_mte_allowed(vma);
>
>         fault->vm_flags = vma->vm_flags;
> @@ -1970,7 +1967,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>                 .memslot = memslot,
>                 .hva = hva,
>                 .fault_is_perm = fault_is_perm,
> -               .ipa = fault_ipa,
>                 .logging_active = logging_active,
>                 .force_pte = logging_active,
>                 .prot = KVM_PGTABLE_PROT_R,
> --
> 2.47.3
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/17] KVM: arm64: Make fault_ipa immutable
  2026-03-16 17:54 ` [PATCH 02/17] KVM: arm64: Make fault_ipa immutable Marc Zyngier
@ 2026-03-17  9:38   ` Fuad Tabba
  0 siblings, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17  9:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> Updating fault_ipa is conceptually annoying, as it changes something
> that is a property of the fault itself.
>
> Stop doing so and instead use fault->gfn as the sole piece of state
> that can be used to represent the faulting IPA.
>
> At the same time, introduce get_canonical_gfn() for the couple of cases
> where we are concerned with the memslot-related IPA and not the faulting
> one.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  arch/arm64/kvm/mmu.c | 38 ++++++++++++++++++++++++++------------
>  1 file changed, 26 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index fe8f8057cf412..ab8a269d4366d 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1400,10 +1400,10 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
>   */
>  static long
>  transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
> -                           unsigned long hva, kvm_pfn_t *pfnp,
> -                           phys_addr_t *ipap)
> +                           unsigned long hva, kvm_pfn_t *pfnp, gfn_t *gfnp)
>  {
>         kvm_pfn_t pfn = *pfnp;
> +       gfn_t gfn = *gfnp;
>
>         /*
>          * Make sure the adjustment is done only for THP pages. Also make
> @@ -1419,7 +1419,8 @@ transparent_hugepage_adjust(struct kvm *kvm, struct kvm_memory_slot *memslot,
>                 if (sz < PMD_SIZE)
>                         return PAGE_SIZE;
>
> -               *ipap &= PMD_MASK;
> +               gfn &= ~(PTRS_PER_PMD - 1);
> +               *gfnp = gfn;
>                 pfn &= ~(PTRS_PER_PMD - 1);
>                 *pfnp = pfn;
>
> @@ -1735,7 +1736,6 @@ static int kvm_s2_fault_get_vma_info(struct kvm_s2_fault *fault)
>  {
>         struct vm_area_struct *vma;
>         struct kvm *kvm = fault->vcpu->kvm;
> -       phys_addr_t ipa;
>
>         mmap_read_lock(current->mm);
>         vma = vma_lookup(current->mm, fault->hva);
> @@ -1753,9 +1753,7 @@ static int kvm_s2_fault_get_vma_info(struct kvm_s2_fault *fault)
>          * mapping size to ensure we find the right PFN and lay down the
>          * mapping in the right place.
>          */
> -       fault->fault_ipa = ALIGN_DOWN(fault->fault_ipa, fault->vma_pagesize);
> -       ipa = fault->nested ? kvm_s2_trans_output(fault->nested) : fault->fault_ipa;
> -       fault->gfn = ALIGN_DOWN(ipa, fault->vma_pagesize) >> PAGE_SHIFT;
> +       fault->gfn = ALIGN_DOWN(fault->fault_ipa, fault->vma_pagesize) >> PAGE_SHIFT;
>
>         fault->mte_allowed = kvm_vma_mte_allowed(vma);
>
> @@ -1777,6 +1775,17 @@ static int kvm_s2_fault_get_vma_info(struct kvm_s2_fault *fault)
>         return 0;
>  }
>
> +static gfn_t get_canonical_gfn(struct kvm_s2_fault *fault)
> +{
> +       phys_addr_t ipa;
> +
> +       if (!fault->nested)
> +               return fault->gfn;
> +
> +       ipa = kvm_s2_trans_output(fault->nested);
> +       return ALIGN_DOWN(ipa, fault->vma_pagesize) >> PAGE_SHIFT;
> +}
> +
>  static int kvm_s2_fault_pin_pfn(struct kvm_s2_fault *fault)
>  {
>         int ret;
> @@ -1785,7 +1794,7 @@ static int kvm_s2_fault_pin_pfn(struct kvm_s2_fault *fault)
>         if (ret)
>                 return ret;
>
> -       fault->pfn = __kvm_faultin_pfn(fault->memslot, fault->gfn,
> +       fault->pfn = __kvm_faultin_pfn(fault->memslot, get_canonical_gfn(fault),
>                                        fault->write_fault ? FOLL_WRITE : 0,
>                                        &fault->writable, &fault->page);
>         if (unlikely(is_error_noslot_pfn(fault->pfn))) {
> @@ -1885,6 +1894,11 @@ static int kvm_s2_fault_compute_prot(struct kvm_s2_fault *fault)
>         return 0;
>  }
>
> +static phys_addr_t get_ipa(const struct kvm_s2_fault *fault)
> +{
> +       return gfn_to_gpa(fault->gfn);
> +}
> +
>  static int kvm_s2_fault_map(struct kvm_s2_fault *fault, void *memcache)
>  {
>         struct kvm *kvm = fault->vcpu->kvm;
> @@ -1909,7 +1923,7 @@ static int kvm_s2_fault_map(struct kvm_s2_fault *fault, void *memcache)
>                 } else {
>                         fault->vma_pagesize = transparent_hugepage_adjust(kvm, fault->memslot,
>                                                                           fault->hva, &fault->pfn,
> -                                                                         &fault->fault_ipa);
> +                                                                         &fault->gfn);
>
>                         if (fault->vma_pagesize < 0) {
>                                 ret = fault->vma_pagesize;
> @@ -1932,10 +1946,10 @@ static int kvm_s2_fault_map(struct kvm_s2_fault *fault, void *memcache)
>                  * PTE, which will be preserved.
>                  */
>                 fault->prot &= ~KVM_NV_GUEST_MAP_SZ;
> -               ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault->fault_ipa,
> +               ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, get_ipa(fault),
>                                                                  fault->prot, flags);
>         } else {
> -               ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault->fault_ipa, fault->vma_pagesize,
> +               ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, get_ipa(fault), fault->vma_pagesize,
>                                                          __pfn_to_phys(fault->pfn), fault->prot,
>                                                          memcache, flags);
>         }
> @@ -1946,7 +1960,7 @@ static int kvm_s2_fault_map(struct kvm_s2_fault *fault, void *memcache)
>
>         /* Mark the fault->page dirty only if the fault is handled successfully */
>         if (fault->writable && !ret)
> -               mark_page_dirty_in_slot(kvm, fault->memslot, fault->gfn);
> +               mark_page_dirty_in_slot(kvm, fault->memslot, get_canonical_gfn(fault));
>
>         if (ret != -EAGAIN)
>                 return ret;
> --
> 2.47.3
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 03/17] KVM: arm64: Move fault context to const structure
  2026-03-16 17:54 ` [PATCH 03/17] KVM: arm64: Move fault context to const structure Marc Zyngier
@ 2026-03-17 10:26   ` Fuad Tabba
  0 siblings, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 10:26 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> In order to make it clearer what gets updated or not during fault
> handling, move the set of information that loosely represents the
> fault context into its own structure.
>
> This gets populated early, from handle_mem_abort(), and gets passed
> along as a const pointer. user_mem_abort()'s signature is majorly
> improved in doing so, and kvm_s2_fault loses a bunch of fields.
>
> gmem_abort() will get a similar treatment down the line.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  arch/arm64/kvm/mmu.c | 133 ++++++++++++++++++++++---------------------
>  1 file changed, 69 insertions(+), 64 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index ab8a269d4366d..2a7128b8dd14f 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1640,23 +1640,28 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>         return ret != -EAGAIN ? ret : 0;
>  }
>
> -static short kvm_s2_resolve_vma_size(struct vm_area_struct *vma,
> -                                    unsigned long hva,
> -                                    struct kvm_memory_slot *memslot,
> -                                    struct kvm_s2_trans *nested,
> -                                    bool *force_pte)
> +struct kvm_s2_fault_desc {
> +       struct kvm_vcpu         *vcpu;
> +       phys_addr_t             fault_ipa;
> +       struct kvm_s2_trans     *nested;
> +       struct kvm_memory_slot  *memslot;
> +       unsigned long           hva;
> +};
> +
> +static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
> +                                    struct vm_area_struct *vma, bool *force_pte)
>  {
>         short vma_shift;
>
>         if (*force_pte)
>                 vma_shift = PAGE_SHIFT;
>         else
> -               vma_shift = get_vma_page_shift(vma, hva);
> +               vma_shift = get_vma_page_shift(vma, s2fd->hva);
>
>         switch (vma_shift) {
>  #ifndef __PAGETABLE_PMD_FOLDED
>         case PUD_SHIFT:
> -               if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
> +               if (fault_supports_stage2_huge_mapping(s2fd->memslot, s2fd->hva, PUD_SIZE))
>                         break;
>                 fallthrough;
>  #endif
> @@ -1664,7 +1669,7 @@ static short kvm_s2_resolve_vma_size(struct vm_area_struct *vma,
>                 vma_shift = PMD_SHIFT;
>                 fallthrough;
>         case PMD_SHIFT:
> -               if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE))
> +               if (fault_supports_stage2_huge_mapping(s2fd->memslot, s2fd->hva, PMD_SIZE))
>                         break;
>                 fallthrough;
>         case CONT_PTE_SHIFT:
> @@ -1677,7 +1682,7 @@ static short kvm_s2_resolve_vma_size(struct vm_area_struct *vma,
>                 WARN_ONCE(1, "Unknown vma_shift %d", vma_shift);
>         }
>
> -       if (nested) {
> +       if (s2fd->nested) {
>                 unsigned long max_map_size;
>
>                 max_map_size = *force_pte ? PAGE_SIZE : PUD_SIZE;
> @@ -1687,7 +1692,7 @@ static short kvm_s2_resolve_vma_size(struct vm_area_struct *vma,
>                  * can only create a block mapping if the guest stage 2 page
>                  * table uses at least as big a mapping.
>                  */
> -               max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
> +               max_map_size = min(kvm_s2_trans_size(s2fd->nested), max_map_size);
>
>                 /*
>                  * Be careful that if the mapping size falls between
> @@ -1706,11 +1711,6 @@ static short kvm_s2_resolve_vma_size(struct vm_area_struct *vma,
>  }
>
>  struct kvm_s2_fault {
> -       struct kvm_vcpu *vcpu;
> -       phys_addr_t fault_ipa;
> -       struct kvm_s2_trans *nested;
> -       struct kvm_memory_slot *memslot;
> -       unsigned long hva;
>         bool fault_is_perm;
>
>         bool write_fault;
> @@ -1732,28 +1732,28 @@ struct kvm_s2_fault {
>         vm_flags_t vm_flags;
>  };
>
> -static int kvm_s2_fault_get_vma_info(struct kvm_s2_fault *fault)
> +static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
> +                                    struct kvm_s2_fault *fault)
>  {
>         struct vm_area_struct *vma;
> -       struct kvm *kvm = fault->vcpu->kvm;
> +       struct kvm *kvm = s2fd->vcpu->kvm;
>
>         mmap_read_lock(current->mm);
> -       vma = vma_lookup(current->mm, fault->hva);
> +       vma = vma_lookup(current->mm, s2fd->hva);
>         if (unlikely(!vma)) {
> -               kvm_err("Failed to find VMA for fault->hva 0x%lx\n", fault->hva);
> +               kvm_err("Failed to find VMA for hva 0x%lx\n", s2fd->hva);
>                 mmap_read_unlock(current->mm);
>                 return -EFAULT;
>         }
>
> -       fault->vma_pagesize = 1UL << kvm_s2_resolve_vma_size(vma, fault->hva, fault->memslot,
> -                                                            fault->nested, &fault->force_pte);
> +       fault->vma_pagesize = BIT(kvm_s2_resolve_vma_size(s2fd, vma, &fault->force_pte));
>
>         /*
>          * Both the canonical IPA and fault IPA must be aligned to the
>          * mapping size to ensure we find the right PFN and lay down the
>          * mapping in the right place.
>          */
> -       fault->gfn = ALIGN_DOWN(fault->fault_ipa, fault->vma_pagesize) >> PAGE_SHIFT;
> +       fault->gfn = ALIGN_DOWN(s2fd->fault_ipa, fault->vma_pagesize) >> PAGE_SHIFT;
>
>         fault->mte_allowed = kvm_vma_mte_allowed(vma);
>
> @@ -1775,31 +1775,33 @@ static int kvm_s2_fault_get_vma_info(struct kvm_s2_fault *fault)
>         return 0;
>  }
>
> -static gfn_t get_canonical_gfn(struct kvm_s2_fault *fault)
> +static gfn_t get_canonical_gfn(const struct kvm_s2_fault_desc *s2fd,
> +                              const struct kvm_s2_fault *fault)
>  {
>         phys_addr_t ipa;
>
> -       if (!fault->nested)
> +       if (!s2fd->nested)
>                 return fault->gfn;
>
> -       ipa = kvm_s2_trans_output(fault->nested);
> +       ipa = kvm_s2_trans_output(s2fd->nested);
>         return ALIGN_DOWN(ipa, fault->vma_pagesize) >> PAGE_SHIFT;
>  }
>
> -static int kvm_s2_fault_pin_pfn(struct kvm_s2_fault *fault)
> +static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
> +                               struct kvm_s2_fault *fault)
>  {
>         int ret;
>
> -       ret = kvm_s2_fault_get_vma_info(fault);
> +       ret = kvm_s2_fault_get_vma_info(s2fd, fault);
>         if (ret)
>                 return ret;
>
> -       fault->pfn = __kvm_faultin_pfn(fault->memslot, get_canonical_gfn(fault),
> +       fault->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, fault),
>                                        fault->write_fault ? FOLL_WRITE : 0,
>                                        &fault->writable, &fault->page);
>         if (unlikely(is_error_noslot_pfn(fault->pfn))) {
>                 if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
> -                       kvm_send_hwpoison_signal(fault->hva, __ffs(fault->vma_pagesize));
> +                       kvm_send_hwpoison_signal(s2fd->hva, __ffs(fault->vma_pagesize));
>                         return 0;
>                 }
>                 return -EFAULT;
> @@ -1808,9 +1810,10 @@ static int kvm_s2_fault_pin_pfn(struct kvm_s2_fault *fault)
>         return 1;
>  }
>
> -static int kvm_s2_fault_compute_prot(struct kvm_s2_fault *fault)
> +static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
> +                                    struct kvm_s2_fault *fault)
>  {
> -       struct kvm *kvm = fault->vcpu->kvm;
> +       struct kvm *kvm = s2fd->vcpu->kvm;
>
>         /*
>          * Check if this is non-struct page memory PFN, and cannot support
> @@ -1862,13 +1865,13 @@ static int kvm_s2_fault_compute_prot(struct kvm_s2_fault *fault)
>          * and trigger the exception here. Since the memslot is valid, inject
>          * the fault back to the guest.
>          */
> -       if (esr_fsc_is_excl_atomic_fault(kvm_vcpu_get_esr(fault->vcpu))) {
> -               kvm_inject_dabt_excl_atomic(fault->vcpu, kvm_vcpu_get_hfar(fault->vcpu));
> +       if (esr_fsc_is_excl_atomic_fault(kvm_vcpu_get_esr(s2fd->vcpu))) {
> +               kvm_inject_dabt_excl_atomic(s2fd->vcpu, kvm_vcpu_get_hfar(s2fd->vcpu));
>                 return 1;
>         }
>
> -       if (fault->nested)
> -               adjust_nested_fault_perms(fault->nested, &fault->prot, &fault->writable);
> +       if (s2fd->nested)
> +               adjust_nested_fault_perms(s2fd->nested, &fault->prot, &fault->writable);
>
>         if (fault->writable)
>                 fault->prot |= KVM_PGTABLE_PROT_W;
> @@ -1882,8 +1885,8 @@ static int kvm_s2_fault_compute_prot(struct kvm_s2_fault *fault)
>         else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
>                 fault->prot |= KVM_PGTABLE_PROT_X;
>
> -       if (fault->nested)
> -               adjust_nested_exec_perms(kvm, fault->nested, &fault->prot);
> +       if (s2fd->nested)
> +               adjust_nested_exec_perms(kvm, s2fd->nested, &fault->prot);
>
>         if (!fault->fault_is_perm && !fault->s2_force_noncacheable && kvm_has_mte(kvm)) {
>                 /* Check the VMM hasn't introduced a new disallowed VMA */
> @@ -1899,15 +1902,16 @@ static phys_addr_t get_ipa(const struct kvm_s2_fault *fault)
>         return gfn_to_gpa(fault->gfn);
>  }
>
> -static int kvm_s2_fault_map(struct kvm_s2_fault *fault, void *memcache)
> +static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
> +                           struct kvm_s2_fault *fault, void *memcache)
>  {
> -       struct kvm *kvm = fault->vcpu->kvm;
> +       struct kvm *kvm = s2fd->vcpu->kvm;
>         struct kvm_pgtable *pgt;
>         int ret;
>         enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
>
>         kvm_fault_lock(kvm);
> -       pgt = fault->vcpu->arch.hw_mmu->pgt;
> +       pgt = s2fd->vcpu->arch.hw_mmu->pgt;
>         ret = -EAGAIN;
>         if (mmu_invalidate_retry(kvm, fault->mmu_seq))
>                 goto out_unlock;
> @@ -1921,8 +1925,8 @@ static int kvm_s2_fault_map(struct kvm_s2_fault *fault, void *memcache)
>                 if (fault->fault_is_perm && fault->fault_granule > PAGE_SIZE) {
>                         fault->vma_pagesize = fault->fault_granule;
>                 } else {
> -                       fault->vma_pagesize = transparent_hugepage_adjust(kvm, fault->memslot,
> -                                                                         fault->hva, &fault->pfn,
> +                       fault->vma_pagesize = transparent_hugepage_adjust(kvm, s2fd->memslot,
> +                                                                         s2fd->hva, &fault->pfn,
>                                                                           &fault->gfn);
>
>                         if (fault->vma_pagesize < 0) {
> @@ -1960,34 +1964,27 @@ static int kvm_s2_fault_map(struct kvm_s2_fault *fault, void *memcache)
>
>         /* Mark the fault->page dirty only if the fault is handled successfully */
>         if (fault->writable && !ret)
> -               mark_page_dirty_in_slot(kvm, fault->memslot, get_canonical_gfn(fault));
> +               mark_page_dirty_in_slot(kvm, s2fd->memslot, get_canonical_gfn(s2fd, fault));
>
>         if (ret != -EAGAIN)
>                 return ret;
>         return 0;
>  }
>
> -static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> -                         struct kvm_s2_trans *nested,
> -                         struct kvm_memory_slot *memslot, unsigned long hva,
> -                         bool fault_is_perm)
> +static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>  {
> -       bool write_fault = kvm_is_write_fault(vcpu);
> -       bool logging_active = memslot_is_logging(memslot);
> +       bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
> +       bool write_fault = kvm_is_write_fault(s2fd->vcpu);
> +       bool logging_active = memslot_is_logging(s2fd->memslot);
>         struct kvm_s2_fault fault = {
> -               .vcpu = vcpu,
> -               .fault_ipa = fault_ipa,
> -               .nested = nested,
> -               .memslot = memslot,
> -               .hva = hva,
> -               .fault_is_perm = fault_is_perm,
> +               .fault_is_perm = perm_fault,
>                 .logging_active = logging_active,
>                 .force_pte = logging_active,
>                 .prot = KVM_PGTABLE_PROT_R,
> -               .fault_granule = fault_is_perm ? kvm_vcpu_trap_get_perm_fault_granule(vcpu) : 0,
> +               .fault_granule = perm_fault ? kvm_vcpu_trap_get_perm_fault_granule(s2fd->vcpu) : 0,
>                 .write_fault = write_fault,
> -               .exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu),
> -               .topup_memcache = !fault_is_perm || (logging_active && write_fault),
> +               .exec_fault = kvm_vcpu_trap_is_exec_fault(s2fd->vcpu),
> +               .topup_memcache = !perm_fault || (logging_active && write_fault),
>         };
>         void *memcache;
>         int ret;
> @@ -2000,7 +1997,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>          * only exception to this is when dirty logging is enabled at runtime
>          * and a write fault needs to collapse a block entry into a table.
>          */
> -       ret = prepare_mmu_memcache(vcpu, fault.topup_memcache, &memcache);
> +       ret = prepare_mmu_memcache(s2fd->vcpu, fault.topup_memcache, &memcache);
>         if (ret)
>                 return ret;
>
> @@ -2008,17 +2005,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>          * Let's check if we will get back a huge fault->page backed by hugetlbfs, or
>          * get block mapping for device MMIO region.
>          */
> -       ret = kvm_s2_fault_pin_pfn(&fault);
> +       ret = kvm_s2_fault_pin_pfn(s2fd, &fault);
>         if (ret != 1)
>                 return ret;
>
> -       ret = kvm_s2_fault_compute_prot(&fault);
> +       ret = kvm_s2_fault_compute_prot(s2fd, &fault);
>         if (ret) {
>                 kvm_release_page_unused(fault.page);
>                 return ret;
>         }
>
> -       return kvm_s2_fault_map(&fault, memcache);
> +       return kvm_s2_fault_map(s2fd, &fault, memcache);
>  }
>
>  /* Resolve the access fault by making the page young again. */
> @@ -2284,12 +2281,20 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>         VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
>                         !write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
>
> +       const struct kvm_s2_fault_desc s2fd = {
> +               .vcpu           = vcpu,
> +               .fault_ipa      = fault_ipa,
> +               .nested         = nested,
> +               .memslot        = memslot,
> +               .hva            = hva,
> +       };
> +
>         if (kvm_slot_has_gmem(memslot))
>                 ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
>                                  esr_fsc_is_permission_fault(esr));
>         else
> -               ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> -                                    esr_fsc_is_permission_fault(esr));
> +               ret = user_mem_abort(&s2fd);
> +
>         if (ret == 0)
>                 ret = 1;
>  out:
> --
> 2.47.3
>
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/17] KVM: arm64: Replace fault_is_perm with a helper
  2026-03-16 17:54 ` [PATCH 04/17] KVM: arm64: Replace fault_is_perm with a helper Marc Zyngier
@ 2026-03-17 10:49   ` Fuad Tabba
  2026-03-18 13:43   ` Joey Gouly
  1 sibling, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 10:49 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> Carrying a boolean to indicate that a given fault is a permission
> fault is slightly odd, as this is a property of the fault itself,
> and we'd better avoid duplicating state.
>
> For this purpose, introduce a kvm_s2_fault_is_perm() predicate that
> can take a fault descriptor as a parameter. fault_is_perm is therefore
> dropped from kvm_s2_fault.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  arch/arm64/kvm/mmu.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 2a7128b8dd14f..1b32f2e6c3e61 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1711,8 +1711,6 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
>  }
>
>  struct kvm_s2_fault {
> -       bool fault_is_perm;
> -
>         bool write_fault;
>         bool exec_fault;
>         bool writable;
> @@ -1732,6 +1730,11 @@ struct kvm_s2_fault {
>         vm_flags_t vm_flags;
>  };
>
> +static bool kvm_s2_fault_is_perm(const struct kvm_s2_fault_desc *s2fd)
> +{
> +       return kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
> +}
> +
>  static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
>                                      struct kvm_s2_fault *fault)
>  {
> @@ -1888,7 +1891,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>         if (s2fd->nested)
>                 adjust_nested_exec_perms(kvm, s2fd->nested, &fault->prot);
>
> -       if (!fault->fault_is_perm && !fault->s2_force_noncacheable && kvm_has_mte(kvm)) {
> +       if (!kvm_s2_fault_is_perm(s2fd) && !fault->s2_force_noncacheable && kvm_has_mte(kvm)) {
>                 /* Check the VMM hasn't introduced a new disallowed VMA */
>                 if (!fault->mte_allowed)
>                         return -EFAULT;
> @@ -1905,6 +1908,7 @@ static phys_addr_t get_ipa(const struct kvm_s2_fault *fault)
>  static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>                             struct kvm_s2_fault *fault, void *memcache)
>  {
> +       bool fault_is_perm = kvm_s2_fault_is_perm(s2fd);
>         struct kvm *kvm = s2fd->vcpu->kvm;
>         struct kvm_pgtable *pgt;
>         int ret;
> @@ -1922,7 +1926,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>          */
>         if (fault->vma_pagesize == PAGE_SIZE &&
>             !(fault->force_pte || fault->s2_force_noncacheable)) {
> -               if (fault->fault_is_perm && fault->fault_granule > PAGE_SIZE) {
> +               if (fault_is_perm && fault->fault_granule > PAGE_SIZE) {
>                         fault->vma_pagesize = fault->fault_granule;
>                 } else {
>                         fault->vma_pagesize = transparent_hugepage_adjust(kvm, s2fd->memslot,
> @@ -1936,7 +1940,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>                 }
>         }
>
> -       if (!fault->fault_is_perm && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
> +       if (!fault_is_perm && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
>                 sanitise_mte_tags(kvm, fault->pfn, fault->vma_pagesize);
>
>         /*
> @@ -1944,7 +1948,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>          * permissions only if fault->vma_pagesize equals fault->fault_granule. Otherwise,
>          * kvm_pgtable_stage2_map() should be called to change block size.
>          */
> -       if (fault->fault_is_perm && fault->vma_pagesize == fault->fault_granule) {
> +       if (fault_is_perm && fault->vma_pagesize == fault->fault_granule) {
>                 /*
>                  * Drop the SW bits in favour of those stored in the
>                  * PTE, which will be preserved.
> @@ -1977,7 +1981,6 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>         bool write_fault = kvm_is_write_fault(s2fd->vcpu);
>         bool logging_active = memslot_is_logging(s2fd->memslot);
>         struct kvm_s2_fault fault = {
> -               .fault_is_perm = perm_fault,
>                 .logging_active = logging_active,
>                 .force_pte = logging_active,
>                 .prot = KVM_PGTABLE_PROT_R,
> --
> 2.47.3
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/17] KVM: arm64: Constrain fault_granule to kvm_s2_fault_map()
  2026-03-16 17:54 ` [PATCH 05/17] KVM: arm64: Constrain fault_granule to kvm_s2_fault_map() Marc Zyngier
@ 2026-03-17 11:04   ` Fuad Tabba
  0 siblings, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 11:04 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> The notion of fault_granule is specific to kvm_s2_fault_map(), and
> is unused anywhere else.
>
> Move this variable locally, removing it from kvm_s2_fault.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad
> ---
>  arch/arm64/kvm/mmu.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 1b32f2e6c3e61..12c2f0aeaae4c 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1724,7 +1724,6 @@ struct kvm_s2_fault {
>         bool logging_active;
>         bool force_pte;
>         long vma_pagesize;
> -       long fault_granule;
>         enum kvm_pgtable_prot prot;
>         struct page *page;
>         vm_flags_t vm_flags;
> @@ -1908,9 +1907,9 @@ static phys_addr_t get_ipa(const struct kvm_s2_fault *fault)
>  static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>                             struct kvm_s2_fault *fault, void *memcache)
>  {
> -       bool fault_is_perm = kvm_s2_fault_is_perm(s2fd);
>         struct kvm *kvm = s2fd->vcpu->kvm;
>         struct kvm_pgtable *pgt;
> +       long perm_fault_granule;
>         int ret;
>         enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
>
> @@ -1920,14 +1919,17 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>         if (mmu_invalidate_retry(kvm, fault->mmu_seq))
>                 goto out_unlock;
>
> +       perm_fault_granule = (kvm_s2_fault_is_perm(s2fd) ?
> +                             kvm_vcpu_trap_get_perm_fault_granule(s2fd->vcpu) : 0);
> +
>         /*
>          * If we are not forced to use fault->page mapping, check if we are
>          * backed by a THP and thus use block mapping if possible.
>          */
>         if (fault->vma_pagesize == PAGE_SIZE &&
>             !(fault->force_pte || fault->s2_force_noncacheable)) {
> -               if (fault_is_perm && fault->fault_granule > PAGE_SIZE) {
> -                       fault->vma_pagesize = fault->fault_granule;
> +               if (perm_fault_granule > PAGE_SIZE) {
> +                       fault->vma_pagesize = perm_fault_granule;
>                 } else {
>                         fault->vma_pagesize = transparent_hugepage_adjust(kvm, s2fd->memslot,
>                                                                           s2fd->hva, &fault->pfn,
> @@ -1940,15 +1942,15 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>                 }
>         }
>
> -       if (!fault_is_perm && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
> +       if (!perm_fault_granule && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
>                 sanitise_mte_tags(kvm, fault->pfn, fault->vma_pagesize);
>
>         /*
>          * Under the premise of getting a FSC_PERM fault, we just need to relax
> -        * permissions only if fault->vma_pagesize equals fault->fault_granule. Otherwise,
> +        * permissions only if vma_pagesize equals perm_fault_granule. Otherwise,
>          * kvm_pgtable_stage2_map() should be called to change block size.
>          */
> -       if (fault_is_perm && fault->vma_pagesize == fault->fault_granule) {
> +       if (fault->vma_pagesize == perm_fault_granule) {
>                 /*
>                  * Drop the SW bits in favour of those stored in the
>                  * PTE, which will be preserved.
> @@ -1984,7 +1986,6 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>                 .logging_active = logging_active,
>                 .force_pte = logging_active,
>                 .prot = KVM_PGTABLE_PROT_R,
> -               .fault_granule = perm_fault ? kvm_vcpu_trap_get_perm_fault_granule(s2fd->vcpu) : 0,
>                 .write_fault = write_fault,
>                 .exec_fault = kvm_vcpu_trap_is_exec_fault(s2fd->vcpu),
>                 .topup_memcache = !perm_fault || (logging_active && write_fault),
> --
> 2.47.3
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 06/17] KVM: arm64: Kill write_fault from kvm_s2_fault
  2026-03-16 17:54 ` [PATCH 06/17] KVM: arm64: Kill write_fault from kvm_s2_fault Marc Zyngier
@ 2026-03-17 11:20   ` Fuad Tabba
  0 siblings, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 11:20 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> We already have kvm_is_write_fault() as a predicate indicating
> an S2 fault on a write, and we're better off just using that instead
> of duplicating the state.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/mmu.c | 11 +++--------
>  1 file changed, 3 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 12c2f0aeaae4c..86950acbd7e6b 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1711,7 +1711,6 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
>  }
>
>  struct kvm_s2_fault {
> -       bool write_fault;
>         bool exec_fault;
>         bool writable;
>         bool topup_memcache;
> @@ -1799,7 +1798,7 @@ static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
>                 return ret;
>
>         fault->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, fault),
> -                                      fault->write_fault ? FOLL_WRITE : 0,
> +                                      kvm_is_write_fault(s2fd->vcpu) ? FOLL_WRITE : 0,
>                                        &fault->writable, &fault->page);
>         if (unlikely(is_error_noslot_pfn(fault->pfn))) {
>                 if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
> @@ -1850,7 +1849,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>                          */
>                         fault->s2_force_noncacheable = true;
>                 }
> -       } else if (fault->logging_active && !fault->write_fault) {
> +       } else if (fault->logging_active && !kvm_is_write_fault(s2fd->vcpu)) {
>                 /*
>                  * Only actually map the page as writable if this was a write
>                  * fault.
> @@ -1980,21 +1979,17 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>  static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>  {
>         bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
> -       bool write_fault = kvm_is_write_fault(s2fd->vcpu);
>         bool logging_active = memslot_is_logging(s2fd->memslot);
>         struct kvm_s2_fault fault = {
>                 .logging_active = logging_active,
>                 .force_pte = logging_active,
>                 .prot = KVM_PGTABLE_PROT_R,
> -               .write_fault = write_fault,
>                 .exec_fault = kvm_vcpu_trap_is_exec_fault(s2fd->vcpu),
> -               .topup_memcache = !perm_fault || (logging_active && write_fault),
> +               .topup_memcache = !perm_fault || (logging_active && kvm_is_write_fault(s2fd->vcpu)),
>         };
>         void *memcache;
>         int ret;
>
> -       VM_WARN_ON_ONCE(fault.write_fault && fault.exec_fault);
> -

We can't hit this warning anyway.

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

>         /*
>          * Permission faults just need to update the existing leaf entry,
>          * and so normally don't require allocations from the memcache. The
> --
> 2.47.3
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 07/17] KVM: arm64: Kill exec_fault from kvm_s2_fault
  2026-03-16 17:54 ` [PATCH 07/17] KVM: arm64: Kill exec_fault " Marc Zyngier
@ 2026-03-17 11:44   ` Fuad Tabba
  0 siblings, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 11:44 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> Similarly to write_fault, exec_fault can be advantageously replaced
> by the kvm_vcpu_trap_is_exec_fault() predicate where needed.
>
> Another one bites the dust...
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad


> ---
>  arch/arm64/kvm/mmu.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 86950acbd7e6b..11820e39ad8e1 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1711,7 +1711,6 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
>  }
>
>  struct kvm_s2_fault {
> -       bool exec_fault;
>         bool writable;
>         bool topup_memcache;
>         bool mte_allowed;
> @@ -1857,7 +1856,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>                 fault->writable = false;
>         }
>
> -       if (fault->exec_fault && fault->s2_force_noncacheable)
> +       if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu) && fault->s2_force_noncacheable)
>                 return -ENOEXEC;
>
>         /*
> @@ -1877,7 +1876,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>         if (fault->writable)
>                 fault->prot |= KVM_PGTABLE_PROT_W;
>
> -       if (fault->exec_fault)
> +       if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu))
>                 fault->prot |= KVM_PGTABLE_PROT_X;
>
>         if (fault->s2_force_noncacheable)
> @@ -1984,7 +1983,6 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>                 .logging_active = logging_active,
>                 .force_pte = logging_active,
>                 .prot = KVM_PGTABLE_PROT_R,
> -               .exec_fault = kvm_vcpu_trap_is_exec_fault(s2fd->vcpu),
>                 .topup_memcache = !perm_fault || (logging_active && kvm_is_write_fault(s2fd->vcpu)),
>         };
>         void *memcache;
> --
> 2.47.3
>
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/17] KVM: arm64: Kill topup_memcache from kvm_s2_fault
  2026-03-16 17:54 ` [PATCH 08/17] KVM: arm64: Kill topup_memcache " Marc Zyngier
@ 2026-03-17 12:12   ` Fuad Tabba
  2026-03-17 13:31     ` Marc Zyngier
  0 siblings, 1 reply; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 12:12 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

Hi Marc,

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> The topup_memcache field can be easily replaced by the equivalent
> conditions, and the resulting code is not much worse.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/mmu.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 11820e39ad8e1..abe239752c696 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1712,7 +1712,6 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
>
>  struct kvm_s2_fault {
>         bool writable;
> -       bool topup_memcache;
>         bool mte_allowed;
>         bool is_vma_cacheable;
>         bool s2_force_noncacheable;
> @@ -1983,7 +1982,6 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>                 .logging_active = logging_active,
>                 .force_pte = logging_active,
>                 .prot = KVM_PGTABLE_PROT_R,
> -               .topup_memcache = !perm_fault || (logging_active && kvm_is_write_fault(s2fd->vcpu)),
>         };
>         void *memcache;
>         int ret;
> @@ -1994,9 +1992,11 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>          * only exception to this is when dirty logging is enabled at runtime
>          * and a write fault needs to collapse a block entry into a table.
>          */
> -       ret = prepare_mmu_memcache(s2fd->vcpu, fault.topup_memcache, &memcache);
> -       if (ret)
> -               return ret;
> +       if (!perm_fault || (logging_active && kvm_is_write_fault(s2fd->vcpu))) {
> +               ret = prepare_mmu_memcache(s2fd->vcpu, true, &memcache);
> +               if (ret)
> +                       return ret;
> +       }

Further up in user_mem_abort(), when memcache is declared, it should
be initialized to NULL, since prepare_mmu_memcache() isn't called when
this condition evaluates to false.

With that fixed:
Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

>
>         /*
>          * Let's check if we will get back a huge fault->page backed by hugetlbfs, or
> --
> 2.47.3
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 09/17] KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info
  2026-03-16 17:54 ` [PATCH 09/17] KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info Marc Zyngier
@ 2026-03-17 12:51   ` Fuad Tabba
  2026-03-18 14:22   ` Joey Gouly
  1 sibling, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 12:51 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> Mechanically extract a bunch of VMA-related fields from kvm_s2_fault
> and move them to a new kvm_s2_fault_vma_info structure.
>
> This is not much, but it already allows us to define which functions
> can update this structure, and which ones are pure consumers of the
> data. Those in the latter camp are updated to take a const pointer
> to that structure.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

I really like how, with this patch, we now distinguish between
vma_pagesize and mapping_size. Less chance for bugs or confusion.

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  arch/arm64/kvm/mmu.c | 113 +++++++++++++++++++++++--------------------
>  1 file changed, 61 insertions(+), 52 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index abe239752c696..a5b0dd41560f6 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1710,20 +1710,23 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
>         return vma_shift;
>  }
>
> +struct kvm_s2_fault_vma_info {
> +       unsigned long   mmu_seq;
> +       long            vma_pagesize;
> +       vm_flags_t      vm_flags;
> +       gfn_t           gfn;
> +       bool            mte_allowed;
> +       bool            is_vma_cacheable;
> +};
> +
>  struct kvm_s2_fault {
>         bool writable;
> -       bool mte_allowed;
> -       bool is_vma_cacheable;
>         bool s2_force_noncacheable;
> -       unsigned long mmu_seq;
> -       gfn_t gfn;
>         kvm_pfn_t pfn;
>         bool logging_active;
>         bool force_pte;
> -       long vma_pagesize;
>         enum kvm_pgtable_prot prot;
>         struct page *page;
> -       vm_flags_t vm_flags;
>  };
>
>  static bool kvm_s2_fault_is_perm(const struct kvm_s2_fault_desc *s2fd)
> @@ -1732,7 +1735,8 @@ static bool kvm_s2_fault_is_perm(const struct kvm_s2_fault_desc *s2fd)
>  }
>
>  static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
> -                                    struct kvm_s2_fault *fault)
> +                                    struct kvm_s2_fault *fault,
> +                                    struct kvm_s2_fault_vma_info *s2vi)
>  {
>         struct vm_area_struct *vma;
>         struct kvm *kvm = s2fd->vcpu->kvm;
> @@ -1745,20 +1749,20 @@ static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
>                 return -EFAULT;
>         }
>
> -       fault->vma_pagesize = BIT(kvm_s2_resolve_vma_size(s2fd, vma, &fault->force_pte));
> +       s2vi->vma_pagesize = BIT(kvm_s2_resolve_vma_size(s2fd, vma, &fault->force_pte));
>
>         /*
>          * Both the canonical IPA and fault IPA must be aligned to the
>          * mapping size to ensure we find the right PFN and lay down the
>          * mapping in the right place.
>          */
> -       fault->gfn = ALIGN_DOWN(s2fd->fault_ipa, fault->vma_pagesize) >> PAGE_SHIFT;
> +       s2vi->gfn = ALIGN_DOWN(s2fd->fault_ipa, s2vi->vma_pagesize) >> PAGE_SHIFT;
>
> -       fault->mte_allowed = kvm_vma_mte_allowed(vma);
> +       s2vi->mte_allowed = kvm_vma_mte_allowed(vma);
>
> -       fault->vm_flags = vma->vm_flags;
> +       s2vi->vm_flags = vma->vm_flags;
>
> -       fault->is_vma_cacheable = kvm_vma_is_cacheable(vma);
> +       s2vi->is_vma_cacheable = kvm_vma_is_cacheable(vma);
>
>         /*
>          * Read mmu_invalidate_seq so that KVM can detect if the results of
> @@ -1768,39 +1772,40 @@ static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
>          * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
>          * with the smp_wmb() in kvm_mmu_invalidate_end().
>          */
> -       fault->mmu_seq = kvm->mmu_invalidate_seq;
> +       s2vi->mmu_seq = kvm->mmu_invalidate_seq;
>         mmap_read_unlock(current->mm);
>
>         return 0;
>  }
>
>  static gfn_t get_canonical_gfn(const struct kvm_s2_fault_desc *s2fd,
> -                              const struct kvm_s2_fault *fault)
> +                              const struct kvm_s2_fault_vma_info *s2vi)
>  {
>         phys_addr_t ipa;
>
>         if (!s2fd->nested)
> -               return fault->gfn;
> +               return s2vi->gfn;
>
>         ipa = kvm_s2_trans_output(s2fd->nested);
> -       return ALIGN_DOWN(ipa, fault->vma_pagesize) >> PAGE_SHIFT;
> +       return ALIGN_DOWN(ipa, s2vi->vma_pagesize) >> PAGE_SHIFT;
>  }
>
>  static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
> -                               struct kvm_s2_fault *fault)
> +                               struct kvm_s2_fault *fault,
> +                               struct kvm_s2_fault_vma_info *s2vi)
>  {
>         int ret;
>
> -       ret = kvm_s2_fault_get_vma_info(s2fd, fault);
> +       ret = kvm_s2_fault_get_vma_info(s2fd, fault, s2vi);
>         if (ret)
>                 return ret;
>
> -       fault->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, fault),
> +       fault->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, s2vi),
>                                        kvm_is_write_fault(s2fd->vcpu) ? FOLL_WRITE : 0,
>                                        &fault->writable, &fault->page);
>         if (unlikely(is_error_noslot_pfn(fault->pfn))) {
>                 if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
> -                       kvm_send_hwpoison_signal(s2fd->hva, __ffs(fault->vma_pagesize));
> +                       kvm_send_hwpoison_signal(s2fd->hva, __ffs(s2vi->vma_pagesize));
>                         return 0;
>                 }
>                 return -EFAULT;
> @@ -1810,7 +1815,8 @@ static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
>  }
>
>  static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
> -                                    struct kvm_s2_fault *fault)
> +                                    struct kvm_s2_fault *fault,
> +                                    const struct kvm_s2_fault_vma_info *s2vi)
>  {
>         struct kvm *kvm = s2fd->vcpu->kvm;
>
> @@ -1818,8 +1824,8 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>          * Check if this is non-struct page memory PFN, and cannot support
>          * CMOs. It could potentially be unsafe to access as cacheable.
>          */
> -       if (fault->vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(fault->pfn)) {
> -               if (fault->is_vma_cacheable) {
> +       if (s2vi->vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(fault->pfn)) {
> +               if (s2vi->is_vma_cacheable) {
>                         /*
>                          * Whilst the VMA owner expects cacheable mapping to this
>                          * PFN, hardware also has to support the FWB and CACHE DIC
> @@ -1879,7 +1885,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>                 fault->prot |= KVM_PGTABLE_PROT_X;
>
>         if (fault->s2_force_noncacheable)
> -               fault->prot |= (fault->vm_flags & VM_ALLOW_ANY_UNCACHED) ?
> +               fault->prot |= (s2vi->vm_flags & VM_ALLOW_ANY_UNCACHED) ?
>                                KVM_PGTABLE_PROT_NORMAL_NC : KVM_PGTABLE_PROT_DEVICE;
>         else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
>                 fault->prot |= KVM_PGTABLE_PROT_X;
> @@ -1889,74 +1895,73 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>
>         if (!kvm_s2_fault_is_perm(s2fd) && !fault->s2_force_noncacheable && kvm_has_mte(kvm)) {
>                 /* Check the VMM hasn't introduced a new disallowed VMA */
> -               if (!fault->mte_allowed)
> +               if (!s2vi->mte_allowed)
>                         return -EFAULT;
>         }
>
>         return 0;
>  }
>
> -static phys_addr_t get_ipa(const struct kvm_s2_fault *fault)
> -{
> -       return gfn_to_gpa(fault->gfn);
> -}
> -
>  static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
> -                           struct kvm_s2_fault *fault, void *memcache)
> +                           struct kvm_s2_fault *fault,
> +                           const struct kvm_s2_fault_vma_info *s2vi, void *memcache)
>  {
> +       enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
>         struct kvm *kvm = s2fd->vcpu->kvm;
>         struct kvm_pgtable *pgt;
>         long perm_fault_granule;
> +       long mapping_size;
> +       gfn_t gfn;
>         int ret;
> -       enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
>
>         kvm_fault_lock(kvm);
>         pgt = s2fd->vcpu->arch.hw_mmu->pgt;
>         ret = -EAGAIN;
> -       if (mmu_invalidate_retry(kvm, fault->mmu_seq))
> +       if (mmu_invalidate_retry(kvm, s2vi->mmu_seq))
>                 goto out_unlock;
>
>         perm_fault_granule = (kvm_s2_fault_is_perm(s2fd) ?
>                               kvm_vcpu_trap_get_perm_fault_granule(s2fd->vcpu) : 0);
> +       mapping_size = s2vi->vma_pagesize;
> +       gfn = s2vi->gfn;
>
>         /*
>          * If we are not forced to use fault->page mapping, check if we are
>          * backed by a THP and thus use block mapping if possible.
>          */
> -       if (fault->vma_pagesize == PAGE_SIZE &&
> +       if (mapping_size == PAGE_SIZE &&
>             !(fault->force_pte || fault->s2_force_noncacheable)) {
>                 if (perm_fault_granule > PAGE_SIZE) {
> -                       fault->vma_pagesize = perm_fault_granule;
> +                       mapping_size = perm_fault_granule;
>                 } else {
> -                       fault->vma_pagesize = transparent_hugepage_adjust(kvm, s2fd->memslot,
> -                                                                         s2fd->hva, &fault->pfn,
> -                                                                         &fault->gfn);
> -
> -                       if (fault->vma_pagesize < 0) {
> -                               ret = fault->vma_pagesize;
> +                       mapping_size = transparent_hugepage_adjust(kvm, s2fd->memslot,
> +                                                                  s2fd->hva, &fault->pfn,
> +                                                                  &gfn);
> +                       if (mapping_size < 0) {
> +                               ret = mapping_size;
>                                 goto out_unlock;
>                         }
>                 }
>         }
>
>         if (!perm_fault_granule && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
> -               sanitise_mte_tags(kvm, fault->pfn, fault->vma_pagesize);
> +               sanitise_mte_tags(kvm, fault->pfn, mapping_size);
>
>         /*
>          * Under the premise of getting a FSC_PERM fault, we just need to relax
> -        * permissions only if vma_pagesize equals perm_fault_granule. Otherwise,
> +        * permissions only if mapping_size equals perm_fault_granule. Otherwise,
>          * kvm_pgtable_stage2_map() should be called to change block size.
>          */
> -       if (fault->vma_pagesize == perm_fault_granule) {
> +       if (mapping_size == perm_fault_granule) {
>                 /*
>                  * Drop the SW bits in favour of those stored in the
>                  * PTE, which will be preserved.
>                  */
>                 fault->prot &= ~KVM_NV_GUEST_MAP_SZ;
> -               ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, get_ipa(fault),
> +               ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, gfn_to_gpa(gfn),
>                                                                  fault->prot, flags);
>         } else {
> -               ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, get_ipa(fault), fault->vma_pagesize,
> +               ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, gfn_to_gpa(gfn), mapping_size,
>                                                          __pfn_to_phys(fault->pfn), fault->prot,
>                                                          memcache, flags);
>         }
> @@ -1965,9 +1970,12 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>         kvm_release_faultin_page(kvm, fault->page, !!ret, fault->writable);
>         kvm_fault_unlock(kvm);
>
> -       /* Mark the fault->page dirty only if the fault is handled successfully */
> -       if (fault->writable && !ret)
> -               mark_page_dirty_in_slot(kvm, s2fd->memslot, get_canonical_gfn(s2fd, fault));
> +       /* Mark the page dirty only if the fault is handled successfully */
> +       if (fault->writable && !ret) {
> +               phys_addr_t ipa = gfn_to_gpa(get_canonical_gfn(s2fd, s2vi));
> +               ipa &= ~(mapping_size - 1);
> +               mark_page_dirty_in_slot(kvm, s2fd->memslot, gpa_to_gfn(ipa));
> +       }
>
>         if (ret != -EAGAIN)
>                 return ret;
> @@ -1978,6 +1986,7 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>  {
>         bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
>         bool logging_active = memslot_is_logging(s2fd->memslot);
> +       struct kvm_s2_fault_vma_info s2vi = {};
>         struct kvm_s2_fault fault = {
>                 .logging_active = logging_active,
>                 .force_pte = logging_active,
> @@ -2002,17 +2011,17 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>          * Let's check if we will get back a huge fault->page backed by hugetlbfs, or
>          * get block mapping for device MMIO region.
>          */
> -       ret = kvm_s2_fault_pin_pfn(s2fd, &fault);
> +       ret = kvm_s2_fault_pin_pfn(s2fd, &fault, &s2vi);
>         if (ret != 1)
>                 return ret;
>
> -       ret = kvm_s2_fault_compute_prot(s2fd, &fault);
> +       ret = kvm_s2_fault_compute_prot(s2fd, &fault, &s2vi);
>         if (ret) {
>                 kvm_release_page_unused(fault.page);
>                 return ret;
>         }
>
> -       return kvm_s2_fault_map(s2fd, &fault, memcache);
> +       return kvm_s2_fault_map(s2fd, &fault, &s2vi, memcache);
>  }
>
>  /* Resolve the access fault by making the page young again. */
> --
> 2.47.3
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 10/17] KVM: arm64: Kill logging_active from kvm_s2_fault
  2026-03-16 17:54 ` [PATCH 10/17] KVM: arm64: Kill logging_active from kvm_s2_fault Marc Zyngier
@ 2026-03-17 13:23   ` Fuad Tabba
  0 siblings, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 13:23 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> There are only two spots where we evaluate whether logging is
> active. Replace the boolean with calls to the relevant helper.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad
> ---
>  arch/arm64/kvm/mmu.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index a5b0dd41560f6..caa5bedc79e19 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1723,7 +1723,6 @@ struct kvm_s2_fault {
>         bool writable;
>         bool s2_force_noncacheable;
>         kvm_pfn_t pfn;
> -       bool logging_active;
>         bool force_pte;
>         enum kvm_pgtable_prot prot;
>         struct page *page;
> @@ -1853,7 +1852,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>                          */
>                         fault->s2_force_noncacheable = true;
>                 }
> -       } else if (fault->logging_active && !kvm_is_write_fault(s2fd->vcpu)) {
> +       } else if (memslot_is_logging(s2fd->memslot) && !kvm_is_write_fault(s2fd->vcpu)) {
>                 /*
>                  * Only actually map the page as writable if this was a write
>                  * fault.
> @@ -1985,11 +1984,9 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>  static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>  {
>         bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
> -       bool logging_active = memslot_is_logging(s2fd->memslot);
>         struct kvm_s2_fault_vma_info s2vi = {};
>         struct kvm_s2_fault fault = {
> -               .logging_active = logging_active,
> -               .force_pte = logging_active,
> +               .force_pte = memslot_is_logging(s2fd->memslot),
>                 .prot = KVM_PGTABLE_PROT_R,
>         };
>         void *memcache;
> @@ -2001,7 +1998,8 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>          * only exception to this is when dirty logging is enabled at runtime
>          * and a write fault needs to collapse a block entry into a table.
>          */
> -       if (!perm_fault || (logging_active && kvm_is_write_fault(s2fd->vcpu))) {
> +       if (!perm_fault || (memslot_is_logging(s2fd->memslot) &&
> +                           kvm_is_write_fault(s2fd->vcpu))) {
>                 ret = prepare_mmu_memcache(s2fd->vcpu, true, &memcache);
>                 if (ret)
>                         return ret;
> --
> 2.47.3
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 08/17] KVM: arm64: Kill topup_memcache from kvm_s2_fault
  2026-03-17 12:12   ` Fuad Tabba
@ 2026-03-17 13:31     ` Marc Zyngier
  0 siblings, 0 replies; 47+ messages in thread
From: Marc Zyngier @ 2026-03-17 13:31 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Tue, 17 Mar 2026 12:12:57 +0000,
Fuad Tabba <tabba@google.com> wrote:
> 
> Hi Marc,
> 
> On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
> >
> > The topup_memcache field can be easily replaced by the equivalent
> > conditions, and the resulting code is not much worse.
> >
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/kvm/mmu.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 11820e39ad8e1..abe239752c696 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1712,7 +1712,6 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
> >
> >  struct kvm_s2_fault {
> >         bool writable;
> > -       bool topup_memcache;
> >         bool mte_allowed;
> >         bool is_vma_cacheable;
> >         bool s2_force_noncacheable;
> > @@ -1983,7 +1982,6 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
> >                 .logging_active = logging_active,
> >                 .force_pte = logging_active,
> >                 .prot = KVM_PGTABLE_PROT_R,
> > -               .topup_memcache = !perm_fault || (logging_active && kvm_is_write_fault(s2fd->vcpu)),
> >         };
> >         void *memcache;
> >         int ret;
> > @@ -1994,9 +1992,11 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
> >          * only exception to this is when dirty logging is enabled at runtime
> >          * and a write fault needs to collapse a block entry into a table.
> >          */
> > -       ret = prepare_mmu_memcache(s2fd->vcpu, fault.topup_memcache, &memcache);
> > -       if (ret)
> > -               return ret;
> > +       if (!perm_fault || (logging_active && kvm_is_write_fault(s2fd->vcpu))) {
> > +               ret = prepare_mmu_memcache(s2fd->vcpu, true, &memcache);
> > +               if (ret)
> > +                       return ret;
> > +       }
> 
> Further up in user_mem_abort(), when memcache is declared it should be
> initialized to NULL, since prepare_mmu_memcache() isn't called if this
> evaluates to false.

I had that at some point, but then realised that there was no case
where memcache could be used and yet not be initialised via
prepare_mmu_memcache(). But given that this is still a bit fragile,
I'll add it back.
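
The hazard being discussed can be sketched in isolation like this (all
names below are hypothetical stand-ins for illustration, not the actual
KVM helpers):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int dummy_cache;	/* stands in for a real memcache allocation */

/* Hypothetical analogue of prepare_mmu_memcache(): only fills *cache
 * when it is actually called. */
static int prepare_cache(void **cache)
{
	*cache = &dummy_cache;
	return 0;
}

static void *handle_fault(bool need_topup)
{
	/* The fix under discussion: initialise unconditionally, so the
	 * pointer is well-defined even when the topup path is skipped. */
	void *cache = NULL;

	if (need_topup) {
		if (prepare_cache(&cache))
			return NULL;
	}
	/* Without the NULL initialisation above, reading 'cache' here
	 * on the !need_topup path would be undefined behaviour. */
	return cache;
}
```

The point is that correctness currently relies on every consumer of the
cache sitting behind the same condition as the topup; the unconditional
NULL init removes that implicit coupling.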

> 
> With that fixed:
> Reviewed-by: Fuad Tabba <tabba@google.com>

Thanks!

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 11/17] KVM: arm64: Restrict the scope of the 'writable' attribute
  2026-03-16 17:54 ` [PATCH 11/17] KVM: arm64: Restrict the scope of the 'writable' attribute Marc Zyngier
@ 2026-03-17 13:55   ` Fuad Tabba
  0 siblings, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 13:55 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> The 'writable' field is ambiguous, and indicates multiple things:
>
> - whether the underlying memslot is writable
>
> - whether we are resolving the fault with writable attributes
>
> Add a new field to kvm_s2_fault_vma_info (map_writable) to indicate
> the former condition, and have local writable variables to track
> the latter.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

user_mem_abort() was overloading a lot of variables!

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  arch/arm64/kvm/mmu.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index caa5bedc79e19..3cfb8f2a6d186 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1717,10 +1717,10 @@ struct kvm_s2_fault_vma_info {
>         gfn_t           gfn;
>         bool            mte_allowed;
>         bool            is_vma_cacheable;
> +       bool            map_writable;
>  };
>
>  struct kvm_s2_fault {
> -       bool writable;
>         bool s2_force_noncacheable;
>         kvm_pfn_t pfn;
>         bool force_pte;
> @@ -1801,7 +1801,7 @@ static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
>
>         fault->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, s2vi),
>                                        kvm_is_write_fault(s2fd->vcpu) ? FOLL_WRITE : 0,
> -                                      &fault->writable, &fault->page);
> +                                      &s2vi->map_writable, &fault->page);
>         if (unlikely(is_error_noslot_pfn(fault->pfn))) {
>                 if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
>                         kvm_send_hwpoison_signal(s2fd->hva, __ffs(s2vi->vma_pagesize));
> @@ -1818,6 +1818,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>                                      const struct kvm_s2_fault_vma_info *s2vi)
>  {
>         struct kvm *kvm = s2fd->vcpu->kvm;
> +       bool writable = s2vi->map_writable;
>
>         /*
>          * Check if this is non-struct page memory PFN, and cannot support
> @@ -1857,7 +1858,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>                  * Only actually map the page as writable if this was a write
>                  * fault.
>                  */
> -               fault->writable = false;
> +               writable = false;
>         }
>
>         if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu) && fault->s2_force_noncacheable)
> @@ -1875,9 +1876,9 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>         }
>
>         if (s2fd->nested)
> -               adjust_nested_fault_perms(s2fd->nested, &fault->prot, &fault->writable);
> +               adjust_nested_fault_perms(s2fd->nested, &fault->prot, &writable);
>
> -       if (fault->writable)
> +       if (writable)
>                 fault->prot |= KVM_PGTABLE_PROT_W;
>
>         if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu))
> @@ -1906,6 +1907,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>                             const struct kvm_s2_fault_vma_info *s2vi, void *memcache)
>  {
>         enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
> +       bool writable = fault->prot & KVM_PGTABLE_PROT_W;
>         struct kvm *kvm = s2fd->vcpu->kvm;
>         struct kvm_pgtable *pgt;
>         long perm_fault_granule;
> @@ -1966,11 +1968,11 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>         }
>
>  out_unlock:
> -       kvm_release_faultin_page(kvm, fault->page, !!ret, fault->writable);
> +       kvm_release_faultin_page(kvm, fault->page, !!ret, writable);
>         kvm_fault_unlock(kvm);
>
>         /* Mark the page dirty only if the fault is handled successfully */
> -       if (fault->writable && !ret) {
> +       if (writable && !ret) {
>                 phys_addr_t ipa = gfn_to_gpa(get_canonical_gfn(s2fd, s2vi));
>                 ipa &= ~(mapping_size - 1);
>                 mark_page_dirty_in_slot(kvm, s2fd->memslot, gpa_to_gfn(ipa));
> --
> 2.47.3
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 12/17] KVM: arm64: Move kvm_s2_fault.{pfn,page} to kvm_s2_vma_info
  2026-03-16 17:54 ` [PATCH 12/17] KVM: arm64: Move kvm_s2_fault.{pfn,page} to kvm_s2_vma_info Marc Zyngier
@ 2026-03-17 14:24   ` Fuad Tabba
  0 siblings, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 14:24 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> Continue restricting the visibility/mutability of some attributes
> by moving kvm_s2_fault.{pfn,page} to kvm_s2_vma_info.
>
> This is a pretty mechanical change.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  arch/arm64/kvm/mmu.c | 30 ++++++++++++++++--------------
>  1 file changed, 16 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 3cfb8f2a6d186..ccdc9398e4ce2 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1714,6 +1714,8 @@ struct kvm_s2_fault_vma_info {
>         unsigned long   mmu_seq;
>         long            vma_pagesize;
>         vm_flags_t      vm_flags;
> +       struct page     *page;
> +       kvm_pfn_t       pfn;
>         gfn_t           gfn;
>         bool            mte_allowed;
>         bool            is_vma_cacheable;
> @@ -1722,10 +1724,8 @@ struct kvm_s2_fault_vma_info {
>
>  struct kvm_s2_fault {
>         bool s2_force_noncacheable;
> -       kvm_pfn_t pfn;
>         bool force_pte;
>         enum kvm_pgtable_prot prot;
> -       struct page *page;
>  };
>
>  static bool kvm_s2_fault_is_perm(const struct kvm_s2_fault_desc *s2fd)
> @@ -1799,11 +1799,11 @@ static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
>         if (ret)
>                 return ret;
>
> -       fault->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, s2vi),
> -                                      kvm_is_write_fault(s2fd->vcpu) ? FOLL_WRITE : 0,
> -                                      &s2vi->map_writable, &fault->page);
> -       if (unlikely(is_error_noslot_pfn(fault->pfn))) {
> -               if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
> +       s2vi->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, s2vi),
> +                                     kvm_is_write_fault(s2fd->vcpu) ? FOLL_WRITE : 0,
> +                                     &s2vi->map_writable, &s2vi->page);
> +       if (unlikely(is_error_noslot_pfn(s2vi->pfn))) {
> +               if (s2vi->pfn == KVM_PFN_ERR_HWPOISON) {
>                         kvm_send_hwpoison_signal(s2fd->hva, __ffs(s2vi->vma_pagesize));
>                         return 0;
>                 }
> @@ -1824,7 +1824,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>          * Check if this is non-struct page memory PFN, and cannot support
>          * CMOs. It could potentially be unsafe to access as cacheable.
>          */
> -       if (s2vi->vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(fault->pfn)) {
> +       if (s2vi->vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(s2vi->pfn)) {
>                 if (s2vi->is_vma_cacheable) {
>                         /*
>                          * Whilst the VMA owner expects cacheable mapping to this
> @@ -1912,6 +1912,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>         struct kvm_pgtable *pgt;
>         long perm_fault_granule;
>         long mapping_size;
> +       kvm_pfn_t pfn;
>         gfn_t gfn;
>         int ret;
>
> @@ -1924,10 +1925,11 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>         perm_fault_granule = (kvm_s2_fault_is_perm(s2fd) ?
>                               kvm_vcpu_trap_get_perm_fault_granule(s2fd->vcpu) : 0);
>         mapping_size = s2vi->vma_pagesize;
> +       pfn = s2vi->pfn;
>         gfn = s2vi->gfn;
>
>         /*
> -        * If we are not forced to use fault->page mapping, check if we are
> +        * If we are not forced to use page mapping, check if we are
>          * backed by a THP and thus use block mapping if possible.
>          */
>         if (mapping_size == PAGE_SIZE &&
> @@ -1936,7 +1938,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>                         mapping_size = perm_fault_granule;
>                 } else {
>                         mapping_size = transparent_hugepage_adjust(kvm, s2fd->memslot,
> -                                                                  s2fd->hva, &fault->pfn,
> +                                                                  s2fd->hva, &pfn,
>                                                                    &gfn);
>                         if (mapping_size < 0) {
>                                 ret = mapping_size;
> @@ -1946,7 +1948,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>         }
>
>         if (!perm_fault_granule && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
> -               sanitise_mte_tags(kvm, fault->pfn, mapping_size);
> +               sanitise_mte_tags(kvm, pfn, mapping_size);
>
>         /*
>          * Under the premise of getting a FSC_PERM fault, we just need to relax
> @@ -1963,12 +1965,12 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>                                                                  fault->prot, flags);
>         } else {
>                 ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, gfn_to_gpa(gfn), mapping_size,
> -                                                        __pfn_to_phys(fault->pfn), fault->prot,
> +                                                        __pfn_to_phys(pfn), fault->prot,
>                                                          memcache, flags);
>         }
>
>  out_unlock:
> -       kvm_release_faultin_page(kvm, fault->page, !!ret, writable);
> +       kvm_release_faultin_page(kvm, s2vi->page, !!ret, writable);
>         kvm_fault_unlock(kvm);
>
>         /* Mark the page dirty only if the fault is handled successfully */
> @@ -2017,7 +2019,7 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>
>         ret = kvm_s2_fault_compute_prot(s2fd, &fault, &s2vi);
>         if (ret) {
> -               kvm_release_page_unused(fault.page);
> +               kvm_release_page_unused(s2vi.page);
>                 return ret;
>         }
>
> --
> 2.47.3
>
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 13/17] KVM: arm64: Replace force_pte with a max_map_size attribute
  2026-03-16 17:54 ` [PATCH 13/17] KVM: arm64: Replace force_pte with a max_map_size attribute Marc Zyngier
@ 2026-03-17 15:08   ` Fuad Tabba
  0 siblings, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 15:08 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> force_pte is annoyingly limited in what it expresses, and we'd
> be better off with a more generic primitive. Introduce max_map_size
> instead, which does the trick and can be moved into the vma_info
> structure. This firther allows it to reduce the scopes in which

nit: firther -> further

> it is mutable.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>


Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad


> ---
>  arch/arm64/kvm/mmu.c | 47 +++++++++++++++++++++++---------------------
>  1 file changed, 25 insertions(+), 22 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index ccdc9398e4ce2..ac4bfcc33aeb1 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1648,15 +1648,32 @@ struct kvm_s2_fault_desc {
>         unsigned long           hva;
>  };
>
> +struct kvm_s2_fault_vma_info {
> +       unsigned long   mmu_seq;
> +       long            vma_pagesize;
> +       vm_flags_t      vm_flags;
> +       unsigned long   max_map_size;
> +       struct page     *page;
> +       kvm_pfn_t       pfn;
> +       gfn_t           gfn;
> +       bool            mte_allowed;
> +       bool            is_vma_cacheable;
> +       bool            map_writable;
> +};
> +
>  static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
> -                                    struct vm_area_struct *vma, bool *force_pte)
> +                                    struct kvm_s2_fault_vma_info *s2vi,
> +                                    struct vm_area_struct *vma)
>  {
>         short vma_shift;
>
> -       if (*force_pte)
> +       if (memslot_is_logging(s2fd->memslot)) {
> +               s2vi->max_map_size = PAGE_SIZE;
>                 vma_shift = PAGE_SHIFT;
> -       else
> +       } else {
> +               s2vi->max_map_size = PUD_SIZE;
>                 vma_shift = get_vma_page_shift(vma, s2fd->hva);
> +       }
>
>         switch (vma_shift) {
>  #ifndef __PAGETABLE_PMD_FOLDED
> @@ -1674,7 +1691,7 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
>                 fallthrough;
>         case CONT_PTE_SHIFT:
>                 vma_shift = PAGE_SHIFT;
> -               *force_pte = true;
> +               s2vi->max_map_size = PAGE_SIZE;
>                 fallthrough;
>         case PAGE_SHIFT:
>                 break;
> @@ -1685,7 +1702,7 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
>         if (s2fd->nested) {
>                 unsigned long max_map_size;
>
> -               max_map_size = *force_pte ? PAGE_SIZE : PUD_SIZE;
> +               max_map_size = min(s2vi->max_map_size, PUD_SIZE);
>
>                 /*
>                  * If we're about to create a shadow stage 2 entry, then we
> @@ -1703,28 +1720,15 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
>                 else if (max_map_size >= PAGE_SIZE && max_map_size < PMD_SIZE)
>                         max_map_size = PAGE_SIZE;
>
> -               *force_pte = (max_map_size == PAGE_SIZE);
> +               s2vi->max_map_size = max_map_size;
>                 vma_shift = min_t(short, vma_shift, __ffs(max_map_size));
>         }
>
>         return vma_shift;
>  }
>
> -struct kvm_s2_fault_vma_info {
> -       unsigned long   mmu_seq;
> -       long            vma_pagesize;
> -       vm_flags_t      vm_flags;
> -       struct page     *page;
> -       kvm_pfn_t       pfn;
> -       gfn_t           gfn;
> -       bool            mte_allowed;
> -       bool            is_vma_cacheable;
> -       bool            map_writable;
> -};
> -
>  struct kvm_s2_fault {
>         bool s2_force_noncacheable;
> -       bool force_pte;
>         enum kvm_pgtable_prot prot;
>  };
>
> @@ -1748,7 +1752,7 @@ static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
>                 return -EFAULT;
>         }
>
> -       s2vi->vma_pagesize = BIT(kvm_s2_resolve_vma_size(s2fd, vma, &fault->force_pte));
> +       s2vi->vma_pagesize = BIT(kvm_s2_resolve_vma_size(s2fd, s2vi, vma));
>
>         /*
>          * Both the canonical IPA and fault IPA must be aligned to the
> @@ -1933,7 +1937,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>          * backed by a THP and thus use block mapping if possible.
>          */
>         if (mapping_size == PAGE_SIZE &&
> -           !(fault->force_pte || fault->s2_force_noncacheable)) {
> +           !(s2vi->max_map_size == PAGE_SIZE || fault->s2_force_noncacheable)) {
>                 if (perm_fault_granule > PAGE_SIZE) {
>                         mapping_size = perm_fault_granule;
>                 } else {
> @@ -1990,7 +1994,6 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>         bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
>         struct kvm_s2_fault_vma_info s2vi = {};
>         struct kvm_s2_fault fault = {
> -               .force_pte = memslot_is_logging(s2fd->memslot),
>                 .prot = KVM_PGTABLE_PROT_R,
>         };
>         void *memcache;
> --
> 2.47.3
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 14/17] KVM: arm64: Move device mapping management into kvm_s2_fault_pin_pfn()
  2026-03-16 17:54 ` [PATCH 14/17] KVM: arm64: Move device mapping management into kvm_s2_fault_pin_pfn() Marc Zyngier
@ 2026-03-17 15:41   ` Fuad Tabba
  0 siblings, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 15:41 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> Attributes computed for devices are computed very late in the fault
> handling process, meaning they are mutable for that long.
>
> Introduce both 'device' and 'map_non_cacheable' attributes to the
> vma_info structure, allowing that information to be set in stone
> earlier, in kvm_s2_fault_pin_pfn().
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad


> ---
>  arch/arm64/kvm/mmu.c | 52 ++++++++++++++++++++++++--------------------
>  1 file changed, 29 insertions(+), 23 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index ac4bfcc33aeb1..97cb3585eba03 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1656,9 +1656,11 @@ struct kvm_s2_fault_vma_info {
>         struct page     *page;
>         kvm_pfn_t       pfn;
>         gfn_t           gfn;
> +       bool            device;
>         bool            mte_allowed;
>         bool            is_vma_cacheable;
>         bool            map_writable;
> +       bool            map_non_cacheable;
>  };
>
>  static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
> @@ -1728,7 +1730,6 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
>  }
>
>  struct kvm_s2_fault {
> -       bool s2_force_noncacheable;
>         enum kvm_pgtable_prot prot;
>  };
>
> @@ -1738,7 +1739,6 @@ static bool kvm_s2_fault_is_perm(const struct kvm_s2_fault_desc *s2fd)
>  }
>
>  static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
> -                                    struct kvm_s2_fault *fault,
>                                      struct kvm_s2_fault_vma_info *s2vi)
>  {
>         struct vm_area_struct *vma;
> @@ -1794,12 +1794,11 @@ static gfn_t get_canonical_gfn(const struct kvm_s2_fault_desc *s2fd,
>  }
>
>  static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
> -                               struct kvm_s2_fault *fault,
>                                 struct kvm_s2_fault_vma_info *s2vi)
>  {
>         int ret;
>
> -       ret = kvm_s2_fault_get_vma_info(s2fd, fault, s2vi);
> +       ret = kvm_s2_fault_get_vma_info(s2fd, s2vi);
>         if (ret)
>                 return ret;
>
> @@ -1814,16 +1813,6 @@ static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
>                 return -EFAULT;
>         }
>
> -       return 1;
> -}
> -
> -static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
> -                                    struct kvm_s2_fault *fault,
> -                                    const struct kvm_s2_fault_vma_info *s2vi)
> -{
> -       struct kvm *kvm = s2fd->vcpu->kvm;
> -       bool writable = s2vi->map_writable;
> -
>         /*
>          * Check if this is non-struct page memory PFN, and cannot support
>          * CMOs. It could potentially be unsafe to access as cacheable.
> @@ -1842,8 +1831,10 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>                          * S2FWB and CACHE DIC are mandatory to avoid the need for
>                          * cache maintenance.
>                          */
> -                       if (!kvm_supports_cacheable_pfnmap())
> +                       if (!kvm_supports_cacheable_pfnmap()) {
> +                               kvm_release_faultin_page(s2fd->vcpu->kvm, s2vi->page, true, false);
>                                 return -EFAULT;
> +                       }
>                 } else {
>                         /*
>                          * If the page was identified as device early by looking at
> @@ -1855,9 +1846,24 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>                          * In both cases, we don't let transparent_hugepage_adjust()
>                          * change things at the last minute.
>                          */
> -                       fault->s2_force_noncacheable = true;
> +                       s2vi->map_non_cacheable = true;
>                 }
> -       } else if (memslot_is_logging(s2fd->memslot) && !kvm_is_write_fault(s2fd->vcpu)) {
> +
> +               s2vi->device = true;
> +       }
> +
> +       return 1;
> +}
> +
> +static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
> +                                    struct kvm_s2_fault *fault,
> +                                    const struct kvm_s2_fault_vma_info *s2vi)
> +{
> +       struct kvm *kvm = s2fd->vcpu->kvm;
> +       bool writable = s2vi->map_writable;
> +
> +       if (!s2vi->device && memslot_is_logging(s2fd->memslot) &&
> +           !kvm_is_write_fault(s2fd->vcpu)) {
>                 /*
>                  * Only actually map the page as writable if this was a write
>                  * fault.
> @@ -1865,7 +1871,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>                 writable = false;
>         }
>
> -       if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu) && fault->s2_force_noncacheable)
> +       if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu) && s2vi->map_non_cacheable)
>                 return -ENOEXEC;
>
>         /*
> @@ -1888,7 +1894,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>         if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu))
>                 fault->prot |= KVM_PGTABLE_PROT_X;
>
> -       if (fault->s2_force_noncacheable)
> +       if (s2vi->map_non_cacheable)
>                 fault->prot |= (s2vi->vm_flags & VM_ALLOW_ANY_UNCACHED) ?
>                                KVM_PGTABLE_PROT_NORMAL_NC : KVM_PGTABLE_PROT_DEVICE;
>         else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
> @@ -1897,7 +1903,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>         if (s2fd->nested)
>                 adjust_nested_exec_perms(kvm, s2fd->nested, &fault->prot);
>
> -       if (!kvm_s2_fault_is_perm(s2fd) && !fault->s2_force_noncacheable && kvm_has_mte(kvm)) {
> +       if (!kvm_s2_fault_is_perm(s2fd) && !s2vi->map_non_cacheable && kvm_has_mte(kvm)) {
>                 /* Check the VMM hasn't introduced a new disallowed VMA */
>                 if (!s2vi->mte_allowed)
>                         return -EFAULT;
> @@ -1937,7 +1943,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>          * backed by a THP and thus use block mapping if possible.
>          */
>         if (mapping_size == PAGE_SIZE &&
> -           !(s2vi->max_map_size == PAGE_SIZE || fault->s2_force_noncacheable)) {
> +           !(s2vi->max_map_size == PAGE_SIZE || s2vi->map_non_cacheable)) {
>                 if (perm_fault_granule > PAGE_SIZE) {
>                         mapping_size = perm_fault_granule;
>                 } else {
> @@ -1951,7 +1957,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>                 }
>         }
>
> -       if (!perm_fault_granule && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
> +       if (!perm_fault_granule && !s2vi->map_non_cacheable && kvm_has_mte(kvm))
>                 sanitise_mte_tags(kvm, pfn, mapping_size);
>
>         /*
> @@ -2016,7 +2022,7 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>          * Let's check if we will get back a huge fault->page backed by hugetlbfs, or
>          * get block mapping for device MMIO region.
>          */
> -       ret = kvm_s2_fault_pin_pfn(s2fd, &fault, &s2vi);
> +       ret = kvm_s2_fault_pin_pfn(s2fd, &s2vi);
>         if (ret != 1)
>                 return ret;
>
> --
> 2.47.3
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 15/17] KVM: arm64: Directly expose mapping prot and kill kvm_s2_fault
  2026-03-16 17:54 ` [PATCH 15/17] KVM: arm64: Directly expose mapping prot and kill kvm_s2_fault Marc Zyngier
@ 2026-03-17 16:14   ` Fuad Tabba
  0 siblings, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 16:14 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> The 'prot' field is the only one left in kvm_s2_fault. Expose it
> directly to the functions needing it, and get rid of kvm_s2_fault.
>
> It has served us well during this refactoring, but it is now no
> longer needed.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad


> ---
>  arch/arm64/kvm/mmu.c | 45 +++++++++++++++++++++-----------------------
>  1 file changed, 21 insertions(+), 24 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 97cb3585eba03..9b5df70807875 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1729,10 +1729,6 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
>         return vma_shift;
>  }
>
> -struct kvm_s2_fault {
> -       enum kvm_pgtable_prot prot;
> -};
> -
>  static bool kvm_s2_fault_is_perm(const struct kvm_s2_fault_desc *s2fd)
>  {
>         return kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
> @@ -1856,8 +1852,8 @@ static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
>  }
>
>  static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
> -                                    struct kvm_s2_fault *fault,
> -                                    const struct kvm_s2_fault_vma_info *s2vi)
> +                                    const struct kvm_s2_fault_vma_info *s2vi,
> +                                    enum kvm_pgtable_prot *prot)
>  {
>         struct kvm *kvm = s2fd->vcpu->kvm;
>         bool writable = s2vi->map_writable;
> @@ -1885,23 +1881,25 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>                 return 1;
>         }
>
> +       *prot = KVM_PGTABLE_PROT_R;
> +
>         if (s2fd->nested)
> -               adjust_nested_fault_perms(s2fd->nested, &fault->prot, &writable);
> +               adjust_nested_fault_perms(s2fd->nested, prot, &writable);
>
>         if (writable)
> -               fault->prot |= KVM_PGTABLE_PROT_W;
> +               *prot |= KVM_PGTABLE_PROT_W;
>
>         if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu))
> -               fault->prot |= KVM_PGTABLE_PROT_X;
> +               *prot |= KVM_PGTABLE_PROT_X;
>
>         if (s2vi->map_non_cacheable)
> -               fault->prot |= (s2vi->vm_flags & VM_ALLOW_ANY_UNCACHED) ?
> -                              KVM_PGTABLE_PROT_NORMAL_NC : KVM_PGTABLE_PROT_DEVICE;
> +               *prot |= (s2vi->vm_flags & VM_ALLOW_ANY_UNCACHED) ?
> +                       KVM_PGTABLE_PROT_NORMAL_NC : KVM_PGTABLE_PROT_DEVICE;
>         else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
> -               fault->prot |= KVM_PGTABLE_PROT_X;
> +               *prot |= KVM_PGTABLE_PROT_X;
>
>         if (s2fd->nested)
> -               adjust_nested_exec_perms(kvm, s2fd->nested, &fault->prot);
> +               adjust_nested_exec_perms(kvm, s2fd->nested, prot);
>
>         if (!kvm_s2_fault_is_perm(s2fd) && !s2vi->map_non_cacheable && kvm_has_mte(kvm)) {
>                 /* Check the VMM hasn't introduced a new disallowed VMA */
> @@ -1913,11 +1911,12 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>  }
>
>  static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
> -                           struct kvm_s2_fault *fault,
> -                           const struct kvm_s2_fault_vma_info *s2vi, void *memcache)
> +                           const struct kvm_s2_fault_vma_info *s2vi,
> +                           enum kvm_pgtable_prot prot,
> +                           void *memcache)
>  {
>         enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
> -       bool writable = fault->prot & KVM_PGTABLE_PROT_W;
> +       bool writable = prot & KVM_PGTABLE_PROT_W;
>         struct kvm *kvm = s2fd->vcpu->kvm;
>         struct kvm_pgtable *pgt;
>         long perm_fault_granule;
> @@ -1970,12 +1969,12 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>                  * Drop the SW bits in favour of those stored in the
>                  * PTE, which will be preserved.
>                  */
> -               fault->prot &= ~KVM_NV_GUEST_MAP_SZ;
> +               prot &= ~KVM_NV_GUEST_MAP_SZ;
>                 ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, gfn_to_gpa(gfn),
> -                                                                fault->prot, flags);
> +                                                                prot, flags);
>         } else {
>                 ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, gfn_to_gpa(gfn), mapping_size,
> -                                                        __pfn_to_phys(pfn), fault->prot,
> +                                                        __pfn_to_phys(pfn), prot,
>                                                          memcache, flags);
>         }
>
> @@ -1999,9 +1998,7 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>  {
>         bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
>         struct kvm_s2_fault_vma_info s2vi = {};
> -       struct kvm_s2_fault fault = {
> -               .prot = KVM_PGTABLE_PROT_R,
> -       };
> +       enum kvm_pgtable_prot prot;
>         void *memcache;
>         int ret;
>
> @@ -2026,13 +2023,13 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>         if (ret != 1)
>                 return ret;
>
> -       ret = kvm_s2_fault_compute_prot(s2fd, &fault, &s2vi);
> +       ret = kvm_s2_fault_compute_prot(s2fd, &s2vi, &prot);
>         if (ret) {
>                 kvm_release_page_unused(s2vi.page);
>                 return ret;
>         }
>
> -       return kvm_s2_fault_map(s2fd, &fault, &s2vi, memcache);
> +       return kvm_s2_fault_map(s2fd, &s2vi, prot, memcache);
>  }
>
>  /* Resolve the access fault by making the page young again. */
> --
> 2.47.3
>



* Re: [PATCH 16/17] KVM: arm64: Simplify integration of adjust_nested_*_perms()
  2026-03-16 17:54 ` [PATCH 16/17] KVM: arm64: Simplify integration of adjust_nested_*_perms() Marc Zyngier
@ 2026-03-17 16:45   ` Fuad Tabba
  0 siblings, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 16:45 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> Instead of passing pointers to adjust_nested_*_perms(), allow
> them to return a new set of permissions.
>
> With some careful moving around so that the canonical permissions
> are computed before the nested ones are applied, we end up with
> a bit less code, and something a bit more readable.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad


> ---
>  arch/arm64/kvm/mmu.c | 62 +++++++++++++++++++-------------------------
>  1 file changed, 27 insertions(+), 35 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 9b5df70807875..18cf7e6ba786d 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1544,32 +1544,34 @@ static int prepare_mmu_memcache(struct kvm_vcpu *vcpu, bool topup_memcache,
>   * TLB invalidation from the guest and used to limit the invalidation scope if a
>   * TTL hint or a range isn't provided.
>   */
> -static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
> -                                     enum kvm_pgtable_prot *prot,
> -                                     bool *writable)
> +static enum kvm_pgtable_prot adjust_nested_fault_perms(struct kvm_s2_trans *nested,
> +                                                      enum kvm_pgtable_prot prot)
>  {
> -       *writable &= kvm_s2_trans_writable(nested);
> +       if (!kvm_s2_trans_writable(nested))
> +               prot &= ~KVM_PGTABLE_PROT_W;
>         if (!kvm_s2_trans_readable(nested))
> -               *prot &= ~KVM_PGTABLE_PROT_R;
> +               prot &= ~KVM_PGTABLE_PROT_R;
>
> -       *prot |= kvm_encode_nested_level(nested);
> +       return prot | kvm_encode_nested_level(nested);
>  }
>
> -static void adjust_nested_exec_perms(struct kvm *kvm,
> -                                    struct kvm_s2_trans *nested,
> -                                    enum kvm_pgtable_prot *prot)
> +static enum kvm_pgtable_prot adjust_nested_exec_perms(struct kvm *kvm,
> +                                                     struct kvm_s2_trans *nested,
> +                                                     enum kvm_pgtable_prot prot)
>  {
>         if (!kvm_s2_trans_exec_el0(kvm, nested))
> -               *prot &= ~KVM_PGTABLE_PROT_UX;
> +               prot &= ~KVM_PGTABLE_PROT_UX;
>         if (!kvm_s2_trans_exec_el1(kvm, nested))
> -               *prot &= ~KVM_PGTABLE_PROT_PX;
> +               prot &= ~KVM_PGTABLE_PROT_PX;
> +
> +       return prot;
>  }
>
>  static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>                       struct kvm_s2_trans *nested,
>                       struct kvm_memory_slot *memslot, bool is_perm)
>  {
> -       bool write_fault, exec_fault, writable;
> +       bool write_fault, exec_fault;
>         enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
>         enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>         struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> @@ -1606,19 +1608,17 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>                 return ret;
>         }
>
> -       writable = !(memslot->flags & KVM_MEM_READONLY);
> +       if (!(memslot->flags & KVM_MEM_READONLY))
> +               prot |= KVM_PGTABLE_PROT_W;
>
>         if (nested)
> -               adjust_nested_fault_perms(nested, &prot, &writable);
> -
> -       if (writable)
> -               prot |= KVM_PGTABLE_PROT_W;
> +               prot = adjust_nested_fault_perms(nested, prot);
>
>         if (exec_fault || cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
>                 prot |= KVM_PGTABLE_PROT_X;
>
>         if (nested)
> -               adjust_nested_exec_perms(kvm, nested, &prot);
> +               prot = adjust_nested_exec_perms(kvm, nested, prot);
>
>         kvm_fault_lock(kvm);
>         if (mmu_invalidate_retry(kvm, mmu_seq)) {
> @@ -1631,10 +1631,10 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>                                                  memcache, flags);
>
>  out_unlock:
> -       kvm_release_faultin_page(kvm, page, !!ret, writable);
> +       kvm_release_faultin_page(kvm, page, !!ret, prot & KVM_PGTABLE_PROT_W);
>         kvm_fault_unlock(kvm);
>
> -       if (writable && !ret)
> +       if ((prot & KVM_PGTABLE_PROT_W) && !ret)
>                 mark_page_dirty_in_slot(kvm, memslot, gfn);
>
>         return ret != -EAGAIN ? ret : 0;
> @@ -1856,16 +1856,6 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>                                      enum kvm_pgtable_prot *prot)
>  {
>         struct kvm *kvm = s2fd->vcpu->kvm;
> -       bool writable = s2vi->map_writable;
> -
> -       if (!s2vi->device && memslot_is_logging(s2fd->memslot) &&
> -           !kvm_is_write_fault(s2fd->vcpu)) {
> -               /*
> -                * Only actually map the page as writable if this was a write
> -                * fault.
> -                */
> -               writable = false;
> -       }
>
>         if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu) && s2vi->map_non_cacheable)
>                 return -ENOEXEC;
> @@ -1883,12 +1873,14 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>
>         *prot = KVM_PGTABLE_PROT_R;
>
> -       if (s2fd->nested)
> -               adjust_nested_fault_perms(s2fd->nested, prot, &writable);
> -
> -       if (writable)
> +       if (s2vi->map_writable && (s2vi->device ||
> +                                  !memslot_is_logging(s2fd->memslot) ||
> +                                  kvm_is_write_fault(s2fd->vcpu)))
>                 *prot |= KVM_PGTABLE_PROT_W;
>
> +       if (s2fd->nested)
> +               *prot = adjust_nested_fault_perms(s2fd->nested, *prot);
> +
>         if (kvm_vcpu_trap_is_exec_fault(s2fd->vcpu))
>                 *prot |= KVM_PGTABLE_PROT_X;
>
> @@ -1899,7 +1891,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>                 *prot |= KVM_PGTABLE_PROT_X;
>
>         if (s2fd->nested)
> -               adjust_nested_exec_perms(kvm, s2fd->nested, prot);
> +               *prot = adjust_nested_exec_perms(kvm, s2fd->nested, *prot);
>
>         if (!kvm_s2_fault_is_perm(s2fd) && !s2vi->map_non_cacheable && kvm_has_mte(kvm)) {
>                 /* Check the VMM hasn't introduced a new disallowed VMA */
> --
> 2.47.3
>
>



* Re: [PATCH 00/17] KVM: arm64: More user_mem_abort() rework
  2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
                   ` (18 preceding siblings ...)
  2026-03-16 20:26 ` Fuad Tabba
@ 2026-03-17 17:03 ` Suzuki K Poulose
  19 siblings, 0 replies; 47+ messages in thread
From: Suzuki K Poulose @ 2026-03-17 17:03 UTC (permalink / raw)
  To: Marc Zyngier, kvmarm, linux-arm-kernel
  Cc: Joey Gouly, Oliver Upton, Zenghui Yu, Fuad Tabba, Will Deacon,
	Quentin Perret

On 16/03/2026 17:54, Marc Zyngier wrote:
> Piqued by Fuad's initial set of patches[1] splitting user_mem_abort()
> into more "edible" functions, I've added my on take on top of it with
> a few goals in mind:
> 
> - contextualise the state by splitting kvm_s2_fault into more granular
>    structures
> 
> - reduce the amount of state that is visible and/or mutable by any
>    single function
> 
> - reduce the number of variables that simply cache state that is
>    already implicitly available (and often only a helper away)
> 
> I find the result reasonably attractive, and throwing it at a couple
> of machines didn't result in anything out of the ordinary.

Indeed a very nice cleanup, both of you! Thanks for making this
pleasing to the eyes and easier to reason about.

FWIW:

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>


> 
> For those interested, I have stashed a branch at [2], and I'd
> appreciate some feedback on the outcome.
> 
> [1] https://lore.kernel.org/all/20260306140232.2193802-1-tabba@google.com/
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=hack/user_mem_abort-rework
> 
> Marc Zyngier (17):
>    KVM: arm64: Kill fault->ipa
>    KVM: arm64: Make fault_ipa immutable
>    KVM: arm64: Move fault context to const structure
>    KVM: arm64: Replace fault_is_perm with a helper
>    KVM: arm64: Constrain fault_granule to kvm_s2_fault_map()
>    KVM: arm64: Kill write_fault from kvm_s2_fault
>    KVM: arm64: Kill exec_fault from kvm_s2_fault
>    KVM: arm64: Kill topup_memcache from kvm_s2_fault
>    KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info
>    KVM: arm64: Kill logging_active from kvm_s2_fault
>    KVM: arm64: Restrict the scope of the 'writable' attribute
>    KVM: arm64: Move kvm_s2_fault.{pfn,page} to kvm_s2_vma_info
>    KVM: arm64: Replace force_pte with a max_map_size attribute
>    KVM: arm64: Move device mapping management into kvm_s2_fault_pin_pfn()
>    KVM: arm64: Directly expose mapping prot and kill kvm_s2_fault
>    KVM: arm64: Simplify integration of adjust_nested_*_perms()
>    KVM: arm64: Convert gmem_abort() to struct kvm_s2_fault_desc
> 
>   arch/arm64/kvm/mmu.c | 428 ++++++++++++++++++++++---------------------
>   1 file changed, 223 insertions(+), 205 deletions(-)
> 




* Re: [PATCH 00/17] KVM: arm64: More user_mem_abort() rework
  2026-03-17  8:23     ` Marc Zyngier
@ 2026-03-17 17:50       ` Fuad Tabba
  2026-03-17 18:02         ` Fuad Tabba
  0 siblings, 1 reply; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 17:50 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

<snip>

> > > The series in hack/user_mem_abort-rework is different from this one.
> > > Here are the first few patches:
> >
> > And I just realized it's because they're based on _my_ patches ... doh! :D
>
> I thought I was clear when I wrote "I've added my on take on top of
> it", but maybe not. In any case, I didn't feel the need to redo what
> you had already done -- I don't think I'd have come up with something
> better.

You were clear. It's the reader's fault, not the author's.

I'll send you one more patch that might be worth adding, but this
looks great! Much easier to reason about. Thanks.

For the series:
Tested-by: Fuad Tabba <tabba@google.com>

(might as well repeat it here)
Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad


>
> In any case, I'd appreciate your feedback!
>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.



* Re: [PATCH 17/17] KVM: arm64: Convert gmem_abort() to struct kvm_s2_fault_desc
  2026-03-16 17:54 ` [PATCH 17/17] KVM: arm64: Convert gmem_abort() to struct kvm_s2_fault_desc Marc Zyngier
@ 2026-03-17 17:58   ` Fuad Tabba
  0 siblings, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 17:58 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Mon, 16 Mar 2026 at 17:55, Marc Zyngier <maz@kernel.org> wrote:
>
> Having fully converted user_mem_abort() to kvm_s2_fault_desc and
> co, convert gmem_abort() to it as well. The change is obviously
> much simpler.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad


> ---
>  arch/arm64/kvm/mmu.c | 57 +++++++++++++++++++++-----------------------
>  1 file changed, 27 insertions(+), 30 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 18cf7e6ba786d..e14b8b7287192 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1567,33 +1567,39 @@ static enum kvm_pgtable_prot adjust_nested_exec_perms(struct kvm *kvm,
>         return prot;
>  }
>
> -static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> -                     struct kvm_s2_trans *nested,
> -                     struct kvm_memory_slot *memslot, bool is_perm)
> +struct kvm_s2_fault_desc {
> +       struct kvm_vcpu         *vcpu;
> +       phys_addr_t             fault_ipa;
> +       struct kvm_s2_trans     *nested;
> +       struct kvm_memory_slot  *memslot;
> +       unsigned long           hva;
> +};
> +
> +static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
>  {
>         bool write_fault, exec_fault;
>         enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
>         enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> -       struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> +       struct kvm_pgtable *pgt = s2fd->vcpu->arch.hw_mmu->pgt;
>         unsigned long mmu_seq;
>         struct page *page;
> -       struct kvm *kvm = vcpu->kvm;
> +       struct kvm *kvm = s2fd->vcpu->kvm;
>         void *memcache;
>         kvm_pfn_t pfn;
>         gfn_t gfn;
>         int ret;
>
> -       ret = prepare_mmu_memcache(vcpu, true, &memcache);
> +       ret = prepare_mmu_memcache(s2fd->vcpu, true, &memcache);
>         if (ret)
>                 return ret;
>
> -       if (nested)
> -               gfn = kvm_s2_trans_output(nested) >> PAGE_SHIFT;
> +       if (s2fd->nested)
> +               gfn = kvm_s2_trans_output(s2fd->nested) >> PAGE_SHIFT;
>         else
> -               gfn = fault_ipa >> PAGE_SHIFT;
> +               gfn = s2fd->fault_ipa >> PAGE_SHIFT;
>
> -       write_fault = kvm_is_write_fault(vcpu);
> -       exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> +       write_fault = kvm_is_write_fault(s2fd->vcpu);
> +       exec_fault = kvm_vcpu_trap_is_exec_fault(s2fd->vcpu);
>
>         VM_WARN_ON_ONCE(write_fault && exec_fault);
>
> @@ -1601,24 +1607,24 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>         /* Pairs with the smp_wmb() in kvm_mmu_invalidate_end(). */
>         smp_rmb();
>
> -       ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
> +       ret = kvm_gmem_get_pfn(kvm, s2fd->memslot, gfn, &pfn, &page, NULL);
>         if (ret) {
> -               kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
> +               kvm_prepare_memory_fault_exit(s2fd->vcpu, s2fd->fault_ipa, PAGE_SIZE,
>                                               write_fault, exec_fault, false);
>                 return ret;
>         }
>
> -       if (!(memslot->flags & KVM_MEM_READONLY))
> +       if (!(s2fd->memslot->flags & KVM_MEM_READONLY))
>                 prot |= KVM_PGTABLE_PROT_W;
>
> -       if (nested)
> -               prot = adjust_nested_fault_perms(nested, prot);
> +       if (s2fd->nested)
> +               prot = adjust_nested_fault_perms(s2fd->nested, prot);
>
>         if (exec_fault || cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
>                 prot |= KVM_PGTABLE_PROT_X;
>
> -       if (nested)
> -               prot = adjust_nested_exec_perms(kvm, nested, prot);
> +       if (s2fd->nested)
> +               prot = adjust_nested_exec_perms(kvm, s2fd->nested, prot);
>
>         kvm_fault_lock(kvm);
>         if (mmu_invalidate_retry(kvm, mmu_seq)) {
> @@ -1626,7 +1632,7 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>                 goto out_unlock;
>         }
>
> -       ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, PAGE_SIZE,
> +       ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, s2fd->fault_ipa, PAGE_SIZE,
>                                                  __pfn_to_phys(pfn), prot,
>                                                  memcache, flags);
>
> @@ -1635,19 +1641,11 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>         kvm_fault_unlock(kvm);
>
>         if ((prot & KVM_PGTABLE_PROT_W) && !ret)
> -               mark_page_dirty_in_slot(kvm, memslot, gfn);
> +               mark_page_dirty_in_slot(kvm, s2fd->memslot, gfn);
>
>         return ret != -EAGAIN ? ret : 0;
>  }
>
> -struct kvm_s2_fault_desc {
> -       struct kvm_vcpu         *vcpu;
> -       phys_addr_t             fault_ipa;
> -       struct kvm_s2_trans     *nested;
> -       struct kvm_memory_slot  *memslot;
> -       unsigned long           hva;
> -};
> -
>  struct kvm_s2_fault_vma_info {
>         unsigned long   mmu_seq;
>         long            vma_pagesize;
> @@ -2296,8 +2294,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>         };
>
>         if (kvm_slot_has_gmem(memslot))
> -               ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
> -                                esr_fsc_is_permission_fault(esr));
> +               ret = gmem_abort(&s2fd);
>         else
>                 ret = user_mem_abort(&s2fd);
>
> --
> 2.47.3
>



* Re: [PATCH 00/17] KVM: arm64: More user_mem_abort() rework
  2026-03-17 17:50       ` Fuad Tabba
@ 2026-03-17 18:02         ` Fuad Tabba
  0 siblings, 0 replies; 47+ messages in thread
From: Fuad Tabba @ 2026-03-17 18:02 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Joey Gouly, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

On Tue, 17 Mar 2026 at 17:50, Fuad Tabba <tabba@google.com> wrote:
>
> <snip>
>
> > > > The series in hack/user_mem_abort-rework is different from this one.
> > > > Here are the first few patches:
> > >
> > > And I just realized it's because they're based on _my_ patches ... doh! :D
> >
> > I thought I was clear when I wrote "I've added my own take on top of
> > it", but maybe not. In any case, I didn't feel the need to redo what
> > you had already done -- I don't think I'd have come up with something
> > better.
>
> You were clear. It's the reader's fault, not the author's.
>
> I'll send you one more patch that might be worth adding, but this
> looks great! Much easier to reason about. Thanks.

On second thought, let's stop here. I was thinking of going ahead
with guest_memfd abort, but let's see how this one goes and take it
from there.

Thanks again for this!
/fuad

> For the series:
> Tested-by: Fuad Tabba <tabba@google.com>
>
> (might as well repeat it here)
> Reviewed-by: Fuad Tabba <tabba@google.com>
>
> Cheers,
> /fuad
>
>
> >
> > In any case, I'd appreciate your feedback!
> >
> > Thanks,
> >
> >         M.
> >
> > --
> > Without deviation from the norm, progress is not possible.



* Re: [PATCH 04/17] KVM: arm64: Replace fault_is_perm with a helper
  2026-03-16 17:54 ` [PATCH 04/17] KVM: arm64: Replace fault_is_perm with a helper Marc Zyngier
  2026-03-17 10:49   ` Fuad Tabba
@ 2026-03-18 13:43   ` Joey Gouly
  1 sibling, 0 replies; 47+ messages in thread
From: Joey Gouly @ 2026-03-18 13:43 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Suzuki K Poulose, Oliver Upton,
	Zenghui Yu, Fuad Tabba, Will Deacon, Quentin Perret

On Mon, Mar 16, 2026 at 05:54:37PM +0000, Marc Zyngier wrote:
> Carrying a boolean to indicate that a given fault is slightly odd,

ESR_ELx_FSC_SLIGHTLY_ODD :D
(probably should be "that a given fault is a permission fault is..")

Reviewed-by: Joey Gouly <joey.gouly@arm.com>

> as this is a property of the fault itself, and we'd better avoid
> duplicating state.
> 
> For this purpose, introduce a kvm_s2_fault_is_perm() predicate that
> can take a fault descriptor as a parameter. fault_is_perm is therefore
> dropped from kvm_s2_fault.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/mmu.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 2a7128b8dd14f..1b32f2e6c3e61 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1711,8 +1711,6 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
>  }
>  
>  struct kvm_s2_fault {
> -	bool fault_is_perm;
> -
>  	bool write_fault;
>  	bool exec_fault;
>  	bool writable;
> @@ -1732,6 +1730,11 @@ struct kvm_s2_fault {
>  	vm_flags_t vm_flags;
>  };
>  
> +static bool kvm_s2_fault_is_perm(const struct kvm_s2_fault_desc *s2fd)
> +{
> +	return kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
> +}
> +
>  static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
>  				     struct kvm_s2_fault *fault)
>  {
> @@ -1888,7 +1891,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>  	if (s2fd->nested)
>  		adjust_nested_exec_perms(kvm, s2fd->nested, &fault->prot);
>  
> -	if (!fault->fault_is_perm && !fault->s2_force_noncacheable && kvm_has_mte(kvm)) {
> +	if (!kvm_s2_fault_is_perm(s2fd) && !fault->s2_force_noncacheable && kvm_has_mte(kvm)) {
>  		/* Check the VMM hasn't introduced a new disallowed VMA */
>  		if (!fault->mte_allowed)
>  			return -EFAULT;
> @@ -1905,6 +1908,7 @@ static phys_addr_t get_ipa(const struct kvm_s2_fault *fault)
>  static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>  			    struct kvm_s2_fault *fault, void *memcache)
>  {
> +	bool fault_is_perm = kvm_s2_fault_is_perm(s2fd);
>  	struct kvm *kvm = s2fd->vcpu->kvm;
>  	struct kvm_pgtable *pgt;
>  	int ret;
> @@ -1922,7 +1926,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>  	 */
>  	if (fault->vma_pagesize == PAGE_SIZE &&
>  	    !(fault->force_pte || fault->s2_force_noncacheable)) {
> -		if (fault->fault_is_perm && fault->fault_granule > PAGE_SIZE) {
> +		if (fault_is_perm && fault->fault_granule > PAGE_SIZE) {
>  			fault->vma_pagesize = fault->fault_granule;
>  		} else {
>  			fault->vma_pagesize = transparent_hugepage_adjust(kvm, s2fd->memslot,
> @@ -1936,7 +1940,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>  		}
>  	}
>  
> -	if (!fault->fault_is_perm && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
> +	if (!fault_is_perm && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
>  		sanitise_mte_tags(kvm, fault->pfn, fault->vma_pagesize);
>  
>  	/*
> @@ -1944,7 +1948,7 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>  	 * permissions only if fault->vma_pagesize equals fault->fault_granule. Otherwise,
>  	 * kvm_pgtable_stage2_map() should be called to change block size.
>  	 */
> -	if (fault->fault_is_perm && fault->vma_pagesize == fault->fault_granule) {
> +	if (fault_is_perm && fault->vma_pagesize == fault->fault_granule) {
>  		/*
>  		 * Drop the SW bits in favour of those stored in the
>  		 * PTE, which will be preserved.
> @@ -1977,7 +1981,6 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>  	bool write_fault = kvm_is_write_fault(s2fd->vcpu);
>  	bool logging_active = memslot_is_logging(s2fd->memslot);
>  	struct kvm_s2_fault fault = {
> -		.fault_is_perm = perm_fault,
>  		.logging_active = logging_active,
>  		.force_pte = logging_active,
>  		.prot = KVM_PGTABLE_PROT_R,
> -- 
> 2.47.3
> 


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 09/17] KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info
  2026-03-16 17:54 ` [PATCH 09/17] KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info Marc Zyngier
  2026-03-17 12:51   ` Fuad Tabba
@ 2026-03-18 14:22   ` Joey Gouly
  2026-03-18 16:14     ` Fuad Tabba
  1 sibling, 1 reply; 47+ messages in thread
From: Joey Gouly @ 2026-03-18 14:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Suzuki K Poulose, Oliver Upton,
	Zenghui Yu, Fuad Tabba, Will Deacon, Quentin Perret

On Mon, Mar 16, 2026 at 05:54:42PM +0000, Marc Zyngier wrote:
> Mecanically extract a bunch of VMA-related fields from kvm_s2_fault
Mechanically
> and move them to a new kvm_s2_fault_vma_info structure.
> 
> This is not much, but it already allows us to define which functions
> can update this structure, and which ones are pure consumers of the
> data. Those in the latter camp are updated to take a const pointer
> to that structure.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kvm/mmu.c | 113 +++++++++++++++++++++++--------------------
>  1 file changed, 61 insertions(+), 52 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index abe239752c696..a5b0dd41560f6 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1710,20 +1710,23 @@ static short kvm_s2_resolve_vma_size(const struct kvm_s2_fault_desc *s2fd,
>  	return vma_shift;
>  }
>  
> +struct kvm_s2_fault_vma_info {
> +	unsigned long	mmu_seq;
> +	long		vma_pagesize;
> +	vm_flags_t	vm_flags;
> +	gfn_t		gfn;
> +	bool		mte_allowed;
> +	bool		is_vma_cacheable;
> +};
> +
>  struct kvm_s2_fault {
>  	bool writable;
> -	bool mte_allowed;
> -	bool is_vma_cacheable;
>  	bool s2_force_noncacheable;
> -	unsigned long mmu_seq;
> -	gfn_t gfn;
>  	kvm_pfn_t pfn;
>  	bool logging_active;
>  	bool force_pte;
> -	long vma_pagesize;
>  	enum kvm_pgtable_prot prot;
>  	struct page *page;
> -	vm_flags_t vm_flags;
>  };
>  
>  static bool kvm_s2_fault_is_perm(const struct kvm_s2_fault_desc *s2fd)
> @@ -1732,7 +1735,8 @@ static bool kvm_s2_fault_is_perm(const struct kvm_s2_fault_desc *s2fd)
>  }
>  
>  static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
> -				     struct kvm_s2_fault *fault)
> +				     struct kvm_s2_fault *fault,
> +				     struct kvm_s2_fault_vma_info *s2vi)
>  {
>  	struct vm_area_struct *vma;
>  	struct kvm *kvm = s2fd->vcpu->kvm;
> @@ -1745,20 +1749,20 @@ static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
>  		return -EFAULT;
>  	}
>  
> -	fault->vma_pagesize = BIT(kvm_s2_resolve_vma_size(s2fd, vma, &fault->force_pte));
> +	s2vi->vma_pagesize = BIT(kvm_s2_resolve_vma_size(s2fd, vma, &fault->force_pte));
>  
>  	/*
>  	 * Both the canonical IPA and fault IPA must be aligned to the
>  	 * mapping size to ensure we find the right PFN and lay down the
>  	 * mapping in the right place.
>  	 */
> -	fault->gfn = ALIGN_DOWN(s2fd->fault_ipa, fault->vma_pagesize) >> PAGE_SHIFT;
> +	s2vi->gfn = ALIGN_DOWN(s2fd->fault_ipa, s2vi->vma_pagesize) >> PAGE_SHIFT;
>  
> -	fault->mte_allowed = kvm_vma_mte_allowed(vma);
> +	s2vi->mte_allowed = kvm_vma_mte_allowed(vma);
>  
> -	fault->vm_flags = vma->vm_flags;
> +	s2vi->vm_flags = vma->vm_flags;
>  
> -	fault->is_vma_cacheable = kvm_vma_is_cacheable(vma);
> +	s2vi->is_vma_cacheable = kvm_vma_is_cacheable(vma);
>  
>  	/*
>  	 * Read mmu_invalidate_seq so that KVM can detect if the results of
> @@ -1768,39 +1772,40 @@ static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
>  	 * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
>  	 * with the smp_wmb() in kvm_mmu_invalidate_end().
>  	 */
> -	fault->mmu_seq = kvm->mmu_invalidate_seq;
> +	s2vi->mmu_seq = kvm->mmu_invalidate_seq;
>  	mmap_read_unlock(current->mm);
>  
>  	return 0;
>  }
>  
>  static gfn_t get_canonical_gfn(const struct kvm_s2_fault_desc *s2fd,
> -			       const struct kvm_s2_fault *fault)
> +			       const struct kvm_s2_fault_vma_info *s2vi)
>  {
>  	phys_addr_t ipa;
>  
>  	if (!s2fd->nested)
> -		return fault->gfn;
> +		return s2vi->gfn;
>  
>  	ipa = kvm_s2_trans_output(s2fd->nested);
> -	return ALIGN_DOWN(ipa, fault->vma_pagesize) >> PAGE_SHIFT;
> +	return ALIGN_DOWN(ipa, s2vi->vma_pagesize) >> PAGE_SHIFT;
>  }
>  
>  static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
> -				struct kvm_s2_fault *fault)
> +				struct kvm_s2_fault *fault,
> +				struct kvm_s2_fault_vma_info *s2vi)
>  {
>  	int ret;
>  
> -	ret = kvm_s2_fault_get_vma_info(s2fd, fault);
> +	ret = kvm_s2_fault_get_vma_info(s2fd, fault, s2vi);
>  	if (ret)
>  		return ret;
>  
> -	fault->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, fault),
> +	fault->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, s2vi),
>  				       kvm_is_write_fault(s2fd->vcpu) ? FOLL_WRITE : 0,
>  				       &fault->writable, &fault->page);
>  	if (unlikely(is_error_noslot_pfn(fault->pfn))) {
>  		if (fault->pfn == KVM_PFN_ERR_HWPOISON) {
> -			kvm_send_hwpoison_signal(s2fd->hva, __ffs(fault->vma_pagesize));
> +			kvm_send_hwpoison_signal(s2fd->hva, __ffs(s2vi->vma_pagesize));
>  			return 0;
>  		}
>  		return -EFAULT;
> @@ -1810,7 +1815,8 @@ static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
>  }
>  
>  static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
> -				     struct kvm_s2_fault *fault)
> +				     struct kvm_s2_fault *fault,
> +				     const struct kvm_s2_fault_vma_info *s2vi)
>  {
>  	struct kvm *kvm = s2fd->vcpu->kvm;
>  
> @@ -1818,8 +1824,8 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>  	 * Check if this is non-struct page memory PFN, and cannot support
>  	 * CMOs. It could potentially be unsafe to access as cacheable.
>  	 */
> -	if (fault->vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(fault->pfn)) {
> -		if (fault->is_vma_cacheable) {
> +	if (s2vi->vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(fault->pfn)) {
> +		if (s2vi->is_vma_cacheable) {
>  			/*
>  			 * Whilst the VMA owner expects cacheable mapping to this
>  			 * PFN, hardware also has to support the FWB and CACHE DIC
> @@ -1879,7 +1885,7 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>  		fault->prot |= KVM_PGTABLE_PROT_X;
>  
>  	if (fault->s2_force_noncacheable)
> -		fault->prot |= (fault->vm_flags & VM_ALLOW_ANY_UNCACHED) ?
> +		fault->prot |= (s2vi->vm_flags & VM_ALLOW_ANY_UNCACHED) ?
>  			       KVM_PGTABLE_PROT_NORMAL_NC : KVM_PGTABLE_PROT_DEVICE;
>  	else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
>  		fault->prot |= KVM_PGTABLE_PROT_X;
> @@ -1889,74 +1895,73 @@ static int kvm_s2_fault_compute_prot(const struct kvm_s2_fault_desc *s2fd,
>  
>  	if (!kvm_s2_fault_is_perm(s2fd) && !fault->s2_force_noncacheable && kvm_has_mte(kvm)) {
>  		/* Check the VMM hasn't introduced a new disallowed VMA */
> -		if (!fault->mte_allowed)
> +		if (!s2vi->mte_allowed)
>  			return -EFAULT;
>  	}
>  
>  	return 0;
>  }
>  
> -static phys_addr_t get_ipa(const struct kvm_s2_fault *fault)
> -{
> -	return gfn_to_gpa(fault->gfn);
> -}
> -
>  static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
> -			    struct kvm_s2_fault *fault, void *memcache)
> +			    struct kvm_s2_fault *fault,
> +			    const struct kvm_s2_fault_vma_info *s2vi, void *memcache)
>  {
> +	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
>  	struct kvm *kvm = s2fd->vcpu->kvm;
>  	struct kvm_pgtable *pgt;
>  	long perm_fault_granule;
> +	long mapping_size;
> +	gfn_t gfn;
>  	int ret;
> -	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
>  
>  	kvm_fault_lock(kvm);
>  	pgt = s2fd->vcpu->arch.hw_mmu->pgt;
>  	ret = -EAGAIN;
> -	if (mmu_invalidate_retry(kvm, fault->mmu_seq))
> +	if (mmu_invalidate_retry(kvm, s2vi->mmu_seq))
>  		goto out_unlock;
>  
>  	perm_fault_granule = (kvm_s2_fault_is_perm(s2fd) ?
>  			      kvm_vcpu_trap_get_perm_fault_granule(s2fd->vcpu) : 0);
> +	mapping_size = s2vi->vma_pagesize;
> +	gfn = s2vi->gfn;
>  
>  	/*
>  	 * If we are not forced to use fault->page mapping, check if we are

This find/replace mistake is from Fuad's patches, but maybe it can be fixed
here or in one of the earlier commits touching kvm_s2_fault_map().

  * If we are not forced to use page mapping, check if we are

>  	 * backed by a THP and thus use block mapping if possible.
>  	 */
> -	if (fault->vma_pagesize == PAGE_SIZE &&
> +	if (mapping_size == PAGE_SIZE &&
>  	    !(fault->force_pte || fault->s2_force_noncacheable)) {
>  		if (perm_fault_granule > PAGE_SIZE) {
> -			fault->vma_pagesize = perm_fault_granule;
> +			mapping_size = perm_fault_granule;
>  		} else {
> -			fault->vma_pagesize = transparent_hugepage_adjust(kvm, s2fd->memslot,
> -									  s2fd->hva, &fault->pfn,
> -									  &fault->gfn);
> -
> -			if (fault->vma_pagesize < 0) {
> -				ret = fault->vma_pagesize;
> +			mapping_size = transparent_hugepage_adjust(kvm, s2fd->memslot,
> +								   s2fd->hva, &fault->pfn,
> +								   &gfn);
> +			if (mapping_size < 0) {
> +				ret = mapping_size;
>  				goto out_unlock;
>  			}
>  		}
>  	}
>  
>  	if (!perm_fault_granule && !fault->s2_force_noncacheable && kvm_has_mte(kvm))
> -		sanitise_mte_tags(kvm, fault->pfn, fault->vma_pagesize);
> +		sanitise_mte_tags(kvm, fault->pfn, mapping_size);
>  
>  	/*
>  	 * Under the premise of getting a FSC_PERM fault, we just need to relax
> -	 * permissions only if vma_pagesize equals perm_fault_granule. Otherwise,
> +	 * permissions only if mapping_size equals perm_fault_granule. Otherwise,
>  	 * kvm_pgtable_stage2_map() should be called to change block size.
>  	 */
> -	if (fault->vma_pagesize == perm_fault_granule) {
> +	if (mapping_size == perm_fault_granule) {
>  		/*
>  		 * Drop the SW bits in favour of those stored in the
>  		 * PTE, which will be preserved.
>  		 */
>  		fault->prot &= ~KVM_NV_GUEST_MAP_SZ;
> -		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, get_ipa(fault),
> +		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, gfn_to_gpa(gfn),
>  								 fault->prot, flags);
>  	} else {
> -		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, get_ipa(fault), fault->vma_pagesize,
> +		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, gfn_to_gpa(gfn), mapping_size,
>  							 __pfn_to_phys(fault->pfn), fault->prot,
>  							 memcache, flags);
>  	}
> @@ -1965,9 +1970,12 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
>  	kvm_release_faultin_page(kvm, fault->page, !!ret, fault->writable);
>  	kvm_fault_unlock(kvm);
>  
> -	/* Mark the fault->page dirty only if the fault is handled successfully */
> -	if (fault->writable && !ret)
> -		mark_page_dirty_in_slot(kvm, s2fd->memslot, get_canonical_gfn(s2fd, fault));
> +	/* Mark the page dirty only if the fault is handled successfully */
> +	if (fault->writable && !ret) {
> +		phys_addr_t ipa = gfn_to_gpa(get_canonical_gfn(s2fd, s2vi));
> +		ipa &= ~(mapping_size - 1);
> +		mark_page_dirty_in_slot(kvm, s2fd->memslot, gpa_to_gfn(ipa));

I don't understand this change, why do we need to mask stuff now?

> +	}
>  
>  	if (ret != -EAGAIN)
>  		return ret;
> @@ -1978,6 +1986,7 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>  {
>  	bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
>  	bool logging_active = memslot_is_logging(s2fd->memslot);
> +	struct kvm_s2_fault_vma_info s2vi = {};
>  	struct kvm_s2_fault fault = {
>  		.logging_active = logging_active,
>  		.force_pte = logging_active,
> @@ -2002,17 +2011,17 @@ static int user_mem_abort(const struct kvm_s2_fault_desc *s2fd)
>  	 * Let's check if we will get back a huge fault->page backed by hugetlbfs, or
>  	 * get block mapping for device MMIO region.
>  	 */
> -	ret = kvm_s2_fault_pin_pfn(s2fd, &fault);
> +	ret = kvm_s2_fault_pin_pfn(s2fd, &fault, &s2vi);
>  	if (ret != 1)
>  		return ret;
>  
> -	ret = kvm_s2_fault_compute_prot(s2fd, &fault);
> +	ret = kvm_s2_fault_compute_prot(s2fd, &fault, &s2vi);
>  	if (ret) {
>  		kvm_release_page_unused(fault.page);
>  		return ret;
>  	}
>  
> -	return kvm_s2_fault_map(s2fd, &fault, memcache);
> +	return kvm_s2_fault_map(s2fd, &fault, &s2vi, memcache);
>  }
>  
>  /* Resolve the access fault by making the page young again. */

Thanks,
Joey



* Re: [PATCH 09/17] KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info
  2026-03-18 14:22   ` Joey Gouly
@ 2026-03-18 16:14     ` Fuad Tabba
  2026-03-21  9:50       ` Marc Zyngier
  0 siblings, 1 reply; 47+ messages in thread
From: Fuad Tabba @ 2026-03-18 16:14 UTC (permalink / raw)
  To: Joey Gouly
  Cc: Marc Zyngier, kvmarm, linux-arm-kernel, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Will Deacon, Quentin Perret

Hi Joey,

First, thanks for the reviews and the comments on my series. You're
right about my changes wrongly editing "page". I wanted it to be as
mechanical as possible to make it easy to review, but it ended up
being too mechanical.

<snip>

> > -     /* Mark the fault->page dirty only if the fault is handled successfully */
> > -     if (fault->writable && !ret)
> > -             mark_page_dirty_in_slot(kvm, s2fd->memslot, get_canonical_gfn(s2fd, fault));
> > +     /* Mark the page dirty only if the fault is handled successfully */
> > +     if (fault->writable && !ret) {
> > +             phys_addr_t ipa = gfn_to_gpa(get_canonical_gfn(s2fd, s2vi));
> > +             ipa &= ~(mapping_size - 1);
> > +             mark_page_dirty_in_slot(kvm, s2fd->memslot, gpa_to_gfn(ipa));
>
> I don't understand this change, why do we need to mask stuff now?

Let me see if _I_ understand it (Marc, please correct me if I'm wrong).

Before this patch, fault->gfn and fault->vma_pagesize were mutable,
and transparent_hugepage_adjust() modified both directly. In addition
to this being confusing (which gfn is this: the host/canonical or the
nested one?), it made it more difficult to separate the logic.

So, to mark a dirty page, it did this:
-             mark_page_dirty_in_slot(kvm, s2fd->memslot, get_canonical_gfn(s2fd, fault));

which relied on the old struct fault to calculate the canonical gfn
using the (magically) THP-adjusted fault->vma_pagesize.

Now that fault (or s2vi, its successor in this case) isn't mutable, we
need to get the canonical gfn using the host mapping size.

Cheers,
/fuad



* Re: [PATCH 09/17] KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info
  2026-03-18 16:14     ` Fuad Tabba
@ 2026-03-21  9:50       ` Marc Zyngier
  0 siblings, 0 replies; 47+ messages in thread
From: Marc Zyngier @ 2026-03-21  9:50 UTC (permalink / raw)
  To: Joey Gouly, Fuad Tabba
  Cc: kvmarm, linux-arm-kernel, Suzuki K Poulose, Oliver Upton,
	Zenghui Yu, Will Deacon, Quentin Perret

On Wed, 18 Mar 2026 16:14:19 +0000,
Fuad Tabba <tabba@google.com> wrote:
> 
> Hi Joey,
> 
> First, thanks for the reviews and the comments on my series. You're
> right about my changes wrongly editing "page". I wanted it to be as
> mechanical as possible to make it easy to review, but it ended up
> being too mechanical.
> 
> <snip>
> 
> > > -     /* Mark the fault->page dirty only if the fault is handled successfully */
> > > -     if (fault->writable && !ret)
> > > -             mark_page_dirty_in_slot(kvm, s2fd->memslot, get_canonical_gfn(s2fd, fault));
> > > +     /* Mark the page dirty only if the fault is handled successfully */
> > > +     if (fault->writable && !ret) {
> > > +             phys_addr_t ipa = gfn_to_gpa(get_canonical_gfn(s2fd, s2vi));
> > > +             ipa &= ~(mapping_size - 1);
> > > +             mark_page_dirty_in_slot(kvm, s2fd->memslot, gpa_to_gfn(ipa));
> >
> > I don't understand this change, why do we need to mask stuff now?
> 
> Let me see if _I_ understand it (Marc, please correct me if I'm wrong).
> 
> Before this patch, fault->gfn and fault->vma_pagesize were mutable,
> and transparent_hugepage_adjust() modified both directly. In addition
> to this being confusing (which gfn is this: the host /canonical or the
> nested one?), it made it more difficult to separate the logic.
> 
> So, to mark a dirty page, it did this:
> -             mark_page_dirty_in_slot(kvm, s2fd->memslot,
> get_canonical_gfn(s2fd, fault));
> 
> which relied on the old struct fault to calculate the canonical gfn
> using the (magically) THP adjusted fault->vma_pagesize.
> 
> Now that fault (or s2vi, its successor in this case) isn't mutable, we
> need to get the canonical gfn using the host mapping size.

It's exactly that, and it is slightly clearer if you look at how
mapping_size is updated:

	mapping_size = transparent_hugepage_adjust(kvm, s2fd->memslot,
						   s2fd->hva, &fault->pfn,
						   &gfn);

The faulting IPA is represented by 'gfn', and gets correctly updated
by the helper. But that doesn't adjust the 'canonical' IPA, which is
used for any memslot related update.

So if we need to call into mark_page_dirty_in_slot(), we really need
to pick the base of the region we are actually marking dirty, hence
the masking of the bottom bits.

Does this make sense? This is one of the areas where the constification
results in slightly more complicated code, as we can't update things
in place anymore.

Thanks,

	M.

-- 
Jazz isn't dead. It just smells funny.



end of thread, other threads:[~2026-03-21  9:50 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-16 17:54 [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Marc Zyngier
2026-03-16 17:54 ` [PATCH 01/17] KVM: arm64: Kill fault->ipa Marc Zyngier
2026-03-17  9:22   ` Fuad Tabba
2026-03-16 17:54 ` [PATCH 02/17] KVM: arm64: Make fault_ipa immutable Marc Zyngier
2026-03-17  9:38   ` Fuad Tabba
2026-03-16 17:54 ` [PATCH 03/17] KVM: arm64: Move fault context to const structure Marc Zyngier
2026-03-17 10:26   ` Fuad Tabba
2026-03-16 17:54 ` [PATCH 04/17] KVM: arm64: Replace fault_is_perm with a helper Marc Zyngier
2026-03-17 10:49   ` Fuad Tabba
2026-03-18 13:43   ` Joey Gouly
2026-03-16 17:54 ` [PATCH 05/17] KVM: arm64: Constrain fault_granule to kvm_s2_fault_map() Marc Zyngier
2026-03-17 11:04   ` Fuad Tabba
2026-03-16 17:54 ` [PATCH 06/17] KVM: arm64: Kill write_fault from kvm_s2_fault Marc Zyngier
2026-03-17 11:20   ` Fuad Tabba
2026-03-16 17:54 ` [PATCH 07/17] KVM: arm64: Kill exec_fault " Marc Zyngier
2026-03-17 11:44   ` Fuad Tabba
2026-03-16 17:54 ` [PATCH 08/17] KVM: arm64: Kill topup_memcache " Marc Zyngier
2026-03-17 12:12   ` Fuad Tabba
2026-03-17 13:31     ` Marc Zyngier
2026-03-16 17:54 ` [PATCH 09/17] KVM: arm64: Move VMA-related information to kvm_s2_fault_vma_info Marc Zyngier
2026-03-17 12:51   ` Fuad Tabba
2026-03-18 14:22   ` Joey Gouly
2026-03-18 16:14     ` Fuad Tabba
2026-03-21  9:50       ` Marc Zyngier
2026-03-16 17:54 ` [PATCH 10/17] KVM: arm64: Kill logging_active from kvm_s2_fault Marc Zyngier
2026-03-17 13:23   ` Fuad Tabba
2026-03-16 17:54 ` [PATCH 11/17] KVM: arm64: Restrict the scope of the 'writable' attribute Marc Zyngier
2026-03-17 13:55   ` Fuad Tabba
2026-03-16 17:54 ` [PATCH 12/17] KVM: arm64: Move kvm_s2_fault.{pfn,page} to kvm_s2_vma_info Marc Zyngier
2026-03-17 14:24   ` Fuad Tabba
2026-03-16 17:54 ` [PATCH 13/17] KVM: arm64: Replace force_pte with a max_map_size attribute Marc Zyngier
2026-03-17 15:08   ` Fuad Tabba
2026-03-16 17:54 ` [PATCH 14/17] KVM: arm64: Move device mapping management into kvm_s2_fault_pin_pfn() Marc Zyngier
2026-03-17 15:41   ` Fuad Tabba
2026-03-16 17:54 ` [PATCH 15/17] KVM: arm64: Directly expose mapping prot and kill kvm_s2_fault Marc Zyngier
2026-03-17 16:14   ` Fuad Tabba
2026-03-16 17:54 ` [PATCH 16/17] KVM: arm64: Simplify integration of adjust_nested_*_perms() Marc Zyngier
2026-03-17 16:45   ` Fuad Tabba
2026-03-16 17:54 ` [PATCH 17/17] KVM: arm64: Convert gmem_abort() to struct kvm_s2_fault_desc Marc Zyngier
2026-03-17 17:58   ` Fuad Tabba
2026-03-16 19:45 ` [PATCH 00/17] KVM: arm64: More user_mem_abort() rework Fuad Tabba
2026-03-16 20:26 ` Fuad Tabba
2026-03-16 20:33   ` Fuad Tabba
2026-03-17  8:23     ` Marc Zyngier
2026-03-17 17:50       ` Fuad Tabba
2026-03-17 18:02         ` Fuad Tabba
2026-03-17 17:03 ` Suzuki K Poulose
