* [PATCH] KVM: arm64: Handle permission faults with guest_memfd
From: Alexandru Elisei @ 2026-04-30 13:23 UTC
To: maz, oupton, joey.gouly, suzuki.poulose, yuzenghui,
linux-arm-kernel, kvmarm, tabba
Cc: mark.rutland
gmem_abort() calls kvm_pgtable_stage2_map() to make changes to stage 2. It
does this both to relax permissions on an existing mapping and to install a
missing mapping.
kvm_pgtable_stage2_map() doesn't make changes to stage 2 if there is an
existing, valid entry and the new entry modifies only the permissions.
This is checked in:
kvm_pgtable_stage2_map()
stage2_map_walk_leaf()
stage2_map_walker_try_leaf()
stage2_pte_needs_update()
and if only the permissions differ, kvm_pgtable_stage2_map() returns
-EAGAIN and KVM returns to the guest to replay the instruction. The
assumption is that a concurrent fault on a different VCPU already mapped
the faulting IPA, and replaying the instruction will either succeed, or
cause a permission fault, which should be handled with
kvm_pgtable_stage2_relax_perms().
gmem_abort(), on a read or write fault on a system without DIC (instruction
cache invalidation required for data to instruction coherence), installs a
valid entry with read and write permissions, but without executable
permissions. On an execution fault on the same page, gmem_abort() attempts
to relax the permissions to allow execution, but calls
kvm_pgtable_stage2_map() to change the existing, valid entry.
kvm_pgtable_stage2_map() returns -EAGAIN and KVM resumes execution from the
faulting instruction, which leads to an infinite loop of permission faults
on the same instruction.
Allow the guest to make progress by using kvm_pgtable_stage2_relax_perms()
to relax permissions.
Fixes: a7b57e099592 ("KVM: arm64: Handle guest_memfd-backed guest page faults")
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
Lightly tested on an Orion O6 board, without pkvm, and without nested
virtualisation.
Doesn't apply cleanly on top of a7b57e099592; I can send a patch for that if
needed.
arch/arm64/kvm/mmu.c | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d089c107d9b7..dff58de7703b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1576,6 +1576,7 @@ struct kvm_s2_fault_desc {
static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
{
bool write_fault, exec_fault;
+ bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
struct kvm_pgtable *pgt = s2fd->vcpu->arch.hw_mmu->pgt;
@@ -1587,10 +1588,12 @@ static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
gfn_t gfn;
int ret;
- memcache = get_mmu_memcache(s2fd->vcpu);
- ret = topup_mmu_memcache(s2fd->vcpu, memcache);
- if (ret)
- return ret;
+ if (!perm_fault) {
+ memcache = get_mmu_memcache(s2fd->vcpu);
+ ret = topup_mmu_memcache(s2fd->vcpu, memcache);
+ if (ret)
+ return ret;
+ }
if (s2fd->nested)
gfn = kvm_s2_trans_output(s2fd->nested) >> PAGE_SHIFT;
@@ -1631,9 +1634,16 @@ static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
goto out_unlock;
}
- ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, s2fd->fault_ipa, PAGE_SIZE,
- __pfn_to_phys(pfn), prot,
- memcache, flags);
+ if (perm_fault) {
+ /* Preserve the software bits from the existing table entry. */
+ prot &= ~KVM_NV_GUEST_MAP_SZ;
+ ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms(pgt, s2fd->fault_ipa,
+ prot, flags));
+ } else {
+ ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, s2fd->fault_ipa, PAGE_SIZE,
+ __pfn_to_phys(pfn), prot,
+ memcache, flags);
+ }
out_unlock:
kvm_release_faultin_page(kvm, page, !!ret, prot & KVM_PGTABLE_PROT_W);
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
--
2.54.0
* Re: [PATCH] KVM: arm64: Handle permission faults with guest_memfd
From: Fuad Tabba @ 2026-04-30 14:48 UTC
To: Alexandru Elisei
Cc: maz, oupton, joey.gouly, suzuki.poulose, yuzenghui,
linux-arm-kernel, kvmarm, mark.rutland
Hi Alexandru,
On Thu, 30 Apr 2026 at 14:24, Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> gmem_abort() calls kvm_pgtable_stage2_map() to make changes to stage 2. It
> does this for both relaxing permissions on an existing mapping and to
> install a missing mapping.
>
> kvm_pgtable_stage2_map() doesn't make changes to stage 2 if there is an
> existing, valid entry and the new entry modifies only the permissions.
> This is checked in:
>
> kvm_pgtable_stage2_map()
> stage2_map_walk_leaf()
> stage2_map_walker_try_leaf()
> stage2_pte_needs_update()
>
> and if only the permissions differ, kvm_pgtable_stage2_map() returns
> -EAGAIN and KVM returns to the guest to replay the instruction. The
> assumption is that a concurrent fault on a different VCPU already mapped
> the faulting IPA, and replaying the instruction will either succeed, or
> cause a permission fault, which should be handled with
> kvm_pgtable_stage2_relax_perms().
>
> gmem_abort(), on a read or write fault on a system without DIC (instruction
> cache invalidation required for data to instruction coherence), installs a
> valid entry with read and write permissions, but without executable
> permissions. On an execution fault on the same page, gmem_abort() attempts
> to relax the permissions to allow execution, but calls
> kvm_pgtable_stage2_map() to change the existing, valid, entry.
> kvm_pgtable_stage2_map() returns -EAGAIN and KVM resumes execution from the
> faulting instruction, which leads to an infinite loop of permission faults
> on the same instruction.
>
> Allow the guest to make progress by using kvm_pgtable_stage2_relax_perms()
> to relax permissions.
>
> Fixes: a7b57e099592 ("KVM: arm64: Handle guest_memfd-backed guest page faults")
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>
> Lightly tested on an Orion O6 board, without pkvm, and without nested
> virtualisation.
>
> Doesn't apply cleanly on top of a7b57e099592, I can send a patch for that if
> needed.
>
> arch/arm64/kvm/mmu.c | 24 +++++++++++++++++-------
> 1 file changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index d089c107d9b7..dff58de7703b 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1576,6 +1576,7 @@ struct kvm_s2_fault_desc {
> static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
> {
> bool write_fault, exec_fault;
> + bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
> enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
> enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> struct kvm_pgtable *pgt = s2fd->vcpu->arch.hw_mmu->pgt;
> @@ -1587,10 +1588,12 @@ static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
> gfn_t gfn;
> int ret;
>
> - memcache = get_mmu_memcache(s2fd->vcpu);
> - ret = topup_mmu_memcache(s2fd->vcpu, memcache);
> - if (ret)
> - return ret;
> + if (!perm_fault) {
> + memcache = get_mmu_memcache(s2fd->vcpu);
> + ret = topup_mmu_memcache(s2fd->vcpu, memcache);
> + if (ret)
> + return ret;
> + }
memcache is now uninitialized when perm_fault, and only read in the
!perm_fault branch below. It's safe today, but initializing it to NULL
would silence any defensive analyzer and keep things robust against
future churn.
>
> if (s2fd->nested)
> gfn = kvm_s2_trans_output(s2fd->nested) >> PAGE_SHIFT;
> @@ -1631,9 +1634,16 @@ static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
> goto out_unlock;
> }
>
> - ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, s2fd->fault_ipa, PAGE_SIZE,
> - __pfn_to_phys(pfn), prot,
> - memcache, flags);
> + if (perm_fault) {
> + /* Preserve the software bits from the existing table entry. */
> + prot &= ~KVM_NV_GUEST_MAP_SZ;
Can you please phrase it the same way as the existing comment in
user_mem_abort, as that phrasing more precisely describes the
rationale? i.e.,
/*
* Drop the SW bits in favour of those stored in the
* PTE, which will be preserved.
*/
> + ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms(pgt, s2fd->fault_ipa,
> + prot, flags));
nit: args wrapped inside KVM_PGT_FN() is inconsistent with the rest of
the file (and with the else-branch immediately below). Every other
call site uses name-only inside the macro. Both expand the same way,
but it's better to stick to the convention.
> + } else {
> + ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, s2fd->fault_ipa, PAGE_SIZE,
> + __pfn_to_phys(pfn), prot,
> + memcache, flags);
> + }
>
> out_unlock:
> kvm_release_faultin_page(kvm, page, !!ret, prot & KVM_PGTABLE_PROT_W);
With these fixed:
Reviewed-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
>
> base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
> --
> 2.54.0
>
* Re: [PATCH] KVM: arm64: Handle permission faults with guest_memfd
From: Alexandru Elisei @ 2026-05-01 14:18 UTC
To: Fuad Tabba
Cc: maz, oupton, joey.gouly, suzuki.poulose, yuzenghui,
linux-arm-kernel, kvmarm, mark.rutland
Hi Fuad,
On Thu, Apr 30, 2026 at 03:48:21PM +0100, Fuad Tabba wrote:
> Hi Alexandru,
>
> On Thu, 30 Apr 2026 at 14:24, Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> >
> > gmem_abort() calls kvm_pgtable_stage2_map() to make changes to stage 2. It
> > does this for both relaxing permissions on an existing mapping and to
> > install a missing mapping.
> >
> > kvm_pgtable_stage2_map() doesn't make changes to stage 2 if there is an
> > existing, valid entry and the new entry modifies only the permissions.
> > This is checked in:
> >
> > kvm_pgtable_stage2_map()
> > stage2_map_walk_leaf()
> > stage2_map_walker_try_leaf()
> > stage2_pte_needs_update()
> >
> > and if only the permissions differ, kvm_pgtable_stage2_map() returns
> > -EAGAIN and KVM returns to the guest to replay the instruction. The
> > assumption is that a concurrent fault on a different VCPU already mapped
> > the faulting IPA, and replaying the instruction will either succeed, or
> > cause a permission fault, which should be handled with
> > kvm_pgtable_stage2_relax_perms().
> >
> > gmem_abort(), on a read or write fault on a system without DIC (instruction
> > cache invalidation required for data to instruction coherence), installs a
> > valid entry with read and write permissions, but without executable
> > permissions. On an execution fault on the same page, gmem_abort() attempts
> > to relax the permissions to allow execution, but calls
> > kvm_pgtable_stage2_map() to change the existing, valid, entry.
> > kvm_pgtable_stage2_map() returns -EAGAIN and KVM resumes execution from the
> > faulting instruction, which leads to an infinite loop of permission faults
> > on the same instruction.
> >
> > Allow the guest to make progress by using kvm_pgtable_stage2_relax_perms()
> > to relax permissions.
> >
> > Fixes: a7b57e099592 ("KVM: arm64: Handle guest_memfd-backed guest page faults")
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> >
> > Lightly tested on an Orion O6 board, without pkvm, and without nested
> > virtualisation.
> >
> > Doesn't apply cleanly on top of a7b57e099592, I can send a patch for that if
> > needed.
> >
> > arch/arm64/kvm/mmu.c | 24 +++++++++++++++++-------
> > 1 file changed, 17 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index d089c107d9b7..dff58de7703b 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1576,6 +1576,7 @@ struct kvm_s2_fault_desc {
> > static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
> > {
> > bool write_fault, exec_fault;
> > + bool perm_fault = kvm_vcpu_trap_is_permission_fault(s2fd->vcpu);
> > enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
> > enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> > struct kvm_pgtable *pgt = s2fd->vcpu->arch.hw_mmu->pgt;
> > @@ -1587,10 +1588,12 @@ static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
> > gfn_t gfn;
> > int ret;
> >
> > - memcache = get_mmu_memcache(s2fd->vcpu);
> > - ret = topup_mmu_memcache(s2fd->vcpu, memcache);
> > - if (ret)
> > - return ret;
> > + if (!perm_fault) {
> > + memcache = get_mmu_memcache(s2fd->vcpu);
> > + ret = topup_mmu_memcache(s2fd->vcpu, memcache);
> > + if (ret)
> > + return ret;
> > + }
>
> memcache is now uninitialized when perm_fault, and only read in the
> !perm_fault branch below. It's safe today, but initializing it to NULL
> would silence any defensive analyzer and keep things robust against
> future churn.
Sure, I'll initialise memcache to NULL.
>
> >
> > if (s2fd->nested)
> > gfn = kvm_s2_trans_output(s2fd->nested) >> PAGE_SHIFT;
> > @@ -1631,9 +1634,16 @@ static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
> > goto out_unlock;
> > }
> >
> > - ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, s2fd->fault_ipa, PAGE_SIZE,
> > - __pfn_to_phys(pfn), prot,
> > - memcache, flags);
> > + if (perm_fault) {
> > + /* Preserve the software bits from the existing table entry. */
> > + prot &= ~KVM_NV_GUEST_MAP_SZ;
>
> Can you please phrase it the same way as the existing comment in
> user_mem_abort, as that phrasing more precisely describes the
> rationale? i.e.,
Of course.
>
> /*
> * Drop the SW bits in favour of those stored in the
> * PTE, which will be preserved.
> */
>
> > + ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms(pgt, s2fd->fault_ipa,
> > + prot, flags));
>
> nit: args wrapped inside KVM_PGT_FN() is inconsistent with the rest of
> the file (and with the else-branch immediately below). Every other
> call site uses name-only inside the macro. Both expand the same way,
> but it's better to stick to the convention.
This was a typo on my part, thanks for catching it.
>
> > + } else {
> > + ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, s2fd->fault_ipa, PAGE_SIZE,
> > + __pfn_to_phys(pfn), prot,
> > + memcache, flags);
> > + }
> >
> > out_unlock:
> > kvm_release_faultin_page(kvm, page, !!ret, prot & KVM_PGTABLE_PROT_W);
>
> With these fixed:
>
> Reviewed-by: Fuad Tabba <tabba@google.com>
Thanks!
Alex