public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] KVM: arm64: Fix protected mode handling of pages larger than 4kB
@ 2026-02-22 14:10 Marc Zyngier
  2026-02-22 17:58 ` Fuad Tabba
  2026-02-23 16:31 ` Marc Zyngier
  0 siblings, 2 replies; 5+ messages in thread
From: Marc Zyngier @ 2026-02-22 14:10 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Quentin Perret, Will Deacon, Fuad Tabba, Vincent Donnefort,
	Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu, stable

Since 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings"),
pKVM tracks the memory that has been mapped into a guest in a
side data structure. Crucially, it uses it to find out whether
a page has already been mapped, and therefore refuses to map it
twice. So far, so good.

However, this very patch completely breaks non-4kB page support,
with guests being unable to boot. The most obvious symptom is that
we take the same fault repeatedly, without making any forward progress.
A quick investigation shows that this is because of the above
rejection code.

As it turns out, there are multiple issues at play:

- while the HPFAR_EL2 register gives you the faulting IPA minus
  the bottom 12 bits, it will still give you the extra bits that
  are part of the page offset for anything larger than 4kB,
  even for a level-3 mapping

- pkvm_kvm_pgtable_stage2_map() assumes that the address passed
  as a parameter is aligned to the size of the intended mapping

- the faulting address is only aligned for a non-page mapping

When the planets are suitably aligned (pun intended), the guest
faults a page by accessing it past the bottom 4kB, and extra bits
get set in the HPFAR_EL2 register. If this results in a page mapping
(which is likely with large granule sizes), nothing aligns it further
down, and pkvm_mapping_iter_first() finds an intersection that
doesn't really exist. We assume this is a spurious fault and return
-EAGAIN. And again.

This doesn't hit outside of the protected code, as the page table
code always aligns the IPA down to a page boundary, hiding the issue
for everyone else.

Fix it by always aligning the IPA down to vma_pagesize, irrespective
of its value.

Fixes: 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
---
 arch/arm64/kvm/mmu.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8c5d259810b2f..aa587f2e28264 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1753,14 +1753,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 	/*
-	 * Both the canonical IPA and fault IPA must be hugepage-aligned to
-	 * ensure we find the right PFN and lay down the mapping in the right
-	 * place.
+	 * Both the canonical IPA and fault IPA must be aligned to the
+	 * mapping size to ensure we find the right PFN and lay down the
+	 * mapping in the right place.
 	 */
-	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) {
-		fault_ipa &= ~(vma_pagesize - 1);
-		ipa &= ~(vma_pagesize - 1);
-	}
+	fault_ipa &= ~(vma_pagesize - 1);
+	ipa &= ~(vma_pagesize - 1);
 
 	gfn = ipa >> PAGE_SHIFT;
 	mte_allowed = kvm_vma_mte_allowed(vma);
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] KVM: arm64: Fix protected mode handling of pages larger than 4kB
  2026-02-22 14:10 [PATCH] KVM: arm64: Fix protected mode handling of pages larger than 4kB Marc Zyngier
@ 2026-02-22 17:58 ` Fuad Tabba
  2026-02-22 18:54   ` Marc Zyngier
  2026-02-23 16:31 ` Marc Zyngier
  1 sibling, 1 reply; 5+ messages in thread
From: Fuad Tabba @ 2026-02-22 17:58 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Quentin Perret, Will Deacon,
	Vincent Donnefort, Joey Gouly, Suzuki K Poulose, Oliver Upton,
	Zenghui Yu, stable

Hi Marc,

On Sun, 22 Feb 2026 at 14:10, Marc Zyngier <maz@kernel.org> wrote:
>
> Since 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings"),
> pKVM tracks the memory that has been mapped into a guest in a
> side data structure. Crucially, it uses it to find out whether
> a page has already been mapped, and therefore refuses to map it
> twice. So far, so good.
>
> However, this very patch completely breaks non-4kB page support,
> with guests being unable to boot. The most obvious symptom is that
> we take the same fault repeatedly, and not making forward progress.
> A quick investigation shows that this is because of the above
> rejection code.
>
> As it turns out, there are multiple issues at play:
>
> - while the HPFAR_EL2 register gives you the faulting IPA minus
>   the bottom 12 bits, it will still give you the extra bits that
>   are part of the page offset for anything larger than 4kB,
>   even for a level-3 mapping

Matches the ARM ARM.

> - pkvm_kvm_pgtable_stage2_map() assumes that the address passed
>   as a parameter is aligned to the size of the intended mapping

nit: pkvm_kvm_pgtable_stage2_map() -> kvm_pgtable_stage2_map()

> - the faulting address is only aligned for a non-page mapping
>
> When the planets are suitably aligned (pun intended), the guest
> faults a page by accessing it past the bottom 4kB, and extra bits
> get set in the HPFAR_EL2 register. If this results in a page mapping
> (which is likely with large granule sizes), nothing aligns it further
> down, and pkvm_mapping_iter_first() finds an intersection that
> doesn't really exist. We assume this is a spurious fault and return
> -EAGAIN. And again.
>
> This doesn't hit outside of the protected code, as the page table
> code always aligns the IPA down to a page boundary, hiding the issue
> for everyone else.
>
> Fix it by always forcing the alignment on vma_pagesize, irrespective
> of the value of vma_pagesize.
>
> Fixes: 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings")
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Cc: stable@vger.kernel.org
> ---
>  arch/arm64/kvm/mmu.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 8c5d259810b2f..aa587f2e28264 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1753,14 +1753,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>         }
>
>         /*
> -        * Both the canonical IPA and fault IPA must be hugepage-aligned to
> -        * ensure we find the right PFN and lay down the mapping in the right
> -        * place.
> +        * Both the canonical IPA and fault IPA must be aligned to the
> +        * mapping size to ensure we find the right PFN and lay down the
> +        * mapping in the right place.
>          */
> -       if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) {
> -               fault_ipa &= ~(vma_pagesize - 1);
> -               ipa &= ~(vma_pagesize - 1);
> -       }
> +       fault_ipa &= ~(vma_pagesize - 1);
> +       ipa &= ~(vma_pagesize - 1);

nit: Since we're changing this code anyway, should we use the ALIGN
macros instead?

Reviewed-by: Fuad Tabba <tabba@google.com>

and using 4, 16, and 64KB pages:

Tested-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad


>
>         gfn = ipa >> PAGE_SHIFT;
>         mte_allowed = kvm_vma_mte_allowed(vma);
> --
> 2.47.3
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] KVM: arm64: Fix protected mode handling of pages larger than 4kB
  2026-02-22 17:58 ` Fuad Tabba
@ 2026-02-22 18:54   ` Marc Zyngier
  2026-02-22 20:28     ` Fuad Tabba
  0 siblings, 1 reply; 5+ messages in thread
From: Marc Zyngier @ 2026-02-22 18:54 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: kvmarm, linux-arm-kernel, Quentin Perret, Will Deacon,
	Vincent Donnefort, Joey Gouly, Suzuki K Poulose, Oliver Upton,
	Zenghui Yu, stable

Hi Fuad,

On Sun, 22 Feb 2026 17:58:00 +0000,
Fuad Tabba <tabba@google.com> wrote:
> 
> Hi Marc,
> 
> On Sun, 22 Feb 2026 at 14:10, Marc Zyngier <maz@kernel.org> wrote:
> >
> > Since 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings"),
> > pKVM tracks the memory that has been mapped into a guest in a
> > side data structure. Crucially, it uses it to find out whether
> > a page has already been mapped, and therefore refuses to map it
> > twice. So far, so good.
> >
> > However, this very patch completely breaks non-4kB page support,
> > with guests being unable to boot. The most obvious symptom is that
> > we take the same fault repeatedly, and not making forward progress.
> > A quick investigation shows that this is because of the above
> > rejection code.
> >
> > As it turns out, there are multiple issues at play:
> >
> > - while the HPFAR_EL2 register gives you the faulting IPA minus
> >   the bottom 12 bits, it will still give you the extra bits that
> >   are part of the page offset for anything larger than 4kB,
> >   even for a level-3 mapping
> 
> Matches the ARM ARM.
> 
> > - pkvm_kvm_pgtable_stage2_map() assumes that the address passed
> >   as a parameter is aligned to the size of the intended mapping
> 
> nit: pkvm_kvm_pgtable_stage2_map() -> kvm_pgtable_stage2_map()

Actually, that's pkvm_pgtable_stage2_map(). kvm_pgtable_stage2_map()
itself isn't affected.

> 
> > - the faulting address is only aligned for a non-page mapping
> >
> > When the planets are suitably aligned (pun intended), the guest
> > faults a page by accessing it past the bottom 4kB, and extra bits
> > get set in the HPFAR_EL2 register. If this results in a page mapping
> > (which is likely with large granule sizes), nothing aligns it further
> > down, and pkvm_mapping_iter_first() finds an intersection that
> > doesn't really exist. We assume this is a spurious fault and return
> > -EAGAIN. And again.
> >
> > This doesn't hit outside of the protected code, as the page table
> > code always aligns the IPA down to a page boundary, hiding the issue
> > for everyone else.
> >
> > Fix it by always forcing the alignment on vma_pagesize, irrespective
> > of the value of vma_pagesize.
> >
> > Fixes: 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings")
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > Cc: stable@vger.kernel.org
> > ---
> >  arch/arm64/kvm/mmu.c | 12 +++++-------
> >  1 file changed, 5 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 8c5d259810b2f..aa587f2e28264 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1753,14 +1753,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >         }
> >
> >         /*
> > -        * Both the canonical IPA and fault IPA must be hugepage-aligned to
> > -        * ensure we find the right PFN and lay down the mapping in the right
> > -        * place.
> > +        * Both the canonical IPA and fault IPA must be aligned to the
> > +        * mapping size to ensure we find the right PFN and lay down the
> > +        * mapping in the right place.
> >          */
> > -       if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) {
> > -               fault_ipa &= ~(vma_pagesize - 1);
> > -               ipa &= ~(vma_pagesize - 1);
> > -       }
> > +       fault_ipa &= ~(vma_pagesize - 1);
> > +       ipa &= ~(vma_pagesize - 1);
> 
> nit: Since we're changing this code anyway, should we use the ALIGN
> macros instead?

That'd be ALIGN_DOWN() then, as ALIGN() really is ALIGN_UP(), and
that'd be counter-productive.  Something like:

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index aa587f2e28264..3952415c4f83b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1757,8 +1757,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * mapping size to ensure we find the right PFN and lay down the
 	 * mapping in the right place.
 	 */
-	fault_ipa &= ~(vma_pagesize - 1);
-	ipa &= ~(vma_pagesize - 1);
+	fault_ipa = ALIGN_DOWN(fault_ipa, vma_pagesize);
+	ipa = ALIGN_DOWN(ipa, vma_pagesize);
 
 	gfn = ipa >> PAGE_SHIFT;
 	mte_allowed = kvm_vma_mte_allowed(vma);

> Reviewed-by: Fuad Tabba <tabba@google.com>
> 
> and using 4, 16, and 64KB pages:
> 
> Tested-by: Fuad Tabba <tabba@google.com>

Ah, great! I couldn't be bothered with 64kB, and only used 16kB in NV
to debug quickly and then bare-metal to verify the fix.

Thanks!

	M.

-- 
Jazz isn't dead. It just smells funny.

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] KVM: arm64: Fix protected mode handling of pages larger than 4kB
  2026-02-22 18:54   ` Marc Zyngier
@ 2026-02-22 20:28     ` Fuad Tabba
  0 siblings, 0 replies; 5+ messages in thread
From: Fuad Tabba @ 2026-02-22 20:28 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, Quentin Perret, Will Deacon,
	Vincent Donnefort, Joey Gouly, Suzuki K Poulose, Oliver Upton,
	Zenghui Yu, stable

On Sun, 22 Feb 2026 at 18:55, Marc Zyngier <maz@kernel.org> wrote:
>
> Hi Fuad,
>
> On Sun, 22 Feb 2026 17:58:00 +0000,
> Fuad Tabba <tabba@google.com> wrote:
> >
> > Hi Marc,
> >
> > On Sun, 22 Feb 2026 at 14:10, Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > Since 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings"),
> > > pKVM tracks the memory that has been mapped into a guest in a
> > > side data structure. Crucially, it uses it to find out whether
> > > a page has already been mapped, and therefore refuses to map it
> > > twice. So far, so good.
> > >
> > > However, this very patch completely breaks non-4kB page support,
> > > with guests being unable to boot. The most obvious symptom is that
> > > we take the same fault repeatedly, and not making forward progress.
> > > A quick investigation shows that this is because of the above
> > > rejection code.
> > >
> > > As it turns out, there are multiple issues at play:
> > >
> > > - while the HPFAR_EL2 register gives you the faulting IPA minus
> > >   the bottom 12 bits, it will still give you the extra bits that
> > >   are part of the page offset for anything larger than 4kB,
> > >   even for a level-3 mapping
> >
> > Matches the ARM ARM.
> >
> > > - pkvm_kvm_pgtable_stage2_map() assumes that the address passed
> > >   as a parameter is aligned to the size of the intended mapping
> >
> > nit: pkvm_kvm_pgtable_stage2_map() -> kvm_pgtable_stage2_map()
>
> Actually, that's pkvm_pgtable_stage2_map(). kvm_pgtable_stage2_map()
> itself isn't affected.

Right, I meant to remove the kvm, not the pkvm. It is indeed
pkvm_pgtable_stage2_map().

> >
> > > - the faulting address is only aligned for a non-page mapping
> > >
> > > When the planets are suitably aligned (pun intended), the guest
> > > faults a page by accessing it past the bottom 4kB, and extra bits
> > > get set in the HPFAR_EL2 register. If this results in a page mapping
> > > (which is likely with large granule sizes), nothing aligns it further
> > > down, and pkvm_mapping_iter_first() finds an intersection that
> > > doesn't really exist. We assume this is a spurious fault and return
> > > -EAGAIN. And again.
> > >
> > > This doesn't hit outside of the protected code, as the page table
> > > code always aligns the IPA down to a page boundary, hiding the issue
> > > for everyone else.
> > >
> > > Fix it by always forcing the alignment on vma_pagesize, irrespective
> > > of the value of vma_pagesize.
> > >
> > > Fixes: 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings")
> > > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > > Cc: stable@vger.kernel.org
> > > ---
> > >  arch/arm64/kvm/mmu.c | 12 +++++-------
> > >  1 file changed, 5 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > index 8c5d259810b2f..aa587f2e28264 100644
> > > --- a/arch/arm64/kvm/mmu.c
> > > +++ b/arch/arm64/kvm/mmu.c
> > > @@ -1753,14 +1753,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > >         }
> > >
> > >         /*
> > > -        * Both the canonical IPA and fault IPA must be hugepage-aligned to
> > > -        * ensure we find the right PFN and lay down the mapping in the right
> > > -        * place.
> > > +        * Both the canonical IPA and fault IPA must be aligned to the
> > > +        * mapping size to ensure we find the right PFN and lay down the
> > > +        * mapping in the right place.
> > >          */
> > > -       if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) {
> > > -               fault_ipa &= ~(vma_pagesize - 1);
> > > -               ipa &= ~(vma_pagesize - 1);
> > > -       }
> > > +       fault_ipa &= ~(vma_pagesize - 1);
> > > +       ipa &= ~(vma_pagesize - 1);
> >
> > nit: Since we're changing this code anyway, should we use the ALIGN
> > macros instead?
>
> That'd be ALIGN_DOWN() then, as ALIGN() really is ALIGN_UP(), and
> that'd be counter-productive.  Something like:
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index aa587f2e28264..3952415c4f83b 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1757,8 +1757,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>          * mapping size to ensure we find the right PFN and lay down the
>          * mapping in the right place.
>          */
> -       fault_ipa &= ~(vma_pagesize - 1);
> -       ipa &= ~(vma_pagesize - 1);
> +       fault_ipa = ALIGN_DOWN(fault_ipa, vma_pagesize);
> +       ipa = ALIGN_DOWN(ipa, vma_pagesize);

Yup, that's what I had in mind.

>         gfn = ipa >> PAGE_SHIFT;
>         mte_allowed = kvm_vma_mte_allowed(vma);
>
> > Reviewed-by: Fuad Tabba <tabba@google.com>
> >
> > and using 4, 16, and 64KB pages:
> >
> > Tested-by: Fuad Tabba <tabba@google.com>
>
> Ah, great! I couldn't be bothered with 64kB, and only used 16kB in NV
> to debug quickly and then bare-metal to verify the fix.
>
> Thanks!

Thank you!
/fuad

>
>         M.
>
> --
> Jazz isn't dead. It just smells funny.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] KVM: arm64: Fix protected mode handling of pages larger than 4kB
  2026-02-22 14:10 [PATCH] KVM: arm64: Fix protected mode handling of pages larger than 4kB Marc Zyngier
  2026-02-22 17:58 ` Fuad Tabba
@ 2026-02-23 16:31 ` Marc Zyngier
  1 sibling, 0 replies; 5+ messages in thread
From: Marc Zyngier @ 2026-02-23 16:31 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, Marc Zyngier
  Cc: Quentin Perret, Will Deacon, Fuad Tabba, Vincent Donnefort,
	Joey Gouly, Suzuki K Poulose, Oliver Upton, Zenghui Yu, stable

On Sun, 22 Feb 2026 14:10:00 +0000, Marc Zyngier wrote:
> Since 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings"),
> pKVM tracks the memory that has been mapped into a guest in a
> side data structure. Crucially, it uses it to find out whether
> a page has already been mapped, and therefore refuses to map it
> twice. So far, so good.
> 
> However, this very patch completely breaks non-4kB page support,
> with guests being unable to boot. The most obvious symptom is that
> we take the same fault repeatedly, and not making forward progress.
> A quick investigation shows that this is because of the above
> rejection code.
> 
> [...]

Applied to fixes, thanks!

[1/1] KVM: arm64: Fix protected mode handling of pages larger than 4kB
      commit: 08f97454b7fa39bfcf82524955c771d2d693d6fe

Cheers,

	M.
-- 
Without deviation from the norm, progress is not possible.



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-02-23 16:31 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-22 14:10 [PATCH] KVM: arm64: Fix protected mode handling of pages larger than 4kB Marc Zyngier
2026-02-22 17:58 ` Fuad Tabba
2026-02-22 18:54   ` Marc Zyngier
2026-02-22 20:28     ` Fuad Tabba
2026-02-23 16:31 ` Marc Zyngier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox