* [PATCH] KVM: arm64: Adjust range correctly during host stage-2 faults
@ 2025-06-25 10:55 Quentin Perret
From: Quentin Perret @ 2025-06-25 10:55 UTC (permalink / raw)
To: Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
Zenghui Yu, Catalin Marinas, Will Deacon
Cc: Quentin Perret, linux-arm-kernel, kvmarm, linux-kernel
host_stage2_adjust_range() tries to find the largest block mapping that
fits within a memory or mmio region (represented by a kvm_mem_range in
this function) during host stage-2 faults under pKVM. To do so, it walks
the host stage-2 page-table, finds the faulting PTE and its level, and
then progressively increments the level until it finds a granule of the
appropriate size. However, the condition in the loop implementing the
above is broken as it checks kvm_level_supports_block_mapping() for the
next level instead of the current, so pKVM may attempt to map a region
larger than can be covered with a single block.

This is not a security problem and is quite rare in practice (the
kvm_mem_range check usually forces host_stage2_adjust_range() to choose a
smaller granule), but this is clearly not the expected behaviour.

Refactor the loop to fix the bug and improve readability.

Fixes: c4f0935e4d95 ("KVM: arm64: Optimize host memory aborts")
Signed-off-by: Quentin Perret <qperret@google.com>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 95d7534c9679..8957734d6183 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -479,6 +479,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
 {
 	struct kvm_mem_range cur;
 	kvm_pte_t pte;
+	u64 granule;
 	s8 level;
 	int ret;
 
@@ -496,18 +497,21 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
 		return -EPERM;
 	}
 
-	do {
-		u64 granule = kvm_granule_size(level);
+	for (; level <= KVM_PGTABLE_LAST_LEVEL; level++) {
+		if (!kvm_level_supports_block_mapping(level))
+			continue;
+		granule = kvm_granule_size(level);
 		cur.start = ALIGN_DOWN(addr, granule);
 		cur.end = cur.start + granule;
-		level++;
-	} while ((level <= KVM_PGTABLE_LAST_LEVEL) &&
-		 !(kvm_level_supports_block_mapping(level) &&
-		   range_included(&cur, range)));
+		if (!range_included(&cur, range))
+			continue;
+		*range = cur;
+		return 0;
+	}
 
-	*range = cur;
+	WARN_ON(1);
 
-	return 0;
+	return -EINVAL;
 }
 
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
--
2.50.0.714.g196bf9f422-goog
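The behavioural difference between the old do/while condition and the new for loop can be checked with a small user-space model. Everything below is an illustrative stand-in, not the kernel's code: the granule sizes and the set of levels supporting block mappings assume 4K pages, and `struct mem_range` mimics `kvm_mem_range`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LAST_LEVEL 3	/* stand-in for KVM_PGTABLE_LAST_LEVEL with 4K pages */

struct mem_range { uint64_t start, end; };	/* end is exclusive */

/* level 3 -> 4K, 2 -> 2M, 1 -> 1G, 0 -> 512G (illustrative) */
static uint64_t granule_size(int level)
{
	return 1ULL << (12 + 9 * (LAST_LEVEL - level));
}

/* Assume level 0 granules are too large to map as a single block. */
static bool level_supports_block_mapping(int level)
{
	return level >= 1;
}

static bool range_included(const struct mem_range *child,
			   const struct mem_range *parent)
{
	return parent->start <= child->start && child->end <= parent->end;
}

/* Old (buggy) loop: tests block-mapping support for the *next* level,
 * so it can exit with a granule computed at an unsupported level. */
static int adjust_old(uint64_t addr, int level, struct mem_range *range)
{
	struct mem_range cur;
	uint64_t granule;

	do {
		granule = granule_size(level);
		cur.start = addr & ~(granule - 1);
		cur.end = cur.start + granule;
		level++;
	} while (level <= LAST_LEVEL &&
		 !(level_supports_block_mapping(level) &&
		   range_included(&cur, range)));

	*range = cur;
	return 0;
}

/* Fixed loop: only levels that actually support block mappings are
 * ever used to compute the candidate granule. */
static int adjust_new(uint64_t addr, int level, struct mem_range *range)
{
	struct mem_range cur;
	uint64_t granule;

	for (; level <= LAST_LEVEL; level++) {
		if (!level_supports_block_mapping(level))
			continue;
		granule = granule_size(level);
		cur.start = addr & ~(granule - 1);
		cur.end = cur.start + granule;
		if (!range_included(&cur, range))
			continue;
		*range = cur;
		return 0;
	}
	return -1;
}
```

With a fault resolved at level 0 inside a very large range, the old loop exits with a level-0-sized granule that no single block can map, while the fixed loop settles on the largest level that genuinely supports block mappings.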
* Re: [PATCH] KVM: arm64: Adjust range correctly during host stage-2 faults
From: Marc Zyngier @ 2025-06-26 7:53 UTC (permalink / raw)
To: Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Quentin Perret
Cc: linux-arm-kernel, kvmarm, linux-kernel
On Wed, 25 Jun 2025 10:55:48 +0000, Quentin Perret wrote:
> host_stage2_adjust_range() tries to find the largest block mapping that
> fits within a memory or mmio region (represented by a kvm_mem_range in
> this function) during host stage-2 faults under pKVM. To do so, it walks
> the host stage-2 page-table, finds the faulting PTE and its level, and
> then progressively increments the level until it finds a granule of the
> appropriate size. However, the condition in the loop implementing the
> above is broken as it checks kvm_level_supports_block_mapping() for the
> next level instead of the current, so pKVM may attempt to map a region
> larger than can be covered with a single block.
>
> [...]
Applied to fixes, thanks!
[1/1] KVM: arm64: Adjust range correctly during host stage-2 faults
commit: e728e705802fec20f65d974a5d5eb91217ac618d
Cheers,
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH] KVM: arm64: Adjust range correctly during host stage-2 faults
From: Marc Zyngier @ 2026-03-04 18:55 UTC (permalink / raw)
To: Quentin Perret
Cc: Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, linux-arm-kernel, kvmarm,
linux-kernel, Leo Yan
On Wed, 25 Jun 2025 11:55:48 +0100,
Quentin Perret <qperret@google.com> wrote:
>
> host_stage2_adjust_range() tries to find the largest block mapping that
> fits within a memory or mmio region (represented by a kvm_mem_range in
> this function) during host stage-2 faults under pKVM. To do so, it walks
> the host stage-2 page-table, finds the faulting PTE and its level, and
> then progressively increments the level until it finds a granule of the
> appropriate size. However, the condition in the loop implementing the
> above is broken as it checks kvm_level_supports_block_mapping() for the
> next level instead of the current, so pKVM may attempt to map a region
> larger than can be covered with a single block.
>
> This is not a security problem and is quite rare in practice (the
> kvm_mem_range check usually forces host_stage2_adjust_range() to choose a
> smaller granule), but this is clearly not the expected behaviour.
>
> Refactor the loop to fix the bug and improve readability.
>
> Fixes: c4f0935e4d95 ("KVM: arm64: Optimize host memory aborts")
> Signed-off-by: Quentin Perret <qperret@google.com>
This patch prevents my O6 board from booting in protected mode as of
e728e705802fe. Reverting it on top of 7.0-rc2 makes the box work again.

I haven't quite worked out why though. The hack below makes it work,
but implies that we can get ranges that are smaller than a page. That
feels unlikely, but I'm not sure we can rule it out (the kernel page
size could be pretty large anyway).

Any idea?

M.
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 38f66a56a7665..d815265bd374f 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -518,7 +518,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
 		granule = kvm_granule_size(level);
 		cur.start = ALIGN_DOWN(addr, granule);
 		cur.end = cur.start + granule;
-		if (!range_included(&cur, range))
+		if (!range_included(&cur, range) && level < KVM_PGTABLE_LAST_LEVEL)
 			continue;
 		*range = cur;
 		return 0;
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH] KVM: arm64: Adjust range correctly during host stage-2 faults
From: Marc Zyngier @ 2026-03-05 10:55 UTC (permalink / raw)
To: Quentin Perret
Cc: Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, linux-arm-kernel, kvmarm,
linux-kernel, Leo Yan
On Wed, 04 Mar 2026 18:55:04 +0000,
Marc Zyngier <maz@kernel.org> wrote:
>
> On Wed, 25 Jun 2025 11:55:48 +0100,
> Quentin Perret <qperret@google.com> wrote:
> >
> > host_stage2_adjust_range() tries to find the largest block mapping that
> > fits within a memory or mmio region (represented by a kvm_mem_range in
> > this function) during host stage-2 faults under pKVM. To do so, it walks
> > the host stage-2 page-table, finds the faulting PTE and its level, and
> > then progressively increments the level until it finds a granule of the
> > appropriate size. However, the condition in the loop implementing the
> > above is broken as it checks kvm_level_supports_block_mapping() for the
> > next level instead of the current, so pKVM may attempt to map a region
> > larger than can be covered with a single block.
> >
> > This is not a security problem and is quite rare in practice (the
> > kvm_mem_range check usually forces host_stage2_adjust_range() to choose a
> > smaller granule), but this is clearly not the expected behaviour.
> >
> > Refactor the loop to fix the bug and improve readability.
> >
> > Fixes: c4f0935e4d95 ("KVM: arm64: Optimize host memory aborts")
> > Signed-off-by: Quentin Perret <qperret@google.com>
>
> This patch prevents my O6 board from booting in protected mode as of
> e728e705802fe. Reverting it on top of 7.0-rc2 make the box work again.
>
> I haven't quite worked out why though. The hack below makes it work,
> but implies that we can get ranges that are smaller than a page. That
> feels unlikely, but I'm not sure we can rule it out (the kernel page
> size could be pretty large anyway).
Having spent a bit of time on this, I'm pretty sure this is the cause
of the issue. The memblock layout is as follows:
maz@cosmic-debris:~/vminstall$ sudo cat /sys/kernel/debug/memblock/memory
0: 0x0000000080000000..0x00000000843fffff 0 NOMAP
1: 0x0000000084400000..0x00000000845fffff 0 NONE
2: 0x0000000085000000..0x000000009fffffff 0 NONE
3: 0x00000000a0000000..0x00000000a7ffffff 0 NOMAP
4: 0x00000000a8000000..0x00000000fffbffff 0 NONE
5: 0x00000000fffc0000..0x00000000fffeffff 0 NOMAP
6: 0x00000000ffff0000..0x00000000ffffdfff 0 NONE
7: 0x00000000ffffe000..0x00000000ffffffff 0 NOMAP
8: 0x0000000100000000..0x00000007fe4effff 0 NONE
9: 0x00000007fe4f0000..0x00000007fedeffff 0 NOMAP
10: 0x00000007fedf0000..0x00000007ffffffff 0 NONE
11: 0x0000008000000000..0x000000807a290fff 0 NONE
12: 0x000000807a291000..0x000000807a2927b2 0 NOMAP
13: 0x000000807a2927b3..0x000000807fffffff 0 NONE
Any access to page 0x000000807a292000 is going to blow up in your
face, because there is no way you can map this and still respect the
memblock boundary. Same thing for any region that is smaller than
PAGE_SIZE, or not aligned on PAGE_SIZE. Which is even more annoying.

I'm starting to think that my hack is not that idiotic in the end...
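The boundary arithmetic can be modelled in user space, assuming 4K pages. The strict and relaxed checks below mirror the last-level loop condition before and after the hack; all names and the helper structure are illustrative stand-ins, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000ULL
#define LAST_LEVEL 3	/* stand-in for KVM_PGTABLE_LAST_LEVEL */

struct mem_range { uint64_t start, end; };	/* end is exclusive */

static bool range_included(const struct mem_range *child,
			   const struct mem_range *parent)
{
	return parent->start <= child->start && child->end <= parent->end;
}

/* The page-granule candidate computed by the last-level iteration of
 * host_stage2_adjust_range(). */
static struct mem_range page_range(uint64_t addr)
{
	struct mem_range r = { addr & ~(PAGE_SIZE - 1), 0 };

	r.end = r.start + PAGE_SIZE;
	return r;
}

/* Strict check (the current code): the candidate must fit entirely
 * inside the memblock-derived range at every level. */
static bool strict_ok(const struct mem_range *cur,
		      const struct mem_range *range, int level)
{
	(void)level;
	return range_included(cur, range);
}

/* Relaxed check (the proposed hack): at the last level, accept the
 * page-aligned candidate even when a memblock boundary cuts through it. */
static bool relaxed_ok(const struct mem_range *cur,
		       const struct mem_range *range, int level)
{
	return range_included(cur, range) || level == LAST_LEVEL;
}
```

For any address in the memblock region that starts mid-page at 0x807a2927b3, even the smallest (page) granule spills across the region boundary, so the strict loop exhausts every level and fails; the relaxed last-level check accepts the page-aligned range instead.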
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH] KVM: arm64: Adjust range correctly during host stage-2 faults
From: Quentin Perret @ 2026-03-05 13:13 UTC (permalink / raw)
To: Marc Zyngier
Cc: Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, linux-arm-kernel, kvmarm,
linux-kernel, Leo Yan
On Thursday 05 Mar 2026 at 10:55:42 (+0000), Marc Zyngier wrote:
> On Wed, 04 Mar 2026 18:55:04 +0000,
> Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Wed, 25 Jun 2025 11:55:48 +0100,
> > Quentin Perret <qperret@google.com> wrote:
> > >
> > > host_stage2_adjust_range() tries to find the largest block mapping that
> > > fits within a memory or mmio region (represented by a kvm_mem_range in
> > > this function) during host stage-2 faults under pKVM. To do so, it walks
> > > the host stage-2 page-table, finds the faulting PTE and its level, and
> > > then progressively increments the level until it finds a granule of the
> > > appropriate size. However, the condition in the loop implementing the
> > > above is broken as it checks kvm_level_supports_block_mapping() for the
> > > next level instead of the current, so pKVM may attempt to map a region
> > > larger than can be covered with a single block.
> > >
> > > This is not a security problem and is quite rare in practice (the
> > > kvm_mem_range check usually forces host_stage2_adjust_range() to choose a
> > > smaller granule), but this is clearly not the expected behaviour.
> > >
> > > Refactor the loop to fix the bug and improve readability.
> > >
> > > Fixes: c4f0935e4d95 ("KVM: arm64: Optimize host memory aborts")
> > > Signed-off-by: Quentin Perret <qperret@google.com>
> >
> > This patch prevents my O6 board from booting in protected mode as of
> > e728e705802fe. Reverting it on top of 7.0-rc2 make the box work again.
> >
> > I haven't quite worked out why though. The hack below makes it work,
> > but implies that we can get ranges that are smaller than a page. That
> > feels unlikely, but I'm not sure we can rule it out (the kernel page
> > size could be pretty large anyway).
>
> Having spent a bit of time on this, I'm pretty sure this is the cause
> of the issue. The memblock tables are as such:
>
> maz@cosmic-debris:~/vminstall$ sudo cat /sys/kernel/debug/memblock/memory
> 0: 0x0000000080000000..0x00000000843fffff 0 NOMAP
> 1: 0x0000000084400000..0x00000000845fffff 0 NONE
> 2: 0x0000000085000000..0x000000009fffffff 0 NONE
> 3: 0x00000000a0000000..0x00000000a7ffffff 0 NOMAP
> 4: 0x00000000a8000000..0x00000000fffbffff 0 NONE
> 5: 0x00000000fffc0000..0x00000000fffeffff 0 NOMAP
> 6: 0x00000000ffff0000..0x00000000ffffdfff 0 NONE
> 7: 0x00000000ffffe000..0x00000000ffffffff 0 NOMAP
> 8: 0x0000000100000000..0x00000007fe4effff 0 NONE
> 9: 0x00000007fe4f0000..0x00000007fedeffff 0 NOMAP
> 10: 0x00000007fedf0000..0x00000007ffffffff 0 NONE
> 11: 0x0000008000000000..0x000000807a290fff 0 NONE
> 12: 0x000000807a291000..0x000000807a2927b2 0 NOMAP
> 13: 0x000000807a2927b3..0x000000807fffffff 0 NONE
Ouch, these last few are 'interesting', oh well :-)
> Any access to page 0x000000807a292000 is going to blow up in your
> face, because there is no way you can map this and still respect the
> memblock boundary. Same thing for any region that is smaller than
> PAGE_SIZE, or not aligned on PAGE_SIZE. Which is even more annoying.
>
> I'm starting to think that my hack is not that idiotic in the end...
Yes, I can't think of anything better TBH. We've already asserted that
we don't have an annotated PTE here, and at the last level we're
guaranteed not to accidentally map a neighbouring private region, so yes
we should just proceed with a page-aligned mapping there.

Want me to post a proper patch or do you already have one in stock?
Thanks!
Quentin
* Re: [PATCH] KVM: arm64: Adjust range correctly during host stage-2 faults
From: Marc Zyngier @ 2026-03-05 13:22 UTC (permalink / raw)
To: Quentin Perret
Cc: Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, linux-arm-kernel, kvmarm,
linux-kernel, Leo Yan
On Thu, 05 Mar 2026 13:13:40 +0000,
Quentin Perret <qperret@google.com> wrote:
>
> On Thursday 05 Mar 2026 at 10:55:42 (+0000), Marc Zyngier wrote:
> > On Wed, 04 Mar 2026 18:55:04 +0000,
> > Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Wed, 25 Jun 2025 11:55:48 +0100,
> > > Quentin Perret <qperret@google.com> wrote:
> > > >
> > > > host_stage2_adjust_range() tries to find the largest block mapping that
> > > > fits within a memory or mmio region (represented by a kvm_mem_range in
> > > > this function) during host stage-2 faults under pKVM. To do so, it walks
> > > > the host stage-2 page-table, finds the faulting PTE and its level, and
> > > > then progressively increments the level until it finds a granule of the
> > > > appropriate size. However, the condition in the loop implementing the
> > > > above is broken as it checks kvm_level_supports_block_mapping() for the
> > > > next level instead of the current, so pKVM may attempt to map a region
> > > > larger than can be covered with a single block.
> > > >
> > > > This is not a security problem and is quite rare in practice (the
> > > > kvm_mem_range check usually forces host_stage2_adjust_range() to choose a
> > > > smaller granule), but this is clearly not the expected behaviour.
> > > >
> > > > Refactor the loop to fix the bug and improve readability.
> > > >
> > > > Fixes: c4f0935e4d95 ("KVM: arm64: Optimize host memory aborts")
> > > > Signed-off-by: Quentin Perret <qperret@google.com>
> > >
> > > This patch prevents my O6 board from booting in protected mode as of
> > > e728e705802fe. Reverting it on top of 7.0-rc2 make the box work again.
> > >
> > > I haven't quite worked out why though. The hack below makes it work,
> > > but implies that we can get ranges that are smaller than a page. That
> > > feels unlikely, but I'm not sure we can rule it out (the kernel page
> > > size could be pretty large anyway).
> >
> > Having spent a bit of time on this, I'm pretty sure this is the cause
> > of the issue. The memblock tables are as such:
> >
> > maz@cosmic-debris:~/vminstall$ sudo cat /sys/kernel/debug/memblock/memory
> > 0: 0x0000000080000000..0x00000000843fffff 0 NOMAP
> > 1: 0x0000000084400000..0x00000000845fffff 0 NONE
> > 2: 0x0000000085000000..0x000000009fffffff 0 NONE
> > 3: 0x00000000a0000000..0x00000000a7ffffff 0 NOMAP
> > 4: 0x00000000a8000000..0x00000000fffbffff 0 NONE
> > 5: 0x00000000fffc0000..0x00000000fffeffff 0 NOMAP
> > 6: 0x00000000ffff0000..0x00000000ffffdfff 0 NONE
> > 7: 0x00000000ffffe000..0x00000000ffffffff 0 NOMAP
> > 8: 0x0000000100000000..0x00000007fe4effff 0 NONE
> > 9: 0x00000007fe4f0000..0x00000007fedeffff 0 NOMAP
> > 10: 0x00000007fedf0000..0x00000007ffffffff 0 NONE
> > 11: 0x0000008000000000..0x000000807a290fff 0 NONE
> > 12: 0x000000807a291000..0x000000807a2927b2 0 NOMAP
> > 13: 0x000000807a2927b3..0x000000807fffffff 0 NONE
>
> Ouch, these last few are 'interesting', oh well :-)
>
> > Any access to page 0x000000807a292000 is going to blow up in your
> > face, because there is no way you can map this and still respect the
> > memblock boundary. Same thing for any region that is smaller than
> > PAGE_SIZE, or not aligned on PAGE_SIZE. Which is even more annoying.
> >
> > I'm starting to think that my hack is not that idiotic in the end...
>
> Yes, I can't think of anything better TBH. We've already asserted that
> we don't have an annotated PTE here, and at the last level we're
> guaranteed not to accidentally map a neighbouring private region, so yes
> we should just proceed with a page-aligned mapping there.
>
> Want me to post a proper patch or do you already have one in stock?
I have that ready, but I wanted your feedback on it before posting it.
I'll send that now.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.