* [PATCH v2 0/3] Fix __pkvm_init_vm error path
@ 2026-05-21 10:21 Vincent Donnefort
2026-05-21 10:21 ` [PATCH v2 1/3] KVM: arm64: Reset page order in pKVM hyp_pool_init Vincent Donnefort
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Vincent Donnefort @ 2026-05-21 10:21 UTC (permalink / raw)
To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
Cc: linux-arm-kernel, kvmarm, kernel-team, qperret, tabba,
Vincent Donnefort
Sashiko reported a potential refcount leak in the unlikely case where
insert_vm_table_entry fails.
While at it, I have added a fail-safe to __pkvm_hyp_donate_host to ensure this
function doesn't allow leaking refcounted pages.
Changes since v2:
* Proactively init hyp_page order field in hyp_pool_init
v1 (https://lore.kernel.org/all/20260521081250.655226-1-vdonnefort@google.com/)
*** BLURB HERE ***
Vincent Donnefort (3):
KVM: arm64: Reset page order in pKVM hyp_pool_init
KVM: arm64: Fix __pkvm_init_vm error path
KVM: arm64: Add fail-safe for refcounted pages in
__pkvm_hyp_donate_host
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 34 ++++++++++++++-----
arch/arm64/kvm/hyp/nvhe/page_alloc.c | 6 +++-
arch/arm64/kvm/hyp/nvhe/pkvm.c | 4 ++-
4 files changed, 34 insertions(+), 11 deletions(-)
base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
--
2.54.0.746.g67dd491aae-goog
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/3] KVM: arm64: Reset page order in pKVM hyp_pool_init
2026-05-21 10:21 [PATCH v2 0/3] Fix __pkvm_init_vm error path Vincent Donnefort
@ 2026-05-21 10:21 ` Vincent Donnefort
2026-05-21 13:07 ` Fuad Tabba
2026-05-21 10:21 ` [PATCH v2 2/3] KVM: arm64: Fix __pkvm_init_vm error path Vincent Donnefort
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Vincent Donnefort @ 2026-05-21 10:21 UTC (permalink / raw)
To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
Cc: linux-arm-kernel, kvmarm, kernel-team, qperret, tabba,
Vincent Donnefort, Sashiko
When a VM fails to initialise after its stage-2 hyp_pool has been
initialised, that stage-2 must be torn down entirely. This requires
resetting both the refcount and the order of its pages back to 0.
Currently, reclaim_pgtable_pages() implicitly resets the page order by
allocating the entire pool with order-0 granularity. However, in the VM
initialisation error path, the addresses of the donated memory (the PGD)
are already known, making it unnecessary to iterate over all pages in
the pool.
Since the vmemmap page order is a hyp_pool-specific field, leaving a
non-zero order on hyp_pool destruction is harmless until another pool
attempts to admit the page. Instead of resetting this field during
destruction, reset it during pool initialization in hyp_pool_init().
Note that pages added to the pool outside of the initial pool range
(e.g., via guest_s2_zalloc_page()) must still have their order managed
manually.
While at it, add a WARN_ON() in the hyp_pool attach path to catch
unexpected page orders that exceed the pool's max_order.
Fixes: 256b4668cd89 ("KVM: arm64: Introduce separate hypercalls for pKVM VM reservation and initialization")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 25f04629014e..89eb20d4fee4 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -322,7 +322,6 @@ void reclaim_pgtable_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
while (addr) {
page = hyp_virt_to_page(addr);
page->refcount = 0;
- page->order = 0;
push_hyp_memcache(mc, addr, hyp_virt_to_phys);
WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
addr = hyp_alloc_pages(&vm->pool, 0);
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index a1eb27a1a747..c3b3dc5a8ea7 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -97,6 +97,8 @@ static void __hyp_attach_page(struct hyp_pool *pool,
u8 order = p->order;
struct hyp_page *buddy;
+ WARN_ON(p->order > pool->max_order);
+
memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
/* Skip coalescing for 'external' pages being freed into the pool. */
@@ -237,8 +239,10 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
/* Init the vmemmap portion */
p = hyp_phys_to_page(phys);
- for (i = 0; i < nr_pages; i++)
+ for (i = 0; i < nr_pages; i++) {
hyp_set_page_refcounted(&p[i]);
+ p[i].order = 0;
+ }
/* Attach the unused pages to the buddy tree */
for (i = reserved_pages; i < nr_pages; i++)
--
2.54.0.746.g67dd491aae-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 2/3] KVM: arm64: Fix __pkvm_init_vm error path
2026-05-21 10:21 [PATCH v2 0/3] Fix __pkvm_init_vm error path Vincent Donnefort
2026-05-21 10:21 ` [PATCH v2 1/3] KVM: arm64: Reset page order in pKVM hyp_pool_init Vincent Donnefort
@ 2026-05-21 10:21 ` Vincent Donnefort
2026-05-21 13:07 ` Fuad Tabba
2026-05-21 10:21 ` [PATCH v2 3/3] KVM: arm64: Add fail-safe for refcounted pages in __pkvm_hyp_donate_host Vincent Donnefort
2026-05-21 13:07 ` [PATCH v2 0/3] Fix __pkvm_init_vm error path Fuad Tabba
3 siblings, 1 reply; 10+ messages in thread
From: Vincent Donnefort @ 2026-05-21 10:21 UTC (permalink / raw)
To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
Cc: linux-arm-kernel, kvmarm, kernel-team, qperret, tabba,
Vincent Donnefort, Sashiko
In the unlikely case where insert_vm_table_entry fails, __pkvm_init_vm
release the memory donated by the host for the PGD, but as the stage-2
is still set-up the hypervisor keeps a refcount on those pages,
effectively leaking the references.
Fix the rollback with the newly added kvm_guest_destroy_stage2().
Fixes: 256b4668cd89 ("KVM: arm64: Introduce separate hypercalls for pKVM VM reservation and initialization")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 3cbfae0e3dda..4f2b871199cb 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -56,6 +56,7 @@ int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot p
int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id);
int kvm_host_prepare_stage2(void *pgt_pool_base);
int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd);
+void kvm_guest_destroy_stage2(struct pkvm_hyp_vm *vm);
void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
int hyp_pin_shared_mem(void *from, void *to);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 89eb20d4fee4..42b0b648f32f 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -306,16 +306,21 @@ int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
return 0;
}
+void kvm_guest_destroy_stage2(struct pkvm_hyp_vm *vm)
+{
+ guest_lock_component(vm);
+ kvm_pgtable_stage2_destroy(&vm->pgt);
+ vm->kvm.arch.mmu.pgd_phys = 0ULL;
+ guest_unlock_component(vm);
+}
+
void reclaim_pgtable_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
{
struct hyp_page *page;
void *addr;
/* Dump all pgtable pages in the hyp_pool */
- guest_lock_component(vm);
- kvm_pgtable_stage2_destroy(&vm->pgt);
- vm->kvm.arch.mmu.pgd_phys = 0ULL;
- guest_unlock_component(vm);
+ kvm_guest_destroy_stage2(vm);
/* Drain the hyp_pool into the memcache */
addr = hyp_alloc_pages(&vm->pool, 0);
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index eb1c10120f9f..3b2c4fbc34d8 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -853,10 +853,12 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
/* Must be called last since this publishes the VM. */
ret = insert_vm_table_entry(handle, hyp_vm);
if (ret)
- goto err_remove_mappings;
+ goto err_destroy_stage2;
return 0;
+err_destroy_stage2:
+ kvm_guest_destroy_stage2(hyp_vm);
err_remove_mappings:
unmap_donated_memory(hyp_vm, vm_size);
unmap_donated_memory(pgd, pgd_size);
--
2.54.0.746.g67dd491aae-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 3/3] KVM: arm64: Add fail-safe for refcounted pages in __pkvm_hyp_donate_host
2026-05-21 10:21 [PATCH v2 0/3] Fix __pkvm_init_vm error path Vincent Donnefort
2026-05-21 10:21 ` [PATCH v2 1/3] KVM: arm64: Reset page order in pKVM hyp_pool_init Vincent Donnefort
2026-05-21 10:21 ` [PATCH v2 2/3] KVM: arm64: Fix __pkvm_init_vm error path Vincent Donnefort
@ 2026-05-21 10:21 ` Vincent Donnefort
2026-05-21 13:07 ` Fuad Tabba
2026-05-21 13:07 ` [PATCH v2 0/3] Fix __pkvm_init_vm error path Fuad Tabba
3 siblings, 1 reply; 10+ messages in thread
From: Vincent Donnefort @ 2026-05-21 10:21 UTC (permalink / raw)
To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will
Cc: linux-arm-kernel, kvmarm, kernel-team, qperret, tabba,
Vincent Donnefort
A previous bug in __pkvm_init_vm error path showed that the hypervisor
could leak refcounted pages, (i.e. losing access to a page while its
refcount is still elevated). This poses a threat to the pKVM state
machine.
Address this by introducing a fail-safe in n __pkvm_hyp_donate_host.
Transitions are not a hot path so added security is worth the extra
check.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 42b0b648f32f..bb97d05b9b25 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -855,6 +855,16 @@ static int __hyp_check_page_state_range(phys_addr_t phys, u64 size, enum pkvm_pa
return 0;
}
+static int __hyp_check_page_count_range(phys_addr_t phys, u64 size)
+{
+ for_each_hyp_page(page, phys, size) {
+ if (page->refcount)
+ return -EBUSY;
+ }
+
+ return 0;
+}
+
static bool guest_pte_is_poisoned(kvm_pte_t pte)
{
if (kvm_pte_valid(pte))
@@ -1053,7 +1063,6 @@ int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn)
int __pkvm_host_unshare_hyp(u64 pfn)
{
u64 phys = hyp_pfn_to_phys(pfn);
- u64 virt = (u64)__hyp_va(phys);
u64 size = PAGE_SIZE;
int ret;
@@ -1066,10 +1075,9 @@ int __pkvm_host_unshare_hyp(u64 pfn)
ret = __hyp_check_page_state_range(phys, size, PKVM_PAGE_SHARED_BORROWED);
if (ret)
goto unlock;
- if (hyp_page_count((void *)virt)) {
- ret = -EBUSY;
+ ret = __hyp_check_page_count_range(phys, size);
+ if (ret)
goto unlock;
- }
__hyp_set_page_state_range(phys, size, PKVM_NOPAGE);
WARN_ON(__host_set_page_state_range(phys, size, PKVM_PAGE_OWNED));
@@ -1132,6 +1140,10 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages)
if (ret)
goto unlock;
+ ret = __hyp_check_page_count_range(phys, size);
+ if (ret)
+ goto unlock;
+
__hyp_set_page_state_range(phys, size, PKVM_NOPAGE);
WARN_ON(kvm_pgtable_hyp_unmap(&pkvm_pgtable, virt, size) != size);
WARN_ON(host_stage2_set_owner_locked(phys, size, PKVM_ID_HOST));
--
2.54.0.746.g67dd491aae-goog
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/3] Fix __pkvm_init_vm error path
2026-05-21 10:21 [PATCH v2 0/3] Fix __pkvm_init_vm error path Vincent Donnefort
` (2 preceding siblings ...)
2026-05-21 10:21 ` [PATCH v2 3/3] KVM: arm64: Add fail-safe for refcounted pages in __pkvm_hyp_donate_host Vincent Donnefort
@ 2026-05-21 13:07 ` Fuad Tabba
3 siblings, 0 replies; 10+ messages in thread
From: Fuad Tabba @ 2026-05-21 13:07 UTC (permalink / raw)
To: Vincent Donnefort
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, linux-arm-kernel, kvmarm, kernel-team,
qperret
On Thu, 21 May 2026 at 11:22, Vincent Donnefort <vdonnefort@google.com> wrote:
>
> Sashiko reported a potential refcount leak in the unlikely case where
> insert_vm_table_entry fails.
>
> While at it, I have added a fail-safe to __pkvm_hyp_donate_host to ensure this
> function doesn't allow leaking refcounted pages.
>
> Changes since v2:
>
> * Proactively init hyp_page order field in hyp_pool_init
>
> v1 (https://lore.kernel.org/all/20260521081250.655226-1-vdonnefort@google.com/)
>
> *** BLURB HERE ***
nit: missing BLURB :)
/fuad
/fuad
>
> Vincent Donnefort (3):
> KVM: arm64: Reset page order in pKVM hyp_pool_init
> KVM: arm64: Fix __pkvm_init_vm error path
> KVM: arm64: Add fail-safe for refcounted pages in
> __pkvm_hyp_donate_host
>
> arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 34 ++++++++++++++-----
> arch/arm64/kvm/hyp/nvhe/page_alloc.c | 6 +++-
> arch/arm64/kvm/hyp/nvhe/pkvm.c | 4 ++-
> 4 files changed, 34 insertions(+), 11 deletions(-)
>
>
> base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
> --
> 2.54.0.746.g67dd491aae-goog
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/3] KVM: arm64: Reset page order in pKVM hyp_pool_init
2026-05-21 10:21 ` [PATCH v2 1/3] KVM: arm64: Reset page order in pKVM hyp_pool_init Vincent Donnefort
@ 2026-05-21 13:07 ` Fuad Tabba
2026-05-21 13:21 ` Vincent Donnefort
0 siblings, 1 reply; 10+ messages in thread
From: Fuad Tabba @ 2026-05-21 13:07 UTC (permalink / raw)
To: Vincent Donnefort
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, linux-arm-kernel, kvmarm, kernel-team,
qperret, Sashiko
On Thu, 21 May 2026 at 11:22, Vincent Donnefort <vdonnefort@google.com> wrote:
>
> When a VM fails to initialise after its stage-2 hyp_pool has been
> initialised, that stage-2 must be torn down entirely. This requires
> resetting both the refcount and the order of its pages back to 0.
>
> Currently, reclaim_pgtable_pages() implicitly resets the page order by
> allocating the entire pool with order-0 granularity. However, in the VM
> initialisation error path, the addresses of the donated memory (the PGD)
> are already known, making it unnecessary to iterate over all pages in
> the pool.
>
> Since the vmemmap page order is a hyp_pool-specific field, leaving a
> non-zero order on hyp_pool destruction is harmless until another pool
> attempts to admit the page. Instead of resetting this field during
> destruction, reset it during pool initialization in hyp_pool_init().
> Note that pages added to the pool outside of the initial pool range
> (e.g., via guest_s2_zalloc_page()) must still have their order managed
> manually.
>
> While at it, add a WARN_ON() in the hyp_pool attach path to catch
> unexpected page orders that exceed the pool's max_order.
>
> Fixes: 256b4668cd89 ("KVM: arm64: Introduce separate hypercalls for pKVM VM reservation and initialization")
> Reported-by: Sashiko <sashiko-bot@kernel.org>
> Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 25f04629014e..89eb20d4fee4 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -322,7 +322,6 @@ void reclaim_pgtable_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
> while (addr) {
> page = hyp_virt_to_page(addr);
> page->refcount = 0;
> - page->order = 0;
> push_hyp_memcache(mc, addr, hyp_virt_to_phys);
> WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
> addr = hyp_alloc_pages(&vm->pool, 0);
> diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> index a1eb27a1a747..c3b3dc5a8ea7 100644
> --- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> +++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> @@ -97,6 +97,8 @@ static void __hyp_attach_page(struct hyp_pool *pool,
> u8 order = p->order;
> struct hyp_page *buddy;
>
> + WARN_ON(p->order > pool->max_order);
> +
Could you add a brief comment? It took me a minute to figure out what this
catches. IIUC it's not attach's own input, it's a stale p->order from way back
when an external page was popped from a memcache (today only via
guest_s2_zalloc_page()). Right?
With that.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
> memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
>
> /* Skip coalescing for 'external' pages being freed into the pool. */
> @@ -237,8 +239,10 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
>
> /* Init the vmemmap portion */
> p = hyp_phys_to_page(phys);
> - for (i = 0; i < nr_pages; i++)
> + for (i = 0; i < nr_pages; i++) {
> hyp_set_page_refcounted(&p[i]);
> + p[i].order = 0;
> + }
>
> /* Attach the unused pages to the buddy tree */
> for (i = reserved_pages; i < nr_pages; i++)
> --
> 2.54.0.746.g67dd491aae-goog
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 2/3] KVM: arm64: Fix __pkvm_init_vm error path
2026-05-21 10:21 ` [PATCH v2 2/3] KVM: arm64: Fix __pkvm_init_vm error path Vincent Donnefort
@ 2026-05-21 13:07 ` Fuad Tabba
0 siblings, 0 replies; 10+ messages in thread
From: Fuad Tabba @ 2026-05-21 13:07 UTC (permalink / raw)
To: Vincent Donnefort
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, linux-arm-kernel, kvmarm, kernel-team,
qperret, Sashiko
On Thu, 21 May 2026 at 11:22, Vincent Donnefort <vdonnefort@google.com> wrote:
>
> In the unlikely case where insert_vm_table_entry fails, __pkvm_init_vm
> release the memory donated by the host for the PGD, but as the stage-2
> is still set-up the hypervisor keeps a refcount on those pages,
> effectively leaking the references.
>
> Fix the rollback with the newly added kvm_guest_destroy_stage2().
>
> Fixes: 256b4668cd89 ("KVM: arm64: Introduce separate hypercalls for pKVM VM reservation and initialization")
> Reported-by: Sashiko <sashiko-bot@kernel.org>
> Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index 3cbfae0e3dda..4f2b871199cb 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -56,6 +56,7 @@ int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot p
> int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id);
> int kvm_host_prepare_stage2(void *pgt_pool_base);
> int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd);
> +void kvm_guest_destroy_stage2(struct pkvm_hyp_vm *vm);
> void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
>
> int hyp_pin_shared_mem(void *from, void *to);
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 89eb20d4fee4..42b0b648f32f 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -306,16 +306,21 @@ int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
> return 0;
> }
>
> +void kvm_guest_destroy_stage2(struct pkvm_hyp_vm *vm)
> +{
> + guest_lock_component(vm);
> + kvm_pgtable_stage2_destroy(&vm->pgt);
> + vm->kvm.arch.mmu.pgd_phys = 0ULL;
> + guest_unlock_component(vm);
> +}
> +
> void reclaim_pgtable_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
> {
> struct hyp_page *page;
> void *addr;
>
> /* Dump all pgtable pages in the hyp_pool */
> - guest_lock_component(vm);
> - kvm_pgtable_stage2_destroy(&vm->pgt);
> - vm->kvm.arch.mmu.pgd_phys = 0ULL;
> - guest_unlock_component(vm);
> + kvm_guest_destroy_stage2(vm);
>
> /* Drain the hyp_pool into the memcache */
> addr = hyp_alloc_pages(&vm->pool, 0);
> diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> index eb1c10120f9f..3b2c4fbc34d8 100644
> --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> @@ -853,10 +853,12 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
> /* Must be called last since this publishes the VM. */
> ret = insert_vm_table_entry(handle, hyp_vm);
> if (ret)
> - goto err_remove_mappings;
> + goto err_destroy_stage2;
>
> return 0;
>
> +err_destroy_stage2:
> + kvm_guest_destroy_stage2(hyp_vm);
> err_remove_mappings:
> unmap_donated_memory(hyp_vm, vm_size);
> unmap_donated_memory(pgd, pgd_size);
> --
> 2.54.0.746.g67dd491aae-goog
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 3/3] KVM: arm64: Add fail-safe for refcounted pages in __pkvm_hyp_donate_host
2026-05-21 10:21 ` [PATCH v2 3/3] KVM: arm64: Add fail-safe for refcounted pages in __pkvm_hyp_donate_host Vincent Donnefort
@ 2026-05-21 13:07 ` Fuad Tabba
0 siblings, 0 replies; 10+ messages in thread
From: Fuad Tabba @ 2026-05-21 13:07 UTC (permalink / raw)
To: Vincent Donnefort
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, linux-arm-kernel, kvmarm, kernel-team,
qperret
On Thu, 21 May 2026 at 11:22, Vincent Donnefort <vdonnefort@google.com> wrote:
>
> A previous bug in __pkvm_init_vm error path showed that the hypervisor
> could leak refcounted pages, (i.e. losing access to a page while its
> refcount is still elevated). This poses a threat to the pKVM state
> machine.
>
> Address this by introducing a fail-safe in n __pkvm_hyp_donate_host.
Stray n.
> Transitions are not a hot path so added security is worth the extra
> check.
>
> Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Cheers,
/fuad
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 42b0b648f32f..bb97d05b9b25 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -855,6 +855,16 @@ static int __hyp_check_page_state_range(phys_addr_t phys, u64 size, enum pkvm_pa
> return 0;
> }
>
> +static int __hyp_check_page_count_range(phys_addr_t phys, u64 size)
> +{
> + for_each_hyp_page(page, phys, size) {
> + if (page->refcount)
> + return -EBUSY;
> + }
> +
> + return 0;
> +}
> +
> static bool guest_pte_is_poisoned(kvm_pte_t pte)
> {
> if (kvm_pte_valid(pte))
> @@ -1053,7 +1063,6 @@ int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn)
> int __pkvm_host_unshare_hyp(u64 pfn)
> {
> u64 phys = hyp_pfn_to_phys(pfn);
> - u64 virt = (u64)__hyp_va(phys);
> u64 size = PAGE_SIZE;
> int ret;
>
> @@ -1066,10 +1075,9 @@ int __pkvm_host_unshare_hyp(u64 pfn)
> ret = __hyp_check_page_state_range(phys, size, PKVM_PAGE_SHARED_BORROWED);
> if (ret)
> goto unlock;
> - if (hyp_page_count((void *)virt)) {
> - ret = -EBUSY;
> + ret = __hyp_check_page_count_range(phys, size);
> + if (ret)
> goto unlock;
> - }
>
> __hyp_set_page_state_range(phys, size, PKVM_NOPAGE);
> WARN_ON(__host_set_page_state_range(phys, size, PKVM_PAGE_OWNED));
> @@ -1132,6 +1140,10 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages)
> if (ret)
> goto unlock;
>
> + ret = __hyp_check_page_count_range(phys, size);
> + if (ret)
> + goto unlock;
> +
> __hyp_set_page_state_range(phys, size, PKVM_NOPAGE);
> WARN_ON(kvm_pgtable_hyp_unmap(&pkvm_pgtable, virt, size) != size);
> WARN_ON(host_stage2_set_owner_locked(phys, size, PKVM_ID_HOST));
> --
> 2.54.0.746.g67dd491aae-goog
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/3] KVM: arm64: Reset page order in pKVM hyp_pool_init
2026-05-21 13:07 ` Fuad Tabba
@ 2026-05-21 13:21 ` Vincent Donnefort
2026-05-21 13:30 ` Fuad Tabba
0 siblings, 1 reply; 10+ messages in thread
From: Vincent Donnefort @ 2026-05-21 13:21 UTC (permalink / raw)
To: Fuad Tabba
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, linux-arm-kernel, kvmarm, kernel-team,
qperret, Sashiko
On Thu, May 21, 2026 at 02:07:36PM +0100, Fuad Tabba wrote:
> On Thu, 21 May 2026 at 11:22, Vincent Donnefort <vdonnefort@google.com> wrote:
> >
> > When a VM fails to initialise after its stage-2 hyp_pool has been
> > initialised, that stage-2 must be torn down entirely. This requires
> > resetting both the refcount and the order of its pages back to 0.
> >
> > Currently, reclaim_pgtable_pages() implicitly resets the page order by
> > allocating the entire pool with order-0 granularity. However, in the VM
> > initialisation error path, the addresses of the donated memory (the PGD)
> > are already known, making it unnecessary to iterate over all pages in
> > the pool.
> >
> > Since the vmemmap page order is a hyp_pool-specific field, leaving a
> > non-zero order on hyp_pool destruction is harmless until another pool
> > attempts to admit the page. Instead of resetting this field during
> > destruction, reset it during pool initialization in hyp_pool_init().
> > Note that pages added to the pool outside of the initial pool range
> > (e.g., via guest_s2_zalloc_page()) must still have their order managed
> > manually.
> >
> > While at it, add a WARN_ON() in the hyp_pool attach path to catch
> > unexpected page orders that exceed the pool's max_order.
> >
> > Fixes: 256b4668cd89 ("KVM: arm64: Introduce separate hypercalls for pKVM VM reservation and initialization")
> > Reported-by: Sashiko <sashiko-bot@kernel.org>
> > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> >
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index 25f04629014e..89eb20d4fee4 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -322,7 +322,6 @@ void reclaim_pgtable_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
> > while (addr) {
> > page = hyp_virt_to_page(addr);
> > page->refcount = 0;
> > - page->order = 0;
> > push_hyp_memcache(mc, addr, hyp_virt_to_phys);
> > WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
> > addr = hyp_alloc_pages(&vm->pool, 0);
> > diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> > index a1eb27a1a747..c3b3dc5a8ea7 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> > @@ -97,6 +97,8 @@ static void __hyp_attach_page(struct hyp_pool *pool,
> > u8 order = p->order;
> > struct hyp_page *buddy;
> >
> > + WARN_ON(p->order > pool->max_order);
> > +
>
> Could you add a brief comment? It took me a minute to figure out what this
> catches. IIUC it's not attach's own input, it's a stale p->order from way back
> when an external page was popped from a memcache (today only via
> guest_s2_zalloc_page()). Right?
I think it'd be self explanatory if that was next the page_add_to_list, but that
wouldn't protect the memset (that's really best-effort though).
How about?
/*
* A page with an order bigger than the pool's max is an 'external' page
* whose order hasn't been reset before being added to the pool.
*/
But now I am thinking I can do way better: we can easily identify external
pages, so I could just force the order to 0 in that case.
WDYS?
>
> With that.
>
> Reviewed-by: Fuad Tabba <tabba@google.com>
> Tested-by: Fuad Tabba <tabba@google.com>
>
> Cheers,
> /fuad
>
>
>
>
> > memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
> >
> > /* Skip coalescing for 'external' pages being freed into the pool. */
> > @@ -237,8 +239,10 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
> >
> > /* Init the vmemmap portion */
> > p = hyp_phys_to_page(phys);
> > - for (i = 0; i < nr_pages; i++)
> > + for (i = 0; i < nr_pages; i++) {
> > hyp_set_page_refcounted(&p[i]);
> > + p[i].order = 0;
> > + }
> >
> > /* Attach the unused pages to the buddy tree */
> > for (i = reserved_pages; i < nr_pages; i++)
> > --
> > 2.54.0.746.g67dd491aae-goog
> >
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/3] KVM: arm64: Reset page order in pKVM hyp_pool_init
2026-05-21 13:21 ` Vincent Donnefort
@ 2026-05-21 13:30 ` Fuad Tabba
0 siblings, 0 replies; 10+ messages in thread
From: Fuad Tabba @ 2026-05-21 13:30 UTC (permalink / raw)
To: Vincent Donnefort
Cc: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, will, linux-arm-kernel, kvmarm, kernel-team,
qperret, Sashiko
On Thu, 21 May 2026 at 14:21, Vincent Donnefort <vdonnefort@google.com> wrote:
>
> On Thu, May 21, 2026 at 02:07:36PM +0100, Fuad Tabba wrote:
> > On Thu, 21 May 2026 at 11:22, Vincent Donnefort <vdonnefort@google.com> wrote:
> > >
> > > When a VM fails to initialise after its stage-2 hyp_pool has been
> > > initialised, that stage-2 must be torn down entirely. This requires
> > > resetting both the refcount and the order of its pages back to 0.
> > >
> > > Currently, reclaim_pgtable_pages() implicitly resets the page order by
> > > allocating the entire pool with order-0 granularity. However, in the VM
> > > initialisation error path, the addresses of the donated memory (the PGD)
> > > are already known, making it unnecessary to iterate over all pages in
> > > the pool.
> > >
> > > Since the vmemmap page order is a hyp_pool-specific field, leaving a
> > > non-zero order on hyp_pool destruction is harmless until another pool
> > > attempts to admit the page. Instead of resetting this field during
> > > destruction, reset it during pool initialization in hyp_pool_init().
> > > Note that pages added to the pool outside of the initial pool range
> > > (e.g., via guest_s2_zalloc_page()) must still have their order managed
> > > manually.
> > >
> > > While at it, add a WARN_ON() in the hyp_pool attach path to catch
> > > unexpected page orders that exceed the pool's max_order.
> > >
> > > Fixes: 256b4668cd89 ("KVM: arm64: Introduce separate hypercalls for pKVM VM reservation and initialization")
> > > Reported-by: Sashiko <sashiko-bot@kernel.org>
> > > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> > >
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > index 25f04629014e..89eb20d4fee4 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > @@ -322,7 +322,6 @@ void reclaim_pgtable_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
> > > while (addr) {
> > > page = hyp_virt_to_page(addr);
> > > page->refcount = 0;
> > > - page->order = 0;
> > > push_hyp_memcache(mc, addr, hyp_virt_to_phys);
> > > WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
> > > addr = hyp_alloc_pages(&vm->pool, 0);
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> > > index a1eb27a1a747..c3b3dc5a8ea7 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
> > > @@ -97,6 +97,8 @@ static void __hyp_attach_page(struct hyp_pool *pool,
> > > u8 order = p->order;
> > > struct hyp_page *buddy;
> > >
> > > + WARN_ON(p->order > pool->max_order);
> > > +
> >
> > Could you add a brief comment? It took me a minute to figure out what this
> > catches. IIUC it's not attach's own input, it's a stale p->order from way back
> > when an external page was popped from a memcache (today only via
> > guest_s2_zalloc_page()). Right?
>
> I think it'd be self explanatory if that was next the page_add_to_list, but that
> wouldn't protect the memset (that's really best-effort though).
>
> How about?
>
> /*
> * A page with an order bigger than the pool's max is an 'external' page
> * whose order hasn't been reset before being added to the pool.
> */
>
> But now I am thinking I can do way better: we can easily identify external
> pages, so I could just force the order to 0 in that case.
>
> WDYS?
Yeah, Sounds better. The WARN's scope was actually narrower than the
real risk. Forcing order = 0 on entry covers all of that and removes
the implicit caller obligation the WARN was best-effort enforcing.
The memset is trivially safe then too (PAGE_SIZE << 0, regardless of
what was in the vmemmap).
Cheers,
/fuad
>
> >
> > With that.
> >
> > Reviewed-by: Fuad Tabba <tabba@google.com>
> > Tested-by: Fuad Tabba <tabba@google.com>
> >
> > Cheers,
> > /fuad
> >
> >
> >
> >
> > > memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
> > >
> > > /* Skip coalescing for 'external' pages being freed into the pool. */
> > > @@ -237,8 +239,10 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
> > >
> > > /* Init the vmemmap portion */
> > > p = hyp_phys_to_page(phys);
> > > - for (i = 0; i < nr_pages; i++)
> > > + for (i = 0; i < nr_pages; i++) {
> > > hyp_set_page_refcounted(&p[i]);
> > > + p[i].order = 0;
> > > + }
> > >
> > > /* Attach the unused pages to the buddy tree */
> > > for (i = reserved_pages; i < nr_pages; i++)
> > > --
> > > 2.54.0.746.g67dd491aae-goog
> > >
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-05-21 13:31 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-21 10:21 [PATCH v2 0/3] Fix __pkvm_init_vm error path Vincent Donnefort
2026-05-21 10:21 ` [PATCH v2 1/3] KVM: arm64: Reset page order in pKVM hyp_pool_init Vincent Donnefort
2026-05-21 13:07 ` Fuad Tabba
2026-05-21 13:21 ` Vincent Donnefort
2026-05-21 13:30 ` Fuad Tabba
2026-05-21 10:21 ` [PATCH v2 2/3] KVM: arm64: Fix __pkvm_init_vm error path Vincent Donnefort
2026-05-21 13:07 ` Fuad Tabba
2026-05-21 10:21 ` [PATCH v2 3/3] KVM: arm64: Add fail-safe for refcounted pages in __pkvm_hyp_donate_host Vincent Donnefort
2026-05-21 13:07 ` Fuad Tabba
2026-05-21 13:07 ` [PATCH v2 0/3] Fix __pkvm_init_vm error path Fuad Tabba
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox