* [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM
@ 2026-01-05 15:49 Will Deacon
2026-01-05 15:49 ` [PATCH 01/30] KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers Will Deacon
` (29 more replies)
0 siblings, 30 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Hi folks,
Although pKVM has been shipping in Android kernels for a while now,
protected guest (pVM) support has been somewhat languishing upstream.
This has partly been because we've been waiting for guest_memfd() but
also because it hasn't been clear how to expose pVMs to userspace (which
is necessary for testing) without getting everything in place beforehand.
This has led to frustration on both sides of the fence [1], so this
patch series attempts to get things moving again by exposing pVM
features incrementally on top of anonymous memory,
which is what we have been using in Android. The big difference between
this series and the Android implementation is the graceful handling of
host stage-2 faults arising from accesses made using kernel mappings.
The hope is that this will unblock pKVM upstreaming efforts while the
guest_memfd() work continues to evolve.
Specifically, this patch series implements support for protected guest
memory with pKVM, where pages are unmapped from the host as they are
faulted into the guest and can be shared back from the guest using pKVM
hypercalls. Protected guests are created using a new machine type
identifier and can be booted to a shell using the kvmtool patches
available at [2], which finally means that we are able to test the pVM
logic in pKVM. Since this is an incremental step towards full isolation
from the host (for example, the CPU register state and DMA accesses are
not yet isolated), creating a pVM requires a developer Kconfig option to
be enabled in addition to booting with 'kvm-arm.mode=protected' and
results in a kernel taint.
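To give a rough picture of the flow (an illustrative sketch only; the
host-side hypercall names are the ones introduced by this series, while
the guest-side MEM_SHARE/MEM_UNSHARE calls are covered by patches 24-25):
	/* Host (EL1): stage-2 fault on a pVM; pin an anonymous page, donate it */
	ret = kvm_call_hyp_nvhe(__pkvm_host_donate_guest, pfn, gfn);
	/*
	 * Guest: hands individual pages back to the host (e.g. for virtio
	 * buffers) with the MEM_SHARE hypercall and takes them back again
	 * with MEM_UNSHARE.
	 */
	/* Host (EL1): on teardown, pull pages back from the dying guest */
	ret = kvm_call_hyp_nvhe(__pkvm_reclaim_dying_guest_page, handle, gfn);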
More information about what is and what isn't implemented is described
in the pkvm.rst documentation added by this series. The intention is to
update this file as we introduce additional protection features and
ultimately to remove the taint.
The series is loosely structured as follows:
Patch 1: A dependent pgtable fix that I sent out previously
Patches 2-8: Cleanups/fixes to the existing code to prepare for pVMs
Patches 9-14: Support for memory donation and reclaim
Patches 15-22: Handling of bad host accesses to protected memory
Patches 23-25: Support for SHARE and UNSHARE guest hypercalls
Patches 26-27: UAPI and developer documentation
Patches 28-30: Selftest additions for new page ownership transitions
Patches are based on v6.19-rc4 and are also available at [3].
All feedback welcome.
Cheers,
Will
[1] https://lore.kernel.org/all/aS9rmMgNna7I5g4F@kernel.org/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/log/?h=pkvm
[3] https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=kvm/protected-memory
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Mostafa Saleh <smostafa@google.com>
--->8
Fuad Tabba (1):
KVM: arm64: Expose self-hosted debug regs as RAZ/WI for protected
guests
Quentin Perret (2):
KVM: arm64: Refactor enter_exception64()
KVM: arm64: Inject SIGSEGV on illegal accesses
Will Deacon (27):
KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers
KVM: arm64: Remove redundant 'pgt' pointer checks from MMU notifiers
KVM: arm64: Rename __pkvm_pgtable_stage2_unmap()
KVM: arm64: Don't advertise unsupported features for protected guests
KVM: arm64: Remove pointless is_protected_kvm_enabled() checks from
hyp
KVM: arm64: Ignore MMU notifier callbacks for protected VMs
KVM: arm64: Prevent unsupported memslot operations on protected VMs
KVM: arm64: Split teardown hypercall into two phases
KVM: arm64: Introduce __pkvm_host_donate_guest()
KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map()
KVM: arm64: Handle aborts from protected VMs
KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page()
KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy()
KVM: arm64: Generalise kvm_pgtable_stage2_set_owner()
KVM: arm64: Introduce host_stage2_set_owner_metadata_locked()
KVM: arm64: Annotate guest donations with handle and gfn in host
stage-2
KVM: arm64: Introduce hypercall to force reclaim of a protected page
KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler
KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte
KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs
KVM: arm64: Implement the MEM_SHARE hypercall for protected VMs
KVM: arm64: Implement the MEM_UNSHARE hypercall for protected VMs
KVM: arm64: Allow userspace to create protected VMs when pKVM is
enabled
KVM: arm64: Add some initial documentation for pKVM
KVM: arm64: Extend pKVM page ownership selftests to cover guest
donation
KVM: arm64: Register 'selftest_vm' in the VM table
KVM: arm64: Extend pKVM page ownership selftests to cover forced
reclaim
.../admin-guide/kernel-parameters.txt | 4 +-
Documentation/virt/kvm/arm/index.rst | 1 +
Documentation/virt/kvm/arm/pkvm.rst | 101 ++++
arch/arm64/include/asm/kvm_asm.h | 7 +-
arch/arm64/include/asm/kvm_emulate.h | 5 +
arch/arm64/include/asm/kvm_host.h | 7 +
arch/arm64/include/asm/kvm_pgtable.h | 27 +-
arch/arm64/include/asm/kvm_pkvm.h | 4 +-
arch/arm64/include/asm/virt.h | 6 +
arch/arm64/kvm/Kconfig | 10 +
arch/arm64/kvm/arm.c | 8 +-
arch/arm64/kvm/hyp/exception.c | 100 ++--
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 9 +
arch/arm64/kvm/hyp/include/nvhe/memory.h | 6 +
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 7 +-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 95 ++--
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 500 ++++++++++++++++--
arch/arm64/kvm/hyp/nvhe/pkvm.c | 223 +++++++-
arch/arm64/kvm/hyp/nvhe/switch.c | 1 +
arch/arm64/kvm/hyp/nvhe/sys_regs.c | 8 +
arch/arm64/kvm/hyp/pgtable.c | 22 +-
arch/arm64/kvm/mmu.c | 122 ++++-
arch/arm64/kvm/pkvm.c | 149 +++++-
arch/arm64/mm/fault.c | 31 +-
include/uapi/linux/kvm.h | 5 +
25 files changed, 1249 insertions(+), 209 deletions(-)
create mode 100644 Documentation/virt/kvm/arm/pkvm.rst
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 01/30] KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-06 14:33 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 02/30] KVM: arm64: Remove redundant 'pgt' pointer checks from MMU notifiers Will Deacon
` (28 subsequent siblings)
29 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Commit ddcadb297ce5 ("KVM: arm64: Ignore EAGAIN for walks outside of a
fault") introduced a new walker flag ('KVM_PGTABLE_WALK_HANDLE_FAULT')
to KVM's page-table code. When set, the walk logic maintains its
previous behaviour of terminating a walk as soon as the visitor callback
returns an error. However, when the flag is clear, the walk continues if
the visitor returns -EAGAIN; the error is then suppressed and zero is
returned to the caller.
Clearing the flag is beneficial when write-protecting a range of IPAs
with kvm_pgtable_stage2_wrprotect() but is not useful in any other
cases, either because we are operating on a single page (e.g.
kvm_pgtable_stage2_mkyoung() or kvm_phys_addr_ioremap()) or because the
early termination is desirable (e.g. when mapping pages from a fault in
user_mem_abort()).
Subsequently, commit e912efed485a ("KVM: arm64: Introduce the EL1 pKVM
MMU") hooked up pKVM's hypercall interface to the MMU code at EL1 but
failed to propagate any of the walker flags. As a result, page-table
walks at EL2 fail to set KVM_PGTABLE_WALK_HANDLE_FAULT even when the
early termination semantics are desirable on the fault handling path.
Rather than complicate the pKVM hypercall interface, invert the sense of
the flag so that the whole thing can be simplified: only the wrprotect
code needs to pass the new flag ('KVM_PGTABLE_WALK_IGNORE_EAGAIN').
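For context, the fault handlers rely on -EAGAIN reaching them so that the
vCPU simply retries the access; a sketch of the existing pattern in mmu.c:
	ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, PAGE_SIZE,
						 __pfn_to_phys(pfn), prot,
						 memcache, flags);
	/* -EAGAIN means we raced with another vCPU: let the guest retry */
	return ret != -EAGAIN ? ret : 0;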
Cc: Fuad Tabba <tabba@google.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Fixes: fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM")
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_pgtable.h | 6 +++---
arch/arm64/kvm/hyp/pgtable.c | 5 +++--
arch/arm64/kvm/mmu.c | 8 +++-----
3 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index fc02de43c68d..8b78d573fbcf 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -293,8 +293,8 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
* children.
* @KVM_PGTABLE_WALK_SHARED: Indicates the page-tables may be shared
* with other software walkers.
- * @KVM_PGTABLE_WALK_HANDLE_FAULT: Indicates the page-table walk was
- * invoked from a fault handler.
+ * @KVM_PGTABLE_WALK_IGNORE_EAGAIN: Don't terminate the walk early if
+ * the walker returns -EAGAIN.
* @KVM_PGTABLE_WALK_SKIP_BBM_TLBI: Visit and update table entries
* without Break-before-make's
* TLB invalidation.
@@ -307,7 +307,7 @@ enum kvm_pgtable_walk_flags {
KVM_PGTABLE_WALK_TABLE_PRE = BIT(1),
KVM_PGTABLE_WALK_TABLE_POST = BIT(2),
KVM_PGTABLE_WALK_SHARED = BIT(3),
- KVM_PGTABLE_WALK_HANDLE_FAULT = BIT(4),
+ KVM_PGTABLE_WALK_IGNORE_EAGAIN = BIT(4),
KVM_PGTABLE_WALK_SKIP_BBM_TLBI = BIT(5),
KVM_PGTABLE_WALK_SKIP_CMO = BIT(6),
};
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 947ac1a951a5..9abc0a6cf448 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -144,7 +144,7 @@ static bool kvm_pgtable_walk_continue(const struct kvm_pgtable_walker *walker,
* page table walk.
*/
if (r == -EAGAIN)
- return !(walker->flags & KVM_PGTABLE_WALK_HANDLE_FAULT);
+ return walker->flags & KVM_PGTABLE_WALK_IGNORE_EAGAIN;
return !r;
}
@@ -1262,7 +1262,8 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
return stage2_update_leaf_attrs(pgt, addr, size, 0,
KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W,
- NULL, NULL, 0);
+ NULL, NULL,
+ KVM_PGTABLE_WALK_IGNORE_EAGAIN);
}
void kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 48d7c372a4cd..2b260bdf3202 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1563,14 +1563,12 @@ static void adjust_nested_exec_perms(struct kvm *kvm,
*prot &= ~KVM_PGTABLE_PROT_PX;
}
-#define KVM_PGTABLE_WALK_MEMABORT_FLAGS (KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED)
-
static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_s2_trans *nested,
struct kvm_memory_slot *memslot, bool is_perm)
{
bool write_fault, exec_fault, writable;
- enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
+ enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
unsigned long mmu_seq;
@@ -1665,7 +1663,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_pgtable *pgt;
struct page *page;
vm_flags_t vm_flags;
- enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
+ enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
if (fault_is_perm)
fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
@@ -1933,7 +1931,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
/* Resolve the access fault by making the page young again. */
static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
{
- enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
+ enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
struct kvm_s2_mmu *mmu;
trace_kvm_access_fault(fault_ipa);
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 02/30] KVM: arm64: Remove redundant 'pgt' pointer checks from MMU notifiers
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
2026-01-05 15:49 ` [PATCH 01/30] KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-06 14:32 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 03/30] KVM: arm64: Rename __pkvm_pgtable_stage2_unmap() Will Deacon
` (27 subsequent siblings)
29 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
The MMU notifiers are registered by kvm_init_mmu_notifier() only after
kvm_arch_init_vm() has returned successfully. Since the latter function
initialises the 'kvm->arch.mmu.pgt' pointer (and allocates the VM handle
when pKVM is enabled), the initialisation checks in the MMU notifiers
are not required.
Remove the redundant checks from the MMU notifier callbacks.
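For reference, the ordering in the generic VM creation path is roughly as
follows (a sketch of kvm_main.c, not part of this patch):
	/* kvm_create_vm() */
	r = kvm_arch_init_vm(kvm, type);	/* sets up kvm->arch.mmu.pgt (and the pKVM handle) */
	...
	r = kvm_init_mmu_notifier(kvm);		/* notifiers can only fire after this succeeds */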
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/mmu.c | 9 ---------
arch/arm64/kvm/pkvm.c | 9 ++++++---
2 files changed, 6 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 2b260bdf3202..5d18927f76ba 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2208,9 +2208,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
{
- if (!kvm->arch.mmu.pgt)
- return false;
-
__unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
(range->end - range->start) << PAGE_SHIFT,
range->may_block);
@@ -2223,9 +2220,6 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- if (!kvm->arch.mmu.pgt)
- return false;
-
return KVM_PGT_FN(kvm_pgtable_stage2_test_clear_young)(kvm->arch.mmu.pgt,
range->start << PAGE_SHIFT,
size, true);
@@ -2239,9 +2233,6 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- if (!kvm->arch.mmu.pgt)
- return false;
-
return KVM_PGT_FN(kvm_pgtable_stage2_test_clear_young)(kvm->arch.mmu.pgt,
range->start << PAGE_SHIFT,
size, false);
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index d7a0f69a9982..7797813f4dbe 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -329,9 +329,6 @@ static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 e
struct pkvm_mapping *mapping;
int ret;
- if (!handle)
- return 0;
-
for_each_mapping_in_range_safe(pgt, start, end, mapping) {
ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn,
mapping->nr_pages);
@@ -347,6 +344,12 @@ static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 e
void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
u64 addr, u64 size)
{
+ struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+ pkvm_handle_t handle = kvm->arch.pkvm.handle;
+
+ if (!handle)
+ return;
+
__pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
}
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 03/30] KVM: arm64: Rename __pkvm_pgtable_stage2_unmap()
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
2026-01-05 15:49 ` [PATCH 01/30] KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers Will Deacon
2026-01-05 15:49 ` [PATCH 02/30] KVM: arm64: Remove redundant 'pgt' pointer checks from MMU notifiers Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-05 15:49 ` [PATCH 04/30] KVM: arm64: Don't advertise unsupported features for protected guests Will Deacon
` (26 subsequent siblings)
29 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
In preparation for adding support for protected VMs, where pages are
donated rather than shared, rename __pkvm_pgtable_stage2_unmap() to
__pkvm_pgtable_stage2_unshare() to make it clearer what is actually
going on.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/pkvm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 7797813f4dbe..42f6e50825ac 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -322,7 +322,7 @@ int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
return 0;
}
-static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 end)
+static int __pkvm_pgtable_stage2_unshare(struct kvm_pgtable *pgt, u64 start, u64 end)
{
struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
pkvm_handle_t handle = kvm->arch.pkvm.handle;
@@ -350,7 +350,7 @@ void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
if (!handle)
return;
- __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+ __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
}
void pkvm_pgtable_stage2_destroy_pgd(struct kvm_pgtable *pgt)
@@ -386,7 +386,7 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
return -EAGAIN;
/* Remove _any_ pkvm_mapping overlapping with the range, bigger or smaller. */
- ret = __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+ ret = __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
if (ret)
return ret;
mapping = NULL;
@@ -409,7 +409,7 @@ int pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
lockdep_assert_held_write(&kvm_s2_mmu_to_kvm(pgt->mmu)->mmu_lock);
- return __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+ return __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
}
int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 04/30] KVM: arm64: Don't advertise unsupported features for protected guests
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (2 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 03/30] KVM: arm64: Rename __pkvm_pgtable_stage2_unmap() Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-05 15:49 ` [PATCH 05/30] KVM: arm64: Expose self-hosted debug regs as RAZ/WI " Will Deacon
` (25 subsequent siblings)
29 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Both SVE and PMUv3 are treated as "restricted" features for protected
guests and attempts to access their corresponding architectural state
from a protected guest result in an undefined exception being injected
by the hypervisor.
Since these exceptions are unexpected and typically fatal for the guest,
don't advertise these features for protected guests.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_pkvm.h | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 0aecd4ac5f45..5a71d25febca 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -37,8 +37,6 @@ static inline bool kvm_pvm_ext_allowed(long ext)
case KVM_CAP_MAX_VCPU_ID:
case KVM_CAP_MSI_DEVID:
case KVM_CAP_ARM_VM_IPA_SIZE:
- case KVM_CAP_ARM_PMU_V3:
- case KVM_CAP_ARM_SVE:
case KVM_CAP_ARM_PTRAUTH_ADDRESS:
case KVM_CAP_ARM_PTRAUTH_GENERIC:
return true;
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 05/30] KVM: arm64: Expose self-hosted debug regs as RAZ/WI for protected guests
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (3 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 04/30] KVM: arm64: Don't advertise unsupported features for protected guests Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-05 15:49 ` [PATCH 06/30] KVM: arm64: Remove pointless is_protected_kvm_enabled() checks from hyp Will Deacon
` (24 subsequent siblings)
29 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
From: Fuad Tabba <tabba@google.com>
Debug and trace are not currently supported for protected guests, so
trap accesses to the related registers and emulate them as RAZ/WI for
now. Although this isn't strictly compatible with the architecture, it's
sufficient for Linux guests and means that debug support can be added
later on.
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/sys_regs.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/arm64/kvm/hyp/nvhe/sys_regs.c b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
index 3108b5185c20..c106bb796ab0 100644
--- a/arch/arm64/kvm/hyp/nvhe/sys_regs.c
+++ b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
@@ -372,6 +372,14 @@ static const struct sys_reg_desc pvm_sys_reg_descs[] = {
/* Cache maintenance by set/way operations are restricted. */
/* Debug and Trace Registers are restricted. */
+ RAZ_WI(SYS_DBGBVRn_EL1(0)),
+ RAZ_WI(SYS_DBGBCRn_EL1(0)),
+ RAZ_WI(SYS_DBGWVRn_EL1(0)),
+ RAZ_WI(SYS_DBGWCRn_EL1(0)),
+ RAZ_WI(SYS_MDSCR_EL1),
+ RAZ_WI(SYS_OSLAR_EL1),
+ RAZ_WI(SYS_OSLSR_EL1),
+ RAZ_WI(SYS_OSDLR_EL1),
/* Group 1 ID registers */
HOST_HANDLED(SYS_REVIDR_EL1),
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 06/30] KVM: arm64: Remove pointless is_protected_kvm_enabled() checks from hyp
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (4 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 05/30] KVM: arm64: Expose self-hosted debug regs as RAZ/WI " Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-06 14:40 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 07/30] KVM: arm64: Ignore MMU notifier callbacks for protected VMs Will Deacon
` (23 subsequent siblings)
29 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
When pKVM is not enabled, the host shouldn't issue pKVM-specific
hypercalls and so there's no point checking for this in the EL2
hypercall handling code.
Remove the redundant is_protected_kvm_enabled() checks.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 33 ++----------------------------
1 file changed, 2 insertions(+), 31 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index a7c689152f68..62a56c6084ca 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -169,9 +169,6 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
DECLARE_REG(u64, hcr_el2, host_ctxt, 3);
struct pkvm_hyp_vcpu *hyp_vcpu;
- if (!is_protected_kvm_enabled())
- return;
-
hyp_vcpu = pkvm_load_hyp_vcpu(handle, vcpu_idx);
if (!hyp_vcpu)
return;
@@ -185,12 +182,8 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
{
- struct pkvm_hyp_vcpu *hyp_vcpu;
+ struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
- if (!is_protected_kvm_enabled())
- return;
-
- hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
if (hyp_vcpu)
pkvm_put_hyp_vcpu(hyp_vcpu);
}
@@ -254,9 +247,6 @@ static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
struct pkvm_hyp_vcpu *hyp_vcpu;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
goto out;
@@ -278,9 +268,6 @@ static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
struct pkvm_hyp_vm *hyp_vm;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vm = get_np_pkvm_hyp_vm(handle);
if (!hyp_vm)
goto out;
@@ -298,9 +285,6 @@ static void handle___pkvm_host_relax_perms_guest(struct kvm_cpu_context *host_ct
struct pkvm_hyp_vcpu *hyp_vcpu;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
goto out;
@@ -318,9 +302,6 @@ static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt
struct pkvm_hyp_vm *hyp_vm;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vm = get_np_pkvm_hyp_vm(handle);
if (!hyp_vm)
goto out;
@@ -340,9 +321,6 @@ static void handle___pkvm_host_test_clear_young_guest(struct kvm_cpu_context *ho
struct pkvm_hyp_vm *hyp_vm;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vm = get_np_pkvm_hyp_vm(handle);
if (!hyp_vm)
goto out;
@@ -359,9 +337,6 @@ static void handle___pkvm_host_mkyoung_guest(struct kvm_cpu_context *host_ctxt)
struct pkvm_hyp_vcpu *hyp_vcpu;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
goto out;
@@ -421,12 +396,8 @@ static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
static void handle___pkvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
- struct pkvm_hyp_vm *hyp_vm;
+ struct pkvm_hyp_vm *hyp_vm = get_np_pkvm_hyp_vm(handle);
- if (!is_protected_kvm_enabled())
- return;
-
- hyp_vm = get_np_pkvm_hyp_vm(handle);
if (!hyp_vm)
return;
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 07/30] KVM: arm64: Ignore MMU notifier callbacks for protected VMs
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (5 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 06/30] KVM: arm64: Remove pointless is_protected_kvm_enabled() checks from hyp Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-05 15:49 ` [PATCH 08/30] KVM: arm64: Prevent unsupported memslot operations on " Will Deacon
` (22 subsequent siblings)
29 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
In preparation for supporting the donation of pinned pages to protected
VMs, return early from the MMU notifiers when called for a protected VM,
as the necessary hypercalls are exposed only for non-protected guests.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/mmu.c | 12 ++++++++++++
arch/arm64/kvm/pkvm.c | 19 ++++++++++++++++++-
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 5d18927f76ba..a888840497f9 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -340,6 +340,9 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
u64 size, bool may_block)
{
+ if (kvm_vm_is_protected(kvm_s2_mmu_to_kvm(mmu)))
+ return;
+
__unmap_stage2_range(mmu, start, size, may_block);
}
@@ -2208,6 +2211,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
{
+ if (kvm_vm_is_protected(kvm))
+ return false;
+
__unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
(range->end - range->start) << PAGE_SHIFT,
range->may_block);
@@ -2220,6 +2226,9 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
+ if (kvm_vm_is_protected(kvm))
+ return false;
+
return KVM_PGT_FN(kvm_pgtable_stage2_test_clear_young)(kvm->arch.mmu.pgt,
range->start << PAGE_SHIFT,
size, true);
@@ -2233,6 +2242,9 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
+ if (kvm_vm_is_protected(kvm))
+ return false;
+
return KVM_PGT_FN(kvm_pgtable_stage2_test_clear_young)(kvm->arch.mmu.pgt,
range->start << PAGE_SHIFT,
size, false);
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 42f6e50825ac..20d50abb3b94 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -407,7 +407,12 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
int pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
- lockdep_assert_held_write(&kvm_s2_mmu_to_kvm(pgt->mmu)->mmu_lock);
+ struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+
+ if (WARN_ON(kvm_vm_is_protected(kvm)))
+ return -EPERM;
+
+ lockdep_assert_held_write(&kvm->mmu_lock);
return __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
}
@@ -419,6 +424,9 @@ int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
struct pkvm_mapping *mapping;
int ret = 0;
+ if (WARN_ON(kvm_vm_is_protected(kvm)))
+ return -EPERM;
+
lockdep_assert_held(&kvm->mmu_lock);
for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) {
ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn,
@@ -450,6 +458,9 @@ bool pkvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64
struct pkvm_mapping *mapping;
bool young = false;
+ if (WARN_ON(kvm_vm_is_protected(kvm)))
+ return -EPERM;
+
lockdep_assert_held(&kvm->mmu_lock);
for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping)
young |= kvm_call_hyp_nvhe(__pkvm_host_test_clear_young_guest, handle, mapping->gfn,
@@ -461,12 +472,18 @@ bool pkvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64
int pkvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_prot prot,
enum kvm_pgtable_walk_flags flags)
{
+ if (WARN_ON(kvm_vm_is_protected(kvm_s2_mmu_to_kvm(pgt->mmu))))
+ return -EPERM;
+
return kvm_call_hyp_nvhe(__pkvm_host_relax_perms_guest, addr >> PAGE_SHIFT, prot);
}
void pkvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_walk_flags flags)
{
+ if (WARN_ON(kvm_vm_is_protected(kvm_s2_mmu_to_kvm(pgt->mmu))))
+ return;
+
WARN_ON(kvm_call_hyp_nvhe(__pkvm_host_mkyoung_guest, addr >> PAGE_SHIFT));
}
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 08/30] KVM: arm64: Prevent unsupported memslot operations on protected VMs
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (6 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 07/30] KVM: arm64: Ignore MMU notifier callbacks for protected VMs Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-05 15:49 ` [PATCH 09/30] KVM: arm64: Split teardown hypercall into two phases Will Deacon
` (21 subsequent siblings)
29 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Protected VMs do not support deleting or moving memslots after first
run, nor do they support read-only memslots or dirty logging.
Return -EPERM to userspace if such an operation is attempted.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/mmu.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index a888840497f9..8f290d9477d3 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2419,6 +2419,19 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
hva_t hva, reg_end;
int ret = 0;
+ if (kvm_vm_is_protected(kvm)) {
+ /* Cannot modify memslots once a pVM has run. */
+ if (pkvm_hyp_vm_is_created(kvm) &&
+ (change == KVM_MR_DELETE || change == KVM_MR_MOVE)) {
+ return -EPERM;
+ }
+
+ if (new &&
+ new->flags & (KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY)) {
+ return -EPERM;
+ }
+ }
+
if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
change != KVM_MR_FLAGS_ONLY)
return 0;
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 09/30] KVM: arm64: Split teardown hypercall into two phases
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (7 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 08/30] KVM: arm64: Prevent unsupported memslot operations on " Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-05 15:49 ` [PATCH 10/30] KVM: arm64: Introduce __pkvm_host_donate_guest() Will Deacon
` (20 subsequent siblings)
29 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
In preparation for reclaiming protected guest VM pages from the host
during teardown, split the current 'pkvm_teardown_vm' hypercall into
separate 'start' and 'finalise' calls.
The 'pkvm_start_teardown_vm' hypercall puts the VM into a new 'is_dying'
state, which is a point of no return past which no vCPU of the pVM is
allowed to run any more. Once in this new state,
'pkvm_finalize_teardown_vm' can be used to reclaim meta-data and
page-table pages from the VM. A subsequent patch will add support for
reclaiming the individual guest memory pages.
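The intended host-side sequence then becomes (a sketch; the per-page
reclaim loop is only added by a later patch in the series):
	/* Point of no return: no vCPU of this VM can be loaded any more */
	kvm_call_hyp_nvhe(__pkvm_start_teardown_vm, handle);
	/* ... reclaim the guest's memory pages, one at a time ... */
	/* Return the VM's metadata and page-table pages to the host */
	kvm_call_hyp_nvhe(__pkvm_finalize_teardown_vm, handle);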
Co-developed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 3 ++-
arch/arm64/include/asm/kvm_host.h | 7 +++++
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 4 ++-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 14 +++++++---
arch/arm64/kvm/hyp/nvhe/pkvm.c | 36 ++++++++++++++++++++++----
arch/arm64/kvm/pkvm.c | 7 ++++-
6 files changed, 60 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index a1ad12c72ebf..d7fb23d39956 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -85,7 +85,8 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
- __KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
+ __KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
+ __KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
__KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ac7f970c7883..3191d10a2622 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -255,6 +255,13 @@ struct kvm_protected_vm {
struct kvm_hyp_memcache stage2_teardown_mc;
bool is_protected;
bool is_created;
+
+ /*
+ * True when the guest is being torn down. When in this state, the
+ * guest's vCPUs can't be loaded anymore, but its pages can be
+ * reclaimed by the host.
+ */
+ bool is_dying;
};
struct kvm_mpidr_data {
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 184ad7a39950..04c7ca703014 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -73,7 +73,9 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
unsigned long pgd_hva);
int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
unsigned long vcpu_hva);
-int __pkvm_teardown_vm(pkvm_handle_t handle);
+
+int __pkvm_start_teardown_vm(pkvm_handle_t handle);
+int __pkvm_finalize_teardown_vm(pkvm_handle_t handle);
struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
unsigned int vcpu_idx);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 62a56c6084ca..e88b57eac25c 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -550,11 +550,18 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
}
-static void handle___pkvm_teardown_vm(struct kvm_cpu_context *host_ctxt)
+static void handle___pkvm_start_teardown_vm(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
- cpu_reg(host_ctxt, 1) = __pkvm_teardown_vm(handle);
+ cpu_reg(host_ctxt, 1) = __pkvm_start_teardown_vm(handle);
+}
+
+static void handle___pkvm_finalize_teardown_vm(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+
+ cpu_reg(host_ctxt, 1) = __pkvm_finalize_teardown_vm(handle);
}
typedef void (*hcall_t)(struct kvm_cpu_context *);
@@ -594,7 +601,8 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_unreserve_vm),
HANDLE_FUNC(__pkvm_init_vm),
HANDLE_FUNC(__pkvm_init_vcpu),
- HANDLE_FUNC(__pkvm_teardown_vm),
+ HANDLE_FUNC(__pkvm_start_teardown_vm),
+ HANDLE_FUNC(__pkvm_finalize_teardown_vm),
HANDLE_FUNC(__pkvm_vcpu_load),
HANDLE_FUNC(__pkvm_vcpu_put),
HANDLE_FUNC(__pkvm_tlb_flush_vmid),
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 8911338961c5..7f8191f96fc3 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -256,7 +256,10 @@ struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
hyp_spin_lock(&vm_table_lock);
hyp_vm = get_vm_by_handle(handle);
- if (!hyp_vm || hyp_vm->kvm.created_vcpus <= vcpu_idx)
+ if (!hyp_vm || hyp_vm->kvm.arch.pkvm.is_dying)
+ goto unlock;
+
+ if (hyp_vm->kvm.created_vcpus <= vcpu_idx)
goto unlock;
hyp_vcpu = hyp_vm->vcpus[vcpu_idx];
@@ -829,7 +832,32 @@ teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr, size_t size)
unmap_donated_memory_noclear(addr, size);
}
-int __pkvm_teardown_vm(pkvm_handle_t handle)
+int __pkvm_start_teardown_vm(pkvm_handle_t handle)
+{
+ struct pkvm_hyp_vm *hyp_vm;
+ int ret = 0;
+
+ hyp_spin_lock(&vm_table_lock);
+ hyp_vm = get_vm_by_handle(handle);
+ if (!hyp_vm) {
+ ret = -ENOENT;
+ goto unlock;
+ } else if (WARN_ON(hyp_page_count(hyp_vm))) {
+ ret = -EBUSY;
+ goto unlock;
+ } else if (hyp_vm->kvm.arch.pkvm.is_dying) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ hyp_vm->kvm.arch.pkvm.is_dying = true;
+unlock:
+ hyp_spin_unlock(&vm_table_lock);
+
+ return ret;
+}
+
+int __pkvm_finalize_teardown_vm(pkvm_handle_t handle)
{
struct kvm_hyp_memcache *mc, *stage2_mc;
struct pkvm_hyp_vm *hyp_vm;
@@ -843,9 +871,7 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
if (!hyp_vm) {
err = -ENOENT;
goto err_unlock;
- }
-
- if (WARN_ON(hyp_page_count(hyp_vm))) {
+ } else if (!hyp_vm->kvm.arch.pkvm.is_dying) {
err = -EBUSY;
goto err_unlock;
}
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 20d50abb3b94..a39dacd1d617 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -88,7 +88,7 @@ void __init kvm_hyp_reserve(void)
static void __pkvm_destroy_hyp_vm(struct kvm *kvm)
{
if (pkvm_hyp_vm_is_created(kvm)) {
- WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm,
+ WARN_ON(kvm_call_hyp_nvhe(__pkvm_finalize_teardown_vm,
kvm->arch.pkvm.handle));
} else if (kvm->arch.pkvm.handle) {
/*
@@ -350,6 +350,11 @@ void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
if (!handle)
return;
+ if (pkvm_hyp_vm_is_created(kvm) && !kvm->arch.pkvm.is_dying) {
+ WARN_ON(kvm_call_hyp_nvhe(__pkvm_start_teardown_vm, handle));
+ kvm->arch.pkvm.is_dying = true;
+ }
+
__pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
}
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 10/30] KVM: arm64: Introduce __pkvm_host_donate_guest()
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (8 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 09/30] KVM: arm64: Split teardown hypercall into two phases Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-06 14:48 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 11/30] KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map() Will Deacon
` (19 subsequent siblings)
29 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
In preparation for supporting protected VMs, whose memory pages are
isolated from the host, introduce a new pKVM hypercall to allow the
donation of pages to a guest.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/include/asm/kvm_pgtable.h | 2 +-
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 ++
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 21 +++++++++++++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 30 +++++++++++++++++++
5 files changed, 55 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index d7fb23d39956..cad3ba5e1c5a 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -64,6 +64,7 @@ enum __kvm_host_smccc_func {
/* Hypercalls available after pKVM finalisation */
__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_donate_guest,
__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_perms_guest,
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8b78d573fbcf..9ce55442b621 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -98,7 +98,7 @@ typedef u64 kvm_pte_t;
KVM_PTE_LEAF_ATTR_HI_S2_XN)
#define KVM_INVALID_PTE_OWNER_MASK GENMASK(9, 2)
-#define KVM_MAX_OWNER_ID 1
+#define KVM_MAX_OWNER_ID 3
/*
* Used to indicate a pte for which a 'break-before-make' sequence is in
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 5f9d56754e39..9c0cc53d1dc9 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -28,6 +28,7 @@ enum pkvm_component_id {
PKVM_ID_HOST,
PKVM_ID_HYP,
PKVM_ID_FFA,
+ PKVM_ID_GUEST,
};
extern unsigned long hyp_nr_cpus;
@@ -39,6 +40,7 @@ int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
+int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
enum kvm_pgtable_prot prot);
int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index e88b57eac25c..a5ee1103ce1f 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -238,6 +238,26 @@ static int pkvm_refill_memcache(struct pkvm_hyp_vcpu *hyp_vcpu)
&host_vcpu->arch.pkvm_memcache);
}
+static void handle___pkvm_host_donate_guest(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(u64, pfn, host_ctxt, 1);
+ DECLARE_REG(u64, gfn, host_ctxt, 2);
+ struct pkvm_hyp_vcpu *hyp_vcpu;
+ int ret = -EINVAL;
+
+ hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+ if (!hyp_vcpu)
+ goto out;
+
+ ret = pkvm_refill_memcache(hyp_vcpu);
+ if (ret)
+ goto out;
+
+ ret = __pkvm_host_donate_guest(pfn, gfn, hyp_vcpu);
+out:
+ cpu_reg(host_ctxt, 1) = ret;
+}
+
static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(u64, pfn, host_ctxt, 1);
@@ -580,6 +600,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_host_share_hyp),
HANDLE_FUNC(__pkvm_host_unshare_hyp),
+ HANDLE_FUNC(__pkvm_host_donate_guest),
HANDLE_FUNC(__pkvm_host_share_guest),
HANDLE_FUNC(__pkvm_host_unshare_guest),
HANDLE_FUNC(__pkvm_host_relax_perms_guest),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 49db32f3ddf7..ae126ab9febf 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -958,6 +958,36 @@ static int __guest_check_transition_size(u64 phys, u64 ipa, u64 nr_pages, u64 *s
return 0;
}
+int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
+{
+ struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+ u64 phys = hyp_pfn_to_phys(pfn);
+ u64 ipa = hyp_pfn_to_phys(gfn);
+ int ret;
+
+ host_lock_component();
+ guest_lock_component(vm);
+
+ ret = __host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_OWNED);
+ if (ret)
+ goto unlock;
+
+ ret = __guest_check_page_state_range(vm, ipa, PAGE_SIZE, PKVM_NOPAGE);
+ if (ret)
+ goto unlock;
+
+ WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_GUEST));
+ WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
+ pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
+ &vcpu->vcpu.arch.pkvm_memcache, 0));
+
+unlock:
+ guest_unlock_component(vm);
+ host_unlock_component();
+
+ return ret;
+}
+
int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
enum kvm_pgtable_prot prot)
{
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 11/30] KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map()
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (9 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 10/30] KVM: arm64: Introduce __pkvm_host_donate_guest() Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-05 15:49 ` [PATCH 12/30] KVM: arm64: Handle aborts from protected VMs Will Deacon
` (18 subsequent siblings)
29 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Mapping pages into a protected guest requires the donation of memory
from the host.
Extend pkvm_pgtable_stage2_map() to issue a donate hypercall when the
target VM is protected. Since the hypercall only handles a single page,
the splitting logic used for the share path is not required.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/pkvm.c | 58 ++++++++++++++++++++++++++++++-------------
1 file changed, 41 insertions(+), 17 deletions(-)
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index a39dacd1d617..1814e17d600e 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -373,31 +373,55 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
struct kvm_hyp_memcache *cache = mc;
u64 gfn = addr >> PAGE_SHIFT;
u64 pfn = phys >> PAGE_SHIFT;
+ u64 end = addr + size;
int ret;
- if (size != PAGE_SIZE && size != PMD_SIZE)
- return -EINVAL;
-
lockdep_assert_held_write(&kvm->mmu_lock);
+ mapping = pkvm_mapping_iter_first(&pgt->pkvm_mappings, addr, end - 1);
- /*
- * Calling stage2_map() on top of existing mappings is either happening because of a race
- * with another vCPU, or because we're changing between page and block mappings. As per
- * user_mem_abort(), same-size permission faults are handled in the relax_perms() path.
- */
- mapping = pkvm_mapping_iter_first(&pgt->pkvm_mappings, addr, addr + size - 1);
- if (mapping) {
- if (size == (mapping->nr_pages * PAGE_SIZE))
+ if (kvm_vm_is_protected(kvm)) {
+ /* Protected VMs are mapped using RWX page-granular mappings */
+ if (WARN_ON_ONCE(size != PAGE_SIZE))
+ return -EINVAL;
+
+ if (WARN_ON_ONCE(prot != KVM_PGTABLE_PROT_RWX))
+ return -EINVAL;
+
+ /*
+ * We raced with another vCPU.
+ */
+ if (mapping)
return -EAGAIN;
- /* Remove _any_ pkvm_mapping overlapping with the range, bigger or smaller. */
- ret = __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
- if (ret)
- return ret;
- mapping = NULL;
+ ret = kvm_call_hyp_nvhe(__pkvm_host_donate_guest, pfn, gfn);
+ } else {
+ if (WARN_ON_ONCE(size != PAGE_SIZE && size != PMD_SIZE))
+ return -EINVAL;
+
+ /*
+ * We either raced with another vCPU or we're changing between
+ * page and block mappings. As per user_mem_abort(), same-size
+ * permission faults are handled in the relax_perms() path.
+ */
+ if (mapping) {
+ if (size == (mapping->nr_pages * PAGE_SIZE))
+ return -EAGAIN;
+
+ /*
+ * Remove _any_ pkvm_mapping overlapping with the range,
+ * bigger or smaller.
+ */
+ ret = __pkvm_pgtable_stage2_unshare(pgt, addr, end);
+ if (ret)
+ return ret;
+
+ mapping = NULL;
+ }
+
+ ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn,
+ size / PAGE_SIZE, prot);
}
- ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, size / PAGE_SIZE, prot);
if (WARN_ON(ret))
return ret;
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 12/30] KVM: arm64: Handle aborts from protected VMs
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (10 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 11/30] KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map() Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-05 15:49 ` [PATCH 13/30] KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page() Will Deacon
` (17 subsequent siblings)
29 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Introduce a new abort handler for resolving stage-2 page faults from
protected VMs by pinning and donating anonymous memory. This is
considerably simpler than the infamous user_mem_abort() as we only have
to deal with translation faults at the pte level.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/mmu.c | 89 ++++++++++++++++++++++++++++++++++++++++----
1 file changed, 81 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8f290d9477d3..fafedcbd516d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1641,6 +1641,74 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return ret != -EAGAIN ? ret : 0;
}
+static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+ struct kvm_memory_slot *memslot, unsigned long hva)
+{
+ unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
+ struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
+ struct mm_struct *mm = current->mm;
+ struct kvm *kvm = vcpu->kvm;
+ void *hyp_memcache;
+ struct page *page;
+ int ret;
+
+ ret = prepare_mmu_memcache(vcpu, true, &hyp_memcache);
+ if (ret)
+ return -ENOMEM;
+
+ ret = account_locked_vm(mm, 1, true);
+ if (ret)
+ return ret;
+
+ mmap_read_lock(mm);
+ ret = pin_user_pages(hva, 1, flags, &page);
+ mmap_read_unlock(mm);
+
+ if (ret == -EHWPOISON) {
+ kvm_send_hwpoison_signal(hva, PAGE_SHIFT);
+ ret = 0;
+ goto dec_account;
+ } else if (ret != 1) {
+ ret = -EFAULT;
+ goto dec_account;
+ } else if (!folio_test_swapbacked(page_folio(page))) {
+ /*
+ * We really can't deal with page-cache pages returned by GUP
+ * because (a) we may trigger writeback of a page for which we
+ * no longer have access and (b) page_mkclean() won't find the
+ * stage-2 mapping in the rmap so we can get out-of-whack with
+ * the filesystem when marking the page dirty during unpinning
+ * (see cc5095747edf ("ext4: don't BUG if someone dirty pages
+ * without asking ext4 first")).
+ *
+ * Ideally we'd just restrict ourselves to anonymous pages, but
+ * we also want to allow memfd (i.e. shmem) pages, so check for
+ * pages backed by swap in the knowledge that the GUP pin will
+ * prevent try_to_unmap() from succeeding.
+ */
+ ret = -EIO;
+ goto unpin;
+ }
+
+ write_lock(&kvm->mmu_lock);
+ ret = pkvm_pgtable_stage2_map(pgt, fault_ipa, PAGE_SIZE,
+ page_to_phys(page), KVM_PGTABLE_PROT_RWX,
+ hyp_memcache, 0);
+ write_unlock(&kvm->mmu_lock);
+ if (ret) {
+ if (ret == -EAGAIN)
+ ret = 0;
+ goto unpin;
+ }
+
+ return 0;
+unpin:
+ unpin_user_pages(&page, 1);
+dec_account:
+ account_locked_vm(mm, 1, false);
+ return ret;
+}
+
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_s2_trans *nested,
struct kvm_memory_slot *memslot, unsigned long hva,
@@ -2190,15 +2258,20 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
goto out_unlock;
}
- VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
- !write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
+ if (kvm_vm_is_protected(vcpu->kvm)) {
+ ret = pkvm_mem_abort(vcpu, fault_ipa, memslot, hva);
+ } else {
+ VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
+ !write_fault &&
+ !kvm_vcpu_trap_is_exec_fault(vcpu));
- if (kvm_slot_has_gmem(memslot))
- ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
- esr_fsc_is_permission_fault(esr));
- else
- ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
- esr_fsc_is_permission_fault(esr));
+ if (kvm_slot_has_gmem(memslot))
+ ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
+ esr_fsc_is_permission_fault(esr));
+ else
+ ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
+ esr_fsc_is_permission_fault(esr));
+ }
if (ret == 0)
ret = 1;
out:
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 13/30] KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page()
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (11 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 12/30] KVM: arm64: Handle aborts from protected VMs Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-06 16:26 ` Vincent Donnefort
2026-01-05 15:49 ` [PATCH 14/30] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy() Will Deacon
` (16 subsequent siblings)
29 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
To enable reclaim of pages from a protected VM during teardown,
introduce a new hypercall to reclaim a single page from a protected
guest that is in the dying state.
Since the EL2 code is non-preemptible, the new hypercall deliberately
acts on a single page at a time so as to allow EL1 to reschedule
frequently during the teardown operation.
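The expected host-side usage is therefore a simple loop along the lines
of the sketch below (illustrative only; the real hook-up to
pkvm_pgtable_stage2_destroy() follows in the next patch):
	for_each_mapping_in_range_safe(pgt, 0, ~(0ULL), mapping) {
		WARN_ON(kvm_call_hyp_nvhe(__pkvm_reclaim_dying_guest_page,
					  handle, mapping->gfn));
		/* EL2 only handled a single page, so EL1 can reschedule here */
		cond_resched();
	}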
Co-developed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 1 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 9 +++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 79 +++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/pkvm.c | 14 ++++
6 files changed, 105 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index cad3ba5e1c5a..f14f845aeedd 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -86,6 +86,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
+ __KVM_HOST_SMCCC_FUNC___pkvm_reclaim_dying_guest_page,
__KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 9c0cc53d1dc9..cde38a556049 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -41,6 +41,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
+int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm);
int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
enum kvm_pgtable_prot prot);
int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 04c7ca703014..506831804f64 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -74,6 +74,7 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
unsigned long vcpu_hva);
+int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 gfn);
int __pkvm_start_teardown_vm(pkvm_handle_t handle);
int __pkvm_finalize_teardown_vm(pkvm_handle_t handle);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index a5ee1103ce1f..b1940e639ad3 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -570,6 +570,14 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
}
+static void handle___pkvm_reclaim_dying_guest_page(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+ DECLARE_REG(u64, gfn, host_ctxt, 2);
+
+ cpu_reg(host_ctxt, 1) = __pkvm_reclaim_dying_guest_page(handle, gfn);
+}
+
static void handle___pkvm_start_teardown_vm(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
@@ -622,6 +630,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_unreserve_vm),
HANDLE_FUNC(__pkvm_init_vm),
HANDLE_FUNC(__pkvm_init_vcpu),
+ HANDLE_FUNC(__pkvm_reclaim_dying_guest_page),
HANDLE_FUNC(__pkvm_start_teardown_vm),
HANDLE_FUNC(__pkvm_finalize_teardown_vm),
HANDLE_FUNC(__pkvm_vcpu_load),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index ae126ab9febf..edbfe0e3dc58 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -725,6 +725,32 @@ static int __guest_check_page_state_range(struct pkvm_hyp_vm *vm, u64 addr,
return check_page_state_range(&vm->pgt, addr, size, &d);
}
+static int get_valid_guest_pte(struct pkvm_hyp_vm *vm, u64 ipa, kvm_pte_t *ptep, u64 *physp)
+{
+ kvm_pte_t pte;
+ u64 phys;
+ s8 level;
+ int ret;
+
+ ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
+ if (ret)
+ return ret;
+ if (!kvm_pte_valid(pte))
+ return -ENOENT;
+ if (level != KVM_PGTABLE_LAST_LEVEL)
+ return -E2BIG;
+
+ phys = kvm_pte_to_phys(pte);
+ ret = check_range_allowed_memory(phys, phys + PAGE_SIZE);
+ if (WARN_ON(ret))
+ return ret;
+
+ *ptep = pte;
+ *physp = phys;
+
+ return 0;
+}
+
int __pkvm_host_share_hyp(u64 pfn)
{
u64 phys = hyp_pfn_to_phys(pfn);
@@ -958,6 +984,59 @@ static int __guest_check_transition_size(u64 phys, u64 ipa, u64 nr_pages, u64 *s
return 0;
}
+static void hyp_poison_page(phys_addr_t phys)
+{
+ void *addr = hyp_fixmap_map(phys);
+
+ memset(addr, 0, PAGE_SIZE);
+ /*
+ * Prefer kvm_flush_dcache_to_poc() over __clean_dcache_guest_page()
+ * here as the latter may elide the CMO under the assumption that FWB
+ * will be enabled on CPUs that support it. This is incorrect for the
+ * host stage-2 and would otherwise lead to a malicious host potentially
+ * being able to read the contents of newly reclaimed guest pages.
+ */
+ kvm_flush_dcache_to_poc(addr, PAGE_SIZE);
+ hyp_fixmap_unmap();
+}
+
+int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
+{
+ u64 ipa = hyp_pfn_to_phys(gfn);
+ kvm_pte_t pte;
+ u64 phys;
+ int ret;
+
+ host_lock_component();
+ guest_lock_component(vm);
+
+ ret = get_valid_guest_pte(vm, ipa, &pte, &phys);
+ if (ret)
+ goto unlock;
+
+ switch (guest_get_page_state(pte, ipa)) {
+ case PKVM_PAGE_OWNED:
+ WARN_ON(__host_check_page_state_range(phys, PAGE_SIZE, PKVM_NOPAGE));
+ hyp_poison_page(phys);
+ break;
+ case PKVM_PAGE_SHARED_OWNED:
+ WARN_ON(__host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_BORROWED));
+ break;
+ default:
+ ret = -EPERM;
+ goto unlock;
+ }
+
+ WARN_ON(kvm_pgtable_stage2_unmap(&vm->pgt, ipa, PAGE_SIZE));
+ WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HOST));
+
+unlock:
+ guest_unlock_component(vm);
+ host_unlock_component();
+
+ return ret;
+}
+
int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
{
struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 7f8191f96fc3..9f0997150cf5 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -832,6 +832,20 @@ teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr, size_t size)
unmap_donated_memory_noclear(addr, size);
}
+int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 gfn)
+{
+ struct pkvm_hyp_vm *hyp_vm;
+ int ret = -EINVAL;
+
+ hyp_spin_lock(&vm_table_lock);
+ hyp_vm = get_vm_by_handle(handle);
+ if (hyp_vm && hyp_vm->kvm.arch.pkvm.is_dying)
+ ret = __pkvm_host_reclaim_page_guest(gfn, hyp_vm);
+ hyp_spin_unlock(&vm_table_lock);
+
+ return ret;
+}
+
int __pkvm_start_teardown_vm(pkvm_handle_t handle)
{
struct pkvm_hyp_vm *hyp_vm;
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 14/30] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy()
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (12 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 13/30] KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page() Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-06 14:59 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 15/30] KVM: arm64: Refactor enter_exception64() Will Deacon
` (15 subsequent siblings)
29 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
During teardown of a protected guest, its memory pages must be reclaimed
from the hypervisor by issuing the '__pkvm_reclaim_dying_guest_page'
hypercall.
Add a new helper, __pkvm_pgtable_stage2_reclaim(), which is called
during the VM teardown operation to reclaim pages from the hypervisor
and drop the GUP pin on the host.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/pkvm.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 1814e17d600e..8be91051699e 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -322,6 +322,32 @@ int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
return 0;
}
+static int __pkvm_pgtable_stage2_reclaim(struct kvm_pgtable *pgt, u64 start, u64 end)
+{
+ struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+ pkvm_handle_t handle = kvm->arch.pkvm.handle;
+ struct pkvm_mapping *mapping;
+ int ret;
+
+ for_each_mapping_in_range_safe(pgt, start, end, mapping) {
+ struct page *page;
+
+ ret = kvm_call_hyp_nvhe(__pkvm_reclaim_dying_guest_page,
+ handle, mapping->gfn);
+ if (WARN_ON(ret))
+ return ret;
+
+ page = pfn_to_page(mapping->pfn);
+ WARN_ON_ONCE(mapping->nr_pages != 1);
+ unpin_user_pages_dirty_lock(&page, 1, true);
+ account_locked_vm(current->mm, 1, false);
+ pkvm_mapping_remove(mapping, &pgt->pkvm_mappings);
+ kfree(mapping);
+ }
+
+ return 0;
+}
+
static int __pkvm_pgtable_stage2_unshare(struct kvm_pgtable *pgt, u64 start, u64 end)
{
struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
@@ -355,7 +381,10 @@ void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
kvm->arch.pkvm.is_dying = true;
}
- __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
+ if (kvm_vm_is_protected(kvm))
+ __pkvm_pgtable_stage2_reclaim(pgt, addr, addr + size);
+ else
+ __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
}
void pkvm_pgtable_stage2_destroy_pgd(struct kvm_pgtable *pgt)
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 15/30] KVM: arm64: Refactor enter_exception64()
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (13 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 14/30] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy() Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-05 15:49 ` [PATCH 16/30] KVM: arm64: Inject SIGSEGV on illegal accesses Will Deacon
` (14 subsequent siblings)
29 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
From: Quentin Perret <qperret@google.com>
In order to simplify injecting exceptions into the host in pKVM context,
refactor enter_exception64() so that the code computing the exception
vector offset from VBAR_EL1 and the code computing the new CPSR are
split out into standalone helpers.
No functional change intended.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_emulate.h | 5 ++
arch/arm64/kvm/hyp/exception.c | 100 ++++++++++++++++-----------
2 files changed, 63 insertions(+), 42 deletions(-)
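For the EL1h case, the two new helpers compose as in the sketch below
(same computation as before the refactoring, just pulled out of
enter_exception64(); the ELR/SPSR updates are elided here):

  u64 off  = get_except64_offset(*vcpu_cpsr(vcpu), PSR_MODE_EL1h,
                                 except_type_sync);
  u64 cpsr = get_except64_cpsr(*vcpu_cpsr(vcpu),
                               kvm_has_mte(kern_hyp_va(vcpu->kvm)),
                               __vcpu_read_sys_reg(vcpu, SCTLR_EL1),
                               PSR_MODE_EL1h);

  *vcpu_pc(vcpu)   = __vcpu_read_sys_reg(vcpu, VBAR_EL1) + off;
  *vcpu_cpsr(vcpu) = cpsr;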
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index c9eab316398e..c3f04bd5b2a5 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -71,6 +71,11 @@ static inline int kvm_inject_serror(struct kvm_vcpu *vcpu)
return kvm_inject_serror_esr(vcpu, ESR_ELx_ISV);
}
+unsigned long get_except64_offset(unsigned long psr, unsigned long target_mode,
+ enum exception_type type);
+unsigned long get_except64_cpsr(unsigned long old, bool has_mte,
+ unsigned long sctlr, unsigned long mode);
+
void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index bef40ddb16db..d3bcda665612 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -65,12 +65,25 @@ static void __vcpu_write_spsr_und(struct kvm_vcpu *vcpu, u64 val)
vcpu->arch.ctxt.spsr_und = val;
}
+unsigned long get_except64_offset(unsigned long psr, unsigned long target_mode,
+ enum exception_type type)
+{
+ u64 mode = psr & (PSR_MODE_MASK | PSR_MODE32_BIT);
+ u64 exc_offset;
+
+ if (mode == target_mode)
+ exc_offset = CURRENT_EL_SP_ELx_VECTOR;
+ else if ((mode | PSR_MODE_THREAD_BIT) == target_mode)
+ exc_offset = CURRENT_EL_SP_EL0_VECTOR;
+ else if (!(mode & PSR_MODE32_BIT))
+ exc_offset = LOWER_EL_AArch64_VECTOR;
+ else
+ exc_offset = LOWER_EL_AArch32_VECTOR;
+
+ return exc_offset + type;
+}
+
/*
- * This performs the exception entry at a given EL (@target_mode), stashing PC
- * and PSTATE into ELR and SPSR respectively, and compute the new PC/PSTATE.
- * The EL passed to this function *must* be a non-secure, privileged mode with
- * bit 0 being set (PSTATE.SP == 1).
- *
* When an exception is taken, most PSTATE fields are left unchanged in the
* handler. However, some are explicitly overridden (e.g. M[4:0]). Luckily all
* of the inherited bits have the same position in the AArch64/AArch32 SPSR_ELx
@@ -82,50 +95,17 @@ static void __vcpu_write_spsr_und(struct kvm_vcpu *vcpu, u64 val)
* Here we manipulate the fields in order of the AArch64 SPSR_ELx layout, from
* MSB to LSB.
*/
-static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
- enum exception_type type)
+unsigned long get_except64_cpsr(unsigned long old, bool has_mte,
+ unsigned long sctlr, unsigned long target_mode)
{
- unsigned long sctlr, vbar, old, new, mode;
- u64 exc_offset;
-
- mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
-
- if (mode == target_mode)
- exc_offset = CURRENT_EL_SP_ELx_VECTOR;
- else if ((mode | PSR_MODE_THREAD_BIT) == target_mode)
- exc_offset = CURRENT_EL_SP_EL0_VECTOR;
- else if (!(mode & PSR_MODE32_BIT))
- exc_offset = LOWER_EL_AArch64_VECTOR;
- else
- exc_offset = LOWER_EL_AArch32_VECTOR;
-
- switch (target_mode) {
- case PSR_MODE_EL1h:
- vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL1);
- sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
- __vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
- break;
- case PSR_MODE_EL2h:
- vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL2);
- sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL2);
- __vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL2);
- break;
- default:
- /* Don't do that */
- BUG();
- }
-
- *vcpu_pc(vcpu) = vbar + exc_offset + type;
-
- old = *vcpu_cpsr(vcpu);
- new = 0;
+ u64 new = 0;
new |= (old & PSR_N_BIT);
new |= (old & PSR_Z_BIT);
new |= (old & PSR_C_BIT);
new |= (old & PSR_V_BIT);
- if (kvm_has_mte(kern_hyp_va(vcpu->kvm)))
+ if (has_mte)
new |= PSR_TCO_BIT;
new |= (old & PSR_DIT_BIT);
@@ -161,6 +141,42 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
new |= target_mode;
+ return new;
+}
+
+/*
+ * This performs the exception entry at a given EL (@target_mode), stashing PC
+ * and PSTATE into ELR and SPSR respectively, and compute the new PC/PSTATE.
+ * The EL passed to this function *must* be a non-secure, privileged mode with
+ * bit 0 being set (PSTATE.SP == 1).
+ */
+static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
+ enum exception_type type)
+{
+ u64 offset = get_except64_offset(*vcpu_cpsr(vcpu), target_mode, type);
+ unsigned long sctlr, vbar, old, new;
+
+ switch (target_mode) {
+ case PSR_MODE_EL1h:
+ vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL1);
+ sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
+ __vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
+ break;
+ case PSR_MODE_EL2h:
+ vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL2);
+ sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL2);
+ __vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL2);
+ break;
+ default:
+ /* Don't do that */
+ BUG();
+ }
+
+ *vcpu_pc(vcpu) = vbar + offset;
+
+ old = *vcpu_cpsr(vcpu);
+ new = get_except64_cpsr(old, kvm_has_mte(kern_hyp_va(vcpu->kvm)), sctlr,
+ target_mode);
*vcpu_cpsr(vcpu) = new;
__vcpu_write_spsr(vcpu, target_mode, old);
}
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 16/30] KVM: arm64: Inject SIGSEGV on illegal accesses
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (14 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 15/30] KVM: arm64: Refactor enter_exception64() Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-05 15:49 ` [PATCH 17/30] KVM: arm64: Generalise kvm_pgtable_stage2_set_owner() Will Deacon
` (13 subsequent siblings)
29 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
From: Quentin Perret <qperret@google.com>
The pKVM hypervisor will currently panic if the host tries to access
memory that it doesn't own (e.g. protected guest memory). Sadly, as
guest memory can still be mapped into the VMM's address space, userspace
can trivially crash the kernel/hypervisor by poking into guest memory.
To prevent this, inject the abort back into the host with S1PTW set in
the ESR, allowing the host to differentiate this abort from normal
userspace faults and to inject a SIGSEGV cleanly.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 50 ++++++++++++++++++++++++++-
arch/arm64/mm/fault.c | 22 ++++++++++++
2 files changed, 71 insertions(+), 1 deletion(-)
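From the VMM's point of view, a stray access to donated guest memory now
shows up as a regular SIGSEGV with si_code set to SEGV_ACCERR rather
than taking down the host. A purely illustrative userspace sketch:

  #include <signal.h>
  #include <stdio.h>
  #include <stdlib.h>

  static void segv_handler(int sig, siginfo_t *info, void *ctx)
  {
          /* SEGV_ACCERR: the hypervisor refused the access at stage-2 */
          fprintf(stderr, "stage-2 fault at %p (si_code=%d)\n",
                  info->si_addr, info->si_code);
          exit(1);
  }

  static void install_segv_handler(void)
  {
          struct sigaction sa = {
                  .sa_sigaction   = segv_handler,
                  .sa_flags       = SA_SIGINFO,
          };

          sigaction(SIGSEGV, &sa, NULL);
  }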
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index edbfe0e3dc58..0336143cbb24 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -603,6 +603,50 @@ static int host_stage2_idmap(u64 addr)
return ret;
}
+static void host_inject_abort(struct kvm_cpu_context *host_ctxt)
+{
+ u64 spsr = read_sysreg_el2(SYS_SPSR);
+ u64 esr = read_sysreg_el2(SYS_ESR);
+ u64 ventry, ec;
+
+ /* Repaint the ESR to report a same-level fault if taken from EL1 */
+ if ((spsr & PSR_MODE_MASK) != PSR_MODE_EL0t) {
+ ec = ESR_ELx_EC(esr);
+ if (ec == ESR_ELx_EC_DABT_LOW)
+ ec = ESR_ELx_EC_DABT_CUR;
+ else if (ec == ESR_ELx_EC_IABT_LOW)
+ ec = ESR_ELx_EC_IABT_CUR;
+ else
+ WARN_ON(1);
+ esr &= ~ESR_ELx_EC_MASK;
+ esr |= ec << ESR_ELx_EC_SHIFT;
+ }
+
+ /*
+ * Since S1PTW should only ever be set for stage-2 faults, we're pretty
+ * much guaranteed that it won't be set in ESR_EL1 by the hardware. So,
+ * let's use that bit to allow the host abort handler to differentiate
+ * this abort from normal userspace faults.
+ *
+ * Note: although S1PTW is RES0 at EL1, it is guaranteed by the
+ * architecture to be backed by flops, so it should be safe to use.
+ */
+ esr |= ESR_ELx_S1PTW;
+
+ write_sysreg_el1(esr, SYS_ESR);
+ write_sysreg_el1(spsr, SYS_SPSR);
+ write_sysreg_el1(read_sysreg_el2(SYS_ELR), SYS_ELR);
+ write_sysreg_el1(read_sysreg_el2(SYS_FAR), SYS_FAR);
+
+ ventry = read_sysreg_el1(SYS_VBAR);
+ ventry += get_except64_offset(spsr, PSR_MODE_EL1h, except_type_sync);
+ write_sysreg_el2(ventry, SYS_ELR);
+
+ spsr = get_except64_cpsr(spsr, system_supports_mte(),
+ read_sysreg_el1(SYS_SCTLR), PSR_MODE_EL1h);
+ write_sysreg_el2(spsr, SYS_SPSR);
+}
+
void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
{
struct kvm_vcpu_fault_info fault;
@@ -627,7 +671,11 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
addr = FIELD_GET(HPFAR_EL2_FIPA, fault.hpfar_el2) << 12;
ret = host_stage2_idmap(addr);
- BUG_ON(ret && ret != -EAGAIN);
+
+ if (ret == -EPERM)
+ host_inject_abort(host_ctxt);
+ else
+ BUG_ON(ret && ret != -EAGAIN);
}
struct check_walk_data {
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index be9dab2c7d6a..2294f2061866 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -43,6 +43,7 @@
#include <asm/system_misc.h>
#include <asm/tlbflush.h>
#include <asm/traps.h>
+#include <asm/virt.h>
struct fault_info {
int (*fn)(unsigned long far, unsigned long esr,
@@ -269,6 +270,15 @@ static inline bool is_el1_permission_fault(unsigned long addr, unsigned long esr
return false;
}
+static bool is_pkvm_stage2_abort(unsigned int esr)
+{
+ /*
+ * S1PTW should only ever be set in ESR_EL1 if the pkvm hypervisor
+ * injected a stage-2 abort -- see host_inject_abort().
+ */
+ return is_pkvm_initialized() && (esr & ESR_ELx_S1PTW);
+}
+
static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
unsigned long esr,
struct pt_regs *regs)
@@ -279,6 +289,9 @@ static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
if (!is_el1_data_abort(esr) || !esr_fsc_is_translation_fault(esr))
return false;
+ if (is_pkvm_stage2_abort(esr))
+ return false;
+
local_irq_save(flags);
asm volatile("at s1e1r, %0" :: "r" (addr));
isb();
@@ -395,6 +408,8 @@ static void __do_kernel_fault(unsigned long addr, unsigned long esr,
msg = "read from unreadable memory";
} else if (addr < PAGE_SIZE) {
msg = "NULL pointer dereference";
+ } else if (is_pkvm_stage2_abort(esr)) {
+ msg = "access to hypervisor-protected memory";
} else {
if (esr_fsc_is_translation_fault(esr) &&
kfence_handle_page_fault(addr, esr & ESR_ELx_WNR, regs))
@@ -621,6 +636,13 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
addr, esr, regs);
}
+ if (is_pkvm_stage2_abort(esr)) {
+ if (!user_mode(regs))
+ goto no_context;
+ arm64_force_sig_fault(SIGSEGV, SEGV_ACCERR, far, "stage-2 fault");
+ return 0;
+ }
+
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
if (!(mm_flags & FAULT_FLAG_USER))
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 17/30] KVM: arm64: Generalise kvm_pgtable_stage2_set_owner()
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (15 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 16/30] KVM: arm64: Inject SIGSEGV on illegal accesses Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-06 15:20 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 18/30] KVM: arm64: Introduce host_stage2_set_owner_metadata_locked() Will Deacon
` (12 subsequent siblings)
29 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
kvm_pgtable_stage2_set_owner() can be generalised into a way to store
up to 62 bits in the page tables, as long as we don't set the 'valid' or
KVM_INVALID_PTE_LOCKED bits.
Let's do that and reduce the owner field to 4 bits to free up more
annotation space for future work.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_pgtable.h | 21 +++++++++++++--------
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 14 ++++++++++++--
arch/arm64/kvm/hyp/pgtable.c | 17 ++++++-----------
3 files changed, 31 insertions(+), 21 deletions(-)
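The resulting invalid-PTE layout and the way an annotation is built are
summarised in the sketch below (derived from the masks added by this
patch, with PKVM_ID_GUEST as an example owner):

  /*
   * Invalid (annotated) stage-2 PTE layout after this patch:
   *
   *   bit   0     valid            (must be clear for an annotation)
   *   bits  4:1   owner ID         KVM_INVALID_PTE_OWNER_MASK
   *   bits 62:5   extra metadata   KVM_INVALID_PTE_EXTRA_MASK
   *   bit  63     BBM lock         KVM_INVALID_PTE_LOCKED
   */
  kvm_pte_t annot = FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, PKVM_ID_GUEST);

  ret = kvm_pgtable_stage2_annotate(pgt, addr, size, mc, annot);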
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 9ce55442b621..ddbe0e506e4c 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -97,14 +97,17 @@ typedef u64 kvm_pte_t;
KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
KVM_PTE_LEAF_ATTR_HI_S2_XN)
-#define KVM_INVALID_PTE_OWNER_MASK GENMASK(9, 2)
-#define KVM_MAX_OWNER_ID 3
+/* pKVM invalid pte encodings */
+#define KVM_INVALID_PTE_OWNER_MASK GENMASK(4, 1)
+#define KVM_INVALID_PTE_EXTRA_MASK GENMASK(62, 5)
+#define KVM_INVALID_PTE_ANNOT_MASK (KVM_INVALID_PTE_OWNER_MASK | \
+ KVM_INVALID_PTE_EXTRA_MASK)
/*
* Used to indicate a pte for which a 'break-before-make' sequence is in
* progress.
*/
-#define KVM_INVALID_PTE_LOCKED BIT(10)
+#define KVM_INVALID_PTE_LOCKED BIT(63)
static inline bool kvm_pte_valid(kvm_pte_t pte)
{
@@ -657,14 +660,16 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
void *mc, enum kvm_pgtable_walk_flags flags);
/**
- * kvm_pgtable_stage2_set_owner() - Unmap and annotate pages in the IPA space to
- * track ownership.
+ * kvm_pgtable_stage2_annotate() - Unmap and annotate pages in the IPA space
+ * to track ownership (and more).
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Base intermediate physical address to annotate.
* @size: Size of the annotated range.
* @mc: Cache of pre-allocated and zeroed memory from which to allocate
* page-table pages.
- * @owner_id: Unique identifier for the owner of the page.
+ * @annotation: A 62-bit value that will be stored in the page tables.
+ * @annotation[0] and @annotation[63] must be 0.
+ * @annotation[62:1] is stored in the page tables.
*
* By default, all page-tables are owned by identifier 0. This function can be
* used to mark portions of the IPA space as owned by other entities. When a
@@ -673,8 +678,8 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
*
* Return: 0 on success, negative error code on failure.
*/
-int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
- void *mc, u8 owner_id);
+int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
+ void *mc, kvm_pte_t annotation);
/**
* kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 0336143cbb24..6c32bdc689ed 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -539,15 +539,25 @@ static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_
set_host_state(page, state);
}
+static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
+{
+ return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
+}
+
int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
{
+ kvm_pte_t annotation;
int ret;
+ if (!FIELD_FIT(KVM_INVALID_PTE_OWNER_MASK, owner_id))
+ return -EINVAL;
+
if (!range_is_memory(addr, addr + size))
return -EPERM;
- ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
- addr, size, &host_s2_pool, owner_id);
+ annotation = kvm_init_invalid_leaf_owner(owner_id);
+ ret = host_stage2_try(kvm_pgtable_stage2_annotate, &host_mmu.pgt,
+ addr, size, &host_s2_pool, annotation);
if (ret)
return ret;
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 9abc0a6cf448..749ee5863601 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -114,11 +114,6 @@ static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, s8 level)
return pte;
}
-static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
-{
- return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
-}
-
static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
const struct kvm_pgtable_visit_ctx *ctx,
enum kvm_pgtable_walk_flags visit)
@@ -563,7 +558,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
struct stage2_map_data {
const u64 phys;
kvm_pte_t attr;
- u8 owner_id;
+ kvm_pte_t pte_annot;
kvm_pte_t *anchor;
kvm_pte_t *childp;
@@ -946,7 +941,7 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
if (!data->annotation)
new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
else
- new = kvm_init_invalid_leaf_owner(data->owner_id);
+ new = data->pte_annot;
/*
* Skip updating the PTE if we are trying to recreate the exact
@@ -1100,16 +1095,16 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
return ret;
}
-int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
- void *mc, u8 owner_id)
+int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
+ void *mc, kvm_pte_t pte_annot)
{
int ret;
struct stage2_map_data map_data = {
.mmu = pgt->mmu,
.memcache = mc,
- .owner_id = owner_id,
.force_pte = true,
.annotation = true,
+ .pte_annot = pte_annot,
};
struct kvm_pgtable_walker walker = {
.cb = stage2_map_walker,
@@ -1118,7 +1113,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
.arg = &map_data,
};
- if (owner_id > KVM_MAX_OWNER_ID)
+ if (pte_annot & ~KVM_INVALID_PTE_ANNOT_MASK)
return -EINVAL;
ret = kvm_pgtable_walk(pgt, addr, size, &walker);
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 18/30] KVM: arm64: Introduce host_stage2_set_owner_metadata_locked()
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (16 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 17/30] KVM: arm64: Generalise kvm_pgtable_stage2_set_owner() Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-05 15:49 ` [PATCH 19/30] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2 Will Deacon
` (11 subsequent siblings)
29 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Rework host_stage2_set_owner_locked() to add a new helper function,
host_stage2_set_owner_metadata_locked(), which will allow us to store
additional metadata alongside the owner ID for invalid host stage-2
entries.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 6c32bdc689ed..7d1844e2888d 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -539,12 +539,8 @@ static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_
set_host_state(page, state);
}
-static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
-{
- return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
-}
-
-int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
+static int host_stage2_set_owner_metadata_locked(phys_addr_t addr, u64 size,
+ u8 owner_id, u64 meta)
{
kvm_pte_t annotation;
int ret;
@@ -552,10 +548,14 @@ int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
if (!FIELD_FIT(KVM_INVALID_PTE_OWNER_MASK, owner_id))
return -EINVAL;
+ if (!FIELD_FIT(KVM_INVALID_PTE_EXTRA_MASK, meta))
+ return -EINVAL;
+
if (!range_is_memory(addr, addr + size))
return -EPERM;
- annotation = kvm_init_invalid_leaf_owner(owner_id);
+ annotation = FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id) |
+ FIELD_PREP(KVM_INVALID_PTE_EXTRA_MASK, meta);
ret = host_stage2_try(kvm_pgtable_stage2_annotate, &host_mmu.pgt,
addr, size, &host_s2_pool, annotation);
if (ret)
@@ -570,6 +570,11 @@ int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
return 0;
}
+int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
+{
+ return host_stage2_set_owner_metadata_locked(addr, size, owner_id, 0);
+}
+
static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
{
/*
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 19/30] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (17 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 18/30] KVM: arm64: Introduce host_stage2_set_owner_metadata_locked() Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-06 16:01 ` Fuad Tabba
2026-01-05 15:49 ` [PATCH 20/30] KVM: arm64: Introduce hypercall to force reclaim of a protected page Will Deacon
` (10 subsequent siblings)
29 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Handling host kernel faults arising from accesses to donated guest
memory will require an rmap-like mechanism to identify the guest mapping
of the faulting page.
Extend the page donation logic to encode the guest handle and gfn
alongside the owner information in the host stage-2 pte.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
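With a 16-bit handle and a 41-bit gfn, the metadata covers IPAs up to
2^53 bytes with 4KiB pages. A sketch of the packing, and of the decode
that the force-reclaim hypercall performs later in the series:

  u64 meta = FIELD_PREP(KVM_HOST_INVALID_PTE_GUEST_HANDLE_MASK, handle) |
             FIELD_PREP(KVM_HOST_INVALID_PTE_GUEST_GFN_MASK, gfn);

  /* stored via host_stage2_set_owner_metadata_locked() and recovered as: */
  meta   = FIELD_GET(KVM_INVALID_PTE_EXTRA_MASK, pte);
  handle = FIELD_GET(KVM_HOST_INVALID_PTE_GUEST_HANDLE_MASK, meta);
  gfn    = FIELD_GET(KVM_HOST_INVALID_PTE_GUEST_GFN_MASK, meta);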
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 7d1844e2888d..1a341337b272 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1063,6 +1063,19 @@ static void hyp_poison_page(phys_addr_t phys)
hyp_fixmap_unmap();
}
+#define KVM_HOST_INVALID_PTE_GUEST_HANDLE_MASK GENMASK(15, 0)
+#define KVM_HOST_INVALID_PTE_GUEST_GFN_MASK GENMASK(56, 16)
+static u64 host_stage2_encode_gfn_meta(struct pkvm_hyp_vm *vm, u64 gfn)
+{
+ pkvm_handle_t handle = vm->kvm.arch.pkvm.handle;
+
+ WARN_ON(!FIELD_FIT(KVM_HOST_INVALID_PTE_GUEST_HANDLE_MASK, handle));
+ WARN_ON(!FIELD_FIT(KVM_HOST_INVALID_PTE_GUEST_GFN_MASK, gfn));
+
+ return FIELD_PREP(KVM_HOST_INVALID_PTE_GUEST_HANDLE_MASK, handle) |
+ FIELD_PREP(KVM_HOST_INVALID_PTE_GUEST_GFN_MASK, gfn);
+}
+
int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
{
u64 ipa = hyp_pfn_to_phys(gfn);
@@ -1105,6 +1118,7 @@ int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
u64 phys = hyp_pfn_to_phys(pfn);
u64 ipa = hyp_pfn_to_phys(gfn);
+ u64 meta;
int ret;
host_lock_component();
@@ -1118,7 +1132,9 @@ int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
if (ret)
goto unlock;
- WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_GUEST));
+ meta = host_stage2_encode_gfn_meta(vm, gfn);
+ WARN_ON(host_stage2_set_owner_metadata_locked(phys, PAGE_SIZE,
+ PKVM_ID_GUEST, meta));
WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
&vcpu->vcpu.arch.pkvm_memcache, 0));
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 20/30] KVM: arm64: Introduce hypercall to force reclaim of a protected page
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (18 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 19/30] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2 Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-06 15:44 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 21/30] KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler Will Deacon
` (9 subsequent siblings)
29 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Introduce a new hypercall, __pkvm_force_reclaim_guest_page(), to allow
the host to forcefully reclaim a physical page that was previously donated
to a protected guest. This results in the page being zeroed and the
previous guest mapping being poisoned so that new pages cannot be
subsequently donated at the same IPA.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/memory.h | 6 +
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 1 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 ++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 118 +++++++++++++++++-
arch/arm64/kvm/hyp/nvhe/pkvm.c | 4 +-
7 files changed, 137 insertions(+), 2 deletions(-)
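From the host's side, the calling convention is a single hypercall per
page; -EAGAIN means the stage-2 state changed underneath us and the
faulting access can simply be retried, so it is not treated as a
failure. A sketch (the real caller is added in a later patch):

  int ret = kvm_call_hyp_nvhe(__pkvm_force_reclaim_guest_page, phys);

  if (ret && ret != -EAGAIN)
          pr_warn("pKVM: failed to reclaim page %pa (%d)\n", &phys, ret);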
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index f14f845aeedd..286a7379a368 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -86,6 +86,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
+ __KVM_HOST_SMCCC_FUNC___pkvm_force_reclaim_guest_page,
__KVM_HOST_SMCCC_FUNC___pkvm_reclaim_dying_guest_page,
__KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index cde38a556049..f27b037abaf3 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -41,6 +41,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
+int __pkvm_host_force_reclaim_page_guest(phys_addr_t phys);
int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm);
int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index dee1a406b0c2..4cedb720c75d 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -30,6 +30,12 @@ enum pkvm_page_state {
* struct hyp_page.
*/
PKVM_NOPAGE = BIT(0) | BIT(1),
+
+ /*
+ * 'Meta-states' which aren't encoded directly in the PTE's SW bits (or
+ * the hyp_vmemmap entry for the host)
+ */
+ PKVM_POISON = BIT(2),
};
#define PKVM_PAGE_STATE_MASK (BIT(0) | BIT(1))
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 506831804f64..a5a7bb453f3e 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -78,6 +78,7 @@ int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 gfn);
int __pkvm_start_teardown_vm(pkvm_handle_t handle);
int __pkvm_finalize_teardown_vm(pkvm_handle_t handle);
+struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle);
struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
unsigned int vcpu_idx);
void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index b1940e639ad3..7d66cdd7de57 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -570,6 +570,13 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
}
+static void handle___pkvm_force_reclaim_guest_page(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
+
+ cpu_reg(host_ctxt, 1) = __pkvm_host_force_reclaim_page_guest(phys);
+}
+
static void handle___pkvm_reclaim_dying_guest_page(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
@@ -630,6 +637,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_unreserve_vm),
HANDLE_FUNC(__pkvm_init_vm),
HANDLE_FUNC(__pkvm_init_vcpu),
+ HANDLE_FUNC(__pkvm_force_reclaim_guest_page),
HANDLE_FUNC(__pkvm_reclaim_dying_guest_page),
HANDLE_FUNC(__pkvm_start_teardown_vm),
HANDLE_FUNC(__pkvm_finalize_teardown_vm),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 1a341337b272..5d6028c41125 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -768,8 +768,17 @@ static int __hyp_check_page_state_range(phys_addr_t phys, u64 size, enum pkvm_pa
return 0;
}
+#define KVM_GUEST_INVALID_PTE_POISONED FIELD_PREP(KVM_INVALID_PTE_ANNOT_MASK, 1)
+static bool guest_pte_is_poisoned(kvm_pte_t pte)
+{
+ return pte == KVM_GUEST_INVALID_PTE_POISONED;
+}
+
static enum pkvm_page_state guest_get_page_state(kvm_pte_t pte, u64 addr)
{
+ if (guest_pte_is_poisoned(pte))
+ return PKVM_POISON;
+
if (!kvm_pte_valid(pte))
return PKVM_NOPAGE;
@@ -798,6 +807,8 @@ static int get_valid_guest_pte(struct pkvm_hyp_vm *vm, u64 ipa, kvm_pte_t *ptep,
ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
if (ret)
return ret;
+ if (guest_pte_is_poisoned(pte))
+ return -EHWPOISON;
if (!kvm_pte_valid(pte))
return -ENOENT;
if (level != KVM_PGTABLE_LAST_LEVEL)
@@ -1076,6 +1087,107 @@ static u64 host_stage2_encode_gfn_meta(struct pkvm_hyp_vm *vm, u64 gfn)
FIELD_PREP(KVM_HOST_INVALID_PTE_GUEST_GFN_MASK, gfn);
}
+static int host_stage2_decode_gfn_meta(kvm_pte_t pte, struct pkvm_hyp_vm **vm,
+ u64 *gfn)
+{
+ pkvm_handle_t handle;
+ u64 meta;
+
+ if (kvm_pte_valid(pte))
+ return -EINVAL;
+
+ if (FIELD_GET(KVM_INVALID_PTE_OWNER_MASK, pte) != PKVM_ID_GUEST)
+ return -EPERM;
+
+ meta = FIELD_GET(KVM_INVALID_PTE_EXTRA_MASK, pte);
+ handle = FIELD_GET(KVM_HOST_INVALID_PTE_GUEST_HANDLE_MASK, meta);
+ *vm = get_vm_by_handle(handle);
+ if (!*vm) {
+ /* We probably raced with teardown; try again */
+ return -EAGAIN;
+ }
+
+ *gfn = FIELD_GET(KVM_HOST_INVALID_PTE_GUEST_GFN_MASK, meta);
+ return 0;
+}
+
+static int host_stage2_get_guest_info(phys_addr_t phys, struct pkvm_hyp_vm **vm,
+ u64 *gfn)
+{
+ enum pkvm_page_state state;
+ kvm_pte_t pte;
+ s8 level;
+ int ret;
+
+ if (!addr_is_memory(phys))
+ return -EFAULT;
+
+ state = get_host_state(hyp_phys_to_page(phys));
+ switch (state) {
+ case PKVM_PAGE_OWNED:
+ case PKVM_PAGE_SHARED_OWNED:
+ case PKVM_PAGE_SHARED_BORROWED:
+ /* The access should no longer fault; try again. */
+ return -EAGAIN;
+ case PKVM_NOPAGE:
+ break;
+ default:
+ return -EPERM;
+ }
+
+ ret = kvm_pgtable_get_leaf(&host_mmu.pgt, phys, &pte, &level);
+ if (ret)
+ return ret;
+
+ if (WARN_ON(level != KVM_PGTABLE_LAST_LEVEL))
+ return -EINVAL;
+
+ return host_stage2_decode_gfn_meta(pte, vm, gfn);
+}
+
+int __pkvm_host_force_reclaim_page_guest(phys_addr_t phys)
+{
+ struct pkvm_hyp_vm *vm;
+ u64 gfn, ipa, pa;
+ kvm_pte_t pte;
+ int ret;
+
+ hyp_spin_lock(&vm_table_lock);
+ host_lock_component();
+
+ ret = host_stage2_get_guest_info(phys, &vm, &gfn);
+ if (ret)
+ goto unlock_host;
+
+ ipa = hyp_pfn_to_phys(gfn);
+ guest_lock_component(vm);
+ ret = get_valid_guest_pte(vm, ipa, &pte, &pa);
+ if (ret)
+ goto unlock_guest;
+
+ WARN_ON(pa != phys);
+ if (guest_get_page_state(pte, ipa) != PKVM_PAGE_OWNED) {
+ ret = -EPERM;
+ goto unlock_guest;
+ }
+
+ /* We really shouldn't be allocating, so don't pass a memcache */
+ ret = kvm_pgtable_stage2_annotate(&vm->pgt, ipa, PAGE_SIZE, NULL,
+ KVM_GUEST_INVALID_PTE_POISONED);
+ if (ret)
+ goto unlock_guest;
+
+ hyp_poison_page(phys);
+ WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HOST));
+unlock_guest:
+ guest_unlock_component(vm);
+unlock_host:
+ host_unlock_component();
+ hyp_spin_unlock(&vm_table_lock);
+
+ return ret;
+}
+
int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
{
u64 ipa = hyp_pfn_to_phys(gfn);
@@ -1110,7 +1222,11 @@ int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
guest_unlock_component(vm);
host_unlock_component();
- return ret;
+ /*
+ * -EHWPOISON implies that the page was forcefully reclaimed already
+ * so return success for the GUP pin to be dropped.
+ */
+ return ret && ret != -EHWPOISON ? ret : 0;
}
int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 9f0997150cf5..df340de59eed 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -230,10 +230,12 @@ void pkvm_hyp_vm_table_init(void *tbl)
/*
* Return the hyp vm structure corresponding to the handle.
*/
-static struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle)
+struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle)
{
unsigned int idx = vm_handle_to_idx(handle);
+ hyp_assert_lock_held(&vm_table_lock);
+
if (unlikely(idx >= KVM_MAX_PVMS))
return NULL;
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 21/30] KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (19 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 20/30] KVM: arm64: Introduce hypercall to force reclaim of a protected page Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-05 15:49 ` [PATCH 22/30] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte Will Deacon
` (8 subsequent siblings)
29 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Host kernel accesses to pages that are inaccessible at stage-2 result in
the injection of a translation fault, which is fatal unless an exception
table fixup is registered for the faulting PC (e.g. for user access
routines). This is undesirable, since a get_user_pages() call could be
used to obtain a reference to a donated page and then a subsequent
access via a kernel mapping would lead to a panic().
Rework the spurious fault handler so that stage-2 faults injected back
into the host result in the target page being forcefully reclaimed when
no exception table fixup handler is registered.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/virt.h | 6 ++++++
arch/arm64/kvm/pkvm.c | 7 +++++++
arch/arm64/mm/fault.c | 15 +++++++++------
3 files changed, 22 insertions(+), 6 deletions(-)
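The kind of sequence this is defending against looks roughly like the
sketch below, where 'vmm_uaddr' is a hypothetical userspace mapping of
memory that has been donated to a protected guest. With this change, the
stage-2 abort triggers a forced reclaim and the access is retried
against the zeroed, host-owned page instead of panicking:

  struct page *page;
  u8 *va;

  if (pin_user_pages_fast(vmm_uaddr, 1, FOLL_WRITE, &page) == 1) {
          va = kmap_local_page(page);
          *va = 0xff;     /* stage-2 fault -> spurious-fault path -> reclaim */
          kunmap_local(va);
          unpin_user_page(page);
  }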
diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index b51ab6840f9c..e80addc923a4 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -94,6 +94,12 @@ static inline bool is_pkvm_initialized(void)
static_branch_likely(&kvm_protected_mode_initialized);
}
+#ifdef CONFIG_KVM
+bool pkvm_reclaim_guest_page(phys_addr_t phys);
+#else
+static inline bool pkvm_reclaim_guest_page(phys_addr_t phys) { return false; }
+#endif
+
/* Reports the availability of HYP mode */
static inline bool is_hyp_mode_available(void)
{
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 8be91051699e..d1926cb08c76 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -563,3 +563,10 @@ int pkvm_pgtable_stage2_split(struct kvm_pgtable *pgt, u64 addr, u64 size,
WARN_ON_ONCE(1);
return -EINVAL;
}
+
+bool pkvm_reclaim_guest_page(phys_addr_t phys)
+{
+ int ret = kvm_call_hyp_nvhe(__pkvm_force_reclaim_guest_page, phys);
+
+ return !ret || ret == -EAGAIN;
+}
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 2294f2061866..5d62abee5262 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -289,9 +289,6 @@ static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
if (!is_el1_data_abort(esr) || !esr_fsc_is_translation_fault(esr))
return false;
- if (is_pkvm_stage2_abort(esr))
- return false;
-
local_irq_save(flags);
asm volatile("at s1e1r, %0" :: "r" (addr));
isb();
@@ -302,8 +299,12 @@ static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
* If we now have a valid translation, treat the translation fault as
* spurious.
*/
- if (!(par & SYS_PAR_EL1_F))
+ if (!(par & SYS_PAR_EL1_F)) {
+ if (is_pkvm_stage2_abort(esr))
+ return pkvm_reclaim_guest_page(par & SYS_PAR_EL1_PA);
+
return true;
+ }
/*
* If we got a different type of fault from the AT instruction,
@@ -389,9 +390,11 @@ static void __do_kernel_fault(unsigned long addr, unsigned long esr,
if (!is_el1_instruction_abort(esr) && fixup_exception(regs, esr))
return;
- if (WARN_RATELIMIT(is_spurious_el1_translation_fault(addr, esr, regs),
- "Ignoring spurious kernel translation fault at virtual address %016lx\n", addr))
+ if (is_spurious_el1_translation_fault(addr, esr, regs)) {
+ WARN_RATELIMIT(!is_pkvm_stage2_abort(esr),
+ "Ignoring spurious kernel translation fault at virtual address %016lx\n", addr);
return;
+ }
if (is_el1_mte_sync_tag_check_fault(esr)) {
do_tag_recovery(addr, esr, regs);
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 22/30] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (20 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 21/30] KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-06 15:54 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 23/30] KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs Will Deacon
` (7 subsequent siblings)
29 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
If a protected vCPU faults on an IPA which appears to be mapped, query
the hypervisor to determine whether or not the faulting pte has been
poisoned by a forceful reclaim. If the pte has been poisoned, return
-EFAULT back to userspace rather than retrying the instruction forever.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 10 +++++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 37 +++++++++++++++++++
arch/arm64/kvm/pkvm.c | 9 +++--
5 files changed, 55 insertions(+), 3 deletions(-)
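From the VMM's perspective (sketch only; 'vcpu_fd' is the usual KVM vCPU
file descriptor), the run loop now sees a hard failure rather than
spinning on the faulting instruction:

  if (ioctl(vcpu_fd, KVM_RUN, 0) < 0 && errno == EFAULT)
          fprintf(stderr, "vCPU faulted on a poisoned (reclaimed) page\n");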
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 286a7379a368..178e2c2724ef 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -86,6 +86,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
+ __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_in_poison_fault,
__KVM_HOST_SMCCC_FUNC___pkvm_force_reclaim_guest_page,
__KVM_HOST_SMCCC_FUNC___pkvm_reclaim_dying_guest_page,
__KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index f27b037abaf3..5e6cdafcdd69 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -41,6 +41,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
+int __pkvm_vcpu_in_poison_fault(struct pkvm_hyp_vcpu *hyp_vcpu);
int __pkvm_host_force_reclaim_page_guest(phys_addr_t phys);
int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm);
int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 7d66cdd7de57..7ff87eb91112 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -570,6 +570,15 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
}
+static void handle___pkvm_vcpu_in_poison_fault(struct kvm_cpu_context *host_ctxt)
+{
+ int ret;
+ struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+
+ ret = hyp_vcpu ? __pkvm_vcpu_in_poison_fault(hyp_vcpu) : -EINVAL;
+ cpu_reg(host_ctxt, 1) = ret;
+}
+
static void handle___pkvm_force_reclaim_guest_page(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
@@ -637,6 +646,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_unreserve_vm),
HANDLE_FUNC(__pkvm_init_vm),
HANDLE_FUNC(__pkvm_init_vcpu),
+ HANDLE_FUNC(__pkvm_vcpu_in_poison_fault),
HANDLE_FUNC(__pkvm_force_reclaim_guest_page),
HANDLE_FUNC(__pkvm_reclaim_dying_guest_page),
HANDLE_FUNC(__pkvm_start_teardown_vm),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 5d6028c41125..57bc7ade30de 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -825,6 +825,43 @@ static int get_valid_guest_pte(struct pkvm_hyp_vm *vm, u64 ipa, kvm_pte_t *ptep,
return 0;
}
+int __pkvm_vcpu_in_poison_fault(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+ struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
+ kvm_pte_t pte;
+ s8 level;
+ u64 ipa;
+ int ret;
+
+ switch (kvm_vcpu_trap_get_class(&hyp_vcpu->vcpu)) {
+ case ESR_ELx_EC_DABT_LOW:
+ case ESR_ELx_EC_IABT_LOW:
+ if (kvm_vcpu_trap_is_translation_fault(&hyp_vcpu->vcpu))
+ break;
+ fallthrough;
+ default:
+ return -EINVAL;
+ }
+
+ ipa = kvm_vcpu_get_fault_ipa(&hyp_vcpu->vcpu);
+ ipa |= kvm_vcpu_get_hfar(&hyp_vcpu->vcpu) & GENMASK(11, 0);
+
+ guest_lock_component(vm);
+ ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
+ if (ret)
+ goto unlock;
+
+ if (level != KVM_PGTABLE_LAST_LEVEL) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ ret = guest_pte_is_poisoned(pte);
+unlock:
+ guest_unlock_component(vm);
+ return ret;
+}
+
int __pkvm_host_share_hyp(u64 pfn)
{
u64 phys = hyp_pfn_to_phys(pfn);
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index d1926cb08c76..14865907610c 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -417,10 +417,13 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
return -EINVAL;
/*
- * We raced with another vCPU.
+ * We either raced with another vCPU or the guest PTE
+ * has been poisoned by an erroneous host access.
*/
- if (mapping)
- return -EAGAIN;
+ if (mapping) {
+ ret = kvm_call_hyp_nvhe(__pkvm_vcpu_in_poison_fault);
+ return ret ? -EFAULT : -EAGAIN;
+ }
ret = kvm_call_hyp_nvhe(__pkvm_host_donate_guest, pfn, gfn);
} else {
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 23/30] KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (21 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 22/30] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-06 15:52 ` Vincent Donnefort
2026-01-05 15:49 ` [PATCH 24/30] KVM: arm64: Implement the MEM_SHARE hypercall for " Will Deacon
` (6 subsequent siblings)
29 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Add a hypercall handler at EL2 for hypercalls originating from protected
VMs. For now, this implements only the FEATURES and MEMINFO calls, but
subsequent patches will implement the SHARE and UNSHARE functions
necessary for virtio.
Unhandled hypercalls (including PSCI) are passed back to the host.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 1 +
arch/arm64/kvm/hyp/nvhe/pkvm.c | 37 ++++++++++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/switch.c | 1 +
3 files changed, 39 insertions(+)
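For completeness, a guest-side sketch of probing the two services with
the standard SMCCC 1.1 helpers. The discovery sequence is simplified
(in practice kvm_init_hyp_services() checks the vendor UID first) and
the function name below is made up for illustration:

  #include <linux/arm-smccc.h>
  #include <linux/bits.h>

  static unsigned long pkvm_query_hyp_granule(void)
  {
          struct arm_smccc_res res;

          arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID,
                               &res);
          if (!(res.a0 & BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO)))
                  return 0;

          arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID,
                               0, 0, 0, &res);
          return res.a0;  /* granule size in bytes: PAGE_SIZE at EL2 */
  }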
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index a5a7bb453f3e..c904647d2f76 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -88,6 +88,7 @@ struct pkvm_hyp_vm *get_pkvm_hyp_vm(pkvm_handle_t handle);
struct pkvm_hyp_vm *get_np_pkvm_hyp_vm(pkvm_handle_t handle);
void put_pkvm_hyp_vm(struct pkvm_hyp_vm *hyp_vm);
+bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code);
bool kvm_handle_pvm_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code);
bool kvm_handle_pvm_restricted(struct kvm_vcpu *vcpu, u64 *exit_code);
void kvm_init_pvm_id_regs(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index df340de59eed..5cdec49e989b 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -4,6 +4,8 @@
* Author: Fuad Tabba <tabba@google.com>
*/
+#include <kvm/arm_hypercalls.h>
+
#include <linux/kvm_host.h>
#include <linux/mm.h>
@@ -934,3 +936,38 @@ int __pkvm_finalize_teardown_vm(pkvm_handle_t handle)
hyp_spin_unlock(&vm_table_lock);
return err;
}
+/*
+ * Handler for protected VM HVC calls.
+ *
+ * Returns true if the hypervisor has handled the exit (and control
+ * should return to the guest) or false if it hasn't (and the handling
+ * should be performed by the host).
+ */
+bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+ u64 val[4] = { SMCCC_RET_INVALID_PARAMETER };
+ bool handled = true;
+
+ switch (smccc_get_function(vcpu)) {
+ case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+ val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
+ val[0] |= BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO);
+ break;
+ case ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID:
+ if (smccc_get_arg1(vcpu) ||
+ smccc_get_arg2(vcpu) ||
+ smccc_get_arg3(vcpu)) {
+ break;
+ }
+
+ val[0] = PAGE_SIZE;
+ break;
+ default:
+ /* Punt everything else back to the host, for now. */
+ handled = false;
+ }
+
+ if (handled)
+ smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
+ return handled;
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index d3b9ec8a7c28..b62e25e8bb7e 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -190,6 +190,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
static const exit_handler_fn pvm_exit_handlers[] = {
[0 ... ESR_ELx_EC_MAX] = NULL,
+ [ESR_ELx_EC_HVC64] = kvm_handle_pvm_hvc64,
[ESR_ELx_EC_SYS64] = kvm_handle_pvm_sys64,
[ESR_ELx_EC_SVE] = kvm_handle_pvm_restricted,
[ESR_ELx_EC_FP_ASIMD] = kvm_hyp_handle_fpsimd,
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 24/30] KVM: arm64: Implement the MEM_SHARE hypercall for protected VMs
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (22 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 23/30] KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-06 15:45 ` Vincent Donnefort
2026-01-05 15:49 ` [PATCH 25/30] KVM: arm64: Implement the MEM_UNSHARE " Will Deacon
` (5 subsequent siblings)
29 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Implement the ARM_SMCCC_KVM_FUNC_MEM_SHARE hypercall to allow protected
VMs to share memory (e.g. the swiotlb bounce buffers) back to the host.
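From the guest's perspective this is a single SMCCC call. A minimal sketch of
the guest side (not part of this patch), assuming the buffer has already been
allocated and its address is page-aligned:

#include <linux/arm-smccc.h>

/* Sketch: share one page of guest memory back with the host. */
static int pkvm_guest_share_page(phys_addr_t pa)
{
	struct arm_smccc_res res;

	/* arg2/arg3 must be zero; EL2 rejects unaligned addresses. */
	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID,
			     pa, 0, 0, &res);

	return res.a0 == SMCCC_RET_SUCCESS ? 0 : -EINVAL;
}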
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 32 ++++++++++
arch/arm64/kvm/hyp/nvhe/pkvm.c | 61 +++++++++++++++++++
3 files changed, 94 insertions(+)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 5e6cdafcdd69..42fd60c5cfc9 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -35,6 +35,7 @@ extern unsigned long hyp_nr_cpus;
int __pkvm_prot_finalize(void);
int __pkvm_host_share_hyp(u64 pfn);
+int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn);
int __pkvm_host_unshare_hyp(u64 pfn);
int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 57bc7ade30de..365c769c82a4 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -888,6 +888,38 @@ int __pkvm_host_share_hyp(u64 pfn)
return ret;
}
+int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn)
+{
+ struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+ u64 phys, ipa = hyp_pfn_to_phys(gfn);
+ kvm_pte_t pte;
+ int ret;
+
+ host_lock_component();
+ guest_lock_component(vm);
+
+ ret = get_valid_guest_pte(vm, ipa, &pte, &phys);
+ if (ret)
+ goto unlock;
+
+ ret = -EPERM;
+ if (pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte)) != PKVM_PAGE_OWNED)
+ goto unlock;
+ if (__host_check_page_state_range(phys, PAGE_SIZE, PKVM_NOPAGE))
+ goto unlock;
+
+ ret = 0;
+ WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
+ pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_OWNED),
+ &vcpu->vcpu.arch.pkvm_memcache, 0));
+ WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_BORROWED));
+unlock:
+ guest_unlock_component(vm);
+ host_unlock_component();
+
+ return ret;
+}
+
int __pkvm_host_unshare_hyp(u64 pfn)
{
u64 phys = hyp_pfn_to_phys(pfn);
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 5cdec49e989b..d8afa2b98542 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -936,6 +936,58 @@ int __pkvm_finalize_teardown_vm(pkvm_handle_t handle)
hyp_spin_unlock(&vm_table_lock);
return err;
}
+
+static u64 __pkvm_memshare_page_req(struct kvm_vcpu *vcpu, u64 ipa)
+{
+ u64 elr;
+
+ /* Fake up a data abort (level 3 translation fault on write) */
+ vcpu->arch.fault.esr_el2 = (ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT) |
+ ESR_ELx_WNR | ESR_ELx_FSC_FAULT |
+ FIELD_PREP(ESR_ELx_FSC_LEVEL, 3);
+
+ /* Shuffle the IPA around into the HPFAR */
+ vcpu->arch.fault.hpfar_el2 = (HPFAR_EL2_NS | (ipa >> 8)) & HPFAR_MASK;
+
+ /* This is a virtual address. 0's good. Let's go with 0. */
+ vcpu->arch.fault.far_el2 = 0;
+
+ /* Rewind the ELR so we return to the HVC once the IPA is mapped */
+ elr = read_sysreg(elr_el2);
+ elr -= 4;
+ write_sysreg(elr, elr_el2);
+
+ return ARM_EXCEPTION_TRAP;
+}
+
+static bool pkvm_memshare_call(u64 *ret, struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+ struct pkvm_hyp_vcpu *hyp_vcpu;
+ u64 ipa = smccc_get_arg1(vcpu);
+
+ if (!PAGE_ALIGNED(ipa))
+ goto out_guest;
+
+ hyp_vcpu = container_of(vcpu, struct pkvm_hyp_vcpu, vcpu);
+ switch (__pkvm_guest_share_host(hyp_vcpu, hyp_phys_to_pfn(ipa))) {
+ case 0:
+ ret[0] = SMCCC_RET_SUCCESS;
+ goto out_guest;
+ case -ENOENT:
+ /*
+ * Convert the exception into a data abort so that the page
+ * being shared is mapped into the guest next time.
+ */
+ *exit_code = __pkvm_memshare_page_req(vcpu, ipa);
+ goto out_host;
+ }
+
+out_guest:
+ return true;
+out_host:
+ return false;
+}
+
/*
* Handler for protected VM HVC calls.
*
@@ -952,6 +1004,7 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
val[0] |= BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO);
+ val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_SHARE);
break;
case ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID:
if (smccc_get_arg1(vcpu) ||
@@ -962,6 +1015,14 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
val[0] = PAGE_SIZE;
break;
+ case ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID:
+ if (smccc_get_arg2(vcpu) ||
+ smccc_get_arg3(vcpu)) {
+ break;
+ }
+
+ handled = pkvm_memshare_call(val, vcpu, exit_code);
+ break;
default:
/* Punt everything else back to the host, for now. */
handled = false;
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 25/30] KVM: arm64: Implement the MEM_UNSHARE hypercall for protected VMs
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (23 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 24/30] KVM: arm64: Implement the MEM_SHARE hypercall for " Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-06 15:50 ` Vincent Donnefort
2026-01-05 15:49 ` [PATCH 26/30] KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled Will Deacon
` (4 subsequent siblings)
29 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Implement the ARM_SMCCC_KVM_FUNC_MEM_UNSHARE hypercall to allow
protected VMs to unshare memory that was previously shared with the host
using the ARM_SMCCC_KVM_FUNC_MEM_SHARE hypercall.
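The guest side mirrors MEM_SHARE. A minimal sketch (not part of this patch),
assuming the host is no longer accessing the page being unshared:

#include <linux/arm-smccc.h>

/* Sketch: revoke a page previously shared with the host. */
static int pkvm_guest_unshare_page(phys_addr_t pa)
{
	struct arm_smccc_res res;

	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_MEM_UNSHARE_FUNC_ID,
			     pa, 0, 0, &res);

	return res.a0 == SMCCC_RET_SUCCESS ? 0 : -EPERM;
}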
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 32 +++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/pkvm.c | 22 +++++++++++++
3 files changed, 55 insertions(+)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 42fd60c5cfc9..e41a128b0854 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -36,6 +36,7 @@ extern unsigned long hyp_nr_cpus;
int __pkvm_prot_finalize(void);
int __pkvm_host_share_hyp(u64 pfn);
int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn);
+int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn);
int __pkvm_host_unshare_hyp(u64 pfn);
int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 365c769c82a4..c1600b88c316 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -920,6 +920,38 @@ int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn)
return ret;
}
+int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn)
+{
+ struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+ u64 phys, ipa = hyp_pfn_to_phys(gfn);
+ kvm_pte_t pte;
+ int ret;
+
+ host_lock_component();
+ guest_lock_component(vm);
+
+ ret = get_valid_guest_pte(vm, ipa, &pte, &phys);
+ if (ret)
+ goto unlock;
+
+ ret = -EPERM;
+ if (pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte)) != PKVM_PAGE_SHARED_OWNED)
+ goto unlock;
+ if (__host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_BORROWED))
+ goto unlock;
+
+ ret = 0;
+ WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_GUEST));
+ WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
+ pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
+ &vcpu->vcpu.arch.pkvm_memcache, 0));
+unlock:
+ guest_unlock_component(vm);
+ host_unlock_component();
+
+ return ret;
+}
+
int __pkvm_host_unshare_hyp(u64 pfn)
{
u64 phys = hyp_pfn_to_phys(pfn);
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index d8afa2b98542..2890328f4a78 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -988,6 +988,19 @@ static bool pkvm_memshare_call(u64 *ret, struct kvm_vcpu *vcpu, u64 *exit_code)
return false;
}
+static void pkvm_memunshare_call(u64 *ret, struct kvm_vcpu *vcpu)
+{
+ struct pkvm_hyp_vcpu *hyp_vcpu;
+ u64 ipa = smccc_get_arg1(vcpu);
+
+ if (!PAGE_ALIGNED(ipa))
+ return;
+
+ hyp_vcpu = container_of(vcpu, struct pkvm_hyp_vcpu, vcpu);
+ if (!__pkvm_guest_unshare_host(hyp_vcpu, hyp_phys_to_pfn(ipa)))
+ ret[0] = SMCCC_RET_SUCCESS;
+}
+
/*
* Handler for protected VM HVC calls.
*
@@ -1005,6 +1018,7 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
val[0] |= BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO);
val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_SHARE);
+ val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_UNSHARE);
break;
case ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID:
if (smccc_get_arg1(vcpu) ||
@@ -1023,6 +1037,14 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
handled = pkvm_memshare_call(val, vcpu, exit_code);
break;
+ case ARM_SMCCC_VENDOR_HYP_KVM_MEM_UNSHARE_FUNC_ID:
+ if (smccc_get_arg2(vcpu) ||
+ smccc_get_arg3(vcpu)) {
+ break;
+ }
+
+ pkvm_memunshare_call(val, vcpu);
+ break;
default:
/* Punt everything else back to the host, for now. */
handled = false;
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 26/30] KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (24 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 25/30] KVM: arm64: Implement the MEM_UNSHARE " Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-05 15:49 ` [PATCH 27/30] KVM: arm64: Add some initial documentation for pKVM Will Deacon
` (3 subsequent siblings)
29 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Introduce a new VM type for KVM/arm64 to allow userspace to request the
creation of a "protected VM" when the host has booted with pKVM enabled.
For now, this depends on CONFIG_EXPERT and results in a taint on first
use as many aspects of a protected VM are not yet protected!
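For illustration, the userspace side (not part of this patch) amounts to
passing the new type bit to KVM_CREATE_VM; error handling is omitted:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Sketch: create a pVM with the default IPA size. */
static int create_protected_vm(void)
{
	int kvm_fd = open("/dev/kvm", O_RDWR | O_CLOEXEC);

	/*
	 * Fails with EINVAL unless the host booted with
	 * kvm-arm.mode=protected and CONFIG_PROTECTED_VM_UAPI=y.
	 */
	return ioctl(kvm_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_PROTECTED);
}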
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_pkvm.h | 2 +-
arch/arm64/kvm/Kconfig | 10 ++++++++++
arch/arm64/kvm/arm.c | 8 +++++++-
arch/arm64/kvm/mmu.c | 3 ---
arch/arm64/kvm/pkvm.c | 11 ++++++++++-
include/uapi/linux/kvm.h | 5 +++++
6 files changed, 33 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 5a71d25febca..507d9c53e2eb 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -16,7 +16,7 @@
#define HYP_MEMBLOCK_REGIONS 128
-int pkvm_init_host_vm(struct kvm *kvm);
+int pkvm_init_host_vm(struct kvm *kvm, unsigned long type);
int pkvm_create_hyp_vm(struct kvm *kvm);
bool pkvm_hyp_vm_is_created(struct kvm *kvm);
void pkvm_destroy_hyp_vm(struct kvm *kvm);
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 4f803fd1c99a..47fa06c9da94 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -83,4 +83,14 @@ config PTDUMP_STAGE2_DEBUGFS
If in doubt, say N.
+config PROTECTED_VM_UAPI
+ bool "Expose protected VMs to userspace (experimental)"
+ depends on KVM && EXPERT
+ help
+ Say Y here to enable experimental (i.e. in development)
+ support for creating protected virtual machines using KVM's
+ KVM_CREATE_VM ioctl() when booted with pKVM enabled.
+
+ Unless you are a KVM developer, say N.
+
endif # VIRTUALIZATION
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 4f80da0c0d1d..66f472774da7 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -157,6 +157,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
{
int ret;
+ if (type & ~KVM_VM_TYPE_ARM_MASK)
+ return -EINVAL;
+
mutex_init(&kvm->arch.config_lock);
#ifdef CONFIG_LOCKDEP
@@ -188,9 +191,12 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
* If any failures occur after this is successful, make sure to
* call __pkvm_unreserve_vm to unreserve the VM in hyp.
*/
- ret = pkvm_init_host_vm(kvm);
+ ret = pkvm_init_host_vm(kvm, type);
if (ret)
goto err_free_cpumask;
+ } else if (type & KVM_VM_TYPE_ARM_PROTECTED) {
+ ret = -EINVAL;
+ goto err_free_cpumask;
}
kvm_vgic_early_init(kvm);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index fafedcbd516d..3c7aca86961d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -881,9 +881,6 @@ static int kvm_init_ipa_range(struct kvm_s2_mmu *mmu, unsigned long type)
u64 mmfr0, mmfr1;
u32 phys_shift;
- if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
- return -EINVAL;
-
phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
if (is_protected_kvm_enabled()) {
phys_shift = kvm_ipa_limit;
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 14865907610c..6d6b1ef1cc62 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -219,9 +219,13 @@ void pkvm_destroy_hyp_vm(struct kvm *kvm)
mutex_unlock(&kvm->arch.config_lock);
}
-int pkvm_init_host_vm(struct kvm *kvm)
+int pkvm_init_host_vm(struct kvm *kvm, unsigned long type)
{
int ret;
+ bool protected = type & KVM_VM_TYPE_ARM_PROTECTED;
+
+ if (protected && !IS_ENABLED(CONFIG_PROTECTED_VM_UAPI))
+ return -EINVAL;
if (pkvm_hyp_vm_is_created(kvm))
return -EINVAL;
@@ -236,6 +240,11 @@ int pkvm_init_host_vm(struct kvm *kvm)
return ret;
kvm->arch.pkvm.handle = ret;
+ kvm->arch.pkvm.is_protected = protected;
+ if (protected) {
+ pr_warn_once("kvm: protected VMs are for development only, tainting kernel\n");
+ add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+ }
return 0;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index dddb781b0507..9316a827a826 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -689,6 +689,11 @@ struct kvm_enable_cap {
#define KVM_VM_TYPE_ARM_IPA_SIZE_MASK 0xffULL
#define KVM_VM_TYPE_ARM_IPA_SIZE(x) \
((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
+
+#define KVM_VM_TYPE_ARM_PROTECTED (1UL << 31)
+#define KVM_VM_TYPE_ARM_MASK (KVM_VM_TYPE_ARM_IPA_SIZE_MASK | \
+ KVM_VM_TYPE_ARM_PROTECTED)
+
/*
* ioctls for /dev/kvm fds:
*/
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 27/30] KVM: arm64: Add some initial documentation for pKVM
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (25 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 26/30] KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-06 15:59 ` Vincent Donnefort
2026-01-05 15:49 ` [PATCH 28/30] KVM: arm64: Extend pKVM page ownership selftests to cover guest donation Will Deacon
` (2 subsequent siblings)
29 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Add some initial documentation for pKVM to help people understand what
is supported, the limitations of protected VMs compared to non-protected
VMs, and what is left to do.
Signed-off-by: Will Deacon <will@kernel.org>
---
.../admin-guide/kernel-parameters.txt | 4 +-
Documentation/virt/kvm/arm/index.rst | 1 +
Documentation/virt/kvm/arm/pkvm.rst | 101 ++++++++++++++++++
3 files changed, 104 insertions(+), 2 deletions(-)
create mode 100644 Documentation/virt/kvm/arm/pkvm.rst
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a8d0afde7f85..9939dc5654d2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3141,8 +3141,8 @@ Kernel parameters
for the host. To force nVHE on VHE hardware, add
"arm64_sw.hvhe=0 id_aa64mmfr1.vh=0" to the
command-line.
- "nested" is experimental and should be used with
- extreme caution.
+ "nested" and "protected" are experimental and should be
+ used with extreme caution.
kvm-arm.vgic_v3_group0_trap=
[KVM,ARM,EARLY] Trap guest accesses to GICv3 group-0
diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
index ec09881de4cf..0856b4942e05 100644
--- a/Documentation/virt/kvm/arm/index.rst
+++ b/Documentation/virt/kvm/arm/index.rst
@@ -10,6 +10,7 @@ ARM
fw-pseudo-registers
hyp-abi
hypercalls
+ pkvm
pvtime
ptp_kvm
vcpu-features
diff --git a/Documentation/virt/kvm/arm/pkvm.rst b/Documentation/virt/kvm/arm/pkvm.rst
new file mode 100644
index 000000000000..1882bde8cc0b
--- /dev/null
+++ b/Documentation/virt/kvm/arm/pkvm.rst
@@ -0,0 +1,101 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Protected KVM (pKVM)
+====================
+
+**NOTE**: pKVM is currently an experimental, development feature and
+subject to breaking changes as new isolation features are implemented.
+Please reach out to the developers at kvmarm@lists.linux.dev if you have
+any questions.
+
+Overview
+========
+
+Booting a host kernel with '``kvm-arm.mode=protected``' enables
+"Protected KVM" (pKVM). During boot, pKVM installs a stage-2 identity
+map page-table for the host and uses it to isolate the hypervisor
+running at EL2 from the rest of the host running at EL1/0.
+
+If ``CONFIG_PROTECTED_VM_UAPI=y``, pKVM permits creation of protected
+virtual machines (pVMs) by passing the ``KVM_VM_TYPE_ARM_PROTECTED``
+machine type identifier to the ``KVM_CREATE_VM`` ioctl(). The hypervisor
+isolates pVMs from the host by unmapping pages from the stage-2 identity
+map as they are accessed by a pVM. Hypercalls are provided for a pVM to
+share specific regions of its IPA space back with the host, allowing
+for communication with the VMM. See hypercalls.rst for more details.
+
+Isolation mechanisms
+====================
+
+pKVM relies on a number of mechanisms to isolate pVMs from the host:
+
+CPU memory isolation
+--------------------
+
+Status: Isolation of anonymous memory and metadata pages.
+
+Metadata pages (e.g. page-table pages and '``struct kvm_vcpu``' pages)
+are donated from the host to the hypervisor during pVM creation and
+are consequently unmapped from the stage-2 identity map until the pVM is
+destroyed.
+
+Similarly to regular KVM, pages are lazily mapped into the guest in
+response to stage-2 page faults handled by the host. However, when
+running a pVM, these pages are first pinned and then unmapped from the
+stage-2 identity map as part of the donation procedure. This gives rise
+to some user-visible differences when compared to non-protected VMs,
+largely due to the lack of MMU notifiers:
+
+* Memslots cannot be moved or deleted once the pVM has started running.
+* Read-only memslots and dirty logging are not supported.
+* With the exception of swap, file-backed pages cannot be mapped into a
+ pVM.
+* Donated pages are accounted against ``RLIMIT_MLOCK`` and so the VMM
+ must have a sufficient resource limit or be granted ``CAP_IPC_LOCK``.
+* Changes to the VMM address space (e.g. a ``MAP_FIXED`` mmap() over a
+ mapping associated with a memslot) are not reflected in the guest and
+ may lead to loss of coherency.
+* Accessing pVM memory that has not been shared back will result in the
+ delivery of a SIGSEGV.
+* If a system call accesses pVM memory that has not been shared back
+ then it will either return ``-EFAULT`` or forcefully reclaim the
+ memory pages. Reclaimed memory is zeroed by the hypervisor and a
+ subsequent attempt to access it in the pVM will return ``-EFAULT``
+ from the ``VCPU_RUN`` ioctl().
+
+CPU state isolation
+-------------------
+
+Status: **Unimplemented.**
+
+DMA isolation using an IOMMU
+----------------------------
+
+Status: **Unimplemented.**
+
+Proxying of Trustzone services
+------------------------------
+
+Status: FF-A and PSCI calls from the host are proxied by the pKVM
+hypervisor.
+
+The FF-A proxy ensures that the host cannot share pVM or hypervisor
+memory with Trustzone as part of a "confused deputy" attack.
+
+The PSCI proxy ensures that CPUs always have the stage-2 identity map
+installed when they are executing in the host.
+
+Protected VM firmware (pvmfw)
+-----------------------------
+
+Status: **Unimplemented.**
+
+Resources
+=========
+
+Quentin Perret's KVM Forum 2022 talk entitled "Protected KVM on arm64: A
+technical deep dive" remains a good resource for learning more about
+pKVM, despite some of the details having changed in the meantime:
+
+https://www.youtube.com/watch?v=9npebeVFbFw
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 28/30] KVM: arm64: Extend pKVM page ownership selftests to cover guest donation
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (26 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 27/30] KVM: arm64: Add some initial documentation for pKVM Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-05 15:49 ` [PATCH 29/30] KVM: arm64: Register 'selftest_vm' in the VM table Will Deacon
2026-01-05 15:49 ` [PATCH 30/30] KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim Will Deacon
29 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Extend the pKVM page ownership selftests to donate and reclaim a page
to/from a guest.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index c1600b88c316..a586ca922258 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1712,6 +1712,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size);
assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
selftest_state.host = PKVM_PAGE_OWNED;
selftest_state.hyp = PKVM_NOPAGE;
@@ -1731,6 +1732,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
assert_transition_res(0, hyp_pin_shared_mem, virt, virt + size);
assert_transition_res(0, hyp_pin_shared_mem, virt, virt + size);
@@ -1743,6 +1745,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
hyp_unpin_shared_mem(virt, virt + size);
assert_page_state();
@@ -1762,6 +1765,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size);
selftest_state.host = PKVM_PAGE_OWNED;
@@ -1778,6 +1782,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn);
assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn);
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size);
selftest_state.guest[1] = PKVM_PAGE_SHARED_BORROWED;
@@ -1791,6 +1796,23 @@ void pkvm_ownership_selftest(void *base)
selftest_state.host = PKVM_PAGE_OWNED;
assert_transition_res(0, __pkvm_host_unshare_guest, gfn + 1, 1, vm);
+ selftest_state.host = PKVM_NOPAGE;
+ selftest_state.guest[0] = PKVM_PAGE_OWNED;
+ assert_transition_res(0, __pkvm_host_donate_guest, pfn, gfn, vcpu);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn + 1, vcpu);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn + 1, 1, vcpu, prot);
+ assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1);
+ assert_transition_res(-EPERM, __pkvm_host_donate_hyp, pfn, 1);
+ assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn);
+ assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn);
+ assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
+
+ selftest_state.host = PKVM_PAGE_OWNED;
+ selftest_state.guest[0] = PKVM_NOPAGE;
+ assert_transition_res(0, __pkvm_host_reclaim_page_guest, gfn, vm);
+
selftest_state.host = PKVM_NOPAGE;
selftest_state.hyp = PKVM_PAGE_OWNED;
assert_transition_res(0, __pkvm_host_donate_hyp, pfn, 1);
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 29/30] KVM: arm64: Register 'selftest_vm' in the VM table
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (27 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 28/30] KVM: arm64: Extend pKVM page ownership selftests to cover guest donation Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
2026-01-05 15:49 ` [PATCH 30/30] KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim Will Deacon
29 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
In preparation for extending the pKVM page ownership selftests to cover
forceful reclaim of donated pages, rework the creation of the
'selftest_vm' so that it is registered in the VM table while the tests
are running.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 53 ++++---------------
arch/arm64/kvm/hyp/nvhe/pkvm.c | 49 +++++++++++++++++
3 files changed, 61 insertions(+), 43 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index e41a128b0854..3ad644111885 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -77,6 +77,8 @@ static __always_inline void __load_host_stage2(void)
#ifdef CONFIG_NVHE_EL2_DEBUG
void pkvm_ownership_selftest(void *base);
+struct pkvm_hyp_vcpu *init_selftest_vm(void *virt);
+void teardown_selftest_vm(void);
#else
static inline void pkvm_ownership_selftest(void *base) { }
#endif
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index a586ca922258..f59f5e24ddda 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1609,53 +1609,18 @@ struct pkvm_expected_state {
static struct pkvm_expected_state selftest_state;
static struct hyp_page *selftest_page;
-
-static struct pkvm_hyp_vm selftest_vm = {
- .kvm = {
- .arch = {
- .mmu = {
- .arch = &selftest_vm.kvm.arch,
- .pgt = &selftest_vm.pgt,
- },
- },
- },
-};
-
-static struct pkvm_hyp_vcpu selftest_vcpu = {
- .vcpu = {
- .arch = {
- .hw_mmu = &selftest_vm.kvm.arch.mmu,
- },
- .kvm = &selftest_vm.kvm,
- },
-};
-
-static void init_selftest_vm(void *virt)
-{
- struct hyp_page *p = hyp_virt_to_page(virt);
- int i;
-
- selftest_vm.kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr;
- WARN_ON(kvm_guest_prepare_stage2(&selftest_vm, virt));
-
- for (i = 0; i < pkvm_selftest_pages(); i++) {
- if (p[i].refcount)
- continue;
- p[i].refcount = 1;
- hyp_put_page(&selftest_vm.pool, hyp_page_to_virt(&p[i]));
- }
-}
+static struct pkvm_hyp_vcpu *selftest_vcpu;
static u64 selftest_ipa(void)
{
- return BIT(selftest_vm.pgt.ia_bits - 1);
+ return BIT(selftest_vcpu->vcpu.arch.hw_mmu->pgt->ia_bits - 1);
}
static void assert_page_state(void)
{
void *virt = hyp_page_to_virt(selftest_page);
u64 size = PAGE_SIZE << selftest_page->order;
- struct pkvm_hyp_vcpu *vcpu = &selftest_vcpu;
+ struct pkvm_hyp_vcpu *vcpu = selftest_vcpu;
u64 phys = hyp_virt_to_phys(virt);
u64 ipa[2] = { selftest_ipa(), selftest_ipa() + PAGE_SIZE };
struct pkvm_hyp_vm *vm;
@@ -1670,10 +1635,10 @@ static void assert_page_state(void)
WARN_ON(__hyp_check_page_state_range(phys, size, selftest_state.hyp));
hyp_unlock_component();
- guest_lock_component(&selftest_vm);
+ guest_lock_component(vm);
WARN_ON(__guest_check_page_state_range(vm, ipa[0], size, selftest_state.guest[0]));
WARN_ON(__guest_check_page_state_range(vm, ipa[1], size, selftest_state.guest[1]));
- guest_unlock_component(&selftest_vm);
+ guest_unlock_component(vm);
}
#define assert_transition_res(res, fn, ...) \
@@ -1686,14 +1651,15 @@ void pkvm_ownership_selftest(void *base)
{
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_RWX;
void *virt = hyp_alloc_pages(&host_s2_pool, 0);
- struct pkvm_hyp_vcpu *vcpu = &selftest_vcpu;
- struct pkvm_hyp_vm *vm = &selftest_vm;
+ struct pkvm_hyp_vcpu *vcpu;
u64 phys, size, pfn, gfn;
+ struct pkvm_hyp_vm *vm;
WARN_ON(!virt);
selftest_page = hyp_virt_to_page(virt);
selftest_page->refcount = 0;
- init_selftest_vm(base);
+ selftest_vcpu = vcpu = init_selftest_vm(base);
+ vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
size = PAGE_SIZE << selftest_page->order;
phys = hyp_virt_to_phys(virt);
@@ -1817,6 +1783,7 @@ void pkvm_ownership_selftest(void *base)
selftest_state.hyp = PKVM_PAGE_OWNED;
assert_transition_res(0, __pkvm_host_donate_hyp, pfn, 1);
+ teardown_selftest_vm();
selftest_page->refcount = 1;
hyp_put_page(&host_s2_pool, virt);
}
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 2890328f4a78..6dc90ccc99a2 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -696,6 +696,55 @@ void __pkvm_unreserve_vm(pkvm_handle_t handle)
hyp_spin_unlock(&vm_table_lock);
}
+#ifdef CONFIG_NVHE_EL2_DEBUG
+static struct pkvm_hyp_vm selftest_vm = {
+ .kvm = {
+ .arch = {
+ .mmu = {
+ .arch = &selftest_vm.kvm.arch,
+ .pgt = &selftest_vm.pgt,
+ },
+ },
+ },
+};
+
+static struct pkvm_hyp_vcpu selftest_vcpu = {
+ .vcpu = {
+ .arch = {
+ .hw_mmu = &selftest_vm.kvm.arch.mmu,
+ },
+ .kvm = &selftest_vm.kvm,
+ },
+};
+
+struct pkvm_hyp_vcpu *init_selftest_vm(void *virt)
+{
+ struct hyp_page *p = hyp_virt_to_page(virt);
+ int i;
+
+ selftest_vm.kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr;
+ WARN_ON(kvm_guest_prepare_stage2(&selftest_vm, virt));
+
+ for (i = 0; i < pkvm_selftest_pages(); i++) {
+ if (p[i].refcount)
+ continue;
+ p[i].refcount = 1;
+ hyp_put_page(&selftest_vm.pool, hyp_page_to_virt(&p[i]));
+ }
+
+ selftest_vm.kvm.arch.pkvm.handle = __pkvm_reserve_vm();
+ insert_vm_table_entry(selftest_vm.kvm.arch.pkvm.handle, &selftest_vm);
+ return &selftest_vcpu;
+}
+
+void teardown_selftest_vm(void)
+{
+ hyp_spin_lock(&vm_table_lock);
+ remove_vm_table_entry(selftest_vm.kvm.arch.pkvm.handle);
+ hyp_spin_unlock(&vm_table_lock);
+}
+#endif /* CONFIG_NVHE_EL2_DEBUG */
+
/*
* Initialize the hypervisor copy of the VM state using host-donated memory.
*
--
2.52.0.351.gbe84eed79e-goog
* [PATCH 30/30] KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (28 preceding siblings ...)
2026-01-05 15:49 ` [PATCH 29/30] KVM: arm64: Register 'selftest_vm' in the VM table Will Deacon
@ 2026-01-05 15:49 ` Will Deacon
29 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2026-01-05 15:49 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh
Extend the pKVM page ownership selftests to forcefully reclaim a donated
page and check that it cannot be re-donated at the same IPA.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index f59f5e24ddda..5a1bac31e253 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1776,8 +1776,20 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
selftest_state.host = PKVM_PAGE_OWNED;
- selftest_state.guest[0] = PKVM_NOPAGE;
- assert_transition_res(0, __pkvm_host_reclaim_page_guest, gfn, vm);
+ selftest_state.guest[0] = PKVM_POISON;
+ assert_transition_res(0, __pkvm_host_force_reclaim_page_guest, phys);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
+
+ selftest_state.host = PKVM_NOPAGE;
+ selftest_state.guest[1] = PKVM_PAGE_OWNED;
+ assert_transition_res(0, __pkvm_host_donate_guest, pfn, gfn + 1, vcpu);
+
+ selftest_state.host = PKVM_PAGE_OWNED;
+ selftest_state.guest[1] = PKVM_NOPAGE;
+ assert_transition_res(0, __pkvm_host_reclaim_page_guest, gfn + 1, vm);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
selftest_state.host = PKVM_NOPAGE;
selftest_state.hyp = PKVM_PAGE_OWNED;
--
2.52.0.351.gbe84eed79e-goog
* Re: [PATCH 02/30] KVM: arm64: Remove redundant 'pgt' pointer checks from MMU notifiers
2026-01-05 15:49 ` [PATCH 02/30] KVM: arm64: Remove redundant 'pgt' pointer checks from MMU notifiers Will Deacon
@ 2026-01-06 14:32 ` Quentin Perret
0 siblings, 0 replies; 45+ messages in thread
From: Quentin Perret @ 2026-01-06 14:32 UTC (permalink / raw)
To: Will Deacon
Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Fuad Tabba,
Vincent Donnefort, Mostafa Saleh
Hey Will,
On Monday 05 Jan 2026 at 15:49:10 (+0000), Will Deacon wrote:
> The MMU notifiers are registered by kvm_init_mmu_notifier() only after
> kvm_arch_init_vm() has returned successfully. Since the latter function
> initialises the 'kvm->arch.mmu.pgt' pointer (and allocates the VM handle
> when pKVM is enabled), the initialisation checks in the MMU notifiers
> are not required.
It took me a while to remember, but I think these checks are needed for
the free path rather than init. In particular, the doc for
mmu_notifier_ops::release() (from which we free the pgt) says that it
"can run concurrently with other mmu notifier" (see mmu_notifier.h), which
is fun.
Had you considered that path? If so, it's probably worth expanding the
commit description to explain why this is safe.
Cheers,
Quentin
* Re: [PATCH 01/30] KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers
2026-01-05 15:49 ` [PATCH 01/30] KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers Will Deacon
@ 2026-01-06 14:33 ` Quentin Perret
0 siblings, 0 replies; 45+ messages in thread
From: Quentin Perret @ 2026-01-06 14:33 UTC (permalink / raw)
To: Will Deacon
Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Fuad Tabba,
Vincent Donnefort, Mostafa Saleh
On Monday 05 Jan 2026 at 15:49:09 (+0000), Will Deacon wrote:
> Commit ddcadb297ce5 ("KVM: arm64: Ignore EAGAIN for walks outside of a
> fault") introduced a new walker flag ('KVM_PGTABLE_WALK_HANDLE_FAULT')
> to KVM's page-table code. When set, the walk logic maintains its
> previous behaviour of terminating a walk as soon as the visitor callback
> returns an error. However, when the flag is clear, the walk will
> continue if the visitor returns -EAGAIN and the error is then suppressed
> and returned as zero to the caller.
>
> Clearing the flag is beneficial when write-protecting a range of IPAs
> with kvm_pgtable_stage2_wrprotect() but is not useful in any other
> cases, either because we are operating on a single page (e.g.
> kvm_pgtable_stage2_mkyoung() or kvm_phys_addr_ioremap()) or because the
> early termination is desirable (e.g. when mapping pages from a fault in
> user_mem_abort()).
>
> Subsequently, commit e912efed485a ("KVM: arm64: Introduce the EL1 pKVM
> MMU") hooked up pKVM's hypercall interface to the MMU code at EL1 but
> failed to propagate any of the walker flags. As a result, page-table
> walks at EL2 fail to set KVM_PGTABLE_WALK_HANDLE_FAULT even when the
> early termination semantics are desirable on the fault handling path.
>
> Rather than complicate the pKVM hypercall interface, invert the flag so
> that the whole thing can be simplified and only pass the new flag
> ('KVM_PGTABLE_WALK_IGNORE_EAGAIN') from the wrprotect code.
Reviewed-by: Quentin Perret <qperret@google.com>
* Re: [PATCH 06/30] KVM: arm64: Remove pointless is_protected_kvm_enabled() checks from hyp
2026-01-05 15:49 ` [PATCH 06/30] KVM: arm64: Remove pointless is_protected_kvm_enabled() checks from hyp Will Deacon
@ 2026-01-06 14:40 ` Quentin Perret
0 siblings, 0 replies; 45+ messages in thread
From: Quentin Perret @ 2026-01-06 14:40 UTC (permalink / raw)
To: Will Deacon
Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Fuad Tabba,
Vincent Donnefort, Mostafa Saleh
On Monday 05 Jan 2026 at 15:49:14 (+0000), Will Deacon wrote:
> When pKVM is not enabled, the host shouldn't issue pKVM-specific
> hypercalls and so there's no point checking for this in the EL2
> hypercall handling code.
>
> Remove the redundant is_protected_kvm_enabled() checks.
That made me wonder if we should further divide the HVC space to have a
'pKVM only' range in addition to the privileged/unprivileged split, so
we could WARN in the core HVC handler in the pretty unlikely event that
we take a pKVM-only call in {n,h}VHE.
But yes there is no point in littering the code with is_protected_kvm_enabled()
checks all over, so:
Reviewed-by: Quentin Perret <qperret@google.com>
Thanks,
Quentin
* Re: [PATCH 10/30] KVM: arm64: Introduce __pkvm_host_donate_guest()
2026-01-05 15:49 ` [PATCH 10/30] KVM: arm64: Introduce __pkvm_host_donate_guest() Will Deacon
@ 2026-01-06 14:48 ` Quentin Perret
0 siblings, 0 replies; 45+ messages in thread
From: Quentin Perret @ 2026-01-06 14:48 UTC (permalink / raw)
To: Will Deacon
Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Fuad Tabba,
Vincent Donnefort, Mostafa Saleh
On Monday 05 Jan 2026 at 15:49:18 (+0000), Will Deacon wrote:
> +static void handle___pkvm_host_donate_guest(struct kvm_cpu_context *host_ctxt)
> +{
> + DECLARE_REG(u64, pfn, host_ctxt, 1);
> + DECLARE_REG(u64, gfn, host_ctxt, 2);
> + struct pkvm_hyp_vcpu *hyp_vcpu;
> + int ret = -EINVAL;
> +
> + hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
> + if (!hyp_vcpu)
I guess we should check this is a protected VM here, else a malicious
host could donate pages to an np-guest. I didn't try to think through
the implications, perhaps it's fine, but it feels unnecessary, so I'd say
let's be restrictive here.
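A minimal sketch of the check being suggested, assuming the usual
pkvm_hyp_vcpu_is_protected() helper can be used in this context:

	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
	if (!hyp_vcpu || !pkvm_hyp_vcpu_is_protected(hyp_vcpu))
		goto out;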
Cheers,
Quentin
> + goto out;
> +
> + ret = pkvm_refill_memcache(hyp_vcpu);
> + if (ret)
> + goto out;
> +
> + ret = __pkvm_host_donate_guest(pfn, gfn, hyp_vcpu);
> +out:
> + cpu_reg(host_ctxt, 1) = ret;
> +}
> +
> static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
> {
> DECLARE_REG(u64, pfn, host_ctxt, 1);
> @@ -580,6 +600,7 @@ static const hcall_t host_hcall[] = {
>
> HANDLE_FUNC(__pkvm_host_share_hyp),
> HANDLE_FUNC(__pkvm_host_unshare_hyp),
> + HANDLE_FUNC(__pkvm_host_donate_guest),
> HANDLE_FUNC(__pkvm_host_share_guest),
> HANDLE_FUNC(__pkvm_host_unshare_guest),
> HANDLE_FUNC(__pkvm_host_relax_perms_guest),
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 49db32f3ddf7..ae126ab9febf 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -958,6 +958,36 @@ static int __guest_check_transition_size(u64 phys, u64 ipa, u64 nr_pages, u64 *s
> return 0;
> }
>
> +int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
> +{
> + struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
> + u64 phys = hyp_pfn_to_phys(pfn);
> + u64 ipa = hyp_pfn_to_phys(gfn);
> + int ret;
> +
> + host_lock_component();
> + guest_lock_component(vm);
> +
> + ret = __host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_OWNED);
> + if (ret)
> + goto unlock;
> +
> + ret = __guest_check_page_state_range(vm, ipa, PAGE_SIZE, PKVM_NOPAGE);
> + if (ret)
> + goto unlock;
> +
> + WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_GUEST));
> + WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
> + pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
> + &vcpu->vcpu.arch.pkvm_memcache, 0));
> +
> +unlock:
> + guest_unlock_component(vm);
> + host_unlock_component();
> +
> + return ret;
> +}
> +
> int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
> enum kvm_pgtable_prot prot)
> {
> --
> 2.52.0.351.gbe84eed79e-goog
>
* Re: [PATCH 14/30] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy()
2026-01-05 15:49 ` [PATCH 14/30] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy() Will Deacon
@ 2026-01-06 14:59 ` Quentin Perret
0 siblings, 0 replies; 45+ messages in thread
From: Quentin Perret @ 2026-01-06 14:59 UTC (permalink / raw)
To: Will Deacon
Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Fuad Tabba,
Vincent Donnefort, Mostafa Saleh
On Monday 05 Jan 2026 at 15:49:22 (+0000), Will Deacon wrote:
> During teardown of a protected guest, its memory pages must be reclaimed
> from the hypervisor by issuing the '__pkvm_reclaim_dying_guest_page'
> hypercall.
>
> Add a new helper, __pkvm_pgtable_stage2_reclaim(), which is called
> during the VM teardown operation to reclaim pages from the hypervisor
> and drop the GUP pin on the host.
>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
> arch/arm64/kvm/pkvm.c | 31 ++++++++++++++++++++++++++++++-
> 1 file changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
> index 1814e17d600e..8be91051699e 100644
> --- a/arch/arm64/kvm/pkvm.c
> +++ b/arch/arm64/kvm/pkvm.c
> @@ -322,6 +322,32 @@ int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
> return 0;
> }
>
> +static int __pkvm_pgtable_stage2_reclaim(struct kvm_pgtable *pgt, u64 start, u64 end)
> +{
> + struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
> + pkvm_handle_t handle = kvm->arch.pkvm.handle;
> + struct pkvm_mapping *mapping;
> + int ret;
> +
> + for_each_mapping_in_range_safe(pgt, start, end, mapping) {
> + struct page *page;
> +
> + ret = kvm_call_hyp_nvhe(__pkvm_reclaim_dying_guest_page,
> + handle, mapping->gfn);
> + if (WARN_ON(ret))
> + return ret;
> +
> + page = pfn_to_page(mapping->pfn);
> + WARN_ON_ONCE(mapping->nr_pages != 1);
> + unpin_user_pages_dirty_lock(&page, 1, true);
> + account_locked_vm(current->mm, 1, false);
> + pkvm_mapping_remove(mapping, &pgt->pkvm_mappings);
> + kfree(mapping);
Nit: this might take a while, worth adding a cond_resched() in here?
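i.e. a sketch of the suggestion, at the end of each loop iteration:

		pkvm_mapping_remove(mapping, &pgt->pkvm_mappings);
		kfree(mapping);

		/* Teardown of a large pVM can iterate many mappings. */
		cond_resched();
	}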
> + }
> +
> + return 0;
> +}
> +
> static int __pkvm_pgtable_stage2_unshare(struct kvm_pgtable *pgt, u64 start, u64 end)
> {
> struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
> @@ -355,7 +381,10 @@ void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
> kvm->arch.pkvm.is_dying = true;
> }
>
> - __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
> + if (kvm_vm_is_protected(kvm))
> + __pkvm_pgtable_stage2_reclaim(pgt, addr, addr + size);
> + else
> + __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
> }
>
> void pkvm_pgtable_stage2_destroy_pgd(struct kvm_pgtable *pgt)
> --
> 2.52.0.351.gbe84eed79e-goog
>
* Re: [PATCH 17/30] KVM: arm64: Generalise kvm_pgtable_stage2_set_owner()
2026-01-05 15:49 ` [PATCH 17/30] KVM: arm64: Generalise kvm_pgtable_stage2_set_owner() Will Deacon
@ 2026-01-06 15:20 ` Quentin Perret
0 siblings, 0 replies; 45+ messages in thread
From: Quentin Perret @ 2026-01-06 15:20 UTC (permalink / raw)
To: Will Deacon
Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Fuad Tabba,
Vincent Donnefort, Mostafa Saleh
On Monday 05 Jan 2026 at 15:49:25 (+0000), Will Deacon wrote:
> /**
> - * kvm_pgtable_stage2_set_owner() - Unmap and annotate pages in the IPA space to
> - * track ownership.
> + * kvm_pgtable_stage2_annotate() - Unmap and annotate pages in the IPA space
> + * to track ownership (and more).
> * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
> * @addr: Base intermediate physical address to annotate.
> * @size: Size of the annotated range.
> * @mc: Cache of pre-allocated and zeroed memory from which to allocate
> * page-table pages.
> - * @owner_id: Unique identifier for the owner of the page.
> + * @annotation: A 62-bit value that will be stored in the page tables.
> + * @annotation[0] and @annotation[63] must be 0.
> + * @annotation[62:1] is stored in the page tables.
> *
> * By default, all page-tables are owned by identifier 0. This function can be
> * used to mark portions of the IPA space as owned by other entities. When a
> @@ -673,8 +678,8 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> *
> * Return: 0 on success, negative error code on failure.
> */
> -int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
> - void *mc, u8 owner_id);
> +int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
> + void *mc, kvm_pte_t annotation);
While we're on this topic, perhaps we could go one step further and 'type'
the annotation itself? For instance have a 'type' and 'meta' parameter
directly at the kvm_pgtable_stage2_annotate() level instead of leaving
that up to the callers. This would give us one place to allocate
annotation 'types' (donated pages, locked PTE, MMIO guard, ...) and one
way to serialize/deserialize them. That 'type' would be stored in top 2
or 3 bits of the PTE for instance, and decoding of the 'meta' field would
be dependent on the type value. Thoughts?
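A purely illustrative sketch of such an encoding (names and field positions
are made up; bits 0 and 63 are kept clear as required by the annotation API):

/* Illustrative only: not part of this series. */
enum pkvm_annotation_type {
	PKVM_ANNOT_OWNER,	/* meta = owner_id */
	PKVM_ANNOT_GUEST_PAGE,	/* meta = guest handle + gfn */
	PKVM_ANNOT_MMIO_GUARD,	/* meta = protection bits */
};

#define KVM_PTE_ANNOT_TYPE	GENMASK(62, 60)
#define KVM_PTE_ANNOT_META	GENMASK(59, 1)

static inline kvm_pte_t kvm_pte_annotation(enum pkvm_annotation_type type,
					   u64 meta)
{
	return FIELD_PREP(KVM_PTE_ANNOT_TYPE, type) |
	       FIELD_PREP(KVM_PTE_ANNOT_META, meta);
}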
Thanks,
Quentin
* Re: [PATCH 20/30] KVM: arm64: Introduce hypercall to force reclaim of a protected page
2026-01-05 15:49 ` [PATCH 20/30] KVM: arm64: Introduce hypercall to force reclaim of a protected page Will Deacon
@ 2026-01-06 15:44 ` Quentin Perret
0 siblings, 0 replies; 45+ messages in thread
From: Quentin Perret @ 2026-01-06 15:44 UTC (permalink / raw)
To: Will Deacon
Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Fuad Tabba,
Vincent Donnefort, Mostafa Saleh
> +static int host_stage2_decode_gfn_meta(kvm_pte_t pte, struct pkvm_hyp_vm **vm,
> + u64 *gfn)
> +{
> + pkvm_handle_t handle;
> + u64 meta;
> +
> + if (kvm_pte_valid(pte))
> + return -EINVAL;
Nit:
I can't think of any case where we'd end up returning -EINVAL here that
isn't indicative of a major problem (e.g. taking a stage-2 perm fault we
don't expect) given that we've extensively checked the state of the page
already. Upgrade to WARN()? It's fatal, but the system is unlikely to
make much more progress if we return cleanly anyway, so we might as
well make it obvious where things went wrong.
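i.e. something along the lines of:

	/* A valid PTE here means the earlier state checks lied to us. */
	if (WARN_ON(kvm_pte_valid(pte)))
		return -EINVAL;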
> + if (FIELD_GET(KVM_INVALID_PTE_OWNER_MASK, pte) != PKVM_ID_GUEST)
> + return -EPERM;
> +
> + meta = FIELD_GET(KVM_INVALID_PTE_EXTRA_MASK, pte);
> + handle = FIELD_GET(KVM_HOST_INVALID_PTE_GUEST_HANDLE_MASK, meta);
> + *vm = get_vm_by_handle(handle);
> + if (!*vm) {
> + /* We probably raced with teardown; try again */
> + return -EAGAIN;
> + }
> +
> + *gfn = FIELD_GET(KVM_HOST_INVALID_PTE_GUEST_GFN_MASK, meta);
> + return 0;
> +}
* Re: [PATCH 24/30] KVM: arm64: Implement the MEM_SHARE hypercall for protected VMs
2026-01-05 15:49 ` [PATCH 24/30] KVM: arm64: Implement the MEM_SHARE hypercall for " Will Deacon
@ 2026-01-06 15:45 ` Vincent Donnefort
0 siblings, 0 replies; 45+ messages in thread
From: Vincent Donnefort @ 2026-01-06 15:45 UTC (permalink / raw)
To: Will Deacon
Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
Fuad Tabba, Mostafa Saleh
[...]
> @@ -952,6 +1004,7 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
> case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
> val[0] |= BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO);
> + val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_SHARE);
> break;
> case ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID:
> if (smccc_get_arg1(vcpu) ||
> @@ -962,6 +1015,14 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
>
> val[0] = PAGE_SIZE;
> break;
> + case ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID:
> + if (smccc_get_arg2(vcpu) ||
> + smccc_get_arg3(vcpu)) {
> + break;
> + }
I wonder if that check shouldn't go into pkvm_memshare_call(): that function
already knows about the argument contents since we pass it the vcpu.
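A sketch of what that could look like (keeping the existing return
convention, where val[0] defaults to SMCCC_RET_INVALID_PARAMETER):

static bool pkvm_memshare_call(u64 *ret, struct kvm_vcpu *vcpu, u64 *exit_code)
{
	struct pkvm_hyp_vcpu *hyp_vcpu;
	u64 ipa = smccc_get_arg1(vcpu);

	/* All argument validation now lives next to the accessors. */
	if (!PAGE_ALIGNED(ipa) || smccc_get_arg2(vcpu) || smccc_get_arg3(vcpu))
		return true;

	hyp_vcpu = container_of(vcpu, struct pkvm_hyp_vcpu, vcpu);
	switch (__pkvm_guest_share_host(hyp_vcpu, hyp_phys_to_pfn(ipa))) {
	case 0:
		ret[0] = SMCCC_RET_SUCCESS;
		return true;
	case -ENOENT:
		/* Fault the page in and then retry the HVC. */
		*exit_code = __pkvm_memshare_page_req(vcpu, ipa);
		return false;
	}

	return true;
}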
Otherwise:
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
> +
> + handled = pkvm_memshare_call(val, vcpu, exit_code);
> + break;
> default:
> /* Punt everything else back to the host, for now. */
> handled = false;
> --
> 2.52.0.351.gbe84eed79e-goog
>
* Re: [PATCH 25/30] KVM: arm64: Implement the MEM_UNSHARE hypercall for protected VMs
2026-01-05 15:49 ` [PATCH 25/30] KVM: arm64: Implement the MEM_UNSHARE " Will Deacon
@ 2026-01-06 15:50 ` Vincent Donnefort
0 siblings, 0 replies; 45+ messages in thread
From: Vincent Donnefort @ 2026-01-06 15:50 UTC (permalink / raw)
To: Will Deacon
Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
Fuad Tabba, Mostafa Saleh
On Mon, Jan 05, 2026 at 03:49:33PM +0000, Will Deacon wrote:
> Implement the ARM_SMCCC_KVM_FUNC_MEM_UNSHARE hypercall to allow
> protected VMs to unshare memory that was previously shared with the host
> using the ARM_SMCCC_KVM_FUNC_MEM_SHARE hypercall.
>
> Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
> ---
> arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 32 +++++++++++++++++++
> arch/arm64/kvm/hyp/nvhe/pkvm.c | 22 +++++++++++++
> 3 files changed, 55 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index 42fd60c5cfc9..e41a128b0854 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -36,6 +36,7 @@ extern unsigned long hyp_nr_cpus;
> int __pkvm_prot_finalize(void);
> int __pkvm_host_share_hyp(u64 pfn);
> int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn);
> +int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn);
> int __pkvm_host_unshare_hyp(u64 pfn);
> int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
> int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 365c769c82a4..c1600b88c316 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -920,6 +920,38 @@ int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn)
> return ret;
> }
>
> +int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn)
> +{
> + struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
> + u64 phys, ipa = hyp_pfn_to_phys(gfn);
> + kvm_pte_t pte;
> + int ret;
> +
> + host_lock_component();
> + guest_lock_component(vm);
> +
> + ret = get_valid_guest_pte(vm, ipa, &pte, &phys);
> + if (ret)
> + goto unlock;
> +
> + ret = -EPERM;
> + if (pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte)) != PKVM_PAGE_SHARED_OWNED)
> + goto unlock;
> + if (__host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_BORROWED))
> + goto unlock;
> +
> + ret = 0;
> + WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_GUEST));
> + WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
> + pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
> + &vcpu->vcpu.arch.pkvm_memcache, 0));
> +unlock:
> + guest_unlock_component(vm);
> + host_unlock_component();
> +
> + return ret;
> +}
> +
> int __pkvm_host_unshare_hyp(u64 pfn)
> {
> u64 phys = hyp_pfn_to_phys(pfn);
> diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> index d8afa2b98542..2890328f4a78 100644
> --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> @@ -988,6 +988,19 @@ static bool pkvm_memshare_call(u64 *ret, struct kvm_vcpu *vcpu, u64 *exit_code)
> return false;
> }
>
> +static void pkvm_memunshare_call(u64 *ret, struct kvm_vcpu *vcpu)
> +{
> + struct pkvm_hyp_vcpu *hyp_vcpu;
> + u64 ipa = smccc_get_arg1(vcpu);
> +
> + if (!PAGE_ALIGNED(ipa))
> + return;
> +
> + hyp_vcpu = container_of(vcpu, struct pkvm_hyp_vcpu, vcpu);
> + if (!__pkvm_guest_unshare_host(hyp_vcpu, hyp_phys_to_pfn(ipa)))
> + ret[0] = SMCCC_RET_SUCCESS;
> +}
> +
> /*
> * Handler for protected VM HVC calls.
> *
> @@ -1005,6 +1018,7 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
> val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
> val[0] |= BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO);
> val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_SHARE);
> + val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_UNSHARE);
> break;
> case ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID:
> if (smccc_get_arg1(vcpu) ||
> @@ -1023,6 +1037,14 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
>
> handled = pkvm_memshare_call(val, vcpu, exit_code);
> break;
> + case ARM_SMCCC_VENDOR_HYP_KVM_MEM_UNSHARE_FUNC_ID:
> + if (smccc_get_arg2(vcpu) ||
> + smccc_get_arg3(vcpu)) {
> + break;
> + }
> +
> + pkvm_memunshare_call(val, vcpu);
> + break;
> default:
> /* Punt everything else back to the host, for now. */
> handled = false;
> --
> 2.52.0.351.gbe84eed79e-goog
>
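For reference (an illustrative sketch only, not code from the series), a protected guest could drive this SHARE/UNSHARE pair roughly as follows using the standard SMCCC 1.1 helpers; the helper names here are made up:

	#include <linux/arm-smccc.h>

	/* Share one page of guest memory, identified by its base IPA, with the host. */
	static int pkvm_guest_share_page(phys_addr_t ipa)
	{
		struct arm_smccc_res res;

		arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID,
				     ipa, 0, 0, &res);
		return res.a0 == SMCCC_RET_SUCCESS ? 0 : -EPERM;
	}

	/* Return the page to an exclusively-owned state once the host is done with it. */
	static int pkvm_guest_unshare_page(phys_addr_t ipa)
	{
		struct arm_smccc_res res;

		arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_MEM_UNSHARE_FUNC_ID,
				     ipa, 0, 0, &res);
		return res.a0 == SMCCC_RET_SUCCESS ? 0 : -EPERM;
	}

A guest virtio driver, for instance, would share its vring and buffer pages along these lines before kicking the host, and unshare them when the device is torn down.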
* Re: [PATCH 23/30] KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs
2026-01-05 15:49 ` [PATCH 23/30] KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs Will Deacon
@ 2026-01-06 15:52 ` Vincent Donnefort
0 siblings, 0 replies; 45+ messages in thread
From: Vincent Donnefort @ 2026-01-06 15:52 UTC (permalink / raw)
To: Will Deacon
Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
Fuad Tabba, Mostafa Saleh
On Mon, Jan 05, 2026 at 03:49:31PM +0000, Will Deacon wrote:
> Add a hypercall handler at EL2 for hypercalls originating from protected
> VMs. For now, this implements only the FEATURES and MEMINFO calls, but
> subsequent patches will implement the SHARE and UNSHARE functions
> necessary for virtio.
>
> Unhandled hypercalls (including PSCI) are passed back to the host.
>
> Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
> ---
> arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 1 +
> arch/arm64/kvm/hyp/nvhe/pkvm.c | 37 ++++++++++++++++++++++++++
> arch/arm64/kvm/hyp/nvhe/switch.c | 1 +
> 3 files changed, 39 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> index a5a7bb453f3e..c904647d2f76 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> @@ -88,6 +88,7 @@ struct pkvm_hyp_vm *get_pkvm_hyp_vm(pkvm_handle_t handle);
> struct pkvm_hyp_vm *get_np_pkvm_hyp_vm(pkvm_handle_t handle);
> void put_pkvm_hyp_vm(struct pkvm_hyp_vm *hyp_vm);
>
> +bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code);
> bool kvm_handle_pvm_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code);
> bool kvm_handle_pvm_restricted(struct kvm_vcpu *vcpu, u64 *exit_code);
> void kvm_init_pvm_id_regs(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> index df340de59eed..5cdec49e989b 100644
> --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> @@ -4,6 +4,8 @@
> * Author: Fuad Tabba <tabba@google.com>
> */
>
> +#include <kvm/arm_hypercalls.h>
> +
> #include <linux/kvm_host.h>
> #include <linux/mm.h>
>
> @@ -934,3 +936,38 @@ int __pkvm_finalize_teardown_vm(pkvm_handle_t handle)
> hyp_spin_unlock(&vm_table_lock);
> return err;
> }
> +/*
> + * Handler for protected VM HVC calls.
> + *
> + * Returns true if the hypervisor has handled the exit (and control
> + * should return to the guest) or false if it hasn't (and the handling
> + * should be performed by the host).
> + */
> +bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
> +{
> + u64 val[4] = { SMCCC_RET_INVALID_PARAMETER };
> + bool handled = true;
> +
> + switch (smccc_get_function(vcpu)) {
> + case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> + val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
> + val[0] |= BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO);
> + break;
> + case ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID:
> + if (smccc_get_arg1(vcpu) ||
> + smccc_get_arg2(vcpu) ||
> + smccc_get_arg3(vcpu)) {
> + break;
> + }
> +
> + val[0] = PAGE_SIZE;
> + break;
> + default:
> + /* Punt everything else back to the host, for now. */
> + handled = false;
> + }
> +
> + if (handled)
> + smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
> + return handled;
> +}
> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> index d3b9ec8a7c28..b62e25e8bb7e 100644
> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> @@ -190,6 +190,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
>
> static const exit_handler_fn pvm_exit_handlers[] = {
> [0 ... ESR_ELx_EC_MAX] = NULL,
> + [ESR_ELx_EC_HVC64] = kvm_handle_pvm_hvc64,
> [ESR_ELx_EC_SYS64] = kvm_handle_pvm_sys64,
> [ESR_ELx_EC_SVE] = kvm_handle_pvm_restricted,
> [ESR_ELx_EC_FP_ASIMD] = kvm_hyp_handle_fpsimd,
> --
> 2.52.0.351.gbe84eed79e-goog
>
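As an illustrative sketch of the guest-facing side (again, not code from the series), a guest could probe the advertised features and the hypervisor stage-2 granule like so:

	#include <linux/arm-smccc.h>
	#include <linux/bits.h>

	/* Returns the stage-2 granule advertised by pKVM, or 0 if unavailable. */
	static unsigned long pkvm_guest_hyp_granule(void)
	{
		struct arm_smccc_res res;

		arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID,
				     0, 0, 0, &res);
		if (!(res.a0 & BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO)))
			return 0;

		arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID,
				     0, 0, 0, &res);
		return res.a0 == SMCCC_RET_INVALID_PARAMETER ? 0 : res.a0;
	}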
* Re: [PATCH 22/30] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte
2026-01-05 15:49 ` [PATCH 22/30] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte Will Deacon
@ 2026-01-06 15:54 ` Quentin Perret
0 siblings, 0 replies; 45+ messages in thread
From: Quentin Perret @ 2026-01-06 15:54 UTC (permalink / raw)
To: Will Deacon
Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Fuad Tabba,
Vincent Donnefort, Mostafa Saleh
On Monday 05 Jan 2026 at 15:49:30 (+0000), Will Deacon wrote:
> +int __pkvm_vcpu_in_poison_fault(struct pkvm_hyp_vcpu *hyp_vcpu)
> +{
> + struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
> + kvm_pte_t pte;
> + s8 level;
> + u64 ipa;
> + int ret;
> +
> + switch (kvm_vcpu_trap_get_class(&hyp_vcpu->vcpu)) {
> + case ESR_ELx_EC_DABT_LOW:
> + case ESR_ELx_EC_IABT_LOW:
> + if (kvm_vcpu_trap_is_translation_fault(&hyp_vcpu->vcpu))
> + break;
> + fallthrough;
> + default:
> + return -EINVAL;
> + }
> +
> + ipa = kvm_vcpu_get_fault_ipa(&hyp_vcpu->vcpu);
> + ipa |= kvm_vcpu_get_hfar(&hyp_vcpu->vcpu) & GENMASK(11, 0);
Why is all the above needed? Could we simplify by having the host pass
the IPA to the hcall?
> + guest_lock_component(vm);
> + ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
> + if (ret)
> + goto unlock;
> +
> + if (level != KVM_PGTABLE_LAST_LEVEL) {
> + ret = -EINVAL;
> + goto unlock;
> + }
> +
> + ret = guest_pte_is_poisoned(pte);
> +unlock:
> + guest_unlock_component(vm);
> + return ret;
> +}
> +
> int __pkvm_host_share_hyp(u64 pfn)
> {
> u64 phys = hyp_pfn_to_phys(pfn);
> diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
> index d1926cb08c76..14865907610c 100644
> --- a/arch/arm64/kvm/pkvm.c
> +++ b/arch/arm64/kvm/pkvm.c
> @@ -417,10 +417,13 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> return -EINVAL;
>
> /*
> - * We raced with another vCPU.
> + * We either raced with another vCPU or the guest PTE
> + * has been poisoned by an erroneous host access.
> */
> - if (mapping)
> - return -EAGAIN;
> + if (mapping) {
> + ret = kvm_call_hyp_nvhe(__pkvm_vcpu_in_poison_fault);
It's not too bad, but it's a shame we now issue that hypercall every time
we hit such a race (which is frequent-ish). Could we perhaps only issue it
if at least one page has been forcefully reclaimed since boot?
> + return ret ? -EFAULT : -EAGAIN;
> + }
>
> ret = kvm_call_hyp_nvhe(__pkvm_host_donate_guest, pfn, gfn);
> } else {
> --
> 2.52.0.351.gbe84eed79e-goog
>
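One possible shape for that gating, sketched here with an invented counter (the series may well end up doing this differently):

	/* Counter name invented for this sketch: bumped whenever the host
	 * forcefully reclaims a protected page. */
	static atomic64_t pkvm_forced_reclaims = ATOMIC64_INIT(0);

	static int pkvm_mapping_maybe_poisoned(void)
	{
		/* Until a forced reclaim has happened, the race is benign and
		 * the extra hypercall can be skipped entirely. */
		if (!atomic64_read(&pkvm_forced_reclaims))
			return 0;

		return kvm_call_hyp_nvhe(__pkvm_vcpu_in_poison_fault);
	}

The hunk above would then become 'return pkvm_mapping_maybe_poisoned() ? -EFAULT : -EAGAIN;'.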
* Re: [PATCH 27/30] KVM: arm64: Add some initial documentation for pKVM
2026-01-05 15:49 ` [PATCH 27/30] KVM: arm64: Add some initial documentation for pKVM Will Deacon
@ 2026-01-06 15:59 ` Vincent Donnefort
0 siblings, 0 replies; 45+ messages in thread
From: Vincent Donnefort @ 2026-01-06 15:59 UTC (permalink / raw)
To: Will Deacon
Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
Fuad Tabba, Mostafa Saleh
On Mon, Jan 05, 2026 at 03:49:35PM +0000, Will Deacon wrote:
> Add some initial documentation for pKVM to help people understand what
> is supported, the limitations of protected VMs when compared to
> non-protected VMs and also what is left to do.
>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
> .../admin-guide/kernel-parameters.txt | 4 +-
> Documentation/virt/kvm/arm/index.rst | 1 +
> Documentation/virt/kvm/arm/pkvm.rst | 101 ++++++++++++++++++
> 3 files changed, 104 insertions(+), 2 deletions(-)
> create mode 100644 Documentation/virt/kvm/arm/pkvm.rst
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index a8d0afde7f85..9939dc5654d2 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3141,8 +3141,8 @@ Kernel parameters
> for the host. To force nVHE on VHE hardware, add
> "arm64_sw.hvhe=0 id_aa64mmfr1.vh=0" to the
> command-line.
> - "nested" is experimental and should be used with
> - extreme caution.
> + "nested" and "protected" are experimental and should be
> + used with extreme caution.
>
> kvm-arm.vgic_v3_group0_trap=
> [KVM,ARM,EARLY] Trap guest accesses to GICv3 group-0
> diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
> index ec09881de4cf..0856b4942e05 100644
> --- a/Documentation/virt/kvm/arm/index.rst
> +++ b/Documentation/virt/kvm/arm/index.rst
> @@ -10,6 +10,7 @@ ARM
> fw-pseudo-registers
> hyp-abi
> hypercalls
> + pkvm
> pvtime
> ptp_kvm
> vcpu-features
> diff --git a/Documentation/virt/kvm/arm/pkvm.rst b/Documentation/virt/kvm/arm/pkvm.rst
> new file mode 100644
> index 000000000000..1882bde8cc0b
> --- /dev/null
> +++ b/Documentation/virt/kvm/arm/pkvm.rst
> @@ -0,0 +1,101 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +====================
> +Protected KVM (pKVM)
> +====================
> +
> +**NOTE**: pKVM is currently an experimental, development feature and
> +subject to breaking changes as new isolation features are implemented.
> +Please reach out to the developers at kvmarm@lists.linux.dev if you have
> +any questions.
> +
> +Overview
> +========
> +
> +Booting a host kernel with '``kvm-arm.mode=protected``' enables
> +"Protected KVM" (pKVM). During boot, pKVM installs a stage-2 identity
> +map page-table for the host and uses it to isolate the hypervisor
> +running at EL2 from the rest of the host running at EL1/0.
> +
> +If ``CONFIG_PROTECTED_VM_UAPI=y``, pKVM permits creation of protected
> +virtual machines (pVMs) by passing the ``KVM_VM_TYPE_ARM_PROTECTED``
> +machine type identifier to the ``KVM_CREATE_VM`` ioctl(). The hypervisor
> +isolates pVMs from the host by unmapping pages from the stage-2 identity
> +map as they are accessed by a pVM. Hypercalls are provided for a pVM to
> +share specific regions of its IPA space back with the host, allowing
> +for communication with the VMM. See hypercalls.rst for more details.
> +
> +Isolation mechanisms
> +====================
> +
> +pKVM relies on a number of mechanisms to isolate pVMs from the host:
> +
> +CPU memory isolation
> +--------------------
> +
> +Status: Isolation of anonymous memory and metadata pages.
> +
> +Metadata pages (e.g. page-table pages and '``struct kvm_vcpu``' pages)
> +are donated from the host to the hypervisor during pVM creation and
> +are consequently unmapped from the stage-2 identity map until the pVM is
> +destroyed.
> +
> +Similarly to regular KVM, pages are lazily mapped into the guest in
> +response to stage-2 page faults handled by the host. However, when
> +running a pVM, these pages are first pinned and then unmapped from the
> +stage-2 identity map as part of the donation procedure. This gives rise
> +to some user-visible differences when compared to non-protected VMs,
> +largely due to the lack of MMU notifiers:
> +
> +* Memslots cannot be moved or deleted once the pVM has started running.
> +* Read-only memslots and dirty logging are not supported.
> +* With the exception of swap, file-backed pages cannot be mapped into a
> + pVM.
> +* Donated pages are accounted against ``RLIMIT_MLOCK`` and so the VMM
> + must have a sufficient resource limit or be granted ``CAP_IPC_LOCK``.
Perhaps worth adding that there's no runtime reclaim either, so the accounting
will only grow until the VM is destroyed?
> +* Changes to the VMM address space (e.g. a ``MAP_FIXED`` mmap() over a
> + mapping associated with a memslot) are not reflected in the guest and
> + may lead to loss of coherency.
> +* Accessing pVM memory that has not been shared back will result in the
> + delivery of a SIGSEGV.
> +* If a system call accesses pVM memory that has not been shared back
> + then it will either return ``-EFAULT`` or forcefully reclaim the
> + memory pages. Reclaimed memory is zeroed by the hypervisor and a
> + subsequent attempt to access it in the pVM will return ``-EFAULT``
> + from the ``VCPU_RUN`` ioctl().
> +
> +CPU state isolation
> +-------------------
> +
> +Status: **Unimplemented.**
> +
> +DMA isolation using an IOMMU
> +----------------------------
> +
> +Status: **Unimplemented.**
> +
> +Proxying of Trustzone services
> +------------------------------
> +
> +Status: FF-A and PSCI calls from the host are proxied by the pKVM
> +hypervisor.
> +
> +The FF-A proxy ensures that the host cannot share pVM or hypervisor
> +memory with Trustzone as part of a "confused deputy" attack.
> +
> +The PSCI proxy ensures that CPUs always have the stage-2 identity map
> +installed when they are executing in the host.
> +
> +Protected VM firmware (pvmfw)
> +-----------------------------
> +
> +Status: **Unimplemented.**
> +
> +Resources
> +=========
> +
> +Quentin Perret's KVM Forum 2022 talk entitled "Protected KVM on arm64: A
> +technical deep dive" remains a good resource for learning more about
> +pKVM, despite some of the details having changed in the meantime:
> +
> +https://www.youtube.com/watch?v=9npebeVFbFw
> --
> 2.52.0.351.gbe84eed79e-goog
>
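To make the Overview concrete, a VMM could create a pVM roughly as follows. This is a minimal userspace sketch (error handling trimmed, and raising RLIMIT_MEMLOCK may itself require privilege); KVM_VM_TYPE_ARM_PROTECTED is the machine type added earlier in the series:

	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <sys/resource.h>
	#include <linux/kvm.h>

	static int create_protected_vm(size_t guest_mem_bytes)
	{
		struct rlimit rlim = { guest_mem_bytes, guest_mem_bytes };
		int kvm_fd;

		/* Donated pages are charged against RLIMIT_MLOCK (see above);
		 * a real VMM would check the return value or rely on CAP_IPC_LOCK. */
		setrlimit(RLIMIT_MEMLOCK, &rlim);

		kvm_fd = open("/dev/kvm", O_RDWR | O_CLOEXEC);
		if (kvm_fd < 0)
			return -1;

		/* Machine type identifier introduced by this series. */
		return ioctl(kvm_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_PROTECTED);
	}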
* Re: [PATCH 19/30] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2
2026-01-05 15:49 ` [PATCH 19/30] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2 Will Deacon
@ 2026-01-06 16:01 ` Fuad Tabba
0 siblings, 0 replies; 45+ messages in thread
From: Fuad Tabba @ 2026-01-06 16:01 UTC (permalink / raw)
To: Will Deacon
Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
Vincent Donnefort, Mostafa Saleh
Hi Will,
On Mon, 5 Jan 2026 at 15:50, Will Deacon <will@kernel.org> wrote:
>
> Handling host kernel faults arising from accesses to donated guest
> memory will require an rmap-like mechanism to identify the guest mapping
> of the faulting page.
>
> Extend the page donation logic to encode the guest handle and gfn
> alongside the owner information in the host stage-2 pte.
>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 7d1844e2888d..1a341337b272 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -1063,6 +1063,19 @@ static void hyp_poison_page(phys_addr_t phys)
> hyp_fixmap_unmap();
> }
>
> +#define KVM_HOST_INVALID_PTE_GUEST_HANDLE_MASK GENMASK(15, 0)
> +#define KVM_HOST_INVALID_PTE_GUEST_GFN_MASK GENMASK(56, 16)
> +static u64 host_stage2_encode_gfn_meta(struct pkvm_hyp_vm *vm, u64 gfn)
> +{
> + pkvm_handle_t handle = vm->kvm.arch.pkvm.handle;
> +
> + WARN_ON(!FIELD_FIT(KVM_HOST_INVALID_PTE_GUEST_HANDLE_MASK, handle));
Instead of (or in addition to) this check, should we also have a
compile-time check to ensure that the handle fits? We have
KVM_MAX_PVMS and HANDLE_OFFSET, so we can calculate at build time
whether it fits.
Cheers,
/fuad
> + WARN_ON(!FIELD_FIT(KVM_HOST_INVALID_PTE_GUEST_GFN_MASK, gfn));
> +
> + return FIELD_PREP(KVM_HOST_INVALID_PTE_GUEST_HANDLE_MASK, handle) |
> + FIELD_PREP(KVM_HOST_INVALID_PTE_GUEST_GFN_MASK, gfn);
> +}
> +
> int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
> {
> u64 ipa = hyp_pfn_to_phys(gfn);
> @@ -1105,6 +1118,7 @@ int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
> struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
> u64 phys = hyp_pfn_to_phys(pfn);
> u64 ipa = hyp_pfn_to_phys(gfn);
> + u64 meta;
> int ret;
>
> host_lock_component();
> @@ -1118,7 +1132,9 @@ int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
> if (ret)
> goto unlock;
>
> - WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_GUEST));
> + meta = host_stage2_encode_gfn_meta(vm, gfn);
> + WARN_ON(host_stage2_set_owner_metadata_locked(phys, PAGE_SIZE,
> + PKVM_ID_GUEST, meta));
> WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
> pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
> &vcpu->vcpu.arch.pkvm_memcache, 0));
> --
> 2.52.0.351.gbe84eed79e-goog
>
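A build-time variant of that check could look something like the line below, assuming (as noted above) that the handle space is bounded by HANDLE_OFFSET and KVM_MAX_PVMS and that those definitions are visible at this point:

	/* Mirrors the runtime WARN_ON() above, but caught at build time,
	 * assuming the largest handle is HANDLE_OFFSET + KVM_MAX_PVMS - 1. */
	BUILD_BUG_ON(!FIELD_FIT(KVM_HOST_INVALID_PTE_GUEST_HANDLE_MASK,
				HANDLE_OFFSET + KVM_MAX_PVMS - 1));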
* Re: [PATCH 13/30] KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page()
2026-01-05 15:49 ` [PATCH 13/30] KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page() Will Deacon
@ 2026-01-06 16:26 ` Vincent Donnefort
0 siblings, 0 replies; 45+ messages in thread
From: Vincent Donnefort @ 2026-01-06 16:26 UTC (permalink / raw)
To: Will Deacon
Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
Fuad Tabba, Mostafa Saleh
On Mon, Jan 05, 2026 at 03:49:21PM +0000, Will Deacon wrote:
> To enable reclaim of pages from a protected VM during teardown,
> introduce a new hypercall to reclaim a single page from a protected
> guest that is in the dying state.
>
> Since the EL2 code is non-preemptible, the new hypercall deliberately
> acts on a single page at a time so as to allow EL1 to reschedule
> frequently during the teardown operation.
>
> Co-developed-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
> ---
> arch/arm64/include/asm/kvm_asm.h | 1 +
> arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
> arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 1 +
> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 9 +++
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 79 +++++++++++++++++++
> arch/arm64/kvm/hyp/nvhe/pkvm.c | 14 ++++
> 6 files changed, 105 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index cad3ba5e1c5a..f14f845aeedd 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -86,6 +86,7 @@ enum __kvm_host_smccc_func {
> __KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
> __KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
> __KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
> + __KVM_HOST_SMCCC_FUNC___pkvm_reclaim_dying_guest_page,
> __KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
> __KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
> __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index 9c0cc53d1dc9..cde38a556049 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -41,6 +41,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
> int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
> int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
> int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
> +int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm);
> int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
> enum kvm_pgtable_prot prot);
> int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> index 04c7ca703014..506831804f64 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> @@ -74,6 +74,7 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
> int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
> unsigned long vcpu_hva);
>
> +int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 gfn);
> int __pkvm_start_teardown_vm(pkvm_handle_t handle);
> int __pkvm_finalize_teardown_vm(pkvm_handle_t handle);
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index a5ee1103ce1f..b1940e639ad3 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -570,6 +570,14 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
> cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
> }
>
> +static void handle___pkvm_reclaim_dying_guest_page(struct kvm_cpu_context *host_ctxt)
> +{
> + DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
> + DECLARE_REG(u64, gfn, host_ctxt, 2);
> +
> + cpu_reg(host_ctxt, 1) = __pkvm_reclaim_dying_guest_page(handle, gfn);
> +}
> +
> static void handle___pkvm_start_teardown_vm(struct kvm_cpu_context *host_ctxt)
> {
> DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
> @@ -622,6 +630,7 @@ static const hcall_t host_hcall[] = {
> HANDLE_FUNC(__pkvm_unreserve_vm),
> HANDLE_FUNC(__pkvm_init_vm),
> HANDLE_FUNC(__pkvm_init_vcpu),
> + HANDLE_FUNC(__pkvm_reclaim_dying_guest_page),
> HANDLE_FUNC(__pkvm_start_teardown_vm),
> HANDLE_FUNC(__pkvm_finalize_teardown_vm),
> HANDLE_FUNC(__pkvm_vcpu_load),
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index ae126ab9febf..edbfe0e3dc58 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -725,6 +725,32 @@ static int __guest_check_page_state_range(struct pkvm_hyp_vm *vm, u64 addr,
> return check_page_state_range(&vm->pgt, addr, size, &d);
> }
>
> +static int get_valid_guest_pte(struct pkvm_hyp_vm *vm, u64 ipa, kvm_pte_t *ptep, u64 *physp)
> +{
> + kvm_pte_t pte;
> + u64 phys;
> + s8 level;
> + int ret;
> +
> + ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
> + if (ret)
> + return ret;
> + if (!kvm_pte_valid(pte))
> + return -ENOENT;
> + if (level != KVM_PGTABLE_LAST_LEVEL)
> + return -E2BIG;
> +
> + phys = kvm_pte_to_phys(pte);
> + ret = check_range_allowed_memory(phys, phys + PAGE_SIZE);
> + if (WARN_ON(ret))
> + return ret;
> +
> + *ptep = pte;
> + *physp = phys;
> +
> + return 0;
> +}
> +
> int __pkvm_host_share_hyp(u64 pfn)
> {
> u64 phys = hyp_pfn_to_phys(pfn);
> @@ -958,6 +984,59 @@ static int __guest_check_transition_size(u64 phys, u64 ipa, u64 nr_pages, u64 *s
> return 0;
> }
>
> +static void hyp_poison_page(phys_addr_t phys)
> +{
> + void *addr = hyp_fixmap_map(phys);
> +
> + memset(addr, 0, PAGE_SIZE);
> + /*
> + * Prefer kvm_flush_dcache_to_poc() over __clean_dcache_guest_page()
> + * here as the latter may elide the CMO under the assumption that FWB
> + * will be enabled on CPUs that support it. This is incorrect for the
> + * host stage-2 and would otherwise lead to a malicious host potentially
> + * being able to read the contents of newly reclaimed guest pages.
> + */
> + kvm_flush_dcache_to_poc(addr, PAGE_SIZE);
> + hyp_fixmap_unmap();
> +}
> +
> +int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
> +{
> + u64 ipa = hyp_pfn_to_phys(gfn);
> + kvm_pte_t pte;
> + u64 phys;
> + int ret;
> +
> + host_lock_component();
> + guest_lock_component(vm);
> +
> + ret = get_valid_guest_pte(vm, ipa, &pte, &phys);
> + if (ret)
> + goto unlock;
> +
> + switch (guest_get_page_state(pte, ipa)) {
> + case PKVM_PAGE_OWNED:
> + WARN_ON(__host_check_page_state_range(phys, PAGE_SIZE, PKVM_NOPAGE));
> + hyp_poison_page(phys);
> + break;
> + case PKVM_PAGE_SHARED_OWNED:
> + WARN_ON(__host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_BORROWED));
> + break;
> + default:
> + ret = -EPERM;
> + goto unlock;
> + }
> +
> + WARN_ON(kvm_pgtable_stage2_unmap(&vm->pgt, ipa, PAGE_SIZE));
> + WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HOST));
> +
> +unlock:
> + guest_unlock_component(vm);
> + host_unlock_component();
> +
> + return ret;
> +}
> +
> int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
> {
> struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
> diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> index 7f8191f96fc3..9f0997150cf5 100644
> --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> @@ -832,6 +832,20 @@ teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr, size_t size)
> unmap_donated_memory_noclear(addr, size);
> }
>
> +int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 gfn)
> +{
> + struct pkvm_hyp_vm *hyp_vm;
> + int ret = -EINVAL;
> +
> + hyp_spin_lock(&vm_table_lock);
> + hyp_vm = get_vm_by_handle(handle);
> + if (hyp_vm && hyp_vm->kvm.arch.pkvm.is_dying)
> + ret = __pkvm_host_reclaim_page_guest(gfn, hyp_vm);
> + hyp_spin_unlock(&vm_table_lock);
> +
> + return ret;
> +}
> +
> int __pkvm_start_teardown_vm(pkvm_handle_t handle)
> {
> struct pkvm_hyp_vm *hyp_vm;
> --
> 2.52.0.351.gbe84eed79e-goog
>
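For context, the EL1-side user of this hypercall (hooked into pkvm_pgtable_stage2_destroy() by the next patch) can be pictured as a loop along these lines; the iteration over the guest IPA space is simplified here and purely illustrative:

	/* Purely illustrative: reclaim a dying guest's pages one at a time,
	 * rescheduling between hypercalls since EL2 itself is non-preemptible. */
	static void pkvm_reclaim_dying_guest(pkvm_handle_t handle, u64 start_gfn,
					     u64 nr_pages)
	{
		u64 gfn;

		for (gfn = start_gfn; gfn < start_gfn + nr_pages; gfn++) {
			kvm_call_hyp_nvhe(__pkvm_reclaim_dying_guest_page,
					  handle, gfn);
			cond_resched();
		}
	}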
Thread overview: 45+ messages
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
2026-01-05 15:49 ` [PATCH 01/30] KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers Will Deacon
2026-01-06 14:33 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 02/30] KVM: arm64: Remove redundant 'pgt' pointer checks from MMU notifiers Will Deacon
2026-01-06 14:32 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 03/30] KVM: arm64: Rename __pkvm_pgtable_stage2_unmap() Will Deacon
2026-01-05 15:49 ` [PATCH 04/30] KVM: arm64: Don't advertise unsupported features for protected guests Will Deacon
2026-01-05 15:49 ` [PATCH 05/30] KVM: arm64: Expose self-hosted debug regs as RAZ/WI " Will Deacon
2026-01-05 15:49 ` [PATCH 06/30] KVM: arm64: Remove pointless is_protected_kvm_enabled() checks from hyp Will Deacon
2026-01-06 14:40 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 07/30] KVM: arm64: Ignore MMU notifier callbacks for protected VMs Will Deacon
2026-01-05 15:49 ` [PATCH 08/30] KVM: arm64: Prevent unsupported memslot operations on " Will Deacon
2026-01-05 15:49 ` [PATCH 09/30] KVM: arm64: Split teardown hypercall into two phases Will Deacon
2026-01-05 15:49 ` [PATCH 10/30] KVM: arm64: Introduce __pkvm_host_donate_guest() Will Deacon
2026-01-06 14:48 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 11/30] KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map() Will Deacon
2026-01-05 15:49 ` [PATCH 12/30] KVM: arm64: Handle aborts from protected VMs Will Deacon
2026-01-05 15:49 ` [PATCH 13/30] KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page() Will Deacon
2026-01-06 16:26 ` Vincent Donnefort
2026-01-05 15:49 ` [PATCH 14/30] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy() Will Deacon
2026-01-06 14:59 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 15/30] KVM: arm64: Refactor enter_exception64() Will Deacon
2026-01-05 15:49 ` [PATCH 16/30] KVM: arm64: Inject SIGSEGV on illegal accesses Will Deacon
2026-01-05 15:49 ` [PATCH 17/30] KVM: arm64: Generalise kvm_pgtable_stage2_set_owner() Will Deacon
2026-01-06 15:20 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 18/30] KVM: arm64: Introduce host_stage2_set_owner_metadata_locked() Will Deacon
2026-01-05 15:49 ` [PATCH 19/30] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2 Will Deacon
2026-01-06 16:01 ` Fuad Tabba
2026-01-05 15:49 ` [PATCH 20/30] KVM: arm64: Introduce hypercall to force reclaim of a protected page Will Deacon
2026-01-06 15:44 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 21/30] KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler Will Deacon
2026-01-05 15:49 ` [PATCH 22/30] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte Will Deacon
2026-01-06 15:54 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 23/30] KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs Will Deacon
2026-01-06 15:52 ` Vincent Donnefort
2026-01-05 15:49 ` [PATCH 24/30] KVM: arm64: Implement the MEM_SHARE hypercall for " Will Deacon
2026-01-06 15:45 ` Vincent Donnefort
2026-01-05 15:49 ` [PATCH 25/30] KVM: arm64: Implement the MEM_UNSHARE " Will Deacon
2026-01-06 15:50 ` Vincent Donnefort
2026-01-05 15:49 ` [PATCH 26/30] KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled Will Deacon
2026-01-05 15:49 ` [PATCH 27/30] KVM: arm64: Add some initial documentation for pKVM Will Deacon
2026-01-06 15:59 ` Vincent Donnefort
2026-01-05 15:49 ` [PATCH 28/30] KVM: arm64: Extend pKVM page ownership selftests to cover guest donation Will Deacon
2026-01-05 15:49 ` [PATCH 29/30] KVM: arm64: Register 'selftest_vm' in the VM table Will Deacon
2026-01-05 15:49 ` [PATCH 30/30] KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim Will Deacon