public inbox for linux-arm-kernel@lists.infradead.org
* [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM
@ 2026-01-19 12:45 Will Deacon
  2026-01-19 12:45 ` [PATCH v2 01/35] KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers Will Deacon
                   ` (35 more replies)
  0 siblings, 36 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:45 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Hi folks,

It's back and it's even bigger than before!

Although the first patch has been picked up as a fix (thanks, Oliver),
review feedback has resulted in some additional patches being included
in the series. If you'd like to see the first version, it's available
here:

  https://lore.kernel.org/kvmarm/20260105154939.11041-1-will@kernel.org/

Changes since v1 include:

  * Change hypercall order so that pKVM-specific calls are in their own
    range
  * Make 'pkvm_handle_t' a u16 and check its size at compile-time
  * Rework pte annotations to take a 4-bit type field
  * Fix a memory leak on the kvm_arch_init_vm() failure path
  * Pass KVM_PGTABLE_WALK_IGNORE_EAGAIN when mapping into the host idmap
  * Avoid host annotation when reclaiming guest pages
  * Re-instate guest pte metadata when unsharing a page
  * Extend selftests to cover guest share and unshare hypercalls
  * Drop broken MMU notifier changes
  * Documentation tweaks

As before, patches are based on v6.19-rc4 and are also available at:

  https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=kvm/protected-memory

All feedback welcome.

Cheers,

Will

Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Mostafa Saleh <smostafa@google.com>

--->8

Fuad Tabba (1):
  KVM: arm64: Expose self-hosted debug regs as RAZ/WI for protected
    guests

Quentin Perret (2):
  KVM: arm64: Refactor enter_exception64()
  KVM: arm64: Inject SIGSEGV on illegal accesses

Will Deacon (32):
  KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers
  KVM: arm64: Don't leak stage-2 page-table if VM fails to init under
    pKVM
  KVM: arm64: Move handle check into pkvm_pgtable_stage2_destroy_range()
  KVM: arm64: Rename __pkvm_pgtable_stage2_unmap()
  KVM: arm64: Don't advertise unsupported features for protected guests
  KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls
  KVM: arm64: Ignore MMU notifier callbacks for protected VMs
  KVM: arm64: Prevent unsupported memslot operations on protected VMs
  KVM: arm64: Ignore -EAGAIN when mapping in pages for the pKVM host
  KVM: arm64: Split teardown hypercall into two phases
  KVM: arm64: Introduce __pkvm_host_donate_guest()
  KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map()
  KVM: arm64: Handle aborts from protected VMs
  KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page()
  KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy()
  KVM: arm64: Avoid pointless annotation when mapping host-owned pages
  KVM: arm64: Generalise kvm_pgtable_stage2_set_owner()
  KVM: arm64: Introduce host_stage2_set_owner_metadata_locked()
  KVM: arm64: Change 'pkvm_handle_t' to u16
  KVM: arm64: Annotate guest donations with handle and gfn in host
    stage-2
  KVM: arm64: Introduce hypercall to force reclaim of a protected page
  KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler
  KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte
  KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs
  KVM: arm64: Implement the MEM_SHARE hypercall for protected VMs
  KVM: arm64: Implement the MEM_UNSHARE hypercall for protected VMs
  KVM: arm64: Allow userspace to create protected VMs when pKVM is
    enabled
  KVM: arm64: Add some initial documentation for pKVM
  KVM: arm64: Extend pKVM page ownership selftests to cover guest
    donation
  KVM: arm64: Register 'selftest_vm' in the VM table
  KVM: arm64: Extend pKVM page ownership selftests to cover forced
    reclaim
  KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs

 .../admin-guide/kernel-parameters.txt         |   4 +-
 Documentation/virt/kvm/arm/index.rst          |   1 +
 Documentation/virt/kvm/arm/pkvm.rst           | 103 +++
 arch/arm64/include/asm/kvm_asm.h              |  27 +-
 arch/arm64/include/asm/kvm_emulate.h          |   5 +
 arch/arm64/include/asm/kvm_host.h             |   9 +-
 arch/arm64/include/asm/kvm_pgtable.h          |  51 +-
 arch/arm64/include/asm/kvm_pkvm.h             |   4 +-
 arch/arm64/include/asm/virt.h                 |   6 +
 arch/arm64/kvm/Kconfig                        |  10 +
 arch/arm64/kvm/arm.c                          |  12 +-
 arch/arm64/kvm/hyp/exception.c                | 100 +--
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   9 +
 arch/arm64/kvm/hyp/include/nvhe/memory.h      |   6 +
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |   7 +-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 125 ++--
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 595 ++++++++++++++++--
 arch/arm64/kvm/hyp/nvhe/pkvm.c                | 224 ++++++-
 arch/arm64/kvm/hyp/nvhe/switch.c              |   1 +
 arch/arm64/kvm/hyp/nvhe/sys_regs.c            |   8 +
 arch/arm64/kvm/hyp/pgtable.c                  |  38 +-
 arch/arm64/kvm/mmu.c                          | 122 +++-
 arch/arm64/kvm/pkvm.c                         | 149 ++++-
 arch/arm64/mm/fault.c                         |  31 +-
 include/uapi/linux/kvm.h                      |   5 +
 25 files changed, 1405 insertions(+), 247 deletions(-)
 create mode 100644 Documentation/virt/kvm/arm/pkvm.rst

-- 
2.52.0.457.g6b5491de43-goog



^ permalink raw reply	[flat|nested] 54+ messages in thread

* [PATCH v2 01/35] KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
@ 2026-01-19 12:45 ` Will Deacon
  2026-01-19 12:45 ` [PATCH v2 02/35] KVM: arm64: Don't leak stage-2 page-table if VM fails to init under pKVM Will Deacon
                   ` (34 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:45 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Commit ddcadb297ce5 ("KVM: arm64: Ignore EAGAIN for walks outside of a
fault") introduced a new walker flag ('KVM_PGTABLE_WALK_HANDLE_FAULT')
to KVM's page-table code. When set, the walk logic maintains its
previous behaviour of terminating a walk as soon as the visitor callback
returns an error. However, when the flag is clear, the walk continues
if the visitor returns -EAGAIN; the error is suppressed and zero is
returned to the caller.

Clearing the flag is beneficial when write-protecting a range of IPAs
with kvm_pgtable_stage2_wrprotect() but is not useful in any other
cases, either because we are operating on a single page (e.g.
kvm_pgtable_stage2_mkyoung() or kvm_phys_addr_ioremap()) or because the
early termination is desirable (e.g. when mapping pages from a fault in
user_mem_abort()).

Subsequently, commit e912efed485a ("KVM: arm64: Introduce the EL1 pKVM
MMU") hooked up pKVM's hypercall interface to the MMU code at EL1 but
failed to propagate any of the walker flags. As a result, page-table
walks at EL2 fail to set KVM_PGTABLE_WALK_HANDLE_FAULT even when the
early termination semantics are desirable on the fault handling path.

Rather than complicate the pKVM hypercall interface, invert the sense of
the flag so that the interface can be simplified, and pass the new flag
('KVM_PGTABLE_WALK_IGNORE_EAGAIN') only from the wrprotect code.

Cc: Fuad Tabba <tabba@google.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Fixes: fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM")
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_pgtable.h | 6 +++---
 arch/arm64/kvm/hyp/pgtable.c         | 5 +++--
 arch/arm64/kvm/mmu.c                 | 8 +++-----
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index fc02de43c68d..8b78d573fbcf 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -293,8 +293,8 @@ typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
  *					children.
  * @KVM_PGTABLE_WALK_SHARED:		Indicates the page-tables may be shared
  *					with other software walkers.
- * @KVM_PGTABLE_WALK_HANDLE_FAULT:	Indicates the page-table walk was
- *					invoked from a fault handler.
+ * @KVM_PGTABLE_WALK_IGNORE_EAGAIN:	Don't terminate the walk early if
+ *					the walker returns -EAGAIN.
  * @KVM_PGTABLE_WALK_SKIP_BBM_TLBI:	Visit and update table entries
  *					without Break-before-make's
  *					TLB invalidation.
@@ -307,7 +307,7 @@ enum kvm_pgtable_walk_flags {
 	KVM_PGTABLE_WALK_TABLE_PRE		= BIT(1),
 	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
 	KVM_PGTABLE_WALK_SHARED			= BIT(3),
-	KVM_PGTABLE_WALK_HANDLE_FAULT		= BIT(4),
+	KVM_PGTABLE_WALK_IGNORE_EAGAIN		= BIT(4),
 	KVM_PGTABLE_WALK_SKIP_BBM_TLBI		= BIT(5),
 	KVM_PGTABLE_WALK_SKIP_CMO		= BIT(6),
 };
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 947ac1a951a5..9abc0a6cf448 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -144,7 +144,7 @@ static bool kvm_pgtable_walk_continue(const struct kvm_pgtable_walker *walker,
 	 * page table walk.
 	 */
 	if (r == -EAGAIN)
-		return !(walker->flags & KVM_PGTABLE_WALK_HANDLE_FAULT);
+		return walker->flags & KVM_PGTABLE_WALK_IGNORE_EAGAIN;
 
 	return !r;
 }
@@ -1262,7 +1262,8 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
 	return stage2_update_leaf_attrs(pgt, addr, size, 0,
 					KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W,
-					NULL, NULL, 0);
+					NULL, NULL,
+					KVM_PGTABLE_WALK_IGNORE_EAGAIN);
 }
 
 void kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 48d7c372a4cd..2b260bdf3202 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1563,14 +1563,12 @@ static void adjust_nested_exec_perms(struct kvm *kvm,
 		*prot &= ~KVM_PGTABLE_PROT_PX;
 }
 
-#define KVM_PGTABLE_WALK_MEMABORT_FLAGS (KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED)
-
 static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		      struct kvm_s2_trans *nested,
 		      struct kvm_memory_slot *memslot, bool is_perm)
 {
 	bool write_fault, exec_fault, writable;
-	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
+	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
 	unsigned long mmu_seq;
@@ -1665,7 +1663,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct kvm_pgtable *pgt;
 	struct page *page;
 	vm_flags_t vm_flags;
-	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
+	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
 
 	if (fault_is_perm)
 		fault_granule = kvm_vcpu_trap_get_perm_fault_granule(vcpu);
@@ -1933,7 +1931,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 /* Resolve the access fault by making the page young again. */
 static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 {
-	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
+	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
 	struct kvm_s2_mmu *mmu;
 
 	trace_kvm_access_fault(fault_ipa);
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 02/35] KVM: arm64: Don't leak stage-2 page-table if VM fails to init under pKVM
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
  2026-01-19 12:45 ` [PATCH v2 01/35] KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers Will Deacon
@ 2026-01-19 12:45 ` Will Deacon
  2026-01-19 12:45 ` [PATCH v2 03/35] KVM: arm64: Move handle check into pkvm_pgtable_stage2_destroy_range() Will Deacon
                   ` (33 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:45 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

If pkvm_init_host_vm() fails, we should free the stage-2 page-table
previously allocated by kvm_init_stage2_mmu().

Cc: Fuad Tabba <tabba@google.com>
Fixes: 07aeb70707b1 ("KVM: arm64: Reserve pKVM handle during pkvm_init_host_vm()")
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/arm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 4f80da0c0d1d..6a218739621d 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -190,7 +190,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 		 */
 		ret = pkvm_init_host_vm(kvm);
 		if (ret)
-			goto err_free_cpumask;
+			goto err_uninit_mmu;
 	}
 
 	kvm_vgic_early_init(kvm);
@@ -206,6 +206,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	return 0;
 
+err_uninit_mmu:
+	kvm_uninit_stage2_mmu(kvm);
 err_free_cpumask:
 	free_cpumask_var(kvm->arch.supported_cpus);
 err_unshare_kvm:
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 03/35] KVM: arm64: Move handle check into pkvm_pgtable_stage2_destroy_range()
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
  2026-01-19 12:45 ` [PATCH v2 01/35] KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers Will Deacon
  2026-01-19 12:45 ` [PATCH v2 02/35] KVM: arm64: Don't leak stage-2 page-table if VM fails to init under pKVM Will Deacon
@ 2026-01-19 12:45 ` Will Deacon
  2026-01-19 12:45 ` [PATCH v2 04/35] KVM: arm64: Rename __pkvm_pgtable_stage2_unmap() Will Deacon
                   ` (32 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:45 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

When pKVM is enabled, a VM has a 'handle' allocated by the hypervisor
in kvm_arch_init_vm() and released later by kvm_arch_destroy_vm().

Consequently, the only time __pkvm_pgtable_stage2_unmap() can run into
an uninitialised 'handle' is on the kvm_arch_init_vm() failure path,
where we destroy the empty stage-2 page-table if we fail to allocate a
handle.

Move the handle check into pkvm_pgtable_stage2_destroy_range(), which
will additionally handle protected VMs in subsequent patches.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/pkvm.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index d7a0f69a9982..7797813f4dbe 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -329,9 +329,6 @@ static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 e
 	struct pkvm_mapping *mapping;
 	int ret;
 
-	if (!handle)
-		return 0;
-
 	for_each_mapping_in_range_safe(pgt, start, end, mapping) {
 		ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn,
 					mapping->nr_pages);
@@ -347,6 +344,12 @@ static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 e
 void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
 					u64 addr, u64 size)
 {
+	struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+	pkvm_handle_t handle = kvm->arch.pkvm.handle;
+
+	if (!handle)
+		return;
+
 	__pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
 }
 
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 04/35] KVM: arm64: Rename __pkvm_pgtable_stage2_unmap()
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (2 preceding siblings ...)
  2026-01-19 12:45 ` [PATCH v2 03/35] KVM: arm64: Move handle check into pkvm_pgtable_stage2_destroy_range() Will Deacon
@ 2026-01-19 12:45 ` Will Deacon
  2026-01-19 12:45 ` [PATCH v2 05/35] KVM: arm64: Don't advertise unsupported features for protected guests Will Deacon
                   ` (31 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:45 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

In preparation for adding support for protected VMs, where pages are
donated rather than shared, rename __pkvm_pgtable_stage2_unmap() to
__pkvm_pgtable_stage2_unshare() to make it clearer about what is going
on.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/pkvm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 7797813f4dbe..42f6e50825ac 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -322,7 +322,7 @@ int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	return 0;
 }
 
-static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 end)
+static int __pkvm_pgtable_stage2_unshare(struct kvm_pgtable *pgt, u64 start, u64 end)
 {
 	struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
 	pkvm_handle_t handle = kvm->arch.pkvm.handle;
@@ -350,7 +350,7 @@ void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
 	if (!handle)
 		return;
 
-	__pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+	__pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
 }
 
 void pkvm_pgtable_stage2_destroy_pgd(struct kvm_pgtable *pgt)
@@ -386,7 +386,7 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 			return -EAGAIN;
 
 		/* Remove _any_ pkvm_mapping overlapping with the range, bigger or smaller. */
-		ret = __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+		ret = __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
 		if (ret)
 			return ret;
 		mapping = NULL;
@@ -409,7 +409,7 @@ int pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
 	lockdep_assert_held_write(&kvm_s2_mmu_to_kvm(pgt->mmu)->mmu_lock);
 
-	return __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+	return __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
 }
 
 int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 05/35] KVM: arm64: Don't advertise unsupported features for protected guests
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (3 preceding siblings ...)
  2026-01-19 12:45 ` [PATCH v2 04/35] KVM: arm64: Rename __pkvm_pgtable_stage2_unmap() Will Deacon
@ 2026-01-19 12:45 ` Will Deacon
  2026-01-19 12:45 ` [PATCH v2 06/35] KVM: arm64: Expose self-hosted debug regs as RAZ/WI " Will Deacon
                   ` (30 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:45 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Both SVE and PMUv3 are treated as "restricted" features for protected
guests, and attempts to access their corresponding architectural state
from a protected guest result in an undefined exception being injected
by the hypervisor.

Since these exceptions are unexpected and typically fatal for the guest,
don't advertise these features for protected guests.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_pkvm.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 0aecd4ac5f45..5a71d25febca 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -37,8 +37,6 @@ static inline bool kvm_pvm_ext_allowed(long ext)
 	case KVM_CAP_MAX_VCPU_ID:
 	case KVM_CAP_MSI_DEVID:
 	case KVM_CAP_ARM_VM_IPA_SIZE:
-	case KVM_CAP_ARM_PMU_V3:
-	case KVM_CAP_ARM_SVE:
 	case KVM_CAP_ARM_PTRAUTH_ADDRESS:
 	case KVM_CAP_ARM_PTRAUTH_GENERIC:
 		return true;
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 06/35] KVM: arm64: Expose self-hosted debug regs as RAZ/WI for protected guests
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (4 preceding siblings ...)
  2026-01-19 12:45 ` [PATCH v2 05/35] KVM: arm64: Don't advertise unsupported features for protected guests Will Deacon
@ 2026-01-19 12:45 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 07/35] KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls Will Deacon
                   ` (29 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:45 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

From: Fuad Tabba <tabba@google.com>

Debug and trace are not currently supported for protected guests, so
trap accesses to the related registers and emulate them as RAZ/WI for
now. Although this isn't strictly compatible with the architecture, it's
sufficient for Linux guests and means that debug support can be added
later on.

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/sys_regs.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/sys_regs.c b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
index 3108b5185c20..c106bb796ab0 100644
--- a/arch/arm64/kvm/hyp/nvhe/sys_regs.c
+++ b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
@@ -372,6 +372,14 @@ static const struct sys_reg_desc pvm_sys_reg_descs[] = {
 	/* Cache maintenance by set/way operations are restricted. */
 
 	/* Debug and Trace Registers are restricted. */
+	RAZ_WI(SYS_DBGBVRn_EL1(0)),
+	RAZ_WI(SYS_DBGBCRn_EL1(0)),
+	RAZ_WI(SYS_DBGWVRn_EL1(0)),
+	RAZ_WI(SYS_DBGWCRn_EL1(0)),
+	RAZ_WI(SYS_MDSCR_EL1),
+	RAZ_WI(SYS_OSLAR_EL1),
+	RAZ_WI(SYS_OSLSR_EL1),
+	RAZ_WI(SYS_OSDLR_EL1),
 
 	/* Group 1 ID registers */
 	HOST_HANDLED(SYS_REVIDR_EL1),
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 07/35] KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (5 preceding siblings ...)
  2026-01-19 12:45 ` [PATCH v2 06/35] KVM: arm64: Expose self-hosted debug regs as RAZ/WI " Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-02-10 14:53   ` Alexandru Elisei
  2026-01-19 12:46 ` [PATCH v2 08/35] KVM: arm64: Ignore MMU notifier callbacks for protected VMs Will Deacon
                   ` (28 subsequent siblings)
  35 siblings, 1 reply; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

When pKVM is not enabled, the host shouldn't issue pKVM-specific
hypercalls and so there's no point checking for this in the pKVM
hypercall handlers.

Remove the redundant is_protected_kvm_enabled() checks from each
hypercall and instead rejig the hypercall table so that the
pKVM-specific hypercalls are unreachable when pKVM is not being used.

Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h   | 20 ++++++----
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 63 ++++++++++--------------------
 2 files changed, 32 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index a1ad12c72ebf..2076005e9253 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -60,16 +60,9 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_init_lrs,
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_get_gic_config,
 	__KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
+	__KVM_HOST_SMCCC_FUNC_MIN_PKVM = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
 
 	/* Hypercalls available after pKVM finalisation */
-	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
-	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
-	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
-	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
-	__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_perms_guest,
-	__KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
-	__KVM_HOST_SMCCC_FUNC___pkvm_host_test_clear_young_guest,
-	__KVM_HOST_SMCCC_FUNC___pkvm_host_mkyoung_guest,
 	__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
 	__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
 	__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
@@ -81,6 +74,17 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff,
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
+	__KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM = __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
+
+	/* Hypercalls available only when pKVM has finalised */
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_perms_guest,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_test_clear_young_guest,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_mkyoung_guest,
 	__KVM_HOST_SMCCC_FUNC___pkvm_reserve_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index a7c689152f68..eb5cfe32b2c9 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -169,9 +169,6 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
 	DECLARE_REG(u64, hcr_el2, host_ctxt, 3);
 	struct pkvm_hyp_vcpu *hyp_vcpu;
 
-	if (!is_protected_kvm_enabled())
-		return;
-
 	hyp_vcpu = pkvm_load_hyp_vcpu(handle, vcpu_idx);
 	if (!hyp_vcpu)
 		return;
@@ -185,12 +182,8 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
 
 static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
 {
-	struct pkvm_hyp_vcpu *hyp_vcpu;
+	struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
 
-	if (!is_protected_kvm_enabled())
-		return;
-
-	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
 	if (hyp_vcpu)
 		pkvm_put_hyp_vcpu(hyp_vcpu);
 }
@@ -254,9 +247,6 @@ static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
 	struct pkvm_hyp_vcpu *hyp_vcpu;
 	int ret = -EINVAL;
 
-	if (!is_protected_kvm_enabled())
-		goto out;
-
 	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
 	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
 		goto out;
@@ -278,9 +268,6 @@ static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
 	struct pkvm_hyp_vm *hyp_vm;
 	int ret = -EINVAL;
 
-	if (!is_protected_kvm_enabled())
-		goto out;
-
 	hyp_vm = get_np_pkvm_hyp_vm(handle);
 	if (!hyp_vm)
 		goto out;
@@ -298,9 +285,6 @@ static void handle___pkvm_host_relax_perms_guest(struct kvm_cpu_context *host_ct
 	struct pkvm_hyp_vcpu *hyp_vcpu;
 	int ret = -EINVAL;
 
-	if (!is_protected_kvm_enabled())
-		goto out;
-
 	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
 	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
 		goto out;
@@ -318,9 +302,6 @@ static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt
 	struct pkvm_hyp_vm *hyp_vm;
 	int ret = -EINVAL;
 
-	if (!is_protected_kvm_enabled())
-		goto out;
-
 	hyp_vm = get_np_pkvm_hyp_vm(handle);
 	if (!hyp_vm)
 		goto out;
@@ -340,9 +321,6 @@ static void handle___pkvm_host_test_clear_young_guest(struct kvm_cpu_context *ho
 	struct pkvm_hyp_vm *hyp_vm;
 	int ret = -EINVAL;
 
-	if (!is_protected_kvm_enabled())
-		goto out;
-
 	hyp_vm = get_np_pkvm_hyp_vm(handle);
 	if (!hyp_vm)
 		goto out;
@@ -359,9 +337,6 @@ static void handle___pkvm_host_mkyoung_guest(struct kvm_cpu_context *host_ctxt)
 	struct pkvm_hyp_vcpu *hyp_vcpu;
 	int ret = -EINVAL;
 
-	if (!is_protected_kvm_enabled())
-		goto out;
-
 	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
 	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
 		goto out;
@@ -421,12 +396,8 @@ static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
 static void handle___pkvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
-	struct pkvm_hyp_vm *hyp_vm;
+	struct pkvm_hyp_vm *hyp_vm = get_np_pkvm_hyp_vm(handle);
 
-	if (!is_protected_kvm_enabled())
-		return;
-
-	hyp_vm = get_np_pkvm_hyp_vm(handle);
 	if (!hyp_vm)
 		return;
 
@@ -600,14 +571,6 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__vgic_v3_get_gic_config),
 	HANDLE_FUNC(__pkvm_prot_finalize),
 
-	HANDLE_FUNC(__pkvm_host_share_hyp),
-	HANDLE_FUNC(__pkvm_host_unshare_hyp),
-	HANDLE_FUNC(__pkvm_host_share_guest),
-	HANDLE_FUNC(__pkvm_host_unshare_guest),
-	HANDLE_FUNC(__pkvm_host_relax_perms_guest),
-	HANDLE_FUNC(__pkvm_host_wrprotect_guest),
-	HANDLE_FUNC(__pkvm_host_test_clear_young_guest),
-	HANDLE_FUNC(__pkvm_host_mkyoung_guest),
 	HANDLE_FUNC(__kvm_adjust_pc),
 	HANDLE_FUNC(__kvm_vcpu_run),
 	HANDLE_FUNC(__kvm_flush_vm_context),
@@ -619,6 +582,15 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__kvm_timer_set_cntvoff),
 	HANDLE_FUNC(__vgic_v3_save_aprs),
 	HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
+
+	HANDLE_FUNC(__pkvm_host_share_hyp),
+	HANDLE_FUNC(__pkvm_host_unshare_hyp),
+	HANDLE_FUNC(__pkvm_host_share_guest),
+	HANDLE_FUNC(__pkvm_host_unshare_guest),
+	HANDLE_FUNC(__pkvm_host_relax_perms_guest),
+	HANDLE_FUNC(__pkvm_host_wrprotect_guest),
+	HANDLE_FUNC(__pkvm_host_test_clear_young_guest),
+	HANDLE_FUNC(__pkvm_host_mkyoung_guest),
 	HANDLE_FUNC(__pkvm_reserve_vm),
 	HANDLE_FUNC(__pkvm_unreserve_vm),
 	HANDLE_FUNC(__pkvm_init_vm),
@@ -632,7 +604,7 @@ static const hcall_t host_hcall[] = {
 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(unsigned long, id, host_ctxt, 0);
-	unsigned long hcall_min = 0;
+	unsigned long hcall_min = 0, hcall_max = -1;
 	hcall_t hfn;
 
 	/*
@@ -644,14 +616,19 @@ static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
 	 * basis. This is all fine, however, since __pkvm_prot_finalize
 	 * returns -EPERM after the first call for a given CPU.
 	 */
-	if (static_branch_unlikely(&kvm_protected_mode_initialized))
-		hcall_min = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize;
+	if (static_branch_unlikely(&kvm_protected_mode_initialized)) {
+		hcall_min = __KVM_HOST_SMCCC_FUNC_MIN_PKVM;
+	} else {
+		hcall_max = __KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM;
+	}
 
 	id &= ~ARM_SMCCC_CALL_HINTS;
 	id -= KVM_HOST_SMCCC_ID(0);
 
-	if (unlikely(id < hcall_min || id >= ARRAY_SIZE(host_hcall)))
+	if (unlikely(id < hcall_min || id > hcall_max ||
+		     id >= ARRAY_SIZE(host_hcall))) {
 		goto inval;
+	}
 
 	hfn = host_hcall[id];
 	if (unlikely(!hfn))
-- 
2.52.0.457.g6b5491de43-goog
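
The hcall dispatch window installed above can be modelled in user space as follows. The constants are illustrative stand-ins for the real `__KVM_HOST_SMCCC_FUNC_*` values (an assumption of this sketch); the point is only the min/max clamping, which rejects pKVM-only calls before finalisation and the setup-only calls afterwards:

```c
#include <assert.h>

/* Stand-ins for the real enum values: with the table reordered, the
 * pre-finalisation hcalls occupy [0, MAX_NO_PKVM] and the pKVM-only
 * hcalls occupy [MIN_PKVM, NR_HCALLS). */
#define FUNC_MAX_NO_PKVM	7UL
#define FUNC_MIN_PKVM		8UL
#define NR_HCALLS		16UL

static int hcall_allowed(unsigned long id, int pkvm_finalized)
{
	unsigned long hcall_min = 0, hcall_max = -1;	/* wraps to ULONG_MAX */

	if (pkvm_finalized)
		hcall_min = FUNC_MIN_PKVM;
	else
		hcall_max = FUNC_MAX_NO_PKVM;

	return !(id < hcall_min || id > hcall_max || id >= NR_HCALLS);
}
```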



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 08/35] KVM: arm64: Ignore MMU notifier callbacks for protected VMs
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (6 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 07/35] KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 09/35] KVM: arm64: Prevent unsupported memslot operations on " Will Deacon
                   ` (27 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

In preparation for supporting the donation of pinned pages to protected
VMs, return early from the MMU notifiers when called for a protected VM,
as the necessary hypercalls are exposed only for non-protected guests.
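
As a stand-alone illustration of the guard being added (plain flags stand in for the real `struct kvm`, which is an assumption of this sketch):

```c
#include <assert.h>
#include <stdbool.h>

/* Models the early-out added to kvm_unmap_gfn_range(), kvm_age_gfn()
 * and kvm_test_age_gfn(): report "nothing done" both when no stage-2
 * table exists and when the VM is protected, since the host cannot
 * operate on a protected guest's stage-2 via these hypercalls. */
static bool notifier_may_proceed(bool has_pgt, bool vm_protected)
{
	if (!has_pgt || vm_protected)
		return false;

	return true;	/* normal VM: fall through to the real handler */
}
```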

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/mmu.c  |  9 ++++++---
 arch/arm64/kvm/pkvm.c | 19 ++++++++++++++++++-
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 2b260bdf3202..f535a180fc1e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -340,6 +340,9 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
 void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
 			    u64 size, bool may_block)
 {
+	if (kvm_vm_is_protected(kvm_s2_mmu_to_kvm(mmu)))
+		return;
+
 	__unmap_stage2_range(mmu, start, size, may_block);
 }
 
@@ -2208,7 +2211,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
-	if (!kvm->arch.mmu.pgt)
+	if (!kvm->arch.mmu.pgt || kvm_vm_is_protected(kvm))
 		return false;
 
 	__unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
@@ -2223,7 +2226,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	u64 size = (range->end - range->start) << PAGE_SHIFT;
 
-	if (!kvm->arch.mmu.pgt)
+	if (!kvm->arch.mmu.pgt || kvm_vm_is_protected(kvm))
 		return false;
 
 	return KVM_PGT_FN(kvm_pgtable_stage2_test_clear_young)(kvm->arch.mmu.pgt,
@@ -2239,7 +2242,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	u64 size = (range->end - range->start) << PAGE_SHIFT;
 
-	if (!kvm->arch.mmu.pgt)
+	if (!kvm->arch.mmu.pgt || kvm_vm_is_protected(kvm))
 		return false;
 
 	return KVM_PGT_FN(kvm_pgtable_stage2_test_clear_young)(kvm->arch.mmu.pgt,
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 42f6e50825ac..20d50abb3b94 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -407,7 +407,12 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 
 int pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
-	lockdep_assert_held_write(&kvm_s2_mmu_to_kvm(pgt->mmu)->mmu_lock);
+	struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+
+	if (WARN_ON(kvm_vm_is_protected(kvm)))
+		return -EPERM;
+
+	lockdep_assert_held_write(&kvm->mmu_lock);
 
 	return __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
 }
@@ -419,6 +424,9 @@ int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
 	struct pkvm_mapping *mapping;
 	int ret = 0;
 
+	if (WARN_ON(kvm_vm_is_protected(kvm)))
+		return -EPERM;
+
 	lockdep_assert_held(&kvm->mmu_lock);
 	for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) {
 		ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn,
@@ -450,6 +458,9 @@ bool pkvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64
 	struct pkvm_mapping *mapping;
 	bool young = false;
 
+	if (WARN_ON(kvm_vm_is_protected(kvm)))
+		return -EPERM;
+
 	lockdep_assert_held(&kvm->mmu_lock);
 	for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping)
 		young |= kvm_call_hyp_nvhe(__pkvm_host_test_clear_young_guest, handle, mapping->gfn,
@@ -461,12 +472,18 @@ bool pkvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64
 int pkvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_prot prot,
 				    enum kvm_pgtable_walk_flags flags)
 {
+	if (WARN_ON(kvm_vm_is_protected(kvm_s2_mmu_to_kvm(pgt->mmu))))
+		return -EPERM;
+
 	return kvm_call_hyp_nvhe(__pkvm_host_relax_perms_guest, addr >> PAGE_SHIFT, prot);
 }
 
 void pkvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr,
 				 enum kvm_pgtable_walk_flags flags)
 {
+	if (WARN_ON(kvm_vm_is_protected(kvm_s2_mmu_to_kvm(pgt->mmu))))
+		return;
+
 	WARN_ON(kvm_call_hyp_nvhe(__pkvm_host_mkyoung_guest, addr >> PAGE_SHIFT));
 }
 
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 09/35] KVM: arm64: Prevent unsupported memslot operations on protected VMs
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (7 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 08/35] KVM: arm64: Ignore MMU notifier callbacks for protected VMs Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 10/35] KVM: arm64: Ignore -EAGAIN when mapping in pages for the pKVM host Will Deacon
                   ` (26 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Protected VMs do not support deleting or moving memslots after first
run, nor do they support read-only memslots or dirty logging.

Return -EPERM to userspace if such an operation is attempted.
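
The rules can be sketched in isolation as below; the enum and flag encodings are made up for the sketch and do not match the kernel's:

```c
#include <assert.h>
#include <stdbool.h>
#include <errno.h>

/* Hypothetical encodings of the memslot change and flags. */
enum mr_change { MR_CREATE, MR_DELETE, MR_MOVE, MR_FLAGS_ONLY };
#define MEM_LOG_DIRTY_PAGES	(1u << 0)
#define MEM_READONLY		(1u << 1)

/* Mirrors the checks added to kvm_arch_prepare_memory_region(): once a
 * protected VM has been created at EL2, its memslots may no longer be
 * deleted or moved, and read-only or dirty-logging slots are never
 * accepted for it. */
static int pvm_check_memslot(bool hyp_vm_created, enum mr_change change,
			     unsigned int new_flags)
{
	if (hyp_vm_created && (change == MR_DELETE || change == MR_MOVE))
		return -EPERM;

	if (new_flags & (MEM_LOG_DIRTY_PAGES | MEM_READONLY))
		return -EPERM;

	return 0;
}
```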

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/mmu.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f535a180fc1e..a23a4b7f108c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2419,6 +2419,19 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	hva_t hva, reg_end;
 	int ret = 0;
 
+	if (kvm_vm_is_protected(kvm)) {
+		/* Cannot modify memslots once a pVM has run. */
+		if (pkvm_hyp_vm_is_created(kvm) &&
+		    (change == KVM_MR_DELETE || change == KVM_MR_MOVE)) {
+			return -EPERM;
+		}
+
+		if (new &&
+		    new->flags & (KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY)) {
+			return -EPERM;
+		}
+	}
+
 	if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
 			change != KVM_MR_FLAGS_ONLY)
 		return 0;
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 10/35] KVM: arm64: Ignore -EAGAIN when mapping in pages for the pKVM host
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (8 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 09/35] KVM: arm64: Prevent unsupported memslot operations on " Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 11/35] KVM: arm64: Split teardown hypercall into two phases Will Deacon
                   ` (25 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

If the host takes a stage-2 translation fault on two CPUs at the same
time, one of them will get back -EAGAIN from the page-table mapping code
when it runs into the mapping installed by the other.

Rather than handle this explicitly in handle_host_mem_abort(), pass the
new KVM_PGTABLE_WALK_IGNORE_EAGAIN flag to kvm_pgtable_stage2_map() from
__host_stage2_idmap() and return -EEXIST if host_stage2_adjust_range()
finds a valid pte. This will avoid having to test for -EAGAIN on the
reclaim path in subsequent patches.
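
The resulting error handling in handle_host_mem_abort() collapses to a three-way check, sketched here with plain errno values (the hypervisor's BUG() is replaced by an error return so the sketch is testable):

```c
#include <assert.h>
#include <errno.h>

/* Toy model of the host stage-2 idmap fault path: with -EAGAIN squashed
 * in the mapping code, finding a pte already installed by a racing CPU
 * surfaces as -EEXIST, which is treated the same as success; anything
 * else is fatal. */
static int handle_host_fault(int map_ret)
{
	switch (map_ret) {
	case -EEXIST:	/* lost the race: the other CPU mapped it */
	case 0:
		return 0;
	default:
		return -1;	/* would be BUG() in the hypervisor */
	}
}
```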

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 49db32f3ddf7..0abf6c3acdf7 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -459,8 +459,15 @@ static bool range_is_memory(u64 start, u64 end)
 static inline int __host_stage2_idmap(u64 start, u64 end,
 				      enum kvm_pgtable_prot prot)
 {
+	/*
+	 * We don't make permission changes to the host idmap after
+	 * initialisation, so we can squash -EAGAIN to save callers
+	 * having to treat it like success in the case that they try to
+	 * map something that is already mapped.
+	 */
 	return kvm_pgtable_stage2_map(&host_mmu.pgt, start, end - start, start,
-				      prot, &host_s2_pool, 0);
+				      prot, &host_s2_pool,
+				      KVM_PGTABLE_WALK_IGNORE_EAGAIN);
 }
 
 /*
@@ -502,7 +509,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
 		return ret;
 
 	if (kvm_pte_valid(pte))
-		return -EAGAIN;
+		return -EEXIST;
 
 	if (pte) {
 		WARN_ON(addr_is_memory(addr) &&
@@ -607,7 +614,6 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
 {
 	struct kvm_vcpu_fault_info fault;
 	u64 esr, addr;
-	int ret = 0;
 
 	esr = read_sysreg_el2(SYS_ESR);
 	if (!__get_fault_info(esr, &fault)) {
@@ -626,8 +632,13 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
 	BUG_ON(!(fault.hpfar_el2 & HPFAR_EL2_NS));
 	addr = FIELD_GET(HPFAR_EL2_FIPA, fault.hpfar_el2) << 12;
 
-	ret = host_stage2_idmap(addr);
-	BUG_ON(ret && ret != -EAGAIN);
+	switch (host_stage2_idmap(addr)) {
+	case -EEXIST:
+	case 0:
+		break;
+	default:
+		BUG();
+	}
 }
 
 struct check_walk_data {
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 11/35] KVM: arm64: Split teardown hypercall into two phases
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (9 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 10/35] KVM: arm64: Ignore -EAGAIN when mapping in pages for the pKVM host Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 12/35] KVM: arm64: Introduce __pkvm_host_donate_guest() Will Deacon
                   ` (24 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

In preparation for reclaiming protected guest VM pages from the host
during teardown, split the current 'pkvm_teardown_vm' hypercall into
separate 'start' and 'finalise' calls.

The 'pkvm_start_teardown_vm' hypercall puts the VM into a new 'is_dying'
state, which is a point of no return past which no vCPU of the pVM is
allowed to run any more.  Once in this new state,
'pkvm_finalize_teardown_vm' can be used to reclaim meta-data and
page-table pages from the VM. A subsequent patch will add support for
reclaiming the individual guest memory pages.
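
A minimal model of the two-phase lifecycle, assuming only the 'is_dying' flag matters (locking and the real metadata are omitted):

```c
#include <assert.h>
#include <stdbool.h>
#include <errno.h>

struct hyp_vm {
	bool is_dying;
};

/* Point of no return: a second start on a dying VM is rejected. */
static int start_teardown(struct hyp_vm *vm)
{
	if (vm->is_dying)
		return -EINVAL;
	vm->is_dying = true;
	return 0;
}

/* Once dying, vCPUs refuse to load... */
static int load_vcpu(struct hyp_vm *vm)
{
	return vm->is_dying ? -ENOENT : 0;
}

/* ...and only then may the VM's pages be reclaimed and freed. */
static int finalize_teardown(struct hyp_vm *vm)
{
	return vm->is_dying ? 0 : -EBUSY;
}
```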

Co-developed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h       |  3 ++-
 arch/arm64/include/asm/kvm_host.h      |  7 +++++
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |  4 ++-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c     | 14 +++++++---
 arch/arm64/kvm/hyp/nvhe/pkvm.c         | 36 ++++++++++++++++++++++----
 arch/arm64/kvm/pkvm.c                  |  7 ++++-
 6 files changed, 60 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 2076005e9253..2b5ceaed0d7e 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -89,7 +89,8 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
-	__KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
+	__KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
+	__KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
 	__KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ac7f970c7883..3191d10a2622 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -255,6 +255,13 @@ struct kvm_protected_vm {
 	struct kvm_hyp_memcache stage2_teardown_mc;
 	bool is_protected;
 	bool is_created;
+
+	/*
+	 * True when the guest is being torn down. When in this state, the
+	 * guest's vCPUs can't be loaded anymore, but its pages can be
+	 * reclaimed by the host.
+	 */
+	bool is_dying;
 };
 
 struct kvm_mpidr_data {
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 184ad7a39950..04c7ca703014 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -73,7 +73,9 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
 		   unsigned long pgd_hva);
 int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
 		     unsigned long vcpu_hva);
-int __pkvm_teardown_vm(pkvm_handle_t handle);
+
+int __pkvm_start_teardown_vm(pkvm_handle_t handle);
+int __pkvm_finalize_teardown_vm(pkvm_handle_t handle);
 
 struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
 					 unsigned int vcpu_idx);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index eb5cfe32b2c9..ebcbdd1cb232 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -550,11 +550,18 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
 }
 
-static void handle___pkvm_teardown_vm(struct kvm_cpu_context *host_ctxt)
+static void handle___pkvm_start_teardown_vm(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
 
-	cpu_reg(host_ctxt, 1) = __pkvm_teardown_vm(handle);
+	cpu_reg(host_ctxt, 1) = __pkvm_start_teardown_vm(handle);
+}
+
+static void handle___pkvm_finalize_teardown_vm(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_finalize_teardown_vm(handle);
 }
 
 typedef void (*hcall_t)(struct kvm_cpu_context *);
@@ -595,7 +602,8 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_unreserve_vm),
 	HANDLE_FUNC(__pkvm_init_vm),
 	HANDLE_FUNC(__pkvm_init_vcpu),
-	HANDLE_FUNC(__pkvm_teardown_vm),
+	HANDLE_FUNC(__pkvm_start_teardown_vm),
+	HANDLE_FUNC(__pkvm_finalize_teardown_vm),
 	HANDLE_FUNC(__pkvm_vcpu_load),
 	HANDLE_FUNC(__pkvm_vcpu_put),
 	HANDLE_FUNC(__pkvm_tlb_flush_vmid),
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 8911338961c5..7f8191f96fc3 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -256,7 +256,10 @@ struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
 
 	hyp_spin_lock(&vm_table_lock);
 	hyp_vm = get_vm_by_handle(handle);
-	if (!hyp_vm || hyp_vm->kvm.created_vcpus <= vcpu_idx)
+	if (!hyp_vm || hyp_vm->kvm.arch.pkvm.is_dying)
+		goto unlock;
+
+	if (hyp_vm->kvm.created_vcpus <= vcpu_idx)
 		goto unlock;
 
 	hyp_vcpu = hyp_vm->vcpus[vcpu_idx];
@@ -829,7 +832,32 @@ teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr, size_t size)
 	unmap_donated_memory_noclear(addr, size);
 }
 
-int __pkvm_teardown_vm(pkvm_handle_t handle)
+int __pkvm_start_teardown_vm(pkvm_handle_t handle)
+{
+	struct pkvm_hyp_vm *hyp_vm;
+	int ret = 0;
+
+	hyp_spin_lock(&vm_table_lock);
+	hyp_vm = get_vm_by_handle(handle);
+	if (!hyp_vm) {
+		ret = -ENOENT;
+		goto unlock;
+	} else if (WARN_ON(hyp_page_count(hyp_vm))) {
+		ret = -EBUSY;
+		goto unlock;
+	} else if (hyp_vm->kvm.arch.pkvm.is_dying) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	hyp_vm->kvm.arch.pkvm.is_dying = true;
+unlock:
+	hyp_spin_unlock(&vm_table_lock);
+
+	return ret;
+}
+
+int __pkvm_finalize_teardown_vm(pkvm_handle_t handle)
 {
 	struct kvm_hyp_memcache *mc, *stage2_mc;
 	struct pkvm_hyp_vm *hyp_vm;
@@ -843,9 +871,7 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
 	if (!hyp_vm) {
 		err = -ENOENT;
 		goto err_unlock;
-	}
-
-	if (WARN_ON(hyp_page_count(hyp_vm))) {
+	} else if (!hyp_vm->kvm.arch.pkvm.is_dying) {
 		err = -EBUSY;
 		goto err_unlock;
 	}
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 20d50abb3b94..a39dacd1d617 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -88,7 +88,7 @@ void __init kvm_hyp_reserve(void)
 static void __pkvm_destroy_hyp_vm(struct kvm *kvm)
 {
 	if (pkvm_hyp_vm_is_created(kvm)) {
-		WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm,
+		WARN_ON(kvm_call_hyp_nvhe(__pkvm_finalize_teardown_vm,
 					  kvm->arch.pkvm.handle));
 	} else if (kvm->arch.pkvm.handle) {
 		/*
@@ -350,6 +350,11 @@ void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
 	if (!handle)
 		return;
 
+	if (pkvm_hyp_vm_is_created(kvm) && !kvm->arch.pkvm.is_dying) {
+		WARN_ON(kvm_call_hyp_nvhe(__pkvm_start_teardown_vm, handle));
+		kvm->arch.pkvm.is_dying = true;
+	}
+
 	__pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
 }
 
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 12/35] KVM: arm64: Introduce __pkvm_host_donate_guest()
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (10 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 11/35] KVM: arm64: Split teardown hypercall into two phases Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 13/35] KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map() Will Deacon
                   ` (23 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

In preparation for supporting protected VMs, whose memory pages are
isolated from the host, introduce a new pKVM hypercall to allow the
donation of pages to a guest.
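
The ownership transition being enforced can be sketched with a hypothetical page-state model (the enum names are illustrative, not the hypervisor's):

```c
#include <assert.h>
#include <errno.h>

enum page_state { PAGE_OWNED, PAGE_SHARED, PAGE_NOPAGE };

/* Models the checks in __pkvm_host_donate_guest(): the transfer only
 * proceeds when the host exclusively owns the page and the guest IPA
 * is currently unmapped; ownership then flips to the guest. */
static int donate_to_guest(enum page_state host, enum page_state guest_ipa)
{
	if (host != PAGE_OWNED)
		return -EPERM;
	if (guest_ipa != PAGE_NOPAGE)
		return -EPERM;

	/* host pte becomes owner-annotated, guest IPA is mapped RWX */
	return 0;
}
```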

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h              |  1 +
 arch/arm64/include/asm/kvm_pgtable.h          |  2 +-
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 ++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 21 +++++++++++++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 30 +++++++++++++++++++
 5 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 2b5ceaed0d7e..2750b75b5b8f 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -79,6 +79,7 @@ enum __kvm_host_smccc_func {
 	/* Hypercalls available only when pKVM has finalised */
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_donate_guest,
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_perms_guest,
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8b78d573fbcf..9ce55442b621 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -98,7 +98,7 @@ typedef u64 kvm_pte_t;
 					 KVM_PTE_LEAF_ATTR_HI_S2_XN)
 
 #define KVM_INVALID_PTE_OWNER_MASK	GENMASK(9, 2)
-#define KVM_MAX_OWNER_ID		1
+#define KVM_MAX_OWNER_ID		3
 
 /*
  * Used to indicate a pte for which a 'break-before-make' sequence is in
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 5f9d56754e39..9c0cc53d1dc9 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -28,6 +28,7 @@ enum pkvm_component_id {
 	PKVM_ID_HOST,
 	PKVM_ID_HYP,
 	PKVM_ID_FFA,
+	PKVM_ID_GUEST,
 };
 
 extern unsigned long hyp_nr_cpus;
@@ -39,6 +40,7 @@ int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
 int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
 int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
+int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
 int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
 			    enum kvm_pgtable_prot prot);
 int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index ebcbdd1cb232..c04300b51c9a 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -238,6 +238,26 @@ static int pkvm_refill_memcache(struct pkvm_hyp_vcpu *hyp_vcpu)
 			       &host_vcpu->arch.pkvm_memcache);
 }
 
+static void handle___pkvm_host_donate_guest(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(u64, pfn, host_ctxt, 1);
+	DECLARE_REG(u64, gfn, host_ctxt, 2);
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	int ret = -EINVAL;
+
+	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+	if (!hyp_vcpu || !pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+		goto out;
+
+	ret = pkvm_refill_memcache(hyp_vcpu);
+	if (ret)
+		goto out;
+
+	ret = __pkvm_host_donate_guest(pfn, gfn, hyp_vcpu);
+out:
+	cpu_reg(host_ctxt, 1) = ret;
+}
+
 static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(u64, pfn, host_ctxt, 1);
@@ -592,6 +612,7 @@ static const hcall_t host_hcall[] = {
 
 	HANDLE_FUNC(__pkvm_host_share_hyp),
 	HANDLE_FUNC(__pkvm_host_unshare_hyp),
+	HANDLE_FUNC(__pkvm_host_donate_guest),
 	HANDLE_FUNC(__pkvm_host_share_guest),
 	HANDLE_FUNC(__pkvm_host_unshare_guest),
 	HANDLE_FUNC(__pkvm_host_relax_perms_guest),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 0abf6c3acdf7..f24e8ce2c40c 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -969,6 +969,36 @@ static int __guest_check_transition_size(u64 phys, u64 ipa, u64 nr_pages, u64 *s
 	return 0;
 }
 
+int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
+{
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	u64 phys = hyp_pfn_to_phys(pfn);
+	u64 ipa = hyp_pfn_to_phys(gfn);
+	int ret;
+
+	host_lock_component();
+	guest_lock_component(vm);
+
+	ret = __host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_OWNED);
+	if (ret)
+		goto unlock;
+
+	ret = __guest_check_page_state_range(vm, ipa, PAGE_SIZE, PKVM_NOPAGE);
+	if (ret)
+		goto unlock;
+
+	WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_GUEST));
+	WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
+				       pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
+				       &vcpu->vcpu.arch.pkvm_memcache, 0));
+
+unlock:
+	guest_unlock_component(vm);
+	host_unlock_component();
+
+	return ret;
+}
+
 int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
 			    enum kvm_pgtable_prot prot)
 {
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 13/35] KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map()
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (11 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 12/35] KVM: arm64: Introduce __pkvm_host_donate_guest() Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 14/35] KVM: arm64: Handle aborts from protected VMs Will Deacon
                   ` (22 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Mapping pages into a protected guest requires the donation of memory
from the host.

Extend pkvm_pgtable_stage2_map() to issue a donate hypercall when the
target VM is protected. Since the hypercall only handles a single page,
the splitting logic used for the share path is not required.
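
The size/permission validation in the reworked map path can be sketched on its own; the protection-bit encoding below is a stand-in, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>
#include <errno.h>

#define PAGE_SZ		4096UL
#define PMD_SZ		(512 * PAGE_SZ)
#define PROT_RWX	0x7	/* hypothetical encoding */

/* Protected VMs only ever take RWX, page-granular mappings (donated);
 * non-protected VMs may also use PMD-sized block mappings (shared). */
static int validate_stage2_map(bool vm_protected, unsigned long size,
			       unsigned int prot)
{
	if (vm_protected) {
		if (size != PAGE_SZ || prot != PROT_RWX)
			return -EINVAL;
	} else {
		if (size != PAGE_SZ && size != PMD_SZ)
			return -EINVAL;
	}

	return 0;
}
```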

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/pkvm.c | 58 ++++++++++++++++++++++++++++++-------------
 1 file changed, 41 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index a39dacd1d617..1814e17d600e 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -373,31 +373,55 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	struct kvm_hyp_memcache *cache = mc;
 	u64 gfn = addr >> PAGE_SHIFT;
 	u64 pfn = phys >> PAGE_SHIFT;
+	u64 end = addr + size;
 	int ret;
 
-	if (size != PAGE_SIZE && size != PMD_SIZE)
-		return -EINVAL;
-
 	lockdep_assert_held_write(&kvm->mmu_lock);
+	mapping = pkvm_mapping_iter_first(&pgt->pkvm_mappings, addr, end - 1);
 
-	/*
-	 * Calling stage2_map() on top of existing mappings is either happening because of a race
-	 * with another vCPU, or because we're changing between page and block mappings. As per
-	 * user_mem_abort(), same-size permission faults are handled in the relax_perms() path.
-	 */
-	mapping = pkvm_mapping_iter_first(&pgt->pkvm_mappings, addr, addr + size - 1);
-	if (mapping) {
-		if (size == (mapping->nr_pages * PAGE_SIZE))
+	if (kvm_vm_is_protected(kvm)) {
+		/* Protected VMs are mapped using RWX page-granular mappings */
+		if (WARN_ON_ONCE(size != PAGE_SIZE))
+			return -EINVAL;
+
+		if (WARN_ON_ONCE(prot != KVM_PGTABLE_PROT_RWX))
+			return -EINVAL;
+
+		/*
+		 * We raced with another vCPU.
+		 */
+		if (mapping)
 			return -EAGAIN;
 
-		/* Remove _any_ pkvm_mapping overlapping with the range, bigger or smaller. */
-		ret = __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
-		if (ret)
-			return ret;
-		mapping = NULL;
+		ret = kvm_call_hyp_nvhe(__pkvm_host_donate_guest, pfn, gfn);
+	} else {
+		if (WARN_ON_ONCE(size != PAGE_SIZE && size != PMD_SIZE))
+			return -EINVAL;
+
+		/*
+		 * We either raced with another vCPU or we're changing between
+		 * page and block mappings. As per user_mem_abort(), same-size
+		 * permission faults are handled in the relax_perms() path.
+		 */
+		if (mapping) {
+			if (size == (mapping->nr_pages * PAGE_SIZE))
+				return -EAGAIN;
+
+			/*
+			 * Remove _any_ pkvm_mapping overlapping with the range,
+			 * bigger or smaller.
+			 */
+			ret = __pkvm_pgtable_stage2_unshare(pgt, addr, end);
+			if (ret)
+				return ret;
+
+			mapping = NULL;
+		}
+
+		ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn,
+					size / PAGE_SIZE, prot);
 	}
 
-	ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, size / PAGE_SIZE, prot);
 	if (WARN_ON(ret))
 		return ret;
 
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 14/35] KVM: arm64: Handle aborts from protected VMs
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (12 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 13/35] KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map() Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-02-12 10:37   ` Alexandru Elisei
  2026-03-11 10:24   ` Fuad Tabba
  2026-01-19 12:46 ` [PATCH v2 15/35] KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page() Will Deacon
                   ` (21 subsequent siblings)
  35 siblings, 2 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Introduce a new abort handler for resolving stage-2 page faults from
protected VMs by pinning and donating anonymous memory. This is
considerably simpler than the infamous user_mem_abort() as we only have
to deal with translation faults at the pte level.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/mmu.c | 89 ++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 81 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index a23a4b7f108c..b21a5bf3d104 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1641,6 +1641,74 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	return ret != -EAGAIN ? ret : 0;
 }
 
+static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+		struct kvm_memory_slot *memslot, unsigned long hva)
+{
+	unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
+	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
+	struct mm_struct *mm = current->mm;
+	struct kvm *kvm = vcpu->kvm;
+	void *hyp_memcache;
+	struct page *page;
+	int ret;
+
+	ret = prepare_mmu_memcache(vcpu, true, &hyp_memcache);
+	if (ret)
+		return -ENOMEM;
+
+	ret = account_locked_vm(mm, 1, true);
+	if (ret)
+		return ret;
+
+	mmap_read_lock(mm);
+	ret = pin_user_pages(hva, 1, flags, &page);
+	mmap_read_unlock(mm);
+
+	if (ret == -EHWPOISON) {
+		kvm_send_hwpoison_signal(hva, PAGE_SHIFT);
+		ret = 0;
+		goto dec_account;
+	} else if (ret != 1) {
+		ret = -EFAULT;
+		goto dec_account;
+	} else if (!folio_test_swapbacked(page_folio(page))) {
+		/*
+		 * We really can't deal with page-cache pages returned by GUP
+		 * because (a) we may trigger writeback of a page for which we
+		 * no longer have access and (b) page_mkclean() won't find the
+		 * stage-2 mapping in the rmap so we can get out-of-whack with
+		 * the filesystem when marking the page dirty during unpinning
+		 * (see cc5095747edf ("ext4: don't BUG if someone dirty pages
+		 * without asking ext4 first")).
+		 *
+		 * Ideally we'd just restrict ourselves to anonymous pages, but
+		 * we also want to allow memfd (i.e. shmem) pages, so check for
+		 * pages backed by swap in the knowledge that the GUP pin will
+		 * prevent try_to_unmap() from succeeding.
+		 */
+		ret = -EIO;
+		goto unpin;
+	}
+
+	write_lock(&kvm->mmu_lock);
+	ret = pkvm_pgtable_stage2_map(pgt, fault_ipa, PAGE_SIZE,
+				      page_to_phys(page), KVM_PGTABLE_PROT_RWX,
+				      hyp_memcache, 0);
+	write_unlock(&kvm->mmu_lock);
+	if (ret) {
+		if (ret == -EAGAIN)
+			ret = 0;
+		goto unpin;
+	}
+
+	return 0;
+unpin:
+	unpin_user_pages(&page, 1);
+dec_account:
+	account_locked_vm(mm, 1, false);
+	return ret;
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  struct kvm_s2_trans *nested,
 			  struct kvm_memory_slot *memslot, unsigned long hva,
@@ -2190,15 +2258,20 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		goto out_unlock;
 	}
 
-	VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
-			!write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
+	if (kvm_vm_is_protected(vcpu->kvm)) {
+		ret = pkvm_mem_abort(vcpu, fault_ipa, memslot, hva);
+	} else {
+		VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
+				!write_fault &&
+				!kvm_vcpu_trap_is_exec_fault(vcpu));
 
-	if (kvm_slot_has_gmem(memslot))
-		ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
-				 esr_fsc_is_permission_fault(esr));
-	else
-		ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
-				     esr_fsc_is_permission_fault(esr));
+		if (kvm_slot_has_gmem(memslot))
+			ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
+					 esr_fsc_is_permission_fault(esr));
+		else
+			ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
+					     esr_fsc_is_permission_fault(esr));
+	}
 	if (ret == 0)
 		ret = 1;
 out:
-- 
2.52.0.457.g6b5491de43-goog



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 15/35] KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page()
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (13 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 14/35] KVM: arm64: Handle aborts from protected VMs Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 16/35] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy() Will Deacon
                   ` (20 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

To enable reclaim of pages from a protected VM during teardown,
introduce a new hypercall to reclaim a single page from a protected
guest that is in the dying state.

Since the EL2 code is non-preemptible, the new hypercall deliberately
acts on a single page at a time so as to allow EL1 to reschedule
frequently during the teardown operation.
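The design point above (bounded work per hypercall, preemption points at
EL1) can be illustrated with a minimal userspace sketch. This is not
kernel code: reclaim_one_page() and maybe_reschedule() are hypothetical
stand-ins for the __pkvm_reclaim_dying_guest_page hypercall and the
scheduler check.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-page reclaim hypercall. */
static int pages_reclaimed;
static int reclaim_one_page(unsigned long gfn)
{
	(void)gfn;
	pages_reclaimed++;
	return 0;
}

/* Stand-in for a preemption point: EL1 may reschedule here because
 * each hypercall into (non-preemptible) EL2 only touches one page. */
static void maybe_reschedule(void)
{
}

static int reclaim_range(unsigned long gfn, unsigned long nr_pages)
{
	unsigned long i;

	for (i = 0; i < nr_pages; i++) {
		int ret = reclaim_one_page(gfn + i);

		if (ret)
			return ret;
		maybe_reschedule();
	}
	return 0;
}
```

The loop therefore lives at EL1, keeping the non-preemptible EL2 section
bounded to a single page per entry.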

Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Co-developed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h              |  1 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  1 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  9 +++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 79 +++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/pkvm.c                | 14 ++++
 6 files changed, 105 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 2750b75b5b8f..2e7e8e7771f6 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -90,6 +90,7 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
+	__KVM_HOST_SMCCC_FUNC___pkvm_reclaim_dying_guest_page,
 	__KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 9c0cc53d1dc9..cde38a556049 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -41,6 +41,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
 int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
+int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm);
 int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
 			    enum kvm_pgtable_prot prot);
 int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 04c7ca703014..506831804f64 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -74,6 +74,7 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
 int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
 		     unsigned long vcpu_hva);
 
+int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 gfn);
 int __pkvm_start_teardown_vm(pkvm_handle_t handle);
 int __pkvm_finalize_teardown_vm(pkvm_handle_t handle);
 
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index c04300b51c9a..f43c50ae2d81 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -570,6 +570,14 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
 }
 
+static void handle___pkvm_reclaim_dying_guest_page(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+	DECLARE_REG(u64, gfn, host_ctxt, 2);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_reclaim_dying_guest_page(handle, gfn);
+}
+
 static void handle___pkvm_start_teardown_vm(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
@@ -623,6 +631,7 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_unreserve_vm),
 	HANDLE_FUNC(__pkvm_init_vm),
 	HANDLE_FUNC(__pkvm_init_vcpu),
+	HANDLE_FUNC(__pkvm_reclaim_dying_guest_page),
 	HANDLE_FUNC(__pkvm_start_teardown_vm),
 	HANDLE_FUNC(__pkvm_finalize_teardown_vm),
 	HANDLE_FUNC(__pkvm_vcpu_load),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index f24e8ce2c40c..ef21e8e7a734 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -736,6 +736,32 @@ static int __guest_check_page_state_range(struct pkvm_hyp_vm *vm, u64 addr,
 	return check_page_state_range(&vm->pgt, addr, size, &d);
 }
 
+static int get_valid_guest_pte(struct pkvm_hyp_vm *vm, u64 ipa, kvm_pte_t *ptep, u64 *physp)
+{
+	kvm_pte_t pte;
+	u64 phys;
+	s8 level;
+	int ret;
+
+	ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
+	if (ret)
+		return ret;
+	if (!kvm_pte_valid(pte))
+		return -ENOENT;
+	if (level != KVM_PGTABLE_LAST_LEVEL)
+		return -E2BIG;
+
+	phys = kvm_pte_to_phys(pte);
+	ret = check_range_allowed_memory(phys, phys + PAGE_SIZE);
+	if (WARN_ON(ret))
+		return ret;
+
+	*ptep = pte;
+	*physp = phys;
+
+	return 0;
+}
+
 int __pkvm_host_share_hyp(u64 pfn)
 {
 	u64 phys = hyp_pfn_to_phys(pfn);
@@ -969,6 +995,59 @@ static int __guest_check_transition_size(u64 phys, u64 ipa, u64 nr_pages, u64 *s
 	return 0;
 }
 
+static void hyp_poison_page(phys_addr_t phys)
+{
+	void *addr = hyp_fixmap_map(phys);
+
+	memset(addr, 0, PAGE_SIZE);
+	/*
+	 * Prefer kvm_flush_dcache_to_poc() over __clean_dcache_guest_page()
+	 * here as the latter may elide the CMO under the assumption that FWB
+	 * will be enabled on CPUs that support it. This is incorrect for the
+	 * host stage-2 and would otherwise lead to a malicious host potentially
+	 * being able to read the contents of newly reclaimed guest pages.
+	 */
+	kvm_flush_dcache_to_poc(addr, PAGE_SIZE);
+	hyp_fixmap_unmap();
+}
+
+int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
+{
+	u64 ipa = hyp_pfn_to_phys(gfn);
+	kvm_pte_t pte;
+	u64 phys;
+	int ret;
+
+	host_lock_component();
+	guest_lock_component(vm);
+
+	ret = get_valid_guest_pte(vm, ipa, &pte, &phys);
+	if (ret)
+		goto unlock;
+
+	switch (guest_get_page_state(pte, ipa)) {
+	case PKVM_PAGE_OWNED:
+		WARN_ON(__host_check_page_state_range(phys, PAGE_SIZE, PKVM_NOPAGE));
+		hyp_poison_page(phys);
+		break;
+	case PKVM_PAGE_SHARED_OWNED:
+		WARN_ON(__host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_BORROWED));
+		break;
+	default:
+		ret = -EPERM;
+		goto unlock;
+	}
+
+	WARN_ON(kvm_pgtable_stage2_unmap(&vm->pgt, ipa, PAGE_SIZE));
+	WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HOST));
+
+unlock:
+	guest_unlock_component(vm);
+	host_unlock_component();
+
+	return ret;
+}
+
 int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
 {
 	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 7f8191f96fc3..9f0997150cf5 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -832,6 +832,20 @@ teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr, size_t size)
 	unmap_donated_memory_noclear(addr, size);
 }
 
+int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 gfn)
+{
+	struct pkvm_hyp_vm *hyp_vm;
+	int ret = -EINVAL;
+
+	hyp_spin_lock(&vm_table_lock);
+	hyp_vm = get_vm_by_handle(handle);
+	if (hyp_vm && hyp_vm->kvm.arch.pkvm.is_dying)
+		ret = __pkvm_host_reclaim_page_guest(gfn, hyp_vm);
+	hyp_spin_unlock(&vm_table_lock);
+
+	return ret;
+}
+
 int __pkvm_start_teardown_vm(pkvm_handle_t handle)
 {
 	struct pkvm_hyp_vm *hyp_vm;
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 16/35] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy()
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (14 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 15/35] KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page() Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 17/35] KVM: arm64: Refactor enter_exception64() Will Deacon
                   ` (19 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

During teardown of a protected guest, its memory pages must be reclaimed
from the hypervisor by issuing the '__pkvm_reclaim_dying_guest_page'
hypercall.

Add a new helper, __pkvm_pgtable_stage2_reclaim(), which is called
during the VM teardown operation to reclaim pages from the hypervisor
and drop the GUP pin on the host.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/pkvm.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 1814e17d600e..8be91051699e 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -322,6 +322,32 @@ int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	return 0;
 }
 
+static int __pkvm_pgtable_stage2_reclaim(struct kvm_pgtable *pgt, u64 start, u64 end)
+{
+	struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+	pkvm_handle_t handle = kvm->arch.pkvm.handle;
+	struct pkvm_mapping *mapping;
+	int ret;
+
+	for_each_mapping_in_range_safe(pgt, start, end, mapping) {
+		struct page *page;
+
+		ret = kvm_call_hyp_nvhe(__pkvm_reclaim_dying_guest_page,
+					handle, mapping->gfn);
+		if (WARN_ON(ret))
+			return ret;
+
+		page = pfn_to_page(mapping->pfn);
+		WARN_ON_ONCE(mapping->nr_pages != 1);
+		unpin_user_pages_dirty_lock(&page, 1, true);
+		account_locked_vm(current->mm, 1, false);
+		pkvm_mapping_remove(mapping, &pgt->pkvm_mappings);
+		kfree(mapping);
+	}
+
+	return 0;
+}
+
 static int __pkvm_pgtable_stage2_unshare(struct kvm_pgtable *pgt, u64 start, u64 end)
 {
 	struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
@@ -355,7 +381,10 @@ void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
 		kvm->arch.pkvm.is_dying = true;
 	}
 
-	__pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
+	if (kvm_vm_is_protected(kvm))
+		__pkvm_pgtable_stage2_reclaim(pgt, addr, addr + size);
+	else
+		__pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
 }
 
 void pkvm_pgtable_stage2_destroy_pgd(struct kvm_pgtable *pgt)
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 17/35] KVM: arm64: Refactor enter_exception64()
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (15 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 16/35] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy() Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 18/35] KVM: arm64: Inject SIGSEGV on illegal accesses Will Deacon
                   ` (18 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

From: Quentin Perret <qperret@google.com>

In order to simplify the injection of exceptions into the host in pKVM
context, refactor enter_exception64() so that the code calculating the
exception offset from VBAR_EL1 and the code computing the new CPSR are
available as standalone helpers.

No functional change intended.
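The vector-bank selection that get_except64_offset() factors out can be
exercised standalone. The sketch below copies the helper's logic with
the architectural vector table offsets and a subset of the PSR mode
encodings; it is illustrative only and not the kernel's header
definitions.

```c
#include <assert.h>

/* AArch64 vector table bank offsets (architectural values). */
#define CURRENT_EL_SP_EL0_VECTOR	0x000
#define CURRENT_EL_SP_ELx_VECTOR	0x200
#define LOWER_EL_AArch64_VECTOR		0x400
#define LOWER_EL_AArch32_VECTOR		0x600

#define PSR_MODE_MASK		0x0000000f
#define PSR_MODE32_BIT		0x00000010
#define PSR_MODE_THREAD_BIT	0x00000001
#define PSR_MODE_EL0t		0x00000000
#define PSR_MODE_EL1t		0x00000004
#define PSR_MODE_EL1h		0x00000005

enum exception_type {
	except_type_sync	= 0,
	except_type_irq		= 0x80,
};

/* Pick the vector bank from the relationship between the faulting
 * mode and the target mode, then add the per-type offset. */
static unsigned long get_except64_offset(unsigned long psr,
					 unsigned long target_mode,
					 enum exception_type type)
{
	unsigned long mode = psr & (PSR_MODE_MASK | PSR_MODE32_BIT);
	unsigned long exc_offset;

	if (mode == target_mode)
		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
	else if ((mode | PSR_MODE_THREAD_BIT) == target_mode)
		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
	else if (!(mode & PSR_MODE32_BIT))
		exc_offset = LOWER_EL_AArch64_VECTOR;
	else
		exc_offset = LOWER_EL_AArch32_VECTOR;

	return exc_offset + type;
}
```

For example, a synchronous exception taken from EL1h to EL1h lands in
the SP_ELx bank at 0x200, while one taken from EL0 uses the lower-EL
AArch64 bank at 0x400.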

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h |   5 ++
 arch/arm64/kvm/hyp/exception.c       | 100 ++++++++++++++++-----------
 2 files changed, 63 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index c9eab316398e..c3f04bd5b2a5 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -71,6 +71,11 @@ static inline int kvm_inject_serror(struct kvm_vcpu *vcpu)
 	return kvm_inject_serror_esr(vcpu, ESR_ELx_ISV);
 }
 
+unsigned long get_except64_offset(unsigned long psr, unsigned long target_mode,
+				  enum exception_type type);
+unsigned long get_except64_cpsr(unsigned long old, bool has_mte,
+				unsigned long sctlr, unsigned long mode);
+
 void kvm_vcpu_wfi(struct kvm_vcpu *vcpu);
 
 void kvm_emulate_nested_eret(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index bef40ddb16db..d3bcda665612 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -65,12 +65,25 @@ static void __vcpu_write_spsr_und(struct kvm_vcpu *vcpu, u64 val)
 		vcpu->arch.ctxt.spsr_und = val;
 }
 
+unsigned long get_except64_offset(unsigned long psr, unsigned long target_mode,
+				  enum exception_type type)
+{
+	u64 mode = psr & (PSR_MODE_MASK | PSR_MODE32_BIT);
+	u64 exc_offset;
+
+	if      (mode == target_mode)
+		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
+	else if ((mode | PSR_MODE_THREAD_BIT) == target_mode)
+		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
+	else if (!(mode & PSR_MODE32_BIT))
+		exc_offset = LOWER_EL_AArch64_VECTOR;
+	else
+		exc_offset = LOWER_EL_AArch32_VECTOR;
+
+	return exc_offset + type;
+}
+
 /*
- * This performs the exception entry at a given EL (@target_mode), stashing PC
- * and PSTATE into ELR and SPSR respectively, and compute the new PC/PSTATE.
- * The EL passed to this function *must* be a non-secure, privileged mode with
- * bit 0 being set (PSTATE.SP == 1).
- *
  * When an exception is taken, most PSTATE fields are left unchanged in the
  * handler. However, some are explicitly overridden (e.g. M[4:0]). Luckily all
  * of the inherited bits have the same position in the AArch64/AArch32 SPSR_ELx
@@ -82,50 +95,17 @@ static void __vcpu_write_spsr_und(struct kvm_vcpu *vcpu, u64 val)
  * Here we manipulate the fields in order of the AArch64 SPSR_ELx layout, from
  * MSB to LSB.
  */
-static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
-			      enum exception_type type)
+unsigned long get_except64_cpsr(unsigned long old, bool has_mte,
+				unsigned long sctlr, unsigned long target_mode)
 {
-	unsigned long sctlr, vbar, old, new, mode;
-	u64 exc_offset;
-
-	mode = *vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT);
-
-	if      (mode == target_mode)
-		exc_offset = CURRENT_EL_SP_ELx_VECTOR;
-	else if ((mode | PSR_MODE_THREAD_BIT) == target_mode)
-		exc_offset = CURRENT_EL_SP_EL0_VECTOR;
-	else if (!(mode & PSR_MODE32_BIT))
-		exc_offset = LOWER_EL_AArch64_VECTOR;
-	else
-		exc_offset = LOWER_EL_AArch32_VECTOR;
-
-	switch (target_mode) {
-	case PSR_MODE_EL1h:
-		vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL1);
-		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
-		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
-		break;
-	case PSR_MODE_EL2h:
-		vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL2);
-		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL2);
-		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL2);
-		break;
-	default:
-		/* Don't do that */
-		BUG();
-	}
-
-	*vcpu_pc(vcpu) = vbar + exc_offset + type;
-
-	old = *vcpu_cpsr(vcpu);
-	new = 0;
+	u64 new = 0;
 
 	new |= (old & PSR_N_BIT);
 	new |= (old & PSR_Z_BIT);
 	new |= (old & PSR_C_BIT);
 	new |= (old & PSR_V_BIT);
 
-	if (kvm_has_mte(kern_hyp_va(vcpu->kvm)))
+	if (has_mte)
 		new |= PSR_TCO_BIT;
 
 	new |= (old & PSR_DIT_BIT);
@@ -161,6 +141,42 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
 
 	new |= target_mode;
 
+	return new;
+}
+
+/*
+ * This performs the exception entry at a given EL (@target_mode), stashing PC
+ * and PSTATE into ELR and SPSR respectively, and compute the new PC/PSTATE.
+ * The EL passed to this function *must* be a non-secure, privileged mode with
+ * bit 0 being set (PSTATE.SP == 1).
+ */
+static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
+			      enum exception_type type)
+{
+	u64 offset = get_except64_offset(*vcpu_cpsr(vcpu), target_mode, type);
+	unsigned long sctlr, vbar, old, new;
+
+	switch (target_mode) {
+	case PSR_MODE_EL1h:
+		vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL1);
+		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL1);
+		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL1);
+		break;
+	case PSR_MODE_EL2h:
+		vbar = __vcpu_read_sys_reg(vcpu, VBAR_EL2);
+		sctlr = __vcpu_read_sys_reg(vcpu, SCTLR_EL2);
+		__vcpu_write_sys_reg(vcpu, *vcpu_pc(vcpu), ELR_EL2);
+		break;
+	default:
+		/* Don't do that */
+		BUG();
+	}
+
+	*vcpu_pc(vcpu) = vbar + offset;
+
+	old = *vcpu_cpsr(vcpu);
+	new = get_except64_cpsr(old, kvm_has_mte(kern_hyp_va(vcpu->kvm)), sctlr,
+				target_mode);
 	*vcpu_cpsr(vcpu) = new;
 	__vcpu_write_spsr(vcpu, target_mode, old);
 }
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 18/35] KVM: arm64: Inject SIGSEGV on illegal accesses
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (16 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 17/35] KVM: arm64: Refactor enter_exception64() Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 19/35] KVM: arm64: Avoid pointless annotation when mapping host-owned pages Will Deacon
                   ` (17 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

From: Quentin Perret <qperret@google.com>

The pKVM hypervisor will currently panic if the host tries to access
memory that it doesn't own (e.g. protected guest memory). Sadly, as
guest memory can still be mapped into the VMM's address space, userspace
can trivially crash the kernel/hypervisor by poking into guest memory.

To prevent this, inject the abort back into the host with S1PTW set in
the ESR, allowing the host to differentiate this abort from normal
userspace faults and inject a SIGSEGV cleanly.
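The ESR repainting described above (rewrite the EC field for same-level
aborts, then set S1PTW as a marker) can be sketched in userspace C.
repaint_esr() is a hypothetical helper; the EC and S1PTW encodings below
follow the architecture's ESR_ELx layout.

```c
#include <assert.h>
#include <stdint.h>

#define ESR_ELx_EC_SHIFT	26
#define ESR_ELx_EC_MASK		(0x3FULL << ESR_ELx_EC_SHIFT)
#define ESR_ELx_EC_DABT_LOW	0x24	/* data abort from lower EL */
#define ESR_ELx_EC_DABT_CUR	0x25	/* data abort, same level */
#define ESR_ELx_EC_IABT_LOW	0x20	/* instruction abort from lower EL */
#define ESR_ELx_EC_IABT_CUR	0x21	/* instruction abort, same level */
#define ESR_ELx_S1PTW		(1ULL << 7)

/* Repaint a lower-EL abort as a same-level one when the fault was
 * taken from EL1, and set S1PTW so the host fault handler can
 * recognise the injected stage-2 abort. */
static uint64_t repaint_esr(uint64_t esr, int taken_from_el1)
{
	if (taken_from_el1) {
		uint64_t ec = (esr & ESR_ELx_EC_MASK) >> ESR_ELx_EC_SHIFT;

		if (ec == ESR_ELx_EC_DABT_LOW)
			ec = ESR_ELx_EC_DABT_CUR;
		else if (ec == ESR_ELx_EC_IABT_LOW)
			ec = ESR_ELx_EC_IABT_CUR;
		esr &= ~ESR_ELx_EC_MASK;
		esr |= ec << ESR_ELx_EC_SHIFT;
	}
	return esr | ESR_ELx_S1PTW;
}
```

Faults taken from EL0 keep their lower-EL EC, so the host still sees an
ordinary user fault shape, just tagged with S1PTW.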

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 47 +++++++++++++++++++++++++++
 arch/arm64/mm/fault.c                 | 22 +++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index ef21e8e7a734..41469df46e09 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -610,6 +610,50 @@ static int host_stage2_idmap(u64 addr)
 	return ret;
 }
 
+static void host_inject_abort(struct kvm_cpu_context *host_ctxt)
+{
+	u64 spsr = read_sysreg_el2(SYS_SPSR);
+	u64 esr = read_sysreg_el2(SYS_ESR);
+	u64 ventry, ec;
+
+	/* Repaint the ESR to report a same-level fault if taken from EL1 */
+	if ((spsr & PSR_MODE_MASK) != PSR_MODE_EL0t) {
+		ec = ESR_ELx_EC(esr);
+		if (ec == ESR_ELx_EC_DABT_LOW)
+			ec = ESR_ELx_EC_DABT_CUR;
+		else if (ec == ESR_ELx_EC_IABT_LOW)
+			ec = ESR_ELx_EC_IABT_CUR;
+		else
+			WARN_ON(1);
+		esr &= ~ESR_ELx_EC_MASK;
+		esr |= ec << ESR_ELx_EC_SHIFT;
+	}
+
+	/*
+	 * Since S1PTW should only ever be set for stage-2 faults, we're pretty
+	 * much guaranteed that it won't be set in ESR_EL1 by the hardware. So,
+	 * let's use that bit to allow the host abort handler to differentiate
+	 * this abort from normal userspace faults.
+	 *
+	 * Note: although S1PTW is RES0 at EL1, it is guaranteed by the
+	 * architecture to be backed by flops, so it should be safe to use.
+	 */
+	esr |= ESR_ELx_S1PTW;
+
+	write_sysreg_el1(esr, SYS_ESR);
+	write_sysreg_el1(spsr, SYS_SPSR);
+	write_sysreg_el1(read_sysreg_el2(SYS_ELR), SYS_ELR);
+	write_sysreg_el1(read_sysreg_el2(SYS_FAR), SYS_FAR);
+
+	ventry = read_sysreg_el1(SYS_VBAR);
+	ventry += get_except64_offset(spsr, PSR_MODE_EL1h, except_type_sync);
+	write_sysreg_el2(ventry, SYS_ELR);
+
+	spsr = get_except64_cpsr(spsr, system_supports_mte(),
+				 read_sysreg_el1(SYS_SCTLR), PSR_MODE_EL1h);
+	write_sysreg_el2(spsr, SYS_SPSR);
+}
+
 void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
 {
 	struct kvm_vcpu_fault_info fault;
@@ -633,6 +677,9 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
 	addr = FIELD_GET(HPFAR_EL2_FIPA, fault.hpfar_el2) << 12;
 
 	switch (host_stage2_idmap(addr)) {
+	case -EPERM:
+		host_inject_abort(host_ctxt);
+		fallthrough;
 	case -EEXIST:
 	case 0:
 		break;
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index be9dab2c7d6a..2294f2061866 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -43,6 +43,7 @@
 #include <asm/system_misc.h>
 #include <asm/tlbflush.h>
 #include <asm/traps.h>
+#include <asm/virt.h>
 
 struct fault_info {
 	int	(*fn)(unsigned long far, unsigned long esr,
@@ -269,6 +270,15 @@ static inline bool is_el1_permission_fault(unsigned long addr, unsigned long esr
 	return false;
 }
 
+static bool is_pkvm_stage2_abort(unsigned int esr)
+{
+	/*
+	 * S1PTW should only ever be set in ESR_EL1 if the pkvm hypervisor
+	 * injected a stage-2 abort -- see host_inject_abort().
+	 */
+	return is_pkvm_initialized() && (esr & ESR_ELx_S1PTW);
+}
+
 static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
 							unsigned long esr,
 							struct pt_regs *regs)
@@ -279,6 +289,9 @@ static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
 	if (!is_el1_data_abort(esr) || !esr_fsc_is_translation_fault(esr))
 		return false;
 
+	if (is_pkvm_stage2_abort(esr))
+		return false;
+
 	local_irq_save(flags);
 	asm volatile("at s1e1r, %0" :: "r" (addr));
 	isb();
@@ -395,6 +408,8 @@ static void __do_kernel_fault(unsigned long addr, unsigned long esr,
 			msg = "read from unreadable memory";
 	} else if (addr < PAGE_SIZE) {
 		msg = "NULL pointer dereference";
+	} else if (is_pkvm_stage2_abort(esr)) {
+		msg = "access to hypervisor-protected memory";
 	} else {
 		if (esr_fsc_is_translation_fault(esr) &&
 		    kfence_handle_page_fault(addr, esr & ESR_ELx_WNR, regs))
@@ -621,6 +636,13 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 					 addr, esr, regs);
 	}
 
+	if (is_pkvm_stage2_abort(esr)) {
+		if (!user_mode(regs))
+			goto no_context;
+		arm64_force_sig_fault(SIGSEGV, SEGV_ACCERR, far, "stage-2 fault");
+		return 0;
+	}
+
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
 
 	if (!(mm_flags & FAULT_FLAG_USER))
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 19/35] KVM: arm64: Avoid pointless annotation when mapping host-owned pages
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (17 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 18/35] KVM: arm64: Inject SIGSEGV on illegal accesses Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 20/35] KVM: arm64: Generalise kvm_pgtable_stage2_set_owner() Will Deacon
                   ` (16 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

When a page is transitioned to host ownership, we can eagerly map it
into the host stage-2 page-table rather than going via the convoluted
step of a faulting annotation to trigger the mapping.

Call host_stage2_idmap_locked() directly when transitioning a page to
be owned by the host.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 28 +++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 41469df46e09..55df0c45b0f2 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -548,23 +548,27 @@ static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_
 
 int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
 {
-	int ret;
+	int ret = -EINVAL;
 
 	if (!range_is_memory(addr, addr + size))
 		return -EPERM;
 
-	ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
-			      addr, size, &host_s2_pool, owner_id);
-	if (ret)
-		return ret;
+	switch (owner_id) {
+	case PKVM_ID_HOST:
+		ret = host_stage2_idmap_locked(addr, size, PKVM_HOST_MEM_PROT);
+		if (!ret)
+			__host_update_page_state(addr, size, PKVM_PAGE_OWNED);
+		break;
+	case PKVM_ID_GUEST:
+	case PKVM_ID_HYP:
+		ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
+				      addr, size, &host_s2_pool, owner_id);
+		if (!ret)
+			__host_update_page_state(addr, size, PKVM_NOPAGE);
+		break;
+	}
 
-	/* Don't forget to update the vmemmap tracking for the host */
-	if (owner_id == PKVM_ID_HOST)
-		__host_update_page_state(addr, size, PKVM_PAGE_OWNED);
-	else
-		__host_update_page_state(addr, size, PKVM_NOPAGE);
-
-	return 0;
+	return ret;
 }
 
 static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 20/35] KVM: arm64: Generalise kvm_pgtable_stage2_set_owner()
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (18 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 19/35] KVM: arm64: Avoid pointless annotation when mapping host-owned pages Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 21/35] KVM: arm64: Introduce host_stage2_set_owner_metadata_locked() Will Deacon
                   ` (15 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

kvm_pgtable_stage2_set_owner() can be generalised into a way to store
up to 59 bits in the page tables alongside a 4-bit 'type' identifier
specific to the format of the 59-bit payload.

Introduce kvm_pgtable_stage2_annotate() and move the existing invalid
ptes (for locked ptes and donated pages) over to the new scheme.
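The annotation layout (4-bit type in bits [63:60], payload in bits
[59:1], bit 0 kept clear so the pte is never valid) can be sketched in
userspace C. The pte_annotate()/pte_annot_type()/pte_annot_payload()
helpers are illustrative, not the series' API.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t kvm_pte_t;

#define KVM_PTE_VALID			(1ULL << 0)
#define KVM_INVALID_PTE_TYPE_MASK	(0xFULL << 60)	/* GENMASK(63, 60) */
#define KVM_INVALID_PTE_ANNOT_MASK	\
	(~(KVM_PTE_VALID | KVM_INVALID_PTE_TYPE_MASK))

/* Pack a 4-bit type and a 59-bit payload into an invalid pte.
 * Bit 0 stays clear, so hardware never treats the result as valid. */
static kvm_pte_t pte_annotate(unsigned int type, kvm_pte_t annotation)
{
	/* The payload must fit in bits [59:1]. */
	assert((annotation & ~KVM_INVALID_PTE_ANNOT_MASK) == 0);
	return ((kvm_pte_t)type << 60) | annotation;
}

static unsigned int pte_annot_type(kvm_pte_t pte)
{
	return (pte & KVM_INVALID_PTE_TYPE_MASK) >> 60;
}

static kvm_pte_t pte_annot_payload(kvm_pte_t pte)
{
	return pte & KVM_INVALID_PTE_ANNOT_MASK;
}
```

Round-tripping a type/payload pair through these helpers shows why 59
bits remain available: 64 bits minus the valid bit and the 4-bit type.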

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_pgtable.h  | 39 +++++++++++++++++++--------
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 16 +++++++++--
 arch/arm64/kvm/hyp/pgtable.c          | 33 ++++++++++++++---------
 3 files changed, 62 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 9ce55442b621..4c41a8ed4a7c 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -98,13 +98,25 @@ typedef u64 kvm_pte_t;
 					 KVM_PTE_LEAF_ATTR_HI_S2_XN)
 
 #define KVM_INVALID_PTE_OWNER_MASK	GENMASK(9, 2)
-#define KVM_MAX_OWNER_ID		3
 
-/*
- * Used to indicate a pte for which a 'break-before-make' sequence is in
- * progress.
- */
-#define KVM_INVALID_PTE_LOCKED		BIT(10)
+/* pKVM invalid pte encodings */
+#define KVM_INVALID_PTE_TYPE_MASK	GENMASK(63, 60)
+#define KVM_INVALID_PTE_ANNOT_MASK	~(KVM_PTE_VALID | \
+					  KVM_INVALID_PTE_TYPE_MASK)
+
+enum kvm_invalid_pte_type {
+	/*
+	 * Used to indicate a pte for which a 'break-before-make'
+	 * sequence is in progress.
+	 */
+	KVM_INVALID_PTE_TYPE_LOCKED	= 1,
+
+	/*
+	 * pKVM has unmapped the page from the host due to a change of
+	 * ownership.
+	 */
+	KVM_HOST_INVALID_PTE_TYPE_DONATION,
+};
 
 static inline bool kvm_pte_valid(kvm_pte_t pte)
 {
@@ -657,14 +669,18 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 			   void *mc, enum kvm_pgtable_walk_flags flags);
 
 /**
- * kvm_pgtable_stage2_set_owner() - Unmap and annotate pages in the IPA space to
- *				    track ownership.
+ * kvm_pgtable_stage2_annotate() - Unmap and annotate pages in the IPA space
+ *				   to track ownership (and more).
  * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
  * @addr:	Base intermediate physical address to annotate.
  * @size:	Size of the annotated range.
  * @mc:		Cache of pre-allocated and zeroed memory from which to allocate
  *		page-table pages.
- * @owner_id:	Unique identifier for the owner of the page.
+ * @type:	The type of the annotation, determining its meaning and format.
+ * @annotation:	A 59-bit value that will be stored in the page tables.
+ *		@annotation[0] and @annotation[63:60] must be 0.
+ * 		@annotation[59:1] is stored in the page tables, along
+ *		with @type.
  *
  * By default, all page-tables are owned by identifier 0. This function can be
  * used to mark portions of the IPA space as owned by other entities. When a
@@ -673,8 +689,9 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
  *
  * Return: 0 on success, negative error code on failure.
  */
-int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
-				 void *mc, u8 owner_id);
+int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
+				void *mc, enum kvm_invalid_pte_type type,
+				kvm_pte_t annotation);
 
 /**
  * kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 55df0c45b0f2..3f8a73461f90 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -546,10 +546,19 @@ static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_
 		set_host_state(page, state);
 }
 
+static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
+{
+	return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
+}
+
 int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
 {
+	kvm_pte_t annotation;
 	int ret = -EINVAL;
 
+	if (!FIELD_FIT(KVM_INVALID_PTE_OWNER_MASK, owner_id))
+		return -EINVAL;
+
 	if (!range_is_memory(addr, addr + size))
 		return -EPERM;
 
@@ -561,8 +570,11 @@ int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
 		break;
 	case PKVM_ID_GUEST:
 	case PKVM_ID_HYP:
-		ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
-				      addr, size, &host_s2_pool, owner_id);
+		annotation = kvm_init_invalid_leaf_owner(owner_id);
+		ret = host_stage2_try(kvm_pgtable_stage2_annotate, &host_mmu.pgt,
+				      addr, size, &host_s2_pool,
+				      KVM_HOST_INVALID_PTE_TYPE_DONATION,
+				      annotation);
 		if (!ret)
 			__host_update_page_state(addr, size, PKVM_NOPAGE);
 		break;
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 9abc0a6cf448..38465f547c8c 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -114,11 +114,6 @@ static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, s8 level)
 	return pte;
 }
 
-static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
-{
-	return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
-}
-
 static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
 				  const struct kvm_pgtable_visit_ctx *ctx,
 				  enum kvm_pgtable_walk_flags visit)
@@ -563,7 +558,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 struct stage2_map_data {
 	const u64			phys;
 	kvm_pte_t			attr;
-	u8				owner_id;
+	kvm_pte_t			pte_annot;
 
 	kvm_pte_t			*anchor;
 	kvm_pte_t			*childp;
@@ -780,7 +775,11 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
 
 static bool stage2_pte_is_locked(kvm_pte_t pte)
 {
-	return !kvm_pte_valid(pte) && (pte & KVM_INVALID_PTE_LOCKED);
+	if (kvm_pte_valid(pte))
+		return false;
+
+	return FIELD_GET(KVM_INVALID_PTE_TYPE_MASK, pte) ==
+	       KVM_INVALID_PTE_TYPE_LOCKED;
 }
 
 static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
@@ -811,6 +810,7 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
 				 struct kvm_s2_mmu *mmu)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
+	kvm_pte_t locked_pte;
 
 	if (stage2_pte_is_locked(ctx->old)) {
 		/*
@@ -821,7 +821,9 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
 		return false;
 	}
 
-	if (!stage2_try_set_pte(ctx, KVM_INVALID_PTE_LOCKED))
+	locked_pte = FIELD_PREP(KVM_INVALID_PTE_TYPE_MASK,
+				KVM_INVALID_PTE_TYPE_LOCKED);
+	if (!stage2_try_set_pte(ctx, locked_pte))
 		return false;
 
 	if (!kvm_pgtable_walk_skip_bbm_tlbi(ctx)) {
@@ -946,7 +948,7 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	if (!data->annotation)
 		new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
 	else
-		new = kvm_init_invalid_leaf_owner(data->owner_id);
+		new = data->pte_annot;
 
 	/*
 	 * Skip updating the PTE if we are trying to recreate the exact
@@ -1100,16 +1102,18 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	return ret;
 }
 
-int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
-				 void *mc, u8 owner_id)
+int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
+				void *mc, enum kvm_invalid_pte_type type,
+				kvm_pte_t pte_annot)
 {
 	int ret;
 	struct stage2_map_data map_data = {
 		.mmu		= pgt->mmu,
 		.memcache	= mc,
-		.owner_id	= owner_id,
 		.force_pte	= true,
 		.annotation	= true,
+		.pte_annot	= pte_annot |
+				  FIELD_PREP(KVM_INVALID_PTE_TYPE_MASK, type),
 	};
 	struct kvm_pgtable_walker walker = {
 		.cb		= stage2_map_walker,
@@ -1118,7 +1122,10 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		.arg		= &map_data,
 	};
 
-	if (owner_id > KVM_MAX_OWNER_ID)
+	if (pte_annot & ~KVM_INVALID_PTE_ANNOT_MASK)
+		return -EINVAL;
+
+	if (!type || type == KVM_INVALID_PTE_TYPE_LOCKED)
 		return -EINVAL;
 
 	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
-- 
2.52.0.457.g6b5491de43-goog



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 21/35] KVM: arm64: Introduce host_stage2_set_owner_metadata_locked()
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (19 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 20/35] KVM: arm64: Generalise kvm_pgtable_stage2_set_owner() Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 22/35] KVM: arm64: Change 'pkvm_handle_t' to u16 Will Deacon
                   ` (14 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Rework host_stage2_set_owner_locked() to add a new helper function,
host_stage2_set_owner_metadata_locked(), which will allow us to store
additional metadata alongside a 3-bit owner ID for invalid host stage-2
entries.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_pgtable.h  |  2 --
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 47 ++++++++++++++++++---------
 2 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 4c41a8ed4a7c..eb2a6258d83d 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -97,8 +97,6 @@ typedef u64 kvm_pte_t;
 					 KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
 					 KVM_PTE_LEAF_ATTR_HI_S2_XN)
 
-#define KVM_INVALID_PTE_OWNER_MASK	GENMASK(9, 2)
-
 /* pKVM invalid pte encodings */
 #define KVM_INVALID_PTE_TYPE_MASK	GENMASK(63, 60)
 #define KVM_INVALID_PTE_ANNOT_MASK	~(KVM_PTE_VALID | \
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 3f8a73461f90..e090252d38a8 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -546,37 +546,54 @@ static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_
 		set_host_state(page, state);
 }
 
-static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
-{
-	return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
-}
-
-int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
+#define KVM_HOST_DONATION_PTE_OWNER_MASK	GENMASK(3, 1)
+#define KVM_HOST_DONATION_PTE_EXTRA_MASK	GENMASK(59, 4)
+static int host_stage2_set_owner_metadata_locked(phys_addr_t addr, u64 size,
+						 u8 owner_id, u64 meta)
 {
 	kvm_pte_t annotation;
-	int ret = -EINVAL;
+	int ret;
 
-	if (!FIELD_FIT(KVM_INVALID_PTE_OWNER_MASK, owner_id))
+	if (owner_id == PKVM_ID_HOST)
 		return -EINVAL;
 
 	if (!range_is_memory(addr, addr + size))
 		return -EPERM;
 
+	if (!FIELD_FIT(KVM_HOST_DONATION_PTE_OWNER_MASK, owner_id))
+		return -EINVAL;
+
+	if (!FIELD_FIT(KVM_HOST_DONATION_PTE_EXTRA_MASK, meta))
+		return -EINVAL;
+
+	annotation = FIELD_PREP(KVM_HOST_DONATION_PTE_OWNER_MASK, owner_id) |
+		     FIELD_PREP(KVM_HOST_DONATION_PTE_EXTRA_MASK, meta);
+	ret = host_stage2_try(kvm_pgtable_stage2_annotate, &host_mmu.pgt,
+			      addr, size, &host_s2_pool,
+			      KVM_HOST_INVALID_PTE_TYPE_DONATION, annotation);
+	if (!ret)
+		__host_update_page_state(addr, size, PKVM_NOPAGE);
+
+	return ret;
+}
+
+int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
+{
+	int ret = -EINVAL;
+
 	switch (owner_id) {
 	case PKVM_ID_HOST:
+		if (!range_is_memory(addr, addr + size))
+			return -EPERM;
+
 		ret = host_stage2_idmap_locked(addr, size, PKVM_HOST_MEM_PROT);
 		if (!ret)
 			__host_update_page_state(addr, size, PKVM_PAGE_OWNED);
 		break;
 	case PKVM_ID_GUEST:
 	case PKVM_ID_HYP:
-		annotation = kvm_init_invalid_leaf_owner(owner_id);
-		ret = host_stage2_try(kvm_pgtable_stage2_annotate, &host_mmu.pgt,
-				      addr, size, &host_s2_pool,
-				      KVM_HOST_INVALID_PTE_TYPE_DONATION,
-				      annotation);
-		if (!ret)
-			__host_update_page_state(addr, size, PKVM_NOPAGE);
+		ret = host_stage2_set_owner_metadata_locked(addr, size,
+							    owner_id, 0);
 		break;
 	}
 
-- 
2.52.0.457.g6b5491de43-goog



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 22/35] KVM: arm64: Change 'pkvm_handle_t' to u16
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (20 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 21/35] KVM: arm64: Introduce host_stage2_set_owner_metadata_locked() Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-28 10:28   ` Fuad Tabba
  2026-01-19 12:46 ` [PATCH v2 23/35] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2 Will Deacon
                   ` (13 subsequent siblings)
  35 siblings, 1 reply; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

'pkvm_handle_t' doesn't need to be a 32-bit type and subsequent patches
will rely on it being no more than 16 bits so that it can be encoded
into a pte annotation.

Change 'pkvm_handle_t' to a u16 and add a compile-time check that the
maximum handle fits into the reduced type.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h | 2 +-
 arch/arm64/kvm/hyp/nvhe/pkvm.c    | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3191d10a2622..60a5c87b0a17 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -247,7 +247,7 @@ struct kvm_smccc_features {
 	unsigned long vendor_hyp_bmap_2; /* Function numbers 64-127 */
 };
 
-typedef unsigned int pkvm_handle_t;
+typedef u16 pkvm_handle_t;
 
 struct kvm_protected_vm {
 	pkvm_handle_t handle;
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 9f0997150cf5..c5772417372d 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -223,6 +223,7 @@ static struct pkvm_hyp_vm **vm_table;
 
 void pkvm_hyp_vm_table_init(void *tbl)
 {
+	BUILD_BUG_ON((u64)HANDLE_OFFSET + KVM_MAX_PVMS > (pkvm_handle_t)-1);
 	WARN_ON(vm_table);
 	vm_table = tbl;
 }
-- 
2.52.0.457.g6b5491de43-goog



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 23/35] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (21 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 22/35] KVM: arm64: Change 'pkvm_handle_t' to u16 Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-28 10:29   ` Fuad Tabba
  2026-01-19 12:46 ` [PATCH v2 24/35] KVM: arm64: Introduce hypercall to force reclaim of a protected page Will Deacon
                   ` (12 subsequent siblings)
  35 siblings, 1 reply; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Handling host kernel faults arising from accesses to donated guest
memory will require an rmap-like mechanism to identify the guest mapping
of the faulting page.

Extend the page donation logic to encode the guest handle and gfn
alongside the owner information in the host stage-2 pte.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index e090252d38a8..f4638fe9d77a 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -590,7 +590,6 @@ int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
 		if (!ret)
 			__host_update_page_state(addr, size, PKVM_PAGE_OWNED);
 		break;
-	case PKVM_ID_GUEST:
 	case PKVM_ID_HYP:
 		ret = host_stage2_set_owner_metadata_locked(addr, size,
 							    owner_id, 0);
@@ -600,6 +599,20 @@ int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
 	return ret;
 }
 
+#define KVM_HOST_PTE_OWNER_GUEST_HANDLE_MASK	GENMASK(15, 0)
+/* We need 40 bits for the GFN to cover a 52-bit IPA with 4k pages and LPA2 */
+#define KVM_HOST_PTE_OWNER_GUEST_GFN_MASK	GENMASK(55, 16)
+static u64 host_stage2_encode_gfn_meta(struct pkvm_hyp_vm *vm, u64 gfn)
+{
+	pkvm_handle_t handle = vm->kvm.arch.pkvm.handle;
+
+	BUILD_BUG_ON((pkvm_handle_t)-1 > KVM_HOST_PTE_OWNER_GUEST_HANDLE_MASK);
+	WARN_ON(!FIELD_FIT(KVM_HOST_PTE_OWNER_GUEST_GFN_MASK, gfn));
+
+	return FIELD_PREP(KVM_HOST_PTE_OWNER_GUEST_HANDLE_MASK, handle) |
+	       FIELD_PREP(KVM_HOST_PTE_OWNER_GUEST_GFN_MASK, gfn);
+}
+
 static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
 {
 	/*
@@ -1133,6 +1146,7 @@ int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
 	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
 	u64 phys = hyp_pfn_to_phys(pfn);
 	u64 ipa = hyp_pfn_to_phys(gfn);
+	u64 meta;
 	int ret;
 
 	host_lock_component();
@@ -1146,7 +1160,9 @@ int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
 	if (ret)
 		goto unlock;
 
-	WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_GUEST));
+	meta = host_stage2_encode_gfn_meta(vm, gfn);
+	WARN_ON(host_stage2_set_owner_metadata_locked(phys, PAGE_SIZE,
+						      PKVM_ID_GUEST, meta));
 	WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
 				       pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
 				       &vcpu->vcpu.arch.pkvm_memcache, 0));
-- 
2.52.0.457.g6b5491de43-goog



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 24/35] KVM: arm64: Introduce hypercall to force reclaim of a protected page
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (22 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 23/35] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2 Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-02-12 17:18   ` Alexandru Elisei
  2026-01-19 12:46 ` [PATCH v2 25/35] KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler Will Deacon
                   ` (11 subsequent siblings)
  35 siblings, 1 reply; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Introduce a new hypercall, __pkvm_force_reclaim_guest_page(), to allow
the host to forcefully reclaim a physical page that was previously donated
to a protected guest. This results in the page being zeroed and the
previous guest mapping being poisoned so that new pages cannot be
subsequently donated at the same IPA.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h              |   1 +
 arch/arm64/include/asm/kvm_pgtable.h          |   6 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   1 +
 arch/arm64/kvm/hyp/include/nvhe/memory.h      |   6 +
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |   1 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |   8 ++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 127 +++++++++++++++++-
 arch/arm64/kvm/hyp/nvhe/pkvm.c                |   4 +-
 8 files changed, 152 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 2e7e8e7771f6..39e4e588ca4f 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -90,6 +90,7 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
+	__KVM_HOST_SMCCC_FUNC___pkvm_force_reclaim_guest_page,
 	__KVM_HOST_SMCCC_FUNC___pkvm_reclaim_dying_guest_page,
 	__KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index eb2a6258d83d..4c069f875a85 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -114,6 +114,12 @@ enum kvm_invalid_pte_type {
 	 * ownership.
 	 */
 	KVM_HOST_INVALID_PTE_TYPE_DONATION,
+
+	/*
+	 * The page has been forcefully reclaimed from the guest by the
+	 * host.
+	 */
+	KVM_GUEST_INVALID_PTE_TYPE_POISONED,
 };
 
 static inline bool kvm_pte_valid(kvm_pte_t pte)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index cde38a556049..f27b037abaf3 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -41,6 +41,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
 int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
+int __pkvm_host_force_reclaim_page_guest(phys_addr_t phys);
 int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm);
 int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
 			    enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index dee1a406b0c2..4cedb720c75d 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -30,6 +30,12 @@ enum pkvm_page_state {
 	 * struct hyp_page.
 	 */
 	PKVM_NOPAGE			= BIT(0) | BIT(1),
+
+	/*
+	 * 'Meta-states' which aren't encoded directly in the PTE's SW bits (or
+	 * the hyp_vmemmap entry for the host)
+	 */
+	PKVM_POISON			= BIT(2),
 };
 #define PKVM_PAGE_STATE_MASK		(BIT(0) | BIT(1))
 
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 506831804f64..a5a7bb453f3e 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -78,6 +78,7 @@ int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 gfn);
 int __pkvm_start_teardown_vm(pkvm_handle_t handle);
 int __pkvm_finalize_teardown_vm(pkvm_handle_t handle);
 
+struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle);
 struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
 					 unsigned int vcpu_idx);
 void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index f43c50ae2d81..e68b5d24bdad 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -570,6 +570,13 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
 }
 
+static void handle___pkvm_force_reclaim_guest_page(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_host_force_reclaim_page_guest(phys);
+}
+
 static void handle___pkvm_reclaim_dying_guest_page(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
@@ -631,6 +638,7 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_unreserve_vm),
 	HANDLE_FUNC(__pkvm_init_vm),
 	HANDLE_FUNC(__pkvm_init_vcpu),
+	HANDLE_FUNC(__pkvm_force_reclaim_guest_page),
 	HANDLE_FUNC(__pkvm_reclaim_dying_guest_page),
 	HANDLE_FUNC(__pkvm_start_teardown_vm),
 	HANDLE_FUNC(__pkvm_finalize_teardown_vm),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index f4638fe9d77a..49b309b8d7d2 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -613,6 +613,35 @@ static u64 host_stage2_encode_gfn_meta(struct pkvm_hyp_vm *vm, u64 gfn)
 	       FIELD_PREP(KVM_HOST_PTE_OWNER_GUEST_GFN_MASK, gfn);
 }
 
+static int host_stage2_decode_gfn_meta(kvm_pte_t pte, struct pkvm_hyp_vm **vm,
+				       u64 *gfn)
+{
+	pkvm_handle_t handle;
+	u64 meta;
+
+	if (WARN_ON(kvm_pte_valid(pte)))
+		return -EINVAL;
+
+	if (FIELD_GET(KVM_INVALID_PTE_TYPE_MASK, pte) !=
+	    KVM_HOST_INVALID_PTE_TYPE_DONATION) {
+		return -EINVAL;
+	}
+
+	if (FIELD_GET(KVM_HOST_DONATION_PTE_OWNER_MASK, pte) != PKVM_ID_GUEST)
+		return -EPERM;
+
+	meta = FIELD_GET(KVM_HOST_DONATION_PTE_EXTRA_MASK, pte);
+	handle = FIELD_GET(KVM_HOST_PTE_OWNER_GUEST_HANDLE_MASK, meta);
+	*vm = get_vm_by_handle(handle);
+	if (!*vm) {
+		/* We probably raced with teardown; try again */
+		return -EAGAIN;
+	}
+
+	*gfn = FIELD_GET(KVM_HOST_PTE_OWNER_GUEST_GFN_MASK, meta);
+	return 0;
+}
+
 static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
 {
 	/*
@@ -809,8 +838,20 @@ static int __hyp_check_page_state_range(phys_addr_t phys, u64 size, enum pkvm_pa
 	return 0;
 }
 
+static bool guest_pte_is_poisoned(kvm_pte_t pte)
+{
+	if (kvm_pte_valid(pte))
+		return false;
+
+	return FIELD_GET(KVM_INVALID_PTE_TYPE_MASK, pte) ==
+	       KVM_GUEST_INVALID_PTE_TYPE_POISONED;
+}
+
 static enum pkvm_page_state guest_get_page_state(kvm_pte_t pte, u64 addr)
 {
+	if (guest_pte_is_poisoned(pte))
+		return PKVM_POISON;
+
 	if (!kvm_pte_valid(pte))
 		return PKVM_NOPAGE;
 
@@ -839,6 +880,8 @@ static int get_valid_guest_pte(struct pkvm_hyp_vm *vm, u64 ipa, kvm_pte_t *ptep,
 	ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
 	if (ret)
 		return ret;
+	if (guest_pte_is_poisoned(pte))
+		return -EHWPOISON;
 	if (!kvm_pte_valid(pte))
 		return -ENOENT;
 	if (level != KVM_PGTABLE_LAST_LEVEL)
@@ -1104,6 +1147,84 @@ static void hyp_poison_page(phys_addr_t phys)
 	hyp_fixmap_unmap();
 }
 
+static int host_stage2_get_guest_info(phys_addr_t phys, struct pkvm_hyp_vm **vm,
+				      u64 *gfn)
+{
+	enum pkvm_page_state state;
+	kvm_pte_t pte;
+	s8 level;
+	int ret;
+
+	if (!addr_is_memory(phys))
+		return -EFAULT;
+
+	state = get_host_state(hyp_phys_to_page(phys));
+	switch (state) {
+	case PKVM_PAGE_OWNED:
+	case PKVM_PAGE_SHARED_OWNED:
+	case PKVM_PAGE_SHARED_BORROWED:
+		/* The access should no longer fault; try again. */
+		return -EAGAIN;
+	case PKVM_NOPAGE:
+		break;
+	default:
+		return -EPERM;
+	}
+
+	ret = kvm_pgtable_get_leaf(&host_mmu.pgt, phys, &pte, &level);
+	if (ret)
+		return ret;
+
+	if (WARN_ON(level != KVM_PGTABLE_LAST_LEVEL))
+		return -EINVAL;
+
+	return host_stage2_decode_gfn_meta(pte, vm, gfn);
+}
+
+int __pkvm_host_force_reclaim_page_guest(phys_addr_t phys)
+{
+	struct pkvm_hyp_vm *vm;
+	u64 gfn, ipa, pa;
+	kvm_pte_t pte;
+	int ret;
+
+	hyp_spin_lock(&vm_table_lock);
+	host_lock_component();
+
+	ret = host_stage2_get_guest_info(phys, &vm, &gfn);
+	if (ret)
+		goto unlock_host;
+
+	ipa = hyp_pfn_to_phys(gfn);
+	guest_lock_component(vm);
+	ret = get_valid_guest_pte(vm, ipa, &pte, &pa);
+	if (ret)
+		goto unlock_guest;
+
+	WARN_ON(pa != phys);
+	if (guest_get_page_state(pte, ipa) != PKVM_PAGE_OWNED) {
+		ret = -EPERM;
+		goto unlock_guest;
+	}
+
+	/* We really shouldn't be allocating, so don't pass a memcache */
+	ret = kvm_pgtable_stage2_annotate(&vm->pgt, ipa, PAGE_SIZE, NULL,
+					  KVM_GUEST_INVALID_PTE_TYPE_POISONED,
+					  0);
+	if (ret)
+		goto unlock_guest;
+
+	hyp_poison_page(phys);
+	WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HOST));
+unlock_guest:
+	guest_unlock_component(vm);
+unlock_host:
+	host_unlock_component();
+	hyp_spin_unlock(&vm_table_lock);
+
+	return ret;
+}
+
 int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
 {
 	u64 ipa = hyp_pfn_to_phys(gfn);
@@ -1138,7 +1259,11 @@ int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
 	guest_unlock_component(vm);
 	host_unlock_component();
 
-	return ret;
+	/*
+	 * -EHWPOISON implies that the page was forcefully reclaimed already
+	 * so return success for the GUP pin to be dropped.
+	 */
+	return ret && ret != -EHWPOISON ? ret : 0;
 }
 
 int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index c5772417372d..2836c68c1ea5 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -231,10 +231,12 @@ void pkvm_hyp_vm_table_init(void *tbl)
 /*
  * Return the hyp vm structure corresponding to the handle.
  */
-static struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle)
+struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle)
 {
 	unsigned int idx = vm_handle_to_idx(handle);
 
+	hyp_assert_lock_held(&vm_table_lock);
+
 	if (unlikely(idx >= KVM_MAX_PVMS))
 		return NULL;
 
-- 
2.52.0.457.g6b5491de43-goog



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 25/35] KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (23 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 24/35] KVM: arm64: Introduce hypercall to force reclaim of a protected page Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-02-12 17:22   ` Alexandru Elisei
  2026-01-19 12:46 ` [PATCH v2 26/35] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte Will Deacon
                   ` (10 subsequent siblings)
  35 siblings, 1 reply; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Host kernel accesses to pages that are inaccessible at stage-2 result in
the injection of a translation fault, which is fatal unless an exception
table fixup is registered for the faulting PC (e.g. for user access
routines). This is undesirable, since a get_user_pages() call could be
used to obtain a reference to a donated page and then a subsequent
access via a kernel mapping would lead to a panic().

Rework the spurious fault handler so that stage-2 faults injected back
into the host result in the target page being forcefully reclaimed when
no exception table fixup handler is registered.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/virt.h |  6 ++++++
 arch/arm64/kvm/pkvm.c         |  7 +++++++
 arch/arm64/mm/fault.c         | 15 +++++++++------
 3 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index b51ab6840f9c..e80addc923a4 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -94,6 +94,12 @@ static inline bool is_pkvm_initialized(void)
 	       static_branch_likely(&kvm_protected_mode_initialized);
 }
 
+#ifdef CONFIG_KVM
+bool pkvm_reclaim_guest_page(phys_addr_t phys);
+#else
+static inline bool pkvm_reclaim_guest_page(phys_addr_t phys) { return false; }
+#endif
+
 /* Reports the availability of HYP mode */
 static inline bool is_hyp_mode_available(void)
 {
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 8be91051699e..d1926cb08c76 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -563,3 +563,10 @@ int pkvm_pgtable_stage2_split(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	WARN_ON_ONCE(1);
 	return -EINVAL;
 }
+
+bool pkvm_reclaim_guest_page(phys_addr_t phys)
+{
+	int ret = kvm_call_hyp_nvhe(__pkvm_force_reclaim_guest_page, phys);
+
+	return !ret || ret == -EAGAIN;
+}
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 2294f2061866..5d62abee5262 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -289,9 +289,6 @@ static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
 	if (!is_el1_data_abort(esr) || !esr_fsc_is_translation_fault(esr))
 		return false;
 
-	if (is_pkvm_stage2_abort(esr))
-		return false;
-
 	local_irq_save(flags);
 	asm volatile("at s1e1r, %0" :: "r" (addr));
 	isb();
@@ -302,8 +299,12 @@ static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
 	 * If we now have a valid translation, treat the translation fault as
 	 * spurious.
 	 */
-	if (!(par & SYS_PAR_EL1_F))
+	if (!(par & SYS_PAR_EL1_F)) {
+		if (is_pkvm_stage2_abort(esr))
+			return pkvm_reclaim_guest_page(par & SYS_PAR_EL1_PA);
+
 		return true;
+	}
 
 	/*
 	 * If we got a different type of fault from the AT instruction,
@@ -389,9 +390,11 @@ static void __do_kernel_fault(unsigned long addr, unsigned long esr,
 	if (!is_el1_instruction_abort(esr) && fixup_exception(regs, esr))
 		return;
 
-	if (WARN_RATELIMIT(is_spurious_el1_translation_fault(addr, esr, regs),
-	    "Ignoring spurious kernel translation fault at virtual address %016lx\n", addr))
+	if (is_spurious_el1_translation_fault(addr, esr, regs)) {
+		WARN_RATELIMIT(!is_pkvm_stage2_abort(esr),
+			"Ignoring spurious kernel translation fault at virtual address %016lx\n", addr);
 		return;
+	}
 
 	if (is_el1_mte_sync_tag_check_fault(esr)) {
 		do_tag_recovery(addr, esr, regs);
-- 
2.52.0.457.g6b5491de43-goog



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 26/35] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (24 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 25/35] KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 27/35] KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs Will Deacon
                   ` (9 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

If a protected vCPU faults on an IPA which appears to be mapped, query
the hypervisor to determine whether or not the faulting pte has been
poisoned by a forceful reclaim. If the pte has been poisoned, return
-EFAULT back to userspace rather than retrying the instruction forever.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h              |  1 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 10 +++++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 43 +++++++++++++++++++
 arch/arm64/kvm/pkvm.c                         |  9 ++--
 5 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 39e4e588ca4f..c4246c34509a 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -90,6 +90,7 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
 	__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
+	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_in_poison_fault,
 	__KVM_HOST_SMCCC_FUNC___pkvm_force_reclaim_guest_page,
 	__KVM_HOST_SMCCC_FUNC___pkvm_reclaim_dying_guest_page,
 	__KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index f27b037abaf3..5e6cdafcdd69 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -41,6 +41,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
 int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
+int __pkvm_vcpu_in_poison_fault(struct pkvm_hyp_vcpu *hyp_vcpu);
 int __pkvm_host_force_reclaim_page_guest(phys_addr_t phys);
 int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm);
 int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index e68b5d24bdad..4ecde9662111 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -570,6 +570,15 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
 }
 
+static void handle___pkvm_vcpu_in_poison_fault(struct kvm_cpu_context *host_ctxt)
+{
+	int ret;
+	struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+
+	ret = hyp_vcpu ? __pkvm_vcpu_in_poison_fault(hyp_vcpu) : -EINVAL;
+	cpu_reg(host_ctxt, 1) = ret;
+}
+
 static void handle___pkvm_force_reclaim_guest_page(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
@@ -638,6 +647,7 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_unreserve_vm),
 	HANDLE_FUNC(__pkvm_init_vm),
 	HANDLE_FUNC(__pkvm_init_vcpu),
+	HANDLE_FUNC(__pkvm_vcpu_in_poison_fault),
 	HANDLE_FUNC(__pkvm_force_reclaim_guest_page),
 	HANDLE_FUNC(__pkvm_reclaim_dying_guest_page),
 	HANDLE_FUNC(__pkvm_start_teardown_vm),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 49b309b8d7d2..914b373cfb56 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -898,6 +898,49 @@ static int get_valid_guest_pte(struct pkvm_hyp_vm *vm, u64 ipa, kvm_pte_t *ptep,
 	return 0;
 }
 
+int __pkvm_vcpu_in_poison_fault(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
+	kvm_pte_t pte;
+	s8 level;
+	u64 ipa;
+	int ret;
+
+	switch (kvm_vcpu_trap_get_class(&hyp_vcpu->vcpu)) {
+	case ESR_ELx_EC_DABT_LOW:
+	case ESR_ELx_EC_IABT_LOW:
+		if (kvm_vcpu_trap_is_translation_fault(&hyp_vcpu->vcpu))
+			break;
+		fallthrough;
+	default:
+		return -EINVAL;
+	}
+
+	/*
+	 * The host has the faulting IPA when it calls us from the guest
+	 * fault handler but we retrieve it ourselves from the FAR so as
+	 * to avoid exposing an "oracle" that could reveal data access
+	 * patterns of the guest after initial donation of its pages.
+	 */
+	ipa = kvm_vcpu_get_fault_ipa(&hyp_vcpu->vcpu);
+	ipa |= kvm_vcpu_get_hfar(&hyp_vcpu->vcpu) & GENMASK(11, 0);
+
+	guest_lock_component(vm);
+	ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
+	if (ret)
+		goto unlock;
+
+	if (level != KVM_PGTABLE_LAST_LEVEL) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	ret = guest_pte_is_poisoned(pte);
+unlock:
+	guest_unlock_component(vm);
+	return ret;
+}
+
 int __pkvm_host_share_hyp(u64 pfn)
 {
 	u64 phys = hyp_pfn_to_phys(pfn);
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index d1926cb08c76..14865907610c 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -417,10 +417,13 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 			return -EINVAL;
 
 		/*
-		 * We raced with another vCPU.
+		 * We either raced with another vCPU or the guest PTE
+		 * has been poisoned by an erroneous host access.
 		 */
-		if (mapping)
-			return -EAGAIN;
+		if (mapping) {
+			ret = kvm_call_hyp_nvhe(__pkvm_vcpu_in_poison_fault);
+			return ret ? -EFAULT : -EAGAIN;
+		}
 
 		ret = kvm_call_hyp_nvhe(__pkvm_host_donate_guest, pfn, gfn);
 	} else {
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 27/35] KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (25 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 26/35] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 28/35] KVM: arm64: Implement the MEM_SHARE hypercall for " Will Deacon
                   ` (8 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Add a hypercall handler at EL2 for hypercalls originating from protected
VMs. For now, this implements only the FEATURES and MEMINFO calls, but
subsequent patches will implement the SHARE and UNSHARE functions
necessary for virtio.

Unhandled hypercalls (including PSCI) are passed back to the host.

Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |  1 +
 arch/arm64/kvm/hyp/nvhe/pkvm.c         | 37 ++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/switch.c       |  1 +
 3 files changed, 39 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index a5a7bb453f3e..c904647d2f76 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -88,6 +88,7 @@ struct pkvm_hyp_vm *get_pkvm_hyp_vm(pkvm_handle_t handle);
 struct pkvm_hyp_vm *get_np_pkvm_hyp_vm(pkvm_handle_t handle);
 void put_pkvm_hyp_vm(struct pkvm_hyp_vm *hyp_vm);
 
+bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code);
 bool kvm_handle_pvm_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code);
 bool kvm_handle_pvm_restricted(struct kvm_vcpu *vcpu, u64 *exit_code);
 void kvm_init_pvm_id_regs(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 2836c68c1ea5..64171e04ea82 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -4,6 +4,8 @@
  * Author: Fuad Tabba <tabba@google.com>
  */
 
+#include <kvm/arm_hypercalls.h>
+
 #include <linux/kvm_host.h>
 #include <linux/mm.h>
 
@@ -935,3 +937,38 @@ int __pkvm_finalize_teardown_vm(pkvm_handle_t handle)
 	hyp_spin_unlock(&vm_table_lock);
 	return err;
 }
+/*
+ * Handler for protected VM HVC calls.
+ *
+ * Returns true if the hypervisor has handled the exit (and control
+ * should return to the guest) or false if it hasn't (and the handling
+ * should be performed by the host).
+ */
+bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	u64 val[4] = { SMCCC_RET_INVALID_PARAMETER };
+	bool handled = true;
+
+	switch (smccc_get_function(vcpu)) {
+	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
+		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO);
+		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID:
+		if (smccc_get_arg1(vcpu) ||
+		    smccc_get_arg2(vcpu) ||
+		    smccc_get_arg3(vcpu)) {
+			break;
+		}
+
+		val[0] = PAGE_SIZE;
+		break;
+	default:
+		/* Punt everything else back to the host, for now. */
+		handled = false;
+	}
+
+	if (handled)
+		smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
+	return handled;
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index d3b9ec8a7c28..b62e25e8bb7e 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -190,6 +190,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
 
 static const exit_handler_fn pvm_exit_handlers[] = {
 	[0 ... ESR_ELx_EC_MAX]		= NULL,
+	[ESR_ELx_EC_HVC64]		= kvm_handle_pvm_hvc64,
 	[ESR_ELx_EC_SYS64]		= kvm_handle_pvm_sys64,
 	[ESR_ELx_EC_SVE]		= kvm_handle_pvm_restricted,
 	[ESR_ELx_EC_FP_ASIMD]		= kvm_hyp_handle_fpsimd,
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 28/35] KVM: arm64: Implement the MEM_SHARE hypercall for protected VMs
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (26 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 27/35] KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 29/35] KVM: arm64: Implement the MEM_UNSHARE " Will Deacon
                   ` (7 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Implement the ARM_SMCCC_KVM_FUNC_MEM_SHARE hypercall to allow protected
VMs to share memory (e.g. the swiotlb bounce buffers) back to the host.

Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 32 ++++++++++
 arch/arm64/kvm/hyp/nvhe/pkvm.c                | 61 +++++++++++++++++++
 3 files changed, 94 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 5e6cdafcdd69..42fd60c5cfc9 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -35,6 +35,7 @@ extern unsigned long hyp_nr_cpus;
 
 int __pkvm_prot_finalize(void);
 int __pkvm_host_share_hyp(u64 pfn);
+int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn);
 int __pkvm_host_unshare_hyp(u64 pfn);
 int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
 int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 914b373cfb56..7535a45565f4 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -967,6 +967,38 @@ int __pkvm_host_share_hyp(u64 pfn)
 	return ret;
 }
 
+int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn)
+{
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	u64 phys, ipa = hyp_pfn_to_phys(gfn);
+	kvm_pte_t pte;
+	int ret;
+
+	host_lock_component();
+	guest_lock_component(vm);
+
+	ret = get_valid_guest_pte(vm, ipa, &pte, &phys);
+	if (ret)
+		goto unlock;
+
+	ret = -EPERM;
+	if (pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte)) != PKVM_PAGE_OWNED)
+		goto unlock;
+	if (__host_check_page_state_range(phys, PAGE_SIZE, PKVM_NOPAGE))
+		goto unlock;
+
+	ret = 0;
+	WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
+				       pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_OWNED),
+				       &vcpu->vcpu.arch.pkvm_memcache, 0));
+	WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_BORROWED));
+unlock:
+	guest_unlock_component(vm);
+	host_unlock_component();
+
+	return ret;
+}
+
 int __pkvm_host_unshare_hyp(u64 pfn)
 {
 	u64 phys = hyp_pfn_to_phys(pfn);
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 64171e04ea82..4133556560d9 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -937,6 +937,58 @@ int __pkvm_finalize_teardown_vm(pkvm_handle_t handle)
 	hyp_spin_unlock(&vm_table_lock);
 	return err;
 }
+
+static u64 __pkvm_memshare_page_req(struct kvm_vcpu *vcpu, u64 ipa)
+{
+	u64 elr;
+
+	/* Fake up a data abort (level 3 translation fault on write) */
+	vcpu->arch.fault.esr_el2 = (ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT) |
+				   ESR_ELx_WNR | ESR_ELx_FSC_FAULT |
+				   FIELD_PREP(ESR_ELx_FSC_LEVEL, 3);
+
+	/* Shuffle the IPA around into the HPFAR */
+	vcpu->arch.fault.hpfar_el2 = (HPFAR_EL2_NS | (ipa >> 8)) & HPFAR_MASK;
+
+	/* This is a virtual address. 0's good. Let's go with 0. */
+	vcpu->arch.fault.far_el2 = 0;
+
+	/* Rewind the ELR so we return to the HVC once the IPA is mapped */
+	elr = read_sysreg(elr_el2);
+	elr -= 4;
+	write_sysreg(elr, elr_el2);
+
+	return ARM_EXCEPTION_TRAP;
+}
+
+static bool pkvm_memshare_call(u64 *ret, struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	u64 ipa = smccc_get_arg1(vcpu);
+
+	if (!PAGE_ALIGNED(ipa))
+		goto out_guest;
+
+	hyp_vcpu = container_of(vcpu, struct pkvm_hyp_vcpu, vcpu);
+	switch (__pkvm_guest_share_host(hyp_vcpu, hyp_phys_to_pfn(ipa))) {
+	case 0:
+		ret[0] = SMCCC_RET_SUCCESS;
+		goto out_guest;
+	case -ENOENT:
+		/*
+		 * Convert the exception into a data abort so that the page
+		 * being shared is mapped into the guest next time.
+		 */
+		*exit_code = __pkvm_memshare_page_req(vcpu, ipa);
+		goto out_host;
+	}
+
+out_guest:
+	return true;
+out_host:
+	return false;
+}
+
 /*
  * Handler for protected VM HVC calls.
  *
@@ -953,6 +1005,7 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
 	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
 		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
 		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO);
+		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_SHARE);
 		break;
 	case ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID:
 		if (smccc_get_arg1(vcpu) ||
@@ -963,6 +1016,14 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
 
 		val[0] = PAGE_SIZE;
 		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID:
+		if (smccc_get_arg2(vcpu) ||
+		    smccc_get_arg3(vcpu)) {
+			break;
+		}
+
+		handled = pkvm_memshare_call(val, vcpu, exit_code);
+		break;
 	default:
 		/* Punt everything else back to the host, for now. */
 		handled = false;
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 29/35] KVM: arm64: Implement the MEM_UNSHARE hypercall for protected VMs
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (27 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 28/35] KVM: arm64: Implement the MEM_SHARE hypercall for " Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 30/35] KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled Will Deacon
                   ` (6 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Implement the ARM_SMCCC_KVM_FUNC_MEM_UNSHARE hypercall to allow
protected VMs to unshare memory that was previously shared with the host
using the ARM_SMCCC_KVM_FUNC_MEM_SHARE hypercall.

Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 34 +++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/pkvm.c                | 22 ++++++++++++
 3 files changed, 57 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 42fd60c5cfc9..e41a128b0854 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -36,6 +36,7 @@ extern unsigned long hyp_nr_cpus;
 int __pkvm_prot_finalize(void);
 int __pkvm_host_share_hyp(u64 pfn);
 int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn);
+int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn);
 int __pkvm_host_unshare_hyp(u64 pfn);
 int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
 int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 7535a45565f4..3969a92dc3e2 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -999,6 +999,40 @@ int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn)
 	return ret;
 }
 
+int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn)
+{
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+	u64 meta, phys, ipa = hyp_pfn_to_phys(gfn);
+	kvm_pte_t pte;
+	int ret;
+
+	host_lock_component();
+	guest_lock_component(vm);
+
+	ret = get_valid_guest_pte(vm, ipa, &pte, &phys);
+	if (ret)
+		goto unlock;
+
+	ret = -EPERM;
+	if (pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte)) != PKVM_PAGE_SHARED_OWNED)
+		goto unlock;
+	if (__host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_BORROWED))
+		goto unlock;
+
+	ret = 0;
+	meta = host_stage2_encode_gfn_meta(vm, gfn);
+	WARN_ON(host_stage2_set_owner_metadata_locked(phys, PAGE_SIZE,
+						      PKVM_ID_GUEST, meta));
+	WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
+				       pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
+				       &vcpu->vcpu.arch.pkvm_memcache, 0));
+unlock:
+	guest_unlock_component(vm);
+	host_unlock_component();
+
+	return ret;
+}
+
 int __pkvm_host_unshare_hyp(u64 pfn)
 {
 	u64 phys = hyp_pfn_to_phys(pfn);
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 4133556560d9..4ddb3d70f9c0 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -989,6 +989,19 @@ static bool pkvm_memshare_call(u64 *ret, struct kvm_vcpu *vcpu, u64 *exit_code)
 	return false;
 }
 
+static void pkvm_memunshare_call(u64 *ret, struct kvm_vcpu *vcpu)
+{
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	u64 ipa = smccc_get_arg1(vcpu);
+
+	if (!PAGE_ALIGNED(ipa))
+		return;
+
+	hyp_vcpu = container_of(vcpu, struct pkvm_hyp_vcpu, vcpu);
+	if (!__pkvm_guest_unshare_host(hyp_vcpu, hyp_phys_to_pfn(ipa)))
+		ret[0] = SMCCC_RET_SUCCESS;
+}
+
 /*
  * Handler for protected VM HVC calls.
  *
@@ -1006,6 +1019,7 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
 		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
 		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO);
 		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_SHARE);
+		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_UNSHARE);
 		break;
 	case ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID:
 		if (smccc_get_arg1(vcpu) ||
@@ -1024,6 +1038,14 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
 
 		handled = pkvm_memshare_call(val, vcpu, exit_code);
 		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_MEM_UNSHARE_FUNC_ID:
+		if (smccc_get_arg2(vcpu) ||
+		    smccc_get_arg3(vcpu)) {
+			break;
+		}
+
+		pkvm_memunshare_call(val, vcpu);
+		break;
 	default:
 		/* Punt everything else back to the host, for now. */
 		handled = false;
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 30/35] KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (28 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 29/35] KVM: arm64: Implement the MEM_UNSHARE " Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 31/35] KVM: arm64: Add some initial documentation for pKVM Will Deacon
                   ` (5 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Introduce a new VM type for KVM/arm64 to allow userspace to request the
creation of a "protected VM" when the host has booted with pKVM enabled.

For now, this depends on CONFIG_EXPERT and results in a taint on first
use as many aspects of a protected VM are not yet protected!

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_pkvm.h |  2 +-
 arch/arm64/kvm/Kconfig            | 10 ++++++++++
 arch/arm64/kvm/arm.c              |  8 +++++++-
 arch/arm64/kvm/mmu.c              |  3 ---
 arch/arm64/kvm/pkvm.c             | 11 ++++++++++-
 include/uapi/linux/kvm.h          |  5 +++++
 6 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 5a71d25febca..507d9c53e2eb 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -16,7 +16,7 @@
 
 #define HYP_MEMBLOCK_REGIONS 128
 
-int pkvm_init_host_vm(struct kvm *kvm);
+int pkvm_init_host_vm(struct kvm *kvm, unsigned long type);
 int pkvm_create_hyp_vm(struct kvm *kvm);
 bool pkvm_hyp_vm_is_created(struct kvm *kvm);
 void pkvm_destroy_hyp_vm(struct kvm *kvm);
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 4f803fd1c99a..47fa06c9da94 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -83,4 +83,14 @@ config PTDUMP_STAGE2_DEBUGFS
 
 	  If in doubt, say N.
 
+config PROTECTED_VM_UAPI
+	bool "Expose protected VMs to userspace (experimental)"
+	depends on KVM && EXPERT
+	help
+	  Say Y here to enable experimental (i.e. in development)
+	  support for creating protected virtual machines using KVM's
+	  KVM_CREATE_VM ioctl() when booted with pKVM enabled.
+
+	  Unless you are a KVM developer, say N.
+
 endif # VIRTUALIZATION
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 6a218739621d..7a59f0e32efc 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -157,6 +157,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
 	int ret;
 
+	if (type & ~KVM_VM_TYPE_ARM_MASK)
+		return -EINVAL;
+
 	mutex_init(&kvm->arch.config_lock);
 
 #ifdef CONFIG_LOCKDEP
@@ -188,9 +191,12 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 		 * If any failures occur after this is successful, make sure to
 		 * call __pkvm_unreserve_vm to unreserve the VM in hyp.
 		 */
-		ret = pkvm_init_host_vm(kvm);
+		ret = pkvm_init_host_vm(kvm, type);
 		if (ret)
 			goto err_uninit_mmu;
+	} else if (type & KVM_VM_TYPE_ARM_PROTECTED) {
+		ret = -EINVAL;
+		goto err_uninit_mmu;
 	}
 
 	kvm_vgic_early_init(kvm);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index b21a5bf3d104..bbc569b583e6 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -881,9 +881,6 @@ static int kvm_init_ipa_range(struct kvm_s2_mmu *mmu, unsigned long type)
 	u64 mmfr0, mmfr1;
 	u32 phys_shift;
 
-	if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
-		return -EINVAL;
-
 	phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
 	if (is_protected_kvm_enabled()) {
 		phys_shift = kvm_ipa_limit;
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 14865907610c..6d6b1ef1cc62 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -219,9 +219,13 @@ void pkvm_destroy_hyp_vm(struct kvm *kvm)
 	mutex_unlock(&kvm->arch.config_lock);
 }
 
-int pkvm_init_host_vm(struct kvm *kvm)
+int pkvm_init_host_vm(struct kvm *kvm, unsigned long type)
 {
 	int ret;
+	bool protected = type & KVM_VM_TYPE_ARM_PROTECTED;
+
+	if (protected && !IS_ENABLED(CONFIG_PROTECTED_VM_UAPI))
+		return -EINVAL;
 
 	if (pkvm_hyp_vm_is_created(kvm))
 		return -EINVAL;
@@ -236,6 +240,11 @@ int pkvm_init_host_vm(struct kvm *kvm)
 		return ret;
 
 	kvm->arch.pkvm.handle = ret;
+	kvm->arch.pkvm.is_protected = protected;
+	if (protected) {
+		pr_warn_once("kvm: protected VMs are for development only, tainting kernel\n");
+		add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+	}
 
 	return 0;
 }
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index dddb781b0507..9316a827a826 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -689,6 +689,11 @@ struct kvm_enable_cap {
 #define KVM_VM_TYPE_ARM_IPA_SIZE_MASK	0xffULL
 #define KVM_VM_TYPE_ARM_IPA_SIZE(x)		\
 	((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
+
+#define KVM_VM_TYPE_ARM_PROTECTED	(1UL << 31)
+#define KVM_VM_TYPE_ARM_MASK		(KVM_VM_TYPE_ARM_IPA_SIZE_MASK | \
+					 KVM_VM_TYPE_ARM_PROTECTED)
+
 /*
  * ioctls for /dev/kvm fds:
  */
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 31/35] KVM: arm64: Add some initial documentation for pKVM
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (29 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 30/35] KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 32/35] KVM: arm64: Extend pKVM page ownership selftests to cover guest donation Will Deacon
                   ` (4 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Add some initial documentation for pKVM to help people understand what
is supported, the limitations of protected VMs compared to
non-protected VMs, and what is left to do.

Signed-off-by: Will Deacon <will@kernel.org>
---
 .../admin-guide/kernel-parameters.txt         |   4 +-
 Documentation/virt/kvm/arm/index.rst          |   1 +
 Documentation/virt/kvm/arm/pkvm.rst           | 103 ++++++++++++++++++
 3 files changed, 106 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/virt/kvm/arm/pkvm.rst

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a8d0afde7f85..9939dc5654d2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3141,8 +3141,8 @@ Kernel parameters
 			for the host. To force nVHE on VHE hardware, add
 			"arm64_sw.hvhe=0 id_aa64mmfr1.vh=0" to the
 			command-line.
-			"nested" is experimental and should be used with
-			extreme caution.
+			"nested" and "protected" are experimental and should be
+			used with extreme caution.
 
 	kvm-arm.vgic_v3_group0_trap=
 			[KVM,ARM,EARLY] Trap guest accesses to GICv3 group-0
diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
index ec09881de4cf..0856b4942e05 100644
--- a/Documentation/virt/kvm/arm/index.rst
+++ b/Documentation/virt/kvm/arm/index.rst
@@ -10,6 +10,7 @@ ARM
    fw-pseudo-registers
    hyp-abi
    hypercalls
+   pkvm
    pvtime
    ptp_kvm
    vcpu-features
diff --git a/Documentation/virt/kvm/arm/pkvm.rst b/Documentation/virt/kvm/arm/pkvm.rst
new file mode 100644
index 000000000000..8258c93bed6e
--- /dev/null
+++ b/Documentation/virt/kvm/arm/pkvm.rst
@@ -0,0 +1,103 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Protected KVM (pKVM)
+====================
+
+**NOTE**: pKVM is currently an experimental, development feature and
+subject to breaking changes as new isolation features are implemented.
+Please reach out to the developers at kvmarm@lists.linux.dev if you have
+any questions.
+
+Overview
+========
+
+Booting a host kernel with '``kvm-arm.mode=protected``' enables
+"Protected KVM" (pKVM). During boot, pKVM installs a stage-2 identity
+map page-table for the host and uses it to isolate the hypervisor
+running at EL2 from the rest of the host running at EL1/0.
+
+If ``CONFIG_PROTECTED_VM_UAPI=y``, pKVM permits creation of protected
+virtual machines (pVMs) by passing the ``KVM_VM_TYPE_ARM_PROTECTED``
+machine type identifier to the ``KVM_CREATE_VM`` ioctl(). The hypervisor
+isolates pVMs from the host by unmapping pages from the stage-2 identity
+map as they are accessed by a pVM. Hypercalls are provided for a pVM to
+share specific regions of its IPA space back with the host, allowing
+for communication with the VMM. See hypercalls.rst for more details.
+
+Isolation mechanisms
+====================
+
+pKVM relies on a number of mechanisms to isolate pVMs from the host:
+
+CPU memory isolation
+--------------------
+
+Status: Isolation of anonymous memory and metadata pages.
+
+Metadata pages (e.g. page-table pages and '``struct kvm_vcpu``' pages)
+are donated from the host to the hypervisor during pVM creation and
+are consequently unmapped from the stage-2 identity map until the pVM is
+destroyed.
+
+Similarly to regular KVM, pages are lazily mapped into the guest in
+response to stage-2 page faults handled by the host. However, when
+running a pVM, these pages are first pinned and then unmapped from the
+stage-2 identity map as part of the donation procedure. This gives rise
+to some user-visible differences when compared to non-protected VMs,
+largely due to the lack of MMU notifiers:
+
+* Memslots cannot be moved or deleted once the pVM has started running.
+* Read-only memslots and dirty logging are not supported.
+* With the exception of swap, file-backed pages cannot be mapped into a
+  pVM.
+* Donated pages are accounted against ``RLIMIT_MLOCK`` and so the VMM
+  must have a sufficient resource limit or be granted ``CAP_IPC_LOCK``.
+  The lack of a runtime reclaim mechanism means that memory locked for
+  a pVM will remain locked until the pVM is destroyed.
+* Changes to the VMM address space (e.g. a ``MAP_FIXED`` mmap() over a
+  mapping associated with a memslot) are not reflected in the guest and
+  may lead to loss of coherency.
+* Accessing pVM memory that has not been shared back will result in the
+  delivery of a SIGSEGV.
+* If a system call accesses pVM memory that has not been shared back
+  then it will either return ``-EFAULT`` or forcefully reclaim the
+  memory pages. Reclaimed memory is zeroed by the hypervisor and a
+  subsequent attempt to access it in the pVM will return ``-EFAULT``
+  from the ``VCPU_RUN`` ioctl().
+
+CPU state isolation
+-------------------
+
+Status: **Unimplemented.**
+
+DMA isolation using an IOMMU
+----------------------------
+
+Status: **Unimplemented.**
+
+Proxying of Trustzone services
+------------------------------
+
+Status: FF-A and PSCI calls from the host are proxied by the pKVM
+hypervisor.
+
+The FF-A proxy ensures that the host cannot share pVM or hypervisor
+memory with Trustzone as part of a "confused deputy" attack.
+
+The PSCI proxy ensures that CPUs always have the stage-2 identity map
+installed when they are executing in the host.
+
+Protected VM firmware (pvmfw)
+-----------------------------
+
+Status: **Unimplemented.**
+
+Resources
+=========
+
+Quentin Perret's KVM Forum 2022 talk entitled "Protected KVM on arm64: A
+technical deep dive" remains a good resource for learning more about
+pKVM, despite some of the details having changed in the meantime:
+
+https://www.youtube.com/watch?v=9npebeVFbFw
-- 
2.52.0.457.g6b5491de43-goog




* [PATCH v2 32/35] KVM: arm64: Extend pKVM page ownership selftests to cover guest donation
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (30 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 31/35] KVM: arm64: Add some initial documentation for pKVM Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 33/35] KVM: arm64: Register 'selftest_vm' in the VM table Will Deacon
                   ` (3 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Extend the pKVM page ownership selftests to donate a page to a guest
and then reclaim it.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 3969a92dc3e2..0d23da060be4 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1757,6 +1757,7 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	hyp_pin_shared_mem, virt, virt + size);
 	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
 	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, 1, vm);
+	assert_transition_res(-EPERM,   __pkvm_host_donate_guest, pfn, gfn, vcpu);
 
 	selftest_state.host = PKVM_PAGE_OWNED;
 	selftest_state.hyp = PKVM_NOPAGE;
@@ -1776,6 +1777,7 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	__pkvm_hyp_donate_host, pfn, 1);
 	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
 	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, 1, vm);
+	assert_transition_res(-EPERM,   __pkvm_host_donate_guest, pfn, gfn, vcpu);
 
 	assert_transition_res(0,	hyp_pin_shared_mem, virt, virt + size);
 	assert_transition_res(0,	hyp_pin_shared_mem, virt, virt + size);
@@ -1788,6 +1790,7 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	__pkvm_hyp_donate_host, pfn, 1);
 	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
 	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, 1, vm);
+	assert_transition_res(-EPERM,   __pkvm_host_donate_guest, pfn, gfn, vcpu);
 
 	hyp_unpin_shared_mem(virt, virt + size);
 	assert_page_state();
@@ -1807,6 +1810,7 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	__pkvm_hyp_donate_host, pfn, 1);
 	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
 	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, 1, vm);
+	assert_transition_res(-EPERM,   __pkvm_host_donate_guest, pfn, gfn, vcpu);
 	assert_transition_res(-EPERM,	hyp_pin_shared_mem, virt, virt + size);
 
 	selftest_state.host = PKVM_PAGE_OWNED;
@@ -1823,6 +1827,7 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	__pkvm_host_share_hyp, pfn);
 	assert_transition_res(-EPERM,	__pkvm_host_unshare_hyp, pfn);
 	assert_transition_res(-EPERM,	__pkvm_hyp_donate_host, pfn, 1);
+	assert_transition_res(-EPERM,   __pkvm_host_donate_guest, pfn, gfn, vcpu);
 	assert_transition_res(-EPERM,	hyp_pin_shared_mem, virt, virt + size);
 
 	selftest_state.guest[1] = PKVM_PAGE_SHARED_BORROWED;
@@ -1836,6 +1841,23 @@ void pkvm_ownership_selftest(void *base)
 	selftest_state.host = PKVM_PAGE_OWNED;
 	assert_transition_res(0,	__pkvm_host_unshare_guest, gfn + 1, 1, vm);
 
+	selftest_state.host = PKVM_NOPAGE;
+	selftest_state.guest[0] = PKVM_PAGE_OWNED;
+	assert_transition_res(0,	__pkvm_host_donate_guest, pfn, gfn, vcpu);
+	assert_transition_res(-EPERM,	__pkvm_host_donate_guest, pfn, gfn, vcpu);
+	assert_transition_res(-EPERM,	__pkvm_host_donate_guest, pfn, gfn + 1, vcpu);
+	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
+	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn + 1, 1, vcpu, prot);
+	assert_transition_res(-EPERM,	__pkvm_host_share_ffa, pfn, 1);
+	assert_transition_res(-EPERM,	__pkvm_host_donate_hyp, pfn, 1);
+	assert_transition_res(-EPERM,	__pkvm_host_share_hyp, pfn);
+	assert_transition_res(-EPERM,	__pkvm_host_unshare_hyp, pfn);
+	assert_transition_res(-EPERM,	__pkvm_hyp_donate_host, pfn, 1);
+
+	selftest_state.host = PKVM_PAGE_OWNED;
+	selftest_state.guest[0] = PKVM_NOPAGE;
+	assert_transition_res(0,	__pkvm_host_reclaim_page_guest, gfn, vm);
+
 	selftest_state.host = PKVM_NOPAGE;
 	selftest_state.hyp = PKVM_PAGE_OWNED;
 	assert_transition_res(0,	__pkvm_host_donate_hyp, pfn, 1);
-- 
2.52.0.457.g6b5491de43-goog



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 33/35] KVM: arm64: Register 'selftest_vm' in the VM table
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (31 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 32/35] KVM: arm64: Extend pKVM page ownership selftests to cover guest donation Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 34/35] KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim Will Deacon
                   ` (2 subsequent siblings)
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

In preparation for extending the pKVM page ownership selftests to cover
forceful reclaim of donated pages, rework the creation of the
'selftest_vm' so that it is registered in the VM table while the tests
are running.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 53 ++++---------------
 arch/arm64/kvm/hyp/nvhe/pkvm.c                | 49 +++++++++++++++++
 3 files changed, 61 insertions(+), 43 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index e41a128b0854..3ad644111885 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -77,6 +77,8 @@ static __always_inline void __load_host_stage2(void)
 
 #ifdef CONFIG_NVHE_EL2_DEBUG
 void pkvm_ownership_selftest(void *base);
+struct pkvm_hyp_vcpu *init_selftest_vm(void *virt);
+void teardown_selftest_vm(void);
 #else
 static inline void pkvm_ownership_selftest(void *base) { }
 #endif
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 0d23da060be4..f36f81f75fdf 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1654,53 +1654,18 @@ struct pkvm_expected_state {
 
 static struct pkvm_expected_state selftest_state;
 static struct hyp_page *selftest_page;
-
-static struct pkvm_hyp_vm selftest_vm = {
-	.kvm = {
-		.arch = {
-			.mmu = {
-				.arch = &selftest_vm.kvm.arch,
-				.pgt = &selftest_vm.pgt,
-			},
-		},
-	},
-};
-
-static struct pkvm_hyp_vcpu selftest_vcpu = {
-	.vcpu = {
-		.arch = {
-			.hw_mmu = &selftest_vm.kvm.arch.mmu,
-		},
-		.kvm = &selftest_vm.kvm,
-	},
-};
-
-static void init_selftest_vm(void *virt)
-{
-	struct hyp_page *p = hyp_virt_to_page(virt);
-	int i;
-
-	selftest_vm.kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr;
-	WARN_ON(kvm_guest_prepare_stage2(&selftest_vm, virt));
-
-	for (i = 0; i < pkvm_selftest_pages(); i++) {
-		if (p[i].refcount)
-			continue;
-		p[i].refcount = 1;
-		hyp_put_page(&selftest_vm.pool, hyp_page_to_virt(&p[i]));
-	}
-}
+static struct pkvm_hyp_vcpu *selftest_vcpu;
 
 static u64 selftest_ipa(void)
 {
-	return BIT(selftest_vm.pgt.ia_bits - 1);
+	return BIT(selftest_vcpu->vcpu.arch.hw_mmu->pgt->ia_bits - 1);
 }
 
 static void assert_page_state(void)
 {
 	void *virt = hyp_page_to_virt(selftest_page);
 	u64 size = PAGE_SIZE << selftest_page->order;
-	struct pkvm_hyp_vcpu *vcpu = &selftest_vcpu;
+	struct pkvm_hyp_vcpu *vcpu = selftest_vcpu;
 	u64 phys = hyp_virt_to_phys(virt);
 	u64 ipa[2] = { selftest_ipa(), selftest_ipa() + PAGE_SIZE };
 	struct pkvm_hyp_vm *vm;
@@ -1715,10 +1680,10 @@ static void assert_page_state(void)
 	WARN_ON(__hyp_check_page_state_range(phys, size, selftest_state.hyp));
 	hyp_unlock_component();
 
-	guest_lock_component(&selftest_vm);
+	guest_lock_component(vm);
 	WARN_ON(__guest_check_page_state_range(vm, ipa[0], size, selftest_state.guest[0]));
 	WARN_ON(__guest_check_page_state_range(vm, ipa[1], size, selftest_state.guest[1]));
-	guest_unlock_component(&selftest_vm);
+	guest_unlock_component(vm);
 }
 
 #define assert_transition_res(res, fn, ...)		\
@@ -1731,14 +1696,15 @@ void pkvm_ownership_selftest(void *base)
 {
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_RWX;
 	void *virt = hyp_alloc_pages(&host_s2_pool, 0);
-	struct pkvm_hyp_vcpu *vcpu = &selftest_vcpu;
-	struct pkvm_hyp_vm *vm = &selftest_vm;
+	struct pkvm_hyp_vcpu *vcpu;
 	u64 phys, size, pfn, gfn;
+	struct pkvm_hyp_vm *vm;
 
 	WARN_ON(!virt);
 	selftest_page = hyp_virt_to_page(virt);
 	selftest_page->refcount = 0;
-	init_selftest_vm(base);
+	selftest_vcpu = vcpu = init_selftest_vm(base);
+	vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
 
 	size = PAGE_SIZE << selftest_page->order;
 	phys = hyp_virt_to_phys(virt);
@@ -1862,6 +1828,7 @@ void pkvm_ownership_selftest(void *base)
 	selftest_state.hyp = PKVM_PAGE_OWNED;
 	assert_transition_res(0,	__pkvm_host_donate_hyp, pfn, 1);
 
+	teardown_selftest_vm();
 	selftest_page->refcount = 1;
 	hyp_put_page(&host_s2_pool, virt);
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 4ddb3d70f9c0..f68eac841326 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -697,6 +697,55 @@ void __pkvm_unreserve_vm(pkvm_handle_t handle)
 	hyp_spin_unlock(&vm_table_lock);
 }
 
+#ifdef CONFIG_NVHE_EL2_DEBUG
+static struct pkvm_hyp_vm selftest_vm = {
+	.kvm = {
+		.arch = {
+			.mmu = {
+				.arch = &selftest_vm.kvm.arch,
+				.pgt = &selftest_vm.pgt,
+			},
+		},
+	},
+};
+
+static struct pkvm_hyp_vcpu selftest_vcpu = {
+	.vcpu = {
+		.arch = {
+			.hw_mmu = &selftest_vm.kvm.arch.mmu,
+		},
+		.kvm = &selftest_vm.kvm,
+	},
+};
+
+struct pkvm_hyp_vcpu *init_selftest_vm(void *virt)
+{
+	struct hyp_page *p = hyp_virt_to_page(virt);
+	int i;
+
+	selftest_vm.kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr;
+	WARN_ON(kvm_guest_prepare_stage2(&selftest_vm, virt));
+
+	for (i = 0; i < pkvm_selftest_pages(); i++) {
+		if (p[i].refcount)
+			continue;
+		p[i].refcount = 1;
+		hyp_put_page(&selftest_vm.pool, hyp_page_to_virt(&p[i]));
+	}
+
+	selftest_vm.kvm.arch.pkvm.handle = __pkvm_reserve_vm();
+	insert_vm_table_entry(selftest_vm.kvm.arch.pkvm.handle, &selftest_vm);
+	return &selftest_vcpu;
+}
+
+void teardown_selftest_vm(void)
+{
+	hyp_spin_lock(&vm_table_lock);
+	remove_vm_table_entry(selftest_vm.kvm.arch.pkvm.handle);
+	hyp_spin_unlock(&vm_table_lock);
+}
+#endif /* CONFIG_NVHE_EL2_DEBUG */
+
 /*
  * Initialize the hypervisor copy of the VM state using host-donated memory.
  *
-- 
2.52.0.457.g6b5491de43-goog
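
The reserve/insert/remove lifecycle this patch gives the selftest VM can be modelled in a few lines of plain, user-space C. This is a simplified sketch for illustration only — `MAX_VMS`, `HANDLE_OFFSET`, the sentinel, and the lack of locking are all invented here and do not reflect the hypervisor's actual table:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VMS       4
#define HANDLE_OFFSET 0x10	/* hypothetical; stands in for the real offset */

static void *vm_table[MAX_VMS];
static char reserved_slot;	/* sentinel marking a reserved-but-empty slot */

/* Reserve a free slot and hand back its handle (0 on failure). */
static uint16_t reserve_vm(void)
{
	for (size_t i = 0; i < MAX_VMS; i++) {
		if (!vm_table[i]) {
			vm_table[i] = &reserved_slot;
			return (uint16_t)(HANDLE_OFFSET + i);
		}
	}
	return 0;
}

static void insert_vm_table_entry(uint16_t handle, void *vm)
{
	vm_table[handle - HANDLE_OFFSET] = vm;
}

static void remove_vm_table_entry(uint16_t handle)
{
	vm_table[handle - HANDLE_OFFSET] = NULL;
}

static void *get_vm(uint16_t handle)
{
	void *vm = vm_table[handle - HANDLE_OFFSET];

	return vm == &reserved_slot ? NULL : vm;
}
```

In the real code the table is protected by `vm_table_lock`; the sketch omits that to keep the handle arithmetic visible.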



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 34/35] KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (32 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 33/35] KVM: arm64: Register 'selftest_vm' in the VM table Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-01-19 12:46 ` [PATCH v2 35/35] KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs Will Deacon
  2026-02-10 18:58 ` [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Trilok Soni
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Extend the pKVM page ownership selftests to forcefully reclaim a donated
page and check that it cannot be re-donated at the same IPA.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index f36f81f75fdf..a7d880701933 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1821,8 +1821,20 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	__pkvm_hyp_donate_host, pfn, 1);
 
 	selftest_state.host = PKVM_PAGE_OWNED;
-	selftest_state.guest[0] = PKVM_NOPAGE;
-	assert_transition_res(0,	__pkvm_host_reclaim_page_guest, gfn, vm);
+	selftest_state.guest[0] = PKVM_POISON;
+	assert_transition_res(0,	__pkvm_host_force_reclaim_page_guest, phys);
+	assert_transition_res(-EPERM,	__pkvm_host_donate_guest, pfn, gfn, vcpu);
+	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
+
+	selftest_state.host = PKVM_NOPAGE;
+	selftest_state.guest[1] = PKVM_PAGE_OWNED;
+	assert_transition_res(0,	__pkvm_host_donate_guest, pfn, gfn + 1, vcpu);
+
+	selftest_state.host = PKVM_PAGE_OWNED;
+	selftest_state.guest[1] = PKVM_NOPAGE;
+	assert_transition_res(0,	__pkvm_host_reclaim_page_guest, gfn + 1, vm);
+	assert_transition_res(-EPERM,	__pkvm_host_donate_guest, pfn, gfn, vcpu);
+	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
 
 	selftest_state.host = PKVM_NOPAGE;
 	selftest_state.hyp = PKVM_PAGE_OWNED;
-- 
2.52.0.457.g6b5491de43-goog



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v2 35/35] KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (33 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 34/35] KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim Will Deacon
@ 2026-01-19 12:46 ` Will Deacon
  2026-02-10 18:58 ` [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Trilok Soni
  35 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-01-19 12:46 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Now that the guest can share and unshare memory with the host using
hypercalls, extend the pKVM page ownership selftest to exercise these
new transitions.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 30 +++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index a7d880701933..2a75b5649289 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1820,11 +1820,41 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	__pkvm_host_unshare_hyp, pfn);
 	assert_transition_res(-EPERM,	__pkvm_hyp_donate_host, pfn, 1);
 
+	selftest_state.host = PKVM_PAGE_SHARED_BORROWED;
+	selftest_state.guest[0] = PKVM_PAGE_SHARED_OWNED;
+	assert_transition_res(0,	__pkvm_guest_share_host, vcpu, gfn);
+	assert_transition_res(-EPERM,	__pkvm_guest_share_host, vcpu, gfn);
+	assert_transition_res(-EPERM,	__pkvm_host_donate_guest, pfn, gfn, vcpu);
+	assert_transition_res(-EPERM,	__pkvm_host_donate_guest, pfn, gfn + 1, vcpu);
+	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
+	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn + 1, 1, vcpu, prot);
+	assert_transition_res(-EPERM,	__pkvm_host_share_ffa, pfn, 1);
+	assert_transition_res(-EPERM,	__pkvm_host_donate_hyp, pfn, 1);
+	assert_transition_res(-EPERM,	__pkvm_host_share_hyp, pfn);
+	assert_transition_res(-EPERM,	__pkvm_host_unshare_hyp, pfn);
+	assert_transition_res(-EPERM,	__pkvm_hyp_donate_host, pfn, 1);
+
+	selftest_state.host = PKVM_NOPAGE;
+	selftest_state.guest[0] = PKVM_PAGE_OWNED;
+	assert_transition_res(0,	__pkvm_guest_unshare_host, vcpu, gfn);
+	assert_transition_res(-EPERM,	__pkvm_guest_unshare_host, vcpu, gfn);
+	assert_transition_res(-EPERM,	__pkvm_host_donate_guest, pfn, gfn, vcpu);
+	assert_transition_res(-EPERM,	__pkvm_host_donate_guest, pfn, gfn + 1, vcpu);
+	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
+	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn + 1, 1, vcpu, prot);
+	assert_transition_res(-EPERM,	__pkvm_host_share_ffa, pfn, 1);
+	assert_transition_res(-EPERM,	__pkvm_host_donate_hyp, pfn, 1);
+	assert_transition_res(-EPERM,	__pkvm_host_share_hyp, pfn);
+	assert_transition_res(-EPERM,	__pkvm_host_unshare_hyp, pfn);
+	assert_transition_res(-EPERM,	__pkvm_hyp_donate_host, pfn, 1);
+
 	selftest_state.host = PKVM_PAGE_OWNED;
 	selftest_state.guest[0] = PKVM_POISON;
 	assert_transition_res(0,	__pkvm_host_force_reclaim_page_guest, phys);
 	assert_transition_res(-EPERM,	__pkvm_host_donate_guest, pfn, gfn, vcpu);
 	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
+	assert_transition_res(-EHWPOISON, __pkvm_guest_share_host, vcpu, gfn);
+	assert_transition_res(-EHWPOISON, __pkvm_guest_unshare_host, vcpu, gfn);
 
 	selftest_state.host = PKVM_NOPAGE;
 	selftest_state.guest[1] = PKVM_PAGE_OWNED;
-- 
2.52.0.457.g6b5491de43-goog
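
The share/unshare transitions exercised above can be modelled as a tiny state machine in plain C. The following is a simplified sketch of the expected semantics, not the hypervisor's implementation — `struct page_model` and the two-state model are invented here for illustration:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mini-model of the ownership rules the selftest checks. */
enum state { NOPAGE, OWNED, SHARED_OWNED, SHARED_BORROWED, POISON };

struct page_model {
	enum state host, guest;
};

static int guest_share_host(struct page_model *p)
{
	if (p->guest == POISON)
		return -EHWPOISON;	/* forcefully-reclaimed pages stay poisoned */
	if (p->guest != OWNED)
		return -EPERM;		/* double-share is rejected */
	p->guest = SHARED_OWNED;
	p->host = SHARED_BORROWED;
	return 0;
}

static int guest_unshare_host(struct page_model *p)
{
	if (p->guest == POISON)
		return -EHWPOISON;
	if (p->guest != SHARED_OWNED)
		return -EPERM;		/* can't unshare what isn't shared */
	p->guest = OWNED;
	p->host = NOPAGE;
	return 0;
}
```

This mirrors the assertions in the hunk: a second share or a stray unshare fails with `-EPERM`, and both calls fail with `-EHWPOISON` once the page has been force-reclaimed.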



^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 22/35] KVM: arm64: Change 'pkvm_handle_t' to u16
  2026-01-19 12:46 ` [PATCH v2 22/35] KVM: arm64: Change 'pkvm_handle_t' to u16 Will Deacon
@ 2026-01-28 10:28   ` Fuad Tabba
  0 siblings, 0 replies; 54+ messages in thread
From: Fuad Tabba @ 2026-01-28 10:28 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
	Vincent Donnefort, Mostafa Saleh

On Mon, 19 Jan 2026 at 12:48, Will Deacon <will@kernel.org> wrote:
>
> 'pkvm_handle_t' doesn't need to be a 32-bit type and subsequent patches
> will rely on it being no more than 16 bits so that it can be encoded
> into a pte annotation.
>
> Change 'pkvm_handle_t' to a u16 and add a compile-time check that the
> maximum handle fits into the reduced type.
>
> Signed-off-by: Will Deacon <will@kernel.org>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  arch/arm64/include/asm/kvm_host.h | 2 +-
>  arch/arm64/kvm/hyp/nvhe/pkvm.c    | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 3191d10a2622..60a5c87b0a17 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -247,7 +247,7 @@ struct kvm_smccc_features {
>         unsigned long vendor_hyp_bmap_2; /* Function numbers 64-127 */
>  };
>
> -typedef unsigned int pkvm_handle_t;
> +typedef u16 pkvm_handle_t;
>
>  struct kvm_protected_vm {
>         pkvm_handle_t handle;
> diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> index 9f0997150cf5..c5772417372d 100644
> --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> @@ -223,6 +223,7 @@ static struct pkvm_hyp_vm **vm_table;
>
>  void pkvm_hyp_vm_table_init(void *tbl)
>  {
> +       BUILD_BUG_ON((u64)HANDLE_OFFSET + KVM_MAX_PVMS > (pkvm_handle_t)-1);
>         WARN_ON(vm_table);
>         vm_table = tbl;
>  }
> --
> 2.52.0.457.g6b5491de43-goog
>
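
The BUILD_BUG_ON in the patch can be illustrated with C11's `_Static_assert` in user space; the values of `HANDLE_OFFSET` and `KVM_MAX_PVMS` below are hypothetical stand-ins, not the kernel's:

```c
#include <stdint.h>

typedef uint16_t pkvm_handle_t;

#define HANDLE_OFFSET	0x1000	/* hypothetical value for illustration */
#define KVM_MAX_PVMS	255	/* hypothetical value for illustration */

/*
 * Refuse to compile if the largest possible handle would overflow the
 * narrowed type, exactly like the BUILD_BUG_ON in pkvm_hyp_vm_table_init().
 */
_Static_assert((uint64_t)HANDLE_OFFSET + KVM_MAX_PVMS <= (pkvm_handle_t)-1,
	       "pkvm_handle_t too narrow for the handle range");
```

The `(pkvm_handle_t)-1` idiom yields the maximum value of the unsigned type, so the check automatically tracks any future change to the typedef.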


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 23/35] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2
  2026-01-19 12:46 ` [PATCH v2 23/35] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2 Will Deacon
@ 2026-01-28 10:29   ` Fuad Tabba
  0 siblings, 0 replies; 54+ messages in thread
From: Fuad Tabba @ 2026-01-28 10:29 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
	Vincent Donnefort, Mostafa Saleh

On Mon, 19 Jan 2026 at 12:48, Will Deacon <will@kernel.org> wrote:
>
> Handling host kernel faults arising from accesses to donated guest
> memory will require an rmap-like mechanism to identify the guest mapping
> of the faulting page.
>
> Extend the page donation logic to encode the guest handle and gfn
> alongside the owner information in the host stage-2 pte.
>
> Signed-off-by: Will Deacon <will@kernel.org>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index e090252d38a8..f4638fe9d77a 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -590,7 +590,6 @@ int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
>                 if (!ret)
>                         __host_update_page_state(addr, size, PKVM_PAGE_OWNED);
>                 break;
> -       case PKVM_ID_GUEST:
>         case PKVM_ID_HYP:
>                 ret = host_stage2_set_owner_metadata_locked(addr, size,
>                                                             owner_id, 0);
> @@ -600,6 +599,20 @@ int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
>         return ret;
>  }
>
> +#define KVM_HOST_PTE_OWNER_GUEST_HANDLE_MASK   GENMASK(15, 0)
> +/* We need 40 bits for the GFN to cover a 52-bit IPA with 4k pages and LPA2 */
> +#define KVM_HOST_PTE_OWNER_GUEST_GFN_MASK      GENMASK(55, 16)
> +static u64 host_stage2_encode_gfn_meta(struct pkvm_hyp_vm *vm, u64 gfn)
> +{
> +       pkvm_handle_t handle = vm->kvm.arch.pkvm.handle;
> +
> +       BUILD_BUG_ON((pkvm_handle_t)-1 > KVM_HOST_PTE_OWNER_GUEST_HANDLE_MASK);
> +       WARN_ON(!FIELD_FIT(KVM_HOST_PTE_OWNER_GUEST_GFN_MASK, gfn));
> +
> +       return FIELD_PREP(KVM_HOST_PTE_OWNER_GUEST_HANDLE_MASK, handle) |
> +              FIELD_PREP(KVM_HOST_PTE_OWNER_GUEST_GFN_MASK, gfn);
> +}
> +
>  static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
>  {
>         /*
> @@ -1133,6 +1146,7 @@ int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
>         struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
>         u64 phys = hyp_pfn_to_phys(pfn);
>         u64 ipa = hyp_pfn_to_phys(gfn);
> +       u64 meta;
>         int ret;
>
>         host_lock_component();
> @@ -1146,7 +1160,9 @@ int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
>         if (ret)
>                 goto unlock;
>
> -       WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_GUEST));
> +       meta = host_stage2_encode_gfn_meta(vm, gfn);
> +       WARN_ON(host_stage2_set_owner_metadata_locked(phys, PAGE_SIZE,
> +                                                     PKVM_ID_GUEST, meta));
>         WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
>                                        pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
>                                        &vcpu->vcpu.arch.pkvm_memcache, 0));
> --
> 2.52.0.457.g6b5491de43-goog
>
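
The encoding introduced by this patch packs a 16-bit handle and a 40-bit gfn into the pte metadata. A minimal user-space sketch of the same bit layout, with hand-rolled masks standing in for the kernel's GENMASK/FIELD_PREP helpers:

```c
#include <assert.h>
#include <stdint.h>

/* Bits 15:0 hold the handle, bits 55:16 the gfn (40 bits, per the patch). */
#define GUEST_HANDLE_MASK	0xffffULL
#define GUEST_GFN_SHIFT		16
#define GUEST_GFN_MASK		(0xffffffffffULL << GUEST_GFN_SHIFT)

static uint64_t encode_gfn_meta(uint16_t handle, uint64_t gfn)
{
	return ((uint64_t)handle & GUEST_HANDLE_MASK) |
	       ((gfn << GUEST_GFN_SHIFT) & GUEST_GFN_MASK);
}

static uint16_t decode_handle(uint64_t meta)
{
	return (uint16_t)(meta & GUEST_HANDLE_MASK);
}

static uint64_t decode_gfn(uint64_t meta)
{
	return (meta & GUEST_GFN_MASK) >> GUEST_GFN_SHIFT;
}
```

The decode side is what a later host-fault handler would use to recover the owning VM and guest frame from the annotated pte; the kernel expresses the same arithmetic with `FIELD_PREP()`/`FIELD_GET()`.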


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 07/35] KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls
  2026-01-19 12:46 ` [PATCH v2 07/35] KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls Will Deacon
@ 2026-02-10 14:53   ` Alexandru Elisei
  2026-03-03 15:45     ` Will Deacon
  0 siblings, 1 reply; 54+ messages in thread
From: Alexandru Elisei @ 2026-02-10 14:53 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
	Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Hi Will,

On Mon, Jan 19, 2026 at 12:46:00PM +0000, Will Deacon wrote:
> When pKVM is not enabled, the host shouldn't issue pKVM-specific
> hypercalls and so there's no point checking for this in the pKVM
> hypercall handlers.
> 
> Remove the redundant is_protected_kvm_enabled() checks from each
> hypercall and instead rejig the hypercall table so that the
> pKVM-specific hypercalls are unreachable when pKVM is not being used.
> 
> Reviewed-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_asm.h   | 20 ++++++----
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 63 ++++++++++--------------------
>  2 files changed, 32 insertions(+), 51 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index a1ad12c72ebf..2076005e9253 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -60,16 +60,9 @@ enum __kvm_host_smccc_func {
>  	__KVM_HOST_SMCCC_FUNC___vgic_v3_init_lrs,
>  	__KVM_HOST_SMCCC_FUNC___vgic_v3_get_gic_config,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
> +	__KVM_HOST_SMCCC_FUNC_MIN_PKVM = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
>  
>  	/* Hypercalls available after pKVM finalisation */

This comment should be removed; I think the functions that follow, up to
and including __KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM, are also available with
kvm-arm.mode=nvhe.

If you agree that the comment should be removed, maybe a different name for
the define above would be more appropriate, one that does not imply pkvm?

> -	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
> -	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
> -	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
> -	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
> -	__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_perms_guest,
> -	__KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
> -	__KVM_HOST_SMCCC_FUNC___pkvm_host_test_clear_young_guest,
> -	__KVM_HOST_SMCCC_FUNC___pkvm_host_mkyoung_guest,
>  	__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
>  	__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
>  	__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
> @@ -81,6 +74,17 @@ enum __kvm_host_smccc_func {
>  	__KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff,
>  	__KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
>  	__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
> +	__KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM = __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
> +
> +	/* Hypercalls available only when pKVM has finalised */
> +	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
> +	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
> +	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
> +	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
> +	__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_perms_guest,
> +	__KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
> +	__KVM_HOST_SMCCC_FUNC___pkvm_host_test_clear_young_guest,
> +	__KVM_HOST_SMCCC_FUNC___pkvm_host_mkyoung_guest,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_reserve_vm,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index a7c689152f68..eb5cfe32b2c9 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -169,9 +169,6 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
>  	DECLARE_REG(u64, hcr_el2, host_ctxt, 3);
>  	struct pkvm_hyp_vcpu *hyp_vcpu;
>  
> -	if (!is_protected_kvm_enabled())
> -		return;
> -
>  	hyp_vcpu = pkvm_load_hyp_vcpu(handle, vcpu_idx);
>  	if (!hyp_vcpu)
>  		return;

I've always wondered about this. For some hypercalls, all the handler does is
marshal the arguments for the actual function (for example,
handle___kvm_adjust_pc() -> __kvm_adjust_pc()), but for others, like this one,
the handler also has extra checks before calling the actual function.  Would you
mind explaining what the rationale is?

As someone who is not intimately familiar with the code, I find this surprising,
and each time I want to understand what a hypercall does (in this case,
__pkvm_vcpu_load()), I have to remind myself that the handler itself might
contain additional logic that I need to read to get the full picture.

Thanks,
Alex

> @@ -185,12 +182,8 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
>  
>  static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
>  {
> -	struct pkvm_hyp_vcpu *hyp_vcpu;
> +	struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
>  
> -	if (!is_protected_kvm_enabled())
> -		return;
> -
> -	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
>  	if (hyp_vcpu)
>  		pkvm_put_hyp_vcpu(hyp_vcpu);
>  }
> @@ -254,9 +247,6 @@ static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
>  	struct pkvm_hyp_vcpu *hyp_vcpu;
>  	int ret = -EINVAL;
>  
> -	if (!is_protected_kvm_enabled())
> -		goto out;
> -
>  	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
>  	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
>  		goto out;
> @@ -278,9 +268,6 @@ static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
>  	struct pkvm_hyp_vm *hyp_vm;
>  	int ret = -EINVAL;
>  
> -	if (!is_protected_kvm_enabled())
> -		goto out;
> -
>  	hyp_vm = get_np_pkvm_hyp_vm(handle);
>  	if (!hyp_vm)
>  		goto out;
> @@ -298,9 +285,6 @@ static void handle___pkvm_host_relax_perms_guest(struct kvm_cpu_context *host_ct
>  	struct pkvm_hyp_vcpu *hyp_vcpu;
>  	int ret = -EINVAL;
>  
> -	if (!is_protected_kvm_enabled())
> -		goto out;
> -
>  	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
>  	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
>  		goto out;
> @@ -318,9 +302,6 @@ static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt
>  	struct pkvm_hyp_vm *hyp_vm;
>  	int ret = -EINVAL;
>  
> -	if (!is_protected_kvm_enabled())
> -		goto out;
> -
>  	hyp_vm = get_np_pkvm_hyp_vm(handle);
>  	if (!hyp_vm)
>  		goto out;
> @@ -340,9 +321,6 @@ static void handle___pkvm_host_test_clear_young_guest(struct kvm_cpu_context *ho
>  	struct pkvm_hyp_vm *hyp_vm;
>  	int ret = -EINVAL;
>  
> -	if (!is_protected_kvm_enabled())
> -		goto out;
> -
>  	hyp_vm = get_np_pkvm_hyp_vm(handle);
>  	if (!hyp_vm)
>  		goto out;
> @@ -359,9 +337,6 @@ static void handle___pkvm_host_mkyoung_guest(struct kvm_cpu_context *host_ctxt)
>  	struct pkvm_hyp_vcpu *hyp_vcpu;
>  	int ret = -EINVAL;
>  
> -	if (!is_protected_kvm_enabled())
> -		goto out;
> -
>  	hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
>  	if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
>  		goto out;
> @@ -421,12 +396,8 @@ static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
>  static void handle___pkvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
>  {
>  	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
> -	struct pkvm_hyp_vm *hyp_vm;
> +	struct pkvm_hyp_vm *hyp_vm = get_np_pkvm_hyp_vm(handle);
>  
> -	if (!is_protected_kvm_enabled())
> -		return;
> -
> -	hyp_vm = get_np_pkvm_hyp_vm(handle);
>  	if (!hyp_vm)
>  		return;
>  
> @@ -600,14 +571,6 @@ static const hcall_t host_hcall[] = {
>  	HANDLE_FUNC(__vgic_v3_get_gic_config),
>  	HANDLE_FUNC(__pkvm_prot_finalize),
>  
> -	HANDLE_FUNC(__pkvm_host_share_hyp),
> -	HANDLE_FUNC(__pkvm_host_unshare_hyp),
> -	HANDLE_FUNC(__pkvm_host_share_guest),
> -	HANDLE_FUNC(__pkvm_host_unshare_guest),
> -	HANDLE_FUNC(__pkvm_host_relax_perms_guest),
> -	HANDLE_FUNC(__pkvm_host_wrprotect_guest),
> -	HANDLE_FUNC(__pkvm_host_test_clear_young_guest),
> -	HANDLE_FUNC(__pkvm_host_mkyoung_guest),
>  	HANDLE_FUNC(__kvm_adjust_pc),
>  	HANDLE_FUNC(__kvm_vcpu_run),
>  	HANDLE_FUNC(__kvm_flush_vm_context),
> @@ -619,6 +582,15 @@ static const hcall_t host_hcall[] = {
>  	HANDLE_FUNC(__kvm_timer_set_cntvoff),
>  	HANDLE_FUNC(__vgic_v3_save_aprs),
>  	HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
> +
> +	HANDLE_FUNC(__pkvm_host_share_hyp),
> +	HANDLE_FUNC(__pkvm_host_unshare_hyp),
> +	HANDLE_FUNC(__pkvm_host_share_guest),
> +	HANDLE_FUNC(__pkvm_host_unshare_guest),
> +	HANDLE_FUNC(__pkvm_host_relax_perms_guest),
> +	HANDLE_FUNC(__pkvm_host_wrprotect_guest),
> +	HANDLE_FUNC(__pkvm_host_test_clear_young_guest),
> +	HANDLE_FUNC(__pkvm_host_mkyoung_guest),
>  	HANDLE_FUNC(__pkvm_reserve_vm),
>  	HANDLE_FUNC(__pkvm_unreserve_vm),
>  	HANDLE_FUNC(__pkvm_init_vm),
> @@ -632,7 +604,7 @@ static const hcall_t host_hcall[] = {
>  static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
>  {
>  	DECLARE_REG(unsigned long, id, host_ctxt, 0);
> -	unsigned long hcall_min = 0;
> +	unsigned long hcall_min = 0, hcall_max = -1;
>  	hcall_t hfn;
>  
>  	/*
> @@ -644,14 +616,19 @@ static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
>  	 * basis. This is all fine, however, since __pkvm_prot_finalize
>  	 * returns -EPERM after the first call for a given CPU.
>  	 */
> -	if (static_branch_unlikely(&kvm_protected_mode_initialized))
> -		hcall_min = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize;
> +	if (static_branch_unlikely(&kvm_protected_mode_initialized)) {
> +		hcall_min = __KVM_HOST_SMCCC_FUNC_MIN_PKVM;
> +	} else {
> +		hcall_max = __KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM;
> +	}
>  
>  	id &= ~ARM_SMCCC_CALL_HINTS;
>  	id -= KVM_HOST_SMCCC_ID(0);
>  
> -	if (unlikely(id < hcall_min || id >= ARRAY_SIZE(host_hcall)))
> +	if (unlikely(id < hcall_min || id > hcall_max ||
> +		     id >= ARRAY_SIZE(host_hcall))) {
>  		goto inval;
> +	}
>  
>  	hfn = host_hcall[id];
>  	if (unlikely(!hfn))
> -- 
> 2.52.0.457.g6b5491de43-goog
> 
> 


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM
  2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
                   ` (34 preceding siblings ...)
  2026-01-19 12:46 ` [PATCH v2 35/35] KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs Will Deacon
@ 2026-02-10 18:58 ` Trilok Soni
  2026-02-10 19:03   ` Fuad Tabba
  2026-02-16 10:58   ` Venkata Rao Kakani
  35 siblings, 2 replies; 54+ messages in thread
From: Trilok Soni @ 2026-02-10 18:58 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
	Fuad Tabba, Vincent Donnefort, Mostafa Saleh

On 1/19/2026 4:45 AM, Will Deacon wrote:
> Hi folks,
> 
> It's back and it's even bigger than before!
> 
> Although the first patch has been picked up as a fix (thanks, Oliver),
> review feedback has resulted in some additional patches being included
> in the series. If you'd like to see the first version, it's available
> here:

Is it possible to test this patch series with QEMU, or will it require a
real SoC platform? 

---Trilok Soni



* Re: [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM
  2026-02-10 18:58 ` [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Trilok Soni
@ 2026-02-10 19:03   ` Fuad Tabba
  2026-02-16 10:58   ` Venkata Rao Kakani
  1 sibling, 0 replies; 54+ messages in thread
From: Fuad Tabba @ 2026-02-10 19:03 UTC (permalink / raw)
  To: Trilok Soni
  Cc: Will Deacon, kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton,
	Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Quentin Perret, Vincent Donnefort, Mostafa Saleh

Hi Trilok,

On Tue, 10 Feb 2026 at 18:58, Trilok Soni
<trilokkumar.soni@oss.qualcomm.com> wrote:
>
> On 1/19/2026 4:45 AM, Will Deacon wrote:
> > Hi folks,
> >
> > It's back and it's even bigger than before!
> >
> > Although the first patch has been picked up as a fix (thanks, Oliver),
> > review feedback has resulted in some additional patches being included
> > in the series. If you'd like to see the first version, it's available
> > here:
>
> Is it possible to test this patch series with QEMU, or will it require a
> real SoC platform?

QEMU works just fine. Just make sure you use the kvmtool patches [1]
Will mentioned in v1, and to boot the kernel with
`kvm-arm.mode=protected`.
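
A host boot along the following lines should work — a sketch only: the
Image/rootfs paths and the CPU/memory sizes are placeholders, and the only
option taken from this thread is the `kvm-arm.mode=protected` kernel
parameter (`virtualization=on` is what exposes EL2 so the host kernel can
run KVM, and hence pKVM, at all):

```sh
# Hypothetical invocation: Image/rootfs paths and sizes are placeholders.
# -machine virt,virtualization=on exposes EL2 to the host kernel so that
# KVM (and hence pKVM) can initialise.
qemu-system-aarch64 \
    -machine virt,virtualization=on \
    -cpu max -smp 4 -m 2G \
    -nographic \
    -kernel Image \
    -initrd rootfs.cpio.gz \
    -append "console=ttyAMA0 kvm-arm.mode=protected"
```

Guests are then launched from inside the host using the pkvm kvmtool
branch at [1].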

Cheers,
/fuad

[1] https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/log/?h=pkvm

>
> ---Trilok Soni



* Re: [PATCH v2 14/35] KVM: arm64: Handle aborts from protected VMs
  2026-01-19 12:46 ` [PATCH v2 14/35] KVM: arm64: Handle aborts from protected VMs Will Deacon
@ 2026-02-12 10:37   ` Alexandru Elisei
  2026-03-04 14:06     ` Will Deacon
  2026-03-11 10:24   ` Fuad Tabba
  1 sibling, 1 reply; 54+ messages in thread
From: Alexandru Elisei @ 2026-02-12 10:37 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
	Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Hi Will,

On Mon, Jan 19, 2026 at 12:46:07PM +0000, Will Deacon wrote:
> Introduce a new abort handler for resolving stage-2 page faults from
> protected VMs by pinning and donating anonymous memory. This is
> considerably simpler than the infamous user_mem_abort() as we only have
> to deal with translation faults at the pte level.
> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kvm/mmu.c | 89 ++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 81 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index a23a4b7f108c..b21a5bf3d104 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1641,6 +1641,74 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	return ret != -EAGAIN ? ret : 0;
>  }
>  
> +static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> +		struct kvm_memory_slot *memslot, unsigned long hva)
> +{
> +	unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
> +	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> +	struct mm_struct *mm = current->mm;
> +	struct kvm *kvm = vcpu->kvm;
> +	void *hyp_memcache;
> +	struct page *page;
> +	int ret;
> +
> +	ret = prepare_mmu_memcache(vcpu, true, &hyp_memcache);
> +	if (ret)
> +		return -ENOMEM;
> +
> +	ret = account_locked_vm(mm, 1, true);
> +	if (ret)
> +		return ret;
> +
> +	mmap_read_lock(mm);
> +	ret = pin_user_pages(hva, 1, flags, &page);
> +	mmap_read_unlock(mm);

If the page is part of a large folio, the entire folio gets pinned here, not
just the page returned by pin_user_pages(). Do you reckon that should be
considered when calling account_locked_vm()?

> +
> +	if (ret == -EHWPOISON) {
> +		kvm_send_hwpoison_signal(hva, PAGE_SHIFT);
> +		ret = 0;
> +		goto dec_account;
> +	} else if (ret != 1) {
> +		ret = -EFAULT;
> +		goto dec_account;
> +	} else if (!folio_test_swapbacked(page_folio(page))) {
> +		/*
> +		 * We really can't deal with page-cache pages returned by GUP
> +		 * because (a) we may trigger writeback of a page for which we
> +		 * no longer have access and (b) page_mkclean() won't find the
> +		 * stage-2 mapping in the rmap so we can get out-of-whack with
> +		 * the filesystem when marking the page dirty during unpinning
> +		 * (see cc5095747edf ("ext4: don't BUG if someone dirty pages
> +		 * without asking ext4 first")).

I've been trying to wrap my head around this. Would you mind providing a few
more hints about what the issue is? I'm sure the approach is correct, it's
likely just me not being familiar with the code.

> +		 *
> +		 * Ideally we'd just restrict ourselves to anonymous pages, but
> +		 * we also want to allow memfd (i.e. shmem) pages, so check for
> +		 * pages backed by swap in the knowledge that the GUP pin will
> +		 * prevent try_to_unmap() from succeeding.
> +		 */
> +		ret = -EIO;
> +		goto unpin;
> +	}
> +
> +	write_lock(&kvm->mmu_lock);
> +	ret = pkvm_pgtable_stage2_map(pgt, fault_ipa, PAGE_SIZE,
> +				      page_to_phys(page), KVM_PGTABLE_PROT_RWX,
> +				      hyp_memcache, 0);
> +	write_unlock(&kvm->mmu_lock);
> +	if (ret) {
> +		if (ret == -EAGAIN)
> +			ret = 0;
> +		goto unpin;
> +	}

This looks correct to me, there's no need to check for the notifier sequence
number if the MMU notifiers are ignored. And concurrent faults on the same page
are handled by treating -EAGAIN as success.
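
That -EAGAIN handling can be modelled in isolation — a user-space sketch,
where fake_stage2_map() is a hypothetical stand-in for
pkvm_pgtable_stage2_map() losing the race to a concurrent fault on the
same page:

```c
#include <errno.h>

/* Hypothetical stand-in for pkvm_pgtable_stage2_map(): when another
 * vCPU wins the race, the real helper returns -EAGAIN because a valid
 * mapping already exists at the faulting IPA. */
static int fake_stage2_map(int lost_race)
{
	return lost_race ? -EAGAIN : 0;
}

/* Mirrors the error handling in pkvm_mem_abort(): losing the race is
 * not a failure, since the fault has been resolved either way (the
 * loser still has to drop its GUP pin, which this sketch omits). */
static int handle_fault(int lost_race)
{
	int ret = fake_stage2_map(lost_race);

	if (ret == -EAGAIN)
		ret = 0;
	return ret;
}
```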

> +
> +	return 0;
> +unpin:
> +	unpin_user_pages(&page, 1);
> +dec_account:
> +	account_locked_vm(mm, 1, false);
> +	return ret;
> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  struct kvm_s2_trans *nested,
>  			  struct kvm_memory_slot *memslot, unsigned long hva,
> @@ -2190,15 +2258,20 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  		goto out_unlock;
>  	}
>  
> -	VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
> -			!write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
> +	if (kvm_vm_is_protected(vcpu->kvm)) {
> +		ret = pkvm_mem_abort(vcpu, fault_ipa, memslot, hva);

I guess the reason this comes after handling an access fault is because you want
the WARN_ON() to trigger in pkvm_pgtable_stage2_mkyoung().

Thanks,
Alex

> +	} else {
> +		VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
> +				!write_fault &&
> +				!kvm_vcpu_trap_is_exec_fault(vcpu));
>  
> -	if (kvm_slot_has_gmem(memslot))
> -		ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
> -				 esr_fsc_is_permission_fault(esr));
> -	else
> -		ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> -				     esr_fsc_is_permission_fault(esr));
> +		if (kvm_slot_has_gmem(memslot))
> +			ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
> +					 esr_fsc_is_permission_fault(esr));
> +		else
> +			ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> +					     esr_fsc_is_permission_fault(esr));
> +	}
>  	if (ret == 0)
>  		ret = 1;
>  out:
> -- 
> 2.52.0.457.g6b5491de43-goog
> 
> 



* Re: [PATCH v2 24/35] KVM: arm64: Introduce hypercall to force reclaim of a protected page
  2026-01-19 12:46 ` [PATCH v2 24/35] KVM: arm64: Introduce hypercall to force reclaim of a protected page Will Deacon
@ 2026-02-12 17:18   ` Alexandru Elisei
  2026-03-04 14:08     ` Will Deacon
  0 siblings, 1 reply; 54+ messages in thread
From: Alexandru Elisei @ 2026-02-12 17:18 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
	Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Hi Will,

On Mon, Jan 19, 2026 at 12:46:17PM +0000, Will Deacon wrote:
> Introduce a new hypercall, __pkvm_force_reclaim_guest_page(), to allow
> the host to forcefully reclaim a physical page that was previously donated
> to a protected guest. This results in the page being zeroed and the
> previous guest mapping being poisoned so that new pages cannot be
> subsequently donated at the same IPA.
> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_asm.h              |   1 +
>  arch/arm64/include/asm/kvm_pgtable.h          |   6 +
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   1 +
>  arch/arm64/kvm/hyp/include/nvhe/memory.h      |   6 +
>  arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |   1 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            |   8 ++
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 127 +++++++++++++++++-
>  arch/arm64/kvm/hyp/nvhe/pkvm.c                |   4 +-
>  8 files changed, 152 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 2e7e8e7771f6..39e4e588ca4f 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -90,6 +90,7 @@ enum __kvm_host_smccc_func {
>  	__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
> +	__KVM_HOST_SMCCC_FUNC___pkvm_force_reclaim_guest_page,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_reclaim_dying_guest_page,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index eb2a6258d83d..4c069f875a85 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -114,6 +114,12 @@ enum kvm_invalid_pte_type {
>  	 * ownership.
>  	 */
>  	KVM_HOST_INVALID_PTE_TYPE_DONATION,
> +
> +	/*
> +	 * The page has been forcefully reclaimed from the guest by the
> +	 * host.
> +	 */
> +	KVM_GUEST_INVALID_PTE_TYPE_POISONED,
>  };
>  
>  static inline bool kvm_pte_valid(kvm_pte_t pte)
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index cde38a556049..f27b037abaf3 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -41,6 +41,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
>  int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
>  int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
>  int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
> +int __pkvm_host_force_reclaim_page_guest(phys_addr_t phys);
>  int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm);
>  int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
>  			    enum kvm_pgtable_prot prot);
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> index dee1a406b0c2..4cedb720c75d 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> @@ -30,6 +30,12 @@ enum pkvm_page_state {
>  	 * struct hyp_page.
>  	 */
>  	PKVM_NOPAGE			= BIT(0) | BIT(1),
> +
> +	/*
> +	 * 'Meta-states' which aren't encoded directly in the PTE's SW bits (or
> +	 * the hyp_vmemmap entry for the host)
> +	 */
> +	PKVM_POISON			= BIT(2),
>  };
>  #define PKVM_PAGE_STATE_MASK		(BIT(0) | BIT(1))

Looks a bit awkward to me, having the page state encoded using 3 bits, but the
mask only 2 bits.
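
The asymmetry is visible directly from the constants in the hunk above:
PKVM_POISON lies entirely outside PKVM_PAGE_STATE_MASK, so it can never be
stored in (or extracted from) the PTE software bits — it only exists as a
'meta-state' return value of guest_get_page_state(). A standalone check of
the values (BIT() redefined here so the snippet compiles in user space):

```c
/* Constants copied from the quoted hunk in nvhe/memory.h. */
#define BIT(n)			(1UL << (n))

#define PKVM_NOPAGE		(BIT(0) | BIT(1))
#define PKVM_POISON		BIT(2)	/* 'meta-state': never stored in a PTE */
#define PKVM_PAGE_STATE_MASK	(BIT(0) | BIT(1))

/* States encoded in the PTE SW bits survive the mask unchanged... */
_Static_assert((PKVM_NOPAGE & PKVM_PAGE_STATE_MASK) == PKVM_NOPAGE,
	       "NOPAGE fits in the SW bits");
/* ...but PKVM_POISON cannot be represented there at all. */
_Static_assert((PKVM_POISON & PKVM_PAGE_STATE_MASK) == 0,
	       "POISON lies outside the 2-bit mask");
```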

Thanks,
Alex

>  
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> index 506831804f64..a5a7bb453f3e 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> @@ -78,6 +78,7 @@ int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 gfn);
>  int __pkvm_start_teardown_vm(pkvm_handle_t handle);
>  int __pkvm_finalize_teardown_vm(pkvm_handle_t handle);
>  
> +struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle);
>  struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
>  					 unsigned int vcpu_idx);
>  void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu);
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index f43c50ae2d81..e68b5d24bdad 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -570,6 +570,13 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
>  	cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
>  }
>  
> +static void handle___pkvm_force_reclaim_guest_page(struct kvm_cpu_context *host_ctxt)
> +{
> +	DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
> +
> +	cpu_reg(host_ctxt, 1) = __pkvm_host_force_reclaim_page_guest(phys);
> +}
> +
>  static void handle___pkvm_reclaim_dying_guest_page(struct kvm_cpu_context *host_ctxt)
>  {
>  	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
> @@ -631,6 +638,7 @@ static const hcall_t host_hcall[] = {
>  	HANDLE_FUNC(__pkvm_unreserve_vm),
>  	HANDLE_FUNC(__pkvm_init_vm),
>  	HANDLE_FUNC(__pkvm_init_vcpu),
> +	HANDLE_FUNC(__pkvm_force_reclaim_guest_page),
>  	HANDLE_FUNC(__pkvm_reclaim_dying_guest_page),
>  	HANDLE_FUNC(__pkvm_start_teardown_vm),
>  	HANDLE_FUNC(__pkvm_finalize_teardown_vm),
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index f4638fe9d77a..49b309b8d7d2 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -613,6 +613,35 @@ static u64 host_stage2_encode_gfn_meta(struct pkvm_hyp_vm *vm, u64 gfn)
>  	       FIELD_PREP(KVM_HOST_PTE_OWNER_GUEST_GFN_MASK, gfn);
>  }
>  
> +static int host_stage2_decode_gfn_meta(kvm_pte_t pte, struct pkvm_hyp_vm **vm,
> +				       u64 *gfn)
> +{
> +	pkvm_handle_t handle;
> +	u64 meta;
> +
> +	if (WARN_ON(kvm_pte_valid(pte)))
> +		return -EINVAL;
> +
> +	if (FIELD_GET(KVM_INVALID_PTE_TYPE_MASK, pte) !=
> +	    KVM_HOST_INVALID_PTE_TYPE_DONATION) {
> +		return -EINVAL;
> +	}
> +
> +	if (FIELD_GET(KVM_HOST_DONATION_PTE_OWNER_MASK, pte) != PKVM_ID_GUEST)
> +		return -EPERM;
> +
> +	meta = FIELD_GET(KVM_HOST_DONATION_PTE_EXTRA_MASK, pte);
> +	handle = FIELD_GET(KVM_HOST_PTE_OWNER_GUEST_HANDLE_MASK, meta);
> +	*vm = get_vm_by_handle(handle);
> +	if (!*vm) {
> +		/* We probably raced with teardown; try again */
> +		return -EAGAIN;
> +	}
> +
> +	*gfn = FIELD_GET(KVM_HOST_PTE_OWNER_GUEST_GFN_MASK, meta);
> +	return 0;
> +}
> +
>  static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
>  {
>  	/*
> @@ -809,8 +838,20 @@ static int __hyp_check_page_state_range(phys_addr_t phys, u64 size, enum pkvm_pa
>  	return 0;
>  }
>  
> +static bool guest_pte_is_poisoned(kvm_pte_t pte)
> +{
> +	if (kvm_pte_valid(pte))
> +		return false;
> +
> +	return FIELD_GET(KVM_INVALID_PTE_TYPE_MASK, pte) ==
> +	       KVM_GUEST_INVALID_PTE_TYPE_POISONED;
> +}
> +
>  static enum pkvm_page_state guest_get_page_state(kvm_pte_t pte, u64 addr)
>  {
> +	if (guest_pte_is_poisoned(pte))
> +		return PKVM_POISON;
> +
>  	if (!kvm_pte_valid(pte))
>  		return PKVM_NOPAGE;
>  
> @@ -839,6 +880,8 @@ static int get_valid_guest_pte(struct pkvm_hyp_vm *vm, u64 ipa, kvm_pte_t *ptep,
>  	ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
>  	if (ret)
>  		return ret;
> +	if (guest_pte_is_poisoned(pte))
> +		return -EHWPOISON;
>  	if (!kvm_pte_valid(pte))
>  		return -ENOENT;
>  	if (level != KVM_PGTABLE_LAST_LEVEL)
> @@ -1104,6 +1147,84 @@ static void hyp_poison_page(phys_addr_t phys)
>  	hyp_fixmap_unmap();
>  }
>  
> +static int host_stage2_get_guest_info(phys_addr_t phys, struct pkvm_hyp_vm **vm,
> +				      u64 *gfn)
> +{
> +	enum pkvm_page_state state;
> +	kvm_pte_t pte;
> +	s8 level;
> +	int ret;
> +
> +	if (!addr_is_memory(phys))
> +		return -EFAULT;
> +
> +	state = get_host_state(hyp_phys_to_page(phys));
> +	switch (state) {
> +	case PKVM_PAGE_OWNED:
> +	case PKVM_PAGE_SHARED_OWNED:
> +	case PKVM_PAGE_SHARED_BORROWED:
> +		/* The access should no longer fault; try again. */
> +		return -EAGAIN;
> +	case PKVM_NOPAGE:
> +		break;
> +	default:
> +		return -EPERM;
> +	}
> +
> +	ret = kvm_pgtable_get_leaf(&host_mmu.pgt, phys, &pte, &level);
> +	if (ret)
> +		return ret;
> +
> +	if (WARN_ON(level != KVM_PGTABLE_LAST_LEVEL))
> +		return -EINVAL;
> +
> +	return host_stage2_decode_gfn_meta(pte, vm, gfn);
> +}
> +
> +int __pkvm_host_force_reclaim_page_guest(phys_addr_t phys)
> +{
> +	struct pkvm_hyp_vm *vm;
> +	u64 gfn, ipa, pa;
> +	kvm_pte_t pte;
> +	int ret;
> +
> +	hyp_spin_lock(&vm_table_lock);
> +	host_lock_component();
> +
> +	ret = host_stage2_get_guest_info(phys, &vm, &gfn);
> +	if (ret)
> +		goto unlock_host;
> +
> +	ipa = hyp_pfn_to_phys(gfn);
> +	guest_lock_component(vm);
> +	ret = get_valid_guest_pte(vm, ipa, &pte, &pa);
> +	if (ret)
> +		goto unlock_guest;
> +
> +	WARN_ON(pa != phys);
> +	if (guest_get_page_state(pte, ipa) != PKVM_PAGE_OWNED) {
> +		ret = -EPERM;
> +		goto unlock_guest;
> +	}
> +
> +	/* We really shouldn't be allocating, so don't pass a memcache */
> +	ret = kvm_pgtable_stage2_annotate(&vm->pgt, ipa, PAGE_SIZE, NULL,
> +					  KVM_GUEST_INVALID_PTE_TYPE_POISONED,
> +					  0);
> +	if (ret)
> +		goto unlock_guest;
> +
> +	hyp_poison_page(phys);
> +	WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HOST));
> +unlock_guest:
> +	guest_unlock_component(vm);
> +unlock_host:
> +	host_unlock_component();
> +	hyp_spin_unlock(&vm_table_lock);
> +
> +	return ret;
> +}
> +
>  int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
>  {
>  	u64 ipa = hyp_pfn_to_phys(gfn);
> @@ -1138,7 +1259,11 @@ int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
>  	guest_unlock_component(vm);
>  	host_unlock_component();
>  
> -	return ret;
> +	/*
> +	 * -EHWPOISON implies that the page was forcefully reclaimed already
> +	 * so return success for the GUP pin to be dropped.
> +	 */
> +	return ret && ret != -EHWPOISON ? ret : 0;
>  }
>  
>  int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
> diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> index c5772417372d..2836c68c1ea5 100644
> --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> @@ -231,10 +231,12 @@ void pkvm_hyp_vm_table_init(void *tbl)
>  /*
>   * Return the hyp vm structure corresponding to the handle.
>   */
> -static struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle)
> +struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle)
>  {
>  	unsigned int idx = vm_handle_to_idx(handle);
>  
> +	hyp_assert_lock_held(&vm_table_lock);
> +
>  	if (unlikely(idx >= KVM_MAX_PVMS))
>  		return NULL;
>  
> -- 
> 2.52.0.457.g6b5491de43-goog
> 
> 



* Re: [PATCH v2 25/35] KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler
  2026-01-19 12:46 ` [PATCH v2 25/35] KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler Will Deacon
@ 2026-02-12 17:22   ` Alexandru Elisei
  2026-03-04 14:06     ` Will Deacon
  0 siblings, 1 reply; 54+ messages in thread
From: Alexandru Elisei @ 2026-02-12 17:22 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
	Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Hi Will,

It would be nice to merge this with the previous patch, which added the force
reclaim function, as that would make reviewing easier.

On Mon, Jan 19, 2026 at 12:46:18PM +0000, Will Deacon wrote:
> Host kernel accesses to pages that are inaccessible at stage-2 result in
> the injection of a translation fault, which is fatal unless an exception
> table fixup is registered for the faulting PC (e.g. for user access
> routines). This is undesirable, since a get_user_pages() call could be
> used to obtain a reference to a donated page and then a subsequent
> access via a kernel mapping would lead to a panic().
> 
> Rework the spurious fault handler so that stage-2 faults injected back
> into the host result in the target page being forcefully reclaimed when
> no exception table fixup handler is registered.
> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/include/asm/virt.h |  6 ++++++
>  arch/arm64/kvm/pkvm.c         |  7 +++++++
>  arch/arm64/mm/fault.c         | 15 +++++++++------
>  3 files changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
> index b51ab6840f9c..e80addc923a4 100644
> --- a/arch/arm64/include/asm/virt.h
> +++ b/arch/arm64/include/asm/virt.h
> @@ -94,6 +94,12 @@ static inline bool is_pkvm_initialized(void)
>  	       static_branch_likely(&kvm_protected_mode_initialized);
>  }
>  
> +#ifdef CONFIG_KVM
> +bool pkvm_reclaim_guest_page(phys_addr_t phys);
> +#else
> +static inline bool pkvm_reclaim_guest_page(phys_addr_t phys) { return false; }
> +#endif
> +
>  /* Reports the availability of HYP mode */
>  static inline bool is_hyp_mode_available(void)
>  {
> diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
> index 8be91051699e..d1926cb08c76 100644
> --- a/arch/arm64/kvm/pkvm.c
> +++ b/arch/arm64/kvm/pkvm.c
> @@ -563,3 +563,10 @@ int pkvm_pgtable_stage2_split(struct kvm_pgtable *pgt, u64 addr, u64 size,
>  	WARN_ON_ONCE(1);
>  	return -EINVAL;
>  }
> +
> +bool pkvm_reclaim_guest_page(phys_addr_t phys)
> +{
> +	int ret = kvm_call_hyp_nvhe(__pkvm_force_reclaim_guest_page, phys);

Nitpicking here, we have the functions __pkvm_reclaim_page_guest() and this
function, pkvm_reclaim_guest_page(), which calls
__pkvm_force_reclaim_guest_page, which in turn calls
__pkvm_host_force_reclaim_page_guest(). I think having a bit of naming
consistency would be really useful when navigating the source code.

It might also be useful to document that callers of the hypercall
__pkvm_force_reclaim_guest_page are not expected to unpin the page in case of
success, but callers of __pkvm_reclaim_dying_guest_page are.

Thanks,
Alex

> +
> +	return !ret || ret == -EAGAIN;
> +}
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 2294f2061866..5d62abee5262 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -289,9 +289,6 @@ static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
>  	if (!is_el1_data_abort(esr) || !esr_fsc_is_translation_fault(esr))
>  		return false;
>  
> -	if (is_pkvm_stage2_abort(esr))
> -		return false;
> -
>  	local_irq_save(flags);
>  	asm volatile("at s1e1r, %0" :: "r" (addr));
>  	isb();
> @@ -302,8 +299,12 @@ static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
>  	 * If we now have a valid translation, treat the translation fault as
>  	 * spurious.
>  	 */
> -	if (!(par & SYS_PAR_EL1_F))
> +	if (!(par & SYS_PAR_EL1_F)) {
> +		if (is_pkvm_stage2_abort(esr))
> +			return pkvm_reclaim_guest_page(par & SYS_PAR_EL1_PA);
> +
>  		return true;
> +	}
>  
>  	/*
>  	 * If we got a different type of fault from the AT instruction,
> @@ -389,9 +390,11 @@ static void __do_kernel_fault(unsigned long addr, unsigned long esr,
>  	if (!is_el1_instruction_abort(esr) && fixup_exception(regs, esr))
>  		return;
>  
> -	if (WARN_RATELIMIT(is_spurious_el1_translation_fault(addr, esr, regs),
> -	    "Ignoring spurious kernel translation fault at virtual address %016lx\n", addr))
> +	if (is_spurious_el1_translation_fault(addr, esr, regs)) {
> +		WARN_RATELIMIT(!is_pkvm_stage2_abort(esr),
> +			"Ignoring spurious kernel translation fault at virtual address %016lx\n", addr);
>  		return;
> +	}
>  
>  	if (is_el1_mte_sync_tag_check_fault(esr)) {
>  		do_tag_recovery(addr, esr, regs);
> -- 
> 2.52.0.457.g6b5491de43-goog
> 
> 



* Re: [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM
  2026-02-10 18:58 ` [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Trilok Soni
  2026-02-10 19:03   ` Fuad Tabba
@ 2026-02-16 10:58   ` Venkata Rao Kakani
  2026-02-16 11:00     ` Fuad Tabba
  1 sibling, 1 reply; 54+ messages in thread
From: Venkata Rao Kakani @ 2026-02-16 10:58 UTC (permalink / raw)
  To: Trilok Soni, Will Deacon, kvmarm
  Cc: linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
	Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Hi Fuad/Will,

Can you please share the tip of your commits to apply the patch series cleanly?

I could not apply it on the tip of the master branch.

Thanks - Venkat



On 2/11/2026 12:28 AM, Trilok Soni wrote:
> On 1/19/2026 4:45 AM, Will Deacon wrote:
>> Hi folks,
>>
>> It's back and it's even bigger than before!
>>
>> Although the first patch has been picked up as a fix (thanks, Oliver),
>> review feedback has resulted in some additional patches being included
>> in the series. If you'd like to see the first version, it's available
>> here:
> 
> Is it possible to test this patch series with QEMU, or will it require a
> real SoC platform?
> 
> ---Trilok Soni
> 




* Re: [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM
  2026-02-16 10:58   ` Venkata Rao Kakani
@ 2026-02-16 11:00     ` Fuad Tabba
  2026-02-17 10:43       ` Venkata Rao Kakani
  0 siblings, 1 reply; 54+ messages in thread
From: Fuad Tabba @ 2026-02-16 11:00 UTC (permalink / raw)
  To: Venkata Rao Kakani
  Cc: Trilok Soni, Will Deacon, kvmarm, linux-arm-kernel, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Quentin Perret, Vincent Donnefort, Mostafa Saleh

Hi Venkata,

On Mon, 16 Feb 2026 at 10:58, Venkata Rao Kakani
<venkata.kakani@oss.qualcomm.com> wrote:
>
> Hi Fuad/Will,
>
> Can you please share the tip of your commits to apply the patch series cleanly?
>
> I could not apply it on the tip of the master branch.

The link Will provided in the cover letter above didn't work for you?

>As before, patches are based on v6.19-rc4 and are also available at:

>  https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=kvm/protected-memory

I just tested it, and it seems to be fine, based on Linux 6.19-rc4.

Cheers,
/fuad

> Thanks - Venkat
>
>
>
> On 2/11/2026 12:28 AM, Trilok Soni wrote:
> > On 1/19/2026 4:45 AM, Will Deacon wrote:
> >> Hi folks,
> >>
> >> It's back and it's even bigger than before!
> >>
> >> Although the first patch has been picked up as a fix (thanks, Oliver),
> >> review feedback has resulted in some additional patches being included
> >> in the series. If you'd like to see the first version, it's available
> >> here:
> >
> > Is it possible to test this patch series with QEMU, or will it require a
> > real SoC platform?
> >
> > ---Trilok Soni
> >
>



* Re: [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM
  2026-02-16 11:00     ` Fuad Tabba
@ 2026-02-17 10:43       ` Venkata Rao Kakani
  0 siblings, 0 replies; 54+ messages in thread
From: Venkata Rao Kakani @ 2026-02-17 10:43 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Trilok Soni, Will Deacon, kvmarm, linux-arm-kernel, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Quentin Perret, Vincent Donnefort, Mostafa Saleh

Thanks Fuad, it worked.

On 2/16/2026 4:30 PM, Fuad Tabba wrote:
> Hi Venkata,
> 
> On Mon, 16 Feb 2026 at 10:58, Venkata Rao Kakani
> <venkata.kakani@oss.qualcomm.com> wrote:
>>
>> Hi Fuad/Will,
>>
>> Can you please share the tip of your commits to apply the patch series cleanly?
>>
>> I could not apply it on the tip of the master branch.
> 
> The link Will provided in the cover letter above didn't work for you?
> 
>> As before, patches are based on v6.19-rc4 and are also available at:
> 
>>   https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=kvm/protected-memory
> 
> I just tested it, and it seems to be fine, based on Linux 6.19-rc4.
> 
> Cheers,
> /fuad
> 
>> Thanks - Venkat
>>
>>
>>
>> On 2/11/2026 12:28 AM, Trilok Soni wrote:
>>> On 1/19/2026 4:45 AM, Will Deacon wrote:
>>>> Hi folks,
>>>>
>>>> It's back and it's even bigger than before!
>>>>
>>>> Although the first patch has been picked up as a fix (thanks, Oliver),
>>>> review feedback has resulted in some additional patches being included
>>>> in the series. If you'd like to see the first version, it's available
>>>> here:
>>>
>>> Is it possible to test this patch series with QEMU, or will it require a
>>> real SoC platform?
>>>
>>> ---Trilok Soni
>>>
>>




* Re: [PATCH v2 07/35] KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls
  2026-02-10 14:53   ` Alexandru Elisei
@ 2026-03-03 15:45     ` Will Deacon
  2026-03-06 11:33       ` Alexandru Elisei
  0 siblings, 1 reply; 54+ messages in thread
From: Will Deacon @ 2026-03-03 15:45 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
	Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Hi Alex,

Thanks for having a look.

On Tue, Feb 10, 2026 at 02:53:15PM +0000, Alexandru Elisei wrote:
> On Mon, Jan 19, 2026 at 12:46:00PM +0000, Will Deacon wrote:
> > When pKVM is not enabled, the host shouldn't issue pKVM-specific
> > hypercalls and so there's no point checking for this in the pKVM
> > hypercall handlers.
> > 
> > Remove the redundant is_protected_kvm_enabled() checks from each
> > hypercall and instead rejig the hypercall table so that the
> > pKVM-specific hypercalls are unreachable when pKVM is not being used.
> > 
> > Reviewed-by: Quentin Perret <qperret@google.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_asm.h   | 20 ++++++----
> >  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 63 ++++++++++--------------------
> >  2 files changed, 32 insertions(+), 51 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index a1ad12c72ebf..2076005e9253 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -60,16 +60,9 @@ enum __kvm_host_smccc_func {
> >  	__KVM_HOST_SMCCC_FUNC___vgic_v3_init_lrs,
> >  	__KVM_HOST_SMCCC_FUNC___vgic_v3_get_gic_config,
> >  	__KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
> > +	__KVM_HOST_SMCCC_FUNC_MIN_PKVM = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
> >  
> >  	/* Hypercalls available after pKVM finalisation */
> 
> This comment should be removed; I think the functions that follow, up to
> and including __KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM, are also available with
> kvm-arm.mode=nvhe.
>
> If you agree that the comment should be removed, maybe a different name for
> the define above would be more appropriate, one that does not imply pkvm?

I'd rather keep the comment, as it delimits the blocks of hypercalls and
is informative for the case when pKVM is enabled.

I suppose we could reword it like:


diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index c4246c34509a..6c79f7504d80 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -51,7 +51,7 @@
 #include <linux/mm.h>
 
 enum __kvm_host_smccc_func {
-       /* Hypercalls available only prior to pKVM finalisation */
+       /* Hypercalls that are unavailable once pKVM has finalised. */
        /* __KVM_HOST_SMCCC_FUNC___kvm_hyp_init */
        __KVM_HOST_SMCCC_FUNC___pkvm_init = __KVM_HOST_SMCCC_FUNC___kvm_hyp_init + 1,
        __KVM_HOST_SMCCC_FUNC___pkvm_create_private_mapping,
@@ -62,7 +62,7 @@ enum __kvm_host_smccc_func {
        __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
        __KVM_HOST_SMCCC_FUNC_MIN_PKVM = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
 
-       /* Hypercalls available after pKVM finalisation */
+       /* Hypercalls that are always available and common to [nh]VHE/pKVM. */
        __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
        __KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
        __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
@@ -76,7 +76,7 @@ enum __kvm_host_smccc_func {
        __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
        __KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM = __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
 
-       /* Hypercalls available only when pKVM has finalised */
+       /* Hypercalls that are available only when pKVM has finalised. */
        __KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
        __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
        __KVM_HOST_SMCCC_FUNC___pkvm_host_donate_guest,


WDYT?

> > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > index a7c689152f68..eb5cfe32b2c9 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > @@ -169,9 +169,6 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
> >  	DECLARE_REG(u64, hcr_el2, host_ctxt, 3);
> >  	struct pkvm_hyp_vcpu *hyp_vcpu;
> >  
> > -	if (!is_protected_kvm_enabled())
> > -		return;
> > -
> >  	hyp_vcpu = pkvm_load_hyp_vcpu(handle, vcpu_idx);
> >  	if (!hyp_vcpu)
> >  		return;
> 
> I've always wondered about this. For some hypercalls, all the handler does is
> marshal the arguments for the actual function (for example,
> handle___kvm_adjust_pc() -> __kvm_adjust_pc()), but for others, like this one,
> the handler also has extra checks before calling the actual function.  Would you
> mind explaining what the rationale is?

Basically, any hypercall available post-pKVM finalisation needs to check
all pointer arguments that it takes. It's best to do this in the early
handler so that the backend code can just operate on a safe pointer
(either because the underlying memory has been pinned or because it's
been repainted to point at a hypervisor-managed data structure). That
also allows us to share a bunch of code (e.g. __kvm_vcpu_run()) with
nVHE.

The reason handle___kvm_adjust_pc() doesn't do this is simply because
this series focusses purely on the guest memory side of things; once
we've got that, then we can work on hardening the vCPU/VM structures and
these hypercalls will get tightened up by that work. In fact, that
specific hypercall will do _nothing_ for a protected VM!

Will


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 25/35] KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler
  2026-02-12 17:22   ` Alexandru Elisei
@ 2026-03-04 14:06     ` Will Deacon
  0 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-03-04 14:06 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
	Fuad Tabba, Vincent Donnefort, Mostafa Saleh

On Thu, Feb 12, 2026 at 05:22:28PM +0000, Alexandru Elisei wrote:
> Would be nice to merge this with the previous patch, that added the force
> reclaim function, as it would make reviewing easier.

I deliberately kept them separate as the previous patch isn't exactly
small and this makes the EL1 part easier to review in isolation.

> > diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
> > index 8be91051699e..d1926cb08c76 100644
> > --- a/arch/arm64/kvm/pkvm.c
> > +++ b/arch/arm64/kvm/pkvm.c
> > @@ -563,3 +563,10 @@ int pkvm_pgtable_stage2_split(struct kvm_pgtable *pgt, u64 addr, u64 size,
> >  	WARN_ON_ONCE(1);
> >  	return -EINVAL;
> >  }
> > +
> > +bool pkvm_reclaim_guest_page(phys_addr_t phys)
> > +{
> > +	int ret = kvm_call_hyp_nvhe(__pkvm_force_reclaim_guest_page, phys);
> 
> Nitpicking here, we have the functions __pkvm_reclaim_page_guest() and this
> function, pkvm_reclaim_guest_page(), which calls
> __pkvm_force_reclaim_guest_page, which in turn calls
> __pkvm_host_force_reclaim_page_guest(). I think having a bit of naming
> consistency would be really useful when navigating the source code.

Yuck, you're right! I'll make sure 'force' is used consistently on this
path.

> It might also be useful to document that callers of the hypercall
> __pkvm_force_reclaim_guest_page are not expected to unpin the page in case of
> success, but callers of __pkvm_reclaim_dying_guest_page are.

I'll add a comment, but this is the private hypercall interface with the
host and, as LWN might put it, it's rigorously undocumented. Unpinning
memory from panic context is an interesting idea ;)

Will


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 14/35] KVM: arm64: Handle aborts from protected VMs
  2026-02-12 10:37   ` Alexandru Elisei
@ 2026-03-04 14:06     ` Will Deacon
  2026-03-06 11:34       ` Alexandru Elisei
  0 siblings, 1 reply; 54+ messages in thread
From: Will Deacon @ 2026-03-04 14:06 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
	Fuad Tabba, Vincent Donnefort, Mostafa Saleh

On Thu, Feb 12, 2026 at 10:37:19AM +0000, Alexandru Elisei wrote:
> On Mon, Jan 19, 2026 at 12:46:07PM +0000, Will Deacon wrote:
> > Introduce a new abort handler for resolving stage-2 page faults from
> > protected VMs by pinning and donating anonymous memory. This is
> > considerably simpler than the infamous user_mem_abort() as we only have
> > to deal with translation faults at the pte level.
> > 
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/kvm/mmu.c | 89 ++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 81 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index a23a4b7f108c..b21a5bf3d104 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1641,6 +1641,74 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  	return ret != -EAGAIN ? ret : 0;
> >  }
> >  
> > +static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > +		struct kvm_memory_slot *memslot, unsigned long hva)
> > +{
> > +	unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
> > +	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> > +	struct mm_struct *mm = current->mm;
> > +	struct kvm *kvm = vcpu->kvm;
> > +	void *hyp_memcache;
> > +	struct page *page;
> > +	int ret;
> > +
> > +	ret = prepare_mmu_memcache(vcpu, true, &hyp_memcache);
> > +	if (ret)
> > +		return -ENOMEM;
> > +
> > +	ret = account_locked_vm(mm, 1, true);
> > +	if (ret)
> > +		return ret;
> > +
> > +	mmap_read_lock(mm);
> > +	ret = pin_user_pages(hva, 1, flags, &page);
> > +	mmap_read_unlock(mm);
> 
> If the page is part of a large folio, the entire folio gets pinned here, not
> just the page returned by pin_user_pages(). Do you reckon that should be
> considered when calling account_locked_vm()?

I don't _think_ so.

Since we only ask for a single page when we call pin_user_pages(), the
folio refcount will be adjusted by 1, even for large folios. Trying to
adjust the accounting based on whether the pinned page forms part of a
large folio feels error-prone, not least because the migration triggered
by the longterm pin could actually end up splitting the folio but also
because we'd have to avoid double accounting on subsequent faults to the
same folio. It also feels fragile if the mm code is able to split
partially pinned folios in future (like it appears to be able to for
partially mapped folios).

> > +	if (ret == -EHWPOISON) {
> > +		kvm_send_hwpoison_signal(hva, PAGE_SHIFT);
> > +		ret = 0;
> > +		goto dec_account;
> > +	} else if (ret != 1) {
> > +		ret = -EFAULT;
> > +		goto dec_account;
> > +	} else if (!folio_test_swapbacked(page_folio(page))) {
> > +		/*
> > +		 * We really can't deal with page-cache pages returned by GUP
> > +		 * because (a) we may trigger writeback of a page for which we
> > +		 * no longer have access and (b) page_mkclean() won't find the
> > +		 * stage-2 mapping in the rmap so we can get out-of-whack with
> > +		 * the filesystem when marking the page dirty during unpinning
> > +		 * (see cc5095747edf ("ext4: don't BUG if someone dirty pages
> > +		 * without asking ext4 first")).
> 
> I've been trying to wrap my head around this. Would you mind providing a few
> more hints about what the issue is? I'm sure the approach is correct, it's
> likely just me not being familiar with the code.

The fundamental problem is that unmapping page-cache pages from the host
stage-2 can confuse filesystems, which don't know either that the page is
now inaccessible (and so may attempt to access it) or that the page can
be accessed concurrently by the guest without updating the page state.

To fix those issues, we would need to support MMU notifiers for protected
memory but that would allow the host to mess with the guest stage-2
page-table, which breaks the security model that we're trying to uphold.

> > @@ -2190,15 +2258,20 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >  		goto out_unlock;
> >  	}
> >  
> > -	VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
> > -			!write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
> > +	if (kvm_vm_is_protected(vcpu->kvm)) {
> > +		ret = pkvm_mem_abort(vcpu, fault_ipa, memslot, hva);
> 
> I guess the reason this comes after handling an access fault is because you want
> the WARN_ON() to trigger in pkvm_pgtable_stage2_mkyoung().

Right, we should only ever see translation faults for protected guests
and that's all that pkvm_mem_abort() is prepared to handle, so we call
it last.

Will


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 24/35] KVM: arm64: Introduce hypercall to force reclaim of a protected page
  2026-02-12 17:18   ` Alexandru Elisei
@ 2026-03-04 14:08     ` Will Deacon
  0 siblings, 0 replies; 54+ messages in thread
From: Will Deacon @ 2026-03-04 14:08 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
	Fuad Tabba, Vincent Donnefort, Mostafa Saleh

On Thu, Feb 12, 2026 at 05:18:42PM +0000, Alexandru Elisei wrote:
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > index dee1a406b0c2..4cedb720c75d 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > @@ -30,6 +30,12 @@ enum pkvm_page_state {
> >  	 * struct hyp_page.
> >  	 */
> >  	PKVM_NOPAGE			= BIT(0) | BIT(1),
> > +
> > +	/*
> > +	 * 'Meta-states' which aren't encoded directly in the PTE's SW bits (or
> > +	 * the hyp_vmemmap entry for the host)
> > +	 */
> > +	PKVM_POISON			= BIT(2),
> >  };
> >  #define PKVM_PAGE_STATE_MASK		(BIT(0) | BIT(1))
> 
> Looks a bit awkward to me, having the page state encoded using 3 bits, but the
> mask only 2 bits.

It's a little fiddly because we have three ways to track the page state:

1. In the two software bits of the pte mapping the page. This uses
   PKVM_PAGE_STATE_PROT_MASK.

2. In the four bits of each 'struct hyp_page' entry in the
   'hyp_vmemmap'. This means we can avoid fragmenting the host stage-2
   page-table for pages that are shared. These use PKVM_PAGE_STATE_MASK.

3. States derived from an invalid pte that are never stored explicitly.

PKVM_POISON fits into the last category, and so isn't constrained by the
masks.

Perhaps I should rename PKVM_PAGE_STATE_MASK to something like
PKVM_PAGE_STATE_VMEMMAP_MASK to make it clearer?

Will


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 07/35] KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls
  2026-03-03 15:45     ` Will Deacon
@ 2026-03-06 11:33       ` Alexandru Elisei
  0 siblings, 0 replies; 54+ messages in thread
From: Alexandru Elisei @ 2026-03-06 11:33 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
	Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Hi Will,

On Tue, Mar 03, 2026 at 03:45:16PM +0000, Will Deacon wrote:
> Hi Alex,
> 
> Thanks for having a look.
> 
> On Tue, Feb 10, 2026 at 02:53:15PM +0000, Alexandru Elisei wrote:
> > On Mon, Jan 19, 2026 at 12:46:00PM +0000, Will Deacon wrote:
> > > When pKVM is not enabled, the host shouldn't issue pKVM-specific
> > > hypercalls and so there's no point checking for this in the pKVM
> > > hypercall handlers.
> > > 
> > > Remove the redundant is_protected_kvm_enabled() checks from each
> > > hypercall and instead rejig the hypercall table so that the
> > > pKVM-specific hypercalls are unreachable when pKVM is not being used.
> > > 
> > > Reviewed-by: Quentin Perret <qperret@google.com>
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > > ---
> > >  arch/arm64/include/asm/kvm_asm.h   | 20 ++++++----
> > >  arch/arm64/kvm/hyp/nvhe/hyp-main.c | 63 ++++++++++--------------------
> > >  2 files changed, 32 insertions(+), 51 deletions(-)
> > > 
> > > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > > index a1ad12c72ebf..2076005e9253 100644
> > > --- a/arch/arm64/include/asm/kvm_asm.h
> > > +++ b/arch/arm64/include/asm/kvm_asm.h
> > > @@ -60,16 +60,9 @@ enum __kvm_host_smccc_func {
> > >  	__KVM_HOST_SMCCC_FUNC___vgic_v3_init_lrs,
> > >  	__KVM_HOST_SMCCC_FUNC___vgic_v3_get_gic_config,
> > >  	__KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
> > > +	__KVM_HOST_SMCCC_FUNC_MIN_PKVM = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
> > >  
> > >  	/* Hypercalls available after pKVM finalisation */
> > 
> > This comment should be removed; I think the functions that follow, up to
> > and including __KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM, are also available with
> > kvm-arm.mode=nvhe.
> >
> > If you agree that the comment should be removed, maybe a different name for
> > the define above would be more appropriate, one that does not imply pkvm?
> 
> I'd rather keep the comment, as it delimits the blocks of hypercalls and
> is informative for the case when pKVM is enabled.
> 
> I suppose we could reword it like:
> 
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index c4246c34509a..6c79f7504d80 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -51,7 +51,7 @@
>  #include <linux/mm.h>
>  
>  enum __kvm_host_smccc_func {
> -       /* Hypercalls available only prior to pKVM finalisation */
> +       /* Hypercalls that are unavailable once pKVM has finalised. */
>         /* __KVM_HOST_SMCCC_FUNC___kvm_hyp_init */
>         __KVM_HOST_SMCCC_FUNC___pkvm_init = __KVM_HOST_SMCCC_FUNC___kvm_hyp_init + 1,
>         __KVM_HOST_SMCCC_FUNC___pkvm_create_private_mapping,
> @@ -62,7 +62,7 @@ enum __kvm_host_smccc_func {
>         __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
>         __KVM_HOST_SMCCC_FUNC_MIN_PKVM = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
>  
> -       /* Hypercalls available after pKVM finalisation */
> +       /* Hypercalls that are always available and common to [nh]VHE/pKVM. */
>         __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
>         __KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
>         __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
> @@ -76,7 +76,7 @@ enum __kvm_host_smccc_func {
>         __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
>         __KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM = __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
>  
> -       /* Hypercalls available only when pKVM has finalised */
> +       /* Hypercalls that are available only when pKVM has finalised. */
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
>         __KVM_HOST_SMCCC_FUNC___pkvm_host_donate_guest,
> 
> 
> WDYT?

Looks good to me (but you already figured that out in the updated series).

> 
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > index a7c689152f68..eb5cfe32b2c9 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> > > @@ -169,9 +169,6 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
> > >  	DECLARE_REG(u64, hcr_el2, host_ctxt, 3);
> > >  	struct pkvm_hyp_vcpu *hyp_vcpu;
> > >  
> > > -	if (!is_protected_kvm_enabled())
> > > -		return;
> > > -
> > >  	hyp_vcpu = pkvm_load_hyp_vcpu(handle, vcpu_idx);
> > >  	if (!hyp_vcpu)
> > >  		return;
> > 
> > I've always wondered about this. For some hypercalls, all the handler does is
> > marshal the arguments for the actual function (for example,
> > handle___kvm_adjust_pc() -> __kvm_adjust_pc()), but for others, like this one,
> > the handler also has extra checks before calling the actual function.  Would you
> > mind explaining what the rationale is?
> 
> Basically, any hypercall available post-pKVM finalisation needs to check
> all pointer arguments that it takes. It's best to do this in the early
> handler so that the backend code can just operate on a safe pointer
> (either because the underlying memory has been pinned or because it's
> been repainted to point at a hypervisor-managed data structure). That
> also allows us to share a bunch of code (e.g. __kvm_vcpu_run()) with
> nVHE.
> 
> The reason handle___kvm_adjust_pc() doesn't do this is simply because
> this series focusses purely on the guest memory side of things; once
> we've got that, then we can work on hardening the vCPU/VM structures and
> these hypercalls will get tightened up by that work. In fact, that
> specific hypercall will do _nothing_ for a protected VM!


That makes sense, thank you for the explanation.

Alex


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 14/35] KVM: arm64: Handle aborts from protected VMs
  2026-03-04 14:06     ` Will Deacon
@ 2026-03-06 11:34       ` Alexandru Elisei
  0 siblings, 0 replies; 54+ messages in thread
From: Alexandru Elisei @ 2026-03-06 11:34 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
	Fuad Tabba, Vincent Donnefort, Mostafa Saleh

Hi Will,

On Wed, Mar 04, 2026 at 02:06:49PM +0000, Will Deacon wrote:
> On Thu, Feb 12, 2026 at 10:37:19AM +0000, Alexandru Elisei wrote:
> > On Mon, Jan 19, 2026 at 12:46:07PM +0000, Will Deacon wrote:
> > > Introduce a new abort handler for resolving stage-2 page faults from
> > > protected VMs by pinning and donating anonymous memory. This is
> > > considerably simpler than the infamous user_mem_abort() as we only have
> > > to deal with translation faults at the pte level.
> > > 
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > > ---
> > >  arch/arm64/kvm/mmu.c | 89 ++++++++++++++++++++++++++++++++++++++++----
> > >  1 file changed, 81 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > index a23a4b7f108c..b21a5bf3d104 100644
> > > --- a/arch/arm64/kvm/mmu.c
> > > +++ b/arch/arm64/kvm/mmu.c
> > > @@ -1641,6 +1641,74 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > >  	return ret != -EAGAIN ? ret : 0;
> > >  }
> > >  
> > > +static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > +		struct kvm_memory_slot *memslot, unsigned long hva)
> > > +{
> > > +	unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
> > > +	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> > > +	struct mm_struct *mm = current->mm;
> > > +	struct kvm *kvm = vcpu->kvm;
> > > +	void *hyp_memcache;
> > > +	struct page *page;
> > > +	int ret;
> > > +
> > > +	ret = prepare_mmu_memcache(vcpu, true, &hyp_memcache);
> > > +	if (ret)
> > > +		return -ENOMEM;
> > > +
> > > +	ret = account_locked_vm(mm, 1, true);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	mmap_read_lock(mm);
> > > +	ret = pin_user_pages(hva, 1, flags, &page);
> > > +	mmap_read_unlock(mm);
> > 
> > If the page is part of a large folio, the entire folio gets pinned here, not
> > just the page returned by pin_user_pages(). Do you reckon that should be
> > considered when calling account_locked_vm()?
> 
> I don't _think_ so.
> 
> Since we only ask for a single page when we call pin_user_pages(), the
> folio refcount will be adjusted by 1, even for large folios. Trying to

For large folios, the _pincount is adjusted by 1 with FOLL_LONGTERM. For
non-large folios, the refcount is increased by GUP_PIN_COUNTING_BIAS == 1024
(try_grab_folio() is where the magic happens).

> adjust the accounting based on whether the pinned page forms part of a
> large folio feels error-prone, not least because the migration triggered
> by the longterm pin could actually end up splitting the folio but also

Hmm.. as far as I can tell pin_user_pages() uses MIGRATE_SYNC to migrate folios
not suitable for longterm pinning, and after migration has completed it attempts
to pin the userspace address again.

Also, split_folio() and friends cannot split a folio for which
folio_maybe_dma_pinned() is true, according to the comments for the various
functions.

> because we'd have to avoid double accounting on subsequent faults to the
> same folio. It also feels fragile if the mm code is able to split
> partially pinned folios in future (like it appears to be able to for
> partially mapped folios).

I'm not sure why mm would want to split a folio that is folio_maybe_dma_pinned(). But
I'm far from being a mm expert, so I do understand why relying on this might
feel fragile.

> 
> > > +	if (ret == -EHWPOISON) {
> > > +		kvm_send_hwpoison_signal(hva, PAGE_SHIFT);
> > > +		ret = 0;
> > > +		goto dec_account;
> > > +	} else if (ret != 1) {
> > > +		ret = -EFAULT;
> > > +		goto dec_account;
> > > +	} else if (!folio_test_swapbacked(page_folio(page))) {
> > > +		/*
> > > +		 * We really can't deal with page-cache pages returned by GUP
> > > +		 * because (a) we may trigger writeback of a page for which we
> > > +		 * no longer have access and (b) page_mkclean() won't find the
> > > +		 * stage-2 mapping in the rmap so we can get out-of-whack with
> > > +		 * the filesystem when marking the page dirty during unpinning
> > > +		 * (see cc5095747edf ("ext4: don't BUG if someone dirty pages
> > > +		 * without asking ext4 first")).
> > 
> > I've been trying to wrap my head around this. Would you mind providing a few
> > more hints about what the issue is? I'm sure the approach is correct, it's
> > likely just me not being familiar with the code.
> 
> The fundamental problem is that unmapping page-cache pages from the host
> stage-2 can confuse filesystems, which don't know either that the page is
> now inaccessible (and so may attempt to access it) or that the page can
> be accessed concurrently by the guest without updating the page state.
> 
> To fix those issues, we would need to support MMU notifiers for protected
> memory but that would allow the host to mess with the guest stage-2
> page-table, which breaks the security model that we're trying to uphold.

Aha, got it, thanks for the explanation!

Alex

> 
> > > @@ -2190,15 +2258,20 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> > >  		goto out_unlock;
> > >  	}
> > >  
> > > -	VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
> > > -			!write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
> > > +	if (kvm_vm_is_protected(vcpu->kvm)) {
> > > +		ret = pkvm_mem_abort(vcpu, fault_ipa, memslot, hva);
> > 
> > I guess the reason this comes after handling an access fault is because you want
> > the WARN_ON() to trigger in pkvm_pgtable_stage2_mkyoung().
> 
> Right, we should only ever see translation faults for protected guests
> and that's all that pkvm_mem_abort() is prepared to handle, so we call
> it last.
> 
> Will


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v2 14/35] KVM: arm64: Handle aborts from protected VMs
  2026-01-19 12:46 ` [PATCH v2 14/35] KVM: arm64: Handle aborts from protected VMs Will Deacon
  2026-02-12 10:37   ` Alexandru Elisei
@ 2026-03-11 10:24   ` Fuad Tabba
  1 sibling, 0 replies; 54+ messages in thread
From: Fuad Tabba @ 2026-03-11 10:24 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
	Vincent Donnefort, Mostafa Saleh

On Mon, 19 Jan 2026 at 12:47, Will Deacon <will@kernel.org> wrote:
>
> Introduce a new abort handler for resolving stage-2 page faults from
> protected VMs by pinning and donating anonymous memory. This is
> considerably simpler than the infamous user_mem_abort() as we only have
> to deal with translation faults at the pte level.
>
> Signed-off-by: Will Deacon <will@kernel.org>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  arch/arm64/kvm/mmu.c | 89 ++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 81 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index a23a4b7f108c..b21a5bf3d104 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1641,6 +1641,74 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>         return ret != -EAGAIN ? ret : 0;
>  }
>
> +static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> +               struct kvm_memory_slot *memslot, unsigned long hva)
> +{
> +       unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
> +       struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> +       struct mm_struct *mm = current->mm;
> +       struct kvm *kvm = vcpu->kvm;
> +       void *hyp_memcache;
> +       struct page *page;
> +       int ret;
> +
> +       ret = prepare_mmu_memcache(vcpu, true, &hyp_memcache);
> +       if (ret)
> +               return -ENOMEM;
> +
> +       ret = account_locked_vm(mm, 1, true);
> +       if (ret)
> +               return ret;
> +
> +       mmap_read_lock(mm);
> +       ret = pin_user_pages(hva, 1, flags, &page);
> +       mmap_read_unlock(mm);
> +
> +       if (ret == -EHWPOISON) {
> +               kvm_send_hwpoison_signal(hva, PAGE_SHIFT);
> +               ret = 0;
> +               goto dec_account;
> +       } else if (ret != 1) {
> +               ret = -EFAULT;
> +               goto dec_account;
> +       } else if (!folio_test_swapbacked(page_folio(page))) {
> +               /*
> +                * We really can't deal with page-cache pages returned by GUP
> +                * because (a) we may trigger writeback of a page for which we
> +                * no longer have access and (b) page_mkclean() won't find the
> +                * stage-2 mapping in the rmap so we can get out-of-whack with
> +                * the filesystem when marking the page dirty during unpinning
> +                * (see cc5095747edf ("ext4: don't BUG if someone dirty pages
> +                * without asking ext4 first")).
> +                *
> +                * Ideally we'd just restrict ourselves to anonymous pages, but
> +                * we also want to allow memfd (i.e. shmem) pages, so check for
> +                * pages backed by swap in the knowledge that the GUP pin will
> +                * prevent try_to_unmap() from succeeding.
> +                */
> +               ret = -EIO;
> +               goto unpin;
> +       }
> +
> +       write_lock(&kvm->mmu_lock);
> +       ret = pkvm_pgtable_stage2_map(pgt, fault_ipa, PAGE_SIZE,
> +                                     page_to_phys(page), KVM_PGTABLE_PROT_RWX,
> +                                     hyp_memcache, 0);
> +       write_unlock(&kvm->mmu_lock);
> +       if (ret) {
> +               if (ret == -EAGAIN)
> +                       ret = 0;
> +               goto unpin;
> +       }
> +
> +       return 0;
> +unpin:
> +       unpin_user_pages(&page, 1);
> +dec_account:
> +       account_locked_vm(mm, 1, false);
> +       return ret;
> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>                           struct kvm_s2_trans *nested,
>                           struct kvm_memory_slot *memslot, unsigned long hva,
> @@ -2190,15 +2258,20 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>                 goto out_unlock;
>         }
>
> -       VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
> -                       !write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
> +       if (kvm_vm_is_protected(vcpu->kvm)) {
> +               ret = pkvm_mem_abort(vcpu, fault_ipa, memslot, hva);
> +       } else {
> +               VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
> +                               !write_fault &&
> +                               !kvm_vcpu_trap_is_exec_fault(vcpu));
>
> -       if (kvm_slot_has_gmem(memslot))
> -               ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
> -                                esr_fsc_is_permission_fault(esr));
> -       else
> -               ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> -                                    esr_fsc_is_permission_fault(esr));
> +               if (kvm_slot_has_gmem(memslot))
> +                       ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
> +                                        esr_fsc_is_permission_fault(esr));
> +               else
> +                       ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
> +                                            esr_fsc_is_permission_fault(esr));
> +       }
>         if (ret == 0)
>                 ret = 1;
>  out:
> --
> 2.52.0.457.g6b5491de43-goog
>



end of thread, other threads:[~2026-03-11 10:25 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-19 12:45 [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
2026-01-19 12:45 ` [PATCH v2 01/35] KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers Will Deacon
2026-01-19 12:45 ` [PATCH v2 02/35] KVM: arm64: Don't leak stage-2 page-table if VM fails to init under pKVM Will Deacon
2026-01-19 12:45 ` [PATCH v2 03/35] KVM: arm64: Move handle check into pkvm_pgtable_stage2_destroy_range() Will Deacon
2026-01-19 12:45 ` [PATCH v2 04/35] KVM: arm64: Rename __pkvm_pgtable_stage2_unmap() Will Deacon
2026-01-19 12:45 ` [PATCH v2 05/35] KVM: arm64: Don't advertise unsupported features for protected guests Will Deacon
2026-01-19 12:45 ` [PATCH v2 06/35] KVM: arm64: Expose self-hosted debug regs as RAZ/WI " Will Deacon
2026-01-19 12:46 ` [PATCH v2 07/35] KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls Will Deacon
2026-02-10 14:53   ` Alexandru Elisei
2026-03-03 15:45     ` Will Deacon
2026-03-06 11:33       ` Alexandru Elisei
2026-01-19 12:46 ` [PATCH v2 08/35] KVM: arm64: Ignore MMU notifier callbacks for protected VMs Will Deacon
2026-01-19 12:46 ` [PATCH v2 09/35] KVM: arm64: Prevent unsupported memslot operations on " Will Deacon
2026-01-19 12:46 ` [PATCH v2 10/35] KVM: arm64: Ignore -EAGAIN when mapping in pages for the pKVM host Will Deacon
2026-01-19 12:46 ` [PATCH v2 11/35] KVM: arm64: Split teardown hypercall into two phases Will Deacon
2026-01-19 12:46 ` [PATCH v2 12/35] KVM: arm64: Introduce __pkvm_host_donate_guest() Will Deacon
2026-01-19 12:46 ` [PATCH v2 13/35] KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map() Will Deacon
2026-01-19 12:46 ` [PATCH v2 14/35] KVM: arm64: Handle aborts from protected VMs Will Deacon
2026-02-12 10:37   ` Alexandru Elisei
2026-03-04 14:06     ` Will Deacon
2026-03-06 11:34       ` Alexandru Elisei
2026-03-11 10:24   ` Fuad Tabba
2026-01-19 12:46 ` [PATCH v2 15/35] KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page() Will Deacon
2026-01-19 12:46 ` [PATCH v2 16/35] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy() Will Deacon
2026-01-19 12:46 ` [PATCH v2 17/35] KVM: arm64: Refactor enter_exception64() Will Deacon
2026-01-19 12:46 ` [PATCH v2 18/35] KVM: arm64: Inject SIGSEGV on illegal accesses Will Deacon
2026-01-19 12:46 ` [PATCH v2 19/35] KVM: arm64: Avoid pointless annotation when mapping host-owned pages Will Deacon
2026-01-19 12:46 ` [PATCH v2 20/35] KVM: arm64: Generalise kvm_pgtable_stage2_set_owner() Will Deacon
2026-01-19 12:46 ` [PATCH v2 21/35] KVM: arm64: Introduce host_stage2_set_owner_metadata_locked() Will Deacon
2026-01-19 12:46 ` [PATCH v2 22/35] KVM: arm64: Change 'pkvm_handle_t' to u16 Will Deacon
2026-01-28 10:28   ` Fuad Tabba
2026-01-19 12:46 ` [PATCH v2 23/35] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2 Will Deacon
2026-01-28 10:29   ` Fuad Tabba
2026-01-19 12:46 ` [PATCH v2 24/35] KVM: arm64: Introduce hypercall to force reclaim of a protected page Will Deacon
2026-02-12 17:18   ` Alexandru Elisei
2026-03-04 14:08     ` Will Deacon
2026-01-19 12:46 ` [PATCH v2 25/35] KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler Will Deacon
2026-02-12 17:22   ` Alexandru Elisei
2026-03-04 14:06     ` Will Deacon
2026-01-19 12:46 ` [PATCH v2 26/35] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte Will Deacon
2026-01-19 12:46 ` [PATCH v2 27/35] KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs Will Deacon
2026-01-19 12:46 ` [PATCH v2 28/35] KVM: arm64: Implement the MEM_SHARE hypercall for " Will Deacon
2026-01-19 12:46 ` [PATCH v2 29/35] KVM: arm64: Implement the MEM_UNSHARE " Will Deacon
2026-01-19 12:46 ` [PATCH v2 30/35] KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled Will Deacon
2026-01-19 12:46 ` [PATCH v2 31/35] KVM: arm64: Add some initial documentation for pKVM Will Deacon
2026-01-19 12:46 ` [PATCH v2 32/35] KVM: arm64: Extend pKVM page ownership selftests to cover guest donation Will Deacon
2026-01-19 12:46 ` [PATCH v2 33/35] KVM: arm64: Register 'selftest_vm' in the VM table Will Deacon
2026-01-19 12:46 ` [PATCH v2 34/35] KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim Will Deacon
2026-01-19 12:46 ` [PATCH v2 35/35] KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs Will Deacon
2026-02-10 18:58 ` [PATCH v2 00/35] KVM: arm64: Add support for protected guest memory with pKVM Trilok Soni
2026-02-10 19:03   ` Fuad Tabba
2026-02-16 10:58   ` Venkata Rao Kakani
2026-02-16 11:00     ` Fuad Tabba
2026-02-17 10:43       ` Venkata Rao Kakani

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox