* [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM
@ 2026-03-27 13:59 Will Deacon
2026-03-27 14:00 ` [PATCH v4 01/38] KVM: arm64: Remove unused PKVM_ID_FFA definition Will Deacon
` (38 more replies)
0 siblings, 39 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 13:59 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Hi again, folks,
Here's v4 of the pKVM protected memory patches previously posted here:
v1: https://lore.kernel.org/kvmarm/20260105154939.11041-1-will@kernel.org/
v2: https://lore.kernel.org/kvmarm/20260119124629.2563-1-will@kernel.org/
v3: https://lore.kernel.org/r/20260305144351.17071-1-will@kernel.org
Changes since v3 include:
* Rebased onto v7.0-rc4
* Removed the unused PKVM_ID_FFA definition
* Made ARM_PKVM_GUEST depend on DMA_RESTRICTED_POOL
* Used FAR_TO_FIPA_OFFSET() instead of open-coding it
* Removed the PROTECTED_VM_UAPI config option and updated the documentation
As before, I've pushed an updated branch with this series:
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=kvm/protected-memory
and the kvmtool patches are available at:
https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/log/?h=pkvm
I fully expect to send a v5, as this is the first time Sashiko has had
a chance to chew on this and I'm expecting a roasting.
Cheers,
Will
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Mostafa Saleh <smostafa@google.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
--->8
Fuad Tabba (1):
KVM: arm64: Expose self-hosted debug regs as RAZ/WI for protected
guests
Quentin Perret (1):
KVM: arm64: Inject SIGSEGV on illegal accesses
Will Deacon (36):
KVM: arm64: Remove unused PKVM_ID_FFA definition
KVM: arm64: Don't leak stage-2 page-table if VM fails to init under
pKVM
KVM: arm64: Move handle check into pkvm_pgtable_stage2_destroy_range()
KVM: arm64: Rename __pkvm_pgtable_stage2_unmap()
KVM: arm64: Don't advertise unsupported features for protected guests
KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls
KVM: arm64: Ignore MMU notifier callbacks for protected VMs
KVM: arm64: Prevent unsupported memslot operations on protected VMs
KVM: arm64: Ignore -EAGAIN when mapping in pages for the pKVM host
KVM: arm64: Split teardown hypercall into two phases
KVM: arm64: Introduce __pkvm_host_donate_guest()
KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map()
KVM: arm64: Handle aborts from protected VMs
KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page()
KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy()
KVM: arm64: Factor out pKVM host exception injection logic
KVM: arm64: Support translation faults in inject_host_exception()
KVM: arm64: Avoid pointless annotation when mapping host-owned pages
KVM: arm64: Generalise kvm_pgtable_stage2_set_owner()
KVM: arm64: Introduce host_stage2_set_owner_metadata_locked()
KVM: arm64: Change 'pkvm_handle_t' to u16
KVM: arm64: Annotate guest donations with handle and gfn in host
stage-2
KVM: arm64: Introduce hypercall to force reclaim of a protected page
KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler
KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte
KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs
KVM: arm64: Implement the MEM_SHARE hypercall for protected VMs
KVM: arm64: Implement the MEM_UNSHARE hypercall for protected VMs
KVM: arm64: Allow userspace to create protected VMs when pKVM is
enabled
KVM: arm64: Add some initial documentation for pKVM
KVM: arm64: Extend pKVM page ownership selftests to cover guest
donation
KVM: arm64: Register 'selftest_vm' in the VM table
KVM: arm64: Extend pKVM page ownership selftests to cover forced
reclaim
KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs
KVM: arm64: Rename PKVM_PAGE_STATE_MASK
drivers/virt: pkvm: Add Kconfig dependency on DMA_RESTRICTED_POOL
.../admin-guide/kernel-parameters.txt | 4 +-
Documentation/virt/kvm/arm/index.rst | 1 +
Documentation/virt/kvm/arm/pkvm.rst | 106 ++++
arch/arm64/include/asm/kvm_asm.h | 31 +-
arch/arm64/include/asm/kvm_host.h | 9 +-
arch/arm64/include/asm/kvm_pgtable.h | 45 +-
arch/arm64/include/asm/kvm_pkvm.h | 4 +-
arch/arm64/include/asm/virt.h | 9 +
arch/arm64/kvm/arm.c | 12 +-
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 10 +-
arch/arm64/kvm/hyp/include/nvhe/memory.h | 12 +-
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 7 +-
.../arm64/kvm/hyp/include/nvhe/trap_handler.h | 2 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 184 +++---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 585 ++++++++++++++++--
arch/arm64/kvm/hyp/nvhe/pkvm.c | 224 ++++++-
arch/arm64/kvm/hyp/nvhe/switch.c | 1 +
arch/arm64/kvm/hyp/nvhe/sys_regs.c | 8 +
arch/arm64/kvm/hyp/pgtable.c | 33 +-
arch/arm64/kvm/mmu.c | 114 +++-
arch/arm64/kvm/pkvm.c | 151 ++++-
arch/arm64/mm/fault.c | 33 +-
drivers/virt/coco/pkvm-guest/Kconfig | 2 +-
include/uapi/linux/kvm.h | 5 +
24 files changed, 1365 insertions(+), 227 deletions(-)
create mode 100644 Documentation/virt/kvm/arm/pkvm.rst
--
2.53.0.1018.g2bb0e51243-goog
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v4 01/38] KVM: arm64: Remove unused PKVM_ID_FFA definition
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Commit 7cbf7c37718e ("KVM: arm64: Drop pkvm_mem_transition for host/hyp
sharing") removed the last users of PKVM_ID_FFA, so drop the definition
altogether.
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 5f9d56754e39..7f25f2bca90c 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -27,7 +27,6 @@ extern struct host_mmu host_mmu;
enum pkvm_component_id {
PKVM_ID_HOST,
PKVM_ID_HYP,
- PKVM_ID_FFA,
};
extern unsigned long hyp_nr_cpus;
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 02/38] KVM: arm64: Don't leak stage-2 page-table if VM fails to init under pKVM
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
If pkvm_init_host_vm() fails, we should free the stage-2 page-table
previously allocated by kvm_init_stage2_mmu().
Cc: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Fixes: 07aeb70707b1 ("KVM: arm64: Reserve pKVM handle during pkvm_init_host_vm()")
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/arm.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 410ffd41fd73..3589fc08266c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -236,7 +236,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
*/
ret = pkvm_init_host_vm(kvm);
if (ret)
- goto err_free_cpumask;
+ goto err_uninit_mmu;
}
kvm_vgic_early_init(kvm);
@@ -252,6 +252,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
return 0;
+err_uninit_mmu:
+ kvm_uninit_stage2_mmu(kvm);
err_free_cpumask:
free_cpumask_var(kvm->arch.supported_cpus);
err_unshare_kvm:
--
2.53.0.1018.g2bb0e51243-goog
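The fix restores the usual kernel error-unwind convention: teardown labels appear in reverse order of acquisition, so a failure jumps to the label that undoes everything allocated so far. A minimal standalone sketch of that pattern (names and the boolean stand-ins are illustrative, not the kernel's actual code):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the resources kvm_arch_init_vm() acquires. */
static bool cpumask_alloced;
static bool mmu_inited;

/*
 * Sketch of the reverse-order unwind pattern: a failure after the
 * stage-2 MMU has been initialised must jump to a label that tears
 * the MMU down before falling through to the earlier cleanups.
 * (The real function has a separate err_free_cpumask label below
 * err_uninit_mmu; they are merged here for brevity.)
 */
static int init_vm(bool pkvm_init_fails)
{
    cpumask_alloced = true;     /* alloc_cpumask_var() */
    mmu_inited = true;          /* kvm_init_stage2_mmu() */

    if (pkvm_init_fails)
        goto err_uninit_mmu;    /* pkvm_init_host_vm() failed */

    return 0;

err_uninit_mmu:
    mmu_inited = false;         /* kvm_uninit_stage2_mmu() */
    cpumask_alloced = false;    /* free_cpumask_var() */
    return -1;
}
```

Jumping straight to the cpumask cleanup (the pre-fix behaviour) would leave the MMU initialised on the failure path, which is exactly the leak the patch closes.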
* [PATCH v4 03/38] KVM: arm64: Move handle check into pkvm_pgtable_stage2_destroy_range()
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
When pKVM is enabled, a VM has a 'handle' allocated by the hypervisor
in kvm_arch_init_vm() and released later by kvm_arch_destroy_vm().
Consequently, the only time __pkvm_pgtable_stage2_unmap() can run into
an uninitialised 'handle' is on the kvm_arch_init_vm() failure path,
where we destroy the empty stage-2 page-table if we fail to allocate a
handle.
Move the handle check into pkvm_pgtable_stage2_destroy_range(), which
will additionally handle protected VMs in subsequent patches.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/pkvm.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index d7a0f69a9982..7797813f4dbe 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -329,9 +329,6 @@ static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 e
struct pkvm_mapping *mapping;
int ret;
- if (!handle)
- return 0;
-
for_each_mapping_in_range_safe(pgt, start, end, mapping) {
ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn,
mapping->nr_pages);
@@ -347,6 +344,12 @@ static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 e
void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
u64 addr, u64 size)
{
+ struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+ pkvm_handle_t handle = kvm->arch.pkvm.handle;
+
+ if (!handle)
+ return;
+
__pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
}
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 04/38] KVM: arm64: Rename __pkvm_pgtable_stage2_unmap()
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In preparation for adding support for protected VMs, where pages are
donated rather than shared, rename __pkvm_pgtable_stage2_unmap() to
__pkvm_pgtable_stage2_unshare() to make it clearer what is going on.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/pkvm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 7797813f4dbe..42f6e50825ac 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -322,7 +322,7 @@ int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
return 0;
}
-static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 end)
+static int __pkvm_pgtable_stage2_unshare(struct kvm_pgtable *pgt, u64 start, u64 end)
{
struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
pkvm_handle_t handle = kvm->arch.pkvm.handle;
@@ -350,7 +350,7 @@ void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
if (!handle)
return;
- __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+ __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
}
void pkvm_pgtable_stage2_destroy_pgd(struct kvm_pgtable *pgt)
@@ -386,7 +386,7 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
return -EAGAIN;
/* Remove _any_ pkvm_mapping overlapping with the range, bigger or smaller. */
- ret = __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+ ret = __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
if (ret)
return ret;
mapping = NULL;
@@ -409,7 +409,7 @@ int pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
lockdep_assert_held_write(&kvm_s2_mmu_to_kvm(pgt->mmu)->mmu_lock);
- return __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+ return __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
}
int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 05/38] KVM: arm64: Don't advertise unsupported features for protected guests
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Both SVE and PMUv3 are treated as "restricted" features for protected
guests, and attempts to access their corresponding architectural state
from a protected guest result in an undefined exception being injected
by the hypervisor.
Since these exceptions are unexpected and typically fatal for the guest,
don't advertise these features for protected guests.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_pkvm.h | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 757076ad4ec9..7041e398fb4c 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -40,8 +40,6 @@ static inline bool kvm_pkvm_ext_allowed(struct kvm *kvm, long ext)
case KVM_CAP_MAX_VCPU_ID:
case KVM_CAP_MSI_DEVID:
case KVM_CAP_ARM_VM_IPA_SIZE:
- case KVM_CAP_ARM_PMU_V3:
- case KVM_CAP_ARM_SVE:
case KVM_CAP_ARM_PTRAUTH_ADDRESS:
case KVM_CAP_ARM_PTRAUTH_GENERIC:
return true;
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 06/38] KVM: arm64: Expose self-hosted debug regs as RAZ/WI for protected guests
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
From: Fuad Tabba <tabba@google.com>
Debug and trace are not currently supported for protected guests, so
trap accesses to the related registers and emulate them as RAZ/WI for
now. Although this isn't strictly compatible with the architecture, it's
sufficient for Linux guests and means that debug support can be added
later on.
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/sys_regs.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/arm64/kvm/hyp/nvhe/sys_regs.c b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
index 06d28621722e..0a84140afa28 100644
--- a/arch/arm64/kvm/hyp/nvhe/sys_regs.c
+++ b/arch/arm64/kvm/hyp/nvhe/sys_regs.c
@@ -392,6 +392,14 @@ static const struct sys_reg_desc pvm_sys_reg_descs[] = {
/* Cache maintenance by set/way operations are restricted. */
/* Debug and Trace Registers are restricted. */
+ RAZ_WI(SYS_DBGBVRn_EL1(0)),
+ RAZ_WI(SYS_DBGBCRn_EL1(0)),
+ RAZ_WI(SYS_DBGWVRn_EL1(0)),
+ RAZ_WI(SYS_DBGWCRn_EL1(0)),
+ RAZ_WI(SYS_MDSCR_EL1),
+ RAZ_WI(SYS_OSLAR_EL1),
+ RAZ_WI(SYS_OSLSR_EL1),
+ RAZ_WI(SYS_OSDLR_EL1),
/* Group 1 ID registers */
HOST_HANDLED(SYS_REVIDR_EL1),
--
2.53.0.1018.g2bb0e51243-goog
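For reference, RAZ/WI ("Read-As-Zero, Write-Ignored") emulation amounts to returning zero on reads and discarding writes, with no exception injected back into the guest. A hypothetical sketch of such a trap handler (not the hypervisor's actual code; the register-file struct is a stand-in):

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for a vCPU's general-purpose register file. */
struct vcpu_regs {
    uint64_t x[31];
};

/*
 * Emulate a trapped access to a RAZ/WI system register: a read
 * places zero in the destination register 'rt', a write is simply
 * dropped. Returns true to indicate the access was handled.
 */
static bool emulate_raz_wi(struct vcpu_regs *regs, int rt, bool is_write)
{
    if (is_write)
        return true;        /* Write-Ignored: drop the value */

    regs->x[rt] = 0;        /* Read-As-Zero */
    return true;
}
```

As the commit message notes, this is not strictly architecturally compliant (e.g. OSLSR_EL1 has RES1 behaviour for some fields), but it is sufficient for Linux guests.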
* [PATCH v4 07/38] KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
When pKVM is not enabled, the host shouldn't issue pKVM-specific
hypercalls and so there's no point checking for this in the pKVM
hypercall handlers.
Remove the redundant is_protected_kvm_enabled() checks from each
hypercall and instead rejig the hypercall table so that the
pKVM-specific hypercalls are unreachable when pKVM is not being used.
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 24 +++++++-----
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 63 ++++++++++--------------------
2 files changed, 34 insertions(+), 53 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index a1ad12c72ebf..7b72aac4730d 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -51,7 +51,7 @@
#include <linux/mm.h>
enum __kvm_host_smccc_func {
- /* Hypercalls available only prior to pKVM finalisation */
+ /* Hypercalls that are unavailable once pKVM has finalised. */
/* __KVM_HOST_SMCCC_FUNC___kvm_hyp_init */
__KVM_HOST_SMCCC_FUNC___pkvm_init = __KVM_HOST_SMCCC_FUNC___kvm_hyp_init + 1,
__KVM_HOST_SMCCC_FUNC___pkvm_create_private_mapping,
@@ -60,16 +60,9 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___vgic_v3_init_lrs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_get_gic_config,
__KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
+ __KVM_HOST_SMCCC_FUNC_MIN_PKVM = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize,
- /* Hypercalls available after pKVM finalisation */
- __KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
- __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
- __KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
- __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
- __KVM_HOST_SMCCC_FUNC___pkvm_host_relax_perms_guest,
- __KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
- __KVM_HOST_SMCCC_FUNC___pkvm_host_test_clear_young_guest,
- __KVM_HOST_SMCCC_FUNC___pkvm_host_mkyoung_guest,
+ /* Hypercalls that are always available and common to [nh]VHE/pKVM. */
__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
@@ -81,6 +74,17 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff,
__KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
+ __KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM = __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
+
+ /* Hypercalls that are available only when pKVM has finalised. */
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_relax_perms_guest,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_wrprotect_guest,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_test_clear_young_guest,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_mkyoung_guest,
__KVM_HOST_SMCCC_FUNC___pkvm_reserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index e7790097db93..127decc2dd2b 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -169,9 +169,6 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
DECLARE_REG(u64, hcr_el2, host_ctxt, 3);
struct pkvm_hyp_vcpu *hyp_vcpu;
- if (!is_protected_kvm_enabled())
- return;
-
hyp_vcpu = pkvm_load_hyp_vcpu(handle, vcpu_idx);
if (!hyp_vcpu)
return;
@@ -188,12 +185,8 @@ static void handle___pkvm_vcpu_load(struct kvm_cpu_context *host_ctxt)
static void handle___pkvm_vcpu_put(struct kvm_cpu_context *host_ctxt)
{
- struct pkvm_hyp_vcpu *hyp_vcpu;
+ struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
- if (!is_protected_kvm_enabled())
- return;
-
- hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
if (hyp_vcpu)
pkvm_put_hyp_vcpu(hyp_vcpu);
}
@@ -257,9 +250,6 @@ static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
struct pkvm_hyp_vcpu *hyp_vcpu;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
goto out;
@@ -281,9 +271,6 @@ static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
struct pkvm_hyp_vm *hyp_vm;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vm = get_np_pkvm_hyp_vm(handle);
if (!hyp_vm)
goto out;
@@ -301,9 +288,6 @@ static void handle___pkvm_host_relax_perms_guest(struct kvm_cpu_context *host_ct
struct pkvm_hyp_vcpu *hyp_vcpu;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
goto out;
@@ -321,9 +305,6 @@ static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt
struct pkvm_hyp_vm *hyp_vm;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vm = get_np_pkvm_hyp_vm(handle);
if (!hyp_vm)
goto out;
@@ -343,9 +324,6 @@ static void handle___pkvm_host_test_clear_young_guest(struct kvm_cpu_context *ho
struct pkvm_hyp_vm *hyp_vm;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vm = get_np_pkvm_hyp_vm(handle);
if (!hyp_vm)
goto out;
@@ -362,9 +340,6 @@ static void handle___pkvm_host_mkyoung_guest(struct kvm_cpu_context *host_ctxt)
struct pkvm_hyp_vcpu *hyp_vcpu;
int ret = -EINVAL;
- if (!is_protected_kvm_enabled())
- goto out;
-
hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
if (!hyp_vcpu || pkvm_hyp_vcpu_is_protected(hyp_vcpu))
goto out;
@@ -424,12 +399,8 @@ static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
static void handle___pkvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
- struct pkvm_hyp_vm *hyp_vm;
+ struct pkvm_hyp_vm *hyp_vm = get_np_pkvm_hyp_vm(handle);
- if (!is_protected_kvm_enabled())
- return;
-
- hyp_vm = get_np_pkvm_hyp_vm(handle);
if (!hyp_vm)
return;
@@ -603,14 +574,6 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__vgic_v3_get_gic_config),
HANDLE_FUNC(__pkvm_prot_finalize),
- HANDLE_FUNC(__pkvm_host_share_hyp),
- HANDLE_FUNC(__pkvm_host_unshare_hyp),
- HANDLE_FUNC(__pkvm_host_share_guest),
- HANDLE_FUNC(__pkvm_host_unshare_guest),
- HANDLE_FUNC(__pkvm_host_relax_perms_guest),
- HANDLE_FUNC(__pkvm_host_wrprotect_guest),
- HANDLE_FUNC(__pkvm_host_test_clear_young_guest),
- HANDLE_FUNC(__pkvm_host_mkyoung_guest),
HANDLE_FUNC(__kvm_adjust_pc),
HANDLE_FUNC(__kvm_vcpu_run),
HANDLE_FUNC(__kvm_flush_vm_context),
@@ -622,6 +585,15 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__kvm_timer_set_cntvoff),
HANDLE_FUNC(__vgic_v3_save_aprs),
HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
+
+ HANDLE_FUNC(__pkvm_host_share_hyp),
+ HANDLE_FUNC(__pkvm_host_unshare_hyp),
+ HANDLE_FUNC(__pkvm_host_share_guest),
+ HANDLE_FUNC(__pkvm_host_unshare_guest),
+ HANDLE_FUNC(__pkvm_host_relax_perms_guest),
+ HANDLE_FUNC(__pkvm_host_wrprotect_guest),
+ HANDLE_FUNC(__pkvm_host_test_clear_young_guest),
+ HANDLE_FUNC(__pkvm_host_mkyoung_guest),
HANDLE_FUNC(__pkvm_reserve_vm),
HANDLE_FUNC(__pkvm_unreserve_vm),
HANDLE_FUNC(__pkvm_init_vm),
@@ -635,7 +607,7 @@ static const hcall_t host_hcall[] = {
static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(unsigned long, id, host_ctxt, 0);
- unsigned long hcall_min = 0;
+ unsigned long hcall_min = 0, hcall_max = -1;
hcall_t hfn;
/*
@@ -647,14 +619,19 @@ static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
* basis. This is all fine, however, since __pkvm_prot_finalize
* returns -EPERM after the first call for a given CPU.
*/
- if (static_branch_unlikely(&kvm_protected_mode_initialized))
- hcall_min = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize;
+ if (static_branch_unlikely(&kvm_protected_mode_initialized)) {
+ hcall_min = __KVM_HOST_SMCCC_FUNC_MIN_PKVM;
+ } else {
+ hcall_max = __KVM_HOST_SMCCC_FUNC_MAX_NO_PKVM;
+ }
id &= ~ARM_SMCCC_CALL_HINTS;
id -= KVM_HOST_SMCCC_ID(0);
- if (unlikely(id < hcall_min || id >= ARRAY_SIZE(host_hcall)))
+ if (unlikely(id < hcall_min || id > hcall_max ||
+ id >= ARRAY_SIZE(host_hcall))) {
goto inval;
+ }
hfn = host_hcall[id];
if (unlikely(!hfn))
--
2.53.0.1018.g2bb0e51243-goog
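The reordered table lets a single [min, max] window replace the per-handler checks: after finalisation only IDs at or above the pKVM minimum are reachable, and before finalisation anything past the last common hypercall is rejected. A minimal sketch of that dispatch gating (enumerators and names here are illustrative, not the kernel's):

```c
#include <stdbool.h>

/*
 * Toy hypercall ID space mirroring the three groups in the patch:
 * pre-finalisation only, always available, post-finalisation only.
 */
enum func_id {
    FUNC_INIT,                              /* pre-finalisation only */
    FUNC_PROT_FINALIZE,                     /* boundary: stays callable */
    FUNC_MIN_PKVM = FUNC_PROT_FINALIZE,
    FUNC_VCPU_RUN,                          /* always available */
    FUNC_MAX_NO_PKVM = FUNC_VCPU_RUN,
    FUNC_SHARE_GUEST,                       /* post-finalisation only */
    NR_FUNCS,
};

/* One range check gates the whole table, as in handle_host_hcall(). */
static bool hcall_allowed(unsigned long id, bool finalized)
{
    unsigned long hcall_min = 0, hcall_max = (unsigned long)-1;

    if (finalized)
        hcall_min = FUNC_MIN_PKVM;
    else
        hcall_max = FUNC_MAX_NO_PKVM;

    return !(id < hcall_min || id > hcall_max || id >= NR_FUNCS);
}
```

Note that the finalisation hypercall itself sits at the boundary and remains reachable in both states, matching the comment in the patch about __pkvm_prot_finalize returning -EPERM on repeat calls.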
* [PATCH v4 08/38] KVM: arm64: Ignore MMU notifier callbacks for protected VMs
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In preparation for supporting the donation of pinned pages to protected
VMs, return early from the MMU notifiers when called for a protected VM,
as the necessary hypercalls are exposed only for non-protected guests.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/mmu.c | 9 ++++++---
arch/arm64/kvm/pkvm.c | 19 ++++++++++++++++++-
2 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 17d64a1e11e5..5e7821fe0fc4 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -340,6 +340,9 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
u64 size, bool may_block)
{
+ if (kvm_vm_is_protected(kvm_s2_mmu_to_kvm(mmu)))
+ return;
+
__unmap_stage2_range(mmu, start, size, may_block);
}
@@ -2223,7 +2226,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
{
- if (!kvm->arch.mmu.pgt)
+ if (!kvm->arch.mmu.pgt || kvm_vm_is_protected(kvm))
return false;
__unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
@@ -2238,7 +2241,7 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- if (!kvm->arch.mmu.pgt)
+ if (!kvm->arch.mmu.pgt || kvm_vm_is_protected(kvm))
return false;
return KVM_PGT_FN(kvm_pgtable_stage2_test_clear_young)(kvm->arch.mmu.pgt,
@@ -2254,7 +2257,7 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- if (!kvm->arch.mmu.pgt)
+ if (!kvm->arch.mmu.pgt || kvm_vm_is_protected(kvm))
return false;
return KVM_PGT_FN(kvm_pgtable_stage2_test_clear_young)(kvm->arch.mmu.pgt,
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 42f6e50825ac..20d50abb3b94 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -407,7 +407,12 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
int pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
- lockdep_assert_held_write(&kvm_s2_mmu_to_kvm(pgt->mmu)->mmu_lock);
+ struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+
+ if (WARN_ON(kvm_vm_is_protected(kvm)))
+ return -EPERM;
+
+ lockdep_assert_held_write(&kvm->mmu_lock);
return __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
}
@@ -419,6 +424,9 @@ int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
struct pkvm_mapping *mapping;
int ret = 0;
+ if (WARN_ON(kvm_vm_is_protected(kvm)))
+ return -EPERM;
+
lockdep_assert_held(&kvm->mmu_lock);
for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) {
ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn,
@@ -450,6 +458,9 @@ bool pkvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64
struct pkvm_mapping *mapping;
bool young = false;
+ if (WARN_ON(kvm_vm_is_protected(kvm)))
+ return -EPERM;
+
lockdep_assert_held(&kvm->mmu_lock);
for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping)
young |= kvm_call_hyp_nvhe(__pkvm_host_test_clear_young_guest, handle, mapping->gfn,
@@ -461,12 +472,18 @@ bool pkvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64
int pkvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr, enum kvm_pgtable_prot prot,
enum kvm_pgtable_walk_flags flags)
{
+ if (WARN_ON(kvm_vm_is_protected(kvm_s2_mmu_to_kvm(pgt->mmu))))
+ return -EPERM;
+
return kvm_call_hyp_nvhe(__pkvm_host_relax_perms_guest, addr >> PAGE_SHIFT, prot);
}
void pkvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_walk_flags flags)
{
+ if (WARN_ON(kvm_vm_is_protected(kvm_s2_mmu_to_kvm(pgt->mmu))))
+ return;
+
WARN_ON(kvm_call_hyp_nvhe(__pkvm_host_mkyoung_guest, addr >> PAGE_SHIFT));
}
--
2.53.0.1018.g2bb0e51243-goog
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 09/38] KVM: arm64: Prevent unsupported memslot operations on protected VMs
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (7 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 08/38] KVM: arm64: Ignore MMU notifier callbacks for protected VMs Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 10/38] KVM: arm64: Ignore -EAGAIN when mapping in pages for the pKVM host Will Deacon
` (29 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Protected VMs do not support deleting or moving memslots after first
run, nor do they support read-only memslots or dirty logging.
Return -EPERM to userspace if such an operation is attempted.
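The rule being enforced can be sketched as a small standalone check (a toy model with hypothetical enum and flag names, not the kernel's actual KVM_MR_*/KVM_MEM_* definitions):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative stand-ins for the KVM memslot-change enums and flags. */
enum toy_memslot_change { TOY_MR_CREATE, TOY_MR_DELETE, TOY_MR_MOVE, TOY_MR_FLAGS_ONLY };
#define TOY_MEM_LOG_DIRTY_PAGES (1u << 0)
#define TOY_MEM_READONLY        (1u << 1)

/*
 * Once a protected VM has been created at EL2, its memslots can no
 * longer be deleted or moved; read-only and dirty-logging slots are
 * never allowed for a pVM.
 */
static int toy_pvm_check_memslot_change(bool hyp_vm_created,
					enum toy_memslot_change change,
					unsigned int new_flags)
{
	if (hyp_vm_created &&
	    (change == TOY_MR_DELETE || change == TOY_MR_MOVE))
		return -EPERM;

	if (new_flags & (TOY_MEM_LOG_DIRTY_PAGES | TOY_MEM_READONLY))
		return -EPERM;

	return 0;
}
```

Creating new slots (and flag-only changes that don't set the banned flags) remains permitted even after first run, matching the diff below.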
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/mmu.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 5e7821fe0fc4..b3cc5dfe5723 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2414,6 +2414,19 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
hva_t hva, reg_end;
int ret = 0;
+ if (kvm_vm_is_protected(kvm)) {
+ /* Cannot modify memslots once a pVM has run. */
+ if (pkvm_hyp_vm_is_created(kvm) &&
+ (change == KVM_MR_DELETE || change == KVM_MR_MOVE)) {
+ return -EPERM;
+ }
+
+ if (new &&
+ new->flags & (KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_READONLY)) {
+ return -EPERM;
+ }
+ }
+
if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
change != KVM_MR_FLAGS_ONLY)
return 0;
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 10/38] KVM: arm64: Ignore -EAGAIN when mapping in pages for the pKVM host
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (8 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 09/38] KVM: arm64: Prevent unsupported memslot operations on " Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 11/38] KVM: arm64: Split teardown hypercall into two phases Will Deacon
` (28 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
If the host takes a stage-2 translation fault on two CPUs at the same
time, one of them will get back -EAGAIN from the page-table mapping code
when it runs into the mapping installed by the other.
Rather than handle this explicitly in handle_host_mem_abort(), pass the
new KVM_PGTABLE_WALK_IGNORE_EAGAIN flag to kvm_pgtable_stage2_map() from
__host_stage2_idmap() and return -EEXIST if host_stage2_adjust_range()
finds a valid pte. This will avoid having to test for -EAGAIN on the
reclaim path in subsequent patches.
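The resulting error contract can be modelled in isolation (toy functions, names illustrative of the kernel helpers rather than copies of them):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * With KVM_PGTABLE_WALK_IGNORE_EAGAIN on the map path, a concurrent CPU
 * having already installed the same mapping is no longer surfaced as
 * -EAGAIN; a valid pte found while adjusting the range is now -EEXIST.
 */
static int toy_host_stage2_idmap(bool already_mapped)
{
	return already_mapped ? -EEXIST : 0;
}

/* handle_host_mem_abort() accepts exactly these outcomes; else BUG(). */
static bool toy_host_mem_abort_ok(int ret)
{
	return ret == 0 || ret == -EEXIST;
}
```

The point of the change is visible in the last case: -EAGAIN never reaches the abort handler any more, so later reclaim code does not have to treat it as success.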
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index d815265bd374..7d22893ab1dc 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -461,8 +461,15 @@ static bool range_is_memory(u64 start, u64 end)
static inline int __host_stage2_idmap(u64 start, u64 end,
enum kvm_pgtable_prot prot)
{
+ /*
+ * We don't make permission changes to the host idmap after
+ * initialisation, so we can squash -EAGAIN to save callers
+ * having to treat it like success in the case that they try to
+ * map something that is already mapped.
+ */
return kvm_pgtable_stage2_map(&host_mmu.pgt, start, end - start, start,
- prot, &host_s2_pool, 0);
+ prot, &host_s2_pool,
+ KVM_PGTABLE_WALK_IGNORE_EAGAIN);
}
/*
@@ -504,7 +511,7 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
return ret;
if (kvm_pte_valid(pte))
- return -EAGAIN;
+ return -EEXIST;
if (pte) {
WARN_ON(addr_is_memory(addr) &&
@@ -609,7 +616,6 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
{
struct kvm_vcpu_fault_info fault;
u64 esr, addr;
- int ret = 0;
esr = read_sysreg_el2(SYS_ESR);
if (!__get_fault_info(esr, &fault)) {
@@ -628,8 +634,13 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
BUG_ON(!(fault.hpfar_el2 & HPFAR_EL2_NS));
addr = FIELD_GET(HPFAR_EL2_FIPA, fault.hpfar_el2) << 12;
- ret = host_stage2_idmap(addr);
- BUG_ON(ret && ret != -EAGAIN);
+ switch (host_stage2_idmap(addr)) {
+ case -EEXIST:
+ case 0:
+ break;
+ default:
+ BUG();
+ }
}
struct check_walk_data {
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 11/38] KVM: arm64: Split teardown hypercall into two phases
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (9 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 10/38] KVM: arm64: Ignore -EAGAIN when mapping in pages for the pKVM host Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 12/38] KVM: arm64: Introduce __pkvm_host_donate_guest() Will Deacon
` (27 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In preparation for reclaiming protected guest VM pages from the host
during teardown, split the current 'pkvm_teardown_vm' hypercall into
separate 'start' and 'finalise' calls.
The 'pkvm_start_teardown_vm' hypercall puts the VM into a new 'is_dying'
state, which is a point of no return past which no vCPU of the pVM is
allowed to run any more. Once in this new state,
'pkvm_finalize_teardown_vm' can be used to reclaim metadata and
page-table pages from the VM. A subsequent patch will add support for
reclaiming the individual guest memory pages.
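The ordering constraints of the two-phase teardown can be sketched as a small state machine (a hypothetical toy structure, not the hypervisor's actual `pkvm_hyp_vm`):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_vm {
	bool exists;
	bool is_dying;
	int pinned_pages;	/* analogue of hyp_page_count() */
};

/* Point of no return: flips is_dying exactly once. */
static int toy_start_teardown(struct toy_vm *vm)
{
	if (!vm->exists)
		return -ENOENT;
	if (vm->pinned_pages)
		return -EBUSY;
	if (vm->is_dying)
		return -EINVAL;	/* already dying */
	vm->is_dying = true;
	return 0;
}

/* Mirrors pkvm_load_hyp_vcpu() refusing to load vCPUs of a dying VM. */
static int toy_load_vcpu(const struct toy_vm *vm)
{
	return (vm->exists && !vm->is_dying) ? 0 : -ENOENT;
}

/* Finalise only succeeds once the VM is in the dying state. */
static int toy_finalize_teardown(const struct toy_vm *vm)
{
	if (!vm->exists)
		return -ENOENT;
	if (!vm->is_dying)
		return -EBUSY;
	return 0;
}
```

Running the transitions in order shows the intended protocol: finalise before start fails, vCPU loads fail after start, and starting twice is rejected.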
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Co-developed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 3 ++-
arch/arm64/include/asm/kvm_host.h | 7 +++++
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 4 ++-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 14 +++++++---
arch/arm64/kvm/hyp/nvhe/pkvm.c | 36 ++++++++++++++++++++++----
arch/arm64/kvm/pkvm.c | 7 ++++-
6 files changed, 60 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 7b72aac4730d..df6b661701b6 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -89,7 +89,8 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
- __KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
+ __KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
+ __KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put,
__KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid,
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 70cb9cfd760a..31b9454bb74d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -255,6 +255,13 @@ struct kvm_protected_vm {
struct kvm_hyp_memcache stage2_teardown_mc;
bool is_protected;
bool is_created;
+
+ /*
+ * True when the guest is being torn down. When in this state, the
+ * guest's vCPUs can't be loaded anymore, but its pages can be
+ * reclaimed by the host.
+ */
+ bool is_dying;
};
struct kvm_mpidr_data {
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 184ad7a39950..04c7ca703014 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -73,7 +73,9 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
unsigned long pgd_hva);
int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
unsigned long vcpu_hva);
-int __pkvm_teardown_vm(pkvm_handle_t handle);
+
+int __pkvm_start_teardown_vm(pkvm_handle_t handle);
+int __pkvm_finalize_teardown_vm(pkvm_handle_t handle);
struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
unsigned int vcpu_idx);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 127decc2dd2b..634ea2766240 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -553,11 +553,18 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
}
-static void handle___pkvm_teardown_vm(struct kvm_cpu_context *host_ctxt)
+static void handle___pkvm_start_teardown_vm(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
- cpu_reg(host_ctxt, 1) = __pkvm_teardown_vm(handle);
+ cpu_reg(host_ctxt, 1) = __pkvm_start_teardown_vm(handle);
+}
+
+static void handle___pkvm_finalize_teardown_vm(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+
+ cpu_reg(host_ctxt, 1) = __pkvm_finalize_teardown_vm(handle);
}
typedef void (*hcall_t)(struct kvm_cpu_context *);
@@ -598,7 +605,8 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_unreserve_vm),
HANDLE_FUNC(__pkvm_init_vm),
HANDLE_FUNC(__pkvm_init_vcpu),
- HANDLE_FUNC(__pkvm_teardown_vm),
+ HANDLE_FUNC(__pkvm_start_teardown_vm),
+ HANDLE_FUNC(__pkvm_finalize_teardown_vm),
HANDLE_FUNC(__pkvm_vcpu_load),
HANDLE_FUNC(__pkvm_vcpu_put),
HANDLE_FUNC(__pkvm_tlb_flush_vmid),
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 2f029bfe4755..c4e05ab8b605 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -255,7 +255,10 @@ struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
hyp_spin_lock(&vm_table_lock);
hyp_vm = get_vm_by_handle(handle);
- if (!hyp_vm || hyp_vm->kvm.created_vcpus <= vcpu_idx)
+ if (!hyp_vm || hyp_vm->kvm.arch.pkvm.is_dying)
+ goto unlock;
+
+ if (hyp_vm->kvm.created_vcpus <= vcpu_idx)
goto unlock;
hyp_vcpu = hyp_vm->vcpus[vcpu_idx];
@@ -859,7 +862,32 @@ teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr, size_t size)
unmap_donated_memory_noclear(addr, size);
}
-int __pkvm_teardown_vm(pkvm_handle_t handle)
+int __pkvm_start_teardown_vm(pkvm_handle_t handle)
+{
+ struct pkvm_hyp_vm *hyp_vm;
+ int ret = 0;
+
+ hyp_spin_lock(&vm_table_lock);
+ hyp_vm = get_vm_by_handle(handle);
+ if (!hyp_vm) {
+ ret = -ENOENT;
+ goto unlock;
+ } else if (WARN_ON(hyp_page_count(hyp_vm))) {
+ ret = -EBUSY;
+ goto unlock;
+ } else if (hyp_vm->kvm.arch.pkvm.is_dying) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ hyp_vm->kvm.arch.pkvm.is_dying = true;
+unlock:
+ hyp_spin_unlock(&vm_table_lock);
+
+ return ret;
+}
+
+int __pkvm_finalize_teardown_vm(pkvm_handle_t handle)
{
struct kvm_hyp_memcache *mc, *stage2_mc;
struct pkvm_hyp_vm *hyp_vm;
@@ -873,9 +901,7 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
if (!hyp_vm) {
err = -ENOENT;
goto err_unlock;
- }
-
- if (WARN_ON(hyp_page_count(hyp_vm))) {
+ } else if (!hyp_vm->kvm.arch.pkvm.is_dying) {
err = -EBUSY;
goto err_unlock;
}
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 20d50abb3b94..a39dacd1d617 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -88,7 +88,7 @@ void __init kvm_hyp_reserve(void)
static void __pkvm_destroy_hyp_vm(struct kvm *kvm)
{
if (pkvm_hyp_vm_is_created(kvm)) {
- WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm,
+ WARN_ON(kvm_call_hyp_nvhe(__pkvm_finalize_teardown_vm,
kvm->arch.pkvm.handle));
} else if (kvm->arch.pkvm.handle) {
/*
@@ -350,6 +350,11 @@ void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
if (!handle)
return;
+ if (pkvm_hyp_vm_is_created(kvm) && !kvm->arch.pkvm.is_dying) {
+ WARN_ON(kvm_call_hyp_nvhe(__pkvm_start_teardown_vm, handle));
+ kvm->arch.pkvm.is_dying = true;
+ }
+
__pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
}
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 12/38] KVM: arm64: Introduce __pkvm_host_donate_guest()
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (10 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 11/38] KVM: arm64: Split teardown hypercall into two phases Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 13/38] KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map() Will Deacon
` (26 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In preparation for supporting protected VMs, whose memory pages are
isolated from the host, introduce a new pKVM hypercall to allow the
donation of pages to a guest.
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/include/asm/kvm_pgtable.h | 2 +-
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 ++
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 21 +++++++++++++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 30 +++++++++++++++++++
5 files changed, 55 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index df6b661701b6..dfc6625c8269 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -79,6 +79,7 @@ enum __kvm_host_smccc_func {
/* Hypercalls that are available only when pKVM has finalised. */
__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
+ __KVM_HOST_SMCCC_FUNC___pkvm_host_donate_guest,
__KVM_HOST_SMCCC_FUNC___pkvm_host_share_guest,
__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_guest,
__KVM_HOST_SMCCC_FUNC___pkvm_host_relax_perms_guest,
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index c201168f2857..50caca311ef5 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -100,7 +100,7 @@ typedef u64 kvm_pte_t;
KVM_PTE_LEAF_ATTR_HI_S2_XN)
#define KVM_INVALID_PTE_OWNER_MASK GENMASK(9, 2)
-#define KVM_MAX_OWNER_ID 1
+#define KVM_MAX_OWNER_ID 2
/*
* Used to indicate a pte for which a 'break-before-make' sequence is in
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 7f25f2bca90c..7061b0be340a 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -27,6 +27,7 @@ extern struct host_mmu host_mmu;
enum pkvm_component_id {
PKVM_ID_HOST,
PKVM_ID_HYP,
+ PKVM_ID_GUEST,
};
extern unsigned long hyp_nr_cpus;
@@ -38,6 +39,7 @@ int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
+int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
enum kvm_pgtable_prot prot);
int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 634ea2766240..970656318cf2 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -241,6 +241,26 @@ static int pkvm_refill_memcache(struct pkvm_hyp_vcpu *hyp_vcpu)
&host_vcpu->arch.pkvm_memcache);
}
+static void handle___pkvm_host_donate_guest(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(u64, pfn, host_ctxt, 1);
+ DECLARE_REG(u64, gfn, host_ctxt, 2);
+ struct pkvm_hyp_vcpu *hyp_vcpu;
+ int ret = -EINVAL;
+
+ hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+ if (!hyp_vcpu || !pkvm_hyp_vcpu_is_protected(hyp_vcpu))
+ goto out;
+
+ ret = pkvm_refill_memcache(hyp_vcpu);
+ if (ret)
+ goto out;
+
+ ret = __pkvm_host_donate_guest(pfn, gfn, hyp_vcpu);
+out:
+ cpu_reg(host_ctxt, 1) = ret;
+}
+
static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(u64, pfn, host_ctxt, 1);
@@ -595,6 +615,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_host_share_hyp),
HANDLE_FUNC(__pkvm_host_unshare_hyp),
+ HANDLE_FUNC(__pkvm_host_donate_guest),
HANDLE_FUNC(__pkvm_host_share_guest),
HANDLE_FUNC(__pkvm_host_unshare_guest),
HANDLE_FUNC(__pkvm_host_relax_perms_guest),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 7d22893ab1dc..03e6fa124253 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -971,6 +971,36 @@ static int __guest_check_transition_size(u64 phys, u64 ipa, u64 nr_pages, u64 *s
return 0;
}
+int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
+{
+ struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+ u64 phys = hyp_pfn_to_phys(pfn);
+ u64 ipa = hyp_pfn_to_phys(gfn);
+ int ret;
+
+ host_lock_component();
+ guest_lock_component(vm);
+
+ ret = __host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_OWNED);
+ if (ret)
+ goto unlock;
+
+ ret = __guest_check_page_state_range(vm, ipa, PAGE_SIZE, PKVM_NOPAGE);
+ if (ret)
+ goto unlock;
+
+ WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_GUEST));
+ WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
+ pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
+ &vcpu->vcpu.arch.pkvm_memcache, 0));
+
+unlock:
+ guest_unlock_component(vm);
+ host_unlock_component();
+
+ return ret;
+}
+
int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
enum kvm_pgtable_prot prot)
{
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 13/38] KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map()
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (11 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 12/38] KVM: arm64: Introduce __pkvm_host_donate_guest() Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 14/38] KVM: arm64: Handle aborts from protected VMs Will Deacon
` (25 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Mapping pages into a protected guest requires the donation of memory
from the host.
Extend pkvm_pgtable_stage2_map() to issue a donate hypercall when the
target VM is protected. Since the hypercall only handles a single page,
the splitting logic used for the share path is not required.
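The shape of the resulting dispatch can be shown with a toy function (illustrative names and sizes, assuming 4KiB pages and 2MiB blocks):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 0x1000UL
#define TOY_PMD_SIZE  0x200000UL

/*
 * Protected VMs only ever donate single pages; non-protected VMs may
 * share page- or block-sized ranges, so only the share path needs to
 * cope with splitting.
 */
static int toy_stage2_map(bool protected_vm, unsigned long size)
{
	if (protected_vm)
		return size == TOY_PAGE_SIZE ? 0 /* donate */ : -EINVAL;

	if (size != TOY_PAGE_SIZE && size != TOY_PMD_SIZE)
		return -EINVAL;

	return 0; /* share */
}
```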
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/pkvm.c | 58 ++++++++++++++++++++++++++++++-------------
1 file changed, 41 insertions(+), 17 deletions(-)
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index a39dacd1d617..1814e17d600e 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -373,31 +373,55 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
struct kvm_hyp_memcache *cache = mc;
u64 gfn = addr >> PAGE_SHIFT;
u64 pfn = phys >> PAGE_SHIFT;
+ u64 end = addr + size;
int ret;
- if (size != PAGE_SIZE && size != PMD_SIZE)
- return -EINVAL;
-
lockdep_assert_held_write(&kvm->mmu_lock);
+ mapping = pkvm_mapping_iter_first(&pgt->pkvm_mappings, addr, end - 1);
- /*
- * Calling stage2_map() on top of existing mappings is either happening because of a race
- * with another vCPU, or because we're changing between page and block mappings. As per
- * user_mem_abort(), same-size permission faults are handled in the relax_perms() path.
- */
- mapping = pkvm_mapping_iter_first(&pgt->pkvm_mappings, addr, addr + size - 1);
- if (mapping) {
- if (size == (mapping->nr_pages * PAGE_SIZE))
+ if (kvm_vm_is_protected(kvm)) {
+ /* Protected VMs are mapped using RWX page-granular mappings */
+ if (WARN_ON_ONCE(size != PAGE_SIZE))
+ return -EINVAL;
+
+ if (WARN_ON_ONCE(prot != KVM_PGTABLE_PROT_RWX))
+ return -EINVAL;
+
+ /*
+ * We raced with another vCPU.
+ */
+ if (mapping)
return -EAGAIN;
- /* Remove _any_ pkvm_mapping overlapping with the range, bigger or smaller. */
- ret = __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
- if (ret)
- return ret;
- mapping = NULL;
+ ret = kvm_call_hyp_nvhe(__pkvm_host_donate_guest, pfn, gfn);
+ } else {
+ if (WARN_ON_ONCE(size != PAGE_SIZE && size != PMD_SIZE))
+ return -EINVAL;
+
+ /*
+ * We either raced with another vCPU or we're changing between
+ * page and block mappings. As per user_mem_abort(), same-size
+ * permission faults are handled in the relax_perms() path.
+ */
+ if (mapping) {
+ if (size == (mapping->nr_pages * PAGE_SIZE))
+ return -EAGAIN;
+
+ /*
+ * Remove _any_ pkvm_mapping overlapping with the range,
+ * bigger or smaller.
+ */
+ ret = __pkvm_pgtable_stage2_unshare(pgt, addr, end);
+ if (ret)
+ return ret;
+
+ mapping = NULL;
+ }
+
+ ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn,
+ size / PAGE_SIZE, prot);
}
- ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, size / PAGE_SIZE, prot);
if (WARN_ON(ret))
return ret;
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 14/38] KVM: arm64: Handle aborts from protected VMs
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (12 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 13/38] KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map() Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 15/38] KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page() Will Deacon
` (24 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Introduce a new abort handler for resolving stage-2 page faults from
protected VMs by pinning and donating anonymous memory. This is
considerably simpler than the infamous user_mem_abort() as we only have
to deal with translation faults at the pte level.
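The triage of pin_user_pages() outcomes performed by the new handler can be modelled in isolation (a toy function; the real code additionally signals SIGBUS and unwinds accounting):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Poisoned pages are signalled to userspace and the fault is swallowed,
 * a short pin becomes -EFAULT, and pages that are not swap-backed
 * (i.e. page-cache pages) are refused with -EIO: only anonymous and
 * shmem/memfd memory may be donated to a protected guest.
 */
static int toy_triage_pin(int gup_ret, bool swapbacked)
{
	if (gup_ret == -EHWPOISON)
		return 0;	/* SIGBUS delivered instead */
	if (gup_ret != 1)
		return -EFAULT;
	if (!swapbacked)
		return -EIO;
	return 0;		/* safe to map into the guest */
}
```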
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/mmu.c | 89 ++++++++++++++++++++++++++++++++++++++++----
1 file changed, 81 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index b3cc5dfe5723..6a4151e3e4a3 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1642,6 +1642,74 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return ret != -EAGAIN ? ret : 0;
}
+static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+ struct kvm_memory_slot *memslot, unsigned long hva)
+{
+ unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
+ struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
+ struct mm_struct *mm = current->mm;
+ struct kvm *kvm = vcpu->kvm;
+ void *hyp_memcache;
+ struct page *page;
+ int ret;
+
+ ret = prepare_mmu_memcache(vcpu, true, &hyp_memcache);
+ if (ret)
+ return -ENOMEM;
+
+ ret = account_locked_vm(mm, 1, true);
+ if (ret)
+ return ret;
+
+ mmap_read_lock(mm);
+ ret = pin_user_pages(hva, 1, flags, &page);
+ mmap_read_unlock(mm);
+
+ if (ret == -EHWPOISON) {
+ kvm_send_hwpoison_signal(hva, PAGE_SHIFT);
+ ret = 0;
+ goto dec_account;
+ } else if (ret != 1) {
+ ret = -EFAULT;
+ goto dec_account;
+ } else if (!folio_test_swapbacked(page_folio(page))) {
+ /*
+ * We really can't deal with page-cache pages returned by GUP
+ * because (a) we may trigger writeback of a page for which we
+ * no longer have access and (b) page_mkclean() won't find the
+ * stage-2 mapping in the rmap so we can get out-of-whack with
+ * the filesystem when marking the page dirty during unpinning
+ * (see cc5095747edf ("ext4: don't BUG if someone dirty pages
+ * without asking ext4 first")).
+ *
+ * Ideally we'd just restrict ourselves to anonymous pages, but
+ * we also want to allow memfd (i.e. shmem) pages, so check for
+ * pages backed by swap in the knowledge that the GUP pin will
+ * prevent try_to_unmap() from succeeding.
+ */
+ ret = -EIO;
+ goto unpin;
+ }
+
+ write_lock(&kvm->mmu_lock);
+ ret = pkvm_pgtable_stage2_map(pgt, fault_ipa, PAGE_SIZE,
+ page_to_phys(page), KVM_PGTABLE_PROT_RWX,
+ hyp_memcache, 0);
+ write_unlock(&kvm->mmu_lock);
+ if (ret) {
+ if (ret == -EAGAIN)
+ ret = 0;
+ goto unpin;
+ }
+
+ return 0;
+unpin:
+ unpin_user_pages(&page, 1);
+dec_account:
+ account_locked_vm(mm, 1, false);
+ return ret;
+}
+
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_s2_trans *nested,
struct kvm_memory_slot *memslot, unsigned long hva,
@@ -2205,15 +2273,20 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
goto out_unlock;
}
- VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
- !write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
+ if (kvm_vm_is_protected(vcpu->kvm)) {
+ ret = pkvm_mem_abort(vcpu, fault_ipa, memslot, hva);
+ } else {
+ VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
+ !write_fault &&
+ !kvm_vcpu_trap_is_exec_fault(vcpu));
- if (kvm_slot_has_gmem(memslot))
- ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
- esr_fsc_is_permission_fault(esr));
- else
- ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
- esr_fsc_is_permission_fault(esr));
+ if (kvm_slot_has_gmem(memslot))
+ ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
+ esr_fsc_is_permission_fault(esr));
+ else
+ ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
+ esr_fsc_is_permission_fault(esr));
+ }
if (ret == 0)
ret = 1;
out:
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 15/38] KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page()
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (13 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 14/38] KVM: arm64: Handle aborts from protected VMs Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 16/38] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy() Will Deacon
` (23 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
To enable reclaim of pages from a protected VM during teardown,
introduce a new hypercall to reclaim a single page from a protected
guest that is in the dying state.
Since the EL2 code is non-preemptible, the new hypercall deliberately
acts on a single page at a time so as to allow EL1 to reschedule
frequently during the teardown operation.
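The host-side consequence is a loop that crosses to EL2 once per page, with a natural preemption point between iterations (an illustrative sketch, not the series' actual reclaim loop):

```c
#include <assert.h>

static unsigned long toy_hypercalls_issued;

/* Stand-in for the single-page __pkvm_reclaim_dying_guest_page hypercall. */
static int toy_reclaim_dying_guest_page(unsigned long gfn)
{
	(void)gfn;
	toy_hypercalls_issued++;	/* one non-preemptible trip to EL2 per page */
	return 0;
}

static int toy_reclaim_range(unsigned long start_gfn, unsigned long nr_pages)
{
	for (unsigned long i = 0; i < nr_pages; i++) {
		int ret = toy_reclaim_dying_guest_page(start_gfn + i);

		if (ret)
			return ret;
		/* In kernel context, cond_resched() would go here. */
	}
	return 0;
}
```

Batching more pages per hypercall would reduce the EL1/EL2 round trips but lengthen the non-preemptible window at EL2, which is the trade-off the commit message describes.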
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Co-developed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 1 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 9 +++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 79 +++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/pkvm.c | 14 ++++
6 files changed, 105 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index dfc6625c8269..b6df8f64d573 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -90,6 +90,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
+ __KVM_HOST_SMCCC_FUNC___pkvm_reclaim_dying_guest_page,
__KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 7061b0be340a..29f81a1d9e1f 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -40,6 +40,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
+int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm);
int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
enum kvm_pgtable_prot prot);
int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 04c7ca703014..506831804f64 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -74,6 +74,7 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
unsigned long vcpu_hva);
+int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 gfn);
int __pkvm_start_teardown_vm(pkvm_handle_t handle);
int __pkvm_finalize_teardown_vm(pkvm_handle_t handle);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 970656318cf2..7294c94f9296 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -573,6 +573,14 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
}
+static void handle___pkvm_reclaim_dying_guest_page(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+ DECLARE_REG(u64, gfn, host_ctxt, 2);
+
+ cpu_reg(host_ctxt, 1) = __pkvm_reclaim_dying_guest_page(handle, gfn);
+}
+
static void handle___pkvm_start_teardown_vm(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
@@ -626,6 +634,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_unreserve_vm),
HANDLE_FUNC(__pkvm_init_vm),
HANDLE_FUNC(__pkvm_init_vcpu),
+ HANDLE_FUNC(__pkvm_reclaim_dying_guest_page),
HANDLE_FUNC(__pkvm_start_teardown_vm),
HANDLE_FUNC(__pkvm_finalize_teardown_vm),
HANDLE_FUNC(__pkvm_vcpu_load),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 03e6fa124253..ca266a4d9d50 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -738,6 +738,32 @@ static int __guest_check_page_state_range(struct pkvm_hyp_vm *vm, u64 addr,
return check_page_state_range(&vm->pgt, addr, size, &d);
}
+static int get_valid_guest_pte(struct pkvm_hyp_vm *vm, u64 ipa, kvm_pte_t *ptep, u64 *physp)
+{
+ kvm_pte_t pte;
+ u64 phys;
+ s8 level;
+ int ret;
+
+ ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
+ if (ret)
+ return ret;
+ if (!kvm_pte_valid(pte))
+ return -ENOENT;
+ if (level != KVM_PGTABLE_LAST_LEVEL)
+ return -E2BIG;
+
+ phys = kvm_pte_to_phys(pte);
+ ret = check_range_allowed_memory(phys, phys + PAGE_SIZE);
+ if (WARN_ON(ret))
+ return ret;
+
+ *ptep = pte;
+ *physp = phys;
+
+ return 0;
+}
+
int __pkvm_host_share_hyp(u64 pfn)
{
u64 phys = hyp_pfn_to_phys(pfn);
@@ -971,6 +997,59 @@ static int __guest_check_transition_size(u64 phys, u64 ipa, u64 nr_pages, u64 *s
return 0;
}
+static void hyp_poison_page(phys_addr_t phys)
+{
+ void *addr = hyp_fixmap_map(phys);
+
+ memset(addr, 0, PAGE_SIZE);
+ /*
+ * Prefer kvm_flush_dcache_to_poc() over __clean_dcache_guest_page()
+ * here as the latter may elide the CMO under the assumption that FWB
+ * will be enabled on CPUs that support it. This is incorrect for the
+ * host stage-2 and would otherwise lead to a malicious host potentially
+ * being able to read the contents of newly reclaimed guest pages.
+ */
+ kvm_flush_dcache_to_poc(addr, PAGE_SIZE);
+ hyp_fixmap_unmap();
+}
+
+int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
+{
+ u64 ipa = hyp_pfn_to_phys(gfn);
+ kvm_pte_t pte;
+ u64 phys;
+ int ret;
+
+ host_lock_component();
+ guest_lock_component(vm);
+
+ ret = get_valid_guest_pte(vm, ipa, &pte, &phys);
+ if (ret)
+ goto unlock;
+
+ switch (guest_get_page_state(pte, ipa)) {
+ case PKVM_PAGE_OWNED:
+ WARN_ON(__host_check_page_state_range(phys, PAGE_SIZE, PKVM_NOPAGE));
+ hyp_poison_page(phys);
+ break;
+ case PKVM_PAGE_SHARED_OWNED:
+ WARN_ON(__host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_BORROWED));
+ break;
+ default:
+ ret = -EPERM;
+ goto unlock;
+ }
+
+ WARN_ON(kvm_pgtable_stage2_unmap(&vm->pgt, ipa, PAGE_SIZE));
+ WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HOST));
+
+unlock:
+ guest_unlock_component(vm);
+ host_unlock_component();
+
+ return ret;
+}
+
int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
{
struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index c4e05ab8b605..a2d45f4b0cf6 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -862,6 +862,20 @@ teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr, size_t size)
unmap_donated_memory_noclear(addr, size);
}
+int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 gfn)
+{
+ struct pkvm_hyp_vm *hyp_vm;
+ int ret = -EINVAL;
+
+ hyp_spin_lock(&vm_table_lock);
+ hyp_vm = get_vm_by_handle(handle);
+ if (hyp_vm && hyp_vm->kvm.arch.pkvm.is_dying)
+ ret = __pkvm_host_reclaim_page_guest(gfn, hyp_vm);
+ hyp_spin_unlock(&vm_table_lock);
+
+ return ret;
+}
+
int __pkvm_start_teardown_vm(pkvm_handle_t handle)
{
struct pkvm_hyp_vm *hyp_vm;
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 16/38] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy()
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (14 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 15/38] KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page() Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 17/38] KVM: arm64: Factor out pKVM host exception injection logic Will Deacon
` (22 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
During teardown of a protected guest, its memory pages must be reclaimed
from the hypervisor by issuing the '__pkvm_reclaim_dying_guest_page'
hypercall.
Add a new helper, __pkvm_pgtable_stage2_reclaim(), which is called
during the VM teardown operation to reclaim pages from the hypervisor
and drop the GUP pin on the host.
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/pkvm.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 1814e17d600e..8be91051699e 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -322,6 +322,32 @@ int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
return 0;
}
+static int __pkvm_pgtable_stage2_reclaim(struct kvm_pgtable *pgt, u64 start, u64 end)
+{
+ struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
+ pkvm_handle_t handle = kvm->arch.pkvm.handle;
+ struct pkvm_mapping *mapping;
+ int ret;
+
+ for_each_mapping_in_range_safe(pgt, start, end, mapping) {
+ struct page *page;
+
+ ret = kvm_call_hyp_nvhe(__pkvm_reclaim_dying_guest_page,
+ handle, mapping->gfn);
+ if (WARN_ON(ret))
+ return ret;
+
+ page = pfn_to_page(mapping->pfn);
+ WARN_ON_ONCE(mapping->nr_pages != 1);
+ unpin_user_pages_dirty_lock(&page, 1, true);
+ account_locked_vm(current->mm, 1, false);
+ pkvm_mapping_remove(mapping, &pgt->pkvm_mappings);
+ kfree(mapping);
+ }
+
+ return 0;
+}
+
static int __pkvm_pgtable_stage2_unshare(struct kvm_pgtable *pgt, u64 start, u64 end)
{
struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
@@ -355,7 +381,10 @@ void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
kvm->arch.pkvm.is_dying = true;
}
- __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
+ if (kvm_vm_is_protected(kvm))
+ __pkvm_pgtable_stage2_reclaim(pgt, addr, addr + size);
+ else
+ __pkvm_pgtable_stage2_unshare(pgt, addr, addr + size);
}
void pkvm_pgtable_stage2_destroy_pgd(struct kvm_pgtable *pgt)
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 17/38] KVM: arm64: Factor out pKVM host exception injection logic
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (15 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 16/38] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy() Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 18/38] KVM: arm64: Support translation faults in inject_host_exception() Will Deacon
` (21 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
inject_undef64() open-codes the logic to inject an exception into the
pKVM host. In preparation for reusing this logic to inject a data abort
on an unhandled stage-2 fault from the host, factor out the meat and
potatoes of the function into a new inject_host_exception() helper
which takes the ESR as a parameter.
Cc: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 49 ++++++++++++++----------------
1 file changed, 23 insertions(+), 26 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 7294c94f9296..adfc0bc15398 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -705,43 +705,40 @@ static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
kvm_skip_host_instr();
}
-/*
- * Inject an Undefined Instruction exception into the host.
- *
- * This is open-coded to allow control over PSTATE construction without
- * complicating the generic exception entry helpers.
- */
-static void inject_undef64(void)
+static void inject_host_exception(u64 esr)
{
- u64 spsr_mask, vbar, sctlr, old_spsr, new_spsr, esr, offset;
+ u64 sctlr, spsr_el1, spsr_el2, exc_offset = except_type_sync;
+ const u64 spsr_mask = PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT |
+ PSR_V_BIT | PSR_DIT_BIT | PSR_PAN_BIT;
- spsr_mask = PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT | PSR_V_BIT | PSR_DIT_BIT | PSR_PAN_BIT;
+ exc_offset += CURRENT_EL_SP_ELx_VECTOR;
+
+ spsr_el1 = spsr_el2 = read_sysreg_el2(SYS_SPSR);
+ spsr_el2 &= spsr_mask;
+ spsr_el2 |= PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT |
+ PSR_MODE_EL1h;
- vbar = read_sysreg_el1(SYS_VBAR);
sctlr = read_sysreg_el1(SYS_SCTLR);
- old_spsr = read_sysreg_el2(SYS_SPSR);
-
- new_spsr = old_spsr & spsr_mask;
- new_spsr |= PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT;
- new_spsr |= PSR_MODE_EL1h;
-
if (!(sctlr & SCTLR_EL1_SPAN))
- new_spsr |= PSR_PAN_BIT;
+ spsr_el2 |= PSR_PAN_BIT;
if (sctlr & SCTLR_ELx_DSSBS)
- new_spsr |= PSR_SSBS_BIT;
+ spsr_el2 |= PSR_SSBS_BIT;
if (system_supports_mte())
- new_spsr |= PSR_TCO_BIT;
-
- esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT) | ESR_ELx_IL;
- offset = CURRENT_EL_SP_ELx_VECTOR + except_type_sync;
+ spsr_el2 |= PSR_TCO_BIT;
write_sysreg_el1(esr, SYS_ESR);
write_sysreg_el1(read_sysreg_el2(SYS_ELR), SYS_ELR);
- write_sysreg_el1(old_spsr, SYS_SPSR);
- write_sysreg_el2(vbar + offset, SYS_ELR);
- write_sysreg_el2(new_spsr, SYS_SPSR);
+ write_sysreg_el1(spsr_el1, SYS_SPSR);
+ write_sysreg_el2(read_sysreg_el1(SYS_VBAR) + exc_offset, SYS_ELR);
+ write_sysreg_el2(spsr_el2, SYS_SPSR);
+}
+
+static void inject_host_undef64(void)
+{
+ inject_host_exception((ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT) |
+ ESR_ELx_IL);
}
static bool handle_host_mte(u64 esr)
@@ -764,7 +761,7 @@ static bool handle_host_mte(u64 esr)
return false;
}
- inject_undef64();
+ inject_host_undef64();
return true;
}
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 18/38] KVM: arm64: Support translation faults in inject_host_exception()
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (16 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 17/38] KVM: arm64: Factor out pKVM host exception injection logic Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 19/38] KVM: arm64: Inject SIGSEGV on illegal accesses Will Deacon
` (20 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Extend inject_host_exception() to support the injection of translation
faults on both the data and instruction side to 32-bit and 64-bit EL0
as well as 64-bit EL1. This will be used in a subsequent patch when
resolving an unhandled host stage-2 abort.
Cc: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/include/nvhe/trap_handler.h | 2 ++
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 18 +++++++++++++++---
2 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h b/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
index ba5382c12787..32d7b7746e8e 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
@@ -16,4 +16,6 @@
__always_unused int ___check_reg_ ## reg; \
type name = (type)cpu_reg(ctxt, (reg))
+void inject_host_exception(u64 esr);
+
#endif /* __ARM64_KVM_NVHE_TRAP_HANDLER_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index adfc0bc15398..6db5aebd92dc 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -705,15 +705,24 @@ static void handle_host_smc(struct kvm_cpu_context *host_ctxt)
kvm_skip_host_instr();
}
-static void inject_host_exception(u64 esr)
+void inject_host_exception(u64 esr)
{
u64 sctlr, spsr_el1, spsr_el2, exc_offset = except_type_sync;
const u64 spsr_mask = PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT |
PSR_V_BIT | PSR_DIT_BIT | PSR_PAN_BIT;
- exc_offset += CURRENT_EL_SP_ELx_VECTOR;
-
spsr_el1 = spsr_el2 = read_sysreg_el2(SYS_SPSR);
+ switch (spsr_el1 & (PSR_MODE_MASK | PSR_MODE32_BIT)) {
+ case PSR_MODE_EL0t:
+ exc_offset += LOWER_EL_AArch64_VECTOR;
+ break;
+ case PSR_MODE_EL0t | PSR_MODE32_BIT:
+ exc_offset += LOWER_EL_AArch32_VECTOR;
+ break;
+ default:
+ exc_offset += CURRENT_EL_SP_ELx_VECTOR;
+ }
+
spsr_el2 &= spsr_mask;
spsr_el2 |= PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT |
PSR_MODE_EL1h;
@@ -728,6 +737,9 @@ static void inject_host_exception(u64 esr)
if (system_supports_mte())
spsr_el2 |= PSR_TCO_BIT;
+ if (esr_fsc_is_translation_fault(esr))
+ write_sysreg_el1(read_sysreg_el2(SYS_FAR), SYS_FAR);
+
write_sysreg_el1(esr, SYS_ESR);
write_sysreg_el1(read_sysreg_el2(SYS_ELR), SYS_ELR);
write_sysreg_el1(spsr_el1, SYS_SPSR);
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 19/38] KVM: arm64: Inject SIGSEGV on illegal accesses
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (17 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 18/38] KVM: arm64: Support translation faults in inject_host_exception() Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 20/38] KVM: arm64: Avoid pointless annotation when mapping host-owned pages Will Deacon
` (19 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
From: Quentin Perret <qperret@google.com>
The pKVM hypervisor will currently panic if the host tries to access
memory that it doesn't own (e.g. protected guest memory). Sadly, as
guest memory can still be mapped into the VMM's address space, userspace
can trivially crash the kernel/hypervisor by poking into guest memory.
To prevent this, inject the abort back into the host with S1PTW set in
the ESR, allowing the host to differentiate this abort from normal
userspace faults and inject a SIGSEGV cleanly.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 37 +++++++++++++++++++++++++++
arch/arm64/mm/fault.c | 22 ++++++++++++++++
2 files changed, 59 insertions(+)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index ca266a4d9d50..0e57dc1881e0 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -18,6 +18,7 @@
#include <nvhe/memory.h>
#include <nvhe/mem_protect.h>
#include <nvhe/mm.h>
+#include <nvhe/trap_handler.h>
#define KVM_HOST_S2_FLAGS (KVM_PGTABLE_S2_AS_S1 | KVM_PGTABLE_S2_IDMAP)
@@ -612,6 +613,39 @@ static int host_stage2_idmap(u64 addr)
return ret;
}
+static void host_inject_mem_abort(struct kvm_cpu_context *host_ctxt)
+{
+ u64 ec, esr, spsr;
+
+ esr = read_sysreg_el2(SYS_ESR);
+ spsr = read_sysreg_el2(SYS_SPSR);
+
+ /* Repaint the ESR to report a same-level fault if taken from EL1 */
+ if ((spsr & PSR_MODE_MASK) != PSR_MODE_EL0t) {
+ ec = ESR_ELx_EC(esr);
+ if (ec == ESR_ELx_EC_DABT_LOW)
+ ec = ESR_ELx_EC_DABT_CUR;
+ else if (ec == ESR_ELx_EC_IABT_LOW)
+ ec = ESR_ELx_EC_IABT_CUR;
+ else
+ WARN_ON(1);
+ esr &= ~ESR_ELx_EC_MASK;
+ esr |= ec << ESR_ELx_EC_SHIFT;
+ }
+
+ /*
+ * Since S1PTW should only ever be set for stage-2 faults, we're pretty
+ * much guaranteed that it won't be set in ESR_EL1 by the hardware. So,
+ * let's use that bit to allow the host abort handler to differentiate
+ * this abort from normal userspace faults.
+ *
+ * Note: although S1PTW is RES0 at EL1, it is guaranteed by the
+ * architecture to be backed by flops, so it should be safe to use.
+ */
+ esr |= ESR_ELx_S1PTW;
+ inject_host_exception(esr);
+}
+
void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
{
struct kvm_vcpu_fault_info fault;
@@ -635,6 +669,9 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
addr = FIELD_GET(HPFAR_EL2_FIPA, fault.hpfar_el2) << 12;
switch (host_stage2_idmap(addr)) {
+ case -EPERM:
+ host_inject_mem_abort(host_ctxt);
+ fallthrough;
case -EEXIST:
case 0:
break;
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index be9dab2c7d6a..3abfc7272d63 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -43,6 +43,7 @@
#include <asm/system_misc.h>
#include <asm/tlbflush.h>
#include <asm/traps.h>
+#include <asm/virt.h>
struct fault_info {
int (*fn)(unsigned long far, unsigned long esr,
@@ -269,6 +270,15 @@ static inline bool is_el1_permission_fault(unsigned long addr, unsigned long esr
return false;
}
+static bool is_pkvm_stage2_abort(unsigned int esr)
+{
+ /*
+ * S1PTW should only ever be set in ESR_EL1 if the pkvm hypervisor
+ * injected a stage-2 abort -- see host_inject_mem_abort().
+ */
+ return is_pkvm_initialized() && (esr & ESR_ELx_S1PTW);
+}
+
static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
unsigned long esr,
struct pt_regs *regs)
@@ -279,6 +289,9 @@ static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
if (!is_el1_data_abort(esr) || !esr_fsc_is_translation_fault(esr))
return false;
+ if (is_pkvm_stage2_abort(esr))
+ return false;
+
local_irq_save(flags);
asm volatile("at s1e1r, %0" :: "r" (addr));
isb();
@@ -395,6 +408,8 @@ static void __do_kernel_fault(unsigned long addr, unsigned long esr,
msg = "read from unreadable memory";
} else if (addr < PAGE_SIZE) {
msg = "NULL pointer dereference";
+ } else if (is_pkvm_stage2_abort(esr)) {
+ msg = "access to hypervisor-protected memory";
} else {
if (esr_fsc_is_translation_fault(esr) &&
kfence_handle_page_fault(addr, esr & ESR_ELx_WNR, regs))
@@ -621,6 +636,13 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
addr, esr, regs);
}
+ if (is_pkvm_stage2_abort(esr)) {
+ if (!user_mode(regs))
+ goto no_context;
+ arm64_force_sig_fault(SIGSEGV, SEGV_ACCERR, far, "stage-2 fault");
+ return 0;
+ }
+
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
if (!(mm_flags & FAULT_FLAG_USER))
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 20/38] KVM: arm64: Avoid pointless annotation when mapping host-owned pages
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (18 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 19/38] KVM: arm64: Inject SIGSEGV on illegal accesses Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 21/38] KVM: arm64: Generalise kvm_pgtable_stage2_set_owner() Will Deacon
` (18 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
When a page is transitioned to host ownership, we can eagerly map it
into the host stage-2 page-table rather than going via the convoluted
step of a faulting annotation to trigger the mapping.
Call host_stage2_idmap_locked() directly when transitioning a page to
be owned by the host.
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 28 +++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 0e57dc1881e0..bf5102594fc8 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -551,23 +551,27 @@ static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_
int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
{
- int ret;
+ int ret = -EINVAL;
if (!range_is_memory(addr, addr + size))
return -EPERM;
- ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
- addr, size, &host_s2_pool, owner_id);
- if (ret)
- return ret;
+ switch (owner_id) {
+ case PKVM_ID_HOST:
+ ret = host_stage2_idmap_locked(addr, size, PKVM_HOST_MEM_PROT);
+ if (!ret)
+ __host_update_page_state(addr, size, PKVM_PAGE_OWNED);
+ break;
+ case PKVM_ID_GUEST:
+ case PKVM_ID_HYP:
+ ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
+ addr, size, &host_s2_pool, owner_id);
+ if (!ret)
+ __host_update_page_state(addr, size, PKVM_NOPAGE);
+ break;
+ }
- /* Don't forget to update the vmemmap tracking for the host */
- if (owner_id == PKVM_ID_HOST)
- __host_update_page_state(addr, size, PKVM_PAGE_OWNED);
- else
- __host_update_page_state(addr, size, PKVM_NOPAGE);
-
- return 0;
+ return ret;
}
static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 21/38] KVM: arm64: Generalise kvm_pgtable_stage2_set_owner()
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (19 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 20/38] KVM: arm64: Avoid pointless annotation when mapping host-owned pages Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 22/38] KVM: arm64: Introduce host_stage2_set_owner_metadata_locked() Will Deacon
` (17 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
kvm_pgtable_stage2_set_owner() can be generalised into a way to store
up to 59 bits in the page tables alongside a 4-bit 'type' identifier
specific to the format of the 59-bit payload.
Introduce kvm_pgtable_stage2_annotate() and move the existing invalid
ptes (for locked ptes and donated pages) over to the new scheme.
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_pgtable.h | 39 +++++++++++++++++++--------
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 16 +++++++++--
arch/arm64/kvm/hyp/pgtable.c | 33 ++++++++++++++---------
3 files changed, 62 insertions(+), 26 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 50caca311ef5..e36c2908bdb2 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -100,13 +100,25 @@ typedef u64 kvm_pte_t;
KVM_PTE_LEAF_ATTR_HI_S2_XN)
#define KVM_INVALID_PTE_OWNER_MASK GENMASK(9, 2)
-#define KVM_MAX_OWNER_ID 2
-/*
- * Used to indicate a pte for which a 'break-before-make' sequence is in
- * progress.
- */
-#define KVM_INVALID_PTE_LOCKED BIT(10)
+/* pKVM invalid pte encodings */
+#define KVM_INVALID_PTE_TYPE_MASK GENMASK(63, 60)
+#define KVM_INVALID_PTE_ANNOT_MASK ~(KVM_PTE_VALID | \
+ KVM_INVALID_PTE_TYPE_MASK)
+
+enum kvm_invalid_pte_type {
+ /*
+ * Used to indicate a pte for which a 'break-before-make'
+ * sequence is in progress.
+ */
+ KVM_INVALID_PTE_TYPE_LOCKED = 1,
+
+ /*
+ * pKVM has unmapped the page from the host due to a change of
+ * ownership.
+ */
+ KVM_HOST_INVALID_PTE_TYPE_DONATION,
+};
static inline bool kvm_pte_valid(kvm_pte_t pte)
{
@@ -658,14 +670,18 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
void *mc, enum kvm_pgtable_walk_flags flags);
/**
- * kvm_pgtable_stage2_set_owner() - Unmap and annotate pages in the IPA space to
- * track ownership.
+ * kvm_pgtable_stage2_annotate() - Unmap and annotate pages in the IPA space
+ * to track ownership (and more).
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Base intermediate physical address to annotate.
* @size: Size of the annotated range.
* @mc: Cache of pre-allocated and zeroed memory from which to allocate
* page-table pages.
- * @owner_id: Unique identifier for the owner of the page.
+ * @type: The type of the annotation, determining its meaning and format.
+ * @annotation: A 59-bit value that will be stored in the page tables.
+ * @annotation[0] and @annotation[63:60] must be 0.
+ * @annotation[59:1] is stored in the page tables, along
+ * with @type.
*
* By default, all page-tables are owned by identifier 0. This function can be
* used to mark portions of the IPA space as owned by other entities. When a
@@ -674,8 +690,9 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
*
* Return: 0 on success, negative error code on failure.
*/
-int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
- void *mc, u8 owner_id);
+int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
+ void *mc, enum kvm_invalid_pte_type type,
+ kvm_pte_t annotation);
/**
* kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index bf5102594fc8..aea6ec981801 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -549,10 +549,19 @@ static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_
set_host_state(page, state);
}
+static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
+{
+ return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
+}
+
int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
{
+ kvm_pte_t annotation;
int ret = -EINVAL;
+ if (!FIELD_FIT(KVM_INVALID_PTE_OWNER_MASK, owner_id))
+ return -EINVAL;
+
if (!range_is_memory(addr, addr + size))
return -EPERM;
@@ -564,8 +573,11 @@ int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
break;
case PKVM_ID_GUEST:
case PKVM_ID_HYP:
- ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
- addr, size, &host_s2_pool, owner_id);
+ annotation = kvm_init_invalid_leaf_owner(owner_id);
+ ret = host_stage2_try(kvm_pgtable_stage2_annotate, &host_mmu.pgt,
+ addr, size, &host_s2_pool,
+ KVM_HOST_INVALID_PTE_TYPE_DONATION,
+ annotation);
if (!ret)
__host_update_page_state(addr, size, PKVM_NOPAGE);
break;
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 9b480f947da2..84c7a1df845d 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -114,11 +114,6 @@ static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, s8 level)
return pte;
}
-static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
-{
- return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
-}
-
static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data,
const struct kvm_pgtable_visit_ctx *ctx,
enum kvm_pgtable_walk_flags visit)
@@ -581,7 +576,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
struct stage2_map_data {
const u64 phys;
kvm_pte_t attr;
- u8 owner_id;
+ kvm_pte_t pte_annot;
kvm_pte_t *anchor;
kvm_pte_t *childp;
@@ -798,7 +793,11 @@ static bool stage2_pte_is_counted(kvm_pte_t pte)
static bool stage2_pte_is_locked(kvm_pte_t pte)
{
- return !kvm_pte_valid(pte) && (pte & KVM_INVALID_PTE_LOCKED);
+ if (kvm_pte_valid(pte))
+ return false;
+
+ return FIELD_GET(KVM_INVALID_PTE_TYPE_MASK, pte) ==
+ KVM_INVALID_PTE_TYPE_LOCKED;
}
static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
@@ -829,6 +828,7 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
struct kvm_s2_mmu *mmu)
{
struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
+ kvm_pte_t locked_pte;
if (stage2_pte_is_locked(ctx->old)) {
/*
@@ -839,7 +839,9 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
return false;
}
- if (!stage2_try_set_pte(ctx, KVM_INVALID_PTE_LOCKED))
+ locked_pte = FIELD_PREP(KVM_INVALID_PTE_TYPE_MASK,
+ KVM_INVALID_PTE_TYPE_LOCKED);
+ if (!stage2_try_set_pte(ctx, locked_pte))
return false;
if (!kvm_pgtable_walk_skip_bbm_tlbi(ctx)) {
@@ -964,7 +966,7 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
if (!data->annotation)
new = kvm_init_valid_leaf_pte(phys, data->attr, ctx->level);
else
- new = kvm_init_invalid_leaf_owner(data->owner_id);
+ new = data->pte_annot;
/*
* Skip updating the PTE if we are trying to recreate the exact
@@ -1118,16 +1120,18 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
return ret;
}
-int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
- void *mc, u8 owner_id)
+int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
+ void *mc, enum kvm_invalid_pte_type type,
+ kvm_pte_t pte_annot)
{
int ret;
struct stage2_map_data map_data = {
.mmu = pgt->mmu,
.memcache = mc,
- .owner_id = owner_id,
.force_pte = true,
.annotation = true,
+ .pte_annot = pte_annot |
+ FIELD_PREP(KVM_INVALID_PTE_TYPE_MASK, type),
};
struct kvm_pgtable_walker walker = {
.cb = stage2_map_walker,
@@ -1136,7 +1140,10 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
.arg = &map_data,
};
- if (owner_id > KVM_MAX_OWNER_ID)
+ if (pte_annot & ~KVM_INVALID_PTE_ANNOT_MASK)
+ return -EINVAL;
+
+ if (!type || type == KVM_INVALID_PTE_TYPE_LOCKED)
return -EINVAL;
ret = kvm_pgtable_walk(pgt, addr, size, &walker);
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 22/38] KVM: arm64: Introduce host_stage2_set_owner_metadata_locked()
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (20 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 21/38] KVM: arm64: Generalise kvm_pgtable_stage2_set_owner() Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 23/38] KVM: arm64: Change 'pkvm_handle_t' to u16 Will Deacon
` (16 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Rework host_stage2_set_owner_locked() to add a new helper function,
host_stage2_set_owner_metadata_locked(), which will allow us to store
additional metadata alongside a 3-bit owner ID for invalid host stage-2
entries.
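The bit layout can be sketched in user space. This is a minimal model of the packing done by the new helper, with the mask positions taken from the diff below; `GENMASK64()`, `field_prep()` and `field_get()` are stand-ins for the kernel's `GENMASK_ULL()`/`FIELD_PREP()`/`FIELD_GET()`:

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's GENMASK()/FIELD_PREP()/FIELD_GET(). */
#define GENMASK64(h, l)	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

#define OWNER_MASK	GENMASK64(3, 1)		/* KVM_HOST_DONATION_PTE_OWNER_MASK */
#define EXTRA_MASK	GENMASK64(59, 4)	/* KVM_HOST_DONATION_PTE_EXTRA_MASK */

static inline uint64_t field_prep(uint64_t mask, uint64_t val)
{
	return (val << __builtin_ctzll(mask)) & mask;
}

static inline uint64_t field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) >> __builtin_ctzll(mask);
}

/*
 * Pack a 3-bit owner ID and opaque metadata into one invalid-PTE
 * annotation. Bit 0 (the valid bit) and bits 63:60 (the invalid-PTE
 * type nibble) are deliberately left clear for the caller to fill in.
 */
static uint64_t encode_owner_meta(uint8_t owner_id, uint64_t meta)
{
	return field_prep(OWNER_MASK, owner_id) | field_prep(EXTRA_MASK, meta);
}
```

Round-tripping through `field_get()` recovers both fields, which is what the later decode path relies on.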
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_pgtable.h | 2 --
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 47 ++++++++++++++++++---------
2 files changed, 32 insertions(+), 17 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index e36c2908bdb2..2df22640833c 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -99,8 +99,6 @@ typedef u64 kvm_pte_t;
KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
KVM_PTE_LEAF_ATTR_HI_S2_XN)
-#define KVM_INVALID_PTE_OWNER_MASK GENMASK(9, 2)
-
/* pKVM invalid pte encodings */
#define KVM_INVALID_PTE_TYPE_MASK GENMASK(63, 60)
#define KVM_INVALID_PTE_ANNOT_MASK ~(KVM_PTE_VALID | \
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index aea6ec981801..90003cbf5603 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -549,37 +549,54 @@ static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_
set_host_state(page, state);
}
-static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
-{
- return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
-}
-
-int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
+#define KVM_HOST_DONATION_PTE_OWNER_MASK GENMASK(3, 1)
+#define KVM_HOST_DONATION_PTE_EXTRA_MASK GENMASK(59, 4)
+static int host_stage2_set_owner_metadata_locked(phys_addr_t addr, u64 size,
+ u8 owner_id, u64 meta)
{
kvm_pte_t annotation;
- int ret = -EINVAL;
+ int ret;
- if (!FIELD_FIT(KVM_INVALID_PTE_OWNER_MASK, owner_id))
+ if (owner_id == PKVM_ID_HOST)
return -EINVAL;
if (!range_is_memory(addr, addr + size))
return -EPERM;
+ if (!FIELD_FIT(KVM_HOST_DONATION_PTE_OWNER_MASK, owner_id))
+ return -EINVAL;
+
+ if (!FIELD_FIT(KVM_HOST_DONATION_PTE_EXTRA_MASK, meta))
+ return -EINVAL;
+
+ annotation = FIELD_PREP(KVM_HOST_DONATION_PTE_OWNER_MASK, owner_id) |
+ FIELD_PREP(KVM_HOST_DONATION_PTE_EXTRA_MASK, meta);
+ ret = host_stage2_try(kvm_pgtable_stage2_annotate, &host_mmu.pgt,
+ addr, size, &host_s2_pool,
+ KVM_HOST_INVALID_PTE_TYPE_DONATION, annotation);
+ if (!ret)
+ __host_update_page_state(addr, size, PKVM_NOPAGE);
+
+ return ret;
+}
+
+int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
+{
+ int ret = -EINVAL;
+
switch (owner_id) {
case PKVM_ID_HOST:
+ if (!range_is_memory(addr, addr + size))
+ return -EPERM;
+
ret = host_stage2_idmap_locked(addr, size, PKVM_HOST_MEM_PROT);
if (!ret)
__host_update_page_state(addr, size, PKVM_PAGE_OWNED);
break;
case PKVM_ID_GUEST:
case PKVM_ID_HYP:
- annotation = kvm_init_invalid_leaf_owner(owner_id);
- ret = host_stage2_try(kvm_pgtable_stage2_annotate, &host_mmu.pgt,
- addr, size, &host_s2_pool,
- KVM_HOST_INVALID_PTE_TYPE_DONATION,
- annotation);
- if (!ret)
- __host_update_page_state(addr, size, PKVM_NOPAGE);
+ ret = host_stage2_set_owner_metadata_locked(addr, size,
+ owner_id, 0);
break;
}
--
2.53.0.1018.g2bb0e51243-goog
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 23/38] KVM: arm64: Change 'pkvm_handle_t' to u16
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (21 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 22/38] KVM: arm64: Introduce host_stage2_set_owner_metadata_locked() Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 24/38] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2 Will Deacon
` (15 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
'pkvm_handle_t' doesn't need to be a 32-bit type and subsequent patches
will rely on it being no more than 16 bits so that it can be encoded
into a pte annotation.
Change 'pkvm_handle_t' to a u16 and add a compile-time check that the
maximum handle fits into the reduced type.
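The shape of that compile-time check can be reproduced with a plain `_Static_assert` in user space. The `HANDLE_OFFSET`/`KVM_MAX_PVMS` values here are illustrative only, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t pkvm_handle_t;

/* Illustrative values; the kernel's HANDLE_OFFSET/KVM_MAX_PVMS differ. */
#define HANDLE_OFFSET	0x1000
#define KVM_MAX_PVMS	255

/*
 * User-space analogue of the BUILD_BUG_ON() added below: refuse to
 * compile if the largest possible handle no longer fits the narrowed
 * type. (pkvm_handle_t)-1 is the maximum value of the unsigned type.
 */
_Static_assert((uint64_t)HANDLE_OFFSET + KVM_MAX_PVMS <= (pkvm_handle_t)-1,
	       "handle space overflows pkvm_handle_t");
```

The cast to u64 on the left-hand side matters: it prevents the sum from wrapping before the comparison if the constants were ever enlarged.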
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_host.h | 2 +-
arch/arm64/kvm/hyp/nvhe/pkvm.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 31b9454bb74d..0c5e7ce5f187 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -247,7 +247,7 @@ struct kvm_smccc_features {
unsigned long vendor_hyp_bmap_2; /* Function numbers 64-127 */
};
-typedef unsigned int pkvm_handle_t;
+typedef u16 pkvm_handle_t;
struct kvm_protected_vm {
pkvm_handle_t handle;
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index a2d45f4b0cf6..a7253a884163 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -222,6 +222,7 @@ static struct pkvm_hyp_vm **vm_table;
void pkvm_hyp_vm_table_init(void *tbl)
{
+ BUILD_BUG_ON((u64)HANDLE_OFFSET + KVM_MAX_PVMS > (pkvm_handle_t)-1);
WARN_ON(vm_table);
vm_table = tbl;
}
--
2.53.0.1018.g2bb0e51243-goog
^ permalink raw reply related [flat|nested] 40+ messages in thread
* [PATCH v4 24/38] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (22 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 23/38] KVM: arm64: Change 'pkvm_handle_t' to u16 Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 25/38] KVM: arm64: Introduce hypercall to force reclaim of a protected page Will Deacon
` (14 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Handling host kernel faults arising from accesses to donated guest
memory will require an rmap-like mechanism to identify the guest mapping
of the faulting page.
Extend the page donation logic to encode the guest handle and gfn
alongside the owner information in the host stage-2 pte.
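The handle/gfn packing is a straightforward split of the 56-bit metadata payload: 16 bits of handle (now guaranteed by the previous patch) plus 40 bits of GFN (52-bit IPA minus 12 bits of 4KiB page offset, matching the comment in the diff). A self-contained sketch, with mask values taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define GENMASK64(h, l)	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

#define HANDLE_MASK	GENMASK64(15, 0)	/* 16-bit pkvm_handle_t */
#define GFN_MASK	GENMASK64(55, 16)	/* 40 bits: 52-bit IPA, 4KiB pages */

/* Pack a guest handle and gfn into the 56-bit metadata payload. */
static uint64_t encode_gfn_meta(uint16_t handle, uint64_t gfn)
{
	return (uint64_t)handle | ((gfn << 16) & GFN_MASK);
}

/* Recover both fields from a previously encoded annotation. */
static void decode_gfn_meta(uint64_t meta, uint16_t *handle, uint64_t *gfn)
{
	*handle = meta & HANDLE_MASK;
	*gfn = (meta & GFN_MASK) >> 16;
}
```

Decode of an encoded value gives back exactly the handle and gfn, which is the property the rmap-like lookup in the next patch depends on.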
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 90003cbf5603..51cb5c89fd20 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -593,7 +593,6 @@ int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
if (!ret)
__host_update_page_state(addr, size, PKVM_PAGE_OWNED);
break;
- case PKVM_ID_GUEST:
case PKVM_ID_HYP:
ret = host_stage2_set_owner_metadata_locked(addr, size,
owner_id, 0);
@@ -603,6 +602,20 @@ int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
return ret;
}
+#define KVM_HOST_PTE_OWNER_GUEST_HANDLE_MASK GENMASK(15, 0)
+/* We need 40 bits for the GFN to cover a 52-bit IPA with 4k pages and LPA2 */
+#define KVM_HOST_PTE_OWNER_GUEST_GFN_MASK GENMASK(55, 16)
+static u64 host_stage2_encode_gfn_meta(struct pkvm_hyp_vm *vm, u64 gfn)
+{
+ pkvm_handle_t handle = vm->kvm.arch.pkvm.handle;
+
+ BUILD_BUG_ON((pkvm_handle_t)-1 > KVM_HOST_PTE_OWNER_GUEST_HANDLE_MASK);
+ WARN_ON(!FIELD_FIT(KVM_HOST_PTE_OWNER_GUEST_GFN_MASK, gfn));
+
+ return FIELD_PREP(KVM_HOST_PTE_OWNER_GUEST_HANDLE_MASK, handle) |
+ FIELD_PREP(KVM_HOST_PTE_OWNER_GUEST_GFN_MASK, gfn);
+}
+
static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
{
/*
@@ -1125,6 +1138,7 @@ int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
u64 phys = hyp_pfn_to_phys(pfn);
u64 ipa = hyp_pfn_to_phys(gfn);
+ u64 meta;
int ret;
host_lock_component();
@@ -1138,7 +1152,9 @@ int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
if (ret)
goto unlock;
- WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_GUEST));
+ meta = host_stage2_encode_gfn_meta(vm, gfn);
+ WARN_ON(host_stage2_set_owner_metadata_locked(phys, PAGE_SIZE,
+ PKVM_ID_GUEST, meta));
WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
&vcpu->vcpu.arch.pkvm_memcache, 0));
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 25/38] KVM: arm64: Introduce hypercall to force reclaim of a protected page
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (23 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 24/38] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2 Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 26/38] KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler Will Deacon
` (13 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Introduce a new hypercall, __pkvm_force_reclaim_guest_page(), to allow
the host to forcefully reclaim a physical page that was previously donated
to a protected guest. This results in the page being zeroed and the
previous guest mapping being poisoned so that new pages cannot be
subsequently donated at the same IPA.
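The "poisoned" marker is just another invalid-PTE type in the 63:60 nibble. A user-space sketch of the predicate and the annotation value, assuming an enumerator value of 3 for KVM_GUEST_INVALID_PTE_TYPE_POISONED (the real value is whatever the enum in the diff assigns):

```c
#include <assert.h>
#include <stdint.h>

#define PTE_VALID	1ULL
#define TYPE_SHIFT	60
#define TYPE_MASK	(0xFULL << TYPE_SHIFT)	/* KVM_INVALID_PTE_TYPE_MASK */

/* Assumed enumerator value for KVM_GUEST_INVALID_PTE_TYPE_POISONED. */
#define TYPE_POISONED	3ULL

/* A pte is poisoned iff it is invalid and its type nibble says so. */
static int pte_is_poisoned(uint64_t pte)
{
	if (pte & PTE_VALID)
		return 0;
	return ((pte & TYPE_MASK) >> TYPE_SHIFT) == TYPE_POISONED;
}

/* Annotate an IPA as poisoned: no owner, no metadata, just the type. */
static uint64_t poison_annotation(void)
{
	return TYPE_POISONED << TYPE_SHIFT;
}
```

Note that a valid pte can never be misread as poisoned, since the valid bit is checked first.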
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/include/asm/kvm_pgtable.h | 6 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/memory.h | 6 +
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 1 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 ++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 127 +++++++++++++++++-
arch/arm64/kvm/hyp/nvhe/pkvm.c | 4 +-
8 files changed, 152 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index b6df8f64d573..04a230e906a7 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -90,6 +90,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
+ __KVM_HOST_SMCCC_FUNC___pkvm_force_reclaim_guest_page,
__KVM_HOST_SMCCC_FUNC___pkvm_reclaim_dying_guest_page,
__KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_finalize_teardown_vm,
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 2df22640833c..41a8687938eb 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -116,6 +116,12 @@ enum kvm_invalid_pte_type {
* ownership.
*/
KVM_HOST_INVALID_PTE_TYPE_DONATION,
+
+ /*
+ * The page has been forcefully reclaimed from the guest by the
+ * host.
+ */
+ KVM_GUEST_INVALID_PTE_TYPE_POISONED,
};
static inline bool kvm_pte_valid(kvm_pte_t pte)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 29f81a1d9e1f..acc031103600 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -40,6 +40,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
+int __pkvm_host_force_reclaim_page_guest(phys_addr_t phys);
int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm);
int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index dee1a406b0c2..4cedb720c75d 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -30,6 +30,12 @@ enum pkvm_page_state {
* struct hyp_page.
*/
PKVM_NOPAGE = BIT(0) | BIT(1),
+
+ /*
+ * 'Meta-states' which aren't encoded directly in the PTE's SW bits (or
+ * the hyp_vmemmap entry for the host)
+ */
+ PKVM_POISON = BIT(2),
};
#define PKVM_PAGE_STATE_MASK (BIT(0) | BIT(1))
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 506831804f64..a5a7bb453f3e 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -78,6 +78,7 @@ int __pkvm_reclaim_dying_guest_page(pkvm_handle_t handle, u64 gfn);
int __pkvm_start_teardown_vm(pkvm_handle_t handle);
int __pkvm_finalize_teardown_vm(pkvm_handle_t handle);
+struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle);
struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
unsigned int vcpu_idx);
void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 6db5aebd92dc..456c83207717 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -573,6 +573,13 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
}
+static void handle___pkvm_force_reclaim_guest_page(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
+
+ cpu_reg(host_ctxt, 1) = __pkvm_host_force_reclaim_page_guest(phys);
+}
+
static void handle___pkvm_reclaim_dying_guest_page(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
@@ -634,6 +641,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_unreserve_vm),
HANDLE_FUNC(__pkvm_init_vm),
HANDLE_FUNC(__pkvm_init_vcpu),
+ HANDLE_FUNC(__pkvm_force_reclaim_guest_page),
HANDLE_FUNC(__pkvm_reclaim_dying_guest_page),
HANDLE_FUNC(__pkvm_start_teardown_vm),
HANDLE_FUNC(__pkvm_finalize_teardown_vm),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 51cb5c89fd20..dfc512d3bb20 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -616,6 +616,35 @@ static u64 host_stage2_encode_gfn_meta(struct pkvm_hyp_vm *vm, u64 gfn)
FIELD_PREP(KVM_HOST_PTE_OWNER_GUEST_GFN_MASK, gfn);
}
+static int host_stage2_decode_gfn_meta(kvm_pte_t pte, struct pkvm_hyp_vm **vm,
+ u64 *gfn)
+{
+ pkvm_handle_t handle;
+ u64 meta;
+
+ if (WARN_ON(kvm_pte_valid(pte)))
+ return -EINVAL;
+
+ if (FIELD_GET(KVM_INVALID_PTE_TYPE_MASK, pte) !=
+ KVM_HOST_INVALID_PTE_TYPE_DONATION) {
+ return -EINVAL;
+ }
+
+ if (FIELD_GET(KVM_HOST_DONATION_PTE_OWNER_MASK, pte) != PKVM_ID_GUEST)
+ return -EPERM;
+
+ meta = FIELD_GET(KVM_HOST_DONATION_PTE_EXTRA_MASK, pte);
+ handle = FIELD_GET(KVM_HOST_PTE_OWNER_GUEST_HANDLE_MASK, meta);
+ *vm = get_vm_by_handle(handle);
+ if (!*vm) {
+ /* We probably raced with teardown; try again */
+ return -EAGAIN;
+ }
+
+ *gfn = FIELD_GET(KVM_HOST_PTE_OWNER_GUEST_GFN_MASK, meta);
+ return 0;
+}
+
static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
{
/*
@@ -801,8 +830,20 @@ static int __hyp_check_page_state_range(phys_addr_t phys, u64 size, enum pkvm_pa
return 0;
}
+static bool guest_pte_is_poisoned(kvm_pte_t pte)
+{
+ if (kvm_pte_valid(pte))
+ return false;
+
+ return FIELD_GET(KVM_INVALID_PTE_TYPE_MASK, pte) ==
+ KVM_GUEST_INVALID_PTE_TYPE_POISONED;
+}
+
static enum pkvm_page_state guest_get_page_state(kvm_pte_t pte, u64 addr)
{
+ if (guest_pte_is_poisoned(pte))
+ return PKVM_POISON;
+
if (!kvm_pte_valid(pte))
return PKVM_NOPAGE;
@@ -831,6 +872,8 @@ static int get_valid_guest_pte(struct pkvm_hyp_vm *vm, u64 ipa, kvm_pte_t *ptep,
ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
if (ret)
return ret;
+ if (guest_pte_is_poisoned(pte))
+ return -EHWPOISON;
if (!kvm_pte_valid(pte))
return -ENOENT;
if (level != KVM_PGTABLE_LAST_LEVEL)
@@ -1096,6 +1139,84 @@ static void hyp_poison_page(phys_addr_t phys)
hyp_fixmap_unmap();
}
+static int host_stage2_get_guest_info(phys_addr_t phys, struct pkvm_hyp_vm **vm,
+ u64 *gfn)
+{
+ enum pkvm_page_state state;
+ kvm_pte_t pte;
+ s8 level;
+ int ret;
+
+ if (!addr_is_memory(phys))
+ return -EFAULT;
+
+ state = get_host_state(hyp_phys_to_page(phys));
+ switch (state) {
+ case PKVM_PAGE_OWNED:
+ case PKVM_PAGE_SHARED_OWNED:
+ case PKVM_PAGE_SHARED_BORROWED:
+ /* The access should no longer fault; try again. */
+ return -EAGAIN;
+ case PKVM_NOPAGE:
+ break;
+ default:
+ return -EPERM;
+ }
+
+ ret = kvm_pgtable_get_leaf(&host_mmu.pgt, phys, &pte, &level);
+ if (ret)
+ return ret;
+
+ if (WARN_ON(level != KVM_PGTABLE_LAST_LEVEL))
+ return -EINVAL;
+
+ return host_stage2_decode_gfn_meta(pte, vm, gfn);
+}
+
+int __pkvm_host_force_reclaim_page_guest(phys_addr_t phys)
+{
+ struct pkvm_hyp_vm *vm;
+ u64 gfn, ipa, pa;
+ kvm_pte_t pte;
+ int ret;
+
+ hyp_spin_lock(&vm_table_lock);
+ host_lock_component();
+
+ ret = host_stage2_get_guest_info(phys, &vm, &gfn);
+ if (ret)
+ goto unlock_host;
+
+ ipa = hyp_pfn_to_phys(gfn);
+ guest_lock_component(vm);
+ ret = get_valid_guest_pte(vm, ipa, &pte, &pa);
+ if (ret)
+ goto unlock_guest;
+
+ WARN_ON(pa != phys);
+ if (guest_get_page_state(pte, ipa) != PKVM_PAGE_OWNED) {
+ ret = -EPERM;
+ goto unlock_guest;
+ }
+
+ /* We really shouldn't be allocating, so don't pass a memcache */
+ ret = kvm_pgtable_stage2_annotate(&vm->pgt, ipa, PAGE_SIZE, NULL,
+ KVM_GUEST_INVALID_PTE_TYPE_POISONED,
+ 0);
+ if (ret)
+ goto unlock_guest;
+
+ hyp_poison_page(phys);
+ WARN_ON(host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HOST));
+unlock_guest:
+ guest_unlock_component(vm);
+unlock_host:
+ host_unlock_component();
+ hyp_spin_unlock(&vm_table_lock);
+
+ return ret;
+}
+
int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
{
u64 ipa = hyp_pfn_to_phys(gfn);
@@ -1130,7 +1251,11 @@ int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
guest_unlock_component(vm);
host_unlock_component();
- return ret;
+ /*
+ * -EHWPOISON implies that the page was forcefully reclaimed already
+ * so return success for the GUP pin to be dropped.
+ */
+ return ret && ret != -EHWPOISON ? ret : 0;
}
int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index a7253a884163..5269ac20d2fb 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -230,10 +230,12 @@ void pkvm_hyp_vm_table_init(void *tbl)
/*
* Return the hyp vm structure corresponding to the handle.
*/
-static struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle)
+struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle)
{
unsigned int idx = vm_handle_to_idx(handle);
+ hyp_assert_lock_held(&vm_table_lock);
+
if (unlikely(idx >= KVM_MAX_PVMS))
return NULL;
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 26/38] KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (24 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 25/38] KVM: arm64: Introduce hypercall to force reclaim of a protected page Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 27/38] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte Will Deacon
` (12 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Host kernel accesses to pages that are inaccessible at stage-2 result in
the injection of a translation fault, which is fatal unless an exception
table fixup is registered for the faulting PC (e.g. for user access
routines). This is undesirable, since a get_user_pages() call could be
used to obtain a reference to a donated page and then a subsequent
access via a kernel mapping would lead to a panic().
Rework the spurious fault handler so that stage-2 faults injected back
into the host result in the target page being forcefully reclaimed when
no exception table fixup handler is registered.
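The resulting decision in the fault handler can be modelled as: run AT S1E1R, and if the stage-1 walk now succeeds (PAR_EL1.F clear) while the abort came from pKVM's stage-2, reclaim the page at the PA reported in PAR_EL1 rather than treating the fault as spurious. A hedged user-space sketch, with `reclaim()` standing in for pkvm_force_reclaim_guest_page() and the PA field taken as bits 51:12 of PAR_EL1:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GENMASK64(h, l)	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))
#define PAR_EL1_F	1ULL			/* stage-1 translation failed */
#define PAR_EL1_PA	GENMASK64(51, 12)	/* output address field */

/*
 * Decision mirrored from the reworked fault handler: a stage-1 walk
 * that succeeds (F clear) during a pKVM stage-2 abort means the host
 * was blocked by the hypervisor, so reclaim the faulting page.
 */
static bool handle_s2_translation_fault(uint64_t par, bool pkvm_s2_abort,
					bool (*reclaim)(uint64_t pa))
{
	if (par & PAR_EL1_F)
		return false;		/* genuine stage-1 fault: not spurious */
	if (pkvm_s2_abort)
		return reclaim(par & PAR_EL1_PA);
	return true;			/* spurious: retry the access */
}

/* Test stub: record the PA we were asked to reclaim. */
static uint64_t reclaimed_pa;
static bool fake_reclaim(uint64_t pa)
{
	reclaimed_pa = pa;
	return true;
}
```

The page-offset bits of PAR_EL1 are masked off before the reclaim call, matching the `par &= SYS_PAR_EL1_PA` in the diff below.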
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/virt.h | 9 +++++++++
arch/arm64/kvm/pkvm.c | 12 ++++++++++++
arch/arm64/mm/fault.c | 17 +++++++++++------
3 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index b51ab6840f9c..b546703c3ab9 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -94,6 +94,15 @@ static inline bool is_pkvm_initialized(void)
static_branch_likely(&kvm_protected_mode_initialized);
}
+#ifdef CONFIG_KVM
+bool pkvm_force_reclaim_guest_page(phys_addr_t phys);
+#else
+static inline bool pkvm_force_reclaim_guest_page(phys_addr_t phys)
+{
+ return false;
+}
+#endif
+
/* Reports the availability of HYP mode */
static inline bool is_hyp_mode_available(void)
{
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 8be91051699e..32294bd21dde 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -563,3 +563,15 @@ int pkvm_pgtable_stage2_split(struct kvm_pgtable *pgt, u64 addr, u64 size,
WARN_ON_ONCE(1);
return -EINVAL;
}
+
+/*
+ * Forcefully reclaim a page from the guest, zeroing its contents and
+ * poisoning the stage-2 pte so that pages can no longer be mapped at
+ * the same IPA. The page remains pinned until the guest is destroyed.
+ */
+bool pkvm_force_reclaim_guest_page(phys_addr_t phys)
+{
+ int ret = kvm_call_hyp_nvhe(__pkvm_force_reclaim_guest_page, phys);
+
+ return !ret || ret == -EAGAIN;
+}
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 3abfc7272d63..7eacc7b45c1f 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -289,9 +289,6 @@ static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
if (!is_el1_data_abort(esr) || !esr_fsc_is_translation_fault(esr))
return false;
- if (is_pkvm_stage2_abort(esr))
- return false;
-
local_irq_save(flags);
asm volatile("at s1e1r, %0" :: "r" (addr));
isb();
@@ -302,8 +299,14 @@ static bool __kprobes is_spurious_el1_translation_fault(unsigned long addr,
* If we now have a valid translation, treat the translation fault as
* spurious.
*/
- if (!(par & SYS_PAR_EL1_F))
+ if (!(par & SYS_PAR_EL1_F)) {
+ if (is_pkvm_stage2_abort(esr)) {
+ par &= SYS_PAR_EL1_PA;
+ return pkvm_force_reclaim_guest_page(par);
+ }
+
return true;
+ }
/*
* If we got a different type of fault from the AT instruction,
@@ -389,9 +392,11 @@ static void __do_kernel_fault(unsigned long addr, unsigned long esr,
if (!is_el1_instruction_abort(esr) && fixup_exception(regs, esr))
return;
- if (WARN_RATELIMIT(is_spurious_el1_translation_fault(addr, esr, regs),
- "Ignoring spurious kernel translation fault at virtual address %016lx\n", addr))
+ if (is_spurious_el1_translation_fault(addr, esr, regs)) {
+ WARN_RATELIMIT(!is_pkvm_stage2_abort(esr),
+ "Ignoring spurious kernel translation fault at virtual address %016lx\n", addr);
return;
+ }
if (is_el1_mte_sync_tag_check_fault(esr)) {
do_tag_recovery(addr, esr, regs);
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 27/38] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (25 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 26/38] KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 28/38] KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs Will Deacon
` (11 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
If a protected vCPU faults on an IPA which appears to be mapped, query
the hypervisor to determine whether or not the faulting pte has been
poisoned by a forceful reclaim. If the pte has been poisoned, return
-EFAULT back to userspace rather than retrying the instruction forever.
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 10 +++++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 43 +++++++++++++++++++
arch/arm64/kvm/pkvm.c | 9 ++--
5 files changed, 61 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 04a230e906a7..6c79f7504d80 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -90,6 +90,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
+ __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_in_poison_fault,
__KVM_HOST_SMCCC_FUNC___pkvm_force_reclaim_guest_page,
__KVM_HOST_SMCCC_FUNC___pkvm_reclaim_dying_guest_page,
__KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index acc031103600..8bc9a2489298 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -40,6 +40,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
+int __pkvm_vcpu_in_poison_fault(struct pkvm_hyp_vcpu *hyp_vcpu);
int __pkvm_host_force_reclaim_page_guest(phys_addr_t phys);
int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm);
int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 456c83207717..90e3b14fe287 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -573,6 +573,15 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
}
+static void handle___pkvm_vcpu_in_poison_fault(struct kvm_cpu_context *host_ctxt)
+{
+ int ret;
+ struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
+
+ ret = hyp_vcpu ? __pkvm_vcpu_in_poison_fault(hyp_vcpu) : -EINVAL;
+ cpu_reg(host_ctxt, 1) = ret;
+}
+
static void handle___pkvm_force_reclaim_guest_page(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
@@ -641,6 +650,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_unreserve_vm),
HANDLE_FUNC(__pkvm_init_vm),
HANDLE_FUNC(__pkvm_init_vcpu),
+ HANDLE_FUNC(__pkvm_vcpu_in_poison_fault),
HANDLE_FUNC(__pkvm_force_reclaim_guest_page),
HANDLE_FUNC(__pkvm_reclaim_dying_guest_page),
HANDLE_FUNC(__pkvm_start_teardown_vm),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index dfc512d3bb20..6fc2c77a6920 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -890,6 +890,49 @@ static int get_valid_guest_pte(struct pkvm_hyp_vm *vm, u64 ipa, kvm_pte_t *ptep,
return 0;
}
+int __pkvm_vcpu_in_poison_fault(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+ struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
+ kvm_pte_t pte;
+ s8 level;
+ u64 ipa;
+ int ret;
+
+ switch (kvm_vcpu_trap_get_class(&hyp_vcpu->vcpu)) {
+ case ESR_ELx_EC_DABT_LOW:
+ case ESR_ELx_EC_IABT_LOW:
+ if (kvm_vcpu_trap_is_translation_fault(&hyp_vcpu->vcpu))
+ break;
+ fallthrough;
+ default:
+ return -EINVAL;
+ }
+
+ /*
+ * The host has the faulting IPA when it calls us from the guest
+ * fault handler but we retrieve it ourselves from the FAR so as
+ * to avoid exposing an "oracle" that could reveal data access
+ * patterns of the guest after initial donation of its pages.
+ */
+ ipa = kvm_vcpu_get_fault_ipa(&hyp_vcpu->vcpu);
+ ipa |= FAR_TO_FIPA_OFFSET(kvm_vcpu_get_hfar(&hyp_vcpu->vcpu));
+
+ guest_lock_component(vm);
+ ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
+ if (ret)
+ goto unlock;
+
+ if (level != KVM_PGTABLE_LAST_LEVEL) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
+ ret = guest_pte_is_poisoned(pte);
+unlock:
+ guest_unlock_component(vm);
+ return ret;
+}
+
int __pkvm_host_share_hyp(u64 pfn)
{
u64 phys = hyp_pfn_to_phys(pfn);
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 32294bd21dde..da0a45dab203 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -417,10 +417,13 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
return -EINVAL;
/*
- * We raced with another vCPU.
+ * We either raced with another vCPU or the guest PTE
+ * has been poisoned by an erroneous host access.
*/
- if (mapping)
- return -EAGAIN;
+ if (mapping) {
+ ret = kvm_call_hyp_nvhe(__pkvm_vcpu_in_poison_fault);
+ return ret ? -EFAULT : -EAGAIN;
+ }
ret = kvm_call_hyp_nvhe(__pkvm_host_donate_guest, pfn, gfn);
} else {
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 28/38] KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (26 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 27/38] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 29/38] KVM: arm64: Implement the MEM_SHARE hypercall for " Will Deacon
` (10 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Add a hypercall handler at EL2 for hypercalls originating from protected
VMs. For now, this implements only the FEATURES and MEMINFO calls, but
subsequent patches will implement the SHARE and UNSHARE functions
necessary for virtio.
Unhandled hypercalls (including PSCI) are passed back to the host.
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 1 +
arch/arm64/kvm/hyp/nvhe/pkvm.c | 37 ++++++++++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/switch.c | 1 +
3 files changed, 39 insertions(+)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index a5a7bb453f3e..c904647d2f76 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -88,6 +88,7 @@ struct pkvm_hyp_vm *get_pkvm_hyp_vm(pkvm_handle_t handle);
struct pkvm_hyp_vm *get_np_pkvm_hyp_vm(pkvm_handle_t handle);
void put_pkvm_hyp_vm(struct pkvm_hyp_vm *hyp_vm);
+bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code);
bool kvm_handle_pvm_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code);
bool kvm_handle_pvm_restricted(struct kvm_vcpu *vcpu, u64 *exit_code);
void kvm_init_pvm_id_regs(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 5269ac20d2fb..8b32bf37acc3 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -4,6 +4,8 @@
* Author: Fuad Tabba <tabba@google.com>
*/
+#include <kvm/arm_hypercalls.h>
+
#include <linux/kvm_host.h>
#include <linux/mm.h>
@@ -965,3 +967,38 @@ int __pkvm_finalize_teardown_vm(pkvm_handle_t handle)
hyp_spin_unlock(&vm_table_lock);
return err;
}
+/*
+ * Handler for protected VM HVC calls.
+ *
+ * Returns true if the hypervisor has handled the exit (and control
+ * should return to the guest) or false if it hasn't (and the handling
+ * should be performed by the host).
+ */
+bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+ u64 val[4] = { SMCCC_RET_INVALID_PARAMETER };
+ bool handled = true;
+
+ switch (smccc_get_function(vcpu)) {
+ case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
+ val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
+ val[0] |= BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO);
+ break;
+ case ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID:
+ if (smccc_get_arg1(vcpu) ||
+ smccc_get_arg2(vcpu) ||
+ smccc_get_arg3(vcpu)) {
+ break;
+ }
+
+ val[0] = PAGE_SIZE;
+ break;
+ default:
+ /* Punt everything else back to the host, for now. */
+ handled = false;
+ }
+
+ if (handled)
+ smccc_set_retval(vcpu, val[0], val[1], val[2], val[3]);
+ return handled;
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 779089e42681..51bd88dc6012 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -190,6 +190,7 @@ static const exit_handler_fn hyp_exit_handlers[] = {
static const exit_handler_fn pvm_exit_handlers[] = {
[0 ... ESR_ELx_EC_MAX] = NULL,
+ [ESR_ELx_EC_HVC64] = kvm_handle_pvm_hvc64,
[ESR_ELx_EC_SYS64] = kvm_handle_pvm_sys64,
[ESR_ELx_EC_SVE] = kvm_handle_pvm_restricted,
[ESR_ELx_EC_FP_ASIMD] = kvm_hyp_handle_fpsimd,
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 29/38] KVM: arm64: Implement the MEM_SHARE hypercall for protected VMs
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (27 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 28/38] KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 30/38] KVM: arm64: Implement the MEM_UNSHARE " Will Deacon
` (9 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Implement the ARM_SMCCC_KVM_FUNC_MEM_SHARE hypercall to allow protected
VMs to share memory (e.g. the swiotlb bounce buffers) back to the host.
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 32 ++++++++++
arch/arm64/kvm/hyp/nvhe/pkvm.c | 61 +++++++++++++++++++
3 files changed, 94 insertions(+)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 8bc9a2489298..fea8aecae5ef 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -34,6 +34,7 @@ extern unsigned long hyp_nr_cpus;
int __pkvm_prot_finalize(void);
int __pkvm_host_share_hyp(u64 pfn);
+int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn);
int __pkvm_host_unshare_hyp(u64 pfn);
int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 6fc2c77a6920..e005a5690c65 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -959,6 +959,38 @@ int __pkvm_host_share_hyp(u64 pfn)
return ret;
}
+int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn)
+{
+ struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+ u64 phys, ipa = hyp_pfn_to_phys(gfn);
+ kvm_pte_t pte;
+ int ret;
+
+ host_lock_component();
+ guest_lock_component(vm);
+
+ ret = get_valid_guest_pte(vm, ipa, &pte, &phys);
+ if (ret)
+ goto unlock;
+
+ ret = -EPERM;
+ if (pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte)) != PKVM_PAGE_OWNED)
+ goto unlock;
+ if (__host_check_page_state_range(phys, PAGE_SIZE, PKVM_NOPAGE))
+ goto unlock;
+
+ ret = 0;
+ WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
+ pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_OWNED),
+ &vcpu->vcpu.arch.pkvm_memcache, 0));
+ WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_BORROWED));
+unlock:
+ guest_unlock_component(vm);
+ host_unlock_component();
+
+ return ret;
+}
+
int __pkvm_host_unshare_hyp(u64 pfn)
{
u64 phys = hyp_pfn_to_phys(pfn);
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 8b32bf37acc3..1dc9225073c4 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -967,6 +967,58 @@ int __pkvm_finalize_teardown_vm(pkvm_handle_t handle)
hyp_spin_unlock(&vm_table_lock);
return err;
}
+
+static u64 __pkvm_memshare_page_req(struct kvm_vcpu *vcpu, u64 ipa)
+{
+ u64 elr;
+
+ /* Fake up a data abort (level 3 translation fault on write) */
+ vcpu->arch.fault.esr_el2 = (ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT) |
+ ESR_ELx_WNR | ESR_ELx_FSC_FAULT |
+ FIELD_PREP(ESR_ELx_FSC_LEVEL, 3);
+
+ /* Shuffle the IPA around into the HPFAR */
+ vcpu->arch.fault.hpfar_el2 = (HPFAR_EL2_NS | (ipa >> 8)) & HPFAR_MASK;
+
+ /* This is a virtual address. 0's good. Let's go with 0. */
+ vcpu->arch.fault.far_el2 = 0;
+
+ /* Rewind the ELR so we return to the HVC once the IPA is mapped */
+ elr = read_sysreg(elr_el2);
+ elr -= 4;
+ write_sysreg(elr, elr_el2);
+
+ return ARM_EXCEPTION_TRAP;
+}
+
+static bool pkvm_memshare_call(u64 *ret, struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+ struct pkvm_hyp_vcpu *hyp_vcpu;
+ u64 ipa = smccc_get_arg1(vcpu);
+
+ if (!PAGE_ALIGNED(ipa))
+ goto out_guest;
+
+ hyp_vcpu = container_of(vcpu, struct pkvm_hyp_vcpu, vcpu);
+ switch (__pkvm_guest_share_host(hyp_vcpu, hyp_phys_to_pfn(ipa))) {
+ case 0:
+ ret[0] = SMCCC_RET_SUCCESS;
+ goto out_guest;
+ case -ENOENT:
+ /*
+ * Convert the exception into a data abort so that the page
+ * being shared is mapped into the guest next time.
+ */
+ *exit_code = __pkvm_memshare_page_req(vcpu, ipa);
+ goto out_host;
+ }
+
+out_guest:
+ return true;
+out_host:
+ return false;
+}
+
/*
* Handler for protected VM HVC calls.
*
@@ -983,6 +1035,7 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
val[0] |= BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO);
+ val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_SHARE);
break;
case ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID:
if (smccc_get_arg1(vcpu) ||
@@ -993,6 +1046,14 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
val[0] = PAGE_SIZE;
break;
+ case ARM_SMCCC_VENDOR_HYP_KVM_MEM_SHARE_FUNC_ID:
+ if (smccc_get_arg2(vcpu) ||
+ smccc_get_arg3(vcpu)) {
+ break;
+ }
+
+ handled = pkvm_memshare_call(val, vcpu, exit_code);
+ break;
default:
/* Punt everything else back to the host, for now. */
handled = false;
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 30/38] KVM: arm64: Implement the MEM_UNSHARE hypercall for protected VMs
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (28 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 29/38] KVM: arm64: Implement the MEM_SHARE hypercall for " Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 31/38] KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled Will Deacon
` (8 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Implement the ARM_SMCCC_KVM_FUNC_MEM_UNSHARE hypercall to allow
protected VMs to unshare memory that was previously shared with the host
using the ARM_SMCCC_KVM_FUNC_MEM_SHARE hypercall.
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 34 +++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/pkvm.c | 22 ++++++++++++
3 files changed, 57 insertions(+)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index fea8aecae5ef..99d8398afe20 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -35,6 +35,7 @@ extern unsigned long hyp_nr_cpus;
int __pkvm_prot_finalize(void);
int __pkvm_host_share_hyp(u64 pfn);
int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn);
+int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn);
int __pkvm_host_unshare_hyp(u64 pfn);
int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index e005a5690c65..898bd5d767ce 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -991,6 +991,40 @@ int __pkvm_guest_share_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn)
return ret;
}
+int __pkvm_guest_unshare_host(struct pkvm_hyp_vcpu *vcpu, u64 gfn)
+{
+ struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+ u64 meta, phys, ipa = hyp_pfn_to_phys(gfn);
+ kvm_pte_t pte;
+ int ret;
+
+ host_lock_component();
+ guest_lock_component(vm);
+
+ ret = get_valid_guest_pte(vm, ipa, &pte, &phys);
+ if (ret)
+ goto unlock;
+
+ ret = -EPERM;
+ if (pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte)) != PKVM_PAGE_SHARED_OWNED)
+ goto unlock;
+ if (__host_check_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_BORROWED))
+ goto unlock;
+
+ ret = 0;
+ meta = host_stage2_encode_gfn_meta(vm, gfn);
+ WARN_ON(host_stage2_set_owner_metadata_locked(phys, PAGE_SIZE,
+ PKVM_ID_GUEST, meta));
+ WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
+ pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
+ &vcpu->vcpu.arch.pkvm_memcache, 0));
+unlock:
+ guest_unlock_component(vm);
+ host_unlock_component();
+
+ return ret;
+}
+
int __pkvm_host_unshare_hyp(u64 pfn)
{
u64 phys = hyp_pfn_to_phys(pfn);
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 1dc9225073c4..ebfd9904ede6 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -1019,6 +1019,19 @@ static bool pkvm_memshare_call(u64 *ret, struct kvm_vcpu *vcpu, u64 *exit_code)
return false;
}
+static void pkvm_memunshare_call(u64 *ret, struct kvm_vcpu *vcpu)
+{
+ struct pkvm_hyp_vcpu *hyp_vcpu;
+ u64 ipa = smccc_get_arg1(vcpu);
+
+ if (!PAGE_ALIGNED(ipa))
+ return;
+
+ hyp_vcpu = container_of(vcpu, struct pkvm_hyp_vcpu, vcpu);
+ if (!__pkvm_guest_unshare_host(hyp_vcpu, hyp_phys_to_pfn(ipa)))
+ ret[0] = SMCCC_RET_SUCCESS;
+}
+
/*
* Handler for protected VM HVC calls.
*
@@ -1036,6 +1049,7 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
val[0] |= BIT(ARM_SMCCC_KVM_FUNC_HYP_MEMINFO);
val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_SHARE);
+ val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MEM_UNSHARE);
break;
case ARM_SMCCC_VENDOR_HYP_KVM_HYP_MEMINFO_FUNC_ID:
if (smccc_get_arg1(vcpu) ||
@@ -1054,6 +1068,14 @@ bool kvm_handle_pvm_hvc64(struct kvm_vcpu *vcpu, u64 *exit_code)
handled = pkvm_memshare_call(val, vcpu, exit_code);
break;
+ case ARM_SMCCC_VENDOR_HYP_KVM_MEM_UNSHARE_FUNC_ID:
+ if (smccc_get_arg2(vcpu) ||
+ smccc_get_arg3(vcpu)) {
+ break;
+ }
+
+ pkvm_memunshare_call(val, vcpu);
+ break;
default:
/* Punt everything else back to the host, for now. */
handled = false;
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 31/38] KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (29 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 30/38] KVM: arm64: Implement the MEM_UNSHARE " Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 32/38] KVM: arm64: Add some initial documentation for pKVM Will Deacon
` (7 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Introduce a new VM type for KVM/arm64 to allow userspace to request the
creation of a "protected VM" when the host has booted with pKVM enabled.
For now, this feature results in a taint on first use as many aspects of
a protected VM are not yet protected!
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/include/asm/kvm_pkvm.h | 2 +-
arch/arm64/kvm/arm.c | 8 +++++++-
arch/arm64/kvm/mmu.c | 3 ---
arch/arm64/kvm/pkvm.c | 8 +++++++-
include/uapi/linux/kvm.h | 5 +++++
5 files changed, 20 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 7041e398fb4c..2954b311128c 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -17,7 +17,7 @@
#define HYP_MEMBLOCK_REGIONS 128
-int pkvm_init_host_vm(struct kvm *kvm);
+int pkvm_init_host_vm(struct kvm *kvm, unsigned long type);
int pkvm_create_hyp_vm(struct kvm *kvm);
bool pkvm_hyp_vm_is_created(struct kvm *kvm);
void pkvm_destroy_hyp_vm(struct kvm *kvm);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 3589fc08266c..c2b666a46893 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -203,6 +203,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
{
int ret;
+ if (type & ~KVM_VM_TYPE_ARM_MASK)
+ return -EINVAL;
+
mutex_init(&kvm->arch.config_lock);
#ifdef CONFIG_LOCKDEP
@@ -234,9 +237,12 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
* If any failures occur after this is successful, make sure to
* call __pkvm_unreserve_vm to unreserve the VM in hyp.
*/
- ret = pkvm_init_host_vm(kvm);
+ ret = pkvm_init_host_vm(kvm, type);
if (ret)
goto err_uninit_mmu;
+ } else if (type & KVM_VM_TYPE_ARM_PROTECTED) {
+ ret = -EINVAL;
+ goto err_uninit_mmu;
}
kvm_vgic_early_init(kvm);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6a4151e3e4a3..45358ae8a300 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -881,9 +881,6 @@ static int kvm_init_ipa_range(struct kvm_s2_mmu *mmu, unsigned long type)
u64 mmfr0, mmfr1;
u32 phys_shift;
- if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
- return -EINVAL;
-
phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
if (is_protected_kvm_enabled()) {
phys_shift = kvm_ipa_limit;
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index da0a45dab203..632852648012 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -219,9 +219,10 @@ void pkvm_destroy_hyp_vm(struct kvm *kvm)
mutex_unlock(&kvm->arch.config_lock);
}
-int pkvm_init_host_vm(struct kvm *kvm)
+int pkvm_init_host_vm(struct kvm *kvm, unsigned long type)
{
int ret;
+ bool protected = type & KVM_VM_TYPE_ARM_PROTECTED;
if (pkvm_hyp_vm_is_created(kvm))
return -EINVAL;
@@ -236,6 +237,11 @@ int pkvm_init_host_vm(struct kvm *kvm)
return ret;
kvm->arch.pkvm.handle = ret;
+ kvm->arch.pkvm.is_protected = protected;
+ if (protected) {
+ pr_warn_once("kvm: protected VMs are experimental and for development only, tainting kernel\n");
+ add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+ }
return 0;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 80364d4dbebb..073b2bcaf560 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -703,6 +703,11 @@ struct kvm_enable_cap {
#define KVM_VM_TYPE_ARM_IPA_SIZE_MASK 0xffULL
#define KVM_VM_TYPE_ARM_IPA_SIZE(x) \
((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
+
+#define KVM_VM_TYPE_ARM_PROTECTED (1UL << 31)
+#define KVM_VM_TYPE_ARM_MASK (KVM_VM_TYPE_ARM_IPA_SIZE_MASK | \
+ KVM_VM_TYPE_ARM_PROTECTED)
+
/*
* ioctls for /dev/kvm fds:
*/
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 32/38] KVM: arm64: Add some initial documentation for pKVM
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (30 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 31/38] KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 33/38] KVM: arm64: Extend pKVM page ownership selftests to cover guest donation Will Deacon
` (6 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Add some initial documentation for pKVM to help people understand what
is supported, the limitations of protected VMs compared to
non-protected VMs, and what is left to do.
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
.../admin-guide/kernel-parameters.txt | 4 +-
Documentation/virt/kvm/arm/index.rst | 1 +
Documentation/virt/kvm/arm/pkvm.rst | 106 ++++++++++++++++++
3 files changed, 109 insertions(+), 2 deletions(-)
create mode 100644 Documentation/virt/kvm/arm/pkvm.rst
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 03a550630644..44854a67bc63 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3247,8 +3247,8 @@ Kernel parameters
for the host. To force nVHE on VHE hardware, add
"arm64_sw.hvhe=0 id_aa64mmfr1.vh=0" to the
command-line.
- "nested" is experimental and should be used with
- extreme caution.
+ "nested" and "protected" are experimental and should be
+ used with extreme caution.
kvm-arm.vgic_v3_group0_trap=
[KVM,ARM,EARLY] Trap guest accesses to GICv3 group-0
diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
index ec09881de4cf..0856b4942e05 100644
--- a/Documentation/virt/kvm/arm/index.rst
+++ b/Documentation/virt/kvm/arm/index.rst
@@ -10,6 +10,7 @@ ARM
fw-pseudo-registers
hyp-abi
hypercalls
+ pkvm
pvtime
ptp_kvm
vcpu-features
diff --git a/Documentation/virt/kvm/arm/pkvm.rst b/Documentation/virt/kvm/arm/pkvm.rst
new file mode 100644
index 000000000000..514992a79a83
--- /dev/null
+++ b/Documentation/virt/kvm/arm/pkvm.rst
@@ -0,0 +1,106 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Protected KVM (pKVM)
+====================
+
+**NOTE**: pKVM is currently an experimental, development feature and
+subject to breaking changes as new isolation features are implemented.
+Please reach out to the developers at kvmarm@lists.linux.dev if you have
+any questions.
+
+Overview
+========
+
+Booting a host kernel with '``kvm-arm.mode=protected``' enables
+"Protected KVM" (pKVM). During boot, pKVM installs a stage-2 identity
+map page-table for the host and uses it to isolate the hypervisor
+running at EL2 from the rest of the host running at EL1/0.
+
+pKVM permits creation of protected virtual machines (pVMs) by passing
+the ``KVM_VM_TYPE_ARM_PROTECTED`` machine type identifier to the
+``KVM_CREATE_VM`` ioctl(). The hypervisor isolates pVMs from the host by
+unmapping pages from the stage-2 identity map as they are accessed by a
+pVM. Hypercalls are provided for a pVM to share specific regions of its
+IPA space back with the host, allowing for communication with the VMM.
+A Linux guest must be configured with ``CONFIG_ARM_PKVM_GUEST=y`` in
+order to issue these hypercalls.
+
+See hypercalls.rst for more details.
+
+Isolation mechanisms
+====================
+
+pKVM relies on a number of mechanisms to isolate pVMs from the host:
+
+CPU memory isolation
+--------------------
+
+Status: Isolation of anonymous memory and metadata pages.
+
+Metadata pages (e.g. page-table pages and '``struct kvm_vcpu``' pages)
+are donated from the host to the hypervisor during pVM creation and
+are consequently unmapped from the stage-2 identity map until the pVM is
+destroyed.
+
+Similarly to regular KVM, pages are lazily mapped into the guest in
+response to stage-2 page faults handled by the host. However, when
+running a pVM, these pages are first pinned and then unmapped from the
+stage-2 identity map as part of the donation procedure. This gives rise
+to some user-visible differences when compared to non-protected VMs,
+largely due to the lack of MMU notifiers:
+
+* Memslots cannot be moved or deleted once the pVM has started running.
+* Read-only memslots and dirty logging are not supported.
+* With the exception of swap, file-backed pages cannot be mapped into a
+ pVM.
+* Donated pages are accounted against ``RLIMIT_MLOCK`` and so the VMM
+ must have a sufficient resource limit or be granted ``CAP_IPC_LOCK``.
+ The lack of a runtime reclaim mechanism means that memory locked for
+ a pVM will remain locked until the pVM is destroyed.
+* Changes to the VMM address space (e.g. a ``MAP_FIXED`` mmap() over a
+ mapping associated with a memslot) are not reflected in the guest and
+ may lead to loss of coherency.
+* Accessing pVM memory that has not been shared back will result in the
+ delivery of a SIGSEGV.
+* If a system call accesses pVM memory that has not been shared back
+ then it will either return ``-EFAULT`` or forcefully reclaim the
+ memory pages. Reclaimed memory is zeroed by the hypervisor and a
+ subsequent attempt to access it in the pVM will return ``-EFAULT``
+ from the ``VCPU_RUN`` ioctl().
+
+CPU state isolation
+-------------------
+
+Status: **Unimplemented.**
+
+DMA isolation using an IOMMU
+----------------------------
+
+Status: **Unimplemented.**
+
+Proxying of Trustzone services
+------------------------------
+
+Status: FF-A and PSCI calls from the host are proxied by the pKVM
+hypervisor.
+
+The FF-A proxy ensures that the host cannot share pVM or hypervisor
+memory with Trustzone as part of a "confused deputy" attack.
+
+The PSCI proxy ensures that CPUs always have the stage-2 identity map
+installed when they are executing in the host.
+
+Protected VM firmware (pvmfw)
+-----------------------------
+
+Status: **Unimplemented.**
+
+Resources
+=========
+
+Quentin Perret's KVM Forum 2022 talk entitled "Protected KVM on arm64: A
+technical deep dive" remains a good resource for learning more about
+pKVM, despite some of the details having changed in the meantime:
+
+https://www.youtube.com/watch?v=9npebeVFbFw
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 33/38] KVM: arm64: Extend pKVM page ownership selftests to cover guest donation
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (31 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 32/38] KVM: arm64: Add some initial documentation for pKVM Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 34/38] KVM: arm64: Register 'selftest_vm' in the VM table Will Deacon
` (5 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Extend the pKVM page ownership selftests to cover donating a page to a
guest and reclaiming it from the guest.
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 898bd5d767ce..6525f9fa274c 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1749,6 +1749,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size);
assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
selftest_state.host = PKVM_PAGE_OWNED;
selftest_state.hyp = PKVM_NOPAGE;
@@ -1768,6 +1769,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
assert_transition_res(0, hyp_pin_shared_mem, virt, virt + size);
assert_transition_res(0, hyp_pin_shared_mem, virt, virt + size);
@@ -1780,6 +1782,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
hyp_unpin_shared_mem(virt, virt + size);
assert_page_state();
@@ -1799,6 +1802,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
assert_transition_res(-ENOENT, __pkvm_host_unshare_guest, gfn, 1, vm);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size);
selftest_state.host = PKVM_PAGE_OWNED;
@@ -1815,6 +1819,7 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn);
assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn);
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
assert_transition_res(-EPERM, hyp_pin_shared_mem, virt, virt + size);
selftest_state.guest[1] = PKVM_PAGE_SHARED_BORROWED;
@@ -1828,6 +1833,23 @@ void pkvm_ownership_selftest(void *base)
selftest_state.host = PKVM_PAGE_OWNED;
assert_transition_res(0, __pkvm_host_unshare_guest, gfn + 1, 1, vm);
+ selftest_state.host = PKVM_NOPAGE;
+ selftest_state.guest[0] = PKVM_PAGE_OWNED;
+ assert_transition_res(0, __pkvm_host_donate_guest, pfn, gfn, vcpu);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn + 1, vcpu);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn + 1, 1, vcpu, prot);
+ assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1);
+ assert_transition_res(-EPERM, __pkvm_host_donate_hyp, pfn, 1);
+ assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn);
+ assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn);
+ assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
+
+ selftest_state.host = PKVM_PAGE_OWNED;
+ selftest_state.guest[0] = PKVM_NOPAGE;
+ assert_transition_res(0, __pkvm_host_reclaim_page_guest, gfn, vm);
+
selftest_state.host = PKVM_NOPAGE;
selftest_state.hyp = PKVM_PAGE_OWNED;
assert_transition_res(0, __pkvm_host_donate_hyp, pfn, 1);
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 34/38] KVM: arm64: Register 'selftest_vm' in the VM table
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (32 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 33/38] KVM: arm64: Extend pKVM page ownership selftests to cover guest donation Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 35/38] KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim Will Deacon
` (4 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
In preparation for extending the pKVM page ownership selftests to cover
forceful reclaim of donated pages, rework the creation of the
'selftest_vm' so that it is registered in the VM table while the tests
are running.
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 53 ++++---------------
arch/arm64/kvm/hyp/nvhe/pkvm.c | 49 +++++++++++++++++
3 files changed, 61 insertions(+), 43 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 99d8398afe20..5031879ccb87 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -76,6 +76,8 @@ static __always_inline void __load_host_stage2(void)
#ifdef CONFIG_NVHE_EL2_DEBUG
void pkvm_ownership_selftest(void *base);
+struct pkvm_hyp_vcpu *init_selftest_vm(void *virt);
+void teardown_selftest_vm(void);
#else
static inline void pkvm_ownership_selftest(void *base) { }
#endif
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 6525f9fa274c..b2c9ea105701 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1646,53 +1646,18 @@ struct pkvm_expected_state {
static struct pkvm_expected_state selftest_state;
static struct hyp_page *selftest_page;
-
-static struct pkvm_hyp_vm selftest_vm = {
- .kvm = {
- .arch = {
- .mmu = {
- .arch = &selftest_vm.kvm.arch,
- .pgt = &selftest_vm.pgt,
- },
- },
- },
-};
-
-static struct pkvm_hyp_vcpu selftest_vcpu = {
- .vcpu = {
- .arch = {
- .hw_mmu = &selftest_vm.kvm.arch.mmu,
- },
- .kvm = &selftest_vm.kvm,
- },
-};
-
-static void init_selftest_vm(void *virt)
-{
- struct hyp_page *p = hyp_virt_to_page(virt);
- int i;
-
- selftest_vm.kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr;
- WARN_ON(kvm_guest_prepare_stage2(&selftest_vm, virt));
-
- for (i = 0; i < pkvm_selftest_pages(); i++) {
- if (p[i].refcount)
- continue;
- p[i].refcount = 1;
- hyp_put_page(&selftest_vm.pool, hyp_page_to_virt(&p[i]));
- }
-}
+static struct pkvm_hyp_vcpu *selftest_vcpu;
static u64 selftest_ipa(void)
{
- return BIT(selftest_vm.pgt.ia_bits - 1);
+ return BIT(selftest_vcpu->vcpu.arch.hw_mmu->pgt->ia_bits - 1);
}
static void assert_page_state(void)
{
void *virt = hyp_page_to_virt(selftest_page);
u64 size = PAGE_SIZE << selftest_page->order;
- struct pkvm_hyp_vcpu *vcpu = &selftest_vcpu;
+ struct pkvm_hyp_vcpu *vcpu = selftest_vcpu;
u64 phys = hyp_virt_to_phys(virt);
u64 ipa[2] = { selftest_ipa(), selftest_ipa() + PAGE_SIZE };
struct pkvm_hyp_vm *vm;
@@ -1707,10 +1672,10 @@ static void assert_page_state(void)
WARN_ON(__hyp_check_page_state_range(phys, size, selftest_state.hyp));
hyp_unlock_component();
- guest_lock_component(&selftest_vm);
+ guest_lock_component(vm);
WARN_ON(__guest_check_page_state_range(vm, ipa[0], size, selftest_state.guest[0]));
WARN_ON(__guest_check_page_state_range(vm, ipa[1], size, selftest_state.guest[1]));
- guest_unlock_component(&selftest_vm);
+ guest_unlock_component(vm);
}
#define assert_transition_res(res, fn, ...) \
@@ -1723,14 +1688,15 @@ void pkvm_ownership_selftest(void *base)
{
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_RWX;
void *virt = hyp_alloc_pages(&host_s2_pool, 0);
- struct pkvm_hyp_vcpu *vcpu = &selftest_vcpu;
- struct pkvm_hyp_vm *vm = &selftest_vm;
+ struct pkvm_hyp_vcpu *vcpu;
u64 phys, size, pfn, gfn;
+ struct pkvm_hyp_vm *vm;
WARN_ON(!virt);
selftest_page = hyp_virt_to_page(virt);
selftest_page->refcount = 0;
- init_selftest_vm(base);
+ selftest_vcpu = vcpu = init_selftest_vm(base);
+ vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
size = PAGE_SIZE << selftest_page->order;
phys = hyp_virt_to_phys(virt);
@@ -1854,6 +1820,7 @@ void pkvm_ownership_selftest(void *base)
selftest_state.hyp = PKVM_PAGE_OWNED;
assert_transition_res(0, __pkvm_host_donate_hyp, pfn, 1);
+ teardown_selftest_vm();
selftest_page->refcount = 1;
hyp_put_page(&host_s2_pool, virt);
}
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index ebfd9904ede6..794a19fa911d 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -727,6 +727,55 @@ void __pkvm_unreserve_vm(pkvm_handle_t handle)
hyp_spin_unlock(&vm_table_lock);
}
+#ifdef CONFIG_NVHE_EL2_DEBUG
+static struct pkvm_hyp_vm selftest_vm = {
+ .kvm = {
+ .arch = {
+ .mmu = {
+ .arch = &selftest_vm.kvm.arch,
+ .pgt = &selftest_vm.pgt,
+ },
+ },
+ },
+};
+
+static struct pkvm_hyp_vcpu selftest_vcpu = {
+ .vcpu = {
+ .arch = {
+ .hw_mmu = &selftest_vm.kvm.arch.mmu,
+ },
+ .kvm = &selftest_vm.kvm,
+ },
+};
+
+struct pkvm_hyp_vcpu *init_selftest_vm(void *virt)
+{
+ struct hyp_page *p = hyp_virt_to_page(virt);
+ int i;
+
+ selftest_vm.kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr;
+ WARN_ON(kvm_guest_prepare_stage2(&selftest_vm, virt));
+
+ for (i = 0; i < pkvm_selftest_pages(); i++) {
+ if (p[i].refcount)
+ continue;
+ p[i].refcount = 1;
+ hyp_put_page(&selftest_vm.pool, hyp_page_to_virt(&p[i]));
+ }
+
+ selftest_vm.kvm.arch.pkvm.handle = __pkvm_reserve_vm();
+ insert_vm_table_entry(selftest_vm.kvm.arch.pkvm.handle, &selftest_vm);
+ return &selftest_vcpu;
+}
+
+void teardown_selftest_vm(void)
+{
+ hyp_spin_lock(&vm_table_lock);
+ remove_vm_table_entry(selftest_vm.kvm.arch.pkvm.handle);
+ hyp_spin_unlock(&vm_table_lock);
+}
+#endif /* CONFIG_NVHE_EL2_DEBUG */
+
/*
* Initialize the hypervisor copy of the VM state using host-donated memory.
*
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 35/38] KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (33 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 34/38] KVM: arm64: Register 'selftest_vm' in the VM table Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 36/38] KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs Will Deacon
` (3 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Extend the pKVM page ownership selftests to forcefully reclaim a donated
page and check that it cannot be re-donated at the same IPA.
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index b2c9ea105701..05a5b145e303 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1813,8 +1813,20 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
selftest_state.host = PKVM_PAGE_OWNED;
- selftest_state.guest[0] = PKVM_NOPAGE;
- assert_transition_res(0, __pkvm_host_reclaim_page_guest, gfn, vm);
+ selftest_state.guest[0] = PKVM_POISON;
+ assert_transition_res(0, __pkvm_host_force_reclaim_page_guest, phys);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
+
+ selftest_state.host = PKVM_NOPAGE;
+ selftest_state.guest[1] = PKVM_PAGE_OWNED;
+ assert_transition_res(0, __pkvm_host_donate_guest, pfn, gfn + 1, vcpu);
+
+ selftest_state.host = PKVM_PAGE_OWNED;
+ selftest_state.guest[1] = PKVM_NOPAGE;
+ assert_transition_res(0, __pkvm_host_reclaim_page_guest, gfn + 1, vm);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
selftest_state.host = PKVM_NOPAGE;
selftest_state.hyp = PKVM_PAGE_OWNED;
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 36/38] KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (34 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 35/38] KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 37/38] KVM: arm64: Rename PKVM_PAGE_STATE_MASK Will Deacon
` (2 subsequent siblings)
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Now that the guest can share and unshare memory with the host using
hypercalls, extend the pKVM page ownership selftest to exercise these
new transitions.
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 30 +++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 05a5b145e303..0921efb8a16f 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1812,11 +1812,41 @@ void pkvm_ownership_selftest(void *base)
assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn);
assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
+ selftest_state.host = PKVM_PAGE_SHARED_BORROWED;
+ selftest_state.guest[0] = PKVM_PAGE_SHARED_OWNED;
+ assert_transition_res(0, __pkvm_guest_share_host, vcpu, gfn);
+ assert_transition_res(-EPERM, __pkvm_guest_share_host, vcpu, gfn);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn + 1, vcpu);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn + 1, 1, vcpu, prot);
+ assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1);
+ assert_transition_res(-EPERM, __pkvm_host_donate_hyp, pfn, 1);
+ assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn);
+ assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn);
+ assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
+
+ selftest_state.host = PKVM_NOPAGE;
+ selftest_state.guest[0] = PKVM_PAGE_OWNED;
+ assert_transition_res(0, __pkvm_guest_unshare_host, vcpu, gfn);
+ assert_transition_res(-EPERM, __pkvm_guest_unshare_host, vcpu, gfn);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
+ assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn + 1, vcpu);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
+ assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn + 1, 1, vcpu, prot);
+ assert_transition_res(-EPERM, __pkvm_host_share_ffa, pfn, 1);
+ assert_transition_res(-EPERM, __pkvm_host_donate_hyp, pfn, 1);
+ assert_transition_res(-EPERM, __pkvm_host_share_hyp, pfn);
+ assert_transition_res(-EPERM, __pkvm_host_unshare_hyp, pfn);
+ assert_transition_res(-EPERM, __pkvm_hyp_donate_host, pfn, 1);
+
selftest_state.host = PKVM_PAGE_OWNED;
selftest_state.guest[0] = PKVM_POISON;
assert_transition_res(0, __pkvm_host_force_reclaim_page_guest, phys);
assert_transition_res(-EPERM, __pkvm_host_donate_guest, pfn, gfn, vcpu);
assert_transition_res(-EPERM, __pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
+ assert_transition_res(-EHWPOISON, __pkvm_guest_share_host, vcpu, gfn);
+ assert_transition_res(-EHWPOISON, __pkvm_guest_unshare_host, vcpu, gfn);
selftest_state.host = PKVM_NOPAGE;
selftest_state.guest[1] = PKVM_PAGE_OWNED;
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 37/38] KVM: arm64: Rename PKVM_PAGE_STATE_MASK
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (35 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 36/38] KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 14:00 ` [PATCH v4 38/38] drivers/virt: pkvm: Add Kconfig dependency on DMA_RESTRICTED_POOL Will Deacon
2026-03-27 18:13 ` [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
Rename PKVM_PAGE_STATE_MASK to PKVM_PAGE_STATE_VMEMMAP_MASK to make it
clear that the mask applies to the page state recorded in the entries
of the 'hyp_vmemmap', rather than page states stored elsewhere (e.g. in
the ptes).
Suggested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
arch/arm64/kvm/hyp/include/nvhe/memory.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 4cedb720c75d..b50712d47f6d 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -37,7 +37,7 @@ enum pkvm_page_state {
*/
PKVM_POISON = BIT(2),
};
-#define PKVM_PAGE_STATE_MASK (BIT(0) | BIT(1))
+#define PKVM_PAGE_STATE_VMEMMAP_MASK (BIT(0) | BIT(1))
#define PKVM_PAGE_STATE_PROT_MASK (KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
static inline enum kvm_pgtable_prot pkvm_mkstate(enum kvm_pgtable_prot prot,
@@ -114,12 +114,12 @@ static inline void set_host_state(struct hyp_page *p, enum pkvm_page_state state
static inline enum pkvm_page_state get_hyp_state(struct hyp_page *p)
{
- return p->__hyp_state_comp ^ PKVM_PAGE_STATE_MASK;
+ return p->__hyp_state_comp ^ PKVM_PAGE_STATE_VMEMMAP_MASK;
}
static inline void set_hyp_state(struct hyp_page *p, enum pkvm_page_state state)
{
- p->__hyp_state_comp = state ^ PKVM_PAGE_STATE_MASK;
+ p->__hyp_state_comp = state ^ PKVM_PAGE_STATE_VMEMMAP_MASK;
}
/*
--
2.53.0.1018.g2bb0e51243-goog
* [PATCH v4 38/38] drivers/virt: pkvm: Add Kconfig dependency on DMA_RESTRICTED_POOL
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (36 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 37/38] KVM: arm64: Rename PKVM_PAGE_STATE_MASK Will Deacon
@ 2026-03-27 14:00 ` Will Deacon
2026-03-27 18:13 ` [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 14:00 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Will Deacon, Marc Zyngier, Oliver Upton,
Joey Gouly, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Quentin Perret, Fuad Tabba, Vincent Donnefort, Mostafa Saleh,
Alexandru Elisei
pKVM guests practically rely on CONFIG_DMA_RESTRICTED_POOL=y in order
to establish shared memory regions with the host for virtio buffers.
Make CONFIG_ARM_PKVM_GUEST depend on CONFIG_DMA_RESTRICTED_POOL to avoid
the inevitable segmentation faults experienced if you have the former
but not the latter.
Reported-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
drivers/virt/coco/pkvm-guest/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/virt/coco/pkvm-guest/Kconfig b/drivers/virt/coco/pkvm-guest/Kconfig
index d2f344f1f98f..928b8e1668cc 100644
--- a/drivers/virt/coco/pkvm-guest/Kconfig
+++ b/drivers/virt/coco/pkvm-guest/Kconfig
@@ -1,6 +1,6 @@
config ARM_PKVM_GUEST
bool "Arm pKVM protected guest driver"
- depends on ARM64
+ depends on ARM64 && DMA_RESTRICTED_POOL
help
Protected guests running under the pKVM hypervisor on arm64
are isolated from the host and must issue hypercalls to enable
--
2.53.0.1018.g2bb0e51243-goog
* Re: [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
` (37 preceding siblings ...)
2026-03-27 14:00 ` [PATCH v4 38/38] drivers/virt: pkvm: Add Kconfig dependency on DMA_RESTRICTED_POOL Will Deacon
@ 2026-03-27 18:13 ` Will Deacon
38 siblings, 0 replies; 40+ messages in thread
From: Will Deacon @ 2026-03-27 18:13 UTC (permalink / raw)
To: kvmarm
Cc: linux-arm-kernel, Marc Zyngier, Oliver Upton, Joey Gouly,
Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Quentin Perret,
Fuad Tabba, Vincent Donnefort, Mostafa Saleh, Alexandru Elisei
On Fri, Mar 27, 2026 at 01:59:59PM +0000, Will Deacon wrote:
> I fully expect to send a v5, as this is the first time Sashiko has had
> a chance to chew on this and I'm expecting a roasting.
After going through it, the report isn't as bad as it looks and some of
the comments are actively wrong, which I suppose is inevitable.
That being said, I've got a handful of fixes to fold in now and it has
pointed out some unrelated life-cycle issues that we want to fix
separately.
Will
Thread overview: 40+ messages
2026-03-27 13:59 [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
2026-03-27 14:00 ` [PATCH v4 01/38] KVM: arm64: Remove unused PKVM_ID_FFA definition Will Deacon
2026-03-27 14:00 ` [PATCH v4 02/38] KVM: arm64: Don't leak stage-2 page-table if VM fails to init under pKVM Will Deacon
2026-03-27 14:00 ` [PATCH v4 03/38] KVM: arm64: Move handle check into pkvm_pgtable_stage2_destroy_range() Will Deacon
2026-03-27 14:00 ` [PATCH v4 04/38] KVM: arm64: Rename __pkvm_pgtable_stage2_unmap() Will Deacon
2026-03-27 14:00 ` [PATCH v4 05/38] KVM: arm64: Don't advertise unsupported features for protected guests Will Deacon
2026-03-27 14:00 ` [PATCH v4 06/38] KVM: arm64: Expose self-hosted debug regs as RAZ/WI " Will Deacon
2026-03-27 14:00 ` [PATCH v4 07/38] KVM: arm64: Remove is_protected_kvm_enabled() checks from hypercalls Will Deacon
2026-03-27 14:00 ` [PATCH v4 08/38] KVM: arm64: Ignore MMU notifier callbacks for protected VMs Will Deacon
2026-03-27 14:00 ` [PATCH v4 09/38] KVM: arm64: Prevent unsupported memslot operations on " Will Deacon
2026-03-27 14:00 ` [PATCH v4 10/38] KVM: arm64: Ignore -EAGAIN when mapping in pages for the pKVM host Will Deacon
2026-03-27 14:00 ` [PATCH v4 11/38] KVM: arm64: Split teardown hypercall into two phases Will Deacon
2026-03-27 14:00 ` [PATCH v4 12/38] KVM: arm64: Introduce __pkvm_host_donate_guest() Will Deacon
2026-03-27 14:00 ` [PATCH v4 13/38] KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map() Will Deacon
2026-03-27 14:00 ` [PATCH v4 14/38] KVM: arm64: Handle aborts from protected VMs Will Deacon
2026-03-27 14:00 ` [PATCH v4 15/38] KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page() Will Deacon
2026-03-27 14:00 ` [PATCH v4 16/38] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy() Will Deacon
2026-03-27 14:00 ` [PATCH v4 17/38] KVM: arm64: Factor out pKVM host exception injection logic Will Deacon
2026-03-27 14:00 ` [PATCH v4 18/38] KVM: arm64: Support translation faults in inject_host_exception() Will Deacon
2026-03-27 14:00 ` [PATCH v4 19/38] KVM: arm64: Inject SIGSEGV on illegal accesses Will Deacon
2026-03-27 14:00 ` [PATCH v4 20/38] KVM: arm64: Avoid pointless annotation when mapping host-owned pages Will Deacon
2026-03-27 14:00 ` [PATCH v4 21/38] KVM: arm64: Generalise kvm_pgtable_stage2_set_owner() Will Deacon
2026-03-27 14:00 ` [PATCH v4 22/38] KVM: arm64: Introduce host_stage2_set_owner_metadata_locked() Will Deacon
2026-03-27 14:00 ` [PATCH v4 23/38] KVM: arm64: Change 'pkvm_handle_t' to u16 Will Deacon
2026-03-27 14:00 ` [PATCH v4 24/38] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2 Will Deacon
2026-03-27 14:00 ` [PATCH v4 25/38] KVM: arm64: Introduce hypercall to force reclaim of a protected page Will Deacon
2026-03-27 14:00 ` [PATCH v4 26/38] KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler Will Deacon
2026-03-27 14:00 ` [PATCH v4 27/38] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte Will Deacon
2026-03-27 14:00 ` [PATCH v4 28/38] KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs Will Deacon
2026-03-27 14:00 ` [PATCH v4 29/38] KVM: arm64: Implement the MEM_SHARE hypercall for " Will Deacon
2026-03-27 14:00 ` [PATCH v4 30/38] KVM: arm64: Implement the MEM_UNSHARE " Will Deacon
2026-03-27 14:00 ` [PATCH v4 31/38] KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled Will Deacon
2026-03-27 14:00 ` [PATCH v4 32/38] KVM: arm64: Add some initial documentation for pKVM Will Deacon
2026-03-27 14:00 ` [PATCH v4 33/38] KVM: arm64: Extend pKVM page ownership selftests to cover guest donation Will Deacon
2026-03-27 14:00 ` [PATCH v4 34/38] KVM: arm64: Register 'selftest_vm' in the VM table Will Deacon
2026-03-27 14:00 ` [PATCH v4 35/38] KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim Will Deacon
2026-03-27 14:00 ` [PATCH v4 36/38] KVM: arm64: Extend pKVM page ownership selftests to cover guest hvcs Will Deacon
2026-03-27 14:00 ` [PATCH v4 37/38] KVM: arm64: Rename PKVM_PAGE_STATE_MASK Will Deacon
2026-03-27 14:00 ` [PATCH v4 38/38] drivers/virt: pkvm: Add Kconfig dependency on DMA_RESTRICTED_POOL Will Deacon
2026-03-27 18:13 ` [PATCH v4 00/38] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon