* [PATCH v2 0/8] KVM: arm64: Reserve pKVM VM handle during initial VM setup
@ 2025-08-11 10:19 Fuad Tabba
2025-08-11 10:19 ` [PATCH v2 1/8] KVM: arm64: Rename pkvm.enabled to pkvm.is_protected Fuad Tabba
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: Fuad Tabba @ 2025-08-11 10:19 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: maz, oliver.upton, will, mark.rutland, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, broonie, vdonnefort, qperret,
sebastianene, keirf, smostafa, tabba
Changes since v1 [1]:
- Collect reviews.
- Add more details to cover letter.
- Rebase on Linux 6.17-rc1.
All VMs in pKVM are identified by their handle, a unique per-VM ID. This
handle is shared between the host kernel and the hypervisor, and is used
to track the VM across both.
In pKVM, this handle is allocated when the VM is initialized at the
hypervisor, which is on the first vCPU run. However, the host starts
initializing the VM and setting up its data structures earlier. MMU
notifiers for the VMs are also registered before VM initialization at
the hypervisor, and rely on the handle to identify the VM [2].
Therefore, there is a potential gap between when the VM is (partially)
setup at the host, but still without a valid pKVM handle to identify it
when communicating with the hypervisor.
Additionally, in the future, the host will need to communicate with
TrustZone about the VM before its first run. Therefore, move handle
creation to when the VM is first initialized at the host.
This patch series also takes the opportunity to do some refactoring
(mostly renaming and fixing documentation) of the code. We are in the
process of upstreaming pKVM. Refactoring this code now would generate
less churn than postponing it, as the upstreamed codebase grows.
Moreover, the existing names and documentation are at best misleading
(and in some cases actually wrong), which could lead to more confusion
and problems reviewing code in the future.
This patch series is divided into two parts:
- Patches 1-4: Renaming, refactoring, and tidying up to lay the
groundwork for moving handle initialization and to fix existing
issues.
- Patches 5-8: Decouple handle creation from VM initialization at the
hypervisor and move the handle creation to VM initialization at the
host.
Cheers,
/fuad
[1] https://lore.kernel.org/all/20250729120014.2799359-1-tabba@google.com/
[2] https://lore.kernel.org/all/20250303214947.GA30619@willie-the-truck/
Fuad Tabba (8):
KVM: arm64: Rename pkvm.enabled to pkvm.is_protected
KVM: arm64: Rename 'host_kvm' to 'kvm' in pKVM host code
KVM: arm64: Clarify comments to distinguish pKVM mode from protected
VMs
KVM: arm64: Decouple hyp VM creation state from its handle
KVM: arm64: Separate allocation and insertion of pKVM VM table entries
KVM: arm64: Consolidate pKVM hypervisor VM initialization logic
KVM: arm64: Introduce separate hypercalls for pKVM VM reservation and
initialization
KVM: arm64: Reserve pKVM handle during pkvm_init_host_vm()
arch/arm64/include/asm/kvm_asm.h | 2 +
arch/arm64/include/asm/kvm_host.h | 5 +-
arch/arm64/include/asm/kvm_pkvm.h | 1 +
arch/arm64/kvm/arm.c | 12 +-
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 4 +-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 14 ++
arch/arm64/kvm/hyp/nvhe/pkvm.c | 177 +++++++++++++++++++------
arch/arm64/kvm/pkvm.c | 76 +++++++----
8 files changed, 217 insertions(+), 74 deletions(-)
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
--
2.50.1.703.g449372360f-goog
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2 1/8] KVM: arm64: Rename pkvm.enabled to pkvm.is_protected
2025-08-11 10:19 [PATCH v2 0/8] KVM: arm64: Reserve pKVM VM handle during initial VM setup Fuad Tabba
@ 2025-08-11 10:19 ` Fuad Tabba
2025-08-11 10:19 ` [PATCH v2 2/8] KVM: arm64: Rename 'host_kvm' to 'kvm' in pKVM host code Fuad Tabba
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Fuad Tabba @ 2025-08-11 10:19 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: maz, oliver.upton, will, mark.rutland, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, broonie, vdonnefort, qperret,
sebastianene, keirf, smostafa, tabba
The 'pkvm.enabled' field in struct kvm_protected_vm is confusingly
named. Its purpose is to indicate whether a VM is a _protected_ VM under
pKVM, and not whether the VM itself is enabled or running.
A non-protected VM can be fully active and running, yet this field would
be false. This ambiguity can lead to incorrect assumptions about the
VM's operational state and makes the code harder to reason about.
Rename the field to 'is_protected' to make it unambiguous that the flag
tracks the protected status of the VM.
No functional change intended.
Reviewed-by: Kunwu Chan <kunwu.chan@linux.dev>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/include/asm/kvm_host.h | 4 ++--
arch/arm64/kvm/hyp/nvhe/pkvm.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2f2394cce24e..a4289c2f13f5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -252,7 +252,7 @@ struct kvm_protected_vm {
pkvm_handle_t handle;
struct kvm_hyp_memcache teardown_mc;
struct kvm_hyp_memcache stage2_teardown_mc;
- bool enabled;
+ bool is_protected;
};
struct kvm_mpidr_data {
@@ -1548,7 +1548,7 @@ struct kvm *kvm_arch_alloc_vm(void);
#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
-#define kvm_vm_is_protected(kvm) (is_protected_kvm_enabled() && (kvm)->arch.pkvm.enabled)
+#define kvm_vm_is_protected(kvm) (is_protected_kvm_enabled() && (kvm)->arch.pkvm.is_protected)
#define vcpu_is_protected(vcpu) kvm_vm_is_protected((vcpu)->kvm)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 338505cb0171..6198c1d27b5b 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -406,7 +406,7 @@ static void init_pkvm_hyp_vm(struct kvm *host_kvm, struct pkvm_hyp_vm *hyp_vm,
hyp_vm->host_kvm = host_kvm;
hyp_vm->kvm.created_vcpus = nr_vcpus;
hyp_vm->kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr;
- hyp_vm->kvm.arch.pkvm.enabled = READ_ONCE(host_kvm->arch.pkvm.enabled);
+ hyp_vm->kvm.arch.pkvm.is_protected = READ_ONCE(host_kvm->arch.pkvm.is_protected);
hyp_vm->kvm.arch.flags = 0;
pkvm_init_features_from_host(hyp_vm, host_kvm);
}
--
2.50.1.703.g449372360f-goog
* [PATCH v2 2/8] KVM: arm64: Rename 'host_kvm' to 'kvm' in pKVM host code
2025-08-11 10:19 [PATCH v2 0/8] KVM: arm64: Reserve pKVM VM handle during initial VM setup Fuad Tabba
2025-08-11 10:19 ` [PATCH v2 1/8] KVM: arm64: Rename pkvm.enabled to pkvm.is_protected Fuad Tabba
@ 2025-08-11 10:19 ` Fuad Tabba
2025-08-11 10:19 ` [PATCH v2 3/8] KVM: arm64: Clarify comments to distinguish pKVM mode from protected VMs Fuad Tabba
` (5 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Fuad Tabba @ 2025-08-11 10:19 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: maz, oliver.upton, will, mark.rutland, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, broonie, vdonnefort, qperret,
sebastianene, keirf, smostafa, tabba
In hypervisor (EL2) code, it is important to distinguish between the
host's 'struct kvm' and a protected VM's 'struct kvm'. Using 'host_kvm'
as a variable name in that context makes this distinction clear.
However, in the host kernel code (EL1), there is no such ambiguity. The
code is only ever concerned with the host's own 'struct kvm' instance.
The 'host_' prefix is therefore redundant and adds unnecessary
verbosity.
Simplify the code by renaming the 'host_kvm' parameter to 'kvm' in all
functions within host-side kernel code (EL1). This improves readability
and makes the naming consistent with other host-side kernel code.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/pkvm.c | 46 +++++++++++++++++++++----------------------
1 file changed, 23 insertions(+), 23 deletions(-)
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index fcd70bfe44fb..7aaeb66e3f39 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -85,16 +85,16 @@ void __init kvm_hyp_reserve(void)
hyp_mem_base);
}
-static void __pkvm_destroy_hyp_vm(struct kvm *host_kvm)
+static void __pkvm_destroy_hyp_vm(struct kvm *kvm)
{
- if (host_kvm->arch.pkvm.handle) {
+ if (kvm->arch.pkvm.handle) {
WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm,
- host_kvm->arch.pkvm.handle));
+ kvm->arch.pkvm.handle));
}
- host_kvm->arch.pkvm.handle = 0;
- free_hyp_memcache(&host_kvm->arch.pkvm.teardown_mc);
- free_hyp_memcache(&host_kvm->arch.pkvm.stage2_teardown_mc);
+ kvm->arch.pkvm.handle = 0;
+ free_hyp_memcache(&kvm->arch.pkvm.teardown_mc);
+ free_hyp_memcache(&kvm->arch.pkvm.stage2_teardown_mc);
}
static int __pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu)
@@ -129,16 +129,16 @@ static int __pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu)
*
* Return 0 on success, negative error code on failure.
*/
-static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
+static int __pkvm_create_hyp_vm(struct kvm *kvm)
{
size_t pgd_sz, hyp_vm_sz;
void *pgd, *hyp_vm;
int ret;
- if (host_kvm->created_vcpus < 1)
+ if (kvm->created_vcpus < 1)
return -EINVAL;
- pgd_sz = kvm_pgtable_stage2_pgd_size(host_kvm->arch.mmu.vtcr);
+ pgd_sz = kvm_pgtable_stage2_pgd_size(kvm->arch.mmu.vtcr);
/*
* The PGD pages will be reclaimed using a hyp_memcache which implies
@@ -152,7 +152,7 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
/* Allocate memory to donate to hyp for vm and vcpu pointers. */
hyp_vm_sz = PAGE_ALIGN(size_add(PKVM_HYP_VM_SIZE,
size_mul(sizeof(void *),
- host_kvm->created_vcpus)));
+ kvm->created_vcpus)));
hyp_vm = alloc_pages_exact(hyp_vm_sz, GFP_KERNEL_ACCOUNT);
if (!hyp_vm) {
ret = -ENOMEM;
@@ -160,12 +160,12 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
}
/* Donate the VM memory to hyp and let hyp initialize it. */
- ret = kvm_call_hyp_nvhe(__pkvm_init_vm, host_kvm, hyp_vm, pgd);
+ ret = kvm_call_hyp_nvhe(__pkvm_init_vm, kvm, hyp_vm, pgd);
if (ret < 0)
goto free_vm;
- host_kvm->arch.pkvm.handle = ret;
- host_kvm->arch.pkvm.stage2_teardown_mc.flags |= HYP_MEMCACHE_ACCOUNT_STAGE2;
+ kvm->arch.pkvm.handle = ret;
+ kvm->arch.pkvm.stage2_teardown_mc.flags |= HYP_MEMCACHE_ACCOUNT_STAGE2;
kvm_account_pgtable_pages(pgd, pgd_sz / PAGE_SIZE);
return 0;
@@ -176,14 +176,14 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
return ret;
}
-int pkvm_create_hyp_vm(struct kvm *host_kvm)
+int pkvm_create_hyp_vm(struct kvm *kvm)
{
int ret = 0;
- mutex_lock(&host_kvm->arch.config_lock);
- if (!host_kvm->arch.pkvm.handle)
- ret = __pkvm_create_hyp_vm(host_kvm);
- mutex_unlock(&host_kvm->arch.config_lock);
+ mutex_lock(&kvm->arch.config_lock);
+ if (!kvm->arch.pkvm.handle)
+ ret = __pkvm_create_hyp_vm(kvm);
+ mutex_unlock(&kvm->arch.config_lock);
return ret;
}
@@ -200,14 +200,14 @@ int pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu)
return ret;
}
-void pkvm_destroy_hyp_vm(struct kvm *host_kvm)
+void pkvm_destroy_hyp_vm(struct kvm *kvm)
{
- mutex_lock(&host_kvm->arch.config_lock);
- __pkvm_destroy_hyp_vm(host_kvm);
- mutex_unlock(&host_kvm->arch.config_lock);
+ mutex_lock(&kvm->arch.config_lock);
+ __pkvm_destroy_hyp_vm(kvm);
+ mutex_unlock(&kvm->arch.config_lock);
}
-int pkvm_init_host_vm(struct kvm *host_kvm)
+int pkvm_init_host_vm(struct kvm *kvm)
{
return 0;
}
--
2.50.1.703.g449372360f-goog
* [PATCH v2 3/8] KVM: arm64: Clarify comments to distinguish pKVM mode from protected VMs
2025-08-11 10:19 [PATCH v2 0/8] KVM: arm64: Reserve pKVM VM handle during initial VM setup Fuad Tabba
2025-08-11 10:19 ` [PATCH v2 1/8] KVM: arm64: Rename pkvm.enabled to pkvm.is_protected Fuad Tabba
2025-08-11 10:19 ` [PATCH v2 2/8] KVM: arm64: Rename 'host_kvm' to 'kvm' in pKVM host code Fuad Tabba
@ 2025-08-11 10:19 ` Fuad Tabba
2025-08-11 10:19 ` [PATCH v2 4/8] KVM: arm64: Decouple hyp VM creation state from its handle Fuad Tabba
` (4 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Fuad Tabba @ 2025-08-11 10:19 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: maz, oliver.upton, will, mark.rutland, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, broonie, vdonnefort, qperret,
sebastianene, keirf, smostafa, tabba
The hypervisor code for protected KVM contains comments that are
imprecise and at times flat-out wrong. They often refer to a "protected
VM" in contexts where the code or data structure applies to _any_ VM
managed by the hypervisor when pKVM is enabled.
For instance, the 'vm_table' holds handles for all VMs known to the
hypervisor, not exclusively for those that are configured as protected.
This inaccurate terminology can make the scope of the code harder to
understand for current and future developers.
Clarify the comments throughout the pKVM hypervisor code to make a clear
distinction between the pKVM feature itself (i.e., "protected mode") and
the VMs that are specifically configured to be protected. This involves
replacing ambiguous uses of "protected VM" with more accurate phrasing.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 2 +-
arch/arm64/kvm/hyp/nvhe/pkvm.c | 25 +++++++++++--------------
2 files changed, 12 insertions(+), 15 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index ce31d3b73603..4540324b5657 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -29,7 +29,7 @@ struct pkvm_hyp_vcpu {
};
/*
- * Holds the relevant data for running a protected vm.
+ * Holds the relevant data for running a vm in protected mode.
*/
struct pkvm_hyp_vm {
struct kvm kvm;
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 6198c1d27b5b..abe173406c88 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -23,8 +23,8 @@ unsigned int kvm_arm_vmid_bits;
unsigned int kvm_host_sve_max_vl;
/*
- * The currently loaded hyp vCPU for each physical CPU. Used only when
- * protected KVM is enabled, but for both protected and non-protected VMs.
+ * The currently loaded hyp vCPU for each physical CPU. Used in protected mode
+ * for both protected and non-protected VMs.
*/
static DEFINE_PER_CPU(struct pkvm_hyp_vcpu *, loaded_hyp_vcpu);
@@ -135,7 +135,7 @@ static int pkvm_check_pvm_cpu_features(struct kvm_vcpu *vcpu)
{
struct kvm *kvm = vcpu->kvm;
- /* Protected KVM does not support AArch32 guests. */
+ /* No AArch32 support for protected guests. */
if (kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL0, AARCH32) ||
kvm_has_feat(kvm, ID_AA64PFR0_EL1, EL1, AARCH32))
return -EINVAL;
@@ -210,8 +210,8 @@ static pkvm_handle_t idx_to_vm_handle(unsigned int idx)
DEFINE_HYP_SPINLOCK(vm_table_lock);
/*
- * The table of VM entries for protected VMs in hyp.
- * Allocated at hyp initialization and setup.
+ * A table that tracks all VMs in protected mode.
+ * Allocated during hyp initialization and setup.
*/
static struct pkvm_hyp_vm **vm_table;
@@ -495,7 +495,7 @@ static int find_free_vm_table_entry(struct kvm *host_kvm)
/*
* Allocate a VM table entry and insert a pointer to the new vm.
*
- * Return a unique handle to the protected VM on success,
+ * Return a unique handle to the VM on success,
* negative error code on failure.
*/
static pkvm_handle_t insert_vm_table_entry(struct kvm *host_kvm,
@@ -594,10 +594,8 @@ static void unmap_donated_memory_noclear(void *va, size_t size)
}
/*
- * Initialize the hypervisor copy of the protected VM state using the
- * memory donated by the host.
- *
- * Unmaps the donated memory from the host at stage 2.
+ * Initialize the hypervisor copy of the VM state using host-donated memory.
+ * Unmap the donated memory from the host at stage 2.
*
* host_kvm: A pointer to the host's struct kvm.
* vm_hva: The host va of the area being donated for the VM state.
@@ -606,7 +604,7 @@ static void unmap_donated_memory_noclear(void *va, size_t size)
* the VM. Must be page aligned. Its size is implied by the VM's
* VTCR.
*
- * Return a unique handle to the protected VM on success,
+ * Return a unique handle to the VM on success,
* negative error code on failure.
*/
int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
@@ -668,10 +666,9 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
}
/*
- * Initialize the hypervisor copy of the protected vCPU state using the
- * memory donated by the host.
+ * Initialize the hypervisor copy of the vCPU state using host-donated memory.
*
- * handle: The handle for the protected vm.
+ * handle: The hypervisor handle for the vm.
* host_vcpu: A pointer to the corresponding host vcpu.
* vcpu_hva: The host va of the area being donated for the vcpu state.
* Must be page aligned. The size of the area must be equal to
--
2.50.1.703.g449372360f-goog
* [PATCH v2 4/8] KVM: arm64: Decouple hyp VM creation state from its handle
2025-08-11 10:19 [PATCH v2 0/8] KVM: arm64: Reserve pKVM VM handle during initial VM setup Fuad Tabba
` (2 preceding siblings ...)
2025-08-11 10:19 ` [PATCH v2 3/8] KVM: arm64: Clarify comments to distinguish pKVM mode from protected VMs Fuad Tabba
@ 2025-08-11 10:19 ` Fuad Tabba
2025-08-11 10:19 ` [PATCH v2 5/8] KVM: arm64: Separate allocation and insertion of pKVM VM table entries Fuad Tabba
` (3 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Fuad Tabba @ 2025-08-11 10:19 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: maz, oliver.upton, will, mark.rutland, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, broonie, vdonnefort, qperret,
sebastianene, keirf, smostafa, tabba
Currently, the presence of a pKVM handle (pkvm.handle != 0) is used to
determine if the corresponding hypervisor (EL2) VM has been created and
initialized. This couples the handle's lifecycle with the VM's creation
state.
This coupling will become problematic with upcoming changes that will
allocate the pKVM handle earlier in the VM's life, before the VM is
instantiated at the hypervisor.
To prepare for this and make the state tracking explicit, decouple the
two concepts. Introduce a new boolean flag, 'pkvm.is_created', to track
whether the hypervisor-side VM has been created and initialized.
A new helper, pkvm_hyp_vm_is_created(), is added to check this flag. All
call sites that previously checked for the handle's existence are
converted to use the new, explicit check. The 'is_created' flag is set
to true upon successful creation in the hypervisor (EL2) and cleared
upon destruction.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/include/asm/kvm_pkvm.h | 1 +
arch/arm64/kvm/hyp/nvhe/pkvm.c | 1 +
arch/arm64/kvm/pkvm.c | 11 +++++++++--
4 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a4289c2f13f5..bc57749e3fb9 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -253,6 +253,7 @@ struct kvm_protected_vm {
struct kvm_hyp_memcache teardown_mc;
struct kvm_hyp_memcache stage2_teardown_mc;
bool is_protected;
+ bool is_created;
};
struct kvm_mpidr_data {
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index ea58282f59bb..08be89c95466 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -18,6 +18,7 @@
int pkvm_init_host_vm(struct kvm *kvm);
int pkvm_create_hyp_vm(struct kvm *kvm);
+bool pkvm_hyp_vm_is_created(struct kvm *kvm);
void pkvm_destroy_hyp_vm(struct kvm *kvm);
int pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index abe173406c88..969f6b293234 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -407,6 +407,7 @@ static void init_pkvm_hyp_vm(struct kvm *host_kvm, struct pkvm_hyp_vm *hyp_vm,
hyp_vm->kvm.created_vcpus = nr_vcpus;
hyp_vm->kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr;
hyp_vm->kvm.arch.pkvm.is_protected = READ_ONCE(host_kvm->arch.pkvm.is_protected);
+ hyp_vm->kvm.arch.pkvm.is_created = true;
hyp_vm->kvm.arch.flags = 0;
pkvm_init_features_from_host(hyp_vm, host_kvm);
}
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 7aaeb66e3f39..45d699bba96a 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -87,12 +87,13 @@ void __init kvm_hyp_reserve(void)
static void __pkvm_destroy_hyp_vm(struct kvm *kvm)
{
- if (kvm->arch.pkvm.handle) {
+ if (pkvm_hyp_vm_is_created(kvm)) {
WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm,
kvm->arch.pkvm.handle));
}
kvm->arch.pkvm.handle = 0;
+ kvm->arch.pkvm.is_created = false;
free_hyp_memcache(&kvm->arch.pkvm.teardown_mc);
free_hyp_memcache(&kvm->arch.pkvm.stage2_teardown_mc);
}
@@ -165,6 +166,7 @@ static int __pkvm_create_hyp_vm(struct kvm *kvm)
goto free_vm;
kvm->arch.pkvm.handle = ret;
+ kvm->arch.pkvm.is_created = true;
kvm->arch.pkvm.stage2_teardown_mc.flags |= HYP_MEMCACHE_ACCOUNT_STAGE2;
kvm_account_pgtable_pages(pgd, pgd_sz / PAGE_SIZE);
@@ -176,12 +178,17 @@ static int __pkvm_create_hyp_vm(struct kvm *kvm)
return ret;
}
+bool pkvm_hyp_vm_is_created(struct kvm *kvm)
+{
+ return READ_ONCE(kvm->arch.pkvm.is_created);
+}
+
int pkvm_create_hyp_vm(struct kvm *kvm)
{
int ret = 0;
mutex_lock(&kvm->arch.config_lock);
- if (!kvm->arch.pkvm.handle)
+ if (!pkvm_hyp_vm_is_created(kvm))
ret = __pkvm_create_hyp_vm(kvm);
mutex_unlock(&kvm->arch.config_lock);
--
2.50.1.703.g449372360f-goog
* [PATCH v2 5/8] KVM: arm64: Separate allocation and insertion of pKVM VM table entries
2025-08-11 10:19 [PATCH v2 0/8] KVM: arm64: Reserve pKVM VM handle during initial VM setup Fuad Tabba
` (3 preceding siblings ...)
2025-08-11 10:19 ` [PATCH v2 4/8] KVM: arm64: Decouple hyp VM creation state from its handle Fuad Tabba
@ 2025-08-11 10:19 ` Fuad Tabba
2025-08-11 10:19 ` [PATCH v2 6/8] KVM: arm64: Consolidate pKVM hypervisor VM initialization logic Fuad Tabba
` (2 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Fuad Tabba @ 2025-08-11 10:19 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: maz, oliver.upton, will, mark.rutland, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, broonie, vdonnefort, qperret,
sebastianene, keirf, smostafa, tabba
The current insert_vm_table_entry() function performs two actions at
once: it finds a free slot in the pKVM VM table and populates it with
the pkvm_hyp_vm pointer.
Refactor this function as a preparatory step for future work that will
require reserving a VM slot and its corresponding handle earlier in the
VM lifecycle, before the pkvm_hyp_vm structure is initialized and ready
to be inserted.
Split the function into a two-phase process:
- A new allocate_vm_table_entry() function finds an empty slot, marks it
as reserved with a RESERVED_ENTRY placeholder, and returns a handle
derived from the slot's index.
- The insert_vm_table_entry() function is repurposed to take the handle,
validate that the corresponding slot is in the reserved state, and
then populate it with the pkvm_hyp_vm pointer.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/hyp/nvhe/pkvm.c | 52 ++++++++++++++++++++++++++++------
1 file changed, 43 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 969f6b293234..64b760d30d05 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -192,6 +192,11 @@ static int pkvm_vcpu_init_traps(struct pkvm_hyp_vcpu *hyp_vcpu)
*/
#define HANDLE_OFFSET 0x1000
+/*
+ * Marks a reserved but not yet used entry in the VM table.
+ */
+#define RESERVED_ENTRY ((void *)0xa110ca7ed)
+
static unsigned int vm_handle_to_idx(pkvm_handle_t handle)
{
return handle - HANDLE_OFFSET;
@@ -231,6 +236,10 @@ static struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle)
if (unlikely(idx >= KVM_MAX_PVMS))
return NULL;
+ /* A reserved entry doesn't represent an initialized VM. */
+ if (unlikely(vm_table[idx] == RESERVED_ENTRY))
+ return NULL;
+
return vm_table[idx];
}
@@ -481,7 +490,7 @@ static int init_pkvm_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu,
return ret;
}
-static int find_free_vm_table_entry(struct kvm *host_kvm)
+static int find_free_vm_table_entry(void)
{
int i;
@@ -494,15 +503,13 @@ static int find_free_vm_table_entry(struct kvm *host_kvm)
}
/*
- * Allocate a VM table entry and insert a pointer to the new vm.
+ * Reserve a VM table entry.
*
* Return a unique handle to the VM on success,
* negative error code on failure.
*/
-static pkvm_handle_t insert_vm_table_entry(struct kvm *host_kvm,
- struct pkvm_hyp_vm *hyp_vm)
+static int allocate_vm_table_entry(void)
{
- struct kvm_s2_mmu *mmu = &hyp_vm->kvm.arch.mmu;
int idx;
hyp_assert_lock_held(&vm_table_lock);
@@ -515,10 +522,30 @@ static pkvm_handle_t insert_vm_table_entry(struct kvm *host_kvm,
if (unlikely(!vm_table))
return -EINVAL;
- idx = find_free_vm_table_entry(host_kvm);
- if (idx < 0)
+ idx = find_free_vm_table_entry();
+ if (unlikely(idx < 0))
return idx;
+ vm_table[idx] = RESERVED_ENTRY;
+
+ return idx;
+}
+
+/*
+ * Insert a pointer to the new VM into the VM table.
+ *
+ * Return 0 on success, or negative error code on failure.
+ */
+static int insert_vm_table_entry(struct kvm *host_kvm,
+ struct pkvm_hyp_vm *hyp_vm,
+ pkvm_handle_t handle)
+{
+ struct kvm_s2_mmu *mmu = &hyp_vm->kvm.arch.mmu;
+ unsigned int idx;
+
+ hyp_assert_lock_held(&vm_table_lock);
+
+ idx = vm_handle_to_idx(handle);
hyp_vm->kvm.arch.pkvm.handle = idx_to_vm_handle(idx);
/* VMID 0 is reserved for the host */
@@ -528,7 +555,7 @@ static pkvm_handle_t insert_vm_table_entry(struct kvm *host_kvm,
mmu->pgt = &hyp_vm->pgt;
vm_table[idx] = hyp_vm;
- return hyp_vm->kvm.arch.pkvm.handle;
+ return 0;
}
/*
@@ -614,6 +641,7 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
struct pkvm_hyp_vm *hyp_vm = NULL;
size_t vm_size, pgd_size;
unsigned int nr_vcpus;
+ pkvm_handle_t handle;
void *pgd = NULL;
int ret;
@@ -643,10 +671,16 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
init_pkvm_hyp_vm(host_kvm, hyp_vm, nr_vcpus);
hyp_spin_lock(&vm_table_lock);
- ret = insert_vm_table_entry(host_kvm, hyp_vm);
+ ret = allocate_vm_table_entry();
if (ret < 0)
goto err_unlock;
+ handle = idx_to_vm_handle(ret);
+
+ ret = insert_vm_table_entry(host_kvm, hyp_vm, handle);
+ if (ret)
+ goto err_unlock;
+
ret = kvm_guest_prepare_stage2(hyp_vm, pgd);
if (ret)
goto err_remove_vm_table_entry;
--
2.50.1.703.g449372360f-goog
* [PATCH v2 6/8] KVM: arm64: Consolidate pKVM hypervisor VM initialization logic
2025-08-11 10:19 [PATCH v2 0/8] KVM: arm64: Reserve pKVM VM handle during initial VM setup Fuad Tabba
` (4 preceding siblings ...)
2025-08-11 10:19 ` [PATCH v2 5/8] KVM: arm64: Separate allocation and insertion of pKVM VM table entries Fuad Tabba
@ 2025-08-11 10:19 ` Fuad Tabba
2025-08-11 10:19 ` [PATCH v2 7/8] KVM: arm64: Introduce separate hypercalls for pKVM VM reservation and initialization Fuad Tabba
2025-08-11 10:19 ` [PATCH v2 8/8] KVM: arm64: Reserve pKVM handle during pkvm_init_host_vm() Fuad Tabba
7 siblings, 0 replies; 9+ messages in thread
From: Fuad Tabba @ 2025-08-11 10:19 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: maz, oliver.upton, will, mark.rutland, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, broonie, vdonnefort, qperret,
sebastianene, keirf, smostafa, tabba
The insert_vm_table_entry() function was performing tasks beyond its
primary responsibility. In addition to inserting a VM pointer into the
vm_table, it was also initializing several fields within 'struct
pkvm_hyp_vm', such as the VMID and stage-2 MMU pointers. This mixing of
concerns made the code harder to follow.
As another preparatory step towards allowing a VM table entry to be
reserved before the VM is fully created, this logic must be cleaned up.
By separating table insertion from state initialization, we can control
the timing of the initialization step more precisely in subsequent
patches.
Refactor the code to consolidate all initialization logic into
init_pkvm_hyp_vm():
- Move the initialization of the handle, VMID, and MMU fields from
insert_vm_table_entry() to init_pkvm_hyp_vm().
- Simplify insert_vm_table_entry() to perform only one action: placing
the provided pkvm_hyp_vm pointer into the vm_table.
- Update the calling sequence in __pkvm_init_vm() to first allocate an
entry in the VM table, initialize the VM, and then insert the VM into
the VM table. This is all protected by the vm_table_lock for now.
Subsequent patches will adjust the sequence and not hold the
vm_table_lock while initializing the VM at the hypervisor
(init_pkvm_hyp_vm()).
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/hyp/nvhe/pkvm.c | 47 +++++++++++++++++-----------------
1 file changed, 24 insertions(+), 23 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 64b760d30d05..a9abbeb530f0 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -410,15 +410,26 @@ static void unpin_host_vcpus(struct pkvm_hyp_vcpu *hyp_vcpus[],
}
static void init_pkvm_hyp_vm(struct kvm *host_kvm, struct pkvm_hyp_vm *hyp_vm,
- unsigned int nr_vcpus)
+ unsigned int nr_vcpus, pkvm_handle_t handle)
{
+ struct kvm_s2_mmu *mmu = &hyp_vm->kvm.arch.mmu;
+ int idx = vm_handle_to_idx(handle);
+
+ hyp_vm->kvm.arch.pkvm.handle = handle;
+
hyp_vm->host_kvm = host_kvm;
hyp_vm->kvm.created_vcpus = nr_vcpus;
- hyp_vm->kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr;
hyp_vm->kvm.arch.pkvm.is_protected = READ_ONCE(host_kvm->arch.pkvm.is_protected);
hyp_vm->kvm.arch.pkvm.is_created = true;
hyp_vm->kvm.arch.flags = 0;
pkvm_init_features_from_host(hyp_vm, host_kvm);
+
+ /* VMID 0 is reserved for the host */
+ atomic64_set(&mmu->vmid.id, idx + 1);
+
+ mmu->vtcr = host_mmu.arch.mmu.vtcr;
+ mmu->arch = &hyp_vm->kvm.arch;
+ mmu->pgt = &hyp_vm->pgt;
}
static int pkvm_vcpu_init_sve(struct pkvm_hyp_vcpu *hyp_vcpu, struct kvm_vcpu *host_vcpu)
@@ -532,29 +543,19 @@ static int allocate_vm_table_entry(void)
}
/*
- * Insert a pointer to the new VM into the VM table.
+ * Insert a pointer to the initialized VM into the VM table.
*
* Return 0 on success, or negative error code on failure.
*/
-static int insert_vm_table_entry(struct kvm *host_kvm,
- struct pkvm_hyp_vm *hyp_vm,
- pkvm_handle_t handle)
+static int insert_vm_table_entry(pkvm_handle_t handle,
+ struct pkvm_hyp_vm *hyp_vm)
{
- struct kvm_s2_mmu *mmu = &hyp_vm->kvm.arch.mmu;
unsigned int idx;
hyp_assert_lock_held(&vm_table_lock);
-
idx = vm_handle_to_idx(handle);
- hyp_vm->kvm.arch.pkvm.handle = idx_to_vm_handle(idx);
-
- /* VMID 0 is reserved for the host */
- atomic64_set(&mmu->vmid.id, idx + 1);
-
- mmu->arch = &hyp_vm->kvm.arch;
- mmu->pgt = &hyp_vm->pgt;
-
vm_table[idx] = hyp_vm;
+
return 0;
}
@@ -668,8 +669,6 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
if (!pgd)
goto err_remove_mappings;
- init_pkvm_hyp_vm(host_kvm, hyp_vm, nr_vcpus);
-
hyp_spin_lock(&vm_table_lock);
ret = allocate_vm_table_entry();
if (ret < 0)
@@ -677,19 +676,21 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
handle = idx_to_vm_handle(ret);
- ret = insert_vm_table_entry(host_kvm, hyp_vm, handle);
- if (ret)
- goto err_unlock;
+ init_pkvm_hyp_vm(host_kvm, hyp_vm, nr_vcpus, handle);
ret = kvm_guest_prepare_stage2(hyp_vm, pgd);
+ if (ret)
+ goto err_remove_vm_table_entry;
+
+ ret = insert_vm_table_entry(handle, hyp_vm);
if (ret)
goto err_remove_vm_table_entry;
hyp_spin_unlock(&vm_table_lock);
- return hyp_vm->kvm.arch.pkvm.handle;
+ return handle;
err_remove_vm_table_entry:
- remove_vm_table_entry(hyp_vm->kvm.arch.pkvm.handle);
+ remove_vm_table_entry(handle);
err_unlock:
hyp_spin_unlock(&vm_table_lock);
err_remove_mappings:
--
2.50.1.703.g449372360f-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH v2 7/8] KVM: arm64: Introduce separate hypercalls for pKVM VM reservation and initialization
2025-08-11 10:19 [PATCH v2 0/8] KVM: arm64: Reserve pKVM VM handle during initial VM setup Fuad Tabba
` (5 preceding siblings ...)
2025-08-11 10:19 ` [PATCH v2 6/8] KVM: arm64: Consolidate pKVM hypervisor VM initialization logic Fuad Tabba
@ 2025-08-11 10:19 ` Fuad Tabba
2025-08-11 10:19 ` [PATCH v2 8/8] KVM: arm64: Reserve pKVM handle during pkvm_init_host_vm() Fuad Tabba
7 siblings, 0 replies; 9+ messages in thread
From: Fuad Tabba @ 2025-08-11 10:19 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: maz, oliver.upton, will, mark.rutland, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, broonie, vdonnefort, qperret,
sebastianene, keirf, smostafa, tabba
The existing __pkvm_init_vm hypercall performs both the reservation of a
VM table entry and the initialization of the hypervisor VM state in a
single operation. This design prevents the host from obtaining a VM
handle from the hypervisor until all preparation for creating and
initializing the VM is complete, which happens on the first vCPU run.
To support more flexible VM lifecycle management, the host needs the
ability to reserve a handle early, before the first vCPU run.
Refactor the hypercall interface to enable this, splitting the single
hypercall into a two-stage process:
- __pkvm_reserve_vm: A new hypercall that allocates a slot in the
hypervisor's vm_table, marks it as reserved, and returns a unique
handle to the host.
- __pkvm_unreserve_vm: A corresponding cleanup hypercall to safely
release the reservation if the host fails to proceed with full
initialization.
- __pkvm_init_vm: The existing hypercall is modified to no longer
allocate a slot. It now expects a pre-reserved handle and commits the
donated VM memory to that slot.
For now, the host-side code in __pkvm_create_hyp_vm calls the new
reserve and init hypercalls back-to-back to maintain existing behavior.
This paves the way for subsequent patches to separate the reservation
and initialization steps in the VM's lifecycle.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/include/asm/kvm_asm.h | 2 +
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 2 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 14 ++++
arch/arm64/kvm/hyp/nvhe/pkvm.c | 102 +++++++++++++++++++------
arch/arm64/kvm/pkvm.c | 12 ++-
5 files changed, 108 insertions(+), 24 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index bec227f9500a..9da54d4ee49e 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -81,6 +81,8 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff,
__KVM_HOST_SMCCC_FUNC___vgic_v3_save_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
+ __KVM_HOST_SMCCC_FUNC___pkvm_reserve_vm,
+ __KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
__KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 4540324b5657..184ad7a39950 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -67,6 +67,8 @@ static inline bool pkvm_hyp_vm_is_protected(struct pkvm_hyp_vm *hyp_vm)
void pkvm_hyp_vm_table_init(void *tbl);
+int __pkvm_reserve_vm(void);
+void __pkvm_unreserve_vm(pkvm_handle_t handle);
int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
unsigned long pgd_hva);
int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 3206b2c07f82..29430c031095 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -546,6 +546,18 @@ static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = __pkvm_prot_finalize();
}
+static void handle___pkvm_reserve_vm(struct kvm_cpu_context *host_ctxt)
+{
+ cpu_reg(host_ctxt, 1) = __pkvm_reserve_vm();
+}
+
+static void handle___pkvm_unreserve_vm(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+
+ __pkvm_unreserve_vm(handle);
+}
+
static void handle___pkvm_init_vm(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct kvm *, host_kvm, host_ctxt, 1);
@@ -606,6 +618,8 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__kvm_timer_set_cntvoff),
HANDLE_FUNC(__vgic_v3_save_vmcr_aprs),
HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
+ HANDLE_FUNC(__pkvm_reserve_vm),
+ HANDLE_FUNC(__pkvm_unreserve_vm),
HANDLE_FUNC(__pkvm_init_vm),
HANDLE_FUNC(__pkvm_init_vcpu),
HANDLE_FUNC(__pkvm_teardown_vm),
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index a9abbeb530f0..05774aed09cb 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -542,6 +542,33 @@ static int allocate_vm_table_entry(void)
return idx;
}
+static int __insert_vm_table_entry(pkvm_handle_t handle,
+ struct pkvm_hyp_vm *hyp_vm)
+{
+ unsigned int idx;
+
+ hyp_assert_lock_held(&vm_table_lock);
+
+ /*
+ * Initializing protected state might have failed, yet a malicious
+ * host could trigger this function. Thus, ensure that 'vm_table'
+ * exists.
+ */
+ if (unlikely(!vm_table))
+ return -EINVAL;
+
+ idx = vm_handle_to_idx(handle);
+ if (unlikely(idx >= KVM_MAX_PVMS))
+ return -EINVAL;
+
+ if (unlikely(vm_table[idx] != RESERVED_ENTRY))
+ return -EINVAL;
+
+ vm_table[idx] = hyp_vm;
+
+ return 0;
+}
+
/*
* Insert a pointer to the initialized VM into the VM table.
*
@@ -550,13 +577,13 @@ static int allocate_vm_table_entry(void)
static int insert_vm_table_entry(pkvm_handle_t handle,
struct pkvm_hyp_vm *hyp_vm)
{
- unsigned int idx;
+ int ret;
- hyp_assert_lock_held(&vm_table_lock);
- idx = vm_handle_to_idx(handle);
- vm_table[idx] = hyp_vm;
+ hyp_spin_lock(&vm_table_lock);
+ ret = __insert_vm_table_entry(handle, hyp_vm);
+ hyp_spin_unlock(&vm_table_lock);
- return 0;
+ return ret;
}
/*
@@ -622,8 +649,45 @@ static void unmap_donated_memory_noclear(void *va, size_t size)
__unmap_donated_memory(va, size);
}
+/*
+ * Reserves an entry in the hypervisor for a new VM in protected mode.
+ *
+ * Return a unique handle to the VM on success, negative error code on failure.
+ */
+int __pkvm_reserve_vm(void)
+{
+ int ret;
+
+ hyp_spin_lock(&vm_table_lock);
+ ret = allocate_vm_table_entry();
+ hyp_spin_unlock(&vm_table_lock);
+
+ if (ret < 0)
+ return ret;
+
+ return idx_to_vm_handle(ret);
+}
+
+/*
+ * Removes a reserved entry, but only if it hasn't been used yet.
+ * Otherwise, the VM needs to be destroyed.
+ */
+void __pkvm_unreserve_vm(pkvm_handle_t handle)
+{
+ unsigned int idx = vm_handle_to_idx(handle);
+
+ if (unlikely(!vm_table))
+ return;
+
+ hyp_spin_lock(&vm_table_lock);
+ if (likely(idx < KVM_MAX_PVMS && vm_table[idx] == RESERVED_ENTRY))
+ remove_vm_table_entry(handle);
+ hyp_spin_unlock(&vm_table_lock);
+}
+
/*
* Initialize the hypervisor copy of the VM state using host-donated memory.
+ *
* Unmap the donated memory from the host at stage 2.
*
* host_kvm: A pointer to the host's struct kvm.
@@ -633,8 +697,7 @@ static void unmap_donated_memory_noclear(void *va, size_t size)
* the VM. Must be page aligned. Its size is implied by the VM's
* VTCR.
*
- * Return a unique handle to the VM on success,
- * negative error code on failure.
+ * Return 0 on success, negative error code on failure.
*/
int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
unsigned long pgd_hva)
@@ -656,6 +719,12 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
goto err_unpin_kvm;
}
+ handle = READ_ONCE(host_kvm->arch.pkvm.handle);
+ if (unlikely(handle < HANDLE_OFFSET)) {
+ ret = -EINVAL;
+ goto err_unpin_kvm;
+ }
+
vm_size = pkvm_get_hyp_vm_size(nr_vcpus);
pgd_size = kvm_pgtable_stage2_pgd_size(host_mmu.arch.mmu.vtcr);
@@ -669,30 +738,19 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
if (!pgd)
goto err_remove_mappings;
- hyp_spin_lock(&vm_table_lock);
- ret = allocate_vm_table_entry();
- if (ret < 0)
- goto err_unlock;
-
- handle = idx_to_vm_handle(ret);
-
init_pkvm_hyp_vm(host_kvm, hyp_vm, nr_vcpus, handle);
ret = kvm_guest_prepare_stage2(hyp_vm, pgd);
if (ret)
- goto err_remove_vm_table_entry;
+ goto err_remove_mappings;
+ /* Must be called last since this publishes the VM. */
ret = insert_vm_table_entry(handle, hyp_vm);
if (ret)
- goto err_remove_vm_table_entry;
- hyp_spin_unlock(&vm_table_lock);
+ goto err_remove_mappings;
- return handle;
+ return 0;
-err_remove_vm_table_entry:
- remove_vm_table_entry(handle);
-err_unlock:
- hyp_spin_unlock(&vm_table_lock);
err_remove_mappings:
unmap_donated_memory(hyp_vm, vm_size);
unmap_donated_memory(pgd, pgd_size);
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 45d699bba96a..082bc15f436c 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -160,17 +160,25 @@ static int __pkvm_create_hyp_vm(struct kvm *kvm)
goto free_pgd;
}
- /* Donate the VM memory to hyp and let hyp initialize it. */
- ret = kvm_call_hyp_nvhe(__pkvm_init_vm, kvm, hyp_vm, pgd);
+ /* Reserve the VM in hyp and obtain a hyp handle for the VM. */
+ ret = kvm_call_hyp_nvhe(__pkvm_reserve_vm);
if (ret < 0)
goto free_vm;
kvm->arch.pkvm.handle = ret;
+
+ /* Donate the VM memory to hyp and let hyp initialize it. */
+ ret = kvm_call_hyp_nvhe(__pkvm_init_vm, kvm, hyp_vm, pgd);
+ if (ret)
+ goto unreserve_vm;
+
kvm->arch.pkvm.is_created = true;
kvm->arch.pkvm.stage2_teardown_mc.flags |= HYP_MEMCACHE_ACCOUNT_STAGE2;
kvm_account_pgtable_pages(pgd, pgd_sz / PAGE_SIZE);
return 0;
+unreserve_vm:
+ kvm_call_hyp_nvhe(__pkvm_unreserve_vm, kvm->arch.pkvm.handle);
free_vm:
free_pages_exact(hyp_vm, hyp_vm_sz);
free_pgd:
--
2.50.1.703.g449372360f-goog
* [PATCH v2 8/8] KVM: arm64: Reserve pKVM handle during pkvm_init_host_vm()
2025-08-11 10:19 [PATCH v2 0/8] KVM: arm64: Reserve pKVM VM handle during initial VM setup Fuad Tabba
` (6 preceding siblings ...)
2025-08-11 10:19 ` [PATCH v2 7/8] KVM: arm64: Introduce separate hypercalls for pKVM VM reservation and initialization Fuad Tabba
@ 2025-08-11 10:19 ` Fuad Tabba
7 siblings, 0 replies; 9+ messages in thread
From: Fuad Tabba @ 2025-08-11 10:19 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: maz, oliver.upton, will, mark.rutland, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, broonie, vdonnefort, qperret,
sebastianene, keirf, smostafa, tabba
When a pKVM guest is active, TLB invalidations triggered by host MMU
notifiers require a valid hypervisor handle. Currently, this handle is
only allocated when the first vCPU is run.
However, the guest's memory is associated with the host MMU much
earlier, during kvm_arch_init_vm(). This creates a window where an MMU
invalidation could occur after the kvm_pgtable pointer checked by the
notifiers is set but before the pKVM handle has been created.
Fix this by reserving the pKVM handle when the host VM is first set up.
Move the call to the __pkvm_reserve_vm hypercall from the first-vCPU-run
path into pkvm_init_host_vm(), which is called during initial VM setup.
This ensures the handle is available before any subsystem can trigger an
MMU notification for the VM.
The VM destruction path is updated to call __pkvm_unreserve_vm for cases
where a VM was reserved but never fully created at the hypervisor,
ensuring the handle is properly released.
This fix leverages the two-stage reservation/initialization hypercall
interface introduced in preceding patches.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/arm.c | 12 ++++++++----
arch/arm64/kvm/pkvm.c | 33 +++++++++++++++++++++++----------
2 files changed, 31 insertions(+), 14 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 888f7c7abf54..7f9ebe38fdd2 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -170,10 +170,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (ret)
return ret;
- ret = pkvm_init_host_vm(kvm);
- if (ret)
- goto err_unshare_kvm;
-
if (!zalloc_cpumask_var(&kvm->arch.supported_cpus, GFP_KERNEL_ACCOUNT)) {
ret = -ENOMEM;
goto err_unshare_kvm;
@@ -184,6 +180,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (ret)
goto err_free_cpumask;
+ /*
+ * If any failures occur after this is successful, make sure to call
+ * __pkvm_unreserve_vm to unreserve the VM in the hypervisor.
+ */
+ ret = pkvm_init_host_vm(kvm);
+ if (ret)
+ goto err_free_cpumask;
+
kvm_vgic_early_init(kvm);
kvm_timer_init_vm(kvm);
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 082bc15f436c..24f0f8a8c943 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -90,6 +90,12 @@ static void __pkvm_destroy_hyp_vm(struct kvm *kvm)
if (pkvm_hyp_vm_is_created(kvm)) {
WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm,
kvm->arch.pkvm.handle));
+ } else if (kvm->arch.pkvm.handle) {
+ /*
+ * The VM could have been reserved but hyp initialization has
+ * failed. Make sure to unreserve it.
+ */
+ kvm_call_hyp_nvhe(__pkvm_unreserve_vm, kvm->arch.pkvm.handle);
}
kvm->arch.pkvm.handle = 0;
@@ -160,25 +166,16 @@ static int __pkvm_create_hyp_vm(struct kvm *kvm)
goto free_pgd;
}
- /* Reserve the VM in hyp and obtain a hyp handle for the VM. */
- ret = kvm_call_hyp_nvhe(__pkvm_reserve_vm);
- if (ret < 0)
- goto free_vm;
-
- kvm->arch.pkvm.handle = ret;
-
/* Donate the VM memory to hyp and let hyp initialize it. */
ret = kvm_call_hyp_nvhe(__pkvm_init_vm, kvm, hyp_vm, pgd);
if (ret)
- goto unreserve_vm;
+ goto free_vm;
kvm->arch.pkvm.is_created = true;
kvm->arch.pkvm.stage2_teardown_mc.flags |= HYP_MEMCACHE_ACCOUNT_STAGE2;
kvm_account_pgtable_pages(pgd, pgd_sz / PAGE_SIZE);
return 0;
-unreserve_vm:
- kvm_call_hyp_nvhe(__pkvm_unreserve_vm, kvm->arch.pkvm.handle);
free_vm:
free_pages_exact(hyp_vm, hyp_vm_sz);
free_pgd:
@@ -224,6 +221,22 @@ void pkvm_destroy_hyp_vm(struct kvm *kvm)
int pkvm_init_host_vm(struct kvm *kvm)
{
+ int ret;
+
+ if (pkvm_hyp_vm_is_created(kvm))
+ return -EINVAL;
+
+ /* VM is already reserved, no need to proceed. */
+ if (kvm->arch.pkvm.handle)
+ return 0;
+
+ /* Reserve the VM in hyp and obtain a hyp handle for the VM. */
+ ret = kvm_call_hyp_nvhe(__pkvm_reserve_vm);
+ if (ret < 0)
+ return ret;
+
+ kvm->arch.pkvm.handle = ret;
+
return 0;
}
--
2.50.1.703.g449372360f-goog