* [PATCH v2 0/4] KVM: arm64: Fix initializing HCRX_EL2 and other traps in pKVM
@ 2025-02-26 21:55 Fuad Tabba
2025-02-26 21:55 ` [PATCH v2 1/4] KVM: arm64: Factor out setting HCRX_EL2 traps into separate function Fuad Tabba
` (4 more replies)
0 siblings, 5 replies; 18+ messages in thread
From: Fuad Tabba @ 2025-02-26 21:55 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: maz, oliver.upton, mark.rutland, will, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, broonie, qperret, vdonnefort, tabba
Changes since v1 [*]:
- Factor out setting HCRX_EL2 traps into a separate function and
share logic between protected and non-protected modes
- Rebase on Linux 6.14-rc4
For context, please refer to v1 [*].
Cheers,
/fuad
[*] https://lore.kernel.org/all/20250214150258.464798-1-tabba@google.com/
Fuad Tabba (4):
KVM: arm64: Factor out setting HCRX_EL2 traps into separate function
KVM: arm64: Initialize HCRX_EL2 traps in pKVM
KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
KVM: arm64: Create each pKVM hyp vcpu after its corresponding host
vcpu
arch/arm64/include/asm/kvm_emulate.h | 24 +++++++++
arch/arm64/include/asm/kvm_host.h | 2 +
arch/arm64/include/asm/kvm_pkvm.h | 1 +
arch/arm64/kvm/arm.c | 4 ++
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 6 ---
arch/arm64/kvm/hyp/nvhe/pkvm.c | 62 ++++++++++++++---------
arch/arm64/kvm/pkvm.c | 70 +++++++++++++-------------
arch/arm64/kvm/sys_regs.c | 20 +-------
8 files changed, 104 insertions(+), 85 deletions(-)
base-commit: d082ecbc71e9e0bf49883ee4afd435a77a5101b6
--
2.48.1.711.g2feabab25a-goog
* [PATCH v2 1/4] KVM: arm64: Factor out setting HCRX_EL2 traps into separate function
2025-02-26 21:55 [PATCH v2 0/4] KVM: arm64: Fix initializing HCRX_EL2 and other traps in pKVM Fuad Tabba
@ 2025-02-26 21:55 ` Fuad Tabba
2025-02-26 21:55 ` [PATCH v2 2/4] KVM: arm64: Initialize HCRX_EL2 traps in pKVM Fuad Tabba
` (3 subsequent siblings)
4 siblings, 0 replies; 18+ messages in thread
From: Fuad Tabba @ 2025-02-26 21:55 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: maz, oliver.upton, mark.rutland, will, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, broonie, qperret, vdonnefort, tabba
Factor out the code for setting a vcpu's HCRX_EL2 traps into a
separate inline function. This allows us to share the logic with
pKVM when setting the traps in protected mode.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/include/asm/kvm_emulate.h | 24 ++++++++++++++++++++++++
arch/arm64/kvm/sys_regs.c | 20 +-------------------
2 files changed, 25 insertions(+), 19 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 78ec1ef2cfe8..31c8497372d0 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -649,4 +649,28 @@ static inline bool guest_hyp_sve_traps_enabled(const struct kvm_vcpu *vcpu)
{
return __guest_hyp_cptr_xen_trap_enabled(vcpu, ZEN);
}
+
+static inline void vcpu_set_hcrx(struct kvm_vcpu *vcpu)
+{
+ struct kvm *kvm = vcpu->kvm;
+
+ if (cpus_have_final_cap(ARM64_HAS_HCX)) {
+ /*
+ * In general, all HCRX_EL2 bits are gated by a feature.
+ * The only reason we can set SMPME without checking any
+ * feature is that its effects are not directly observable
+ * from the guest.
+ */
+ vcpu->arch.hcrx_el2 = HCRX_EL2_SMPME;
+
+ if (kvm_has_feat(kvm, ID_AA64ISAR2_EL1, MOPS, IMP))
+ vcpu->arch.hcrx_el2 |= (HCRX_EL2_MSCEn | HCRX_EL2_MCE2);
+
+ if (kvm_has_tcr2(kvm))
+ vcpu->arch.hcrx_el2 |= HCRX_EL2_TCR2En;
+
+ if (kvm_has_fpmr(kvm))
+ vcpu->arch.hcrx_el2 |= HCRX_EL2_EnFPM;
+ }
+}
#endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 82430c1e1dd0..16ce5f584e7c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -4971,25 +4971,7 @@ void kvm_calculate_traps(struct kvm_vcpu *vcpu)
mutex_lock(&kvm->arch.config_lock);
vcpu_set_hcr(vcpu);
vcpu_set_ich_hcr(vcpu);
-
- if (cpus_have_final_cap(ARM64_HAS_HCX)) {
- /*
- * In general, all HCRX_EL2 bits are gated by a feature.
- * The only reason we can set SMPME without checking any
- * feature is that its effects are not directly observable
- * from the guest.
- */
- vcpu->arch.hcrx_el2 = HCRX_EL2_SMPME;
-
- if (kvm_has_feat(kvm, ID_AA64ISAR2_EL1, MOPS, IMP))
- vcpu->arch.hcrx_el2 |= (HCRX_EL2_MSCEn | HCRX_EL2_MCE2);
-
- if (kvm_has_tcr2(kvm))
- vcpu->arch.hcrx_el2 |= HCRX_EL2_TCR2En;
-
- if (kvm_has_fpmr(kvm))
- vcpu->arch.hcrx_el2 |= HCRX_EL2_EnFPM;
- }
+ vcpu_set_hcrx(vcpu);
if (test_bit(KVM_ARCH_FLAG_FGU_INITIALIZED, &kvm->arch.flags))
goto out;
--
2.48.1.711.g2feabab25a-goog
* [PATCH v2 2/4] KVM: arm64: Initialize HCRX_EL2 traps in pKVM
2025-02-26 21:55 [PATCH v2 0/4] KVM: arm64: Fix initializing HCRX_EL2 and other traps in pKVM Fuad Tabba
2025-02-26 21:55 ` [PATCH v2 1/4] KVM: arm64: Factor out setting HCRX_EL2 traps into separate function Fuad Tabba
@ 2025-02-26 21:55 ` Fuad Tabba
2025-02-26 21:55 ` [PATCH v2 3/4] KVM: arm64: Factor out pKVM hyp vcpu creation to separate function Fuad Tabba
` (2 subsequent siblings)
4 siblings, 0 replies; 18+ messages in thread
From: Fuad Tabba @ 2025-02-26 21:55 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: maz, oliver.upton, mark.rutland, will, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, broonie, qperret, vdonnefort, tabba
Initialize and set the traps controlled by HCRX_EL2 in pKVM.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/hyp/nvhe/pkvm.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 3927fe52a3dd..6efb9bf56180 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -166,8 +166,13 @@ static int pkvm_vcpu_init_traps(struct pkvm_hyp_vcpu *hyp_vcpu)
pkvm_vcpu_reset_hcr(vcpu);
- if ((!pkvm_hyp_vcpu_is_protected(hyp_vcpu)))
+ if ((!pkvm_hyp_vcpu_is_protected(hyp_vcpu))) {
+ struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+
+ /* Trust the host for non-protected vcpu features. */
+ vcpu->arch.hcrx_el2 = host_vcpu->arch.hcrx_el2;
return 0;
+ }
ret = pkvm_check_pvm_cpu_features(vcpu);
if (ret)
@@ -175,6 +180,7 @@ static int pkvm_vcpu_init_traps(struct pkvm_hyp_vcpu *hyp_vcpu)
pvm_init_traps_hcr(vcpu);
pvm_init_traps_mdcr(vcpu);
+ vcpu_set_hcrx(vcpu);
return 0;
}
--
2.48.1.711.g2feabab25a-goog
* [PATCH v2 3/4] KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
2025-02-26 21:55 [PATCH v2 0/4] KVM: arm64: Fix initializing HCRX_EL2 and other traps in pKVM Fuad Tabba
2025-02-26 21:55 ` [PATCH v2 1/4] KVM: arm64: Factor out setting HCRX_EL2 traps into separate function Fuad Tabba
2025-02-26 21:55 ` [PATCH v2 2/4] KVM: arm64: Initialize HCRX_EL2 traps in pKVM Fuad Tabba
@ 2025-02-26 21:55 ` Fuad Tabba
2025-02-28 19:44 ` Quentin Perret
2025-02-26 21:55 ` [PATCH v2 4/4] KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu Fuad Tabba
2025-02-27 14:13 ` [PATCH v2 0/4] KVM: arm64: Fix initializing HCRX_EL2 and other traps in pKVM Marc Zyngier
4 siblings, 1 reply; 18+ messages in thread
From: Fuad Tabba @ 2025-02-26 21:55 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: maz, oliver.upton, mark.rutland, will, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, broonie, qperret, vdonnefort, tabba
Move the code that creates and initializes the hyp view of a vcpu
in pKVM to its own function. This is meant to make the transition
to initializing every vcpu individually clearer.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/pkvm.c | 48 ++++++++++++++++++++-----------------------
1 file changed, 22 insertions(+), 26 deletions(-)
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 930b677eb9b0..06665cf221bf 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -113,6 +113,24 @@ static void __pkvm_destroy_hyp_vm(struct kvm *host_kvm)
free_hyp_memcache(&host_kvm->arch.pkvm.teardown_mc);
}
+static int __pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu)
+{
+ size_t hyp_vcpu_sz = PAGE_ALIGN(PKVM_HYP_VCPU_SIZE);
+ pkvm_handle_t handle = vcpu->kvm->arch.pkvm.handle;
+ void *hyp_vcpu;
+ int ret;
+
+ hyp_vcpu = alloc_pages_exact(hyp_vcpu_sz, GFP_KERNEL_ACCOUNT);
+ if (!hyp_vcpu)
+ return -ENOMEM;
+
+ ret = kvm_call_hyp_nvhe(__pkvm_init_vcpu, handle, vcpu, hyp_vcpu);
+ if (ret)
+ free_pages_exact(hyp_vcpu, hyp_vcpu_sz);
+
+ return ret;
+}
+
/*
* Allocates and donates memory for hypervisor VM structs at EL2.
*
@@ -125,9 +143,8 @@ static void __pkvm_destroy_hyp_vm(struct kvm *host_kvm)
*/
static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
{
- size_t pgd_sz, hyp_vm_sz, hyp_vcpu_sz;
+ size_t pgd_sz, hyp_vm_sz;
struct kvm_vcpu *host_vcpu;
- pkvm_handle_t handle;
void *pgd, *hyp_vm;
unsigned long idx;
int ret;
@@ -161,33 +178,12 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
if (ret < 0)
goto free_vm;
- handle = ret;
+ WRITE_ONCE(host_kvm->arch.pkvm.handle, ret);
- host_kvm->arch.pkvm.handle = handle;
-
- /* Donate memory for the vcpus at hyp and initialize it. */
- hyp_vcpu_sz = PAGE_ALIGN(PKVM_HYP_VCPU_SIZE);
kvm_for_each_vcpu(idx, host_vcpu, host_kvm) {
- void *hyp_vcpu;
-
- /* Indexing of the vcpus to be sequential starting at 0. */
- if (WARN_ON(host_vcpu->vcpu_idx != idx)) {
- ret = -EINVAL;
- goto destroy_vm;
- }
-
- hyp_vcpu = alloc_pages_exact(hyp_vcpu_sz, GFP_KERNEL_ACCOUNT);
- if (!hyp_vcpu) {
- ret = -ENOMEM;
- goto destroy_vm;
- }
-
- ret = kvm_call_hyp_nvhe(__pkvm_init_vcpu, handle, host_vcpu,
- hyp_vcpu);
- if (ret) {
- free_pages_exact(hyp_vcpu, hyp_vcpu_sz);
+ ret = __pkvm_create_hyp_vcpu(host_vcpu);
+ if (ret)
goto destroy_vm;
- }
}
return 0;
--
2.48.1.711.g2feabab25a-goog
* [PATCH v2 4/4] KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
2025-02-26 21:55 [PATCH v2 0/4] KVM: arm64: Fix initializing HCRX_EL2 and other traps in pKVM Fuad Tabba
` (2 preceding siblings ...)
2025-02-26 21:55 ` [PATCH v2 3/4] KVM: arm64: Factor out pKVM hyp vcpu creation to separate function Fuad Tabba
@ 2025-02-26 21:55 ` Fuad Tabba
2025-02-27 12:09 ` Marc Zyngier
2025-02-27 14:13 ` [PATCH v2 0/4] KVM: arm64: Fix initializing HCRX_EL2 and other traps in pKVM Marc Zyngier
4 siblings, 1 reply; 18+ messages in thread
From: Fuad Tabba @ 2025-02-26 21:55 UTC (permalink / raw)
To: kvmarm, linux-arm-kernel
Cc: maz, oliver.upton, mark.rutland, will, joey.gouly, suzuki.poulose,
yuzenghui, catalin.marinas, broonie, qperret, vdonnefort, tabba
Instead of creating and initializing _all_ hyp vcpus in pKVM when
the first host vcpu runs for the first time, initialize _each_
hyp vcpu in conjunction with its corresponding host vcpu.
Some of the host vcpu state (e.g., system register and trap
values) is not initialized until the first time the host vcpu is
run. Therefore, a hyp vcpu initialized before its corresponding
host vcpu has run for the first time might not see the complete
host state of that vcpu.
Additionally, this behavior is in line with non-protected mode.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/include/asm/kvm_host.h | 2 +
arch/arm64/include/asm/kvm_pkvm.h | 1 +
arch/arm64/kvm/arm.c | 4 ++
arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 6 ---
arch/arm64/kvm/hyp/nvhe/pkvm.c | 54 +++++++++++++++-----------
arch/arm64/kvm/pkvm.c | 28 ++++++-------
6 files changed, 53 insertions(+), 42 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3a7ec98ef123..4a0e522f6001 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -869,6 +869,8 @@ struct kvm_vcpu_arch {
#define VCPU_INITIALIZED __vcpu_single_flag(cflags, BIT(0))
/* SVE config completed */
#define VCPU_SVE_FINALIZED __vcpu_single_flag(cflags, BIT(1))
+/* pKVM VCPU setup completed */
+#define VCPU_PKVM_FINALIZED __vcpu_single_flag(cflags, BIT(2))
/* Exception pending */
#define PENDING_EXCEPTION __vcpu_single_flag(iflags, BIT(0))
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index eb65f12e81d9..abd693ce5b93 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -19,6 +19,7 @@
int pkvm_init_host_vm(struct kvm *kvm);
int pkvm_create_hyp_vm(struct kvm *kvm);
void pkvm_destroy_hyp_vm(struct kvm *kvm);
+int pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu);
/*
* This functions as an allow-list of protected VM capabilities.
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index b8e55a441282..cd7a808eeb64 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -833,6 +833,10 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
ret = pkvm_create_hyp_vm(kvm);
if (ret)
return ret;
+
+ ret = pkvm_create_hyp_vcpu(vcpu);
+ if (ret)
+ return ret;
}
mutex_lock(&kvm->arch.config_lock);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index e42bf68c8848..ce31d3b73603 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -43,12 +43,6 @@ struct pkvm_hyp_vm {
struct hyp_pool pool;
hyp_spinlock_t lock;
- /*
- * The number of vcpus initialized and ready to run.
- * Modifying this is protected by 'vm_table_lock'.
- */
- unsigned int nr_vcpus;
-
/* Array of the hyp vCPU structures for this VM. */
struct pkvm_hyp_vcpu *vcpus[];
};
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 6efb9bf56180..4ef3748dc660 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -245,10 +245,12 @@ struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
hyp_spin_lock(&vm_table_lock);
hyp_vm = get_vm_by_handle(handle);
- if (!hyp_vm || hyp_vm->nr_vcpus <= vcpu_idx)
+ if (!hyp_vm || hyp_vm->kvm.created_vcpus <= vcpu_idx)
goto unlock;
hyp_vcpu = hyp_vm->vcpus[vcpu_idx];
+ if (!hyp_vcpu)
+ goto unlock;
/* Ensure vcpu isn't loaded on more than one cpu simultaneously. */
if (unlikely(hyp_vcpu->loaded_hyp_vcpu)) {
@@ -367,8 +369,14 @@ static void unpin_host_vcpus(struct pkvm_hyp_vcpu *hyp_vcpus[],
{
int i;
- for (i = 0; i < nr_vcpus; i++)
- unpin_host_vcpu(hyp_vcpus[i]->host_vcpu);
+ for (i = 0; i < nr_vcpus; i++) {
+ struct pkvm_hyp_vcpu *hyp_vcpu = hyp_vcpus[i];
+
+ if (!hyp_vcpu)
+ continue;
+
+ unpin_host_vcpu(hyp_vcpu->host_vcpu);
+ }
}
static void init_pkvm_hyp_vm(struct kvm *host_kvm, struct pkvm_hyp_vm *hyp_vm,
@@ -392,24 +400,18 @@ static void pkvm_vcpu_init_sve(struct pkvm_hyp_vcpu *hyp_vcpu, struct kvm_vcpu *
static int init_pkvm_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu,
struct pkvm_hyp_vm *hyp_vm,
- struct kvm_vcpu *host_vcpu,
- unsigned int vcpu_idx)
+ struct kvm_vcpu *host_vcpu)
{
int ret = 0;
if (hyp_pin_shared_mem(host_vcpu, host_vcpu + 1))
return -EBUSY;
- if (host_vcpu->vcpu_idx != vcpu_idx) {
- ret = -EINVAL;
- goto done;
- }
-
hyp_vcpu->host_vcpu = host_vcpu;
hyp_vcpu->vcpu.kvm = &hyp_vm->kvm;
hyp_vcpu->vcpu.vcpu_id = READ_ONCE(host_vcpu->vcpu_id);
- hyp_vcpu->vcpu.vcpu_idx = vcpu_idx;
+ hyp_vcpu->vcpu.vcpu_idx = READ_ONCE(host_vcpu->vcpu_idx);
hyp_vcpu->vcpu.arch.hw_mmu = &hyp_vm->kvm.arch.mmu;
hyp_vcpu->vcpu.arch.cflags = READ_ONCE(host_vcpu->arch.cflags);
@@ -647,27 +649,28 @@ int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
goto unlock;
}
- idx = hyp_vm->nr_vcpus;
+ ret = init_pkvm_hyp_vcpu(hyp_vcpu, hyp_vm, host_vcpu);
+ if (ret)
+ goto unlock;
+
+ idx = hyp_vcpu->vcpu.vcpu_idx;
if (idx >= hyp_vm->kvm.created_vcpus) {
ret = -EINVAL;
goto unlock;
}
- ret = init_pkvm_hyp_vcpu(hyp_vcpu, hyp_vm, host_vcpu, idx);
- if (ret)
+ if (hyp_vm->vcpus[idx]) {
+ ret = -EINVAL;
goto unlock;
+ }
hyp_vm->vcpus[idx] = hyp_vcpu;
- hyp_vm->nr_vcpus++;
unlock:
hyp_spin_unlock(&vm_table_lock);
- if (ret) {
+ if (ret)
unmap_donated_memory(hyp_vcpu, sizeof(*hyp_vcpu));
- return ret;
- }
-
- return 0;
+ return ret;
}
static void
@@ -713,12 +716,17 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
/* Reclaim guest pages (including page-table pages) */
mc = &host_kvm->arch.pkvm.teardown_mc;
reclaim_guest_pages(hyp_vm, mc);
- unpin_host_vcpus(hyp_vm->vcpus, hyp_vm->nr_vcpus);
+ unpin_host_vcpus(hyp_vm->vcpus, hyp_vm->kvm.created_vcpus);
/* Push the metadata pages to the teardown memcache */
- for (idx = 0; idx < hyp_vm->nr_vcpus; ++idx) {
+ for (idx = 0; idx < hyp_vm->kvm.created_vcpus; ++idx) {
struct pkvm_hyp_vcpu *hyp_vcpu = hyp_vm->vcpus[idx];
- struct kvm_hyp_memcache *vcpu_mc = &hyp_vcpu->vcpu.arch.pkvm_memcache;
+ struct kvm_hyp_memcache *vcpu_mc;
+
+ if (!hyp_vcpu)
+ continue;
+
+ vcpu_mc = &hyp_vcpu->vcpu.arch.pkvm_memcache;
while (vcpu_mc->nr_pages) {
void *addr = pop_hyp_memcache(vcpu_mc, hyp_phys_to_virt);
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 06665cf221bf..f2a09dc50676 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -125,7 +125,9 @@ static int __pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu)
return -ENOMEM;
ret = kvm_call_hyp_nvhe(__pkvm_init_vcpu, handle, vcpu, hyp_vcpu);
- if (ret)
+ if (!ret)
+ vcpu_set_flag(vcpu, VCPU_PKVM_FINALIZED);
+ else
free_pages_exact(hyp_vcpu, hyp_vcpu_sz);
return ret;
@@ -144,9 +146,7 @@ static int __pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu)
static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
{
size_t pgd_sz, hyp_vm_sz;
- struct kvm_vcpu *host_vcpu;
void *pgd, *hyp_vm;
- unsigned long idx;
int ret;
if (host_kvm->created_vcpus < 1)
@@ -180,17 +180,7 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
WRITE_ONCE(host_kvm->arch.pkvm.handle, ret);
- kvm_for_each_vcpu(idx, host_vcpu, host_kvm) {
- ret = __pkvm_create_hyp_vcpu(host_vcpu);
- if (ret)
- goto destroy_vm;
- }
-
return 0;
-
-destroy_vm:
- __pkvm_destroy_hyp_vm(host_kvm);
- return ret;
free_vm:
free_pages_exact(hyp_vm, hyp_vm_sz);
free_pgd:
@@ -210,6 +200,18 @@ int pkvm_create_hyp_vm(struct kvm *host_kvm)
return ret;
}
+int pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu)
+{
+ int ret = 0;
+
+ mutex_lock(&vcpu->kvm->arch.config_lock);
+ if (!vcpu_get_flag(vcpu, VCPU_PKVM_FINALIZED))
+ ret = __pkvm_create_hyp_vcpu(vcpu);
+ mutex_unlock(&vcpu->kvm->arch.config_lock);
+
+ return ret;
+}
+
void pkvm_destroy_hyp_vm(struct kvm *host_kvm)
{
mutex_lock(&host_kvm->arch.config_lock);
--
2.48.1.711.g2feabab25a-goog
* Re: [PATCH v2 4/4] KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
2025-02-26 21:55 ` [PATCH v2 4/4] KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu Fuad Tabba
@ 2025-02-27 12:09 ` Marc Zyngier
2025-02-27 12:47 ` Fuad Tabba
0 siblings, 1 reply; 18+ messages in thread
From: Marc Zyngier @ 2025-02-27 12:09 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvmarm, linux-arm-kernel, oliver.upton, mark.rutland, will,
joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, broonie,
qperret, vdonnefort
On Wed, 26 Feb 2025 21:55:20 +0000,
Fuad Tabba <tabba@google.com> wrote:
>
> Instead of creating and initializing _all_ hyp vcpus in pKVM when
> the first host vcpu runs for the first time, initialize _each_
> hyp vcpu in conjunction with its corresponding host vcpu.
>
> Some of the host vcpu state (e.g., system registers and traps
> values) is not initialized until the first time the host vcpu is
> run. Therefore, initializing a hyp vcpu before its corresponding
> host vcpu has run for the first time might not view the complete
> host state of these vcpus.
>
> Additionally, this behavior is inline with non-protected modes.
>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> arch/arm64/include/asm/kvm_host.h | 2 +
> arch/arm64/include/asm/kvm_pkvm.h | 1 +
> arch/arm64/kvm/arm.c | 4 ++
> arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 6 ---
> arch/arm64/kvm/hyp/nvhe/pkvm.c | 54 +++++++++++++++-----------
> arch/arm64/kvm/pkvm.c | 28 ++++++-------
> 6 files changed, 53 insertions(+), 42 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 3a7ec98ef123..4a0e522f6001 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -869,6 +869,8 @@ struct kvm_vcpu_arch {
> #define VCPU_INITIALIZED __vcpu_single_flag(cflags, BIT(0))
> /* SVE config completed */
> #define VCPU_SVE_FINALIZED __vcpu_single_flag(cflags, BIT(1))
> +/* pKVM VCPU setup completed */
> +#define VCPU_PKVM_FINALIZED __vcpu_single_flag(cflags, BIT(2))
>
> /* Exception pending */
> #define PENDING_EXCEPTION __vcpu_single_flag(iflags, BIT(0))
> diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
> index eb65f12e81d9..abd693ce5b93 100644
> --- a/arch/arm64/include/asm/kvm_pkvm.h
> +++ b/arch/arm64/include/asm/kvm_pkvm.h
> @@ -19,6 +19,7 @@
> int pkvm_init_host_vm(struct kvm *kvm);
> int pkvm_create_hyp_vm(struct kvm *kvm);
> void pkvm_destroy_hyp_vm(struct kvm *kvm);
> +int pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu);
>
> /*
> * This functions as an allow-list of protected VM capabilities.
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index b8e55a441282..cd7a808eeb64 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -833,6 +833,10 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> ret = pkvm_create_hyp_vm(kvm);
> if (ret)
> return ret;
> +
> + ret = pkvm_create_hyp_vcpu(vcpu);
> + if (ret)
> + return ret;
> }
>
> mutex_lock(&kvm->arch.config_lock);
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> index e42bf68c8848..ce31d3b73603 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> @@ -43,12 +43,6 @@ struct pkvm_hyp_vm {
> struct hyp_pool pool;
> hyp_spinlock_t lock;
>
> - /*
> - * The number of vcpus initialized and ready to run.
> - * Modifying this is protected by 'vm_table_lock'.
> - */
> - unsigned int nr_vcpus;
> -
> /* Array of the hyp vCPU structures for this VM. */
> struct pkvm_hyp_vcpu *vcpus[];
> };
> diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> index 6efb9bf56180..4ef3748dc660 100644
> --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> @@ -245,10 +245,12 @@ struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
>
> hyp_spin_lock(&vm_table_lock);
> hyp_vm = get_vm_by_handle(handle);
> - if (!hyp_vm || hyp_vm->nr_vcpus <= vcpu_idx)
> + if (!hyp_vm || hyp_vm->kvm.created_vcpus <= vcpu_idx)
Why is it reasonable to trust kvm.created_vcpus here? It looks like
something that is under direct control of the host, and that could be
arbitrarily changed to allow dereferencing the vcpus[] array past its
actual boundaries.
What guarantees that it is stable at load-time?
I have similar misgivings about the init and teardown paths.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH v2 4/4] KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
2025-02-27 12:09 ` Marc Zyngier
@ 2025-02-27 12:47 ` Fuad Tabba
2025-02-27 14:13 ` Marc Zyngier
0 siblings, 1 reply; 18+ messages in thread
From: Fuad Tabba @ 2025-02-27 12:47 UTC (permalink / raw)
To: Marc Zyngier
Cc: kvmarm, linux-arm-kernel, oliver.upton, mark.rutland, will,
joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, broonie,
qperret, vdonnefort
Hi Marc,
On Thu, 27 Feb 2025 at 04:09, Marc Zyngier <maz@kernel.org> wrote:
>
> On Wed, 26 Feb 2025 21:55:20 +0000,
> Fuad Tabba <tabba@google.com> wrote:
> >
> > Instead of creating and initializing _all_ hyp vcpus in pKVM when
> > the first host vcpu runs for the first time, initialize _each_
> > hyp vcpu in conjunction with its corresponding host vcpu.
> >
> > Some of the host vcpu state (e.g., system registers and traps
> > values) is not initialized until the first time the host vcpu is
> > run. Therefore, initializing a hyp vcpu before its corresponding
> > host vcpu has run for the first time might not view the complete
> > host state of these vcpus.
> >
> > Additionally, this behavior is inline with non-protected modes.
> >
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> > arch/arm64/include/asm/kvm_host.h | 2 +
> > arch/arm64/include/asm/kvm_pkvm.h | 1 +
> > arch/arm64/kvm/arm.c | 4 ++
> > arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 6 ---
> > arch/arm64/kvm/hyp/nvhe/pkvm.c | 54 +++++++++++++++-----------
> > arch/arm64/kvm/pkvm.c | 28 ++++++-------
> > 6 files changed, 53 insertions(+), 42 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 3a7ec98ef123..4a0e522f6001 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -869,6 +869,8 @@ struct kvm_vcpu_arch {
> > #define VCPU_INITIALIZED __vcpu_single_flag(cflags, BIT(0))
> > /* SVE config completed */
> > #define VCPU_SVE_FINALIZED __vcpu_single_flag(cflags, BIT(1))
> > +/* pKVM VCPU setup completed */
> > +#define VCPU_PKVM_FINALIZED __vcpu_single_flag(cflags, BIT(2))
> >
> > /* Exception pending */
> > #define PENDING_EXCEPTION __vcpu_single_flag(iflags, BIT(0))
> > diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
> > index eb65f12e81d9..abd693ce5b93 100644
> > --- a/arch/arm64/include/asm/kvm_pkvm.h
> > +++ b/arch/arm64/include/asm/kvm_pkvm.h
> > @@ -19,6 +19,7 @@
> > int pkvm_init_host_vm(struct kvm *kvm);
> > int pkvm_create_hyp_vm(struct kvm *kvm);
> > void pkvm_destroy_hyp_vm(struct kvm *kvm);
> > +int pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu);
> >
> > /*
> > * This functions as an allow-list of protected VM capabilities.
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index b8e55a441282..cd7a808eeb64 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -833,6 +833,10 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> > ret = pkvm_create_hyp_vm(kvm);
> > if (ret)
> > return ret;
> > +
> > + ret = pkvm_create_hyp_vcpu(vcpu);
> > + if (ret)
> > + return ret;
> > }
> >
> > mutex_lock(&kvm->arch.config_lock);
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> > index e42bf68c8848..ce31d3b73603 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> > @@ -43,12 +43,6 @@ struct pkvm_hyp_vm {
> > struct hyp_pool pool;
> > hyp_spinlock_t lock;
> >
> > - /*
> > - * The number of vcpus initialized and ready to run.
> > - * Modifying this is protected by 'vm_table_lock'.
> > - */
> > - unsigned int nr_vcpus;
> > -
> > /* Array of the hyp vCPU structures for this VM. */
> > struct pkvm_hyp_vcpu *vcpus[];
> > };
> > diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > index 6efb9bf56180..4ef3748dc660 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > @@ -245,10 +245,12 @@ struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
> >
> > hyp_spin_lock(&vm_table_lock);
> > hyp_vm = get_vm_by_handle(handle);
> > - if (!hyp_vm || hyp_vm->nr_vcpus <= vcpu_idx)
> > + if (!hyp_vm || hyp_vm->kvm.created_vcpus <= vcpu_idx)
>
> Why is it reasonable to trust kvm.created_vcpus here? It looks like
> something that is under direct control of the host, and that could be
> arbitrarily changed to allow dereferencing the vcpus[] array past its
> actual boundaries.
>
> What guarantees that it is stable at load-time?
>
> I have similar misgivings about the init and teardown paths.
This is the hypervisor's view of created_vcpus
(hyp_vm->kvm.created_vcpus), i.e., not the host's (not
host_kvm->created_vcpus). We read it once from the host at
__pkvm_init_vm(). We know it shouldn't overflow the vcpus array,
since it is the same value of created_vcpus (nr_vcpus) that we use
to calculate vm_size and to map the corresponding amount of donated
memory for the hyp_vm, all in __pkvm_init_vm().
Since it's all hyp values, it should be fine, right?
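To spell out the invariant, here is a rough userspace-style sketch of
what I mean (made-up names, not the actual EL2 code; calloc() only
stands in for mapping the donated memory):

#include <stdlib.h>

/*
 * The hypervisor snapshots the host-provided vcpu count exactly once
 * when the VM object is created, and that same value sizes the vcpus[]
 * array, so a later bounds check against the snapshot cannot index past
 * the allocation even if the host changes its own counter afterwards.
 */
struct hyp_vm_sketch {
	unsigned int created_vcpus;	/* EL2-private copy, written once */
	void *vcpus[];			/* flexible array sized from it */
};

static struct hyp_vm_sketch *init_vm_sketch(unsigned int host_created_vcpus)
{
	struct hyp_vm_sketch *vm;

	vm = calloc(1, sizeof(*vm) + host_created_vcpus * sizeof(void *));
	if (vm)
		vm->created_vcpus = host_created_vcpus;	/* snapshot */
	return vm;
}

static void *load_vcpu_sketch(struct hyp_vm_sketch *vm, unsigned int idx)
{
	/* Check against the private copy, never against host-owned state. */
	if (!vm || idx >= vm->created_vcpus)
		return NULL;
	return vm->vcpus[idx];
}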
Cheers,
/fuad
> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.
* Re: [PATCH v2 4/4] KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu
2025-02-27 12:47 ` Fuad Tabba
@ 2025-02-27 14:13 ` Marc Zyngier
0 siblings, 0 replies; 18+ messages in thread
From: Marc Zyngier @ 2025-02-27 14:13 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvmarm, linux-arm-kernel, oliver.upton, mark.rutland, will,
joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, broonie,
qperret, vdonnefort
On Thu, 27 Feb 2025 12:47:16 +0000,
Fuad Tabba <tabba@google.com> wrote:
>
> Hi Marc,
>
> On Thu, 27 Feb 2025 at 04:09, Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Wed, 26 Feb 2025 21:55:20 +0000,
> > Fuad Tabba <tabba@google.com> wrote:
> > >
> > > Instead of creating and initializing _all_ hyp vcpus in pKVM when
> > > the first host vcpu runs for the first time, initialize _each_
> > > hyp vcpu in conjunction with its corresponding host vcpu.
> > >
> > > Some of the host vcpu state (e.g., system registers and traps
> > > values) is not initialized until the first time the host vcpu is
> > > run. Therefore, initializing a hyp vcpu before its corresponding
> > > host vcpu has run for the first time might not view the complete
> > > host state of these vcpus.
> > >
> > > Additionally, this behavior is inline with non-protected modes.
> > >
> > > Signed-off-by: Fuad Tabba <tabba@google.com>
> > > ---
> > > arch/arm64/include/asm/kvm_host.h | 2 +
> > > arch/arm64/include/asm/kvm_pkvm.h | 1 +
> > > arch/arm64/kvm/arm.c | 4 ++
> > > arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 6 ---
> > > arch/arm64/kvm/hyp/nvhe/pkvm.c | 54 +++++++++++++++-----------
> > > arch/arm64/kvm/pkvm.c | 28 ++++++-------
> > > 6 files changed, 53 insertions(+), 42 deletions(-)
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index 3a7ec98ef123..4a0e522f6001 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -869,6 +869,8 @@ struct kvm_vcpu_arch {
> > > #define VCPU_INITIALIZED __vcpu_single_flag(cflags, BIT(0))
> > > /* SVE config completed */
> > > #define VCPU_SVE_FINALIZED __vcpu_single_flag(cflags, BIT(1))
> > > +/* pKVM VCPU setup completed */
> > > +#define VCPU_PKVM_FINALIZED __vcpu_single_flag(cflags, BIT(2))
> > >
> > > /* Exception pending */
> > > #define PENDING_EXCEPTION __vcpu_single_flag(iflags, BIT(0))
> > > diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
> > > index eb65f12e81d9..abd693ce5b93 100644
> > > --- a/arch/arm64/include/asm/kvm_pkvm.h
> > > +++ b/arch/arm64/include/asm/kvm_pkvm.h
> > > @@ -19,6 +19,7 @@
> > > int pkvm_init_host_vm(struct kvm *kvm);
> > > int pkvm_create_hyp_vm(struct kvm *kvm);
> > > void pkvm_destroy_hyp_vm(struct kvm *kvm);
> > > +int pkvm_create_hyp_vcpu(struct kvm_vcpu *vcpu);
> > >
> > > /*
> > > * This functions as an allow-list of protected VM capabilities.
> > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > index b8e55a441282..cd7a808eeb64 100644
> > > --- a/arch/arm64/kvm/arm.c
> > > +++ b/arch/arm64/kvm/arm.c
> > > @@ -833,6 +833,10 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
> > > ret = pkvm_create_hyp_vm(kvm);
> > > if (ret)
> > > return ret;
> > > +
> > > + ret = pkvm_create_hyp_vcpu(vcpu);
> > > + if (ret)
> > > + return ret;
> > > }
> > >
> > > mutex_lock(&kvm->arch.config_lock);
> > > diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> > > index e42bf68c8848..ce31d3b73603 100644
> > > --- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> > > +++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
> > > @@ -43,12 +43,6 @@ struct pkvm_hyp_vm {
> > > struct hyp_pool pool;
> > > hyp_spinlock_t lock;
> > >
> > > - /*
> > > - * The number of vcpus initialized and ready to run.
> > > - * Modifying this is protected by 'vm_table_lock'.
> > > - */
> > > - unsigned int nr_vcpus;
> > > -
> > > /* Array of the hyp vCPU structures for this VM. */
> > > struct pkvm_hyp_vcpu *vcpus[];
> > > };
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > > index 6efb9bf56180..4ef3748dc660 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
> > > @@ -245,10 +245,12 @@ struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
> > >
> > > hyp_spin_lock(&vm_table_lock);
> > > hyp_vm = get_vm_by_handle(handle);
> > > - if (!hyp_vm || hyp_vm->nr_vcpus <= vcpu_idx)
> > > + if (!hyp_vm || hyp_vm->kvm.created_vcpus <= vcpu_idx)
> >
> > Why is it reasonable to trust kvm.created_vcpus here? It looks like
> > something that is under direct control of the host, and that could be
> > arbitrarily changed to allow dereferencing the vcpus[] array past its
> > actual boundaries.
> >
> > What guarantees that it is stable at load-time?
> >
> > I have similar misgivings about the init and teardown paths.
>
> This is the hypervisor's view of created_vcpus
> (hyp_vm->kvm.created_vcpus), i.e., not the host's (not
> host_kvm->created_vcpus). We read it once from the host at
> __pkvm_init_vm(). We should know that it wouldn't overflow the vcpus
> array since it's the value of created_vcpus (nr_vcpus) we use when we
> calculate the vm_size in __pkvm_init_vm(), and try to map the
> corresponding amount of donated memory for the hyp_vm (all in
> __pkvm_init_vm()).
>
> Since it's all hyp values, it should be fine, right?
Ah, I got confused between the host_kvm and kvm fields. Apologies for
the noise, this looks fine indeed.
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH v2 0/4] KVM: arm64: Fix initializing HCRX_EL2 and other traps in pKVM
2025-02-26 21:55 [PATCH v2 0/4] KVM: arm64: Fix initializing HCRX_EL2 and other traps in pKVM Fuad Tabba
` (3 preceding siblings ...)
2025-02-26 21:55 ` [PATCH v2 4/4] KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu Fuad Tabba
@ 2025-02-27 14:13 ` Marc Zyngier
4 siblings, 0 replies; 18+ messages in thread
From: Marc Zyngier @ 2025-02-27 14:13 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvmarm, linux-arm-kernel, oliver.upton, mark.rutland, will,
joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, broonie,
qperret, vdonnefort
On Wed, 26 Feb 2025 21:55:16 +0000,
Fuad Tabba <tabba@google.com> wrote:
>
> Changes since v1 [*]:
> - Factor out setting HCRX_EL2 traps into a separate function and
> share logic between protected and non-protected modes
> - Rebase on Linux 6.14-rc4
>
> For context, please refer to v1 [*].
>
> Cheers,
> /fuad
>
> [*] https://lore.kernel.org/all/20250214150258.464798-1-tabba@google.com/
>
> Fuad Tabba (4):
> KVM: arm64: Factor out setting HCRX_EL2 traps into separate function
> KVM: arm64: Initialize HCRX_EL2 traps in pKVM
> KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
> KVM: arm64: Create each pKVM hyp vcpu after its corresponding host
> vcpu
>
> arch/arm64/include/asm/kvm_emulate.h | 24 +++++++++
> arch/arm64/include/asm/kvm_host.h | 2 +
> arch/arm64/include/asm/kvm_pkvm.h | 1 +
> arch/arm64/kvm/arm.c | 4 ++
> arch/arm64/kvm/hyp/include/nvhe/pkvm.h | 6 ---
> arch/arm64/kvm/hyp/nvhe/pkvm.c | 62 ++++++++++++++---------
> arch/arm64/kvm/pkvm.c | 70 +++++++++++++-------------
> arch/arm64/kvm/sys_regs.c | 20 +-------
> 8 files changed, 104 insertions(+), 85 deletions(-)
>
>
> base-commit: d082ecbc71e9e0bf49883ee4afd435a77a5101b6
Reviewed-by: Marc Zyngier <maz@kernel.org>
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH v2 3/4] KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
2025-02-26 21:55 ` [PATCH v2 3/4] KVM: arm64: Factor out pKVM hyp vcpu creation to separate function Fuad Tabba
@ 2025-02-28 19:44 ` Quentin Perret
2025-03-03 7:57 ` Fuad Tabba
0 siblings, 1 reply; 18+ messages in thread
From: Quentin Perret @ 2025-02-28 19:44 UTC (permalink / raw)
To: Fuad Tabba
Cc: kvmarm, linux-arm-kernel, maz, oliver.upton, mark.rutland, will,
joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, broonie,
vdonnefort
On Wednesday 26 Feb 2025 at 21:55:19 (+0000), Fuad Tabba wrote:
> static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> {
> - size_t pgd_sz, hyp_vm_sz, hyp_vcpu_sz;
> + size_t pgd_sz, hyp_vm_sz;
> struct kvm_vcpu *host_vcpu;
> - pkvm_handle_t handle;
> void *pgd, *hyp_vm;
> unsigned long idx;
> int ret;
> @@ -161,33 +178,12 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> if (ret < 0)
> goto free_vm;
>
> - handle = ret;
> + WRITE_ONCE(host_kvm->arch.pkvm.handle, ret);
What's the reason to make this a WRITE_ONCE? Does it mean we should
update the readers to be READ_ONCE()?
> - host_kvm->arch.pkvm.handle = handle;
> -
> - /* Donate memory for the vcpus at hyp and initialize it. */
> - hyp_vcpu_sz = PAGE_ALIGN(PKVM_HYP_VCPU_SIZE);
> kvm_for_each_vcpu(idx, host_vcpu, host_kvm) {
> - void *hyp_vcpu;
> -
> - /* Indexing of the vcpus to be sequential starting at 0. */
> - if (WARN_ON(host_vcpu->vcpu_idx != idx)) {
> - ret = -EINVAL;
> - goto destroy_vm;
> - }
> -
> - hyp_vcpu = alloc_pages_exact(hyp_vcpu_sz, GFP_KERNEL_ACCOUNT);
> - if (!hyp_vcpu) {
> - ret = -ENOMEM;
> - goto destroy_vm;
> - }
> -
> - ret = kvm_call_hyp_nvhe(__pkvm_init_vcpu, handle, host_vcpu,
> - hyp_vcpu);
> - if (ret) {
> - free_pages_exact(hyp_vcpu, hyp_vcpu_sz);
> + ret = __pkvm_create_hyp_vcpu(host_vcpu);
> + if (ret)
> goto destroy_vm;
> - }
> }
>
> return 0;
> --
> 2.48.1.711.g2feabab25a-goog
>
* Re: [PATCH v2 3/4] KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
2025-02-28 19:44 ` Quentin Perret
@ 2025-03-03 7:57 ` Fuad Tabba
2025-03-03 19:18 ` Will Deacon
0 siblings, 1 reply; 18+ messages in thread
From: Fuad Tabba @ 2025-03-03 7:57 UTC (permalink / raw)
To: Quentin Perret
Cc: kvmarm, linux-arm-kernel, maz, oliver.upton, mark.rutland, will,
joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, broonie,
vdonnefort
Hi Quentin,
On Fri, 28 Feb 2025 at 19:44, Quentin Perret <qperret@google.com> wrote:
>
> On Wednesday 26 Feb 2025 at 21:55:19 (+0000), Fuad Tabba wrote:
> > static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > {
> > - size_t pgd_sz, hyp_vm_sz, hyp_vcpu_sz;
> > + size_t pgd_sz, hyp_vm_sz;
> > struct kvm_vcpu *host_vcpu;
> > - pkvm_handle_t handle;
> > void *pgd, *hyp_vm;
> > unsigned long idx;
> > int ret;
> > @@ -161,33 +178,12 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > if (ret < 0)
> > goto free_vm;
> >
> > - handle = ret;
> > + WRITE_ONCE(host_kvm->arch.pkvm.handle, ret);
>
> What's the reason to make this a WRITE_ONCE? Does it mean we should
> update the readers to be READ_ONCE()?
I don't remember the original reason, to be honest. In this case, it
was to make it consistent with downstream code in Android. That said,
I plan on revising all of these soon and fixing this (and related
code) in light of Will's comment regarding potential spectre gadgets:
https://lore.kernel.org/all/20250218092705.GA17030@willie-the-truck/
Cheers,
/fuad
> > - host_kvm->arch.pkvm.handle = handle;
> > -
> > - /* Donate memory for the vcpus at hyp and initialize it. */
> > - hyp_vcpu_sz = PAGE_ALIGN(PKVM_HYP_VCPU_SIZE);
> > kvm_for_each_vcpu(idx, host_vcpu, host_kvm) {
> > - void *hyp_vcpu;
> > -
> > - /* Indexing of the vcpus to be sequential starting at 0. */
> > - if (WARN_ON(host_vcpu->vcpu_idx != idx)) {
> > - ret = -EINVAL;
> > - goto destroy_vm;
> > - }
> > -
> > - hyp_vcpu = alloc_pages_exact(hyp_vcpu_sz, GFP_KERNEL_ACCOUNT);
> > - if (!hyp_vcpu) {
> > - ret = -ENOMEM;
> > - goto destroy_vm;
> > - }
> > -
> > - ret = kvm_call_hyp_nvhe(__pkvm_init_vcpu, handle, host_vcpu,
> > - hyp_vcpu);
> > - if (ret) {
> > - free_pages_exact(hyp_vcpu, hyp_vcpu_sz);
> > + ret = __pkvm_create_hyp_vcpu(host_vcpu);
> > + if (ret)
> > goto destroy_vm;
> > - }
> > }
> >
> > return 0;
> > --
> > 2.48.1.711.g2feabab25a-goog
> >
* Re: [PATCH v2 3/4] KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
2025-03-03 7:57 ` Fuad Tabba
@ 2025-03-03 19:18 ` Will Deacon
2025-03-03 19:21 ` Fuad Tabba
0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2025-03-03 19:18 UTC (permalink / raw)
To: Fuad Tabba
Cc: Quentin Perret, kvmarm, linux-arm-kernel, maz, oliver.upton,
mark.rutland, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, broonie, vdonnefort
On Mon, Mar 03, 2025 at 07:57:00AM +0000, Fuad Tabba wrote:
> On Fri, 28 Feb 2025 at 19:44, Quentin Perret <qperret@google.com> wrote:
> >
> > On Wednesday 26 Feb 2025 at 21:55:19 (+0000), Fuad Tabba wrote:
> > > static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > > {
> > > - size_t pgd_sz, hyp_vm_sz, hyp_vcpu_sz;
> > > + size_t pgd_sz, hyp_vm_sz;
> > > struct kvm_vcpu *host_vcpu;
> > > - pkvm_handle_t handle;
> > > void *pgd, *hyp_vm;
> > > unsigned long idx;
> > > int ret;
> > > @@ -161,33 +178,12 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > > if (ret < 0)
> > > goto free_vm;
> > >
> > > - handle = ret;
> > > + WRITE_ONCE(host_kvm->arch.pkvm.handle, ret);
> >
> > What's the reason to make this a WRITE_ONCE? Does it mean we should
> > update the readers to be READ_ONCE()?
>
> I don't remember the original reason, to be honest. In this case, it
> was to make it consistent with downstream code in Android. That said,
> I plan on revising all of these soon and fixing this (and related
> code) in light of Will's comment regarding potential specter gadgets:
>
> https://lore.kernel.org/all/20250218092705.GA17030@willie-the-truck/
I'm not sure the spectre stuff changes the concurrency aspects here, so
Quentin's question presumably still stands even after that.
Looking at the Android code, this WRITE_ONCE() pairs with a READ_ONCE()
in pkvm_is_hyp_created() which is called without the config_lock held
by kvm_arch_prepare_memory_region(). However, given that
pkvm_is_hyp_created() is only testing against 0, I don't think the
_ONCE() accessors are doing anything useful in that case.
The more confusing stuff is where 'kvm->arch.pkvm.handle' is read
directly without the 'config_lock' held. The MMU notifiers look like
they can do that, so I wonder if there's a theoretical race where they
can race with first run and issue TLB invalidation with the wrong handle?
That would apply equally to the upstream code, I think.
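To make the shape of that race concrete, a rough sketch (invented names
and layout, not the real KVM structures or notifier code) would be:

#include <stddef.h>

struct vm_sketch {
	void *pgt;	/* set early, when the stage-2 MMU is initialised */
	int handle;	/* only published later, at first vcpu run */
};

/* stand-in for a TLB invalidation hypercall keyed by the handle */
static void tlbi_vm_sketch(int handle)
{
	(void)handle;
}

/* first-run path: publishes the handle (under config_lock in real code) */
static void first_run_sketch(struct vm_sketch *vm, int new_handle)
{
	vm->handle = new_handle;
}

/* notifier path: holds mmu_lock, but not config_lock */
static void notifier_sketch(struct vm_sketch *vm)
{
	if (!vm->pgt)
		return;
	/*
	 * pgt can already be non-NULL before the VM has ever run, so this
	 * read can still observe handle == 0 and issue the invalidation
	 * with the wrong handle.
	 */
	tlbi_vm_sketch(vm->handle);
}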
Will
* Re: [PATCH v2 3/4] KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
2025-03-03 19:18 ` Will Deacon
@ 2025-03-03 19:21 ` Fuad Tabba
2025-03-03 21:49 ` Will Deacon
0 siblings, 1 reply; 18+ messages in thread
From: Fuad Tabba @ 2025-03-03 19:21 UTC (permalink / raw)
To: Will Deacon
Cc: Quentin Perret, kvmarm, linux-arm-kernel, maz, oliver.upton,
mark.rutland, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, broonie, vdonnefort
Hi Will,
On Mon, 3 Mar 2025 at 19:18, Will Deacon <will@kernel.org> wrote:
>
> On Mon, Mar 03, 2025 at 07:57:00AM +0000, Fuad Tabba wrote:
> > On Fri, 28 Feb 2025 at 19:44, Quentin Perret <qperret@google.com> wrote:
> > >
> > > On Wednesday 26 Feb 2025 at 21:55:19 (+0000), Fuad Tabba wrote:
> > > > static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > > > {
> > > > - size_t pgd_sz, hyp_vm_sz, hyp_vcpu_sz;
> > > > + size_t pgd_sz, hyp_vm_sz;
> > > > struct kvm_vcpu *host_vcpu;
> > > > - pkvm_handle_t handle;
> > > > void *pgd, *hyp_vm;
> > > > unsigned long idx;
> > > > int ret;
> > > > @@ -161,33 +178,12 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > > > if (ret < 0)
> > > > goto free_vm;
> > > >
> > > > - handle = ret;
> > > > + WRITE_ONCE(host_kvm->arch.pkvm.handle, ret);
> > >
> > > What's the reason to make this a WRITE_ONCE? Does it mean we should
> > > update the readers to be READ_ONCE()?
> >
> > I don't remember the original reason, to be honest. In this case, it
> > was to make it consistent with downstream code in Android. That said,
> > I plan on revising all of these soon and fixing this (and related
> > code) in light of Will's comment regarding potential specter gadgets:
> >
> > https://lore.kernel.org/all/20250218092705.GA17030@willie-the-truck/
>
> I'm not sure the spectre stuff changes the concurrency aspects here, so
> Quentin's question presumably still stands even after that.
>
> Looking at the Android code, this WRITE_ONCE() pairs with a READ_ONCE()
> in pkvm_is_hyp_created() which is called without the config_lock held
> by kvm_arch_prepare_memory_region(). However, given that
> pkvm_is_hyp_created() is only testing against 0, I don't think the
> _ONCE() accessors are doing anything useful in that case.
>
> The more confusing stuff is where 'kvm->arch.pkvm.handle' is read
> directly without the 'config_lock' held. The MMU notifiers look like
> they can do that, so I wonder if there's a theoretical race where they
> can race with first run and issue TLB invalidation with the wrong handle?
> That would apply equally to the upstream code, I think.
Would using _ONCE accessors with the MMU notifiers be enough to avoid
the race, or do we need to reconsider the lock protecting the handle
and apply it to the notifiers?
Cheers,
/fuad
> Will
* Re: [PATCH v2 3/4] KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
2025-03-03 19:21 ` Fuad Tabba
@ 2025-03-03 21:49 ` Will Deacon
2025-03-04 12:33 ` Fuad Tabba
0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2025-03-03 21:49 UTC (permalink / raw)
To: Fuad Tabba
Cc: Quentin Perret, kvmarm, linux-arm-kernel, maz, oliver.upton,
mark.rutland, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, broonie, vdonnefort
On Mon, Mar 03, 2025 at 07:21:33PM +0000, Fuad Tabba wrote:
> On Mon, 3 Mar 2025 at 19:18, Will Deacon <will@kernel.org> wrote:
> > On Mon, Mar 03, 2025 at 07:57:00AM +0000, Fuad Tabba wrote:
> > > On Fri, 28 Feb 2025 at 19:44, Quentin Perret <qperret@google.com> wrote:
> > > > On Wednesday 26 Feb 2025 at 21:55:19 (+0000), Fuad Tabba wrote:
> > > > > static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > > > > {
> > > > > - size_t pgd_sz, hyp_vm_sz, hyp_vcpu_sz;
> > > > > + size_t pgd_sz, hyp_vm_sz;
> > > > > struct kvm_vcpu *host_vcpu;
> > > > > - pkvm_handle_t handle;
> > > > > void *pgd, *hyp_vm;
> > > > > unsigned long idx;
> > > > > int ret;
> > > > > @@ -161,33 +178,12 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > > > > if (ret < 0)
> > > > > goto free_vm;
> > > > >
> > > > > - handle = ret;
> > > > > + WRITE_ONCE(host_kvm->arch.pkvm.handle, ret);
> > > >
> > > > What's the reason to make this a WRITE_ONCE? Does it mean we should
> > > > update the readers to be READ_ONCE()?
> > >
> > > I don't remember the original reason, to be honest. In this case, it
> > > was to make it consistent with downstream code in Android. That said,
> > > I plan on revising all of these soon and fixing this (and related
> > > code) in light of Will's comment regarding potential specter gadgets:
> > >
> > > https://lore.kernel.org/all/20250218092705.GA17030@willie-the-truck/
> >
> > I'm not sure the spectre stuff changes the concurrency aspects here, so
> > Quentin's question presumably still stands even after that.
> >
> > Looking at the Android code, this WRITE_ONCE() pairs with a READ_ONCE()
> > in pkvm_is_hyp_created() which is called without the config_lock held
> > by kvm_arch_prepare_memory_region(). However, given that
> > pkvm_is_hyp_created() is only testing against 0, I don't think the
> > _ONCE() accessors are doing anything useful in that case.
> >
> > The more confusing stuff is where 'kvm->arch.pkvm.handle' is read
> > directly without the 'config_lock' held. The MMU notifiers look like
> > they can do that, so I wonder if there's a theoretical race where they
> > can race with first run and issue TLB invalidation with the wrong handle?
> > That would apply equally to the upstream code, I think.
>
> Would using _ONCE accessors with the MMU notifiers be enough to avoid
> the race, or do we need to reconsider the lock protecting the handle
> and apply it to the notifiers?
I'm not entirely sure... if we used _ONCE() then we'd get either the
correct handle or zero, but we presumably need to order that against the
page-table somehow. The 'mmu_lock' looks like it gives us that, but I
don't think the notifiers are expecting an uninitialised handle in the
case where the page-table is empty (i.e. if they fire before first run).
Given that the handle is necessary for TLB invalidation, I'd be inclined
to make sure that the handle is allocated and published _before_ the
kvm_pgtable pointer checked by the notifiers is set, but that means
moving the handle allocation into kvm_arch_init_vm(). Is that do-able?
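For concreteness, the ordering I'm thinking of looks roughly like this
(illustrative sketch with invented names; the C11 atomics stand in for
the kernel's smp_store_release()/smp_load_acquire()):

#include <stdatomic.h>
#include <stddef.h>

struct vm_sketch {
	int handle;			/* must be valid before pgt is visible */
	_Atomic(void *) pgt;
};

static void vm_init_sketch(struct vm_sketch *vm, int handle, void *pgd)
{
	vm->handle = handle;		/* 1: allocate and publish the handle */
	/* 2: publish the page table with release semantics */
	atomic_store_explicit(&vm->pgt, pgd, memory_order_release);
}

static int notifier_handle_sketch(struct vm_sketch *vm)
{
	void *pgt = atomic_load_explicit(&vm->pgt, memory_order_acquire);

	if (!pgt)
		return 0;	/* nothing mapped yet, nothing to invalidate */

	/*
	 * The acquire load above orders this read after the handle store,
	 * so a notifier that sees a non-NULL page table also sees a valid
	 * handle.
	 */
	return vm->handle;
}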
Will
* Re: [PATCH v2 3/4] KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
2025-03-03 21:49 ` Will Deacon
@ 2025-03-04 12:33 ` Fuad Tabba
2025-03-12 15:29 ` Will Deacon
0 siblings, 1 reply; 18+ messages in thread
From: Fuad Tabba @ 2025-03-04 12:33 UTC (permalink / raw)
To: Will Deacon
Cc: Quentin Perret, kvmarm, linux-arm-kernel, maz, oliver.upton,
mark.rutland, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, broonie, vdonnefort
Hi Will,
On Mon, 3 Mar 2025 at 21:49, Will Deacon <will@kernel.org> wrote:
>
> On Mon, Mar 03, 2025 at 07:21:33PM +0000, Fuad Tabba wrote:
> > On Mon, 3 Mar 2025 at 19:18, Will Deacon <will@kernel.org> wrote:
> > > On Mon, Mar 03, 2025 at 07:57:00AM +0000, Fuad Tabba wrote:
> > > > On Fri, 28 Feb 2025 at 19:44, Quentin Perret <qperret@google.com> wrote:
> > > > > On Wednesday 26 Feb 2025 at 21:55:19 (+0000), Fuad Tabba wrote:
> > > > > > static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > > > > > {
> > > > > > - size_t pgd_sz, hyp_vm_sz, hyp_vcpu_sz;
> > > > > > + size_t pgd_sz, hyp_vm_sz;
> > > > > > struct kvm_vcpu *host_vcpu;
> > > > > > - pkvm_handle_t handle;
> > > > > > void *pgd, *hyp_vm;
> > > > > > unsigned long idx;
> > > > > > int ret;
> > > > > > @@ -161,33 +178,12 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > > > > > if (ret < 0)
> > > > > > goto free_vm;
> > > > > >
> > > > > > - handle = ret;
> > > > > > + WRITE_ONCE(host_kvm->arch.pkvm.handle, ret);
> > > > >
> > > > > What's the reason to make this a WRITE_ONCE? Does it mean we should
> > > > > update the readers to be READ_ONCE()?
> > > >
> > > > I don't remember the original reason, to be honest. In this case, it
> > > > was to make it consistent with downstream code in Android. That said,
> > > > I plan on revising all of these soon and fixing this (and related
> > > > code) in light of Will's comment regarding potential specter gadgets:
> > > >
> > > > https://lore.kernel.org/all/20250218092705.GA17030@willie-the-truck/
> > >
> > > I'm not sure the spectre stuff changes the concurrency aspects here, so
> > > Quentin's question presumably still stands even after that.
> > >
> > > Looking at the Android code, this WRITE_ONCE() pairs with a READ_ONCE()
> > > in pkvm_is_hyp_created() which is called without the config_lock held
> > > by kvm_arch_prepare_memory_region(). However, given that
> > > pkvm_is_hyp_created() is only testing against 0, I don't think the
> > > _ONCE() accessors are doing anything useful in that case.
> > >
> > > The more confusing stuff is where 'kvm->arch.pkvm.handle' is read
> > > directly without the 'config_lock' held. The MMU notifiers look like
> > > they can do that, so I wonder if there's a theoretical race where they
> > > can race with first run and issue TLB invalidation with the wrong handle?
> > > That would apply equally to the upstream code, I think.
> >
> > Would using _ONCE accessors with the MMU notifiers be enough to avoid
> > the race, or do we need to reconsider the lock protecting the handle
> > and apply it to the notifiers?
>
> I'm not entirely sure... if we used _ONCE() then we'd get either the
> correct handle or zero, but we presumably need to order that against the
> page-table somehow. The 'mmu_lock' looks like it gives us that, but I
> don't think the notifiers are expecting an uninitialised handle in the
> case where the page-table is empty (i.e. if they fire before first run).
>
> Given that the handle is necessary for TLB invalidation, I'd be inclined
> to make sure that the handle is allocated and published _before_ the
> kvm_pgtable pointer checked by the notifiers is set, but that means
> moving the handle allocation into kvm_arch_init_vm(). Is that do-able?
It is doable, but it won't be a simple patch. Up until now, the
creation of the hyp view of a vm is associated with the first run of
the vm, rather than its allocation at the host. Part of the reason for
that is that much of the vm state (mainly the vcpu state) isn't
finalized until then.
This patch series fixes this in part, by decoupling the initialization
of the individual vcpus from the vm. Because of that, it would be
doable, but it might be better as another patch series that we could
test and analyze separately.
If you agree, I could whip up something and send it soon.
Cheers,
/fuad
> Will
* Re: [PATCH v2 3/4] KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
2025-03-04 12:33 ` Fuad Tabba
@ 2025-03-12 15:29 ` Will Deacon
2025-03-12 15:31 ` Fuad Tabba
0 siblings, 1 reply; 18+ messages in thread
From: Will Deacon @ 2025-03-12 15:29 UTC (permalink / raw)
To: Fuad Tabba
Cc: Quentin Perret, kvmarm, linux-arm-kernel, maz, oliver.upton,
mark.rutland, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, broonie, vdonnefort
On Tue, Mar 04, 2025 at 12:33:27PM +0000, Fuad Tabba wrote:
> On Mon, 3 Mar 2025 at 21:49, Will Deacon <will@kernel.org> wrote:
> >
> > On Mon, Mar 03, 2025 at 07:21:33PM +0000, Fuad Tabba wrote:
> > > On Mon, 3 Mar 2025 at 19:18, Will Deacon <will@kernel.org> wrote:
> > > > On Mon, Mar 03, 2025 at 07:57:00AM +0000, Fuad Tabba wrote:
> > > > > On Fri, 28 Feb 2025 at 19:44, Quentin Perret <qperret@google.com> wrote:
> > > > > > On Wednesday 26 Feb 2025 at 21:55:19 (+0000), Fuad Tabba wrote:
> > > > > > > static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > > > > > > {
> > > > > > > - size_t pgd_sz, hyp_vm_sz, hyp_vcpu_sz;
> > > > > > > + size_t pgd_sz, hyp_vm_sz;
> > > > > > > struct kvm_vcpu *host_vcpu;
> > > > > > > - pkvm_handle_t handle;
> > > > > > > void *pgd, *hyp_vm;
> > > > > > > unsigned long idx;
> > > > > > > int ret;
> > > > > > > @@ -161,33 +178,12 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > > > > > > if (ret < 0)
> > > > > > > goto free_vm;
> > > > > > >
> > > > > > > - handle = ret;
> > > > > > > + WRITE_ONCE(host_kvm->arch.pkvm.handle, ret);
> > > > > >
> > > > > > What's the reason to make this a WRITE_ONCE? Does it mean we should
> > > > > > update the readers to be READ_ONCE()?
> > > > >
> > > > > I don't remember the original reason, to be honest. In this case, it
> > > > > was to make it consistent with downstream code in Android. That said,
> > > > > I plan on revising all of these soon and fixing this (and related
> > > > > code) in light of Will's comment regarding potential specter gadgets:
> > > > >
> > > > > https://lore.kernel.org/all/20250218092705.GA17030@willie-the-truck/
> > > >
> > > > I'm not sure the spectre stuff changes the concurrency aspects here, so
> > > > Quentin's question presumably still stands even after that.
> > > >
> > > > Looking at the Android code, this WRITE_ONCE() pairs with a READ_ONCE()
> > > > in pkvm_is_hyp_created() which is called without the config_lock held
> > > > by kvm_arch_prepare_memory_region(). However, given that
> > > > pkvm_is_hyp_created() is only testing against 0, I don't think the
> > > > _ONCE() accessors are doing anything useful in that case.
> > > >
> > > > The more confusing stuff is where 'kvm->arch.pkvm.handle' is read
> > > > directly without the 'config_lock' held. The MMU notifiers look like
> > > > they can do that, so I wonder if there's a theoretical race where they
> > > > can race with first run and issue TLB invalidation with the wrong handle?
> > > > That would apply equally to the upstream code, I think.
> > >
> > > Would using _ONCE accessors with the MMU notifiers be enough to avoid
> > > the race, or do we need to reconsider the lock protecting the handle
> > > and apply it to the notifiers?
> >
> > I'm not entirely sure... if we used _ONCE() then we'd get either the
> > correct handle or zero, but we presumably need to order that against the
> > page-table somehow. The 'mmu_lock' looks like it gives us that, but I
> > don't think the notifiers are expecting an uninitialised handle in the
> > case where the page-table is empty (i.e. if they fire before first run).
> >
> > Given that the handle is necessary for TLB invalidation, I'd be inclined
> > to make sure that the handle is allocated and published _before_ the
> > kvm_pgtable pointer checked by the notifiers is set, but that means
> > moving the handle allocation into kvm_arch_init_vm(). Is that do-able?
>
> It is doable, but it won't be a simple patch. Up until now, the
> creation of the hyp view of a vm is associated with the first run of
> the vm, rather than its allocation at the host. Part of the reason for
> that is that much of the vm state (mainly the vcpu state) isn't
> finalized until then.
>
> This patch series fixes this in part, by decoupling the initialization
> of the individual vcpus from the vm. Because of that, it would be
> doable, but it might be better as another patch series that we could
> test and analyze separately.
>
> If you agree, I could whip up something and send it soon.
Yeah, I think we should fix this as a separate series and drop the
WRITE_ONCE() from this patch.
Will
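
For illustration, here is a minimal sketch of the ordering suggested above:
allocate and publish the pKVM handle before the kvm_pgtable pointer that the
MMU notifiers test, so that a notifier observing a live page table is also
guaranteed to observe a valid handle. Only kvm->arch.pkvm.handle is taken from
the patch under discussion; kvm->arch.mmu.pgt stands in for the kvm_pgtable
pointer the notifiers check, and the *_sketch helpers and exact call sites are
assumptions, not the upstream code.

#include <linux/kvm_host.h>

/* VM creation side, e.g. somewhere on the kvm_arch_init_vm() path: */
static int pkvm_init_vm_sketch(struct kvm *kvm)
{
        struct kvm_pgtable *pgt;
        int handle;

        handle = pkvm_alloc_handle_sketch(kvm);         /* assumed helper */
        if (handle < 0)
                return handle;

        /* Store the handle first... */
        kvm->arch.pkvm.handle = handle;

        pgt = pkvm_alloc_pgtable_sketch(kvm);           /* assumed helper */
        if (!pgt)
                return -ENOMEM;

        /*
         * ...then publish the page-table pointer with release semantics,
         * so anyone who sees a non-NULL pgt also sees the handle.
         */
        smp_store_release(&kvm->arch.mmu.pgt, pgt);
        return 0;
}

/* MMU-notifier side: */
static void pkvm_notifier_flush_sketch(struct kvm *kvm, u64 addr, u64 size)
{
        struct kvm_pgtable *pgt = smp_load_acquire(&kvm->arch.mmu.pgt);

        if (!pgt)
                return;         /* no stage-2 page table yet, nothing to do */

        /*
         * Pairs with the smp_store_release() above, so the handle read
         * here cannot be the uninitialised zero value.
         */
        pkvm_tlb_flush_sketch(kvm->arch.pkvm.handle, addr, size); /* assumed */
}

The same guarantee could instead come from taking mmu_lock on both sides, as
noted earlier in the thread. Plain WRITE_ONCE()/READ_ONCE() on the handle only
makes the accesses tear-free and orders nothing against the page table, which
is consistent with dropping the WRITE_ONCE() from this patch.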
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 3/4] KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
2025-03-12 15:29 ` Will Deacon
@ 2025-03-12 15:31 ` Fuad Tabba
2025-03-14 11:14 ` Will Deacon
0 siblings, 1 reply; 18+ messages in thread
From: Fuad Tabba @ 2025-03-12 15:31 UTC (permalink / raw)
To: Will Deacon
Cc: Quentin Perret, kvmarm, linux-arm-kernel, maz, oliver.upton,
mark.rutland, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, broonie, vdonnefort
Hi Will,
On Wed, 12 Mar 2025 at 15:29, Will Deacon <will@kernel.org> wrote:
>
> On Tue, Mar 04, 2025 at 12:33:27PM +0000, Fuad Tabba wrote:
> > On Mon, 3 Mar 2025 at 21:49, Will Deacon <will@kernel.org> wrote:
> > >
> > > On Mon, Mar 03, 2025 at 07:21:33PM +0000, Fuad Tabba wrote:
> > > > On Mon, 3 Mar 2025 at 19:18, Will Deacon <will@kernel.org> wrote:
> > > > > On Mon, Mar 03, 2025 at 07:57:00AM +0000, Fuad Tabba wrote:
> > > > > > On Fri, 28 Feb 2025 at 19:44, Quentin Perret <qperret@google.com> wrote:
> > > > > > > On Wednesday 26 Feb 2025 at 21:55:19 (+0000), Fuad Tabba wrote:
> > > > > > > > static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > > > > > > > {
> > > > > > > > - size_t pgd_sz, hyp_vm_sz, hyp_vcpu_sz;
> > > > > > > > + size_t pgd_sz, hyp_vm_sz;
> > > > > > > > struct kvm_vcpu *host_vcpu;
> > > > > > > > - pkvm_handle_t handle;
> > > > > > > > void *pgd, *hyp_vm;
> > > > > > > > unsigned long idx;
> > > > > > > > int ret;
> > > > > > > > @@ -161,33 +178,12 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
> > > > > > > > if (ret < 0)
> > > > > > > > goto free_vm;
> > > > > > > >
> > > > > > > > - handle = ret;
> > > > > > > > + WRITE_ONCE(host_kvm->arch.pkvm.handle, ret);
> > > > > > >
> > > > > > > What's the reason to make this a WRITE_ONCE? Does it mean we should
> > > > > > > update the readers to be READ_ONCE()?
> > > > > >
> > > > > > I don't remember the original reason, to be honest. In this case, it
> > > > > > was to make it consistent with downstream code in Android. That said,
> > > > > > I plan on revising all of these soon and fixing this (and related
> > > > > > code) in light of Will's comment regarding potential spectre gadgets:
> > > > > >
> > > > > > https://lore.kernel.org/all/20250218092705.GA17030@willie-the-truck/
> > > > >
> > > > > I'm not sure the spectre stuff changes the concurrency aspects here, so
> > > > > Quentin's question presumably still stands even after that.
> > > > >
> > > > > Looking at the Android code, this WRITE_ONCE() pairs with a READ_ONCE()
> > > > > in pkvm_is_hyp_created() which is called without the config_lock held
> > > > > by kvm_arch_prepare_memory_region(). However, given that
> > > > > pkvm_is_hyp_created() is only testing against 0, I don't think the
> > > > > _ONCE() accessors are doing anything useful in that case.
> > > > >
> > > > > The more confusing stuff is where 'kvm->arch.pkvm.handle' is read
> > > > > directly without the 'config_lock' held. The MMU notifiers look like
> > > > > they can do that, so I wonder if there's a theoretical window where they
> > > > > can race with first run and issue TLB invalidation with the wrong handle?
> > > > > That would apply equally to the upstream code, I think.
> > > >
> > > > Would using _ONCE accessors with the MMU notifiers be enough to avoid
> > > > the race, or do we need to reconsider the lock protecting the handle
> > > > and apply it to the notifiers?
> > >
> > > I'm not entirely sure... if we used _ONCE() then we'd get either the
> > > correct handle or zero, but we presumably need to order that against the
> > > page-table somehow. The 'mmu_lock' looks like it gives us that, but I
> > > don't think the notifiers are expecting an uninitialised handle in the
> > > case where the page-table is empty (i.e. if they fire before first run).
> > >
> > > Given that the handle is necessary for TLB invalidation, I'd be inclined
> > > to make sure that the handle is allocated and published _before_ the
> > > kvm_pgtable pointer checked by the notifiers is set, but that means
> > > moving the handle allocation into kvm_arch_init_vm(). Is that do-able?
> >
> > It is doable, but it won't be a simple patch. Up until now, the
> > creation of the hyp view of a vm is associated with the first run of
> > the vm, rather than its allocation at the host. Part of the reason for
> > that is that much of the vm state (mainly the vcpu state) isn't
> > finalized until then.
> >
> > This patch series fixes this in part, by decoupling the initialization
> > of the individual vcpus from the vm. Because of that, it would be
> > doable, but it might be better as another patch series that we could
> > test and analyze separately.
> >
> > If you agree, I could whip up something and send it soon.
>
> Yeah, I think we should fix this as a separate series and drop the
> WRITE_ONCE() from this patch.
I'll send a fix soon.
Oliver, would you like me to respin this series, or can you drop the
WRITE_ONCE()?
Thanks,
/fuad
> Will
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 3/4] KVM: arm64: Factor out pKVM hyp vcpu creation to separate function
2025-03-12 15:31 ` Fuad Tabba
@ 2025-03-14 11:14 ` Will Deacon
0 siblings, 0 replies; 18+ messages in thread
From: Will Deacon @ 2025-03-14 11:14 UTC (permalink / raw)
To: Fuad Tabba
Cc: Quentin Perret, kvmarm, linux-arm-kernel, maz, oliver.upton,
mark.rutland, joey.gouly, suzuki.poulose, yuzenghui,
catalin.marinas, broonie, vdonnefort
On Wed, Mar 12, 2025 at 03:31:17PM +0000, Fuad Tabba wrote:
> On Wed, 12 Mar 2025 at 15:29, Will Deacon <will@kernel.org> wrote:
> > On Tue, Mar 04, 2025 at 12:33:27PM +0000, Fuad Tabba wrote:
> > > If you agree, I could whip up something and send it soon.
> >
> > Yeah, I think we should fix this as a separate series and drop the
> > WRITE_ONCE() from this patch.
>
> I'll send a fix soon.
>
> Oliver, would you like me to respin this series, or can you drop the
> WRITE_ONCE()?
With the WRITE_ONCE() dropped:
Acked-by: Will Deacon <will@kernel.org>
Cheers,
Will
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread
Thread overview: 18+ messages
2025-02-26 21:55 [PATCH v2 0/4] KVM: arm64: Fix initializing HCRX_EL2 and other traps in pKVM Fuad Tabba
2025-02-26 21:55 ` [PATCH v2 1/4] KVM: arm64: Factor out setting HCRX_EL2 traps into separate function Fuad Tabba
2025-02-26 21:55 ` [PATCH v2 2/4] KVM: arm64: Initialize HCRX_EL2 traps in pKVM Fuad Tabba
2025-02-26 21:55 ` [PATCH v2 3/4] KVM: arm64: Factor out pKVM hyp vcpu creation to separate function Fuad Tabba
2025-02-28 19:44 ` Quentin Perret
2025-03-03 7:57 ` Fuad Tabba
2025-03-03 19:18 ` Will Deacon
2025-03-03 19:21 ` Fuad Tabba
2025-03-03 21:49 ` Will Deacon
2025-03-04 12:33 ` Fuad Tabba
2025-03-12 15:29 ` Will Deacon
2025-03-12 15:31 ` Fuad Tabba
2025-03-14 11:14 ` Will Deacon
2025-02-26 21:55 ` [PATCH v2 4/4] KVM: arm64: Create each pKVM hyp vcpu after its corresponding host vcpu Fuad Tabba
2025-02-27 12:09 ` Marc Zyngier
2025-02-27 12:47 ` Fuad Tabba
2025-02-27 14:13 ` Marc Zyngier
2025-02-27 14:13 ` [PATCH v2 0/4] KVM: arm64: Fix initializing HCRX_EL2 and other traps in pKVM Marc Zyngier