* [PATCH v14 35/44] arm64: RMI: support RSI_HOST_CALL
From: Steven Price @ 2026-05-13 13:17 UTC
To: kvm, kvmarm
Cc: Joey Gouly, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2,
Steven Price
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
From: Joey Gouly <joey.gouly@arm.com>
Realm VMs can talk to the hypervisor using the RSI_HOST_CALL SMC. The
RMM forwards these calls to the host, and KVM handles them as regular
hypercalls.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
---
Changes since v7:
* Avoid turning a negative return from kvm_smccc_call_handler() into an
error response to the guest. Instead propagate the error back to user
space.
Changes since v4:
* Setting GPRS is now done by kvm_rec_enter() rather than
rec_exit_host_call() (see previous patch - arm64: RME: Handle realm
enter/exit). This fixes a bug where the registers set by user space
were being ignored.
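For readers without the kernel tree to hand, the exit-side handling described
above can be sketched as a small user-space model. This is only an
illustration, not the kernel code: struct vcpu_model, struct rec_exit_model,
model_rec_exit_host_call() and sample_call_handler() are hypothetical names,
the value of REC_RUN_GPRS is assumed to be 31, and the function pointer merely
stands in for kvm_smccc_call_handler().

```c
#include <assert.h>
#include <stdint.h>

#define REC_RUN_GPRS 31 /* assumed value for this sketch */

/* Hypothetical stand-ins for the REC exit record and the vCPU state. */
struct rec_exit_model {
	uint64_t gprs[REC_RUN_GPRS];
};

struct vcpu_model {
	uint64_t regs[REC_RUN_GPRS];
	uint64_t hvc_exit_stat;
};

/* Stand-in for kvm_smccc_call_handler(): >0 resume guest, <0 error. */
typedef int (*call_handler_t)(struct vcpu_model *vcpu);

/* Hypothetical handler: rejects a zero function ID, accepts anything else. */
static int sample_call_handler(struct vcpu_model *vcpu)
{
	return vcpu->regs[0] ? 1 : -1;
}

/*
 * Copy the GPRs reported by the RMM into the vCPU, then let the regular
 * hypercall path decide what to do. A negative return is passed through
 * unchanged so that the caller can report it to user space.
 */
static int model_rec_exit_host_call(struct vcpu_model *vcpu,
				    const struct rec_exit_model *exit,
				    call_handler_t handler)
{
	assert(vcpu && exit && handler);

	vcpu->hvc_exit_stat++;

	for (int i = 0; i < REC_RUN_GPRS; i++)
		vcpu->regs[i] = exit->gprs[i];

	return handler(vcpu);
}
```

The model keeps the property called out in the v7 changelog: the handler's
return value is returned as-is rather than being converted into a guest-visible
error.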
---
arch/arm64/kvm/rmi-exit.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/arch/arm64/kvm/rmi-exit.c b/arch/arm64/kvm/rmi-exit.c
index 8ec0d179eba2..e5647aa004d3 100644
--- a/arch/arm64/kvm/rmi-exit.c
+++ b/arch/arm64/kvm/rmi-exit.c
@@ -116,6 +116,19 @@ static int rec_exit_ripas_change(struct kvm_vcpu *vcpu)
return -EFAULT;
}
+static int rec_exit_host_call(struct kvm_vcpu *vcpu)
+{
+ int i;
+ struct realm_rec *rec = &vcpu->arch.rec;
+
+ vcpu->stat.hvc_exit_stat++;
+
+ for (i = 0; i < REC_RUN_GPRS; i++)
+ vcpu_set_reg(vcpu, i, rec->run->exit.gprs[i]);
+
+ return kvm_smccc_call_handler(vcpu);
+}
+
static void update_arch_timer_irq_lines(struct kvm_vcpu *vcpu)
{
struct realm_rec *rec = &vcpu->arch.rec;
@@ -191,6 +204,8 @@ int handle_rec_exit(struct kvm_vcpu *vcpu, int rec_run_ret)
return rec_exit_psci(vcpu);
case RMI_EXIT_RIPAS_CHANGE:
return rec_exit_ripas_change(vcpu);
+ case RMI_EXIT_HOST_CALL:
+ return rec_exit_host_call(vcpu);
}
kvm_pr_unimpl("Unsupported exit reason: %u\n",
--
2.43.0
* [PATCH v14 34/44] arm64: RMI: allow userspace to inject aborts
From: Steven Price @ 2026-05-13 13:17 UTC
To: kvm, kvmarm
Cc: Joey Gouly, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2,
Steven Price
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
From: Joey Gouly <joey.gouly@arm.com>
Extend KVM_SET_VCPU_EVENTS to support realms. KVM cannot set the system
registers of a realm, so the RMM must perform the injection on the next
REC entry.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
---
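The flag handling for realms can be modelled outside the kernel as below. This
is a sketch under stated assumptions: model_realm_set_events() is a
hypothetical name, and the bit positions chosen for the two REC entry flags
are assumptions made for the example rather than values taken from this
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are assumptions for this sketch. */
#define REC_ENTER_FLAG_EMULATED_MMIO	(UINT64_C(1) << 0)
#define REC_ENTER_FLAG_INJECT_SEA	(UINT64_C(1) << 1)

/*
 * Model of the realm branch: SErrors are always rejected, and an external
 * data abort is only accepted when the previous exit was an emulatable
 * MMIO access (an abort on an Unprotected IPA).
 */
static int model_realm_set_events(uint64_t *enter_flags,
				  bool serror_pending,
				  bool ext_dabt_pending)
{
	assert(enter_flags);

	if (serror_pending)
		return -EINVAL;

	if (ext_dabt_pending) {
		if (!(*enter_flags & REC_ENTER_FLAG_EMULATED_MMIO))
			return -EINVAL;

		/* Replace "emulated MMIO" with "inject SEA" for next entry. */
		*enter_flags &= ~REC_ENTER_FLAG_EMULATED_MMIO;
		*enter_flags |= REC_ENTER_FLAG_INJECT_SEA;
	}

	return 0;
}
```

Note that the flags are left untouched on every error path, so a rejected
request does not change what happens on the next REC entry.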
Documentation/virt/kvm/api.rst | 2 ++
arch/arm64/kvm/guest.c | 24 ++++++++++++++++++++++++
2 files changed, 26 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a47c60490475..4e0dcca0d261 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1314,6 +1314,8 @@ User space may need to inject several types of events to the guest.
Set the pending SError exception state for this VCPU. It is not possible to
'cancel' an Serror that has been made pending.
+User space cannot inject SErrors into Realms.
+
If the guest performed an access to I/O memory which could not be handled by
userspace, for example because of missing instruction syndrome decode
information or because there is no device mapped at the accessed IPA, then
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index e6682019ef6d..447674373426 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -827,6 +827,30 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
u64 esr = events->exception.serror_esr;
int ret = 0;
+ if (vcpu_is_rec(vcpu)) {
+ /* Cannot inject SError into a Realm. */
+ if (serror_pending)
+ return -EINVAL;
+
+ /*
+ * If a data abort is pending, set the flag and let the RMM
+ * inject an SEA when the REC is scheduled to be run.
+ */
+ if (ext_dabt_pending) {
+ /*
+ * Can only inject SEA into a Realm if the previous exit
+ * was due to a data abort of an Unprotected IPA.
+ */
+ if (!(vcpu->arch.rec.run->enter.flags & REC_ENTER_FLAG_EMULATED_MMIO))
+ return -EINVAL;
+
+ vcpu->arch.rec.run->enter.flags &= ~REC_ENTER_FLAG_EMULATED_MMIO;
+ vcpu->arch.rec.run->enter.flags |= REC_ENTER_FLAG_INJECT_SEA;
+ }
+
+ return 0;
+ }
+
/*
* Immediately commit the pending SEA to the vCPU's architectural
* state which is necessary since we do not return a pending SEA
--
2.43.0
* [PATCH v14 33/44] KVM: arm64: WARN on injected undef exceptions
From: Steven Price @ 2026-05-13 13:17 UTC
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
The RMM doesn't allow injection of an undefined exception into a realm
guest. Add a WARN to catch this if it ever happens.
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v6:
* if (x) WARN(1, ...) makes no sense, just WARN(x, ...)!
---
arch/arm64/kvm/inject_fault.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index 6492397b73d7..613f223bc7a3 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -327,6 +327,7 @@ void kvm_inject_size_fault(struct kvm_vcpu *vcpu)
*/
void kvm_inject_undefined(struct kvm_vcpu *vcpu)
{
+ WARN(vcpu_is_rec(vcpu), "Unexpected undefined exception injection to REC");
if (vcpu_el1_is_32bit(vcpu))
inject_undef32(vcpu);
else
--
2.43.0
* [PATCH v14 32/44] KVM: arm64: Handle Realm PSCI requests
From: Steven Price @ 2026-05-13 13:17 UTC
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
The RMM needs to be informed of the target REC when a PSCI call is made
with an MPIDR argument.
This requirement will be removed in a future release of the RMM 2.0
specification but is still required for v2.0-bet1.
Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v13:
* The ioctl KVM_ARM_VCPU_RMI_PSCI_COMPLETE has gone. The RMI call is
made automatically just before entering the REC again.
Changes since v12:
* Change return code for non-realms to -ENXIO to better represent that
the ioctl is invalid for non-realms (checkpatch is insistent that
"ENOSYS means 'invalid syscall nr' and nothing else").
Changes since v11:
* RMM->RMI renaming.
Changes since v6:
* Use vcpu_is_rec() rather than kvm_is_realm(vcpu->kvm).
* Minor renaming/formatting fixes.
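The status selection performed before re-entering the REC can be sketched as a
pure function. This is an illustrative model rather than the kernel code:
model_psci_completion_status() is a hypothetical name, and the constants are
restated locally with the values defined by the PSCI specification so that the
example builds on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PSCI function IDs and return codes, restated for a standalone build. */
#define PSCI_0_2_FN64_CPU_ON		UINT64_C(0xC4000003)
#define PSCI_0_2_FN64_AFFINITY_INFO	UINT64_C(0xC4000004)
#define PSCI_RET_SUCCESS		0L
#define PSCI_RET_DENIED			(-3L)
#define PSCI_RET_ALREADY_ON		(-4L)

/*
 * Decide which status to hand to the RMM for a PSCI exit.
 * Returns false when the call needs no completion step at all.
 * host_ret is the result the host-side PSCI emulation left in x0.
 */
static bool model_psci_completion_status(uint64_t fn_id, long host_ret,
					 long *status)
{
	assert(status);

	switch (fn_id) {
	case PSCI_0_2_FN64_AFFINITY_INFO:
		/* Always reported as SUCCESS for now. */
		*status = PSCI_RET_SUCCESS;
		return true;
	case PSCI_0_2_FN64_CPU_ON:
		/* ALREADY_ON is treated like SUCCESS; anything else is denied. */
		if (host_ret == PSCI_RET_SUCCESS ||
		    host_ret == PSCI_RET_ALREADY_ON)
			*status = PSCI_RET_SUCCESS;
		else
			*status = PSCI_RET_DENIED;
		return true;
	default:
		return false;
	}
}
```

Only calls that carry an MPIDR argument reach the completion step; every other
function ID falls through without touching the status.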
---
arch/arm64/include/asm/kvm_rmi.h | 3 ++
arch/arm64/kvm/psci.c | 15 ++++++++-
arch/arm64/kvm/rmi.c | 58 ++++++++++++++++++++++++++++++++
3 files changed, 75 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
index b65cfec10dee..eacf82a7467d 100644
--- a/arch/arm64/include/asm/kvm_rmi.h
+++ b/arch/arm64/include/asm/kvm_rmi.h
@@ -109,6 +109,9 @@ int realm_map_non_secure(struct realm *realm,
unsigned long size,
enum kvm_pgtable_prot prot,
struct kvm_mmu_memory_cache *memcache);
+int realm_psci_complete(struct kvm_vcpu *source,
+ struct kvm_vcpu *target,
+ unsigned long status);
static inline bool kvm_realm_is_private_address(struct realm *realm,
unsigned long addr)
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 3b5dbe9a0a0e..a2cd55dc7b5b 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -103,7 +103,6 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
reset_state->reset = true;
kvm_make_request(KVM_REQ_VCPU_RESET, vcpu);
-
/*
* Make sure the reset request is observed if the RUNNABLE mp_state is
* observed.
@@ -142,6 +141,20 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
/* Ignore other bits of target affinity */
target_affinity &= target_affinity_mask;
+ if (vcpu_is_rec(vcpu)) {
+ struct kvm_vcpu *target_vcpu;
+
+ /* RMM supports only zero affinity level */
+ if (lowest_affinity_level != 0)
+ return PSCI_RET_INVALID_PARAMS;
+
+ target_vcpu = kvm_mpidr_to_vcpu(kvm, target_affinity);
+ if (!target_vcpu)
+ return PSCI_RET_INVALID_PARAMS;
+
+ return PSCI_RET_SUCCESS;
+ }
+
/*
* If one or more VCPU matching target affinity are running
* then ON else OFF
diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
index 761b38a4071c..2b03e962ee41 100644
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -3,6 +3,7 @@
* Copyright (C) 2023-2025 ARM Ltd.
*/
+#include <uapi/linux/psci.h>
#include <linux/kvm_host.h>
#include <asm/kvm_emulate.h>
@@ -127,6 +128,25 @@ static void free_rtt(phys_addr_t phys)
kvm_account_pgtable_pages(phys_to_virt(phys), -1);
}
+int realm_psci_complete(struct kvm_vcpu *source, struct kvm_vcpu *target,
+ unsigned long status)
+{
+ int ret;
+
+ /*
+ * XXX: RMM-v2.0 doesn't require the target REC address for completing
+ * PSCI requests. Temporary hack until RMM implementation catches up
+ * to the full spec.
+ */
+ ret = rmi_psci_complete(virt_to_phys(source->arch.rec.rec_page),
+ virt_to_phys(target->arch.rec.rec_page),
+ status);
+ if (ret)
+ return -EINVAL;
+
+ return 0;
+}
+
static int realm_rtt_create(struct realm *realm,
unsigned long addr,
int level,
@@ -1004,6 +1024,41 @@ static void kvm_complete_ripas_change(struct kvm_vcpu *vcpu)
rec->run->exit.ripas_base = base;
}
+static void kvm_rec_complete_psci(struct kvm_vcpu *vcpu)
+{
+ struct rec_run *run = vcpu->arch.rec.run;
+ unsigned long status = PSCI_RET_DENIED;
+ unsigned long ret = vcpu_get_reg(vcpu, 0);
+ struct kvm_vcpu *target;
+
+ switch (run->exit.gprs[0]) {
+ /*
+ * XXX: RMM-v2.0 doesn't cause RMI_EXIT_PSCI for AFFINITY_INFO
+ * Temporary hack until tf-RMM gets the REC to MPIDR mapping via
+ * RD Auxiliary granules.
+ * For now always report SUCCESS
+ */
+ case PSCI_0_2_FN64_AFFINITY_INFO:
+ status = PSCI_RET_SUCCESS;
+ break;
+ case PSCI_0_2_FN64_CPU_ON: {
+ if (ret != PSCI_RET_SUCCESS &&
+ ret != PSCI_RET_ALREADY_ON)
+ status = PSCI_RET_DENIED;
+ else
+ status = PSCI_RET_SUCCESS;
+ break;
+ }
+ default:
+ return;
+ }
+
+ target = kvm_mpidr_to_vcpu(vcpu->kvm, run->exit.gprs[1]);
+ /* RMM makes sure that we don't get RMI_EXIT_PSCI for invalid mpidrs */
+ if (target)
+ realm_psci_complete(vcpu, target, status);
+}
+
/*
* kvm_rec_pre_enter - Complete operations before entering a REC
*
@@ -1028,6 +1083,9 @@ int kvm_rec_pre_enter(struct kvm_vcpu *vcpu)
for (int i = 0; i < REC_RUN_GPRS; i++)
rec->run->enter.gprs[i] = vcpu_get_reg(vcpu, i);
break;
+ case RMI_EXIT_PSCI:
+ kvm_rec_complete_psci(vcpu);
+ break;
case RMI_EXIT_RIPAS_CHANGE:
kvm_complete_ripas_change(vcpu);
break;
--
2.43.0
* [PATCH v14 31/44] KVM: arm64: Validate register access for a Realm VM
From: Steven Price @ 2026-05-13 13:17 UTC
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
The RMM only allows setting the GPRS (x0-x30) and PC for a realm
guest. Check this in kvm_arm_set_reg() so that the VMM can receive a
suitable error return if other registers are written to.
The RMM makes similar restrictions for reading of the guest's registers
(this is *confidential* compute after all), however we don't impose the
restriction here. This allows the VMM to read (stale) values from the
registers which might be useful to read back the initial values even if
the RMM doesn't provide the latest version. For migration of a realm VM,
a new interface will be needed so that the VMM can receive an
(encrypted) blob of the VM's state.
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v5:
* Upper GPRS can be set as part of a HOST_CALL return, so fix up the
test to allow them.
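The core register filter can be illustrated with a standalone model. It is a
sketch only: struct model_core_regs is an assumed layout (31 GPRs followed by
sp, pc and pstate), and MODEL_CORE_REG() and model_realm_validate_core_reg()
are hypothetical names that mimic KVM_REG_ARM_CORE_REG() by expressing offsets
in 32-bit units.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed register layout for this sketch. */
struct model_core_regs {
	uint64_t regs[31];
	uint64_t sp;
	uint64_t pc;
	uint64_t pstate;
};

/* Offset of a member in units of 32 bits. */
#define MODEL_CORE_REG(member) \
	(offsetof(struct model_core_regs, member) / sizeof(uint32_t))

/*
 * Accept writes to x0-x30 and to pc, reject everything else.
 * regs[0] sits at offset 0, so only the upper bound needs checking.
 */
static bool model_realm_validate_core_reg(uint64_t off)
{
	if (off <= MODEL_CORE_REG(regs[30]))
		return true;

	return off == MODEL_CORE_REG(pc);
}
```

With this layout sp and pstate fall outside the accepted set, which matches
the intent stated in the commit message that only the GPRs and PC are
writable for a realm.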
---
arch/arm64/kvm/guest.c | 41 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 332c453b87cf..e6682019ef6d 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -73,6 +73,25 @@ static u64 core_reg_offset_from_id(u64 id)
return id & ~(KVM_REG_ARCH_MASK | KVM_REG_SIZE_MASK | KVM_REG_ARM_CORE);
}
+static bool kvm_realm_validate_core_reg(u64 off)
+{
+ /*
+ * Note that GPRs can only sometimes be controlled by the VMM.
+ * For PSCI only X0-X6 are used, higher registers are ignored (restored
+ * from the REC).
+ * For HOST_CALL all of X0-X30 are copied to the RsiHostCall structure.
+ * For emulated MMIO X0 is always used.
+ * PC can only be set before the realm is activated.
+ */
+ switch (off) {
+ case KVM_REG_ARM_CORE_REG(regs.regs[0]) ...
+ KVM_REG_ARM_CORE_REG(regs.regs[30]):
+ case KVM_REG_ARM_CORE_REG(regs.pc):
+ return true;
+ }
+ return false;
+}
+
static int core_reg_size_from_offset(const struct kvm_vcpu *vcpu, u64 off)
{
int size;
@@ -716,12 +735,34 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
return kvm_arm_sys_reg_get_reg(vcpu, reg);
}
+/*
+ * The RMI ABI only enables setting some GPRs and PC. The selection of GPRs
+ * that are available depends on the Realm state and the reason for the last
+ * exit. All other registers are reset to architectural or otherwise defined
+ * reset values by the RMM, except for a few configuration fields that
+ * correspond to Realm parameters.
+ */
+static bool validate_realm_set_reg(struct kvm_vcpu *vcpu,
+ const struct kvm_one_reg *reg)
+{
+ if ((reg->id & KVM_REG_ARM_COPROC_MASK) == KVM_REG_ARM_CORE) {
+ u64 off = core_reg_offset_from_id(reg->id);
+
+ return kvm_realm_validate_core_reg(off);
+ }
+
+ return false;
+}
+
int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
{
/* We currently use nothing arch-specific in upper 32 bits */
if ((reg->id & ~KVM_REG_SIZE_MASK) >> 32 != KVM_REG_ARM64 >> 32)
return -EINVAL;
+ if (kvm_is_realm(vcpu->kvm) && !validate_realm_set_reg(vcpu, reg))
+ return -EINVAL;
+
switch (reg->id & KVM_REG_ARM_COPROC_MASK) {
case KVM_REG_ARM_CORE: return set_core_reg(vcpu, reg);
case KVM_REG_ARM_FW:
--
2.43.0
* [PATCH v14 30/44] KVM: arm64: Handle realm VCPU load
From: Steven Price @ 2026-05-13 13:17 UTC (permalink / raw)
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
When loading a realm VCPU much of the work is handled by the RMM so only
some of the actions are required. Rearrange kvm_arch_vcpu_load()
slightly so we can bail out early for a realm guest.
Signed-off-by: Steven Price <steven.price@arm.com>
---
arch/arm64/kvm/arm.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 073ba9181da9..495082e601a9 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -702,7 +702,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
struct kvm_s2_mmu *mmu;
int *last_ran;
- if (is_protected_kvm_enabled())
+ if (is_protected_kvm_enabled() || kvm_is_realm(vcpu->kvm))
goto nommu;
if (vcpu_has_nv(vcpu))
@@ -746,12 +746,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_vgic_load(vcpu);
kvm_vcpu_load_debug(vcpu);
kvm_vcpu_load_fgt(vcpu);
- if (has_vhe())
- kvm_vcpu_load_vhe(vcpu);
- kvm_arch_vcpu_load_fp(vcpu);
- kvm_vcpu_pmu_restore_guest(vcpu);
- if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
- kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
if (kvm_vcpu_should_clear_twe(vcpu))
vcpu->arch.hcr_el2 &= ~HCR_TWE;
@@ -773,6 +767,17 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
&vcpu->arch.vgic_cpu.vgic_v3);
}
+ /* No additional state needs to be loaded on Realm VMs */
+ if (vcpu_is_rec(vcpu))
+ return;
+
+ if (has_vhe())
+ kvm_vcpu_load_vhe(vcpu);
+ kvm_arch_vcpu_load_fp(vcpu);
+ kvm_vcpu_pmu_restore_guest(vcpu);
+ if (kvm_arm_is_pvtime_enabled(&vcpu->arch))
+ kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
+
if (!cpumask_test_cpu(cpu, vcpu->kvm->arch.supported_cpus))
vcpu_set_on_unsupported_cpu(vcpu);
--
2.43.0
* [PATCH v14 29/44] arm64: RMI: Runtime faulting of memory
From: Steven Price @ 2026-05-13 13:17 UTC
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
At runtime if the realm guest accesses memory which hasn't yet been
mapped then KVM needs to either populate the region or fault the guest.
For memory in the lower (protected) region of IPA a fresh page is
provided to the RMM which will zero the contents. For memory in the
upper (shared) region of IPA, the memory from the memslot is mapped
into the realm VM non secure.
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v13:
* Numerous changes due to rebasing.
* Fix addr_range_desc() to encode the correct block size.
Changes since v12:
* Switch to RMM v2.0 range based APIs.
Changes since v11:
* Adapt to upstream changes.
Changes since v10:
* RME->RMI renaming.
* Adapt to upstream gmem changes.
Changes since v9:
* Fix call to kvm_stage2_unmap_range() in kvm_free_stage2_pgd() to set
may_block to avoid stall warnings.
* Minor coding style fixes.
Changes since v8:
* Propagate the may_block flag.
* Minor comments and coding style changes.
Changes since v7:
* Remove redundant WARN_ONs for realm_create_rtt_levels() - it will
internally WARN when necessary.
Changes since v6:
* Handle PAGE_SIZE being larger than RMM granule size.
* Some minor renaming following review comments.
Changes since v5:
* Reduce use of struct page in preparation for supporting the RMM
having a different page size to the host.
* Handle a race when delegating a page where another CPU has faulted on
the same page (and already delegated the physical page) but not yet
mapped it. In this case simply return to the guest to either use the
mapping from the other CPU (or refault if the race is lost).
* The changes to populate_par_region() are moved into the previous
patch where they belong.
Changes since v4:
* Code cleanup following review feedback.
* Drop the PTE_SHARED bit when creating unprotected page table entries.
This is now set by the RMM and the host has no control of it and the
spec requires the bit to be set to zero.
Changes since v2:
* Avoid leaking memory if failing to map it in the realm.
* Correctly mask RTT based on LPA2 flag (see rtt_get_phys()).
* Adapt to changes in previous patches.
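The split between the protected and shared halves of the IPA space can be
sketched as a user-space model. This is an illustration under assumptions: the
model_* names are hypothetical, ia_bits stands for the realm's IPA width, and
the top IPA bit is taken to select the shared alias, as the commit message
describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Strip the shared-alias bit (the top IPA bit) for a realm fault. */
static uint64_t model_gpa_from_fault(bool is_realm, unsigned int ia_bits,
				     uint64_t ipa)
{
	if (!is_realm)
		return ipa;

	assert(ia_bits >= 1 && ia_bits <= 64);
	return ipa & ~(UINT64_C(1) << (ia_bits - 1));
}

/* A fault hit the shared alias if stripping the top bit changed it. */
static bool model_is_shared_fault(bool is_realm, unsigned int ia_bits,
				  uint64_t fault_ipa)
{
	return model_gpa_from_fault(is_realm, ia_bits, fault_ipa) != fault_ipa;
}

/*
 * True when the memory attribute recorded for the page disagrees with the
 * alias the guest used, in which case the fault is reported to the VMM
 * instead of being resolved in the kernel.
 */
static bool model_attr_mismatch(bool gfn_is_private, unsigned int ia_bits,
				uint64_t fault_ipa)
{
	bool is_priv_fault = !model_is_shared_fault(true, ia_bits, fault_ipa);

	return gfn_is_private != is_priv_fault;
}
```

For example, with a 40-bit IPA space a fault at 0x8000001000 refers to the
same guest page as 0x1000, but through the shared alias.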
---
arch/arm64/include/asm/kvm_emulate.h | 8 ++
arch/arm64/include/asm/kvm_rmi.h | 12 ++
arch/arm64/kvm/mmu.c | 128 ++++++++++++++++----
arch/arm64/kvm/rmi.c | 173 +++++++++++++++++++++++++++
4 files changed, 301 insertions(+), 20 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 2e69fe494716..8b6f9d26b5d8 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -712,6 +712,14 @@ static inline bool kvm_realm_is_created(struct kvm *kvm)
return kvm_is_realm(kvm) && kvm_realm_state(kvm) != REALM_STATE_NONE;
}
+static inline gpa_t kvm_gpa_from_fault(struct kvm *kvm, phys_addr_t ipa)
+{
+ if (!kvm_is_realm(kvm))
+ return ipa;
+
+ return ipa & ~BIT(kvm->arch.realm.ia_bits - 1);
+}
+
static inline bool vcpu_is_rec(const struct kvm_vcpu *vcpu)
{
return kvm_is_realm(vcpu->kvm);
diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
index a2b6bc412a22..b65cfec10dee 100644
--- a/arch/arm64/include/asm/kvm_rmi.h
+++ b/arch/arm64/include/asm/kvm_rmi.h
@@ -6,6 +6,7 @@
#ifndef __ASM_KVM_RMI_H
#define __ASM_KVM_RMI_H
+#include <asm/kvm_pgtable.h>
#include <asm/rmi_smc.h>
/**
@@ -97,6 +98,17 @@ void kvm_realm_unmap_range(struct kvm *kvm,
unsigned long size,
bool unmap_private,
bool may_block);
+int realm_map_protected(struct kvm *kvm,
+ unsigned long base_ipa,
+ kvm_pfn_t pfn,
+ unsigned long size,
+ struct kvm_mmu_memory_cache *memcache);
+int realm_map_non_secure(struct realm *realm,
+ unsigned long ipa,
+ kvm_pfn_t pfn,
+ unsigned long size,
+ enum kvm_pgtable_prot prot,
+ struct kvm_mmu_memory_cache *memcache);
static inline bool kvm_realm_is_private_address(struct realm *realm,
unsigned long addr)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ac2a0f0106b0..776ffe56d17e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -334,8 +334,15 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
lockdep_assert_held_write(&kvm->mmu_lock);
WARN_ON(size & ~PAGE_MASK);
- WARN_ON(stage2_apply_range(mmu, start, end, KVM_PGT_FN(kvm_pgtable_stage2_unmap),
- may_block));
+
+ if (kvm_is_realm(kvm)) {
+ kvm_realm_unmap_range(kvm, start, size, !only_shared,
+ may_block);
+ } else {
+ WARN_ON(stage2_apply_range(mmu, start, end,
+ KVM_PGT_FN(kvm_pgtable_stage2_unmap),
+ may_block));
+ }
}
void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
@@ -358,7 +365,10 @@ static void stage2_flush_memslot(struct kvm *kvm,
phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
- kvm_stage2_flush_range(&kvm->arch.mmu, addr, end);
+ if (kvm_is_realm(kvm))
+ kvm_realm_unmap_range(kvm, addr, end - addr, false, true);
+ else
+ kvm_stage2_flush_range(&kvm->arch.mmu, addr, end);
}
/**
@@ -1103,6 +1113,10 @@ void stage2_unmap_vm(struct kvm *kvm)
struct kvm_memory_slot *memslot;
int idx, bkt;
+ /* For realms this is handled by the RMM so nothing to do here */
+ if (kvm_is_realm(kvm))
+ return;
+
idx = srcu_read_lock(&kvm->srcu);
mmap_read_lock(current->mm);
write_lock(&kvm->mmu_lock);
@@ -1528,6 +1542,29 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
return vma->vm_flags & VM_MTE_ALLOWED;
}
+static int realm_map_ipa(struct kvm *kvm, phys_addr_t ipa,
+ kvm_pfn_t pfn, unsigned long map_size,
+ enum kvm_pgtable_prot prot,
+ struct kvm_mmu_memory_cache *memcache)
+{
+ struct realm *realm = &kvm->arch.realm;
+
+ /*
+ * Write permission is required for now even though it's possible to
+ * map unprotected pages (granules) as read-only. It's impossible to
+ * map protected pages (granules) as read-only.
+ */
+ if (WARN_ON(!(prot & KVM_PGTABLE_PROT_W)))
+ return -EFAULT;
+
+ ipa = ALIGN_DOWN(ipa, PAGE_SIZE);
+ if (!kvm_realm_is_private_address(realm, ipa))
+ return realm_map_non_secure(realm, ipa, pfn, map_size, prot,
+ memcache);
+
+ return realm_map_protected(kvm, ipa, pfn, map_size, memcache);
+}
+
static bool kvm_vma_is_cacheable(struct vm_area_struct *vma)
{
switch (FIELD_GET(PTE_ATTRINDX_MASK, pgprot_val(vma->vm_page_prot))) {
@@ -1604,27 +1641,52 @@ static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
bool write_fault, exec_fault;
enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_SHARED;
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
- struct kvm_pgtable *pgt = s2fd->vcpu->arch.hw_mmu->pgt;
+ struct kvm_vcpu *vcpu = s2fd->vcpu;
+ struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
+ gpa_t gpa = kvm_gpa_from_fault(vcpu->kvm, s2fd->fault_ipa);
unsigned long mmu_seq;
struct page *page;
- struct kvm *kvm = s2fd->vcpu->kvm;
+ struct kvm *kvm = vcpu->kvm;
void *memcache;
kvm_pfn_t pfn;
gfn_t gfn;
int ret;
- memcache = get_mmu_memcache(s2fd->vcpu);
- ret = topup_mmu_memcache(s2fd->vcpu, memcache);
+ if (kvm_is_realm(vcpu->kvm)) {
+ /* check for memory attribute mismatch */
+ bool is_priv_gfn = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
+ /*
+ * For Realms, the shared address is an alias of the private
+ * PA with the top bit set. Thus if the fault address matches
+ * the GPA then it is the private alias.
+ */
+ bool is_priv_fault = (gpa == s2fd->fault_ipa);
+
+ if (is_priv_gfn != is_priv_fault) {
+ kvm_prepare_memory_fault_exit(vcpu, gpa, PAGE_SIZE,
+ kvm_is_write_fault(vcpu),
+ false,
+ is_priv_fault);
+ /*
+ * KVM_EXIT_MEMORY_FAULT requires a return code of
+ * -EFAULT, see the API documentation
+ */
+ return -EFAULT;
+ }
+ }
+
+ memcache = get_mmu_memcache(vcpu);
+ ret = topup_mmu_memcache(vcpu, memcache);
if (ret)
return ret;
if (s2fd->nested)
gfn = kvm_s2_trans_output(s2fd->nested) >> PAGE_SHIFT;
else
- gfn = s2fd->fault_ipa >> PAGE_SHIFT;
+ gfn = gpa >> PAGE_SHIFT;
- write_fault = kvm_is_write_fault(s2fd->vcpu);
- exec_fault = kvm_vcpu_trap_is_exec_fault(s2fd->vcpu);
+ write_fault = kvm_is_write_fault(vcpu);
+ exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
VM_WARN_ON_ONCE(write_fault && exec_fault);
@@ -1634,7 +1696,7 @@ static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
ret = kvm_gmem_get_pfn(kvm, s2fd->memslot, gfn, &pfn, &page, NULL);
if (ret) {
- kvm_prepare_memory_fault_exit(s2fd->vcpu, s2fd->fault_ipa, PAGE_SIZE,
+ kvm_prepare_memory_fault_exit(vcpu, gpa, PAGE_SIZE,
write_fault, exec_fault, false);
return ret;
}
@@ -1654,14 +1716,20 @@ static int gmem_abort(const struct kvm_s2_fault_desc *s2fd)
kvm_fault_lock(kvm);
if (mmu_invalidate_retry(kvm, mmu_seq)) {
ret = -EAGAIN;
- goto out_unlock;
+ goto out_release_page;
+ }
+
+ if (kvm_is_realm(kvm)) {
+ ret = realm_map_ipa(kvm, s2fd->fault_ipa, pfn,
+ PAGE_SIZE, KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W, memcache);
+ goto out_release_page;
}
ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, s2fd->fault_ipa, PAGE_SIZE,
__pfn_to_phys(pfn), prot,
memcache, flags);
-out_unlock:
+out_release_page:
kvm_release_faultin_page(kvm, page, !!ret, prot & KVM_PGTABLE_PROT_W);
kvm_fault_unlock(kvm);
@@ -1847,7 +1915,7 @@ static int kvm_s2_fault_get_vma_info(const struct kvm_s2_fault_desc *s2fd,
* mapping size to ensure we find the right PFN and lay down the
* mapping in the right place.
*/
- s2vi->gfn = ALIGN_DOWN(s2fd->fault_ipa, s2vi->vma_pagesize) >> PAGE_SHIFT;
+ s2vi->gfn = kvm_gpa_from_fault(kvm, ALIGN_DOWN(s2fd->fault_ipa, s2vi->vma_pagesize)) >> PAGE_SHIFT;
s2vi->mte_allowed = kvm_vma_mte_allowed(vma);
@@ -2056,6 +2124,9 @@ static int kvm_s2_fault_map(const struct kvm_s2_fault_desc *s2fd,
prot &= ~KVM_NV_GUEST_MAP_SZ;
ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, gfn_to_gpa(gfn),
prot, flags);
+ } else if (kvm_is_realm(kvm)) {
+ ret = realm_map_ipa(kvm, s2fd->fault_ipa, pfn, mapping_size,
+ prot, memcache);
} else {
ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, gfn_to_gpa(gfn), mapping_size,
__pfn_to_phys(pfn), prot,
@@ -2214,6 +2285,13 @@ int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
return 0;
}
+static bool shared_ipa_fault(struct kvm *kvm, phys_addr_t fault_ipa)
+{
+ gpa_t gpa = kvm_gpa_from_fault(kvm, fault_ipa);
+
+ return (gpa != fault_ipa);
+}
+
/**
* kvm_handle_guest_abort - handles all 2nd stage aborts
* @vcpu: the VCPU pointer
@@ -2324,8 +2402,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
nested = &nested_trans;
}
- gfn = ipa >> PAGE_SHIFT;
+ gfn = kvm_gpa_from_fault(vcpu->kvm, ipa) >> PAGE_SHIFT;
memslot = gfn_to_memslot(vcpu->kvm, gfn);
+
hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
write_fault = kvm_is_write_fault(vcpu);
if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
@@ -2368,7 +2447,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
* of the page size.
*/
ipa |= FAR_TO_FIPA_OFFSET(kvm_vcpu_get_hfar(vcpu));
- ret = io_mem_abort(vcpu, ipa);
+ ret = io_mem_abort(vcpu, kvm_gpa_from_fault(vcpu->kvm, ipa));
goto out_unlock;
}
@@ -2396,7 +2475,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
!write_fault &&
!kvm_vcpu_trap_is_exec_fault(vcpu));
- if (kvm_slot_has_gmem(memslot))
+ if (kvm_slot_has_gmem(memslot) && !shared_ipa_fault(vcpu->kvm, fault_ipa))
ret = gmem_abort(&s2fd);
else
ret = user_mem_abort(&s2fd);
@@ -2433,6 +2512,10 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
if (!kvm->arch.mmu.pgt || kvm_vm_is_protected(kvm))
return false;
+ /* We don't support aging for Realms */
+ if (kvm_is_realm(kvm))
+ return true;
+
return KVM_PGT_FN(kvm_pgtable_stage2_test_clear_young)(kvm->arch.mmu.pgt,
range->start << PAGE_SHIFT,
size, true);
@@ -2449,6 +2532,10 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
if (!kvm->arch.mmu.pgt || kvm_vm_is_protected(kvm))
return false;
+ /* We don't support aging for Realms */
+ if (kvm_is_realm(kvm))
+ return true;
+
return KVM_PGT_FN(kvm_pgtable_stage2_test_clear_young)(kvm->arch.mmu.pgt,
range->start << PAGE_SHIFT,
size, false);
@@ -2628,10 +2715,11 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
return -EFAULT;
/*
- * Only support guest_memfd backed memslots with mappable memory, since
- * there aren't any CoCo VMs that support only private memory on arm64.
+ * Only support guest_memfd backed memslots with mappable memory,
+ * unless the guest is a CCA realm guest.
*/
- if (kvm_slot_has_gmem(new) && !kvm_memslot_is_gmem_only(new))
+ if (kvm_slot_has_gmem(new) && !kvm_memslot_is_gmem_only(new) &&
+ !kvm_is_realm(kvm))
return -EINVAL;
hva = new->userspace_addr;
diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
index cae29fd3353c..761b38a4071c 100644
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -597,6 +597,179 @@ static int realm_data_map_init(struct kvm *kvm, unsigned long ipa,
return ret;
}
+static unsigned long addr_range_desc(unsigned long phys, unsigned long size)
+{
+ unsigned long out = 0;
+
+ switch (size) {
+ case P4D_SIZE:
+ out = 3 | (1 << 2);
+ break;
+ case PUD_SIZE:
+ out = 2 | (1 << 2);
+ break;
+ case PMD_SIZE:
+ out = 1 | (1 << 2);
+ break;
+ case PAGE_SIZE:
+ out = 0 | (1 << 2);
+ break;
+ default:
+ /*
+ * Only support mapping at the page level granularity when
+ * it's an unusual length. This should get us back onto a larger
+ * block size for the subsequent mappings.
+ */
+ out = 0 | ((MIN(size >> PAGE_SHIFT, PTRS_PER_PTE - 1)) << 2);
+ break;
+ }
+
+ WARN_ON(phys & ~PAGE_MASK);
+
+ out |= phys & PAGE_MASK;
+
+ return out;
+}
+
+int realm_map_protected(struct kvm *kvm,
+ unsigned long ipa,
+ kvm_pfn_t pfn,
+ unsigned long map_size,
+ struct kvm_mmu_memory_cache *memcache)
+{
+ struct realm *realm = &kvm->arch.realm;
+ phys_addr_t phys = __pfn_to_phys(pfn);
+ phys_addr_t base_phys = phys;
+ phys_addr_t rd = virt_to_phys(realm->rd);
+ unsigned long base_ipa = ipa;
+ unsigned long ipa_top = ipa + map_size;
+ int ret = 0;
+
+ if (WARN_ON(!IS_ALIGNED(map_size, PAGE_SIZE) ||
+ !IS_ALIGNED(ipa, map_size)))
+ return -EINVAL;
+
+ if (rmi_delegate_range(phys, map_size)) {
+ /*
+ * It's likely we raced with another VCPU on the same
+ * fault. Assume the other VCPU has handled the fault
+ * and return to the guest.
+ */
+ return 0;
+ }
+
+ while (ipa < ipa_top) {
+ unsigned long flags = RMI_ADDR_TYPE_SINGLE;
+ unsigned long range_desc = addr_range_desc(phys, ipa_top - ipa);
+ unsigned long out_top;
+
+ ret = rmi_rtt_data_map(rd, ipa, ipa_top, flags, range_desc,
+ &out_top);
+
+ if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
+ /* Create missing RTTs and retry */
+ int level = RMI_RETURN_INDEX(ret);
+
+ WARN_ON(level == KVM_PGTABLE_LAST_LEVEL);
+ ret = realm_create_rtt_levels(realm, ipa, level,
+ KVM_PGTABLE_LAST_LEVEL,
+ memcache);
+ if (ret)
+ goto err_undelegate;
+
+ ret = rmi_rtt_data_map(rd, ipa, ipa_top, flags,
+ range_desc, &out_top);
+ }
+
+ if (WARN_ON(ret))
+ goto err_undelegate;
+
+ phys += out_top - ipa;
+ ipa = out_top;
+ }
+
+ return 0;
+
+err_undelegate:
+ realm_unmap_private_range(kvm, base_ipa, ipa, true);
+ if (WARN_ON(rmi_undelegate_range(base_phys, map_size))) {
+ /* Page can't be returned to NS world so is lost */
+ get_page(phys_to_page(base_phys));
+ }
+ return -ENXIO;
+}
+
+int realm_map_non_secure(struct realm *realm,
+ unsigned long ipa,
+ kvm_pfn_t pfn,
+ unsigned long size,
+ enum kvm_pgtable_prot prot,
+ struct kvm_mmu_memory_cache *memcache)
+{
+ unsigned long attr, flags = 0;
+ phys_addr_t rd = virt_to_phys(realm->rd);
+ phys_addr_t phys = __pfn_to_phys(pfn);
+ unsigned long ipa_top = ipa + size;
+ int ret;
+
+ if (WARN_ON(!IS_ALIGNED(size, PAGE_SIZE) ||
+ !IS_ALIGNED(ipa, size)))
+ return -EINVAL;
+
+ switch (prot & (KVM_PGTABLE_PROT_DEVICE | KVM_PGTABLE_PROT_NORMAL_NC)) {
+ case KVM_PGTABLE_PROT_DEVICE | KVM_PGTABLE_PROT_NORMAL_NC:
+ return -EINVAL;
+ case KVM_PGTABLE_PROT_DEVICE:
+ attr = MT_S2_FWB_DEVICE_nGnRE;
+ break;
+ case KVM_PGTABLE_PROT_NORMAL_NC:
+ attr = MT_S2_FWB_NORMAL_NC;
+ break;
+ default:
+ attr = MT_S2_FWB_NORMAL;
+ }
+
+ flags |= FIELD_PREP(RMI_RTT_UNPROT_MAP_FLAGS_MEMATTR, attr);
+
+ if (prot & KVM_PGTABLE_PROT_R)
+ flags |= FIELD_PREP(RMI_RTT_UNPROT_MAP_FLAGS_S2AP, RMI_S2AP_DIRECT_READ);
+ if (prot & KVM_PGTABLE_PROT_W)
+ flags |= FIELD_PREP(RMI_RTT_UNPROT_MAP_FLAGS_S2AP, RMI_S2AP_DIRECT_WRITE);
+
+ flags |= RMI_ADDR_TYPE_SINGLE;
+
+ while (ipa < ipa_top) {
+ unsigned long range_desc = addr_range_desc(phys, ipa_top - ipa);
+ unsigned long out_top;
+
+ ret = rmi_rtt_unprot_map(rd, ipa, ipa_top, flags, range_desc,
+ &out_top);
+
+ if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
+ /* Create missing RTTs and retry */
+ int level = RMI_RETURN_INDEX(ret);
+
+ WARN_ON(level == KVM_PGTABLE_LAST_LEVEL);
+ ret = realm_create_rtt_levels(realm, ipa, level,
+ KVM_PGTABLE_LAST_LEVEL,
+ memcache);
+ if (ret)
+ return ret;
+
+ ret = rmi_rtt_unprot_map(rd, ipa, ipa_top, flags,
+ range_desc, &out_top);
+ }
+
+ if (WARN_ON(ret))
+ return ret;
+
+ phys += out_top - ipa;
+ ipa = out_top;
+ }
+
+ return 0;
+}
+
static int populate_region_cb(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
struct page *src_page, void *opaque)
{
--
2.43.0
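
As a reading aid for addr_range_desc() in the patch above, here is a small user-space sketch of how the descriptor appears to be packed: the size level in bits [1:0], the count in the bits above that, and the page-aligned physical address in the upper bits. The SKETCH_* constants assume 4KiB pages with a four-level layout and are stand-ins, not the kernel's macros; sketch_range_desc() is a hypothetical name.

```c
#include <stdint.h>

/* Assumed values for a 4KiB-page configuration; not the kernel's macros. */
#define SKETCH_PAGE_SHIFT   12
#define SKETCH_PAGE_SIZE    (UINT64_C(1) << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK    (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_PMD_SIZE     (UINT64_C(1) << 21)
#define SKETCH_PUD_SIZE     (UINT64_C(1) << 30)
#define SKETCH_P4D_SIZE     (UINT64_C(1) << 39)
#define SKETCH_PTRS_PER_PTE 512

/* Pack (address, size) into a descriptor: addr | count << 2 | level. */
static uint64_t sketch_range_desc(uint64_t phys, uint64_t size)
{
	uint64_t level, count = 1;

	if (size == SKETCH_P4D_SIZE) {
		level = 3;
	} else if (size == SKETCH_PUD_SIZE) {
		level = 2;
	} else if (size == SKETCH_PMD_SIZE) {
		level = 1;
	} else {
		/* Any other length is described as a run of pages, capped at 511. */
		level = 0;
		count = size >> SKETCH_PAGE_SHIFT;
		if (count > SKETCH_PTRS_PER_PTE - 1)
			count = SKETCH_PTRS_PER_PTE - 1;
	}

	return (phys & SKETCH_PAGE_MASK) | (count << 2) | level;
}
```

With these assumed sizes, a 2MiB block at 0x40000000 encodes as 0x40000005, and a three-page run encodes level 0 with a count of 3.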
* [PATCH v14 28/44] arm64: RMI: Create the realm descriptor
From: Steven Price @ 2026-05-13 13:17 UTC
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
Creating a realm involves first creating a realm descriptor (RD). This
involves passing the configuration information to the RMM. Do this as
part of realm_ensure_created() so that the realm is created when it is
first needed.
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v13:
* The RMM no longer uses AUX granules, so no need to ask it how many it
needs.
* Adapted to other changes.
Changes since v12:
* Since RMM page size is now equal to the host's page size various
calculations are simplified.
* Switch to using range based APIs to delegate/undelegate.
* VMID handling is now done entirely by the RMM.
---
arch/arm64/kvm/rmi.c | 88 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 86 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
index fb96bcaa73ed..cae29fd3353c 100644
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -418,6 +418,77 @@ static void realm_unmap_shared_range(struct kvm *kvm,
start, end);
}
+static int realm_create_rd(struct kvm *kvm)
+{
+ struct realm *realm = &kvm->arch.realm;
+ struct realm_params *params = realm->params;
+ void *rd = NULL;
+ phys_addr_t rd_phys, params_phys;
+ size_t pgd_size = kvm_pgtable_stage2_pgd_size(kvm->arch.mmu.vtcr);
+ int r;
+
+ realm->ia_bits = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
+
+ if (WARN_ON(realm->rd || !realm->params))
+ return -EEXIST;
+
+ rd = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+ if (!rd)
+ return -ENOMEM;
+
+ rd_phys = virt_to_phys(rd);
+ if (rmi_delegate_page(rd_phys)) {
+ r = -ENXIO;
+ goto free_rd;
+ }
+
+ if (rmi_delegate_range(kvm->arch.mmu.pgd_phys, pgd_size)) {
+ r = -ENXIO;
+ goto out_undelegate_tables;
+ }
+
+ params->s2sz = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
+ params->rtt_level_start = get_start_level(realm);
+ params->rtt_num_start = pgd_size / PAGE_SIZE;
+ params->rtt_base = kvm->arch.mmu.pgd_phys;
+
+ if (kvm->arch.arm_pmu) {
+ params->pmu_num_ctrs = kvm->arch.nr_pmu_counters;
+ params->flags |= RMI_REALM_PARAM_FLAG_PMU;
+ }
+
+ if (kvm_lpa2_is_enabled())
+ params->flags |= RMI_REALM_PARAM_FLAG_LPA2;
+
+ params_phys = virt_to_phys(params);
+
+ if (rmi_realm_create(rd_phys, params_phys)) {
+ r = -ENXIO;
+ goto out_undelegate_tables;
+ }
+
+ realm->rd = rd;
+ kvm_set_realm_state(kvm, REALM_STATE_NEW);
+ /* The realm is up, free the parameters. */
+ free_page((unsigned long)realm->params);
+ realm->params = NULL;
+
+ return 0;
+
+out_undelegate_tables:
+ if (WARN_ON(rmi_undelegate_range(kvm->arch.mmu.pgd_phys, pgd_size))) {
+ /* Leak the pages if they cannot be returned */
+ kvm->arch.mmu.pgt = NULL;
+ }
+ if (WARN_ON(rmi_undelegate_page(rd_phys))) {
+ /* Leak the page if it isn't returned */
+ return r;
+ }
+free_rd:
+ free_page((unsigned long)rd);
+ return r;
+}
+
static void realm_unmap_private_range(struct kvm *kvm,
unsigned long start,
unsigned long end,
@@ -647,8 +718,21 @@ static int realm_init_ipa_state(struct kvm *kvm,
static int realm_ensure_created(struct kvm *kvm)
{
- /* Provided in later patch */
- return -ENXIO;
+ int ret;
+
+ switch (kvm_realm_state(kvm)) {
+ case REALM_STATE_NONE:
+ break;
+ case REALM_STATE_NEW:
+ return 0;
+ case REALM_STATE_DEAD:
+ return -ENXIO;
+ default:
+ return -EBUSY;
+ }
+
+ ret = realm_create_rd(kvm);
+ return ret;
}
static int set_ripas_of_protected_regions(struct kvm *kvm)
--
2.43.0
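
The state check in realm_ensure_created() above can be sketched in isolation as follows. This is a user-space illustration only: the sketch_* names are hypothetical, SKETCH_STATE_ACTIVE is an assumed placeholder for whatever states follow NEW, and sketch_create_rd() stands in for realm_create_rd() and always succeeds here.

```c
#include <errno.h>

/* Illustrative states; SKETCH_STATE_ACTIVE is an assumed later state. */
enum sketch_realm_state {
	SKETCH_STATE_NONE,
	SKETCH_STATE_NEW,
	SKETCH_STATE_ACTIVE,
	SKETCH_STATE_DEAD,
};

struct sketch_realm {
	enum sketch_realm_state state;
	int create_calls;	/* counts how often creation actually ran */
};

/* Stand-in for realm_create_rd(): always succeeds in this sketch. */
static int sketch_create_rd(struct sketch_realm *realm)
{
	realm->create_calls++;
	realm->state = SKETCH_STATE_NEW;
	return 0;
}

static int sketch_ensure_created(struct sketch_realm *realm)
{
	switch (realm->state) {
	case SKETCH_STATE_NONE:
		break;			/* not created yet: go on to create it */
	case SKETCH_STATE_NEW:
		return 0;		/* already created, nothing to do */
	case SKETCH_STATE_DEAD:
		return -ENXIO;
	default:
		return -EBUSY;		/* any other state is too late to create */
	}

	return sketch_create_rd(realm);
}
```

Calling it twice on a fresh realm runs the creation step once; the second call returns 0 without doing any work.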
* [PATCH v14 27/44] arm64: RMI: Set RIPAS of initial memslots
From: Steven Price @ 2026-05-13 13:17 UTC
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
The memory which the realm guest accesses must be set to RIPAS_RAM.
Iterate over the memslots and set all gmem memslots to RIPAS_RAM.
Signed-off-by: Steven Price <steven.price@arm.com>
---
New patch for v12.
---
arch/arm64/kvm/rmi.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
index 209087bcf399..fb96bcaa73ed 100644
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -637,12 +637,44 @@ static int realm_set_ipa_state(struct kvm_vcpu *vcpu,
return ret;
}
+static int realm_init_ipa_state(struct kvm *kvm,
+ unsigned long gfn,
+ unsigned long pages)
+{
+ return ripas_change(kvm, NULL, gfn_to_gpa(gfn), gfn_to_gpa(gfn + pages),
+ RIPAS_INIT, NULL);
+}
+
static int realm_ensure_created(struct kvm *kvm)
{
/* Provided in later patch */
return -ENXIO;
}
+static int set_ripas_of_protected_regions(struct kvm *kvm)
+{
+ struct kvm_memslots *slots;
+ struct kvm_memory_slot *memslot;
+ int idx, bkt;
+ int ret = 0;
+
+ idx = srcu_read_lock(&kvm->srcu);
+
+ slots = kvm_memslots(kvm);
+ kvm_for_each_memslot(memslot, bkt, slots) {
+ if (!kvm_slot_has_gmem(memslot))
+ continue;
+
+ ret = realm_init_ipa_state(kvm, memslot->base_gfn,
+ memslot->npages);
+ if (ret)
+ break;
+ }
+ srcu_read_unlock(&kvm->srcu, idx);
+
+ return ret;
+}
+
int kvm_arm_rmi_populate(struct kvm *kvm,
struct kvm_arm_rmi_populate *args)
{
@@ -890,6 +922,10 @@ int kvm_activate_realm(struct kvm *kvm)
return ret;
}
+ ret = set_ripas_of_protected_regions(kvm);
+ if (ret)
+ return ret;
+
ret = rmi_realm_activate(virt_to_phys(realm->rd));
if (ret)
return -ENXIO;
--
2.43.0
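
To illustrate the walk in set_ripas_of_protected_regions() above, the sketch below filters a plain array of slots and converts each gmem-backed one into the IPA range that would be handed to the RIPAS initialisation. The struct layouts, the 4KiB page shift and the sketch_* names are assumptions for illustration; the real code iterates the memslots under SRCU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4KiB pages */

struct sketch_memslot {
	uint64_t base_gfn;
	uint64_t npages;
	bool has_gmem;
};

struct sketch_ipa_range {
	uint64_t start;
	uint64_t end;
};

/* Fill @out with the IPA range of each gmem-backed slot; returns the count. */
static size_t sketch_protected_ranges(const struct sketch_memslot *slots,
				      size_t nr,
				      struct sketch_ipa_range *out)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++) {
		if (!slots[i].has_gmem)
			continue;	/* only gmem slots get RIPAS_RAM */

		out[n].start = slots[i].base_gfn << SKETCH_PAGE_SHIFT;
		out[n].end = (slots[i].base_gfn + slots[i].npages)
				<< SKETCH_PAGE_SHIFT;
		n++;
	}

	return n;
}
```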
* [PATCH v14 26/44] arm64: RMI: Allow populating initial contents
From: Steven Price @ 2026-05-13 13:17 UTC
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
The VMM needs to populate the realm with some data before starting (e.g.
a kernel and initrd). This is measured by the RMM and used as part of
the attestation later on.
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v13:
* Rename realm_create_protected_data_page() to realm_data_map_init().
Changes since v12:
* The ioctl now updates the structure with the amount populated rather
than returning this through the ioctl return code.
* Use the new RMM v2.0 range based RMI calls.
* Adapt to upstream changes in kvm_gmem_populate().
Changes since v11:
* The multiplex CAP is gone and there's a new ioctl which makes use of
the generic kvm_gmem_populate() functionality.
Changes since v7:
* Improve the error codes.
* Other minor changes from review.
Changes since v6:
* Handle host potentially having a larger page size than the RMM
granule.
* Drop historic "par" (protected address range) from
populate_par_region() - it doesn't exist within the current
architecture.
* Add a cond_resched() call in kvm_populate_realm().
Changes since v5:
* Refactor to use PFNs rather than tracking struct page in
realm_create_protected_data_page().
* Pull changes from a later patch (in the v5 series) for accessing
pages from a guest memfd.
* Do the populate in chunks to avoid holding locks for too long and
triggering RCU stall warnings.
---
arch/arm64/include/asm/kvm_rmi.h | 4 ++
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/arm.c | 13 ++++
arch/arm64/kvm/rmi.c | 106 +++++++++++++++++++++++++++++++
4 files changed, 124 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
index 007249a13dbc..a2b6bc412a22 100644
--- a/arch/arm64/include/asm/kvm_rmi.h
+++ b/arch/arm64/include/asm/kvm_rmi.h
@@ -88,6 +88,10 @@ int kvm_rec_enter(struct kvm_vcpu *vcpu);
int kvm_rec_pre_enter(struct kvm_vcpu *vcpu);
int handle_rec_exit(struct kvm_vcpu *vcpu, int rec_run_status);
+struct kvm_arm_rmi_populate;
+
+int kvm_arm_rmi_populate(struct kvm *kvm,
+ struct kvm_arm_rmi_populate *arg);
void kvm_realm_unmap_range(struct kvm *kvm,
unsigned long ipa,
unsigned long size,
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 4e16719fda22..d0cd011cf672 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -38,6 +38,7 @@ menuconfig KVM
select GUEST_PERF_EVENTS if PERF_EVENTS
select KVM_GUEST_MEMFD
select KVM_GENERIC_MEMORY_ATTRIBUTES
+ select HAVE_KVM_ARCH_GMEM_POPULATE
help
Support hosting virtualized guest machines.
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index ed88a203b892..073ba9181da9 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2131,6 +2131,19 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
return -EFAULT;
return kvm_vm_ioctl_get_reg_writable_masks(kvm, &range);
}
+ case KVM_ARM_RMI_POPULATE: {
+ struct kvm_arm_rmi_populate req;
+ int ret;
+
+ if (!kvm_is_realm(kvm))
+ return -ENXIO;
+ if (copy_from_user(&req, argp, sizeof(req)))
+ return -EFAULT;
+ ret = kvm_arm_rmi_populate(kvm, &req);
+ if (copy_to_user(argp, &req, sizeof(req)))
+ return -EFAULT;
+ return ret;
+ }
default:
return -EINVAL;
}
diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
index a89873a5eb77..209087bcf399 100644
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -486,6 +486,75 @@ void kvm_realm_unmap_range(struct kvm *kvm, unsigned long start,
realm_unmap_private_range(kvm, start, end, may_block);
}
+static int realm_data_map_init(struct kvm *kvm, unsigned long ipa,
+ kvm_pfn_t dst_pfn, kvm_pfn_t src_pfn,
+ unsigned long flags)
+{
+ struct realm *realm = &kvm->arch.realm;
+ phys_addr_t rd = virt_to_phys(realm->rd);
+ phys_addr_t dst_phys, src_phys;
+ int ret;
+
+ dst_phys = __pfn_to_phys(dst_pfn);
+ src_phys = __pfn_to_phys(src_pfn);
+
+ if (rmi_delegate_page(dst_phys))
+ return -ENXIO;
+
+ ret = rmi_rtt_data_map_init(rd, dst_phys, ipa, src_phys, flags);
+ if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
+ /* Create missing RTTs and retry */
+ int level = RMI_RETURN_INDEX(ret);
+
+ KVM_BUG_ON(level == KVM_PGTABLE_LAST_LEVEL, kvm);
+
+ ret = realm_create_rtt_levels(realm, ipa, level,
+ KVM_PGTABLE_LAST_LEVEL, NULL);
+ if (!ret) {
+ ret = rmi_rtt_data_map_init(rd, dst_phys, ipa, src_phys,
+ flags);
+ }
+ }
+
+ if (ret) {
+ if (WARN_ON(rmi_undelegate_page(dst_phys))) {
+ /* Undelegate failed, so we leak the page */
+ get_page(pfn_to_page(dst_pfn));
+ }
+ }
+
+ return ret;
+}
+
+static int populate_region_cb(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
+ struct page *src_page, void *opaque)
+{
+ unsigned long data_flags = *(unsigned long *)opaque;
+ phys_addr_t ipa = gfn_to_gpa(gfn);
+
+ if (!src_page)
+ return -EOPNOTSUPP;
+
+ return realm_data_map_init(kvm, ipa, pfn, page_to_pfn(src_page),
+ data_flags);
+}
+
+static long populate_region(struct kvm *kvm,
+ gfn_t base_gfn,
+ unsigned long pages,
+ u64 uaddr,
+ unsigned long data_flags)
+{
+ long ret = 0;
+
+ mutex_lock(&kvm->slots_lock);
+ ret = kvm_gmem_populate(kvm, base_gfn, u64_to_user_ptr(uaddr), pages,
+ populate_region_cb, &data_flags);
+ mutex_unlock(&kvm->slots_lock);
+
+ return ret;
+}
+
enum ripas_action {
RIPAS_INIT,
RIPAS_SET,
@@ -574,6 +643,43 @@ static int realm_ensure_created(struct kvm *kvm)
return -ENXIO;
}
+int kvm_arm_rmi_populate(struct kvm *kvm,
+ struct kvm_arm_rmi_populate *args)
+{
+ unsigned long data_flags = 0;
+ unsigned long ipa_start = args->base;
+ unsigned long ipa_end = ipa_start + args->size;
+ long pages_populated;
+ int ret;
+
+ if (args->reserved ||
+ (args->flags & ~KVM_ARM_RMI_POPULATE_FLAGS_MEASURE) ||
+ !IS_ALIGNED(ipa_start, PAGE_SIZE) ||
+ !IS_ALIGNED(ipa_end, PAGE_SIZE) ||
+ !IS_ALIGNED(args->source_uaddr, PAGE_SIZE))
+ return -EINVAL;
+
+ ret = realm_ensure_created(kvm);
+ if (ret)
+ return ret;
+
+ if (args->flags & KVM_ARM_RMI_POPULATE_FLAGS_MEASURE)
+ data_flags |= RMI_MEASURE_CONTENT;
+
+ pages_populated = populate_region(kvm, gpa_to_gfn(ipa_start),
+ args->size >> PAGE_SHIFT,
+ args->source_uaddr, data_flags);
+
+ if (pages_populated < 0)
+ return pages_populated;
+
+ args->size -= pages_populated << PAGE_SHIFT;
+ args->source_uaddr += pages_populated << PAGE_SHIFT;
+ args->base += pages_populated << PAGE_SHIFT;
+
+ return 0;
+}
+
static void kvm_complete_ripas_change(struct kvm_vcpu *vcpu)
{
struct kvm *kvm = vcpu->kvm;
--
2.43.0
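
Since kvm_arm_rmi_populate() above reports progress by advancing base, size and source_uaddr, a caller can presumably repeat the request until size reaches zero. The sketch below models only that arithmetic in user space. struct sketch_populate is a hypothetical mirror of the fields the patch touches (field widths assumed), sketch_populate_once() is a stand-in for a single ioctl that handles at most a fixed number of pages, and 4KiB pages are assumed.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4KiB pages */

/* Hypothetical mirror of the fields the patch updates; widths are assumed. */
struct sketch_populate {
	uint64_t base;
	uint64_t size;
	uint64_t source_uaddr;
};

/* Advance the request past @pages populated pages, as the ioctl does. */
static void sketch_populate_advance(struct sketch_populate *args,
				    uint64_t pages)
{
	uint64_t bytes = pages << SKETCH_PAGE_SHIFT;

	args->size -= bytes;
	args->source_uaddr += bytes;
	args->base += bytes;
}

/* Stand-in for one ioctl call that populates at most @chunk pages. */
static int sketch_populate_once(struct sketch_populate *args, uint64_t chunk)
{
	uint64_t pages = args->size >> SKETCH_PAGE_SHIFT;

	if (pages > chunk)
		pages = chunk;
	if (!pages)
		return -1;	/* no forward progress possible */

	sketch_populate_advance(args, pages);
	return 0;
}

/* Caller-side loop: retry until nothing is left; returns the call count. */
static int sketch_populate_all(struct sketch_populate *args, uint64_t chunk)
{
	int calls = 0;

	while (args->size) {
		if (sketch_populate_once(args, chunk))
			return -1;
		calls++;
	}

	return calls;
}
```

The updated structure always describes the part that is still outstanding, so the caller does not need to track offsets itself.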
* [PATCH v14 25/44] KVM: arm64: Expose support for private memory
From: Steven Price @ 2026-05-13 13:17 UTC
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
Select KVM_GENERIC_MEMORY_ATTRIBUTES and provide the necessary support
functions.
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v13:
* Also update documentation to show that KVM_CAP_MEMORY_ATTRIBUTES is
used on arm64.
Changes since v12:
* Only define kvm_arch_has_private_mem() when
CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES is set to avoid build issues
when KVM is disabled.
Changes since v10:
* KVM_GENERIC_PRIVATE_MEM replaced with KVM_GENERIC_MEMORY_ATTRIBUTES.
Changes since v9:
* Drop the #ifdef CONFIG_KVM_PRIVATE_MEM guard from the definition of
kvm_arch_has_private_mem()
Changes since v2:
* Switch kvm_arch_has_private_mem() to a macro to avoid overhead of a
function call.
* Guard definitions of kvm_arch_{pre,post}_set_memory_attributes() with
#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES.
* Early out in kvm_arch_post_set_memory_attributes() if the WARN_ON
should trigger.
---
Documentation/virt/kvm/api.rst | 2 +-
arch/arm64/include/asm/kvm_host.h | 4 ++++
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/mmu.c | 24 ++++++++++++++++++++++++
4 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 31a5919d8d5f..a47c60490475 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6379,7 +6379,7 @@ Returns -EINVAL if called on a protected VM.
-------------------------------
:Capability: KVM_CAP_MEMORY_ATTRIBUTES
-:Architectures: x86
+:Architectures: x86, arm64
:Type: vm ioctl
:Parameters: struct kvm_memory_attributes (in)
:Returns: 0 on success, <0 on error
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 39b5de03d0fe..11e7b629c950 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1531,6 +1531,10 @@ struct kvm *kvm_arch_alloc_vm(void);
#define vcpu_is_protected(vcpu) kvm_vm_is_protected((vcpu)->kvm)
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+#define kvm_arch_has_private_mem(kvm) ((kvm)->arch.is_realm)
+#endif
+
int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature);
bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 449154f9a485..4e16719fda22 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -37,6 +37,7 @@ menuconfig KVM
select SCHED_INFO
select GUEST_PERF_EVENTS if PERF_EVENTS
select KVM_GUEST_MEMFD
+ select KVM_GENERIC_MEMORY_ATTRIBUTES
help
Support hosting virtualized guest machines.
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 10ca9dbe40a0..ac2a0f0106b0 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2684,6 +2684,30 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
return ret;
}
+#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
+ struct kvm_gfn_range *range)
+{
+ WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm));
+ return false;
+}
+
+bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
+ struct kvm_gfn_range *range)
+{
+ if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
+ return false;
+
+ if (range->arg.attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE)
+ range->attr_filter = KVM_FILTER_SHARED;
+ else
+ range->attr_filter = KVM_FILTER_PRIVATE;
+ kvm_unmap_gfn_range(kvm, range);
+
+ return false;
+}
+#endif
+
void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
{
}
--
2.43.0
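
The filter choice in kvm_arch_post_set_memory_attributes() above is easy to misread, so here is a stand-alone sketch of it: once a range becomes private, the stale shared mappings are the ones to drop, and vice versa. The bit values below are local to the sketch and should not be taken as the kernel's definitions. sketch_only_shared() mirrors how the RIPAS patch in this series derives its only_shared argument from the filter.

```c
#include <stdint.h>

/* Sketch-local bit values; the kernel headers remain authoritative. */
#define SKETCH_ATTR_PRIVATE	(UINT64_C(1) << 3)
#define SKETCH_FILTER_SHARED	(1u << 0)
#define SKETCH_FILTER_PRIVATE	(1u << 1)

/* Pick which existing mappings to unmap after an attribute change. */
static unsigned int sketch_unmap_filter(uint64_t new_attributes)
{
	if (new_attributes & SKETCH_ATTR_PRIVATE)
		return SKETCH_FILTER_SHARED;	/* now private: drop shared */

	return SKETCH_FILTER_PRIVATE;		/* now shared: drop private */
}

/* Protected mappings are left alone unless the filter asks for them. */
static int sketch_only_shared(unsigned int filter)
{
	return !(filter & SKETCH_FILTER_PRIVATE);
}
```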
* [PATCH v14 24/44] KVM: arm64: Handle realm MMIO emulation
From: Steven Price @ 2026-05-13 13:17 UTC
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
MMIO emulation for a realm cannot be done directly with the VM's
registers as they are protected from the host. However, for emulatable
data aborts, the RMM uses GPRS[0] to provide the read/written value.
We can transfer this from/to the equivalent VCPU's register entry and
then depend on the generic MMIO handling code in KVM.
For an MMIO read, the value is placed in the shared RecExit structure
during kvm_handle_mmio_return() rather than in the VCPU's register
entry.
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v7:
* New comment for rec_exit_sync_dabt() explaining the call to
vcpu_set_reg().
Changes since v5:
* Inject SEA to the guest if an emulatable MMIO access triggers a data
abort.
* kvm_handle_mmio_return() - disable kvm_incr_pc() for a REC (as the PC
isn't under the host's control) and move the REC_ENTER_EMULATED_MMIO
flag setting to this location (as that tells the RMM to skip the
instruction).
---
arch/arm64/kvm/inject_fault.c | 4 +++-
arch/arm64/kvm/mmio.c | 16 ++++++++++++----
arch/arm64/kvm/rmi-exit.c | 14 ++++++++++++++
3 files changed, 29 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index 89982bd3345f..6492397b73d7 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -228,7 +228,9 @@ static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt, u32 addr)
static void __kvm_inject_sea(struct kvm_vcpu *vcpu, bool iabt, u64 addr)
{
- if (vcpu_el1_is_32bit(vcpu))
+ if (unlikely(vcpu_is_rec(vcpu)))
+ vcpu->arch.rec.run->enter.flags |= REC_ENTER_FLAG_INJECT_SEA;
+ else if (vcpu_el1_is_32bit(vcpu))
inject_abt32(vcpu, iabt, addr);
else
inject_abt64(vcpu, iabt, addr);
diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
index e2285ed8c91d..6a8cb927fcca 100644
--- a/arch/arm64/kvm/mmio.c
+++ b/arch/arm64/kvm/mmio.c
@@ -6,6 +6,7 @@
#include <linux/kvm_host.h>
#include <asm/kvm_emulate.h>
+#include <asm/rmi_smc.h>
#include <trace/events/kvm.h>
#include "trace.h"
@@ -138,14 +139,21 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu)
trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
&data);
data = vcpu_data_host_to_guest(vcpu, data, len);
- vcpu_set_reg(vcpu, kvm_vcpu_dabt_get_rd(vcpu), data);
+
+ if (vcpu_is_rec(vcpu))
+ vcpu->arch.rec.run->enter.gprs[0] = data;
+ else
+ vcpu_set_reg(vcpu, kvm_vcpu_dabt_get_rd(vcpu), data);
}
/*
* The MMIO instruction is emulated and should not be re-executed
* in the guest.
*/
- kvm_incr_pc(vcpu);
+ if (vcpu_is_rec(vcpu))
+ vcpu->arch.rec.run->enter.flags |= REC_ENTER_FLAG_EMULATED_MMIO;
+ else
+ kvm_incr_pc(vcpu);
return 1;
}
@@ -167,14 +175,14 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
* No valid syndrome? Ask userspace for help if it has
* volunteered to do so, and bail out otherwise.
*
- * In the protected VM case, there isn't much userspace can do
+ * In the protected/realm VM case, there isn't much userspace can do
* though, so directly deliver an exception to the guest.
*/
if (!kvm_vcpu_dabt_isvalid(vcpu)) {
trace_kvm_mmio_nisv(*vcpu_pc(vcpu), esr,
kvm_vcpu_get_hfar(vcpu), fault_ipa);
- if (vcpu_is_protected(vcpu))
+ if (vcpu_is_protected(vcpu) || vcpu_is_rec(vcpu))
return kvm_inject_sea_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
if (test_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
diff --git a/arch/arm64/kvm/rmi-exit.c b/arch/arm64/kvm/rmi-exit.c
index e7c51b6cf6ce..8ec0d179eba2 100644
--- a/arch/arm64/kvm/rmi-exit.c
+++ b/arch/arm64/kvm/rmi-exit.c
@@ -25,6 +25,20 @@ static int rec_exit_reason_notimpl(struct kvm_vcpu *vcpu)
static int rec_exit_sync_dabt(struct kvm_vcpu *vcpu)
{
+ struct realm_rec *rec = &vcpu->arch.rec;
+
+ /*
+ * In the case of a write, copy over gprs[0] to the target GPR,
+ * preparing to handle the MMIO write fault. The content to be written has
+ * been saved to gprs[0] by the RMM (even if another register was used
+ * by the guest). In the case of normal memory access this is redundant
+ * (the guest will replay the instruction), but the overhead is
+ * minimal.
+ */
+ if (kvm_vcpu_dabt_iswrite(vcpu) && kvm_vcpu_dabt_isvalid(vcpu))
+ vcpu_set_reg(vcpu, kvm_vcpu_dabt_get_rd(vcpu),
+ rec->run->exit.gprs[0]);
+
return kvm_handle_guest_abort(vcpu);
}
--
2.43.0
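
The read-completion split made in kvm_handle_mmio_return() above can be modelled in user space as below. For a REC the host cannot write the guest's registers or PC, so the value goes into the first entry GPR slot and a flag asks the RMM to skip the instruction; otherwise the target register and PC are updated directly. struct sketch_vcpu, the flag's bit position and the fixed 4-byte PC step are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position; the real flag is REC_ENTER_FLAG_EMULATED_MMIO. */
#define SKETCH_ENTER_FLAG_EMULATED_MMIO (UINT64_C(1) << 0)

struct sketch_vcpu {
	bool is_rec;
	uint64_t regs[31];
	uint64_t pc;
	uint64_t enter_gprs0;	/* stands in for run->enter.gprs[0] */
	uint64_t enter_flags;	/* stands in for run->enter.flags */
};

/* Complete an emulated MMIO read of @data into guest register @rd. */
static void sketch_mmio_read_return(struct sketch_vcpu *vcpu,
				    unsigned int rd, uint64_t data)
{
	if (vcpu->is_rec) {
		/* The RMM copies gprs[0] to the real target register. */
		vcpu->enter_gprs0 = data;
		vcpu->enter_flags |= SKETCH_ENTER_FLAG_EMULATED_MMIO;
		return;
	}

	if (rd < 31)
		vcpu->regs[rd] = data;
	vcpu->pc += 4;	/* skip the emulated instruction */
}
```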
* [PATCH v14 23/44] arm64: RMI: Handle RMI_EXIT_RIPAS_CHANGE
From: Steven Price @ 2026-05-13 13:17 UTC
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
The guest can request that a region of its protected address space is
switched between RIPAS_RAM and RIPAS_EMPTY (and back) using
RSI_IPA_STATE_SET. This causes a guest exit with the
RMI_EXIT_RIPAS_CHANGE code. We treat this as a request to convert a
protected region to unprotected (or back), exiting to the VMM to make
the necessary changes to the guest_memfd and memslot mappings. On the
next entry the RIPAS changes are committed by making RMI_RTT_SET_RIPAS
calls.
The VMM may wish to reject the RIPAS change requested by the guest. For
now it can only do this by no longer scheduling the VCPU as we don't
currently have a use case for returning that rejection to the guest, but
by postponing the RMI_RTT_SET_RIPAS changes to entry we leave the door
open for adding a new ioctl in the future for this purpose.
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v13:
* Switch to the new RMI_RTT_UNPROT_UNMAP range-based API.
* Drop ugly hack for RMM bug which errored when the RIPAS was already
set to the desired value.
Changes since v12:
* Switch to the new RMM v2.0 RMI_RTT_DATA_UNMAP which can unmap an
address range.
Changes since v11:
* Combine the "Allow VMM to set RIPAS" patch into this one to avoid
adding functions before they are used.
* Drop the CAP for setting RIPAS and adapt to changes from previous
patches.
Changes since v10:
* Add comment explaining the assignment of rec->run->exit.ripas_base in
kvm_complete_ripas_change().
Changes since v8:
* Make use of ripas_change() from a previous patch to implement
realm_set_ipa_state().
* Update exit.ripas_base after a RIPAS change so that, if instead of
entering the guest we exit to user space, we don't attempt to repeat
the RIPAS change (triggering an error from the RMM).
Changes since v7:
* Rework the loop in realm_set_ipa_state() to make it clear when the
'next' output value of rmi_rtt_set_ripas() is used.
New patch for v7: The code was previously split awkwardly between two
other patches.
---
arch/arm64/include/asm/kvm_rmi.h | 6 +
arch/arm64/kvm/mmu.c | 8 +-
arch/arm64/kvm/rmi.c | 439 +++++++++++++++++++++++++++++++
3 files changed, 450 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
index feb534a6678e..007249a13dbc 100644
--- a/arch/arm64/include/asm/kvm_rmi.h
+++ b/arch/arm64/include/asm/kvm_rmi.h
@@ -88,6 +88,12 @@ int kvm_rec_enter(struct kvm_vcpu *vcpu);
int kvm_rec_pre_enter(struct kvm_vcpu *vcpu);
int handle_rec_exit(struct kvm_vcpu *vcpu, int rec_run_status);
+void kvm_realm_unmap_range(struct kvm *kvm,
+ unsigned long ipa,
+ unsigned long size,
+ bool unmap_private,
+ bool may_block);
+
static inline bool kvm_realm_is_private_address(struct realm *realm,
unsigned long addr)
{
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index eb56d4e7f21a..10ca9dbe40a0 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -319,6 +319,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
* @start: The intermediate physical base address of the range to unmap
* @size: The size of the area to unmap
* @may_block: Whether or not we are permitted to block
+ * @only_shared: If true then protected mappings should not be unmapped
*
* Clear a range of stage-2 mappings, lowering the various ref-counts. Must
* be called while holding mmu_lock (unless for freeing the stage2 pgd before
@@ -326,7 +327,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
* with things behind our backs.
*/
static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size,
- bool may_block)
+ bool may_block, bool only_shared)
{
struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
phys_addr_t end = start + size;
@@ -343,7 +344,7 @@ void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
if (kvm_vm_is_protected(kvm_s2_mmu_to_kvm(mmu)))
return;
- __unmap_stage2_range(mmu, start, size, may_block);
+ __unmap_stage2_range(mmu, start, size, may_block, false);
}
void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
@@ -2418,7 +2419,8 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
__unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
(range->end - range->start) << PAGE_SHIFT,
- range->may_block);
+ range->may_block,
+ !(range->attr_filter & KVM_FILTER_PRIVATE));
kvm_nested_s2_unmap(kvm, range->may_block);
return false;
diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
index d8a5fb12db2d..a89873a5eb77 100644
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -34,6 +34,91 @@ static int get_start_level(struct realm *realm)
return 4 - stage2_pgtable_levels(realm->ia_bits);
}
+static int find_map_level(struct realm *realm,
+ unsigned long start,
+ unsigned long end)
+{
+ int level = KVM_PGTABLE_LAST_LEVEL;
+
+ while (level > get_start_level(realm)) {
+ unsigned long map_size = rmi_rtt_level_mapsize(level - 1);
+
+ if (!IS_ALIGNED(start, map_size) ||
+ (start + map_size) > end)
+ break;
+
+ level--;
+ }
+
+ return level;
+}
+
+static unsigned long level_to_size(int level)
+{
+ switch (level) {
+ case 0:
+ return PAGE_SIZE;
+ case 1:
+ return PMD_SIZE;
+ case 2:
+ return PUD_SIZE;
+ case 3:
+ return P4D_SIZE;
+ }
+ WARN_ON(1);
+ return 0;
+}
+
+static int undelegate_range_desc(unsigned long desc)
+{
+ unsigned long size = level_to_size(RMI_ADDR_RANGE_SIZE(desc));
+ unsigned long count = RMI_ADDR_RANGE_COUNT(desc);
+ unsigned long addr = RMI_ADDR_RANGE_ADDR(desc);
+ unsigned long state = RMI_ADDR_RANGE_STATE(desc);
+
+ if (state == RMI_OP_MEM_UNDELEGATED)
+ return 0;
+
+ if (size * count == 0)
+ return 0;
+
+ return rmi_undelegate_range(addr, size * count);
+}
+
+static phys_addr_t alloc_delegated_granule(struct kvm_mmu_memory_cache *mc)
+{
+ phys_addr_t phys;
+ void *virt;
+
+ if (mc) {
+ virt = kvm_mmu_memory_cache_alloc(mc);
+ } else {
+ virt = (void *)__get_free_page(GFP_ATOMIC | __GFP_ZERO |
+ __GFP_ACCOUNT);
+ }
+
+ if (!virt)
+ return PHYS_ADDR_MAX;
+
+ phys = virt_to_phys(virt);
+ if (rmi_delegate_page(phys)) {
+ free_page((unsigned long)virt);
+ return PHYS_ADDR_MAX;
+ }
+
+ return phys;
+}
+
+static phys_addr_t alloc_rtt(struct kvm_mmu_memory_cache *mc)
+{
+ phys_addr_t phys = alloc_delegated_granule(mc);
+
+ if (phys != PHYS_ADDR_MAX)
+ kvm_account_pgtable_pages(phys_to_virt(phys), 1);
+
+ return phys;
+}
+
static void free_rtt(phys_addr_t phys)
{
if (free_delegated_page(phys))
@@ -42,6 +127,32 @@ static void free_rtt(phys_addr_t phys)
kvm_account_pgtable_pages(phys_to_virt(phys), -1);
}
+static int realm_rtt_create(struct realm *realm,
+ unsigned long addr,
+ int level,
+ phys_addr_t phys)
+{
+ addr = ALIGN_DOWN(addr, rmi_rtt_level_mapsize(level - 1));
+ return rmi_rtt_create(virt_to_phys(realm->rd), phys, addr, level);
+}
+
+static int realm_rtt_fold(struct realm *realm,
+ unsigned long addr,
+ int level,
+ phys_addr_t *rtt_granule)
+{
+ unsigned long out_rtt;
+ int ret;
+
+ addr = ALIGN_DOWN(addr, rmi_rtt_level_mapsize(level - 1));
+ ret = rmi_rtt_fold(virt_to_phys(realm->rd), addr, level, &out_rtt);
+
+ if (rtt_granule)
+ *rtt_granule = out_rtt;
+
+ return ret;
+}
+
/*
* realm_rtt_destroy - Destroy an RTT at @level for @addr.
*
@@ -65,6 +176,38 @@ static int realm_rtt_destroy(struct realm *realm, unsigned long addr,
return ret;
}
+static int realm_create_rtt_levels(struct realm *realm,
+ unsigned long ipa,
+ int level,
+ int max_level,
+ struct kvm_mmu_memory_cache *mc)
+{
+ while (level++ < max_level) {
+ phys_addr_t rtt = alloc_rtt(mc);
+ int ret;
+
+ if (rtt == PHYS_ADDR_MAX)
+ return -ENOMEM;
+
+ ret = realm_rtt_create(realm, ipa, level, rtt);
+ if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT &&
+ RMI_RETURN_INDEX(ret) == level - 1) {
+ /* The RTT already exists, continue */
+ free_rtt(rtt);
+ continue;
+ }
+
+ if (ret) {
+ WARN(1, "Failed to create RTT at level %d: %d\n",
+ level, ret);
+ free_rtt(rtt);
+ return -ENXIO;
+ }
+ }
+
+ return 0;
+}
+
static int realm_tear_down_rtt_level(struct realm *realm, int level,
unsigned long start, unsigned long end)
{
@@ -159,6 +302,62 @@ static int realm_tear_down_rtt_range(struct realm *realm,
start, end);
}
+/*
+ * Returns 0 on successful fold, a negative value on error, a positive value if
+ * we were not able to fold all tables at this level.
+ */
+static int realm_fold_rtt_level(struct realm *realm, int level,
+ unsigned long start, unsigned long end)
+{
+ int not_folded = 0;
+ ssize_t map_size;
+ unsigned long addr, next_addr;
+
+ if (WARN_ON(level > KVM_PGTABLE_LAST_LEVEL))
+ return -EINVAL;
+
+ map_size = rmi_rtt_level_mapsize(level - 1);
+
+ for (addr = start; addr < end; addr = next_addr) {
+ phys_addr_t rtt_granule;
+ int ret;
+ unsigned long align_addr = ALIGN(addr, map_size);
+
+ next_addr = ALIGN(addr + 1, map_size);
+
+ ret = realm_rtt_fold(realm, align_addr, level, &rtt_granule);
+
+ switch (RMI_RETURN_STATUS(ret)) {
+ case RMI_SUCCESS:
+ free_rtt(rtt_granule);
+ break;
+ case RMI_ERROR_RTT:
+ if (level == KVM_PGTABLE_LAST_LEVEL ||
+ RMI_RETURN_INDEX(ret) < level) {
+ not_folded++;
+ break;
+ }
+ /* Recurse a level deeper */
+ ret = realm_fold_rtt_level(realm,
+ level + 1,
+ addr,
+ next_addr);
+ if (ret < 0) {
+ return ret;
+ } else if (ret == 0) {
+ /* Try again at this level */
+ next_addr = addr;
+ }
+ break;
+ default:
+ WARN_ON(1);
+ return -ENXIO;
+ }
+ }
+
+ return not_folded;
+}
+
void kvm_realm_destroy_rtts(struct kvm *kvm)
{
struct realm *realm = &kvm->arch.realm;
@@ -167,12 +366,249 @@ void kvm_realm_destroy_rtts(struct kvm *kvm)
realm_tear_down_rtt_range(realm, 0, (1UL << ia_bits));
}
+static void realm_unmap_shared_range(struct kvm *kvm,
+ unsigned long start,
+ unsigned long end,
+ bool may_block)
+{
+ struct realm *realm = &kvm->arch.realm;
+ unsigned long rd = virt_to_phys(realm->rd);
+ unsigned long next_addr, addr;
+ unsigned long shared_bit = BIT(realm->ia_bits - 1);
+
+ start |= shared_bit;
+ end |= shared_bit;
+
+ for (addr = start; addr < end; addr = next_addr) {
+ int ret;
+
+ ret = rmi_rtt_unprot_unmap(rd, addr, end, RMI_ADDR_TYPE_NONE,
+ 0, &next_addr, NULL, NULL);
+ switch (RMI_RETURN_STATUS(ret)) {
+ case RMI_SUCCESS:
+ break;
+ case RMI_ERROR_RTT: {
+ int err_level = RMI_RETURN_INDEX(ret);
+ int level = find_map_level(realm, addr, end);
+
+ if (err_level >= level) {
+ /* Nothing present, so skip */
+ next_addr = addr + rmi_rtt_level_mapsize(err_level);
+ break;
+ }
+
+ ret = realm_create_rtt_levels(realm, addr, err_level,
+ level, NULL);
+ if (WARN_ON(ret))
+ return;
+ /* Retry with the RTT levels in place */
+ next_addr = addr;
+ break;
+ }
+ default:
+ WARN_ON(1);
+ return;
+ }
+
+ if (may_block)
+ cond_resched_rwlock_write(&kvm->mmu_lock);
+ }
+
+ realm_fold_rtt_level(realm, get_start_level(realm) + 1,
+ start, end);
+}
+
+static void realm_unmap_private_range(struct kvm *kvm,
+ unsigned long start,
+ unsigned long end,
+ bool may_block)
+{
+ struct realm *realm = &kvm->arch.realm;
+ unsigned long rd = virt_to_phys(realm->rd);
+ unsigned long next_addr, addr;
+ int ret;
+
+ for (addr = start; addr < end; addr = next_addr) {
+ unsigned long out_range;
+ unsigned long flags = RMI_ADDR_TYPE_SINGLE;
+ /* TODO: Optimise using RMI_ADDR_TYPE_LIST */
+
+retry:
+ ret = rmi_rtt_data_unmap(rd, addr, end, flags, 0,
+ &next_addr, &out_range, NULL);
+
+ if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
+ phys_addr_t rtt;
+
+ if (next_addr > addr)
+ continue; /* UNASSIGNED */
+
+ rtt = alloc_rtt(NULL);
+ if (WARN_ON(rtt == PHYS_ADDR_MAX))
+ return;
+ ret = realm_rtt_create(realm, addr,
+ RMI_RETURN_INDEX(ret) + 1, rtt);
+ if (WARN_ON(ret)) {
+ free_rtt(rtt);
+ return;
+ }
+ goto retry;
+ } else if (WARN_ON(ret)) {
+ continue;
+ }
+
+ ret = undelegate_range_desc(out_range);
+ if (WARN_ON(ret))
+ break;
+
+ if (may_block)
+ cond_resched_rwlock_write(&kvm->mmu_lock);
+ }
+
+ realm_fold_rtt_level(realm, get_start_level(realm) + 1,
+ start, end);
+}
+
+void kvm_realm_unmap_range(struct kvm *kvm, unsigned long start,
+ unsigned long size, bool unmap_private,
+ bool may_block)
+{
+ unsigned long end = start + size;
+ struct realm *realm = &kvm->arch.realm;
+
+ if (!kvm_realm_is_created(kvm))
+ return;
+
+ end = min(BIT(realm->ia_bits - 1), end);
+
+ realm_unmap_shared_range(kvm, start, end, may_block);
+ if (unmap_private)
+ realm_unmap_private_range(kvm, start, end, may_block);
+}
+
+enum ripas_action {
+ RIPAS_INIT,
+ RIPAS_SET,
+};
+
+static int ripas_change(struct kvm *kvm,
+ struct kvm_vcpu *vcpu,
+ unsigned long ipa,
+ unsigned long end,
+ enum ripas_action action,
+ unsigned long *top_ipa)
+{
+ struct realm *realm = &kvm->arch.realm;
+ phys_addr_t rd_phys = virt_to_phys(realm->rd);
+ phys_addr_t rec_phys;
+ struct kvm_mmu_memory_cache *memcache = NULL;
+ int ret = 0;
+
+ if (vcpu) {
+ rec_phys = virt_to_phys(vcpu->arch.rec.rec_page);
+ memcache = &vcpu->arch.mmu_page_cache;
+
+ WARN_ON(action != RIPAS_SET);
+ } else {
+ WARN_ON(action != RIPAS_INIT);
+ }
+
+ while (ipa < end) {
+ unsigned long next = ~0;
+
+ switch (action) {
+ case RIPAS_INIT:
+ ret = rmi_rtt_init_ripas(rd_phys, ipa, end, &next);
+ break;
+ case RIPAS_SET:
+ ret = rmi_rtt_set_ripas(rd_phys, rec_phys, ipa, end,
+ &next);
+ break;
+ }
+
+ switch (RMI_RETURN_STATUS(ret)) {
+ case RMI_SUCCESS:
+ ipa = next;
+ break;
+ case RMI_ERROR_RTT: {
+ int err_level = RMI_RETURN_INDEX(ret);
+ int level = find_map_level(realm, ipa, end);
+
+ ret = realm_create_rtt_levels(realm, ipa, err_level,
+ level, memcache);
+ if (ret)
+ return ret;
+ /* Retry with the RTT levels in place */
+ break;
+ }
+ default:
+ WARN_ON(1);
+ return -ENXIO;
+ }
+ }
+
+ if (top_ipa)
+ *top_ipa = ipa;
+
+ return 0;
+}
+
+static int realm_set_ipa_state(struct kvm_vcpu *vcpu,
+ unsigned long start,
+ unsigned long end,
+ unsigned long ripas,
+ unsigned long *top_ipa)
+{
+ struct kvm *kvm = vcpu->kvm;
+ int ret = ripas_change(kvm, vcpu, start, end, RIPAS_SET, top_ipa);
+
+ if (ripas == RMI_EMPTY && *top_ipa != start)
+ realm_unmap_private_range(kvm, start, *top_ipa, false);
+
+ return ret;
+}
+
static int realm_ensure_created(struct kvm *kvm)
{
/* Provided in later patch */
return -ENXIO;
}
+static void kvm_complete_ripas_change(struct kvm_vcpu *vcpu)
+{
+ struct kvm *kvm = vcpu->kvm;
+ struct realm_rec *rec = &vcpu->arch.rec;
+ unsigned long base = rec->run->exit.ripas_base;
+ unsigned long top = rec->run->exit.ripas_top;
+ unsigned long ripas = rec->run->exit.ripas_value;
+ unsigned long top_ipa;
+ int ret;
+
+ do {
+ kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache,
+ kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu));
+ write_lock(&kvm->mmu_lock);
+ ret = realm_set_ipa_state(vcpu, base, top, ripas, &top_ipa);
+ write_unlock(&kvm->mmu_lock);
+
+ if (WARN_RATELIMIT(ret && ret != -ENOMEM,
+ "Unable to satisfy RIPAS_CHANGE for %#lx - %#lx, ripas: %#lx\n",
+ base, top, ripas))
+ break;
+
+ base = top_ipa;
+ } while (base < top);
+
+ /*
+ * If this function is called again before the REC_ENTER call then
+ * avoid calling realm_set_ipa_state() again by changing to the value
+ * of ripas_base for the part that has already been covered. The RMM
+ * ignores the contents of the rec_exit structure so this doesn't
+ * affect the RMM.
+ */
+ rec->run->exit.ripas_base = base;
+}
+
/*
* kvm_rec_pre_enter - Complete operations before entering a REC
*
@@ -197,6 +633,9 @@ int kvm_rec_pre_enter(struct kvm_vcpu *vcpu)
for (int i = 0; i < REC_RUN_GPRS; i++)
rec->run->enter.gprs[i] = vcpu_get_reg(vcpu, i);
break;
+ case RMI_EXIT_RIPAS_CHANGE:
+ kvm_complete_ripas_change(vcpu);
+ break;
}
return 1;
--
2.43.0
^ permalink raw reply related
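
As a reading aid for the patch above, the level selection in find_map_level() can be modelled outside the kernel. This is a minimal sketch, not the kernel code: it assumes 4KiB granules and a four-level RTT (levels 0 to 3), and model_level_mapsize() is a hypothetical stand-in for rmi_rtt_level_mapsize(), whose definition is not part of this excerpt.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_LAST_LEVEL 3 /* assumed value of KVM_PGTABLE_LAST_LEVEL */

/* Hypothetical stand-in for rmi_rtt_level_mapsize(): 4KiB granule, 9 bits per level. */
static uint64_t model_level_mapsize(int level)
{
	assert(level >= 0 && level <= MODEL_LAST_LEVEL);
	return UINT64_C(1) << (12 + 9 * (MODEL_LAST_LEVEL - level));
}

/* Same loop shape as find_map_level(): pick the largest block mapping that fits. */
static int model_find_map_level(int start_level, uint64_t start, uint64_t end)
{
	int level = MODEL_LAST_LEVEL;

	while (level > start_level) {
		uint64_t map_size = model_level_mapsize(level - 1);

		/* Stop if the next larger block is misaligned or overruns the range. */
		if ((start & (map_size - 1)) || start + map_size > end)
			break;
		level--;
	}
	return level;
}
```

Under these assumptions a 2MiB-aligned range of exactly 2MiB selects level 2, while a range whose start is only 4KiB-aligned stays at level 3 regardless of its length.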
* [PATCH v14 22/44] arm64: RMI: Handle realm enter/exit
From: Steven Price @ 2026-05-13 13:17 UTC (permalink / raw)
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
Entering a realm is done using an SMC call to the RMM. On exit the
exit-codes need to be handled slightly differently to the normal KVM
path so define our own functions for realm enter/exit and hook them
in if the guest is a realm guest.
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
---
Changes since v13:
* The RMM is now required to provide an ESR value with the correct
information to emulate MMIO, so we no longer need to hardcode 0s in
rec_exit_sys_reg().
* The PSCI changes mean that there is a potential race when turning on
a VCPU which can cause an RMI_ERROR_REC return. Exit to user space
with -EAGAIN in this case.
Changes since v12:
* Call guest_state_{enter,exit}_irqoff() around rmi_rec_enter().
* Add handling of the IRQ exception case where IRQs need to be briefly
enabled before exiting guest timing.
Changes since v8:
* Introduce kvm_rec_pre_enter() called before entering an atomic
section to handle operations that might require memory allocation
(specifically completing a RIPAS change introduced in a later patch).
* Updates to align with upstream changes to hpfar_el2 which now (ab)uses
HPFAR_EL2_NS as a valid flag.
* Fix exit reason when racing with PSCI shutdown to return
KVM_EXIT_SHUTDOWN rather than KVM_EXIT_UNKNOWN.
Changes since v7:
* A return of 0 from kvm_handle_sys_reg() doesn't mean the register has
been read (although that can never happen in the current code). Tidy
up the condition to handle any future refactoring.
Changes since v6:
* Use vcpu_err() rather than pr_err/kvm_err when there is a vcpu
associated with the error.
* Return -EFAULT for KVM_EXIT_MEMORY_FAULT as per the documentation for
this exit type.
* Split code handling a RIPAS change triggered by the guest to the
following patch.
Changes since v5:
* For a RIPAS_CHANGE request from the guest perform the actual RIPAS
change on next entry rather than immediately on the exit. This allows
the VMM to 'reject' a RIPAS change by refusing to continue
scheduling.
Changes since v4:
* Rename handle_rme_exit() to handle_rec_exit()
* Move the loop to copy registers into the REC enter structure from the
rec_exit_handlers callbacks to kvm_rec_enter(). This fixes a bug
where the handler exits to user space and user space wants to modify
the GPRS.
* Some code rearrangement in rec_exit_ripas_change().
Changes since v2:
* realm_set_ipa_state() now provides an output parameter for the
top_ipa that was changed. Use this to signal the VMM with the correct
range that has been transitioned.
* Adapt to previous patch changes.
---
arch/arm64/include/asm/kvm_rmi.h | 4 +
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/kvm/arm.c | 26 ++++-
arch/arm64/kvm/rmi-exit.c | 186 +++++++++++++++++++++++++++++++
arch/arm64/kvm/rmi.c | 42 +++++++
5 files changed, 254 insertions(+), 6 deletions(-)
create mode 100644 arch/arm64/kvm/rmi-exit.c
diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
index d99bf4fc3c39..feb534a6678e 100644
--- a/arch/arm64/include/asm/kvm_rmi.h
+++ b/arch/arm64/include/asm/kvm_rmi.h
@@ -84,6 +84,10 @@ void kvm_destroy_realm(struct kvm *kvm);
void kvm_realm_destroy_rtts(struct kvm *kvm);
void kvm_destroy_rec(struct kvm_vcpu *vcpu);
+int kvm_rec_enter(struct kvm_vcpu *vcpu);
+int kvm_rec_pre_enter(struct kvm_vcpu *vcpu);
+int handle_rec_exit(struct kvm_vcpu *vcpu, int rec_run_ret);
+
static inline bool kvm_realm_is_private_address(struct realm *realm,
unsigned long addr)
{
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index ed3cf30eb06e..4a2d52fdb6a2 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -16,7 +16,7 @@ CFLAGS_handle_exit.o += -Wno-override-init
kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
inject_fault.o va_layout.o handle_exit.o config.o \
guest.o debug.o reset.o sys_regs.o stacktrace.o \
- vgic-sys-reg-v3.o fpsimd.o pkvm.o rmi.o \
+ vgic-sys-reg-v3.o fpsimd.o pkvm.o rmi.o rmi-exit.o \
arch_timer.o trng.o vmid.o emulate-nested.o nested.o at.o \
vgic/vgic.o vgic/vgic-init.o \
vgic/vgic-irqfd.o vgic/vgic-v2.o \
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 21d9dfdb1ea0..ed88a203b892 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1331,6 +1331,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
if (ret > 0)
ret = check_vcpu_requests(vcpu);
+ if (ret > 0 && vcpu_is_rec(vcpu))
+ ret = kvm_rec_pre_enter(vcpu);
+
/*
* Preparing the interrupts to be injected also
* involves poking the GIC, which must be done in a
@@ -1378,7 +1381,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
trace_kvm_entry(*vcpu_pc(vcpu));
guest_timing_enter_irqoff();
- ret = kvm_arm_vcpu_enter_exit(vcpu);
+ if (vcpu_is_rec(vcpu))
+ ret = kvm_rec_enter(vcpu);
+ else
+ ret = kvm_arm_vcpu_enter_exit(vcpu);
vcpu->mode = OUTSIDE_GUEST_MODE;
vcpu->stat.exits++;
@@ -1424,7 +1430,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
* context synchronization event) is necessary to ensure that
* pending interrupts are taken.
*/
- if (ARM_EXCEPTION_CODE(ret) == ARM_EXCEPTION_IRQ) {
+ if (ARM_EXCEPTION_CODE(ret) == ARM_EXCEPTION_IRQ ||
+ (vcpu_is_rec(vcpu) &&
+ vcpu->arch.rec.run->exit.exit_reason == RMI_EXIT_IRQ)) {
local_irq_enable();
isb();
local_irq_disable();
@@ -1436,8 +1444,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
- /* Exit types that need handling before we can be preempted */
- handle_exit_early(vcpu, ret);
+ if (!vcpu_is_rec(vcpu)) {
+ /*
+ * Exit types that need handling before we can be
+ * preempted
+ */
+ handle_exit_early(vcpu, ret);
+ }
kvm_nested_sync_hwstate(vcpu);
@@ -1462,7 +1475,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
ret = ARM_EXCEPTION_IL;
}
- ret = handle_exit(vcpu, ret);
+ if (vcpu_is_rec(vcpu))
+ ret = handle_rec_exit(vcpu, ret);
+ else
+ ret = handle_exit(vcpu, ret);
}
/* Tell userspace about in-kernel device output levels */
diff --git a/arch/arm64/kvm/rmi-exit.c b/arch/arm64/kvm/rmi-exit.c
new file mode 100644
index 000000000000..e7c51b6cf6ce
--- /dev/null
+++ b/arch/arm64/kvm/rmi-exit.c
@@ -0,0 +1,186 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 ARM Ltd.
+ */
+
+#include <linux/kvm_host.h>
+#include <kvm/arm_hypercalls.h>
+#include <kvm/arm_psci.h>
+
+#include <asm/rmi_smc.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_rmi.h>
+#include <asm/kvm_mmu.h>
+
+typedef int (*exit_handler_fn)(struct kvm_vcpu *vcpu);
+
+static int rec_exit_reason_notimpl(struct kvm_vcpu *vcpu)
+{
+ struct realm_rec *rec = &vcpu->arch.rec;
+
+ vcpu_err(vcpu, "Unhandled exit reason from realm (ESR: %#llx)\n",
+ rec->run->exit.esr);
+ return -ENXIO;
+}
+
+static int rec_exit_sync_dabt(struct kvm_vcpu *vcpu)
+{
+ return kvm_handle_guest_abort(vcpu);
+}
+
+static int rec_exit_sync_iabt(struct kvm_vcpu *vcpu)
+{
+ struct realm_rec *rec = &vcpu->arch.rec;
+
+ vcpu_err(vcpu, "Unhandled instruction abort (ESR: %#llx).\n",
+ rec->run->exit.esr);
+ return -ENXIO;
+}
+
+static int rec_exit_sys_reg(struct kvm_vcpu *vcpu)
+{
+ struct realm_rec *rec = &vcpu->arch.rec;
+ unsigned long esr = kvm_vcpu_get_esr(vcpu);
+ int rt = kvm_vcpu_sys_get_rt(vcpu);
+ bool is_write = (esr & ESR_ELx_SYS64_ISS_DIR_MASK) == ESR_ELx_SYS64_ISS_DIR_WRITE;
+ int ret;
+
+ if (is_write)
+ vcpu_set_reg(vcpu, rt, rec->run->exit.gprs[rt]);
+
+ ret = kvm_handle_sys_reg(vcpu);
+ if (!is_write)
+ rec->run->enter.gprs[rt] = vcpu_get_reg(vcpu, rt);
+
+ return ret;
+}
+
+static exit_handler_fn rec_exit_handlers[] = {
+ [0 ... ESR_ELx_EC_MAX] = rec_exit_reason_notimpl,
+ [ESR_ELx_EC_SYS64] = rec_exit_sys_reg,
+ [ESR_ELx_EC_DABT_LOW] = rec_exit_sync_dabt,
+ [ESR_ELx_EC_IABT_LOW] = rec_exit_sync_iabt
+};
+
+static int rec_exit_psci(struct kvm_vcpu *vcpu)
+{
+ struct realm_rec *rec = &vcpu->arch.rec;
+ int i;
+
+ for (i = 0; i < REC_RUN_GPRS; i++)
+ vcpu_set_reg(vcpu, i, rec->run->exit.gprs[i]);
+
+ return kvm_smccc_call_handler(vcpu);
+}
+
+static int rec_exit_ripas_change(struct kvm_vcpu *vcpu)
+{
+ struct kvm *kvm = vcpu->kvm;
+ struct realm *realm = &kvm->arch.realm;
+ struct realm_rec *rec = &vcpu->arch.rec;
+ unsigned long base = rec->run->exit.ripas_base;
+ unsigned long top = rec->run->exit.ripas_top;
+ unsigned long ripas = rec->run->exit.ripas_value;
+
+ if (!kvm_realm_is_private_address(realm, base) ||
+ !kvm_realm_is_private_address(realm, top - 1)) {
+ vcpu_err(vcpu, "Invalid RIPAS_CHANGE for %#lx - %#lx, ripas: %#lx\n",
+ base, top, ripas);
+ /* Set RMI_REJECT bit */
+ rec->run->enter.flags = REC_ENTER_FLAG_RIPAS_RESPONSE;
+ return -EINVAL;
+ }
+
+ /* Exit to VMM, the actual RIPAS change is done on next entry */
+ kvm_prepare_memory_fault_exit(vcpu, base, top - base, false, false,
+ ripas == RMI_RAM);
+
+ /*
+ * KVM_EXIT_MEMORY_FAULT requires a return code of -EFAULT, see the
+ * API documentation
+ */
+ return -EFAULT;
+}
+
+static void update_arch_timer_irq_lines(struct kvm_vcpu *vcpu)
+{
+ struct realm_rec *rec = &vcpu->arch.rec;
+
+ __vcpu_assign_sys_reg(vcpu, CNTV_CTL_EL0, rec->run->exit.cntv_ctl);
+ __vcpu_assign_sys_reg(vcpu, CNTV_CVAL_EL0, rec->run->exit.cntv_cval);
+ __vcpu_assign_sys_reg(vcpu, CNTP_CTL_EL0, rec->run->exit.cntp_ctl);
+ __vcpu_assign_sys_reg(vcpu, CNTP_CVAL_EL0, rec->run->exit.cntp_cval);
+
+ kvm_realm_timers_update(vcpu);
+}
+
+/*
+ * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
+ * proper exit to userspace.
+ */
+int handle_rec_exit(struct kvm_vcpu *vcpu, int rec_run_ret)
+{
+ struct realm_rec *rec = &vcpu->arch.rec;
+ u8 esr_ec = ESR_ELx_EC(rec->run->exit.esr);
+ unsigned long status, index;
+
+ status = RMI_RETURN_STATUS(rec_run_ret);
+ index = RMI_RETURN_INDEX(rec_run_ret);
+
+ /*
+ * If a PSCI_SYSTEM_OFF request raced with a vcpu executing, we might
+ * see the following status code and index indicating an attempt to run
+ * a REC when the RD state is SYSTEM_OFF. In this case, we just need to
+ * return to user space which can deal with the system event or will try
+ * to run the KVM VCPU again, at which point we will no longer attempt
+ * to enter the Realm because we will have a sleep request pending on
+ * the VCPU as a result of KVM's PSCI handling.
+ */
+ if (status == RMI_ERROR_REALM) {
+ vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
+ return 0;
+ }
+
+ /*
+ * If a VCPU has been turned on, but the REC state hasn't been updated
+ * we may experience RMI_ERROR_REC. Exit to userspace with -EAGAIN
+ * for a retry.
+ */
+ if (status == RMI_ERROR_REC)
+ return -EAGAIN;
+ if (rec_run_ret)
+ return -ENXIO;
+
+ vcpu->arch.fault.esr_el2 = rec->run->exit.esr;
+ vcpu->arch.fault.far_el2 = rec->run->exit.far;
+ /* HPFAR_EL2 is only valid for RMI_EXIT_SYNC */
+ vcpu->arch.fault.hpfar_el2 = 0;
+
+ update_arch_timer_irq_lines(vcpu);
+
+ /* Reset the emulation flags for the next run of the REC */
+ rec->run->enter.flags = 0;
+
+ switch (rec->run->exit.exit_reason) {
+ case RMI_EXIT_SYNC:
+ /*
+ * HPFAR_EL2_NS is hijacked to indicate a valid HPFAR value,
+ * see __get_fault_info()
+ */
+ vcpu->arch.fault.hpfar_el2 = rec->run->exit.hpfar | HPFAR_EL2_NS;
+ return rec_exit_handlers[esr_ec](vcpu);
+ case RMI_EXIT_IRQ:
+ case RMI_EXIT_FIQ:
+ case RMI_EXIT_SERROR:
+ return 1;
+ case RMI_EXIT_PSCI:
+ return rec_exit_psci(vcpu);
+ case RMI_EXIT_RIPAS_CHANGE:
+ return rec_exit_ripas_change(vcpu);
+ }
+
+ kvm_pr_unimpl("Unsupported exit reason: %u\n",
+ rec->run->exit.exit_reason);
+ vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+ return 0;
+}
diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
index 353a5ca45e78..d8a5fb12db2d 100644
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -173,6 +173,48 @@ static int realm_ensure_created(struct kvm *kvm)
return -ENXIO;
}
+/*
+ * kvm_rec_pre_enter - Complete operations before entering a REC
+ *
+ * Some operations require work to be completed before entering a realm. That
+ * work may require memory allocation so cannot be done in the kvm_rec_enter()
+ * call.
+ *
+ * Return: 1 if we should enter the guest
+ * 0 if we should exit to userspace
+ * < 0 if we should exit to userspace, where the return value indicates
+ * an error
+ */
+int kvm_rec_pre_enter(struct kvm_vcpu *vcpu)
+{
+ struct realm_rec *rec = &vcpu->arch.rec;
+
+ if (kvm_realm_state(vcpu->kvm) != REALM_STATE_ACTIVE)
+ return -EINVAL;
+
+ switch (rec->run->exit.exit_reason) {
+ case RMI_EXIT_HOST_CALL:
+ for (int i = 0; i < REC_RUN_GPRS; i++)
+ rec->run->enter.gprs[i] = vcpu_get_reg(vcpu, i);
+ break;
+ }
+
+ return 1;
+}
+
+int noinstr kvm_rec_enter(struct kvm_vcpu *vcpu)
+{
+ struct realm_rec *rec = &vcpu->arch.rec;
+ int ret;
+
+ guest_state_enter_irqoff();
+ ret = rmi_rec_enter(virt_to_phys(rec->rec_page),
+ virt_to_phys(rec->run));
+ guest_state_exit_irqoff();
+
+ return ret;
+}
+
static int kvm_create_rec(struct kvm_vcpu *vcpu)
{
struct user_pt_regs *vcpu_regs = vcpu_gp_regs(vcpu);
--
2.43.0
^ permalink raw reply related
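
The status handling at the top of handle_rec_exit() in the patch above can be sketched in isolation. This is an illustrative model only: the bit layout used by model_return_status() and model_return_index() (status in bits [7:0], index in bits [15:8]) and the numeric status values are assumptions, since RMI_RETURN_STATUS(), RMI_RETURN_INDEX() and the RMI_ERROR_* constants are not defined in this excerpt. All MODEL_* names and classify_rec_return() are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Assumed RMI status codes; the real definitions are not shown in this thread excerpt. */
enum model_rmi_status {
	MODEL_RMI_SUCCESS = 0,
	MODEL_RMI_ERROR_INPUT = 1,
	MODEL_RMI_ERROR_REALM = 2,
	MODEL_RMI_ERROR_REC = 3,
	MODEL_RMI_ERROR_RTT = 4,
};

/* Assumed encoding: status in bits [7:0], index in bits [15:8]. */
static inline unsigned long model_return_status(unsigned long ret)
{
	return ret & 0xff;
}

static inline unsigned long model_return_index(unsigned long ret)
{
	return (ret >> 8) & 0xff;
}

/*
 * Mirrors the early checks in handle_rec_exit():
 *   0       - exit to user space (race with PSCI_SYSTEM_OFF)
 *   -EAGAIN - REC state not yet updated after the VCPU was turned on
 *   -ENXIO  - any other RMI failure
 *   1       - (model only) the call succeeded and exit_reason would be
 *             decoded next
 */
static int classify_rec_return(unsigned long rec_run_ret)
{
	unsigned long status = model_return_status(rec_run_ret);

	if (status == MODEL_RMI_ERROR_REALM)
		return 0;
	if (status == MODEL_RMI_ERROR_REC)
		return -EAGAIN;
	if (rec_run_ret)
		return -ENXIO;
	return 1;
}
```

The order of the checks matters: the two recoverable statuses are tested before the generic non-zero test, so only unexpected failures are reported as -ENXIO.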
* [PATCH v14 21/44] KVM: arm64: Support timers in realm RECs
From: Steven Price @ 2026-05-13 13:17 UTC (permalink / raw)
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
The RMM keeps track of the timer while the realm REC is running, but on
exit to the normal world KVM is responsible for handling the timers.
A later patch adds the support for propagating the timer values from the
exit data structure and calling kvm_realm_timers_update().
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v12:
* Adapt to upstream changes.
Changes since v11:
* Drop the kvm_is_realm() check from timer_set_offset(). We already
ensure that the offset is 0 when calling the function.
Changes since v10:
* KVM_CAP_COUNTER_OFFSET is now already hidden by a previous patch.
Changes since v9:
* No need to move the call to kvm_timer_unblocking() in
kvm_timer_vcpu_load().
Changes since v7:
* Hide KVM_CAP_COUNTER_OFFSET for realm guests.
---
arch/arm64/kvm/arch_timer.c | 28 +++++++++++++++++++++++++---
include/kvm/arm_arch_timer.h | 2 ++
2 files changed, 27 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index cbea4d9ee955..88ed01edc136 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -470,6 +470,21 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
timer_ctx);
}
+void kvm_realm_timers_update(struct kvm_vcpu *vcpu)
+{
+ struct arch_timer_cpu *arch_timer = &vcpu->arch.timer_cpu;
+ int i;
+
+ for (i = 0; i < NR_KVM_EL0_TIMERS; i++) {
+ struct arch_timer_context *timer = &arch_timer->timers[i];
+ bool status = timer_get_ctl(timer) & ARCH_TIMER_CTRL_IT_STAT;
+ bool level = kvm_timer_irq_can_fire(timer) && status;
+
+ if (level != timer->irq.level)
+ kvm_timer_update_irq(vcpu, level, timer);
+ }
+}
+
/* Only called for a fully emulated timer */
static void timer_emulate(struct arch_timer_context *ctx)
{
@@ -1079,7 +1094,7 @@ static void timer_context_init(struct kvm_vcpu *vcpu, int timerid)
ctxt->timer_id = timerid;
- if (!kvm_vm_is_protected(vcpu->kvm)) {
+ if (!kvm_vm_is_protected(vcpu->kvm) && !kvm_is_realm(vcpu->kvm)) {
if (timerid == TIMER_VTIMER)
ctxt->offset.vm_offset = &kvm->arch.timer_data.voffset;
else
@@ -1110,7 +1125,7 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
timer_context_init(vcpu, i);
/* Synchronize offsets across timers of a VM if not already provided */
- if (!vcpu_is_protected(vcpu) &&
+ if (!vcpu_is_protected(vcpu) && !kvm_is_realm(vcpu->kvm) &&
!test_bit(KVM_ARCH_FLAG_VM_COUNTER_OFFSET, &vcpu->kvm->arch.flags)) {
timer_set_offset(vcpu_vtimer(vcpu), kvm_phys_timer_read());
timer_set_offset(vcpu_ptimer(vcpu), 0);
@@ -1611,6 +1626,13 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
return -EINVAL;
}
+ /*
+ * We don't use mapped IRQs for Realms because the RMI doesn't allow
+ * us to set the LR.HW bit in the VGIC.
+ */
+ if (vcpu_is_rec(vcpu))
+ return 0;
+
get_timer_map(vcpu, &map);
ops = vgic_is_v5(vcpu->kvm) ? &arch_timer_irq_ops_vgic_v5 :
@@ -1740,7 +1762,7 @@ int kvm_vm_ioctl_set_counter_offset(struct kvm *kvm,
if (offset->reserved)
return -EINVAL;
- if (kvm_vm_is_protected(kvm))
+ if (kvm_vm_is_protected(kvm) || kvm_is_realm(kvm))
return -EINVAL;
mutex_lock(&kvm->lock);
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index bf8cc9589bd0..ffdb90dcad58 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -113,6 +113,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
+void kvm_realm_timers_update(struct kvm_vcpu *vcpu);
+
u64 kvm_phys_timer_read(void);
void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu);
--
2.43.0
^ permalink raw reply related
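
The line-level computation in kvm_realm_timers_update() in the patch above can be illustrated with a small stand-alone model. This is a sketch under stated assumptions: the bit positions follow the architectural CNTV_CTL_EL0/CNTP_CTL_EL0 layout (ENABLE bit 0, IMASK bit 1, ISTATUS bit 2), and model_timer_can_fire() is a hypothetical approximation of kvm_timer_irq_can_fire(), whose body is not part of this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CNT{V,P}_CTL_EL0 bits, named MODEL_* to mark them as local to this sketch. */
#define MODEL_CTRL_ENABLE  (UINT32_C(1) << 0)
#define MODEL_CTRL_IT_MASK (UINT32_C(1) << 1)
#define MODEL_CTRL_IT_STAT (UINT32_C(1) << 2)

/* Hypothetical helper: the timer can fire when it is enabled and not masked. */
static bool model_timer_can_fire(uint32_t ctl)
{
	return (ctl & (MODEL_CTRL_IT_MASK | MODEL_CTRL_ENABLE)) == MODEL_CTRL_ENABLE;
}

/* Level that would be driven on the interrupt line for a given CTL value. */
static bool model_timer_line_level(uint32_t ctl)
{
	return model_timer_can_fire(ctl) && (ctl & MODEL_CTRL_IT_STAT);
}
```

In this model the line is asserted only when the status bit reported on exit is set while the timer is enabled and unmasked; a masked timer with a pending status keeps the line low.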
* [PATCH v14 20/44] arm64: RMI: Support for the VGIC in realms
From: Steven Price @ 2026-05-13 13:17 UTC (permalink / raw)
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
The RMM provides emulation of a VGIC to the realm guest. With RMM v2.0
the registers are passed in the system registers so this works similarly
to a normal guest, but kvm_arch_vcpu_put() needs reordering to early out,
and realm guests don't support GICv2 even if the host does.
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes from v12:
* GIC registers are now passed in the system registers rather than via
rec_entry/rec_exit which removes most of the changes.
Changes from v11:
* Minor changes to align with the previous patches. Note that the VGIC
handling will change with RMM v2.0.
Changes from v10:
* Make sure we sync the VGIC v4 state, and only populate valid lrs from
the list.
Changes from v9:
* Copy gicv3_vmcr from the RMM at the same time as gicv3_hcr rather
than having to handle that as a special case.
Changes from v8:
* Propagate gicv3_hcr from the RMM.
Changes from v5:
* Handle RMM providing fewer GIC LRs than the hardware supports.
---
arch/arm64/kvm/arm.c | 11 ++++++++---
arch/arm64/kvm/vgic/vgic-init.c | 2 +-
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 93d34762db91..21d9dfdb1ea0 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -786,19 +786,24 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
kvm_call_hyp_nvhe(__pkvm_vcpu_put);
}
+ kvm_timer_vcpu_put(vcpu);
+ kvm_vgic_put(vcpu);
+
+ vcpu->cpu = -1;
+
+ if (vcpu_is_rec(vcpu))
+ return;
+
kvm_vcpu_put_debug(vcpu);
kvm_arch_vcpu_put_fp(vcpu);
if (has_vhe())
kvm_vcpu_put_vhe(vcpu);
- kvm_timer_vcpu_put(vcpu);
- kvm_vgic_put(vcpu);
kvm_vcpu_pmu_restore_host(vcpu);
if (vcpu_has_nv(vcpu))
kvm_vcpu_put_hw_mmu(vcpu);
kvm_arm_vmid_clear_active();
vcpu_clear_on_unsupported_cpu(vcpu);
- vcpu->cpu = -1;
}
static void __kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 933983bb2005..a9db963dfd23 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -81,7 +81,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
* the proper checks already.
*/
if (type == KVM_DEV_TYPE_ARM_VGIC_V2 &&
- !kvm_vgic_global_state.can_emulate_gicv2)
+ (!kvm_vgic_global_state.can_emulate_gicv2 || kvm_is_realm(kvm)))
return -ENODEV;
/*
--
2.43.0
^ permalink raw reply related
* [PATCH v14 19/44] arm64: RMI: Allocate/free RECs to match vCPUs
From: Steven Price @ 2026-05-13 13:17 UTC (permalink / raw)
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
The RMM maintains a data structure known as the Realm Execution Context
(or REC). It is similar to struct kvm_vcpu and tracks the state of the
virtual CPUs. KVM must delegate memory and request that the structures be
created when vCPUs are created, and tear them down on destruction.
RECs may require additional pages (e.g. for storing larger register
state for SVE). The RMM can request extra pages for this purpose using
the Stateful RMI Operations (SRO) functionality to request pages during
REC creation. These pages are then passed back to the host from the RMM
('reclaimed') when the REC is destroyed. The kernel tracking object
(struct rmi_sro_state) is stored in the realm_rec structure to avoid
memory allocation during the destruction path.
Note that only some of the register state for the REC can be set by KVM, the
rest is defined by the RMM (zeroed). The register state then cannot be
changed by KVM after the REC is created (except when the guest
explicitly requests this e.g. by performing a PSCI call).
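As a rough illustration of the paragraph above, the user-space sketch
below mirrors which fields kvm_create_rec() fills in before the REC is
created: the first few GPRs, the PC, the MPIDR and the runnable flag for
vCPU 0 only. The struct layouts, SKETCH_NR_GPRS and SKETCH_FLAG_RUNNABLE
are hypothetical stand-ins, not the real struct rec_params or RMM
definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NR_GPRS		8	/* hypothetical GPR count passed at creation */
#define SKETCH_FLAG_RUNNABLE	1u	/* hypothetical stand-in for REC_PARAMS_FLAG_RUNNABLE */

/* Hypothetical, simplified stand-in for struct rec_params */
struct sketch_rec_params {
	uint64_t flags;
	uint64_t mpidr;
	uint64_t pc;
	uint64_t gprs[SKETCH_NR_GPRS];
};

/* Hypothetical, simplified stand-in for the vCPU state KVM holds */
struct sketch_vcpu {
	unsigned int vcpu_id;
	uint64_t mpidr;
	uint64_t pc;
	uint64_t regs[31];
};

/* Copy only the state the host is allowed to choose; the rest stays zero */
static void sketch_fill_rec_params(struct sketch_rec_params *params,
				   const struct sketch_vcpu *vcpu)
{
	size_t i;

	assert(params && vcpu);

	memset(params, 0, sizeof(*params));

	for (i = 0; i < SKETCH_NR_GPRS; i++)
		params->gprs[i] = vcpu->regs[i];

	params->pc = vcpu->pc;
	params->mpidr = vcpu->mpidr;

	/* Only the boot vCPU starts runnable; others wait for a PSCI call */
	if (vcpu->vcpu_id == 0)
		params->flags |= SKETCH_FLAG_RUNNABLE;
}
```

Anything not copied here is left zeroed in the sketch, which loosely
models the "defined by the RMM (zeroed)" part of the description.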
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v13:
* Support SRO for REC creation/destruction instead of auxiliary
granules.
Changes since v12:
* Use the new range-based delegation RMI.
Changes since v11:
* Remove the KVM_ARM_VCPU_REC feature. User space no longer needs to
configure each VCPU separately, RECs are created on the first VCPU
run of the guest.
Changes since v9:
* Size the aux_pages array according to the PAGE_SIZE of the host.
Changes since v7:
* Add comment explaining the aux_pages array.
* Rename "undeleted_failed" variable to "should_free" to avoid a
confusing double negative.
Changes since v6:
* Avoid reporting the KVM_ARM_VCPU_REC feature if the guest isn't a
realm guest.
* Support host page size being larger than RMM's granule size when
allocating/freeing aux granules.
Changes since v5:
* Separate the concept of vcpu_is_rec() and
kvm_arm_vcpu_rec_finalized() by using the KVM_ARM_VCPU_REC feature as
the indication that the VCPU is a REC.
Changes since v2:
* Free rec->run earlier in kvm_destroy_realm() and adapt to previous patches.
---
arch/arm64/include/asm/kvm_emulate.h | 2 +-
arch/arm64/include/asm/kvm_host.h | 3 +
arch/arm64/include/asm/kvm_rmi.h | 17 +++++
arch/arm64/kvm/arm.c | 6 ++
arch/arm64/kvm/reset.c | 1 +
arch/arm64/kvm/rmi.c | 105 +++++++++++++++++++++++++++
6 files changed, 133 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 82fd777bd9bb..2e69fe494716 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -714,7 +714,7 @@ static inline bool kvm_realm_is_created(struct kvm *kvm)
static inline bool vcpu_is_rec(const struct kvm_vcpu *vcpu)
{
- return false;
+ return kvm_is_realm(vcpu->kvm);
}
#endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3512696ed506..39b5de03d0fe 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -969,6 +969,9 @@ struct kvm_vcpu_arch {
/* Hyp-readable copy of kvm_vcpu::pid */
pid_t pid;
+
+ /* Realm meta data */
+ struct realm_rec rec;
};
/*
diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
index 8bd743093ccf..d99bf4fc3c39 100644
--- a/arch/arm64/include/asm/kvm_rmi.h
+++ b/arch/arm64/include/asm/kvm_rmi.h
@@ -59,6 +59,22 @@ struct realm {
unsigned int ia_bits;
};
+/**
+ * struct realm_rec - Additional per VCPU data for a Realm
+ *
+ * @mpidr: MPIDR (Multiprocessor Affinity Register) value to identify this VCPU
+ * @rec_page: Kernel VA of the RMM's private page for this REC
+ * @aux_pages: Additional pages private to the RMM for this REC
+ * @run: Kernel VA of the RmiRecRun structure shared with the RMM
+ * @sro: A preallocated SRO state context
+ */
+struct realm_rec {
+ unsigned long mpidr;
+ void *rec_page;
+ struct rec_run *run;
+ struct rmi_sro_state *sro;
+};
+
void kvm_init_rmi(void);
u32 kvm_realm_ipa_limit(void);
@@ -66,6 +82,7 @@ int kvm_init_realm(struct kvm *kvm);
int kvm_activate_realm(struct kvm *kvm);
void kvm_destroy_realm(struct kvm *kvm);
void kvm_realm_destroy_rtts(struct kvm *kvm);
+void kvm_destroy_rec(struct kvm_vcpu *vcpu);
static inline bool kvm_realm_is_private_address(struct realm *realm,
unsigned long addr)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index eb2b61fe1f0a..93d34762db91 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -586,6 +586,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
/* Force users to call KVM_ARM_VCPU_INIT */
vcpu_clear_flag(vcpu, VCPU_INITIALIZED);
+ vcpu->arch.rec.mpidr = INVALID_HWID;
+
vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
/* Set up the timer */
@@ -1651,6 +1653,10 @@ static int kvm_vcpu_init_check_features(struct kvm_vcpu *vcpu,
if (test_bit(KVM_ARM_VCPU_HAS_EL2, &features))
return -EINVAL;
+ /* Realms are incompatible with AArch32 */
+ if (vcpu_is_rec(vcpu))
+ return -EINVAL;
+
return 0;
}
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index b963fd975aac..c18cdca7d125 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -161,6 +161,7 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
free_page((unsigned long)vcpu->arch.ctxt.vncr_array);
kfree(vcpu->arch.vncr_tlb);
kfree(vcpu->arch.ccsidr);
+ kvm_destroy_rec(vcpu);
}
static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
index 849111817af7..353a5ca45e78 100644
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -173,9 +173,108 @@ static int realm_ensure_created(struct kvm *kvm)
return -ENXIO;
}
+static int kvm_create_rec(struct kvm_vcpu *vcpu)
+{
+ struct user_pt_regs *vcpu_regs = vcpu_gp_regs(vcpu);
+ unsigned long mpidr = kvm_vcpu_get_mpidr_aff(vcpu);
+ struct realm *realm = &vcpu->kvm->arch.realm;
+ struct realm_rec *rec = &vcpu->arch.rec;
+ unsigned long rec_page_phys;
+ struct rec_params *params;
+ int r, i;
+
+ if (rec->run)
+ return -EBUSY;
+
+ /*
+ * The RMM will report PSCI v1.0 to Realms and the KVM_ARM_VCPU_PSCI_0_2
+ * flag covers v0.2 and onwards.
+ */
+ if (!vcpu_has_feature(vcpu, KVM_ARM_VCPU_PSCI_0_2))
+ return -EINVAL;
+
+ BUILD_BUG_ON(sizeof(*params) > PAGE_SIZE);
+ BUILD_BUG_ON(sizeof(*rec->run) > PAGE_SIZE);
+
+ params = (struct rec_params *)get_zeroed_page(GFP_KERNEL);
+ rec->rec_page = (void *)__get_free_page(GFP_KERNEL);
+ rec->run = (void *)get_zeroed_page(GFP_KERNEL);
+ rec->sro = kmalloc_obj(*rec->sro);
+ if (!params || !rec->rec_page || !rec->run || !rec->sro) {
+ r = -ENOMEM;
+ goto out_free_pages;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(params->gprs); i++)
+ params->gprs[i] = vcpu_regs->regs[i];
+
+ params->pc = vcpu_regs->pc;
+
+ if (vcpu->vcpu_id == 0)
+ params->flags |= REC_PARAMS_FLAG_RUNNABLE;
+
+ rec_page_phys = virt_to_phys(rec->rec_page);
+
+ if (rmi_delegate_page(rec_page_phys)) {
+ r = -ENXIO;
+ goto out_free_pages;
+ }
+
+ params->mpidr = mpidr;
+
+ if (rmi_rec_create(virt_to_phys(realm->rd), rec_page_phys,
+ virt_to_phys(params), rec->sro)) {
+ r = -ENXIO;
+ goto out_undelegate_rmm_rec;
+ }
+
+ rec->mpidr = mpidr;
+
+ free_page((unsigned long)params);
+ return 0;
+
+out_undelegate_rmm_rec:
+ if (WARN_ON(rmi_undelegate_page(rec_page_phys)))
+ rec->rec_page = NULL;
+out_free_pages:
+ free_page((unsigned long)rec->run);
+ free_page((unsigned long)rec->rec_page);
+ free_page((unsigned long)params);
+ kfree(rec->sro);
+ rec->run = NULL;
+ return r;
+}
+
+void kvm_destroy_rec(struct kvm_vcpu *vcpu)
+{
+ struct realm_rec *rec = &vcpu->arch.rec;
+ unsigned long rec_page_phys;
+
+ if (!vcpu_is_rec(vcpu))
+ return;
+
+ if (!rec->run) {
+ /* Nothing to do if the VCPU hasn't been finalized */
+ return;
+ }
+
+ free_page((unsigned long)rec->run);
+
+ rec_page_phys = virt_to_phys(rec->rec_page);
+
+ if (WARN_ON(rmi_rec_destroy(rec_page_phys, rec->sro)))
+ return;
+
+ kfree(rec->sro);
+
+ free_delegated_page(rec_page_phys);
+}
+
int kvm_activate_realm(struct kvm *kvm)
{
struct realm *realm = &kvm->arch.realm;
+ struct kvm_vcpu *vcpu;
+ unsigned long i;
int ret;
if (kvm_realm_state(kvm) >= REALM_STATE_ACTIVE)
@@ -198,6 +297,12 @@ int kvm_activate_realm(struct kvm *kvm)
/* Mark state as dead in case we fail */
kvm_set_realm_state(kvm, REALM_STATE_DEAD);
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ ret = kvm_create_rec(vcpu);
+ if (ret)
+ return ret;
+ }
+
ret = rmi_realm_activate(virt_to_phys(realm->rd));
if (ret)
return -ENXIO;
--
2.43.0
* [PATCH v14 18/44] arm64: RMI: Activate realm on first VCPU run
From: Steven Price @ 2026-05-13 13:17 UTC (permalink / raw)
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
Use kvm_arch_vcpu_run_pid_change() to check if this is the first time
the realm guest has run. If this is the first run then activate the
realm.
Before the realm can be activated it must first be created; this is a
stub in this patch and will be filled in by a later patch.
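The state handling in kvm_activate_realm() can be modelled in user space
roughly as follows. This is only a sketch: the enum mirrors the order of
enum realm_state, the rmm_result argument is a hypothetical stand-in for
the return value of rmi_realm_activate(), and the config_lock with its
second check under the lock is left out for brevity.

```c
#include <assert.h>
#include <errno.h>

/* Same ordering as enum realm_state in the series */
enum sketch_realm_state {
	SKETCH_STATE_NONE,
	SKETCH_STATE_NEW,
	SKETCH_STATE_ACTIVE,
	SKETCH_STATE_DYING,
	SKETCH_STATE_DEAD,
};

struct sketch_realm {
	enum sketch_realm_state state;
	int activate_calls;	/* counts how often the "RMM" was asked */
};

/*
 * rmm_result is a hypothetical stand-in for rmi_realm_activate():
 * zero means the RMM accepted the activation.
 */
static int sketch_activate_realm(struct sketch_realm *realm, int rmm_result)
{
	assert(realm);

	/* ACTIVE, DYING and DEAD all compare >= ACTIVE: nothing to do */
	if (realm->state >= SKETCH_STATE_ACTIVE)
		return 0;

	/* Mark the realm dead first so a failure cannot be retried */
	realm->state = SKETCH_STATE_DEAD;

	realm->activate_calls++;
	if (rmm_result)
		return -ENXIO;

	realm->state = SKETCH_STATE_ACTIVE;
	return 0;
}
```

Because the early-out uses ">= ACTIVE", a realm whose activation failed
stays dead in this model and a later call returns without contacting the
RMM again, which is the behaviour the v12 changelog entry describes.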
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v12:
* Fix commit message
* Change realm_state checks to be >= REALM_STATE_ACTIVE to avoid a dead
guest being revived by kvm_activate_realm().
---
arch/arm64/include/asm/kvm_rmi.h | 1 +
arch/arm64/kvm/arm.c | 6 +++++
arch/arm64/kvm/rmi.c | 39 ++++++++++++++++++++++++++++++++
3 files changed, 46 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
index 06ba0d4745c6..8bd743093ccf 100644
--- a/arch/arm64/include/asm/kvm_rmi.h
+++ b/arch/arm64/include/asm/kvm_rmi.h
@@ -63,6 +63,7 @@ void kvm_init_rmi(void);
u32 kvm_realm_ipa_limit(void);
int kvm_init_realm(struct kvm *kvm);
+int kvm_activate_realm(struct kvm *kvm);
void kvm_destroy_realm(struct kvm *kvm);
void kvm_realm_destroy_rtts(struct kvm *kvm);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 41d35b2d1dee..eb2b61fe1f0a 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1018,6 +1018,12 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
return ret;
}
+ if (kvm_is_realm(vcpu->kvm)) {
+ ret = kvm_activate_realm(kvm);
+ if (ret)
+ return ret;
+ }
+
mutex_lock(&kvm->arch.config_lock);
set_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags);
mutex_unlock(&kvm->arch.config_lock);
diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
index 5b00ccca4af3..849111817af7 100644
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -167,6 +167,45 @@ void kvm_realm_destroy_rtts(struct kvm *kvm)
realm_tear_down_rtt_range(realm, 0, (1UL << ia_bits));
}
+static int realm_ensure_created(struct kvm *kvm)
+{
+ /* Provided in later patch */
+ return -ENXIO;
+}
+
+int kvm_activate_realm(struct kvm *kvm)
+{
+ struct realm *realm = &kvm->arch.realm;
+ int ret;
+
+ if (kvm_realm_state(kvm) >= REALM_STATE_ACTIVE)
+ return 0;
+
+ if (!irqchip_in_kernel(kvm)) {
+ /* Userspace irqchip not yet supported with realms */
+ return -EOPNOTSUPP;
+ }
+
+ guard(mutex)(&kvm->arch.config_lock);
+ /* Check again with the lock held */
+ if (kvm_realm_state(kvm) >= REALM_STATE_ACTIVE)
+ return 0;
+
+ ret = realm_ensure_created(kvm);
+ if (ret)
+ return ret;
+
+ /* Mark state as dead in case we fail */
+ kvm_set_realm_state(kvm, REALM_STATE_DEAD);
+
+ ret = rmi_realm_activate(virt_to_phys(realm->rd));
+ if (ret)
+ return -ENXIO;
+
+ kvm_set_realm_state(kvm, REALM_STATE_ACTIVE);
+ return 0;
+}
+
void kvm_destroy_realm(struct kvm *kvm)
{
struct realm *realm = &kvm->arch.realm;
--
2.43.0
* [PATCH v14 17/44] arm64: RMI: RTT tear down
From: Steven Price @ 2026-05-13 13:17 UTC (permalink / raw)
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
The RMM owns the stage 2 page tables for a realm, and KVM must request
that the RMM creates/destroys entries as necessary. The physical pages
to store the page tables are delegated to the realm as required, and can
be undelegated when no longer used.
Creating new RTTs is the easy part; tearing them down is a little
trickier. The result of realm_rtt_destroy() can be used to effectively
walk the tree and destroy the entries (undelegating pages that were
given to the realm).
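To make the walk a little more concrete, the sketch below reproduces the
address arithmetic that realm_tear_down_rtt_level() relies on: the size
mapped by one entry at each level, and the test for whether a table at a
given level lies wholly inside the target range (otherwise the walk has
to recurse one level deeper). It assumes 4K pages and uses the same
formula as ARM64_HW_PGTABLE_LEVEL_SHIFT(); all sketch_* names are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumption: 4K host pages */

/* Same formula as ARM64_HW_PGTABLE_LEVEL_SHIFT(level) */
static unsigned int sketch_level_shift(int level)
{
	return (SKETCH_PAGE_SHIFT - 3) * (4 - level) + 3;
}

/* Bytes of IPA space mapped by a single entry at @level */
static uint64_t sketch_level_mapsize(int level)
{
	return UINT64_C(1) << sketch_level_shift(level);
}

/* Round @x up to a multiple of @a (a power of two) */
static uint64_t sketch_align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * A table at @level spans one entry of the level above it. It can be
 * destroyed in a single call only if @addr is aligned to that span and
 * the span ends at or before @end; otherwise the walk must descend.
 */
static bool sketch_can_destroy_whole_table(int level, uint64_t addr,
					   uint64_t end)
{
	uint64_t map_size, next_addr;

	assert(level >= 0 && level <= 3);

	map_size = sketch_level_mapsize(level - 1);
	next_addr = sketch_align_up(addr + 1, map_size);

	return sketch_align_up(addr, map_size) == addr && next_addr <= end;
}
```

With 4K pages a level 3 table spans 2MiB, so a 2MiB-aligned range of
that size can be handed to a single destroy call, while an unaligned
start or a shorter range forces the recursion.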
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v13:
* Avoid the double call of kvm_free_stage2_pgd() by splitting the work
across that and a new function kvm_realm_uninit_stage2() which is
only called for realm guests.
Changes since v12:
* Simplify some functions now we know RMM page size is the same as the
host's.
Changes since v11:
* Moved some code from earlier in the series to this one so that it's
added when it's first used.
Changes since v10:
* RME->RMI rename.
* Some code to handle freeing stage 2 PGD moved into this patch where
it belongs.
Changes since v9:
* Add a comment clarifying that root level RTTs are not destroyed until
after the RD is destroyed.
Changes since v8:
* Introduce free_rtt() wrapper which calls free_delegated_granule()
followed by kvm_account_pgtable_pages(). This makes it clear where an
RTT is being freed rather than just a delegated granule.
Changes since v6:
* Move rme_rtt_level_mapsize() and supporting defines from kvm_rme.h
into rme.c as they are only used in that file.
Changes since v5:
* Rename some RME_xxx defines to do with page sizes as RMM_xxx - they are
a property of the RMM specification not the RME architecture.
Changes since v2:
* Moved {alloc,free}_delegated_page() and ensure_spare_page() to a
later patch when they are actually used.
* Some simplifications now rmi_xxx() functions allow NULL as an output
parameter.
* Improved comments and code layout.
---
arch/arm64/include/asm/kvm_rmi.h | 7 ++
arch/arm64/kvm/mmu.c | 21 ++++-
arch/arm64/kvm/rmi.c | 148 +++++++++++++++++++++++++++++++
3 files changed, 174 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
index 9de34983ee52..06ba0d4745c6 100644
--- a/arch/arm64/include/asm/kvm_rmi.h
+++ b/arch/arm64/include/asm/kvm_rmi.h
@@ -64,5 +64,12 @@ u32 kvm_realm_ipa_limit(void);
int kvm_init_realm(struct kvm *kvm);
void kvm_destroy_realm(struct kvm *kvm);
+void kvm_realm_destroy_rtts(struct kvm *kvm);
+
+static inline bool kvm_realm_is_private_address(struct realm *realm,
+ unsigned long addr)
+{
+ return !(addr & BIT(realm->ia_bits - 1));
+}
#endif /* __ASM_KVM_RMI_H */
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ba8286472286..eb56d4e7f21a 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1024,9 +1024,26 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
return err;
}
+static void kvm_realm_uninit_stage2(struct kvm_s2_mmu *mmu)
+{
+ struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
+ struct realm *realm = &kvm->arch.realm;
+
+ if (kvm_realm_state(kvm) != REALM_STATE_ACTIVE)
+ return;
+
+ write_lock(&kvm->mmu_lock);
+ kvm_stage2_unmap_range(mmu, 0, BIT(realm->ia_bits - 1), true);
+ write_unlock(&kvm->mmu_lock);
+ kvm_realm_destroy_rtts(kvm);
+}
+
void kvm_uninit_stage2_mmu(struct kvm *kvm)
{
- kvm_free_stage2_pgd(&kvm->arch.mmu);
+ if (kvm_is_realm(kvm))
+ kvm_realm_uninit_stage2(&kvm->arch.mmu);
+ else
+ kvm_free_stage2_pgd(&kvm->arch.mmu);
kvm_mmu_free_memory_cache(&kvm->arch.mmu.split_page_cache);
}
@@ -1103,7 +1120,7 @@ void stage2_unmap_vm(struct kvm *kvm)
void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
{
struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
- struct kvm_pgtable *pgt = NULL;
+ struct kvm_pgtable *pgt;
write_lock(&kvm->mmu_lock);
pgt = mmu->pgt;
diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
index f51ec667445e..5b00ccca4af3 100644
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -11,6 +11,14 @@
#include <asm/rmi_cmds.h>
#include <asm/virt.h>
+static inline unsigned long rmi_rtt_level_mapsize(int level)
+{
+ if (WARN_ON(level > KVM_PGTABLE_LAST_LEVEL))
+ return PAGE_SIZE;
+
+ return (1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(level));
+}
+
static bool rmi_has_feature(unsigned long feature)
{
return !!u64_get_bits(rmm_feat_reg0, feature);
@@ -21,6 +29,144 @@ u32 kvm_realm_ipa_limit(void)
return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
}
+static int get_start_level(struct realm *realm)
+{
+ return 4 - stage2_pgtable_levels(realm->ia_bits);
+}
+
+static void free_rtt(phys_addr_t phys)
+{
+ if (free_delegated_page(phys))
+ return;
+
+ kvm_account_pgtable_pages(phys_to_virt(phys), -1);
+}
+
+/*
+ * realm_rtt_destroy - Destroy an RTT at @level for @addr.
+ *
+ * Returns - Result of the RMI_RTT_DESTROY call, and:
+ * @rtt_granule: RTT granule, if the RTT was destroyed.
+ * @next_addr: IPA corresponding to the next possible valid entry we
+ * can target
+ */
+static int realm_rtt_destroy(struct realm *realm, unsigned long addr,
+ int level, phys_addr_t *rtt_granule,
+ unsigned long *next_addr)
+{
+ unsigned long out_rtt;
+ int ret;
+
+ ret = rmi_rtt_destroy(virt_to_phys(realm->rd), addr, level,
+ &out_rtt, next_addr);
+
+ *rtt_granule = out_rtt;
+
+ return ret;
+}
+
+static int realm_tear_down_rtt_level(struct realm *realm, int level,
+ unsigned long start, unsigned long end)
+{
+ ssize_t map_size;
+ unsigned long addr, next_addr;
+
+ if (WARN_ON(level > KVM_PGTABLE_LAST_LEVEL))
+ return -EINVAL;
+
+ map_size = rmi_rtt_level_mapsize(level - 1);
+
+ for (addr = start; addr < end; addr = next_addr) {
+ phys_addr_t rtt_granule;
+ int ret;
+ unsigned long align_addr = ALIGN(addr, map_size);
+
+ next_addr = ALIGN(addr + 1, map_size);
+
+ if (next_addr > end || align_addr != addr) {
+ /*
+ * The target range is smaller than what this level
+ * covers, recurse deeper.
+ */
+ ret = realm_tear_down_rtt_level(realm,
+ level + 1,
+ addr,
+ min(next_addr, end));
+ if (ret)
+ return ret;
+ continue;
+ }
+
+ ret = realm_rtt_destroy(realm, addr, level,
+ &rtt_granule, &next_addr);
+
+ switch (RMI_RETURN_STATUS(ret)) {
+ case RMI_SUCCESS:
+ free_rtt(rtt_granule);
+ break;
+ case RMI_ERROR_RTT:
+ if (next_addr > addr) {
+ /* Missing RTT, skip */
+ break;
+ }
+ /*
+ * We tear down the RTT range for the full IPA
+ * space, after everything is unmapped. Also we
+ * descend down only if we cannot tear down a
+ * top level RTT. Thus RMM must be able to walk
+ * to the requested level. e.g., a block mapping
+ * exists at L1 or L2.
+ */
+ if (WARN_ON(RMI_RETURN_INDEX(ret) != level))
+ return -EBUSY;
+ if (WARN_ON(level == KVM_PGTABLE_LAST_LEVEL))
+ return -EBUSY;
+
+ /*
+ * The table has active entries in it, recurse deeper
+ * and tear down the RTTs.
+ */
+ next_addr = ALIGN(addr + 1, map_size);
+ ret = realm_tear_down_rtt_level(realm,
+ level + 1,
+ addr,
+ next_addr);
+ if (ret)
+ return ret;
+ /*
+ * Now that the child RTTs are destroyed,
+ * retry at this level.
+ */
+ next_addr = addr;
+ break;
+ default:
+ WARN_ON(1);
+ return -ENXIO;
+ }
+ }
+
+ return 0;
+}
+
+static int realm_tear_down_rtt_range(struct realm *realm,
+ unsigned long start, unsigned long end)
+{
+ /*
+ * Root level RTTs can only be destroyed after the RD is destroyed. So
+ * tear down everything below the root level
+ */
+ return realm_tear_down_rtt_level(realm, get_start_level(realm) + 1,
+ start, end);
+}
+
+void kvm_realm_destroy_rtts(struct kvm *kvm)
+{
+ struct realm *realm = &kvm->arch.realm;
+ unsigned int ia_bits = realm->ia_bits;
+
+ realm_tear_down_rtt_range(realm, 0, (1UL << ia_bits));
+}
+
void kvm_destroy_realm(struct kvm *kvm)
{
struct realm *realm = &kvm->arch.realm;
@@ -47,6 +193,8 @@ void kvm_destroy_realm(struct kvm *kvm)
if (WARN_ON(rmi_realm_terminate(rd_phys)))
return;
+ kvm_realm_destroy_rtts(kvm);
+
if (WARN_ON(rmi_realm_destroy(rd_phys)))
return;
free_delegated_page(rd_phys);
--
2.43.0
* [PATCH v14 16/44] KVM: arm64: Allow passing machine type in KVM creation
From: Steven Price @ 2026-05-13 13:17 UTC (permalink / raw)
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
Previously machine type was used purely for specifying the physical
address size of the guest. Reserve the higher bits to specify an ARM
specific machine type and declare a new type 'KVM_VM_TYPE_ARM_REALM'
used to create a realm guest.
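As an illustration of the encoding, the sketch below packs an IPA size
and a machine type into one value and applies the same two checks this
patch adds to kvm_arch_init_vm(). The SKETCH_* macros are local copies
of the values in the uapi hunk of this patch, and the rmi_available
argument is a hypothetical stand-in for the kvm_rmi_is_available static
key; this is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the values defined in include/uapi/linux/kvm.h */
#define SKETCH_IPA_SIZE_MASK	UINT64_C(0xff)
#define SKETCH_IPA_SIZE(x)	((x) & SKETCH_IPA_SIZE_MASK)
#define SKETCH_TYPE_NORMAL	UINT64_C(0)
#define SKETCH_TYPE_REALM	(UINT64_C(1) << 30)
#define SKETCH_TYPE_PROTECTED	(UINT64_C(1) << 31)

/*
 * Mirror the two checks added by this patch. @is_realm reports whether
 * the VM would be marked as a realm.
 */
static int sketch_check_vm_type(uint64_t type, bool rmi_available,
				bool *is_realm)
{
	assert(is_realm);
	*is_realm = false;

	/* A VM cannot be both a pKVM protected VM and a realm */
	if ((type & SKETCH_TYPE_PROTECTED) && (type & SKETCH_TYPE_REALM))
		return -EINVAL;

	if (type & SKETCH_TYPE_REALM) {
		if (!rmi_available)
			return -EINVAL;
		*is_realm = true;
	}

	return 0;
}
```

A VMM would then pass something like SKETCH_TYPE_REALM |
SKETCH_IPA_SIZE(48) as the machine type; the low byte still carries the
IPA size exactly as before, so existing users that only set the size are
unaffected.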
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v13:
* Rework to use the two top bits for the machine type now that pKVM has
merged and used the top bit for KVM_VM_TYPE_ARM_PROTECTED.
* Update the documentation to include KVM_VM_TYPE_ARM_PROTECTED as
well.
Changes since v9:
* Explicitly set realm.state to REALM_STATE_NONE rather than rely on the
zeroing of the structure.
Changes since v7:
* Add some documentation explaining the new machine type.
Changes since v6:
* Make the check for kvm_rme_is_available more visible and report an
error code of -EPERM (instead of -EINVAL) to make it explicit that
the kernel supports RME, but the platform doesn't.
---
Documentation/virt/kvm/api.rst | 18 ++++++++++++++++--
arch/arm64/kvm/arm.c | 11 +++++++++++
include/uapi/linux/kvm.h | 7 ++++++-
3 files changed, 33 insertions(+), 3 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index ca68aae7faa2..31a5919d8d5f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -181,8 +181,22 @@ flag KVM_VM_MIPS_VZ.
ARM64:
^^^^^^
-On arm64, the physical address size for a VM (IPA Size limit) is limited
-to 40bits by default. The limit can be configured if the host supports the
+On arm64, the machine type identifier is used to encode a type and the
+physical address size for the VM. The lower byte (bits[7-0]) encodes the
+address size and the upper bits[30-31] encode a machine type. The machine
+types that might be available are:
+
+ ========================= ============================================
+ KVM_VM_TYPE_ARM_NORMAL A standard VM
+ KVM_VM_TYPE_ARM_REALM A "Realm" VM using the Arm Confidential
+ Compute extensions, the VM's memory is
+ protected from the host.
+ KVM_VM_TYPE_ARM_PROTECTED A "protected" VM using pKVM to isolate the
+ VM from the host.
+ ========================= ============================================
+
+The physical address size for a VM (IPA Size limit) is limited to 40bits
+by default. The limit can be configured if the host supports the
extension KVM_CAP_ARM_VM_IPA_SIZE. When supported, use
KVM_VM_TYPE_ARM_IPA_SIZE(IPA_Bits) to set the size in the machine type
identifier, where IPA_Bits is the maximum width of any physical
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c6ebc5913e40..41d35b2d1dee 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -246,6 +246,17 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
mutex_unlock(&kvm->lock);
#endif
+ if ((type & KVM_VM_TYPE_ARM_PROTECTED) &&
+ (type & KVM_VM_TYPE_ARM_REALM))
+ return -EINVAL;
+
+ if (type & KVM_VM_TYPE_ARM_REALM) {
+ if (!static_branch_unlikely(&kvm_rmi_is_available))
+ return -EINVAL;
+ kvm_set_realm_state(kvm, REALM_STATE_NONE);
+ kvm->arch.is_realm = true;
+ }
+
kvm_init_nested(kvm);
ret = kvm_share_hyp(kvm, kvm + 1);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b8cff0938041..7b2507a3865e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -700,14 +700,19 @@ struct kvm_enable_cap {
* address size for the VM. Bits[7-0] are reserved for the guest
* PA size shift (i.e, log2(PA_Size)). For backward compatibility,
* value 0 implies the default IPA size, 40bits.
+ *
+ * Bits[30-31] are reserved for the VM type
*/
#define KVM_VM_TYPE_ARM_IPA_SIZE_MASK 0xffULL
#define KVM_VM_TYPE_ARM_IPA_SIZE(x) \
((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
+#define KVM_VM_TYPE_ARM_NORMAL 0
+#define KVM_VM_TYPE_ARM_REALM (1UL << 30)
#define KVM_VM_TYPE_ARM_PROTECTED (1UL << 31)
#define KVM_VM_TYPE_ARM_MASK (KVM_VM_TYPE_ARM_IPA_SIZE_MASK | \
- KVM_VM_TYPE_ARM_PROTECTED)
+ KVM_VM_TYPE_ARM_PROTECTED | \
+ KVM_VM_TYPE_ARM_REALM)
/*
* ioctls for /dev/kvm fds:
--
2.43.0
* [PATCH v14 15/44] kvm: arm64: Don't expose unsupported capabilities for realm guests
From: Steven Price @ 2026-05-13 13:17 UTC (permalink / raw)
To: kvm, kvmarm
Cc: Suzuki K Poulose, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Zenghui Yu, linux-arm-kernel,
linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2, Steven Price
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
From: Suzuki K Poulose <suzuki.poulose@arm.com>
RMM v2.0 provides no mechanism for the host to perform debug operations
on the guest. So limit the extensions that are visible to an allowlist
so that only those capabilities we can support are advertised.
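The allowlist approach can be sketched in user space as below. The
capability names and numbers are hypothetical placeholders rather than
the real KVM_CAP_* values, and the model assumes the host would
otherwise report every capability as present; the point is only that
anything not explicitly listed is hidden from a realm guest, including
capabilities added in the future.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability IDs, not the real KVM_CAP_* numbers */
enum sketch_cap {
	SKETCH_CAP_IRQCHIP,
	SKETCH_CAP_PSCI_0_2,
	SKETCH_CAP_VM_IPA_SIZE,
	SKETCH_CAP_GUEST_DEBUG,	/* stands in for a debug capability */
	SKETCH_CAP_FUTURE,	/* stands in for something added later */
};

/* Only capabilities named here are visible to a realm */
static bool sketch_realm_ext_allowed(enum sketch_cap ext)
{
	switch (ext) {
	case SKETCH_CAP_IRQCHIP:
	case SKETCH_CAP_PSCI_0_2:
	case SKETCH_CAP_VM_IPA_SIZE:
		return true;
	default:
		return false;
	}
}

/* Model of the check-extension path: 0 means "not advertised" */
static int sketch_check_extension(bool is_realm, enum sketch_cap ext)
{
	assert(ext >= SKETCH_CAP_IRQCHIP && ext <= SKETCH_CAP_FUTURE);

	if (is_realm && !sketch_realm_ext_allowed(ext))
		return 0;

	/* Assumption for the sketch: the host supports everything */
	return 1;
}
```

An allowlist fails closed: a capability nobody has reviewed for realms
is simply not advertised, whereas a denylist would expose it by default.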
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v13:
* Add missing check in kvm_vm_ioctl_enable_cap().
Changes since v10:
* Add a kvm_realm_ext_allowed() function which limits which extensions
are exposed to an allowlist. This removes the need for special casing
various extensions.
Changes since v7:
* Remove the helper functions and inline the kvm_is_realm() check with
a ternary operator.
* Rewrite the commit message to explain this patch.
---
arch/arm64/kvm/arm.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 18251e561524..c6ebc5913e40 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -133,6 +133,25 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
}
+static bool kvm_realm_ext_allowed(long ext)
+{
+ switch (ext) {
+ case KVM_CAP_IRQCHIP:
+ case KVM_CAP_ARM_PSCI:
+ case KVM_CAP_ARM_PSCI_0_2:
+ case KVM_CAP_NR_VCPUS:
+ case KVM_CAP_MAX_VCPUS:
+ case KVM_CAP_MAX_VCPU_ID:
+ case KVM_CAP_MSI_DEVID:
+ case KVM_CAP_ARM_VM_IPA_SIZE:
+ case KVM_CAP_ARM_PTRAUTH_ADDRESS:
+ case KVM_CAP_ARM_PTRAUTH_GENERIC:
+ case KVM_CAP_ARM_RMI:
+ return true;
+ }
+ return false;
+}
+
int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
struct kvm_enable_cap *cap)
{
@@ -144,6 +163,9 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
if (is_protected_kvm_enabled() && !kvm_pkvm_ext_allowed(kvm, cap->cap))
return -EINVAL;
+ if (kvm && kvm_is_realm(kvm) && !kvm_realm_ext_allowed(cap->cap))
+ return -EINVAL;
+
switch (cap->cap) {
case KVM_CAP_ARM_NISV_TO_USER:
r = 0;
@@ -378,6 +400,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
if (is_protected_kvm_enabled() && !kvm_pkvm_ext_allowed(kvm, ext))
return 0;
+ if (kvm && kvm_is_realm(kvm) && !kvm_realm_ext_allowed(ext))
+ return 0;
+
switch (ext) {
case KVM_CAP_IRQCHIP:
r = vgic_present;
--
2.43.0
* [PATCH v14 14/44] arm64: RMI: Basic infrastructure for creating a realm.
From: Steven Price @ 2026-05-13 13:17 UTC (permalink / raw)
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
Introduce the skeleton functions for creating and destroying a realm.
The IPA size requested is checked against what the RMM supports.
The actual work of constructing the realm will be added in future
patches.
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v13:
* Rebased and updated to RMM-v2.0-bet1.
* Auxiliary granules have been removed in RMM-v2.0-bet1
Changes since v12:
* Drop the RMM_PAGE_{SHIFT,SIZE} defines - the RMM is now configured to
be the same as the host's page size.
* Rework delegate/undelegate functions to use the new RMI range based
operations.
Changes since v11:
* Major rework to drop the realm configuration and make the
construction of realms implicit rather than driven by the VMM
directly.
* The code to create RDs, handle VMIDs etc is moved to later patches.
Changes since v10:
* Rename from RME to RMI.
* Move the stage2 cleanup to a later patch.
Changes since v9:
* Avoid walking the stage 2 page tables when destroying the realm -
the real ones are not accessible to the non-secure world, and the RMM
may leave junk in the physical pages when returning them.
* Fix an error path in realm_create_rd() to actually return an error value.
Changes since v8:
* Fix free_delegated_granule() to not call kvm_account_pgtable_pages();
a separate wrapper will be introduced in a later patch to deal with
RTTs.
* Minor code cleanups following review.
Changes since v7:
* Minor code cleanup following Gavin's review.
Changes since v6:
* Separate RMM RTT calculations from host PAGE_SIZE. This allows the
host page size to be larger than 4k while still communicating with an
RMM which uses 4k granules.
Changes since v5:
* Introduce free_delegated_granule() to replace many
undelegate/free_page() instances and centralise the comment on
leaking when the undelegate fails.
* Several other minor improvements suggested by reviews - thanks for
the feedback!
Changes since v2:
* Improved commit description.
* Improved return failures for rmi_check_version().
* Clear contents of PGD after it has been undelegated in case the RMM
left stale data.
* Minor changes to reflect changes in previous patches.
---
arch/arm64/include/asm/kvm_emulate.h | 29 ++++++++++++++
arch/arm64/include/asm/kvm_rmi.h | 51 +++++++++++++++++++++++++
arch/arm64/kvm/arm.c | 12 ++++++
arch/arm64/kvm/mmu.c | 12 +++++-
arch/arm64/kvm/rmi.c | 57 ++++++++++++++++++++++++++++
5 files changed, 159 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 5bf3d7e1d92c..82fd777bd9bb 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -688,4 +688,33 @@ static inline void vcpu_set_hcrx(struct kvm_vcpu *vcpu)
vcpu->arch.hcrx_el2 |= HCRX_EL2_EnASR;
}
}
+
+static inline bool kvm_is_realm(struct kvm *kvm)
+{
+ if (static_branch_unlikely(&kvm_rmi_is_available))
+ return kvm->arch.is_realm;
+ return false;
+}
+
+static inline enum realm_state kvm_realm_state(struct kvm *kvm)
+{
+ return READ_ONCE(kvm->arch.realm.state);
+}
+
+static inline void kvm_set_realm_state(struct kvm *kvm,
+ enum realm_state new_state)
+{
+ WRITE_ONCE(kvm->arch.realm.state, new_state);
+}
+
+static inline bool kvm_realm_is_created(struct kvm *kvm)
+{
+ return kvm_is_realm(kvm) && kvm_realm_state(kvm) != REALM_STATE_NONE;
+}
+
+static inline bool vcpu_is_rec(const struct kvm_vcpu *vcpu)
+{
+ return false;
+}
+
#endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
index 4936007947fd..9de34983ee52 100644
--- a/arch/arm64/include/asm/kvm_rmi.h
+++ b/arch/arm64/include/asm/kvm_rmi.h
@@ -6,12 +6,63 @@
#ifndef __ASM_KVM_RMI_H
#define __ASM_KVM_RMI_H
+#include <asm/rmi_smc.h>
+
+/**
+ * enum realm_state - State of a Realm
+ */
+enum realm_state {
+ /**
+ * @REALM_STATE_NONE:
+ * Realm has not yet been created. rmi_realm_create() has not
+ * yet been called.
+ */
+ REALM_STATE_NONE,
+ /**
+ * @REALM_STATE_NEW:
+ * Realm is under construction, rmi_realm_create() has been
+ * called, but it is not yet activated. Pages may be populated.
+ */
+ REALM_STATE_NEW,
+ /**
+ * @REALM_STATE_ACTIVE:
+ * Realm has been created and is eligible for execution with
+ * rmi_rec_enter(). Pages may no longer be populated with
+ * rmi_data_create().
+ */
+ REALM_STATE_ACTIVE,
+ /**
+ * @REALM_STATE_DYING:
+ * Realm is in the process of being destroyed or has already been
+ * destroyed.
+ */
+ REALM_STATE_DYING,
+ /**
+ * @REALM_STATE_DEAD:
+ * Realm has been destroyed.
+ */
+ REALM_STATE_DEAD
+};
+
/**
* struct realm - Additional per VM data for a Realm
+ *
+ * @state: The lifetime state machine for the realm
+ * @rd: Kernel mapping of the Realm Descriptor (RD)
+ * @params: Parameters for the RMI_REALM_CREATE command
+ * @ia_bits: Number of valid Input Address bits in the IPA
*/
struct realm {
+ enum realm_state state;
+ void *rd;
+ struct realm_params *params;
+ unsigned int ia_bits;
};
void kvm_init_rmi(void);
+u32 kvm_realm_ipa_limit(void);
+
+int kvm_init_realm(struct kvm *kvm);
+void kvm_destroy_realm(struct kvm *kvm);
#endif /* __ASM_KVM_RMI_H */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 247e03b33035..18251e561524 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -264,6 +264,13 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
bitmap_zero(kvm->arch.vcpu_features, KVM_VCPU_MAX_FEATURES);
+ /* Initialise the realm bits after the generic bits are enabled */
+ if (kvm_is_realm(kvm)) {
+ ret = kvm_init_realm(kvm);
+ if (ret)
+ goto err_uninit_mmu;
+ }
+
return 0;
err_uninit_mmu:
@@ -326,6 +333,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_unshare_hyp(kvm, kvm + 1);
kvm_arm_teardown_hypercalls(kvm);
+ if (kvm_is_realm(kvm))
+ kvm_destroy_realm(kvm);
}
static bool kvm_has_full_ptr_auth(void)
@@ -486,6 +495,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
else
r = kvm_supports_cacheable_pfnmap();
break;
+ case KVM_CAP_ARM_RMI:
+ r = static_key_enabled(&kvm_rmi_is_available);
+ break;
default:
r = 0;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d089c107d9b7..ba8286472286 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -877,10 +877,14 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
static int kvm_init_ipa_range(struct kvm_s2_mmu *mmu, unsigned long type)
{
+ struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
u32 kvm_ipa_limit = get_kvm_ipa_limit();
u64 mmfr0, mmfr1;
u32 phys_shift;
+ if (kvm_is_realm(kvm))
+ kvm_ipa_limit = kvm_realm_ipa_limit();
+
phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
if (is_protected_kvm_enabled()) {
phys_shift = kvm_ipa_limit;
@@ -974,6 +978,8 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
return -EINVAL;
}
+ mmu->arch = &kvm->arch;
+
err = kvm_init_ipa_range(mmu, type);
if (err)
return err;
@@ -982,7 +988,6 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
if (!pgt)
return -ENOMEM;
- mmu->arch = &kvm->arch;
err = KVM_PGT_FN(kvm_pgtable_stage2_init)(pgt, mmu, &kvm_s2_mm_ops);
if (err)
goto out_free_pgtable;
@@ -1114,7 +1119,10 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
write_unlock(&kvm->mmu_lock);
if (pgt) {
- kvm_stage2_destroy(pgt);
+ if (!kvm_is_realm(kvm))
+ kvm_stage2_destroy(pgt);
+ else
+ kvm_pgtable_stage2_destroy_pgd(pgt);
kfree(pgt);
}
}
diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
index 6e28b669ded2..f51ec667445e 100644
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -5,6 +5,8 @@
#include <linux/kvm_host.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>
#include <asm/kvm_pgtable.h>
#include <asm/rmi_cmds.h>
#include <asm/virt.h>
@@ -14,6 +16,61 @@ static bool rmi_has_feature(unsigned long feature)
return !!u64_get_bits(rmm_feat_reg0, feature);
}
+u32 kvm_realm_ipa_limit(void)
+{
+ return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
+}
+
+void kvm_destroy_realm(struct kvm *kvm)
+{
+ struct realm *realm = &kvm->arch.realm;
+ size_t pgd_size = kvm_pgtable_stage2_pgd_size(kvm->arch.mmu.vtcr);
+
+ if (realm->params) {
+ free_page((unsigned long)realm->params);
+ realm->params = NULL;
+ }
+
+ if (!kvm_realm_is_created(kvm))
+ return;
+
+ kvm_set_realm_state(kvm, REALM_STATE_DYING);
+
+ write_lock(&kvm->mmu_lock);
+ kvm_stage2_unmap_range(&kvm->arch.mmu, 0,
+ BIT(realm->ia_bits - 1), true);
+ write_unlock(&kvm->mmu_lock);
+
+ if (realm->rd) {
+ phys_addr_t rd_phys = virt_to_phys(realm->rd);
+
+ if (WARN_ON(rmi_realm_terminate(rd_phys)))
+ return;
+
+ if (WARN_ON(rmi_realm_destroy(rd_phys)))
+ return;
+ free_delegated_page(rd_phys);
+ realm->rd = NULL;
+ }
+
+ if (WARN_ON(rmi_undelegate_range(kvm->arch.mmu.pgd_phys, pgd_size)))
+ return;
+
+ kvm_set_realm_state(kvm, REALM_STATE_DEAD);
+
+ /* Now that the Realm is destroyed, free the entry level RTTs */
+ kvm_free_stage2_pgd(&kvm->arch.mmu);
+}
+
+int kvm_init_realm(struct kvm *kvm)
+{
+ kvm->arch.realm.params = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+
+ if (!kvm->arch.realm.params)
+ return -ENOMEM;
+ return 0;
+}
+
static int rmm_check_features(void)
{
if (kvm_lpa2_is_enabled() && !rmi_has_feature(RMI_FEATURE_REGISTER_0_LPA2)) {
--
2.43.0
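As a reading aid for the realm_state comments in kvm_rmi.h above, the lifetime can be modelled in plain user-space C roughly as below. This is only a sketch: realm_transition_ok() and realm_set_state() are hypothetical helpers that do not exist in the patch, and the set of allowed transitions is an assumption inferred from the enum comments and from kvm_destroy_realm(), which moves any created realm to DYING and then to DEAD.

```c
#include <assert.h>
#include <stdbool.h>

/* Same enumerators, in the same order, as enum realm_state in the patch. */
enum realm_state {
	REALM_STATE_NONE,
	REALM_STATE_NEW,
	REALM_STATE_ACTIVE,
	REALM_STATE_DYING,
	REALM_STATE_DEAD
};

/*
 * Hypothetical helper (not in the patch): the transitions listed here are
 * an assumption based on the comments on each state.
 */
bool realm_transition_ok(enum realm_state from, enum realm_state to)
{
	switch (from) {
	case REALM_STATE_NONE:
		return to == REALM_STATE_NEW;
	case REALM_STATE_NEW:
		/* A realm may be activated, or torn down before activation. */
		return to == REALM_STATE_ACTIVE || to == REALM_STATE_DYING;
	case REALM_STATE_ACTIVE:
		return to == REALM_STATE_DYING;
	case REALM_STATE_DYING:
		return to == REALM_STATE_DEAD;
	default:
		/* REALM_STATE_DEAD is terminal in this model. */
		return false;
	}
}

/* Hypothetical helper: apply a transition, asserting that it is allowed. */
void realm_set_state(enum realm_state *state, enum realm_state next)
{
	assert(realm_transition_ok(*state, next));
	*state = next;
}
```

The patch itself does not enforce transitions; kvm_set_realm_state() is a plain WRITE_ONCE(), so the checks above are purely illustrative.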
* [PATCH v14 13/44] arm64: RMI: Define the user ABI
From: Steven Price @ 2026-05-13 13:17 UTC (permalink / raw)
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
There is one CAP which identifies the presence of CCA, and one ioctl.
The ioctl is used to populate memory during creation of the realm, as
this requires the RMM to copy data from an unprotected address to the
protected memory. CCA does not support memory conversion where the
memory contents are preserved, as this is incompatible with memory
encryption.
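For illustration, a VMM might drive the populate ioctl with a loop along the following lines. This is a minimal sketch, not code from the series: realm_populate_all(), populate_fn and fake_populate() are hypothetical names, the structure is a local copy using fixed-width types, and the callback stands in for ioctl(vm_fd, KVM_ARM_RMI_POPULATE, &args) so that the loop can be exercised without a kernel carrying this patch. The "no progress" guard is a defensive assumption of the sketch rather than documented behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local copy of struct kvm_arm_rmi_populate from the patch. */
struct kvm_arm_rmi_populate {
	uint64_t base;
	uint64_t size;
	uint64_t source_uaddr;
	uint32_t flags;
	uint32_t reserved;
};

#define KVM_ARM_RMI_POPULATE_FLAGS_MEASURE (1u << 0)

/*
 * Hypothetical callback: in a real VMM this would wrap
 * ioctl(vm_fd, KVM_ARM_RMI_POPULATE, args) and return a negative
 * value on failure.
 */
typedef int (*populate_fn)(int vm_fd, struct kvm_arm_rmi_populate *args);

/* Call the populate operation repeatedly until the region is consumed. */
int realm_populate_all(int vm_fd, populate_fn populate, uint64_t base,
		       uint64_t size, uint64_t source_uaddr, int measure)
{
	struct kvm_arm_rmi_populate args = {
		.base = base,
		.size = size,
		.source_uaddr = source_uaddr,
		.flags = measure ? KVM_ARM_RMI_POPULATE_FLAGS_MEASURE : 0,
	};

	assert(populate);

	while (args.size != 0) {
		uint64_t before = args.size;
		int ret = populate(vm_fd, &args);

		if (ret < 0)
			return ret;
		/* Assumption of this sketch: bail out rather than spin. */
		if (args.size >= before)
			return -EIO;
	}
	return 0;
}

/* Stand-in for the ioctl: consumes at most 4096 bytes per call. */
int fake_calls;

int fake_populate(int vm_fd, struct kvm_arm_rmi_populate *args)
{
	uint64_t step = args->size < 4096 ? args->size : 4096;

	(void)vm_fd;
	args->base += step;
	args->source_uaddr += step;
	args->size -= step;
	fake_calls++;
	return 0;
}
```

Because the kernel writes the remaining range back into the structure, the caller can pass the same structure again without recomputing offsets.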
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v13:
* KVM_ARM_VCPU_RMI_PSCI_COMPLETE removed.
* KVM_ARM_RMI_POPULATE documentation updated to reflect that the
structure is written by the kernel.
* CAP number bumped.
Changes since v12:
* Change KVM_ARM_RMI_POPULATE to update the structure with the amount
that has been progressed rather than return the number of bytes
populated.
* Describe the flag KVM_ARM_RMI_POPULATE_FLAGS_MEASURE.
* CAP number is bumped.
* NOTE: The PSCI ioctl may be removed in a future spec release.
Changes since v11:
* Completely reworked to be more implicit. Rather than having explicit
CAP operations to progress the realm construction these operations
are done when needed (on populating and on first vCPU run).
* Populate and PSCI complete are promoted to proper ioctls.
Changes since v10:
* Rename symbols from RME to RMI.
Changes since v9:
* Improvements to documentation.
* Bump the magic number for KVM_CAP_ARM_RME to avoid conflicts.
Changes since v8:
* Minor improvements to documentation following review.
* Bump the magic numbers to avoid conflicts.
Changes since v7:
* Add documentation of new ioctls
* Bump the magic numbers to avoid conflicts
Changes since v6:
* Rename some of the symbols to make their usage clearer and avoid
repetition.
Changes from v5:
* Actually expose the new VCPU capability (KVM_ARM_VCPU_REC) by bumping
KVM_VCPU_MAX_FEATURES - note this also exposes KVM_ARM_VCPU_HAS_EL2!
---
Documentation/virt/kvm/api.rst | 40 ++++++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 13 +++++++++++
2 files changed, 53 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 52bbbb553ce1..ca68aae7faa2 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6553,6 +6553,37 @@ KVM_S390_KEYOP_SSKE
Sets the storage key for the guest address ``guest_addr`` to the key
specified in ``key``, returning the previous value in ``key``.
+4.145 KVM_ARM_RMI_POPULATE
+--------------------------
+
+:Capability: KVM_CAP_ARM_RMI
+:Architectures: arm64
+:Type: vm ioctl
+:Parameters: struct kvm_arm_rmi_populate (in/out)
+:Returns: 0 on success, < 0 on error
+
+::
+
+ struct kvm_arm_rmi_populate {
+ __u64 base;
+ __u64 size;
+ __u64 source_uaddr;
+ __u32 flags;
+ __u32 reserved;
+ };
+
+Populate a region of protected address space by copying the data from the
+(non-protected) user space pointer provided into a protected region (backed by
+guest_memfd). It implicitly sets the destination region to RIPAS RAM. This is
+only valid before any VCPUs have been run. The ioctl might not populate the
+entire region and in this case the kernel updates the fields `base`, `size` and
+`source_uaddr`. User space may have to repeatedly call it until `size` is 0 to
+populate the entire region.
+
+`flags` can be set to `KVM_ARM_RMI_POPULATE_FLAGS_MEASURE` to request that the
+populated data is hashed and added to the guest's Realm Initial Measurement
+(RIM).
+
.. _kvm_run:
5. The kvm_run structure
@@ -8904,6 +8935,15 @@ helpful if user space wants to emulate instructions which are not
This capability can be enabled dynamically even if VCPUs were already
created and are running.
+7.47 KVM_CAP_ARM_RMI
+--------------------
+
+:Architectures: arm64
+:Target: VM
+:Parameters: None
+
+This capability indicates that support for CCA realms is available.
+
8. Other capabilities.
======================
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6c8afa2047bf..b8cff0938041 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -996,6 +996,7 @@ struct kvm_enable_cap {
#define KVM_CAP_S390_USER_OPEREXEC 246
#define KVM_CAP_S390_KEYOP 247
#define KVM_CAP_S390_VSIE_ESAMODE 248
+#define KVM_CAP_ARM_RMI 249
struct kvm_irq_routing_irqchip {
__u32 irqchip;
@@ -1669,4 +1670,16 @@ struct kvm_pre_fault_memory {
__u64 padding[5];
};
+/* Available with KVM_CAP_ARM_RMI, only for VMs with KVM_VM_TYPE_ARM_REALM */
+#define KVM_ARM_RMI_POPULATE _IOWR(KVMIO, 0xd7, struct kvm_arm_rmi_populate)
+#define KVM_ARM_RMI_POPULATE_FLAGS_MEASURE (1 << 0)
+
+struct kvm_arm_rmi_populate {
+ __u64 base;
+ __u64 size;
+ __u64 source_uaddr;
+ __u32 flags;
+ __u32 reserved;
+};
+
#endif /* __LINUX_KVM_H */
--
2.43.0
* [PATCH v14 12/44] arm64: RMI: Check for LPA2 support
From: Steven Price @ 2026-05-13 13:17 UTC (permalink / raw)
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
If KVM has enabled LPA2 support then check that the RMM also supports
it. If there is a mismatch then disable support for realm guests as the
VMM may attempt to create a guest which is incompatible with the RMM.
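The shape of the check can be sketched in stand-alone C as follows. The bit position used for the LPA2 feature is a placeholder chosen for the example only (the real mask is RMI_FEATURE_REGISTER_0_LPA2 from the RMI headers), and feat_has() and check_features() are hypothetical stand-ins for rmi_has_feature() and rmm_check_features().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit, for illustration only; not the real register layout. */
#define EXAMPLE_FEAT0_LPA2 (UINT64_C(1) << 8)

/* True if any bit selected by mask is set in the feature register value. */
bool feat_has(uint64_t feat_reg0, uint64_t mask)
{
	assert(mask != 0);
	return (feat_reg0 & mask) != 0;
}

/*
 * Reject the configuration only when the host uses LPA2 and the RMM
 * does not advertise it; every other combination is accepted.
 */
int check_features(bool host_lpa2, uint64_t feat_reg0)
{
	if (host_lpa2 && !feat_has(feat_reg0, EXAMPLE_FEAT0_LPA2))
		return -ENXIO;
	return 0;
}
```

Note that the check is one-directional: an RMM that supports LPA2 is accepted on a host where LPA2 is not enabled.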
Signed-off-by: Steven Price <steven.price@arm.com>
---
New patch for v13
---
arch/arm64/kvm/rmi.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
index 1acc972a4b92..6e28b669ded2 100644
--- a/arch/arm64/kvm/rmi.c
+++ b/arch/arm64/kvm/rmi.c
@@ -5,9 +5,25 @@
#include <linux/kvm_host.h>
+#include <asm/kvm_pgtable.h>
#include <asm/rmi_cmds.h>
#include <asm/virt.h>
+static bool rmi_has_feature(unsigned long feature)
+{
+ return !!u64_get_bits(rmm_feat_reg0, feature);
+}
+
+static int rmm_check_features(void)
+{
+ if (kvm_lpa2_is_enabled() && !rmi_has_feature(RMI_FEATURE_REGISTER_0_LPA2)) {
+ kvm_err("RMM doesn't support LPA2\n");
+ return -ENXIO;
+ }
+
+ return 0;
+}
+
void kvm_init_rmi(void)
{
/*
@@ -20,5 +36,8 @@ void kvm_init_rmi(void)
if (!rmi_is_available())
return;
+ if (rmm_check_features())
+ return;
+
/* Future patch will enable static branch kvm_rmi_is_available */
}
--
2.43.0
* [PATCH v14 11/44] arm64: RMI: Check for RMI support at KVM init
From: Steven Price @ 2026-05-13 13:17 UTC (permalink / raw)
To: kvm, kvmarm
Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
Gavin Shan, Shanker Donthineni, Alper Gun, Aneesh Kumar K . V,
Emi Kisanuki, Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-1-steven.price@arm.com>
Check whether the RMI support is sufficient for use in KVM. Specifically,
we currently only support creating realm VMs when KVM is in VHE mode.
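The gating can be pictured with a small user-space model. This is a sketch under stated assumptions: struct rmi_model, model_init_rmi() and model_is_realm() are hypothetical names, a plain bool stands in for the kvm_rmi_is_available static key, and the model assumes that a later patch enables the key once both probes pass (this patch only leaves a comment to that effect).

```c
#include <assert.h>
#include <stdbool.h>

/* A plain flag standing in for the kvm_rmi_is_available static key. */
struct rmi_model {
	bool key_enabled;
};

/*
 * Model of the init-time probing: the key stays false unless the kernel
 * runs in hyp (VHE) mode and RMI is available. Enabling the key here is
 * an assumption about what a later patch does.
 */
void model_init_rmi(struct rmi_model *m, bool in_hyp_mode, bool rmi_available)
{
	assert(m);
	m->key_enabled = false;

	if (!in_hyp_mode)
		return;
	if (!rmi_available)
		return;

	m->key_enabled = true;
}

/* A VM only counts as a realm when the key is on and the VM is flagged. */
bool model_is_realm(const struct rmi_model *m, bool vm_is_realm)
{
	return m->key_enabled && vm_is_realm;
}
```

With the key defaulting to false, hosts without RMI pay only for a patched-out branch on each realm check, which is the usual motivation for using a static key here.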
Signed-off-by: Steven Price <steven.price@arm.com>
---
Changes since v13:
* Most of the init has been moved out of the 'kvm' directory so this is
much more basic now.
Changes since v12:
* Drop check for 4k page size.
Changes since v11:
* Reword slightly the comments on the realm states.
Changes since v10:
* kvm_is_realm() no longer has a NULL check.
* Rename from "rme" to "rmi" when referring to the RMM interface.
* Check for RME (hardware) support before probing for RMI support.
Changes since v8:
* No need to guard kvm_init_rme() behind 'in_hyp_mode'.
Changes since v6:
* Improved message for an unsupported RMI ABI version.
Changes since v5:
* Reword "unsupported" message from "host supports" to "we want" to
clarify that 'we' are the 'host'.
Changes since v2:
* Drop return value from kvm_init_rme(), it was always 0.
* Rely on the RMM return value to identify whether the RSI ABI is
compatible.
---
arch/arm64/include/asm/kvm_host.h | 4 ++++
arch/arm64/include/asm/kvm_rmi.h | 17 +++++++++++++++++
arch/arm64/include/asm/virt.h | 1 +
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/kvm/arm.c | 5 +++++
arch/arm64/kvm/rmi.c | 24 ++++++++++++++++++++++++
6 files changed, 52 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/include/asm/kvm_rmi.h
create mode 100644 arch/arm64/kvm/rmi.c
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 851f6171751c..3512696ed506 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -27,6 +27,7 @@
#include <asm/fpsimd.h>
#include <asm/kvm.h>
#include <asm/kvm_asm.h>
+#include <asm/kvm_rmi.h>
#include <asm/vncr_mapping.h>
#define __KVM_HAVE_ARCH_INTC_INITIALIZED
@@ -424,6 +425,9 @@ struct kvm_arch {
/* Nested virtualization info */
struct dentry *debugfs_nv_dentry;
#endif
+
+ bool is_realm;
+ struct realm realm;
};
struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
new file mode 100644
index 000000000000..4936007947fd
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_rmi.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023-2025 ARM Ltd.
+ */
+
+#ifndef __ASM_KVM_RMI_H
+#define __ASM_KVM_RMI_H
+
+/**
+ * struct realm - Additional per VM data for a Realm
+ */
+struct realm {
+};
+
+void kvm_init_rmi(void);
+
+#endif /* __ASM_KVM_RMI_H */
diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index b546703c3ab9..92cec42952f4 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -87,6 +87,7 @@ void __hyp_reset_vectors(void);
bool is_kvm_arm_initialised(void);
DECLARE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
+DECLARE_STATIC_KEY_FALSE(kvm_rmi_is_available);
static inline bool is_pkvm_initialized(void)
{
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 59612d2f277c..ed3cf30eb06e 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -16,7 +16,7 @@ CFLAGS_handle_exit.o += -Wno-override-init
kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
inject_fault.o va_layout.o handle_exit.o config.o \
guest.o debug.o reset.o sys_regs.o stacktrace.o \
- vgic-sys-reg-v3.o fpsimd.o pkvm.o \
+ vgic-sys-reg-v3.o fpsimd.o pkvm.o rmi.o \
arch_timer.o trng.o vmid.o emulate-nested.o nested.o at.o \
vgic/vgic.o vgic/vgic-init.o \
vgic/vgic-irqfd.o vgic/vgic-v2.o \
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 176cbe8baad3..247e03b33035 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -41,6 +41,7 @@
#include <asm/kvm_nested.h>
#include <asm/kvm_pkvm.h>
#include <asm/kvm_ptrauth.h>
+#include <asm/kvm_rmi.h>
#include <asm/sections.h>
#include <asm/stacktrace/nvhe.h>
@@ -109,6 +110,8 @@ long kvm_get_cap_for_kvm_ioctl(unsigned int ioctl, long *ext)
return -EINVAL;
}
+DEFINE_STATIC_KEY_FALSE(kvm_rmi_is_available);
+
DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_base);
@@ -2975,6 +2978,8 @@ static __init int kvm_arm_init(void)
in_hyp_mode = is_kernel_in_hyp_mode();
+ kvm_init_rmi();
+
if (cpus_have_final_cap(ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE) ||
cpus_have_final_cap(ARM64_WORKAROUND_1508412))
kvm_info("Guests without required CPU erratum workarounds can deadlock system!\n" \
diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
new file mode 100644
index 000000000000..1acc972a4b92
--- /dev/null
+++ b/arch/arm64/kvm/rmi.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023-2025 ARM Ltd.
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/rmi_cmds.h>
+#include <asm/virt.h>
+
+void kvm_init_rmi(void)
+{
+ /*
+ * TODO: Support Realm guests in nVHE mode, this will require adding
+ * EL2 stub(s) for REC entry and possibly other things.
+ */
+ if (!is_kernel_in_hyp_mode())
+ return;
+
+ if (!rmi_is_available())
+ return;
+
+ /* Future patch will enable static branch kvm_rmi_is_available */
+}
--
2.43.0