* [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling
@ 2024-04-09 17:54 Marc Zyngier
2024-04-09 17:54 ` [PATCH 01/16] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures Marc Zyngier
` (15 more replies)
0 siblings, 16 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
Here's another instalment of everyone's favourite "arm64 nested virt,
one headache at a time". This time, we deal with the shadowing of the
guest's S2 page tables.
So here's the 10000m (approximately 30000ft for those of you stuck
with the wrong units) view of what this is doing:
- for each {VMID,VTTBR,VTCR} tuple the guest uses, we use a separate
shadow s2_mmu context. This context has its own "real" VMID and a
set of page tables that are the combination of the guest's S2 and
the host S2, built dynamically one fault at a time.
- these shadow S2 contexts are ephemeral, and behave exactly like
TLBs. For all intents and purposes, they *are* TLBs, and we discard
them pretty often.
- TLB invalidation takes three possible paths:
* either this is an EL2 S1 invalidation, and we directly emulate it
as early as possible
* or this is an EL1 S1 invalidation, and we need to apply it to the
shadow S2s (plural!) that match the VMID set by the L1 guest (see
the sketch after this list)
* or finally, this is affecting S2, and we need to tear down the
corresponding part of the shadow S2s, which invalidates the TLBs
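To make the second path a bit more concrete, here is a rough sketch of
"apply an EL1 S1 invalidation to every matching shadow S2", expressed
in terms of the structures added in patch 1. This is an illustration
only: kvm_invalidate_guest_s1() is a made-up stand-in for the actual
invalidation primitive, which lives in the sys_regs handlers later in
the series.

	static void tlbi_el1_s1_all_matching(struct kvm *kvm, u64 guest_vttbr)
	{
		int i;

		/* Walk every shadow S2 context and match on the guest's VMID */
		for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
			struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];

			if (!kvm_s2_mmu_valid(mmu))
				continue;

			if (get_vmid(mmu->tlb_vttbr) != get_vmid(guest_vttbr))
				continue;

			/* Made-up helper: perform the S1 invalidation in this context */
			kvm_invalidate_guest_s1(mmu);
		}
	}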
From a quality-of-implementation perspective, this series does the
absolute minimum. In a lot of cases, we blow away all the shadow S2s
without any discrimination. That's because we don't have a reverse
mapping yet, so if something gets unmapped from the canonical S2
through an MMU notifier, things slow down significantly. At this
stage, nobody should care.
We also make some implementation choices:
- no overhead for non-NV guests -- this is our #1 requirement
- all the TLBIs are implemented as Inner-Shareable, no matter what the
guest says
- we don't try to optimise for leaf invalidation at S2
- we use a TTL-like mechanism to limit the over-invalidation when no
TTL is provided, but this is only a best-effort process
- range invalidation is supported
- NXS operations are supported as well, and implemented as XS. Nobody
cares about them anyway
Note that some of the patches used to carry review tags, but the
series has changed so much that those tags no longer make sense and
have been dropped.
This is based on 6.9-rc3, and has been tested on my usual M2 with the
rest of the NV series.
Christoffer Dall (2):
KVM: arm64: nv: Implement nested Stage-2 page table walk logic
KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
Marc Zyngier (14):
KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
KVM: arm64: nv: Handle shadow stage 2 page faults
KVM: arm64: nv: Add Stage-1 EL2 invalidation primitives
KVM: arm64: nv: Handle EL2 Stage-1 TLB invalidation
KVM: arm64: nv: Handle TLB invalidation targeting L2 stage-1
KVM: arm64: nv: Handle TLBI VMALLS12E1{,IS} operations
KVM: arm64: nv: Handle TLBI ALLE1{,IS} operations
KVM: arm64: nv: Handle TLBI IPAS2E1{,IS} operations
KVM: arm64: nv: Handle FEAT_TTL hinted TLB operations
KVM: arm64: nv: Tag shadow S2 entries with guest's leaf S2 level
KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like
information
KVM: arm64: nv: Add handling of outer-shareable TLBI operations
KVM: arm64: nv: Add handling of range-based TLBI operations
KVM: arm64: nv: Add handling of NXS-flavoured TLBI operations
arch/arm64/include/asm/esr.h | 1 +
arch/arm64/include/asm/kvm_asm.h | 2 +
arch/arm64/include/asm/kvm_emulate.h | 1 +
arch/arm64/include/asm/kvm_host.h | 41 ++
arch/arm64/include/asm/kvm_mmu.h | 12 +
arch/arm64/include/asm/kvm_nested.h | 127 +++++
arch/arm64/include/asm/sysreg.h | 17 +
arch/arm64/kvm/arm.c | 11 +
arch/arm64/kvm/hyp/vhe/switch.c | 51 +-
arch/arm64/kvm/hyp/vhe/tlb.c | 147 +++++
arch/arm64/kvm/mmu.c | 219 ++++++--
arch/arm64/kvm/nested.c | 767 ++++++++++++++++++++++++++-
arch/arm64/kvm/reset.c | 6 +
arch/arm64/kvm/sys_regs.c | 398 ++++++++++++++
14 files changed, 1759 insertions(+), 41 deletions(-)
--
2.39.2
* [PATCH 01/16] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
2024-04-09 17:54 [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
@ 2024-04-09 17:54 ` Marc Zyngier
2024-05-07 6:17 ` Oliver Upton
2024-04-09 17:54 ` [PATCH 02/16] KVM: arm64: nv: Implement nested Stage-2 page table walk logic Marc Zyngier
` (14 subsequent siblings)
15 siblings, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
Add Stage-2 mmu data structures for virtual EL2 and for nested guests.
We don't yet populate shadow Stage-2 page tables, but we now have a
framework for getting to a shadow Stage-2 pgd.
We allocate twice as many Stage-2 mmu structures as there are vcpus,
because that is sufficient for each vcpu to run two translation
regimes without having to flush the Stage-2 page tables.
Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/include/asm/kvm_host.h | 41 ++++++
arch/arm64/include/asm/kvm_mmu.h | 9 ++
arch/arm64/include/asm/kvm_nested.h | 6 +
arch/arm64/kvm/arm.c | 11 ++
arch/arm64/kvm/mmu.c | 76 ++++++++---
arch/arm64/kvm/nested.c | 205 ++++++++++++++++++++++++++++
arch/arm64/kvm/reset.c | 6 +
7 files changed, 333 insertions(+), 21 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 9e8a496fb284..f54f4842f116 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -188,8 +188,40 @@ struct kvm_s2_mmu {
uint64_t split_page_chunk_size;
struct kvm_arch *arch;
+
+ /*
+ * For a shadow stage-2 MMU, the virtual vttbr used by the
+ * host to parse the guest S2.
+ * This either contains:
+ * - the virtual VTTBR programmed by the guest hypervisor with
+ * CnP cleared
+ * - The value 1 (VMID=0, BADDR=0, CnP=1) if invalid
+ *
+ * We also cache the full VTCR which gets used for TLB invalidation,
+ * taking the ARM ARM's "Any of the bits in VTCR_EL2 are permitted
+ * to be cached in a TLB" to the letter.
+ */
+ u64 tlb_vttbr;
+ u64 tlb_vtcr;
+
+ /*
+ * true when this represents a nested context where virtual
+ * HCR_EL2.VM == 1
+ */
+ bool nested_stage2_enabled;
+
+ /*
+ * 0: Nobody is currently using this, check vttbr for validity
+ * >0: Somebody is actively using this.
+ */
+ atomic_t refcnt;
};
+static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
+{
+ return !(mmu->tlb_vttbr & 1);
+}
+
struct kvm_arch_memory_slot {
};
@@ -264,6 +296,14 @@ struct kvm_arch {
*/
u64 fgu[__NR_FGT_GROUP_IDS__];
+ /*
+ * Stage 2 paging state for VMs with nested S2 using a virtual
+ * VMID.
+ */
+ struct kvm_s2_mmu *nested_mmus;
+ size_t nested_mmus_size;
+ int nested_mmus_next;
+
/* Interrupt controller */
struct vgic_dist vgic;
@@ -1239,6 +1279,7 @@ void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu);
void kvm_vcpu_put_vhe(struct kvm_vcpu *vcpu);
int __init kvm_set_ipa_limit(void);
+u32 kvm_get_pa_bits(struct kvm *kvm);
#define __KVM_HAVE_ARCH_VM_ALLOC
struct kvm *kvm_arch_alloc_vm(void);
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index d5e48d870461..b48c9e12e7ff 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -98,6 +98,7 @@ alternative_cb_end
#include <asm/mmu_context.h>
#include <asm/kvm_emulate.h>
#include <asm/kvm_host.h>
+#include <asm/kvm_nested.h>
void kvm_update_va_mask(struct alt_instr *alt,
__le32 *origptr, __le32 *updptr, int nr_inst);
@@ -165,6 +166,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
int create_hyp_stack(phys_addr_t phys_addr, unsigned long *haddr);
void __init free_hyp_pgds(void);
+void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
void stage2_unmap_vm(struct kvm *kvm);
int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type);
void kvm_uninit_stage2_mmu(struct kvm *kvm);
@@ -326,5 +328,12 @@ static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
{
return container_of(mmu->arch, struct kvm, arch);
}
+
+static inline u64 get_vmid(u64 vttbr)
+{
+ return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
+ VTTBR_VMID_SHIFT;
+}
+
#endif /* __ASSEMBLY__ */
#endif /* __ARM64_KVM_MMU_H__ */
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index c77d795556e1..1a4004e7573d 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -60,6 +60,12 @@ static inline u64 translate_ttbr0_el2_to_ttbr0_el1(u64 ttbr0)
return ttbr0 & ~GENMASK_ULL(63, 48);
}
+extern void kvm_init_nested(struct kvm *kvm);
+extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
+extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
+extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu);
+extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
+extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
int kvm_init_nv_sysregs(struct kvm *kvm);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c4a0a35e02c7..bb19c53b6539 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -147,6 +147,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
mutex_unlock(&kvm->lock);
#endif
+ kvm_init_nested(kvm);
+
ret = kvm_share_hyp(kvm, kvm + 1);
if (ret)
return ret;
@@ -433,6 +435,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
struct kvm_s2_mmu *mmu;
int *last_ran;
+ if (vcpu_has_nv(vcpu))
+ kvm_vcpu_load_hw_mmu(vcpu);
+
mmu = vcpu->arch.hw_mmu;
last_ran = this_cpu_ptr(mmu->last_vcpu_ran);
@@ -483,6 +488,8 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
kvm_timer_vcpu_put(vcpu);
kvm_vgic_put(vcpu);
kvm_vcpu_pmu_restore_host(vcpu);
+ if (vcpu_has_nv(vcpu))
+ kvm_vcpu_put_hw_mmu(vcpu);
kvm_arm_vmid_clear_active();
vcpu_clear_on_unsupported_cpu(vcpu);
@@ -1346,6 +1353,10 @@ static int kvm_setup_vcpu(struct kvm_vcpu *vcpu)
if (kvm_vcpu_has_pmu(vcpu) && !kvm->arch.arm_pmu)
ret = kvm_arm_set_default_pmu(kvm);
+ /* Prepare for nested if required */
+ if (!ret)
+ ret = kvm_vcpu_init_nested(vcpu);
+
return ret;
}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index dc04bc767865..24a946814831 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -328,7 +328,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
may_block));
}
-static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
+void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
{
__unmap_stage2_range(mmu, start, size, true);
}
@@ -855,21 +855,9 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
.icache_inval_pou = invalidate_icache_guest_page,
};
-/**
- * kvm_init_stage2_mmu - Initialise a S2 MMU structure
- * @kvm: The pointer to the KVM structure
- * @mmu: The pointer to the s2 MMU structure
- * @type: The machine type of the virtual machine
- *
- * Allocates only the stage-2 HW PGD level table(s).
- * Note we don't need locking here as this is only called when the VM is
- * created, which can only be done once.
- */
-int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type)
+static int kvm_init_ipa_range(struct kvm_s2_mmu *mmu, unsigned long type)
{
u32 kvm_ipa_limit = get_kvm_ipa_limit();
- int cpu, err;
- struct kvm_pgtable *pgt;
u64 mmfr0, mmfr1;
u32 phys_shift;
@@ -896,11 +884,58 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
mmu->vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
+ return 0;
+}
+
+/**
+ * kvm_init_stage2_mmu - Initialise a S2 MMU structure
+ * @kvm: The pointer to the KVM structure
+ * @mmu: The pointer to the s2 MMU structure
+ * @type: The machine type of the virtual machine
+ *
+ * Allocates only the stage-2 HW PGD level table(s).
+ * Note we don't need locking here as this is only called in two cases:
+ *
+ * - when the VM is created, which can't race against anything
+ *
+ * - when secondary kvm_s2_mmu structures are initialised for NV
+ * guests, and the caller must hold kvm->lock as this is called on a
+ * per-vcpu basis.
+ */
+int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type)
+{
+ int cpu, err;
+ struct kvm_pgtable *pgt;
+
+ /*
+ * If we already have our page tables in place, and the
+ * MMU context is the canonical one, we have a bug somewhere,
+ * as this is only supposed to ever happen once per VM.
+ *
+ * Otherwise, we're building nested page tables, and that's
+ * probably because userspace called KVM_ARM_VCPU_INIT more
+ * than once on the same vcpu. Since that's actually legal,
+ * don't kick up a fuss and return gracefully.
+ */
if (mmu->pgt != NULL) {
+ if (&kvm->arch.mmu != mmu)
+ return 0;
+
kvm_err("kvm_arch already initialized?\n");
return -EINVAL;
}
+ /*
+ * We only initialise the IPA range on the canonical MMU, so
+ * the type is meaningless in all other situations.
+ */
+ if (&kvm->arch.mmu != mmu)
+ type = kvm_get_pa_bits(kvm);
+
+ err = kvm_init_ipa_range(mmu, type);
+ if (err)
+ return err;
+
pgt = kzalloc(sizeof(*pgt), GFP_KERNEL_ACCOUNT);
if (!pgt)
return -ENOMEM;
@@ -925,6 +960,10 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
mmu->pgt = pgt;
mmu->pgd_phys = __pa(pgt->pgd);
+
+ if (&kvm->arch.mmu != mmu)
+ kvm_init_nested_s2_mmu(mmu);
+
return 0;
out_destroy_pgtable:
@@ -976,7 +1015,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
if (!(vma->vm_flags & VM_PFNMAP)) {
gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
- unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
+ kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
}
hva = vm_end;
} while (hva < reg_end);
@@ -2054,11 +2093,6 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
{
}
-void kvm_arch_flush_shadow_all(struct kvm *kvm)
-{
- kvm_uninit_stage2_mmu(kvm);
-}
-
void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot)
{
@@ -2066,7 +2100,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
phys_addr_t size = slot->npages << PAGE_SHIFT;
write_lock(&kvm->mmu_lock);
- unmap_stage2_range(&kvm->arch.mmu, gpa, size);
+ kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
write_unlock(&kvm->mmu_lock);
}
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index ced30c90521a..1f4f80a8c011 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -7,7 +7,9 @@
#include <linux/kvm.h>
#include <linux/kvm_host.h>
+#include <asm/kvm_arm.h>
#include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>
#include <asm/kvm_nested.h>
#include <asm/sysreg.h>
@@ -16,6 +18,209 @@
/* Protection against the sysreg repainting madness... */
#define NV_FTR(r, f) ID_AA64##r##_EL1_##f
+void kvm_init_nested(struct kvm *kvm)
+{
+ kvm->arch.nested_mmus = NULL;
+ kvm->arch.nested_mmus_size = 0;
+}
+
+int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
+{
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_s2_mmu *tmp;
+ int num_mmus;
+ int ret = -ENOMEM;
+
+ if (!test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->kvm->arch.vcpu_features))
+ return 0;
+
+ if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
+ return -EINVAL;
+
+ /*
+ * Let's treat memory allocation failures as benign: If we fail to
+ * allocate anything, return an error and keep the allocated array
+ * alive. Userspace may try to recover by initializing the vcpu
+ * again, and there is no reason to affect the whole VM for this.
+ */
+ num_mmus = atomic_read(&kvm->online_vcpus) * 2;
+ tmp = krealloc(kvm->arch.nested_mmus,
+ num_mmus * sizeof(*kvm->arch.nested_mmus),
+ GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ if (tmp) {
+ /*
+ * If we went through a reallocation, adjust the MMU
+ * back-pointers in the previously initialised
+ * kvm_pgtable structures.
+ */
+ if (kvm->arch.nested_mmus != tmp) {
+ int i;
+
+ for (i = 0; i < num_mmus - 2; i++)
+ tmp[i].pgt->mmu = &tmp[i];
+ }
+
+ if (kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 1], 0) ||
+ kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2], 0)) {
+ kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
+ kvm_free_stage2_pgd(&tmp[num_mmus - 2]);
+ } else {
+ kvm->arch.nested_mmus_size = num_mmus;
+ ret = 0;
+ }
+
+ kvm->arch.nested_mmus = tmp;
+ }
+
+ return ret;
+}
+
+struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu)
+{
+ bool nested_stage2_enabled;
+ u64 vttbr, vtcr, hcr;
+ struct kvm *kvm;
+ int i;
+
+ kvm = vcpu->kvm;
+
+ lockdep_assert_held_write(&kvm->mmu_lock);
+
+ vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+ vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
+ hcr = vcpu_read_sys_reg(vcpu, HCR_EL2);
+
+ nested_stage2_enabled = hcr & HCR_VM;
+
+ /* Don't consider the CnP bit for the vttbr match */
+ vttbr = vttbr & ~VTTBR_CNP_BIT;
+
+ /*
+ * Two possibilities when looking up a S2 MMU context:
+ *
+ * - either S2 is enabled in the guest, and we need a context that is
+ * S2-enabled and matches the full VTTBR (VMID+BADDR) and VTCR,
+ * which makes it safe from a TLB conflict perspective (a broken
+ * guest won't be able to generate them),
+ *
+ * - or S2 is disabled, and we need a context that is S2-disabled
+ * and matches the VMID only, as all TLBs are tagged by VMID even
+ * if S2 translation is disabled.
+ */
+ for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+ struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+ if (!kvm_s2_mmu_valid(mmu))
+ continue;
+
+ if (nested_stage2_enabled &&
+ mmu->nested_stage2_enabled &&
+ vttbr == mmu->tlb_vttbr &&
+ vtcr == mmu->tlb_vtcr)
+ return mmu;
+
+ if (!nested_stage2_enabled &&
+ !mmu->nested_stage2_enabled &&
+ get_vmid(vttbr) == get_vmid(mmu->tlb_vttbr))
+ return mmu;
+ }
+ return NULL;
+}
+
+static struct kvm_s2_mmu *get_s2_mmu_nested(struct kvm_vcpu *vcpu)
+{
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_s2_mmu *s2_mmu;
+ int i;
+
+ lockdep_assert_held_write(&vcpu->kvm->mmu_lock);
+
+ s2_mmu = lookup_s2_mmu(vcpu);
+ if (s2_mmu)
+ goto out;
+
+ /*
+ * Make sure we don't always search from the same point, or we
+ * will always reuse a potentially active context, leaving
+ * free contexts unused.
+ */
+ for (i = kvm->arch.nested_mmus_next;
+ i < (kvm->arch.nested_mmus_size + kvm->arch.nested_mmus_next);
+ i++) {
+ s2_mmu = &kvm->arch.nested_mmus[i % kvm->arch.nested_mmus_size];
+
+ if (atomic_read(&s2_mmu->refcnt) == 0)
+ break;
+ }
+ BUG_ON(atomic_read(&s2_mmu->refcnt)); /* We have struct MMUs to spare */
+
+ /* Set the scene for the next search */
+ kvm->arch.nested_mmus_next = (i + 1) % kvm->arch.nested_mmus_size;
+
+ /* Clear the old state */
+ if (kvm_s2_mmu_valid(s2_mmu))
+ kvm_unmap_stage2_range(s2_mmu, 0, kvm_phys_size(s2_mmu));
+
+ /*
+ * The virtual VMID (modulo CnP) will be used as a key when matching
+ * an existing kvm_s2_mmu.
+ *
+ * We cache VTCR at allocation time, once and for all. It'd be great
+ * if the guest didn't screw that one up, as this is not very
+ * forgiving...
+ */
+ s2_mmu->tlb_vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2) & ~VTTBR_CNP_BIT;
+ s2_mmu->tlb_vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
+ s2_mmu->nested_stage2_enabled = vcpu_read_sys_reg(vcpu, HCR_EL2) & HCR_VM;
+
+out:
+ atomic_inc(&s2_mmu->refcnt);
+ return s2_mmu;
+}
+
+void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu)
+{
+ /* CnP being set denotes an invalid entry */
+ mmu->tlb_vttbr = VTTBR_CNP_BIT;
+ mmu->nested_stage2_enabled = false;
+ atomic_set(&mmu->refcnt, 0);
+}
+
+void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
+{
+ if (is_hyp_ctxt(vcpu)) {
+ vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
+ } else {
+ write_lock(&vcpu->kvm->mmu_lock);
+ vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
+ write_unlock(&vcpu->kvm->mmu_lock);
+ }
+}
+
+void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
+{
+ if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu) {
+ atomic_dec(&vcpu->arch.hw_mmu->refcnt);
+ vcpu->arch.hw_mmu = NULL;
+ }
+}
+
+void kvm_arch_flush_shadow_all(struct kvm *kvm)
+{
+ int i;
+
+ for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+ struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+ if (!WARN_ON(atomic_read(&mmu->refcnt)))
+ kvm_free_stage2_pgd(mmu);
+ }
+ kfree(kvm->arch.nested_mmus);
+ kvm->arch.nested_mmus = NULL;
+ kvm->arch.nested_mmus_size = 0;
+ kvm_uninit_stage2_mmu(kvm);
+}
+
/*
* Our emulated CPU doesn't support all the possible features. For the
* sake of simplicity (and probably mental sanity), wipe out a number
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 68d1d05672bd..9151f7335ddf 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -266,6 +266,12 @@ void kvm_reset_vcpu(struct kvm_vcpu *vcpu)
preempt_enable();
}
+u32 kvm_get_pa_bits(struct kvm *kvm)
+{
+ /* Fixed limit until we can configure ID_AA64MMFR0.PARange */
+ return kvm_ipa_limit;
+}
+
u32 get_kvm_ipa_limit(void)
{
return kvm_ipa_limit;
--
2.39.2
* [PATCH 02/16] KVM: arm64: nv: Implement nested Stage-2 page table walk logic
2024-04-09 17:54 [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
2024-04-09 17:54 ` [PATCH 01/16] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures Marc Zyngier
@ 2024-04-09 17:54 ` Marc Zyngier
2024-04-09 17:54 ` [PATCH 03/16] KVM: arm64: nv: Handle shadow stage 2 page faults Marc Zyngier
` (13 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
From: Christoffer Dall <christoffer.dall@linaro.org>
Based on the pseudo-code in the ARM ARM, implement a stage 2 software
page table walker.
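As a worked example of the arithmetic the walker below performs (this
adds nothing new, it simply instantiates the stride/index computation
in walk_nested_s2_pgd() with concrete numbers), assume a 4KB granule,
T0SZ=24 and SL0=1:

	/*
	 * 4KB granule:  pgshift = 12, stride = pgshift - 3 = 9
	 * T0SZ = 24:    input_size = 64 - 24 = 40 bits of IPA
	 * SL0 = 1:      starting level = 2 - 1 = 1
	 *
	 * Level 1: addr_bottom = (3 - 1) * 9 + 12 = 30  ->  index = IPA[39:30]
	 * Level 2: addr_bottom = (3 - 2) * 9 + 12 = 21  ->  index = IPA[29:21]
	 * Level 3: addr_bottom = (3 - 3) * 9 + 12 = 12  ->  index = IPA[20:12]
	 *
	 * The starting level-1 table is indexed by 40 - 30 = 10 bits, which
	 * is within the [1, stride + 4] bounds enforced by
	 * check_base_s2_limits(). A block descriptor found at level 2 maps
	 * 1 << 21 = 2MB, which is what ends up in out->block_size.
	 */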
Co-developed-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/include/asm/esr.h | 1 +
arch/arm64/include/asm/kvm_nested.h | 13 ++
arch/arm64/kvm/nested.c | 265 ++++++++++++++++++++++++++++
3 files changed, 279 insertions(+)
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 81606bf7d5ac..bf6e7de485f8 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -152,6 +152,7 @@
#define ESR_ELx_Xs_MASK (GENMASK_ULL(4, 0))
/* ISS field definitions for exceptions taken in to Hyp */
+#define ESR_ELx_FSC_ADDRSZ (0x00)
#define ESR_ELx_CV (UL(1) << 24)
#define ESR_ELx_COND_SHIFT (20)
#define ESR_ELx_COND_MASK (UL(0xF) << ESR_ELx_COND_SHIFT)
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 1a4004e7573d..d7a1c402dc2d 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -67,6 +67,19 @@ extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu);
extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
+struct kvm_s2_trans {
+ phys_addr_t output;
+ unsigned long block_size;
+ bool writable;
+ bool readable;
+ int level;
+ u32 esr;
+ u64 upper_attr;
+};
+
+extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
+ struct kvm_s2_trans *result);
+
int kvm_init_nv_sysregs(struct kvm *kvm);
#endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 1f4f80a8c011..2ed97b196757 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -75,6 +75,271 @@ int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
return ret;
}
+struct s2_walk_info {
+ int (*read_desc)(phys_addr_t pa, u64 *desc, void *data);
+ void *data;
+ u64 baddr;
+ unsigned int max_pa_bits;
+ unsigned int max_ipa_bits;
+ unsigned int pgshift;
+ unsigned int pgsize;
+ unsigned int ps;
+ unsigned int sl;
+ unsigned int t0sz;
+ bool be;
+};
+
+static unsigned int ps_to_output_size(unsigned int ps)
+{
+ switch (ps) {
+ case 0: return 32;
+ case 1: return 36;
+ case 2: return 40;
+ case 3: return 42;
+ case 4: return 44;
+ case 5:
+ default:
+ return 48;
+ }
+}
+
+static u32 compute_fsc(int level, u32 fsc)
+{
+ return fsc | (level & 0x3);
+}
+
+static int check_base_s2_limits(struct s2_walk_info *wi,
+ int level, int input_size, int stride)
+{
+ int start_size;
+
+ /* Check translation limits */
+ switch (wi->pgsize) {
+ case SZ_64K:
+ if (level == 0 || (level == 1 && wi->max_ipa_bits <= 42))
+ return -EFAULT;
+ break;
+ case SZ_16K:
+ if (level == 0 || (level == 1 && wi->max_ipa_bits <= 40))
+ return -EFAULT;
+ break;
+ case SZ_4K:
+ if (level < 0 || (level == 0 && wi->max_ipa_bits <= 42))
+ return -EFAULT;
+ break;
+ }
+
+ /* Check input size limits */
+ if (input_size > wi->max_ipa_bits)
+ return -EFAULT;
+
+ /* Check number of entries in starting level table */
+ start_size = input_size - ((3 - level) * stride + wi->pgshift);
+ if (start_size < 1 || start_size > stride + 4)
+ return -EFAULT;
+
+ return 0;
+}
+
+/* Check if output is within boundaries */
+static int check_output_size(struct s2_walk_info *wi, phys_addr_t output)
+{
+ unsigned int output_size = ps_to_output_size(wi->ps);
+
+ if (output_size > wi->max_pa_bits)
+ output_size = wi->max_pa_bits;
+
+ if (output_size != 48 && (output & GENMASK_ULL(47, output_size)))
+ return -1;
+
+ return 0;
+}
+
+/*
+ * This is essentially a C-version of the pseudo code from the ARM ARM
+ * AArch64.TranslationTableWalk function. I strongly recommend looking at
+ * that pseudocode when trying to understand this.
+ *
+ * Must be called with the kvm->srcu read lock held
+ */
+static int walk_nested_s2_pgd(phys_addr_t ipa,
+ struct s2_walk_info *wi, struct kvm_s2_trans *out)
+{
+ int first_block_level, level, stride, input_size, base_lower_bound;
+ phys_addr_t base_addr;
+ unsigned int addr_top, addr_bottom;
+ u64 desc; /* page table entry */
+ int ret;
+ phys_addr_t paddr;
+
+ switch (wi->pgsize) {
+ default:
+ case SZ_64K:
+ case SZ_16K:
+ level = 3 - wi->sl;
+ first_block_level = 2;
+ break;
+ case SZ_4K:
+ level = 2 - wi->sl;
+ first_block_level = 1;
+ break;
+ }
+
+ stride = wi->pgshift - 3;
+ input_size = 64 - wi->t0sz;
+ if (input_size > 48 || input_size < 25)
+ return -EFAULT;
+
+ ret = check_base_s2_limits(wi, level, input_size, stride);
+ if (WARN_ON(ret))
+ return ret;
+
+ base_lower_bound = 3 + input_size - ((3 - level) * stride +
+ wi->pgshift);
+ base_addr = wi->baddr & GENMASK_ULL(47, base_lower_bound);
+
+ if (check_output_size(wi, base_addr)) {
+ out->esr = compute_fsc(level, ESR_ELx_FSC_ADDRSZ);
+ return 1;
+ }
+
+ addr_top = input_size - 1;
+
+ while (1) {
+ phys_addr_t index;
+
+ addr_bottom = (3 - level) * stride + wi->pgshift;
+ index = (ipa & GENMASK_ULL(addr_top, addr_bottom))
+ >> (addr_bottom - 3);
+
+ paddr = base_addr | index;
+ ret = wi->read_desc(paddr, &desc, wi->data);
+ if (ret < 0)
+ return ret;
+
+ /*
+ * Handle reversed descriptors if endianness differs between the
+ * host and the guest hypervisor.
+ */
+ if (wi->be)
+ desc = be64_to_cpu(desc);
+ else
+ desc = le64_to_cpu(desc);
+
+ /* Check for valid descriptor at this point */
+ if (!(desc & 1) || ((desc & 3) == 1 && level == 3)) {
+ out->esr = compute_fsc(level, ESR_ELx_FSC_FAULT);
+ out->upper_attr = desc;
+ return 1;
+ }
+
+ /* We're at the final level or block translation level */
+ if ((desc & 3) == 1 || level == 3)
+ break;
+
+ if (check_output_size(wi, desc)) {
+ out->esr = compute_fsc(level, ESR_ELx_FSC_ADDRSZ);
+ out->upper_attr = desc;
+ return 1;
+ }
+
+ base_addr = desc & GENMASK_ULL(47, wi->pgshift);
+
+ level += 1;
+ addr_top = addr_bottom - 1;
+ }
+
+ if (level < first_block_level) {
+ out->esr = compute_fsc(level, ESR_ELx_FSC_FAULT);
+ out->upper_attr = desc;
+ return 1;
+ }
+
+ /*
+ * We don't use the contiguous bit in the stage-2 ptes, so skip the check
+ * for misprogramming of the contiguous bit.
+ */
+
+ if (check_output_size(wi, desc)) {
+ out->esr = compute_fsc(level, ESR_ELx_FSC_ADDRSZ);
+ out->upper_attr = desc;
+ return 1;
+ }
+
+ if (!(desc & BIT(10))) {
+ out->esr = compute_fsc(level, ESR_ELx_FSC_ACCESS);
+ out->upper_attr = desc;
+ return 1;
+ }
+
+ /* Calculate and return the result */
+ paddr = (desc & GENMASK_ULL(47, addr_bottom)) |
+ (ipa & GENMASK_ULL(addr_bottom - 1, 0));
+ out->output = paddr;
+ out->block_size = 1UL << ((3 - level) * stride + wi->pgshift);
+ out->readable = desc & (0b01 << 6);
+ out->writable = desc & (0b10 << 6);
+ out->level = level;
+ out->upper_attr = desc & GENMASK_ULL(63, 52);
+ return 0;
+}
+
+static int read_guest_s2_desc(phys_addr_t pa, u64 *desc, void *data)
+{
+ struct kvm_vcpu *vcpu = data;
+
+ return kvm_read_guest(vcpu->kvm, pa, desc, sizeof(*desc));
+}
+
+static void vtcr_to_walk_info(u64 vtcr, struct s2_walk_info *wi)
+{
+ wi->t0sz = vtcr & TCR_EL2_T0SZ_MASK;
+
+ switch (vtcr & VTCR_EL2_TG0_MASK) {
+ case VTCR_EL2_TG0_4K:
+ wi->pgshift = 12; break;
+ case VTCR_EL2_TG0_16K:
+ wi->pgshift = 14; break;
+ case VTCR_EL2_TG0_64K:
+ default: /* IMPDEF: treat any other value as 64k */
+ wi->pgshift = 16; break;
+ }
+
+ wi->pgsize = BIT(wi->pgshift);
+ wi->ps = FIELD_GET(VTCR_EL2_PS_MASK, vtcr);
+ wi->sl = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
+ wi->max_ipa_bits = VTCR_EL2_IPA(vtcr);
+ /* Global limit for now, should eventually be per-VM */
+ wi->max_pa_bits = get_kvm_ipa_limit();
+}
+
+int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
+ struct kvm_s2_trans *result)
+{
+ u64 vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
+ struct s2_walk_info wi;
+ int ret;
+
+ result->esr = 0;
+
+ if (!vcpu_has_nv(vcpu))
+ return 0;
+
+ wi.read_desc = read_guest_s2_desc;
+ wi.data = vcpu;
+ wi.baddr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+
+ vtcr_to_walk_info(vtcr, &wi);
+
+ wi.be = vcpu_read_sys_reg(vcpu, SCTLR_EL2) & SCTLR_ELx_EE;
+
+ ret = walk_nested_s2_pgd(gipa, &wi, result);
+ if (ret)
+ result->esr |= (kvm_vcpu_get_esr(vcpu) & ~ESR_ELx_FSC);
+
+ return ret;
+}
+
struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu)
{
bool nested_stage2_enabled;
--
2.39.2
* [PATCH 03/16] KVM: arm64: nv: Handle shadow stage 2 page faults
2024-04-09 17:54 [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
2024-04-09 17:54 ` [PATCH 01/16] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures Marc Zyngier
2024-04-09 17:54 ` [PATCH 02/16] KVM: arm64: nv: Implement nested Stage-2 page table walk logic Marc Zyngier
@ 2024-04-09 17:54 ` Marc Zyngier
2024-04-09 17:54 ` [PATCH 04/16] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables Marc Zyngier
` (12 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
If we are faulting on a shadow stage 2 translation, we first walk the
guest hypervisor's stage 2 page table to see if it has a mapping. If
not, we inject a stage 2 page fault to the virtual EL2. Otherwise, we
create a mapping in the shadow stage 2 page table.
Note that we have to deal with two IPAs when we take a shadow stage 2
page fault. One is the address we faulted on, and is in the L2 guest
phys space. The other is from the guest stage-2 page table walk, and is
in the L1 guest phys space. To differentiate them, we rename variables
so that fault_ipa is used for the former and ipa is used for the latter.
When mapping a page in a shadow stage-2, special care must be taken not
to be more permissive than the guest is.
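For reference, the resulting fault path looks roughly like this (a
summary of the hunks below rather than new code):

	/*
	 * L2 guest access faults at fault_ipa (an L2 IPA)
	 *   -> kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans)
	 *        - if the walk itself faults: inject a stage 2 fault into
	 *          virtual EL2 and resume the guest
	 *        - otherwise: ipa = kvm_s2_trans_output(&nested_trans),
	 *          which is the corresponding L1 IPA
	 *   -> gfn = ipa >> PAGE_SHIFT, memslot/hva lookup as usual
	 *   -> user_mem_abort() installs a mapping for fault_ipa in the
	 *      *shadow* stage-2, clamped so that:
	 *        - the block size is no larger than the guest's own S2 mapping
	 *        - write/exec permissions are only granted if both the host
	 *          and the guest's S2 would grant them
	 */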
Co-developed-by: Christoffer Dall <christoffer.dall@linaro.org>
Co-developed-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/include/asm/kvm_emulate.h | 1 +
arch/arm64/include/asm/kvm_nested.h | 33 ++++++++++
arch/arm64/kvm/mmu.c | 97 +++++++++++++++++++++++++---
arch/arm64/kvm/nested.c | 45 +++++++++++++
4 files changed, 167 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 975af30af31f..675b93b0a4cd 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -611,4 +611,5 @@ static __always_inline void kvm_reset_cptr_el2(struct kvm_vcpu *vcpu)
kvm_write_cptr_el2(val);
}
+
#endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index d7a1c402dc2d..35044a5dce3f 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -77,8 +77,41 @@ struct kvm_s2_trans {
u64 upper_attr;
};
+static inline phys_addr_t kvm_s2_trans_output(struct kvm_s2_trans *trans)
+{
+ return trans->output;
+}
+
+static inline unsigned long kvm_s2_trans_size(struct kvm_s2_trans *trans)
+{
+ return trans->block_size;
+}
+
+static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
+{
+ return trans->esr;
+}
+
+static inline bool kvm_s2_trans_readable(struct kvm_s2_trans *trans)
+{
+ return trans->readable;
+}
+
+static inline bool kvm_s2_trans_writable(struct kvm_s2_trans *trans)
+{
+ return trans->writable;
+}
+
+static inline bool kvm_s2_trans_executable(struct kvm_s2_trans *trans)
+{
+ return !(trans->upper_attr & BIT(54));
+}
+
extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
struct kvm_s2_trans *result);
+extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
+ struct kvm_s2_trans *trans);
+extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
int kvm_init_nv_sysregs(struct kvm *kvm);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 24a946814831..fe2b403e9a14 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1414,6 +1414,7 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
}
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+ struct kvm_s2_trans *nested,
struct kvm_memory_slot *memslot, unsigned long hva,
bool fault_is_perm)
{
@@ -1422,6 +1423,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
bool exec_fault, mte_allowed;
bool device = false, vfio_allow_any_uc = false;
unsigned long mmu_seq;
+ phys_addr_t ipa = fault_ipa;
struct kvm *kvm = vcpu->kvm;
struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
struct vm_area_struct *vma;
@@ -1505,10 +1507,38 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
}
vma_pagesize = 1UL << vma_shift;
+
+ if (nested) {
+ unsigned long max_map_size;
+
+ max_map_size = force_pte ? PAGE_SIZE : PUD_SIZE;
+
+ ipa = kvm_s2_trans_output(nested);
+
+ /*
+ * If we're about to create a shadow stage 2 entry, then we
+ * can only create a block mapping if the guest stage 2 page
+ * table uses at least as big a mapping.
+ */
+ max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
+
+ /*
+ * Be careful: if the mapping size falls between two
+ * host block sizes, use the smaller of the two.
+ */
+ if (max_map_size >= PMD_SIZE && max_map_size < PUD_SIZE)
+ max_map_size = PMD_SIZE;
+ else if (max_map_size >= PAGE_SIZE && max_map_size < PMD_SIZE)
+ max_map_size = PAGE_SIZE;
+
+ force_pte = (max_map_size == PAGE_SIZE);
+ vma_pagesize = min(vma_pagesize, (long)max_map_size);
+ }
+
if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
fault_ipa &= ~(vma_pagesize - 1);
- gfn = fault_ipa >> PAGE_SHIFT;
+ gfn = ipa >> PAGE_SHIFT;
mte_allowed = kvm_vma_mte_allowed(vma);
vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
@@ -1559,6 +1589,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (exec_fault && device)
return -ENOEXEC;
+ /*
+ * Potentially reduce shadow S2 permissions to match the guest's own
+ * S2. For exec faults, we'd only reach this point if the guest
+ * actually allowed it (see kvm_s2_handle_perm_fault).
+ */
+ if (nested) {
+ writable &= kvm_s2_trans_writable(nested);
+ if (!kvm_s2_trans_readable(nested))
+ prot &= ~KVM_PGTABLE_PROT_R;
+ }
+
read_lock(&kvm->mmu_lock);
pgt = vcpu->arch.hw_mmu->pgt;
if (mmu_invalidate_retry(kvm, mmu_seq))
@@ -1603,7 +1644,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
prot |= KVM_PGTABLE_PROT_NORMAL_NC;
else
prot |= KVM_PGTABLE_PROT_DEVICE;
- } else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC)) {
+ } else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
+ (!nested || kvm_s2_trans_executable(nested))) {
prot |= KVM_PGTABLE_PROT_X;
}
@@ -1663,8 +1705,10 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
*/
int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
{
+ struct kvm_s2_trans nested_trans, *nested = NULL;
unsigned long esr;
- phys_addr_t fault_ipa;
+ phys_addr_t fault_ipa; /* The address we faulted on */
+ phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
struct kvm_memory_slot *memslot;
unsigned long hva;
bool is_iabt, write_fault, writable;
@@ -1673,7 +1717,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
esr = kvm_vcpu_get_esr(vcpu);
- fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+ ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
if (esr_fsc_is_translation_fault(esr)) {
@@ -1723,7 +1767,42 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
idx = srcu_read_lock(&vcpu->kvm->srcu);
- gfn = fault_ipa >> PAGE_SHIFT;
+ /*
+ * We may have faulted on a shadow stage 2 page table if we are
+ * running a nested guest. In this case, we have to resolve the L2
+ * IPA to the L1 IPA first, before knowing what kind of memory should
+ * back the L1 IPA.
+ *
+ * If the shadow stage 2 page table walk faults, then we simply inject
+ * this to the guest and carry on.
+ *
+ * If there are no shadow S2 PTs because S2 is disabled, there is
+ * nothing to walk and we treat the access as a 1:1 mapping before
+ * going through the canonical translation.
+ */
+ if (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu &&
+ vcpu->arch.hw_mmu->nested_stage2_enabled) {
+ u32 esr;
+
+ ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
+ if (ret) {
+ esr = kvm_s2_trans_esr(&nested_trans);
+ kvm_inject_s2_fault(vcpu, esr);
+ goto out_unlock;
+ }
+
+ ret = kvm_s2_handle_perm_fault(vcpu, &nested_trans);
+ if (ret) {
+ esr = kvm_s2_trans_esr(&nested_trans);
+ kvm_inject_s2_fault(vcpu, esr);
+ goto out_unlock;
+ }
+
+ ipa = kvm_s2_trans_output(&nested_trans);
+ nested = &nested_trans;
+ }
+
+ gfn = ipa >> PAGE_SHIFT;
memslot = gfn_to_memslot(vcpu->kvm, gfn);
hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
write_fault = kvm_is_write_fault(vcpu);
@@ -1767,13 +1846,13 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
* faulting VA. This is always 12 bits, irrespective
* of the page size.
*/
- fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
- ret = io_mem_abort(vcpu, fault_ipa);
+ ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
+ ret = io_mem_abort(vcpu, ipa);
goto out_unlock;
}
/* Userspace should not be able to register out-of-bounds IPAs */
- VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->arch.hw_mmu));
+ VM_BUG_ON(ipa >= kvm_phys_size(vcpu->arch.hw_mmu));
if (esr_fsc_is_access_flag_fault(esr)) {
handle_access_fault(vcpu, fault_ipa);
@@ -1781,7 +1860,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
goto out_unlock;
}
- ret = user_mem_abort(vcpu, fault_ipa, memslot, hva,
+ ret = user_mem_abort(vcpu, fault_ipa, nested, memslot, hva,
esr_fsc_is_permission_fault(esr));
if (ret == 0)
ret = 1;
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 2ed97b196757..c2297e696910 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -108,6 +108,15 @@ static u32 compute_fsc(int level, u32 fsc)
return fsc | (level & 0x3);
}
+static int esr_s2_fault(struct kvm_vcpu *vcpu, int level, u32 fsc)
+{
+ u32 esr;
+
+ esr = kvm_vcpu_get_esr(vcpu) & ~ESR_ELx_FSC;
+ esr |= compute_fsc(level, fsc);
+ return esr;
+}
+
static int check_base_s2_limits(struct s2_walk_info *wi,
int level, int input_size, int stride)
{
@@ -470,6 +479,42 @@ void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
}
}
+/*
+ * Returns non-zero if the permission fault is handled by injecting it into
+ * the next-level (guest) hypervisor.
+ */
+int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
+{
+ bool forward_fault = false;
+
+ trans->esr = 0;
+
+ if (!kvm_vcpu_trap_is_permission_fault(vcpu))
+ return 0;
+
+ if (kvm_vcpu_trap_is_iabt(vcpu)) {
+ forward_fault = !kvm_s2_trans_executable(trans);
+ } else {
+ bool write_fault = kvm_is_write_fault(vcpu);
+
+ forward_fault = ((write_fault && !trans->writable) ||
+ (!write_fault && !trans->readable));
+ }
+
+ if (forward_fault)
+ trans->esr = esr_s2_fault(vcpu, trans->level, ESR_ELx_FSC_PERM);
+
+ return forward_fault;
+}
+
+int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+ vcpu_write_sys_reg(vcpu, vcpu->arch.fault.far_el2, FAR_EL2);
+ vcpu_write_sys_reg(vcpu, vcpu->arch.fault.hpfar_el2, HPFAR_EL2);
+
+ return kvm_inject_nested_sync(vcpu, esr_el2);
+}
+
void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
int i;
--
2.39.2
* [PATCH 04/16] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables
2024-04-09 17:54 [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
` (2 preceding siblings ...)
2024-04-09 17:54 ` [PATCH 03/16] KVM: arm64: nv: Handle shadow stage 2 page faults Marc Zyngier
@ 2024-04-09 17:54 ` Marc Zyngier
2024-04-09 17:54 ` [PATCH 05/16] KVM: arm64: nv: Add Stage-1 EL2 invalidation primitives Marc Zyngier
` (11 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
From: Christoffer Dall <christoffer.dall@linaro.org>
Unmap/flush shadow stage 2 page tables for the nested VMs as well as the
stage 2 page table for the guest hypervisor.
Note: A bunch of the code in mmu.c relating to MMU notifiers is
currently dealt with in an extremely abrupt way, for example by clearing
out an entire shadow stage-2 table. This will be handled in a more
efficient way using the reverse mapping feature in a later version of
the patch series.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/include/asm/kvm_mmu.h | 3 +++
arch/arm64/include/asm/kvm_nested.h | 3 +++
arch/arm64/kvm/mmu.c | 30 +++++++++++++++++----
arch/arm64/kvm/nested.c | 42 +++++++++++++++++++++++++++++
4 files changed, 73 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index b48c9e12e7ff..6975e47f9d4c 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -164,6 +164,8 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
void **haddr);
int create_hyp_stack(phys_addr_t phys_addr, unsigned long *haddr);
+void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
+ phys_addr_t addr, phys_addr_t end);
void __init free_hyp_pgds(void);
void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
@@ -173,6 +175,7 @@ void kvm_uninit_stage2_mmu(struct kvm *kvm);
void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
phys_addr_t pa, unsigned long size, bool writable);
+void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end);
int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 35044a5dce3f..afb3d0689e17 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -112,6 +112,9 @@ extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
struct kvm_s2_trans *trans);
extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
+extern void kvm_nested_s2_wp(struct kvm *kvm);
+extern void kvm_nested_s2_unmap(struct kvm *kvm);
+extern void kvm_nested_s2_flush(struct kvm *kvm);
int kvm_init_nv_sysregs(struct kvm *kvm);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index fe2b403e9a14..cf31da306622 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -333,13 +333,19 @@ void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
__unmap_stage2_range(mmu, start, size, true);
}
+void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu,
+ phys_addr_t addr, phys_addr_t end)
+{
+ stage2_apply_range_resched(mmu, addr, end, kvm_pgtable_stage2_flush);
+}
+
static void stage2_flush_memslot(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
- stage2_apply_range_resched(&kvm->arch.mmu, addr, end, kvm_pgtable_stage2_flush);
+ kvm_stage2_flush_range(&kvm->arch.mmu, addr, end);
}
/**
@@ -362,6 +368,8 @@ static void stage2_flush_vm(struct kvm *kvm)
kvm_for_each_memslot(memslot, bkt, slots)
stage2_flush_memslot(kvm, memslot);
+ kvm_nested_s2_flush(kvm);
+
write_unlock(&kvm->mmu_lock);
srcu_read_unlock(&kvm->srcu, idx);
}
@@ -1042,6 +1050,8 @@ void stage2_unmap_vm(struct kvm *kvm)
kvm_for_each_memslot(memslot, bkt, slots)
stage2_unmap_memslot(kvm, memslot);
+ kvm_nested_s2_unmap(kvm);
+
write_unlock(&kvm->mmu_lock);
mmap_read_unlock(current->mm);
srcu_read_unlock(&kvm->srcu, idx);
@@ -1141,12 +1151,12 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
}
/**
- * stage2_wp_range() - write protect stage2 memory region range
+ * kvm_stage2_wp_range() - write protect stage2 memory region range
* @mmu: The KVM stage-2 MMU pointer
* @addr: Start address of range
* @end: End address of range
*/
-static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
+void kvm_stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
{
stage2_apply_range_resched(mmu, addr, end, kvm_pgtable_stage2_wrprotect);
}
@@ -1177,7 +1187,8 @@ static void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
write_lock(&kvm->mmu_lock);
- stage2_wp_range(&kvm->arch.mmu, start, end);
+ kvm_stage2_wp_range(&kvm->arch.mmu, start, end);
+ kvm_nested_s2_wp(kvm);
write_unlock(&kvm->mmu_lock);
kvm_flush_remote_tlbs_memslot(kvm, memslot);
}
@@ -1231,7 +1242,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
lockdep_assert_held_write(&kvm->mmu_lock);
- stage2_wp_range(&kvm->arch.mmu, start, end);
+ kvm_stage2_wp_range(&kvm->arch.mmu, start, end);
/*
* Eager-splitting is done when manual-protect is set. We
@@ -1243,6 +1254,8 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
*/
if (kvm_dirty_log_manual_protect_and_init_set(kvm))
kvm_mmu_split_huge_pages(kvm, start, end);
+
+ kvm_nested_s2_wp(kvm);
}
static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
@@ -1883,6 +1896,7 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
(range->end - range->start) << PAGE_SHIFT,
range->may_block);
+ kvm_nested_s2_unmap(kvm);
return false;
}
@@ -1917,6 +1931,7 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
PAGE_SIZE, __pfn_to_phys(pfn),
KVM_PGTABLE_PROT_R, NULL, 0);
+ kvm_nested_s2_unmap(kvm);
return false;
}
@@ -1930,6 +1945,10 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
range->start << PAGE_SHIFT,
size, true);
+ /*
+ * TODO: Handle nested_mmu structures here using the reverse mapping in
+ * a later version of patch series.
+ */
}
bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
@@ -2180,6 +2199,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
write_lock(&kvm->mmu_lock);
kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
+ kvm_nested_s2_unmap(kvm);
write_unlock(&kvm->mmu_lock);
}
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index c2297e696910..3e95f815ad17 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -515,6 +515,48 @@ int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
return kvm_inject_nested_sync(vcpu, esr_el2);
}
+void kvm_nested_s2_wp(struct kvm *kvm)
+{
+ int i;
+
+ lockdep_assert_held_write(&kvm->mmu_lock);
+
+ for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+ struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+ if (kvm_s2_mmu_valid(mmu))
+ kvm_stage2_wp_range(mmu, 0, kvm_phys_size(mmu));
+ }
+}
+
+void kvm_nested_s2_unmap(struct kvm *kvm)
+{
+ int i;
+
+ lockdep_assert_held_write(&kvm->mmu_lock);
+
+ for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+ struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+ if (kvm_s2_mmu_valid(mmu))
+ kvm_unmap_stage2_range(mmu, 0, kvm_phys_size(mmu));
+ }
+}
+
+void kvm_nested_s2_flush(struct kvm *kvm)
+{
+ int i;
+
+ lockdep_assert_held_write(&kvm->mmu_lock);
+
+ for (i = 0; i < kvm->arch.nested_mmus_size; i++) {
+ struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+ if (kvm_s2_mmu_valid(mmu))
+ kvm_stage2_flush_range(mmu, 0, kvm_phys_size(mmu));
+ }
+}
+
void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
int i;
--
2.39.2
* [PATCH 05/16] KVM: arm64: nv: Add Stage-1 EL2 invalidation primitives
2024-04-09 17:54 [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
` (3 preceding siblings ...)
2024-04-09 17:54 ` [PATCH 04/16] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables Marc Zyngier
@ 2024-04-09 17:54 ` Marc Zyngier
2024-04-09 17:54 ` [PATCH 06/16] KVM: arm64: nv: Handle EL2 Stage-1 TLB invalidation Marc Zyngier
` (10 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
Provide the primitives required to handle TLB invalidation for
Stage-1 EL2 TLBs, which by definition do not require messing
with the Stage-2 page tables.
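As a usage illustration only: emulate_tlbi_vae2is() below is a made-up
caller (the real trap handlers arrive later in the series, and may
well reach this primitive via kvm_call_hyp() rather than a direct
call); it only shows the intended shape of forwarding a trapped
"TLBI VAE2IS, Xt" to the new helper.

	/*
	 * Hypothetical caller, for illustration: remap an EL2 S1
	 * invalidation onto the EL1 TLBs of the current context.
	 * A NULL mmu means "don't switch stage-2 context first".
	 */
	static int emulate_tlbi_vae2is(struct kvm_vcpu *vcpu, u64 xt)
	{
		return __kvm_tlbi_s1e2(NULL, xt, OP_TLBI_VAE2IS);
	}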
Co-developed-by: Jintack Lim <jintack.lim@linaro.org>
Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/include/asm/kvm_asm.h | 2 +
arch/arm64/kvm/hyp/vhe/tlb.c | 65 ++++++++++++++++++++++++++++++++
2 files changed, 67 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 24b5e6b23417..38644fa648fa 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -234,6 +234,8 @@ extern void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
phys_addr_t start, unsigned long pages);
extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
+extern int __kvm_tlbi_s1e2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding);
+
extern void __kvm_timer_set_cntvoff(u64 cntvoff);
extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index 1a60b95381e8..7ffc3f4b09f0 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -219,3 +219,68 @@ void __kvm_flush_vm_context(void)
__tlbi(alle1is);
dsb(ish);
}
+
+/*
+ * TLB invalidation emulation for NV. For any given instruction, we
+ * perform the following transformations:
+ *
+ * - a TLBI targeting EL2 S1 is remapped to EL1 S1
+ * - a non-shareable TLBI is upgraded to being inner-shareable
+ */
+int __kvm_tlbi_s1e2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
+{
+ struct tlb_inv_context cxt;
+ int ret = 0;
+
+ /*
+ * The guest will have provided its own DSB ISHST before trapping.
+ * If it hasn't, that's its own problem, and we won't paper over it
+ * (plus, there is plenty of extra synchronisation before we even
+ * get here...).
+ */
+
+ if (mmu)
+ __tlb_switch_to_guest(mmu, &cxt);
+
+ switch (sys_encoding) {
+ case OP_TLBI_ALLE2:
+ case OP_TLBI_ALLE2IS:
+ case OP_TLBI_VMALLE1:
+ case OP_TLBI_VMALLE1IS:
+ __tlbi(vmalle1is);
+ break;
+ case OP_TLBI_VAE2:
+ case OP_TLBI_VAE2IS:
+ case OP_TLBI_VAE1:
+ case OP_TLBI_VAE1IS:
+ __tlbi(vae1is, va);
+ break;
+ case OP_TLBI_VALE2:
+ case OP_TLBI_VALE2IS:
+ case OP_TLBI_VALE1:
+ case OP_TLBI_VALE1IS:
+ __tlbi(vale1is, va);
+ break;
+ case OP_TLBI_ASIDE1:
+ case OP_TLBI_ASIDE1IS:
+ __tlbi(aside1is, va);
+ break;
+ case OP_TLBI_VAAE1:
+ case OP_TLBI_VAAE1IS:
+ __tlbi(vaae1is, va);
+ break;
+ case OP_TLBI_VAALE1:
+ case OP_TLBI_VAALE1IS:
+ __tlbi(vaale1is, va);
+ break;
+ default:
+ ret = -EINVAL;
+ }
+ dsb(ish);
+ isb();
+
+ if (mmu)
+ __tlb_switch_to_host(&cxt);
+
+ return ret;
+}
--
2.39.2
* [PATCH 06/16] KVM: arm64: nv: Handle EL2 Stage-1 TLB invalidation
2024-04-09 17:54 [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
` (4 preceding siblings ...)
2024-04-09 17:54 ` [PATCH 05/16] KVM: arm64: nv: Add Stage-1 EL2 invalidation primitives Marc Zyngier
@ 2024-04-09 17:54 ` Marc Zyngier
2024-04-09 17:54 ` [PATCH 07/16] KVM: arm64: nv: Handle TLB invalidation targeting L2 stage-1 Marc Zyngier
` (9 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
Due to the way FEAT_NV2 suppresses traps when accessing EL2
system registers, we can't track when the guest changes its
HCR_EL2.TGE setting. This means we always trap EL1 TLBIs,
even if they don't affect any L2 guest.
Given that invalidating the EL2 TLBs doesn't require any messing
with the shadow stage-2 page-tables, we can simply emulate the
instructions early and return directly to the guest.
This is conditioned on the instruction being an EL1 one and
the guest's HCR_EL2.{E2H,TGE} being {1,1} (indicating that
the instruction targets the EL2 S1 TLBs), or the instruction
being one of the EL2 ones (which are not ambiguous).
EL1 TLBIs issued with HCR_EL2.{E2H,TGE}={1,0} are not handled
here, and cause a full exit so that they can be handled in
the context of a VMID.
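A simplified sketch of that decision (illustrative only: the helper
name is made up, and the real logic lives in kvm_hyp_handle_tlbi_el2()
in the hunk below, which is somewhat more involved):

	static bool tlbi_can_be_handled_early(struct kvm_vcpu *vcpu, u32 instr)
	{
		/* The guest's *virtual* HCR_EL2 */
		u64 hcr = vcpu_read_sys_reg(vcpu, HCR_EL2);

		/* EL2 TLBIs are unambiguous */
		if (sys_reg_Op1(instr) == TLBI_Op1_EL2)
			return true;

		/*
		 * EL1 TLBIs only target the EL2 S1 TLBs when the guest
		 * hypervisor runs with HCR_EL2.{E2H,TGE} == {1,1};
		 * anything else needs a full exit and a VMID context.
		 */
		return (hcr & (HCR_E2H | HCR_TGE)) == (HCR_E2H | HCR_TGE);
	}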
Co-developed-by: Jintack Lim <jintack.lim@linaro.org>
Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/include/asm/kvm_nested.h | 55 +++++++++++++++++++++++++++++
arch/arm64/include/asm/sysreg.h | 17 +++++++++
arch/arm64/kvm/hyp/vhe/switch.c | 51 +++++++++++++++++++++++++-
3 files changed, 122 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index afb3d0689e17..136c878b742e 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -116,6 +116,61 @@ extern void kvm_nested_s2_wp(struct kvm *kvm);
extern void kvm_nested_s2_unmap(struct kvm *kvm);
extern void kvm_nested_s2_flush(struct kvm *kvm);
+static inline bool kvm_supported_tlbi_s1e1_op(struct kvm_vcpu *vpcu, u32 instr)
+{
+ struct kvm *kvm = vpcu->kvm;
+ u8 CRm = sys_reg_CRm(instr);
+
+ if (!(sys_reg_Op0(instr) == TLBI_Op0 &&
+ sys_reg_Op1(instr) == TLBI_Op1_EL1))
+ return false;
+
+ if (!(sys_reg_CRn(instr) == TLBI_CRn_XS ||
+ (sys_reg_CRn(instr) == TLBI_CRn_nXS &&
+ kvm_has_feat(kvm, ID_AA64ISAR1_EL1, XS, IMP))))
+ return false;
+
+ if (CRm == TLBI_CRm_nROS &&
+ !kvm_has_feat(kvm, ID_AA64ISAR0_EL1, TLB, OS))
+ return false;
+
+ if ((CRm == TLBI_CRm_RIS || CRm == TLBI_CRm_ROS ||
+ CRm == TLBI_CRm_RNS) &&
+ !kvm_has_feat(kvm, ID_AA64ISAR0_EL1, TLB, RANGE))
+ return false;
+
+ return true;
+}
+
+static inline bool kvm_supported_tlbi_s1e2_op(struct kvm_vcpu *vpcu, u32 instr)
+{
+ struct kvm *kvm = vpcu->kvm;
+ u8 CRm = sys_reg_CRm(instr);
+
+ if (!(sys_reg_Op0(instr) == TLBI_Op0 &&
+ sys_reg_Op1(instr) == TLBI_Op1_EL2))
+ return false;
+
+ if (!(sys_reg_CRn(instr) == TLBI_CRn_XS ||
+ (sys_reg_CRn(instr) == TLBI_CRn_nXS &&
+ kvm_has_feat(kvm, ID_AA64ISAR1_EL1, XS, IMP))))
+ return false;
+
+ if (CRm == TLBI_CRm_IPAIS || CRm == TLBI_CRm_IPAONS)
+ return false;
+
+ if (CRm == TLBI_CRm_nROS &&
+ !kvm_has_feat(kvm, ID_AA64ISAR0_EL1, TLB, OS))
+ return false;
+
+ if ((CRm == TLBI_CRm_RIS || CRm == TLBI_CRm_ROS ||
+ CRm == TLBI_CRm_RNS) &&
+ !kvm_has_feat(kvm, ID_AA64ISAR0_EL1, TLB, RANGE))
+ return false;
+
+ return true;
+}
+
int kvm_init_nv_sysregs(struct kvm *kvm);
#endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 9e8999592f3a..b15194f79273 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -654,6 +654,23 @@
#define OP_AT_S12E0W sys_insn(AT_Op0, 4, AT_CRn, 8, 7)
/* TLBI instructions */
+#define TLBI_Op0 1
+
+#define TLBI_Op1_EL1 0 /* Accessible from EL1 or higher */
+#define TLBI_Op1_EL2 4 /* Accessible from EL2 or higher */
+
+#define TLBI_CRn_XS 8 /* Extra Slow (the common one) */
+#define TLBI_CRn_nXS 9 /* not Extra Slow (which nobody uses) */
+
+#define TLBI_CRm_IPAIS 0 /* S2 Inner-Shareable */
+#define TLBI_CRm_nROS 1 /* non-Range, Outer-Sharable */
+#define TLBI_CRm_RIS 2 /* Range, Inner-Sharable */
+#define TLBI_CRm_nRIS 3 /* non-Range, Inner-Sharable */
+#define TLBI_CRm_IPAONS 4 /* S2 Outer and Non-Shareable */
+#define TLBI_CRm_ROS 5 /* Range, Outer-Sharable */
+#define TLBI_CRm_RNS 6 /* Range, Non-Sharable */
+#define TLBI_CRm_nRNS 7 /* non-Range, Non-Sharable */
+
#define OP_TLBI_VMALLE1OS sys_insn(1, 0, 8, 1, 0)
#define OP_TLBI_VAE1OS sys_insn(1, 0, 8, 1, 1)
#define OP_TLBI_ASIDE1OS sys_insn(1, 0, 8, 1, 2)
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 1581df6aec87..e3ded0ba8e9c 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -173,10 +173,59 @@ void kvm_vcpu_put_vhe(struct kvm_vcpu *vcpu)
__vcpu_put_switch_sysregs(vcpu);
}
+static bool kvm_hyp_handle_tlbi_el2(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+ int ret = -EINVAL;
+ u32 instr;
+ u64 val;
+
+ /*
+ * Ideally, we would never trap on EL2 S1 TLB invalidations using
+ * the EL1 instructions when the guest's HCR_EL2.{E2H,TGE}=={1,1}.
+ * But "thanks" to FEAT_NV2, we don't trap writes to HCR_EL2,
+ * meaning that we can't track changes to the virtual TGE bit. So we
+ * have to leave HCR_EL2.TTLB set on the host. Oopsie...
+ *
+ * Try and handle these invalidations as quickly as possible, without
+ * fully exiting. Note that we don't need to consider any forwarding
+ * here, as having E2H+TGE set is the very definition of being
+ * InHost.
+ *
+ * For the lesser hypervisors out there that have failed to get on
+ * with the VHE program, we can also handle the nVHE style of EL2
+ * invalidation.
+ */
+ if (!(is_hyp_ctxt(vcpu)))
+ return false;
+
+ instr = esr_sys64_to_sysreg(kvm_vcpu_get_esr(vcpu));
+ val = vcpu_get_reg(vcpu, kvm_vcpu_sys_get_rt(vcpu));
+
+ if ((kvm_supported_tlbi_s1e1_op(vcpu, instr) &&
+ vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)) ||
+ kvm_supported_tlbi_s1e2_op (vcpu, instr))
+ ret = __kvm_tlbi_s1e2(NULL, val, instr);
+
+ if (ret)
+ return false;
+
+ __kvm_skip_instr(vcpu);
+
+ return true;
+}
+
+static bool kvm_hyp_handle_sysreg_vhe(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+ if (kvm_hyp_handle_tlbi_el2(vcpu, exit_code))
+ return true;
+
+ return kvm_hyp_handle_sysreg(vcpu, exit_code);
+}
+
static const exit_handler_fn hyp_exit_handlers[] = {
[0 ... ESR_ELx_EC_MAX] = NULL,
[ESR_ELx_EC_CP15_32] = kvm_hyp_handle_cp15_32,
- [ESR_ELx_EC_SYS64] = kvm_hyp_handle_sysreg,
+ [ESR_ELx_EC_SYS64] = kvm_hyp_handle_sysreg_vhe,
[ESR_ELx_EC_SVE] = kvm_hyp_handle_fpsimd,
[ESR_ELx_EC_FP_ASIMD] = kvm_hyp_handle_fpsimd,
[ESR_ELx_EC_IABT_LOW] = kvm_hyp_handle_iabt_low,
--
2.39.2
* [PATCH 07/16] KVM: arm64: nv: Handle TLB invalidation targeting L2 stage-1
2024-04-09 17:54 [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
` (5 preceding siblings ...)
2024-04-09 17:54 ` [PATCH 06/16] KVM: arm64: nv: Handle EL2 Stage-1 TLB invalidation Marc Zyngier
@ 2024-04-09 17:54 ` Marc Zyngier
2024-04-09 17:54 ` [PATCH 08/16] KVM: arm64: nv: Handle TLBI VMALLS12E1{,IS} operations Marc Zyngier
` (8 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
While dealing with TLB invalidation targeting the guest hypervisor's
own stage-1 was easy, doing the same thing for its own guests is
a bit more involved.
Since such an invalidation is scoped by VMID, it needs to apply to
all s2_mmu contexts that have been tagged by that VMID, irrespective
of the value of VTTBR_EL2.BADDR.
So for each s2_mmu context matching that VMID, we invalidate the
corresponding TLBs, each context having its own "physical" VMID.
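A standalone sketch of that per-VMID iteration (not kernel code: the structure
and field names are invented, and a 16-bit VMID in VTTBR_EL2[63:48] is assumed):

#include <stdint.h>
#include <stdio.h>

#define VTTBR_VMID_SHIFT    48

/* Invented stand-in for a shadow stage-2 context */
struct shadow_mmu {
    uint64_t tlb_vttbr;
    int      valid;
};

static uint16_t vttbr_to_vmid(uint64_t vttbr)
{
    return (uint16_t)(vttbr >> VTTBR_VMID_SHIFT);
}

int main(void)
{
    struct shadow_mmu mmus[] = {
        { .tlb_vttbr = (0x42ULL << 48) | 0x80000, .valid = 1 },
        { .tlb_vttbr = (0x17ULL << 48) | 0x90000, .valid = 1 },
        /* same VMID as the first one, different BADDR */
        { .tlb_vttbr = (0x42ULL << 48) | 0xa0000, .valid = 1 },
    };
    uint64_t guest_vttbr = (0x42ULL << 48) | 0x80000;

    for (unsigned int i = 0; i < sizeof(mmus) / sizeof(mmus[0]); i++) {
        if (mmus[i].valid &&
            vttbr_to_vmid(mmus[i].tlb_vttbr) == vttbr_to_vmid(guest_vttbr))
            printf("apply the TLBI to shadow context %u\n", i);
    }
    return 0;
}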
Co-developed-by: Jintack Lim <jintack.lim@linaro.org>
Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/include/asm/kvm_nested.h | 7 +++
arch/arm64/kvm/nested.c | 35 +++++++++++++
arch/arm64/kvm/sys_regs.c | 80 +++++++++++++++++++++++++++++
3 files changed, 122 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 136c878b742e..6b647a169f5d 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -64,6 +64,13 @@ extern void kvm_init_nested(struct kvm *kvm);
extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu);
+
+union tlbi_info;
+
+extern void kvm_s2_mmu_iterate_by_vmid(struct kvm *kvm, u16 vmid,
+ const union tlbi_info *info,
+ void (*)(struct kvm_s2_mmu *,
+ const union tlbi_info *));
extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 3e95f815ad17..645c979a75e5 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -349,6 +349,41 @@ int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
return ret;
}
+/*
+ * We can have multiple *different* MMU contexts with the same VMID:
+ *
+ * - S2 being enabled or not, hence differing by the HCR_EL2.VM bit
+ *
+ * - Multiple vcpus using private S2s (huh huh...), hence differing by the
+ * VTTBR_EL2.BADDR address
+ *
+ * - A combination of the above...
+ *
+ * We can always identify which MMU context to pick at run-time. However,
+ * TLB invalidation involving a VMID must take action on all the TLBs using
+ * this particular VMID. This translates into applying the same invalidation
+ * operation to all the contexts that are using this VMID. Moar phun!
+ */
+void kvm_s2_mmu_iterate_by_vmid(struct kvm *kvm, u16 vmid,
+ const union tlbi_info *info,
+ void (*tlbi_callback)(struct kvm_s2_mmu *,
+ const union tlbi_info *))
+{
+ write_lock(&kvm->mmu_lock);
+
+ for (int i = 0; i < kvm->arch.nested_mmus_size; i++) {
+ struct kvm_s2_mmu *mmu = &kvm->arch.nested_mmus[i];
+
+ if (!kvm_s2_mmu_valid(mmu))
+ continue;
+
+ if (vmid == get_vmid(mmu->tlb_vttbr))
+ tlbi_callback(mmu, info);
+ }
+
+ write_unlock(&kvm->mmu_lock);
+}
+
struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu)
{
bool nested_stage2_enabled;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index c9f4f387155f..4a0317883cd0 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2728,6 +2728,73 @@ static const struct sys_reg_desc sys_reg_descs[] = {
EL2_REG(SP_EL2, NULL, reset_unknown, 0),
};
+/* Only defined here as this is an internal "abstraction" */
+union tlbi_info {
+ struct {
+ u64 start;
+ u64 size;
+ } range;
+
+ struct {
+ u64 addr;
+ } ipa;
+
+ struct {
+ u64 addr;
+ u32 encoding;
+ } va;
+};
+
+static void s2_mmu_tlbi_s1e1(struct kvm_s2_mmu *mmu,
+ const union tlbi_info *info)
+{
+ WARN_ON(__kvm_tlbi_s1e2(mmu, info->va.addr, info->va.encoding));
+}
+
+static bool handle_tlbi_el1(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ u32 sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+ u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+
+ /*
+ * If we're here, this is because we've trapped on an EL1 TLBI
+ * instruction that affects the EL1 translation regime while
+ * we're running in a context that doesn't allow us to let the
+ * HW do its thing (aka vEL2):
+ *
+ * - HCR_EL2.E2H == 0 : a non-VHE guest
+ * - HCR_EL2.{E2H,TGE} == { 1, 0 } : a VHE guest in guest mode
+ *
+ * We don't expect these helpers to ever be called when running
+ * in a vEL1 context.
+ */
+
+ WARN_ON(!vcpu_is_el2(vcpu));
+
+ if (!kvm_supported_tlbi_s1e1_op(vcpu, sys_encoding)) {
+ kvm_inject_undefined(vcpu);
+ return false;
+ }
+
+ kvm_s2_mmu_iterate_by_vmid(vcpu->kvm, get_vmid(vttbr),
+ &(union tlbi_info) {
+ .va = {
+ .addr = p->regval,
+ .encoding = sys_encoding,
+ },
+ },
+ s2_mmu_tlbi_s1e1);
+
+ return true;
+}
+
+#define SYS_INSN(insn, access_fn) \
+ { \
+ SYS_DESC(OP_##insn), \
+ .access = (access_fn), \
+ }
+
static struct sys_reg_desc sys_insn_descs[] = {
{ SYS_DESC(SYS_DC_ISW), access_dcsw },
{ SYS_DESC(SYS_DC_IGSW), access_dcgsw },
@@ -2738,6 +2805,19 @@ static struct sys_reg_desc sys_insn_descs[] = {
{ SYS_DESC(SYS_DC_CISW), access_dcsw },
{ SYS_DESC(SYS_DC_CIGSW), access_dcgsw },
{ SYS_DESC(SYS_DC_CIGDSW), access_dcgsw },
+
+ SYS_INSN(TLBI_VMALLE1IS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAE1IS, handle_tlbi_el1),
+ SYS_INSN(TLBI_ASIDE1IS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAAE1IS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VALE1IS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAALE1IS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VMALLE1, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAE1, handle_tlbi_el1),
+ SYS_INSN(TLBI_ASIDE1, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAAE1, handle_tlbi_el1),
+ SYS_INSN(TLBI_VALE1, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAALE1, handle_tlbi_el1),
};
static const struct sys_reg_desc *first_idreg;
--
2.39.2
* [PATCH 08/16] KVM: arm64: nv: Handle TLBI VMALLS12E1{,IS} operations
2024-04-09 17:54 [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
` (6 preceding siblings ...)
2024-04-09 17:54 ` [PATCH 07/16] KVM: arm64: nv: Handle TLB invalidation targeting L2 stage-1 Marc Zyngier
@ 2024-04-09 17:54 ` Marc Zyngier
2024-04-09 17:54 ` [PATCH 09/16] KVM: arm64: nv: Handle TLBI ALLE1{,IS} operations Marc Zyngier
` (7 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
Emulating TLBI VMALLS12E1* results in tearing down all the shadow
S2 PTs that match the current VMID, since our shadow S2s are just
some form of SW-managed TLBs. That teardown itself results in a
full TLB invalidation for both S1 and S2.
This can result in over-invalidation if two vcpus use the same VMID
to tag private S2 PTs, but this is still correct from an architecture
perspective.
Co-developed-by: Jintack Lim <jintack.lim@linaro.org>
Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/kvm/sys_regs.c | 51 +++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4a0317883cd0..4be090ceb2ba 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2728,6 +2728,22 @@ static const struct sys_reg_desc sys_reg_descs[] = {
EL2_REG(SP_EL2, NULL, reset_unknown, 0),
};
+static bool kvm_supported_tlbi_s12_op(struct kvm_vcpu *vpcu, u32 instr)
+{
+ struct kvm *kvm = vpcu->kvm;
+ u8 CRm = sys_reg_CRm(instr);
+
+ if (sys_reg_CRn(instr) == TLBI_CRn_nXS &&
+ !kvm_has_feat(kvm, ID_AA64ISAR1_EL1, XS, IMP))
+ return false;
+
+ if (CRm == TLBI_CRm_nROS &&
+ !kvm_has_feat(kvm, ID_AA64ISAR0_EL1, TLB, OS))
+ return false;
+
+ return true;
+}
+
/* Only defined here as this is an internal "abstraction" */
union tlbi_info {
struct {
@@ -2745,6 +2761,38 @@ union tlbi_info {
} va;
};
+static void s2_mmu_unmap_stage2_range(struct kvm_s2_mmu *mmu,
+ const union tlbi_info *info)
+{
+ kvm_unmap_stage2_range(mmu, info->range.start, info->range.size);
+}
+
+static bool handle_vmalls12e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ u32 sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+ u64 limit, vttbr;
+
+ if (!kvm_supported_tlbi_s12_op(vcpu, sys_encoding)) {
+ kvm_inject_undefined(vcpu);
+ return false;
+ }
+
+ vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+ limit = BIT_ULL(kvm_get_pa_bits(vcpu->kvm));
+
+ kvm_s2_mmu_iterate_by_vmid(vcpu->kvm, get_vmid(vttbr),
+ &(union tlbi_info) {
+ .range = {
+ .start = 0,
+ .size = limit,
+ },
+ },
+ s2_mmu_unmap_stage2_range);
+
+ return true;
+}
+
static void s2_mmu_tlbi_s1e1(struct kvm_s2_mmu *mmu,
const union tlbi_info *info)
{
@@ -2818,6 +2866,9 @@ static struct sys_reg_desc sys_insn_descs[] = {
SYS_INSN(TLBI_VAAE1, handle_tlbi_el1),
SYS_INSN(TLBI_VALE1, handle_tlbi_el1),
SYS_INSN(TLBI_VAALE1, handle_tlbi_el1),
+
+ SYS_INSN(TLBI_VMALLS12E1IS, handle_vmalls12e1is),
+ SYS_INSN(TLBI_VMALLS12E1, handle_vmalls12e1is),
};
static const struct sys_reg_desc *first_idreg;
--
2.39.2
* [PATCH 09/16] KVM: arm64: nv: Handle TLBI ALLE1{,IS} operations
2024-04-09 17:54 [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
` (7 preceding siblings ...)
2024-04-09 17:54 ` [PATCH 08/16] KVM: arm64: nv: Handle TLBI VMALLS12E1{,IS} operations Marc Zyngier
@ 2024-04-09 17:54 ` Marc Zyngier
2024-04-09 17:54 ` [PATCH 10/16] KVM: arm64: nv: Handle TLBI IPAS2E1{,IS} operations Marc Zyngier
` (6 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
TLBI ALLE1* is a pretty big hammer that invalidates all S1/S2 TLBs.
This translates into the unmapping of all our shadow S2 PTs, itself
resulting in the corresponding TLB invalidations.
Co-developed-by: Jintack Lim <jintack.lim@linaro.org>
Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/kvm/sys_regs.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4be090ceb2ba..24e7d9c494c4 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2744,6 +2744,29 @@ static bool kvm_supported_tlbi_s12_op(struct kvm_vcpu *vpcu, u32 instr)
return true;
}
+static bool handle_alle1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ u32 sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+ if (!kvm_supported_tlbi_s12_op(vcpu, sys_encoding)) {
+ kvm_inject_undefined(vcpu);
+ return false;
+ }
+
+ write_lock(&vcpu->kvm->mmu_lock);
+
+ /*
+ * Drop all shadow S2s, resulting in S1/S2 TLBIs for each of the
+ * corresponding VMIDs.
+ */
+ kvm_nested_s2_unmap(vcpu->kvm);
+
+ write_unlock(&vcpu->kvm->mmu_lock);
+
+ return true;
+}
+
/* Only defined here as this is an internal "abstraction" */
union tlbi_info {
struct {
@@ -2867,7 +2890,9 @@ static struct sys_reg_desc sys_insn_descs[] = {
SYS_INSN(TLBI_VALE1, handle_tlbi_el1),
SYS_INSN(TLBI_VAALE1, handle_tlbi_el1),
+ SYS_INSN(TLBI_ALLE1IS, handle_alle1is),
SYS_INSN(TLBI_VMALLS12E1IS, handle_vmalls12e1is),
+ SYS_INSN(TLBI_ALLE1, handle_alle1is),
SYS_INSN(TLBI_VMALLS12E1, handle_vmalls12e1is),
};
--
2.39.2
* [PATCH 10/16] KVM: arm64: nv: Handle TLBI IPAS2E1{,IS} operations
2024-04-09 17:54 [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
` (8 preceding siblings ...)
2024-04-09 17:54 ` [PATCH 09/16] KVM: arm64: nv: Handle TLBI ALLE1{,IS} operations Marc Zyngier
@ 2024-04-09 17:54 ` Marc Zyngier
2024-04-09 17:54 ` [PATCH 11/16] KVM: arm64: nv: Handle FEAT_TTL hinted TLB operations Marc Zyngier
` (5 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
TLBI IPAS2E1* are the last class of TLBI instructions we need
to handle. For each matching S2 MMU context, we invalidate a
range corresponding to the largest possible mapping for that
context.
At this stage, we don't handle TTL, which means we are likely
over-invalidating. Further patches will aim at making this
a bit better.
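A standalone sketch of that worst-case sizing (not kernel code; the constants
are the largest block each granule can map while 52-bit IPAs are unsupported,
and the IPA value is made up):

#include <stdint.h>
#include <stdio.h>

enum granule { TG_4K, TG_16K, TG_64K };

static uint64_t max_inval_size(enum granule tg)
{
    switch (tg) {
    case TG_4K:  return 1ULL << 30;    /* 1GB block */
    case TG_16K: return 32ULL << 20;   /* 32MB block */
    default:     return 512ULL << 20;  /* 64K granule: 512MB block */
    }
}

int main(void)
{
    uint64_t ipa  = 0x12345000;        /* IPA taken from the TLBI operand */
    uint64_t size = max_inval_size(TG_4K);
    uint64_t base = ipa & ~(size - 1);

    printf("unmap shadow S2 range [%#llx, %#llx)\n",
           (unsigned long long)base,
           (unsigned long long)(base + size));
    return 0;
}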
Co-developed-by: Jintack Lim <jintack.lim@linaro.org>
Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/kvm/sys_regs.c | 96 +++++++++++++++++++++++++++++++++++++++
1 file changed, 96 insertions(+)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 24e7d9c494c4..ce477867ec03 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2767,6 +2767,31 @@ static bool handle_alle1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
return true;
}
+static bool kvm_supported_tlbi_ipas2_op(struct kvm_vcpu *vpcu, u32 instr)
+{
+ struct kvm *kvm = vpcu->kvm;
+ u8 CRm = sys_reg_CRm(instr);
+ u8 Op2 = sys_reg_Op2(instr);
+
+ if (sys_reg_CRn(instr) == TLBI_CRn_nXS &&
+ !kvm_has_feat(kvm, ID_AA64ISAR1_EL1, XS, IMP))
+ return false;
+
+ if (CRm == TLBI_CRm_IPAIS && (Op2 == 2 || Op2 == 6) &&
+ !kvm_has_feat(kvm, ID_AA64ISAR0_EL1, TLB, RANGE))
+ return false;
+
+ if (CRm == TLBI_CRm_IPAONS && (Op2 == 0 || Op2 == 4) &&
+ !kvm_has_feat(kvm, ID_AA64ISAR0_EL1, TLB, OS))
+ return false;
+
+ if (CRm == TLBI_CRm_IPAONS && (Op2 == 3 || Op2 == 7) &&
+ !kvm_has_feat(kvm, ID_AA64ISAR0_EL1, TLB, RANGE))
+ return false;
+
+ return true;
+}
+
/* Only defined here as this is an internal "abstraction" */
union tlbi_info {
struct {
@@ -2816,6 +2841,72 @@ static bool handle_vmalls12e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
return true;
}
+static void s2_mmu_unmap_stage2_ipa(struct kvm_s2_mmu *mmu,
+ const union tlbi_info *info)
+{
+ unsigned long max_size;
+ u64 base_addr;
+
+ /*
+ * We drop a number of things from the supplied value:
+ *
+ * - NS bit: we're non-secure only.
+ *
+ * - TTL field: We already have the granule size from the
+ * VTCR_EL2.TG0 field, and the level is only relevant to the
+ * guest's S2PT.
+ *
+ * - IPA[51:48]: We don't support 52bit IPA just yet...
+ *
+ * And of course, adjust the IPA to be on an actual address.
+ */
+ base_addr = (info->ipa.addr & GENMASK_ULL(35, 0)) << 12;
+
+ /* Compute the maximum extent of the invalidation */
+ switch (mmu->tlb_vtcr & VTCR_EL2_TG0_MASK) {
+ case VTCR_EL2_TG0_4K:
+ max_size = SZ_1G;
+ break;
+ case VTCR_EL2_TG0_16K:
+ max_size = SZ_32M;
+ break;
+ case VTCR_EL2_TG0_64K:
+ default: /* IMPDEF: treat any other value as 64k */
+ /*
+ * No, we do not support 52bit IPA in nested yet. Once
+ * we do, this should be 4TB.
+ */
+ max_size = SZ_512M;
+ break;
+ }
+
+ base_addr &= ~(max_size - 1);
+
+ kvm_unmap_stage2_range(mmu, base_addr, max_size);
+}
+
+static bool handle_ipas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ u32 sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+ u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+
+ if (!kvm_supported_tlbi_ipas2_op(vcpu, sys_encoding)) {
+ kvm_inject_undefined(vcpu);
+ return false;
+ }
+
+ kvm_s2_mmu_iterate_by_vmid(vcpu->kvm, get_vmid(vttbr),
+ &(union tlbi_info) {
+ .ipa = {
+ .addr = p->regval,
+ },
+ },
+ s2_mmu_unmap_stage2_ipa);
+
+ return true;
+}
+
static void s2_mmu_tlbi_s1e1(struct kvm_s2_mmu *mmu,
const union tlbi_info *info)
{
@@ -2890,8 +2981,13 @@ static struct sys_reg_desc sys_insn_descs[] = {
SYS_INSN(TLBI_VALE1, handle_tlbi_el1),
SYS_INSN(TLBI_VAALE1, handle_tlbi_el1),
+ SYS_INSN(TLBI_IPAS2E1IS, handle_ipas2e1is),
+ SYS_INSN(TLBI_IPAS2LE1IS, handle_ipas2e1is),
+
SYS_INSN(TLBI_ALLE1IS, handle_alle1is),
SYS_INSN(TLBI_VMALLS12E1IS, handle_vmalls12e1is),
+ SYS_INSN(TLBI_IPAS2E1, handle_ipas2e1is),
+ SYS_INSN(TLBI_IPAS2LE1, handle_ipas2e1is),
SYS_INSN(TLBI_ALLE1, handle_alle1is),
SYS_INSN(TLBI_VMALLS12E1, handle_vmalls12e1is),
};
--
2.39.2
* [PATCH 11/16] KVM: arm64: nv: Handle FEAT_TTL hinted TLB operations
2024-04-09 17:54 [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
` (9 preceding siblings ...)
2024-04-09 17:54 ` [PATCH 10/16] KVM: arm64: nv: Handle TLBI IPAS2E1{,IS} operations Marc Zyngier
@ 2024-04-09 17:54 ` Marc Zyngier
2024-04-09 17:54 ` [PATCH 12/16] KVM: arm64: nv: Tag shadow S2 entries with guest's leaf S2 level Marc Zyngier
` (4 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
Support guest-provided information to size the range of required
invalidation. This helps reduce over-invalidation, provided that the
guest actually provides accurate information.
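A standalone sketch of how the hint is decoded (not kernel code; the TG and
level encodings follow the architected TTL format in bits [47:44] of the
operand, the operand value itself is made up):

#include <stdint.h>
#include <stdio.h>

static uint64_t ttl_to_size(uint8_t ttl)
{
    unsigned int tg = (ttl >> 2) & 3, level = ttl & 3;

    switch (tg) {
    case 1: /* 4K granule */
        if (level == 1) return 1ULL << 30;    /* 1GB  */
        if (level == 2) return 2ULL << 20;    /* 2MB  */
        if (level == 3) return 4ULL << 10;    /* 4KB  */
        break;
    case 2: /* 16K granule */
        if (level == 2) return 32ULL << 20;   /* 32MB */
        if (level == 3) return 16ULL << 10;   /* 16KB */
        break;
    case 3: /* 64K granule */
        if (level == 2) return 512ULL << 20;  /* 512MB */
        if (level == 3) return 64ULL << 10;   /* 64KB  */
        break;
    }
    return 0;  /* no usable hint: fall back to the granule's max block */
}

int main(void)
{
    uint64_t val = (0x6ULL << 44) | 0x12345;  /* made-up TLBI operand */
    uint8_t ttl = (val >> 44) & 0xf;

    /* TTL 0b0110: 4K granule, level 2, so at most a 2MB invalidation */
    printf("TTL=%#x -> at most %#llx bytes\n",
           ttl, (unsigned long long)ttl_to_size(ttl));
    return 0;
}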
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/include/asm/kvm_nested.h | 2 +
arch/arm64/kvm/nested.c | 89 +++++++++++++++++++++++++++++
arch/arm64/kvm/sys_regs.c | 24 +-------
3 files changed, 92 insertions(+), 23 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index 6b647a169f5d..ec79e16d8436 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -123,6 +123,8 @@ extern void kvm_nested_s2_wp(struct kvm *kvm);
extern void kvm_nested_s2_unmap(struct kvm *kvm);
extern void kvm_nested_s2_flush(struct kvm *kvm);
+unsigned long compute_tlb_inval_range(struct kvm_s2_mmu *mmu, u64 val);
+
static inline bool kvm_supported_tlbi_s1e1_op(struct kvm_vcpu *vpcu, u32 instr)
{
struct kvm *kvm = vpcu->kvm;
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 645c979a75e5..5abdf82bb9a4 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -349,6 +349,95 @@ int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
return ret;
}
+static unsigned int ttl_to_size(u8 ttl)
+{
+ int level = ttl & 3;
+ int gran = (ttl >> 2) & 3;
+ unsigned int max_size = 0;
+
+ switch (gran) {
+ case TLBI_TTL_TG_4K:
+ switch (level) {
+ case 0:
+ break;
+ case 1:
+ max_size = SZ_1G;
+ break;
+ case 2:
+ max_size = SZ_2M;
+ break;
+ case 3:
+ max_size = SZ_4K;
+ break;
+ }
+ break;
+ case TLBI_TTL_TG_16K:
+ switch (level) {
+ case 0:
+ case 1:
+ break;
+ case 2:
+ max_size = SZ_32M;
+ break;
+ case 3:
+ max_size = SZ_16K;
+ break;
+ }
+ break;
+ case TLBI_TTL_TG_64K:
+ switch (level) {
+ case 0:
+ case 1:
+ /* No 52bit IPA support */
+ break;
+ case 2:
+ max_size = SZ_512M;
+ break;
+ case 3:
+ max_size = SZ_64K;
+ break;
+ }
+ break;
+ default: /* No size information */
+ break;
+ }
+
+ return max_size;
+}
+
+unsigned long compute_tlb_inval_range(struct kvm_s2_mmu *mmu, u64 val)
+{
+ unsigned long max_size;
+ u8 ttl;
+
+ ttl = FIELD_GET(GENMASK_ULL(47, 44), val);
+
+ max_size = ttl_to_size(ttl);
+
+ if (!max_size) {
+ /* Compute the maximum extent of the invalidation */
+ switch (mmu->tlb_vtcr & VTCR_EL2_TG0_MASK) {
+ case VTCR_EL2_TG0_4K:
+ max_size = SZ_1G;
+ break;
+ case VTCR_EL2_TG0_16K:
+ max_size = SZ_32M;
+ break;
+ case VTCR_EL2_TG0_64K:
+ default: /* IMPDEF: treat any other value as 64k */
+ /*
+ * No, we do not support 52bit IPA in nested yet. Once
+ * we do, this should be 4TB.
+ */
+ max_size = SZ_512M;
+ break;
+ }
+ }
+
+ WARN_ON(!max_size);
+ return max_size;
+}
+
/*
* We can have multiple *different* MMU contexts with the same VMID:
*
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ce477867ec03..810ea3dd2088 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2852,34 +2852,12 @@ static void s2_mmu_unmap_stage2_ipa(struct kvm_s2_mmu *mmu,
*
* - NS bit: we're non-secure only.
*
- * - TTL field: We already have the granule size from the
- * VTCR_EL2.TG0 field, and the level is only relevant to the
- * guest's S2PT.
- *
* - IPA[51:48]: We don't support 52bit IPA just yet...
*
* And of course, adjust the IPA to be on an actual address.
*/
base_addr = (info->ipa.addr & GENMASK_ULL(35, 0)) << 12;
-
- /* Compute the maximum extent of the invalidation */
- switch (mmu->tlb_vtcr & VTCR_EL2_TG0_MASK) {
- case VTCR_EL2_TG0_4K:
- max_size = SZ_1G;
- break;
- case VTCR_EL2_TG0_16K:
- max_size = SZ_32M;
- break;
- case VTCR_EL2_TG0_64K:
- default: /* IMPDEF: treat any other value as 64k */
- /*
- * No, we do not support 52bit IPA in nested yet. Once
- * we do, this should be 4TB.
- */
- max_size = SZ_512M;
- break;
- }
-
+ max_size = compute_tlb_inval_range(mmu, info->ipa.addr);
base_addr &= ~(max_size - 1);
kvm_unmap_stage2_range(mmu, base_addr, max_size);
--
2.39.2
* [PATCH 12/16] KVM: arm64: nv: Tag shadow S2 entries with guest's leaf S2 level
2024-04-09 17:54 [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
` (10 preceding siblings ...)
2024-04-09 17:54 ` [PATCH 11/16] KVM: arm64: nv: Handle FEAT_TTL hinted TLB operations Marc Zyngier
@ 2024-04-09 17:54 ` Marc Zyngier
2024-04-09 17:54 ` [PATCH 13/16] KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information Marc Zyngier
` (3 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
Populate bits [56:55] of the leaf entry with the level provided
by the guest's S2 translation. This will allow us to better scope
the invalidation by remembering the mapping size.
Of course, this assumes that the guest will issue an invalidation
with an address that falls into the same leaf. If the guest doesn't,
we'll over-invalidate.
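A standalone sketch of that tagging (not kernel code; the SW-bit positions
[56:55] match the stage-2 descriptor, the descriptor value and helpers are
invented):

#include <stdint.h>
#include <stdio.h>

#define NV_MAP_SZ_SHIFT 55
#define NV_MAP_SZ_MASK  (3ULL << NV_MAP_SZ_SHIFT)

static uint64_t encode_level(uint64_t desc, unsigned int level)
{
    return (desc & ~NV_MAP_SZ_MASK) | ((uint64_t)level << NV_MAP_SZ_SHIFT);
}

static unsigned int decode_level(uint64_t desc)
{
    return (desc & NV_MAP_SZ_MASK) >> NV_MAP_SZ_SHIFT;
}

int main(void)
{
    uint64_t desc = 0x0040000000000747ULL;  /* made-up valid leaf descriptor */

    desc = encode_level(desc, 2);           /* guest mapped this IPA at level 2 */
    printf("desc=%#llx, guest leaf level=%u\n",
           (unsigned long long)desc, decode_level(desc));
    return 0;
}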
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/include/asm/kvm_nested.h | 8 ++++++++
arch/arm64/kvm/mmu.c | 16 ++++++++++++++--
2 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index ec79e16d8436..3119e0a195dc 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -5,6 +5,7 @@
#include <linux/bitfield.h>
#include <linux/kvm_host.h>
#include <asm/kvm_emulate.h>
+#include <asm/kvm_pgtable.h>
static inline bool vcpu_has_nv(const struct kvm_vcpu *vcpu)
{
@@ -182,4 +183,11 @@ static inline bool kvm_supported_tlbi_s1e2_op(struct kvm_vcpu *vpcu, u32 instr)
int kvm_init_nv_sysregs(struct kvm *kvm);
+#define KVM_NV_GUEST_MAP_SZ (KVM_PGTABLE_PROT_SW1 | KVM_PGTABLE_PROT_SW0)
+
+static inline u64 kvm_encode_nested_level(struct kvm_s2_trans *trans)
+{
+ return FIELD_PREP(KVM_NV_GUEST_MAP_SZ, trans->level);
+}
+
#endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index cf31da306622..0412c3c07c75 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1606,11 +1606,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* Potentially reduce shadow S2 permissions to match the guest's own
* S2. For exec faults, we'd only reach this point if the guest
* actually allowed it (see kvm_s2_handle_perm_fault).
+ *
+ * Also encode the level of the nested translation in the SW bits of
+ * the PTE/PMD/PUD. This will be retrieved on TLB invalidation from
+ * the guest.
*/
if (nested) {
writable &= kvm_s2_trans_writable(nested);
if (!kvm_s2_trans_readable(nested))
prot &= ~KVM_PGTABLE_PROT_R;
+
+ prot |= kvm_encode_nested_level(nested);
}
read_lock(&kvm->mmu_lock);
@@ -1667,14 +1673,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* permissions only if vma_pagesize equals fault_granule. Otherwise,
* kvm_pgtable_stage2_map() should be called to change block size.
*/
- if (fault_is_perm && vma_pagesize == fault_granule)
+ if (fault_is_perm && vma_pagesize == fault_granule) {
+ /*
+ * Drop the SW bits in favour of those stored in the
+ * PTE, which will be preserved.
+ */
+ prot &= ~KVM_NV_GUEST_MAP_SZ;
ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
- else
+ } else {
ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
__pfn_to_phys(pfn), prot,
memcache,
KVM_PGTABLE_WALK_HANDLE_FAULT |
KVM_PGTABLE_WALK_SHARED);
+ }
/* Mark the page dirty only if the fault is handled successfully */
if (writable && !ret) {
--
2.39.2
* [PATCH 13/16] KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information
2024-04-09 17:54 [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
` (11 preceding siblings ...)
2024-04-09 17:54 ` [PATCH 12/16] KVM: arm64: nv: Tag shadow S2 entries with guest's leaf S2 level Marc Zyngier
@ 2024-04-09 17:54 ` Marc Zyngier
2024-04-09 17:54 ` [PATCH 14/16] KVM: arm64: nv: Add handling of outer-shareable TLBI operations Marc Zyngier
` (2 subsequent siblings)
15 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
In order to be able to make S2 TLB invalidations more performant on NV,
let's use a scheme derived from the FEAT_TTL extension.
If bits [56:55] in the leaf descriptor translating the address in the
corresponding shadow S2 are non-zero, they indicate a level which can
be used as an invalidation range. This allows further reduction of the
systematic over-invalidation that takes place otherwise.
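A standalone sketch of the resulting decision chain (not kernel code; every
helper below is a stand-in for the real one, and the values are made up):

#include <stdint.h>
#include <stdio.h>

/* Stand-ins for the real helpers, for illustration only */
static uint8_t ttl_from_operand(uint64_t op)
{
    return (op >> 44) & 0xf;             /* the FEAT_TTL hint, if any */
}

static uint8_t ttl_from_shadow_s2(uint64_t ipa)
{
    (void)ipa;
    return 0x6;                          /* pretend the walk found a 4K level-2 leaf */
}

static uint64_t ttl_to_size(uint8_t ttl)
{
    return ttl == 0x6 ? 2ULL << 20 : 0;  /* grossly simplified */
}

static uint64_t inval_size(uint64_t op, int guest_has_ttl)
{
    uint8_t ttl = guest_has_ttl ? ttl_from_operand(op) : 0;

    if (!ttl)                            /* no hint from the guest... */
        ttl = ttl_from_shadow_s2((op & 0xfffffffffULL) << 12);
    if (ttl_to_size(ttl))                /* ...but the shadow S2 knows */
        return ttl_to_size(ttl);
    return 1ULL << 30;                   /* last resort: the granule's max block */
}

int main(void)
{
    printf("invalidation size: %#llx\n",
           (unsigned long long)inval_size(0x12345ULL, 0));
    return 0;
}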
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/kvm/nested.c | 83 ++++++++++++++++++++++++++++++++++++++++-
1 file changed, 82 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 5abdf82bb9a4..8163f7308b82 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -4,6 +4,7 @@
* Author: Jintack Lim <jintack.lim@linaro.org>
*/
+#include <linux/bitfield.h>
#include <linux/kvm.h>
#include <linux/kvm_host.h>
@@ -405,12 +406,92 @@ static unsigned int ttl_to_size(u8 ttl)
return max_size;
}
+/*
+ * Compute the equivalent of the TTL field by parsing the shadow PT. The
+ * granule size is extracted from the cached VTCR_EL2.TG0 while the level is
+ * retrieved from the first entry carrying the level as a tag.
+ */
+static u8 get_guest_mapping_ttl(struct kvm_s2_mmu *mmu, u64 addr)
+{
+ u64 tmp, sz = 0, vtcr = mmu->tlb_vtcr;
+ kvm_pte_t pte;
+ u8 ttl, level;
+
+ switch (vtcr & VTCR_EL2_TG0_MASK) {
+ case VTCR_EL2_TG0_4K:
+ ttl = (TLBI_TTL_TG_4K << 2);
+ break;
+ case VTCR_EL2_TG0_16K:
+ ttl = (TLBI_TTL_TG_16K << 2);
+ break;
+ case VTCR_EL2_TG0_64K:
+ default: /* IMPDEF: treat any other value as 64k */
+ ttl = (TLBI_TTL_TG_64K << 2);
+ break;
+ }
+
+ tmp = addr;
+
+again:
+ /* Iteratively compute the block sizes for a particular granule size */
+ switch (vtcr & VTCR_EL2_TG0_MASK) {
+ case VTCR_EL2_TG0_4K:
+ if (sz < SZ_4K) sz = SZ_4K;
+ else if (sz < SZ_2M) sz = SZ_2M;
+ else if (sz < SZ_1G) sz = SZ_1G;
+ else sz = 0;
+ break;
+ case VTCR_EL2_TG0_16K:
+ if (sz < SZ_16K) sz = SZ_16K;
+ else if (sz < SZ_32M) sz = SZ_32M;
+ else sz = 0;
+ break;
+ case VTCR_EL2_TG0_64K:
+ default: /* IMPDEF: treat any other value as 64k */
+ if (sz < SZ_64K) sz = SZ_64K;
+ else if (sz < SZ_512M) sz = SZ_512M;
+ else sz = 0;
+ break;
+ }
+
+ if (sz == 0)
+ return 0;
+
+ tmp &= ~(sz - 1);
+ if (kvm_pgtable_get_leaf(mmu->pgt, tmp, &pte, NULL))
+ goto again;
+ if (!(pte & PTE_VALID))
+ goto again;
+ level = FIELD_GET(KVM_NV_GUEST_MAP_SZ, pte);
+ if (!level)
+ goto again;
+
+ ttl |= level;
+
+ /*
+ * We now have found some level information in the shadow S2. Check
+ * that the resulting range is actually including the original IPA.
+ */
+ sz = ttl_to_size(ttl);
+ if (addr < (tmp + sz))
+ return ttl;
+
+ return 0;
+}
+
unsigned long compute_tlb_inval_range(struct kvm_s2_mmu *mmu, u64 val)
{
+ struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
unsigned long max_size;
u8 ttl;
- ttl = FIELD_GET(GENMASK_ULL(47, 44), val);
+ ttl = FIELD_GET(TLBI_TTL_MASK, val);
+
+ if (!ttl || !kvm_has_feat(kvm, ID_AA64MMFR2_EL1, TTL, IMP)) {
+ /* No TTL, check the shadow S2 for a hint */
+ u64 addr = (val & GENMASK_ULL(35, 0)) << 12;
+ ttl = get_guest_mapping_ttl(mmu, addr);
+ }
max_size = ttl_to_size(ttl);
--
2.39.2
* [PATCH 14/16] KVM: arm64: nv: Add handling of outer-shareable TLBI operations
2024-04-09 17:54 [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
` (12 preceding siblings ...)
2024-04-09 17:54 ` [PATCH 13/16] KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information Marc Zyngier
@ 2024-04-09 17:54 ` Marc Zyngier
2024-04-09 17:54 ` [PATCH 15/16] KVM: arm64: nv: Add handling of range-based " Marc Zyngier
2024-04-09 17:54 ` [PATCH 16/16] KVM: arm64: nv: Add handling of NXS-flavoured " Marc Zyngier
15 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
Our handling of outer-shareable TLBIs is pretty basic: we just
map them to the existing inner-shareable ones, because we really
don't have anything else.
The only significant change is that we can now advertise FEAT_TLBIOS
support if the host supports it.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/kvm/hyp/vhe/tlb.c | 10 ++++++++++
arch/arm64/kvm/nested.c | 5 ++++-
arch/arm64/kvm/sys_regs.c | 15 +++++++++++++++
3 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index 7ffc3f4b09f0..a5e6c01ccdde 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -226,6 +226,7 @@ void __kvm_flush_vm_context(void)
*
* - a TLBI targeting EL2 S1 is remapped to EL1 S1
* - a non-shareable TLBI is upgraded to being inner-shareable
+ * - an outer-shareable TLBI is also mapped to inner-shareable
*/
int __kvm_tlbi_s1e2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
{
@@ -245,32 +246,41 @@ int __kvm_tlbi_s1e2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
switch (sys_encoding) {
case OP_TLBI_ALLE2:
case OP_TLBI_ALLE2IS:
+ case OP_TLBI_ALLE2OS:
case OP_TLBI_VMALLE1:
case OP_TLBI_VMALLE1IS:
+ case OP_TLBI_VMALLE1OS:
__tlbi(vmalle1is);
break;
case OP_TLBI_VAE2:
case OP_TLBI_VAE2IS:
+ case OP_TLBI_VAE2OS:
case OP_TLBI_VAE1:
case OP_TLBI_VAE1IS:
+ case OP_TLBI_VAE1OS:
__tlbi(vae1is, va);
break;
case OP_TLBI_VALE2:
case OP_TLBI_VALE2IS:
+ case OP_TLBI_VALE2OS:
case OP_TLBI_VALE1:
case OP_TLBI_VALE1IS:
+ case OP_TLBI_VALE1OS:
__tlbi(vale1is, va);
break;
case OP_TLBI_ASIDE1:
case OP_TLBI_ASIDE1IS:
+ case OP_TLBI_ASIDE1OS:
__tlbi(aside1is, va);
break;
case OP_TLBI_VAAE1:
case OP_TLBI_VAAE1IS:
+ case OP_TLBI_VAAE1OS:
__tlbi(vaae1is, va);
break;
case OP_TLBI_VAALE1:
case OP_TLBI_VAALE1IS:
+ case OP_TLBI_VAALE1OS:
__tlbi(vaale1is, va);
break;
default:
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 8163f7308b82..f9487c04840a 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -791,9 +791,12 @@ static u64 limit_nv_id_reg(u32 id, u64 val)
switch (id) {
case SYS_ID_AA64ISAR0_EL1:
- /* Support everything but TME, O.S. and Range TLBIs */
+ /* Support everything but TME and Range TLBIs */
+ tmp = FIELD_GET(NV_FTR(ISAR0, TLB), val);
+ tmp = min(tmp, ID_AA64ISAR0_EL1_TLB_OS);
val &= ~(NV_FTR(ISAR0, TLB) |
NV_FTR(ISAR0, TME));
+ val |= FIELD_PREP(NV_FTR(ISAR0, TLB), tmp);
break;
case SYS_ID_AA64ISAR1_EL1:
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 810ea3dd2088..a7c5fa320cda 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2946,6 +2946,13 @@ static struct sys_reg_desc sys_insn_descs[] = {
{ SYS_DESC(SYS_DC_CIGSW), access_dcgsw },
{ SYS_DESC(SYS_DC_CIGDSW), access_dcgsw },
+ SYS_INSN(TLBI_VMALLE1OS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAE1OS, handle_tlbi_el1),
+ SYS_INSN(TLBI_ASIDE1OS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAAE1OS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VALE1OS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAALE1OS, handle_tlbi_el1),
+
SYS_INSN(TLBI_VMALLE1IS, handle_tlbi_el1),
SYS_INSN(TLBI_VAE1IS, handle_tlbi_el1),
SYS_INSN(TLBI_ASIDE1IS, handle_tlbi_el1),
@@ -2962,9 +2969,17 @@ static struct sys_reg_desc sys_insn_descs[] = {
SYS_INSN(TLBI_IPAS2E1IS, handle_ipas2e1is),
SYS_INSN(TLBI_IPAS2LE1IS, handle_ipas2e1is),
+ SYS_INSN(TLBI_ALLE2OS, trap_undef),
+ SYS_INSN(TLBI_VAE2OS, trap_undef),
+ SYS_INSN(TLBI_ALLE1OS, handle_alle1is),
+ SYS_INSN(TLBI_VALE2OS, trap_undef),
+ SYS_INSN(TLBI_VMALLS12E1OS, handle_vmalls12e1is),
+
SYS_INSN(TLBI_ALLE1IS, handle_alle1is),
SYS_INSN(TLBI_VMALLS12E1IS, handle_vmalls12e1is),
+ SYS_INSN(TLBI_IPAS2E1OS, handle_ipas2e1is),
SYS_INSN(TLBI_IPAS2E1, handle_ipas2e1is),
+ SYS_INSN(TLBI_IPAS2LE1OS, handle_ipas2e1is),
SYS_INSN(TLBI_IPAS2LE1, handle_ipas2e1is),
SYS_INSN(TLBI_ALLE1, handle_alle1is),
SYS_INSN(TLBI_VMALLS12E1, handle_vmalls12e1is),
--
2.39.2
* [PATCH 15/16] KVM: arm64: nv: Add handling of range-based TLBI operations
2024-04-09 17:54 [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
` (13 preceding siblings ...)
2024-04-09 17:54 ` [PATCH 14/16] KVM: arm64: nv: Add handling of outer-shareable TLBI operations Marc Zyngier
@ 2024-04-09 17:54 ` Marc Zyngier
2024-04-09 17:54 ` [PATCH 16/16] KVM: arm64: nv: Add handling of NXS-flavoured " Marc Zyngier
15 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
We already support some form of range operation by handling FEAT_TTL,
but so far the "arbitrary" range operations are unsupported.
Let's fix that.
For EL2 S1, this is simple enough: we just map the NSH, ISH and OSH
instructions onto the ISH version for EL1.
For TLBI instructions affecting EL1 S1, we use the same model as
their non-range counterpart to invalidate in the context of the
correct VMID.
For TLBI instructions affecting S2, we interpret the data passed
by the guest to compute the range and use that to tear-down part
of the shadow S2 range and invalidate the TLBs.
Finally, we advertise FEAT_TLBIRANGE if the host supports it.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/kvm/hyp/vhe/tlb.c | 26 ++++++++++++
arch/arm64/kvm/nested.c | 8 +---
arch/arm64/kvm/sys_regs.c | 80 ++++++++++++++++++++++++++++++++++++
3 files changed, 108 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index a5e6c01ccdde..8afaa237f5d4 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -283,6 +283,32 @@ int __kvm_tlbi_s1e2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
case OP_TLBI_VAALE1OS:
__tlbi(vaale1is, va);
break;
+ case OP_TLBI_RVAE2:
+ case OP_TLBI_RVAE2IS:
+ case OP_TLBI_RVAE2OS:
+ case OP_TLBI_RVAE1:
+ case OP_TLBI_RVAE1IS:
+ case OP_TLBI_RVAE1OS:
+ __tlbi(rvae1is, va);
+ break;
+ case OP_TLBI_RVALE2:
+ case OP_TLBI_RVALE2IS:
+ case OP_TLBI_RVALE2OS:
+ case OP_TLBI_RVALE1:
+ case OP_TLBI_RVALE1IS:
+ case OP_TLBI_RVALE1OS:
+ __tlbi(rvale1is, va);
+ break;
+ case OP_TLBI_RVAAE1:
+ case OP_TLBI_RVAAE1IS:
+ case OP_TLBI_RVAAE1OS:
+ __tlbi(rvaae1is, va);
+ break;
+ case OP_TLBI_RVAALE1:
+ case OP_TLBI_RVAALE1IS:
+ case OP_TLBI_RVAALE1OS:
+ __tlbi(rvaale1is, va);
+ break;
default:
ret = -EINVAL;
}
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index f9487c04840a..c63824479647 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -791,12 +791,8 @@ static u64 limit_nv_id_reg(u32 id, u64 val)
switch (id) {
case SYS_ID_AA64ISAR0_EL1:
- /* Support everything but TME and Range TLBIs */
- tmp = FIELD_GET(NV_FTR(ISAR0, TLB), val);
- tmp = min(tmp, ID_AA64ISAR0_EL1_TLB_OS);
- val &= ~(NV_FTR(ISAR0, TLB) |
- NV_FTR(ISAR0, TME));
- val |= FIELD_PREP(NV_FTR(ISAR0, TLB), tmp);
+ /* Support everything but TME */
+ val &= ~NV_FTR(ISAR0, TME);
break;
case SYS_ID_AA64ISAR1_EL1:
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index a7c5fa320cda..6d7f043d892c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2841,6 +2841,57 @@ static bool handle_vmalls12e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
return true;
}
+static bool handle_ripas2e1is(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ u32 sys_encoding = sys_insn(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+ u64 vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
+ u64 base, range, tg, num, scale;
+ int shift;
+
+ if (!kvm_supported_tlbi_ipas2_op(vcpu, sys_encoding)) {
+ kvm_inject_undefined(vcpu);
+ return false;
+ }
+
+ /*
+ * Because the shadow S2 structure doesn't necessarily reflect that
+ * of the guest's S2 (different base granule size, for example), we
+ * decide to ignore TTL and only use the described range.
+ */
+ tg = FIELD_GET(GENMASK(47, 46), p->regval);
+ scale = FIELD_GET(GENMASK(45, 44), p->regval);
+ num = FIELD_GET(GENMASK(43, 39), p->regval);
+ base = p->regval & GENMASK(36, 0);
+
+ switch(tg) {
+ case 1:
+ shift = 12;
+ break;
+ case 2:
+ shift = 14;
+ break;
+ case 3:
+ default: /* IMPDEF: handle tg==0 as 64k */
+ shift = 16;
+ break;
+ }
+
+ base <<= shift;
+ range = __TLBI_RANGE_PAGES(num, scale) << shift;
+
+ kvm_s2_mmu_iterate_by_vmid(vcpu->kvm, get_vmid(vttbr),
+ &(union tlbi_info) {
+ .range = {
+ .start = base,
+ .size = range,
+ },
+ },
+ s2_mmu_unmap_stage2_range);
+
+ return true;
+}
+
static void s2_mmu_unmap_stage2_ipa(struct kvm_s2_mmu *mmu,
const union tlbi_info *info)
{
@@ -2953,12 +3004,28 @@ static struct sys_reg_desc sys_insn_descs[] = {
SYS_INSN(TLBI_VALE1OS, handle_tlbi_el1),
SYS_INSN(TLBI_VAALE1OS, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVAE1IS, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVAAE1IS, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVALE1IS, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVAALE1IS, handle_tlbi_el1),
+
SYS_INSN(TLBI_VMALLE1IS, handle_tlbi_el1),
SYS_INSN(TLBI_VAE1IS, handle_tlbi_el1),
SYS_INSN(TLBI_ASIDE1IS, handle_tlbi_el1),
SYS_INSN(TLBI_VAAE1IS, handle_tlbi_el1),
SYS_INSN(TLBI_VALE1IS, handle_tlbi_el1),
SYS_INSN(TLBI_VAALE1IS, handle_tlbi_el1),
+
+ SYS_INSN(TLBI_RVAE1OS, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVAAE1OS, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVALE1OS, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVAALE1OS, handle_tlbi_el1),
+
+ SYS_INSN(TLBI_RVAE1, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVAAE1, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVALE1, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVAALE1, handle_tlbi_el1),
+
SYS_INSN(TLBI_VMALLE1, handle_tlbi_el1),
SYS_INSN(TLBI_VAE1, handle_tlbi_el1),
SYS_INSN(TLBI_ASIDE1, handle_tlbi_el1),
@@ -2967,7 +3034,9 @@ static struct sys_reg_desc sys_insn_descs[] = {
SYS_INSN(TLBI_VAALE1, handle_tlbi_el1),
SYS_INSN(TLBI_IPAS2E1IS, handle_ipas2e1is),
+ SYS_INSN(TLBI_RIPAS2E1IS, handle_ripas2e1is),
SYS_INSN(TLBI_IPAS2LE1IS, handle_ipas2e1is),
+ SYS_INSN(TLBI_RIPAS2LE1IS, handle_ripas2e1is),
SYS_INSN(TLBI_ALLE2OS, trap_undef),
SYS_INSN(TLBI_VAE2OS, trap_undef),
@@ -2975,12 +3044,23 @@ static struct sys_reg_desc sys_insn_descs[] = {
SYS_INSN(TLBI_VALE2OS, trap_undef),
SYS_INSN(TLBI_VMALLS12E1OS, handle_vmalls12e1is),
+ SYS_INSN(TLBI_RVAE2IS, trap_undef),
+ SYS_INSN(TLBI_RVALE2IS, trap_undef),
+
SYS_INSN(TLBI_ALLE1IS, handle_alle1is),
SYS_INSN(TLBI_VMALLS12E1IS, handle_vmalls12e1is),
SYS_INSN(TLBI_IPAS2E1OS, handle_ipas2e1is),
SYS_INSN(TLBI_IPAS2E1, handle_ipas2e1is),
+ SYS_INSN(TLBI_RIPAS2E1, handle_ripas2e1is),
+ SYS_INSN(TLBI_RIPAS2E1OS, handle_ripas2e1is),
SYS_INSN(TLBI_IPAS2LE1OS, handle_ipas2e1is),
SYS_INSN(TLBI_IPAS2LE1, handle_ipas2e1is),
+ SYS_INSN(TLBI_RIPAS2LE1, handle_ripas2e1is),
+ SYS_INSN(TLBI_RIPAS2LE1OS, handle_ripas2e1is),
+ SYS_INSN(TLBI_RVAE2OS, trap_undef),
+ SYS_INSN(TLBI_RVALE2OS, trap_undef),
+ SYS_INSN(TLBI_RVAE2, trap_undef),
+ SYS_INSN(TLBI_RVALE2, trap_undef),
SYS_INSN(TLBI_ALLE1, handle_alle1is),
SYS_INSN(TLBI_VMALLS12E1, handle_vmalls12e1is),
};
--
2.39.2
* [PATCH 16/16] KVM: arm64: nv: Add handling of NXS-flavoured TLBI operations
2024-04-09 17:54 [PATCH 00/16] KVM: arm64: nv: Shadow stage-2 page table handling Marc Zyngier
` (14 preceding siblings ...)
2024-04-09 17:54 ` [PATCH 15/16] KVM: arm64: nv: Add handling of range-based " Marc Zyngier
@ 2024-04-09 17:54 ` Marc Zyngier
15 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-04-09 17:54 UTC (permalink / raw)
To: kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
Joey Gouly, Alexandru Elisei, Christoffer Dall
Latest kid on the block: NXS (Non-eXtra-Slow) TLBI operations.
Let's add those in bulk (NSH, ISH, OSH, both normal and range)
as they directly map to their XS (the standard ones) counterparts.
Not a lot to say about them, they are basically useless.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/kvm/hyp/vhe/tlb.c | 46 +++++++++++++++++++++++
arch/arm64/kvm/sys_regs.c | 73 ++++++++++++++++++++++++++++++++++++
2 files changed, 119 insertions(+)
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index 8afaa237f5d4..c43ecbce1201 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -227,6 +227,7 @@ void __kvm_flush_vm_context(void)
* - a TLBI targeting EL2 S1 is remapped to EL1 S1
* - a non-shareable TLBI is upgraded to being inner-shareable
* - an outer-shareable TLBI is also mapped to inner-shareable
+ * - an nXS TLBI is upgraded to XS
*/
int __kvm_tlbi_s1e2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
{
@@ -250,6 +251,12 @@ int __kvm_tlbi_s1e2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
case OP_TLBI_VMALLE1:
case OP_TLBI_VMALLE1IS:
case OP_TLBI_VMALLE1OS:
+ case OP_TLBI_ALLE2NXS:
+ case OP_TLBI_ALLE2ISNXS:
+ case OP_TLBI_ALLE2OSNXS:
+ case OP_TLBI_VMALLE1NXS:
+ case OP_TLBI_VMALLE1ISNXS:
+ case OP_TLBI_VMALLE1OSNXS:
__tlbi(vmalle1is);
break;
case OP_TLBI_VAE2:
@@ -258,6 +265,12 @@ int __kvm_tlbi_s1e2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
case OP_TLBI_VAE1:
case OP_TLBI_VAE1IS:
case OP_TLBI_VAE1OS:
+ case OP_TLBI_VAE2NXS:
+ case OP_TLBI_VAE2ISNXS:
+ case OP_TLBI_VAE2OSNXS:
+ case OP_TLBI_VAE1NXS:
+ case OP_TLBI_VAE1ISNXS:
+ case OP_TLBI_VAE1OSNXS:
__tlbi(vae1is, va);
break;
case OP_TLBI_VALE2:
@@ -266,21 +279,36 @@ int __kvm_tlbi_s1e2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
case OP_TLBI_VALE1:
case OP_TLBI_VALE1IS:
case OP_TLBI_VALE1OS:
+ case OP_TLBI_VALE2NXS:
+ case OP_TLBI_VALE2ISNXS:
+ case OP_TLBI_VALE2OSNXS:
+ case OP_TLBI_VALE1NXS:
+ case OP_TLBI_VALE1ISNXS:
+ case OP_TLBI_VALE1OSNXS:
__tlbi(vale1is, va);
break;
case OP_TLBI_ASIDE1:
case OP_TLBI_ASIDE1IS:
case OP_TLBI_ASIDE1OS:
+ case OP_TLBI_ASIDE1NXS:
+ case OP_TLBI_ASIDE1ISNXS:
+ case OP_TLBI_ASIDE1OSNXS:
__tlbi(aside1is, va);
break;
case OP_TLBI_VAAE1:
case OP_TLBI_VAAE1IS:
case OP_TLBI_VAAE1OS:
+ case OP_TLBI_VAAE1NXS:
+ case OP_TLBI_VAAE1ISNXS:
+ case OP_TLBI_VAAE1OSNXS:
__tlbi(vaae1is, va);
break;
case OP_TLBI_VAALE1:
case OP_TLBI_VAALE1IS:
case OP_TLBI_VAALE1OS:
+ case OP_TLBI_VAALE1NXS:
+ case OP_TLBI_VAALE1ISNXS:
+ case OP_TLBI_VAALE1OSNXS:
__tlbi(vaale1is, va);
break;
case OP_TLBI_RVAE2:
@@ -289,6 +317,12 @@ int __kvm_tlbi_s1e2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
case OP_TLBI_RVAE1:
case OP_TLBI_RVAE1IS:
case OP_TLBI_RVAE1OS:
+ case OP_TLBI_RVAE2NXS:
+ case OP_TLBI_RVAE2ISNXS:
+ case OP_TLBI_RVAE2OSNXS:
+ case OP_TLBI_RVAE1NXS:
+ case OP_TLBI_RVAE1ISNXS:
+ case OP_TLBI_RVAE1OSNXS:
__tlbi(rvae1is, va);
break;
case OP_TLBI_RVALE2:
@@ -297,16 +331,28 @@ int __kvm_tlbi_s1e2(struct kvm_s2_mmu *mmu, u64 va, u64 sys_encoding)
case OP_TLBI_RVALE1:
case OP_TLBI_RVALE1IS:
case OP_TLBI_RVALE1OS:
+ case OP_TLBI_RVALE2NXS:
+ case OP_TLBI_RVALE2ISNXS:
+ case OP_TLBI_RVALE2OSNXS:
+ case OP_TLBI_RVALE1NXS:
+ case OP_TLBI_RVALE1ISNXS:
+ case OP_TLBI_RVALE1OSNXS:
__tlbi(rvale1is, va);
break;
case OP_TLBI_RVAAE1:
case OP_TLBI_RVAAE1IS:
case OP_TLBI_RVAAE1OS:
+ case OP_TLBI_RVAAE1NXS:
+ case OP_TLBI_RVAAE1ISNXS:
+ case OP_TLBI_RVAAE1OSNXS:
__tlbi(rvaae1is, va);
break;
case OP_TLBI_RVAALE1:
case OP_TLBI_RVAALE1IS:
case OP_TLBI_RVAALE1OS:
+ case OP_TLBI_RVAALE1NXS:
+ case OP_TLBI_RVAALE1ISNXS:
+ case OP_TLBI_RVAALE1OSNXS:
__tlbi(rvaale1is, va);
break;
default:
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 6d7f043d892c..494b03ecf712 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -3033,6 +3033,42 @@ static struct sys_reg_desc sys_insn_descs[] = {
SYS_INSN(TLBI_VALE1, handle_tlbi_el1),
SYS_INSN(TLBI_VAALE1, handle_tlbi_el1),
+ SYS_INSN(TLBI_VMALLE1OSNXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAE1OSNXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_ASIDE1OSNXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAAE1OSNXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VALE1OSNXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAALE1OSNXS, handle_tlbi_el1),
+
+ SYS_INSN(TLBI_RVAE1ISNXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVAAE1ISNXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVALE1ISNXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVAALE1ISNXS, handle_tlbi_el1),
+
+ SYS_INSN(TLBI_VMALLE1ISNXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAE1ISNXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_ASIDE1ISNXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAAE1ISNXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VALE1ISNXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAALE1ISNXS, handle_tlbi_el1),
+
+ SYS_INSN(TLBI_RVAE1OSNXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVAAE1OSNXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVALE1OSNXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVAALE1OSNXS, handle_tlbi_el1),
+
+ SYS_INSN(TLBI_RVAE1NXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVAAE1NXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVALE1NXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_RVAALE1NXS, handle_tlbi_el1),
+
+ SYS_INSN(TLBI_VMALLE1NXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAE1NXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_ASIDE1NXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAAE1NXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VALE1NXS, handle_tlbi_el1),
+ SYS_INSN(TLBI_VAALE1NXS, handle_tlbi_el1),
+
SYS_INSN(TLBI_IPAS2E1IS, handle_ipas2e1is),
SYS_INSN(TLBI_RIPAS2E1IS, handle_ripas2e1is),
SYS_INSN(TLBI_IPAS2LE1IS, handle_ipas2e1is),
@@ -3063,6 +3099,43 @@ static struct sys_reg_desc sys_insn_descs[] = {
SYS_INSN(TLBI_RVALE2, trap_undef),
SYS_INSN(TLBI_ALLE1, handle_alle1is),
SYS_INSN(TLBI_VMALLS12E1, handle_vmalls12e1is),
+
+ SYS_INSN(TLBI_IPAS2E1ISNXS, handle_ipas2e1is),
+ SYS_INSN(TLBI_RIPAS2E1ISNXS, handle_ripas2e1is),
+ SYS_INSN(TLBI_IPAS2LE1ISNXS, handle_ipas2e1is),
+ SYS_INSN(TLBI_RIPAS2LE1ISNXS, handle_ripas2e1is),
+
+ SYS_INSN(TLBI_ALLE2OSNXS, trap_undef),
+ SYS_INSN(TLBI_VAE2OSNXS, trap_undef),
+ SYS_INSN(TLBI_ALLE1OSNXS, handle_alle1is),
+ SYS_INSN(TLBI_VALE2OSNXS, trap_undef),
+ SYS_INSN(TLBI_VMALLS12E1OSNXS, handle_vmalls12e1is),
+
+ SYS_INSN(TLBI_RVAE2ISNXS, trap_undef),
+ SYS_INSN(TLBI_RVALE2ISNXS, trap_undef),
+ SYS_INSN(TLBI_ALLE2ISNXS, trap_undef),
+ SYS_INSN(TLBI_VAE2ISNXS, trap_undef),
+
+ SYS_INSN(TLBI_ALLE1ISNXS, handle_alle1is),
+ SYS_INSN(TLBI_VALE2ISNXS, trap_undef),
+ SYS_INSN(TLBI_VMALLS12E1ISNXS, handle_vmalls12e1is),
+ SYS_INSN(TLBI_IPAS2E1OSNXS, handle_ipas2e1is),
+ SYS_INSN(TLBI_IPAS2E1NXS, handle_ipas2e1is),
+ SYS_INSN(TLBI_RIPAS2E1NXS, handle_ripas2e1is),
+ SYS_INSN(TLBI_RIPAS2E1OSNXS, handle_ripas2e1is),
+ SYS_INSN(TLBI_IPAS2LE1OSNXS, handle_ipas2e1is),
+ SYS_INSN(TLBI_IPAS2LE1NXS, handle_ipas2e1is),
+ SYS_INSN(TLBI_RIPAS2LE1NXS, handle_ripas2e1is),
+ SYS_INSN(TLBI_RIPAS2LE1OSNXS, handle_ripas2e1is),
+ SYS_INSN(TLBI_RVAE2OSNXS, trap_undef),
+ SYS_INSN(TLBI_RVALE2OSNXS, trap_undef),
+ SYS_INSN(TLBI_RVAE2NXS, trap_undef),
+ SYS_INSN(TLBI_RVALE2NXS, trap_undef),
+ SYS_INSN(TLBI_ALLE2NXS, trap_undef),
+ SYS_INSN(TLBI_VAE2NXS, trap_undef),
+ SYS_INSN(TLBI_ALLE1NXS, handle_alle1is),
+ SYS_INSN(TLBI_VALE2NXS, trap_undef),
+ SYS_INSN(TLBI_VMALLS12E1NXS, handle_vmalls12e1is),
};
static const struct sys_reg_desc *first_idreg;
--
2.39.2
* Re: [PATCH 01/16] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
2024-04-09 17:54 ` [PATCH 01/16] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures Marc Zyngier
@ 2024-05-07 6:17 ` Oliver Upton
2024-05-13 16:19 ` Marc Zyngier
0 siblings, 1 reply; 19+ messages in thread
From: Oliver Upton @ 2024-05-07 6:17 UTC (permalink / raw)
To: Marc Zyngier
Cc: kvmarm, kvm, linux-arm-kernel, James Morse, Suzuki K Poulose,
Zenghui Yu, Joey Gouly, Alexandru Elisei, Christoffer Dall
Hey Marc,
On Tue, Apr 09, 2024 at 06:54:33PM +0100, Marc Zyngier wrote:
> +static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
> +{
> + return !(mmu->tlb_vttbr & 1);
> +}
More readable if you use VTTBR_CNP_BIT here.
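i.e. something along the lines of (untested):

	static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
	{
		/* Same bit 0 check, just spelled with the existing definition */
		return !(mmu->tlb_vttbr & VTTBR_CNP_BIT);
	}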
> struct kvm_arch_memory_slot {
> };
>
> @@ -264,6 +296,14 @@ struct kvm_arch {
> */
> u64 fgu[__NR_FGT_GROUP_IDS__];
>
> + /*
> + * Stage 2 paging state for VMs with nested S2 using a virtual
> + * VMID.
> + */
> + struct kvm_s2_mmu *nested_mmus;
> + size_t nested_mmus_size;
> + int nested_mmus_next;
> +
> /* Interrupt controller */
> struct vgic_dist vgic;
>
> @@ -1239,6 +1279,7 @@ void kvm_vcpu_load_vhe(struct kvm_vcpu *vcpu);
> void kvm_vcpu_put_vhe(struct kvm_vcpu *vcpu);
>
> int __init kvm_set_ipa_limit(void);
> +u32 kvm_get_pa_bits(struct kvm *kvm);
>
> #define __KVM_HAVE_ARCH_VM_ALLOC
> struct kvm *kvm_arch_alloc_vm(void);
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index d5e48d870461..b48c9e12e7ff 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -98,6 +98,7 @@ alternative_cb_end
> #include <asm/mmu_context.h>
> #include <asm/kvm_emulate.h>
> #include <asm/kvm_host.h>
> +#include <asm/kvm_nested.h>
>
> void kvm_update_va_mask(struct alt_instr *alt,
> __le32 *origptr, __le32 *updptr, int nr_inst);
> @@ -165,6 +166,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
> int create_hyp_stack(phys_addr_t phys_addr, unsigned long *haddr);
> void __init free_hyp_pgds(void);
>
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size);
> void stage2_unmap_vm(struct kvm *kvm);
> int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type);
> void kvm_uninit_stage2_mmu(struct kvm *kvm);
> @@ -326,5 +328,12 @@ static inline struct kvm *kvm_s2_mmu_to_kvm(struct kvm_s2_mmu *mmu)
> {
> return container_of(mmu->arch, struct kvm, arch);
> }
> +
> +static inline u64 get_vmid(u64 vttbr)
> +{
> + return (vttbr & VTTBR_VMID_MASK(kvm_get_vmid_bits())) >>
> + VTTBR_VMID_SHIFT;
> +}
> +
> #endif /* __ASSEMBLY__ */
> #endif /* __ARM64_KVM_MMU_H__ */
> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
> index c77d795556e1..1a4004e7573d 100644
> --- a/arch/arm64/include/asm/kvm_nested.h
> +++ b/arch/arm64/include/asm/kvm_nested.h
> @@ -60,6 +60,12 @@ static inline u64 translate_ttbr0_el2_to_ttbr0_el1(u64 ttbr0)
> return ttbr0 & ~GENMASK_ULL(63, 48);
> }
>
> +extern void kvm_init_nested(struct kvm *kvm);
> +extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
> +extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
> +extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu);
> +extern void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu);
> +extern void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu);
>
> int kvm_init_nv_sysregs(struct kvm *kvm);
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index c4a0a35e02c7..bb19c53b6539 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -147,6 +147,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> mutex_unlock(&kvm->lock);
> #endif
>
> + kvm_init_nested(kvm);
> +
> ret = kvm_share_hyp(kvm, kvm + 1);
> if (ret)
> return ret;
> @@ -433,6 +435,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> struct kvm_s2_mmu *mmu;
> int *last_ran;
>
> + if (vcpu_has_nv(vcpu))
> + kvm_vcpu_load_hw_mmu(vcpu);
> +
> mmu = vcpu->arch.hw_mmu;
> last_ran = this_cpu_ptr(mmu->last_vcpu_ran);
>
> @@ -483,6 +488,8 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> kvm_timer_vcpu_put(vcpu);
> kvm_vgic_put(vcpu);
> kvm_vcpu_pmu_restore_host(vcpu);
> + if (vcpu_has_nv(vcpu))
> + kvm_vcpu_put_hw_mmu(vcpu);
> kvm_arm_vmid_clear_active();
>
> vcpu_clear_on_unsupported_cpu(vcpu);
> @@ -1346,6 +1353,10 @@ static int kvm_setup_vcpu(struct kvm_vcpu *vcpu)
> if (kvm_vcpu_has_pmu(vcpu) && !kvm->arch.arm_pmu)
> ret = kvm_arm_set_default_pmu(kvm);
>
> + /* Prepare for nested if required */
> + if (!ret)
> + ret = kvm_vcpu_init_nested(vcpu);
> +
> return ret;
> }
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index dc04bc767865..24a946814831 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -328,7 +328,7 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
> may_block));
> }
>
> -static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
> +void kvm_unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
> {
> __unmap_stage2_range(mmu, start, size, true);
> }
> @@ -855,21 +855,9 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
> .icache_inval_pou = invalidate_icache_guest_page,
> };
>
> -/**
> - * kvm_init_stage2_mmu - Initialise a S2 MMU structure
> - * @kvm: The pointer to the KVM structure
> - * @mmu: The pointer to the s2 MMU structure
> - * @type: The machine type of the virtual machine
> - *
> - * Allocates only the stage-2 HW PGD level table(s).
> - * Note we don't need locking here as this is only called when the VM is
> - * created, which can only be done once.
> - */
> -int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type)
> +static int kvm_init_ipa_range(struct kvm_s2_mmu *mmu, unsigned long type)
> {
> u32 kvm_ipa_limit = get_kvm_ipa_limit();
> - int cpu, err;
> - struct kvm_pgtable *pgt;
> u64 mmfr0, mmfr1;
> u32 phys_shift;
>
> @@ -896,11 +884,58 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
> mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> mmu->vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
>
> + return 0;
> +}
> +
> +/**
> + * kvm_init_stage2_mmu - Initialise a S2 MMU structure
> + * @kvm: The pointer to the KVM structure
> + * @mmu: The pointer to the s2 MMU structure
> + * @type: The machine type of the virtual machine
> + *
> + * Allocates only the stage-2 HW PGD level table(s).
> + * Note we don't need locking here as this is only called in two cases:
> + *
> + * - when the VM is created, which can't race against anything
> + *
> + * - when secondary kvm_s2_mmu structures are initialised for NV
> + * guests, and the caller must hold kvm->lock as this is called on a
> + * per-vcpu basis.
> + */
> +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type)
> +{
> + int cpu, err;
> + struct kvm_pgtable *pgt;
> +
> + /*
> + * If we already have our page tables in place, and the
> + * MMU context is the canonical one, we have a bug somewhere,
> + * as this is only supposed to ever happen once per VM.
> + *
> + * Otherwise, we're building nested page tables, and that's
> + * probably because userspace called KVM_ARM_VCPU_INIT more
> + * than once on the same vcpu. Since that's actually legal,
> + * don't kick up a fuss and leave gracefully.
> + */
> if (mmu->pgt != NULL) {
> + if (&kvm->arch.mmu != mmu)
A helper might be a good idea, I see this repeated several times:
static inline bool kvm_is_nested_s2_mmu(struct kvm_s2_mmu *mmu)
{
return &kvm_s2_mmu_to_kvm(mmu)->arch.mmu != mmu;
}
> + return 0;
> +
> kvm_err("kvm_arch already initialized?\n");
> return -EINVAL;
> }
>
> + /*
> + * We only initialise the IPA range on the canonical MMU, so
> + * the type is meaningless in all other situations.
> + */
> + if (&kvm->arch.mmu != mmu)
> + type = kvm_get_pa_bits(kvm);
I'm not sure I follow this comment, because kvm_init_ipa_range() still
gets called on nested MMUs. Is this suggesting that the configured IPA
limit of the shadow MMUs doesn't matter as they can only ever map things
in the canonical IPA space?
> + err = kvm_init_ipa_range(mmu, type);
> + if (err)
> + return err;
> +
> pgt = kzalloc(sizeof(*pgt), GFP_KERNEL_ACCOUNT);
> if (!pgt)
> return -ENOMEM;
> @@ -925,6 +960,10 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
>
> mmu->pgt = pgt;
> mmu->pgd_phys = __pa(pgt->pgd);
> +
> + if (&kvm->arch.mmu != mmu)
> + kvm_init_nested_s2_mmu(mmu);
> +
> return 0;
>
> out_destroy_pgtable:
> @@ -976,7 +1015,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
>
> if (!(vma->vm_flags & VM_PFNMAP)) {
> gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
> - unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
> + kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
> }
> hva = vm_end;
> } while (hva < reg_end);
> @@ -2054,11 +2093,6 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
> {
> }
>
> -void kvm_arch_flush_shadow_all(struct kvm *kvm)
> -{
> - kvm_uninit_stage2_mmu(kvm);
> -}
> -
> void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> struct kvm_memory_slot *slot)
> {
> @@ -2066,7 +2100,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> phys_addr_t size = slot->npages << PAGE_SHIFT;
>
> write_lock(&kvm->mmu_lock);
> - unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> + kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> write_unlock(&kvm->mmu_lock);
> }
>
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index ced30c90521a..1f4f80a8c011 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -7,7 +7,9 @@
> #include <linux/kvm.h>
> #include <linux/kvm_host.h>
>
> +#include <asm/kvm_arm.h>
> #include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
> #include <asm/kvm_nested.h>
> #include <asm/sysreg.h>
>
> @@ -16,6 +18,209 @@
> /* Protection against the sysreg repainting madness... */
> #define NV_FTR(r, f) ID_AA64##r##_EL1_##f
>
> +void kvm_init_nested(struct kvm *kvm)
> +{
> + kvm->arch.nested_mmus = NULL;
> + kvm->arch.nested_mmus_size = 0;
> +}
> +
> +int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
> +{
> + struct kvm *kvm = vcpu->kvm;
> + struct kvm_s2_mmu *tmp;
> + int num_mmus;
> + int ret = -ENOMEM;
> +
> + if (!test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->kvm->arch.vcpu_features))
> + return 0;
> +
> + if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
> + return -EINVAL;
nitpick: maybe guard the call to kvm_vcpu_init_nested() with
vcpu_has_nv() and collapse these into
if (!vcpu_has_nv(vcpu))
return -EINVAL;
> + /*
> + * Let's treat memory allocation failures as benign: If we fail to
> + * allocate anything, return an error and keep the allocated array
> + * alive. Userspace may try to recover by initializing the vcpu
> + * again, and there is no reason to affect the whole VM for this.
> + */
This code feels a bit tricky, and I'm not sure much will be done to
recover the VM in practice should this allocation / ioctl fail.
Is it possible to do this late in kvm_arch_vcpu_run_pid_change() and
only have the first vCPU to reach the call do the initialization for the
whole VM? We could then dispose of the reallocation / fixup scheme
below.
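A very rough sketch of what I mean (the helper name, call site and
locking are all made up here, and the 2 * online_vcpus sizing is just
kept from the patch):

	/* Called once, from the first vCPU to reach its first run */
	static int kvm_vcpu_init_nested_mmus_once(struct kvm *kvm)
	{
		int num_mmus, i;

		lockdep_assert_held(&kvm->lock);

		/* Another vCPU already did the work */
		if (kvm->arch.nested_mmus)
			return 0;

		num_mmus = atomic_read(&kvm->online_vcpus) * 2;
		kvm->arch.nested_mmus = kcalloc(num_mmus,
						sizeof(*kvm->arch.nested_mmus),
						GFP_KERNEL_ACCOUNT);
		if (!kvm->arch.nested_mmus)
			return -ENOMEM;

		for (i = 0; i < num_mmus; i++) {
			int ret = kvm_init_stage2_mmu(kvm, &kvm->arch.nested_mmus[i], 0);

			if (ret) {
				while (i--)
					kvm_free_stage2_pgd(&kvm->arch.nested_mmus[i]);
				kfree(kvm->arch.nested_mmus);
				kvm->arch.nested_mmus = NULL;
				return ret;
			}
		}

		kvm->arch.nested_mmus_size = num_mmus;
		return 0;
	}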
If we keep this code...
> + num_mmus = atomic_read(&kvm->online_vcpus) * 2;
> + tmp = krealloc(kvm->arch.nested_mmus,
> + num_mmus * sizeof(*kvm->arch.nested_mmus),
> + GFP_KERNEL_ACCOUNT | __GFP_ZERO);
Just do an early 'return -ENOMEM' here to cut a level of indentation for
the rest that follows.
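i.e. simply:

	if (!tmp)
		return -ENOMEM;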
> + if (tmp) {
> + /*
> + * If we went through a reallocation, adjust the MMU
> + * back-pointers in the previously initialised
> + * pg_table structures.
nitpick: pgtable or kvm_pgtable structures
> + */
> + if (kvm->arch.nested_mmus != tmp) {
> + int i;
> +
> + for (i = 0; i < num_mmus - 2; i++)
> + tmp[i].pgt->mmu = &tmp[i];
> + }
> +
> + if (kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 1], 0) ||
> + kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2], 0)) {
> + kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
> + kvm_free_stage2_pgd(&tmp[num_mmus - 2]);
> + } else {
> + kvm->arch.nested_mmus_size = num_mmus;
> + ret = 0;
> + }
> +
> + kvm->arch.nested_mmus = tmp;
> + }
> +
> + return ret;
> +}
> +
> +struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu)
> +{
> + bool nested_stage2_enabled;
> + u64 vttbr, vtcr, hcr;
> + struct kvm *kvm;
> + int i;
> +
> + kvm = vcpu->kvm;
nit: just do this when declaring the local.
> + lockdep_assert_held_write(&kvm->mmu_lock);
> +
> + vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> + vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
> + hcr = vcpu_read_sys_reg(vcpu, HCR_EL2);
> +
> + nested_stage2_enabled = hcr & HCR_VM;
> +
> + /* Don't consider the CnP bit for the vttbr match */
> + vttbr = vttbr & ~VTTBR_CNP_BIT;
nit: &=
> + /*
> + * Two possibilities when looking up a S2 MMU context:
> + *
> + * - either S2 is enabled in the guest, and we need a context that is
> + * S2-enabled and matches the full VTTBR (VMID+BADDR) and VTCR,
> + * which makes it safe from a TLB conflict perspective (a broken
> + * guest won't be able to generate them),
> + *
> + * - or S2 is disabled, and we need a context that is S2-disabled
> + * and matches the VMID only, as all TLBs are tagged by VMID even
> + * if S2 translation is disabled.
> + */
Looks like some spaces snuck in and got the indentation weird.
--
Thanks,
Oliver
* Re: [PATCH 01/16] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
2024-05-07 6:17 ` Oliver Upton
@ 2024-05-13 16:19 ` Marc Zyngier
0 siblings, 0 replies; 19+ messages in thread
From: Marc Zyngier @ 2024-05-13 16:19 UTC (permalink / raw)
To: Oliver Upton
Cc: kvmarm, kvm, linux-arm-kernel, James Morse, Suzuki K Poulose,
Zenghui Yu, Joey Gouly, Alexandru Elisei, Christoffer Dall
On Tue, 07 May 2024 07:17:13 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Hey Marc,
>
> On Tue, Apr 09, 2024 at 06:54:33PM +0100, Marc Zyngier wrote:
> > +static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
> > +{
> > + return !(mmu->tlb_vttbr & 1);
> > +}
>
> More readable if you use VTTBR_CNP_BIT here.
Yes, well spotted.
[...]
> > +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type)
> > +{
> > + int cpu, err;
> > + struct kvm_pgtable *pgt;
> > +
> > + /*
> > + * If we already have our page tables in place, and the
> > + * MMU context is the canonical one, we have a bug somewhere,
> > + * as this is only supposed to ever happen once per VM.
> > + *
> > + * Otherwise, we're building nested page tables, and that's
> > + * probably because userspace called KVM_ARM_VCPU_INIT more
> > + * than once on the same vcpu. Since that's actually legal,
> > + * don't kick up a fuss and leave gracefully.
> > + */
> > if (mmu->pgt != NULL) {
> > + if (&kvm->arch.mmu != mmu)
>
> A helper might be a good idea, I see this repeated several times:
>
> static inline bool kvm_is_nested_s2_mmu(struct kvm_s2_mmu *mmu)
> {
> > return &kvm_s2_mmu_to_kvm(mmu)->arch.mmu != mmu;
> }
Yeah, I can probably fit something like this in a number of spots.
Just need to be careful as mmu is not initialised at all in some
contexts.
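So it'd probably have to take the kvm explicitly rather than derive it
from the mmu, something like:

	static inline bool kvm_is_nested_s2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
	{
		return &kvm->arch.mmu != mmu;
	}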
>
> > + return 0;
> > +
> > kvm_err("kvm_arch already initialized?\n");
> > return -EINVAL;
> > }
> >
> > + /*
> > + * We only initialise the IPA range on the canonical MMU, so
> > + * the type is meaningless in all other situations.
> > + */
> > + if (&kvm->arch.mmu != mmu)
> > + type = kvm_get_pa_bits(kvm);
>
> I'm not sure I follow this comment, because kvm_init_ipa_range() still
> gets called on nested MMUs. Is this suggesting that the configured IPA
> limit of the shadow MMUs doesn't matter as they can only ever map things
> in the canonical IPA space?
Yes, that's exactly what I meant. Just because we limit the IPA space
to some number of bits doesn't mean we can limit the guest's own S2 to
the same thing, because they mean different things:
- the canonical IPA space (aka type) is a contract between KVM and
userspace on which ranges the MMIO exits are valid
- the nested IPA space is whatever the virtual HW exposes as PARange,
and the only constraint is that the *output* of the nested IPA space
must be contained in the canonical IPA space
Does this make sense? Happy to rework the comment to clarify this.
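Something along these lines, maybe (only a draft):

	/*
	 * 'type' only describes the canonical IPA space, i.e. the
	 * contract between KVM and userspace about the ranges for which
	 * MMIO exits are valid. Nested MMUs translate the guest's own
	 * IPA space (whatever the virtual HW advertises as PARange);
	 * only their *output* must fit in the canonical IPA space, so
	 * we simply reuse the VM's PA bits for them.
	 */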
>
> > + err = kvm_init_ipa_range(mmu, type);
> > + if (err)
> > + return err;
> > +
> > pgt = kzalloc(sizeof(*pgt), GFP_KERNEL_ACCOUNT);
> > if (!pgt)
> > return -ENOMEM;
> > @@ -925,6 +960,10 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
> >
> > mmu->pgt = pgt;
> > mmu->pgd_phys = __pa(pgt->pgd);
> > +
> > + if (&kvm->arch.mmu != mmu)
> > + kvm_init_nested_s2_mmu(mmu);
> > +
> > return 0;
> >
> > out_destroy_pgtable:
> > @@ -976,7 +1015,7 @@ static void stage2_unmap_memslot(struct kvm *kvm,
> >
> > if (!(vma->vm_flags & VM_PFNMAP)) {
> > gpa_t gpa = addr + (vm_start - memslot->userspace_addr);
> > - unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
> > + kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, vm_end - vm_start);
> > }
> > hva = vm_end;
> > } while (hva < reg_end);
> > @@ -2054,11 +2093,6 @@ void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
> > {
> > }
> >
> > -void kvm_arch_flush_shadow_all(struct kvm *kvm)
> > -{
> > - kvm_uninit_stage2_mmu(kvm);
> > -}
> > -
> > void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> > struct kvm_memory_slot *slot)
> > {
> > @@ -2066,7 +2100,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> > phys_addr_t size = slot->npages << PAGE_SHIFT;
> >
> > write_lock(&kvm->mmu_lock);
> > - unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> > + kvm_unmap_stage2_range(&kvm->arch.mmu, gpa, size);
> > write_unlock(&kvm->mmu_lock);
> > }
> >
> > diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> > index ced30c90521a..1f4f80a8c011 100644
> > --- a/arch/arm64/kvm/nested.c
> > +++ b/arch/arm64/kvm/nested.c
> > @@ -7,7 +7,9 @@
> > #include <linux/kvm.h>
> > #include <linux/kvm_host.h>
> >
> > +#include <asm/kvm_arm.h>
> > #include <asm/kvm_emulate.h>
> > +#include <asm/kvm_mmu.h>
> > #include <asm/kvm_nested.h>
> > #include <asm/sysreg.h>
> >
> > @@ -16,6 +18,209 @@
> > /* Protection against the sysreg repainting madness... */
> > #define NV_FTR(r, f) ID_AA64##r##_EL1_##f
> >
> > +void kvm_init_nested(struct kvm *kvm)
> > +{
> > + kvm->arch.nested_mmus = NULL;
> > + kvm->arch.nested_mmus_size = 0;
> > +}
> > +
> > +int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu)
> > +{
> > + struct kvm *kvm = vcpu->kvm;
> > + struct kvm_s2_mmu *tmp;
> > + int num_mmus;
> > + int ret = -ENOMEM;
> > +
> > + if (!test_bit(KVM_ARM_VCPU_HAS_EL2, vcpu->kvm->arch.vcpu_features))
> > + return 0;
> > +
> > + if (!cpus_have_final_cap(ARM64_HAS_NESTED_VIRT))
> > + return -EINVAL;
>
> nitpick: maybe guard the call to kvm_vcpu_init_nested() with
> vcpu_has_nv() and collapse these into
>
> if (!vcpu_has_nv(vcpu))
> return -EINVAL;
Indeed, this is definitely old cruft we can get rid of. We don't even
need to error out, as there is a single call site.
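i.e. probably just:

	if (!vcpu_has_nv(vcpu))
		return 0;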
>
> > + /*
> > + * Let's treat memory allocation failures as benign: If we fail to
> > + * allocate anything, return an error and keep the allocated array
> > + * alive. Userspace may try to recover by initializing the vcpu
> > + * again, and there is no reason to affect the whole VM for this.
> > + */
>
> This code feels a bit tricky, and I'm not sure much will be done to
> recover the VM in practice should this allocation / ioctl fail.
I think this is a question of consistency. We don't break the VM when
VCPU_INIT fails in any other case. But yeah, I agree that the whole
fixup code is tricky.
> Is it possible to do this late in kvm_arch_vcpu_run_pid_change() and
> only have the first vCPU to reach the call do the initialization for the
> whole VM? We could then dispose of the reallocation / fixup scheme
> below.
We could, but then the error becomes pretty non-recoverable.
Another thing is that I really should move this over to be vmalloc'd
rather than kmalloc'd -- there is no benefit in having this physically
contiguous.
>
> If we keep this code...
>
> > + num_mmus = atomic_read(&kvm->online_vcpus) * 2;
> > + tmp = krealloc(kvm->arch.nested_mmus,
> > + num_mmus * sizeof(*kvm->arch.nested_mmus),
> > + GFP_KERNEL_ACCOUNT | __GFP_ZERO);
>
> Just do an early 'return -ENOMEM' here to cut a level of indentation for
> the rest that follows.
>
> > + if (tmp) {
> > + /*
> > + * If we went through a reallocation, adjust the MMU
> > + * back-pointers in the previously initialised
> > + * pg_table structures.
>
> nitpick: pgtable or kvm_pgtable structures
>
> > + */
> > + if (kvm->arch.nested_mmus != tmp) {
> > + int i;
> > +
> > + for (i = 0; i < num_mmus - 2; i++)
> > + tmp[i].pgt->mmu = &tmp[i];
> > + }
> > +
> > + if (kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 1], 0) ||
> > + kvm_init_stage2_mmu(kvm, &tmp[num_mmus - 2], 0)) {
> > + kvm_free_stage2_pgd(&tmp[num_mmus - 1]);
> > + kvm_free_stage2_pgd(&tmp[num_mmus - 2]);
> > + } else {
> > + kvm->arch.nested_mmus_size = num_mmus;
> > + ret = 0;
> > + }
> > +
> > + kvm->arch.nested_mmus = tmp;
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu)
> > +{
> > + bool nested_stage2_enabled;
> > + u64 vttbr, vtcr, hcr;
> > + struct kvm *kvm;
> > + int i;
> > +
> > + kvm = vcpu->kvm;
>
> nit: just do this when declaring the local.
>
> > + lockdep_assert_held_write(&kvm->mmu_lock);
> > +
> > + vttbr = vcpu_read_sys_reg(vcpu, VTTBR_EL2);
> > + vtcr = vcpu_read_sys_reg(vcpu, VTCR_EL2);
> > + hcr = vcpu_read_sys_reg(vcpu, HCR_EL2);
> > +
> > + nested_stage2_enabled = hcr & HCR_VM;
> > +
> > + /* Don't consider the CnP bit for the vttbr match */
> > + vttbr = vttbr & ~VTTBR_CNP_BIT;
>
> nit: &=
>
> > + /*
> > + * Two possibilities when looking up a S2 MMU context:
> > + *
> > + * - either S2 is enabled in the guest, and we need a context that is
> > + * S2-enabled and matches the full VTTBR (VMID+BADDR) and VTCR,
> > + * which makes it safe from a TLB conflict perspective (a broken
> > + * guest won't be able to generate them),
> > + *
> > + * - or S2 is disabled, and we need a context that is S2-disabled
> > + * and matches the VMID only, as all TLBs are tagged by VMID even
> > + * if S2 translation is disabled.
> > + */
>
> Looks like some spaces snuck in and got the indentation weird.
Ack on all the above.
Thanks for having looked into it!
M.
--
Without deviation from the norm, progress is not possible.