All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Cc: Alexandru Elisei <alexandru.elisei@arm.com>,
	Andre Przywara <andre.przywara@arm.com>,
	Chase Conklin <chase.conklin@arm.com>,
	Christoffer Dall <christoffer.dall@arm.com>,
	Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>,
	Darren Hart <darren@os.amperecomputing.com>,
	Jintack Lim <jintack@cs.columbia.edu>,
	Russell King <rmk+kernel@armlinux.org.uk>,
	Miguel Luis <miguel.luis@oracle.com>,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Oliver Upton <oliver.upton@linux.dev>,
	Zenghui Yu <yuzenghui@huawei.com>
Subject: [PATCH v11 19/43] KVM: arm64: nv: Handle shadow stage 2 page faults
Date: Mon, 20 Nov 2023 13:10:03 +0000	[thread overview]
Message-ID: <20231120131027.854038-20-maz@kernel.org> (raw)
In-Reply-To: <20231120131027.854038-1-maz@kernel.org>

If we are faulting on a shadow stage 2 translation, we first walk the
guest hypervisor's stage 2 page table to see if it has a mapping. If
not, we inject a stage 2 page fault to the virtual EL2. Otherwise, we
create a mapping in the shadow stage 2 page table.

Note that we have to deal with two IPAs when we got a shadow stage 2
page fault. One is the address we faulted on, and is in the L2 guest
phys space. The other is from the guest stage-2 page table walk, and is
in the L1 guest phys space.  To differentiate them, we rename variables
so that fault_ipa is used for the former and ipa is used for the latter.

Co-developed-by: Christoffer Dall <christoffer.dall@linaro.org>
Co-developed-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: rewrote this multiple times...]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h |  7 +++
 arch/arm64/include/asm/kvm_nested.h  | 19 ++++++
 arch/arm64/kvm/mmu.c                 | 89 ++++++++++++++++++++++++----
 arch/arm64/kvm/nested.c              | 48 +++++++++++++++
 4 files changed, 153 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 8ccf8a1d37ff..5173f8cf2904 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -649,4 +649,11 @@ static __always_inline void kvm_reset_cptr_el2(struct kvm_vcpu *vcpu)
 
 	kvm_write_cptr_el2(val);
 }
+
+static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
+{
+	return (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu &&
+		vcpu->arch.hw_mmu->nested_stage2_enabled);
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index a2e719d1cd53..a9aec29bf7a1 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -77,8 +77,27 @@ struct kvm_s2_trans {
 	u64 upper_attr;
 };
 
+static inline phys_addr_t kvm_s2_trans_output(struct kvm_s2_trans *trans)
+{
+	return trans->output;
+}
+
+static inline unsigned long kvm_s2_trans_size(struct kvm_s2_trans *trans)
+{
+	return trans->block_size;
+}
+
+static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
+{
+	return trans->esr;
+}
+
 extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 			      struct kvm_s2_trans *result);
+extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
+				    struct kvm_s2_trans *trans);
+extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 
 extern bool forward_smc_trap(struct kvm_vcpu *vcpu);
 extern bool __check_nv_sr_forward(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 588ce46c0ad0..41de7616b735 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1412,14 +1412,16 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 }
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
-			  struct kvm_memory_slot *memslot, unsigned long hva,
-			  unsigned long fault_status)
+			  struct kvm_s2_trans *nested,
+			  struct kvm_memory_slot *memslot,
+			  unsigned long hva, unsigned long fault_status)
 {
 	int ret = 0;
 	bool write_fault, writable, force_pte = false;
 	bool exec_fault, mte_allowed;
 	bool device = false;
 	unsigned long mmu_seq;
+	phys_addr_t ipa = fault_ipa;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
 	struct vm_area_struct *vma;
@@ -1504,10 +1506,38 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 	vma_pagesize = 1UL << vma_shift;
+
+	if (nested) {
+		unsigned long max_map_size;
+
+		max_map_size = force_pte ? PUD_SIZE : PAGE_SIZE;
+
+		ipa = kvm_s2_trans_output(nested);
+
+		/*
+		 * If we're about to create a shadow stage 2 entry, then we
+		 * can only create a block mapping if the guest stage 2 page
+		 * table uses at least as big a mapping.
+		 */
+		max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
+
+		/*
+		 * Be careful that if the mapping size falls between
+		 * two host sizes, take the smallest of the two.
+		 */
+		if (max_map_size >= PMD_SIZE && max_map_size < PUD_SIZE)
+			max_map_size = PMD_SIZE;
+		else if (max_map_size >= PAGE_SIZE && max_map_size < PMD_SIZE)
+			max_map_size = PAGE_SIZE;
+
+		force_pte = (max_map_size == PAGE_SIZE);
+		vma_pagesize = min(vma_pagesize, (long)max_map_size);
+	}
+
 	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
 		fault_ipa &= ~(vma_pagesize - 1);
 
-	gfn = fault_ipa >> PAGE_SHIFT;
+	gfn = ipa >> PAGE_SHIFT;
 	mte_allowed = kvm_vma_mte_allowed(vma);
 
 	/* Don't use the VMA after the unlock -- it may have vanished */
@@ -1657,8 +1687,10 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
  */
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 {
+	struct kvm_s2_trans nested_trans, *nested = NULL;
 	unsigned long fault_status;
-	phys_addr_t fault_ipa;
+	phys_addr_t fault_ipa; /* The address we faulted on */
+	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
 	struct kvm_memory_slot *memslot;
 	unsigned long hva;
 	bool is_iabt, write_fault, writable;
@@ -1667,7 +1699,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 
 	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
 
-	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
 	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
 
 	if (fault_status == ESR_ELx_FSC_FAULT) {
@@ -1708,6 +1740,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	if (fault_status != ESR_ELx_FSC_FAULT &&
 	    fault_status != ESR_ELx_FSC_PERM &&
 	    fault_status != ESR_ELx_FSC_ACCESS) {
+		/*
+		 * We must never see an address size fault on shadow stage 2
+		 * page table walk, because we would have injected an addr
+		 * size fault when we walked the nested s2 page and not
+		 * create the shadow entry.
+		 */
 		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
 			kvm_vcpu_trap_get_class(vcpu),
 			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
@@ -1717,7 +1755,37 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 
-	gfn = fault_ipa >> PAGE_SHIFT;
+	/*
+	 * We may have faulted on a shadow stage 2 page table if we are
+	 * running a nested guest.  In this case, we have to resolve the L2
+	 * IPA to the L1 IPA first, before knowing what kind of memory should
+	 * back the L1 IPA.
+	 *
+	 * If the shadow stage 2 page table walk faults, then we simply inject
+	 * this to the guest and carry on.
+	 */
+	if (kvm_is_shadow_s2_fault(vcpu)) {
+		u32 esr;
+
+		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
+		if (ret) {
+			esr = kvm_s2_trans_esr(&nested_trans);
+			kvm_inject_s2_fault(vcpu, esr);
+			goto out_unlock;
+		}
+
+		ret = kvm_s2_handle_perm_fault(vcpu, &nested_trans);
+		if (ret) {
+			esr = kvm_s2_trans_esr(&nested_trans);
+			kvm_inject_s2_fault(vcpu, esr);
+			goto out_unlock;
+		}
+
+		ipa = kvm_s2_trans_output(&nested_trans);
+		nested = &nested_trans;
+	}
+
+	gfn = ipa >> PAGE_SHIFT;
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
 	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
 	write_fault = kvm_is_write_fault(vcpu);
@@ -1761,13 +1829,13 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		 * faulting VA. This is always 12 bits, irrespective
 		 * of the page size.
 		 */
-		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
-		ret = io_mem_abort(vcpu, fault_ipa);
+		ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
+		ret = io_mem_abort(vcpu, ipa);
 		goto out_unlock;
 	}
 
 	/* Userspace should not be able to register out-of-bounds IPAs */
-	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->arch.hw_mmu));
+	VM_BUG_ON(ipa >= kvm_phys_size(vcpu->arch.hw_mmu));
 
 	if (fault_status == ESR_ELx_FSC_ACCESS) {
 		handle_access_fault(vcpu, fault_ipa);
@@ -1775,7 +1843,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		goto out_unlock;
 	}
 
-	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
+	ret = user_mem_abort(vcpu, fault_ipa, nested,
+			     memslot, hva, fault_status);
 	if (ret == 0)
 		ret = 1;
 out:
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index cd74d8ee93cb..f4014ae0f901 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -108,6 +108,15 @@ static u32 compute_fsc(int level, u32 fsc)
 	return fsc | (level & 0x3);
 }
 
+static int esr_s2_fault(struct kvm_vcpu *vcpu, int level, u32 fsc)
+{
+	u32 esr;
+
+	esr = kvm_vcpu_get_esr(vcpu) & ~ESR_ELx_FSC;
+	esr |= compute_fsc(level, fsc);
+	return esr;
+}
+
 static int check_base_s2_limits(struct s2_walk_info *wi,
 				int level, int input_size, int stride)
 {
@@ -472,6 +481,45 @@ void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
 	}
 }
 
+/*
+ * Returns non-zero if permission fault is handled by injecting it to the next
+ * level hypervisor.
+ */
+int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
+{
+	unsigned long fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
+	bool forward_fault = false;
+
+	trans->esr = 0;
+
+	if (fault_status != ESR_ELx_FSC_PERM)
+		return 0;
+
+	if (kvm_vcpu_trap_is_iabt(vcpu)) {
+		forward_fault = (trans->upper_attr & BIT(54));
+	} else {
+		bool write_fault = kvm_is_write_fault(vcpu);
+
+		forward_fault = ((write_fault && !trans->writable) ||
+				 (!write_fault && !trans->readable));
+	}
+
+	if (forward_fault) {
+		trans->esr = esr_s2_fault(vcpu, trans->level, ESR_ELx_FSC_PERM);
+		return 1;
+	}
+
+	return 0;
+}
+
+int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.far_el2, FAR_EL2);
+	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.hpfar_el2, HPFAR_EL2);
+
+	return kvm_inject_nested_sync(vcpu, esr_el2);
+}
+
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
 {
 	int i;
-- 
2.39.2


WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Cc: Alexandru Elisei <alexandru.elisei@arm.com>,
	Andre Przywara <andre.przywara@arm.com>,
	Chase Conklin <chase.conklin@arm.com>,
	Christoffer Dall <christoffer.dall@arm.com>,
	Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>,
	Darren Hart <darren@os.amperecomputing.com>,
	Jintack Lim <jintack@cs.columbia.edu>,
	Russell King <rmk+kernel@armlinux.org.uk>,
	Miguel Luis <miguel.luis@oracle.com>,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Oliver Upton <oliver.upton@linux.dev>,
	Zenghui Yu <yuzenghui@huawei.com>
Subject: [PATCH v11 19/43] KVM: arm64: nv: Handle shadow stage 2 page faults
Date: Mon, 20 Nov 2023 13:10:03 +0000	[thread overview]
Message-ID: <20231120131027.854038-20-maz@kernel.org> (raw)
In-Reply-To: <20231120131027.854038-1-maz@kernel.org>

If we are faulting on a shadow stage 2 translation, we first walk the
guest hypervisor's stage 2 page table to see if it has a mapping. If
not, we inject a stage 2 page fault to the virtual EL2. Otherwise, we
create a mapping in the shadow stage 2 page table.

Note that we have to deal with two IPAs when we got a shadow stage 2
page fault. One is the address we faulted on, and is in the L2 guest
phys space. The other is from the guest stage-2 page table walk, and is
in the L1 guest phys space.  To differentiate them, we rename variables
so that fault_ipa is used for the former and ipa is used for the latter.

Co-developed-by: Christoffer Dall <christoffer.dall@linaro.org>
Co-developed-by: Jintack Lim <jintack.lim@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Jintack Lim <jintack.lim@linaro.org>
[maz: rewrote this multiple times...]
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h |  7 +++
 arch/arm64/include/asm/kvm_nested.h  | 19 ++++++
 arch/arm64/kvm/mmu.c                 | 89 ++++++++++++++++++++++++----
 arch/arm64/kvm/nested.c              | 48 +++++++++++++++
 4 files changed, 153 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 8ccf8a1d37ff..5173f8cf2904 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -649,4 +649,11 @@ static __always_inline void kvm_reset_cptr_el2(struct kvm_vcpu *vcpu)
 
 	kvm_write_cptr_el2(val);
 }
+
+static inline bool kvm_is_shadow_s2_fault(struct kvm_vcpu *vcpu)
+{
+	return (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu &&
+		vcpu->arch.hw_mmu->nested_stage2_enabled);
+}
+
 #endif /* __ARM64_KVM_EMULATE_H__ */
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index a2e719d1cd53..a9aec29bf7a1 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -77,8 +77,27 @@ struct kvm_s2_trans {
 	u64 upper_attr;
 };
 
+static inline phys_addr_t kvm_s2_trans_output(struct kvm_s2_trans *trans)
+{
+	return trans->output;
+}
+
+static inline unsigned long kvm_s2_trans_size(struct kvm_s2_trans *trans)
+{
+	return trans->block_size;
+}
+
+static inline u32 kvm_s2_trans_esr(struct kvm_s2_trans *trans)
+{
+	return trans->esr;
+}
+
 extern int kvm_walk_nested_s2(struct kvm_vcpu *vcpu, phys_addr_t gipa,
 			      struct kvm_s2_trans *result);
+extern int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu,
+				    struct kvm_s2_trans *trans);
+extern int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2);
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
 
 extern bool forward_smc_trap(struct kvm_vcpu *vcpu);
 extern bool __check_nv_sr_forward(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 588ce46c0ad0..41de7616b735 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1412,14 +1412,16 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 }
 
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
-			  struct kvm_memory_slot *memslot, unsigned long hva,
-			  unsigned long fault_status)
+			  struct kvm_s2_trans *nested,
+			  struct kvm_memory_slot *memslot,
+			  unsigned long hva, unsigned long fault_status)
 {
 	int ret = 0;
 	bool write_fault, writable, force_pte = false;
 	bool exec_fault, mte_allowed;
 	bool device = false;
 	unsigned long mmu_seq;
+	phys_addr_t ipa = fault_ipa;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
 	struct vm_area_struct *vma;
@@ -1504,10 +1506,38 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}
 
 	vma_pagesize = 1UL << vma_shift;
+
+	if (nested) {
+		unsigned long max_map_size;
+
+		max_map_size = force_pte ? PUD_SIZE : PAGE_SIZE;
+
+		ipa = kvm_s2_trans_output(nested);
+
+		/*
+		 * If we're about to create a shadow stage 2 entry, then we
+		 * can only create a block mapping if the guest stage 2 page
+		 * table uses at least as big a mapping.
+		 */
+		max_map_size = min(kvm_s2_trans_size(nested), max_map_size);
+
+		/*
+		 * Be careful that if the mapping size falls between
+		 * two host sizes, take the smallest of the two.
+		 */
+		if (max_map_size >= PMD_SIZE && max_map_size < PUD_SIZE)
+			max_map_size = PMD_SIZE;
+		else if (max_map_size >= PAGE_SIZE && max_map_size < PMD_SIZE)
+			max_map_size = PAGE_SIZE;
+
+		force_pte = (max_map_size == PAGE_SIZE);
+		vma_pagesize = min(vma_pagesize, (long)max_map_size);
+	}
+
 	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
 		fault_ipa &= ~(vma_pagesize - 1);
 
-	gfn = fault_ipa >> PAGE_SHIFT;
+	gfn = ipa >> PAGE_SHIFT;
 	mte_allowed = kvm_vma_mte_allowed(vma);
 
 	/* Don't use the VMA after the unlock -- it may have vanished */
@@ -1657,8 +1687,10 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
  */
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 {
+	struct kvm_s2_trans nested_trans, *nested = NULL;
 	unsigned long fault_status;
-	phys_addr_t fault_ipa;
+	phys_addr_t fault_ipa; /* The address we faulted on */
+	phys_addr_t ipa; /* Always the IPA in the L1 guest phys space */
 	struct kvm_memory_slot *memslot;
 	unsigned long hva;
 	bool is_iabt, write_fault, writable;
@@ -1667,7 +1699,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 
 	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
 
-	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+	ipa = fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
 	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
 
 	if (fault_status == ESR_ELx_FSC_FAULT) {
@@ -1708,6 +1740,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	if (fault_status != ESR_ELx_FSC_FAULT &&
 	    fault_status != ESR_ELx_FSC_PERM &&
 	    fault_status != ESR_ELx_FSC_ACCESS) {
+		/*
+		 * We must never see an address size fault on shadow stage 2
+		 * page table walk, because we would have injected an addr
+		 * size fault when we walked the nested s2 page and not
+		 * create the shadow entry.
+		 */
 		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
 			kvm_vcpu_trap_get_class(vcpu),
 			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
@@ -1717,7 +1755,37 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 
 	idx = srcu_read_lock(&vcpu->kvm->srcu);
 
-	gfn = fault_ipa >> PAGE_SHIFT;
+	/*
+	 * We may have faulted on a shadow stage 2 page table if we are
+	 * running a nested guest.  In this case, we have to resolve the L2
+	 * IPA to the L1 IPA first, before knowing what kind of memory should
+	 * back the L1 IPA.
+	 *
+	 * If the shadow stage 2 page table walk faults, then we simply inject
+	 * this to the guest and carry on.
+	 */
+	if (kvm_is_shadow_s2_fault(vcpu)) {
+		u32 esr;
+
+		ret = kvm_walk_nested_s2(vcpu, fault_ipa, &nested_trans);
+		if (ret) {
+			esr = kvm_s2_trans_esr(&nested_trans);
+			kvm_inject_s2_fault(vcpu, esr);
+			goto out_unlock;
+		}
+
+		ret = kvm_s2_handle_perm_fault(vcpu, &nested_trans);
+		if (ret) {
+			esr = kvm_s2_trans_esr(&nested_trans);
+			kvm_inject_s2_fault(vcpu, esr);
+			goto out_unlock;
+		}
+
+		ipa = kvm_s2_trans_output(&nested_trans);
+		nested = &nested_trans;
+	}
+
+	gfn = ipa >> PAGE_SHIFT;
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
 	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
 	write_fault = kvm_is_write_fault(vcpu);
@@ -1761,13 +1829,13 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		 * faulting VA. This is always 12 bits, irrespective
 		 * of the page size.
 		 */
-		fault_ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
-		ret = io_mem_abort(vcpu, fault_ipa);
+		ipa |= kvm_vcpu_get_hfar(vcpu) & ((1 << 12) - 1);
+		ret = io_mem_abort(vcpu, ipa);
 		goto out_unlock;
 	}
 
 	/* Userspace should not be able to register out-of-bounds IPAs */
-	VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->arch.hw_mmu));
+	VM_BUG_ON(ipa >= kvm_phys_size(vcpu->arch.hw_mmu));
 
 	if (fault_status == ESR_ELx_FSC_ACCESS) {
 		handle_access_fault(vcpu, fault_ipa);
@@ -1775,7 +1843,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		goto out_unlock;
 	}
 
-	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
+	ret = user_mem_abort(vcpu, fault_ipa, nested,
+			     memslot, hva, fault_status);
 	if (ret == 0)
 		ret = 1;
 out:
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index cd74d8ee93cb..f4014ae0f901 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -108,6 +108,15 @@ static u32 compute_fsc(int level, u32 fsc)
 	return fsc | (level & 0x3);
 }
 
+static int esr_s2_fault(struct kvm_vcpu *vcpu, int level, u32 fsc)
+{
+	u32 esr;
+
+	esr = kvm_vcpu_get_esr(vcpu) & ~ESR_ELx_FSC;
+	esr |= compute_fsc(level, fsc);
+	return esr;
+}
+
 static int check_base_s2_limits(struct s2_walk_info *wi,
 				int level, int input_size, int stride)
 {
@@ -472,6 +481,45 @@ void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
 	}
 }
 
+/*
+ * Returns non-zero if permission fault is handled by injecting it to the next
+ * level hypervisor.
+ */
+int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
+{
+	unsigned long fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
+	bool forward_fault = false;
+
+	trans->esr = 0;
+
+	if (fault_status != ESR_ELx_FSC_PERM)
+		return 0;
+
+	if (kvm_vcpu_trap_is_iabt(vcpu)) {
+		forward_fault = (trans->upper_attr & BIT(54));
+	} else {
+		bool write_fault = kvm_is_write_fault(vcpu);
+
+		forward_fault = ((write_fault && !trans->writable) ||
+				 (!write_fault && !trans->readable));
+	}
+
+	if (forward_fault) {
+		trans->esr = esr_s2_fault(vcpu, trans->level, ESR_ELx_FSC_PERM);
+		return 1;
+	}
+
+	return 0;
+}
+
+int kvm_inject_s2_fault(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.far_el2, FAR_EL2);
+	vcpu_write_sys_reg(vcpu, vcpu->arch.fault.hpfar_el2, HPFAR_EL2);
+
+	return kvm_inject_nested_sync(vcpu, esr_el2);
+}
+
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
 {
 	int i;
-- 
2.39.2


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2023-11-20 13:10 UTC|newest]

Thread overview: 158+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-20 13:09 [PATCH v11 00/43] KVM: arm64: Nested Virtualization support (FEAT_NV2 only) Marc Zyngier
2023-11-20 13:09 ` Marc Zyngier
2023-11-20 13:09 ` [PATCH v11 01/43] arm64: cpufeatures: Restrict NV support to FEAT_NV2 Marc Zyngier
2023-11-20 13:09   ` Marc Zyngier
2023-11-21  9:07   ` Ganapatrao Kulkarni
2023-11-21  9:07     ` Ganapatrao Kulkarni
2023-11-21  9:27     ` Marc Zyngier
2023-11-21  9:27       ` Marc Zyngier
2023-11-20 13:09 ` [PATCH v11 02/43] KVM: arm64: nv: Hoist vcpu_has_nv() into is_hyp_ctxt() Marc Zyngier
2023-11-20 13:09   ` Marc Zyngier
2023-11-20 13:09 ` [PATCH v11 03/43] KVM: arm64: nv: Compute NV view of idregs as a one-off Marc Zyngier
2023-11-20 13:09   ` Marc Zyngier
2023-11-20 13:09 ` [PATCH v11 04/43] KVM: arm64: nv: Drop EL12 register traps that are redirected to VNCR Marc Zyngier
2023-11-20 13:09   ` Marc Zyngier
2023-11-20 13:09 ` [PATCH v11 05/43] KVM: arm64: nv: Add non-VHE-EL2->EL1 translation helpers Marc Zyngier
2023-11-20 13:09   ` Marc Zyngier
2023-11-20 13:09 ` [PATCH v11 06/43] KVM: arm64: nv: Add include containing the VNCR_EL2 offsets Marc Zyngier
2023-11-20 13:09   ` Marc Zyngier
2023-11-20 13:09 ` [PATCH v11 07/43] KVM: arm64: Introduce a bad_trap() primitive for unexpected trap handling Marc Zyngier
2023-11-20 13:09   ` Marc Zyngier
2023-11-20 13:09 ` [PATCH v11 08/43] KVM: arm64: nv: Add EL2_REG_VNCR()/EL2_REG_REDIR() sysreg helpers Marc Zyngier
2023-11-20 13:09   ` Marc Zyngier
2023-11-20 13:09 ` [PATCH v11 09/43] KVM: arm64: nv: Map VNCR-capable registers to a separate page Marc Zyngier
2023-11-20 13:09   ` Marc Zyngier
2023-11-20 13:09 ` [PATCH v11 10/43] KVM: arm64: nv: Handle virtual EL2 registers in vcpu_read/write_sys_reg() Marc Zyngier
2023-11-20 13:09   ` Marc Zyngier
2023-11-20 13:09 ` [PATCH v11 11/43] KVM: arm64: nv: Handle HCR_EL2.E2H specially Marc Zyngier
2023-11-20 13:09   ` Marc Zyngier
2023-11-20 13:09 ` [PATCH v11 12/43] KVM: arm64: nv: Handle CNTHCTL_EL2 specially Marc Zyngier
2023-11-20 13:09   ` Marc Zyngier
2023-11-20 13:09 ` [PATCH v11 13/43] KVM: arm64: nv: Save/Restore vEL2 sysregs Marc Zyngier
2023-11-20 13:09   ` Marc Zyngier
2023-11-20 13:09 ` [PATCH v11 14/43] KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting Marc Zyngier
2023-11-20 13:09   ` Marc Zyngier
2023-11-20 13:09 ` [PATCH v11 15/43] KVM: arm64: nv: Respect virtual CPTR_EL2.{TFP,FPEN} settings Marc Zyngier
2023-11-20 13:09   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 16/43] KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2 Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 17/43] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2024-01-23  9:55   ` Ganapatrao Kulkarni
2024-01-23  9:55     ` Ganapatrao Kulkarni
2024-01-23 14:26     ` Marc Zyngier
2024-01-23 14:26       ` Marc Zyngier
2024-01-25  8:14       ` Ganapatrao Kulkarni
2024-01-25  8:14         ` Ganapatrao Kulkarni
2024-01-25  8:58         ` Marc Zyngier
2024-01-25  8:58           ` Marc Zyngier
2024-01-31  9:39           ` Ganapatrao Kulkarni
2024-01-31  9:39             ` Ganapatrao Kulkarni
2024-01-31 13:50             ` Marc Zyngier
2024-01-31 13:50               ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 18/43] KVM: arm64: nv: Implement nested Stage-2 page table walk logic Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` Marc Zyngier [this message]
2023-11-20 13:10   ` [PATCH v11 19/43] KVM: arm64: nv: Handle shadow stage 2 page faults Marc Zyngier
2024-01-17 14:53   ` Joey Gouly
2024-01-17 14:53     ` Joey Gouly
2024-01-17 15:53     ` Marc Zyngier
2024-01-17 15:53       ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 20/43] KVM: arm64: nv: Restrict S2 RD/WR permissions to match the guest's Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 21/43] KVM: arm64: nv: Unmap/flush shadow stage 2 page tables Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 22/43] KVM: arm64: nv: Set a handler for the system instruction traps Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 23/43] KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 24/43] KVM: arm64: nv: Trap and emulate TLBI " Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 25/43] KVM: arm64: nv: Hide RAS from nested guests Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 26/43] KVM: arm64: nv: Add handling of EL2-specific timer registers Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 27/43] KVM: arm64: nv: Sync nested timer state with FEAT_NV2 Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 28/43] KVM: arm64: nv: Publish emulated timer interrupt state in the in-memory state Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 29/43] KVM: arm64: nv: Load timer before the GIC Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 30/43] KVM: arm64: nv: Nested GICv3 Support Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 31/43] KVM: arm64: nv: Don't block in WFI from nested state Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 32/43] KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 33/43] KVM: arm64: nv: Fold GICv3 host trapping requirements into guest setup Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 34/43] KVM: arm64: nv: Deal with broken VGIC on maintenance interrupt delivery Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 35/43] KVM: arm64: nv: Add handling of FEAT_TTL TLB invalidation Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 36/43] KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 37/43] KVM: arm64: nv: Tag shadow S2 entries with nested level Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 38/43] KVM: arm64: nv: Allocate VNCR page when required Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 39/43] KVM: arm64: nv: Fast-track 'InHost' exception returns Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 40/43] KVM: arm64: nv: Fast-track EL1 TLBIs for VHE guests Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 41/43] KVM: arm64: nv: Use FEAT_ECV to trap access to EL0 timers Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 42/43] KVM: arm64: nv: Accelerate EL0 timer read accesses when FEAT_ECV is on Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-20 13:10 ` [PATCH v11 43/43] KVM: arm64: nv: Allow userspace to request KVM_ARM_VCPU_NESTED_VIRT Marc Zyngier
2023-11-20 13:10   ` Marc Zyngier
2023-11-21  8:51 ` [PATCH v11 00/43] KVM: arm64: Nested Virtualization support (FEAT_NV2 only) Ganapatrao Kulkarni
2023-11-21  8:51   ` Ganapatrao Kulkarni
2023-11-21  9:08   ` Marc Zyngier
2023-11-21  9:08     ` Marc Zyngier
2023-11-21  9:26     ` Ganapatrao Kulkarni
2023-11-21  9:26       ` Ganapatrao Kulkarni
2023-11-21  9:41       ` Marc Zyngier
2023-11-21  9:41         ` Marc Zyngier
2023-11-22 11:10         ` Ganapatrao Kulkarni
2023-11-22 11:10           ` Ganapatrao Kulkarni
2023-11-22 11:39           ` Marc Zyngier
2023-11-22 11:39             ` Marc Zyngier
2023-11-21 16:49 ` Miguel Luis
2023-11-21 16:49   ` Miguel Luis
2023-11-21 19:02   ` Marc Zyngier
2023-11-21 19:02     ` Marc Zyngier
2023-11-23 16:21     ` Miguel Luis
2023-11-23 16:21       ` Miguel Luis
2023-11-23 16:44       ` Marc Zyngier
2023-11-23 16:44         ` Marc Zyngier
2023-11-24  9:50         ` Ganapatrao Kulkarni
2023-11-24  9:50           ` Ganapatrao Kulkarni
2023-11-24 10:19           ` Marc Zyngier
2023-11-24 10:19             ` Marc Zyngier
2023-11-24 12:34             ` Ganapatrao Kulkarni
2023-11-24 12:34               ` Ganapatrao Kulkarni
2023-11-24 12:51               ` Marc Zyngier
2023-11-24 12:51                 ` Marc Zyngier
2023-11-24 13:22                 ` Ganapatrao Kulkarni
2023-11-24 13:22                   ` Ganapatrao Kulkarni
2023-11-24 14:32                   ` Marc Zyngier
2023-11-24 14:32                     ` Marc Zyngier
2023-11-27  7:26                     ` Ganapatrao Kulkarni
2023-11-27  7:26                       ` Ganapatrao Kulkarni
2023-11-27  9:22                       ` Marc Zyngier
2023-11-27  9:22                         ` Marc Zyngier
2023-11-27 10:59                         ` Ganapatrao Kulkarni
2023-11-27 10:59                           ` Ganapatrao Kulkarni
2023-11-27 11:45                           ` Marc Zyngier
2023-11-27 11:45                             ` Marc Zyngier
2023-11-27 12:18                             ` Ganapatrao Kulkarni
2023-11-27 12:18                               ` Ganapatrao Kulkarni
2023-11-27 13:57                               ` Marc Zyngier
2023-11-27 13:57                                 ` Marc Zyngier
2023-12-18 12:39 ` Marc Zyngier
2023-12-18 12:39   ` Marc Zyngier
2023-12-18 19:51   ` Oliver Upton
2023-12-18 19:51     ` Oliver Upton
2023-12-19 10:32 ` (subset) " Marc Zyngier
2023-12-19 10:32   ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231120131027.854038-20-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=alexandru.elisei@arm.com \
    --cc=andre.przywara@arm.com \
    --cc=chase.conklin@arm.com \
    --cc=christoffer.dall@arm.com \
    --cc=darren@os.amperecomputing.com \
    --cc=gankulkarni@os.amperecomputing.com \
    --cc=james.morse@arm.com \
    --cc=jintack@cs.columbia.edu \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=miguel.luis@oracle.com \
    --cc=oliver.upton@linux.dev \
    --cc=rmk+kernel@armlinux.org.uk \
    --cc=suzuki.poulose@arm.com \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.