* [PATCH v2 0/3] KVM: arm64: Fix handling of S1PTW S2 fault on RO memslots
From: Marc Zyngier @ 2023-01-03 10:09 UTC
To: kvmarm, kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Alexandru Elisei, Oliver Upton,
Ard Biesheuvel, Will Deacon, Quentin Perret, Ricardo Koller
Recent developments on the EFI front have resulted in guests that
simply won't boot if their page tables are in a read-only memslot and
they are a bit unlucky in the way S2 gets paged in... The core issue
is that we treat an S1PTW as a write, which is close enough to what
needs to be done -- until you get to RO memslots.
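For context, the problematic configuration is a VMM registering the
firmware image (page tables included) as a read-only memslot. A
minimal sketch of that setup follows; the slot number, IPA and helper
name are made up for illustration, the ioctl and flag are the real
UAPI:

  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  /*
   * Hypothetical VMM helper: register a firmware image (with its
   * baked-in S1 page tables) as a read-only memslot.
   */
  static int map_fw_readonly(int vm_fd, void *fw, __u64 size)
  {
          struct kvm_userspace_memory_region region = {
                  .slot            = 1,
                  .flags           = KVM_MEM_READONLY,
                  .guest_phys_addr = 0x40000000,  /* hypothetical IPA */
                  .memory_size     = size,
                  .userspace_addr  = (__u64)fw,
          };

          /* Ordinary guest writes to this slot exit as KVM_EXIT_MMIO;
           * S1PTW writes are the corner case this series deals with. */
          return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
  }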
The first patch fixes this and is definitely a stable candidate. It
splits the faulting of page tables into two steps (RO translation fault,
followed by a writable permission fault -- should it even happen).
The second one documents the slightly odd behaviour of PTW writes to
RO memslots, which do not result in a KVM_EXIT_MMIO exit. The last
patch is totally optional, only tangentially related, and randomly
repaints stuff (maybe that's contagious, who knows).
The whole thing is on top of v6.2-rc1, and I plan to take it in as a
fix shortly.
M.
* From v1:
- Added the documentation patch
- Dropped the AF micro-optimisation, as it was creating more
confusion, was hard to test, and was of dubious value
- Collected RBs, with thanks
Marc Zyngier (3):
KVM: arm64: Fix S1PTW handling on RO memslots
KVM: arm64: Document the behaviour of S1PTW faults on RO memslots
KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*
Documentation/virt/kvm/api.rst | 8 +++++
arch/arm64/include/asm/esr.h | 9 ++++++
arch/arm64/include/asm/kvm_arm.h | 15 ---------
arch/arm64/include/asm/kvm_emulate.h | 42 ++++++++++++++++++-------
arch/arm64/kvm/hyp/include/hyp/fault.h | 2 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 2 +-
arch/arm64/kvm/mmu.c | 21 +++++++------
7 files changed, 61 insertions(+), 38 deletions(-)
--
2.34.1
* [PATCH v2 1/3] KVM: arm64: Fix S1PTW handling on RO memslots
From: Marc Zyngier @ 2023-01-03 10:09 UTC
To: kvmarm, kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Alexandru Elisei, Oliver Upton,
Ard Biesheuvel, Will Deacon, Quentin Perret, Ricardo Koller,
stable
A recent development on the EFI front has resulted in guests having
their page tables baked in the firmware binary, and mapped into the
IPA space as part of a read-only memslot. Not only is this legitimate,
but it also results in added security, so thumbs up.
It is possible to take an S1PTW translation fault if the S1 PTs are
unmapped at stage-2. However, KVM unconditionally treats S1PTW as a
write to correctly handle hardware AF/DB updates to the S1 PTs.
Furthermore, KVM injects an exception into the guest for S1PTW writes.
In the aforementioned case this results in the guest taking an abort
it won't recover from, as the S1 PTs mapping the vectors suffer from
the same problem.
So clearly our handling is... wrong.
Instead, switch to a two-pronged approach:
- On S1PTW translation fault, handle the fault as a read
- On S1PTW permission fault, handle the fault as a write
This is of no consequence to SW that *writes* to its PTs (the write
will trigger a non-S1PTW fault), and SW that uses RO PTs will not
use HW-assisted AF/DB anyway, as that'd be wrong.
Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
fault on S1PTW permission fault on instruction fetch") do we end up
with two back-to-back faults (the page being evicted and faulted
back). I don't think this is a case worth optimising for.
Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch")
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Regression-tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
---
arch/arm64/include/asm/kvm_emulate.h | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 9bdba47f7e14..0d40c48d8132 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -373,8 +373,26 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
 
 static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
 {
-        if (kvm_vcpu_abt_iss1tw(vcpu))
-                return true;
+        if (kvm_vcpu_abt_iss1tw(vcpu)) {
+                /*
+                 * Only a permission fault on a S1PTW should be
+                 * considered as a write. Otherwise, page tables baked
+                 * in a read-only memslot will result in an exception
+                 * being delivered in the guest.
+                 *
+                 * The drawback is that we end-up faulting twice if the
+                 * guest is using any of HW AF/DB: a translation fault
+                 * to map the page containing the PT (read only at
+                 * first), then a permission fault to allow the flags
+                 * to be set.
+                 */
+                switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
+                case ESR_ELx_FSC_PERM:
+                        return true;
+                default:
+                        return false;
+                }
+        }
 
         if (kvm_vcpu_trap_is_iabt(vcpu))
                 return false;
--
2.34.1
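As an aside, the resulting policy can be summed up by the following
stand-alone sketch (not part of the patch; only ESR_ELx_FSC_PERM is
the real definition from esr.h, the rest is simplified):

  #include <stdbool.h>

  #define ESR_ELx_FSC_PERM        (0x0C)  /* real value, from esr.h */

  /* Simplified restatement of the S1PTW part of kvm_is_write_fault(). */
  static bool s1ptw_is_write(unsigned int fault_type)
  {
          /*
           * With writable PTs and HW AF/DB, the sequence is now:
           *  1. S1PTW translation fault -> read: PT page mapped RO at S2;
           *  2. HW AF/DB update -> S1PTW permission fault -> write: the
           *     mapping is made writable.
           * PTs in a RO memslot stop at step 1, which is the point.
           */
          return fault_type == ESR_ELx_FSC_PERM;
  }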
* [PATCH v2 2/3] KVM: arm64: Document the behaviour of S1PTW faults on RO memslots
From: Marc Zyngier @ 2023-01-03 10:09 UTC
To: kvmarm, kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Alexandru Elisei, Oliver Upton,
Ard Biesheuvel, Will Deacon, Quentin Perret, Ricardo Koller
Although the KVM API says that a write to a RO memslot must result
in a KVM_EXIT_MMIO describing the write, the arm64 architecture
doesn't provide the *data* written by a Stage-1 page table walk
(we only get the address).
Since there isn't much userspace can do with so little information
anyway, document the fact that such an access results in a guest
exception, not an exit. This is consistent with the guest being
terminally broken anyway.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
Documentation/virt/kvm/api.rst | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0dd5d8733dd5..42db72a0cbe6 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1354,6 +1354,14 @@ the memory region are automatically reflected into the guest. For example, an
mmap() that affects the region will be made visible immediately. Another
example is madvise(MADV_DROP).
+Note: On arm64, a write generated by the page-table walker (to update
+the Access and Dirty flags, for example) never results in a
+KVM_EXIT_MMIO exit when the slot has the KVM_MEM_READONLY flag. This
+is because KVM cannot provide the data that would be written by the
+page-table walker, making it impossible to emulate the access.
+Instead, an abort (data abort if the cause of the page-table update
+was a load or a store, instruction abort if it was an instruction
+fetch) is injected in the guest.
4.36 KVM_SET_TSS_ADDR
---------------------
--
2.34.1
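To make the documented behaviour concrete, here is a sketch of the
userspace side; the struct kvm_run fields are the real UAPI, while the
exit handler and handle_ro_write() are made up:

  #include <linux/kvm.h>

  /* handle_ro_write() is hypothetical; kvm_run fields are real UAPI. */
  extern void handle_ro_write(__u64 addr, const void *data, __u32 len);

  static void handle_exit(struct kvm_run *run)
  {
          switch (run->exit_reason) {
          case KVM_EXIT_MMIO:
                  /* An ordinary guest store to a KVM_MEM_READONLY slot
                   * lands here, with the written bytes in mmio.data. */
                  if (run->mmio.is_write)
                          handle_ro_write(run->mmio.phys_addr,
                                          run->mmio.data, run->mmio.len);
                  break;
          default:
                  /* A S1PTW write to the same slot never shows up as
                   * KVM_EXIT_MMIO on arm64: there is no data to report,
                   * and the guest takes an abort instead. */
                  break;
          }
  }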
* [PATCH v2 3/3] KVM: arm64: Convert FSC_* over to ESR_ELx_FSC_*
From: Marc Zyngier @ 2023-01-03 10:09 UTC
To: kvmarm, kvmarm, kvm, linux-arm-kernel
Cc: James Morse, Suzuki K Poulose, Alexandru Elisei, Oliver Upton,
Ard Biesheuvel, Will Deacon, Quentin Perret, Ricardo Koller
The former is an AArch32 legacy, so let's move over to the
verbose (and strictly identical) version.
This involves moving some of the #defines that were private
to KVM into the more generic esr.h.
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
arch/arm64/include/asm/esr.h | 9 +++++++++
arch/arm64/include/asm/kvm_arm.h | 15 ---------------
arch/arm64/include/asm/kvm_emulate.h | 20 ++++++++++----------
arch/arm64/kvm/hyp/include/hyp/fault.h | 2 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 2 +-
arch/arm64/kvm/mmu.c | 21 ++++++++++++---------
6 files changed, 33 insertions(+), 36 deletions(-)
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 15b34fbfca66..206de10524e3 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -114,6 +114,15 @@
#define ESR_ELx_FSC_ACCESS (0x08)
#define ESR_ELx_FSC_FAULT (0x04)
#define ESR_ELx_FSC_PERM (0x0C)
+#define ESR_ELx_FSC_SEA_TTW0 (0x14)
+#define ESR_ELx_FSC_SEA_TTW1 (0x15)
+#define ESR_ELx_FSC_SEA_TTW2 (0x16)
+#define ESR_ELx_FSC_SEA_TTW3 (0x17)
+#define ESR_ELx_FSC_SECC (0x18)
+#define ESR_ELx_FSC_SECC_TTW0 (0x1c)
+#define ESR_ELx_FSC_SECC_TTW1 (0x1d)
+#define ESR_ELx_FSC_SECC_TTW2 (0x1e)
+#define ESR_ELx_FSC_SECC_TTW3 (0x1f)
/* ISS field definitions for Data Aborts */
#define ESR_ELx_ISV_SHIFT (24)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 0df3fc3a0173..26b0c97df986 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -319,21 +319,6 @@
BIT(18) | \
GENMASK(16, 15))
-/* For compatibility with fault code shared with 32-bit */
-#define FSC_FAULT ESR_ELx_FSC_FAULT
-#define FSC_ACCESS ESR_ELx_FSC_ACCESS
-#define FSC_PERM ESR_ELx_FSC_PERM
-#define FSC_SEA ESR_ELx_FSC_EXTABT
-#define FSC_SEA_TTW0 (0x14)
-#define FSC_SEA_TTW1 (0x15)
-#define FSC_SEA_TTW2 (0x16)
-#define FSC_SEA_TTW3 (0x17)
-#define FSC_SECC (0x18)
-#define FSC_SECC_TTW0 (0x1c)
-#define FSC_SECC_TTW1 (0x1d)
-#define FSC_SECC_TTW2 (0x1e)
-#define FSC_SECC_TTW3 (0x1f)
-
/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
#define HPFAR_MASK (~UL(0xf))
/*
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 0d40c48d8132..193583df2d9c 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -349,16 +349,16 @@ static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *v
static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
{
switch (kvm_vcpu_trap_get_fault(vcpu)) {
- case FSC_SEA:
- case FSC_SEA_TTW0:
- case FSC_SEA_TTW1:
- case FSC_SEA_TTW2:
- case FSC_SEA_TTW3:
- case FSC_SECC:
- case FSC_SECC_TTW0:
- case FSC_SECC_TTW1:
- case FSC_SECC_TTW2:
- case FSC_SECC_TTW3:
+ case ESR_ELx_FSC_EXTABT:
+ case ESR_ELx_FSC_SEA_TTW0:
+ case ESR_ELx_FSC_SEA_TTW1:
+ case ESR_ELx_FSC_SEA_TTW2:
+ case ESR_ELx_FSC_SEA_TTW3:
+ case ESR_ELx_FSC_SECC:
+ case ESR_ELx_FSC_SECC_TTW0:
+ case ESR_ELx_FSC_SECC_TTW1:
+ case ESR_ELx_FSC_SECC_TTW2:
+ case ESR_ELx_FSC_SECC_TTW3:
return true;
default:
return false;
diff --git a/arch/arm64/kvm/hyp/include/hyp/fault.h b/arch/arm64/kvm/hyp/include/hyp/fault.h
index 1b8a2dcd712f..9ddcfe2c3e57 100644
--- a/arch/arm64/kvm/hyp/include/hyp/fault.h
+++ b/arch/arm64/kvm/hyp/include/hyp/fault.h
@@ -60,7 +60,7 @@ static inline bool __get_fault_info(u64 esr, struct kvm_vcpu_fault_info *fault)
*/
if (!(esr & ESR_ELx_S1PTW) &&
(cpus_have_final_cap(ARM64_WORKAROUND_834220) ||
- (esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {
+ (esr & ESR_ELx_FSC_TYPE) == ESR_ELx_FSC_PERM)) {
if (!__translate_far_to_hpfar(far, &hpfar))
return false;
} else {
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 3330d1b76bdd..07d37ff88a3f 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -367,7 +367,7 @@ static bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, u64 *exit_code)
if (static_branch_unlikely(&vgic_v2_cpuif_trap)) {
bool valid;
- valid = kvm_vcpu_trap_get_fault_type(vcpu) == FSC_FAULT &&
+ valid = kvm_vcpu_trap_get_fault_type(vcpu) == ESR_ELx_FSC_FAULT &&
kvm_vcpu_dabt_isvalid(vcpu) &&
!kvm_vcpu_abt_issea(vcpu) &&
!kvm_vcpu_abt_iss1tw(vcpu);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 31d7fa4c7c14..a3ee3b605c9b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1212,7 +1212,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
VM_BUG_ON(write_fault && exec_fault);
- if (fault_status == FSC_PERM && !write_fault && !exec_fault) {
+ if (fault_status == ESR_ELx_FSC_PERM && !write_fault && !exec_fault) {
kvm_err("Unexpected L2 read permission error\n");
return -EFAULT;
}
@@ -1277,7 +1277,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* only exception to this is when dirty logging is enabled at runtime
* and a write fault needs to collapse a block entry into a table.
*/
- if (fault_status != FSC_PERM || (logging_active && write_fault)) {
+ if (fault_status != ESR_ELx_FSC_PERM ||
+ (logging_active && write_fault)) {
ret = kvm_mmu_topup_memory_cache(memcache,
kvm_mmu_cache_min_pages(kvm));
if (ret)
@@ -1342,7 +1343,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* backed by a THP and thus use block mapping if possible.
*/
if (vma_pagesize == PAGE_SIZE && !(force_pte || device)) {
- if (fault_status == FSC_PERM && fault_granule > PAGE_SIZE)
+ if (fault_status == ESR_ELx_FSC_PERM &&
+ fault_granule > PAGE_SIZE)
vma_pagesize = fault_granule;
else
vma_pagesize = transparent_hugepage_adjust(kvm, memslot,
@@ -1350,7 +1352,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
&fault_ipa);
}
- if (fault_status != FSC_PERM && !device && kvm_has_mte(kvm)) {
+ if (fault_status != ESR_ELx_FSC_PERM && !device && kvm_has_mte(kvm)) {
/* Check the VMM hasn't introduced a new disallowed VMA */
if (kvm_vma_mte_allowed(vma)) {
sanitise_mte_tags(kvm, pfn, vma_pagesize);
@@ -1376,7 +1378,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
* permissions only if vma_pagesize equals fault_granule. Otherwise,
* kvm_pgtable_stage2_map() should be called to change block size.
*/
- if (fault_status == FSC_PERM && vma_pagesize == fault_granule)
+ if (fault_status == ESR_ELx_FSC_PERM && vma_pagesize == fault_granule)
ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
else
ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
@@ -1441,7 +1443,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
- if (fault_status == FSC_FAULT) {
+ if (fault_status == ESR_ELx_FSC_FAULT) {
/* Beyond sanitised PARange (which is the IPA limit) */
if (fault_ipa >= BIT_ULL(get_kvm_ipa_limit())) {
kvm_inject_size_fault(vcpu);
@@ -1476,8 +1478,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
kvm_vcpu_get_hfar(vcpu), fault_ipa);
/* Check the stage-2 fault is trans. fault or write fault */
- if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
- fault_status != FSC_ACCESS) {
+ if (fault_status != ESR_ELx_FSC_FAULT &&
+ fault_status != ESR_ELx_FSC_PERM &&
+ fault_status != ESR_ELx_FSC_ACCESS) {
kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
kvm_vcpu_trap_get_class(vcpu),
(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
@@ -1539,7 +1542,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
/* Userspace should not be able to register out-of-bounds IPAs */
VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
- if (fault_status == FSC_ACCESS) {
+ if (fault_status == ESR_ELx_FSC_ACCESS) {
handle_access_fault(vcpu, fault_ipa);
ret = 1;
goto out_unlock;
--
2.34.1
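For reference, the renamed constants are used exactly like the old
ones; a small made-up decode helper following the in-tree
"(esr & ESR_ELx_FSC_TYPE)" idiom seen in hyp/fault.h:

  #include <asm/esr.h>

  /* Hypothetical helper: classify the fault status code of an ESR. */
  static const char *esr_fsc_type_name(unsigned long esr)
  {
          switch (esr & ESR_ELx_FSC_TYPE) {
          case ESR_ELx_FSC_FAULT:         return "translation fault";
          case ESR_ELx_FSC_ACCESS:        return "access flag fault";
          case ESR_ELx_FSC_PERM:          return "permission fault";
          default:                        return "other";
          }
  }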