* [PATCH v2 1/6] KVM: arm64: VM exit to userspace to handle SEA
2025-06-04 5:08 [PATCH v2 0/6] VMM can handle guest SEA via KVM_EXIT_ARM_SEA Jiaqi Yan
@ 2025-06-04 5:08 ` Jiaqi Yan
2025-07-01 17:35 ` Jiaqi Yan
2025-07-11 19:39 ` Oliver Upton
2025-06-04 5:08 ` [PATCH v2 2/6] KVM: arm64: Set FnV for VCPU when FAR_EL2 is invalid Jiaqi Yan
` (4 subsequent siblings)
5 siblings, 2 replies; 21+ messages in thread
From: Jiaqi Yan @ 2025-06-04 5:08 UTC (permalink / raw)
To: maz, oliver.upton
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton, Jiaqi Yan
When APEI fails to handle a stage-2 synchronous external abort (SEA),
today KVM directly injects an asynchronous SError into the vCPU and then
resumes it, which usually results in an unpleasant guest kernel panic.

One major cause of guest SEA is a vCPU consuming a recoverable
uncorrected memory error (UER). Although the SError and the resulting
guest kernel panic effectively stop the propagation of corrupted memory,
there is room to recover from a UER in a more graceful manner.

Alternatively, KVM can redirect the synchronous SEA event to the VMM to:
- Reduce the blast radius if possible. The VMM can inject a SEA into the
  vCPU via KVM's existing KVM_SET_VCPU_EVENTS API. If the memory poison
  consumption or fault did not originate from the guest kernel, the
  blast radius can be limited to the triggering thread in guest
  userspace, so the VM can keep running.
- Protect against future memory poison consumption by unmapping the page
  from stage-2, or notify the guest about the poisoned page so the guest
  kernel can unmap it from stage-1.
- Track SEA events that VM customers care about, restart the VM when a
  certain number of distinct poison events have occurred, and provide
  observability to customers in a log management UI.

Introduce a userspace-visible feature to enable the VMM to handle SEA:
- KVM_CAP_ARM_SEA_TO_USER. As an alternative to the fallback behavior
  when host APEI fails to claim a SEA, userspace can opt in to this new
  capability to let KVM exit to userspace on a SEA, provided the abort
  was not caused by an access to stage-2 translation table memory.
- KVM_EXIT_ARM_SEA. A new exit reason introduced for this purpose.
  KVM fills kvm_run.arm_sea with as much information as possible about
  the SEA, enabling the VMM to emulate the SEA to the guest by itself:
  - Sanitized ESR_EL2. The general rule is to keep only the bits
    useful for userspace and relevant to guest memory. See the code
    comments for why bits are hidden or reported.
  - Flags indicating whether the faulting guest virtual and physical
    addresses are valid.
  - Faulting guest virtual address, if available.
  - Faulting guest physical address, if available.
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
arch/arm64/include/asm/kvm_emulate.h | 67 ++++++++++++++++++++++++++++
arch/arm64/include/asm/kvm_host.h | 8 ++++
arch/arm64/include/asm/kvm_ras.h | 2 +-
arch/arm64/kvm/arm.c | 5 +++
arch/arm64/kvm/mmu.c | 59 +++++++++++++++++++-----
include/uapi/linux/kvm.h | 11 +++++
6 files changed, 141 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index bd020fc28aa9c..ac602f8503622 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -429,6 +429,73 @@ static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
}
}
+/*
+ * Return true if SEA is on an access made for stage-2 translation table walk.
+ */
+static inline bool kvm_vcpu_sea_iss2ttw(const struct kvm_vcpu *vcpu)
+{
+ u64 esr = kvm_vcpu_get_esr(vcpu);
+
+ if (!esr_fsc_is_sea_ttw(esr) && !esr_fsc_is_secc_ttw(esr))
+ return false;
+
+ return !(esr & ESR_ELx_S1PTW);
+}
+
+/*
+ * Sanitize ESR_EL2 before KVM_EXIT_ARM_SEA. The general rule is to keep
+ * only the SEA-relevant bits that are useful for userspace and relevant to
+ * guest memory.
+ */
+static inline u64 kvm_vcpu_sea_esr_sanitized(const struct kvm_vcpu *vcpu)
+{
+ u64 esr = kvm_vcpu_get_esr(vcpu);
+ /*
+ * Starting with zero to hide the following bits:
+ * - HDBSSF: hardware dirty state is not guest memory.
+ * - TnD, TagAccess, AssuredOnly, Overlay, DirtyBit: they are
+ * for permission fault.
+ * - GCS: not guest memory.
+ * - Xs: it is for translation/access flag/permission fault.
+ * - ISV: it is 1 mostly for Translation fault, Access flag fault,
+ * or Permission fault. Only when FEAT_RAS is not implemented,
+ * it may be set to 1 (implementation defined) for S2PTW,
+ * which is not worth returning to userspace anyway.
+ * - ISS[23:14]: because ISV is already hidden.
+ * - VNCR: VNCR_EL2 is not guest memory.
+ */
+ u64 sanitized = 0ULL;
+
+ /*
+ * Reasons to make these bits visible to userspace:
+ * - EC: tell if abort on instruction or data.
+ * - IL: useful if userspace decides to retire the instruction.
+ * - FSC: tell if abort on translation table walk.
+ * - SET: tell if abort is recoverable, uncontainable, or
+ * restartable.
+ * - S1PTW: userspace can tell the guest that its stage-1 has a problem.
+ * - FnV: userspace should avoid writing FAR_EL1 if FnV=1.
+ * - CM and WnR: make ESR "authentic" in general.
+ */
+ sanitized |= esr & (ESR_ELx_EC_MASK | ESR_ELx_IL | ESR_ELx_FSC |
+ ESR_ELx_SET_MASK | ESR_ELx_S1PTW | ESR_ELx_FnV |
+ ESR_ELx_CM | ESR_ELx_WNR);
+
+ return sanitized;
+}
+
+/* Return true if faulting guest virtual address during SEA is valid. */
+static inline bool kvm_vcpu_sea_far_valid(const struct kvm_vcpu *vcpu)
+{
+ return !(kvm_vcpu_get_esr(vcpu) & ESR_ELx_FnV);
+}
+
+/* Return true if faulting guest physical address during SEA is valid. */
+static inline bool kvm_vcpu_sea_ipa_valid(const struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.fault.hpfar_el2 & HPFAR_EL2_NS;
+}
+
static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
{
u64 esr = kvm_vcpu_get_esr(vcpu);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d941abc6b5eef..4b27e988ec768 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -349,6 +349,14 @@ struct kvm_arch {
#define KVM_ARCH_FLAG_GUEST_HAS_SVE 9
/* MIDR_EL1, REVIDR_EL1, and AIDR_EL1 are writable from userspace */
#define KVM_ARCH_FLAG_WRITABLE_IMP_ID_REGS 10
+ /*
+ * When APEI fails to claim a stage-2 synchronous external abort
+ * (SEA), return to userspace with fault information. Userspace
+ * can opt in to this feature if KVM_CAP_ARM_SEA_TO_USER is
+ * supported. Userspace is encouraged to handle this VM exit
+ * by injecting a SEA into the vCPU before resuming it.
+ */
+#define KVM_ARCH_FLAG_RETURN_SEA_TO_USER 11
unsigned long flags;
/* VM-wide vCPU feature set */
diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h
index 9398ade632aaf..760a5e34489b1 100644
--- a/arch/arm64/include/asm/kvm_ras.h
+++ b/arch/arm64/include/asm/kvm_ras.h
@@ -14,7 +14,7 @@
* Was this synchronous external abort a RAS notification?
* Returns '0' for errors handled by some RAS subsystem, or -ENOENT.
*/
-static inline int kvm_handle_guest_sea(void)
+static inline int kvm_delegate_guest_sea(void)
{
/* apei_claim_sea(NULL) expects to mask interrupts itself */
lockdep_assert_irqs_enabled();
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 505d504b52b53..99e0c6c16e437 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -133,6 +133,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}
mutex_unlock(&kvm->lock);
break;
+ case KVM_CAP_ARM_SEA_TO_USER:
+ r = 0;
+ set_bit(KVM_ARCH_FLAG_RETURN_SEA_TO_USER, &kvm->arch.flags);
+ break;
default:
break;
}
@@ -322,6 +326,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_IRQFD_RESAMPLE:
case KVM_CAP_COUNTER_OFFSET:
case KVM_CAP_ARM_WRITABLE_IMP_ID_REGS:
+ case KVM_CAP_ARM_SEA_TO_USER:
r = 1;
break;
case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index e445db2cb4a43..5a50d0ed76a68 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1775,6 +1775,53 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
read_unlock(&vcpu->kvm->mmu_lock);
}
+/* Handle stage-2 synchronous external abort (SEA). */
+static int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
+{
+ struct kvm_run *run = vcpu->run;
+
+ /* Delegate to APEI for RAS; if it claims the SEA, resume the guest. */
+ if (kvm_delegate_guest_sea() == 0)
+ return 1;
+
+ /*
+ * Besides the case where userspace has not opted in to
+ * KVM_ARCH_FLAG_RETURN_SEA_TO_USER, when the SEA was taken on memory
+ * backing a stage-2 page table, returning to userspace brings no
+ * benefit: eventually an EL2 exception will crash the host kernel.
+ */
+ if (!test_bit(KVM_ARCH_FLAG_RETURN_SEA_TO_USER,
+ &vcpu->kvm->arch.flags) ||
+ kvm_vcpu_sea_iss2ttw(vcpu)) {
+ /* Fallback behavior prior to KVM_EXIT_ARM_SEA. */
+ kvm_inject_vabt(vcpu);
+ return 1;
+ }
+
+ /*
+ * Exit to userspace, and provide faulting guest virtual and physical
+ * addresses in case userspace wants to emulate SEA to guest by
+ * writing to FAR_EL1 and HPFAR_EL1 registers.
+ */
+ run->exit_reason = KVM_EXIT_ARM_SEA;
+ run->arm_sea.esr = kvm_vcpu_sea_esr_sanitized(vcpu);
+ run->arm_sea.flags = 0ULL;
+ run->arm_sea.gva = 0ULL;
+ run->arm_sea.gpa = 0ULL;
+
+ if (kvm_vcpu_sea_far_valid(vcpu)) {
+ run->arm_sea.flags |= KVM_EXIT_ARM_SEA_FLAG_GVA_VALID;
+ run->arm_sea.gva = kvm_vcpu_get_hfar(vcpu);
+ }
+
+ if (kvm_vcpu_sea_ipa_valid(vcpu)) {
+ run->arm_sea.flags |= KVM_EXIT_ARM_SEA_FLAG_GPA_VALID;
+ run->arm_sea.gpa = kvm_vcpu_get_fault_ipa(vcpu);
+ }
+
+ return 0;
+}
+
/**
* kvm_handle_guest_abort - handles all 2nd stage aborts
* @vcpu: the VCPU pointer
@@ -1799,16 +1846,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
int ret, idx;
/* Synchronous External Abort? */
- if (kvm_vcpu_abt_issea(vcpu)) {
- /*
- * For RAS the host kernel may handle this abort.
- * There is no need to pass the error into the guest.
- */
- if (kvm_handle_guest_sea())
- kvm_inject_vabt(vcpu);
-
- return 1;
- }
+ if (kvm_vcpu_abt_issea(vcpu))
+ return kvm_handle_guest_sea(vcpu);
esr = kvm_vcpu_get_esr(vcpu);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c9d4a908976e8..4fed3fdfb13d6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -178,6 +178,7 @@ struct kvm_xen_exit {
#define KVM_EXIT_NOTIFY 37
#define KVM_EXIT_LOONGARCH_IOCSR 38
#define KVM_EXIT_MEMORY_FAULT 39
+#define KVM_EXIT_ARM_SEA 40
/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -446,6 +447,15 @@ struct kvm_run {
__u64 gpa;
__u64 size;
} memory_fault;
+ /* KVM_EXIT_ARM_SEA */
+ struct {
+ __u64 esr;
+#define KVM_EXIT_ARM_SEA_FLAG_GVA_VALID (1ULL << 0)
+#define KVM_EXIT_ARM_SEA_FLAG_GPA_VALID (1ULL << 1)
+ __u64 flags;
+ __u64 gva;
+ __u64 gpa;
+ } arm_sea;
/* Fix the size of the union. */
char padding[256];
};
@@ -932,6 +942,7 @@ struct kvm_enable_cap {
#define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
#define KVM_CAP_ARM_EL2 240
#define KVM_CAP_ARM_EL2_E2H0 241
+#define KVM_CAP_ARM_SEA_TO_USER 242
struct kvm_irq_routing_irqchip {
__u32 irqchip;
--
2.49.0.1266.g31b7d2e469-goog
* Re: [PATCH v2 1/6] KVM: arm64: VM exit to userspace to handle SEA
2025-06-04 5:08 ` [PATCH v2 1/6] KVM: arm64: VM exit to userspace to handle SEA Jiaqi Yan
@ 2025-07-01 17:35 ` Jiaqi Yan
2025-07-11 19:39 ` Oliver Upton
1 sibling, 0 replies; 21+ messages in thread
From: Jiaqi Yan @ 2025-07-01 17:35 UTC (permalink / raw)
To: maz, oliver.upton
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton
On Tue, Jun 3, 2025 at 10:09 PM Jiaqi Yan <jiaqiyan@google.com> wrote:
>
> When APEI fails to handle a stage-2 synchronous external abort (SEA),
> today KVM directly injects an asynchronous SError into the vCPU and then
> resumes it, which usually results in an unpleasant guest kernel panic.
>
> One major cause of guest SEA is a vCPU consuming a recoverable
> uncorrected memory error (UER). Although the SError and the resulting
> guest kernel panic effectively stop the propagation of corrupted memory,
> there is room to recover from a UER in a more graceful manner.
>
> Alternatively, KVM can redirect the synchronous SEA event to the VMM to:
> - Reduce the blast radius if possible. The VMM can inject a SEA into the
>   vCPU via KVM's existing KVM_SET_VCPU_EVENTS API. If the memory poison
>   consumption or fault did not originate from the guest kernel, the
>   blast radius can be limited to the triggering thread in guest
>   userspace, so the VM can keep running.
> - Protect against future memory poison consumption by unmapping the page
>   from stage-2, or notify the guest about the poisoned page so the guest
>   kernel can unmap it from stage-1.
> - Track SEA events that VM customers care about, restart the VM when a
>   certain number of distinct poison events have occurred, and provide
>   observability to customers in a log management UI.
>
> Introduce a userspace-visible feature to enable the VMM to handle SEA:
> - KVM_CAP_ARM_SEA_TO_USER. As an alternative to the fallback behavior
>   when host APEI fails to claim a SEA, userspace can opt in to this new
>   capability to let KVM exit to userspace on a SEA, provided the abort
>   was not caused by an access to stage-2 translation table memory.
> - KVM_EXIT_ARM_SEA. A new exit reason introduced for this purpose.
>   KVM fills kvm_run.arm_sea with as much information as possible about
>   the SEA, enabling the VMM to emulate the SEA to the guest by itself:
>   - Sanitized ESR_EL2. The general rule is to keep only the bits
>     useful for userspace and relevant to guest memory. See the code
>     comments for why bits are hidden or reported.
>   - Flags indicating whether the faulting guest virtual and physical
>     addresses are valid.
>   - Faulting guest virtual address, if available.
>   - Faulting guest physical address, if available.
>
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> ---
> arch/arm64/include/asm/kvm_emulate.h | 67 ++++++++++++++++++++++++++++
> arch/arm64/include/asm/kvm_host.h | 8 ++++
> arch/arm64/include/asm/kvm_ras.h | 2 +-
> arch/arm64/kvm/arm.c | 5 +++
> arch/arm64/kvm/mmu.c | 59 +++++++++++++++++++-----
> include/uapi/linux/kvm.h | 11 +++++
> 6 files changed, 141 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index bd020fc28aa9c..ac602f8503622 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -429,6 +429,73 @@ static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
> }
> }
>
> +/*
> + * Return true if SEA is on an access made for stage-2 translation table walk.
> + */
> +static inline bool kvm_vcpu_sea_iss2ttw(const struct kvm_vcpu *vcpu)
> +{
> + u64 esr = kvm_vcpu_get_esr(vcpu);
> +
> + if (!esr_fsc_is_sea_ttw(esr) && !esr_fsc_is_secc_ttw(esr))
> + return false;
> +
> + return !(esr & ESR_ELx_S1PTW);
> +}
> +
> +/*
> + * Sanitize ESR_EL2 before KVM_EXIT_ARM_SEA. The general rule is to keep
> + * only the SEA-relevant bits that are useful for userspace and relevant to
> + * guest memory.
> + */
> +static inline u64 kvm_vcpu_sea_esr_sanitized(const struct kvm_vcpu *vcpu)
> +{
> + u64 esr = kvm_vcpu_get_esr(vcpu);
> + /*
> + * Starting with zero to hide the following bits:
> + * - HDBSSF: hardware dirty state is not guest memory.
> + * - TnD, TagAccess, AssuredOnly, Overlay, DirtyBit: they are
> + * for permission fault.
> + * - GCS: not guest memory.
> + * - Xs: it is for translation/access flag/permission fault.
> + * - ISV: it is 1 mostly for Translation fault, Access flag fault,
> + * or Permission fault. Only when FEAT_RAS is not implemented,
> + * it may be set to 1 (implementation defined) for S2PTW,
> + * which is not worth returning to userspace anyway.
> + * - ISS[23:14]: because ISV is already hidden.
> + * - VNCR: VNCR_EL2 is not guest memory.
> + */
> + u64 sanitized = 0ULL;
> +
> + /*
> + * Reasons to make these bits visible to userspace:
> + * - EC: tell if abort on instruction or data.
> + * - IL: useful if userspace decides to retire the instruction.
> + * - FSC: tell if abort on translation table walk.
> + * - SET: tell if abort is recoverable, uncontainable, or
> + * restartable.
> + * - S1PTW: userspace can tell the guest that its stage-1 has a problem.
> + * - FnV: userspace should avoid writing FAR_EL1 if FnV=1.
> + * - CM and WnR: make ESR "authentic" in general.
> + */
> + sanitized |= esr & (ESR_ELx_EC_MASK | ESR_ELx_IL | ESR_ELx_FSC |
> + ESR_ELx_SET_MASK | ESR_ELx_S1PTW | ESR_ELx_FnV |
> + ESR_ELx_CM | ESR_ELx_WNR);
> +
> + return sanitized;
> +}
> +
> +/* Return true if faulting guest virtual address during SEA is valid. */
> +static inline bool kvm_vcpu_sea_far_valid(const struct kvm_vcpu *vcpu)
> +{
> + return !(kvm_vcpu_get_esr(vcpu) & ESR_ELx_FnV);
> +}
> +
> +/* Return true if faulting guest physical address during SEA is valid. */
> +static inline bool kvm_vcpu_sea_ipa_valid(const struct kvm_vcpu *vcpu)
> +{
> + return vcpu->arch.fault.hpfar_el2 & HPFAR_EL2_NS;
> +}
> +
> static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
> {
> u64 esr = kvm_vcpu_get_esr(vcpu);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index d941abc6b5eef..4b27e988ec768 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -349,6 +349,14 @@ struct kvm_arch {
> #define KVM_ARCH_FLAG_GUEST_HAS_SVE 9
> /* MIDR_EL1, REVIDR_EL1, and AIDR_EL1 are writable from userspace */
> #define KVM_ARCH_FLAG_WRITABLE_IMP_ID_REGS 10
> + /*
> + * When APEI fails to claim a stage-2 synchronous external abort
> + * (SEA), return to userspace with fault information. Userspace
> + * can opt in to this feature if KVM_CAP_ARM_SEA_TO_USER is
> + * supported. Userspace is encouraged to handle this VM exit
> + * by injecting a SEA into the vCPU before resuming it.
> + */
> +#define KVM_ARCH_FLAG_RETURN_SEA_TO_USER 11
> unsigned long flags;
>
> /* VM-wide vCPU feature set */
> diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h
> index 9398ade632aaf..760a5e34489b1 100644
> --- a/arch/arm64/include/asm/kvm_ras.h
> +++ b/arch/arm64/include/asm/kvm_ras.h
> @@ -14,7 +14,7 @@
> * Was this synchronous external abort a RAS notification?
> * Returns '0' for errors handled by some RAS subsystem, or -ENOENT.
> */
> -static inline int kvm_handle_guest_sea(void)
> +static inline int kvm_delegate_guest_sea(void)
> {
> /* apei_claim_sea(NULL) expects to mask interrupts itself */
> lockdep_assert_irqs_enabled();
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 505d504b52b53..99e0c6c16e437 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -133,6 +133,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> }
> mutex_unlock(&kvm->lock);
> break;
> + case KVM_CAP_ARM_SEA_TO_USER:
> + r = 0;
> + set_bit(KVM_ARCH_FLAG_RETURN_SEA_TO_USER, &kvm->arch.flags);
> + break;
> default:
> break;
> }
> @@ -322,6 +326,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_IRQFD_RESAMPLE:
> case KVM_CAP_COUNTER_OFFSET:
> case KVM_CAP_ARM_WRITABLE_IMP_ID_REGS:
> + case KVM_CAP_ARM_SEA_TO_USER:
> r = 1;
> break;
> case KVM_CAP_SET_GUEST_DEBUG2:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index e445db2cb4a43..5a50d0ed76a68 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1775,6 +1775,53 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> read_unlock(&vcpu->kvm->mmu_lock);
> }
>
> +/* Handle stage-2 synchronous external abort (SEA). */
> +static int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_run *run = vcpu->run;
> +
> + /* Delegate to APEI for RAS; if it claims the SEA, resume the guest. */
> + if (kvm_delegate_guest_sea() == 0)
> + return 1;
> +
> + /*
> + * Besides the case where userspace has not opted in to
> + * KVM_ARCH_FLAG_RETURN_SEA_TO_USER, when the SEA was taken on memory
> + * backing a stage-2 page table, returning to userspace brings no
> + * benefit: eventually an EL2 exception will crash the host kernel.
> + */
> + if (!test_bit(KVM_ARCH_FLAG_RETURN_SEA_TO_USER,
> + &vcpu->kvm->arch.flags) ||
> + kvm_vcpu_sea_iss2ttw(vcpu)) {
> + /* Fallback behavior prior to KVM_EXIT_ARM_SEA. */
> + kvm_inject_vabt(vcpu);
> + return 1;
> + }
> +
> + /*
> + * Exit to userspace, and provide faulting guest virtual and physical
> + * addresses in case userspace wants to emulate SEA to guest by
> + * writing to FAR_EL1 and HPFAR_EL1 registers.
> + */
> + run->exit_reason = KVM_EXIT_ARM_SEA;
> + run->arm_sea.esr = kvm_vcpu_sea_esr_sanitized(vcpu);
> + run->arm_sea.flags = 0ULL;
> + run->arm_sea.gva = 0ULL;
> + run->arm_sea.gpa = 0ULL;
> +
> + if (kvm_vcpu_sea_far_valid(vcpu)) {
> + run->arm_sea.flags |= KVM_EXIT_ARM_SEA_FLAG_GVA_VALID;
> + run->arm_sea.gva = kvm_vcpu_get_hfar(vcpu);
> + }
> +
> + if (kvm_vcpu_sea_ipa_valid(vcpu)) {
> + run->arm_sea.flags |= KVM_EXIT_ARM_SEA_FLAG_GPA_VALID;
> + run->arm_sea.gpa = kvm_vcpu_get_fault_ipa(vcpu);
> + }
> +
> + return 0;
> +}
> +
> /**
> * kvm_handle_guest_abort - handles all 2nd stage aborts
> * @vcpu: the VCPU pointer
> @@ -1799,16 +1846,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> int ret, idx;
>
> /* Synchronous External Abort? */
> - if (kvm_vcpu_abt_issea(vcpu)) {
> - /*
> - * For RAS the host kernel may handle this abort.
> - * There is no need to pass the error into the guest.
> - */
> - if (kvm_handle_guest_sea())
> - kvm_inject_vabt(vcpu);
> -
> - return 1;
> - }
> + if (kvm_vcpu_abt_issea(vcpu))
> + return kvm_handle_guest_sea(vcpu);
>
> esr = kvm_vcpu_get_esr(vcpu);
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index c9d4a908976e8..4fed3fdfb13d6 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -178,6 +178,7 @@ struct kvm_xen_exit {
> #define KVM_EXIT_NOTIFY 37
> #define KVM_EXIT_LOONGARCH_IOCSR 38
> #define KVM_EXIT_MEMORY_FAULT 39
> +#define KVM_EXIT_ARM_SEA 40
>
> /* For KVM_EXIT_INTERNAL_ERROR */
> /* Emulate instruction failed. */
> @@ -446,6 +447,15 @@ struct kvm_run {
> __u64 gpa;
> __u64 size;
> } memory_fault;
> + /* KVM_EXIT_ARM_SEA */
> + struct {
> + __u64 esr;
> +#define KVM_EXIT_ARM_SEA_FLAG_GVA_VALID (1ULL << 0)
> +#define KVM_EXIT_ARM_SEA_FLAG_GPA_VALID (1ULL << 1)
> + __u64 flags;
> + __u64 gva;
> + __u64 gpa;
> + } arm_sea;
> /* Fix the size of the union. */
> char padding[256];
> };
> @@ -932,6 +942,7 @@ struct kvm_enable_cap {
> #define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
> #define KVM_CAP_ARM_EL2 240
> #define KVM_CAP_ARM_EL2_E2H0 241
> +#define KVM_CAP_ARM_SEA_TO_USER 242
>
> struct kvm_irq_routing_irqchip {
> __u32 irqchip;
> --
> 2.49.0.1266.g31b7d2e469-goog
>
Humbly ping for reviews / comments
* Re: [PATCH v2 1/6] KVM: arm64: VM exit to userspace to handle SEA
2025-06-04 5:08 ` [PATCH v2 1/6] KVM: arm64: VM exit to userspace to handle SEA Jiaqi Yan
2025-07-01 17:35 ` Jiaqi Yan
@ 2025-07-11 19:39 ` Oliver Upton
2025-07-11 23:59 ` Jiaqi Yan
1 sibling, 1 reply; 21+ messages in thread
From: Oliver Upton @ 2025-07-11 19:39 UTC (permalink / raw)
To: Jiaqi Yan
Cc: maz, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton
Hi Jiaqi,
On Wed, Jun 04, 2025 at 05:08:56AM +0000, Jiaqi Yan wrote:
> When APEI fails to handle a stage-2 synchronous external abort (SEA),
> today KVM directly injects an asynchronous SError into the vCPU and then
> resumes it, which usually results in an unpleasant guest kernel panic.
>
> One major cause of guest SEA is a vCPU consuming a recoverable
> uncorrected memory error (UER). Although the SError and the resulting
> guest kernel panic effectively stop the propagation of corrupted memory,
> there is room to recover from a UER in a more graceful manner.
>
> Alternatively, KVM can redirect the synchronous SEA event to the VMM to:
> - Reduce the blast radius if possible. The VMM can inject a SEA into the
>   vCPU via KVM's existing KVM_SET_VCPU_EVENTS API. If the memory poison
>   consumption or fault did not originate from the guest kernel, the
>   blast radius can be limited to the triggering thread in guest
>   userspace, so the VM can keep running.
> - Protect against future memory poison consumption by unmapping the page
>   from stage-2, or notify the guest about the poisoned page so the guest
>   kernel can unmap it from stage-1.
> - Track SEA events that VM customers care about, restart the VM when a
>   certain number of distinct poison events have occurred, and provide
>   observability to customers in a log management UI.
>
> Introduce a userspace-visible feature to enable the VMM to handle SEA:
> - KVM_CAP_ARM_SEA_TO_USER. As an alternative to the fallback behavior
>   when host APEI fails to claim a SEA, userspace can opt in to this new
>   capability to let KVM exit to userspace on a SEA, provided the abort
>   was not caused by an access to stage-2 translation table memory.
> - KVM_EXIT_ARM_SEA. A new exit reason introduced for this purpose.
>   KVM fills kvm_run.arm_sea with as much information as possible about
>   the SEA, enabling the VMM to emulate the SEA to the guest by itself:
>   - Sanitized ESR_EL2. The general rule is to keep only the bits
>     useful for userspace and relevant to guest memory. See the code
>     comments for why bits are hidden or reported.
>   - Flags indicating whether the faulting guest virtual and physical
>     addresses are valid.
>   - Faulting guest virtual address, if available.
>   - Faulting guest physical address, if available.
>
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
I was reviewing this locally and wound up making enough changes where it
just made more sense to share the diff. General comments:
- Avoid adding helpers to headers when they're used in a single
callsite / compilation unit
- Add some detail about FEAT_RAS where we may still exit to userspace
for host-controlled memory, as we cannot differentiate between a
stage-1 or stage-2 TTW SEA when taken on the descriptor PA
- Explicitly handle SEAs due to VNCR (I have a separate prereq patch)
From aac0bb8f90c43b5b17c3b4e50379cb8ca828812c Mon Sep 17 00:00:00 2001
From: Jiaqi Yan <jiaqiyan@google.com>
Date: Wed, 4 Jun 2025 05:08:56 +0000
Subject: [PATCH] KVM: arm64: VM exit to userspace to handle SEA
When APEI fails to handle a stage-2 synchronous external abort (SEA),
today KVM directly injects an asynchronous SError into the vCPU and then
resumes it, which usually results in an unpleasant guest kernel panic.

One major cause of guest SEA is a vCPU consuming a recoverable
uncorrected memory error (UER). Although the SError and the resulting
guest kernel panic effectively stop the propagation of corrupted memory,
there is room to recover from a UER in a more graceful manner.

Alternatively, KVM can redirect the synchronous SEA event to the VMM to:
- Reduce the blast radius if possible. The VMM can inject a SEA into the
  vCPU via KVM's existing KVM_SET_VCPU_EVENTS API. If the memory poison
  consumption or fault did not originate from the guest kernel, the
  blast radius can be limited to the triggering thread in guest
  userspace, so the VM can keep running.
- Protect against future memory poison consumption by unmapping the page
  from stage-2, or notify the guest about the poisoned page so the guest
  kernel can unmap it from stage-1.
- Track SEA events that VM customers care about, restart the VM when a
  certain number of distinct poison events have occurred, and provide
  observability to customers in a log management UI.

Introduce a userspace-visible feature to enable the VMM to handle SEA:
- KVM_CAP_ARM_SEA_TO_USER. As an alternative to the fallback behavior
  when host APEI fails to claim a SEA, userspace can opt in to this new
  capability to let KVM exit to userspace on a SEA, provided the abort
  was not caused by an access to stage-2 translation table memory.
- KVM_EXIT_ARM_SEA. A new exit reason introduced for this purpose.
  KVM fills kvm_run.arm_sea with as much information as possible about
  the SEA, enabling the VMM to emulate the SEA to the guest by itself:
  - Sanitized ESR_EL2. The general rule is to keep only the bits
    useful for userspace and relevant to guest memory. See the code
    comments for why bits are hidden or reported.
  - Flags indicating whether the faulting guest virtual and physical
    addresses are valid.
  - Faulting guest virtual address, if available.
  - Faulting guest physical address, if available.
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Link: https://lore.kernel.org/r/20250604050902.3944054-2-jiaqiyan@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
arch/arm64/include/asm/kvm_host.h | 2 +
arch/arm64/kvm/arm.c | 5 +++
arch/arm64/kvm/mmu.c | 67 ++++++++++++++++++++++++++++++-
include/uapi/linux/kvm.h | 10 +++++
4 files changed, 83 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e54d29feb469..98ce2d58ac8d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -349,6 +349,8 @@ struct kvm_arch {
#define KVM_ARCH_FLAG_GUEST_HAS_SVE 9
/* MIDR_EL1, REVIDR_EL1, and AIDR_EL1 are writable from userspace */
#define KVM_ARCH_FLAG_WRITABLE_IMP_ID_REGS 10
+ /* Unhandled SEAs are taken to userspace */
+#define KVM_ARCH_FLAG_EXIT_SEA 11
unsigned long flags;
/* VM-wide vCPU feature set */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7a1a8210ff91..aec6034db1e7 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -133,6 +133,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}
mutex_unlock(&kvm->lock);
break;
+ case KVM_CAP_ARM_SEA_TO_USER:
+ r = 0;
+ set_bit(KVM_ARCH_FLAG_EXIT_SEA, &kvm->arch.flags);
+ break;
default:
break;
}
@@ -322,6 +326,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_IRQFD_RESAMPLE:
case KVM_CAP_COUNTER_OFFSET:
case KVM_CAP_ARM_WRITABLE_IMP_ID_REGS:
+ case KVM_CAP_ARM_SEA_TO_USER:
r = 1;
break;
case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index a34924d75069..26b2e71994be 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1813,8 +1813,48 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
read_unlock(&vcpu->kvm->mmu_lock);
}
+/*
+ * Returns true if the SEA should be handled locally within KVM, i.e. when the
+ * abort was taken on a kernel memory allocation (e.g. stage-2 table memory).
+ */
+static bool host_owns_sea(struct kvm_vcpu *vcpu, u64 esr)
+{
+ /*
+ * Without FEAT_RAS HCR_EL2.TEA is RES0, meaning any external abort
+ * taken from a guest EL to EL2 is due to a host-imposed access (e.g.
+ * stage-2 PTW).
+ */
+ if (!cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
+ return true;
+
+ /* KVM owns the VNCR when the vCPU isn't in a nested context. */
+ if (is_hyp_ctxt(vcpu) && (esr & ESR_ELx_VNCR))
+ return true;
+
+ /*
+ * Determining if an external abort during a table walk happened at
+ * stage-2 is only possible when S1PTW is set. Otherwise, since KVM
+ * sets HCR_EL2.TEA, SEAs due to a stage-1 walk (i.e. accessing the PA
+ * of the stage-1 descriptor) can reach here and are reported with a
+ * TTW ESR value.
+ */
+ return esr_fsc_is_sea_ttw(esr) && (esr & ESR_ELx_S1PTW);
+}
+
int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
{
+ u64 esr = kvm_vcpu_get_esr(vcpu);
+ struct kvm_run *run = vcpu->run;
+ struct kvm *kvm = vcpu->kvm;
+ u64 esr_mask = ESR_ELx_EC_MASK |
+ ESR_ELx_FnV |
+ ESR_ELx_EA |
+ ESR_ELx_CM |
+ ESR_ELx_WNR |
+ ESR_ELx_FSC;
+ u64 ipa;
+
+
/*
* Give APEI the opportunity to claim the abort before handling it
* within KVM. apei_claim_sea() expects to be called with IRQs
@@ -1824,7 +1864,32 @@ int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
if (apei_claim_sea(NULL) == 0)
return 1;
- return kvm_inject_serror(vcpu);
+ if (host_owns_sea(vcpu, esr) || !test_bit(KVM_ARCH_FLAG_EXIT_SEA, &kvm->arch.flags))
+ return kvm_inject_serror(vcpu);
+
+ /* ESR_ELx.SET is RES0 when FEAT_RAS isn't implemented. */
+ if (kvm_has_ras(kvm))
+ esr_mask |= ESR_ELx_SET_MASK;
+
+ /*
+ * Exit to userspace, and provide faulting guest virtual and physical
+ * addresses in case userspace wants to emulate SEA to guest by
+ * writing to FAR_EL1 and HPFAR_EL1 registers.
+ */
+ memset(&run->arm_sea, 0, sizeof(run->arm_sea));
+ run->exit_reason = KVM_EXIT_ARM_SEA;
+ run->arm_sea.esr = esr & esr_mask;
+
+ if (!(esr & ESR_ELx_FnV))
+ run->arm_sea.gva = kvm_vcpu_get_hfar(vcpu);
+
+ ipa = kvm_vcpu_get_fault_ipa(vcpu);
+ if (ipa != INVALID_GPA) {
+ run->arm_sea.flags |= KVM_EXIT_ARM_SEA_FLAG_GPA_VALID;
+ run->arm_sea.gpa = ipa;
+ }
+
+ return 0;
}
/**
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index e4e566ff348b..b2cc3d74d769 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -179,6 +179,7 @@ struct kvm_xen_exit {
#define KVM_EXIT_LOONGARCH_IOCSR 38
#define KVM_EXIT_MEMORY_FAULT 39
#define KVM_EXIT_TDX 40
+#define KVM_EXIT_ARM_SEA 41
/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -469,6 +470,14 @@ struct kvm_run {
} get_tdvmcall_info;
};
} tdx;
+ /* KVM_EXIT_ARM_SEA */
+ struct {
+#define KVM_EXIT_ARM_SEA_FLAG_GPA_VALID (1ULL << 0)
+ __u64 flags;
+ __u64 esr;
+ __u64 gva;
+ __u64 gpa;
+ } arm_sea;
/* Fix the size of the union. */
char padding[256];
};
@@ -957,6 +966,7 @@ struct kvm_enable_cap {
#define KVM_CAP_ARM_EL2_E2H0 241
#define KVM_CAP_RISCV_MP_STATE_RESET 242
#define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243
+#define KVM_CAP_ARM_SEA_TO_USER 244
struct kvm_irq_routing_irqchip {
__u32 irqchip;
--
2.39.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
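[Editor's note: a VMM consuming the proposed KVM_EXIT_ARM_SEA exit might look
roughly like the sketch below. The struct layout and flag mirror the uapi hunk
in the patch above, but remain assumptions until merged; the page-isolation
policy is purely illustrative, not part of the patch.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the proposed kvm_run.arm_sea payload and flag (assumptions
 * until the uapi change lands). */
#define KVM_EXIT_ARM_SEA_FLAG_GPA_VALID (1ULL << 0)

struct arm_sea_exit {
	uint64_t flags;
	uint64_t esr;
	uint64_t gva;
	uint64_t gpa;
};

/*
 * One possible VMM policy: if KVM reported a valid GPA, compute the
 * 4K-aligned guest page base so the VMM can unmap/poison that page
 * before re-injecting a SEA via KVM_SET_VCPU_EVENTS. Without a valid
 * GPA the VMM can only inject.
 */
static bool sea_page_to_isolate(const struct arm_sea_exit *sea,
				uint64_t *page_base)
{
	if (!(sea->flags & KVM_EXIT_ARM_SEA_FLAG_GPA_VALID))
		return false;

	*page_base = sea->gpa & ~0xfffULL;
	return true;
}
```

In a real VMM this decision would sit in the vcpu run loop, keyed off
run->exit_reason == KVM_EXIT_ARM_SEA.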
* Re: [PATCH v2 1/6] KVM: arm64: VM exit to userspace to handle SEA
2025-07-11 19:39 ` Oliver Upton
@ 2025-07-11 23:59 ` Jiaqi Yan
2025-07-12 19:57 ` Oliver Upton
0 siblings, 1 reply; 21+ messages in thread
From: Jiaqi Yan @ 2025-07-11 23:59 UTC (permalink / raw)
To: Oliver Upton
Cc: maz, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton
On Fri, Jul 11, 2025 at 12:40 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> Hi Jiaqi,
>
> On Wed, Jun 04, 2025 at 05:08:56AM +0000, Jiaqi Yan wrote:
> > When APEI fails to handle a stage-2 synchronous external abort (SEA),
> > today KVM directly injects an async SError to the VCPU then resumes it,
> > which usually results in unpleasant guest kernel panic.
> >
> > One major situation of guest SEA is when vCPU consumes recoverable
> > uncorrected memory error (UER). Although SError and guest kernel panic
> > effectively stops the propagation of corrupted memory, there is room
> > to recover from an UER in a more graceful manner.
> >
> > Alternatively KVM can redirect the synchronous SEA event to VMM to
> > - Reduce blast radius if possible. VMM can inject a SEA to VCPU via
> > KVM's existing KVM_SET_VCPU_EVENTS API. If the memory poison
> > consumption or fault is not from guest kernel, blast radius can be
> > limited to the triggering thread in guest userspace, so VM can
> > keep running.
> > - VMM can protect from future memory poison consumption by unmapping
> > the page from stage-2, or interrupt guest of the poisoned guest page
> > so guest kernel can unmap it from stage-1.
> > - VMM can also track SEA events that VM customers care about, restart
> > VM when certain number of distinct poison events have happened,
> > provide observability to customers in log management UI.
> >
> > Introduce an userspace-visible feature to enable VMM to handle SEA:
> > - KVM_CAP_ARM_SEA_TO_USER. As the alternative fallback behavior
> > when host APEI fails to claim a SEA, userspace can opt in this new
> > capability to let KVM exit to userspace during SEA if it is not
> > caused by access on memory of stage-2 translation table.
> > - KVM_EXIT_ARM_SEA. A new exit reason is introduced for this.
> > KVM fills kvm_run.arm_sea with as much as possible information about
> > the SEA, enabling VMM to emulate SEA to guest by itself.
> > - Sanitized ESR_EL2. The general rule is to keep only the bits
> > useful for userspace and relevant to guest memory. See code
> > comments for why bits are hidden/reported.
> > - If faulting guest virtual and physical addresses are available.
> > - Faulting guest virtual address if available.
> > - Faulting guest physical address if available.
> >
> > Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
>
> I was reviewing this locally and wound up making enough changes where it
> just made more sense to share the diff. General comments:
Thanks for the diff, Oliver! I will work on a v3 based on it.
>
> - Avoid adding helpers to headers when they're used in a single
> callsite / compilation unit
>
> - Add some detail about FEAT_RAS where we may still exit to userspace
> for host-controlled memory, as we cannot differentiate between a
> stage-1 or stage-2 TTW SEA when taken on the descriptor PA
Ah, IIUC, you are saying that even if the FSC code tells us the fault is
on a TTW (esr_fsc_is_secc_ttw or esr_fsc_is_sea_ttw), it can be either
guest stage-1's or stage-2's descriptor PA, and we can't tell which is
which.
However, if ESR_ELx_S1PTW is set, we can tell this is the sub-case of
stage-2 descriptor PA: the descriptors are used for the stage-1 PTW,
but they live in stage-2 memory.
Is my current understanding right?
>
> - Explicitly handle SEAs due to VNCR (I have a separate prereq patch)
>
> From aac0bb8f90c43b5b17c3b4e50379cb8ca828812c Mon Sep 17 00:00:00 2001
> From: Jiaqi Yan <jiaqiyan@google.com>
> Date: Wed, 4 Jun 2025 05:08:56 +0000
> Subject: [PATCH] KVM: arm64: VM exit to userspace to handle SEA
>
> When APEI fails to handle a stage-2 synchronous external abort (SEA),
> today KVM directly injects an async SError to the VCPU then resumes it,
> which usually results in unpleasant guest kernel panic.
>
> One major situation of guest SEA is when vCPU consumes recoverable
> uncorrected memory error (UER). Although SError and guest kernel panic
> effectively stops the propagation of corrupted memory, there is room
> to recover from an UER in a more graceful manner.
>
> Alternatively KVM can redirect the synchronous SEA event to VMM to
> - Reduce blast radius if possible. VMM can inject a SEA to VCPU via
> KVM's existing KVM_SET_VCPU_EVENTS API. If the memory poison
> consumption or fault is not from guest kernel, blast radius can be
> limited to the triggering thread in guest userspace, so VM can
> keep running.
> - VMM can protect from future memory poison consumption by unmapping
> the page from stage-2, or interrupt guest of the poisoned guest page
> so guest kernel can unmap it from stage-1.
> - VMM can also track SEA events that VM customers care about, restart
> VM when certain number of distinct poison events have happened,
> provide observability to customers in log management UI.
>
> Introduce an userspace-visible feature to enable VMM to handle SEA:
> - KVM_CAP_ARM_SEA_TO_USER. As the alternative fallback behavior
> when host APEI fails to claim a SEA, userspace can opt in this new
> capability to let KVM exit to userspace during SEA if it is not
> caused by access on memory of stage-2 translation table.
> - KVM_EXIT_ARM_SEA. A new exit reason is introduced for this.
> KVM fills kvm_run.arm_sea with as much as possible information about
> the SEA, enabling VMM to emulate SEA to guest by itself.
> - Sanitized ESR_EL2. The general rule is to keep only the bits
> useful for userspace and relevant to guest memory. See code
> comments for why bits are hidden/reported.
> - If faulting guest virtual and physical addresses are available.
> - Faulting guest virtual address if available.
> - Faulting guest physical address if available.
>
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> Link: https://lore.kernel.org/r/20250604050902.3944054-2-jiaqiyan@google.com
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
> arch/arm64/include/asm/kvm_host.h | 2 +
> arch/arm64/kvm/arm.c | 5 +++
> arch/arm64/kvm/mmu.c | 67 ++++++++++++++++++++++++++++++-
> include/uapi/linux/kvm.h | 10 +++++
> 4 files changed, 83 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index e54d29feb469..98ce2d58ac8d 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -349,6 +349,8 @@ struct kvm_arch {
> #define KVM_ARCH_FLAG_GUEST_HAS_SVE 9
> /* MIDR_EL1, REVIDR_EL1, and AIDR_EL1 are writable from userspace */
> #define KVM_ARCH_FLAG_WRITABLE_IMP_ID_REGS 10
> + /* Unhandled SEAs are taken to userspace */
> +#define KVM_ARCH_FLAG_EXIT_SEA 11
> unsigned long flags;
>
> /* VM-wide vCPU feature set */
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 7a1a8210ff91..aec6034db1e7 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -133,6 +133,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> }
> mutex_unlock(&kvm->lock);
> break;
> + case KVM_CAP_ARM_SEA_TO_USER:
> + r = 0;
> + set_bit(KVM_ARCH_FLAG_EXIT_SEA, &kvm->arch.flags);
> + break;
> default:
> break;
> }
> @@ -322,6 +326,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_IRQFD_RESAMPLE:
> case KVM_CAP_COUNTER_OFFSET:
> case KVM_CAP_ARM_WRITABLE_IMP_ID_REGS:
> + case KVM_CAP_ARM_SEA_TO_USER:
> r = 1;
> break;
> case KVM_CAP_SET_GUEST_DEBUG2:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index a34924d75069..26b2e71994be 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1813,8 +1813,48 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> read_unlock(&vcpu->kvm->mmu_lock);
> }
>
> +/*
> + * Returns true if the SEA should be handled locally within KVM if the abort is
> + * caused by a kernel memory allocation (e.g. stage-2 table memory).
> + */
> +static bool host_owns_sea(struct kvm_vcpu *vcpu, u64 esr)
> +{
> + /*
> + * Without FEAT_RAS HCR_EL2.TEA is RES0, meaning any external abort
> + * taken from a guest EL to EL2 is due to a host-imposed access (e.g.
> + * stage-2 PTW).
> + */
> + if (!cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
> + return true;
> +
> + /* KVM owns the VNCR when the vCPU isn't in a nested context. */
> + if (is_hyp_ctxt(vcpu) && (esr & ESR_ELx_VNCR))
> + return true;
> +
> + /*
> + * Determining if an external abort during a table walk happened at
> + * stage-2 is only possible with S1PTW is set. Otherwise, since KVM
> + * sets HCR_EL2.TEA, SEAs due to a stage-1 walk (i.e. accessing the PA
> + * of the stage-1 descriptor) can reach here and are reported with a
> + * TTW ESR value.
> + */
> + return esr_fsc_is_sea_ttw(esr) && (esr & ESR_ELx_S1PTW);
Should we include esr_fsc_is_secc_ttw? like
(esr_fsc_is_sea_ttw(esr) || esr_fsc_is_secc_ttw(esr)) && (esr & ESR_ELx_S1PTW)
> +}
> +
> int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> {
> + u64 esr = kvm_vcpu_get_esr(vcpu);
> + struct kvm_run *run = vcpu->run;
> + struct kvm *kvm = vcpu->kvm;
> + u64 esr_mask = ESR_ELx_EC_MASK |
> + ESR_ELx_FnV |
> + ESR_ELx_EA |
> + ESR_ELx_CM |
> + ESR_ELx_WNR |
> + ESR_ELx_FSC;
Do you exclude ESR_ELx_IL on purpose (and if so, why)?
BTW, if my previous statement about TTW SEA is correct, then I also
understand why we need to explicitly exclude ESR_ELx_S1PTW.
> + u64 ipa;
> +
> +
> /*
> * Give APEI the opportunity to claim the abort before handling it
> * within KVM. apei_claim_sea() expects to be called with IRQs
> @@ -1824,7 +1864,32 @@ int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> if (apei_claim_sea(NULL) == 0)
I assume KVM should still lockdep_assert_irqs_enabled(), right? That
is, a WARN_ON_ONCE would still be useful, just in case?
> return 1;
>
> - return kvm_inject_serror(vcpu);
> + if (host_owns_sea(vcpu, esr) || !test_bit(KVM_ARCH_FLAG_EXIT_SEA, &kvm->arch.flags))
> + return kvm_inject_serror(vcpu);
> +
> + /* ESR_ELx.SET is RES0 when FEAT_RAS isn't implemented. */
> + if (kvm_has_ras(kvm))
> + esr_mask |= ESR_ELx_SET_MASK;
> +
> + /*
> + * Exit to userspace, and provide faulting guest virtual and physical
> + * addresses in case userspace wants to emulate SEA to guest by
> + * writing to FAR_EL1 and HPFAR_EL1 registers.
> + */
> + memset(&run->arm_sea, 0, sizeof(run->arm_sea));
> + run->exit_reason = KVM_EXIT_ARM_SEA;
> + run->arm_sea.esr = esr & esr_mask;
> +
> + if (!(esr & ESR_ELx_FnV))
> + run->arm_sea.gva = kvm_vcpu_get_hfar(vcpu);
> +
> + ipa = kvm_vcpu_get_fault_ipa(vcpu);
> + if (ipa != INVALID_GPA) {
> + run->arm_sea.flags |= KVM_EXIT_ARM_SEA_FLAG_GPA_VALID;
> + run->arm_sea.gpa = ipa;
> + }
> +
> + return 0;
> }
>
> /**
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index e4e566ff348b..b2cc3d74d769 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -179,6 +179,7 @@ struct kvm_xen_exit {
> #define KVM_EXIT_LOONGARCH_IOCSR 38
> #define KVM_EXIT_MEMORY_FAULT 39
> #define KVM_EXIT_TDX 40
> +#define KVM_EXIT_ARM_SEA 41
>
> /* For KVM_EXIT_INTERNAL_ERROR */
> /* Emulate instruction failed. */
> @@ -469,6 +470,14 @@ struct kvm_run {
> } get_tdvmcall_info;
> };
> } tdx;
> + /* KVM_EXIT_ARM_SEA */
> + struct {
> +#define KVM_EXIT_ARM_SEA_FLAG_GPA_VALID (1ULL << 0)
> + __u64 flags;
> + __u64 esr;
> + __u64 gva;
> + __u64 gpa;
> + } arm_sea;
> /* Fix the size of the union. */
> char padding[256];
> };
> @@ -957,6 +966,7 @@ struct kvm_enable_cap {
> #define KVM_CAP_ARM_EL2_E2H0 241
> #define KVM_CAP_RISCV_MP_STATE_RESET 242
> #define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243
> +#define KVM_CAP_ARM_SEA_TO_USER 244
>
> struct kvm_irq_routing_irqchip {
> __u32 irqchip;
> --
> 2.39.5
^ permalink raw reply [flat|nested] 21+ messages in thread
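[Editor's note: the ownership rule being discussed above — once FEAT_RAS is
present, only a TTW abort with S1PTW set is provably an access to host-owned
stage-2 memory — can be sketched as a small predicate. Bit positions follow
the architectural ESR_ELx layout; the helper names are illustrative, not
kernel API.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural ESR_ELx bits (data abort ISS encoding). */
#define ESR_S1PTW	(1ULL << 7)
#define ESR_FSC		0x3fULL

/* SEA-on-TTW DFSC codes: 0x14..0x17 for walk levels 0-3 (FEAT_LPA2
 * adds 0x13 for level -1). */
static bool fsc_is_sea_ttw(uint64_t esr)
{
	uint64_t fsc = esr & ESR_FSC;

	return fsc >= 0x13 && fsc <= 0x17;
}

static bool host_owns_sea_sketch(bool has_ras, uint64_t esr)
{
	/*
	 * Without FEAT_RAS, HCR_EL2.TEA is RES0, so any external abort
	 * taken to EL2 is due to a host-imposed access.
	 */
	if (!has_ras)
		return true;

	/*
	 * A TTW abort with S1PTW set hit a stage-2 (host-owned)
	 * descriptor; a stage-1 TTW abort is reported identically
	 * except S1PTW is clear, so it must go to userspace.
	 */
	return fsc_is_sea_ttw(esr) && (esr & ESR_S1PTW);
}
```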
* Re: [PATCH v2 1/6] KVM: arm64: VM exit to userspace to handle SEA
2025-07-11 23:59 ` Jiaqi Yan
@ 2025-07-12 19:57 ` Oliver Upton
2025-07-19 21:24 ` Jiaqi Yan
0 siblings, 1 reply; 21+ messages in thread
From: Oliver Upton @ 2025-07-12 19:57 UTC (permalink / raw)
To: Jiaqi Yan
Cc: maz, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton
On Fri, Jul 11, 2025 at 04:59:11PM -0700, Jiaqi Yan wrote:
> > - Add some detail about FEAT_RAS where we may still exit to userspace
> > for host-controlled memory, as we cannot differentiate between a
> > stage-1 or stage-2 TTW SEA when taken on the descriptor PA
>
> Ah, IIUC, you are saying even if the FSC code tells fault is on TTW
> (esr_fsc_is_secc_ttw or esr_fsc_is_sea_ttw), it can either be guest
> stage-1's or stage-2's descriptor PA, and we can tell which from
> which.
>
> However, if ESR_ELx_S1PTW is set, we can tell this is a sub-case of
> stage-2 descriptor PA, their usage is for stage-1 PTW but they are
> stage-2 memory.
>
> Is my current understanding right?
Yep, that's exactly what I'm getting at. As you note, stage-2 aborts
during a stage-1 walk are sufficiently described, but not much else.
> > +/*
> > + * Returns true if the SEA should be handled locally within KVM if the abort is
> > + * caused by a kernel memory allocation (e.g. stage-2 table memory).
> > + */
> > +static bool host_owns_sea(struct kvm_vcpu *vcpu, u64 esr)
> > +{
> > + /*
> > + * Without FEAT_RAS HCR_EL2.TEA is RES0, meaning any external abort
> > + * taken from a guest EL to EL2 is due to a host-imposed access (e.g.
> > + * stage-2 PTW).
> > + */
> > + if (!cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
> > + return true;
> > +
> > + /* KVM owns the VNCR when the vCPU isn't in a nested context. */
> > + if (is_hyp_ctxt(vcpu) && (esr & ESR_ELx_VNCR))
> > + return true;
> > +
> > + /*
> > + * Determining if an external abort during a table walk happened at
> > + * stage-2 is only possible with S1PTW is set. Otherwise, since KVM
> > + * sets HCR_EL2.TEA, SEAs due to a stage-1 walk (i.e. accessing the PA
> > + * of the stage-1 descriptor) can reach here and are reported with a
> > + * TTW ESR value.
> > + */
> > + return esr_fsc_is_sea_ttw(esr) && (esr & ESR_ELx_S1PTW);
>
> Should we include esr_fsc_is_secc_ttw? like
> (esr_fsc_is_sea_ttw(esr) || esr_fsc_is_secc_ttw(esr)) && (esr & ESR_ELx_S1PTW)
Parity / ECC errors are not permitted if FEAT_RAS is implemented (which
is tested for up front).
> > +}
> > +
> > int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> > {
> > + u64 esr = kvm_vcpu_get_esr(vcpu);
> > + struct kvm_run *run = vcpu->run;
> > + struct kvm *kvm = vcpu->kvm;
> > + u64 esr_mask = ESR_ELx_EC_MASK |
> > + ESR_ELx_FnV |
> > + ESR_ELx_EA |
> > + ESR_ELx_CM |
> > + ESR_ELx_WNR |
> > + ESR_ELx_FSC;
>
> Do you (and why) exclude ESR_ELx_IL on purpose?
Unintended :)
> BTW, if my previous statement about TTW SEA is correct, then I also
> understand why we need to explicitly exclude ESR_ELx_S1PTW.
Right, we shouldn't be exposing genuine stage-2 external aborts to userspace.
> > + u64 ipa;
> > +
> > +
> > /*
> > * Give APEI the opportunity to claim the abort before handling it
> > * within KVM. apei_claim_sea() expects to be called with IRQs
> > @@ -1824,7 +1864,32 @@ int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> > if (apei_claim_sea(NULL) == 0)
>
> I assume kvm should still lockdep_assert_irqs_enabled(), right? That
> is, a WARN_ON_ONCE is still useful in case?
Ah, this is diffed against my VNCR prereq, which has this context. Yes, I
want to preserve the lockdep assertion.
From eb63dbf07b3d1f42b059f5c94abd147d195299c8 Mon Sep 17 00:00:00 2001
From: Oliver Upton <oliver.upton@linux.dev>
Date: Thu, 10 Jul 2025 17:14:51 -0700
Subject: [PATCH] KVM: arm64: nv: Handle SEAs due to VNCR redirection
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
arch/arm64/include/asm/kvm_mmu.h | 1 +
arch/arm64/include/asm/kvm_ras.h | 25 -------------------------
arch/arm64/kvm/mmu.c | 30 ++++++++++++++++++------------
arch/arm64/kvm/nested.c | 3 +++
4 files changed, 22 insertions(+), 37 deletions(-)
delete mode 100644 arch/arm64/include/asm/kvm_ras.h
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index ae563ebd6aee..e4069f2ce642 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -180,6 +180,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
phys_addr_t pa, unsigned long size, bool writable);
+int kvm_handle_guest_sea(struct kvm_vcpu *vcpu);
int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
phys_addr_t kvm_mmu_get_httbr(void);
diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h
deleted file mode 100644
index 9398ade632aa..000000000000
--- a/arch/arm64/include/asm/kvm_ras.h
+++ /dev/null
@@ -1,25 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/* Copyright (C) 2018 - Arm Ltd */
-
-#ifndef __ARM64_KVM_RAS_H__
-#define __ARM64_KVM_RAS_H__
-
-#include <linux/acpi.h>
-#include <linux/errno.h>
-#include <linux/types.h>
-
-#include <asm/acpi.h>
-
-/*
- * Was this synchronous external abort a RAS notification?
- * Returns '0' for errors handled by some RAS subsystem, or -ENOENT.
- */
-static inline int kvm_handle_guest_sea(void)
-{
- /* apei_claim_sea(NULL) expects to mask interrupts itself */
- lockdep_assert_irqs_enabled();
-
- return apei_claim_sea(NULL);
-}
-
-#endif /* __ARM64_KVM_RAS_H__ */
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1c78864767c5..6934f4acdc45 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -4,19 +4,20 @@
* Author: Christoffer Dall <c.dall@virtualopensystems.com>
*/
+#include <linux/acpi.h>
#include <linux/mman.h>
#include <linux/kvm_host.h>
#include <linux/io.h>
#include <linux/hugetlb.h>
#include <linux/sched/signal.h>
#include <trace/events/kvm.h>
+#include <asm/acpi.h>
#include <asm/pgalloc.h>
#include <asm/cacheflush.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_mmu.h>
#include <asm/kvm_pgtable.h>
#include <asm/kvm_pkvm.h>
-#include <asm/kvm_ras.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_emulate.h>
#include <asm/virt.h>
@@ -1811,6 +1812,20 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
read_unlock(&vcpu->kvm->mmu_lock);
}
+int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
+{
+ /*
+ * Give APEI the opportunity to claim the abort before handling it
+ * within KVM. apei_claim_sea() expects to be called with IRQs
+ * enabled.
+ */
+ lockdep_assert_irqs_enabled();
+ if (apei_claim_sea(NULL) == 0)
+ return 1;
+
+ return kvm_inject_serror(vcpu);
+}
+
/**
* kvm_handle_guest_abort - handles all 2nd stage aborts
* @vcpu: the VCPU pointer
@@ -1834,17 +1849,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
gfn_t gfn;
int ret, idx;
- /* Synchronous External Abort? */
- if (kvm_vcpu_abt_issea(vcpu)) {
- /*
- * For RAS the host kernel may handle this abort.
- * There is no need to pass the error into the guest.
- */
- if (kvm_handle_guest_sea())
- return kvm_inject_serror(vcpu);
-
- return 1;
- }
+ if (kvm_vcpu_abt_issea(vcpu))
+ return kvm_handle_guest_sea(vcpu);
esr = kvm_vcpu_get_esr(vcpu);
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 096747a61bf6..38b0e3a9a6db 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -1289,6 +1289,9 @@ int kvm_handle_vncr_abort(struct kvm_vcpu *vcpu)
BUG_ON(!(esr & ESR_ELx_VNCR_SHIFT));
+ if (kvm_vcpu_abt_issea(vcpu))
+ return kvm_handle_guest_sea(vcpu);
+
if (esr_fsc_is_permission_fault(esr)) {
inject_vncr_perm(vcpu);
} else if (esr_fsc_is_translation_fault(esr)) {
--
2.39.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
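[Editor's note: folding in the corrections agreed above — ESR_ELx_IL is kept
(its omission was unintended), ESR_ELx_S1PTW stays hidden, and SET is only
reported with FEAT_RAS — the v3 sanitization mask would presumably look like
the sketch below. Bit positions follow the architectural ESR_ELx layout; this
is an assumption about v3, not merged code.]

```c
#include <stdint.h>
#include <stdbool.h>

/* Architectural ESR_ELx bit positions (data abort ISS encoding). */
#define ESR_EC_MASK	(0x3fULL << 26)
#define ESR_IL		(1ULL << 25)
#define ESR_SET_MASK	(0x3ULL << 11)
#define ESR_FNV		(1ULL << 10)
#define ESR_EA		(1ULL << 9)
#define ESR_CM		(1ULL << 8)
#define ESR_S1PTW	(1ULL << 7)
#define ESR_WNR		(1ULL << 6)
#define ESR_FSC		0x3fULL

/*
 * Sketch of the sanitized-ESR mask: keep only bits useful to
 * userspace. SET is RES0 without FEAT_RAS, so it is reported
 * conditionally; S1PTW is never exposed because genuine stage-2
 * aborts are handled in the kernel.
 */
static uint64_t sea_esr_mask(bool has_ras)
{
	uint64_t mask = ESR_EC_MASK | ESR_IL | ESR_FNV | ESR_EA |
			ESR_CM | ESR_WNR | ESR_FSC;

	if (has_ras)
		mask |= ESR_SET_MASK;
	return mask;
}
```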
* Re: [PATCH v2 1/6] KVM: arm64: VM exit to userspace to handle SEA
2025-07-12 19:57 ` Oliver Upton
@ 2025-07-19 21:24 ` Jiaqi Yan
2025-07-25 22:54 ` Jiaqi Yan
0 siblings, 1 reply; 21+ messages in thread
From: Jiaqi Yan @ 2025-07-19 21:24 UTC (permalink / raw)
To: Oliver Upton
Cc: maz, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton
On Sat, Jul 12, 2025 at 12:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Fri, Jul 11, 2025 at 04:59:11PM -0700, Jiaqi Yan wrote:
> > > - Add some detail about FEAT_RAS where we may still exit to userspace
> > > for host-controlled memory, as we cannot differentiate between a
> > > stage-1 or stage-2 TTW SEA when taken on the descriptor PA
> >
> > Ah, IIUC, you are saying even if the FSC code tells fault is on TTW
> > (esr_fsc_is_secc_ttw or esr_fsc_is_sea_ttw), it can either be guest
> > stage-1's or stage-2's descriptor PA, and we can tell which from
> > which.
> >
> > However, if ESR_ELx_S1PTW is set, we can tell this is a sub-case of
> > stage-2 descriptor PA, their usage is for stage-1 PTW but they are
> > stage-2 memory.
> >
> > Is my current understanding right?
>
> Yep, that's exactly what I'm getting at. As you note, stage-2 aborts
> during a stage-1 walk are sufficiently described, but not much else.
Got it, thanks!
>
> > > +/*
> > > + * Returns true if the SEA should be handled locally within KVM if the abort is
> > > + * caused by a kernel memory allocation (e.g. stage-2 table memory).
> > > + */
> > > +static bool host_owns_sea(struct kvm_vcpu *vcpu, u64 esr)
> > > +{
> > > + /*
> > > + * Without FEAT_RAS HCR_EL2.TEA is RES0, meaning any external abort
> > > + * taken from a guest EL to EL2 is due to a host-imposed access (e.g.
> > > + * stage-2 PTW).
> > > + */
> > > + if (!cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
> > > + return true;
> > > +
> > > + /* KVM owns the VNCR when the vCPU isn't in a nested context. */
> > > + if (is_hyp_ctxt(vcpu) && (esr & ESR_ELx_VNCR))
> > > + return true;
> > > +
> > > + /*
> > > + * Determining if an external abort during a table walk happened at
> > > + * stage-2 is only possible with S1PTW is set. Otherwise, since KVM
> > > + * sets HCR_EL2.TEA, SEAs due to a stage-1 walk (i.e. accessing the PA
> > > + * of the stage-1 descriptor) can reach here and are reported with a
> > > + * TTW ESR value.
> > > + */
> > > + return esr_fsc_is_sea_ttw(esr) && (esr & ESR_ELx_S1PTW);
> >
> > Should we include esr_fsc_is_secc_ttw? like
> > (esr_fsc_is_sea_ttw(esr) || esr_fsc_is_secc_ttw(esr)) && (esr & ESR_ELx_S1PTW)
>
> Parity / ECC errors are not permitted if FEAT_RAS is implemented (which
> is tested for up front).
Ah, thanks for pointing this out.
>
> > > +}
> > > +
> > > int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> > > {
> > > + u64 esr = kvm_vcpu_get_esr(vcpu);
> > > + struct kvm_run *run = vcpu->run;
> > > + struct kvm *kvm = vcpu->kvm;
> > > + u64 esr_mask = ESR_ELx_EC_MASK |
> > > + ESR_ELx_FnV |
> > > + ESR_ELx_EA |
> > > + ESR_ELx_CM |
> > > + ESR_ELx_WNR |
> > > + ESR_ELx_FSC;
> >
> > Do you (and why) exclude ESR_ELx_IL on purpose?
>
> Unintended :)
Will add into my patch.
>
> > BTW, if my previous statement about TTW SEA is correct, then I also
> > understand why we need to explicitly exclude ESR_ELx_S1PTW.
>
> Right, we shouldn't be exposing genuine stage-2 external aborts to userspace.
>
> > > + u64 ipa;
> > > +
> > > +
> > > /*
> > > * Give APEI the opportunity to claim the abort before handling it
> > > * within KVM. apei_claim_sea() expects to be called with IRQs
> > > @@ -1824,7 +1864,32 @@ int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> > > if (apei_claim_sea(NULL) == 0)
> >
> > I assume kvm should still lockdep_assert_irqs_enabled(), right? That
> > is, a WARN_ON_ONCE is still useful in case?
>
> Ah, this is diffed against my VNCR prefix which has this context. Yes, I
> want to preserve the lockdep assertion.
Thanks for sharing the patch! Should I wait for you to send it and queue
it to kvmarm/next, then rebase my v3 on it? Or should I include it in my
v3 series with you as the commit author and your Signed-off-by?
BTW, while working on v3, I think it is probably better to split the
current patchset into two: the first for KVM_EXIT_ARM_SEA, and the
second for injecting (D|I)ABT with a user-supplied ESR. This should help
KVM_EXIT_ARM_SEA, the more important feature, get reviewed and accepted
sooner. I will send out a separate patchset for enhancing guest SEA
injection.
>
>
> From eb63dbf07b3d1f42b059f5c94abd147d195299c8 Mon Sep 17 00:00:00 2001
> From: Oliver Upton <oliver.upton@linux.dev>
> Date: Thu, 10 Jul 2025 17:14:51 -0700
> Subject: [PATCH] KVM: arm64: nv: Handle SEAs due to VNCR redirection
>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
> arch/arm64/include/asm/kvm_mmu.h | 1 +
> arch/arm64/include/asm/kvm_ras.h | 25 -------------------------
> arch/arm64/kvm/mmu.c | 30 ++++++++++++++++++------------
> arch/arm64/kvm/nested.c | 3 +++
> 4 files changed, 22 insertions(+), 37 deletions(-)
> delete mode 100644 arch/arm64/include/asm/kvm_ras.h
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index ae563ebd6aee..e4069f2ce642 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -180,6 +180,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
> int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> phys_addr_t pa, unsigned long size, bool writable);
>
> +int kvm_handle_guest_sea(struct kvm_vcpu *vcpu);
> int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
>
> phys_addr_t kvm_mmu_get_httbr(void);
> diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h
> deleted file mode 100644
> index 9398ade632aa..000000000000
> --- a/arch/arm64/include/asm/kvm_ras.h
> +++ /dev/null
> @@ -1,25 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -/* Copyright (C) 2018 - Arm Ltd */
> -
> -#ifndef __ARM64_KVM_RAS_H__
> -#define __ARM64_KVM_RAS_H__
> -
> -#include <linux/acpi.h>
> -#include <linux/errno.h>
> -#include <linux/types.h>
> -
> -#include <asm/acpi.h>
> -
> -/*
> - * Was this synchronous external abort a RAS notification?
> - * Returns '0' for errors handled by some RAS subsystem, or -ENOENT.
> - */
> -static inline int kvm_handle_guest_sea(void)
> -{
> - /* apei_claim_sea(NULL) expects to mask interrupts itself */
> - lockdep_assert_irqs_enabled();
> -
> - return apei_claim_sea(NULL);
> -}
> -
> -#endif /* __ARM64_KVM_RAS_H__ */
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 1c78864767c5..6934f4acdc45 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -4,19 +4,20 @@
> * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> */
>
> +#include <linux/acpi.h>
> #include <linux/mman.h>
> #include <linux/kvm_host.h>
> #include <linux/io.h>
> #include <linux/hugetlb.h>
> #include <linux/sched/signal.h>
> #include <trace/events/kvm.h>
> +#include <asm/acpi.h>
> #include <asm/pgalloc.h>
> #include <asm/cacheflush.h>
> #include <asm/kvm_arm.h>
> #include <asm/kvm_mmu.h>
> #include <asm/kvm_pgtable.h>
> #include <asm/kvm_pkvm.h>
> -#include <asm/kvm_ras.h>
> #include <asm/kvm_asm.h>
> #include <asm/kvm_emulate.h>
> #include <asm/virt.h>
> @@ -1811,6 +1812,20 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> read_unlock(&vcpu->kvm->mmu_lock);
> }
>
> +int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> +{
> + /*
> + * Give APEI the opportunity to claim the abort before handling it
> + * within KVM. apei_claim_sea() expects to be called with IRQs
> + * enabled.
> + */
> + lockdep_assert_irqs_enabled();
> + if (apei_claim_sea(NULL) == 0)
> + return 1;
> +
> + return kvm_inject_serror(vcpu);
> +}
> +
> /**
> * kvm_handle_guest_abort - handles all 2nd stage aborts
> * @vcpu: the VCPU pointer
> @@ -1834,17 +1849,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> gfn_t gfn;
> int ret, idx;
>
> - /* Synchronous External Abort? */
> - if (kvm_vcpu_abt_issea(vcpu)) {
> - /*
> - * For RAS the host kernel may handle this abort.
> - * There is no need to pass the error into the guest.
> - */
> - if (kvm_handle_guest_sea())
> - return kvm_inject_serror(vcpu);
> -
> - return 1;
> - }
> + if (kvm_vcpu_abt_issea(vcpu))
> + return kvm_handle_guest_sea(vcpu);
>
> esr = kvm_vcpu_get_esr(vcpu);
>
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 096747a61bf6..38b0e3a9a6db 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -1289,6 +1289,9 @@ int kvm_handle_vncr_abort(struct kvm_vcpu *vcpu)
>
> BUG_ON(!(esr & ESR_ELx_VNCR_SHIFT));
>
> + if (kvm_vcpu_abt_issea(vcpu))
> + return kvm_handle_guest_sea(vcpu);
> +
> if (esr_fsc_is_permission_fault(esr)) {
> inject_vncr_perm(vcpu);
> } else if (esr_fsc_is_translation_fault(esr)) {
> --
> 2.39.5
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/6] KVM: arm64: VM exit to userspace to handle SEA
2025-07-19 21:24 ` Jiaqi Yan
@ 2025-07-25 22:54 ` Jiaqi Yan
2025-07-29 21:28 ` Oliver Upton
0 siblings, 1 reply; 21+ messages in thread
From: Jiaqi Yan @ 2025-07-25 22:54 UTC (permalink / raw)
To: Oliver Upton
Cc: maz, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton
On Sat, Jul 19, 2025 at 2:24 PM Jiaqi Yan <jiaqiyan@google.com> wrote:
>
> On Sat, Jul 12, 2025 at 12:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > On Fri, Jul 11, 2025 at 04:59:11PM -0700, Jiaqi Yan wrote:
> > > > - Add some detail about FEAT_RAS where we may still exit to userspace
> > > > for host-controlled memory, as we cannot differentiate between a
> > > > stage-1 or stage-2 TTW SEA when taken on the descriptor PA
> > >
> > > Ah, IIUC, you are saying even if the FSC code tells fault is on TTW
> > > (esr_fsc_is_secc_ttw or esr_fsc_is_sea_ttw), it can either be guest
> > > stage-1's or stage-2's descriptor PA, and we can tell which from
> > > which.
> > >
> > > However, if ESR_ELx_S1PTW is set, we can tell this is a sub-case of
> > > stage-2 descriptor PA, their usage is for stage-1 PTW but they are
> > > stage-2 memory.
> > >
> > > Is my current understanding right?
> >
> > Yep, that's exactly what I'm getting at. As you note, stage-2 aborts
> > during a stage-1 walk are sufficiently described, but not much else.
>
> Got it, thanks!
>
> >
> > > > +/*
> > > > + * Returns true if the SEA should be handled locally within KVM if the abort is
> > > > + * caused by a kernel memory allocation (e.g. stage-2 table memory).
> > > > + */
> > > > +static bool host_owns_sea(struct kvm_vcpu *vcpu, u64 esr)
> > > > +{
> > > > + /*
> > > > + * Without FEAT_RAS HCR_EL2.TEA is RES0, meaning any external abort
> > > > + * taken from a guest EL to EL2 is due to a host-imposed access (e.g.
> > > > + * stage-2 PTW).
> > > > + */
> > > > + if (!cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
> > > > + return true;
> > > > +
> > > > + /* KVM owns the VNCR when the vCPU isn't in a nested context. */
> > > > + if (is_hyp_ctxt(vcpu) && (esr & ESR_ELx_VNCR))
> > > > + return true;
> > > > +
> > > > + /*
> > > > + * Determining if an external abort during a table walk happened at
> > > > + * stage-2 is only possible with S1PTW is set. Otherwise, since KVM
> > > > + * sets HCR_EL2.TEA, SEAs due to a stage-1 walk (i.e. accessing the PA
> > > > + * of the stage-1 descriptor) can reach here and are reported with a
> > > > + * TTW ESR value.
> > > > + */
> > > > + return esr_fsc_is_sea_ttw(esr) && (esr & ESR_ELx_S1PTW);
> > >
> > > Should we include esr_fsc_is_secc_ttw? like
> > > (esr_fsc_is_sea_ttw(esr) || esr_fsc_is_secc_ttw(esr)) && (esr & ESR_ELx_S1PTW)
> >
> > Parity / ECC errors are not permitted if FEAT_RAS is implemented (which
> > is tested for up front).
>
> Ah, thanks for pointing this out.
>
> >
> > > > +}
> > > > +
> > > > int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> > > > {
> > > > + u64 esr = kvm_vcpu_get_esr(vcpu);
> > > > + struct kvm_run *run = vcpu->run;
> > > > + struct kvm *kvm = vcpu->kvm;
> > > > + u64 esr_mask = ESR_ELx_EC_MASK |
> > > > + ESR_ELx_FnV |
> > > > + ESR_ELx_EA |
> > > > + ESR_ELx_CM |
> > > > + ESR_ELx_WNR |
> > > > + ESR_ELx_FSC;
> > >
> > > Do you (and why) exclude ESR_ELx_IL on purpose?
> >
> > Unintended :)
>
> Will add into my patch.
>
> >
> > > BTW, if my previous statement about TTW SEA is correct, then I also
> > > understand why we need to explicitly exclude ESR_ELx_S1PTW.
> >
> > Right, we shouldn't be exposing genuine stage-2 external aborts to userspace.
> >
> > > > + u64 ipa;
> > > > +
> > > > +
> > > > /*
> > > > * Give APEI the opportunity to claim the abort before handling it
> > > > * within KVM. apei_claim_sea() expects to be called with IRQs
> > > > @@ -1824,7 +1864,32 @@ int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> > > > if (apei_claim_sea(NULL) == 0)
> > >
> > > I assume kvm should still lockdep_assert_irqs_enabled(), right? That
> > > is, a WARN_ON_ONCE is still useful in case?
> >
> > Ah, this is diffed against my VNCR prefix which has this context. Yes, I
> > want to preserve the lockdep assertion.
>
> Thanks for sharing the patch! Should I wait for you to send and queue
> to kvmarm/next and rebase my v3 to it? Or should I insert it into my
> v3 patch series with you as the commit author, and Signed-off-by you?
Friendly ping on this question: my v3 is ready, but I want to confirm
the best option here.
Recently we found that even the newer arm64 platforms used by our org
have to rely on KVM to handle SEA more gracefully (lacking support
from APEI), so we would really like to work with upstream to lock down
the proposed approach/UAPI as soon as possible.
Thanks!
>
> BTW, while I am working on v3, I think it is probably better to
> decouple the current patchset into two. The first one for
> KVM_EXIT_ARM_SEA, and the second one for injecting (D|I)ABT with
> user-supplemented esr. This way may help KVM_EXIT_ARM_SEA, the more
> important feature, get reviewed and accepted sooner. I will send out a
> separate patchset for enhancing the guest SEA injection.
>
> >
> >
> > From eb63dbf07b3d1f42b059f5c94abd147d195299c8 Mon Sep 17 00:00:00 2001
> > From: Oliver Upton <oliver.upton@linux.dev>
> > Date: Thu, 10 Jul 2025 17:14:51 -0700
> > Subject: [PATCH] KVM: arm64: nv: Handle SEAs due to VNCR redirection
> >
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > ---
> > arch/arm64/include/asm/kvm_mmu.h | 1 +
> > arch/arm64/include/asm/kvm_ras.h | 25 -------------------------
> > arch/arm64/kvm/mmu.c | 30 ++++++++++++++++++------------
> > arch/arm64/kvm/nested.c | 3 +++
> > 4 files changed, 22 insertions(+), 37 deletions(-)
> > delete mode 100644 arch/arm64/include/asm/kvm_ras.h
> >
> > diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> > index ae563ebd6aee..e4069f2ce642 100644
> > --- a/arch/arm64/include/asm/kvm_mmu.h
> > +++ b/arch/arm64/include/asm/kvm_mmu.h
> > @@ -180,6 +180,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
> > int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
> > phys_addr_t pa, unsigned long size, bool writable);
> >
> > +int kvm_handle_guest_sea(struct kvm_vcpu *vcpu);
> > int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
> >
> > phys_addr_t kvm_mmu_get_httbr(void);
> > diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h
> > deleted file mode 100644
> > index 9398ade632aa..000000000000
> > --- a/arch/arm64/include/asm/kvm_ras.h
> > +++ /dev/null
> > @@ -1,25 +0,0 @@
> > -/* SPDX-License-Identifier: GPL-2.0 */
> > -/* Copyright (C) 2018 - Arm Ltd */
> > -
> > -#ifndef __ARM64_KVM_RAS_H__
> > -#define __ARM64_KVM_RAS_H__
> > -
> > -#include <linux/acpi.h>
> > -#include <linux/errno.h>
> > -#include <linux/types.h>
> > -
> > -#include <asm/acpi.h>
> > -
> > -/*
> > - * Was this synchronous external abort a RAS notification?
> > - * Returns '0' for errors handled by some RAS subsystem, or -ENOENT.
> > - */
> > -static inline int kvm_handle_guest_sea(void)
> > -{
> > - /* apei_claim_sea(NULL) expects to mask interrupts itself */
> > - lockdep_assert_irqs_enabled();
> > -
> > - return apei_claim_sea(NULL);
> > -}
> > -
> > -#endif /* __ARM64_KVM_RAS_H__ */
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 1c78864767c5..6934f4acdc45 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -4,19 +4,20 @@
> > * Author: Christoffer Dall <c.dall@virtualopensystems.com>
> > */
> >
> > +#include <linux/acpi.h>
> > #include <linux/mman.h>
> > #include <linux/kvm_host.h>
> > #include <linux/io.h>
> > #include <linux/hugetlb.h>
> > #include <linux/sched/signal.h>
> > #include <trace/events/kvm.h>
> > +#include <asm/acpi.h>
> > #include <asm/pgalloc.h>
> > #include <asm/cacheflush.h>
> > #include <asm/kvm_arm.h>
> > #include <asm/kvm_mmu.h>
> > #include <asm/kvm_pgtable.h>
> > #include <asm/kvm_pkvm.h>
> > -#include <asm/kvm_ras.h>
> > #include <asm/kvm_asm.h>
> > #include <asm/kvm_emulate.h>
> > #include <asm/virt.h>
> > @@ -1811,6 +1812,20 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> > read_unlock(&vcpu->kvm->mmu_lock);
> > }
> >
> > +int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> > +{
> > + /*
> > + * Give APEI the opportunity to claim the abort before handling it
> > + * within KVM. apei_claim_sea() expects to be called with IRQs
> > + * enabled.
> > + */
> > + lockdep_assert_irqs_enabled();
> > + if (apei_claim_sea(NULL) == 0)
> > + return 1;
> > +
> > + return kvm_inject_serror(vcpu);
> > +}
> > +
> > /**
> > * kvm_handle_guest_abort - handles all 2nd stage aborts
> > * @vcpu: the VCPU pointer
> > @@ -1834,17 +1849,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> > gfn_t gfn;
> > int ret, idx;
> >
> > - /* Synchronous External Abort? */
> > - if (kvm_vcpu_abt_issea(vcpu)) {
> > - /*
> > - * For RAS the host kernel may handle this abort.
> > - * There is no need to pass the error into the guest.
> > - */
> > - if (kvm_handle_guest_sea())
> > - return kvm_inject_serror(vcpu);
> > -
> > - return 1;
> > - }
> > + if (kvm_vcpu_abt_issea(vcpu))
> > + return kvm_handle_guest_sea(vcpu);
> >
> > esr = kvm_vcpu_get_esr(vcpu);
> >
> > diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> > index 096747a61bf6..38b0e3a9a6db 100644
> > --- a/arch/arm64/kvm/nested.c
> > +++ b/arch/arm64/kvm/nested.c
> > @@ -1289,6 +1289,9 @@ int kvm_handle_vncr_abort(struct kvm_vcpu *vcpu)
> >
> > BUG_ON(!(esr & ESR_ELx_VNCR_SHIFT));
> >
> > + if (kvm_vcpu_abt_issea(vcpu))
> > + return kvm_handle_guest_sea(vcpu);
> > +
> > if (esr_fsc_is_permission_fault(esr)) {
> > inject_vncr_perm(vcpu);
> > } else if (esr_fsc_is_translation_fault(esr)) {
> > --
> > 2.39.5
> >
* Re: [PATCH v2 1/6] KVM: arm64: VM exit to userspace to handle SEA
2025-07-25 22:54 ` Jiaqi Yan
@ 2025-07-29 21:28 ` Oliver Upton
2025-07-31 21:06 ` Jiaqi Yan
0 siblings, 1 reply; 21+ messages in thread
From: Oliver Upton @ 2025-07-29 21:28 UTC (permalink / raw)
To: Jiaqi Yan
Cc: maz, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton
On Fri, Jul 25, 2025 at 03:54:10PM -0700, Jiaqi Yan wrote:
> On Sat, Jul 19, 2025 at 2:24 PM Jiaqi Yan <jiaqiyan@google.com> wrote:
> >
> > On Sat, Jul 12, 2025 at 12:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> > >
> > > On Fri, Jul 11, 2025 at 04:59:11PM -0700, Jiaqi Yan wrote:
> > > > > - Add some detail about FEAT_RAS where we may still exit to userspace
> > > > > for host-controlled memory, as we cannot differentiate between a
> > > > > stage-1 or stage-2 TTW SEA when taken on the descriptor PA
> > > >
> > > > Ah, IIUC, you are saying that even if the FSC code tells us the fault is
> > > > on a TTW (esr_fsc_is_secc_ttw or esr_fsc_is_sea_ttw), it can be either
> > > > the guest stage-1's or the stage-2's descriptor PA, and we cannot tell
> > > > which is which.
> > > >
> > > > However, if ESR_ELx_S1PTW is set, we can tell this is a sub-case of
> > > > stage-2 descriptor PA: the pages are used for the stage-1 PTW, but they
> > > > are stage-2 memory.
> > > >
> > > > Is my current understanding right?
> > >
> > > Yep, that's exactly what I'm getting at. As you note, stage-2 aborts
> > > during a stage-1 walk are sufficiently described, but not much else.
> >
> > Got it, thanks!
> >
> > >
> > > > > +/*
> > > > > + * Returns true if the SEA should be handled locally within KVM if the abort is
> > > > > + * caused by a kernel memory allocation (e.g. stage-2 table memory).
> > > > > + */
> > > > > +static bool host_owns_sea(struct kvm_vcpu *vcpu, u64 esr)
> > > > > +{
> > > > > + /*
> > > > > + * Without FEAT_RAS HCR_EL2.TEA is RES0, meaning any external abort
> > > > > + * taken from a guest EL to EL2 is due to a host-imposed access (e.g.
> > > > > + * stage-2 PTW).
> > > > > + */
> > > > > + if (!cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
> > > > > + return true;
> > > > > +
> > > > > + /* KVM owns the VNCR when the vCPU isn't in a nested context. */
> > > > > + if (is_hyp_ctxt(vcpu) && (esr & ESR_ELx_VNCR))
> > > > > + return true;
> > > > > +
> > > > > + /*
> > > > > + * Determining if an external abort during a table walk happened at
> > > > > + * stage-2 is only possible with S1PTW is set. Otherwise, since KVM
> > > > > + * sets HCR_EL2.TEA, SEAs due to a stage-1 walk (i.e. accessing the PA
> > > > > + * of the stage-1 descriptor) can reach here and are reported with a
> > > > > + * TTW ESR value.
> > > > > + */
> > > > > + return esr_fsc_is_sea_ttw(esr) && (esr & ESR_ELx_S1PTW);
> > > >
> > > > Should we include esr_fsc_is_secc_ttw? like
> > > > (esr_fsc_is_sea_ttw(esr) || esr_fsc_is_secc_ttw(esr)) && (esr & ESR_ELx_S1PTW)
> > >
> > > Parity / ECC errors are not permitted if FEAT_RAS is implemented (which
> > > is tested for up front).
> >
> > Ah, thanks for pointing this out.
> >
> > >
> > > > > +}
> > > > > +
> > > > > int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> > > > > {
> > > > > + u64 esr = kvm_vcpu_get_esr(vcpu);
> > > > > + struct kvm_run *run = vcpu->run;
> > > > > + struct kvm *kvm = vcpu->kvm;
> > > > > + u64 esr_mask = ESR_ELx_EC_MASK |
> > > > > + ESR_ELx_FnV |
> > > > > + ESR_ELx_EA |
> > > > > + ESR_ELx_CM |
> > > > > + ESR_ELx_WNR |
> > > > > + ESR_ELx_FSC;
> > > >
> > > > Do you (and why) exclude ESR_ELx_IL on purpose?
> > >
> > > Unintended :)
> >
> > Will add into my patch.
> >
> > >
> > > > BTW, if my previous statement about TTW SEA is correct, then I also
> > > > understand why we need to explicitly exclude ESR_ELx_S1PTW.
> > >
> > > Right, we shouldn't be exposing genuine stage-2 external aborts to userspace.
> > >
> > > > > + u64 ipa;
> > > > > +
> > > > > +
> > > > > /*
> > > > > * Give APEI the opportunity to claim the abort before handling it
> > > > > * within KVM. apei_claim_sea() expects to be called with IRQs
> > > > > @@ -1824,7 +1864,32 @@ int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> > > > > if (apei_claim_sea(NULL) == 0)
> > > >
> > > > I assume kvm should still lockdep_assert_irqs_enabled(), right? That
> > > > is, a WARN_ON_ONCE is still useful in case?
> > >
> > > Ah, this is diffed against my VNCR prefix which has this context. Yes, I
> > > want to preserve the lockdep assertion.
> >
> > Thanks for sharing the patch! Should I wait for you to send and queue
> > to kvmarm/next and rebase my v3 to it? Or should I insert it into my
> > v3 patch series with you as the commit author, and Signed-off-by you?
>
> Friendly ping for this question, my v3 is ready but want to confirm
> the best option here.
>
> Recently we found even the newer ARM64 platforms used by our org has
> to rely on KVM to more gracefully handle SEA (lacking support from
> APEI), so we would really want to work with upstream to lock down the
> proposed approach/UAPI asap.
Posted the VNCR fix which I plan on taking in 6.17. Feel free to rebase
your work on top of kvmarm-6.17 or -rc1 when it comes out.
https://lore.kernel.org/kvmarm/20250729182342.3281742-1-oliver.upton@linux.dev/
Thanks,
Oliver
* Re: [PATCH v2 1/6] KVM: arm64: VM exit to userspace to handle SEA
2025-07-29 21:28 ` Oliver Upton
@ 2025-07-31 21:06 ` Jiaqi Yan
0 siblings, 0 replies; 21+ messages in thread
From: Jiaqi Yan @ 2025-07-31 21:06 UTC (permalink / raw)
To: Oliver Upton
Cc: maz, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton
On Tue, Jul 29, 2025 at 2:28 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Fri, Jul 25, 2025 at 03:54:10PM -0700, Jiaqi Yan wrote:
> > On Sat, Jul 19, 2025 at 2:24 PM Jiaqi Yan <jiaqiyan@google.com> wrote:
> > >
> > > On Sat, Jul 12, 2025 at 12:57 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> > > >
> > > > On Fri, Jul 11, 2025 at 04:59:11PM -0700, Jiaqi Yan wrote:
> > > > > > - Add some detail about FEAT_RAS where we may still exit to userspace
> > > > > > for host-controlled memory, as we cannot differentiate between a
> > > > > > stage-1 or stage-2 TTW SEA when taken on the descriptor PA
> > > > >
> > > > > Ah, IIUC, you are saying that even if the FSC code tells us the fault is
> > > > > on a TTW (esr_fsc_is_secc_ttw or esr_fsc_is_sea_ttw), it can be either
> > > > > the guest stage-1's or the stage-2's descriptor PA, and we cannot tell
> > > > > which is which.
> > > > >
> > > > > However, if ESR_ELx_S1PTW is set, we can tell this is a sub-case of
> > > > > stage-2 descriptor PA: the pages are used for the stage-1 PTW, but they
> > > > > are stage-2 memory.
> > > > >
> > > > > Is my current understanding right?
> > > >
> > > > Yep, that's exactly what I'm getting at. As you note, stage-2 aborts
> > > > during a stage-1 walk are sufficiently described, but not much else.
> > >
> > > Got it, thanks!
> > >
> > > >
> > > > > > +/*
> > > > > > + * Returns true if the SEA should be handled locally within KVM if the abort is
> > > > > > + * caused by a kernel memory allocation (e.g. stage-2 table memory).
> > > > > > + */
> > > > > > +static bool host_owns_sea(struct kvm_vcpu *vcpu, u64 esr)
> > > > > > +{
> > > > > > + /*
> > > > > > + * Without FEAT_RAS HCR_EL2.TEA is RES0, meaning any external abort
> > > > > > + * taken from a guest EL to EL2 is due to a host-imposed access (e.g.
> > > > > > + * stage-2 PTW).
> > > > > > + */
> > > > > > + if (!cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
> > > > > > + return true;
> > > > > > +
> > > > > > + /* KVM owns the VNCR when the vCPU isn't in a nested context. */
> > > > > > + if (is_hyp_ctxt(vcpu) && (esr & ESR_ELx_VNCR))
> > > > > > + return true;
> > > > > > +
> > > > > > + /*
> > > > > > + * Determining if an external abort during a table walk happened at
> > > > > > + * stage-2 is only possible with S1PTW is set. Otherwise, since KVM
> > > > > > + * sets HCR_EL2.TEA, SEAs due to a stage-1 walk (i.e. accessing the PA
> > > > > > + * of the stage-1 descriptor) can reach here and are reported with a
> > > > > > + * TTW ESR value.
> > > > > > + */
> > > > > > + return esr_fsc_is_sea_ttw(esr) && (esr & ESR_ELx_S1PTW);
> > > > >
> > > > > Should we include esr_fsc_is_secc_ttw? like
> > > > > (esr_fsc_is_sea_ttw(esr) || esr_fsc_is_secc_ttw(esr)) && (esr & ESR_ELx_S1PTW)
> > > >
> > > > Parity / ECC errors are not permitted if FEAT_RAS is implemented (which
> > > > is tested for up front).
> > >
> > > Ah, thanks for pointing this out.
> > >
> > > >
> > > > > > +}
> > > > > > +
> > > > > > int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> > > > > > {
> > > > > > + u64 esr = kvm_vcpu_get_esr(vcpu);
> > > > > > + struct kvm_run *run = vcpu->run;
> > > > > > + struct kvm *kvm = vcpu->kvm;
> > > > > > + u64 esr_mask = ESR_ELx_EC_MASK |
> > > > > > + ESR_ELx_FnV |
> > > > > > + ESR_ELx_EA |
> > > > > > + ESR_ELx_CM |
> > > > > > + ESR_ELx_WNR |
> > > > > > + ESR_ELx_FSC;
> > > > >
> > > > > Do you (and why) exclude ESR_ELx_IL on purpose?
> > > >
> > > > Unintended :)
> > >
> > > Will add into my patch.
> > >
> > > >
> > > > > BTW, if my previous statement about TTW SEA is correct, then I also
> > > > > understand why we need to explicitly exclude ESR_ELx_S1PTW.
> > > >
> > > > Right, we shouldn't be exposing genuine stage-2 external aborts to userspace.
> > > >
> > > > > > + u64 ipa;
> > > > > > +
> > > > > > +
> > > > > > /*
> > > > > > * Give APEI the opportunity to claim the abort before handling it
> > > > > > * within KVM. apei_claim_sea() expects to be called with IRQs
> > > > > > @@ -1824,7 +1864,32 @@ int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
> > > > > > if (apei_claim_sea(NULL) == 0)
> > > > >
> > > > > I assume kvm should still lockdep_assert_irqs_enabled(), right? That
> > > > > is, a WARN_ON_ONCE is still useful in case?
> > > >
> > > > Ah, this is diffed against my VNCR prefix which has this context. Yes, I
> > > > want to preserve the lockdep assertion.
> > >
> > > Thanks for sharing the patch! Should I wait for you to send and queue
> > > to kvmarm/next and rebase my v3 to it? Or should I insert it into my
> > > v3 patch series with you as the commit author, and Signed-off-by you?
> >
> > Friendly ping for this question, my v3 is ready but want to confirm
> > the best option here.
> >
> > Recently we found even the newer ARM64 platforms used by our org has
> > to rely on KVM to more gracefully handle SEA (lacking support from
> > APEI), so we would really want to work with upstream to lock down the
> > proposed approach/UAPI asap.
>
> Posted the VNCR fix which I plan on taking in 6.17. Feel free to rebase
> your work on top of kvmarm-6.17 or -rc1 when it comes out.
Thanks Oliver! I sent out v3 based on the VNCR fix here on top of the
current kvmarm/next.
>
> https://lore.kernel.org/kvmarm/20250729182342.3281742-1-oliver.upton@linux.dev/
>
> Thanks,
> Oliver
* [PATCH v2 2/6] KVM: arm64: Set FnV for VCPU when FAR_EL2 is invalid
2025-06-04 5:08 [PATCH v2 0/6] VMM can handle guest SEA via KVM_EXIT_ARM_SEA Jiaqi Yan
2025-06-04 5:08 ` [PATCH v2 1/6] KVM: arm64: VM exit to userspace to handle SEA Jiaqi Yan
@ 2025-06-04 5:08 ` Jiaqi Yan
2025-06-04 5:08 ` [PATCH v2 3/6] KVM: arm64: Allow userspace to inject external instruction aborts Jiaqi Yan
` (3 subsequent siblings)
5 siblings, 0 replies; 21+ messages in thread
From: Jiaqi Yan @ 2025-06-04 5:08 UTC (permalink / raw)
To: maz, oliver.upton
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton, Jiaqi Yan
Certain microarchitectures (e.g. Neoverse V2) do not keep track of
the faulting address for a memory load that consumes poisoned data
and results in a synchronous external abort (SEA). IOW, both the
FAR_EL2 register and kvm_vcpu_get_hfar() hold a garbage value.
In case the VMM later relies entirely on KVM to synchronously inject
an SEA into the guest, KVM should set the FnV bit in the vCPU's
- ESR_EL1, to let the guest kernel know FAR_EL1 is invalid
- ESR_EL2, to let nested virtualization know FAR_EL2 is invalid
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
arch/arm64/kvm/inject_fault.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index a640e839848e6..b4f9a09952ead 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -81,6 +81,9 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
if (!is_iabt)
esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
+ if (!kvm_vcpu_sea_far_valid(vcpu))
+ esr |= ESR_ELx_FnV;
+
esr |= ESR_ELx_FSC_EXTABT;
if (match_target_el(vcpu, unpack_vcpu_flag(EXCEPT_AA64_EL1_SYNC))) {
--
2.49.0.1266.g31b7d2e469-goog
* [PATCH v2 3/6] KVM: arm64: Allow userspace to inject external instruction aborts
2025-06-04 5:08 [PATCH v2 0/6] VMM can handle guest SEA via KVM_EXIT_ARM_SEA Jiaqi Yan
2025-06-04 5:08 ` [PATCH v2 1/6] KVM: arm64: VM exit to userspace to handle SEA Jiaqi Yan
2025-06-04 5:08 ` [PATCH v2 2/6] KVM: arm64: Set FnV for VCPU when FAR_EL2 is invalid Jiaqi Yan
@ 2025-06-04 5:08 ` Jiaqi Yan
2025-07-11 19:42 ` Oliver Upton
2025-06-04 5:08 ` [PATCH v2 4/6] KVM: selftests: Test for KVM_EXIT_ARM_SEA and KVM_CAP_ARM_SEA_TO_USER Jiaqi Yan
` (2 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: Jiaqi Yan @ 2025-06-04 5:08 UTC (permalink / raw)
To: maz, oliver.upton
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton, Jiaqi Yan
From: Raghavendra Rao Ananta <rananta@google.com>
When KVM returns to userspace with KVM_EXIT_ARM_SEA, userspace is
encouraged to inject the abort into the guest via KVM_SET_VCPU_EVENTS.
KVM_SET_VCPU_EVENTS currently only allows injecting external data aborts.
However, the synchronous external abort that caused KVM_EXIT_ARM_SEA
may be an instruction abort. Userspace can already tell whether an
abort is a data or an instruction abort by checking the Exception
Class value in kvm_run.arm_sea.esr.
Extend the KVM_SET_VCPU_EVENTS ioctl to allow injecting an external
instruction abort into the guest.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
arch/arm64/include/uapi/asm/kvm.h | 3 ++-
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/guest.c | 13 ++++++++++---
include/uapi/linux/kvm.h | 1 +
4 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index ed5f3892674c7..643e8c4825451 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -184,8 +184,9 @@ struct kvm_vcpu_events {
__u8 serror_pending;
__u8 serror_has_esr;
__u8 ext_dabt_pending;
+ __u8 ext_iabt_pending;
/* Align it to 8 bytes */
- __u8 pad[5];
+ __u8 pad[4];
__u64 serror_esr;
} exception;
__u32 reserved[12];
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 99e0c6c16e437..78e8a82c38cfc 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -319,6 +319,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_IRQ_LINE_LAYOUT_2:
case KVM_CAP_ARM_NISV_TO_USER:
case KVM_CAP_ARM_INJECT_EXT_DABT:
+ case KVM_CAP_ARM_INJECT_EXT_IABT:
case KVM_CAP_SET_GUEST_DEBUG:
case KVM_CAP_VCPU_ATTRIBUTES:
case KVM_CAP_PTP_KVM:
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 2196979a24a32..4917361ecf5cb 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -825,9 +825,9 @@ int __kvm_arm_vcpu_get_events(struct kvm_vcpu *vcpu,
events->exception.serror_esr = vcpu_get_vsesr(vcpu);
/*
- * We never return a pending ext_dabt here because we deliver it to
- * the virtual CPU directly when setting the event and it's no longer
- * 'pending' at this point.
+ * We never return a pending ext_dabt or ext_iabt here because we
+ * deliver it to the virtual CPU directly when setting the event
+ * and it's no longer 'pending' at this point.
*/
return 0;
@@ -839,6 +839,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
bool serror_pending = events->exception.serror_pending;
bool has_esr = events->exception.serror_has_esr;
bool ext_dabt_pending = events->exception.ext_dabt_pending;
+ bool ext_iabt_pending = events->exception.ext_iabt_pending;
if (serror_pending && has_esr) {
if (!cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
@@ -852,8 +853,14 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
kvm_inject_vabt(vcpu);
}
+ /* DABT and IABT cannot happen at the same time. */
+ if (ext_dabt_pending && ext_iabt_pending)
+ return -EINVAL;
+
if (ext_dabt_pending)
kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
+ else if (ext_iabt_pending)
+ kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu));
return 0;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4fed3fdfb13d6..2fc3775ac1183 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -943,6 +943,7 @@ struct kvm_enable_cap {
#define KVM_CAP_ARM_EL2 240
#define KVM_CAP_ARM_EL2_E2H0 241
#define KVM_CAP_ARM_SEA_TO_USER 242
+#define KVM_CAP_ARM_INJECT_EXT_IABT 243
struct kvm_irq_routing_irqchip {
__u32 irqchip;
--
2.49.0.1266.g31b7d2e469-goog
* Re: [PATCH v2 3/6] KVM: arm64: Allow userspace to inject external instruction aborts
2025-06-04 5:08 ` [PATCH v2 3/6] KVM: arm64: Allow userspace to inject external instruction aborts Jiaqi Yan
@ 2025-07-11 19:42 ` Oliver Upton
2025-07-11 23:58 ` Jiaqi Yan
0 siblings, 1 reply; 21+ messages in thread
From: Oliver Upton @ 2025-07-11 19:42 UTC (permalink / raw)
To: Jiaqi Yan
Cc: maz, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton
On Wed, Jun 04, 2025 at 05:08:58AM +0000, Jiaqi Yan wrote:
> From: Raghavendra Rao Ananta <rananta@google.com>
>
> When KVM returns to userspace for KVM_EXIT_ARM_SEA, the userspace is
> encouraged to inject the abort into the guest via KVM_SET_VCPU_EVENTS.
>
> KVM_SET_VCPU_EVENTS currently only allows injecting external data aborts.
> However, the synchronous external abort that caused KVM_EXIT_ARM_SEA
> is possible to be an instruction abort. Userspace is already able to
> tell if an abort is due to data or instruction via kvm_run.arm_sea.esr,
> by checking its Exception Class value.
>
> Extend the KVM_SET_VCPU_EVENTS ioctl to allow injecting instruction
> abort into the guest.
>
> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
Hmm. Since we expose an ESR value to userspace I get the feeling that we
should allow the user to supply an ISS for the external abort, similar
to what we already do for SErrors.
Thanks,
Oliver
* Re: [PATCH v2 3/6] KVM: arm64: Allow userspace to inject external instruction aborts
2025-07-11 19:42 ` Oliver Upton
@ 2025-07-11 23:58 ` Jiaqi Yan
2025-07-12 19:47 ` Oliver Upton
0 siblings, 1 reply; 21+ messages in thread
From: Jiaqi Yan @ 2025-07-11 23:58 UTC (permalink / raw)
To: Oliver Upton
Cc: maz, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton
On Fri, Jul 11, 2025 at 12:42 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Wed, Jun 04, 2025 at 05:08:58AM +0000, Jiaqi Yan wrote:
> > From: Raghavendra Rao Ananta <rananta@google.com>
> >
> > When KVM returns to userspace for KVM_EXIT_ARM_SEA, the userspace is
> > encouraged to inject the abort into the guest via KVM_SET_VCPU_EVENTS.
> >
> > KVM_SET_VCPU_EVENTS currently only allows injecting external data aborts.
> > However, the synchronous external abort that caused KVM_EXIT_ARM_SEA
> > is possible to be an instruction abort. Userspace is already able to
> > tell if an abort is due to data or instruction via kvm_run.arm_sea.esr,
> > by checking its Exception Class value.
> >
> > Extend the KVM_SET_VCPU_EVENTS ioctl to allow injecting instruction
> > abort into the guest.
> >
> > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
>
> Hmm. Since we expose an ESR value to userspace I get the feeling that we
> should allow the user to supply an ISS for the external abort, similar
> to what we already do for SErrors.
Oh, I will create something in v3, by extending kvm_vcpu_events to
something like:
struct {
__u8 serror_pending;
__u8 serror_has_esr;
__u8 ext_dabt_pending;
__u8 ext_iabt_pending;
__u8 ext_abt_has_esr; // <= new
/* Align it to 8 bytes */
__u8 pad[3];
union {
__u64 serror_esr;
__u64 ext_abt_esr; // <= new
};
} exception;
One question about the naming, since we cannot change it once
committed. Taking the existing SError injection as an example:
although the name in kvm_vcpu_events is serror_has_esr, it is
essentially just the ISS field of the ESR (which is also documented in
virt/kvm/api.rst). Why is it named after "esr" instead of "iss"? The
only reason I can think of is that KVM wants to leave room to accept
more fields than ISS from userspace. Does this reason apply to
external aborts? Asking because, if "iss" is the better name in
kvm_vcpu_events, then for external aborts we should perhaps use
ext_abt_has_iss?
>
> Thanks,
> Oliver
* Re: [PATCH v2 3/6] KVM: arm64: Allow userspace to inject external instruction aborts
2025-07-11 23:58 ` Jiaqi Yan
@ 2025-07-12 19:47 ` Oliver Upton
2025-07-13 2:42 ` Jiaqi Yan
0 siblings, 1 reply; 21+ messages in thread
From: Oliver Upton @ 2025-07-12 19:47 UTC (permalink / raw)
To: Jiaqi Yan
Cc: maz, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton
On Fri, Jul 11, 2025 at 04:58:57PM -0700, Jiaqi Yan wrote:
> On Fri, Jul 11, 2025 at 12:42 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> >
> > On Wed, Jun 04, 2025 at 05:08:58AM +0000, Jiaqi Yan wrote:
> > > From: Raghavendra Rao Ananta <rananta@google.com>
> > >
> > > When KVM returns to userspace for KVM_EXIT_ARM_SEA, the userspace is
> > > encouraged to inject the abort into the guest via KVM_SET_VCPU_EVENTS.
> > >
> > > KVM_SET_VCPU_EVENTS currently only allows injecting external data aborts.
> > > However, the synchronous external abort that caused KVM_EXIT_ARM_SEA
> > > is possible to be an instruction abort. Userspace is already able to
> > > tell if an abort is due to data or instruction via kvm_run.arm_sea.esr,
> > > by checking its Exception Class value.
> > >
> > > Extend the KVM_SET_VCPU_EVENTS ioctl to allow injecting instruction
> > > abort into the guest.
> > >
> > > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > > Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> >
> > Hmm. Since we expose an ESR value to userspace I get the feeling that we
> > should allow the user to supply an ISS for the external abort, similar
> > to what we already do for SErrors.
>
> Oh, I will create something in v3, by extending kvm_vcpu_events to
> something like:
>
> struct {
> __u8 serror_pending;
> __u8 serror_has_esr;
> __u8 ext_dabt_pending;
> __u8 ext_iabt_pending;
> __u8 ext_abt_has_esr; // <= new
> /* Align it to 8 bytes */
> __u8 pad[3];
> union {
> __u64 serror_esr;
> __u64 ext_abt_esr; // <= new
This doesn't work. The ABI allows userspace to pend both an SError and
SEA, so we can't use the same storage for the ESR.
> };
> } exception;
>
> One question about the naming since we cannot change it once
> committed. Taking the existing SError injection as example, although
> the name in kvm_vcpu_events is serror_has_esr, it is essentially just
> the ISS fields of the ESR (which is also written in virt/kvm/api.rst).
> Why named after "esr" instead of "iss"? The only reason I can think of
> is, KVM wants to leave the room to accept more fields than ISS from
> userspace. Does this reason apply to external aborts? Asking in case
> if "iss" is a better name in kvm_vcpu_events, maybe for external
> aborts, we should use ext_abt_has_iss?
We will probably need to include more ESR fields in the future, like
ESR_ELx.ISS2. So let's just keep the existing naming if that's OK with
you.
Thanks,
Oliver
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 3/6] KVM: arm64: Allow userspace to inject external instruction aborts
2025-07-12 19:47 ` Oliver Upton
@ 2025-07-13 2:42 ` Jiaqi Yan
0 siblings, 0 replies; 21+ messages in thread
From: Jiaqi Yan @ 2025-07-13 2:42 UTC (permalink / raw)
To: Oliver Upton
Cc: maz, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton
On Sat, Jul 12, 2025 at 12:47 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Fri, Jul 11, 2025 at 04:58:57PM -0700, Jiaqi Yan wrote:
> > On Fri, Jul 11, 2025 at 12:42 PM Oliver Upton <oliver.upton@linux.dev> wrote:
> > >
> > > On Wed, Jun 04, 2025 at 05:08:58AM +0000, Jiaqi Yan wrote:
> > > > From: Raghavendra Rao Ananta <rananta@google.com>
> > > >
> > > > When KVM returns to userspace for KVM_EXIT_ARM_SEA, the userspace is
> > > > encouraged to inject the abort into the guest via KVM_SET_VCPU_EVENTS.
> > > >
> > > > KVM_SET_VCPU_EVENTS currently only allows injecting external data aborts.
> > > > However, the synchronous external abort that caused KVM_EXIT_ARM_SEA
> > > > is possible to be an instruction abort. Userspace is already able to
> > > > tell if an abort is due to data or instruction via kvm_run.arm_sea.esr,
> > > > by checking its Exception Class value.
> > > >
> > > > Extend the KVM_SET_VCPU_EVENTS ioctl to allow injecting instruction
> > > > abort into the guest.
> > > >
> > > > Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
> > > > Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> > >
> > > Hmm. Since we expose an ESR value to userspace I get the feeling that we
> > > should allow the user to supply an ISS for the external abort, similar
> > > to what we already do for SErrors.
> >
> > Oh, I will create something in v3, by extending kvm_vcpu_events to
> > something like:
> >
> > struct {
> > __u8 serror_pending;
> > __u8 serror_has_esr;
> > __u8 ext_dabt_pending;
> > __u8 ext_iabt_pending;
> > __u8 ext_abt_has_esr; // <= new
> > /* Align it to 8 bytes */
> > __u8 pad[3];
> > union {
> > __u64 serror_esr;
> > __u64 ext_abt_esr; // <= new
>
> This doesn't work. The ABI allows userspace to pend both an SError and
> SEA, so we can't use the same storage for the ESR.
You are right, the implementation (__kvm_arm_vcpu_set_events) indeed
continues to inject SError after injecting SEA.
Then we may have to extend the size of exception and correspondingly
shrink reserved, because I believe we want ext_abt_esr to live inside
kvm_vcpu_events.exception. Something like:
struct kvm_vcpu_events {
struct {
__u8 serror_pending;
__u8 serror_has_esr;
__u8 ext_dabt_pending;
__u8 ext_iabt_pending;
__u8 ext_abt_has_esr;
__u8 pad[3];
__u64 serror_esr;
__u64 ext_abt_esr; // <= +64 bits
} exception;
__u32 reserved[10]; // <= -64 bits
};
The offset of kvm_vcpu_events.reserved changes; I don't think
userspace will read/write reserved (so its offset is probably not very
important?), but theoretically this is an ABI break.
Another safer, though less readable, option is to add the new field at the end:
struct kvm_vcpu_events {
struct {
__u8 serror_pending;
__u8 serror_has_esr;
__u8 ext_dabt_pending;
__u8 ext_iabt_pending;
__u8 ext_abt_has_esr;
__u8 pad[3];
__u64 serror_esr;
} exception;
__u32 reserved[10]; // <= -64 bits
__u64 ext_abt_esr; // <= +64 bits
};
Any better suggestions?
>
> > };
> > } exception;
> >
> > One question about the naming since we cannot change it once
> > committed. Taking the existing SError injection as example, although
> > the name in kvm_vcpu_events is serror_has_esr, it is essentially just
> > the ISS fields of the ESR (which is also written in virt/kvm/api.rst).
> > Why named after "esr" instead of "iss"? The only reason I can think of
> > is, KVM wants to leave the room to accept more fields than ISS from
> > userspace. Does this reason apply to external aborts? Asking in case
> > if "iss" is a better name in kvm_vcpu_events, maybe for external
> > aborts, we should use ext_abt_has_iss?
>
> We will probably need to include more ESR fields in the future, like
> ESR_ELx.ISS2. So let's just keep the existing naming if that's OK with
> you.
Ack to "esr", thanks Oliver!
>
> Thanks,
> Oliver
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v2 4/6] KVM: selftests: Test for KVM_EXIT_ARM_SEA and KVM_CAP_ARM_SEA_TO_USER
2025-06-04 5:08 [PATCH v2 0/6] VMM can handle guest SEA via KVM_EXIT_ARM_SEA Jiaqi Yan
` (2 preceding siblings ...)
2025-06-04 5:08 ` [PATCH v2 3/6] KVM: arm64: Allow userspace to inject external instruction aborts Jiaqi Yan
@ 2025-06-04 5:08 ` Jiaqi Yan
2025-06-04 5:09 ` [PATCH v2 5/6] KVM: selftests: Test for KVM_CAP_INJECT_EXT_IABT Jiaqi Yan
2025-06-04 5:09 ` [PATCH v2 6/6] Documentation: kvm: new uAPI for handling SEA Jiaqi Yan
5 siblings, 0 replies; 21+ messages in thread
From: Jiaqi Yan @ 2025-06-04 5:08 UTC (permalink / raw)
To: maz, oliver.upton
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton, Jiaqi Yan
Test how KVM handles a guest stage-2 SEA when APEI is unable to claim it.
The behavior is triggered by consuming a recoverable uncorrectable memory
error (UER) injected via EINJ. The test asserts two major things:
1. KVM returns to userspace with the KVM_EXIT_ARM_SEA exit reason, and
provides the expected fault information, e.g. esr, flags, gva, gpa.
2. Userspace is able to handle KVM_EXIT_ARM_SEA by injecting a SEA into
the guest, and KVM injects the expected SEA into the vCPU.
Tested on a data center server running the Siryn AmpereOne processor.
Several things to notice before attempting to run this selftest:
- The test relies on EINJ support in both firmware and kernel to
inject the UER. Otherwise the test will be skipped.
- APEI on the platform under test must be unable to claim the SEA.
Otherwise the test will be skipped.
- Some platforms don't support notrigger in EINJ, which may cause
APEI and GHES to offline the memory before the guest can consume the
injected UER, making the test unable to trigger a SEA.
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
tools/arch/arm64/include/asm/esr.h | 2 +
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../testing/selftests/kvm/arm64/sea_to_user.c | 340 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 1 +
4 files changed, 344 insertions(+)
create mode 100644 tools/testing/selftests/kvm/arm64/sea_to_user.c
diff --git a/tools/arch/arm64/include/asm/esr.h b/tools/arch/arm64/include/asm/esr.h
index bd592ca815711..0fa17b3af1f78 100644
--- a/tools/arch/arm64/include/asm/esr.h
+++ b/tools/arch/arm64/include/asm/esr.h
@@ -141,6 +141,8 @@
#define ESR_ELx_SF (UL(1) << ESR_ELx_SF_SHIFT)
#define ESR_ELx_AR_SHIFT (14)
#define ESR_ELx_AR (UL(1) << ESR_ELx_AR_SHIFT)
+#define ESR_ELx_VNCR_SHIFT (13)
+#define ESR_ELx_VNCR (UL(1) << ESR_ELx_VNCR_SHIFT)
#define ESR_ELx_CM_SHIFT (8)
#define ESR_ELx_CM (UL(1) << ESR_ELx_CM_SHIFT)
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index d37072054a3d0..9eecce6b8274f 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -152,6 +152,7 @@ TEST_GEN_PROGS_arm64 += arm64/hypercalls
TEST_GEN_PROGS_arm64 += arm64/mmio_abort
TEST_GEN_PROGS_arm64 += arm64/page_fault_test
TEST_GEN_PROGS_arm64 += arm64/psci_test
+TEST_GEN_PROGS_arm64 += arm64/sea_to_user
TEST_GEN_PROGS_arm64 += arm64/set_id_regs
TEST_GEN_PROGS_arm64 += arm64/smccc_filter
TEST_GEN_PROGS_arm64 += arm64/vcpu_width_config
diff --git a/tools/testing/selftests/kvm/arm64/sea_to_user.c b/tools/testing/selftests/kvm/arm64/sea_to_user.c
new file mode 100644
index 0000000000000..381d8597ab406
--- /dev/null
+++ b/tools/testing/selftests/kvm/arm64/sea_to_user.c
@@ -0,0 +1,340 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Test that KVM returns to userspace with KVM_EXIT_ARM_SEA if host APEI fails
+ * to handle a SEA and userspace has opted in to KVM_CAP_ARM_SEA_TO_USER.
+ *
+ * After reaching userspace with the expected arm_sea info, also test userspace
+ * injecting a synchronous external data abort into the guest.
+ *
+ * This test utilizes EINJ to generate a REAL synchronous external data
+ * abort by consuming a recoverable uncorrectable memory error. Therefore
+ * the device under test must support EINJ in both firmware and host kernel,
+ * including the notrigger feature. Otherwise the test will be skipped.
+ * APEI on the platform under test must be unable to claim the SEA. Otherwise
+ * the test will also be skipped.
+ */
+
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+#include "guest_modes.h"
+
+#define PAGE_PRESENT (1ULL << 63)
+#define PAGE_PHYSICAL 0x007fffffffffffffULL
+#define PAGE_ADDR_MASK (~(0xfffULL))
+
+/* Value for "Recoverable state (UER)". */
+#define ESR_ELx_SET_UER 0U
+
+/* Group ISV and ISS[23:14]. */
+#define ESR_ELx_INST_SYNDROME ((ESR_ELx_ISV) | (ESR_ELx_SAS) | \
+ (ESR_ELx_SSE) | (ESR_ELx_SRT_MASK) | \
+ (ESR_ELx_SF) | (ESR_ELx_AR))
+
+#define EINJ_ETYPE "/sys/kernel/debug/apei/einj/error_type"
+#define EINJ_ADDR "/sys/kernel/debug/apei/einj/param1"
+#define EINJ_MASK "/sys/kernel/debug/apei/einj/param2"
+#define EINJ_FLAGS "/sys/kernel/debug/apei/einj/flags"
+#define EINJ_NOTRIGGER "/sys/kernel/debug/apei/einj/notrigger"
+#define EINJ_DOIT "/sys/kernel/debug/apei/einj/error_inject"
+/* Memory Uncorrectable non-fatal. */
+#define ERROR_TYPE_MEMORY_UER 0x10
+/* Memory address and mask valid (param1 and param2). */
+#define MASK_MEMORY_UER 0b10
+
+/* Guest virtual address region = [2G, 3G). */
+#define START_GVA 0x80000000UL
+#define VM_MEM_SIZE 0x40000000UL
+/* Note: EINJ_OFFSET must < VM_MEM_SIZE. */
+#define EINJ_OFFSET 0x01234badUL
+#define EINJ_GVA ((START_GVA) + (EINJ_OFFSET))
+
+static vm_paddr_t einj_gpa;
+static void *einj_hva;
+static uint64_t einj_hpa;
+static bool far_invalid;
+
+static uint64_t translate_to_host_paddr(unsigned long vaddr)
+{
+ uint64_t pinfo;
+ int64_t offset = vaddr / getpagesize() * sizeof(pinfo);
+ int fd;
+ uint64_t page_addr;
+ uint64_t paddr;
+
+ fd = open("/proc/self/pagemap", O_RDONLY);
+ if (fd < 0)
+ ksft_exit_fail_perror("Failed to open /proc/self/pagemap");
+ if (pread(fd, &pinfo, sizeof(pinfo), offset) != sizeof(pinfo)) {
+ close(fd);
+ ksft_exit_fail_perror("Failed to read /proc/self/pagemap");
+ }
+
+ close(fd);
+
+ if ((pinfo & PAGE_PRESENT) == 0)
+ ksft_exit_fail_perror("Page not present");
+
+ page_addr = (pinfo & PAGE_PHYSICAL) << MIN_PAGE_SHIFT;
+ paddr = page_addr + (vaddr & (getpagesize() - 1));
+ return paddr;
+}
+
+static void write_einj_entry(const char *einj_path, uint64_t val)
+{
+ char cmd[256] = {0};
+ FILE *cmdfile = NULL;
+
+ sprintf(cmd, "echo %#lx > %s", val, einj_path);
+ cmdfile = popen(cmd, "r");
+
+ if (pclose(cmdfile) == 0)
+ ksft_print_msg("echo %#lx > %s - done\n", val, einj_path);
+ else
+ ksft_exit_fail_perror("Failed to write EINJ entry");
+}
+
+static void inject_uer(uint64_t paddr)
+{
+ if (access("/sys/firmware/acpi/tables/EINJ", R_OK) == -1)
+ ksft_test_result_skip("EINJ table not available in firmware");
+
+ if (access(EINJ_ETYPE, R_OK | W_OK) == -1)
+ ksft_test_result_skip("EINJ module probably not loaded?");
+
+ write_einj_entry(EINJ_ETYPE, ERROR_TYPE_MEMORY_UER);
+ write_einj_entry(EINJ_FLAGS, MASK_MEMORY_UER);
+ write_einj_entry(EINJ_ADDR, paddr);
+ write_einj_entry(EINJ_MASK, ~0x0UL);
+ write_einj_entry(EINJ_NOTRIGGER, 1);
+ write_einj_entry(EINJ_DOIT, 1);
+}
+
+/*
+ * When host APEI successfully claims the SEA caused by guest_code, kernel
+ * will send SIGBUS signal with BUS_MCEERR_AR to test thread.
+ *
+ * We set up this SIGBUS handler to skip the test for that case.
+ */
+static void sigbus_signal_handler(int sig, siginfo_t *si, void *v)
+{
+ ksft_print_msg("SIGBUS (%d) received, dumping siginfo...\n", sig);
+ ksft_print_msg("si_signo=%d, si_errno=%d, si_code=%d, si_addr=%p\n",
+ si->si_signo, si->si_errno, si->si_code, si->si_addr);
+ if (si->si_code == BUS_MCEERR_AR)
+ ksft_test_result_skip("SEA is claimed by host APEI\n");
+ else
+ ksft_test_result_fail("Exit with signal unhandled\n");
+
+ exit(0);
+}
+
+static void setup_sigbus_handler(void)
+{
+ struct sigaction act;
+
+ memset(&act, 0, sizeof(act));
+ sigemptyset(&act.sa_mask);
+ act.sa_sigaction = sigbus_signal_handler;
+ act.sa_flags = SA_SIGINFO;
+ TEST_ASSERT(sigaction(SIGBUS, &act, NULL) == 0,
+ "Failed to setup SIGBUS handler");
+}
+
+static void guest_code(void)
+{
+ uint64_t guest_data;
+
+ /* Consuming the injected error will cause a SEA. */
+ guest_data = *(uint64_t *)EINJ_GVA;
+
+ GUEST_FAIL("Data corruption not prevented by SEA: gva=%#lx, data=%#lx",
+ EINJ_GVA, guest_data);
+}
+
+static void expect_sea_handler(struct ex_regs *regs)
+{
+ u64 esr = read_sysreg(esr_el1);
+ u64 far = read_sysreg(far_el1);
+ bool expect_far_invalid = far_invalid;
+
+ GUEST_PRINTF("Handling Guest SEA\n");
+ GUEST_PRINTF(" ESR_EL1=%#lx, FAR_EL1=%#lx\n", esr, far);
+ GUEST_PRINTF(" Entire ISS2=%#llx\n", ESR_ELx_ISS2(esr));
+ GUEST_PRINTF(" ISV + ISS[23:14]=%#lx\n", esr & ESR_ELx_INST_SYNDROME);
+ GUEST_PRINTF(" VNCR=%#lx\n", esr & ESR_ELx_VNCR);
+ GUEST_PRINTF(" SET=%#lx\n", esr & ESR_ELx_SET_MASK);
+
+ GUEST_ASSERT_EQ(ESR_ELx_EC(esr), ESR_ELx_EC_DABT_CUR);
+ GUEST_ASSERT_EQ(esr & ESR_ELx_FSC_TYPE, ESR_ELx_FSC_EXTABT);
+
+ /* Assert bits hidden by KVM. */
+ GUEST_ASSERT_EQ(ESR_ELx_ISS2(esr), 0);
+ GUEST_ASSERT_EQ((esr & ESR_ELx_INST_SYNDROME), 0);
+ GUEST_ASSERT_EQ(esr & ESR_ELx_VNCR, 0);
+ GUEST_ASSERT_EQ(esr & ESR_ELx_SET_MASK, ESR_ELx_SET_UER);
+
+ if (expect_far_invalid) {
+ GUEST_ASSERT_EQ(esr & ESR_ELx_FnV, ESR_ELx_FnV);
+ GUEST_PRINTF("Guest observed garbage value in FAR\n");
+ } else {
+ GUEST_ASSERT_EQ(esr & ESR_ELx_FnV, 0);
+ GUEST_ASSERT_EQ(far, EINJ_GVA);
+ }
+
+ GUEST_DONE();
+}
+
+static void vcpu_inject_sea(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_events events = {};
+
+ events.exception.ext_dabt_pending = true;
+ vcpu_events_set(vcpu, &events);
+}
+
+static void run_vm(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
+{
+ struct ucall uc;
+ bool guest_done = false;
+ struct kvm_run *run = vcpu->run;
+
+ /* Resume the vCPU after error injection to consume the error. */
+ vcpu_run(vcpu);
+
+ ksft_print_msg("Dump kvm_run info about KVM_EXIT_%s\n",
+ exit_reason_str(run->exit_reason));
+ ksft_print_msg("kvm_run.arm_sea: esr=%#llx, flags=%#llx\n",
+ run->arm_sea.esr, run->arm_sea.flags);
+ ksft_print_msg("kvm_run.arm_sea: gva=%#llx, gpa=%#llx\n",
+ run->arm_sea.gva, run->arm_sea.gpa);
+
+ /* Validate the KVM_EXIT. */
+ TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_ARM_SEA);
+ TEST_ASSERT_EQ(ESR_ELx_EC(run->arm_sea.esr), ESR_ELx_EC_DABT_LOW);
+ TEST_ASSERT_EQ(run->arm_sea.esr & ESR_ELx_FSC_TYPE, ESR_ELx_FSC_EXTABT);
+ TEST_ASSERT_EQ(run->arm_sea.esr & ESR_ELx_SET_MASK, ESR_ELx_SET_UER);
+
+ if (run->arm_sea.flags & KVM_EXIT_ARM_SEA_FLAG_GVA_VALID)
+ TEST_ASSERT_EQ(run->arm_sea.gva, EINJ_GVA);
+
+ if (run->arm_sea.flags & KVM_EXIT_ARM_SEA_FLAG_GPA_VALID)
+ TEST_ASSERT_EQ(run->arm_sea.gpa, einj_gpa & PAGE_ADDR_MASK);
+
+ far_invalid = run->arm_sea.esr & ESR_ELx_FnV;
+
+ /* Inject a SEA into the guest and expect it handled by the SEA handler. */
+ vcpu_inject_sea(vcpu);
+
+ /* Expect the guest to reach GUEST_DONE gracefully. */
+ do {
+ vcpu_run(vcpu);
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_PRINTF:
+ ksft_print_msg("From guest: %s", uc.buffer);
+ break;
+ case UCALL_DONE:
+ ksft_print_msg("Guest done gracefully!\n");
+ guest_done = 1;
+ break;
+ case UCALL_ABORT:
+ ksft_print_msg("Guest aborted!\n");
+ guest_done = 1;
+ REPORT_GUEST_ASSERT(uc);
+ break;
+ default:
+ TEST_FAIL("Unexpected ucall: %lu\n", uc.cmd);
+ }
+ } while (!guest_done);
+}
+
+static struct kvm_vm *vm_create_with_sea_handler(struct kvm_vcpu **vcpu)
+{
+ size_t backing_page_size;
+ size_t guest_page_size;
+ size_t alignment;
+ uint64_t num_guest_pages;
+ vm_paddr_t start_gpa;
+ enum vm_mem_backing_src_type src_type = VM_MEM_SRC_ANONYMOUS_HUGETLB_1GB;
+ struct kvm_vm *vm;
+
+ backing_page_size = get_backing_src_pagesz(src_type);
+ guest_page_size = vm_guest_mode_params[VM_MODE_DEFAULT].page_size;
+ alignment = max(backing_page_size, guest_page_size);
+ num_guest_pages = VM_MEM_SIZE / guest_page_size;
+
+ vm = __vm_create_with_one_vcpu(vcpu, num_guest_pages, guest_code);
+ vm_init_descriptor_tables(vm);
+ vcpu_init_descriptor_tables(*vcpu);
+
+ vm_install_sync_handler(vm,
+ /*vector=*/VECTOR_SYNC_CURRENT,
+ /*ec=*/ESR_ELx_EC_DABT_CUR,
+ /*handler=*/expect_sea_handler);
+
+ start_gpa = (vm->max_gfn - num_guest_pages) * guest_page_size;
+ start_gpa = align_down(start_gpa, alignment);
+
+ vm_userspace_mem_region_add(
+ /*vm=*/vm,
+ /*src_type=*/src_type,
+ /*guest_paddr=*/start_gpa,
+ /*slot=*/1,
+ /*npages=*/num_guest_pages,
+ /*flags=*/0);
+
+ virt_map(vm, START_GVA, start_gpa, num_guest_pages);
+
+ ksft_print_msg("Mapped %#lx pages: gva=%#lx to gpa=%#lx\n",
+ num_guest_pages, START_GVA, start_gpa);
+ return vm;
+}
+
+static void vm_inject_memory_uer(struct kvm_vm *vm)
+{
+ uint64_t guest_data;
+
+ einj_gpa = addr_gva2gpa(vm, EINJ_GVA);
+ einj_hva = addr_gva2hva(vm, EINJ_GVA);
+
+ /* Populate some data before injecting the UER. */
+ *(uint64_t *)einj_hva = 0xBAADCAFE;
+ guest_data = *(uint64_t *)einj_hva;
+ ksft_print_msg("Before EINJect: data=%#lx\n",
+ guest_data);
+
+ einj_hpa = translate_to_host_paddr((unsigned long)einj_hva);
+
+ ksft_print_msg("EINJ_GVA=%#lx, einj_gpa=%#lx, einj_hva=%p, einj_hpa=%#lx\n",
+ EINJ_GVA, einj_gpa, einj_hva, einj_hpa);
+
+ inject_uer(einj_hpa);
+ ksft_print_msg("Memory UER EINJected\n");
+}
+
+int main(int argc, char *argv[])
+{
+ struct kvm_vm *vm;
+ struct kvm_vcpu *vcpu;
+
+ TEST_REQUIRE(kvm_has_cap(KVM_CAP_ARM_SEA_TO_USER));
+
+ setup_sigbus_handler();
+
+ vm = vm_create_with_sea_handler(&vcpu);
+
+ vm_enable_cap(vm, KVM_CAP_ARM_SEA_TO_USER, 0);
+
+ vm_inject_memory_uer(vm);
+
+ run_vm(vm, vcpu);
+
+ kvm_vm_free(vm);
+
+ return 0;
+}
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 815bc45dd8dc6..bc9fcf6c3295a 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -2021,6 +2021,7 @@ static struct exit_reason {
KVM_EXIT_STRING(NOTIFY),
KVM_EXIT_STRING(LOONGARCH_IOCSR),
KVM_EXIT_STRING(MEMORY_FAULT),
+ KVM_EXIT_STRING(ARM_SEA),
};
/*
--
2.49.0.1266.g31b7d2e469-goog
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v2 5/6] KVM: selftests: Test for KVM_CAP_INJECT_EXT_IABT
2025-06-04 5:08 [PATCH v2 0/6] VMM can handle guest SEA via KVM_EXIT_ARM_SEA Jiaqi Yan
` (3 preceding siblings ...)
2025-06-04 5:08 ` [PATCH v2 4/6] KVM: selftests: Test for KVM_EXIT_ARM_SEA and KVM_CAP_ARM_SEA_TO_USER Jiaqi Yan
@ 2025-06-04 5:09 ` Jiaqi Yan
2025-07-11 19:44 ` Oliver Upton
2025-06-04 5:09 ` [PATCH v2 6/6] Documentation: kvm: new uAPI for handling SEA Jiaqi Yan
5 siblings, 1 reply; 21+ messages in thread
From: Jiaqi Yan @ 2025-06-04 5:09 UTC (permalink / raw)
To: maz, oliver.upton
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton, Jiaqi Yan
Test that userspace can use KVM_SET_VCPU_EVENTS to inject an external
instruction abort into the guest. The test injects the instruction abort
at an arbitrary time without a real SEA occurring in the guest vCPU, so
only certain ESR_EL1 bits are expected and asserted.
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
tools/arch/arm64/include/uapi/asm/kvm.h | 3 +-
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../testing/selftests/kvm/arm64/inject_iabt.c | 98 +++++++++++++++++++
3 files changed, 101 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/kvm/arm64/inject_iabt.c
diff --git a/tools/arch/arm64/include/uapi/asm/kvm.h b/tools/arch/arm64/include/uapi/asm/kvm.h
index af9d9acaf9975..d3a4530846311 100644
--- a/tools/arch/arm64/include/uapi/asm/kvm.h
+++ b/tools/arch/arm64/include/uapi/asm/kvm.h
@@ -184,8 +184,9 @@ struct kvm_vcpu_events {
__u8 serror_pending;
__u8 serror_has_esr;
__u8 ext_dabt_pending;
+ __u8 ext_iabt_pending;
/* Align it to 8 bytes */
- __u8 pad[5];
+ __u8 pad[4];
__u64 serror_esr;
} exception;
__u32 reserved[12];
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 9eecce6b8274f..e6b504ded9c1c 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -149,6 +149,7 @@ TEST_GEN_PROGS_arm64 += arm64/arch_timer_edge_cases
TEST_GEN_PROGS_arm64 += arm64/debug-exceptions
TEST_GEN_PROGS_arm64 += arm64/host_sve
TEST_GEN_PROGS_arm64 += arm64/hypercalls
+TEST_GEN_PROGS_arm64 += arm64/inject_iabt
TEST_GEN_PROGS_arm64 += arm64/mmio_abort
TEST_GEN_PROGS_arm64 += arm64/page_fault_test
TEST_GEN_PROGS_arm64 += arm64/psci_test
diff --git a/tools/testing/selftests/kvm/arm64/inject_iabt.c b/tools/testing/selftests/kvm/arm64/inject_iabt.c
new file mode 100644
index 0000000000000..0c7999e5ba5b3
--- /dev/null
+++ b/tools/testing/selftests/kvm/arm64/inject_iabt.c
@@ -0,0 +1,98 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * inject_iabt.c - Tests for injecting instruction aborts into the guest.
+ */
+
+#include "processor.h"
+#include "test_util.h"
+
+static void expect_iabt_handler(struct ex_regs *regs)
+{
+ u64 esr = read_sysreg(esr_el1);
+
+ GUEST_PRINTF("Handling Guest SEA\n");
+ GUEST_PRINTF(" ESR_EL1=%#lx\n", esr);
+
+ GUEST_ASSERT_EQ(ESR_ELx_EC(esr), ESR_ELx_EC_IABT_CUR);
+ GUEST_ASSERT_EQ(esr & ESR_ELx_FSC_TYPE, ESR_ELx_FSC_EXTABT);
+
+ GUEST_DONE();
+}
+
+static void guest_code(void)
+{
+ GUEST_FAIL("Guest should only run SEA handler");
+}
+
+static void vcpu_run_expect_done(struct kvm_vcpu *vcpu)
+{
+ struct ucall uc;
+ bool guest_done = false;
+
+ do {
+ vcpu_run(vcpu);
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ break;
+ case UCALL_PRINTF:
+ ksft_print_msg("From guest: %s", uc.buffer);
+ break;
+ case UCALL_DONE:
+ ksft_print_msg("Guest done gracefully!\n");
+ guest_done = true;
+ break;
+ default:
+ TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
+ }
+ } while (!guest_done);
+}
+
+static void vcpu_inject_ext_iabt(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_events events = {};
+
+ events.exception.ext_iabt_pending = true;
+ vcpu_events_set(vcpu, &events);
+}
+
+static void vcpu_inject_invalid_abt(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_events events = {};
+ int r;
+
+ events.exception.ext_iabt_pending = true;
+ events.exception.ext_dabt_pending = true;
+
+ ksft_print_msg("Injecting invalid external abort events\n");
+ r = __vcpu_ioctl(vcpu, KVM_SET_VCPU_EVENTS, &events);
+ TEST_ASSERT(r && errno == EINVAL,
+ KVM_IOCTL_ERROR(KVM_SET_VCPU_EVENTS, r));
+}
+
+static void test_inject_iabt(void)
+{
+ struct kvm_vcpu *vcpu;
+ struct kvm_vm *vm;
+
+ vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+
+ vm_init_descriptor_tables(vm);
+ vcpu_init_descriptor_tables(vcpu);
+
+ vm_install_sync_handler(vm, VECTOR_SYNC_CURRENT,
+ ESR_ELx_EC_IABT_CUR, expect_iabt_handler);
+
+ vcpu_inject_invalid_abt(vcpu);
+
+ vcpu_inject_ext_iabt(vcpu);
+ vcpu_run_expect_done(vcpu);
+
+ kvm_vm_free(vm);
+}
+
+int main(void)
+{
+ test_inject_iabt();
+ return 0;
+}
--
2.49.0.1266.g31b7d2e469-goog
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2 5/6] KVM: selftests: Test for KVM_CAP_INJECT_EXT_IABT
2025-06-04 5:09 ` [PATCH v2 5/6] KVM: selftests: Test for KVM_CAP_INJECT_EXT_IABT Jiaqi Yan
@ 2025-07-11 19:44 ` Oliver Upton
2025-07-11 23:59 ` Jiaqi Yan
0 siblings, 1 reply; 21+ messages in thread
From: Oliver Upton @ 2025-07-11 19:44 UTC (permalink / raw)
To: Jiaqi Yan
Cc: maz, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton
On Wed, Jun 04, 2025 at 05:09:00AM +0000, Jiaqi Yan wrote:
> Test userspace can use KVM_SET_VCPU_EVENTS to inject an external
> instruction abort into guest. The test injects instruction abort at an
> arbitrary time without real SEA happening in the guest VCPU, so only
> certain ESR_EL1 bits are expected and asserted.
>
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
I reworked mmio_abort to be a general external abort test, can you add
your test cases there in the next spin (arm64/external_aborts.c)?
Thanks,
Oliver
> ---
> tools/arch/arm64/include/uapi/asm/kvm.h | 3 +-
> tools/testing/selftests/kvm/Makefile.kvm | 1 +
> .../testing/selftests/kvm/arm64/inject_iabt.c | 98 +++++++++++++++++++
> 3 files changed, 101 insertions(+), 1 deletion(-)
> create mode 100644 tools/testing/selftests/kvm/arm64/inject_iabt.c
>
> diff --git a/tools/arch/arm64/include/uapi/asm/kvm.h b/tools/arch/arm64/include/uapi/asm/kvm.h
> index af9d9acaf9975..d3a4530846311 100644
> --- a/tools/arch/arm64/include/uapi/asm/kvm.h
> +++ b/tools/arch/arm64/include/uapi/asm/kvm.h
> @@ -184,8 +184,9 @@ struct kvm_vcpu_events {
> __u8 serror_pending;
> __u8 serror_has_esr;
> __u8 ext_dabt_pending;
> + __u8 ext_iabt_pending;
> /* Align it to 8 bytes */
> - __u8 pad[5];
> + __u8 pad[4];
> __u64 serror_esr;
> } exception;
> __u32 reserved[12];
> diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
> index 9eecce6b8274f..e6b504ded9c1c 100644
> --- a/tools/testing/selftests/kvm/Makefile.kvm
> +++ b/tools/testing/selftests/kvm/Makefile.kvm
> @@ -149,6 +149,7 @@ TEST_GEN_PROGS_arm64 += arm64/arch_timer_edge_cases
> TEST_GEN_PROGS_arm64 += arm64/debug-exceptions
> TEST_GEN_PROGS_arm64 += arm64/host_sve
> TEST_GEN_PROGS_arm64 += arm64/hypercalls
> +TEST_GEN_PROGS_arm64 += arm64/inject_iabt
> TEST_GEN_PROGS_arm64 += arm64/mmio_abort
> TEST_GEN_PROGS_arm64 += arm64/page_fault_test
> TEST_GEN_PROGS_arm64 += arm64/psci_test
> diff --git a/tools/testing/selftests/kvm/arm64/inject_iabt.c b/tools/testing/selftests/kvm/arm64/inject_iabt.c
> new file mode 100644
> index 0000000000000..0c7999e5ba5b3
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/arm64/inject_iabt.c
> @@ -0,0 +1,98 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * inject_iabt.c - Tests for injecting instruction aborts into guest.
> + */
> +
> +#include "processor.h"
> +#include "test_util.h"
> +
> +static void expect_iabt_handler(struct ex_regs *regs)
> +{
> + u64 esr = read_sysreg(esr_el1);
> +
> + GUEST_PRINTF("Handling Guest SEA\n");
> + GUEST_PRINTF(" ESR_EL1=%#lx\n", esr);
> +
> + GUEST_ASSERT_EQ(ESR_ELx_EC(esr), ESR_ELx_EC_IABT_CUR);
> + GUEST_ASSERT_EQ(esr & ESR_ELx_FSC_TYPE, ESR_ELx_FSC_EXTABT);
> +
> + GUEST_DONE();
> +}
> +
> +static void guest_code(void)
> +{
> + GUEST_FAIL("Guest should only run SEA handler");
> +}
> +
> +static void vcpu_run_expect_done(struct kvm_vcpu *vcpu)
> +{
> + struct ucall uc;
> + bool guest_done = false;
> +
> + do {
> + vcpu_run(vcpu);
> + switch (get_ucall(vcpu, &uc)) {
> + case UCALL_ABORT:
> + REPORT_GUEST_ASSERT(uc);
> + break;
> + case UCALL_PRINTF:
> + ksft_print_msg("From guest: %s", uc.buffer);
> + break;
> + case UCALL_DONE:
> + ksft_print_msg("Guest done gracefully!\n");
> + guest_done = true;
> + break;
> + default:
> + TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
> + }
> + } while (!guest_done);
> +}
> +
> +static void vcpu_inject_ext_iabt(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_vcpu_events events = {};
> +
> + events.exception.ext_iabt_pending = true;
> + vcpu_events_set(vcpu, &events);
> +}
> +
> +static void vcpu_inject_invalid_abt(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_vcpu_events events = {};
> + int r;
> +
> + events.exception.ext_iabt_pending = true;
> + events.exception.ext_dabt_pending = true;
> +
> + ksft_print_msg("Injecting invalid external abort events\n");
> + r = __vcpu_ioctl(vcpu, KVM_SET_VCPU_EVENTS, &events);
> + TEST_ASSERT(r && errno == EINVAL,
> + KVM_IOCTL_ERROR(KVM_SET_VCPU_EVENTS, r));
> +}
> +
> +static void test_inject_iabt(void)
> +{
> + struct kvm_vcpu *vcpu;
> + struct kvm_vm *vm;
> +
> + vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> +
> + vm_init_descriptor_tables(vm);
> + vcpu_init_descriptor_tables(vcpu);
> +
> + vm_install_sync_handler(vm, VECTOR_SYNC_CURRENT,
> + ESR_ELx_EC_IABT_CUR, expect_iabt_handler);
> +
> + vcpu_inject_invalid_abt(vcpu);
> +
> + vcpu_inject_ext_iabt(vcpu);
> + vcpu_run_expect_done(vcpu);
> +
> + kvm_vm_free(vm);
> +}
> +
> +int main(void)
> +{
> + test_inject_iabt();
> + return 0;
> +}
> --
> 2.49.0.1266.g31b7d2e469-goog
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 5/6] KVM: selftests: Test for KVM_CAP_INJECT_EXT_IABT
2025-07-11 19:44 ` Oliver Upton
@ 2025-07-11 23:59 ` Jiaqi Yan
0 siblings, 0 replies; 21+ messages in thread
From: Jiaqi Yan @ 2025-07-11 23:59 UTC (permalink / raw)
To: Oliver Upton
Cc: maz, joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton
On Fri, Jul 11, 2025 at 12:45 PM Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Wed, Jun 04, 2025 at 05:09:00AM +0000, Jiaqi Yan wrote:
> > Test userspace can use KVM_SET_VCPU_EVENTS to inject an external
> > instruction abort into guest. The test injects instruction abort at an
> > arbitrary time without real SEA happening in the guest VCPU, so only
> > certain ESR_EL1 bits are expected and asserted.
> >
> > Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
>
> I reworked mmio_abort to be a general external abort test, can you add
> your test cases there in the next spin (arm64/external_aborts.c)?
For sure!
>
> Thanks,
> Oliver
>
> > ---
> > tools/arch/arm64/include/uapi/asm/kvm.h | 3 +-
> > tools/testing/selftests/kvm/Makefile.kvm | 1 +
> > .../testing/selftests/kvm/arm64/inject_iabt.c | 98 +++++++++++++++++++
> > 3 files changed, 101 insertions(+), 1 deletion(-)
> > create mode 100644 tools/testing/selftests/kvm/arm64/inject_iabt.c
> >
> > diff --git a/tools/arch/arm64/include/uapi/asm/kvm.h b/tools/arch/arm64/include/uapi/asm/kvm.h
> > index af9d9acaf9975..d3a4530846311 100644
> > --- a/tools/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/tools/arch/arm64/include/uapi/asm/kvm.h
> > @@ -184,8 +184,9 @@ struct kvm_vcpu_events {
> > __u8 serror_pending;
> > __u8 serror_has_esr;
> > __u8 ext_dabt_pending;
> > + __u8 ext_iabt_pending;
> > /* Align it to 8 bytes */
> > - __u8 pad[5];
> > + __u8 pad[4];
> > __u64 serror_esr;
> > } exception;
> > __u32 reserved[12];
> > diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
> > index 9eecce6b8274f..e6b504ded9c1c 100644
> > --- a/tools/testing/selftests/kvm/Makefile.kvm
> > +++ b/tools/testing/selftests/kvm/Makefile.kvm
> > @@ -149,6 +149,7 @@ TEST_GEN_PROGS_arm64 += arm64/arch_timer_edge_cases
> > TEST_GEN_PROGS_arm64 += arm64/debug-exceptions
> > TEST_GEN_PROGS_arm64 += arm64/host_sve
> > TEST_GEN_PROGS_arm64 += arm64/hypercalls
> > +TEST_GEN_PROGS_arm64 += arm64/inject_iabt
> > TEST_GEN_PROGS_arm64 += arm64/mmio_abort
> > TEST_GEN_PROGS_arm64 += arm64/page_fault_test
> > TEST_GEN_PROGS_arm64 += arm64/psci_test
> > diff --git a/tools/testing/selftests/kvm/arm64/inject_iabt.c b/tools/testing/selftests/kvm/arm64/inject_iabt.c
> > new file mode 100644
> > index 0000000000000..0c7999e5ba5b3
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/arm64/inject_iabt.c
> > @@ -0,0 +1,98 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * inject_iabt.c - Tests for injecting instruction aborts into guest.
> > + */
> > +
> > +#include "processor.h"
> > +#include "test_util.h"
> > +
> > +static void expect_iabt_handler(struct ex_regs *regs)
> > +{
> > + u64 esr = read_sysreg(esr_el1);
> > +
> > + GUEST_PRINTF("Handling Guest SEA\n");
> > + GUEST_PRINTF(" ESR_EL1=%#lx\n", esr);
> > +
> > + GUEST_ASSERT_EQ(ESR_ELx_EC(esr), ESR_ELx_EC_IABT_CUR);
> > + GUEST_ASSERT_EQ(esr & ESR_ELx_FSC_TYPE, ESR_ELx_FSC_EXTABT);
> > +
> > + GUEST_DONE();
> > +}
> > +
> > +static void guest_code(void)
> > +{
> > + GUEST_FAIL("Guest should only run SEA handler");
> > +}
> > +
> > +static void vcpu_run_expect_done(struct kvm_vcpu *vcpu)
> > +{
> > + struct ucall uc;
> > + bool guest_done = false;
> > +
> > + do {
> > + vcpu_run(vcpu);
> > + switch (get_ucall(vcpu, &uc)) {
> > + case UCALL_ABORT:
> > + REPORT_GUEST_ASSERT(uc);
> > + break;
> > + case UCALL_PRINTF:
> > + ksft_print_msg("From guest: %s", uc.buffer);
> > + break;
> > + case UCALL_DONE:
> > + ksft_print_msg("Guest done gracefully!\n");
> > + guest_done = true;
> > + break;
> > + default:
> > + TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
> > + }
> > + } while (!guest_done);
> > +}
> > +
> > +static void vcpu_inject_ext_iabt(struct kvm_vcpu *vcpu)
> > +{
> > + struct kvm_vcpu_events events = {};
> > +
> > + events.exception.ext_iabt_pending = true;
> > + vcpu_events_set(vcpu, &events);
> > +}
> > +
> > +static void vcpu_inject_invalid_abt(struct kvm_vcpu *vcpu)
> > +{
> > + struct kvm_vcpu_events events = {};
> > + int r;
> > +
> > + events.exception.ext_iabt_pending = true;
> > + events.exception.ext_dabt_pending = true;
> > +
> > + ksft_print_msg("Injecting invalid external abort events\n");
> > + r = __vcpu_ioctl(vcpu, KVM_SET_VCPU_EVENTS, &events);
> > + TEST_ASSERT(r && errno == EINVAL,
> > + KVM_IOCTL_ERROR(KVM_SET_VCPU_EVENTS, r));
> > +}
> > +
> > +static void test_inject_iabt(void)
> > +{
> > + struct kvm_vcpu *vcpu;
> > + struct kvm_vm *vm;
> > +
> > + vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> > +
> > + vm_init_descriptor_tables(vm);
> > + vcpu_init_descriptor_tables(vcpu);
> > +
> > + vm_install_sync_handler(vm, VECTOR_SYNC_CURRENT,
> > + ESR_ELx_EC_IABT_CUR, expect_iabt_handler);
> > +
> > + vcpu_inject_invalid_abt(vcpu);
> > +
> > + vcpu_inject_ext_iabt(vcpu);
> > + vcpu_run_expect_done(vcpu);
> > +
> > + kvm_vm_free(vm);
> > +}
> > +
> > +int main(void)
> > +{
> > + test_inject_iabt();
> > + return 0;
> > +}
> > --
> > 2.49.0.1266.g31b7d2e469-goog
> >
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v2 6/6] Documentation: kvm: new uAPI for handling SEA
2025-06-04 5:08 [PATCH v2 0/6] VMM can handle guest SEA via KVM_EXIT_ARM_SEA Jiaqi Yan
` (4 preceding siblings ...)
2025-06-04 5:09 ` [PATCH v2 5/6] KVM: selftests: Test for KVM_CAP_INJECT_EXT_IABT Jiaqi Yan
@ 2025-06-04 5:09 ` Jiaqi Yan
5 siblings, 0 replies; 21+ messages in thread
From: Jiaqi Yan @ 2025-06-04 5:09 UTC (permalink / raw)
To: maz, oliver.upton
Cc: joey.gouly, suzuki.poulose, yuzenghui, catalin.marinas, will,
pbonzini, corbet, shuah, kvm, kvmarm, linux-arm-kernel,
linux-kernel, linux-doc, linux-kselftest, duenwen, rananta,
jthoughton, Jiaqi Yan
Document the new userspace-visible features and APIs for handling
synchronous external abort (SEA):
- KVM_CAP_ARM_SEA_TO_USER: How userspace enables the new feature.
- KVM_EXIT_ARM_SEA: When userspace needs to handle a SEA, and what
information userspace receives with the exit.
- KVM_CAP_ARM_INJECT_EXT_(D|I)ABT: How userspace injects a SEA into the
guest while handling it.
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
Documentation/virt/kvm/api.rst | 128 +++++++++++++++++++++++++++++----
1 file changed, 115 insertions(+), 13 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index fe3d6b5d2acca..c58ecb72a4b4d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1236,8 +1236,9 @@ directly to the virtual CPU).
__u8 serror_pending;
__u8 serror_has_esr;
__u8 ext_dabt_pending;
+ __u8 ext_iabt_pending;
/* Align it to 8 bytes */
- __u8 pad[5];
+ __u8 pad[4];
__u64 serror_esr;
} exception;
__u32 reserved[12];
@@ -1292,20 +1293,57 @@ ARM64:
User space may need to inject several types of events to the guest.
+Inject SError
+~~~~~~~~~~~~~
+
Set the pending SError exception state for this VCPU. It is not possible to
'cancel' an Serror that has been made pending.
-If the guest performed an access to I/O memory which could not be handled by
-userspace, for example because of missing instruction syndrome decode
-information or because there is no device mapped at the accessed IPA, then
-userspace can ask the kernel to inject an external abort using the address
-from the exiting fault on the VCPU. It is a programming error to set
-ext_dabt_pending after an exit which was not either KVM_EXIT_MMIO or
-KVM_EXIT_ARM_NISV. This feature is only available if the system supports
-KVM_CAP_ARM_INJECT_EXT_DABT. This is a helper which provides commonality in
-how userspace reports accesses for the above cases to guests, across different
-userspace implementations. Nevertheless, userspace can still emulate all Arm
-exceptions by manipulating individual registers using the KVM_SET_ONE_REG API.
+Inject SEA (synchronous external abort)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- If the guest performed an access to I/O memory which could not be handled by
+ userspace, for example because of missing instruction syndrome decode
+ information or because there is no device mapped at the accessed IPA.
+
+- If the guest consumed an uncorrected memory error, and the RAS extension in
+  the Trusted Firmware chooses to notify the PE with a SEA, KVM has to handle
+  it when the host APEI is unable to claim the SEA. For the following types of
+  faults, if userspace has enabled KVM_CAP_ARM_SEA_TO_USER, KVM returns to
+  userspace with KVM_EXIT_ARM_SEA:
+
+ - Synchronous external abort, not on translation table walk or hardware
+ update of translation table.
+
+ - Synchronous external abort on stage-1 translation table walk or hardware
+ update of stage-1 translation table, including all levels.
+
+ - Synchronous parity or ECC error on memory access, not on translation table
+ walk.
+
+ - Synchronous parity or ECC error on memory access on stage-1 translation
+ table walk or hardware update of stage-1 translation table, including
+ all levels.
+
+Note that an external abort or ECC error on a stage-2 translation table walk
+or hardware update of the stage-2 translation table does not result in
+KVM_EXIT_ARM_SEA, even if KVM_CAP_ARM_SEA_TO_USER is enabled.
+
+For the cases above, userspace can ask the kernel to replay either an external
+data abort (by setting ext_dabt_pending) or an external instruction abort
+(by setting ext_iabt_pending) into the faulting VCPU. KVM will use the address
+from the exiting fault on the VCPU. Setting both ext_dabt_pending and
+ext_iabt_pending at the same time will return -EINVAL.
+
+It is a programming error to set ext_dabt_pending or ext_iabt_pending after an
+exit which was not KVM_EXIT_MMIO, KVM_EXIT_ARM_NISV or KVM_EXIT_ARM_SEA.
+Injecting an external data or instruction abort is only available if KVM
+supports KVM_CAP_ARM_INJECT_EXT_DABT and KVM_CAP_ARM_INJECT_EXT_IABT.
+
+This is a helper which provides commonality in how userspace reports accesses
+for the above cases to guests, across different userspace implementations.
+Nevertheless, userspace can still emulate all Arm exceptions by manipulating
+individual registers using the KVM_SET_ONE_REG API.
See KVM_GET_VCPU_EVENTS for the data structure.
@@ -7163,6 +7201,58 @@ The valid value for 'flags' is:
- KVM_NOTIFY_CONTEXT_INVALID -- the VM context is corrupted and not valid
in VMCS. It would run into unknown result if resume the target VM.
+::
+
+ /* KVM_EXIT_ARM_SEA */
+ struct {
+ __u64 esr;
+ #define KVM_EXIT_ARM_SEA_FLAG_GVA_VALID (1ULL << 0)
+ #define KVM_EXIT_ARM_SEA_FLAG_GPA_VALID (1ULL << 1)
+ __u64 flags;
+ __u64 gva;
+ __u64 gpa;
+ } arm_sea;
+
+Used on arm64 systems. When the VM capability KVM_CAP_ARM_SEA_TO_USER is
+enabled, a VM exit is generated if the guest causes a synchronous external
+abort (SEA) and the host APEI fails to handle it.
+
+Historically KVM handles a SEA by first delegating it to the host APEI, as
+there is a high chance that the SEA is caused by consuming an uncorrected
+memory error. However, not all platforms support SEA handling in APEI, and
+KVM's fallback is to inject an async SError into the guest, which usually
+panics the guest kernel. As an alternative, userspace can participate in
+the SEA handling by enabling KVM_CAP_ARM_SEA_TO_USER at VM creation, after
+querying the capability. Once enabled, when KVM has to handle a guest-caused
+SEA, it returns to userspace with KVM_EXIT_ARM_SEA, with details about the
+SEA available in 'arm_sea'.
+
+The 'esr' field holds the value of the exception syndrome register (ESR)
+taken when KVM handled the SEA, which tells userspace the nature of the SEA,
+such as its Exception Class, Synchronous Error Type, Fault Status Code and
+so on. For more details on ESR, check the Arm Architecture Registers
+documentation.
+
+The 'flags' field indicates if the faulting addresses are valid while taking
+the SEA:
+
+ - KVM_EXIT_ARM_SEA_FLAG_GVA_VALID -- the faulting guest virtual address
+ is valid and userspace can get its value in the 'gva' field.
+ - KVM_EXIT_ARM_SEA_FLAG_GPA_VALID -- the faulting guest physical address
+ is valid and userspace can get its value in the 'gpa' field.
+
+Userspace needs to handle the guest SEA synchronously, namely in the same
+thread that runs KVM_RUN and receives KVM_EXIT_ARM_SEA. One encouraged
+approach is to use KVM_SET_VCPU_EVENTS to inject the SEA into the faulting
+VCPU. This way, the guest has the opportunity to keep running, limiting the
+blast radius of the SEA to the particular guest application that caused it.
+If the Exception Class indicated by the 'esr' field in 'arm_sea' is a data
+abort, userspace should inject a data abort. If the Exception Class is an
+instruction abort, userspace should inject an instruction abort. Userspace
+may also emulate the SEA itself using the KVM_SET_ONE_REG API. In this
+case, it can use the valid values from the 'gva' and 'gpa' fields to set
+the VCPU's registers (e.g. FAR_EL1, HPFAR_EL1).
+
::
/* Fix the size of the union. */
@@ -8490,7 +8580,7 @@ ENOSYS for the others.
When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.
-7.37 KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
+7.42 KVM_CAP_ARM_WRITABLE_IMP_ID_REGS
-------------------------------------
:Architectures: arm64
@@ -8508,6 +8598,18 @@ aforementioned registers before the first KVM_RUN. These registers are VM
scoped, meaning that the same set of values are presented on all vCPUs in a
given VM.
+7.43 KVM_CAP_ARM_SEA_TO_USER
+----------------------------
+
+:Architecture: arm64
+:Target: VM
+:Parameters: none
+:Returns: 0 on success, -EINVAL if unsupported.
+
+This capability, if KVM_CHECK_EXTENSION indicates that it is available, means
+that KVM allows userspace to participate in handling a synchronous external
+abort caused by the VM, via an exit of KVM_EXIT_ARM_SEA.
+
8. Other capabilities.
======================
--
2.49.0.1266.g31b7d2e469-goog
^ permalink raw reply related [flat|nested] 21+ messages in thread