linux-arm-kernel.lists.infradead.org archive mirror
From: Jose Marinho <jose.marinho@arm.com>
To: Jiaqi Yan <jiaqiyan@google.com>, maz@kernel.org, oliver.upton@linux.dev
Cc: duenwen@google.com, rananta@google.com, jthoughton@google.com,
	vsethi@nvidia.com, jgg@nvidia.com, joey.gouly@arm.com,
	suzuki.poulose@arm.com, yuzenghui@huawei.com,
	catalin.marinas@arm.com, will@kernel.org, pbonzini@redhat.com,
	corbet@lwn.net, shuah@kernel.org, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v4 1/3] KVM: arm64: VM exit to userspace to handle SEA
Date: Mon, 3 Nov 2025 18:17:00 +0000
Message-ID: <7a61bcf9-a57d-a8e9-a9b8-4eacef80acd3@arm.com>
In-Reply-To: <20251013185903.1372553-2-jiaqiyan@google.com>

Thank you for these patches.

On 10/13/2025 7:59 PM, Jiaqi Yan wrote:
> When APEI fails to handle a stage-2 synchronous external abort (SEA),
> today KVM injects an asynchronous SError to the VCPU then resumes it,
> which usually results in unpleasant guest kernel panic.
> 
> One major situation of guest SEA is when vCPU consumes recoverable
> uncorrected memory error (UER). Although SError and guest kernel panic
> effectively stops the propagation of corrupted memory, guest may
> re-use the corrupted memory if auto-rebooted; in worse case, guest
> boot may run into poisoned memory. So there is room to recover from
> an UER in a more graceful manner.
> 
> Alternatively KVM can redirect the synchronous SEA event to VMM to
> - Reduce blast radius if possible. VMM can inject a SEA to VCPU via
>    KVM's existing KVM_SET_VCPU_EVENTS API. If the memory poison
>    consumption or fault is not from guest kernel, blast radius can be
>    limited to the triggering thread in guest userspace, so VM can
>    keep running.
> - Allow VMM to protect from future memory poison consumption by
>    unmapping the page from stage-2, or to interrupt guest of the
>    poisoned page so guest kernel can unmap it from stage-1 page table.
> - Allow VMM to track SEA events that VM customers care about, to restart
>    VM when certain number of distinct poison events have happened,
>    to provide observability to customers in log management UI.
> 
> Introduce an userspace-visible feature to enable VMM handle SEA:
> - KVM_CAP_ARM_SEA_TO_USER. As the alternative fallback behavior
>    when host APEI fails to claim a SEA, userspace can opt in this new
>    capability to let KVM exit to userspace during SEA if it is not
>    owned by host.
> - KVM_EXIT_ARM_SEA. A new exit reason is introduced for this.
>    KVM fills kvm_run.arm_sea with as much as possible information about
>    the SEA, enabling VMM to emulate SEA to guest by itself.
>    - Sanitized ESR_EL2. The general rule is to keep only the bits
>      useful for userspace and relevant to guest memory.
>    - Flags indicating if faulting guest physical address is valid.
>    - Faulting guest physical and virtual addresses if valid.
> 
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>   arch/arm64/include/asm/kvm_host.h |  2 +
>   arch/arm64/kvm/arm.c              |  5 +++
>   arch/arm64/kvm/mmu.c              | 68 ++++++++++++++++++++++++++++++-
>   include/uapi/linux/kvm.h          | 10 +++++
>   4 files changed, 84 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index b763293281c88..e2c65b14e60c4 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -350,6 +350,8 @@ struct kvm_arch {
>   #define KVM_ARCH_FLAG_GUEST_HAS_SVE			9
>   	/* MIDR_EL1, REVIDR_EL1, and AIDR_EL1 are writable from userspace */
>   #define KVM_ARCH_FLAG_WRITABLE_IMP_ID_REGS		10
> +	/* Unhandled SEAs are taken to userspace */
> +#define KVM_ARCH_FLAG_EXIT_SEA				11
>   	unsigned long flags;
>   
>   	/* VM-wide vCPU feature set */
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index f21d1b7f20f8e..888600df79c40 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -132,6 +132,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>   		}
>   		mutex_unlock(&kvm->lock);
>   		break;
> +	case KVM_CAP_ARM_SEA_TO_USER:
> +		r = 0;
> +		set_bit(KVM_ARCH_FLAG_EXIT_SEA, &kvm->arch.flags);
> +		break;
>   	default:
>   		break;
>   	}
> @@ -327,6 +331,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   	case KVM_CAP_IRQFD_RESAMPLE:
>   	case KVM_CAP_COUNTER_OFFSET:
>   	case KVM_CAP_ARM_WRITABLE_IMP_ID_REGS:
> +	case KVM_CAP_ARM_SEA_TO_USER:
>   		r = 1;
>   		break;
>   	case KVM_CAP_SET_GUEST_DEBUG2:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 7cc964af8d305..09210b6ab3907 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1899,8 +1899,48 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
>   	read_unlock(&vcpu->kvm->mmu_lock);
>   }
>   
> +/*
> + * Returns true if the SEA should be handled locally within KVM if the abort
> + * is caused by a kernel memory allocation (e.g. stage-2 table memory).
> + */
> +static bool host_owns_sea(struct kvm_vcpu *vcpu, u64 esr)
> +{
> +	/*
> +	 * Without FEAT_RAS HCR_EL2.TEA is RES0, meaning any external abort
> +	 * taken from a guest EL to EL2 is due to a host-imposed access (e.g.
> +	 * stage-2 PTW).
> +	 */
> +	if (!cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
> +		return true;
> +
> +	/* KVM owns the VNCR when the vCPU isn't in a nested context. */
> +	if (is_hyp_ctxt(vcpu) && (esr & ESR_ELx_VNCR))
Is this check valid only for a "Data Abort"?
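To illustrate the question: one option would be to gate the VNCR test on the exception class. The following is only a standalone sketch, not a proposed change. The ESR field values are copied locally, and the helper names esr_is_data_abort_low() and sea_hits_vncr() are made up for illustration and are not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the arm64 ESR_ELx field encodings used below. */
#define ESR_ELx_EC_SHIFT	26
#define ESR_ELx_EC_MASK		(UINT64_C(0x3F) << ESR_ELx_EC_SHIFT)
#define ESR_ELx_EC_DABT_LOW	UINT64_C(0x24)	/* Data Abort from a lower EL */
#define ESR_ELx_VNCR		(UINT64_C(1) << 13)

/* Hypothetical helper: true only for a Data Abort taken from a lower EL. */
static bool esr_is_data_abort_low(uint64_t esr)
{
	return ((esr & ESR_ELx_EC_MASK) >> ESR_ELx_EC_SHIFT) ==
	       ESR_ELx_EC_DABT_LOW;
}

/*
 * Hypothetical variant of the VNCR test in host_owns_sea(): the VNCR bit
 * is only consulted when the exception class says this is a Data Abort.
 * in_hyp_ctxt stands in for is_hyp_ctxt(vcpu).
 */
static bool sea_hits_vncr(uint64_t esr, bool in_hyp_ctxt)
{
	return in_hyp_ctxt && esr_is_data_abort_low(esr) &&
	       (esr & ESR_ELx_VNCR) != 0;
}
```

If bit 13 is guaranteed to read as zero for Instruction Aborts, the extra exception class check is redundant and the code can stay as it is.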
> +		return true;
> +
> +	/*
> +	 * Determine if an external abort during a table walk happened at
> +	 * stage-2 is only possible with S1PTW is set. Otherwise, since KVM
nit: Is the first sentence correct?

> +	 * sets HCR_EL2.TEA, SEAs due to a stage-1 walk (i.e. accessing the
> +	 * PA of the stage-1 descriptor) can reach here and are reported
> +	 * with a TTW ESR value.
> +	 */
> +	return (esr_fsc_is_sea_ttw(esr) && (esr & ESR_ELx_S1PTW));
> +}
> +
>   int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
>   {
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_run *run = vcpu->run;
> +	u64 esr = kvm_vcpu_get_esr(vcpu);
> +	u64 esr_mask = ESR_ELx_EC_MASK	|
> +		       ESR_ELx_IL	|
> +		       ESR_ELx_FnV	|
> +		       ESR_ELx_EA	|
> +		       ESR_ELx_CM	|
> +		       ESR_ELx_WNR	|
> +		       ESR_ELx_FSC;
> +	u64 ipa;
> +
>   	/*
>   	 * Give APEI the opportunity to claim the abort before handling it
>   	 * within KVM. apei_claim_sea() expects to be called with IRQs enabled.
> @@ -1909,7 +1949,33 @@ int kvm_handle_guest_sea(struct kvm_vcpu *vcpu)
>   	if (apei_claim_sea(NULL) == 0)
>   		return 1;
>   
> -	return kvm_inject_serror(vcpu);
> +	if (host_owns_sea(vcpu, esr) ||
> +	    !test_bit(KVM_ARCH_FLAG_EXIT_SEA, &vcpu->kvm->arch.flags))
> +		return kvm_inject_serror(vcpu);
> +
> +	/* ESR_ELx.SET is RES0 when FEAT_RAS isn't implemented. */
> +	if (kvm_has_ras(kvm))
> +		esr_mask |= ESR_ELx_SET_MASK;
> +
> +	/*
> +	 * Exit to userspace, and provide faulting guest virtual and physical
> +	 * addresses in case userspace wants to emulate SEA to guest by
> +	 * writing to FAR_ELx and HPFAR_ELx registers.
> +	 */
> +	memset(&run->arm_sea, 0, sizeof(run->arm_sea));
> +	run->exit_reason = KVM_EXIT_ARM_SEA;
> +	run->arm_sea.esr = esr & esr_mask;
> +
> +	if (!(esr & ESR_ELx_FnV))
> +		run->arm_sea.gva = kvm_vcpu_get_hfar(vcpu);
> +
> +	ipa = kvm_vcpu_get_fault_ipa(vcpu);
> +	if (ipa != INVALID_GPA) {
> +		run->arm_sea.flags |= KVM_EXIT_ARM_SEA_FLAG_GPA_VALID;
> +		run->arm_sea.gpa = ipa;

Are we interested in the value of PFAR_EL2 (if FEAT_PFAR is implemented)?
I believe kvm_vcpu_get_fault_ipa() reads HPFAR_EL2, which is valid for
stage-2 translation and GPC faults, but UNKNOWN in other cases.
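
To make the concern concrete: with this ABI, userspace can only tell a usable address from the flag and from ESR.FnV, so I would expect a VMM to consume the exit roughly as in the sketch below. The struct mirrors the arm_sea layout proposed in this patch. struct sea_addrs and sea_decode_addrs() are hypothetical names used only for illustration. PFAR_EL2 is not part of the proposed ABI, so nothing here covers it.

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_EXIT_ARM_SEA_FLAG_GPA_VALID	(1ULL << 0)	/* from this patch */
#define ESR_ELx_FnV			(1ULL << 10)	/* FAR not valid */

/* Mirrors the layout of kvm_run.arm_sea as proposed in this patch. */
struct arm_sea_exit {
	uint64_t flags;
	uint64_t esr;
	uint64_t gva;
	uint64_t gpa;
};

/* Hypothetical VMM-side view: each address carries its own validity. */
struct sea_addrs {
	bool gva_valid;
	bool gpa_valid;
	uint64_t gva;
	uint64_t gpa;
};

/* Only report an address when KVM marked it as valid. */
static struct sea_addrs sea_decode_addrs(const struct arm_sea_exit *sea)
{
	struct sea_addrs out = { 0 };

	/* The GVA is meaningful only when ESR.FnV is clear. */
	if (!(sea->esr & ESR_ELx_FnV)) {
		out.gva_valid = true;
		out.gva = sea->gva;
	}

	/* The GPA is meaningful only when the GPA_VALID flag is set. */
	if (sea->flags & KVM_EXIT_ARM_SEA_FLAG_GPA_VALID) {
		out.gpa_valid = true;
		out.gpa = sea->gpa;
	}

	return out;
}
```

A VMM written this way never acts on gpa unless KVM set the flag, so the question is really whether the flag is set only when HPFAR_EL2 holds a meaningful value.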

Jose

> +	}
> +
> +	return 0;
>   }
>   
>   /**
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6efa98a57ec11..acc7b3a346992 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -179,6 +179,7 @@ struct kvm_xen_exit {
>   #define KVM_EXIT_LOONGARCH_IOCSR  38
>   #define KVM_EXIT_MEMORY_FAULT     39
>   #define KVM_EXIT_TDX              40
> +#define KVM_EXIT_ARM_SEA          41
>   
>   /* For KVM_EXIT_INTERNAL_ERROR */
>   /* Emulate instruction failed. */
> @@ -473,6 +474,14 @@ struct kvm_run {
>   				} setup_event_notify;
>   			};
>   		} tdx;
> +		/* KVM_EXIT_ARM_SEA */
> +		struct {
> +#define KVM_EXIT_ARM_SEA_FLAG_GPA_VALID	(1ULL << 0)
> +			__u64 flags;
> +			__u64 esr;
> +			__u64 gva;
> +			__u64 gpa;
> +		} arm_sea;
>   		/* Fix the size of the union. */
>   		char padding[256];
>   	};
> @@ -963,6 +972,7 @@ struct kvm_enable_cap {
>   #define KVM_CAP_RISCV_MP_STATE_RESET 242
>   #define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243
>   #define KVM_CAP_GUEST_MEMFD_MMAP 244
> +#define KVM_CAP_ARM_SEA_TO_USER 245
>   
>   struct kvm_irq_routing_irqchip {
>   	__u32 irqchip;

