Date: Fri, 20 Mar 2026 16:35:44 +0000
Message-ID: <86341u5uhr.wl-maz@kernel.org>
From: Marc Zyngier
To: Will Deacon
Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Quentin Perret, Fuad Tabba, Vincent Donnefort,
	Mostafa Saleh, Alexandru Elisei
Subject: Re: [PATCH v3 26/36] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte
In-Reply-To: <20260305144351.17071-27-will@kernel.org>
References: <20260305144351.17071-1-will@kernel.org> <20260305144351.17071-27-will@kernel.org>

On Thu, 05 Mar 2026 14:43:39 +0000,
Will Deacon wrote:
> 
> If a protected vCPU faults on an IPA which appears to be mapped, query
> the hypervisor to determine whether or not the faulting pte has been
> poisoned by a forceful reclaim. If the pte has been poisoned, return
> -EFAULT back to userspace rather than retrying the instruction forever.
> 
> Signed-off-by: Will Deacon
> ---
>  arch/arm64/include/asm/kvm_asm.h              |  1 +
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 10 +++++
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 43 +++++++++++++++++++
>  arch/arm64/kvm/pkvm.c                         |  9 ++--
>  5 files changed, 61 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 04a230e906a7..6c79f7504d80 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -90,6 +90,7 @@ enum __kvm_host_smccc_func {
>  	__KVM_HOST_SMCCC_FUNC___pkvm_unreserve_vm,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
> +	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_in_poison_fault,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_force_reclaim_guest_page,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_reclaim_dying_guest_page,
>  	__KVM_HOST_SMCCC_FUNC___pkvm_start_teardown_vm,
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index f27b037abaf3..5e6cdafcdd69 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -41,6 +41,7 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
>  int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
>  int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
>  int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu);
> +int __pkvm_vcpu_in_poison_fault(struct pkvm_hyp_vcpu *hyp_vcpu);
>  int __pkvm_host_force_reclaim_page_guest(phys_addr_t phys);
>  int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm);
>  int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 456c83207717..90e3b14fe287 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -573,6 +573,15 @@ static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
>  	cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
>  }
>  
> +static void handle___pkvm_vcpu_in_poison_fault(struct kvm_cpu_context *host_ctxt)
> +{
> +	int ret;
> +	struct pkvm_hyp_vcpu *hyp_vcpu = pkvm_get_loaded_hyp_vcpu();
> +
> +	ret = hyp_vcpu ? __pkvm_vcpu_in_poison_fault(hyp_vcpu) : -EINVAL;
> +	cpu_reg(host_ctxt, 1) = ret;
> +}
> +
>  static void handle___pkvm_force_reclaim_guest_page(struct kvm_cpu_context *host_ctxt)
>  {
>  	DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
> @@ -641,6 +650,7 @@ static const hcall_t host_hcall[] = {
>  	HANDLE_FUNC(__pkvm_unreserve_vm),
>  	HANDLE_FUNC(__pkvm_init_vm),
>  	HANDLE_FUNC(__pkvm_init_vcpu),
> +	HANDLE_FUNC(__pkvm_vcpu_in_poison_fault),
>  	HANDLE_FUNC(__pkvm_force_reclaim_guest_page),
>  	HANDLE_FUNC(__pkvm_reclaim_dying_guest_page),
>  	HANDLE_FUNC(__pkvm_start_teardown_vm),
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 4ff31947579b..7f705f662c40 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -890,6 +890,49 @@ static int get_valid_guest_pte(struct pkvm_hyp_vm *vm, u64 ipa, kvm_pte_t *ptep,
>  	return 0;
>  }
>  
> +int __pkvm_vcpu_in_poison_fault(struct pkvm_hyp_vcpu *hyp_vcpu)
> +{
> +	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
> +	kvm_pte_t pte;
> +	s8 level;
> +	u64 ipa;
> +	int ret;
> +
> +	switch (kvm_vcpu_trap_get_class(&hyp_vcpu->vcpu)) {
> +	case ESR_ELx_EC_DABT_LOW:
> +	case ESR_ELx_EC_IABT_LOW:
> +		if (kvm_vcpu_trap_is_translation_fault(&hyp_vcpu->vcpu))
> +			break;
> +		fallthrough;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * The host has the faulting IPA when it calls us from the guest
> +	 * fault handler but we retrieve it ourselves from the FAR so as
> +	 * to avoid exposing an "oracle" that could reveal data access
> +	 * patterns of the guest after initial donation of its pages.
> +	 */
> +	ipa = kvm_vcpu_get_fault_ipa(&hyp_vcpu->vcpu);
> +	ipa |= kvm_vcpu_get_hfar(&hyp_vcpu->vcpu) & GENMASK(11, 0);

nit: we now have FAR_TO_FIPA_OFFSET() for this.

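For illustration, the substitution could look roughly like the sketch
below. It is a userspace model only, not the kernel code: the two macros
are local stand-ins that are assumed to match the open-coded
GENMASK(11, 0) mask in the hunk above, and compose_fault_ipa() is a
hypothetical helper name used purely for the example.

```c
#include <stdint.h>

/* Local stand-ins (assumptions): userspace copies of kernel-style helpers. */
#define GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))
#define FAR_TO_FIPA_OFFSET(far)	((far) & GENMASK_ULL(11, 0))

/*
 * Hypothetical helper: combine the page-aligned faulting IPA with the
 * in-page offset (low 12 bits) taken from the faulting address.
 */
static inline uint64_t compose_fault_ipa(uint64_t fault_ipa, uint64_t far)
{
	return fault_ipa | FAR_TO_FIPA_OFFSET(far);
}
```

Assuming the macro is visible at EL2, the quoted line would then reduce
to an OR of the fault IPA with FAR_TO_FIPA_OFFSET() applied to the value
returned by kvm_vcpu_get_hfar().
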
> +
> +	guest_lock_component(vm);
> +	ret = kvm_pgtable_get_leaf(&vm->pgt, ipa, &pte, &level);
> +	if (ret)
> +		goto unlock;
> +
> +	if (level != KVM_PGTABLE_LAST_LEVEL) {
> +		ret = -EINVAL;
> +		goto unlock;
> +	}
> +
> +	ret = guest_pte_is_poisoned(pte);
> +unlock:
> +	guest_unlock_component(vm);
> +	return ret;
> +}
> +
>  int __pkvm_host_share_hyp(u64 pfn)
>  {
>  	u64 phys = hyp_pfn_to_phys(pfn);
> diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
> index 32294bd21dde..da0a45dab203 100644
> --- a/arch/arm64/kvm/pkvm.c
> +++ b/arch/arm64/kvm/pkvm.c
> @@ -417,10 +417,13 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
>  		return -EINVAL;
>  
>  	/*
> -	 * We raced with another vCPU.
> +	 * We either raced with another vCPU or the guest PTE
> +	 * has been poisoned by an erroneous host access.
>  	 */
> -	if (mapping)
> -		return -EAGAIN;
> +	if (mapping) {
> +		ret = kvm_call_hyp_nvhe(__pkvm_vcpu_in_poison_fault);
> +		return ret ? -EFAULT : -EAGAIN;
> +	}

I guess this considers that racing against another vcpu is an unlikely
situation, because calling back into EL2 and walking the PTs isn't
exactly cheap.

I wonder if there is a mechanism we could use to directly return this
information to the host at the point of the guest fault. The only
things I can figure out would require the PTE to be valid (access or
permission faults, for example), and that'd break the "full PTE
dedicated to annotations"...

	M.

-- 
Without deviation from the norm, progress is not possible.