From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6835BFF885D for ; Tue, 28 Apr 2026 13:45:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:In-Reply-To: Content-Transfer-Encoding:Content-Type:MIME-Version:References:Message-ID: Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=Xj43fmA92xTaWjtulGJ0Hz/Ajlax6eTpQtBMorT2rls=; b=Eq+UMxvLIRdP3NwPAdHh99/sQV 5EfGBhI+mZfW4n674SobhHl+/8oAC+7aXGHiBj+dC/58OYvr4yVtohf/ojtwLX4wmnS7v3PtgISLL By+fPODTF7IxRnjxnqooQjsBIcVsCoMYapHTXka7BLZnAkSHT8j8+gDZmh/cSHZwEm+4e3eWXPIge PoHODL5y/pHyB4Gvnebzxhk12I8sr8cjBFRQ+AfcjfKu8Nyr2Fervgt1y1+BXjY6Jn603bk0WLx5r 8n65dQBbAKJZYSkKoyc2D1yq1kHAXUTSdG/GBRnj4zDpDoLFgoks85weDV3rQUT5nkjf40LqaDutp I3Em7R5g==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98.2 #2 (Red Hat Linux)) id 1wHikw-00000001ZvG-0SX4; Tue, 28 Apr 2026 13:45:42 +0000 Received: from tor.source.kernel.org ([2600:3c04:e001:324:0:1991:8:25]) by bombadil.infradead.org with esmtps (Exim 4.98.2 #2 (Red Hat Linux)) id 1wHikv-00000001Zux-123p for linux-arm-kernel@lists.infradead.org; Tue, 28 Apr 2026 13:45:41 +0000 Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by tor.source.kernel.org (Postfix) with ESMTP id 4CBCB60008; Tue, 28 Apr 2026 13:45:40 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 80E67C2BCAF; Tue, 28 Apr 2026 13:45:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1777383940; bh=GYj8nB60I1VHypw+mOX3Mu/8peTbqovz5p67Koi3YFc=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=Rypw4e10B/yE8h5zP02oFqJax3GmBY+hiezSZHP7i2YneSyZMYRHOybPhmxqVJ0hP Apb/7+q9T6BLVwD+vGJjFrz8Oi3jOdbXySKacDs/V/ilE1587Kl8Zax1ynObH6j30g /qCaPg3LVQ5Qvc6mAGP2BM3UGMTBQ4K7vIbz2HwtxiX4wNRz34u17K05TL3zkaMM+W cI76sx9EJZRN96uaYu22mYolJli//JqM8x2OjGJfOn5kC82gZxxK4TrpRU9/UBj6Zv b54062dNzC7BaMHhemZP6cWuEot+DOkVa3uMT1DAHByPthdd6Q64dcTOvFGM2ZydO6 A/Sy3pcXq4OGw== Date: Tue, 28 Apr 2026 14:45:34 +0100 From: Will Deacon To: Fuad Tabba Cc: maz@kernel.org, oliver.upton@linux.dev, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, qperret@google.com, vdonnefort@google.com, catalin.marinas@arm.com, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, stable@vger.kernel.org Subject: Re: [PATCH 6/8] KVM: arm64: Propagate stage-2 map failure on host->guest donation Message-ID: References: <20260428103008.696141-1-tabba@google.com> <20260428103008.696141-7-tabba@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20260428103008.696141-7-tabba@google.com> X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Tue, Apr 28, 2026 at 11:30:06AM +0100, Fuad Tabba wrote: > __pkvm_host_donate_guest() flips the host stage-2 PTE for the donated > page to a non-valid annotation (KVM_HOST_INVALID_PTE_TYPE_DONATION, > owner = PKVM_ID_GUEST) via host_stage2_set_owner_metadata_locked() > and then calls kvm_pgtable_stage2_map() to install the matching guest > stage-2 mapping. The map's return value was wrapped in WARN_ON() and > otherwise discarded. > > At EL2 in nVHE/pKVM, WARN_ON() is not warn-and-continue: it expands > to a BRK that enters the invalid-host-el2 vector and branches to > hyp_panic(), declared __noreturn. WARN_ON of a reachable failure at > EL2 is a panic primitive, not a debug aid. > > kvm_pgtable_stage2_map() can fail in reachable ways even at PAGE_SIZE > granularity: __pkvm_host_donate_guest() verifies PKVM_NOPAGE for the > guest IPA before the map, meaning no valid stage-2 entry exists. The > walker must allocate new page-table pages from the vcpu memcache to > install the mapping, returning -ENOMEM if exhausted. The host > controls the vcpu memcache via the topup interface, so an > under-provisioned donation request converts a recoverable error into > a fatal hyp panic. > > Capture the stage-2 map return value and propagate it. The walker > may have installed partial leaf entries for the IPA before failing, > so unmap the range to clear them; otherwise the guest would retain > stage-2 access to a page the host is about to reclaim as > PKVM_PAGE_OWNED. Then roll back the host stage-2 mutation: the only > forward mutation is host_stage2_set_owner_metadata_locked() flipping > the host vmemmap from PKVM_PAGE_OWNED to PKVM_NOPAGE and the host > stage-2 PTE from idmap to invalid+annotation. > host_stage2_set_owner_locked(_, _, PKVM_ID_HOST) restores both. > > The rollback calls host_stage2_set_owner_locked() under WARN_ON. > This is the correct use: host_stage2_set_owner_metadata_locked() > just wrote the host leaf PTE as an invalid+annotation entry, so the > reverse idmap rewrite cannot require new page-table allocation — it > rewrites the leaf in-place. The WARN_ON asserts an impossible state > under correct EL2 execution, semantically distinct from the misuse > being fixed. > > Fixes: 1e579adca177 ("KVM: arm64: Introduce __pkvm_host_donate_guest()") > Signed-off-by: Fuad Tabba > --- > arch/arm64/kvm/hyp/nvhe/mem_protect.c | 27 ++++++++++++++++++++++++--- > 1 file changed, 24 insertions(+), 3 deletions(-) > > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c > index 7044913a0758..b8c57a95e9bf 100644 > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c > @@ -1391,9 +1391,30 @@ int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu) > meta = host_stage2_encode_gfn_meta(vm, gfn); > WARN_ON(host_stage2_set_owner_metadata_locked(phys, PAGE_SIZE, > PKVM_ID_GUEST, meta)); > - WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys, > - pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED), > - &vcpu->vcpu.arch.pkvm_memcache, 0)); > + ret = kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys, > + pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED), > + &vcpu->vcpu.arch.pkvm_memcache, 0); > + if (ret) { > + /* > + * Stage-2 map can fail mid-walk (e.g. -ENOMEM from the > + * memcache), leaving partial leaf entries installed in the > + * guest stage-2. Tear them down before rolling back the host > + * stage-2; otherwise the guest would retain access to a page > + * the host is about to reclaim as PKVM_PAGE_OWNED. > + */ > + kvm_pgtable_stage2_unmap(&vm->pgt, ipa, PAGE_SIZE); Whoa, whoa, whoa. First of all, this is mapping a single page, so the comment talking about "leaf entries" (plural) is bogus. If an operation to map a single page fails, then it makes no sense to try unmapping the mapping which we failed to create. What do you expect it to do? On the other hand, if we extend this to handle ranges in future (which presumably we'll want as part of the THP support) then wouldn't this mean that a concurrent vCPU could have transiently written to the pages that _did_ get mapped, and now we're going to give those back to the host? That's really not ok! We're relying on these WARN_ON()s being fatal and they shouldn't fail because we perform all the permission checks first, in a separate pass. If you want to improve this, then I think the options are either: 1. Check that the the memcache is topped up first 2. Poison the page (similar to the forced-reclaim path) 3. Tell the host about the pages it's lost and maybe it can leak them (I vote for (1)) But we absolutely cannot do the simple rollback for the ownership changes; that's why the code is written to do the checks up-front. Your other patches at the end of this series have different flavours of the same issue. If it's just about keeping the LLM happy, then either fix the LLM or make these BUG_ON() (in conjunction with (1) above). Will