From: Will Deacon <will@kernel.org>
To: Fuad Tabba <tabba@google.com>
Cc: maz@kernel.org, oliver.upton@linux.dev, james.morse@arm.com,
suzuki.poulose@arm.com, yuzenghui@huawei.com, qperret@google.com,
vdonnefort@google.com, catalin.marinas@arm.com,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH 6/8] KVM: arm64: Propagate stage-2 map failure on host->guest donation
Date: Tue, 28 Apr 2026 14:45:34 +0100 [thread overview]
Message-ID: <afC5_jrTEVRfJ77x@willie-the-truck> (raw)
In-Reply-To: <20260428103008.696141-7-tabba@google.com>
On Tue, Apr 28, 2026 at 11:30:06AM +0100, Fuad Tabba wrote:
> __pkvm_host_donate_guest() flips the host stage-2 PTE for the donated
> page to a non-valid annotation (KVM_HOST_INVALID_PTE_TYPE_DONATION,
> owner = PKVM_ID_GUEST) via host_stage2_set_owner_metadata_locked()
> and then calls kvm_pgtable_stage2_map() to install the matching guest
> stage-2 mapping. The map's return value was wrapped in WARN_ON() and
> otherwise discarded.
>
> At EL2 in nVHE/pKVM, WARN_ON() is not warn-and-continue: it expands
> to a BRK that enters the invalid-host-el2 vector and branches to
> hyp_panic(), declared __noreturn. WARN_ON of a reachable failure at
> EL2 is a panic primitive, not a debug aid.
>
> kvm_pgtable_stage2_map() can fail in reachable ways even at PAGE_SIZE
> granularity: __pkvm_host_donate_guest() verifies PKVM_NOPAGE for the
> guest IPA before the map, meaning no valid stage-2 entry exists. The
> walker must allocate new page-table pages from the vcpu memcache to
> install the mapping, returning -ENOMEM if exhausted. The host
> controls the vcpu memcache via the topup interface, so an
> under-provisioned donation request converts a recoverable error into
> a fatal hyp panic.
>
> Capture the stage-2 map return value and propagate it. The walker
> may have installed partial leaf entries for the IPA before failing,
> so unmap the range to clear them; otherwise the guest would retain
> stage-2 access to a page the host is about to reclaim as
> PKVM_PAGE_OWNED. Then roll back the host stage-2 mutation: the only
> forward mutation is host_stage2_set_owner_metadata_locked() flipping
> the host vmemmap from PKVM_PAGE_OWNED to PKVM_NOPAGE and the host
> stage-2 PTE from idmap to invalid+annotation.
> host_stage2_set_owner_locked(_, _, PKVM_ID_HOST) restores both.
>
> The rollback calls host_stage2_set_owner_locked() under WARN_ON.
> This is the correct use: host_stage2_set_owner_metadata_locked()
> just wrote the host leaf PTE as an invalid+annotation entry, so the
> reverse idmap rewrite cannot require new page-table allocation — it
> rewrites the leaf in-place. The WARN_ON asserts an impossible state
> under correct EL2 execution, semantically distinct from the misuse
> being fixed.
>
> Fixes: 1e579adca177 ("KVM: arm64: Introduce __pkvm_host_donate_guest()")
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 27 ++++++++++++++++++++++++---
> 1 file changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 7044913a0758..b8c57a95e9bf 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -1391,9 +1391,30 @@ int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
> meta = host_stage2_encode_gfn_meta(vm, gfn);
> WARN_ON(host_stage2_set_owner_metadata_locked(phys, PAGE_SIZE,
> PKVM_ID_GUEST, meta));
> - WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
> - pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
> - &vcpu->vcpu.arch.pkvm_memcache, 0));
> + ret = kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
> + pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_OWNED),
> + &vcpu->vcpu.arch.pkvm_memcache, 0);
> + if (ret) {
> + /*
> + * Stage-2 map can fail mid-walk (e.g. -ENOMEM from the
> + * memcache), leaving partial leaf entries installed in the
> + * guest stage-2. Tear them down before rolling back the host
> + * stage-2; otherwise the guest would retain access to a page
> + * the host is about to reclaim as PKVM_PAGE_OWNED.
> + */
> + kvm_pgtable_stage2_unmap(&vm->pgt, ipa, PAGE_SIZE);
Whoa, whoa, whoa.
First of all, this is mapping a single page, so the comment talking about
"leaf entries" (plural) is bogus. If an operation to map a single page
fails, then it makes no sense to try unmapping the mapping which we
failed to create. What do you expect it to do?
On the other hand, if we extend this to handle ranges in future (which
presumably we'll want as part of the THP support) then wouldn't this
mean that a concurrent vCPU could have transiently written to the pages
that _did_ get mapped, and now we're going to give those back to the
host? That's really not ok! We're relying on these WARN_ON()s being
fatal and they shouldn't fail because we perform all the permission
checks first, in a separate pass.
If you want to improve this, then I think the options are either:
1. Check that the the memcache is topped up first
2. Poison the page (similar to the forced-reclaim path)
3. Tell the host about the pages it's lost and maybe it can leak them
(I vote for (1))
But we absolutely cannot do the simple rollback for the ownership
changes; that's why the code is written to do the checks up-front. Your
other patches at the end of this series have different flavours of the
same issue.
If it's just about keeping the LLM happy, then either fix the LLM or
make these BUG_ON() (in conjunction with (1) above).
Will
next prev parent reply other threads:[~2026-04-28 13:45 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-28 10:30 [PATCH 0/8] KVM: arm64: EL2 synchronisation and pKVM stage-2 error propagation fixes Fuad Tabba
2026-04-28 10:30 ` [PATCH 1/8] KVM: arm64: Make EL2 exception entry and exit context-synchronization events Fuad Tabba
2026-04-30 2:22 ` Yao Yuan
2026-04-30 9:07 ` Will Deacon
2026-04-30 12:18 ` Fuad Tabba
2026-04-30 12:37 ` Mark Rutland
2026-04-28 10:30 ` [PATCH 2/8] KVM: arm64: Synchronise HCR_EL2 writes on the guest exit path Fuad Tabba
2026-04-28 13:50 ` Will Deacon
2026-04-28 14:21 ` Fuad Tabba
2026-04-28 10:30 ` [PATCH 3/8] KVM: arm64: Guard against NULL vcpu on VHE hyp panic path Fuad Tabba
2026-04-28 10:30 ` [PATCH 4/8] KVM: arm64: Fix __deactivate_fgt macro parameter typo Fuad Tabba
2026-04-28 10:30 ` [PATCH 5/8] KVM: arm64: Propagate stage-2 map failure on host->guest share Fuad Tabba
2026-04-28 10:30 ` [PATCH 6/8] KVM: arm64: Propagate stage-2 map failure on host->guest donation Fuad Tabba
2026-04-28 13:45 ` Will Deacon [this message]
2026-04-28 14:36 ` Fuad Tabba
2026-04-28 16:57 ` Will Deacon
2026-04-28 17:03 ` Fuad Tabba
2026-04-28 10:30 ` [PATCH 7/8] KVM: arm64: Propagate stage-2 map failure on guest->host share Fuad Tabba
2026-04-28 10:30 ` [PATCH 8/8] KVM: arm64: Propagate stage-2 map failure on guest->host unshare Fuad Tabba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=afC5_jrTEVRfJ77x@willie-the-truck \
--to=will@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=james.morse@arm.com \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maz@kernel.org \
--cc=oliver.upton@linux.dev \
--cc=qperret@google.com \
--cc=stable@vger.kernel.org \
--cc=suzuki.poulose@arm.com \
--cc=tabba@google.com \
--cc=vdonnefort@google.com \
--cc=yuzenghui@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox