From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Sun, 22 Feb 2026 18:54:58 +0000
Message-ID: <878qckehh9.wl-maz@kernel.org>
From: Marc Zyngier
To: Fuad Tabba
Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	Quentin Perret, Will Deacon, Vincent Donnefort, Joey Gouly,
	Suzuki K Poulose, Oliver Upton, Zenghui Yu, stable@vger.kernel.org
Subject: Re: [PATCH] KVM: arm64: Fix protected mode handling of pages larger than 4kB
In-Reply-To: 
References: <20260222141000.3084258-1-maz@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

Hi Fuad,

On Sun, 22 Feb 2026 17:58:00 +0000,
Fuad Tabba wrote:
> 
> Hi Marc,
> 
> On Sun, 22 Feb 2026 at 14:10, Marc Zyngier wrote:
> >
> > Since 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings"),
> > pKVM tracks the memory that has been mapped into a guest in a
> > side data
> > structure. Crucially, it uses it to find out whether
> > a page has already been mapped, and therefore refuses to map it
> > twice. So far, so good.
> >
> > However, this very patch completely breaks non-4kB page support,
> > with guests being unable to boot. The most obvious symptom is that
> > we take the same fault repeatedly and make no forward progress.
> > A quick investigation shows that this is because of the above
> > rejection code.
> >
> > As it turns out, there are multiple issues at play:
> >
> > - while the HPFAR_EL2 register gives you the faulting IPA minus
> >   the bottom 12 bits, it will still give you the extra bits that
> >   are part of the page offset for anything larger than 4kB,
> >   even for a level-3 mapping
> 
> Matches the ARM ARM.
> 
> > - pkvm_kvm_pgtable_stage2_map() assumes that the address passed
> >   as a parameter is aligned to the size of the intended mapping
> 
> nit: pkvm_kvm_pgtable_stage2_map() -> kvm_pgtable_stage2_map()

Actually, that's pkvm_pgtable_stage2_map(). kvm_pgtable_stage2_map()
itself isn't affected.

> > - the faulting address is only aligned for a non-page mapping
> >
> > When the planets are suitably aligned (pun intended), the guest
> > faults a page by accessing it past the bottom 4kB, and extra bits
> > get set in the HPFAR_EL2 register. If this results in a page mapping
> > (which is likely with large granule sizes), nothing aligns it further
> > down, and pkvm_mapping_iter_first() finds an intersection that
> > doesn't really exist. We assume this is a spurious fault and return
> > -EAGAIN. And again.
> >
> > This doesn't hit outside of the protected code, as the page table
> > code always aligns the IPA down to a page boundary, hiding the issue
> > for everyone else.
> >
> > Fix it by always forcing the alignment to vma_pagesize, irrespective
> > of its value.
> >
> > Fixes: 3669ddd8fa8b5 ("KVM: arm64: Add a range to pkvm_mappings")
> > Signed-off-by: Marc Zyngier
> > Cc: stable@vger.kernel.org
> > ---
> >  arch/arm64/kvm/mmu.c | 12 +++++-------
> >  1 file changed, 5 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 8c5d259810b2f..aa587f2e28264 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1753,14 +1753,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  	}
> >  
> >  	/*
> > -	 * Both the canonical IPA and fault IPA must be hugepage-aligned to
> > -	 * ensure we find the right PFN and lay down the mapping in the right
> > -	 * place.
> > +	 * Both the canonical IPA and fault IPA must be aligned to the
> > +	 * mapping size to ensure we find the right PFN and lay down the
> > +	 * mapping in the right place.
> >  	 */
> > -	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE) {
> > -		fault_ipa &= ~(vma_pagesize - 1);
> > -		ipa &= ~(vma_pagesize - 1);
> > -	}
> > +	fault_ipa &= ~(vma_pagesize - 1);
> > +	ipa &= ~(vma_pagesize - 1);
> 
> nit: Since we're changing this code anyway, should we use the ALIGN
> macros instead?

That'd be ALIGN_DOWN() then, as ALIGN() really is ALIGN_UP(), and
that'd be counter-productive. Something like:

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index aa587f2e28264..3952415c4f83b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1757,8 +1757,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * mapping size to ensure we find the right PFN and lay down the
 	 * mapping in the right place.
 	 */
-	fault_ipa &= ~(vma_pagesize - 1);
-	ipa &= ~(vma_pagesize - 1);
+	fault_ipa = ALIGN_DOWN(fault_ipa, vma_pagesize);
+	ipa = ALIGN_DOWN(ipa, vma_pagesize);
 
 	gfn = ipa >> PAGE_SHIFT;
 	mte_allowed = kvm_vma_mte_allowed(vma);

> Reviewed-by: Fuad Tabba
> 
> and using 4, 16, and 64KB pages:
> 
> Tested-by: Fuad Tabba

Ah, great!
I couldn't be bothered with 64kB, and only used 16kB in NV to debug
quickly and then bare-metal to verify the fix.

Thanks!

	M.

-- 
Jazz isn't dead. It just smells funny.