Date: Thu, 05 Mar 2026 13:22:33 +0000
Message-ID: <86o6l276na.wl-maz@kernel.org>
From: Marc Zyngier
To: Quentin Perret
Cc: Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
    Catalin Marinas, Will Deacon, linux-arm-kernel@lists.infradead.org,
    kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, Leo Yan
Subject: Re: [PATCH] KVM: arm64: Adjust range correctly during host stage-2 faults
References: <20250625105548.984572-1-qperret@google.com>
    <86wlzr77cn.wl-maz@kernel.org> <86seae7dg1.wl-maz@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 05 Mar 2026 13:13:40 +0000,
Quentin Perret wrote:
>
> On Thursday 05 Mar 2026 at 10:55:42 (+0000), Marc Zyngier wrote:
> > On Wed, 04 Mar 2026 18:55:04 +0000,
> > Marc Zyngier wrote:
> > >
> > > On Wed, 25 Jun 2025 11:55:48 +0100,
> > > Quentin Perret wrote:
> > > >
> > > > host_stage2_adjust_range() tries to find the largest block mapping that
> > > > fits within a memory or mmio region (represented by a kvm_mem_range in
> > > > this function) during host stage-2 faults under pKVM. To do so, it walks
> > > > the host stage-2 page-table, finds the faulting PTE and its level, and
> > > > then progressively increments the level until it finds a granule of the
> > > > appropriate size. However, the condition in the loop implementing the
> > > > above is broken as it checks kvm_level_supports_block_mapping() for the
> > > > next level instead of the current, so pKVM may attempt to map a region
> > > > larger than can be covered with a single block.
> > > >
> > > > This is not a security problem and is quite rare in practice (the
> > > > kvm_mem_range check usually forces host_stage2_adjust_range() to choose a
> > > > smaller granule), but this is clearly not the expected behaviour.
> > > >
> > > > Refactor the loop to fix the bug and improve readability.
> > > >
> > > > Fixes: c4f0935e4d95 ("KVM: arm64: Optimize host memory aborts")
> > > > Signed-off-by: Quentin Perret
> > >
> > > This patch prevents my O6 board from booting in protected mode as of
> > > e728e705802fe. Reverting it on top of 7.0-rc2 make the box work again.
> > >
> > > I haven't quite worked out why though. The hack below makes it work,
> > > but implies that we can get ranges that are smaller than a page. That
> > > feels unlikely, but I'm not sure we can rule it out (the kernel page
> > > size could be pretty large anyway).
> >
> > Having spent a bit of time on this, I'm pretty sure this is the cause
> > of the issue.
> > The memblock tables are as such:
> >
> > maz@cosmic-debris:~/vminstall$ sudo cat /sys/kernel/debug/memblock/memory
> >    0: 0x0000000080000000..0x00000000843fffff    0 NOMAP
> >    1: 0x0000000084400000..0x00000000845fffff    0 NONE
> >    2: 0x0000000085000000..0x000000009fffffff    0 NONE
> >    3: 0x00000000a0000000..0x00000000a7ffffff    0 NOMAP
> >    4: 0x00000000a8000000..0x00000000fffbffff    0 NONE
> >    5: 0x00000000fffc0000..0x00000000fffeffff    0 NOMAP
> >    6: 0x00000000ffff0000..0x00000000ffffdfff    0 NONE
> >    7: 0x00000000ffffe000..0x00000000ffffffff    0 NOMAP
> >    8: 0x0000000100000000..0x00000007fe4effff    0 NONE
> >    9: 0x00000007fe4f0000..0x00000007fedeffff    0 NOMAP
> >   10: 0x00000007fedf0000..0x00000007ffffffff    0 NONE
> >   11: 0x0000008000000000..0x000000807a290fff    0 NONE
> >   12: 0x000000807a291000..0x000000807a2927b2    0 NOMAP
> >   13: 0x000000807a2927b3..0x000000807fffffff    0 NONE
>
> Ouch, these last few are 'interesting', oh well :-)
>
> > Any access to page 0x000000807a292000 is going to blow up in your
> > face, because there is no way you can map this and still respect the
> > memblock boundary. Same thing for any region that is smaller than
> > PAGE_SIZE, or not aligned on PAGE_SIZE. Which is even more annoying.
> >
> > I'm starting to think that my hack is not that idiotic in the end...
>
> Yes, I can't think of anything better TBH. We've already asserted that
> we don't have an annotated PTE here, and at the last level we're
> guaranteed not to accidentally map a neighbouring private region, so yes
> we should just proceed with a page-aligned mapping there.
>
> Want me to post a proper patch or do you already have one in stock?

I have that ready, but I wanted your feedback on it before posting it.
I'll send that now.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
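
For readers following the thread, the failure mode around memblock entries 12
and 13 can be reproduced with a small user-space model. This is only a sketch
under stated assumptions, not the kernel code and not the patch mentioned
above: it assumes 4 KiB pages, treats ranges as [start, end) with an exclusive
end, and every name in it (struct mem_range, granule_size(),
level_supports_block(), largest_fitting_level(), level_with_page_fallback())
is hypothetical. The first helper checks the current level before trying a
granule and requires the granule to sit entirely inside the range; the second
adds the fallback discussed above, a page-aligned mapping at the last level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define LAST_LEVEL 3	/* assumption: level 3 is the page level */

/* Hypothetical stand-in for a memory range: [start, end), end exclusive. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* Granule covered by one entry: level 3 = 4 KiB, 2 = 2 MiB, 1 = 1 GiB. */
static uint64_t granule_size(int level)
{
	return 1ULL << (PAGE_SHIFT + 9 * (LAST_LEVEL - level));
}

/* Assumption for this model: levels 1..3 can hold a block or page entry. */
static bool level_supports_block(int level)
{
	return level >= 1 && level <= LAST_LEVEL;
}

static bool range_included(const struct mem_range *child,
			   const struct mem_range *parent)
{
	return parent->start <= child->start && child->end <= parent->end;
}

/*
 * Walk from first_level down to the page level and pick the largest granule
 * around addr that lies entirely inside *range. The support check is done on
 * the level being tried, not on the next one. On success *range is narrowed
 * to the granule and the level is returned; otherwise -1 is returned and
 * *range is left untouched.
 */
static int largest_fitting_level(uint64_t addr, int first_level,
				 struct mem_range *range)
{
	assert(first_level >= 0 && first_level <= LAST_LEVEL);

	for (int level = first_level; level <= LAST_LEVEL; level++) {
		uint64_t size = granule_size(level);
		struct mem_range cur;

		if (!level_supports_block(level))
			continue;

		cur.start = addr & ~(size - 1);
		cur.end = cur.start + size;
		if (range_included(&cur, range)) {
			*range = cur;
			return level;
		}
	}

	return -1;
}

/*
 * Same search, but if not even a page fits (the range is smaller than a page
 * or not page-aligned), fall back to the page containing addr.
 */
static int level_with_page_fallback(uint64_t addr, int first_level,
				    struct mem_range *range)
{
	int level = largest_fitting_level(addr, first_level, range);

	if (level < 0) {
		uint64_t size = granule_size(LAST_LEVEL);

		range->start = addr & ~(size - 1);
		range->end = range->start + size;
		level = LAST_LEVEL;
	}

	return level;
}
```

Taking entry 13 as { 0x807a2927b3, 0x8080000000 }, an address such as
0x807a292800 shares its page with the end of entry 12, so the strict search
finds no granule at any level, while the fallback variant settles on the page
at 0x807a292000. An address further into the region, such as 0x807a400000,
still gets a 2 MiB granule either way.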