Date: Thu, 05 Mar 2026 10:55:42 +0000
Message-ID: <86seae7dg1.wl-maz@kernel.org>
From: Marc Zyngier
To: Quentin Perret
Cc: Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Will Deacon,
	linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, Leo Yan
Subject: Re: [PATCH] KVM: arm64: Adjust range correctly during host stage-2 faults
In-Reply-To: <86wlzr77cn.wl-maz@kernel.org>
References: <20250625105548.984572-1-qperret@google.com> <86wlzr77cn.wl-maz@kernel.org>

On Wed, 04 Mar 2026 18:55:04 +0000,
Marc Zyngier wrote:
>
> On Wed, 25 Jun 2025 11:55:48 +0100,
> Quentin Perret wrote:
> >
> > host_stage2_adjust_range() tries to find the largest block mapping that
> > fits within a memory or mmio region (represented by a kvm_mem_range in
> > this function) during host stage-2 faults under pKVM. To do so, it walks
> > the host stage-2 page-table, finds the faulting PTE and its level, and
> > then progressively increments the level until it finds a granule of the
> > appropriate size. However, the condition in the loop implementing the
> > above is broken as it checks kvm_level_supports_block_mapping() for the
> > next level instead of the current, so pKVM may attempt to map a region
> > larger than can be covered with a single block.
> >
> > This is not a security problem and is quite rare in practice (the
> > kvm_mem_range check usually forces host_stage2_adjust_range() to choose a
> > smaller granule), but this is clearly not the expected behaviour.
> >
> > Refactor the loop to fix the bug and improve readability.
> >
> > Fixes: c4f0935e4d95 ("KVM: arm64: Optimize host memory aborts")
> > Signed-off-by: Quentin Perret
>
> This patch prevents my O6 board from booting in protected mode as of
> e728e705802fe. Reverting it on top of 7.0-rc2 make the box work again.
>
> I haven't quite worked out why though. The hack below makes it work,
> but implies that we can get ranges that are smaller than a page. That
> feels unlikely, but I'm not sure we can rule it out (the kernel page
> size could be pretty large anyway).

Having spent a bit of time on this, I'm pretty sure this is the cause
of the issue.

The memblock tables are as such:

maz@cosmic-debris:~/vminstall$ sudo cat /sys/kernel/debug/memblock/memory
   0: 0x0000000080000000..0x00000000843fffff    0 NOMAP
   1: 0x0000000084400000..0x00000000845fffff    0 NONE
   2: 0x0000000085000000..0x000000009fffffff    0 NONE
   3: 0x00000000a0000000..0x00000000a7ffffff    0 NOMAP
   4: 0x00000000a8000000..0x00000000fffbffff    0 NONE
   5: 0x00000000fffc0000..0x00000000fffeffff    0 NOMAP
   6: 0x00000000ffff0000..0x00000000ffffdfff    0 NONE
   7: 0x00000000ffffe000..0x00000000ffffffff    0 NOMAP
   8: 0x0000000100000000..0x00000007fe4effff    0 NONE
   9: 0x00000007fe4f0000..0x00000007fedeffff    0 NOMAP
  10: 0x00000007fedf0000..0x00000007ffffffff    0 NONE
  11: 0x0000008000000000..0x000000807a290fff    0 NONE
  12: 0x000000807a291000..0x000000807a2927b2    0 NOMAP
  13: 0x000000807a2927b3..0x000000807fffffff    0 NONE

Any access to page 0x000000807a292000 is going to blow up in your
face, because there is no way you can map this and still respect the
memblock boundary.

Same thing for any region that is smaller than PAGE_SIZE, or not
aligned on PAGE_SIZE. Which is even more annoying.

I'm starting to think that my hack is not that idiotic in the end...

	M.

-- 
Without deviation from the norm, progress is not possible.
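
For anyone wanting to reproduce the arithmetic, here is a rough Python
model of the range adjustment applied to the memblock dump above. It is
only a sketch under stated assumptions, not the kernel code: it assumes
4 KiB pages (so block-capable levels 1, 2 and 3 map 1 GiB, 2 MiB and
4 KiB), it always starts from the largest block instead of from the level
of the faulting PTE, and the names find_region(), adjust_range() and
unaligned_regions() are made up for illustration.

```python
PAGE_SIZE = 1 << 12  # assumption: 4 KiB pages

# (start, inclusive end, flags), copied from the memblock dump above.
MEMBLOCK = [
    (0x0000000080000000, 0x00000000843fffff, "NOMAP"),
    (0x0000000084400000, 0x00000000845fffff, "NONE"),
    (0x0000000085000000, 0x000000009fffffff, "NONE"),
    (0x00000000a0000000, 0x00000000a7ffffff, "NOMAP"),
    (0x00000000a8000000, 0x00000000fffbffff, "NONE"),
    (0x00000000fffc0000, 0x00000000fffeffff, "NOMAP"),
    (0x00000000ffff0000, 0x00000000ffffdfff, "NONE"),
    (0x00000000ffffe000, 0x00000000ffffffff, "NOMAP"),
    (0x0000000100000000, 0x00000007fe4effff, "NONE"),
    (0x00000007fe4f0000, 0x00000007fedeffff, "NOMAP"),
    (0x00000007fedf0000, 0x00000007ffffffff, "NONE"),
    (0x0000008000000000, 0x000000807a290fff, "NONE"),
    (0x000000807a291000, 0x000000807a2927b2, "NOMAP"),
    (0x000000807a2927b3, 0x000000807fffffff, "NONE"),
]

# Assumed granule sizes per level for a 4 KiB translation granule.
GRANULES = {1: 1 << 30, 2: 1 << 21, 3: 1 << 12}


def find_region(addr):
    """Return (index, start, exclusive end) of the region holding addr."""
    for idx, (start, last, _flags) in enumerate(MEMBLOCK):
        if start <= addr <= last:
            return idx, start, last + 1
    return None


def adjust_range(addr):
    """Largest aligned granule around addr that stays inside its region.

    Returns (level, start, exclusive end), or None when not even a
    single page fits within the region boundaries.
    """
    found = find_region(addr)
    if found is None:
        return None
    _idx, r_start, r_end = found
    for level in sorted(GRANULES):  # largest block first
        granule = GRANULES[level]
        start = addr & ~(granule - 1)  # align down to the granule
        end = start + granule
        if r_start <= start and end <= r_end:
            return level, start, end
    return None


def unaligned_regions():
    """Indices of regions whose start or end is not page aligned."""
    return [
        idx
        for idx, (start, last, _flags) in enumerate(MEMBLOCK)
        if start % PAGE_SIZE or (last + 1) % PAGE_SIZE
    ]


if __name__ == "__main__":
    print("unaligned regions:", unaligned_regions())
    for addr in (0x100000000, 0x807a291000, 0x807a292100, 0x807a292800):
        print(hex(addr), "->", adjust_range(addr))
```

In this model only regions 12 and 13 are flagged as unaligned, and both
sample addresses inside page 0x000000807a292000 (one on each side of the
0x807a2927b3 boundary) get no result at all, since the page itself crosses
the region boundary. Addresses in page-aligned regions still resolve to a
page or a larger block.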