Date: Wed, 03 Mar 2021 09:54:25 +0000
Message-ID: <87sg5czhny.wl-maz@kernel.org>
From: Marc Zyngier
To: Jia He
Cc: kvmarm@lists.cs.columbia.edu, James Morse, Julien Thierry, Suzuki K Poulose, Catalin Marinas, Will Deacon, Gavin Shan, Yanan Wang, Quentin Perret, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] KVM: arm64: Fix unaligned addr case in mmu walking
In-Reply-To: <20210303024225.2591-1-justin.he@arm.com>
References: <20210303024225.2591-1-justin.he@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Jia,

On Wed, 03 Mar 2021 02:42:25 +0000,
Jia He wrote:
>
> If the start addr is not aligned with the granule size of that level.
> loop step size should be adjusted to boundary instead of simple
> kvm_granual_size(level) increment. Otherwise, some mmu entries might miss
> the chance to be walked through.
> E.g. Assume the unmap range [data->addr, data->end] is
> [0xff00ab2000,0xff00cb2000] in level 2 walking and NOT block mapping.

When does this occur? Upgrade from page mappings to block? Swap out?

> And the 1st part of that pmd entry is [0xff00ab2000,0xff00c00000]. The
> pmd value is 0x83fbd2c1002 (not valid entry). In this case, data->addr
> should be adjusted to 0xff00c00000 instead of 0xff00cb2000.

Let me see if I understand this. Assuming 4k pages, the region
described above spans *two* 2M entries:

(a) ff00ab2000-ff00c00000, part of ff00a00000-ff00c00000
(b) ff00c00000-ff00cb2000, part of ff00c00000-ff00e00000

(a) has no valid mapping, but (b) does.
Because we fail to correctly align on a block boundary when skipping
(a), we also skip (b), which is then left mapped.

Did I get it right? If so, yes, this is... annoying.

Understanding the circumstances this triggers in would be most
interesting. This current code seems to assume that we get ranges
aligned to mapping boundaries, but I seem to remember that the old
code did use the stage2_*_addr_end() helpers to deal with this case.

Will: I don't think things have changed in that respect, right?

>
> Without this fix, userspace "segment fault" error can be easily
> triggered by running simple gVisor runsc cases on an Ampere Altra
> server:
>     docker run --runtime=runsc -it --rm ubuntu /bin/bash
>
> In container:
>     for i in `seq 1 100`;do ls;done

The workload on its own isn't that interesting. What I'd like to
understand is what happens on the host during that time.

>
> Reported-by: Howard Zhang
> Signed-off-by: Jia He
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index bdf8e55ed308..4d99d07c610c 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -225,6 +225,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
>  		goto out;
>
>  	if (!table) {
> +		data->addr = ALIGN_DOWN(data->addr, kvm_granule_size(level));
>  		data->addr += kvm_granule_size(level);
>  		goto out;
>  	}

It otherwise looks good to me. Quentin, Will: unless you object to
this, I plan to take it in the next round of fixes with

Fixes: b1e57de62cfb ("KVM: arm64: Add stand-alone page-table walker infrastructure")
Cc: stable@vger.kernel.org

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
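
For reference, the address arithmetic discussed in this thread can be
sketched in plain userspace C. This is only an illustration and not the
kernel code: every sketch_* name and SKETCH_GRANULE_2M are hypothetical
stand-ins for kvm_granule_size() at level 2 with 4k pages and for
ALIGN_DOWN(), and the "addr < end" continuation test is an assumption
about how the walker decides whether entries remain to be visited.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for kvm_granule_size(2) with 4k pages: 2M. */
#define SKETCH_GRANULE_2M	(UINT64_C(1) << 21)

/* Round x down to a multiple of a; a must be a power of two. */
static inline uint64_t sketch_align_down(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

/* Pre-patch skip: step one granule from the (possibly unaligned) addr. */
static inline uint64_t sketch_skip_old(uint64_t addr, uint64_t granule)
{
	return addr + granule;
}

/* Post-patch skip: align down to the entry base, then step to the next entry. */
static inline uint64_t sketch_skip_new(uint64_t addr, uint64_t granule)
{
	return sketch_align_down(addr, granule) + granule;
}

/* Assumed loop condition: the walk continues while addr is below end. */
static inline bool sketch_walk_continues(uint64_t addr, uint64_t end)
{
	return addr < end;
}
```

With the numbers from the commit message (addr 0xff00ab2000, end
0xff00cb2000, 2M granule), sketch_skip_old() returns 0xff00cb2000,
which equals end, so the walk would stop without visiting the entry
starting at 0xff00c00000. sketch_skip_new() returns 0xff00c00000,
which is still below end, so that entry gets visited.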