From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 10 Apr 2024 09:45:37 +0100
Message-ID: <86wmp5sj0u.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: Gavin Shan <gshan@redhat.com>, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, catalin.marinas@arm.com, will@kernel.org,
 akpm@linux-foundation.org, oliver.upton@linux.dev, apopple@nvidia.com,
 rananta@google.com, mark.rutland@arm.com, v-songbaohua@oppo.com,
 yangyicong@hisilicon.com, shahuang@redhat.com, yihyu@redhat.com,
 shan.gavin@gmail.com
Subject: Re: [PATCH v3 1/3] arm64: tlb: Fix TLBI RANGE operand
In-Reply-To: <7a929104-5f09-4ff6-8792-4a9e93bc0894@arm.com>
References: <20240405035852.1532010-1-gshan@redhat.com>
 <20240405035852.1532010-2-gshan@redhat.com>
 <7a929104-5f09-4ff6-8792-4a9e93bc0894@arm.com>
List-Id: <linux-arm-kernel.lists.infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

On Mon, 08 Apr 2024 09:29:31 +0100,
Ryan Roberts <ryan.roberts@arm.com> wrote:
> 
> On 05/04/2024 04:58, Gavin Shan wrote:
> > KVM/arm64 relies on the TLBI RANGE feature to flush TLBs when dirty
> > pages are collected by the VMM and the page table entries become
> > write-protected during live migration. Unfortunately, the operand
> > passed to the TLBI RANGE instruction isn't correctly computed since
> > commit 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()").
> > This leads to a crash on the destination VM after live migration
> > because the TLBs aren't flushed completely and some of the dirty
> > pages are missed.
> >
> > For example, I have a VM with 8GB of memory assigned, starting at
> > 0x40000000 (1GB). Note that the host has 4KB as the base page size.
> > In the middle of migration, kvm_tlb_flush_vmid_range() is executed
> > to flush TLBs. It passes MAX_TLBI_RANGE_PAGES as the argument to
> > __kvm_tlb_flush_vmid_range() and __flush_s2_tlb_range_op(). SCALE#3
> > and NUM#31, corresponding to MAX_TLBI_RANGE_PAGES, aren't supported
> > by __TLBI_RANGE_NUM().
> > In this specific case, -1 is returned from __TLBI_RANGE_NUM() for
> > SCALE#3/2/1/0 and rejected by the loop in __flush_tlb_range_op(),
> > until the variable @scale underflows to -9 and 0xffff708000040000 is
> > set as the operand. The operand is wrong since it's computed by
> > __TLBI_VADDR_RANGE() from the invalid @scale and @num.
> >
> > Fix it by extending __TLBI_RANGE_NUM() to support the combination of
> > SCALE#3 and NUM#31. With the changes, [-1, 31] instead of [-1, 30]
> > can be returned from the macro, meaning the TLBs for 0x200000 pages
> > in the above example can be flushed in one shot with SCALE#3 and
> > NUM#31. The macro TLBI_RANGE_MASK is dropped since no one uses it
> > any more. The comments are also adjusted accordingly.
> 
> Perhaps I'm being overly pedantic, but I don't think the bug is
> __TLBI_RANGE_NUM() not being able to return 31; it is clearly
> documented that it can only return values in the range [-1, 30] and
> that a maximum of (MAX_TLBI_RANGE_PAGES - 1) pages is supported.

I guess "clearly" is pretty relative. I find it misleading that we
don't support the full range of what the architecture offers and have
these odd limitations.

> The bug is in the KVM caller, which tries to call
> __flush_tlb_range_op() with MAX_TLBI_RANGE_PAGES; clearly
> out of bounds.

Nobody disputes that point, hence the Fixes: tag pointing to the KVM
patch. But there are two ways to fix it: either reduce the amount KVM
can use for range invalidation, or fix the helper to allow the full
range offered by the architecture.

> So personally, I would prefer to fix the bug first, then separately
> enhance the infrastructure to support NUM=31.

I don't think this buys us much, apart from making it harder for
people to know what they need/want/randomly-elect to backport.
> > Fixes: 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()")
> 
> I would argue that the bug was actually introduced by commit
> 360839027a6e ("arm64: tlb: Refactor the core flush algorithm of
> __flush_tlb_range"), which separated the tlbi loop from the range-size
> validation in __flush_tlb_range(). Before this, all calls had to go
> through __flush_tlb_range(), and therefore anything bigger than
> (MAX_TLBI_RANGE_PAGES - 1) pages would cause the whole mm to be
> flushed. Although I get that bisect will lead to this one, so that's
> probably the right one to highlight.

I haven't tried to bisect it, and only picked this one as the obvious
culprit. To your point, using __flush_tlb_range() made little sense
for KVM -- what would the vma be here? Splitting the helper out was, I
think, the correct decision. But we of course lost sight of the
__TLBI_RANGE_NUM() limitation in the process.

> I get why it was split, but perhaps it should have been split at a
> higher level; if tlbi range is not supported, then KVM will flush the
> whole vmid. Would it be better for KVM to follow the same pattern as
> __flush_tlb_range_nosync() and issue per-block tlbis up to a max of
> MAX_DVM_OPS before falling back to the whole vmid? And if tlbi range
> is supported, KVM uses it regardless of the size of the range, whereas
> __flush_tlb_range_nosync() falls back to flush_tlb_mm() at a certain
> size. It's not clear why this divergence is useful.

That'd be a definite improvement indeed, and would bring back some
much needed consistency.

> > Cc: stable@kernel.org # v6.6+
> > Reported-by: Yihuang Yu <yihyu@redhat.com>
> > Suggested-by: Marc Zyngier <maz@kernel.org>
> > Signed-off-by: Gavin Shan <gshan@redhat.com>
> 
> Anyway, the implementation looks correct, so:
> 
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

Thanks for that!

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel