From: Thomas Gleixner
To: Mark Rutland
Cc: Nadav Amit, Uladzislau Rezki, "Russell King (Oracle)", Andrew Morton,
 linux-mm, Christoph Hellwig, Lorenzo Stoakes, Peter Zijlstra, Baoquan He,
 John Ogness, linux-arm-kernel@lists.infradead.org, Marc Zyngier,
 x86@kernel.org
Subject: Re: Excessive TLB flush ranges
References: <87cz308y3s.ffs@tglx> <87y1lo7a0z.ffs@tglx> <87o7mk733x.ffs@tglx>
 <7ED917BC-420F-47D4-8956-8984205A75F0@gmail.com> <87bkik6pin.ffs@tglx>
 <87353v7qms.ffs@tglx> <87ttwb5jx3.ffs@tglx>
Date: Wed, 17 May 2023 18:41:44 +0200
Message-ID: <87bkii6hbr.ffs@tglx>

On Wed, May 17 2023 at 15:43, Mark Rutland wrote:
> On Wed, May 17, 2023 at 12:31:04PM +0200, Thomas Gleixner wrote:
>> The way how arm/arm64 implement that in software is:
>>
>>    magic_barrier1();
>>    flush_range_with_magic_opcodes();
>>    magic_barrier2();
>
> FWIW, on arm64 that sequence (for leaf entries only) is:
>
>	/*
>	 * Make sure prior writes to the page table entries are visible to all
>	 * CPUs, so that *subsequent* page table walks will see the latest
>	 * values.
>	 *
>	 * This is roughly __smp_wmb().
>	 */
>	dsb(ishst)	// AKA magic_barrier1()
>
>	/*
>	 * The "TLBI *IS, <addr>" instructions send a message to all other
>	 * CPUs, essentially saying "please start invalidating entries for
>	 * <addr>".
>	 *
>	 * The "TLBI *ALL*IS" instructions send a message to all other CPUs,
>	 * essentially saying "please start invalidating all entries".
>	 *
>	 * In theory, this could be for discontiguous ranges.
>	 */
>	flush_range_with_magic_opcodes()
>
>	/*
>	 * Wait for acknowledgement that all prior TLBIs have completed. This
>	 * also ensures that all accesses using those translations have also
>	 * completed.
>	 *
>	 * This waits for all relevant CPUs to acknowledge completion of any
>	 * prior TLBIs sent by this CPU.
>	 */
>	dsb(ish)	// AKA magic_barrier2()
>	isb()
>
> So you can batch a bunch of "TLBI *IS, <addr>" with a single barrier for
> completion, or you can use a single "TLBI *ALL*IS" to invalidate everything.
>
> It can still be worth using the latter, as arm64 has done since commit:
>
>   05ac65305437e8ef ("arm64: fix soft lockup due to large tlb flush range")
>
> ... as for a large range, issuing a bunch of "TLBI *IS, <addr>" can take a
> while, and can require the recipient CPUs to do more work than they might
> have to do for a single "TLBI *ALL*IS".

And looking at the changelog and backtrace:

  PC is at __cpu_flush_kern_tlb_range+0xc/0x40
  LR is at __purge_vmap_area_lazy+0x28c/0x3ac

I'm willing to bet that this is exactly the same scenario of a direct map +
module area flush. That's the only one we found so far which creates
insanely large ranges.
The other effects of coalescing can still result in seriously oversized
flushes for just a couple of pages. The worst I've seen aside from that BPF
muck was a 'flush 2 pages' with a resulting range of ~3.8MB.

> The point at which invalidating everything is better depends on a number of
> factors (e.g. the impact of all CPUs needing to make new page table walks),
> and currently we have an arbitrary boundary where we choose to invalidate
> everything (which has been tweaked a bit over time); there isn't really a
> one-size-fits-all best answer.

I'm well aware of that :)

Thanks,

        tglx