From: Dave Hansen
Date: Mon, 24 Jul 2023 10:40:04 -0700
Subject: Re: [RFC PATCH v2 20/20] x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs
To: Valentin Schneider, Nadav Amit
Cc: Linux Kernel Mailing List, linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kvm@vger.kernel.org, linux-mm, bpf, the arch/x86 maintainers, rcu@vger.kernel.org, linux-kselftest@vger.kernel.org, Steven Rostedt, Masami Hiramatsu, Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Andy Lutomirski, Peter Zijlstra, Frederic Weisbecker, "Paul E. McKenney", Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Mathieu Desnoyers, Lai Jiangshan, Zqiang, Andrew Morton, Uladzislau Rezki, Christoph Hellwig, Lorenzo Stoakes, Josh Poimboeuf, Jason Baron, Kees Cook, Sami Tolvanen, Ard Biesheuvel, Nicholas Piggin, Juerg Haefliger, Nicolas Saenz Julienne, "Kirill A. Shutemov", Dan Carpenter, Chuang Wang, Yang Jihong, Petr Mladek, "Jason A. Donenfeld", Song Liu, Julian Pidancet, Tom Lendacky, Dionna Glaze, Thomas Weißschuh, Juri Lelli, Daniel Bristot de Oliveira, Marcelo Tosatti, Yair Podemsky
Message-ID: <2284d0db-f94a-e059-7bd0-bab4f112ed35@intel.com>
References: <20230720163056.2564824-1-vschneid@redhat.com> <20230720163056.2564824-21-vschneid@redhat.com> <188AEA79-10E6-4DFF-86F4-FE624FD1880F@vmware.com>

On 7/24/23 04:32, Valentin Schneider wrote:
> AFAICT the only reasonable way to go about the deferral is to prove that no
> such access happens before the deferred @operation is done. We got to prove
> that for sync_core() deferral, cf. PATCH 18.
>
> I'd like to reason about it for deferring vunmap TLB flushes:
>
> What addresses in VMAP range, other than the stack, can early entry code
> access? Yes, the ranges can be checked at runtime, but is there any chance
> of figuring this out e.g. at build-time?

Nadav was touching on a very important point: TLB flushes for addresses are
relatively easy to defer. You just need to ensure that the CPU deferring the
flush does an actual flush before it might architecturally consume the
contents of the flushed entry.

TLB flushes for freed page tables are another game entirely. The CPU is free
to cache any part of the paging hierarchy it wants at any time. It's also
free to set accessed and dirty bits at any time, even for instructions that
may never execute architecturally.

That basically means that if you have *ANY* freed page table page *ANYWHERE*
in the page table hierarchy of any CPU at any time ... you're screwed.

There's no reasoning about accesses or ordering. As soon as the CPU does
*anything*, it's out to get you.

You're going to need to do something a lot more radical to deal with freed
page table pages.