Date: Thu, 30 May 2024 10:33:24 +0900
From: Byungchul Park
To: "Huang, Ying"
Cc: Dave Hansen, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: Re: [PATCH v10 00/12] LUF(Lazy Unmap Flush) reducing tlb numbers over 90%
Message-ID: <20240530013324.GA15492@system.software.com>
In-Reply-To: <87a5k814tq.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Thu, May 30, 2024 at 09:11:45AM +0800, Huang, Ying wrote:
> Byungchul Park writes:
>
> > On Wed, May 29, 2024 at 09:41:22AM -0700, Dave Hansen wrote:
> >> On 5/28/24 22:00, Byungchul Park wrote:
> >> > All the code that updates PTEs already performs the needed TLB
> >> > flush in a safe way when it is unavoidable, e.g. munmap. LUF, which
> >> > controls when to flush at a higher level than arch code, just leaves
> >> > stale read-only TLB entries that are currently supposed to be in
> >> > use. Could you give a scenario that you are concerned about?
> >>
> >> Let's go back to this scenario:
> >>
> >> 	fd = open("/some/file", O_RDONLY);
> >> 	ptr1 = mmap(-1, size, PROT_READ, ..., fd, ...);
> >> 	foo1 = *ptr1;
> >>
> >> There's a read-only PTE at 'ptr1'. Right? The page being pointed to is
> >> eligible for LUF via the try_to_unmap() paths. In other words, the page
> >> might be reclaimed at any time. If it is reclaimed, the PTE will be
> >> cleared.
> >>
> >> Then, the user might do:
> >>
> >> 	munmap(ptr1, PAGE_SIZE);
> >>
> >> Which will _eventually_ wind up in the zap_pte_range() loop. But that
> >> loop will only see pte_none(). It doesn't do _anything_ to the 'struct
> >> mmu_gather'.
> >>
> >> The munmap() then lands in tlb_flush_mmu_tlbonly() where it looks at the
> >> 'struct mmu_gather':
> >>
> >> 	if (!(tlb->freed_tables || tlb->cleared_ptes ||
> >> 	      tlb->cleared_pmds || tlb->cleared_puds ||
> >> 	      tlb->cleared_p4ds))
> >> 		return;
> >>
> >> But since there were no cleared PTEs (or anything else) during the
> >> unmap, this just returns and doesn't flush the TLB.
> >>
> >> We now have an address space with a stale TLB entry at 'ptr1' and not
> >> even a VMA there. There's nothing to stop a new VMA from going in,
> >> installing a *new* PTE, but getting data from the stale TLB entry that
> >> still hasn't been flushed.
> >
> > Thank you for the explanation. I see your point now. I think I could
> > handle this case with a new flag in the vma, or something similar,
> > indicating that LUF has deferred a necessary TLB flush for it during
> > unmapping, so that the mmu_gather mechanism can be aware of it. Of
> > course, the performance impact would need to be measured again.
> > Thoughts?
>
> I suggest you start with the simple case. That is, only support page
> reclaim and migration. A TLB flush can be enforced during unmap with
> something similar to flush_tlb_batched_pending().

Right. I'm thinking of adding the related code to
flush_tlb_batched_pending().

	Byungchul

> --
> Best Regards,
> Huang, Ying
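For readers following the thread, the sequence Dave describes, and the direction Ying suggests, can be illustrated with a small user-space model in C. This is a minimal sketch under loud assumptions, not kernel code: toy_mm, toy_gather, toy_munmap() and every other toy_* name are hypothetical; a single cleared_ptes bool stands in for the struct mmu_gather fields quoted above; and deferred_flush is a hypothetical per-mm flag that only loosely mimics the kind of pending-flush state flush_tlb_batched_pending() acts on. It does not reflect how LUF itself is implemented, and it assumes (as Dave's scenario does) that installing a new PTE where none was present triggers no TLB flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, NOT kernel code: one virtual page, one PTE, one TLB slot. */
struct toy_mm {
	bool pte_present;      /* does the page table map the page? */
	unsigned long pte_pfn; /* frame the PTE points to */
	bool tlb_valid;        /* is a translation cached in the TLB slot? */
	unsigned long tlb_pfn; /* frame the cached translation points to */
	bool deferred_flush;   /* hypothetical: a lazy unmap skipped a flush */
};

/* Hypothetical stand-in for the mmu_gather fields checked above. */
struct toy_gather {
	bool cleared_ptes;
};

/* foo1 = *ptr1: fault in the page and cache its translation. */
static void toy_fault_in(struct toy_mm *mm, unsigned long pfn)
{
	assert(pfn != 0); /* 0 is reserved to mean "fault" in toy_access() */
	mm->pte_present = true;
	mm->pte_pfn = pfn;
	mm->tlb_valid = true;
	mm->tlb_pfn = pfn;
}

/* Reclaim with a deferred flush: the PTE is cleared, the TLB entry stays. */
static void toy_reclaim_lazy(struct toy_mm *mm)
{
	mm->pte_present = false;
	mm->deferred_flush = true;
}

/* Drop the cached translation and clear any pending deferred flush. */
static void toy_flush_tlb(struct toy_mm *mm)
{
	mm->tlb_valid = false;
	mm->deferred_flush = false;
}

/* munmap(): the zap loop records only PTEs it actually clears. */
static void toy_munmap(struct toy_mm *mm, bool honour_deferred)
{
	struct toy_gather tlb = { .cleared_ptes = false };

	if (mm->pte_present) {
		mm->pte_present = false;
		tlb.cleared_ptes = true;
	}

	/* Fix idea (hypothetical): perform any pending deferred flush here. */
	if (honour_deferred && mm->deferred_flush)
		toy_flush_tlb(mm);

	/* Mirrors the early return in tlb_flush_mmu_tlbonly(). */
	if (!tlb.cleared_ptes)
		return;
	toy_flush_tlb(mm);
}

/* A TLB hit wins over the page table; returns the pfn, or 0 on a fault. */
static unsigned long toy_access(struct toy_mm *mm)
{
	if (mm->tlb_valid)
		return mm->tlb_pfn;
	if (!mm->pte_present)
		return 0;
	mm->tlb_valid = true;
	mm->tlb_pfn = mm->pte_pfn;
	return mm->pte_pfn;
}

/* Replays the scenario; returns the frame a later access actually reaches. */
unsigned long toy_scenario(bool honour_deferred)
{
	struct toy_mm mm = { 0 };

	toy_fault_in(&mm, 100);           /* mmap() + foo1 = *ptr1 */
	toy_reclaim_lazy(&mm);            /* try_to_unmap() with LUF */
	toy_munmap(&mm, honour_deferred); /* munmap(ptr1, PAGE_SIZE) */

	/* A new VMA and a *new* PTE at the same address, frame 200. */
	mm.pte_present = true;
	mm.pte_pfn = 200;
	return toy_access(&mm);
}
```

With honour_deferred false, toy_scenario() returns 100: the access after the new mapping is served from the stale TLB entry instead of frame 200. With it true, the pending flush happens in the unmap path before the early return, so the new PTE is used and 200 is returned. Checking the pending state in the unmap path, rather than relying on what the zap loop happens to record in the gather, appears to be the idea behind Ying's suggestion.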