Date: Thu, 30 May 2024 16:18:47 +0900
From: Byungchul Park
To: "Huang, Ying"
Cc: Dave Hansen, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com,
 mgorman@techsingularity.net, hughd@google.com, willy@infradead.org,
 david@redhat.com, peterz@infradead.org, luto@kernel.org,
 tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: Re: [PATCH v10 00/12] LUF(Lazy Unmap Flush) reducing tlb numbers over 90%
Message-ID: <20240530071847.GA15344@system.software.com>
References: <20240510065206.76078-1-byungchul@sk.com>
 <982317c0-7faa-45f0-82a1-29978c3c9f4d@intel.com>
 <20240527015732.GA61604@system.software.com>
 <8734q46jc8.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <44e4f2fd-e76e-445d-b618-17a6ec692812@intel.com>
 <20240529050046.GB20307@system.software.com>
 <961f9533-1e0c-416c-b6b0-d46b97127de2@intel.com>
 <20240530005026.GA47476@system.software.com>
 <87a5k814tq.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87a5k814tq.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Thu, May 30, 2024 at 09:11:45AM +0800, Huang, Ying wrote:
> Byungchul Park writes:
>
> > On Wed, May 29, 2024 at 09:41:22AM -0700, Dave Hansen wrote:
> >> On 5/28/24 22:00, Byungchul Park wrote:
> >> > All the code updating ptes already performs TLB flush needed in a safe
> >> > way if it's inevitable e.g. munmap. LUF which controls when to flush in
> >> > a higher level than arch code, just leaves stale ro tlb entries that are
> >> > currently supposed to be in use. Could you give a scenario that you are
> >> > concerned about?
> >>
> >> Let's go back to this scenario:
> >>
> >>     fd = open("/some/file", O_RDONLY);
> >>     ptr1 = mmap(-1, size, PROT_READ, ..., fd, ...);
> >>     foo1 = *ptr1;
> >>
> >> There's a read-only PTE at 'ptr1'. Right? The page being pointed to is
> >> eligible for LUF via the try_to_unmap() paths. In other words, the page
> >> might be reclaimed at any time. If it is reclaimed, the PTE will be
> >> cleared.
> >>
> >> Then, the user might do:
> >>
> >>     munmap(ptr1, PAGE_SIZE);
> >>
> >> Which will _eventually_ wind up in the zap_pte_range() loop. But that
> >> loop will only see pte_none(). It doesn't do _anything_ to the 'struct
> >> mmu_gather'.
> >>
> >> The munmap() then lands in tlb_flush_mmu_tlbonly() where it looks at the
> >> 'struct mmu_gather':
> >>
> >>     if (!(tlb->freed_tables || tlb->cleared_ptes ||
> >>           tlb->cleared_pmds || tlb->cleared_puds ||
> >>           tlb->cleared_p4ds))
> >>             return;
> >>
> >> But since there were no cleared PTEs (or anything else) during the
> >> unmap, this just returns and doesn't flush the TLB.
> >>
> >> We now have an address space with a stale TLB entry at 'ptr1' and not
> >> even a VMA there. There's nothing to stop a new VMA from going in,
> >> installing a *new* PTE, but getting data from the stale TLB entry that
> >> still hasn't been flushed.
> >
> > Thank you for the explanation. I got you. I think I could handle the
> > case through a new flag in vma or something indicating LUF has deferred
> > necessary TLB flush for it during unmapping so that mmu_gather mechanism
> > can be aware of it. Of course, the performance change should be checked
> > again. Thoughts?
>
> I suggest you start with the simple case. That is, only support page
> reclaiming and migration. A TLB flush can be enforced during unmap
> with something similar to flush_tlb_batched_pending().

While reading flush_tlb_batched_pending(mm), I found that it already
performs a TLB flush for the target mm if set_tlb_ubc_flush_pending(mm)
has been called at least once since the last
flush_tlb_batched_pending(mm).

Since LUF also relies on set_tlb_ubc_flush_pending(mm), the required TLB
flush will be performed by flush_tlb_batched_pending(mm) during munmap().
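
As a rough illustration of that reasoning, here is a tiny userspace model
of the handshake. It is only a sketch, not kernel code: every model_*
name is made up for this example, a single bool stands in for the
kernel's real bookkeeping (which is more involved because it has to cope
with concurrent updates), and it assumes the munmap() path calls
flush_tlb_batched_pending(mm) before it walks the PTEs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-mm state; NOT the kernel's struct mm_struct. */
struct model_mm {
	bool flush_pending;	/* a TLB flush was deferred for this mm */
	int flushes_done;	/* number of modelled full-mm TLB flushes */
};

/* Models set_tlb_ubc_flush_pending(mm): reclaim/migration (and LUF) defer the flush. */
static void model_set_flush_pending(struct model_mm *mm)
{
	mm->flush_pending = true;
}

/* Models flush_tlb_batched_pending(mm): flush only if something was deferred. */
static void model_flush_batched_pending(struct model_mm *mm)
{
	if (mm->flush_pending) {
		mm->flushes_done++;	/* stands in for the real TLB flush of the mm */
		mm->flush_pending = false;
	}
}

/*
 * Models munmap() reaching the zap loop when every PTE is already
 * pte_none(): nothing is cleared, so the mmu_gather check would skip its
 * own flush. The only flush that can happen is the batched-pending one.
 */
static void model_munmap(struct model_mm *mm)
{
	model_flush_batched_pending(mm);
}

/* The scenario above: reclaim defers a flush, then munmap() runs. */
static int model_reclaim_then_munmap(void)
{
	struct model_mm mm = { false, 0 };

	model_set_flush_pending(&mm);	/* page reclaimed, flush deferred */
	model_munmap(&mm);		/* user unmaps the range */
	assert(!mm.flush_pending);	/* nothing is left deferred */
	return mm.flushes_done;
}
```

In this model, model_reclaim_then_munmap() ends with one flush done, while
model_munmap() on an mm with nothing deferred does no flush at all.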
So munmap() already looks safe to me.

Is there something I'm missing?

JFYI, regarding mmap(), I have reworked the fault handler so that it
gives up LUF when needed in a better way.

    Byungchul

> --
> Best Regards,
> Huang, Ying