From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 30 May 2024 18:33:07 +0900
From: Byungchul Park <byungchul@sk.com>
To: "Huang, Ying"
Cc: Dave Hansen, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kernel_team@skhynix.com, akpm@linux-foundation.org, vernhao@tencent.com,
	mgorman@techsingularity.net, hughd@google.com, willy@infradead.org,
	david@redhat.com, peterz@infradead.org, luto@kernel.org,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: Re: [PATCH v10 00/12] LUF(Lazy Unmap Flush) reducing tlb numbers over 90%
Message-ID: <20240530093306.GA35610@system.software.com>
References: <982317c0-7faa-45f0-82a1-29978c3c9f4d@intel.com>
	<20240527015732.GA61604@system.software.com>
	<8734q46jc8.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<44e4f2fd-e76e-445d-b618-17a6ec692812@intel.com>
	<20240529050046.GB20307@system.software.com>
	<961f9533-1e0c-416c-b6b0-d46b97127de2@intel.com>
	<20240530005026.GA47476@system.software.com>
	<87a5k814tq.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<20240530071847.GA15344@system.software.com>
	<871q5j1zdf.fsf@yhuang6-desk2.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <871q5j1zdf.fsf@yhuang6-desk2.ccr.corp.intel.com>
On Thu, May 30, 2024 at 04:24:12PM +0800, Huang, Ying wrote:
> Byungchul Park writes:
>
> > On Thu, May 30, 2024 at 09:11:45AM +0800, Huang, Ying wrote:
> >> Byungchul Park writes:
> >>
> >> > On Wed, May 29, 2024 at 09:41:22AM -0700, Dave Hansen wrote:
> >> >> On 5/28/24 22:00, Byungchul Park wrote:
> >> >> > All the code updating ptes already performs the TLB flush needed in
> >> >> > a safe way if it's inevitable, e.g. munmap.  LUF, which controls
> >> >> > when to flush at a higher level than arch code, just leaves stale ro
> >> >> > tlb entries that are currently supposed to be in use.  Could you
> >> >> > give a scenario that you are concerned about?
> >> >>
> >> >> Let's go back to this scenario:
> >> >>
> >> >> 	fd = open("/some/file", O_RDONLY);
> >> >> 	ptr1 = mmap(-1, size, PROT_READ, ..., fd, ...);
> >> >> 	foo1 = *ptr1;
> >> >>
> >> >> There's a read-only PTE at 'ptr1'.  Right?  The page being pointed to
> >> >> is eligible for LUF via the try_to_unmap() paths.  In other words,
> >> >> the page might be reclaimed at any time.  If it is reclaimed, the PTE
> >> >> will be cleared.
> >> >>
> >> >> Then, the user might do:
> >> >>
> >> >> 	munmap(ptr1, PAGE_SIZE);
> >> >>
> >> >> Which will _eventually_ wind up in the zap_pte_range() loop.  But
> >> >> that loop will only see pte_none().  It doesn't do _anything_ to the
> >> >> 'struct mmu_gather'.
> >> >>
> >> >> The munmap() then lands in tlb_flush_mmu_tlbonly() where it looks at
> >> >> the 'struct mmu_gather':
> >> >>
> >> >> 	if (!(tlb->freed_tables || tlb->cleared_ptes ||
> >> >> 	      tlb->cleared_pmds || tlb->cleared_puds ||
> >> >> 	      tlb->cleared_p4ds))
> >> >> 		return;
> >> >>
> >> >> But since there were no cleared PTEs (or anything else) during the
> >> >> unmap, this just returns and doesn't flush the TLB.
> >> >>
> >> >> We now have an address space with a stale TLB entry at 'ptr1' and not
> >> >> even a VMA there.  There's nothing to stop a new VMA from going in,
> >> >> installing a *new* PTE, but getting data from the stale TLB entry
> >> >> that still hasn't been flushed.
> >> >
> >> > Thank you for the explanation.  I got you.  I think I could handle the
> >> > case through a new flag in the vma, or something indicating that LUF
> >> > has deferred a necessary TLB flush for it during unmapping, so that
> >> > the mmu_gather mechanism can be aware of it.  Of course, the
> >> > performance change should be checked again.  Thoughts?
> >>
> >> I suggest you start with the simple case.  That is, only support page
> >> reclaiming and migration.  A TLB flush can be enforced during unmap
> >> with something similar to flush_tlb_batched_pending().
> >
> > While reading flush_tlb_batched_pending(mm), I found it already performs
> > a TLB flush for the target mm if set_tlb_ubc_flush_pending(mm) has been
> > hit at least once since the last flush_tlb_batched_pending(mm).
> >
> > Since LUF also relies on set_tlb_ubc_flush_pending(mm), it's going to
> > perform the required TLB flush in flush_tlb_batched_pending(mm) during
> > munmap().  So it looks safe to me with regard to munmap() already.
> >
> > Is there something that I'm missing?
> >
> > JFYI, regarding mmap(), I have reworked the fault handler to give up
> > luf when needed in a better way.
>
> If TLB flush is always enforced during munmap(), then your solution can
> only avoid TLB flushing for page reclaiming and migration, not unmap.

I'm not sure I understand what you meant.  Could you explain it in more
detail?

LUF works only for the *unmapping* that happens during page reclaiming
and migration.  Unmappings other than those for page reclaiming and
migration are not what LUF works on.  That's why I thought
flush_tlb_batched_pending() could handle the pending tlb flushes in that
case.

It'd be appreciated if you could expand on what you meant.

	Byungchul

> Or do I miss something?
>
> --
> Best Regards,
> Huang, Ying