Date: Fri, 10 Nov 2023 10:32:24 +0900
From: Byungchul Park
To: "Huang, Ying"
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel_team@skhynix.com, akpm@linux-foundation.org, namit@vmware.com, xhao@linux.alibaba.com, mgorman@techsingularity.net, hughd@google.com, willy@infradead.org, david@redhat.com, peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com
Subject: Re: [v4 0/3] Reduce TLB flushes under some specific conditions
Message-ID: <20231110013224.GD72073@system.software.com>
References: <20231109045908.54996-1-byungchul@sk.com> <87il6bijtu.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87il6bijtu.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Thu, Nov 09, 2023 at 01:20:29PM +0800, Huang, Ying wrote:
> Byungchul Park writes:
>
> > Hi everyone,
> >
> > While working with CXL memory, I have been facing migration overhead,
> > especially TLB shootdowns on promotion or demotion between different
> > tiers. Yeah, most TLB shootdowns on migration through hinting faults
> > can be avoided thanks to Huang Ying's work, commit 4d4b6d66db
> > ("mm,unmap: avoid flushing TLB in batch if PTE is inaccessible").
> >
> > However, that only helps migrations triggered by hinting faults. I
> > thought it'd be much better to have a general mechanism to reduce the
> > number of TLB flushes and TLB misses that we can apply to any type of
> > migration. For now, though, I have tried it only for tiering migration.
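The commit named above skips queuing a batched flush when the PTE being unmapped is already inaccessible. Below is a minimal C sketch of that check using a toy PTE model. It assumes that a PTE made inaccessible (e.g. PROT_NONE for NUMA hinting) had its TLB entry flushed at that time. The struct and function names are hypothetical. The kernel's real test, pte_accessible(), also covers cases this sketch ignores, such as a flush still pending for the mm.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified PTE holding only the bits this sketch needs. */
struct toy_pte {
	bool present;
	bool accessible; /* false for a PROT_NONE-style NUMA hinting PTE */
};

/* Hypothetical per-task batch of deferred TLB flushes. */
struct toy_flush_batch {
	int pending;     /* unmapped PTEs whose TLB entries still need a flush */
};

/*
 * Clear a PTE during unmap and queue a batched flush only if the old PTE
 * was accessible. Assumption: an inaccessible PTE's TLB entry was already
 * flushed when the PTE was made inaccessible, so nothing stale remains.
 */
static void toy_unmap_one(struct toy_pte *pte, struct toy_flush_batch *batch)
{
	struct toy_pte old;

	assert(pte && batch);
	old = *pte;
	pte->present = false;
	pte->accessible = false;
	if (old.present && old.accessible)
		batch->pending++;
}
```

Unmapping the hinting-fault PTE leaves the batch empty, while an ordinary present PTE queues one flush. Migrations that don't go through hinting faults always take the second path, which is the gap described above.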
> >
> > I'm suggesting a mechanism to reduce TLB flushes by keeping both the
> > source and destination folios participating in a migration until all
> > the required TLB flushes are done, only if those folios are not mapped
> > by any PTE with write permission. This is based on v6.6-rc5.
> >
> > Can you believe it? I saw the number of full TLB flushes reduced by
> > about 80% and iTLB misses reduced by about 50%, and the run time always
> > showed a stable improvement of at least 1% on the workload I tested,
> > XSBench. I believe it would help even more with other workloads, or
> > real ones. Please let me know if I'm missing something.
>
> Can you help test the effect of commit 7e12beb8ca2a ("migrate_pages:
> batch flushing TLB") for your test case? To test it, you can revert it
> and compare the performance before and after reverting.

I will.

> And, how do you trigger migration when testing XSBench? Use a tiered
> memory system, and migrate pages between DRAM and CXL memory back and
> forth? If so, how many pages will you migrate for each migration?

Honestly, I've been focusing on the migration counts and the TLB
flush/miss counts. I will get back to you.

Byungchul

> --
> Best Regards,
> Huang, Ying
>
> >
> > Byungchul
> >
> > ---
> >
> > Changes from v3:
> >
> > 1. Don't use the kconfig CONFIG_MIGRC, and remove the sysctl knob
> >    migrc_enable. (feedback from Nadav)
> > 2. Remove the optimization that skipped CPUs that had already
> >    performed the needed TLB flushes for any reason when migrc performs
> >    TLB flushes, because I can't tell the performance difference with
> >    and without it. (feedback from Nadav)
> > 3. Minimize arch-specific code. While at it, move all the migrc
> >    declarations and inline functions from include/linux/mm.h to
> >    mm/internal.h. (feedback from Dave Hansen, Nadav)
> > 4. Split the part that pauses migrc when the system is under high
> >    memory pressure into a separate patch. (feedback from Nadav)
> > 5. Rename:
> >    a. arch_tlbbatch_clean() to arch_tlbbatch_clear(),
> >    b. tlb_ubc_nowr to tlb_ubc_ro,
> >    c. migrc_try_flush_free_folios() to migrc_flush_free_folios(),
> >    d. migrc_stop to migrc_pause.
> >    (feedback from Nadav)
> > 6. Use the ->lru list_head instead of introducing a new llist_head.
> >    (feedback from Nadav)
> > 7. Use non-atomic page-flag operations when it's safe.
> >    (feedback from Nadav)
> > 8. Use the stack instead of keeping a pointer to 'struct migrc_req'
> >    in struct task, since it is only manipulated locally.
> >    (feedback from Nadav)
> > 9. Replace a lot of simple functions with inline functions placed in
> >    a header, mm/internal.h. (feedback from Nadav)
> > 10. Add more comments where needed. (feedback from Nadav)
> > 11. Remove a lot of wrapper functions. (feedback from Nadav)
> >
> > Changes from RFC v2:
> >
> > 1. Remove the additional space occupied in struct page. To do that,
> >    union migrc's list with the lru field and add a page flag. I know a
> >    page flag is something we don't like to add, but there was no
> >    choice because migrc needs to distinguish folios under its control
> >    from others. In return, I force migrc to be used only on 64-bit
> >    systems to keep you guys from getting angry.
> > 2. Remove the meaningless internal object allocator that I introduced
> >    to minimize impact on the system; a ton of tests showed it made no
> >    difference.
> > 3. Stop migrc from working when the system is under high memory
> >    pressure, e.g. about to perform direct reclaim. When the swap
> >    mechanism is heavily used, I found the system suffered a regression
> >    without this control.
> > 4. Exclude folios with pte_dirty() == true from migrc's interest so
> >    that migrc can be simpler.
> > 5. Combine several tightly coupled patches into one.
> > 6. Add sufficient comments for better review.
> > 7. Manage migrc's requests per node (instead of globally).
> > 8. Add the TLB miss improvement to the commit message.
> > 9. Test with more CPUs (4 -> 16) to see a bigger improvement.
> >
> > Changes from RFC:
> >
> > 1. Fix a bug triggered when a destination folio of the previous
> >    migration becomes a source folio of the next migration before the
> >    folio has been handled properly for it to take part in another
> >    migration. The folio's state was inconsistent; fixed.
> > 2. Split the patch set into more pieces so that folks can review it
> >    better. (feedback from Nadav Amit)
> > 3. Fix a wrong usage of barriers, e.g. smp_mb__after_atomic().
> >    (feedback from Nadav Amit)
> > 4. Try to add sufficient comments to explain the patch set better.
> >    (feedback from Nadav Amit)
> >
> > Byungchul Park (3):
> >   mm/rmap: Recognize read-only TLB entries during batched TLB flush
> >   mm: Defer TLB flush by keeping both src and dst folios at migration
> >   mm: Pause migrc mechanism at high memory pressure
> >
> >  arch/x86/include/asm/tlbflush.h |   3 +
> >  arch/x86/mm/tlb.c               |  11 ++
> >  include/linux/mm_types.h        |  21 +++
> >  include/linux/mmzone.h          |   9 ++
> >  include/linux/page-flags.h      |   4 +
> >  include/linux/sched.h           |   7 +
> >  include/trace/events/mmflags.h  |   3 +-
> >  mm/internal.h                   |  78 ++++++++++
> >  mm/memory.c                     |  11 ++
> >  mm/migrate.c                    | 266 ++++++++++++++++++++++++++++++++
> >  mm/page_alloc.c                 |  30 +++-
> >  mm/rmap.c                       |  35 ++++-
> >  12 files changed, 475 insertions(+), 3 deletions(-)
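The following toy user-space model in C illustrates the mechanism in the cover letter: deferring TLB flushes at migration. A read-only source folio stays allocated instead of being flushed and freed immediately, and any later full flush retires it. A writable mapping, or the paused state standing in for high memory pressure, falls back to flushing right away.

This is a sketch under stated assumptions, not the migrc code:
- The single global flush generation, the fixed folio pool, and all toy_* identifiers are hypothetical.
- Copying and PTE remapping are elided.
- It ignores cases the real patches deal with, such as pte_dirty() folios, a pending folio later being mapped writable, or a destination being migrated again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of deferring TLB flushes at migration; NOT the migrc code.
 * Assumes one global flush generation and a fixed folio pool; copying
 * and PTE remapping are elided. All names here are hypothetical.
 */
#define TOY_NR_FOLIOS 8

struct toy_folio {
	bool in_use;             /* allocated, not yet back in the pool */
	bool writable;           /* mapped by at least one writable PTE */
	bool pending_free;       /* old source kept for stale TLB entries */
	unsigned long flush_gen; /* toy_tlb_gen when it became pending */
};

static struct toy_folio toy_pool[TOY_NR_FOLIOS];
static unsigned long toy_tlb_gen; /* advanced by every full TLB flush */
static int toy_nr_flushes;        /* full flushes actually issued */
static bool toy_deferral_paused;  /* models pausing under memory pressure */

/* Free every pending source whose stale TLB entries are now gone. */
static void toy_free_flushed_sources(void)
{
	for (size_t i = 0; i < TOY_NR_FOLIOS; i++) {
		struct toy_folio *f = &toy_pool[i];

		if (f->pending_free && f->flush_gen < toy_tlb_gen) {
			f->pending_free = false;
			f->in_use = false;
		}
	}
}

/* A full flush, issued for any reason, retires all pending sources. */
static void toy_tlb_flush_all(void)
{
	toy_tlb_gen++;
	toy_nr_flushes++;
	toy_free_flushed_sources();
}

static void toy_migrate_folio(struct toy_folio *src, struct toy_folio *dst)
{
	assert(src->in_use && !src->pending_free && !dst->in_use);

	if (src->writable || toy_deferral_paused) {
		/* A stale writable TLB entry could still write to src, so
		 * flush now, before the (elided) copy, and free src. */
		toy_tlb_flush_all();
		src->in_use = false;
	} else {
		/* Read-only: stale entries can only read src, which holds
		 * the same data as dst, so keep src until a later flush. */
		src->pending_free = true;
		src->flush_gen = toy_tlb_gen;
	}
	/* Copying data and repointing PTEs from src to dst are elided. */
	dst->in_use = true;
	dst->writable = src->writable;
}

/* Models high memory pressure: stop deferring, reclaim pending folios. */
static void toy_pause_deferral(void)
{
	toy_deferral_paused = true;
	toy_tlb_flush_all();
}
```

For example, two read-only migrations followed by one flush issued for some other reason cost one flush instead of two. Commit 7e12beb8ca2a, going by its title, batches the flushes within a migrate_pages() call. The deferral modeled here instead lets pending folios wait across migrations until any later flush, so comparing against a revert of that commit helps separate the two effects.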