Date: Wed, 29 May 2024 13:39:39 +0900
From: Byungchul Park
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel_team@skhynix.com,
    akpm@linux-foundation.org, ying.huang@intel.com, vernhao@tencent.com,
    mgorman@techsingularity.net, hughd@google.com, willy@infradead.org,
    peterz@infradead.org, luto@kernel.org, tglx@linutronix.de, mingo@redhat.com,
    bp@alien8.de, dave.hansen@linux.intel.com, rjgolo@gmail.com
Subject: Re: [PATCH v10 00/12] LUF(Lazy Unmap Flush) reducing tlb numbers over 90%
Message-ID: <20240529043938.GA20307@system.software.com>
References: <20240510065206.76078-1-byungchul@sk.com>
    <07686f06-f1a8-4282-bb48-fc4a5b554552@redhat.com>
In-Reply-To: <07686f06-f1a8-4282-bb48-fc4a5b554552@redhat.com>

On Tue, May 28, 2024 at 10:41:54AM +0200, David Hildenbrand wrote:
> On 10.05.24 at 08:51, Byungchul Park wrote:
> > Hi everyone,
> >
> > While working with a tiered memory system, e.g., CXL memory, I have
> > been facing migration overhead, especially tlb shootdowns on promotion
> > or demotion between different tiers. Yeah, most tlb shootdowns on
> > migration through hinting faults can be avoided thanks to Huang Ying's
> > work, commit 4d4b6d66db ("mm,unmap: avoid flushing tlb in batch if PTE
> > is inaccessible"). See the following link for more information:
> >
> > https://lore.kernel.org/lkml/20231115025755.GA29979@system.software.com/
> >
> > However, it only covers migration through hinting faults.
> > I thought it would be much better to have a general mechanism that
> > reduces the number of tlb flushes and can be applied to any unmap code
> > path where we normally believe a tlb flush must follow.
> >
> > I'm suggesting a new mechanism, LUF (Lazy Unmap Flush), which defers
> > the tlb flush for folios that have been unmapped and freed until they
> > eventually get allocated again. It's safe for folios that had been
> > mapped read-only and were unmapped, since the contents of the folios
> > don't change while they stay in pcp or buddy, so the data can still be
> > read correctly through the stale tlb entries.
> >
> > The tlb flush can be deferred when folios get unmapped, as long as the
> > needed flush is guaranteed to be performed before the folios actually
> > get used again, and, of course, only if none of the corresponding ptes
> > has write permission. Otherwise, the system will get messed up.
> >
> > To achieve that:
> >
> > 1. For folios that are mapped only by non-writable tlb entries, skip
> >    the tlb flush during unmapping and perform it just before the
> >    folios actually get used again, i.e., when they leave buddy or pcp.
>
> Trying to understand the impact: Effectively, a CPU could still read data
> from a page that has already been freed, until that page gets reallocated.
>
> The important parts I can see are:
>
> 1) PCP/buddy must not change page content (e.g., poison, init_on_free),
> otherwise an app might read wrong content.

Exactly. I will take them into account. Thank you.

> 2) If we mess up the flush-before-realloc, an app might observe data written
> by whoever allocated the page.

Yes. However, the appropriate TLB flush is performed in prep_new_page().
Basically, you are right, and I need to pay close attention to it.

> 3) We must reliably detect+handle any read-only PTEs for which we didn't
> flush the TLB yet, otherwise an app could see its memory writes getting
> lost. I recall that at least uffd-wp might defer TLB flushes (see comment in
> do_wp_page()).
> Not sure about other pte_wrprotect() callers that flush the
> TLB after processing multiple page tables, whereby rmap code might succeed
> in unmapping a page before the TLB flush happened.
>
> Any other possible issues you stumbled over that are worth mentioning?

You've covered everything I'm concerned about, and more clearly than I
had put it.

Byungchul

>
> --
> Thanks,
>
> David / dhildenb
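For illustration, a minimal userspace sketch of the flush-deferral rule discussed in this thread; it is not the LUF patch code. It assumes one mapping per page and a single simulated TLB entry, and every model_* name is hypothetical, not a kernel API. A read-only unmap skips the flush, the freed page's contents are left untouched (point 1), and the deferred flush is paid in the allocation path, standing in for the flush the reply places in prep_new_page() (point 2). The model also assumes the writable flag accurately reflects what the TLB may still cache at unmap time, which is exactly the assumption point 3) says must be verified.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical userspace model of deferring a TLB flush from unmap time
 * to reallocation time. None of these names are kernel APIs; each page
 * has at most one mapping and one simulated TLB entry.
 */
#define NPAGES 2
#define PAGE_BYTES 16

struct model_page {
    char data[PAGE_BYTES];
    bool writable;    /* the PTE being unmapped had write permission */
    bool tlb_cached;  /* a TLB entry may still translate to this page */
    bool free;        /* page sits in the simulated buddy/pcp pool */
};

static struct model_page pages[NPAGES];
static int tlb_flushes;   /* simulated TLB shootdowns issued so far */

static void model_flush(struct model_page *p)
{
    p->tlb_cached = false;
    tlb_flushes++;
}

/* Map a page with some content; the TLB now caches its translation. */
static void model_map(struct model_page *p, bool writable, const char *msg)
{
    p->free = false;
    p->writable = writable;
    p->tlb_cached = true;
    strncpy(p->data, msg, PAGE_BYTES - 1);
    p->data[PAGE_BYTES - 1] = '\0';
}

/* Unmap and free: flush right away only if the mapping was writable. */
static void model_unmap_and_free(struct model_page *p)
{
    assert(!p->free);          /* no double free in this model */
    if (p->writable)
        model_flush(p);        /* writes through a stale writable entry would hit the freed page */
    /* Read-only: leave tlb_cached set and, per point 1), do not touch data. */
    p->free = true;
}

/* A read through a possibly stale entry; NULL if no translation is cached. */
static const char *model_stale_read(const struct model_page *p)
{
    return p->tlb_cached ? p->data : NULL;
}

/* Allocate: pay any deferred flush before handing the page to a new owner. */
static struct model_page *model_alloc(void)
{
    for (int i = 0; i < NPAGES; i++) {
        struct model_page *p = &pages[i];
        if (!p->free)
            continue;
        if (p->tlb_cached)
            model_flush(p);    /* stands in for the flush in prep_new_page() */
        p->free = false;
        return p;
    }
    return NULL;
}
```

Mapping pages[0] read-only and pages[1] writable, then unmapping and freeing both, issues one flush (for the writable page), while a stale read of pages[0] still returns its old, unchanged contents. Allocating both pages afterwards adds exactly one more flush: the deferred one for the read-only page.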