Date: Fri, 10 Nov 2023 10:02:01 +0900
From: Byungchul Park
To: Nadav Amit
Cc: Linux Kernel Mailing List, linux-mm, "kernel_team@skhynix.com",
    Andrew Morton, "ying.huang@intel.com", "xhao@linux.alibaba.com",
    "mgorman@techsingularity.net", "hughd@google.com",
    "willy@infradead.org", "david@redhat.com", "peterz@infradead.org",
    Andy Lutomirski, Thomas Gleixner, "mingo@redhat.com", "bp@alien8.de",
    "dave.hansen@linux.intel.com"
Subject: Re: [v3 2/3] mm: Defer TLB flush by keeping both src and dst folios at migration
Message-ID: <20231110010201.GA72073@system.software.com>
References: <20231030072540.38631-1-byungchul@sk.com>
    <20231030072540.38631-3-byungchul@sk.com>
    <63C530D3-3A1D-4BE9-8AA7-EFF5B895BE80@vmware.com>
    <20231030125129.GD81877@system.software.com>
    <20231108041208.GA40954@system.software.com>

On Thu, Nov 09, 2023 at 10:16:57AM +0000, Nadav Amit wrote:
>
>
> > On Nov 8, 2023, at 6:12 AM, Byungchul Park wrote:
> >
> > On Mon, Oct 30, 2023 at 09:51:30PM +0900, Byungchul Park wrote:
> >>>> diff --git a/mm/memory.c b/mm/memory.c
> >>>> index 6c264d2f969c..75dc48b6e15f 100644
> >>>> --- a/mm/memory.c
> >>>> +++ b/mm/memory.c
> >>>> @@ -3359,6 +3359,19 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
> >>>>  	if (vmf->page)
> >>>>  		folio = page_folio(vmf->page);
> >>>>
> >>>> +	/*
> >>>> +	 * This folio has its read copy to prevent inconsistency while
> >>>> +	 * deferring TLB flushes. However, the problem might arise if
> >>>> +	 * it's going to become writable.
> >>>> +	 *
> >>>> +	 * To prevent it, give up the deferring TLB flushes and perform
> >>>> +	 * TLB flush right away.
> >>>> +	 */
> >>>> +	if (folio && migrc_pending_folio(folio)) {
> >>>> +		migrc_unpend_folio(folio);
> >>>> +		migrc_try_flush_free_folios(NULL);
> >>>
> >>> So many potential function calls… Probably they should have been combined
> >>> into one and at least migrc_pending_folio() should have been an inline
> >>> function in the header.
> >>
> >> I will try to change it as you mention.
> >>
> >>>> +	}
> >>>> +
> >>>
> >>> What about mprotect? I thought David has changed it so it can set writable
> >>> PTEs.
> >>
> >> I will check it out.
> >
> > I found mprotect stuff is already performing TLB flushes needed for it.
> > So some redundant TLB flushes might happen by migrc but it's not that
> > harmful I think. Thanks.
>
> Let me explain the scenario I am concerned with. Assume page P is RO, and
> moves from Psrc to Pdst. Pointer “p” points to P. Initially (*p == 0).
>
> Let’s also assume we also have an atomic variable “a”. Initially (a == 0).
>
> I hope I got the migration function names right, but I hope the problem
> itself can be clear regardless.
>
> CPU0            CPU1            CPU2            CPU3
> ----            ----            ----            ----
>                 (user-mode)     (user-mode)
>
>                 Access *p
>                 [Psrc cached in TLB]
>
> migrate_pages_batch()
> -> migrate_folio_unmap()
>
> [ PTE updated,
>   still no flush ]
>
>                                                 mprotect(p,
>                                                          RW)

Here,

mprotect()
   do_mprotect_pkey()
      tlb_finish_mmu()
         tlb_flush_mmu()

I thought the TLB flush for mprotect() is performed by tlb_flush_mmu(), so
any cached TLB entries on other CPUs get a chance to be updated. Could you
correct me if I got it wrong?

Thanks.
	Byungchul

>
>                                                 [ Psrc is
>                                                   RW ]
>
>                                                 [ flush
>                                                   deferred]
>
>
>                                 *p = 1 # Pdst
>
>                                 xchg(&a, 1)
>                                 mfence
>                 if (a == 1)
>                   assert(*p == 1);
>
>
>
> Now at this point the assertion might fail. CPU2 wrote into Pdst, whereas
> CPU1 reads from Psrc. But based on x86 memory model, userspace might not
> expect this scenario to be possible, hence leading to bugs.
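
For reference, the userspace half of the scenario quoted above (the CPU1
and CPU2 columns) can be sketched as a small litmus test. This is only a
sketch under stated assumptions: shared_value stands in for *p, flag for
the atomic "a", and the names writer(), reader() and run_litmus() are
hypothetical. It does not trigger migration or call mprotect(); it only
spells out the ordering that userspace relies on, so on a correctly
behaving system it should report zero violations.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins: shared_value plays "*p", flag plays "a". */
static int shared_value;
static atomic_int flag;

/* CPU2 column: store to the page, then publish through the atomic. */
static void *writer(void *arg)
{
	(void)arg;
	shared_value = 1;		/* *p = 1 */
	atomic_exchange(&flag, 1);	/* xchg(&a, 1); seq_cst ordering */
	return NULL;
}

/* CPU1 column: if the flag is visible, the store must be visible too. */
static void *reader(void *arg)
{
	int *violation = arg;

	/* shared_value is read only after flag == 1 has been observed. */
	if (atomic_load(&flag) == 1 && shared_value != 1)
		*violation = 1;		/* assert(*p == 1) would have failed */
	return NULL;
}

/* Runs the two-thread test repeatedly; returns violations, -1 on error. */
int run_litmus(int iterations)
{
	int violations = 0;

	for (int i = 0; i < iterations; i++) {
		pthread_t w, r;
		int violation = 0;

		/* Reset state before any thread of this round exists. */
		shared_value = 0;
		atomic_store(&flag, 0);

		if (pthread_create(&w, NULL, writer, NULL) != 0)
			return -1;
		if (pthread_create(&r, NULL, reader, &violation) != 0) {
			pthread_join(w, NULL);
			return -1;
		}
		pthread_join(w, NULL);
		pthread_join(r, NULL);
		violations += violation;
	}
	return violations;
}
```

Calling run_litmus() from a main() and checking for a non-zero return is
enough to use it. The concern in the quoted scenario is that a stale TLB
entry for Psrc on CPU1 could make the reader see the old value even after
it has observed the flag, which this test would count as a violation.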