From: David Hildenbrand
Date: Sat, 31 Aug 2024 13:02:51 +0200
Subject: Re: [PATCH RFC] mm: entirely reuse the whole anon mTHP in do_wp_page
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song, Chuanhua Han, Baolin Wang, Ryan Roberts, Zi Yan, Chris Li, Kairui Song, Kalesh Singh, Suren Baghdasaryan
References: <20240831092339.66085-1-21cnbao@gmail.com> <36933711-ae0f-468c-93bd-d6a67d974c9d@redhat.com>
Organization: Red Hat

On 31.08.24 12:49, Barry Song wrote:
> On Sat, Aug 31, 2024 at 10:29 PM David Hildenbrand wrote:
>>
>> On 31.08.24 12:21, Barry Song wrote:
>>> On Sat, Aug 31, 2024 at 10:07 PM David Hildenbrand wrote:
>>>>
>>>> On 31.08.24 11:55, Barry Song wrote:
>>>>> On Sat, Aug 31, 2024 at 9:44 PM David Hildenbrand wrote:
>>>>>>
>>>>>> On 31.08.24 11:23, Barry Song wrote:
>>>>>>> From: Barry Song
>>>>>>>
>>>>>>> On a physical phone, it's sometimes observed that deferred_split
>>>>>>> mTHPs account for over 15% of the total mTHPs. Profiling by Chuanhua
>>>>>>> indicates that the majority of these originate from the typical fork
>>>>>>> scenario.
>>>>>>> When the child process either execs or exits, the parent process should
>>>>>>> ideally be able to reuse the entire mTHP. However, the current kernel
>>>>>>> lacks this capability and instead places the mTHP into split_deferred,
>>>>>>> performing a CoW (Copy-on-Write) on just a single subpage of the mTHP.
>>>>>>>
>>>>>>> main()
>>>>>>> {
>>>>>>> #define SIZE 1024 * 1024UL
>>>>>>>     void *p = malloc(SIZE);
>>>>>>>     memset(p, 0x11, SIZE);
>>>>>>>     if (fork() == 0)
>>>>>>>         exec(....);
>>>>>>>     /*
>>>>>>>      * this will trigger cow one subpage from
>>>>>>>      * mTHP and put mTHP into split_deferred
>>>>>>>      * list
>>>>>>>      */
>>>>>>>     *(int *)(p + 10) = 10;
>>>>>>>     printf("done\n");
>>>>>>>     while(1);
>>>>>>> }
>>>>>>>
>>>>>>> This leads to two significant issues:
>>>>>>>
>>>>>>> * Memory Waste: Before the mTHP is fully split by the shrinker,
>>>>>>> it wastes memory. In extreme cases, such as with a 64KB mTHP,
>>>>>>> the memory usage could be 64KB + 60KB until the last subpage
>>>>>>> is written, at which point the mTHP is freed.
>>>>>>>
>>>>>>> * Fragmentation and Performance Loss: It destroys large folios
>>>>>>> (negating the performance benefits of CONT-PTE) and fragments memory.
>>>>>>>
>>>>>>> To address this, we should aim to reuse the entire mTHP in such cases.
>>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>> I’ve renamed wp_page_reuse() to wp_folio_reuse() and added an
>>>>>>> entirely_reuse argument because I’m not sure if there are still cases
>>>>>>> where we reuse a subpage within an mTHP. For now, I’m setting
>>>>>>> entirely_reuse to true only for the newly supported case, while all
>>>>>>> other cases still get false. Please let me know if this is incorrect; if
>>>>>>> we don’t reuse subpages at all, we could remove the argument.
>>>>>>
>>>>>> See [1] I sent out this week, that is able to reuse even without
>>>>>> scanning page tables. If we find that the folio is exclusive we could try
>>>>>> processing surrounding PTEs that map the same folio.
>>>>>>
>>>>>> [1] https://lkml.kernel.org/r/20240829165627.2256514-1-david@redhat.com
>>>>>
>>>>> Great! It looks like I missed your patch again. Since you've implemented this
>>>>> in a better way, I’d prefer to use your patchset.
>>>>
>>>> I wouldn't say better, just more universally. And while taking care of
>>>> properly sync'ing the mapcount vs. refcount :P
>>>>
>>>>>
>>>>> I’m curious about how you're handling ptep_set_access_flags_nr() or similar
>>>>> things because I couldn’t find the related code in your patch 10/17:
>>>>>
>>>>> [PATCH v1 10/17] mm: COW reuse support for PTE-mapped THP with CONFIG_MM_ID
>>>>>
>>>>> Am I missing something?
>>>>
>>>> The idea is to keep individual write faults as fast as possible. So the
>>>> patch set keeps it simple and only reuses a single PTE at a time,
>>>> setting that one PAE and mapping it writable.
>>>
>>> I got your point, thanks! as anyway the mTHP has been exclusive,
>>> so the following nr-1 minor page faults will set their particular PTE
>>> to writable one by one.
>>
>> Yes, assuming you would get these page faults, and assuming you would
>> get them in the near future.
>>
>>>
>>>>
>>>> As the patch states, it might be reasonable to optimize some cases,
>>>> maybe also only on some architectures. For example to fault-around and
>>>> map the other ones writable as well. It might not always be desirable
>>>> though, especially not for larger folios.
>>>
>>> as anyway, the mTHP has been entirely exclusive, setting all PTEs
>>> directly to writable should help reduce nr - 1 minor page faults and
>>> ideally help reduce CONTPTE unfold and fold?
>>
>> Yes, doing that on CONTPTE granularity would very likely make sense. For
>> anything bigger than that, I am not sure.
>>
>> Assuming we have a 1M folio mapped by PTEs. Trying to fault-around in
>> aligned CONTPTE granularity likely makes sense. Bigger than that, I am
>> not convinced.
>>
>
> I see. maybe we can have something like:
>
> static bool pte_fault_around_estimate(int nr)
> {
>     if (nr / arch_batched_ptes_nr() < 16)
>         return true;
>
>     return false;
> }
>
> if (pte_fault_around_estimate(folio_nr_pages(folio)))
>     set all ptes;
>
> for arm64, arch_batched_ptes_nr() == 16. for
> arch without cont-pte or similar things,
> arch_batched_ptes_nr() == 1.

Yes, something like that would be my take.

After we know that we can reuse the large folio, we'll try scanning
starting from the aligned PTE. If we find that we can batch, we'll batch
that part. Otherwise we'll simply fall back to a single one.

Handling batching across VMAs is a bit harder. We might be able to
batch, or might not ... We could have the CONT_PTE bit set across VMAs,
but might not necessarily be able to batch (e.g., some VMAs are read-only).

>
> Just some rough ideas; all the naming might be quite messy.
>
> at least, we won't lose the benefit of reduced TLB miss
> before all nr_pages are written for aarch64 :-)
>
>>>
>>> What is the downside to doing that? I also don't think mapping them
>>> all together will waste memory?
>>
>> No, it's all about increasing the latency of individual write faults.
>>
>
> i see, i assume it won't be worse than the current case where we have to
> allocate small folios and copy? and folio allocation can even further incur
> direct reclamation?

Yes, it would certainly be better than what we currently have. Almost
everything would likely be better than what we currently have. :)

-- 
Cheers,

David / dhildenb
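
For illustration only, the "scan from the aligned PTE, batch if possible, otherwise fall back to a single one" idea from the reply above could be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not kernel code and not taken from either patch set: struct pte_sim, struct reuse_range and pick_reuse_range() are hypothetical names, the folio is assumed to be already known exclusive, and the batch size is assumed to be 16 for arm64 CONT-PTE with 4K pages and 1 on architectures without such a feature.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-PTE state for this sketch; not a kernel structure. */
struct pte_sim {
    bool maps_folio;   /* PTE is present and maps a page of the folio */
    bool vma_writable; /* the VMA covering this PTE permits writes */
};

/* Range of PTE indices that would be mapped writable by one fault. */
struct reuse_range {
    size_t start;
    size_t nr;
};

/*
 * Pick the PTEs to make writable for a write fault at fault_idx, assuming
 * the large folio was already found to be exclusive. "batch" stands in for
 * the architecture's batching granularity (assumed: 16 on arm64 with 4K
 * pages, 1 where no CONT-PTE-like feature exists).
 */
static struct reuse_range pick_reuse_range(const struct pte_sim *ptes,
                                           size_t nr_ptes, size_t fault_idx,
                                           size_t batch)
{
    struct reuse_range single = { fault_idx, 1 };
    size_t start, i;

    if (batch <= 1)
        return single;

    /* Start scanning from the batch-aligned PTE below the faulting one. */
    start = fault_idx - (fault_idx % batch);
    if (start + batch > nr_ptes)
        return single;

    /*
     * Batch only if every PTE in the aligned block maps the folio and is
     * covered by a writable VMA; a read-only VMA in the block forces the
     * single-PTE fallback.
     */
    for (i = start; i < start + batch; i++)
        if (!ptes[i].maps_folio || !ptes[i].vma_writable)
            return single;

    return (struct reuse_range){ start, batch };
}
```

With 32 fully writable PTEs and a fault at index 19, the sketch selects the aligned block starting at index 16; marking any PTE in that block as belonging to a read-only VMA makes it fall back to the single faulting PTE, which keeps the latency of that one fault bounded by a scan of at most one batch.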