Re: [PATCH v4 4/6] mm: Introduce a pageflag for partially mapped folios

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Usama Arif <usamaarif642@gmail.com>
To: Bang Li <libang.linux@gmail.com>, Barry Song <baohua@kernel.org>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
	hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
	roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com,
	ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org,
	cerasuolodomenico@gmail.com, ryncsn@gmail.com, corbet@lwn.net,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	kernel-team@meta.com, libang.li@antgroup.com,
	Lance Yang <ioworker0@gmail.com>
Subject: Re: [PATCH v4 4/6] mm: Introduce a pageflag for partially mapped folios
Date: Mon, 2 Sep 2024 11:09:20 +0100	[thread overview]
Message-ID: <3b910105-591c-45d4-b9c8-8b70f20f283b@gmail.com> (raw)
In-Reply-To: <9eb5af0d-730c-459d-9c2e-5ad7b78f30d7@gmail.com>



On 01/09/2024 08:48, Bang Li wrote:
> hi, Usama
> 
> On 2024/8/22 3:04, Usama Arif wrote:
> 
>>
>> On 20/08/2024 17:30, Barry Song wrote:
>>
>>> Hi Usama,
>>> thanks! I can't judge if we need this partially_mapped flag. but if we
>>> need, the code
>>> looks correct to me. I'd like to leave this to David and other experts to ack.
>>>
>> Thanks for the reviews!
>>
>>> an alternative approach might be two lists? one for entirely_mapped,
>>> the other one
>>> for split_deferred. also seems ugly ?
>>>
>> That was my very first prototype! I shifted to using a bool which I sent in v1, and then a bit in _flags_1 as David suggested. I believe a bit in _flags_1 is the best way forward, as it leaves the most space in folio for future work.
>>
>>> On the other hand, when we want to extend your patchset to mTHP other than PMD-
>>> order, will the only deferred_list create huge lock contention while
>>> adding or removing
>>> folios from it?
>>>
>> Yes, I would imagine so. the deferred_split_queue is per memcg/node, so that helps.
>>
>> Also, this work is tied to khugepaged. So would need some thought when doing it for mTHP.
>>
>> I would imagine doing underused shrinker for mTHP would be less beneficial compared to doing it for 2M THP. But probably needs experimentation.
>>
>> Thanks
> 
> Below is the core code snippet to support "split underused mTHP". Can we extend the
> khugepaged_max_ptes_none value to mthp and keep its semantics unchanged? With a small
> modification, Only folios with page numbers greater than khugepaged_max_ptes_none - 1
> can be added to the deferred_split list and can be split. What do you think?
> 

hmm, so I believe its not as simple as that.

First mTHP support would need to be added to khugepaged. The entire khugepaged code needs
to be audited for it and significantly tested. If you just look at all the instances of
HPAGE_PMD_NR in khugepaged.c, you will see it will be a significant change and needs to
be a series of its own.

Also, different values of max_ptes_none can have different significance for mTHP sizes.
max_ptes_none of 200 does not mean anything for 512K and below, it means 78% of a 1M mTHP
and 39% of a 2M THP. We might want different max_ptes_none for each mTHP if we were to do
this.

The benefit of splitting underused mTHPs of lower order might not be worth the work
needed to split. Someone needs to experiment with different workloads and see some
improvement for these lower orders.

So probably a lot more than the below diff is needed for mTHP support!


> diff --git a/mm/memory.c b/mm/memory.c
> index b95fce7d190f..ef503958d6a0 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4789,6 +4789,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>         }
> 
>         folio_ref_add(folio, nr_pages - 1);
> +       if (nr_pages > 1 && nr_pages > khugepaged_max_ptes_none - 1)
> +               deferred_split_folio(folio, false);
>         add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
>         count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
>         folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> 
> shmem THP has the same memory expansion problem when the shmem_enabled configuration is
> set to always. In my opinion, it is necessary to support "split underused shmem THP",
> but I am not sure if there is any gap in the implementation?
> 
> Bang
> Thanks
>

next prev parent reply	other threads:[~2024-09-02 10:09 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-19  2:30 [PATCH v4 0/6] mm: split underused THPs Usama Arif
2024-08-19  2:30 ` [PATCH v4 1/6] mm: free zapped tail pages when splitting isolated thp Usama Arif
2024-08-19  2:30 ` [PATCH v4 2/6] mm: remap unused subpages to shared zeropage " Usama Arif
2024-08-19  2:30 ` [PATCH v4 3/6] mm: selftest to verify zero-filled pages are mapped to zeropage Usama Arif
2024-08-21 19:09   ` Usama Arif
2024-08-19  2:30 ` [PATCH v4 4/6] mm: Introduce a pageflag for partially mapped folios Usama Arif
2024-08-19  8:29   ` Barry Song
2024-08-19  8:30     ` Barry Song
2024-08-19 14:17     ` Usama Arif
2024-08-19 19:00       ` Barry Song
2024-08-19 20:16         ` Usama Arif
2024-08-19 21:34           ` Barry Song
2024-08-19 21:55             ` Barry Song
2024-08-20 19:35               ` Usama Arif
2024-08-20 21:30                 ` Barry Song
2024-08-21 19:04                   ` Usama Arif
2024-08-21 21:25                     ` Barry Song
2024-09-01 12:48                     ` Bang Li
2024-09-02 10:09                       ` Usama Arif [this message]
2024-08-19  2:30 ` [PATCH v4 5/6] mm: split underused THPs Usama Arif
2024-08-19  2:30 ` [PATCH v4 6/6] mm: add sysfs entry to disable splitting " Usama Arif

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3b910105-591c-45d4-b9c8-8b70f20f283b@gmail.com \
    --to=usamaarif642@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=cerasuolodomenico@gmail.com \
    --cc=corbet@lwn.net \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=ioworker0@gmail.com \
    --cc=kernel-team@meta.com \
    --cc=libang.li@antgroup.com \
    --cc=libang.linux@gmail.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=riel@surriel.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=ryncsn@gmail.com \
    --cc=shakeel.butt@linux.dev \
    --cc=willy@infradead.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).