linux-mm.kvack.org archive mirror
From: Lance Yang <lance.yang@linux.dev>
To: David Hildenbrand <david@redhat.com>
Cc: "Qun-wei Lin (林群崴)" <Qun-wei.Lin@mediatek.com>,
	"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"usamaarif642@gmail.com" <usamaarif642@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"yuzhao@google.com" <yuzhao@google.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"Andrew Yang (楊智強)" <Andrew.Yang@mediatek.com>,
	"npache@redhat.com" <npache@redhat.com>,
	"rppt@kernel.org" <rppt@kernel.org>,
	"willy@infradead.org" <willy@infradead.org>,
	"kernel-team@meta.com" <kernel-team@meta.com>,
	"roman.gushchin@linux.dev" <roman.gushchin@linux.dev>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"cerasuolodomenico@gmail.com" <cerasuolodomenico@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"ryncsn@gmail.com" <ryncsn@gmail.com>,
	"surenb@google.com" <surenb@google.com>,
	"riel@surriel.com" <riel@surriel.com>,
	"shakeel.butt@linux.dev" <shakeel.butt@linux.dev>,
	"Chinwen Chang (張錦文)" <chinwen.chang@mediatek.com>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"Casper Li (李中榮)" <casper.li@mediatek.com>,
	"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
	"linux-mediatek@lists.infradead.org"
	<linux-mediatek@lists.infradead.org>,
	"baohua@kernel.org" <baohua@kernel.org>,
	"kaleshsingh@google.com" <kaleshsingh@google.com>,
	"zhais@google.com" <zhais@google.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v5 2/6] mm: remap unused subpages to shared zeropage when splitting isolated thp
Date: Fri, 19 Sep 2025 20:44:14 +0800	[thread overview]
Message-ID: <4a67a5d5-6043-4ba4-b1ca-2b0a800aafb9@linux.dev> (raw)
In-Reply-To: <9395a9ca-d865-42d7-9ea1-8e693e4e38e0@linux.dev>



On 2025/9/19 20:19, Lance Yang wrote:
> Hey David,
> 
> I believe I've found the exact reason why KSM skips MTE-tagged pages ;p
> 
>>
>>
>> On 2025/9/19 16:14, Lance Yang wrote:
>>>
>>>
>>> On 2025/9/19 15:55, David Hildenbrand wrote:
>>>>>> I think where possible we really only want to identify problematic
>>>>>> (tagged) pages and skip them. And we should either look into fixing
>>>>>> KSM as well or finding out why KSM is not affected.
>>>>>
>>>>> Yeah. Seems like we could introduce a new helper,
>>>>> folio_test_mte_tagged(struct folio *folio). By default, it would
>>>>> return false, and architectures like arm64 can override it.
>>>>
>>>> If we add a new helper it should instead express the semantics that 
>>>> we cannot deduplicate.
>>>
>>> Agreed.
>>>
>>>>
>>>> For THP, I recall that only some pages might be tagged. So likely we 
>>>> want to check per page.
>>>
>>> Yes, a per-page check would be simpler.
>>>
>>>>
>>>>>
>>>>> Looking at the code, the PG_mte_tagged flag is not set for regular 
>>>>> THP.
>>>>
>>>> I think it's supported for THP per page. Only for hugetlb we tag the 
>>>> whole thing through the head page instead of individual pages.
>>>
>>> Right. That's exactly what I meant.
>>>
>>>>
>>>>> The MTE
>>>>> status actually comes from the VM_MTE flag in the VMA that maps it.
>>>>>
>>>>
>>>> During the rmap walk we could check the VMA flag, but there would be 
>>>> no way to just stop the THP shrinker scanning this page early.
>>>>
>>>>> static inline bool folio_test_hugetlb_mte_tagged(struct folio *folio)
>>>>> {
>>>>>     bool ret = test_bit(PG_mte_tagged, &folio->flags.f);
>>>>>
>>>>>     VM_WARN_ON_ONCE(!folio_test_hugetlb(folio));
>>>>>
>>>>>     /*
>>>>>      * If the folio is tagged, ensure ordering with a likely
>>>>>      * subsequent read of the tags.
>>>>>      */
>>>>>     if (ret)
>>>>>         smp_rmb();
>>>>>     return ret;
>>>>> }
>>>>>
>>>>> static inline bool page_mte_tagged(struct page *page)
>>>>> {
>>>>>     bool ret = test_bit(PG_mte_tagged, &page->flags.f);
>>>>>
>>>>>     VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
>>>>>
>>>>>     /*
>>>>>      * If the page is tagged, ensure ordering with a likely subsequent
>>>>>      * read of the tags.
>>>>>      */
>>>>>     if (ret)
>>>>>         smp_rmb();
>>>>>     return ret;
>>>>> }
>>>>>
>>>>> contpte_set_ptes()
>>>>>     __set_ptes()
>>>>>         __set_ptes_anysz()
>>>>>             __sync_cache_and_tags()
>>>>>                 mte_sync_tags()
>>>>>                     set_page_mte_tagged()
>>>>>
>>>>> Then, the THP shrinker could skip any folios that are identified as
>>>>> MTE-tagged.
>>>>
>>>> Likely we should just do something like (maybe we want better naming)
>>>>
>>>> #ifndef page_is_mergable
>>>> #define page_is_mergable(page) (true)
>>>> #endif
>>>
>>>
>>> Maybe something like page_is_optimizable()? Just a thought ;p
>>>
>>>>
>>>> And for arm64 have it be
>>>>
>>>> #define page_is_mergable(page) (!page_mte_tagged(page))
>>>>
>>>>
>>>> And then do
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 1f0813b956436..1cac9093918d6 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -4251,7 +4251,8 @@ static bool thp_underused(struct folio *folio)
>>>>
>>>>          for (i = 0; i < folio_nr_pages(folio); i++) {
>>>>                  kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
>>>> -               if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>> +               if (page_is_mergable(folio_page(folio, i)) &&
>>>> +                   !memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>>                          num_zero_pages++;
>>>>                          if (num_zero_pages > khugepaged_max_ptes_none) {
>>>>                                  kunmap_local(kaddr);
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index 946253c398072..476a9a9091bd3 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -306,6 +306,8 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
>>>>
>>>>          if (PageCompound(page))
>>>>                  return false;
>>>> +       if (!page_is_mergable(page))
>>>> +               return false;
>>>>          VM_BUG_ON_PAGE(!PageAnon(page), page);
>>>>          VM_BUG_ON_PAGE(!PageLocked(page), page);
>>>>          VM_BUG_ON_PAGE(pte_present(ptep_get(pvmw->pte)), page);
>>>
>>> Looks good to me!
>>>
>>>>
>>>>
>>>> For KSM, similarly just bail out early. But still wondering if this
>>>> is already checked somehow for KSM.
>>>
>>> +1 I'm looking for a machine to test it on.
>>
>> Interestingly, it seems KSM is already skipping MTE-tagged pages. My test,
>> running on a v6.8.0 kernel inside QEMU (with MTE enabled), shows no
>> merging activity for those pages ...
> 
> KSM's call to pages_identical() ultimately leads to memcmp_pages(). The
> arm64 implementation of memcmp_pages() in arch/arm64/kernel/mte.c contains
> a specific check that prevents merging in this case.
> 
> try_to_merge_one_page()
>     -> pages_identical()
>         -> !memcmp_pages()   <- fails here for MTE-tagged pages
>     -> replace_page()        <- so this is never reached

Forgot to add:

memcmp_pages() is also called in other KSM paths, such as
stable_tree_search(), stable_tree_insert(), and
unstable_tree_search_insert(), effectively blocking MTE-tagged
pages from entering either of KSM's trees.
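
To make that concrete, here is a tiny standalone sketch (plain userspace C,
not kernel code; fake_memcmp_pages() and the explicit "tagged" flags are
made up purely for illustration) of the return-value trick used by the
arm64 memcmp_pages() quoted below:

#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Mimics the arm64 memcmp_pages() behaviour: identical content still
 * compares as "different" when either side carries MTE tags. */
static int fake_memcmp_pages(const char *addr1, int tagged1,
			     const char *addr2, int tagged2)
{
	int ret = memcmp(addr1, addr2, PAGE_SIZE);

	if (ret)
		return ret;

	/* Content is identical, but at least one page is tagged: return
	 * non-zero (the two mappings differ) so a pages_identical()-style
	 * caller sees a mismatch and nothing gets merged. */
	if (tagged1 || tagged2)
		return addr1 != addr2;

	return 0;
}

int main(void)
{
	static char page_a[PAGE_SIZE], page_b[PAGE_SIZE]; /* both zero-filled */

	printf("untagged vs untagged: %d\n",
	       fake_memcmp_pages(page_a, 0, page_b, 0)); /* 0 -> identical  */
	printf("tagged   vs untagged: %d\n",
	       fake_memcmp_pages(page_a, 1, page_b, 0)); /* 1 -> "different" */
	return 0;
}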

> 
> 
> int memcmp_pages(struct page *page1, struct page *page2)
> {
>      char *addr1, *addr2;
>      int ret;
> 
>      addr1 = page_address(page1);
>      addr2 = page_address(page2);
>      ret = memcmp(addr1, addr2, PAGE_SIZE);
> 
>      if (!system_supports_mte() || ret)
>          return ret;
> 
>      /*
>       * If the page content is identical but at least one of the pages is
>       * tagged, return non-zero to avoid KSM merging. If only one of the
>       * pages is tagged, __set_ptes() may zero or change the tags of the
>       * other page via mte_sync_tags().
>       */
>      if (page_mte_tagged(page1) || page_mte_tagged(page2))
>          return addr1 != addr2;
> 
>      return ret;
> }
> 
> IIUC, if either page is MTE-tagged, memcmp_pages() intentionally returns
> a non-zero value, which in turn causes pages_identical() to return false.
> 
> Cheers,
> Lance
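
So, pulling the pieces above together, the rough direction (untested, and
the naming is still open -- page_is_mergable() vs page_is_optimizable())
would be:

  /* generic fallback: every page is a candidate unless the arch says no */
  #ifndef page_is_mergable
  #define page_is_mergable(page)	(true)
  #endif

  /* arm64 override: never treat MTE-tagged pages as candidates */
  #define page_is_mergable(page)	(!page_mte_tagged(page))

together with the two hunks above in thp_underused() and
try_to_map_unused_to_zeropage(), while KSM already bails out on its own
via the arm64 memcmp_pages().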




Thread overview: 33+ messages
2024-08-30 10:03 [PATCH v5 0/6] mm: split underused THPs Usama Arif
2024-08-30 10:03 ` [PATCH v5 1/6] mm: free zapped tail pages when splitting isolated thp Usama Arif
2024-09-05  8:46   ` Hugh Dickins
2024-09-05 10:21     ` Usama Arif
2024-09-05 18:05       ` Hugh Dickins
2024-09-05 19:24         ` Usama Arif
2024-08-30 10:03 ` [PATCH v5 2/6] mm: remap unused subpages to shared zeropage " Usama Arif
2024-10-23 16:21   ` Zi Yan
2024-10-23 16:50     ` Usama Arif
2024-10-23 16:55       ` Zi Yan
2024-10-23 16:56       ` Yu Zhao
2025-09-18  8:53   ` Qun-wei Lin (林群崴)
2025-09-18  8:56     ` David Hildenbrand
2025-09-18 11:42       ` Usama Arif
2025-09-18 11:57         ` David Hildenbrand
2025-09-18 12:22       ` Lance Yang
2025-09-18 12:25         ` Lance Yang
2025-09-18 12:35         ` David Hildenbrand
2025-09-19  5:16           ` Lance Yang
2025-09-19  7:55             ` David Hildenbrand
2025-09-19  8:14               ` Lance Yang
2025-09-19 10:53                 ` Lance Yang
2025-09-19 12:19                   ` Lance Yang
2025-09-19 12:44                     ` Lance Yang [this message]
2025-09-19 13:09                     ` David Hildenbrand
2025-09-19 13:24                       ` Lance Yang
2024-08-30 10:03 ` [PATCH v5 3/6] mm: selftest to verify zero-filled pages are mapped to zeropage Usama Arif
2024-08-30 10:03 ` [PATCH v5 4/6] mm: Introduce a pageflag for partially mapped folios Usama Arif
2024-12-11 15:03   ` David Hildenbrand
2024-12-12 10:30     ` Usama Arif
2024-12-12 10:49       ` David Hildenbrand
2024-08-30 10:03 ` [PATCH v5 5/6] mm: split underused THPs Usama Arif
2024-08-30 10:03 ` [PATCH v5 6/6] mm: add sysfs entry to disable splitting " Usama Arif
