All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lance Yang <lance.yang@linux.dev>
To: David Hildenbrand <david@redhat.com>
Cc: "Qun-wei Lin (林群崴)" <Qun-wei.Lin@mediatek.com>,
	"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"usamaarif642@gmail.com" <usamaarif642@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"yuzhao@google.com" <yuzhao@google.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"Andrew Yang (楊智強)" <Andrew.Yang@mediatek.com>,
	"npache@redhat.com" <npache@redhat.com>,
	"rppt@kernel.org" <rppt@kernel.org>,
	"willy@infradead.org" <willy@infradead.org>,
	"kernel-team@meta.com" <kernel-team@meta.com>,
	"roman.gushchin@linux.dev" <roman.gushchin@linux.dev>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"cerasuolodomenico@gmail.com" <cerasuolodomenico@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"ryncsn@gmail.com" <ryncsn@gmail.com>,
	"surenb@google.com" <surenb@google.com>,
	"riel@surriel.com" <riel@surriel.com>,
	"shakeel.butt@linux.dev" <shakeel.butt@linux.dev>,
	"Chinwen Chang (張錦文)" <chinwen.chang@mediatek.com>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"Casper Li (李中榮)" <casper.li@mediatek.com>,
	"ryan.roberts@arm.com" <ryan.roberts@arm.com>,
	"linux-mediatek@lists.infradead.org"
	<linux-mediatek@lists.infradead.org>,
	"baohua@kernel.org" <baohua@kernel.org>,
	"kaleshsingh@google.com" <kaleshsingh@google.com>,
	"zhais@google.com" <zhais@google.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v5 2/6] mm: remap unused subpages to shared zeropage when splitting isolated thp
Date: Fri, 19 Sep 2025 20:19:25 +0800	[thread overview]
Message-ID: <9395a9ca-d865-42d7-9ea1-8e693e4e38e0@linux.dev> (raw)
In-Reply-To: <dfb303b9-151b-49ec-b7ef-691c71bd5385@linux.dev>

Hey David,

I believe I've found the exact reason why KSM skips MTE-tagged pages ;p

> 
> 
> On 2025/9/19 16:14, Lance Yang wrote:
>>
>>
>> On 2025/9/19 15:55, David Hildenbrand wrote:
>>>>> I think where possible we really only want to identify problematic
>>>>> (tagged) pages and skip them. And we should either look into fixing 
>>>>> KSM
>>>>> as well or finding out why KSM is not affected.
>>>>
>>>> Yeah. Seems like we could introduce a new helper,
>>>> folio_test_mte_tagged(struct
>>>> folio *folio). By default, it would return false, and architectures 
>>>> like
>>>> arm64
>>>> can override it.
>>>
>>> If we add a new helper it should instead express the semantics that 
>>> we cannot deduplicate.
>>
>> Agreed.
>>
>>>
>>> For THP, I recall that only some pages might be tagged. So likely we 
>>> want to check per page.
>>
>> Yes, a per-page check would be simpler.
>>
>>>
>>>>
>>>> Looking at the code, the PG_mte_tagged flag is not set for regular THP.
>>>
>>> I think it's supported for THP per page. Only for hugetlb we tag the 
>>> whole thing through the head page instead of individual pages.
>>
>> Right. That's exactly what I meant.
>>
>>>
>>>> The MTE
>>>> status actually comes from the VM_MTE flag in the VMA that maps it.
>>>>
>>>
>>> During the rmap walk we could check the VMA flag, but there would be 
>>> no way to just stop the THP shrinker scanning this page early.
>>>
>>>> static inline bool folio_test_hugetlb_mte_tagged(struct folio *folio)
>>>> {
>>>>     bool ret = test_bit(PG_mte_tagged, &folio->flags.f);
>>>>
>>>>     VM_WARN_ON_ONCE(!folio_test_hugetlb(folio));
>>>>
>>>>     /*
>>>>      * If the folio is tagged, ensure ordering with a likely subsequent
>>>>      * read of the tags.
>>>>      */
>>>>     if (ret)
>>>>         smp_rmb();
>>>>     return ret;
>>>> }
>>>>
>>>> static inline bool page_mte_tagged(struct page *page)
>>>> {
>>>>     bool ret = test_bit(PG_mte_tagged, &page->flags.f);
>>>>
>>>>     VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
>>>>
>>>>     /*
>>>>      * If the page is tagged, ensure ordering with a likely subsequent
>>>>      * read of the tags.
>>>>      */
>>>>     if (ret)
>>>>         smp_rmb();
>>>>     return ret;
>>>> }
>>>>
>>>> contpte_set_ptes()
>>>>     __set_ptes()
>>>>         __set_ptes_anysz()
>>>>             __sync_cache_and_tags()
>>>>                 mte_sync_tags()
>>>>                     set_page_mte_tagged()
>>>>
>>>> Then, having the THP shrinker skip any folios that are identified as
>>>> MTE-tagged.
>>>
>>> Likely we should just do something like (maybe we want better naming)
>>>
>>> #ifndef page_is_mergable
>>> #define page_is_mergable(page) (true)
>>> #endif
>>
>>
>> Maybe something like page_is_optimizable()? Just a thought ;p
>>
>>>
>>> And for arm64 have it be
>>>
>>> #define page_is_mergable(page) (!page_mte_tagged(page))
>>>
>>>
>>> And then do
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 1f0813b956436..1cac9093918d6 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -4251,7 +4251,8 @@ static bool thp_underused(struct folio *folio)
>>>
>>>          for (i = 0; i < folio_nr_pages(folio); i++) {
>>>                  kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
>>> -               if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>> +               if (page_is_mergable(folio_page(folio, i)) &&
>>> +                   !memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>                          num_zero_pages++;
>>>                          if (num_zero_pages > 
>>> khugepaged_max_ptes_none) {
>>>                                  kunmap_local(kaddr);
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index 946253c398072..476a9a9091bd3 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -306,6 +306,8 @@ static bool try_to_map_unused_to_zeropage(struct 
>>> page_vma_mapped_walk *pvmw,
>>>
>>>          if (PageCompound(page))
>>>                  return false;
>>> +       if (!page_is_mergable(page))
>>> +               return false;
>>>          VM_BUG_ON_PAGE(!PageAnon(page), page);
>>>          VM_BUG_ON_PAGE(!PageLocked(page), page);
>>>          VM_BUG_ON_PAGE(pte_present(ptep_get(pvmw->pte)), page);
>>
>> Looks good to me!
>>
>>>
>>>
>>> For KSM, similarly just bail out early. But still wondering if this 
>>> is already checked
>>> somehow for KSM.
>>
>> +1 I'm looking for a machine to test it on.
> 
> Interestingly, it seems KSM is already skipping MTE-tagged pages. My test,
> running on a v6.8.0 kernel inside QEMU (with MTE enabled), shows no merging
> activity for those pages ...

KSM's call to pages_identical() ultimately leads to memcmp_pages(). The
arm64 implementation of memcmp_pages() in arch/arm64/kernel/mte.c contains
a specific check that prevents merging in this case.

try_to_merge_one_page()
	-> pages_identical()
		-> !memcmp_pages() Fails!
		-> replace_page()


int memcmp_pages(struct page *page1, struct page *page2)
{
	char *addr1, *addr2;
	int ret;

	addr1 = page_address(page1);
	addr2 = page_address(page2);
	ret = memcmp(addr1, addr2, PAGE_SIZE);

	if (!system_supports_mte() || ret)
		return ret;

	/*
	 * If the page content is identical but at least one of the pages is
	 * tagged, return non-zero to avoid KSM merging. If only one of the
	 * pages is tagged, __set_ptes() may zero or change the tags of the
	 * other page via mte_sync_tags().
	 */
	if (page_mte_tagged(page1) || page_mte_tagged(page2))
		return addr1 != addr2;

	return ret;
}

IIUC, if either page is MTE-tagged, memcmp_pages() intentionally returns
a non-zero value, which in turn causes pages_identical() to return false.

Cheers,
Lance


  reply	other threads:[~2025-09-19 12:19 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-30 10:03 [PATCH v5 0/6] mm: split underused THPs Usama Arif
2024-08-30 10:03 ` [PATCH v5 1/6] mm: free zapped tail pages when splitting isolated thp Usama Arif
2024-09-05  8:46   ` Hugh Dickins
2024-09-05 10:21     ` Usama Arif
2024-09-05 18:05       ` Hugh Dickins
2024-09-05 19:24         ` Usama Arif
2024-08-30 10:03 ` [PATCH v5 2/6] mm: remap unused subpages to shared zeropage " Usama Arif
2024-10-23 16:21   ` Zi Yan
2024-10-23 16:50     ` Usama Arif
2024-10-23 16:55       ` Zi Yan
2024-10-23 16:56       ` Yu Zhao
2025-09-18  8:53   ` Qun-wei Lin (林群崴)
2025-09-18  8:56     ` David Hildenbrand
2025-09-18 11:42       ` Usama Arif
2025-09-18 11:57         ` David Hildenbrand
2025-09-18 12:22       ` Lance Yang
2025-09-18 12:25         ` Lance Yang
2025-09-18 12:35         ` David Hildenbrand
2025-09-19  5:16           ` Lance Yang
2025-09-19  7:55             ` David Hildenbrand
2025-09-19  8:14               ` Lance Yang
2025-09-19 10:53                 ` Lance Yang
2025-09-19 12:19                   ` Lance Yang [this message]
2025-09-19 12:44                     ` Lance Yang
2025-09-19 13:09                     ` David Hildenbrand
2025-09-19 13:24                       ` Lance Yang
2024-08-30 10:03 ` [PATCH v5 3/6] mm: selftest to verify zero-filled pages are mapped to zeropage Usama Arif
2024-08-30 10:03 ` [PATCH v5 4/6] mm: Introduce a pageflag for partially mapped folios Usama Arif
2024-12-11 15:03   ` David Hildenbrand
2024-12-12 10:30     ` Usama Arif
2024-12-12 10:49       ` David Hildenbrand
2024-08-30 10:03 ` [PATCH v5 5/6] mm: split underused THPs Usama Arif
2024-08-30 10:03 ` [PATCH v5 6/6] mm: add sysfs entry to disable splitting " Usama Arif

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9395a9ca-d865-42d7-9ea1-8e693e4e38e0@linux.dev \
    --to=lance.yang@linux.dev \
    --cc=Andrew.Yang@mediatek.com \
    --cc=Qun-wei.Lin@mediatek.com \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=casper.li@mediatek.com \
    --cc=catalin.marinas@arm.com \
    --cc=cerasuolodomenico@gmail.com \
    --cc=chinwen.chang@mediatek.com \
    --cc=corbet@lwn.net \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=kaleshsingh@google.com \
    --cc=kernel-team@meta.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mediatek@lists.infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=npache@redhat.com \
    --cc=riel@surriel.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=ryncsn@gmail.com \
    --cc=shakeel.butt@linux.dev \
    --cc=surenb@google.com \
    --cc=usamaarif642@gmail.com \
    --cc=willy@infradead.org \
    --cc=yuzhao@google.com \
    --cc=zhais@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.