From: Lance Yang <lance.yang@linux.dev>
To: David Hildenbrand <david@redhat.com>,
Catalin Marinas <catalin.marinas@arm.com>
Cc: akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
usamaarif642@gmail.com, yuzhao@google.com, ziy@nvidia.com,
baolin.wang@linux.alibaba.com, baohua@kernel.org,
voidice@gmail.com, Liam.Howlett@oracle.com,
cerasuolodomenico@gmail.com, hannes@cmpxchg.org,
kaleshsingh@google.com, npache@redhat.com, riel@surriel.com,
roman.gushchin@linux.dev, rppt@kernel.org, ryan.roberts@arm.com,
dev.jain@arm.com, ryncsn@gmail.com, shakeel.butt@linux.dev,
surenb@google.com, hughd@google.com, willy@infradead.org,
matthew.brost@intel.com, joshua.hahnjy@gmail.com,
rakie.kim@sk.com, byungchul@sk.com, gourry@gourry.net,
ying.huang@linux.alibaba.com, apopple@nvidia.com,
qun-wei.lin@mediatek.com, Andrew.Yang@mediatek.com,
casper.li@mediatek.com, chinwen.chang@mediatek.com,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mediatek@lists.infradead.org,
linux-mm@kvack.org, ioworker0@gmail.com, stable@vger.kernel.org
Subject: Re: [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
Date: Wed, 24 Sep 2025 01:20:16 +0800 [thread overview]
Message-ID: <1a8352a6-e03f-4f3e-a06b-fab757e39eab@linux.dev> (raw)
In-Reply-To: <8bf8302a-6aba-4f7e-8356-a933bcf9e4a1@redhat.com>
On 2025/9/23 20:00, David Hildenbrand wrote:
> On 23.09.25 13:52, Catalin Marinas wrote:
>> On Mon, Sep 22, 2025 at 07:59:00PM +0200, David Hildenbrand wrote:
>>> On 22.09.25 19:24, Catalin Marinas wrote:
>>>> On Mon, Sep 22, 2025 at 10:14:58AM +0800, Lance Yang wrote:
>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>> index 32e0ec2dde36..28d4b02a1aa5 100644
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -4104,29 +4104,20 @@ static unsigned long
>>>>> deferred_split_count(struct shrinker *shrink,
>>>>> static bool thp_underused(struct folio *folio)
>>>>> {
>>>>> int num_zero_pages = 0, num_filled_pages = 0;
>>>>> - void *kaddr;
>>>>> int i;
>>>>> for (i = 0; i < folio_nr_pages(folio); i++) {
>>>>> - kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
>>>>> - if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>>> - num_zero_pages++;
>>>>> - if (num_zero_pages > khugepaged_max_ptes_none) {
>>>>> - kunmap_local(kaddr);
>>>>> + if (pages_identical(folio_page(folio, i), ZERO_PAGE(0))) {
>>>>> + if (++num_zero_pages > khugepaged_max_ptes_none)
>>>>> return true;
>>>>
>>>> I wonder what the overhead of doing a memcmp() vs memchr_inv() is. The
>>>> former will need to read from two places. If it's noticeable, it would
>>>> affect architectures that don't have an MTE equivalent.
>>>>
>>>> Alternatively we could introduce something like folio_has_metadata()
>>>> which on arm64 simply checks PG_mte_tagged.
>>>
>>> We discussed something similar in the other thread (I suggested
>>> page_is_mergable()). I'd prefer to use pages_identical() for now, so
>>> we have
>>> the same logic here and in ksm code.
>>>
>>> (this patch here almost looks like a cleanup :) )
>>>
>>> If this becomes a problem, what we could do is in pages_identical()
>>> would be
>>> simply doing the memchr_inv() in case is_zero_pfn(). KSM might
>>> benefit from
>>> that as well when merging with the shared zeropage through
>>> try_to_merge_with_zero_page().
>>
>> Yes, we can always optimise it later.
>>
>> I just realised that on arm64 with MTE we won't get any merging with the
>> zero page even if the user page isn't mapped with PROT_MTE. In
>> cpu_enable_mte() we zero the tags in the zero page and set
>> PG_mte_tagged. The reason is that we want to use the zero page with
>> PROT_MTE mappings (until tag setting causes CoW). Hmm, the arm64
>> memcmp_pages() messed up KSM merging with the zero page even before this
>> patch.
>>
>> The MTE tag setting evolved a bit over time with some locking using PG_*
>> flags to avoid a set_pte_at() race trying to initialise the tags on the
>> same page. We also moved the swap restoring to arch_swap_restore()
>> rather than the set_pte_at() path. So it is safe now to merge with the
>> zero page if the other page isn't tagged. A subsequent set_pte_at()
>> attempting to clear the tags would notice that the zero page is already
>> tagged.
>>
>> We could go a step further and add tag comparison (I had some code
>> around) but I think the quick fix is to just not treat the zero page as
>> tagged.
>
> I assume any tag changes would result in CoW.
Right, I did a test and confirmed that any attempt to change
tags on the shared zero page results in a Copy-on-Write, as
expected ;)
>
> It would be interesting to know if there are use cases with VMs or other
> workloads where that could be beneficial with KSM.
>
>> Not fully tested yet:
>>
>> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
>> index e5e773844889..72a1dfc54659 100644
>> --- a/arch/arm64/kernel/mte.c
>> +++ b/arch/arm64/kernel/mte.c
>> @@ -73,6 +73,8 @@ int memcmp_pages(struct page *page1, struct page
>> *page2)
>> {
>> char *addr1, *addr2;
>> int ret;
>> + bool page1_tagged = page_mte_tagged(page1) && !is_zero_page(page1);
>> + bool page2_tagged = page_mte_tagged(page2) && !is_zero_page(page2);
>> addr1 = page_address(page1);
>> addr2 = page_address(page2);
>> @@ -83,11 +85,10 @@ int memcmp_pages(struct page *page1, struct page
>> *page2)
>> /*
>> * If the page content is identical but at least one of the
>> pages is
>> - * tagged, return non-zero to avoid KSM merging. If only one of the
>> - * pages is tagged, __set_ptes() may zero or change the tags of the
>> - * other page via mte_sync_tags().
>> + * tagged, return non-zero to avoid KSM merging. Ignore the zero
>> page
>> + * since it is always tagged with the tags cleared.
>> */
>> - if (page_mte_tagged(page1) || page_mte_tagged(page2))
>> + if (page1_tagged || page2_tagged)
>> return addr1 != addr2;
>
> That looks reasonable to me.
>
> @Lance as you had a test setup, could you give this a try as well with
> KSM shared zeropage deduplication enabled whether it now works as
> expected as well?
This works as expected. Both KSM (with use_zero_pages enabled) and
the THP shrinker are now able to successfully merge zero-filled
pages with the shared zero page, as long as those pages are not
mapped with PROT_MTE.
>
> Then, this should likely be an independent fix.
@Catalin could you send a real patch?
next prev parent reply other threads:[~2025-09-23 17:20 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-22 2:14 [PATCH 1/1] mm/thp: fix MTE tag mismatch when replacing zero-filled subpages Lance Yang
2025-09-22 2:36 ` Zi Yan
2025-09-22 3:36 ` Lance Yang
2025-09-22 3:36 ` Lance Yang
2025-09-22 7:41 ` David Hildenbrand
2025-09-22 8:24 ` Usama Arif
2025-09-22 17:24 ` Catalin Marinas
2025-09-22 17:59 ` David Hildenbrand
2025-09-23 1:48 ` Lance Yang
2025-09-23 11:52 ` Catalin Marinas
2025-09-23 12:00 ` David Hildenbrand
2025-09-23 12:04 ` Lance Yang
2025-09-23 12:51 ` Catalin Marinas
2025-09-23 17:20 ` Lance Yang [this message]
2025-09-23 16:14 ` Catalin Marinas
2025-09-23 16:40 ` David Hildenbrand
2025-09-24 2:49 ` Lance Yang
2025-09-24 8:50 ` Catalin Marinas
2025-09-24 9:13 ` David Hildenbrand
2025-09-24 9:34 ` Catalin Marinas
2025-09-24 9:44 ` David Hildenbrand
2025-09-24 9:59 ` Catalin Marinas
2025-09-23 2:10 ` Wei Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1a8352a6-e03f-4f3e-a06b-fab757e39eab@linux.dev \
--to=lance.yang@linux.dev \
--cc=Andrew.Yang@mediatek.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=apopple@nvidia.com \
--cc=baohua@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=byungchul@sk.com \
--cc=casper.li@mediatek.com \
--cc=catalin.marinas@arm.com \
--cc=cerasuolodomenico@gmail.com \
--cc=chinwen.chang@mediatek.com \
--cc=david@redhat.com \
--cc=dev.jain@arm.com \
--cc=gourry@gourry.net \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=ioworker0@gmail.com \
--cc=joshua.hahnjy@gmail.com \
--cc=kaleshsingh@google.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mediatek@lists.infradead.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=matthew.brost@intel.com \
--cc=npache@redhat.com \
--cc=qun-wei.lin@mediatek.com \
--cc=rakie.kim@sk.com \
--cc=riel@surriel.com \
--cc=roman.gushchin@linux.dev \
--cc=rppt@kernel.org \
--cc=ryan.roberts@arm.com \
--cc=ryncsn@gmail.com \
--cc=shakeel.butt@linux.dev \
--cc=stable@vger.kernel.org \
--cc=surenb@google.com \
--cc=usamaarif642@gmail.com \
--cc=voidice@gmail.com \
--cc=willy@infradead.org \
--cc=ying.huang@linux.alibaba.com \
--cc=yuzhao@google.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.