Date: Fri, 19 Sep 2025 21:24:35 +0800
Subject: Re: [PATCH v5 2/6] mm: remap unused subpages to shared zeropage when splitting isolated thp
To: David Hildenbrand
Cc: =?UTF-8?B?UXVuLXdlaSBMaW4gKOael+e+pOW0tCk=?= , "catalin.marinas@arm.com" ,
 "usamaarif642@gmail.com" , "linux-mm@kvack.org" , "yuzhao@google.com" ,
 "akpm@linux-foundation.org" , "corbet@lwn.net" ,
 =?UTF-8?B?QW5kcmV3IFlhbmcgKOaliuaZuuW8tyk=?= , "npache@redhat.com" ,
 "rppt@kernel.org" , "willy@infradead.org" , "kernel-team@meta.com" ,
 "roman.gushchin@linux.dev" , "hannes@cmpxchg.org" ,
 "cerasuolodomenico@gmail.com" , "linux-kernel@vger.kernel.org" ,
 "ryncsn@gmail.com" , "surenb@google.com" , "riel@surriel.com" ,
 "shakeel.butt@linux.dev" , =?UTF-8?B?Q2hpbndlbiBDaGFuZyAo5by16Yym5paHKQ==?= ,
 "linux-doc@vger.kernel.org" , =?UTF-8?B?Q2FzcGVyIExpICjmnY7kuK3mpq4p?= ,
 "ryan.roberts@arm.com" , "linux-mediatek@lists.infradead.org" ,
 "baohua@kernel.org" , "kaleshsingh@google.com" , "zhais@google.com" ,
 "linux-arm-kernel@lists.infradead.org"
References: <20240830100438.3623486-1-usamaarif642@gmail.com>
 <20240830100438.3623486-3-usamaarif642@gmail.com>
 <434c092b-0f19-47bf-a5fa-ea5b4b36c35e@redhat.com>
 <120445c8-7250-42e0-ad6a-978020c8fad3@linux.dev>
 <9d2c3e3e-439d-4695-b7c9-21fa52f48ced@redhat.com>
 <4cf41cd5-e93a-412b-b209-4180bd2d4015@linux.dev>
 <9395a9ca-d865-42d7-9ea1-8e693e4e38e0@linux.dev>
From: Lance Yang

On 2025/9/19 21:09, David Hildenbrand wrote:
> On 19.09.25 14:19, Lance Yang wrote:
>> Hey David,
>>
>> I believe I've found the exact reason why KSM skips MTE-tagged pages ;p
>>
>>>
>>>
>>> On 2025/9/19 16:14, Lance Yang wrote:
>>>>
>>>>
>>>> On 2025/9/19 15:55, David Hildenbrand wrote:
>>>>>>> I think where possible we really only want to identify problematic
>>>>>>> (tagged) pages and skip them. And we should either look into fixing KSM
>>>>>>> as well or finding out why KSM is not affected.
>>>>>>
>>>>>> Yeah. Seems like we could introduce a new helper,
>>>>>> folio_test_mte_tagged(struct folio *folio). By default, it would return
>>>>>> false, and architectures like arm64 can override it.
>>>>>
>>>>> If we add a new helper it should instead express the semantics that
>>>>> we cannot deduplicate.
>>>>
>>>> Agreed.
>>>>
>>>>>
>>>>> For THP, I recall that only some pages might be tagged. So likely we
>>>>> want to check per page.
>>>>
>>>> Yes, a per-page check would be simpler.
>>>>
>>>>>
>>>>>>
>>>>>> Looking at the code, the PG_mte_tagged flag is not set for regular THP.
>>>>>
>>>>> I think it's supported for THP per page. Only for hugetlb we tag the
>>>>> whole thing through the head page instead of individual pages.
>>>>
>>>> Right. That's exactly what I meant.
>>>>
>>>>>
>>>>>> The MTE status actually comes from the VM_MTE flag in the VMA that maps it.
>>>>>>
>>>>>
>>>>> During the rmap walk we could check the VMA flag, but there would be
>>>>> no way to just stop the THP shrinker scanning this page early.
>>>>>
>>>>>> static inline bool folio_test_hugetlb_mte_tagged(struct folio *folio)
>>>>>> {
>>>>>>      bool ret = test_bit(PG_mte_tagged, &folio->flags.f);
>>>>>>
>>>>>>      VM_WARN_ON_ONCE(!folio_test_hugetlb(folio));
>>>>>>
>>>>>>      /*
>>>>>>       * If the folio is tagged, ensure ordering with a likely subsequent
>>>>>>       * read of the tags.
>>>>>>       */
>>>>>>      if (ret)
>>>>>>          smp_rmb();
>>>>>>      return ret;
>>>>>> }
>>>>>>
>>>>>> static inline bool page_mte_tagged(struct page *page)
>>>>>> {
>>>>>>      bool ret = test_bit(PG_mte_tagged, &page->flags.f);
>>>>>>
>>>>>>      VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
>>>>>>
>>>>>>      /*
>>>>>>       * If the page is tagged, ensure ordering with a likely subsequent
>>>>>>       * read of the tags.
>>>>>>       */
>>>>>>      if (ret)
>>>>>>          smp_rmb();
>>>>>>      return ret;
>>>>>> }
>>>>>>
>>>>>> contpte_set_ptes()
>>>>>>      __set_ptes()
>>>>>>          __set_ptes_anysz()
>>>>>>              __sync_cache_and_tags()
>>>>>>                  mte_sync_tags()
>>>>>>                      set_page_mte_tagged()
>>>>>>
>>>>>> Then, having the THP shrinker skip any folios that are identified as
>>>>>> MTE-tagged.
>>>>>
>>>>> Likely we should just do something like (maybe we want better naming)
>>>>>
>>>>> #ifndef page_is_mergable
>>>>> #define page_is_mergable(page) (true)
>>>>> #endif
>>>>
>>>>
>>>> Maybe something like page_is_optimizable()?
>>>> Just a thought ;p
>>>>
>>>>>
>>>>> And for arm64 have it be
>>>>>
>>>>> #define page_is_mergable(page) (!page_mte_tagged(page))
>>>>>
>>>>>
>>>>> And then do
>>>>>
>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>> index 1f0813b956436..1cac9093918d6 100644
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -4251,7 +4251,8 @@ static bool thp_underused(struct folio *folio)
>>>>>
>>>>>           for (i = 0; i < folio_nr_pages(folio); i++) {
>>>>>                   kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
>>>>> -               if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>>> +               if (page_is_mergable(folio_page(folio, i)) &&
>>>>> +                   !memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>>>                           num_zero_pages++;
>>>>>                           if (num_zero_pages > khugepaged_max_ptes_none) {
>>>>>                                   kunmap_local(kaddr);
>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>> index 946253c398072..476a9a9091bd3 100644
>>>>> --- a/mm/migrate.c
>>>>> +++ b/mm/migrate.c
>>>>> @@ -306,6 +306,8 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
>>>>>
>>>>>           if (PageCompound(page))
>>>>>                   return false;
>>>>> +       if (!page_is_mergable(page))
>>>>> +               return false;
>>>>>           VM_BUG_ON_PAGE(!PageAnon(page), page);
>>>>>           VM_BUG_ON_PAGE(!PageLocked(page), page);
>>>>>           VM_BUG_ON_PAGE(pte_present(ptep_get(pvmw->pte)), page);
>>>>
>>>> Looks good to me!
>>>>
>>>>>
>>>>>
>>>>> For KSM, similarly just bail out early. But still wondering if this
>>>>> is already checked somehow for KSM.
>>>>
>>>> +1 I'm looking for a machine to test it on.
>>>
>>> Interestingly, it seems KSM is already skipping MTE-tagged pages. My test,
>>> running on a v6.8.0 kernel inside QEMU (with MTE enabled), shows no merging
>>> activity for those pages ...
>>
>> KSM's call to pages_identical() ultimately leads to memcmp_pages(). The
>> arm64 implementation of memcmp_pages() in arch/arm64/kernel/mte.c contains
>> a specific check that prevents merging in this case.
>>
>> try_to_merge_one_page()
>>     -> pages_identical()
>>         -> !memcmp_pages() Fails!
>>         -> replace_page()
>>
>>
>> int memcmp_pages(struct page *page1, struct page *page2)
>> {
>>     char *addr1, *addr2;
>>     int ret;
>>
>>     addr1 = page_address(page1);
>>     addr2 = page_address(page2);
>>     ret = memcmp(addr1, addr2, PAGE_SIZE);
>>
>>     if (!system_supports_mte() || ret)
>>         return ret;
>>
>>     /*
>>      * If the page content is identical but at least one of the pages is
>>      * tagged, return non-zero to avoid KSM merging. If only one of the
>>      * pages is tagged, __set_ptes() may zero or change the tags of the
>>      * other page via mte_sync_tags().
>>      */
>>     if (page_mte_tagged(page1) || page_mte_tagged(page2))
>>         return addr1 != addr2;
>>
>>     return ret;
>> }
>>
>> IIUC, if either page is MTE-tagged, memcmp_pages() intentionally returns
>> a non-zero value, which in turn causes pages_identical() to return false.
>
> Cool, so we should likely just use that then in the shrinker code. Can
> you send a fix?

Certainly! I'll get on that ;p

Cheers,
Lance
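
For illustration only, here is a rough userspace model of the idea discussed above: reusing the memcmp_pages()-style check so that a zero-filled but MTE-tagged page is never treated as replaceable by the shared zeropage. This is a sketch, not the actual fix. struct model_page, MODEL_PAGE_SIZE and every model_* function are hypothetical stand-ins rather than kernel APIs, and the model assumes MTE is always supported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical userspace stand-in for struct page; not the kernel type. */
struct model_page {
	unsigned char data[MODEL_PAGE_SIZE];
	bool mte_tagged;	/* models the PG_mte_tagged flag */
};

/* Models the shared zeropage: all-zero content, never tagged. */
static const struct model_page model_zero_page = { .mte_tagged = false };

/*
 * Models the arm64 memcmp_pages() quoted above, assuming MTE is supported:
 * identical content still compares as "different" if either page is tagged,
 * unless both arguments are the very same page.
 */
static int model_memcmp_pages(const struct model_page *p1,
			      const struct model_page *p2)
{
	int ret;

	assert(p1 != NULL && p2 != NULL);

	ret = memcmp(p1->data, p2->data, MODEL_PAGE_SIZE);
	if (ret)
		return ret;

	if (p1->mte_tagged || p2->mte_tagged)
		return p1 != p2;

	return 0;
}

static bool model_pages_identical(const struct model_page *p1,
				  const struct model_page *p2)
{
	return !model_memcmp_pages(p1, p2);
}

/*
 * Assumed shape of the shrinker-side check: a subpage only counts as
 * "unused" if it is identical to the zeropage under the tag-aware compare.
 */
static bool model_page_can_use_zeropage(const struct model_page *page)
{
	return model_pages_identical(page, &model_zero_page);
}
```

In this model an untagged all-zero page passes the check, while a tagged all-zero page or any page with non-zero content does not. That mirrors why KSM already skips tagged pages without needing a separate page_is_mergable() helper.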