From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4a67a5d5-6043-4ba4-b1ca-2b0a800aafb9@linux.dev>
Date: Fri, 19 Sep 2025 20:44:14 +0800
MIME-Version: 1.0
Subject: Re: [PATCH v5 2/6] mm: remap unused subpages to shared zeropage when splitting isolated thp
From: Lance Yang
To: David Hildenbrand
Cc: =?UTF-8?B?UXVuLXdlaSBMaW4gKOael+e+pOW0tCk=?=,
 "catalin.marinas@arm.com", "usamaarif642@gmail.com",
 "linux-mm@kvack.org", "yuzhao@google.com", "akpm@linux-foundation.org",
 "corbet@lwn.net", =?UTF-8?B?QW5kcmV3IFlhbmcgKOaliuaZuuW8tyk=?=,
 "npache@redhat.com", "rppt@kernel.org", "willy@infradead.org",
 "kernel-team@meta.com", "roman.gushchin@linux.dev", "hannes@cmpxchg.org",
 "cerasuolodomenico@gmail.com", "linux-kernel@vger.kernel.org",
 "ryncsn@gmail.com", "surenb@google.com", "riel@surriel.com",
 "shakeel.butt@linux.dev", =?UTF-8?B?Q2hpbndlbiBDaGFuZyAo5by16Yym5paHKQ==?=,
 "linux-doc@vger.kernel.org", =?UTF-8?B?Q2FzcGVyIExpICjmnY7kuK3mpq4p?=,
 "ryan.roberts@arm.com", "linux-mediatek@lists.infradead.org",
 "baohua@kernel.org", "kaleshsingh@google.com", "zhais@google.com",
 "linux-arm-kernel@lists.infradead.org"
References: <20240830100438.3623486-1-usamaarif642@gmail.com>
 <20240830100438.3623486-3-usamaarif642@gmail.com>
 <434c092b-0f19-47bf-a5fa-ea5b4b36c35e@redhat.com>
 <120445c8-7250-42e0-ad6a-978020c8fad3@linux.dev>
 <9d2c3e3e-439d-4695-b7c9-21fa52f48ced@redhat.com>
 <4cf41cd5-e93a-412b-b209-4180bd2d4015@linux.dev>
 <9395a9ca-d865-42d7-9ea1-8e693e4e38e0@linux.dev>
In-Reply-To: <9395a9ca-d865-42d7-9ea1-8e693e4e38e0@linux.dev>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 2025/9/19 20:19, Lance Yang wrote:
> Hey David,
>
> I believe I've found the exact reason why KSM skips MTE-tagged pages ;p
>
>>
>>
>> On 2025/9/19 16:14, Lance Yang wrote:
>>>
>>>
>>> On 2025/9/19 15:55, David Hildenbrand wrote:
>>>>>> I think where possible we really only want to identify problematic
>>>>>> (tagged) pages and skip them. And we should either look into fixing KSM
>>>>>> as well or finding out why KSM is not affected.
>>>>>
>>>>> Yeah. Seems like we could introduce a new helper,
>>>>> folio_test_mte_tagged(struct folio *folio). By default, it would return
>>>>> false, and architectures like arm64 can override it.
>>>>
>>>> If we add a new helper it should instead express the semantics that
>>>> we cannot deduplicate.
>>>
>>> Agreed.
>>>
>>>>
>>>> For THP, I recall that only some pages might be tagged. So likely we
>>>> want to check per page.
>>>
>>> Yes, a per-page check would be simpler.
>>>
>>>>
>>>>>
>>>>> Looking at the code, the PG_mte_tagged flag is not set for regular THP.
>>>>
>>>> I think it's supported for THP per page. Only for hugetlb we tag the
>>>> whole thing through the head page instead of individual pages.
>>>
>>> Right. That's exactly what I meant.
>>>
>>>>
>>>>> The MTE status actually comes from the VM_MTE flag in the VMA that
>>>>> maps it.
>>>>>
>>>>
>>>> During the rmap walk we could check the VMA flag, but there would be
>>>> no way to just stop the THP shrinker scanning this page early.
>>>>
>>>>> static inline bool folio_test_hugetlb_mte_tagged(struct folio *folio)
>>>>> {
>>>>>     bool ret = test_bit(PG_mte_tagged, &folio->flags.f);
>>>>>
>>>>>     VM_WARN_ON_ONCE(!folio_test_hugetlb(folio));
>>>>>
>>>>>     /*
>>>>>      * If the folio is tagged, ensure ordering with a likely subsequent
>>>>>      * read of the tags.
>>>>>      */
>>>>>     if (ret)
>>>>>         smp_rmb();
>>>>>     return ret;
>>>>> }
>>>>>
>>>>> static inline bool page_mte_tagged(struct page *page)
>>>>> {
>>>>>     bool ret = test_bit(PG_mte_tagged, &page->flags.f);
>>>>>
>>>>>     VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
>>>>>
>>>>>     /*
>>>>>      * If the page is tagged, ensure ordering with a likely subsequent
>>>>>      * read of the tags.
>>>>>      */
>>>>>     if (ret)
>>>>>         smp_rmb();
>>>>>     return ret;
>>>>> }
>>>>>
>>>>> contpte_set_ptes()
>>>>>     __set_ptes()
>>>>>         __set_ptes_anysz()
>>>>>             __sync_cache_and_tags()
>>>>>                 mte_sync_tags()
>>>>>                     set_page_mte_tagged()
>>>>>
>>>>> Then, having the THP shrinker skip any folios that are identified as
>>>>> MTE-tagged.
>>>>
>>>> Likely we should just do something like (maybe we want better naming)
>>>>
>>>> #ifndef page_is_mergable
>>>> #define page_is_mergable(page) (true)
>>>> #endif
>>>
>>>
>>> Maybe something like page_is_optimizable()?
>>> Just a thought ;p
>>>
>>>>
>>>> And for arm64 have it be
>>>>
>>>> #define page_is_mergable(page) (!page_mte_tagged(page))
>>>>
>>>>
>>>> And then do
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 1f0813b956436..1cac9093918d6 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -4251,7 +4251,8 @@ static bool thp_underused(struct folio *folio)
>>>>
>>>>          for (i = 0; i < folio_nr_pages(folio); i++) {
>>>>                  kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
>>>> -               if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>> +               if (page_is_mergable(folio_page(folio, i)) &&
>>>> +                   !memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>>                          num_zero_pages++;
>>>>                          if (num_zero_pages > khugepaged_max_ptes_none) {
>>>>                                  kunmap_local(kaddr);
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index 946253c398072..476a9a9091bd3 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -306,6 +306,8 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
>>>>
>>>>          if (PageCompound(page))
>>>>                  return false;
>>>> +       if (!page_is_mergable(page))
>>>> +               return false;
>>>>          VM_BUG_ON_PAGE(!PageAnon(page), page);
>>>>          VM_BUG_ON_PAGE(!PageLocked(page), page);
>>>>          VM_BUG_ON_PAGE(pte_present(ptep_get(pvmw->pte)), page);
>>>
>>> Looks good to me!
>>>
>>>>
>>>>
>>>> For KSM, similarly just bail out early. But still wondering if this
>>>> is already checked somehow for KSM.
>>>
>>> +1 I'm looking for a machine to test it on.
>>
>> Interestingly, it seems KSM is already skipping MTE-tagged pages. My test,
>> running on a v6.8.0 kernel inside QEMU (with MTE enabled), shows no merging
>> activity for those pages ...
>
> KSM's call to pages_identical() ultimately leads to memcmp_pages().
> The arm64 implementation of memcmp_pages() in arch/arm64/kernel/mte.c contains
> a specific check that prevents merging in this case.
>
> try_to_merge_one_page()
>     -> pages_identical()
>         -> !memcmp_pages() Fails!
>         -> replace_page()

Forgot to add: memcmp_pages() is also called in other KSM paths, such as
stable_tree_search(), stable_tree_insert(), and
unstable_tree_search_insert(), effectively blocking MTE-tagged pages from
entering either of KSM's trees.

>
>
> int memcmp_pages(struct page *page1, struct page *page2)
> {
>     char *addr1, *addr2;
>     int ret;
>
>     addr1 = page_address(page1);
>     addr2 = page_address(page2);
>     ret = memcmp(addr1, addr2, PAGE_SIZE);
>
>     if (!system_supports_mte() || ret)
>         return ret;
>
>     /*
>      * If the page content is identical but at least one of the pages is
>      * tagged, return non-zero to avoid KSM merging. If only one of the
>      * pages is tagged, __set_ptes() may zero or change the tags of the
>      * other page via mte_sync_tags().
>      */
>     if (page_mte_tagged(page1) || page_mte_tagged(page2))
>         return addr1 != addr2;
>
>     return ret;
> }
>
> IIUC, if either page is MTE-tagged, memcmp_pages() intentionally returns
> a non-zero value, which in turn causes pages_identical() to return false.
>
> Cheers,
> Lance
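
For anyone who wants to poke at the logic outside the kernel, here is a
rough userspace sketch of the check quoted above. It is only a model, not
the kernel code: struct page is a toy stand-in whose mte_tagged field is a
hypothetical replacement for the PG_mte_tagged flag, system_supports_mte()
is stubbed to always return true, and self_check() is a made-up helper
that exists only to exercise the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Toy stand-in for the kernel's struct page (assumption, not the real layout). */
struct page {
	char data[PAGE_SIZE];
	bool mte_tagged;	/* models PG_mte_tagged */
};

/* Assumption: pretend we always run on MTE-capable hardware. */
bool system_supports_mte(void)
{
	return true;
}

bool page_mte_tagged(const struct page *page)
{
	return page->mte_tagged;
}

/* Mirrors the control flow of the arm64 memcmp_pages() quoted above. */
int memcmp_pages(const struct page *page1, const struct page *page2)
{
	const char *addr1 = page1->data;
	const char *addr2 = page2->data;
	int ret = memcmp(addr1, addr2, PAGE_SIZE);

	if (!system_supports_mte() || ret)
		return ret;

	/* Same content, but a tag is involved: only a page compared with itself matches. */
	if (page_mte_tagged(page1) || page_mte_tagged(page2))
		return addr1 != addr2;

	return ret;
}

/* Identical means memcmp_pages() reported no difference. */
bool pages_identical(const struct page *page1, const struct page *page2)
{
	return !memcmp_pages(page1, page2);
}

/* Hypothetical helper that walks through the interesting cases. */
void self_check(void)
{
	static struct page a, b, tagged;	/* all zero-filled */

	tagged.mte_tagged = true;

	assert(pages_identical(&a, &b));		/* untagged, same content */
	assert(!pages_identical(&a, &tagged));		/* one side tagged */
	assert(!pages_identical(&tagged, &a));		/* order does not matter */
	assert(pages_identical(&tagged, &tagged));	/* same page still matches itself */
}
```

In this model, two zero-filled pages compare as identical only while
neither is tagged, which matches the behaviour described above for the
KSM tree lookups.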