From: Nico Pache
Date: Thu, 17 Jul 2025 01:22:58 -0600
Subject: Re: [PATCH v9 05/14] khugepaged: generalize __collapse_huge_page_* for mTHP support
To: David Hildenbrand
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, ziy@nvidia.com, baolin.wang@linux.alibaba.com, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org, hughd@google.com
References: <20250714003207.113275-1-npache@redhat.com> <20250714003207.113275-6-npache@redhat.com>
On Wed, Jul 16, 2025 at 7:53 AM David Hildenbrand wrote:
>
> On 14.07.25 02:31, Nico Pache wrote:
> > generalize the order of the __collapse_huge_page_* functions
> > to support future mTHP collapse.
> >
> > mTHP collapse can suffer from inconsistent behavior, and memory waste
> > "creep". disable swapin and shared support for mTHP collapse.
> >
> > No functional changes in this patch.
> >
> > Reviewed-by: Baolin Wang
> > Co-developed-by: Dev Jain
> > Signed-off-by: Dev Jain
> > Signed-off-by: Nico Pache
> > ---
> >   mm/khugepaged.c | 49 +++++++++++++++++++++++++++++++------------------
> >   1 file changed, 31 insertions(+), 18 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index cc9a35185604..ee54e3c1db4e 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -552,15 +552,17 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >                                         unsigned long address,
> >                                         pte_t *pte,
> >                                         struct collapse_control *cc,
> > -                                       struct list_head *compound_pagelist)
> > +                                       struct list_head *compound_pagelist,
> > +                                       u8 order)
>
> u8 ... (applies to all instances)

Fixed all instances of this (other than those that need to stay)

> >   {
> >       struct page *page = NULL;
> >       struct folio *folio = NULL;
> >       pte_t *_pte;
> >       int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
> >       bool writable = false;
> > +     int scaled_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
>
> "scaled_max_ptes_none" maybe?

done!
> >
> > -     for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
> > +     for (_pte = pte; _pte < pte + (1 << order);
> >            _pte++, address += PAGE_SIZE) {
> >               pte_t pteval = ptep_get(_pte);
> >               if (pte_none(pteval) || (pte_present(pteval) &&
> > @@ -568,7 +570,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >                       ++none_or_zero;
> >                       if (!userfaultfd_armed(vma) &&
> >                           (!cc->is_khugepaged ||
> > -                          none_or_zero <= khugepaged_max_ptes_none)) {
> > +                          none_or_zero <= scaled_none)) {
> >                               continue;
> >                       } else {
> >                               result = SCAN_EXCEED_NONE_PTE;
> > @@ -596,8 +598,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >               /* See hpage_collapse_scan_pmd(). */
> >               if (folio_maybe_mapped_shared(folio)) {
> >                       ++shared;
> > -                     if (cc->is_khugepaged &&
> > -                         shared > khugepaged_max_ptes_shared) {
> > +                     if (order != HPAGE_PMD_ORDER || (cc->is_khugepaged &&
> > +                         shared > khugepaged_max_ptes_shared)) {
>
> Please add a comment why we do something different with PMD. As
> commenting below, does this deserve a TODO?
>
> >                               result = SCAN_EXCEED_SHARED_PTE;
> >                               count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> >                               goto out;
> > @@ -698,13 +700,14 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
> >                                               struct vm_area_struct *vma,
> >                                               unsigned long address,
> >                                               spinlock_t *ptl,
> > -                                             struct list_head *compound_pagelist)
> > +                                             struct list_head *compound_pagelist,
> > +                                             u8 order)
> >   {
> >       struct folio *src, *tmp;
> >       pte_t *_pte;
> >       pte_t pteval;
> >
> > -     for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
> > +     for (_pte = pte; _pte < pte + (1 << order);
> >            _pte++, address += PAGE_SIZE) {
> >               pteval = ptep_get(_pte);
> >               if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> > @@ -751,7 +754,8 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
> >                                            pmd_t *pmd,
> >                                            pmd_t orig_pmd,
> >                                            struct vm_area_struct *vma,
> > -                                          struct list_head *compound_pagelist)
> > +                                          struct list_head *compound_pagelist,
> > +                                          u8 order)
> >   {
> >       spinlock_t *pmd_ptl;
> >
> > @@ -768,7 +772,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
> >        * Release both raw and compound pages isolated
> >        * in __collapse_huge_page_isolate.
> >        */
> > -     release_pte_pages(pte, pte + HPAGE_PMD_NR, compound_pagelist);
> > +     release_pte_pages(pte, pte + (1 << order), compound_pagelist);
> >   }
> >
> >   /*
> > @@ -789,7 +793,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
> >   static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
> >                       pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma,
> >                       unsigned long address, spinlock_t *ptl,
> > -                     struct list_head *compound_pagelist)
> > +                     struct list_head *compound_pagelist, u8 order)
> >   {
> >       unsigned int i;
> >       int result = SCAN_SUCCEED;
> > @@ -797,7 +801,7 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
> >       /*
> >        * Copying pages' contents is subject to memory poison at any iteration.
> >        */
> > -     for (i = 0; i < HPAGE_PMD_NR; i++) {
> > +     for (i = 0; i < (1 << order); i++) {
> >               pte_t pteval = ptep_get(pte + i);
> >               struct page *page = folio_page(folio, i);
> >               unsigned long src_addr = address + i * PAGE_SIZE;
> > @@ -816,10 +820,10 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
> >
> >       if (likely(result == SCAN_SUCCEED))
> >               __collapse_huge_page_copy_succeeded(pte, vma, address, ptl,
> > -                                                 compound_pagelist);
> > +                                                 compound_pagelist, order);
> >       else
> >               __collapse_huge_page_copy_failed(pte, pmd, orig_pmd, vma,
> > -                                              compound_pagelist);
> > +                                              compound_pagelist, order);
> >
> >       return result;
> >   }
> > @@ -994,11 +998,11 @@ static int check_pmd_still_valid(struct mm_struct *mm,
> >   static int __collapse_huge_page_swapin(struct mm_struct *mm,
> >                                          struct vm_area_struct *vma,
> >                                          unsigned long haddr, pmd_t *pmd,
> > -                                        int referenced)
> > +                                        int referenced, u8 order)
> >   {
> >       int swapped_in = 0;
> >       vm_fault_t ret = 0;
> > -     unsigned long address, end = haddr + (HPAGE_PMD_NR * PAGE_SIZE);
> > +     unsigned long address, end = haddr + (PAGE_SIZE << order);
> >       int result;
> >       pte_t *pte = NULL;
> >       spinlock_t *ptl;
> > @@ -1029,6 +1033,15 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
> >               if (!is_swap_pte(vmf.orig_pte))
> >                       continue;
> >
> > +             /* Dont swapin for mTHP collapse */
>
> Should we turn this into a TODO, because it's something to figure out
> regarding the scaling etc?

Good idea, I changed both of these into TODOs

> > +             if (order != HPAGE_PMD_ORDER) {
> > +                     count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SWAP);
> > +                     pte_unmap(pte);
> > +                     mmap_read_unlock(mm);
> > +                     result = SCAN_EXCEED_SWAP_PTE;
> > +                     goto out;
> > +             }
> > +
> >               vmf.pte = pte;
> >               vmf.ptl = ptl;
> >               ret = do_swap_page(&vmf);
> > @@ -1149,7 +1162,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> >                * that case.
> >                * Continuing to collapse causes inconsistency.
> >                */
> >               result = __collapse_huge_page_swapin(mm, vma, address, pmd,
> > -                                                  referenced);
> > +                                                  referenced, HPAGE_PMD_ORDER);
>
> Indent messed up. Feel free to exceed 80 chars if it aids readability.

Fixed!

> >               if (result != SCAN_SUCCEED)
> >                       goto out_nolock;
> >       }
> > @@ -1197,7 +1210,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> >       pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
> >       if (pte) {
> >               result = __collapse_huge_page_isolate(vma, address, pte, cc,
> > -                                                   &compound_pagelist);
> > +                                                   &compound_pagelist, HPAGE_PMD_ORDER);
>
> Dito.

Fixed!

>
> Apart from that, nothing jumped at me
>
> Acked-by: David Hildenbrand

Thanks for the ack! I fixed the compile issue you noted too.

> --
> Cheers,
>
> David / dhildenb
>