From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 23 Apr 2025 15:30:02 +0800
From: Baolin Wang <baolin.wang@linux.alibaba.com>
Subject: Re: [PATCH v4 05/12] khugepaged: generalize __collapse_huge_page_* for mTHP support
To: Nico Pache, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, david@redhat.com, baohua@kernel.org, ryan.roberts@arm.com, willy@infradead.org, peterx@redhat.com, ziy@nvidia.com, wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org, jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org
References: <20250417000238.74567-1-npache@redhat.com> <20250417000238.74567-6-npache@redhat.com>
In-Reply-To: <20250417000238.74567-6-npache@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
On 2025/4/17 08:02, Nico Pache wrote:
> generalize the order of the __collapse_huge_page_* functions
> to support future mTHP collapse.
>
> mTHP collapse can suffer from incosistant behavior, and memory waste
> "creep". disable swapin and shared support for mTHP collapse.
>
> No functional changes in this patch.
>
> Co-developed-by: Dev Jain
> Signed-off-by: Dev Jain
> Signed-off-by: Nico Pache
> ---
>   mm/khugepaged.c | 46 ++++++++++++++++++++++++++++------------------
>   1 file changed, 28 insertions(+), 18 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 883e9a46359f..5e9272ab82da 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -565,15 +565,17 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>                                          unsigned long address,
>                                          pte_t *pte,
>                                          struct collapse_control *cc,
> -                                        struct list_head *compound_pagelist)
> +                                        struct list_head *compound_pagelist,
> +                                        u8 order)
>  {
>          struct page *page = NULL;
>          struct folio *folio = NULL;
>          pte_t *_pte;
>          int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
>          bool writable = false;
> +        int scaled_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
>
> -        for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
> +        for (_pte = pte; _pte < pte + (1 << order);
>               _pte++, address += PAGE_SIZE) {
>                  pte_t pteval = ptep_get(_pte);
>                  if (pte_none(pteval) || (pte_present(pteval) &&
> @@ -581,7 +583,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>                          ++none_or_zero;
>                          if (!userfaultfd_armed(vma) &&
>                              (!cc->is_khugepaged ||
> -                             none_or_zero <= khugepaged_max_ptes_none)) {
> +                             none_or_zero <= scaled_none)) {
>                                  continue;
>                          } else {
>                                  result = SCAN_EXCEED_NONE_PTE;
> @@ -609,8 +611,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>                  /* See hpage_collapse_scan_pmd(). */
>                  if (folio_maybe_mapped_shared(folio)) {
>                          ++shared;
> -                        if (cc->is_khugepaged &&
> -                            shared > khugepaged_max_ptes_shared) {
> +                        if (order != HPAGE_PMD_ORDER || (cc->is_khugepaged &&
> +                            shared > khugepaged_max_ptes_shared)) {
>                                  result = SCAN_EXCEED_SHARED_PTE;
>                                  count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
>                                  goto out;
> @@ -711,13 +713,14 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
>                                                  struct vm_area_struct *vma,
>                                                  unsigned long address,
>                                                  spinlock_t *ptl,
> -                                                struct list_head *compound_pagelist)
> +                                                struct list_head *compound_pagelist,
> +                                                u8 order)
>  {
>          struct folio *src, *tmp;
>          pte_t *_pte;
>          pte_t pteval;
>
> -        for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
> +        for (_pte = pte; _pte < pte + (1 << order);
>               _pte++, address += PAGE_SIZE) {
>                  pteval = ptep_get(_pte);
>                  if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> @@ -764,7 +767,8 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
>                                               pmd_t *pmd,
>                                               pmd_t orig_pmd,
>                                               struct vm_area_struct *vma,
> -                                             struct list_head *compound_pagelist)
> +                                             struct list_head *compound_pagelist,
> +                                             u8 order)
>  {
>          spinlock_t *pmd_ptl;
>
> @@ -781,7 +785,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
>           * Release both raw and compound pages isolated
>           * in __collapse_huge_page_isolate.
>           */
> -        release_pte_pages(pte, pte + HPAGE_PMD_NR, compound_pagelist);
> +        release_pte_pages(pte, pte + (1 << order), compound_pagelist);
>  }
>
>  /*
> @@ -802,7 +806,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
>  static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
>                  pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma,
>                  unsigned long address, spinlock_t *ptl,
> -                struct list_head *compound_pagelist)
> +                struct list_head *compound_pagelist, u8 order)
>  {
>          unsigned int i;
>          int result = SCAN_SUCCEED;
> @@ -810,7 +814,7 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
>          /*
>           * Copying pages' contents is subject to memory poison at any iteration.
>           */
> -        for (i = 0; i < HPAGE_PMD_NR; i++) {
> +        for (i = 0; i < (1 << order); i++) {
>                  pte_t pteval = ptep_get(pte + i);
>                  struct page *page = folio_page(folio, i);
>                  unsigned long src_addr = address + i * PAGE_SIZE;
> @@ -829,10 +833,10 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
>
>          if (likely(result == SCAN_SUCCEED))
>                  __collapse_huge_page_copy_succeeded(pte, vma, address, ptl,
> -                                                    compound_pagelist);
> +                                                    compound_pagelist, order);
>          else
>                  __collapse_huge_page_copy_failed(pte, pmd, orig_pmd, vma,
> -                                                 compound_pagelist);
> +                                                 compound_pagelist, order);
>
>          return result;
>  }
> @@ -1000,11 +1004,11 @@ static int check_pmd_still_valid(struct mm_struct *mm,
>  static int __collapse_huge_page_swapin(struct mm_struct *mm,
>                                         struct vm_area_struct *vma,
>                                         unsigned long haddr, pmd_t *pmd,
> -                                       int referenced)
> +                                       int referenced, u8 order)
>  {
>          int swapped_in = 0;
>          vm_fault_t ret = 0;
> -        unsigned long address, end = haddr + (HPAGE_PMD_NR * PAGE_SIZE);
> +        unsigned long address, end = haddr + (PAGE_SIZE << order);
>          int result;
>          pte_t *pte = NULL;
>          spinlock_t *ptl;
> @@ -1035,6 +1039,12 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
>                  if (!is_swap_pte(vmf.orig_pte))
>                          continue;
>
> +                /* Dont swapin for mTHP collapse */
> +                if (order != HPAGE_PMD_ORDER) {
> +                        result = SCAN_EXCEED_SWAP_PTE;
> +                        goto out;
> +                }

IMO, this check should move into hpage_collapse_scan_pmd(): if we scan swap PTEs for an mTHP collapse, we can return 'SCAN_EXCEED_SWAP_PTE' there and abort the collapse earlier. The logic is the same as how you handle the shared PTEs for mTHP.

>                  vmf.pte = pte;
>                  vmf.ptl = ptl;
>                  ret = do_swap_page(&vmf);
> @@ -1154,7 +1164,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
>                   * that case. Continuing to collapse causes inconsistency.
>                   */
>                  result = __collapse_huge_page_swapin(mm, vma, address, pmd,
> -                                                     referenced);
> +                                                     referenced, HPAGE_PMD_ORDER);
>                  if (result != SCAN_SUCCEED)
>                          goto out_nolock;
>          }
> @@ -1201,7 +1211,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
>          pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
>          if (pte) {
>                  result = __collapse_huge_page_isolate(vma, address, pte, cc,
> -                                                      &compound_pagelist);
> +                                                      &compound_pagelist, HPAGE_PMD_ORDER);
>                  spin_unlock(pte_ptl);
>          } else {
>                  result = SCAN_PMD_NULL;
> @@ -1231,7 +1241,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
>
>          result = __collapse_huge_page_copy(pte, folio, pmd, _pmd,
>                                             vma, address, pte_ptl,
> -                                           &compound_pagelist);
> +                                           &compound_pagelist, HPAGE_PMD_ORDER);
>          pte_unmap(pte);
>          if (unlikely(result != SCAN_SUCCEED))
>                  goto out_up_write;