From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <53527839-e918-47d3-9442-cd5e8975ab22@arm.com>
Date: Sun, 8 Feb 2026 14:47:18 +0530
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Subject: Re: [PATCH mm-new v7 2/5] mm: khugepaged: refine
 scan progress number
To: Vernon Yang, akpm@linux-foundation.org, david@kernel.org
Cc: lorenzo.stoakes@oracle.com, ziy@nvidia.com, baohua@kernel.org,
 lance.yang@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Vernon Yang
References: <20260207081613.588598-1-vernon2gm@gmail.com>
 <20260207081613.588598-3-vernon2gm@gmail.com>
From: Dev Jain
In-Reply-To: <20260207081613.588598-3-vernon2gm@gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 07/02/26 1:46 pm, Vernon Yang wrote:
> From: Vernon Yang
>
> Currently, each scan always increases "progress" by HPAGE_PMD_NR,
> even if only a single PTE/PMD entry was scanned.
>
> - When only a single PTE entry is scanned, consider this example:
>
> static int hpage_collapse_scan_pmd()
> {
>         for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>              _pte++, addr += PAGE_SIZE) {
>                 pte_t pteval = ptep_get(_pte);
>                 ...
>                 if (pte_uffd_wp(pteval)) {      <-- first scan hit
>                         result = SCAN_PTE_UFFD_WP;
>                         goto out_unmap;
>                 }
>         }
> }
>
> During the first iteration, if pte_uffd_wp(pteval) is true, the loop
> exits directly, so in practice only one PTE is scanned before
> termination. Here "progress += 1" reflects the actual number of PTEs
> scanned, whereas previously "progress += HPAGE_PMD_NR" was always
> charged.
>
> - When the memory has already been collapsed to PMDs, consider this
>   example:
>
> The following data was traced with bpftrace on a desktop system. After
> the system had been left idle for 10 minutes after booting, a lot of
> SCAN_PMD_MAPPED or SCAN_NO_PTE_TABLE results were observed during a
> full scan by khugepaged.
>
> From trace_mm_khugepaged_scan_pmd and trace_mm_khugepaged_scan_file,
> the following statuses were observed, with the frequency next to each:
>
> SCAN_SUCCEED          : 1
> SCAN_EXCEED_SHARED_PTE: 2
> SCAN_PMD_MAPPED       : 142
> SCAN_NO_PTE_TABLE     : 178
> total progress size   : 674 MB
> Total time            : 419 seconds, including khugepaged_scan_sleep_millisecs
>
> The khugepaged_scan list saves all tasks that support collapsing into
> hugepages; as long as a task is not destroyed, khugepaged will not
> remove it from the khugepaged_scan list. This leads to a situation
> where a task has already collapsed all of its memory regions into
> hugepages, yet khugepaged keeps scanning it, which wastes CPU time to
> no effect. Because of khugepaged_scan_sleep_millisecs (default 10s),
> scanning a large number of such tasks causes a long wait, so tasks
> that can actually be collapsed are scanned later.
>
> After applying this patch, when the memory is either SCAN_PMD_MAPPED
> or SCAN_NO_PTE_TABLE, it is simply skipped, as follows:
>
> SCAN_EXCEED_SHARED_PTE: 2
> SCAN_PMD_MAPPED       : 147
> SCAN_NO_PTE_TABLE     : 173
> total progress size   : 45 MB
> Total time            : 20 seconds
>
> Signed-off-by: Vernon Yang
> ---
>  mm/khugepaged.c | 38 ++++++++++++++++++++++++++++----------
>  1 file changed, 28 insertions(+), 10 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 4049234e1c8b..8b68ae3bc2c5 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -68,7 +68,10 @@ enum scan_result {
>  static struct task_struct *khugepaged_thread __read_mostly;
>  static DEFINE_MUTEX(khugepaged_mutex);
>  
> -/* default scan 8*HPAGE_PMD_NR ptes (or vmas) every 10 second */
> +/*
> + * default scan 8*HPAGE_PMD_NR ptes, pmd_mapped, no_pte_table or vmas
> + * every 10 second.
> + */
>  static unsigned int khugepaged_pages_to_scan __read_mostly;
>  static unsigned int khugepaged_pages_collapsed;
>  static unsigned int khugepaged_full_scans;
> @@ -1240,7 +1243,8 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
>  }
>  
>  static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
> -		struct vm_area_struct *vma, unsigned long start_addr, bool *mmap_locked,
> +		struct vm_area_struct *vma, unsigned long start_addr,
> +		bool *mmap_locked, unsigned int *cur_progress,
>  		struct collapse_control *cc)
>  {
>  	pmd_t *pmd;
> @@ -1256,19 +1260,27 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
>  	VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
>  
>  	result = find_pmd_or_thp_or_none(mm, start_addr, &pmd);
> -	if (result != SCAN_SUCCEED)
> +	if (result != SCAN_SUCCEED) {
> +		if (cur_progress)
> +			*cur_progress = 1;
>  		goto out;
> +	}
>  
>  	memset(cc->node_load, 0, sizeof(cc->node_load));
>  	nodes_clear(cc->alloc_nmask);
>  	pte = pte_offset_map_lock(mm, pmd, start_addr, &ptl);
>  	if (!pte) {
> +		if (cur_progress)
> +			*cur_progress = 1;
>  		result = SCAN_NO_PTE_TABLE;
>  		goto out;
>  	}
>  
>  	for (addr = start_addr, _pte = pte; _pte < pte + HPAGE_PMD_NR;
>  	     _pte++, addr += PAGE_SIZE) {
> +		if (cur_progress)
> +			*cur_progress += 1;
> +
>  		pte_t pteval = ptep_get(_pte);
>  		if (pte_none_or_zero(pteval)) {
>  			++none_or_zero;
> @@ -2288,8 +2300,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>  	return result;
>  }
>  
> -static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm, unsigned long addr,
> -		struct file *file, pgoff_t start, struct collapse_control *cc)
> +static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm,
> +		unsigned long addr, struct file *file, pgoff_t start,
> +		unsigned int *cur_progress, struct collapse_control *cc)
>  {
>  	struct folio *folio = NULL;
>  	struct address_space *mapping = file->f_mapping;
> @@ -2378,6 +2391,8 @@ static enum scan_result hpage_collapse_scan_file(struct mm_struct *mm, unsigned
>  			cond_resched_rcu();
>  		}
>  	}
> +	if (cur_progress)
> +		*cur_progress = HPAGE_PMD_NR;
>  	rcu_read_unlock();
> 

Nit: could move this to the end of the function. It looks weird before
the rcu_read_unlock().

Reviewed-by: Dev Jain