public inbox for linux-trace-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Nico Pache <npache@redhat.com>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org
Cc: aarcange@redhat.com, akpm@linux-foundation.org,
	anshuman.khandual@arm.com, apopple@nvidia.com, baohua@kernel.org,
	baolin.wang@linux.alibaba.com, byungchul@sk.com,
	catalin.marinas@arm.com, cl@gentwo.org, corbet@lwn.net,
	dave.hansen@linux.intel.com, dev.jain@arm.com, gourry@gourry.net,
	hannes@cmpxchg.org, hughd@google.com, jack@suse.cz,
	jackmanb@google.com, jannh@google.com, jglisse@google.com,
	joshua.hahnjy@gmail.com, kas@kernel.org, lance.yang@linux.dev,
	Liam.Howlett@oracle.com, ljs@kernel.org,
	mathieu.desnoyers@efficios.com, matthew.brost@intel.com,
	mhiramat@kernel.org, mhocko@suse.com, peterx@redhat.com,
	pfalcato@suse.de, rakie.kim@sk.com, raquini@redhat.com,
	rdunlap@infradead.org, richard.weiyang@gmail.com,
	rientjes@google.com, rostedt@goodmis.org, rppt@kernel.org,
	ryan.roberts@arm.com, shivankg@amd.com, sunnanyong@huawei.com,
	surenb@google.com, thomas.hellstrom@linux.intel.com,
	tiwai@suse.de, usamaarif642@gmail.com, vbabka@suse.cz,
	vishal.moola@gmail.com, wangkefeng.wang@huawei.com,
	will@kernel.org, willy@infradead.org,
	yang@os.amperecomputing.com, ying.huang@linux.alibaba.com,
	ziy@nvidia.com, zokeefe@google.com
Subject: Re: [PATCH 7.2 v16 03/13] mm/khugepaged: rework max_ptes_* handling with helper functions
Date: Mon, 27 Apr 2026 21:52:32 +0200	[thread overview]
Message-ID: <c82a0b73-67ff-45e6-a792-e610b35a5b2f@kernel.org> (raw)
In-Reply-To: <20260419185750.260784-4-npache@redhat.com>

On 4/19/26 20:57, Nico Pache wrote:
> The following cleanup reworks all the max_ptes_* handling into helper
> functions. This increases the code readability and will later be used to
> implement the mTHP handling of these variables.
> 
> With these changes we abstract all the madvise_collapse() special casing
> (dont respect the sysctls) away from the functions that utilize them. And
> will later in this series to cleanly restrict mTHP collapses behaviors.
> 
> Suggested-by: David Hildenbrand <david@kernel.org>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>  mm/khugepaged.c | 114 +++++++++++++++++++++++++++++++++---------------
>  1 file changed, 78 insertions(+), 36 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index afac6bc4e76d..f42b55421191 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -348,6 +348,58 @@ static bool pte_none_or_zero(pte_t pte)
>  	return pte_present(pte) && is_zero_pfn(pte_pfn(pte));
>  }
>  
> +/**
> + * collapse_max_ptes_none - Calculate maximum allowed empty PTEs for collapse

empty PTE or PTE mapping the shared zeropage ? That should be clarified also below.

> + * @cc: The collapse control struct
> + * @vma: The vma to check for userfaultfd
> + *
> + * If we are not in khugepaged mode use HPAGE_PMD_NR to allow any
> + * empty page.

Not completely accurate due to uffd. And it's not really "empty page".

Is that information really necessary for the caller? I'd suggest you drop this
here and instead add a comment inline above the "return HPAGE_PMD_NR;".

> + *
> + * Return: Maximum number of empty PTEs allowed for the collapse operation
> + */
> +static unsigned int collapse_max_ptes_none(struct collapse_control *cc,
> +		struct vm_area_struct *vma)
> +{
> +	if (vma && userfaultfd_armed(vma))
> +		return 0;
> +	if (!cc->is_khugepaged)
> +		return HPAGE_PMD_NR;
> +	return khugepaged_max_ptes_none;
> +}
> +
> +/**
> + * collapse_max_ptes_shared - Calculate maximum allowed shared PTEs for collapse

"shared PTE" is not quite clear.

"PTEs that map shared anonymous pages" ?

> + * @cc: The collapse control struct
> + *
> + * If we are not in khugepaged mode use HPAGE_PMD_NR to allow any
> + * shared page.

Same comment as above.

> + *
> + * Return: Maximum number of shared PTEs allowed for the collapse operation
> + */
> +static unsigned int collapse_max_ptes_shared(struct collapse_control *cc)
> +{
> +	if (!cc->is_khugepaged)
> +		return HPAGE_PMD_NR;
> +	return khugepaged_max_ptes_shared;
> +}
> +
> +/**
> + * collapse_max_ptes_swap - Calculate maximum allowed swap PTEs for collapse

We're actually checking non-present page table entries (anonymous THP collapse)
or non-present pagecache entries (file THP collapse).

I wonder if there is an easy way to clarify that here, at least in the
description (confusing name can stay unless we find something better).

> + * @cc: The collapse control struct
> + *
> + * If we are not in khugepaged mode use HPAGE_PMD_NR to allow any
> + * swap page.

Dito.

> + *
> + * Return: Maximum number of swap PTEs allowed for the collapse operation
> + */
> +static unsigned int collapse_max_ptes_swap(struct collapse_control *cc)
> +{
> +	if (!cc->is_khugepaged)
> +		return HPAGE_PMD_NR;
> +	return khugepaged_max_ptes_swap;
> +}
> +
>  int hugepage_madvise(struct vm_area_struct *vma,
>  		     vm_flags_t *vm_flags, int advice)
>  {
> @@ -546,21 +598,19 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  	pte_t *_pte;
>  	int none_or_zero = 0, shared = 0, referenced = 0;
>  	enum scan_result result = SCAN_FAIL;
> +	unsigned int max_ptes_none = collapse_max_ptes_none(cc, vma);
> +	unsigned int max_ptes_shared = collapse_max_ptes_shared(cc);

These could be const, right? Or will that change in future patches?

>  
>  	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
>  	     _pte++, addr += PAGE_SIZE) {
>  		pte_t pteval = ptep_get(_pte);
>  		if (pte_none_or_zero(pteval)) {
> -			++none_or_zero;
> -			if (!userfaultfd_armed(vma) &&
> -			    (!cc->is_khugepaged ||
> -			     none_or_zero <= khugepaged_max_ptes_none)) {
> -				continue;
> -			} else {
> +			if (++none_or_zero > max_ptes_none) {
>  				result = SCAN_EXCEED_NONE_PTE;
>  				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
>  				goto out;
>  			}
> +			continue;
>  		}
>  		if (!pte_present(pteval)) {
>  			result = SCAN_PTE_NON_PRESENT;
> @@ -591,9 +641,7 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
>  
>  		/* See collapse_scan_pmd(). */
>  		if (folio_maybe_mapped_shared(folio)) {
> -			++shared;
> -			if (cc->is_khugepaged &&
> -			    shared > khugepaged_max_ptes_shared) {
> +			if (++shared > max_ptes_shared) {
>  				result = SCAN_EXCEED_SHARED_PTE;
>  				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
>  				goto out;
> @@ -1270,6 +1318,9 @@ static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
>  	unsigned long addr;
>  	spinlock_t *ptl;
>  	int node = NUMA_NO_NODE, unmapped = 0;
> +	unsigned int max_ptes_none = collapse_max_ptes_none(cc, vma);
> +	unsigned int max_ptes_shared = collapse_max_ptes_shared(cc);
> +	unsigned int max_ptes_swap = collapse_max_ptes_swap(cc);

Same question here.

>  
>  	VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK);
>  


In general, LGTM. With the doc fixed up

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

-- 
Cheers,

David

  parent reply	other threads:[~2026-04-27 19:52 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-19 18:57 [PATCH 7.2 v16 00/13] khugepaged: mTHP support Nico Pache
2026-04-19 18:57 ` [PATCH 7.2 v16 01/13] mm/khugepaged: generalize hugepage_vma_revalidate for " Nico Pache
2026-04-20 12:59   ` Usama Arif
2026-04-19 18:57 ` [PATCH 7.2 v16 02/13] mm/khugepaged: generalize alloc_charge_folio() Nico Pache
2026-04-20 13:05   ` Usama Arif
2026-04-27 19:41   ` David Hildenbrand (Arm)
2026-04-19 18:57 ` [PATCH 7.2 v16 03/13] mm/khugepaged: rework max_ptes_* handling with helper functions Nico Pache
2026-04-20 13:15   ` Usama Arif
2026-04-27 19:52   ` David Hildenbrand (Arm) [this message]
2026-04-19 18:57 ` [PATCH 7.2 v16 04/13] mm/khugepaged: generalize __collapse_huge_page_* for mTHP support Nico Pache
2026-04-20 13:55   ` Usama Arif
2026-04-27 20:06     ` David Hildenbrand (Arm)
2026-04-27 20:07   ` David Hildenbrand (Arm)
2026-04-19 18:57 ` [PATCH 7.2 v16 05/13] mm/khugepaged: generalize collapse_huge_page for mTHP collapse Nico Pache
2026-04-20 14:20   ` Usama Arif
2026-04-27 20:13   ` David Hildenbrand (Arm)
2026-04-19 18:57 ` [PATCH 7.2 v16 06/13] mm/khugepaged: skip collapsing mTHP to smaller orders Nico Pache
2026-04-20 15:36   ` Usama Arif
2026-04-19 18:57 ` [PATCH 7.2 v16 07/13] mm/khugepaged: add per-order mTHP collapse failure statistics Nico Pache
2026-04-27 20:21   ` David Hildenbrand (Arm)
2026-04-19 18:57 ` [PATCH 7.2 v16 08/13] mm/khugepaged: improve tracepoints for mTHP orders Nico Pache
2026-04-19 18:57 ` [PATCH 7.2 v16 09/13] mm/khugepaged: introduce collapse_allowable_orders helper function Nico Pache
2026-04-27 20:24   ` David Hildenbrand (Arm)
2026-04-19 18:57 ` [PATCH 7.2 v16 10/13] mm/khugepaged: Introduce mTHP collapse support Nico Pache
2026-04-19 18:57 ` [PATCH 7.2 v16 11/13] mm/khugepaged: avoid unnecessary mTHP collapse attempts Nico Pache
2026-04-19 18:57 ` [PATCH 7.2 v16 12/13] mm/khugepaged: run khugepaged for all orders Nico Pache
2026-04-19 18:57 ` [PATCH 7.2 v16 13/13] Documentation: mm: update the admin guide for mTHP collapse Nico Pache
2026-04-23 20:30 ` [PATCH 7.2 v16 00/13] khugepaged: mTHP support Andrew Morton
2026-04-24 13:58 ` Andrew Morton
2026-04-24 14:05   ` Matthew Brost
2026-04-24 14:19     ` Andrew Morton
2026-04-24 14:24       ` Matthew Brost

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c82a0b73-67ff-45e6-a792-e610b35a5b2f@kernel.org \
    --to=david@kernel.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=apopple@nvidia.com \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=byungchul@sk.com \
    --cc=catalin.marinas@arm.com \
    --cc=cl@gentwo.org \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=dev.jain@arm.com \
    --cc=gourry@gourry.net \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jackmanb@google.com \
    --cc=jannh@google.com \
    --cc=jglisse@google.com \
    --cc=joshua.hahnjy@gmail.com \
    --cc=kas@kernel.org \
    --cc=lance.yang@linux.dev \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=ljs@kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=matthew.brost@intel.com \
    --cc=mhiramat@kernel.org \
    --cc=mhocko@suse.com \
    --cc=npache@redhat.com \
    --cc=peterx@redhat.com \
    --cc=pfalcato@suse.de \
    --cc=rakie.kim@sk.com \
    --cc=raquini@redhat.com \
    --cc=rdunlap@infradead.org \
    --cc=richard.weiyang@gmail.com \
    --cc=rientjes@google.com \
    --cc=rostedt@goodmis.org \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=shivankg@amd.com \
    --cc=sunnanyong@huawei.com \
    --cc=surenb@google.com \
    --cc=thomas.hellstrom@linux.intel.com \
    --cc=tiwai@suse.de \
    --cc=usamaarif642@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=vishal.moola@gmail.com \
    --cc=wangkefeng.wang@huawei.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=yang@os.amperecomputing.com \
    --cc=ying.huang@linux.alibaba.com \
    --cc=ziy@nvidia.com \
    --cc=zokeefe@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox