linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Nico Pache <npache@redhat.com>
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
	Liam.Howlett@oracle.com, ryan.roberts@arm.com, dev.jain@arm.com,
	corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org,
	mathieu.desnoyers@efficios.com, akpm@linux-foundation.org,
	baohua@kernel.org, willy@infradead.org, peterx@redhat.com,
	wangkefeng.wang@huawei.com, usamaarif642@gmail.com,
	sunnanyong@huawei.com, vishal.moola@gmail.com,
	thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com,
	kirill.shutemov@linux.intel.com, aarcange@redhat.com,
	raquini@redhat.com, anshuman.khandual@arm.com,
	catalin.marinas@arm.com, tiwai@suse.de, will@kernel.org,
	dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org,
	jglisse@google.com, surenb@google.com, zokeefe@google.com,
	hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com,
	rdunlap@infradead.org, hughd@google.com
Subject: Re: [PATCH v10 02/13] introduce collapse_single_pmd to unify khugepaged and madvise_collapse
Date: Wed, 20 Aug 2025 12:21:43 +0100	[thread overview]
Message-ID: <248d57e5-8cd5-408b-a6c8-970df6876b6c@lucifer.local> (raw)
In-Reply-To: <20250819134205.622806-3-npache@redhat.com>

On Tue, Aug 19, 2025 at 07:41:54AM -0600, Nico Pache wrote:
> The khugepaged daemon and madvise_collapse have two different
> implementations that do almost the same thing.
>
> Create collapse_single_pmd to increase code reuse and create an entry
> point to these two users.
>
> Refactor madvise_collapse and collapse_scan_mm_slot to use the new
> collapse_single_pmd function. This introduces a minor behavioral change
> that is most likely an undiscovered bug. The current implementation of
> khugepaged tests collapse_test_exit_or_disable before calling
> collapse_pte_mapped_thp, but we weren't doing it in the madvise_collapse
> case. By unifying these two callers madvise_collapse now also performs
> this check.
>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>  mm/khugepaged.c | 94 ++++++++++++++++++++++++++-----------------------
>  1 file changed, 49 insertions(+), 45 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 0e7bbadf03ee..b7b98aebb670 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2382,6 +2382,50 @@ static int collapse_scan_file(struct mm_struct *mm, unsigned long addr,
>  	return result;
>  }
>
> +/*
> + * Try to collapse a single PMD starting at a PMD aligned addr, and return
> + * the results.
> + */
> +static int collapse_single_pmd(unsigned long addr,
> +				struct vm_area_struct *vma, bool *mmap_locked,
> +				struct collapse_control *cc)
> +{
> +	int result = SCAN_FAIL;

You assign result in all branches, so this can be uninitialised.

> +	struct mm_struct *mm = vma->vm_mm;
> +
> +	if (!vma_is_anonymous(vma)) {
> +		struct file *file = get_file(vma->vm_file);
> +		pgoff_t pgoff = linear_page_index(vma, addr);
> +
> +		mmap_read_unlock(mm);
> +		*mmap_locked = false;
> +		result = collapse_scan_file(mm, addr, file, pgoff, cc);
> +		fput(file);
> +		if (result == SCAN_PTE_MAPPED_HUGEPAGE) {
> +			mmap_read_lock(mm);
> +			*mmap_locked = true;
> +			if (collapse_test_exit_or_disable(mm)) {
> +				mmap_read_unlock(mm);
> +				*mmap_locked = false;
> +				result = SCAN_ANY_PROCESS;
> +				goto end;

Don't love that in e.g. collapse_scan_mm_slot() we are using the mmap lock being
disabled as in effect an error code.

Is SCAN_ANY_PROCESS correct here? Because in collapse_scan_mm_slot() you'd
previously:

	if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
		mmap_read_lock(mm);
		if (collapse_test_exit_or_disable(mm))
			goto breakouterloop;
		...
	}

But now you're setting result = SCAN_ANY_PROCESS rather than
SCAN_PTE_MAPPED_HUGEPAGE in this instance?

You don't mention that you're changing this, or at least explicitly enough,
the commit message should state that you're changing this and explain why
it's ok.

This whole file is horrid, and it's kinda an aside, but I really wish we
had some comment going through each of the scan_result cases and explaining
what each one meant.

Also I think:

	return SCAN_ANY_PROCESS;

Is better than:

	result = SCAN_ANY_PROCESS;
	goto end;
	...
end:
	return result;

> +			}
> +			result = collapse_pte_mapped_thp(mm, addr,
> +							 !cc->is_khugepaged);

Hm another change here, in the original code in collapse_scan_mm_slot()
this is:

	*result = collapse_pte_mapped_thp(mm,
		khugepaged_scan.address, false);

Presumably collapse_scan_mm_slot() is only ever invoked with
cc->is_khugepaged?

Maybe worth adding a VM_WARN_ON_ONCE(!cc->is_khugepaged) at the top of
collapse_scan_mm_slot() to assert this (and other places where your change
assumes this to be the case).


> +			if (result == SCAN_PMD_MAPPED)
> +				result = SCAN_SUCCEED;
> +			mmap_read_unlock(mm);
> +			*mmap_locked = false;
> +		}
> +	} else {
> +		result = collapse_scan_pmd(mm, vma, addr, mmap_locked, cc);
> +	}
> +	if (cc->is_khugepaged && result == SCAN_SUCCEED)
> +		++khugepaged_pages_collapsed;

Similarly, presumably because collapse_scan_mm_slot() only ever invoked
khugepaged case this didn't have the cc->is_khugepaged check?

> +end:
> +	return result;
> +}

There's a LOT of nesting going on here, I think we can simplify this a
lot. If we make the change I noted above re: returning SCAN_ANY_PROCESS< we
can move the end label up a bit and avoid a ton of nesting, e.g.:

static int collapse_single_pmd(unsigned long addr,
				struct vm_area_struct *vma, bool *mmap_locked,
				struct collapse_control *cc)
{
	struct mm_struct *mm = vma->vm_mm;
	struct file *file;
	pgoff_t pgoff;
	int result;

	if (vma_is_anonymous(vma)) {
		result = collapse_scan_pmd(mm, vma, addr, mmap_locked, cc);
		goto end:
	}

	file = get_file(vma->vm_file);
	pgoff = linear_page_index(vma, addr);

	mmap_read_unlock(mm);
	*mmap_locked = false;
	result = collapse_scan_file(mm, addr, file, pgoff, cc);
	fput(file);
	if (result != SCAN_PTE_MAPPED_HUGEPAGE)
		goto end;

	mmap_read_lock(mm);
	*mmap_locked = true;
	if (collapse_test_exit_or_disable(mm)) {
		mmap_read_unlock(mm);
		*mmap_locked = false;
		return SCAN_ANY_PROCESS;
	}
	result = collapse_pte_mapped_thp(mm, addr, !cc->is_khugepaged);
	if (result == SCAN_PMD_MAPPED)
		result = SCAN_SUCCEED;
	mmap_read_unlock(mm);
	*mmap_locked = false;

end:
	if (cc->is_khugepaged && result == SCAN_SUCCEED)
		++khugepaged_pages_collapsed;

	return result;
}

(untested, thrown together so do double check!)

> +
>  static unsigned int collapse_scan_mm_slot(unsigned int pages, int *result,
>  					    struct collapse_control *cc)
>  	__releases(&khugepaged_mm_lock)
> @@ -2455,34 +2499,9 @@ static unsigned int collapse_scan_mm_slot(unsigned int pages, int *result,
>  			VM_BUG_ON(khugepaged_scan.address < hstart ||
>  				  khugepaged_scan.address + HPAGE_PMD_SIZE >
>  				  hend);
> -			if (!vma_is_anonymous(vma)) {
> -				struct file *file = get_file(vma->vm_file);
> -				pgoff_t pgoff = linear_page_index(vma,
> -						khugepaged_scan.address);
> -
> -				mmap_read_unlock(mm);
> -				mmap_locked = false;
> -				*result = collapse_scan_file(mm,
> -					khugepaged_scan.address, file, pgoff, cc);
> -				fput(file);
> -				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
> -					mmap_read_lock(mm);
> -					if (collapse_test_exit_or_disable(mm))
> -						goto breakouterloop;
> -					*result = collapse_pte_mapped_thp(mm,
> -						khugepaged_scan.address, false);
> -					if (*result == SCAN_PMD_MAPPED)
> -						*result = SCAN_SUCCEED;
> -					mmap_read_unlock(mm);
> -				}
> -			} else {
> -				*result = collapse_scan_pmd(mm, vma,
> -					khugepaged_scan.address, &mmap_locked, cc);
> -			}
> -
> -			if (*result == SCAN_SUCCEED)
> -				++khugepaged_pages_collapsed;
>
> +			*result = collapse_single_pmd(khugepaged_scan.address,
> +							vma, &mmap_locked, cc);
>  			/* move to next address */
>  			khugepaged_scan.address += HPAGE_PMD_SIZE;
>  			progress += HPAGE_PMD_NR;
> @@ -2799,34 +2818,19 @@ int madvise_collapse(struct vm_area_struct *vma, unsigned long start,
>  		mmap_assert_locked(mm);
>  		memset(cc->node_load, 0, sizeof(cc->node_load));
>  		nodes_clear(cc->alloc_nmask);
> -		if (!vma_is_anonymous(vma)) {
> -			struct file *file = get_file(vma->vm_file);
> -			pgoff_t pgoff = linear_page_index(vma, addr);
>
> -			mmap_read_unlock(mm);
> -			mmap_locked = false;
> -			result = collapse_scan_file(mm, addr, file, pgoff, cc);
> -			fput(file);
> -		} else {
> -			result = collapse_scan_pmd(mm, vma, addr,
> -						   &mmap_locked, cc);
> -		}
> +		result = collapse_single_pmd(addr, vma, &mmap_locked, cc);
> +

Ack the fact you noted the behaviour change re:
collapse_test_exit_or_disable() that seems fine.

>  		if (!mmap_locked)
>  			*lock_dropped = true;
>
> -handle_result:
>  		switch (result) {
>  		case SCAN_SUCCEED:
>  		case SCAN_PMD_MAPPED:
>  			++thps;
>  			break;
> -		case SCAN_PTE_MAPPED_HUGEPAGE:
> -			BUG_ON(mmap_locked);
> -			mmap_read_lock(mm);
> -			result = collapse_pte_mapped_thp(mm, addr, true);
> -			mmap_read_unlock(mm);
> -			goto handle_result;

One thing that differs with new code her is we filter SCAN_PMD_MAPPED to
SCAN_SUCCEED.

I was about to say 'but ++thps - is this correct' but now I realise this
was looping back on itself with a goto to do just that... ugh ye gads.

Anwyay that's fine because it doesn't change anything.

Re: switch statement in general, again would be good to always have each
scan possibility in switch statements, but perhaps given so many not
practical :)

(that way the compiler warns on missing a newly added enum val)

>  		/* Whitelisted set of results where continuing OK */
> +		case SCAN_PTE_MAPPED_HUGEPAGE:
>  		case SCAN_PMD_NULL:
>  		case SCAN_PTE_NON_PRESENT:
>  		case SCAN_PTE_UFFD_WP:
> --
> 2.50.1
>


  reply	other threads:[~2025-08-20 11:22 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-19 13:41 [PATCH v10 00/13] khugepaged: mTHP support Nico Pache
2025-08-19 13:41 ` [PATCH v10 01/13] khugepaged: rename hpage_collapse_* to collapse_* Nico Pache
2025-08-20 10:42   ` Lorenzo Stoakes
2025-08-19 13:41 ` [PATCH v10 02/13] introduce collapse_single_pmd to unify khugepaged and madvise_collapse Nico Pache
2025-08-20 11:21   ` Lorenzo Stoakes [this message]
2025-08-20 16:35     ` Nico Pache
2025-08-22 10:21       ` Lorenzo Stoakes
2025-08-26 13:30         ` Nico Pache
2025-08-19 13:41 ` [PATCH v10 03/13] khugepaged: generalize hugepage_vma_revalidate for mTHP support Nico Pache
2025-08-20 13:23   ` Lorenzo Stoakes
2025-08-20 15:40     ` Nico Pache
2025-08-21  3:41       ` Wei Yang
2025-08-21 14:09         ` Zi Yan
2025-08-22 10:25           ` Lorenzo Stoakes
2025-08-24  1:37   ` Wei Yang
2025-08-26 13:46     ` Nico Pache
2025-08-19 13:41 ` [PATCH v10 04/13] khugepaged: generalize alloc_charge_folio() Nico Pache
2025-08-20 13:28   ` Lorenzo Stoakes
2025-08-19 13:41 ` [PATCH v10 05/13] khugepaged: generalize __collapse_huge_page_* for mTHP support Nico Pache
2025-08-20 14:22   ` Lorenzo Stoakes
2025-09-01 16:15     ` David Hildenbrand
2025-08-19 13:41 ` [PATCH v10 06/13] khugepaged: add " Nico Pache
2025-08-20 18:29   ` Lorenzo Stoakes
2025-09-02 20:12     ` Nico Pache
2025-08-19 13:41 ` [PATCH v10 07/13] khugepaged: skip collapsing mTHP to smaller orders Nico Pache
2025-08-21 12:05   ` Lorenzo Stoakes
2025-08-21 12:33     ` Dev Jain
2025-08-22 10:33       ` Lorenzo Stoakes
2025-08-21 16:54     ` Steven Rostedt
2025-08-21 16:56       ` Lorenzo Stoakes
2025-08-19 13:42 ` [PATCH v10 08/13] khugepaged: avoid unnecessary mTHP collapse attempts Nico Pache
2025-08-20 10:38   ` Lorenzo Stoakes
2025-08-19 13:42 ` [PATCH v10 09/13] khugepaged: enable collapsing mTHPs even when PMD THPs are disabled Nico Pache
2025-08-21 13:35   ` Lorenzo Stoakes
2025-08-19 13:42 ` [PATCH v10 10/13] khugepaged: kick khugepaged for enabling none-PMD-sized mTHPs Nico Pache
2025-08-21 14:18   ` Lorenzo Stoakes
2025-08-21 14:26     ` Lorenzo Stoakes
2025-08-22  6:59     ` Baolin Wang
2025-08-22  7:36       ` Dev Jain
2025-08-19 13:42 ` [PATCH v10 11/13] khugepaged: improve tracepoints for mTHP orders Nico Pache
2025-08-21 14:24   ` Lorenzo Stoakes
2025-08-19 14:16 ` [PATCH v10 12/13] khugepaged: add per-order mTHP khugepaged stats Nico Pache
2025-08-21 14:47   ` Lorenzo Stoakes
2025-08-19 14:17 ` [PATCH v10 13/13] Documentation: mm: update the admin guide for mTHP collapse Nico Pache
2025-08-21 15:03   ` Lorenzo Stoakes
2025-08-19 21:55 ` [PATCH v10 00/13] khugepaged: mTHP support Andrew Morton
2025-08-20 15:55   ` Nico Pache
2025-08-21 15:01 ` Lorenzo Stoakes
2025-08-21 15:13   ` Dev Jain
2025-08-21 15:19     ` Lorenzo Stoakes
2025-08-21 15:25       ` Nico Pache
2025-08-21 15:27         ` Nico Pache
2025-08-21 15:32           ` Lorenzo Stoakes
2025-08-21 16:46             ` Nico Pache
2025-08-21 16:54               ` Lorenzo Stoakes
2025-08-21 17:26                 ` David Hildenbrand
2025-08-21 20:43                 ` David Hildenbrand
2025-08-22 10:41                   ` Lorenzo Stoakes
2025-08-22 14:10                     ` David Hildenbrand
2025-08-22 14:49                       ` Lorenzo Stoakes
2025-08-22 15:33                         ` Dev Jain
2025-08-26 10:43                           ` Lorenzo Stoakes
2025-08-28  9:46                       ` Baolin Wang
2025-08-28 10:48                         ` Dev Jain
2025-08-29  1:55                           ` Baolin Wang
2025-09-01 16:46                             ` David Hildenbrand
2025-09-02  2:28                               ` Baolin Wang
2025-09-02  9:03                                 ` David Hildenbrand
2025-09-02 10:34                                   ` Usama Arif
2025-09-02 11:03                                     ` David Hildenbrand
2025-09-02 20:23                                       ` Usama Arif
2025-09-03  3:27                                         ` Baolin Wang
2025-08-21 16:38     ` Liam R. Howlett
2025-09-01 16:21 ` David Hildenbrand
2025-09-01 17:06   ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=248d57e5-8cd5-408b-a6c8-970df6876b6c@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=catalin.marinas@arm.com \
    --cc=cl@gentwo.org \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=dev.jain@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jglisse@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mhiramat@kernel.org \
    --cc=mhocko@suse.com \
    --cc=npache@redhat.com \
    --cc=peterx@redhat.com \
    --cc=raquini@redhat.com \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=rostedt@goodmis.org \
    --cc=ryan.roberts@arm.com \
    --cc=sunnanyong@huawei.com \
    --cc=surenb@google.com \
    --cc=thomas.hellstrom@linux.intel.com \
    --cc=tiwai@suse.de \
    --cc=usamaarif642@gmail.com \
    --cc=vishal.moola@gmail.com \
    --cc=wangkefeng.wang@huawei.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=yang@os.amperecomputing.com \
    --cc=ziy@nvidia.com \
    --cc=zokeefe@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).