From: Lance Yang <lance.yang@linux.dev>
To: Vernon Yang <vernon2gm@gmail.com>
Cc: ziy@nvidia.com, npache@redhat.com, baohua@kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Vernon Yang <yanglincheng@kylinos.cn>,
	akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
	david@kernel.org
Subject: Re: [PATCH 2/4] mm: khugepaged: remove mm when all memory has been collapsed
Date: Mon, 15 Dec 2025 19:52:41 +0800
Message-ID: <d03d6ea7-68fe-4eb5-aa28-1020f3382f4d@linux.dev>
In-Reply-To: <20251215090419.174418-3-yanglincheng@kylinos.cn>

Hi Vernon,

Thanks for the patches!

On 2025/12/15 17:04, Vernon Yang wrote:
> The following data is traced by bpftrace on a desktop system. After
> the system has been left idle for 10 minutes upon booting, a lot of
> SCAN_PMD_MAPPED or SCAN_PMD_NONE are observed during a full scan by
> khugepaged.
>
> @scan_pmd_status[1]: 1 ## SCAN_SUCCEED
> @scan_pmd_status[4]: 158 ## SCAN_PMD_MAPPED
> @scan_pmd_status[3]: 174 ## SCAN_PMD_NONE
> total progress size: 701 MB
> Total time : 440 seconds ## include khugepaged_scan_sleep_millisecs
>
> The khugepaged_scan list save all task that support collapse into hugepage,
> as long as the take is not destroyed, khugepaged will not remove it from
Nit: s/take/task/
> the khugepaged_scan list. This exist a phenomenon where task has already
> collapsed all memory regions into hugepage, but khugepaged continues to
> scan it, which wastes CPU time and invalid, and due to
> khugepaged_scan_sleep_millisecs (default 10s) causes a long wait for
> scanning a large number of invalid task, so scanning really valid task
> is later.
>
> After applying this patch, when all memory is either SCAN_PMD_MAPPED or
> SCAN_PMD_NONE, the mm is automatically removed from khugepaged's scan
> list. If the page fault or MADV_HUGEPAGE again, it is added back to
> khugepaged.
>
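
The lifecycle described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not kernel code: `toy_mm`, `toy_enter()` and `toy_finish_pass()` are hypothetical names, and a single boolean stands in for both the MMF_VM_HUGEPAGE flag and the mm_slot's presence on the scan list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an mm; it only tracks scan-list membership. */
struct toy_mm {
	bool on_scan_list;	/* models MMF_VM_HUGEPAGE plus the mm_slot */
};

/*
 * Models the enter path (page fault or MADV_HUGEPAGE): the mm is added
 * only if it is not already listed. Returns true if it was added.
 */
static bool toy_enter(struct toy_mm *mm)
{
	assert(mm != NULL);
	if (mm->on_scan_list)
		return false;
	mm->on_scan_list = true;
	return true;
}

/*
 * Models the end of one full pass over the mm: if nothing in it could
 * still be collapsed, drop it from the list so a later toy_enter() can
 * add it back.
 */
static void toy_finish_pass(struct toy_mm *mm, bool collapse_possible)
{
	assert(mm != NULL);
	if (!collapse_possible)
		mm->on_scan_list = false;
}
```

In this model, a pass that ends with `collapse_possible == false` removes the mm, and the next `toy_enter()` call succeeds again, which is the behaviour the commit message describes for a later page fault or MADV_HUGEPAGE.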
> Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
> ---
> mm/khugepaged.c | 35 +++++++++++++++++++++++++----------
> 1 file changed, 25 insertions(+), 10 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 0598a19a98cc..1ec1af5be3c8 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -115,6 +115,7 @@ struct khugepaged_scan {
>  	struct list_head mm_head;
>  	struct mm_slot *mm_slot;
>  	unsigned long address;
> +	bool maybe_collapse;

At a quick glance, the name "maybe_collapse" is a bit ambiguous ...

Perhaps "scan_needed" or "collapse_possible" would make it clearer
that the flag decides whether the mm is kept in the scan list?

> };
>
> static struct khugepaged_scan khugepaged_scan = {
> @@ -1420,22 +1421,19 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>  	return result;
>  }
>
> -static void collect_mm_slot(struct mm_slot *slot)
> +static void collect_mm_slot(struct mm_slot *slot, bool maybe_collapse)
>  {
>  	struct mm_struct *mm = slot->mm;
>
>  	lockdep_assert_held(&khugepaged_mm_lock);
>
> -	if (hpage_collapse_test_exit(mm)) {
> +	if (hpage_collapse_test_exit(mm) || !maybe_collapse) {
>  		/* free mm_slot */
>  		hash_del(&slot->hash);
>  		list_del(&slot->mm_node);
>
> -		/*
> -		 * Not strictly needed because the mm exited already.
> -		 *
> -		 * mm_flags_clear(MMF_VM_HUGEPAGE, mm);
> -		 */
> +		if (!maybe_collapse)
> +			mm_flags_clear(MMF_VM_HUGEPAGE, mm);
>
>  		/* khugepaged_mm_lock actually not necessary for the below */
>  		mm_slot_free(mm_slot_cache, slot);
> @@ -2397,6 +2395,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>  				     struct mm_slot, mm_node);
>  		khugepaged_scan.address = 0;
>  		khugepaged_scan.mm_slot = slot;
> +		khugepaged_scan.maybe_collapse = false;
>  	}
>  	spin_unlock(&khugepaged_mm_lock);
>
> @@ -2470,8 +2469,18 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>  					khugepaged_scan.address, &mmap_locked, cc);
>  			}
>
> -			if (*result == SCAN_SUCCEED)
> +			switch (*result) {
> +			case SCAN_PMD_NULL:
> +			case SCAN_PMD_NONE:
> +			case SCAN_PMD_MAPPED:
> +			case SCAN_PTE_MAPPED_HUGEPAGE:
> +				break;
> +			case SCAN_SUCCEED:
>  				++khugepaged_pages_collapsed;
> +				fallthrough;
> +			default:
> +				khugepaged_scan.maybe_collapse = true;
> +			}
>
>  			/* move to next address */
>  			khugepaged_scan.address += HPAGE_PMD_SIZE;
> @@ -2500,6 +2509,11 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>  	 * if we scanned all vmas of this mm.
>  	 */
>  	if (hpage_collapse_test_exit(mm) || !vma) {
> +		bool maybe_collapse = khugepaged_scan.maybe_collapse;
> +
> +		if (mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm))
> +			maybe_collapse = true;
> +
>  		/*
>  		 * Make sure that if mm_users is reaching zero while
>  		 * khugepaged runs here, khugepaged_exit will find
> @@ -2508,12 +2522,13 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>  		if (!list_is_last(&slot->mm_node, &khugepaged_scan.mm_head)) {
>  			khugepaged_scan.mm_slot = list_next_entry(slot, mm_node);
>  			khugepaged_scan.address = 0;
> +			khugepaged_scan.maybe_collapse = false;
>  		} else {
>  			khugepaged_scan.mm_slot = NULL;
>  			khugepaged_full_scans++;
>  		}
>
> -		collect_mm_slot(slot);
> +		collect_mm_slot(slot, maybe_collapse);
>  	}
>
> trace_mm_khugepaged_scan(mm, progress, khugepaged_scan.mm_slot == NULL);
> @@ -2616,7 +2631,7 @@ static int khugepaged(void *none)
>  	slot = khugepaged_scan.mm_slot;
>  	khugepaged_scan.mm_slot = NULL;
>  	if (slot)
> -		collect_mm_slot(slot);
> +		collect_mm_slot(slot, true);
>  	spin_unlock(&khugepaged_mm_lock);
>  	return 0;
>  }
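
For reference, the keep-or-drop decision that the diff above spreads over several hunks can be condensed into a small userspace model. This is a sketch under stated assumptions rather than the kernel implementation: the `toy_` enum and functions are hypothetical, the enum values do not match the kernel's scan result values, and every result not named in the patch's switch is lumped into a single TOY_SCAN_FAIL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy subset of scan results; values are NOT the kernel's enum values. */
enum toy_scan_result {
	TOY_SCAN_FAIL,		/* stands for any result not listed below */
	TOY_SCAN_SUCCEED,
	TOY_SCAN_PMD_NULL,
	TOY_SCAN_PMD_NONE,
	TOY_SCAN_PMD_MAPPED,
	TOY_SCAN_PTE_MAPPED_HUGEPAGE,
};

/* True if this result means the mm may still have something to collapse. */
static bool toy_result_keeps_mm(enum toy_scan_result r)
{
	switch (r) {
	case TOY_SCAN_PMD_NULL:
	case TOY_SCAN_PMD_NONE:
	case TOY_SCAN_PMD_MAPPED:
	case TOY_SCAN_PTE_MAPPED_HUGEPAGE:
		return false;
	default:
		/* SCAN_SUCCEED and all other results set the flag in the patch */
		return true;
	}
}

/*
 * Fold the per-PMD results of one full pass into a keep/drop decision.
 * thp_disabled models the MMF_DISABLE_THP_COMPLETELY override, which
 * forces the mm to be kept.
 */
static bool toy_keep_mm(const enum toy_scan_result *results, size_t n,
			bool thp_disabled)
{
	bool keep = false;	/* reset to false when a new mm is picked */

	assert(results != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (toy_result_keeps_mm(results[i]))
			keep = true;
	}
	return keep || thp_disabled;
}
```

Two properties of the patch are visible in this model: a pass that scans no ranges at all (`n == 0`) leaves the flag false, so the mm is dropped, and a single SCAN_SUCCEED keeps the mm on the list for at least one more pass.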