From: Lance Yang <lance.yang@linux.dev>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: bpf@vger.kernel.org, linux-mm@kvack.org,
linux-doc@vger.kernel.org, Lance Yang <ioworker0@gmail.com>,
david@redhat.com, akpm@linux-foundation.org,
baolin.wang@linux.alibaba.com, ziy@nvidia.com,
hannes@cmpxchg.org, corbet@lwn.net, ameryhung@gmail.com,
21cnbao@gmail.com, shakeel.butt@linux.dev, rientjes@google.com,
andrii@kernel.org, daniel@iogearbox.net, ast@kernel.org,
ryan.roberts@arm.com, gutierrez.asier@huawei-partners.com,
willy@infradead.org, usamaarif642@gmail.com,
lorenzo.stoakes@oracle.com, npache@redhat.com, dev.jain@arm.com,
Liam.Howlett@oracle.com
Subject: Re: [PATCH v7 mm-new 01/10] mm: thp: remove disabled task from khugepaged_mm_slot
Date: Wed, 10 Sep 2025 13:11:27 +0800
Message-ID: <7c890b42-610f-42ec-acf2-b5b9f95209b1@linux.dev>
In-Reply-To: <20250910024447.64788-2-laoar.shao@gmail.com>
Hey Yafang,
On 2025/9/10 10:44, Yafang Shao wrote:
> Since a task with MMF_DISABLE_THP_COMPLETELY cannot use THP, remove it from
> the khugepaged_mm_slot to stop khugepaged from processing it.
>
> After this change, the following semantic relationship always holds:
>
> MMF_VM_HUGEPAGE is set == task is in khugepaged mm_slot
> MMF_VM_HUGEPAGE is not set == task is not in khugepaged mm_slot
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> Cc: Lance Yang <ioworker0@gmail.com>
> ---
> include/linux/khugepaged.h | 1 +
> kernel/sys.c | 6 ++++++
> mm/khugepaged.c | 19 +++++++++----------
> 3 files changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
> index eb1946a70cff..6cb9107f1006 100644
> --- a/include/linux/khugepaged.h
> +++ b/include/linux/khugepaged.h
> @@ -19,6 +19,7 @@ extern void khugepaged_min_free_kbytes_update(void);
> extern bool current_is_khugepaged(void);
> extern int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
> bool install_pmd);
> +bool hugepage_pmd_enabled(void);
>
> static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
> {
> diff --git a/kernel/sys.c b/kernel/sys.c
> index a46d9b75880b..a1c1e8007f2d 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -8,6 +8,7 @@
> #include <linux/export.h>
> #include <linux/mm.h>
> #include <linux/mm_inline.h>
> +#include <linux/khugepaged.h>
> #include <linux/utsname.h>
> #include <linux/mman.h>
> #include <linux/reboot.h>
> @@ -2493,6 +2494,11 @@ static int prctl_set_thp_disable(bool thp_disable, unsigned long flags,
> mm_flags_clear(MMF_DISABLE_THP_COMPLETELY, mm);
> mm_flags_clear(MMF_DISABLE_THP_EXCEPT_ADVISED, mm);
> }
> +
> + if (!mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm) &&
> + !mm_flags_test(MMF_VM_HUGEPAGE, mm) &&
> + hugepage_pmd_enabled())
> + __khugepaged_enter(mm);
> mmap_write_unlock(current->mm);

One minor style suggestion for prctl_set_thp_disable():

static int prctl_set_thp_disable(bool thp_disable, unsigned long flags,
				 unsigned long arg4, unsigned long arg5)
{
	struct mm_struct *mm = current->mm;

	[...]
	if (mmap_write_lock_killable(current->mm))
		return -EINTR;
	[...]
	mmap_write_unlock(current->mm);
	return 0;
}

The function initializes a local struct mm_struct *mm = current->mm at the
top, but then mixes mm and current->mm in the calls that follow. Could you
switch the mmap_write_lock_killable() and mmap_write_unlock() calls over to
the local mm variable for consistency? Just a nit ;)
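
For illustration only, here is a rough userspace sketch of the shape I have
in mind. Everything kernel-looking in it is a made-up stand-in so that it
compiles outside the tree: the struct mm_struct fields, the `current` macro,
the demo_mm/demo_task objects, and the two lock helpers are all hypothetical,
and the elided body of the real function is replaced by a single placeholder
assignment. The only point is that both lock calls take the local mm.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins, NOT the kernel definitions. */
struct mm_struct {
	int lock_depth;		/* models mmap_lock: 1 while write-held */
	bool thp_disabled;	/* placeholder for the real mm flag updates */
};

struct task {
	struct mm_struct *mm;
};

static struct mm_struct demo_mm;
static struct task demo_task = { .mm = &demo_mm };
#define current (&demo_task)

/* Stub: this model always succeeds and just tracks the lock depth. */
static int mmap_write_lock_killable(struct mm_struct *mm)
{
	mm->lock_depth++;
	return 0;
}

static void mmap_write_unlock(struct mm_struct *mm)
{
	mm->lock_depth--;
}

static int prctl_set_thp_disable(bool thp_disable, unsigned long flags,
				 unsigned long arg4, unsigned long arg5)
{
	struct mm_struct *mm = current->mm;

	/* Unused in this sketch; the real function validates these. */
	(void)flags;
	(void)arg4;
	(void)arg5;

	/* Use the local mm here instead of current->mm. */
	if (mmap_write_lock_killable(mm))
		return -EINTR;

	/* Placeholder for the elided flag handling. */
	mm->thp_disabled = thp_disable;

	/* ...and here as well, so the function reads consistently. */
	mmap_write_unlock(mm);
	return 0;
}
```

No functional change is intended; current->mm and mm refer to the same
object throughout, so this is purely about readability.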
Cheers,
Lance
> return 0;
> }
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 4ec324a4c1fe..88ac482fb3a0 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -413,7 +413,7 @@ static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm)
> mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
> }
>
> -static bool hugepage_pmd_enabled(void)
> +bool hugepage_pmd_enabled(void)
> {
> /*
> * We cover the anon, shmem and the file-backed case here; file-backed
> @@ -445,6 +445,7 @@ void __khugepaged_enter(struct mm_struct *mm)
>
> /* __khugepaged_exit() must not run from under us */
> VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm);
> + WARN_ON_ONCE(mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm));
> if (unlikely(mm_flags_test_and_set(MMF_VM_HUGEPAGE, mm)))
> return;
>
> @@ -472,7 +473,8 @@ void __khugepaged_enter(struct mm_struct *mm)
> void khugepaged_enter_vma(struct vm_area_struct *vma,
> vm_flags_t vm_flags)
> {
> - if (!mm_flags_test(MMF_VM_HUGEPAGE, vma->vm_mm) &&
> + if (!mm_flags_test(MMF_DISABLE_THP_COMPLETELY, vma->vm_mm) &&
> + !mm_flags_test(MMF_VM_HUGEPAGE, vma->vm_mm) &&
> hugepage_pmd_enabled()) {
> if (thp_vma_allowable_order(vma, vm_flags, TVA_KHUGEPAGED, PMD_ORDER))
> __khugepaged_enter(vma->vm_mm);
> @@ -1451,16 +1453,13 @@ static void collect_mm_slot(struct khugepaged_mm_slot *mm_slot)
>
> lockdep_assert_held(&khugepaged_mm_lock);
>
> - if (hpage_collapse_test_exit(mm)) {
> + if (hpage_collapse_test_exit_or_disable(mm)) {
> /* free mm_slot */
> hash_del(&slot->hash);
> list_del(&slot->mm_node);
>
> - /*
> - * Not strictly needed because the mm exited already.
> - *
> - * mm_flags_clear(MMF_VM_HUGEPAGE, mm);
> - */
> + /* If the mm is disabled, this flag must be cleared. */
> + mm_flags_clear(MMF_VM_HUGEPAGE, mm);
>
> /* khugepaged_mm_lock actually not necessary for the below */
> mm_slot_free(mm_slot_cache, mm_slot);
> @@ -2507,9 +2506,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> VM_BUG_ON(khugepaged_scan.mm_slot != mm_slot);
> /*
> * Release the current mm_slot if this mm is about to die, or
> - * if we scanned all vmas of this mm.
> + * if we scanned all vmas of this mm, or if this mm is disabled.
> */
> - if (hpage_collapse_test_exit(mm) || !vma) {
> + if (hpage_collapse_test_exit_or_disable(mm) || !vma) {
> /*
> * Make sure that if mm_users is reaching zero while
> * khugepaged runs here, khugepaged_exit will find