linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/2] fix MADV_COLLAPSE issue if THP settings are disabled
From: Baolin Wang @ 2025-06-23  8:28 UTC
  To: akpm, hughd, david
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, baolin.wang, linux-mm, linux-kernel

When thp_vma_allowable_orders() is invoked without the TVA_ENFORCE_SYSFS flag,
the THP sysfs settings are ignored. Whilst this makes sense for the callers that
do not specify this flag, it creates an odd and surprising situation where a
sysadmin who specifies 'never' for all THP sizes still observes THP pages being
allocated and used on the system. MADV_COLLAPSE is one such case: it does not
set TVA_ENFORCE_SYSFS when calling thp_vma_allowable_orders().

As discussed in the previous thread [1], MADV_COLLAPSE ignores the system-wide
anon/shmem THP sysfs settings, which means that even when anon/shmem THP has
been disabled, MADV_COLLAPSE will still attempt to collapse into an anon/shmem
THP. This violates the rule we have agreed upon: never means never.

For example, system administrators who disabled THP everywhere clearly do not
want THP to be used, whatever their reason. Letting individual programs quietly
override this is very surprising and likely to cause headaches for those who do
not want it to happen on their systems.

This patch set addresses the MADV_COLLAPSE issue.

Test
====
1. Ran the mm selftests and found no regressions.
2. With different anon mTHP settings toggled, allocation and MADV_COLLAPSE for
anonymous pages work as expected.
3. With different shmem mTHP settings toggled, allocation and MADV_COLLAPSE for
shmem work as expected.
4. Tested large-order allocation for tmpfs; it works as expected.

Hi Dev, Nico and Zi,
I dropped your Reviewed-by/Tested-by tags for patch 1 because it was refactored
according to Lorenzo's suggestions; please help review it again. Thanks.

[1] https://lore.kernel.org/all/1f00fdc3-a3a3-464b-8565-4c1b23d34f8d@linux.alibaba.com/

Changes from v2:
 - Update the commit message and cover letter, per Lorenzo. Thanks.
 - Simplify the logic in thp_vma_allowable_orders(), per Lorenzo and David. Thanks.

Changes from v1:
 - Update the commit message, per Zi.
 - Add Zi's reviewed tag. Thanks.
 - Update the shmem logic.

Baolin Wang (2):
  mm: huge_memory: disallow hugepages if the system-wide THP sysfs
    settings are disabled
  mm: shmem: disallow hugepages if the system-wide shmem THP sysfs
    settings are disabled

 include/linux/huge_mm.h                 | 51 ++++++++++++++++++-------
 mm/shmem.c                              |  6 +--
 tools/testing/selftests/mm/khugepaged.c |  8 +---
 3 files changed, 43 insertions(+), 22 deletions(-)

-- 
2.43.5



* [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
From: Baolin Wang @ 2025-06-23  8:28 UTC
  To: akpm, hughd, david
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, baolin.wang, linux-mm, linux-kernel

When thp_vma_allowable_orders() is invoked without the TVA_ENFORCE_SYSFS flag,
the THP sysfs settings are ignored. Whilst this makes sense for the callers that
do not specify this flag, it creates an odd and surprising situation where a
sysadmin who specifies 'never' for all THP sizes still observes THP pages being
allocated and used on the system.

The motivating case for this is MADV_COLLAPSE, which ignores the system-wide
anon THP sysfs settings. This means that even when anon THP has been disabled,
MADV_COLLAPSE will still attempt to collapse into an anon THP. This violates
the rule we have agreed upon: never means never.

Currently, besides MADV_COLLAPSE, the only other caller that does not set
TVA_ENFORCE_SYSFS is collapse_pte_mapped_thp(), which I believe is reasonable
given its comment:

"
/*
 * If we are here, we've succeeded in replacing all the native pages
 * in the page cache with a single hugepage. If a mm were to fault-in
 * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
 * and map it by a PMD, regardless of sysfs THP settings. As such, let's
 * analogously elide sysfs THP settings here.
 */
if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
"

Another rule for the madvise mode follows David's suggestion: “allowing for
collapsing in a VM without VM_HUGEPAGE in the "madvise" mode would be fine”.
That is, in "madvise" mode MADV_COLLAPSE does not require the VMA to have
VM_HUGEPAGE set.

To address this issue, the strategy is:

If no hugepage modes are enabled for the desired orders, nor can we enable them
by inheriting from a 'global' enabled setting, then it must be the case that
all desired orders either specify or inherit 'NEVER', and we must abort.

Meanwhile, we should fix the khugepaged selftest for MADV_COLLAPSE by enabling
THP.
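To make the masking rule concrete, here is a small user-space model of it. It is a sketch under stated assumptions, not kernel code: struct anon_thp_sysfs and model_mask_anon_orders() are hypothetical names, plain booleans stand in for hugepage_global_enabled(), hugepage_global_always(), VM_HUGEPAGE and TVA_ENFORCE_SYSFS, and bit N of each mask stands for order N as in the huge_anon_orders_* bitmaps.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the anon THP sysfs state (not kernel
 * code). Bit N of each mask means order N is set to that per-size mode;
 * an order present in none of the three masks is set to 'never'.
 */
struct anon_thp_sysfs {
	unsigned long always;	/* models huge_anon_orders_always */
	unsigned long madvise;	/* models huge_anon_orders_madvise */
	unsigned long inherit;	/* models huge_anon_orders_inherit */
	bool global_always;	/* models hugepage_global_always() */
	bool global_enabled;	/* models hugepage_global_enabled() */
};

/*
 * Follows the logic of the v3 __thp_mask_anon_orders(): 'vma_hugepage'
 * stands for VM_HUGEPAGE and 'enforce_sysfs' for TVA_ENFORCE_SYSFS.
 */
static unsigned long model_mask_anon_orders(const struct anon_thp_sysfs *s,
					    bool vma_hugepage,
					    bool enforce_sysfs,
					    unsigned long orders)
{
	const unsigned long never = ~(s->always | s->madvise | s->inherit);

	/* Each order has at most one per-size mode. */
	assert(!(s->always & s->madvise) && !(s->always & s->inherit) &&
	       !(s->madvise & s->inherit));

	/* 'never' set directly on the order ... */
	orders &= ~never;
	/* ... or inherited from a global 'never'. */
	if (!s->global_enabled)
		orders &= ~s->inherit;

	/* MADV_COLLAPSE-style callers stop here: no VM_HUGEPAGE check. */
	if (!enforce_sysfs)
		return orders;

	if (vma_hugepage)
		return orders & (s->always | s->madvise | s->inherit);
	if (s->global_always)
		return orders & (s->always | s->inherit);
	return orders & s->always;
}
```

With the PMD order (bit 9, assuming 4 KiB base pages) set to 'inherit' and the global mode set to 'madvise', the model returns 0 for a khugepaged-style query (sysfs enforced, no VM_HUGEPAGE) but keeps the PMD bit for an MADV_COLLAPSE-style query; once that order resolves to 'never', directly or via inheritance, both queries return 0.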

Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 include/linux/huge_mm.h                 | 51 ++++++++++++++++++-------
 tools/testing/selftests/mm/khugepaged.c |  6 +--
 2 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 4d5bb67dc4ec..ab70ca4e704b 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -267,6 +267,42 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 					 unsigned long tva_flags,
 					 unsigned long orders);
 
+/* Strictly mask requested anonymous orders according to sysfs settings. */
+static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
+		unsigned long tva_flags, unsigned long orders)
+{
+	const unsigned long always = READ_ONCE(huge_anon_orders_always);
+	const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
+	const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
+	const unsigned long never = ~(always | madvise | inherit);
+	const bool inherit_never = !hugepage_global_enabled();
+
+	/* Disallow orders that are set to NEVER directly ... */
+	orders &= ~never;
+
+	/* ... or through inheritance (global == NEVER). */
+	if (inherit_never)
+		orders &= ~inherit;
+
+	/*
+	 * Otherwise, we only enforce sysfs settings if asked. In addition,
+	 * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
+	 * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
+	 * set.
+	 */
+	if (!(tva_flags & TVA_ENFORCE_SYSFS))
+		return orders;
+
+	/* We already excluded never inherit above. */
+	if (vm_flags & VM_HUGEPAGE)
+		return orders & (always | madvise | inherit);
+
+	if (hugepage_global_always())
+		return orders & (always | inherit);
+
+	return orders & always;
+}
+
 /**
  * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
  * @vma:  the vm area to check
@@ -289,19 +325,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 				       unsigned long orders)
 {
 	/* Optimization to check if required orders are enabled early. */
-	if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
-		unsigned long mask = READ_ONCE(huge_anon_orders_always);
-
-		if (vm_flags & VM_HUGEPAGE)
-			mask |= READ_ONCE(huge_anon_orders_madvise);
-		if (hugepage_global_always() ||
-		    ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
-			mask |= READ_ONCE(huge_anon_orders_inherit);
-
-		orders &= mask;
-		if (!orders)
-			return 0;
-	}
+	if (vma_is_anonymous(vma))
+		orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
 
 	return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
 }
diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
index 4341ce6b3b38..85bfff53dba6 100644
--- a/tools/testing/selftests/mm/khugepaged.c
+++ b/tools/testing/selftests/mm/khugepaged.c
@@ -501,11 +501,7 @@ static void __madvise_collapse(const char *msg, char *p, int nr_hpages,
 
 	printf("%s...", msg);
 
-	/*
-	 * Prevent khugepaged interference and tests that MADV_COLLAPSE
-	 * ignores /sys/kernel/mm/transparent_hugepage/enabled
-	 */
-	settings.thp_enabled = THP_NEVER;
+	settings.thp_enabled = THP_ALWAYS;
 	settings.shmem_enabled = SHMEM_NEVER;
 	thp_push_settings(&settings);
 
-- 
2.43.5



* [PATCH v3 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
From: Baolin Wang @ 2025-06-23  8:28 UTC
  To: akpm, hughd, david
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, baolin.wang, linux-mm, linux-kernel

When thp_vma_allowable_orders() is invoked without the TVA_ENFORCE_SYSFS flag,
the THP sysfs settings are ignored. MADV_COLLAPSE is one such case.

MADV_COLLAPSE ignores the system-wide shmem THP sysfs settings, which means
that even when shmem THP has been disabled, MADV_COLLAPSE will still attempt
to collapse into a shmem THP. This violates the rule we have agreed upon:
never means never.

Another rule for the madvise mode follows David's suggestion: “allowing for
collapsing in a VM without VM_HUGEPAGE in the "madvise" mode would be fine”.
That is, in "madvise" mode MADV_COLLAPSE does not require the VMA to have
VM_HUGEPAGE set.

To fix the MADV_COLLAPSE issue for shmem, the strategy is:

For shmem, if none of always, madvise, within_size, and inherit have enabled
PMD-sized THP, then MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.

For tmpfs, if it is mounted with the 'huge=never' option, MADV_COLLAPSE will
be prohibited from collapsing PMD-sized THP.

Meanwhile, we should fix the khugepaged selftest for shmem MADV_COLLAPSE by enabling
shmem THP.
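For the PMD-sized path, the v3 control flow of shmem_huge_global_enabled() can be modelled in user space as below. This is a hedged sketch: the MODEL_HUGE_* enum and model_shmem_pmd_allowed() are hypothetical stand-ins for the kernel's SHMEM_HUGE_* values and function, the within_size check is reduced to a boolean, and the per-size mTHP path in shmem_allowable_huge_orders() is not modelled. 'global' stands for the shmem_huge sysfs value (checked only for deny and force in this function) and 'mount' for the superblock's huge setting, i.e. the tmpfs huge= option.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's SHMEM_HUGE_* values. */
enum model_shmem_huge {
	MODEL_HUGE_NEVER,
	MODEL_HUGE_ALWAYS,
	MODEL_HUGE_WITHIN_SIZE,
	MODEL_HUGE_ADVISE,
	MODEL_HUGE_DENY,	/* only meaningful for the global knob */
	MODEL_HUGE_FORCE,	/* only meaningful for the global knob */
};

/*
 * Returns true if a PMD-sized THP is allowed. 'force' models the
 * shmem_huge_force argument (true for MADV_COLLAPSE), 'vma_hugepage'
 * models VM_HUGEPAGE, and 'fits_in_i_size' replaces the within_size check.
 */
static bool model_shmem_pmd_allowed(enum model_shmem_huge global,
				    enum model_shmem_huge mount,
				    bool force, bool vma_hugepage,
				    bool fits_in_i_size)
{
	if (global == MODEL_HUGE_DENY)
		return false;
	/* v3: 'force' alone no longer short-circuits here. */
	if (global == MODEL_HUGE_FORCE)
		return true;

	switch (mount) {
	case MODEL_HUGE_ALWAYS:
		return true;
	case MODEL_HUGE_WITHIN_SIZE:
		if (fits_in_i_size)
			return true;
		/* fall through */
	case MODEL_HUGE_ADVISE:
		/* v3: 'force' is honoured only in the advise-like modes. */
		return force || vma_hugepage;
	default:
		/* huge=never */
		return false;
	}
}
```

For example, with huge=advise and no VM_HUGEPAGE, the model allows a collapse only when force is true (the MADV_COLLAPSE case), while huge=never rejects it regardless of force.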

Acked-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/shmem.c                              | 6 +++---
 tools/testing/selftests/mm/khugepaged.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 2b19965d27df..e3f51fab2b7d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -637,7 +637,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
 		return 0;
 	if (shmem_huge == SHMEM_HUGE_DENY)
 		return 0;
-	if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
+	if (shmem_huge == SHMEM_HUGE_FORCE)
 		return maybe_pmd_order;
 
 	/*
@@ -672,7 +672,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
 
 		fallthrough;
 	case SHMEM_HUGE_ADVISE:
-		if (vm_flags & VM_HUGEPAGE)
+		if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
 			return maybe_pmd_order;
 		fallthrough;
 	default:
@@ -1806,7 +1806,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
 	/* Allow mTHP that will be fully within i_size. */
 	mask |= shmem_get_orders_within_size(inode, within_size_orders, index, 0);
 
-	if (vm_flags & VM_HUGEPAGE)
+	if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
 		mask |= READ_ONCE(huge_shmem_orders_madvise);
 
 	if (global_orders > 0)
diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
index 85bfff53dba6..9517ed99c382 100644
--- a/tools/testing/selftests/mm/khugepaged.c
+++ b/tools/testing/selftests/mm/khugepaged.c
@@ -502,7 +502,7 @@ static void __madvise_collapse(const char *msg, char *p, int nr_hpages,
 	printf("%s...", msg);
 
 	settings.thp_enabled = THP_ALWAYS;
-	settings.shmem_enabled = SHMEM_NEVER;
+	settings.shmem_enabled = SHMEM_ALWAYS;
 	thp_push_settings(&settings);
 
 	/* Clear VM_NOHUGEPAGE */
-- 
2.43.5



* Re: [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
From: Lorenzo Stoakes @ 2025-06-23 10:26 UTC
  To: Baolin Wang
  Cc: akpm, hughd, david, ziy, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, linux-mm, linux-kernel

On Mon, Jun 23, 2025 at 04:28:08PM +0800, Baolin Wang wrote:
> When thp_vma_allowable_orders() is invoked without the TVA_ENFORCE_SYSFS flag,
> the THP sysfs settings are ignored. Whilst this makes sense for the callers that
> do not specify this flag, it creates an odd and surprising situation where a
> sysadmin who specifies 'never' for all THP sizes still observes THP pages being
> allocated and used on the system.
>
> The motivating case for this is MADV_COLLAPSE, which ignores the system-wide
> anon THP sysfs settings. This means that even when anon THP has been disabled,
> MADV_COLLAPSE will still attempt to collapse into an anon THP. This violates
> the rule we have agreed upon: never means never.
>
> Currently, besides MADV_COLLAPSE, the only other caller that does not set
> TVA_ENFORCE_SYSFS is collapse_pte_mapped_thp(), which I believe is reasonable
> given its comment:
>
> "
> /*
>  * If we are here, we've succeeded in replacing all the native pages
>  * in the page cache with a single hugepage. If a mm were to fault-in
>  * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
>  * and map it by a PMD, regardless of sysfs THP settings. As such, let's
>  * analogously elide sysfs THP settings here.
>  */
> if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
> "
>
> Another rule for the madvise mode follows David's suggestion: “allowing for
> collapsing in a VM without VM_HUGEPAGE in the "madvise" mode would be fine”.
> That is, in "madvise" mode MADV_COLLAPSE does not require the VMA to have
> VM_HUGEPAGE set.
>
> To address this issue, the strategy is:
>
> If no hugepage modes are enabled for the desired orders, nor can we enable them
> by inheriting from a 'global' enabled setting, then it must be the case that
> all desired orders either specify or inherit 'NEVER', and we must abort.
>
> Meanwhile, we should fix the khugepaged selftest for MADV_COLLAPSE by enabling
> THP.

Thanks! Sounds good.
>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Appreciate it though I'm not so bothered about attribution :) but just to say,
of course the 'never' stuff is David's idea (and a good one!) :)

> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>

LGTM so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> ---
>  include/linux/huge_mm.h                 | 51 ++++++++++++++++++-------
>  tools/testing/selftests/mm/khugepaged.c |  6 +--
>  2 files changed, 39 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 4d5bb67dc4ec..ab70ca4e704b 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -267,6 +267,42 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>  					 unsigned long tva_flags,
>  					 unsigned long orders);
>
> +/* Strictly mask requested anonymous orders according to sysfs settings. */
> +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> +		unsigned long tva_flags, unsigned long orders)
> +{
> +	const unsigned long always = READ_ONCE(huge_anon_orders_always);
> +	const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> +	const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> +	const unsigned long never = ~(always | madvise | inherit);
> +	const bool inherit_never = !hugepage_global_enabled();
> +
> +	/* Disallow orders that are set to NEVER directly ... */
> +	orders &= ~never;
> +
> +	/* ... or through inheritance (global == NEVER). */
> +	if (inherit_never)
> +		orders &= ~inherit;
> +
> +	/*
> +	 * Otherwise, we only enforce sysfs settings if asked. In addition,
> +	 * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> +	 * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> +	 * set.
> +	 */
> +	if (!(tva_flags & TVA_ENFORCE_SYSFS))
> +		return orders;
> +
> +	/* We already excluded never inherit above. */
> +	if (vm_flags & VM_HUGEPAGE)
> +		return orders & (always | madvise | inherit);
> +
> +	if (hugepage_global_always())
> +		return orders & (always | inherit);
> +
> +	return orders & always;
> +}
> +
>  /**
>   * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
>   * @vma:  the vm area to check
> @@ -289,19 +325,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>  				       unsigned long orders)
>  {
>  	/* Optimization to check if required orders are enabled early. */
> -	if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> -		unsigned long mask = READ_ONCE(huge_anon_orders_always);
> -
> -		if (vm_flags & VM_HUGEPAGE)
> -			mask |= READ_ONCE(huge_anon_orders_madvise);
> -		if (hugepage_global_always() ||
> -		    ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> -			mask |= READ_ONCE(huge_anon_orders_inherit);
> -
> -		orders &= mask;
> -		if (!orders)
> -			return 0;
> -	}
> +	if (vma_is_anonymous(vma))
> +		orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
>
>  	return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
>  }
> diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
> index 4341ce6b3b38..85bfff53dba6 100644
> --- a/tools/testing/selftests/mm/khugepaged.c
> +++ b/tools/testing/selftests/mm/khugepaged.c
> @@ -501,11 +501,7 @@ static void __madvise_collapse(const char *msg, char *p, int nr_hpages,
>
>  	printf("%s...", msg);
>
> -	/*
> -	 * Prevent khugepaged interference and tests that MADV_COLLAPSE
> -	 * ignores /sys/kernel/mm/transparent_hugepage/enabled
> -	 */
> -	settings.thp_enabled = THP_NEVER;
> +	settings.thp_enabled = THP_ALWAYS;

Good spot!

>  	settings.shmem_enabled = SHMEM_NEVER;
>  	thp_push_settings(&settings);
>
> --
> 2.43.5
>


* Re: [PATCH v3 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
From: Lorenzo Stoakes @ 2025-06-23 10:45 UTC
  To: Baolin Wang
  Cc: akpm, hughd, david, ziy, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, linux-mm, linux-kernel

On Mon, Jun 23, 2025 at 04:28:09PM +0800, Baolin Wang wrote:
> When thp_vma_allowable_orders() is invoked without the TVA_ENFORCE_SYSFS flag,
> the THP sysfs settings are ignored. MADV_COLLAPSE is one such case.
>
> MADV_COLLAPSE ignores the system-wide shmem THP sysfs settings, which means
> that even when shmem THP has been disabled, MADV_COLLAPSE will still attempt
> to collapse into a shmem THP. This violates the rule we have agreed upon:
> never means never.
>
> Another rule for the madvise mode follows David's suggestion: “allowing for
> collapsing in a VM without VM_HUGEPAGE in the "madvise" mode would be fine”.
> That is, in "madvise" mode MADV_COLLAPSE does not require the VMA to have
> VM_HUGEPAGE set.
>
> To fix the MADV_COLLAPSE issue for shmem, the strategy is:
>
> For shmem, if none of always, madvise, within_size, and inherit have enabled
> PMD-sized THP, then MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
>
> For tmpfs, if it is mounted with the 'huge=never' option, MADV_COLLAPSE will
> be prohibited from collapsing PMD-sized THP.
>
> Meanwhile, we should fix the khugepaged selftest for shmem MADV_COLLAPSE by enabling
> shmem THP.
>
> Acked-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>

LGTM, so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> ---
>  mm/shmem.c                              | 6 +++---
>  tools/testing/selftests/mm/khugepaged.c | 2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 2b19965d27df..e3f51fab2b7d 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -637,7 +637,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
>  		return 0;
>  	if (shmem_huge == SHMEM_HUGE_DENY)
>  		return 0;
> -	if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
> +	if (shmem_huge == SHMEM_HUGE_FORCE)
>  		return maybe_pmd_order;
>
>  	/*
> @@ -672,7 +672,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
>
>  		fallthrough;
>  	case SHMEM_HUGE_ADVISE:
> -		if (vm_flags & VM_HUGEPAGE)
> +		if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
>  			return maybe_pmd_order;
>  		fallthrough;
>  	default:
> @@ -1806,7 +1806,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
>  	/* Allow mTHP that will be fully within i_size. */
>  	mask |= shmem_get_orders_within_size(inode, within_size_orders, index, 0);
>
> -	if (vm_flags & VM_HUGEPAGE)
> +	if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
>  		mask |= READ_ONCE(huge_shmem_orders_madvise);
>
>  	if (global_orders > 0)
> diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
> index 85bfff53dba6..9517ed99c382 100644
> --- a/tools/testing/selftests/mm/khugepaged.c
> +++ b/tools/testing/selftests/mm/khugepaged.c
> @@ -502,7 +502,7 @@ static void __madvise_collapse(const char *msg, char *p, int nr_hpages,
>  	printf("%s...", msg);
>
>  	settings.thp_enabled = THP_ALWAYS;
> -	settings.shmem_enabled = SHMEM_NEVER;
> +	settings.shmem_enabled = SHMEM_ALWAYS;
>  	thp_push_settings(&settings);
>
>  	/* Clear VM_NOHUGEPAGE */
> --
> 2.43.5
>


* Re: [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
From: Barry Song @ 2025-06-23 11:08 UTC
  To: Baolin Wang
  Cc: akpm, hughd, david, ziy, lorenzo.stoakes, Liam.Howlett, npache,
	ryan.roberts, dev.jain, linux-mm, linux-kernel

On Mon, Jun 23, 2025 at 8:28 PM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
> When thp_vma_allowable_orders() is invoked without the TVA_ENFORCE_SYSFS flag,
> the THP sysfs settings are ignored. Whilst this makes sense for the callers that
> do not specify this flag, it creates an odd and surprising situation where a
> sysadmin who specifies 'never' for all THP sizes still observes THP pages being
> allocated and used on the system.
>
> The motivating case for this is MADV_COLLAPSE, which ignores the system-wide
> anon THP sysfs settings. This means that even when anon THP has been disabled,
> MADV_COLLAPSE will still attempt to collapse into an anon THP. This violates
> the rule we have agreed upon: never means never.
>

Should we update the man page for MADV_COLLAPSE?
https://man7.org/linux/man-pages/man2/madvise.2.html

              MADV_COLLAPSE is independent of any sysfs (see sysfs(5))
              setting under /sys/kernel/mm/transparent_hugepage, both in
              terms of determining THP eligibility, and allocation
              semantics.  See Linux kernel source file
              Documentation/admin-guide/mm/transhuge.rst for more
              information.  MADV_COLLAPSE also ignores huge= tmpfs mount
              when operating on tmpfs files.  Allocation for the new
              hugepage may enter direct reclaim and/or compaction,
              regardless of VMA flags (though VM_NOHUGEPAGE is still
              respected).

So this effectively changes the uABI, right?
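Because this changes user-visible behaviour, applications may want to treat MADV_COLLAPSE as strictly best-effort. The sketch below assumes that a 'never' setting surfaces as a failed madvise() call (most likely EINVAL from the VMA eligibility check, but treat the exact errno as an assumption). try_collapse() is a hypothetical helper, and the fallback MADV_COLLAPSE value of 25 is the asm-generic uapi value, provided only for older libc headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* asm-generic uapi value; for old headers */
#endif

/*
 * Hypothetical best-effort wrapper around MADV_COLLAPSE. Returns 0 on
 * success, otherwise the errno reported by madvise(). A failure is not
 * fatal: the range simply stays mapped by base pages, which is what a
 * 'never' sysfs setting is now expected to produce.
 */
static int try_collapse(void *addr, size_t len)
{
	if (madvise(addr, len, MADV_COLLAPSE) == 0)
		return 0;
	return errno;
}
```

On failure the memory remains fully usable, so a caller can log the returned errno and carry on rather than aborting.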

> Currently, besides MADV_COLLAPSE, the only other caller that does not set
> TVA_ENFORCE_SYSFS is collapse_pte_mapped_thp(), which I believe is reasonable
> given its comment:
>
> "
> /*
>  * If we are here, we've succeeded in replacing all the native pages
>  * in the page cache with a single hugepage. If a mm were to fault-in
>  * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
>  * and map it by a PMD, regardless of sysfs THP settings. As such, let's
>  * analogously elide sysfs THP settings here.
>  */
> if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
> "
>
> Another rule for the madvise mode follows David's suggestion: “allowing for
> collapsing in a VM without VM_HUGEPAGE in the "madvise" mode would be fine”.
> That is, in "madvise" mode MADV_COLLAPSE does not require the VMA to have
> VM_HUGEPAGE set.
>
> To address this issue, the strategy is:
>
> If no hugepage modes are enabled for the desired orders, nor can we enable them
> by inheriting from a 'global' enabled setting, then it must be the case that
> all desired orders either specify or inherit 'NEVER', and we must abort.
>
> Meanwhile, we should fix the khugepaged selftest for MADV_COLLAPSE by enabling
> THP.

It’s a bit odd that the old test case expects collapsing to succeed even when
we’ve set it to ‘never’. Setting it to ‘always’ doesn’t seem to test anything
as a counterpart.

I assume the goal is to test that setting it to ‘never’ prevents collapsing?
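A counterpart negative test could read the active mode from /sys/kernel/mm/transparent_hugepage/enabled and fail if MADV_COLLAPSE succeeds while that mode is 'never'. The helpers below (hypothetical names thp_active_mode() and collapse_result_consistent()) implement only the parsing and the pass/fail decision, so they can be exercised without root. They assume the usual bracketed sysfs format, e.g. "always madvise [never]", and that the PMD size inherits the top-level setting.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (active) mode from a THP sysfs 'enabled' buffer,
 * e.g. "always madvise [never]\n" -> "never". Returns false if no
 * bracketed mode is found or it does not fit in 'out'.
 */
static bool thp_active_mode(const char *buf, char *out, size_t outlen)
{
	const char *open, *close;
	size_t n;

	assert(buf != NULL && out != NULL);
	open = strchr(buf, '[');
	close = open ? strchr(open, ']') : NULL;
	if (!open || !close || outlen == 0)
		return false;
	n = (size_t)(close - open - 1);
	if (n >= outlen)
		return false;
	memcpy(out, open + 1, n);
	out[n] = '\0';
	return true;
}

/*
 * Negative-test oracle for "never means never": a successful
 * MADV_COLLAPSE (collapse_err == 0) is a failure if the active mode is
 * 'never'. Any other mode accepts either outcome.
 */
static bool collapse_result_consistent(const char *enabled_buf, int collapse_err)
{
	char mode[16];

	if (!thp_active_mode(enabled_buf, mode, sizeof(mode)))
		return false;
	if (strcmp(mode, "never") == 0)
		return collapse_err != 0;
	return true;
}
```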

>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---

Thanks
Barry


* Re: [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
From: David Hildenbrand @ 2025-06-23 13:54 UTC
  To: Baolin Wang, akpm, hughd
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, linux-mm, linux-kernel


> diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
> index 4341ce6b3b38..85bfff53dba6 100644
> --- a/tools/testing/selftests/mm/khugepaged.c
> +++ b/tools/testing/selftests/mm/khugepaged.c
> @@ -501,11 +501,7 @@ static void __madvise_collapse(const char *msg, char *p, int nr_hpages,
>   
>   	printf("%s...", msg);
>   
> -	/*
> -	 * Prevent khugepaged interference and tests that MADV_COLLAPSE
> -	 * ignores /sys/kernel/mm/transparent_hugepage/enabled
> -	 */
> -	settings.thp_enabled = THP_NEVER;
> +	settings.thp_enabled = THP_ALWAYS;


Would MADVISE mode also work here? If we don't set MADV_HUGEPAGE, then 
khugepaged should be excluded, correct?


-- 
Cheers,

David / dhildenb



* Re: [PATCH v3 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
From: David Hildenbrand @ 2025-06-23 13:59 UTC
  To: Baolin Wang, akpm, hughd
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, linux-mm, linux-kernel

On 23.06.25 10:28, Baolin Wang wrote:
> When thp_vma_allowable_orders() is invoked without the TVA_ENFORCE_SYSFS flag,
> the THP sysfs settings are ignored. MADV_COLLAPSE is one such case.
>
> MADV_COLLAPSE ignores the system-wide shmem THP sysfs settings, which means
> that even when shmem THP has been disabled, MADV_COLLAPSE will still attempt
> to collapse into a shmem THP. This violates the rule we have agreed upon:
> never means never.
>
> Another rule for the madvise mode follows David's suggestion: “allowing for
> collapsing in a VM without VM_HUGEPAGE in the "madvise" mode would be fine”.
> That is, in "madvise" mode MADV_COLLAPSE does not require the VMA to have
> VM_HUGEPAGE set.
>
> To fix the MADV_COLLAPSE issue for shmem, the strategy is:
> 
> For shmem, if none of always, madvise, within_size, and inherit have enabled
> PMD-sized THP, then MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.

I assume we could rephrase that to "For shmem, if "shmem_enabled" is set
to either "never" or "deny", then MADV_COLLAPSE will be prohibited from
collapsing."

Or am I missing a case?

[...]

> @@ -672,7 +672,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
>   
>   		fallthrough;
>   	case SHMEM_HUGE_ADVISE:
> -		if (vm_flags & VM_HUGEPAGE)
> +		if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
>   			return maybe_pmd_order;
>   		fallthrough;
>   	default:
> @@ -1806,7 +1806,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
>   	/* Allow mTHP that will be fully within i_size. */
>   	mask |= shmem_get_orders_within_size(inode, within_size_orders, index, 0);
>   
> -	if (vm_flags & VM_HUGEPAGE)
> +	if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
>   		mask |= READ_ONCE(huge_shmem_orders_madvise);
>   
>   	if (global_orders > 0)
> diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
> index 85bfff53dba6..9517ed99c382 100644
> --- a/tools/testing/selftests/mm/khugepaged.c
> +++ b/tools/testing/selftests/mm/khugepaged.c
> @@ -502,7 +502,7 @@ static void __madvise_collapse(const char *msg, char *p, int nr_hpages,
>   	printf("%s...", msg);
>   
>   	settings.thp_enabled = THP_ALWAYS;
> -	settings.shmem_enabled = SHMEM_NEVER;
> +	settings.shmem_enabled = SHMEM_ALWAYS;
>   	thp_push_settings(&settings);

Same question as for the other case.

-- 
Cheers,

David / dhildenb



* Re: [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
From: Zi Yan @ 2025-06-23 14:39 UTC (permalink / raw)
  To: Baolin Wang
  Cc: akpm, hughd, david, lorenzo.stoakes, Liam.Howlett, npache,
	ryan.roberts, dev.jain, baohua, linux-mm, linux-kernel

On 23 Jun 2025, at 4:28, Baolin Wang wrote:

> When invoking thp_vma_allowable_orders(), if the TVA_ENFORCE_SYSFS flag is not
> specified, we will ignore the THP sysfs settings. Whilst it makes sense for the
> callers who do not specify this flag, it creates an odd and surprising situation
> where a sysadmin who specifies 'never' for all THP sizes still observes THP pages
> being allocated and used on the system.
>
> The motivating case for this is MADV_COLLAPSE. MADV_COLLAPSE ignores
> the system-wide Anon THP sysfs settings, which means that even though we have
> disabled the Anon THP configuration, MADV_COLLAPSE will still attempt to collapse
> into an Anon THP. This violates the rule we have agreed upon: never means never.
>
> Currently, besides MADV_COLLAPSE not setting TVA_ENFORCE_SYSFS, there is only
> one other instance where TVA_ENFORCE_SYSFS is not set, which is in the
> collapse_pte_mapped_thp() function, but I believe this is reasonable from its
> comments:
>
> "
> /*
>  * If we are here, we've succeeded in replacing all the native pages
>  * in the page cache with a single hugepage. If a mm were to fault-in
>  * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
>  * and map it by a PMD, regardless of sysfs THP settings. As such, let's
>  * analogously elide sysfs THP settings here.
>  */
> if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
> "
>
> Another rule for madvise, referring to David's suggestion: “allowing for
> collapsing in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>
> To address this issue, the current strategy should be:
>
> If no hugepage modes are enabled for the desired orders, nor can we enable them
> by inheriting from a 'global' enabled setting - then it must be the case that
> all desired orders either specify or inherit 'NEVER' - and we must abort.
>
> Meanwhile, we should fix the khugepaged selftest for MADV_COLLAPSE by enabling
> THP.
>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  include/linux/huge_mm.h                 | 51 ++++++++++++++++++-------
>  tools/testing/selftests/mm/khugepaged.c |  6 +--
>  2 files changed, 39 insertions(+), 18 deletions(-)
>
The code looks much cleaner. Thanks.

Reviewed-by: Zi Yan <ziy@nvidia.com>

--
Best Regards,
Yan, Zi

* Re: [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
  2025-06-23 11:08   ` Barry Song
@ 2025-06-24  1:44     ` Baolin Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Baolin Wang @ 2025-06-24  1:44 UTC (permalink / raw)
  To: Barry Song
  Cc: akpm, hughd, david, ziy, lorenzo.stoakes, Liam.Howlett, npache,
	ryan.roberts, dev.jain, linux-mm, linux-kernel



On 2025/6/23 19:08, Barry Song wrote:
> On Mon, Jun 23, 2025 at 8:28 PM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
>>
>> When invoking thp_vma_allowable_orders(), if the TVA_ENFORCE_SYSFS flag is not
>> specified, we will ignore the THP sysfs settings. Whilst it makes sense for the
>> callers who do not specify this flag, it creates an odd and surprising situation
>> where a sysadmin who specifies 'never' for all THP sizes still observes THP pages
>> being allocated and used on the system.
>>
>> The motivating case for this is MADV_COLLAPSE. MADV_COLLAPSE ignores
>> the system-wide Anon THP sysfs settings, which means that even though we have
>> disabled the Anon THP configuration, MADV_COLLAPSE will still attempt to collapse
>> into an Anon THP. This violates the rule we have agreed upon: never means never.
>>
> 
> Should we update the man page for madv_collapse ?
> https://man7.org/linux/man-pages/man2/madvise.2.html
> 
>                MADV_COLLAPSE is independent of any sysfs (see sysfs(5))
>                setting under /sys/kernel/mm/transparent_hugepage, both in
>                terms of determining THP eligibility, and allocation
>                semantics.  See Linux kernel source file
>                Documentation/admin-guide/mm/transhuge.rst for more
>                information.  MADV_COLLAPSE also ignores huge= tmpfs mount
>                when operating on tmpfs files.  Allocation for the new
>                hugepage may enter direct reclaim and/or compaction,
>                regardless of VMA flags (though VM_NOHUGEPAGE is still
>                respected).
> 
> So this effectively changes the uABI, right?

Good point. Will update the man page.
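Barry's point is that this is a user-visible change: after this series, MADV_COLLAPSE on anonymous memory is expected to fail when the settings for the requested sizes specify or inherit 'never'. A minimal sketch, assuming a userspace tool wants to inspect the active mode before calling MADV_COLLAPSE: THP sysfs files such as /sys/kernel/mm/transparent_hugepage/enabled print every choice and mark the active one with brackets (for example "always [madvise] never"). The helper thp_active_mode() is hypothetical, not a kernel or libc API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the active (bracketed) choice out of a THP
 * sysfs mode string such as "always [madvise] never\n".
 * Returns 0 on success, or -1 if there is no non-empty bracketed token
 * or it does not fit in 'out' (including the terminating NUL).
 */
static int thp_active_mode(const char *s, char *out, size_t outlen)
{
	const char *lb, *rb;
	size_t len;

	assert(s != NULL && out != NULL);

	lb = strchr(s, '[');
	if (lb == NULL)
		return -1;
	rb = strchr(lb + 1, ']');
	if (rb == NULL)
		return -1;

	len = (size_t)(rb - lb - 1);
	if (len == 0 || len >= outlen)
		return -1;

	memcpy(out, lb + 1, len);
	out[len] = '\0';
	return 0;
}
```

In practice the string would be read from the sysfs file with fgets(). The per-size files under /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled use the same bracket format and additionally offer 'inherit', so a complete check has to combine the per-size value with the global one, as the patch's mask logic does.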

>> Currently, besides MADV_COLLAPSE not setting TVA_ENFORCE_SYSFS, there is only
>> one other instance where TVA_ENFORCE_SYSFS is not set, which is in the
>> collapse_pte_mapped_thp() function, but I believe this is reasonable from its
>> comments:
>>
>> "
>> /*
>>   * If we are here, we've succeeded in replacing all the native pages
>>   * in the page cache with a single hugepage. If a mm were to fault-in
>>   * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
>>   * and map it by a PMD, regardless of sysfs THP settings. As such, let's
>>   * analogously elide sysfs THP settings here.
>>   */
>> if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
>> "
>>
>> Another rule for madvise, referring to David's suggestion: “allowing for
>> collapsing in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>
>> To address this issue, the current strategy should be:
>>
>> If no hugepage modes are enabled for the desired orders, nor can we enable them
>> by inheriting from a 'global' enabled setting - then it must be the case that
>> all desired orders either specify or inherit 'NEVER' - and we must abort.
>>
>> Meanwhile, we should fix the khugepaged selftest for MADV_COLLAPSE by enabling
>> THP.
> 
> It’s a bit odd that the old test case expects collapsing to succeed
> even when we’ve set it to ‘never’.
> Setting it to ‘always’ doesn’t seem to test anything as a counterpart.
> 
> I assume the goal is to test that setting it to ‘never’ prevents collapsing?

The original logic prevented khugepaged from collapsing by setting 
THP_NEVER, allowing only madvise_collapse() to perform THP collapse. That 
is exactly the behavior this patchset fixes: madvise_collapse() must also 
be prevented from performing THP collapse when the system-wide THP sysfs 
settings are disabled.

Therefore, the test has to switch to THP_ALWAYS here so that 
madvise_collapse() can still perform THP collapse.

Of course, the new setting cannot completely disable khugepaged, but I 
haven't found a better way to modify the test. David suggested switching 
to MADVISE mode, but that causes some test cases to fail: some tests 
previously set MADV_NOHUGEPAGE, and there is currently no way to clear 
the MADV_NOHUGEPAGE flag except by setting MADV_HUGEPAGE. As a result, 
khugepaged cannot be completely disabled in MADVISE mode either.

So I think we should introduce a new method to clear the MADV_NOHUGEPAGE 
flag without setting MADV_HUGEPAGE in the future.
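The selftest limitation described here follows from how the two hints update the VMA flags: MADV_HUGEPAGE sets VM_HUGEPAGE and clears VM_NOHUGEPAGE, and MADV_NOHUGEPAGE does the reverse, so once either hint has been applied no madvise() call returns the VMA to "neither". A simplified user-space model, assuming hypothetical MODEL_* flag values and a hypothetical "default" advice standing in for the undo operation discussed in this thread (no such advice exists today):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's VM_HUGEPAGE / VM_NOHUGEPAGE bits. */
#define MODEL_VM_HUGEPAGE   (1u << 0)
#define MODEL_VM_NOHUGEPAGE (1u << 1)

enum model_advice {
	MODEL_MADV_HUGEPAGE,
	MODEL_MADV_NOHUGEPAGE,
	MODEL_MADV_DEFAULT_HUGEPAGE, /* hypothetical "undo" advice; does not exist */
};

/*
 * Apply one hint to a VMA's flags: each real hint sets its own bit and
 * clears the opposite one; the hypothetical default hint clears both.
 */
static unsigned int model_apply_advice(unsigned int vm_flags,
				       enum model_advice advice)
{
	switch (advice) {
	case MODEL_MADV_HUGEPAGE:
		vm_flags &= ~MODEL_VM_NOHUGEPAGE;
		vm_flags |= MODEL_VM_HUGEPAGE;
		break;
	case MODEL_MADV_NOHUGEPAGE:
		vm_flags &= ~MODEL_VM_HUGEPAGE;
		vm_flags |= MODEL_VM_NOHUGEPAGE;
		break;
	case MODEL_MADV_DEFAULT_HUGEPAGE:
		vm_flags &= ~(MODEL_VM_HUGEPAGE | MODEL_VM_NOHUGEPAGE);
		break;
	default:
		assert(0 && "unknown advice");
	}
	return vm_flags;
}

/*
 * Simplified: with the global mode set to "madvise", khugepaged only
 * considers VMAs carrying VM_HUGEPAGE (per-size settings are ignored).
 */
static bool model_khugepaged_eligible_in_madvise_mode(unsigned int vm_flags)
{
	return (vm_flags & MODEL_VM_HUGEPAGE) != 0;
}
```

Starting from MADV_NOHUGEPAGE, the only real way out is MADV_HUGEPAGE, which makes the VMA eligible for khugepaged in madvise mode; only the hypothetical default advice gets back to a flag-free VMA.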

* Re: [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
  2025-06-23 10:26   ` Lorenzo Stoakes
@ 2025-06-24  1:45     ` Baolin Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Baolin Wang @ 2025-06-24  1:45 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: akpm, hughd, david, ziy, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, linux-mm, linux-kernel



On 2025/6/23 18:26, Lorenzo Stoakes wrote:
> On Mon, Jun 23, 2025 at 04:28:08PM +0800, Baolin Wang wrote:
>> When invoking thp_vma_allowable_orders(), if the TVA_ENFORCE_SYSFS flag is not
>> specified, we will ignore the THP sysfs settings. Whilst it makes sense for the
>> callers who do not specify this flag, it creates an odd and surprising situation
>> where a sysadmin who specifies 'never' for all THP sizes still observes THP pages
>> being allocated and used on the system.
>>
>> The motivating case for this is MADV_COLLAPSE. MADV_COLLAPSE ignores
>> the system-wide Anon THP sysfs settings, which means that even though we have
>> disabled the Anon THP configuration, MADV_COLLAPSE will still attempt to collapse
>> into an Anon THP. This violates the rule we have agreed upon: never means never.
>>
>> Currently, besides MADV_COLLAPSE not setting TVA_ENFORCE_SYSFS, there is only
>> one other instance where TVA_ENFORCE_SYSFS is not set, which is in the
>> collapse_pte_mapped_thp() function, but I believe this is reasonable from its
>> comments:
>>
>> "
>> /*
>>   * If we are here, we've succeeded in replacing all the native pages
>>   * in the page cache with a single hugepage. If a mm were to fault-in
>>   * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
>>   * and map it by a PMD, regardless of sysfs THP settings. As such, let's
>>   * analogously elide sysfs THP settings here.
>>   */
>> if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
>> "
>>
>> Another rule for madvise, referring to David's suggestion: “allowing for
>> collapsing in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>
>> To address this issue, the current strategy should be:
>>
>> If no hugepage modes are enabled for the desired orders, nor can we enable them
>> by inheriting from a 'global' enabled setting - then it must be the case that
>> all desired orders either specify or inherit 'NEVER' - and we must abort.
>>
>> Meanwhile, we should fix the khugepaged selftest for MADV_COLLAPSE by enabling
>> THP.
> 
> Thanks! Sounds good.
>>
>> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> 
> Appreciate it though I'm not so bothered about attribution :) but just to say,
> of course the 'never' stuff is David's idea (and a good one!) :)

Yes, I should also add:

Suggested-by: David Hildenbrand <david@redhat.com>

>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> 
> LGTM so:
> 
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Thanks.


* Re: [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
  2025-06-23 13:54   ` David Hildenbrand
@ 2025-06-24  1:48     ` Baolin Wang
  2025-06-24  8:29       ` David Hildenbrand
  0 siblings, 1 reply; 19+ messages in thread
From: Baolin Wang @ 2025-06-24  1:48 UTC (permalink / raw)
  To: David Hildenbrand, akpm, hughd
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, linux-mm, linux-kernel



On 2025/6/23 21:54, David Hildenbrand wrote:
> 
>> diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
>> index 4341ce6b3b38..85bfff53dba6 100644
>> --- a/tools/testing/selftests/mm/khugepaged.c
>> +++ b/tools/testing/selftests/mm/khugepaged.c
>> @@ -501,11 +501,7 @@ static void __madvise_collapse(const char *msg, char *p, int nr_hpages,
>>       printf("%s...", msg);
>> -    /*
>> -     * Prevent khugepaged interference and tests that MADV_COLLAPSE
>> -     * ignores /sys/kernel/mm/transparent_hugepage/enabled
>> -     */
>> -    settings.thp_enabled = THP_NEVER;
>> +    settings.thp_enabled = THP_ALWAYS;
> 
> 
> Would MADVISE mode also work here? If we don't set MADV_HUGEPAGE, then 
> khugepaged should be excluded, correct?

I tried this, but some test cases failed. As I replied to Barry, it's 
because some tests previously set MADV_NOHUGEPAGE, and now there is no 
way to clear the MADV_NOHUGEPAGE flag except by setting MADV_HUGEPAGE.

* Re: [PATCH v3 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
  2025-06-23 13:59   ` David Hildenbrand
@ 2025-06-24  1:52     ` Baolin Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Baolin Wang @ 2025-06-24  1:52 UTC (permalink / raw)
  To: David Hildenbrand, akpm, hughd
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, linux-mm, linux-kernel



On 2025/6/23 21:59, David Hildenbrand wrote:
> On 23.06.25 10:28, Baolin Wang wrote:
>> When invoking thp_vma_allowable_orders(), if the TVA_ENFORCE_SYSFS flag is not
>> specified, we will ignore the THP sysfs settings. And MADV_COLLAPSE is an
>> example of such a case.
>>
>> MADV_COLLAPSE ignores the system-wide shmem THP sysfs settings, which
>> means that even though we have disabled the shmem THP configuration,
>> MADV_COLLAPSE will still attempt to collapse into a shmem THP. This violates
>> the rule we have agreed upon: never means never.
>>
>> Another rule for madvise, referring to David's suggestion: “allowing for
>> collapsing in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>
>> To fix the MADV_COLLAPSE issue for shmem, the current strategy should be:
>>
>> For shmem, if none of always, madvise, within_size, and inherit have enabled
>> PMD-sized THP, then MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
> 
> I assume we could rephrase that to "For shmem, if "shmem_enabled" is set 
> to either "none" or "deny", then MADV_COLLAPSE will be prohibited from 
> collapsing."

Yes. Setting 'deny' will also prevent MADV_COLLAPSE, and there is no 
'none' option for 'shmem_enabled'. Will update the commit message to 
make it clear.
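A simplified model of the PMD-order decision in shmem_huge_global_enabled() as changed by this patch (see the quoted hunk at the top of this section): 'deny' and 'never' refuse outright, while 'within_size' falls through to the 'advise' case, where shmem_huge_force now counts like VM_HUGEPAGE. As we understand it, shmem_huge_force is true on the MADV_COLLAPSE path because that path does not set TVA_ENFORCE_SYSFS. This is a sketch only: it folds the global 'deny' override and the per-mount huge= setting into one hypothetical enum, omits 'force', the per-size shmem mTHP controls and the real i_size arithmetic, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the shmem huge=/shmem_enabled values. */
enum model_shmem_huge {
	MODEL_SHMEM_NEVER,
	MODEL_SHMEM_ALWAYS,
	MODEL_SHMEM_WITHIN_SIZE,
	MODEL_SHMEM_ADVISE,
	MODEL_SHMEM_DENY,
};

/* Returns true if a PMD-sized shmem THP would be allowed under this model. */
static bool model_shmem_pmd_allowed(enum model_shmem_huge mode,
				    bool shmem_huge_force,
				    bool vm_hugepage,
				    bool fits_within_size)
{
	switch (mode) {
	case MODEL_SHMEM_DENY:
	case MODEL_SHMEM_NEVER:
		return false; /* never means never, even for MADV_COLLAPSE */
	case MODEL_SHMEM_ALWAYS:
		return true;
	case MODEL_SHMEM_WITHIN_SIZE:
		if (fits_within_size)
			return true;
		/* fall through */
	case MODEL_SHMEM_ADVISE:
		/* After the patch, a forced request counts like VM_HUGEPAGE. */
		return shmem_huge_force || vm_hugepage;
	}
	assert(0 && "unknown mode");
	return false;
}
```

Before the patch, a forced request returned the PMD order ahead of this switch, which is how MADV_COLLAPSE could bypass 'never'.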

* Re: [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
  2025-06-24  1:48     ` Baolin Wang
@ 2025-06-24  8:29       ` David Hildenbrand
  2025-06-24  9:20         ` Baolin Wang
  0 siblings, 1 reply; 19+ messages in thread
From: David Hildenbrand @ 2025-06-24  8:29 UTC (permalink / raw)
  To: Baolin Wang, akpm, hughd
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, linux-mm, linux-kernel

On 24.06.25 03:48, Baolin Wang wrote:
> 
> 
> On 2025/6/23 21:54, David Hildenbrand wrote:
>>
>>> diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
>>> index 4341ce6b3b38..85bfff53dba6 100644
>>> --- a/tools/testing/selftests/mm/khugepaged.c
>>> +++ b/tools/testing/selftests/mm/khugepaged.c
>>> @@ -501,11 +501,7 @@ static void __madvise_collapse(const char *msg, char *p, int nr_hpages,
>>>        printf("%s...", msg);
>>> -    /*
>>> -     * Prevent khugepaged interference and tests that MADV_COLLAPSE
>>> -     * ignores /sys/kernel/mm/transparent_hugepage/enabled
>>> -     */
>>> -    settings.thp_enabled = THP_NEVER;
>>> +    settings.thp_enabled = THP_ALWAYS;
>>
>>
>> Would MADVISE mode also work here? If we don't set MADV_HUGEPAGE, then
>> khugepaged should be excluded, correct?
> 
> I tried this, but some test cases failed. As I replied to Barry, it's
> because some tests previously set MADV_NOHUGEPAGE, and now there is no
> way to clear the MADV_NOHUGEPAGE flag except by setting MADV_HUGEPAGE.

Okay, can you add that detail to the patch description? I suspect we 
really want a way to undo what MADV_HUGEPAGE/MADV_NOHUGEPAGE did (if 
only naming weren't so complicated: MADV_DEFAULT_HUGEPAGE, hmmmm).

-- 
Cheers,

David / dhildenb


* Re: [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
  2025-06-23  8:28 ` [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs " Baolin Wang
                     ` (3 preceding siblings ...)
  2025-06-23 14:39   ` Zi Yan
@ 2025-06-24  8:41   ` Dev Jain
  2025-06-24  9:57     ` Baolin Wang
  4 siblings, 1 reply; 19+ messages in thread
From: Dev Jain @ 2025-06-24  8:41 UTC (permalink / raw)
  To: Baolin Wang, akpm, hughd, david
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, baohua,
	linux-mm, linux-kernel


On 23/06/25 1:58 pm, Baolin Wang wrote:
> When invoking thp_vma_allowable_orders(), if the TVA_ENFORCE_SYSFS flag is not
> specified, we will ignore the THP sysfs settings. Whilst it makes sense for the
> callers who do not specify this flag, it creates an odd and surprising situation
> where a sysadmin who specifies 'never' for all THP sizes still observes THP pages
> being allocated and used on the system.
>
> The motivating case for this is MADV_COLLAPSE. MADV_COLLAPSE ignores
> the system-wide Anon THP sysfs settings, which means that even though we have
> disabled the Anon THP configuration, MADV_COLLAPSE will still attempt to collapse
> into an Anon THP. This violates the rule we have agreed upon: never means never.
>
> Currently, besides MADV_COLLAPSE not setting TVA_ENFORCE_SYSFS, there is only
> one other instance where TVA_ENFORCE_SYSFS is not set, which is in the
> collapse_pte_mapped_thp() function, but I believe this is reasonable from its
> comments:
>
> "
> /*
>   * If we are here, we've succeeded in replacing all the native pages
>   * in the page cache with a single hugepage. If a mm were to fault-in
>   * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
>   * and map it by a PMD, regardless of sysfs THP settings. As such, let's
>   * analogously elide sysfs THP settings here.
>   */
> if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))

So the behaviour now is: First check whether THP settings converge to never.
Then, if enforce_sysfs is not set, return immediately. So in this khugepaged
code, would it be better to call __thp_vma_allowable_orders()? If the sysfs
settings are changed to never before hitting collapse_pte_mapped_thp(),
then right now we will return SCAN_VMA_CHECK from here, whereas the comment
says "regardless of sysfs THP settings", which should include "regardless
of whether the sysfs settings say never".

> "
>
> Another rule for madvise, referring to David's suggestion: “allowing for
> collapsing in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>
> To address this issue, the current strategy should be:
>
> If no hugepage modes are enabled for the desired orders, nor can we enable them
> by inheriting from a 'global' enabled setting - then it must be the case that
> all desired orders either specify or inherit 'NEVER' - and we must abort.
>
> Meanwhile, we should fix the khugepaged selftest for MADV_COLLAPSE by enabling
> THP.
>
> Suggested-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>   include/linux/huge_mm.h                 | 51 ++++++++++++++++++-------
>   tools/testing/selftests/mm/khugepaged.c |  6 +--
>   2 files changed, 39 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 4d5bb67dc4ec..ab70ca4e704b 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -267,6 +267,42 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>   					 unsigned long tva_flags,
>   					 unsigned long orders);
>   
> +/* Strictly mask requested anonymous orders according to sysfs settings. */
> +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> +		unsigned long tva_flags, unsigned long orders)
> +{
> +	const unsigned long always = READ_ONCE(huge_anon_orders_always);
> +	const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> +	const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> +	const unsigned long never = ~(always | madvise | inherit);
> +	const bool inherit_never = !hugepage_global_enabled();
> +
> +	/* Disallow orders that are set to NEVER directly ... */
> +	orders &= ~never;
> +
> +	/* ... or through inheritance (global == NEVER). */
> +	if (inherit_never)
> +		orders &= ~inherit;
> +
> +	/*
> +	 * Otherwise, we only enforce sysfs settings if asked. In addition,
> +	 * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> +	 * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> +	 * set.
> +	 */
> +	if (!(tva_flags & TVA_ENFORCE_SYSFS))
> +		return orders;
> +
> +	/* We already excluded never inherit above. */
> +	if (vm_flags & VM_HUGEPAGE)
> +		return orders & (always | madvise | inherit);
> +
> +	if (hugepage_global_always())
> +		return orders & (always | inherit);
> +
> +	return orders & always;
> +}
> +
>   /**
>    * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
>    * @vma:  the vm area to check
> @@ -289,19 +325,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>   				       unsigned long orders)
>   {
>   	/* Optimization to check if required orders are enabled early. */
> -	if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> -		unsigned long mask = READ_ONCE(huge_anon_orders_always);
> -
> -		if (vm_flags & VM_HUGEPAGE)
> -			mask |= READ_ONCE(huge_anon_orders_madvise);
> -		if (hugepage_global_always() ||
> -		    ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> -			mask |= READ_ONCE(huge_anon_orders_inherit);
> -
> -		orders &= mask;
> -		if (!orders)
> -			return 0;
> -	}
> +	if (vma_is_anonymous(vma))
> +		orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
>   
>   	return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
>   }
> diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
> index 4341ce6b3b38..85bfff53dba6 100644
> --- a/tools/testing/selftests/mm/khugepaged.c
> +++ b/tools/testing/selftests/mm/khugepaged.c
> @@ -501,11 +501,7 @@ static void __madvise_collapse(const char *msg, char *p, int nr_hpages,
>   
>   	printf("%s...", msg);
>   
> -	/*
> -	 * Prevent khugepaged interference and tests that MADV_COLLAPSE
> -	 * ignores /sys/kernel/mm/transparent_hugepage/enabled
> -	 */
> -	settings.thp_enabled = THP_NEVER;
> +	settings.thp_enabled = THP_ALWAYS;
>   	settings.shmem_enabled = SHMEM_NEVER;
>   	thp_push_settings(&settings);
>   

* Re: [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
  2025-06-24  8:29       ` David Hildenbrand
@ 2025-06-24  9:20         ` Baolin Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Baolin Wang @ 2025-06-24  9:20 UTC (permalink / raw)
  To: David Hildenbrand, akpm, hughd
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts,
	dev.jain, baohua, linux-mm, linux-kernel



On 2025/6/24 16:29, David Hildenbrand wrote:
> On 24.06.25 03:48, Baolin Wang wrote:
>>
>>
>> On 2025/6/23 21:54, David Hildenbrand wrote:
>>>
>>>> diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
>>>> index 4341ce6b3b38..85bfff53dba6 100644
>>>> --- a/tools/testing/selftests/mm/khugepaged.c
>>>> +++ b/tools/testing/selftests/mm/khugepaged.c
>>>> @@ -501,11 +501,7 @@ static void __madvise_collapse(const char *msg, char *p, int nr_hpages,
>>>>        printf("%s...", msg);
>>>> -    /*
>>>> -     * Prevent khugepaged interference and tests that MADV_COLLAPSE
>>>> -     * ignores /sys/kernel/mm/transparent_hugepage/enabled
>>>> -     */
>>>> -    settings.thp_enabled = THP_NEVER;
>>>> +    settings.thp_enabled = THP_ALWAYS;
>>>
>>>
>>> Would MADVISE mode also work here? If we don't set MADV_HUGEPAGE, then
>>> khugepaged should be excluded, correct?
>>
>> I tried this, but some test cases failed. As I replied to Barry, it's
>> because some tests previously set MADV_NOHUGEPAGE, and now there is no
>> way to clear the MADV_NOHUGEPAGE flag except by setting MADV_HUGEPAGE.
> 
> Okay, can you add that detail to the patch description? 

Sure. Will do.

> I suspect we 
> really want a way to undo what MADV_HUGEPAGE/MADV_NOHUGEPAGE did (if 
> only naming weren't so complicated: MADV_DEFAULT_HUGEPAGE, hmmmm).

* Re: [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
  2025-06-24  8:41   ` Dev Jain
@ 2025-06-24  9:57     ` Baolin Wang
  2025-06-24 14:08       ` Baolin Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Baolin Wang @ 2025-06-24  9:57 UTC (permalink / raw)
  To: Dev Jain, akpm, hughd, david
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, baohua,
	linux-mm, linux-kernel



On 2025/6/24 16:41, Dev Jain wrote:
> 
> On 23/06/25 1:58 pm, Baolin Wang wrote:
>> When invoking thp_vma_allowable_orders(), if the TVA_ENFORCE_SYSFS flag is not
>> specified, we will ignore the THP sysfs settings. Whilst it makes sense for the
>> callers who do not specify this flag, it creates an odd and surprising situation
>> where a sysadmin who specifies 'never' for all THP sizes still observes THP pages
>> being allocated and used on the system.
>>
>> The motivating case for this is MADV_COLLAPSE. MADV_COLLAPSE ignores
>> the system-wide Anon THP sysfs settings, which means that even though we have
>> disabled the Anon THP configuration, MADV_COLLAPSE will still attempt to collapse
>> into an Anon THP. This violates the rule we have agreed upon: never means never.
>>
>> Currently, besides MADV_COLLAPSE not setting TVA_ENFORCE_SYSFS, there is only
>> one other instance where TVA_ENFORCE_SYSFS is not set, which is in the
>> collapse_pte_mapped_thp() function, but I believe this is reasonable from its
>> comments:
>>
>> "
>> /*
>>   * If we are here, we've succeeded in replacing all the native pages
>>   * in the page cache with a single hugepage. If a mm were to fault-in
>>   * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
>>   * and map it by a PMD, regardless of sysfs THP settings. As such, let's
>>   * analogously elide sysfs THP settings here.
>>   */
>> if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
> 
> So the behaviour now is: First check whether THP settings converge to never.
> Then, if enforce_sysfs is not set, return immediately. So in this khugepaged
> code, would it be better to call __thp_vma_allowable_orders()? If the sysfs
> settings are changed to never before hitting collapse_pte_mapped_thp(),
> then right now we will return SCAN_VMA_CHECK from here, whereas the comment
> says "regardless of sysfs THP settings", which should include "regardless
> of whether the sysfs settings say never".

Sounds reasonable to me. Thanks.

I will change thp_vma_allowable_order() to __thp_vma_allowable_orders() 
in the collapse_pte_mapped_thp() function to maintain consistency with 
the original logic.

Lorenzo and David, what do you think? Thanks.

* Re: [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
  2025-06-24  9:57     ` Baolin Wang
@ 2025-06-24 14:08       ` Baolin Wang
  2025-06-24 14:42         ` Dev Jain
  0 siblings, 1 reply; 19+ messages in thread
From: Baolin Wang @ 2025-06-24 14:08 UTC (permalink / raw)
  To: Dev Jain, akpm, hughd, david
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, baohua,
	linux-mm, linux-kernel



On 2025/6/24 17:57, Baolin Wang wrote:
> 
> 
> On 2025/6/24 16:41, Dev Jain wrote:
>>
>> On 23/06/25 1:58 pm, Baolin Wang wrote:
>>> When invoking thp_vma_allowable_orders(), if the TVA_ENFORCE_SYSFS flag is not
>>> specified, we will ignore the THP sysfs settings. Whilst it makes sense for the
>>> callers who do not specify this flag, it creates an odd and surprising situation
>>> where a sysadmin who specifies 'never' for all THP sizes still observes THP pages
>>> being allocated and used on the system.
>>>
>>> The motivating case for this is MADV_COLLAPSE. MADV_COLLAPSE ignores
>>> the system-wide Anon THP sysfs settings, which means that even though we have
>>> disabled the Anon THP configuration, MADV_COLLAPSE will still attempt to collapse
>>> into an Anon THP. This violates the rule we have agreed upon: never means never.
>>>
>>> Currently, besides MADV_COLLAPSE not setting TVA_ENFORCE_SYSFS, there is only
>>> one other instance where TVA_ENFORCE_SYSFS is not set, which is in the
>>> collapse_pte_mapped_thp() function, but I believe this is reasonable from its
>>> comments:
>>>
>>> "
>>> /*
>>>   * If we are here, we've succeeded in replacing all the native pages
>>>   * in the page cache with a single hugepage. If a mm were to fault-in
>>>   * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
>>>   * and map it by a PMD, regardless of sysfs THP settings. As such, let's
>>>   * analogously elide sysfs THP settings here.
>>>   */
>>> if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
>>
>> So the behaviour now is: First check whether THP settings converge to never.
>> Then, if enforce_sysfs is not set, return immediately. So in this khugepaged
>> code, would it be better to call __thp_vma_allowable_orders()? If the sysfs
>> settings are changed to never before hitting collapse_pte_mapped_thp(),
>> then right now we will return SCAN_VMA_CHECK from here, whereas the comment
>> says "regardless of sysfs THP settings", which should include "regardless
>> of whether the sysfs settings say never".
> 
> Sounds reasonable to me. Thanks.
> 
> I will change thp_vma_allowable_order() to __thp_vma_allowable_orders() 
> in the collapse_pte_mapped_thp() function to maintain consistency with 
> the original logic.
> 
> Lorenzo and David, what do you think? Thanks.

After thinking more, since collapse_pte_mapped_thp() is only used for 
file/shmem collapse, changing to __thp_vma_allowable_orders() has no 
effect. So I prefer to leave it as is.
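This conclusion follows from the shape of the patched wrapper: the sysfs mask is only applied when vma_is_anonymous() is true, so for the file/shmem VMAs that reach collapse_pte_mapped_thp(), thp_vma_allowable_orders() passes the same orders to __thp_vma_allowable_orders() that a direct call would. A hypothetical model of just that control flow, where the inner check is assumed to allow every order it is given and the anon sysfs result is passed in as a plain mask:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for __thp_vma_allowable_orders(): accepts every order. */
static unsigned long model_inner_allowable_orders(unsigned long orders)
{
	return orders;
}

/*
 * Shape of thp_vma_allowable_orders() after the patch: only anonymous VMAs
 * are filtered by the anon sysfs mask before the inner check runs.
 */
static unsigned long model_wrapper_allowable_orders(bool vma_is_anonymous,
						    unsigned long anon_sysfs_mask,
						    unsigned long orders)
{
	if (vma_is_anonymous)
		orders &= anon_sysfs_mask;
	return model_inner_allowable_orders(orders);
}
```

Even with every anonymous size set to 'never' (mask 0), a non-anonymous VMA gets the same answer from the wrapper as from the inner function, so switching collapse_pte_mapped_thp() to the double-underscore variant would not change behavior.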

* Re: [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
  2025-06-24 14:08       ` Baolin Wang
@ 2025-06-24 14:42         ` Dev Jain
  0 siblings, 0 replies; 19+ messages in thread
From: Dev Jain @ 2025-06-24 14:42 UTC (permalink / raw)
  To: Baolin Wang, akpm, hughd, david
  Cc: ziy, lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, baohua,
	linux-mm, linux-kernel


On 24/06/25 7:38 pm, Baolin Wang wrote:
>
>
> On 2025/6/24 17:57, Baolin Wang wrote:
>>
>>
>> On 2025/6/24 16:41, Dev Jain wrote:
>>>
>>> On 23/06/25 1:58 pm, Baolin Wang wrote:
>>>> When invoking thp_vma_allowable_orders(), the TVA_ENFORCE_SYSFS 
>>>> flag is not
>>>> specified, we will ignore the THP sysfs settings. Whilst it makes 
>>>> sense for the
>>>> callers who do not specify this flag, it creates a odd and 
>>>> surprising situation
>>>> where a sysadmin specifying 'never' for all THP sizes still 
>>>> observing THP pages
>>>> being allocated and used on the system.
>>>>
>>>> The motivating case for this is MADV_COLLAPSE. The MADV_COLLAPSE 
>>>> will ignore
>>>> the system-wide Anon THP sysfs settings, which means that even 
>>>> though we have
>>>> disabled the Anon THP configuration, MADV_COLLAPSE will still 
>>>> attempt to collapse
>>>> into a Anon THP. This violates the rule we have agreed upon: never 
>>>> means never.
>>>>
>>>> Currently, besides MADV_COLLAPSE not setting TVA_ENFORCE_SYSFS, 
>>>> there is only
>>>> one other instance where TVA_ENFORCE_SYSFS is not set, which is in the
>>>> collapse_pte_mapped_thp() function, but I believe this is 
>>>> reasonable from its
>>>> comments:
>>>>
>>>> "
>>>> /*
>>>>   * If we are here, we've succeeded in replacing all the native pages
>>>>   * in the page cache with a single hugepage. If a mm were to fault-in
>>>>   * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
>>>>   * and map it by a PMD, regardless of sysfs THP settings. As such, let's
>>>>   * analogously elide sysfs THP settings here.
>>>>   */
>>>> if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
>>>
>>> So the behaviour now is: first check whether the THP settings
>>> converge to never; then, if enforce_sysfs is not set, return
>>> immediately. So in this khugepaged code, would it be better to call
>>> __thp_vma_allowable_orders()? If the sysfs settings are changed to
>>> never before hitting collapse_pte_mapped_thp(), then right now we
>>> will return SCAN_VMA_CHECK from here, whereas the comment says
>>> "regardless of sysfs THP settings", which should include "regardless
>>> of whether the sysfs settings say never".
>>
>> Sounds reasonable to me. Thanks.
>>
>> I will change thp_vma_allowable_order() to
>> __thp_vma_allowable_orders() in the collapse_pte_mapped_thp()
>> function to maintain consistency with the original logic.
>>
>> Lorenzo and David, what do you think? Thanks.
>
> After thinking about it more: since collapse_pte_mapped_thp() is only
> used for file/shmem collapse, changing it to __thp_vma_allowable_orders()
> has no effect. So I prefer to leave it as is.


Oops, my bad, thanks.



Thread overview: 19+ messages
2025-06-23  8:28 [PATCH v3 0/2] fix MADV_COLLAPSE issue if THP settings are disabled Baolin Wang
2025-06-23  8:28 ` [PATCH v3 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs " Baolin Wang
2025-06-23 10:26   ` Lorenzo Stoakes
2025-06-24  1:45     ` Baolin Wang
2025-06-23 11:08   ` Barry Song
2025-06-24  1:44     ` Baolin Wang
2025-06-23 13:54   ` David Hildenbrand
2025-06-24  1:48     ` Baolin Wang
2025-06-24  8:29       ` David Hildenbrand
2025-06-24  9:20         ` Baolin Wang
2025-06-23 14:39   ` Zi Yan
2025-06-24  8:41   ` Dev Jain
2025-06-24  9:57     ` Baolin Wang
2025-06-24 14:08       ` Baolin Wang
2025-06-24 14:42         ` Dev Jain
2025-06-23  8:28 ` [PATCH v3 2/2] mm: shmem: disallow hugepages if the system-wide shmem " Baolin Wang
2025-06-23 10:45   ` Lorenzo Stoakes
2025-06-23 13:59   ` David Hildenbrand
2025-06-24  1:52     ` Baolin Wang
