* [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP settings are disabled
@ 2025-06-05 8:00 Baolin Wang
2025-06-05 8:00 ` [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs " Baolin Wang
` (3 more replies)
0 siblings, 4 replies; 49+ messages in thread
From: Baolin Wang @ 2025-06-05 8:00 UTC (permalink / raw)
To: akpm, hughd, david
Cc: lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, baolin.wang, linux-mm, linux-kernel
As we discussed in the previous thread [1], the MADV_COLLAPSE will ignore
the system-wide anon/shmem THP sysfs settings, which means that even though
we have disabled the anon/shmem THP configuration, MADV_COLLAPSE will still
attempt to collapse into a anon/shmem THP. This violates the rule we have
agreed upon: never means never. This patch set will address this issue.
[1] https://lore.kernel.org/all/1f00fdc3-a3a3-464b-8565-4c1b23d34f8d@linux.alibaba.com/
Changes from v1:
- Update the commit message, per Zi.
- Add Zi's reviewed tag. Thanks.
- Update the shmem logic.
Baolin Wang (2):
mm: huge_memory: disallow hugepages if the system-wide THP sysfs
settings are disabled
mm: shmem: disallow hugepages if the system-wide shmem THP sysfs
settings are disabled
include/linux/huge_mm.h | 23 +++++++++++++++++++----
mm/huge_memory.c | 2 +-
mm/shmem.c | 6 +++---
3 files changed, 23 insertions(+), 8 deletions(-)
--
2.43.5
^ permalink raw reply [flat|nested] 49+ messages in thread
* [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-05 8:00 [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP settings are disabled Baolin Wang
@ 2025-06-05 8:00 ` Baolin Wang
2025-06-06 16:49 ` Dev Jain
` (3 more replies)
2025-06-05 8:00 ` [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem " Baolin Wang
` (2 subsequent siblings)
3 siblings, 4 replies; 49+ messages in thread
From: Baolin Wang @ 2025-06-05 8:00 UTC (permalink / raw)
To: akpm, hughd, david
Cc: lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, baolin.wang, linux-mm, linux-kernel
The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
will still attempt to collapse into a Anon THP. This violates the rule we have
agreed upon: never means never.
Another rule for madvise, referring to David's suggestion: “allowing for collapsing
in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
To address this issue, should check whether the Anon THP configuration is disabled
in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
In summary, the current strategy is:
1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
(global THP settings are not enabled), it means mTHP of that orders are prohibited
from being used, then madvise_collapse() is forbidden for that orders.
2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
(global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
orders are still prohibited from being used, thus madvise_collapse() is not allowed
for that orders.
Reviewed-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
include/linux/huge_mm.h | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2f190c90192d..199ddc9f04a1 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long orders)
{
/* Optimization to check if required orders are enabled early. */
- if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
- unsigned long mask = READ_ONCE(huge_anon_orders_always);
+ if (vma_is_anonymous(vma)) {
+ unsigned long always = READ_ONCE(huge_anon_orders_always);
+ unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
+ unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
+ unsigned long mask = always | madvise;
+
+ /*
+ * If the system-wide THP/mTHP sysfs settings are disabled,
+ * then we should never allow hugepages.
+ */
+ if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
+ return 0;
+
+ if (!(tva_flags & TVA_ENFORCE_SYSFS))
+ goto skip;
+ mask = always;
if (vm_flags & VM_HUGEPAGE)
- mask |= READ_ONCE(huge_anon_orders_madvise);
+ mask |= madvise;
if (hugepage_global_always() ||
((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
- mask |= READ_ONCE(huge_anon_orders_inherit);
+ mask |= inherit;
orders &= mask;
if (!orders)
return 0;
}
+skip:
return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
}
--
2.43.5
^ permalink raw reply related [flat|nested] 49+ messages in thread
* [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
2025-06-05 8:00 [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP settings are disabled Baolin Wang
2025-06-05 8:00 ` [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs " Baolin Wang
@ 2025-06-05 8:00 ` Baolin Wang
2025-06-07 12:14 ` Lorenzo Stoakes
2025-06-07 12:28 ` [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP " Lorenzo Stoakes
2025-06-13 14:23 ` Usama Arif
3 siblings, 1 reply; 49+ messages in thread
From: Baolin Wang @ 2025-06-05 8:00 UTC (permalink / raw)
To: akpm, hughd, david
Cc: lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, baolin.wang, linux-mm, linux-kernel
The MADV_COLLAPSE will ignore the system-wide shmem THP sysfs settings, which
means that even though we have disabled the shmem THP configuration, MADV_COLLAPSE
will still attempt to collapse into a shmem THP. This violates the rule we have
agreed upon: never means never.
Another rule for madvise, referring to David's suggestion: “allowing for collapsing
in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
Then the current strategy is:
For shmem, if none of always, madvise, within_size, and inherit have enabled
PMD-sized THP, then MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
For tmpfs, if the mount option is set with the 'huge=never' parameter, then
MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
Acked-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/huge_memory.c | 2 +-
mm/shmem.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d3e66136e41a..a8cfa37cae72 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -166,7 +166,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
* own flags.
*/
if (!in_pf && shmem_file(vma->vm_file))
- return shmem_allowable_huge_orders(file_inode(vma->vm_file),
+ return orders & shmem_allowable_huge_orders(file_inode(vma->vm_file),
vma, vma->vm_pgoff, 0,
!enforce_sysfs);
diff --git a/mm/shmem.c b/mm/shmem.c
index 4b42419ce6b2..9af45d4e27e6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -625,7 +625,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
return 0;
if (shmem_huge == SHMEM_HUGE_DENY)
return 0;
- if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
+ if (shmem_huge == SHMEM_HUGE_FORCE)
return maybe_pmd_order;
/*
@@ -660,7 +660,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
fallthrough;
case SHMEM_HUGE_ADVISE:
- if (vm_flags & VM_HUGEPAGE)
+ if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
return maybe_pmd_order;
fallthrough;
default:
@@ -1790,7 +1790,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
/* Allow mTHP that will be fully within i_size. */
mask |= shmem_get_orders_within_size(inode, within_size_orders, index, 0);
- if (vm_flags & VM_HUGEPAGE)
+ if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
mask |= READ_ONCE(huge_shmem_orders_madvise);
if (global_orders > 0)
--
2.43.5
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-05 8:00 ` [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs " Baolin Wang
@ 2025-06-06 16:49 ` Dev Jain
2025-06-06 18:47 ` Dev Jain
2025-06-07 11:55 ` Lorenzo Stoakes
` (2 subsequent siblings)
3 siblings, 1 reply; 49+ messages in thread
From: Dev Jain @ 2025-06-06 16:49 UTC (permalink / raw)
To: Baolin Wang, akpm, hughd, david
Cc: lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, ziy,
linux-mm, linux-kernel
On 05/06/25 1:30 pm, Baolin Wang wrote:
> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
> will still attempt to collapse into a Anon THP. This violates the rule we have
> agreed upon: never means never.
>
> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>
> To address this issue, should check whether the Anon THP configuration is disabled
> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
Did you mean to say "even when the TVA_ENFORCE_SYSFS flag is *not* set"? Because if
is set, then we have to check the anon THP sysfs config.
>
> In summary, the current strategy is:
>
> 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
> (global THP settings are not enabled), it means mTHP of that orders are prohibited
> from being used, then madvise_collapse() is forbidden for that orders.
>
> 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
> (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
> orders are still prohibited from being used, thus madvise_collapse() is not allowed
> for that orders.
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> include/linux/huge_mm.h | 23 +++++++++++++++++++----
> 1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2f190c90192d..199ddc9f04a1 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> + if (vma_is_anonymous(vma)) {
> + unsigned long always = READ_ONCE(huge_anon_orders_always);
> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + unsigned long mask = always | madvise;
> +
> + /*
> + * If the system-wide THP/mTHP sysfs settings are disabled,
> + * then we should never allow hugepages.
> + */
> + if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
> + return 0;
> +
> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> + goto skip;
>
> + mask = always;
> if (vm_flags & VM_HUGEPAGE)
> - mask |= READ_ONCE(huge_anon_orders_madvise);
> + mask |= madvise;
> if (hugepage_global_always() ||
> ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> - mask |= READ_ONCE(huge_anon_orders_inherit);
> + mask |= inherit;
>
> orders &= mask;
> if (!orders)
> return 0;
> }
>
> +skip:
> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> }
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-06 16:49 ` Dev Jain
@ 2025-06-06 18:47 ` Dev Jain
2025-06-09 5:57 ` Baolin Wang
0 siblings, 1 reply; 49+ messages in thread
From: Dev Jain @ 2025-06-06 18:47 UTC (permalink / raw)
To: Baolin Wang, akpm, hughd, david
Cc: lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, ziy,
linux-mm, linux-kernel
On 06/06/25 10:19 pm, Dev Jain wrote:
>
> On 05/06/25 1:30 pm, Baolin Wang wrote:
>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs
>> settings, which
>> means that even though we have disabled the Anon THP configuration,
>> MADV_COLLAPSE
>> will still attempt to collapse into a Anon THP. This violates the
>> rule we have
>> agreed upon: never means never.
>>
>> Another rule for madvise, referring to David's suggestion: “allowing
>> for collapsing
>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>
>> To address this issue, should check whether the Anon THP
>> configuration is disabled
>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag
>> is set.
>
> Did you mean to say "even when the TVA_ENFORCE_SYSFS flag is *not*
> set"? Because if
> is set, then we have to check the anon THP sysfs config.
Otherwise the patch itself looks good to me, so:
Reviewed-by: Dev Jain <dev.jain@arm.com>
>
>>
>> In summary, the current strategy is:
>>
>> 1. If always & orders == 0, and madvise & orders == 0, and
>> hugepage_global_enabled() == false
>> (global THP settings are not enabled), it means mTHP of that orders
>> are prohibited
>> from being used, then madvise_collapse() is forbidden for that orders.
>>
>> 2. If always & orders == 0, and madvise & orders == 0, and
>> hugepage_global_enabled() == true
>> (global THP settings are enabled), and inherit & orders == 0, it
>> means mTHP of that
>> orders are still prohibited from being used, thus madvise_collapse()
>> is not allowed
>> for that orders.
>>
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 2f190c90192d..199ddc9f04a1 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>> vm_area_struct *vma,
>> unsigned long orders)
>> {
>> /* Optimization to check if required orders are enabled early. */
>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>> + if (vma_is_anonymous(vma)) {
>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>> + unsigned long mask = always | madvise;
>> +
>> + /*
>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>> + * then we should never allow hugepages.
>> + */
>> + if (!(mask & orders) && !(hugepage_global_enabled() &&
>> (inherit & orders)))
>> + return 0;
>> +
>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>> + goto skip;
>> + mask = always;
>> if (vm_flags & VM_HUGEPAGE)
>> - mask |= READ_ONCE(huge_anon_orders_madvise);
>> + mask |= madvise;
>> if (hugepage_global_always() ||
>> ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
>> - mask |= READ_ONCE(huge_anon_orders_inherit);
>> + mask |= inherit;
>> orders &= mask;
>> if (!orders)
>> return 0;
>> }
>> +skip:
>> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags,
>> orders);
>> }
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-05 8:00 ` [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs " Baolin Wang
2025-06-06 16:49 ` Dev Jain
@ 2025-06-07 11:55 ` Lorenzo Stoakes
2025-06-07 12:21 ` Lorenzo Stoakes
2025-06-09 6:10 ` Baolin Wang
2025-06-08 18:37 ` Nico Pache
2025-06-11 12:34 ` David Hildenbrand
3 siblings, 2 replies; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-07 11:55 UTC (permalink / raw)
To: Baolin Wang
Cc: akpm, hughd, david, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
Not related to your patch at all, but man this whole thing (thp allowed orders)
needs significant improvement, it seems always perversely complicated for a
relatively simple operation.
Overall I LOVE what you're doing here, but I feel we can clarify things a
little while we're at it to make it clear exactly what we're doing.
This is a very important change so forgive my fiddling about here but I'm
hoping we can take the opportunity to make things a little simpler!
On Thu, Jun 05, 2025 at 04:00:58PM +0800, Baolin Wang wrote:
> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
> will still attempt to collapse into a Anon THP. This violates the rule we have
> agreed upon: never means never.
>
> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
I'm generally not sure it's worth talking only about MADV_COLLAPSE here when
you're changing what THP is permitted across the board, I may have missed some
discussion and forgive me if so, but what is special about MADV_COLLAPSE's use
of thp_vma_allowable_orders() that makes it ignore 'never's moreso than other
users?
>
> To address this issue, should check whether the Anon THP configuration is disabled
> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
>
> In summary, the current strategy is:
>
> 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
> (global THP settings are not enabled), it means mTHP of that orders are prohibited
> from being used, then madvise_collapse() is forbidden for that orders.
>
> 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
> (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
> orders are still prohibited from being used, thus madvise_collapse() is not allowed
> for that orders.
OK so it's already confusing that the global settings only impact 'inherit'
settings below, so they're not really global at all, but rather perhaps should
be called 'inherited'.
Maybe I need to submit a patch to rename thp_inherited_enabled(), or perhaps
that'd just add to the confusion :P
OK this is also not your fault just general commentary.
Anyway, I feel points 1 and 2 can more succinctly be summed up as below,
also there's no need to refer to the code, it's actually clearer I think to
refer to the underlying logic:
If no hugepage modes are enabled for the desired orders, nor can we
enable them by inheriting from a 'global' enabled setting - then it
must be the case that all desired orders either specify or inherit
'NEVER' - and we must abort.
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> include/linux/huge_mm.h | 23 +++++++++++++++++++----
> 1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2f190c90192d..199ddc9f04a1 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> + if (vma_is_anonymous(vma)) {
> + unsigned long always = READ_ONCE(huge_anon_orders_always);
> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + unsigned long mask = always | madvise;
> +
> + /*
> + * If the system-wide THP/mTHP sysfs settings are disabled,
> + * then we should never allow hugepages.
> + */
> + if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
> + return 0;
> +
> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> + goto skip;
>
> + mask = always;
> if (vm_flags & VM_HUGEPAGE)
> - mask |= READ_ONCE(huge_anon_orders_madvise);
> + mask |= madvise;
> if (hugepage_global_always() ||
> ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> - mask |= READ_ONCE(huge_anon_orders_inherit);
> + mask |= inherit;
>
> orders &= mask;
> if (!orders)
> return 0;
> }
>
> +skip:
> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> }
I feel this is compressing a lot of logic in a way that took me several
readings to understand (hey I might not be the smartest cookie in the jar,
but we need to account for all levels of kernel developer ;)
I feel like we can make things a lot clearer here by separating out with a
helper function (means we can drop some indentation too), and also take
advantage of the fact that, if orders == 0, __thp_vma_allowable_orders()
exits with 0 early so no need for us to do so ourselves:
/* Strictly mask requested anonymous orders according to sysfs settings. */
static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
unsigned long tva_flags, unsigned long orders)
{
unsigned long always = READ_ONCE(huge_anon_orders_always);
unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
bool inherit_enabled = hugepage_global_enabled();
bool has_madvise = vm_flags & VM_HUGEPAGE;
unsigned long mask = always | madvise;
mask = always | madvise;
if (inherit_enabled)
mask |= inherit;
/* All set to/inherit NEVER - never means never globally, abort. */
if (!(mask & orders))
return 0;
/* Otherwise, we only enforce sysfs settings if asked. */
if (!(tva_flags & TVA_ENFORCE_SYSFS))
return orders;
mask = always;
if (has_madvise)
mask |= madvise;
if (hugepage_global_always() || (has_madvise && inherit_enabled))
mask |= inherit;
return orders & mask;
}
...
static inline
unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long vm_flags,
unsigned long tva_flags,
unsigned long orders)
{
if (vma_is_anonymous(vma))
orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
}
>
> --
> 2.43.5
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
2025-06-05 8:00 ` [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem " Baolin Wang
@ 2025-06-07 12:14 ` Lorenzo Stoakes
2025-06-07 12:17 ` Lorenzo Stoakes
2025-06-09 6:31 ` Baolin Wang
0 siblings, 2 replies; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-07 12:14 UTC (permalink / raw)
To: Baolin Wang
Cc: akpm, hughd, david, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On Thu, Jun 05, 2025 at 04:00:59PM +0800, Baolin Wang wrote:
> The MADV_COLLAPSE will ignore the system-wide shmem THP sysfs settings, which
> means that even though we have disabled the shmem THP configuration, MADV_COLLAPSE
> will still attempt to collapse into a shmem THP. This violates the rule we have
> agreed upon: never means never.
Ugh that we have separate shmem logic. And split between huge_memory.c and
shmem.c too :)) Again, not your fault, just a general moan about existing
stuff :P
>
> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
Hm I'm not sure if this is enforced is it? I may have missed something here
however.
>
> Then the current strategy is:
>
> For shmem, if none of always, madvise, within_size, and inherit have enabled
> PMD-sized THP, then MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
Again, is this just MADV_COLLAPSE? Surely this is a general change?
We should be clear that we are not explicitly limiting ourselves to
MADV_COLLAPSE here.
You shoudl clearly indicate that the MADV_COLLAPSE case DOESN'T set
TVA_ENFORCE_SYSFS and that's the key difference here.
>
> For tmpfs, if the mount option is set with the 'huge=never' parameter, then
> MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
>
> Acked-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> mm/huge_memory.c | 2 +-
> mm/shmem.c | 6 +++---
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d3e66136e41a..a8cfa37cae72 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -166,7 +166,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> * own flags.
> */
> if (!in_pf && shmem_file(vma->vm_file))
> - return shmem_allowable_huge_orders(file_inode(vma->vm_file),
> + return orders & shmem_allowable_huge_orders(file_inode(vma->vm_file),
> vma, vma->vm_pgoff, 0,
> !enforce_sysfs);
Did you mean to do &&?
Also, is this achieving what you want to achieve? Is it necessary? The
changes in patch 1/2 enforce the global settings and before this code in
__thp_vma_allowable_orders():
unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long vm_flags,
unsigned long tva_flags,
unsigned long orders)
{
... (no early exits) ...
orders &= supported_orders;
if (!orders)
return 0;
...
}
So if orders == 0 due to the changes in thp_vma_allowable_orders(), which
is the only caller of __thp_vma_allowable_orders() then we _always_ exit
early here before we get to this shmem_allowable_huge_orders() code.
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 4b42419ce6b2..9af45d4e27e6 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -625,7 +625,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
> return 0;
> if (shmem_huge == SHMEM_HUGE_DENY)
> return 0;
> - if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
> + if (shmem_huge == SHMEM_HUGE_FORCE)
> return maybe_pmd_order;
>
> /*
> @@ -660,7 +660,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
>
> fallthrough;
> case SHMEM_HUGE_ADVISE:
> - if (vm_flags & VM_HUGEPAGE)
> + if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
> return maybe_pmd_order;
> fallthrough;
> default:
> @@ -1790,7 +1790,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
> /* Allow mTHP that will be fully within i_size. */
> mask |= shmem_get_orders_within_size(inode, within_size_orders, index, 0);
>
> - if (vm_flags & VM_HUGEPAGE)
> + if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
> mask |= READ_ONCE(huge_shmem_orders_madvise);
I'm also not sure these are necessary:
The only path that can set shmem_huge_force is __thp_vma_allowable_orders()
-> shmem_allowable_huge_orders() -> shmem_huge_global_enabled() and then
only if !(tva_flags & TVA_ENFORCE_SYSFS) and as stated above we already
cover off this case by early exiting __thp_vma_allowable_orders() if orders
== 0 as established in patch 1/2.
>
> if (global_orders > 0)
> --
> 2.43.5
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
2025-06-07 12:14 ` Lorenzo Stoakes
@ 2025-06-07 12:17 ` Lorenzo Stoakes
2025-06-09 6:34 ` Baolin Wang
2025-06-09 6:31 ` Baolin Wang
1 sibling, 1 reply; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-07 12:17 UTC (permalink / raw)
To: Baolin Wang
Cc: akpm, hughd, david, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On Sat, Jun 07, 2025 at 01:14:41PM +0100, Lorenzo Stoakes wrote:
> On Thu, Jun 05, 2025 at 04:00:59PM +0800, Baolin Wang wrote:
[snip]
> >
> > Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> > in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>
> Hm I'm not sure if this is enforced is it? I may have missed something here
> however.
Oh right actually I think it is implicitly - if TVA_ENFORCE_SYSFS is not
specified in tva_flags, then we don't bother applying an madvise filter at all
anyway, and we account for that in our 'enabled' check in
thp_vma_allowable_orders().
But I don't think this patch changes anything, I actually _think_ we can just
drop this one.
[snip]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-07 11:55 ` Lorenzo Stoakes
@ 2025-06-07 12:21 ` Lorenzo Stoakes
2025-06-09 6:18 ` Baolin Wang
2025-06-09 6:10 ` Baolin Wang
1 sibling, 1 reply; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-07 12:21 UTC (permalink / raw)
To: Baolin Wang
Cc: akpm, hughd, david, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
A couple follow up points that occurred to me:
On Sat, Jun 07, 2025 at 12:55:19PM +0100, Lorenzo Stoakes wrote:
> Not related to your patch at all, but man this whole thing (thp allowed orders)
> needs significant improvement, it seems always perversely complicated for a
> relatively simple operation.
>
> Overall I LOVE what you're doing here, but I feel we can clarify things a
> little while we're at it to make it clear exactly what we're doing.
>
> This is a very important change so forgive my fiddling about here but I'm
> hoping we can take the opportunity to make things a little simpler!
>
> On Thu, Jun 05, 2025 at 04:00:58PM +0800, Baolin Wang wrote:
> > The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
> > means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
> > will still attempt to collapse into a Anon THP. This violates the rule we have
> > agreed upon: never means never.
> >
> > Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> > in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>
> I'm generally not sure it's worth talking only about MADV_COLLAPSE here when
> you're changing what THP is permitted across the board, I may have missed some
> discussion and forgive me if so, but what is special about MADV_COLLAPSE's use
> of thp_vma_allowable_orders() that makes it ignore 'never's moreso than other
> users?
I'd mention that MADV_COLLAPSE is special because of not specifying
TVA_ENFORCE_SYSFS but you are making this change across the board for all
callers who do not specify this.
I'd also CLEARLY mention that you handle David's request re: madvise by
restricting yourself to checking only for NEVER and retaining the existing logic
of not enforcing sysfs settings when TVA_ENFORCE_SYSFS, which includes not
checking the VMA for VM_HUGEPAGE if the madvise mode is enabled.
(i.e. addressing David's request).
[snip]
> I feel this is compressing a lot of logic in a way that took me several
> readings to understand (hey I might not be the smartest cookie in the jar,
> but we need to account for all levels of kernel developer ;)
>
> I feel like we can make things a lot clearer here by separating out with a
> helper function (means we can drop some indentation too), and also take
> advantage of the fact that, if orders == 0, __thp_vma_allowable_orders()
> exits with 0 early so no need for us to do so ourselves:
>
> /* Strictly mask requested anonymous orders according to sysfs settings. */
> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> unsigned long tva_flags, unsigned long orders)
> {
> unsigned long always = READ_ONCE(huge_anon_orders_always);
> unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> bool inherit_enabled = hugepage_global_enabled();
> bool has_madvise = vm_flags & VM_HUGEPAGE;
> unsigned long mask = always | madvise;
>
> mask = always | madvise;
> if (inherit_enabled)
> mask |= inherit;
>
> /* All set to/inherit NEVER - never means never globally, abort. */
> if (!(mask & orders))
> return 0;
>
> /* Otherwise, we only enforce sysfs settings if asked. */
Perhaps worth adding a comment here noting that, if the user sets a sysfs mode
of madvise and if TVA_ENFORCE_SYSFS is not set, we don't bother checking whether
the VMA has VM_HUGEPAGE set.
> if (!(tva_flags & TVA_ENFORCE_SYSFS))
> return orders;
>
> mask = always;
> if (has_madvise)
> mask |= madvise;
> if (hugepage_global_always() || (has_madvise && inherit_enabled))
> mask |= inherit;
>
> return orders & mask;
> }
>
> ...
>
> static inline
> unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long vm_flags,
> unsigned long tva_flags,
> unsigned long orders)
> {
> if (vma_is_anonymous(vma))
> orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
>
> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> }
>
> >
> > --
> > 2.43.5
> >
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP settings are disabled
2025-06-05 8:00 [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP settings are disabled Baolin Wang
2025-06-05 8:00 ` [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs " Baolin Wang
2025-06-05 8:00 ` [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem " Baolin Wang
@ 2025-06-07 12:28 ` Lorenzo Stoakes
2025-06-11 7:05 ` Baolin Wang
2025-06-13 14:23 ` Usama Arif
3 siblings, 1 reply; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-07 12:28 UTC (permalink / raw)
To: Baolin Wang
Cc: akpm, hughd, david, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
Before I get into technical criticism, to be clear - thank you very much
for doing this :) I'm just getting into details as to the implementation,
but am a fan of this change and consider it important.
On Thu, Jun 05, 2025 at 04:00:57PM +0800, Baolin Wang wrote:
> As we discussed in the previous thread [1], the MADV_COLLAPSE will ignore
> the system-wide anon/shmem THP sysfs settings, which means that even though
> we have disabled the anon/shmem THP configuration, MADV_COLLAPSE will still
> attempt to collapse into a anon/shmem THP. This violates the rule we have
> agreed upon: never means never. This patch set will address this issue.
Hm this cover letter could be expanded upon quite a bit - you are doing a
lot here and it's not only MADV_COLLAPSE, more a general change.
I'd mention that, even when TVA_ENFORCE_SYSFS is not set, callers checking
THP order validity will not be able to specify THP orders that are either
specifically marked as 'never' or set to 'inherit' and the global hugepage
mode is 'never'.
Then say something like 'importantly, this changes alters the madvise(...,
MADV_COLLAPSE) call, which previously would collapse ranges into huge pages
even if THP was set to never. This corrects this behaviour'.
I suspect you are unable to write sensible tests here given the need to
manipulate sysfs (though perhaps worth quickly looking at
tools/testing/selftests/mm/khugepaged.c, transhuge-stress.c, run_vmtests.sh
to see), but it'd be at least useful for you to give details here of how
you have tested this and ensured it functions correctly.
It might also be worth giving a quick justification, i.e. 'system
administrators who disabled THP everywhere must indeed very much not want
THP to be used for whatever reason - having individual programs being able
to quietly override this is very surprising and likely to cause headaches
for those who desire this not to happen on their systems'.
>
> [1] https://lore.kernel.org/all/1f00fdc3-a3a3-464b-8565-4c1b23d34f8d@linux.alibaba.com/
>
> Changes from v1:
> - Update the commit message, per Zi.
> - Add Zi's reviewed tag. Thanks.
> - Update the shmem logic.
>
> Baolin Wang (2):
> mm: huge_memory: disallow hugepages if the system-wide THP sysfs
> settings are disabled
> mm: shmem: disallow hugepages if the system-wide shmem THP sysfs
> settings are disabled
>
> include/linux/huge_mm.h | 23 +++++++++++++++++++----
> mm/huge_memory.c | 2 +-
> mm/shmem.c | 6 +++---
> 3 files changed, 23 insertions(+), 8 deletions(-)
>
> --
> 2.43.5
>
Thanks, Lorenzo
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-05 8:00 ` [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs " Baolin Wang
2025-06-06 16:49 ` Dev Jain
2025-06-07 11:55 ` Lorenzo Stoakes
@ 2025-06-08 18:37 ` Nico Pache
2025-06-09 6:36 ` Baolin Wang
2025-06-11 12:34 ` David Hildenbrand
3 siblings, 1 reply; 49+ messages in thread
From: Nico Pache @ 2025-06-08 18:37 UTC (permalink / raw)
To: Baolin Wang
Cc: akpm, hughd, david, lorenzo.stoakes, Liam.Howlett, ryan.roberts,
dev.jain, ziy, linux-mm, linux-kernel
On Thu, Jun 5, 2025 at 2:01 AM Baolin Wang
<baolin.wang@linux.alibaba.com> wrote:
>
> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
> will still attempt to collapse into a Anon THP. This violates the rule we have
> agreed upon: never means never.
>
> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>
> To address this issue, should check whether the Anon THP configuration is disabled
> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
>
> In summary, the current strategy is:
>
> 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
> (global THP settings are not enabled), it means mTHP of that orders are prohibited
> from being used, then madvise_collapse() is forbidden for that orders.
>
> 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
> (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
> orders are still prohibited from being used, thus madvise_collapse() is not allowed
> for that orders.
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> include/linux/huge_mm.h | 23 +++++++++++++++++++----
> 1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2f190c90192d..199ddc9f04a1 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> + if (vma_is_anonymous(vma)) {
> + unsigned long always = READ_ONCE(huge_anon_orders_always);
> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + unsigned long mask = always | madvise;
> +
> + /*
> + * If the system-wide THP/mTHP sysfs settings are disabled,
> + * then we should never allow hugepages.
> + */
> + if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
> + return 0;
Hi Baolin,
Thanks for making this change so we enforce the 'never' being never
rule. As others have noted, this statement is hard to read-- is there
any improvements we can make to the readability as Lorenzo suggested?
Either way, I've tested this and the functionality works as intended.
Tested-by: Nico Pache <npache@redhat.com>
-- Nico
> +
> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> + goto skip;
>
> + mask = always;
> if (vm_flags & VM_HUGEPAGE)
> - mask |= READ_ONCE(huge_anon_orders_madvise);
> + mask |= madvise;
> if (hugepage_global_always() ||
> ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> - mask |= READ_ONCE(huge_anon_orders_inherit);
> + mask |= inherit;
>
> orders &= mask;
> if (!orders)
> return 0;
> }
>
> +skip:
> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> }
>
> --
> 2.43.5
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-06 18:47 ` Dev Jain
@ 2025-06-09 5:57 ` Baolin Wang
0 siblings, 0 replies; 49+ messages in thread
From: Baolin Wang @ 2025-06-09 5:57 UTC (permalink / raw)
To: Dev Jain, akpm, hughd, david
Cc: lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, ziy,
linux-mm, linux-kernel
On 2025/6/7 02:47, Dev Jain wrote:
>
> On 06/06/25 10:19 pm, Dev Jain wrote:
>>
>> On 05/06/25 1:30 pm, Baolin Wang wrote:
>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs
>>> settings, which
>>> means that even though we have disabled the Anon THP configuration,
>>> MADV_COLLAPSE
>>> will still attempt to collapse into a Anon THP. This violates the
>>> rule we have
>>> agreed upon: never means never.
>>>
>>> Another rule for madvise, referring to David's suggestion: “allowing
>>> for collapsing
>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>
>>> To address this issue, should check whether the Anon THP
>>> configuration is disabled
>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag
>>> is set.
>>
>> Did you mean to say "even when the TVA_ENFORCE_SYSFS flag is *not*
>> set"? Because if
>> is set, then we have to check the anon THP sysfs config.
Sorry for the confusion. What I mean is that we should also check
whether the Anon THP is disabled when the TVA_ENFORCE_SYSFS flag is set.
Will update the commit message.
> Otherwise the patch itself looks good to me, so:
>
> Reviewed-by: Dev Jain <dev.jain@arm.com>
Thanks for reviewing.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-07 11:55 ` Lorenzo Stoakes
2025-06-07 12:21 ` Lorenzo Stoakes
@ 2025-06-09 6:10 ` Baolin Wang
2025-06-09 15:17 ` Lorenzo Stoakes
1 sibling, 1 reply; 49+ messages in thread
From: Baolin Wang @ 2025-06-09 6:10 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: akpm, hughd, david, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On 2025/6/7 19:55, Lorenzo Stoakes wrote:
> Not related to your patch at all, but man this whole thing (thp allowed orders)
> needs significant improvement, it seems always perversely complicated for a
> relatively simple operation.
>
> Overall I LOVE what you're doing here, but I feel we can clarify things a
> little while we're at it to make it clear exactly what we're doing.
>
> This is a very important change so forgive my fiddling about here but I'm
> hoping we can take the opportunity to make things a little simpler!
>
> On Thu, Jun 05, 2025 at 04:00:58PM +0800, Baolin Wang wrote:
>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
>> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
>> will still attempt to collapse into a Anon THP. This violates the rule we have
>> agreed upon: never means never.
>>
>> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>
> I'm generally not sure it's worth talking only about MADV_COLLAPSE here when
> you're changing what THP is permitted across the board, I may have missed some
> discussion and forgive me if so, but what is special about MADV_COLLAPSE's use
> of thp_vma_allowable_orders() that makes it ignore 'never's moreso than other
> users?
We found that MADV_COLLAPSE ignores the THP configuration, meaning that
even when THP is set to 'never', MADV_COLLAPSE can still collapse into
THPs (and mTHPs in the future). This is because when MADV_COLLAPSE calls
thp_vma_allowable_orders(), it does not set the TVA_ENFORCE_SYSFS flag,
which means it ignores the system-wide Anon THP sysfs settings.
So this patch set is aimed to fix the THP policy for MADV_COLLAPSE.
>> To address this issue, should check whether the Anon THP configuration is disabled
>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
>>
>> In summary, the current strategy is:
>>
>> 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
>> (global THP settings are not enabled), it means mTHP of that orders are prohibited
>> from being used, then madvise_collapse() is forbidden for that orders.
>>
>> 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
>> (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
>> orders are still prohibited from being used, thus madvise_collapse() is not allowed
>> for that orders.
>
> OK so it's already confusing that the global settings only impact 'inherit'
> settings below, so they're not really global at all, but rather perhaps should
> be called 'inherited'.
>
> Maybe I need to submit a patch to rename thp_inherited_enabled(), or perhaps
> that'd just add to the confusion :P
>
> OK this is also not your fault just general commentary.
>
> Anyway, I feel points 1 and 2 can more succinctly be summed up as below,
> also there's no need to refer to the code, it's actually clearer I think to
> refer to the underlying logic:
>
> If no hugepage modes are enabled for the desired orders, nor can we
> enable them by inheriting from a 'global' enabled setting - then it
> must be the case that all desired orders either specify or inherit
> 'NEVER' - and we must abort.
OK. Thanks for helping me make it simpler:)
>>
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 2f190c90192d..199ddc9f04a1 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>> unsigned long orders)
>> {
>> /* Optimization to check if required orders are enabled early. */
>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>> + if (vma_is_anonymous(vma)) {
>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>> + unsigned long mask = always | madvise;
>> +
>> + /*
>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>> + * then we should never allow hugepages.
>> + */
>> + if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
>> + return 0;
>> +
>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>> + goto skip;
>>
>> + mask = always;
>> if (vm_flags & VM_HUGEPAGE)
>> - mask |= READ_ONCE(huge_anon_orders_madvise);
>> + mask |= madvise;
>> if (hugepage_global_always() ||
>> ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
>> - mask |= READ_ONCE(huge_anon_orders_inherit);
>> + mask |= inherit;
>>
>> orders &= mask;
>> if (!orders)
>> return 0;
>> }
>>
>> +skip:
>> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
>> }
>
> I feel this is compressing a lot of logic in a way that took me several
> readings to understand (hey I might not be the smartest cookie in the jar,
> but we need to account for all levels of kernel developer ;)
>
> I feel like we can make things a lot clearer here by separating out with a
> helper function (means we can drop some indentation too), and also take
> advantage of the fact that, if orders == 0, __thp_vma_allowable_orders()
> exits with 0 early so no need for us to do so ourselves:
Sure. Looks good to me. Thanks.
> /* Strictly mask requested anonymous orders according to sysfs settings. */
> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> unsigned long tva_flags, unsigned long orders)
> {
> unsigned long always = READ_ONCE(huge_anon_orders_always);
> unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> bool inherit_enabled = hugepage_global_enabled();
> bool has_madvise = vm_flags & VM_HUGEPAGE;
> unsigned long mask = always | madvise;
>
> mask = always | madvise;
> if (inherit_enabled)
> mask |= inherit;
>
> /* All set to/inherit NEVER - never means never globally, abort. */
> if (!(mask & orders))
> return 0;
>
> /* Otherwise, we only enforce sysfs settings if asked. */
> if (!(tva_flags & TVA_ENFORCE_SYSFS))
> return orders;
>
> mask = always;
> if (has_madvise)
> mask |= madvise;
> if (hugepage_global_always() || (has_madvise && inherit_enabled))
> mask |= inherit;
>
> return orders & mask;
> }
>
> ...
>
> static inline
> unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long vm_flags,
> unsigned long tva_flags,
> unsigned long orders)
> {
> if (vma_is_anonymous(vma))
> orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
>
> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> }
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-07 12:21 ` Lorenzo Stoakes
@ 2025-06-09 6:18 ` Baolin Wang
2025-06-09 15:12 ` Lorenzo Stoakes
0 siblings, 1 reply; 49+ messages in thread
From: Baolin Wang @ 2025-06-09 6:18 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: akpm, hughd, david, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On 2025/6/7 20:21, Lorenzo Stoakes wrote:
> A couple follow up points that occurred to me:
>
> On Sat, Jun 07, 2025 at 12:55:19PM +0100, Lorenzo Stoakes wrote:
>> Not related to your patch at all, but man this whole thing (thp allowed orders)
>> needs significant improvement, it seems always perversely complicated for a
>> relatively simple operation.
>>
>> Overall I LOVE what you're doing here, but I feel we can clarify things a
>> little while we're at it to make it clear exactly what we're doing.
>>
>> This is a very important change so forgive my fiddling about here but I'm
>> hoping we can take the opportunity to make things a little simpler!
>>
>> On Thu, Jun 05, 2025 at 04:00:58PM +0800, Baolin Wang wrote:
>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
>>> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
>>> will still attempt to collapse into a Anon THP. This violates the rule we have
>>> agreed upon: never means never.
>>>
>>> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>
>> I'm generally not sure it's worth talking only about MADV_COLLAPSE here when
>> you're changing what THP is permitted across the board, I may have missed some
>> discussion and forgive me if so, but what is special about MADV_COLLAPSE's use
>> of thp_vma_allowable_orders() that makes it ignore 'never's moreso than other
>> users?
>
> I'd mention that MADV_COLLAPSE is special because of not specifying
> TVA_ENFORCE_SYSFS but you are making this change across the board for all
> callers who do not specify this.
Currently, besides MADV_COLLAPSE not setting TVA_ENFORCE_SYSFS, there is
only one other instance where TVA_ENFORCE_SYSFS is not set, which is in
the collapse_pte_mapped_thp() function, but I believe this is reasonable
from the comments:
/*
* If we are here, we've succeeded in replacing all the native
pages
* in the page cache with a single hugepage. If a mm were to
fault-in
* this memory (mapped by a suitably aligned VMA), we'd get the
hugepage
* and map it by a PMD, regardless of sysfs THP settings. As
such, let's
* analogously elide sysfs THP settings here.
*/
if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
return SCAN_VMA_CHECK;
>
> I'd also CLEARLY mention that you handle David's request re: madvise by
> restricting yourself to checking only for NEVER and retaining the existing logic
> of not enforcing sysfs settings when TVA_ENFORCE_SYSFS, which includes not
> checking the VMA for VM_HUGEPAGE if the madvise mode is enabled.
Sure.
>
> (i.e. addressing David's request).
>
> [snip]
>
>> I feel this is compressing a lot of logic in a way that took me several
>> readings to understand (hey I might not be the smartest cookie in the jar,
>> but we need to account for all levels of kernel developer ;)
>>
>> I feel like we can make things a lot clearer here by separating out with a
>> helper function (means we can drop some indentation too), and also take
>> advantage of the fact that, if orders == 0, __thp_vma_allowable_orders()
>> exits with 0 early so no need for us to do so ourselves:
>>
>> /* Strictly mask requested anonymous orders according to sysfs settings. */
>> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
>> unsigned long tva_flags, unsigned long orders)
>> {
>> unsigned long always = READ_ONCE(huge_anon_orders_always);
>> unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
>> bool inherit_enabled = hugepage_global_enabled();
>> bool has_madvise = vm_flags & VM_HUGEPAGE;
>> unsigned long mask = always | madvise;
>>
>> mask = always | madvise;
>> if (inherit_enabled)
>> mask |= inherit;
>>
>> /* All set to/inherit NEVER - never means never globally, abort. */
>> if (!(mask & orders))
>> return 0;
>>
>> /* Otherwise, we only enforce sysfs settings if asked. */
>
> Perhaps worth adding a comment here noting that, if the user sets a sysfs mode
> of madvise and if TVA_ENFORCE_SYSFS is not set, we don't bother checking whether
> the VMA has VM_HUGEPAGE set.
Sure, will do. Thanks for reviewing.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
2025-06-07 12:14 ` Lorenzo Stoakes
2025-06-07 12:17 ` Lorenzo Stoakes
@ 2025-06-09 6:31 ` Baolin Wang
2025-06-09 15:33 ` Lorenzo Stoakes
1 sibling, 1 reply; 49+ messages in thread
From: Baolin Wang @ 2025-06-09 6:31 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: akpm, hughd, david, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On 2025/6/7 20:14, Lorenzo Stoakes wrote:
> On Thu, Jun 05, 2025 at 04:00:59PM +0800, Baolin Wang wrote:
>> The MADV_COLLAPSE will ignore the system-wide shmem THP sysfs settings, which
>> means that even though we have disabled the shmem THP configuration, MADV_COLLAPSE
>> will still attempt to collapse into a shmem THP. This violates the rule we have
>> agreed upon: never means never.
>
> Ugh that we have separate shmem logic. And split between huge_memory.c and
> shmem.c too :)) Again, not your fault, just a general moan about existing
> stuff :P
>
>>
>> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>
> Hm I'm not sure if this is enforced is it? I may have missed something here
> however.
>
>>
>> Then the current strategy is:
>>
>> For shmem, if none of always, madvise, within_size, and inherit have enabled
>> PMD-sized THP, then MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
>
> Again, is this just MADV_COLLAPSE? Surely this is a general change?
>
> We should be clear that we are not explicitly limiting ourselves to
> MADV_COLLAPSE here.
>
> You shoudl clearly indicate that the MADV_COLLAPSE case DOESN'T set
> TVA_ENFORCE_SYSFS and that's the key difference here.
Sure.
>> For tmpfs, if the mount option is set with the 'huge=never' parameter, then
>> MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
>>
>> Acked-by: Zi Yan <ziy@nvidia.com>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> mm/huge_memory.c | 2 +-
>> mm/shmem.c | 6 +++---
>> 2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index d3e66136e41a..a8cfa37cae72 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -166,7 +166,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>> * own flags.
>> */
>> if (!in_pf && shmem_file(vma->vm_file))
>> - return shmem_allowable_huge_orders(file_inode(vma->vm_file),
>> + return orders & shmem_allowable_huge_orders(file_inode(vma->vm_file),
>> vma, vma->vm_pgoff, 0,
>> !enforce_sysfs);
>
> Did you mean to do &&?
No. It might be worth having a separate fix patch here, because the
original logic is incorrect and needs to perform an '&' operation with
’orders‘.
This change should be a general change.
> Also, is this achieving what you want to achieve? Is it necessary? The
> changes in patch 1/2 enforce the global settings and before this code in
> __thp_vma_allowable_orders():
>
> unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long vm_flags,
> unsigned long tva_flags,
> unsigned long orders)
> {
> ... (no early exits) ...
>
> orders &= supported_orders;
> if (!orders)
> return 0;
>
> ...
> }
>
> So if orders == 0 due to the changes in thp_vma_allowable_orders(), which
> is the only caller of __thp_vma_allowable_orders() then we _always_ exit
> early here before we get to this shmem_allowable_huge_orders() code.
Not for this case. Since shmem already supports mTHP, this is to check
whether the 'orders' are enabled in the shmem mTHP configuration. For
example, shmem mTHP might only enable 64K mTHP, which obviously does not
allow PMD-sized THP to collapse.
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 4b42419ce6b2..9af45d4e27e6 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -625,7 +625,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
>> return 0;
>> if (shmem_huge == SHMEM_HUGE_DENY)
>> return 0;
>> - if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
>> + if (shmem_huge == SHMEM_HUGE_FORCE)
>> return maybe_pmd_order;
>>
>> /*
>> @@ -660,7 +660,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
>>
>> fallthrough;
>> case SHMEM_HUGE_ADVISE:
>> - if (vm_flags & VM_HUGEPAGE)
>> + if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
>> return maybe_pmd_order;
>> fallthrough;
>> default:
>> @@ -1790,7 +1790,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
>> /* Allow mTHP that will be fully within i_size. */
>> mask |= shmem_get_orders_within_size(inode, within_size_orders, index, 0);
>>
>> - if (vm_flags & VM_HUGEPAGE)
>> + if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
>> mask |= READ_ONCE(huge_shmem_orders_madvise);
>
> I'm also not sure these are necessary:
>
> The only path that can set shmem_huge_force is __thp_vma_allowable_orders()
> -> shmem_allowable_huge_orders() -> shmem_huge_global_enabled() and then
> only if !(tva_flags & TVA_ENFORCE_SYSFS) and as stated above we already
> cover off this case by early exiting __thp_vma_allowable_orders() if orders
> == 0 as established in patch 1/2.
Not ture. Shmem has its own separate mTHP sysfs setting, which is
different from the mTHP sysfs setting for anonymous pages mentioned
earlier. These checks are in the shmem file. You can check more for
shmem mTHP in Documentation/admin-guide/mm/transhuge.rst :)
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
2025-06-07 12:17 ` Lorenzo Stoakes
@ 2025-06-09 6:34 ` Baolin Wang
2025-06-09 19:30 ` Lorenzo Stoakes
0 siblings, 1 reply; 49+ messages in thread
From: Baolin Wang @ 2025-06-09 6:34 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: akpm, hughd, david, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On 2025/6/7 20:17, Lorenzo Stoakes wrote:
> On Sat, Jun 07, 2025 at 01:14:41PM +0100, Lorenzo Stoakes wrote:
>> On Thu, Jun 05, 2025 at 04:00:59PM +0800, Baolin Wang wrote:
> [snip]
>>>
>>> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>
>> Hm I'm not sure if this is enforced is it? I may have missed something here
>> however.
>
> Oh right actually I think it is implicitly - if TVA_ENFORCE_SYSFS is not
> specified in tva_flags, then we don't bother applying an madvise filter at all
> anyway, and we account for that in our 'enabled' check in
> thp_vma_allowable_orders().
>
> But I don't think this patch changes anything, I actually _think_ we can just
> drop this one.
See my previous replies. Shmem mTHP sysfs settings are different from
Anonymous pages.
Thanks for reviewing.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-08 18:37 ` Nico Pache
@ 2025-06-09 6:36 ` Baolin Wang
0 siblings, 0 replies; 49+ messages in thread
From: Baolin Wang @ 2025-06-09 6:36 UTC (permalink / raw)
To: Nico Pache
Cc: akpm, hughd, david, lorenzo.stoakes, Liam.Howlett, ryan.roberts,
dev.jain, ziy, linux-mm, linux-kernel
On 2025/6/9 02:37, Nico Pache wrote:
> On Thu, Jun 5, 2025 at 2:01 AM Baolin Wang
> <baolin.wang@linux.alibaba.com> wrote:
>>
>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
>> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
>> will still attempt to collapse into a Anon THP. This violates the rule we have
>> agreed upon: never means never.
>>
>> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>
>> To address this issue, should check whether the Anon THP configuration is disabled
>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
>>
>> In summary, the current strategy is:
>>
>> 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
>> (global THP settings are not enabled), it means mTHP of that orders are prohibited
>> from being used, then madvise_collapse() is forbidden for that orders.
>>
>> 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
>> (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
>> orders are still prohibited from being used, thus madvise_collapse() is not allowed
>> for that orders.
>>
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 2f190c90192d..199ddc9f04a1 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>> unsigned long orders)
>> {
>> /* Optimization to check if required orders are enabled early. */
>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>> + if (vma_is_anonymous(vma)) {
>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>> + unsigned long mask = always | madvise;
>> +
>> + /*
>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>> + * then we should never allow hugepages.
>> + */
>> + if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
>> + return 0;
> Hi Baolin,
>
> Thanks for making this change so we enforce the 'never' being never
> rule. As others have noted, this statement is hard to read-- is there
> any improvements we can make to the readability as Lorenzo suggested?
Yes, will do in next version.
> Either way, I've tested this and the functionality works as intended.
>
> Tested-by: Nico Pache <npache@redhat.com>
Thanks for testing.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-09 6:18 ` Baolin Wang
@ 2025-06-09 15:12 ` Lorenzo Stoakes
0 siblings, 0 replies; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-09 15:12 UTC (permalink / raw)
To: Baolin Wang
Cc: akpm, hughd, david, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On Mon, Jun 09, 2025 at 02:18:07PM +0800, Baolin Wang wrote:
>
>
> On 2025/6/7 20:21, Lorenzo Stoakes wrote:
> > A couple follow up points that occurred to me:
> >
> > On Sat, Jun 07, 2025 at 12:55:19PM +0100, Lorenzo Stoakes wrote:
> > > Not related to your patch at all, but man this whole thing (thp allowed orders)
> > > needs significant improvement, it seems always perversely complicated for a
> > > relatively simple operation.
> > >
> > > Overall I LOVE what you're doing here, but I feel we can clarify things a
> > > little while we're at it to make it clear exactly what we're doing.
> > >
> > > This is a very important change so forgive my fiddling about here but I'm
> > > hoping we can take the opportunity to make things a little simpler!
> > >
> > > On Thu, Jun 05, 2025 at 04:00:58PM +0800, Baolin Wang wrote:
> > > > The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
> > > > means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
> > > > will still attempt to collapse into a Anon THP. This violates the rule we have
> > > > agreed upon: never means never.
> > > >
> > > > Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> > > > in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
> > >
> > > I'm generally not sure it's worth talking only about MADV_COLLAPSE here when
> > > you're changing what THP is permitted across the board, I may have missed some
> > > discussion and forgive me if so, but what is special about MADV_COLLAPSE's use
> > > of thp_vma_allowable_orders() that makes it ignore 'never's moreso than other
> > > users?
> >
> > I'd mention that MADV_COLLAPSE is special because of not specifying
> > TVA_ENFORCE_SYSFS but you are making this change across the board for all
> > callers who do not specify this.
>
> Currently, besides MADV_COLLAPSE not setting TVA_ENFORCE_SYSFS, there is
> only one other instance where TVA_ENFORCE_SYSFS is not set, which is in the
> collapse_pte_mapped_thp() function, but I believe this is reasonable from
> the comments:
>
> /*
> * If we are here, we've succeeded in replacing all the native pages
> * in the page cache with a single hugepage. If a mm were to
> fault-in
> * this memory (mapped by a suitably aligned VMA), we'd get the
> hugepage
> * and map it by a PMD, regardless of sysfs THP settings. As such,
> let's
> * analogously elide sysfs THP settings here.
> */
> if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
> return SCAN_VMA_CHECK;
Thanks for referencing that - it's a nicely worded comment that does explain it
indeed :)
Definitely worth mentioning that in the commit msg I think.
>
> >
> > I'd also CLEARLY mention that you handle David's request re: madvise by
> > restricting yourself to checking only for NEVER and retaining the existing logic
> > of not enforcing sysfs settings when TVA_ENFORCE_SYSFS, which includes not
> > checking the VMA for VM_HUGEPAGE if the madvise mode is enabled.
>
> Sure.
Thanks!
>
> >
> > (i.e. addressing David's request).
> >
> > [snip]
> >
> > > I feel this is compressing a lot of logic in a way that took me several
> > > readings to understand (hey I might not be the smartest cookie in the jar,
> > > but we need to account for all levels of kernel developer ;)
> > >
> > > I feel like we can make things a lot clearer here by separating out with a
> > > helper function (means we can drop some indentation too), and also take
> > > advantage of the fact that, if orders == 0, __thp_vma_allowable_orders()
> > > exits with 0 early so no need for us to do so ourselves:
> > >
> > > /* Strictly mask requested anonymous orders according to sysfs settings. */
> > > static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > > unsigned long tva_flags, unsigned long orders)
> > > {
> > > unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> > > bool inherit_enabled = hugepage_global_enabled();
> > > bool has_madvise = vm_flags & VM_HUGEPAGE;
> > > unsigned long mask = always | madvise;
> > >
> > > mask = always | madvise;
> > > if (inherit_enabled)
> > > mask |= inherit;
> > >
> > > /* All set to/inherit NEVER - never means never globally, abort. */
> > > if (!(mask & orders))
> > > return 0;
> > >
> > > /* Otherwise, we only enforce sysfs settings if asked. */
> >
> > Perhaps worth adding a comment here noting that, if the user sets a sysfs mode
> > of madvise and if TVA_ENFORCE_SYSFS is not set, we don't bother checking whether
> > the VMA has VM_HUGEPAGE set.
>
> Sure, will do. Thanks for reviewing.
Thanks!
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-09 6:10 ` Baolin Wang
@ 2025-06-09 15:17 ` Lorenzo Stoakes
2025-06-11 6:59 ` Baolin Wang
0 siblings, 1 reply; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-09 15:17 UTC (permalink / raw)
To: Baolin Wang
Cc: akpm, hughd, david, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On Mon, Jun 09, 2025 at 02:10:12PM +0800, Baolin Wang wrote:
>
>
> On 2025/6/7 19:55, Lorenzo Stoakes wrote:
> > Not related to your patch at all, but man this whole thing (thp allowed orders)
> > needs significant improvement, it seems always perversely complicated for a
> > relatively simple operation.
> >
> > Overall I LOVE what you're doing here, but I feel we can clarify things a
> > little while we're at it to make it clear exactly what we're doing.
> >
> > This is a very important change so forgive my fiddling about here but I'm
> > hoping we can take the opportunity to make things a little simpler!
> >
> > On Thu, Jun 05, 2025 at 04:00:58PM +0800, Baolin Wang wrote:
> > > The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
> > > means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
> > > will still attempt to collapse into a Anon THP. This violates the rule we have
> > > agreed upon: never means never.
> > >
> > > Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> > > in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
> >
> > I'm generally not sure it's worth talking only about MADV_COLLAPSE here when
> > you're changing what THP is permitted across the board, I may have missed some
> > discussion and forgive me if so, but what is special about MADV_COLLAPSE's use
> > of thp_vma_allowable_orders() that makes it ignore 'never's moreso than other
> > users?
>
> We found that MADV_COLLAPSE ignores the THP configuration, meaning that even
> when THP is set to 'never', MADV_COLLAPSE can still collapse into THPs (and
> mTHPs in the future). This is because when MADV_COLLAPSE calls
> thp_vma_allowable_orders(), it does not set the TVA_ENFORCE_SYSFS flag,
> which means it ignores the system-wide Anon THP sysfs settings.
>
> So this patch set is aimed to fix the THP policy for MADV_COLLAPSE.
>
Yeah of course, and this is exactly why, but what I mean is, the patch
doesn't explicitly address MADV_COLLAPSE, it addresses a case that
MADV_COLLAPSE uses (which is as you say the motivating cause for the
change).
So I think the commit message should rather open something like:
If, when invoking thp_vma_allowable_orders(), the TVA_ENFORCE_SYSFS
flag is not specified, we ignore sysfs TLB settings.
Whilst it makes sense for the callers who do not specify this flag,
it creates a odd and surprising situation where a sysadmin
specifying 'never' for all THP sizes still observing THP pages
being allocated and used on the system.
The motivating case for this is MADV_COLLAPSE, <blah blah blah> :)
> > > To address this issue, should check whether the Anon THP configuration is disabled
> > > in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
> > >
> > > In summary, the current strategy is:
> > >
> > > 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
> > > (global THP settings are not enabled), it means mTHP of that orders are prohibited
> > > from being used, then madvise_collapse() is forbidden for that orders.
> > >
> > > 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
> > > (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
> > > orders are still prohibited from being used, thus madvise_collapse() is not allowed
> > > for that orders.
> >
> > OK so it's already confusing that the global settings only impact 'inherit'
> > settings below, so they're not really global at all, but rather perhaps should
> > be called 'inherited'.
> >
> > Maybe I need to submit a patch to rename thp_inherited_enabled(), or perhaps
> > that'd just add to the confusion :P
> >
> > OK this is also not your fault just general commentary.
> >
> > Anyway, I feel points 1 and 2 can more succinctly be summed up as below,
> > also there's no need to refer to the code, it's actually clearer I think to
> > refer to the underlying logic:
> >
> > If no hugepage modes are enabled for the desired orders, nor can we
> > enable them by inheriting from a 'global' enabled setting - then it
> > must be the case that all desired orders either specify or inherit
> > 'NEVER' - and we must abort.
>
> OK. Thanks for helping me make it simpler:)
>
Thanks :)
> > >
> > > Reviewed-by: Zi Yan <ziy@nvidia.com>
> > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > ---
> > > include/linux/huge_mm.h | 23 +++++++++++++++++++----
> > > 1 file changed, 19 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > index 2f190c90192d..199ddc9f04a1 100644
> > > --- a/include/linux/huge_mm.h
> > > +++ b/include/linux/huge_mm.h
> > > @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> > > unsigned long orders)
> > > {
> > > /* Optimization to check if required orders are enabled early. */
> > > - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> > > - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> > > + if (vma_is_anonymous(vma)) {
> > > + unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> > > + unsigned long mask = always | madvise;
> > > +
> > > + /*
> > > + * If the system-wide THP/mTHP sysfs settings are disabled,
> > > + * then we should never allow hugepages.
> > > + */
> > > + if (!(mask & orders) && !(hugepage_global_enabled() && (inherit & orders)))
> > > + return 0;
> > > +
> > > + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > > + goto skip;
> > >
> > > + mask = always;
> > > if (vm_flags & VM_HUGEPAGE)
> > > - mask |= READ_ONCE(huge_anon_orders_madvise);
> > > + mask |= madvise;
> > > if (hugepage_global_always() ||
> > > ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> > > - mask |= READ_ONCE(huge_anon_orders_inherit);
> > > + mask |= inherit;
> > >
> > > orders &= mask;
> > > if (!orders)
> > > return 0;
> > > }
> > >
> > > +skip:
> > > return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> > > }
> >
> > I feel this is compressing a lot of logic in a way that took me several
> > readings to understand (hey I might not be the smartest cookie in the jar,
> > but we need to account for all levels of kernel developer ;)
> >
> > I feel like we can make things a lot clearer here by separating out with a
> > helper function (means we can drop some indentation too), and also take
> > advantage of the fact that, if orders == 0, __thp_vma_allowable_orders()
> > exits with 0 early so no need for us to do so ourselves:
>
> Sure. Looks good to me. Thanks.
Great thanks!
>
> > /* Strictly mask requested anonymous orders according to sysfs settings. */
> > static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > unsigned long tva_flags, unsigned long orders)
> > {
> > unsigned long always = READ_ONCE(huge_anon_orders_always);
> > unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> > bool inherit_enabled = hugepage_global_enabled();
> > bool has_madvise = vm_flags & VM_HUGEPAGE;
> > unsigned long mask = always | madvise;
> >
> > mask = always | madvise;
> > if (inherit_enabled)
> > mask |= inherit;
> >
> > /* All set to/inherit NEVER - never means never globally, abort. */
> > if (!(mask & orders))
> > return 0;
> >
> > /* Otherwise, we only enforce sysfs settings if asked. */
> > if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > return orders;
> >
> > mask = always;
> > if (has_madvise)
> > mask |= madvise;
> > if (hugepage_global_always() || (has_madvise && inherit_enabled))
> > mask |= inherit;
> >
> > return orders & mask;
> > }
> >
> > ...
> >
> > static inline
> > unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> > unsigned long vm_flags,
> > unsigned long tva_flags,
> > unsigned long orders)
> > {
> > if (vma_is_anonymous(vma))
> > orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
> >
> > return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> > }
>
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
2025-06-09 6:31 ` Baolin Wang
@ 2025-06-09 15:33 ` Lorenzo Stoakes
2025-06-11 7:02 ` Baolin Wang
0 siblings, 1 reply; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-09 15:33 UTC (permalink / raw)
To: Baolin Wang
Cc: akpm, hughd, david, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
OK overall the logic looks good now I realise the mistake I made is that
the thp_vma_allowable_orders() check is for the vma_is_anonymous() case
only.
On Mon, Jun 09, 2025 at 02:31:46PM +0800, Baolin Wang wrote:
>
>
> On 2025/6/7 20:14, Lorenzo Stoakes wrote:
> > On Thu, Jun 05, 2025 at 04:00:59PM +0800, Baolin Wang wrote:
> > > The MADV_COLLAPSE will ignore the system-wide shmem THP sysfs settings, which
> > > means that even though we have disabled the shmem THP configuration, MADV_COLLAPSE
> > > will still attempt to collapse into a shmem THP. This violates the rule we have
> > > agreed upon: never means never.
> >
> > Ugh that we have separate shmem logic. And split between huge_memory.c and
> > shmem.c too :)) Again, not your fault, just a general moan about existing
> > stuff :P
> >
> > >
> > > Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> > > in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
> >
> > Hm I'm not sure if this is enforced is it? I may have missed something here
> > however.
> >
> > >
> > > Then the current strategy is:
> > >
> > > For shmem, if none of always, madvise, within_size, and inherit have enabled
> > > PMD-sized THP, then MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
> >
> > Again, is this just MADV_COLLAPSE? Surely this is a general change?
> >
> > We should be clear that we are not explicitly limiting ourselves to
> > MADV_COLLAPSE here.
> >
> > You shoudl clearly indicate that the MADV_COLLAPSE case DOESN'T set
> > TVA_ENFORCE_SYSFS and that's the key difference here.
>
> Sure.
>
Thanks!
> > > For tmpfs, if the mount option is set with the 'huge=never' parameter, then
> > > MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
> > >
> > > Acked-by: Zi Yan <ziy@nvidia.com>
> > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > ---
> > > mm/huge_memory.c | 2 +-
> > > mm/shmem.c | 6 +++---
> > > 2 files changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > index d3e66136e41a..a8cfa37cae72 100644
> > > --- a/mm/huge_memory.c
> > > +++ b/mm/huge_memory.c
> > > @@ -166,7 +166,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> > > * own flags.
> > > */
> > > if (!in_pf && shmem_file(vma->vm_file))
> > > - return shmem_allowable_huge_orders(file_inode(vma->vm_file),
> > > + return orders & shmem_allowable_huge_orders(file_inode(vma->vm_file),
> > > vma, vma->vm_pgoff, 0,
> > > !enforce_sysfs);
> >
> > Did you mean to do &&?
>
> No. It might be worth having a separate fix patch here, because the original
> logic is incorrect and needs to perform an '&' operation with ’orders‘.
>
> This change should be a general change.
Ah yeah, I did think that _perhaps_ it could be. I think it would make
sense to separate out into another patch albeit very small, just so we can
separate your further changes from the fix for this.
>
> > Also, is this achieving what you want to achieve? Is it necessary? The
> > changes in patch 1/2 enforce the global settings and before this code in
> > __thp_vma_allowable_orders():
> >
> > unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> > unsigned long vm_flags,
> > unsigned long tva_flags,
> > unsigned long orders)
> > {
> > ... (no early exits) ...
> >
> > orders &= supported_orders;
> > if (!orders)
> > return 0;
> >
> > ...
> > }
> >
> > So if orders == 0 due to the changes in thp_vma_allowable_orders(), which
> > is the only caller of __thp_vma_allowable_orders() then we _always_ exit
> > early here before we get to this shmem_allowable_huge_orders() code.
>
> Not for this case. Since shmem already supports mTHP, this is to check
> whether the 'orders' are enabled in the shmem mTHP configuration. For
> example, shmem mTHP might only enable 64K mTHP, which obviously does not
> allow PMD-sized THP to collapse.
Yeah sorry I get it now thp_vma_allowable_orders() does a
vma_is_anonymous() predicate. Doh! :P
God what a mess this is (not your fault, pre-existing obviously :P)
Yeah le sigh.
>
> > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > index 4b42419ce6b2..9af45d4e27e6 100644
> > > --- a/mm/shmem.c
> > > +++ b/mm/shmem.c
> > > @@ -625,7 +625,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
> > > return 0;
> > > if (shmem_huge == SHMEM_HUGE_DENY)
> > > return 0;
> > > - if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
> > > + if (shmem_huge == SHMEM_HUGE_FORCE)
> > > return maybe_pmd_order;
OK I get it now, this means the !check sysfs doesn't just get
actioned... :)
> > >
> > > /*
> > > @@ -660,7 +660,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
> > >
> > > fallthrough;
> > > case SHMEM_HUGE_ADVISE:
> > > - if (vm_flags & VM_HUGEPAGE)
> > > + if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
> > > return maybe_pmd_order;
> > > fallthrough;
> > > default:
> > > @@ -1790,7 +1790,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
> > > /* Allow mTHP that will be fully within i_size. */
> > > mask |= shmem_get_orders_within_size(inode, within_size_orders, index, 0);
> > >
> > > - if (vm_flags & VM_HUGEPAGE)
> > > + if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
> > > mask |= READ_ONCE(huge_shmem_orders_madvise);
> >
> > I'm also not sure these are necessary:
> >
> > The only path that can set shmem_huge_force is __thp_vma_allowable_orders()
> > -> shmem_allowable_huge_orders() -> shmem_huge_global_enabled() and then
> > only if !(tva_flags & TVA_ENFORCE_SYSFS) and as stated above we already
> > cover off this case by early exiting __thp_vma_allowable_orders() if orders
> > == 0 as established in patch 1/2.
>
> Not ture. Shmem has its own separate mTHP sysfs setting, which is different
> from the mTHP sysfs setting for anonymous pages mentioned earlier. These
> checks are in the shmem file. You can check more for shmem mTHP in
> Documentation/admin-guide/mm/transhuge.rst :)
Ah yeah the issue is if (vma_is_anonymous())... doh!
The stack trace is correct though, this is the only place we do it:
~~
static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
loff_t write_end, bool shmem_huge_force,
struct vm_area_struct *vma,
unsigned long vm_flags)
Is called from shmem_getattr():
if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0))
stat->blksize = HPAGE_PMD_SIZE;
Note that smem_huge_force == false here
And shmem_allowable_huge_orders():
unsigned long shmem_allowable_huge_orders(struct inode *inode,
struct vm_area_struct *vma, pgoff_t index,
loff_t write_end, bool shmem_huge_force)
global_orders = shmem_huge_global_enabled(inode, index, write_end,
shmem_huge_force, vma, vm_flags);
Which forwards 'shmem_huge_force'.
In shmem_get_folow_gfp():
orders = shmem_allowable_huge_orders(inode, vma, index, write_end, false);
Note that shmem_huge_force == false.
In __thp_vma_allowable_orders();
unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long vm_flags,
unsigned long tva_flags,
unsigned long orders)
{
...
bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
...
if (!in_pf && shmem_file(vma->vm_file))
return shmem_allowable_huge_orders(file_inode(vma->vm_file),
vma, vma->vm_pgoff, 0,
!enforce_sysfs);
So we set shmem_huge_force only if TVA_ENFORCE_SYSFS is not set in tva_flags passed to __thp_vma_allowable_orders()
The only caller of __thp_vma_allowable_orders() is thp_vma_allowable_orders().
~~
But yeah we do need to do shmem things... sorry my mistake.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
2025-06-09 6:34 ` Baolin Wang
@ 2025-06-09 19:30 ` Lorenzo Stoakes
0 siblings, 0 replies; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-09 19:30 UTC (permalink / raw)
To: Baolin Wang
Cc: akpm, hughd, david, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On Mon, Jun 09, 2025 at 02:34:22PM +0800, Baolin Wang wrote:
>
>
> On 2025/6/7 20:17, Lorenzo Stoakes wrote:
> > On Sat, Jun 07, 2025 at 01:14:41PM +0100, Lorenzo Stoakes wrote:
> > > On Thu, Jun 05, 2025 at 04:00:59PM +0800, Baolin Wang wrote:
> > [snip]
> > > >
> > > > Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> > > > in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
> > >
> > > Hm I'm not sure if this is enforced is it? I may have missed something here
> > > however.
> >
> > Oh right actually I think it is implicitly - if TVA_ENFORCE_SYSFS is not
> > specified in tva_flags, then we don't bother applying an madvise filter at all
> > anyway, and we account for that in our 'enabled' check in
> > thp_vma_allowable_orders().
> >
> > But I don't think this patch changes anything, I actually _think_ we can just
> > drop this one.
>
> See my previous replies. Shmem mTHP sysfs settings are different from
> Anonymous pages.
Yeah sorry you're right, responded elsewhere in thread.
>
> Thanks for reviewing.
You're welcome!
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-09 15:17 ` Lorenzo Stoakes
@ 2025-06-11 6:59 ` Baolin Wang
0 siblings, 0 replies; 49+ messages in thread
From: Baolin Wang @ 2025-06-11 6:59 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: akpm, hughd, david, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On 2025/6/9 23:17, Lorenzo Stoakes wrote:
> On Mon, Jun 09, 2025 at 02:10:12PM +0800, Baolin Wang wrote:
>>
>>
>> On 2025/6/7 19:55, Lorenzo Stoakes wrote:
>>> Not related to your patch at all, but man this whole thing (thp allowed orders)
>>> needs significant improvement, it seems always perversely complicated for a
>>> relatively simple operation.
>>>
>>> Overall I LOVE what you're doing here, but I feel we can clarify things a
>>> little while we're at it to make it clear exactly what we're doing.
>>>
>>> This is a very important change so forgive my fiddling about here but I'm
>>> hoping we can take the opportunity to make things a little simpler!
>>>
>>> On Thu, Jun 05, 2025 at 04:00:58PM +0800, Baolin Wang wrote:
>>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
>>>> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
>>>> will still attempt to collapse into a Anon THP. This violates the rule we have
>>>> agreed upon: never means never.
>>>>
>>>> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>
>>> I'm generally not sure it's worth talking only about MADV_COLLAPSE here when
>>> you're changing what THP is permitted across the board, I may have missed some
>>> discussion and forgive me if so, but what is special about MADV_COLLAPSE's use
>>> of thp_vma_allowable_orders() that makes it ignore 'never's moreso than other
>>> users?
>>
>> We found that MADV_COLLAPSE ignores the THP configuration, meaning that even
>> when THP is set to 'never', MADV_COLLAPSE can still collapse into THPs (and
>> mTHPs in the future). This is because when MADV_COLLAPSE calls
>> thp_vma_allowable_orders(), it does not set the TVA_ENFORCE_SYSFS flag,
>> which means it ignores the system-wide Anon THP sysfs settings.
>>
>> So this patch set is aimed to fix the THP policy for MADV_COLLAPSE.
>>
>
> Yeah of course, and this is exactly why, but what I mean is, the patch
> doesn't explicitly address MADV_COLLAPSE, it addresses a case that
> MADV_COLLAPSE uses (which is as you say the motivating cause for the
> change).
>
> So I think the commit message should rather open something like:
>
> If, when invoking thp_vma_allowable_orders(), the TVA_ENFORCE_SYSFS
> flag is not specified, we ignore sysfs TLB settings.
>
> Whilst it makes sense for the callers who do not specify this flag,
> it creates a odd and surprising situation where a sysadmin
> specifying 'never' for all THP sizes still observing THP pages
> being allocated and used on the system.
>
> The motivating case for this is MADV_COLLAPSE, <blah blah blah> :)
OK. Will update the commit message. Thanks.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem THP sysfs settings are disabled
2025-06-09 15:33 ` Lorenzo Stoakes
@ 2025-06-11 7:02 ` Baolin Wang
0 siblings, 0 replies; 49+ messages in thread
From: Baolin Wang @ 2025-06-11 7:02 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: akpm, hughd, david, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On 2025/6/9 23:33, Lorenzo Stoakes wrote:
> OK overall the logic looks good now I realise the mistake I made is that
> the thp_vma_allowable_orders() check is for the vma_is_anonymous() case
> only.
>
> On Mon, Jun 09, 2025 at 02:31:46PM +0800, Baolin Wang wrote:
>>
>>
>> On 2025/6/7 20:14, Lorenzo Stoakes wrote:
>>> On Thu, Jun 05, 2025 at 04:00:59PM +0800, Baolin Wang wrote:
>>>> The MADV_COLLAPSE will ignore the system-wide shmem THP sysfs settings, which
>>>> means that even though we have disabled the shmem THP configuration, MADV_COLLAPSE
>>>> will still attempt to collapse into a shmem THP. This violates the rule we have
>>>> agreed upon: never means never.
>>>
>>> Ugh that we have separate shmem logic. And split between huge_memory.c and
>>> shmem.c too :)) Again, not your fault, just a general moan about existing
>>> stuff :P
>>>
>>>>
>>>> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>
>>> Hm I'm not sure if this is enforced is it? I may have missed something here
>>> however.
>>>
>>>>
>>>> Then the current strategy is:
>>>>
>>>> For shmem, if none of always, madvise, within_size, and inherit have enabled
>>>> PMD-sized THP, then MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
>>>
>>> Again, is this just MADV_COLLAPSE? Surely this is a general change?
>>>
>>> We should be clear that we are not explicitly limiting ourselves to
>>> MADV_COLLAPSE here.
>>>
>>> You shoudl clearly indicate that the MADV_COLLAPSE case DOESN'T set
>>> TVA_ENFORCE_SYSFS and that's the key difference here.
>>
>> Sure.
>>
>
> Thanks!
>
>>>> For tmpfs, if the mount option is set with the 'huge=never' parameter, then
>>>> MADV_COLLAPSE will be prohibited from collapsing PMD-sized THP.
>>>>
>>>> Acked-by: Zi Yan <ziy@nvidia.com>
>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> ---
>>>> mm/huge_memory.c | 2 +-
>>>> mm/shmem.c | 6 +++---
>>>> 2 files changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index d3e66136e41a..a8cfa37cae72 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -166,7 +166,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>> * own flags.
>>>> */
>>>> if (!in_pf && shmem_file(vma->vm_file))
>>>> - return shmem_allowable_huge_orders(file_inode(vma->vm_file),
>>>> + return orders & shmem_allowable_huge_orders(file_inode(vma->vm_file),
>>>> vma, vma->vm_pgoff, 0,
>>>> !enforce_sysfs);
>>>
>>> Did you mean to do &&?
>>
>> No. It might be worth having a separate fix patch here, because the original
>> logic is incorrect and needs to perform an '&' operation with ’orders‘.
>>
>> This change should be a general change.
>
> Ah yeah, I did think that _perhaps_ it could be. I think it would make
> sense to separate out into another patch albeit very small, just so we can
> separate your further changes from the fix for this.
OK.
>>> Also, is this achieving what you want to achieve? Is it necessary? The
>>> changes in patch 1/2 enforce the global settings and before this code in
>>> __thp_vma_allowable_orders():
>>>
>>> unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>> unsigned long vm_flags,
>>> unsigned long tva_flags,
>>> unsigned long orders)
>>> {
>>> ... (no early exits) ...
>>>
>>> orders &= supported_orders;
>>> if (!orders)
>>> return 0;
>>>
>>> ...
>>> }
>>>
>>> So if orders == 0 due to the changes in thp_vma_allowable_orders(), which
>>> is the only caller of __thp_vma_allowable_orders() then we _always_ exit
>>> early here before we get to this shmem_allowable_huge_orders() code.
>>
>> Not for this case. Since shmem already supports mTHP, this is to check
>> whether the 'orders' are enabled in the shmem mTHP configuration. For
>> example, shmem mTHP might only enable 64K mTHP, which obviously does not
>> allow PMD-sized THP to collapse.
>
> Yeah sorry I get it now thp_vma_allowable_orders() does a
> vma_is_anonymous() predicate. Doh! :P
>
> God what a mess this is (not your fault, pre-existing obviously :P)
>
> Yeah le sigh.
>
>>
>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>> index 4b42419ce6b2..9af45d4e27e6 100644
>>>> --- a/mm/shmem.c
>>>> +++ b/mm/shmem.c
>>>> @@ -625,7 +625,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
>>>> return 0;
>>>> if (shmem_huge == SHMEM_HUGE_DENY)
>>>> return 0;
>>>> - if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
>>>> + if (shmem_huge == SHMEM_HUGE_FORCE)
>>>> return maybe_pmd_order;
>
> OK I get it now, this means the !check sysfs doesn't just get
> actioned... :)
>
>>>>
>>>> /*
>>>> @@ -660,7 +660,7 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
>>>>
>>>> fallthrough;
>>>> case SHMEM_HUGE_ADVISE:
>>>> - if (vm_flags & VM_HUGEPAGE)
>>>> + if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
>>>> return maybe_pmd_order;
>>>> fallthrough;
>>>> default:
>>>> @@ -1790,7 +1790,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
>>>> /* Allow mTHP that will be fully within i_size. */
>>>> mask |= shmem_get_orders_within_size(inode, within_size_orders, index, 0);
>>>>
>>>> - if (vm_flags & VM_HUGEPAGE)
>>>> + if (shmem_huge_force || (vm_flags & VM_HUGEPAGE))
>>>> mask |= READ_ONCE(huge_shmem_orders_madvise);
>>>
>>> I'm also not sure these are necessary:
>>>
>>> The only path that can set shmem_huge_force is __thp_vma_allowable_orders()
>>> -> shmem_allowable_huge_orders() -> shmem_huge_global_enabled() and then
>>> only if !(tva_flags & TVA_ENFORCE_SYSFS) and as stated above we already
>>> cover off this case by early exiting __thp_vma_allowable_orders() if orders
>>> == 0 as established in patch 1/2.
>>
>> Not ture. Shmem has its own separate mTHP sysfs setting, which is different
>> from the mTHP sysfs setting for anonymous pages mentioned earlier. These
>> checks are in the shmem file. You can check more for shmem mTHP in
>> Documentation/admin-guide/mm/transhuge.rst :)
>
> Ah yeah the issue is if (vma_is_anonymous())... doh!
>
> The stack trace is correct though, this is the only place we do it:
>
> ~~
>
> static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
> loff_t write_end, bool shmem_huge_force,
> struct vm_area_struct *vma,
> unsigned long vm_flags)
>
> Is called from shmem_getattr():
>
> if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0))
> stat->blksize = HPAGE_PMD_SIZE;
>
> Note that smem_huge_force == false here
>
> And shmem_allowable_huge_orders():
>
> unsigned long shmem_allowable_huge_orders(struct inode *inode,
> struct vm_area_struct *vma, pgoff_t index,
> loff_t write_end, bool shmem_huge_force)
>
> global_orders = shmem_huge_global_enabled(inode, index, write_end,
> shmem_huge_force, vma, vm_flags);
>
> Which forwards 'shmem_huge_force'.
>
> In shmem_get_folow_gfp():
>
> orders = shmem_allowable_huge_orders(inode, vma, index, write_end, false);
>
> Note that shmem_huge_force == false.
>
> In __thp_vma_allowable_orders();
>
> unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long vm_flags,
> unsigned long tva_flags,
> unsigned long orders)
> {
> ...
> bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
>
> ...
>
> if (!in_pf && shmem_file(vma->vm_file))
> return shmem_allowable_huge_orders(file_inode(vma->vm_file),
> vma, vma->vm_pgoff, 0,
> !enforce_sysfs);
>
> So we set shmem_huge_force only if TVA_ENFORCE_SYSFS is not set in tva_flags passed to __thp_vma_allowable_orders()
>
> The only caller of __thp_vma_allowable_orders() is thp_vma_allowable_orders().
Right :)
>
> But yeah we do need to do shmem things... sorry my mistake.
No worries. I appreciate your useful comments.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP settings are disabled
2025-06-07 12:28 ` [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP " Lorenzo Stoakes
@ 2025-06-11 7:05 ` Baolin Wang
0 siblings, 0 replies; 49+ messages in thread
From: Baolin Wang @ 2025-06-11 7:05 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: akpm, hughd, david, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On 2025/6/7 20:28, Lorenzo Stoakes wrote:
> Before I get into technical criticism, to be clear - thank you very much
> for doing this :) I'm just getting into details as to the implementation,
> but am a fan of this change and consider it important.
>
> On Thu, Jun 05, 2025 at 04:00:57PM +0800, Baolin Wang wrote:
>> As we discussed in the previous thread [1], the MADV_COLLAPSE will ignore
>> the system-wide anon/shmem THP sysfs settings, which means that even though
>> we have disabled the anon/shmem THP configuration, MADV_COLLAPSE will still
>> attempt to collapse into a anon/shmem THP. This violates the rule we have
>> agreed upon: never means never. This patch set will address this issue.
>
> Hm this cover letter could be expanded upon quite a bit - you are doing a
> lot here and it's not only MADV_COLLAPSE, more a general change.
>
> I'd mention that, even when TVA_ENFORCE_SYSFS is not set, callers checking
> THP order validity will not be able to specify THP orders that are either
> specifically marked as 'never' or set to 'inherit' and the global hugepage
> mode is 'never'.
>
> Then say something like 'importantly, this changes alters the madvise(...,
> MADV_COLLAPSE) call, which previously would collapse ranges into huge pages
> even if THP was set to never. This corrects this behaviour'.
>
> I suspect you are unable to write sensible tests here given the need to
> manipulate sysfs (though perhaps worth quickly looking at
> tools/testing/selftests/mm/khugepaged.c, transhuge-stress.c, run_vmtests.sh
> to see), but it'd be at least useful for you to give details here of how
> you have tested this and ensured it functions correctly.
>
> It might also be worth giving a quick justification, i.e. 'system
> administrators who disabled THP everywhere must indeed very much not want
> THP to be used for whatever reason - having individual programs being able
> to quietly override this is very surprising and likely to cause headaches
> for those who desire this not to happen on their systems'.
>
Ah, missed this comment. Good suggestion, I will update the cover
letter. Thanks.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-05 8:00 ` [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs " Baolin Wang
` (2 preceding siblings ...)
2025-06-08 18:37 ` Nico Pache
@ 2025-06-11 12:34 ` David Hildenbrand
2025-06-12 7:51 ` Baolin Wang
3 siblings, 1 reply; 49+ messages in thread
From: David Hildenbrand @ 2025-06-11 12:34 UTC (permalink / raw)
To: Baolin Wang, akpm, hughd
Cc: lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On 05.06.25 10:00, Baolin Wang wrote:
> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings, which
> means that even though we have disabled the Anon THP configuration, MADV_COLLAPSE
> will still attempt to collapse into a Anon THP. This violates the rule we have
> agreed upon: never means never.
>
> Another rule for madvise, referring to David's suggestion: “allowing for collapsing
> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>
> To address this issue, should check whether the Anon THP configuration is disabled
> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is set.
>
> In summary, the current strategy is:
>
> 1. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == false
> (global THP settings are not enabled), it means mTHP of that orders are prohibited
> from being used, then madvise_collapse() is forbidden for that orders.
>
> 2. If always & orders == 0, and madvise & orders == 0, and hugepage_global_enabled() == true
> (global THP settings are enabled), and inherit & orders == 0, it means mTHP of that
> orders are still prohibited from being used, thus madvise_collapse() is not allowed
> for that orders.
>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> include/linux/huge_mm.h | 23 +++++++++++++++++++----
> 1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2f190c90192d..199ddc9f04a1 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> + if (vma_is_anonymous(vma)) {
> + unsigned long always = READ_ONCE(huge_anon_orders_always);
> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + unsigned long mask = always | madvise;
> +
> + /*
> + * If the system-wide THP/mTHP sysfs settings are disabled,
> + * then we should never allow hugepages.
> + */> + if (!(mask & orders) && !(hugepage_global_enabled() &&
(inherit & orders)))
> + return 0;
I'm still trying to digest that. Isn't there a way for us to work with
the orders,
essentially masking off all orders that are forbidden globally. Similar
to below, if !orders, then return 0?
/* Orders disabled directly. */
orders &= ~TODO;
/* Orders disabled by inheriting from the global toggle. */
if (!hugepage_global_enabled())
orders &= ~READ_ONCE(huge_anon_orders_inherit);
TODO is probably a -1ULL and then clearing always/madvise/inherit. Could
add a simple helper for that
huge_anon_orders_never
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-11 12:34 ` David Hildenbrand
@ 2025-06-12 7:51 ` Baolin Wang
2025-06-12 8:46 ` Dev Jain
2025-06-12 8:51 ` David Hildenbrand
0 siblings, 2 replies; 49+ messages in thread
From: Baolin Wang @ 2025-06-12 7:51 UTC (permalink / raw)
To: David Hildenbrand, akpm, hughd
Cc: lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On 2025/6/11 20:34, David Hildenbrand wrote:
> On 05.06.25 10:00, Baolin Wang wrote:
>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings,
>> which
>> means that even though we have disabled the Anon THP configuration,
>> MADV_COLLAPSE
>> will still attempt to collapse into a Anon THP. This violates the rule
>> we have
>> agreed upon: never means never.
>>
>> Another rule for madvise, referring to David's suggestion: “allowing
>> for collapsing
>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>
>> To address this issue, should check whether the Anon THP configuration
>> is disabled
>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is
>> set.
>>
>> In summary, the current strategy is:
>>
>> 1. If always & orders == 0, and madvise & orders == 0, and
>> hugepage_global_enabled() == false
>> (global THP settings are not enabled), it means mTHP of that orders
>> are prohibited
>> from being used, then madvise_collapse() is forbidden for that orders.
>>
>> 2. If always & orders == 0, and madvise & orders == 0, and
>> hugepage_global_enabled() == true
>> (global THP settings are enabled), and inherit & orders == 0, it means
>> mTHP of that
>> orders are still prohibited from being used, thus madvise_collapse()
>> is not allowed
>> for that orders.
>>
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 2f190c90192d..199ddc9f04a1 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>> vm_area_struct *vma,
>> unsigned long orders)
>> {
>> /* Optimization to check if required orders are enabled early. */
>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>> + if (vma_is_anonymous(vma)) {
>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>> + unsigned long mask = always | madvise;
>> +
>> + /*
>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>> + * then we should never allow hugepages.
> > + */> + if (!(mask & orders) &&
> !(hugepage_global_enabled() && (inherit & orders)))
>> + return 0;
>
> I'm still trying to digest that. Isn't there a way for us to work with
> the orders,
> essentially masking off all orders that are forbidden globally. Similar
> to below, if !orders, then return 0?
> /* Orders disabled directly. */
> orders &= ~TODO;
> /* Orders disabled by inheriting from the global toggle. */
> if (!hugepage_global_enabled())
> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>
> TODO is probably a -1ULL and then clearing always/madvise/inherit. Could
> add a simple helper for that
>
> huge_anon_orders_never
I followed Lorenzo's suggestion to simplify the logic. Does that look
more readable?
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2f190c90192d..3087ac7631e0 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
vm_area_struct *vma,
unsigned long tva_flags,
unsigned long orders);
+/* Strictly mask requested anonymous orders according to sysfs settings. */
+static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
+ unsigned long tva_flags, unsigned long
orders)
+{
+ unsigned long always = READ_ONCE(huge_anon_orders_always);
+ unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
+ unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
+ bool inherit_enabled = hugepage_global_enabled();
+ bool has_madvise = vm_flags & VM_HUGEPAGE;
+ unsigned long mask = always | madvise;
+
+ mask = always | madvise;
+ if (inherit_enabled)
+ mask |= inherit;
+
+ /* All set to/inherit NEVER - never means never globally, abort. */
+ if (!(mask & orders))
+ return 0;
+
+ /*
+ * Otherwise, we only enforce sysfs settings if asked. In addition,
+ * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
+ * is not set, we don't bother checking whether the VMA has
VM_HUGEPAGE
+ * set.
+ */
+ if (!(tva_flags & TVA_ENFORCE_SYSFS))
+ return orders;
+
+ mask = always;
+ if (has_madvise)
+ mask |= madvise;
+ if (hugepage_global_always() || (has_madvise && inherit_enabled))
+ mask |= inherit;
+
+ return orders & mask;
+}
+
/**
* thp_vma_allowable_orders - determine hugepage orders that are
allowed for vma
* @vma: the vm area to check
@@ -287,19 +324,8 @@ unsigned long thp_vma_allowable_orders(struct
vm_area_struct *vma,
unsigned long orders)
{
/* Optimization to check if required orders are enabled early. */
- if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
- unsigned long mask = READ_ONCE(huge_anon_orders_always);
-
- if (vm_flags & VM_HUGEPAGE)
- mask |= READ_ONCE(huge_anon_orders_madvise);
- if (hugepage_global_always() ||
- ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
- mask |= READ_ONCE(huge_anon_orders_inherit);
-
- orders &= mask;
- if (!orders)
- return 0;
- }
+ if (vma_is_anonymous(vma))
+ orders = __thp_mask_anon_orders(vm_flags, tva_flags,
orders);
return __thp_vma_allowable_orders(vma, vm_flags, tva_flags,
orders);
}
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 7:51 ` Baolin Wang
@ 2025-06-12 8:46 ` Dev Jain
2025-06-12 8:52 ` David Hildenbrand
2025-06-12 8:51 ` David Hildenbrand
1 sibling, 1 reply; 49+ messages in thread
From: Dev Jain @ 2025-06-12 8:46 UTC (permalink / raw)
To: Baolin Wang, David Hildenbrand, akpm, hughd
Cc: lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, ziy,
linux-mm, linux-kernel
On 12/06/25 1:21 pm, Baolin Wang wrote:
>
>
> On 2025/6/11 20:34, David Hildenbrand wrote:
>> On 05.06.25 10:00, Baolin Wang wrote:
>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs
>>> settings, which
>>> means that even though we have disabled the Anon THP configuration,
>>> MADV_COLLAPSE
>>> will still attempt to collapse into a Anon THP. This violates the
>>> rule we have
>>> agreed upon: never means never.
>>>
>>> Another rule for madvise, referring to David's suggestion: “allowing
>>> for collapsing
>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>
>>> To address this issue, should check whether the Anon THP
>>> configuration is disabled
>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag
>>> is set.
>>>
>>> In summary, the current strategy is:
>>>
>>> 1. If always & orders == 0, and madvise & orders == 0, and
>>> hugepage_global_enabled() == false
>>> (global THP settings are not enabled), it means mTHP of that orders
>>> are prohibited
>>> from being used, then madvise_collapse() is forbidden for that orders.
>>>
>>> 2. If always & orders == 0, and madvise & orders == 0, and
>>> hugepage_global_enabled() == true
>>> (global THP settings are enabled), and inherit & orders == 0, it
>>> means mTHP of that
>>> orders are still prohibited from being used, thus madvise_collapse()
>>> is not allowed
>>> for that orders.
>>>
>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> ---
>>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 2f190c90192d..199ddc9f04a1 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>>> vm_area_struct *vma,
>>> unsigned long orders)
>>> {
>>> /* Optimization to check if required orders are enabled early. */
>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>> + if (vma_is_anonymous(vma)) {
>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>> + unsigned long mask = always | madvise;
>>> +
>>> + /*
>>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>>> + * then we should never allow hugepages.
>> > + */> + if (!(mask & orders) &&
>> !(hugepage_global_enabled() && (inherit & orders)))
>>> + return 0;
>>
>> I'm still trying to digest that. Isn't there a way for us to work
>> with the orders,
>> essentially masking off all orders that are forbidden globally.
>> Similar to below, if !orders, then return 0?
>> /* Orders disabled directly. */
>> orders &= ~TODO;
>> /* Orders disabled by inheriting from the global toggle. */
>> if (!hugepage_global_enabled())
>> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>>
>> TODO is probably a -1ULL and then clearing always/madvise/inherit.
>> Could add a simple helper for that
>>
>> huge_anon_orders_never
>
> I followed Lorenzo's suggestion to simplify the logic. Does that look
> more readable?
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2f190c90192d..3087ac7631e0 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
> vm_area_struct *vma,
> unsigned long tva_flags,
> unsigned long orders);
>
> +/* Strictly mask requested anonymous orders according to sysfs
> settings. */
> +static inline unsigned long __thp_mask_anon_orders(unsigned long
> vm_flags,
> + unsigned long tva_flags, unsigned long
> orders)
> +{
> + unsigned long always = READ_ONCE(huge_anon_orders_always);
> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + bool inherit_enabled = hugepage_global_enabled();
> + bool has_madvise = vm_flags & VM_HUGEPAGE;
Let us name this honour_madvise to indicate that this VMA must honour
the madvise setting.
> + unsigned long mask = always | madvise;
> +
> + mask = always | madvise;
> + if (inherit_enabled)
> + mask |= inherit;
> +
> + /* All set to/inherit NEVER - never means never globally,
> abort. */
> + if (!(mask & orders))
> + return 0;
> +
> + /*
> + * Otherwise, we only enforce sysfs settings if asked. In
> addition,
> + * if the user sets a sysfs mode of madvise and if
> TVA_ENFORCE_SYSFS
> + * is not set, we don't bother checking whether the VMA has
> VM_HUGEPAGE
> + * set.
> + */
> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> + return orders;
> +
> + mask = always;
> + if (has_madvise)
> + mask |= madvise;
> + if (hugepage_global_always() || (has_madvise && inherit_enabled))
> + mask |= inherit;
> +
> + return orders & mask;
> +}
> +
> /**
> * thp_vma_allowable_orders - determine hugepage orders that are
> allowed for vma
> * @vma: the vm area to check
> @@ -287,19 +324,8 @@ unsigned long thp_vma_allowable_orders(struct
> vm_area_struct *vma,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> -
> - if (vm_flags & VM_HUGEPAGE)
> - mask |= READ_ONCE(huge_anon_orders_madvise);
> - if (hugepage_global_always() ||
> - ((vm_flags & VM_HUGEPAGE) &&
> hugepage_global_enabled()))
> - mask |= READ_ONCE(huge_anon_orders_inherit);
> -
> - orders &= mask;
> - if (!orders)
> - return 0;
> - }
> + if (vma_is_anonymous(vma))
> + orders = __thp_mask_anon_orders(vm_flags, tva_flags,
> orders);
>
> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags,
> orders);
> }
Overall this is very readable!
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 7:51 ` Baolin Wang
2025-06-12 8:46 ` Dev Jain
@ 2025-06-12 8:51 ` David Hildenbrand
2025-06-12 12:45 ` Baolin Wang
2025-06-12 13:07 ` Lorenzo Stoakes
1 sibling, 2 replies; 49+ messages in thread
From: David Hildenbrand @ 2025-06-12 8:51 UTC (permalink / raw)
To: Baolin Wang, akpm, hughd
Cc: lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On 12.06.25 09:51, Baolin Wang wrote:
>
>
> On 2025/6/11 20:34, David Hildenbrand wrote:
>> On 05.06.25 10:00, Baolin Wang wrote:
>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings,
>>> which
>>> means that even though we have disabled the Anon THP configuration,
>>> MADV_COLLAPSE
>>> will still attempt to collapse into a Anon THP. This violates the rule
>>> we have
>>> agreed upon: never means never.
>>>
>>> Another rule for madvise, referring to David's suggestion: “allowing
>>> for collapsing
>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>
>>> To address this issue, should check whether the Anon THP configuration
>>> is disabled
>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is
>>> set.
>>>
>>> In summary, the current strategy is:
>>>
>>> 1. If always & orders == 0, and madvise & orders == 0, and
>>> hugepage_global_enabled() == false
>>> (global THP settings are not enabled), it means mTHP of that orders
>>> are prohibited
>>> from being used, then madvise_collapse() is forbidden for that orders.
>>>
>>> 2. If always & orders == 0, and madvise & orders == 0, and
>>> hugepage_global_enabled() == true
>>> (global THP settings are enabled), and inherit & orders == 0, it means
>>> mTHP of that
>>> orders are still prohibited from being used, thus madvise_collapse()
>>> is not allowed
>>> for that orders.
>>>
>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> ---
>>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 2f190c90192d..199ddc9f04a1 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>>> vm_area_struct *vma,
>>> unsigned long orders)
>>> {
>>> /* Optimization to check if required orders are enabled early. */
>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>> + if (vma_is_anonymous(vma)) {
>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>> + unsigned long mask = always | madvise;
>>> +
>>> + /*
>>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>>> + * then we should never allow hugepages.
>> > + */> + if (!(mask & orders) &&
>> !(hugepage_global_enabled() && (inherit & orders)))
>>> + return 0;
>>
>> I'm still trying to digest that. Isn't there a way for us to work with
>> the orders,
>> essentially masking off all orders that are forbidden globally. Similar
>> to below, if !orders, then return 0?
>> /* Orders disabled directly. */
>> orders &= ~TODO;
>> /* Orders disabled by inheriting from the global toggle. */
>> if (!hugepage_global_enabled())
>> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>>
>> TODO is probably a -1ULL and then clearing always/madvise/inherit. Could
>> add a simple helper for that
>>
>> huge_anon_orders_never
>
> I followed Lorenzo's suggestion to simplify the logic. Does that look
> more readable?
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 2f190c90192d..3087ac7631e0 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
> vm_area_struct *vma,
> unsigned long tva_flags,
> unsigned long orders);
>
> +/* Strictly mask requested anonymous orders according to sysfs settings. */
> +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> + unsigned long tva_flags, unsigned long
> orders)
> +{
> + unsigned long always = READ_ONCE(huge_anon_orders_always);
> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + bool inherit_enabled = hugepage_global_enabled();
> + bool has_madvise = vm_flags & VM_HUGEPAGE;
> + unsigned long mask = always | madvise;
> +
> + mask = always | madvise;
> + if (inherit_enabled)
> + mask |= inherit;
> +
> + /* All set to/inherit NEVER - never means never globally, abort. */
> + if (!(mask & orders))
> + return 0;
Still confusing. I am not sure if we would properly catch when someone
specifies e.g., 2M and 1M, while we only have 2M disabled.
I would rewrite the function to only ever substract from "orders".
...
/* Disallow orders that are set to NEVER directly ... */
order &= (always | madvise | inherit);
/* ... or through inheritance. */
if (inherit_enabled)
orders &= ~inherit;
/*
* Otherwise, we only enforce sysfs settings if asked. In addition,
* if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
* is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
* set.
*/
if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS))
return orders;
> +
> + /*
> + * Otherwise, we only enforce sysfs settings if asked. In addition,
> + * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> + * is not set, we don't bother checking whether the VMA has
> VM_HUGEPAGE
> + * set.
> + */
> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> + return orders;
> +
> + mask = always;
> + if (has_madvise)
> + mask |= madvise;
> + if (hugepage_global_always() || (has_madvise && inherit_enabled))
> + mask |= inherit;
Similarly, this can maybe become (not 100% sure if I got it right, the
condition above is confusing)
if (!has_madvise) {
/*
* Without VM_HUGEPAGE, only allow orders that are set to
* ALWAYS directly ...
*/
order &= (always | inherit);
/* ... or through inheritance. */
if (!hugepage_global_always())
orders &= ~inherit;
}
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 8:46 ` Dev Jain
@ 2025-06-12 8:52 ` David Hildenbrand
0 siblings, 0 replies; 49+ messages in thread
From: David Hildenbrand @ 2025-06-12 8:52 UTC (permalink / raw)
To: Dev Jain, Baolin Wang, akpm, hughd
Cc: lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, ziy,
linux-mm, linux-kernel
On 12.06.25 10:46, Dev Jain wrote:
>
> On 12/06/25 1:21 pm, Baolin Wang wrote:
>>
>>
>> On 2025/6/11 20:34, David Hildenbrand wrote:
>>> On 05.06.25 10:00, Baolin Wang wrote:
>>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs
>>>> settings, which
>>>> means that even though we have disabled the Anon THP configuration,
>>>> MADV_COLLAPSE
>>>> will still attempt to collapse into a Anon THP. This violates the
>>>> rule we have
>>>> agreed upon: never means never.
>>>>
>>>> Another rule for madvise, referring to David's suggestion: “allowing
>>>> for collapsing
>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>>
>>>> To address this issue, should check whether the Anon THP
>>>> configuration is disabled
>>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag
>>>> is set.
>>>>
>>>> In summary, the current strategy is:
>>>>
>>>> 1. If always & orders == 0, and madvise & orders == 0, and
>>>> hugepage_global_enabled() == false
>>>> (global THP settings are not enabled), it means mTHP of that orders
>>>> are prohibited
>>>> from being used, then madvise_collapse() is forbidden for that orders.
>>>>
>>>> 2. If always & orders == 0, and madvise & orders == 0, and
>>>> hugepage_global_enabled() == true
>>>> (global THP settings are enabled), and inherit & orders == 0, it
>>>> means mTHP of that
>>>> orders are still prohibited from being used, thus madvise_collapse()
>>>> is not allowed
>>>> for that orders.
>>>>
>>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> ---
>>>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>> index 2f190c90192d..199ddc9f04a1 100644
>>>> --- a/include/linux/huge_mm.h
>>>> +++ b/include/linux/huge_mm.h
>>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>>>> vm_area_struct *vma,
>>>> unsigned long orders)
>>>> {
>>>> /* Optimization to check if required orders are enabled early. */
>>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>>> + if (vma_is_anonymous(vma)) {
>>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>> + unsigned long mask = always | madvise;
>>>> +
>>>> + /*
>>>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>>>> + * then we should never allow hugepages.
>>> > + */> + if (!(mask & orders) &&
>>> !(hugepage_global_enabled() && (inherit & orders)))
>>>> + return 0;
>>>
>>> I'm still trying to digest that. Isn't there a way for us to work
>>> with the orders,
>>> essentially masking off all orders that are forbidden globally.
>>> Similar to below, if !orders, then return 0?
>>> /* Orders disabled directly. */
>>> orders &= ~TODO;
>>> /* Orders disabled by inheriting from the global toggle. */
>>> if (!hugepage_global_enabled())
>>> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>>>
>>> TODO is probably a -1ULL and then clearing always/madvise/inherit.
>>> Could add a simple helper for that
>>>
>>> huge_anon_orders_never
>>
>> I followed Lorenzo's suggestion to simplify the logic. Does that look
>> more readable?
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 2f190c90192d..3087ac7631e0 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
>> vm_area_struct *vma,
>> unsigned long tva_flags,
>> unsigned long orders);
>>
>> +/* Strictly mask requested anonymous orders according to sysfs
>> settings. */
>> +static inline unsigned long __thp_mask_anon_orders(unsigned long
>> vm_flags,
>> + unsigned long tva_flags, unsigned long
>> orders)
>> +{
>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>> + bool inherit_enabled = hugepage_global_enabled();
>> + bool has_madvise = vm_flags & VM_HUGEPAGE;
>
> Let us name this honour_madvise to indicate that this VMA must honour
> the madvise setting.
Not clear, because there is also "VM_NOHUGEPAGE" :/
See my reply where we maybe (if I got it right) only need a single check
and can just avoid a confusing temporary variable completely.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 8:51 ` David Hildenbrand
@ 2025-06-12 12:45 ` Baolin Wang
2025-06-12 13:05 ` David Hildenbrand
2025-06-12 13:07 ` Lorenzo Stoakes
1 sibling, 1 reply; 49+ messages in thread
From: Baolin Wang @ 2025-06-12 12:45 UTC (permalink / raw)
To: David Hildenbrand, akpm, hughd
Cc: lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On 2025/6/12 16:51, David Hildenbrand wrote:
> On 12.06.25 09:51, Baolin Wang wrote:
>>
>>
>> On 2025/6/11 20:34, David Hildenbrand wrote:
>>> On 05.06.25 10:00, Baolin Wang wrote:
>>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings,
>>>> which
>>>> means that even though we have disabled the Anon THP configuration,
>>>> MADV_COLLAPSE
>>>> will still attempt to collapse into a Anon THP. This violates the rule
>>>> we have
>>>> agreed upon: never means never.
>>>>
>>>> Another rule for madvise, referring to David's suggestion: “allowing
>>>> for collapsing
>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>>
>>>> To address this issue, should check whether the Anon THP configuration
>>>> is disabled
>>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is
>>>> set.
>>>>
>>>> In summary, the current strategy is:
>>>>
>>>> 1. If always & orders == 0, and madvise & orders == 0, and
>>>> hugepage_global_enabled() == false
>>>> (global THP settings are not enabled), it means mTHP of that orders
>>>> are prohibited
>>>> from being used, then madvise_collapse() is forbidden for that orders.
>>>>
>>>> 2. If always & orders == 0, and madvise & orders == 0, and
>>>> hugepage_global_enabled() == true
>>>> (global THP settings are enabled), and inherit & orders == 0, it means
>>>> mTHP of that
>>>> orders are still prohibited from being used, thus madvise_collapse()
>>>> is not allowed
>>>> for that orders.
>>>>
>>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> ---
>>>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>> index 2f190c90192d..199ddc9f04a1 100644
>>>> --- a/include/linux/huge_mm.h
>>>> +++ b/include/linux/huge_mm.h
>>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>>>> vm_area_struct *vma,
>>>> unsigned long orders)
>>>> {
>>>> /* Optimization to check if required orders are enabled
>>>> early. */
>>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>>> + if (vma_is_anonymous(vma)) {
>>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>> + unsigned long mask = always | madvise;
>>>> +
>>>> + /*
>>>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>>>> + * then we should never allow hugepages.
>>> > + */> + if (!(mask & orders) &&
>>> !(hugepage_global_enabled() && (inherit & orders)))
>>>> + return 0;
>>>
>>> I'm still trying to digest that. Isn't there a way for us to work with
>>> the orders,
>>> essentially masking off all orders that are forbidden globally. Similar
>>> to below, if !orders, then return 0?
>>> /* Orders disabled directly. */
>>> orders &= ~TODO;
>>> /* Orders disabled by inheriting from the global toggle. */
>>> if (!hugepage_global_enabled())
>>> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>>>
>>> TODO is probably a -1ULL and then clearing always/madvise/inherit. Could
>>> add a simple helper for that
>>>
>>> huge_anon_orders_never
>>
>> I followed Lorenzo's suggestion to simplify the logic. Does that look
>> more readable?
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 2f190c90192d..3087ac7631e0 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
>> vm_area_struct *vma,
>> unsigned long tva_flags,
>> unsigned long orders);
>>
>> +/* Strictly mask requested anonymous orders according to sysfs
>> settings. */
>> +static inline unsigned long __thp_mask_anon_orders(unsigned long
>> vm_flags,
>> + unsigned long tva_flags, unsigned long
>> orders)
>> +{
>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>> + bool inherit_enabled = hugepage_global_enabled();
>> + bool has_madvise = vm_flags & VM_HUGEPAGE;
>> + unsigned long mask = always | madvise;
>> +
>> + mask = always | madvise;
>> + if (inherit_enabled)
>> + mask |= inherit;
>> +
>> + /* All set to/inherit NEVER - never means never globally,
>> abort. */
>> + if (!(mask & orders))
>> + return 0;
>
> Still confusing. I am not sure if we would properly catch when someone
> specifies e.g., 2M and 1M, while we only have 2M disabled.
IIUC, Yes. In your case, we will only allow order 8 (1M mTHP).
> I would rewrite the function to only ever substract from "orders".
>
> ...
>
> /* Disallow orders that are set to NEVER directly ... */
> order &= (always | madvise | inherit);
>
> /* ... or through inheritance. */
> if (inherit_enabled)
> orders &= ~inherit;
Sorry, I didn't get you here.
If orders = THP_ORDERS_ALL_ANON, inherit = 0x200 (order 9), always and
madvise are 0, and inherit_enabled = true. Then orders will be 0 with
your logic. But we should allow order 9, right?
>
> /*
> * Otherwise, we only enforce sysfs settings if asked. In addition,
> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> * set.
> */
> if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS))
> return orders;
>
>> +
>> + /*
>> + * Otherwise, we only enforce sysfs settings if asked. In
>> addition,
>> + * if the user sets a sysfs mode of madvise and if
>> TVA_ENFORCE_SYSFS
>> + * is not set, we don't bother checking whether the VMA has
>> VM_HUGEPAGE
>> + * set.
>> + */
>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>> + return orders;
>> +
>> + mask = always;
>> + if (has_madvise)
>> + mask |= madvise;
>> + if (hugepage_global_always() || (has_madvise && inherit_enabled))
>> + mask |= inherit;
>
> Similarly, this can maybe become (not 100% sure if I got it right, the
> condition above is confusing)
IMO, this is the original logic.
> if (!has_madvise) {
> /*
> * Without VM_HUGEPAGE, only allow orders that are set to
> * ALWAYS directly ...
> */
> order &= (always | inherit);
> /* ... or through inheritance. */
> if (!hugepage_global_always())
> orders &= ~inherit;
Ditto. I can not understand this too.
> }
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 12:45 ` Baolin Wang
@ 2025-06-12 13:05 ` David Hildenbrand
2025-06-12 13:25 ` Baolin Wang
2025-06-12 13:27 ` Lorenzo Stoakes
0 siblings, 2 replies; 49+ messages in thread
From: David Hildenbrand @ 2025-06-12 13:05 UTC (permalink / raw)
To: Baolin Wang, akpm, hughd
Cc: lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On 12.06.25 14:45, Baolin Wang wrote:
>
>
> On 2025/6/12 16:51, David Hildenbrand wrote:
>> On 12.06.25 09:51, Baolin Wang wrote:
>>>
>>>
>>> On 2025/6/11 20:34, David Hildenbrand wrote:
>>>> On 05.06.25 10:00, Baolin Wang wrote:
>>>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings,
>>>>> which
>>>>> means that even though we have disabled the Anon THP configuration,
>>>>> MADV_COLLAPSE
>>>>> will still attempt to collapse into a Anon THP. This violates the rule
>>>>> we have
>>>>> agreed upon: never means never.
>>>>>
>>>>> Another rule for madvise, referring to David's suggestion: “allowing
>>>>> for collapsing
>>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>>>
>>>>> To address this issue, should check whether the Anon THP configuration
>>>>> is disabled
>>>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is
>>>>> set.
>>>>>
>>>>> In summary, the current strategy is:
>>>>>
>>>>> 1. If always & orders == 0, and madvise & orders == 0, and
>>>>> hugepage_global_enabled() == false
>>>>> (global THP settings are not enabled), it means mTHP of that orders
>>>>> are prohibited
>>>>> from being used, then madvise_collapse() is forbidden for that orders.
>>>>>
>>>>> 2. If always & orders == 0, and madvise & orders == 0, and
>>>>> hugepage_global_enabled() == true
>>>>> (global THP settings are enabled), and inherit & orders == 0, it means
>>>>> mTHP of that
>>>>> orders are still prohibited from being used, thus madvise_collapse()
>>>>> is not allowed
>>>>> for that orders.
>>>>>
>>>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>> ---
>>>>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>> index 2f190c90192d..199ddc9f04a1 100644
>>>>> --- a/include/linux/huge_mm.h
>>>>> +++ b/include/linux/huge_mm.h
>>>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>>>>> vm_area_struct *vma,
>>>>> unsigned long orders)
>>>>> {
>>>>> /* Optimization to check if required orders are enabled
>>>>> early. */
>>>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>>>> + if (vma_is_anonymous(vma)) {
>>>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>>> + unsigned long mask = always | madvise;
>>>>> +
>>>>> + /*
>>>>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>>>>> + * then we should never allow hugepages.
>>>> > + */> + if (!(mask & orders) &&
>>>> !(hugepage_global_enabled() && (inherit & orders)))
>>>>> + return 0;
>>>>
>>>> I'm still trying to digest that. Isn't there a way for us to work with
>>>> the orders,
>>>> essentially masking off all orders that are forbidden globally. Similar
>>>> to below, if !orders, then return 0?
>>>> /* Orders disabled directly. */
>>>> orders &= ~TODO;
>>>> /* Orders disabled by inheriting from the global toggle. */
>>>> if (!hugepage_global_enabled())
>>>> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>>>>
>>>> TODO is probably a -1ULL and then clearing always/madvise/inherit. Could
>>>> add a simple helper for that
>>>>
>>>> huge_anon_orders_never
>>>
>>> I followed Lorenzo's suggestion to simplify the logic. Does that look
>>> more readable?
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 2f190c90192d..3087ac7631e0 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
>>> vm_area_struct *vma,
>>> unsigned long tva_flags,
>>> unsigned long orders);
>>>
>>> +/* Strictly mask requested anonymous orders according to sysfs
>>> settings. */
>>> +static inline unsigned long __thp_mask_anon_orders(unsigned long
>>> vm_flags,
>>> + unsigned long tva_flags, unsigned long
>>> orders)
>>> +{
>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>> + bool inherit_enabled = hugepage_global_enabled();
>>> + bool has_madvise = vm_flags & VM_HUGEPAGE;
>>> + unsigned long mask = always | madvise;
>>> +
>>> + mask = always | madvise;
>>> + if (inherit_enabled)
>>> + mask |= inherit;
>>> +
>>> + /* All set to/inherit NEVER - never means never globally,
>>> abort. */
>>> + if (!(mask & orders))
>>> + return 0;
>>
>> Still confusing. I am not sure if we would properly catch when someone
>> specifies e.g., 2M and 1M, while we only have 2M disabled.
>
> IIUC, Yes. In your case, we will only allow order 8 (1M mTHP).
>
>> I would rewrite the function to only ever substract from "orders".
>>
>> ...
>>
>> /* Disallow orders that are set to NEVER directly ... */
>> order &= (always | madvise | inherit);
>>
>> /* ... or through inheritance. */
>> if (inherit_enabled)
>> orders &= ~inherit;
>
> Sorry, I didn't get you here.
>
> If orders = THP_ORDERS_ALL_ANON, inherit = 0x200 (order 9), always and
> madvise are 0, and inherit_enabled = true. Then orders will be 0 with
> your logic. But we should allow order 9, right?
Yeah, all confusing, because the temporary variables don't help.
if (!inherit_enabled)
or simply
if (!hugepage_global_enabled();)
Let me try again below.
>
>>
>> /*
>> * Otherwise, we only enforce sysfs settings if asked. In addition,
>> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
>> * set.
>> */
>> if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS))
>> return orders;
>>
>>> +
>>> + /*
>>> + * Otherwise, we only enforce sysfs settings if asked. In
>>> addition,
>>> + * if the user sets a sysfs mode of madvise and if
>>> TVA_ENFORCE_SYSFS
>>> + * is not set, we don't bother checking whether the VMA has
>>> VM_HUGEPAGE
>>> + * set.
>>> + */
>>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>>> + return orders;
>>> +
>>> + mask = always;
>>> + if (has_madvise)
>>> + mask |= madvise;
>>> + if (hugepage_global_always() || (has_madvise && inherit_enabled))
>>> + mask |= inherit;
>>
>> Similarly, this can maybe become (not 100% sure if I got it right, the
>> condition above is confusing)
>
> IMO, this is the original logic.
Yeah, and it's absolutely confusing stuff.
Let me try again by only clearing flags. Maybe this would be clearer?
(and correct? still confused why the latter part is so complicated in existing
code)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 8b8f353cc7b81..66fdfe06e4996 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long tva_flags,
unsigned long orders);
+/* Strictly mask requested anonymous orders according to sysfs settings. */
+static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
+ unsigned long tva_flags, unsigned long orders)
+{
+ const unsigned long always = READ_ONCE(huge_anon_orders_always);
+ const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
+ const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
+ const unsigned long never = ~(always | madvise | inherit);
+
+ /* Disallow orders that are set to NEVER directly ... */
+ orders &= ~never;
+
+ /* ... or through inheritance (global == NEVER). */
+ if (!hugepage_global_enabled())
+ orders &= ~inherit;
+
+ /*
+ * Otherwise, we only enforce sysfs settings if asked. In addition,
+ * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
+ * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
+ * set.
+ */
+ if (!(tva_flags & TVA_ENFORCE_SYSFS))
+ return orders;
+
+ if (!(vm_flags & VM_HUGEPAGE)) {
+ /* Disallow orders that are set to MADVISE directly ... */
+ orders &= ~madvise;
+
+ /* ... or through inheritance (global == MADVISE). */
+ if (!hugepage_global_always())
+ orders &= ~inherit;
+ }
+ return orders;
+}
+
/**
* thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
* @vma: the vm area to check
@@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long orders)
{
/* Optimization to check if required orders are enabled early. */
- if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
- unsigned long mask = READ_ONCE(huge_anon_orders_always);
-
- if (vm_flags & VM_HUGEPAGE)
- mask |= READ_ONCE(huge_anon_orders_madvise);
- if (hugepage_global_always() ||
- ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
- mask |= READ_ONCE(huge_anon_orders_inherit);
-
- orders &= mask;
+ if (vma_is_anonymous(vma)) {
+ orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
if (!orders)
return 0;
}
--
Cheers,
David / dhildenb
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 8:51 ` David Hildenbrand
2025-06-12 12:45 ` Baolin Wang
@ 2025-06-12 13:07 ` Lorenzo Stoakes
2025-06-12 13:13 ` David Hildenbrand
1 sibling, 1 reply; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-12 13:07 UTC (permalink / raw)
To: David Hildenbrand
Cc: Baolin Wang, akpm, hughd, Liam.Howlett, npache, ryan.roberts,
dev.jain, ziy, linux-mm, linux-kernel
On Thu, Jun 12, 2025 at 10:51:17AM +0200, David Hildenbrand wrote:
> On 12.06.25 09:51, Baolin Wang wrote:
> >
> >
> > On 2025/6/11 20:34, David Hildenbrand wrote:
> > > On 05.06.25 10:00, Baolin Wang wrote:
> > > > The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings,
> > > > which
> > > > means that even though we have disabled the Anon THP configuration,
> > > > MADV_COLLAPSE
> > > > will still attempt to collapse into a Anon THP. This violates the rule
> > > > we have
> > > > agreed upon: never means never.
> > > >
> > > > Another rule for madvise, referring to David's suggestion: “allowing
> > > > for collapsing
> > > > in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
> > > >
> > > > To address this issue, should check whether the Anon THP configuration
> > > > is disabled
> > > > in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is
> > > > set.
> > > >
> > > > In summary, the current strategy is:
> > > >
> > > > 1. If always & orders == 0, and madvise & orders == 0, and
> > > > hugepage_global_enabled() == false
> > > > (global THP settings are not enabled), it means mTHP of that orders
> > > > are prohibited
> > > > from being used, then madvise_collapse() is forbidden for that orders.
> > > >
> > > > 2. If always & orders == 0, and madvise & orders == 0, and
> > > > hugepage_global_enabled() == true
> > > > (global THP settings are enabled), and inherit & orders == 0, it means
> > > > mTHP of that
> > > > orders are still prohibited from being used, thus madvise_collapse()
> > > > is not allowed
> > > > for that orders.
> > > >
> > > > Reviewed-by: Zi Yan <ziy@nvidia.com>
> > > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > > ---
> > > > include/linux/huge_mm.h | 23 +++++++++++++++++++----
> > > > 1 file changed, 19 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > index 2f190c90192d..199ddc9f04a1 100644
> > > > --- a/include/linux/huge_mm.h
> > > > +++ b/include/linux/huge_mm.h
> > > > @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
> > > > vm_area_struct *vma,
> > > > unsigned long orders)
> > > > {
> > > > /* Optimization to check if required orders are enabled early. */
> > > > - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> > > > - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> > > > + if (vma_is_anonymous(vma)) {
> > > > + unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > > + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > > + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> > > > + unsigned long mask = always | madvise;
> > > > +
> > > > + /*
> > > > + * If the system-wide THP/mTHP sysfs settings are disabled,
> > > > + * then we should never allow hugepages.
> > > > + */> + if (!(mask & orders) &&
> > > !(hugepage_global_enabled() && (inherit & orders)))
> > > > + return 0;
> > >
> > > I'm still trying to digest that. Isn't there a way for us to work with
> > > the orders,
> > > essentially masking off all orders that are forbidden globally. Similar
> > > to below, if !orders, then return 0?
> > > /* Orders disabled directly. */
> > > orders &= ~TODO;
> > > /* Orders disabled by inheriting from the global toggle. */
> > > if (!hugepage_global_enabled())
> > > orders &= ~READ_ONCE(huge_anon_orders_inherit);
> > >
> > > TODO is probably a -1ULL and then clearing always/madvise/inherit. Could
> > > add a simple helper for that
> > >
> > > huge_anon_orders_never
> >
> > I followed Lorenzo's suggestion to simplify the logic. Does that look
> > more readable?
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 2f190c90192d..3087ac7631e0 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
> > vm_area_struct *vma,
> > unsigned long tva_flags,
> > unsigned long orders);
> >
> > +/* Strictly mask requested anonymous orders according to sysfs settings. */
> > +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > + unsigned long tva_flags, unsigned long
> > orders)
> > +{
> > + unsigned long always = READ_ONCE(huge_anon_orders_always);
> > + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> > + bool inherit_enabled = hugepage_global_enabled();
> > + bool has_madvise = vm_flags & VM_HUGEPAGE;
> > + unsigned long mask = always | madvise;
> > +
> > + mask = always | madvise;
> > + if (inherit_enabled)
> > + mask |= inherit;
> > +
> > + /* All set to/inherit NEVER - never means never globally, abort. */
> > + if (!(mask & orders))
> > + return 0;
>
> Still confusing. I am not sure if we would properly catch when someone
> specifies e.g., 2M and 1M, while we only have 2M disabled.
I did wonder if we should call 'orders' something like 'requested_orders'
or something.
This check is always against the input orders which we might conceivably
want.
For instance in madvise_collapse():
if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
return -EINVAL;
I don't think, if it's only possible for PMD order collapse, and that is
disabled, but mTHP 64 KB let's say is enabled, it'd be fine for
MADV_COLLAPSE to succeed at the PMD order.
>
>
> I would rewrite the function to only ever substract from "orders".
Hm.
>
> ...
>
> /* Disallow orders that are set to NEVER directly ... */
> order &= (always | madvise | inherit);
^s
I think you mean (always | madvise) here.
>
> /* ... or through inheritance. */
> if (inherit_enabled)
> orders &= ~inherit;
order & (inherit & ~inherit) is going to always be zero :)
So this should be
orders &= inherit.
The problem is, when you come to the next stage where you are done checking
the 'are we in a NEVER situation regardless of TVA_ENFORCE_SYSFS' you've
now corrupted orders.
Because you've included inherit even if !(tva_flags & TVA_ENFORCE_SYSFS).
And there's no way to recover that information.
>
> /*
> * Otherwise, we only enforce sysfs settings if asked. In addition,
> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> * set.
> */
> if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS))
> return orders;
I don't think this is much delta to what we have now.
I do wonder if we should return mask & orders here, actually, to account
for the fact that, in theory, orders could set > PMD for
!TVA_ENFORCE_SYSFS) (not currently the case).
>
> > +
> > + /*
> > + * Otherwise, we only enforce sysfs settings if asked. In addition,
> > + * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> > + * is not set, we don't bother checking whether the VMA has
> > VM_HUGEPAGE
> > + * set.
> > + */
> > + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > + return orders;
> > +
> > + mask = always;
> > + if (has_madvise)
> > + mask |= madvise;
> > + if (hugepage_global_always() || (has_madvise && inherit_enabled))
> > + mask |= inherit;
>
> Similarly, this can maybe become (not 100% sure if I got it right, the
> condition above is confusing)
>
> if (!has_madvise) {
> /*
> * Without VM_HUGEPAGE, only allow orders that are set to
> * ALWAYS directly ...
> */
> order &= (always | inherit);
Obviously orders is corrupted at this point so this won't work, but I'm not
sure this is right?
If no madvise, only then obey always/inherit? Hm?
> /* ... or through inheritance. */
> if (!hugepage_global_always())
> orders &= ~inherit;
I'm not sure about this ~inherit again, that means we ignore inherit no?
> }
And we need a branch for madvise too no?
I think all of this is a _clear_ example of what a mess all this is.
I think we need a deeper refactor, but I think my suggested changes make at
least what we have here less horrid to get through.
I think maybe a better refactoring that's in the spirit of this is:
if (has_madvise) {
mask |= madvise;
if (inherit_enabled)
mask |= inherit;
} else if (hugepage_global_always()) {
mask |= inherit;
}
What do you think?
>
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 13:07 ` Lorenzo Stoakes
@ 2025-06-12 13:13 ` David Hildenbrand
2025-06-12 13:31 ` Lorenzo Stoakes
0 siblings, 1 reply; 49+ messages in thread
From: David Hildenbrand @ 2025-06-12 13:13 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Baolin Wang, akpm, hughd, Liam.Howlett, npache, ryan.roberts,
dev.jain, ziy, linux-mm, linux-kernel
On 12.06.25 15:07, Lorenzo Stoakes wrote:
> On Thu, Jun 12, 2025 at 10:51:17AM +0200, David Hildenbrand wrote:
>> On 12.06.25 09:51, Baolin Wang wrote:
>>>
>>>
>>> On 2025/6/11 20:34, David Hildenbrand wrote:
>>>> On 05.06.25 10:00, Baolin Wang wrote:
>>>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings,
>>>>> which
>>>>> means that even though we have disabled the Anon THP configuration,
>>>>> MADV_COLLAPSE
>>>>> will still attempt to collapse into a Anon THP. This violates the rule
>>>>> we have
>>>>> agreed upon: never means never.
>>>>>
>>>>> Another rule for madvise, referring to David's suggestion: “allowing
>>>>> for collapsing
>>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>>>
>>>>> To address this issue, should check whether the Anon THP configuration
>>>>> is disabled
>>>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is
>>>>> set.
>>>>>
>>>>> In summary, the current strategy is:
>>>>>
>>>>> 1. If always & orders == 0, and madvise & orders == 0, and
>>>>> hugepage_global_enabled() == false
>>>>> (global THP settings are not enabled), it means mTHP of that orders
>>>>> are prohibited
>>>>> from being used, then madvise_collapse() is forbidden for that orders.
>>>>>
>>>>> 2. If always & orders == 0, and madvise & orders == 0, and
>>>>> hugepage_global_enabled() == true
>>>>> (global THP settings are enabled), and inherit & orders == 0, it means
>>>>> mTHP of that
>>>>> orders are still prohibited from being used, thus madvise_collapse()
>>>>> is not allowed
>>>>> for that orders.
>>>>>
>>>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>> ---
>>>>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>> index 2f190c90192d..199ddc9f04a1 100644
>>>>> --- a/include/linux/huge_mm.h
>>>>> +++ b/include/linux/huge_mm.h
>>>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>>>>> vm_area_struct *vma,
>>>>> unsigned long orders)
>>>>> {
>>>>> /* Optimization to check if required orders are enabled early. */
>>>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>>>> + if (vma_is_anonymous(vma)) {
>>>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>>> + unsigned long mask = always | madvise;
>>>>> +
>>>>> + /*
>>>>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>>>>> + * then we should never allow hugepages.
>>>> > + */> + if (!(mask & orders) &&
>>>> !(hugepage_global_enabled() && (inherit & orders)))
>>>>> + return 0;
>>>>
>>>> I'm still trying to digest that. Isn't there a way for us to work with
>>>> the orders,
>>>> essentially masking off all orders that are forbidden globally. Similar
>>>> to below, if !orders, then return 0?
>>>> /* Orders disabled directly. */
>>>> orders &= ~TODO;
>>>> /* Orders disabled by inheriting from the global toggle. */
>>>> if (!hugepage_global_enabled())
>>>> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>>>>
>>>> TODO is probably a -1ULL and then clearing always/madvise/inherit. Could
>>>> add a simple helper for that
>>>>
>>>> huge_anon_orders_never
>>>
>>> I followed Lorenzo's suggestion to simplify the logic. Does that look
>>> more readable?
>>>
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index 2f190c90192d..3087ac7631e0 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
>>> vm_area_struct *vma,
>>> unsigned long tva_flags,
>>> unsigned long orders);
>>>
>>> +/* Strictly mask requested anonymous orders according to sysfs settings. */
>>> +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
>>> + unsigned long tva_flags, unsigned long
>>> orders)
>>> +{
>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>> + bool inherit_enabled = hugepage_global_enabled();
>>> + bool has_madvise = vm_flags & VM_HUGEPAGE;
>>> + unsigned long mask = always | madvise;
>>> +
>>> + mask = always | madvise;
>>> + if (inherit_enabled)
>>> + mask |= inherit;
>>> +
>>> + /* All set to/inherit NEVER - never means never globally, abort. */
>>> + if (!(mask & orders))
>>> + return 0;
>>
>> Still confusing. I am not sure if we would properly catch when someone
>> specifies e.g., 2M and 1M, while we only have 2M disabled.
>
> I did wonder if we should call 'orders' something like 'requested_orders'
> or something.
>
> This check is always against the input orders which we might conceivably
> want.
>
> For instance in madvise_collapse():
>
> if (!thp_vma_allowable_order(vma, vma->vm_flags, 0, PMD_ORDER))
> return -EINVAL;
>
> I don't think, if it's only possible for PMD order collapse, and that is
> disabled, but mTHP 64 KB let's say is enabled, it'd be fine for
> MADV_COLLAPSE to succeed at the PMD order.
>
>
>>
>>
>> I would rewrite the function to only ever substract from "orders".
>
> Hm.
>
>>
>> ...
>>
>> /* Disallow orders that are set to NEVER directly ... */
>> order &= (always | madvise | inherit);
> ^s
>
> I think you mean (always | madvise) here.
>
>>
>> /* ... or through inheritance. */
>> if (inherit_enabled)
>> orders &= ~inherit;
>
> order & (inherit & ~inherit) is going to always be zero :)
>
> So this should be
>
> orders &= inherit.
>
> The problem is, when you come to the next stage where you are done checking
> the 'are we in a NEVER situation regardless of TVA_ENFORCE_SYSFS' you've
> now corrupted orders.
>
> Because you've included inherit even if !(tva_flags & TVA_ENFORCE_SYSFS).
>
> And there's no way to recover that information.
>
>>
>> /*
>> * Otherwise, we only enforce sysfs settings if asked. In addition,
>> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
>> * set.
>> */
>> if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS))
>> return orders;
>
> I don't think this is much delta to what we have now.
>
> I do wonder if we should return mask & orders here, actually, to account
> for the fact that, in theory, orders could set > PMD for
> !TVA_ENFORCE_SYSFS) (not currently the case).
>
>>
>>> +
>>> + /*
>>> + * Otherwise, we only enforce sysfs settings if asked. In addition,
>>> + * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>>> + * is not set, we don't bother checking whether the VMA has
>>> VM_HUGEPAGE
>>> + * set.
>>> + */
>>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>>> + return orders;
>>> +
>>> + mask = always;
>>> + if (has_madvise)
>>> + mask |= madvise;
>>> + if (hugepage_global_always() || (has_madvise && inherit_enabled))
>>> + mask |= inherit;
>>
>> Similarly, this can maybe become (not 100% sure if I got it right, the
>> condition above is confusing)
>>
>> if (!has_madvise) {
>> /*
>> * Without VM_HUGEPAGE, only allow orders that are set to
>> * ALWAYS directly ...
>> */
>> order &= (always | inherit);
>
> Obviously orders is corrupted at this point so this won't work, but I'm not
> sure this is right?
>
> If no madvise, only then obey always/inherit? Hm?
>
>
>> /* ... or through inheritance. */
>> if (!hugepage_global_always())
>> orders &= ~inherit;
>
> I'm not sure about this ~inherit again, that means we ignore inherit no?
>
>> }
>
> And we need a branch for madvise too no?
>
> I think all of this is a _clear_ example of what a mess all this is.
>
> I think we need a deeper refactor, but I think my suggested changes make at
> least what we have here less horrid to get through.
>
> I think maybe a better refactoring that's in the spirit of this is:
>
> if (has_madvise) {
> mask |= madvise;
> if (inherit_enabled)
> mask |= inherit;
> } else if (hugepage_global_always()) {
> mask |= inherit;
> }
>
> What do you think?
The masks are just disgusting :(
While you were writing that reply, I just sent out something else. Not
sure if that makes sense.
I'm having a hard time figuring out what we even want here in the
existing code.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 13:05 ` David Hildenbrand
@ 2025-06-12 13:25 ` Baolin Wang
2025-06-12 13:40 ` Baolin Wang
2025-06-12 13:27 ` Lorenzo Stoakes
1 sibling, 1 reply; 49+ messages in thread
From: Baolin Wang @ 2025-06-12 13:25 UTC (permalink / raw)
To: David Hildenbrand, akpm, hughd
Cc: lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On 2025/6/12 21:05, David Hildenbrand wrote:
> On 12.06.25 14:45, Baolin Wang wrote:
>>
>>
>> On 2025/6/12 16:51, David Hildenbrand wrote:
>>> On 12.06.25 09:51, Baolin Wang wrote:
>>>>
>>>>
>>>> On 2025/6/11 20:34, David Hildenbrand wrote:
>>>>> On 05.06.25 10:00, Baolin Wang wrote:
>>>>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs
>>>>>> settings,
>>>>>> which
>>>>>> means that even though we have disabled the Anon THP configuration,
>>>>>> MADV_COLLAPSE
>>>>>> will still attempt to collapse into a Anon THP. This violates the
>>>>>> rule
>>>>>> we have
>>>>>> agreed upon: never means never.
>>>>>>
>>>>>> Another rule for madvise, referring to David's suggestion: “allowing
>>>>>> for collapsing
>>>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>>>>
>>>>>> To address this issue, should check whether the Anon THP
>>>>>> configuration
>>>>>> is disabled
>>>>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS
>>>>>> flag is
>>>>>> set.
>>>>>>
>>>>>> In summary, the current strategy is:
>>>>>>
>>>>>> 1. If always & orders == 0, and madvise & orders == 0, and
>>>>>> hugepage_global_enabled() == false
>>>>>> (global THP settings are not enabled), it means mTHP of that orders
>>>>>> are prohibited
>>>>>> from being used, then madvise_collapse() is forbidden for that
>>>>>> orders.
>>>>>>
>>>>>> 2. If always & orders == 0, and madvise & orders == 0, and
>>>>>> hugepage_global_enabled() == true
>>>>>> (global THP settings are enabled), and inherit & orders == 0, it
>>>>>> means
>>>>>> mTHP of that
>>>>>> orders are still prohibited from being used, thus madvise_collapse()
>>>>>> is not allowed
>>>>>> for that orders.
>>>>>>
>>>>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>>> ---
>>>>>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>>>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>>> index 2f190c90192d..199ddc9f04a1 100644
>>>>>> --- a/include/linux/huge_mm.h
>>>>>> +++ b/include/linux/huge_mm.h
>>>>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>>>>>> vm_area_struct *vma,
>>>>>> unsigned long orders)
>>>>>> {
>>>>>> /* Optimization to check if required orders are enabled
>>>>>> early. */
>>>>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>>>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>>>>> + if (vma_is_anonymous(vma)) {
>>>>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>>>> + unsigned long mask = always | madvise;
>>>>>> +
>>>>>> + /*
>>>>>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>>>>>> + * then we should never allow hugepages.
>>>>> > + */> + if (!(mask & orders) &&
>>>>> !(hugepage_global_enabled() && (inherit & orders)))
>>>>>> + return 0;
>>>>>
>>>>> I'm still trying to digest that. Isn't there a way for us to work with
>>>>> the orders,
>>>>> essentially masking off all orders that are forbidden globally.
>>>>> Similar
>>>>> to below, if !orders, then return 0?
>>>>> /* Orders disabled directly. */
>>>>> orders &= ~TODO;
>>>>> /* Orders disabled by inheriting from the global toggle. */
>>>>> if (!hugepage_global_enabled())
>>>>> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>>>>>
>>>>> TODO is probably a -1ULL and then clearing always/madvise/inherit.
>>>>> Could
>>>>> add a simple helper for that
>>>>>
>>>>> huge_anon_orders_never
>>>>
>>>> I followed Lorenzo's suggestion to simplify the logic. Does that look
>>>> more readable?
>>>>
>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>> index 2f190c90192d..3087ac7631e0 100644
>>>> --- a/include/linux/huge_mm.h
>>>> +++ b/include/linux/huge_mm.h
>>>> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
>>>> vm_area_struct *vma,
>>>> unsigned long tva_flags,
>>>> unsigned long orders);
>>>>
>>>> +/* Strictly mask requested anonymous orders according to sysfs
>>>> settings. */
>>>> +static inline unsigned long __thp_mask_anon_orders(unsigned long
>>>> vm_flags,
>>>> + unsigned long tva_flags, unsigned long
>>>> orders)
>>>> +{
>>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>> + bool inherit_enabled = hugepage_global_enabled();
>>>> + bool has_madvise = vm_flags & VM_HUGEPAGE;
>>>> + unsigned long mask = always | madvise;
>>>> +
>>>> + mask = always | madvise;
>>>> + if (inherit_enabled)
>>>> + mask |= inherit;
>>>> +
>>>> + /* All set to/inherit NEVER - never means never globally,
>>>> abort. */
>>>> + if (!(mask & orders))
>>>> + return 0;
>>>
>>> Still confusing. I am not sure if we would properly catch when someone
>>> specifies e.g., 2M and 1M, while we only have 2M disabled.
>>
>> IIUC, Yes. In your case, we will only allow order 8 (1M mTHP).
>>
>>> I would rewrite the function to only ever substract from "orders".
>>>
>>> ...
>>>
>>> /* Disallow orders that are set to NEVER directly ... */
>>> order &= (always | madvise | inherit);
>>>
>>> /* ... or through inheritance. */
>>> if (inherit_enabled)
>>> orders &= ~inherit;
>>
>> Sorry, I didn't get you here.
>>
>> If orders = THP_ORDERS_ALL_ANON, inherit = 0x200 (order 9), always and
>> madvise are 0, and inherit_enabled = true. Then orders will be 0 with
>> your logic. But we should allow order 9, right?
>
> Yeah, all confusing, because the temporary variables don't help.
>
> if (!inherit_enabled)
>
> or simply
>
> if (!hugepage_global_enabled();)
>
> Let me try again below.
>
>>
>>>
>>> /*
>>> * Otherwise, we only enforce sysfs settings if asked. In addition,
>>> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>>> * is not set, we don't bother checking whether the VMA has
>>> VM_HUGEPAGE
>>> * set.
>>> */
>>> if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS))
>>> return orders;
>>>
>>>> +
>>>> + /*
>>>> + * Otherwise, we only enforce sysfs settings if asked. In
>>>> addition,
>>>> + * if the user sets a sysfs mode of madvise and if
>>>> TVA_ENFORCE_SYSFS
>>>> + * is not set, we don't bother checking whether the VMA has
>>>> VM_HUGEPAGE
>>>> + * set.
>>>> + */
>>>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>>>> + return orders;
>>>> +
>>>> + mask = always;
>>>> + if (has_madvise)
>>>> + mask |= madvise;
>>>> + if (hugepage_global_always() || (has_madvise &&
>>>> inherit_enabled))
>>>> + mask |= inherit;
>>>
>>> Similarly, this can maybe become (not 100% sure if I got it right, the
>>> condition above is confusing)
>>
>> IMO, this is the original logic.
>
> Yeah, and it's absolutely confusing stuff.
>
> Let me try again by only clearing flags. Maybe this would be clearer?
> (and correct? still confused why the latter part is so complicated in
> existing
> code)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 8b8f353cc7b81..66fdfe06e4996 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct
> vm_area_struct *vma,
> unsigned long tva_flags,
> unsigned long orders);
>
> +/* Strictly mask requested anonymous orders according to sysfs
> settings. */
> +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> + unsigned long tva_flags, unsigned long orders)
> +{
> + const unsigned long always = READ_ONCE(huge_anon_orders_always);
> + const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + const unsigned long never = ~(always | madvise | inherit);
> +
> + /* Disallow orders that are set to NEVER directly ... */
> + orders &= ~never;
> +
> + /* ... or through inheritance (global == NEVER). */
> + if (!hugepage_global_enabled())
> + orders &= ~inherit;
> +
> + /*
> + * Otherwise, we only enforce sysfs settings if asked. In addition,
> + * if the user sets a sysfs mode of madvise and if
> TVA_ENFORCE_SYSFS
> + * is not set, we don't bother checking whether the VMA has
> VM_HUGEPAGE
> + * set.
> + */
> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> + return orders;
> +
> + if (!(vm_flags & VM_HUGEPAGE)) {
> + /* Disallow orders that are set to MADVISE directly ... */
> + orders &= ~madvise;
> +
> + /* ... or through inheritance (global == MADVISE). */
> + if (!hugepage_global_always())
> + orders &= ~inherit;
Seems we can drop this 'inherit' check, cause if
!hugepage_global_enabled() == true, which always means
!hugepage_global_always() is true.
> + }
> + return orders;
> +}
Otherwise look good to me.
> +
> /**
> * thp_vma_allowable_orders - determine hugepage orders that are
> allowed for vma
> * @vma: the vm area to check
> @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct
> vm_area_struct *vma,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> -
> - if (vm_flags & VM_HUGEPAGE)
> - mask |= READ_ONCE(huge_anon_orders_madvise);
> - if (hugepage_global_always() ||
> - ((vm_flags & VM_HUGEPAGE) &&
> hugepage_global_enabled()))
> - mask |= READ_ONCE(huge_anon_orders_inherit);
> -
> - orders &= mask;
> + if (vma_is_anonymous(vma)) {
> + orders = __thp_mask_anon_orders(vm_flags, tva_flags,
> orders);
> if (!orders)
> return 0;
> }
>
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 13:05 ` David Hildenbrand
2025-06-12 13:25 ` Baolin Wang
@ 2025-06-12 13:27 ` Lorenzo Stoakes
2025-06-12 13:29 ` Lorenzo Stoakes
2025-06-12 14:09 ` David Hildenbrand
1 sibling, 2 replies; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-12 13:27 UTC (permalink / raw)
To: David Hildenbrand
Cc: Baolin Wang, akpm, hughd, Liam.Howlett, npache, ryan.roberts,
dev.jain, ziy, linux-mm, linux-kernel
On Thu, Jun 12, 2025 at 03:05:22PM +0200, David Hildenbrand wrote:
> On 12.06.25 14:45, Baolin Wang wrote:
> >
> >
> > On 2025/6/12 16:51, David Hildenbrand wrote:
> > > On 12.06.25 09:51, Baolin Wang wrote:
> > > >
> > > >
> > > > On 2025/6/11 20:34, David Hildenbrand wrote:
> > > > > On 05.06.25 10:00, Baolin Wang wrote:
> > > > > > The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs settings,
> > > > > > which
> > > > > > means that even though we have disabled the Anon THP configuration,
> > > > > > MADV_COLLAPSE
> > > > > > will still attempt to collapse into a Anon THP. This violates the rule
> > > > > > we have
> > > > > > agreed upon: never means never.
> > > > > >
> > > > > > Another rule for madvise, referring to David's suggestion: “allowing
> > > > > > for collapsing
> > > > > > in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
> > > > > >
> > > > > > To address this issue, should check whether the Anon THP configuration
> > > > > > is disabled
> > > > > > in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS flag is
> > > > > > set.
> > > > > >
> > > > > > In summary, the current strategy is:
> > > > > >
> > > > > > 1. If always & orders == 0, and madvise & orders == 0, and
> > > > > > hugepage_global_enabled() == false
> > > > > > (global THP settings are not enabled), it means mTHP of that orders
> > > > > > are prohibited
> > > > > > from being used, then madvise_collapse() is forbidden for that orders.
> > > > > >
> > > > > > 2. If always & orders == 0, and madvise & orders == 0, and
> > > > > > hugepage_global_enabled() == true
> > > > > > (global THP settings are enabled), and inherit & orders == 0, it means
> > > > > > mTHP of that
> > > > > > orders are still prohibited from being used, thus madvise_collapse()
> > > > > > is not allowed
> > > > > > for that orders.
> > > > > >
> > > > > > Reviewed-by: Zi Yan <ziy@nvidia.com>
> > > > > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > > > > ---
> > > > > > include/linux/huge_mm.h | 23 +++++++++++++++++++----
> > > > > > 1 file changed, 19 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > > > index 2f190c90192d..199ddc9f04a1 100644
> > > > > > --- a/include/linux/huge_mm.h
> > > > > > +++ b/include/linux/huge_mm.h
> > > > > > @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
> > > > > > vm_area_struct *vma,
> > > > > > unsigned long orders)
> > > > > > {
> > > > > > /* Optimization to check if required orders are enabled
> > > > > > early. */
> > > > > > - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> > > > > > - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> > > > > > + if (vma_is_anonymous(vma)) {
> > > > > > + unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > > > > + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > > > > + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> > > > > > + unsigned long mask = always | madvise;
> > > > > > +
> > > > > > + /*
> > > > > > + * If the system-wide THP/mTHP sysfs settings are disabled,
> > > > > > + * then we should never allow hugepages.
> > > > > > + */> + if (!(mask & orders) &&
> > > > > !(hugepage_global_enabled() && (inherit & orders)))
> > > > > > + return 0;
> > > > >
> > > > > I'm still trying to digest that. Isn't there a way for us to work with
> > > > > the orders,
> > > > > essentially masking off all orders that are forbidden globally. Similar
> > > > > to below, if !orders, then return 0?
> > > > > /* Orders disabled directly. */
> > > > > orders &= ~TODO;
> > > > > /* Orders disabled by inheriting from the global toggle. */
> > > > > if (!hugepage_global_enabled())
> > > > > orders &= ~READ_ONCE(huge_anon_orders_inherit);
> > > > >
> > > > > TODO is probably a -1ULL and then clearing always/madvise/inherit. Could
> > > > > add a simple helper for that
> > > > >
> > > > > huge_anon_orders_never
> > > >
> > > > I followed Lorenzo's suggestion to simplify the logic. Does that look
> > > > more readable?
> > > >
> > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > > > index 2f190c90192d..3087ac7631e0 100644
> > > > --- a/include/linux/huge_mm.h
> > > > +++ b/include/linux/huge_mm.h
> > > > @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
> > > > vm_area_struct *vma,
> > > > unsigned long tva_flags,
> > > > unsigned long orders);
> > > >
> > > > +/* Strictly mask requested anonymous orders according to sysfs
> > > > settings. */
> > > > +static inline unsigned long __thp_mask_anon_orders(unsigned long
> > > > vm_flags,
> > > > + unsigned long tva_flags, unsigned long
> > > > orders)
> > > > +{
> > > > + unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > > + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > > + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> > > > + bool inherit_enabled = hugepage_global_enabled();
> > > > + bool has_madvise = vm_flags & VM_HUGEPAGE;
> > > > + unsigned long mask = always | madvise;
> > > > +
> > > > + mask = always | madvise;
> > > > + if (inherit_enabled)
> > > > + mask |= inherit;
> > > > +
> > > > + /* All set to/inherit NEVER - never means never globally,
> > > > abort. */
> > > > + if (!(mask & orders))
> > > > + return 0;
> > >
> > > Still confusing. I am not sure if we would properly catch when someone
> > > specifies e.g., 2M and 1M, while we only have 2M disabled.
> >
> > IIUC, Yes. In your case, we will only allow order 8 (1M mTHP).
> >
> > > I would rewrite the function to only ever substract from "orders".
> > >
> > > ...
> > >
> > > /* Disallow orders that are set to NEVER directly ... */
> > > order &= (always | madvise | inherit);
> > >
> > > /* ... or through inheritance. */
> > > if (inherit_enabled)
> > > orders &= ~inherit;
> >
> > Sorry, I didn't get you here.
> >
> > If orders = THP_ORDERS_ALL_ANON, inherit = 0x200 (order 9), always and
> > madvise are 0, and inherit_enabled = true. Then orders will be 0 with
> > your logic. But we should allow order 9, right?
>
> Yeah, all confusing, because the temporary variables don't help.
>
> if (!inherit_enabled)
>
> or simply
>
> if (!hugepage_global_enabled();)
>
> Let me try again below.
>
> >
> > >
> > > /*
> > > * Otherwise, we only enforce sysfs settings if asked. In addition,
> > > * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> > > * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> > > * set.
> > > */
> > > if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS))
> > > return orders;
> > >
> > > > +
> > > > + /*
> > > > + * Otherwise, we only enforce sysfs settings if asked. In
> > > > addition,
> > > > + * if the user sets a sysfs mode of madvise and if
> > > > TVA_ENFORCE_SYSFS
> > > > + * is not set, we don't bother checking whether the VMA has
> > > > VM_HUGEPAGE
> > > > + * set.
> > > > + */
> > > > + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > > > + return orders;
> > > > +
> > > > + mask = always;
> > > > + if (has_madvise)
> > > > + mask |= madvise;
> > > > + if (hugepage_global_always() || (has_madvise && inherit_enabled))
> > > > + mask |= inherit;
> > >
> > > Similarly, this can maybe become (not 100% sure if I got it right, the
> > > condition above is confusing)
> >
> > IMO, this is the original logic.
>
> Yeah, and it's absolutely confusing stuff.
>
> Let me try again by only clearing flags. Maybe this would be clearer?
> (and correct? still confused why the latter part is so complicated in existing
> code)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 8b8f353cc7b81..66fdfe06e4996 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long tva_flags,
> unsigned long orders);
> +/* Strictly mask requested anonymous orders according to sysfs settings. */
> +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> + unsigned long tva_flags, unsigned long orders)
> +{
> + const unsigned long always = READ_ONCE(huge_anon_orders_always);
> + const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> + const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> + const unsigned long never = ~(always | madvise | inherit);
> +
> + /* Disallow orders that are set to NEVER directly ... */
> + orders &= ~never;
> +
> + /* ... or through inheritance (global == NEVER). */
> + if (!hugepage_global_enabled())
> + orders &= ~inherit;
> +
> + /*
> + * Otherwise, we only enforce sysfs settings if asked. In addition,
> + * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> + * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> + * set.
> + */
> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> + return orders;
This implicitly does a & mask as per suggested previous version, which I think
is correct but worth pointing out.
> +
> + if (!(vm_flags & VM_HUGEPAGE)) {
Don't love this sort of mega negation here. I read this as _does_ have huge
page...
> + /* Disallow orders that are set to MADVISE directly ... */
> + orders &= ~madvise;
> +
> + /* ... or through inheritance (global == MADVISE). */
> + if (!hugepage_global_always())
> + orders &= ~inherit;
I hate this implicit 'not hugepage global always so this means either never or
madvise and since we cleared orders for never this means madvise' mental
gymnastics required here.
Yeah I feel this is a bridge too far, we're getting into double negation and I
think that's more confusiong.
> + }
I propose a compromise as I rather like your 'exclude never' negation bit.
So:
/* Strictly mask requested anonymous orders according to sysfs settings. */
static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
unsigned long tva_flags, unsigned long orders)
{
const unsigned long always = READ_ONCE(huge_anon_orders_always);
const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
const unsigned long never = ~(always | madvise | inherit);
const bool inherit_enabled = hugepage_global_enabled();
/* Disallow orders that are set to NEVER directly ... */
orders &= ~never;
/* ... or through inheritance (global == NEVER). */
if (!inherit_enabled)
orders &= ~inherit;
/*
* Otherwise, we only enforce sysfs settings if asked. In addition,
* if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
* is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
* set.
*/
if (!(tva_flags & TVA_ENFORCE_SYSFS))
return orders;
if (hugepage_global_always())
return orders & (always | inherit);
/* We already excluded never inherit above. */
if (vm_flags & VM_HUGEPAGE)
return orders & (always | madvise | inherit);
return orders & always;
}
What do you think?
> + return orders;
> +}
> +
> /**
> * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
> * @vma: the vm area to check
> @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> -
> - if (vm_flags & VM_HUGEPAGE)
> - mask |= READ_ONCE(huge_anon_orders_madvise);
> - if (hugepage_global_always() ||
> - ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> - mask |= READ_ONCE(huge_anon_orders_inherit);
> -
> - orders &= mask;
> + if (vma_is_anonymous(vma)) {
> + orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
> if (!orders)
> return 0;
I pointed out to Baolin that __thp_vma_allowable_orders() handles the orders ==
0 case almost immediately so there's no need to do this, it just makes the code
noisier.
I mean we _could_ keep it but I think it's better not to for cleanliness, what
do you think?
> }
>
>
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 13:27 ` Lorenzo Stoakes
@ 2025-06-12 13:29 ` Lorenzo Stoakes
2025-06-12 14:13 ` Baolin Wang
2025-06-12 14:09 ` David Hildenbrand
1 sibling, 1 reply; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-12 13:29 UTC (permalink / raw)
To: David Hildenbrand
Cc: Baolin Wang, akpm, hughd, Liam.Howlett, npache, ryan.roberts,
dev.jain, ziy, linux-mm, linux-kernel
On Thu, Jun 12, 2025 at 02:27:06PM +0100, Lorenzo Stoakes wrote:
[snip]
> I propose a compromise as I rather like your 'exclude never' negation bit.
>
> So:
>
> /* Strictly mask requested anonymous orders according to sysfs settings. */
> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> unsigned long tva_flags, unsigned long orders)
> {
> const unsigned long always = READ_ONCE(huge_anon_orders_always);
> const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> const unsigned long never = ~(always | madvise | inherit);
> const bool inherit_enabled = hugepage_global_enabled();
>
> /* Disallow orders that are set to NEVER directly ... */
> orders &= ~never;
>
> /* ... or through inheritance (global == NEVER). */
> if (!inherit_enabled)
> orders &= ~inherit;
>
> /*
> * Otherwise, we only enforce sysfs settings if asked. In addition,
> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> * set.
> */
> if (!(tva_flags & TVA_ENFORCE_SYSFS))
> return orders;
>
> if (hugepage_global_always())
> return orders & (always | inherit);
>
> /* We already excluded never inherit above. */
> if (vm_flags & VM_HUGEPAGE)
> return orders & (always | madvise | inherit);
Of course... I immediately made a mistake... swap these two statements around. I
thought it'd be 'neater' to do the first one first, but of course it means
madvise (rather than inherit) orders don't get selected.
This WHOLE THING needs refactoring.
>
> return orders & always;
> }
>
> What do you think?
>
>
> > + return orders;
> > +}
> > +
> > /**
> > * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
> > * @vma: the vm area to check
> > @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> > unsigned long orders)
> > {
> > /* Optimization to check if required orders are enabled early. */
> > - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> > - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> > -
> > - if (vm_flags & VM_HUGEPAGE)
> > - mask |= READ_ONCE(huge_anon_orders_madvise);
> > - if (hugepage_global_always() ||
> > - ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> > - mask |= READ_ONCE(huge_anon_orders_inherit);
> > -
> > - orders &= mask;
> > + if (vma_is_anonymous(vma)) {
> > + orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
> > if (!orders)
> > return 0;
>
> I pointed out to Baolin that __thp_vma_allowable_orders() handles the orders ==
> 0 case almost immediately so there's no need to do this, it just makes the code
> noisier.
>
> I mean we _could_ keep it but I think it's better not to for cleanliness, what
> do you think?
>
> > }
> >
> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 13:13 ` David Hildenbrand
@ 2025-06-12 13:31 ` Lorenzo Stoakes
0 siblings, 0 replies; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-12 13:31 UTC (permalink / raw)
To: David Hildenbrand
Cc: Baolin Wang, akpm, hughd, Liam.Howlett, npache, ryan.roberts,
dev.jain, ziy, linux-mm, linux-kernel
On Thu, Jun 12, 2025 at 03:13:19PM +0200, David Hildenbrand wrote:
[snip]
>
> The masks are just disgusting :(
Yeah I was shocked when I saw this implementation it's bloody terrible. It just
is.
>
> While you were writing that reply, I just sent out something else. Not sure
> if that makes sense.
Replied there :>)
>
> I'm having a hard time figuring out what we even want here in the existing
> code.
I'm confused as to why this has been done in this way in general.
I feel a churny refactoring coming on... ;)
>
> --
> Cheers,
>
> David / dhildenb
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 13:25 ` Baolin Wang
@ 2025-06-12 13:40 ` Baolin Wang
0 siblings, 0 replies; 49+ messages in thread
From: Baolin Wang @ 2025-06-12 13:40 UTC (permalink / raw)
To: David Hildenbrand, akpm, hughd
Cc: lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On 2025/6/12 21:25, Baolin Wang wrote:
>
>
> On 2025/6/12 21:05, David Hildenbrand wrote:
>> On 12.06.25 14:45, Baolin Wang wrote:
>>>
>>>
>>> On 2025/6/12 16:51, David Hildenbrand wrote:
>>>> On 12.06.25 09:51, Baolin Wang wrote:
>>>>>
>>>>>
>>>>> On 2025/6/11 20:34, David Hildenbrand wrote:
>>>>>> On 05.06.25 10:00, Baolin Wang wrote:
>>>>>>> The MADV_COLLAPSE will ignore the system-wide Anon THP sysfs
>>>>>>> settings,
>>>>>>> which
>>>>>>> means that even though we have disabled the Anon THP configuration,
>>>>>>> MADV_COLLAPSE
>>>>>>> will still attempt to collapse into a Anon THP. This violates the
>>>>>>> rule
>>>>>>> we have
>>>>>>> agreed upon: never means never.
>>>>>>>
>>>>>>> Another rule for madvise, referring to David's suggestion: “allowing
>>>>>>> for collapsing
>>>>>>> in a VM without VM_HUGEPAGE in the "madvise" mode would be fine".
>>>>>>>
>>>>>>> To address this issue, should check whether the Anon THP
>>>>>>> configuration
>>>>>>> is disabled
>>>>>>> in thp_vma_allowable_orders(), even when the TVA_ENFORCE_SYSFS
>>>>>>> flag is
>>>>>>> set.
>>>>>>>
>>>>>>> In summary, the current strategy is:
>>>>>>>
>>>>>>> 1. If always & orders == 0, and madvise & orders == 0, and
>>>>>>> hugepage_global_enabled() == false
>>>>>>> (global THP settings are not enabled), it means mTHP of that orders
>>>>>>> are prohibited
>>>>>>> from being used, then madvise_collapse() is forbidden for that
>>>>>>> orders.
>>>>>>>
>>>>>>> 2. If always & orders == 0, and madvise & orders == 0, and
>>>>>>> hugepage_global_enabled() == true
>>>>>>> (global THP settings are enabled), and inherit & orders == 0, it
>>>>>>> means
>>>>>>> mTHP of that
>>>>>>> orders are still prohibited from being used, thus madvise_collapse()
>>>>>>> is not allowed
>>>>>>> for that orders.
>>>>>>>
>>>>>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>>>> ---
>>>>>>> include/linux/huge_mm.h | 23 +++++++++++++++++++----
>>>>>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>>>> index 2f190c90192d..199ddc9f04a1 100644
>>>>>>> --- a/include/linux/huge_mm.h
>>>>>>> +++ b/include/linux/huge_mm.h
>>>>>>> @@ -287,20 +287,35 @@ unsigned long thp_vma_allowable_orders(struct
>>>>>>> vm_area_struct *vma,
>>>>>>> unsigned long orders)
>>>>>>> {
>>>>>>> /* Optimization to check if required orders are enabled
>>>>>>> early. */
>>>>>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>>>>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>>>>>> + if (vma_is_anonymous(vma)) {
>>>>>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>>>>> + unsigned long madvise =
>>>>>>> READ_ONCE(huge_anon_orders_madvise);
>>>>>>> + unsigned long inherit =
>>>>>>> READ_ONCE(huge_anon_orders_inherit);
>>>>>>> + unsigned long mask = always | madvise;
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * If the system-wide THP/mTHP sysfs settings are disabled,
>>>>>>> + * then we should never allow hugepages.
>>>>>> > + */> + if (!(mask & orders) &&
>>>>>> !(hugepage_global_enabled() && (inherit & orders)))
>>>>>>> + return 0;
>>>>>>
>>>>>> I'm still trying to digest that. Isn't there a way for us to work
>>>>>> with
>>>>>> the orders,
>>>>>> essentially masking off all orders that are forbidden globally.
>>>>>> Similar
>>>>>> to below, if !orders, then return 0?
>>>>>> /* Orders disabled directly. */
>>>>>> orders &= ~TODO;
>>>>>> /* Orders disabled by inheriting from the global toggle. */
>>>>>> if (!hugepage_global_enabled())
>>>>>> orders &= ~READ_ONCE(huge_anon_orders_inherit);
>>>>>>
>>>>>> TODO is probably a -1ULL and then clearing always/madvise/inherit.
>>>>>> Could
>>>>>> add a simple helper for that
>>>>>>
>>>>>> huge_anon_orders_never
>>>>>
>>>>> I followed Lorenzo's suggestion to simplify the logic. Does that look
>>>>> more readable?
>>>>>
>>>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>>>> index 2f190c90192d..3087ac7631e0 100644
>>>>> --- a/include/linux/huge_mm.h
>>>>> +++ b/include/linux/huge_mm.h
>>>>> @@ -265,6 +265,43 @@ unsigned long __thp_vma_allowable_orders(struct
>>>>> vm_area_struct *vma,
>>>>> unsigned long tva_flags,
>>>>> unsigned long orders);
>>>>>
>>>>> +/* Strictly mask requested anonymous orders according to sysfs
>>>>> settings. */
>>>>> +static inline unsigned long __thp_mask_anon_orders(unsigned long
>>>>> vm_flags,
>>>>> + unsigned long tva_flags, unsigned long
>>>>> orders)
>>>>> +{
>>>>> + unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>>> + unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>>> + unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>>> + bool inherit_enabled = hugepage_global_enabled();
>>>>> + bool has_madvise = vm_flags & VM_HUGEPAGE;
>>>>> + unsigned long mask = always | madvise;
>>>>> +
>>>>> + mask = always | madvise;
>>>>> + if (inherit_enabled)
>>>>> + mask |= inherit;
>>>>> +
>>>>> + /* All set to/inherit NEVER - never means never globally,
>>>>> abort. */
>>>>> + if (!(mask & orders))
>>>>> + return 0;
>>>>
>>>> Still confusing. I am not sure if we would properly catch when someone
>>>> specifies e.g., 2M and 1M, while we only have 2M disabled.
>>>
>>> IIUC, Yes. In your case, we will only allow order 8 (1M mTHP).
>>>
>>>> I would rewrite the function to only ever substract from "orders".
>>>>
>>>> ...
>>>>
>>>> /* Disallow orders that are set to NEVER directly ... */
>>>> order &= (always | madvise | inherit);
>>>>
>>>> /* ... or through inheritance. */
>>>> if (inherit_enabled)
>>>> orders &= ~inherit;
>>>
>>> Sorry, I didn't get you here.
>>>
>>> If orders = THP_ORDERS_ALL_ANON, inherit = 0x200 (order 9), always and
>>> madvise are 0, and inherit_enabled = true. Then orders will be 0 with
>>> your logic. But we should allow order 9, right?
>>
>> Yeah, all confusing, because the temporary variables don't help.
>>
>> if (!inherit_enabled)
>>
>> or simply
>>
>> if (!hugepage_global_enabled();)
>>
>> Let me try again below.
>>
>>>
>>>>
>>>> /*
>>>> * Otherwise, we only enforce sysfs settings if asked. In addition,
>>>> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>>>> * is not set, we don't bother checking whether the VMA has
>>>> VM_HUGEPAGE
>>>> * set.
>>>> */
>>>> if (!orders || !(tva_flags & TVA_ENFORCE_SYSFS))
>>>> return orders;
>>>>
>>>>> +
>>>>> + /*
>>>>> + * Otherwise, we only enforce sysfs settings if asked. In
>>>>> addition,
>>>>> + * if the user sets a sysfs mode of madvise and if
>>>>> TVA_ENFORCE_SYSFS
>>>>> + * is not set, we don't bother checking whether the VMA has
>>>>> VM_HUGEPAGE
>>>>> + * set.
>>>>> + */
>>>>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>>>>> + return orders;
>>>>> +
>>>>> + mask = always;
>>>>> + if (has_madvise)
>>>>> + mask |= madvise;
>>>>> + if (hugepage_global_always() || (has_madvise &&
>>>>> inherit_enabled))
>>>>> + mask |= inherit;
>>>>
>>>> Similarly, this can maybe become (not 100% sure if I got it right, the
>>>> condition above is confusing)
>>>
>>> IMO, this is the original logic.
>>
>> Yeah, and it's absolutely confusing stuff.
>>
>> Let me try again by only clearing flags. Maybe this would be clearer?
>> (and correct? still confused why the latter part is so complicated in
>> existing
>> code)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 8b8f353cc7b81..66fdfe06e4996 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct
>> vm_area_struct *vma,
>> unsigned long tva_flags,
>> unsigned long orders);
>>
>> +/* Strictly mask requested anonymous orders according to sysfs
>> settings. */
>> +static inline unsigned long __thp_mask_anon_orders(unsigned long
>> vm_flags,
>> + unsigned long tva_flags, unsigned long orders)
>> +{
>> + const unsigned long always = READ_ONCE(huge_anon_orders_always);
>> + const unsigned long madvise =
>> READ_ONCE(huge_anon_orders_madvise);
>> + const unsigned long inherit =
>> READ_ONCE(huge_anon_orders_inherit);
>> + const unsigned long never = ~(always | madvise | inherit);
>> +
>> + /* Disallow orders that are set to NEVER directly ... */
>> + orders &= ~never;
>> +
>> + /* ... or through inheritance (global == NEVER). */
>> + if (!hugepage_global_enabled())
>> + orders &= ~inherit;
>> +
>> + /*
>> + * Otherwise, we only enforce sysfs settings if asked. In
>> addition,
>> + * if the user sets a sysfs mode of madvise and if
>> TVA_ENFORCE_SYSFS
>> + * is not set, we don't bother checking whether the VMA has
>> VM_HUGEPAGE
>> + * set.
>> + */
>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>> + return orders;
>> +
>> + if (!(vm_flags & VM_HUGEPAGE)) {
>> + /* Disallow orders that are set to MADVISE directly
>> ... */
>> + orders &= ~madvise;
>> +
>> + /* ... or through inheritance (global == MADVISE). */
>> + if (!hugepage_global_always())
>> + orders &= ~inherit;
>
> Seems we can drop this 'inherit' check, cause if
> !hugepage_global_enabled() == true, which always means
> !hugepage_global_always() is true.
Please ignore this incorrect comment. Logic is confusing :)
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 13:27 ` Lorenzo Stoakes
2025-06-12 13:29 ` Lorenzo Stoakes
@ 2025-06-12 14:09 ` David Hildenbrand
2025-06-12 14:49 ` Lorenzo Stoakes
1 sibling, 1 reply; 49+ messages in thread
From: David Hildenbrand @ 2025-06-12 14:09 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Baolin Wang, akpm, hughd, Liam.Howlett, npache, ryan.roberts,
dev.jain, ziy, linux-mm, linux-kernel
>> @@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>> unsigned long tva_flags,
>> unsigned long orders);
>> +/* Strictly mask requested anonymous orders according to sysfs settings. */
>> +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
>> + unsigned long tva_flags, unsigned long orders)
>> +{
>> + const unsigned long always = READ_ONCE(huge_anon_orders_always);
>> + const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> + const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>> + const unsigned long never = ~(always | madvise | inherit);
>> +
>> + /* Disallow orders that are set to NEVER directly ... */
>> + orders &= ~never;
>> +
>> + /* ... or through inheritance (global == NEVER). */
>> + if (!hugepage_global_enabled())
>> + orders &= ~inherit;
>> +
>> + /*
>> + * Otherwise, we only enforce sysfs settings if asked. In addition,
>> + * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>> + * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
>> + * set.
>> + */
>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>> + return orders;
>
> This implicitly does a & mask as per suggested previous version, which I think
> is correct but worth pointing out.
Yes.
>
>> +
>> + if (!(vm_flags & VM_HUGEPAGE)) {
>
> Don't love this sort of mega negation here. I read this as _does_ have huge
> page...
Well, it's very common to do that, but not objecting to something that
is clearer ;)
I assume you spotted the
if (!(tva_flags & TVA_ENFORCE_SYSFS))
:P
if (vm_flags & VM_HUGEPAGE)
return orders;
Would have been easier.
>
>> + /* Disallow orders that are set to MADVISE directly ... */
>> + orders &= ~madvise;
>> +
>> + /* ... or through inheritance (global == MADVISE). */
>> + if (!hugepage_global_always())
>> + orders &= ~inherit;
>
> I hate this implicit 'not hugepage global always so this means either never or
> madvise and since we cleared orders for never this means madvise' mental
> gymnastics required here.
>
> Yeah I feel this is a bridge too far, we're getting into double negation and I
> think that's more confusiong.
Same here ... I think we should just have hugepage_global_madvise(). :)
>
>
>> + }
>
> I propose a compromise as I rather like your 'exclude never' negation bit.
>
> So:
>
> /* Strictly mask requested anonymous orders according to sysfs settings. */
> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> unsigned long tva_flags, unsigned long orders)
> {
> const unsigned long always = READ_ONCE(huge_anon_orders_always);
> const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> const unsigned long never = ~(always | madvise | inherit);
> const bool inherit_enabled = hugepage_global_enabled();
Can we just have hugepage_global_never/disabled() to use instead?
>
> /* Disallow orders that are set to NEVER directly ... */
> orders &= ~never;
>
> /* ... or through inheritance (global == NEVER). */
> if (!inherit_enabled)
> orders &= ~inherit;>
> /*
> * Otherwise, we only enforce sysfs settings if asked. In addition,
> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> * set.
> */
> if (!(tva_flags & TVA_ENFORCE_SYSFS))
> return orders;
>
> if (hugepage_global_always())
> return orders & (always | inherit);
>
> /* We already excluded never inherit above. */
> if (vm_flags & VM_HUGEPAGE)
> return orders & (always | madvise | inherit);
>
> return orders & always;
> }
>
> What do you think?
With the fixup, it would work for me. No magical "mask" variables :D
> >
>> + return orders;
>> +}
>> +
>> /**
>> * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
>> * @vma: the vm area to check
>> @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>> unsigned long orders)
>> {
>> /* Optimization to check if required orders are enabled early. */
>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>> -
>> - if (vm_flags & VM_HUGEPAGE)
>> - mask |= READ_ONCE(huge_anon_orders_madvise);
>> - if (hugepage_global_always() ||
>> - ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
>> - mask |= READ_ONCE(huge_anon_orders_inherit);
>> -
>> - orders &= mask;
>> + if (vma_is_anonymous(vma)) {
>> + orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
>> if (!orders)
>> return 0;
>
> I pointed out to Baolin that __thp_vma_allowable_orders() handles the orders ==
> 0 case almost immediately so there's no need to do this, it just makes the code
> noisier.
The reason we added it in the first place was to not do the (expensive)
function call.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 13:29 ` Lorenzo Stoakes
@ 2025-06-12 14:13 ` Baolin Wang
2025-06-12 14:16 ` David Hildenbrand
2025-06-12 14:20 ` Lorenzo Stoakes
0 siblings, 2 replies; 49+ messages in thread
From: Baolin Wang @ 2025-06-12 14:13 UTC (permalink / raw)
To: Lorenzo Stoakes, David Hildenbrand
Cc: akpm, hughd, Liam.Howlett, npache, ryan.roberts, dev.jain, ziy,
linux-mm, linux-kernel
On 2025/6/12 21:29, Lorenzo Stoakes wrote:
> On Thu, Jun 12, 2025 at 02:27:06PM +0100, Lorenzo Stoakes wrote:
> [snip]
>
>> I propose a compromise as I rather like your 'exclude never' negation bit.
>>
>> So:
>>
>> /* Strictly mask requested anonymous orders according to sysfs settings. */
>> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
>> unsigned long tva_flags, unsigned long orders)
>> {
>> const unsigned long always = READ_ONCE(huge_anon_orders_always);
>> const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>> const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
>> const unsigned long never = ~(always | madvise | inherit);
>> const bool inherit_enabled = hugepage_global_enabled();
>>
>> /* Disallow orders that are set to NEVER directly ... */
>> orders &= ~never;
>>
>> /* ... or through inheritance (global == NEVER). */
>> if (!inherit_enabled)
>> orders &= ~inherit;
>>
>> /*
>> * Otherwise, we only enforce sysfs settings if asked. In addition,
>> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
>> * set.
>> */
>> if (!(tva_flags & TVA_ENFORCE_SYSFS))
>> return orders;
>>
>> if (hugepage_global_always())
>> return orders & (always | inherit);
>>
>> /* We already excluded never inherit above. */
>> if (vm_flags & VM_HUGEPAGE)
>> return orders & (always | madvise | inherit);
>
> Of course... I immediately made a mistake... swap these two statements around. I
> thought it'd be 'neater' to do the first one first, but of course it means
> madvise (rather than inherit) orders don't get selected.
>
> This WHOLE THING needs refactoring.
Personally, I think the 'exclude never' logic becomes more complicated.
I made a simpler change without adding a new helper. What do you think?
static inline
unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long vm_flags,
unsigned long tva_flags,
unsigned long orders)
{
/* Optimization to check if required orders are enabled early. */
if (vma_is_anonymous(vma)) {
unsigned long mask = READ_ONCE(huge_anon_orders_always);
bool huge_enforce = !(tva_flags & TVA_ENFORCE_SYSFS);
bool has_madvise = vm_flags & VM_HUGEPAGE;
/*
* if the user sets a sysfs mode of madvise and if
TVA_ENFORCE_SYSFS
* is not set, we don't bother checking whether the VMA
has VM_HUGEPAGE
* set.
*/
if (huge_enforce || has_madvise)
mask |= READ_ONCE(huge_anon_orders_madvise);
if (hugepage_global_always() ||
((has_madvise || huge_enforce) &&
hugepage_global_enabled()))
mask |= READ_ONCE(huge_anon_orders_inherit);
orders &= mask;
if (!orders)
return 0;
}
return __thp_vma_allowable_orders(vma, vm_flags, tva_flags,
orders);
}
>
>>
>> return orders & always;
>> }
>>
>> What do you think?
>>
>>
>>> + return orders;
>>> +}
>>> +
>>> /**
>>> * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
>>> * @vma: the vm area to check
>>> @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>>> unsigned long orders)
>>> {
>>> /* Optimization to check if required orders are enabled early. */
>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>> -
>>> - if (vm_flags & VM_HUGEPAGE)
>>> - mask |= READ_ONCE(huge_anon_orders_madvise);
>>> - if (hugepage_global_always() ||
>>> - ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
>>> - mask |= READ_ONCE(huge_anon_orders_inherit);
>>> -
>>> - orders &= mask;
>>> + if (vma_is_anonymous(vma)) {
>>> + orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
>>> if (!orders)
>>> return 0;
>>
>> I pointed out to Baolin that __thp_vma_allowable_orders() handles the orders ==
>> 0 case almost immediately so there's no need to do this, it just makes the code
>> noisier.
>>
>> I mean we _could_ keep it but I think it's better not to for cleanliness, what
>> do you think?
>>
>>> }
>>>
>>>
>>> --
>>> Cheers,
>>>
>>> David / dhildenb
>>>
>>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 14:13 ` Baolin Wang
@ 2025-06-12 14:16 ` David Hildenbrand
2025-06-12 14:20 ` Lorenzo Stoakes
1 sibling, 0 replies; 49+ messages in thread
From: David Hildenbrand @ 2025-06-12 14:16 UTC (permalink / raw)
To: Baolin Wang, Lorenzo Stoakes
Cc: akpm, hughd, Liam.Howlett, npache, ryan.roberts, dev.jain, ziy,
linux-mm, linux-kernel
On 12.06.25 16:13, Baolin Wang wrote:
>
>
> On 2025/6/12 21:29, Lorenzo Stoakes wrote:
>> On Thu, Jun 12, 2025 at 02:27:06PM +0100, Lorenzo Stoakes wrote:
>> [snip]
>>
>>> I propose a compromise as I rather like your 'exclude never' negation bit.
>>>
>>> So:
>>>
>>> /* Strictly mask requested anonymous orders according to sysfs settings. */
>>> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
>>> unsigned long tva_flags, unsigned long orders)
>>> {
>>> const unsigned long always = READ_ONCE(huge_anon_orders_always);
>>> const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>> const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
>>> const unsigned long never = ~(always | madvise | inherit);
>>> const bool inherit_enabled = hugepage_global_enabled();
>>>
>>> /* Disallow orders that are set to NEVER directly ... */
>>> orders &= ~never;
>>>
>>> /* ... or through inheritance (global == NEVER). */
>>> if (!inherit_enabled)
>>> orders &= ~inherit;
>>>
>>> /*
>>> * Otherwise, we only enforce sysfs settings if asked. In addition,
>>> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>>> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
>>> * set.
>>> */
>>> if (!(tva_flags & TVA_ENFORCE_SYSFS))
>>> return orders;
>>>
>>> if (hugepage_global_always())
>>> return orders & (always | inherit);
>>>
>>> /* We already excluded never inherit above. */
>>> if (vm_flags & VM_HUGEPAGE)
>>> return orders & (always | madvise | inherit);
>>
>> Of course... I immediately made a mistake... swap these two statements around. I
>> thought it'd be 'neater' to do the first one first, but of course it means
>> madvise (rather than inherit) orders don't get selected.
>>
>> This WHOLE THING needs refactoring.
>
> Personally, I think the 'exclude never' logic becomes more complicated.
> I made a simpler change without adding a new helper. What do you think?
>
> static inline
> unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long vm_flags,
> unsigned long tva_flags,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> if (vma_is_anonymous(vma)) {
> unsigned long mask = READ_ONCE(huge_anon_orders_always);
> bool huge_enforce = !(tva_flags & TVA_ENFORCE_SYSFS);
> bool has_madvise = vm_flags & VM_HUGEPAGE;
>
> /*
> * if the user sets a sysfs mode of madvise and if
> TVA_ENFORCE_SYSFS
> * is not set, we don't bother checking whether the VMA
> has VM_HUGEPAGE
> * set.
> */
> if (huge_enforce || has_madvise)
> mask |= READ_ONCE(huge_anon_orders_madvise);
> if (hugepage_global_always() ||
> ((has_madvise || huge_enforce) &&
> hugepage_global_enabled()))
OMG is that ugly.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 14:13 ` Baolin Wang
2025-06-12 14:16 ` David Hildenbrand
@ 2025-06-12 14:20 ` Lorenzo Stoakes
1 sibling, 0 replies; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-12 14:20 UTC (permalink / raw)
To: Baolin Wang
Cc: David Hildenbrand, akpm, hughd, Liam.Howlett, npache,
ryan.roberts, dev.jain, ziy, linux-mm, linux-kernel
On Thu, Jun 12, 2025 at 10:13:40PM +0800, Baolin Wang wrote:
>
>
> On 2025/6/12 21:29, Lorenzo Stoakes wrote:
> > On Thu, Jun 12, 2025 at 02:27:06PM +0100, Lorenzo Stoakes wrote:
> > [snip]
> >
> > > I propose a compromise as I rather like your 'exclude never' negation bit.
> > >
> > > So:
> > >
> > > /* Strictly mask requested anonymous orders according to sysfs settings. */
> > > static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > > unsigned long tva_flags, unsigned long orders)
> > > {
> > > const unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> > > const unsigned long never = ~(always | madvise | inherit);
> > > const bool inherit_enabled = hugepage_global_enabled();
> > >
> > > /* Disallow orders that are set to NEVER directly ... */
> > > orders &= ~never;
> > >
> > > /* ... or through inheritance (global == NEVER). */
> > > if (!inherit_enabled)
> > > orders &= ~inherit;
> > >
> > > /*
> > > * Otherwise, we only enforce sysfs settings if asked. In addition,
> > > * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> > > * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> > > * set.
> > > */
> > > if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > > return orders;
> > >
> > > if (hugepage_global_always())
> > > return orders & (always | inherit);
> > >
> > > /* We already excluded never inherit above. */
> > > if (vm_flags & VM_HUGEPAGE)
> > > return orders & (always | madvise | inherit);
> >
> > Of course... I immediately made a mistake... swap these two statements around. I
> > thought it'd be 'neater' to do the first one first, but of course it means
> > madvise (rather than inherit) orders don't get selected.
> >
> > This WHOLE THING needs refactoring.
>
> Personally, I think the 'exclude never' logic becomes more complicated. I
> made a simpler change without adding a new helper. What do you think?
>
> static inline
> unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> unsigned long vm_flags,
> unsigned long tva_flags,
> unsigned long orders)
> {
> /* Optimization to check if required orders are enabled early. */
> if (vma_is_anonymous(vma)) {
I hate the level of indentation here. There's really no reason not to have this
as a helper as this just solves this problem and any sane compiler will inline.
> unsigned long mask = READ_ONCE(huge_anon_orders_always);
> bool huge_enforce = !(tva_flags & TVA_ENFORCE_SYSFS);
Huge enforce is when we don't have enforce flag set? This is super confusing.
> bool has_madvise = vm_flags & VM_HUGEPAGE;
>
> /*
> * if the user sets a sysfs mode of madvise and if
> TVA_ENFORCE_SYSFS
> * is not set, we don't bother checking whether the VMA has
> VM_HUGEPAGE
> * set.
> */
> if (huge_enforce || has_madvise)
> mask |= READ_ONCE(huge_anon_orders_madvise);
I find this more confusing, honestly.
I far prefer having the never checks up front, and I prefer David's 'explicitly
deal with never through negations' approach.
I also think adding these READ_ONCE()'s here adds a ton of noise.
> if (hugepage_global_always() ||
> ((has_madvise || huge_enforce) &&
> hugepage_global_enabled()))
This combination of conditions is just horribly confusing. And why are you
giving an explanation above but not for this one...
This is just combining a ton of logic in a confusing way.
> mask |= READ_ONCE(huge_anon_orders_inherit);
>
> orders &= mask;
> if (!orders)
> return 0;
> }
>
> return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
> }
Thanks for the suggestion but I do prefer the proposed compromise solution.
>
> >
> > >
> > > return orders & always;
> > > }
> > >
> > > What do you think?
> > >
> > >
> > > > + return orders;
> > > > +}
> > > > +
> > > > /**
> > > > * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
> > > > * @vma: the vm area to check
> > > > @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> > > > unsigned long orders)
> > > > {
> > > > /* Optimization to check if required orders are enabled early. */
> > > > - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> > > > - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> > > > -
> > > > - if (vm_flags & VM_HUGEPAGE)
> > > > - mask |= READ_ONCE(huge_anon_orders_madvise);
> > > > - if (hugepage_global_always() ||
> > > > - ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> > > > - mask |= READ_ONCE(huge_anon_orders_inherit);
> > > > -
> > > > - orders &= mask;
> > > > + if (vma_is_anonymous(vma)) {
> > > > + orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
> > > > if (!orders)
> > > > return 0;
> > >
> > > I pointed out to Baolin that __thp_vma_allowable_orders() handles the orders ==
> > > 0 case almost immediately so there's no need to do this, it just makes the code
> > > noisier.
> > >
> > > I mean we _could_ keep it but I think it's better not to for cleanliness, what
> > > do you think?
> > >
> > > > }
> > > >
> > > >
> > > > --
> > > > Cheers,
> > > >
> > > > David / dhildenb
> > > >
> > >
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 14:09 ` David Hildenbrand
@ 2025-06-12 14:49 ` Lorenzo Stoakes
2025-06-13 2:07 ` Baolin Wang
0 siblings, 1 reply; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-12 14:49 UTC (permalink / raw)
To: David Hildenbrand
Cc: Baolin Wang, akpm, hughd, Liam.Howlett, npache, ryan.roberts,
dev.jain, ziy, linux-mm, linux-kernel
On Thu, Jun 12, 2025 at 04:09:27PM +0200, David Hildenbrand wrote:
>
>
> > > @@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> > > unsigned long tva_flags,
> > > unsigned long orders);
> > > +/* Strictly mask requested anonymous orders according to sysfs settings. */
> > > +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > > + unsigned long tva_flags, unsigned long orders)
> > > +{
> > > + const unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > + const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > + const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> > > + const unsigned long never = ~(always | madvise | inherit);
> > > +
> > > + /* Disallow orders that are set to NEVER directly ... */
> > > + orders &= ~never;
> > > +
> > > + /* ... or through inheritance (global == NEVER). */
> > > + if (!hugepage_global_enabled())
> > > + orders &= ~inherit;
> > > +
> > > + /*
> > > + * Otherwise, we only enforce sysfs settings if asked. In addition,
> > > + * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> > > + * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> > > + * set.
> > > + */
> > > + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > > + return orders;
> >
> > This implicitly does a & mask as per suggested previous version, which I think
> > is correct but worth pointing out.
>
> Yes.
>
> >
> > > +
> > > + if (!(vm_flags & VM_HUGEPAGE)) {
> >
> > Don't love this sort of mega negation here. I read this as _does_ have huge
> > page...
>
> Well, it's very common to do that, but not objecting to something that is
> clearer ;)
>
> I assume you spotted the
>
> if (!(tva_flags & TVA_ENFORCE_SYSFS))
>
> :P
Lol yeah I know I know, I just think I guess in this case because you're
negating elsewhere it makes it harder...
>
> if (vm_flags & VM_HUGEPAGE)
> return orders;
>
>
> Would have been easier.
>
> >
> > > + /* Disallow orders that are set to MADVISE directly ... */
> > > + orders &= ~madvise;
> > > +
> > > + /* ... or through inheritance (global == MADVISE). */
> > > + if (!hugepage_global_always())
> > > + orders &= ~inherit;
> >
> > I hate this implicit 'not hugepage global always so this means either never or
> > madvise and since we cleared orders for never this means madvise' mental
> > gymnastics required here.
> >
> > Yeah I feel this is a bridge too far, we're getting into double negation and I
> > think that's more confusiong.
>
>
> Same here ... I think we should just have hugepage_global_madvise(). :)
Ideally in future not have these stupid globals all over the place and rework
this whole damn thing...
>
> >
> >
> > > + }
> >
> > I propose a compromise as I rather like your 'exclude never' negation bit.
> >
> > So:
> >
> > /* Strictly mask requested anonymous orders according to sysfs settings. */
> > static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > unsigned long tva_flags, unsigned long orders)
> > {
> > const unsigned long always = READ_ONCE(huge_anon_orders_always);
> > const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> > const unsigned long never = ~(always | madvise | inherit);
> > const bool inherit_enabled = hugepage_global_enabled();
>
> Can we just have hugepage_global_never/disabled() to use instead?
This would be nice!
Could be a follow up... though again would be nice to somehow do away with all
this crap altogether.
>
> >
> > /* Disallow orders that are set to NEVER directly ... */
> > orders &= ~never;
> >
> > /* ... or through inheritance (global == NEVER). */
> > if (!inherit_enabled)
> > orders &= ~inherit;>
> > /*
> > * Otherwise, we only enforce sysfs settings if asked. In addition,
> > * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> > * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> > * set.
> > */
> > if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > return orders;
> >
> > if (hugepage_global_always())
> > return orders & (always | inherit);
> >
> > /* We already excluded never inherit above. */
> > if (vm_flags & VM_HUGEPAGE)
> > return orders & (always | madvise | inherit);
> >
> > return orders & always;
> > }
> >
> > What do you think?
>
> With the fixup, it would work for me. No magical "mask" variables :D
Thanks!
>
> > >
> > > + return orders;
> > > +}
> > > +
> > > /**
> > > * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
> > > * @vma: the vm area to check
> > > @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> > > unsigned long orders)
> > > {
> > > /* Optimization to check if required orders are enabled early. */
> > > - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> > > - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> > > -
> > > - if (vm_flags & VM_HUGEPAGE)
> > > - mask |= READ_ONCE(huge_anon_orders_madvise);
> > > - if (hugepage_global_always() ||
> > > - ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> > > - mask |= READ_ONCE(huge_anon_orders_inherit);
> > > -
> > > - orders &= mask;
> > > + if (vma_is_anonymous(vma)) {
> > > + orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
> > > if (!orders)
> > > return 0;
> >
> > I pointed out to Baolin that __thp_vma_allowable_orders() handles the orders ==
> > 0 case almost immediately so there's no need to do this, it just makes the code
> > noisier.
>
> The reason we added it in the first place was to not do the (expensive)
> function call.
Ack point taken!
>
>
> --
> Cheers,
>
> David / dhildenb
>
For convenience, I enclose the fixed version + tweaked the inherit local bool to
be inherit_never to be clearer:
/* Strictly mask requested anonymous orders according to sysfs settings. */
static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
unsigned long tva_flags, unsigned long orders)
{
const unsigned long always = READ_ONCE(huge_anon_orders_always);
const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
const unsigned long never = ~(always | madvise | inherit);
const bool inherit_never = !hugepage_global_enabled();
/* Disallow orders that are set to NEVER directly ... */
orders &= ~never;
/* ... or through inheritance (global == NEVER). */
if (inherit_never)
orders &= ~inherit;
/*
* Otherwise, we only enforce sysfs settings if asked. In addition,
* if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
* is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
* set.
*/
if (!(tva_flags & TVA_ENFORCE_SYSFS))
return orders;
/* We already excluded never inherit above. */
if (vm_flags & VM_HUGEPAGE)
return orders & (always | madvise | inherit);
if (hugepage_global_always())
return orders & (always | inherit);
return orders & always;
}
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-12 14:49 ` Lorenzo Stoakes
@ 2025-06-13 2:07 ` Baolin Wang
2025-06-13 5:18 ` Lorenzo Stoakes
0 siblings, 1 reply; 49+ messages in thread
From: Baolin Wang @ 2025-06-13 2:07 UTC (permalink / raw)
To: Lorenzo Stoakes, David Hildenbrand
Cc: akpm, hughd, Liam.Howlett, npache, ryan.roberts, dev.jain, ziy,
linux-mm, linux-kernel
On 2025/6/12 22:49, Lorenzo Stoakes wrote:
> On Thu, Jun 12, 2025 at 04:09:27PM +0200, David Hildenbrand wrote:
>>
>>
>>>> @@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>> unsigned long tva_flags,
>>>> unsigned long orders);
>>>> +/* Strictly mask requested anonymous orders according to sysfs settings. */
>>>> +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
>>>> + unsigned long tva_flags, unsigned long orders)
>>>> +{
>>>> + const unsigned long always = READ_ONCE(huge_anon_orders_always);
>>>> + const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>>> + const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
>>>> + const unsigned long never = ~(always | madvise | inherit);
>>>> +
>>>> + /* Disallow orders that are set to NEVER directly ... */
>>>> + orders &= ~never;
>>>> +
>>>> + /* ... or through inheritance (global == NEVER). */
>>>> + if (!hugepage_global_enabled())
>>>> + orders &= ~inherit;
>>>> +
>>>> + /*
>>>> + * Otherwise, we only enforce sysfs settings if asked. In addition,
>>>> + * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>>>> + * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
>>>> + * set.
>>>> + */
>>>> + if (!(tva_flags & TVA_ENFORCE_SYSFS))
>>>> + return orders;
>>>
>>> This implicitly does a & mask as per suggested previous version, which I think
>>> is correct but worth pointing out.
>>
>> Yes.
>>
>>>
>>>> +
>>>> + if (!(vm_flags & VM_HUGEPAGE)) {
>>>
>>> Don't love this sort of mega negation here. I read this as _does_ have huge
>>> page...
>>
>> Well, it's very common to do that, but not objecting to something that is
>> clearer ;)
>>
>> I assume you spotted the
>>
>> if (!(tva_flags & TVA_ENFORCE_SYSFS))
>>
>> :P
>
> Lol yeah I know I know, I just think I guess in this case because you're
> negating elsewhere it makes it harder...
>
>>
>> if (vm_flags & VM_HUGEPAGE)
>> return orders;
>>
>>
>> Would have been easier.
>>
>>>
>>>> + /* Disallow orders that are set to MADVISE directly ... */
>>>> + orders &= ~madvise;
>>>> +
>>>> + /* ... or through inheritance (global == MADVISE). */
>>>> + if (!hugepage_global_always())
>>>> + orders &= ~inherit;
>>>
>>> I hate this implicit 'not hugepage global always so this means either never or
>>> madvise and since we cleared orders for never this means madvise' mental
>>> gymnastics required here.
>>>
>>> Yeah I feel this is a bridge too far, we're getting into double negation and I
>>> think that's more confusiong.
>>
>>
>> Same here ... I think we should just have hugepage_global_madvise(). :)
>
> Ideally in future not have these stupid globals all over the place and rework
> this whole damn thing...
>
>>
>>>
>>>
>>>> + }
>>>
>>> I propose a compromise as I rather like your 'exclude never' negation bit.
>>>
>>> So:
>>>
>>> /* Strictly mask requested anonymous orders according to sysfs settings. */
>>> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
>>> unsigned long tva_flags, unsigned long orders)
>>> {
>>> const unsigned long always = READ_ONCE(huge_anon_orders_always);
>>> const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
>>> const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
>>> const unsigned long never = ~(always | madvise | inherit);
>>> const bool inherit_enabled = hugepage_global_enabled();
>>
>> Can we just have hugepage_global_never/disabled() to use instead?
>
> This would be nice!
>
> Could be a follow up... though again would be nice to somehow do away with all
> this crap altogether.
>
>>
>>>
>>> /* Disallow orders that are set to NEVER directly ... */
>>> orders &= ~never;
>>>
>>> /* ... or through inheritance (global == NEVER). */
>>> if (!inherit_enabled)
>>> orders &= ~inherit;>
>>> /*
>>> * Otherwise, we only enforce sysfs settings if asked. In addition,
>>> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
>>> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
>>> * set.
>>> */
>>> if (!(tva_flags & TVA_ENFORCE_SYSFS))
>>> return orders;
>>>
>>> if (hugepage_global_always())
>>> return orders & (always | inherit);
>>>
>>> /* We already excluded never inherit above. */
>>> if (vm_flags & VM_HUGEPAGE)
>>> return orders & (always | madvise | inherit);
>>>
>>> return orders & always;
>>> }
>>>
>>> What do you think?
>>
>> With the fixup, it would work for me. No magical "mask" variables :D
>
> Thanks!
Fair enough. You both prefer the 'exclude never' logic, and I will
follow that logic.
>>>> + return orders;
>>>> +}
>>>> +
>>>> /**
>>>> * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
>>>> * @vma: the vm area to check
>>>> @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>>>> unsigned long orders)
>>>> {
>>>> /* Optimization to check if required orders are enabled early. */
>>>> - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
>>>> - unsigned long mask = READ_ONCE(huge_anon_orders_always);
>>>> -
>>>> - if (vm_flags & VM_HUGEPAGE)
>>>> - mask |= READ_ONCE(huge_anon_orders_madvise);
>>>> - if (hugepage_global_always() ||
>>>> - ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
>>>> - mask |= READ_ONCE(huge_anon_orders_inherit);
>>>> -
>>>> - orders &= mask;
>>>> + if (vma_is_anonymous(vma)) {
>>>> + orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
>>>> if (!orders)
>>>> return 0;
>>>
>>> I pointed out to Baolin that __thp_vma_allowable_orders() handles the orders ==
>>> 0 case almost immediately so there's no need to do this, it just makes the code
>>> noisier.
>>
>> The reason we added it in the first place was to not do the (expensive)
>> function call.
>
> Ack point taken!
>
>>
>>
>> --
>> Cheers,
>>
>> David / dhildenb
>>
>
> For convenience, I enclose the fixed version + tweaked the inherit local bool to
> be inherit_never to be clearer:
>
> /* Strictly mask requested anonymous orders according to sysfs settings. */
> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> unsigned long tva_flags, unsigned long orders)
> {
> const unsigned long always = READ_ONCE(huge_anon_orders_always);
> const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> const unsigned long never = ~(always | madvise | inherit);
> const bool inherit_never = !hugepage_global_enabled();
>
> /* Disallow orders that are set to NEVER directly ... */
> orders &= ~never;
>
> /* ... or through inheritance (global == NEVER). */
> if (inherit_never)
> orders &= ~inherit;
>
> /*
> * Otherwise, we only enforce sysfs settings if asked. In addition,
> * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> * set.
> */
> if (!(tva_flags & TVA_ENFORCE_SYSFS))
> return orders;
>
> /* We already excluded never inherit above. */
> if (vm_flags & VM_HUGEPAGE)
> return orders & (always | madvise | inherit);
>
> if (hugepage_global_always())
> return orders & (always | inherit);
>
> return orders & always;
> }
Thanks Lorenzo. Let me follow this logic and do some testing.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs settings are disabled
2025-06-13 2:07 ` Baolin Wang
@ 2025-06-13 5:18 ` Lorenzo Stoakes
0 siblings, 0 replies; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-13 5:18 UTC (permalink / raw)
To: Baolin Wang
Cc: David Hildenbrand, akpm, hughd, Liam.Howlett, npache,
ryan.roberts, dev.jain, ziy, linux-mm, linux-kernel
On Fri, Jun 13, 2025 at 10:07:20AM +0800, Baolin Wang wrote:
>
>
> On 2025/6/12 22:49, Lorenzo Stoakes wrote:
> > On Thu, Jun 12, 2025 at 04:09:27PM +0200, David Hildenbrand wrote:
> > >
> > >
> > > > > @@ -265,6 +265,42 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
> > > > > unsigned long tva_flags,
> > > > > unsigned long orders);
> > > > > +/* Strictly mask requested anonymous orders according to sysfs settings. */
> > > > > +static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > > > > + unsigned long tva_flags, unsigned long orders)
> > > > > +{
> > > > > + const unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > > > + const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > > > + const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);
> > > > > + const unsigned long never = ~(always | madvise | inherit);
> > > > > +
> > > > > + /* Disallow orders that are set to NEVER directly ... */
> > > > > + orders &= ~never;
> > > > > +
> > > > > + /* ... or through inheritance (global == NEVER). */
> > > > > + if (!hugepage_global_enabled())
> > > > > + orders &= ~inherit;
> > > > > +
> > > > > + /*
> > > > > + * Otherwise, we only enforce sysfs settings if asked. In addition,
> > > > > + * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> > > > > + * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> > > > > + * set.
> > > > > + */
> > > > > + if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > > > > + return orders;
> > > >
> > > > This implicitly does a & mask as per suggested previous version, which I think
> > > > is correct but worth pointing out.
> > >
> > > Yes.
> > >
> > > >
> > > > > +
> > > > > + if (!(vm_flags & VM_HUGEPAGE)) {
> > > >
> > > > Don't love this sort of mega negation here. I read this as _does_ have huge
> > > > page...
> > >
> > > Well, it's very common to do that, but not objecting to something that is
> > > clearer ;)
> > >
> > > I assume you spotted the
> > >
> > > if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > >
> > > :P
> >
> > Lol yeah I know I know, I just think I guess in this case because you're
> > negating elsewhere it makes it harder...
> >
> > >
> > > if (vm_flags & VM_HUGEPAGE)
> > > return orders;
> > >
> > >
> > > Would have been easier.
> > >
> > > >
> > > > > + /* Disallow orders that are set to MADVISE directly ... */
> > > > > + orders &= ~madvise;
> > > > > +
> > > > > + /* ... or through inheritance (global == MADVISE). */
> > > > > + if (!hugepage_global_always())
> > > > > + orders &= ~inherit;
> > > >
> > > > I hate this implicit 'not hugepage global always so this means either never or
> > > > madvise and since we cleared orders for never this means madvise' mental
> > > > gymnastics required here.
> > > >
> > > > Yeah I feel this is a bridge too far, we're getting into double negation and I
> > > > think that's more confusiong.
> > >
> > >
> > > Same here ... I think we should just have hugepage_global_madvise(). :)
> >
> > Ideally in future not have these stupid globals all over the place and rework
> > this whole damn thing...
> >
> > >
> > > >
> > > >
> > > > > + }
> > > >
> > > > I propose a compromise as I rather like your 'exclude never' negation bit.
> > > >
> > > > So:
> > > >
> > > > /* Strictly mask requested anonymous orders according to sysfs settings. */
> > > > static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > > > unsigned long tva_flags, unsigned long orders)
> > > > {
> > > > const unsigned long always = READ_ONCE(huge_anon_orders_always);
> > > > const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > > > const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> > > > const unsigned long never = ~(always | madvise | inherit);
> > > > const bool inherit_enabled = hugepage_global_enabled();
> > >
> > > Can we just have hugepage_global_never/disabled() to use instead?
> >
> > This would be nice!
> >
> > Could be a follow up... though again would be nice to somehow do away with all
> > this crap altogether.
> >
> > >
> > > >
> > > > /* Disallow orders that are set to NEVER directly ... */
> > > > orders &= ~never;
> > > >
> > > > /* ... or through inheritance (global == NEVER). */
> > > > if (!inherit_enabled)
> > > > orders &= ~inherit;>
> > > > /*
> > > > * Otherwise, we only enforce sysfs settings if asked. In addition,
> > > > * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> > > > * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> > > > * set.
> > > > */
> > > > if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > > > return orders;
> > > >
> > > > if (hugepage_global_always())
> > > > return orders & (always | inherit);
> > > >
> > > > /* We already excluded never inherit above. */
> > > > if (vm_flags & VM_HUGEPAGE)
> > > > return orders & (always | madvise | inherit);
> > > >
> > > > return orders & always;
> > > > }
> > > >
> > > > What do you think?
> > >
> > > With the fixup, it would work for me. No magical "mask" variables :D
> >
> > Thanks!
>
> Fair enough. You both prefer the 'exclude never' logic, and I will follow
> that logic.
Appreciate it, sorry to be a pain and thanks a lot!
>
> > > > > + return orders;
> > > > > +}
> > > > > +
> > > > > /**
> > > > > * thp_vma_allowable_orders - determine hugepage orders that are allowed for vma
> > > > > * @vma: the vm area to check
> > > > > @@ -287,16 +323,8 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
> > > > > unsigned long orders)
> > > > > {
> > > > > /* Optimization to check if required orders are enabled early. */
> > > > > - if ((tva_flags & TVA_ENFORCE_SYSFS) && vma_is_anonymous(vma)) {
> > > > > - unsigned long mask = READ_ONCE(huge_anon_orders_always);
> > > > > -
> > > > > - if (vm_flags & VM_HUGEPAGE)
> > > > > - mask |= READ_ONCE(huge_anon_orders_madvise);
> > > > > - if (hugepage_global_always() ||
> > > > > - ((vm_flags & VM_HUGEPAGE) && hugepage_global_enabled()))
> > > > > - mask |= READ_ONCE(huge_anon_orders_inherit);
> > > > > -
> > > > > - orders &= mask;
> > > > > + if (vma_is_anonymous(vma)) {
> > > > > + orders = __thp_mask_anon_orders(vm_flags, tva_flags, orders);
> > > > > if (!orders)
> > > > > return 0;
> > > >
> > > > I pointed out to Baolin that __thp_vma_allowable_orders() handles the orders ==
> > > > 0 case almost immediately so there's no need to do this, it just makes the code
> > > > noisier.
> > >
> > > The reason we added it in the first place was to not do the (expensive)
> > > function call.
> >
> > Ack point taken!
> >
> > >
> > >
> > > --
> > > Cheers,
> > >
> > > David / dhildenb
> > >
> >
> > For convenience, I enclose the fixed version + tweaked the inherit local bool to
> > be inherit_never to be clearer:
> >
> > /* Strictly mask requested anonymous orders according to sysfs settings. */
> > static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > unsigned long tva_flags, unsigned long orders)
> > {
> > const unsigned long always = READ_ONCE(huge_anon_orders_always);
> > const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> > const unsigned long never = ~(always | madvise | inherit);
> > const bool inherit_never = !hugepage_global_enabled();
> >
> > /* Disallow orders that are set to NEVER directly ... */
> > orders &= ~never;
> >
> > /* ... or through inheritance (global == NEVER). */
> > if (inherit_never)
> > orders &= ~inherit;
> >
> > /*
> > * Otherwise, we only enforce sysfs settings if asked. In addition,
> > * if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
> > * is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
> > * set.
> > */
> > if (!(tva_flags & TVA_ENFORCE_SYSFS))
> > return orders;
> >
> > /* We already excluded never inherit above. */
> > if (vm_flags & VM_HUGEPAGE)
> > return orders & (always | madvise | inherit);
> >
> > if (hugepage_global_always())
> > return orders & (always | inherit);
> >
> > return orders & always;
> > }
>
> Thanks Lorenzo. Let me follow this logic and do some testing.
Thanks, much appreciated!
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP settings are disabled
2025-06-05 8:00 [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP settings are disabled Baolin Wang
` (2 preceding siblings ...)
2025-06-07 12:28 ` [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP " Lorenzo Stoakes
@ 2025-06-13 14:23 ` Usama Arif
2025-06-13 14:29 ` Lorenzo Stoakes
3 siblings, 1 reply; 49+ messages in thread
From: Usama Arif @ 2025-06-13 14:23 UTC (permalink / raw)
To: Baolin Wang, akpm, hughd, david
Cc: lorenzo.stoakes, Liam.Howlett, npache, ryan.roberts, dev.jain,
ziy, linux-mm, linux-kernel
On 05/06/2025 09:00, Baolin Wang wrote:
> As we discussed in the previous thread [1], the MADV_COLLAPSE will ignore
> the system-wide anon/shmem THP sysfs settings, which means that even though
> we have disabled the anon/shmem THP configuration, MADV_COLLAPSE will still
> attempt to collapse into a anon/shmem THP. This violates the rule we have
> agreed upon: never means never. This patch set will address this issue.
Hi Baolin,
I know never means never, but I also thought that the per-size toggles had
priority over the system ones. This was discussed in [1] as well.
My understanding with these patches is that if we have:
[root@vm4 vmuser]# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
[root@vm4 vmuser]# cat /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
always inherit [madvise] never
Than without these patches we get a hugepage when we do MADV_HUGEPAGE, but with
these we won't get a hugepage anymore eventhough hugepages-2048kB/enabled is set
to madvise?
I know this isn't ABI, but this would break existing expectations.
(For e.g. we have certain 64K page size arm machines with global enabled = never and
2M = madvise, and we want 2M hugepages to fault at madvise).
If the whole thing was being implemented from scratch, we should have definitely
done it this way, but this can give a people a nasty surprise when they upgrade
the kernel and suddenly stop getting hugepages.
[1] https://lore.kernel.org/all/97702ff0-fc50-4779-bfa8-83dc42352db1@redhat.com/
Thanks,
Usama
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP settings are disabled
2025-06-13 14:23 ` Usama Arif
@ 2025-06-13 14:29 ` Lorenzo Stoakes
2025-06-13 14:39 ` Usama Arif
0 siblings, 1 reply; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-13 14:29 UTC (permalink / raw)
To: Usama Arif
Cc: Baolin Wang, akpm, hughd, david, Liam.Howlett, npache,
ryan.roberts, dev.jain, ziy, linux-mm, linux-kernel
On Fri, Jun 13, 2025 at 03:23:19PM +0100, Usama Arif wrote:
>
>
> On 05/06/2025 09:00, Baolin Wang wrote:
> > As we discussed in the previous thread [1], the MADV_COLLAPSE will ignore
> > the system-wide anon/shmem THP sysfs settings, which means that even though
> > we have disabled the anon/shmem THP configuration, MADV_COLLAPSE will still
> > attempt to collapse into a anon/shmem THP. This violates the rule we have
> > agreed upon: never means never. This patch set will address this issue.
>
> Hi Baolin,
>
> I know never means never, but I also thought that the per-size toggles had
> priority over the system ones. This was discussed in [1] as well.
>
> My understanding with these patches is that if we have:
>
> [root@vm4 vmuser]# cat /sys/kernel/mm/transparent_hugepage/enabled
> always madvise [never]
> [root@vm4 vmuser]# cat /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
> always inherit [madvise] never
>
> Than without these patches we get a hugepage when we do MADV_HUGEPAGE, but with
> these we won't get a hugepage anymore eventhough hugepages-2048kB/enabled is set
> to madvise?
This isn't correct, madvise at a specific pagesize will still be permitted for
MADV_COLLAPSE.
In current contender for this patch:
/* Strictly mask requested anonymous orders according to sysfs settings. */
static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
unsigned long tva_flags, unsigned long orders)
{
const unsigned long always = READ_ONCE(huge_anon_orders_always);
const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
const unsigned long never = ~(always | madvise | inherit);
Note that madvise is considered here.
const bool inherit_never = !hugepage_global_enabled();
/* Disallow orders that are set to NEVER directly ... */
orders &= ~never;
/* ... or through inheritance (global == NEVER). */
if (inherit_never)
orders &= ~inherit;
/*
* Otherwise, we only enforce sysfs settings if asked. In addition,
* if the user sets a sysfs mode of madvise and if TVA_ENFORCE_SYSFS
* is not set, we don't bother checking whether the VMA has VM_HUGEPAGE
* set.
*/
if (!(tva_flags & TVA_ENFORCE_SYSFS))
return orders;
And then if !TVA_ENFORCE_SYSFS (e.g. MADV_COLLAPSE case), an madvise should
succeed fine.
>
> I know this isn't ABI, but this would break existing expectations.
> (For e.g. we have certain 64K page size arm machines with global enabled = never and
> 2M = madvise, and we want 2M hugepages to fault at madvise).
> If the whole thing was being implemented from scratch, we should have definitely
> done it this way, but this can give a people a nasty surprise when they upgrade
> the kernel and suddenly stop getting hugepages.
>
> [1] https://lore.kernel.org/all/97702ff0-fc50-4779-bfa8-83dc42352db1@redhat.com/
>
> Thanks,
> Usama
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP settings are disabled
2025-06-13 14:29 ` Lorenzo Stoakes
@ 2025-06-13 14:39 ` Usama Arif
2025-06-13 14:42 ` Lorenzo Stoakes
0 siblings, 1 reply; 49+ messages in thread
From: Usama Arif @ 2025-06-13 14:39 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Baolin Wang, akpm, hughd, david, Liam.Howlett, npache,
ryan.roberts, dev.jain, ziy, linux-mm, linux-kernel
On 13/06/2025 15:29, Lorenzo Stoakes wrote:
> On Fri, Jun 13, 2025 at 03:23:19PM +0100, Usama Arif wrote:
>>
>>
>> On 05/06/2025 09:00, Baolin Wang wrote:
>>> As we discussed in the previous thread [1], the MADV_COLLAPSE will ignore
>>> the system-wide anon/shmem THP sysfs settings, which means that even though
>>> we have disabled the anon/shmem THP configuration, MADV_COLLAPSE will still
>>> attempt to collapse into a anon/shmem THP. This violates the rule we have
>>> agreed upon: never means never. This patch set will address this issue.
>>
>> Hi Baolin,
>>
>> I know never means never, but I also thought that the per-size toggles had
>> priority over the system ones. This was discussed in [1] as well.
>>
>> My understanding with these patches is that if we have:
>>
>> [root@vm4 vmuser]# cat /sys/kernel/mm/transparent_hugepage/enabled
>> always madvise [never]
>> [root@vm4 vmuser]# cat /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
>> always inherit [madvise] never
>>
>> Than without these patches we get a hugepage when we do MADV_HUGEPAGE, but with
>> these we won't get a hugepage anymore eventhough hugepages-2048kB/enabled is set
>> to madvise?
>
> This isn't correct, madvise at a specific pagesize will still be permitted for
> MADV_COLLAPSE.
>
> In current contender for this patch:
>
> /* Strictly mask requested anonymous orders according to sysfs settings. */
> static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> unsigned long tva_flags, unsigned long orders)
> {
> const unsigned long always = READ_ONCE(huge_anon_orders_always);
> const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> const unsigned long never = ~(always | madvise | inherit);
>
> Note that madvise is considered here.
>
Ah ok, Thanks for clearing that! I was reviewing the original patch in [1] but I
see this version in the replies.
I wish this function was simpler :) or maybe its me that takes so much time
to figure out if the order will be set or not by the end of the function.
[1] https://lore.kernel.org/all/8eefb0809c598fadaa4a022634fba5689a4f3257.1749109709.git.baolin.wang@linux.alibaba.com/
Thanks!
Usama
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP settings are disabled
2025-06-13 14:39 ` Usama Arif
@ 2025-06-13 14:42 ` Lorenzo Stoakes
0 siblings, 0 replies; 49+ messages in thread
From: Lorenzo Stoakes @ 2025-06-13 14:42 UTC (permalink / raw)
To: Usama Arif
Cc: Baolin Wang, akpm, hughd, david, Liam.Howlett, npache,
ryan.roberts, dev.jain, ziy, linux-mm, linux-kernel
On Fri, Jun 13, 2025 at 03:39:33PM +0100, Usama Arif wrote:
>
>
> On 13/06/2025 15:29, Lorenzo Stoakes wrote:
> > On Fri, Jun 13, 2025 at 03:23:19PM +0100, Usama Arif wrote:
> >>
> >>
> >> On 05/06/2025 09:00, Baolin Wang wrote:
> >>> As we discussed in the previous thread [1], the MADV_COLLAPSE will ignore
> >>> the system-wide anon/shmem THP sysfs settings, which means that even though
> >>> we have disabled the anon/shmem THP configuration, MADV_COLLAPSE will still
> >>> attempt to collapse into a anon/shmem THP. This violates the rule we have
> >>> agreed upon: never means never. This patch set will address this issue.
> >>
> >> Hi Baolin,
> >>
> >> I know never means never, but I also thought that the per-size toggles had
> >> priority over the system ones. This was discussed in [1] as well.
> >>
> >> My understanding with these patches is that if we have:
> >>
> >> [root@vm4 vmuser]# cat /sys/kernel/mm/transparent_hugepage/enabled
> >> always madvise [never]
> >> [root@vm4 vmuser]# cat /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
> >> always inherit [madvise] never
> >>
> >> Than without these patches we get a hugepage when we do MADV_HUGEPAGE, but with
> >> these we won't get a hugepage anymore eventhough hugepages-2048kB/enabled is set
> >> to madvise?
> >
> > This isn't correct, madvise at a specific pagesize will still be permitted for
> > MADV_COLLAPSE.
> >
> > In current contender for this patch:
> >
> > /* Strictly mask requested anonymous orders according to sysfs settings. */
> > static inline unsigned long __thp_mask_anon_orders(unsigned long vm_flags,
> > unsigned long tva_flags, unsigned long orders)
> > {
> > const unsigned long always = READ_ONCE(huge_anon_orders_always);
> > const unsigned long madvise = READ_ONCE(huge_anon_orders_madvise);
> > const unsigned long inherit = READ_ONCE(huge_anon_orders_inherit);;
> > const unsigned long never = ~(always | madvise | inherit);
> >
> > Note that madvise is considered here.
> >
>
> Ah ok, Thanks for clearing that! I was reviewing the original patch in [1] but I
> see this version in the replies.
>
> I wish this function was simpler :) or maybe its me that takes so much time
> to figure out if the order will be set or not by the end of the function.
Couldn't agree more... this whole thing needs major work, it's massively confusing
>
> [1] https://lore.kernel.org/all/8eefb0809c598fadaa4a022634fba5689a4f3257.1749109709.git.baolin.wang@linux.alibaba.com/
>
> Thanks!
> Usama
>
^ permalink raw reply [flat|nested] 49+ messages in thread
end of thread, other threads:[~2025-06-13 14:42 UTC | newest]
Thread overview: 49+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-05 8:00 [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP settings are disabled Baolin Wang
2025-06-05 8:00 ` [PATCH v2 1/2] mm: huge_memory: disallow hugepages if the system-wide THP sysfs " Baolin Wang
2025-06-06 16:49 ` Dev Jain
2025-06-06 18:47 ` Dev Jain
2025-06-09 5:57 ` Baolin Wang
2025-06-07 11:55 ` Lorenzo Stoakes
2025-06-07 12:21 ` Lorenzo Stoakes
2025-06-09 6:18 ` Baolin Wang
2025-06-09 15:12 ` Lorenzo Stoakes
2025-06-09 6:10 ` Baolin Wang
2025-06-09 15:17 ` Lorenzo Stoakes
2025-06-11 6:59 ` Baolin Wang
2025-06-08 18:37 ` Nico Pache
2025-06-09 6:36 ` Baolin Wang
2025-06-11 12:34 ` David Hildenbrand
2025-06-12 7:51 ` Baolin Wang
2025-06-12 8:46 ` Dev Jain
2025-06-12 8:52 ` David Hildenbrand
2025-06-12 8:51 ` David Hildenbrand
2025-06-12 12:45 ` Baolin Wang
2025-06-12 13:05 ` David Hildenbrand
2025-06-12 13:25 ` Baolin Wang
2025-06-12 13:40 ` Baolin Wang
2025-06-12 13:27 ` Lorenzo Stoakes
2025-06-12 13:29 ` Lorenzo Stoakes
2025-06-12 14:13 ` Baolin Wang
2025-06-12 14:16 ` David Hildenbrand
2025-06-12 14:20 ` Lorenzo Stoakes
2025-06-12 14:09 ` David Hildenbrand
2025-06-12 14:49 ` Lorenzo Stoakes
2025-06-13 2:07 ` Baolin Wang
2025-06-13 5:18 ` Lorenzo Stoakes
2025-06-12 13:07 ` Lorenzo Stoakes
2025-06-12 13:13 ` David Hildenbrand
2025-06-12 13:31 ` Lorenzo Stoakes
2025-06-05 8:00 ` [PATCH v2 2/2] mm: shmem: disallow hugepages if the system-wide shmem " Baolin Wang
2025-06-07 12:14 ` Lorenzo Stoakes
2025-06-07 12:17 ` Lorenzo Stoakes
2025-06-09 6:34 ` Baolin Wang
2025-06-09 19:30 ` Lorenzo Stoakes
2025-06-09 6:31 ` Baolin Wang
2025-06-09 15:33 ` Lorenzo Stoakes
2025-06-11 7:02 ` Baolin Wang
2025-06-07 12:28 ` [PATCH v2 0/2] fix MADV_COLLAPSE issue if THP " Lorenzo Stoakes
2025-06-11 7:05 ` Baolin Wang
2025-06-13 14:23 ` Usama Arif
2025-06-13 14:29 ` Lorenzo Stoakes
2025-06-13 14:39 ` Usama Arif
2025-06-13 14:42 ` Lorenzo Stoakes
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).