linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* Re: (bisected) [PATCH v2 08/37] mm/hugetlb: check for unreasonable folio sizes when registering hstate
       [not found] ` <20250901150359.867252-9-david@redhat.com>
@ 2025-10-09  7:14   ` Christophe Leroy
  2025-10-09  7:22     ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Christophe Leroy @ 2025-10-09  7:14 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Zi Yan, Lorenzo Stoakes, Liam R. Howlett, Alexander Potapenko,
	Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
	Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
	Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
	kasan-dev, kvm, Linus Torvalds, linux-arm-kernel,
	linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
	linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
	linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
	Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
	Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
	Vlastimil Babka, wireguard, x86, linuxppc-dev@lists.ozlabs.org

Hi David,

Le 01/09/2025 à 17:03, David Hildenbrand a écrit :
> Let's check that no hstate that corresponds to an unreasonable folio size
> is registered by an architecture. If we were to succeed registering, we
> could later try allocating an unsupported gigantic folio size.
> 
> Further, let's add a BUILD_BUG_ON() for checking that HUGETLB_PAGE_ORDER
> is sane at build time. As HUGETLB_PAGE_ORDER is dynamic on powerpc, we have
> to use a BUILD_BUG_ON_INVALID() to make it compile.
> 
> No existing kernel configuration should be able to trigger this check:
> either SPARSEMEM without SPARSEMEM_VMEMMAP cannot be configured or
> gigantic folios will not exceed a memory section (the case on sparse).
> 
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

I get following warning on powerpc with linus tree, bisected to commit 
7b4f21f5e038 ("mm/hugetlb: check for unreasonable folio sizes when 
registering hstate")

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at mm/hugetlb.c:4744 hugetlb_add_hstate+0xc0/0x180
Modules linked in:
CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 
6.17.0-rc4-00275-g7b4f21f5e038 #1683 NONE
Hardware name: QEMU ppce500 e5500 0x80240020 QEMU e500
NIP:  c000000001357408 LR: c000000001357c90 CTR: 0000000000000003
REGS: c00000000152bad0 TRAP: 0700   Not tainted 
(6.17.0-rc4-00275-g7b4f21f5e038)
MSR:  0000000080021002 <CE,ME>  CR: 44000448  XER: 20000000
IRQMASK: 1
GPR00: c000000001357c90 c00000000152bd70 c000000001339000 0000000000000012
GPR04: 000000000000000a 0000000000001000 000000000000001e 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000001 000000000000000a
GPR12: c000000001357b68 c000000001590000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: c0000000011adb40 c00000000156b528 0000000000000000 c00000000156b4b0
GPR28: c00000000156b528 0000000000000012 0000000040000000 0000000000000000
NIP [c000000001357408] hugetlb_add_hstate+0xc0/0x180
LR [c000000001357c90] hugepagesz_setup+0x128/0x150
Call Trace:
[c00000000152bd70] [c00000000152bda0] init_stack+0x3da0/0x4000 (unreliable)
[c00000000152be10] [c000000001357c90] hugepagesz_setup+0x128/0x150
[c00000000152be80] [c00000000135841c] hugetlb_bootmem_alloc+0x84/0x104
[c00000000152bec0] [c00000000135143c] mm_core_init+0x30/0x174
[c00000000152bf30] [c000000001332ed4] start_kernel+0x540/0x880
[c00000000152bfe0] [c000000000000a50] start_here_common+0x1c/0x20
Code: 2c09000f 39000001 38e00000 39400001 7d00401e 0b080000 281d0001 
7d00505e 79080020 0b080000 281d000c 7d4a385e <0b0a0000> 1f5a00b8 
38bf0020 3c82ffe8
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at mm/hugetlb.c:4744 hugetlb_add_hstate+0xc0/0x180
Modules linked in:
CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G        W 
6.17.0-rc4-00275-g7b4f21f5e038 #1683 NONE
Tainted: [W]=WARN
Hardware name: QEMU ppce500 e5500 0x80240020 QEMU e500
NIP:  c000000001357408 LR: c000000001357c90 CTR: 0000000000000005
REGS: c00000000152bad0 TRAP: 0700   Tainted: G        W 
(6.17.0-rc4-00275-g7b4f21f5e038)
MSR:  0000000080021002 <CE,ME>  CR: 48000448  XER: 20000000
IRQMASK: 1
GPR00: c000000001357c90 c00000000152bd70 c000000001339000 000000000000000e
GPR04: 000000000000000a 0000000000001000 0000000040000000 0000000000000000
GPR08: 0000000000000000 0000000000000001 0000000000000001 0000000000000280
GPR12: c000000001357b68 c000000001590000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: c0000000011adb40 c00000000156b5e0 0000000000000001 c00000000156b4b0
GPR28: c00000000156b528 000000000000000e 0000000004000000 00000000000000b8
NIP [c000000001357408] hugetlb_add_hstate+0xc0/0x180
LR [c000000001357c90] hugepagesz_setup+0x128/0x150
Call Trace:
[c00000000152bd70] [c000000000f27048] __func__.0+0x0/0x18 (unreliable)
[c00000000152be10] [c000000001357c90] hugepagesz_setup+0x128/0x150
[c00000000152be80] [c00000000135841c] hugetlb_bootmem_alloc+0x84/0x104
[c00000000152bec0] [c00000000135143c] mm_core_init+0x30/0x174
[c00000000152bf30] [c000000001332ed4] start_kernel+0x540/0x880
[c00000000152bfe0] [c000000000000a50] start_here_common+0x1c/0x20
Code: 2c09000f 39000001 38e00000 39400001 7d00401e 0b080000 281d0001 
7d00505e 79080020 0b080000 281d000c 7d4a385e <0b0a0000> 1f5a00b8 
38bf0020 3c82ffe8
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at mm/hugetlb.c:4744 hugetlb_add_hstate+0xc0/0x180
Modules linked in:
CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G        W 
6.17.0-rc4-00275-g7b4f21f5e038 #1683 NONE
Tainted: [W]=WARN
Hardware name: QEMU ppce500 e5500 0x80240020 QEMU e500
NIP:  c000000001357408 LR: c000000001357c90 CTR: 0000000000000004
REGS: c00000000152bad0 TRAP: 0700   Tainted: G        W 
(6.17.0-rc4-00275-g7b4f21f5e038)
MSR:  0000000080021002 <CE,ME>  CR: 48000448  XER: 20000000
IRQMASK: 1
GPR00: c000000001357c90 c00000000152bd70 c000000001339000 0000000000000010
GPR04: 000000000000000a 0000000000001000 0000000004000000 0000000000000000
GPR08: 0000000000000000 0000000000000002 0000000000000001 0000000000000a00
GPR12: c000000001357b68 c000000001590000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: c0000000011adb40 c00000000156b698 0000000000000002 c00000000156b4b0
GPR28: c00000000156b528 0000000000000010 0000000010000000 0000000000000170
NIP [c000000001357408] hugetlb_add_hstate+0xc0/0x180
LR [c000000001357c90] hugepagesz_setup+0x128/0x150
Call Trace:
[c00000000152bd70] [c000000000f27048] __func__.0+0x0/0x18 (unreliable)
[c00000000152be10] [c000000001357c90] hugepagesz_setup+0x128/0x150
[c00000000152be80] [c00000000135841c] hugetlb_bootmem_alloc+0x84/0x104
[c00000000152bec0] [c00000000135143c] mm_core_init+0x30/0x174
[c00000000152bf30] [c000000001332ed4] start_kernel+0x540/0x880
[c00000000152bfe0] [c000000000000a50] start_here_common+0x1c/0x20
Code: 2c09000f 39000001 38e00000 39400001 7d00401e 0b080000 281d0001 
7d00505e 79080020 0b080000 281d000c 7d4a385e <0b0a0000> 1f5a00b8 
38bf0020 3c82ffe8
---[ end trace 0000000000000000 ]---


> ---
>   mm/hugetlb.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1e777cc51ad04..d3542e92a712e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4657,6 +4657,7 @@ static int __init hugetlb_init(void)
>   
>   	BUILD_BUG_ON(sizeof_field(struct page, private) * BITS_PER_BYTE <
>   			__NR_HPAGEFLAGS);
> +	BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);
>   
>   	if (!hugepages_supported()) {
>   		if (hugetlb_max_hstate || default_hstate_max_huge_pages)
> @@ -4740,6 +4741,7 @@ void __init hugetlb_add_hstate(unsigned int order)
>   	}
>   	BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>   	BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
> +	WARN_ON(order > MAX_FOLIO_ORDER);
>   	h = &hstates[hugetlb_max_hstate++];
>   	__mutex_init(&h->resize_lock, "resize mutex", &h->resize_key);
>   	h->order = order;



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: (bisected) [PATCH v2 08/37] mm/hugetlb: check for unreasonable folio sizes when registering hstate
  2025-10-09  7:14   ` (bisected) [PATCH v2 08/37] mm/hugetlb: check for unreasonable folio sizes when registering hstate Christophe Leroy
@ 2025-10-09  7:22     ` David Hildenbrand
  2025-10-09  7:44       ` Christophe Leroy
  2025-10-09  8:04       ` Christophe Leroy
  0 siblings, 2 replies; 11+ messages in thread
From: David Hildenbrand @ 2025-10-09  7:22 UTC (permalink / raw)
  To: Christophe Leroy, linux-kernel
  Cc: Zi Yan, Lorenzo Stoakes, Liam R. Howlett, Alexander Potapenko,
	Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
	Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
	Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
	kasan-dev, kvm, Linus Torvalds, linux-arm-kernel,
	linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
	linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
	linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
	Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
	Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
	Vlastimil Babka, wireguard, x86, linuxppc-dev@lists.ozlabs.org

On 09.10.25 09:14, Christophe Leroy wrote:
> Hi David,
> 
> Le 01/09/2025 à 17:03, David Hildenbrand a écrit :
>> Let's check that no hstate that corresponds to an unreasonable folio size
>> is registered by an architecture. If we were to succeed registering, we
>> could later try allocating an unsupported gigantic folio size.
>>
>> Further, let's add a BUILD_BUG_ON() for checking that HUGETLB_PAGE_ORDER
>> is sane at build time. As HUGETLB_PAGE_ORDER is dynamic on powerpc, we have
>> to use a BUILD_BUG_ON_INVALID() to make it compile.
>>
>> No existing kernel configuration should be able to trigger this check:
>> either SPARSEMEM without SPARSEMEM_VMEMMAP cannot be configured or
>> gigantic folios will not exceed a memory section (the case on sparse).
>>
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> I get following warning on powerpc with linus tree, bisected to commit
> 7b4f21f5e038 ("mm/hugetlb: check for unreasonable folio sizes when
> registering hstate")

Do you have the kernel config around? Is it 32bit?

That would be helpful.

[...]

>> ---
>>    mm/hugetlb.c | 2 ++
>>    1 file changed, 2 insertions(+)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 1e777cc51ad04..d3542e92a712e 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -4657,6 +4657,7 @@ static int __init hugetlb_init(void)
>>    
>>    	BUILD_BUG_ON(sizeof_field(struct page, private) * BITS_PER_BYTE <
>>    			__NR_HPAGEFLAGS);
>> +	BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);
>>    
>>    	if (!hugepages_supported()) {
>>    		if (hugetlb_max_hstate || default_hstate_max_huge_pages)
>> @@ -4740,6 +4741,7 @@ void __init hugetlb_add_hstate(unsigned int order)
>>    	}
>>    	BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>>    	BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
>> +	WARN_ON(order > MAX_FOLIO_ORDER);
>>    	h = &hstates[hugetlb_max_hstate++];
>>    	__mutex_init(&h->resize_lock, "resize mutex", &h->resize_key);
>>    	h->order = order;

We end up registering hugetlb folios that are bigger than 
MAX_FOLIO_ORDER. So we have to figure out how a config can trigger that 
(and if we have to support that).

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: (bisected) [PATCH v2 08/37] mm/hugetlb: check for unreasonable folio sizes when registering hstate
  2025-10-09  7:22     ` David Hildenbrand
@ 2025-10-09  7:44       ` Christophe Leroy
  2025-10-09  8:04       ` Christophe Leroy
  1 sibling, 0 replies; 11+ messages in thread
From: Christophe Leroy @ 2025-10-09  7:44 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Zi Yan, Lorenzo Stoakes, Liam R. Howlett, Alexander Potapenko,
	Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
	Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
	Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
	kasan-dev, kvm, Linus Torvalds, linux-arm-kernel,
	linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
	linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
	linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
	Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
	Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
	Vlastimil Babka, wireguard, x86, linuxppc-dev@lists.ozlabs.org



Le 09/10/2025 à 09:22, David Hildenbrand a écrit :
> On 09.10.25 09:14, Christophe Leroy wrote:
>> Hi David,
>>
>> Le 01/09/2025 à 17:03, David Hildenbrand a écrit :
>>> Let's check that no hstate that corresponds to an unreasonable folio 
>>> size
>>> is registered by an architecture. If we were to succeed registering, we
>>> could later try allocating an unsupported gigantic folio size.
>>>
>>> Further, let's add a BUILD_BUG_ON() for checking that HUGETLB_PAGE_ORDER
>>> is sane at build time. As HUGETLB_PAGE_ORDER is dynamic on powerpc, 
>>> we have
>>> to use a BUILD_BUG_ON_INVALID() to make it compile.
>>>
>>> No existing kernel configuration should be able to trigger this check:
>>> either SPARSEMEM without SPARSEMEM_VMEMMAP cannot be configured or
>>> gigantic folios will not exceed a memory section (the case on sparse).
>>>
>>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>>> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>
>> I get following warning on powerpc with linus tree, bisected to commit
>> 7b4f21f5e038 ("mm/hugetlb: check for unreasonable folio sizes when
>> registering hstate")
> 
> Do you have the kernel config around? Is it 32bit?
> 
> That would be helpful.

That's corenet64_smp_defconfig

Boot on QEMU with:

	qemu-system-ppc64 -smp 2 -nographic -M ppce500 -cpu e5500 -m 1G



Christophe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: (bisected) [PATCH v2 08/37] mm/hugetlb: check for unreasonable folio sizes when registering hstate
  2025-10-09  7:22     ` David Hildenbrand
  2025-10-09  7:44       ` Christophe Leroy
@ 2025-10-09  8:04       ` Christophe Leroy
  2025-10-09  8:14         ` David Hildenbrand
  1 sibling, 1 reply; 11+ messages in thread
From: Christophe Leroy @ 2025-10-09  8:04 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Zi Yan, Lorenzo Stoakes, Liam R. Howlett, Alexander Potapenko,
	Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
	Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
	Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
	kasan-dev, kvm, Linus Torvalds, linux-arm-kernel,
	linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
	linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
	linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
	Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
	Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
	Vlastimil Babka, wireguard, x86, linuxppc-dev@lists.ozlabs.org



Le 09/10/2025 à 09:22, David Hildenbrand a écrit :
> On 09.10.25 09:14, Christophe Leroy wrote:
>> Hi David,
>>
>> Le 01/09/2025 à 17:03, David Hildenbrand a écrit :
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 1e777cc51ad04..d3542e92a712e 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -4657,6 +4657,7 @@ static int __init hugetlb_init(void)
>>>        BUILD_BUG_ON(sizeof_field(struct page, private) * BITS_PER_BYTE <
>>>                __NR_HPAGEFLAGS);
>>> +    BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);
>>>        if (!hugepages_supported()) {
>>>            if (hugetlb_max_hstate || default_hstate_max_huge_pages)
>>> @@ -4740,6 +4741,7 @@ void __init hugetlb_add_hstate(unsigned int order)
>>>        }
>>>        BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>>>        BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
>>> +    WARN_ON(order > MAX_FOLIO_ORDER);
>>>        h = &hstates[hugetlb_max_hstate++];
>>>        __mutex_init(&h->resize_lock, "resize mutex", &h->resize_key);
>>>        h->order = order;
> 
> We end up registering hugetlb folios that are bigger than 
> MAX_FOLIO_ORDER. So we have to figure out how a config can trigger that 
> (and if we have to support that).
> 

MAX_FOLIO_ORDER is defined as:

#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
#define MAX_FOLIO_ORDER		PUD_ORDER
#else
#define MAX_FOLIO_ORDER		MAX_PAGE_ORDER
#endif

MAX_PAGE_ORDER is the limit for dynamic creation of hugepages via 
/sys/kernel/mm/hugepages/ but bigger pages can be created at boottime 
with kernel boot parameters without CONFIG_ARCH_HAS_GIGANTIC_PAGE:

   hugepagesz=64m hugepages=1 hugepagesz=256m hugepages=1

Gives:

HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
HugeTLB: registered 64.0 MiB page size, pre-allocated 1 pages
HugeTLB: 0 KiB vmemmap can be freed for a 64.0 MiB page
HugeTLB: registered 256 MiB page size, pre-allocated 1 pages
HugeTLB: 0 KiB vmemmap can be freed for a 256 MiB page
HugeTLB: registered 4.00 MiB page size, pre-allocated 0 pages
HugeTLB: 0 KiB vmemmap can be freed for a 4.00 MiB page
HugeTLB: registered 16.0 MiB page size, pre-allocated 0 pages
HugeTLB: 0 KiB vmemmap can be freed for a 16.0 MiB page


Christophe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: (bisected) [PATCH v2 08/37] mm/hugetlb: check for unreasonable folio sizes when registering hstate
  2025-10-09  8:04       ` Christophe Leroy
@ 2025-10-09  8:14         ` David Hildenbrand
  2025-10-09  9:16           ` Christophe Leroy
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2025-10-09  8:14 UTC (permalink / raw)
  To: Christophe Leroy, linux-kernel
  Cc: Zi Yan, Lorenzo Stoakes, Liam R. Howlett, Alexander Potapenko,
	Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
	Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
	Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
	kasan-dev, kvm, Linus Torvalds, linux-arm-kernel,
	linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
	linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
	linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
	Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
	Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
	Vlastimil Babka, wireguard, x86, linuxppc-dev@lists.ozlabs.org

On 09.10.25 10:04, Christophe Leroy wrote:
> 
> 
> Le 09/10/2025 à 09:22, David Hildenbrand a écrit :
>> On 09.10.25 09:14, Christophe Leroy wrote:
>>> Hi David,
>>>
>>> Le 01/09/2025 à 17:03, David Hildenbrand a écrit :
>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>> index 1e777cc51ad04..d3542e92a712e 100644
>>>> --- a/mm/hugetlb.c
>>>> +++ b/mm/hugetlb.c
>>>> @@ -4657,6 +4657,7 @@ static int __init hugetlb_init(void)
>>>>         BUILD_BUG_ON(sizeof_field(struct page, private) * BITS_PER_BYTE <
>>>>                 __NR_HPAGEFLAGS);
>>>> +    BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);
>>>>         if (!hugepages_supported()) {
>>>>             if (hugetlb_max_hstate || default_hstate_max_huge_pages)
>>>> @@ -4740,6 +4741,7 @@ void __init hugetlb_add_hstate(unsigned int order)
>>>>         }
>>>>         BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>>>>         BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
>>>> +    WARN_ON(order > MAX_FOLIO_ORDER);
>>>>         h = &hstates[hugetlb_max_hstate++];
>>>>         __mutex_init(&h->resize_lock, "resize mutex", &h->resize_key);
>>>>         h->order = order;
>>
>> We end up registering hugetlb folios that are bigger than
>> MAX_FOLIO_ORDER. So we have to figure out how a config can trigger that
>> (and if we have to support that).
>>
> 
> MAX_FOLIO_ORDER is defined as:
> 
> #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
> #define MAX_FOLIO_ORDER		PUD_ORDER
> #else
> #define MAX_FOLIO_ORDER		MAX_PAGE_ORDER
> #endif
> 
> MAX_PAGE_ORDER is the limit for dynamic creation of hugepages via
> /sys/kernel/mm/hugepages/ but bigger pages can be created at boottime
> with kernel boot parameters without CONFIG_ARCH_HAS_GIGANTIC_PAGE:
> 
>     hugepagesz=64m hugepages=1 hugepagesz=256m hugepages=1
> 
> Gives:
> 
> HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
> HugeTLB: registered 64.0 MiB page size, pre-allocated 1 pages
> HugeTLB: 0 KiB vmemmap can be freed for a 64.0 MiB page
> HugeTLB: registered 256 MiB page size, pre-allocated 1 pages
> HugeTLB: 0 KiB vmemmap can be freed for a 256 MiB page
> HugeTLB: registered 4.00 MiB page size, pre-allocated 0 pages
> HugeTLB: 0 KiB vmemmap can be freed for a 4.00 MiB page
> HugeTLB: registered 16.0 MiB page size, pre-allocated 0 pages
> HugeTLB: 0 KiB vmemmap can be freed for a 16.0 MiB page

I think it's a violation of CONFIG_ARCH_HAS_GIGANTIC_PAGE. The existing 
folio_dump() code would not handle it correctly as well.

See how snapshot_page() uses MAX_FOLIO_NR_PAGES.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: (bisected) [PATCH v2 08/37] mm/hugetlb: check for unreasonable folio sizes when registering hstate
  2025-10-09  8:14         ` David Hildenbrand
@ 2025-10-09  9:16           ` Christophe Leroy
  2025-10-09  9:20             ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Christophe Leroy @ 2025-10-09  9:16 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Zi Yan, Lorenzo Stoakes, Liam R. Howlett, Alexander Potapenko,
	Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
	Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
	Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
	kasan-dev, kvm, Linus Torvalds, linux-arm-kernel,
	linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
	linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
	linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
	Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
	Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
	Vlastimil Babka, wireguard, x86, linuxppc-dev@lists.ozlabs.org



Le 09/10/2025 à 10:14, David Hildenbrand a écrit :
> On 09.10.25 10:04, Christophe Leroy wrote:
>>
>>
>> Le 09/10/2025 à 09:22, David Hildenbrand a écrit :
>>> On 09.10.25 09:14, Christophe Leroy wrote:
>>>> Hi David,
>>>>
>>>> Le 01/09/2025 à 17:03, David Hildenbrand a écrit :
>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>> index 1e777cc51ad04..d3542e92a712e 100644
>>>>> --- a/mm/hugetlb.c
>>>>> +++ b/mm/hugetlb.c
>>>>> @@ -4657,6 +4657,7 @@ static int __init hugetlb_init(void)
>>>>>         BUILD_BUG_ON(sizeof_field(struct page, private) * 
>>>>> BITS_PER_BYTE <
>>>>>                 __NR_HPAGEFLAGS);
>>>>> +    BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);
>>>>>         if (!hugepages_supported()) {
>>>>>             if (hugetlb_max_hstate || default_hstate_max_huge_pages)
>>>>> @@ -4740,6 +4741,7 @@ void __init hugetlb_add_hstate(unsigned int 
>>>>> order)
>>>>>         }
>>>>>         BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>>>>>         BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
>>>>> +    WARN_ON(order > MAX_FOLIO_ORDER);
>>>>>         h = &hstates[hugetlb_max_hstate++];
>>>>>         __mutex_init(&h->resize_lock, "resize mutex", &h->resize_key);
>>>>>         h->order = order;
>>>
>>> We end up registering hugetlb folios that are bigger than
>>> MAX_FOLIO_ORDER. So we have to figure out how a config can trigger that
>>> (and if we have to support that).
>>>
>>
>> MAX_FOLIO_ORDER is defined as:
>>
>> #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
>> #define MAX_FOLIO_ORDER        PUD_ORDER
>> #else
>> #define MAX_FOLIO_ORDER        MAX_PAGE_ORDER
>> #endif
>>
>> MAX_PAGE_ORDER is the limit for dynamic creation of hugepages via
>> /sys/kernel/mm/hugepages/ but bigger pages can be created at boottime
>> with kernel boot parameters without CONFIG_ARCH_HAS_GIGANTIC_PAGE:
>>
>>     hugepagesz=64m hugepages=1 hugepagesz=256m hugepages=1
>>
>> Gives:
>>
>> HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
>> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
>> HugeTLB: registered 64.0 MiB page size, pre-allocated 1 pages
>> HugeTLB: 0 KiB vmemmap can be freed for a 64.0 MiB page
>> HugeTLB: registered 256 MiB page size, pre-allocated 1 pages
>> HugeTLB: 0 KiB vmemmap can be freed for a 256 MiB page
>> HugeTLB: registered 4.00 MiB page size, pre-allocated 0 pages
>> HugeTLB: 0 KiB vmemmap can be freed for a 4.00 MiB page
>> HugeTLB: registered 16.0 MiB page size, pre-allocated 0 pages
>> HugeTLB: 0 KiB vmemmap can be freed for a 16.0 MiB page
> 
> I think it's a violation of CONFIG_ARCH_HAS_GIGANTIC_PAGE. The existing 
> folio_dump() code would not handle it correctly as well.

I'm trying to dig into history and when looking at commit 4eb0716e868e 
("hugetlb: allow to free gigantic pages regardless of the 
configuration") I understand that CONFIG_ARCH_HAS_GIGANTIC_PAGE is 
needed to be able to allocate gigantic pages at runtime. It is not 
needed to reserve gigantic pages at boottime.

What am I missing ?

> 
> See how snapshot_page() uses MAX_FOLIO_NR_PAGES.
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: (bisected) [PATCH v2 08/37] mm/hugetlb: check for unreasonable folio sizes when registering hstate
  2025-10-09  9:16           ` Christophe Leroy
@ 2025-10-09  9:20             ` David Hildenbrand
  2025-10-09 10:01               ` Christophe Leroy
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2025-10-09  9:20 UTC (permalink / raw)
  To: Christophe Leroy, linux-kernel
  Cc: Zi Yan, Lorenzo Stoakes, Liam R. Howlett, Alexander Potapenko,
	Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
	Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
	Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
	kasan-dev, kvm, Linus Torvalds, linux-arm-kernel,
	linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
	linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
	linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
	Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
	Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
	Vlastimil Babka, wireguard, x86, linuxppc-dev@lists.ozlabs.org

On 09.10.25 11:16, Christophe Leroy wrote:
> 
> 
> Le 09/10/2025 à 10:14, David Hildenbrand a écrit :
>> On 09.10.25 10:04, Christophe Leroy wrote:
>>>
>>>
>>> Le 09/10/2025 à 09:22, David Hildenbrand a écrit :
>>>> On 09.10.25 09:14, Christophe Leroy wrote:
>>>>> Hi David,
>>>>>
>>>>> Le 01/09/2025 à 17:03, David Hildenbrand a écrit :
>>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>>> index 1e777cc51ad04..d3542e92a712e 100644
>>>>>> --- a/mm/hugetlb.c
>>>>>> +++ b/mm/hugetlb.c
>>>>>> @@ -4657,6 +4657,7 @@ static int __init hugetlb_init(void)
>>>>>>          BUILD_BUG_ON(sizeof_field(struct page, private) *
>>>>>> BITS_PER_BYTE <
>>>>>>                  __NR_HPAGEFLAGS);
>>>>>> +    BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);
>>>>>>          if (!hugepages_supported()) {
>>>>>>              if (hugetlb_max_hstate || default_hstate_max_huge_pages)
>>>>>> @@ -4740,6 +4741,7 @@ void __init hugetlb_add_hstate(unsigned int
>>>>>> order)
>>>>>>          }
>>>>>>          BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>>>>>>          BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
>>>>>> +    WARN_ON(order > MAX_FOLIO_ORDER);
>>>>>>          h = &hstates[hugetlb_max_hstate++];
>>>>>>          __mutex_init(&h->resize_lock, "resize mutex", &h->resize_key);
>>>>>>          h->order = order;
>>>>
>>>> We end up registering hugetlb folios that are bigger than
>>>> MAX_FOLIO_ORDER. So we have to figure out how a config can trigger that
>>>> (and if we have to support that).
>>>>
>>>
>>> MAX_FOLIO_ORDER is defined as:
>>>
>>> #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
>>> #define MAX_FOLIO_ORDER        PUD_ORDER
>>> #else
>>> #define MAX_FOLIO_ORDER        MAX_PAGE_ORDER
>>> #endif
>>>
>>> MAX_PAGE_ORDER is the limit for dynamic creation of hugepages via
>>> /sys/kernel/mm/hugepages/ but bigger pages can be created at boottime
>>> with kernel boot parameters without CONFIG_ARCH_HAS_GIGANTIC_PAGE:
>>>
>>>      hugepagesz=64m hugepages=1 hugepagesz=256m hugepages=1
>>>
>>> Gives:
>>>
>>> HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
>>> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
>>> HugeTLB: registered 64.0 MiB page size, pre-allocated 1 pages
>>> HugeTLB: 0 KiB vmemmap can be freed for a 64.0 MiB page
>>> HugeTLB: registered 256 MiB page size, pre-allocated 1 pages
>>> HugeTLB: 0 KiB vmemmap can be freed for a 256 MiB page
>>> HugeTLB: registered 4.00 MiB page size, pre-allocated 0 pages
>>> HugeTLB: 0 KiB vmemmap can be freed for a 4.00 MiB page
>>> HugeTLB: registered 16.0 MiB page size, pre-allocated 0 pages
>>> HugeTLB: 0 KiB vmemmap can be freed for a 16.0 MiB page
>>
>> I think it's a violation of CONFIG_ARCH_HAS_GIGANTIC_PAGE. The existing
>> folio_dump() code would not handle it correctly as well.
> 
> I'm trying to dig into history and when looking at commit 4eb0716e868e
> ("hugetlb: allow to free gigantic pages regardless of the
> configuration") I understand that CONFIG_ARCH_HAS_GIGANTIC_PAGE is
> needed to be able to allocate gigantic pages at runtime. It is not
> needed to reserve gigantic pages at boottime.
> 
> What am I missing ?

That CONFIG_ARCH_HAS_GIGANTIC_PAGE has nothing runtime-specific in its name.

Can't we just select CONFIG_ARCH_HAS_GIGANTIC_PAGE for the relevant 
hugetlb config that allows for *gigantic pages*.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: (bisected) [PATCH v2 08/37] mm/hugetlb: check for unreasonable folio sizes when registering hstate
  2025-10-09  9:20             ` David Hildenbrand
@ 2025-10-09 10:01               ` Christophe Leroy
  2025-10-09 10:27                 ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Christophe Leroy @ 2025-10-09 10:01 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Zi Yan, Lorenzo Stoakes, Liam R. Howlett, Alexander Potapenko,
	Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
	Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
	Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
	kasan-dev, kvm, Linus Torvalds, linux-arm-kernel,
	linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
	linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
	linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
	Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
	Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
	Vlastimil Babka, wireguard, x86, linuxppc-dev@lists.ozlabs.org



Le 09/10/2025 à 11:20, David Hildenbrand a écrit :
> On 09.10.25 11:16, Christophe Leroy wrote:
>>
>>
>> Le 09/10/2025 à 10:14, David Hildenbrand a écrit :
>>> On 09.10.25 10:04, Christophe Leroy wrote:
>>>>
>>>>
>>>> Le 09/10/2025 à 09:22, David Hildenbrand a écrit :
>>>>> On 09.10.25 09:14, Christophe Leroy wrote:
>>>>>> Hi David,
>>>>>>
>>>>>> Le 01/09/2025 à 17:03, David Hildenbrand a écrit :
>>>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>>>> index 1e777cc51ad04..d3542e92a712e 100644
>>>>>>> --- a/mm/hugetlb.c
>>>>>>> +++ b/mm/hugetlb.c
>>>>>>> @@ -4657,6 +4657,7 @@ static int __init hugetlb_init(void)
>>>>>>>          BUILD_BUG_ON(sizeof_field(struct page, private) *
>>>>>>> BITS_PER_BYTE <
>>>>>>>                  __NR_HPAGEFLAGS);
>>>>>>> +    BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);
>>>>>>>          if (!hugepages_supported()) {
>>>>>>>              if (hugetlb_max_hstate || 
>>>>>>> default_hstate_max_huge_pages)
>>>>>>> @@ -4740,6 +4741,7 @@ void __init hugetlb_add_hstate(unsigned int
>>>>>>> order)
>>>>>>>          }
>>>>>>>          BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>>>>>>>          BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
>>>>>>> +    WARN_ON(order > MAX_FOLIO_ORDER);
>>>>>>>          h = &hstates[hugetlb_max_hstate++];
>>>>>>>          __mutex_init(&h->resize_lock, "resize mutex", &h- 
>>>>>>> >resize_key);
>>>>>>>          h->order = order;
>>>>>
>>>>> We end up registering hugetlb folios that are bigger than
>>>>> MAX_FOLIO_ORDER. So we have to figure out how a config can trigger 
>>>>> that
>>>>> (and if we have to support that).
>>>>>
>>>>
>>>> MAX_FOLIO_ORDER is defined as:
>>>>
>>>> #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
>>>> #define MAX_FOLIO_ORDER        PUD_ORDER
>>>> #else
>>>> #define MAX_FOLIO_ORDER        MAX_PAGE_ORDER
>>>> #endif
>>>>
>>>> MAX_PAGE_ORDER is the limit for dynamic creation of hugepages via
>>>> /sys/kernel/mm/hugepages/ but bigger pages can be created at boottime
>>>> with kernel boot parameters without CONFIG_ARCH_HAS_GIGANTIC_PAGE:
>>>>
>>>>      hugepagesz=64m hugepages=1 hugepagesz=256m hugepages=1
>>>>
>>>> Gives:
>>>>
>>>> HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
>>>> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
>>>> HugeTLB: registered 64.0 MiB page size, pre-allocated 1 pages
>>>> HugeTLB: 0 KiB vmemmap can be freed for a 64.0 MiB page
>>>> HugeTLB: registered 256 MiB page size, pre-allocated 1 pages
>>>> HugeTLB: 0 KiB vmemmap can be freed for a 256 MiB page
>>>> HugeTLB: registered 4.00 MiB page size, pre-allocated 0 pages
>>>> HugeTLB: 0 KiB vmemmap can be freed for a 4.00 MiB page
>>>> HugeTLB: registered 16.0 MiB page size, pre-allocated 0 pages
>>>> HugeTLB: 0 KiB vmemmap can be freed for a 16.0 MiB page
>>>
>>> I think it's a violation of CONFIG_ARCH_HAS_GIGANTIC_PAGE. The existing
>>> folio_dump() code would not handle it correctly as well.
>>
>> I'm trying to dig into history and when looking at commit 4eb0716e868e
>> ("hugetlb: allow to free gigantic pages regardless of the
>> configuration") I understand that CONFIG_ARCH_HAS_GIGANTIC_PAGE is
>> needed to be able to allocate gigantic pages at runtime. It is not
>> needed to reserve gigantic pages at boottime.
>>
>> What am I missing ?
> 
> That CONFIG_ARCH_HAS_GIGANTIC_PAGE has nothing runtime-specific in its 
> name.

In its name for sure, but the commit I mention says:

     On systems without CONTIG_ALLOC activated but that support gigantic 
pages,
     boottime reserved gigantic pages can not be freed at all.  This patch
     simply enables the possibility to hand back those pages to memory
     allocator.

And one of the hunks is:

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7f7fbd8bd9d5b..7a1aa53d188d3 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -19,7 +19,7 @@ config ARM64
         select ARCH_HAS_FAST_MULTIPLIER
         select ARCH_HAS_FORTIFY_SOURCE
         select ARCH_HAS_GCOV_PROFILE_ALL
-       select ARCH_HAS_GIGANTIC_PAGE if CONTIG_ALLOC
+       select ARCH_HAS_GIGANTIC_PAGE
         select ARCH_HAS_KCOV
         select ARCH_HAS_KEEPINITRD
         select ARCH_HAS_MEMBARRIER_SYNC_CORE

So I understand from the commit message that it was possible at that 
time to have gigantic pages without ARCH_HAS_GIGANTIC_PAGE as long as 
you didn't have to be able to free them during runtime.

> 
> Can't we just select CONFIG_ARCH_HAS_GIGANTIC_PAGE for the relevant 
> hugetlb config that allows for *gigantic pages*.
> 

We probably can, but I'd really like to understand history and how we 
ended up in the situation we are now.
Because blind fixes often lead to more problems.

If I follow things correctly I see a helper gigantic_page_supported() 
added by commit 944d9fec8d7a ("hugetlb: add support for gigantic page 
allocation at runtime").

And then commit 461a7184320a ("mm/hugetlb: introduce 
ARCH_HAS_GIGANTIC_PAGE") is added to wrap gigantic_page_supported()

Then commit 4eb0716e868e ("hugetlb: allow to free gigantic pages 
regardless of the configuration") changed gigantic_page_supported() to 
gigantic_page_runtime_supported()

So where are we now ?

Christophe


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: (bisected) [PATCH v2 08/37] mm/hugetlb: check for unreasonable folio sizes when registering hstate
  2025-10-09 10:01               ` Christophe Leroy
@ 2025-10-09 10:27                 ` David Hildenbrand
  2025-10-09 12:08                   ` Christophe Leroy
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2025-10-09 10:27 UTC (permalink / raw)
  To: Christophe Leroy, linux-kernel
  Cc: Zi Yan, Lorenzo Stoakes, Liam R. Howlett, Alexander Potapenko,
	Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
	Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
	Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
	kasan-dev, kvm, Linus Torvalds, linux-arm-kernel,
	linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
	linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
	linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
	Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
	Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
	Vlastimil Babka, wireguard, x86, linuxppc-dev@lists.ozlabs.org

On 09.10.25 12:01, Christophe Leroy wrote:
> 
> 
> Le 09/10/2025 à 11:20, David Hildenbrand a écrit :
>> On 09.10.25 11:16, Christophe Leroy wrote:
>>>
>>>
>>> Le 09/10/2025 à 10:14, David Hildenbrand a écrit :
>>>> On 09.10.25 10:04, Christophe Leroy wrote:
>>>>>
>>>>>
>>>>> Le 09/10/2025 à 09:22, David Hildenbrand a écrit :
>>>>>> On 09.10.25 09:14, Christophe Leroy wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Le 01/09/2025 à 17:03, David Hildenbrand a écrit :
>>>>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>>>>> index 1e777cc51ad04..d3542e92a712e 100644
>>>>>>>> --- a/mm/hugetlb.c
>>>>>>>> +++ b/mm/hugetlb.c
>>>>>>>> @@ -4657,6 +4657,7 @@ static int __init hugetlb_init(void)
>>>>>>>>           BUILD_BUG_ON(sizeof_field(struct page, private) *
>>>>>>>> BITS_PER_BYTE <
>>>>>>>>                   __NR_HPAGEFLAGS);
>>>>>>>> +    BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);
>>>>>>>>           if (!hugepages_supported()) {
>>>>>>>>               if (hugetlb_max_hstate ||
>>>>>>>> default_hstate_max_huge_pages)
>>>>>>>> @@ -4740,6 +4741,7 @@ void __init hugetlb_add_hstate(unsigned int
>>>>>>>> order)
>>>>>>>>           }
>>>>>>>>           BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>>>>>>>>           BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
>>>>>>>> +    WARN_ON(order > MAX_FOLIO_ORDER);
>>>>>>>>           h = &hstates[hugetlb_max_hstate++];
>>>>>>>>           __mutex_init(&h->resize_lock, "resize mutex", &h-
>>>>>>>>> resize_key);
>>>>>>>>           h->order = order;
>>>>>>
>>>>>> We end up registering hugetlb folios that are bigger than
>>>>>> MAX_FOLIO_ORDER. So we have to figure out how a config can trigger
>>>>>> that
>>>>>> (and if we have to support that).
>>>>>>
>>>>>
>>>>> MAX_FOLIO_ORDER is defined as:
>>>>>
>>>>> #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
>>>>> #define MAX_FOLIO_ORDER        PUD_ORDER
>>>>> #else
>>>>> #define MAX_FOLIO_ORDER        MAX_PAGE_ORDER
>>>>> #endif
>>>>>
>>>>> MAX_PAGE_ORDER is the limit for dynamic creation of hugepages via
>>>>> /sys/kernel/mm/hugepages/ but bigger pages can be created at boottime
>>>>> with kernel boot parameters without CONFIG_ARCH_HAS_GIGANTIC_PAGE:
>>>>>
>>>>>       hugepagesz=64m hugepages=1 hugepagesz=256m hugepages=1
>>>>>
>>>>> Gives:
>>>>>
>>>>> HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
>>>>> HugeTLB: registered 64.0 MiB page size, pre-allocated 1 pages
>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 64.0 MiB page
>>>>> HugeTLB: registered 256 MiB page size, pre-allocated 1 pages
>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 256 MiB page
>>>>> HugeTLB: registered 4.00 MiB page size, pre-allocated 0 pages
>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 4.00 MiB page
>>>>> HugeTLB: registered 16.0 MiB page size, pre-allocated 0 pages
>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 16.0 MiB page
>>>>
>>>> I think it's a violation of CONFIG_ARCH_HAS_GIGANTIC_PAGE. The existing
>>>> folio_dump() code would not handle it correctly as well.
>>>
>>> I'm trying to dig into history and when looking at commit 4eb0716e868e
>>> ("hugetlb: allow to free gigantic pages regardless of the
>>> configuration") I understand that CONFIG_ARCH_HAS_GIGANTIC_PAGE is
>>> needed to be able to allocate gigantic pages at runtime. It is not
>>> needed to reserve gigantic pages at boottime.
>>>
>>> What am I missing ?
>>
>> That CONFIG_ARCH_HAS_GIGANTIC_PAGE has nothing runtime-specific in its
>> name.
> 
> In its name for sure, but the commit I mention says:
> 
>       On systems without CONTIG_ALLOC activated but that support gigantic
> pages,
>       boottime reserved gigantic pages can not be freed at all.  This patch
>       simply enables the possibility to hand back those pages to memory
>       allocator.

Right, I think it was a historical artifact.

> 
> And one of the hunks is:
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 7f7fbd8bd9d5b..7a1aa53d188d3 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -19,7 +19,7 @@ config ARM64
>           select ARCH_HAS_FAST_MULTIPLIER
>           select ARCH_HAS_FORTIFY_SOURCE
>           select ARCH_HAS_GCOV_PROFILE_ALL
> -       select ARCH_HAS_GIGANTIC_PAGE if CONTIG_ALLOC
> +       select ARCH_HAS_GIGANTIC_PAGE
>           select ARCH_HAS_KCOV
>           select ARCH_HAS_KEEPINITRD
>           select ARCH_HAS_MEMBARRIER_SYNC_CORE
> 
> So I understand from the commit message that it was possible at that
> time to have gigantic pages without ARCH_HAS_GIGANTIC_PAGE as long as
> you didn't have to be able to free them during runtime.

Yes, I agree.

> 
>>
>> Can't we just select CONFIG_ARCH_HAS_GIGANTIC_PAGE for the relevant
>> hugetlb config that allows for *gigantic pages*.
>>
> 
> We probably can, but I'd really like to understand history and how we
> ended up in the situation we are now.
> Because blind fixes often lead to more problems.

Yes, let's figure out how to to it cleanly.

> 
> If I follow things correctly I see a helper gigantic_page_supported()
> added by commit 944d9fec8d7a ("hugetlb: add support for gigantic page
> allocation at runtime").
> 
> And then commit 461a7184320a ("mm/hugetlb: introduce
> ARCH_HAS_GIGANTIC_PAGE") is added to wrap gigantic_page_supported()
> 
> Then commit 4eb0716e868e ("hugetlb: allow to free gigantic pages
> regardless of the configuration") changed gigantic_page_supported() to
> gigantic_page_runtime_supported()
> 
> So where are we now ?

In

commit fae7d834c43ccdb9fcecaf4d0f33145d884b3e5c
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Tue Feb 27 19:23:31 2024 +0000

     mm: add __dump_folio()


We started assuming that a folio in the system (boottime, dynamic, whatever)
has a maximum of MAX_FOLIO_NR_PAGES.

Any other interpretation doesn't make any sense for MAX_FOLIO_NR_PAGES.


So we have two questions:

1) How to teach MAX_FOLIO_NR_PAGES that hugetlb supports gigantic pages

2) How do we handle CONFIG_ARCH_HAS_GIGANTIC_PAGE


We have the following options

(A) Rename existing CONFIG_ARCH_HAS_GIGANTIC_PAGE to something else that is
clearer and add a new CONFIG_ARCH_HAS_GIGANTIC_PAGE.

(B) Rename existing CONFIG_ARCH_HAS_GIGANTIC_PAGE -> to something else that is
clearer and derive somehow else that hugetlb in that config supports gigantic pages.

(c) Just use CONFIG_ARCH_HAS_GIGANTIC_PAGE if hugetlb on an architecture
supports gigantic pages.


I don't quite see why an architecture should be able to opt in into dynamically
allocating+freeing gigantic pages. That's just CONTIG_ALLOC magic and not some
arch-specific thing IIRC.


Note that in mm/hugetlb.c it is

	#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
	#ifdef CONFIG_CONTIG_ALLOC

Meaning that at least the allocation side is guarded by CONTIG_ALLOC.

So I think (C) is just the right thing to do.

diff --git a/fs/Kconfig b/fs/Kconfig
index 0bfdaecaa8775..12c11eb9279d3 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -283,6 +283,8 @@ config HUGETLB_PMD_PAGE_TABLE_SHARING
         def_bool HUGETLB_PAGE
         depends on ARCH_WANT_HUGE_PMD_SHARE && SPLIT_PMD_PTLOCKS
  
+# An architecture must select this option if there is any mechanism (esp. hugetlb)
+# could obtain gigantic folios.
  config ARCH_HAS_GIGANTIC_PAGE
         bool
  


-- 
Cheers

David / dhildenb



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: (bisected) [PATCH v2 08/37] mm/hugetlb: check for unreasonable folio sizes when registering hstate
  2025-10-09 10:27                 ` David Hildenbrand
@ 2025-10-09 12:08                   ` Christophe Leroy
  2025-10-09 13:05                     ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Christophe Leroy @ 2025-10-09 12:08 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: Zi Yan, Lorenzo Stoakes, Liam R. Howlett, Alexander Potapenko,
	Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
	Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
	Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
	kasan-dev, kvm, Linus Torvalds, linux-arm-kernel,
	linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
	linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
	linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
	Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
	Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
	Vlastimil Babka, wireguard, x86, linuxppc-dev@lists.ozlabs.org



Le 09/10/2025 à 12:27, David Hildenbrand a écrit :
> On 09.10.25 12:01, Christophe Leroy wrote:
>>
>>
>> Le 09/10/2025 à 11:20, David Hildenbrand a écrit :
>>> On 09.10.25 11:16, Christophe Leroy wrote:
>>>>
>>>>
>>>> Le 09/10/2025 à 10:14, David Hildenbrand a écrit :
>>>>> On 09.10.25 10:04, Christophe Leroy wrote:
>>>>>>
>>>>>>
>>>>>> Le 09/10/2025 à 09:22, David Hildenbrand a écrit :
>>>>>>> On 09.10.25 09:14, Christophe Leroy wrote:
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> Le 01/09/2025 à 17:03, David Hildenbrand a écrit :
>>>>>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>>>>>> index 1e777cc51ad04..d3542e92a712e 100644
>>>>>>>>> --- a/mm/hugetlb.c
>>>>>>>>> +++ b/mm/hugetlb.c
>>>>>>>>> @@ -4657,6 +4657,7 @@ static int __init hugetlb_init(void)
>>>>>>>>>           BUILD_BUG_ON(sizeof_field(struct page, private) *
>>>>>>>>> BITS_PER_BYTE <
>>>>>>>>>                   __NR_HPAGEFLAGS);
>>>>>>>>> +    BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);
>>>>>>>>>           if (!hugepages_supported()) {
>>>>>>>>>               if (hugetlb_max_hstate ||
>>>>>>>>> default_hstate_max_huge_pages)
>>>>>>>>> @@ -4740,6 +4741,7 @@ void __init hugetlb_add_hstate(unsigned int
>>>>>>>>> order)
>>>>>>>>>           }
>>>>>>>>>           BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>>>>>>>>>           BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
>>>>>>>>> +    WARN_ON(order > MAX_FOLIO_ORDER);
>>>>>>>>>           h = &hstates[hugetlb_max_hstate++];
>>>>>>>>>           __mutex_init(&h->resize_lock, "resize mutex", &h-
>>>>>>>>>> resize_key);
>>>>>>>>>           h->order = order;
>>>>>>>
>>>>>>> We end up registering hugetlb folios that are bigger than
>>>>>>> MAX_FOLIO_ORDER. So we have to figure out how a config can trigger
>>>>>>> that
>>>>>>> (and if we have to support that).
>>>>>>>
>>>>>>
>>>>>> MAX_FOLIO_ORDER is defined as:
>>>>>>
>>>>>> #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
>>>>>> #define MAX_FOLIO_ORDER        PUD_ORDER
>>>>>> #else
>>>>>> #define MAX_FOLIO_ORDER        MAX_PAGE_ORDER
>>>>>> #endif
>>>>>>
>>>>>> MAX_PAGE_ORDER is the limit for dynamic creation of hugepages via
>>>>>> /sys/kernel/mm/hugepages/ but bigger pages can be created at boottime
>>>>>> with kernel boot parameters without CONFIG_ARCH_HAS_GIGANTIC_PAGE:
>>>>>>
>>>>>>       hugepagesz=64m hugepages=1 hugepagesz=256m hugepages=1
>>>>>>
>>>>>> Gives:
>>>>>>
>>>>>> HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
>>>>>> HugeTLB: registered 64.0 MiB page size, pre-allocated 1 pages
>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 64.0 MiB page
>>>>>> HugeTLB: registered 256 MiB page size, pre-allocated 1 pages
>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 256 MiB page
>>>>>> HugeTLB: registered 4.00 MiB page size, pre-allocated 0 pages
>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 4.00 MiB page
>>>>>> HugeTLB: registered 16.0 MiB page size, pre-allocated 0 pages
>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 16.0 MiB page
>>>>>
>>>>> I think it's a violation of CONFIG_ARCH_HAS_GIGANTIC_PAGE. The 
>>>>> existing
>>>>> folio_dump() code would not handle it correctly as well.
>>>>
>>>> I'm trying to dig into history and when looking at commit 4eb0716e868e
>>>> ("hugetlb: allow to free gigantic pages regardless of the
>>>> configuration") I understand that CONFIG_ARCH_HAS_GIGANTIC_PAGE is
>>>> needed to be able to allocate gigantic pages at runtime. It is not
>>>> needed to reserve gigantic pages at boottime.
>>>>
>>>> What am I missing ?
>>>
>>> That CONFIG_ARCH_HAS_GIGANTIC_PAGE has nothing runtime-specific in its
>>> name.
>>
>> In its name for sure, but the commit I mention says:
>>
>>       On systems without CONTIG_ALLOC activated but that support gigantic
>> pages,
>>       boottime reserved gigantic pages can not be freed at all.  This 
>> patch
>>       simply enables the possibility to hand back those pages to memory
>>       allocator.
> 
> Right, I think it was a historical artifact.
> 
>>
>> And one of the hunks is:
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 7f7fbd8bd9d5b..7a1aa53d188d3 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -19,7 +19,7 @@ config ARM64
>>           select ARCH_HAS_FAST_MULTIPLIER
>>           select ARCH_HAS_FORTIFY_SOURCE
>>           select ARCH_HAS_GCOV_PROFILE_ALL
>> -       select ARCH_HAS_GIGANTIC_PAGE if CONTIG_ALLOC
>> +       select ARCH_HAS_GIGANTIC_PAGE
>>           select ARCH_HAS_KCOV
>>           select ARCH_HAS_KEEPINITRD
>>           select ARCH_HAS_MEMBARRIER_SYNC_CORE
>>
>> So I understand from the commit message that it was possible at that
>> time to have gigantic pages without ARCH_HAS_GIGANTIC_PAGE as long as
>> you didn't have to be able to free them during runtime.
> 
> Yes, I agree.
> 
>>
>>>
>>> Can't we just select CONFIG_ARCH_HAS_GIGANTIC_PAGE for the relevant
>>> hugetlb config that allows for *gigantic pages*.
>>>
>>
>> We probably can, but I'd really like to understand history and how we
>> ended up in the situation we are now.
>> Because blind fixes often lead to more problems.
> 
> Yes, let's figure out how to to it cleanly.
> 
>>
>> If I follow things correctly I see a helper gigantic_page_supported()
>> added by commit 944d9fec8d7a ("hugetlb: add support for gigantic page
>> allocation at runtime").
>>
>> And then commit 461a7184320a ("mm/hugetlb: introduce
>> ARCH_HAS_GIGANTIC_PAGE") is added to wrap gigantic_page_supported()
>>
>> Then commit 4eb0716e868e ("hugetlb: allow to free gigantic pages
>> regardless of the configuration") changed gigantic_page_supported() to
>> gigantic_page_runtime_supported()
>>
>> So where are we now ?
> 
> In
> 
> commit fae7d834c43ccdb9fcecaf4d0f33145d884b3e5c
> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
> Date:   Tue Feb 27 19:23:31 2024 +0000
> 
>      mm: add __dump_folio()
> 
> 
> We started assuming that a folio in the system (boottime, dynamic, 
> whatever)
> has a maximum of MAX_FOLIO_NR_PAGES.
> 
> Any other interpretation doesn't make any sense for MAX_FOLIO_NR_PAGES.
> 
> 
> So we have two questions:
> 
> 1) How to teach MAX_FOLIO_NR_PAGES that hugetlb supports gigantic pages
> 
> 2) How do we handle CONFIG_ARCH_HAS_GIGANTIC_PAGE
> 
> 
> We have the following options
> 
> (A) Rename existing CONFIG_ARCH_HAS_GIGANTIC_PAGE to something else that is
> clearer and add a new CONFIG_ARCH_HAS_GIGANTIC_PAGE.
> 
> (B) Rename existing CONFIG_ARCH_HAS_GIGANTIC_PAGE -> to something else 
> that is
> clearer and derive somehow else that hugetlb in that config supports 
> gigantic pages.
> 
> (c) Just use CONFIG_ARCH_HAS_GIGANTIC_PAGE if hugetlb on an architecture
> supports gigantic pages.
> 
> 
> I don't quite see why an architecture should be able to opt in into 
> dynamically
> allocating+freeing gigantic pages. That's just CONTIG_ALLOC magic and 
> not some
> arch-specific thing IIRC.
> 
> 
> Note that in mm/hugetlb.c it is
> 
>      #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
>      #ifdef CONFIG_CONTIG_ALLOC
> 
> Meaning that at least the allocation side is guarded by CONTIG_ALLOC.

Yes but not the freeing since commit 4eb0716e868e ("hugetlb: allow to 
free gigantic pages regardless of the configuration")

> 
> So I think (C) is just the right thing to do.
> 
> diff --git a/fs/Kconfig b/fs/Kconfig
> index 0bfdaecaa8775..12c11eb9279d3 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -283,6 +283,8 @@ config HUGETLB_PMD_PAGE_TABLE_SHARING
>          def_bool HUGETLB_PAGE
>          depends on ARCH_WANT_HUGE_PMD_SHARE && SPLIT_PMD_PTLOCKS
> 
> +# An architecture must select this option if there is any mechanism 
> (esp. hugetlb)
> +# could obtain gigantic folios.
>   config ARCH_HAS_GIGANTIC_PAGE
>          bool
> 
> 

I gave it a try. That's not enough, it fixes the problem for 64 Mbytes 
pages and 256 Mbytes pages, but not for 1 Gbytes pages.

Max folio is defined by PUD_ORDER, but PUD_SIZE is 256 Mbytes so we need 
to make MAX_FOLIO larger. Do we change it to P4D_ORDER or is it too much 
? P4D_SIZE is 128 Gbytes

Christophe



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: (bisected) [PATCH v2 08/37] mm/hugetlb: check for unreasonable folio sizes when registering hstate
  2025-10-09 12:08                   ` Christophe Leroy
@ 2025-10-09 13:05                     ` David Hildenbrand
  0 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2025-10-09 13:05 UTC (permalink / raw)
  To: Christophe Leroy, linux-kernel
  Cc: Zi Yan, Lorenzo Stoakes, Liam R. Howlett, Alexander Potapenko,
	Andrew Morton, Brendan Jackman, Christoph Lameter, Dennis Zhou,
	Dmitry Vyukov, dri-devel, intel-gfx, iommu, io-uring,
	Jason Gunthorpe, Jens Axboe, Johannes Weiner, John Hubbard,
	kasan-dev, kvm, Linus Torvalds, linux-arm-kernel,
	linux-arm-kernel, linux-crypto, linux-ide, linux-kselftest,
	linux-mips, linux-mmc, linux-mm, linux-riscv, linux-s390,
	linux-scsi, Marco Elver, Marek Szyprowski, Michal Hocko,
	Mike Rapoport, Muchun Song, netdev, Oscar Salvador, Peter Xu,
	Robin Murphy, Suren Baghdasaryan, Tejun Heo, virtualization,
	Vlastimil Babka, wireguard, x86, linuxppc-dev@lists.ozlabs.org

On 09.10.25 14:08, Christophe Leroy wrote:
> 
> 
> Le 09/10/2025 à 12:27, David Hildenbrand a écrit :
>> On 09.10.25 12:01, Christophe Leroy wrote:
>>>
>>>
>>> Le 09/10/2025 à 11:20, David Hildenbrand a écrit :
>>>> On 09.10.25 11:16, Christophe Leroy wrote:
>>>>>
>>>>>
>>>>> Le 09/10/2025 à 10:14, David Hildenbrand a écrit :
>>>>>> On 09.10.25 10:04, Christophe Leroy wrote:
>>>>>>>
>>>>>>>
>>>>>>> Le 09/10/2025 à 09:22, David Hildenbrand a écrit :
>>>>>>>> On 09.10.25 09:14, Christophe Leroy wrote:
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> Le 01/09/2025 à 17:03, David Hildenbrand a écrit :
>>>>>>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>>>>>>> index 1e777cc51ad04..d3542e92a712e 100644
>>>>>>>>>> --- a/mm/hugetlb.c
>>>>>>>>>> +++ b/mm/hugetlb.c
>>>>>>>>>> @@ -4657,6 +4657,7 @@ static int __init hugetlb_init(void)
>>>>>>>>>>            BUILD_BUG_ON(sizeof_field(struct page, private) *
>>>>>>>>>> BITS_PER_BYTE <
>>>>>>>>>>                    __NR_HPAGEFLAGS);
>>>>>>>>>> +    BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);
>>>>>>>>>>            if (!hugepages_supported()) {
>>>>>>>>>>                if (hugetlb_max_hstate ||
>>>>>>>>>> default_hstate_max_huge_pages)
>>>>>>>>>> @@ -4740,6 +4741,7 @@ void __init hugetlb_add_hstate(unsigned int
>>>>>>>>>> order)
>>>>>>>>>>            }
>>>>>>>>>>            BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>>>>>>>>>>            BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
>>>>>>>>>> +    WARN_ON(order > MAX_FOLIO_ORDER);
>>>>>>>>>>            h = &hstates[hugetlb_max_hstate++];
>>>>>>>>>>            __mutex_init(&h->resize_lock, "resize mutex", &h-
>>>>>>>>>>> resize_key);
>>>>>>>>>>            h->order = order;
>>>>>>>>
>>>>>>>> We end up registering hugetlb folios that are bigger than
>>>>>>>> MAX_FOLIO_ORDER. So we have to figure out how a config can trigger
>>>>>>>> that
>>>>>>>> (and if we have to support that).
>>>>>>>>
>>>>>>>
>>>>>>> MAX_FOLIO_ORDER is defined as:
>>>>>>>
>>>>>>> #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
>>>>>>> #define MAX_FOLIO_ORDER        PUD_ORDER
>>>>>>> #else
>>>>>>> #define MAX_FOLIO_ORDER        MAX_PAGE_ORDER
>>>>>>> #endif
>>>>>>>
>>>>>>> MAX_PAGE_ORDER is the limit for dynamic creation of hugepages via
>>>>>>> /sys/kernel/mm/hugepages/ but bigger pages can be created at boottime
>>>>>>> with kernel boot parameters without CONFIG_ARCH_HAS_GIGANTIC_PAGE:
>>>>>>>
>>>>>>>        hugepagesz=64m hugepages=1 hugepagesz=256m hugepages=1
>>>>>>>
>>>>>>> Gives:
>>>>>>>
>>>>>>> HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
>>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
>>>>>>> HugeTLB: registered 64.0 MiB page size, pre-allocated 1 pages
>>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 64.0 MiB page
>>>>>>> HugeTLB: registered 256 MiB page size, pre-allocated 1 pages
>>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 256 MiB page
>>>>>>> HugeTLB: registered 4.00 MiB page size, pre-allocated 0 pages
>>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 4.00 MiB page
>>>>>>> HugeTLB: registered 16.0 MiB page size, pre-allocated 0 pages
>>>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 16.0 MiB page
>>>>>>
>>>>>> I think it's a violation of CONFIG_ARCH_HAS_GIGANTIC_PAGE. The
>>>>>> existing
>>>>>> folio_dump() code would not handle it correctly as well.
>>>>>
>>>>> I'm trying to dig into history and when looking at commit 4eb0716e868e
>>>>> ("hugetlb: allow to free gigantic pages regardless of the
>>>>> configuration") I understand that CONFIG_ARCH_HAS_GIGANTIC_PAGE is
>>>>> needed to be able to allocate gigantic pages at runtime. It is not
>>>>> needed to reserve gigantic pages at boottime.
>>>>>
>>>>> What am I missing ?
>>>>
>>>> That CONFIG_ARCH_HAS_GIGANTIC_PAGE has nothing runtime-specific in its
>>>> name.
>>>
>>> In its name for sure, but the commit I mention says:
>>>
>>>        On systems without CONTIG_ALLOC activated but that support gigantic
>>> pages,
>>>        boottime reserved gigantic pages can not be freed at all.  This
>>> patch
>>>        simply enables the possibility to hand back those pages to memory
>>>        allocator.
>>
>> Right, I think it was a historical artifact.
>>
>>>
>>> And one of the hunks is:
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index 7f7fbd8bd9d5b..7a1aa53d188d3 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -19,7 +19,7 @@ config ARM64
>>>            select ARCH_HAS_FAST_MULTIPLIER
>>>            select ARCH_HAS_FORTIFY_SOURCE
>>>            select ARCH_HAS_GCOV_PROFILE_ALL
>>> -       select ARCH_HAS_GIGANTIC_PAGE if CONTIG_ALLOC
>>> +       select ARCH_HAS_GIGANTIC_PAGE
>>>            select ARCH_HAS_KCOV
>>>            select ARCH_HAS_KEEPINITRD
>>>            select ARCH_HAS_MEMBARRIER_SYNC_CORE
>>>
>>> So I understand from the commit message that it was possible at that
>>> time to have gigantic pages without ARCH_HAS_GIGANTIC_PAGE as long as
>>> you didn't have to be able to free them during runtime.
>>
>> Yes, I agree.
>>
>>>
>>>>
>>>> Can't we just select CONFIG_ARCH_HAS_GIGANTIC_PAGE for the relevant
>>>> hugetlb config that allows for *gigantic pages*.
>>>>
>>>
>>> We probably can, but I'd really like to understand history and how we
>>> ended up in the situation we are now.
>>> Because blind fixes often lead to more problems.
>>
>> Yes, let's figure out how to to it cleanly.
>>
>>>
>>> If I follow things correctly I see a helper gigantic_page_supported()
>>> added by commit 944d9fec8d7a ("hugetlb: add support for gigantic page
>>> allocation at runtime").
>>>
>>> And then commit 461a7184320a ("mm/hugetlb: introduce
>>> ARCH_HAS_GIGANTIC_PAGE") is added to wrap gigantic_page_supported()
>>>
>>> Then commit 4eb0716e868e ("hugetlb: allow to free gigantic pages
>>> regardless of the configuration") changed gigantic_page_supported() to
>>> gigantic_page_runtime_supported()
>>>
>>> So where are we now ?
>>
>> In
>>
>> commit fae7d834c43ccdb9fcecaf4d0f33145d884b3e5c
>> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
>> Date:   Tue Feb 27 19:23:31 2024 +0000
>>
>>       mm: add __dump_folio()
>>
>>
>> We started assuming that a folio in the system (boottime, dynamic,
>> whatever)
>> has a maximum of MAX_FOLIO_NR_PAGES.
>>
>> Any other interpretation doesn't make any sense for MAX_FOLIO_NR_PAGES.
>>
>>
>> So we have two questions:
>>
>> 1) How to teach MAX_FOLIO_NR_PAGES that hugetlb supports gigantic pages
>>
>> 2) How do we handle CONFIG_ARCH_HAS_GIGANTIC_PAGE
>>
>>
>> We have the following options
>>
>> (A) Rename existing CONFIG_ARCH_HAS_GIGANTIC_PAGE to something else that is
>> clearer and add a new CONFIG_ARCH_HAS_GIGANTIC_PAGE.
>>
>> (B) Rename existing CONFIG_ARCH_HAS_GIGANTIC_PAGE -> to something else
>> that is
>> clearer and derive somehow else that hugetlb in that config supports
>> gigantic pages.
>>
>> (c) Just use CONFIG_ARCH_HAS_GIGANTIC_PAGE if hugetlb on an architecture
>> supports gigantic pages.
>>
>>
>> I don't quite see why an architecture should be able to opt in into
>> dynamically
>> allocating+freeing gigantic pages. That's just CONTIG_ALLOC magic and
>> not some
>> arch-specific thing IIRC.
>>
>>
>> Note that in mm/hugetlb.c it is
>>
>>       #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
>>       #ifdef CONFIG_CONTIG_ALLOC
>>
>> Meaning that at least the allocation side is guarded by CONTIG_ALLOC.
> 
> Yes but not the freeing since commit 4eb0716e868e ("hugetlb: allow to
> free gigantic pages regardless of the configuration")

Right, the freeing path is just always around as we no longer depend 
free_contig_range().

> 
>>
>> So I think (C) is just the right thing to do.
>>
>> diff --git a/fs/Kconfig b/fs/Kconfig
>> index 0bfdaecaa8775..12c11eb9279d3 100644
>> --- a/fs/Kconfig
>> +++ b/fs/Kconfig
>> @@ -283,6 +283,8 @@ config HUGETLB_PMD_PAGE_TABLE_SHARING
>>           def_bool HUGETLB_PAGE
>>           depends on ARCH_WANT_HUGE_PMD_SHARE && SPLIT_PMD_PTLOCKS
>>
>> +# An architecture must select this option if there is any mechanism
>> (esp. hugetlb)
>> +# could obtain gigantic folios.
>>    config ARCH_HAS_GIGANTIC_PAGE
>>           bool
>>
>>
> 
> I gave it a try. That's not enough, it fixes the problem for 64 Mbytes
> pages and 256 Mbytes pages, but not for 1 Gbytes pages.

Thanks!

> 
> Max folio is defined by PUD_ORDER, but PUD_SIZE is 256 Mbytes so we need
> to make MAX_FOLIO larger. Do we change it to P4D_ORDER or is it too much
> ? P4D_SIZE is 128 Gbytes

The exact size doesn't matter, we started with something that soundes 
reasonable.

I added the comment "There is no real limit on the folio size. We limit 
them to the maximum we currently expect (e.g., hugetlb, dax)."

We can set it to whatever we would expect for now.

-- 
Cheers

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-10-09 13:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20250901150359.867252-1-david@redhat.com>
     [not found] ` <20250901150359.867252-9-david@redhat.com>
2025-10-09  7:14   ` (bisected) [PATCH v2 08/37] mm/hugetlb: check for unreasonable folio sizes when registering hstate Christophe Leroy
2025-10-09  7:22     ` David Hildenbrand
2025-10-09  7:44       ` Christophe Leroy
2025-10-09  8:04       ` Christophe Leroy
2025-10-09  8:14         ` David Hildenbrand
2025-10-09  9:16           ` Christophe Leroy
2025-10-09  9:20             ` David Hildenbrand
2025-10-09 10:01               ` Christophe Leroy
2025-10-09 10:27                 ` David Hildenbrand
2025-10-09 12:08                   ` Christophe Leroy
2025-10-09 13:05                     ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).