From: David Hildenbrand <david@redhat.com>
To: Christophe Leroy <christophe.leroy@csgroup.eu>,
linux-kernel@vger.kernel.org
Cc: Zi Yan <ziy@nvidia.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Alexander Potapenko <glider@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Brendan Jackman <jackmanb@google.com>,
Christoph Lameter <cl@gentwo.org>,
Dennis Zhou <dennis@kernel.org>,
Dmitry Vyukov <dvyukov@google.com>,
dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org,
iommu@lists.linux.dev, io-uring@vger.kernel.org,
Jason Gunthorpe <jgg@nvidia.com>, Jens Axboe <axboe@kernel.dk>,
Johannes Weiner <hannes@cmpxchg.org>,
John Hubbard <jhubbard@nvidia.com>,
kasan-dev@googlegroups.com, kvm@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-arm-kernel@axis.com, linux-arm-kernel@lists.infradead.org,
linux-crypto@vger.kernel.org, linux-ide@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-mips@vger.kernel.org,
linux-mmc@vger.kernel.org, linux-mm@kvack.org,
linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
linux-scsi@vger.kernel.org, Marco Elver <elver@google.com>,
Marek Szyprowski <m.szyprowski@samsung.com>,
Michal Hocko <mhocko@suse.com>, Mike Rapoport <rppt@kernel.org>,
Muchun Song <muchun.song@linux.dev>,
netdev@vger.kernel.org, Oscar Salvador <osalvador@suse.de>,
Peter Xu <peterx@redhat.com>, Robin Murphy <robin.murphy@arm.com>,
Suren Baghdasaryan <surenb@google.com>, Tejun Heo <tj@kernel.org>,
virtualization@lists.linux.dev, Vlastimil Babka <vbabka@suse.cz>,
wireguard@lists.zx2c4.com, x86@kernel.org,
"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>
Subject: Re: (bisected) [PATCH v2 08/37] mm/hugetlb: check for unreasonable folio sizes when registering hstate
Date: Thu, 9 Oct 2025 12:27:17 +0200
Message-ID: <543e9440-8ee0-4d9e-9b05-0107032d665b@redhat.com>
In-Reply-To: <0c730c52-97ee-43ea-9697-ac11d2880ab7@csgroup.eu>
On 09.10.25 12:01, Christophe Leroy wrote:
>
>
> On 09/10/2025 at 11:20, David Hildenbrand wrote:
>> On 09.10.25 11:16, Christophe Leroy wrote:
>>>
>>>
>>> On 09/10/2025 at 10:14, David Hildenbrand wrote:
>>>> On 09.10.25 10:04, Christophe Leroy wrote:
>>>>>
>>>>>
>>>>> On 09/10/2025 at 09:22, David Hildenbrand wrote:
>>>>>> On 09.10.25 09:14, Christophe Leroy wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> On 01/09/2025 at 17:03, David Hildenbrand wrote:
>>>>>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>>>>>> index 1e777cc51ad04..d3542e92a712e 100644
>>>>>>>> --- a/mm/hugetlb.c
>>>>>>>> +++ b/mm/hugetlb.c
>>>>>>>> @@ -4657,6 +4657,7 @@ static int __init hugetlb_init(void)
>>>>>>>> BUILD_BUG_ON(sizeof_field(struct page, private) *
>>>>>>>> BITS_PER_BYTE <
>>>>>>>> __NR_HPAGEFLAGS);
>>>>>>>> + BUILD_BUG_ON_INVALID(HUGETLB_PAGE_ORDER > MAX_FOLIO_ORDER);
>>>>>>>> if (!hugepages_supported()) {
>>>>>>>> if (hugetlb_max_hstate ||
>>>>>>>> default_hstate_max_huge_pages)
>>>>>>>> @@ -4740,6 +4741,7 @@ void __init hugetlb_add_hstate(unsigned int
>>>>>>>> order)
>>>>>>>> }
>>>>>>>> BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
>>>>>>>> BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
>>>>>>>> + WARN_ON(order > MAX_FOLIO_ORDER);
>>>>>>>> h = &hstates[hugetlb_max_hstate++];
>>>>>>>> __mutex_init(&h->resize_lock, "resize mutex", &h-
>>>>>>>>> resize_key);
>>>>>>>> h->order = order;
>>>>>>
>>>>>> We end up registering hugetlb folios that are bigger than
>>>>>> MAX_FOLIO_ORDER. So we have to figure out how a config can trigger
>>>>>> that
>>>>>> (and if we have to support that).
>>>>>>
>>>>>
>>>>> MAX_FOLIO_ORDER is defined as:
>>>>>
>>>>> #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
>>>>> #define MAX_FOLIO_ORDER PUD_ORDER
>>>>> #else
>>>>> #define MAX_FOLIO_ORDER MAX_PAGE_ORDER
>>>>> #endif
>>>>>
>>>>> MAX_PAGE_ORDER is the limit for dynamic creation of hugepages via
>>>>> /sys/kernel/mm/hugepages/ but bigger pages can be created at boottime
>>>>> with kernel boot parameters without CONFIG_ARCH_HAS_GIGANTIC_PAGE:
>>>>>
>>>>> hugepagesz=64m hugepages=1 hugepagesz=256m hugepages=1
>>>>>
>>>>> Gives:
>>>>>
>>>>> HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
>>>>> HugeTLB: registered 64.0 MiB page size, pre-allocated 1 pages
>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 64.0 MiB page
>>>>> HugeTLB: registered 256 MiB page size, pre-allocated 1 pages
>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 256 MiB page
>>>>> HugeTLB: registered 4.00 MiB page size, pre-allocated 0 pages
>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 4.00 MiB page
>>>>> HugeTLB: registered 16.0 MiB page size, pre-allocated 0 pages
>>>>> HugeTLB: 0 KiB vmemmap can be freed for a 16.0 MiB page
>>>>
>>>> I think it's a violation of CONFIG_ARCH_HAS_GIGANTIC_PAGE. The existing
>>>> folio_dump() code would not handle it correctly as well.
>>>
>>> I'm trying to dig into history and when looking at commit 4eb0716e868e
>>> ("hugetlb: allow to free gigantic pages regardless of the
>>> configuration") I understand that CONFIG_ARCH_HAS_GIGANTIC_PAGE is
>>> needed to be able to allocate gigantic pages at runtime. It is not
>>> needed to reserve gigantic pages at boottime.
>>>
>>> What am I missing ?
>>
>> That CONFIG_ARCH_HAS_GIGANTIC_PAGE has nothing runtime-specific in its
>> name.
>
> In its name for sure, but the commit I mention says:
>
> On systems without CONTIG_ALLOC activated but that support gigantic
> pages,
> boottime reserved gigantic pages can not be freed at all. This patch
> simply enables the possibility to hand back those pages to memory
> allocator.
Right, I think it was a historical artifact.
>
> And one of the hunks is:
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 7f7fbd8bd9d5b..7a1aa53d188d3 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -19,7 +19,7 @@ config ARM64
> select ARCH_HAS_FAST_MULTIPLIER
> select ARCH_HAS_FORTIFY_SOURCE
> select ARCH_HAS_GCOV_PROFILE_ALL
> - select ARCH_HAS_GIGANTIC_PAGE if CONTIG_ALLOC
> + select ARCH_HAS_GIGANTIC_PAGE
> select ARCH_HAS_KCOV
> select ARCH_HAS_KEEPINITRD
> select ARCH_HAS_MEMBARRIER_SYNC_CORE
>
> So I understand from the commit message that it was possible at that
> time to have gigantic pages without ARCH_HAS_GIGANTIC_PAGE as long as
> you didn't have to be able to free them during runtime.
Yes, I agree.
>
>>
>> Can't we just select CONFIG_ARCH_HAS_GIGANTIC_PAGE for the relevant
>> hugetlb config that allows for *gigantic pages*.
>>
>
> We probably can, but I'd really like to understand history and how we
> ended up in the situation we are now.
> Because blind fixes often lead to more problems.
Yes, let's figure out how to do it cleanly.
>
> If I follow things correctly I see a helper gigantic_page_supported()
> added by commit 944d9fec8d7a ("hugetlb: add support for gigantic page
> allocation at runtime").
>
> And then commit 461a7184320a ("mm/hugetlb: introduce
> ARCH_HAS_GIGANTIC_PAGE") is added to wrap gigantic_page_supported()
>
> Then commit 4eb0716e868e ("hugetlb: allow to free gigantic pages
> regardless of the configuration") changed gigantic_page_supported() to
> gigantic_page_runtime_supported()
>
> So where are we now ?
In

commit fae7d834c43ccdb9fcecaf4d0f33145d884b3e5c
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Tue Feb 27 19:23:31 2024 +0000

    mm: add __dump_folio()

we started assuming that a folio in the system (boottime, dynamic, whatever)
has at most MAX_FOLIO_NR_PAGES pages.

Any other interpretation doesn't make any sense for MAX_FOLIO_NR_PAGES.
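
To make the mismatch concrete, here is a rough userspace sketch (not kernel code) of the order arithmetic behind the new WARN_ON(). It assumes 4 KiB base pages and the default MAX_PAGE_ORDER of 10, neither of which is confirmed for the config in the report; all sketch_* names are made up for illustration.

```c
#include <stdbool.h>

/*
 * Assumptions for illustration only: 4 KiB base pages and the default
 * MAX_PAGE_ORDER of 10. The config that triggered the report may differ.
 */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_MAX_PAGE_ORDER	10

/* Without ARCH_HAS_GIGANTIC_PAGE, MAX_FOLIO_ORDER is MAX_PAGE_ORDER. */
#define SKETCH_MAX_FOLIO_ORDER	SKETCH_MAX_PAGE_ORDER

/* Order of a power-of-two huge page size given in bytes. */
static unsigned int sketch_size_to_order(unsigned long long size)
{
	unsigned int order = 0;

	while ((1ULL << (SKETCH_PAGE_SHIFT + order)) < size)
		order++;
	return order;
}

/* Same comparison as the WARN_ON() added to hugetlb_add_hstate(). */
static bool sketch_hstate_would_warn(unsigned long long size)
{
	return sketch_size_to_order(size) > SKETCH_MAX_FOLIO_ORDER;
}
```

Under these assumptions, hugepagesz=64m is order 14 and trips the check, while a 4 MiB hstate is order 10 and does not.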
So we have two questions:

1) How to teach MAX_FOLIO_NR_PAGES that hugetlb supports gigantic pages
2) How do we handle CONFIG_ARCH_HAS_GIGANTIC_PAGE

We have the following options:

(A) Rename the existing CONFIG_ARCH_HAS_GIGANTIC_PAGE to something
    clearer and add a new CONFIG_ARCH_HAS_GIGANTIC_PAGE.

(B) Rename the existing CONFIG_ARCH_HAS_GIGANTIC_PAGE to something
    clearer and derive in some other way that hugetlb in that config
    supports gigantic pages.

(C) Just use CONFIG_ARCH_HAS_GIGANTIC_PAGE if hugetlb on an architecture
    supports gigantic pages.

I don't quite see why an architecture should be able to opt into dynamically
allocating+freeing gigantic pages. That's just CONTIG_ALLOC magic and not some
arch-specific thing IIRC.

Note that in mm/hugetlb.c it is

	#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
	#ifdef CONFIG_CONTIG_ALLOC

meaning that at least the allocation side is guarded by CONTIG_ALLOC.

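
As a minimal standalone sketch of what that nesting implies (this is not the code in mm/hugetlb.c; the sketch_* helpers are hypothetical and the two symbols are set by hand here instead of coming from .config):

```c
#include <stdbool.h>

/*
 * Assumption: hand-set stand-ins for the Kconfig symbols. This sketch
 * models an arch that has gigantic pages but is built without
 * CONTIG_ALLOC, so CONFIG_CONTIG_ALLOC is deliberately left undefined.
 */
#define CONFIG_ARCH_HAS_GIGANTIC_PAGE 1

#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE
#ifdef CONFIG_CONTIG_ALLOC
/* Both symbols set: gigantic folios can be allocated at runtime. */
static bool sketch_runtime_gigantic_alloc(void) { return true; }
#else
/* Arch symbol only: gigantic folios exist, but only from boot time. */
static bool sketch_runtime_gigantic_alloc(void) { return false; }
#endif
static bool sketch_gigantic_folios_possible(void) { return true; }
#else
/* Arch symbol not set: no gigantic folios expected at all. */
static bool sketch_runtime_gigantic_alloc(void) { return false; }
static bool sketch_gigantic_folios_possible(void) { return false; }
#endif
```

With the settings above, gigantic folios are possible but cannot be allocated at runtime, which is the split that option (C) relies on: the arch symbol says whether such folios can exist, CONTIG_ALLOC says whether they can be allocated dynamically.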
So I think (C) is just the right thing to do.
diff --git a/fs/Kconfig b/fs/Kconfig
index 0bfdaecaa8775..12c11eb9279d3 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -283,6 +283,8 @@ config HUGETLB_PMD_PAGE_TABLE_SHARING
def_bool HUGETLB_PAGE
depends on ARCH_WANT_HUGE_PMD_SHARE && SPLIT_PMD_PTLOCKS
+# An architecture must select this option if there is any mechanism (esp. hugetlb)
+# that could obtain gigantic folios.
config ARCH_HAS_GIGANTIC_PAGE
bool
--
Cheers
David / dhildenb