From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 2 Feb 2026 10:44:54 +0000
From: Kiryl Shutsemau
To: Usama Arif
Cc: ziy@nvidia.com, Andrew Morton, David Hildenbrand, lorenzo.stoakes@oracle.com,
        linux-mm@kvack.org, hannes@cmpxchg.org, riel@surriel.com,
        shakeel.butt@linux.dev, baohua@kernel.org, dev.jain@arm.com,
        baolin.wang@linux.alibaba.com, npache@redhat.com, Liam.Howlett@oracle.com,
        ryan.roberts@arm.com, vbabka@suse.cz, lance.yang@linux.dev,
        linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [RFC 01/12] mm: add PUD THP ptdesc and rmap support
Message-ID: 
References: <20260202005451.774496-1-usamaarif642@gmail.com>
 <20260202005451.774496-2-usamaarif642@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20260202005451.774496-2-usamaarif642@gmail.com>

On Sun, Feb 01, 2026 at 04:50:18PM -0800, Usama Arif wrote:
> For page table management, PUD THPs need to pre-deposit page tables
> that will be used when the huge page is later split. When a PUD THP
> is allocated, we cannot know in advance when or why it might need to
> be split (COW, partial unmap, reclaim), but we need page tables ready
> for that eventuality. Similar to how PMD THPs deposit a single PTE
> table, PUD THPs deposit a PMD table which itself contains deposited
> PTE tables - a two-level deposit. This commit adds the deposit/withdraw
> infrastructure and a new pud_huge_pmd field in ptdesc to store the
> deposited PMD.
>
> The deposited PMD tables are stored as a singly-linked stack using only
> page->lru.next as the link pointer. A doubly-linked list using the
> standard list_head mechanism would cause memory corruption: list_del()
> poisons both lru.next (offset 8) and lru.prev (offset 16), but lru.prev
> overlaps with ptdesc->pmd_huge_pte at offset 16. Since deposited PMD
> tables have their own deposited PTE tables stored in pmd_huge_pte,
> poisoning lru.prev would corrupt the PTE table list and cause crashes
> when withdrawing PTE tables during split.
> PMD THPs don't have this problem because their deposited PTE tables
> don't have sub-deposits. Using only lru.next avoids the overlap entirely.
>
> For reverse mapping, PUD THPs need the same rmap support that PMD THPs
> have. The page_vma_mapped_walk() function is extended to recognize and
> handle PUD-mapped folios during rmap traversal. A new TTU_SPLIT_HUGE_PUD
> flag tells the unmap path to split PUD THPs before proceeding, since
> there is no PUD-level migration entry format - the split converts the
> single PUD mapping into individual PTE mappings that can be migrated
> or swapped normally.
>
> Signed-off-by: Usama Arif
> ---
>  include/linux/huge_mm.h  |  5 +++
>  include/linux/mm.h       | 19 ++++++++
>  include/linux/mm_types.h |  5 ++-
>  include/linux/pgtable.h  |  8 ++++
>  include/linux/rmap.h     |  7 ++-
>  mm/huge_memory.c         |  8 ++++
>  mm/internal.h            |  3 ++
>  mm/page_vma_mapped.c     | 35 +++++++++++++++
>  mm/pgtable-generic.c     | 83 ++++++++++++++++++++++++++++++++++
>  mm/rmap.c                | 96 +++++++++++++++++++++++++++++++++++++---
>  10 files changed, 260 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index a4d9f964dfdea..e672e45bb9cc7 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -463,10 +463,15 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
>  			unsigned long address);
>  
>  #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +void split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud,
> +			unsigned long address);
>  int change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  			pud_t *pudp, unsigned long addr, pgprot_t newprot,
>  			unsigned long cp_flags);
>  #else
> +static inline void
> +split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud,
> +			unsigned long address) {}
>  static inline int
>  change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  			pud_t *pudp, unsigned long addr, pgprot_t newprot,
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ab2e7e30aef96..a15e18df0f771 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3455,6 +3455,22 @@ static inline bool pagetable_pmd_ctor(struct mm_struct *mm,
>   * considered ready to switch to split PUD locks yet; there may be places
>   * which need to be converted from page_table_lock.
>   */
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +static inline struct page *pud_pgtable_page(pud_t *pud)
> +{
> +	unsigned long mask = ~(PTRS_PER_PUD * sizeof(pud_t) - 1);
> +
> +	return virt_to_page((void *)((unsigned long)pud & mask));
> +}
> +
> +static inline struct ptdesc *pud_ptdesc(pud_t *pud)
> +{
> +	return page_ptdesc(pud_pgtable_page(pud));
> +}
> +
> +#define pud_huge_pmd(pud) (pud_ptdesc(pud)->pud_huge_pmd)
> +#endif
> +
>  static inline spinlock_t *pud_lockptr(struct mm_struct *mm, pud_t *pud)
>  {
>  	return &mm->page_table_lock;
> @@ -3471,6 +3487,9 @@ static inline spinlock_t *pud_lock(struct mm_struct *mm, pud_t *pud)
>  static inline void pagetable_pud_ctor(struct ptdesc *ptdesc)
>  {
>  	__pagetable_ctor(ptdesc);
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +	ptdesc->pud_huge_pmd = NULL;
> +#endif
>  }
>  
>  static inline void pagetable_p4d_ctor(struct ptdesc *ptdesc)
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 78950eb8926dc..26a38490ae2e1 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -577,7 +577,10 @@ struct ptdesc {
>  		struct list_head pt_list;
>  		struct {
>  			unsigned long _pt_pad_1;
> -			pgtable_t pmd_huge_pte;
> +			union {
> +				pgtable_t pmd_huge_pte; /* For PMD tables: deposited PTE */
> +				pgtable_t pud_huge_pmd; /* For PUD tables: deposited PMD list */
> +			};
>  		};
>  	};
>  	unsigned long __page_mapping;
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 2f0dd3a4ace1a..3ce733c1d71a2 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1168,6 +1168,14 @@ extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
>  #define arch_needs_pgtable_deposit() (false)
>  #endif
>  
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +extern void pgtable_trans_huge_pud_deposit(struct mm_struct *mm, pud_t *pudp,
> +					pmd_t *pmd_table);
> +extern pmd_t *pgtable_trans_huge_pud_withdraw(struct mm_struct *mm, pud_t *pudp);
> +extern void pud_deposit_pte(pmd_t *pmd_table, pgtable_t pgtable);
> +extern pgtable_t pud_withdraw_pte(pmd_t *pmd_table);
> +#endif
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  /*
>   * This is an implementation of pmdp_establish() that is only suitable for an
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index daa92a58585d9..08cd0a0eb8763 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -101,6 +101,7 @@ enum ttu_flags {
>  					 * do a final flush if necessary */
>  	TTU_RMAP_LOCKED		= 0x80,	/* do not grab rmap lock:
>  					 * caller holds it */
> +	TTU_SPLIT_HUGE_PUD	= 0x100, /* split huge PUD if any */
>  };
>  
>  #ifdef CONFIG_MMU
> @@ -473,6 +474,8 @@ void folio_add_anon_rmap_ptes(struct folio *, struct page *, int nr_pages,
>  	folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags)
>  void folio_add_anon_rmap_pmd(struct folio *, struct page *,
>  		struct vm_area_struct *, unsigned long address, rmap_t flags);
> +void folio_add_anon_rmap_pud(struct folio *, struct page *,
> +		struct vm_area_struct *, unsigned long address, rmap_t flags);
>  void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
>  		unsigned long address, rmap_t flags);
>  void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
> @@ -933,6 +936,7 @@ struct page_vma_mapped_walk {
>  	pgoff_t pgoff;
>  	struct vm_area_struct *vma;
>  	unsigned long address;
> +	pud_t *pud;
>  	pmd_t *pmd;
>  	pte_t *pte;
>  	spinlock_t *ptl;
> @@ -970,7 +974,7 @@ static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
>  static inline void
>  page_vma_mapped_walk_restart(struct page_vma_mapped_walk *pvmw)
>  {
> -	WARN_ON_ONCE(!pvmw->pmd && !pvmw->pte);
> +	WARN_ON_ONCE(!pvmw->pud && !pvmw->pmd && !pvmw->pte);
>  
>  	if (likely(pvmw->ptl))
>  		spin_unlock(pvmw->ptl);
> @@ -978,6 +982,7 @@ page_vma_mapped_walk_restart(struct page_vma_mapped_walk *pvmw)
>  		WARN_ON_ONCE(1);
>  
>  	pvmw->ptl = NULL;
> +	pvmw->pud = NULL;
>  	pvmw->pmd = NULL;
>  	pvmw->pte = NULL;
>  }
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 40cf59301c21a..3128b3beedb0a 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2933,6 +2933,14 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
>  	spin_unlock(ptl);
>  	mmu_notifier_invalidate_range_end(&range);
>  }
> +
> +void split_huge_pud_locked(struct vm_area_struct *vma, pud_t *pud,
> +			unsigned long address)
> +{
> +	VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PUD_SIZE));
> +	if (pud_trans_huge(*pud))
> +		__split_huge_pud_locked(vma, pud, address);
> +}
>  #else
>  void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
>  		unsigned long address)
> diff --git a/mm/internal.h b/mm/internal.h
> index 9ee336aa03656..21d5c00f638dc 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -545,6 +545,9 @@ int user_proactive_reclaim(char *buf,
>   * in mm/rmap.c:
>   */
>  pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +pud_t *mm_find_pud(struct mm_struct *mm, unsigned long address);
> +#endif
>  
>  /*
>   * in mm/page_alloc.c
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index b38a1d00c971b..d31eafba38041 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -146,6 +146,18 @@ static bool check_pmd(unsigned long pfn, struct page_vma_mapped_walk *pvmw)
>  	return true;
>  }
>  
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +/* Returns true if the two ranges overlap. Careful to not overflow.
> + */
> +static bool check_pud(unsigned long pfn, struct page_vma_mapped_walk *pvmw)
> +{
> +	if ((pfn + HPAGE_PUD_NR - 1) < pvmw->pfn)
> +		return false;
> +	if (pfn > pvmw->pfn + pvmw->nr_pages - 1)
> +		return false;
> +	return true;
> +}
> +#endif
> +
>  static void step_forward(struct page_vma_mapped_walk *pvmw, unsigned long size)
>  {
>  	pvmw->address = (pvmw->address + size) & ~(size - 1);
> @@ -188,6 +200,10 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>  	pud_t *pud;
>  	pmd_t pmde;
>  
> +	/* The only possible pud mapping has been handled on last iteration */
> +	if (pvmw->pud && !pvmw->pmd)
> +		return not_found(pvmw);
> +
>  	/* The only possible pmd mapping has been handled on last iteration */
>  	if (pvmw->pmd && !pvmw->pte)
>  		return not_found(pvmw);
> @@ -234,6 +250,25 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>  			continue;
>  		}
>  
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +		/* Check for PUD-mapped THP */
> +		if (pud_trans_huge(*pud)) {
> +			pvmw->pud = pud;
> +			pvmw->ptl = pud_lock(mm, pud);
> +			if (likely(pud_trans_huge(*pud))) {
> +				if (pvmw->flags & PVMW_MIGRATION)
> +					return not_found(pvmw);
> +				if (!check_pud(pud_pfn(*pud), pvmw))
> +					return not_found(pvmw);
> +				return true;
> +			}
> +			/* PUD was split under us, retry at PMD level */
> +			spin_unlock(pvmw->ptl);
> +			pvmw->ptl = NULL;
> +			pvmw->pud = NULL;
> +		}
> +#endif
> +
>  		pvmw->pmd = pmd_offset(pud, pvmw->address);
>  		/*
>  		 * Make sure the pmd value isn't cached in a register by the
> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> index d3aec7a9926ad..2047558ddcd79 100644
> --- a/mm/pgtable-generic.c
> +++ b/mm/pgtable-generic.c
> @@ -195,6 +195,89 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
>  }
>  #endif
>  
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +/*
> + * Deposit page tables for PUD THP.
> + * Called with PUD lock held.
> + * Stores PMD tables in a singly-linked stack
> + * via pud_huge_pmd, using only pmd_page->lru.next as the link pointer.
> + *
> + * IMPORTANT: We use only lru.next (offset 8) for linking, NOT the full
> + * list_head. This is because lru.prev (offset 16) overlaps with
> + * ptdesc->pmd_huge_pte, which stores the PMD table's deposited PTE tables.
> + * Using list_del() would corrupt pmd_huge_pte with LIST_POISON2.

This is ugly. Sounds like you want to use llist_node/head instead of
list_head for this. You might be able to avoid taking the lock in some
cases. Note that pud_lockptr() is mm->page_table_lock as of now.

> + *
> + * PTE tables should be deposited into the PMD using pud_deposit_pte().
> + */
> +void pgtable_trans_huge_pud_deposit(struct mm_struct *mm, pud_t *pudp,
> +				pmd_t *pmd_table)
> +{
> +	pgtable_t pmd_page = virt_to_page(pmd_table);
> +
> +	assert_spin_locked(pud_lockptr(mm, pudp));
> +
> +	/* Push onto stack using only lru.next as the link */
> +	pmd_page->lru.next = (struct list_head *)pud_huge_pmd(pudp);
> +	pud_huge_pmd(pudp) = pmd_page;
> +}
> +
> +/*
> + * Withdraw the deposited PMD table for PUD THP split or zap.
> + * Called with PUD lock held.
> + * Returns NULL if no more PMD tables are deposited.
> + */
> +pmd_t *pgtable_trans_huge_pud_withdraw(struct mm_struct *mm, pud_t *pudp)
> +{
> +	pgtable_t pmd_page;
> +
> +	assert_spin_locked(pud_lockptr(mm, pudp));
> +
> +	pmd_page = pud_huge_pmd(pudp);
> +	if (!pmd_page)
> +		return NULL;
> +
> +	/* Pop from stack - lru.next points to next PMD page (or NULL) */
> +	pud_huge_pmd(pudp) = (pgtable_t)pmd_page->lru.next;
> +
> +	return page_address(pmd_page);
> +}
> +
> +/*
> + * Deposit a PTE table into a standalone PMD table (not yet in page table hierarchy).
> + * Used for PUD THP pre-deposit. The PMD table's pmd_huge_pte stores a linked list.
> + * No lock assertion since the PMD isn't visible yet.
> + */
> +void pud_deposit_pte(pmd_t *pmd_table, pgtable_t pgtable)
> +{
> +	struct ptdesc *ptdesc = virt_to_ptdesc(pmd_table);
> +
> +	/* FIFO - add to front of list */
> +	if (!ptdesc->pmd_huge_pte)
> +		INIT_LIST_HEAD(&pgtable->lru);
> +	else
> +		list_add(&pgtable->lru, &ptdesc->pmd_huge_pte->lru);
> +	ptdesc->pmd_huge_pte = pgtable;
> +}
> +
> +/*
> + * Withdraw a PTE table from a standalone PMD table.
> + * Returns NULL if no more PTE tables are deposited.
> + */
> +pgtable_t pud_withdraw_pte(pmd_t *pmd_table)
> +{
> +	struct ptdesc *ptdesc = virt_to_ptdesc(pmd_table);
> +	pgtable_t pgtable;
> +
> +	pgtable = ptdesc->pmd_huge_pte;
> +	if (!pgtable)
> +		return NULL;
> +	ptdesc->pmd_huge_pte = list_first_entry_or_null(&pgtable->lru,
> +						struct page, lru);
> +	if (ptdesc->pmd_huge_pte)
> +		list_del(&pgtable->lru);
> +	return pgtable;
> +}
> +#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
> +
>  #ifndef __HAVE_ARCH_PMDP_INVALIDATE
>  pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>  		pmd_t *pmdp)
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 7b9879ef442d9..69acabd763da4 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -811,6 +811,32 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
>  	return pmd;
>  }
>  
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +/*
> + * Returns the actual pud_t* where we expect 'address' to be mapped from, or
> + * NULL if it doesn't exist. No guarantees / checks on what the pud_t*
> + * represents.
> + */
> +pud_t *mm_find_pud(struct mm_struct *mm, unsigned long address)

Remove the ifdef and make mm_find_pmd() call it. And in general, try to
avoid ifdeffery where possible.
> +{
> +	pgd_t *pgd;
> +	p4d_t *p4d;
> +	pud_t *pud = NULL;
> +
> +	pgd = pgd_offset(mm, address);
> +	if (!pgd_present(*pgd))
> +		goto out;
> +
> +	p4d = p4d_offset(pgd, address);
> +	if (!p4d_present(*p4d))
> +		goto out;
> +
> +	pud = pud_offset(p4d, address);
> +out:
> +	return pud;
> +}
> +#endif
> +
>  struct folio_referenced_arg {
>  	int mapcount;
>  	int referenced;
> @@ -1415,11 +1441,7 @@ static __always_inline void __folio_add_anon_rmap(struct folio *folio,
>  			SetPageAnonExclusive(page);
>  		break;
>  	case PGTABLE_LEVEL_PUD:
> -		/*
> -		 * Keep the compiler happy, we don't support anonymous
> -		 * PUD mappings.
> -		 */
> -		WARN_ON_ONCE(1);
> +		SetPageAnonExclusive(page);
>  		break;
>  	default:
>  		BUILD_BUG();
> @@ -1503,6 +1525,31 @@ void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page,
>  #endif
>  }
>  
> +/**
> + * folio_add_anon_rmap_pud - add a PUD mapping to a page range of an anon folio
> + * @folio:	The folio to add the mapping to
> + * @page:	The first page to add
> + * @vma:	The vm area in which the mapping is added
> + * @address:	The user virtual address of the first page to map
> + * @flags:	The rmap flags
> + *
> + * The page range of folio is defined by [first_page, first_page + HPAGE_PUD_NR)
> + *
> + * The caller needs to hold the page table lock, and the page must be locked in
> + * the anon_vma case: to serialize mapping,index checking after setting.
> + */
> +void folio_add_anon_rmap_pud(struct folio *folio, struct page *page,
> +		struct vm_area_struct *vma, unsigned long address, rmap_t flags)
> +{
> +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
> +    defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)
> +	__folio_add_anon_rmap(folio, page, HPAGE_PUD_NR, vma, address, flags,
> +			PGTABLE_LEVEL_PUD);
> +#else
> +	WARN_ON_ONCE(true);
> +#endif
> +}
> +
>  /**
>   * folio_add_new_anon_rmap - Add mapping to a new anonymous folio.
>   * @folio: The folio to add the mapping to.
> @@ -1934,6 +1981,20 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  		}
>  
>  		if (!pvmw.pte) {
> +			/*
> +			 * Check for PUD-mapped THP first.
> +			 * If we have a PUD mapping and TTU_SPLIT_HUGE_PUD is set,
> +			 * split the PUD to PMD level and restart the walk.
> +			 */
> +			if (pvmw.pud && pud_trans_huge(*pvmw.pud)) {
> +				if (flags & TTU_SPLIT_HUGE_PUD) {
> +					split_huge_pud_locked(vma, pvmw.pud, pvmw.address);
> +					flags &= ~TTU_SPLIT_HUGE_PUD;
> +					page_vma_mapped_walk_restart(&pvmw);
> +					continue;
> +				}
> +			}
> +
>  			if (folio_test_anon(folio) && !folio_test_swapbacked(folio)) {
>  				if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd, folio))
>  					goto walk_done;
> @@ -2325,6 +2386,27 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>  	mmu_notifier_invalidate_range_start(&range);
>  
>  	while (page_vma_mapped_walk(&pvmw)) {
> +		/* Handle PUD-mapped THP first */
> +		if (!pvmw.pte && !pvmw.pmd) {
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +			/*
> +			 * PUD-mapped THP: skip migration to preserve the huge
> +			 * page. Splitting would defeat the purpose of PUD THPs.
> +			 * Return false to indicate migration failure, which
> +			 * will cause alloc_contig_range() to try a different
> +			 * memory region.
> +			 */
> +			if (pvmw.pud && pud_trans_huge(*pvmw.pud)) {
> +				page_vma_mapped_walk_done(&pvmw);
> +				ret = false;
> +				break;
> +			}
> +#endif
> +			/* Unexpected state: !pte && !pmd but not a PUD THP */
> +			page_vma_mapped_walk_done(&pvmw);
> +			break;
> +		}
> +
>  		/* PMD-mapped THP migration entry */
>  		if (!pvmw.pte) {
>  			__maybe_unused unsigned long pfn;
> @@ -2607,10 +2689,10 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
>  
>  	/*
>  	 * Migration always ignores mlock and only supports TTU_RMAP_LOCKED and
> -	 * TTU_SPLIT_HUGE_PMD, TTU_SYNC, and TTU_BATCH_FLUSH flags.
> +	 * TTU_SPLIT_HUGE_PMD, TTU_SPLIT_HUGE_PUD, TTU_SYNC, and TTU_BATCH_FLUSH flags.
>  	 */
>  	if (WARN_ON_ONCE(flags & ~(TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
> -					TTU_SYNC | TTU_BATCH_FLUSH)))
> +					TTU_SPLIT_HUGE_PUD | TTU_SYNC | TTU_BATCH_FLUSH)))
>  		return;
>  
>  	if (folio_is_zone_device(folio) &&
> -- 
> 2.47.3
> 

-- 
Kiryl Shutsemau / Kirill A. Shutemov