From: Andrew Morton <akpm@linux-foundation.org>
To: mm-commits@vger.kernel.org,yuzhao@google.com,yang@os.amperecomputing.com,willy@infradead.org,wangkefeng.wang@huawei.com,ryan.roberts@arm.com,linmiaohe@huawei.com,kirill.shutemov@linux.intel.com,kasong@tencent.com,jhubbard@nvidia.com,hughd@google.com,david@redhat.com,baolin.wang@linux.alibaba.com,ziy@nvidia.com,akpm@linux-foundation.org
Subject: [merged mm-stable] mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split.patch removed from -mm tree
Date: Mon, 17 Mar 2025 22:10:19 -0700
Message-ID: <20250318051019.9AD2AC4CEDD@smtp.kernel.org>
The quilt patch titled
Subject: mm/huge_memory: add two new (not yet used) functions for folio_split()
has been removed from the -mm tree. Its filename was
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Zi Yan <ziy@nvidia.com>
Subject: mm/huge_memory: add two new (not yet used) functions for folio_split()
Date: Fri, 7 Mar 2025 12:39:55 -0500
This is a preparation patch; neither of the added functions is used yet.
The added __split_unmapped_folio() can split a folio with its mapping
removed in two ways: 1) uniform split (the existing way), and
2) buddy allocator like (or non-uniform) split.
The added __split_folio_to_order() can split a folio into any lower order.
For uniform split, __split_unmapped_folio() calls it once to split the
given folio to the new order. For buddy allocator like (non-uniform)
split, __split_unmapped_folio() calls it (folio_order - new_order) times
and each time splits the folio containing the given page to one lower
order.
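
As a rough illustration of the difference described above, the following
user-space sketch models only the resulting folio layout. It is not kernel
code: struct piece, uniform_split_count() and nonuniform_split() are
hypothetical names invented for this example, pages are plain indexes inside
the original folio, and the special case where an anonymous folio skips
order 1 is ignored.

```c
#include <assert.h>

#define MAX_PIECES 32

/* Hypothetical model of one after-split folio: first page index and order. */
struct piece {
	long start;
	int order;
};

/* Uniform split: every resulting folio has order new_order. */
static long uniform_split_count(int old_order, int new_order)
{
	assert(new_order <= old_order);
	return 1L << (old_order - new_order);
}

/*
 * Buddy allocator like (non-uniform) split: halve the folio containing
 * split_at once per step, (old_order - new_order) steps in total. The half
 * that does not contain split_at is finished and recorded in out[]; the
 * other half is halved again in the next step. Returns the number of
 * resulting folios.
 */
static int nonuniform_split(int old_order, int new_order, long split_at,
			    struct piece *out)
{
	long start = 0;	/* first page of the folio still being split */
	int n = 0;
	int order;

	assert(new_order <= old_order);
	assert(old_order - new_order + 1 <= MAX_PIECES);
	assert(split_at >= 0 && split_at < (1L << old_order));

	for (order = old_order - 1; order >= new_order; order--) {
		long half = 1L << order;

		if (split_at < start + half) {
			/* split_at is in the lower half: upper half is done */
			out[n].start = start + half;
			out[n].order = order;
		} else {
			/* split_at is in the upper half: lower half is done */
			out[n].start = start;
			out[n].order = order;
			start += half;
		}
		n++;
	}

	/* The folio containing split_at ends up at new_order. */
	out[n].start = start;
	out[n].order = new_order;
	return n + 1;
}
```

Under these assumptions, splitting an order-9 folio to order 0 around page 0
takes 9 halving steps and yields 10 folios (one each of orders 8 down to 1,
plus two of order 0), whereas a uniform split yields 512 order-0 folios.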
[ziy@nvidia.com: unfreeze head folio after page cache entries are updated]
Link: https://lkml.kernel.org/r/0F15DA7F-1977-412F-9A3E-F06B515D4BD2@nvidia.com
[ziy@nvidia.com: use NULL instead of 0 for folio->private assignment]
Link: https://lkml.kernel.org/r/1E11B9DD-3A87-4C9C-8FB4-E1324FB6A21A@nvidia.com
Link: https://lkml.kernel.org/r/20250307174001.242794-3-ziy@nvidia.com
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Kairui Song <kasong@tencent.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/huge_memory.c | 354 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 353 insertions(+), 1 deletion(-)
--- a/mm/huge_memory.c~mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split
+++ a/mm/huge_memory.c
@@ -3265,7 +3265,6 @@ static void remap_page(struct folio *fol
static void lru_add_page_tail(struct folio *folio, struct page *tail,
struct lruvec *lruvec, struct list_head *list)
{
- VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
VM_BUG_ON_FOLIO(PageLRU(tail), folio);
lockdep_assert_held(&lruvec->lru_lock);
@@ -3518,6 +3517,359 @@ bool can_split_folio(struct folio *folio
}
/*
+ * It splits @folio into @new_order folios and copies the @folio metadata to
+ * all the resulting folios.
+ */
+static void __split_folio_to_order(struct folio *folio, int old_order,
+ int new_order)
+{
+ long new_nr_pages = 1 << new_order;
+ long nr_pages = 1 << old_order;
+ long i;
+
+ /*
+ * Skip the first new_nr_pages, since the new folio formed from them has
+ * all the flags from the original folio.
+ */
+ for (i = new_nr_pages; i < nr_pages; i += new_nr_pages) {
+ struct page *new_head = &folio->page + i;
+
+ /*
+ * Careful: new_folio is not a "real" folio before we cleared PageTail.
+ * Don't pass it around before clear_compound_head().
+ */
+ struct folio *new_folio = (struct folio *)new_head;
+
+ VM_BUG_ON_PAGE(atomic_read(&new_folio->_mapcount) != -1, new_head);
+
+ /*
+ * Clone page flags before unfreezing refcount.
+ *
+ * A successful get_page_unless_zero() may be followed by a flags change,
+ * for example lock_page(), which sets PG_waiters.
+ *
+ * Note that for mapped sub-pages of an anonymous THP,
+ * PG_anon_exclusive has been cleared in unmap_folio() and is stored in
+ * the migration entry instead from where remap_page() will restore it.
+ * We can still have PG_anon_exclusive set on effectively unmapped and
+ * unreferenced sub-pages of an anonymous THP: we can simply drop
+ * PG_anon_exclusive (-> PG_mappedtodisk) for these here.
+ */
+ new_folio->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
+ new_folio->flags |= (folio->flags &
+ ((1L << PG_referenced) |
+ (1L << PG_swapbacked) |
+ (1L << PG_swapcache) |
+ (1L << PG_mlocked) |
+ (1L << PG_uptodate) |
+ (1L << PG_active) |
+ (1L << PG_workingset) |
+ (1L << PG_locked) |
+ (1L << PG_unevictable) |
+#ifdef CONFIG_ARCH_USES_PG_ARCH_2
+ (1L << PG_arch_2) |
+#endif
+#ifdef CONFIG_ARCH_USES_PG_ARCH_3
+ (1L << PG_arch_3) |
+#endif
+ (1L << PG_dirty) |
+ LRU_GEN_MASK | LRU_REFS_MASK));
+
+ new_folio->mapping = folio->mapping;
+ new_folio->index = folio->index + i;
+
+ /*
+ * page->private should not be set in tail pages. Fix up and warn once
+ * if private is unexpectedly set.
+ */
+ if (unlikely(new_folio->private)) {
+ VM_WARN_ON_ONCE_PAGE(true, new_head);
+ new_folio->private = NULL;
+ }
+
+ if (folio_test_swapcache(folio))
+ new_folio->swap.val = folio->swap.val + i;
+
+ /* Page flags must be visible before we make the page non-compound. */
+ smp_wmb();
+
+ /*
+ * Clear PageTail before unfreezing page refcount.
+ *
+ * A successful get_page_unless_zero() may be followed by put_page(),
+ * which needs a correct compound_head().
+ */
+ clear_compound_head(new_head);
+ if (new_order) {
+ prep_compound_page(new_head, new_order);
+ folio_set_large_rmappable(new_folio);
+ }
+
+ if (folio_test_young(folio))
+ folio_set_young(new_folio);
+ if (folio_test_idle(folio))
+ folio_set_idle(new_folio);
+
+ folio_xchg_last_cpupid(new_folio, folio_last_cpupid(folio));
+ }
+
+ if (new_order)
+ folio_set_order(folio, new_order);
+ else
+ ClearPageCompound(&folio->page);
+}
+
+/*
+ * It splits an unmapped @folio to lower order smaller folios in two ways.
+ * @folio: the to-be-split folio
+ * @new_order: the smallest order of the after split folios (since buddy
+ * allocator like split generates folios with orders from @folio's
+ * order - 1 to new_order).
+ * @split_at: in buddy allocator like split, the folio containing @split_at
+ * will be split until its order becomes @new_order.
+ * @lock_at: the folio containing @lock_at is left locked for caller.
+ * @list: the after split folios will be added to @list if it is not NULL,
+ * otherwise to LRU lists.
+ * @end: the end of the file @folio maps to. -1 if @folio is anonymous memory.
+ * @xas: xa_state pointing to folio->mapping->i_pages and locked by caller
+ * @mapping: @folio->mapping
+ * @uniform_split: if the split is uniform or not (buddy allocator like split)
+ *
+ *
+ * 1. uniform split: the given @folio is split into multiple @new_order small
+ * folios, where all small folios have the same order. This is done when
+ * uniform_split is true.
+ * 2. buddy allocator like (non-uniform) split: the given @folio is split in
+ * half and the half containing @split_at is split in half again, until the
+ * order of the folio containing @split_at becomes @new_order. This is done
+ * when uniform_split is false.
+ *
+ * The high level flow for these two methods is:
+ * 1. uniform split: a single __split_folio_to_order() is called to split the
+ * @folio into @new_order, then we traverse all the resulting folios one by
+ * one in PFN ascending order and perform stats, unfreeze, adding to list,
+ * and file mapping index operations.
+ * 2. non-uniform split: in general, folio_order - @new_order calls to
+ * __split_folio_to_order() are made in a for loop to split the @folio
+ * to one lower order at a time. The resulting small folios are processed
+ * like what is done during the traversal in 1, except the one containing
+ * @split_at, which is split in the next loop iteration.
+ *
+ * After splitting, the caller's folio reference will be transferred to the
+ * folio containing @page. The other folios may be freed if they are not mapped.
+ *
+ * In terms of locking, after splitting, the folio containing @lock_at is left
+ * locked for the caller in both uniform and buddy allocator like
+ * (non-uniform) splits; all other after-split folios are unlocked.
+ *
+ *
+ * For !uniform_split, when -ENOMEM is returned, the original folio might be
+ * split. The caller needs to check the input folio.
+ */
+static int __split_unmapped_folio(struct folio *folio, int new_order,
+ struct page *split_at, struct page *lock_at,
+ struct list_head *list, pgoff_t end,
+ struct xa_state *xas, struct address_space *mapping,
+ bool uniform_split)
+{
+ struct lruvec *lruvec;
+ struct address_space *swap_cache = NULL;
+ struct folio *origin_folio = folio;
+ struct folio *next_folio = folio_next(folio);
+ struct folio *new_folio;
+ struct folio *next;
+ int order = folio_order(folio);
+ int split_order;
+ int start_order = uniform_split ? new_order : order - 1;
+ int nr_dropped = 0;
+ int ret = 0;
+ bool stop_split = false;
+
+ if (folio_test_swapcache(folio)) {
+ VM_BUG_ON(mapping);
+
+ /* a swapcache folio can only be uniformly split to order-0 */
+ if (!uniform_split || new_order != 0)
+ return -EINVAL;
+
+ swap_cache = swap_address_space(folio->swap);
+ xa_lock(&swap_cache->i_pages);
+ }
+
+ if (folio_test_anon(folio))
+ mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
+
+ /* lock lru list/PageCompound, ref frozen by page_ref_freeze */
+ lruvec = folio_lruvec_lock(folio);
+
+ folio_clear_has_hwpoisoned(folio);
+
+ /*
+ * split to new_order one order at a time. For uniform split,
+ * folio is split to new_order directly.
+ */
+ for (split_order = start_order;
+ split_order >= new_order && !stop_split;
+ split_order--) {
+ int old_order = folio_order(folio);
+ struct folio *release;
+ struct folio *end_folio = folio_next(folio);
+
+ /* order-1 anonymous folio is not supported */
+ if (folio_test_anon(folio) && split_order == 1)
+ continue;
+ if (uniform_split && split_order != new_order)
+ continue;
+
+ if (mapping) {
+ /*
+ * uniform split has xas_split_alloc() called before
+ * irq is disabled to allocate enough memory, whereas
+ * non-uniform split can handle ENOMEM.
+ */
+ if (uniform_split)
+ xas_split(xas, folio, old_order);
+ else {
+ xas_set_order(xas, folio->index, split_order);
+ xas_try_split(xas, folio, old_order);
+ if (xas_error(xas)) {
+ ret = xas_error(xas);
+ stop_split = true;
+ goto after_split;
+ }
+ }
+ }
+
+ /*
+ * Reset any memcg data overlay in the tail pages.
+ * folio_nr_pages() is unreliable until prep_compound_page()
+ * was called again.
+ */
+#ifdef NR_PAGES_IN_LARGE_FOLIO
+ folio->_nr_pages = 0;
+#endif
+
+
+ /* complete memcg work before adding pages to LRU */
+ split_page_memcg(&folio->page, old_order, split_order);
+ split_page_owner(&folio->page, old_order, split_order);
+ pgalloc_tag_split(folio, old_order, split_order);
+
+ __split_folio_to_order(folio, old_order, split_order);
+
+after_split:
+ /*
+ * Iterate through after-split folios and perform related
+ * operations. But in buddy allocator like split, the folio
+ * containing the specified page is skipped until its order
+ * is new_order, since the folio will be worked on in next
+ * iteration.
+ */
+ for (release = folio; release != end_folio; release = next) {
+ next = folio_next(release);
+ /*
+ * for buddy allocator like split, the folio containing
+ * page will be split next and should not be released,
+ * until the folio's order is new_order or stop_split
+ * is set to true by the above xas_try_split() failure.
+ */
+ if (release == page_folio(split_at)) {
+ folio = release;
+ if (split_order != new_order && !stop_split)
+ continue;
+ }
+ if (folio_test_anon(release)) {
+ mod_mthp_stat(folio_order(release),
+ MTHP_STAT_NR_ANON, 1);
+ }
+
+ /*
+ * origin_folio should be kept frozen until page cache
+ * entries are updated with all the other after-split
+ * folios to prevent others seeing stale page cache
+ * entries.
+ */
+ if (release == origin_folio)
+ continue;
+
+ folio_ref_unfreeze(release, 1 +
+ ((mapping || swap_cache) ?
+ folio_nr_pages(release) : 0));
+
+ lru_add_page_tail(origin_folio, &release->page,
+ lruvec, list);
+
+ /* Some pages can be beyond EOF: drop them from cache */
+ if (release->index >= end) {
+ if (shmem_mapping(mapping))
+ nr_dropped += folio_nr_pages(release);
+ else if (folio_test_clear_dirty(release))
+ folio_account_cleaned(release,
+ inode_to_wb(mapping->host));
+ __filemap_remove_folio(release, NULL);
+ folio_put_refs(release, folio_nr_pages(release));
+ } else if (mapping) {
+ __xa_store(&mapping->i_pages,
+ release->index, release, 0);
+ } else if (swap_cache) {
+ __xa_store(&swap_cache->i_pages,
+ swap_cache_index(release->swap),
+ release, 0);
+ }
+ }
+ }
+
+ /*
+ * Unfreeze origin_folio only after all page cache entries, which used
+ * to point to it, have been updated with new folios. Otherwise,
+ * a parallel folio_try_get() can grab origin_folio and its caller can
+ * see stale page cache entries.
+ */
+ folio_ref_unfreeze(origin_folio, 1 +
+ ((mapping || swap_cache) ? folio_nr_pages(origin_folio) : 0));
+
+ unlock_page_lruvec(lruvec);
+
+ if (swap_cache)
+ xa_unlock(&swap_cache->i_pages);
+ if (mapping)
+ xa_unlock(&mapping->i_pages);
+
+ /* Caller disabled irqs, so they are still disabled here */
+ local_irq_enable();
+
+ if (nr_dropped)
+ shmem_uncharge(mapping->host, nr_dropped);
+
+ remap_page(origin_folio, 1 << order,
+ folio_test_anon(origin_folio) ?
+ RMP_USE_SHARED_ZEROPAGE : 0);
+
+ /*
+ * At this point, folio should contain the specified page.
+ * For uniform split, it is left for caller to unlock.
+ * For buddy allocator like split, the first after-split folio is left
+ * for caller to unlock.
+ */
+ for (new_folio = origin_folio; new_folio != next_folio; new_folio = next) {
+ next = folio_next(new_folio);
+ if (new_folio == page_folio(lock_at))
+ continue;
+
+ folio_unlock(new_folio);
+ /*
+ * Subpages may be freed if there wasn't any mapping
+ * like if add_to_swap() is running on a lru page that
+ * had its mapping zapped. And freeing these pages
+ * requires taking the lru_lock so we do the put_page
+ * of the tail pages after the split is complete.
+ */
+ free_page_and_swap_cache(&new_folio->page);
+ }
+ return ret;
+}
+
+/*
* This function splits a large folio into smaller folios of order @new_order.
* @page can point to any page of the large folio to split. The split operation
* does not change the position of @page.
_
Patches currently in -mm which might be from ziy@nvidia.com are