All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Zi Yan <ziy@nvidia.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	 "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	 "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	 Ryan Roberts <ryan.roberts@arm.com>,
	Hugh Dickins <hughd@google.com>,
	 David Hildenbrand <david@redhat.com>,
	 Yang Shi <yang@os.amperecomputing.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	 Kefeng Wang <wangkefeng.wang@huawei.com>,
	Yu Zhao <yuzhao@google.com>,  John Hubbard <jhubbard@nvidia.com>,
	 Baolin Wang <baolin.wang@linux.alibaba.com>,
	 linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
	 Kairui Song <kasong@tencent.com>,
	Liu Shixin <liushixin2@huawei.com>
Subject: Re: [PATCH v9 2/8] mm/huge_memory: add two new (not yet used) functions for folio_split()
Date: Tue, 4 Mar 2025 03:49:32 -0800 (PST)	[thread overview]
Message-ID: <2fae27fe-6e2e-3587-4b68-072118d80cf8@google.com> (raw)
In-Reply-To: <20250226210032.2044041-3-ziy@nvidia.com>

On Wed, 26 Feb 2025, Zi Yan wrote:

> This is a preparation patch, both added functions are not used yet.
> 
> The added __split_unmapped_folio() is able to split a folio with its
> mapping removed in two manners: 1) uniform split (the existing way), and
> 2) buddy allocator like split.
> 
> The added __split_folio_to_order() can split a folio into any lower order.
> For uniform split, __split_unmapped_folio() calls it once to split the
> given folio to the new order.  For buddy allocator split,
> __split_unmapped_folio() calls it (folio_order - new_order) times and each
> time splits the folio containing the given page to one lower order.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>

Sorry, I'm tired and don't really want to be writing this yet, but the
migrate "hotfix" has tipped my hand, and I need to get this out to you
before more days pass.

I'd been unable to complete even a single iteration of my "kernel builds
on huge tmpfs while swapping to SSD" testing during this current 6.14-rc
mm.git cycle (6.14-rc itself fine) - until the last week, when some
important fixes have come in, so I'm no longer getting I/O errors from
ext4-on-loop0-on-huge-tmpfs, and "Huh VM_FAULT_OOM leaked" warnings: good.

But I still can't get beyond a few iterations, a few minutes: there's
some corruption of user data, which usually manifests as a kernel build
failing because fixdep couldn't find some truncated-on-the-left pathname.

While it definitely bisected to your folio_split() series, it's quite
possible that you're merely exposing an existing bug to wider use.

I've spent the last few days trying to track this down, but still not
succeeded: I'm still getting much the same corruption.  But have been
folding in various fixes as I found them, even though they have not
solved the main problem at all.  I'll return to trying to debug the
corruption "tomorrow".

I think (might be wrong, I'm in a rush) my mods are all to this
"add two new (not yet used) functions for folio_split()" patch:
please merge them in if you agree.

1. From source inspection, it looks like a folio_set_order() was missed.

2. Why is swapcache only checked when folio_test_anon? I can see that
   you've just copied that over from the old __split_huge_page(), but
   it seems wrong to me here and there - I guess a relic from before
   shmem could swap out a huge page.

3. Doing folio_next() inside the for(;;) is unsafe in those configs
   which have to look up zone etc, I got an oops from the "new_folio"
   loop; didn't hit an oops from the "release" loop but fixed that too.

4. While correcting anon versus mapping versus swap_cache, shortened
   the lines by avoiding origin_folio->mapping and &release->page.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/huge_memory.c | 39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0e45937c0d91..9ce3906672b9 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3612,7 +3612,9 @@ static void __split_folio_to_order(struct folio *folio, int new_order)
 		folio_xchg_last_cpupid(new_folio, folio_last_cpupid(folio));
 	}
 
-	if (!new_order)
+	if (new_order)
+		folio_set_order(folio, new_order);
+	else
 		ClearPageCompound(&folio->page);
 }
 
@@ -3682,7 +3684,9 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 	int ret = 0;
 	bool stop_split = false;
 
-	if (folio_test_anon(folio) && folio_test_swapcache(folio)) {
+	if (folio_test_swapcache(folio)) {
+		VM_BUG_ON(mapping);
+
 		/* a swapcache folio can only be uniformly split to order-0 */
 		if (!uniform_split || new_order != 0)
 			return -EINVAL;
@@ -3750,9 +3754,8 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 		 * is new_order, since the folio will be worked on in next
 		 * iteration.
 		 */
-		for (release = folio, next = folio_next(folio);
-		     release != end_folio;
-		     release = next, next = folio_next(next)) {
+		for (release = folio; release != end_folio; release = next) {
+			next = folio_next(release);
 			/*
 			 * for buddy allocator like split, the folio containing
 			 * page will be split next and should not be released,
@@ -3784,32 +3787,31 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 			lru_add_page_tail(origin_folio, &release->page,
 						lruvec, list);
 
-			/* Some pages can be beyond EOF: drop them from page cache */
+			/* Some pages can be beyond EOF: drop them from cache */
 			if (release->index >= end) {
-				if (shmem_mapping(origin_folio->mapping))
+				if (shmem_mapping(mapping))
 					nr_dropped += folio_nr_pages(release);
 				else if (folio_test_clear_dirty(release))
 					folio_account_cleaned(release,
-						inode_to_wb(origin_folio->mapping->host));
+						inode_to_wb(mapping->host));
 				__filemap_remove_folio(release, NULL);
 				folio_put(release);
-			} else if (!folio_test_anon(release)) {
-				__xa_store(&origin_folio->mapping->i_pages,
-						release->index, &release->page, 0);
+			} else if (mapping) {
+				__xa_store(&mapping->i_pages,
+						release->index, release, 0);
 			} else if (swap_cache) {
 				__xa_store(&swap_cache->i_pages,
 						swap_cache_index(release->swap),
-						&release->page, 0);
+						release, 0);
 			}
 		}
 	}
 
 	unlock_page_lruvec(lruvec);
 
-	if (folio_test_anon(origin_folio)) {
-		if (folio_test_swapcache(origin_folio))
-			xa_unlock(&swap_cache->i_pages);
-	} else
+	if (swap_cache)
+		xa_unlock(&swap_cache->i_pages);
+	if (mapping)
 		xa_unlock(&mapping->i_pages);
 
 	/* Caller disabled irqs, so they are still disabled here */
@@ -3828,9 +3830,8 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 	 * For buddy allocator like split, the first after-split folio is left
 	 * for caller to unlock.
 	 */
-	for (new_folio = origin_folio, next = folio_next(origin_folio);
-	     new_folio != next_folio;
-	     new_folio = next, next = folio_next(next)) {
+	for (new_folio = origin_folio; new_folio != next_folio; new_folio = next) {
+		next = folio_next(new_folio);
 		if (new_folio == page_folio(lock_at))
 			continue;
 
-- 
2.43.0

  parent reply	other threads:[~2025-03-04 11:49 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-26 21:00 [PATCH v9 0/8] Buddy allocator like (or non-uniform) folio split Zi Yan
2025-02-26 21:00 ` [PATCH v9 1/8] xarray: add xas_try_split() to split a multi-index entry Zi Yan
2025-02-26 21:00 ` [PATCH v9 2/8] mm/huge_memory: add two new (not yet used) functions for folio_split() Zi Yan
2025-02-27  5:55   ` Matthew Wilcox
2025-02-27 15:14     ` Matthew Wilcox
2025-02-27 15:42       ` Zi Yan
2025-03-04 11:49   ` Hugh Dickins [this message]
2025-03-04 16:20     ` Zi Yan
2025-03-04 20:29       ` Andrew Morton
2025-03-04 20:34         ` Zi Yan
2025-03-05 21:03       ` Hugh Dickins
2025-03-05 21:10         ` Zi Yan
2025-03-05 22:38           ` Hugh Dickins
2025-03-06 16:21             ` Zi Yan
2025-03-07 15:23               ` Zi Yan
2025-03-10  8:54               ` Hugh Dickins
2025-03-10 15:35                 ` Zi Yan
2025-03-05 19:45     ` Zi Yan
2025-03-05 20:50       ` Hugh Dickins
2025-03-05 21:08         ` Zi Yan
2025-03-05 21:49           ` Hugh Dickins
2025-03-06  9:19           ` David Hildenbrand
2025-03-06 16:27             ` Zi Yan
2025-03-07 17:46               ` David Hildenbrand
2025-02-26 21:00 ` [PATCH v9 3/8] mm/huge_memory: move folio split common code to __folio_split() Zi Yan
2025-02-26 21:00 ` [PATCH v9 4/8] mm/huge_memory: add buddy allocator like (non-uniform) folio_split() Zi Yan
2025-02-26 21:00 ` [PATCH v9 5/8] mm/huge_memory: remove the old, unused __split_huge_page() Zi Yan
2025-02-26 21:00 ` [PATCH v9 6/8] mm/huge_memory: add folio_split() to debugfs testing interface Zi Yan
2025-02-26 21:00 ` [PATCH v9 7/8] mm/truncate: use buddy allocator like folio split for truncate operation Zi Yan
2025-03-02  3:52   ` Zi Yan
2025-02-26 21:00 ` [PATCH v9 8/8] selftests/mm: add tests for folio_split(), buddy allocator like split Zi Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2fae27fe-6e2e-3587-4b68-072118d80cf8@google.com \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=david@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=kasong@tencent.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liushixin2@huawei.com \
    --cc=ryan.roberts@arm.com \
    --cc=wangkefeng.wang@huawei.com \
    --cc=willy@infradead.org \
    --cc=yang@os.amperecomputing.com \
    --cc=yuzhao@google.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.