linux-mm.kvack.org archive mirror
* [PATCH] mm/huge_memory: move unrelated code out of __split_unmapped_folio()
@ 2025-07-11  3:02 Zi Yan
  2025-07-11  6:41 ` David Hildenbrand
  0 siblings, 1 reply; 9+ messages in thread
From: Zi Yan @ 2025-07-11  3:02 UTC (permalink / raw)
  To: Balbir Singh, David Hildenbrand, linux-mm
  Cc: Andrew Morton, Lorenzo Stoakes, Zi Yan, Baolin Wang,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	linux-kernel

remap(), folio_ref_unfreeze(), lru_add_split_folio() are not related to
splitting unmapped folio operations. Move them out to the caller, so that
__split_unmapped_folio() only splits unmapped folios. This makes
__split_unmapped_folio() reusable.

Convert VM_BUG_ON(mapping) to use VM_WARN_ON_ONCE_FOLIO().

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
Based on the prior discussion[1], this patch makes
__split_unmapped_folio() reusable for splitting unmapped folios without
adding a new boolean unmapped parameter to guard mapping related code.

Another potential benefit is that __split_unmapped_folio() could be
called on after-split folios by __folio_split() to perform new split
methods. For example, at deferred split time, unmapped subpages can
scatter arbitrarily within a large folio, neither uniform nor non-uniform
split can maximize after-split folio orders for mapped subpages.
Hopefully, performing __split_unmapped_folio() multiple times can
achieve the optimal split result.

It passed mm selftests.

[1] https://lore.kernel.org/linux-mm/94D8C1A4-780C-4BEC-A336-7D3613B54845@nvidia.com/
---

 mm/huge_memory.c | 275 ++++++++++++++++++++++++-----------------------
 1 file changed, 139 insertions(+), 136 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3eb1c34be601..d97145dfa6c8 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3396,10 +3396,6 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
  *             order - 1 to new_order).
  * @split_at: in buddy allocator like split, the folio containing @split_at
  *            will be split until its order becomes @new_order.
- * @lock_at: the folio containing @lock_at is left locked for caller.
- * @list: the after split folios will be added to @list if it is not NULL,
- *        otherwise to LRU lists.
- * @end: the end of the file @folio maps to. -1 if @folio is anonymous memory.
  * @xas: xa_state pointing to folio->mapping->i_pages and locked by caller
  * @mapping: @folio->mapping
  * @uniform_split: if the split is uniform or not (buddy allocator like split)
@@ -3425,51 +3421,27 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
  *    @page, which is split in next for loop.
  *
  * After splitting, the caller's folio reference will be transferred to the
- * folio containing @page. The other folios may be freed if they are not mapped.
- *
- * In terms of locking, after splitting,
- * 1. uniform split leaves @page (or the folio contains it) locked;
- * 2. buddy allocator like (non-uniform) split leaves @folio locked.
- *
+ * folio containing @page. The caller needs to unlock and/or free after-split
+ * folios if necessary.
  *
  * For !uniform_split, when -ENOMEM is returned, the original folio might be
  * split. The caller needs to check the input folio.
  */
 static int __split_unmapped_folio(struct folio *folio, int new_order,
-		struct page *split_at, struct page *lock_at,
-		struct list_head *list, pgoff_t end,
-		struct xa_state *xas, struct address_space *mapping,
-		bool uniform_split)
+				  struct page *split_at, struct xa_state *xas,
+				  struct address_space *mapping,
+				  bool uniform_split)
 {
-	struct lruvec *lruvec;
-	struct address_space *swap_cache = NULL;
-	struct folio *origin_folio = folio;
-	struct folio *next_folio = folio_next(folio);
-	struct folio *new_folio;
 	struct folio *next;
 	int order = folio_order(folio);
 	int split_order;
 	int start_order = uniform_split ? new_order : order - 1;
-	int nr_dropped = 0;
 	int ret = 0;
 	bool stop_split = false;
 
-	if (folio_test_swapcache(folio)) {
-		VM_BUG_ON(mapping);
-
-		/* a swapcache folio can only be uniformly split to order-0 */
-		if (!uniform_split || new_order != 0)
-			return -EINVAL;
-
-		swap_cache = swap_address_space(folio->swap);
-		xa_lock(&swap_cache->i_pages);
-	}
-
 	if (folio_test_anon(folio))
 		mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
 
-	/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
-	lruvec = folio_lruvec_lock(folio);
 
 	folio_clear_has_hwpoisoned(folio);
 
@@ -3480,9 +3452,9 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 	for (split_order = start_order;
 	     split_order >= new_order && !stop_split;
 	     split_order--) {
-		int old_order = folio_order(folio);
-		struct folio *release;
 		struct folio *end_folio = folio_next(folio);
+		int old_order = folio_order(folio);
+		struct folio *new_folio;
 
 		/* order-1 anonymous folio is not supported */
 		if (folio_test_anon(folio) && split_order == 1)
@@ -3517,113 +3489,34 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
 
 after_split:
 		/*
-		 * Iterate through after-split folios and perform related
-		 * operations. But in buddy allocator like split, the folio
+		 * Iterate through after-split folios and update folio stats.
+		 * But in buddy allocator like split, the folio
 		 * containing the specified page is skipped until its order
 		 * is new_order, since the folio will be worked on in next
 		 * iteration.
 		 */
-		for (release = folio; release != end_folio; release = next) {
-			next = folio_next(release);
+		for (new_folio = folio; new_folio != end_folio; new_folio = next) {
+			next = folio_next(new_folio);
 			/*
-			 * for buddy allocator like split, the folio containing
-			 * page will be split next and should not be released,
-			 * until the folio's order is new_order or stop_split
-			 * is set to true by the above xas_split() failure.
+			 * for buddy allocator like split, new_folio containing
+			 * page could be split again, thus do not change stats
+			 * yet. Wait until new_folio's order is new_order or
+			 * stop_split is set to true by the above xas_split()
+			 * failure.
 			 */
-			if (release == page_folio(split_at)) {
-				folio = release;
+			if (new_folio == page_folio(split_at)) {
+				folio = new_folio;
 				if (split_order != new_order && !stop_split)
 					continue;
 			}
-			if (folio_test_anon(release)) {
-				mod_mthp_stat(folio_order(release),
+			if (folio_test_anon(new_folio)) {
+				mod_mthp_stat(folio_order(new_folio),
 						MTHP_STAT_NR_ANON, 1);
 			}
 
-			/*
-			 * origin_folio should be kept frozon until page cache
-			 * entries are updated with all the other after-split
-			 * folios to prevent others seeing stale page cache
-			 * entries.
-			 */
-			if (release == origin_folio)
-				continue;
-
-			folio_ref_unfreeze(release, 1 +
-					((mapping || swap_cache) ?
-						folio_nr_pages(release) : 0));
-
-			lru_add_split_folio(origin_folio, release, lruvec,
-					list);
-
-			/* Some pages can be beyond EOF: drop them from cache */
-			if (release->index >= end) {
-				if (shmem_mapping(mapping))
-					nr_dropped += folio_nr_pages(release);
-				else if (folio_test_clear_dirty(release))
-					folio_account_cleaned(release,
-						inode_to_wb(mapping->host));
-				__filemap_remove_folio(release, NULL);
-				folio_put_refs(release, folio_nr_pages(release));
-			} else if (mapping) {
-				__xa_store(&mapping->i_pages,
-						release->index, release, 0);
-			} else if (swap_cache) {
-				__xa_store(&swap_cache->i_pages,
-						swap_cache_index(release->swap),
-						release, 0);
-			}
 		}
 	}
 
-	/*
-	 * Unfreeze origin_folio only after all page cache entries, which used
-	 * to point to it, have been updated with new folios. Otherwise,
-	 * a parallel folio_try_get() can grab origin_folio and its caller can
-	 * see stale page cache entries.
-	 */
-	folio_ref_unfreeze(origin_folio, 1 +
-		((mapping || swap_cache) ? folio_nr_pages(origin_folio) : 0));
-
-	unlock_page_lruvec(lruvec);
-
-	if (swap_cache)
-		xa_unlock(&swap_cache->i_pages);
-	if (mapping)
-		xa_unlock(&mapping->i_pages);
-
-	/* Caller disabled irqs, so they are still disabled here */
-	local_irq_enable();
-
-	if (nr_dropped)
-		shmem_uncharge(mapping->host, nr_dropped);
-
-	remap_page(origin_folio, 1 << order,
-			folio_test_anon(origin_folio) ?
-				RMP_USE_SHARED_ZEROPAGE : 0);
-
-	/*
-	 * At this point, folio should contain the specified page.
-	 * For uniform split, it is left for caller to unlock.
-	 * For buddy allocator like split, the first after-split folio is left
-	 * for caller to unlock.
-	 */
-	for (new_folio = origin_folio; new_folio != next_folio; new_folio = next) {
-		next = folio_next(new_folio);
-		if (new_folio == page_folio(lock_at))
-			continue;
-
-		folio_unlock(new_folio);
-		/*
-		 * Subpages may be freed if there wasn't any mapping
-		 * like if add_to_swap() is running on a lru page that
-		 * had its mapping zapped. And freeing these pages
-		 * requires taking the lru_lock so we do the put_page
-		 * of the tail pages after the split is complete.
-		 */
-		free_folio_and_swap_cache(new_folio);
-	}
 	return ret;
 }
 
@@ -3706,11 +3599,14 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 {
 	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
 	XA_STATE(xas, &folio->mapping->i_pages, folio->index);
+	struct folio *next_folio = folio_next(folio);
 	bool is_anon = folio_test_anon(folio);
 	struct address_space *mapping = NULL;
 	struct anon_vma *anon_vma = NULL;
 	int order = folio_order(folio);
+	struct folio *new_folio, *next;
 	int extra_pins, ret;
+	int nr_shmem_dropped = 0;
 	pgoff_t end;
 	bool is_hzp;
 
@@ -3833,13 +3729,18 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 		 */
 		xas_lock(&xas);
 		xas_reset(&xas);
-		if (xas_load(&xas) != folio)
+		if (xas_load(&xas) != folio) {
+			ret = -EAGAIN;
 			goto fail;
+		}
 	}
 
 	/* Prevent deferred_split_scan() touching ->_refcount */
 	spin_lock(&ds_queue->split_queue_lock);
 	if (folio_ref_freeze(folio, 1 + extra_pins)) {
+		struct address_space *swap_cache = NULL;
+		struct lruvec *lruvec;
+
 		if (folio_order(folio) > 1 &&
 		    !list_empty(&folio->_deferred_list)) {
 			ds_queue->split_queue_len--;
@@ -3873,18 +3774,120 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
 			}
 		}
 
-		ret = __split_unmapped_folio(folio, new_order,
-				split_at, lock_at, list, end, &xas, mapping,
-				uniform_split);
+		if (folio_test_swapcache(folio)) {
+			if (mapping) {
+				VM_WARN_ON_ONCE_FOLIO(mapping, folio);
+				ret = -EINVAL;
+				goto fail;
+			}
+
+			/*
+			 * a swapcache folio can only be uniformly split to
+			 * order-0
+			 */
+			if (!uniform_split || new_order != 0) {
+				ret = -EINVAL;
+				goto fail;
+			}
+
+			swap_cache = swap_address_space(folio->swap);
+			xa_lock(&swap_cache->i_pages);
+		}
+
+		/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
+		lruvec = folio_lruvec_lock(folio);
+
+		ret = __split_unmapped_folio(folio, new_order, split_at, &xas,
+					     mapping, uniform_split);
+
+		/*
+		 * Unfreeze after-split folios and put them back to the right
+		 * list. @folio should be kept frozon until page cache entries
+		 * are updated with all the other after-split folios to prevent
+		 * others seeing stale page cache entries.
+		 */
+		for (new_folio = folio_next(folio); new_folio != next_folio;
+		     new_folio = next) {
+			next = folio_next(new_folio);
+
+			folio_ref_unfreeze(
+				new_folio,
+				1 + ((mapping || swap_cache) ?
+					     folio_nr_pages(new_folio) :
+					     0));
+
+			lru_add_split_folio(folio, new_folio, lruvec, list);
+
+			/* Some pages can be beyond EOF: drop them from cache */
+			if (new_folio->index >= end) {
+				if (shmem_mapping(mapping))
+					nr_shmem_dropped +=
+						folio_nr_pages(new_folio);
+				else if (folio_test_clear_dirty(new_folio))
+					folio_account_cleaned(
+						new_folio,
+						inode_to_wb(mapping->host));
+				__filemap_remove_folio(new_folio, NULL);
+				folio_put_refs(new_folio,
+					       folio_nr_pages(new_folio));
+			} else if (mapping) {
+				__xa_store(&mapping->i_pages, new_folio->index,
+					   new_folio, 0);
+			} else if (swap_cache) {
+				__xa_store(&swap_cache->i_pages,
+					   swap_cache_index(new_folio->swap),
+					   new_folio, 0);
+			}
+		}
+		/*
+		 * Unfreeze @folio only after all page cache entries, which
+		 * used to point to it, have been updated with new folios.
+		 * Otherwise, a parallel folio_try_get() can grab origin_folio
+		 * and its caller can see stale page cache entries.
+		 */
+		folio_ref_unfreeze(folio, 1 +
+			((mapping || swap_cache) ? folio_nr_pages(folio) : 0));
+
+		unlock_page_lruvec(lruvec);
+
+		if (swap_cache)
+			xa_unlock(&swap_cache->i_pages);
 	} else {
 		spin_unlock(&ds_queue->split_queue_lock);
-fail:
-		if (mapping)
-			xas_unlock(&xas);
-		local_irq_enable();
-		remap_page(folio, folio_nr_pages(folio), 0);
 		ret = -EAGAIN;
 	}
+fail:
+	if (mapping)
+		xas_unlock(&xas);
+
+	local_irq_enable();
+
+	if (nr_shmem_dropped)
+		shmem_uncharge(mapping->host, nr_shmem_dropped);
+
+	remap_page(folio, 1 << order,
+		   !ret && folio_test_anon(folio) ? RMP_USE_SHARED_ZEROPAGE :
+						    0);
+
+	/*
+	 * Unlock all after-split folios except the one containing @lock_at
+	 * page. If @folio is not split, it will be kept locked.
+	 */
+	for (new_folio = folio; new_folio != next_folio; new_folio = next) {
+		next = folio_next(new_folio);
+		if (new_folio == page_folio(lock_at))
+			continue;
+
+		folio_unlock(new_folio);
+		/*
+		 * Subpages may be freed if there wasn't any mapping
+		 * like if add_to_swap() is running on a lru page that
+		 * had its mapping zapped. And freeing these pages
+		 * requires taking the lru_lock so we do the put_page
+		 * of the tail pages after the split is complete.
+		 */
+		free_folio_and_swap_cache(new_folio);
+	}
 
 out_unlock:
 	if (anon_vma) {
-- 
2.47.2




* Re: [PATCH] mm/huge_memory: move unrelated code out of __split_unmapped_folio()
  2025-07-11  3:02 [PATCH] mm/huge_memory: move unrelated code out of __split_unmapped_folio() Zi Yan
@ 2025-07-11  6:41 ` David Hildenbrand
  2025-07-11 14:37   ` Zi Yan
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2025-07-11  6:41 UTC (permalink / raw)
  To: Zi Yan, Balbir Singh, linux-mm
  Cc: Andrew Morton, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, linux-kernel

On 11.07.25 05:02, Zi Yan wrote:
> remap(), folio_ref_unfreeze(), lru_add_split_folio() are not related to
> splitting unmapped folio operations. Move them out to the caller, so that
> __split_unmapped_folio() only splits unmapped folios. This makes
> __split_unmapped_folio() reusable.
> 
> Convert VM_BUG_ON(mapping) to use VM_WARN_ON_ONCE_FOLIO().
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
> Based on the prior discussion[1], this patch makes
> __split_unmapped_folio() reusable for splitting unmapped folios without
> adding a new boolean unmapped parameter to guard mapping related code.
> 
> Another potential benefit is that __split_unmapped_folio() could be
> called on after-split folios by __folio_split() to perform new split
> methods. For example, at deferred split time, unmapped subpages can
> scatter arbitrarily within a large folio, neither uniform nor non-uniform
> split can maximize after-split folio orders for mapped subpages.
> Hopefully, performing __split_unmapped_folio() multiple times can
> achieve the optimal split result.
> 
> It passed mm selftests.
> 
> [1] https://lore.kernel.org/linux-mm/94D8C1A4-780C-4BEC-A336-7D3613B54845@nvidia.com/
> ---
> 
>   mm/huge_memory.c | 275 ++++++++++++++++++++++++-----------------------
>   1 file changed, 139 insertions(+), 136 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 3eb1c34be601..d97145dfa6c8 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3396,10 +3396,6 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
>    *             order - 1 to new_order).
>    * @split_at: in buddy allocator like split, the folio containing @split_at
>    *            will be split until its order becomes @new_order.
> - * @lock_at: the folio containing @lock_at is left locked for caller.
> - * @list: the after split folios will be added to @list if it is not NULL,
> - *        otherwise to LRU lists.
> - * @end: the end of the file @folio maps to. -1 if @folio is anonymous memory.
>    * @xas: xa_state pointing to folio->mapping->i_pages and locked by caller
>    * @mapping: @folio->mapping
>    * @uniform_split: if the split is uniform or not (buddy allocator like split)
> @@ -3425,51 +3421,27 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
>    *    @page, which is split in next for loop.
>    *
>    * After splitting, the caller's folio reference will be transferred to the
> - * folio containing @page. The other folios may be freed if they are not mapped.
> - *
> - * In terms of locking, after splitting,
> - * 1. uniform split leaves @page (or the folio contains it) locked;
> - * 2. buddy allocator like (non-uniform) split leaves @folio locked.
> - *
> + * folio containing @page. The caller needs to unlock and/or free after-split
> + * folios if necessary.
>    *
>    * For !uniform_split, when -ENOMEM is returned, the original folio might be
>    * split. The caller needs to check the input folio.
>    */
>   static int __split_unmapped_folio(struct folio *folio, int new_order,
> -		struct page *split_at, struct page *lock_at,
> -		struct list_head *list, pgoff_t end,
> -		struct xa_state *xas, struct address_space *mapping,
> -		bool uniform_split)
> +				  struct page *split_at, struct xa_state *xas,
> +				  struct address_space *mapping,
> +				  bool uniform_split)

Use two-tabs indent please (like we already do, I assume).

[...]

> @@ -3706,11 +3599,14 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>   {
>   	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>   	XA_STATE(xas, &folio->mapping->i_pages, folio->index);
> +	struct folio *next_folio = folio_next(folio);
>   	bool is_anon = folio_test_anon(folio);
>   	struct address_space *mapping = NULL;
>   	struct anon_vma *anon_vma = NULL;
>   	int order = folio_order(folio);
> +	struct folio *new_folio, *next;
>   	int extra_pins, ret;
> +	int nr_shmem_dropped = 0;
>   	pgoff_t end;
>   	bool is_hzp;
>   
> @@ -3833,13 +3729,18 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>   		 */
>   		xas_lock(&xas);
>   		xas_reset(&xas);
> -		if (xas_load(&xas) != folio)
> +		if (xas_load(&xas) != folio) {
> +			ret = -EAGAIN;
>   			goto fail;
> +		}
>   	}
>   
>   	/* Prevent deferred_split_scan() touching ->_refcount */
>   	spin_lock(&ds_queue->split_queue_lock);
>   	if (folio_ref_freeze(folio, 1 + extra_pins)) {
> +		struct address_space *swap_cache = NULL;
> +		struct lruvec *lruvec;
> +
>   		if (folio_order(folio) > 1 &&
>   		    !list_empty(&folio->_deferred_list)) {
>   			ds_queue->split_queue_len--;
> @@ -3873,18 +3774,120 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>   			}
>   		}
>   
> -		ret = __split_unmapped_folio(folio, new_order,
> -				split_at, lock_at, list, end, &xas, mapping,
> -				uniform_split);
> +		if (folio_test_swapcache(folio)) {
> +			if (mapping) {
> +				VM_WARN_ON_ONCE_FOLIO(mapping, folio);
> +				ret = -EINVAL;
> +				goto fail;
> +			}
> +
> +			/*
> +			 * a swapcache folio can only be uniformly split to
> +			 * order-0
> +			 */
> +			if (!uniform_split || new_order != 0) {
> +				ret = -EINVAL;
> +				goto fail;
> +			}
> +
> +			swap_cache = swap_address_space(folio->swap);
> +			xa_lock(&swap_cache->i_pages);
> +		}
> +
> +		/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
> +		lruvec = folio_lruvec_lock(folio);
> +
> +		ret = __split_unmapped_folio(folio, new_order, split_at, &xas,
> +					     mapping, uniform_split);
> +
> +		/*
> +		 * Unfreeze after-split folios and put them back to the right
> +		 * list. @folio should be kept frozon until page cache entries
> +		 * are updated with all the other after-split folios to prevent
> +		 * others seeing stale page cache entries.
> +		 */
> +		for (new_folio = folio_next(folio); new_folio != next_folio;
> +		     new_folio = next) {
> +			next = folio_next(new_folio);
> +
> +			folio_ref_unfreeze(
> +				new_folio,
> +				1 + ((mapping || swap_cache) ?
> +					     folio_nr_pages(new_folio) :
> +					     0));

While we are at it, is there a way to make this look less like an artistic 
masterpiece? :)

expected_refs = ...
folio_ref_unfreeze(new_folio, expected_refs).


Can we already make use of folio_expected_ref_count() at that point? 
Mapcount should be 0 and the folio should be properly setup (e.g., anon, 
swapcache) IIRC.

So maybe

expected_refs = folio_expected_ref_count(new_folio) + 1;
folio_ref_unfreeze(new_folio, expected_refs).

Would do?
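
For illustration, a minimal sketch of what the loop could then look like,
assuming folio_expected_ref_count() is indeed usable at this point
(untested, variable names as in the patch):

        for (new_folio = folio_next(folio); new_folio != next_folio;
             new_folio = next) {
                int expected_refs;

                next = folio_next(new_folio);

                /*
                 * The mapcount is 0 here, so folio_expected_ref_count()
                 * boils down to the pagecache/swapcache references; the +1
                 * is the reference the split code itself still holds.
                 */
                expected_refs = folio_expected_ref_count(new_folio) + 1;
                folio_ref_unfreeze(new_folio, expected_refs);

                lru_add_split_folio(folio, new_folio, lruvec, list);
                /* ... pagecache/swapcache updates as in the patch ... */
        }

The same pattern would then apply to the final folio_ref_unfreeze() of
@folio further down.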

> +
> +			lru_add_split_folio(folio, new_folio, lruvec, list);
> +
> +			/* Some pages can be beyond EOF: drop them from cache */
> +			if (new_folio->index >= end) {
> +				if (shmem_mapping(mapping))
> +					nr_shmem_dropped +=
> +						folio_nr_pages(new_folio);

Keep that on a single line.

> +				else if (folio_test_clear_dirty(new_folio))
> +					folio_account_cleaned(
> +						new_folio,
> +						inode_to_wb(mapping->host));
> +				__filemap_remove_folio(new_folio, NULL);
> +				folio_put_refs(new_folio,
> +					       folio_nr_pages(new_folio));
> +			} else if (mapping) {
> +				__xa_store(&mapping->i_pages, new_folio->index,
> +					   new_folio, 0);
> +			} else if (swap_cache) {
> +				__xa_store(&swap_cache->i_pages,
> +					   swap_cache_index(new_folio->swap),
> +					   new_folio, 0);
> +			}
> +		}
> +		/*
> +		 * Unfreeze @folio only after all page cache entries, which
> +		 * used to point to it, have been updated with new folios.
> +		 * Otherwise, a parallel folio_try_get() can grab origin_folio
> +		 * and its caller can see stale page cache entries.
> +		 */
> +		folio_ref_unfreeze(folio, 1 +
> +			((mapping || swap_cache) ? folio_nr_pages(folio) : 0));

Same as above probably.


-- 
Cheers,

David / dhildenb




* Re: [PATCH] mm/huge_memory: move unrelated code out of __split_unmapped_folio()
  2025-07-11  6:41 ` David Hildenbrand
@ 2025-07-11 14:37   ` Zi Yan
  2025-07-11 14:40     ` David Hildenbrand
  0 siblings, 1 reply; 9+ messages in thread
From: Zi Yan @ 2025-07-11 14:37 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Balbir Singh, linux-mm, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, linux-kernel

On 11 Jul 2025, at 2:41, David Hildenbrand wrote:

> On 11.07.25 05:02, Zi Yan wrote:
>> remap(), folio_ref_unfreeze(), lru_add_split_folio() are not related to
>> splitting unmapped folio operations. Move them out to the caller, so that
>> __split_unmapped_folio() only splits unmapped folios. This makes
>> __split_unmapped_folio() reusable.
>>
>> Convert VM_BUG_ON(mapping) to use VM_WARN_ON_ONCE_FOLIO().
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>> Based on the prior discussion[1], this patch makes
>> __split_unmapped_folio() reusable for splitting unmapped folios without
>> adding a new boolean unmapped parameter to guard mapping related code.
>>
>> Another potential benefit is that __split_unmapped_folio() could be
>> called on after-split folios by __folio_split() to perform new split
>> methods. For example, at deferred split time, unmapped subpages can
>> scatter arbitrarily within a large folio, neither uniform nor non-uniform
>> split can maximize after-split folio orders for mapped subpages.
>> Hopefully, performing __split_unmapped_folio() multiple times can
>> achieve the optimal split result.
>>
>> It passed mm selftests.
>>
>> [1] https://lore.kernel.org/linux-mm/94D8C1A4-780C-4BEC-A336-7D3613B54845@nvidia.com/
>> ---
>>
>>   mm/huge_memory.c | 275 ++++++++++++++++++++++++-----------------------
>>   1 file changed, 139 insertions(+), 136 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 3eb1c34be601..d97145dfa6c8 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3396,10 +3396,6 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
>>    *             order - 1 to new_order).
>>    * @split_at: in buddy allocator like split, the folio containing @split_at
>>    *            will be split until its order becomes @new_order.
>> - * @lock_at: the folio containing @lock_at is left locked for caller.
>> - * @list: the after split folios will be added to @list if it is not NULL,
>> - *        otherwise to LRU lists.
>> - * @end: the end of the file @folio maps to. -1 if @folio is anonymous memory.
>>    * @xas: xa_state pointing to folio->mapping->i_pages and locked by caller
>>    * @mapping: @folio->mapping
>>    * @uniform_split: if the split is uniform or not (buddy allocator like split)
>> @@ -3425,51 +3421,27 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
>>    *    @page, which is split in next for loop.
>>    *
>>    * After splitting, the caller's folio reference will be transferred to the
>> - * folio containing @page. The other folios may be freed if they are not mapped.
>> - *
>> - * In terms of locking, after splitting,
>> - * 1. uniform split leaves @page (or the folio contains it) locked;
>> - * 2. buddy allocator like (non-uniform) split leaves @folio locked.
>> - *
>> + * folio containing @page. The caller needs to unlock and/or free after-split
>> + * folios if necessary.
>>    *
>>    * For !uniform_split, when -ENOMEM is returned, the original folio might be
>>    * split. The caller needs to check the input folio.
>>    */
>>   static int __split_unmapped_folio(struct folio *folio, int new_order,
>> -		struct page *split_at, struct page *lock_at,
>> -		struct list_head *list, pgoff_t end,
>> -		struct xa_state *xas, struct address_space *mapping,
>> -		bool uniform_split)
>> +				  struct page *split_at, struct xa_state *xas,
>> +				  struct address_space *mapping,
>> +				  bool uniform_split)
>
> Use two-tabs indent please (like we already do, I assume).

OK. I was using clang-format. It gave me this indentation.
>
> [...]
>
>> @@ -3706,11 +3599,14 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>   {
>>   	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>   	XA_STATE(xas, &folio->mapping->i_pages, folio->index);
>> +	struct folio *next_folio = folio_next(folio);
>>   	bool is_anon = folio_test_anon(folio);
>>   	struct address_space *mapping = NULL;
>>   	struct anon_vma *anon_vma = NULL;
>>   	int order = folio_order(folio);
>> +	struct folio *new_folio, *next;
>>   	int extra_pins, ret;
>> +	int nr_shmem_dropped = 0;
>>   	pgoff_t end;
>>   	bool is_hzp;
>>  @@ -3833,13 +3729,18 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>   		 */
>>   		xas_lock(&xas);
>>   		xas_reset(&xas);
>> -		if (xas_load(&xas) != folio)
>> +		if (xas_load(&xas) != folio) {
>> +			ret = -EAGAIN;
>>   			goto fail;
>> +		}
>>   	}
>>    	/* Prevent deferred_split_scan() touching ->_refcount */
>>   	spin_lock(&ds_queue->split_queue_lock);
>>   	if (folio_ref_freeze(folio, 1 + extra_pins)) {
>> +		struct address_space *swap_cache = NULL;
>> +		struct lruvec *lruvec;
>> +
>>   		if (folio_order(folio) > 1 &&
>>   		    !list_empty(&folio->_deferred_list)) {
>>   			ds_queue->split_queue_len--;
>> @@ -3873,18 +3774,120 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>   			}
>>   		}
>>  -		ret = __split_unmapped_folio(folio, new_order,
>> -				split_at, lock_at, list, end, &xas, mapping,
>> -				uniform_split);
>> +		if (folio_test_swapcache(folio)) {
>> +			if (mapping) {
>> +				VM_WARN_ON_ONCE_FOLIO(mapping, folio);
>> +				ret = -EINVAL;
>> +				goto fail;
>> +			}
>> +
>> +			/*
>> +			 * a swapcache folio can only be uniformly split to
>> +			 * order-0
>> +			 */
>> +			if (!uniform_split || new_order != 0) {
>> +				ret = -EINVAL;
>> +				goto fail;
>> +			}
>> +
>> +			swap_cache = swap_address_space(folio->swap);
>> +			xa_lock(&swap_cache->i_pages);
>> +		}
>> +
>> +		/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
>> +		lruvec = folio_lruvec_lock(folio);
>> +
>> +		ret = __split_unmapped_folio(folio, new_order, split_at, &xas,
>> +					     mapping, uniform_split);
>> +
>> +		/*
>> +		 * Unfreeze after-split folios and put them back to the right
>> +		 * list. @folio should be kept frozon until page cache entries
>> +		 * are updated with all the other after-split folios to prevent
>> +		 * others seeing stale page cache entries.
>> +		 */
>> +		for (new_folio = folio_next(folio); new_folio != next_folio;
>> +		     new_folio = next) {
>> +			next = folio_next(new_folio);
>> +
>> +			folio_ref_unfreeze(
>> +				new_folio,
>> +				1 + ((mapping || swap_cache) ?
>> +					     folio_nr_pages(new_folio) :
>> +					     0));
>
> While we are at it, is there a way to make this look less like an artistic masterpiece? :)
>
> expected_refs = ...
> folio_ref_unfreeze(new_folio, expected_refs).
>
>
> Can we already make use of folio_expected_ref_count() at that point? Mapcount should be 0 and the folio should be properly setup (e.g., anon, swapcache) IIRC.
>
> So maybe
>
> expected_refs = folio_expected_ref_count(new_folio) + 1;
> folio_ref_unfreeze(new_folio, expected_refs).
>
> Would do?

I think so. Even further, I think we probably can get rid of can_split_folio()’s
pextra_pins and use folio_expected_ref_count() too.

Before split:

if (!can_split_folio(folio, 1))

unmap_folio();

extra_pins = folio_expected_ref_count(folio) + 1;

After split:

1. new folio:
expected_refs = folio_expected_ref_count(new_folio) + 1;
folio_ref_unfreeze(new_folio, expected_refs).

2: original folio (it can be split, so need to check ref again):
expected_refs = folio_expected_ref_count(folio) + 1;
folio_ref_unfreeze(folio, expected_refs).
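
Put together, an untested sketch of that against __folio_split() (whether
folio_expected_ref_count() can really stand in for the
can_split_folio()/extra_pins accounting here is a separate question):

        /* after unmap_folio(folio), before the split */
        expected_refs = folio_expected_ref_count(folio) + 1;
        if (folio_ref_freeze(folio, expected_refs)) {
                ...
                ret = __split_unmapped_folio(folio, new_order, split_at,
                                             &xas, mapping, uniform_split);

                /* unfreeze each after-split folio the same way, then: */
                expected_refs = folio_expected_ref_count(folio) + 1;
                folio_ref_unfreeze(folio, expected_refs);
        }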


>
>> +
>> +			lru_add_split_folio(folio, new_folio, lruvec, list);
>> +
>> +			/* Some pages can be beyond EOF: drop them from cache */
>> +			if (new_folio->index >= end) {
>> +				if (shmem_mapping(mapping))
>> +					nr_shmem_dropped +=
>> +						folio_nr_pages(new_folio);
>
> Keep that on a single line.

OK.

>
>> +				else if (folio_test_clear_dirty(new_folio))
>> +					folio_account_cleaned(
>> +						new_folio,
>> +						inode_to_wb(mapping->host));
>> +				__filemap_remove_folio(new_folio, NULL);
>> +				folio_put_refs(new_folio,
>> +					       folio_nr_pages(new_folio));
>> +			} else if (mapping) {
>> +				__xa_store(&mapping->i_pages, new_folio->index,
>> +					   new_folio, 0);
>> +			} else if (swap_cache) {
>> +				__xa_store(&swap_cache->i_pages,
>> +					   swap_cache_index(new_folio->swap),
>> +					   new_folio, 0);
>> +			}
>> +		}
>> +		/*
>> +		 * Unfreeze @folio only after all page cache entries, which
>> +		 * used to point to it, have been updated with new folios.
>> +		 * Otherwise, a parallel folio_try_get() can grab origin_folio
>> +		 * and its caller can see stale page cache entries.
>> +		 */
>> +		folio_ref_unfreeze(folio, 1 +
>> +			((mapping || swap_cache) ? folio_nr_pages(folio) : 0));
>
> Same as above probably.
Sure.

Thank you for the feedback. Will make all these changes and send v2.

Best Regards,
Yan, Zi



* Re: [PATCH] mm/huge_memory: move unrelated code out of __split_unmapped_folio()
  2025-07-11 14:37   ` Zi Yan
@ 2025-07-11 14:40     ` David Hildenbrand
  2025-07-11 15:40       ` Zi Yan
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2025-07-11 14:40 UTC (permalink / raw)
  To: Zi Yan
  Cc: Balbir Singh, linux-mm, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, linux-kernel

On 11.07.25 16:37, Zi Yan wrote:
> On 11 Jul 2025, at 2:41, David Hildenbrand wrote:
> 
>> On 11.07.25 05:02, Zi Yan wrote:
>>> remap(), folio_ref_unfreeze(), lru_add_split_folio() are not related to
>>> splitting unmapped folio operations. Move them out to the caller, so that
>>> __split_unmapped_folio() only splits unmapped folios. This makes
>>> __split_unmapped_folio() reusable.
>>>
>>> Convert VM_BUG_ON(mapping) to use VM_WARN_ON_ONCE_FOLIO().
>>>
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>> ---
>>> Based on the prior discussion[1], this patch makes
>>> __split_unmapped_folio() reusable for splitting unmapped folios without
>>> adding a new boolean unmapped parameter to guard mapping related code.
>>>
>>> Another potential benefit is that __split_unmapped_folio() could be
>>> called on after-split folios by __folio_split() to perform new split
>>> methods. For example, at deferred split time, unmapped subpages can
>>> scatter arbitrarily within a large folio, neither uniform nor non-uniform
>>> split can maximize after-split folio orders for mapped subpages.
>>> Hopefully, performing __split_unmapped_folio() multiple times can
>>> achieve the optimal split result.
>>>
>>> It passed mm selftests.
>>>
>>> [1] https://lore.kernel.org/linux-mm/94D8C1A4-780C-4BEC-A336-7D3613B54845@nvidia.com/
>>> ---
>>>
>>>    mm/huge_memory.c | 275 ++++++++++++++++++++++++-----------------------
>>>    1 file changed, 139 insertions(+), 136 deletions(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 3eb1c34be601..d97145dfa6c8 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -3396,10 +3396,6 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
>>>     *             order - 1 to new_order).
>>>     * @split_at: in buddy allocator like split, the folio containing @split_at
>>>     *            will be split until its order becomes @new_order.
>>> - * @lock_at: the folio containing @lock_at is left locked for caller.
>>> - * @list: the after split folios will be added to @list if it is not NULL,
>>> - *        otherwise to LRU lists.
>>> - * @end: the end of the file @folio maps to. -1 if @folio is anonymous memory.
>>>     * @xas: xa_state pointing to folio->mapping->i_pages and locked by caller
>>>     * @mapping: @folio->mapping
>>>     * @uniform_split: if the split is uniform or not (buddy allocator like split)
>>> @@ -3425,51 +3421,27 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
>>>     *    @page, which is split in next for loop.
>>>     *
>>>     * After splitting, the caller's folio reference will be transferred to the
>>> - * folio containing @page. The other folios may be freed if they are not mapped.
>>> - *
>>> - * In terms of locking, after splitting,
>>> - * 1. uniform split leaves @page (or the folio contains it) locked;
>>> - * 2. buddy allocator like (non-uniform) split leaves @folio locked.
>>> - *
>>> + * folio containing @page. The caller needs to unlock and/or free after-split
>>> + * folios if necessary.
>>>     *
>>>     * For !uniform_split, when -ENOMEM is returned, the original folio might be
>>>     * split. The caller needs to check the input folio.
>>>     */
>>>    static int __split_unmapped_folio(struct folio *folio, int new_order,
>>> -		struct page *split_at, struct page *lock_at,
>>> -		struct list_head *list, pgoff_t end,
>>> -		struct xa_state *xas, struct address_space *mapping,
>>> -		bool uniform_split)
>>> +				  struct page *split_at, struct xa_state *xas,
>>> +				  struct address_space *mapping,
>>> +				  bool uniform_split)
>>
>> Use two-tabs indent please (like we already do, I assume).
> 
> OK. I was using clang-format. It gave me this indentation.
>>
>> [...]
>>
>>> @@ -3706,11 +3599,14 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>    {
>>>    	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>>    	XA_STATE(xas, &folio->mapping->i_pages, folio->index);
>>> +	struct folio *next_folio = folio_next(folio);
>>>    	bool is_anon = folio_test_anon(folio);
>>>    	struct address_space *mapping = NULL;
>>>    	struct anon_vma *anon_vma = NULL;
>>>    	int order = folio_order(folio);
>>> +	struct folio *new_folio, *next;
>>>    	int extra_pins, ret;
>>> +	int nr_shmem_dropped = 0;
>>>    	pgoff_t end;
>>>    	bool is_hzp;
>>>   @@ -3833,13 +3729,18 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>    		 */
>>>    		xas_lock(&xas);
>>>    		xas_reset(&xas);
>>> -		if (xas_load(&xas) != folio)
>>> +		if (xas_load(&xas) != folio) {
>>> +			ret = -EAGAIN;
>>>    			goto fail;
>>> +		}
>>>    	}
>>>     	/* Prevent deferred_split_scan() touching ->_refcount */
>>>    	spin_lock(&ds_queue->split_queue_lock);
>>>    	if (folio_ref_freeze(folio, 1 + extra_pins)) {
>>> +		struct address_space *swap_cache = NULL;
>>> +		struct lruvec *lruvec;
>>> +
>>>    		if (folio_order(folio) > 1 &&
>>>    		    !list_empty(&folio->_deferred_list)) {
>>>    			ds_queue->split_queue_len--;
>>> @@ -3873,18 +3774,120 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>    			}
>>>    		}
>>>   -		ret = __split_unmapped_folio(folio, new_order,
>>> -				split_at, lock_at, list, end, &xas, mapping,
>>> -				uniform_split);
>>> +		if (folio_test_swapcache(folio)) {
>>> +			if (mapping) {
>>> +				VM_WARN_ON_ONCE_FOLIO(mapping, folio);
>>> +				ret = -EINVAL;
>>> +				goto fail;
>>> +			}
>>> +
>>> +			/*
>>> +			 * a swapcache folio can only be uniformly split to
>>> +			 * order-0
>>> +			 */
>>> +			if (!uniform_split || new_order != 0) {
>>> +				ret = -EINVAL;
>>> +				goto fail;
>>> +			}
>>> +
>>> +			swap_cache = swap_address_space(folio->swap);
>>> +			xa_lock(&swap_cache->i_pages);
>>> +		}
>>> +
>>> +		/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
>>> +		lruvec = folio_lruvec_lock(folio);
>>> +
>>> +		ret = __split_unmapped_folio(folio, new_order, split_at, &xas,
>>> +					     mapping, uniform_split);
>>> +
>>> +		/*
>>> +		 * Unfreeze after-split folios and put them back to the right
>>> +		 * list. @folio should be kept frozon until page cache entries
>>> +		 * are updated with all the other after-split folios to prevent
>>> +		 * others seeing stale page cache entries.
>>> +		 */
>>> +		for (new_folio = folio_next(folio); new_folio != next_folio;
>>> +		     new_folio = next) {
>>> +			next = folio_next(new_folio);
>>> +
>>> +			folio_ref_unfreeze(
>>> +				new_folio,
>>> +				1 + ((mapping || swap_cache) ?
>>> +					     folio_nr_pages(new_folio) :
>>> +					     0));
>>
>> While we are at it, is there a way to make this look less like an artistic masterpiece? :)
>>
>> expected_refs = ...
>> folio_ref_unfreeze(new_folio, expected_refs).
>>
>>
>> Can we already make use of folio_expected_ref_count() at that point? Mapcount should be 0 and the folio should be properly setup (e.g., anon, swapcache) IIRC.
>>
>> So maybe
>>
>> expected_refs = folio_expected_ref_count(new_folio) + 1;
>> folio_ref_unfreeze(new_folio, expected_refs).
>>
>> Would do?
> 
> I think so. Even further, I think we probably can get rid of can_split_folio()’s
> pextra_pins and use folio_expected_ref_count() too.

That will only do the right thing if we know that the folio is not 
mapped and that there is no way it can get mapped concurrently.

Otherwise, when freezing, we might ignore a mapping (where we should 
fail freezing).

-- 
Cheers,

David / dhildenb




* Re: [PATCH] mm/huge_memory: move unrelated code out of __split_unmapped_folio()
  2025-07-11 14:40     ` David Hildenbrand
@ 2025-07-11 15:40       ` Zi Yan
  2025-07-11 16:03         ` David Hildenbrand
  0 siblings, 1 reply; 9+ messages in thread
From: Zi Yan @ 2025-07-11 15:40 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Balbir Singh, linux-mm, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, linux-kernel

On 11 Jul 2025, at 10:40, David Hildenbrand wrote:

> On 11.07.25 16:37, Zi Yan wrote:
>> On 11 Jul 2025, at 2:41, David Hildenbrand wrote:
>>
>>> On 11.07.25 05:02, Zi Yan wrote:
>>>> remap(), folio_ref_unfreeze(), lru_add_split_folio() are not related to
>>>> splitting unmapped folio operations. Move them out to the caller, so that
>>>> __split_unmapped_folio() only splits unmapped folios. This makes
>>>> __split_unmapped_folio() reusable.
>>>>
>>>> Convert VM_BUG_ON(mapping) to use VM_WARN_ON_ONCE_FOLIO().
>>>>
>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>> ---
>>>> Based on the prior discussion[1], this patch makes
>>>> __split_unmapped_folio() reusable for splitting unmapped folios without
>>>> adding a new boolean unmapped parameter to guard mapping related code.
>>>>
>>>> Another potential benefit is that __split_unmapped_folio() could be
>>>> called on after-split folios by __folio_split() to perform new split
>>>> methods. For example, at deferred split time, unmapped subpages can
>>>> scatter arbitrarily within a large folio, neither uniform nor non-uniform
>>>> split can maximize after-split folio orders for mapped subpages.
>>>> Hopefully, performing __split_unmapped_folio() multiple times can
>>>> achieve the optimal split result.
>>>>
>>>> It passed mm selftests.
>>>>
>>>> [1] https://lore.kernel.org/linux-mm/94D8C1A4-780C-4BEC-A336-7D3613B54845@nvidia.com/
>>>> ---
>>>>
>>>>    mm/huge_memory.c | 275 ++++++++++++++++++++++++-----------------------
>>>>    1 file changed, 139 insertions(+), 136 deletions(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 3eb1c34be601..d97145dfa6c8 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -3396,10 +3396,6 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
>>>>     *             order - 1 to new_order).
>>>>     * @split_at: in buddy allocator like split, the folio containing @split_at
>>>>     *            will be split until its order becomes @new_order.
>>>> - * @lock_at: the folio containing @lock_at is left locked for caller.
>>>> - * @list: the after split folios will be added to @list if it is not NULL,
>>>> - *        otherwise to LRU lists.
>>>> - * @end: the end of the file @folio maps to. -1 if @folio is anonymous memory.
>>>>     * @xas: xa_state pointing to folio->mapping->i_pages and locked by caller
>>>>     * @mapping: @folio->mapping
>>>>     * @uniform_split: if the split is uniform or not (buddy allocator like split)
>>>> @@ -3425,51 +3421,27 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
>>>>     *    @page, which is split in next for loop.
>>>>     *
>>>>     * After splitting, the caller's folio reference will be transferred to the
>>>> - * folio containing @page. The other folios may be freed if they are not mapped.
>>>> - *
>>>> - * In terms of locking, after splitting,
>>>> - * 1. uniform split leaves @page (or the folio contains it) locked;
>>>> - * 2. buddy allocator like (non-uniform) split leaves @folio locked.
>>>> - *
>>>> + * folio containing @page. The caller needs to unlock and/or free after-split
>>>> + * folios if necessary.
>>>>     *
>>>>     * For !uniform_split, when -ENOMEM is returned, the original folio might be
>>>>     * split. The caller needs to check the input folio.
>>>>     */
>>>>    static int __split_unmapped_folio(struct folio *folio, int new_order,
>>>> -		struct page *split_at, struct page *lock_at,
>>>> -		struct list_head *list, pgoff_t end,
>>>> -		struct xa_state *xas, struct address_space *mapping,
>>>> -		bool uniform_split)
>>>> +				  struct page *split_at, struct xa_state *xas,
>>>> +				  struct address_space *mapping,
>>>> +				  bool uniform_split)
>>>
>>> Use two-tabs indent please (like we already do, I assume).
>>
>> OK. I was using clang-format. It gave me this indentation.
>>>
>>> [...]
>>>
>>>> @@ -3706,11 +3599,14 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>>    {
>>>>    	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>>>    	XA_STATE(xas, &folio->mapping->i_pages, folio->index);
>>>> +	struct folio *next_folio = folio_next(folio);
>>>>    	bool is_anon = folio_test_anon(folio);
>>>>    	struct address_space *mapping = NULL;
>>>>    	struct anon_vma *anon_vma = NULL;
>>>>    	int order = folio_order(folio);
>>>> +	struct folio *new_folio, *next;
>>>>    	int extra_pins, ret;
>>>> +	int nr_shmem_dropped = 0;
>>>>    	pgoff_t end;
>>>>    	bool is_hzp;
>>>>   @@ -3833,13 +3729,18 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>>    		 */
>>>>    		xas_lock(&xas);
>>>>    		xas_reset(&xas);
>>>> -		if (xas_load(&xas) != folio)
>>>> +		if (xas_load(&xas) != folio) {
>>>> +			ret = -EAGAIN;
>>>>    			goto fail;
>>>> +		}
>>>>    	}
>>>>     	/* Prevent deferred_split_scan() touching ->_refcount */
>>>>    	spin_lock(&ds_queue->split_queue_lock);
>>>>    	if (folio_ref_freeze(folio, 1 + extra_pins)) {
>>>> +		struct address_space *swap_cache = NULL;
>>>> +		struct lruvec *lruvec;
>>>> +
>>>>    		if (folio_order(folio) > 1 &&
>>>>    		    !list_empty(&folio->_deferred_list)) {
>>>>    			ds_queue->split_queue_len--;
>>>> @@ -3873,18 +3774,120 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>>    			}
>>>>    		}
>>>>   -		ret = __split_unmapped_folio(folio, new_order,
>>>> -				split_at, lock_at, list, end, &xas, mapping,
>>>> -				uniform_split);
>>>> +		if (folio_test_swapcache(folio)) {
>>>> +			if (mapping) {
>>>> +				VM_WARN_ON_ONCE_FOLIO(mapping, folio);
>>>> +				ret = -EINVAL;
>>>> +				goto fail;
>>>> +			}
>>>> +
>>>> +			/*
>>>> +			 * a swapcache folio can only be uniformly split to
>>>> +			 * order-0
>>>> +			 */
>>>> +			if (!uniform_split || new_order != 0) {
>>>> +				ret = -EINVAL;
>>>> +				goto fail;
>>>> +			}
>>>> +
>>>> +			swap_cache = swap_address_space(folio->swap);
>>>> +			xa_lock(&swap_cache->i_pages);
>>>> +		}
>>>> +
>>>> +		/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
>>>> +		lruvec = folio_lruvec_lock(folio);
>>>> +
>>>> +		ret = __split_unmapped_folio(folio, new_order, split_at, &xas,
>>>> +					     mapping, uniform_split);
>>>> +
>>>> +		/*
>>>> +		 * Unfreeze after-split folios and put them back to the right
>>>> +		 * list. @folio should be kept frozon until page cache entries
>>>> +		 * are updated with all the other after-split folios to prevent
>>>> +		 * others seeing stale page cache entries.
>>>> +		 */
>>>> +		for (new_folio = folio_next(folio); new_folio != next_folio;
>>>> +		     new_folio = next) {
>>>> +			next = folio_next(new_folio);
>>>> +
>>>> +			folio_ref_unfreeze(
>>>> +				new_folio,
>>>> +				1 + ((mapping || swap_cache) ?
>>>> +					     folio_nr_pages(new_folio) :
>>>> +					     0));
>>>
>>> While we are at it, is there a way to make this look less like an artistic masterpiece? :)
>>>
>>> expected_refs = ...
>>> folio_ref_unfreeze(new_folio, expected_refs).
>>>
>>>
>>> Can we already make use of folio_expected_ref_count() at that point? Mapcount should be 0 and the folio should be properly setup (e.g., anon, swapcache) IIRC.
>>>
>>> So maybe
>>>
>>> expected_refs = folio_expected_ref_count(new_folio) + 1;
>>> folio_ref_unfreeze(new_folio, expected_refs).
>>>
>>> Would do?
>>
>> I think so. Even further, I think we probably can get rid of can_split_folio()’s
>> pextra_pins and use folio_expected_ref_count() too.
>
> That will only do the right thing if we know that the folio is not mapped and that there is no way it can get mapped concurrently.
>
> Otherwise, when freezing, we might ignore a mapping (where we should fail freezing).

You mean a concurrent mapping could happen between unmap_folio() and
folio_ref_freeze()? Then folio_expected_ref_count() would be computed with
folio_mapcount() != 0. You are right. Thanks.

I could pull the ref_count code in folio_expected_ref_count() into a
new helper function and reuse it in can_split_folio(). That might be
out of scope. I might come back to this later.
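
A very rough, hypothetical sketch of such a helper (the name is made up
and the body just mirrors the non-mapcount part of
folio_expected_ref_count(); the existing can_split_folio() counting may
not be exactly identical, e.g. around PG_private):

        static inline int folio_cache_ref_count(const struct folio *folio)
        {
                int refs = 0;

                if (folio_test_anon(folio)) {
                        /* one reference per page from the swapcache */
                        if (folio_test_swapcache(folio))
                                refs += folio_nr_pages(folio);
                } else if (folio->mapping) {
                        /* one reference per page from the pagecache */
                        refs += folio_nr_pages(folio);
                        /* plus one reference from PG_private, if set */
                        refs += folio_test_private(folio);
                }

                return refs;
        }

can_split_folio() could then compute extra_pins from the helper instead of
open-coding the swapcache/pagecache cases.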


Best Regards,
Yan, Zi



* Re: [PATCH] mm/huge_memory: move unrelated code out of __split_unmapped_folio()
  2025-07-11 15:40       ` Zi Yan
@ 2025-07-11 16:03         ` David Hildenbrand
  2025-07-11 16:26           ` Zi Yan
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2025-07-11 16:03 UTC (permalink / raw)
  To: Zi Yan
  Cc: Balbir Singh, linux-mm, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, linux-kernel

On 11.07.25 17:40, Zi Yan wrote:
> On 11 Jul 2025, at 10:40, David Hildenbrand wrote:
> 
>> On 11.07.25 16:37, Zi Yan wrote:
>>> On 11 Jul 2025, at 2:41, David Hildenbrand wrote:
>>>
>>>> On 11.07.25 05:02, Zi Yan wrote:
>>>>> remap(), folio_ref_unfreeze(), lru_add_split_folio() are not related to
>>>>> splitting unmapped folio operations. Move them out to the caller, so that
>>>>> __split_unmapped_folio() only splits unmapped folios. This makes
>>>>> __split_unmapped_folio() reusable.
>>>>>
>>>>> Convert VM_BUG_ON(mapping) to use VM_WARN_ON_ONCE_FOLIO().
>>>>>
>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>> ---
>>>>> Based on the prior discussion[1], this patch makes
>>>>> __split_unmapped_folio() reusable for splitting unmapped folios without
>>>>> adding a new boolean unmapped parameter to guard mapping related code.
>>>>>
>>>>> Another potential benefit is that __split_unmapped_folio() could be
>>>>> called on after-split folios by __folio_split() to perform new split
>>>>> methods. For example, at deferred split time, unmapped subpages can
>>>>> scatter arbitrarily within a large folio, neither uniform nor non-uniform
>>>>> split can maximize after-split folio orders for mapped subpages.
>>>>> Hopefully, performing __split_unmapped_folio() multiple times can
>>>>> achieve the optimal split result.
>>>>>
>>>>> It passed mm selftests.
>>>>>
>>>>> [1] https://lore.kernel.org/linux-mm/94D8C1A4-780C-4BEC-A336-7D3613B54845@nvidia.com/
>>>>> ---
>>>>>
>>>>>     mm/huge_memory.c | 275 ++++++++++++++++++++++++-----------------------
>>>>>     1 file changed, 139 insertions(+), 136 deletions(-)
>>>>>
>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>> index 3eb1c34be601..d97145dfa6c8 100644
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -3396,10 +3396,6 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
>>>>>      *             order - 1 to new_order).
>>>>>      * @split_at: in buddy allocator like split, the folio containing @split_at
>>>>>      *            will be split until its order becomes @new_order.
>>>>> - * @lock_at: the folio containing @lock_at is left locked for caller.
>>>>> - * @list: the after split folios will be added to @list if it is not NULL,
>>>>> - *        otherwise to LRU lists.
>>>>> - * @end: the end of the file @folio maps to. -1 if @folio is anonymous memory.
>>>>>      * @xas: xa_state pointing to folio->mapping->i_pages and locked by caller
>>>>>      * @mapping: @folio->mapping
>>>>>      * @uniform_split: if the split is uniform or not (buddy allocator like split)
>>>>> @@ -3425,51 +3421,27 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
>>>>>      *    @page, which is split in next for loop.
>>>>>      *
>>>>>      * After splitting, the caller's folio reference will be transferred to the
>>>>> - * folio containing @page. The other folios may be freed if they are not mapped.
>>>>> - *
>>>>> - * In terms of locking, after splitting,
>>>>> - * 1. uniform split leaves @page (or the folio contains it) locked;
>>>>> - * 2. buddy allocator like (non-uniform) split leaves @folio locked.
>>>>> - *
>>>>> + * folio containing @page. The caller needs to unlock and/or free after-split
>>>>> + * folios if necessary.
>>>>>      *
>>>>>      * For !uniform_split, when -ENOMEM is returned, the original folio might be
>>>>>      * split. The caller needs to check the input folio.
>>>>>      */
>>>>>     static int __split_unmapped_folio(struct folio *folio, int new_order,
>>>>> -		struct page *split_at, struct page *lock_at,
>>>>> -		struct list_head *list, pgoff_t end,
>>>>> -		struct xa_state *xas, struct address_space *mapping,
>>>>> -		bool uniform_split)
>>>>> +				  struct page *split_at, struct xa_state *xas,
>>>>> +				  struct address_space *mapping,
>>>>> +				  bool uniform_split)
>>>>
>>>> Use two-tabs indent please (like we already do, I assume).
>>>
>>> OK. I was using clang-format. It gave me this indentation.
>>>>
>>>> [...]
>>>>
>>>>> @@ -3706,11 +3599,14 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>>>     {
>>>>>     	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>>>>     	XA_STATE(xas, &folio->mapping->i_pages, folio->index);
>>>>> +	struct folio *next_folio = folio_next(folio);
>>>>>     	bool is_anon = folio_test_anon(folio);
>>>>>     	struct address_space *mapping = NULL;
>>>>>     	struct anon_vma *anon_vma = NULL;
>>>>>     	int order = folio_order(folio);
>>>>> +	struct folio *new_folio, *next;
>>>>>     	int extra_pins, ret;
>>>>> +	int nr_shmem_dropped = 0;
>>>>>     	pgoff_t end;
>>>>>     	bool is_hzp;
>>>>>    @@ -3833,13 +3729,18 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>>>     		 */
>>>>>     		xas_lock(&xas);
>>>>>     		xas_reset(&xas);
>>>>> -		if (xas_load(&xas) != folio)
>>>>> +		if (xas_load(&xas) != folio) {
>>>>> +			ret = -EAGAIN;
>>>>>     			goto fail;
>>>>> +		}
>>>>>     	}
>>>>>      	/* Prevent deferred_split_scan() touching ->_refcount */
>>>>>     	spin_lock(&ds_queue->split_queue_lock);
>>>>>     	if (folio_ref_freeze(folio, 1 + extra_pins)) {
>>>>> +		struct address_space *swap_cache = NULL;
>>>>> +		struct lruvec *lruvec;
>>>>> +
>>>>>     		if (folio_order(folio) > 1 &&
>>>>>     		    !list_empty(&folio->_deferred_list)) {
>>>>>     			ds_queue->split_queue_len--;
>>>>> @@ -3873,18 +3774,120 @@ static int __folio_split(struct folio *folio, unsigned int new_order,
>>>>>     			}
>>>>>     		}
>>>>>    -		ret = __split_unmapped_folio(folio, new_order,
>>>>> -				split_at, lock_at, list, end, &xas, mapping,
>>>>> -				uniform_split);
>>>>> +		if (folio_test_swapcache(folio)) {
>>>>> +			if (mapping) {
>>>>> +				VM_WARN_ON_ONCE_FOLIO(mapping, folio);
>>>>> +				ret = -EINVAL;
>>>>> +				goto fail;
>>>>> +			}
>>>>> +
>>>>> +			/*
>>>>> +			 * a swapcache folio can only be uniformly split to
>>>>> +			 * order-0
>>>>> +			 */
>>>>> +			if (!uniform_split || new_order != 0) {
>>>>> +				ret = -EINVAL;
>>>>> +				goto fail;
>>>>> +			}
>>>>> +
>>>>> +			swap_cache = swap_address_space(folio->swap);
>>>>> +			xa_lock(&swap_cache->i_pages);
>>>>> +		}
>>>>> +
>>>>> +		/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
>>>>> +		lruvec = folio_lruvec_lock(folio);
>>>>> +
>>>>> +		ret = __split_unmapped_folio(folio, new_order, split_at, &xas,
>>>>> +					     mapping, uniform_split);
>>>>> +
>>>>> +		/*
>>>>> +		 * Unfreeze after-split folios and put them back to the right
>>>>> +		 * list. @folio should be kept frozon until page cache entries
>>>>> +		 * are updated with all the other after-split folios to prevent
>>>>> +		 * others seeing stale page cache entries.
>>>>> +		 */
>>>>> +		for (new_folio = folio_next(folio); new_folio != next_folio;
>>>>> +		     new_folio = next) {
>>>>> +			next = folio_next(new_folio);
>>>>> +
>>>>> +			folio_ref_unfreeze(
>>>>> +				new_folio,
>>>>> +				1 + ((mapping || swap_cache) ?
>>>>> +					     folio_nr_pages(new_folio) :
>>>>> +					     0));
>>>>
>>>> While we are at it, is there a way to make this look less like an artistic masterpiece? :)
>>>>
>>>> expected_refs = ...
>>>> folio_ref_unfreeze(new_folio, expected_refs).
>>>>
>>>>
>>>> Can we already make use of folio_expected_ref_count() at that point? Mapcount should be 0 and the folio should be properly setup (e.g., anon, swapcache) IIRC.
>>>>
>>>> So maybe
>>>>
>>>> expected_refs = folio_expected_ref_count(new_folio) + 1;
>>>> folio_ref_unfreeze(new_folio, expected_refs).
>>>>
>>>> Would do?
>>>
>>> I think so. Even further, I think we probably can get rid of can_split_folio()’s
>>> pextra_pins and use folio_expected_ref_count() too.
>>
>> That will only do the right thing if we know that the folio is not mapped and that there is no way it can get mapped concurrently.
>>
>> Otherwise, when freezing, we might ignore a mapping (where we should fail freezing).
> 
> You mean a concurrent mapping could happen between unmap_folio() and
> folio_ref_freeze()? Then the value folio_expected_ref_count() returns would
> already include a non-zero folio_mapcount(). You are right. Thanks.

Right, but maybe locking prevents that.

E.g., a locked anon folio cannot get migrated or swapped in. So the 
mapcount cannot increase once locked. If already mapped, fork() could 
duplicate mappings, but in that case there would be at least one mapping 
already.

For the pagecache, I think we always map a folio with the folio lock 
held: see filemap_map_pages().

So *maybe* just checking folio_mapped() before trying to freeze could be 
good enough, arguing that the mapcount cannot go from 0 -> !0 while the 
folio is locked.
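
Something like the following (completely untested sketch, helper name made
up) is roughly what I have in mind:

static bool folio_try_freeze_unmapped(struct folio *folio)
{
	int expected_refs;

	VM_WARN_ON_ONCE_FOLIO(!folio_test_locked(folio), folio);

	/* mapcount cannot go from 0 -> !0 while the folio is locked */
	if (folio_mapped(folio))
		return false;

	/* one ref for the caller on top of the folio's expected references */
	expected_refs = folio_expected_ref_count(folio) + 1;
	return folio_ref_freeze(folio, expected_refs);
}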

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/huge_memory: move unrelated code out of __split_unmapped_folio()
  2025-07-11 16:03         ` David Hildenbrand
@ 2025-07-11 16:26           ` Zi Yan
  2025-07-11 16:28             ` David Hildenbrand
  0 siblings, 1 reply; 9+ messages in thread
From: Zi Yan @ 2025-07-11 16:26 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Balbir Singh, linux-mm, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, linux-kernel

On 11 Jul 2025, at 12:03, David Hildenbrand wrote:

> On 11.07.25 17:40, Zi Yan wrote:
>> On 11 Jul 2025, at 10:40, David Hildenbrand wrote:
>>
>>> On 11.07.25 16:37, Zi Yan wrote:
>>>> On 11 Jul 2025, at 2:41, David Hildenbrand wrote:
>>>>
>>>>> On 11.07.25 05:02, Zi Yan wrote:
>>>>>> [...]
>>>>>
>>>>> While we are at it, is there a way to make this look less like an artistic masterpiece? :)
>>>>>
>>>>> expected_refs = ...
>>>>> folio_ref_unfreeze(new_folio, expected_refs).
>>>>>
>>>>>
>>>>> Can we already make use of folio_expected_ref_count() at that point? Mapcount should be 0 and the folio should be properly setup (e.g., anon, swapcache) IIRC.
>>>>>
>>>>> So maybe
>>>>>
>>>>> expected_refs = folio_expected_ref_count(new_folio) + 1;
>>>>> folio_ref_unfreeze(new_folio, expected_refs).
>>>>>
>>>>> Would do?
>>>>
>>>> I think so. Even further, I think we probably can get rid of can_split_folio()’s
>>>> pextra_pins and use folio_expected_ref_count() too.
>>>
>>> That will only do the right thing if we know that the folio is not mapped and that there is no way it can get mapped concurrently.
>>>
>>> Otherwise, when freezing, we might ignore a mapping (where we should fail freezing).
>>
>> You mean a concurrent mapping could happen between unmap_folio() and
>> folio_ref_freeze()? Then the value folio_expected_ref_count() returns would
>> already include a non-zero folio_mapcount(). You are right. Thanks.
>
> Right, but maybe locking prevents that.
>
> E.g., a locked anon folio cannot get migrated or swapped in. So the mapcount cannot increase once locked. If already mapped, fork() could duplicate mappings, but in that case there would be at least one mapping already.
>
> For the pagecache, I think we always map a folio with the folio lock held: see filemap_map_pages().
>
> So *maybe* just checking folio_mapped() before trying to freeze could be good enough, arguing that the mapcount cannot go from 0 -> !0 while the folio is locked.

Yes, but this is very subtle and fragile. I will keep the code unchanged for
now. :)


Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/huge_memory: move unrelated code out of __split_unmapped_folio()
  2025-07-11 16:26           ` Zi Yan
@ 2025-07-11 16:28             ` David Hildenbrand
  2025-07-11 16:30               ` Zi Yan
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2025-07-11 16:28 UTC (permalink / raw)
  To: Zi Yan
  Cc: Balbir Singh, linux-mm, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, linux-kernel

On 11.07.25 18:26, Zi Yan wrote:
> On 11 Jul 2025, at 12:03, David Hildenbrand wrote:
> 
>> [...]
>>
>> So *maybe* just checking folio_mapped() before trying to freeze could be good enough, arguing that the mapcount cannot go from 0 -> !0 while the folio is locked.
> 
> Yes, but this is very subtle and fragile. I will keep the code unchanged for
> now. :)

Please clean up that multi-line madness, though ;)
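
E.g. something like this (untested sketch, combining the loop from the patch
with the expected_refs idea from earlier in this thread):

		for (new_folio = folio_next(folio); new_folio != next_folio;
		     new_folio = next) {
			int expected_refs;

			next = folio_next(new_folio);

			/* expected pagecache/swapcache refs plus our own reference */
			expected_refs = folio_expected_ref_count(new_folio) + 1;
			folio_ref_unfreeze(new_folio, expected_refs);
		}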

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/huge_memory: move unrelated code out of __split_unmapped_folio()
  2025-07-11 16:28             ` David Hildenbrand
@ 2025-07-11 16:30               ` Zi Yan
  0 siblings, 0 replies; 9+ messages in thread
From: Zi Yan @ 2025-07-11 16:30 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Balbir Singh, linux-mm, Andrew Morton, Lorenzo Stoakes,
	Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, linux-kernel

On 11 Jul 2025, at 12:28, David Hildenbrand wrote:

> On 11.07.25 18:26, Zi Yan wrote:
>> [...]
>>
>> Yes, but this is very subtle and fragile. I will keep the code unchanged for
>> now. :)
>
> Please clean up that multi-line madness, though ;)

For sure. Will send V2 later.

Best Regards,
Yan, Zi


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-07-11 16:30 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-11  3:02 [PATCH] mm/huge_memory: move unrelated code out of __split_unmapped_folio() Zi Yan
2025-07-11  6:41 ` David Hildenbrand
2025-07-11 14:37   ` Zi Yan
2025-07-11 14:40     ` David Hildenbrand
2025-07-11 15:40       ` Zi Yan
2025-07-11 16:03         ` David Hildenbrand
2025-07-11 16:26           ` Zi Yan
2025-07-11 16:28             ` David Hildenbrand
2025-07-11 16:30               ` Zi Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).